DPDK patches and discussions
* [dpdk-dev] [PATCH v1 0/4] vhost: improve ready state
@ 2020-06-18 16:28 Matan Azrad
  2020-06-18 16:28 ` [dpdk-dev] [PATCH v1 1/4] vhost: support host notifier queue configuration Matan Azrad
                   ` (4 more replies)
  0 siblings, 5 replies; 59+ messages in thread
From: Matan Azrad @ 2020-06-18 16:28 UTC (permalink / raw)
  To: Maxime Coquelin, Xiao Wang; +Cc: dev

Due to the issue described in the "vhost: improve device ready
definition" patch of this series, we need to change the ready state
definition of the vhost device.

To support the suggested improvement, the host notifier control API is
updated.

We also need to skip the access lock when a vDPA device is configured.

Matan Azrad (4):
  vhost: support host notifier queue configuration
  vhost: skip access lock when vDPA is configured
  vhost: improve device ready definition
  vdpa/mlx5: support queue update

 doc/guides/rel_notes/release_20_08.rst |  2 +
 drivers/vdpa/ifc/ifcvf_vdpa.c          |  6 +--
 drivers/vdpa/mlx5/mlx5_vdpa.c          | 24 -----------
 drivers/vdpa/mlx5/mlx5_vdpa.h          |  8 +++-
 drivers/vdpa/mlx5/mlx5_vdpa_virtq.c    | 58 +++++++++++++++++++++-----
 lib/librte_vhost/rte_vdpa.h            |  8 +++-
 lib/librte_vhost/rte_vhost.h           |  2 +
 lib/librte_vhost/vhost.h               |  3 --
 lib/librte_vhost/vhost_user.c          | 75 +++++++++++++++++++++++-----------
 9 files changed, 118 insertions(+), 68 deletions(-)

-- 
1.8.3.1



* [dpdk-dev] [PATCH v1 1/4] vhost: support host notifier queue configuration
  2020-06-18 16:28 [dpdk-dev] [PATCH v1 0/4] vhost: improve ready state Matan Azrad
@ 2020-06-18 16:28 ` Matan Azrad
  2020-06-19  6:44   ` Maxime Coquelin
  2020-06-18 16:28 ` [dpdk-dev] [PATCH v1 2/4] vhost: skip access lock when vDPA is configured Matan Azrad
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 59+ messages in thread
From: Matan Azrad @ 2020-06-18 16:28 UTC (permalink / raw)
  To: Maxime Coquelin, Xiao Wang; +Cc: dev

As a preparation for per-queue operations in the vDPA device, the
following experimental API needs to be changed:

The API ``rte_vhost_host_notifier_ctrl`` was changed to be per queue
instead of per device.

A `qid` parameter was added to the API arguments list.

Setting the parameter to the value VHOST_QUEUE_ALL configures the
host notifier for all the device queues, as was done before this patch.
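
For illustration, a driver could now use the changed API in either of
the following ways (a minimal sketch assuming only the prototypes from
this patch; the helper names are made up and error handling is trimmed):

#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#include <rte_vhost.h>
#include <rte_vdpa.h>

/* Hypothetical helper: map host notifiers one queue at a time,
 * tolerating -ENOTSUP (software relay fallback) as the ifc driver does.
 */
static int
example_notifier_setup(int vid, uint16_t nr_vring)
{
	uint16_t qid;

	for (qid = 0; qid < nr_vring; qid++) {
		int ret = rte_vhost_host_notifier_ctrl(vid, qid, true);

		if (ret != 0 && ret != -ENOTSUP)
			return ret;
	}
	return 0;
}

/* Pre-patch behavior, kept available through a single call: */
static int
example_notifier_setup_all(int vid)
{
	return rte_vhost_host_notifier_ctrl(vid, VHOST_QUEUE_ALL, true);
}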

Signed-off-by: Matan Azrad <matan@mellanox.com>
---
 doc/guides/rel_notes/release_20_08.rst |  2 ++
 drivers/vdpa/ifc/ifcvf_vdpa.c          |  6 +++---
 drivers/vdpa/mlx5/mlx5_vdpa.c          |  5 +++--
 lib/librte_vhost/rte_vdpa.h            |  8 ++++++--
 lib/librte_vhost/rte_vhost.h           |  2 ++
 lib/librte_vhost/vhost.h               |  3 ---
 lib/librte_vhost/vhost_user.c          | 18 ++++++++++++++----
 7 files changed, 30 insertions(+), 14 deletions(-)

diff --git a/doc/guides/rel_notes/release_20_08.rst b/doc/guides/rel_notes/release_20_08.rst
index ba16d3b..9732959 100644
--- a/doc/guides/rel_notes/release_20_08.rst
+++ b/doc/guides/rel_notes/release_20_08.rst
@@ -111,6 +111,8 @@ API Changes
    Also, make sure to start the actual text at the margin.
    =========================================================
 
+* vhost: The API of ``rte_vhost_host_notifier_ctrl`` was changed to be per
+  queue and not per device, a qid parameter was added to the arguments list.
 
 ABI Changes
 -----------
diff --git a/drivers/vdpa/ifc/ifcvf_vdpa.c b/drivers/vdpa/ifc/ifcvf_vdpa.c
index ec97178..336837a 100644
--- a/drivers/vdpa/ifc/ifcvf_vdpa.c
+++ b/drivers/vdpa/ifc/ifcvf_vdpa.c
@@ -839,7 +839,7 @@ struct internal_list {
 	vdpa_ifcvf_stop(internal);
 	vdpa_disable_vfio_intr(internal);
 
-	ret = rte_vhost_host_notifier_ctrl(vid, false);
+	ret = rte_vhost_host_notifier_ctrl(vid, VHOST_QUEUE_ALL, false);
 	if (ret && ret != -ENOTSUP)
 		goto error;
 
@@ -858,7 +858,7 @@ struct internal_list {
 	if (ret)
 		goto stop_vf;
 
-	rte_vhost_host_notifier_ctrl(vid, true);
+	rte_vhost_host_notifier_ctrl(vid, VHOST_QUEUE_ALL, true);
 
 	internal->sw_fallback_running = true;
 
@@ -893,7 +893,7 @@ struct internal_list {
 	rte_atomic32_set(&internal->dev_attached, 1);
 	update_datapath(internal);
 
-	if (rte_vhost_host_notifier_ctrl(vid, true) != 0)
+	if (rte_vhost_host_notifier_ctrl(vid, VHOST_QUEUE_ALL, true) != 0)
 		DRV_LOG(NOTICE, "vDPA (%d): software relay is used.", did);
 
 	return 0;
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.c b/drivers/vdpa/mlx5/mlx5_vdpa.c
index 9e758b6..8ea1300 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.c
@@ -147,7 +147,8 @@
 	int ret;
 
 	if (priv->direct_notifier) {
-		ret = rte_vhost_host_notifier_ctrl(priv->vid, false);
+		ret = rte_vhost_host_notifier_ctrl(priv->vid, VHOST_QUEUE_ALL,
+						   false);
 		if (ret != 0) {
 			DRV_LOG(INFO, "Direct HW notifier FD cannot be "
 				"destroyed for device %d: %d.", priv->vid, ret);
@@ -155,7 +156,7 @@
 		}
 		priv->direct_notifier = 0;
 	}
-	ret = rte_vhost_host_notifier_ctrl(priv->vid, true);
+	ret = rte_vhost_host_notifier_ctrl(priv->vid, VHOST_QUEUE_ALL, true);
 	if (ret != 0)
 		DRV_LOG(INFO, "Direct HW notifier FD cannot be configured for"
 			" device %d: %d.", priv->vid, ret);
diff --git a/lib/librte_vhost/rte_vdpa.h b/lib/librte_vhost/rte_vdpa.h
index ecb3d91..2db536c 100644
--- a/lib/librte_vhost/rte_vdpa.h
+++ b/lib/librte_vhost/rte_vdpa.h
@@ -202,22 +202,26 @@ struct rte_vdpa_device *
 int
 rte_vdpa_get_device_num(void);
 
+#define VHOST_QUEUE_ALL VHOST_MAX_VRING
+
 /**
  * @warning
  * @b EXPERIMENTAL: this API may change without prior notice
  *
- * Enable/Disable host notifier mapping for a vdpa port.
+ * Enable/Disable host notifier mapping for a vdpa queue.
  *
  * @param vid
  *  vhost device id
  * @param enable
  *  true for host notifier map, false for host notifier unmap
+ * @param qid
+ *  vhost queue id, VHOST_QUEUE_ALL to configure all the device queues
  * @return
  *  0 on success, -1 on failure
  */
 __rte_experimental
 int
-rte_vhost_host_notifier_ctrl(int vid, bool enable);
+rte_vhost_host_notifier_ctrl(int vid, uint16_t qid, bool enable);
 
 /**
  * @warning
diff --git a/lib/librte_vhost/rte_vhost.h b/lib/librte_vhost/rte_vhost.h
index 329ed8a..14bf7c2 100644
--- a/lib/librte_vhost/rte_vhost.h
+++ b/lib/librte_vhost/rte_vhost.h
@@ -107,6 +107,8 @@
 #define VHOST_USER_F_PROTOCOL_FEATURES	30
 #endif
 
+#define VHOST_MAX_VRING			0x100
+#define VHOST_MAX_QUEUE_PAIRS		0x80
 
 /**
  * Information relating to memory regions including offsets to
diff --git a/lib/librte_vhost/vhost.h b/lib/librte_vhost/vhost.h
index 17f1e9a..28b991d 100644
--- a/lib/librte_vhost/vhost.h
+++ b/lib/librte_vhost/vhost.h
@@ -202,9 +202,6 @@ struct vhost_virtqueue {
 	TAILQ_HEAD(, vhost_iotlb_entry) iotlb_pending_list;
 } __rte_cache_aligned;
 
-#define VHOST_MAX_VRING			0x100
-#define VHOST_MAX_QUEUE_PAIRS		0x80
-
 /* Declare IOMMU related bits for older kernels */
 #ifndef VIRTIO_F_IOMMU_PLATFORM
 
diff --git a/lib/librte_vhost/vhost_user.c b/lib/librte_vhost/vhost_user.c
index 84bebad..cddfa4b 100644
--- a/lib/librte_vhost/vhost_user.c
+++ b/lib/librte_vhost/vhost_user.c
@@ -2960,13 +2960,13 @@ static int vhost_user_slave_set_vring_host_notifier(struct virtio_net *dev,
 	return process_slave_message_reply(dev, &msg);
 }
 
-int rte_vhost_host_notifier_ctrl(int vid, bool enable)
+int rte_vhost_host_notifier_ctrl(int vid, uint16_t qid, bool enable)
 {
 	struct virtio_net *dev;
 	struct rte_vdpa_device *vdpa_dev;
 	int vfio_device_fd, did, ret = 0;
 	uint64_t offset, size;
-	unsigned int i;
+	unsigned int i, q_start, q_last;
 
 	dev = get_device(vid);
 	if (!dev)
@@ -2990,6 +2990,16 @@ int rte_vhost_host_notifier_ctrl(int vid, bool enable)
 	if (!vdpa_dev)
 		return -ENODEV;
 
+	if (qid == VHOST_QUEUE_ALL) {
+		q_start = 0;
+		q_last = dev->nr_vring - 1;
+	} else {
+		if (qid >= dev->nr_vring)
+			return -EINVAL;
+		q_start = qid;
+		q_last = qid;
+	}
+
 	RTE_FUNC_PTR_OR_ERR_RET(vdpa_dev->ops->get_vfio_device_fd, -ENOTSUP);
 	RTE_FUNC_PTR_OR_ERR_RET(vdpa_dev->ops->get_notify_area, -ENOTSUP);
 
@@ -2998,7 +3008,7 @@ int rte_vhost_host_notifier_ctrl(int vid, bool enable)
 		return -ENOTSUP;
 
 	if (enable) {
-		for (i = 0; i < dev->nr_vring; i++) {
+		for (i = q_start; i <= q_last; i++) {
 			if (vdpa_dev->ops->get_notify_area(vid, i, &offset,
 					&size) < 0) {
 				ret = -ENOTSUP;
@@ -3013,7 +3023,7 @@ int rte_vhost_host_notifier_ctrl(int vid, bool enable)
 		}
 	} else {
 disable:
-		for (i = 0; i < dev->nr_vring; i++) {
+		for (i = q_start; i <= q_last; i++) {
 			vhost_user_slave_set_vring_host_notifier(dev, i, -1,
 					0, 0);
 		}
-- 
1.8.3.1



* [dpdk-dev] [PATCH v1 2/4] vhost: skip access lock when vDPA is configured
  2020-06-18 16:28 [dpdk-dev] [PATCH v1 0/4] vhost: improve ready state Matan Azrad
  2020-06-18 16:28 ` [dpdk-dev] [PATCH v1 1/4] vhost: support host notifier queue configuration Matan Azrad
@ 2020-06-18 16:28 ` Matan Azrad
  2020-06-19  6:49   ` Maxime Coquelin
  2020-06-18 16:28 ` [dpdk-dev] [PATCH v1 3/4] vhost: improve device ready definition Matan Azrad
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 59+ messages in thread
From: Matan Azrad @ 2020-06-18 16:28 UTC (permalink / raw)
  To: Maxime Coquelin, Xiao Wang; +Cc: dev

No need to take access lock in the vhost-user massage handler when
vDPA driver controls all the data-path of the vhost device.

It allows the vDPA set_vring_state operation callback to configure
guest notifications.

Signed-off-by: Matan Azrad <matan@mellanox.com>
---
 lib/librte_vhost/vhost_user.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/lib/librte_vhost/vhost_user.c b/lib/librte_vhost/vhost_user.c
index cddfa4b..b0849b9 100644
--- a/lib/librte_vhost/vhost_user.c
+++ b/lib/librte_vhost/vhost_user.c
@@ -2699,8 +2699,10 @@ typedef int (*vhost_message_handler_t)(struct virtio_net **pdev,
 	case VHOST_USER_SEND_RARP:
 	case VHOST_USER_NET_SET_MTU:
 	case VHOST_USER_SET_SLAVE_REQ_FD:
-		vhost_user_lock_all_queue_pairs(dev);
-		unlock_required = 1;
+		if (!(dev->flags & VIRTIO_DEV_VDPA_CONFIGURED)) {
+			vhost_user_lock_all_queue_pairs(dev);
+			unlock_required = 1;
+		}
 		break;
 	default:
 		break;
-- 
1.8.3.1



* [dpdk-dev] [PATCH v1 3/4] vhost: improve device ready definition
  2020-06-18 16:28 [dpdk-dev] [PATCH v1 0/4] vhost: improve ready state Matan Azrad
  2020-06-18 16:28 ` [dpdk-dev] [PATCH v1 1/4] vhost: support host notifier queue configuration Matan Azrad
  2020-06-18 16:28 ` [dpdk-dev] [PATCH v1 2/4] vhost: skip access lock when vDPA is configured Matan Azrad
@ 2020-06-18 16:28 ` Matan Azrad
  2020-06-19  7:41   ` Maxime Coquelin
  2020-06-18 16:28 ` [dpdk-dev] [PATCH v1 4/4] vdpa/mlx5: support queue update Matan Azrad
  2020-06-25 13:38 ` [dpdk-dev] [PATCH v2 0/5] vhost: improve ready state Matan Azrad
  4 siblings, 1 reply; 59+ messages in thread
From: Matan Azrad @ 2020-06-18 16:28 UTC (permalink / raw)
  To: Maxime Coquelin, Xiao Wang; +Cc: dev

Some guest drivers may not configure disabled virtio queues.

In this case, the vhost management never triggers the vDPA device
configuration because it waits for the device to be ready.

The current ready state means that all the virtio queues should be
configured regardless of the enablement status.

In order to support this case, this patch changes the ready state:
The device is ready when at least 1 queue pair is configured and
enabled.

So, now, the vDPA driver will be configured when the first queue pair is
configured and enabled.

Also, the queue state operation is changed to follow these rules:
	1. queue becomes ready (enabled and fully configured) -
		set_vring_state(enabled).
	2. queue becomes not ready - set_vring_state(disabled).
	3. queue stays ready and a VHOST_USER_SET_VRING_ENABLE message was
		handled - set_vring_state(enabled).

The parallel operations for the application are adjusted too.
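
For illustration only, the per-queue decision that the rules above boil
down to can be sketched like this (the names are illustrative, not
taken from the patch):

#include <stdbool.h>

/* Should set_vring_state()/vring_state_changed() be called for queue i
 * after a vhost-user message was handled?
 */
static bool
need_state_update(bool was_ready, bool now_ready, bool enable_msg_for_i)
{
	/* Rule 3: re-announce a ready queue when SET_VRING_ENABLE was
	 * handled for it.
	 */
	if (now_ready && enable_msg_for_i)
		return true;
	/* Rules 1 and 2: announce any ready <-> not-ready transition. */
	return now_ready != was_ready;
}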

Signed-off-by: Matan Azrad <matan@mellanox.com>
---
 lib/librte_vhost/vhost_user.c | 51 ++++++++++++++++++++++++++++---------------
 1 file changed, 33 insertions(+), 18 deletions(-)

diff --git a/lib/librte_vhost/vhost_user.c b/lib/librte_vhost/vhost_user.c
index b0849b9..cfd5f27 100644
--- a/lib/librte_vhost/vhost_user.c
+++ b/lib/librte_vhost/vhost_user.c
@@ -1295,7 +1295,7 @@
 {
 	bool rings_ok;
 
-	if (!vq)
+	if (!vq || !vq->enabled)
 		return false;
 
 	if (vq_is_packed(dev))
@@ -1309,24 +1309,27 @@
 	       vq->callfd != VIRTIO_UNINITIALIZED_EVENTFD;
 }
 
+#define VIRTIO_DEV_NUM_VQS_TO_BE_READY 2u
+
 static int
 virtio_is_ready(struct virtio_net *dev)
 {
 	struct vhost_virtqueue *vq;
 	uint32_t i;
 
-	if (dev->nr_vring == 0)
+	if (dev->nr_vring < VIRTIO_DEV_NUM_VQS_TO_BE_READY)
 		return 0;
 
-	for (i = 0; i < dev->nr_vring; i++) {
+	for (i = 0; i < VIRTIO_DEV_NUM_VQS_TO_BE_READY; i++) {
 		vq = dev->virtqueue[i];
 
 		if (!vq_is_ready(dev, vq))
 			return 0;
 	}
 
-	VHOST_LOG_CONFIG(INFO,
-		"virtio is now ready for processing.\n");
+	if (!(dev->flags & VIRTIO_DEV_READY))
+		VHOST_LOG_CONFIG(INFO,
+			"virtio is now ready for processing.\n");
 	return 1;
 }
 
@@ -1970,8 +1973,6 @@ static int vhost_user_set_vring_err(struct virtio_net **pdev __rte_unused,
 	struct virtio_net *dev = *pdev;
 	int enable = (int)msg->payload.state.num;
 	int index = (int)msg->payload.state.index;
-	struct rte_vdpa_device *vdpa_dev;
-	int did = -1;
 
 	if (validate_msg_fds(msg, 0) != 0)
 		return RTE_VHOST_MSG_RESULT_ERR;
@@ -1980,15 +1981,6 @@ static int vhost_user_set_vring_err(struct virtio_net **pdev __rte_unused,
 		"set queue enable: %d to qp idx: %d\n",
 		enable, index);
 
-	did = dev->vdpa_dev_id;
-	vdpa_dev = rte_vdpa_get_device(did);
-	if (vdpa_dev && vdpa_dev->ops->set_vring_state)
-		vdpa_dev->ops->set_vring_state(dev->vid, index, enable);
-
-	if (dev->notify_ops->vring_state_changed)
-		dev->notify_ops->vring_state_changed(dev->vid,
-				index, enable);
-
 	/* On disable, rings have to be stopped being processed. */
 	if (!enable && dev->dequeue_zero_copy)
 		drain_zmbuf_list(dev->virtqueue[index]);
@@ -2622,11 +2614,13 @@ typedef int (*vhost_message_handler_t)(struct virtio_net **pdev,
 	struct virtio_net *dev;
 	struct VhostUserMsg msg;
 	struct rte_vdpa_device *vdpa_dev;
+	bool ready[VHOST_MAX_VRING];
 	int did = -1;
 	int ret;
 	int unlock_required = 0;
 	bool handled;
 	int request;
+	uint32_t i;
 
 	dev = get_device(vid);
 	if (dev == NULL)
@@ -2668,6 +2662,10 @@ typedef int (*vhost_message_handler_t)(struct virtio_net **pdev,
 		VHOST_LOG_CONFIG(DEBUG, "External request %d\n", request);
 	}
 
+	/* Save ready status for all the VQs before message handle. */
+	for (i = 0; i < VHOST_MAX_VRING; i++)
+		ready[i] = vq_is_ready(dev, dev->virtqueue[i]);
+
 	ret = vhost_user_check_and_alloc_queue_pair(dev, &msg);
 	if (ret < 0) {
 		VHOST_LOG_CONFIG(ERR,
@@ -2802,6 +2800,25 @@ typedef int (*vhost_message_handler_t)(struct virtio_net **pdev,
 		return -1;
 	}
 
+	did = dev->vdpa_dev_id;
+	vdpa_dev = rte_vdpa_get_device(did);
+	/* Update ready status. */
+	for (i = 0; i < VHOST_MAX_VRING; i++) {
+		bool cur_ready = vq_is_ready(dev, dev->virtqueue[i]);
+
+		if ((cur_ready && request == VHOST_USER_SET_VRING_ENABLE &&
+				i == msg.payload.state.index) ||
+				cur_ready != ready[i]) {
+			if (vdpa_dev && vdpa_dev->ops->set_vring_state)
+				vdpa_dev->ops->set_vring_state(dev->vid, i,
+								(int)cur_ready);
+
+			if (dev->notify_ops->vring_state_changed)
+				dev->notify_ops->vring_state_changed(dev->vid,
+							i, (int)cur_ready);
+		}
+	}
+
 	if (!(dev->flags & VIRTIO_DEV_RUNNING) && virtio_is_ready(dev)) {
 		dev->flags |= VIRTIO_DEV_READY;
 
@@ -2816,8 +2833,6 @@ typedef int (*vhost_message_handler_t)(struct virtio_net **pdev,
 		}
 	}
 
-	did = dev->vdpa_dev_id;
-	vdpa_dev = rte_vdpa_get_device(did);
 	if (vdpa_dev && virtio_is_ready(dev) &&
 			!(dev->flags & VIRTIO_DEV_VDPA_CONFIGURED) &&
 			msg.request.master == VHOST_USER_SET_VRING_CALL) {
-- 
1.8.3.1



* [dpdk-dev] [PATCH v1 4/4] vdpa/mlx5: support queue update
  2020-06-18 16:28 [dpdk-dev] [PATCH v1 0/4] vhost: improve ready state Matan Azrad
                   ` (2 preceding siblings ...)
  2020-06-18 16:28 ` [dpdk-dev] [PATCH v1 3/4] vhost: improve device ready definition Matan Azrad
@ 2020-06-18 16:28 ` Matan Azrad
  2020-06-25 13:38 ` [dpdk-dev] [PATCH v2 0/5] vhost: improve ready state Matan Azrad
  4 siblings, 0 replies; 59+ messages in thread
From: Matan Azrad @ 2020-06-18 16:28 UTC (permalink / raw)
  To: Maxime Coquelin, Xiao Wang; +Cc: dev

Recent changes in vDPA device management by the vhost library may cause
a queue ready state update after the device configuration.

So, there is a chance that some queue configuration information will be
known only after the device has been configured.

Add support for reconfiguring a queue after the device configuration,
according to the queue state update and the configuration changes.

Adjust the host notifier and the guest notification configuration to be
per queue and to be applied in the enablement process.
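
The "was the queue configuration modified?" test mentioned above can be
sketched as follows (the saved_* parameters stand in for the values the
driver keeps per virtq; this is an illustration, not the exact driver
code):

#include <stdbool.h>
#include <stdint.h>

#include <rte_vhost.h>

static bool
example_vq_modified(int vid, uint16_t index, uint16_t saved_size,
		    int saved_kickfd, int saved_callfd)
{
	struct rte_vhost_vring vq;

	if (rte_vhost_get_vhost_vring(vid, index, &vq) != 0)
		return true; /* be conservative on failure */
	return vq.size != saved_size || vq.kickfd != saved_kickfd ||
	       vq.callfd != saved_callfd;
}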

Signed-off-by: Matan Azrad <matan@mellanox.com>
---
 drivers/vdpa/mlx5/mlx5_vdpa.c       | 25 ----------------
 drivers/vdpa/mlx5/mlx5_vdpa.h       |  8 ++++-
 drivers/vdpa/mlx5/mlx5_vdpa_virtq.c | 58 ++++++++++++++++++++++++++++++-------
 3 files changed, 54 insertions(+), 37 deletions(-)

diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.c b/drivers/vdpa/mlx5/mlx5_vdpa.c
index 8ea1300..0ef9e85 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.c
@@ -142,30 +142,6 @@
 }
 
 static int
-mlx5_vdpa_direct_db_prepare(struct mlx5_vdpa_priv *priv)
-{
-	int ret;
-
-	if (priv->direct_notifier) {
-		ret = rte_vhost_host_notifier_ctrl(priv->vid, VHOST_QUEUE_ALL,
-						   false);
-		if (ret != 0) {
-			DRV_LOG(INFO, "Direct HW notifier FD cannot be "
-				"destroyed for device %d: %d.", priv->vid, ret);
-			return -1;
-		}
-		priv->direct_notifier = 0;
-	}
-	ret = rte_vhost_host_notifier_ctrl(priv->vid, VHOST_QUEUE_ALL, true);
-	if (ret != 0)
-		DRV_LOG(INFO, "Direct HW notifier FD cannot be configured for"
-			" device %d: %d.", priv->vid, ret);
-	else
-		priv->direct_notifier = 1;
-	return 0;
-}
-
-static int
 mlx5_vdpa_features_set(int vid)
 {
 	int did = rte_vhost_get_vdpa_device_id(vid);
@@ -330,7 +306,6 @@
 	if (mlx5_vdpa_mtu_set(priv))
 		DRV_LOG(WARNING, "MTU cannot be set on device %d.", did);
 	if (mlx5_vdpa_pd_create(priv) || mlx5_vdpa_mem_register(priv) ||
-	    mlx5_vdpa_direct_db_prepare(priv) ||
 	    mlx5_vdpa_virtqs_prepare(priv) || mlx5_vdpa_steer_setup(priv) ||
 	    mlx5_vdpa_cqe_event_setup(priv)) {
 		mlx5_vdpa_dev_close(vid);
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.h b/drivers/vdpa/mlx5/mlx5_vdpa.h
index 28ec0be..0b90900 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.h
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.h
@@ -70,11 +70,18 @@ struct mlx5_vdpa_query_mr {
 	int is_indirect;
 };
 
+enum {
+	MLX5_VDPA_NOTIFIER_STATE_DISABLED,
+	MLX5_VDPA_NOTIFIER_STATE_ENABLED,
+	MLX5_VDPA_NOTIFIER_STATE_ERR
+};
+
 struct mlx5_vdpa_virtq {
 	SLIST_ENTRY(mlx5_vdpa_virtq) next;
 	uint8_t enable;
 	uint16_t index;
 	uint16_t vq_size;
+	uint8_t notifier_state;
 	struct mlx5_vdpa_priv *priv;
 	struct mlx5_devx_obj *virtq;
 	struct mlx5_devx_obj *counters;
@@ -103,7 +110,6 @@ struct mlx5_vdpa_steer {
 struct mlx5_vdpa_priv {
 	TAILQ_ENTRY(mlx5_vdpa_priv) next;
 	uint8_t configured;
-	uint8_t direct_notifier; /* Whether direct notifier is on or off. */
 	uint64_t last_traffic_tic;
 	pthread_t timer_tid;
 	pthread_mutex_t timer_lock;
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
index 4b4d019..30d45d4 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
@@ -36,6 +36,17 @@
 		break;
 	} while (1);
 	rte_write32(virtq->index, priv->virtq_db_addr);
+	if (virtq->notifier_state == MLX5_VDPA_NOTIFIER_STATE_DISABLED) {
+		if (rte_vhost_host_notifier_ctrl(priv->vid, virtq->index, true))
+			virtq->notifier_state = MLX5_VDPA_NOTIFIER_STATE_ERR;
+		else
+			virtq->notifier_state =
+					       MLX5_VDPA_NOTIFIER_STATE_ENABLED;
+		DRV_LOG(INFO, "Virtq %u notifier state is %s.", virtq->index,
+			virtq->notifier_state ==
+				MLX5_VDPA_NOTIFIER_STATE_ENABLED ? "enabled" :
+								    "disabled");
+	}
 	DRV_LOG(DEBUG, "Ring virtq %u doorbell.", virtq->index);
 }
 
@@ -79,6 +90,7 @@
 	memset(&virtq->reset, 0, sizeof(virtq->reset));
 	if (virtq->eqp.fw_qp)
 		mlx5_vdpa_event_qp_destroy(&virtq->eqp);
+	virtq->notifier_state = MLX5_VDPA_NOTIFIER_STATE_DISABLED;
 	return 0;
 }
 
@@ -289,6 +301,7 @@
 	virtq->priv = priv;
 	if (!virtq->virtq)
 		goto error;
+	claim_zero(rte_vhost_enable_guest_notification(priv->vid, index, 1));
 	if (mlx5_vdpa_virtq_modify(virtq, 1))
 		goto error;
 	virtq->priv = priv;
@@ -297,10 +310,6 @@
 	virtq->intr_handle.fd = vq.kickfd;
 	if (virtq->intr_handle.fd == -1) {
 		DRV_LOG(WARNING, "Virtq %d kickfd is invalid.", index);
-		if (!priv->direct_notifier) {
-			DRV_LOG(ERR, "Virtq %d cannot be notified.", index);
-			goto error;
-		}
 	} else {
 		virtq->intr_handle.type = RTE_INTR_HANDLE_EXT;
 		if (rte_intr_callback_register(&virtq->intr_handle,
@@ -418,18 +427,35 @@
 		goto error;
 	}
 	priv->nr_virtqs = nr_vring;
-	for (i = 0; i < nr_vring; i++) {
-		claim_zero(rte_vhost_enable_guest_notification(priv->vid, i,
-							       1));
-		if (mlx5_vdpa_virtq_setup(priv, i))
+	for (i = 0; i < nr_vring; i++)
+		if (priv->virtqs[i].enable && mlx5_vdpa_virtq_setup(priv, i))
 			goto error;
-	}
 	return 0;
 error:
 	mlx5_vdpa_virtqs_release(priv);
 	return -1;
 }
 
+static int
+mlx5_vdpa_virtq_is_modified(struct mlx5_vdpa_priv *priv,
+			    struct mlx5_vdpa_virtq *virtq)
+{
+	struct rte_vhost_vring vq;
+	int ret = rte_vhost_get_vhost_vring(priv->vid, virtq->index, &vq);
+
+	if (ret)
+		return -1;
+	if (vq.size != virtq->vq_size || vq.kickfd != virtq->intr_handle.fd)
+		return 1;
+	if (virtq->eqp.cq.cq) {
+		if (vq.callfd != virtq->eqp.cq.callfd)
+			return 1;
+	} else if (vq.callfd != -1) {
+		return 1;
+	}
+	return 0;
+}
+
 int
 mlx5_vdpa_virtq_enable(struct mlx5_vdpa_priv *priv, int index, int enable)
 {
@@ -438,12 +464,22 @@
 
 	DRV_LOG(INFO, "Update virtq %d status %sable -> %sable.", index,
 		virtq->enable ? "en" : "dis", enable ? "en" : "dis");
-	if (virtq->enable == !!enable)
-		return 0;
 	if (!priv->configured) {
 		virtq->enable = !!enable;
 		return 0;
 	}
+	if (virtq->enable == !!enable) {
+		if (!enable)
+			return 0;
+		ret = mlx5_vdpa_virtq_is_modified(priv, virtq);
+		if (ret < 0) {
+			DRV_LOG(ERR, "Virtq %d modify check failed.", index);
+			return -1;
+		}
+		if (ret == 0)
+			return 0;
+		DRV_LOG(INFO, "Virtq %d was modified, recreate it.", index);
+	}
 	if (enable) {
 		/* Configuration might have been updated - reconfigure virtq. */
 		if (virtq->virtq) {
-- 
1.8.3.1



* Re: [dpdk-dev] [PATCH v1 1/4] vhost: support host notifier queue configuration
  2020-06-18 16:28 ` [dpdk-dev] [PATCH v1 1/4] vhost: support host notifier queue configuration Matan Azrad
@ 2020-06-19  6:44   ` Maxime Coquelin
  2020-06-19 13:28     ` Matan Azrad
  0 siblings, 1 reply; 59+ messages in thread
From: Maxime Coquelin @ 2020-06-19  6:44 UTC (permalink / raw)
  To: Matan Azrad, Xiao Wang; +Cc: dev



On 6/18/20 6:28 PM, Matan Azrad wrote:
> As a preparation for per-queue operations in the vDPA device, the
> following experimental API needs to be changed:
> 
> The API ``rte_vhost_host_notifier_ctrl`` was changed to be per queue
> instead of per device.
> 
> A `qid` parameter was added to the API arguments list.
> 
> Setting the parameter to the value VHOST_QUEUE_ALL configures the
> host notifier for all the device queues, as was done before this patch.
> 
> Signed-off-by: Matan Azrad <matan@mellanox.com>
> ---
>  doc/guides/rel_notes/release_20_08.rst |  2 ++
>  drivers/vdpa/ifc/ifcvf_vdpa.c          |  6 +++---
>  drivers/vdpa/mlx5/mlx5_vdpa.c          |  5 +++--
>  lib/librte_vhost/rte_vdpa.h            |  8 ++++++--
>  lib/librte_vhost/rte_vhost.h           |  2 ++
>  lib/librte_vhost/vhost.h               |  3 ---
>  lib/librte_vhost/vhost_user.c          | 18 ++++++++++++++----
>  7 files changed, 30 insertions(+), 14 deletions(-)
> 
> diff --git a/doc/guides/rel_notes/release_20_08.rst b/doc/guides/rel_notes/release_20_08.rst
> index ba16d3b..9732959 100644
> --- a/doc/guides/rel_notes/release_20_08.rst
> +++ b/doc/guides/rel_notes/release_20_08.rst
> @@ -111,6 +111,8 @@ API Changes
>     Also, make sure to start the actual text at the margin.
>     =========================================================
>  
> +* vhost: The API of ``rte_vhost_host_notifier_ctrl`` was changed to be per
> +  queue and not per device, a qid parameter was added to the arguments list.
>  
>  ABI Changes
>  -----------
> diff --git a/drivers/vdpa/ifc/ifcvf_vdpa.c b/drivers/vdpa/ifc/ifcvf_vdpa.c
> index ec97178..336837a 100644
> --- a/drivers/vdpa/ifc/ifcvf_vdpa.c
> +++ b/drivers/vdpa/ifc/ifcvf_vdpa.c
> @@ -839,7 +839,7 @@ struct internal_list {
>  	vdpa_ifcvf_stop(internal);
>  	vdpa_disable_vfio_intr(internal);
>  
> -	ret = rte_vhost_host_notifier_ctrl(vid, false);
> +	ret = rte_vhost_host_notifier_ctrl(vid, VHOST_QUEUE_ALL, false);
>  	if (ret && ret != -ENOTSUP)
>  		goto error;
>  
> @@ -858,7 +858,7 @@ struct internal_list {
>  	if (ret)
>  		goto stop_vf;
>  
> -	rte_vhost_host_notifier_ctrl(vid, true);
> +	rte_vhost_host_notifier_ctrl(vid, VHOST_QUEUE_ALL, true);
>  
>  	internal->sw_fallback_running = true;
>  
> @@ -893,7 +893,7 @@ struct internal_list {
>  	rte_atomic32_set(&internal->dev_attached, 1);
>  	update_datapath(internal);
>  
> -	if (rte_vhost_host_notifier_ctrl(vid, true) != 0)
> +	if (rte_vhost_host_notifier_ctrl(vid, VHOST_QUEUE_ALL, true) != 0)
>  		DRV_LOG(NOTICE, "vDPA (%d): software relay is used.", did);
>  
>  	return 0;
> diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.c b/drivers/vdpa/mlx5/mlx5_vdpa.c
> index 9e758b6..8ea1300 100644
> --- a/drivers/vdpa/mlx5/mlx5_vdpa.c
> +++ b/drivers/vdpa/mlx5/mlx5_vdpa.c
> @@ -147,7 +147,8 @@
>  	int ret;
>  
>  	if (priv->direct_notifier) {
> -		ret = rte_vhost_host_notifier_ctrl(priv->vid, false);
> +		ret = rte_vhost_host_notifier_ctrl(priv->vid, VHOST_QUEUE_ALL,
> +						   false);
>  		if (ret != 0) {
>  			DRV_LOG(INFO, "Direct HW notifier FD cannot be "
>  				"destroyed for device %d: %d.", priv->vid, ret);
> @@ -155,7 +156,7 @@
>  		}
>  		priv->direct_notifier = 0;
>  	}
> -	ret = rte_vhost_host_notifier_ctrl(priv->vid, true);
> +	ret = rte_vhost_host_notifier_ctrl(priv->vid, VHOST_QUEUE_ALL, true);
>  	if (ret != 0)
>  		DRV_LOG(INFO, "Direct HW notifier FD cannot be configured for"
>  			" device %d: %d.", priv->vid, ret);
> diff --git a/lib/librte_vhost/rte_vdpa.h b/lib/librte_vhost/rte_vdpa.h
> index ecb3d91..2db536c 100644
> --- a/lib/librte_vhost/rte_vdpa.h
> +++ b/lib/librte_vhost/rte_vdpa.h
> @@ -202,22 +202,26 @@ struct rte_vdpa_device *
>  int
>  rte_vdpa_get_device_num(void);
>  
> +#define VHOST_QUEUE_ALL VHOST_MAX_VRING
> +
>  /**
>   * @warning
>   * @b EXPERIMENTAL: this API may change without prior notice
>   *
> - * Enable/Disable host notifier mapping for a vdpa port.
> + * Enable/Disable host notifier mapping for a vdpa queue.
>   *
>   * @param vid
>   *  vhost device id
>   * @param enable
>   *  true for host notifier map, false for host notifier unmap
> + * @param qid
> + *  vhost queue id, VHOST_QUEUE_ALL to configure all the device queues
I would prefer two APIs over passing a special ID that means all queues:

rte_vhost_host_notifier_ctrl(int vid, uint16_t qid, bool enable);
rte_vhost_host_notifier_ctrl_all(int vid, bool enable);

I think it is clearer for the user of the API.
Or if you think an extra API is overkill, just let the driver loop on
all the queues.
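
Roughly (declarations only, just to sketch the idea):

__rte_experimental
int rte_vhost_host_notifier_ctrl(int vid, uint16_t qid, bool enable);

__rte_experimental
int rte_vhost_host_notifier_ctrl_all(int vid, bool enable);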

>   * @return
>   *  0 on success, -1 on failure
>   */
>  __rte_experimental
>  int
> -rte_vhost_host_notifier_ctrl(int vid, bool enable);
> +rte_vhost_host_notifier_ctrl(int vid, uint16_t qid, bool enable);
>  
>  /**



* Re: [dpdk-dev] [PATCH v1 2/4] vhost: skip access lock when vDPA is configured
  2020-06-18 16:28 ` [dpdk-dev] [PATCH v1 2/4] vhost: skip access lock when vDPA is configured Matan Azrad
@ 2020-06-19  6:49   ` Maxime Coquelin
  0 siblings, 0 replies; 59+ messages in thread
From: Maxime Coquelin @ 2020-06-19  6:49 UTC (permalink / raw)
  To: Matan Azrad, Xiao Wang; +Cc: dev



On 6/18/20 6:28 PM, Matan Azrad wrote:
> No need to take access lock in the vhost-user massage handler when

s/massage/message/

> vDPA driver controls all the data-path of the vhost device.
> 
> It allows the vDPA set_vring_state operation callback to configure
> guest notifications.
> 
> Signed-off-by: Matan Azrad <matan@mellanox.com>
> ---
>  lib/librte_vhost/vhost_user.c | 6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/lib/librte_vhost/vhost_user.c b/lib/librte_vhost/vhost_user.c
> index cddfa4b..b0849b9 100644
> --- a/lib/librte_vhost/vhost_user.c
> +++ b/lib/librte_vhost/vhost_user.c
> @@ -2699,8 +2699,10 @@ typedef int (*vhost_message_handler_t)(struct virtio_net **pdev,
>  	case VHOST_USER_SEND_RARP:
>  	case VHOST_USER_NET_SET_MTU:
>  	case VHOST_USER_SET_SLAVE_REQ_FD:
> -		vhost_user_lock_all_queue_pairs(dev);
> -		unlock_required = 1;
> +		if (!(dev->flags & VIRTIO_DEV_VDPA_CONFIGURED)) {
> +			vhost_user_lock_all_queue_pairs(dev);
> +			unlock_required = 1;
> +		}
>  		break;
>  	default:
>  		break;
> 

Makes sense:

Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>

Thanks,
Maxime



* Re: [dpdk-dev] [PATCH v1 3/4] vhost: improve device ready definition
  2020-06-18 16:28 ` [dpdk-dev] [PATCH v1 3/4] vhost: improve device ready definition Matan Azrad
@ 2020-06-19  7:41   ` Maxime Coquelin
  2020-06-19 12:04     ` Maxime Coquelin
  2020-06-19 13:11     ` Matan Azrad
  0 siblings, 2 replies; 59+ messages in thread
From: Maxime Coquelin @ 2020-06-19  7:41 UTC (permalink / raw)
  To: Matan Azrad, Xiao Wang; +Cc: dev



On 6/18/20 6:28 PM, Matan Azrad wrote:
> Some guest drivers may not configure disabled virtio queues.
> 
> In this case, the vhost management never triggers the vDPA device
> configuration because it waits to the device to be ready.

This is not vDPA-only; even with the SW datapath, the application's
new_device callback never gets called.

> The current ready state means that all the virtio queues should be
> configured regardless of the enablement status.
> 
> In order to support this case, this patch changes the ready state:
> The device is ready when at least 1 queue pair is configured and
> enabled.
> 
> So, now, the vDPA driver will be configured when the first queue pair is
> configured and enabled.
> 
> Also, the queue state operation is changed to follow these rules:
> 	1. queue becomes ready (enabled and fully configured) -
> 		set_vring_state(enabled).
> 	2. queue becomes not ready - set_vring_state(disabled).
> 	3. queue stays ready and a VHOST_USER_SET_VRING_ENABLE message was
> 		handled - set_vring_state(enabled).
> 
> The parallel operations for the application are adjusted too.
> 
> Signed-off-by: Matan Azrad <matan@mellanox.com>
> ---
>  lib/librte_vhost/vhost_user.c | 51 ++++++++++++++++++++++++++++---------------
>  1 file changed, 33 insertions(+), 18 deletions(-)
> 
> diff --git a/lib/librte_vhost/vhost_user.c b/lib/librte_vhost/vhost_user.c
> index b0849b9..cfd5f27 100644
> --- a/lib/librte_vhost/vhost_user.c
> +++ b/lib/librte_vhost/vhost_user.c
> @@ -1295,7 +1295,7 @@
>  {
>  	bool rings_ok;
>  
> -	if (!vq)
> +	if (!vq || !vq->enabled)
>  		return false;
>  
>  	if (vq_is_packed(dev))
> @@ -1309,24 +1309,27 @@
>  	       vq->callfd != VIRTIO_UNINITIALIZED_EVENTFD;
>  }
>  
> +#define VIRTIO_DEV_NUM_VQS_TO_BE_READY 2u
> +
>  static int
>  virtio_is_ready(struct virtio_net *dev)
>  {
>  	struct vhost_virtqueue *vq;
>  	uint32_t i;
>  
> -	if (dev->nr_vring == 0)
> +	if (dev->nr_vring < VIRTIO_DEV_NUM_VQS_TO_BE_READY)
>  		return 0;
>  
> -	for (i = 0; i < dev->nr_vring; i++) {
> +	for (i = 0; i < VIRTIO_DEV_NUM_VQS_TO_BE_READY; i++) {
>  		vq = dev->virtqueue[i];
>  
>  		if (!vq_is_ready(dev, vq))
>  			return 0;
>  	}
>  
> -	VHOST_LOG_CONFIG(INFO,
> -		"virtio is now ready for processing.\n");
> +	if (!(dev->flags & VIRTIO_DEV_READY))
> +		VHOST_LOG_CONFIG(INFO,
> +			"virtio is now ready for processing.\n");
>  	return 1;
>  }
>  
> @@ -1970,8 +1973,6 @@ static int vhost_user_set_vring_err(struct virtio_net **pdev __rte_unused,
>  	struct virtio_net *dev = *pdev;
>  	int enable = (int)msg->payload.state.num;
>  	int index = (int)msg->payload.state.index;
> -	struct rte_vdpa_device *vdpa_dev;
> -	int did = -1;
>  
>  	if (validate_msg_fds(msg, 0) != 0)
>  		return RTE_VHOST_MSG_RESULT_ERR;
> @@ -1980,15 +1981,6 @@ static int vhost_user_set_vring_err(struct virtio_net **pdev __rte_unused,
>  		"set queue enable: %d to qp idx: %d\n",
>  		enable, index);
>  
> -	did = dev->vdpa_dev_id;
> -	vdpa_dev = rte_vdpa_get_device(did);
> -	if (vdpa_dev && vdpa_dev->ops->set_vring_state)
> -		vdpa_dev->ops->set_vring_state(dev->vid, index, enable);
> -
> -	if (dev->notify_ops->vring_state_changed)
> -		dev->notify_ops->vring_state_changed(dev->vid,
> -				index, enable);
> -
>  	/* On disable, rings have to be stopped being processed. */
>  	if (!enable && dev->dequeue_zero_copy)
>  		drain_zmbuf_list(dev->virtqueue[index]);
> @@ -2622,11 +2614,13 @@ typedef int (*vhost_message_handler_t)(struct virtio_net **pdev,
>  	struct virtio_net *dev;
>  	struct VhostUserMsg msg;
>  	struct rte_vdpa_device *vdpa_dev;
> +	bool ready[VHOST_MAX_VRING];
>  	int did = -1;
>  	int ret;
>  	int unlock_required = 0;
>  	bool handled;
>  	int request;
> +	uint32_t i;
>  
>  	dev = get_device(vid);
>  	if (dev == NULL)
> @@ -2668,6 +2662,10 @@ typedef int (*vhost_message_handler_t)(struct virtio_net **pdev,
>  		VHOST_LOG_CONFIG(DEBUG, "External request %d\n", request);
>  	}
>  
> +	/* Save ready status for all the VQs before message handle. */
> +	for (i = 0; i < VHOST_MAX_VRING; i++)
> +		ready[i] = vq_is_ready(dev, dev->virtqueue[i]);
> +

This big array can be avoided if you save the ready status in the
virtqueue once the message has been handled.

>  	ret = vhost_user_check_and_alloc_queue_pair(dev, &msg);
>  	if (ret < 0) {
>  		VHOST_LOG_CONFIG(ERR,
> @@ -2802,6 +2800,25 @@ typedef int (*vhost_message_handler_t)(struct virtio_net **pdev,
>  		return -1;
>  	}
>  
> +	did = dev->vdpa_dev_id;
> +	vdpa_dev = rte_vdpa_get_device(did);
> +	/* Update ready status. */
> +	for (i = 0; i < VHOST_MAX_VRING; i++) {
> +		bool cur_ready = vq_is_ready(dev, dev->virtqueue[i]);
> +
> +		if ((cur_ready && request == VHOST_USER_SET_VRING_ENABLE &&
> +				i == msg.payload.state.index) ||

Couldn't we remove the above condition? Aren't the callbacks already called
in the set_vring_enable handler?

> +				cur_ready != ready[i]) {
> +			if (vdpa_dev && vdpa_dev->ops->set_vring_state)
> +				vdpa_dev->ops->set_vring_state(dev->vid, i,
> +								(int)cur_ready);
> +
> +			if (dev->notify_ops->vring_state_changed)
> +				dev->notify_ops->vring_state_changed(dev->vid,
> +							i, (int)cur_ready);
> +		}
> +	}

I think we should move this into a dedicated function, which we would
call in every message handler that can modify the ready state.

Doing so, we would not have to assume the master sent us a disable
request for the queue before, and we would also have proper
synchronization if the request uses the reply-ack feature, as it could
assume the backend is no longer processing the ring once the reply-ack
is received.

>  	if (!(dev->flags & VIRTIO_DEV_RUNNING) && virtio_is_ready(dev)) {
>  		dev->flags |= VIRTIO_DEV_READY;
>  
> @@ -2816,8 +2833,6 @@ typedef int (*vhost_message_handler_t)(struct virtio_net **pdev,
>  		}
>  	}
>  
> -	did = dev->vdpa_dev_id;
> -	vdpa_dev = rte_vdpa_get_device(did);
>  	if (vdpa_dev && virtio_is_ready(dev) &&
>  			!(dev->flags & VIRTIO_DEV_VDPA_CONFIGURED) &&
>  			msg.request.master == VHOST_USER_SET_VRING_CALL) {

Shouldn't the check on SET_VRING_CALL above be removed?



* Re: [dpdk-dev] [PATCH v1 3/4] vhost: improve device ready definition
  2020-06-19  7:41   ` Maxime Coquelin
@ 2020-06-19 12:04     ` Maxime Coquelin
  2020-06-19 13:11     ` Matan Azrad
  1 sibling, 0 replies; 59+ messages in thread
From: Maxime Coquelin @ 2020-06-19 12:04 UTC (permalink / raw)
  To: Matan Azrad, Xiao Wang; +Cc: dev, Adrian Moreno

Hi Matan,

On 6/19/20 9:41 AM, Maxime Coquelin wrote:
> 
> 
> On 6/18/20 6:28 PM, Matan Azrad wrote:
>> Some guest drivers may not configure disabled virtio queues.
>>
>> In this case, the vhost management never triggers the vDPA device
>> configuration because it waits for the device to be ready.
> 
> This is not vDPA-only; even with the SW datapath, the application's
> new_device callback never gets called.
> 
>> The current ready state means that all the virtio queues should be
>> configured regardless of the enablement status.
>>
>> In order to support this case, this patch changes the ready state:
>> The device is ready when at least 1 queue pair is configured and
>> enabled.
>>
>> So, now, the vDPA driver will be configured when the first queue pair is
>> configured and enabled.
>>
>> Also, the queue state operation is changed to follow these rules:
>> 	1. queue becomes ready (enabled and fully configured) -
>> 		set_vring_state(enabled).
>> 	2. queue becomes not ready - set_vring_state(disabled).
>> 	3. queue stays ready and a VHOST_USER_SET_VRING_ENABLE message was
>> 		handled - set_vring_state(enabled).
>>
>> The parallel operations for the application are adjusted too.
>>
>> Signed-off-by: Matan Azrad <matan@mellanox.com>
>> ---
>>  lib/librte_vhost/vhost_user.c | 51 ++++++++++++++++++++++++++++---------------
>>  1 file changed, 33 insertions(+), 18 deletions(-)
>>
>> diff --git a/lib/librte_vhost/vhost_user.c b/lib/librte_vhost/vhost_user.c
>> index b0849b9..cfd5f27 100644
>> --- a/lib/librte_vhost/vhost_user.c
>> +++ b/lib/librte_vhost/vhost_user.c
>> @@ -1295,7 +1295,7 @@
>>  {
>>  	bool rings_ok;
>>  
>> -	if (!vq)
>> +	if (!vq || !vq->enabled)
>>  		return false;
>>  
>>  	if (vq_is_packed(dev))
>> @@ -1309,24 +1309,27 @@
>>  	       vq->callfd != VIRTIO_UNINITIALIZED_EVENTFD;
>>  }
>>  
>> +#define VIRTIO_DEV_NUM_VQS_TO_BE_READY 2u
>> +
>>  static int
>>  virtio_is_ready(struct virtio_net *dev)
>>  {
>>  	struct vhost_virtqueue *vq;
>>  	uint32_t i;
>>  
>> -	if (dev->nr_vring == 0)
>> +	if (dev->nr_vring < VIRTIO_DEV_NUM_VQS_TO_BE_READY)
>>  		return 0;
>>  
>> -	for (i = 0; i < dev->nr_vring; i++) {
>> +	for (i = 0; i < VIRTIO_DEV_NUM_VQS_TO_BE_READY; i++) {
>>  		vq = dev->virtqueue[i];
>>  
>>  		if (!vq_is_ready(dev, vq))
>>  			return 0;
>>  	}
>>  
>> -	VHOST_LOG_CONFIG(INFO,
>> -		"virtio is now ready for processing.\n");
>> +	if (!(dev->flags & VIRTIO_DEV_READY))
>> +		VHOST_LOG_CONFIG(INFO,
>> +			"virtio is now ready for processing.\n");
>>  	return 1;
>>  }
>>  
>> @@ -1970,8 +1973,6 @@ static int vhost_user_set_vring_err(struct virtio_net **pdev __rte_unused,
>>  	struct virtio_net *dev = *pdev;
>>  	int enable = (int)msg->payload.state.num;
>>  	int index = (int)msg->payload.state.index;
>> -	struct rte_vdpa_device *vdpa_dev;
>> -	int did = -1;
>>  
>>  	if (validate_msg_fds(msg, 0) != 0)
>>  		return RTE_VHOST_MSG_RESULT_ERR;
>> @@ -1980,15 +1981,6 @@ static int vhost_user_set_vring_err(struct virtio_net **pdev __rte_unused,
>>  		"set queue enable: %d to qp idx: %d\n",
>>  		enable, index);
>>  
>> -	did = dev->vdpa_dev_id;
>> -	vdpa_dev = rte_vdpa_get_device(did);
>> -	if (vdpa_dev && vdpa_dev->ops->set_vring_state)
>> -		vdpa_dev->ops->set_vring_state(dev->vid, index, enable);
>> -
>> -	if (dev->notify_ops->vring_state_changed)
>> -		dev->notify_ops->vring_state_changed(dev->vid,
>> -				index, enable);
>> -
>>  	/* On disable, rings have to be stopped being processed. */
>>  	if (!enable && dev->dequeue_zero_copy)
>>  		drain_zmbuf_list(dev->virtqueue[index]);
>> @@ -2622,11 +2614,13 @@ typedef int (*vhost_message_handler_t)(struct virtio_net **pdev,
>>  	struct virtio_net *dev;
>>  	struct VhostUserMsg msg;
>>  	struct rte_vdpa_device *vdpa_dev;
>> +	bool ready[VHOST_MAX_VRING];
>>  	int did = -1;
>>  	int ret;
>>  	int unlock_required = 0;
>>  	bool handled;
>>  	int request;
>> +	uint32_t i;
>>  
>>  	dev = get_device(vid);
>>  	if (dev == NULL)
>> @@ -2668,6 +2662,10 @@ typedef int (*vhost_message_handler_t)(struct virtio_net **pdev,
>>  		VHOST_LOG_CONFIG(DEBUG, "External request %d\n", request);
>>  	}
>>  
>> +	/* Save ready status for all the VQs before message handle. */
>> +	for (i = 0; i < VHOST_MAX_VRING; i++)
>> +		ready[i] = vq_is_ready(dev, dev->virtqueue[i]);
>> +
> 
> This big array can be avoided if you save the ready status in the
> virtqueue once the message has been handled.
> 
>>  	ret = vhost_user_check_and_alloc_queue_pair(dev, &msg);
>>  	if (ret < 0) {
>>  		VHOST_LOG_CONFIG(ERR,
>> @@ -2802,6 +2800,25 @@ typedef int (*vhost_message_handler_t)(struct virtio_net **pdev,
>>  		return -1;
>>  	}
>>  
>> +	did = dev->vdpa_dev_id;
>> +	vdpa_dev = rte_vdpa_get_device(did);
>> +	/* Update ready status. */
>> +	for (i = 0; i < VHOST_MAX_VRING; i++) {
>> +		bool cur_ready = vq_is_ready(dev, dev->virtqueue[i]);
>> +
>> +		if ((cur_ready && request == VHOST_USER_SET_VRING_ENABLE &&
>> +				i == msg.payload.state.index) ||
> 
> Couldn't we remove the above condition? Aren't the callbacks already called
> in the set_vring_enable handler?
> 
>> +				cur_ready != ready[i]) {
>> +			if (vdpa_dev && vdpa_dev->ops->set_vring_state)
>> +				vdpa_dev->ops->set_vring_state(dev->vid, i,
>> +								(int)cur_ready);
>> +
>> +			if (dev->notify_ops->vring_state_changed)
>> +				dev->notify_ops->vring_state_changed(dev->vid,
>> +							i, (int)cur_ready);
>> +		}
>> +	}
> 
> I think we should move this into a dedicated function, which we would
> call in every message handler that can modify the ready state.
> 
> Doing so, we would not have to assume the master sent us a disable
> request for the queue before, and we would also have proper
> synchronization if the request uses the reply-ack feature, as it could
> assume the backend is no longer processing the ring once the reply-ack
> is received.
> 
>>  	if (!(dev->flags & VIRTIO_DEV_RUNNING) && virtio_is_ready(dev)) {
>>  		dev->flags |= VIRTIO_DEV_READY;
>>  
>> @@ -2816,8 +2833,6 @@ typedef int (*vhost_message_handler_t)(struct virtio_net **pdev,
>>  		}
>>  	}
>>  
>> -	did = dev->vdpa_dev_id;
>> -	vdpa_dev = rte_vdpa_get_device(did);
>>  	if (vdpa_dev && virtio_is_ready(dev) &&
>>  			!(dev->flags & VIRTIO_DEV_VDPA_CONFIGURED) &&
>>  			msg.request.master == VHOST_USER_SET_VRING_CALL) {
> 
> Shouldn't the check on SET_VRING_CALL above be removed?
> 

Thinking about it again, I think the ready state should include whether
or not the queue is enabled. And as soon as a request impacting ring
addresses or call or kick FDs is handled, we should reset the modified
value and notify of the state change for the impacted queue. Then the
request is handled, and once it is, we can send state change updates if
any queue changed.

Doing that, we don't have to assume the Vhost-user master will have sent
the disable request before doing the state change. And if it did, the
'not ready' update won't be sent twice to the driver or application.

In case I am not clear enough, I have prototyped this idea (only
compile-tested). If it works for you, feel free to add it in your
series.

Thanks,
Maxime


diff --git a/lib/librte_vhost/vhost.h b/lib/librte_vhost/vhost.h
index df98d15de6..48e8fcfbc0 100644
--- a/lib/librte_vhost/vhost.h
+++ b/lib/librte_vhost/vhost.h
@@ -150,6 +150,7 @@ struct vhost_virtqueue {
        /* Backend value to determine if device should started/stopped */
        int                     backend;
        int                     enabled;
+       bool                    ready;
        int                     access_ok;
        rte_spinlock_t          access_lock;

diff --git a/lib/librte_vhost/vhost_user.c b/lib/librte_vhost/vhost_user.c
index ea9cd107b9..f3cda536c6 100644
--- a/lib/librte_vhost/vhost_user.c
+++ b/lib/librte_vhost/vhost_user.c
@@ -228,6 +228,87 @@ vhost_backend_cleanup(struct virtio_net *dev)
        dev->postcopy_listening = 0;
 }

+
+static bool
+vq_is_ready(struct virtio_net *dev, struct vhost_virtqueue *vq)
+{
+       bool rings_ok;
+
+       if (!vq)
+               return false;
+
+       if (vq_is_packed(dev))
+               rings_ok = vq->desc_packed && vq->driver_event &&
+                       vq->device_event;
+       else
+               rings_ok = vq->desc && vq->avail && vq->used;
+
+       return rings_ok &&
+              vq->kickfd != VIRTIO_UNINITIALIZED_EVENTFD &&
+              vq->callfd != VIRTIO_UNINITIALIZED_EVENTFD &&
+              vq->enabled;
+}
+
+static int
+virtio_is_ready(struct virtio_net *dev)
+{
+       struct vhost_virtqueue *vq;
+       uint32_t i;
+
+       if (dev->nr_vring == 0)
+               return 0;
+
+       for (i = 0; i < 2; i++) {
+               vq = dev->virtqueue[i];
+
+               if (!vq_is_ready(dev, vq))
+                       return 0;
+       }
+
+       VHOST_LOG_CONFIG(INFO,
+               "virtio is now ready for processing.\n");
+       return 1;
+}
+
+static void
+vhost_user_update_vring_state(struct virtio_net *dev, int idx)
+{
+       struct vhost_virtqueue *vq = dev->virtqueue[idx];
+       struct rte_vdpa_device *vdpa_dev;
+       int did;
+       bool was_ready = vq->ready;
+
+       vq->ready = vq_is_ready(dev, vq);
+       if (was_ready == vq->ready)
+               return;
+
+       if (dev->notify_ops->vring_state_changed)
+               dev->notify_ops->vring_state_changed(dev->vid, idx, vq->ready);
+
+       did = dev->vdpa_dev_id;
+       vdpa_dev = rte_vdpa_get_device(did);
+       if (vdpa_dev && vdpa_dev->ops->set_vring_state)
+               vdpa_dev->ops->set_vring_state(dev->vid, idx, vq->ready);
+}
+
+static void
+vhost_user_update_vring_state_all(struct virtio_net *dev)
+{
+       uint32_t i;
+
+       for (i = 0; i < dev->nr_vring; i++)
+               vhost_user_update_vring_state(dev, i);
+}
+
+static void
+vhost_user_invalidate_vring(struct virtio_net *dev, int index)
+{
+       struct vhost_virtqueue *vq = dev->virtqueue[index];
+
+       vring_invalidate(dev, vq);
+       vhost_user_update_vring_state(dev, index);
+}
+
 /*
  * This function just returns success at the moment unless
  * the device hasn't been initialised.
@@ -841,7 +922,7 @@ vhost_user_set_vring_addr(struct virtio_net **pdev, struct VhostUserMsg *msg,
         */
        memcpy(&vq->ring_addrs, addr, sizeof(*addr));

-       vring_invalidate(dev, vq);
+       vhost_user_invalidate_vring(dev, msg->payload.addr.index);

        if ((vq->enabled && (dev->features &
                                (1ULL << VHOST_USER_F_PROTOCOL_FEATURES))) ||
@@ -1267,7 +1348,7 @@ vhost_user_set_mem_table(struct virtio_net **pdev, struct VhostUserMsg *msg,
                         * need to be translated again as virtual addresses have
                         * changed.
                         */
-                       vring_invalidate(dev, vq);
+                       vhost_user_invalidate_vring(dev, i);

                        dev = translate_ring_addresses(dev, i);
                        if (!dev) {
@@ -1290,46 +1371,6 @@ vhost_user_set_mem_table(struct virtio_net **pdev, struct VhostUserMsg *msg,
        return RTE_VHOST_MSG_RESULT_ERR;
 }

-static bool
-vq_is_ready(struct virtio_net *dev, struct vhost_virtqueue *vq)
-{
-       bool rings_ok;
-
-       if (!vq)
-               return false;
-
-       if (vq_is_packed(dev))
-               rings_ok = vq->desc_packed && vq->driver_event &&
-                       vq->device_event;
-       else
-               rings_ok = vq->desc && vq->avail && vq->used;
-
-       return rings_ok &&
-              vq->kickfd != VIRTIO_UNINITIALIZED_EVENTFD &&
-              vq->callfd != VIRTIO_UNINITIALIZED_EVENTFD;
-}
-
-static int
-virtio_is_ready(struct virtio_net *dev)
-{
-       struct vhost_virtqueue *vq;
-       uint32_t i;
-
-       if (dev->nr_vring == 0)
-               return 0;
-
-       for (i = 0; i < dev->nr_vring; i++) {
-               vq = dev->virtqueue[i];
-
-               if (!vq_is_ready(dev, vq))
-                       return 0;
-       }
-
-       VHOST_LOG_CONFIG(INFO,
-               "virtio is now ready for processing.\n");
-       return 1;
-}
-
 static void *
 inflight_mem_alloc(const char *name, size_t size, int *fd)
 {
@@ -1599,6 +1640,10 @@ vhost_user_set_vring_call(struct virtio_net **pdev, struct VhostUserMsg *msg,
        if (vq->callfd >= 0)
                close(vq->callfd);

+       vq->callfd = VIRTIO_UNINITIALIZED_EVENTFD;
+
+       vhost_user_update_vring_state(dev, file.index);
+
        vq->callfd = file.fd;

        return RTE_VHOST_MSG_RESULT_OK;
@@ -1847,15 +1892,16 @@ vhost_user_set_vring_kick(struct virtio_net **pdev, struct VhostUserMsg *msg,
         * the ring starts already enabled. Otherwise, it is enabled via
         * the SET_VRING_ENABLE message.
         */
-       if (!(dev->features & (1ULL << VHOST_USER_F_PROTOCOL_FEATURES))) {
+       if (!(dev->features & (1ULL << VHOST_USER_F_PROTOCOL_FEATURES)))
                vq->enabled = 1;
-               if (dev->notify_ops->vring_state_changed)
-                       dev->notify_ops->vring_state_changed(
-                               dev->vid, file.index, 1);
-       }

        if (vq->kickfd >= 0)
                close(vq->kickfd);
+
+       vq->kickfd = VIRTIO_UNINITIALIZED_EVENTFD;
+
+       vhost_user_update_vring_state(dev, file.index);
+
        vq->kickfd = file.fd;

        if (vq_is_packed(dev)) {
@@ -1953,6 +1999,10 @@ vhost_user_get_vring_base(struct virtio_net **pdev,
        msg->size = sizeof(msg->payload.state);
        msg->fd_num = 0;

+       /*
+        * No need to call vhost_user_invalidate_vring here,
+        * device is destroyed.
+        */
        vring_invalidate(dev, vq);

        return RTE_VHOST_MSG_RESULT_REPLY;
@@ -1970,8 +2020,7 @@ vhost_user_set_vring_enable(struct virtio_net **pdev,
        struct virtio_net *dev = *pdev;
        int enable = (int)msg->payload.state.num;
        int index = (int)msg->payload.state.index;
-       struct rte_vdpa_device *vdpa_dev;
-       int did = -1;
+       struct vhost_virtqueue *vq = dev->virtqueue[index];

        if (validate_msg_fds(msg, 0) != 0)
                return RTE_VHOST_MSG_RESULT_ERR;
@@ -1980,20 +2029,13 @@ vhost_user_set_vring_enable(struct virtio_net **pdev,
                "set queue enable: %d to qp idx: %d\n",
                enable, index);

-       did = dev->vdpa_dev_id;
-       vdpa_dev = rte_vdpa_get_device(did);
-       if (vdpa_dev && vdpa_dev->ops->set_vring_state)
-               vdpa_dev->ops->set_vring_state(dev->vid, index, enable);
-
-       if (dev->notify_ops->vring_state_changed)
-               dev->notify_ops->vring_state_changed(dev->vid,
-                               index, enable);
-
        /* On disable, rings have to be stopped being processed. */
        if (!enable && dev->dequeue_zero_copy)
-               drain_zmbuf_list(dev->virtqueue[index]);
+               drain_zmbuf_list(vq);
+
+       vq->enabled = enable;

-       dev->virtqueue[index]->enabled = enable;
+       vhost_user_update_vring_state(dev, index);

        return RTE_VHOST_MSG_RESULT_OK;
 }
@@ -2332,7 +2374,7 @@ vhost_user_iotlb_msg(struct virtio_net **pdev, struct VhostUserMsg *msg,
                                        imsg->size);

                        if (is_vring_iotlb(dev, vq, imsg))
-                               vring_invalidate(dev, vq);
+                               vhost_user_invalidate_vring(dev, i);
                }
                break;
        default:
@@ -2791,6 +2833,8 @@ vhost_user_msg_handler(int vid, int fd)
                return -1;
        }

+       vhost_user_update_vring_state_all(dev);
+
        if (!(dev->flags & VIRTIO_DEV_RUNNING) && virtio_is_ready(dev)) {
                dev->flags |= VIRTIO_DEV_READY;

@@ -2808,8 +2852,7 @@ vhost_user_msg_handler(int vid, int fd)
        did = dev->vdpa_dev_id;
        vdpa_dev = rte_vdpa_get_device(did);
        if (vdpa_dev && virtio_is_ready(dev) &&
-                       !(dev->flags & VIRTIO_DEV_VDPA_CONFIGURED) &&
-                       msg.request.master == VHOST_USER_SET_VRING_CALL) {
+                       !(dev->flags & VIRTIO_DEV_VDPA_CONFIGURED)) {
                if (vdpa_dev->ops->dev_conf)
                        vdpa_dev->ops->dev_conf(dev->vid);
                dev->flags |= VIRTIO_DEV_VDPA_CONFIGURED;



* Re: [dpdk-dev] [PATCH v1 3/4] vhost: improve device ready definition
  2020-06-19  7:41   ` Maxime Coquelin
  2020-06-19 12:04     ` Maxime Coquelin
@ 2020-06-19 13:11     ` Matan Azrad
  2020-06-19 13:54       ` Maxime Coquelin
  1 sibling, 1 reply; 59+ messages in thread
From: Matan Azrad @ 2020-06-19 13:11 UTC (permalink / raw)
  To: Maxime Coquelin, Xiao Wang; +Cc: dev

Hi Maxime

Thanks for the fast review.
This is the first version; let's review it carefully to be sure it is correct.
@Xiao Wang, it would be good to hear your opinion too.
We also need to understand the effect on the IFC driver/device...
Just to update: I checked this code with the mlx5 adjustments I sent in this series.
It works well with the vDPA example application.

From: Maxime Coquelin:
> On 6/18/20 6:28 PM, Matan Azrad wrote:
> > Some guest drivers may not configure disabled virtio queues.
> >
> > In this case, the vhost management never triggers the vDPA device
> > configuration because it waits for the device to be ready.
> 
> This is not vDPA-only; even with the SW datapath, the application's new_device
> callback never gets called.
> 
Yes, I wrote it below; I can be more specific here too in the next version.

> > The current ready state means that all the virtio queues should be
> > configured regardless of the enablement status.
> >
> > In order to support this case, this patch changes the ready state:
> > The device is ready when at least 1 queue pair is configured and
> > enabled.
> >
> > So, now, the vDPA driver will be configured when the first queue pair
> > is configured and enabled.
> >
> > Also, the queue state operation is changed to follow these rules:
> > 	1. queue becomes ready (enabled and fully configured) -
> > 		set_vring_state(enabled).
> > 	2. queue becomes not ready - set_vring_state(disabled).
> > 	3. queue stays ready and a VHOST_USER_SET_VRING_ENABLE message was
> > 		handled - set_vring_state(enabled).
> >
> > The parallel operations for the application are adjusted too.
> >
> > Signed-off-by: Matan Azrad <matan@mellanox.com>
> > ---
> >  lib/librte_vhost/vhost_user.c | 51
> > ++++++++++++++++++++++++++++---------------
> >  1 file changed, 33 insertions(+), 18 deletions(-)
> >
> > diff --git a/lib/librte_vhost/vhost_user.c
> > b/lib/librte_vhost/vhost_user.c index b0849b9..cfd5f27 100644
> > --- a/lib/librte_vhost/vhost_user.c
> > +++ b/lib/librte_vhost/vhost_user.c
> > @@ -1295,7 +1295,7 @@
> >  {
> >  	bool rings_ok;
> >
> > -	if (!vq)
> > +	if (!vq || !vq->enabled)
> >  		return false;
> >
> >  	if (vq_is_packed(dev))
> > @@ -1309,24 +1309,27 @@
> >  	       vq->callfd != VIRTIO_UNINITIALIZED_EVENTFD;  }
> >
> > +#define VIRTIO_DEV_NUM_VQS_TO_BE_READY 2u
> > +
> >  static int
> >  virtio_is_ready(struct virtio_net *dev)  {
> >  	struct vhost_virtqueue *vq;
> >  	uint32_t i;
> >
> > -	if (dev->nr_vring == 0)
> > +	if (dev->nr_vring < VIRTIO_DEV_NUM_VQS_TO_BE_READY)
> >  		return 0;
> >
> > -	for (i = 0; i < dev->nr_vring; i++) {
> > +	for (i = 0; i < VIRTIO_DEV_NUM_VQS_TO_BE_READY; i++) {
> >  		vq = dev->virtqueue[i];
> >
> >  		if (!vq_is_ready(dev, vq))
> >  			return 0;
> >  	}
> >
> > -	VHOST_LOG_CONFIG(INFO,
> > -		"virtio is now ready for processing.\n");
> > +	if (!(dev->flags & VIRTIO_DEV_READY))
> > +		VHOST_LOG_CONFIG(INFO,
> > +			"virtio is now ready for processing.\n");
> >  	return 1;
> >  }
> >
> > @@ -1970,8 +1973,6 @@ static int vhost_user_set_vring_err(struct
> virtio_net **pdev __rte_unused,
> >  	struct virtio_net *dev = *pdev;
> >  	int enable = (int)msg->payload.state.num;
> >  	int index = (int)msg->payload.state.index;
> > -	struct rte_vdpa_device *vdpa_dev;
> > -	int did = -1;
> >
> >  	if (validate_msg_fds(msg, 0) != 0)
> >  		return RTE_VHOST_MSG_RESULT_ERR;
> > @@ -1980,15 +1981,6 @@ static int vhost_user_set_vring_err(struct
> virtio_net **pdev __rte_unused,
> >  		"set queue enable: %d to qp idx: %d\n",
> >  		enable, index);
> >
> > -	did = dev->vdpa_dev_id;
> > -	vdpa_dev = rte_vdpa_get_device(did);
> > -	if (vdpa_dev && vdpa_dev->ops->set_vring_state)
> > -		vdpa_dev->ops->set_vring_state(dev->vid, index, enable);
> > -
> > -	if (dev->notify_ops->vring_state_changed)
> > -		dev->notify_ops->vring_state_changed(dev->vid,
> > -				index, enable);
> > -
> >  	/* On disable, rings have to be stopped being processed. */
> >  	if (!enable && dev->dequeue_zero_copy)
> >  		drain_zmbuf_list(dev->virtqueue[index]);
> > @@ -2622,11 +2614,13 @@ typedef int
> (*vhost_message_handler_t)(struct virtio_net **pdev,
> >  	struct virtio_net *dev;
> >  	struct VhostUserMsg msg;
> >  	struct rte_vdpa_device *vdpa_dev;
> > +	bool ready[VHOST_MAX_VRING];
> >  	int did = -1;
> >  	int ret;
> >  	int unlock_required = 0;
> >  	bool handled;
> >  	int request;
> > +	uint32_t i;
> >
> >  	dev = get_device(vid);
> >  	if (dev == NULL)
> > @@ -2668,6 +2662,10 @@ typedef int (*vhost_message_handler_t)(struct
> virtio_net **pdev,
> >  		VHOST_LOG_CONFIG(DEBUG, "External request %d\n",
> request);
> >  	}
> >
> > +	/* Save ready status for all the VQs before message handle. */
> > +	for (i = 0; i < VHOST_MAX_VRING; i++)
> > +		ready[i] = vq_is_ready(dev, dev->virtqueue[i]);
> > +
> 
> This big array can be avoided if you save the ready status in the virtqueue
> once the message has been handled.

You mean you prefer to save it in the virtqueue structure? Doesn't it take the same memory?
In any case I don't think 0x100 is so big 😊
 
> >  	ret = vhost_user_check_and_alloc_queue_pair(dev, &msg);
> >  	if (ret < 0) {
> >  		VHOST_LOG_CONFIG(ERR,
> > @@ -2802,6 +2800,25 @@ typedef int (*vhost_message_handler_t)(struct
> virtio_net **pdev,
> >  		return -1;
> >  	}
> >
> > +	did = dev->vdpa_dev_id;
> > +	vdpa_dev = rte_vdpa_get_device(did);
> > +	/* Update ready status. */
> > +	for (i = 0; i < VHOST_MAX_VRING; i++) {
> > +		bool cur_ready = vq_is_ready(dev, dev->virtqueue[i]);
> > +
> > +		if ((cur_ready && request ==
> VHOST_USER_SET_VRING_ENABLE &&
> > +				i == msg.payload.state.index) ||
> 
> Couldn't we remove above condition? Aren't the callbacks already called in
> the set_vring_enable handler?

As we agreed in the design discussion:

" 3. Same handling of the requests, except that we won't notify the 
 vdpa driver and the application of vring state changes in the 
 VHOST_USER_SET_VRING_ENABLE handler."  

So, I removed it from the set_vring_enable handler.

Now, the ready state doesn't depend only on the VHOST_USER_SET_VRING_ENABLE message.
 
> > +				cur_ready != ready[i]) {
> > +			if (vdpa_dev && vdpa_dev->ops->set_vring_state)
> > +				vdpa_dev->ops->set_vring_state(dev->vid, i,
> > +
> 	(int)cur_ready);
> > +
> > +			if (dev->notify_ops->vring_state_changed)
> > +				dev->notify_ops->vring_state_changed(dev-
> >vid,
> > +							i, (int)cur_ready);
> > +		}
> > +	}
> 
> I think we should move this into a dedicated function, which we would call in
> every message handler that can modify the ready state.
>
> Doing so, we would not have to assume the master sent us a disable request
> for the queue before, and also would have proper synchronization if the
> request uses the reply-ack feature, as it could assume the backend is no longer
> processing the ring once the reply-ack is received.

Makes sense to do it before the reply-ack and to create a dedicated function for it.

Shouldn't the vDPA configuration also be called before the reply-ack, to be sure the queues are ready before the reply?

If so, we should also move the device ready code below (maybe also the vDPA conf) to this function.

But maybe calling it directly from this function rather than from the specific message handlers is better, something like the vhost_user_check_and_alloc_queue_pair function style; see the sketch below.

What do you think?
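
Just to illustrate what I mean (a rough sketch only; vhost_user_update_vring_state
is a placeholder name and vq->ready would be a new field, it is not in the
tree today):

static void
vhost_user_update_vring_state(struct virtio_net *dev, uint16_t index)
{
	struct vhost_virtqueue *vq = dev->virtqueue[index];
	struct rte_vdpa_device *vdpa_dev;
	bool cur_ready;

	if (vq == NULL)
		return;
	vdpa_dev = rte_vdpa_get_device(dev->vdpa_dev_id);
	cur_ready = vq_is_ready(dev, vq);
	/* Notify only on an actual ready-state transition; the
	 * "stays ready on VHOST_USER_SET_VRING_ENABLE" rule from the
	 * commit log would need an extra check here.
	 */
	if (cur_ready == vq->ready)
		return;
	vq->ready = cur_ready;
	if (vdpa_dev && vdpa_dev->ops->set_vring_state)
		vdpa_dev->ops->set_vring_state(dev->vid, index,
				(int)cur_ready);
	if (dev->notify_ops->vring_state_changed)
		dev->notify_ops->vring_state_changed(dev->vid, index,
				(int)cur_ready);
}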

> >  	if (!(dev->flags & VIRTIO_DEV_RUNNING) && virtio_is_ready(dev)) {
> >  		dev->flags |= VIRTIO_DEV_READY;
> >
> > @@ -2816,8 +2833,6 @@ typedef int (*vhost_message_handler_t)(struct
> virtio_net **pdev,
> >  		}
> >  	}
> >
> > -	did = dev->vdpa_dev_id;
> > -	vdpa_dev = rte_vdpa_get_device(did);
> >  	if (vdpa_dev && virtio_is_ready(dev) &&
> >  			!(dev->flags & VIRTIO_DEV_VDPA_CONFIGURED)
> &&
> >  			msg.request.master ==
> VHOST_USER_SET_VRING_CALL) {
> 
> Shouldn't check on SET_VRING_CALL above be removed?

Isn't it a workaround for something?


^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [dpdk-dev] [PATCH v1 1/4] vhost: support host notifier queue configuration
  2020-06-19  6:44   ` Maxime Coquelin
@ 2020-06-19 13:28     ` Matan Azrad
  2020-06-19 14:01       ` Maxime Coquelin
  0 siblings, 1 reply; 59+ messages in thread
From: Matan Azrad @ 2020-06-19 13:28 UTC (permalink / raw)
  To: Maxime Coquelin, Xiao Wang; +Cc: dev



From: Maxime Coquelin:
> On 6/18/20 6:28 PM, Matan Azrad wrote:
> > As an arrangement to per queue operations in the vDPA device it is
> > needed to change the next experimental API:
> >
> > The API ``rte_vhost_host_notifier_ctrl`` was changed to be per queue
> > instead of per device.
> >
> > A `qid` parameter was added to the API arguments list.
> >
> > Setting the parameter to the value VHOST_QUEUE_ALL will configure the
> > host notifier to all the device queues as done before this patch.
> >
> > Signed-off-by: Matan Azrad <matan@mellanox.com>
> > ---
> >  doc/guides/rel_notes/release_20_08.rst |  2 ++
> >  drivers/vdpa/ifc/ifcvf_vdpa.c          |  6 +++---
> >  drivers/vdpa/mlx5/mlx5_vdpa.c          |  5 +++--
> >  lib/librte_vhost/rte_vdpa.h            |  8 ++++++--
> >  lib/librte_vhost/rte_vhost.h           |  2 ++
> >  lib/librte_vhost/vhost.h               |  3 ---
> >  lib/librte_vhost/vhost_user.c          | 18 ++++++++++++++----
> >  7 files changed, 30 insertions(+), 14 deletions(-)
> >
> > diff --git a/doc/guides/rel_notes/release_20_08.rst
> > b/doc/guides/rel_notes/release_20_08.rst
> > index ba16d3b..9732959 100644
> > --- a/doc/guides/rel_notes/release_20_08.rst
> > +++ b/doc/guides/rel_notes/release_20_08.rst
> > @@ -111,6 +111,8 @@ API Changes
> >     Also, make sure to start the actual text at the margin.
> >
> =========================================================
> >
> > +* vhost: The API of ``rte_vhost_host_notifier_ctrl`` was changed to
> > +be per
> > +  queue and not per device, a qid parameter was added to the arguments
> list.
> >
> >  ABI Changes
> >  -----------
> > diff --git a/drivers/vdpa/ifc/ifcvf_vdpa.c
> > b/drivers/vdpa/ifc/ifcvf_vdpa.c index ec97178..336837a 100644
> > --- a/drivers/vdpa/ifc/ifcvf_vdpa.c
> > +++ b/drivers/vdpa/ifc/ifcvf_vdpa.c
> > @@ -839,7 +839,7 @@ struct internal_list {
> >  	vdpa_ifcvf_stop(internal);
> >  	vdpa_disable_vfio_intr(internal);
> >
> > -	ret = rte_vhost_host_notifier_ctrl(vid, false);
> > +	ret = rte_vhost_host_notifier_ctrl(vid, VHOST_QUEUE_ALL, false);
> >  	if (ret && ret != -ENOTSUP)
> >  		goto error;
> >
> > @@ -858,7 +858,7 @@ struct internal_list {
> >  	if (ret)
> >  		goto stop_vf;
> >
> > -	rte_vhost_host_notifier_ctrl(vid, true);
> > +	rte_vhost_host_notifier_ctrl(vid, VHOST_QUEUE_ALL, true);
> >
> >  	internal->sw_fallback_running = true;
> >
> > @@ -893,7 +893,7 @@ struct internal_list {
> >  	rte_atomic32_set(&internal->dev_attached, 1);
> >  	update_datapath(internal);
> >
> > -	if (rte_vhost_host_notifier_ctrl(vid, true) != 0)
> > +	if (rte_vhost_host_notifier_ctrl(vid, VHOST_QUEUE_ALL, true) != 0)
> >  		DRV_LOG(NOTICE, "vDPA (%d): software relay is used.", did);
> >
> >  	return 0;
> > diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.c
> > b/drivers/vdpa/mlx5/mlx5_vdpa.c index 9e758b6..8ea1300 100644
> > --- a/drivers/vdpa/mlx5/mlx5_vdpa.c
> > +++ b/drivers/vdpa/mlx5/mlx5_vdpa.c
> > @@ -147,7 +147,8 @@
> >  	int ret;
> >
> >  	if (priv->direct_notifier) {
> > -		ret = rte_vhost_host_notifier_ctrl(priv->vid, false);
> > +		ret = rte_vhost_host_notifier_ctrl(priv->vid,
> VHOST_QUEUE_ALL,
> > +						   false);
> >  		if (ret != 0) {
> >  			DRV_LOG(INFO, "Direct HW notifier FD cannot be "
> >  				"destroyed for device %d: %d.", priv->vid,
> ret); @@ -155,7 +156,7
> > @@
> >  		}
> >  		priv->direct_notifier = 0;
> >  	}
> > -	ret = rte_vhost_host_notifier_ctrl(priv->vid, true);
> > +	ret = rte_vhost_host_notifier_ctrl(priv->vid, VHOST_QUEUE_ALL,
> > +true);
> >  	if (ret != 0)
> >  		DRV_LOG(INFO, "Direct HW notifier FD cannot be configured
> for"
> >  			" device %d: %d.", priv->vid, ret); diff --git
> > a/lib/librte_vhost/rte_vdpa.h b/lib/librte_vhost/rte_vdpa.h index
> > ecb3d91..2db536c 100644
> > --- a/lib/librte_vhost/rte_vdpa.h
> > +++ b/lib/librte_vhost/rte_vdpa.h
> > @@ -202,22 +202,26 @@ struct rte_vdpa_device *  int
> > rte_vdpa_get_device_num(void);
> >
> > +#define VHOST_QUEUE_ALL VHOST_MAX_VRING
> > +
> >  /**
> >   * @warning
> >   * @b EXPERIMENTAL: this API may change without prior notice
> >   *
> > - * Enable/Disable host notifier mapping for a vdpa port.
> > + * Enable/Disable host notifier mapping for a vdpa queue.
> >   *
> >   * @param vid
> >   *  vhost device id
> >   * @param enable
> >   *  true for host notifier map, false for host notifier unmap
> > + * @param qid
> > + *  vhost queue id, VHOST_QUEUE_ALL to configure all the device
> > + queues
> I would prefer two APIs rather than passing a special ID that means all queues:
> 
> rte_vhost_host_notifier_ctrl(int vid, uint16_t qid, bool enable);
> rte_vhost_host_notifier_ctrl_all(int vid, bool enable);
> 
> I think it is clearer for the user of the API.
> Or if you think an extra API is overkill, just let the driver loop on all the
> queues.

We have a lot of options here with pros and cons.
I took the rte_eth_dev_callback_register style.

It is less intrusive, with minimal code change.

I'm not sure which option is the clearest, but the current suggestion is well defined
and allows configuring all the queues too; both styles are sketched below.

Let me know what you prefer....
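
For reference, here is how the two options would look from a vDPA driver
(just a sketch; nr_vring stands for whatever queue count the driver knows
about):

	/* Option A: one API plus a special ID (this patch). */
	rte_vhost_host_notifier_ctrl(vid, VHOST_QUEUE_ALL, true);

	/* Option B: no special ID; the driver loops on its queues. */
	for (qid = 0; qid < nr_vring; qid++)
		rte_vhost_host_notifier_ctrl(vid, qid, true);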

> >   * @return
> >   *  0 on success, -1 on failure
> >   */
> >  __rte_experimental
> >  int
> > -rte_vhost_host_notifier_ctrl(int vid, bool enable);
> > +rte_vhost_host_notifier_ctrl(int vid, uint16_t qid, bool enable);
> >
> >  /**


^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [dpdk-dev] [PATCH v1 3/4] vhost: improve device ready definition
  2020-06-19 13:11     ` Matan Azrad
@ 2020-06-19 13:54       ` Maxime Coquelin
  2020-06-21  6:20         ` Matan Azrad
  0 siblings, 1 reply; 59+ messages in thread
From: Maxime Coquelin @ 2020-06-19 13:54 UTC (permalink / raw)
  To: Matan Azrad, Xiao Wang; +Cc: dev

Hi Matan,

On 6/19/20 3:11 PM, Matan Azrad wrote:
> Hi Maxime
> 
> Thanks for the fast review.
> This is first version, let's review it carefully to be sure it is correct.
> @Xiao Wang, it will be good to hear your idea too.
> We also need to understand the effect on IFC driver/device...
> Just to update that I checked this code with the mlx5 adjustments and I sent in this series.
> It works well with the vDPA example application.

OK.

> From: Maxime Coquelin:
>> On 6/18/20 6:28 PM, Matan Azrad wrote:
>>> Some guest drivers may not configure disabled virtio queues.
>>>
>>> In this case, the vhost management never triggers the vDPA device
>>> configuration because it waits for the device to be ready.
>>
>> This is not vDPA-only, even with SW datapath the application's new_device
>> callback never gets called.
>>
> Yes, I wrote it below, I can be more specific here too in the next version.
> 
>>> The current ready state means that all the virtio queues should be
>>> configured regardless of the enablement status.
>>>
>>> In order to support this case, this patch changes the ready state:
>>> The device is ready when at least 1 queue pair is configured and
>>> enabled.
>>>
>>> So, now, the vDPA driver will be configured when the first queue pair
>>> is configured and enabled.
>>>
>>> Also, the queue state operation is changed to the following rules:
>>> 	1. queue becomes ready (enabled and fully configured) -
>>> 		set_vring_state(enabled).
>>> 	2. queue becomes not ready - set_vring_state(disabled).
>>> 	3. queue stays ready and VHOST_USER_SET_VRING_ENABLE message
>> was
>>> 		handled - set_vring_state(enabled).
>>>
>>> The parallel operations for the application are adjusted too.
>>>
>>> Signed-off-by: Matan Azrad <matan@mellanox.com>
>>> ---
>>>  lib/librte_vhost/vhost_user.c | 51
>>> ++++++++++++++++++++++++++++---------------
>>>  1 file changed, 33 insertions(+), 18 deletions(-)
>>>
>>> diff --git a/lib/librte_vhost/vhost_user.c
>>> b/lib/librte_vhost/vhost_user.c index b0849b9..cfd5f27 100644
>>> --- a/lib/librte_vhost/vhost_user.c
>>> +++ b/lib/librte_vhost/vhost_user.c
>>> @@ -1295,7 +1295,7 @@
>>>  {
>>>  	bool rings_ok;
>>>
>>> -	if (!vq)
>>> +	if (!vq || !vq->enabled)
>>>  		return false;
>>>
>>>  	if (vq_is_packed(dev))
>>> @@ -1309,24 +1309,27 @@
>>>  	       vq->callfd != VIRTIO_UNINITIALIZED_EVENTFD;  }
>>>
>>> +#define VIRTIO_DEV_NUM_VQS_TO_BE_READY 2u
>>> +
>>>  static int
>>>  virtio_is_ready(struct virtio_net *dev)  {
>>>  	struct vhost_virtqueue *vq;
>>>  	uint32_t i;
>>>
>>> -	if (dev->nr_vring == 0)
>>> +	if (dev->nr_vring < VIRTIO_DEV_NUM_VQS_TO_BE_READY)
>>>  		return 0;
>>>
>>> -	for (i = 0; i < dev->nr_vring; i++) {
>>> +	for (i = 0; i < VIRTIO_DEV_NUM_VQS_TO_BE_READY; i++) {
>>>  		vq = dev->virtqueue[i];
>>>
>>>  		if (!vq_is_ready(dev, vq))
>>>  			return 0;
>>>  	}
>>>
>>> -	VHOST_LOG_CONFIG(INFO,
>>> -		"virtio is now ready for processing.\n");
>>> +	if (!(dev->flags & VIRTIO_DEV_READY))
>>> +		VHOST_LOG_CONFIG(INFO,
>>> +			"virtio is now ready for processing.\n");
>>>  	return 1;
>>>  }
>>>
>>> @@ -1970,8 +1973,6 @@ static int vhost_user_set_vring_err(struct
>> virtio_net **pdev __rte_unused,
>>>  	struct virtio_net *dev = *pdev;
>>>  	int enable = (int)msg->payload.state.num;
>>>  	int index = (int)msg->payload.state.index;
>>> -	struct rte_vdpa_device *vdpa_dev;
>>> -	int did = -1;
>>>
>>>  	if (validate_msg_fds(msg, 0) != 0)
>>>  		return RTE_VHOST_MSG_RESULT_ERR;
>>> @@ -1980,15 +1981,6 @@ static int vhost_user_set_vring_err(struct
>> virtio_net **pdev __rte_unused,
>>>  		"set queue enable: %d to qp idx: %d\n",
>>>  		enable, index);
>>>
>>> -	did = dev->vdpa_dev_id;
>>> -	vdpa_dev = rte_vdpa_get_device(did);
>>> -	if (vdpa_dev && vdpa_dev->ops->set_vring_state)
>>> -		vdpa_dev->ops->set_vring_state(dev->vid, index, enable);
>>> -
>>> -	if (dev->notify_ops->vring_state_changed)
>>> -		dev->notify_ops->vring_state_changed(dev->vid,
>>> -				index, enable);
>>> -
>>>  	/* On disable, rings have to be stopped being processed. */
>>>  	if (!enable && dev->dequeue_zero_copy)
>>>  		drain_zmbuf_list(dev->virtqueue[index]);
>>> @@ -2622,11 +2614,13 @@ typedef int
>> (*vhost_message_handler_t)(struct virtio_net **pdev,
>>>  	struct virtio_net *dev;
>>>  	struct VhostUserMsg msg;
>>>  	struct rte_vdpa_device *vdpa_dev;
>>> +	bool ready[VHOST_MAX_VRING];
>>>  	int did = -1;
>>>  	int ret;
>>>  	int unlock_required = 0;
>>>  	bool handled;
>>>  	int request;
>>> +	uint32_t i;
>>>
>>>  	dev = get_device(vid);
>>>  	if (dev == NULL)
>>> @@ -2668,6 +2662,10 @@ typedef int (*vhost_message_handler_t)(struct
>> virtio_net **pdev,
>>>  		VHOST_LOG_CONFIG(DEBUG, "External request %d\n",
>> request);
>>>  	}
>>>
>>> +	/* Save ready status for all the VQs before message handle. */
>>> +	for (i = 0; i < VHOST_MAX_VRING; i++)
>>> +		ready[i] = vq_is_ready(dev, dev->virtqueue[i]);
>>> +
>>
>> This big array can be avoided if you save the ready status in the virtqueue
>> once the message has been handled.
> 
> You mean you prefer to save it in the virtqueue structure? Doesn't it take the same memory?
> In any case I don't think 0x100 is so big 😊

I mean on the stack.

And one advantage of saving it in the vq structure shows up, for example,
with memory hotplug. The vq is in ready state at the beginning and at the
end, but during the handling the ring host virtual addresses get changed
because of the munmap/mmap, and we need to notify the driver, otherwise
it will miss it.

>  
>>>  	ret = vhost_user_check_and_alloc_queue_pair(dev, &msg);
>>>  	if (ret < 0) {
>>>  		VHOST_LOG_CONFIG(ERR,
>>> @@ -2802,6 +2800,25 @@ typedef int (*vhost_message_handler_t)(struct
>> virtio_net **pdev,
>>>  		return -1;
>>>  	}
>>>
>>> +	did = dev->vdpa_dev_id;
>>> +	vdpa_dev = rte_vdpa_get_device(did);
>>> +	/* Update ready status. */
>>> +	for (i = 0; i < VHOST_MAX_VRING; i++) {
>>> +		bool cur_ready = vq_is_ready(dev, dev->virtqueue[i]);
>>> +
>>> +		if ((cur_ready && request ==
>> VHOST_USER_SET_VRING_ENABLE &&
>>> +				i == msg.payload.state.index) ||
>>
>> Couldn't we remove above condition? Aren't the callbacks already called in
>> the set_vring_enable handler?
> 
> As we agreed in the design discussion:
> 
> " 3. Same handling of the requests, except that we won't notify the 
>  vdpa driver and the application of vring state changes in the 
>  VHOST_USER_SET_VRING_ENABLE handler."  
> 
> So, I removed it from the set_vring_enable handler.

My bad, the patch context where it is removed made me think it was in
vhost_user_set_vring_err(), so I missed it.

Thinking about it again since last time we discussed it, we have to send
the notification from the handler in the case

> Now, the ready state doesn't depend only on the VHOST_USER_SET_VRING_ENABLE message.
>  
>>> +				cur_ready != ready[i]) {
>>> +			if (vdpa_dev && vdpa_dev->ops->set_vring_state)
>>> +				vdpa_dev->ops->set_vring_state(dev->vid, i,
>>> +
>> 	(int)cur_ready);
>>> +
>>> +			if (dev->notify_ops->vring_state_changed)
>>> +				dev->notify_ops->vring_state_changed(dev-
>>> vid,
>>> +							i, (int)cur_ready);
>>> +		}
>>> +	}
>>
>> I think we should move this into a dedicated function, which we would call in
>> every message handler that can modify the ready state.
>>
>> Doing so, we would not have to assume the master sent us a disable request
>> for the queue before, and also would have proper synchronization if the
>> request uses the reply-ack feature, as it could assume the backend is no longer
>> processing the ring once the reply-ack is received.
> 
> Makes sense to do it before the reply-ack and to create a dedicated function for it.
> 
> Shouldn't the vDPA configuration also be called before the reply-ack, to be sure the queues are ready before the reply?

I don't think so, because the backend can start processing the ring
afterwards. What we don't want is the backend continuing to process the
rings after the guest asked it to stop.
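
In other words, the ordering on a queue-disable request would be (a
sketch; vhost_user_update_vring_state() is the placeholder name for the
dedicated notification function discussed in this thread, the reply-ack
handling itself already exists in vhost_user.c):

	/* Notify the vDPA driver/app of the disable first... */
	vhost_user_update_vring_state(dev, msg.payload.state.index);
	/* ...then ack, so the master can assume the backend stopped
	 * processing the ring once it receives the reply-ack.
	 */
	if (msg.flags & VHOST_USER_NEED_REPLY)
		send_vhost_reply(fd, &msg);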

> If so, we should move also the device ready code below (maybe also vdpa conf) to this function too.

So I don't think it is needed.

> But maybe calling it directly from this function rather than from the specific message handlers is better, something like the vhost_user_check_and_alloc_queue_pair function style.
> 
> What do you think?
> 
>>>  	if (!(dev->flags & VIRTIO_DEV_RUNNING) && virtio_is_ready(dev)) {
>>>  		dev->flags |= VIRTIO_DEV_READY;
>>>
>>> @@ -2816,8 +2833,6 @@ typedef int (*vhost_message_handler_t)(struct
>> virtio_net **pdev,
>>>  		}
>>>  	}
>>>
>>> -	did = dev->vdpa_dev_id;
>>> -	vdpa_dev = rte_vdpa_get_device(did);
>>>  	if (vdpa_dev && virtio_is_ready(dev) &&
>>>  			!(dev->flags & VIRTIO_DEV_VDPA_CONFIGURED)
>> &&
>>>  			msg.request.master ==
>> VHOST_USER_SET_VRING_CALL) {
>>
>> Shouldn't check on SET_VRING_CALL above be removed?
> 
> Isn't it a workaround for something?
> 

Normally, we should no more need it, as state change notification will
be sent if callfd came to change.


^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [dpdk-dev] [PATCH v1 1/4] vhost: support host notifier queue configuration
  2020-06-19 13:28     ` Matan Azrad
@ 2020-06-19 14:01       ` Maxime Coquelin
  2020-06-21  6:26         ` Matan Azrad
  0 siblings, 1 reply; 59+ messages in thread
From: Maxime Coquelin @ 2020-06-19 14:01 UTC (permalink / raw)
  To: Matan Azrad, Xiao Wang; +Cc: dev



On 6/19/20 3:28 PM, Matan Azrad wrote:
> 
> 
> From: Maxime Coquelin:
>> On 6/18/20 6:28 PM, Matan Azrad wrote:
>>> As an arrangement to per queue operations in the vDPA device it is
>>> needed to change the next experimental API:
>>>
>>> The API ``rte_vhost_host_notifier_ctrl`` was changed to be per queue
>>> instead of per device.
>>>
>>> A `qid` parameter was added to the API arguments list.
>>>
>>> Setting the parameter to the value VHOST_QUEUE_ALL will configure the
>>> host notifier to all the device queues as done before this patch.
>>>
>>> Signed-off-by: Matan Azrad <matan@mellanox.com>
>>> ---
>>>  doc/guides/rel_notes/release_20_08.rst |  2 ++
>>>  drivers/vdpa/ifc/ifcvf_vdpa.c          |  6 +++---
>>>  drivers/vdpa/mlx5/mlx5_vdpa.c          |  5 +++--
>>>  lib/librte_vhost/rte_vdpa.h            |  8 ++++++--
>>>  lib/librte_vhost/rte_vhost.h           |  2 ++
>>>  lib/librte_vhost/vhost.h               |  3 ---
>>>  lib/librte_vhost/vhost_user.c          | 18 ++++++++++++++----
>>>  7 files changed, 30 insertions(+), 14 deletions(-)
>>>
>>> diff --git a/doc/guides/rel_notes/release_20_08.rst
>>> b/doc/guides/rel_notes/release_20_08.rst
>>> index ba16d3b..9732959 100644
>>> --- a/doc/guides/rel_notes/release_20_08.rst
>>> +++ b/doc/guides/rel_notes/release_20_08.rst
>>> @@ -111,6 +111,8 @@ API Changes
>>>     Also, make sure to start the actual text at the margin.
>>>
>> =========================================================
>>>
>>> +* vhost: The API of ``rte_vhost_host_notifier_ctrl`` was changed to
>>> +be per
>>> +  queue and not per device, a qid parameter was added to the arguments
>> list.
>>>
>>>  ABI Changes
>>>  -----------
>>> diff --git a/drivers/vdpa/ifc/ifcvf_vdpa.c
>>> b/drivers/vdpa/ifc/ifcvf_vdpa.c index ec97178..336837a 100644
>>> --- a/drivers/vdpa/ifc/ifcvf_vdpa.c
>>> +++ b/drivers/vdpa/ifc/ifcvf_vdpa.c
>>> @@ -839,7 +839,7 @@ struct internal_list {
>>>  	vdpa_ifcvf_stop(internal);
>>>  	vdpa_disable_vfio_intr(internal);
>>>
>>> -	ret = rte_vhost_host_notifier_ctrl(vid, false);
>>> +	ret = rte_vhost_host_notifier_ctrl(vid, VHOST_QUEUE_ALL, false);
>>>  	if (ret && ret != -ENOTSUP)
>>>  		goto error;
>>>
>>> @@ -858,7 +858,7 @@ struct internal_list {
>>>  	if (ret)
>>>  		goto stop_vf;
>>>
>>> -	rte_vhost_host_notifier_ctrl(vid, true);
>>> +	rte_vhost_host_notifier_ctrl(vid, VHOST_QUEUE_ALL, true);
>>>
>>>  	internal->sw_fallback_running = true;
>>>
>>> @@ -893,7 +893,7 @@ struct internal_list {
>>>  	rte_atomic32_set(&internal->dev_attached, 1);
>>>  	update_datapath(internal);
>>>
>>> -	if (rte_vhost_host_notifier_ctrl(vid, true) != 0)
>>> +	if (rte_vhost_host_notifier_ctrl(vid, VHOST_QUEUE_ALL, true) != 0)
>>>  		DRV_LOG(NOTICE, "vDPA (%d): software relay is used.", did);
>>>
>>>  	return 0;
>>> diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.c
>>> b/drivers/vdpa/mlx5/mlx5_vdpa.c index 9e758b6..8ea1300 100644
>>> --- a/drivers/vdpa/mlx5/mlx5_vdpa.c
>>> +++ b/drivers/vdpa/mlx5/mlx5_vdpa.c
>>> @@ -147,7 +147,8 @@
>>>  	int ret;
>>>
>>>  	if (priv->direct_notifier) {
>>> -		ret = rte_vhost_host_notifier_ctrl(priv->vid, false);
>>> +		ret = rte_vhost_host_notifier_ctrl(priv->vid,
>> VHOST_QUEUE_ALL,
>>> +						   false);
>>>  		if (ret != 0) {
>>>  			DRV_LOG(INFO, "Direct HW notifier FD cannot be "
>>>  				"destroyed for device %d: %d.", priv->vid,
>> ret); @@ -155,7 +156,7
>>> @@
>>>  		}
>>>  		priv->direct_notifier = 0;
>>>  	}
>>> -	ret = rte_vhost_host_notifier_ctrl(priv->vid, true);
>>> +	ret = rte_vhost_host_notifier_ctrl(priv->vid, VHOST_QUEUE_ALL,
>>> +true);
>>>  	if (ret != 0)
>>>  		DRV_LOG(INFO, "Direct HW notifier FD cannot be configured
>> for"
>>>  			" device %d: %d.", priv->vid, ret); diff --git
>>> a/lib/librte_vhost/rte_vdpa.h b/lib/librte_vhost/rte_vdpa.h index
>>> ecb3d91..2db536c 100644
>>> --- a/lib/librte_vhost/rte_vdpa.h
>>> +++ b/lib/librte_vhost/rte_vdpa.h
>>> @@ -202,22 +202,26 @@ struct rte_vdpa_device *  int
>>> rte_vdpa_get_device_num(void);
>>>
>>> +#define VHOST_QUEUE_ALL VHOST_MAX_VRING
>>> +
>>>  /**
>>>   * @warning
>>>   * @b EXPERIMENTAL: this API may change without prior notice
>>>   *
>>> - * Enable/Disable host notifier mapping for a vdpa port.
>>> + * Enable/Disable host notifier mapping for a vdpa queue.
>>>   *
>>>   * @param vid
>>>   *  vhost device id
>>>   * @param enable
>>>   *  true for host notifier map, false for host notifier unmap
>>> + * @param qid
>>> + *  vhost queue id, VHOST_QUEUE_ALL to configure all the device
>>> + queues
>> I would prefer two APIs that passing a special ID that means all queues:
>>
>> rte_vhost_host_notifier_ctrl(int vid, uint16_t qid, bool enable);
>> rte_vhost_host_notifier_ctrl_all(int vid, bool enable);
>>
>> I think it is clearer for the user of the API.
>> Or if you think an extra API is overkill, just let the driver loop on all the
>> queues.
> 
> We have a lot of options here with pros and cons.
> I took the rte_eth_dev_callback_register style.

Ok, I didn't looked at this code.

> It is less intrusive with minimum code change.  
> 
> I'm not sure what is the clearest option but the current suggestion is well defined and 
> allows to configure all the queues too.
> 
> Let me know what you prefer....

I personally don't like the style, but I can live with it if you prefer
doing it like that.

If you do it that way, you will have to rename VHOST_QUEUE_ALL to
RTE_VHOST_QUEUE_ALL, VHOST_MAX_VRING to RTE_VHOST_MAX_VRING and
VHOST_MAX_QUEUE_PAIRS to RTE_VHOST_MAX_QUEUE_PAIRS, as they will become
part of the ABI.

Note that it also means that we won't be able to increase the maximum
number of rings without breaking the ABI.
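
Concretely, the public header would then carry something like this
(sketch only, assuming the current values are kept):

	#define RTE_VHOST_MAX_QUEUE_PAIRS 0x80
	#define RTE_VHOST_MAX_VRING 0x100
	#define RTE_VHOST_QUEUE_ALL RTE_VHOST_MAX_VRING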

>>>   * @return
>>>   *  0 on success, -1 on failure
>>>   */
>>>  __rte_experimental
>>>  int
>>> -rte_vhost_host_notifier_ctrl(int vid, bool enable);
>>> +rte_vhost_host_notifier_ctrl(int vid, uint16_t qid, bool enable);
>>>
>>>  /**
> 


^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [dpdk-dev] [PATCH v1 3/4] vhost: improve device ready definition
  2020-06-19 13:54       ` Maxime Coquelin
@ 2020-06-21  6:20         ` Matan Azrad
  2020-06-22  8:04           ` Maxime Coquelin
  0 siblings, 1 reply; 59+ messages in thread
From: Matan Azrad @ 2020-06-21  6:20 UTC (permalink / raw)
  To: Maxime Coquelin, Xiao Wang; +Cc: dev

Hi Maxime

From: Maxime Coquelin:
> Hi Matan,
> 
> On 6/19/20 3:11 PM, Matan Azrad wrote:
> > Hi Maxime
> >
> > Thanks for the fast review.
> > This is first version, let's review it carefully to be sure it is correct.
> > @Xiao Wang, it will be good to hear your idea too.
> > We also need to understand the effect on IFC driver/device...
> > Just to update that I checked this code with the mlx5 adjustments and I
> sent in this series.
> > It works well with the vDPA example application.
> 
> OK.
> 
> > From: Maxime Coquelin:
> >> On 6/18/20 6:28 PM, Matan Azrad wrote:
> >>> Some guest drivers may not configure disabled virtio queues.
> >>>
> >>> In this case, the vhost management never triggers the vDPA device
> >>> configuration because it waits for the device to be ready.
> >>
> >> This is not vDPA-only, even with SW datapath the application's
> >> new_device callback never gets called.
> >>
> > Yes, I wrote it below, I can be more specific here too in the next version.
> >
> >>> The current ready state means that all the virtio queues should be
> >>> configured regardless of the enablement status.
> >>>
> >>> In order to support this case, this patch changes the ready state:
> >>> The device is ready when at least 1 queue pair is configured and
> >>> enabled.
> >>>
> >>> So, now, the vDPA driver will be configured when the first queue
> >>> pair is configured and enabled.
> >>>
> >>> Also, the queue state operation is changed to the following rules:
> >>> 	1. queue becomes ready (enabled and fully configured) -
> >>> 		set_vring_state(enabled).
> >>> 	2. queue becomes not ready - set_vring_state(disabled).
> >>> 	3. queue stays ready and VHOST_USER_SET_VRING_ENABLE message
> >> was
> >>> 		handled - set_vring_state(enabled).
> >>>
> >>> The parallel operations for the application are adjusted too.
> >>>
> >>> Signed-off-by: Matan Azrad <matan@mellanox.com>
> >>> ---
> >>>  lib/librte_vhost/vhost_user.c | 51
> >>> ++++++++++++++++++++++++++++---------------
> >>>  1 file changed, 33 insertions(+), 18 deletions(-)
> >>>
> >>> diff --git a/lib/librte_vhost/vhost_user.c
> >>> b/lib/librte_vhost/vhost_user.c index b0849b9..cfd5f27 100644
> >>> --- a/lib/librte_vhost/vhost_user.c
> >>> +++ b/lib/librte_vhost/vhost_user.c
> >>> @@ -1295,7 +1295,7 @@
> >>>  {
> >>>  	bool rings_ok;
> >>>
> >>> -	if (!vq)
> >>> +	if (!vq || !vq->enabled)
> >>>  		return false;
> >>>
> >>>  	if (vq_is_packed(dev))
> >>> @@ -1309,24 +1309,27 @@
> >>>  	       vq->callfd != VIRTIO_UNINITIALIZED_EVENTFD;  }
> >>>
> >>> +#define VIRTIO_DEV_NUM_VQS_TO_BE_READY 2u
> >>> +
> >>>  static int
> >>>  virtio_is_ready(struct virtio_net *dev)  {
> >>>  	struct vhost_virtqueue *vq;
> >>>  	uint32_t i;
> >>>
> >>> -	if (dev->nr_vring == 0)
> >>> +	if (dev->nr_vring < VIRTIO_DEV_NUM_VQS_TO_BE_READY)
> >>>  		return 0;
> >>>
> >>> -	for (i = 0; i < dev->nr_vring; i++) {
> >>> +	for (i = 0; i < VIRTIO_DEV_NUM_VQS_TO_BE_READY; i++) {
> >>>  		vq = dev->virtqueue[i];
> >>>
> >>>  		if (!vq_is_ready(dev, vq))
> >>>  			return 0;
> >>>  	}
> >>>
> >>> -	VHOST_LOG_CONFIG(INFO,
> >>> -		"virtio is now ready for processing.\n");
> >>> +	if (!(dev->flags & VIRTIO_DEV_READY))
> >>> +		VHOST_LOG_CONFIG(INFO,
> >>> +			"virtio is now ready for processing.\n");
> >>>  	return 1;
> >>>  }
> >>>
> >>> @@ -1970,8 +1973,6 @@ static int vhost_user_set_vring_err(struct
> >> virtio_net **pdev __rte_unused,
> >>>  	struct virtio_net *dev = *pdev;
> >>>  	int enable = (int)msg->payload.state.num;
> >>>  	int index = (int)msg->payload.state.index;
> >>> -	struct rte_vdpa_device *vdpa_dev;
> >>> -	int did = -1;
> >>>
> >>>  	if (validate_msg_fds(msg, 0) != 0)
> >>>  		return RTE_VHOST_MSG_RESULT_ERR;
> >>> @@ -1980,15 +1981,6 @@ static int vhost_user_set_vring_err(struct
> >> virtio_net **pdev __rte_unused,
> >>>  		"set queue enable: %d to qp idx: %d\n",
> >>>  		enable, index);
> >>>
> >>> -	did = dev->vdpa_dev_id;
> >>> -	vdpa_dev = rte_vdpa_get_device(did);
> >>> -	if (vdpa_dev && vdpa_dev->ops->set_vring_state)
> >>> -		vdpa_dev->ops->set_vring_state(dev->vid, index, enable);
> >>> -
> >>> -	if (dev->notify_ops->vring_state_changed)
> >>> -		dev->notify_ops->vring_state_changed(dev->vid,
> >>> -				index, enable);
> >>> -
> >>>  	/* On disable, rings have to be stopped being processed. */
> >>>  	if (!enable && dev->dequeue_zero_copy)
> >>>  		drain_zmbuf_list(dev->virtqueue[index]);
> >>> @@ -2622,11 +2614,13 @@ typedef int
> >> (*vhost_message_handler_t)(struct virtio_net **pdev,
> >>>  	struct virtio_net *dev;
> >>>  	struct VhostUserMsg msg;
> >>>  	struct rte_vdpa_device *vdpa_dev;
> >>> +	bool ready[VHOST_MAX_VRING];
> >>>  	int did = -1;
> >>>  	int ret;
> >>>  	int unlock_required = 0;
> >>>  	bool handled;
> >>>  	int request;
> >>> +	uint32_t i;
> >>>
> >>>  	dev = get_device(vid);
> >>>  	if (dev == NULL)
> >>> @@ -2668,6 +2662,10 @@ typedef int
> (*vhost_message_handler_t)(struct
> >> virtio_net **pdev,
> >>>  		VHOST_LOG_CONFIG(DEBUG, "External request %d\n",
> >> request);
> >>>  	}
> >>>
> >>> +	/* Save ready status for all the VQs before message handle. */
> >>> +	for (i = 0; i < VHOST_MAX_VRING; i++)
> >>> +		ready[i] = vq_is_ready(dev, dev->virtqueue[i]);
> >>> +
> >>
> >> This big array can be avoided if you save the ready status in the
> >> virtqueue once the message has been handled.
> >
> > You mean you prefer to save it in the virtqueue structure? Doesn't it take
> the same memory?
> > In any case I don't think 0x100 is so big 😊
> 
> I mean in the stack.

Do you think that 256B is too much for the stack?
 
> And one advantage of saving it in the vq structure shows up, for example, with
> memory hotplug. The vq is in ready state at the beginning and at the end, but
> during the handling the ring host virtual addresses get changed because of
> the munmap/mmap, and we need to notify the driver, otherwise it will miss it.

Do you mean a VHOST_USER_SET_MEM_TABLE call after the first configuration?

I don't understand what the issue is with saving it on the stack here....

But one advantage of saving it in the virtqueue structure is that the message handler does not need to check the ready state before each message.

I will change it in the next version.
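
Roughly, the change would be (sketch; the field name is a placeholder):

	/* In struct vhost_virtqueue (lib/librte_vhost/vhost.h): */
	bool		ready; /* last ready state notified to the driver/app */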

> >
> >>>  	ret = vhost_user_check_and_alloc_queue_pair(dev, &msg);
> >>>  	if (ret < 0) {
> >>>  		VHOST_LOG_CONFIG(ERR,
> >>> @@ -2802,6 +2800,25 @@ typedef int
> (*vhost_message_handler_t)(struct
> >> virtio_net **pdev,
> >>>  		return -1;
> >>>  	}
> >>>
> >>> +	did = dev->vdpa_dev_id;
> >>> +	vdpa_dev = rte_vdpa_get_device(did);
> >>> +	/* Update ready status. */
> >>> +	for (i = 0; i < VHOST_MAX_VRING; i++) {
> >>> +		bool cur_ready = vq_is_ready(dev, dev->virtqueue[i]);
> >>> +
> >>> +		if ((cur_ready && request ==
> >> VHOST_USER_SET_VRING_ENABLE &&
> >>> +				i == msg.payload.state.index) ||
> >>
> >> Couldn't we remove above condition? Aren't the callbacks already
> >> called in the set_vring_enable handler?
> >
> > As we agreed in the design discussion:
> >
> > " 3. Same handling of the requests, except that we won't notify the
> > vdpa driver and the application of vring state changes in the
> > VHOST_USER_SET_VRING_ENABLE handler."
> >
> > So, I removed it from the set_vring_enable handler.
> 
> My bad, the patch context where it is removed made me think it was in
> vhost_user_set_vring_err(), so I missed it.
> 
> Thinking about it again since last time we discussed it, we have to send the
> notification from the handler in the case
> 
> > Now, the ready state doesn't depend only on the
> VHOST_USER_SET_VRING_ENABLE message.
> >
> >>> +				cur_ready != ready[i]) {
> >>> +			if (vdpa_dev && vdpa_dev->ops->set_vring_state)
> >>> +				vdpa_dev->ops->set_vring_state(dev->vid, i,
> >>> +
> >> 	(int)cur_ready);
> >>> +
> >>> +			if (dev->notify_ops->vring_state_changed)
> >>> +				dev->notify_ops->vring_state_changed(dev-
> >>> vid,
> >>> +							i, (int)cur_ready);
> >>> +		}
> >>> +	}
> >>
> >> I think we should move this into a dedicated function, which we would
> >> call in every message handler that can modify the ready state.
> >>
> >> Doing so, we would not have to assume the master sent us a disable
> >> request for the queue before, and also would have proper
> >> synchronization if the request uses the reply-ack feature, as it could
> >> assume the backend is no longer processing the ring once reply-ack is
> received.
> >
> > Makes sense to do it before the reply-ack and to create a dedicated function
> for it.
> >
> > Shouldn't the vDPA configuration also be called before the reply-ack, to be
> sure the queues are ready before the reply?
> 
> I don't think so, because the backend can start processing the ring after.
> What we don't want is that the backend continues to process the rings when
> the guest asked to stop doing it.

But "doing configuration after reply" may cause that the a guest kicks a queue while app \ vDPA driver is being configured.
It may lead to some order dependencies in configuration....

In addition, now, the device ready state becomes on only in the same time that a queue becomes on,
so we can do the device ready check (for new_device \ dev_conf calls) only when a queue becomes ready in the same function.
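
Something like this at the end of that dedicated function (a sketch,
reusing the flags and ops from the patch):

	/* Configure the vDPA device as soon as the first queue pair
	 * becomes ready, from the same place the ring state
	 * notifications are sent.
	 */
	if (vdpa_dev && virtio_is_ready(dev) &&
			!(dev->flags & VIRTIO_DEV_VDPA_CONFIGURED)) {
		if (vdpa_dev->ops->dev_conf)
			vdpa_dev->ops->dev_conf(dev->vid);
		dev->flags |= VIRTIO_DEV_VDPA_CONFIGURED;
	}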

> > If so, we should move also the device ready code below (maybe also vdpa
> conf) to this function too.
> 
> So I don't think it is needed.
> > But maybe calling it directly from this function rather than from the specific
> message handlers is better, something like the
> vhost_user_check_and_alloc_queue_pair function style.
> >
> > What do you think?

Any answer here?

> >
> >>>  	if (!(dev->flags & VIRTIO_DEV_RUNNING) && virtio_is_ready(dev)) {
> >>>  		dev->flags |= VIRTIO_DEV_READY;
> >>>
> >>> @@ -2816,8 +2833,6 @@ typedef int
> (*vhost_message_handler_t)(struct
> >> virtio_net **pdev,
> >>>  		}
> >>>  	}
> >>>
> >>> -	did = dev->vdpa_dev_id;
> >>> -	vdpa_dev = rte_vdpa_get_device(did);
> >>>  	if (vdpa_dev && virtio_is_ready(dev) &&
> >>>  			!(dev->flags & VIRTIO_DEV_VDPA_CONFIGURED)
> >> &&
> >>>  			msg.request.master ==
> >> VHOST_USER_SET_VRING_CALL) {
> >>
> >> Shouldn't check on SET_VRING_CALL above be removed?
> >
> > Isn't it a workaround for something?
> >
> 
> Normally, we should no longer need it, as a state change notification will be
> sent if the callfd changes.

Ok, will remove it.


^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [dpdk-dev] [PATCH v1 1/4] vhost: support host notifier queue configuration
  2020-06-19 14:01       ` Maxime Coquelin
@ 2020-06-21  6:26         ` Matan Azrad
  2020-06-22  8:06           ` Maxime Coquelin
  0 siblings, 1 reply; 59+ messages in thread
From: Matan Azrad @ 2020-06-21  6:26 UTC (permalink / raw)
  To: Maxime Coquelin, Xiao Wang; +Cc: dev

Hi Maxime

From: Maxime Coquelin:
> On 6/19/20 3:28 PM, Matan Azrad wrote:
> >
> >
> > From: Maxime Coquelin:
> >> On 6/18/20 6:28 PM, Matan Azrad wrote:
> >>> As an arrangement to per queue operations in the vDPA device it is
> >>> needed to change the next experimental API:
> >>>
> >>> The API ``rte_vhost_host_notifier_ctrl`` was changed to be per queue
> >>> instead of per device.
> >>>
> >>> A `qid` parameter was added to the API arguments list.
> >>>
> >>> Setting the parameter to the value VHOST_QUEUE_ALL will configure
> >>> the host notifier to all the device queues as done before this patch.
> >>>
> >>> Signed-off-by: Matan Azrad <matan@mellanox.com>
> >>> ---
> >>>  doc/guides/rel_notes/release_20_08.rst |  2 ++
> >>>  drivers/vdpa/ifc/ifcvf_vdpa.c          |  6 +++---
> >>>  drivers/vdpa/mlx5/mlx5_vdpa.c          |  5 +++--
> >>>  lib/librte_vhost/rte_vdpa.h            |  8 ++++++--
> >>>  lib/librte_vhost/rte_vhost.h           |  2 ++
> >>>  lib/librte_vhost/vhost.h               |  3 ---
> >>>  lib/librte_vhost/vhost_user.c          | 18 ++++++++++++++----
> >>>  7 files changed, 30 insertions(+), 14 deletions(-)
> >>>
> >>> diff --git a/doc/guides/rel_notes/release_20_08.rst
> >>> b/doc/guides/rel_notes/release_20_08.rst
> >>> index ba16d3b..9732959 100644
> >>> --- a/doc/guides/rel_notes/release_20_08.rst
> >>> +++ b/doc/guides/rel_notes/release_20_08.rst
> >>> @@ -111,6 +111,8 @@ API Changes
> >>>     Also, make sure to start the actual text at the margin.
> >>>
> >>
> =========================================================
> >>>
> >>> +* vhost: The API of ``rte_vhost_host_notifier_ctrl`` was changed to
> >>> +be per
> >>> +  queue and not per device, a qid parameter was added to the
> >>> +arguments
> >> list.
> >>>
> >>>  ABI Changes
> >>>  -----------
> >>> diff --git a/drivers/vdpa/ifc/ifcvf_vdpa.c
> >>> b/drivers/vdpa/ifc/ifcvf_vdpa.c index ec97178..336837a 100644
> >>> --- a/drivers/vdpa/ifc/ifcvf_vdpa.c
> >>> +++ b/drivers/vdpa/ifc/ifcvf_vdpa.c
> >>> @@ -839,7 +839,7 @@ struct internal_list {
> >>>  	vdpa_ifcvf_stop(internal);
> >>>  	vdpa_disable_vfio_intr(internal);
> >>>
> >>> -	ret = rte_vhost_host_notifier_ctrl(vid, false);
> >>> +	ret = rte_vhost_host_notifier_ctrl(vid, VHOST_QUEUE_ALL, false);
> >>>  	if (ret && ret != -ENOTSUP)
> >>>  		goto error;
> >>>
> >>> @@ -858,7 +858,7 @@ struct internal_list {
> >>>  	if (ret)
> >>>  		goto stop_vf;
> >>>
> >>> -	rte_vhost_host_notifier_ctrl(vid, true);
> >>> +	rte_vhost_host_notifier_ctrl(vid, VHOST_QUEUE_ALL, true);
> >>>
> >>>  	internal->sw_fallback_running = true;
> >>>
> >>> @@ -893,7 +893,7 @@ struct internal_list {
> >>>  	rte_atomic32_set(&internal->dev_attached, 1);
> >>>  	update_datapath(internal);
> >>>
> >>> -	if (rte_vhost_host_notifier_ctrl(vid, true) != 0)
> >>> +	if (rte_vhost_host_notifier_ctrl(vid, VHOST_QUEUE_ALL, true) != 0)
> >>>  		DRV_LOG(NOTICE, "vDPA (%d): software relay is used.", did);
> >>>
> >>>  	return 0;
> >>> diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.c
> >>> b/drivers/vdpa/mlx5/mlx5_vdpa.c index 9e758b6..8ea1300 100644
> >>> --- a/drivers/vdpa/mlx5/mlx5_vdpa.c
> >>> +++ b/drivers/vdpa/mlx5/mlx5_vdpa.c
> >>> @@ -147,7 +147,8 @@
> >>>  	int ret;
> >>>
> >>>  	if (priv->direct_notifier) {
> >>> -		ret = rte_vhost_host_notifier_ctrl(priv->vid, false);
> >>> +		ret = rte_vhost_host_notifier_ctrl(priv->vid,
> >> VHOST_QUEUE_ALL,
> >>> +						   false);
> >>>  		if (ret != 0) {
> >>>  			DRV_LOG(INFO, "Direct HW notifier FD cannot be "
> >>>  				"destroyed for device %d: %d.", priv->vid,
> >> ret); @@ -155,7 +156,7
> >>> @@
> >>>  		}
> >>>  		priv->direct_notifier = 0;
> >>>  	}
> >>> -	ret = rte_vhost_host_notifier_ctrl(priv->vid, true);
> >>> +	ret = rte_vhost_host_notifier_ctrl(priv->vid, VHOST_QUEUE_ALL,
> >>> +true);
> >>>  	if (ret != 0)
> >>>  		DRV_LOG(INFO, "Direct HW notifier FD cannot be configured
> >> for"
> >>>  			" device %d: %d.", priv->vid, ret); diff --git
> >>> a/lib/librte_vhost/rte_vdpa.h b/lib/librte_vhost/rte_vdpa.h index
> >>> ecb3d91..2db536c 100644
> >>> --- a/lib/librte_vhost/rte_vdpa.h
> >>> +++ b/lib/librte_vhost/rte_vdpa.h
> >>> @@ -202,22 +202,26 @@ struct rte_vdpa_device *  int
> >>> rte_vdpa_get_device_num(void);
> >>>
> >>> +#define VHOST_QUEUE_ALL VHOST_MAX_VRING
> >>> +
> >>>  /**
> >>>   * @warning
> >>>   * @b EXPERIMENTAL: this API may change without prior notice
> >>>   *
> >>> - * Enable/Disable host notifier mapping for a vdpa port.
> >>> + * Enable/Disable host notifier mapping for a vdpa queue.
> >>>   *
> >>>   * @param vid
> >>>   *  vhost device id
> >>>   * @param enable
> >>>   *  true for host notifier map, false for host notifier unmap
> >>> + * @param qid
> >>> + *  vhost queue id, VHOST_QUEUE_ALL to configure all the device
> >>> + queues
> >> I would prefer two APIs rather than passing a special ID that means all queues:
> >>
> >> rte_vhost_host_notifier_ctrl(int vid, uint16_t qid, bool enable);
> >> rte_vhost_host_notifier_ctrl_all(int vid, bool enable);
> >>
> >> I think it is clearer for the user of the API.
> >> Or if you think an extra API is overkill, just let the driver loop on
> >> all the queues.
> >
> > We have a lot of options here with pros and cons.
> > I took the rte_eth_dev_callback_register style.
> 
> Ok, I didn't look at this code.
> 
> > It is less intrusive with minimum code change.
> >
> > I'm not sure what is the clearest option but the current suggestion is
> > well defined and allows to configure all the queues too.
> >
> > Let me know what you prefer....
> 
> I personally don't like the style, but I can live with it if you prefer doing it like
> that.
> 
> If you do it that way, you will have to rename VHOST_QUEUE_ALL to
> RTE_VHOST_QUEUE_ALL, VHOST_MAX_VRING  to RTE_VHOST_MAX_VRING
> and VHOST_MAX_QUEUE_PAIRS to RTE_VHOST_MAX_QUEUE_PAIRS as it
> will become part of the ABI.
> 
> Note that it also means that we won't be able to increase the maximum
> number of rings without breaking the ABI.

What about defining RTE_VHOST_QUEUE_ALL as UINT16_MAX?
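
That is (sketch):

	/* Not tied to the ring count, so the maximum number of rings
	 * can grow later without breaking the ABI.
	 */
	#define RTE_VHOST_QUEUE_ALL UINT16_MAX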

> >>>   * @return
> >>>   *  0 on success, -1 on failure
> >>>   */
> >>>  __rte_experimental
> >>>  int
> >>> -rte_vhost_host_notifier_ctrl(int vid, bool enable);
> >>> +rte_vhost_host_notifier_ctrl(int vid, uint16_t qid, bool enable);
> >>>
> >>>  /**
> >


^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [dpdk-dev] [PATCH v1 3/4] vhost: improve device ready definition
  2020-06-21  6:20         ` Matan Azrad
@ 2020-06-22  8:04           ` Maxime Coquelin
  2020-06-22  8:41             ` Matan Azrad
  0 siblings, 1 reply; 59+ messages in thread
From: Maxime Coquelin @ 2020-06-22  8:04 UTC (permalink / raw)
  To: Matan Azrad, Xiao Wang; +Cc: dev

Hi,

On 6/21/20 8:20 AM, Matan Azrad wrote:
> Hi Maxime
> 
> From: Maxime Coquelin:
>> Hi Matan,
>>
>> On 6/19/20 3:11 PM, Matan Azrad wrote:
>>> Hi Maxime
>>>
>>> Thanks for the fast review.
>>> This is first version, let's review it carefully to be sure it is correct.
>>> @Xiao Wang, it will be good to hear your idea too.
>>> We also need to understand the effect on IFC driver/device...
>>> Just to update that I checked this code with the mlx5 adjustments and I
>> sent in this series.
>>> It works well with the vDPA example application.
>>
>> OK.
>>
>>> From: Maxime Coquelin:
>>>> On 6/18/20 6:28 PM, Matan Azrad wrote:
>>>>> Some guest drivers may not configure disabled virtio queues.
>>>>>
>>>>> In this case, the vhost management never triggers the vDPA device
>>>>> configuration because it waits for the device to be ready.
>>>>
>>>> This is not vDPA-only, even with SW datapath the application's
>>>> new_device callback never gets called.
>>>>
>>> Yes, I wrote it below, I can be more specific here too in the next version.
>>>
>>>>> The current ready state means that all the virtio queues should be
>>>>> configured regardless of the enablement status.
>>>>>
>>>>> In order to support this case, this patch changes the ready state:
>>>>> The device is ready when at least 1 queue pair is configured and
>>>>> enabled.
>>>>>
>>>>> So, now, the vDPA driver will be configured when the first queue
>>>>> pair is configured and enabled.
>>>>>
>>>>> Also, the queue state operation is changed to the following rules:
>>>>> 	1. queue becomes ready (enabled and fully configured) -
>>>>> 		set_vring_state(enabled).
>>>>> 	2. queue becomes not ready - set_vring_state(disabled).
>>>>> 	3. queue stays ready and VHOST_USER_SET_VRING_ENABLE message
>>>> was
>>>>> 		handled - set_vring_state(enabled).
>>>>>
>>>>> The parallel operations for the application are adjusted too.
>>>>>
>>>>> Signed-off-by: Matan Azrad <matan@mellanox.com>
>>>>> ---
>>>>>  lib/librte_vhost/vhost_user.c | 51
>>>>> ++++++++++++++++++++++++++++---------------
>>>>>  1 file changed, 33 insertions(+), 18 deletions(-)
>>>>>
>>>>> diff --git a/lib/librte_vhost/vhost_user.c
>>>>> b/lib/librte_vhost/vhost_user.c index b0849b9..cfd5f27 100644
>>>>> --- a/lib/librte_vhost/vhost_user.c
>>>>> +++ b/lib/librte_vhost/vhost_user.c
>>>>> @@ -1295,7 +1295,7 @@
>>>>>  {
>>>>>  	bool rings_ok;
>>>>>
>>>>> -	if (!vq)
>>>>> +	if (!vq || !vq->enabled)
>>>>>  		return false;
>>>>>
>>>>>  	if (vq_is_packed(dev))
>>>>> @@ -1309,24 +1309,27 @@
>>>>>  	       vq->callfd != VIRTIO_UNINITIALIZED_EVENTFD;  }
>>>>>
>>>>> +#define VIRTIO_DEV_NUM_VQS_TO_BE_READY 2u
>>>>> +
>>>>>  static int
>>>>>  virtio_is_ready(struct virtio_net *dev)  {
>>>>>  	struct vhost_virtqueue *vq;
>>>>>  	uint32_t i;
>>>>>
>>>>> -	if (dev->nr_vring == 0)
>>>>> +	if (dev->nr_vring < VIRTIO_DEV_NUM_VQS_TO_BE_READY)
>>>>>  		return 0;
>>>>>
>>>>> -	for (i = 0; i < dev->nr_vring; i++) {
>>>>> +	for (i = 0; i < VIRTIO_DEV_NUM_VQS_TO_BE_READY; i++) {
>>>>>  		vq = dev->virtqueue[i];
>>>>>
>>>>>  		if (!vq_is_ready(dev, vq))
>>>>>  			return 0;
>>>>>  	}
>>>>>
>>>>> -	VHOST_LOG_CONFIG(INFO,
>>>>> -		"virtio is now ready for processing.\n");
>>>>> +	if (!(dev->flags & VIRTIO_DEV_READY))
>>>>> +		VHOST_LOG_CONFIG(INFO,
>>>>> +			"virtio is now ready for processing.\n");
>>>>>  	return 1;
>>>>>  }
>>>>>
>>>>> @@ -1970,8 +1973,6 @@ static int vhost_user_set_vring_err(struct
>>>> virtio_net **pdev __rte_unused,
>>>>>  	struct virtio_net *dev = *pdev;
>>>>>  	int enable = (int)msg->payload.state.num;
>>>>>  	int index = (int)msg->payload.state.index;
>>>>> -	struct rte_vdpa_device *vdpa_dev;
>>>>> -	int did = -1;
>>>>>
>>>>>  	if (validate_msg_fds(msg, 0) != 0)
>>>>>  		return RTE_VHOST_MSG_RESULT_ERR;
>>>>> @@ -1980,15 +1981,6 @@ static int vhost_user_set_vring_err(struct
>>>> virtio_net **pdev __rte_unused,
>>>>>  		"set queue enable: %d to qp idx: %d\n",
>>>>>  		enable, index);
>>>>>
>>>>> -	did = dev->vdpa_dev_id;
>>>>> -	vdpa_dev = rte_vdpa_get_device(did);
>>>>> -	if (vdpa_dev && vdpa_dev->ops->set_vring_state)
>>>>> -		vdpa_dev->ops->set_vring_state(dev->vid, index, enable);
>>>>> -
>>>>> -	if (dev->notify_ops->vring_state_changed)
>>>>> -		dev->notify_ops->vring_state_changed(dev->vid,
>>>>> -				index, enable);
>>>>> -
>>>>>  	/* On disable, rings have to be stopped being processed. */
>>>>>  	if (!enable && dev->dequeue_zero_copy)
>>>>>  		drain_zmbuf_list(dev->virtqueue[index]);
>>>>> @@ -2622,11 +2614,13 @@ typedef int
>>>> (*vhost_message_handler_t)(struct virtio_net **pdev,
>>>>>  	struct virtio_net *dev;
>>>>>  	struct VhostUserMsg msg;
>>>>>  	struct rte_vdpa_device *vdpa_dev;
>>>>> +	bool ready[VHOST_MAX_VRING];
>>>>>  	int did = -1;
>>>>>  	int ret;
>>>>>  	int unlock_required = 0;
>>>>>  	bool handled;
>>>>>  	int request;
>>>>> +	uint32_t i;
>>>>>
>>>>>  	dev = get_device(vid);
>>>>>  	if (dev == NULL)
>>>>> @@ -2668,6 +2662,10 @@ typedef int
>> (*vhost_message_handler_t)(struct
>>>> virtio_net **pdev,
>>>>>  		VHOST_LOG_CONFIG(DEBUG, "External request %d\n",
>>>> request);
>>>>>  	}
>>>>>
>>>>> +	/* Save ready status for all the VQs before message handle. */
>>>>> +	for (i = 0; i < VHOST_MAX_VRING; i++)
>>>>> +		ready[i] = vq_is_ready(dev, dev->virtqueue[i]);
>>>>> +
>>>>
>>>> This big array can be avoided if you save the ready status in the
>>>> virtqueue once the message has been handled.
>>>
>>> You mean you prefer to save it in the virtqueue structure? Doesn't it take
>> the same memory?
>>> In any case I don't think 0x100 is so big 😊
>>
>> I mean on the stack.
> 
> Do you think that 256B is too much for the stack?
>  
>> And one advantage of saving it in the vq structure shows up, for example, with
>> memory hotplug. The vq is in ready state at the beginning and at the end, but
>> during the handling the ring host virtual addresses get changed because of
>> the munmap/mmap, and we need to notify the driver, otherwise it will miss it.
> 
> Do you mean a VHOST_USER_SET_MEM_TABLE call after the first configuration?
> 
> I don't understand what the issue is with saving it on the stack here....

The issue is that if you check the ready state only before and after the
message affecting the ring is handled, it can be ready at both stages
while the rings have changed, and the state change callback should have
been called.

Please check the example patch I sent on Friday; it takes care of
invalidating the ring state and calls the state change callback.
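
Just to illustrate the idea (a rough sketch, not the actual patch; the
vq->ready field and the function name are placeholders):

static void
vhost_user_invalidate_vring(struct virtio_net *dev, uint16_t index)
{
	struct vhost_virtqueue *vq = dev->virtqueue[index];
	struct rte_vdpa_device *vdpa_dev =
				rte_vdpa_get_device(dev->vdpa_dev_id);

	/* The ring addresses are about to change (e.g. munmap/mmap on
	 * SET_MEM_TABLE): mark the ring not ready and notify, so the
	 * driver does not keep using stale host virtual addresses.
	 */
	if (!vq || !vq->ready)
		return;
	vq->ready = false;
	if (vdpa_dev && vdpa_dev->ops->set_vring_state)
		vdpa_dev->ops->set_vring_state(dev->vid, index, 0);
	if (dev->notify_ops->vring_state_changed)
		dev->notify_ops->vring_state_changed(dev->vid, index, 0);
}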

> But one advantage of saving it in the virtqueue structure is that the message handler does not need to check the ready state before each message.
> 
> I will change it in next version.
> 
>>>
>>>>>  	ret = vhost_user_check_and_alloc_queue_pair(dev, &msg);
>>>>>  	if (ret < 0) {
>>>>>  		VHOST_LOG_CONFIG(ERR,
>>>>> @@ -2802,6 +2800,25 @@ typedef int
>> (*vhost_message_handler_t)(struct
>>>> virtio_net **pdev,
>>>>>  		return -1;
>>>>>  	}
>>>>>
>>>>> +	did = dev->vdpa_dev_id;
>>>>> +	vdpa_dev = rte_vdpa_get_device(did);
>>>>> +	/* Update ready status. */
>>>>> +	for (i = 0; i < VHOST_MAX_VRING; i++) {
>>>>> +		bool cur_ready = vq_is_ready(dev, dev->virtqueue[i]);
>>>>> +
>>>>> +		if ((cur_ready && request ==
>>>> VHOST_USER_SET_VRING_ENABLE &&
>>>>> +				i == msg.payload.state.index) ||
>>>>
>>>> Couldn't we remove above condition? Aren't the callbacks already
>>>> called in the set_vring_enable handler?
>>>
>>> As we agreed in the design discussion:
>>>
>>> " 3. Same handling of the requests, except that we won't notify the
>>> vdpa driver and the application of vring state changes in the
>>> VHOST_USER_SET_VRING_ENABLE handler."
>>>
>>> So, I removed it from the set_vring_enable handler.
>>
>> My bad, the patch context where it is removed made me think it was in
>> vhost_user_set_vring_err(), so I missed it.
>>
>> Thinking about it again since last time we discussed it, we have to send the
>> notification from the handler in the case
>>
>>> Now, the ready state doesn't depend only on the
>> VHOST_USER_SET_VRING_ENABLE message.
>>>
>>>>> +				cur_ready != ready[i]) {
>>>>> +			if (vdpa_dev && vdpa_dev->ops->set_vring_state)
>>>>> +				vdpa_dev->ops->set_vring_state(dev->vid, i,
>>>>> +
>>>> 	(int)cur_ready);
>>>>> +
>>>>> +			if (dev->notify_ops->vring_state_changed)
>>>>> +				dev->notify_ops->vring_state_changed(dev-
>>>>> vid,
>>>>> +							i, (int)cur_ready);
>>>>> +		}
>>>>> +	}
>>>>
>>>> I think we should move this into a dedicated function, which we would
>>>> call in every message handler that can modify the ready state.
>>>>
>>>> Doing so, we would not have to assume the master sent us a disable
>>>> request for the queue before, and also would have proper
>>>> synchronization if the request uses the reply-ack feature, as it could
>>>> assume the backend is no longer processing the ring once reply-ack is
>> received.
>>>
>>> Makes sense to do it before the reply-ack and to create a dedicated function
>> for it.
>>>
>>> Shouldn't the vDPA configuration also be called before the reply-ack, to be
>> sure the queues are ready before the reply?
>>
>> I don't think so, because the backend can start processing the ring after.
>> What we don't want is that the backend continues to process the rings when
>> the guest asked to stop doing it.
> 
> But "doing configuration after reply" may cause that the a guest kicks a queue while app \ vDPA driver is being configured.
> It may lead to some order dependencies in configuration....
I get your point, we can try to move the configuration before the reply.

But looking at the qemu source code, neither SET_VRING_KICK nor
SET_VRING_CALL nor SET_VRING_ENABLE requests reply-ack, so it won't
have any effect.

> In addition, the device ready state now becomes set only at the same time that a queue becomes ready,
> so we can do the device ready check (for the new_device / dev_conf calls) only when a queue becomes ready, in the same function.

If you want, we can try that too.

>>> If so, we should move also the device ready code below (maybe also vdpa
>> conf) to this function too.
>>
>> So I don't think it is needed.
>>> But maybe calling it directly from this function rather than from the specific
>> message handlers is better, something like the
>> vhost_user_check_and_alloc_queue_pair function style.
>>>
>>> What do you think?
> 
> Any answer here?

To move the .new_device and .dev_conf callbacks into the same function
that sends the vring change notifications? Yes, we can do that, I think.

>>>
>>>>>  	if (!(dev->flags & VIRTIO_DEV_RUNNING) && virtio_is_ready(dev)) {
>>>>>  		dev->flags |= VIRTIO_DEV_READY;
>>>>>
>>>>> @@ -2816,8 +2833,6 @@ typedef int
>> (*vhost_message_handler_t)(struct
>>>> virtio_net **pdev,
>>>>>  		}
>>>>>  	}
>>>>>
>>>>> -	did = dev->vdpa_dev_id;
>>>>> -	vdpa_dev = rte_vdpa_get_device(did);
>>>>>  	if (vdpa_dev && virtio_is_ready(dev) &&
>>>>>  			!(dev->flags & VIRTIO_DEV_VDPA_CONFIGURED)
>>>> &&
>>>>>  			msg.request.master ==
>>>> VHOST_USER_SET_VRING_CALL) {
>>>>
>>>> Shouldn't check on SET_VRING_CALL above be removed?
>>>
>>> Isn't it a workaround for something?
>>>
>>
>> Normally, we should no longer need it, as a state change notification will be
>> sent if the callfd changes.
> 
> Ok, will remove it.
> 


^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [dpdk-dev] [PATCH v1 1/4] vhost: support host notifier queue configuration
  2020-06-21  6:26         ` Matan Azrad
@ 2020-06-22  8:06           ` Maxime Coquelin
  0 siblings, 0 replies; 59+ messages in thread
From: Maxime Coquelin @ 2020-06-22  8:06 UTC (permalink / raw)
  To: Matan Azrad, Xiao Wang; +Cc: dev



On 6/21/20 8:26 AM, Matan Azrad wrote:
> Hi Maxime
> 
> From: Maxime Coquelin:
>> On 6/19/20 3:28 PM, Matan Azrad wrote:
>>>
>>>
>>> From: Maxime Coquelin:
>>>> On 6/18/20 6:28 PM, Matan Azrad wrote:
>>>>> As an arrangement to per queue operations in the vDPA device it is
>>>>> needed to change the next experimental API:
>>>>>
>>>>> The API ``rte_vhost_host_notifier_ctrl`` was changed to be per queue
>>>>> instead of per device.
>>>>>
>>>>> A `qid` parameter was added to the API arguments list.
>>>>>
>>>>> Setting the parameter to the value VHOST_QUEUE_ALL will configure
>>>>> the host notifier to all the device queues as done before this patch.
>>>>>
>>>>> Signed-off-by: Matan Azrad <matan@mellanox.com>
>>>>> ---
>>>>>  doc/guides/rel_notes/release_20_08.rst |  2 ++
>>>>>  drivers/vdpa/ifc/ifcvf_vdpa.c          |  6 +++---
>>>>>  drivers/vdpa/mlx5/mlx5_vdpa.c          |  5 +++--
>>>>>  lib/librte_vhost/rte_vdpa.h            |  8 ++++++--
>>>>>  lib/librte_vhost/rte_vhost.h           |  2 ++
>>>>>  lib/librte_vhost/vhost.h               |  3 ---
>>>>>  lib/librte_vhost/vhost_user.c          | 18 ++++++++++++++----
>>>>>  7 files changed, 30 insertions(+), 14 deletions(-)
>>>>>
>>>>> diff --git a/doc/guides/rel_notes/release_20_08.rst
>>>>> b/doc/guides/rel_notes/release_20_08.rst
>>>>> index ba16d3b..9732959 100644
>>>>> --- a/doc/guides/rel_notes/release_20_08.rst
>>>>> +++ b/doc/guides/rel_notes/release_20_08.rst
>>>>> @@ -111,6 +111,8 @@ API Changes
>>>>>     Also, make sure to start the actual text at the margin.
>>>>>
>>>>
>> =========================================================
>>>>>
>>>>> +* vhost: The API of ``rte_vhost_host_notifier_ctrl`` was changed to
>>>>> +be per
>>>>> +  queue and not per device, a qid parameter was added to the
>>>>> +arguments
>>>> list.
>>>>>
>>>>>  ABI Changes
>>>>>  -----------
>>>>> diff --git a/drivers/vdpa/ifc/ifcvf_vdpa.c
>>>>> b/drivers/vdpa/ifc/ifcvf_vdpa.c index ec97178..336837a 100644
>>>>> --- a/drivers/vdpa/ifc/ifcvf_vdpa.c
>>>>> +++ b/drivers/vdpa/ifc/ifcvf_vdpa.c
>>>>> @@ -839,7 +839,7 @@ struct internal_list {
>>>>>  	vdpa_ifcvf_stop(internal);
>>>>>  	vdpa_disable_vfio_intr(internal);
>>>>>
>>>>> -	ret = rte_vhost_host_notifier_ctrl(vid, false);
>>>>> +	ret = rte_vhost_host_notifier_ctrl(vid, VHOST_QUEUE_ALL, false);
>>>>>  	if (ret && ret != -ENOTSUP)
>>>>>  		goto error;
>>>>>
>>>>> @@ -858,7 +858,7 @@ struct internal_list {
>>>>>  	if (ret)
>>>>>  		goto stop_vf;
>>>>>
>>>>> -	rte_vhost_host_notifier_ctrl(vid, true);
>>>>> +	rte_vhost_host_notifier_ctrl(vid, VHOST_QUEUE_ALL, true);
>>>>>
>>>>>  	internal->sw_fallback_running = true;
>>>>>
>>>>> @@ -893,7 +893,7 @@ struct internal_list {
>>>>>  	rte_atomic32_set(&internal->dev_attached, 1);
>>>>>  	update_datapath(internal);
>>>>>
>>>>> -	if (rte_vhost_host_notifier_ctrl(vid, true) != 0)
>>>>> +	if (rte_vhost_host_notifier_ctrl(vid, VHOST_QUEUE_ALL, true) != 0)
>>>>>  		DRV_LOG(NOTICE, "vDPA (%d): software relay is used.", did);
>>>>>
>>>>>  	return 0;
>>>>> diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.c
>>>>> b/drivers/vdpa/mlx5/mlx5_vdpa.c index 9e758b6..8ea1300 100644
>>>>> --- a/drivers/vdpa/mlx5/mlx5_vdpa.c
>>>>> +++ b/drivers/vdpa/mlx5/mlx5_vdpa.c
>>>>> @@ -147,7 +147,8 @@
>>>>>  	int ret;
>>>>>
>>>>>  	if (priv->direct_notifier) {
>>>>> -		ret = rte_vhost_host_notifier_ctrl(priv->vid, false);
>>>>> +		ret = rte_vhost_host_notifier_ctrl(priv->vid,
>>>> VHOST_QUEUE_ALL,
>>>>> +						   false);
>>>>>  		if (ret != 0) {
>>>>>  			DRV_LOG(INFO, "Direct HW notifier FD cannot be "
>>>>>  				"destroyed for device %d: %d.", priv->vid,
>>>> ret); @@ -155,7 +156,7
>>>>> @@
>>>>>  		}
>>>>>  		priv->direct_notifier = 0;
>>>>>  	}
>>>>> -	ret = rte_vhost_host_notifier_ctrl(priv->vid, true);
>>>>> +	ret = rte_vhost_host_notifier_ctrl(priv->vid, VHOST_QUEUE_ALL,
>>>>> +true);
>>>>>  	if (ret != 0)
>>>>>  		DRV_LOG(INFO, "Direct HW notifier FD cannot be configured
>>>> for"
>>>>>  			" device %d: %d.", priv->vid, ret); diff --git
>>>>> a/lib/librte_vhost/rte_vdpa.h b/lib/librte_vhost/rte_vdpa.h index
>>>>> ecb3d91..2db536c 100644
>>>>> --- a/lib/librte_vhost/rte_vdpa.h
>>>>> +++ b/lib/librte_vhost/rte_vdpa.h
>>>>> @@ -202,22 +202,26 @@ struct rte_vdpa_device *  int
>>>>> rte_vdpa_get_device_num(void);
>>>>>
>>>>> +#define VHOST_QUEUE_ALL VHOST_MAX_VRING
>>>>> +
>>>>>  /**
>>>>>   * @warning
>>>>>   * @b EXPERIMENTAL: this API may change without prior notice
>>>>>   *
>>>>> - * Enable/Disable host notifier mapping for a vdpa port.
>>>>> + * Enable/Disable host notifier mapping for a vdpa queue.
>>>>>   *
>>>>>   * @param vid
>>>>>   *  vhost device id
>>>>>   * @param enable
>>>>>   *  true for host notifier map, false for host notifier unmap
>>>>> + * @param qid
>>>>> + *  vhost queue id, VHOST_QUEUE_ALL to configure all the device
>>>>> + queues
>>>> I would prefer two APIs rather than passing a special ID that means all queues:
>>>>
>>>> rte_vhost_host_notifier_ctrl(int vid, uint16_t qid, bool enable);
>>>> rte_vhost_host_notifier_ctrl_all(int vid, bool enable);
>>>>
>>>> I think it is clearer for the user of the API.
>>>> Or if you think an extra API is overkill, just let the driver loop on
>>>> all the queues.
>>>
>>> We have a lot of options here with pros and cons.
>>> I took the rte_eth_dev_callback_register style.
>>
>> Ok, I didn't look at this code.
>>
>>> It is less intrusive with minimum code change.
>>>
>>> I'm not sure which option is the clearest, but the current suggestion is
>>> well defined and allows configuring all the queues too.
>>>
>>> Let me know what you prefer....
>>
>> I personally don't like the style, but I can live with it if you prefer doing it like
>> that.
>>
>> If you do it that way, you will have to rename VHOST_QUEUE_ALL to
>> RTE_VHOST_QUEUE_ALL, VHOST_MAX_VRING  to RTE_VHOST_MAX_VRING
>> and VHOST_MAX_QUEUE_PAIRS to RTE_VHOST_MAX_QUEUE_PAIRS as it
>> will become part of the ABI.
>>
>> Note that it also means that we won't be able to increase the maximum
>> number of rings without breaking the ABI.
> 
> What about defining RTE_VHOST_QUEUE_ALL as UINT16_MAX?

I am not a fan, but it is better than basing it on VHOST_MAX_QUEUE_PAIRS.

>>>>>   * @return
>>>>>   *  0 on success, -1 on failure
>>>>>   */
>>>>>  __rte_experimental
>>>>>  int
>>>>> -rte_vhost_host_notifier_ctrl(int vid, bool enable);
>>>>> +rte_vhost_host_notifier_ctrl(int vid, uint16_t qid, bool enable);
>>>>>
>>>>>  /**
>>>
> 
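
For illustration, here is a minimal sketch of the looping alternative
suggested above: a driver covering all queues with the per-queue API
instead of passing a special "all queues" value. The helper name is
hypothetical; rte_vhost_get_vring_num() is an existing vhost API, and
the per-queue rte_vhost_host_notifier_ctrl() prototype is the one
proposed by this patch.

#include <errno.h>
#include <stdbool.h>
#include <rte_vhost.h>
#include <rte_vdpa.h>

/* Hypothetical driver-side helper: apply the host notifier control to
 * every vring of the device, one queue at a time.
 */
static int
host_notifier_ctrl_all_queues(int vid, bool enable)
{
	uint16_t nr_vring = rte_vhost_get_vring_num(vid);
	uint16_t qid;
	int ret;

	for (qid = 0; qid < nr_vring; qid++) {
		ret = rte_vhost_host_notifier_ctrl(vid, qid, enable);
		if (ret != 0 && ret != -ENOTSUP)
			return ret;
	}
	return 0;
}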


^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [dpdk-dev] [PATCH v1 3/4] vhost: improve device ready definition
  2020-06-22  8:04           ` Maxime Coquelin
@ 2020-06-22  8:41             ` Matan Azrad
  2020-06-22  8:56               ` Maxime Coquelin
  0 siblings, 1 reply; 59+ messages in thread
From: Matan Azrad @ 2020-06-22  8:41 UTC (permalink / raw)
  To: Maxime Coquelin, Xiao Wang; +Cc: dev



From: Maxime Coquelin
> Hi,
> 
> On 6/21/20 8:20 AM, Matan Azrad wrote:
> > Hi Maxime
> >
> > From: Maxime Coquelin:
> >> Hi Matan,
> >>
> >> On 6/19/20 3:11 PM, Matan Azrad wrote:
> >>> Hi Maxime
> >>>
> >>> Thanks for the fast review.
> >>> This is the first version; let's review it carefully to be sure it is correct.
> >>> @Xiao Wang, it will be good to hear your idea too.
> >>> We also need to understand the effect on IFC driver/device...
> >>> Just to update that I checked this code with the mlx5 adjustments
> >>> I
> >> sent in this series.
> >>> It works well with the vDPA example application.
> >>
> >> OK.
> >>
> >>> From: Maxime Coquelin:
> >>>> On 6/18/20 6:28 PM, Matan Azrad wrote:
> >>>>> Some guest drivers may not configure disabled virtio queues.
> >>>>>
> >>>>> In this case, the vhost management never triggers the vDPA device
> >>>>> configuration because it waits to the device to be ready.
> >>>>
> >>>> This is not vDPA-only, even with SW datapath the application's
> >>>> new_device callback never gets called.
> >>>>
> >>> Yes, I wrote it below, I can be more specific here too in the next version.
> >>>
> >>>>> The current ready state means that all the virtio queues should be
> >>>>> configured regardless of the enablement status.
> >>>>>
> >>>>> In order to support this case, this patch changes the ready state:
> >>>>> The device is ready when at least 1 queue pair is configured and
> >>>>> enabled.
> >>>>>
> >>>>> So, now, the vDPA driver will be configured when the first queue
> >>>>> pair is configured and enabled.
> >>>>>
> >>>>> Also, the queue state operation is changed to the following rules:
> >>>>> 	1. queue becomes ready (enabled and fully configured) -
> >>>>> 		set_vring_state(enabled).
> >>>>> 	2. queue becomes not ready - set_vring_state(disabled).
> >>>>> 	3. queue stays ready and VHOST_USER_SET_VRING_ENABLE message
> >>>> was
> >>>>> 		handled - set_vring_state(enabled).
> >>>>>
> >>>>> The parallel operations for the application are adjusted too.
> >>>>>
> >>>>> Signed-off-by: Matan Azrad <matan@mellanox.com>
> >>>>> ---
> >>>>>  lib/librte_vhost/vhost_user.c | 51
> >>>>> ++++++++++++++++++++++++++++---------------
> >>>>>  1 file changed, 33 insertions(+), 18 deletions(-)
> >>>>>
> >>>>> diff --git a/lib/librte_vhost/vhost_user.c
> >>>>> b/lib/librte_vhost/vhost_user.c index b0849b9..cfd5f27 100644
> >>>>> --- a/lib/librte_vhost/vhost_user.c
> >>>>> +++ b/lib/librte_vhost/vhost_user.c
> >>>>> @@ -1295,7 +1295,7 @@
> >>>>>  {
> >>>>>  	bool rings_ok;
> >>>>>
> >>>>> -	if (!vq)
> >>>>> +	if (!vq || !vq->enabled)
> >>>>>  		return false;
> >>>>>
> >>>>>  	if (vq_is_packed(dev))
> >>>>> @@ -1309,24 +1309,27 @@
> >>>>>  	       vq->callfd != VIRTIO_UNINITIALIZED_EVENTFD;  }
> >>>>>
> >>>>> +#define VIRTIO_DEV_NUM_VQS_TO_BE_READY 2u
> >>>>> +
> >>>>>  static int
> >>>>>  virtio_is_ready(struct virtio_net *dev)  {
> >>>>>  	struct vhost_virtqueue *vq;
> >>>>>  	uint32_t i;
> >>>>>
> >>>>> -	if (dev->nr_vring == 0)
> >>>>> +	if (dev->nr_vring < VIRTIO_DEV_NUM_VQS_TO_BE_READY)
> >>>>>  		return 0;
> >>>>>
> >>>>> -	for (i = 0; i < dev->nr_vring; i++) {
> >>>>> +	for (i = 0; i < VIRTIO_DEV_NUM_VQS_TO_BE_READY; i++) {
> >>>>>  		vq = dev->virtqueue[i];
> >>>>>
> >>>>>  		if (!vq_is_ready(dev, vq))
> >>>>>  			return 0;
> >>>>>  	}
> >>>>>
> >>>>> -	VHOST_LOG_CONFIG(INFO,
> >>>>> -		"virtio is now ready for processing.\n");
> >>>>> +	if (!(dev->flags & VIRTIO_DEV_READY))
> >>>>> +		VHOST_LOG_CONFIG(INFO,
> >>>>> +			"virtio is now ready for processing.\n");
> >>>>>  	return 1;
> >>>>>  }
> >>>>>
> >>>>> @@ -1970,8 +1973,6 @@ static int vhost_user_set_vring_err(struct
> >>>> virtio_net **pdev __rte_unused,
> >>>>>  	struct virtio_net *dev = *pdev;
> >>>>>  	int enable = (int)msg->payload.state.num;
> >>>>>  	int index = (int)msg->payload.state.index;
> >>>>> -	struct rte_vdpa_device *vdpa_dev;
> >>>>> -	int did = -1;
> >>>>>
> >>>>>  	if (validate_msg_fds(msg, 0) != 0)
> >>>>>  		return RTE_VHOST_MSG_RESULT_ERR; @@ -1980,15 +1981,6
> @@ static
> >>>>> int vhost_user_set_vring_err(struct
> >>>> virtio_net **pdev __rte_unused,
> >>>>>  		"set queue enable: %d to qp idx: %d\n",
> >>>>>  		enable, index);
> >>>>>
> >>>>> -	did = dev->vdpa_dev_id;
> >>>>> -	vdpa_dev = rte_vdpa_get_device(did);
> >>>>> -	if (vdpa_dev && vdpa_dev->ops->set_vring_state)
> >>>>> -		vdpa_dev->ops->set_vring_state(dev->vid, index,
> enable);
> >>>>> -
> >>>>> -	if (dev->notify_ops->vring_state_changed)
> >>>>> -		dev->notify_ops->vring_state_changed(dev->vid,
> >>>>> -				index, enable);
> >>>>> -
> >>>>>  	/* On disable, rings have to be stopped being processed. */
> >>>>>  	if (!enable && dev->dequeue_zero_copy)
> >>>>>  		drain_zmbuf_list(dev->virtqueue[index]);
> >>>>> @@ -2622,11 +2614,13 @@ typedef int
> >>>> (*vhost_message_handler_t)(struct virtio_net **pdev,
> >>>>>  	struct virtio_net *dev;
> >>>>>  	struct VhostUserMsg msg;
> >>>>>  	struct rte_vdpa_device *vdpa_dev;
> >>>>> +	bool ready[VHOST_MAX_VRING];
> >>>>>  	int did = -1;
> >>>>>  	int ret;
> >>>>>  	int unlock_required = 0;
> >>>>>  	bool handled;
> >>>>>  	int request;
> >>>>> +	uint32_t i;
> >>>>>
> >>>>>  	dev = get_device(vid);
> >>>>>  	if (dev == NULL)
> >>>>> @@ -2668,6 +2662,10 @@ typedef int
> >> (*vhost_message_handler_t)(struct
> >>>> virtio_net **pdev,
> >>>>>  		VHOST_LOG_CONFIG(DEBUG, "External request %d\n",
> >>>> request);
> >>>>>  	}
> >>>>>
> >>>>> +	/* Save ready status for all the VQs before message handle.
> */
> >>>>> +	for (i = 0; i < VHOST_MAX_VRING; i++)
> >>>>> +		ready[i] = vq_is_ready(dev, dev->virtqueue[i]);
> >>>>> +
> >>>>
> >>>> This big array can be avoided if you save the ready status in the
> >>>> virtqueue once the message has been handled.
> >>>
> >>> You mean you prefer to save it in the virtqueue structure? Isn't it the
> >>> same
> >> memory?
> >>> In any case I don't think 0x100 is so big 😊
> >>
> >> I mean in the stack.
> >
> > Do you think that 256B is too much for the stack?
> >
> >> And one advantage of saving it in the vq structure is, for example, when
> >> you have memory hotplug: the vq is in ready state at the beginning and at
> >> the end, but during the handling the ring host virtual addresses get
> >> changed because of the munmap/mmap, and we need to notify the driver,
> otherwise it will miss it.
> >
> > Do you mean a VHOST_USER_SET_MEM_TABLE call after the first configuration?
> >
> > I don't understand what the issue is with saving it on the stack here....
> 
> The issue is that if you check the ready state only before and after the message
> affecting the ring is handled, it can be ready at both stages, while the rings
> have changed and the state change callback should have been called.

But in this version I checked twice, before and after the message handler, so it should catch any update.

In any case, as I said, I will move the ready memory to the virtqueue structure in order to save the check before the message handler.
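
For illustration, a minimal sketch of that direction, with assumed
names (a new "bool ready" field in struct vhost_virtqueue recording
the last state notified to the driver and the application -- not the
actual v2 code):

/* Sketch: notify the vDPA driver and the application only when the
 * ready state of the vring differs from the last notified one.
 */
static void
vhost_user_update_vring_state(struct virtio_net *dev, uint32_t index)
{
	struct vhost_virtqueue *vq = dev->virtqueue[index];
	struct rte_vdpa_device *vdpa_dev;
	bool cur;

	if (vq == NULL)
		return;
	cur = vq_is_ready(dev, vq);
	if (cur == vq->ready)
		return; /* no transition since the last notification */

	vq->ready = cur;
	vdpa_dev = rte_vdpa_get_device(dev->vdpa_dev_id);
	if (vdpa_dev && vdpa_dev->ops->set_vring_state)
		vdpa_dev->ops->set_vring_state(dev->vid, index, (int)cur);
	if (dev->notify_ops->vring_state_changed)
		dev->notify_ops->vring_state_changed(dev->vid, index,
				(int)cur);
}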
 
> Please check the example patch I sent on Friday, it takes care of invalidating
> the ring state and call the state change callback.
> 
> > But one advantage of saving it in the virtqueue structure is that the message
> handler does not need to check the ready state before each message.
> >
> > I will change it in the next version.
> >
> >>>
> >>>>>  	ret = vhost_user_check_and_alloc_queue_pair(dev, &msg);
> >>>>>  	if (ret < 0) {
> >>>>>  		VHOST_LOG_CONFIG(ERR,
> >>>>> @@ -2802,6 +2800,25 @@ typedef int
> >> (*vhost_message_handler_t)(struct
> >>>> virtio_net **pdev,
> >>>>>  		return -1;
> >>>>>  	}
> >>>>>
> >>>>> +	did = dev->vdpa_dev_id;
> >>>>> +	vdpa_dev = rte_vdpa_get_device(did);
> >>>>> +	/* Update ready status. */
> >>>>> +	for (i = 0; i < VHOST_MAX_VRING; i++) {
> >>>>> +		bool cur_ready = vq_is_ready(dev, dev-
> >virtqueue[i]);
> >>>>> +
> >>>>> +		if ((cur_ready && request ==
> >>>> VHOST_USER_SET_VRING_ENABLE &&
> >>>>> +				i == msg.payload.state.index) ||
> >>>>
> >>>> Couldn't we remove above condition? Aren't the callbacks already
> >>>> called in the set_vring_enable handler?
> >>>
> >>> As we agreed in the design discussion:
> >>>
> >>> " 3. Same handling of the requests, except that we won't notify the
> >>> vdpa driver and the application of vring state changes in the
> >>> VHOST_USER_SET_VRING_ENABLE handler."
> >>>
> >>> So, I removed it from the set_vring_enable handler.
> >>
> >> My bad, the patch context where it is removed made me think it was in
> >> vhost_user_set_vring_err(), so I missed it.
> >>
> >> Thinking about it again since last time we discussed it, we have to send
> >> the notification from the handler in the case
> >>
> >>> Now, the ready state doesn't depend only on the
> >> VHOST_USER_SET_VRING_ENABLE message.
> >>>
> >>>>> +				cur_ready != ready[i]) {
> >>>>> +			if (vdpa_dev && vdpa_dev->ops-
> >set_vring_state)
> >>>>> +				vdpa_dev->ops-
> >set_vring_state(dev->vid, i,
> >>>>> +
> >>>> 	(int)cur_ready);
> >>>>> +
> >>>>> +			if (dev->notify_ops->vring_state_changed)
> >>>>> +				dev->notify_ops-
> >vring_state_changed(dev-
> >>>>> vid,
> >>>>> +							i,
> (int)cur_ready);
> >>>>> +		}
> >>>>> +	}
> >>>>
> >>>> I think we should move this into a dedicated function, which we
> >>>> would call in every message handler that can modify the ready state.
> >>>>
> >>>> Doing so, we would not have to assume the master sent us a disable
> >>>> request for the queue before, and also would have proper
> >>>> synchronization if the request uses the reply-ack feature, as it could
> >>>> assume the backend is no longer processing the ring once reply-ack is
> >> received.
> >>>
> >>> Makes sense to do it before reply-ack and to create a dedicated
> >>> function for
> >> it.
> >>>
> >>> Shouldn't the vDPA conf be called before reply-ack too, to be
> >>> sure
> >> queues are ready before reply?
> >>
> >> I don't think so, because the backend can start processing the ring after.
> >> What we don't want is that the backend continues to process the rings
> >> when the guest asked to stop doing it.
> >
> > But "doing configuration after reply" may cause that the a guest kicks a
> queue while app \ vDPA driver is being configured.
> > It may lead to some order dependencies in configuration....
> I get your point, we can try to move the configuration before the reply.
> 
> But looking at the qemu source code, neither SET_VRING_KICK nor
> SET_VRING_CALL nor SET_VRING_ENABLE requests reply-ack, so it won't
> have any effect.
> 
> > In addition, now, the device ready state becomes on only at the same
> > time that a queue becomes on, so we can do the device ready check (for
> new_device \ dev_conf calls) only when a queue becomes ready in the same
> function.
> 
> If you want, we can do try that too.
> 
> >>> If so, we should move also the device ready code below (maybe also
> >>> vdpa
> >> conf) to this function too.
> >>
> >> So I don't think it is needed.
> >>> But maybe calling it directly from this function and not from the
> >>> specific
> >> message handlers is better, something like the
> >> vhost_user_check_and_alloc_queue_pair function style.
> >>>
> >>> What do you think?
> >
> > Any answer here?
> 
> To move the .new_device and .dev_conf callbacks into the same function that
> sends the vring change notifications? Yes, I think we can do that.
> 
> >>>
> >>>>>  	if (!(dev->flags & VIRTIO_DEV_RUNNING) && virtio_is_ready(dev)) {
> >>>>>  		dev->flags |= VIRTIO_DEV_READY;
> >>>>>
> >>>>> @@ -2816,8 +2833,6 @@ typedef int
> >> (*vhost_message_handler_t)(struct
> >>>> virtio_net **pdev,
> >>>>>  		}
> >>>>>  	}
> >>>>>
> >>>>> -	did = dev->vdpa_dev_id;
> >>>>> -	vdpa_dev = rte_vdpa_get_device(did);
> >>>>>  	if (vdpa_dev && virtio_is_ready(dev) &&
> >>>>>  			!(dev->flags & VIRTIO_DEV_VDPA_CONFIGURED)
> >>>> &&
> >>>>>  			msg.request.master ==
> >>>> VHOST_USER_SET_VRING_CALL) {
> >>>>
> >>>> Shouldn't the check on SET_VRING_CALL above be removed?
> >>>
> >>> Isn't it a workaround for something?
> >>>
> >>
> >> Normally, we should no longer need it, as a state change notification
> >> will be sent if the callfd changes.
> >
> > Ok, will remove it.
> >


^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [dpdk-dev] [PATCH v1 3/4] vhost: improve device ready definition
  2020-06-22  8:41             ` Matan Azrad
@ 2020-06-22  8:56               ` Maxime Coquelin
  2020-06-22 10:06                 ` Matan Azrad
  0 siblings, 1 reply; 59+ messages in thread
From: Maxime Coquelin @ 2020-06-22  8:56 UTC (permalink / raw)
  To: Matan Azrad, Xiao Wang; +Cc: dev



On 6/22/20 10:41 AM, Matan Azrad wrote:
>> The issue is if you only check ready state only before and after the message
>> affecting the ring is handled, it can be ready at both stages, while the rings
>> have changed and state change callback should have been called.
> But in this version I checked twice, before message handler and after message handler, so it should catch any update.

No, this is not enough; we also have to check during some handlers, so
that the ready state is invalidated, because sometimes it will be ready
before and after the message handler but with different values.

That's what I did in my example patch:
@@ -1847,15 +1892,16 @@ vhost_user_set_vring_kick(struct virtio_net
**pdev, struct VhostUserMsg *msg,

...

        if (vq->kickfd >= 0)
                close(vq->kickfd);
+
+       vq->kickfd = VIRTIO_UNINITIALIZED_EVENTFD;
+
+       vhost_user_update_vring_state(dev, file.index);
+
        vq->kickfd = file.fd;


Without that, the ready check will return ready before and after the
kickfd change, and the driver won't be notified.

> In any case, as I said, I will move the ready memory to the virtqueue structure in order to save the check before the message handler.
>  
>> Please check the example patch I sent on Friday, it takes care of invalidating
>> the ring state and call the state change callback.
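
Spelled out, the sequence that hunk aims at looks like this (a sketch,
not the actual example patch, inside the SET_VRING_KICK handler where
vq and file are the handler's locals; vhost_user_update_vring_state()
is assumed to notify only on ready-state transitions, and the second
call stands for the notification that fires once the new descriptor
is taken into account):

	if (vq->kickfd >= 0)
		close(vq->kickfd);

	/* Readiness drops while no valid kick FD is installed, so a
	 * ready -> not-ready transition is reported here...
	 */
	vq->kickfd = VIRTIO_UNINITIALIZED_EVENTFD;
	vhost_user_update_vring_state(dev, file.index);

	/* ...and a not-ready -> ready transition is reported after
	 * the new FD is in place.
	 */
	vq->kickfd = file.fd;
	vhost_user_update_vring_state(dev, file.index);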


^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [dpdk-dev] [PATCH v1 3/4] vhost: improve device ready definition
  2020-06-22  8:56               ` Maxime Coquelin
@ 2020-06-22 10:06                 ` Matan Azrad
  2020-06-22 12:32                   ` Maxime Coquelin
  0 siblings, 1 reply; 59+ messages in thread
From: Matan Azrad @ 2020-06-22 10:06 UTC (permalink / raw)
  To: Maxime Coquelin, Xiao Wang; +Cc: dev


Hi Maxime

From: Maxime Coquelin <maxime.coquelin@redhat.com>
> Sent: Monday, June 22, 2020 11:56 AM
> To: Matan Azrad <matan@mellanox.com>; Xiao Wang
> <xiao.w.wang@intel.com>
> Cc: dev@dpdk.org
> Subject: Re: [PATCH v1 3/4] vhost: improve device ready definition
> 
> 
> 
> On 6/22/20 10:41 AM, Matan Azrad wrote:
> >> The issue is if you only check ready state only before and after the
> >> message affecting the ring is handled, it can be ready at both
> >> stages, while the rings have changed and state change callback should
> have been called.
> > But in this version I checked twice, before message handler and after
> message handler, so it should catch any update.
> 
> No, this is not enough, we have to check also during some handlers, so that
> the ready state is invalidated because sometimes it will be ready before and
> after the message handler but with different values.
> 
> That's what I did in my example patch:
> @@ -1847,15 +1892,16 @@ vhost_user_set_vring_kick(struct virtio_net
> **pdev, struct VhostUserMsg *msg,
> 
> ...
> 
>         if (vq->kickfd >= 0)
>                 close(vq->kickfd);
> +
> +       vq->kickfd = VIRTIO_UNINITIALIZED_EVENTFD;
> +
> +       vhost_user_update_vring_state(dev, file.index);
> +
>         vq->kickfd = file.fd;
> 
> 
> Without that, the ready check will return ready before and after the kickfd
> changed and the driver won't be notified.

The driver will be notified in the next VHOST_USER_SET_VRING_ENABLE message according to v1.

One of the assumptions we agreed on in the design mail is that it doesn't make sense for QEMU to change a queue configuration without enabling the queue again.
Because of that, we decided to force calling the state callback again when QEMU sends the VHOST_USER_SET_VRING_ENABLE(1) message even if the queue is already ready.
So when the driver/app sees an enable->enable state transition, it should take into account that the queue configuration was probably changed.

I think that this assumption is correct according to the QEMU code.

That's why I prefer to collect all the ready checks and callbacks (queue state and device new\conf) into one function that will be called after the message handler:
Pseudo:
 vhost_user_update_ready_statuses() {
	switch (msg):
		case enable:
			if(enable is 1)
				force queue state =1.
		case callfd
		case kickfd
				.....
		Check queue and device ready + call callbacks if needed..
		Default
			Return;
}
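
In C, that pseudo-code could look roughly as follows (a sketch under
assumed names: vq_is_ready() as in this series, the per-vq "ready"
flag discussed earlier, and a hypothetical notify_vring_state() helper
doing the set_vring_state()/vring_state_changed() calls from the v1
diff):

static void
vhost_user_update_ready_statuses(struct virtio_net *dev,
		const struct VhostUserMsg *msg)
{
	bool force = false;
	int idx = -1;
	uint32_t i;

	switch (msg->request.master) {
	case VHOST_USER_SET_VRING_ENABLE:
		/* enable(1) re-notifies even if the ring was already
		 * ready, per the enable->enable convention above */
		force = msg->payload.state.num == 1;
		idx = (int)msg->payload.state.index;
		break;
	case VHOST_USER_SET_VRING_KICK:
	case VHOST_USER_SET_VRING_CALL:
	case VHOST_USER_SET_VRING_NUM:
	case VHOST_USER_SET_VRING_ADDR:
	case VHOST_USER_SET_MEM_TABLE:
		break; /* may change readiness, re-check below */
	default:
		return; /* cannot change readiness */
	}

	for (i = 0; i < VHOST_MAX_VRING; i++) {
		struct vhost_virtqueue *vq = dev->virtqueue[i];
		bool cur;

		if (vq == NULL)
			continue;
		cur = vq_is_ready(dev, vq);
		if (cur != vq->ready || (force && (int)i == idx)) {
			vq->ready = cur;
			notify_vring_state(dev, i, cur);
		}
	}
	/* device-level readiness (new_device / dev_conf) would be
	 * re-evaluated here as well, at this single point */
}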







^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [dpdk-dev] [PATCH v1 3/4] vhost: improve device ready definition
  2020-06-22 10:06                 ` Matan Azrad
@ 2020-06-22 12:32                   ` Maxime Coquelin
  2020-06-22 13:43                     ` Matan Azrad
  0 siblings, 1 reply; 59+ messages in thread
From: Maxime Coquelin @ 2020-06-22 12:32 UTC (permalink / raw)
  To: Matan Azrad, Xiao Wang; +Cc: dev



On 6/22/20 12:06 PM, Matan Azrad wrote:
> 
> Hi Maxime
> 
> From: Maxime Coquelin <maxime.coquelin@redhat.com>
>> Sent: Monday, June 22, 2020 11:56 AM
>> To: Matan Azrad <matan@mellanox.com>; Xiao Wang
>> <xiao.w.wang@intel.com>
>> Cc: dev@dpdk.org
>> Subject: Re: [PATCH v1 3/4] vhost: improve device ready definition
>>
>>
>>
>> On 6/22/20 10:41 AM, Matan Azrad wrote:
>>>> The issue is if you only check ready state only before and after the
>>>> message affecting the ring is handled, it can be ready at both
>>>> stages, while the rings have changed and state change callback should
>> have been called.
>>> But in this version I checked twice, before message handler and after
>> message handler, so it should catch any update.
>>
>> No, this is not enough, we have to check also during some handlers, so that
>> the ready state is invalidated because sometimes it will be ready before and
>> after the message handler but with different values.
>>
>> That's what I did in my example patch:
>> @@ -1847,15 +1892,16 @@ vhost_user_set_vring_kick(struct virtio_net
>> **pdev, struct VhostUserMsg *msg,
>>
>> ...
>>
>>         if (vq->kickfd >= 0)
>>                 close(vq->kickfd);
>> +
>> +       vq->kickfd = VIRTIO_UNINITIALIZED_EVENTFD;
>> +
>> +       vhost_user_update_vring_state(dev, file.index);
>> +
>>         vq->kickfd = file.fd;
>>
>>
>> Without that, the ready check will return ready before and after the kickfd
>> changed and the driver won't be notified.
> 
> The driver will be notified in the next VHOST_USER_SET_VRING_ENABLE message according to v1.
> 
> One of our assumption we agreed on in the design mail is that it doesn't make sense that QEMU will change queue configuration without enabling the queue again.
> Because of that we decided to force calling state callback again when QEMU send VHOST_USER_SET_VRING_ENABLE(1) message even if the queue is already ready.
> So when driver/app see state enable->enable, it should take into account that the queue configuration was probably changed.
> 
> I think that this assumption is correct according to the QEMU code.

Yes, this was our initial assumption.
But now, looking into the details of the implementation, I find it is
even cleaner & clearer not to make this assumption.

> That's why I prefer to collect all the ready checks callbacks (queue state and device new\conf) to one function that will be called after the message handler:
> Pseudo:
>  vhost_user_update_ready_statuses() {
> 	switch (msg):
> 		case enable:
> 			if(enable is 1)
> 				force queue state =1.
> 		case callfd
> 		case kickfd
> 				.....
> 		Check queue and device ready + call callbacks if needed..
> 		Default
> 			Return;
> }

I find it more natural to "invalidate" the ready state where it is handled
(after vring_invalidate(), before setting a new FD for call & kick, ...)



^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [dpdk-dev] [PATCH v1 3/4] vhost: improve device ready definition
  2020-06-22 12:32                   ` Maxime Coquelin
@ 2020-06-22 13:43                     ` Matan Azrad
  2020-06-22 14:55                       ` Maxime Coquelin
  0 siblings, 1 reply; 59+ messages in thread
From: Matan Azrad @ 2020-06-22 13:43 UTC (permalink / raw)
  To: Maxime Coquelin, Xiao Wang; +Cc: dev



From: Maxime Coquelin:
> Sent: Monday, June 22, 2020 3:33 PM
> To: Matan Azrad <matan@mellanox.com>; Xiao Wang
> <xiao.w.wang@intel.com>
> Cc: dev@dpdk.org
> Subject: Re: [PATCH v1 3/4] vhost: improve device ready definition
> 
> 
> 
> On 6/22/20 12:06 PM, Matan Azrad wrote:
> >
> > Hi Maxime
> >
> > From: Maxime Coquelin <maxime.coquelin@redhat.com>
> >> Sent: Monday, June 22, 2020 11:56 AM
> >> To: Matan Azrad <matan@mellanox.com>; Xiao Wang
> >> <xiao.w.wang@intel.com>
> >> Cc: dev@dpdk.org
> >> Subject: Re: [PATCH v1 3/4] vhost: improve device ready definition
> >>
> >>
> >>
> >> On 6/22/20 10:41 AM, Matan Azrad wrote:
> >>>> The issue is if you only check ready state only before and after
> >>>> the message affecting the ring is handled, it can be ready at both
> >>>> stages, while the rings have changed and state change callback
> >>>> should
> >> have been called.
> >>> But in this version I checked twice, before message handler and
> >>> after
> >> message handler, so it should catch any update.
> >>
> >> No, this is not enough, we have to check also during some handlers,
> >> so that the ready state is invalidated because sometimes it will be
> >> ready before and after the message handler but with different values.
> >>
> >> That's what I did in my example patch:
> >> @@ -1847,15 +1892,16 @@ vhost_user_set_vring_kick(struct virtio_net
> >> **pdev, struct VhostUserMsg *msg,
> >>
> >> ...
> >>
> >>         if (vq->kickfd >= 0)
> >>                 close(vq->kickfd);
> >> +
> >> +       vq->kickfd = VIRTIO_UNINITIALIZED_EVENTFD;
> >> +
> >> +       vhost_user_update_vring_state(dev, file.index);
> >> +
> >>         vq->kickfd = file.fd;
> >>
> >>
> >> Without that, the ready check will return ready before and after the
> >> kickfd changed and the driver won't be notified.
> >
> > The driver will be notified in the next VHOST_USER_SET_VRING_ENABLE
> message according to v1.
> >
> > One of our assumption we agreed on in the design mail is that it doesn't
> make sense that QEMU will change queue configuration without enabling
> the queue again.
> > Because of that we decided to force calling state callback again when
> QEMU send VHOST_USER_SET_VRING_ENABLE(1) message even if the
> queue is already ready.
> > So when driver/app see state enable->enable, it should take into account
> that the queue configuration was probably changed.
> >
> > I think that this assumption is correct according to the QEMU code.
> 
> Yes, this was our initial assumption.
> But now looking into the details of the implementation, I find it is even
> cleaner & clearer not to do this assumption.
> 
> > That's why I prefer to collect all the ready checks callbacks (queue state and
> device new\conf) to one function that will be called after the message
> handler:
> > Pseudo:
> >  vhost_user_update_ready_statuses() {
> > 	switch (msg):
> > 		case enable:
> > 			if(enable is 1)
> > 				force queue state =1.
> > 		case callfd
> > 		case kickfd
> > 				.....
> > 		Check queue and device ready + call callbacks if needed..
> > 		Default
> > 			Return;
> > }
> 
> I find it more natural to "invalidate" ready state where it is handled (after
> vring_invalidate(), before setting new FD for call & kick, ...)

I think that if you go with this direction, then if the first queue pair is invalidated, you also need to notify the app\driver about the device ready change.
Also, it will cause 2 notifications to the driver instead of one in case of an FD change.

Why not take this correct assumption and update the ready state at only one point in the code instead of doing it in all the configuration handlers around?
IMO, it is correct, less intrusive, simpler, clearer and cleaner.
In addition, it keeps the style already used in this function in:
- vhost_user_check_and_alloc_queue_pair
- 	switch (request) {
	case VHOST_USER_SET_FEATURES:
	case VHOST_USER_SET_PROTOCOL_FEATURES:
	case VHOST_USER_SET_OWNER:
	case VHOST_USER_SET_MEM_TABLE:
	case VHOST_USER_SET_LOG_BASE:
	case VHOST_USER_SET_LOG_FD:
	case VHOST_USER_SET_VRING_NUM:
	case VHOST_USER_SET_VRING_ADDR:
	case VHOST_USER_SET_VRING_BASE:
	case VHOST_USER_SET_VRING_KICK:
	case VHOST_USER_SET_VRING_CALL:
	case VHOST_USER_SET_VRING_ERR:
	case VHOST_USER_SET_VRING_ENABLE:
	case VHOST_USER_SEND_RARP:
	case VHOST_USER_NET_SET_MTU:
	case VHOST_USER_SET_SLAVE_REQ_FD:
			vhost_user_lock_all_queue_pairs(dev);

Matan





^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [dpdk-dev] [PATCH v1 3/4] vhost: improve device ready definition
  2020-06-22 13:43                     ` Matan Azrad
@ 2020-06-22 14:55                       ` Maxime Coquelin
  2020-06-22 15:51                         ` Matan Azrad
  0 siblings, 1 reply; 59+ messages in thread
From: Maxime Coquelin @ 2020-06-22 14:55 UTC (permalink / raw)
  To: Matan Azrad, Xiao Wang; +Cc: dev



On 6/22/20 3:43 PM, Matan Azrad wrote:
> 
> 
> From: Maxime Coquelin:
>> Sent: Monday, June 22, 2020 3:33 PM
>> To: Matan Azrad <matan@mellanox.com>; Xiao Wang
>> <xiao.w.wang@intel.com>
>> Cc: dev@dpdk.org
>> Subject: Re: [PATCH v1 3/4] vhost: improve device ready definition
>>
>>
>>
>> On 6/22/20 12:06 PM, Matan Azrad wrote:
>>>
>>> Hi Maxime
>>>
>>> From: Maxime Coquelin <maxime.coquelin@redhat.com>
>>>> Sent: Monday, June 22, 2020 11:56 AM
>>>> To: Matan Azrad <matan@mellanox.com>; Xiao Wang
>>>> <xiao.w.wang@intel.com>
>>>> Cc: dev@dpdk.org
>>>> Subject: Re: [PATCH v1 3/4] vhost: improve device ready definition
>>>>
>>>>
>>>>
>>>> On 6/22/20 10:41 AM, Matan Azrad wrote:
>>>>>> The issue is if you only check ready state only before and after
>>>>>> the message affecting the ring is handled, it can be ready at both
>>>>>> stages, while the rings have changed and state change callback
>>>>>> should
>>>> have been called.
>>>>> But in this version I checked twice, before message handler and
>>>>> after
>>>> message handler, so it should catch any update.
>>>>
>>>> No, this is not enough, we have to check also during some handlers,
>>>> so that the ready state is invalidated because sometimes it will be
>>>> ready before and after the message handler but with different values.
>>>>
>>>> That's what I did in my example patch:
>>>> @@ -1847,15 +1892,16 @@ vhost_user_set_vring_kick(struct virtio_net
>>>> **pdev, struct VhostUserMsg *msg,
>>>>
>>>> ...
>>>>
>>>>         if (vq->kickfd >= 0)
>>>>                 close(vq->kickfd);
>>>> +
>>>> +       vq->kickfd = VIRTIO_UNINITIALIZED_EVENTFD;
>>>> +
>>>> +       vhost_user_update_vring_state(dev, file.index);
>>>> +
>>>>         vq->kickfd = file.fd;
>>>>
>>>>
>>>> Without that, the ready check will return ready before and after the
>>>> kickfd changed and the driver won't be notified.
>>>
>>> The driver will be notified in the next VHOST_USER_SET_VRING_ENABLE
>> message according to v1.
>>>
>>> One of our assumption we agreed on in the design mail is that it doesn't
>> make sense that QEMU will change queue configuration without enabling
>> the queue again.
>>> Because of that we decided to force calling state callback again when
>> QEMU send VHOST_USER_SET_VRING_ENABLE(1) message even if the
>> queue is already ready.
>>> So when driver/app see state enable->enable, it should take into account
>> that the queue configuration was probably changed.
>>>
>>> I think that this assumption is correct according to the QEMU code.
>>
>> Yes, this was our initial assumption.
>> But now looking into the details of the implementation, I find it is even
>> cleaner & clearer not to do this assumption.
>>
>>> That's why I prefer to collect all the ready checks callbacks (queue state and
>> device new\conf) to one function that will be called after the message
>> handler:
>>> Pseudo:
>>>  vhost_user_update_ready_statuses() {
>>> 	switch (msg):
>>> 		case enable:
>>> 			if(enable is 1)
>>> 				force queue state =1.
>>> 		case callfd
>>> 		case kickfd
>>> 				.....
>>> 		Check queue and device ready + call callbacks if needed..
>>> 		Default
>>> 			Return;
>>> }
>>
>> I find it more natural to "invalidate" ready state where it is handled (after
>> vring_invalidate(), before setting new FD for call & kick, ...)
> 
> I think that if you go with this direction, if the first queue pair is invalidated, you need to notify app\driver also about device ready change.
> Also it will cause 2 notifications to the driver instead of one in case of FD change.

You'll always end up with two notifications: either Qemu has sent the
disable, so you'll have one notification for the disable and one for
the enable, or it didn't send the disable, and the notifications will
happen at old-value invalidation time and after the new value is taken
into account.

> Why not to take this correct assumption and update ready state only in one point in the code instead of doing it in all the configuration handlers around?
> IMO, It is correct, less intrusive, simpler, clearer and cleaner.

I just looked closer at the Vhost-user spec, and I'm no longer so sure
this is a correct assumption:

"While processing the rings (whether they are enabled or not), client
must support changing some configuration aspects on the fly."

> In addition it saves the style that already used in this function in:
> - vhost_user_check_and_alloc_queue_pair
> - 	switch (request) {
> 	case VHOST_USER_SET_FEATURES:
> 	case VHOST_USER_SET_PROTOCOL_FEATURES:
> 	case VHOST_USER_SET_OWNER:
> 	case VHOST_USER_SET_MEM_TABLE:
> 	case VHOST_USER_SET_LOG_BASE:
> 	case VHOST_USER_SET_LOG_FD:
> 	case VHOST_USER_SET_VRING_NUM:
> 	case VHOST_USER_SET_VRING_ADDR:
> 	case VHOST_USER_SET_VRING_BASE:
> 	case VHOST_USER_SET_VRING_KICK:
> 	case VHOST_USER_SET_VRING_CALL:
> 	case VHOST_USER_SET_VRING_ERR:
> 	case VHOST_USER_SET_VRING_ENABLE:
> 	case VHOST_USER_SEND_RARP:
> 	case VHOST_USER_NET_SET_MTU:
> 	case VHOST_USER_SET_SLAVE_REQ_FD:
> 			vhost_user_lock_all_queue_pairs(dev);
> 
> Matan
> 
> 
> 
> 


^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [dpdk-dev] [PATCH v1 3/4] vhost: improve device ready definition
  2020-06-22 14:55                       ` Maxime Coquelin
@ 2020-06-22 15:51                         ` Matan Azrad
  2020-06-22 16:47                           ` Maxime Coquelin
  0 siblings, 1 reply; 59+ messages in thread
From: Matan Azrad @ 2020-06-22 15:51 UTC (permalink / raw)
  To: Maxime Coquelin, Xiao Wang; +Cc: dev



From: Maxime Coquelin:
> On 6/22/20 3:43 PM, Matan Azrad wrote:
> >
> >
> > From: Maxime Coquelin:
> >> Sent: Monday, June 22, 2020 3:33 PM
> >> To: Matan Azrad <matan@mellanox.com>; Xiao Wang
> >> <xiao.w.wang@intel.com>
> >> Cc: dev@dpdk.org
> >> Subject: Re: [PATCH v1 3/4] vhost: improve device ready definition
> >>
> >>
> >>
> >> On 6/22/20 12:06 PM, Matan Azrad wrote:
> >>>
> >>> Hi Maxime
> >>>
> >>> From: Maxime Coquelin <maxime.coquelin@redhat.com>
> >>>> Sent: Monday, June 22, 2020 11:56 AM
> >>>> To: Matan Azrad <matan@mellanox.com>; Xiao Wang
> >>>> <xiao.w.wang@intel.com>
> >>>> Cc: dev@dpdk.org
> >>>> Subject: Re: [PATCH v1 3/4] vhost: improve device ready definition
> >>>>
> >>>>
> >>>>
> >>>> On 6/22/20 10:41 AM, Matan Azrad wrote:
> >>>>>> The issue is if you only check ready state only before and after
> >>>>>> the message affecting the ring is handled, it can be ready at
> >>>>>> both stages, while the rings have changed and state change
> >>>>>> callback should
> >>>> have been called.
> >>>>> But in this version I checked twice, before message handler and
> >>>>> after
> >>>> message handler, so it should catch any update.
> >>>>
> >>>> No, this is not enough, we have to check also during some handlers,
> >>>> so that the ready state is invalidated because sometimes it will be
> >>>> ready before and after the message handler but with different values.
> >>>>
> >>>> That's what I did in my example patch:
> >>>> @@ -1847,15 +1892,16 @@ vhost_user_set_vring_kick(struct
> virtio_net
> >>>> **pdev, struct VhostUserMsg *msg,
> >>>>
> >>>> ...
> >>>>
> >>>>         if (vq->kickfd >= 0)
> >>>>                 close(vq->kickfd);
> >>>> +
> >>>> +       vq->kickfd = VIRTIO_UNINITIALIZED_EVENTFD;
> >>>> +
> >>>> +       vhost_user_update_vring_state(dev, file.index);
> >>>> +
> >>>>         vq->kickfd = file.fd;
> >>>>
> >>>>
> >>>> Without that, the ready check will return ready before and after
> >>>> the kickfd changed and the driver won't be notified.
> >>>
> >>> The driver will be notified in the next VHOST_USER_SET_VRING_ENABLE
> >> message according to v1.
> >>>
> >>> One of our assumption we agreed on in the design mail is that it
> >>> doesn't
> >> make sense that QEMU will change queue configuration without enabling
> >> the queue again.
> >>> Because of that we decided to force calling state callback again
> >>> when
> >> QEMU send VHOST_USER_SET_VRING_ENABLE(1) message even if the
> queue is
> >> already ready.
> >>> So when driver/app see state enable->enable, it should take into
> >>> account
> >> that the queue configuration was probably changed.
> >>>
> >>> I think that this assumption is correct according to the QEMU code.
> >>
> >> Yes, this was our initial assumption.
> >> But now looking into the details of the implementation, I find it is
> >> even cleaner & clearer not to do this assumption.
> >>
> >>> That's why I prefer to collect all the ready checks callbacks (queue
> >>> state and
> >> device new\conf) to one function that will be called after the
> >> message
> >> handler:
> >>> Pseudo:
> >>>  vhost_user_update_ready_statuses() {
> >>> 	switch (msg):
> >>> 		case enable:
> >>> 			if(enable is 1)
> >>> 				force queue state =1.
> >>> 		case callfd
> >>> 		case kickfd
> >>> 				.....
> >>> 		Check queue and device ready + call callbacks if needed..
> >>> 		Default
> >>> 			Return;
> >>> }
> >>
> >> I find it more natural to "invalidate" ready state where it is
> >> handled (after vring_invalidate(), before setting new FD for call &
> >> kick, ...)
> >
> > I think that if you go with this direction, if the first queue pair is invalidated,
> you need to notify app\driver also about device ready change.
> > Also it will cause 2 notifications to the driver instead of one in case of FD
> change.
> 
> You'll always end-up with two notifications, either Qemu has sent the disable
> and so you'll have one notification for the disable and one for the enable, or
> it didn't sent the disable and it will happen at old value invalidation time and
> after new value is taken into account.
>

I don't see it in the current QEMU behavior.
When working with MQ, I see that some virtqs get configuration messages while they are in the enabled state.
Then, the enable message is sent again later.

 
> > Why not to take this correct assumption and update ready state only in one
> point in the code instead of doing it in all the configuration handlers around?
> > IMO, It is correct, less intrusive, simpler, clearer and cleaner.
> 
> I just looked closer at the Vhost-user spec, and I'm no more so sure this is a
> correct assumption:
> 
> "While processing the rings (whether they are enabled or not), client must
> support changing some configuration aspects on the fly."

Ok, this doesn't explain how the configuration is changed on the fly.
As I mentioned, QEMU always sends the enable message after a configuration message.


> > In addition it saves the style that already used in this function in:
> > - vhost_user_check_and_alloc_queue_pair
> > - 	switch (request) {
> > 	case VHOST_USER_SET_FEATURES:
> > 	case VHOST_USER_SET_PROTOCOL_FEATURES:
> > 	case VHOST_USER_SET_OWNER:
> > 	case VHOST_USER_SET_MEM_TABLE:
> > 	case VHOST_USER_SET_LOG_BASE:
> > 	case VHOST_USER_SET_LOG_FD:
> > 	case VHOST_USER_SET_VRING_NUM:
> > 	case VHOST_USER_SET_VRING_ADDR:
> > 	case VHOST_USER_SET_VRING_BASE:
> > 	case VHOST_USER_SET_VRING_KICK:
> > 	case VHOST_USER_SET_VRING_CALL:
> > 	case VHOST_USER_SET_VRING_ERR:
> > 	case VHOST_USER_SET_VRING_ENABLE:
> > 	case VHOST_USER_SEND_RARP:
> > 	case VHOST_USER_NET_SET_MTU:
> > 	case VHOST_USER_SET_SLAVE_REQ_FD:
> > 			vhost_user_lock_all_queue_pairs(dev);
> >
> > Matan
> >
> >
> >
> >


^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [dpdk-dev] [PATCH v1 3/4] vhost: improve device ready definition
  2020-06-22 15:51                         ` Matan Azrad
@ 2020-06-22 16:47                           ` Maxime Coquelin
  2020-06-23  9:02                             ` Matan Azrad
  0 siblings, 1 reply; 59+ messages in thread
From: Maxime Coquelin @ 2020-06-22 16:47 UTC (permalink / raw)
  To: Matan Azrad, Xiao Wang; +Cc: dev



On 6/22/20 5:51 PM, Matan Azrad wrote:
> 
> 
> From: Maxime Coquelin:
>> On 6/22/20 3:43 PM, Matan Azrad wrote:
>>>
>>>
>>> From: Maxime Coquelin:
>>>> Sent: Monday, June 22, 2020 3:33 PM
>>>> To: Matan Azrad <matan@mellanox.com>; Xiao Wang
>>>> <xiao.w.wang@intel.com>
>>>> Cc: dev@dpdk.org
>>>> Subject: Re: [PATCH v1 3/4] vhost: improve device ready definition
>>>>
>>>>
>>>>
>>>> On 6/22/20 12:06 PM, Matan Azrad wrote:
>>>>>
>>>>> Hi Maxime
>>>>>
>>>>> From: Maxime Coquelin <maxime.coquelin@redhat.com>
>>>>>> Sent: Monday, June 22, 2020 11:56 AM
>>>>>> To: Matan Azrad <matan@mellanox.com>; Xiao Wang
>>>>>> <xiao.w.wang@intel.com>
>>>>>> Cc: dev@dpdk.org
>>>>>> Subject: Re: [PATCH v1 3/4] vhost: improve device ready definition
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 6/22/20 10:41 AM, Matan Azrad wrote:
>>>>>>>> The issue is if you only check ready state only before and after
>>>>>>>> the message affecting the ring is handled, it can be ready at
>>>>>>>> both stages, while the rings have changed and state change
>>>>>>>> callback should
>>>>>> have been called.
>>>>>>> But in this version I checked twice, before message handler and
>>>>>>> after
>>>>>> message handler, so it should catch any update.
>>>>>>
>>>>>> No, this is not enough, we have to check also during some handlers,
>>>>>> so that the ready state is invalidated because sometimes it will be
>>>>>> ready before and after the message handler but with different values.
>>>>>>
>>>>>> That's what I did in my example patch:
>>>>>> @@ -1847,15 +1892,16 @@ vhost_user_set_vring_kick(struct
>> virtio_net
>>>>>> **pdev, struct VhostUserMsg *msg,
>>>>>>
>>>>>> ...
>>>>>>
>>>>>>         if (vq->kickfd >= 0)
>>>>>>                 close(vq->kickfd);
>>>>>> +
>>>>>> +       vq->kickfd = VIRTIO_UNINITIALIZED_EVENTFD;
>>>>>> +
>>>>>> +       vhost_user_update_vring_state(dev, file.index);
>>>>>> +
>>>>>>         vq->kickfd = file.fd;
>>>>>>
>>>>>>
>>>>>> Without that, the ready check will return ready before and after
>>>>>> the kickfd changed and the driver won't be notified.
>>>>>
>>>>> The driver will be notified in the next VHOST_USER_SET_VRING_ENABLE
>>>> message according to v1.
>>>>>
>>>>> One of our assumption we agreed on in the design mail is that it
>>>>> doesn't
>>>> make sense that QEMU will change queue configuration without enabling
>>>> the queue again.
>>>>> Because of that we decided to force calling state callback again
>>>>> when
>>>> QEMU send VHOST_USER_SET_VRING_ENABLE(1) message even if the
>> queue is
>>>> already ready.
>>>>> So when driver/app see state enable->enable, it should take into
>>>>> account
>>>> that the queue configuration was probably changed.
>>>>>
>>>>> I think that this assumption is correct according to the QEMU code.
>>>>
>>>> Yes, this was our initial assumption.
>>>> But now looking into the details of the implementation, I find it is
>>>> even cleaner & clearer not to do this assumption.
>>>>
>>>>> That's why I prefer to collect all the ready checks callbacks (queue
>>>>> state and
>>>> device new\conf) to one function that will be called after the
>>>> message
>>>> handler:
>>>>> Pseudo:
>>>>>  vhost_user_update_ready_statuses() {
>>>>> 	switch (msg):
>>>>> 		case enable:
>>>>> 			if(enable is 1)
>>>>> 				force queue state =1.
>>>>> 		case callfd
>>>>> 		case kickfd
>>>>> 				.....
>>>>> 		Check queue and device ready + call callbacks if needed..
>>>>> 		Default
>>>>> 			Return;
>>>>> }
>>>>
>>>> I find it more natural to "invalidate" ready state where it is
>>>> handled (after vring_invalidate(), before setting new FD for call &
>>>> kick, ...)
>>>
>>> I think that if you go with this direction, if the first queue pair is invalidated,
>> you need to notify app\driver also about device ready change.
>>> Also it will cause 2 notifications to the driver instead of one in case of FD
>> change.
>>
>> You'll always end-up with two notifications, either Qemu has sent the disable
>> and so you'll have one notification for the disable and one for the enable, or
>> it didn't sent the disable and it will happen at old value invalidation time and
>> after new value is taken into account.
>>
> 
> I don't see it in current QEMU behavior.
> When working MQ I see that some virtqs get configuration message while they are in enabled state.
> Then, enable message is sent again later.

I guess you mean the first queue pair? And it would not be in the ready
state, as it would be the initial configuration of the queue?

>  
>>> Why not to take this correct assumption and update ready state only in one
>> point in the code instead of doing it in all the configuration handlers around?
>>> IMO, It is correct, less intrusive, simpler, clearer and cleaner.
>>
>> I just looked closer at the Vhost-user spec, and I'm no more so sure this is a
>> correct assumption:
>>
>> "While processing the rings (whether they are enabled or not), client must
>> support changing some configuration aspects on the fly."
> 
> Ok, this doesn't explain how configuration is changed on the fly.

I agree it lacks a bit of clarity.

> As I mentioned, QEMU sends enable message always after configuration message.

Yes, but we should not make assumptions about the current Qemu version when
possible. Better to be safe and follow the specification; it will be
more robust. There is also the Virtio-user PMD to take into account, for
example.

Thanks,
Maxime

> 
>>> In addition it saves the style that already used in this function in:
>>> - vhost_user_check_and_alloc_queue_pair
>>> - 	switch (request) {
>>> 	case VHOST_USER_SET_FEATURES:
>>> 	case VHOST_USER_SET_PROTOCOL_FEATURES:
>>> 	case VHOST_USER_SET_OWNER:
>>> 	case VHOST_USER_SET_MEM_TABLE:
>>> 	case VHOST_USER_SET_LOG_BASE:
>>> 	case VHOST_USER_SET_LOG_FD:
>>> 	case VHOST_USER_SET_VRING_NUM:
>>> 	case VHOST_USER_SET_VRING_ADDR:
>>> 	case VHOST_USER_SET_VRING_BASE:
>>> 	case VHOST_USER_SET_VRING_KICK:
>>> 	case VHOST_USER_SET_VRING_CALL:
>>> 	case VHOST_USER_SET_VRING_ERR:
>>> 	case VHOST_USER_SET_VRING_ENABLE:
>>> 	case VHOST_USER_SEND_RARP:
>>> 	case VHOST_USER_NET_SET_MTU:
>>> 	case VHOST_USER_SET_SLAVE_REQ_FD:
>>> 			vhost_user_lock_all_queue_pairs(dev);
>>>
>>> Matan
>>>
>>>
>>>
>>>
> 


^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [dpdk-dev] [PATCH v1 3/4] vhost: improve device ready definition
  2020-06-22 16:47                           ` Maxime Coquelin
@ 2020-06-23  9:02                             ` Matan Azrad
  2020-06-23  9:19                               ` Maxime Coquelin
  0 siblings, 1 reply; 59+ messages in thread
From: Matan Azrad @ 2020-06-23  9:02 UTC (permalink / raw)
  To: Maxime Coquelin, Xiao Wang; +Cc: dev



From: Maxime Coquelin:
> On 6/22/20 5:51 PM, Matan Azrad wrote:
> >
> >
> > From: Maxime Coquelin:
> >> On 6/22/20 3:43 PM, Matan Azrad wrote:
> >>>
> >>>
> >>> From: Maxime Coquelin:
> >>>> Sent: Monday, June 22, 2020 3:33 PM
> >>>> To: Matan Azrad <matan@mellanox.com>; Xiao Wang
> >>>> <xiao.w.wang@intel.com>
> >>>> Cc: dev@dpdk.org
> >>>> Subject: Re: [PATCH v1 3/4] vhost: improve device ready definition
> >>>>
> >>>>
> >>>>
> >>>> On 6/22/20 12:06 PM, Matan Azrad wrote:
> >>>>>
> >>>>> Hi Maxime
> >>>>>
> >>>>> From: Maxime Coquelin <maxime.coquelin@redhat.com>
> >>>>>> Sent: Monday, June 22, 2020 11:56 AM
> >>>>>> To: Matan Azrad <matan@mellanox.com>; Xiao Wang
> >>>>>> <xiao.w.wang@intel.com>
> >>>>>> Cc: dev@dpdk.org
> >>>>>> Subject: Re: [PATCH v1 3/4] vhost: improve device ready
> >>>>>> definition
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> On 6/22/20 10:41 AM, Matan Azrad wrote:
> >>>>>>>> The issue is if you only check ready state only before and
> >>>>>>>> after the message affecting the ring is handled, it can be
> >>>>>>>> ready at both stages, while the rings have changed and state
> >>>>>>>> change callback should
> >>>>>> have been called.
> >>>>>>> But in this version I checked twice, before message handler and
> >>>>>>> after
> >>>>>> message handler, so it should catch any update.
> >>>>>>
> >>>>>> No, this is not enough, we have to check also during some
> >>>>>> handlers, so that the ready state is invalidated because
> >>>>>> sometimes it will be ready before and after the message handler but
> with different values.
> >>>>>>
> >>>>>> That's what I did in my example patch:
> >>>>>> @@ -1847,15 +1892,16 @@ vhost_user_set_vring_kick(struct
> >> virtio_net
> >>>>>> **pdev, struct VhostUserMsg *msg,
> >>>>>>
> >>>>>> ...
> >>>>>>
> >>>>>>         if (vq->kickfd >= 0)
> >>>>>>                 close(vq->kickfd);
> >>>>>> +
> >>>>>> +       vq->kickfd = VIRTIO_UNINITIALIZED_EVENTFD;
> >>>>>> +
> >>>>>> +       vhost_user_update_vring_state(dev, file.index);
> >>>>>> +
> >>>>>>         vq->kickfd = file.fd;
> >>>>>>
> >>>>>>
> >>>>>> Without that, the ready check will return ready before and after
> >>>>>> the kickfd changed and the driver won't be notified.
> >>>>>
> >>>>> The driver will be notified in the next
> >>>>> VHOST_USER_SET_VRING_ENABLE
> >>>> message according to v1.
> >>>>>
> >>>>> One of our assumption we agreed on in the design mail is that it
> >>>>> doesn't
> >>>> make sense that QEMU will change queue configuration without
> >>>> enabling the queue again.
> >>>>> Because of that we decided to force calling state callback again
> >>>>> when
> >>>> QEMU send VHOST_USER_SET_VRING_ENABLE(1) message even if the
> >> queue is
> >>>> already ready.
> >>>>> So when driver/app see state enable->enable, it should take into
> >>>>> account
> >>>> that the queue configuration was probably changed.
> >>>>>
> >>>>> I think that this assumption is correct according to the QEMU code.
> >>>>
> >>>> Yes, this was our initial assumption.
> >>>> But now looking into the details of the implementation, I find it
> >>>> is even cleaner & clearer not to do this assumption.
> >>>>
> >>>>> That's why I prefer to collect all the ready checks callbacks
> >>>>> (queue state and
> >>>> device new\conf) to one function that will be called after the
> >>>> message
> >>>> handler:
> >>>>> Pseudo:
> >>>>>  vhost_user_update_ready_statuses() {
> >>>>> 	switch (msg):
> >>>>> 		case enable:
> >>>>> 			if(enable is 1)
> >>>>> 				force queue state =1.
> >>>>> 		case callfd
> >>>>> 		case kickfd
> >>>>> 				.....
> >>>>> 		Check queue and device ready + call callbacks if needed..
> >>>>> 		Default
> >>>>> 			Return;
> >>>>> }
> >>>>
> >>>> I find it more natural to "invalidate" ready state where it is
> >>>> handled (after vring_invalidate(), before setting new FD for call &
> >>>> kick, ...)
> >>>
> >>> I think that if you go with this direction, if the first queue pair
> >>> is invalidated,
> >> you need to notify app\driver also about device ready change.
> >>> Also it will cause 2 notifications to the driver instead of one in
> >>> case of FD
> >> change.
> >>
> >> You'll always end-up with two notifications, either Qemu has sent the
> >> disable and so you'll have one notification for the disable and one
> >> for the enable, or it didn't sent the disable and it will happen at
> >> old value invalidation time and after new value is taken into account.
> >>
> >
> > I don't see it in current QEMU behavior.
> > When working MQ I see that some virtqs get configuration message while
> they are in enabled state.
> > Then, enable message is sent again later.
> 
> I guess you mean the first queue pair? And it would not be in ready state as it
> would be the initial configuration of the queue?

Even after initialization, when the queue is ready.

> >
> >>> Why not to take this correct assumption and update ready state only
> >>> in one
> >> point in the code instead of doing it in all the configuration handlers
> around?
> >>> IMO, It is correct, less intrusive, simpler, clearer and cleaner.
> >>
> >> I just looked closer at the Vhost-user spec, and I'm no more so sure
> >> this is a correct assumption:
> >>
> >> "While processing the rings (whether they are enabled or not), client
> >> must support changing some configuration aspects on the fly."
> >
> > Ok, this doesn't explain how configuration is changed on the fly.
> 
> I agree it lacks a bit of clarity.
> 
> > As I mentioned, QEMU sends enable message always after configuration
> message.
> 
> Yes, but we should not do assumptions on current Qemu version when
> possible. Better to be safe and follow the specification, it will be more robust.
> There is also the Virtio-user PMD to take into account for example.

I understand your point here, but do you really want to be ready for any configuration update at run time?
What does it mean? How should the datapath handle configuration from the control thread at run time while traffic is on?
For example, changing the queue size \ addresses requires stopping traffic first...
Also, changing FDs is very sensitive.

It doesn't make sense to me.

Also, according to the "on the fly" direction, we should not disable the queue unless an enable message comes to disable it.

In addition:
Do you really want to toggle the vDPA drivers\app for every configuration message? It may cause queue recreation for each one (at least for mlx5).


> Thanks,
> Maxime
> 
> >
> >>> In addition it saves the style that already used in this function in:
> >>> - vhost_user_check_and_alloc_queue_pair
> >>> - 	switch (request) {
> >>> 	case VHOST_USER_SET_FEATURES:
> >>> 	case VHOST_USER_SET_PROTOCOL_FEATURES:
> >>> 	case VHOST_USER_SET_OWNER:
> >>> 	case VHOST_USER_SET_MEM_TABLE:
> >>> 	case VHOST_USER_SET_LOG_BASE:
> >>> 	case VHOST_USER_SET_LOG_FD:
> >>> 	case VHOST_USER_SET_VRING_NUM:
> >>> 	case VHOST_USER_SET_VRING_ADDR:
> >>> 	case VHOST_USER_SET_VRING_BASE:
> >>> 	case VHOST_USER_SET_VRING_KICK:
> >>> 	case VHOST_USER_SET_VRING_CALL:
> >>> 	case VHOST_USER_SET_VRING_ERR:
> >>> 	case VHOST_USER_SET_VRING_ENABLE:
> >>> 	case VHOST_USER_SEND_RARP:
> >>> 	case VHOST_USER_NET_SET_MTU:
> >>> 	case VHOST_USER_SET_SLAVE_REQ_FD:
> >>> 			vhost_user_lock_all_queue_pairs(dev);
> >>>
> >>> Matan
> >>>
> >>>
> >>>
> >>>
> >


^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [dpdk-dev] [PATCH v1 3/4] vhost: improve device ready definition
  2020-06-23  9:02                             ` Matan Azrad
@ 2020-06-23  9:19                               ` Maxime Coquelin
  2020-06-23 11:53                                 ` Matan Azrad
  0 siblings, 1 reply; 59+ messages in thread
From: Maxime Coquelin @ 2020-06-23  9:19 UTC (permalink / raw)
  To: Matan Azrad, Xiao Wang; +Cc: dev



On 6/23/20 11:02 AM, Matan Azrad wrote:
> 
> 
> From: Maxime Coquelin:
>> On 6/22/20 5:51 PM, Matan Azrad wrote:
>>>
>>>
>>> From: Maxime Coquelin:
>>>> On 6/22/20 3:43 PM, Matan Azrad wrote:
>>>>>
>>>>>
>>>>> From: Maxime Coquelin:
>>>>>> Sent: Monday, June 22, 2020 3:33 PM
>>>>>> To: Matan Azrad <matan@mellanox.com>; Xiao Wang
>>>>>> <xiao.w.wang@intel.com>
>>>>>> Cc: dev@dpdk.org
>>>>>> Subject: Re: [PATCH v1 3/4] vhost: improve device ready definition
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 6/22/20 12:06 PM, Matan Azrad wrote:
>>>>>>>
>>>>>>> Hi Maxime
>>>>>>>
>>>>>>> From: Maxime Coquelin <maxime.coquelin@redhat.com>
>>>>>>>> Sent: Monday, June 22, 2020 11:56 AM
>>>>>>>> To: Matan Azrad <matan@mellanox.com>; Xiao Wang
>>>>>>>> <xiao.w.wang@intel.com>
>>>>>>>> Cc: dev@dpdk.org
>>>>>>>> Subject: Re: [PATCH v1 3/4] vhost: improve device ready
>>>>>>>> definition
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On 6/22/20 10:41 AM, Matan Azrad wrote:
>>>>>>>>>> The issue is if you only check ready state only before and
>>>>>>>>>> after the message affecting the ring is handled, it can be
>>>>>>>>>> ready at both stages, while the rings have changed and state
>>>>>>>>>> change callback should
>>>>>>>> have been called.
>>>>>>>>> But in this version I checked twice, before message handler and
>>>>>>>>> after
>>>>>>>> message handler, so it should catch any update.
>>>>>>>>
>>>>>>>> No, this is not enough, we have to check also during some
>>>>>>>> handlers, so that the ready state is invalidated because
>>>>>>>> sometimes it will be ready before and after the message handler but
>> with different values.
>>>>>>>>
>>>>>>>> That's what I did in my example patch:
>>>>>>>> @@ -1847,15 +1892,16 @@ vhost_user_set_vring_kick(struct
>>>> virtio_net
>>>>>>>> **pdev, struct VhostUserMsg *msg,
>>>>>>>>
>>>>>>>> ...
>>>>>>>>
>>>>>>>>         if (vq->kickfd >= 0)
>>>>>>>>                 close(vq->kickfd);
>>>>>>>> +
>>>>>>>> +       vq->kickfd = VIRTIO_UNINITIALIZED_EVENTFD;
>>>>>>>> +
>>>>>>>> +       vhost_user_update_vring_state(dev, file.index);
>>>>>>>> +
>>>>>>>>         vq->kickfd = file.fd;
>>>>>>>>
>>>>>>>>
>>>>>>>> Without that, the ready check will return ready before and after
>>>>>>>> the kickfd changed and the driver won't be notified.
>>>>>>>
>>>>>>> The driver will be notified in the next
>>>>>>> VHOST_USER_SET_VRING_ENABLE
>>>>>> message according to v1.
>>>>>>>
>>>>>>> One of our assumption we agreed on in the design mail is that it
>>>>>>> doesn't
>>>>>> make sense that QEMU will change queue configuration without
>>>>>> enabling the queue again.
>>>>>>> Because of that we decided to force calling state callback again
>>>>>>> when
>>>>>> QEMU send VHOST_USER_SET_VRING_ENABLE(1) message even if the
>>>> queue is
>>>>>> already ready.
>>>>>>> So when driver/app see state enable->enable, it should take into
>>>>>>> account
>>>>>> that the queue configuration was probably changed.
>>>>>>>
>>>>>>> I think that this assumption is correct according to the QEMU code.
>>>>>>
>>>>>> Yes, this was our initial assumption.
>>>>>> But now looking into the details of the implementation, I find it
>>>>>> is even cleaner & clearer not to do this assumption.
>>>>>>
>>>>>>> That's why I prefer to collect all the ready checks callbacks
>>>>>>> (queue state and
>>>>>> device new\conf) to one function that will be called after the
>>>>>> message
>>>>>> handler:
>>>>>>> Pseudo:
>>>>>>>  vhost_user_update_ready_statuses() {
>>>>>>> 	switch (msg):
>>>>>>> 		case enable:
>>>>>>> 			if(enable is 1)
>>>>>>> 				force queue state =1.
>>>>>>> 		case callfd
>>>>>>> 		case kickfd
>>>>>>> 				.....
>>>>>>> 		Check queue and device ready + call callbacks if needed..
>>>>>>> 		Default
>>>>>>> 			Return;
>>>>>>> }
>>>>>>
>>>>>> I find it more natural to "invalidate" ready state where it is
>>>>>> handled (after vring_invalidate(), before setting new FD for call &
>>>>>> kick, ...)
>>>>>
>>>>> I think that if you go with this direction, if the first queue pair
>>>>> is invalidated,
>>>> you need to notify app\driver also about device ready change.
>>>>> Also it will cause 2 notifications to the driver instead of one in
>>>>> case of FD
>>>> change.
>>>>
>>>> You'll always end-up with two notifications, either Qemu has sent the
>>>> disable and so you'll have one notification for the disable and one
>>>> for the enable, or it didn't sent the disable and it will happen at
>>>> old value invalidation time and after new value is taken into account.
>>>>
>>>
>>> I don't see it in current QEMU behavior.
>>> When working MQ I see that some virtqs get configuration message while
>> they are in enabled state.
>>> Then, enable message is sent again later.
>>
>> I guess you mean the first queue pair? And it would not be in ready state as it
>> would be the initial configuration of the queue?
> 
> Even after initialization when queue is ready.
> 
>>>
>>>>> Why not to take this correct assumption and update ready state only
>>>>> in one
>>>> point in the code instead of doing it in all the configuration handlers
>> around?
>>>>> IMO, It is correct, less intrusive, simpler, clearer and cleaner.
>>>>
>>>> I just looked closer at the Vhost-user spec, and I'm no more so sure
>>>> this is a correct assumption:
>>>>
>>>> "While processing the rings (whether they are enabled or not), client
>>>> must support changing some configuration aspects on the fly."
>>>
>>> Ok, this doesn't explain how configuration is changed on the fly.
>>
>> I agree it lacks a bit of clarity.
>>
>>> As I mentioned, QEMU sends enable message always after configuration
>> message.
>>
>> Yes, but we should not do assumptions on current Qemu version when
>> possible. Better to be safe and follow the specification, it will be more robust.
>> There is also the Virtio-user PMD to take into account for example.
> 
> I understand your point here but do you really want to be ready for any configuration update in run time?
> What does it mean? How datatpath should handle configuration from control thread in run time while traffic is on?
> For example, changing queue size \ addresses must stop traffic before...
> Also changing FDs is very sensitive.
> 
> It doesn't make sense to me.
> 
> Also, according to "on the fly" direction we should not disable the queue unless enable message is coming to disable it.
> 
> In addition:
> Do you really want to toggle vDPA drivers\app for any configuration message? It may cause queue recreation for each one (at least for mlx5).

I want to have something robust and maintainable.

These messages arriving after a queue has been configured once are rare
events, but this is usually the kind of thing that causes a maintenance
burden.

If you look at my example patch, you will understand that with my
proposal there won't be any more state change notifications than with
your proposal when Qemu or any other Vhost-user master sends a disable
request before sending the request that impacts the queue state.

It just adds more robustness if this unlikely event happens, by
invalidating the ring state to not-ready before doing the actual ring
configuration change, so that the config change is not missed by the
vDPA driver or the application.
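
To make the ordering concrete, here is a minimal, self-contained toy
model of that pattern. The struct and helper names are invented for
illustration and are not the vhost library's API; only the sequencing
mirrors the example patch:

#include <stdbool.h>
#include <stdio.h>

#define TOY_UNINITIALIZED_EVENTFD (-2) /* stand-in for VIRTIO_UNINITIALIZED_EVENTFD */

struct toy_vring {
	int kickfd;
	int callfd;
	bool enabled;
};

/* Hypothetical ready check: both FDs valid and the ring enabled. */
static bool toy_vring_is_ready(const struct toy_vring *vq)
{
	return vq->enabled && vq->kickfd >= 0 && vq->callfd >= 0;
}

/* Stand-in for vhost_user_update_vring_state() from the example patch. */
static void toy_update_vring_state(const struct toy_vring *vq, int index)
{
	printf("vring %d -> %s\n", index,
	       toy_vring_is_ready(vq) ? "ready" : "not ready");
}

static void toy_set_vring_kick(struct toy_vring *vq, int index, int new_fd)
{
	/* Invalidate first: the ready check now fails and the driver
	 * is notified before the new value is taken into account. */
	vq->kickfd = TOY_UNINITIALIZED_EVENTFD;
	toy_update_vring_state(vq, index);

	/* Apply the new kick FD, then notify that the ring is ready again. */
	vq->kickfd = new_fd;
	toy_update_vring_state(vq, index);
}

int main(void)
{
	struct toy_vring vq = { .kickfd = 3, .callfd = 4, .enabled = true };

	/* A kick FD re-sent while the ring is ready yields two
	 * notifications (not ready, then ready) instead of none. */
	toy_set_vring_kick(&vq, 0, 5);
	return 0;
}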

Maxime


^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [dpdk-dev] [PATCH v1 3/4] vhost: improve device ready definition
  2020-06-23  9:19                               ` Maxime Coquelin
@ 2020-06-23 11:53                                 ` Matan Azrad
  2020-06-23 13:55                                   ` Maxime Coquelin
  0 siblings, 1 reply; 59+ messages in thread
From: Matan Azrad @ 2020-06-23 11:53 UTC (permalink / raw)
  To: Maxime Coquelin, Xiao Wang; +Cc: dev



From: Maxime Coquelin:
> On 6/23/20 11:02 AM, Matan Azrad wrote:
> >
> >
> > From: Maxime Coquelin:
> >> On 6/22/20 5:51 PM, Matan Azrad wrote:
> >>>
> >>>
> >>> From: Maxime Coquelin:
> >>>> On 6/22/20 3:43 PM, Matan Azrad wrote:
> >>>>>
> >>>>>
> >>>>> From: Maxime Coquelin:
> >>>>>> Sent: Monday, June 22, 2020 3:33 PM
> >>>>>> To: Matan Azrad <matan@mellanox.com>; Xiao Wang
> >>>>>> <xiao.w.wang@intel.com>
> >>>>>> Cc: dev@dpdk.org
> >>>>>> Subject: Re: [PATCH v1 3/4] vhost: improve device ready
> >>>>>> definition
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> On 6/22/20 12:06 PM, Matan Azrad wrote:
> >>>>>>>
> >>>>>>> Hi Maxime
> >>>>>>>
> >>>>>>> From: Maxime Coquelin <maxime.coquelin@redhat.com>
> >>>>>>>> Sent: Monday, June 22, 2020 11:56 AM
> >>>>>>>> To: Matan Azrad <matan@mellanox.com>; Xiao Wang
> >>>>>>>> <xiao.w.wang@intel.com>
> >>>>>>>> Cc: dev@dpdk.org
> >>>>>>>> Subject: Re: [PATCH v1 3/4] vhost: improve device ready
> >>>>>>>> definition
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> On 6/22/20 10:41 AM, Matan Azrad wrote:
> >>>>>>>>>> The issue is if you only check ready state only before and
> >>>>>>>>>> after the message affecting the ring is handled, it can be
> >>>>>>>>>> ready at both stages, while the rings have changed and state
> >>>>>>>>>> change callback should
> >>>>>>>> have been called.
> >>>>>>>>> But in this version I checked twice, before message handler
> >>>>>>>>> and after
> >>>>>>>> message handler, so it should catch any update.
> >>>>>>>>
> >>>>>>>> No, this is not enough, we have to check also during some
> >>>>>>>> handlers, so that the ready state is invalidated because
> >>>>>>>> sometimes it will be ready before and after the message handler
> >>>>>>>> but
> >> with different values.
> >>>>>>>>
> >>>>>>>> That's what I did in my example patch:
> >>>>>>>> @@ -1847,15 +1892,16 @@ vhost_user_set_vring_kick(struct
> >>>> virtio_net
> >>>>>>>> **pdev, struct VhostUserMsg *msg,
> >>>>>>>>
> >>>>>>>> ...
> >>>>>>>>
> >>>>>>>>         if (vq->kickfd >= 0)
> >>>>>>>>                 close(vq->kickfd);
> >>>>>>>> +
> >>>>>>>> +       vq->kickfd = VIRTIO_UNINITIALIZED_EVENTFD;
> >>>>>>>> +
> >>>>>>>> +       vhost_user_update_vring_state(dev, file.index);
> >>>>>>>> +
> >>>>>>>>         vq->kickfd = file.fd;
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> Without that, the ready check will return ready before and
> >>>>>>>> after the kickfd changed and the driver won't be notified.
> >>>>>>>
> >>>>>>> The driver will be notified in the next
> >>>>>>> VHOST_USER_SET_VRING_ENABLE
> >>>>>> message according to v1.
> >>>>>>>
> >>>>>>> One of our assumption we agreed on in the design mail is that it
> >>>>>>> doesn't
> >>>>>> make sense that QEMU will change queue configuration without
> >>>>>> enabling the queue again.
> >>>>>>> Because of that we decided to force calling state callback again
> >>>>>>> when
> >>>>>> QEMU send VHOST_USER_SET_VRING_ENABLE(1) message even if
> the
> >>>> queue is
> >>>>>> already ready.
> >>>>>>> So when driver/app see state enable->enable, it should take into
> >>>>>>> account
> >>>>>> that the queue configuration was probably changed.
> >>>>>>>
> >>>>>>> I think that this assumption is correct according to the QEMU code.
> >>>>>>
> >>>>>> Yes, this was our initial assumption.
> >>>>>> But now looking into the details of the implementation, I find it
> >>>>>> is even cleaner & clearer not to do this assumption.
> >>>>>>
> >>>>>>> That's why I prefer to collect all the ready checks callbacks
> >>>>>>> (queue state and
> >>>>>> device new\conf) to one function that will be called after the
> >>>>>> message
> >>>>>> handler:
> >>>>>>> Pseudo:
> >>>>>>>  vhost_user_update_ready_statuses() {
> >>>>>>> 	switch (msg):
> >>>>>>> 		case enable:
> >>>>>>> 			if(enable is 1)
> >>>>>>> 				force queue state =1.
> >>>>>>> 		case callfd
> >>>>>>> 		case kickfd
> >>>>>>> 				.....
> >>>>>>> 		Check queue and device ready + call callbacks if
> needed..
> >>>>>>> 		Default
> >>>>>>> 			Return;
> >>>>>>> }
> >>>>>>
> >>>>>> I find it more natural to "invalidate" ready state where it is
> >>>>>> handled (after vring_invalidate(), before setting new FD for call
> >>>>>> & kick, ...)
> >>>>>
> >>>>> I think that if you go with this direction, if the first queue
> >>>>> pair is invalidated,
> >>>> you need to notify app\driver also about device ready change.
> >>>>> Also it will cause 2 notifications to the driver instead of one in
> >>>>> case of FD
> >>>> change.
> >>>>
> >>>> You'll always end-up with two notifications, either Qemu has sent
> >>>> the disable and so you'll have one notification for the disable and
> >>>> one for the enable, or it didn't sent the disable and it will
> >>>> happen at old value invalidation time and after new value is taken into
> account.
> >>>>
> >>>
> >>> I don't see it in current QEMU behavior.
> >>> When working MQ I see that some virtqs get configuration message
> >>> while
> >> they are in enabled state.
> >>> Then, enable message is sent again later.
> >>
> >> I guess you mean the first queue pair? And it would not be in ready
> >> state as it would be the initial configuration of the queue?
> >
> > Even after initialization when queue is ready.
> >
> >>>
> >>>>> Why not to take this correct assumption and update ready state
> >>>>> only in one
> >>>> point in the code instead of doing it in all the configuration
> >>>> handlers
> >> around?
> >>>>> IMO, It is correct, less intrusive, simpler, clearer and cleaner.
> >>>>
> >>>> I just looked closer at the Vhost-user spec, and I'm no more so
> >>>> sure this is a correct assumption:
> >>>>
> >>>> "While processing the rings (whether they are enabled or not),
> >>>> client must support changing some configuration aspects on the fly."
> >>>
> >>> Ok, this doesn't explain how configuration is changed on the fly.
> >>
> >> I agree it lacks a bit of clarity.
> >>
> >>> As I mentioned, QEMU sends enable message always after configuration
> >> message.
> >>
> >> Yes, but we should not do assumptions on current Qemu version when
> >> possible. Better to be safe and follow the specification, it will be more
> robust.
> >> There is also the Virtio-user PMD to take into account for example.
> >
> > I understand your point here but do you really want to be ready for any
> configuration update in run time?
> > What does it mean? How datatpath should handle configuration from
> control thread in run time while traffic is on?
> > For example, changing queue size \ addresses must stop traffic before...
> > Also changing FDs is very sensitive.
> >
> > It doesn't make sense to me.
> >
> > Also, according to "on the fly" direction we should not disable the queue
> unless enable message is coming to disable it.

No response, so it looks like you agree that it doesn't make sense.

> > In addition:
> > Do you really want to toggle vDPA drivers\app for any configuration
> message? It may cause queue recreation for each one (at least for mlx5).
> 
> I want to have something robust and maintainable.

Me too.

> These messages arriving after a queue have been configured once are rare
> events, but this is usually the kind of things that cause maintenance burden.

In the guest poll mode case (testpmd virtio), we always get callfd twice.

> If you look at my example patch, you will understand that with my proposal,
> there won't be any more state change notification than with your proposal
> when Qemu or any other Vhost-user master send a disable request before
> sending the request that impact the queue state.

We didn't talk about disable time - that one is very simple.

Yes, in case the queue is disabled, your proposal doesn't send an extra notification, just like mine.
But in case the queue is ready, your proposal sends an extra not-ready notification for the kickfd, callfd and set_vring_base configurations.

> It just adds more robustness if this unlikely event happens, by invalidating
> the ring state to not ready before doing the actual ring configuration change.
> So that this config change is not missed by the vDPA driver or the application.

One more issue here is that there is a window of time where the device is ready (already configured) while the first virtq-pair is not ready (your invalidate proposal for set_vring_base).
It doesn't preserve the concept that the device is ready only when the first virtq-pair is ready.


I will not insist anymore on waiting for enable before notifying, although I am not a fan of it.

So, I suggest creating one notification function to be called after the message handler and before the reply.
This function is the only one which notifies ready states, in the following cases (a sketch follows the list):

1. The virtq ready state is changed in the queue.
2. The virtq ready state stays on after a configuration message handler.
3. The device state becomes enabled when the first queue pair is ready.
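
A rough, self-contained sketch of what this consolidated function could
look like (toy types and names, not the vhost library's API; it only
illustrates the three rules above):

#include <stdbool.h>
#include <stdio.h>

enum toy_msg {
	TOY_MSG_SET_VRING_ENABLE,
	TOY_MSG_SET_VRING_KICK,
	TOY_MSG_SET_VRING_CALL,
	TOY_MSG_OTHER,
};

struct toy_vq { bool ready; };

struct toy_dev {
	struct toy_vq vq[2]; /* rings of the first queue pair */
	bool started;
};

static bool toy_first_qp_ready(const struct toy_dev *d)
{
	return d->vq[0].ready && d->vq[1].ready;
}

/* Called once, after the message handler and before the reply. */
static void toy_update_ready_statuses(struct toy_dev *d, enum toy_msg msg,
				      int qid, bool was_ready)
{
	bool now_ready;

	switch (msg) {
	case TOY_MSG_SET_VRING_ENABLE:
	case TOY_MSG_SET_VRING_KICK:
	case TOY_MSG_SET_VRING_CALL:
		break;  /* messages that may affect the ready state */
	default:
		return;
	}

	now_ready = d->vq[qid].ready;

	/* Rules 1 and 2: state changed, or stayed ready across a
	 * configuration change (enable->enable re-notification). */
	if (now_ready != was_ready || now_ready)
		printf("vring %d -> %s\n", qid,
		       now_ready ? "ready" : "not ready");

	/* Rule 3: the device becomes ready with the first queue pair. */
	if (!d->started && toy_first_qp_ready(d)) {
		d->started = true;
		printf("device -> ready\n");
	}
}

int main(void)
{
	struct toy_dev d = { .vq = { { true }, { false } } };

	/* The second ring of the first pair becomes ready: one ring
	 * notification plus the device-ready notification. */
	d.vq[1].ready = true;
	toy_update_ready_statuses(&d, TOY_MSG_SET_VRING_ENABLE, 1, false);
	return 0;
}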


Matan



> Maxime


^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [dpdk-dev] [PATCH v1 3/4] vhost: improve device ready definition
  2020-06-23 11:53                                 ` Matan Azrad
@ 2020-06-23 13:55                                   ` Maxime Coquelin
  2020-06-23 14:33                                     ` Maxime Coquelin
  2020-06-23 14:52                                     ` Matan Azrad
  0 siblings, 2 replies; 59+ messages in thread
From: Maxime Coquelin @ 2020-06-23 13:55 UTC (permalink / raw)
  To: Matan Azrad, Xiao Wang; +Cc: dev

Hi Matan,

On 6/23/20 1:53 PM, Matan Azrad wrote:
> 
> 
> From: Maxime Coquelin:
>> On 6/23/20 11:02 AM, Matan Azrad wrote:
>>>
>>>
>>> From: Maxime Coquelin:
>>>> On 6/22/20 5:51 PM, Matan Azrad wrote:
>>>>>
>>>>>
>>>>> From: Maxime Coquelin:
>>>>>> On 6/22/20 3:43 PM, Matan Azrad wrote:
>>>>>>>
>>>>>>>
>>>>>>> From: Maxime Coquelin:
>>>>>>>> Sent: Monday, June 22, 2020 3:33 PM
>>>>>>>> To: Matan Azrad <matan@mellanox.com>; Xiao Wang
>>>>>>>> <xiao.w.wang@intel.com>
>>>>>>>> Cc: dev@dpdk.org
>>>>>>>> Subject: Re: [PATCH v1 3/4] vhost: improve device ready
>>>>>>>> definition
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On 6/22/20 12:06 PM, Matan Azrad wrote:
>>>>>>>>>
>>>>>>>>> Hi Maxime
>>>>>>>>>
>>>>>>>>> From: Maxime Coquelin <maxime.coquelin@redhat.com>
>>>>>>>>>> Sent: Monday, June 22, 2020 11:56 AM
>>>>>>>>>> To: Matan Azrad <matan@mellanox.com>; Xiao Wang
>>>>>>>>>> <xiao.w.wang@intel.com>
>>>>>>>>>> Cc: dev@dpdk.org
>>>>>>>>>> Subject: Re: [PATCH v1 3/4] vhost: improve device ready
>>>>>>>>>> definition
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 6/22/20 10:41 AM, Matan Azrad wrote:
>>>>>>>>>>>> The issue is if you only check ready state only before and
>>>>>>>>>>>> after the message affecting the ring is handled, it can be
>>>>>>>>>>>> ready at both stages, while the rings have changed and state
>>>>>>>>>>>> change callback should
>>>>>>>>>> have been called.
>>>>>>>>>>> But in this version I checked twice, before message handler
>>>>>>>>>>> and after
>>>>>>>>>> message handler, so it should catch any update.
>>>>>>>>>>
>>>>>>>>>> No, this is not enough, we have to check also during some
>>>>>>>>>> handlers, so that the ready state is invalidated because
>>>>>>>>>> sometimes it will be ready before and after the message handler
>>>>>>>>>> but
>>>> with different values.
>>>>>>>>>>
>>>>>>>>>> That's what I did in my example patch:
>>>>>>>>>> @@ -1847,15 +1892,16 @@ vhost_user_set_vring_kick(struct
>>>>>> virtio_net
>>>>>>>>>> **pdev, struct VhostUserMsg *msg,
>>>>>>>>>>
>>>>>>>>>> ...
>>>>>>>>>>
>>>>>>>>>>         if (vq->kickfd >= 0)
>>>>>>>>>>                 close(vq->kickfd);
>>>>>>>>>> +
>>>>>>>>>> +       vq->kickfd = VIRTIO_UNINITIALIZED_EVENTFD;
>>>>>>>>>> +
>>>>>>>>>> +       vhost_user_update_vring_state(dev, file.index);
>>>>>>>>>> +
>>>>>>>>>>         vq->kickfd = file.fd;
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Without that, the ready check will return ready before and
>>>>>>>>>> after the kickfd changed and the driver won't be notified.
>>>>>>>>>
>>>>>>>>> The driver will be notified in the next
>>>>>>>>> VHOST_USER_SET_VRING_ENABLE
>>>>>>>> message according to v1.
>>>>>>>>>
>>>>>>>>> One of our assumption we agreed on in the design mail is that it
>>>>>>>>> doesn't
>>>>>>>> make sense that QEMU will change queue configuration without
>>>>>>>> enabling the queue again.
>>>>>>>>> Because of that we decided to force calling state callback again
>>>>>>>>> when
>>>>>>>> QEMU send VHOST_USER_SET_VRING_ENABLE(1) message even if
>> the
>>>>>> queue is
>>>>>>>> already ready.
>>>>>>>>> So when driver/app see state enable->enable, it should take into
>>>>>>>>> account
>>>>>>>> that the queue configuration was probably changed.
>>>>>>>>>
>>>>>>>>> I think that this assumption is correct according to the QEMU code.
>>>>>>>>
>>>>>>>> Yes, this was our initial assumption.
>>>>>>>> But now looking into the details of the implementation, I find it
>>>>>>>> is even cleaner & clearer not to do this assumption.
>>>>>>>>
>>>>>>>>> That's why I prefer to collect all the ready checks callbacks
>>>>>>>>> (queue state and
>>>>>>>> device new\conf) to one function that will be called after the
>>>>>>>> message
>>>>>>>> handler:
>>>>>>>>> Pseudo:
>>>>>>>>>  vhost_user_update_ready_statuses() {
>>>>>>>>> 	switch (msg):
>>>>>>>>> 		case enable:
>>>>>>>>> 			if(enable is 1)
>>>>>>>>> 				force queue state =1.
>>>>>>>>> 		case callfd
>>>>>>>>> 		case kickfd
>>>>>>>>> 				.....
>>>>>>>>> 		Check queue and device ready + call callbacks if
>> needed..
>>>>>>>>> 		Default
>>>>>>>>> 			Return;
>>>>>>>>> }
>>>>>>>>
>>>>>>>> I find it more natural to "invalidate" ready state where it is
>>>>>>>> handled (after vring_invalidate(), before setting new FD for call
>>>>>>>> & kick, ...)
>>>>>>>
>>>>>>> I think that if you go with this direction, if the first queue
>>>>>>> pair is invalidated,
>>>>>> you need to notify app\driver also about device ready change.
>>>>>>> Also it will cause 2 notifications to the driver instead of one in
>>>>>>> case of FD
>>>>>> change.
>>>>>>
>>>>>> You'll always end-up with two notifications, either Qemu has sent
>>>>>> the disable and so you'll have one notification for the disable and
>>>>>> one for the enable, or it didn't sent the disable and it will
>>>>>> happen at old value invalidation time and after new value is taken into
>> account.
>>>>>>
>>>>>
>>>>> I don't see it in current QEMU behavior.
>>>>> When working MQ I see that some virtqs get configuration message
>>>>> while
>>>> they are in enabled state.
>>>>> Then, enable message is sent again later.
>>>>
>>>> I guess you mean the first queue pair? And it would not be in ready
>>>> state as it would be the initial configuration of the queue?
>>>
>>> Even after initialization when queue is ready.
>>>
>>>>>
>>>>>>> Why not to take this correct assumption and update ready state
>>>>>>> only in one
>>>>>> point in the code instead of doing it in all the configuration
>>>>>> handlers
>>>> around?
>>>>>>> IMO, It is correct, less intrusive, simpler, clearer and cleaner.
>>>>>>
>>>>>> I just looked closer at the Vhost-user spec, and I'm no more so
>>>>>> sure this is a correct assumption:
>>>>>>
>>>>>> "While processing the rings (whether they are enabled or not),
>>>>>> client must support changing some configuration aspects on the fly."
>>>>>
>>>>> Ok, this doesn't explain how configuration is changed on the fly.
>>>>
>>>> I agree it lacks a bit of clarity.
>>>>
>>>>> As I mentioned, QEMU sends enable message always after configuration
>>>> message.
>>>>
>>>> Yes, but we should not do assumptions on current Qemu version when
>>>> possible. Better to be safe and follow the specification, it will be more
>> robust.
>>>> There is also the Virtio-user PMD to take into account for example.
>>>
>>> I understand your point here but do you really want to be ready for any
>> configuration update in run time?
>>> What does it mean? How datatpath should handle configuration from
>> control thread in run time while traffic is on?
>>> For example, changing queue size \ addresses must stop traffic before...
>>> Also changing FDs is very sensitive.
>>>
>>> It doesn't make sense to me.
>>>
>>> Also, according to "on the fly" direction we should not disable the queue
>> unless enable message is coming to disable it.
> 
> No response, so looks like you agree that it doesn't make sense.

No, my reply was general to all your comments.

With the SW backend, I agree we don't need to disable the rings in case
of asynchronous changes to the ring, because we protect it with a lock,
so we are sure the ring won't be accessed by another thread while doing
the change.

For the vDPA case, that's more problematic because we have no such
locking mechanism.

For example, on memory hotplug, Qemu does not seem to disable the
queues, so we need to stop the vDPA device one way or another so that it
does not process the rings while the Vhost lib remaps the memory areas.
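
As a toy illustration of that hotplug sequencing (all names invented;
the real dev_conf/dev_close vDPA ops and the set_mem_table handler are
more involved than this):

#include <stdbool.h>
#include <stdio.h>

struct toy_dev {
	bool vdpa_running; /* the vDPA HW is currently processing rings */
};

/* Hypothetical stand-ins for stopping/restarting the vDPA datapath. */
static void toy_vdpa_stop(struct toy_dev *d)
{
	d->vdpa_running = false;
	puts("vDPA device stopped");
}

static void toy_vdpa_restart(struct toy_dev *d)
{
	d->vdpa_running = true;
	puts("vDPA device restarted");
}

/*
 * set_mem_table sketch: Qemu does not disable the queues on memory
 * hotplug, so the device must be quiesced around the remap, or it
 * would keep accessing regions the Vhost lib is replacing.
 */
static void toy_set_mem_table(struct toy_dev *d)
{
	if (d->vdpa_running)
		toy_vdpa_stop(d);

	puts("remapping guest memory areas");

	toy_vdpa_restart(d);
}

int main(void)
{
	struct toy_dev d = { .vdpa_running = true };

	toy_set_mem_table(&d);
	return 0;
}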

>>> In addition:
>>> Do you really want to toggle vDPA drivers\app for any configuration
>> message? It may cause queue recreation for each one (at least for mlx5).
>>
>> I want to have something robust and maintainable.
> 
> Me too.
> 
>> These messages arriving after a queue have been configured once are rare
>> events, but this is usually the kind of things that cause maintenance burden.
> 
> In case of guest poll mode (testpmd virtio) we all the time get callfd twice.

Right.

>> If you look at my example patch, you will understand that with my proposal,
>> there won't be any more state change notification than with your proposal
>> when Qemu or any other Vhost-user master send a disable request before
>> sending the request that impact the queue state.
> 
> we didn't talk about disable time - this one is very simple.
> 
> Yes, In case the queue is disabled your proposal doesn't send extra notification as my.
> But in case the queue is ready, your proposal send extra not ready notification for kikfd,callfd,set_vring_base configurations.

I think this is necessary for synchronization with the Vhost-user
master (in case the master asks for this synchronization, like
set_mem_table for instance when reply-ack is enabled).

>> It just adds more robustness if this unlikely event happens, by invalidating
>> the ring state to not ready before doing the actual ring configuration change.
>> So that this config change is not missed by the vDPA driver or the application.
> 
> One more issue here is that there is some time that device is ready (already configured) and the first vittq-pair is not ready (your invalidate proposal for set_vring_base).



> It doesn’t save the concept that device is ready only in case the first virtq-pair is ready.

I understand the spec as "the device is ready as soon as the first queue
pair is ready", but I might be wrong.

Do you suggest calling the dev_close() vDPA callback and the
destroy_device() application callback as soon as one of the rings of the
first queue pair receives a disable request or, with my patch, when one
of the rings receives a request that changes the ring state?

> 
> I will not insist anymore on waiting for enable for notifying although I not fan with it.
> 
> So, I suggest to create 1 notification function to be called after message handler and before reply.
> This function is the only one which notify ready states in the next options:
> 
> 1. virtq ready state is changed in the queue.
> 2. virtq ready state stays on after configuration message handler.
> 3. device state will be enabled when the first queue pair is ready.

IIUC, it will not disable the queues when there is a state change, is
that correct? If so, I think it does not work with the memory hotplug
case I mentioned earlier.

Even for the callfd double change it can be problematic, as the Vhost
lib will close the first one while it is still being used by the driver
(BTW, I see my example patch is also buggy in this regard: it should
reset the call_fd value in the virtqueue, then call
vhost_user_update_vring_state() and finally close the FD).
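
In toy form, that corrected ordering would look like this (invented
names, same caveats as the earlier sketches):

#include <stdio.h>
#include <unistd.h>

#define TOY_UNINITIALIZED_EVENTFD (-2)

struct toy_vring { int callfd; };

static void toy_notify(int index, const char *state)
{
	printf("vring %d -> %s\n", index, state);
}

static void toy_set_vring_call(struct toy_vring *vq, int index, int new_fd)
{
	int old_fd = vq->callfd;

	vq->callfd = TOY_UNINITIALIZED_EVENTFD; /* 1) drop the FD from the ring state */
	toy_notify(index, "not ready");         /* 2) driver releases its reference   */
	if (old_fd >= 0)
		close(old_fd);                  /* 3) only now is it safe to close    */

	vq->callfd = new_fd;
	toy_notify(index, "ready");             /* new value taken into account       */
}

int main(void)
{
	struct toy_vring vq = { .callfd = -1 };

	toy_set_vring_call(&vq, 0, 42); /* 42: a dummy FD for the demo */
	return 0;
}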

Thanks,
Maxime
> 
> Matan
> 
> 
> 
>> Maxime
> 


^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [dpdk-dev] [PATCH v1 3/4] vhost: improve device ready definition
  2020-06-23 13:55                                   ` Maxime Coquelin
@ 2020-06-23 14:33                                     ` Maxime Coquelin
  2020-06-23 14:52                                     ` Matan Azrad
  1 sibling, 0 replies; 59+ messages in thread
From: Maxime Coquelin @ 2020-06-23 14:33 UTC (permalink / raw)
  To: Matan Azrad, Xiao Wang; +Cc: dev



On 6/23/20 3:55 PM, Maxime Coquelin wrote:
> Hi Matan,
> 
> On 6/23/20 1:53 PM, Matan Azrad wrote:
>>
>>
>> From: Maxime Coquelin:
>>> On 6/23/20 11:02 AM, Matan Azrad wrote:
>>>>
>>>>
>>>> From: Maxime Coquelin:
>>>>> On 6/22/20 5:51 PM, Matan Azrad wrote:
>>>>>>
>>>>>>
>>>>>> From: Maxime Coquelin:
>>>>>>> On 6/22/20 3:43 PM, Matan Azrad wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>> From: Maxime Coquelin:
>>>>>>>>> Sent: Monday, June 22, 2020 3:33 PM
>>>>>>>>> To: Matan Azrad <matan@mellanox.com>; Xiao Wang
>>>>>>>>> <xiao.w.wang@intel.com>
>>>>>>>>> Cc: dev@dpdk.org
>>>>>>>>> Subject: Re: [PATCH v1 3/4] vhost: improve device ready
>>>>>>>>> definition
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 6/22/20 12:06 PM, Matan Azrad wrote:
>>>>>>>>>>
>>>>>>>>>> Hi Maxime
>>>>>>>>>>
>>>>>>>>>> From: Maxime Coquelin <maxime.coquelin@redhat.com>
>>>>>>>>>>> Sent: Monday, June 22, 2020 11:56 AM
>>>>>>>>>>> To: Matan Azrad <matan@mellanox.com>; Xiao Wang
>>>>>>>>>>> <xiao.w.wang@intel.com>
>>>>>>>>>>> Cc: dev@dpdk.org
>>>>>>>>>>> Subject: Re: [PATCH v1 3/4] vhost: improve device ready
>>>>>>>>>>> definition
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On 6/22/20 10:41 AM, Matan Azrad wrote:
>>>>>>>>>>>>> The issue is if you only check ready state only before and
>>>>>>>>>>>>> after the message affecting the ring is handled, it can be
>>>>>>>>>>>>> ready at both stages, while the rings have changed and state
>>>>>>>>>>>>> change callback should
>>>>>>>>>>> have been called.
>>>>>>>>>>>> But in this version I checked twice, before message handler
>>>>>>>>>>>> and after
>>>>>>>>>>> message handler, so it should catch any update.
>>>>>>>>>>>
>>>>>>>>>>> No, this is not enough, we have to check also during some
>>>>>>>>>>> handlers, so that the ready state is invalidated because
>>>>>>>>>>> sometimes it will be ready before and after the message handler
>>>>>>>>>>> but
>>>>> with different values.
>>>>>>>>>>>
>>>>>>>>>>> That's what I did in my example patch:
>>>>>>>>>>> @@ -1847,15 +1892,16 @@ vhost_user_set_vring_kick(struct
>>>>>>> virtio_net
>>>>>>>>>>> **pdev, struct VhostUserMsg *msg,
>>>>>>>>>>>
>>>>>>>>>>> ...
>>>>>>>>>>>
>>>>>>>>>>>         if (vq->kickfd >= 0)
>>>>>>>>>>>                 close(vq->kickfd);
>>>>>>>>>>> +
>>>>>>>>>>> +       vq->kickfd = VIRTIO_UNINITIALIZED_EVENTFD;
>>>>>>>>>>> +
>>>>>>>>>>> +       vhost_user_update_vring_state(dev, file.index);
>>>>>>>>>>> +
>>>>>>>>>>>         vq->kickfd = file.fd;
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Without that, the ready check will return ready before and
>>>>>>>>>>> after the kickfd changed and the driver won't be notified.
>>>>>>>>>>
>>>>>>>>>> The driver will be notified in the next
>>>>>>>>>> VHOST_USER_SET_VRING_ENABLE
>>>>>>>>> message according to v1.
>>>>>>>>>>
>>>>>>>>>> One of our assumption we agreed on in the design mail is that it
>>>>>>>>>> doesn't
>>>>>>>>> make sense that QEMU will change queue configuration without
>>>>>>>>> enabling the queue again.
>>>>>>>>>> Because of that we decided to force calling state callback again
>>>>>>>>>> when
>>>>>>>>> QEMU send VHOST_USER_SET_VRING_ENABLE(1) message even if
>>> the
>>>>>>> queue is
>>>>>>>>> already ready.
>>>>>>>>>> So when driver/app see state enable->enable, it should take into
>>>>>>>>>> account
>>>>>>>>> that the queue configuration was probably changed.
>>>>>>>>>>
>>>>>>>>>> I think that this assumption is correct according to the QEMU code.
>>>>>>>>>
>>>>>>>>> Yes, this was our initial assumption.
>>>>>>>>> But now looking into the details of the implementation, I find it
>>>>>>>>> is even cleaner & clearer not to do this assumption.
>>>>>>>>>
>>>>>>>>>> That's why I prefer to collect all the ready checks callbacks
>>>>>>>>>> (queue state and
>>>>>>>>> device new\conf) to one function that will be called after the
>>>>>>>>> message
>>>>>>>>> handler:
>>>>>>>>>> Pseudo:
>>>>>>>>>>  vhost_user_update_ready_statuses() {
>>>>>>>>>> 	switch (msg):
>>>>>>>>>> 		case enable:
>>>>>>>>>> 			if(enable is 1)
>>>>>>>>>> 				force queue state =1.
>>>>>>>>>> 		case callfd
>>>>>>>>>> 		case kickfd
>>>>>>>>>> 				.....
>>>>>>>>>> 		Check queue and device ready + call callbacks if
>>> needed..
>>>>>>>>>> 		Default
>>>>>>>>>> 			Return;
>>>>>>>>>> }
>>>>>>>>>
>>>>>>>>> I find it more natural to "invalidate" ready state where it is
>>>>>>>>> handled (after vring_invalidate(), before setting new FD for call
>>>>>>>>> & kick, ...)
>>>>>>>>
>>>>>>>> I think that if you go with this direction, if the first queue
>>>>>>>> pair is invalidated,
>>>>>>> you need to notify app\driver also about device ready change.
>>>>>>>> Also it will cause 2 notifications to the driver instead of one in
>>>>>>>> case of FD
>>>>>>> change.
>>>>>>>
>>>>>>> You'll always end-up with two notifications, either Qemu has sent
>>>>>>> the disable and so you'll have one notification for the disable and
>>>>>>> one for the enable, or it didn't sent the disable and it will
>>>>>>> happen at old value invalidation time and after new value is taken into
>>> account.
>>>>>>>
>>>>>>
>>>>>> I don't see it in current QEMU behavior.
>>>>>> When working MQ I see that some virtqs get configuration message
>>>>>> while
>>>>> they are in enabled state.
>>>>>> Then, enable message is sent again later.
>>>>>
>>>>> I guess you mean the first queue pair? And it would not be in ready
>>>>> state as it would be the initial configuration of the queue?
>>>>
>>>> Even after initialization when queue is ready.
>>>>
>>>>>>
>>>>>>>> Why not to take this correct assumption and update ready state
>>>>>>>> only in one
>>>>>>> point in the code instead of doing it in all the configuration
>>>>>>> handlers
>>>>> around?
>>>>>>>> IMO, It is correct, less intrusive, simpler, clearer and cleaner.
>>>>>>>
>>>>>>> I just looked closer at the Vhost-user spec, and I'm no more so
>>>>>>> sure this is a correct assumption:
>>>>>>>
>>>>>>> "While processing the rings (whether they are enabled or not),
>>>>>>> client must support changing some configuration aspects on the fly."
>>>>>>
>>>>>> Ok, this doesn't explain how configuration is changed on the fly.
>>>>>
>>>>> I agree it lacks a bit of clarity.
>>>>>
>>>>>> As I mentioned, QEMU sends enable message always after configuration
>>>>> message.
>>>>>
>>>>> Yes, but we should not do assumptions on current Qemu version when
>>>>> possible. Better to be safe and follow the specification, it will be more
>>> robust.
>>>>> There is also the Virtio-user PMD to take into account for example.
>>>>
>>>> I understand your point here but do you really want to be ready for any
>>> configuration update in run time?
>>>> What does it mean? How datatpath should handle configuration from
>>> control thread in run time while traffic is on?
>>>> For example, changing queue size \ addresses must stop traffic before...
>>>> Also changing FDs is very sensitive.
>>>>
>>>> It doesn't make sense to me.
>>>>
>>>> Also, according to "on the fly" direction we should not disable the queue
>>> unless enable message is coming to disable it.
>>
>> No response, so looks like you agree that it doesn't make sense.
> 
> No, my reply was general to all your comments.
> 
> With SW backend, I agree we don't need to disable the rings in case of
> asynchronous changes to the ring because we protect it with a lock, so
> we are sure the ring won't be accessed by another thread while doing the
> change.
> 
> For vDPA case that's more problematic because we have no such locking
> mechanism.
> 
> For example memory hotplug, Qemu does not seem to disable the queues so
> we need to stop the vDPA device one way or another so that it does not
> process the rings while the Vhost lib remaps the memory areas.
> 
>>>> In addition:
>>>> Do you really want to toggle vDPA drivers\app for any configuration
>>> message? It may cause queue recreation for each one (at least for mlx5).
>>>
>>> I want to have something robust and maintainable.
>>
>> Me too.
>>
>>> These messages arriving after a queue have been configured once are rare
>>> events, but this is usually the kind of things that cause maintenance burden.
>>
>> In case of guest poll mode (testpmd virtio) we all the time get callfd twice.
> 
> Right.
> 
>>> If you look at my example patch, you will understand that with my proposal,
>>> there won't be any more state change notification than with your proposal
>>> when Qemu or any other Vhost-user master send a disable request before
>>> sending the request that impact the queue state.
>>
>> we didn't talk about disable time - this one is very simple.
>>
>> Yes, In case the queue is disabled your proposal doesn't send extra notification as my.
>> But in case the queue is ready, your proposal send extra not ready notification for kikfd,callfd,set_vring_base configurations.
> 
> I think this is necessary for synchronization with the Vhost-user
> master (in case the master asks for this synchronization, like
> set_mem_table for instance when reply-ack is enabled).
> 
>>> It just adds more robustness if this unlikely event happens, by invalidating
>>> the ring state to not ready before doing the actual ring configuration change.
>>> So that this config change is not missed by the vDPA driver or the application.
>>
>> One more issue here is that there is some time that device is ready (already configured) and the first vittq-pair is not ready (your invalidate proposal for set_vring_base).
> 

Sorry, I forgot to reply here.
I am not sure what you mean about my invalidate proposal for
set_vring_base?

> 
>> It doesn’t save the concept that device is ready only in case the first virtq-pair is ready.
> 
> I understand the spec as "the device is ready as soon as the first queue
> pair is ready", but I might be wrong.
> 
> Do you suggest to call the dev_close() vDPA callback and the
> destroy_device() application callback as soon as one of the ring of the
> first queue pair receive a disable request or, with my patch, when one
> of the rings receives a request that changes the ring state?
> 
>>
>> I will not insist anymore on waiting for enable for notifying although I not fan with it.
>>
>> So, I suggest to create 1 notification function to be called after message handler and before reply.
>> This function is the only one which notify ready states in the next options:
>>
>> 1. virtq ready state is changed in the queue.
>> 2. virtq ready state stays on after configuration message handler.
>> 3. device state will be enabled when the first queue pair is ready.
> 
> IIUC, it will not disable the queues when there is a state change, is
> that correct? If so, I think it does not work with memory hotplug case I
> mentioned earlier.
> 
> Even for the callfd double change it can be problematic as Vhost-lib
> will close the first one while it will still be used by the driver (Btw,
> I see my example patch is also buggy in this regards, it should reset
> the call_fd value in the virtqueue, then call
> vhost_user_update_vring_state() and finally close the FD).
> 
> Thanks,
> Maxime
>>
>> Matan
>>
>>
>>
>>> Maxime
>>
> 


^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [dpdk-dev] [PATCH v1 3/4] vhost: improve device ready definition
  2020-06-23 13:55                                   ` Maxime Coquelin
  2020-06-23 14:33                                     ` Maxime Coquelin
@ 2020-06-23 14:52                                     ` Matan Azrad
  2020-06-23 15:18                                       ` Maxime Coquelin
  1 sibling, 1 reply; 59+ messages in thread
From: Matan Azrad @ 2020-06-23 14:52 UTC (permalink / raw)
  To: Maxime Coquelin, Xiao Wang; +Cc: dev



> -----Original Message-----
> From: Maxime Coquelin <maxime.coquelin@redhat.com>
> Sent: Tuesday, June 23, 2020 4:56 PM
> To: Matan Azrad <matan@mellanox.com>; Xiao Wang
> <xiao.w.wang@intel.com>
> Cc: dev@dpdk.org
> Subject: Re: [PATCH v1 3/4] vhost: improve device ready definition
> 
> Hi Matan,
> 
> On 6/23/20 1:53 PM, Matan Azrad wrote:
> >
> >
> > From: Maxime Coquelin:
> >> On 6/23/20 11:02 AM, Matan Azrad wrote:
> >>>
> >>>
> >>> From: Maxime Coquelin:
> >>>> On 6/22/20 5:51 PM, Matan Azrad wrote:
> >>>>>
> >>>>>
> >>>>> From: Maxime Coquelin:
> >>>>>> On 6/22/20 3:43 PM, Matan Azrad wrote:
> >>>>>>>
> >>>>>>>
> >>>>>>> From: Maxime Coquelin:
> >>>>>>>> Sent: Monday, June 22, 2020 3:33 PM
> >>>>>>>> To: Matan Azrad <matan@mellanox.com>; Xiao Wang
> >>>>>>>> <xiao.w.wang@intel.com>
> >>>>>>>> Cc: dev@dpdk.org
> >>>>>>>> Subject: Re: [PATCH v1 3/4] vhost: improve device ready
> >>>>>>>> definition
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> On 6/22/20 12:06 PM, Matan Azrad wrote:
> >>>>>>>>>
> >>>>>>>>> Hi Maxime
> >>>>>>>>>
> >>>>>>>>> From: Maxime Coquelin <maxime.coquelin@redhat.com>
> >>>>>>>>>> Sent: Monday, June 22, 2020 11:56 AM
> >>>>>>>>>> To: Matan Azrad <matan@mellanox.com>; Xiao Wang
> >>>>>>>>>> <xiao.w.wang@intel.com>
> >>>>>>>>>> Cc: dev@dpdk.org
> >>>>>>>>>> Subject: Re: [PATCH v1 3/4] vhost: improve device ready
> >>>>>>>>>> definition
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> On 6/22/20 10:41 AM, Matan Azrad wrote:
> >>>>>>>>>>>> The issue is if you only check ready state only before and
> >>>>>>>>>>>> after the message affecting the ring is handled, it can be
> >>>>>>>>>>>> ready at both stages, while the rings have changed and
> >>>>>>>>>>>> state change callback should
> >>>>>>>>>> have been called.
> >>>>>>>>>>> But in this version I checked twice, before message handler
> >>>>>>>>>>> and after
> >>>>>>>>>> message handler, so it should catch any update.
> >>>>>>>>>>
> >>>>>>>>>> No, this is not enough, we have to check also during some
> >>>>>>>>>> handlers, so that the ready state is invalidated because
> >>>>>>>>>> sometimes it will be ready before and after the message
> >>>>>>>>>> handler but
> >>>> with different values.
> >>>>>>>>>>
> >>>>>>>>>> That's what I did in my example patch:
> >>>>>>>>>> @@ -1847,15 +1892,16 @@ vhost_user_set_vring_kick(struct
> >>>>>> virtio_net
> >>>>>>>>>> **pdev, struct VhostUserMsg *msg,
> >>>>>>>>>>
> >>>>>>>>>> ...
> >>>>>>>>>>
> >>>>>>>>>>         if (vq->kickfd >= 0)
> >>>>>>>>>>                 close(vq->kickfd);
> >>>>>>>>>> +
> >>>>>>>>>> +       vq->kickfd = VIRTIO_UNINITIALIZED_EVENTFD;
> >>>>>>>>>> +
> >>>>>>>>>> +       vhost_user_update_vring_state(dev, file.index);
> >>>>>>>>>> +
> >>>>>>>>>>         vq->kickfd = file.fd;
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> Without that, the ready check will return ready before and
> >>>>>>>>>> after the kickfd changed and the driver won't be notified.
> >>>>>>>>>
> >>>>>>>>> The driver will be notified in the next
> >>>>>>>>> VHOST_USER_SET_VRING_ENABLE
> >>>>>>>> message according to v1.
> >>>>>>>>>
> >>>>>>>>> One of our assumption we agreed on in the design mail is that
> >>>>>>>>> it doesn't
> >>>>>>>> make sense that QEMU will change queue configuration without
> >>>>>>>> enabling the queue again.
> >>>>>>>>> Because of that we decided to force calling state callback
> >>>>>>>>> again when
> >>>>>>>> QEMU send VHOST_USER_SET_VRING_ENABLE(1) message even
> if
> >> the
> >>>>>> queue is
> >>>>>>>> already ready.
> >>>>>>>>> So when driver/app see state enable->enable, it should take
> >>>>>>>>> into account
> >>>>>>>> that the queue configuration was probably changed.
> >>>>>>>>>
> >>>>>>>>> I think that this assumption is correct according to the QEMU
> code.
> >>>>>>>>
> >>>>>>>> Yes, this was our initial assumption.
> >>>>>>>> But now looking into the details of the implementation, I find
> >>>>>>>> it is even cleaner & clearer not to do this assumption.
> >>>>>>>>
> >>>>>>>>> That's why I prefer to collect all the ready checks callbacks
> >>>>>>>>> (queue state and
> >>>>>>>> device new\conf) to one function that will be called after the
> >>>>>>>> message
> >>>>>>>> handler:
> >>>>>>>>> Pseudo:
> >>>>>>>>>  vhost_user_update_ready_statuses() {
> >>>>>>>>> 	switch (msg):
> >>>>>>>>> 		case enable:
> >>>>>>>>> 			if(enable is 1)
> >>>>>>>>> 				force queue state =1.
> >>>>>>>>> 		case callfd
> >>>>>>>>> 		case kickfd
> >>>>>>>>> 				.....
> >>>>>>>>> 		Check queue and device ready + call callbacks if
> >> needed..
> >>>>>>>>> 		Default
> >>>>>>>>> 			Return;
> >>>>>>>>> }
> >>>>>>>>
> >>>>>>>> I find it more natural to "invalidate" ready state where it is
> >>>>>>>> handled (after vring_invalidate(), before setting new FD for
> >>>>>>>> call & kick, ...)
> >>>>>>>
> >>>>>>> I think that if you go with this direction, if the first queue
> >>>>>>> pair is invalidated,
> >>>>>> you need to notify app\driver also about device ready change.
> >>>>>>> Also it will cause 2 notifications to the driver instead of one
> >>>>>>> in case of FD
> >>>>>> change.
> >>>>>>
> >>>>>> You'll always end-up with two notifications, either Qemu has sent
> >>>>>> the disable and so you'll have one notification for the disable
> >>>>>> and one for the enable, or it didn't sent the disable and it will
> >>>>>> happen at old value invalidation time and after new value is
> >>>>>> taken into
> >> account.
> >>>>>>
> >>>>>
> >>>>> I don't see it in current QEMU behavior.
> >>>>> When working MQ I see that some virtqs get configuration message
> >>>>> while
> >>>> they are in enabled state.
> >>>>> Then, enable message is sent again later.
> >>>>
> >>>> I guess you mean the first queue pair? And it would not be in ready
> >>>> state as it would be the initial configuration of the queue?
> >>>
> >>> Even after initialization when queue is ready.
> >>>
> >>>>>
> >>>>>>> Why not to take this correct assumption and update ready state
> >>>>>>> only in one
> >>>>>> point in the code instead of doing it in all the configuration
> >>>>>> handlers
> >>>> around?
> >>>>>>> IMO, It is correct, less intrusive, simpler, clearer and cleaner.
> >>>>>>
> >>>>>> I just looked closer at the Vhost-user spec, and I'm no more so
> >>>>>> sure this is a correct assumption:
> >>>>>>
> >>>>>> "While processing the rings (whether they are enabled or not),
> >>>>>> client must support changing some configuration aspects on the fly."
> >>>>>
> >>>>> Ok, this doesn't explain how configuration is changed on the fly.
> >>>>
> >>>> I agree it lacks a bit of clarity.
> >>>>
> >>>>> As I mentioned, QEMU sends enable message always after
> >>>>> configuration
> >>>> message.
> >>>>
> >>>> Yes, but we should not do assumptions on current Qemu version when
> >>>> possible. Better to be safe and follow the specification, it will
> >>>> be more
> >> robust.
> >>>> There is also the Virtio-user PMD to take into account for example.
> >>>
> >>> I understand your point here but do you really want to be ready for
> >>> any
> >> configuration update in run time?
> >>> What does it mean? How datatpath should handle configuration from
> >> control thread in run time while traffic is on?
> >>> For example, changing queue size \ addresses must stop traffic before...
> >>> Also changing FDs is very sensitive.
> >>>
> >>> It doesn't make sense to me.
> >>>
> >>> Also, according to "on the fly" direction we should not disable the
> >>> queue
> >> unless enable message is coming to disable it.
> >
> > No response, so looks like you agree that it doesn't make sense.
> 
> No, my reply was general to all your comments.
> 
> With SW backend, I agree we don't need to disable the rings in case of
> asynchronous changes to the ring because we protect it with a lock, so we
> are sure the ring won't be accessed by another thread while doing the
> change.
> 
> For vDPA case that's more problematic because we have no such locking
> mechanism.
> 
> For example memory hotplug, Qemu does not seem to disable the queues
> so we need to stop the vDPA device one way or another so that it does not
> process the rings while the Vhost lib remaps the memory areas.
> 
> >>> In addition:
> >>> Do you really want to toggle vDPA drivers\app for any configuration
> >> message? It may cause queue recreation for each one (at least for mlx5).
> >>
> >> I want to have something robust and maintainable.
> >
> > Me too.
> >
> >> These messages arriving after a queue have been configured once are
> >> rare events, but this is usually the kind of things that cause maintenance
> burden.
> >
> > In case of guest poll mode (testpmd virtio) we all the time get callfd twice.
> 
> Right.
> 
> >> If you look at my example patch, you will understand that with my
> >> proposal, there won't be any more state change notification than with
> >> your proposal when Qemu or any other Vhost-user master send a disable
> >> request before sending the request that impact the queue state.
> >
> > we didn't talk about disable time - this one is very simple.
> >
> > Yes, In case the queue is disabled your proposal doesn't send extra
> notification as my.
> > But in case the queue is ready, your proposal send extra not ready
> notification for kikfd,callfd,set_vring_base configurations.
> 
> I think this is necessary for synchronization with the Vhost-user master (in
> case the master asks for this synchronization, like set_mem_table for
> instance when reply-ack is enabled).
> 
> >> It just adds more robustness if this unlikely event happens, by
> >> invalidating the ring state to not ready before doing the actual ring
> configuration change.
> >> So that this config change is not missed by the vDPA driver or the
> application.
> >
> > One more issue here is that there is some time that device is ready (already
> configured) and the first vittq-pair is not ready (your invalidate proposal for
> set_vring_base).
> 
> 
> 
> > It doesn’t save the concept that device is ready only in case the first virtq-
> pair is ready.
> 
> I understand the spec as "the device is ready as soon as the first queue pair is
> ready", but I might be wrong.
> 
> Do you suggest to call the dev_close() vDPA callback and the
> destroy_device() application callback as soon as one of the ring of the first
> queue pair receive a disable request or, with my patch, when one of the
> rings receives a request that changes the ring state?

I mean, your proposal may actually make the first virtq-pair's ready state disabled while the device is ready.
So, yes, it leads to calling device close/destroy.

> > I will not insist anymore on waiting for enable for notifying although I not
> fan with it.
> >
> > So, I suggest to create 1 notification function to be called after message
> handler and before reply.
> > This function is the only one which notify ready states in the next options:
> >
> > 1. virtq ready state is changed in the queue.
> > 2. virtq ready state stays on after configuration message handler.
> > 3. device state will be enabled when the first queue pair is ready.
> 
> IIUC, it will not disable the queues when there is a state change, is that
> correct? If so, I think it does not work with memory hotplug case I mentioned
> earlier.

It will do enable again, which means something was modified.

> Even for the callfd double change it can be problematic as Vhost-lib will close
> the first one while it will still be used by the driver (Btw, I see my example
> patch is also buggy in this regards, it should reset the call_fd value in the
> virtqueue, then call
> vhost_user_update_vring_state() and finally close the FD).

Yes, this one leads to a different handling for each message.

Maybe it leads to a new queue modify operation.
So, the queue doesn't send the state - it just does the configuration
change on the fly.

What do you think?
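
To illustrate, a purely hypothetical sketch of such a queue modify
operation - nothing like this exists in rte_vdpa_dev_ops today, and all
names below are invented:

#include <stdio.h>

/* Hypothetical extension of the vDPA ops with a per-queue modify hook. */
struct toy_vdpa_ops {
	int (*set_vring_state)(void *dev, int vring, int state);
	int (*queue_modify)(void *dev, int vring); /* config changed on the fly */
};

static int toy_set_state(void *dev, int vring, int state)
{
	(void)dev;
	printf("vring %d state -> %d\n", vring, state);
	return 0;
}

static int toy_modify(void *dev, int vring)
{
	(void)dev;
	printf("vring %d: re-read FDs/base, no stop/start\n", vring);
	return 0;
}

/* The vhost lib would prefer queue_modify over a disable/enable
 * round-trip when only the configuration of a ready ring changed. */
static int toy_on_config_change(const struct toy_vdpa_ops *ops,
				void *dev, int vring)
{
	if (ops->queue_modify)
		return ops->queue_modify(dev, vring);

	ops->set_vring_state(dev, vring, 0);
	return ops->set_vring_state(dev, vring, 1);
}

int main(void)
{
	struct toy_vdpa_ops ops = {
		.set_vring_state = toy_set_state,
		.queue_modify = toy_modify,
	};

	return toy_on_config_change(&ops, NULL, 0);
}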

 
> Thanks,
> Maxime
> >
> > Matan
> >
> >
> >
> >> Maxime
> >


^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [dpdk-dev] [PATCH v1 3/4] vhost: improve device ready definition
  2020-06-23 14:52                                     ` Matan Azrad
@ 2020-06-23 15:18                                       ` Maxime Coquelin
  2020-06-24  5:54                                         ` Matan Azrad
  0 siblings, 1 reply; 59+ messages in thread
From: Maxime Coquelin @ 2020-06-23 15:18 UTC (permalink / raw)
  To: Matan Azrad, Xiao Wang; +Cc: dev



On 6/23/20 4:52 PM, Matan Azrad wrote:
> 
> 
>> -----Original Message-----
>> From: Maxime Coquelin <maxime.coquelin@redhat.com>
>> Sent: Tuesday, June 23, 2020 4:56 PM
>> To: Matan Azrad <matan@mellanox.com>; Xiao Wang
>> <xiao.w.wang@intel.com>
>> Cc: dev@dpdk.org
>> Subject: Re: [PATCH v1 3/4] vhost: improve device ready definition
>>
>> Hi Matan,
>>
>> On 6/23/20 1:53 PM, Matan Azrad wrote:
>>>
>>>
>>> From: Maxime Coquelin:
>>>> On 6/23/20 11:02 AM, Matan Azrad wrote:
>>>>>
>>>>>
>>>>> From: Maxime Coquelin:
>>>>>> On 6/22/20 5:51 PM, Matan Azrad wrote:
>>>>>>>
>>>>>>>
>>>>>>> From: Maxime Coquelin:
>>>>>>>> On 6/22/20 3:43 PM, Matan Azrad wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> From: Maxime Coquelin:
>>>>>>>>>> Sent: Monday, June 22, 2020 3:33 PM
>>>>>>>>>> To: Matan Azrad <matan@mellanox.com>; Xiao Wang
>>>>>>>>>> <xiao.w.wang@intel.com>
>>>>>>>>>> Cc: dev@dpdk.org
>>>>>>>>>> Subject: Re: [PATCH v1 3/4] vhost: improve device ready
>>>>>>>>>> definition
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 6/22/20 12:06 PM, Matan Azrad wrote:
>>>>>>>>>>>
>>>>>>>>>>> Hi Maxime
>>>>>>>>>>>
>>>>>>>>>>> From: Maxime Coquelin <maxime.coquelin@redhat.com>
>>>>>>>>>>>> Sent: Monday, June 22, 2020 11:56 AM
>>>>>>>>>>>> To: Matan Azrad <matan@mellanox.com>; Xiao Wang
>>>>>>>>>>>> <xiao.w.wang@intel.com>
>>>>>>>>>>>> Cc: dev@dpdk.org
>>>>>>>>>>>> Subject: Re: [PATCH v1 3/4] vhost: improve device ready
>>>>>>>>>>>> definition
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On 6/22/20 10:41 AM, Matan Azrad wrote:
>>>>>>>>>>>>>> The issue is if you only check ready state only before and
>>>>>>>>>>>>>> after the message affecting the ring is handled, it can be
>>>>>>>>>>>>>> ready at both stages, while the rings have changed and
>>>>>>>>>>>>>> state change callback should
>>>>>>>>>>>> have been called.
>>>>>>>>>>>>> But in this version I checked twice, before message handler
>>>>>>>>>>>>> and after
>>>>>>>>>>>> message handler, so it should catch any update.
>>>>>>>>>>>>
>>>>>>>>>>>> No, this is not enough, we have to check also during some
>>>>>>>>>>>> handlers, so that the ready state is invalidated because
>>>>>>>>>>>> sometimes it will be ready before and after the message
>>>>>>>>>>>> handler but
>>>>>> with different values.
>>>>>>>>>>>>
>>>>>>>>>>>> That's what I did in my example patch:
>>>>>>>>>>>> @@ -1847,15 +1892,16 @@ vhost_user_set_vring_kick(struct
>>>>>>>> virtio_net
>>>>>>>>>>>> **pdev, struct VhostUserMsg *msg,
>>>>>>>>>>>>
>>>>>>>>>>>> ...
>>>>>>>>>>>>
>>>>>>>>>>>>         if (vq->kickfd >= 0)
>>>>>>>>>>>>                 close(vq->kickfd);
>>>>>>>>>>>> +
>>>>>>>>>>>> +       vq->kickfd = VIRTIO_UNINITIALIZED_EVENTFD;
>>>>>>>>>>>> +
>>>>>>>>>>>> +       vhost_user_update_vring_state(dev, file.index);
>>>>>>>>>>>> +
>>>>>>>>>>>>         vq->kickfd = file.fd;
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Without that, the ready check will return ready before and
>>>>>>>>>>>> after the kickfd changed and the driver won't be notified.
>>>>>>>>>>>
>>>>>>>>>>> The driver will be notified in the next
>>>>>>>>>>> VHOST_USER_SET_VRING_ENABLE
>>>>>>>>>> message according to v1.
>>>>>>>>>>>
>>>>>>>>>>> One of our assumption we agreed on in the design mail is that
>>>>>>>>>>> it doesn't
>>>>>>>>>> make sense that QEMU will change queue configuration without
>>>>>>>>>> enabling the queue again.
>>>>>>>>>>> Because of that we decided to force calling state callback
>>>>>>>>>>> again when
>>>>>>>>>> QEMU send VHOST_USER_SET_VRING_ENABLE(1) message even
>> if
>>>> the
>>>>>>>> queue is
>>>>>>>>>> already ready.
>>>>>>>>>>> So when driver/app see state enable->enable, it should take
>>>>>>>>>>> into account
>>>>>>>>>> that the queue configuration was probably changed.
>>>>>>>>>>>
>>>>>>>>>>> I think that this assumption is correct according to the QEMU
>> code.
>>>>>>>>>>
>>>>>>>>>> Yes, this was our initial assumption.
>>>>>>>>>> But now looking into the details of the implementation, I find
>>>>>>>>>> it is even cleaner & clearer not to do this assumption.
>>>>>>>>>>
>>>>>>>>>>> That's why I prefer to collect all the ready checks callbacks
>>>>>>>>>>> (queue state and
>>>>>>>>>> device new\conf) to one function that will be called after the
>>>>>>>>>> message
>>>>>>>>>> handler:
>>>>>>>>>>> Pseudo:
>>>>>>>>>>>  vhost_user_update_ready_statuses() {
>>>>>>>>>>> 	switch (msg):
>>>>>>>>>>> 		case enable:
>>>>>>>>>>> 			if(enable is 1)
>>>>>>>>>>> 				force queue state =1.
>>>>>>>>>>> 		case callfd
>>>>>>>>>>> 		case kickfd
>>>>>>>>>>> 				.....
>>>>>>>>>>> 		Check queue and device ready + call callbacks if
>>>> needed..
>>>>>>>>>>> 		Default
>>>>>>>>>>> 			Return;
>>>>>>>>>>> }
>>>>>>>>>>
>>>>>>>>>> I find it more natural to "invalidate" ready state where it is
>>>>>>>>>> handled (after vring_invalidate(), before setting new FD for
>>>>>>>>>> call & kick, ...)
>>>>>>>>>
>>>>>>>>> I think that if you go with this direction, if the first queue
>>>>>>>>> pair is invalidated,
>>>>>>>> you need to notify app\driver also about device ready change.
>>>>>>>>> Also it will cause 2 notifications to the driver instead of one
>>>>>>>>> in case of FD
>>>>>>>> change.
>>>>>>>>
>>>>>>>> You'll always end-up with two notifications, either Qemu has sent
>>>>>>>> the disable and so you'll have one notification for the disable
>>>>>>>> and one for the enable, or it didn't sent the disable and it will
>>>>>>>> happen at old value invalidation time and after new value is
>>>>>>>> taken into
>>>> account.
>>>>>>>>
>>>>>>>
>>>>>>> I don't see it in current QEMU behavior.
>>>>>>> When working with MQ I see that some virtqs get a configuration message
>>>>>>> while
>>>>>> they are in the enabled state.
>>>>>>> Then, enable message is sent again later.
>>>>>>
>>>>>> I guess you mean the first queue pair? And it would not be in ready
>>>>>> state as it would be the initial configuration of the queue?
>>>>>
>>>>> Even after initialization when queue is ready.
>>>>>
>>>>>>>
>>>>>>>>> Why not to take this correct assumption and update ready state
>>>>>>>>> only in one
>>>>>>>> point in the code instead of doing it in all the configuration
>>>>>>>> handlers
>>>>>> around?
>>>>>>>>> IMO, It is correct, less intrusive, simpler, clearer and cleaner.
>>>>>>>>
>>>>>>>> I just looked closer at the Vhost-user spec, and I'm no more so
>>>>>>>> sure this is a correct assumption:
>>>>>>>>
>>>>>>>> "While processing the rings (whether they are enabled or not),
>>>>>>>> client must support changing some configuration aspects on the fly."
>>>>>>>
>>>>>>> Ok, this doesn't explain how configuration is changed on the fly.
>>>>>>
>>>>>> I agree it lacks a bit of clarity.
>>>>>>
>>>>>>> As I mentioned, QEMU sends enable message always after
>>>>>>> configuration
>>>>>> message.
>>>>>>
>>>>>> Yes, but we should not do assumptions on current Qemu version when
>>>>>> possible. Better to be safe and follow the specification, it will
>>>>>> be more
>>>> robust.
>>>>>> There is also the Virtio-user PMD to take into account for example.
>>>>>
>>>>> I understand your point here but do you really want to be ready for
>>>>> any
>>>> configuration update in run time?
>>>>> What does it mean? How should the datapath handle configuration from
>>>> the control thread at run time while traffic is on?
>>>>> For example, changing queue size \ addresses must stop traffic before...
>>>>> Also changing FDs is very sensitive.
>>>>>
>>>>> It doesn't make sense to me.
>>>>>
>>>>> Also, according to "on the fly" direction we should not disable the
>>>>> queue
>>>> unless enable message is coming to disable it.
>>>
>>> No response, so looks like you agree that it doesn't make sense.
>>
>> No, my reply was general to all your comments.
>>
>> With SW backend, I agree we don't need to disable the rings in case of
>> asynchronous changes to the ring because we protect it with a lock, so we
>> are sure the ring won't be accessed by another thread while doing the
>> change.
>>
>> For vDPA case that's more problematic because we have no such locking
>> mechanism.
>>
>> For example memory hotplug, Qemu does not seem to disable the queues
>> so we need to stop the vDPA device one way or another so that it does not
>> process the rings while the Vhost lib remaps the memory areas.
>>
>>>>> In addition:
>>>>> Do you really want to toggle vDPA drivers\app for any configuration
>>>> message? It may cause queue recreation for each one (at least for mlx5).
>>>>
>>>> I want to have something robust and maintainable.
>>>
>>> Me too.
>>>
>>>> These messages arriving after a queue have been configured once are
>>>> rare events, but this is usually the kind of things that cause maintenance
>> burden.
>>>
>>> In case of guest poll mode (testpmd virtio) we get callfd twice all the time.
>>
>> Right.
>>
>>>> If you look at my example patch, you will understand that with my
>>>> proposal, there won't be any more state change notification than with
>>>> your proposal when Qemu or any other Vhost-user master send a disable
>>>> request before sending the request that impact the queue state.
>>>
>>> we didn't talk about disable time - this one is very simple.
>>>
>>> Yes, in case the queue is disabled your proposal doesn't send an extra
>> notification, just like mine.
>>> But in case the queue is ready, your proposal sends an extra not-ready
>> notification for kickfd, callfd, set_vring_base configurations.
>>
>> I think this is necessary for synchronization with the Vhost-user master (in
>> case the master asks for this synchronization, like set_mem_table for
>> instance when reply-ack is enabled).
>>
>>>> It just adds more robustness if this unlikely event happens, by
>>>> invalidating the ring state to not ready before doing the actual ring
>> configuration change.
>>>> So that this config change is not missed by the vDPA driver or the
>> application.
>>>
>>> One more issue here is that there is some time where the device is ready (already
>> configured) and the first virtq-pair is not ready (your invalidate proposal for
>> set_vring_base).
>>
>>
>>
>>> It doesn't preserve the concept that the device is ready only in case the first virtq-
>> pair is ready.
>>
>> I understand the spec as "the device is ready as soon as the first queue pair is
>> ready", but I might be wrong.
>>
>> Do you suggest to call the dev_close() vDPA callback and the
>> destroy_device() application callback as soon as one of the ring of the first
>> queue pair receive a disable request or, with my patch, when one of the
>> rings receives a request that changes the ring state?
> 
> I mean, your proposal may actually make the first virtq-pair ready state disabled while the device is ready.
> So, yes, it leads to calling device close\destroy.

No it doesn't, there is no call to .dev_close()/.destroy_device() with
my patch if first queue pair gets disabled.

>>> I will not insist anymore on waiting for enable before notifying, although I am not
>> a fan of it.
>>>
>>> So, I suggest to create 1 notification function to be called after message
>> handler and before reply.
>>> This function is the only one which notify ready states in the next options:
>>>
>>> 1. virtq ready state is changed in the queue.
>>> 2. virtq ready state stays on after configuration message handler.
>>> 3. device state will be enabled when the first queue pair is ready.
>>
>> IIUC, it will not disable the queues when there is a state change, is that
>> correct? If so, I think it does not work with memory hotplug case I mentioned
>> earlier.
> 
> It will do enable again, which means something was modified.

Ok, thanks for the clarification.

I think it is not enough for the examples I gave below. For
set_mem_table, we need to stop the device from processing the vrings
before the set_mem_table handler calls the munmap(), and re-enable it
after the mmap() (I did that wrong in my example patch, I just did
that after the munmap/mmap happened, which is too late).
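
To make the ordering concrete, here is a rough sketch of what I have in
mind for the handler (simplified pseudo-C; vhost_user_update_vring_state()
is the illustrative helper from my example patch, not an existing API, and
the real function takes more parameters and does error handling):

static int
vhost_user_set_mem_table(struct virtio_net **pdev, struct VhostUserMsg *msg)
{
	struct virtio_net *dev = *pdev;
	uint32_t i;

	/* Invalidate the rings first, so the vDPA driver and the
	 * application stop processing them before the old guest
	 * memory regions are unmapped.
	 */
	for (i = 0; i < dev->nr_vring; i++)
		vhost_user_update_vring_state(dev, i); /* -> not ready */

	/* ... munmap() the old regions, mmap() the new ones ... */

	/* Re-evaluate readiness once the new mapping is in place,
	 * so the driver is notified and can restart processing.
	 */
	for (i = 0; i < dev->nr_vring; i++)
		vhost_user_update_vring_state(dev, i); /* -> ready again */

	return RTE_VHOST_MSG_RESULT_OK;
}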

>> Even for the callfd double change it can be problematic as Vhost-lib will close
>> the first one while it will still be used by the driver (Btw, I see my example
>> patch is also buggy in this regards, it should reset the call_fd value in the
>> virtqueue, then call
>> vhost_user_update_vring_state() and finally close the FD).
> 
> Yes, this one leads to a different handling for each message.
> 
> Maybe it leads to a new queue modify operation.
> So, the queue doesn't send the state - it just does the configuration change on the fly.
> 
> What do you think?

I think that configuration on the fly doesn't fly.
We would at least need to stop the device from processing the rings for
memory hotplug case, so why not just send a disable notification?

And for the double callfd, it does not look right to me not to request
the driver to stop using it before it is closed, does it?
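
To illustrate the ordering I mean for the FD case, a self-contained
sketch (replace_event_fd() and the notification callback are only
illustrative, not an existing vhost API):

#include <unistd.h>

/* Swap an eventfd so that the consumer (vDPA driver or application)
 * is told to stop using the old one before it gets closed.
 */
static void
replace_event_fd(int *slot, int new_fd, void (*vring_state_notify)(void))
{
	int old_fd = *slot;

	*slot = -1;		/* invalidate the FD in the virtqueue */
	vring_state_notify();	/* consumer drops its reference to old_fd */
	*slot = new_fd;		/* publish the new FD */
	vring_state_notify();	/* consumer picks up the new FD */

	if (old_fd >= 0)
		close(old_fd);	/* safe now: nobody uses it anymore */
}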

Thanks,
Maxime

>  
>> Thanks,
>> Maxime
>>>
>>> Matan
>>>
>>>
>>>
>>>> Maxime
>>>
> 


^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [dpdk-dev] [PATCH v1 3/4] vhost: improve device ready definition
  2020-06-23 15:18                                       ` Maxime Coquelin
@ 2020-06-24  5:54                                         ` Matan Azrad
  2020-06-24  7:22                                           ` Maxime Coquelin
  0 siblings, 1 reply; 59+ messages in thread
From: Matan Azrad @ 2020-06-24  5:54 UTC (permalink / raw)
  To: Maxime Coquelin, Xiao Wang; +Cc: dev

Hi Maxime

Good morning

From: Maxime Coquelin:
> On 6/23/20 4:52 PM, Matan Azrad wrote:
> >
> >
> >> -----Original Message-----
> >> From: Maxime Coquelin <maxime.coquelin@redhat.com>
> >> Sent: Tuesday, June 23, 2020 4:56 PM
> >> To: Matan Azrad <matan@mellanox.com>; Xiao Wang
> >> <xiao.w.wang@intel.com>
> >> Cc: dev@dpdk.org
> >> Subject: Re: [PATCH v1 3/4] vhost: improve device ready definition
> >>
> >> Hi Matan,
> >>
> >> On 6/23/20 1:53 PM, Matan Azrad wrote:
> >>>
> >>>
> >>> From: Maxime Coquelin:
> >>>> On 6/23/20 11:02 AM, Matan Azrad wrote:
> >>>>>
> >>>>>
> >>>>> From: Maxime Coquelin:
> >>>>>> On 6/22/20 5:51 PM, Matan Azrad wrote:
> >>>>>>>
> >>>>>>>
> >>>>>>> From: Maxime Coquelin:
> >>>>>>>> On 6/22/20 3:43 PM, Matan Azrad wrote:
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> From: Maxime Coquelin:
> >>>>>>>>>> Sent: Monday, June 22, 2020 3:33 PM
> >>>>>>>>>> To: Matan Azrad <matan@mellanox.com>; Xiao Wang
> >>>>>>>>>> <xiao.w.wang@intel.com>
> >>>>>>>>>> Cc: dev@dpdk.org
> >>>>>>>>>> Subject: Re: [PATCH v1 3/4] vhost: improve device ready
> >>>>>>>>>> definition
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> On 6/22/20 12:06 PM, Matan Azrad wrote:
> >>>>>>>>>>>
> >>>>>>>>>>> Hi Maxime
> >>>>>>>>>>>
> >>>>>>>>>>> From: Maxime Coquelin <maxime.coquelin@redhat.com>
> >>>>>>>>>>>> Sent: Monday, June 22, 2020 11:56 AM
> >>>>>>>>>>>> To: Matan Azrad <matan@mellanox.com>; Xiao Wang
> >>>>>>>>>>>> <xiao.w.wang@intel.com>
> >>>>>>>>>>>> Cc: dev@dpdk.org
> >>>>>>>>>>>> Subject: Re: [PATCH v1 3/4] vhost: improve device ready
> >>>>>>>>>>>> definition
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>> On 6/22/20 10:41 AM, Matan Azrad wrote:
> >>>>>>>>>>>>>> The issue is if you only check ready state only before
> >>>>>>>>>>>>>> and after the message affecting the ring is handled, it
> >>>>>>>>>>>>>> can be ready at both stages, while the rings have changed
> >>>>>>>>>>>>>> and state change callback should
> >>>>>>>>>>>> have been called.
> >>>>>>>>>>>>> But in this version I checked twice, before message
> >>>>>>>>>>>>> handler and after
> >>>>>>>>>>>> message handler, so it should catch any update.
> >>>>>>>>>>>>
> >>>>>>>>>>>> No, this is not enough, we have to check also during some
> >>>>>>>>>>>> handlers, so that the ready state is invalidated because
> >>>>>>>>>>>> sometimes it will be ready before and after the message
> >>>>>>>>>>>> handler but
> >>>>>> with different values.
> >>>>>>>>>>>>
> >>>>>>>>>>>> That's what I did in my example patch:
> >>>>>>>>>>>> @@ -1847,15 +1892,16 @@
> vhost_user_set_vring_kick(struct
> >>>>>>>> virtio_net
> >>>>>>>>>>>> **pdev, struct VhostUserMsg *msg,
> >>>>>>>>>>>>
> >>>>>>>>>>>> ...
> >>>>>>>>>>>>
> >>>>>>>>>>>>         if (vq->kickfd >= 0)
> >>>>>>>>>>>>                 close(vq->kickfd);
> >>>>>>>>>>>> +
> >>>>>>>>>>>> +       vq->kickfd = VIRTIO_UNINITIALIZED_EVENTFD;
> >>>>>>>>>>>> +
> >>>>>>>>>>>> +       vhost_user_update_vring_state(dev, file.index);
> >>>>>>>>>>>> +
> >>>>>>>>>>>>         vq->kickfd = file.fd;
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>> Without that, the ready check will return ready before and
> >>>>>>>>>>>> after the kickfd changed and the driver won't be notified.
> >>>>>>>>>>>
> >>>>>>>>>>> The driver will be notified in the next
> >>>>>>>>>>> VHOST_USER_SET_VRING_ENABLE
> >>>>>>>>>> message according to v1.
> >>>>>>>>>>>
> >>>>>>>>>>> One of our assumption we agreed on in the design mail is
> >>>>>>>>>>> that it doesn't
> >>>>>>>>>> make sense that QEMU will change queue configuration
> without
> >>>>>>>>>> enabling the queue again.
> >>>>>>>>>>> Because of that we decided to force calling state callback
> >>>>>>>>>>> again when
> >>>>>>>>>> QEMU send VHOST_USER_SET_VRING_ENABLE(1) message
> even
> >> if
> >>>> the
> >>>>>>>> queue is
> >>>>>>>>>> already ready.
> >>>>>>>>>>> So when driver/app see state enable->enable, it should take
> >>>>>>>>>>> into account
> >>>>>>>>>> that the queue configuration was probably changed.
> >>>>>>>>>>>
> >>>>>>>>>>> I think that this assumption is correct according to the
> >>>>>>>>>>> QEMU
> >> code.
> >>>>>>>>>>
> >>>>>>>>>> Yes, this was our initial assumption.
> >>>>>>>>>> But now looking into the details of the implementation, I
> >>>>>>>>>> find it is even cleaner & clearer not to do this assumption.
> >>>>>>>>>>
> >>>>>>>>>>> That's why I prefer to collect all the ready checks
> >>>>>>>>>>> callbacks (queue state and
> >>>>>>>>>> device new\conf) to one function that will be called after
> >>>>>>>>>> the message
> >>>>>>>>>> handler:
> >>>>>>>>>>> Pseudo:
> >>>>>>>>>>>  vhost_user_update_ready_statuses() {
> >>>>>>>>>>> 	switch (msg):
> >>>>>>>>>>> 		case enable:
> >>>>>>>>>>> 			if(enable is 1)
> >>>>>>>>>>> 				force queue state =1.
> >>>>>>>>>>> 		case callfd
> >>>>>>>>>>> 		case kickfd
> >>>>>>>>>>> 				.....
> >>>>>>>>>>> 		Check queue and device ready + call callbacks if
> >>>> needed..
> >>>>>>>>>>> 		Default
> >>>>>>>>>>> 			Return;
> >>>>>>>>>>> }
> >>>>>>>>>>
> >>>>>>>>>> I find it more natural to "invalidate" ready state where it
> >>>>>>>>>> is handled (after vring_invalidate(), before setting new FD
> >>>>>>>>>> for call & kick, ...)
> >>>>>>>>>
> >>>>>>>>> I think that if you go with this direction, if the first queue
> >>>>>>>>> pair is invalidated,
> >>>>>>>> you need to notify app\driver also about device ready change.
> >>>>>>>>> Also it will cause 2 notifications to the driver instead of
> >>>>>>>>> one in case of FD
> >>>>>>>> change.
> >>>>>>>>
> >>>>>>>> You'll always end-up with two notifications, either Qemu has
> >>>>>>>> sent the disable and so you'll have one notification for the
> >>>>>>>> disable and one for the enable, or it didn't sent the disable
> >>>>>>>> and it will happen at old value invalidation time and after new
> >>>>>>>> value is taken into
> >>>> account.
> >>>>>>>>
> >>>>>>>
> >>>>>>> I don't see it in current QEMU behavior.
> >>>>>>> When working MQ I see that some virtqs get configuration
> message
> >>>>>>> while
> >>>>>> they are in enabled state.
> >>>>>>> Then, enable message is sent again later.
> >>>>>>
> >>>>>> I guess you mean the first queue pair? And it would not be in
> >>>>>> ready state as it would be the initial configuration of the queue?
> >>>>>
> >>>>> Even after initialization when queue is ready.
> >>>>>
> >>>>>>>
> >>>>>>>>> Why not to take this correct assumption and update ready state
> >>>>>>>>> only in one
> >>>>>>>> point in the code instead of doing it in all the configuration
> >>>>>>>> handlers
> >>>>>> around?
> >>>>>>>>> IMO, It is correct, less intrusive, simpler, clearer and cleaner.
> >>>>>>>>
> >>>>>>>> I just looked closer at the Vhost-user spec, and I'm no more so
> >>>>>>>> sure this is a correct assumption:
> >>>>>>>>
> >>>>>>>> "While processing the rings (whether they are enabled or not),
> >>>>>>>> client must support changing some configuration aspects on the
> fly."
> >>>>>>>
> >>>>>>> Ok, this doesn't explain how configuration is changed on the fly.
> >>>>>>
> >>>>>> I agree it lacks a bit of clarity.
> >>>>>>
> >>>>>>> As I mentioned, QEMU sends enable message always after
> >>>>>>> configuration
> >>>>>> message.
> >>>>>>
> >>>>>> Yes, but we should not do assumptions on current Qemu version
> >>>>>> when possible. Better to be safe and follow the specification, it
> >>>>>> will be more
> >>>> robust.
> >>>>>> There is also the Virtio-user PMD to take into account for example.
> >>>>>
> >>>>> I understand your point here but do you really want to be ready
> >>>>> for any
> >>>> configuration update in run time?
> >>>>> What does it mean? How should the datapath handle configuration from
> >>>> the control thread at run time while traffic is on?
> >>>>> For example, changing queue size \ addresses must stop traffic
> before...
> >>>>> Also changing FDs is very sensitive.
> >>>>>
> >>>>> It doesn't make sense to me.
> >>>>>
> >>>>> Also, according to "on the fly" direction we should not disable
> >>>>> the queue
> >>>> unless enable message is coming to disable it.
> >>>
> >>> No response, so looks like you agree that it doesn't make sense.
> >>
> >> No, my reply was general to all your comments.
> >>
> >> With SW backend, I agree we don't need to disable the rings in case
> >> of asynchronous changes to the ring because we protect it with a
> >> lock, so we are sure the ring won't be accessed by another thread
> >> while doing the change.
> >>
> >> For vDPA case that's more problematic because we have no such locking
> >> mechanism.
> >>
> >> For example memory hotplug, Qemu does not seem to disable the
> queues
> >> so we need to stop the vDPA device one way or another so that it does
> >> not process the rings while the Vhost lib remaps the memory areas.
> >>
> >>>>> In addition:
> >>>>> Do you really want to toggle vDPA drivers\app for any
> >>>>> configuration
> >>>> message? It may cause queue recreation for each one (at least for
> mlx5).
> >>>>
> >>>> I want to have something robust and maintainable.
> >>>
> >>> Me too.
> >>>
> >>>> These messages arriving after a queue have been configured once are
> >>>> rare events, but this is usually the kind of things that cause
> >>>> maintenance
> >> burden.
> >>>
> >>> In case of guest poll mode (testpmd virtio) we all the time get callfd
> twice.
> >>
> >> Right.
> >>
> >>>> If you look at my example patch, you will understand that with my
> >>>> proposal, there won't be any more state change notification than
> >>>> with your proposal when Qemu or any other Vhost-user master send a
> >>>> disable request before sending the request that impact the queue
> state.
> >>>
> >>> we didn't talk about disable time - this one is very simple.
> >>>
> >>> Yes, in case the queue is disabled your proposal doesn't send an extra
> >> notification, just like mine.
> >>> But in case the queue is ready, your proposal sends an extra not-ready
> >> notification for kickfd, callfd, set_vring_base configurations.
> >>
> >> I think this is necessary for synchronization with the Vhost-user
> >> master (in case the master asks for this synchronization, like
> >> set_mem_table for instance when reply-ack is enabled).
> >>
> >>>> It just adds more robustness if this unlikely event happens, by
> >>>> invalidating the ring state to not ready before doing the actual
> >>>> ring
> >> configuration change.
> >>>> So that this config change is not missed by the vDPA driver or the
> >> application.
> >>>
> >>> One more issue here is that there is some time where the device is ready
> >>> (already
> >> configured) and the first virtq-pair is not ready (your invalidate
> >> proposal for set_vring_base).
> >>
> >>
> >>
> >>> It doesn't preserve the concept that the device is ready only in case the
> >>> first virtq-
> >> pair is ready.
> >>
> >> I understand the spec as "the device is ready as soon as the first
> >> queue pair is ready", but I might be wrong.
> >>
> >> Do you suggest to call the dev_close() vDPA callback and the
> >> destroy_device() application callback as soon as one of the ring of
> >> the first queue pair receive a disable request or, with my patch,
> >> when one of the rings receives a request that changes the ring state?
> >
> > I mean, your proposal may actually make the first virtq-pair ready state
> disabled while the device is ready.
> > So, yes, it leads to calling device close\destroy.
> 
> No it doesn't, there is no call to .dev_close()/.destroy_device() with my
> patch if first queue pair gets disabled.
> 
> >>> I will not insist anymore on waiting for enable before notifying,
> >>> although I am not
> >> a fan of it.
> >>>
> >>> So, I suggest to create 1 notification function to be called after
> >>> message
> >> handler and before reply.
> >>> This function is the only one which notify ready states in the next
> options:
> >>>
> >>> 1. virtq ready state is changed in the queue.
> >>> 2. virtq ready state stays on after configuration message handler.
> >>> 3. device state will be enabled when the first queue pair is ready.
> >>
> >> IIUC, it will not disable the queues when there is a state change, is
> >> that correct? If so, I think it does not work with memory hotplug
> >> case I mentioned earlier.
> >
> > It will do enable again, which means something was modified.
> 
> Ok, thanks for the clarification.
> 
> I think it is not enough for the examples I gave below. For set_mem_table,
> we need to stop the device from processing the vrings before the
> set_mem_table handler calls the munmap(), and re-enable it after the
> mmap() (I did that wrong in my example patch, I just did that after the
> munmap/mmap happened, which is too late).
> 
> >> Even for the callfd double change it can be problematic as Vhost-lib
> >> will close the first one while it will still be used by the driver
> >> (Btw, I see my example patch is also buggy in this regards, it should
> >> reset the call_fd value in the virtqueue, then call
> >> vhost_user_update_vring_state() and finally close the FD).
> >
> > Yes, this one leads to a different handling for each message.
> >
> > Maybe it leads to a new queue modify operation.
> > So, the queue doesn't send the state - it just does the configuration change
> on the fly.
> >
> > What do you think?
> 
> I think that configuration on the fly doesn't fly.
> We would at least need to stop the device from processing the rings for
> memory hotplug case, so why not just send a disable notification?

Yes, the driver needs a notification here.

> And for the double callfd, it does not look right to me not to request the
> driver to stop using it before it is closed, does it?

Yes, and some drivers (including mlx5) may stop the traffic in this case too.

A modify\update operation would solve all of these cases:

For example:

In memory hotplug:
Do the new mmap.
Call modify.
Do munmap for the old one.

In callfd\kickfd change:

Set the new FD.
Call modify.
Close the old FD.

Modify is clearer, saves calls and is faster (the datapath will be back faster).
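
As a sketch of what I mean (the names are only illustrative, this modify
operation does not exist in the vDPA API today):

#include <stdint.h>

/* Proposed new vDPA driver callback: the vhost library would call it
 * after changing a queue's configuration in place, instead of toggling
 * set_vring_state(0)/set_vring_state(1).
 */
typedef int (*vdpa_queue_modify_t)(int vid, uint16_t vring);

/* Intended call flow in the vhost library for a callfd change:
 *
 *	vq->callfd = new_fd;		(publish the new value)
 *	ops->queue_modify(vid, index);	(driver switches to it)
 *	close(old_fd);			(old FD is no longer referenced)
 */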


>  Thanks,
> Maxime
> 
> >
> >> Thanks,
> >> Maxime
> >>>
> >>> Matan
> >>>
> >>>
> >>>
> >>>> Maxime
> >>>
> >


^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [dpdk-dev] [PATCH v1 3/4] vhost: improve device ready definition
  2020-06-24  5:54                                         ` Matan Azrad
@ 2020-06-24  7:22                                           ` Maxime Coquelin
  2020-06-24  8:38                                             ` Matan Azrad
  0 siblings, 1 reply; 59+ messages in thread
From: Maxime Coquelin @ 2020-06-24  7:22 UTC (permalink / raw)
  To: Matan Azrad, Xiao Wang; +Cc: dev

Good morning Matan,

On 6/24/20 7:54 AM, Matan Azrad wrote:
> Hi Maxime
> 
> Good morning
> 
> From: Maxime Coquelin:
>> On 6/23/20 4:52 PM, Matan Azrad wrote:
>>>
>>>
>>>> -----Original Message-----
>>>> From: Maxime Coquelin <maxime.coquelin@redhat.com>
>>>> Sent: Tuesday, June 23, 2020 4:56 PM
>>>> To: Matan Azrad <matan@mellanox.com>; Xiao Wang
>>>> <xiao.w.wang@intel.com>
>>>> Cc: dev@dpdk.org
>>>> Subject: Re: [PATCH v1 3/4] vhost: improve device ready definition
>>>>
>>>> Hi Matan,
>>>>
>>>> On 6/23/20 1:53 PM, Matan Azrad wrote:
>>>>>
>>>>>
>>>>> From: Maxime Coquelin:
>>>>>> On 6/23/20 11:02 AM, Matan Azrad wrote:
>>>>>>>
>>>>>>>
>>>>>>> From: Maxime Coquelin:
>>>>>>>> On 6/22/20 5:51 PM, Matan Azrad wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> From: Maxime Coquelin:
>>>>>>>>>> On 6/22/20 3:43 PM, Matan Azrad wrote:
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> From: Maxime Coquelin:
>>>>>>>>>>>> Sent: Monday, June 22, 2020 3:33 PM
>>>>>>>>>>>> To: Matan Azrad <matan@mellanox.com>; Xiao Wang
>>>>>>>>>>>> <xiao.w.wang@intel.com>
>>>>>>>>>>>> Cc: dev@dpdk.org
>>>>>>>>>>>> Subject: Re: [PATCH v1 3/4] vhost: improve device ready
>>>>>>>>>>>> definition
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On 6/22/20 12:06 PM, Matan Azrad wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> Hi Maxime
>>>>>>>>>>>>>
>>>>>>>>>>>>> From: Maxime Coquelin <maxime.coquelin@redhat.com>
>>>>>>>>>>>>>> Sent: Monday, June 22, 2020 11:56 AM
>>>>>>>>>>>>>> To: Matan Azrad <matan@mellanox.com>; Xiao Wang
>>>>>>>>>>>>>> <xiao.w.wang@intel.com>
>>>>>>>>>>>>>> Cc: dev@dpdk.org
>>>>>>>>>>>>>> Subject: Re: [PATCH v1 3/4] vhost: improve device ready
>>>>>>>>>>>>>> definition
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On 6/22/20 10:41 AM, Matan Azrad wrote:
>>>>>>>>>>>>>>>> The issue is if you only check ready state only before
>>>>>>>>>>>>>>>> and after the message affecting the ring is handled, it
>>>>>>>>>>>>>>>> can be ready at both stages, while the rings have changed
>>>>>>>>>>>>>>>> and state change callback should
>>>>>>>>>>>>>> have been called.
>>>>>>>>>>>>>>> But in this version I checked twice, before message
>>>>>>>>>>>>>>> handler and after
>>>>>>>>>>>>>> message handler, so it should catch any update.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> No, this is not enough, we have to check also during some
>>>>>>>>>>>>>> handlers, so that the ready state is invalidated because
>>>>>>>>>>>>>> sometimes it will be ready before and after the message
>>>>>>>>>>>>>> handler but
>>>>>>>> with different values.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> That's what I did in my example patch:
>>>>>>>>>>>>>> @@ -1847,15 +1892,16 @@
>> vhost_user_set_vring_kick(struct
>>>>>>>>>> virtio_net
>>>>>>>>>>>>>> **pdev, struct VhostUserMsg *msg,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> ...
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>         if (vq->kickfd >= 0)
>>>>>>>>>>>>>>                 close(vq->kickfd);
>>>>>>>>>>>>>> +
>>>>>>>>>>>>>> +       vq->kickfd = VIRTIO_UNINITIALIZED_EVENTFD;
>>>>>>>>>>>>>> +
>>>>>>>>>>>>>> +       vhost_user_update_vring_state(dev, file.index);
>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>         vq->kickfd = file.fd;
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Without that, the ready check will return ready before and
>>>>>>>>>>>>>> after the kickfd changed and the driver won't be notified.
>>>>>>>>>>>>>
>>>>>>>>>>>>> The driver will be notified in the next
>>>>>>>>>>>>> VHOST_USER_SET_VRING_ENABLE
>>>>>>>>>>>> message according to v1.
>>>>>>>>>>>>>
>>>>>>>>>>>>> One of our assumption we agreed on in the design mail is
>>>>>>>>>>>>> that it doesn't
>>>>>>>>>>>> make sense that QEMU will change queue configuration
>> without
>>>>>>>>>>>> enabling the queue again.
>>>>>>>>>>>>> Because of that we decided to force calling state callback
>>>>>>>>>>>>> again when
>>>>>>>>>>>> QEMU send VHOST_USER_SET_VRING_ENABLE(1) message
>> even
>>>> if
>>>>>> the
>>>>>>>>>> queue is
>>>>>>>>>>>> already ready.
>>>>>>>>>>>>> So when driver/app see state enable->enable, it should take
>>>>>>>>>>>>> into account
>>>>>>>>>>>> that the queue configuration was probably changed.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I think that this assumption is correct according to the
>>>>>>>>>>>>> QEMU
>>>> code.
>>>>>>>>>>>>
>>>>>>>>>>>> Yes, this was our initial assumption.
>>>>>>>>>>>> But now looking into the details of the implementation, I
>>>>>>>>>>>> find it is even cleaner & clearer not to do this assumption.
>>>>>>>>>>>>
>>>>>>>>>>>>> That's why I prefer to collect all the ready checks
>>>>>>>>>>>>> callbacks (queue state and
>>>>>>>>>>>> device new\conf) to one function that will be called after
>>>>>>>>>>>> the message
>>>>>>>>>>>> handler:
>>>>>>>>>>>>> Pseudo:
>>>>>>>>>>>>>  vhost_user_update_ready_statuses() {
>>>>>>>>>>>>> 	switch (msg):
>>>>>>>>>>>>> 		case enable:
>>>>>>>>>>>>> 			if(enable is 1)
>>>>>>>>>>>>> 				force queue state =1.
>>>>>>>>>>>>> 		case callfd
>>>>>>>>>>>>> 		case kickfd
>>>>>>>>>>>>> 				.....
>>>>>>>>>>>>> 		Check queue and device ready + call callbacks if
>>>>>> needed..
>>>>>>>>>>>>> 		Default
>>>>>>>>>>>>> 			Return;
>>>>>>>>>>>>> }
>>>>>>>>>>>>
>>>>>>>>>>>> I find it more natural to "invalidate" ready state where it
>>>>>>>>>>>> is handled (after vring_invalidate(), before setting new FD
>>>>>>>>>>>> for call & kick, ...)
>>>>>>>>>>>
>>>>>>>>>>> I think that if you go with this direction, if the first queue
>>>>>>>>>>> pair is invalidated,
>>>>>>>>>> you need to notify app\driver also about device ready change.
>>>>>>>>>>> Also it will cause 2 notifications to the driver instead of
>>>>>>>>>>> one in case of FD
>>>>>>>>>> change.
>>>>>>>>>>
>>>>>>>>>> You'll always end-up with two notifications, either Qemu has
>>>>>>>>>> sent the disable and so you'll have one notification for the
>>>>>>>>>> disable and one for the enable, or it didn't sent the disable
>>>>>>>>>> and it will happen at old value invalidation time and after new
>>>>>>>>>> value is taken into
>>>>>> account.
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> I don't see it in current QEMU behavior.
>>>>>>>>> When working MQ I see that some virtqs get configuration
>> message
>>>>>>>>> while
>>>>>>>> they are in enabled state.
>>>>>>>>> Then, enable message is sent again later.
>>>>>>>>
>>>>>>>> I guess you mean the first queue pair? And it would not be in
>>>>>>>> ready state as it would be the initial configuration of the queue?
>>>>>>>
>>>>>>> Even after initialization when queue is ready.
>>>>>>>
>>>>>>>>>
>>>>>>>>>>> Why not to take this correct assumption and update ready state
>>>>>>>>>>> only in one
>>>>>>>>>> point in the code instead of doing it in all the configuration
>>>>>>>>>> handlers
>>>>>>>> around?
>>>>>>>>>>> IMO, It is correct, less intrusive, simpler, clearer and cleaner.
>>>>>>>>>>
>>>>>>>>>> I just looked closer at the Vhost-user spec, and I'm no more so
>>>>>>>>>> sure this is a correct assumption:
>>>>>>>>>>
>>>>>>>>>> "While processing the rings (whether they are enabled or not),
>>>>>>>>>> client must support changing some configuration aspects on the
>> fly."
>>>>>>>>>
>>>>>>>>> Ok, this doesn't explain how configuration is changed on the fly.
>>>>>>>>
>>>>>>>> I agree it lacks a bit of clarity.
>>>>>>>>
>>>>>>>>> As I mentioned, QEMU sends enable message always after
>>>>>>>>> configuration
>>>>>>>> message.
>>>>>>>>
>>>>>>>> Yes, but we should not do assumptions on current Qemu version
>>>>>>>> when possible. Better to be safe and follow the specification, it
>>>>>>>> will be more
>>>>>> robust.
>>>>>>>> There is also the Virtio-user PMD to take into account for example.
>>>>>>>
>>>>>>> I understand your point here but do you really want to be ready
>>>>>>> for any
>>>>>> configuration update in run time?
>>>>>>> What does it mean? How should the datapath handle configuration from
>>>>>> the control thread at run time while traffic is on?
>>>>>>> For example, changing queue size \ addresses must stop traffic
>> before...
>>>>>>> Also changing FDs is very sensitive.
>>>>>>>
>>>>>>> It doesn't make sense to me.
>>>>>>>
>>>>>>> Also, according to "on the fly" direction we should not disable
>>>>>>> the queue
>>>>>> unless enable message is coming to disable it.
>>>>>
>>>>> No response, so looks like you agree that it doesn't make sense.
>>>>
>>>> No, my reply was general to all your comments.
>>>>
>>>> With SW backend, I agree we don't need to disable the rings in case
>>>> of asynchronous changes to the ring because we protect it with a
>>>> lock, so we are sure the ring won't be accessed by another thread
>>>> while doing the change.
>>>>
>>>> For vDPA case that's more problematic because we have no such locking
>>>> mechanism.
>>>>
>>>> For example memory hotplug, Qemu does not seem to disable the
>> queues
>>>> so we need to stop the vDPA device one way or another so that it does
>>>> not process the rings while the Vhost lib remaps the memory areas.
>>>>
>>>>>>> In addition:
>>>>>>> Do you really want to toggle vDPA drivers\app for any
>>>>>>> configuration
>>>>>> message? It may cause queue recreation for each one (at least for
>> mlx5).
>>>>>>
>>>>>> I want to have something robust and maintainable.
>>>>>
>>>>> Me too.
>>>>>
>>>>>> These messages arriving after a queue have been configured once are
>>>>>> rare events, but this is usually the kind of things that cause
>>>>>> maintenance
>>>> burden.
>>>>>
>>>>> In case of guest poll mode (testpmd virtio) we all the time get callfd
>> twice.
>>>>
>>>> Right.
>>>>
>>>>>> If you look at my example patch, you will understand that with my
>>>>>> proposal, there won't be any more state change notification than
>>>>>> with your proposal when Qemu or any other Vhost-user master send a
>>>>>> disable request before sending the request that impact the queue
>> state.
>>>>>
>>>>> we didn't talk about disable time - this one is very simple.
>>>>>
>>>>> Yes, in case the queue is disabled your proposal doesn't send an extra
>>>> notification, just like mine.
>>>>> But in case the queue is ready, your proposal sends an extra not-ready
>>>> notification for kickfd, callfd, set_vring_base configurations.
>>>>
>>>> I think this is necessary for synchronization with the Vhost-user
>>>> master (in case the master asks for this synchronization, like
>>>> set_mem_table for instance when reply-ack is enabled).
>>>>
>>>>>> It just adds more robustness if this unlikely event happens, by
>>>>>> invalidating the ring state to not ready before doing the actual
>>>>>> ring
>>>> configuration change.
>>>>>> So that this config change is not missed by the vDPA driver or the
>>>> application.
>>>>>
>>>>> One more issue here is that there is some time where the device is ready
>>>>> (already
>>>> configured) and the first virtq-pair is not ready (your invalidate
>>>> proposal for set_vring_base).
>>>>
>>>>
>>>>
>>>>> It doesn't preserve the concept that the device is ready only in case the
>>>>> first virtq-
>>>> pair is ready.
>>>>
>>>> I understand the spec as "the device is ready as soon as the first
>>>> queue pair is ready", but I might be wrong.
>>>>
>>>> Do you suggest to call the dev_close() vDPA callback and the
>>>> destroy_device() application callback as soon as one of the ring of
>>>> the first queue pair receive a disable request or, with my patch,
>>>> when one of the rings receives a request that changes the ring state?
>>>
>>> I mean, your proposal may actually make the first virtq-pair ready state
>> disabled while the device is ready.
>>> So, yes, it leads to calling device close\destroy.
>>
>> No it doesn't, there is no call to .dev_close()/.destroy_device() with my
>> patch if first queue pair gets disabled.
>>
>>>>> I will not insist anymore on waiting for enable before notifying,
>>>>> although I am not
>>>> a fan of it.
>>>>>
>>>>> So, I suggest to create 1 notification function to be called after
>>>>> message
>>>> handler and before reply.
>>>>> This function is the only one which notify ready states in the next
>> options:
>>>>>
>>>>> 1. virtq ready state is changed in the queue.
>>>>> 2. virtq ready state stays on after configuration message handler.
>>>>> 3. device state will be enabled when the first queue pair is ready.
>>>>
>>>> IIUC, it will not disable the queues when there is a state change, is
>>>> that correct? If so, I think it does not work with memory hotplug
>>>> case I mentioned earlier.
>>>
>>> It will do enable again, which means something was modified.
>>
>> Ok, thanks for the clarification.
>>
>> I think it is not enough for the examples I gave below. For set_mem_table,
>> we need to stop the device from processing the vrings before the
>> set_mem_table handler calls the munmap(), and re-enable it after the
>> mmap() (I did that wrong in my example patch, I just did that after the
>> munmap/mmap happened, which is too late).
>>
>>>> Even for the callfd double change it can be problematic as Vhost-lib
>>>> will close the first one while it will still be used by the driver
>>>> (Btw, I see my example patch is also buggy in this regards, it should
>>>> reset the call_fd value in the virtqueue, then call
>>>> vhost_user_update_vring_state() and finally close the FD).
>>>
>>> Yes, this one leads to a different handling for each message.
>>>
>>> Maybe it leads to a new queue modify operation.
>>> So, the queue doesn't send the state - it just does the configuration change
>> on the fly.
>>>
>>> What do you think?
>>
>> I think that configuration on the fly doesn't fly.
>> We would at least need to stop the device from processing the rings for
>> memory hotplug case, so why not just send a disable notification?
> 
> Yes, the driver needs a notification here.
> 
>> And for the double callfd, it does not look right to me not to request the
>> driver to stop using it before it is closed, does it?
> 
> Yes, and some drivers (including mlx5) may stop the traffic in this case too.
> 
> A modify\update operation would solve all of these cases:
> 
> For example:
> 
> In memory hotplug:
> Do the new mmap.
> Call modify.
> Do munmap for the old one.
> 
> In callfd\kickfd change:
> 
> Set the new FD.
> Call modify.
> Close the old FD.
> 
> Modify is clearer, saves calls and is faster (the datapath will be back faster).

It should work, but those are not light modifications to make in the
set_mem_table handler (the function is already quite complex with
postcopy live-migration support).

With a modify callback, won't the driver part be more complex? It would
have to check which state has changed in the ring and, based on that,
decide whether it should stop the ring or not.

As you say, in the case of memory hotplug and double callfd the driver
may stop processing the rings anyway, so would it be that much faster
than disabling/enabling the vring?

These events occur very rarely, so does it really matter if it takes a
bit longer?

Thanks,
Maxime

> 
>>  Thanks,
>> Maxime
>>
>>>
>>>> Thanks,
>>>> Maxime
>>>>>
>>>>> Matan
>>>>>
>>>>>
>>>>>
>>>>>> Maxime
>>>>>
>>>
> 


^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [dpdk-dev] [PATCH v1 3/4] vhost: improve device ready definition
  2020-06-24  7:22                                           ` Maxime Coquelin
@ 2020-06-24  8:38                                             ` Matan Azrad
  2020-06-24  9:12                                               ` Maxime Coquelin
  0 siblings, 1 reply; 59+ messages in thread
From: Matan Azrad @ 2020-06-24  8:38 UTC (permalink / raw)
  To: Maxime Coquelin, Xiao Wang; +Cc: dev



> -----Original Message-----
> From: Maxime Coquelin <maxime.coquelin@redhat.com>
> Sent: Wednesday, June 24, 2020 10:22 AM
> To: Matan Azrad <matan@mellanox.com>; Xiao Wang
> <xiao.w.wang@intel.com>
> Cc: dev@dpdk.org
> Subject: Re: [PATCH v1 3/4] vhost: improve device ready definition
> 
> Good morning Matan,
> 
> On 6/24/20 7:54 AM, Matan Azrad wrote:
> > Hi Maxime
> >
> > Good morning
> >
> > From: Maxime Coquelin:
> >> On 6/23/20 4:52 PM, Matan Azrad wrote:
> >>>
> >>>
> >>>> -----Original Message-----
> >>>> From: Maxime Coquelin <maxime.coquelin@redhat.com>
> >>>> Sent: Tuesday, June 23, 2020 4:56 PM
> >>>> To: Matan Azrad <matan@mellanox.com>; Xiao Wang
> >>>> <xiao.w.wang@intel.com>
> >>>> Cc: dev@dpdk.org
> >>>> Subject: Re: [PATCH v1 3/4] vhost: improve device ready definition
> >>>>
> >>>> Hi Matan,
> >>>>
> >>>> On 6/23/20 1:53 PM, Matan Azrad wrote:
> >>>>>
> >>>>>
> >>>>> From: Maxime Coquelin:
> >>>>>> On 6/23/20 11:02 AM, Matan Azrad wrote:
> >>>>>>>
> >>>>>>>
> >>>>>>> From: Maxime Coquelin:
> >>>>>>>> On 6/22/20 5:51 PM, Matan Azrad wrote:
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> From: Maxime Coquelin:
> >>>>>>>>>> On 6/22/20 3:43 PM, Matan Azrad wrote:
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> From: Maxime Coquelin:
> >>>>>>>>>>>> Sent: Monday, June 22, 2020 3:33 PM
> >>>>>>>>>>>> To: Matan Azrad <matan@mellanox.com>; Xiao Wang
> >>>>>>>>>>>> <xiao.w.wang@intel.com>
> >>>>>>>>>>>> Cc: dev@dpdk.org
> >>>>>>>>>>>> Subject: Re: [PATCH v1 3/4] vhost: improve device ready
> >>>>>>>>>>>> definition
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>> On 6/22/20 12:06 PM, Matan Azrad wrote:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Hi Maxime
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> From: Maxime Coquelin <maxime.coquelin@redhat.com>
> >>>>>>>>>>>>>> Sent: Monday, June 22, 2020 11:56 AM
> >>>>>>>>>>>>>> To: Matan Azrad <matan@mellanox.com>; Xiao Wang
> >>>>>>>>>>>>>> <xiao.w.wang@intel.com>
> >>>>>>>>>>>>>> Cc: dev@dpdk.org
> >>>>>>>>>>>>>> Subject: Re: [PATCH v1 3/4] vhost: improve device ready
> >>>>>>>>>>>>>> definition
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> On 6/22/20 10:41 AM, Matan Azrad wrote:
> >>>>>>>>>>>>>>>> The issue is if you only check ready state only before
> >>>>>>>>>>>>>>>> and after the message affecting the ring is handled, it
> >>>>>>>>>>>>>>>> can be ready at both stages, while the rings have
> >>>>>>>>>>>>>>>> changed and state change callback should
> >>>>>>>>>>>>>> have been called.
> >>>>>>>>>>>>>>> But in this version I checked twice, before message
> >>>>>>>>>>>>>>> handler and after
> >>>>>>>>>>>>>> message handler, so it should catch any update.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> No, this is not enough, we have to check also during some
> >>>>>>>>>>>>>> handlers, so that the ready state is invalidated because
> >>>>>>>>>>>>>> sometimes it will be ready before and after the message
> >>>>>>>>>>>>>> handler but
> >>>>>>>> with different values.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> That's what I did in my example patch:
> >>>>>>>>>>>>>> @@ -1847,15 +1892,16 @@
> >> vhost_user_set_vring_kick(struct
> >>>>>>>>>> virtio_net
> >>>>>>>>>>>>>> **pdev, struct VhostUserMsg *msg,
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> ...
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>         if (vq->kickfd >= 0)
> >>>>>>>>>>>>>>                 close(vq->kickfd);
> >>>>>>>>>>>>>> +
> >>>>>>>>>>>>>> +       vq->kickfd = VIRTIO_UNINITIALIZED_EVENTFD;
> >>>>>>>>>>>>>> +
> >>>>>>>>>>>>>> +       vhost_user_update_vring_state(dev, file.index);
> >>>>>>>>>>>>>> +
> >>>>>>>>>>>>>>         vq->kickfd = file.fd;
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Without that, the ready check will return ready before
> >>>>>>>>>>>>>> and after the kickfd changed and the driver won't be
> notified.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> The driver will be notified in the next
> >>>>>>>>>>>>> VHOST_USER_SET_VRING_ENABLE
> >>>>>>>>>>>> message according to v1.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> One of our assumption we agreed on in the design mail is
> >>>>>>>>>>>>> that it doesn't
> >>>>>>>>>>>> make sense that QEMU will change queue configuration
> >> without
> >>>>>>>>>>>> enabling the queue again.
> >>>>>>>>>>>>> Because of that we decided to force calling state callback
> >>>>>>>>>>>>> again when
> >>>>>>>>>>>> QEMU send VHOST_USER_SET_VRING_ENABLE(1) message
> >> even
> >>>> if
> >>>>>> the
> >>>>>>>>>> queue is
> >>>>>>>>>>>> already ready.
> >>>>>>>>>>>>> So when driver/app see state enable->enable, it should
> >>>>>>>>>>>>> take into account
> >>>>>>>>>>>> that the queue configuration was probably changed.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> I think that this assumption is correct according to the
> >>>>>>>>>>>>> QEMU
> >>>> code.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Yes, this was our initial assumption.
> >>>>>>>>>>>> But now looking into the details of the implementation, I
> >>>>>>>>>>>> find it is even cleaner & clearer not to do this assumption.
> >>>>>>>>>>>>
> >>>>>>>>>>>>> That's why I prefer to collect all the ready checks
> >>>>>>>>>>>>> callbacks (queue state and
> >>>>>>>>>>>> device new\conf) to one function that will be called after
> >>>>>>>>>>>> the message
> >>>>>>>>>>>> handler:
> >>>>>>>>>>>>> Pseudo:
> >>>>>>>>>>>>>  vhost_user_update_ready_statuses() {
> >>>>>>>>>>>>> 	switch (msg):
> >>>>>>>>>>>>> 		case enable:
> >>>>>>>>>>>>> 			if(enable is 1)
> >>>>>>>>>>>>> 				force queue state =1.
> >>>>>>>>>>>>> 		case callfd
> >>>>>>>>>>>>> 		case kickfd
> >>>>>>>>>>>>> 				.....
> >>>>>>>>>>>>> 		Check queue and device ready + call callbacks
> if
> >>>>>> needed..
> >>>>>>>>>>>>> 		Default
> >>>>>>>>>>>>> 			Return;
> >>>>>>>>>>>>> }
> >>>>>>>>>>>>
> >>>>>>>>>>>> I find it more natural to "invalidate" ready state where it
> >>>>>>>>>>>> is handled (after vring_invalidate(), before setting new FD
> >>>>>>>>>>>> for call & kick, ...)
> >>>>>>>>>>>
> >>>>>>>>>>> I think that if you go with this direction, if the first
> >>>>>>>>>>> queue pair is invalidated,
> >>>>>>>>>> you need to notify app\driver also about device ready change.
> >>>>>>>>>>> Also it will cause 2 notifications to the driver instead of
> >>>>>>>>>>> one in case of FD
> >>>>>>>>>> change.
> >>>>>>>>>>
> >>>>>>>>>> You'll always end-up with two notifications, either Qemu has
> >>>>>>>>>> sent the disable and so you'll have one notification for the
> >>>>>>>>>> disable and one for the enable, or it didn't sent the disable
> >>>>>>>>>> and it will happen at old value invalidation time and after
> >>>>>>>>>> new value is taken into
> >>>>>> account.
> >>>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> I don't see it in current QEMU behavior.
> >>>>>>>>> When working MQ I see that some virtqs get configuration
> >> message
> >>>>>>>>> while
> >>>>>>>> they are in enabled state.
> >>>>>>>>> Then, enable message is sent again later.
> >>>>>>>>
> >>>>>>>> I guess you mean the first queue pair? And it would not be in
> >>>>>>>> ready state as it would be the initial configuration of the queue?
> >>>>>>>
> >>>>>>> Even after initialization when queue is ready.
> >>>>>>>
> >>>>>>>>>
> >>>>>>>>>>> Why not to take this correct assumption and update ready
> >>>>>>>>>>> state only in one
> >>>>>>>>>> point in the code instead of doing it in all the
> >>>>>>>>>> configuration handlers
> >>>>>>>> around?
> >>>>>>>>>>> IMO, It is correct, less intrusive, simpler, clearer and cleaner.
> >>>>>>>>>>
> >>>>>>>>>> I just looked closer at the Vhost-user spec, and I'm no more
> >>>>>>>>>> so sure this is a correct assumption:
> >>>>>>>>>>
> >>>>>>>>>> "While processing the rings (whether they are enabled or
> >>>>>>>>>> not), client must support changing some configuration aspects
> >>>>>>>>>> on the
> >> fly."
> >>>>>>>>>
> >>>>>>>>> Ok, this doesn't explain how configuration is changed on the fly.
> >>>>>>>>
> >>>>>>>> I agree it lacks a bit of clarity.
> >>>>>>>>
> >>>>>>>>> As I mentioned, QEMU sends enable message always after
> >>>>>>>>> configuration
> >>>>>>>> message.
> >>>>>>>>
> >>>>>>>> Yes, but we should not do assumptions on current Qemu version
> >>>>>>>> when possible. Better to be safe and follow the specification,
> >>>>>>>> it will be more
> >>>>>> robust.
> >>>>>>>> There is also the Virtio-user PMD to take into account for
> example.
> >>>>>>>
> >>>>>>> I understand your point here but do you really want to be ready
> >>>>>>> for any
> >>>>>> configuration update in run time?
> >>>>>>> What does it mean? How should the datapath handle configuration
> >>>>>>> from
> >>>>>> the control thread at run time while traffic is on?
> >>>>>>> For example, changing queue size \ addresses must stop traffic
> >> before...
> >>>>>>> Also changing FDs is very sensitive.
> >>>>>>>
> >>>>>>> It doesn't make sense to me.
> >>>>>>>
> >>>>>>> Also, according to "on the fly" direction we should not disable
> >>>>>>> the queue
> >>>>>> unless enable message is coming to disable it.
> >>>>>
> >>>>> No response, so looks like you agree that it doesn't make sense.
> >>>>
> >>>> No, my reply was general to all your comments.
> >>>>
> >>>> With SW backend, I agree we don't need to disable the rings in case
> >>>> of asynchronous changes to the ring because we protect it with a
> >>>> lock, so we are sure the ring won't be accessed by another thread
> >>>> while doing the change.
> >>>>
> >>>> For vDPA case that's more problematic because we have no such
> >>>> locking mechanism.
> >>>>
> >>>> For example memory hotplug, Qemu does not seem to disable the
> >> queues
> >>>> so we need to stop the vDPA device one way or another so that it
> >>>> does not process the rings while the Vhost lib remaps the memory
> areas.
> >>>>
> >>>>>>> In addition:
> >>>>>>> Do you really want to toggle vDPA drivers\app for any
> >>>>>>> configuration
> >>>>>> message? It may cause queue recreation for each one (at least for
> >> mlx5).
> >>>>>>
> >>>>>> I want to have something robust and maintainable.
> >>>>>
> >>>>> Me too.
> >>>>>
> >>>>>> These messages arriving after a queue have been configured once
> >>>>>> are rare events, but this is usually the kind of things that
> >>>>>> cause maintenance
> >>>> burden.
> >>>>>
> >>>>> In case of guest poll mode (testpmd virtio) we all the time get
> >>>>> callfd
> >> twice.
> >>>>
> >>>> Right.
> >>>>
> >>>>>> If you look at my example patch, you will understand that with my
> >>>>>> proposal, there won't be any more state change notification than
> >>>>>> with your proposal when Qemu or any other Vhost-user master
> send
> >>>>>> a disable request before sending the request that impact the
> >>>>>> queue
> >> state.
> >>>>>
> >>>>> we didn't talk about disable time - this one is very simple.
> >>>>>
> >>>>> Yes, in case the queue is disabled your proposal doesn't send an
> >>>>> extra
> >>>> notification, just like mine.
> >>>>> But in case the queue is ready, your proposal sends an extra not-ready
> >>>> notification for kickfd, callfd, set_vring_base configurations.
> >>>>
> >>>> I think this is necessary for synchronization with the Vhost-user
> >>>> master (in case the master asks for this synchronization, like
> >>>> set_mem_table for instance when reply-ack is enabled).
> >>>>
> >>>>>> It just adds more robustness if this unlikely event happens, by
> >>>>>> invalidating the ring state to not ready before doing the actual
> >>>>>> ring
> >>>> configuration change.
> >>>>>> So that this config change is not missed by the vDPA driver or
> >>>>>> the
> >>>> application.
> >>>>>
> >>>>> One more issue here is that there is some time where the device is
> >>>>> ready (already
> >>>> configured) and the first virtq-pair is not ready (your invalidate
> >>>> proposal for set_vring_base).
> >>>>
> >>>>
> >>>>
> >>>>> It doesn't preserve the concept that the device is ready only in case the
> >>>>> first virtq-
> >>>> pair is ready.
> >>>>
> >>>> I understand the spec as "the device is ready as soon as the first
> >>>> queue pair is ready", but I might be wrong.
> >>>>
> >>>> Do you suggest to call the dev_close() vDPA callback and the
> >>>> destroy_device() application callback as soon as one of the ring of
> >>>> the first queue pair receive a disable request or, with my patch,
> >>>> when one of the rings receives a request that changes the ring state?
> >>>
> >>> I mean, your proposal may actually make the first virtq-pair ready
> >>> state
> >> disabled while the device is ready.
> >>> So, yes, it leads to calling device close\destroy.
> >>
> >> No it doesn't, there is no call to .dev_close()/.destroy_device()
> >> with my patch if first queue pair gets disabled.
> >>
> >>>>> I will not insist anymore on waiting for enable before notifying,
> >>>>> although I am not
> >>>> a fan of it.
> >>>>>
> >>>>> So, I suggest to create 1 notification function to be called after
> >>>>> message
> >>>> handler and before reply.
> >>>>> This function is the only one which notify ready states in the
> >>>>> next
> >> options:
> >>>>>
> >>>>> 1. virtq ready state is changed in the queue.
> >>>>> 2. virtq ready state stays on after configuration message handler.
> >>>>> 3. device state will be enabled when the first queue pair is ready.
> >>>>
> >>>> IIUC, it will not disable the queues when there is a state change,
> >>>> is that correct? If so, I think it does not work with memory
> >>>> hotplug case I mentioned earlier.
> >>>
> >>> It will do enable again, which means something was modified.
> >>
> >> Ok, thanks for the clarification.
> >>
> >> I think it is not enough for the examples I gave below. For
> >> set_mem_table, we need to stop the device from processing the vrings
> >> before the set_mem_table handler calls the munmap(), and re-enable it
> >> after the
> >> mmap() (I did that wrong in my example patch, I just did that after
> >> the munmap/mmap happened, which is too late).
> >>
> >>>> Even for the callfd double change it can be problematic as
> >>>> Vhost-lib will close the first one while it will still be used by
> >>>> the driver (Btw, I see my example patch is also buggy in this
> >>>> regards, it should reset the call_fd value in the virtqueue, then
> >>>> call
> >>>> vhost_user_update_vring_state() and finally close the FD).
> >>>
> >>> Yes, this one leads to a different handling for each message.
> >>>
> >>> Maybe it leads to a new queue modify operation.
> >>> So, the queue doesn't send the state - it just does the configuration
> >>> change on the fly.
> >>>
> >>> What do you think?
> >>
> >> I think that configuration on the fly doesn't fly.
> >> We would at least need to stop the device from processing the rings
> >> for memory hotplug case, so why not just send a disable notification?
> >
> > Yes, the driver needs a notification here.
> >
> >> And for the double callfd, it does not look right to me not to
> >> request the driver to stop using it before it is closed, does it?
> >
> > Yes, and some drivers (including mlx5) may stop the traffic in this case too.
> >
> > A modify\update operation would solve all of these cases:
> >
> > For example:
> >
> > In memory hotplug:
> > Do the new mmap.
> > Call modify.
> > Do munmap for the old one.
> >
> > In callfd\kickfd change:
> >
> > Set the new FD.
> > Call modify.
> > Close the old FD.
> >
> > Modify is clearer, saves calls and is faster (the datapath will be back faster).
> 
> It should work, but those are not light modifications to make in the
> set_mem_table handler (the function is already quite complex with postcopy
> live-migration support).
> 
> With a modify callback, won't the driver part be more complex? It would
> have to check which state has changed in the ring and, based on that,
> decide whether it should stop the ring or not.
> 
> As you say, in the case of memory hotplug and double callfd the driver may
> stop processing the rings anyway, so would it be that much faster than
> disabling/enabling the vring?
> 
> These events occur very rarely, so does it really matter if it takes a bit
> longer?


Just thinking again about memory hotplug:

The mlx5 device needs to be reinitialized in this case because the NIC holds memory translations which must be updated before the virtqs are created.

So, maybe we need to close and re-configure the vDPA device in this case.
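
I.e., roughly this flow (a pseudo sketch reusing the existing
dev_close/dev_conf driver callbacks; remap_guest_memory() is only a
placeholder for the munmap/mmap step done by the vhost library):

#include <rte_vdpa.h>

static void remap_guest_memory(void); /* placeholder: munmap old, mmap new */

static void
handle_mem_table_change(struct rte_vdpa_dev_ops *ops, int vid)
{
	ops->dev_close(vid);	/* destroy HW virtqs built on the old mapping */
	remap_guest_memory();	/* the vhost lib remaps the guest memory */
	ops->dev_conf(vid);	/* rebuild translations and virtqs */
}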

@Xiao Wang, can you comment on the IFC behavior here?

Matan


> Thanks,
> Maxime
> 
> >
> >>  Thanks,
> >> Maxime
> >>
> >>>
> >>>> Thanks,
> >>>> Maxime
> >>>>>
> >>>>> Matan
> >>>>>
> >>>>>
> >>>>>
> >>>>>> Maxime
> >>>>>
> >>>
> >


^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [dpdk-dev] [PATCH v1 3/4] vhost: improve device ready definition
  2020-06-24  8:38                                             ` Matan Azrad
@ 2020-06-24  9:12                                               ` Maxime Coquelin
  0 siblings, 0 replies; 59+ messages in thread
From: Maxime Coquelin @ 2020-06-24  9:12 UTC (permalink / raw)
  To: Matan Azrad, Xiao Wang; +Cc: dev



On 6/24/20 10:38 AM, Matan Azrad wrote:
> 
> 
>> -----Original Message-----
>> From: Maxime Coquelin <maxime.coquelin@redhat.com>
>> Sent: Wednesday, June 24, 2020 10:22 AM
>> To: Matan Azrad <matan@mellanox.com>; Xiao Wang
>> <xiao.w.wang@intel.com>
>> Cc: dev@dpdk.org
>> Subject: Re: [PATCH v1 3/4] vhost: improve device ready definition
>>
>> Good morning Matan,
>>
>> On 6/24/20 7:54 AM, Matan Azrad wrote:
>>> Hi Maxime
>>>
>>> Good morning
>>>
>>> From: Maxime Coquelin:
>>>> On 6/23/20 4:52 PM, Matan Azrad wrote:
>>>>>
>>>>>
>>>>>> -----Original Message-----
>>>>>> From: Maxime Coquelin <maxime.coquelin@redhat.com>
>>>>>> Sent: Tuesday, June 23, 2020 4:56 PM
>>>>>> To: Matan Azrad <matan@mellanox.com>; Xiao Wang
>>>>>> <xiao.w.wang@intel.com>
>>>>>> Cc: dev@dpdk.org
>>>>>> Subject: Re: [PATCH v1 3/4] vhost: improve device ready definition
>>>>>>
>>>>>> Hi Matan,
>>>>>>
>>>>>> On 6/23/20 1:53 PM, Matan Azrad wrote:
>>>>>>>
>>>>>>>
>>>>>>> From: Maxime Coquelin:
>>>>>>>> On 6/23/20 11:02 AM, Matan Azrad wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> From: Maxime Coquelin:
>>>>>>>>>> On 6/22/20 5:51 PM, Matan Azrad wrote:
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> From: Maxime Coquelin:
>>>>>>>>>>>> On 6/22/20 3:43 PM, Matan Azrad wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> From: Maxime Coquelin:
>>>>>>>>>>>>>> Sent: Monday, June 22, 2020 3:33 PM
>>>>>>>>>>>>>> To: Matan Azrad <matan@mellanox.com>; Xiao Wang
>>>>>>>>>>>>>> <xiao.w.wang@intel.com>
>>>>>>>>>>>>>> Cc: dev@dpdk.org
>>>>>>>>>>>>>> Subject: Re: [PATCH v1 3/4] vhost: improve device ready
>>>>>>>>>>>>>> definition
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On 6/22/20 12:06 PM, Matan Azrad wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hi Maxime
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> From: Maxime Coquelin <maxime.coquelin@redhat.com>
>>>>>>>>>>>>>>>> Sent: Monday, June 22, 2020 11:56 AM
>>>>>>>>>>>>>>>> To: Matan Azrad <matan@mellanox.com>; Xiao Wang
>>>>>>>>>>>>>>>> <xiao.w.wang@intel.com>
>>>>>>>>>>>>>>>> Cc: dev@dpdk.org
>>>>>>>>>>>>>>>> Subject: Re: [PATCH v1 3/4] vhost: improve device ready
>>>>>>>>>>>>>>>> definition
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On 6/22/20 10:41 AM, Matan Azrad wrote:
>>>>>>>>>>>>>>>>>> The issue is if you only check ready state only before
>>>>>>>>>>>>>>>>>> and after the message affecting the ring is handled, it
>>>>>>>>>>>>>>>>>> can be ready at both stages, while the rings have
>>>>>>>>>>>>>>>>>> changed and state change callback should
>>>>>>>>>>>>>>>> have been called.
>>>>>>>>>>>>>>>>> But in this version I checked twice, before message
>>>>>>>>>>>>>>>>> handler and after
>>>>>>>>>>>>>>>> message handler, so it should catch any update.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> No, this is not enough, we have to check also during some
>>>>>>>>>>>>>>>> handlers, so that the ready state is invalidated because
>>>>>>>>>>>>>>>> sometimes it will be ready before and after the message
>>>>>>>>>>>>>>>> handler but
>>>>>>>>>> with different values.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> That's what I did in my example patch:
>>>>>>>>>>>>>>>> @@ -1847,15 +1892,16 @@
>>>> vhost_user_set_vring_kick(struct
>>>>>>>>>>>> virtio_net
>>>>>>>>>>>>>>>> **pdev, struct VhostUserMsg *msg,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> ...
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>         if (vq->kickfd >= 0)
>>>>>>>>>>>>>>>>                 close(vq->kickfd);
>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>> +       vq->kickfd = VIRTIO_UNINITIALIZED_EVENTFD;
>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>> +       vhost_user_update_vring_state(dev, file.index);
>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>         vq->kickfd = file.fd;
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Without that, the ready check will return ready before
>>>>>>>>>>>>>>>> and after the kickfd changed and the driver won't be
>> notified.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> The driver will be notified in the next
>>>>>>>>>>>>>>> VHOST_USER_SET_VRING_ENABLE
>>>>>>>>>>>>>> message according to v1.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> One of our assumption we agreed on in the design mail is
>>>>>>>>>>>>>>> that it doesn't
>>>>>>>>>>>>>> make sense that QEMU will change queue configuration
>>>> without
>>>>>>>>>>>>>> enabling the queue again.
>>>>>>>>>>>>>>> Because of that we decided to force calling state callback
>>>>>>>>>>>>>>> again when
>>>>>>>>>>>>>> QEMU send VHOST_USER_SET_VRING_ENABLE(1) message
>>>> even
>>>>>> if
>>>>>>>> the
>>>>>>>>>>>> queue is
>>>>>>>>>>>>>> already ready.
>>>>>>>>>>>>>>> So when driver/app see state enable->enable, it should
>>>>>>>>>>>>>>> take into account
>>>>>>>>>>>>>> that the queue configuration was probably changed.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I think that this assumption is correct according to the
>>>>>>>>>>>>>>> QEMU
>>>>>> code.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Yes, this was our initial assumption.
>>>>>>>>>>>>>> But now looking into the details of the implementation, I
>>>>>>>>>>>>>> find it is even cleaner & clearer not to do this assumption.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> That's why I prefer to collect all the ready checks
>>>>>>>>>>>>>>> callbacks (queue state and
>>>>>>>>>>>>>> device new\conf) to one function that will be called after
>>>>>>>>>>>>>> the message
>>>>>>>>>>>>>> handler:
>>>>>>>>>>>>>>> Pseudo:
>>>>>>>>>>>>>>>  vhost_user_update_ready_statuses() {
>>>>>>>>>>>>>>> 	switch (msg):
>>>>>>>>>>>>>>> 		case enable:
>>>>>>>>>>>>>>> 			if(enable is 1)
>>>>>>>>>>>>>>> 				force queue state =1.
>>>>>>>>>>>>>>> 		case callfd
>>>>>>>>>>>>>>> 		case kickfd
>>>>>>>>>>>>>>> 				.....
>>>>>>>>>>>>>>> 		Check queue and device ready + call callbacks
>> if
>>>>>>>> needed..
>>>>>>>>>>>>>>> 		Default
>>>>>>>>>>>>>>> 			Return;
>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I find it more natural to "invalidate" ready state where it
>>>>>>>>>>>>>> is handled (after vring_invalidate(), before setting new FD
>>>>>>>>>>>>>> for call & kick, ...)
>>>>>>>>>>>>>
>>>>>>>>>>>>> I think that if you go with this direction, if the first
>>>>>>>>>>>>> queue pair is invalidated,
>>>>>>>>>>>> you need to notify app\driver also about device ready change.
>>>>>>>>>>>>> Also it will cause 2 notifications to the driver instead of
>>>>>>>>>>>>> one in case of FD
>>>>>>>>>>>> change.
>>>>>>>>>>>>
>>>>>>>>>>>> You'll always end-up with two notifications, either Qemu has
>>>>>>>>>>>> sent the disable and so you'll have one notification for the
>>>>>>>>>>>> disable and one for the enable, or it didn't sent the disable
>>>>>>>>>>>> and it will happen at old value invalidation time and after
>>>>>>>>>>>> new value is taken into
>>>>>>>> account.
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> I don't see it in current QEMU behavior.
>>>>>>>>>>> When working MQ I see that some virtqs get configuration
>>>> message
>>>>>>>>>>> while
>>>>>>>>>> they are in enabled state.
>>>>>>>>>>> Then, enable message is sent again later.
>>>>>>>>>>
>>>>>>>>>> I guess you mean the first queue pair? And it would not be in
>>>>>>>>>> ready state as it would be the initial configuration of the queue?
>>>>>>>>>
>>>>>>>>> Even after initialization when queue is ready.
>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>>> Why not to take this correct assumption and update ready
>>>>>>>>>>>>> state only in one
>>>>>>>>>>>> point in the code instead of doing it in all the
>>>>>>>>>>>> configuration handlers
>>>>>>>>>> around?
>>>>>>>>>>>>> IMO, It is correct, less intrusive, simpler, clearer and cleaner.
>>>>>>>>>>>>
>>>>>>>>>>>> I just looked closer at the Vhost-user spec, and I'm no more
>>>>>>>>>>>> so sure this is a correct assumption:
>>>>>>>>>>>>
>>>>>>>>>>>> "While processing the rings (whether they are enabled or
>>>>>>>>>>>> not), client must support changing some configuration aspects
>>>>>>>>>>>> on the
>>>> fly."
>>>>>>>>>>>
>>>>>>>>>>> Ok, this doesn't explain how configuration is changed on the fly.
>>>>>>>>>>
>>>>>>>>>> I agree it lacks a bit of clarity.
>>>>>>>>>>
>>>>>>>>>>> As I mentioned, QEMU sends enable message always after
>>>>>>>>>>> configuration
>>>>>>>>>> message.
>>>>>>>>>>
>>>>>>>>>> Yes, but we should not do assumptions on current Qemu version
>>>>>>>>>> when possible. Better to be safe and follow the specification,
>>>>>>>>>> it will be more
>>>>>>>> robust.
>>>>>>>>>> There is also the Virtio-user PMD to take into account for
>> example.
>>>>>>>>>
>>>>>>>>> I understand your point here but do you really want to be ready
>>>>>>>>> for any
>>>>>>>> configuration update in run time?
>>>>>>>>> What does it mean? How should the datapath handle configuration
>>>>>>>>> from
>>>>>>>> control thread in run time while traffic is on?
>>>>>>>>> For example, changing queue size \ addresses must stop traffic
>>>> before...
>>>>>>>>> Also changing FDs is very sensitive.
>>>>>>>>>
>>>>>>>>> It doesn't make sense to me.
>>>>>>>>>
>>>>>>>>> Also, according to "on the fly" direction we should not disable
>>>>>>>>> the queue
>>>>>>>> unless enable message is coming to disable it.
>>>>>>>
>>>>>>> No response, so looks like you agree that it doesn't make sense.
>>>>>>
>>>>>> No, my reply was general to all your comments.
>>>>>>
>>>>>> With SW backend, I agree we don't need to disable the rings in case
>>>>>> of asynchronous changes to the ring because we protect it with a
>>>>>> lock, so we are sure the ring won't be accessed by another thread
>>>>>> while doing the change.
>>>>>>
>>>>>> For vDPA case that's more problematic because we have no such
>>>>>> locking mechanism.
>>>>>>
>>>>>> For example memory hotplug, Qemu does not seem to disable the
>>>> queues
>>>>>> so we need to stop the vDPA device one way or another so that it
>>>>>> does not process the rings while the Vhost lib remaps the memory
>> areas.
>>>>>>
>>>>>>>>> In addition:
>>>>>>>>> Do you really want to toggle vDPA drivers\app for any
>>>>>>>>> configuration
>>>>>>>> message? It may cause queue recreation for each one (at least for
>>>> mlx5).
>>>>>>>>
>>>>>>>> I want to have something robust and maintainable.
>>>>>>>
>>>>>>> Me too.
>>>>>>>
>>>>>>>> These messages arriving after a queue have been configured once
>>>>>>>> are rare events, but this is usually the kind of things that
>>>>>>>> cause maintenance
>>>>>> burden.
>>>>>>>
>>>>>>> In case of guest poll mode (testpmd virtio) we all the time get
>>>>>>> callfd
>>>> twice.
>>>>>>
>>>>>> Right.
>>>>>>
>>>>>>>> If you look at my example patch, you will understand that with my
>>>>>>>> proposal, there won't be any more state change notification than
>>>>>>>> with your proposal when Qemu or any other Vhost-user master
>> send
>>>>>>>> a disable request before sending the request that impact the
>>>>>>>> queue
>>>> state.
>>>>>>>
>>>>>>> we didn't talk about disable time - this one is very simple.
>>>>>>>
>>>>>>> Yes, in case the queue is disabled your proposal doesn't send an
>>>>>>> extra
>>>>>> notification, just like mine.
>>>>>>> But in case the queue is ready, your proposal sends an extra not-ready
>>>>>> notification for kickfd, callfd and set_vring_base configurations.
>>>>>>
>>>>>> I think this is necessary for synchronization with the Vhost-user
>>>>>> master (in case the master asks for this synchronization, like
>>>>>> set_mem_table for instance when reply-ack is enabled).
>>>>>>
>>>>>>>> It just adds more robustness if this unlikely event happens, by
>>>>>>>> invalidating the ring state to not ready before doing the actual
>>>>>>>> ring
>>>>>> configuration change.
>>>>>>>> So that this config change is not missed by the vDPA driver or
>>>>>>>> the
>>>>>> application.
>>>>>>>
>>>>>>> One more issue here is that there is some time that device is
>>>>>>> ready (already
>>>>>> configured) and the first virtq-pair is not ready (your invalidate
>>>>>> proposal for set_vring_base).
>>>>>>
>>>>>>
>>>>>>
>>>>>>> It doesn’t save the concept that device is ready only in case the
>>>>>>> first virtq-
>>>>>> pair is ready.
>>>>>>
>>>>>> I understand the spec as "the device is ready as soon as the first
>>>>>> queue pair is ready", but I might be wrong.
>>>>>>
>>>>>> Do you suggest to call the dev_close() vDPA callback and the
>>>>>> destroy_device() application callback as soon as one of the ring of
>>>>>> the first queue pair receive a disable request or, with my patch,
>>>>>> when one of the rings receives a request that changes the ring state?
>>>>>
>>>>> I mean, your proposal actually may make the first virtq-pair ready
>>>>> state
>>>> disabled when the device is ready.
>>>>> So, yes, it leads to call device close\destroy.
>>>>
>>>> No it doesn't, there is no call to .dev_close()/.destroy_device()
>>>> with my patch if first queue pair gets disabled.
>>>>
>>>>>>> I will not insist anymore on waiting for enable for notifying
>>>>>>> although I not
>>>>>> fan with it.
>>>>>>>
>>>>>>> So, I suggest to create 1 notification function to be called after
>>>>>>> message
>>>>>> handler and before reply.
>>>>>>> This function is the only one which notify ready states in the
>>>>>>> next
>>>> options:
>>>>>>>
>>>>>>> 1. virtq ready state is changed in the queue.
>>>>>>> 2. virtq ready state stays on after configuration message handler.
>>>>>>> 3. device state will be enabled when the first queue pair is ready.
>>>>>>
>>>>>> IIUC, it will not disable the queues when there is a state change,
>>>>>> is that correct? If so, I think it does not work with memory
>>>>>> hotplug case I mentioned earlier.
>>>>>
>>>>> It will do enable again, which means something was modified.
>>>>
>>>> Ok, thanks for the clarification.
>>>>
>>>> I think it is not enough for the examples I gave below. For
>>>> set_mem_table, we need to stop the device from processing the vrings
>>>> before the set_mem_table handler calls the munmap(), and re-enable it
>>>> after the
>>>> mmap() (I did that wrong in my example patch, I just did that after
>>>> the munmap/mmap happened, which is too late).
>>>>
>>>>>> Even for the callfd double change it can be problematic as
>>>>>> Vhost-lib will close the first one while it will still be used by
>>>>>> the driver (Btw, I see my example patch is also buggy in this
>>>>>> regards, it should reset the call_fd value in the virtqueue, then
>>>>>> call
>>>>>> vhost_user_update_vring_state() and finally close the FD).
>>>>>
>>>>> Yes, this one leads to a different handling for each message.
>>>>>
>>>>> Maybe it leads to a new queue modify operation.
>>>>> So, queue doesn't send the state - just does configuration change on
>>>>> the
>>>> fly.
>>>>>
>>>>> What do you think?
>>>>
>>>> I think that configuration on the fly doesn't fly.
>>>> We would at least need to stop the device from processing the rings
>>>> for memory hotplug case, so why not just send a disable notification?
>>>
>>> Yes, the driver needs a notification here.
>>>
>>>> And for the double callfd, that does not look right to me not to
>>>> request the driver to stop using it before it is closed, isn't it?
>>>
>>> Yes, and some drivers (including mlx5) may stop the traffic in this case too.
>>>
>>> modify\update operation will solve all:
>>>
>>> For example:
>>>
>>> In memory hotplug:
>>> Do new mmap
>>> Call modify
>>> Do munmap for the old one.
>>>
>>> In callfd\kickfd change:
>>>
>>> Set new FD.
>>> Call modify.
>>> Close old FD.
>>>
>>> Modify is clearer, saves calls and is faster (the datapath will be back faster).
>>
>> It should work, but that is not light modifications to do in set_mem_table
>> handler (the function is quite complex already with postcopy live-migration
>> support).
>>
>> With a modify callback, won't the driver part be more complex? Since it
>> would have to check which state has changed in the ring, and based on that
>> decide whether it should stop the ring or not.
>>
>> As you says that in case of memory hotplug and double callfd, the driver may
>> stop processing the rings anyway, so would it be that much faster than
>> disabling/enabling the vring?
>>
>> These events having a very rare occurrence, does it really matter if it is a bit
>> longer?
> 
> 
> Just thinking again about memory hotplug:
> 
> The mlx5 driver device needs to be reinitialized in this case because the NIC has memory translations which must be updated before the virtqs are created.
> 
> So, maybe we need to close and reconfigure the vDPA device in this case.

Right, disabling vrings is not enough for memory hotplug.

It would make sense to call dev_close and dev_conf here, that's the most
conservative approach.

> @Xiao Wang, can you comment on the IFC behavior here?
> 
> Matan
> 
> 
>> Thanks,
>> Maxime
>>
>>>
>>>>  Thanks,
>>>> Maxime
>>>>
>>>>>
>>>>>> Thanks,
>>>>>> Maxime
>>>>>>>
>>>>>>> Matan
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>> Maxime
>>>>>>>
>>>>>
>>>
> 


^ permalink raw reply	[flat|nested] 59+ messages in thread

* [dpdk-dev] [PATCH v2 0/5] vhost: improve ready state
  2020-06-18 16:28 [dpdk-dev] [PATCH v1 0/4] vhost: improve ready state Matan Azrad
                   ` (3 preceding siblings ...)
  2020-06-18 16:28 ` [dpdk-dev] [PATCH v1 4/4] vdpa/mlx5: support queue update Matan Azrad
@ 2020-06-25 13:38 ` Matan Azrad
  2020-06-25 13:38   ` [dpdk-dev] [PATCH v2 1/5] vhost: skip access lock when vDPA is configured Matan Azrad
                     ` (5 more replies)
  4 siblings, 6 replies; 59+ messages in thread
From: Matan Azrad @ 2020-06-25 13:38 UTC (permalink / raw)
  To: Maxime Coquelin; +Cc: dev, Xiao Wang

Due to the issue described in the "vhost: improve device ready definition"
patch in this series, we need to change the ready state definition in the
vhost device.

To support the suggested improvement, the host notifier control API is updated.

We also need to skip the access lock when a vDPA device is configured.

Support for configuration changes while the device is ready is also added.


Matan Azrad (5):
  vhost: skip access lock when vDPA is configured
  vhost: improve device readiness notifications
  vhost: handle memory hotplug with vDPA devices
  vhost: notify virtq file descriptor update
  vdpa/mlx5: support queue update

 drivers/vdpa/mlx5/mlx5_vdpa.c       | 26 ------------
 drivers/vdpa/mlx5/mlx5_vdpa.h       |  8 +++-
 drivers/vdpa/mlx5/mlx5_vdpa_virtq.c | 80 +++++++++++++++++++++++++-----------
 lib/librte_vhost/vhost.h            |  1 +
 lib/librte_vhost/vhost_user.c       | 81 ++++++++++++++++++++++++++++---------
 5 files changed, 126 insertions(+), 70 deletions(-)

-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 59+ messages in thread

* [dpdk-dev] [PATCH v2 1/5] vhost: skip access lock when vDPA is configured
  2020-06-25 13:38 ` [dpdk-dev] [PATCH v2 0/5] vhost: improve ready state Matan Azrad
@ 2020-06-25 13:38   ` Matan Azrad
  2020-06-28  3:06     ` Xia, Chenbo
  2020-06-25 13:38   ` [dpdk-dev] [PATCH v2 2/5] vhost: improve device readiness notifications Matan Azrad
                     ` (4 subsequent siblings)
  5 siblings, 1 reply; 59+ messages in thread
From: Matan Azrad @ 2020-06-25 13:38 UTC (permalink / raw)
  To: Maxime Coquelin; +Cc: dev, Xiao Wang

No need to take the access lock in the vhost-user message handler when
the vDPA driver controls all the data-path of the vhost device.

It allows the vDPA set_vring_state operation callback to configure
guest notifications.

Signed-off-by: Matan Azrad <matan@mellanox.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
---
 lib/librte_vhost/vhost_user.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/lib/librte_vhost/vhost_user.c b/lib/librte_vhost/vhost_user.c
index 4e1af91..8d8050b 100644
--- a/lib/librte_vhost/vhost_user.c
+++ b/lib/librte_vhost/vhost_user.c
@@ -2690,8 +2690,10 @@ typedef int (*vhost_message_handler_t)(struct virtio_net **pdev,
 	case VHOST_USER_SEND_RARP:
 	case VHOST_USER_NET_SET_MTU:
 	case VHOST_USER_SET_SLAVE_REQ_FD:
-		vhost_user_lock_all_queue_pairs(dev);
-		unlock_required = 1;
+		if (!(dev->flags & VIRTIO_DEV_VDPA_CONFIGURED)) {
+			vhost_user_lock_all_queue_pairs(dev);
+			unlock_required = 1;
+		}
 		break;
 	default:
 		break;
-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 59+ messages in thread

* [dpdk-dev] [PATCH v2 2/5] vhost: improve device readiness notifications
  2020-06-25 13:38 ` [dpdk-dev] [PATCH v2 0/5] vhost: improve ready state Matan Azrad
  2020-06-25 13:38   ` [dpdk-dev] [PATCH v2 1/5] vhost: skip access lock when vDPA is configured Matan Azrad
@ 2020-06-25 13:38   ` Matan Azrad
  2020-06-26 12:10     ` Maxime Coquelin
  2020-06-28  3:08     ` Xia, Chenbo
  2020-06-25 13:38   ` [dpdk-dev] [PATCH v2 3/5] vhost: handle memory hotplug with vDPA devices Matan Azrad
                     ` (3 subsequent siblings)
  5 siblings, 2 replies; 59+ messages in thread
From: Matan Azrad @ 2020-06-25 13:38 UTC (permalink / raw)
  To: Maxime Coquelin; +Cc: dev, Xiao Wang

Some guest drivers may not configure disabled virtio queues.

In this case, the vhost management never notifies the application and
the vDPA driver about the device readiness because it waits for the
device to be ready.

The current ready state means that all the virtio queues should be
configured regardless of the enablement status.

In order to support this case, this patch changes the ready state:
The device is ready when at least 1 queue pair is configured and
enabled.

So, now, the application and the vDPA driver are notified when the
first queue pair is configured and enabled.

Also the queue notifications will be triggered according to the new
ready definition.
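
For example, an application could consume these notifications like this
(a sketch only; stop_queue()/start_queue() are hypothetical application
helpers):

void stop_queue(int vid, uint16_t qid);		/* hypothetical */
void start_queue(int vid, uint16_t qid);	/* hypothetical */

static int
vring_state_changed(int vid, uint16_t queue_id, int enable)
{
	if (!enable) {
		/* The queue is no longer ready: stop using its resources,
		 * the rings or file descriptors may be about to change.
		 */
		stop_queue(vid, queue_id);
	} else {
		/* The queue (re)became ready: re-read its configuration,
		 * it may differ from the previous enablement.
		 */
		start_queue(vid, queue_id);
	}
	return 0;
}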

Signed-off-by: Matan Azrad <matan@mellanox.com>
---
 lib/librte_vhost/vhost.h      |  1 +
 lib/librte_vhost/vhost_user.c | 55 +++++++++++++++++++++++++++++--------------
 2 files changed, 38 insertions(+), 18 deletions(-)

diff --git a/lib/librte_vhost/vhost.h b/lib/librte_vhost/vhost.h
index 17f1e9a..8a74f33 100644
--- a/lib/librte_vhost/vhost.h
+++ b/lib/librte_vhost/vhost.h
@@ -151,6 +151,7 @@ struct vhost_virtqueue {
 	int			backend;
 	int			enabled;
 	int			access_ok;
+	int			ready;
 	rte_spinlock_t		access_lock;
 
 	/* Used to notify the guest (trigger interrupt) */
diff --git a/lib/librte_vhost/vhost_user.c b/lib/librte_vhost/vhost_user.c
index 8d8050b..b90fc78 100644
--- a/lib/librte_vhost/vhost_user.c
+++ b/lib/librte_vhost/vhost_user.c
@@ -228,6 +228,21 @@
 	dev->postcopy_listening = 0;
 }
 
+static void
+vhost_user_notify_queue_state(struct virtio_net *dev, uint16_t index,
+			      int enable)
+{
+	int did = dev->vdpa_dev_id;
+	struct rte_vdpa_device *vdpa_dev = rte_vdpa_get_device(did);
+
+	if (vdpa_dev && vdpa_dev->ops->set_vring_state)
+		vdpa_dev->ops->set_vring_state(dev->vid, index, enable);
+
+	if (dev->notify_ops->vring_state_changed)
+		dev->notify_ops->vring_state_changed(dev->vid,
+				index, enable);
+}
+
 /*
  * This function just returns success at the moment unless
  * the device hasn't been initialised.
@@ -1306,27 +1321,31 @@
 
 	return rings_ok &&
 	       vq->kickfd != VIRTIO_UNINITIALIZED_EVENTFD &&
-	       vq->callfd != VIRTIO_UNINITIALIZED_EVENTFD;
+	       vq->callfd != VIRTIO_UNINITIALIZED_EVENTFD &&
+	       vq->enabled;
 }
 
+#define VIRTIO_DEV_NUM_VQS_TO_BE_READY 2u
+
 static int
 virtio_is_ready(struct virtio_net *dev)
 {
 	struct vhost_virtqueue *vq;
 	uint32_t i;
 
-	if (dev->nr_vring == 0)
+	if (dev->nr_vring < VIRTIO_DEV_NUM_VQS_TO_BE_READY)
 		return 0;
 
-	for (i = 0; i < dev->nr_vring; i++) {
+	for (i = 0; i < VIRTIO_DEV_NUM_VQS_TO_BE_READY; i++) {
 		vq = dev->virtqueue[i];
 
 		if (!vq_is_ready(dev, vq))
 			return 0;
 	}
 
-	VHOST_LOG_CONFIG(INFO,
-		"virtio is now ready for processing.\n");
+	if (!(dev->flags & VIRTIO_DEV_RUNNING))
+		VHOST_LOG_CONFIG(INFO,
+			"virtio is now ready for processing.\n");
 	return 1;
 }
 
@@ -1970,8 +1989,6 @@ static int vhost_user_set_vring_err(struct virtio_net **pdev __rte_unused,
 	struct virtio_net *dev = *pdev;
 	int enable = (int)msg->payload.state.num;
 	int index = (int)msg->payload.state.index;
-	struct rte_vdpa_device *vdpa_dev;
-	int did = -1;
 
 	if (validate_msg_fds(msg, 0) != 0)
 		return RTE_VHOST_MSG_RESULT_ERR;
@@ -1980,15 +1997,6 @@ static int vhost_user_set_vring_err(struct virtio_net **pdev __rte_unused,
 		"set queue enable: %d to qp idx: %d\n",
 		enable, index);
 
-	did = dev->vdpa_dev_id;
-	vdpa_dev = rte_vdpa_get_device(did);
-	if (vdpa_dev && vdpa_dev->ops->set_vring_state)
-		vdpa_dev->ops->set_vring_state(dev->vid, index, enable);
-
-	if (dev->notify_ops->vring_state_changed)
-		dev->notify_ops->vring_state_changed(dev->vid,
-				index, enable);
-
 	/* On disable, rings have to be stopped being processed. */
 	if (!enable && dev->dequeue_zero_copy)
 		drain_zmbuf_list(dev->virtqueue[index]);
@@ -2618,6 +2626,7 @@ typedef int (*vhost_message_handler_t)(struct virtio_net **pdev,
 	int unlock_required = 0;
 	bool handled;
 	int request;
+	uint32_t i;
 
 	dev = get_device(vid);
 	if (dev == NULL)
@@ -2793,6 +2802,17 @@ typedef int (*vhost_message_handler_t)(struct virtio_net **pdev,
 		return -1;
 	}
 
+	for (i = 0; i < dev->nr_vring; i++) {
+		struct vhost_virtqueue *vq = dev->virtqueue[i];
+		bool cur_ready = vq_is_ready(dev, vq);
+
+		if (cur_ready != (vq && vq->ready)) {
+			vhost_user_notify_queue_state(dev, i, cur_ready);
+			vq->ready = cur_ready;
+		}
+	}
+
+
 	if (!(dev->flags & VIRTIO_DEV_RUNNING) && virtio_is_ready(dev)) {
 		dev->flags |= VIRTIO_DEV_READY;
 
@@ -2810,8 +2830,7 @@ typedef int (*vhost_message_handler_t)(struct virtio_net **pdev,
 	did = dev->vdpa_dev_id;
 	vdpa_dev = rte_vdpa_get_device(did);
 	if (vdpa_dev && virtio_is_ready(dev) &&
-			!(dev->flags & VIRTIO_DEV_VDPA_CONFIGURED) &&
-			msg.request.master == VHOST_USER_SET_VRING_CALL) {
+	    !(dev->flags & VIRTIO_DEV_VDPA_CONFIGURED)) {
 		if (vdpa_dev->ops->dev_conf)
 			vdpa_dev->ops->dev_conf(dev->vid);
 		dev->flags |= VIRTIO_DEV_VDPA_CONFIGURED;
-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 59+ messages in thread

* [dpdk-dev] [PATCH v2 3/5] vhost: handle memory hotplug with vDPA devices
  2020-06-25 13:38 ` [dpdk-dev] [PATCH v2 0/5] vhost: improve ready state Matan Azrad
  2020-06-25 13:38   ` [dpdk-dev] [PATCH v2 1/5] vhost: skip access lock when vDPA is configured Matan Azrad
  2020-06-25 13:38   ` [dpdk-dev] [PATCH v2 2/5] vhost: improve device readiness notifications Matan Azrad
@ 2020-06-25 13:38   ` Matan Azrad
  2020-06-26 12:15     ` Maxime Coquelin
  2020-06-28  3:18     ` Xia, Chenbo
  2020-06-25 13:38   ` [dpdk-dev] [PATCH v2 4/5] vhost: notify virtq file descriptor update Matan Azrad
                     ` (2 subsequent siblings)
  5 siblings, 2 replies; 59+ messages in thread
From: Matan Azrad @ 2020-06-25 13:38 UTC (permalink / raw)
  To: Maxime Coquelin; +Cc: dev, Xiao Wang

Some vDPA drivers' basic configurations should be updated when the
guest memory is hotplugged.

Close the vDPA device before the hotplug operation and recreate it
after the hotplug operation is done.
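
The "recreate" part needs no new code here: once the ready state is
reached again after the remap, the generic end-of-message path from the
previous patches reconfigures the device, roughly:

	/* At the end of vhost_user_msg_handler() (see patch 2/5): */
	if (vdpa_dev && virtio_is_ready(dev) &&
	    !(dev->flags & VIRTIO_DEV_VDPA_CONFIGURED)) {
		if (vdpa_dev->ops->dev_conf)
			vdpa_dev->ops->dev_conf(dev->vid);
		dev->flags |= VIRTIO_DEV_VDPA_CONFIGURED;
	}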

Signed-off-by: Matan Azrad <matan@mellanox.com>
---
 lib/librte_vhost/vhost_user.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/lib/librte_vhost/vhost_user.c b/lib/librte_vhost/vhost_user.c
index b90fc78..f690fdb 100644
--- a/lib/librte_vhost/vhost_user.c
+++ b/lib/librte_vhost/vhost_user.c
@@ -1073,6 +1073,15 @@
 	}
 
 	if (dev->mem) {
+		if (dev->flags & VIRTIO_DEV_VDPA_CONFIGURED) {
+			int did = dev->vdpa_dev_id;
+			struct rte_vdpa_device *vdpa_dev =
+						rte_vdpa_get_device(did);
+
+			if (vdpa_dev && vdpa_dev->ops->dev_close)
+				vdpa_dev->ops->dev_close(dev->vid);
+			dev->flags &= ~VIRTIO_DEV_VDPA_CONFIGURED;
+		}
 		free_mem_region(dev);
 		rte_free(dev->mem);
 		dev->mem = NULL;
-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 59+ messages in thread

* [dpdk-dev] [PATCH v2 4/5] vhost: notify virtq file descriptor update
  2020-06-25 13:38 ` [dpdk-dev] [PATCH v2 0/5] vhost: improve ready state Matan Azrad
                     ` (2 preceding siblings ...)
  2020-06-25 13:38   ` [dpdk-dev] [PATCH v2 3/5] vhost: handle memory hotplug with vDPA devices Matan Azrad
@ 2020-06-25 13:38   ` Matan Azrad
  2020-06-26 12:19     ` Maxime Coquelin
  2020-06-28  3:19     ` Xia, Chenbo
  2020-06-25 13:38   ` [dpdk-dev] [PATCH v2 5/5] vdpa/mlx5: support queue update Matan Azrad
  2020-06-29 14:08   ` [dpdk-dev] [PATCH v3 0/6] vhost: improve ready state Matan Azrad
  5 siblings, 2 replies; 59+ messages in thread
From: Matan Azrad @ 2020-06-25 13:38 UTC (permalink / raw)
  To: Maxime Coquelin; +Cc: dev, Xiao Wang

When the virtq call or kick file descriptors are changed in the device
configuration while the queue is ready, the application and the vDPA
driver should be notified so they can align to the new file
descriptors.

Notify a disabled state before the file descriptor update and return
it back to enabled after the update.
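
For a ready queue whose callfd changes, the resulting order is sketched
below; the enable notification comes from the generic ready-state loop
added earlier in the series, not from this handler:

	/* In vhost_user_set_vring_call(): */
	if (vq->ready) {
		vhost_user_notify_queue_state(dev, file.index, 0);
		vq->ready = 0;
	}
	if (vq->callfd >= 0)
		close(vq->callfd);	/* old FD closed only after the driver
					 * was asked to stop using it */
	vq->callfd = file.fd;
	/* Later, the end-of-message loop sees vq_is_ready() again and sends
	 * the enable notification with the new FD already in place.
	 */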

Signed-off-by: Matan Azrad <matan@mellanox.com>
---
 lib/librte_vhost/vhost_user.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/lib/librte_vhost/vhost_user.c b/lib/librte_vhost/vhost_user.c
index f690fdb..f3966b6 100644
--- a/lib/librte_vhost/vhost_user.c
+++ b/lib/librte_vhost/vhost_user.c
@@ -1624,6 +1624,12 @@
 		"vring call idx:%d file:%d\n", file.index, file.fd);
 
 	vq = dev->virtqueue[file.index];
+
+	if (vq->ready) {
+		vhost_user_notify_queue_state(dev, file.index, 0);
+		vq->ready = 0;
+	}
+
 	if (vq->callfd >= 0)
 		close(vq->callfd);
 
@@ -1882,6 +1888,11 @@ static int vhost_user_set_vring_err(struct virtio_net **pdev __rte_unused,
 				dev->vid, file.index, 1);
 	}
 
+	if (vq->ready) {
+		vhost_user_notify_queue_state(dev, file.index, 0);
+		vq->ready = 0;
+	}
+
 	if (vq->kickfd >= 0)
 		close(vq->kickfd);
 	vq->kickfd = file.fd;
-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 59+ messages in thread

* [dpdk-dev] [PATCH v2 5/5] vdpa/mlx5: support queue update
  2020-06-25 13:38 ` [dpdk-dev] [PATCH v2 0/5] vhost: improve ready state Matan Azrad
                     ` (3 preceding siblings ...)
  2020-06-25 13:38   ` [dpdk-dev] [PATCH v2 4/5] vhost: notify virtq file descriptor update Matan Azrad
@ 2020-06-25 13:38   ` Matan Azrad
  2020-06-26 12:29     ` Maxime Coquelin
  2020-06-29 14:08   ` [dpdk-dev] [PATCH v3 0/6] vhost: improve ready state Matan Azrad
  5 siblings, 1 reply; 59+ messages in thread
From: Matan Azrad @ 2020-06-25 13:38 UTC (permalink / raw)
  To: Maxime Coquelin; +Cc: dev, Xiao Wang

The last changes in vDPA device management by the vhost library may
cause a queue ready state update after the device configuration.

So, there is a chance that some queue configuration information will
be known only after the device has been configured.

Add support to reconfigure a queue after the device configuration
according to the queue state update and the configuration changes.

Adjust the host notifier and the guest notification configuration to be
per queue and to be applied in the enablement process.

Signed-off-by: Matan Azrad <matan@mellanox.com>
---
 drivers/vdpa/mlx5/mlx5_vdpa.c       | 26 ------------
 drivers/vdpa/mlx5/mlx5_vdpa.h       |  8 +++-
 drivers/vdpa/mlx5/mlx5_vdpa_virtq.c | 80 ++++++++++++++++++++++++++-----------
 3 files changed, 64 insertions(+), 50 deletions(-)

diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.c b/drivers/vdpa/mlx5/mlx5_vdpa.c
index 51f3fe8..a2b1816 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.c
@@ -141,31 +141,6 @@
 }
 
 static int
-mlx5_vdpa_direct_db_prepare(struct mlx5_vdpa_priv *priv)
-{
-	int ret;
-
-	if (priv->direct_notifier) {
-		ret = rte_vhost_host_notifier_ctrl(priv->vid,
-						   RTE_VHOST_QUEUE_ALL, false);
-		if (ret != 0) {
-			DRV_LOG(INFO, "Direct HW notifier FD cannot be "
-				"destroyed for device %d: %d.", priv->vid, ret);
-			return -1;
-		}
-		priv->direct_notifier = 0;
-	}
-	ret = rte_vhost_host_notifier_ctrl(priv->vid, RTE_VHOST_QUEUE_ALL,
-					   true);
-	if (ret != 0)
-		DRV_LOG(INFO, "Direct HW notifier FD cannot be configured for"
-			" device %d: %d.", priv->vid, ret);
-	else
-		priv->direct_notifier = 1;
-	return 0;
-}
-
-static int
 mlx5_vdpa_features_set(int vid)
 {
 	int did = rte_vhost_get_vdpa_device_id(vid);
@@ -330,7 +305,6 @@
 	if (mlx5_vdpa_mtu_set(priv))
 		DRV_LOG(WARNING, "MTU cannot be set on device %d.", did);
 	if (mlx5_vdpa_pd_create(priv) || mlx5_vdpa_mem_register(priv) ||
-	    mlx5_vdpa_direct_db_prepare(priv) ||
 	    mlx5_vdpa_virtqs_prepare(priv) || mlx5_vdpa_steer_setup(priv) ||
 	    mlx5_vdpa_cqe_event_setup(priv)) {
 		mlx5_vdpa_dev_close(vid);
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.h b/drivers/vdpa/mlx5/mlx5_vdpa.h
index c0228b2..d4e405a 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.h
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.h
@@ -73,11 +73,18 @@ struct mlx5_vdpa_query_mr {
 	int is_indirect;
 };
 
+enum {
+	MLX5_VDPA_NOTIFIER_STATE_DISABLED,
+	MLX5_VDPA_NOTIFIER_STATE_ENABLED,
+	MLX5_VDPA_NOTIFIER_STATE_ERR
+};
+
 struct mlx5_vdpa_virtq {
 	SLIST_ENTRY(mlx5_vdpa_virtq) next;
 	uint8_t enable;
 	uint16_t index;
 	uint16_t vq_size;
+	uint8_t notifier_state;
 	struct mlx5_vdpa_priv *priv;
 	struct mlx5_devx_obj *virtq;
 	struct mlx5_devx_obj *counters;
@@ -112,7 +119,6 @@ enum {
 struct mlx5_vdpa_priv {
 	TAILQ_ENTRY(mlx5_vdpa_priv) next;
 	uint8_t configured;
-	uint8_t direct_notifier; /* Whether direct notifier is on or off. */
 	uint64_t last_traffic_tic;
 	uint32_t last_total;
 	pthread_t timer_tid;
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
index 4b4d019..3e61264 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
@@ -36,6 +36,17 @@
 		break;
 	} while (1);
 	rte_write32(virtq->index, priv->virtq_db_addr);
+	if (virtq->notifier_state == MLX5_VDPA_NOTIFIER_STATE_DISABLED) {
+		if (rte_vhost_host_notifier_ctrl(priv->vid, virtq->index, true))
+			virtq->notifier_state = MLX5_VDPA_NOTIFIER_STATE_ERR;
+		else
+			virtq->notifier_state =
+					       MLX5_VDPA_NOTIFIER_STATE_ENABLED;
+		DRV_LOG(INFO, "Virtq %u notifier state is %s.", virtq->index,
+			virtq->notifier_state ==
+				MLX5_VDPA_NOTIFIER_STATE_ENABLED ? "enabled" :
+								    "disabled");
+	}
 	DRV_LOG(DEBUG, "Ring virtq %u doorbell.", virtq->index);
 }
 
@@ -79,6 +90,7 @@
 	memset(&virtq->reset, 0, sizeof(virtq->reset));
 	if (virtq->eqp.fw_qp)
 		mlx5_vdpa_event_qp_destroy(&virtq->eqp);
+	virtq->notifier_state = MLX5_VDPA_NOTIFIER_STATE_DISABLED;
 	return 0;
 }
 
@@ -87,10 +99,8 @@
 {
 	int i;
 
-	for (i = 0; i < priv->nr_virtqs; i++) {
+	for (i = 0; i < priv->nr_virtqs; i++)
 		mlx5_vdpa_virtq_unset(&priv->virtqs[i]);
-		priv->virtqs[i].enable = 0;
-	}
 	if (priv->tis) {
 		claim_zero(mlx5_devx_cmd_destroy(priv->tis));
 		priv->tis = NULL;
@@ -143,6 +153,7 @@
 		DRV_LOG(ERR, "Failed to set virtq %d base.", index);
 		return -1;
 	}
+	DRV_LOG(DEBUG, "vid %u virtq %u was stopped.", priv->vid, index);
 	return 0;
 }
 
@@ -289,6 +300,7 @@
 	virtq->priv = priv;
 	if (!virtq->virtq)
 		goto error;
+	claim_zero(rte_vhost_enable_guest_notification(priv->vid, index, 1));
 	if (mlx5_vdpa_virtq_modify(virtq, 1))
 		goto error;
 	virtq->priv = priv;
@@ -297,10 +309,6 @@
 	virtq->intr_handle.fd = vq.kickfd;
 	if (virtq->intr_handle.fd == -1) {
 		DRV_LOG(WARNING, "Virtq %d kickfd is invalid.", index);
-		if (!priv->direct_notifier) {
-			DRV_LOG(ERR, "Virtq %d cannot be notified.", index);
-			goto error;
-		}
 	} else {
 		virtq->intr_handle.type = RTE_INTR_HANDLE_EXT;
 		if (rte_intr_callback_register(&virtq->intr_handle,
@@ -315,6 +323,8 @@
 				virtq->intr_handle.fd, index);
 		}
 	}
+	DRV_LOG(DEBUG, "vid %u virtq %u was created successfully.", priv->vid,
+		index);
 	return 0;
 error:
 	mlx5_vdpa_virtq_unset(virtq);
@@ -418,18 +428,35 @@
 		goto error;
 	}
 	priv->nr_virtqs = nr_vring;
-	for (i = 0; i < nr_vring; i++) {
-		claim_zero(rte_vhost_enable_guest_notification(priv->vid, i,
-							       1));
-		if (mlx5_vdpa_virtq_setup(priv, i))
+	for (i = 0; i < nr_vring; i++)
+		if (priv->virtqs[i].enable && mlx5_vdpa_virtq_setup(priv, i))
 			goto error;
-	}
 	return 0;
 error:
 	mlx5_vdpa_virtqs_release(priv);
 	return -1;
 }
 
+static int
+mlx5_vdpa_virtq_is_modified(struct mlx5_vdpa_priv *priv,
+			    struct mlx5_vdpa_virtq *virtq)
+{
+	struct rte_vhost_vring vq;
+	int ret = rte_vhost_get_vhost_vring(priv->vid, virtq->index, &vq);
+
+	if (ret)
+		return -1;
+	if (vq.size != virtq->vq_size || vq.kickfd != virtq->intr_handle.fd)
+		return 1;
+	if (virtq->eqp.cq.cq) {
+		if (vq.callfd != virtq->eqp.cq.callfd)
+			return 1;
+	} else if (vq.callfd != -1) {
+		return 1;
+	}
+	return 0;
+}
+
 int
 mlx5_vdpa_virtq_enable(struct mlx5_vdpa_priv *priv, int index, int enable)
 {
@@ -438,26 +465,33 @@
 
 	DRV_LOG(INFO, "Update virtq %d status %sable -> %sable.", index,
 		virtq->enable ? "en" : "dis", enable ? "en" : "dis");
-	if (virtq->enable == !!enable)
-		return 0;
 	if (!priv->configured) {
 		virtq->enable = !!enable;
 		return 0;
 	}
-	if (enable) {
-		/* Configuration might have been updated - reconfigure virtq. */
-		if (virtq->virtq) {
-			ret = mlx5_vdpa_virtq_stop(priv, index);
-			if (ret)
-				DRV_LOG(WARNING, "Failed to stop virtq %d.",
-					index);
-			mlx5_vdpa_virtq_unset(virtq);
+	if (virtq->enable == !!enable) {
+		if (!enable)
+			return 0;
+		ret = mlx5_vdpa_virtq_is_modified(priv, virtq);
+		if (ret < 0) {
+			DRV_LOG(ERR, "Virtq %d modify check failed.", index);
+			return -1;
 		}
+		if (ret == 0)
+			return 0;
+		DRV_LOG(INFO, "Virtq %d was modified, recreate it.", index);
+	}
+	if (virtq->virtq) {
+		ret = mlx5_vdpa_virtq_stop(priv, index);
+		if (ret)
+			DRV_LOG(WARNING, "Failed to stop virtq %d.", index);
+		mlx5_vdpa_virtq_unset(virtq);
+	}
+	if (enable) {
 		ret = mlx5_vdpa_virtq_setup(priv, index);
 		if (ret) {
 			DRV_LOG(ERR, "Failed to setup virtq %d.", index);
 			return ret;
-			/* The only case virtq can stay invalid. */
 		}
 	}
 	virtq->enable = !!enable;
-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [dpdk-dev] [PATCH v2 2/5] vhost: improve device readiness notifications
  2020-06-25 13:38   ` [dpdk-dev] [PATCH v2 2/5] vhost: improve device readiness notifications Matan Azrad
@ 2020-06-26 12:10     ` Maxime Coquelin
  2020-06-28  3:08     ` Xia, Chenbo
  1 sibling, 0 replies; 59+ messages in thread
From: Maxime Coquelin @ 2020-06-26 12:10 UTC (permalink / raw)
  To: Matan Azrad; +Cc: dev, Xiao Wang



On 6/25/20 3:38 PM, Matan Azrad wrote:
> Some guest drivers may not configure disabled virtio queues.
> 
> In this case, the vhost management never notifies the application and
> the vDPA driver about the device readiness because it waits for the
> device to be ready.
> 
> The current ready state means that all the virtio queues should be
> configured regardless of the enablement status.
> 
> In order to support this case, this patch changes the ready state:
> The device is ready when at least 1 queue pair is configured and
> enabled.
> 
> So, now, the application and the vDPA driver are notified when the
> first queue pair is configured and enabled.
> 
> Also the queue notifications will be triggered according to the new
> ready definition.
> 
> Signed-off-by: Matan Azrad <matan@mellanox.com>
> ---
>  lib/librte_vhost/vhost.h      |  1 +
>  lib/librte_vhost/vhost_user.c | 55 +++++++++++++++++++++++++++++--------------
>  2 files changed, 38 insertions(+), 18 deletions(-)
> 
> diff --git a/lib/librte_vhost/vhost.h b/lib/librte_vhost/vhost.h
> index 17f1e9a..8a74f33 100644
> --- a/lib/librte_vhost/vhost.h
> +++ b/lib/librte_vhost/vhost.h
> @@ -151,6 +151,7 @@ struct vhost_virtqueue {
>  	int			backend;
>  	int			enabled;
>  	int			access_ok;
> +	int			ready;
>  	rte_spinlock_t		access_lock;
>  
>  	/* Used to notify the guest (trigger interrupt) */
> diff --git a/lib/librte_vhost/vhost_user.c b/lib/librte_vhost/vhost_user.c
> index 8d8050b..b90fc78 100644
> --- a/lib/librte_vhost/vhost_user.c
> +++ b/lib/librte_vhost/vhost_user.c
> @@ -228,6 +228,21 @@
>  	dev->postcopy_listening = 0;
>  }
>  
> +static void
> +vhost_user_notify_queue_state(struct virtio_net *dev, uint16_t index,
> +			      int enable)
> +{
> +	int did = dev->vdpa_dev_id;
> +	struct rte_vdpa_device *vdpa_dev = rte_vdpa_get_device(did);
> +
> +	if (vdpa_dev && vdpa_dev->ops->set_vring_state)
> +		vdpa_dev->ops->set_vring_state(dev->vid, index, enable);
> +
> +	if (dev->notify_ops->vring_state_changed)
> +		dev->notify_ops->vring_state_changed(dev->vid,
> +				index, enable);
> +}
> +
>  /*
>   * This function just returns success at the moment unless
>   * the device hasn't been initialised.
> @@ -1306,27 +1321,31 @@
>  
>  	return rings_ok &&
>  	       vq->kickfd != VIRTIO_UNINITIALIZED_EVENTFD &&
> -	       vq->callfd != VIRTIO_UNINITIALIZED_EVENTFD;
> +	       vq->callfd != VIRTIO_UNINITIALIZED_EVENTFD &&
> +	       vq->enabled;
>  }
>  
> +#define VIRTIO_DEV_NUM_VQS_TO_BE_READY 2u

(Thinking out loud) If for some reason it would cause issues with OVS-
DPDK or another application, it should be easy to only apply this new
way of initializing the device based on whether a vDPA device is
attached or not.
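
Something like this (hypothetical, not part of the patch):

static int
virtio_is_ready(struct virtio_net *dev)
{
	/* Relax the rule only when a vDPA device is attached; otherwise
	 * keep waiting for all the rings, as before this series.
	 */
	uint32_t nr_vqs = rte_vdpa_get_device(dev->vdpa_dev_id) ?
			VIRTIO_DEV_NUM_VQS_TO_BE_READY : dev->nr_vring;
	uint32_t i;

	if (nr_vqs == 0 || dev->nr_vring < nr_vqs)
		return 0;
	for (i = 0; i < nr_vqs; i++)
		if (!vq_is_ready(dev, dev->virtqueue[i]))
			return 0;
	return 1;
}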

>  static int
>  virtio_is_ready(struct virtio_net *dev)
>  {
>  	struct vhost_virtqueue *vq;
>  	uint32_t i;
>  
> -	if (dev->nr_vring == 0)
> +	if (dev->nr_vring < VIRTIO_DEV_NUM_VQS_TO_BE_READY)
>  		return 0;
>  
> -	for (i = 0; i < dev->nr_vring; i++) {
> +	for (i = 0; i < VIRTIO_DEV_NUM_VQS_TO_BE_READY; i++) {
>  		vq = dev->virtqueue[i];
>  
>  		if (!vq_is_ready(dev, vq))
>  			return 0;
>  	}
>  
> -	VHOST_LOG_CONFIG(INFO,
> -		"virtio is now ready for processing.\n");
> +	if (!(dev->flags & VIRTIO_DEV_RUNNING))
> +		VHOST_LOG_CONFIG(INFO,
> +			"virtio is now ready for processing.\n");
>  	return 1;
>  }


Patch looks good to me, thanks for working on it.


Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>

Maxime


^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [dpdk-dev] [PATCH v2 3/5] vhost: handle memory hotplug with vDPA devices
  2020-06-25 13:38   ` [dpdk-dev] [PATCH v2 3/5] vhost: handle memory hotplug with vDPA devices Matan Azrad
@ 2020-06-26 12:15     ` Maxime Coquelin
  2020-06-28  3:18     ` Xia, Chenbo
  1 sibling, 0 replies; 59+ messages in thread
From: Maxime Coquelin @ 2020-06-26 12:15 UTC (permalink / raw)
  To: Matan Azrad; +Cc: dev, Xiao Wang



On 6/25/20 3:38 PM, Matan Azrad wrote:
> Some vDPA drivers' basic configurations should be updated when the
> guest memory is hotplugged.
> 
> Close vDPA device before hotplug operation and recreate it after the
> hotplug operation is done.
> 
> Signed-off-by: Matan Azrad <matan@mellanox.com>
> ---
>  lib/librte_vhost/vhost_user.c | 9 +++++++++
>  1 file changed, 9 insertions(+)
> 
> diff --git a/lib/librte_vhost/vhost_user.c b/lib/librte_vhost/vhost_user.c
> index b90fc78..f690fdb 100644
> --- a/lib/librte_vhost/vhost_user.c
> +++ b/lib/librte_vhost/vhost_user.c
> @@ -1073,6 +1073,15 @@
>  	}
>  
>  	if (dev->mem) {
> +		if (dev->flags & VIRTIO_DEV_VDPA_CONFIGURED) {
> +			int did = dev->vdpa_dev_id;
> +			struct rte_vdpa_device *vdpa_dev =
> +						rte_vdpa_get_device(did);
> +
> +			if (vdpa_dev && vdpa_dev->ops->dev_close)
> +				vdpa_dev->ops->dev_close(dev->vid);
> +			dev->flags &= ~VIRTIO_DEV_VDPA_CONFIGURED;
> +		}
>  		free_mem_region(dev);
>  		rte_free(dev->mem);
>  		dev->mem = NULL;
> 

Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>

Thanks,
Maxime


^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [dpdk-dev] [PATCH v2 4/5] vhost: notify virtq file descriptor update
  2020-06-25 13:38   ` [dpdk-dev] [PATCH v2 4/5] vhost: notify virtq file descriptor update Matan Azrad
@ 2020-06-26 12:19     ` Maxime Coquelin
  2020-06-28  3:19     ` Xia, Chenbo
  1 sibling, 0 replies; 59+ messages in thread
From: Maxime Coquelin @ 2020-06-26 12:19 UTC (permalink / raw)
  To: Matan Azrad; +Cc: dev, Xiao Wang



On 6/25/20 3:38 PM, Matan Azrad wrote:
> When virtq call or kick file descriptors are changed in the device
> configuration when the queue is ready, the application and the vDPA
> driver should be notified to be aligned to the new file descriptors.
> 
> Notify the state to be disabled before the file descriptor update and
> return it back to be enabled after the update.
> 
> Signed-off-by: Matan Azrad <matan@mellanox.com>
> ---
>  lib/librte_vhost/vhost_user.c | 11 +++++++++++
>  1 file changed, 11 insertions(+)
> 
> diff --git a/lib/librte_vhost/vhost_user.c b/lib/librte_vhost/vhost_user.c
> index f690fdb..f3966b6 100644
> --- a/lib/librte_vhost/vhost_user.c
> +++ b/lib/librte_vhost/vhost_user.c
> @@ -1624,6 +1624,12 @@
>  		"vring call idx:%d file:%d\n", file.index, file.fd);
>  
>  	vq = dev->virtqueue[file.index];
> +
> +	if (vq->ready) {
> +		vhost_user_notify_queue_state(dev, file.index, 0);
> +		vq->ready = 0;
> +	}
> +
>  	if (vq->callfd >= 0)
>  		close(vq->callfd);
>  
> @@ -1882,6 +1888,11 @@ static int vhost_user_set_vring_err(struct virtio_net **pdev __rte_unused,
>  				dev->vid, file.index, 1);
>  	}
>  
> +	if (vq->ready) {
> +		vhost_user_notify_queue_state(dev, file.index, 0);
> +		vq->ready = 0;
> +	}
> +
>  	if (vq->kickfd >= 0)
>  		close(vq->kickfd);
>  	vq->kickfd = file.fd;
> 

Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>

Thanks,
Maxime


^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [dpdk-dev] [PATCH v2 5/5] vdpa/mlx5: support queue update
  2020-06-25 13:38   ` [dpdk-dev] [PATCH v2 5/5] vdpa/mlx5: support queue update Matan Azrad
@ 2020-06-26 12:29     ` Maxime Coquelin
  0 siblings, 0 replies; 59+ messages in thread
From: Maxime Coquelin @ 2020-06-26 12:29 UTC (permalink / raw)
  To: Matan Azrad; +Cc: dev, Xiao Wang



On 6/25/20 3:38 PM, Matan Azrad wrote:
> Last changes in vDPA device management by vhost library may cause queue
> ready state update after the device configuration.
> 
> So, there is chance that some queue configuration information will be
> known only after the device was configured.
> 
> Add support to reconfigure a queue after the device configuration
> according to the queue state update and the configuration changes.
> 
> Adjust the host notifier and the guest notification configuration to be
> per queue and to be applied in the enablement process.
> 
> Signed-off-by: Matan Azrad <matan@mellanox.com>
> ---
>  drivers/vdpa/mlx5/mlx5_vdpa.c       | 26 ------------
>  drivers/vdpa/mlx5/mlx5_vdpa.h       |  8 +++-
>  drivers/vdpa/mlx5/mlx5_vdpa_virtq.c | 80 ++++++++++++++++++++++++++-----------
>  3 files changed, 64 insertions(+), 50 deletions(-)
> 

Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>


@Xiao, we'll need the same thing in the IFC driver, i.e. check for state
change (new FDs) on enablement.

Thanks,
Maxime


^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [dpdk-dev] [PATCH v2 1/5] vhost: skip access lock when vDPA is configured
  2020-06-25 13:38   ` [dpdk-dev] [PATCH v2 1/5] vhost: skip access lock when vDPA is configured Matan Azrad
@ 2020-06-28  3:06     ` Xia, Chenbo
  0 siblings, 0 replies; 59+ messages in thread
From: Xia, Chenbo @ 2020-06-28  3:06 UTC (permalink / raw)
  To: Matan Azrad, Maxime Coquelin; +Cc: dev, Wang, Xiao W


> -----Original Message-----
> From: dev <dev-bounces@dpdk.org> On Behalf Of Matan Azrad
> Sent: Thursday, June 25, 2020 9:38 PM
> To: Maxime Coquelin <maxime.coquelin@redhat.com>
> Cc: dev@dpdk.org; Wang, Xiao W <xiao.w.wang@intel.com>
> Subject: [dpdk-dev] [PATCH v2 1/5] vhost: skip access lock when vDPA is
> configured
> 
> No need to take access lock in the vhost-user message handler when vDPA
> driver controls all the data-path of the vhost device.
> 
> It allows the vDPA set_vring_state operation callback to configure guest
> notifications.
> 
> Signed-off-by: Matan Azrad <matan@mellanox.com>
> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
> ---
>  lib/librte_vhost/vhost_user.c | 6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/lib/librte_vhost/vhost_user.c b/lib/librte_vhost/vhost_user.c index
> 4e1af91..8d8050b 100644
> --- a/lib/librte_vhost/vhost_user.c
> +++ b/lib/librte_vhost/vhost_user.c
> @@ -2690,8 +2690,10 @@ typedef int (*vhost_message_handler_t)(struct
> virtio_net **pdev,
>  	case VHOST_USER_SEND_RARP:
>  	case VHOST_USER_NET_SET_MTU:
>  	case VHOST_USER_SET_SLAVE_REQ_FD:
> -		vhost_user_lock_all_queue_pairs(dev);
> -		unlock_required = 1;
> +		if (!(dev->flags & VIRTIO_DEV_VDPA_CONFIGURED)) {
> +			vhost_user_lock_all_queue_pairs(dev);
> +			unlock_required = 1;
> +		}
>  		break;
>  	default:
>  		break;
> --
> 1.8.3.1

Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [dpdk-dev] [PATCH v2 2/5] vhost: improve device readiness notifications
  2020-06-25 13:38   ` [dpdk-dev] [PATCH v2 2/5] vhost: improve device readiness notifications Matan Azrad
  2020-06-26 12:10     ` Maxime Coquelin
@ 2020-06-28  3:08     ` Xia, Chenbo
  1 sibling, 0 replies; 59+ messages in thread
From: Xia, Chenbo @ 2020-06-28  3:08 UTC (permalink / raw)
  To: Matan Azrad, Maxime Coquelin; +Cc: dev, Wang, Xiao W


> -----Original Message-----
> From: dev <dev-bounces@dpdk.org> On Behalf Of Matan Azrad
> Sent: Thursday, June 25, 2020 9:38 PM
> To: Maxime Coquelin <maxime.coquelin@redhat.com>
> Cc: dev@dpdk.org; Wang, Xiao W <xiao.w.wang@intel.com>
> Subject: [dpdk-dev] [PATCH v2 2/5] vhost: improve device readiness
> notifications
> 
> Some guest drivers may not configure disabled virtio queues.
> 
> In this case, the vhost management never notifies the application and the vDPA
> driver about the device readiness because it waits for the device to be ready.
> 
> The current ready state means that all the virtio queues should be configured
> regardless of the enablement status.
> 
> In order to support this case, this patch changes the ready state:
> The device is ready when at least 1 queue pair is configured and enabled.
> 
> So, now, the application and the vDPA driver are notified when the first queue
> pair is configured and enabled.
> 
> Also the queue notifications will be triggered according to the new ready
> definition.
> 
> Signed-off-by: Matan Azrad <matan@mellanox.com>
> ---
>  lib/librte_vhost/vhost.h      |  1 +
>  lib/librte_vhost/vhost_user.c | 55 +++++++++++++++++++++++++++++------------
> --
>  2 files changed, 38 insertions(+), 18 deletions(-)
> 
> diff --git a/lib/librte_vhost/vhost.h b/lib/librte_vhost/vhost.h index
> 17f1e9a..8a74f33 100644
> --- a/lib/librte_vhost/vhost.h
> +++ b/lib/librte_vhost/vhost.h
> @@ -151,6 +151,7 @@ struct vhost_virtqueue {
>  	int			backend;
>  	int			enabled;
>  	int			access_ok;
> +	int			ready;
>  	rte_spinlock_t		access_lock;
> 
>  	/* Used to notify the guest (trigger interrupt) */ diff --git
> a/lib/librte_vhost/vhost_user.c b/lib/librte_vhost/vhost_user.c index
> 8d8050b..b90fc78 100644
> --- a/lib/librte_vhost/vhost_user.c
> +++ b/lib/librte_vhost/vhost_user.c
> @@ -228,6 +228,21 @@
>  	dev->postcopy_listening = 0;
>  }
> 
> +static void
> +vhost_user_notify_queue_state(struct virtio_net *dev, uint16_t index,
> +			      int enable)
> +{
> +	int did = dev->vdpa_dev_id;
> +	struct rte_vdpa_device *vdpa_dev = rte_vdpa_get_device(did);
> +
> +	if (vdpa_dev && vdpa_dev->ops->set_vring_state)
> +		vdpa_dev->ops->set_vring_state(dev->vid, index, enable);
> +
> +	if (dev->notify_ops->vring_state_changed)
> +		dev->notify_ops->vring_state_changed(dev->vid,
> +				index, enable);
> +}
> +
>  /*
>   * This function just returns success at the moment unless
>   * the device hasn't been initialised.
> @@ -1306,27 +1321,31 @@
> 
>  	return rings_ok &&
>  	       vq->kickfd != VIRTIO_UNINITIALIZED_EVENTFD &&
> -	       vq->callfd != VIRTIO_UNINITIALIZED_EVENTFD;
> +	       vq->callfd != VIRTIO_UNINITIALIZED_EVENTFD &&
> +	       vq->enabled;
>  }
> 
> +#define VIRTIO_DEV_NUM_VQS_TO_BE_READY 2u
> +
>  static int
>  virtio_is_ready(struct virtio_net *dev)  {
>  	struct vhost_virtqueue *vq;
>  	uint32_t i;
> 
> -	if (dev->nr_vring == 0)
> +	if (dev->nr_vring < VIRTIO_DEV_NUM_VQS_TO_BE_READY)
>  		return 0;
> 
> -	for (i = 0; i < dev->nr_vring; i++) {
> +	for (i = 0; i < VIRTIO_DEV_NUM_VQS_TO_BE_READY; i++) {
>  		vq = dev->virtqueue[i];
> 
>  		if (!vq_is_ready(dev, vq))
>  			return 0;
>  	}
> 
> -	VHOST_LOG_CONFIG(INFO,
> -		"virtio is now ready for processing.\n");
> +	if (!(dev->flags & VIRTIO_DEV_RUNNING))
> +		VHOST_LOG_CONFIG(INFO,
> +			"virtio is now ready for processing.\n");
>  	return 1;
>  }
> 
> @@ -1970,8 +1989,6 @@ static int vhost_user_set_vring_err(struct virtio_net
> **pdev __rte_unused,
>  	struct virtio_net *dev = *pdev;
>  	int enable = (int)msg->payload.state.num;
>  	int index = (int)msg->payload.state.index;
> -	struct rte_vdpa_device *vdpa_dev;
> -	int did = -1;
> 
>  	if (validate_msg_fds(msg, 0) != 0)
>  		return RTE_VHOST_MSG_RESULT_ERR;
> @@ -1980,15 +1997,6 @@ static int vhost_user_set_vring_err(struct virtio_net
> **pdev __rte_unused,
>  		"set queue enable: %d to qp idx: %d\n",
>  		enable, index);
> 
> -	did = dev->vdpa_dev_id;
> -	vdpa_dev = rte_vdpa_get_device(did);
> -	if (vdpa_dev && vdpa_dev->ops->set_vring_state)
> -		vdpa_dev->ops->set_vring_state(dev->vid, index, enable);
> -
> -	if (dev->notify_ops->vring_state_changed)
> -		dev->notify_ops->vring_state_changed(dev->vid,
> -				index, enable);
> -
>  	/* On disable, rings have to be stopped being processed. */
>  	if (!enable && dev->dequeue_zero_copy)
>  		drain_zmbuf_list(dev->virtqueue[index]);
> @@ -2618,6 +2626,7 @@ typedef int (*vhost_message_handler_t)(struct
> virtio_net **pdev,
>  	int unlock_required = 0;
>  	bool handled;
>  	int request;
> +	uint32_t i;
> 
>  	dev = get_device(vid);
>  	if (dev == NULL)
> @@ -2793,6 +2802,17 @@ typedef int (*vhost_message_handler_t)(struct
> virtio_net **pdev,
>  		return -1;
>  	}
> 
> +	for (i = 0; i < dev->nr_vring; i++) {
> +		struct vhost_virtqueue *vq = dev->virtqueue[i];
> +		bool cur_ready = vq_is_ready(dev, vq);
> +
> +		if (cur_ready != (vq && vq->ready)) {
> +			vhost_user_notify_queue_state(dev, i, cur_ready);
> +			vq->ready = cur_ready;
> +		}
> +	}
> +
> +
>  	if (!(dev->flags & VIRTIO_DEV_RUNNING) && virtio_is_ready(dev)) {
>  		dev->flags |= VIRTIO_DEV_READY;
> 
> @@ -2810,8 +2830,7 @@ typedef int (*vhost_message_handler_t)(struct
> virtio_net **pdev,
>  	did = dev->vdpa_dev_id;
>  	vdpa_dev = rte_vdpa_get_device(did);
>  	if (vdpa_dev && virtio_is_ready(dev) &&
> -			!(dev->flags & VIRTIO_DEV_VDPA_CONFIGURED) &&
> -			msg.request.master == VHOST_USER_SET_VRING_CALL)
> {
> +	    !(dev->flags & VIRTIO_DEV_VDPA_CONFIGURED)) {
>  		if (vdpa_dev->ops->dev_conf)
>  			vdpa_dev->ops->dev_conf(dev->vid);
>  		dev->flags |= VIRTIO_DEV_VDPA_CONFIGURED;
> --
> 1.8.3.1

Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [dpdk-dev] [PATCH v2 3/5] vhost: handle memory hotplug with vDPA devices
  2020-06-25 13:38   ` [dpdk-dev] [PATCH v2 3/5] vhost: handle memory hotplug with vDPA devices Matan Azrad
  2020-06-26 12:15     ` Maxime Coquelin
@ 2020-06-28  3:18     ` Xia, Chenbo
  1 sibling, 0 replies; 59+ messages in thread
From: Xia, Chenbo @ 2020-06-28  3:18 UTC (permalink / raw)
  To: Matan Azrad, Maxime Coquelin; +Cc: dev, Wang, Xiao W


> -----Original Message-----
> From: dev <dev-bounces@dpdk.org> On Behalf Of Matan Azrad
> Sent: Thursday, June 25, 2020 9:38 PM
> To: Maxime Coquelin <maxime.coquelin@redhat.com>
> Cc: dev@dpdk.org; Wang, Xiao W <xiao.w.wang@intel.com>
> Subject: [dpdk-dev] [PATCH v2 3/5] vhost: handle memory hotplug with vDPA
> devices
> 
> Some vDPA drivers' basic configurations should be updated when the guest
> memory is hotplugged.
> 
> Close vDPA device before hotplug operation and recreate it after the hotplug
> operation is done.
> 
> Signed-off-by: Matan Azrad <matan@mellanox.com>
> ---
>  lib/librte_vhost/vhost_user.c | 9 +++++++++
>  1 file changed, 9 insertions(+)
> 
> diff --git a/lib/librte_vhost/vhost_user.c b/lib/librte_vhost/vhost_user.c index
> b90fc78..f690fdb 100644
> --- a/lib/librte_vhost/vhost_user.c
> +++ b/lib/librte_vhost/vhost_user.c
> @@ -1073,6 +1073,15 @@
>  	}
> 
>  	if (dev->mem) {
> +		if (dev->flags & VIRTIO_DEV_VDPA_CONFIGURED) {
> +			int did = dev->vdpa_dev_id;
> +			struct rte_vdpa_device *vdpa_dev =
> +						rte_vdpa_get_device(did);
> +
> +			if (vdpa_dev && vdpa_dev->ops->dev_close)
> +				vdpa_dev->ops->dev_close(dev->vid);
> +			dev->flags &= ~VIRTIO_DEV_VDPA_CONFIGURED;
> +		}

For now, this solution is general for all vendors. Later we may improve this by calling a
vendor-specific callback, as vendors may behave differently. For now this looks good to me :)
(sorry that the reply is late for the memory hotplug discussion because of the holiday). Thanks!
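
A possible future shape (purely hypothetical, not part of this series):

	/* Hypothetical per-vendor hook: a driver able to update its memory
	 * translations in place could avoid the full close/re-conf cycle.
	 */
	if (vdpa_dev && vdpa_dev->ops->dev_mem_update)
		vdpa_dev->ops->dev_mem_update(dev->vid);
	else if (vdpa_dev && vdpa_dev->ops->dev_close) {
		vdpa_dev->ops->dev_close(dev->vid);
		dev->flags &= ~VIRTIO_DEV_VDPA_CONFIGURED;
	}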

>  		free_mem_region(dev);
>  		rte_free(dev->mem);
>  		dev->mem = NULL;
> --
> 1.8.3.1

Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [dpdk-dev] [PATCH v2 4/5] vhost: notify virtq file descriptor update
  2020-06-25 13:38   ` [dpdk-dev] [PATCH v2 4/5] vhost: notify virtq file descriptor update Matan Azrad
  2020-06-26 12:19     ` Maxime Coquelin
@ 2020-06-28  3:19     ` Xia, Chenbo
  1 sibling, 0 replies; 59+ messages in thread
From: Xia, Chenbo @ 2020-06-28  3:19 UTC (permalink / raw)
  To: Matan Azrad, Maxime Coquelin; +Cc: dev, Wang, Xiao W


> -----Original Message-----
> From: dev <dev-bounces@dpdk.org> On Behalf Of Matan Azrad
> Sent: Thursday, June 25, 2020 9:38 PM
> To: Maxime Coquelin <maxime.coquelin@redhat.com>
> Cc: dev@dpdk.org; Wang, Xiao W <xiao.w.wang@intel.com>
> Subject: [dpdk-dev] [PATCH v2 4/5] vhost: notify virtq file descriptor update
> 
> When virtq call or kick file descriptors are changed in the device configuration
> when the queue is ready, the application and the vDPA driver should be notified
> to be aligned to the new file descriptors.
> 
> Notify the state to be disabled before the file descriptor update and return it
> back to be enabled after the update.
> 
> Signed-off-by: Matan Azrad <matan@mellanox.com>
> ---
>  lib/librte_vhost/vhost_user.c | 11 +++++++++++
>  1 file changed, 11 insertions(+)
> 
> diff --git a/lib/librte_vhost/vhost_user.c b/lib/librte_vhost/vhost_user.c index
> f690fdb..f3966b6 100644
> --- a/lib/librte_vhost/vhost_user.c
> +++ b/lib/librte_vhost/vhost_user.c
> @@ -1624,6 +1624,12 @@
>  		"vring call idx:%d file:%d\n", file.index, file.fd);
> 
>  	vq = dev->virtqueue[file.index];
> +
> +	if (vq->ready) {
> +		vhost_user_notify_queue_state(dev, file.index, 0);
> +		vq->ready = 0;
> +	}
> +
>  	if (vq->callfd >= 0)
>  		close(vq->callfd);
> 
> @@ -1882,6 +1888,11 @@ static int vhost_user_set_vring_err(struct virtio_net
> **pdev __rte_unused,
>  				dev->vid, file.index, 1);
>  	}
> 
> +	if (vq->ready) {
> +		vhost_user_notify_queue_state(dev, file.index, 0);
> +		vq->ready = 0;
> +	}
> +
>  	if (vq->kickfd >= 0)
>  		close(vq->kickfd);
>  	vq->kickfd = file.fd;
> --
> 1.8.3.1

Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>

^ permalink raw reply	[flat|nested] 59+ messages in thread

* [dpdk-dev] [PATCH v3 0/6]  vhost: improve ready state
  2020-06-25 13:38 ` [dpdk-dev] [PATCH v2 0/5] vhost: improve ready state Matan Azrad
                     ` (4 preceding siblings ...)
  2020-06-25 13:38   ` [dpdk-dev] [PATCH v2 5/5] vdpa/mlx5: support queue update Matan Azrad
@ 2020-06-29 14:08   ` Matan Azrad
  2020-06-29 14:08     ` [dpdk-dev] [PATCH v3 1/6] vhost: support host notifier queue configuration Matan Azrad
                       ` (6 more replies)
  5 siblings, 7 replies; 59+ messages in thread
From: Matan Azrad @ 2020-06-29 14:08 UTC (permalink / raw)
  To: Maxime Coquelin; +Cc: dev, Xiao Wang

Due to the issue described in the "vhost: improve device ready definition"
patch here, we need to change the ready state definition in the vhost device.

To support the suggested improvement, the host notifier control API is updated.

The access lock also needs to be skipped when a vDPA device is configured.

Also add support for configuration changes while the device is ready.

v3:
Rebase.
Add missing commit: vhost: support host notifier queue configuration.

Matan Azrad (6):
  vhost: support host notifier queue configuration
  vhost: skip access lock when vDPA is configured
  vhost: improve device readiness notifications
  vhost: handle memory hotplug with vDPA devices
  vhost: notify virtq file descriptor update
  vdpa/mlx5: support queue update

 doc/guides/rel_notes/release_20_08.rst |  3 ++
 drivers/vdpa/ifc/ifcvf_vdpa.c          |  6 +--
 drivers/vdpa/mlx5/mlx5_vdpa.c          | 24 ---------
 drivers/vdpa/mlx5/mlx5_vdpa.h          |  8 ++-
 drivers/vdpa/mlx5/mlx5_vdpa_virtq.c    | 80 +++++++++++++++++++--------
 lib/librte_vhost/rte_vdpa.h            |  8 ++-
 lib/librte_vhost/rte_vhost.h           |  1 -
 lib/librte_vhost/vhost.h               |  1 +
 lib/librte_vhost/vhost_user.c          | 99 +++++++++++++++++++++++++---------
 9 files changed, 152 insertions(+), 78 deletions(-)

-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 59+ messages in thread

* [dpdk-dev] [PATCH v3 1/6] vhost: support host notifier queue configuration
  2020-06-29 14:08   ` [dpdk-dev] [PATCH v3 0/6] vhost: improve ready state Matan Azrad
@ 2020-06-29 14:08     ` Matan Azrad
  2020-06-29 14:08     ` [dpdk-dev] [PATCH v3 2/6] vhost: skip access lock when vDPA is configured Matan Azrad
                       ` (5 subsequent siblings)
  6 siblings, 0 replies; 59+ messages in thread
From: Matan Azrad @ 2020-06-29 14:08 UTC (permalink / raw)
  To: Maxime Coquelin; +Cc: dev, Xiao Wang

To prepare for per-queue operations in the vDPA device, the following
experimental API needs to change:

The API ``rte_vhost_host_notifier_ctrl`` was changed to be per queue
instead of per device.

A `qid` parameter was added to the API arguments list.

Setting the parameter to the value RTE_VHOST_QUEUE_ALL configures the
host notifier for all the device queues, as was done before this patch.

Signed-off-by: Matan Azrad <matan@mellanox.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
---
 doc/guides/rel_notes/release_20_08.rst |  3 +++
 drivers/vdpa/ifc/ifcvf_vdpa.c          |  6 +++---
 drivers/vdpa/mlx5/mlx5_vdpa.c          |  6 ++++--
 lib/librte_vhost/rte_vdpa.h            |  8 ++++++--
 lib/librte_vhost/rte_vhost.h           |  1 -
 lib/librte_vhost/vhost_user.c          | 18 ++++++++++++++----
 6 files changed, 30 insertions(+), 12 deletions(-)

diff --git a/doc/guides/rel_notes/release_20_08.rst b/doc/guides/rel_notes/release_20_08.rst
index 44383b8..2d5a3f7 100644
--- a/doc/guides/rel_notes/release_20_08.rst
+++ b/doc/guides/rel_notes/release_20_08.rst
@@ -125,6 +125,9 @@ API Changes
 
 * ``rte_page_sizes`` enumeration is replaced with ``RTE_PGSIZE_xxx`` defines.
 
+* vhost: The API of ``rte_vhost_host_notifier_ctrl`` was changed to be per
+  queue and not per device, a qid parameter was added to the arguments list.
+
 
 ABI Changes
 -----------
diff --git a/drivers/vdpa/ifc/ifcvf_vdpa.c b/drivers/vdpa/ifc/ifcvf_vdpa.c
index ec97178..6a2fed3 100644
--- a/drivers/vdpa/ifc/ifcvf_vdpa.c
+++ b/drivers/vdpa/ifc/ifcvf_vdpa.c
@@ -839,7 +839,7 @@ struct internal_list {
 	vdpa_ifcvf_stop(internal);
 	vdpa_disable_vfio_intr(internal);
 
-	ret = rte_vhost_host_notifier_ctrl(vid, false);
+	ret = rte_vhost_host_notifier_ctrl(vid, RTE_VHOST_QUEUE_ALL, false);
 	if (ret && ret != -ENOTSUP)
 		goto error;
 
@@ -858,7 +858,7 @@ struct internal_list {
 	if (ret)
 		goto stop_vf;
 
-	rte_vhost_host_notifier_ctrl(vid, true);
+	rte_vhost_host_notifier_ctrl(vid, RTE_VHOST_QUEUE_ALL, true);
 
 	internal->sw_fallback_running = true;
 
@@ -893,7 +893,7 @@ struct internal_list {
 	rte_atomic32_set(&internal->dev_attached, 1);
 	update_datapath(internal);
 
-	if (rte_vhost_host_notifier_ctrl(vid, true) != 0)
+	if (rte_vhost_host_notifier_ctrl(vid, RTE_VHOST_QUEUE_ALL, true) != 0)
 		DRV_LOG(NOTICE, "vDPA (%d): software relay is used.", did);
 
 	return 0;
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.c b/drivers/vdpa/mlx5/mlx5_vdpa.c
index 159653f..97f87c5 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.c
@@ -146,7 +146,8 @@
 	int ret;
 
 	if (priv->direct_notifier) {
-		ret = rte_vhost_host_notifier_ctrl(priv->vid, false);
+		ret = rte_vhost_host_notifier_ctrl(priv->vid,
+						   RTE_VHOST_QUEUE_ALL, false);
 		if (ret != 0) {
 			DRV_LOG(INFO, "Direct HW notifier FD cannot be "
 				"destroyed for device %d: %d.", priv->vid, ret);
@@ -154,7 +155,8 @@
 		}
 		priv->direct_notifier = 0;
 	}
-	ret = rte_vhost_host_notifier_ctrl(priv->vid, true);
+	ret = rte_vhost_host_notifier_ctrl(priv->vid, RTE_VHOST_QUEUE_ALL,
+					   true);
 	if (ret != 0)
 		DRV_LOG(INFO, "Direct HW notifier FD cannot be configured for"
 			" device %d: %d.", priv->vid, ret);
diff --git a/lib/librte_vhost/rte_vdpa.h b/lib/librte_vhost/rte_vdpa.h
index ecb3d91..fd42085 100644
--- a/lib/librte_vhost/rte_vdpa.h
+++ b/lib/librte_vhost/rte_vdpa.h
@@ -202,22 +202,26 @@ struct rte_vdpa_device *
 int
 rte_vdpa_get_device_num(void);
 
+#define RTE_VHOST_QUEUE_ALL UINT16_MAX
+
 /**
  * @warning
  * @b EXPERIMENTAL: this API may change without prior notice
  *
- * Enable/Disable host notifier mapping for a vdpa port.
+ * Enable/Disable host notifier mapping for a vdpa queue.
  *
  * @param vid
  *  vhost device id
  * @param enable
  *  true for host notifier map, false for host notifier unmap
+ * @param qid
+ *  vhost queue id, RTE_VHOST_QUEUE_ALL to configure all the device queues
  * @return
  *  0 on success, -1 on failure
  */
 __rte_experimental
 int
-rte_vhost_host_notifier_ctrl(int vid, bool enable);
+rte_vhost_host_notifier_ctrl(int vid, uint16_t qid, bool enable);
 
 /**
  * @warning
diff --git a/lib/librte_vhost/rte_vhost.h b/lib/librte_vhost/rte_vhost.h
index 329ed8a..1ac7eaf 100644
--- a/lib/librte_vhost/rte_vhost.h
+++ b/lib/librte_vhost/rte_vhost.h
@@ -107,7 +107,6 @@
 #define VHOST_USER_F_PROTOCOL_FEATURES	30
 #endif
 
-
 /**
  * Information relating to memory regions including offsets to
  * addresses in QEMUs memory file.
diff --git a/lib/librte_vhost/vhost_user.c b/lib/librte_vhost/vhost_user.c
index ea9cd10..4e1af91 100644
--- a/lib/librte_vhost/vhost_user.c
+++ b/lib/librte_vhost/vhost_user.c
@@ -2951,13 +2951,13 @@ static int vhost_user_slave_set_vring_host_notifier(struct virtio_net *dev,
 	return process_slave_message_reply(dev, &msg);
 }
 
-int rte_vhost_host_notifier_ctrl(int vid, bool enable)
+int rte_vhost_host_notifier_ctrl(int vid, uint16_t qid, bool enable)
 {
 	struct virtio_net *dev;
 	struct rte_vdpa_device *vdpa_dev;
 	int vfio_device_fd, did, ret = 0;
 	uint64_t offset, size;
-	unsigned int i;
+	unsigned int i, q_start, q_last;
 
 	dev = get_device(vid);
 	if (!dev)
@@ -2981,6 +2981,16 @@ int rte_vhost_host_notifier_ctrl(int vid, bool enable)
 	if (!vdpa_dev)
 		return -ENODEV;
 
+	if (qid == RTE_VHOST_QUEUE_ALL) {
+		q_start = 0;
+		q_last = dev->nr_vring - 1;
+	} else {
+		if (qid >= dev->nr_vring)
+			return -EINVAL;
+		q_start = qid;
+		q_last = qid;
+	}
+
 	RTE_FUNC_PTR_OR_ERR_RET(vdpa_dev->ops->get_vfio_device_fd, -ENOTSUP);
 	RTE_FUNC_PTR_OR_ERR_RET(vdpa_dev->ops->get_notify_area, -ENOTSUP);
 
@@ -2989,7 +2999,7 @@ int rte_vhost_host_notifier_ctrl(int vid, bool enable)
 		return -ENOTSUP;
 
 	if (enable) {
-		for (i = 0; i < dev->nr_vring; i++) {
+		for (i = q_start; i <= q_last; i++) {
 			if (vdpa_dev->ops->get_notify_area(vid, i, &offset,
 					&size) < 0) {
 				ret = -ENOTSUP;
@@ -3004,7 +3014,7 @@ int rte_vhost_host_notifier_ctrl(int vid, bool enable)
 		}
 	} else {
 disable:
-		for (i = 0; i < dev->nr_vring; i++) {
+		for (i = q_start; i <= q_last; i++) {
 			vhost_user_slave_set_vring_host_notifier(dev, i, -1,
 					0, 0);
 		}
-- 
1.8.3.1
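
For illustration, a minimal sketch of how a caller might use the reworked
API after this patch, first per queue and then for all queues at once; the
function names and error policy here are hypothetical, not part of the
patch:

#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <rte_vdpa.h>

/* Hypothetical helper: enable the host notifier queue by queue. */
static int
example_enable_notifiers(int vid, uint16_t nr_vring)
{
	uint16_t qid;
	int ret;

	for (qid = 0; qid < nr_vring; qid++) {
		ret = rte_vhost_host_notifier_ctrl(vid, qid, true);
		if (ret != 0 && ret != -ENOTSUP)
			return ret;
	}
	return 0;
}

/* Hypothetical helper: RTE_VHOST_QUEUE_ALL keeps the pre-patch,
 * whole-device behavior.
 */
static void
example_disable_notifiers(int vid)
{
	rte_vhost_host_notifier_ctrl(vid, RTE_VHOST_QUEUE_ALL, false);
}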


^ permalink raw reply	[flat|nested] 59+ messages in thread

* [dpdk-dev] [PATCH v3 2/6] vhost: skip access lock when vDPA is configured
  2020-06-29 14:08   ` [dpdk-dev] [PATCH v3 0/6] vhost: improve ready state Matan Azrad
  2020-06-29 14:08     ` [dpdk-dev] [PATCH v3 1/6] vhost: support host notifier queue configuration Matan Azrad
@ 2020-06-29 14:08     ` Matan Azrad
  2020-06-29 14:08     ` [dpdk-dev] [PATCH v3 3/6] vhost: improve device readiness notifications Matan Azrad
                       ` (4 subsequent siblings)
  6 siblings, 0 replies; 59+ messages in thread
From: Matan Azrad @ 2020-06-29 14:08 UTC (permalink / raw)
  To: Maxime Coquelin; +Cc: dev, Xiao Wang

No need to take access lock in the vhost-user message handler when
vDPA driver controls all the data-path of the vhost device.

It allows the vDPA set_vring_state operation callback to configure
guest notifications.

Signed-off-by: Matan Azrad <matan@mellanox.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
---
 lib/librte_vhost/vhost_user.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/lib/librte_vhost/vhost_user.c b/lib/librte_vhost/vhost_user.c
index 4e1af91..8d8050b 100644
--- a/lib/librte_vhost/vhost_user.c
+++ b/lib/librte_vhost/vhost_user.c
@@ -2690,8 +2690,10 @@ typedef int (*vhost_message_handler_t)(struct virtio_net **pdev,
 	case VHOST_USER_SEND_RARP:
 	case VHOST_USER_NET_SET_MTU:
 	case VHOST_USER_SET_SLAVE_REQ_FD:
-		vhost_user_lock_all_queue_pairs(dev);
-		unlock_required = 1;
+		if (!(dev->flags & VIRTIO_DEV_VDPA_CONFIGURED)) {
+			vhost_user_lock_all_queue_pairs(dev);
+			unlock_required = 1;
+		}
 		break;
 	default:
 		break;
-- 
1.8.3.1
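
To illustrate what skipping the lock enables, here is a hedged sketch of a
vDPA driver's set_vring_state callback configuring guest notification from
within the message handler context; the function name and the notification
policy are hypothetical:

#include <rte_vhost.h>

/* Hypothetical vDPA driver callback. Because the access lock is not
 * taken for vDPA-configured devices, calling vhost APIs from here no
 * longer risks a deadlock on the per-queue access lock.
 */
static int
example_set_vring_state(int vid, int vring, int state)
{
	/* Illustrative policy: rely on guest notifications only while
	 * the ring is disabled, poll while it is enabled.
	 */
	return rte_vhost_enable_guest_notification(vid, vring,
						   state ? 0 : 1);
}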


^ permalink raw reply	[flat|nested] 59+ messages in thread

* [dpdk-dev] [PATCH v3 3/6] vhost: improve device readiness notifications
  2020-06-29 14:08   ` [dpdk-dev] [PATCH v3 0/6] vhost: improve ready state Matan Azrad
  2020-06-29 14:08     ` [dpdk-dev] [PATCH v3 1/6] vhost: support host notifier queue configuration Matan Azrad
  2020-06-29 14:08     ` [dpdk-dev] [PATCH v3 2/6] vhost: skip access lock when vDPA is configured Matan Azrad
@ 2020-06-29 14:08     ` Matan Azrad
  2020-06-29 14:08     ` [dpdk-dev] [PATCH v3 4/6] vhost: handle memory hotplug with vDPA devices Matan Azrad
                       ` (3 subsequent siblings)
  6 siblings, 0 replies; 59+ messages in thread
From: Matan Azrad @ 2020-06-29 14:08 UTC (permalink / raw)
  To: Maxime Coquelin; +Cc: dev, Xiao Wang

Some guest drivers may not configure disabled virtio queues.

In this case, the vhost management never notifies the application and the
vDPA driver of the device readiness, because it waits for the device to be
ready.

The current ready state means that all the virtio queues should be
configured regardless of their enablement status.

In order to support this case, this patch changes the ready state:
the device is ready when at least one queue pair is configured and
enabled.

So, now, the application and the vDPA driver are notified when the first
queue pair is configured and enabled.

Also, the queue notifications will be triggered according to the new
ready definition.

Signed-off-by: Matan Azrad <matan@mellanox.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
---
 lib/librte_vhost/vhost.h      |  1 +
 lib/librte_vhost/vhost_user.c | 55 +++++++++++++++++++++++++++++--------------
 2 files changed, 38 insertions(+), 18 deletions(-)

diff --git a/lib/librte_vhost/vhost.h b/lib/librte_vhost/vhost.h
index 17f1e9a..8a74f33 100644
--- a/lib/librte_vhost/vhost.h
+++ b/lib/librte_vhost/vhost.h
@@ -151,6 +151,7 @@ struct vhost_virtqueue {
 	int			backend;
 	int			enabled;
 	int			access_ok;
+	int			ready;
 	rte_spinlock_t		access_lock;
 
 	/* Used to notify the guest (trigger interrupt) */
diff --git a/lib/librte_vhost/vhost_user.c b/lib/librte_vhost/vhost_user.c
index 8d8050b..b90fc78 100644
--- a/lib/librte_vhost/vhost_user.c
+++ b/lib/librte_vhost/vhost_user.c
@@ -228,6 +228,21 @@
 	dev->postcopy_listening = 0;
 }
 
+static void
+vhost_user_notify_queue_state(struct virtio_net *dev, uint16_t index,
+			      int enable)
+{
+	int did = dev->vdpa_dev_id;
+	struct rte_vdpa_device *vdpa_dev = rte_vdpa_get_device(did);
+
+	if (vdpa_dev && vdpa_dev->ops->set_vring_state)
+		vdpa_dev->ops->set_vring_state(dev->vid, index, enable);
+
+	if (dev->notify_ops->vring_state_changed)
+		dev->notify_ops->vring_state_changed(dev->vid,
+				index, enable);
+}
+
 /*
  * This function just returns success at the moment unless
  * the device hasn't been initialised.
@@ -1306,27 +1321,31 @@
 
 	return rings_ok &&
 	       vq->kickfd != VIRTIO_UNINITIALIZED_EVENTFD &&
-	       vq->callfd != VIRTIO_UNINITIALIZED_EVENTFD;
+	       vq->callfd != VIRTIO_UNINITIALIZED_EVENTFD &&
+	       vq->enabled;
 }
 
+#define VIRTIO_DEV_NUM_VQS_TO_BE_READY 2u
+
 static int
 virtio_is_ready(struct virtio_net *dev)
 {
 	struct vhost_virtqueue *vq;
 	uint32_t i;
 
-	if (dev->nr_vring == 0)
+	if (dev->nr_vring < VIRTIO_DEV_NUM_VQS_TO_BE_READY)
 		return 0;
 
-	for (i = 0; i < dev->nr_vring; i++) {
+	for (i = 0; i < VIRTIO_DEV_NUM_VQS_TO_BE_READY; i++) {
 		vq = dev->virtqueue[i];
 
 		if (!vq_is_ready(dev, vq))
 			return 0;
 	}
 
-	VHOST_LOG_CONFIG(INFO,
-		"virtio is now ready for processing.\n");
+	if (!(dev->flags & VIRTIO_DEV_RUNNING))
+		VHOST_LOG_CONFIG(INFO,
+			"virtio is now ready for processing.\n");
 	return 1;
 }
 
@@ -1970,8 +1989,6 @@ static int vhost_user_set_vring_err(struct virtio_net **pdev __rte_unused,
 	struct virtio_net *dev = *pdev;
 	int enable = (int)msg->payload.state.num;
 	int index = (int)msg->payload.state.index;
-	struct rte_vdpa_device *vdpa_dev;
-	int did = -1;
 
 	if (validate_msg_fds(msg, 0) != 0)
 		return RTE_VHOST_MSG_RESULT_ERR;
@@ -1980,15 +1997,6 @@ static int vhost_user_set_vring_err(struct virtio_net **pdev __rte_unused,
 		"set queue enable: %d to qp idx: %d\n",
 		enable, index);
 
-	did = dev->vdpa_dev_id;
-	vdpa_dev = rte_vdpa_get_device(did);
-	if (vdpa_dev && vdpa_dev->ops->set_vring_state)
-		vdpa_dev->ops->set_vring_state(dev->vid, index, enable);
-
-	if (dev->notify_ops->vring_state_changed)
-		dev->notify_ops->vring_state_changed(dev->vid,
-				index, enable);
-
 	/* On disable, rings have to be stopped being processed. */
 	if (!enable && dev->dequeue_zero_copy)
 		drain_zmbuf_list(dev->virtqueue[index]);
@@ -2618,6 +2626,7 @@ typedef int (*vhost_message_handler_t)(struct virtio_net **pdev,
 	int unlock_required = 0;
 	bool handled;
 	int request;
+	uint32_t i;
 
 	dev = get_device(vid);
 	if (dev == NULL)
@@ -2793,6 +2802,17 @@ typedef int (*vhost_message_handler_t)(struct virtio_net **pdev,
 		return -1;
 	}
 
+	for (i = 0; i < dev->nr_vring; i++) {
+		struct vhost_virtqueue *vq = dev->virtqueue[i];
+		bool cur_ready = vq_is_ready(dev, vq);
+
+		if (cur_ready != (vq && vq->ready)) {
+			vhost_user_notify_queue_state(dev, i, cur_ready);
+			vq->ready = cur_ready;
+		}
+	}
+
+
 	if (!(dev->flags & VIRTIO_DEV_RUNNING) && virtio_is_ready(dev)) {
 		dev->flags |= VIRTIO_DEV_READY;
 
@@ -2810,8 +2830,7 @@ typedef int (*vhost_message_handler_t)(struct virtio_net **pdev,
 	did = dev->vdpa_dev_id;
 	vdpa_dev = rte_vdpa_get_device(did);
 	if (vdpa_dev && virtio_is_ready(dev) &&
-			!(dev->flags & VIRTIO_DEV_VDPA_CONFIGURED) &&
-			msg.request.master == VHOST_USER_SET_VRING_CALL) {
+	    !(dev->flags & VIRTIO_DEV_VDPA_CONFIGURED)) {
 		if (vdpa_dev->ops->dev_conf)
 			vdpa_dev->ops->dev_conf(dev->vid);
 		dev->flags |= VIRTIO_DEV_VDPA_CONFIGURED;
-- 
1.8.3.1
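
As an illustration of the per-queue readiness notifications introduced
here, a minimal sketch of an application-side callback; the logging and
the ops structure name are hypothetical:

#include <stdint.h>
#include <stdio.h>
#include <rte_vhost.h>

/* Hypothetical application callback, invoked whenever a virtqueue's
 * ready state flips under the new definition (configured and enabled).
 */
static int
example_vring_state_changed(int vid, uint16_t queue_id, int enable)
{
	printf("vid %d: queue %u is now %s\n",
	       vid, queue_id, enable ? "ready" : "not ready");
	/* A real application would start or stop processing the queue
	 * here.
	 */
	return 0;
}

static const struct vhost_device_ops example_ops = {
	.vring_state_changed = example_vring_state_changed,
};

Such an ops structure would typically be registered with
rte_vhost_driver_callback_register() before starting the vhost driver.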


^ permalink raw reply	[flat|nested] 59+ messages in thread

* [dpdk-dev] [PATCH v3 4/6] vhost: handle memory hotplug with vDPA devices
  2020-06-29 14:08   ` [dpdk-dev] [PATCH v3 0/6] vhost: improve ready state Matan Azrad
                       ` (2 preceding siblings ...)
  2020-06-29 14:08     ` [dpdk-dev] [PATCH v3 3/6] vhost: improve device readiness notifications Matan Azrad
@ 2020-06-29 14:08     ` Matan Azrad
  2020-06-29 14:08     ` [dpdk-dev] [PATCH v3 5/6] vhost: notify virtq file descriptor update Matan Azrad
                       ` (2 subsequent siblings)
  6 siblings, 0 replies; 59+ messages in thread
From: Matan Azrad @ 2020-06-29 14:08 UTC (permalink / raw)
  To: Maxime Coquelin; +Cc: dev, Xiao Wang

Some vDPA drivers' basic configurations should be updated when the
guest memory is hotplugged.

Close the vDPA device before the hotplug operation and recreate it after
the hotplug operation is done.

Signed-off-by: Matan Azrad <matan@mellanox.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
---
 lib/librte_vhost/vhost_user.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/lib/librte_vhost/vhost_user.c b/lib/librte_vhost/vhost_user.c
index b90fc78..f690fdb 100644
--- a/lib/librte_vhost/vhost_user.c
+++ b/lib/librte_vhost/vhost_user.c
@@ -1073,6 +1073,15 @@
 	}
 
 	if (dev->mem) {
+		if (dev->flags & VIRTIO_DEV_VDPA_CONFIGURED) {
+			int did = dev->vdpa_dev_id;
+			struct rte_vdpa_device *vdpa_dev =
+						rte_vdpa_get_device(did);
+
+			if (vdpa_dev && vdpa_dev->ops->dev_close)
+				vdpa_dev->ops->dev_close(dev->vid);
+			dev->flags &= ~VIRTIO_DEV_VDPA_CONFIGURED;
+		}
 		free_mem_region(dev);
 		rte_free(dev->mem);
 		dev->mem = NULL;
-- 
1.8.3.1
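
A hedged sketch of the driver-side contract this patch relies on:
dev_close must drop everything derived from the old guest memory map so
that the following dev_conf call can rebuild it from the new one. The
helper names below are hypothetical:

#include <rte_vdpa.h>

/* Hypothetical driver helpers standing in for real (un)registration of
 * the guest memory with the device.
 */
static void
example_unregister_guest_mem(int vid)
{
	(void)vid; /* release mappings and DMA registrations */
}

static int
example_register_guest_mem(int vid)
{
	(void)vid; /* map and register the current guest memory table */
	return 0;
}

static int
example_dev_close(int vid)
{
	/* Called by vhost before the memory table is replaced. */
	example_unregister_guest_mem(vid);
	return 0;
}

static int
example_dev_conf(int vid)
{
	/* Called again once the hotplugged memory table is in place. */
	return example_register_guest_mem(vid);
}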


^ permalink raw reply	[flat|nested] 59+ messages in thread

* [dpdk-dev] [PATCH v3 5/6] vhost: notify virtq file descriptor update
  2020-06-29 14:08   ` [dpdk-dev] [PATCH v3 0/6] vhost: improve ready state Matan Azrad
                       ` (3 preceding siblings ...)
  2020-06-29 14:08     ` [dpdk-dev] [PATCH v3 4/6] vhost: handle memory hotplug with vDPA devices Matan Azrad
@ 2020-06-29 14:08     ` Matan Azrad
  2020-06-29 14:08     ` [dpdk-dev] [PATCH v3 6/6] vdpa/mlx5: support queue update Matan Azrad
  2020-06-29 17:24     ` [dpdk-dev] [PATCH v3 0/6] vhost: improve ready state Maxime Coquelin
  6 siblings, 0 replies; 59+ messages in thread
From: Matan Azrad @ 2020-06-29 14:08 UTC (permalink / raw)
  To: Maxime Coquelin; +Cc: dev, Xiao Wang

When the virtq call or kick file descriptors are changed in the device
configuration while the queue is ready, the application and the vDPA
driver should be notified so that they can align to the new file
descriptors.

Notify that the queue state is disabled before the file descriptor update
and set it back to enabled after the update.

Signed-off-by: Matan Azrad <matan@mellanox.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
---
 lib/librte_vhost/vhost_user.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/lib/librte_vhost/vhost_user.c b/lib/librte_vhost/vhost_user.c
index f690fdb..f3966b6 100644
--- a/lib/librte_vhost/vhost_user.c
+++ b/lib/librte_vhost/vhost_user.c
@@ -1624,6 +1624,12 @@
 		"vring call idx:%d file:%d\n", file.index, file.fd);
 
 	vq = dev->virtqueue[file.index];
+
+	if (vq->ready) {
+		vhost_user_notify_queue_state(dev, file.index, 0);
+		vq->ready = 0;
+	}
+
 	if (vq->callfd >= 0)
 		close(vq->callfd);
 
@@ -1882,6 +1888,11 @@ static int vhost_user_set_vring_err(struct virtio_net **pdev __rte_unused,
 				dev->vid, file.index, 1);
 	}
 
+	if (vq->ready) {
+		vhost_user_notify_queue_state(dev, file.index, 0);
+		vq->ready = 0;
+	}
+
 	if (vq->kickfd >= 0)
 		close(vq->kickfd);
 	vq->kickfd = file.fd;
-- 
1.8.3.1
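
On the driver side, the expected reaction to this disable/enable sequence
is to re-read the descriptors when the queue comes back. A hedged sketch,
not the actual driver code:

#include <rte_vhost.h>

/* Hypothetical set_vring_state callback: on re-enable after a file
 * descriptor update, fetch the current ring to pick up the new fds.
 */
static int
example_on_vring_state(int vid, int vring, int state)
{
	struct rte_vhost_vring vq;

	if (!state)
		return 0; /* queue went not-ready; stop using the old fds */
	if (rte_vhost_get_vhost_vring(vid, vring, &vq) != 0)
		return -1;
	/* vq.kickfd and vq.callfd now hold the updated descriptors. */
	return 0;
}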


^ permalink raw reply	[flat|nested] 59+ messages in thread

* [dpdk-dev] [PATCH v3 6/6] vdpa/mlx5: support queue update
  2020-06-29 14:08   ` [dpdk-dev] [PATCH v3 0/6] vhost: improve ready state Matan Azrad
                       ` (4 preceding siblings ...)
  2020-06-29 14:08     ` [dpdk-dev] [PATCH v3 5/6] vhost: notify virtq file descriptor update Matan Azrad
@ 2020-06-29 14:08     ` Matan Azrad
  2020-06-29 17:24     ` [dpdk-dev] [PATCH v3 0/6] vhost: improve ready state Maxime Coquelin
  6 siblings, 0 replies; 59+ messages in thread
From: Matan Azrad @ 2020-06-29 14:08 UTC (permalink / raw)
  To: Maxime Coquelin; +Cc: dev, Xiao Wang

Recent changes in vDPA device management by the vhost library may cause a
queue ready state update after the device configuration.

So, there is a chance that some queue configuration information will be
known only after the device has been configured.

Add support for reconfiguring a queue after the device configuration,
according to the queue state update and the configuration changes.

Adjust the host notifier and the guest notification configuration to be
per queue and to be applied in the enablement process.

Signed-off-by: Matan Azrad <matan@mellanox.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
---
 drivers/vdpa/mlx5/mlx5_vdpa.c       | 26 ------------
 drivers/vdpa/mlx5/mlx5_vdpa.h       |  8 +++-
 drivers/vdpa/mlx5/mlx5_vdpa_virtq.c | 80 ++++++++++++++++++++++++++-----------
 3 files changed, 64 insertions(+), 50 deletions(-)

diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.c b/drivers/vdpa/mlx5/mlx5_vdpa.c
index 97f87c5..2aa168d 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.c
@@ -141,31 +141,6 @@
 }
 
 static int
-mlx5_vdpa_direct_db_prepare(struct mlx5_vdpa_priv *priv)
-{
-	int ret;
-
-	if (priv->direct_notifier) {
-		ret = rte_vhost_host_notifier_ctrl(priv->vid,
-						   RTE_VHOST_QUEUE_ALL, false);
-		if (ret != 0) {
-			DRV_LOG(INFO, "Direct HW notifier FD cannot be "
-				"destroyed for device %d: %d.", priv->vid, ret);
-			return -1;
-		}
-		priv->direct_notifier = 0;
-	}
-	ret = rte_vhost_host_notifier_ctrl(priv->vid, RTE_VHOST_QUEUE_ALL,
-					   true);
-	if (ret != 0)
-		DRV_LOG(INFO, "Direct HW notifier FD cannot be configured for"
-			" device %d: %d.", priv->vid, ret);
-	else
-		priv->direct_notifier = 1;
-	return 0;
-}
-
-static int
 mlx5_vdpa_features_set(int vid)
 {
 	int did = rte_vhost_get_vdpa_device_id(vid);
@@ -330,7 +305,6 @@
 	if (mlx5_vdpa_mtu_set(priv))
 		DRV_LOG(WARNING, "MTU cannot be set on device %d.", did);
 	if (mlx5_vdpa_pd_create(priv) || mlx5_vdpa_mem_register(priv) ||
-	    mlx5_vdpa_direct_db_prepare(priv) ||
 	    mlx5_vdpa_virtqs_prepare(priv) || mlx5_vdpa_steer_setup(priv) ||
 	    mlx5_vdpa_cqe_event_setup(priv)) {
 		mlx5_vdpa_dev_close(vid);
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.h b/drivers/vdpa/mlx5/mlx5_vdpa.h
index 2ee5aae..8f349d4 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.h
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.h
@@ -73,11 +73,18 @@ struct mlx5_vdpa_query_mr {
 	int is_indirect;
 };
 
+enum {
+	MLX5_VDPA_NOTIFIER_STATE_DISABLED,
+	MLX5_VDPA_NOTIFIER_STATE_ENABLED,
+	MLX5_VDPA_NOTIFIER_STATE_ERR
+};
+
 struct mlx5_vdpa_virtq {
 	SLIST_ENTRY(mlx5_vdpa_virtq) next;
 	uint8_t enable;
 	uint16_t index;
 	uint16_t vq_size;
+	uint8_t notifier_state;
 	struct mlx5_vdpa_priv *priv;
 	struct mlx5_devx_obj *virtq;
 	struct mlx5_devx_obj *counters;
@@ -112,7 +119,6 @@ enum {
 struct mlx5_vdpa_priv {
 	TAILQ_ENTRY(mlx5_vdpa_priv) next;
 	uint8_t configured;
-	uint8_t direct_notifier; /* Whether direct notifier is on or off. */
 	uint64_t last_traffic_tic;
 	pthread_t timer_tid;
 	pthread_mutex_t timer_lock;
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
index 4b4d019..3e61264 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
@@ -36,6 +36,17 @@
 		break;
 	} while (1);
 	rte_write32(virtq->index, priv->virtq_db_addr);
+	if (virtq->notifier_state == MLX5_VDPA_NOTIFIER_STATE_DISABLED) {
+		if (rte_vhost_host_notifier_ctrl(priv->vid, virtq->index, true))
+			virtq->notifier_state = MLX5_VDPA_NOTIFIER_STATE_ERR;
+		else
+			virtq->notifier_state =
+					       MLX5_VDPA_NOTIFIER_STATE_ENABLED;
+		DRV_LOG(INFO, "Virtq %u notifier state is %s.", virtq->index,
+			virtq->notifier_state ==
+				MLX5_VDPA_NOTIFIER_STATE_ENABLED ? "enabled" :
+								    "disabled");
+	}
 	DRV_LOG(DEBUG, "Ring virtq %u doorbell.", virtq->index);
 }
 
@@ -79,6 +90,7 @@
 	memset(&virtq->reset, 0, sizeof(virtq->reset));
 	if (virtq->eqp.fw_qp)
 		mlx5_vdpa_event_qp_destroy(&virtq->eqp);
+	virtq->notifier_state = MLX5_VDPA_NOTIFIER_STATE_DISABLED;
 	return 0;
 }
 
@@ -87,10 +99,8 @@
 {
 	int i;
 
-	for (i = 0; i < priv->nr_virtqs; i++) {
+	for (i = 0; i < priv->nr_virtqs; i++)
 		mlx5_vdpa_virtq_unset(&priv->virtqs[i]);
-		priv->virtqs[i].enable = 0;
-	}
 	if (priv->tis) {
 		claim_zero(mlx5_devx_cmd_destroy(priv->tis));
 		priv->tis = NULL;
@@ -143,6 +153,7 @@
 		DRV_LOG(ERR, "Failed to set virtq %d base.", index);
 		return -1;
 	}
+	DRV_LOG(DEBUG, "vid %u virtq %u was stopped.", priv->vid, index);
 	return 0;
 }
 
@@ -289,6 +300,7 @@
 	virtq->priv = priv;
 	if (!virtq->virtq)
 		goto error;
+	claim_zero(rte_vhost_enable_guest_notification(priv->vid, index, 1));
 	if (mlx5_vdpa_virtq_modify(virtq, 1))
 		goto error;
 	virtq->priv = priv;
@@ -297,10 +309,6 @@
 	virtq->intr_handle.fd = vq.kickfd;
 	if (virtq->intr_handle.fd == -1) {
 		DRV_LOG(WARNING, "Virtq %d kickfd is invalid.", index);
-		if (!priv->direct_notifier) {
-			DRV_LOG(ERR, "Virtq %d cannot be notified.", index);
-			goto error;
-		}
 	} else {
 		virtq->intr_handle.type = RTE_INTR_HANDLE_EXT;
 		if (rte_intr_callback_register(&virtq->intr_handle,
@@ -315,6 +323,8 @@
 				virtq->intr_handle.fd, index);
 		}
 	}
+	DRV_LOG(DEBUG, "vid %u virtq %u was created successfully.", priv->vid,
+		index);
 	return 0;
 error:
 	mlx5_vdpa_virtq_unset(virtq);
@@ -418,18 +428,35 @@
 		goto error;
 	}
 	priv->nr_virtqs = nr_vring;
-	for (i = 0; i < nr_vring; i++) {
-		claim_zero(rte_vhost_enable_guest_notification(priv->vid, i,
-							       1));
-		if (mlx5_vdpa_virtq_setup(priv, i))
+	for (i = 0; i < nr_vring; i++)
+		if (priv->virtqs[i].enable && mlx5_vdpa_virtq_setup(priv, i))
 			goto error;
-	}
 	return 0;
 error:
 	mlx5_vdpa_virtqs_release(priv);
 	return -1;
 }
 
+static int
+mlx5_vdpa_virtq_is_modified(struct mlx5_vdpa_priv *priv,
+			    struct mlx5_vdpa_virtq *virtq)
+{
+	struct rte_vhost_vring vq;
+	int ret = rte_vhost_get_vhost_vring(priv->vid, virtq->index, &vq);
+
+	if (ret)
+		return -1;
+	if (vq.size != virtq->vq_size || vq.kickfd != virtq->intr_handle.fd)
+		return 1;
+	if (virtq->eqp.cq.cq) {
+		if (vq.callfd != virtq->eqp.cq.callfd)
+			return 1;
+	} else if (vq.callfd != -1) {
+		return 1;
+	}
+	return 0;
+}
+
 int
 mlx5_vdpa_virtq_enable(struct mlx5_vdpa_priv *priv, int index, int enable)
 {
@@ -438,26 +465,33 @@
 
 	DRV_LOG(INFO, "Update virtq %d status %sable -> %sable.", index,
 		virtq->enable ? "en" : "dis", enable ? "en" : "dis");
-	if (virtq->enable == !!enable)
-		return 0;
 	if (!priv->configured) {
 		virtq->enable = !!enable;
 		return 0;
 	}
-	if (enable) {
-		/* Configuration might have been updated - reconfigure virtq. */
-		if (virtq->virtq) {
-			ret = mlx5_vdpa_virtq_stop(priv, index);
-			if (ret)
-				DRV_LOG(WARNING, "Failed to stop virtq %d.",
-					index);
-			mlx5_vdpa_virtq_unset(virtq);
+	if (virtq->enable == !!enable) {
+		if (!enable)
+			return 0;
+		ret = mlx5_vdpa_virtq_is_modified(priv, virtq);
+		if (ret < 0) {
+			DRV_LOG(ERR, "Virtq %d modify check failed.", index);
+			return -1;
 		}
+		if (ret == 0)
+			return 0;
+		DRV_LOG(INFO, "Virtq %d was modified, recreate it.", index);
+	}
+	if (virtq->virtq) {
+		ret = mlx5_vdpa_virtq_stop(priv, index);
+		if (ret)
+			DRV_LOG(WARNING, "Failed to stop virtq %d.", index);
+		mlx5_vdpa_virtq_unset(virtq);
+	}
+	if (enable) {
 		ret = mlx5_vdpa_virtq_setup(priv, index);
 		if (ret) {
 			DRV_LOG(ERR, "Failed to setup virtq %d.", index);
 			return ret;
-			/* The only case virtq can stay invalid. */
 		}
 	}
 	virtq->enable = !!enable;
-- 
1.8.3.1
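
The modification check added above boils down to comparing the attributes
the driver configured with against what vhost currently reports. A generic,
hedged sketch of that comparison; the cache structure is hypothetical:

#include <stdint.h>
#include <rte_vhost.h>

/* Hypothetical cache of the attributes used at queue setup time. */
struct example_vq_cache {
	uint16_t size;
	int kickfd;
	int callfd;
};

/* Returns 1 if the queue must be recreated, 0 if unchanged, -1 on error. */
static int
example_vq_is_modified(int vid, uint16_t index,
		       const struct example_vq_cache *c)
{
	struct rte_vhost_vring vq;

	if (rte_vhost_get_vhost_vring(vid, index, &vq) != 0)
		return -1;
	return (vq.size != c->size || vq.kickfd != c->kickfd ||
		vq.callfd != c->callfd) ? 1 : 0;
}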


^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [dpdk-dev] [PATCH v3 0/6] vhost: improve ready state
  2020-06-29 14:08   ` [dpdk-dev] [PATCH v3 0/6] vhost: improve ready state Matan Azrad
                       ` (5 preceding siblings ...)
  2020-06-29 14:08     ` [dpdk-dev] [PATCH v3 6/6] vdpa/mlx5: support queue update Matan Azrad
@ 2020-06-29 17:24     ` Maxime Coquelin
  2020-07-17  1:41       ` Wang, Yinan
  6 siblings, 1 reply; 59+ messages in thread
From: Maxime Coquelin @ 2020-06-29 17:24 UTC (permalink / raw)
  To: Matan Azrad; +Cc: dev, Xiao Wang



On 6/29/20 4:08 PM, Matan Azrad wrote:
> Due to the issue described in the "vhost: improve device ready definition"
> patch here, we need to change the ready state definition in the vhost device.
> 
> To support the suggested improvement, the host notifier control API is updated.
> 
> The access lock also needs to be skipped when a vDPA device is configured.
> 
> Also add support for configuration changes while the device is ready.
> 
> v3:
> Rebase.
> Add missing commit: vhost: support host notifier queue configuration.
> 
> Matan Azrad (6):
>   vhost: support host notifier queue configuration
>   vhost: skip access lock when vDPA is configured
>   vhost: improve device readiness notifications
>   vhost: handle memory hotplug with vDPA devices
>   vhost: notify virtq file descriptor update
>   vdpa/mlx5: support queue update
> 
>  doc/guides/rel_notes/release_20_08.rst |  3 ++
>  drivers/vdpa/ifc/ifcvf_vdpa.c          |  6 +--
>  drivers/vdpa/mlx5/mlx5_vdpa.c          | 24 ---------
>  drivers/vdpa/mlx5/mlx5_vdpa.h          |  8 ++-
>  drivers/vdpa/mlx5/mlx5_vdpa_virtq.c    | 80 +++++++++++++++++++--------
>  lib/librte_vhost/rte_vdpa.h            |  8 ++-
>  lib/librte_vhost/rte_vhost.h           |  1 -
>  lib/librte_vhost/vhost.h               |  1 +
>  lib/librte_vhost/vhost_user.c          | 99 +++++++++++++++++++++++++---------
>  9 files changed, 152 insertions(+), 78 deletions(-)
> 

Applied to dpdk-next-virtio/master

Thanks,
Maxime


^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [dpdk-dev] [PATCH v3 0/6] vhost: improve ready state
  2020-06-29 17:24     ` [dpdk-dev] [PATCH v3 0/6] vhost: improve ready state Maxime Coquelin
@ 2020-07-17  1:41       ` Wang, Yinan
  0 siblings, 0 replies; 59+ messages in thread
From: Wang, Yinan @ 2020-07-17  1:41 UTC (permalink / raw)
  To: Maxime Coquelin, Matan Azrad
  Cc: dev, Wang, Xiao W, Xia, Chenbo, Wang, Zhihong, Liu, Yong

Hi Matan,

In our daily regression test, we found that this patch set decreases virtio performance and causes virtio interrupts to stop working properly in multi-queue scenarios.
Could you help to take a look?
Details are in Bugzilla: https://bugs.dpdk.org/show_bug.cgi?id=507. By the way, please use a low QEMU version (qemu < 4.1) to reproduce the second issue, since the case is blocked by a QEMU bug with higher QEMU versions.

BR,
Yinan

> -----Original Message-----
> From: dev <dev-bounces@dpdk.org> On Behalf Of Maxime Coquelin
> Sent: June 30, 2020 1:24 AM
> To: Matan Azrad <matan@mellanox.com>
> Cc: dev@dpdk.org; Wang, Xiao W <xiao.w.wang@intel.com>
> Subject: Re: [dpdk-dev] [PATCH v3 0/6] vhost: improve ready state
> 
> 
> 
> On 6/29/20 4:08 PM, Matan Azrad wrote:
> > Due to the issue described in the "vhost: improve device ready definition"
> > patch here, we need to change the ready state definition in the vhost device.
> >
> > To support the suggested improvement, the host notifier control API is
> > updated.
> >
> > The access lock also needs to be skipped when a vDPA device is configured.
> >
> > Also add support for configuration changes while the device is ready.
> >
> > v3:
> > Rebase.
> > Add missing commit: vhost: support host notifier queue configuration.
> >
> > Matan Azrad (6):
> >   vhost: support host notifier queue configuration
> >   vhost: skip access lock when vDPA is configured
> >   vhost: improve device readiness notifications
> >   vhost: handle memory hotplug with vDPA devices
> >   vhost: notify virtq file descriptor update
> >   vdpa/mlx5: support queue update
> >
> >  doc/guides/rel_notes/release_20_08.rst |  3 ++
> >  drivers/vdpa/ifc/ifcvf_vdpa.c          |  6 +--
> >  drivers/vdpa/mlx5/mlx5_vdpa.c          | 24 ---------
> >  drivers/vdpa/mlx5/mlx5_vdpa.h          |  8 ++-
> >  drivers/vdpa/mlx5/mlx5_vdpa_virtq.c    | 80 +++++++++++++++++++--------
> >  lib/librte_vhost/rte_vdpa.h            |  8 ++-
> >  lib/librte_vhost/rte_vhost.h           |  1 -
> >  lib/librte_vhost/vhost.h               |  1 +
> >  lib/librte_vhost/vhost_user.c          | 99 +++++++++++++++++++++++++---------
> >  9 files changed, 152 insertions(+), 78 deletions(-)
> >
> 
> Applied to dpdk-next-virtio/master
> 
> Thanks,
> Maxime


^ permalink raw reply	[flat|nested] 59+ messages in thread

end of thread, other threads:[~2020-07-17  1:42 UTC | newest]

Thread overview: 59+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-06-18 16:28 [dpdk-dev] [PATCH v1 0/4] vhost: improve ready state Matan Azrad
2020-06-18 16:28 ` [dpdk-dev] [PATCH v1 1/4] vhost: support host notifier queue configuration Matan Azrad
2020-06-19  6:44   ` Maxime Coquelin
2020-06-19 13:28     ` Matan Azrad
2020-06-19 14:01       ` Maxime Coquelin
2020-06-21  6:26         ` Matan Azrad
2020-06-22  8:06           ` Maxime Coquelin
2020-06-18 16:28 ` [dpdk-dev] [PATCH v1 2/4] vhost: skip access lock when vDPA is configured Matan Azrad
2020-06-19  6:49   ` Maxime Coquelin
2020-06-18 16:28 ` [dpdk-dev] [PATCH v1 3/4] vhost: improve device ready definition Matan Azrad
2020-06-19  7:41   ` Maxime Coquelin
2020-06-19 12:04     ` Maxime Coquelin
2020-06-19 13:11     ` Matan Azrad
2020-06-19 13:54       ` Maxime Coquelin
2020-06-21  6:20         ` Matan Azrad
2020-06-22  8:04           ` Maxime Coquelin
2020-06-22  8:41             ` Matan Azrad
2020-06-22  8:56               ` Maxime Coquelin
2020-06-22 10:06                 ` Matan Azrad
2020-06-22 12:32                   ` Maxime Coquelin
2020-06-22 13:43                     ` Matan Azrad
2020-06-22 14:55                       ` Maxime Coquelin
2020-06-22 15:51                         ` Matan Azrad
2020-06-22 16:47                           ` Maxime Coquelin
2020-06-23  9:02                             ` Matan Azrad
2020-06-23  9:19                               ` Maxime Coquelin
2020-06-23 11:53                                 ` Matan Azrad
2020-06-23 13:55                                   ` Maxime Coquelin
2020-06-23 14:33                                     ` Maxime Coquelin
2020-06-23 14:52                                     ` Matan Azrad
2020-06-23 15:18                                       ` Maxime Coquelin
2020-06-24  5:54                                         ` Matan Azrad
2020-06-24  7:22                                           ` Maxime Coquelin
2020-06-24  8:38                                             ` Matan Azrad
2020-06-24  9:12                                               ` Maxime Coquelin
2020-06-18 16:28 ` [dpdk-dev] [PATCH v1 4/4] vdpa/mlx5: support queue update Matan Azrad
2020-06-25 13:38 ` [dpdk-dev] [PATCH v2 0/5] vhost: improve ready state Matan Azrad
2020-06-25 13:38   ` [dpdk-dev] [PATCH v2 1/5] vhost: skip access lock when vDPA is configured Matan Azrad
2020-06-28  3:06     ` Xia, Chenbo
2020-06-25 13:38   ` [dpdk-dev] [PATCH v2 2/5] vhost: improve device readiness notifications Matan Azrad
2020-06-26 12:10     ` Maxime Coquelin
2020-06-28  3:08     ` Xia, Chenbo
2020-06-25 13:38   ` [dpdk-dev] [PATCH v2 3/5] vhost: handle memory hotplug with vDPA devices Matan Azrad
2020-06-26 12:15     ` Maxime Coquelin
2020-06-28  3:18     ` Xia, Chenbo
2020-06-25 13:38   ` [dpdk-dev] [PATCH v2 4/5] vhost: notify virtq file descriptor update Matan Azrad
2020-06-26 12:19     ` Maxime Coquelin
2020-06-28  3:19     ` Xia, Chenbo
2020-06-25 13:38   ` [dpdk-dev] [PATCH v2 5/5] vdpa/mlx5: support queue update Matan Azrad
2020-06-26 12:29     ` Maxime Coquelin
2020-06-29 14:08   ` [dpdk-dev] [PATCH v3 0/6] vhost: improve ready state Matan Azrad
2020-06-29 14:08     ` [dpdk-dev] [PATCH v3 1/6] vhost: support host notifier queue configuration Matan Azrad
2020-06-29 14:08     ` [dpdk-dev] [PATCH v3 2/6] vhost: skip access lock when vDPA is configured Matan Azrad
2020-06-29 14:08     ` [dpdk-dev] [PATCH v3 3/6] vhost: improve device readiness notifications Matan Azrad
2020-06-29 14:08     ` [dpdk-dev] [PATCH v3 4/6] vhost: handle memory hotplug with vDPA devices Matan Azrad
2020-06-29 14:08     ` [dpdk-dev] [PATCH v3 5/6] vhost: notify virtq file descriptor update Matan Azrad
2020-06-29 14:08     ` [dpdk-dev] [PATCH v3 6/6] vdpa/mlx5: support queue update Matan Azrad
2020-06-29 17:24     ` [dpdk-dev] [PATCH v3 0/6] vhost: improve ready state Maxime Coquelin
2020-07-17  1:41       ` Wang, Yinan
