* [PATCH v3 01/28] vhost: fix missing guest notif stat increment
2023-05-25 16:25 [PATCH v3 00/28] Add VDUSE support to Vhost library Maxime Coquelin
@ 2023-05-25 16:25 ` Maxime Coquelin
2023-06-01 19:59 ` Maxime Coquelin
2023-05-25 16:25 ` [PATCH v3 02/28] vhost: fix invalid call FD handling Maxime Coquelin
` (27 subsequent siblings)
28 siblings, 1 reply; 50+ messages in thread
From: Maxime Coquelin @ 2023-05-25 16:25 UTC (permalink / raw)
To: dev, chenbo.xia, david.marchand, mkp, fbl, jasowang,
cunming.liang, xieyongji, echaudro, eperezma, amorenoz, lulu
Cc: Maxime Coquelin, stable
The guest notification counter was only incremented for
split ring; this patch also increments it for packed ring.
Fixes: 1ea74efd7fa4 ("vhost: add statistics for guest notification")
Cc: stable@dpdk.org
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
---
lib/vhost/vhost.h | 2 ++
1 file changed, 2 insertions(+)
diff --git a/lib/vhost/vhost.h b/lib/vhost/vhost.h
index 8fdab13c70..8554ab4002 100644
--- a/lib/vhost/vhost.h
+++ b/lib/vhost/vhost.h
@@ -973,6 +973,8 @@ vhost_vring_call_packed(struct virtio_net *dev, struct vhost_virtqueue *vq)
kick:
if (kick) {
eventfd_write(vq->callfd, (eventfd_t)1);
+ if (dev->flags & VIRTIO_DEV_STATS_ENABLED)
+ vq->stats.guest_notifications++;
if (dev->notify_ops->guest_notified)
dev->notify_ops->guest_notified(dev->vid);
}
--
2.40.1
^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [PATCH v3 01/28] vhost: fix missing guest notif stat increment
2023-05-25 16:25 ` [PATCH v3 01/28] vhost: fix missing guest notif stat increment Maxime Coquelin
@ 2023-06-01 19:59 ` Maxime Coquelin
0 siblings, 0 replies; 50+ messages in thread
From: Maxime Coquelin @ 2023-06-01 19:59 UTC (permalink / raw)
To: dev, chenbo.xia, david.marchand, mkp, fbl, jasowang,
cunming.liang, xieyongji, echaudro, eperezma, amorenoz, lulu
Cc: stable
On 5/25/23 18:25, Maxime Coquelin wrote:
> The guest notification counter was only incremented for
> split ring; this patch also increments it for packed ring.
>
> Fixes: 1ea74efd7fa4 ("vhost: add statistics for guest notification")
> Cc: stable@dpdk.org
>
> Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
> Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
> ---
> lib/vhost/vhost.h | 2 ++
> 1 file changed, 2 insertions(+)
>
> diff --git a/lib/vhost/vhost.h b/lib/vhost/vhost.h
> index 8fdab13c70..8554ab4002 100644
> --- a/lib/vhost/vhost.h
> +++ b/lib/vhost/vhost.h
> @@ -973,6 +973,8 @@ vhost_vring_call_packed(struct virtio_net *dev, struct vhost_virtqueue *vq)
> kick:
> if (kick) {
> eventfd_write(vq->callfd, (eventfd_t)1);
> + if (dev->flags & VIRTIO_DEV_STATS_ENABLED)
> + vq->stats.guest_notifications++;
> if (dev->notify_ops->guest_notified)
> dev->notify_ops->guest_notified(dev->vid);
> }
Applied this single patch to dpdk-next-virtio/main, as it is needed for
Eelco's series, and the rest of the VDUSE series has to be rebased on
top of Eelco's.
Thanks,
Maxime
* [PATCH v3 02/28] vhost: fix invalid call FD handling
2023-05-25 16:25 [PATCH v3 00/28] Add VDUSE support to Vhost library Maxime Coquelin
2023-05-25 16:25 ` [PATCH v3 01/28] vhost: fix missing guest notif stat increment Maxime Coquelin
@ 2023-05-25 16:25 ` Maxime Coquelin
2023-05-25 16:25 ` [PATCH v3 03/28] vhost: fix IOTLB entries overlap check with previous entry Maxime Coquelin
` (26 subsequent siblings)
28 siblings, 0 replies; 50+ messages in thread
From: Maxime Coquelin @ 2023-05-25 16:25 UTC (permalink / raw)
To: dev, chenbo.xia, david.marchand, mkp, fbl, jasowang,
cunming.liang, xieyongji, echaudro, eperezma, amorenoz, lulu
Cc: Maxime Coquelin, stable
This patch fixes cases where IRQ injection is attempted
while the call FD is not valid, which should not happen.
Fixes: b1cce26af1dc ("vhost: add notification for packed ring")
Fixes: e37ff954405a ("vhost: support virtqueue interrupt/notification suppression")
Cc: stable@dpdk.org
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
---
lib/vhost/vhost.h | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/lib/vhost/vhost.h b/lib/vhost/vhost.h
index 8554ab4002..40863f7bfd 100644
--- a/lib/vhost/vhost.h
+++ b/lib/vhost/vhost.h
@@ -902,9 +902,9 @@ vhost_vring_call_split(struct virtio_net *dev, struct vhost_virtqueue *vq)
"%s: used_event_idx=%d, old=%d, new=%d\n",
__func__, vhost_used_event(vq), old, new);
- if ((vhost_need_event(vhost_used_event(vq), new, old) &&
- (vq->callfd >= 0)) ||
- unlikely(!signalled_used_valid)) {
+ if ((vhost_need_event(vhost_used_event(vq), new, old) ||
+ unlikely(!signalled_used_valid)) &&
+ vq->callfd >= 0) {
eventfd_write(vq->callfd, (eventfd_t) 1);
if (dev->flags & VIRTIO_DEV_STATS_ENABLED)
vq->stats.guest_notifications++;
@@ -971,7 +971,7 @@ vhost_vring_call_packed(struct virtio_net *dev, struct vhost_virtqueue *vq)
if (vhost_need_event(off, new, old))
kick = true;
kick:
- if (kick) {
+ if (kick && vq->callfd >= 0) {
eventfd_write(vq->callfd, (eventfd_t)1);
if (dev->flags & VIRTIO_DEV_STATS_ENABLED)
vq->stats.guest_notifications++;
--
2.40.1
* [PATCH v3 03/28] vhost: fix IOTLB entries overlap check with previous entry
2023-05-25 16:25 [PATCH v3 00/28] Add VDUSE support to Vhost library Maxime Coquelin
2023-05-25 16:25 ` [PATCH v3 01/28] vhost: fix missing guest notif stat increment Maxime Coquelin
2023-05-25 16:25 ` [PATCH v3 02/28] vhost: fix invalid call FD handling Maxime Coquelin
@ 2023-05-25 16:25 ` Maxime Coquelin
2023-05-25 16:25 ` [PATCH v3 04/28] vhost: add helper of IOTLB entries coredump Maxime Coquelin
` (25 subsequent siblings)
28 siblings, 0 replies; 50+ messages in thread
From: Maxime Coquelin @ 2023-05-25 16:25 UTC (permalink / raw)
To: dev, chenbo.xia, david.marchand, mkp, fbl, jasowang,
cunming.liang, xieyongji, echaudro, eperezma, amorenoz, lulu
Cc: Maxime Coquelin, stable
Commit 22b6d0ac691a ("vhost: fix madvise IOTLB entries pages overlap check")
fixed the check to ensure the entry to be removed does not
overlap with the next one in the IOTLB cache before marking
it as DONTDUMP with madvise(). This is not enough: the same
issue is present when comparing with the previous entry in
the cache, where the previous entry's end address should be
used, not its start address.
Fixes: dea092d0addb ("vhost: fix madvise arguments alignment")
Cc: stable@dpdk.org
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Mike Pattrick <mkp@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
---
lib/vhost/iotlb.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/lib/vhost/iotlb.c b/lib/vhost/iotlb.c
index 3f45bc6061..870c8acb88 100644
--- a/lib/vhost/iotlb.c
+++ b/lib/vhost/iotlb.c
@@ -178,8 +178,8 @@ vhost_user_iotlb_cache_random_evict(struct virtio_net *dev, struct vhost_virtque
mask = ~(alignment - 1);
/* Don't disable coredump if the previous node is in the same page */
- if (prev_node == NULL ||
- (node->uaddr & mask) != (prev_node->uaddr & mask)) {
+ if (prev_node == NULL || (node->uaddr & mask) !=
+ ((prev_node->uaddr + prev_node->size - 1) & mask)) {
next_node = RTE_TAILQ_NEXT(node, next);
/* Don't disable coredump if the next node is in the same page */
if (next_node == NULL || ((node->uaddr + node->size - 1) & mask) !=
@@ -283,8 +283,8 @@ vhost_user_iotlb_cache_remove(struct virtio_net *dev, struct vhost_virtqueue *vq
mask = ~(alignment-1);
/* Don't disable coredump if the previous node is in the same page */
- if (prev_node == NULL ||
- (node->uaddr & mask) != (prev_node->uaddr & mask)) {
+ if (prev_node == NULL || (node->uaddr & mask) !=
+ ((prev_node->uaddr + prev_node->size - 1) & mask)) {
next_node = RTE_TAILQ_NEXT(node, next);
/* Don't disable coredump if the next node is in the same page */
if (next_node == NULL || ((node->uaddr + node->size - 1) & mask) !=
--
2.40.1
* [PATCH v3 04/28] vhost: add helper of IOTLB entries coredump
2023-05-25 16:25 [PATCH v3 00/28] Add VDUSE support to Vhost library Maxime Coquelin
` (2 preceding siblings ...)
2023-05-25 16:25 ` [PATCH v3 03/28] vhost: fix IOTLB entries overlap check with previous entry Maxime Coquelin
@ 2023-05-25 16:25 ` Maxime Coquelin
2023-05-26 8:46 ` David Marchand
2023-05-25 16:25 ` [PATCH v3 05/28] vhost: add helper for IOTLB entries shared page check Maxime Coquelin
` (24 subsequent siblings)
28 siblings, 1 reply; 50+ messages in thread
From: Maxime Coquelin @ 2023-05-25 16:25 UTC (permalink / raw)
To: dev, chenbo.xia, david.marchand, mkp, fbl, jasowang,
cunming.liang, xieyongji, echaudro, eperezma, amorenoz, lulu
Cc: Maxime Coquelin
This patch reworks the IOTLB code to extract the
madvise-related bits into a dedicated helper. This
refactoring improves code sharing.
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
---
lib/vhost/iotlb.c | 77 +++++++++++++++++++++++++----------------------
1 file changed, 41 insertions(+), 36 deletions(-)
diff --git a/lib/vhost/iotlb.c b/lib/vhost/iotlb.c
index 870c8acb88..268352bf82 100644
--- a/lib/vhost/iotlb.c
+++ b/lib/vhost/iotlb.c
@@ -23,6 +23,34 @@ struct vhost_iotlb_entry {
#define IOTLB_CACHE_SIZE 2048
+static void
+vhost_user_iotlb_set_dump(struct virtio_net *dev, struct vhost_iotlb_entry *node)
+{
+ uint64_t align;
+
+ align = hua_to_alignment(dev->mem, (void *)(uintptr_t)node->uaddr);
+
+ mem_set_dump((void *)(uintptr_t)node->uaddr, node->size, true, align);
+}
+
+static void
+vhost_user_iotlb_clear_dump(struct virtio_net *dev, struct vhost_iotlb_entry *node,
+ struct vhost_iotlb_entry *prev, struct vhost_iotlb_entry *next)
+{
+ uint64_t align, mask;
+
+ align = hua_to_alignment(dev->mem, (void *)(uintptr_t)node->uaddr);
+ mask = ~(align - 1);
+
+ /* Don't disable coredump if the previous node is in the same page */
+ if (prev == NULL || (node->uaddr & mask) != ((prev->uaddr + prev->size - 1) & mask)) {
+ /* Don't disable coredump if the next node is in the same page */
+ if (next == NULL ||
+ ((node->uaddr + node->size - 1) & mask) != (next->uaddr & mask))
+ mem_set_dump((void *)(uintptr_t)node->uaddr, node->size, false, align);
+ }
+}
+
static struct vhost_iotlb_entry *
vhost_user_iotlb_pool_get(struct vhost_virtqueue *vq)
{
@@ -149,8 +177,8 @@ vhost_user_iotlb_cache_remove_all(struct virtio_net *dev, struct vhost_virtqueue
rte_rwlock_write_lock(&vq->iotlb_lock);
RTE_TAILQ_FOREACH_SAFE(node, &vq->iotlb_list, next, temp_node) {
- mem_set_dump((void *)(uintptr_t)node->uaddr, node->size, false,
- hua_to_alignment(dev->mem, (void *)(uintptr_t)node->uaddr));
+ vhost_user_iotlb_set_dump(dev, node);
+
TAILQ_REMOVE(&vq->iotlb_list, node, next);
vhost_user_iotlb_pool_put(vq, node);
}
@@ -164,7 +192,6 @@ static void
vhost_user_iotlb_cache_random_evict(struct virtio_net *dev, struct vhost_virtqueue *vq)
{
struct vhost_iotlb_entry *node, *temp_node, *prev_node = NULL;
- uint64_t alignment, mask;
int entry_idx;
rte_rwlock_write_lock(&vq->iotlb_lock);
@@ -173,20 +200,10 @@ vhost_user_iotlb_cache_random_evict(struct virtio_net *dev, struct vhost_virtque
RTE_TAILQ_FOREACH_SAFE(node, &vq->iotlb_list, next, temp_node) {
if (!entry_idx) {
- struct vhost_iotlb_entry *next_node;
- alignment = hua_to_alignment(dev->mem, (void *)(uintptr_t)node->uaddr);
- mask = ~(alignment - 1);
-
- /* Don't disable coredump if the previous node is in the same page */
- if (prev_node == NULL || (node->uaddr & mask) !=
- ((prev_node->uaddr + prev_node->size - 1) & mask)) {
- next_node = RTE_TAILQ_NEXT(node, next);
- /* Don't disable coredump if the next node is in the same page */
- if (next_node == NULL || ((node->uaddr + node->size - 1) & mask) !=
- (next_node->uaddr & mask))
- mem_set_dump((void *)(uintptr_t)node->uaddr, node->size,
- false, alignment);
- }
+ struct vhost_iotlb_entry *next_node = RTE_TAILQ_NEXT(node, next);
+
+ vhost_user_iotlb_clear_dump(dev, node, prev_node, next_node);
+
TAILQ_REMOVE(&vq->iotlb_list, node, next);
vhost_user_iotlb_pool_put(vq, node);
vq->iotlb_cache_nr--;
@@ -240,16 +257,16 @@ vhost_user_iotlb_cache_insert(struct virtio_net *dev, struct vhost_virtqueue *vq
vhost_user_iotlb_pool_put(vq, new_node);
goto unlock;
} else if (node->iova > new_node->iova) {
- mem_set_dump((void *)(uintptr_t)new_node->uaddr, new_node->size, true,
- hua_to_alignment(dev->mem, (void *)(uintptr_t)new_node->uaddr));
+ vhost_user_iotlb_set_dump(dev, new_node);
+
TAILQ_INSERT_BEFORE(node, new_node, next);
vq->iotlb_cache_nr++;
goto unlock;
}
}
- mem_set_dump((void *)(uintptr_t)new_node->uaddr, new_node->size, true,
- hua_to_alignment(dev->mem, (void *)(uintptr_t)new_node->uaddr));
+ vhost_user_iotlb_set_dump(dev, new_node);
+
TAILQ_INSERT_TAIL(&vq->iotlb_list, new_node, next);
vq->iotlb_cache_nr++;
@@ -265,7 +282,6 @@ vhost_user_iotlb_cache_remove(struct virtio_net *dev, struct vhost_virtqueue *vq
uint64_t iova, uint64_t size)
{
struct vhost_iotlb_entry *node, *temp_node, *prev_node = NULL;
- uint64_t alignment, mask;
if (unlikely(!size))
return;
@@ -278,20 +294,9 @@ vhost_user_iotlb_cache_remove(struct virtio_net *dev, struct vhost_virtqueue *vq
break;
if (iova < node->iova + node->size) {
- struct vhost_iotlb_entry *next_node;
- alignment = hua_to_alignment(dev->mem, (void *)(uintptr_t)node->uaddr);
- mask = ~(alignment-1);
-
- /* Don't disable coredump if the previous node is in the same page */
- if (prev_node == NULL || (node->uaddr & mask) !=
- ((prev_node->uaddr + prev_node->size - 1) & mask)) {
- next_node = RTE_TAILQ_NEXT(node, next);
- /* Don't disable coredump if the next node is in the same page */
- if (next_node == NULL || ((node->uaddr + node->size - 1) & mask) !=
- (next_node->uaddr & mask))
- mem_set_dump((void *)(uintptr_t)node->uaddr, node->size,
- false, alignment);
- }
+ struct vhost_iotlb_entry *next_node = RTE_TAILQ_NEXT(node, next);
+
+ vhost_user_iotlb_clear_dump(dev, node, prev_node, next_node);
TAILQ_REMOVE(&vq->iotlb_list, node, next);
vhost_user_iotlb_pool_put(vq, node);
--
2.40.1
* Re: [PATCH v3 04/28] vhost: add helper of IOTLB entries coredump
2023-05-25 16:25 ` [PATCH v3 04/28] vhost: add helper of IOTLB entries coredump Maxime Coquelin
@ 2023-05-26 8:46 ` David Marchand
2023-06-01 13:43 ` Maxime Coquelin
0 siblings, 1 reply; 50+ messages in thread
From: David Marchand @ 2023-05-26 8:46 UTC (permalink / raw)
To: Maxime Coquelin
Cc: dev, chenbo.xia, mkp, fbl, jasowang, cunming.liang, xieyongji,
echaudro, eperezma, amorenoz, lulu
On Thu, May 25, 2023 at 6:26 PM Maxime Coquelin
<maxime.coquelin@redhat.com> wrote:
> @@ -149,8 +177,8 @@ vhost_user_iotlb_cache_remove_all(struct virtio_net *dev, struct vhost_virtqueue
> rte_rwlock_write_lock(&vq->iotlb_lock);
>
> RTE_TAILQ_FOREACH_SAFE(node, &vq->iotlb_list, next, temp_node) {
> - mem_set_dump((void *)(uintptr_t)node->uaddr, node->size, false,
> - hua_to_alignment(dev->mem, (void *)(uintptr_t)node->uaddr));
> + vhost_user_iotlb_set_dump(dev, node);
vhost_user_iotlb_clear_dump ?
> +
> TAILQ_REMOVE(&vq->iotlb_list, node, next);
> vhost_user_iotlb_pool_put(vq, node);
> }
--
David Marchand
* Re: [PATCH v3 04/28] vhost: add helper of IOTLB entries coredump
2023-05-26 8:46 ` David Marchand
@ 2023-06-01 13:43 ` Maxime Coquelin
0 siblings, 0 replies; 50+ messages in thread
From: Maxime Coquelin @ 2023-06-01 13:43 UTC (permalink / raw)
To: David Marchand
Cc: dev, chenbo.xia, mkp, fbl, jasowang, cunming.liang, xieyongji,
echaudro, eperezma, amorenoz, lulu
On 5/26/23 10:46, David Marchand wrote:
> On Thu, May 25, 2023 at 6:26 PM Maxime Coquelin
> <maxime.coquelin@redhat.com> wrote:
>> @@ -149,8 +177,8 @@ vhost_user_iotlb_cache_remove_all(struct virtio_net *dev, struct vhost_virtqueue
>> rte_rwlock_write_lock(&vq->iotlb_lock);
>>
>> RTE_TAILQ_FOREACH_SAFE(node, &vq->iotlb_list, next, temp_node) {
>> - mem_set_dump((void *)(uintptr_t)node->uaddr, node->size, false,
>> - hua_to_alignment(dev->mem, (void *)(uintptr_t)node->uaddr));
>> + vhost_user_iotlb_set_dump(dev, node);
>
> vhost_user_iotlb_clear_dump ?
Yes, good catch!
Will be fixed in v4.
Thanks,
Maxime
>
>> +
>> TAILQ_REMOVE(&vq->iotlb_list, node, next);
>> vhost_user_iotlb_pool_put(vq, node);
>> }
>
>
* [PATCH v3 05/28] vhost: add helper for IOTLB entries shared page check
2023-05-25 16:25 [PATCH v3 00/28] Add VDUSE support to Vhost library Maxime Coquelin
` (3 preceding siblings ...)
2023-05-25 16:25 ` [PATCH v3 04/28] vhost: add helper of IOTLB entries coredump Maxime Coquelin
@ 2023-05-25 16:25 ` Maxime Coquelin
2023-05-25 16:25 ` [PATCH v3 06/28] vhost: don't dump unneeded pages with IOTLB Maxime Coquelin
` (23 subsequent siblings)
28 siblings, 0 replies; 50+ messages in thread
From: Maxime Coquelin @ 2023-05-25 16:25 UTC (permalink / raw)
To: dev, chenbo.xia, david.marchand, mkp, fbl, jasowang,
cunming.liang, xieyongji, echaudro, eperezma, amorenoz, lulu
Cc: Maxime Coquelin
This patch introduces a helper to check whether two IOTLB
entries share a page.
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Mike Pattrick <mkp@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
---
lib/vhost/iotlb.c | 25 ++++++++++++++++++++-----
1 file changed, 20 insertions(+), 5 deletions(-)
diff --git a/lib/vhost/iotlb.c b/lib/vhost/iotlb.c
index 268352bf82..59a2b2bbac 100644
--- a/lib/vhost/iotlb.c
+++ b/lib/vhost/iotlb.c
@@ -23,6 +23,23 @@ struct vhost_iotlb_entry {
#define IOTLB_CACHE_SIZE 2048
+static bool
+vhost_user_iotlb_share_page(struct vhost_iotlb_entry *a, struct vhost_iotlb_entry *b,
+ uint64_t align)
+{
+ uint64_t a_end, b_start;
+
+ if (a == NULL || b == NULL)
+ return false;
+
+ /* Assumes entry a lower than entry b */
+ RTE_ASSERT(a->uaddr < b->uaddr);
+ a_end = RTE_ALIGN_CEIL(a->uaddr + a->size, align);
+ b_start = RTE_ALIGN_FLOOR(b->uaddr, align);
+
+ return a_end > b_start;
+}
+
static void
vhost_user_iotlb_set_dump(struct virtio_net *dev, struct vhost_iotlb_entry *node)
{
@@ -37,16 +54,14 @@ static void
vhost_user_iotlb_clear_dump(struct virtio_net *dev, struct vhost_iotlb_entry *node,
struct vhost_iotlb_entry *prev, struct vhost_iotlb_entry *next)
{
- uint64_t align, mask;
+ uint64_t align;
align = hua_to_alignment(dev->mem, (void *)(uintptr_t)node->uaddr);
- mask = ~(align - 1);
/* Don't disable coredump if the previous node is in the same page */
- if (prev == NULL || (node->uaddr & mask) != ((prev->uaddr + prev->size - 1) & mask)) {
+ if (!vhost_user_iotlb_share_page(prev, node, align)) {
/* Don't disable coredump if the next node is in the same page */
- if (next == NULL ||
- ((node->uaddr + node->size - 1) & mask) != (next->uaddr & mask))
+ if (!vhost_user_iotlb_share_page(node, next, align))
mem_set_dump((void *)(uintptr_t)node->uaddr, node->size, false, align);
}
}
--
2.40.1
* [PATCH v3 06/28] vhost: don't dump unneeded pages with IOTLB
2023-05-25 16:25 [PATCH v3 00/28] Add VDUSE support to Vhost library Maxime Coquelin
` (4 preceding siblings ...)
2023-05-25 16:25 ` [PATCH v3 05/28] vhost: add helper for IOTLB entries shared page check Maxime Coquelin
@ 2023-05-25 16:25 ` Maxime Coquelin
2023-05-25 16:25 ` [PATCH v3 07/28] vhost: change to single IOTLB cache per device Maxime Coquelin
` (22 subsequent siblings)
28 siblings, 0 replies; 50+ messages in thread
From: Maxime Coquelin @ 2023-05-25 16:25 UTC (permalink / raw)
To: dev, chenbo.xia, david.marchand, mkp, fbl, jasowang,
cunming.liang, xieyongji, echaudro, eperezma, amorenoz, lulu
Cc: Maxime Coquelin, stable
On IOTLB entry removal, previous fixes took care of not
marking pages shared with other IOTLB entries as DONTDUMP.
However, if an IOTLB entry spans multiple pages, all of its
pages were kept as DODUMP even though they might not be
shared with other entries, needlessly increasing the
coredump size.
This patch addresses this issue by excluding only the
shared pages from madvise's DONTDUMP.
Fixes: dea092d0addb ("vhost: fix madvise arguments alignment")
Cc: stable@dpdk.org
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Mike Pattrick <mkp@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
---
lib/vhost/iotlb.c | 21 ++++++++++++++-------
1 file changed, 14 insertions(+), 7 deletions(-)
diff --git a/lib/vhost/iotlb.c b/lib/vhost/iotlb.c
index 59a2b2bbac..5c5200114e 100644
--- a/lib/vhost/iotlb.c
+++ b/lib/vhost/iotlb.c
@@ -54,16 +54,23 @@ static void
vhost_user_iotlb_clear_dump(struct virtio_net *dev, struct vhost_iotlb_entry *node,
struct vhost_iotlb_entry *prev, struct vhost_iotlb_entry *next)
{
- uint64_t align;
+ uint64_t align, start, end;
+
+ start = node->uaddr;
+ end = node->uaddr + node->size;
align = hua_to_alignment(dev->mem, (void *)(uintptr_t)node->uaddr);
- /* Don't disable coredump if the previous node is in the same page */
- if (!vhost_user_iotlb_share_page(prev, node, align)) {
- /* Don't disable coredump if the next node is in the same page */
- if (!vhost_user_iotlb_share_page(node, next, align))
- mem_set_dump((void *)(uintptr_t)node->uaddr, node->size, false, align);
- }
+ /* Skip first page if shared with previous entry. */
+ if (vhost_user_iotlb_share_page(prev, node, align))
+ start = RTE_ALIGN_CEIL(start, align);
+
+ /* Skip last page if shared with next entry. */
+ if (vhost_user_iotlb_share_page(node, next, align))
+ end = RTE_ALIGN_FLOOR(end, align);
+
+ if (end > start)
+ mem_set_dump((void *)(uintptr_t)start, end - start, false, align);
}
static struct vhost_iotlb_entry *
--
2.40.1
* [PATCH v3 07/28] vhost: change to single IOTLB cache per device
2023-05-25 16:25 [PATCH v3 00/28] Add VDUSE support to Vhost library Maxime Coquelin
` (5 preceding siblings ...)
2023-05-25 16:25 ` [PATCH v3 06/28] vhost: don't dump unneeded pages with IOTLB Maxime Coquelin
@ 2023-05-25 16:25 ` Maxime Coquelin
2023-05-29 6:32 ` Xia, Chenbo
2023-05-25 16:25 ` [PATCH v3 08/28] vhost: add offset field to IOTLB entries Maxime Coquelin
` (21 subsequent siblings)
28 siblings, 1 reply; 50+ messages in thread
From: Maxime Coquelin @ 2023-05-25 16:25 UTC (permalink / raw)
To: dev, chenbo.xia, david.marchand, mkp, fbl, jasowang,
cunming.liang, xieyongji, echaudro, eperezma, amorenoz, lulu
Cc: Maxime Coquelin
This patch simplifies the IOTLB implementation and reduces
IOTLB memory consumption by using a single IOTLB cache
per device, instead of one per virtqueue.
To avoid impacting performance, it keeps an IOTLB lock per
virtqueue, so that there is no contention between multiple
queues trying to acquire it.
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
---
lib/vhost/iotlb.c | 212 +++++++++++++++++++----------------------
lib/vhost/iotlb.h | 43 ++++++---
lib/vhost/vhost.c | 18 ++--
lib/vhost/vhost.h | 16 ++--
lib/vhost/vhost_user.c | 23 +++--
5 files changed, 159 insertions(+), 153 deletions(-)
diff --git a/lib/vhost/iotlb.c b/lib/vhost/iotlb.c
index 5c5200114e..f28483cb7a 100644
--- a/lib/vhost/iotlb.c
+++ b/lib/vhost/iotlb.c
@@ -74,86 +74,81 @@ vhost_user_iotlb_clear_dump(struct virtio_net *dev, struct vhost_iotlb_entry *no
}
static struct vhost_iotlb_entry *
-vhost_user_iotlb_pool_get(struct vhost_virtqueue *vq)
+vhost_user_iotlb_pool_get(struct virtio_net *dev)
{
struct vhost_iotlb_entry *node;
- rte_spinlock_lock(&vq->iotlb_free_lock);
- node = SLIST_FIRST(&vq->iotlb_free_list);
+ rte_spinlock_lock(&dev->iotlb_free_lock);
+ node = SLIST_FIRST(&dev->iotlb_free_list);
if (node != NULL)
- SLIST_REMOVE_HEAD(&vq->iotlb_free_list, next_free);
- rte_spinlock_unlock(&vq->iotlb_free_lock);
+ SLIST_REMOVE_HEAD(&dev->iotlb_free_list, next_free);
+ rte_spinlock_unlock(&dev->iotlb_free_lock);
return node;
}
static void
-vhost_user_iotlb_pool_put(struct vhost_virtqueue *vq,
- struct vhost_iotlb_entry *node)
+vhost_user_iotlb_pool_put(struct virtio_net *dev, struct vhost_iotlb_entry *node)
{
- rte_spinlock_lock(&vq->iotlb_free_lock);
- SLIST_INSERT_HEAD(&vq->iotlb_free_list, node, next_free);
- rte_spinlock_unlock(&vq->iotlb_free_lock);
+ rte_spinlock_lock(&dev->iotlb_free_lock);
+ SLIST_INSERT_HEAD(&dev->iotlb_free_list, node, next_free);
+ rte_spinlock_unlock(&dev->iotlb_free_lock);
}
static void
-vhost_user_iotlb_cache_random_evict(struct virtio_net *dev, struct vhost_virtqueue *vq);
+vhost_user_iotlb_cache_random_evict(struct virtio_net *dev);
static void
-vhost_user_iotlb_pending_remove_all(struct vhost_virtqueue *vq)
+vhost_user_iotlb_pending_remove_all(struct virtio_net *dev)
{
struct vhost_iotlb_entry *node, *temp_node;
- rte_rwlock_write_lock(&vq->iotlb_pending_lock);
+ rte_rwlock_write_lock(&dev->iotlb_pending_lock);
- RTE_TAILQ_FOREACH_SAFE(node, &vq->iotlb_pending_list, next, temp_node) {
- TAILQ_REMOVE(&vq->iotlb_pending_list, node, next);
- vhost_user_iotlb_pool_put(vq, node);
+ RTE_TAILQ_FOREACH_SAFE(node, &dev->iotlb_pending_list, next, temp_node) {
+ TAILQ_REMOVE(&dev->iotlb_pending_list, node, next);
+ vhost_user_iotlb_pool_put(dev, node);
}
- rte_rwlock_write_unlock(&vq->iotlb_pending_lock);
+ rte_rwlock_write_unlock(&dev->iotlb_pending_lock);
}
bool
-vhost_user_iotlb_pending_miss(struct vhost_virtqueue *vq, uint64_t iova,
- uint8_t perm)
+vhost_user_iotlb_pending_miss(struct virtio_net *dev, uint64_t iova, uint8_t perm)
{
struct vhost_iotlb_entry *node;
bool found = false;
- rte_rwlock_read_lock(&vq->iotlb_pending_lock);
+ rte_rwlock_read_lock(&dev->iotlb_pending_lock);
- TAILQ_FOREACH(node, &vq->iotlb_pending_list, next) {
+ TAILQ_FOREACH(node, &dev->iotlb_pending_list, next) {
if ((node->iova == iova) && (node->perm == perm)) {
found = true;
break;
}
}
- rte_rwlock_read_unlock(&vq->iotlb_pending_lock);
+ rte_rwlock_read_unlock(&dev->iotlb_pending_lock);
return found;
}
void
-vhost_user_iotlb_pending_insert(struct virtio_net *dev, struct vhost_virtqueue *vq,
- uint64_t iova, uint8_t perm)
+vhost_user_iotlb_pending_insert(struct virtio_net *dev, uint64_t iova, uint8_t perm)
{
struct vhost_iotlb_entry *node;
- node = vhost_user_iotlb_pool_get(vq);
+ node = vhost_user_iotlb_pool_get(dev);
if (node == NULL) {
VHOST_LOG_CONFIG(dev->ifname, DEBUG,
- "IOTLB pool for vq %"PRIu32" empty, clear entries for pending insertion\n",
- vq->index);
- if (!TAILQ_EMPTY(&vq->iotlb_pending_list))
- vhost_user_iotlb_pending_remove_all(vq);
+ "IOTLB pool empty, clear entries for pending insertion\n");
+ if (!TAILQ_EMPTY(&dev->iotlb_pending_list))
+ vhost_user_iotlb_pending_remove_all(dev);
else
- vhost_user_iotlb_cache_random_evict(dev, vq);
- node = vhost_user_iotlb_pool_get(vq);
+ vhost_user_iotlb_cache_random_evict(dev);
+ node = vhost_user_iotlb_pool_get(dev);
if (node == NULL) {
VHOST_LOG_CONFIG(dev->ifname, ERR,
- "IOTLB pool vq %"PRIu32" still empty, pending insertion failure\n",
- vq->index);
+ "IOTLB pool still empty, pending insertion failure\n");
return;
}
}
@@ -161,22 +156,21 @@ vhost_user_iotlb_pending_insert(struct virtio_net *dev, struct vhost_virtqueue *
node->iova = iova;
node->perm = perm;
- rte_rwlock_write_lock(&vq->iotlb_pending_lock);
+ rte_rwlock_write_lock(&dev->iotlb_pending_lock);
- TAILQ_INSERT_TAIL(&vq->iotlb_pending_list, node, next);
+ TAILQ_INSERT_TAIL(&dev->iotlb_pending_list, node, next);
- rte_rwlock_write_unlock(&vq->iotlb_pending_lock);
+ rte_rwlock_write_unlock(&dev->iotlb_pending_lock);
}
void
-vhost_user_iotlb_pending_remove(struct vhost_virtqueue *vq,
- uint64_t iova, uint64_t size, uint8_t perm)
+vhost_user_iotlb_pending_remove(struct virtio_net *dev, uint64_t iova, uint64_t size, uint8_t perm)
{
struct vhost_iotlb_entry *node, *temp_node;
- rte_rwlock_write_lock(&vq->iotlb_pending_lock);
+ rte_rwlock_write_lock(&dev->iotlb_pending_lock);
- RTE_TAILQ_FOREACH_SAFE(node, &vq->iotlb_pending_list, next,
+ RTE_TAILQ_FOREACH_SAFE(node, &dev->iotlb_pending_list, next,
temp_node) {
if (node->iova < iova)
continue;
@@ -184,81 +178,78 @@ vhost_user_iotlb_pending_remove(struct vhost_virtqueue *vq,
continue;
if ((node->perm & perm) != node->perm)
continue;
- TAILQ_REMOVE(&vq->iotlb_pending_list, node, next);
- vhost_user_iotlb_pool_put(vq, node);
+ TAILQ_REMOVE(&dev->iotlb_pending_list, node, next);
+ vhost_user_iotlb_pool_put(dev, node);
}
- rte_rwlock_write_unlock(&vq->iotlb_pending_lock);
+ rte_rwlock_write_unlock(&dev->iotlb_pending_lock);
}
static void
-vhost_user_iotlb_cache_remove_all(struct virtio_net *dev, struct vhost_virtqueue *vq)
+vhost_user_iotlb_cache_remove_all(struct virtio_net *dev)
{
struct vhost_iotlb_entry *node, *temp_node;
- rte_rwlock_write_lock(&vq->iotlb_lock);
+ vhost_user_iotlb_wr_lock_all(dev);
- RTE_TAILQ_FOREACH_SAFE(node, &vq->iotlb_list, next, temp_node) {
+ RTE_TAILQ_FOREACH_SAFE(node, &dev->iotlb_list, next, temp_node) {
vhost_user_iotlb_set_dump(dev, node);
- TAILQ_REMOVE(&vq->iotlb_list, node, next);
- vhost_user_iotlb_pool_put(vq, node);
+ TAILQ_REMOVE(&dev->iotlb_list, node, next);
+ vhost_user_iotlb_pool_put(dev, node);
}
- vq->iotlb_cache_nr = 0;
+ dev->iotlb_cache_nr = 0;
- rte_rwlock_write_unlock(&vq->iotlb_lock);
+ vhost_user_iotlb_wr_unlock_all(dev);
}
static void
-vhost_user_iotlb_cache_random_evict(struct virtio_net *dev, struct vhost_virtqueue *vq)
+vhost_user_iotlb_cache_random_evict(struct virtio_net *dev)
{
struct vhost_iotlb_entry *node, *temp_node, *prev_node = NULL;
int entry_idx;
- rte_rwlock_write_lock(&vq->iotlb_lock);
+ vhost_user_iotlb_wr_lock_all(dev);
- entry_idx = rte_rand() % vq->iotlb_cache_nr;
+ entry_idx = rte_rand() % dev->iotlb_cache_nr;
- RTE_TAILQ_FOREACH_SAFE(node, &vq->iotlb_list, next, temp_node) {
+ RTE_TAILQ_FOREACH_SAFE(node, &dev->iotlb_list, next, temp_node) {
if (!entry_idx) {
struct vhost_iotlb_entry *next_node = RTE_TAILQ_NEXT(node, next);
vhost_user_iotlb_clear_dump(dev, node, prev_node, next_node);
- TAILQ_REMOVE(&vq->iotlb_list, node, next);
- vhost_user_iotlb_pool_put(vq, node);
- vq->iotlb_cache_nr--;
+ TAILQ_REMOVE(&dev->iotlb_list, node, next);
+ vhost_user_iotlb_pool_put(dev, node);
+ dev->iotlb_cache_nr--;
break;
}
prev_node = node;
entry_idx--;
}
- rte_rwlock_write_unlock(&vq->iotlb_lock);
+ vhost_user_iotlb_wr_unlock_all(dev);
}
void
-vhost_user_iotlb_cache_insert(struct virtio_net *dev, struct vhost_virtqueue *vq,
- uint64_t iova, uint64_t uaddr,
+vhost_user_iotlb_cache_insert(struct virtio_net *dev, uint64_t iova, uint64_t uaddr,
uint64_t size, uint8_t perm)
{
struct vhost_iotlb_entry *node, *new_node;
- new_node = vhost_user_iotlb_pool_get(vq);
+ new_node = vhost_user_iotlb_pool_get(dev);
if (new_node == NULL) {
VHOST_LOG_CONFIG(dev->ifname, DEBUG,
- "IOTLB pool vq %"PRIu32" empty, clear entries for cache insertion\n",
- vq->index);
- if (!TAILQ_EMPTY(&vq->iotlb_list))
- vhost_user_iotlb_cache_random_evict(dev, vq);
+ "IOTLB pool empty, clear entries for cache insertion\n");
+ if (!TAILQ_EMPTY(&dev->iotlb_list))
+ vhost_user_iotlb_cache_random_evict(dev);
else
- vhost_user_iotlb_pending_remove_all(vq);
- new_node = vhost_user_iotlb_pool_get(vq);
+ vhost_user_iotlb_pending_remove_all(dev);
+ new_node = vhost_user_iotlb_pool_get(dev);
if (new_node == NULL) {
VHOST_LOG_CONFIG(dev->ifname, ERR,
- "IOTLB pool vq %"PRIu32" still empty, cache insertion failed\n",
- vq->index);
+ "IOTLB pool still empty, cache insertion failed\n");
return;
}
}
@@ -268,49 +259,47 @@ vhost_user_iotlb_cache_insert(struct virtio_net *dev, struct vhost_virtqueue *vq
new_node->size = size;
new_node->perm = perm;
- rte_rwlock_write_lock(&vq->iotlb_lock);
+ vhost_user_iotlb_wr_lock_all(dev);
- TAILQ_FOREACH(node, &vq->iotlb_list, next) {
+ TAILQ_FOREACH(node, &dev->iotlb_list, next) {
/*
* Entries must be invalidated before being updated.
* So if iova already in list, assume identical.
*/
if (node->iova == new_node->iova) {
- vhost_user_iotlb_pool_put(vq, new_node);
+ vhost_user_iotlb_pool_put(dev, new_node);
goto unlock;
} else if (node->iova > new_node->iova) {
vhost_user_iotlb_set_dump(dev, new_node);
TAILQ_INSERT_BEFORE(node, new_node, next);
- vq->iotlb_cache_nr++;
+ dev->iotlb_cache_nr++;
goto unlock;
}
}
vhost_user_iotlb_set_dump(dev, new_node);
- TAILQ_INSERT_TAIL(&vq->iotlb_list, new_node, next);
- vq->iotlb_cache_nr++;
+ TAILQ_INSERT_TAIL(&dev->iotlb_list, new_node, next);
+ dev->iotlb_cache_nr++;
unlock:
- vhost_user_iotlb_pending_remove(vq, iova, size, perm);
-
- rte_rwlock_write_unlock(&vq->iotlb_lock);
+ vhost_user_iotlb_pending_remove(dev, iova, size, perm);
+ vhost_user_iotlb_wr_unlock_all(dev);
}
void
-vhost_user_iotlb_cache_remove(struct virtio_net *dev, struct vhost_virtqueue *vq,
- uint64_t iova, uint64_t size)
+vhost_user_iotlb_cache_remove(struct virtio_net *dev, uint64_t iova, uint64_t size)
{
struct vhost_iotlb_entry *node, *temp_node, *prev_node = NULL;
if (unlikely(!size))
return;
- rte_rwlock_write_lock(&vq->iotlb_lock);
+ vhost_user_iotlb_wr_lock_all(dev);
- RTE_TAILQ_FOREACH_SAFE(node, &vq->iotlb_list, next, temp_node) {
+ RTE_TAILQ_FOREACH_SAFE(node, &dev->iotlb_list, next, temp_node) {
/* Sorted list */
if (unlikely(iova + size < node->iova))
break;
@@ -320,19 +309,19 @@ vhost_user_iotlb_cache_remove(struct virtio_net *dev, struct vhost_virtqueue *vq
vhost_user_iotlb_clear_dump(dev, node, prev_node, next_node);
- TAILQ_REMOVE(&vq->iotlb_list, node, next);
- vhost_user_iotlb_pool_put(vq, node);
- vq->iotlb_cache_nr--;
- } else
+ TAILQ_REMOVE(&dev->iotlb_list, node, next);
+ vhost_user_iotlb_pool_put(dev, node);
+ dev->iotlb_cache_nr--;
+ } else {
prev_node = node;
+ }
}
- rte_rwlock_write_unlock(&vq->iotlb_lock);
+ vhost_user_iotlb_wr_unlock_all(dev);
}
uint64_t
-vhost_user_iotlb_cache_find(struct vhost_virtqueue *vq, uint64_t iova,
- uint64_t *size, uint8_t perm)
+vhost_user_iotlb_cache_find(struct virtio_net *dev, uint64_t iova, uint64_t *size, uint8_t perm)
{
struct vhost_iotlb_entry *node;
uint64_t offset, vva = 0, mapped = 0;
@@ -340,7 +329,7 @@ vhost_user_iotlb_cache_find(struct vhost_virtqueue *vq, uint64_t iova,
if (unlikely(!*size))
goto out;
- TAILQ_FOREACH(node, &vq->iotlb_list, next) {
+ TAILQ_FOREACH(node, &dev->iotlb_list, next) {
/* List sorted by iova */
if (unlikely(iova < node->iova))
break;
@@ -373,60 +362,57 @@ vhost_user_iotlb_cache_find(struct vhost_virtqueue *vq, uint64_t iova,
}
void
-vhost_user_iotlb_flush_all(struct virtio_net *dev, struct vhost_virtqueue *vq)
+vhost_user_iotlb_flush_all(struct virtio_net *dev)
{
- vhost_user_iotlb_cache_remove_all(dev, vq);
- vhost_user_iotlb_pending_remove_all(vq);
+ vhost_user_iotlb_cache_remove_all(dev);
+ vhost_user_iotlb_pending_remove_all(dev);
}
int
-vhost_user_iotlb_init(struct virtio_net *dev, struct vhost_virtqueue *vq)
+vhost_user_iotlb_init(struct virtio_net *dev)
{
unsigned int i;
int socket = 0;
- if (vq->iotlb_pool) {
+ if (dev->iotlb_pool) {
/*
* The cache has already been initialized,
* just drop all cached and pending entries.
*/
- vhost_user_iotlb_flush_all(dev, vq);
- rte_free(vq->iotlb_pool);
+ vhost_user_iotlb_flush_all(dev);
+ rte_free(dev->iotlb_pool);
}
#ifdef RTE_LIBRTE_VHOST_NUMA
- if (get_mempolicy(&socket, NULL, 0, vq, MPOL_F_NODE | MPOL_F_ADDR) != 0)
+ if (get_mempolicy(&socket, NULL, 0, dev, MPOL_F_NODE | MPOL_F_ADDR) != 0)
socket = 0;
#endif
- rte_spinlock_init(&vq->iotlb_free_lock);
- rte_rwlock_init(&vq->iotlb_lock);
- rte_rwlock_init(&vq->iotlb_pending_lock);
+ rte_spinlock_init(&dev->iotlb_free_lock);
+ rte_rwlock_init(&dev->iotlb_pending_lock);
- SLIST_INIT(&vq->iotlb_free_list);
- TAILQ_INIT(&vq->iotlb_list);
- TAILQ_INIT(&vq->iotlb_pending_list);
+ SLIST_INIT(&dev->iotlb_free_list);
+ TAILQ_INIT(&dev->iotlb_list);
+ TAILQ_INIT(&dev->iotlb_pending_list);
if (dev->flags & VIRTIO_DEV_SUPPORT_IOMMU) {
- vq->iotlb_pool = rte_calloc_socket("iotlb", IOTLB_CACHE_SIZE,
+ dev->iotlb_pool = rte_calloc_socket("iotlb", IOTLB_CACHE_SIZE,
sizeof(struct vhost_iotlb_entry), 0, socket);
- if (!vq->iotlb_pool) {
- VHOST_LOG_CONFIG(dev->ifname, ERR,
- "Failed to create IOTLB cache pool for vq %"PRIu32"\n",
- vq->index);
+ if (!dev->iotlb_pool) {
+ VHOST_LOG_CONFIG(dev->ifname, ERR, "Failed to create IOTLB cache pool\n");
return -1;
}
for (i = 0; i < IOTLB_CACHE_SIZE; i++)
- vhost_user_iotlb_pool_put(vq, &vq->iotlb_pool[i]);
+ vhost_user_iotlb_pool_put(dev, &dev->iotlb_pool[i]);
}
- vq->iotlb_cache_nr = 0;
+ dev->iotlb_cache_nr = 0;
return 0;
}
void
-vhost_user_iotlb_destroy(struct vhost_virtqueue *vq)
+vhost_user_iotlb_destroy(struct virtio_net *dev)
{
- rte_free(vq->iotlb_pool);
+ rte_free(dev->iotlb_pool);
}
diff --git a/lib/vhost/iotlb.h b/lib/vhost/iotlb.h
index 73b5465b41..3490b9e6be 100644
--- a/lib/vhost/iotlb.h
+++ b/lib/vhost/iotlb.h
@@ -37,20 +37,37 @@ vhost_user_iotlb_wr_unlock(struct vhost_virtqueue *vq)
rte_rwlock_write_unlock(&vq->iotlb_lock);
}
-void vhost_user_iotlb_cache_insert(struct virtio_net *dev, struct vhost_virtqueue *vq,
- uint64_t iova, uint64_t uaddr,
+static __rte_always_inline void
+vhost_user_iotlb_wr_lock_all(struct virtio_net *dev)
+ __rte_no_thread_safety_analysis
+{
+ uint32_t i;
+
+ for (i = 0; i < dev->nr_vring; i++)
+ rte_rwlock_write_lock(&dev->virtqueue[i]->iotlb_lock);
+}
+
+static __rte_always_inline void
+vhost_user_iotlb_wr_unlock_all(struct virtio_net *dev)
+ __rte_no_thread_safety_analysis
+{
+ uint32_t i;
+
+ for (i = 0; i < dev->nr_vring; i++)
+ rte_rwlock_write_unlock(&dev->virtqueue[i]->iotlb_lock);
+}
+
+void vhost_user_iotlb_cache_insert(struct virtio_net *dev, uint64_t iova, uint64_t uaddr,
uint64_t size, uint8_t perm);
-void vhost_user_iotlb_cache_remove(struct virtio_net *dev, struct vhost_virtqueue *vq,
- uint64_t iova, uint64_t size);
-uint64_t vhost_user_iotlb_cache_find(struct vhost_virtqueue *vq, uint64_t iova,
+void vhost_user_iotlb_cache_remove(struct virtio_net *dev, uint64_t iova, uint64_t size);
+uint64_t vhost_user_iotlb_cache_find(struct virtio_net *dev, uint64_t iova,
uint64_t *size, uint8_t perm);
-bool vhost_user_iotlb_pending_miss(struct vhost_virtqueue *vq, uint64_t iova,
- uint8_t perm);
-void vhost_user_iotlb_pending_insert(struct virtio_net *dev, struct vhost_virtqueue *vq,
- uint64_t iova, uint8_t perm);
-void vhost_user_iotlb_pending_remove(struct vhost_virtqueue *vq, uint64_t iova,
+bool vhost_user_iotlb_pending_miss(struct virtio_net *dev, uint64_t iova, uint8_t perm);
+void vhost_user_iotlb_pending_insert(struct virtio_net *dev, uint64_t iova, uint8_t perm);
+void vhost_user_iotlb_pending_remove(struct virtio_net *dev, uint64_t iova,
uint64_t size, uint8_t perm);
-void vhost_user_iotlb_flush_all(struct virtio_net *dev, struct vhost_virtqueue *vq);
-int vhost_user_iotlb_init(struct virtio_net *dev, struct vhost_virtqueue *vq);
-void vhost_user_iotlb_destroy(struct vhost_virtqueue *vq);
+void vhost_user_iotlb_flush_all(struct virtio_net *dev);
+int vhost_user_iotlb_init(struct virtio_net *dev);
+void vhost_user_iotlb_destroy(struct virtio_net *dev);
+
#endif /* _VHOST_IOTLB_H_ */
diff --git a/lib/vhost/vhost.c b/lib/vhost/vhost.c
index ef37943817..d35075b96c 100644
--- a/lib/vhost/vhost.c
+++ b/lib/vhost/vhost.c
@@ -63,7 +63,7 @@ __vhost_iova_to_vva(struct virtio_net *dev, struct vhost_virtqueue *vq,
tmp_size = *size;
- vva = vhost_user_iotlb_cache_find(vq, iova, &tmp_size, perm);
+ vva = vhost_user_iotlb_cache_find(dev, iova, &tmp_size, perm);
if (tmp_size == *size) {
if (dev->flags & VIRTIO_DEV_STATS_ENABLED)
vq->stats.iotlb_hits++;
@@ -75,7 +75,7 @@ __vhost_iova_to_vva(struct virtio_net *dev, struct vhost_virtqueue *vq,
iova += tmp_size;
- if (!vhost_user_iotlb_pending_miss(vq, iova, perm)) {
+ if (!vhost_user_iotlb_pending_miss(dev, iova, perm)) {
/*
* iotlb_lock is read-locked for a full burst,
* but it only protects the iotlb cache.
@@ -85,12 +85,12 @@ __vhost_iova_to_vva(struct virtio_net *dev, struct vhost_virtqueue *vq,
*/
vhost_user_iotlb_rd_unlock(vq);
- vhost_user_iotlb_pending_insert(dev, vq, iova, perm);
+ vhost_user_iotlb_pending_insert(dev, iova, perm);
if (vhost_user_iotlb_miss(dev, iova, perm)) {
VHOST_LOG_DATA(dev->ifname, ERR,
"IOTLB miss req failed for IOVA 0x%" PRIx64 "\n",
iova);
- vhost_user_iotlb_pending_remove(vq, iova, 1, perm);
+ vhost_user_iotlb_pending_remove(dev, iova, 1, perm);
}
vhost_user_iotlb_rd_lock(vq);
@@ -397,7 +397,6 @@ free_vq(struct virtio_net *dev, struct vhost_virtqueue *vq)
vhost_free_async_mem(vq);
rte_spinlock_unlock(&vq->access_lock);
rte_free(vq->batch_copy_elems);
- vhost_user_iotlb_destroy(vq);
rte_free(vq->log_cache);
rte_free(vq);
}
@@ -575,7 +574,7 @@ vring_invalidate(struct virtio_net *dev __rte_unused, struct vhost_virtqueue *vq
}
static void
-init_vring_queue(struct virtio_net *dev, struct vhost_virtqueue *vq,
+init_vring_queue(struct virtio_net *dev __rte_unused, struct vhost_virtqueue *vq,
uint32_t vring_idx)
{
int numa_node = SOCKET_ID_ANY;
@@ -595,8 +594,6 @@ init_vring_queue(struct virtio_net *dev, struct vhost_virtqueue *vq,
}
#endif
vq->numa_node = numa_node;
-
- vhost_user_iotlb_init(dev, vq);
}
static void
@@ -631,6 +628,7 @@ alloc_vring_queue(struct virtio_net *dev, uint32_t vring_idx)
dev->virtqueue[i] = vq;
init_vring_queue(dev, vq, i);
rte_spinlock_init(&vq->access_lock);
+ rte_rwlock_init(&vq->iotlb_lock);
vq->avail_wrap_counter = 1;
vq->used_wrap_counter = 1;
vq->signalled_used_valid = false;
@@ -795,6 +793,10 @@ vhost_setup_virtio_net(int vid, bool enable, bool compliant_ol_flags, bool stats
dev->flags |= VIRTIO_DEV_SUPPORT_IOMMU;
else
dev->flags &= ~VIRTIO_DEV_SUPPORT_IOMMU;
+
+ if (vhost_user_iotlb_init(dev) < 0)
+ VHOST_LOG_CONFIG("device", ERR, "failed to init IOTLB\n");
+
}
void
diff --git a/lib/vhost/vhost.h b/lib/vhost/vhost.h
index 40863f7bfd..67cc4a2fdb 100644
--- a/lib/vhost/vhost.h
+++ b/lib/vhost/vhost.h
@@ -302,13 +302,6 @@ struct vhost_virtqueue {
struct log_cache_entry *log_cache;
rte_rwlock_t iotlb_lock;
- rte_rwlock_t iotlb_pending_lock;
- struct vhost_iotlb_entry *iotlb_pool;
- TAILQ_HEAD(, vhost_iotlb_entry) iotlb_list;
- TAILQ_HEAD(, vhost_iotlb_entry) iotlb_pending_list;
- int iotlb_cache_nr;
- rte_spinlock_t iotlb_free_lock;
- SLIST_HEAD(, vhost_iotlb_entry) iotlb_free_list;
/* Used to notify the guest (trigger interrupt) */
int callfd;
@@ -483,6 +476,15 @@ struct virtio_net {
int extbuf;
int linearbuf;
struct vhost_virtqueue *virtqueue[VHOST_MAX_QUEUE_PAIRS * 2];
+
+ rte_rwlock_t iotlb_pending_lock;
+ struct vhost_iotlb_entry *iotlb_pool;
+ TAILQ_HEAD(, vhost_iotlb_entry) iotlb_list;
+ TAILQ_HEAD(, vhost_iotlb_entry) iotlb_pending_list;
+ int iotlb_cache_nr;
+ rte_spinlock_t iotlb_free_lock;
+ SLIST_HEAD(, vhost_iotlb_entry) iotlb_free_list;
+
struct inflight_mem_info *inflight_info;
#define IF_NAME_SZ (PATH_MAX > IFNAMSIZ ? PATH_MAX : IFNAMSIZ)
char ifname[IF_NAME_SZ];
diff --git a/lib/vhost/vhost_user.c b/lib/vhost/vhost_user.c
index d60e39b6bc..8d0f84348b 100644
--- a/lib/vhost/vhost_user.c
+++ b/lib/vhost/vhost_user.c
@@ -237,6 +237,8 @@ vhost_backend_cleanup(struct virtio_net *dev)
}
dev->postcopy_listening = 0;
+
+ vhost_user_iotlb_destroy(dev);
}
static void
@@ -539,7 +541,6 @@ numa_realloc(struct virtio_net **pdev, struct vhost_virtqueue **pvq)
if (vq != dev->virtqueue[vq->index]) {
VHOST_LOG_CONFIG(dev->ifname, INFO, "reallocated virtqueue on node %d\n", node);
dev->virtqueue[vq->index] = vq;
- vhost_user_iotlb_init(dev, vq);
}
if (vq_is_packed(dev)) {
@@ -664,6 +665,8 @@ numa_realloc(struct virtio_net **pdev, struct vhost_virtqueue **pvq)
return;
}
dev->guest_pages = gp;
+
+ vhost_user_iotlb_init(dev);
}
#else
static void
@@ -1360,8 +1363,7 @@ vhost_user_set_mem_table(struct virtio_net **pdev,
/* Flush IOTLB cache as previous HVAs are now invalid */
if (dev->features & (1ULL << VIRTIO_F_IOMMU_PLATFORM))
- for (i = 0; i < dev->nr_vring; i++)
- vhost_user_iotlb_flush_all(dev, dev->virtqueue[i]);
+ vhost_user_iotlb_flush_all(dev);
free_mem_region(dev);
rte_free(dev->mem);
@@ -2194,7 +2196,7 @@ vhost_user_get_vring_base(struct virtio_net **pdev,
ctx->msg.size = sizeof(ctx->msg.payload.state);
ctx->fd_num = 0;
- vhost_user_iotlb_flush_all(dev, vq);
+ vhost_user_iotlb_flush_all(dev);
vring_invalidate(dev, vq);
@@ -2639,15 +2641,14 @@ vhost_user_iotlb_msg(struct virtio_net **pdev,
if (!vva)
return RTE_VHOST_MSG_RESULT_ERR;
+ vhost_user_iotlb_cache_insert(dev, imsg->iova, vva, len, imsg->perm);
+
for (i = 0; i < dev->nr_vring; i++) {
struct vhost_virtqueue *vq = dev->virtqueue[i];
if (!vq)
continue;
- vhost_user_iotlb_cache_insert(dev, vq, imsg->iova, vva,
- len, imsg->perm);
-
if (is_vring_iotlb(dev, vq, imsg)) {
rte_spinlock_lock(&vq->access_lock);
translate_ring_addresses(&dev, &vq);
@@ -2657,15 +2658,14 @@ vhost_user_iotlb_msg(struct virtio_net **pdev,
}
break;
case VHOST_IOTLB_INVALIDATE:
+ vhost_user_iotlb_cache_remove(dev, imsg->iova, imsg->size);
+
for (i = 0; i < dev->nr_vring; i++) {
struct vhost_virtqueue *vq = dev->virtqueue[i];
if (!vq)
continue;
- vhost_user_iotlb_cache_remove(dev, vq, imsg->iova,
- imsg->size);
-
if (is_vring_iotlb(dev, vq, imsg)) {
rte_spinlock_lock(&vq->access_lock);
vring_invalidate(dev, vq);
@@ -2674,8 +2674,7 @@ vhost_user_iotlb_msg(struct virtio_net **pdev,
}
break;
default:
- VHOST_LOG_CONFIG(dev->ifname, ERR,
- "invalid IOTLB message type (%d)\n",
+ VHOST_LOG_CONFIG(dev->ifname, ERR, "invalid IOTLB message type (%d)\n",
imsg->type);
return RTE_VHOST_MSG_RESULT_ERR;
}
--
2.40.1
* RE: [PATCH v3 07/28] vhost: change to single IOTLB cache per device
2023-05-25 16:25 ` [PATCH v3 07/28] vhost: change to single IOTLB cache per device Maxime Coquelin
@ 2023-05-29 6:32 ` Xia, Chenbo
0 siblings, 0 replies; 50+ messages in thread
From: Xia, Chenbo @ 2023-05-29 6:32 UTC (permalink / raw)
To: Maxime Coquelin, dev, david.marchand, mkp, fbl, jasowang, Liang,
Cunming, Xie, Yongji, echaudro, eperezma, amorenoz, lulu
> -----Original Message-----
> From: Maxime Coquelin <maxime.coquelin@redhat.com>
> Sent: Friday, May 26, 2023 12:26 AM
> To: dev@dpdk.org; Xia, Chenbo <chenbo.xia@intel.com>;
> david.marchand@redhat.com; mkp@redhat.com; fbl@redhat.com;
> jasowang@redhat.com; Liang, Cunming <cunming.liang@intel.com>; Xie, Yongji
> <xieyongji@bytedance.com>; echaudro@redhat.com; eperezma@redhat.com;
> amorenoz@redhat.com; lulu@redhat.com
> Cc: Maxime Coquelin <maxime.coquelin@redhat.com>
> Subject: [PATCH v3 07/28] vhost: change to single IOTLB cache per device
>
> This patch simplifies IOTLB implementation and improves
> IOTLB memory consumption by having a single IOTLB cache
> per device, instead of having one per queue.
>
> In order to not impact performance, it keeps an IOTLB lock
> per virtqueue, so that there is no contention between
> multiple queues trying to acquire it.
>
> Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
> ---
> lib/vhost/iotlb.c | 212 +++++++++++++++++++----------------------
> lib/vhost/iotlb.h | 43 ++++++---
> lib/vhost/vhost.c | 18 ++--
> lib/vhost/vhost.h | 16 ++--
> lib/vhost/vhost_user.c | 23 +++--
> 5 files changed, 159 insertions(+), 153 deletions(-)
>
> --
> 2.40.1
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
* [PATCH v3 08/28] vhost: add offset field to IOTLB entries
2023-05-25 16:25 [PATCH v3 00/28] Add VDUSE support to Vhost library Maxime Coquelin
` (6 preceding siblings ...)
2023-05-25 16:25 ` [PATCH v3 07/28] vhost: change to single IOTLB cache per device Maxime Coquelin
@ 2023-05-25 16:25 ` Maxime Coquelin
2023-05-25 16:25 ` [PATCH v3 09/28] vhost: add page size info to IOTLB entry Maxime Coquelin
` (20 subsequent siblings)
28 siblings, 0 replies; 50+ messages in thread
From: Maxime Coquelin @ 2023-05-25 16:25 UTC (permalink / raw)
To: dev, chenbo.xia, david.marchand, mkp, fbl, jasowang,
cunming.liang, xieyongji, echaudro, eperezma, amorenoz, lulu
Cc: Maxime Coquelin
This patch is preliminary work to prepare for VDUSE
support, for which we need to keep track of the mmapped base
address and offset in order to be able to unmap it later
when the IOTLB entry is invalidated.
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
---
lib/vhost/iotlb.c | 30 ++++++++++++++++++------------
lib/vhost/iotlb.h | 2 +-
lib/vhost/vhost_user.c | 2 +-
3 files changed, 20 insertions(+), 14 deletions(-)
diff --git a/lib/vhost/iotlb.c b/lib/vhost/iotlb.c
index f28483cb7a..14d143366b 100644
--- a/lib/vhost/iotlb.c
+++ b/lib/vhost/iotlb.c
@@ -17,6 +17,7 @@ struct vhost_iotlb_entry {
uint64_t iova;
uint64_t uaddr;
+ uint64_t uoffset;
uint64_t size;
uint8_t perm;
};
@@ -27,15 +28,18 @@ static bool
vhost_user_iotlb_share_page(struct vhost_iotlb_entry *a, struct vhost_iotlb_entry *b,
uint64_t align)
{
- uint64_t a_end, b_start;
+ uint64_t a_start, a_end, b_start;
if (a == NULL || b == NULL)
return false;
+ a_start = a->uaddr + a->uoffset;
+ b_start = b->uaddr + b->uoffset;
+
/* Assumes entry a lower than entry b */
- RTE_ASSERT(a->uaddr < b->uaddr);
- a_end = RTE_ALIGN_CEIL(a->uaddr + a->size, align);
- b_start = RTE_ALIGN_FLOOR(b->uaddr, align);
+ RTE_ASSERT(a_start < b_start);
+ a_end = RTE_ALIGN_CEIL(a_start + a->size, align);
+ b_start = RTE_ALIGN_FLOOR(b_start, align);
return a_end > b_start;
}
@@ -43,11 +47,12 @@ vhost_user_iotlb_share_page(struct vhost_iotlb_entry *a, struct vhost_iotlb_entr
static void
vhost_user_iotlb_set_dump(struct virtio_net *dev, struct vhost_iotlb_entry *node)
{
- uint64_t align;
+ uint64_t align, start;
- align = hua_to_alignment(dev->mem, (void *)(uintptr_t)node->uaddr);
+ start = node->uaddr + node->uoffset;
+ align = hua_to_alignment(dev->mem, (void *)(uintptr_t)start);
- mem_set_dump((void *)(uintptr_t)node->uaddr, node->size, true, align);
+ mem_set_dump((void *)(uintptr_t)start, node->size, true, align);
}
static void
@@ -56,10 +61,10 @@ vhost_user_iotlb_clear_dump(struct virtio_net *dev, struct vhost_iotlb_entry *no
{
uint64_t align, start, end;
- start = node->uaddr;
- end = node->uaddr + node->size;
+ start = node->uaddr + node->uoffset;
+ end = start + node->size;
- align = hua_to_alignment(dev->mem, (void *)(uintptr_t)node->uaddr);
+ align = hua_to_alignment(dev->mem, (void *)(uintptr_t)start);
/* Skip first page if shared with previous entry. */
if (vhost_user_iotlb_share_page(prev, node, align))
@@ -234,7 +239,7 @@ vhost_user_iotlb_cache_random_evict(struct virtio_net *dev)
void
vhost_user_iotlb_cache_insert(struct virtio_net *dev, uint64_t iova, uint64_t uaddr,
- uint64_t size, uint8_t perm)
+ uint64_t uoffset, uint64_t size, uint8_t perm)
{
struct vhost_iotlb_entry *node, *new_node;
@@ -256,6 +261,7 @@ vhost_user_iotlb_cache_insert(struct virtio_net *dev, uint64_t iova, uint64_t ua
new_node->iova = iova;
new_node->uaddr = uaddr;
+ new_node->uoffset = uoffset;
new_node->size = size;
new_node->perm = perm;
@@ -344,7 +350,7 @@ vhost_user_iotlb_cache_find(struct virtio_net *dev, uint64_t iova, uint64_t *siz
offset = iova - node->iova;
if (!vva)
- vva = node->uaddr + offset;
+ vva = node->uaddr + node->uoffset + offset;
mapped += node->size - offset;
iova = node->iova + node->size;
diff --git a/lib/vhost/iotlb.h b/lib/vhost/iotlb.h
index 3490b9e6be..bee36c5903 100644
--- a/lib/vhost/iotlb.h
+++ b/lib/vhost/iotlb.h
@@ -58,7 +58,7 @@ vhost_user_iotlb_wr_unlock_all(struct virtio_net *dev)
}
void vhost_user_iotlb_cache_insert(struct virtio_net *dev, uint64_t iova, uint64_t uaddr,
- uint64_t size, uint8_t perm);
+ uint64_t uoffset, uint64_t size, uint8_t perm);
void vhost_user_iotlb_cache_remove(struct virtio_net *dev, uint64_t iova, uint64_t size);
uint64_t vhost_user_iotlb_cache_find(struct virtio_net *dev, uint64_t iova,
uint64_t *size, uint8_t perm);
diff --git a/lib/vhost/vhost_user.c b/lib/vhost/vhost_user.c
index 8d0f84348b..222ccbf819 100644
--- a/lib/vhost/vhost_user.c
+++ b/lib/vhost/vhost_user.c
@@ -2641,7 +2641,7 @@ vhost_user_iotlb_msg(struct virtio_net **pdev,
if (!vva)
return RTE_VHOST_MSG_RESULT_ERR;
- vhost_user_iotlb_cache_insert(dev, imsg->iova, vva, len, imsg->perm);
+ vhost_user_iotlb_cache_insert(dev, imsg->iova, vva, 0, len, imsg->perm);
for (i = 0; i < dev->nr_vring; i++) {
struct vhost_virtqueue *vq = dev->virtqueue[i];
--
2.40.1
* [PATCH v3 09/28] vhost: add page size info to IOTLB entry
2023-05-25 16:25 [PATCH v3 00/28] Add VDUSE support to Vhost library Maxime Coquelin
` (7 preceding siblings ...)
2023-05-25 16:25 ` [PATCH v3 08/28] vhost: add offset field to IOTLB entries Maxime Coquelin
@ 2023-05-25 16:25 ` Maxime Coquelin
2023-05-29 6:32 ` Xia, Chenbo
2023-05-25 16:25 ` [PATCH v3 10/28] vhost: retry translating IOVA after IOTLB miss Maxime Coquelin
` (19 subsequent siblings)
28 siblings, 1 reply; 50+ messages in thread
From: Maxime Coquelin @ 2023-05-25 16:25 UTC (permalink / raw)
To: dev, chenbo.xia, david.marchand, mkp, fbl, jasowang,
cunming.liang, xieyongji, echaudro, eperezma, amorenoz, lulu
Cc: Maxime Coquelin
VDUSE will close the file descriptor after having mapped
the shared memory, so it will not be possible to get the
page size afterwards.
This patch adds a new page_shift field to the IOTLB entry,
so that the information is passed at IOTLB cache
insertion time. The information is stored as a bit-shift
value so that the IOTLB entry keeps fitting in a single
cacheline.
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
---
lib/vhost/iotlb.c | 46 ++++++++++++++++++++----------------------
lib/vhost/iotlb.h | 2 +-
lib/vhost/vhost.h | 1 -
lib/vhost/vhost_user.c | 8 +++++---
4 files changed, 28 insertions(+), 29 deletions(-)
diff --git a/lib/vhost/iotlb.c b/lib/vhost/iotlb.c
index 14d143366b..a23008909f 100644
--- a/lib/vhost/iotlb.c
+++ b/lib/vhost/iotlb.c
@@ -19,14 +19,14 @@ struct vhost_iotlb_entry {
uint64_t uaddr;
uint64_t uoffset;
uint64_t size;
+ uint8_t page_shift;
uint8_t perm;
};
#define IOTLB_CACHE_SIZE 2048
static bool
-vhost_user_iotlb_share_page(struct vhost_iotlb_entry *a, struct vhost_iotlb_entry *b,
- uint64_t align)
+vhost_user_iotlb_share_page(struct vhost_iotlb_entry *a, struct vhost_iotlb_entry *b)
{
uint64_t a_start, a_end, b_start;
@@ -38,44 +38,41 @@ vhost_user_iotlb_share_page(struct vhost_iotlb_entry *a, struct vhost_iotlb_entr
/* Assumes entry a lower than entry b */
RTE_ASSERT(a_start < b_start);
- a_end = RTE_ALIGN_CEIL(a_start + a->size, align);
- b_start = RTE_ALIGN_FLOOR(b_start, align);
+ a_end = RTE_ALIGN_CEIL(a_start + a->size, RTE_BIT64(a->page_shift));
+ b_start = RTE_ALIGN_FLOOR(b_start, RTE_BIT64(b->page_shift));
return a_end > b_start;
}
static void
-vhost_user_iotlb_set_dump(struct virtio_net *dev, struct vhost_iotlb_entry *node)
+vhost_user_iotlb_set_dump(struct vhost_iotlb_entry *node)
{
- uint64_t align, start;
+ uint64_t start;
start = node->uaddr + node->uoffset;
- align = hua_to_alignment(dev->mem, (void *)(uintptr_t)start);
-
- mem_set_dump((void *)(uintptr_t)start, node->size, true, align);
+ mem_set_dump((void *)(uintptr_t)start, node->size, true, RTE_BIT64(node->page_shift));
}
static void
-vhost_user_iotlb_clear_dump(struct virtio_net *dev, struct vhost_iotlb_entry *node,
+vhost_user_iotlb_clear_dump(struct vhost_iotlb_entry *node,
struct vhost_iotlb_entry *prev, struct vhost_iotlb_entry *next)
{
- uint64_t align, start, end;
+ uint64_t start, end;
start = node->uaddr + node->uoffset;
end = start + node->size;
- align = hua_to_alignment(dev->mem, (void *)(uintptr_t)start);
-
/* Skip first page if shared with previous entry. */
- if (vhost_user_iotlb_share_page(prev, node, align))
- start = RTE_ALIGN_CEIL(start, align);
+ if (vhost_user_iotlb_share_page(prev, node))
+ start = RTE_ALIGN_CEIL(start, RTE_BIT64(node->page_shift));
/* Skip last page if shared with next entry. */
- if (vhost_user_iotlb_share_page(node, next, align))
- end = RTE_ALIGN_FLOOR(end, align);
+ if (vhost_user_iotlb_share_page(node, next))
+ end = RTE_ALIGN_FLOOR(end, RTE_BIT64(node->page_shift));
if (end > start)
- mem_set_dump((void *)(uintptr_t)start, end - start, false, align);
+ mem_set_dump((void *)(uintptr_t)start, end - start, false,
+ RTE_BIT64(node->page_shift));
}
static struct vhost_iotlb_entry *
@@ -198,7 +195,7 @@ vhost_user_iotlb_cache_remove_all(struct virtio_net *dev)
vhost_user_iotlb_wr_lock_all(dev);
RTE_TAILQ_FOREACH_SAFE(node, &dev->iotlb_list, next, temp_node) {
- vhost_user_iotlb_set_dump(dev, node);
+ vhost_user_iotlb_set_dump(node);
TAILQ_REMOVE(&dev->iotlb_list, node, next);
vhost_user_iotlb_pool_put(dev, node);
@@ -223,7 +220,7 @@ vhost_user_iotlb_cache_random_evict(struct virtio_net *dev)
if (!entry_idx) {
struct vhost_iotlb_entry *next_node = RTE_TAILQ_NEXT(node, next);
- vhost_user_iotlb_clear_dump(dev, node, prev_node, next_node);
+ vhost_user_iotlb_clear_dump(node, prev_node, next_node);
TAILQ_REMOVE(&dev->iotlb_list, node, next);
vhost_user_iotlb_pool_put(dev, node);
@@ -239,7 +236,7 @@ vhost_user_iotlb_cache_random_evict(struct virtio_net *dev)
void
vhost_user_iotlb_cache_insert(struct virtio_net *dev, uint64_t iova, uint64_t uaddr,
- uint64_t uoffset, uint64_t size, uint8_t perm)
+ uint64_t uoffset, uint64_t size, uint64_t page_size, uint8_t perm)
{
struct vhost_iotlb_entry *node, *new_node;
@@ -263,6 +260,7 @@ vhost_user_iotlb_cache_insert(struct virtio_net *dev, uint64_t iova, uint64_t ua
new_node->uaddr = uaddr;
new_node->uoffset = uoffset;
new_node->size = size;
+ new_node->page_shift = __builtin_ctzll(page_size);
new_node->perm = perm;
vhost_user_iotlb_wr_lock_all(dev);
@@ -276,7 +274,7 @@ vhost_user_iotlb_cache_insert(struct virtio_net *dev, uint64_t iova, uint64_t ua
vhost_user_iotlb_pool_put(dev, new_node);
goto unlock;
} else if (node->iova > new_node->iova) {
- vhost_user_iotlb_set_dump(dev, new_node);
+ vhost_user_iotlb_set_dump(new_node);
TAILQ_INSERT_BEFORE(node, new_node, next);
dev->iotlb_cache_nr++;
@@ -284,7 +282,7 @@ vhost_user_iotlb_cache_insert(struct virtio_net *dev, uint64_t iova, uint64_t ua
}
}
- vhost_user_iotlb_set_dump(dev, new_node);
+ vhost_user_iotlb_set_dump(new_node);
TAILQ_INSERT_TAIL(&dev->iotlb_list, new_node, next);
dev->iotlb_cache_nr++;
@@ -313,7 +311,7 @@ vhost_user_iotlb_cache_remove(struct virtio_net *dev, uint64_t iova, uint64_t si
if (iova < node->iova + node->size) {
struct vhost_iotlb_entry *next_node = RTE_TAILQ_NEXT(node, next);
- vhost_user_iotlb_clear_dump(dev, node, prev_node, next_node);
+ vhost_user_iotlb_clear_dump(node, prev_node, next_node);
TAILQ_REMOVE(&dev->iotlb_list, node, next);
vhost_user_iotlb_pool_put(dev, node);
diff --git a/lib/vhost/iotlb.h b/lib/vhost/iotlb.h
index bee36c5903..81ca04df21 100644
--- a/lib/vhost/iotlb.h
+++ b/lib/vhost/iotlb.h
@@ -58,7 +58,7 @@ vhost_user_iotlb_wr_unlock_all(struct virtio_net *dev)
}
void vhost_user_iotlb_cache_insert(struct virtio_net *dev, uint64_t iova, uint64_t uaddr,
- uint64_t uoffset, uint64_t size, uint8_t perm);
+ uint64_t uoffset, uint64_t size, uint64_t page_size, uint8_t perm);
void vhost_user_iotlb_cache_remove(struct virtio_net *dev, uint64_t iova, uint64_t size);
uint64_t vhost_user_iotlb_cache_find(struct virtio_net *dev, uint64_t iova,
uint64_t *size, uint8_t perm);
diff --git a/lib/vhost/vhost.h b/lib/vhost/vhost.h
index 67cc4a2fdb..4ace5ab081 100644
--- a/lib/vhost/vhost.h
+++ b/lib/vhost/vhost.h
@@ -1016,6 +1016,5 @@ mbuf_is_consumed(struct rte_mbuf *m)
return true;
}
-uint64_t hua_to_alignment(struct rte_vhost_memory *mem, void *ptr);
void mem_set_dump(void *ptr, size_t size, bool enable, uint64_t alignment);
#endif /* _VHOST_NET_CDEV_H_ */
diff --git a/lib/vhost/vhost_user.c b/lib/vhost/vhost_user.c
index 222ccbf819..11b265c1ba 100644
--- a/lib/vhost/vhost_user.c
+++ b/lib/vhost/vhost_user.c
@@ -743,7 +743,7 @@ log_addr_to_gpa(struct virtio_net *dev, struct vhost_virtqueue *vq)
return log_gpa;
}
-uint64_t
+static uint64_t
hua_to_alignment(struct rte_vhost_memory *mem, void *ptr)
{
struct rte_vhost_mem_region *r;
@@ -2632,7 +2632,7 @@ vhost_user_iotlb_msg(struct virtio_net **pdev,
struct virtio_net *dev = *pdev;
struct vhost_iotlb_msg *imsg = &ctx->msg.payload.iotlb;
uint16_t i;
- uint64_t vva, len;
+ uint64_t vva, len, pg_sz;
switch (imsg->type) {
case VHOST_IOTLB_UPDATE:
@@ -2641,7 +2641,9 @@ vhost_user_iotlb_msg(struct virtio_net **pdev,
if (!vva)
return RTE_VHOST_MSG_RESULT_ERR;
- vhost_user_iotlb_cache_insert(dev, imsg->iova, vva, 0, len, imsg->perm);
+ pg_sz = hua_to_alignment(dev->mem, (void *)(uintptr_t)vva);
+
+ vhost_user_iotlb_cache_insert(dev, imsg->iova, vva, 0, len, pg_sz, imsg->perm);
for (i = 0; i < dev->nr_vring; i++) {
struct vhost_virtqueue *vq = dev->virtqueue[i];
--
2.40.1
* RE: [PATCH v3 09/28] vhost: add page size info to IOTLB entry
2023-05-25 16:25 ` [PATCH v3 09/28] vhost: add page size info to IOTLB entry Maxime Coquelin
@ 2023-05-29 6:32 ` Xia, Chenbo
0 siblings, 0 replies; 50+ messages in thread
From: Xia, Chenbo @ 2023-05-29 6:32 UTC (permalink / raw)
To: Maxime Coquelin, dev, david.marchand, mkp, fbl, jasowang, Liang,
Cunming, Xie, Yongji, echaudro, eperezma, amorenoz, lulu
> -----Original Message-----
> From: Maxime Coquelin <maxime.coquelin@redhat.com>
> Sent: Friday, May 26, 2023 12:26 AM
> To: dev@dpdk.org; Xia, Chenbo <chenbo.xia@intel.com>;
> david.marchand@redhat.com; mkp@redhat.com; fbl@redhat.com;
> jasowang@redhat.com; Liang, Cunming <cunming.liang@intel.com>; Xie, Yongji
> <xieyongji@bytedance.com>; echaudro@redhat.com; eperezma@redhat.com;
> amorenoz@redhat.com; lulu@redhat.com
> Cc: Maxime Coquelin <maxime.coquelin@redhat.com>
> Subject: [PATCH v3 09/28] vhost: add page size info to IOTLB entry
>
> VDUSE will close the file descriptor after having mapped
> the shared memory, so it will not be possible to get the
> page size afterwards.
>
> This patch adds a new page_shift field to the IOTLB entry,
> so that the information will be passed at IOTLB cache
> insertion time. The information is stored as a bit shift
> value so that IOTLB entry keeps fitting in a single
> cacheline.
>
> Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
> ---
> lib/vhost/iotlb.c | 46 ++++++++++++++++++++----------------------
> lib/vhost/iotlb.h | 2 +-
> lib/vhost/vhost.h | 1 -
> lib/vhost/vhost_user.c | 8 +++++---
> 4 files changed, 28 insertions(+), 29 deletions(-)
>
> diff --git a/lib/vhost/iotlb.c b/lib/vhost/iotlb.c
> index 14d143366b..a23008909f 100644
> --- a/lib/vhost/iotlb.c
> +++ b/lib/vhost/iotlb.c
> @@ -19,14 +19,14 @@ struct vhost_iotlb_entry {
> uint64_t uaddr;
> uint64_t uoffset;
> uint64_t size;
> + uint8_t page_shift;
> uint8_t perm;
> };
>
> #define IOTLB_CACHE_SIZE 2048
>
> static bool
> -vhost_user_iotlb_share_page(struct vhost_iotlb_entry *a, struct
> vhost_iotlb_entry *b,
> - uint64_t align)
> +vhost_user_iotlb_share_page(struct vhost_iotlb_entry *a, struct
> vhost_iotlb_entry *b)
> {
> uint64_t a_start, a_end, b_start;
>
> @@ -38,44 +38,41 @@ vhost_user_iotlb_share_page(struct vhost_iotlb_entry
> *a, struct vhost_iotlb_entr
>
> /* Assumes entry a lower than entry b */
> RTE_ASSERT(a_start < b_start);
> - a_end = RTE_ALIGN_CEIL(a_start + a->size, align);
> - b_start = RTE_ALIGN_FLOOR(b_start, align);
> + a_end = RTE_ALIGN_CEIL(a_start + a->size, RTE_BIT64(a->page_shift));
> + b_start = RTE_ALIGN_FLOOR(b_start, RTE_BIT64(b->page_shift));
>
> return a_end > b_start;
> }
>
> static void
> -vhost_user_iotlb_set_dump(struct virtio_net *dev, struct
> vhost_iotlb_entry *node)
> +vhost_user_iotlb_set_dump(struct vhost_iotlb_entry *node)
> {
> - uint64_t align, start;
> + uint64_t start;
>
> start = node->uaddr + node->uoffset;
> - align = hua_to_alignment(dev->mem, (void *)(uintptr_t)start);
> -
> - mem_set_dump((void *)(uintptr_t)start, node->size, true, align);
> + mem_set_dump((void *)(uintptr_t)start, node->size, true,
> RTE_BIT64(node->page_shift));
> }
>
> static void
> -vhost_user_iotlb_clear_dump(struct virtio_net *dev, struct
> vhost_iotlb_entry *node,
> +vhost_user_iotlb_clear_dump(struct vhost_iotlb_entry *node,
> struct vhost_iotlb_entry *prev, struct vhost_iotlb_entry *next)
> {
> - uint64_t align, start, end;
> + uint64_t start, end;
>
> start = node->uaddr + node->uoffset;
> end = start + node->size;
>
> - align = hua_to_alignment(dev->mem, (void *)(uintptr_t)start);
> -
> /* Skip first page if shared with previous entry. */
> - if (vhost_user_iotlb_share_page(prev, node, align))
> - start = RTE_ALIGN_CEIL(start, align);
> + if (vhost_user_iotlb_share_page(prev, node))
> + start = RTE_ALIGN_CEIL(start, RTE_BIT64(node->page_shift));
>
> /* Skip last page if shared with next entry. */
> - if (vhost_user_iotlb_share_page(node, next, align))
> - end = RTE_ALIGN_FLOOR(end, align);
> + if (vhost_user_iotlb_share_page(node, next))
> + end = RTE_ALIGN_FLOOR(end, RTE_BIT64(node->page_shift));
>
> if (end > start)
> - mem_set_dump((void *)(uintptr_t)start, end - start, false, align);
> + mem_set_dump((void *)(uintptr_t)start, end - start, false,
> + RTE_BIT64(node->page_shift));
> }
>
> static struct vhost_iotlb_entry *
> @@ -198,7 +195,7 @@ vhost_user_iotlb_cache_remove_all(struct virtio_net *dev)
> vhost_user_iotlb_wr_lock_all(dev);
>
> RTE_TAILQ_FOREACH_SAFE(node, &dev->iotlb_list, next, temp_node) {
> - vhost_user_iotlb_set_dump(dev, node);
> + vhost_user_iotlb_set_dump(node);
>
> TAILQ_REMOVE(&dev->iotlb_list, node, next);
> vhost_user_iotlb_pool_put(dev, node);
> @@ -223,7 +220,7 @@ vhost_user_iotlb_cache_random_evict(struct virtio_net *dev)
> if (!entry_idx) {
> struct vhost_iotlb_entry *next_node =
> RTE_TAILQ_NEXT(node, next);
>
> - vhost_user_iotlb_clear_dump(dev, node, prev_node, next_node);
> + vhost_user_iotlb_clear_dump(node, prev_node, next_node);
>
> TAILQ_REMOVE(&dev->iotlb_list, node, next);
> vhost_user_iotlb_pool_put(dev, node);
> @@ -239,7 +236,7 @@ vhost_user_iotlb_cache_random_evict(struct virtio_net *dev)
>
> void
> vhost_user_iotlb_cache_insert(struct virtio_net *dev, uint64_t iova, uint64_t uaddr,
> - uint64_t uoffset, uint64_t size, uint8_t perm)
> + uint64_t uoffset, uint64_t size, uint64_t page_size, uint8_t perm)
> {
> struct vhost_iotlb_entry *node, *new_node;
>
> @@ -263,6 +260,7 @@ vhost_user_iotlb_cache_insert(struct virtio_net *dev, uint64_t iova, uint64_t ua
> new_node->uaddr = uaddr;
> new_node->uoffset = uoffset;
> new_node->size = size;
> + new_node->page_shift = __builtin_ctzll(page_size);
> new_node->perm = perm;
>
> vhost_user_iotlb_wr_lock_all(dev);
> @@ -276,7 +274,7 @@ vhost_user_iotlb_cache_insert(struct virtio_net *dev, uint64_t iova, uint64_t ua
> vhost_user_iotlb_pool_put(dev, new_node);
> goto unlock;
> } else if (node->iova > new_node->iova) {
> - vhost_user_iotlb_set_dump(dev, new_node);
> + vhost_user_iotlb_set_dump(new_node);
>
> TAILQ_INSERT_BEFORE(node, new_node, next);
> dev->iotlb_cache_nr++;
> @@ -284,7 +282,7 @@ vhost_user_iotlb_cache_insert(struct virtio_net *dev, uint64_t iova, uint64_t ua
> }
> }
>
> - vhost_user_iotlb_set_dump(dev, new_node);
> + vhost_user_iotlb_set_dump(new_node);
>
> TAILQ_INSERT_TAIL(&dev->iotlb_list, new_node, next);
> dev->iotlb_cache_nr++;
> @@ -313,7 +311,7 @@ vhost_user_iotlb_cache_remove(struct virtio_net *dev, uint64_t iova, uint64_t si
> if (iova < node->iova + node->size) {
> struct vhost_iotlb_entry *next_node =
> RTE_TAILQ_NEXT(node, next);
>
> - vhost_user_iotlb_clear_dump(dev, node, prev_node, next_node);
> + vhost_user_iotlb_clear_dump(node, prev_node, next_node);
>
> TAILQ_REMOVE(&dev->iotlb_list, node, next);
> vhost_user_iotlb_pool_put(dev, node);
> diff --git a/lib/vhost/iotlb.h b/lib/vhost/iotlb.h
> index bee36c5903..81ca04df21 100644
> --- a/lib/vhost/iotlb.h
> +++ b/lib/vhost/iotlb.h
> @@ -58,7 +58,7 @@ vhost_user_iotlb_wr_unlock_all(struct virtio_net *dev)
> }
>
> void vhost_user_iotlb_cache_insert(struct virtio_net *dev, uint64_t iova, uint64_t uaddr,
> - uint64_t uoffset, uint64_t size, uint8_t perm);
> + uint64_t uoffset, uint64_t size, uint64_t page_size, uint8_t perm);
> void vhost_user_iotlb_cache_remove(struct virtio_net *dev, uint64_t iova, uint64_t size);
> uint64_t vhost_user_iotlb_cache_find(struct virtio_net *dev, uint64_t iova,
> uint64_t *size, uint8_t perm);
> diff --git a/lib/vhost/vhost.h b/lib/vhost/vhost.h
> index 67cc4a2fdb..4ace5ab081 100644
> --- a/lib/vhost/vhost.h
> +++ b/lib/vhost/vhost.h
> @@ -1016,6 +1016,5 @@ mbuf_is_consumed(struct rte_mbuf *m)
> return true;
> }
>
> -uint64_t hua_to_alignment(struct rte_vhost_memory *mem, void *ptr);
> void mem_set_dump(void *ptr, size_t size, bool enable, uint64_t alignment);
> #endif /* _VHOST_NET_CDEV_H_ */
> diff --git a/lib/vhost/vhost_user.c b/lib/vhost/vhost_user.c
> index 222ccbf819..11b265c1ba 100644
> --- a/lib/vhost/vhost_user.c
> +++ b/lib/vhost/vhost_user.c
> @@ -743,7 +743,7 @@ log_addr_to_gpa(struct virtio_net *dev, struct vhost_virtqueue *vq)
> return log_gpa;
> }
>
> -uint64_t
> +static uint64_t
> hua_to_alignment(struct rte_vhost_memory *mem, void *ptr)
> {
> struct rte_vhost_mem_region *r;
> @@ -2632,7 +2632,7 @@ vhost_user_iotlb_msg(struct virtio_net **pdev,
> struct virtio_net *dev = *pdev;
> struct vhost_iotlb_msg *imsg = &ctx->msg.payload.iotlb;
> uint16_t i;
> - uint64_t vva, len;
> + uint64_t vva, len, pg_sz;
>
> switch (imsg->type) {
> case VHOST_IOTLB_UPDATE:
> @@ -2641,7 +2641,9 @@ vhost_user_iotlb_msg(struct virtio_net **pdev,
> if (!vva)
> return RTE_VHOST_MSG_RESULT_ERR;
>
> - vhost_user_iotlb_cache_insert(dev, imsg->iova, vva, 0, len, imsg->perm);
> + pg_sz = hua_to_alignment(dev->mem, (void *)(uintptr_t)vva);
> +
> + vhost_user_iotlb_cache_insert(dev, imsg->iova, vva, 0, len, pg_sz, imsg->perm);
>
> for (i = 0; i < dev->nr_vring; i++) {
> struct vhost_virtqueue *vq = dev->virtqueue[i];
> --
> 2.40.1
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
^ permalink raw reply [flat|nested] 50+ messages in thread
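The patch above compresses the IOTLB entry's page size into a single `uint8_t page_shift` (via `__builtin_ctzll`, a GCC/Clang builtin) and reconstructs it with `RTE_BIT64`. The idea can be sketched, outside of DPDK, as follows (hypothetical helper names, not the library's API):

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sketch: a power-of-two page size fits in a uint8_t when
 * stored as its log2, i.e. the count of trailing zero bits, and is
 * rebuilt by shifting 1 back up (what RTE_BIT64() does in DPDK). */
static uint8_t
page_size_to_shift(uint64_t page_size)
{
	return (uint8_t)__builtin_ctzll(page_size);
}

static uint64_t
shift_to_page_size(uint8_t shift)
{
	return UINT64_C(1) << shift;
}
```

This only works because page sizes are powers of two; the shift saves seven bytes per cache entry compared to storing the raw `uint64_t`.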
* [PATCH v3 10/28] vhost: retry translating IOVA after IOTLB miss
2023-05-25 16:25 [PATCH v3 00/28] Add VDUSE support to Vhost library Maxime Coquelin
` (8 preceding siblings ...)
2023-05-25 16:25 ` [PATCH v3 09/28] vhost: add page size info to IOTLB entry Maxime Coquelin
@ 2023-05-25 16:25 ` Maxime Coquelin
2023-05-25 16:25 ` [PATCH v3 11/28] vhost: introduce backend ops Maxime Coquelin
` (18 subsequent siblings)
28 siblings, 0 replies; 50+ messages in thread
From: Maxime Coquelin @ 2023-05-25 16:25 UTC (permalink / raw)
To: dev, chenbo.xia, david.marchand, mkp, fbl, jasowang,
cunming.liang, xieyongji, echaudro, eperezma, amorenoz, lulu
Cc: Maxime Coquelin
Vhost-user backend IOTLB misses and updates are
asynchronous, so the IOVA address translation function
simply fails after sending an IOTLB miss request if the
needed entry was not in the IOTLB cache.
This is not the case for VDUSE, for which the needed IOTLB
update is returned directly when sending an IOTLB miss.
This patch retries the IOTLB cache lookup after an IOTLB
miss has been sent, to cover the synchronous VDUSE case.
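The lookup/miss/retry flow can be sketched with a toy single-entry cache and a synchronous (VDUSE-like) miss handler; all names here are hypothetical stand-ins for the real vhost functions:

```c
#include <stdint.h>

/* Hypothetical one-entry cache standing in for the real IOTLB. */
static uint64_t cached_iova, cached_vva, cached_size;

static uint64_t
cache_find(uint64_t iova, uint64_t *size)
{
	if (cached_size && iova >= cached_iova &&
			iova + *size <= cached_iova + cached_size)
		return cached_vva + (iova - cached_iova);
	return 0; /* miss */
}

/* Synchronous miss handler: the mapping is installed before returning,
 * which is what makes the retry below useful. */
static int
send_miss(uint64_t iova)
{
	cached_iova = iova & ~UINT64_C(0xfff);
	cached_vva  = UINT64_C(0x7f0000000000) + cached_iova;
	cached_size = 0x1000;
	return 0;
}

static uint64_t
iova_to_vva(uint64_t iova, uint64_t *size)
{
	uint64_t vva = cache_find(iova, size);

	if (vva)
		return vva;
	if (send_miss(iova))
		return 0;
	/* Retry: with a synchronous backend the entry is now cached. */
	return cache_find(iova, size);
}
```

With an asynchronous backend (Vhost-user), `send_miss()` would only post a request, the retry would still miss, and the caller would bail out as before, so the retry is harmless there.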
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
---
lib/vhost/vhost.c | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/lib/vhost/vhost.c b/lib/vhost/vhost.c
index d35075b96c..4f16307e4d 100644
--- a/lib/vhost/vhost.c
+++ b/lib/vhost/vhost.c
@@ -96,6 +96,12 @@ __vhost_iova_to_vva(struct virtio_net *dev, struct vhost_virtqueue *vq,
vhost_user_iotlb_rd_lock(vq);
}
+ tmp_size = *size;
+ /* Retry in case of VDUSE, as it is synchronous */
+ vva = vhost_user_iotlb_cache_find(dev, iova, &tmp_size, perm);
+ if (tmp_size == *size)
+ return vva;
+
return 0;
}
--
2.40.1
^ permalink raw reply [flat|nested] 50+ messages in thread
* [PATCH v3 11/28] vhost: introduce backend ops
2023-05-25 16:25 [PATCH v3 00/28] Add VDUSE support to Vhost library Maxime Coquelin
` (9 preceding siblings ...)
2023-05-25 16:25 ` [PATCH v3 10/28] vhost: retry translating IOVA after IOTLB miss Maxime Coquelin
@ 2023-05-25 16:25 ` Maxime Coquelin
2023-05-25 16:25 ` [PATCH v3 12/28] vhost: add IOTLB cache entry removal callback Maxime Coquelin
` (17 subsequent siblings)
28 siblings, 0 replies; 50+ messages in thread
From: Maxime Coquelin @ 2023-05-25 16:25 UTC (permalink / raw)
To: dev, chenbo.xia, david.marchand, mkp, fbl, jasowang,
cunming.liang, xieyongji, echaudro, eperezma, amorenoz, lulu
Cc: Maxime Coquelin
This patch introduces a backend ops struct, which enables
calling backend-specific callbacks (Vhost-user, VDUSE) from
shared code.
It is an empty shell for now; it will be filled in later
patches.
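The ops-struct pattern being introduced here can be sketched as follows; the struct members and backend names are hypothetical placeholders for the callbacks later patches add:

```c
#include <string.h>

/* Hypothetical sketch of the ops-struct pattern: each backend fills in
 * the callbacks it supports, and shared code dispatches through the
 * ops pointer stored in the device. */
struct backend_ops {
	const char *(*name)(void);
};

static const char *vhost_user_name(void) { return "vhost-user"; }
static const char *vduse_name(void) { return "vduse"; }

static struct backend_ops vhost_user_ops = { .name = vhost_user_name };
static struct backend_ops vduse_ops = { .name = vduse_name };

struct device {
	struct backend_ops *ops;
};

static const char *
device_backend_name(struct device *dev)
{
	return dev->ops->name();
}
```

The benefit is that shared code like `vhost.c` never needs `#ifdef`-style branching on the backend type; it just calls through `dev->ops`.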
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
---
lib/vhost/socket.c | 2 +-
lib/vhost/vhost.c | 8 +++++++-
lib/vhost/vhost.h | 10 +++++++++-
lib/vhost/vhost_user.c | 8 ++++++++
lib/vhost/vhost_user.h | 1 +
5 files changed, 26 insertions(+), 3 deletions(-)
diff --git a/lib/vhost/socket.c b/lib/vhost/socket.c
index 669c322e12..ba54263824 100644
--- a/lib/vhost/socket.c
+++ b/lib/vhost/socket.c
@@ -221,7 +221,7 @@ vhost_user_add_connection(int fd, struct vhost_user_socket *vsocket)
return;
}
- vid = vhost_new_device();
+ vid = vhost_user_new_device();
if (vid == -1) {
goto err;
}
diff --git a/lib/vhost/vhost.c b/lib/vhost/vhost.c
index 4f16307e4d..41f212315e 100644
--- a/lib/vhost/vhost.c
+++ b/lib/vhost/vhost.c
@@ -676,11 +676,16 @@ reset_device(struct virtio_net *dev)
* there is a new virtio device being attached).
*/
int
-vhost_new_device(void)
+vhost_new_device(struct vhost_backend_ops *ops)
{
struct virtio_net *dev;
int i;
+ if (ops == NULL) {
+ VHOST_LOG_CONFIG("device", ERR, "missing backend ops.\n");
+ return -1;
+ }
+
pthread_mutex_lock(&vhost_dev_lock);
for (i = 0; i < RTE_MAX_VHOST_DEVICE; i++) {
if (vhost_devices[i] == NULL)
@@ -708,6 +713,7 @@ vhost_new_device(void)
dev->backend_req_fd = -1;
dev->postcopy_ufd = -1;
rte_spinlock_init(&dev->backend_req_lock);
+ dev->backend_ops = ops;
return i;
}
diff --git a/lib/vhost/vhost.h b/lib/vhost/vhost.h
index 4ace5ab081..cc5c707205 100644
--- a/lib/vhost/vhost.h
+++ b/lib/vhost/vhost.h
@@ -89,6 +89,12 @@
for (iter = val; iter < num; iter++)
#endif
+/**
+ * Structure that contains backend-specific ops.
+ */
+struct vhost_backend_ops {
+};
+
/**
* Structure contains buffer address, length and descriptor index
* from vring to do scatter RX.
@@ -513,6 +519,8 @@ struct virtio_net {
void *extern_data;
/* pre and post vhost user message handlers for the device */
struct rte_vhost_user_extern_ops extern_ops;
+
+ struct vhost_backend_ops *backend_ops;
} __rte_cache_aligned;
static inline void
@@ -812,7 +820,7 @@ get_device(int vid)
return dev;
}
-int vhost_new_device(void);
+int vhost_new_device(struct vhost_backend_ops *ops);
void cleanup_device(struct virtio_net *dev, int destroy);
void reset_device(struct virtio_net *dev);
void vhost_destroy_device(int);
diff --git a/lib/vhost/vhost_user.c b/lib/vhost/vhost_user.c
index 11b265c1ba..7655082c4b 100644
--- a/lib/vhost/vhost_user.c
+++ b/lib/vhost/vhost_user.c
@@ -3464,3 +3464,11 @@ int rte_vhost_host_notifier_ctrl(int vid, uint16_t qid, bool enable)
return ret;
}
+
+static struct vhost_backend_ops vhost_user_backend_ops;
+
+int
+vhost_user_new_device(void)
+{
+ return vhost_new_device(&vhost_user_backend_ops);
+}
diff --git a/lib/vhost/vhost_user.h b/lib/vhost/vhost_user.h
index a0987a58f9..61456049c8 100644
--- a/lib/vhost/vhost_user.h
+++ b/lib/vhost/vhost_user.h
@@ -185,5 +185,6 @@ int vhost_user_iotlb_miss(struct virtio_net *dev, uint64_t iova, uint8_t perm);
int read_fd_message(char *ifname, int sockfd, char *buf, int buflen, int *fds, int max_fds,
int *fd_num);
int send_fd_message(char *ifname, int sockfd, char *buf, int buflen, int *fds, int fd_num);
+int vhost_user_new_device(void);
#endif
--
2.40.1
^ permalink raw reply [flat|nested] 50+ messages in thread
* [PATCH v3 12/28] vhost: add IOTLB cache entry removal callback
2023-05-25 16:25 [PATCH v3 00/28] Add VDUSE support to Vhost library Maxime Coquelin
` (10 preceding siblings ...)
2023-05-25 16:25 ` [PATCH v3 11/28] vhost: introduce backend ops Maxime Coquelin
@ 2023-05-25 16:25 ` Maxime Coquelin
2023-05-29 6:33 ` Xia, Chenbo
2023-05-25 16:25 ` [PATCH v3 13/28] vhost: add helper for IOTLB misses Maxime Coquelin
` (16 subsequent siblings)
28 siblings, 1 reply; 50+ messages in thread
From: Maxime Coquelin @ 2023-05-25 16:25 UTC (permalink / raw)
To: dev, chenbo.xia, david.marchand, mkp, fbl, jasowang,
cunming.liang, xieyongji, echaudro, eperezma, amorenoz, lulu
Cc: Maxime Coquelin
VDUSE will need to munmap() the IOTLB entry on removal
from the cache, as it performs mmap() before insertion.
This patch introduces a callback that the VDUSE layer will
implement to achieve this.
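Because the callback is optional (Vhost-user does not set it), the caller checks for NULL before dispatching, as the new `vhost_user_iotlb_remove_notify()` wrapper does. A minimal sketch of that pattern, with hypothetical names:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch of the optional-callback pattern: the hook is
 * only invoked when the backend provides one, so Vhost-user can leave
 * it NULL while VDUSE would munmap() the entry in its implementation. */
typedef void (*remove_notify_cb)(uint64_t addr, uint64_t off, uint64_t size);

struct ops {
	remove_notify_cb iotlb_remove_notify;
};

static uint64_t last_removed;

static void
vduse_remove_notify(uint64_t addr, uint64_t off, uint64_t size)
{
	(void)off;
	(void)size;
	last_removed = addr; /* a real backend would munmap() here */
}

static void
iotlb_entry_remove(struct ops *ops, uint64_t addr, uint64_t off, uint64_t size)
{
	if (ops->iotlb_remove_notify == NULL)
		return;
	ops->iotlb_remove_notify(addr, off, size);
}
```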
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
---
lib/vhost/iotlb.c | 12 ++++++++++++
lib/vhost/vhost.h | 3 +++
2 files changed, 15 insertions(+)
diff --git a/lib/vhost/iotlb.c b/lib/vhost/iotlb.c
index a23008909f..6dca0ba7d0 100644
--- a/lib/vhost/iotlb.c
+++ b/lib/vhost/iotlb.c
@@ -25,6 +25,15 @@ struct vhost_iotlb_entry {
#define IOTLB_CACHE_SIZE 2048
+static void
+vhost_user_iotlb_remove_notify(struct virtio_net *dev, struct vhost_iotlb_entry *entry)
+{
+ if (dev->backend_ops->iotlb_remove_notify == NULL)
+ return;
+
+ dev->backend_ops->iotlb_remove_notify(entry->uaddr, entry->uoffset, entry->size);
+}
+
static bool
vhost_user_iotlb_share_page(struct vhost_iotlb_entry *a, struct vhost_iotlb_entry *b)
{
@@ -198,6 +207,7 @@ vhost_user_iotlb_cache_remove_all(struct virtio_net *dev)
vhost_user_iotlb_set_dump(node);
TAILQ_REMOVE(&dev->iotlb_list, node, next);
+ vhost_user_iotlb_remove_notify(dev, node);
vhost_user_iotlb_pool_put(dev, node);
}
@@ -223,6 +233,7 @@ vhost_user_iotlb_cache_random_evict(struct virtio_net *dev)
vhost_user_iotlb_clear_dump(node, prev_node, next_node);
TAILQ_REMOVE(&dev->iotlb_list, node, next);
+ vhost_user_iotlb_remove_notify(dev, node);
vhost_user_iotlb_pool_put(dev, node);
dev->iotlb_cache_nr--;
break;
@@ -314,6 +325,7 @@ vhost_user_iotlb_cache_remove(struct virtio_net *dev, uint64_t iova, uint64_t si
vhost_user_iotlb_clear_dump(node, prev_node, next_node);
TAILQ_REMOVE(&dev->iotlb_list, node, next);
+ vhost_user_iotlb_remove_notify(dev, node);
vhost_user_iotlb_pool_put(dev, node);
dev->iotlb_cache_nr--;
} else {
diff --git a/lib/vhost/vhost.h b/lib/vhost/vhost.h
index cc5c707205..f37e0b83b8 100644
--- a/lib/vhost/vhost.h
+++ b/lib/vhost/vhost.h
@@ -89,10 +89,13 @@
for (iter = val; iter < num; iter++)
#endif
+typedef void (*vhost_iotlb_remove_notify)(uint64_t addr, uint64_t off, uint64_t size);
+
/**
* Structure that contains backend-specific ops.
*/
struct vhost_backend_ops {
+ vhost_iotlb_remove_notify iotlb_remove_notify;
};
/**
--
2.40.1
^ permalink raw reply [flat|nested] 50+ messages in thread
* RE: [PATCH v3 12/28] vhost: add IOTLB cache entry removal callback
2023-05-25 16:25 ` [PATCH v3 12/28] vhost: add IOTLB cache entry removal callback Maxime Coquelin
@ 2023-05-29 6:33 ` Xia, Chenbo
0 siblings, 0 replies; 50+ messages in thread
From: Xia, Chenbo @ 2023-05-29 6:33 UTC (permalink / raw)
To: Maxime Coquelin, dev, david.marchand, mkp, fbl, jasowang, Liang,
Cunming, Xie, Yongji, echaudro, eperezma, amorenoz, lulu
> -----Original Message-----
> From: Maxime Coquelin <maxime.coquelin@redhat.com>
> Sent: Friday, May 26, 2023 12:26 AM
> To: dev@dpdk.org; Xia, Chenbo <chenbo.xia@intel.com>;
> david.marchand@redhat.com; mkp@redhat.com; fbl@redhat.com;
> jasowang@redhat.com; Liang, Cunming <cunming.liang@intel.com>; Xie, Yongji
> <xieyongji@bytedance.com>; echaudro@redhat.com; eperezma@redhat.com;
> amorenoz@redhat.com; lulu@redhat.com
> Cc: Maxime Coquelin <maxime.coquelin@redhat.com>
> Subject: [PATCH v3 12/28] vhost: add IOTLB cache entry removal callback
>
> VDUSE will need to munmap() the IOTLB entry on removal
> from the cache, as it performs mmap() before insertion.
>
> This patch introduces a callback that the VDUSE layer will
> implement to achieve this.
>
> Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
> ---
> lib/vhost/iotlb.c | 12 ++++++++++++
> lib/vhost/vhost.h | 3 +++
> 2 files changed, 15 insertions(+)
>
> diff --git a/lib/vhost/iotlb.c b/lib/vhost/iotlb.c
> index a23008909f..6dca0ba7d0 100644
> --- a/lib/vhost/iotlb.c
> +++ b/lib/vhost/iotlb.c
> @@ -25,6 +25,15 @@ struct vhost_iotlb_entry {
>
> #define IOTLB_CACHE_SIZE 2048
>
> +static void
> +vhost_user_iotlb_remove_notify(struct virtio_net *dev, struct vhost_iotlb_entry *entry)
> +{
> + if (dev->backend_ops->iotlb_remove_notify == NULL)
> + return;
> +
> + dev->backend_ops->iotlb_remove_notify(entry->uaddr, entry->uoffset, entry->size);
> +}
> +
> static bool
> vhost_user_iotlb_share_page(struct vhost_iotlb_entry *a, struct vhost_iotlb_entry *b)
> {
> @@ -198,6 +207,7 @@ vhost_user_iotlb_cache_remove_all(struct virtio_net *dev)
> vhost_user_iotlb_set_dump(node);
>
> TAILQ_REMOVE(&dev->iotlb_list, node, next);
> + vhost_user_iotlb_remove_notify(dev, node);
> vhost_user_iotlb_pool_put(dev, node);
> }
>
> @@ -223,6 +233,7 @@ vhost_user_iotlb_cache_random_evict(struct virtio_net *dev)
> vhost_user_iotlb_clear_dump(node, prev_node, next_node);
>
> TAILQ_REMOVE(&dev->iotlb_list, node, next);
> + vhost_user_iotlb_remove_notify(dev, node);
> vhost_user_iotlb_pool_put(dev, node);
> dev->iotlb_cache_nr--;
> break;
> @@ -314,6 +325,7 @@ vhost_user_iotlb_cache_remove(struct virtio_net *dev, uint64_t iova, uint64_t si
> vhost_user_iotlb_clear_dump(node, prev_node, next_node);
>
> TAILQ_REMOVE(&dev->iotlb_list, node, next);
> + vhost_user_iotlb_remove_notify(dev, node);
> vhost_user_iotlb_pool_put(dev, node);
> dev->iotlb_cache_nr--;
> } else {
> diff --git a/lib/vhost/vhost.h b/lib/vhost/vhost.h
> index cc5c707205..f37e0b83b8 100644
> --- a/lib/vhost/vhost.h
> +++ b/lib/vhost/vhost.h
> @@ -89,10 +89,13 @@
> for (iter = val; iter < num; iter++)
> #endif
>
> +typedef void (*vhost_iotlb_remove_notify)(uint64_t addr, uint64_t off, uint64_t size);
> +
> /**
> * Structure that contains backend-specific ops.
> */
> struct vhost_backend_ops {
> + vhost_iotlb_remove_notify iotlb_remove_notify;
> };
>
> /**
> --
> 2.40.1
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
^ permalink raw reply [flat|nested] 50+ messages in thread
* [PATCH v3 13/28] vhost: add helper for IOTLB misses
2023-05-25 16:25 [PATCH v3 00/28] Add VDUSE support to Vhost library Maxime Coquelin
` (11 preceding siblings ...)
2023-05-25 16:25 ` [PATCH v3 12/28] vhost: add IOTLB cache entry removal callback Maxime Coquelin
@ 2023-05-25 16:25 ` Maxime Coquelin
2023-05-29 6:33 ` Xia, Chenbo
2023-05-25 16:25 ` [PATCH v3 14/28] vhost: add helper for interrupt injection Maxime Coquelin
` (15 subsequent siblings)
28 siblings, 1 reply; 50+ messages in thread
From: Maxime Coquelin @ 2023-05-25 16:25 UTC (permalink / raw)
To: dev, chenbo.xia, david.marchand, mkp, fbl, jasowang,
cunming.liang, xieyongji, echaudro, eperezma, amorenoz, lulu
Cc: Maxime Coquelin
This patch adds a helper for sending IOTLB misses, as VDUSE
will use an ioctl while Vhost-user uses a dedicated
backend request.
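Unlike the removal notification, the miss callback is mandatory, so the patch validates it once in `vhost_new_device()` and the hot path dispatches without a NULL check. A minimal sketch of that check, with hypothetical names:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch of the mandatory-callback check: the miss
 * handler is validated once at device creation, so the hot path can
 * call ops->iotlb_miss() directly without testing for NULL. */
struct ops {
	int (*iotlb_miss)(uint64_t iova, uint8_t perm);
};

static int
dummy_miss(uint64_t iova, uint8_t perm)
{
	(void)iova;
	(void)perm;
	return 0;
}

static int
new_device(struct ops *ops)
{
	if (ops == NULL || ops->iotlb_miss == NULL)
		return -1; /* refuse a backend with no miss handler */
	return 0;
}
```

Failing early here keeps the per-translation fast path free of defensive checks.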
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
---
lib/vhost/vhost.c | 13 ++++++++++++-
lib/vhost/vhost.h | 4 ++++
lib/vhost/vhost_user.c | 6 ++++--
lib/vhost/vhost_user.h | 1 -
4 files changed, 20 insertions(+), 4 deletions(-)
diff --git a/lib/vhost/vhost.c b/lib/vhost/vhost.c
index 41f212315e..790eb06b28 100644
--- a/lib/vhost/vhost.c
+++ b/lib/vhost/vhost.c
@@ -52,6 +52,12 @@ static const struct vhost_vq_stats_name_off vhost_vq_stat_strings[] = {
#define VHOST_NB_VQ_STATS RTE_DIM(vhost_vq_stat_strings)
+static int
+vhost_iotlb_miss(struct virtio_net *dev, uint64_t iova, uint8_t perm)
+{
+ return dev->backend_ops->iotlb_miss(dev, iova, perm);
+}
+
uint64_t
__vhost_iova_to_vva(struct virtio_net *dev, struct vhost_virtqueue *vq,
uint64_t iova, uint64_t *size, uint8_t perm)
@@ -86,7 +92,7 @@ __vhost_iova_to_vva(struct virtio_net *dev, struct vhost_virtqueue *vq,
vhost_user_iotlb_rd_unlock(vq);
vhost_user_iotlb_pending_insert(dev, iova, perm);
- if (vhost_user_iotlb_miss(dev, iova, perm)) {
+ if (vhost_iotlb_miss(dev, iova, perm)) {
VHOST_LOG_DATA(dev->ifname, ERR,
"IOTLB miss req failed for IOVA 0x%" PRIx64 "\n",
iova);
@@ -686,6 +692,11 @@ vhost_new_device(struct vhost_backend_ops *ops)
return -1;
}
+ if (ops->iotlb_miss == NULL) {
+ VHOST_LOG_CONFIG("device", ERR, "missing IOTLB miss backend op.\n");
+ return -1;
+ }
+
pthread_mutex_lock(&vhost_dev_lock);
for (i = 0; i < RTE_MAX_VHOST_DEVICE; i++) {
if (vhost_devices[i] == NULL)
diff --git a/lib/vhost/vhost.h b/lib/vhost/vhost.h
index f37e0b83b8..ee7640e901 100644
--- a/lib/vhost/vhost.h
+++ b/lib/vhost/vhost.h
@@ -89,13 +89,17 @@
for (iter = val; iter < num; iter++)
#endif
+struct virtio_net;
typedef void (*vhost_iotlb_remove_notify)(uint64_t addr, uint64_t off, uint64_t size);
+typedef int (*vhost_iotlb_miss_cb)(struct virtio_net *dev, uint64_t iova, uint8_t perm);
+
/**
* Structure that contains backend-specific ops.
*/
struct vhost_backend_ops {
vhost_iotlb_remove_notify iotlb_remove_notify;
+ vhost_iotlb_miss_cb iotlb_miss;
};
/**
diff --git a/lib/vhost/vhost_user.c b/lib/vhost/vhost_user.c
index 7655082c4b..972559a2b5 100644
--- a/lib/vhost/vhost_user.c
+++ b/lib/vhost/vhost_user.c
@@ -3305,7 +3305,7 @@ vhost_user_msg_handler(int vid, int fd)
return ret;
}
-int
+static int
vhost_user_iotlb_miss(struct virtio_net *dev, uint64_t iova, uint8_t perm)
{
int ret;
@@ -3465,7 +3465,9 @@ int rte_vhost_host_notifier_ctrl(int vid, uint16_t qid, bool enable)
return ret;
}
-static struct vhost_backend_ops vhost_user_backend_ops;
+static struct vhost_backend_ops vhost_user_backend_ops = {
+ .iotlb_miss = vhost_user_iotlb_miss,
+};
int
vhost_user_new_device(void)
diff --git a/lib/vhost/vhost_user.h b/lib/vhost/vhost_user.h
index 61456049c8..1ffeca92f3 100644
--- a/lib/vhost/vhost_user.h
+++ b/lib/vhost/vhost_user.h
@@ -179,7 +179,6 @@ struct __rte_packed vhu_msg_context {
/* vhost_user.c */
int vhost_user_msg_handler(int vid, int fd);
-int vhost_user_iotlb_miss(struct virtio_net *dev, uint64_t iova, uint8_t perm);
/* socket.c */
int read_fd_message(char *ifname, int sockfd, char *buf, int buflen, int *fds, int max_fds,
--
2.40.1
^ permalink raw reply [flat|nested] 50+ messages in thread
* RE: [PATCH v3 13/28] vhost: add helper for IOTLB misses
2023-05-25 16:25 ` [PATCH v3 13/28] vhost: add helper for IOTLB misses Maxime Coquelin
@ 2023-05-29 6:33 ` Xia, Chenbo
0 siblings, 0 replies; 50+ messages in thread
From: Xia, Chenbo @ 2023-05-29 6:33 UTC (permalink / raw)
To: Maxime Coquelin, dev, david.marchand, mkp, fbl, jasowang, Liang,
Cunming, Xie, Yongji, echaudro, eperezma, amorenoz, lulu
> -----Original Message-----
> From: Maxime Coquelin <maxime.coquelin@redhat.com>
> Sent: Friday, May 26, 2023 12:26 AM
> To: dev@dpdk.org; Xia, Chenbo <chenbo.xia@intel.com>;
> david.marchand@redhat.com; mkp@redhat.com; fbl@redhat.com;
> jasowang@redhat.com; Liang, Cunming <cunming.liang@intel.com>; Xie, Yongji
> <xieyongji@bytedance.com>; echaudro@redhat.com; eperezma@redhat.com;
> amorenoz@redhat.com; lulu@redhat.com
> Cc: Maxime Coquelin <maxime.coquelin@redhat.com>
> Subject: [PATCH v3 13/28] vhost: add helper for IOTLB misses
>
> This patch adds a helper for sending IOTLB misses, as VDUSE
> will use an ioctl while Vhost-user uses a dedicated
> backend request.
>
> Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
> ---
> lib/vhost/vhost.c | 13 ++++++++++++-
> lib/vhost/vhost.h | 4 ++++
> lib/vhost/vhost_user.c | 6 ++++--
> lib/vhost/vhost_user.h | 1 -
> 4 files changed, 20 insertions(+), 4 deletions(-)
>
> diff --git a/lib/vhost/vhost.c b/lib/vhost/vhost.c
> index 41f212315e..790eb06b28 100644
> --- a/lib/vhost/vhost.c
> +++ b/lib/vhost/vhost.c
> @@ -52,6 +52,12 @@ static const struct vhost_vq_stats_name_off vhost_vq_stat_strings[] = {
>
> #define VHOST_NB_VQ_STATS RTE_DIM(vhost_vq_stat_strings)
>
> +static int
> +vhost_iotlb_miss(struct virtio_net *dev, uint64_t iova, uint8_t perm)
> +{
> + return dev->backend_ops->iotlb_miss(dev, iova, perm);
> +}
> +
> uint64_t
> __vhost_iova_to_vva(struct virtio_net *dev, struct vhost_virtqueue *vq,
> uint64_t iova, uint64_t *size, uint8_t perm)
> @@ -86,7 +92,7 @@ __vhost_iova_to_vva(struct virtio_net *dev, struct vhost_virtqueue *vq,
> vhost_user_iotlb_rd_unlock(vq);
>
> vhost_user_iotlb_pending_insert(dev, iova, perm);
> - if (vhost_user_iotlb_miss(dev, iova, perm)) {
> + if (vhost_iotlb_miss(dev, iova, perm)) {
> VHOST_LOG_DATA(dev->ifname, ERR,
> "IOTLB miss req failed for IOVA 0x%" PRIx64 "\n",
> iova);
> @@ -686,6 +692,11 @@ vhost_new_device(struct vhost_backend_ops *ops)
> return -1;
> }
>
> + if (ops->iotlb_miss == NULL) {
> + VHOST_LOG_CONFIG("device", ERR, "missing IOTLB miss backend op.\n");
> + return -1;
> + }
> +
> pthread_mutex_lock(&vhost_dev_lock);
> for (i = 0; i < RTE_MAX_VHOST_DEVICE; i++) {
> if (vhost_devices[i] == NULL)
> diff --git a/lib/vhost/vhost.h b/lib/vhost/vhost.h
> index f37e0b83b8..ee7640e901 100644
> --- a/lib/vhost/vhost.h
> +++ b/lib/vhost/vhost.h
> @@ -89,13 +89,17 @@
> for (iter = val; iter < num; iter++)
> #endif
>
> +struct virtio_net;
> typedef void (*vhost_iotlb_remove_notify)(uint64_t addr, uint64_t off, uint64_t size);
>
> +typedef int (*vhost_iotlb_miss_cb)(struct virtio_net *dev, uint64_t iova, uint8_t perm);
> +
> /**
> * Structure that contains backend-specific ops.
> */
> struct vhost_backend_ops {
> vhost_iotlb_remove_notify iotlb_remove_notify;
> + vhost_iotlb_miss_cb iotlb_miss;
> };
>
> /**
> diff --git a/lib/vhost/vhost_user.c b/lib/vhost/vhost_user.c
> index 7655082c4b..972559a2b5 100644
> --- a/lib/vhost/vhost_user.c
> +++ b/lib/vhost/vhost_user.c
> @@ -3305,7 +3305,7 @@ vhost_user_msg_handler(int vid, int fd)
> return ret;
> }
>
> -int
> +static int
> vhost_user_iotlb_miss(struct virtio_net *dev, uint64_t iova, uint8_t perm)
> {
> int ret;
> @@ -3465,7 +3465,9 @@ int rte_vhost_host_notifier_ctrl(int vid, uint16_t qid, bool enable)
> return ret;
> }
>
> -static struct vhost_backend_ops vhost_user_backend_ops;
> +static struct vhost_backend_ops vhost_user_backend_ops = {
> + .iotlb_miss = vhost_user_iotlb_miss,
> +};
>
> int
> vhost_user_new_device(void)
> diff --git a/lib/vhost/vhost_user.h b/lib/vhost/vhost_user.h
> index 61456049c8..1ffeca92f3 100644
> --- a/lib/vhost/vhost_user.h
> +++ b/lib/vhost/vhost_user.h
> @@ -179,7 +179,6 @@ struct __rte_packed vhu_msg_context {
>
> /* vhost_user.c */
> int vhost_user_msg_handler(int vid, int fd);
> -int vhost_user_iotlb_miss(struct virtio_net *dev, uint64_t iova, uint8_t perm);
>
> /* socket.c */
> int read_fd_message(char *ifname, int sockfd, char *buf, int buflen, int *fds, int max_fds,
> --
> 2.40.1
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
^ permalink raw reply [flat|nested] 50+ messages in thread
* [PATCH v3 14/28] vhost: add helper for interrupt injection
2023-05-25 16:25 [PATCH v3 00/28] Add VDUSE support to Vhost library Maxime Coquelin
` (12 preceding siblings ...)
2023-05-25 16:25 ` [PATCH v3 13/28] vhost: add helper for IOTLB misses Maxime Coquelin
@ 2023-05-25 16:25 ` Maxime Coquelin
2023-05-26 8:54 ` David Marchand
2023-05-25 16:25 ` [PATCH v3 15/28] vhost: add API to set max queue pairs Maxime Coquelin
` (14 subsequent siblings)
28 siblings, 1 reply; 50+ messages in thread
From: Maxime Coquelin @ 2023-05-25 16:25 UTC (permalink / raw)
To: dev, chenbo.xia, david.marchand, mkp, fbl, jasowang,
cunming.liang, xieyongji, echaudro, eperezma, amorenoz, lulu
Cc: Maxime Coquelin
Vhost-user uses eventfd to inject IRQs, but VDUSE uses
an ioctl.
This patch prepares vhost_vring_call_split() and
vhost_vring_call_packed() to support VDUSE by introducing
a new helper.
It also adds a new counter for guest notification failures,
which can happen, for example, when the call file
descriptor is uninitialized.
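The success/failure accounting introduced here can be sketched as follows; the struct and function names are simplified stand-ins for the real `vhost_vring_inject_irq()` helper:

```c
#include <stdint.h>

/* Hypothetical sketch of the notification helper: failures (e.g. an
 * uninitialized call file descriptor, modelled as callfd < 0) bump an
 * error counter instead of the success counter. */
struct vq {
	int callfd;
	uint64_t guest_notifications;
	uint64_t guest_notifications_error;
};

static int
inject_irq(struct vq *vq)
{
	if (vq->callfd < 0)
		return -1; /* the real code would eventfd_write(vq->callfd, 1) */
	return 0;
}

static void
vring_inject_irq(struct vq *vq)
{
	if (inject_irq(vq))
		vq->guest_notifications_error++;
	else
		vq->guest_notifications++;
}
```

Folding the `callfd` validity check into the backend callback is what lets the split and packed ring paths share one helper instead of duplicating the eventfd logic.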
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
---
lib/vhost/vhost.c | 6 +++++
lib/vhost/vhost.h | 54 +++++++++++++++++++++++-------------------
lib/vhost/vhost_user.c | 10 ++++++++
3 files changed, 46 insertions(+), 24 deletions(-)
diff --git a/lib/vhost/vhost.c b/lib/vhost/vhost.c
index 790eb06b28..c07028f2b3 100644
--- a/lib/vhost/vhost.c
+++ b/lib/vhost/vhost.c
@@ -44,6 +44,7 @@ static const struct vhost_vq_stats_name_off vhost_vq_stat_strings[] = {
{"size_1024_1518_packets", offsetof(struct vhost_virtqueue, stats.size_bins[6])},
{"size_1519_max_packets", offsetof(struct vhost_virtqueue, stats.size_bins[7])},
{"guest_notifications", offsetof(struct vhost_virtqueue, stats.guest_notifications)},
+ {"guest_notifications_error", offsetof(struct vhost_virtqueue, stats.guest_notifications_error)},
{"iotlb_hits", offsetof(struct vhost_virtqueue, stats.iotlb_hits)},
{"iotlb_misses", offsetof(struct vhost_virtqueue, stats.iotlb_misses)},
{"inflight_submitted", offsetof(struct vhost_virtqueue, stats.inflight_submitted)},
@@ -697,6 +698,11 @@ vhost_new_device(struct vhost_backend_ops *ops)
return -1;
}
+ if (ops->inject_irq == NULL) {
+ VHOST_LOG_CONFIG("device", ERR, "missing IRQ injection backend op.\n");
+ return -1;
+ }
+
pthread_mutex_lock(&vhost_dev_lock);
for (i = 0; i < RTE_MAX_VHOST_DEVICE; i++) {
if (vhost_devices[i] == NULL)
diff --git a/lib/vhost/vhost.h b/lib/vhost/vhost.h
index ee7640e901..8f0875b4e2 100644
--- a/lib/vhost/vhost.h
+++ b/lib/vhost/vhost.h
@@ -90,16 +90,20 @@
#endif
struct virtio_net;
+struct vhost_virtqueue;
+
typedef void (*vhost_iotlb_remove_notify)(uint64_t addr, uint64_t off, uint64_t size);
typedef int (*vhost_iotlb_miss_cb)(struct virtio_net *dev, uint64_t iova, uint8_t perm);
+typedef int (*vhost_vring_inject_irq_cb)(struct virtio_net *dev, struct vhost_virtqueue *vq);
/**
* Structure that contains backend-specific ops.
*/
struct vhost_backend_ops {
vhost_iotlb_remove_notify iotlb_remove_notify;
vhost_iotlb_miss_cb iotlb_miss;
+ vhost_vring_inject_irq_cb inject_irq;
};
/**
@@ -149,6 +153,7 @@ struct virtqueue_stats {
/* Size bins in array as RFC 2819, undersized [0], 64 [1], etc */
uint64_t size_bins[8];
uint64_t guest_notifications;
+ uint64_t guest_notifications_error;
uint64_t iotlb_hits;
uint64_t iotlb_misses;
uint64_t inflight_submitted;
@@ -900,6 +905,24 @@ vhost_need_event(uint16_t event_idx, uint16_t new_idx, uint16_t old)
return (uint16_t)(new_idx - event_idx - 1) < (uint16_t)(new_idx - old);
}
+static __rte_always_inline void
+vhost_vring_inject_irq(struct virtio_net *dev, struct vhost_virtqueue *vq)
+{
+ int ret;
+
+ ret = dev->backend_ops->inject_irq(dev, vq);
+ if (ret) {
+ if (dev->flags & VIRTIO_DEV_STATS_ENABLED)
+ vq->stats.guest_notifications_error++;
+ return;
+ }
+
+ if (dev->flags & VIRTIO_DEV_STATS_ENABLED)
+ vq->stats.guest_notifications++;
+ if (dev->notify_ops->guest_notified)
+ dev->notify_ops->guest_notified(dev->vid);
+}
+
static __rte_always_inline void
vhost_vring_call_split(struct virtio_net *dev, struct vhost_virtqueue *vq)
{
@@ -919,25 +942,13 @@ vhost_vring_call_split(struct virtio_net *dev, struct vhost_virtqueue *vq)
"%s: used_event_idx=%d, old=%d, new=%d\n",
__func__, vhost_used_event(vq), old, new);
- if ((vhost_need_event(vhost_used_event(vq), new, old) ||
- unlikely(!signalled_used_valid)) &&
- vq->callfd >= 0) {
- eventfd_write(vq->callfd, (eventfd_t) 1);
- if (dev->flags & VIRTIO_DEV_STATS_ENABLED)
- vq->stats.guest_notifications++;
- if (dev->notify_ops->guest_notified)
- dev->notify_ops->guest_notified(dev->vid);
- }
+ if (vhost_need_event(vhost_used_event(vq), new, old) ||
+ unlikely(!signalled_used_valid))
+ vhost_vring_inject_irq(dev, vq);
} else {
/* Kick the guest if necessary. */
- if (!(vq->avail->flags & VRING_AVAIL_F_NO_INTERRUPT)
- && (vq->callfd >= 0)) {
- eventfd_write(vq->callfd, (eventfd_t)1);
- if (dev->flags & VIRTIO_DEV_STATS_ENABLED)
- vq->stats.guest_notifications++;
- if (dev->notify_ops->guest_notified)
- dev->notify_ops->guest_notified(dev->vid);
- }
+ if (!(vq->avail->flags & VRING_AVAIL_F_NO_INTERRUPT))
+ vhost_vring_inject_irq(dev, vq);
}
}
@@ -988,13 +999,8 @@ vhost_vring_call_packed(struct virtio_net *dev, struct vhost_virtqueue *vq)
if (vhost_need_event(off, new, old))
kick = true;
kick:
- if (kick && vq->callfd >= 0) {
- eventfd_write(vq->callfd, (eventfd_t)1);
- if (dev->flags & VIRTIO_DEV_STATS_ENABLED)
- vq->stats.guest_notifications++;
- if (dev->notify_ops->guest_notified)
- dev->notify_ops->guest_notified(dev->vid);
- }
+ if (kick)
+ vhost_vring_inject_irq(dev, vq);
}
static __rte_always_inline void
diff --git a/lib/vhost/vhost_user.c b/lib/vhost/vhost_user.c
index 972559a2b5..1362cf16e5 100644
--- a/lib/vhost/vhost_user.c
+++ b/lib/vhost/vhost_user.c
@@ -3465,8 +3465,18 @@ int rte_vhost_host_notifier_ctrl(int vid, uint16_t qid, bool enable)
return ret;
}
+static int
+vhost_user_inject_irq(struct virtio_net *dev __rte_unused, struct vhost_virtqueue *vq)
+{
+ if (vq->callfd < 0)
+ return -1;
+
+ return eventfd_write(vq->callfd, (eventfd_t)1);
+}
+
static struct vhost_backend_ops vhost_user_backend_ops = {
.iotlb_miss = vhost_user_iotlb_miss,
+ .inject_irq = vhost_user_inject_irq,
};
int
--
2.40.1
^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [PATCH v3 14/28] vhost: add helper for interrupt injection
2023-05-25 16:25 ` [PATCH v3 14/28] vhost: add helper for interrupt injection Maxime Coquelin
@ 2023-05-26 8:54 ` David Marchand
2023-06-01 13:58 ` Maxime Coquelin
0 siblings, 1 reply; 50+ messages in thread
From: David Marchand @ 2023-05-26 8:54 UTC (permalink / raw)
To: Maxime Coquelin
Cc: dev, chenbo.xia, mkp, fbl, jasowang, cunming.liang, xieyongji,
echaudro, eperezma, amorenoz, lulu
On Thu, May 25, 2023 at 6:26 PM Maxime Coquelin
<maxime.coquelin@redhat.com> wrote:
> @@ -900,6 +905,24 @@ vhost_need_event(uint16_t event_idx, uint16_t new_idx, uint16_t old)
> return (uint16_t)(new_idx - event_idx - 1) < (uint16_t)(new_idx - old);
> }
>
> +static __rte_always_inline void
> +vhost_vring_inject_irq(struct virtio_net *dev, struct vhost_virtqueue *vq)
> +{
> + int ret;
> +
> + ret = dev->backend_ops->inject_irq(dev, vq);
> + if (ret) {
No need for ret.
> + if (dev->flags & VIRTIO_DEV_STATS_ENABLED)
> + vq->stats.guest_notifications_error++;
> + return;
> + }
> +
> + if (dev->flags & VIRTIO_DEV_STATS_ENABLED)
> + vq->stats.guest_notifications++;
> + if (dev->notify_ops->guest_notified)
> + dev->notify_ops->guest_notified(dev->vid);
> +}
> +
> static __rte_always_inline void
> vhost_vring_call_split(struct virtio_net *dev, struct vhost_virtqueue *vq)
> {
--
David Marchand
^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [PATCH v3 14/28] vhost: add helper for interrupt injection
2023-05-26 8:54 ` David Marchand
@ 2023-06-01 13:58 ` Maxime Coquelin
0 siblings, 0 replies; 50+ messages in thread
From: Maxime Coquelin @ 2023-06-01 13:58 UTC (permalink / raw)
To: David Marchand
Cc: dev, chenbo.xia, mkp, fbl, jasowang, cunming.liang, xieyongji,
echaudro, eperezma, amorenoz, lulu
On 5/26/23 10:54, David Marchand wrote:
> On Thu, May 25, 2023 at 6:26 PM Maxime Coquelin
> <maxime.coquelin@redhat.com> wrote:
>> @@ -900,6 +905,24 @@ vhost_need_event(uint16_t event_idx, uint16_t new_idx, uint16_t old)
>> return (uint16_t)(new_idx - event_idx - 1) < (uint16_t)(new_idx - old);
>> }
>>
>> +static __rte_always_inline void
>> +vhost_vring_inject_irq(struct virtio_net *dev, struct vhost_virtqueue *vq)
>> +{
>> + int ret;
>> +
>> + ret = dev->backend_ops->inject_irq(dev, vq);
>> + if (ret) {
>
> No need for ret.
Agree. Fixed here and also in the new API introduced by Eelco.
Thanks,
Maxime
>> + if (dev->flags & VIRTIO_DEV_STATS_ENABLED)
>> + vq->stats.guest_notifications_error++;
>> + return;
>> + }
>> +
>> + if (dev->flags & VIRTIO_DEV_STATS_ENABLED)
>> + vq->stats.guest_notifications++;
>> + if (dev->notify_ops->guest_notified)
>> + dev->notify_ops->guest_notified(dev->vid);
>> +}
>> +
>> static __rte_always_inline void
>> vhost_vring_call_split(struct virtio_net *dev, struct vhost_virtqueue *vq)
>> {
>
>
^ permalink raw reply [flat|nested] 50+ messages in thread
* [PATCH v3 15/28] vhost: add API to set max queue pairs
2023-05-25 16:25 [PATCH v3 00/28] Add VDUSE support to Vhost library Maxime Coquelin
` (13 preceding siblings ...)
2023-05-25 16:25 ` [PATCH v3 14/28] vhost: add helper for interrupt injection Maxime Coquelin
@ 2023-05-25 16:25 ` Maxime Coquelin
2023-05-26 8:58 ` David Marchand
2023-05-25 16:25 ` [PATCH v3 16/28] net/vhost: use " Maxime Coquelin
` (13 subsequent siblings)
28 siblings, 1 reply; 50+ messages in thread
From: Maxime Coquelin @ 2023-05-25 16:25 UTC (permalink / raw)
To: dev, chenbo.xia, david.marchand, mkp, fbl, jasowang,
cunming.liang, xieyongji, echaudro, eperezma, amorenoz, lulu
Cc: Maxime Coquelin
This patch introduces a new rte_vhost_driver_set_max_queue_num
API as preliminary work for multiqueue support with VDUSE.
Indeed, with VDUSE we need to pre-allocate the vrings at
device creation time, so we need such an API to avoid
always allocating the 128 queue pairs supported by the
Vhost library.
Calling the API is optional; 128 queue pairs remains the
default.
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
---
doc/guides/prog_guide/vhost_lib.rst | 4 +++
doc/guides/rel_notes/release_23_07.rst | 5 ++++
lib/vhost/rte_vhost.h | 17 ++++++++++++
lib/vhost/socket.c | 36 ++++++++++++++++++++++++--
lib/vhost/version.map | 3 +++
5 files changed, 63 insertions(+), 2 deletions(-)
diff --git a/doc/guides/prog_guide/vhost_lib.rst b/doc/guides/prog_guide/vhost_lib.rst
index e8bb8c9b7b..cd4b109139 100644
--- a/doc/guides/prog_guide/vhost_lib.rst
+++ b/doc/guides/prog_guide/vhost_lib.rst
@@ -334,6 +334,10 @@ The following is an overview of some key Vhost API functions:
Clean DMA vChannel finished to use. After this function is called,
the specified DMA vChannel should no longer be used by the Vhost library.
+* ``rte_vhost_driver_set_max_queue_num(path, max_queue_pairs)``
+
+ Set the maximum number of queue pairs supported by the device.
+
Vhost-user Implementations
--------------------------
diff --git a/doc/guides/rel_notes/release_23_07.rst b/doc/guides/rel_notes/release_23_07.rst
index a9b1293689..fa889a5ee7 100644
--- a/doc/guides/rel_notes/release_23_07.rst
+++ b/doc/guides/rel_notes/release_23_07.rst
@@ -55,6 +55,11 @@ New Features
Also, make sure to start the actual text at the margin.
=======================================================
+* **Added Vhost API to set maximum queue pairs supported
+
+ Introduced ``rte_vhost_driver_set_max_queue_num()`` to be able to limit the
+ maximum number of supported queue pairs, required for VDUSE support.
+
Removed Items
-------------
diff --git a/lib/vhost/rte_vhost.h b/lib/vhost/rte_vhost.h
index 58a5d4be92..44cbfcb469 100644
--- a/lib/vhost/rte_vhost.h
+++ b/lib/vhost/rte_vhost.h
@@ -588,6 +588,23 @@ rte_vhost_driver_get_protocol_features(const char *path,
int
rte_vhost_driver_get_queue_num(const char *path, uint32_t *queue_num);
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change, or be removed, without prior notice.
+ *
+ * Set the maximum number of queue pairs supported by the device.
+ *
+ * @param path
+ * The vhost-user socket file path
+ * @param max_queue_pairs
+ * The maximum number of queue pairs
+ * @return
+ * 0 on success, -1 on failure
+ */
+__rte_experimental
+int
+rte_vhost_driver_set_max_queue_num(const char *path, uint32_t max_queue_pairs);
+
/**
* Get the feature bits after negotiation
*
diff --git a/lib/vhost/socket.c b/lib/vhost/socket.c
index ba54263824..e95c3ffeac 100644
--- a/lib/vhost/socket.c
+++ b/lib/vhost/socket.c
@@ -56,6 +56,8 @@ struct vhost_user_socket {
uint64_t protocol_features;
+ uint32_t max_queue_pairs;
+
struct rte_vdpa_device *vdpa_dev;
struct rte_vhost_device_ops const *notify_ops;
@@ -821,7 +823,7 @@ rte_vhost_driver_get_queue_num(const char *path, uint32_t *queue_num)
vdpa_dev = vsocket->vdpa_dev;
if (!vdpa_dev) {
- *queue_num = VHOST_MAX_QUEUE_PAIRS;
+ *queue_num = vsocket->max_queue_pairs;
goto unlock_exit;
}
@@ -831,7 +833,36 @@ rte_vhost_driver_get_queue_num(const char *path, uint32_t *queue_num)
goto unlock_exit;
}
- *queue_num = RTE_MIN((uint32_t)VHOST_MAX_QUEUE_PAIRS, vdpa_queue_num);
+ *queue_num = RTE_MIN(vsocket->max_queue_pairs, vdpa_queue_num);
+
+unlock_exit:
+ pthread_mutex_unlock(&vhost_user.mutex);
+ return ret;
+}
+
+int
+rte_vhost_driver_set_max_queue_num(const char *path, uint32_t max_queue_pairs)
+{
+ struct vhost_user_socket *vsocket;
+ int ret = 0;
+
+ VHOST_LOG_CONFIG(path, INFO, "Setting max queue pairs to %u\n", max_queue_pairs);
+
+ if (max_queue_pairs > VHOST_MAX_QUEUE_PAIRS) {
+ VHOST_LOG_CONFIG(path, ERR, "Library only supports up to %u queue pairs\n",
+ VHOST_MAX_QUEUE_PAIRS);
+ return -1;
+ }
+
+ pthread_mutex_lock(&vhost_user.mutex);
+ vsocket = find_vhost_user_socket(path);
+ if (!vsocket) {
+ VHOST_LOG_CONFIG(path, ERR, "socket file is not registered yet.\n");
+ ret = -1;
+ goto unlock_exit;
+ }
+
+ vsocket->max_queue_pairs = max_queue_pairs;
unlock_exit:
pthread_mutex_unlock(&vhost_user.mutex);
@@ -890,6 +921,7 @@ rte_vhost_driver_register(const char *path, uint64_t flags)
goto out_free;
}
vsocket->vdpa_dev = NULL;
+ vsocket->max_queue_pairs = VHOST_MAX_QUEUE_PAIRS;
vsocket->extbuf = flags & RTE_VHOST_USER_EXTBUF_SUPPORT;
vsocket->linearbuf = flags & RTE_VHOST_USER_LINEARBUF_SUPPORT;
vsocket->async_copy = flags & RTE_VHOST_USER_ASYNC_COPY;
diff --git a/lib/vhost/version.map b/lib/vhost/version.map
index d322a4a888..dffb126aa8 100644
--- a/lib/vhost/version.map
+++ b/lib/vhost/version.map
@@ -98,6 +98,9 @@ EXPERIMENTAL {
# added in 22.11
rte_vhost_async_dma_unconfigure;
rte_vhost_vring_call_nonblock;
+
+ # added in 23.07
+ rte_vhost_driver_set_max_queue_num;
};
INTERNAL {
--
2.40.1
^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [PATCH v3 15/28] vhost: add API to set max queue pairs
2023-05-25 16:25 ` [PATCH v3 15/28] vhost: add API to set max queue pairs Maxime Coquelin
@ 2023-05-26 8:58 ` David Marchand
2023-06-01 14:00 ` Maxime Coquelin
0 siblings, 1 reply; 50+ messages in thread
From: David Marchand @ 2023-05-26 8:58 UTC (permalink / raw)
To: Maxime Coquelin
Cc: dev, chenbo.xia, mkp, fbl, jasowang, cunming.liang, xieyongji,
echaudro, eperezma, amorenoz, lulu
On Thu, May 25, 2023 at 6:27 PM Maxime Coquelin
<maxime.coquelin@redhat.com> wrote:
> diff --git a/doc/guides/prog_guide/vhost_lib.rst b/doc/guides/prog_guide/vhost_lib.rst
> index e8bb8c9b7b..cd4b109139 100644
> --- a/doc/guides/prog_guide/vhost_lib.rst
> +++ b/doc/guides/prog_guide/vhost_lib.rst
> @@ -334,6 +334,10 @@ The following is an overview of some key Vhost API functions:
> Clean DMA vChannel finished to use. After this function is called,
> the specified DMA vChannel should no longer be used by the Vhost library.
>
> +* ``rte_vhost_driver_set_max_queue_num(path, max_queue_pairs)``
> +
> + Set the maximum number of queue pairs supported by the device.
> +
> Vhost-user Implementations
> --------------------------
>
> diff --git a/doc/guides/rel_notes/release_23_07.rst b/doc/guides/rel_notes/release_23_07.rst
> index a9b1293689..fa889a5ee7 100644
> --- a/doc/guides/rel_notes/release_23_07.rst
> +++ b/doc/guides/rel_notes/release_23_07.rst
> @@ -55,6 +55,11 @@ New Features
> Also, make sure to start the actual text at the margin.
> =======================================================
>
> +* **Added Vhost API to set maximum queue pairs supported
**Added Vhost API to set maximum queue pairs supported.**
> +
> + Introduced ``rte_vhost_driver_set_max_queue_num()`` to be able to limit the
> + maximum number of supported queue pairs, required for VDUSE support.
> +
>
> Removed Items
> -------------
--
David Marchand
^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [PATCH v3 15/28] vhost: add API to set max queue pairs
2023-05-26 8:58 ` David Marchand
@ 2023-06-01 14:00 ` Maxime Coquelin
0 siblings, 0 replies; 50+ messages in thread
From: Maxime Coquelin @ 2023-06-01 14:00 UTC (permalink / raw)
To: David Marchand
Cc: dev, chenbo.xia, mkp, fbl, jasowang, cunming.liang, xieyongji,
echaudro, eperezma, amorenoz, lulu
On 5/26/23 10:58, David Marchand wrote:
> On Thu, May 25, 2023 at 6:27 PM Maxime Coquelin
> <maxime.coquelin@redhat.com> wrote:
>> diff --git a/doc/guides/prog_guide/vhost_lib.rst b/doc/guides/prog_guide/vhost_lib.rst
>> index e8bb8c9b7b..cd4b109139 100644
>> --- a/doc/guides/prog_guide/vhost_lib.rst
>> +++ b/doc/guides/prog_guide/vhost_lib.rst
>> @@ -334,6 +334,10 @@ The following is an overview of some key Vhost API functions:
>> Clean DMA vChannel finished to use. After this function is called,
>> the specified DMA vChannel should no longer be used by the Vhost library.
>>
>> +* ``rte_vhost_driver_set_max_queue_num(path, max_queue_pairs)``
>> +
>> + Set the maximum number of queue pairs supported by the device.
>> +
>> Vhost-user Implementations
>> --------------------------
>>
>> diff --git a/doc/guides/rel_notes/release_23_07.rst b/doc/guides/rel_notes/release_23_07.rst
>> index a9b1293689..fa889a5ee7 100644
>> --- a/doc/guides/rel_notes/release_23_07.rst
>> +++ b/doc/guides/rel_notes/release_23_07.rst
>> @@ -55,6 +55,11 @@ New Features
>> Also, make sure to start the actual text at the margin.
>> =======================================================
>>
>> +* **Added Vhost API to set maximum queue pairs supported
>
> **Added Vhost API to set maximum queue pairs supported.**
Fixed.
Thanks,
Maxime
>
>> +
>> + Introduced ``rte_vhost_driver_set_max_queue_num()`` to be able to limit the
>> + maximum number of supported queue pairs, required for VDUSE support.
>> +
>>
>> Removed Items
>> -------------
>
^ permalink raw reply [flat|nested] 50+ messages in thread
* [PATCH v3 16/28] net/vhost: use API to set max queue pairs
2023-05-25 16:25 [PATCH v3 00/28] Add VDUSE support to Vhost library Maxime Coquelin
` (14 preceding siblings ...)
2023-05-25 16:25 ` [PATCH v3 15/28] vhost: add API to set max queue pairs Maxime Coquelin
@ 2023-05-25 16:25 ` Maxime Coquelin
2023-05-25 16:25 ` [PATCH v3 17/28] vhost: add control virtqueue support Maxime Coquelin
` (12 subsequent siblings)
28 siblings, 0 replies; 50+ messages in thread
From: Maxime Coquelin @ 2023-05-25 16:25 UTC (permalink / raw)
To: dev, chenbo.xia, david.marchand, mkp, fbl, jasowang,
cunming.liang, xieyongji, echaudro, eperezma, amorenoz, lulu
Cc: Maxime Coquelin
In order to support multiqueue with VDUSE, we need to be
able to limit the maximum number of queue pairs, to avoid
unnecessary memory consumption: the maximum number of queue
pairs needs to be allocated at device creation time, as
opposed to Vhost-user, which allocates them only when the
frontend initializes them.
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
---
drivers/net/vhost/rte_eth_vhost.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/drivers/net/vhost/rte_eth_vhost.c b/drivers/net/vhost/rte_eth_vhost.c
index 62ef955ebc..8d37ec9775 100644
--- a/drivers/net/vhost/rte_eth_vhost.c
+++ b/drivers/net/vhost/rte_eth_vhost.c
@@ -1013,6 +1013,9 @@ vhost_driver_setup(struct rte_eth_dev *eth_dev)
goto drv_unreg;
}
+ if (rte_vhost_driver_set_max_queue_num(internal->iface_name, internal->max_queues))
+ goto drv_unreg;
+
if (rte_vhost_driver_callback_register(internal->iface_name,
&vhost_ops) < 0) {
VHOST_LOG(ERR, "Can't register callbacks\n");
--
2.40.1
^ permalink raw reply [flat|nested] 50+ messages in thread
* [PATCH v3 17/28] vhost: add control virtqueue support
2023-05-25 16:25 [PATCH v3 00/28] Add VDUSE support to Vhost library Maxime Coquelin
` (15 preceding siblings ...)
2023-05-25 16:25 ` [PATCH v3 16/28] net/vhost: use " Maxime Coquelin
@ 2023-05-25 16:25 ` Maxime Coquelin
2023-05-29 6:51 ` Xia, Chenbo
2023-05-25 16:25 ` [PATCH v3 18/28] vhost: add VDUSE device creation and destruction Maxime Coquelin
` (11 subsequent siblings)
28 siblings, 1 reply; 50+ messages in thread
From: Maxime Coquelin @ 2023-05-25 16:25 UTC (permalink / raw)
To: dev, chenbo.xia, david.marchand, mkp, fbl, jasowang,
cunming.liang, xieyongji, echaudro, eperezma, amorenoz, lulu
Cc: Maxime Coquelin
In order to support multi-queue with VDUSE, control queue
support is required.
This patch adds the control queue implementation; it will be
used later when adding VDUSE support. Only the split ring
layout is supported for now, packed ring support will be
added later.
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
---
lib/vhost/meson.build | 1 +
lib/vhost/vhost.h | 2 +
lib/vhost/virtio_net_ctrl.c | 286 ++++++++++++++++++++++++++++++++++++
lib/vhost/virtio_net_ctrl.h | 10 ++
4 files changed, 299 insertions(+)
create mode 100644 lib/vhost/virtio_net_ctrl.c
create mode 100644 lib/vhost/virtio_net_ctrl.h
diff --git a/lib/vhost/meson.build b/lib/vhost/meson.build
index 0d1abf6283..83c8482c9e 100644
--- a/lib/vhost/meson.build
+++ b/lib/vhost/meson.build
@@ -27,6 +27,7 @@ sources = files(
'vhost_crypto.c',
'vhost_user.c',
'virtio_net.c',
+ 'virtio_net_ctrl.c',
)
headers = files(
'rte_vdpa.h',
diff --git a/lib/vhost/vhost.h b/lib/vhost/vhost.h
index 8f0875b4e2..76663aed24 100644
--- a/lib/vhost/vhost.h
+++ b/lib/vhost/vhost.h
@@ -525,6 +525,8 @@ struct virtio_net {
int postcopy_ufd;
int postcopy_listening;
+ struct vhost_virtqueue *cvq;
+
struct rte_vdpa_device *vdpa_dev;
/* context data for the external message handlers */
diff --git a/lib/vhost/virtio_net_ctrl.c b/lib/vhost/virtio_net_ctrl.c
new file mode 100644
index 0000000000..f4b8d5f7cc
--- /dev/null
+++ b/lib/vhost/virtio_net_ctrl.c
@@ -0,0 +1,286 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2023 Red Hat, Inc.
+ */
+
+#include <stdint.h>
+#include <stdio.h>
+#include <unistd.h>
+
+#include "iotlb.h"
+#include "vhost.h"
+#include "virtio_net_ctrl.h"
+
+struct virtio_net_ctrl {
+ uint8_t class;
+ uint8_t command;
+ uint8_t command_data[];
+};
+
+struct virtio_net_ctrl_elem {
+ struct virtio_net_ctrl *ctrl_req;
+ uint16_t head_idx;
+ uint16_t n_descs;
+ uint8_t *desc_ack;
+};
+
+static int
+virtio_net_ctrl_pop(struct virtio_net *dev, struct vhost_virtqueue *cvq,
+ struct virtio_net_ctrl_elem *ctrl_elem)
+ __rte_shared_locks_required(&cvq->iotlb_lock)
+{
+ uint16_t avail_idx, desc_idx, n_descs = 0;
+ uint64_t desc_len, desc_addr, desc_iova, data_len = 0;
+ uint8_t *ctrl_req;
+ struct vring_desc *descs;
+
+ avail_idx = __atomic_load_n(&cvq->avail->idx, __ATOMIC_ACQUIRE);
+ if (avail_idx == cvq->last_avail_idx) {
+ VHOST_LOG_CONFIG(dev->ifname, DEBUG, "Control queue empty\n");
+ return 0;
+ }
+
+ desc_idx = cvq->avail->ring[cvq->last_avail_idx];
+ if (desc_idx >= cvq->size) {
+ VHOST_LOG_CONFIG(dev->ifname, ERR, "Out of range desc index, dropping\n");
+ goto err;
+ }
+
+ ctrl_elem->head_idx = desc_idx;
+
+ if (cvq->desc[desc_idx].flags & VRING_DESC_F_INDIRECT) {
+ desc_len = cvq->desc[desc_idx].len;
+ desc_iova = cvq->desc[desc_idx].addr;
+
+ descs = (struct vring_desc *)(uintptr_t)vhost_iova_to_vva(dev, cvq,
+ desc_iova, &desc_len, VHOST_ACCESS_RO);
+ if (!descs || desc_len != cvq->desc[desc_idx].len) {
+ VHOST_LOG_CONFIG(dev->ifname, ERR, "Failed to map ctrl indirect descs\n");
+ goto err;
+ }
+
+ desc_idx = 0;
+ } else {
+ descs = cvq->desc;
+ }
+
+ while (1) {
+ desc_len = descs[desc_idx].len;
+ desc_iova = descs[desc_idx].addr;
+
+ n_descs++;
+
+ if (descs[desc_idx].flags & VRING_DESC_F_WRITE) {
+ if (ctrl_elem->desc_ack) {
+ VHOST_LOG_CONFIG(dev->ifname, ERR,
+ "Unexpected ctrl chain layout\n");
+ goto err;
+ }
+
+ if (desc_len != sizeof(uint8_t)) {
+ VHOST_LOG_CONFIG(dev->ifname, ERR,
+ "Invalid ack size for ctrl req, dropping\n");
+ goto err;
+ }
+
+ ctrl_elem->desc_ack = (uint8_t *)(uintptr_t)vhost_iova_to_vva(dev, cvq,
+ desc_iova, &desc_len, VHOST_ACCESS_WO);
+ if (!ctrl_elem->desc_ack || desc_len != sizeof(uint8_t)) {
+ VHOST_LOG_CONFIG(dev->ifname, ERR,
+ "Failed to map ctrl ack descriptor\n");
+ goto err;
+ }
+ } else {
+ if (ctrl_elem->desc_ack) {
+ VHOST_LOG_CONFIG(dev->ifname, ERR,
+ "Unexpected ctrl chain layout\n");
+ goto err;
+ }
+
+ data_len += desc_len;
+ }
+
+ if (!(descs[desc_idx].flags & VRING_DESC_F_NEXT))
+ break;
+
+ desc_idx = descs[desc_idx].next;
+ }
+
+ desc_idx = ctrl_elem->head_idx;
+
+ if (cvq->desc[desc_idx].flags & VRING_DESC_F_INDIRECT)
+ ctrl_elem->n_descs = 1;
+ else
+ ctrl_elem->n_descs = n_descs;
+
+ if (!ctrl_elem->desc_ack) {
+ VHOST_LOG_CONFIG(dev->ifname, ERR, "Missing ctrl ack descriptor\n");
+ goto err;
+ }
+
+ if (data_len < sizeof(ctrl_elem->ctrl_req->class) + sizeof(ctrl_elem->ctrl_req->command)) {
+ VHOST_LOG_CONFIG(dev->ifname, ERR, "Invalid control header size\n");
+ goto err;
+ }
+
+ ctrl_elem->ctrl_req = malloc(data_len);
+ if (!ctrl_elem->ctrl_req) {
+ VHOST_LOG_CONFIG(dev->ifname, ERR, "Failed to alloc ctrl request\n");
+ goto err;
+ }
+
+ ctrl_req = (uint8_t *)ctrl_elem->ctrl_req;
+
+ if (cvq->desc[desc_idx].flags & VRING_DESC_F_INDIRECT) {
+ desc_len = cvq->desc[desc_idx].len;
+ desc_iova = cvq->desc[desc_idx].addr;
+
+ descs = (struct vring_desc *)(uintptr_t)vhost_iova_to_vva(dev, cvq,
+ desc_iova, &desc_len, VHOST_ACCESS_RO);
+ if (!descs || desc_len != cvq->desc[desc_idx].len) {
+ VHOST_LOG_CONFIG(dev->ifname, ERR, "Failed to map ctrl indirect descs\n");
+ goto free_err;
+ }
+
+ desc_idx = 0;
+ } else {
+ descs = cvq->desc;
+ }
+
+ while (!(descs[desc_idx].flags & VRING_DESC_F_WRITE)) {
+ desc_len = descs[desc_idx].len;
+ desc_iova = descs[desc_idx].addr;
+
+ desc_addr = vhost_iova_to_vva(dev, cvq, desc_iova, &desc_len, VHOST_ACCESS_RO);
+ if (!desc_addr || desc_len < descs[desc_idx].len) {
+ VHOST_LOG_CONFIG(dev->ifname, ERR, "Failed to map ctrl descriptor\n");
+ goto free_err;
+ }
+
+ memcpy(ctrl_req, (void *)(uintptr_t)desc_addr, desc_len);
+ ctrl_req += desc_len;
+
+ if (!(descs[desc_idx].flags & VRING_DESC_F_NEXT))
+ break;
+
+ desc_idx = descs[desc_idx].next;
+ }
+
+ cvq->last_avail_idx++;
+ if (cvq->last_avail_idx >= cvq->size)
+ cvq->last_avail_idx -= cvq->size;
+
+ if (dev->features & (1ULL << VIRTIO_RING_F_EVENT_IDX))
+ vhost_avail_event(cvq) = cvq->last_avail_idx;
+
+ return 1;
+
+free_err:
+ free(ctrl_elem->ctrl_req);
+err:
+ cvq->last_avail_idx++;
+ if (cvq->last_avail_idx >= cvq->size)
+ cvq->last_avail_idx -= cvq->size;
+
+ if (dev->features & (1ULL << VIRTIO_RING_F_EVENT_IDX))
+ vhost_avail_event(cvq) = cvq->last_avail_idx;
+
+ return -1;
+}
+
+static uint8_t
+virtio_net_ctrl_handle_req(struct virtio_net *dev, struct virtio_net_ctrl *ctrl_req)
+{
+ uint8_t ret = VIRTIO_NET_ERR;
+
+ if (ctrl_req->class == VIRTIO_NET_CTRL_MQ &&
+ ctrl_req->command == VIRTIO_NET_CTRL_MQ_VQ_PAIRS_SET) {
+ uint16_t queue_pairs;
+ uint32_t i;
+
+ queue_pairs = *(uint16_t *)(uintptr_t)ctrl_req->command_data;
+ VHOST_LOG_CONFIG(dev->ifname, INFO, "Ctrl req: MQ %u queue pairs\n", queue_pairs);
+ ret = VIRTIO_NET_OK;
+
+ for (i = 0; i < dev->nr_vring; i++) {
+ struct vhost_virtqueue *vq = dev->virtqueue[i];
+ bool enable;
+
+ if (vq == dev->cvq)
+ continue;
+
+ if (i < queue_pairs * 2)
+ enable = true;
+ else
+ enable = false;
+
+ vq->enabled = enable;
+ if (dev->notify_ops->vring_state_changed)
+ dev->notify_ops->vring_state_changed(dev->vid, i, enable);
+ }
+ }
+
+ return ret;
+}
+
+static int
+virtio_net_ctrl_push(struct virtio_net *dev, struct virtio_net_ctrl_elem *ctrl_elem)
+{
+ struct vhost_virtqueue *cvq = dev->cvq;
+ struct vring_used_elem *used_elem;
+
+ used_elem = &cvq->used->ring[cvq->last_used_idx];
+ used_elem->id = ctrl_elem->head_idx;
+ used_elem->len = ctrl_elem->n_descs;
+
+ cvq->last_used_idx++;
+ if (cvq->last_used_idx >= cvq->size)
+ cvq->last_used_idx -= cvq->size;
+
+ __atomic_store_n(&cvq->used->idx, cvq->last_used_idx, __ATOMIC_RELEASE);
+
+ vhost_vring_call_split(dev, dev->cvq);
+
+ free(ctrl_elem->ctrl_req);
+
+ return 0;
+}
+
+int
+virtio_net_ctrl_handle(struct virtio_net *dev)
+{
+ int ret = 0;
+
+ if (dev->features & (1ULL << VIRTIO_F_RING_PACKED)) {
+ VHOST_LOG_CONFIG(dev->ifname, ERR, "Packed ring not supported yet\n");
+ return -1;
+ }
+
+ if (!dev->cvq) {
+ VHOST_LOG_CONFIG(dev->ifname, ERR, "missing control queue\n");
+ return -1;
+ }
+
+ rte_spinlock_lock(&dev->cvq->access_lock);
+ vhost_user_iotlb_rd_lock(dev->cvq);
+
+ while (1) {
+ struct virtio_net_ctrl_elem ctrl_elem;
+
+ memset(&ctrl_elem, 0, sizeof(struct virtio_net_ctrl_elem));
+
+ ret = virtio_net_ctrl_pop(dev, dev->cvq, &ctrl_elem);
+ if (ret <= 0)
+ break;
+
+ *ctrl_elem.desc_ack = virtio_net_ctrl_handle_req(dev, ctrl_elem.ctrl_req);
+
+ ret = virtio_net_ctrl_push(dev, &ctrl_elem);
+ if (ret < 0)
+ break;
+ }
+
+ vhost_user_iotlb_rd_unlock(dev->cvq);
+ rte_spinlock_unlock(&dev->cvq->access_lock);
+
+ return ret;
+}
diff --git a/lib/vhost/virtio_net_ctrl.h b/lib/vhost/virtio_net_ctrl.h
new file mode 100644
index 0000000000..9a90f4b9da
--- /dev/null
+++ b/lib/vhost/virtio_net_ctrl.h
@@ -0,0 +1,10 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2023 Red Hat, Inc.
+ */
+
+#ifndef _VIRTIO_NET_CTRL_H
+#define _VIRTIO_NET_CTRL_H
+
+int virtio_net_ctrl_handle(struct virtio_net *dev);
+
+#endif
--
2.40.1
^ permalink raw reply [flat|nested] 50+ messages in thread
* RE: [PATCH v3 17/28] vhost: add control virtqueue support
2023-05-25 16:25 ` [PATCH v3 17/28] vhost: add control virtqueue support Maxime Coquelin
@ 2023-05-29 6:51 ` Xia, Chenbo
0 siblings, 0 replies; 50+ messages in thread
From: Xia, Chenbo @ 2023-05-29 6:51 UTC (permalink / raw)
To: Maxime Coquelin, dev, david.marchand, mkp, fbl, jasowang, Liang,
Cunming, Xie, Yongji, echaudro, eperezma, amorenoz, lulu
> -----Original Message-----
> From: Maxime Coquelin <maxime.coquelin@redhat.com>
> Sent: Friday, May 26, 2023 12:26 AM
> To: dev@dpdk.org; Xia, Chenbo <chenbo.xia@intel.com>;
> david.marchand@redhat.com; mkp@redhat.com; fbl@redhat.com;
> jasowang@redhat.com; Liang, Cunming <cunming.liang@intel.com>; Xie, Yongji
> <xieyongji@bytedance.com>; echaudro@redhat.com; eperezma@redhat.com;
> amorenoz@redhat.com; lulu@redhat.com
> Cc: Maxime Coquelin <maxime.coquelin@redhat.com>
> Subject: [PATCH v3 17/28] vhost: add control virtqueue support
>
> In order to support multi-queue with VDUSE, having
> control queue support is required.
>
> This patch adds control queue implementation, it will be
> used later when adding VDUSE support. Only split ring
> layout is supported for now, packed ring support will be
> added later.
>
> Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
> ---
> lib/vhost/meson.build | 1 +
> lib/vhost/vhost.h | 2 +
> lib/vhost/virtio_net_ctrl.c | 286 ++++++++++++++++++++++++++++++++++++
> lib/vhost/virtio_net_ctrl.h | 10 ++
> 4 files changed, 299 insertions(+)
> create mode 100644 lib/vhost/virtio_net_ctrl.c
> create mode 100644 lib/vhost/virtio_net_ctrl.h
>
> diff --git a/lib/vhost/meson.build b/lib/vhost/meson.build
> index 0d1abf6283..83c8482c9e 100644
> --- a/lib/vhost/meson.build
> +++ b/lib/vhost/meson.build
> @@ -27,6 +27,7 @@ sources = files(
> 'vhost_crypto.c',
> 'vhost_user.c',
> 'virtio_net.c',
> + 'virtio_net_ctrl.c',
> )
> headers = files(
> 'rte_vdpa.h',
> diff --git a/lib/vhost/vhost.h b/lib/vhost/vhost.h
> index 8f0875b4e2..76663aed24 100644
> --- a/lib/vhost/vhost.h
> +++ b/lib/vhost/vhost.h
> @@ -525,6 +525,8 @@ struct virtio_net {
> int postcopy_ufd;
> int postcopy_listening;
>
> + struct vhost_virtqueue *cvq;
> +
> struct rte_vdpa_device *vdpa_dev;
>
> /* context data for the external message handlers */
> diff --git a/lib/vhost/virtio_net_ctrl.c b/lib/vhost/virtio_net_ctrl.c
> new file mode 100644
> index 0000000000..f4b8d5f7cc
> --- /dev/null
> +++ b/lib/vhost/virtio_net_ctrl.c
> @@ -0,0 +1,286 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + * Copyright (c) 2023 Red Hat, Inc.
> + */
> +
> +#include <stdint.h>
> +#include <stdio.h>
> +#include <unistd.h>
> +
> +#include "iotlb.h"
> +#include "vhost.h"
> +#include "virtio_net_ctrl.h"
> +
> +struct virtio_net_ctrl {
> + uint8_t class;
> + uint8_t command;
> + uint8_t command_data[];
> +};
> +
> +struct virtio_net_ctrl_elem {
> + struct virtio_net_ctrl *ctrl_req;
> + uint16_t head_idx;
> + uint16_t n_descs;
> + uint8_t *desc_ack;
> +};
> +
> +static int
> +virtio_net_ctrl_pop(struct virtio_net *dev, struct vhost_virtqueue *cvq,
> + struct virtio_net_ctrl_elem *ctrl_elem)
> + __rte_shared_locks_required(&cvq->iotlb_lock)
> +{
> + uint16_t avail_idx, desc_idx, n_descs = 0;
> + uint64_t desc_len, desc_addr, desc_iova, data_len = 0;
> + uint8_t *ctrl_req;
> + struct vring_desc *descs;
> +
> + avail_idx = __atomic_load_n(&cvq->avail->idx, __ATOMIC_ACQUIRE);
> + if (avail_idx == cvq->last_avail_idx) {
> + VHOST_LOG_CONFIG(dev->ifname, DEBUG, "Control queue empty\n");
> + return 0;
> + }
> +
> + desc_idx = cvq->avail->ring[cvq->last_avail_idx];
> + if (desc_idx >= cvq->size) {
> + VHOST_LOG_CONFIG(dev->ifname, ERR, "Out of range desc index,
> dropping\n");
> + goto err;
> + }
> +
> + ctrl_elem->head_idx = desc_idx;
> +
> + if (cvq->desc[desc_idx].flags & VRING_DESC_F_INDIRECT) {
> + desc_len = cvq->desc[desc_idx].len;
> + desc_iova = cvq->desc[desc_idx].addr;
> +
> + descs = (struct vring_desc *)(uintptr_t)vhost_iova_to_vva(dev,
> cvq,
> + desc_iova, &desc_len, VHOST_ACCESS_RO);
> + if (!descs || desc_len != cvq->desc[desc_idx].len) {
> + VHOST_LOG_CONFIG(dev->ifname, ERR, "Failed to map ctrl
> indirect descs\n");
> + goto err;
> + }
> +
> + desc_idx = 0;
> + } else {
> + descs = cvq->desc;
> + }
> +
> + while (1) {
> + desc_len = descs[desc_idx].len;
> + desc_iova = descs[desc_idx].addr;
> +
> + n_descs++;
> +
> + if (descs[desc_idx].flags & VRING_DESC_F_WRITE) {
> + if (ctrl_elem->desc_ack) {
> + VHOST_LOG_CONFIG(dev->ifname, ERR,
> + "Unexpected ctrl chain layout\n");
> + goto err;
> + }
> +
> + if (desc_len != sizeof(uint8_t)) {
> + VHOST_LOG_CONFIG(dev->ifname, ERR,
> + "Invalid ack size for ctrl req,
> dropping\n");
> + goto err;
> + }
> +
> + ctrl_elem->desc_ack = (uint8_t
> *)(uintptr_t)vhost_iova_to_vva(dev, cvq,
> + desc_iova, &desc_len, VHOST_ACCESS_WO);
> + if (!ctrl_elem->desc_ack || desc_len != sizeof(uint8_t))
> {
> + VHOST_LOG_CONFIG(dev->ifname, ERR,
> + "Failed to map ctrl ack descriptor\n");
> + goto err;
> + }
> + } else {
> + if (ctrl_elem->desc_ack) {
> + VHOST_LOG_CONFIG(dev->ifname, ERR,
> + "Unexpected ctrl chain layout\n");
> + goto err;
> + }
> +
> + data_len += desc_len;
> + }
> +
> + if (!(descs[desc_idx].flags & VRING_DESC_F_NEXT))
> + break;
> +
> + desc_idx = descs[desc_idx].next;
> + }
> +
> + desc_idx = ctrl_elem->head_idx;
> +
> + if (cvq->desc[desc_idx].flags & VRING_DESC_F_INDIRECT)
> + ctrl_elem->n_descs = 1;
> + else
> + ctrl_elem->n_descs = n_descs;
> +
> + if (!ctrl_elem->desc_ack) {
> + VHOST_LOG_CONFIG(dev->ifname, ERR, "Missing ctrl ack
> descriptor\n");
> + goto err;
> + }
> +
> + if (data_len < sizeof(ctrl_elem->ctrl_req->class) +
> sizeof(ctrl_elem->ctrl_req->command)) {
> + VHOST_LOG_CONFIG(dev->ifname, ERR, "Invalid control header
> size\n");
> + goto err;
> + }
> +
> + ctrl_elem->ctrl_req = malloc(data_len);
> + if (!ctrl_elem->ctrl_req) {
> + VHOST_LOG_CONFIG(dev->ifname, ERR, "Failed to alloc ctrl
> request\n");
> + goto err;
> + }
> +
> + ctrl_req = (uint8_t *)ctrl_elem->ctrl_req;
> +
> + if (cvq->desc[desc_idx].flags & VRING_DESC_F_INDIRECT) {
> + desc_len = cvq->desc[desc_idx].len;
> + desc_iova = cvq->desc[desc_idx].addr;
> +
> + descs = (struct vring_desc *)(uintptr_t)vhost_iova_to_vva(dev,
> cvq,
> + desc_iova, &desc_len, VHOST_ACCESS_RO);
> + if (!descs || desc_len != cvq->desc[desc_idx].len) {
> + VHOST_LOG_CONFIG(dev->ifname, ERR, "Failed to map ctrl
> indirect descs\n");
> + goto free_err;
> + }
> +
> + desc_idx = 0;
> + } else {
> + descs = cvq->desc;
> + }
> +
> + while (!(descs[desc_idx].flags & VRING_DESC_F_WRITE)) {
> + desc_len = descs[desc_idx].len;
> + desc_iova = descs[desc_idx].addr;
> +
> + desc_addr = vhost_iova_to_vva(dev, cvq, desc_iova, &desc_len,
> VHOST_ACCESS_RO);
> + if (!desc_addr || desc_len < descs[desc_idx].len) {
> + VHOST_LOG_CONFIG(dev->ifname, ERR, "Failed to map ctrl
> descriptor\n");
> + goto free_err;
> + }
> +
> + memcpy(ctrl_req, (void *)(uintptr_t)desc_addr, desc_len);
> + ctrl_req += desc_len;
> +
> + if (!(descs[desc_idx].flags & VRING_DESC_F_NEXT))
> + break;
> +
> + desc_idx = descs[desc_idx].next;
> + }
> +
> + cvq->last_avail_idx++;
> + if (cvq->last_avail_idx >= cvq->size)
> + cvq->last_avail_idx -= cvq->size;
> +
> + if (dev->features & (1ULL << VIRTIO_RING_F_EVENT_IDX))
> + vhost_avail_event(cvq) = cvq->last_avail_idx;
> +
> + return 1;
> +
> +free_err:
> + free(ctrl_elem->ctrl_req);
> +err:
> + cvq->last_avail_idx++;
> + if (cvq->last_avail_idx >= cvq->size)
> + cvq->last_avail_idx -= cvq->size;
> +
> + if (dev->features & (1ULL << VIRTIO_RING_F_EVENT_IDX))
> + vhost_avail_event(cvq) = cvq->last_avail_idx;
> +
> + return -1;
> +}
> +
> +static uint8_t
> +virtio_net_ctrl_handle_req(struct virtio_net *dev, struct virtio_net_ctrl *ctrl_req)
> +{
> + uint8_t ret = VIRTIO_NET_ERR;
> +
> + if (ctrl_req->class == VIRTIO_NET_CTRL_MQ &&
> + ctrl_req->command == VIRTIO_NET_CTRL_MQ_VQ_PAIRS_SET) {
> + uint16_t queue_pairs;
> + uint32_t i;
> +
> + queue_pairs = *(uint16_t *)(uintptr_t)ctrl_req->command_data;
> + VHOST_LOG_CONFIG(dev->ifname, INFO, "Ctrl req: MQ %u queue pairs\n", queue_pairs);
> + ret = VIRTIO_NET_OK;
> +
> + for (i = 0; i < dev->nr_vring; i++) {
> + struct vhost_virtqueue *vq = dev->virtqueue[i];
> + bool enable;
> +
> + if (vq == dev->cvq)
> + continue;
> +
> + if (i < queue_pairs * 2)
> + enable = true;
> + else
> + enable = false;
> +
> + vq->enabled = enable;
> + if (dev->notify_ops->vring_state_changed)
> + dev->notify_ops->vring_state_changed(dev->vid, i, enable);
> + }
> + }
> +
> + return ret;
> +}
> +
> +static int
> +virtio_net_ctrl_push(struct virtio_net *dev, struct virtio_net_ctrl_elem *ctrl_elem)
> +{
> + struct vhost_virtqueue *cvq = dev->cvq;
> + struct vring_used_elem *used_elem;
> +
> + used_elem = &cvq->used->ring[cvq->last_used_idx];
> + used_elem->id = ctrl_elem->head_idx;
> + used_elem->len = ctrl_elem->n_descs;
> +
> + cvq->last_used_idx++;
> + if (cvq->last_used_idx >= cvq->size)
> + cvq->last_used_idx -= cvq->size;
> +
> + __atomic_store_n(&cvq->used->idx, cvq->last_used_idx, __ATOMIC_RELEASE);
> +
> + vhost_vring_call_split(dev, dev->cvq);
> +
> + free(ctrl_elem->ctrl_req);
> +
> + return 0;
> +}
> +
> +int
> +virtio_net_ctrl_handle(struct virtio_net *dev)
> +{
> + int ret = 0;
> +
> + if (dev->features & (1ULL << VIRTIO_F_RING_PACKED)) {
> + VHOST_LOG_CONFIG(dev->ifname, ERR, "Packed ring not supported yet\n");
> + return -1;
> + }
> +
> + if (!dev->cvq) {
> + VHOST_LOG_CONFIG(dev->ifname, ERR, "missing control queue\n");
> + return -1;
> + }
> +
> + rte_spinlock_lock(&dev->cvq->access_lock);
> + vhost_user_iotlb_rd_lock(dev->cvq);
> +
> + while (1) {
> + struct virtio_net_ctrl_elem ctrl_elem;
> +
> + memset(&ctrl_elem, 0, sizeof(struct virtio_net_ctrl_elem));
> +
> + ret = virtio_net_ctrl_pop(dev, dev->cvq, &ctrl_elem);
> + if (ret <= 0)
> + break;
> +
> + *ctrl_elem.desc_ack = virtio_net_ctrl_handle_req(dev, ctrl_elem.ctrl_req);
> +
> + ret = virtio_net_ctrl_push(dev, &ctrl_elem);
> + if (ret < 0)
> + break;
> + }
> +
> + vhost_user_iotlb_rd_unlock(dev->cvq);
> + rte_spinlock_unlock(&dev->cvq->access_lock);
> +
> + return ret;
> +}
> diff --git a/lib/vhost/virtio_net_ctrl.h b/lib/vhost/virtio_net_ctrl.h
> new file mode 100644
> index 0000000000..9a90f4b9da
> --- /dev/null
> +++ b/lib/vhost/virtio_net_ctrl.h
> @@ -0,0 +1,10 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + * Copyright (c) 2023 Red Hat, Inc.
> + */
> +
> +#ifndef _VIRTIO_NET_CTRL_H
> +#define _VIRTIO_NET_CTRL_H
> +
> +int virtio_net_ctrl_handle(struct virtio_net *dev);
> +
> +#endif
> --
> 2.40.1
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
* [PATCH v3 18/28] vhost: add VDUSE device creation and destruction
2023-05-25 16:25 [PATCH v3 00/28] Add VDUSE support to Vhost library Maxime Coquelin
` (16 preceding siblings ...)
2023-05-25 16:25 ` [PATCH v3 17/28] vhost: add control virtqueue support Maxime Coquelin
@ 2023-05-25 16:25 ` Maxime Coquelin
2023-05-26 9:11 ` David Marchand
2023-05-25 16:25 ` [PATCH v3 19/28] vhost: add VDUSE callback for IOTLB miss Maxime Coquelin
` (10 subsequent siblings)
28 siblings, 1 reply; 50+ messages in thread
From: Maxime Coquelin @ 2023-05-25 16:25 UTC (permalink / raw)
To: dev, chenbo.xia, david.marchand, mkp, fbl, jasowang,
cunming.liang, xieyongji, echaudro, eperezma, amorenoz, lulu
Cc: Maxime Coquelin
This patch adds initial support for VDUSE, which includes
the device creation and destruction.
It does not include the virtqueues configuration, so it is
not functional at this point.
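As the socket.c hunk below shows, the library tells VDUSE devices apart
from vhost-user sockets purely by the "/dev/vduse/" path prefix passed
to rte_vhost_driver_register(). A minimal sketch of just that check
(the helper name is made up for illustration; the real code inlines the
strncmp() in rte_vhost_driver_register()):

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper mirroring the prefix test in the hunk below:
 * a path under /dev/vduse/ selects the VDUSE backend, anything else
 * stays on the existing vhost-user socket path. */
static bool
is_vduse_path(const char *path)
{
	return strncmp("/dev/vduse/", path, strlen("/dev/vduse/")) == 0;
}
```

Applications therefore keep calling the existing register/start API
unchanged; only the path they pass differs.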
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
---
lib/vhost/meson.build | 4 +
lib/vhost/socket.c | 34 ++++---
lib/vhost/vduse.c | 201 ++++++++++++++++++++++++++++++++++++++++++
lib/vhost/vduse.h | 33 +++++++
lib/vhost/vhost.h | 2 +
5 files changed, 262 insertions(+), 12 deletions(-)
create mode 100644 lib/vhost/vduse.c
create mode 100644 lib/vhost/vduse.h
diff --git a/lib/vhost/meson.build b/lib/vhost/meson.build
index 83c8482c9e..e63072d7a1 100644
--- a/lib/vhost/meson.build
+++ b/lib/vhost/meson.build
@@ -29,6 +29,10 @@ sources = files(
'virtio_net.c',
'virtio_net_ctrl.c',
)
+if cc.has_header('linux/vduse.h')
+ sources += files('vduse.c')
+ cflags += '-DVHOST_HAS_VDUSE'
+endif
headers = files(
'rte_vdpa.h',
'rte_vhost.h',
diff --git a/lib/vhost/socket.c b/lib/vhost/socket.c
index e95c3ffeac..a8a1c4cd2b 100644
--- a/lib/vhost/socket.c
+++ b/lib/vhost/socket.c
@@ -18,6 +18,7 @@
#include <rte_log.h>
#include "fd_man.h"
+#include "vduse.h"
#include "vhost.h"
#include "vhost_user.h"
@@ -35,6 +36,7 @@ struct vhost_user_socket {
int socket_fd;
struct sockaddr_un un;
bool is_server;
+ bool is_vduse;
bool reconnect;
bool iommu_support;
bool use_builtin_virtio_net;
@@ -992,18 +994,21 @@ rte_vhost_driver_register(const char *path, uint64_t flags)
#endif
}
- if ((flags & RTE_VHOST_USER_CLIENT) != 0) {
- vsocket->reconnect = !(flags & RTE_VHOST_USER_NO_RECONNECT);
- if (vsocket->reconnect && reconn_tid == 0) {
- if (vhost_user_reconnect_init() != 0)
- goto out_mutex;
- }
+ if (!strncmp("/dev/vduse/", path, strlen("/dev/vduse/"))) {
+ vsocket->is_vduse = true;
} else {
- vsocket->is_server = true;
- }
- ret = create_unix_socket(vsocket);
- if (ret < 0) {
- goto out_mutex;
+ if ((flags & RTE_VHOST_USER_CLIENT) != 0) {
+ vsocket->reconnect = !(flags & RTE_VHOST_USER_NO_RECONNECT);
+ if (vsocket->reconnect && reconn_tid == 0) {
+ if (vhost_user_reconnect_init() != 0)
+ goto out_mutex;
+ }
+ } else {
+ vsocket->is_server = true;
+ }
+ ret = create_unix_socket(vsocket);
+ if (ret < 0)
+ goto out_mutex;
}
vhost_user.vsockets[vhost_user.vsocket_cnt++] = vsocket;
@@ -1068,7 +1073,9 @@ rte_vhost_driver_unregister(const char *path)
if (strcmp(vsocket->path, path))
continue;
- if (vsocket->is_server) {
+ if (vsocket->is_vduse) {
+ vduse_device_destroy(path);
+ } else if (vsocket->is_server) {
/*
* If r/wcb is executing, release vhost_user's
* mutex lock, and try again since the r/wcb
@@ -1171,6 +1178,9 @@ rte_vhost_driver_start(const char *path)
if (!vsocket)
return -1;
+ if (vsocket->is_vduse)
+ return vduse_device_create(path);
+
if (fdset_tid == 0) {
/**
* create a pipe which will be waited by poll and notified to
diff --git a/lib/vhost/vduse.c b/lib/vhost/vduse.c
new file mode 100644
index 0000000000..d67818bfb5
--- /dev/null
+++ b/lib/vhost/vduse.c
@@ -0,0 +1,201 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2023 Red Hat, Inc.
+ */
+
+#include <stdint.h>
+#include <stdio.h>
+#include <unistd.h>
+#include <fcntl.h>
+
+
+#include <linux/vduse.h>
+#include <linux/virtio_net.h>
+
+#include <sys/ioctl.h>
+#include <sys/mman.h>
+
+#include <rte_common.h>
+
+#include "vduse.h"
+#include "vhost.h"
+
+#define VHOST_VDUSE_API_VERSION 0
+#define VDUSE_CTRL_PATH "/dev/vduse/control"
+
+#define VDUSE_NET_SUPPORTED_FEATURES ((1ULL << VIRTIO_NET_F_MRG_RXBUF) | \
+ (1ULL << VIRTIO_F_ANY_LAYOUT) | \
+ (1ULL << VIRTIO_F_VERSION_1) | \
+ (1ULL << VIRTIO_NET_F_GSO) | \
+ (1ULL << VIRTIO_NET_F_HOST_TSO4) | \
+ (1ULL << VIRTIO_NET_F_HOST_TSO6) | \
+ (1ULL << VIRTIO_NET_F_HOST_UFO) | \
+ (1ULL << VIRTIO_NET_F_HOST_ECN) | \
+ (1ULL << VIRTIO_NET_F_CSUM) | \
+ (1ULL << VIRTIO_NET_F_GUEST_CSUM) | \
+ (1ULL << VIRTIO_NET_F_GUEST_TSO4) | \
+ (1ULL << VIRTIO_NET_F_GUEST_TSO6) | \
+ (1ULL << VIRTIO_NET_F_GUEST_UFO) | \
+ (1ULL << VIRTIO_NET_F_GUEST_ECN) | \
+ (1ULL << VIRTIO_RING_F_INDIRECT_DESC) | \
+ (1ULL << VIRTIO_F_IN_ORDER) | \
+ (1ULL << VIRTIO_F_IOMMU_PLATFORM))
+
+static struct vhost_backend_ops vduse_backend_ops = {
+};
+
+int
+vduse_device_create(const char *path)
+{
+ int control_fd, dev_fd, vid, ret;
+ uint32_t i;
+ struct virtio_net *dev;
+ uint64_t ver = VHOST_VDUSE_API_VERSION;
+ struct vduse_dev_config *dev_config = NULL;
+ const char *name = path + strlen("/dev/vduse/");
+
+ control_fd = open(VDUSE_CTRL_PATH, O_RDWR);
+ if (control_fd < 0) {
+ VHOST_LOG_CONFIG(name, ERR, "Failed to open %s: %s\n",
+ VDUSE_CTRL_PATH, strerror(errno));
+ return -1;
+ }
+
+ if (ioctl(control_fd, VDUSE_SET_API_VERSION, &ver)) {
+ VHOST_LOG_CONFIG(name, ERR, "Failed to set API version: %" PRIu64 ": %s\n",
+ ver, strerror(errno));
+ ret = -1;
+ goto out_ctrl_close;
+ }
+
+ dev_config = malloc(offsetof(struct vduse_dev_config, config));
+ if (!dev_config) {
+ VHOST_LOG_CONFIG(name, ERR, "Failed to allocate VDUSE config\n");
+ ret = -1;
+ goto out_ctrl_close;
+ }
+
+ memset(dev_config, 0, sizeof(struct vduse_dev_config));
+
+ strncpy(dev_config->name, name, VDUSE_NAME_MAX - 1);
+ dev_config->device_id = VIRTIO_ID_NET;
+ dev_config->vendor_id = 0;
+ dev_config->features = VDUSE_NET_SUPPORTED_FEATURES;
+ dev_config->vq_num = 2;
+ dev_config->vq_align = sysconf(_SC_PAGE_SIZE);
+ dev_config->config_size = 0;
+
+ ret = ioctl(control_fd, VDUSE_CREATE_DEV, dev_config);
+ if (ret < 0) {
+ VHOST_LOG_CONFIG(name, ERR, "Failed to create VDUSE device: %s\n",
+ strerror(errno));
+ goto out_free;
+ }
+
+ dev_fd = open(path, O_RDWR);
+ if (dev_fd < 0) {
+ VHOST_LOG_CONFIG(name, ERR, "Failed to open device %s: %s\n",
+ path, strerror(errno));
+ ret = -1;
+ goto out_dev_close;
+ }
+
+ ret = fcntl(dev_fd, F_SETFL, O_NONBLOCK);
+ if (ret < 0) {
+ VHOST_LOG_CONFIG(name, ERR, "Failed to set chardev as non-blocking: %s\n",
+ strerror(errno));
+ goto out_dev_close;
+ }
+
+ vid = vhost_new_device(&vduse_backend_ops);
+ if (vid < 0) {
+ VHOST_LOG_CONFIG(name, ERR, "Failed to create new Vhost device\n");
+ ret = -1;
+ goto out_dev_close;
+ }
+
+ dev = get_device(vid);
+ if (!dev) {
+ ret = -1;
+ goto out_dev_close;
+ }
+
+ strncpy(dev->ifname, path, IF_NAME_SZ - 1);
+ dev->vduse_ctrl_fd = control_fd;
+ dev->vduse_dev_fd = dev_fd;
+ vhost_setup_virtio_net(dev->vid, true, true, true, true);
+
+ for (i = 0; i < 2; i++) {
+ struct vduse_vq_config vq_cfg = { 0 };
+
+ ret = alloc_vring_queue(dev, i);
+ if (ret) {
+ VHOST_LOG_CONFIG(name, ERR, "Failed to alloc vring %d metadata\n", i);
+ goto out_dev_destroy;
+ }
+
+ vq_cfg.index = i;
+ vq_cfg.max_size = 1024;
+
+ ret = ioctl(dev->vduse_dev_fd, VDUSE_VQ_SETUP, &vq_cfg);
+ if (ret) {
+ VHOST_LOG_CONFIG(name, ERR, "Failed to set-up VQ %d\n", i);
+ goto out_dev_destroy;
+ }
+ }
+
+ free(dev_config);
+
+ return 0;
+
+out_dev_destroy:
+ vhost_destroy_device(vid);
+out_dev_close:
+ if (dev_fd >= 0)
+ close(dev_fd);
+ ioctl(control_fd, VDUSE_DESTROY_DEV, name);
+out_free:
+ free(dev_config);
+out_ctrl_close:
+ close(control_fd);
+
+ return ret;
+}
+
+int
+vduse_device_destroy(const char *path)
+{
+ const char *name = path + strlen("/dev/vduse/");
+ struct virtio_net *dev;
+ int vid, ret;
+
+ for (vid = 0; vid < RTE_MAX_VHOST_DEVICE; vid++) {
+ dev = vhost_devices[vid];
+
+ if (dev == NULL)
+ continue;
+
+ if (!strcmp(path, dev->ifname))
+ break;
+ }
+
+ if (vid == RTE_MAX_VHOST_DEVICE)
+ return -1;
+
+ if (dev->vduse_dev_fd >= 0) {
+ close(dev->vduse_dev_fd);
+ dev->vduse_dev_fd = -1;
+ }
+
+ if (dev->vduse_ctrl_fd >= 0) {
+ ret = ioctl(dev->vduse_ctrl_fd, VDUSE_DESTROY_DEV, name);
+ if (ret)
+ VHOST_LOG_CONFIG(name, ERR, "Failed to destroy VDUSE device: %s\n",
+ strerror(errno));
+ close(dev->vduse_ctrl_fd);
+ dev->vduse_ctrl_fd = -1;
+ }
+
+ vhost_destroy_device(vid);
+
+ return 0;
+}
diff --git a/lib/vhost/vduse.h b/lib/vhost/vduse.h
new file mode 100644
index 0000000000..a15e5d4c16
--- /dev/null
+++ b/lib/vhost/vduse.h
@@ -0,0 +1,33 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2023 Red Hat, Inc.
+ */
+
+#ifndef _VDUSE_H
+#define _VDUSE_H
+
+#include "vhost.h"
+
+#ifdef VHOST_HAS_VDUSE
+
+int vduse_device_create(const char *path);
+int vduse_device_destroy(const char *path);
+
+#else
+
+static inline int
+vduse_device_create(const char *path)
+{
+ VHOST_LOG_CONFIG(path, ERR, "VDUSE support disabled at build time\n");
+ return -1;
+}
+
+static inline int
+vduse_device_destroy(const char *path)
+{
+ VHOST_LOG_CONFIG(path, ERR, "VDUSE support disabled at build time\n");
+ return -1;
+}
+
+#endif /* VHOST_HAS_VDUSE */
+
+#endif /* _VDUSE_H */
diff --git a/lib/vhost/vhost.h b/lib/vhost/vhost.h
index 76663aed24..c8f2a0d43a 100644
--- a/lib/vhost/vhost.h
+++ b/lib/vhost/vhost.h
@@ -524,6 +524,8 @@ struct virtio_net {
int postcopy_ufd;
int postcopy_listening;
+ int vduse_ctrl_fd;
+ int vduse_dev_fd;
struct vhost_virtqueue *cvq;
--
2.40.1
* Re: [PATCH v3 18/28] vhost: add VDUSE device creation and destruction
2023-05-25 16:25 ` [PATCH v3 18/28] vhost: add VDUSE device creation and destruction Maxime Coquelin
@ 2023-05-26 9:11 ` David Marchand
2023-06-01 14:05 ` Maxime Coquelin
0 siblings, 1 reply; 50+ messages in thread
From: David Marchand @ 2023-05-26 9:11 UTC (permalink / raw)
To: Maxime Coquelin
Cc: dev, chenbo.xia, mkp, fbl, jasowang, cunming.liang, xieyongji,
echaudro, eperezma, amorenoz, lulu
On Thu, May 25, 2023 at 6:27 PM Maxime Coquelin
<maxime.coquelin@redhat.com> wrote:
> diff --git a/lib/vhost/vduse.c b/lib/vhost/vduse.c
> new file mode 100644
> index 0000000000..d67818bfb5
> --- /dev/null
> +++ b/lib/vhost/vduse.c
[snip]
> +#define VDUSE_NET_SUPPORTED_FEATURES ((1ULL << VIRTIO_NET_F_MRG_RXBUF) | \
> + (1ULL << VIRTIO_F_ANY_LAYOUT) | \
> + (1ULL << VIRTIO_F_VERSION_1) | \
> + (1ULL << VIRTIO_NET_F_GSO) | \
> + (1ULL << VIRTIO_NET_F_HOST_TSO4) | \
> + (1ULL << VIRTIO_NET_F_HOST_TSO6) | \
> + (1ULL << VIRTIO_NET_F_HOST_UFO) | \
> + (1ULL << VIRTIO_NET_F_HOST_ECN) | \
> + (1ULL << VIRTIO_NET_F_CSUM) | \
> + (1ULL << VIRTIO_NET_F_GUEST_CSUM) | \
> + (1ULL << VIRTIO_NET_F_GUEST_TSO4) | \
> + (1ULL << VIRTIO_NET_F_GUEST_TSO6) | \
> + (1ULL << VIRTIO_NET_F_GUEST_UFO) | \
> + (1ULL << VIRTIO_NET_F_GUEST_ECN) | \
> + (1ULL << VIRTIO_RING_F_INDIRECT_DESC) | \
> + (1ULL << VIRTIO_F_IN_ORDER) | \
> + (1ULL << VIRTIO_F_IOMMU_PLATFORM))
That's a lot of indent/spaces.
#define VDUSE_NET_SUPPORTED_FEATURES (\
(1ULL << VIRTIO_NET_F_MRG_RXBUF) | \
(1ULL << VIRTIO_F_ANY_LAYOUT) | \
Plus, can't we use RTE_BIT64? (this could be a cleanup to do on the
whole vhost library)
--
David Marchand
* Re: [PATCH v3 18/28] vhost: add VDUSE device creation and destruction
2023-05-26 9:11 ` David Marchand
@ 2023-06-01 14:05 ` Maxime Coquelin
0 siblings, 0 replies; 50+ messages in thread
From: Maxime Coquelin @ 2023-06-01 14:05 UTC (permalink / raw)
To: David Marchand
Cc: dev, chenbo.xia, mkp, fbl, jasowang, cunming.liang, xieyongji,
echaudro, eperezma, amorenoz, lulu
On 5/26/23 11:11, David Marchand wrote:
> On Thu, May 25, 2023 at 6:27 PM Maxime Coquelin
> <maxime.coquelin@redhat.com> wrote:
>> diff --git a/lib/vhost/vduse.c b/lib/vhost/vduse.c
>> new file mode 100644
>> index 0000000000..d67818bfb5
>> --- /dev/null
>> +++ b/lib/vhost/vduse.c
>
> [snip]
>
>> +#define VDUSE_NET_SUPPORTED_FEATURES ((1ULL << VIRTIO_NET_F_MRG_RXBUF) | \
>> + (1ULL << VIRTIO_F_ANY_LAYOUT) | \
>> + (1ULL << VIRTIO_F_VERSION_1) | \
>> + (1ULL << VIRTIO_NET_F_GSO) | \
>> + (1ULL << VIRTIO_NET_F_HOST_TSO4) | \
>> + (1ULL << VIRTIO_NET_F_HOST_TSO6) | \
>> + (1ULL << VIRTIO_NET_F_HOST_UFO) | \
>> + (1ULL << VIRTIO_NET_F_HOST_ECN) | \
>> + (1ULL << VIRTIO_NET_F_CSUM) | \
>> + (1ULL << VIRTIO_NET_F_GUEST_CSUM) | \
>> + (1ULL << VIRTIO_NET_F_GUEST_TSO4) | \
>> + (1ULL << VIRTIO_NET_F_GUEST_TSO6) | \
>> + (1ULL << VIRTIO_NET_F_GUEST_UFO) | \
>> + (1ULL << VIRTIO_NET_F_GUEST_ECN) | \
>> + (1ULL << VIRTIO_RING_F_INDIRECT_DESC) | \
>> + (1ULL << VIRTIO_F_IN_ORDER) | \
>> + (1ULL << VIRTIO_F_IOMMU_PLATFORM))
>
> That's a lot of indent/spaces.
>
> #define VDUSE_NET_SUPPORTED_FEATURES (\
> (1ULL << VIRTIO_NET_F_MRG_RXBUF) | \
> (1ULL << VIRTIO_F_ANY_LAYOUT) | \
>
>
> Plus, can't we use RTE_BIT64? (this could be a cleanup to do on the
> whole vhost library)
>
>
Agree we should move to RTE_BIT64(), I'll do that in another patch for
-rc2 if that is fine to you (including the indents/spaces reduction).
Maxime
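For reference, a sketch of what the discussed RTE_BIT64() conversion
could look like. DPDK's macro is stood in by a local BIT64() so the
snippet is self-contained, and only three feature bits are shown (their
values follow the Virtio spec; the full list lives in vduse.c):

```c
#include <assert.h>
#include <stdint.h>

/* Local stand-in for RTE_BIT64(); in DPDK it expands to a 64-bit
 * shifted constant just like this. */
#define BIT64(n) (UINT64_C(1) << (n))

/* Virtio feature bit numbers, per the Virtio specification. */
#define VIRTIO_NET_F_CSUM        0
#define VIRTIO_NET_F_MRG_RXBUF   15
#define VIRTIO_F_VERSION_1       32

/* Trimmed feature mask, formatted as suggested in the review. */
#define SUPPORTED_FEATURES ( \
	BIT64(VIRTIO_NET_F_CSUM) | \
	BIT64(VIRTIO_NET_F_MRG_RXBUF) | \
	BIT64(VIRTIO_F_VERSION_1))
```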
* [PATCH v3 19/28] vhost: add VDUSE callback for IOTLB miss
2023-05-25 16:25 [PATCH v3 00/28] Add VDUSE support to Vhost library Maxime Coquelin
` (17 preceding siblings ...)
2023-05-25 16:25 ` [PATCH v3 18/28] vhost: add VDUSE device creation and destruction Maxime Coquelin
@ 2023-05-25 16:25 ` Maxime Coquelin
2023-05-25 16:25 ` [PATCH v3 20/28] vhost: add VDUSE callback for IOTLB entry removal Maxime Coquelin
` (9 subsequent siblings)
28 siblings, 0 replies; 50+ messages in thread
From: Maxime Coquelin @ 2023-05-25 16:25 UTC (permalink / raw)
To: dev, chenbo.xia, david.marchand, mkp, fbl, jasowang,
cunming.liang, xieyongji, echaudro, eperezma, amorenoz, lulu
Cc: Maxime Coquelin
This patch implements the VDUSE callback for IOTLB misses,
which is done by using the VDUSE VDUSE_IOTLB_GET_FD ioctl.
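The interesting part of the handler below is the mapping arithmetic:
the kernel replies with an fd plus (start, last, offset), and the
backend mmaps offset + size bytes so that the translated range begins
at mmap_addr + offset. A stand-alone sketch of just that arithmetic,
with a temporary file standing in for the fd the real ioctl returns:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Map an IOTLB entry the way the patch below does: the usable size is
 * last - start + 1, and offset extra bytes are mapped in front of it
 * so the entry's data sits at mmap_addr + offset. */
static void *
map_iotlb_entry(int fd, uint64_t start, uint64_t last, uint64_t offset)
{
	uint64_t size = last - start + 1;

	return mmap(NULL, size + offset, PROT_READ | PROT_WRITE,
			MAP_SHARED, fd, 0);
}
```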
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
---
lib/vhost/vduse.c | 58 +++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 58 insertions(+)
diff --git a/lib/vhost/vduse.c b/lib/vhost/vduse.c
index d67818bfb5..f72c7bf6ab 100644
--- a/lib/vhost/vduse.c
+++ b/lib/vhost/vduse.c
@@ -13,9 +13,11 @@
#include <sys/ioctl.h>
#include <sys/mman.h>
+#include <sys/stat.h>
#include <rte_common.h>
+#include "iotlb.h"
#include "vduse.h"
#include "vhost.h"
@@ -40,7 +42,63 @@
(1ULL << VIRTIO_F_IN_ORDER) | \
(1ULL << VIRTIO_F_IOMMU_PLATFORM))
+static int
+vduse_iotlb_miss(struct virtio_net *dev, uint64_t iova, uint8_t perm __rte_unused)
+{
+ struct vduse_iotlb_entry entry;
+ uint64_t size, page_size;
+ struct stat stat;
+ void *mmap_addr;
+ int fd, ret;
+
+ entry.start = iova;
+ entry.last = iova + 1;
+
+ ret = ioctl(dev->vduse_dev_fd, VDUSE_IOTLB_GET_FD, &entry);
+ if (ret < 0) {
+ VHOST_LOG_CONFIG(dev->ifname, ERR, "Failed to get IOTLB entry for 0x%" PRIx64 "\n",
+ iova);
+ return -1;
+ }
+
+ fd = ret;
+
+ VHOST_LOG_CONFIG(dev->ifname, DEBUG, "New IOTLB entry:\n");
+ VHOST_LOG_CONFIG(dev->ifname, DEBUG, "\tIOVA: %" PRIx64 " - %" PRIx64 "\n",
+ (uint64_t)entry.start, (uint64_t)entry.last);
+ VHOST_LOG_CONFIG(dev->ifname, DEBUG, "\toffset: %" PRIx64 "\n", (uint64_t)entry.offset);
+ VHOST_LOG_CONFIG(dev->ifname, DEBUG, "\tfd: %d\n", fd);
+ VHOST_LOG_CONFIG(dev->ifname, DEBUG, "\tperm: %x\n", entry.perm);
+
+ size = entry.last - entry.start + 1;
+ mmap_addr = mmap(0, size + entry.offset, entry.perm, MAP_SHARED, fd, 0);
+ if (!mmap_addr) {
+ VHOST_LOG_CONFIG(dev->ifname, ERR,
+ "Failed to mmap IOTLB entry for 0x%" PRIx64 "\n", iova);
+ ret = -1;
+ goto close_fd;
+ }
+
+ ret = fstat(fd, &stat);
+ if (ret < 0) {
+ VHOST_LOG_CONFIG(dev->ifname, ERR, "Failed to get page size.\n");
+ munmap(mmap_addr, entry.offset + size);
+ goto close_fd;
+ }
+ page_size = (uint64_t)stat.st_blksize;
+
+ vhost_user_iotlb_cache_insert(dev, entry.start, (uint64_t)(uintptr_t)mmap_addr,
+ entry.offset, size, page_size, entry.perm);
+
+ ret = 0;
+close_fd:
+ close(fd);
+
+ return ret;
+}
+
static struct vhost_backend_ops vduse_backend_ops = {
+ .iotlb_miss = vduse_iotlb_miss,
};
int
--
2.40.1
* [PATCH v3 20/28] vhost: add VDUSE callback for IOTLB entry removal
2023-05-25 16:25 [PATCH v3 00/28] Add VDUSE support to Vhost library Maxime Coquelin
` (18 preceding siblings ...)
2023-05-25 16:25 ` [PATCH v3 19/28] vhost: add VDUSE callback for IOTLB miss Maxime Coquelin
@ 2023-05-25 16:25 ` Maxime Coquelin
2023-05-29 6:51 ` Xia, Chenbo
2023-05-25 16:25 ` [PATCH v3 21/28] vhost: add VDUSE callback for IRQ injection Maxime Coquelin
` (8 subsequent siblings)
28 siblings, 1 reply; 50+ messages in thread
From: Maxime Coquelin @ 2023-05-25 16:25 UTC (permalink / raw)
To: dev, chenbo.xia, david.marchand, mkp, fbl, jasowang,
cunming.liang, xieyongji, echaudro, eperezma, amorenoz, lulu
Cc: Maxime Coquelin
This patch implements the VDUSE callback for IOTLB entry
removal, where it unmaps the pages from the invalidated
IOTLB entry.
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
---
lib/vhost/vduse.c | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/lib/vhost/vduse.c b/lib/vhost/vduse.c
index f72c7bf6ab..58c1b384a8 100644
--- a/lib/vhost/vduse.c
+++ b/lib/vhost/vduse.c
@@ -42,6 +42,12 @@
(1ULL << VIRTIO_F_IN_ORDER) | \
(1ULL << VIRTIO_F_IOMMU_PLATFORM))
+static void
+vduse_iotlb_remove_notify(uint64_t addr, uint64_t offset, uint64_t size)
+{
+ munmap((void *)(uintptr_t)addr, offset + size);
+}
+
static int
vduse_iotlb_miss(struct virtio_net *dev, uint64_t iova, uint8_t perm __rte_unused)
{
@@ -99,6 +105,7 @@ vduse_iotlb_miss(struct virtio_net *dev, uint64_t iova, uint8_t perm __rte_unuse
static struct vhost_backend_ops vduse_backend_ops = {
.iotlb_miss = vduse_iotlb_miss,
+ .iotlb_remove_notify = vduse_iotlb_remove_notify,
};
int
--
2.40.1
* RE: [PATCH v3 20/28] vhost: add VDUSE callback for IOTLB entry removal
2023-05-25 16:25 ` [PATCH v3 20/28] vhost: add VDUSE callback for IOTLB entry removal Maxime Coquelin
@ 2023-05-29 6:51 ` Xia, Chenbo
0 siblings, 0 replies; 50+ messages in thread
From: Xia, Chenbo @ 2023-05-29 6:51 UTC (permalink / raw)
To: Maxime Coquelin, dev, david.marchand, mkp, fbl, jasowang, Liang,
Cunming, Xie, Yongji, echaudro, eperezma, amorenoz, lulu
> -----Original Message-----
> From: Maxime Coquelin <maxime.coquelin@redhat.com>
> Sent: Friday, May 26, 2023 12:26 AM
> To: dev@dpdk.org; Xia, Chenbo <chenbo.xia@intel.com>;
> david.marchand@redhat.com; mkp@redhat.com; fbl@redhat.com;
> jasowang@redhat.com; Liang, Cunming <cunming.liang@intel.com>; Xie, Yongji
> <xieyongji@bytedance.com>; echaudro@redhat.com; eperezma@redhat.com;
> amorenoz@redhat.com; lulu@redhat.com
> Cc: Maxime Coquelin <maxime.coquelin@redhat.com>
> Subject: [PATCH v3 20/28] vhost: add VDUSE callback for IOTLB entry
> removal
>
> This patch implements the VDUSE callback for IOTLB entry
> removal, where it unmaps the pages from the invalidated
> IOTLB entry.
>
> Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
> ---
> lib/vhost/vduse.c | 7 +++++++
> 1 file changed, 7 insertions(+)
>
> diff --git a/lib/vhost/vduse.c b/lib/vhost/vduse.c
> index f72c7bf6ab..58c1b384a8 100644
> --- a/lib/vhost/vduse.c
> +++ b/lib/vhost/vduse.c
> @@ -42,6 +42,12 @@
> (1ULL << VIRTIO_F_IN_ORDER) | \
> (1ULL << VIRTIO_F_IOMMU_PLATFORM))
>
> +static void
> +vduse_iotlb_remove_notify(uint64_t addr, uint64_t offset, uint64_t size)
> +{
> + munmap((void *)(uintptr_t)addr, offset + size);
> +}
> +
> static int
> vduse_iotlb_miss(struct virtio_net *dev, uint64_t iova, uint8_t perm __rte_unused)
> {
> @@ -99,6 +105,7 @@ vduse_iotlb_miss(struct virtio_net *dev, uint64_t iova, uint8_t perm __rte_unuse
>
> static struct vhost_backend_ops vduse_backend_ops = {
> .iotlb_miss = vduse_iotlb_miss,
> + .iotlb_remove_notify = vduse_iotlb_remove_notify,
> };
>
> int
> --
> 2.40.1
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
* [PATCH v3 21/28] vhost: add VDUSE callback for IRQ injection
2023-05-25 16:25 [PATCH v3 00/28] Add VDUSE support to Vhost library Maxime Coquelin
` (19 preceding siblings ...)
2023-05-25 16:25 ` [PATCH v3 20/28] vhost: add VDUSE callback for IOTLB entry removal Maxime Coquelin
@ 2023-05-25 16:25 ` Maxime Coquelin
2023-05-25 16:25 ` [PATCH v3 22/28] vhost: add VDUSE events handler Maxime Coquelin
` (7 subsequent siblings)
28 siblings, 0 replies; 50+ messages in thread
From: Maxime Coquelin @ 2023-05-25 16:25 UTC (permalink / raw)
To: dev, chenbo.xia, david.marchand, mkp, fbl, jasowang,
cunming.liang, xieyongji, echaudro, eperezma, amorenoz, lulu
Cc: Maxime Coquelin
This patch implements the VDUSE callback for kicking
virtqueues.
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
---
lib/vhost/vduse.c | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/lib/vhost/vduse.c b/lib/vhost/vduse.c
index 58c1b384a8..d39e39b9dc 100644
--- a/lib/vhost/vduse.c
+++ b/lib/vhost/vduse.c
@@ -42,6 +42,12 @@
(1ULL << VIRTIO_F_IN_ORDER) | \
(1ULL << VIRTIO_F_IOMMU_PLATFORM))
+static int
+vduse_inject_irq(struct virtio_net *dev, struct vhost_virtqueue *vq)
+{
+ return ioctl(dev->vduse_dev_fd, VDUSE_VQ_INJECT_IRQ, &vq->index);
+}
+
static void
vduse_iotlb_remove_notify(uint64_t addr, uint64_t offset, uint64_t size)
{
@@ -106,6 +112,7 @@ vduse_iotlb_miss(struct virtio_net *dev, uint64_t iova, uint8_t perm __rte_unuse
static struct vhost_backend_ops vduse_backend_ops = {
.iotlb_miss = vduse_iotlb_miss,
.iotlb_remove_notify = vduse_iotlb_remove_notify,
+ .inject_irq = vduse_inject_irq,
};
int
--
2.40.1
* [PATCH v3 22/28] vhost: add VDUSE events handler
2023-05-25 16:25 [PATCH v3 00/28] Add VDUSE support to Vhost library Maxime Coquelin
` (20 preceding siblings ...)
2023-05-25 16:25 ` [PATCH v3 21/28] vhost: add VDUSE callback for IRQ injection Maxime Coquelin
@ 2023-05-25 16:25 ` Maxime Coquelin
2023-05-25 16:25 ` [PATCH v3 23/28] vhost: add support for virtqueue state get event Maxime Coquelin
` (6 subsequent siblings)
28 siblings, 0 replies; 50+ messages in thread
From: Maxime Coquelin @ 2023-05-25 16:25 UTC (permalink / raw)
To: dev, chenbo.xia, david.marchand, mkp, fbl, jasowang,
cunming.liang, xieyongji, echaudro, eperezma, amorenoz, lulu
Cc: Maxime Coquelin
This patch makes use of Vhost lib's FD manager to install
a handler for VDUSE events occurring on the VDUSE device FD.
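The handler reads one fixed-size request per wakeup and rejects short
reads, as the diff below does. A simplified, self-contained sketch of
that read path (the request layout here is invented for illustration;
the real one is struct vduse_dev_request from linux/vduse.h):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>

/* Illustrative request layout only, not the kernel ABI. */
struct demo_request {
	uint32_t type;
	uint32_t request_id;
	uint8_t payload[16];
};

/* Read exactly one request; treat a read error or a short read as a
 * failure, mirroring the two error branches in the handler below. */
static int
read_request(int fd, struct demo_request *req)
{
	ssize_t ret = read(fd, req, sizeof(*req));

	if (ret < 0)
		return -1;	/* read error */
	if (ret < (ssize_t)sizeof(*req))
		return -1;	/* incomplete request */
	return 0;
}
```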
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
---
lib/vhost/vduse.c | 101 ++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 101 insertions(+)
diff --git a/lib/vhost/vduse.c b/lib/vhost/vduse.c
index d39e39b9dc..92c515cff2 100644
--- a/lib/vhost/vduse.c
+++ b/lib/vhost/vduse.c
@@ -17,6 +17,7 @@
#include <rte_common.h>
+#include "fd_man.h"
#include "iotlb.h"
#include "vduse.h"
#include "vhost.h"
@@ -42,6 +43,31 @@
(1ULL << VIRTIO_F_IN_ORDER) | \
(1ULL << VIRTIO_F_IOMMU_PLATFORM))
+struct vduse {
+ struct fdset fdset;
+};
+
+static struct vduse vduse = {
+ .fdset = {
+ .fd = { [0 ... MAX_FDS - 1] = {-1, NULL, NULL, NULL, 0} },
+ .fd_mutex = PTHREAD_MUTEX_INITIALIZER,
+ .fd_pooling_mutex = PTHREAD_MUTEX_INITIALIZER,
+ .num = 0
+ },
+};
+
+static bool vduse_events_thread;
+
+static const char * const vduse_reqs_str[] = {
+ "VDUSE_GET_VQ_STATE",
+ "VDUSE_SET_STATUS",
+ "VDUSE_UPDATE_IOTLB",
+};
+
+#define vduse_req_id_to_str(id) \
+ (id < RTE_DIM(vduse_reqs_str) ? \
+ vduse_reqs_str[id] : "Unknown")
+
static int
vduse_inject_irq(struct virtio_net *dev, struct vhost_virtqueue *vq)
{
@@ -115,16 +141,80 @@ static struct vhost_backend_ops vduse_backend_ops = {
.inject_irq = vduse_inject_irq,
};
+static void
+vduse_events_handler(int fd, void *arg, int *remove __rte_unused)
+{
+ struct virtio_net *dev = arg;
+ struct vduse_dev_request req;
+ struct vduse_dev_response resp;
+ int ret;
+
+ memset(&resp, 0, sizeof(resp));
+
+ ret = read(fd, &req, sizeof(req));
+ if (ret < 0) {
+ VHOST_LOG_CONFIG(dev->ifname, ERR, "Failed to read request: %s\n",
+ strerror(errno));
+ return;
+ } else if (ret < (int)sizeof(req)) {
+ VHOST_LOG_CONFIG(dev->ifname, ERR, "Incomplete to read request %d\n", ret);
+ return;
+ }
+
+ VHOST_LOG_CONFIG(dev->ifname, INFO, "New request: %s (%u)\n",
+ vduse_req_id_to_str(req.type), req.type);
+
+ switch (req.type) {
+ default:
+ resp.result = VDUSE_REQ_RESULT_FAILED;
+ break;
+ }
+
+ resp.request_id = req.request_id;
+
+ ret = write(dev->vduse_dev_fd, &resp, sizeof(resp));
+ if (ret != sizeof(resp)) {
+ VHOST_LOG_CONFIG(dev->ifname, ERR, "Failed to write response %s\n",
+ strerror(errno));
+ return;
+ }
+ VHOST_LOG_CONFIG(dev->ifname, INFO, "Request %s (%u) handled successfully\n",
+ vduse_req_id_to_str(req.type), req.type);
+}
+
int
vduse_device_create(const char *path)
{
int control_fd, dev_fd, vid, ret;
+ pthread_t fdset_tid;
uint32_t i;
struct virtio_net *dev;
uint64_t ver = VHOST_VDUSE_API_VERSION;
struct vduse_dev_config *dev_config = NULL;
const char *name = path + strlen("/dev/vduse/");
+ /* If first device, create events dispatcher thread */
+ if (vduse_events_thread == false) {
+ /**
+ * create a pipe which will be waited by poll and notified to
+ * rebuild the wait list of poll.
+ */
+ if (fdset_pipe_init(&vduse.fdset) < 0) {
+ VHOST_LOG_CONFIG(path, ERR, "failed to create pipe for vduse fdset\n");
+ return -1;
+ }
+
+ ret = rte_ctrl_thread_create(&fdset_tid, "vduse-events", NULL,
+ fdset_event_dispatch, &vduse.fdset);
+ if (ret != 0) {
+ VHOST_LOG_CONFIG(path, ERR, "failed to create vduse fdset handling thread\n");
+ fdset_pipe_uninit(&vduse.fdset);
+ return -1;
+ }
+
+ vduse_events_thread = true;
+ }
+
control_fd = open(VDUSE_CTRL_PATH, O_RDWR);
if (control_fd < 0) {
VHOST_LOG_CONFIG(name, ERR, "Failed to open %s: %s\n",
@@ -215,6 +305,14 @@ vduse_device_create(const char *path)
}
}
+ ret = fdset_add(&vduse.fdset, dev->vduse_dev_fd, vduse_events_handler, NULL, dev);
+ if (ret) {
+ VHOST_LOG_CONFIG(name, ERR, "Failed to add fd %d to vduse fdset\n",
+ dev->vduse_dev_fd);
+ goto out_dev_destroy;
+ }
+ fdset_pipe_notify(&vduse.fdset);
+
free(dev_config);
return 0;
@@ -253,6 +351,9 @@ vduse_device_destroy(const char *path)
if (vid == RTE_MAX_VHOST_DEVICE)
return -1;
+ fdset_del(&vduse.fdset, dev->vduse_dev_fd);
+ fdset_pipe_notify(&vduse.fdset);
+
if (dev->vduse_dev_fd >= 0) {
close(dev->vduse_dev_fd);
dev->vduse_dev_fd = -1;
--
2.40.1
* [PATCH v3 23/28] vhost: add support for virtqueue state get event
2023-05-25 16:25 [PATCH v3 00/28] Add VDUSE support to Vhost library Maxime Coquelin
` (21 preceding siblings ...)
2023-05-25 16:25 ` [PATCH v3 22/28] vhost: add VDUSE events handler Maxime Coquelin
@ 2023-05-25 16:25 ` Maxime Coquelin
2023-05-25 16:25 ` [PATCH v3 24/28] vhost: add support for VDUSE status set event Maxime Coquelin
` (5 subsequent siblings)
28 siblings, 0 replies; 50+ messages in thread
From: Maxime Coquelin @ 2023-05-25 16:25 UTC (permalink / raw)
To: dev, chenbo.xia, david.marchand, mkp, fbl, jasowang,
cunming.liang, xieyongji, echaudro, eperezma, amorenoz, lulu
Cc: Maxime Coquelin
This patch adds support for VDUSE_GET_VQ_STATE event
handling, which consists in providing the backend's last
available index for the specified virtqueue.
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
---
lib/vhost/vduse.c | 8 ++++++++
1 file changed, 8 insertions(+)
diff --git a/lib/vhost/vduse.c b/lib/vhost/vduse.c
index 92c515cff2..7e36c50b6c 100644
--- a/lib/vhost/vduse.c
+++ b/lib/vhost/vduse.c
@@ -147,6 +147,7 @@ vduse_events_handler(int fd, void *arg, int *remove __rte_unused)
struct virtio_net *dev = arg;
struct vduse_dev_request req;
struct vduse_dev_response resp;
+ struct vhost_virtqueue *vq;
int ret;
memset(&resp, 0, sizeof(resp));
@@ -165,6 +166,13 @@ vduse_events_handler(int fd, void *arg, int *remove __rte_unused)
vduse_req_id_to_str(req.type), req.type);
switch (req.type) {
+ case VDUSE_GET_VQ_STATE:
+ vq = dev->virtqueue[req.vq_state.index];
+ VHOST_LOG_CONFIG(dev->ifname, INFO, "\tvq index: %u, avail_index: %u\n",
+ req.vq_state.index, vq->last_avail_idx);
+ resp.vq_state.split.avail_index = vq->last_avail_idx;
+ resp.result = VDUSE_REQ_RESULT_OK;
+ break;
default:
resp.result = VDUSE_REQ_RESULT_FAILED;
break;
--
2.40.1
^ permalink raw reply [flat|nested] 50+ messages in thread
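The VDUSE_GET_VQ_STATE handling above reduces to a dispatch on the request type plus a lookup of the queue's last available index. A minimal standalone sketch of that logic follows; the struct layouts, field names and request-type value here are simplified stand-ins for illustration, not the kernel VDUSE UAPI definitions:

```c
#include <stdint.h>
#include <assert.h>

#define REQ_RESULT_OK     0
#define REQ_RESULT_FAILED 1
#define REQ_GET_VQ_STATE  3   /* illustrative value, not the UAPI one */

struct vq_state_req  { uint32_t type; uint32_t vq_index; };
struct vq_state_resp { uint32_t result; uint16_t avail_index; };

/* last_avail[] models dev->virtqueue[i]->last_avail_idx */
static uint32_t
handle_request(const struct vq_state_req *req,
	       const uint16_t *last_avail, struct vq_state_resp *resp)
{
	switch (req->type) {
	case REQ_GET_VQ_STATE:
		/* reply with the backend's last available index */
		resp->avail_index = last_avail[req->vq_index];
		resp->result = REQ_RESULT_OK;
		break;
	default:
		/* unknown requests are rejected, as in the patch */
		resp->result = REQ_RESULT_FAILED;
		break;
	}
	return resp->result;
}
```

As in the patch, any request type the handler does not know falls through to the failure result, so newer kernel requests degrade gracefully.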
* [PATCH v3 24/28] vhost: add support for VDUSE status set event
2023-05-25 16:25 [PATCH v3 00/28] Add VDUSE support to Vhost library Maxime Coquelin
` (22 preceding siblings ...)
2023-05-25 16:25 ` [PATCH v3 23/28] vhost: add support for virtqueue state get event Maxime Coquelin
@ 2023-05-25 16:25 ` Maxime Coquelin
2023-05-25 16:25 ` [PATCH v3 25/28] vhost: add support for VDUSE IOTLB update event Maxime Coquelin
` (4 subsequent siblings)
28 siblings, 0 replies; 50+ messages in thread
From: Maxime Coquelin @ 2023-05-25 16:25 UTC (permalink / raw)
To: dev, chenbo.xia, david.marchand, mkp, fbl, jasowang,
cunming.liang, xieyongji, echaudro, eperezma, amorenoz, lulu
Cc: Maxime Coquelin
This patch adds support for VDUSE_SET_STATUS event
handling, which consists in updating the Virtio device
status set by the Virtio driver.
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
---
lib/vhost/vduse.c | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/lib/vhost/vduse.c b/lib/vhost/vduse.c
index 7e36c50b6c..3bf65d4b8b 100644
--- a/lib/vhost/vduse.c
+++ b/lib/vhost/vduse.c
@@ -173,6 +173,12 @@ vduse_events_handler(int fd, void *arg, int *remove __rte_unused)
resp.vq_state.split.avail_index = vq->last_avail_idx;
resp.result = VDUSE_REQ_RESULT_OK;
break;
+ case VDUSE_SET_STATUS:
+ VHOST_LOG_CONFIG(dev->ifname, INFO, "\tnew status: 0x%08x\n",
+ req.s.status);
+ dev->status = req.s.status;
+ resp.result = VDUSE_REQ_RESULT_OK;
+ break;
default:
resp.result = VDUSE_REQ_RESULT_FAILED;
break;
--
2.40.1
^ permalink raw reply [flat|nested] 50+ messages in thread
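The VDUSE_SET_STATUS handler above simply stores the driver-written Virtio status byte. The following sketch uses the standard status bit values from the Virtio specification to show what that byte encodes; the global variable and helper names are illustrative only:

```c
#include <stdint.h>
#include <stdbool.h>
#include <assert.h>

/* status bits as defined by the Virtio specification */
#define VIRTIO_STATUS_ACKNOWLEDGE 0x01
#define VIRTIO_STATUS_DRIVER      0x02
#define VIRTIO_STATUS_DRIVER_OK   0x04
#define VIRTIO_STATUS_FEATURES_OK 0x08

static uint8_t dev_status;  /* models dev->status */

static void set_status(uint8_t status)
{
	dev_status = status;
}

/* true once the driver has completed initialization */
static bool driver_ready(void)
{
	return dev_status & VIRTIO_STATUS_DRIVER_OK;
}
```

Later patches in this series key device start/stop off the DRIVER_OK bit tested by this kind of helper.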
* [PATCH v3 25/28] vhost: add support for VDUSE IOTLB update event
2023-05-25 16:25 [PATCH v3 00/28] Add VDUSE support to Vhost library Maxime Coquelin
` (23 preceding siblings ...)
2023-05-25 16:25 ` [PATCH v3 24/28] vhost: add support for VDUSE status set event Maxime Coquelin
@ 2023-05-25 16:25 ` Maxime Coquelin
2023-05-29 6:52 ` Xia, Chenbo
2023-05-25 16:25 ` [PATCH v3 26/28] vhost: add VDUSE device startup Maxime Coquelin
` (3 subsequent siblings)
28 siblings, 1 reply; 50+ messages in thread
From: Maxime Coquelin @ 2023-05-25 16:25 UTC (permalink / raw)
To: dev, chenbo.xia, david.marchand, mkp, fbl, jasowang,
cunming.liang, xieyongji, echaudro, eperezma, amorenoz, lulu
Cc: Maxime Coquelin
This patch adds support for VDUSE_UPDATE_IOTLB event
handling, which consists in invalidating IOTLB entries for
the range specified in the request.
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
---
lib/vhost/vduse.c | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/lib/vhost/vduse.c b/lib/vhost/vduse.c
index 3bf65d4b8b..110654ec68 100644
--- a/lib/vhost/vduse.c
+++ b/lib/vhost/vduse.c
@@ -179,6 +179,13 @@ vduse_events_handler(int fd, void *arg, int *remove __rte_unused)
dev->status = req.s.status;
resp.result = VDUSE_REQ_RESULT_OK;
break;
+ case VDUSE_UPDATE_IOTLB:
+ VHOST_LOG_CONFIG(dev->ifname, INFO, "\tIOVA range: %" PRIx64 " - %" PRIx64 "\n",
+ (uint64_t)req.iova.start, (uint64_t)req.iova.last);
+ vhost_user_iotlb_cache_remove(dev, req.iova.start,
+ req.iova.last - req.iova.start + 1);
+ resp.result = VDUSE_REQ_RESULT_OK;
+ break;
default:
resp.result = VDUSE_REQ_RESULT_FAILED;
break;
--
2.40.1
^ permalink raw reply [flat|nested] 50+ messages in thread
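The invalidation above removes every cached translation that overlaps the inclusive [start, last] IOVA range from the request (hence the `last - start + 1` size). A simplified sketch of that range removal, with a fixed array standing in for the real IOTLB cache structure:

```c
#include <stdint.h>
#include <assert.h>

struct iotlb_entry { uint64_t start, last; int valid; };

/* invalidate all entries intersecting [start, last]; returns count */
static unsigned int
iotlb_remove_range(struct iotlb_entry *cache, unsigned int n,
		   uint64_t start, uint64_t last)
{
	unsigned int removed = 0;

	for (unsigned int i = 0; i < n; i++) {
		if (!cache[i].valid)
			continue;
		/* two inclusive ranges overlap iff each starts
		 * before the other ends */
		if (cache[i].start <= last && cache[i].last >= start) {
			cache[i].valid = 0;
			removed++;
		}
	}
	return removed;
}
```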
* RE: [PATCH v3 25/28] vhost: add support for VDUSE IOTLB update event
2023-05-25 16:25 ` [PATCH v3 25/28] vhost: add support for VDUSE IOTLB update event Maxime Coquelin
@ 2023-05-29 6:52 ` Xia, Chenbo
0 siblings, 0 replies; 50+ messages in thread
From: Xia, Chenbo @ 2023-05-29 6:52 UTC (permalink / raw)
To: Maxime Coquelin, dev, david.marchand, mkp, fbl, jasowang, Liang,
Cunming, Xie, Yongji, echaudro, eperezma, amorenoz, lulu
> -----Original Message-----
> From: Maxime Coquelin <maxime.coquelin@redhat.com>
> Sent: Friday, May 26, 2023 12:26 AM
> To: dev@dpdk.org; Xia, Chenbo <chenbo.xia@intel.com>;
> david.marchand@redhat.com; mkp@redhat.com; fbl@redhat.com;
> jasowang@redhat.com; Liang, Cunming <cunming.liang@intel.com>; Xie, Yongji
> <xieyongji@bytedance.com>; echaudro@redhat.com; eperezma@redhat.com;
> amorenoz@redhat.com; lulu@redhat.com
> Cc: Maxime Coquelin <maxime.coquelin@redhat.com>
> Subject: [PATCH v3 25/28] vhost: add support for VDUSE IOTLB update event
>
> This patch adds support for VDUSE_UPDATE_IOTLB event
> handling, which consists in invalidating IOTLB entries for
> the range specified in the request.
>
> Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
> ---
> lib/vhost/vduse.c | 7 +++++++
> 1 file changed, 7 insertions(+)
>
> diff --git a/lib/vhost/vduse.c b/lib/vhost/vduse.c
> index 3bf65d4b8b..110654ec68 100644
> --- a/lib/vhost/vduse.c
> +++ b/lib/vhost/vduse.c
> @@ -179,6 +179,13 @@ vduse_events_handler(int fd, void *arg, int *remove
> __rte_unused)
> dev->status = req.s.status;
> resp.result = VDUSE_REQ_RESULT_OK;
> break;
> + case VDUSE_UPDATE_IOTLB:
> + VHOST_LOG_CONFIG(dev->ifname, INFO, "\tIOVA range: %" PRIx64 "
> - %" PRIx64 "\n",
> + (uint64_t)req.iova.start, (uint64_t)req.iova.last);
> + vhost_user_iotlb_cache_remove(dev, req.iova.start,
> + req.iova.last - req.iova.start + 1);
> + resp.result = VDUSE_REQ_RESULT_OK;
> + break;
> default:
> resp.result = VDUSE_REQ_RESULT_FAILED;
> break;
> --
> 2.40.1
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
^ permalink raw reply [flat|nested] 50+ messages in thread
* [PATCH v3 26/28] vhost: add VDUSE device startup
2023-05-25 16:25 [PATCH v3 00/28] Add VDUSE support to Vhost library Maxime Coquelin
` (24 preceding siblings ...)
2023-05-25 16:25 ` [PATCH v3 25/28] vhost: add support for VDUSE IOTLB update event Maxime Coquelin
@ 2023-05-25 16:25 ` Maxime Coquelin
2023-05-25 16:25 ` [PATCH v3 27/28] vhost: add multiqueue support to VDUSE Maxime Coquelin
` (2 subsequent siblings)
28 siblings, 0 replies; 50+ messages in thread
From: Maxime Coquelin @ 2023-05-25 16:25 UTC (permalink / raw)
To: dev, chenbo.xia, david.marchand, mkp, fbl, jasowang,
cunming.liang, xieyongji, echaudro, eperezma, amorenoz, lulu
Cc: Maxime Coquelin
This patch adds initialization of the device and its
virtqueues once the Virtio driver has set the DRIVER_OK bit
in the Virtio status register.
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
---
lib/vhost/vduse.c | 126 ++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 126 insertions(+)
diff --git a/lib/vhost/vduse.c b/lib/vhost/vduse.c
index 110654ec68..a10dc24d38 100644
--- a/lib/vhost/vduse.c
+++ b/lib/vhost/vduse.c
@@ -141,6 +141,128 @@ static struct vhost_backend_ops vduse_backend_ops = {
.inject_irq = vduse_inject_irq,
};
+static void
+vduse_vring_setup(struct virtio_net *dev, unsigned int index)
+{
+ struct vhost_virtqueue *vq = dev->virtqueue[index];
+ struct vhost_vring_addr *ra = &vq->ring_addrs;
+ struct vduse_vq_info vq_info;
+ struct vduse_vq_eventfd vq_efd;
+ int ret;
+
+ vq_info.index = index;
+ ret = ioctl(dev->vduse_dev_fd, VDUSE_VQ_GET_INFO, &vq_info);
+ if (ret) {
+ VHOST_LOG_CONFIG(dev->ifname, ERR, "Failed to get VQ %u info: %s\n",
+ index, strerror(errno));
+ return;
+ }
+
+ VHOST_LOG_CONFIG(dev->ifname, INFO, "VQ %u info:\n", index);
+ VHOST_LOG_CONFIG(dev->ifname, INFO, "\tnum: %u\n", vq_info.num);
+ VHOST_LOG_CONFIG(dev->ifname, INFO, "\tdesc_addr: %llx\n", vq_info.desc_addr);
+ VHOST_LOG_CONFIG(dev->ifname, INFO, "\tdriver_addr: %llx\n", vq_info.driver_addr);
+ VHOST_LOG_CONFIG(dev->ifname, INFO, "\tdevice_addr: %llx\n", vq_info.device_addr);
+ VHOST_LOG_CONFIG(dev->ifname, INFO, "\tavail_idx: %u\n", vq_info.split.avail_index);
+ VHOST_LOG_CONFIG(dev->ifname, INFO, "\tready: %u\n", vq_info.ready);
+
+ vq->last_avail_idx = vq_info.split.avail_index;
+ vq->size = vq_info.num;
+ vq->ready = true;
+ vq->enabled = vq_info.ready;
+ ra->desc_user_addr = vq_info.desc_addr;
+ ra->avail_user_addr = vq_info.driver_addr;
+ ra->used_user_addr = vq_info.device_addr;
+
+ vq->kickfd = eventfd(0, EFD_NONBLOCK | EFD_CLOEXEC);
+ if (vq->kickfd < 0) {
+ VHOST_LOG_CONFIG(dev->ifname, ERR, "Failed to init kickfd for VQ %u: %s\n",
+ index, strerror(errno));
+ vq->kickfd = VIRTIO_INVALID_EVENTFD;
+ return;
+ }
+ VHOST_LOG_CONFIG(dev->ifname, INFO, "\tkick fd: %d\n", vq->kickfd);
+
+ vq->shadow_used_split = rte_malloc_socket(NULL,
+ vq->size * sizeof(struct vring_used_elem),
+ RTE_CACHE_LINE_SIZE, 0);
+ vq->batch_copy_elems = rte_malloc_socket(NULL,
+ vq->size * sizeof(struct batch_copy_elem),
+ RTE_CACHE_LINE_SIZE, 0);
+
+ vhost_user_iotlb_rd_lock(vq);
+ if (vring_translate(dev, vq))
+ VHOST_LOG_CONFIG(dev->ifname, ERR, "Failed to translate vring %d addresses\n",
+ index);
+
+ if (vhost_enable_guest_notification(dev, vq, 0))
+ VHOST_LOG_CONFIG(dev->ifname, ERR,
+ "Failed to disable guest notifications on vring %d\n",
+ index);
+ vhost_user_iotlb_rd_unlock(vq);
+
+ vq_efd.index = index;
+ vq_efd.fd = vq->kickfd;
+
+ ret = ioctl(dev->vduse_dev_fd, VDUSE_VQ_SETUP_KICKFD, &vq_efd);
+ if (ret) {
+ VHOST_LOG_CONFIG(dev->ifname, ERR, "Failed to setup kickfd for VQ %u: %s\n",
+ index, strerror(errno));
+ close(vq->kickfd);
+ vq->kickfd = VIRTIO_UNINITIALIZED_EVENTFD;
+ return;
+ }
+}
+
+static void
+vduse_device_start(struct virtio_net *dev)
+{
+ unsigned int i, ret;
+
+ VHOST_LOG_CONFIG(dev->ifname, INFO, "Starting device...\n");
+
+ dev->notify_ops = vhost_driver_callback_get(dev->ifname);
+ if (!dev->notify_ops) {
+ VHOST_LOG_CONFIG(dev->ifname, ERR,
+ "Failed to get callback ops for driver\n");
+ return;
+ }
+
+ ret = ioctl(dev->vduse_dev_fd, VDUSE_DEV_GET_FEATURES, &dev->features);
+ if (ret) {
+ VHOST_LOG_CONFIG(dev->ifname, ERR, "Failed to get features: %s\n",
+ strerror(errno));
+ return;
+ }
+
+ VHOST_LOG_CONFIG(dev->ifname, INFO, "Negotiated Virtio features: 0x%" PRIx64 "\n",
+ dev->features);
+
+ if (dev->features &
+ ((1ULL << VIRTIO_NET_F_MRG_RXBUF) |
+ (1ULL << VIRTIO_F_VERSION_1) |
+ (1ULL << VIRTIO_F_RING_PACKED))) {
+ dev->vhost_hlen = sizeof(struct virtio_net_hdr_mrg_rxbuf);
+ } else {
+ dev->vhost_hlen = sizeof(struct virtio_net_hdr);
+ }
+
+ for (i = 0; i < dev->nr_vring; i++)
+ vduse_vring_setup(dev, i);
+
+ dev->flags |= VIRTIO_DEV_READY;
+
+ if (dev->notify_ops->new_device(dev->vid) == 0)
+ dev->flags |= VIRTIO_DEV_RUNNING;
+
+ for (i = 0; i < dev->nr_vring; i++) {
+ struct vhost_virtqueue *vq = dev->virtqueue[i];
+
+ if (dev->notify_ops->vring_state_changed)
+ dev->notify_ops->vring_state_changed(dev->vid, i, vq->enabled);
+ }
+}
+
static void
vduse_events_handler(int fd, void *arg, int *remove __rte_unused)
{
@@ -177,6 +299,10 @@ vduse_events_handler(int fd, void *arg, int *remove __rte_unused)
VHOST_LOG_CONFIG(dev->ifname, INFO, "\tnew status: 0x%08x\n",
req.s.status);
dev->status = req.s.status;
+
+ if (dev->status & VIRTIO_DEVICE_STATUS_DRIVER_OK)
+ vduse_device_start(dev);
+
resp.result = VDUSE_REQ_RESULT_OK;
break;
case VDUSE_UPDATE_IOTLB:
--
2.40.1
^ permalink raw reply [flat|nested] 50+ messages in thread
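One detail of vduse_device_start() above is the virtio-net header length choice: if any of MRG_RXBUF, VERSION_1 or RING_PACKED was negotiated, the 12-byte header layout with `num_buffers` is used, otherwise the legacy 10-byte one. A self-contained sketch, with the feature bit numbers taken from the Virtio specification:

```c
#include <stdint.h>
#include <stddef.h>
#include <assert.h>

/* feature bit numbers per the Virtio specification */
#define VIRTIO_NET_F_MRG_RXBUF 15
#define VIRTIO_F_VERSION_1     32
#define VIRTIO_F_RING_PACKED   34

struct virtio_net_hdr {
	uint8_t flags, gso_type;
	uint16_t hdr_len, gso_size, csum_start, csum_offset;
};

struct virtio_net_hdr_mrg_rxbuf {
	struct virtio_net_hdr hdr;
	uint16_t num_buffers;
};

/* mirrors the vhost_hlen selection in vduse_device_start() */
static size_t
vhost_hdr_len(uint64_t features)
{
	uint64_t mask = (1ULL << VIRTIO_NET_F_MRG_RXBUF) |
			(1ULL << VIRTIO_F_VERSION_1) |
			(1ULL << VIRTIO_F_RING_PACKED);

	return (features & mask) ? sizeof(struct virtio_net_hdr_mrg_rxbuf)
				 : sizeof(struct virtio_net_hdr);
}
```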
* [PATCH v3 27/28] vhost: add multiqueue support to VDUSE
2023-05-25 16:25 [PATCH v3 00/28] Add VDUSE support to Vhost library Maxime Coquelin
` (25 preceding siblings ...)
2023-05-25 16:25 ` [PATCH v3 26/28] vhost: add VDUSE device startup Maxime Coquelin
@ 2023-05-25 16:25 ` Maxime Coquelin
2023-05-25 16:25 ` [PATCH v3 28/28] vhost: add VDUSE device stop Maxime Coquelin
2023-05-26 9:14 ` [PATCH v3 00/28] Add VDUSE support to Vhost library David Marchand
28 siblings, 0 replies; 50+ messages in thread
From: Maxime Coquelin @ 2023-05-25 16:25 UTC (permalink / raw)
To: dev, chenbo.xia, david.marchand, mkp, fbl, jasowang,
cunming.liang, xieyongji, echaudro, eperezma, amorenoz, lulu
Cc: Maxime Coquelin
This patch enables the control queue, which is required
to support multiqueue.
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
---
lib/vhost/vduse.c | 83 +++++++++++++++++++++++++++++++++++++++++++----
1 file changed, 76 insertions(+), 7 deletions(-)
diff --git a/lib/vhost/vduse.c b/lib/vhost/vduse.c
index a10dc24d38..699cfed9e3 100644
--- a/lib/vhost/vduse.c
+++ b/lib/vhost/vduse.c
@@ -21,6 +21,7 @@
#include "iotlb.h"
#include "vduse.h"
#include "vhost.h"
+#include "virtio_net_ctrl.h"
#define VHOST_VDUSE_API_VERSION 0
#define VDUSE_CTRL_PATH "/dev/vduse/control"
@@ -41,7 +42,9 @@
(1ULL << VIRTIO_NET_F_GUEST_ECN) | \
(1ULL << VIRTIO_RING_F_INDIRECT_DESC) | \
(1ULL << VIRTIO_F_IN_ORDER) | \
- (1ULL << VIRTIO_F_IOMMU_PLATFORM))
+ (1ULL << VIRTIO_F_IOMMU_PLATFORM) | \
+ (1ULL << VIRTIO_NET_F_CTRL_VQ) | \
+ (1ULL << VIRTIO_NET_F_MQ))
struct vduse {
struct fdset fdset;
@@ -141,6 +144,25 @@ static struct vhost_backend_ops vduse_backend_ops = {
.inject_irq = vduse_inject_irq,
};
+static void
+vduse_control_queue_event(int fd, void *arg, int *remove __rte_unused)
+{
+ struct virtio_net *dev = arg;
+ uint64_t buf;
+ int ret;
+
+ ret = read(fd, &buf, sizeof(buf));
+ if (ret < 0) {
+ VHOST_LOG_CONFIG(dev->ifname, ERR, "Failed to read control queue event: %s\n",
+ strerror(errno));
+ return;
+ }
+
+ VHOST_LOG_CONFIG(dev->ifname, DEBUG, "Control queue kicked\n");
+ if (virtio_net_ctrl_handle(dev))
+ VHOST_LOG_CONFIG(dev->ifname, ERR, "Failed to handle ctrl request\n");
+}
+
static void
vduse_vring_setup(struct virtio_net *dev, unsigned int index)
{
@@ -212,6 +234,22 @@ vduse_vring_setup(struct virtio_net *dev, unsigned int index)
vq->kickfd = VIRTIO_UNINITIALIZED_EVENTFD;
return;
}
+
+ if (vq == dev->cvq) {
+ ret = fdset_add(&vduse.fdset, vq->kickfd, vduse_control_queue_event, NULL, dev);
+ if (ret) {
+ VHOST_LOG_CONFIG(dev->ifname, ERR,
+ "Failed to setup kickfd handler for VQ %u: %s\n",
+ index, strerror(errno));
+ vq_efd.fd = VDUSE_EVENTFD_DEASSIGN;
+ ioctl(dev->vduse_dev_fd, VDUSE_VQ_SETUP_KICKFD, &vq_efd);
+ close(vq->kickfd);
+ vq->kickfd = VIRTIO_UNINITIALIZED_EVENTFD;
+ }
+ fdset_pipe_notify(&vduse.fdset);
+ vhost_enable_guest_notification(dev, vq, 1);
+ VHOST_LOG_CONFIG(dev->ifname, INFO, "Ctrl queue event handler installed\n");
+ }
}
static void
@@ -258,6 +296,9 @@ vduse_device_start(struct virtio_net *dev)
for (i = 0; i < dev->nr_vring; i++) {
struct vhost_virtqueue *vq = dev->virtqueue[i];
+ if (vq == dev->cvq)
+ continue;
+
if (dev->notify_ops->vring_state_changed)
dev->notify_ops->vring_state_changed(dev->vid, i, vq->enabled);
}
@@ -334,9 +375,11 @@ vduse_device_create(const char *path)
{
int control_fd, dev_fd, vid, ret;
pthread_t fdset_tid;
- uint32_t i;
+ uint32_t i, max_queue_pairs, total_queues;
struct virtio_net *dev;
+ struct virtio_net_config vnet_config = { 0 };
uint64_t ver = VHOST_VDUSE_API_VERSION;
+ uint64_t features = VDUSE_NET_SUPPORTED_FEATURES;
struct vduse_dev_config *dev_config = NULL;
const char *name = path + strlen("/dev/vduse/");
@@ -376,22 +419,39 @@ vduse_device_create(const char *path)
goto out_ctrl_close;
}
- dev_config = malloc(offsetof(struct vduse_dev_config, config));
+ dev_config = malloc(offsetof(struct vduse_dev_config, config) +
+ sizeof(vnet_config));
if (!dev_config) {
VHOST_LOG_CONFIG(name, ERR, "Failed to allocate VDUSE config\n");
ret = -1;
goto out_ctrl_close;
}
+ ret = rte_vhost_driver_get_queue_num(path, &max_queue_pairs);
+ if (ret < 0) {
+ VHOST_LOG_CONFIG(name, ERR, "Failed to get max queue pairs\n");
+ goto out_free;
+ }
+
+ VHOST_LOG_CONFIG(path, INFO, "VDUSE max queue pairs: %u\n", max_queue_pairs);
+ total_queues = max_queue_pairs * 2;
+
+ if (max_queue_pairs == 1)
+ features &= ~(RTE_BIT64(VIRTIO_NET_F_CTRL_VQ) | RTE_BIT64(VIRTIO_NET_F_MQ));
+ else
+ total_queues += 1; /* Includes ctrl queue */
+
+ vnet_config.max_virtqueue_pairs = max_queue_pairs;
memset(dev_config, 0, sizeof(struct vduse_dev_config));
strncpy(dev_config->name, name, VDUSE_NAME_MAX - 1);
dev_config->device_id = VIRTIO_ID_NET;
dev_config->vendor_id = 0;
- dev_config->features = VDUSE_NET_SUPPORTED_FEATURES;
- dev_config->vq_num = 2;
+ dev_config->features = features;
+ dev_config->vq_num = total_queues;
dev_config->vq_align = sysconf(_SC_PAGE_SIZE);
- dev_config->config_size = 0;
+ dev_config->config_size = sizeof(struct virtio_net_config);
+ memcpy(dev_config->config, &vnet_config, sizeof(vnet_config));
ret = ioctl(control_fd, VDUSE_CREATE_DEV, dev_config);
if (ret < 0) {
@@ -433,7 +493,7 @@ vduse_device_create(const char *path)
dev->vduse_dev_fd = dev_fd;
vhost_setup_virtio_net(dev->vid, true, true, true, true);
- for (i = 0; i < 2; i++) {
+ for (i = 0; i < total_queues; i++) {
struct vduse_vq_config vq_cfg = { 0 };
ret = alloc_vring_queue(dev, i);
@@ -452,6 +512,8 @@ vduse_device_create(const char *path)
}
}
+ dev->cvq = dev->virtqueue[max_queue_pairs * 2];
+
ret = fdset_add(&vduse.fdset, dev->vduse_dev_fd, vduse_events_handler, NULL, dev);
if (ret) {
VHOST_LOG_CONFIG(name, ERR, "Failed to add fd %d to vduse fdset\n",
@@ -498,6 +560,13 @@ vduse_device_destroy(const char *path)
if (vid == RTE_MAX_VHOST_DEVICE)
return -1;
+ if (dev->cvq && dev->cvq->kickfd >= 0) {
+ fdset_del(&vduse.fdset, dev->cvq->kickfd);
+ fdset_pipe_notify(&vduse.fdset);
+ close(dev->cvq->kickfd);
+ dev->cvq->kickfd = VIRTIO_UNINITIALIZED_EVENTFD;
+ }
+
fdset_del(&vduse.fdset, dev->vduse_dev_fd);
fdset_pipe_notify(&vduse.fdset);
--
2.40.1
^ permalink raw reply [flat|nested] 50+ messages in thread
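The queue-count logic above can be summarized as: two queues per queue pair, plus one control queue (placed at index `max_queue_pairs * 2`) whenever more than one pair is configured; with a single pair, the CTRL_VQ and MQ feature bits are masked out instead. A small sketch of that computation (helper name is illustrative):

```c
#include <stdint.h>
#include <assert.h>

/* returns the total vq count; *need_ctrl_vq tells whether the
 * CTRL_VQ/MQ features stay advertised and a ctrl queue exists */
static uint32_t
vduse_total_queues(uint32_t max_queue_pairs, int *need_ctrl_vq)
{
	uint32_t total = max_queue_pairs * 2;

	if (max_queue_pairs == 1) {
		*need_ctrl_vq = 0;  /* single pair: no control queue */
	} else {
		*need_ctrl_vq = 1;
		total += 1;         /* ctrl queue at index pairs * 2 */
	}
	return total;
}
```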
* [PATCH v3 28/28] vhost: add VDUSE device stop
2023-05-25 16:25 [PATCH v3 00/28] Add VDUSE support to Vhost library Maxime Coquelin
` (26 preceding siblings ...)
2023-05-25 16:25 ` [PATCH v3 27/28] vhost: add multiqueue support to VDUSE Maxime Coquelin
@ 2023-05-25 16:25 ` Maxime Coquelin
2023-05-29 6:53 ` Xia, Chenbo
2023-05-26 9:14 ` [PATCH v3 00/28] Add VDUSE support to Vhost library David Marchand
28 siblings, 1 reply; 50+ messages in thread
From: Maxime Coquelin @ 2023-05-25 16:25 UTC (permalink / raw)
To: dev, chenbo.xia, david.marchand, mkp, fbl, jasowang,
cunming.liang, xieyongji, echaudro, eperezma, amorenoz, lulu
Cc: Maxime Coquelin
This patch adds VDUSE device stop and cleanup of its
virtqueues.
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
---
doc/guides/rel_notes/release_23_07.rst | 6 +++
lib/vhost/vduse.c | 72 +++++++++++++++++++++++---
2 files changed, 70 insertions(+), 8 deletions(-)
diff --git a/doc/guides/rel_notes/release_23_07.rst b/doc/guides/rel_notes/release_23_07.rst
index fa889a5ee7..66ba9e25dd 100644
--- a/doc/guides/rel_notes/release_23_07.rst
+++ b/doc/guides/rel_notes/release_23_07.rst
@@ -60,6 +60,12 @@ New Features
Introduced ``rte_vhost_driver_set_max_queue_num()`` to be able to limit the
maximum number of supported queue pairs, required for VDUSE support.
+* **Added VDUSE support into Vhost library
+
+ VDUSE aims at implementing vDPA devices in userspace. It can be used as an
+ alternative to Vhost-user when using Vhost-vDPA, but also enable providing a
+ virtio-net netdev to the host when using Virtio-vDPA driver.
+
Removed Items
-------------
diff --git a/lib/vhost/vduse.c b/lib/vhost/vduse.c
index 699cfed9e3..f421b1cf4c 100644
--- a/lib/vhost/vduse.c
+++ b/lib/vhost/vduse.c
@@ -252,6 +252,44 @@ vduse_vring_setup(struct virtio_net *dev, unsigned int index)
}
}
+static void
+vduse_vring_cleanup(struct virtio_net *dev, unsigned int index)
+{
+ struct vhost_virtqueue *vq = dev->virtqueue[index];
+ struct vduse_vq_eventfd vq_efd;
+ int ret;
+
+ if (vq == dev->cvq && vq->kickfd >= 0) {
+ fdset_del(&vduse.fdset, vq->kickfd);
+ fdset_pipe_notify(&vduse.fdset);
+ }
+
+ vq_efd.index = index;
+ vq_efd.fd = VDUSE_EVENTFD_DEASSIGN;
+
+ ret = ioctl(dev->vduse_dev_fd, VDUSE_VQ_SETUP_KICKFD, &vq_efd);
+ if (ret)
+ VHOST_LOG_CONFIG(dev->ifname, ERR, "Failed to cleanup kickfd for VQ %u: %s\n",
+ index, strerror(errno));
+
+ close(vq->kickfd);
+ vq->kickfd = VIRTIO_UNINITIALIZED_EVENTFD;
+
+ vring_invalidate(dev, vq);
+
+ rte_free(vq->batch_copy_elems);
+ vq->batch_copy_elems = NULL;
+
+ rte_free(vq->shadow_used_split);
+ vq->shadow_used_split = NULL;
+
+ vq->enabled = false;
+ vq->ready = false;
+ vq->size = 0;
+ vq->last_used_idx = 0;
+ vq->last_avail_idx = 0;
+}
+
static void
vduse_device_start(struct virtio_net *dev)
{
@@ -304,6 +342,23 @@ vduse_device_start(struct virtio_net *dev)
}
}
+static void
+vduse_device_stop(struct virtio_net *dev)
+{
+ unsigned int i;
+
+ VHOST_LOG_CONFIG(dev->ifname, INFO, "Stopping device...\n");
+
+ vhost_destroy_device_notify(dev);
+
+ dev->flags &= ~VIRTIO_DEV_READY;
+
+ for (i = 0; i < dev->nr_vring; i++)
+ vduse_vring_cleanup(dev, i);
+
+ vhost_user_iotlb_flush_all(dev);
+}
+
static void
vduse_events_handler(int fd, void *arg, int *remove __rte_unused)
{
@@ -311,6 +366,7 @@ vduse_events_handler(int fd, void *arg, int *remove __rte_unused)
struct vduse_dev_request req;
struct vduse_dev_response resp;
struct vhost_virtqueue *vq;
+ uint8_t old_status;
int ret;
memset(&resp, 0, sizeof(resp));
@@ -339,10 +395,15 @@ vduse_events_handler(int fd, void *arg, int *remove __rte_unused)
case VDUSE_SET_STATUS:
VHOST_LOG_CONFIG(dev->ifname, INFO, "\tnew status: 0x%08x\n",
req.s.status);
+ old_status = dev->status;
dev->status = req.s.status;
- if (dev->status & VIRTIO_DEVICE_STATUS_DRIVER_OK)
- vduse_device_start(dev);
+ if ((old_status ^ dev->status) & VIRTIO_DEVICE_STATUS_DRIVER_OK) {
+ if (dev->status & VIRTIO_DEVICE_STATUS_DRIVER_OK)
+ vduse_device_start(dev);
+ else
+ vduse_device_stop(dev);
+ }
resp.result = VDUSE_REQ_RESULT_OK;
break;
@@ -560,12 +621,7 @@ vduse_device_destroy(const char *path)
if (vid == RTE_MAX_VHOST_DEVICE)
return -1;
- if (dev->cvq && dev->cvq->kickfd >= 0) {
- fdset_del(&vduse.fdset, dev->cvq->kickfd);
- fdset_pipe_notify(&vduse.fdset);
- close(dev->cvq->kickfd);
- dev->cvq->kickfd = VIRTIO_UNINITIALIZED_EVENTFD;
- }
+ vduse_device_stop(dev);
fdset_del(&vduse.fdset, dev->vduse_dev_fd);
fdset_pipe_notify(&vduse.fdset);
--
2.40.1
^ permalink raw reply [flat|nested] 50+ messages in thread
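The start/stop decision this patch adds is edge-triggered: XOR-ing the old and new status bytes detects a change of the DRIVER_OK bit, and only then is the device started or stopped. A minimal sketch of that predicate (function name is illustrative):

```c
#include <stdint.h>
#include <assert.h>

#define DRIVER_OK 0x04  /* VIRTIO_DEVICE_STATUS_DRIVER_OK */

/* returns +1 to start, -1 to stop, 0 when DRIVER_OK did not change */
static int
driver_ok_transition(uint8_t old_status, uint8_t new_status)
{
	/* XOR isolates the bits that flipped between the two writes */
	if (!((old_status ^ new_status) & DRIVER_OK))
		return 0;
	return (new_status & DRIVER_OK) ? 1 : -1;
}
```

Without the XOR guard, re-writing an unchanged status (as drivers may do) would restart an already-running device, which is exactly what the v3 change avoids.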
* RE: [PATCH v3 28/28] vhost: add VDUSE device stop
2023-05-25 16:25 ` [PATCH v3 28/28] vhost: add VDUSE device stop Maxime Coquelin
@ 2023-05-29 6:53 ` Xia, Chenbo
2023-06-01 18:48 ` Maxime Coquelin
0 siblings, 1 reply; 50+ messages in thread
From: Xia, Chenbo @ 2023-05-29 6:53 UTC (permalink / raw)
To: Maxime Coquelin, dev, david.marchand, mkp, fbl, jasowang, Liang,
Cunming, Xie, Yongji, echaudro, eperezma, amorenoz, lulu
Hi Maxime,
> -----Original Message-----
> From: Maxime Coquelin <maxime.coquelin@redhat.com>
> Sent: Friday, May 26, 2023 12:26 AM
> To: dev@dpdk.org; Xia, Chenbo <chenbo.xia@intel.com>;
> david.marchand@redhat.com; mkp@redhat.com; fbl@redhat.com;
> jasowang@redhat.com; Liang, Cunming <cunming.liang@intel.com>; Xie, Yongji
> <xieyongji@bytedance.com>; echaudro@redhat.com; eperezma@redhat.com;
> amorenoz@redhat.com; lulu@redhat.com
> Cc: Maxime Coquelin <maxime.coquelin@redhat.com>
> Subject: [PATCH v3 28/28] vhost: add VDUSE device stop
>
> This patch adds VDUSE device stop and cleanup of its
> virtqueues.
>
> Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
> ---
> doc/guides/rel_notes/release_23_07.rst | 6 +++
> lib/vhost/vduse.c | 72 +++++++++++++++++++++++---
> 2 files changed, 70 insertions(+), 8 deletions(-)
>
> diff --git a/doc/guides/rel_notes/release_23_07.rst
> b/doc/guides/rel_notes/release_23_07.rst
> index fa889a5ee7..66ba9e25dd 100644
> --- a/doc/guides/rel_notes/release_23_07.rst
> +++ b/doc/guides/rel_notes/release_23_07.rst
> @@ -60,6 +60,12 @@ New Features
> Introduced ``rte_vhost_driver_set_max_queue_num()`` to be able to limit
> the
> maximum number of supported queue pairs, required for VDUSE support.
>
> +* **Added VDUSE support into Vhost library
Missing '.**' at the end like patch 15
Thanks,
Chenbo
> +
> + VDUSE aims at implementing vDPA devices in userspace. It can be used as
> an
> + alternative to Vhost-user when using Vhost-vDPA, but also enable
> providing a
> + virtio-net netdev to the host when using Virtio-vDPA driver.
> +
>
> Removed Items
> -------------
> diff --git a/lib/vhost/vduse.c b/lib/vhost/vduse.c
> index 699cfed9e3..f421b1cf4c 100644
> --- a/lib/vhost/vduse.c
> +++ b/lib/vhost/vduse.c
> @@ -252,6 +252,44 @@ vduse_vring_setup(struct virtio_net *dev, unsigned
> int index)
> }
> }
>
> +static void
> +vduse_vring_cleanup(struct virtio_net *dev, unsigned int index)
> +{
> + struct vhost_virtqueue *vq = dev->virtqueue[index];
> + struct vduse_vq_eventfd vq_efd;
> + int ret;
> +
> + if (vq == dev->cvq && vq->kickfd >= 0) {
> + fdset_del(&vduse.fdset, vq->kickfd);
> + fdset_pipe_notify(&vduse.fdset);
> + }
> +
> + vq_efd.index = index;
> + vq_efd.fd = VDUSE_EVENTFD_DEASSIGN;
> +
> + ret = ioctl(dev->vduse_dev_fd, VDUSE_VQ_SETUP_KICKFD, &vq_efd);
> + if (ret)
> + VHOST_LOG_CONFIG(dev->ifname, ERR, "Failed to cleanup kickfd
> for VQ %u: %s\n",
> + index, strerror(errno));
> +
> + close(vq->kickfd);
> + vq->kickfd = VIRTIO_UNINITIALIZED_EVENTFD;
> +
> + vring_invalidate(dev, vq);
> +
> + rte_free(vq->batch_copy_elems);
> + vq->batch_copy_elems = NULL;
> +
> + rte_free(vq->shadow_used_split);
> + vq->shadow_used_split = NULL;
> +
> + vq->enabled = false;
> + vq->ready = false;
> + vq->size = 0;
> + vq->last_used_idx = 0;
> + vq->last_avail_idx = 0;
> +}
> +
> static void
> vduse_device_start(struct virtio_net *dev)
> {
> @@ -304,6 +342,23 @@ vduse_device_start(struct virtio_net *dev)
> }
> }
>
> +static void
> +vduse_device_stop(struct virtio_net *dev)
> +{
> + unsigned int i;
> +
> + VHOST_LOG_CONFIG(dev->ifname, INFO, "Stopping device...\n");
> +
> + vhost_destroy_device_notify(dev);
> +
> + dev->flags &= ~VIRTIO_DEV_READY;
> +
> + for (i = 0; i < dev->nr_vring; i++)
> + vduse_vring_cleanup(dev, i);
> +
> + vhost_user_iotlb_flush_all(dev);
> +}
> +
> static void
> vduse_events_handler(int fd, void *arg, int *remove __rte_unused)
> {
> @@ -311,6 +366,7 @@ vduse_events_handler(int fd, void *arg, int *remove
> __rte_unused)
> struct vduse_dev_request req;
> struct vduse_dev_response resp;
> struct vhost_virtqueue *vq;
> + uint8_t old_status;
> int ret;
>
> memset(&resp, 0, sizeof(resp));
> @@ -339,10 +395,15 @@ vduse_events_handler(int fd, void *arg, int *remove
> __rte_unused)
> case VDUSE_SET_STATUS:
> VHOST_LOG_CONFIG(dev->ifname, INFO, "\tnew status: 0x%08x\n",
> req.s.status);
> + old_status = dev->status;
> dev->status = req.s.status;
>
> - if (dev->status & VIRTIO_DEVICE_STATUS_DRIVER_OK)
> - vduse_device_start(dev);
> + if ((old_status ^ dev->status) &
> VIRTIO_DEVICE_STATUS_DRIVER_OK) {
> + if (dev->status & VIRTIO_DEVICE_STATUS_DRIVER_OK)
> + vduse_device_start(dev);
> + else
> + vduse_device_stop(dev);
> + }
>
> resp.result = VDUSE_REQ_RESULT_OK;
> break;
> @@ -560,12 +621,7 @@ vduse_device_destroy(const char *path)
> if (vid == RTE_MAX_VHOST_DEVICE)
> return -1;
>
> - if (dev->cvq && dev->cvq->kickfd >= 0) {
> - fdset_del(&vduse.fdset, dev->cvq->kickfd);
> - fdset_pipe_notify(&vduse.fdset);
> - close(dev->cvq->kickfd);
> - dev->cvq->kickfd = VIRTIO_UNINITIALIZED_EVENTFD;
> - }
> + vduse_device_stop(dev);
>
> fdset_del(&vduse.fdset, dev->vduse_dev_fd);
> fdset_pipe_notify(&vduse.fdset);
> --
> 2.40.1
^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [PATCH v3 28/28] vhost: add VDUSE device stop
2023-05-29 6:53 ` Xia, Chenbo
@ 2023-06-01 18:48 ` Maxime Coquelin
0 siblings, 0 replies; 50+ messages in thread
From: Maxime Coquelin @ 2023-06-01 18:48 UTC (permalink / raw)
To: Xia, Chenbo, dev, david.marchand, mkp, fbl, jasowang, Liang,
Cunming, Xie, Yongji, echaudro, eperezma, amorenoz, lulu
On 5/29/23 08:53, Xia, Chenbo wrote:
> Hi Maxime,
>
>> -----Original Message-----
>> From: Maxime Coquelin <maxime.coquelin@redhat.com>
>> Sent: Friday, May 26, 2023 12:26 AM
>> To: dev@dpdk.org; Xia, Chenbo <chenbo.xia@intel.com>;
>> david.marchand@redhat.com; mkp@redhat.com; fbl@redhat.com;
>> jasowang@redhat.com; Liang, Cunming <cunming.liang@intel.com>; Xie, Yongji
>> <xieyongji@bytedance.com>; echaudro@redhat.com; eperezma@redhat.com;
>> amorenoz@redhat.com; lulu@redhat.com
>> Cc: Maxime Coquelin <maxime.coquelin@redhat.com>
>> Subject: [PATCH v3 28/28] vhost: add VDUSE device stop
>>
>> This patch adds VDUSE device stop and cleanup of its
>> virtqueues.
>>
>> Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
>> ---
>> doc/guides/rel_notes/release_23_07.rst | 6 +++
>> lib/vhost/vduse.c | 72 +++++++++++++++++++++++---
>> 2 files changed, 70 insertions(+), 8 deletions(-)
>>
>> diff --git a/doc/guides/rel_notes/release_23_07.rst
>> b/doc/guides/rel_notes/release_23_07.rst
>> index fa889a5ee7..66ba9e25dd 100644
>> --- a/doc/guides/rel_notes/release_23_07.rst
>> +++ b/doc/guides/rel_notes/release_23_07.rst
>> @@ -60,6 +60,12 @@ New Features
>> Introduced ``rte_vhost_driver_set_max_queue_num()`` to be able to limit
>> the
>> maximum number of supported queue pairs, required for VDUSE support.
>>
>> +* **Added VDUSE support into Vhost library
>
> Missing '.**' at the end like patch 15
Fixed.
Thanks,
Maxime
> Thanks,
> Chenbo
>
>> +
>> + VDUSE aims at implementing vDPA devices in userspace. It can be used as
>> an
>> + alternative to Vhost-user when using Vhost-vDPA, but also enable
>> providing a
>> + virtio-net netdev to the host when using Virtio-vDPA driver.
>> +
>>
>> Removed Items
>> -------------
>> diff --git a/lib/vhost/vduse.c b/lib/vhost/vduse.c
>> index 699cfed9e3..f421b1cf4c 100644
>> --- a/lib/vhost/vduse.c
>> +++ b/lib/vhost/vduse.c
>> @@ -252,6 +252,44 @@ vduse_vring_setup(struct virtio_net *dev, unsigned
>> int index)
>> }
>> }
>>
>> +static void
>> +vduse_vring_cleanup(struct virtio_net *dev, unsigned int index)
>> +{
>> + struct vhost_virtqueue *vq = dev->virtqueue[index];
>> + struct vduse_vq_eventfd vq_efd;
>> + int ret;
>> +
>> + if (vq == dev->cvq && vq->kickfd >= 0) {
>> + fdset_del(&vduse.fdset, vq->kickfd);
>> + fdset_pipe_notify(&vduse.fdset);
>> + }
>> +
>> + vq_efd.index = index;
>> + vq_efd.fd = VDUSE_EVENTFD_DEASSIGN;
>> +
>> + ret = ioctl(dev->vduse_dev_fd, VDUSE_VQ_SETUP_KICKFD, &vq_efd);
>> + if (ret)
>> + VHOST_LOG_CONFIG(dev->ifname, ERR, "Failed to cleanup kickfd
>> for VQ %u: %s\n",
>> + index, strerror(errno));
>> +
>> + close(vq->kickfd);
>> + vq->kickfd = VIRTIO_UNINITIALIZED_EVENTFD;
>> +
>> + vring_invalidate(dev, vq);
>> +
>> + rte_free(vq->batch_copy_elems);
>> + vq->batch_copy_elems = NULL;
>> +
>> + rte_free(vq->shadow_used_split);
>> + vq->shadow_used_split = NULL;
>> +
>> + vq->enabled = false;
>> + vq->ready = false;
>> + vq->size = 0;
>> + vq->last_used_idx = 0;
>> + vq->last_avail_idx = 0;
>> +}
>> +
>> static void
>> vduse_device_start(struct virtio_net *dev)
>> {
>> @@ -304,6 +342,23 @@ vduse_device_start(struct virtio_net *dev)
>> }
>> }
>>
>> +static void
>> +vduse_device_stop(struct virtio_net *dev)
>> +{
>> + unsigned int i;
>> +
>> + VHOST_LOG_CONFIG(dev->ifname, INFO, "Stopping device...\n");
>> +
>> + vhost_destroy_device_notify(dev);
>> +
>> + dev->flags &= ~VIRTIO_DEV_READY;
>> +
>> + for (i = 0; i < dev->nr_vring; i++)
>> + vduse_vring_cleanup(dev, i);
>> +
>> + vhost_user_iotlb_flush_all(dev);
>> +}
>> +
>> static void
>> vduse_events_handler(int fd, void *arg, int *remove __rte_unused)
>> {
>> @@ -311,6 +366,7 @@ vduse_events_handler(int fd, void *arg, int *remove
>> __rte_unused)
>> struct vduse_dev_request req;
>> struct vduse_dev_response resp;
>> struct vhost_virtqueue *vq;
>> + uint8_t old_status;
>> int ret;
>>
>> memset(&resp, 0, sizeof(resp));
>> @@ -339,10 +395,15 @@ vduse_events_handler(int fd, void *arg, int *remove
>> __rte_unused)
>> case VDUSE_SET_STATUS:
>> VHOST_LOG_CONFIG(dev->ifname, INFO, "\tnew status: 0x%08x\n",
>> req.s.status);
>> + old_status = dev->status;
>> dev->status = req.s.status;
>>
>> - if (dev->status & VIRTIO_DEVICE_STATUS_DRIVER_OK)
>> - vduse_device_start(dev);
>> + if ((old_status ^ dev->status) & VIRTIO_DEVICE_STATUS_DRIVER_OK) {
>> + if (dev->status & VIRTIO_DEVICE_STATUS_DRIVER_OK)
>> + vduse_device_start(dev);
>> + else
>> + vduse_device_stop(dev);
>> + }
>>
>> resp.result = VDUSE_REQ_RESULT_OK;
>> break;
>> @@ -560,12 +621,7 @@ vduse_device_destroy(const char *path)
>> if (vid == RTE_MAX_VHOST_DEVICE)
>> return -1;
>>
>> - if (dev->cvq && dev->cvq->kickfd >= 0) {
>> - fdset_del(&vduse.fdset, dev->cvq->kickfd);
>> - fdset_pipe_notify(&vduse.fdset);
>> - close(dev->cvq->kickfd);
>> - dev->cvq->kickfd = VIRTIO_UNINITIALIZED_EVENTFD;
>> - }
>> + vduse_device_stop(dev);
>>
>> fdset_del(&vduse.fdset, dev->vduse_dev_fd);
>> fdset_pipe_notify(&vduse.fdset);
>> --
>> 2.40.1
>
^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [PATCH v3 00/28] Add VDUSE support to Vhost library
2023-05-25 16:25 [PATCH v3 00/28] Add VDUSE support to Vhost library Maxime Coquelin
` (27 preceding siblings ...)
2023-05-25 16:25 ` [PATCH v3 28/28] vhost: add VDUSE device stop Maxime Coquelin
@ 2023-05-26 9:14 ` David Marchand
2023-06-01 14:59 ` Maxime Coquelin
28 siblings, 1 reply; 50+ messages in thread
From: David Marchand @ 2023-05-26 9:14 UTC (permalink / raw)
To: Maxime Coquelin
Cc: dev, chenbo.xia, mkp, fbl, jasowang, cunming.liang, xieyongji,
echaudro, eperezma, amorenoz, lulu
On Thu, May 25, 2023 at 6:25 PM Maxime Coquelin
<maxime.coquelin@redhat.com> wrote:
>
> Note: v2 is identical to v3, it is just a resend because
> of an issue when posting v2 breaking the series in patchwork.
>
> This series introduces a new type of backend, VDUSE,
> to the Vhost library.
>
> VDUSE stands for vDPA device in Userspace; it enables
> implementing a Virtio device in userspace and having it
> attached to the Kernel vDPA bus.
>
> Once attached to the vDPA bus, it can be used either by
> Kernel Virtio drivers, like virtio-net in our case, via
> the virtio-vdpa driver. Doing that, the device is visible
> to the Kernel networking stack and is exposed to userspace
> as a regular netdev.
>
> It can also be exposed to userspace thanks to the
> vhost-vdpa driver, via a vhost-vdpa chardev that can be
> passed to QEMU or Virtio-user PMD.
>
> While VDUSE support is already available in upstream
> Kernel, a couple of patches are required to support
> the network device type:
>
> https://gitlab.com/mcoquelin/linux/-/tree/vduse_networking_rfc
>
> In order to attach the created VDUSE device to the vDPA
> bus, a recent iproute2 version containing the vdpa tool is
> required.
>
> Benchmark results:
> ==================
>
> On this v2, PVP reference benchmark has been run & compared with
> Vhost-user.
>
> When doing macswap forwarding in the workload, no difference is seen.
> When doing io forwarding in the workload, we see a 4% performance
> degradation with VDUSE, compared to Vhost-user/Virtio-user. It is
> explained by the use of the IOTLB layer in the Vhost library when using
> VDUSE, whereas Vhost-user/Virtio-user does not make use of it.
>
> Usage:
> ======
>
> 1. Probe required Kernel modules
> # modprobe vdpa
> # modprobe vduse
> # modprobe virtio-vdpa
>
> 2. Build (require vduse kernel headers to be available)
> # meson build
> # ninja -C build
>
> 3. Create a VDUSE device (vduse0) using Vhost PMD with
> testpmd (with 4 queue pairs in this example)
> # ./build/app/dpdk-testpmd --no-pci --vdev=net_vhost0,iface=/dev/vduse/vduse0,queues=4 --log-level=*:9 -- -i --txq=4 --rxq=4
>
> 4. Attach the VDUSE device to the vDPA bus
> # vdpa dev add name vduse0 mgmtdev vduse
> => The virtio-net netdev shows up (eth0 here)
> # ip l show eth0
> 21: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
> link/ether c2:73:ea:a7:68:6d brd ff:ff:ff:ff:ff:ff
>
> 5. Start/stop traffic in testpmd
> testpmd> start
> testpmd> show port stats 0
> ######################## NIC statistics for port 0 ########################
> RX-packets: 11 RX-missed: 0 RX-bytes: 1482
> RX-errors: 0
> RX-nombuf: 0
> TX-packets: 1 TX-errors: 0 TX-bytes: 62
>
> Throughput (since last show)
> Rx-pps: 0 Rx-bps: 0
> Tx-pps: 0 Tx-bps: 0
> ############################################################################
> testpmd> stop
>
> 6. Detach the VDUSE device from the vDPA bus
> # vdpa dev del vduse0
>
> 7. Quit testpmd
> testpmd> quit
>
> Known issues & remaining work:
> ==============================
> - Fix issue in FD manager (still polling while FD has been removed)
> - Add Netlink support in Vhost library
> - Support device reconnection
> -> a temporary patch to support reconnection via a tmpfs file is available,
> upstream solution would be in-kernel and is being developed.
> -> https://gitlab.com/mcoquelin/dpdk-next-virtio/-/commit/5ad06ce14159a9ce36ee168dd13ef389cec91137
> - Support packed ring
> - Provide more performance benchmark results
>
> Changes in v2/v3:
> =================
> - Fixed mem_set_dump() parameter (patch 4)
> - Fixed accidental comment change (patch 7, Chenbo)
> - Change from __builtin_ctz to __builtin_ctzll (patch 9, Chenbo)
> - move change from patch 12 to 13 (Chenbo)
> - Enable locks annotation for control queue (Patch 17)
> - Send control queue notification when used descriptors enqueued (Patch 17)
> - Lock control queue IOTLB lock (Patch 17)
> - Fix error path in virtio_net_ctrl_pop() (Patch 17, Chenbo)
> - Set VDUSE dev FD as NONBLOCK (Patch 18)
> - Enable more Virtio features (Patch 18)
> - Remove calls to pthread_setcancelstate() (Patch 22)
> - Add calls to fdset_pipe_notify() when adding and deleting FDs from a set (Patch 22)
> - Use RTE_DIM() to get requests string array size (Patch 22)
> - Set reply result for IOTLB update message (Patch 25, Chenbo)
> - Fix queues enablement with multiqueue (Patch 26)
> - Move kickfd creation for better logging (Patch 26)
> - Improve logging (Patch 26)
> - Uninstall cvq kickfd in case of handler installation failure (Patch 27)
> - Enable CVQ notifications once handler is installed (Patch 27)
> - Don't advertise multiqueue and control queue if the app only requests a single queue pair (Patch 27)
> - Add release notes
>
> Maxime Coquelin (28):
> vhost: fix missing guest notif stat increment
> vhost: fix invalid call FD handling
> vhost: fix IOTLB entries overlap check with previous entry
> vhost: add helper of IOTLB entries coredump
> vhost: add helper for IOTLB entries shared page check
> vhost: don't dump unneeded pages with IOTLB
> vhost: change to single IOTLB cache per device
> vhost: add offset field to IOTLB entries
> vhost: add page size info to IOTLB entry
> vhost: retry translating IOVA after IOTLB miss
> vhost: introduce backend ops
> vhost: add IOTLB cache entry removal callback
> vhost: add helper for IOTLB misses
> vhost: add helper for interrupt injection
> vhost: add API to set max queue pairs
> net/vhost: use API to set max queue pairs
> vhost: add control virtqueue support
> vhost: add VDUSE device creation and destruction
> vhost: add VDUSE callback for IOTLB miss
> vhost: add VDUSE callback for IOTLB entry removal
> vhost: add VDUSE callback for IRQ injection
> vhost: add VDUSE events handler
> vhost: add support for virtqueue state get event
> vhost: add support for VDUSE status set event
> vhost: add support for VDUSE IOTLB update event
> vhost: add VDUSE device startup
> vhost: add multiqueue support to VDUSE
> vhost: add VDUSE device stop
>
> doc/guides/prog_guide/vhost_lib.rst | 4 +
> doc/guides/rel_notes/release_23_07.rst | 11 +
> drivers/net/vhost/rte_eth_vhost.c | 3 +
> lib/vhost/iotlb.c | 333 +++++++------
> lib/vhost/iotlb.h | 45 +-
> lib/vhost/meson.build | 5 +
> lib/vhost/rte_vhost.h | 17 +
> lib/vhost/socket.c | 72 ++-
> lib/vhost/vduse.c | 646 +++++++++++++++++++++++++
> lib/vhost/vduse.h | 33 ++
> lib/vhost/version.map | 3 +
> lib/vhost/vhost.c | 51 +-
> lib/vhost/vhost.h | 90 ++--
> lib/vhost/vhost_user.c | 51 +-
> lib/vhost/vhost_user.h | 2 +-
> lib/vhost/virtio_net_ctrl.c | 286 +++++++++++
> lib/vhost/virtio_net_ctrl.h | 10 +
> 17 files changed, 1424 insertions(+), 238 deletions(-)
> create mode 100644 lib/vhost/vduse.c
> create mode 100644 lib/vhost/vduse.h
> create mode 100644 lib/vhost/virtio_net_ctrl.c
> create mode 100644 lib/vhost/virtio_net_ctrl.h
I did not do an in-depth review, but overall, the series lgtm (and
per-patch compilation looks fine).
A few comments though:
- patch 2 is the same as
https://patchwork.dpdk.org/project/dpdk/patch/168431454344.558450.2397970324914136724.stgit@ebuild.local/
It would be cool to report the review tags in the first series that
gets applied.
- there may be a bug in patch 5, see comment on patch,
- patch 4, 5 and 6 go together, with patch 6 being the fix itself. I
understand it was easier to review as split patches, but maybe it
would be simpler to squash them to make the backport trivial.
- patch 7 (and some other patches in the series) will increase the
virtio_net structure but we are not gaining anything on the
vhost_virtqueue size, so a device + vqs memory footprint will slightly
increase. This is not a problem afaics?
- patch 15 breaks the doc, format is incorrect but the CI reported it
so you will notice it before merging :-),
--
David Marchand
^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [PATCH v3 00/28] Add VDUSE support to Vhost library
2023-05-26 9:14 ` [PATCH v3 00/28] Add VDUSE support to Vhost library David Marchand
@ 2023-06-01 14:59 ` Maxime Coquelin
2023-06-01 15:18 ` David Marchand
0 siblings, 1 reply; 50+ messages in thread
From: Maxime Coquelin @ 2023-06-01 14:59 UTC (permalink / raw)
To: David Marchand
Cc: dev, chenbo.xia, mkp, fbl, jasowang, cunming.liang, xieyongji,
echaudro, eperezma, amorenoz, lulu
Hi David,
On 5/26/23 11:14, David Marchand wrote:
> On Thu, May 25, 2023 at 6:25 PM Maxime Coquelin
> <maxime.coquelin@redhat.com> wrote:
>> [cover letter snipped; quoted in full earlier in the thread]
>
> I did not do an in-depth review, but overall, the series lgtm (and
> per-patch compilation looks fine).
>
> A few comments though:
> - patch 2 is the same as
> https://patchwork.dpdk.org/project/dpdk/patch/168431454344.558450.2397970324914136724.stgit@ebuild.local/
> It would be cool to report the review tags in the first series that
> gets applied.
Done
> - there may be a bug in patch 5, see comment on patch,
> - patch 4, 5 and 6 go together, with patch 6 being the fix itself. I
> understand it was easier to review as split patches, but maybe it
> would be simpler to squash them to make the backport trivial.
If possible I would prefer to keep them as separate patches; it will be
much easier to understand the code in the future if a regression
happens.
I'll help the LTS maintainer with backporting it (i.e. request to also
pick patch 5 and 5).
Does that work for you?
> - patch 7 (and some other patches in the series) will increase the
> virtio_net structure but we are not gaining anything on the
> vhost_virtqueue size, so a device + vqs memory footprint will slightly
> increase. This is not a problem afaics?
I have not noticed performance degradation being introduced, but it may
be good to revisit it.
> - patch 15 breaks the doc, format is incorrect but the CI reported it
> so you will notice it before merging :-),
>
Fixed!
Thanks,
Maxime
^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [PATCH v3 00/28] Add VDUSE support to Vhost library
2023-06-01 14:59 ` Maxime Coquelin
@ 2023-06-01 15:18 ` David Marchand
0 siblings, 0 replies; 50+ messages in thread
From: David Marchand @ 2023-06-01 15:18 UTC (permalink / raw)
To: Maxime Coquelin
Cc: dev, chenbo.xia, mkp, fbl, jasowang, cunming.liang, xieyongji,
echaudro, eperezma, amorenoz, lulu
On Thu, Jun 1, 2023 at 4:59 PM Maxime Coquelin
<maxime.coquelin@redhat.com> wrote:
> > - patch 4, 5 and 6 go together, with patch 6 being the fix itself. I
> > understand it was easier to review as splitted patches, but maybe it
> > would be simpler to squash them to make the backport trivial.
>
> If possible I would prefer to keep them as separate patches, it will be
> much easier to understand the code in the future if a regression
> happened.
>
> I'll help the LTS maintainer with backporting it (i.e. request to also
> pick patch 5 and 5).
4*
>
> Does that work for you?
Ok for me.
--
David Marchand
^ permalink raw reply [flat|nested] 50+ messages in thread