DPDK patches and discussions
* [PATCH v5 00/26] Add VDUSE support to Vhost library
@ 2023-06-06  8:18 Maxime Coquelin
  2023-06-06  8:18 ` [PATCH v5 01/26] vhost: fix IOTLB entries overlap check with previous entry Maxime Coquelin
                   ` (28 more replies)
  0 siblings, 29 replies; 36+ messages in thread
From: Maxime Coquelin @ 2023-06-06  8:18 UTC (permalink / raw)
  To: dev, chenbo.xia, david.marchand, mkp, fbl, jasowang,
	cunming.liang, xieyongji, echaudro, eperezma, amorenoz, lulu
  Cc: Maxime Coquelin

This series introduces a new type of backend, VDUSE,
to the Vhost library.

VDUSE stands for vDPA Device in Userspace. It enables
implementing a Virtio device in userspace and attaching
it to the Kernel vDPA bus.

Once attached to the vDPA bus, the device can be used
by Kernel Virtio drivers, like virtio-net in our case,
via the virtio-vdpa driver. That way, the device is
visible to the Kernel networking stack and is exposed
to userspace as a regular netdev.

It can also be exposed to userspace through the
vhost-vdpa driver, via a vhost-vdpa chardev that can be
passed to QEMU or the Virtio-user PMD.

While VDUSE support is already available in the upstream
Kernel, a couple of patches are required to support the
network device type:

https://gitlab.com/mcoquelin/linux/-/tree/vduse_networking_rfc

In order to attach the created VDUSE device to the vDPA
bus, a recent iproute2 version containing the vdpa tool is
required.

Benchmark results:
==================

The PVP reference benchmark was run on v2 of this series
and compared with Vhost-user.

When doing macswap forwarding in the workload, no difference is seen.
When doing io forwarding in the workload, we see a 4% performance
degradation with VDUSE compared to Vhost-user/Virtio-user. It is
explained by the use of the IOTLB layer in the Vhost library when using
VDUSE, whereas Vhost-user/Virtio-user do not make use of it.

Usage:
======

1. Probe required Kernel modules
# modprobe vdpa
# modprobe vduse
# modprobe virtio-vdpa

2. Build (requires the VDUSE kernel headers to be available)
# meson build
# ninja -C build

3. Create a VDUSE device (vduse0) using Vhost PMD with
testpmd (with 4 queue pairs in this example)
# ./build/app/dpdk-testpmd --no-pci --vdev=net_vhost0,iface=/dev/vduse/vduse0,queues=4 --log-level=*:9  -- -i --txq=4 --rxq=4
 
4. Attach the VDUSE device to the vDPA bus
# vdpa dev add name vduse0 mgmtdev vduse
=> The virtio-net netdev shows up (eth0 here)
# ip l show eth0
21: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
    link/ether c2:73:ea:a7:68:6d brd ff:ff:ff:ff:ff:ff

5. Start/stop traffic in testpmd
testpmd> start
testpmd> show port stats 0
  ######################## NIC statistics for port 0  ########################
  RX-packets: 11         RX-missed: 0          RX-bytes:  1482
  RX-errors: 0
  RX-nombuf:  0
  TX-packets: 1          TX-errors: 0          TX-bytes:  62

  Throughput (since last show)
  Rx-pps:            0          Rx-bps:            0
  Tx-pps:            0          Tx-bps:            0
  ############################################################################
testpmd> stop

6. Detach the VDUSE device from the vDPA bus
# vdpa dev del vduse0

7. Quit testpmd
testpmd> quit

Known issues & remaining work:
==============================
- Fix an issue in the FD manager (still polling while the FD has been removed)
- Add Netlink support in Vhost library
- Support device reconnection
 -> A temporary patch to support reconnection via a tmpfs file is available;
    the upstream solution will be in-kernel and is being developed.
 -> https://gitlab.com/mcoquelin/dpdk-next-virtio/-/commit/5ad06ce14159a9ce36ee168dd13ef389cec91137
- Support packed ring
- Provide more performance benchmark results

Changes in v5:
==============
- Delay starting/stopping the device until after replying to the VDUSE
  event, in order to avoid a deadlock encountered when testing with OVS.
- Mention the lack of reconnection support in the release notes.

Changes in v4:
==============
- Applied patches 1 and 2 from v3
- Rebased on top of Eelco's series
- Fixed coredump clear in IOTLB cache removal (David)
- Removed unneeded ret variable in vhost_vring_inject_irq (David)
- Fixed release note (David, Chenbo)

Changes in v2/v3:
=================
- Fixed mem_set_dump() parameter (patch 4)
- Fixed accidental comment change (patch 7, Chenbo)
- Change from __builtin_ctz to __builtin_ctzll (patch 9, Chenbo)
- Moved change from patch 12 to patch 13 (Chenbo)
- Enable locks annotation for control queue (Patch 17)
- Send control queue notification when used descriptors enqueued (Patch 17)
- Lock control queue IOTLB lock (Patch 17)
- Fix error path in virtio_net_ctrl_pop() (Patch 17, Chenbo)
- Set VDUSE dev FD as NONBLOCK (Patch 18)
- Enable more Virtio features (Patch 18)
- Remove calls to pthread_setcancelstate() (Patch 22)
- Add calls to fdset_pipe_notify() when adding and deleting FDs from a set (Patch 22)
- Use RTE_DIM() to get requests string array size (Patch 22)
- Set reply result for IOTLB update message (Patch 25, Chenbo)
- Fix queues enablement with multiqueue (Patch 26)
- Move kickfd creation for better logging (Patch 26)
- Improve logging (Patch 26)
- Uninstall cvq kickfd in case of handler installation failure (Patch 27)
- Enable CVQ notifications once handler is installed (Patch 27)
- Don't advertise multiqueue and control queue if the app only requests a single queue pair (Patch 27)
- Add release notes

Maxime Coquelin (26):
  vhost: fix IOTLB entries overlap check with previous entry
  vhost: add helper of IOTLB entries coredump
  vhost: add helper for IOTLB entries shared page check
  vhost: don't dump unneeded pages with IOTLB
  vhost: change to single IOTLB cache per device
  vhost: add offset field to IOTLB entries
  vhost: add page size info to IOTLB entry
  vhost: retry translating IOVA after IOTLB miss
  vhost: introduce backend ops
  vhost: add IOTLB cache entry removal callback
  vhost: add helper for IOTLB misses
  vhost: add helper for interrupt injection
  vhost: add API to set max queue pairs
  net/vhost: use API to set max queue pairs
  vhost: add control virtqueue support
  vhost: add VDUSE device creation and destruction
  vhost: add VDUSE callback for IOTLB miss
  vhost: add VDUSE callback for IOTLB entry removal
  vhost: add VDUSE callback for IRQ injection
  vhost: add VDUSE events handler
  vhost: add support for virtqueue state get event
  vhost: add support for VDUSE status set event
  vhost: add support for VDUSE IOTLB update event
  vhost: add VDUSE device startup
  vhost: add multiqueue support to VDUSE
  vhost: add VDUSE device stop

 doc/guides/prog_guide/vhost_lib.rst    |   4 +
 doc/guides/rel_notes/release_23_07.rst |  12 +
 drivers/net/vhost/rte_eth_vhost.c      |   3 +
 lib/vhost/iotlb.c                      | 333 +++++++------
 lib/vhost/iotlb.h                      |  45 +-
 lib/vhost/meson.build                  |   5 +
 lib/vhost/rte_vhost.h                  |  17 +
 lib/vhost/socket.c                     |  72 ++-
 lib/vhost/vduse.c                      | 646 +++++++++++++++++++++++++
 lib/vhost/vduse.h                      |  33 ++
 lib/vhost/version.map                  |   1 +
 lib/vhost/vhost.c                      |  70 ++-
 lib/vhost/vhost.h                      |  57 ++-
 lib/vhost/vhost_user.c                 |  51 +-
 lib/vhost/vhost_user.h                 |   2 +-
 lib/vhost/virtio_net_ctrl.c            | 286 +++++++++++
 lib/vhost/virtio_net_ctrl.h            |  10 +
 17 files changed, 1409 insertions(+), 238 deletions(-)
 create mode 100644 lib/vhost/vduse.c
 create mode 100644 lib/vhost/vduse.h
 create mode 100644 lib/vhost/virtio_net_ctrl.c
 create mode 100644 lib/vhost/virtio_net_ctrl.h

-- 
2.40.1


^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH v5 01/26] vhost: fix IOTLB entries overlap check with previous entry
  2023-06-06  8:18 [PATCH v5 00/26] Add VDUSE support to Vhost library Maxime Coquelin
@ 2023-06-06  8:18 ` Maxime Coquelin
  2023-06-06  8:18 ` [PATCH v5 02/26] vhost: add helper of IOTLB entries coredump Maxime Coquelin
                   ` (27 subsequent siblings)
  28 siblings, 0 replies; 36+ messages in thread
From: Maxime Coquelin @ 2023-06-06  8:18 UTC (permalink / raw)
  To: dev, chenbo.xia, david.marchand, mkp, fbl, jasowang,
	cunming.liang, xieyongji, echaudro, eperezma, amorenoz, lulu
  Cc: Maxime Coquelin, stable

Commit 22b6d0ac691a ("vhost: fix madvise IOTLB entries pages overlap check")
fixed the check to ensure the entry to be removed does not
overlap with the next one in the IOTLB cache before marking
it as DONTDUMP with madvise(). This is not enough, because
the same issue is present when comparing with the previous
entry in the cache, where the end address of the previous
entry should be used, not its start address.
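
The corrected comparison can be illustrated with a standalone sketch
(the function name is hypothetical; the real code lives in
lib/vhost/iotlb.c and operates on IOTLB entry nodes):

```c
#include <stdint.h>

/* Return non-zero when a node's start address falls in the same page as
 * the END of the previous entry's mapping. The buggy version compared
 * against the previous entry's start address, which misses the overlap
 * whenever the previous entry spans several pages. 'align' is the page
 * size, a power of two. */
static int
same_page_as_prev_end(uint64_t uaddr, uint64_t prev_uaddr,
		uint64_t prev_size, uint64_t align)
{
	uint64_t mask = ~(align - 1);

	return (uaddr & mask) == ((prev_uaddr + prev_size - 1) & mask);
}
```

With 4 KiB pages, a previous entry at 0x1000 of size 0x2000 ends in the
page starting at 0x2000; an entry at 0x2800 shares that page, while a
start-address comparison (0x1000 vs 0x2800) would wrongly conclude it
does not.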

Fixes: dea092d0addb ("vhost: fix madvise arguments alignment")
Cc: stable@dpdk.org

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Mike Pattrick <mkp@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
---
 lib/vhost/iotlb.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/lib/vhost/iotlb.c b/lib/vhost/iotlb.c
index 3f45bc6061..870c8acb88 100644
--- a/lib/vhost/iotlb.c
+++ b/lib/vhost/iotlb.c
@@ -178,8 +178,8 @@ vhost_user_iotlb_cache_random_evict(struct virtio_net *dev, struct vhost_virtque
 			mask = ~(alignment - 1);
 
 			/* Don't disable coredump if the previous node is in the same page */
-			if (prev_node == NULL ||
-					(node->uaddr & mask) != (prev_node->uaddr & mask)) {
+			if (prev_node == NULL || (node->uaddr & mask) !=
+					((prev_node->uaddr + prev_node->size - 1) & mask)) {
 				next_node = RTE_TAILQ_NEXT(node, next);
 				/* Don't disable coredump if the next node is in the same page */
 				if (next_node == NULL || ((node->uaddr + node->size - 1) & mask) !=
@@ -283,8 +283,8 @@ vhost_user_iotlb_cache_remove(struct virtio_net *dev, struct vhost_virtqueue *vq
 			mask = ~(alignment-1);
 
 			/* Don't disable coredump if the previous node is in the same page */
-			if (prev_node == NULL ||
-					(node->uaddr & mask) != (prev_node->uaddr & mask)) {
+			if (prev_node == NULL || (node->uaddr & mask) !=
+					((prev_node->uaddr + prev_node->size - 1) & mask)) {
 				next_node = RTE_TAILQ_NEXT(node, next);
 				/* Don't disable coredump if the next node is in the same page */
 				if (next_node == NULL || ((node->uaddr + node->size - 1) & mask) !=
-- 
2.40.1



* [PATCH v5 02/26] vhost: add helper of IOTLB entries coredump
  2023-06-06  8:18 [PATCH v5 00/26] Add VDUSE support to Vhost library Maxime Coquelin
  2023-06-06  8:18 ` [PATCH v5 01/26] vhost: fix IOTLB entries overlap check with previous entry Maxime Coquelin
@ 2023-06-06  8:18 ` Maxime Coquelin
  2023-06-06  8:18 ` [PATCH v5 03/26] vhost: add helper for IOTLB entries shared page check Maxime Coquelin
                   ` (26 subsequent siblings)
  28 siblings, 0 replies; 36+ messages in thread
From: Maxime Coquelin @ 2023-06-06  8:18 UTC (permalink / raw)
  To: dev, chenbo.xia, david.marchand, mkp, fbl, jasowang,
	cunming.liang, xieyongji, echaudro, eperezma, amorenoz, lulu
  Cc: Maxime Coquelin

This patch reworks the IOTLB code to extract the
madvise-related bits into a dedicated helper. This
refactoring improves code sharing.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
---
 lib/vhost/iotlb.c | 77 +++++++++++++++++++++++++----------------------
 1 file changed, 41 insertions(+), 36 deletions(-)

diff --git a/lib/vhost/iotlb.c b/lib/vhost/iotlb.c
index 870c8acb88..51d45de446 100644
--- a/lib/vhost/iotlb.c
+++ b/lib/vhost/iotlb.c
@@ -23,6 +23,34 @@ struct vhost_iotlb_entry {
 
 #define IOTLB_CACHE_SIZE 2048
 
+static void
+vhost_user_iotlb_set_dump(struct virtio_net *dev, struct vhost_iotlb_entry *node)
+{
+	uint64_t align;
+
+	align = hua_to_alignment(dev->mem, (void *)(uintptr_t)node->uaddr);
+
+	mem_set_dump((void *)(uintptr_t)node->uaddr, node->size, true, align);
+}
+
+static void
+vhost_user_iotlb_clear_dump(struct virtio_net *dev, struct vhost_iotlb_entry *node,
+		struct vhost_iotlb_entry *prev, struct vhost_iotlb_entry *next)
+{
+	uint64_t align, mask;
+
+	align = hua_to_alignment(dev->mem, (void *)(uintptr_t)node->uaddr);
+	mask = ~(align - 1);
+
+	/* Don't disable coredump if the previous node is in the same page */
+	if (prev == NULL || (node->uaddr & mask) != ((prev->uaddr + prev->size - 1) & mask)) {
+		/* Don't disable coredump if the next node is in the same page */
+		if (next == NULL ||
+				((node->uaddr + node->size - 1) & mask) != (next->uaddr & mask))
+			mem_set_dump((void *)(uintptr_t)node->uaddr, node->size, false, align);
+	}
+}
+
 static struct vhost_iotlb_entry *
 vhost_user_iotlb_pool_get(struct vhost_virtqueue *vq)
 {
@@ -149,8 +177,8 @@ vhost_user_iotlb_cache_remove_all(struct virtio_net *dev, struct vhost_virtqueue
 	rte_rwlock_write_lock(&vq->iotlb_lock);
 
 	RTE_TAILQ_FOREACH_SAFE(node, &vq->iotlb_list, next, temp_node) {
-		mem_set_dump((void *)(uintptr_t)node->uaddr, node->size, false,
-			hua_to_alignment(dev->mem, (void *)(uintptr_t)node->uaddr));
+		vhost_user_iotlb_clear_dump(dev, node, NULL, NULL);
+
 		TAILQ_REMOVE(&vq->iotlb_list, node, next);
 		vhost_user_iotlb_pool_put(vq, node);
 	}
@@ -164,7 +192,6 @@ static void
 vhost_user_iotlb_cache_random_evict(struct virtio_net *dev, struct vhost_virtqueue *vq)
 {
 	struct vhost_iotlb_entry *node, *temp_node, *prev_node = NULL;
-	uint64_t alignment, mask;
 	int entry_idx;
 
 	rte_rwlock_write_lock(&vq->iotlb_lock);
@@ -173,20 +200,10 @@ vhost_user_iotlb_cache_random_evict(struct virtio_net *dev, struct vhost_virtque
 
 	RTE_TAILQ_FOREACH_SAFE(node, &vq->iotlb_list, next, temp_node) {
 		if (!entry_idx) {
-			struct vhost_iotlb_entry *next_node;
-			alignment = hua_to_alignment(dev->mem, (void *)(uintptr_t)node->uaddr);
-			mask = ~(alignment - 1);
-
-			/* Don't disable coredump if the previous node is in the same page */
-			if (prev_node == NULL || (node->uaddr & mask) !=
-					((prev_node->uaddr + prev_node->size - 1) & mask)) {
-				next_node = RTE_TAILQ_NEXT(node, next);
-				/* Don't disable coredump if the next node is in the same page */
-				if (next_node == NULL || ((node->uaddr + node->size - 1) & mask) !=
-						(next_node->uaddr & mask))
-					mem_set_dump((void *)(uintptr_t)node->uaddr, node->size,
-							false, alignment);
-			}
+			struct vhost_iotlb_entry *next_node = RTE_TAILQ_NEXT(node, next);
+
+			vhost_user_iotlb_clear_dump(dev, node, prev_node, next_node);
+
 			TAILQ_REMOVE(&vq->iotlb_list, node, next);
 			vhost_user_iotlb_pool_put(vq, node);
 			vq->iotlb_cache_nr--;
@@ -240,16 +257,16 @@ vhost_user_iotlb_cache_insert(struct virtio_net *dev, struct vhost_virtqueue *vq
 			vhost_user_iotlb_pool_put(vq, new_node);
 			goto unlock;
 		} else if (node->iova > new_node->iova) {
-			mem_set_dump((void *)(uintptr_t)new_node->uaddr, new_node->size, true,
-				hua_to_alignment(dev->mem, (void *)(uintptr_t)new_node->uaddr));
+			vhost_user_iotlb_set_dump(dev, new_node);
+
 			TAILQ_INSERT_BEFORE(node, new_node, next);
 			vq->iotlb_cache_nr++;
 			goto unlock;
 		}
 	}
 
-	mem_set_dump((void *)(uintptr_t)new_node->uaddr, new_node->size, true,
-		hua_to_alignment(dev->mem, (void *)(uintptr_t)new_node->uaddr));
+	vhost_user_iotlb_set_dump(dev, new_node);
+
 	TAILQ_INSERT_TAIL(&vq->iotlb_list, new_node, next);
 	vq->iotlb_cache_nr++;
 
@@ -265,7 +282,6 @@ vhost_user_iotlb_cache_remove(struct virtio_net *dev, struct vhost_virtqueue *vq
 					uint64_t iova, uint64_t size)
 {
 	struct vhost_iotlb_entry *node, *temp_node, *prev_node = NULL;
-	uint64_t alignment, mask;
 
 	if (unlikely(!size))
 		return;
@@ -278,20 +294,9 @@ vhost_user_iotlb_cache_remove(struct virtio_net *dev, struct vhost_virtqueue *vq
 			break;
 
 		if (iova < node->iova + node->size) {
-			struct vhost_iotlb_entry *next_node;
-			alignment = hua_to_alignment(dev->mem, (void *)(uintptr_t)node->uaddr);
-			mask = ~(alignment-1);
-
-			/* Don't disable coredump if the previous node is in the same page */
-			if (prev_node == NULL || (node->uaddr & mask) !=
-					((prev_node->uaddr + prev_node->size - 1) & mask)) {
-				next_node = RTE_TAILQ_NEXT(node, next);
-				/* Don't disable coredump if the next node is in the same page */
-				if (next_node == NULL || ((node->uaddr + node->size - 1) & mask) !=
-						(next_node->uaddr & mask))
-					mem_set_dump((void *)(uintptr_t)node->uaddr, node->size,
-							false, alignment);
-			}
+			struct vhost_iotlb_entry *next_node = RTE_TAILQ_NEXT(node, next);
+
+			vhost_user_iotlb_clear_dump(dev, node, prev_node, next_node);
 
 			TAILQ_REMOVE(&vq->iotlb_list, node, next);
 			vhost_user_iotlb_pool_put(vq, node);
-- 
2.40.1



* [PATCH v5 03/26] vhost: add helper for IOTLB entries shared page check
  2023-06-06  8:18 [PATCH v5 00/26] Add VDUSE support to Vhost library Maxime Coquelin
  2023-06-06  8:18 ` [PATCH v5 01/26] vhost: fix IOTLB entries overlap check with previous entry Maxime Coquelin
  2023-06-06  8:18 ` [PATCH v5 02/26] vhost: add helper of IOTLB entries coredump Maxime Coquelin
@ 2023-06-06  8:18 ` Maxime Coquelin
  2023-06-06  8:18 ` [PATCH v5 04/26] vhost: don't dump unneeded pages with IOTLB Maxime Coquelin
                   ` (25 subsequent siblings)
  28 siblings, 0 replies; 36+ messages in thread
From: Maxime Coquelin @ 2023-06-06  8:18 UTC (permalink / raw)
  To: dev, chenbo.xia, david.marchand, mkp, fbl, jasowang,
	cunming.liang, xieyongji, echaudro, eperezma, amorenoz, lulu
  Cc: Maxime Coquelin

This patch introduces a helper to check whether two IOTLB
entries share a page.
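
The helper's logic can be reproduced outside DPDK as follows (a sketch:
the struct is reduced to the two fields the check needs, the open-coded
alignment arithmetic stands in for RTE_ALIGN_CEIL()/RTE_ALIGN_FLOOR(),
and the assertion that 'a' starts below 'b' is omitted):

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct entry {
	uint64_t uaddr;	/* host userspace address of the mapping */
	uint64_t size;
};

/* True when the last page touched by entry 'a' is also the first page
 * touched by entry 'b'. 'a' must start below 'b'; 'align' is the page
 * size, a power of two. */
static bool
share_page(const struct entry *a, const struct entry *b, uint64_t align)
{
	uint64_t a_end, b_start;

	if (a == NULL || b == NULL)
		return false;

	a_end = (a->uaddr + a->size + align - 1) & ~(align - 1); /* align up */
	b_start = b->uaddr & ~(align - 1);                       /* align down */

	return a_end > b_start;
}
```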

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Mike Pattrick <mkp@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
---
 lib/vhost/iotlb.c | 25 ++++++++++++++++++++-----
 1 file changed, 20 insertions(+), 5 deletions(-)

diff --git a/lib/vhost/iotlb.c b/lib/vhost/iotlb.c
index 51d45de446..4ef038adff 100644
--- a/lib/vhost/iotlb.c
+++ b/lib/vhost/iotlb.c
@@ -23,6 +23,23 @@ struct vhost_iotlb_entry {
 
 #define IOTLB_CACHE_SIZE 2048
 
+static bool
+vhost_user_iotlb_share_page(struct vhost_iotlb_entry *a, struct vhost_iotlb_entry *b,
+		uint64_t align)
+{
+	uint64_t a_end, b_start;
+
+	if (a == NULL || b == NULL)
+		return false;
+
+	/* Assumes entry a lower than entry b */
+	RTE_ASSERT(a->uaddr < b->uaddr);
+	a_end = RTE_ALIGN_CEIL(a->uaddr + a->size, align);
+	b_start = RTE_ALIGN_FLOOR(b->uaddr, align);
+
+	return a_end > b_start;
+}
+
 static void
 vhost_user_iotlb_set_dump(struct virtio_net *dev, struct vhost_iotlb_entry *node)
 {
@@ -37,16 +54,14 @@ static void
 vhost_user_iotlb_clear_dump(struct virtio_net *dev, struct vhost_iotlb_entry *node,
 		struct vhost_iotlb_entry *prev, struct vhost_iotlb_entry *next)
 {
-	uint64_t align, mask;
+	uint64_t align;
 
 	align = hua_to_alignment(dev->mem, (void *)(uintptr_t)node->uaddr);
-	mask = ~(align - 1);
 
 	/* Don't disable coredump if the previous node is in the same page */
-	if (prev == NULL || (node->uaddr & mask) != ((prev->uaddr + prev->size - 1) & mask)) {
+	if (!vhost_user_iotlb_share_page(prev, node, align)) {
 		/* Don't disable coredump if the next node is in the same page */
-		if (next == NULL ||
-				((node->uaddr + node->size - 1) & mask) != (next->uaddr & mask))
+		if (!vhost_user_iotlb_share_page(node, next, align))
 			mem_set_dump((void *)(uintptr_t)node->uaddr, node->size, false, align);
 	}
 }
-- 
2.40.1



* [PATCH v5 04/26] vhost: don't dump unneeded pages with IOTLB
  2023-06-06  8:18 [PATCH v5 00/26] Add VDUSE support to Vhost library Maxime Coquelin
                   ` (2 preceding siblings ...)
  2023-06-06  8:18 ` [PATCH v5 03/26] vhost: add helper for IOTLB entries shared page check Maxime Coquelin
@ 2023-06-06  8:18 ` Maxime Coquelin
  2023-06-06  8:18 ` [PATCH v5 05/26] vhost: change to single IOTLB cache per device Maxime Coquelin
                   ` (24 subsequent siblings)
  28 siblings, 0 replies; 36+ messages in thread
From: Maxime Coquelin @ 2023-06-06  8:18 UTC (permalink / raw)
  To: dev, chenbo.xia, david.marchand, mkp, fbl, jasowang,
	cunming.liang, xieyongji, echaudro, eperezma, amorenoz, lulu
  Cc: Maxime Coquelin, stable

On IOTLB entry removal, previous fixes took care of not
marking pages shared with other IOTLB entries as DONTDUMP.

However, if an IOTLB entry spans multiple pages, the
other pages were kept as DODUMP even though they might
not be shared with other entries, needlessly increasing
the coredump size.

This patch addresses this issue by excluding only the
shared pages from madvise's DONTDUMP.
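
The new clamping can be sketched as pure address arithmetic (the helper
name is hypothetical; the patch below does the same with
RTE_ALIGN_CEIL()/RTE_ALIGN_FLOOR() and only calls mem_set_dump() when
the resulting range is non-empty):

```c
#include <stdint.h>

struct range {
	uint64_t start;
	uint64_t end;	/* exclusive */
};

/* Compute the sub-range of [uaddr, uaddr + size) that may safely be
 * marked DONTDUMP: drop the first page when it is shared with the
 * previous entry and the last page when shared with the next one.
 * 'align' is the page size, a power of two. */
static struct range
dontdump_range(uint64_t uaddr, uint64_t size, int share_prev,
		int share_next, uint64_t align)
{
	struct range r = { uaddr, uaddr + size };

	if (share_prev)
		r.start = (r.start + align - 1) & ~(align - 1); /* align up */
	if (share_next)
		r.end = r.end & ~(align - 1);                   /* align down */

	if (r.end < r.start)	/* entry fully inside shared pages */
		r.end = r.start;

	return r;
}
```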

Fixes: dea092d0addb ("vhost: fix madvise arguments alignment")
Cc: stable@dpdk.org

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Mike Pattrick <mkp@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
---
 lib/vhost/iotlb.c | 21 ++++++++++++++-------
 1 file changed, 14 insertions(+), 7 deletions(-)

diff --git a/lib/vhost/iotlb.c b/lib/vhost/iotlb.c
index 4ef038adff..95d67ac832 100644
--- a/lib/vhost/iotlb.c
+++ b/lib/vhost/iotlb.c
@@ -54,16 +54,23 @@ static void
 vhost_user_iotlb_clear_dump(struct virtio_net *dev, struct vhost_iotlb_entry *node,
 		struct vhost_iotlb_entry *prev, struct vhost_iotlb_entry *next)
 {
-	uint64_t align;
+	uint64_t align, start, end;
+
+	start = node->uaddr;
+	end = node->uaddr + node->size;
 
 	align = hua_to_alignment(dev->mem, (void *)(uintptr_t)node->uaddr);
 
-	/* Don't disable coredump if the previous node is in the same page */
-	if (!vhost_user_iotlb_share_page(prev, node, align)) {
-		/* Don't disable coredump if the next node is in the same page */
-		if (!vhost_user_iotlb_share_page(node, next, align))
-			mem_set_dump((void *)(uintptr_t)node->uaddr, node->size, false, align);
-	}
+	/* Skip first page if shared with previous entry. */
+	if (vhost_user_iotlb_share_page(prev, node, align))
+		start = RTE_ALIGN_CEIL(start, align);
+
+	/* Skip last page if shared with next entry. */
+	if (vhost_user_iotlb_share_page(node, next, align))
+		end = RTE_ALIGN_FLOOR(end, align);
+
+	if (end > start)
+		mem_set_dump((void *)(uintptr_t)start, end - start, false, align);
 }
 
 static struct vhost_iotlb_entry *
-- 
2.40.1



* [PATCH v5 05/26] vhost: change to single IOTLB cache per device
  2023-06-06  8:18 [PATCH v5 00/26] Add VDUSE support to Vhost library Maxime Coquelin
                   ` (3 preceding siblings ...)
  2023-06-06  8:18 ` [PATCH v5 04/26] vhost: don't dump unneeded pages with IOTLB Maxime Coquelin
@ 2023-06-06  8:18 ` Maxime Coquelin
  2023-06-06  8:18 ` [PATCH v5 06/26] vhost: add offset field to IOTLB entries Maxime Coquelin
                   ` (23 subsequent siblings)
  28 siblings, 0 replies; 36+ messages in thread
From: Maxime Coquelin @ 2023-06-06  8:18 UTC (permalink / raw)
  To: dev, chenbo.xia, david.marchand, mkp, fbl, jasowang,
	cunming.liang, xieyongji, echaudro, eperezma, amorenoz, lulu
  Cc: Maxime Coquelin

This patch simplifies the IOTLB implementation and
reduces IOTLB memory consumption by having a single
IOTLB cache per device instead of one per virtqueue.

In order not to impact performance, it keeps an IOTLB
lock per virtqueue, so that there is no contention
between multiple queues trying to acquire it.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
---
 lib/vhost/iotlb.c      | 212 +++++++++++++++++++----------------------
 lib/vhost/iotlb.h      |  43 ++++++---
 lib/vhost/vhost.c      |  18 ++--
 lib/vhost/vhost.h      |  16 ++--
 lib/vhost/vhost_user.c |  23 +++--
 5 files changed, 159 insertions(+), 153 deletions(-)

diff --git a/lib/vhost/iotlb.c b/lib/vhost/iotlb.c
index 95d67ac832..6d49bf6b30 100644
--- a/lib/vhost/iotlb.c
+++ b/lib/vhost/iotlb.c
@@ -74,86 +74,81 @@ vhost_user_iotlb_clear_dump(struct virtio_net *dev, struct vhost_iotlb_entry *no
 }
 
 static struct vhost_iotlb_entry *
-vhost_user_iotlb_pool_get(struct vhost_virtqueue *vq)
+vhost_user_iotlb_pool_get(struct virtio_net *dev)
 {
 	struct vhost_iotlb_entry *node;
 
-	rte_spinlock_lock(&vq->iotlb_free_lock);
-	node = SLIST_FIRST(&vq->iotlb_free_list);
+	rte_spinlock_lock(&dev->iotlb_free_lock);
+	node = SLIST_FIRST(&dev->iotlb_free_list);
 	if (node != NULL)
-		SLIST_REMOVE_HEAD(&vq->iotlb_free_list, next_free);
-	rte_spinlock_unlock(&vq->iotlb_free_lock);
+		SLIST_REMOVE_HEAD(&dev->iotlb_free_list, next_free);
+	rte_spinlock_unlock(&dev->iotlb_free_lock);
 	return node;
 }
 
 static void
-vhost_user_iotlb_pool_put(struct vhost_virtqueue *vq,
-	struct vhost_iotlb_entry *node)
+vhost_user_iotlb_pool_put(struct virtio_net *dev, struct vhost_iotlb_entry *node)
 {
-	rte_spinlock_lock(&vq->iotlb_free_lock);
-	SLIST_INSERT_HEAD(&vq->iotlb_free_list, node, next_free);
-	rte_spinlock_unlock(&vq->iotlb_free_lock);
+	rte_spinlock_lock(&dev->iotlb_free_lock);
+	SLIST_INSERT_HEAD(&dev->iotlb_free_list, node, next_free);
+	rte_spinlock_unlock(&dev->iotlb_free_lock);
 }
 
 static void
-vhost_user_iotlb_cache_random_evict(struct virtio_net *dev, struct vhost_virtqueue *vq);
+vhost_user_iotlb_cache_random_evict(struct virtio_net *dev);
 
 static void
-vhost_user_iotlb_pending_remove_all(struct vhost_virtqueue *vq)
+vhost_user_iotlb_pending_remove_all(struct virtio_net *dev)
 {
 	struct vhost_iotlb_entry *node, *temp_node;
 
-	rte_rwlock_write_lock(&vq->iotlb_pending_lock);
+	rte_rwlock_write_lock(&dev->iotlb_pending_lock);
 
-	RTE_TAILQ_FOREACH_SAFE(node, &vq->iotlb_pending_list, next, temp_node) {
-		TAILQ_REMOVE(&vq->iotlb_pending_list, node, next);
-		vhost_user_iotlb_pool_put(vq, node);
+	RTE_TAILQ_FOREACH_SAFE(node, &dev->iotlb_pending_list, next, temp_node) {
+		TAILQ_REMOVE(&dev->iotlb_pending_list, node, next);
+		vhost_user_iotlb_pool_put(dev, node);
 	}
 
-	rte_rwlock_write_unlock(&vq->iotlb_pending_lock);
+	rte_rwlock_write_unlock(&dev->iotlb_pending_lock);
 }
 
 bool
-vhost_user_iotlb_pending_miss(struct vhost_virtqueue *vq, uint64_t iova,
-				uint8_t perm)
+vhost_user_iotlb_pending_miss(struct virtio_net *dev, uint64_t iova, uint8_t perm)
 {
 	struct vhost_iotlb_entry *node;
 	bool found = false;
 
-	rte_rwlock_read_lock(&vq->iotlb_pending_lock);
+	rte_rwlock_read_lock(&dev->iotlb_pending_lock);
 
-	TAILQ_FOREACH(node, &vq->iotlb_pending_list, next) {
+	TAILQ_FOREACH(node, &dev->iotlb_pending_list, next) {
 		if ((node->iova == iova) && (node->perm == perm)) {
 			found = true;
 			break;
 		}
 	}
 
-	rte_rwlock_read_unlock(&vq->iotlb_pending_lock);
+	rte_rwlock_read_unlock(&dev->iotlb_pending_lock);
 
 	return found;
 }
 
 void
-vhost_user_iotlb_pending_insert(struct virtio_net *dev, struct vhost_virtqueue *vq,
-				uint64_t iova, uint8_t perm)
+vhost_user_iotlb_pending_insert(struct virtio_net *dev, uint64_t iova, uint8_t perm)
 {
 	struct vhost_iotlb_entry *node;
 
-	node = vhost_user_iotlb_pool_get(vq);
+	node = vhost_user_iotlb_pool_get(dev);
 	if (node == NULL) {
 		VHOST_LOG_CONFIG(dev->ifname, DEBUG,
-			"IOTLB pool for vq %"PRIu32" empty, clear entries for pending insertion\n",
-			vq->index);
-		if (!TAILQ_EMPTY(&vq->iotlb_pending_list))
-			vhost_user_iotlb_pending_remove_all(vq);
+			"IOTLB pool empty, clear entries for pending insertion\n");
+		if (!TAILQ_EMPTY(&dev->iotlb_pending_list))
+			vhost_user_iotlb_pending_remove_all(dev);
 		else
-			vhost_user_iotlb_cache_random_evict(dev, vq);
-		node = vhost_user_iotlb_pool_get(vq);
+			vhost_user_iotlb_cache_random_evict(dev);
+		node = vhost_user_iotlb_pool_get(dev);
 		if (node == NULL) {
 			VHOST_LOG_CONFIG(dev->ifname, ERR,
-				"IOTLB pool vq %"PRIu32" still empty, pending insertion failure\n",
-				vq->index);
+				"IOTLB pool still empty, pending insertion failure\n");
 			return;
 		}
 	}
@@ -161,22 +156,21 @@ vhost_user_iotlb_pending_insert(struct virtio_net *dev, struct vhost_virtqueue *
 	node->iova = iova;
 	node->perm = perm;
 
-	rte_rwlock_write_lock(&vq->iotlb_pending_lock);
+	rte_rwlock_write_lock(&dev->iotlb_pending_lock);
 
-	TAILQ_INSERT_TAIL(&vq->iotlb_pending_list, node, next);
+	TAILQ_INSERT_TAIL(&dev->iotlb_pending_list, node, next);
 
-	rte_rwlock_write_unlock(&vq->iotlb_pending_lock);
+	rte_rwlock_write_unlock(&dev->iotlb_pending_lock);
 }
 
 void
-vhost_user_iotlb_pending_remove(struct vhost_virtqueue *vq,
-				uint64_t iova, uint64_t size, uint8_t perm)
+vhost_user_iotlb_pending_remove(struct virtio_net *dev, uint64_t iova, uint64_t size, uint8_t perm)
 {
 	struct vhost_iotlb_entry *node, *temp_node;
 
-	rte_rwlock_write_lock(&vq->iotlb_pending_lock);
+	rte_rwlock_write_lock(&dev->iotlb_pending_lock);
 
-	RTE_TAILQ_FOREACH_SAFE(node, &vq->iotlb_pending_list, next,
+	RTE_TAILQ_FOREACH_SAFE(node, &dev->iotlb_pending_list, next,
 				temp_node) {
 		if (node->iova < iova)
 			continue;
@@ -184,81 +178,78 @@ vhost_user_iotlb_pending_remove(struct vhost_virtqueue *vq,
 			continue;
 		if ((node->perm & perm) != node->perm)
 			continue;
-		TAILQ_REMOVE(&vq->iotlb_pending_list, node, next);
-		vhost_user_iotlb_pool_put(vq, node);
+		TAILQ_REMOVE(&dev->iotlb_pending_list, node, next);
+		vhost_user_iotlb_pool_put(dev, node);
 	}
 
-	rte_rwlock_write_unlock(&vq->iotlb_pending_lock);
+	rte_rwlock_write_unlock(&dev->iotlb_pending_lock);
 }
 
 static void
-vhost_user_iotlb_cache_remove_all(struct virtio_net *dev, struct vhost_virtqueue *vq)
+vhost_user_iotlb_cache_remove_all(struct virtio_net *dev)
 {
 	struct vhost_iotlb_entry *node, *temp_node;
 
-	rte_rwlock_write_lock(&vq->iotlb_lock);
+	vhost_user_iotlb_wr_lock_all(dev);
 
-	RTE_TAILQ_FOREACH_SAFE(node, &vq->iotlb_list, next, temp_node) {
+	RTE_TAILQ_FOREACH_SAFE(node, &dev->iotlb_list, next, temp_node) {
 		vhost_user_iotlb_clear_dump(dev, node, NULL, NULL);
 
-		TAILQ_REMOVE(&vq->iotlb_list, node, next);
-		vhost_user_iotlb_pool_put(vq, node);
+		TAILQ_REMOVE(&dev->iotlb_list, node, next);
+		vhost_user_iotlb_pool_put(dev, node);
 	}
 
-	vq->iotlb_cache_nr = 0;
+	dev->iotlb_cache_nr = 0;
 
-	rte_rwlock_write_unlock(&vq->iotlb_lock);
+	vhost_user_iotlb_wr_unlock_all(dev);
 }
 
 static void
-vhost_user_iotlb_cache_random_evict(struct virtio_net *dev, struct vhost_virtqueue *vq)
+vhost_user_iotlb_cache_random_evict(struct virtio_net *dev)
 {
 	struct vhost_iotlb_entry *node, *temp_node, *prev_node = NULL;
 	int entry_idx;
 
-	rte_rwlock_write_lock(&vq->iotlb_lock);
+	vhost_user_iotlb_wr_lock_all(dev);
 
-	entry_idx = rte_rand() % vq->iotlb_cache_nr;
+	entry_idx = rte_rand() % dev->iotlb_cache_nr;
 
-	RTE_TAILQ_FOREACH_SAFE(node, &vq->iotlb_list, next, temp_node) {
+	RTE_TAILQ_FOREACH_SAFE(node, &dev->iotlb_list, next, temp_node) {
 		if (!entry_idx) {
 			struct vhost_iotlb_entry *next_node = RTE_TAILQ_NEXT(node, next);
 
 			vhost_user_iotlb_clear_dump(dev, node, prev_node, next_node);
 
-			TAILQ_REMOVE(&vq->iotlb_list, node, next);
-			vhost_user_iotlb_pool_put(vq, node);
-			vq->iotlb_cache_nr--;
+			TAILQ_REMOVE(&dev->iotlb_list, node, next);
+			vhost_user_iotlb_pool_put(dev, node);
+			dev->iotlb_cache_nr--;
 			break;
 		}
 		prev_node = node;
 		entry_idx--;
 	}
 
-	rte_rwlock_write_unlock(&vq->iotlb_lock);
+	vhost_user_iotlb_wr_unlock_all(dev);
 }
 
 void
-vhost_user_iotlb_cache_insert(struct virtio_net *dev, struct vhost_virtqueue *vq,
-				uint64_t iova, uint64_t uaddr,
+vhost_user_iotlb_cache_insert(struct virtio_net *dev, uint64_t iova, uint64_t uaddr,
 				uint64_t size, uint8_t perm)
 {
 	struct vhost_iotlb_entry *node, *new_node;
 
-	new_node = vhost_user_iotlb_pool_get(vq);
+	new_node = vhost_user_iotlb_pool_get(dev);
 	if (new_node == NULL) {
 		VHOST_LOG_CONFIG(dev->ifname, DEBUG,
-			"IOTLB pool vq %"PRIu32" empty, clear entries for cache insertion\n",
-			vq->index);
-		if (!TAILQ_EMPTY(&vq->iotlb_list))
-			vhost_user_iotlb_cache_random_evict(dev, vq);
+			"IOTLB pool empty, clear entries for cache insertion\n");
+		if (!TAILQ_EMPTY(&dev->iotlb_list))
+			vhost_user_iotlb_cache_random_evict(dev);
 		else
-			vhost_user_iotlb_pending_remove_all(vq);
-		new_node = vhost_user_iotlb_pool_get(vq);
+			vhost_user_iotlb_pending_remove_all(dev);
+		new_node = vhost_user_iotlb_pool_get(dev);
 		if (new_node == NULL) {
 			VHOST_LOG_CONFIG(dev->ifname, ERR,
-				"IOTLB pool vq %"PRIu32" still empty, cache insertion failed\n",
-				vq->index);
+				"IOTLB pool still empty, cache insertion failed\n");
 			return;
 		}
 	}
@@ -268,49 +259,47 @@ vhost_user_iotlb_cache_insert(struct virtio_net *dev, struct vhost_virtqueue *vq
 	new_node->size = size;
 	new_node->perm = perm;
 
-	rte_rwlock_write_lock(&vq->iotlb_lock);
+	vhost_user_iotlb_wr_lock_all(dev);
 
-	TAILQ_FOREACH(node, &vq->iotlb_list, next) {
+	TAILQ_FOREACH(node, &dev->iotlb_list, next) {
 		/*
 		 * Entries must be invalidated before being updated.
 		 * So if iova already in list, assume identical.
 		 */
 		if (node->iova == new_node->iova) {
-			vhost_user_iotlb_pool_put(vq, new_node);
+			vhost_user_iotlb_pool_put(dev, new_node);
 			goto unlock;
 		} else if (node->iova > new_node->iova) {
 			vhost_user_iotlb_set_dump(dev, new_node);
 
 			TAILQ_INSERT_BEFORE(node, new_node, next);
-			vq->iotlb_cache_nr++;
+			dev->iotlb_cache_nr++;
 			goto unlock;
 		}
 	}
 
 	vhost_user_iotlb_set_dump(dev, new_node);
 
-	TAILQ_INSERT_TAIL(&vq->iotlb_list, new_node, next);
-	vq->iotlb_cache_nr++;
+	TAILQ_INSERT_TAIL(&dev->iotlb_list, new_node, next);
+	dev->iotlb_cache_nr++;
 
 unlock:
-	vhost_user_iotlb_pending_remove(vq, iova, size, perm);
-
-	rte_rwlock_write_unlock(&vq->iotlb_lock);
+	vhost_user_iotlb_pending_remove(dev, iova, size, perm);
 
+	vhost_user_iotlb_wr_unlock_all(dev);
 }
 
 void
-vhost_user_iotlb_cache_remove(struct virtio_net *dev, struct vhost_virtqueue *vq,
-					uint64_t iova, uint64_t size)
+vhost_user_iotlb_cache_remove(struct virtio_net *dev, uint64_t iova, uint64_t size)
 {
 	struct vhost_iotlb_entry *node, *temp_node, *prev_node = NULL;
 
 	if (unlikely(!size))
 		return;
 
-	rte_rwlock_write_lock(&vq->iotlb_lock);
+	vhost_user_iotlb_wr_lock_all(dev);
 
-	RTE_TAILQ_FOREACH_SAFE(node, &vq->iotlb_list, next, temp_node) {
+	RTE_TAILQ_FOREACH_SAFE(node, &dev->iotlb_list, next, temp_node) {
 		/* Sorted list */
 		if (unlikely(iova + size < node->iova))
 			break;
@@ -320,19 +309,19 @@ vhost_user_iotlb_cache_remove(struct virtio_net *dev, struct vhost_virtqueue *vq
 
 			vhost_user_iotlb_clear_dump(dev, node, prev_node, next_node);
 
-			TAILQ_REMOVE(&vq->iotlb_list, node, next);
-			vhost_user_iotlb_pool_put(vq, node);
-			vq->iotlb_cache_nr--;
-		} else
+			TAILQ_REMOVE(&dev->iotlb_list, node, next);
+			vhost_user_iotlb_pool_put(dev, node);
+			dev->iotlb_cache_nr--;
+		} else {
 			prev_node = node;
+		}
 	}
 
-	rte_rwlock_write_unlock(&vq->iotlb_lock);
+	vhost_user_iotlb_wr_unlock_all(dev);
 }
 
 uint64_t
-vhost_user_iotlb_cache_find(struct vhost_virtqueue *vq, uint64_t iova,
-						uint64_t *size, uint8_t perm)
+vhost_user_iotlb_cache_find(struct virtio_net *dev, uint64_t iova, uint64_t *size, uint8_t perm)
 {
 	struct vhost_iotlb_entry *node;
 	uint64_t offset, vva = 0, mapped = 0;
@@ -340,7 +329,7 @@ vhost_user_iotlb_cache_find(struct vhost_virtqueue *vq, uint64_t iova,
 	if (unlikely(!*size))
 		goto out;
 
-	TAILQ_FOREACH(node, &vq->iotlb_list, next) {
+	TAILQ_FOREACH(node, &dev->iotlb_list, next) {
 		/* List sorted by iova */
 		if (unlikely(iova < node->iova))
 			break;
@@ -373,60 +362,57 @@ vhost_user_iotlb_cache_find(struct vhost_virtqueue *vq, uint64_t iova,
 }
 
 void
-vhost_user_iotlb_flush_all(struct virtio_net *dev, struct vhost_virtqueue *vq)
+vhost_user_iotlb_flush_all(struct virtio_net *dev)
 {
-	vhost_user_iotlb_cache_remove_all(dev, vq);
-	vhost_user_iotlb_pending_remove_all(vq);
+	vhost_user_iotlb_cache_remove_all(dev);
+	vhost_user_iotlb_pending_remove_all(dev);
 }
 
 int
-vhost_user_iotlb_init(struct virtio_net *dev, struct vhost_virtqueue *vq)
+vhost_user_iotlb_init(struct virtio_net *dev)
 {
 	unsigned int i;
 	int socket = 0;
 
-	if (vq->iotlb_pool) {
+	if (dev->iotlb_pool) {
 		/*
 		 * The cache has already been initialized,
 		 * just drop all cached and pending entries.
 		 */
-		vhost_user_iotlb_flush_all(dev, vq);
-		rte_free(vq->iotlb_pool);
+		vhost_user_iotlb_flush_all(dev);
+		rte_free(dev->iotlb_pool);
 	}
 
 #ifdef RTE_LIBRTE_VHOST_NUMA
-	if (get_mempolicy(&socket, NULL, 0, vq, MPOL_F_NODE | MPOL_F_ADDR) != 0)
+	if (get_mempolicy(&socket, NULL, 0, dev, MPOL_F_NODE | MPOL_F_ADDR) != 0)
 		socket = 0;
 #endif
 
-	rte_spinlock_init(&vq->iotlb_free_lock);
-	rte_rwlock_init(&vq->iotlb_lock);
-	rte_rwlock_init(&vq->iotlb_pending_lock);
+	rte_spinlock_init(&dev->iotlb_free_lock);
+	rte_rwlock_init(&dev->iotlb_pending_lock);
 
-	SLIST_INIT(&vq->iotlb_free_list);
-	TAILQ_INIT(&vq->iotlb_list);
-	TAILQ_INIT(&vq->iotlb_pending_list);
+	SLIST_INIT(&dev->iotlb_free_list);
+	TAILQ_INIT(&dev->iotlb_list);
+	TAILQ_INIT(&dev->iotlb_pending_list);
 
 	if (dev->flags & VIRTIO_DEV_SUPPORT_IOMMU) {
-		vq->iotlb_pool = rte_calloc_socket("iotlb", IOTLB_CACHE_SIZE,
+		dev->iotlb_pool = rte_calloc_socket("iotlb", IOTLB_CACHE_SIZE,
 			sizeof(struct vhost_iotlb_entry), 0, socket);
-		if (!vq->iotlb_pool) {
-			VHOST_LOG_CONFIG(dev->ifname, ERR,
-				"Failed to create IOTLB cache pool for vq %"PRIu32"\n",
-				vq->index);
+		if (!dev->iotlb_pool) {
+			VHOST_LOG_CONFIG(dev->ifname, ERR, "Failed to create IOTLB cache pool\n");
 			return -1;
 		}
 		for (i = 0; i < IOTLB_CACHE_SIZE; i++)
-			vhost_user_iotlb_pool_put(vq, &vq->iotlb_pool[i]);
+			vhost_user_iotlb_pool_put(dev, &dev->iotlb_pool[i]);
 	}
 
-	vq->iotlb_cache_nr = 0;
+	dev->iotlb_cache_nr = 0;
 
 	return 0;
 }
 
 void
-vhost_user_iotlb_destroy(struct vhost_virtqueue *vq)
+vhost_user_iotlb_destroy(struct virtio_net *dev)
 {
-	rte_free(vq->iotlb_pool);
+	rte_free(dev->iotlb_pool);
 }
diff --git a/lib/vhost/iotlb.h b/lib/vhost/iotlb.h
index 73b5465b41..3490b9e6be 100644
--- a/lib/vhost/iotlb.h
+++ b/lib/vhost/iotlb.h
@@ -37,20 +37,37 @@ vhost_user_iotlb_wr_unlock(struct vhost_virtqueue *vq)
 	rte_rwlock_write_unlock(&vq->iotlb_lock);
 }
 
-void vhost_user_iotlb_cache_insert(struct virtio_net *dev, struct vhost_virtqueue *vq,
-					uint64_t iova, uint64_t uaddr,
+static __rte_always_inline void
+vhost_user_iotlb_wr_lock_all(struct virtio_net *dev)
+	__rte_no_thread_safety_analysis
+{
+	uint32_t i;
+
+	for (i = 0; i < dev->nr_vring; i++)
+		rte_rwlock_write_lock(&dev->virtqueue[i]->iotlb_lock);
+}
+
+static __rte_always_inline void
+vhost_user_iotlb_wr_unlock_all(struct virtio_net *dev)
+	__rte_no_thread_safety_analysis
+{
+	uint32_t i;
+
+	for (i = 0; i < dev->nr_vring; i++)
+		rte_rwlock_write_unlock(&dev->virtqueue[i]->iotlb_lock);
+}
+
+void vhost_user_iotlb_cache_insert(struct virtio_net *dev, uint64_t iova, uint64_t uaddr,
 					uint64_t size, uint8_t perm);
-void vhost_user_iotlb_cache_remove(struct virtio_net *dev, struct vhost_virtqueue *vq,
-					uint64_t iova, uint64_t size);
-uint64_t vhost_user_iotlb_cache_find(struct vhost_virtqueue *vq, uint64_t iova,
+void vhost_user_iotlb_cache_remove(struct virtio_net *dev, uint64_t iova, uint64_t size);
+uint64_t vhost_user_iotlb_cache_find(struct virtio_net *dev, uint64_t iova,
 					uint64_t *size, uint8_t perm);
-bool vhost_user_iotlb_pending_miss(struct vhost_virtqueue *vq, uint64_t iova,
-						uint8_t perm);
-void vhost_user_iotlb_pending_insert(struct virtio_net *dev, struct vhost_virtqueue *vq,
-						uint64_t iova, uint8_t perm);
-void vhost_user_iotlb_pending_remove(struct vhost_virtqueue *vq, uint64_t iova,
+bool vhost_user_iotlb_pending_miss(struct virtio_net *dev, uint64_t iova, uint8_t perm);
+void vhost_user_iotlb_pending_insert(struct virtio_net *dev, uint64_t iova, uint8_t perm);
+void vhost_user_iotlb_pending_remove(struct virtio_net *dev, uint64_t iova,
 						uint64_t size, uint8_t perm);
-void vhost_user_iotlb_flush_all(struct virtio_net *dev, struct vhost_virtqueue *vq);
-int vhost_user_iotlb_init(struct virtio_net *dev, struct vhost_virtqueue *vq);
-void vhost_user_iotlb_destroy(struct vhost_virtqueue *vq);
+void vhost_user_iotlb_flush_all(struct virtio_net *dev);
+int vhost_user_iotlb_init(struct virtio_net *dev);
+void vhost_user_iotlb_destroy(struct virtio_net *dev);
+
 #endif /* _VHOST_IOTLB_H_ */
diff --git a/lib/vhost/vhost.c b/lib/vhost/vhost.c
index 79e88f986e..3ddd2a963f 100644
--- a/lib/vhost/vhost.c
+++ b/lib/vhost/vhost.c
@@ -67,7 +67,7 @@ __vhost_iova_to_vva(struct virtio_net *dev, struct vhost_virtqueue *vq,
 
 	tmp_size = *size;
 
-	vva = vhost_user_iotlb_cache_find(vq, iova, &tmp_size, perm);
+	vva = vhost_user_iotlb_cache_find(dev, iova, &tmp_size, perm);
 	if (tmp_size == *size) {
 		if (dev->flags & VIRTIO_DEV_STATS_ENABLED)
 			vq->stats.iotlb_hits++;
@@ -79,7 +79,7 @@ __vhost_iova_to_vva(struct virtio_net *dev, struct vhost_virtqueue *vq,
 
 	iova += tmp_size;
 
-	if (!vhost_user_iotlb_pending_miss(vq, iova, perm)) {
+	if (!vhost_user_iotlb_pending_miss(dev, iova, perm)) {
 		/*
 		 * iotlb_lock is read-locked for a full burst,
 		 * but it only protects the iotlb cache.
@@ -89,12 +89,12 @@ __vhost_iova_to_vva(struct virtio_net *dev, struct vhost_virtqueue *vq,
 		 */
 		vhost_user_iotlb_rd_unlock(vq);
 
-		vhost_user_iotlb_pending_insert(dev, vq, iova, perm);
+		vhost_user_iotlb_pending_insert(dev, iova, perm);
 		if (vhost_user_iotlb_miss(dev, iova, perm)) {
 			VHOST_LOG_DATA(dev->ifname, ERR,
 				"IOTLB miss req failed for IOVA 0x%" PRIx64 "\n",
 				iova);
-			vhost_user_iotlb_pending_remove(vq, iova, 1, perm);
+			vhost_user_iotlb_pending_remove(dev, iova, 1, perm);
 		}
 
 		vhost_user_iotlb_rd_lock(vq);
@@ -401,7 +401,6 @@ free_vq(struct virtio_net *dev, struct vhost_virtqueue *vq)
 	vhost_free_async_mem(vq);
 	rte_rwlock_write_unlock(&vq->access_lock);
 	rte_free(vq->batch_copy_elems);
-	vhost_user_iotlb_destroy(vq);
 	rte_free(vq->log_cache);
 	rte_free(vq);
 }
@@ -579,7 +578,7 @@ vring_invalidate(struct virtio_net *dev __rte_unused, struct vhost_virtqueue *vq
 }
 
 static void
-init_vring_queue(struct virtio_net *dev, struct vhost_virtqueue *vq,
+init_vring_queue(struct virtio_net *dev __rte_unused, struct vhost_virtqueue *vq,
 	uint32_t vring_idx)
 {
 	int numa_node = SOCKET_ID_ANY;
@@ -599,8 +598,6 @@ init_vring_queue(struct virtio_net *dev, struct vhost_virtqueue *vq,
 	}
 #endif
 	vq->numa_node = numa_node;
-
-	vhost_user_iotlb_init(dev, vq);
 }
 
 static void
@@ -635,6 +632,7 @@ alloc_vring_queue(struct virtio_net *dev, uint32_t vring_idx)
 		dev->virtqueue[i] = vq;
 		init_vring_queue(dev, vq, i);
 		rte_rwlock_init(&vq->access_lock);
+		rte_rwlock_init(&vq->iotlb_lock);
 		vq->avail_wrap_counter = 1;
 		vq->used_wrap_counter = 1;
 		vq->signalled_used_valid = false;
@@ -799,6 +797,10 @@ vhost_setup_virtio_net(int vid, bool enable, bool compliant_ol_flags, bool stats
 		dev->flags |= VIRTIO_DEV_SUPPORT_IOMMU;
 	else
 		dev->flags &= ~VIRTIO_DEV_SUPPORT_IOMMU;
+
+	if (vhost_user_iotlb_init(dev) < 0)
+		VHOST_LOG_CONFIG("device", ERR, "failed to init IOTLB\n");
+
 }
 
 void
diff --git a/lib/vhost/vhost.h b/lib/vhost/vhost.h
index eaf3b0d392..ee952de175 100644
--- a/lib/vhost/vhost.h
+++ b/lib/vhost/vhost.h
@@ -305,13 +305,6 @@ struct vhost_virtqueue {
 	struct log_cache_entry	*log_cache;
 
 	rte_rwlock_t	iotlb_lock;
-	rte_rwlock_t	iotlb_pending_lock;
-	struct vhost_iotlb_entry *iotlb_pool;
-	TAILQ_HEAD(, vhost_iotlb_entry) iotlb_list;
-	TAILQ_HEAD(, vhost_iotlb_entry) iotlb_pending_list;
-	int				iotlb_cache_nr;
-	rte_spinlock_t	iotlb_free_lock;
-	SLIST_HEAD(, vhost_iotlb_entry) iotlb_free_list;
 
 	/* Used to notify the guest (trigger interrupt) */
 	int			callfd;
@@ -486,6 +479,15 @@ struct virtio_net {
 	int			extbuf;
 	int			linearbuf;
 	struct vhost_virtqueue	*virtqueue[VHOST_MAX_QUEUE_PAIRS * 2];
+
+	rte_rwlock_t	iotlb_pending_lock;
+	struct vhost_iotlb_entry *iotlb_pool;
+	TAILQ_HEAD(, vhost_iotlb_entry) iotlb_list;
+	TAILQ_HEAD(, vhost_iotlb_entry) iotlb_pending_list;
+	int				iotlb_cache_nr;
+	rte_spinlock_t	iotlb_free_lock;
+	SLIST_HEAD(, vhost_iotlb_entry) iotlb_free_list;
+
 	struct inflight_mem_info *inflight_info;
 #define IF_NAME_SZ (PATH_MAX > IFNAMSIZ ? PATH_MAX : IFNAMSIZ)
 	char			ifname[IF_NAME_SZ];
diff --git a/lib/vhost/vhost_user.c b/lib/vhost/vhost_user.c
index c9454ce3d9..f2fe7ebc93 100644
--- a/lib/vhost/vhost_user.c
+++ b/lib/vhost/vhost_user.c
@@ -237,6 +237,8 @@ vhost_backend_cleanup(struct virtio_net *dev)
 	}
 
 	dev->postcopy_listening = 0;
+
+	vhost_user_iotlb_destroy(dev);
 }
 
 static void
@@ -539,7 +541,6 @@ numa_realloc(struct virtio_net **pdev, struct vhost_virtqueue **pvq)
 	if (vq != dev->virtqueue[vq->index]) {
 		VHOST_LOG_CONFIG(dev->ifname, INFO, "reallocated virtqueue on node %d\n", node);
 		dev->virtqueue[vq->index] = vq;
-		vhost_user_iotlb_init(dev, vq);
 	}
 
 	if (vq_is_packed(dev)) {
@@ -664,6 +665,8 @@ numa_realloc(struct virtio_net **pdev, struct vhost_virtqueue **pvq)
 		return;
 	}
 	dev->guest_pages = gp;
+
+	vhost_user_iotlb_init(dev);
 }
 #else
 static void
@@ -1360,8 +1363,7 @@ vhost_user_set_mem_table(struct virtio_net **pdev,
 
 		/* Flush IOTLB cache as previous HVAs are now invalid */
 		if (dev->features & (1ULL << VIRTIO_F_IOMMU_PLATFORM))
-			for (i = 0; i < dev->nr_vring; i++)
-				vhost_user_iotlb_flush_all(dev, dev->virtqueue[i]);
+			vhost_user_iotlb_flush_all(dev);
 
 		free_mem_region(dev);
 		rte_free(dev->mem);
@@ -2194,7 +2196,7 @@ vhost_user_get_vring_base(struct virtio_net **pdev,
 	ctx->msg.size = sizeof(ctx->msg.payload.state);
 	ctx->fd_num = 0;
 
-	vhost_user_iotlb_flush_all(dev, vq);
+	vhost_user_iotlb_flush_all(dev);
 
 	vring_invalidate(dev, vq);
 
@@ -2639,15 +2641,14 @@ vhost_user_iotlb_msg(struct virtio_net **pdev,
 		if (!vva)
 			return RTE_VHOST_MSG_RESULT_ERR;
 
+		vhost_user_iotlb_cache_insert(dev, imsg->iova, vva, len, imsg->perm);
+
 		for (i = 0; i < dev->nr_vring; i++) {
 			struct vhost_virtqueue *vq = dev->virtqueue[i];
 
 			if (!vq)
 				continue;
 
-			vhost_user_iotlb_cache_insert(dev, vq, imsg->iova, vva,
-					len, imsg->perm);
-
 			if (is_vring_iotlb(dev, vq, imsg)) {
 				rte_rwlock_write_lock(&vq->access_lock);
 				translate_ring_addresses(&dev, &vq);
@@ -2657,15 +2658,14 @@ vhost_user_iotlb_msg(struct virtio_net **pdev,
 		}
 		break;
 	case VHOST_IOTLB_INVALIDATE:
+		vhost_user_iotlb_cache_remove(dev, imsg->iova, imsg->size);
+
 		for (i = 0; i < dev->nr_vring; i++) {
 			struct vhost_virtqueue *vq = dev->virtqueue[i];
 
 			if (!vq)
 				continue;
 
-			vhost_user_iotlb_cache_remove(dev, vq, imsg->iova,
-					imsg->size);
-
 			if (is_vring_iotlb(dev, vq, imsg)) {
 				rte_rwlock_write_lock(&vq->access_lock);
 				vring_invalidate(dev, vq);
@@ -2674,8 +2674,7 @@ vhost_user_iotlb_msg(struct virtio_net **pdev,
 		}
 		break;
 	default:
-		VHOST_LOG_CONFIG(dev->ifname, ERR,
-			"invalid IOTLB message type (%d)\n",
+		VHOST_LOG_CONFIG(dev->ifname, ERR, "invalid IOTLB message type (%d)\n",
 			imsg->type);
 		return RTE_VHOST_MSG_RESULT_ERR;
 	}
-- 
2.40.1


^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH v5 06/26] vhost: add offset field to IOTLB entries
  2023-06-06  8:18 [PATCH v5 00/26] Add VDUSE support to Vhost library Maxime Coquelin
                   ` (4 preceding siblings ...)
  2023-06-06  8:18 ` [PATCH v5 05/26] vhost: change to single IOTLB cache per device Maxime Coquelin
@ 2023-06-06  8:18 ` Maxime Coquelin
  2023-06-06  8:18 ` [PATCH v5 07/26] vhost: add page size info to IOTLB entry Maxime Coquelin
                   ` (22 subsequent siblings)
  28 siblings, 0 replies; 36+ messages in thread
From: Maxime Coquelin @ 2023-06-06  8:18 UTC (permalink / raw)
  To: dev, chenbo.xia, david.marchand, mkp, fbl, jasowang,
	cunming.liang, xieyongji, echaudro, eperezma, amorenoz, lulu
  Cc: Maxime Coquelin

This patch is preliminary work to prepare for VDUSE
support, for which we need to keep track of the mmapped
base address and offset in order to be able to unmap it
later when the IOTLB entry is invalidated.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
---
 lib/vhost/iotlb.c      | 30 ++++++++++++++++++------------
 lib/vhost/iotlb.h      |  2 +-
 lib/vhost/vhost_user.c |  2 +-
 3 files changed, 20 insertions(+), 14 deletions(-)

diff --git a/lib/vhost/iotlb.c b/lib/vhost/iotlb.c
index 6d49bf6b30..aa5100e6e7 100644
--- a/lib/vhost/iotlb.c
+++ b/lib/vhost/iotlb.c
@@ -17,6 +17,7 @@ struct vhost_iotlb_entry {
 
 	uint64_t iova;
 	uint64_t uaddr;
+	uint64_t uoffset;
 	uint64_t size;
 	uint8_t perm;
 };
@@ -27,15 +28,18 @@ static bool
 vhost_user_iotlb_share_page(struct vhost_iotlb_entry *a, struct vhost_iotlb_entry *b,
 		uint64_t align)
 {
-	uint64_t a_end, b_start;
+	uint64_t a_start, a_end, b_start;
 
 	if (a == NULL || b == NULL)
 		return false;
 
+	a_start = a->uaddr + a->uoffset;
+	b_start = b->uaddr + b->uoffset;
+
 	/* Assumes entry a lower than entry b */
-	RTE_ASSERT(a->uaddr < b->uaddr);
-	a_end = RTE_ALIGN_CEIL(a->uaddr + a->size, align);
-	b_start = RTE_ALIGN_FLOOR(b->uaddr, align);
+	RTE_ASSERT(a_start < b_start);
+	a_end = RTE_ALIGN_CEIL(a_start + a->size, align);
+	b_start = RTE_ALIGN_FLOOR(b_start, align);
 
 	return a_end > b_start;
 }
@@ -43,11 +47,12 @@ vhost_user_iotlb_share_page(struct vhost_iotlb_entry *a, struct vhost_iotlb_entr
 static void
 vhost_user_iotlb_set_dump(struct virtio_net *dev, struct vhost_iotlb_entry *node)
 {
-	uint64_t align;
+	uint64_t align, start;
 
-	align = hua_to_alignment(dev->mem, (void *)(uintptr_t)node->uaddr);
+	start = node->uaddr + node->uoffset;
+	align = hua_to_alignment(dev->mem, (void *)(uintptr_t)start);
 
-	mem_set_dump((void *)(uintptr_t)node->uaddr, node->size, true, align);
+	mem_set_dump((void *)(uintptr_t)start, node->size, true, align);
 }
 
 static void
@@ -56,10 +61,10 @@ vhost_user_iotlb_clear_dump(struct virtio_net *dev, struct vhost_iotlb_entry *no
 {
 	uint64_t align, start, end;
 
-	start = node->uaddr;
-	end = node->uaddr + node->size;
+	start = node->uaddr + node->uoffset;
+	end = start + node->size;
 
-	align = hua_to_alignment(dev->mem, (void *)(uintptr_t)node->uaddr);
+	align = hua_to_alignment(dev->mem, (void *)(uintptr_t)start);
 
 	/* Skip first page if shared with previous entry. */
 	if (vhost_user_iotlb_share_page(prev, node, align))
@@ -234,7 +239,7 @@ vhost_user_iotlb_cache_random_evict(struct virtio_net *dev)
 
 void
 vhost_user_iotlb_cache_insert(struct virtio_net *dev, uint64_t iova, uint64_t uaddr,
-				uint64_t size, uint8_t perm)
+				uint64_t uoffset, uint64_t size, uint8_t perm)
 {
 	struct vhost_iotlb_entry *node, *new_node;
 
@@ -256,6 +261,7 @@ vhost_user_iotlb_cache_insert(struct virtio_net *dev, uint64_t iova, uint64_t ua
 
 	new_node->iova = iova;
 	new_node->uaddr = uaddr;
+	new_node->uoffset = uoffset;
 	new_node->size = size;
 	new_node->perm = perm;
 
@@ -344,7 +350,7 @@ vhost_user_iotlb_cache_find(struct virtio_net *dev, uint64_t iova, uint64_t *siz
 
 		offset = iova - node->iova;
 		if (!vva)
-			vva = node->uaddr + offset;
+			vva = node->uaddr + node->uoffset + offset;
 
 		mapped += node->size - offset;
 		iova = node->iova + node->size;
diff --git a/lib/vhost/iotlb.h b/lib/vhost/iotlb.h
index 3490b9e6be..bee36c5903 100644
--- a/lib/vhost/iotlb.h
+++ b/lib/vhost/iotlb.h
@@ -58,7 +58,7 @@ vhost_user_iotlb_wr_unlock_all(struct virtio_net *dev)
 }
 
 void vhost_user_iotlb_cache_insert(struct virtio_net *dev, uint64_t iova, uint64_t uaddr,
-					uint64_t size, uint8_t perm);
+					uint64_t uoffset, uint64_t size, uint8_t perm);
 void vhost_user_iotlb_cache_remove(struct virtio_net *dev, uint64_t iova, uint64_t size);
 uint64_t vhost_user_iotlb_cache_find(struct virtio_net *dev, uint64_t iova,
 					uint64_t *size, uint8_t perm);
diff --git a/lib/vhost/vhost_user.c b/lib/vhost/vhost_user.c
index f2fe7ebc93..7f88a8754f 100644
--- a/lib/vhost/vhost_user.c
+++ b/lib/vhost/vhost_user.c
@@ -2641,7 +2641,7 @@ vhost_user_iotlb_msg(struct virtio_net **pdev,
 		if (!vva)
 			return RTE_VHOST_MSG_RESULT_ERR;
 
-		vhost_user_iotlb_cache_insert(dev, imsg->iova, vva, len, imsg->perm);
+		vhost_user_iotlb_cache_insert(dev, imsg->iova, vva, 0, len, imsg->perm);
 
 		for (i = 0; i < dev->nr_vring; i++) {
 			struct vhost_virtqueue *vq = dev->virtqueue[i];
-- 
2.40.1


^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH v5 07/26] vhost: add page size info to IOTLB entry
  2023-06-06  8:18 [PATCH v5 00/26] Add VDUSE support to Vhost library Maxime Coquelin
                   ` (5 preceding siblings ...)
  2023-06-06  8:18 ` [PATCH v5 06/26] vhost: add offset field to IOTLB entries Maxime Coquelin
@ 2023-06-06  8:18 ` Maxime Coquelin
  2023-06-06  8:18 ` [PATCH v5 08/26] vhost: retry translating IOVA after IOTLB miss Maxime Coquelin
                   ` (21 subsequent siblings)
  28 siblings, 0 replies; 36+ messages in thread
From: Maxime Coquelin @ 2023-06-06  8:18 UTC (permalink / raw)
  To: dev, chenbo.xia, david.marchand, mkp, fbl, jasowang,
	cunming.liang, xieyongji, echaudro, eperezma, amorenoz, lulu
  Cc: Maxime Coquelin

VDUSE will close the file descriptor after having mapped
the shared memory, so it will not be possible to get the
page size afterwards.

This patch adds a new page_shift field to the IOTLB entry,
so that the information is passed at IOTLB cache
insertion time. The information is stored as a bit shift
value so that the IOTLB entry keeps fitting in a single
cacheline.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
---
 lib/vhost/iotlb.c      | 46 ++++++++++++++++++++----------------------
 lib/vhost/iotlb.h      |  2 +-
 lib/vhost/vhost.h      |  1 -
 lib/vhost/vhost_user.c |  8 +++++---
 4 files changed, 28 insertions(+), 29 deletions(-)

diff --git a/lib/vhost/iotlb.c b/lib/vhost/iotlb.c
index aa5100e6e7..87986f2489 100644
--- a/lib/vhost/iotlb.c
+++ b/lib/vhost/iotlb.c
@@ -19,14 +19,14 @@ struct vhost_iotlb_entry {
 	uint64_t uaddr;
 	uint64_t uoffset;
 	uint64_t size;
+	uint8_t page_shift;
 	uint8_t perm;
 };
 
 #define IOTLB_CACHE_SIZE 2048
 
 static bool
-vhost_user_iotlb_share_page(struct vhost_iotlb_entry *a, struct vhost_iotlb_entry *b,
-		uint64_t align)
+vhost_user_iotlb_share_page(struct vhost_iotlb_entry *a, struct vhost_iotlb_entry *b)
 {
 	uint64_t a_start, a_end, b_start;
 
@@ -38,44 +38,41 @@ vhost_user_iotlb_share_page(struct vhost_iotlb_entry *a, struct vhost_iotlb_entr
 
 	/* Assumes entry a lower than entry b */
 	RTE_ASSERT(a_start < b_start);
-	a_end = RTE_ALIGN_CEIL(a_start + a->size, align);
-	b_start = RTE_ALIGN_FLOOR(b_start, align);
+	a_end = RTE_ALIGN_CEIL(a_start + a->size, RTE_BIT64(a->page_shift));
+	b_start = RTE_ALIGN_FLOOR(b_start, RTE_BIT64(b->page_shift));
 
 	return a_end > b_start;
 }
 
 static void
-vhost_user_iotlb_set_dump(struct virtio_net *dev, struct vhost_iotlb_entry *node)
+vhost_user_iotlb_set_dump(struct vhost_iotlb_entry *node)
 {
-	uint64_t align, start;
+	uint64_t start;
 
 	start = node->uaddr + node->uoffset;
-	align = hua_to_alignment(dev->mem, (void *)(uintptr_t)start);
-
-	mem_set_dump((void *)(uintptr_t)start, node->size, true, align);
+	mem_set_dump((void *)(uintptr_t)start, node->size, true, RTE_BIT64(node->page_shift));
 }
 
 static void
-vhost_user_iotlb_clear_dump(struct virtio_net *dev, struct vhost_iotlb_entry *node,
+vhost_user_iotlb_clear_dump(struct vhost_iotlb_entry *node,
 		struct vhost_iotlb_entry *prev, struct vhost_iotlb_entry *next)
 {
-	uint64_t align, start, end;
+	uint64_t start, end;
 
 	start = node->uaddr + node->uoffset;
 	end = start + node->size;
 
-	align = hua_to_alignment(dev->mem, (void *)(uintptr_t)start);
-
 	/* Skip first page if shared with previous entry. */
-	if (vhost_user_iotlb_share_page(prev, node, align))
-		start = RTE_ALIGN_CEIL(start, align);
+	if (vhost_user_iotlb_share_page(prev, node))
+		start = RTE_ALIGN_CEIL(start, RTE_BIT64(node->page_shift));
 
 	/* Skip last page if shared with next entry. */
-	if (vhost_user_iotlb_share_page(node, next, align))
-		end = RTE_ALIGN_FLOOR(end, align);
+	if (vhost_user_iotlb_share_page(node, next))
+		end = RTE_ALIGN_FLOOR(end, RTE_BIT64(node->page_shift));
 
 	if (end > start)
-		mem_set_dump((void *)(uintptr_t)start, end - start, false, align);
+		mem_set_dump((void *)(uintptr_t)start, end - start, false,
+			RTE_BIT64(node->page_shift));
 }
 
 static struct vhost_iotlb_entry *
@@ -198,7 +195,7 @@ vhost_user_iotlb_cache_remove_all(struct virtio_net *dev)
 	vhost_user_iotlb_wr_lock_all(dev);
 
 	RTE_TAILQ_FOREACH_SAFE(node, &dev->iotlb_list, next, temp_node) {
-		vhost_user_iotlb_clear_dump(dev, node, NULL, NULL);
+		vhost_user_iotlb_clear_dump(node, NULL, NULL);
 
 		TAILQ_REMOVE(&dev->iotlb_list, node, next);
 		vhost_user_iotlb_pool_put(dev, node);
@@ -223,7 +220,7 @@ vhost_user_iotlb_cache_random_evict(struct virtio_net *dev)
 		if (!entry_idx) {
 			struct vhost_iotlb_entry *next_node = RTE_TAILQ_NEXT(node, next);
 
-			vhost_user_iotlb_clear_dump(dev, node, prev_node, next_node);
+			vhost_user_iotlb_clear_dump(node, prev_node, next_node);
 
 			TAILQ_REMOVE(&dev->iotlb_list, node, next);
 			vhost_user_iotlb_pool_put(dev, node);
@@ -239,7 +236,7 @@ vhost_user_iotlb_cache_random_evict(struct virtio_net *dev)
 
 void
 vhost_user_iotlb_cache_insert(struct virtio_net *dev, uint64_t iova, uint64_t uaddr,
-				uint64_t uoffset, uint64_t size, uint8_t perm)
+				uint64_t uoffset, uint64_t size, uint64_t page_size, uint8_t perm)
 {
 	struct vhost_iotlb_entry *node, *new_node;
 
@@ -263,6 +260,7 @@ vhost_user_iotlb_cache_insert(struct virtio_net *dev, uint64_t iova, uint64_t ua
 	new_node->uaddr = uaddr;
 	new_node->uoffset = uoffset;
 	new_node->size = size;
+	new_node->page_shift = __builtin_ctzll(page_size);
 	new_node->perm = perm;
 
 	vhost_user_iotlb_wr_lock_all(dev);
@@ -276,7 +274,7 @@ vhost_user_iotlb_cache_insert(struct virtio_net *dev, uint64_t iova, uint64_t ua
 			vhost_user_iotlb_pool_put(dev, new_node);
 			goto unlock;
 		} else if (node->iova > new_node->iova) {
-			vhost_user_iotlb_set_dump(dev, new_node);
+			vhost_user_iotlb_set_dump(new_node);
 
 			TAILQ_INSERT_BEFORE(node, new_node, next);
 			dev->iotlb_cache_nr++;
@@ -284,7 +282,7 @@ vhost_user_iotlb_cache_insert(struct virtio_net *dev, uint64_t iova, uint64_t ua
 		}
 	}
 
-	vhost_user_iotlb_set_dump(dev, new_node);
+	vhost_user_iotlb_set_dump(new_node);
 
 	TAILQ_INSERT_TAIL(&dev->iotlb_list, new_node, next);
 	dev->iotlb_cache_nr++;
@@ -313,7 +311,7 @@ vhost_user_iotlb_cache_remove(struct virtio_net *dev, uint64_t iova, uint64_t si
 		if (iova < node->iova + node->size) {
 			struct vhost_iotlb_entry *next_node = RTE_TAILQ_NEXT(node, next);
 
-			vhost_user_iotlb_clear_dump(dev, node, prev_node, next_node);
+			vhost_user_iotlb_clear_dump(node, prev_node, next_node);
 
 			TAILQ_REMOVE(&dev->iotlb_list, node, next);
 			vhost_user_iotlb_pool_put(dev, node);
diff --git a/lib/vhost/iotlb.h b/lib/vhost/iotlb.h
index bee36c5903..81ca04df21 100644
--- a/lib/vhost/iotlb.h
+++ b/lib/vhost/iotlb.h
@@ -58,7 +58,7 @@ vhost_user_iotlb_wr_unlock_all(struct virtio_net *dev)
 }
 
 void vhost_user_iotlb_cache_insert(struct virtio_net *dev, uint64_t iova, uint64_t uaddr,
-					uint64_t uoffset, uint64_t size, uint8_t perm);
+		uint64_t uoffset, uint64_t size, uint64_t page_size, uint8_t perm);
 void vhost_user_iotlb_cache_remove(struct virtio_net *dev, uint64_t iova, uint64_t size);
 uint64_t vhost_user_iotlb_cache_find(struct virtio_net *dev, uint64_t iova,
 					uint64_t *size, uint8_t perm);
diff --git a/lib/vhost/vhost.h b/lib/vhost/vhost.h
index ee952de175..de84c115b7 100644
--- a/lib/vhost/vhost.h
+++ b/lib/vhost/vhost.h
@@ -1032,7 +1032,6 @@ mbuf_is_consumed(struct rte_mbuf *m)
 	return true;
 }
 
-uint64_t hua_to_alignment(struct rte_vhost_memory *mem, void *ptr);
 void mem_set_dump(void *ptr, size_t size, bool enable, uint64_t alignment);
 
 /* Versioned functions */
diff --git a/lib/vhost/vhost_user.c b/lib/vhost/vhost_user.c
index 7f88a8754f..98d8b8ac79 100644
--- a/lib/vhost/vhost_user.c
+++ b/lib/vhost/vhost_user.c
@@ -743,7 +743,7 @@ log_addr_to_gpa(struct virtio_net *dev, struct vhost_virtqueue *vq)
 	return log_gpa;
 }
 
-uint64_t
+static uint64_t
 hua_to_alignment(struct rte_vhost_memory *mem, void *ptr)
 {
 	struct rte_vhost_mem_region *r;
@@ -2632,7 +2632,7 @@ vhost_user_iotlb_msg(struct virtio_net **pdev,
 	struct virtio_net *dev = *pdev;
 	struct vhost_iotlb_msg *imsg = &ctx->msg.payload.iotlb;
 	uint16_t i;
-	uint64_t vva, len;
+	uint64_t vva, len, pg_sz;
 
 	switch (imsg->type) {
 	case VHOST_IOTLB_UPDATE:
@@ -2641,7 +2641,9 @@ vhost_user_iotlb_msg(struct virtio_net **pdev,
 		if (!vva)
 			return RTE_VHOST_MSG_RESULT_ERR;
 
-		vhost_user_iotlb_cache_insert(dev, imsg->iova, vva, 0, len, imsg->perm);
+		pg_sz = hua_to_alignment(dev->mem, (void *)(uintptr_t)vva);
+
+		vhost_user_iotlb_cache_insert(dev, imsg->iova, vva, 0, len, pg_sz, imsg->perm);
 
 		for (i = 0; i < dev->nr_vring; i++) {
 			struct vhost_virtqueue *vq = dev->virtqueue[i];
-- 
2.40.1


^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH v5 08/26] vhost: retry translating IOVA after IOTLB miss
  2023-06-06  8:18 [PATCH v5 00/26] Add VDUSE support to Vhost library Maxime Coquelin
                   ` (6 preceding siblings ...)
  2023-06-06  8:18 ` [PATCH v5 07/26] vhost: add page size info to IOTLB entry Maxime Coquelin
@ 2023-06-06  8:18 ` Maxime Coquelin
  2023-06-06  8:18 ` [PATCH v5 09/26] vhost: introduce backend ops Maxime Coquelin
                   ` (20 subsequent siblings)
  28 siblings, 0 replies; 36+ messages in thread
From: Maxime Coquelin @ 2023-06-06  8:18 UTC (permalink / raw)
  To: dev, chenbo.xia, david.marchand, mkp, fbl, jasowang,
	cunming.liang, xieyongji, echaudro, eperezma, amorenoz, lulu
  Cc: Maxime Coquelin

Vhost-user backend IOTLB misses and updates are
asynchronous, so the IOVA address translation function
just fails after having sent an IOTLB miss update if the
needed entry was not in the IOTLB cache.

This is not the case for VDUSE, for which the needed IOTLB
update is returned directly when sending an IOTLB miss.

This patch retries finding the needed entry in the
IOTLB cache after having sent an IOTLB miss.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
---
 lib/vhost/vhost.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/lib/vhost/vhost.c b/lib/vhost/vhost.c
index 3ddd2a963f..7e1af487c1 100644
--- a/lib/vhost/vhost.c
+++ b/lib/vhost/vhost.c
@@ -100,6 +100,12 @@ __vhost_iova_to_vva(struct virtio_net *dev, struct vhost_virtqueue *vq,
 		vhost_user_iotlb_rd_lock(vq);
 	}
 
+	tmp_size = *size;
+	/* Retry in case of VDUSE, as it is synchronous */
+	vva = vhost_user_iotlb_cache_find(dev, iova, &tmp_size, perm);
+	if (tmp_size == *size)
+		return vva;
+
 	return 0;
 }
 
-- 
2.40.1


^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH v5 09/26] vhost: introduce backend ops
  2023-06-06  8:18 [PATCH v5 00/26] Add VDUSE support to Vhost library Maxime Coquelin
                   ` (7 preceding siblings ...)
  2023-06-06  8:18 ` [PATCH v5 08/26] vhost: retry translating IOVA after IOTLB miss Maxime Coquelin
@ 2023-06-06  8:18 ` Maxime Coquelin
  2023-06-06  8:18 ` [PATCH v5 10/26] vhost: add IOTLB cache entry removal callback Maxime Coquelin
                   ` (19 subsequent siblings)
  28 siblings, 0 replies; 36+ messages in thread
From: Maxime Coquelin @ 2023-06-06  8:18 UTC (permalink / raw)
  To: dev, chenbo.xia, david.marchand, mkp, fbl, jasowang,
	cunming.liang, xieyongji, echaudro, eperezma, amorenoz, lulu
  Cc: Maxime Coquelin

This patch introduces a backend ops struct, which enables
calling backend-specific callbacks (Vhost-user, VDUSE) from
shared code.

This is an empty shell for now, it will be filled in later
patches.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
---
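
A minimal sketch of the backend-ops pattern this series builds
on; struct and function names are simplified stand-ins, not the
actual Vhost library definitions.

```c
#include <stddef.h>

struct device;

/* Per-backend callback table, selected at device creation. */
struct backend_ops {
	int (*iotlb_miss)(struct device *dev);
};

struct device {
	const struct backend_ops *ops;
	int miss_count;
};

/* Vhost-user implementation of the op. */
static int
user_iotlb_miss(struct device *dev)
{
	dev->miss_count++; /* would send a backend request on the socket */
	return 0;
}

static const struct backend_ops user_ops = {
	.iotlb_miss = user_iotlb_miss,
};

/* Shared code validates mandatory ops at creation time... */
int
device_new(struct device *dev, const struct backend_ops *ops)
{
	if (ops == NULL)
		return -1; /* as in vhost_new_device(): missing backend ops */
	dev->ops = ops;
	dev->miss_count = 0;
	return 0;
}

/* ...and later dispatches through the table, backend-agnostic. */
int
demo(void)
{
	struct device dev;

	if (device_new(&dev, NULL) != -1)
		return -1;
	if (device_new(&dev, &user_ops) != 0)
		return -1;
	dev.ops->iotlb_miss(&dev);
	return dev.miss_count;
}
```

A VDUSE backend would register its own table with the same
signature, leaving the shared call sites untouched.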
 lib/vhost/socket.c     |  2 +-
 lib/vhost/vhost.c      |  8 +++++++-
 lib/vhost/vhost.h      | 10 +++++++++-
 lib/vhost/vhost_user.c |  8 ++++++++
 lib/vhost/vhost_user.h |  1 +
 5 files changed, 26 insertions(+), 3 deletions(-)

diff --git a/lib/vhost/socket.c b/lib/vhost/socket.c
index cb0218b7bc..407d0011c3 100644
--- a/lib/vhost/socket.c
+++ b/lib/vhost/socket.c
@@ -223,7 +223,7 @@ vhost_user_add_connection(int fd, struct vhost_user_socket *vsocket)
 		return;
 	}
 
-	vid = vhost_new_device();
+	vid = vhost_user_new_device();
 	if (vid == -1) {
 		goto err;
 	}
diff --git a/lib/vhost/vhost.c b/lib/vhost/vhost.c
index 7e1af487c1..d054772bf8 100644
--- a/lib/vhost/vhost.c
+++ b/lib/vhost/vhost.c
@@ -680,11 +680,16 @@ reset_device(struct virtio_net *dev)
  * there is a new virtio device being attached).
  */
 int
-vhost_new_device(void)
+vhost_new_device(struct vhost_backend_ops *ops)
 {
 	struct virtio_net *dev;
 	int i;
 
+	if (ops == NULL) {
+		VHOST_LOG_CONFIG("device", ERR, "missing backend ops.\n");
+		return -1;
+	}
+
 	pthread_mutex_lock(&vhost_dev_lock);
 	for (i = 0; i < RTE_MAX_VHOST_DEVICE; i++) {
 		if (vhost_devices[i] == NULL)
@@ -712,6 +717,7 @@ vhost_new_device(void)
 	dev->backend_req_fd = -1;
 	dev->postcopy_ufd = -1;
 	rte_spinlock_init(&dev->backend_req_lock);
+	dev->backend_ops = ops;
 
 	return i;
 }
diff --git a/lib/vhost/vhost.h b/lib/vhost/vhost.h
index de84c115b7..966b6dd67a 100644
--- a/lib/vhost/vhost.h
+++ b/lib/vhost/vhost.h
@@ -89,6 +89,12 @@
 	for (iter = val; iter < num; iter++)
 #endif
 
+/**
+ * Structure that contains backend-specific ops.
+ */
+struct vhost_backend_ops {
+};
+
 /**
  * Structure contains buffer address, length and descriptor index
  * from vring to do scatter RX.
@@ -516,6 +522,8 @@ struct virtio_net {
 	void			*extern_data;
 	/* pre and post vhost user message handlers for the device */
 	struct rte_vhost_user_extern_ops extern_ops;
+
+	struct vhost_backend_ops *backend_ops;
 } __rte_cache_aligned;
 
 static inline void
@@ -815,7 +823,7 @@ get_device(int vid)
 	return dev;
 }
 
-int vhost_new_device(void);
+int vhost_new_device(struct vhost_backend_ops *ops);
 void cleanup_device(struct virtio_net *dev, int destroy);
 void reset_device(struct virtio_net *dev);
 void vhost_destroy_device(int);
diff --git a/lib/vhost/vhost_user.c b/lib/vhost/vhost_user.c
index 98d8b8ac79..2cd86686a5 100644
--- a/lib/vhost/vhost_user.c
+++ b/lib/vhost/vhost_user.c
@@ -3464,3 +3464,11 @@ int rte_vhost_host_notifier_ctrl(int vid, uint16_t qid, bool enable)
 
 	return ret;
 }
+
+static struct vhost_backend_ops vhost_user_backend_ops;
+
+int
+vhost_user_new_device(void)
+{
+	return vhost_new_device(&vhost_user_backend_ops);
+}
diff --git a/lib/vhost/vhost_user.h b/lib/vhost/vhost_user.h
index a0987a58f9..61456049c8 100644
--- a/lib/vhost/vhost_user.h
+++ b/lib/vhost/vhost_user.h
@@ -185,5 +185,6 @@ int vhost_user_iotlb_miss(struct virtio_net *dev, uint64_t iova, uint8_t perm);
 int read_fd_message(char *ifname, int sockfd, char *buf, int buflen, int *fds, int max_fds,
 		int *fd_num);
 int send_fd_message(char *ifname, int sockfd, char *buf, int buflen, int *fds, int fd_num);
+int vhost_user_new_device(void);
 
 #endif
-- 
2.40.1


^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH v5 10/26] vhost: add IOTLB cache entry removal callback
  2023-06-06  8:18 [PATCH v5 00/26] Add VDUSE support to Vhost library Maxime Coquelin
                   ` (8 preceding siblings ...)
  2023-06-06  8:18 ` [PATCH v5 09/26] vhost: introduce backend ops Maxime Coquelin
@ 2023-06-06  8:18 ` Maxime Coquelin
  2023-06-06  8:18 ` [PATCH v5 11/26] vhost: add helper for IOTLB misses Maxime Coquelin
                   ` (18 subsequent siblings)
  28 siblings, 0 replies; 36+ messages in thread
From: Maxime Coquelin @ 2023-06-06  8:18 UTC (permalink / raw)
  To: dev, chenbo.xia, david.marchand, mkp, fbl, jasowang,
	cunming.liang, xieyongji, echaudro, eperezma, amorenoz, lulu
  Cc: Maxime Coquelin

VDUSE will need to munmap() the IOTLB entry on removal
from the cache, as it performs mmap() before insertion.

This patch introduces a callback that the VDUSE layer will
implement to achieve this.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
---
 lib/vhost/iotlb.c | 12 ++++++++++++
 lib/vhost/vhost.h |  3 +++
 2 files changed, 15 insertions(+)

diff --git a/lib/vhost/iotlb.c b/lib/vhost/iotlb.c
index 87986f2489..424121cc00 100644
--- a/lib/vhost/iotlb.c
+++ b/lib/vhost/iotlb.c
@@ -25,6 +25,15 @@ struct vhost_iotlb_entry {
 
 #define IOTLB_CACHE_SIZE 2048
 
+static void
+vhost_user_iotlb_remove_notify(struct virtio_net *dev, struct vhost_iotlb_entry *entry)
+{
+	if (dev->backend_ops->iotlb_remove_notify == NULL)
+		return;
+
+	dev->backend_ops->iotlb_remove_notify(entry->uaddr, entry->uoffset, entry->size);
+}
+
 static bool
 vhost_user_iotlb_share_page(struct vhost_iotlb_entry *a, struct vhost_iotlb_entry *b)
 {
@@ -198,6 +207,7 @@ vhost_user_iotlb_cache_remove_all(struct virtio_net *dev)
 		vhost_user_iotlb_clear_dump(node, NULL, NULL);
 
 		TAILQ_REMOVE(&dev->iotlb_list, node, next);
+		vhost_user_iotlb_remove_notify(dev, node);
 		vhost_user_iotlb_pool_put(dev, node);
 	}
 
@@ -223,6 +233,7 @@ vhost_user_iotlb_cache_random_evict(struct virtio_net *dev)
 			vhost_user_iotlb_clear_dump(node, prev_node, next_node);
 
 			TAILQ_REMOVE(&dev->iotlb_list, node, next);
+			vhost_user_iotlb_remove_notify(dev, node);
 			vhost_user_iotlb_pool_put(dev, node);
 			dev->iotlb_cache_nr--;
 			break;
@@ -314,6 +325,7 @@ vhost_user_iotlb_cache_remove(struct virtio_net *dev, uint64_t iova, uint64_t si
 			vhost_user_iotlb_clear_dump(node, prev_node, next_node);
 
 			TAILQ_REMOVE(&dev->iotlb_list, node, next);
+			vhost_user_iotlb_remove_notify(dev, node);
 			vhost_user_iotlb_pool_put(dev, node);
 			dev->iotlb_cache_nr--;
 		} else {
diff --git a/lib/vhost/vhost.h b/lib/vhost/vhost.h
index 966b6dd67a..69df8b14c0 100644
--- a/lib/vhost/vhost.h
+++ b/lib/vhost/vhost.h
@@ -89,10 +89,13 @@
 	for (iter = val; iter < num; iter++)
 #endif
 
+typedef void (*vhost_iotlb_remove_notify)(uint64_t addr, uint64_t off, uint64_t size);
+
 /**
  * Structure that contains backend-specific ops.
  */
 struct vhost_backend_ops {
+	vhost_iotlb_remove_notify iotlb_remove_notify;
 };
 
 /**
-- 
2.40.1


^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH v5 11/26] vhost: add helper for IOTLB misses
  2023-06-06  8:18 [PATCH v5 00/26] Add VDUSE support to Vhost library Maxime Coquelin
                   ` (9 preceding siblings ...)
  2023-06-06  8:18 ` [PATCH v5 10/26] vhost: add IOTLB cache entry removal callback Maxime Coquelin
@ 2023-06-06  8:18 ` Maxime Coquelin
  2023-06-06  8:18 ` [PATCH v5 12/26] vhost: add helper for interrupt injection Maxime Coquelin
                   ` (17 subsequent siblings)
  28 siblings, 0 replies; 36+ messages in thread
From: Maxime Coquelin @ 2023-06-06  8:18 UTC (permalink / raw)
  To: dev, chenbo.xia, david.marchand, mkp, fbl, jasowang,
	cunming.liang, xieyongji, echaudro, eperezma, amorenoz, lulu
  Cc: Maxime Coquelin

This patch adds a helper for sending IOTLB misses, as
VDUSE will use an ioctl while Vhost-user uses a dedicated
Vhost-user backend request.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
---
 lib/vhost/vhost.c      | 13 ++++++++++++-
 lib/vhost/vhost.h      |  4 ++++
 lib/vhost/vhost_user.c |  6 ++++--
 lib/vhost/vhost_user.h |  1 -
 4 files changed, 20 insertions(+), 4 deletions(-)

diff --git a/lib/vhost/vhost.c b/lib/vhost/vhost.c
index d054772bf8..f77f30d674 100644
--- a/lib/vhost/vhost.c
+++ b/lib/vhost/vhost.c
@@ -56,6 +56,12 @@ static const struct vhost_vq_stats_name_off vhost_vq_stat_strings[] = {
 
 #define VHOST_NB_VQ_STATS RTE_DIM(vhost_vq_stat_strings)
 
+static int
+vhost_iotlb_miss(struct virtio_net *dev, uint64_t iova, uint8_t perm)
+{
+	return dev->backend_ops->iotlb_miss(dev, iova, perm);
+}
+
 uint64_t
 __vhost_iova_to_vva(struct virtio_net *dev, struct vhost_virtqueue *vq,
 		    uint64_t iova, uint64_t *size, uint8_t perm)
@@ -90,7 +96,7 @@ __vhost_iova_to_vva(struct virtio_net *dev, struct vhost_virtqueue *vq,
 		vhost_user_iotlb_rd_unlock(vq);
 
 		vhost_user_iotlb_pending_insert(dev, iova, perm);
-		if (vhost_user_iotlb_miss(dev, iova, perm)) {
+		if (vhost_iotlb_miss(dev, iova, perm)) {
 			VHOST_LOG_DATA(dev->ifname, ERR,
 				"IOTLB miss req failed for IOVA 0x%" PRIx64 "\n",
 				iova);
@@ -690,6 +696,11 @@ vhost_new_device(struct vhost_backend_ops *ops)
 		return -1;
 	}
 
+	if (ops->iotlb_miss == NULL) {
+		VHOST_LOG_CONFIG("device", ERR, "missing IOTLB miss backend op.\n");
+		return -1;
+	}
+
 	pthread_mutex_lock(&vhost_dev_lock);
 	for (i = 0; i < RTE_MAX_VHOST_DEVICE; i++) {
 		if (vhost_devices[i] == NULL)
diff --git a/lib/vhost/vhost.h b/lib/vhost/vhost.h
index 69df8b14c0..25255aaea8 100644
--- a/lib/vhost/vhost.h
+++ b/lib/vhost/vhost.h
@@ -89,13 +89,17 @@
 	for (iter = val; iter < num; iter++)
 #endif
 
+struct virtio_net;
 typedef void (*vhost_iotlb_remove_notify)(uint64_t addr, uint64_t off, uint64_t size);
 
+typedef int (*vhost_iotlb_miss_cb)(struct virtio_net *dev, uint64_t iova, uint8_t perm);
+
 /**
  * Structure that contains backend-specific ops.
  */
 struct vhost_backend_ops {
 	vhost_iotlb_remove_notify iotlb_remove_notify;
+	vhost_iotlb_miss_cb iotlb_miss;
 };
 
 /**
diff --git a/lib/vhost/vhost_user.c b/lib/vhost/vhost_user.c
index 2cd86686a5..30ad63aba0 100644
--- a/lib/vhost/vhost_user.c
+++ b/lib/vhost/vhost_user.c
@@ -3305,7 +3305,7 @@ vhost_user_msg_handler(int vid, int fd)
 	return ret;
 }
 
-int
+static int
 vhost_user_iotlb_miss(struct virtio_net *dev, uint64_t iova, uint8_t perm)
 {
 	int ret;
@@ -3465,7 +3465,9 @@ int rte_vhost_host_notifier_ctrl(int vid, uint16_t qid, bool enable)
 	return ret;
 }
 
-static struct vhost_backend_ops vhost_user_backend_ops;
+static struct vhost_backend_ops vhost_user_backend_ops = {
+	.iotlb_miss = vhost_user_iotlb_miss,
+};
 
 int
 vhost_user_new_device(void)
diff --git a/lib/vhost/vhost_user.h b/lib/vhost/vhost_user.h
index 61456049c8..1ffeca92f3 100644
--- a/lib/vhost/vhost_user.h
+++ b/lib/vhost/vhost_user.h
@@ -179,7 +179,6 @@ struct __rte_packed vhu_msg_context {
 
 /* vhost_user.c */
 int vhost_user_msg_handler(int vid, int fd);
-int vhost_user_iotlb_miss(struct virtio_net *dev, uint64_t iova, uint8_t perm);
 
 /* socket.c */
 int read_fd_message(char *ifname, int sockfd, char *buf, int buflen, int *fds, int max_fds,
-- 
2.40.1


^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH v5 12/26] vhost: add helper for interrupt injection
  2023-06-06  8:18 [PATCH v5 00/26] Add VDUSE support to Vhost library Maxime Coquelin
                   ` (10 preceding siblings ...)
  2023-06-06  8:18 ` [PATCH v5 11/26] vhost: add helper for IOTLB misses Maxime Coquelin
@ 2023-06-06  8:18 ` Maxime Coquelin
  2023-06-06  8:18 ` [PATCH v5 13/26] vhost: add API to set max queue pairs Maxime Coquelin
                   ` (16 subsequent siblings)
  28 siblings, 0 replies; 36+ messages in thread
From: Maxime Coquelin @ 2023-06-06  8:18 UTC (permalink / raw)
  To: dev, chenbo.xia, david.marchand, mkp, fbl, jasowang,
	cunming.liang, xieyongji, echaudro, eperezma, amorenoz, lulu
  Cc: Maxime Coquelin

Vhost-user uses eventfd to inject IRQs, but VDUSE uses
an ioctl.

This patch prepares vhost_vring_call_split() and
vhost_vring_call_packed() to support VDUSE by introducing
a new helper.

It also adds a new counter for guest notification
failures, which can happen, for example, when the call
file descriptor is uninitialized.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
---
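
A runnable sketch of the Vhost-user side of the new inject_irq
op (Linux-only, since it relies on eventfd; the struct is a
simplified stand-in for vhost_virtqueue): write to the call
eventfd, and fail cleanly when it was never set up, which is
the case the new error counter accounts for.

```c
#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

struct virtqueue {
	int callfd;
};

static int
user_inject_irq(struct virtqueue *vq)
{
	if (vq->callfd < 0)
		return -1; /* uninitialized call fd: notification error */
	return eventfd_write(vq->callfd, (eventfd_t)1);
}

int
demo(void)
{
	struct virtqueue vq = { .callfd = eventfd(0, 0) };
	int ret = user_inject_irq(&vq);
	eventfd_t val = 0;

	eventfd_read(vq.callfd, &val);
	close(vq.callfd);

	/* Missing call fd must be reported, not crash. */
	struct virtqueue bad = { .callfd = -1 };

	if (user_inject_irq(&bad) != -1)
		return -1;

	return (ret == 0 && val == 1) ? 0 : -1;
}
```

A VDUSE backend would instead issue an ioctl on the device fd
behind the same callback signature.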
 lib/vhost/vhost.c      | 25 +++++++++++++------------
 lib/vhost/vhost.h      | 19 +++++++++----------
 lib/vhost/vhost_user.c | 10 ++++++++++
 3 files changed, 32 insertions(+), 22 deletions(-)

diff --git a/lib/vhost/vhost.c b/lib/vhost/vhost.c
index f77f30d674..eb6309b681 100644
--- a/lib/vhost/vhost.c
+++ b/lib/vhost/vhost.c
@@ -701,6 +701,11 @@ vhost_new_device(struct vhost_backend_ops *ops)
 		return -1;
 	}
 
+	if (ops->inject_irq == NULL) {
+		VHOST_LOG_CONFIG("device", ERR, "missing IRQ injection backend op.\n");
+		return -1;
+	}
+
 	pthread_mutex_lock(&vhost_dev_lock);
 	for (i = 0; i < RTE_MAX_VHOST_DEVICE; i++) {
 		if (vhost_devices[i] == NULL)
@@ -1511,20 +1516,16 @@ rte_vhost_notify_guest(int vid, uint16_t queue_id)
 
 	rte_rwlock_read_lock(&vq->access_lock);
 
-	if (vq->callfd >= 0) {
-		int ret = eventfd_write(vq->callfd, (eventfd_t)1);
-
-		if (ret) {
-			if (dev->flags & VIRTIO_DEV_STATS_ENABLED)
-				__atomic_fetch_add(&vq->stats.guest_notifications_error,
+	if (dev->backend_ops->inject_irq(dev, vq)) {
+		if (dev->flags & VIRTIO_DEV_STATS_ENABLED)
+			__atomic_fetch_add(&vq->stats.guest_notifications_error,
 					1, __ATOMIC_RELAXED);
-		} else {
-			if (dev->flags & VIRTIO_DEV_STATS_ENABLED)
-				__atomic_fetch_add(&vq->stats.guest_notifications,
+	} else {
+		if (dev->flags & VIRTIO_DEV_STATS_ENABLED)
+			__atomic_fetch_add(&vq->stats.guest_notifications,
 					1, __ATOMIC_RELAXED);
-			if (dev->notify_ops->guest_notified)
-				dev->notify_ops->guest_notified(dev->vid);
-		}
+		if (dev->notify_ops->guest_notified)
+			dev->notify_ops->guest_notified(dev->vid);
 	}
 
 	rte_rwlock_read_unlock(&vq->access_lock);
diff --git a/lib/vhost/vhost.h b/lib/vhost/vhost.h
index 25255aaea8..ea2798b0bf 100644
--- a/lib/vhost/vhost.h
+++ b/lib/vhost/vhost.h
@@ -90,16 +90,20 @@
 #endif
 
 struct virtio_net;
+struct vhost_virtqueue;
+
 typedef void (*vhost_iotlb_remove_notify)(uint64_t addr, uint64_t off, uint64_t size);
 
 typedef int (*vhost_iotlb_miss_cb)(struct virtio_net *dev, uint64_t iova, uint8_t perm);
 
+typedef int (*vhost_vring_inject_irq_cb)(struct virtio_net *dev, struct vhost_virtqueue *vq);
 /**
  * Structure that contains backend-specific ops.
  */
 struct vhost_backend_ops {
 	vhost_iotlb_remove_notify iotlb_remove_notify;
 	vhost_iotlb_miss_cb iotlb_miss;
+	vhost_vring_inject_irq_cb inject_irq;
 };
 
 /**
@@ -906,8 +910,6 @@ vhost_need_event(uint16_t event_idx, uint16_t new_idx, uint16_t old)
 static __rte_always_inline void
 vhost_vring_inject_irq(struct virtio_net *dev, struct vhost_virtqueue *vq)
 {
-	int ret;
-
 	if (dev->notify_ops->guest_notify &&
 	    dev->notify_ops->guest_notify(dev->vid, vq->index)) {
 		if (dev->flags & VIRTIO_DEV_STATS_ENABLED)
@@ -916,8 +918,7 @@ vhost_vring_inject_irq(struct virtio_net *dev, struct vhost_virtqueue *vq)
 		return;
 	}
 
-	ret = eventfd_write(vq->callfd, (eventfd_t) 1);
-	if (ret) {
+	if (dev->backend_ops->inject_irq(dev, vq)) {
 		if (dev->flags & VIRTIO_DEV_STATS_ENABLED)
 			__atomic_fetch_add(&vq->stats.guest_notifications_error,
 				1, __ATOMIC_RELAXED);
@@ -950,14 +951,12 @@ vhost_vring_call_split(struct virtio_net *dev, struct vhost_virtqueue *vq)
 			"%s: used_event_idx=%d, old=%d, new=%d\n",
 			__func__, vhost_used_event(vq), old, new);
 
-		if ((vhost_need_event(vhost_used_event(vq), new, old) ||
-					unlikely(!signalled_used_valid)) &&
-				vq->callfd >= 0)
+		if (vhost_need_event(vhost_used_event(vq), new, old) ||
+				unlikely(!signalled_used_valid))
 			vhost_vring_inject_irq(dev, vq);
 	} else {
 		/* Kick the guest if necessary. */
-		if (!(vq->avail->flags & VRING_AVAIL_F_NO_INTERRUPT) &&
-				(vq->callfd >= 0))
+		if (!(vq->avail->flags & VRING_AVAIL_F_NO_INTERRUPT))
 			vhost_vring_inject_irq(dev, vq);
 	}
 }
@@ -1009,7 +1008,7 @@ vhost_vring_call_packed(struct virtio_net *dev, struct vhost_virtqueue *vq)
 	if (vhost_need_event(off, new, old))
 		kick = true;
 kick:
-	if (kick && vq->callfd >= 0)
+	if (kick)
 		vhost_vring_inject_irq(dev, vq);
 }
 
diff --git a/lib/vhost/vhost_user.c b/lib/vhost/vhost_user.c
index 30ad63aba0..901a80bbaa 100644
--- a/lib/vhost/vhost_user.c
+++ b/lib/vhost/vhost_user.c
@@ -3465,8 +3465,18 @@ int rte_vhost_host_notifier_ctrl(int vid, uint16_t qid, bool enable)
 	return ret;
 }
 
+static int
+vhost_user_inject_irq(struct virtio_net *dev __rte_unused, struct vhost_virtqueue *vq)
+{
+	if (vq->callfd < 0)
+		return -1;
+
+	return eventfd_write(vq->callfd, (eventfd_t)1);
+}
+
 static struct vhost_backend_ops vhost_user_backend_ops = {
 	.iotlb_miss = vhost_user_iotlb_miss,
+	.inject_irq = vhost_user_inject_irq,
 };
 
 int
-- 
2.40.1


^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH v5 13/26] vhost: add API to set max queue pairs
  2023-06-06  8:18 [PATCH v5 00/26] Add VDUSE support to Vhost library Maxime Coquelin
                   ` (11 preceding siblings ...)
  2023-06-06  8:18 ` [PATCH v5 12/26] vhost: add helper for interrupt injection Maxime Coquelin
@ 2023-06-06  8:18 ` Maxime Coquelin
  2023-06-06  8:18 ` [PATCH v5 14/26] net/vhost: use " Maxime Coquelin
                   ` (15 subsequent siblings)
  28 siblings, 0 replies; 36+ messages in thread
From: Maxime Coquelin @ 2023-06-06  8:18 UTC (permalink / raw)
  To: dev, chenbo.xia, david.marchand, mkp, fbl, jasowang,
	cunming.liang, xieyongji, echaudro, eperezma, amorenoz, lulu
  Cc: Maxime Coquelin

This patch introduces a new rte_vhost_driver_set_max_queues
API as preliminary work for multiqueue support with VDUSE.

Indeed, with VDUSE the vrings have to be pre-allocated at
device creation time, so this API makes it possible to
avoid allocating the 128 queue pairs supported by the
Vhost library when fewer are needed.

Calling the API is optional; 128 queue pairs remain the
default.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
---
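
A toy model of the new setter and of the updated queue-number
query (the struct is a stand-in for vhost_user_socket; values
are illustrative):

```c
#include <stdint.h>

#define VHOST_MAX_QUEUE_PAIRS 128

struct vsocket {
	uint32_t max_queue_pairs;
};

static int
set_max_queue_num(struct vsocket *s, uint32_t max_queue_pairs)
{
	if (max_queue_pairs > VHOST_MAX_QUEUE_PAIRS)
		return -1; /* library hard limit, as in the patch */
	s->max_queue_pairs = max_queue_pairs;
	return 0;
}

static uint32_t
get_queue_num(const struct vsocket *s, uint32_t vdpa_queue_num)
{
	/* Advertise the smaller of the configured cap and what the
	 * vDPA device reports (RTE_MIN() in the patch). */
	return s->max_queue_pairs < vdpa_queue_num ?
			s->max_queue_pairs : vdpa_queue_num;
}

int
demo(void)
{
	struct vsocket s = { .max_queue_pairs = VHOST_MAX_QUEUE_PAIRS };

	if (set_max_queue_num(&s, 129) != -1)
		return -1; /* above the hard limit: rejected */
	if (set_max_queue_num(&s, 1) != 0)
		return -1;
	return (int)get_queue_num(&s, 8); /* capped at 1 */
}
```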
 doc/guides/prog_guide/vhost_lib.rst    |  4 +++
 doc/guides/rel_notes/release_23_07.rst |  5 ++++
 lib/vhost/rte_vhost.h                  | 17 ++++++++++++
 lib/vhost/socket.c                     | 36 ++++++++++++++++++++++++--
 lib/vhost/version.map                  |  1 +
 5 files changed, 61 insertions(+), 2 deletions(-)

diff --git a/doc/guides/prog_guide/vhost_lib.rst b/doc/guides/prog_guide/vhost_lib.rst
index 0f964d7a4a..0c2b4d020a 100644
--- a/doc/guides/prog_guide/vhost_lib.rst
+++ b/doc/guides/prog_guide/vhost_lib.rst
@@ -339,6 +339,10 @@ The following is an overview of some key Vhost API functions:
   Inject the offloaded interrupt received by the 'guest_notify' callback,
   into the vhost device's queue.
 
+* ``rte_vhost_driver_set_max_queue_num(const char *path, uint32_t max_queue_pairs)``
+
+  Set the maximum number of queue pairs supported by the device.
+
 Vhost-user Implementations
 --------------------------
 
diff --git a/doc/guides/rel_notes/release_23_07.rst b/doc/guides/rel_notes/release_23_07.rst
index 3eed8ac561..7034fb664c 100644
--- a/doc/guides/rel_notes/release_23_07.rst
+++ b/doc/guides/rel_notes/release_23_07.rst
@@ -62,6 +62,11 @@ New Features
   rte_vhost_notify_guest(), is added to raise the interrupt outside of the
   fast path.
 
+* **Added Vhost API to set maximum queue pairs supported.**
+
+  Introduced ``rte_vhost_driver_set_max_queue_num()`` to be able to limit the
+  maximum number of supported queue pairs, required for VDUSE support.
+
 
 Removed Items
 -------------
diff --git a/lib/vhost/rte_vhost.h b/lib/vhost/rte_vhost.h
index 7a10bc36cf..7844c9f142 100644
--- a/lib/vhost/rte_vhost.h
+++ b/lib/vhost/rte_vhost.h
@@ -609,6 +609,23 @@ rte_vhost_driver_get_protocol_features(const char *path,
 int
 rte_vhost_driver_get_queue_num(const char *path, uint32_t *queue_num);
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change, or be removed, without prior notice.
+ *
+ * Set the maximum number of queue pairs supported by the device.
+ *
+ * @param path
+ *  The vhost-user socket file path
+ * @param max_queue_pairs
+ *  The maximum number of queue pairs
+ * @return
+ *  0 on success, -1 on failure
+ */
+__rte_experimental
+int
+rte_vhost_driver_set_max_queue_num(const char *path, uint32_t max_queue_pairs);
+
 /**
  * Get the feature bits after negotiation
  *
diff --git a/lib/vhost/socket.c b/lib/vhost/socket.c
index 407d0011c3..29f7a8cece 100644
--- a/lib/vhost/socket.c
+++ b/lib/vhost/socket.c
@@ -57,6 +57,8 @@ struct vhost_user_socket {
 
 	uint64_t protocol_features;
 
+	uint32_t max_queue_pairs;
+
 	struct rte_vdpa_device *vdpa_dev;
 
 	struct rte_vhost_device_ops const *notify_ops;
@@ -823,7 +825,7 @@ rte_vhost_driver_get_queue_num(const char *path, uint32_t *queue_num)
 
 	vdpa_dev = vsocket->vdpa_dev;
 	if (!vdpa_dev) {
-		*queue_num = VHOST_MAX_QUEUE_PAIRS;
+		*queue_num = vsocket->max_queue_pairs;
 		goto unlock_exit;
 	}
 
@@ -833,7 +835,36 @@ rte_vhost_driver_get_queue_num(const char *path, uint32_t *queue_num)
 		goto unlock_exit;
 	}
 
-	*queue_num = RTE_MIN((uint32_t)VHOST_MAX_QUEUE_PAIRS, vdpa_queue_num);
+	*queue_num = RTE_MIN(vsocket->max_queue_pairs, vdpa_queue_num);
+
+unlock_exit:
+	pthread_mutex_unlock(&vhost_user.mutex);
+	return ret;
+}
+
+int
+rte_vhost_driver_set_max_queue_num(const char *path, uint32_t max_queue_pairs)
+{
+	struct vhost_user_socket *vsocket;
+	int ret = 0;
+
+	VHOST_LOG_CONFIG(path, INFO, "Setting max queue pairs to %u\n", max_queue_pairs);
+
+	if (max_queue_pairs > VHOST_MAX_QUEUE_PAIRS) {
+		VHOST_LOG_CONFIG(path, ERR, "Library only supports up to %u queue pairs\n",
+				VHOST_MAX_QUEUE_PAIRS);
+		return -1;
+	}
+
+	pthread_mutex_lock(&vhost_user.mutex);
+	vsocket = find_vhost_user_socket(path);
+	if (!vsocket) {
+		VHOST_LOG_CONFIG(path, ERR, "socket file is not registered yet.\n");
+		ret = -1;
+		goto unlock_exit;
+	}
+
+	vsocket->max_queue_pairs = max_queue_pairs;
 
 unlock_exit:
 	pthread_mutex_unlock(&vhost_user.mutex);
@@ -889,6 +920,7 @@ rte_vhost_driver_register(const char *path, uint64_t flags)
 		goto out_free;
 	}
 	vsocket->vdpa_dev = NULL;
+	vsocket->max_queue_pairs = VHOST_MAX_QUEUE_PAIRS;
 	vsocket->extbuf = flags & RTE_VHOST_USER_EXTBUF_SUPPORT;
 	vsocket->linearbuf = flags & RTE_VHOST_USER_LINEARBUF_SUPPORT;
 	vsocket->async_copy = flags & RTE_VHOST_USER_ASYNC_COPY;
diff --git a/lib/vhost/version.map b/lib/vhost/version.map
index 7bcbfd12cf..5051c29022 100644
--- a/lib/vhost/version.map
+++ b/lib/vhost/version.map
@@ -107,6 +107,7 @@ EXPERIMENTAL {
 
         # added in 23.07
 	rte_vhost_notify_guest;
+	rte_vhost_driver_set_max_queue_num;
 };
 
 INTERNAL {
-- 
2.40.1


^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH v5 14/26] net/vhost: use API to set max queue pairs
  2023-06-06  8:18 [PATCH v5 00/26] Add VDUSE support to Vhost library Maxime Coquelin
                   ` (12 preceding siblings ...)
  2023-06-06  8:18 ` [PATCH v5 13/26] vhost: add API to set max queue pairs Maxime Coquelin
@ 2023-06-06  8:18 ` Maxime Coquelin
  2023-06-06  8:18 ` [PATCH v5 15/26] vhost: add control virtqueue support Maxime Coquelin
                   ` (14 subsequent siblings)
  28 siblings, 0 replies; 36+ messages in thread
From: Maxime Coquelin @ 2023-06-06  8:18 UTC (permalink / raw)
  To: dev, chenbo.xia, david.marchand, mkp, fbl, jasowang,
	cunming.liang, xieyongji, echaudro, eperezma, amorenoz, lulu
  Cc: Maxime Coquelin

In order to support multiqueue with VDUSE, we need to be
able to limit the maximum number of queue pairs, to avoid
unnecessary memory consumption since the maximum number of
queue pairs needs to be allocated at device creation time,
as opposed to Vhost-user, which allocates them only when
the frontend initializes them.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
---
 drivers/net/vhost/rte_eth_vhost.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/net/vhost/rte_eth_vhost.c b/drivers/net/vhost/rte_eth_vhost.c
index 62ef955ebc..8d37ec9775 100644
--- a/drivers/net/vhost/rte_eth_vhost.c
+++ b/drivers/net/vhost/rte_eth_vhost.c
@@ -1013,6 +1013,9 @@ vhost_driver_setup(struct rte_eth_dev *eth_dev)
 			goto drv_unreg;
 	}
 
+	if (rte_vhost_driver_set_max_queue_num(internal->iface_name, internal->max_queues))
+		goto drv_unreg;
+
 	if (rte_vhost_driver_callback_register(internal->iface_name,
 					       &vhost_ops) < 0) {
 		VHOST_LOG(ERR, "Can't register callbacks\n");
-- 
2.40.1


^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH v5 15/26] vhost: add control virtqueue support
  2023-06-06  8:18 [PATCH v5 00/26] Add VDUSE support to Vhost library Maxime Coquelin
                   ` (13 preceding siblings ...)
  2023-06-06  8:18 ` [PATCH v5 14/26] net/vhost: use " Maxime Coquelin
@ 2023-06-06  8:18 ` Maxime Coquelin
  2023-06-06  8:18 ` [PATCH v5 16/26] vhost: add VDUSE device creation and destruction Maxime Coquelin
                   ` (13 subsequent siblings)
  28 siblings, 0 replies; 36+ messages in thread
From: Maxime Coquelin @ 2023-06-06  8:18 UTC (permalink / raw)
  To: dev, chenbo.xia, david.marchand, mkp, fbl, jasowang,
	cunming.liang, xieyongji, echaudro, eperezma, amorenoz, lulu
  Cc: Maxime Coquelin

In order to support multi-queue with VDUSE, control queue
support is required.

This patch adds a control queue implementation; it will be
used later when adding VDUSE support. Only the split ring
layout is supported for now; packed ring support will be
added later.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
---
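
A sketch of the effect of the one command the handler supports
so far, VIRTIO_NET_CTRL_MQ_VQ_PAIRS_SET: enable the first 2*n
data rings (one RX and one TX per pair) and disable the rest.
The fixed ring count below is illustrative.

```c
#include <stdbool.h>
#include <stdint.h>

#define NR_VRING 8

static bool ring_enabled[NR_VRING];

/* Mirrors the loop in virtio_net_ctrl_handle_req(): rings below
 * queue_pairs * 2 are enabled, the others disabled (the control
 * queue itself is skipped in the real code). */
static void
handle_mq_vq_pairs_set(uint16_t queue_pairs)
{
	for (uint32_t i = 0; i < NR_VRING; i++)
		ring_enabled[i] = (i < (uint32_t)queue_pairs * 2);
}

int
demo(void)
{
	handle_mq_vq_pairs_set(2);
	/* pairs 0 and 1 (rings 0-3) enabled, rings 4-7 disabled */
	return ring_enabled[0] && ring_enabled[3] &&
			!ring_enabled[4] && !ring_enabled[7];
}
```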
 lib/vhost/meson.build       |   1 +
 lib/vhost/vhost.h           |   2 +
 lib/vhost/virtio_net_ctrl.c | 286 ++++++++++++++++++++++++++++++++++++
 lib/vhost/virtio_net_ctrl.h |  10 ++
 4 files changed, 299 insertions(+)
 create mode 100644 lib/vhost/virtio_net_ctrl.c
 create mode 100644 lib/vhost/virtio_net_ctrl.h

diff --git a/lib/vhost/meson.build b/lib/vhost/meson.build
index 05679447db..d211a0bd37 100644
--- a/lib/vhost/meson.build
+++ b/lib/vhost/meson.build
@@ -27,6 +27,7 @@ sources = files(
         'vhost_crypto.c',
         'vhost_user.c',
         'virtio_net.c',
+        'virtio_net_ctrl.c',
 )
 headers = files(
         'rte_vdpa.h',
diff --git a/lib/vhost/vhost.h b/lib/vhost/vhost.h
index ea2798b0bf..04267a3ac5 100644
--- a/lib/vhost/vhost.h
+++ b/lib/vhost/vhost.h
@@ -527,6 +527,8 @@ struct virtio_net {
 	int			postcopy_ufd;
 	int			postcopy_listening;
 
+	struct vhost_virtqueue	*cvq;
+
 	struct rte_vdpa_device *vdpa_dev;
 
 	/* context data for the external message handlers */
diff --git a/lib/vhost/virtio_net_ctrl.c b/lib/vhost/virtio_net_ctrl.c
new file mode 100644
index 0000000000..6b583a0ce6
--- /dev/null
+++ b/lib/vhost/virtio_net_ctrl.c
@@ -0,0 +1,286 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2023 Red Hat, Inc.
+ */
+
+#include <stdint.h>
+#include <stdio.h>
+#include <unistd.h>
+
+#include "iotlb.h"
+#include "vhost.h"
+#include "virtio_net_ctrl.h"
+
+struct virtio_net_ctrl {
+	uint8_t class;
+	uint8_t command;
+	uint8_t command_data[];
+};
+
+struct virtio_net_ctrl_elem {
+	struct virtio_net_ctrl *ctrl_req;
+	uint16_t head_idx;
+	uint16_t n_descs;
+	uint8_t *desc_ack;
+};
+
+static int
+virtio_net_ctrl_pop(struct virtio_net *dev, struct vhost_virtqueue *cvq,
+		struct virtio_net_ctrl_elem *ctrl_elem)
+	__rte_shared_locks_required(&cvq->iotlb_lock)
+{
+	uint16_t avail_idx, desc_idx, n_descs = 0;
+	uint64_t desc_len, desc_addr, desc_iova, data_len = 0;
+	uint8_t *ctrl_req;
+	struct vring_desc *descs;
+
+	avail_idx = __atomic_load_n(&cvq->avail->idx, __ATOMIC_ACQUIRE);
+	if (avail_idx == cvq->last_avail_idx) {
+		VHOST_LOG_CONFIG(dev->ifname, DEBUG, "Control queue empty\n");
+		return 0;
+	}
+
+	desc_idx = cvq->avail->ring[cvq->last_avail_idx];
+	if (desc_idx >= cvq->size) {
+		VHOST_LOG_CONFIG(dev->ifname, ERR, "Out of range desc index, dropping\n");
+		goto err;
+	}
+
+	ctrl_elem->head_idx = desc_idx;
+
+	if (cvq->desc[desc_idx].flags & VRING_DESC_F_INDIRECT) {
+		desc_len = cvq->desc[desc_idx].len;
+		desc_iova = cvq->desc[desc_idx].addr;
+
+		descs = (struct vring_desc *)(uintptr_t)vhost_iova_to_vva(dev, cvq,
+					desc_iova, &desc_len, VHOST_ACCESS_RO);
+		if (!descs || desc_len != cvq->desc[desc_idx].len) {
+			VHOST_LOG_CONFIG(dev->ifname, ERR, "Failed to map ctrl indirect descs\n");
+			goto err;
+		}
+
+		desc_idx = 0;
+	} else {
+		descs = cvq->desc;
+	}
+
+	while (1) {
+		desc_len = descs[desc_idx].len;
+		desc_iova = descs[desc_idx].addr;
+
+		n_descs++;
+
+		if (descs[desc_idx].flags & VRING_DESC_F_WRITE) {
+			if (ctrl_elem->desc_ack) {
+				VHOST_LOG_CONFIG(dev->ifname, ERR,
+						"Unexpected ctrl chain layout\n");
+				goto err;
+			}
+
+			if (desc_len != sizeof(uint8_t)) {
+				VHOST_LOG_CONFIG(dev->ifname, ERR,
+						"Invalid ack size for ctrl req, dropping\n");
+				goto err;
+			}
+
+			ctrl_elem->desc_ack = (uint8_t *)(uintptr_t)vhost_iova_to_vva(dev, cvq,
+					desc_iova, &desc_len, VHOST_ACCESS_WO);
+			if (!ctrl_elem->desc_ack || desc_len != sizeof(uint8_t)) {
+				VHOST_LOG_CONFIG(dev->ifname, ERR,
+						"Failed to map ctrl ack descriptor\n");
+				goto err;
+			}
+		} else {
+			if (ctrl_elem->desc_ack) {
+				VHOST_LOG_CONFIG(dev->ifname, ERR,
+						"Unexpected ctrl chain layout\n");
+				goto err;
+			}
+
+			data_len += desc_len;
+		}
+
+		if (!(descs[desc_idx].flags & VRING_DESC_F_NEXT))
+			break;
+
+		desc_idx = descs[desc_idx].next;
+	}
+
+	desc_idx = ctrl_elem->head_idx;
+
+	if (cvq->desc[desc_idx].flags & VRING_DESC_F_INDIRECT)
+		ctrl_elem->n_descs = 1;
+	else
+		ctrl_elem->n_descs = n_descs;
+
+	if (!ctrl_elem->desc_ack) {
+		VHOST_LOG_CONFIG(dev->ifname, ERR, "Missing ctrl ack descriptor\n");
+		goto err;
+	}
+
+	if (data_len < sizeof(ctrl_elem->ctrl_req->class) + sizeof(ctrl_elem->ctrl_req->command)) {
+		VHOST_LOG_CONFIG(dev->ifname, ERR, "Invalid control header size\n");
+		goto err;
+	}
+
+	ctrl_elem->ctrl_req = malloc(data_len);
+	if (!ctrl_elem->ctrl_req) {
+		VHOST_LOG_CONFIG(dev->ifname, ERR, "Failed to alloc ctrl request\n");
+		goto err;
+	}
+
+	ctrl_req = (uint8_t *)ctrl_elem->ctrl_req;
+
+	if (cvq->desc[desc_idx].flags & VRING_DESC_F_INDIRECT) {
+		desc_len = cvq->desc[desc_idx].len;
+		desc_iova = cvq->desc[desc_idx].addr;
+
+		descs = (struct vring_desc *)(uintptr_t)vhost_iova_to_vva(dev, cvq,
+					desc_iova, &desc_len, VHOST_ACCESS_RO);
+		if (!descs || desc_len != cvq->desc[desc_idx].len) {
+			VHOST_LOG_CONFIG(dev->ifname, ERR, "Failed to map ctrl indirect descs\n");
+			goto free_err;
+		}
+
+		desc_idx = 0;
+	} else {
+		descs = cvq->desc;
+	}
+
+	while (!(descs[desc_idx].flags & VRING_DESC_F_WRITE)) {
+		desc_len = descs[desc_idx].len;
+		desc_iova = descs[desc_idx].addr;
+
+		desc_addr = vhost_iova_to_vva(dev, cvq, desc_iova, &desc_len, VHOST_ACCESS_RO);
+		if (!desc_addr || desc_len < descs[desc_idx].len) {
+			VHOST_LOG_CONFIG(dev->ifname, ERR, "Failed to map ctrl descriptor\n");
+			goto free_err;
+		}
+
+		memcpy(ctrl_req, (void *)(uintptr_t)desc_addr, desc_len);
+		ctrl_req += desc_len;
+
+		if (!(descs[desc_idx].flags & VRING_DESC_F_NEXT))
+			break;
+
+		desc_idx = descs[desc_idx].next;
+	}
+
+	cvq->last_avail_idx++;
+	if (cvq->last_avail_idx >= cvq->size)
+		cvq->last_avail_idx -= cvq->size;
+
+	if (dev->features & (1ULL << VIRTIO_RING_F_EVENT_IDX))
+		vhost_avail_event(cvq) = cvq->last_avail_idx;
+
+	return 1;
+
+free_err:
+	free(ctrl_elem->ctrl_req);
+err:
+	cvq->last_avail_idx++;
+	if (cvq->last_avail_idx >= cvq->size)
+		cvq->last_avail_idx -= cvq->size;
+
+	if (dev->features & (1ULL << VIRTIO_RING_F_EVENT_IDX))
+		vhost_avail_event(cvq) = cvq->last_avail_idx;
+
+	return -1;
+}
+
+static uint8_t
+virtio_net_ctrl_handle_req(struct virtio_net *dev, struct virtio_net_ctrl *ctrl_req)
+{
+	uint8_t ret = VIRTIO_NET_ERR;
+
+	if (ctrl_req->class == VIRTIO_NET_CTRL_MQ &&
+			ctrl_req->command == VIRTIO_NET_CTRL_MQ_VQ_PAIRS_SET) {
+		uint16_t queue_pairs;
+		uint32_t i;
+
+		queue_pairs = *(uint16_t *)(uintptr_t)ctrl_req->command_data;
+		VHOST_LOG_CONFIG(dev->ifname, INFO, "Ctrl req: MQ %u queue pairs\n", queue_pairs);
+		ret = VIRTIO_NET_OK;
+
+		for (i = 0; i < dev->nr_vring; i++) {
+			struct vhost_virtqueue *vq = dev->virtqueue[i];
+			bool enable;
+
+			if (vq == dev->cvq)
+				continue;
+
+			if (i < queue_pairs * 2)
+				enable = true;
+			else
+				enable = false;
+
+			vq->enabled = enable;
+			if (dev->notify_ops->vring_state_changed)
+				dev->notify_ops->vring_state_changed(dev->vid, i, enable);
+		}
+	}
+
+	return ret;
+}
+
+static int
+virtio_net_ctrl_push(struct virtio_net *dev, struct virtio_net_ctrl_elem *ctrl_elem)
+{
+	struct vhost_virtqueue *cvq = dev->cvq;
+	struct vring_used_elem *used_elem;
+
+	used_elem = &cvq->used->ring[cvq->last_used_idx];
+	used_elem->id = ctrl_elem->head_idx;
+	used_elem->len = ctrl_elem->n_descs;
+
+	cvq->last_used_idx++;
+	if (cvq->last_used_idx >= cvq->size)
+		cvq->last_used_idx -= cvq->size;
+
+	__atomic_store_n(&cvq->used->idx, cvq->last_used_idx, __ATOMIC_RELEASE);
+
+	vhost_vring_call_split(dev, dev->cvq);
+
+	free(ctrl_elem->ctrl_req);
+
+	return 0;
+}
+
+int
+virtio_net_ctrl_handle(struct virtio_net *dev)
+{
+	int ret = 0;
+
+	if (dev->features & (1ULL << VIRTIO_F_RING_PACKED)) {
+		VHOST_LOG_CONFIG(dev->ifname, ERR, "Packed ring not supported yet\n");
+		return -1;
+	}
+
+	if (!dev->cvq) {
+		VHOST_LOG_CONFIG(dev->ifname, ERR, "missing control queue\n");
+		return -1;
+	}
+
+	rte_rwlock_read_lock(&dev->cvq->access_lock);
+	vhost_user_iotlb_rd_lock(dev->cvq);
+
+	while (1) {
+		struct virtio_net_ctrl_elem ctrl_elem;
+
+		memset(&ctrl_elem, 0, sizeof(struct virtio_net_ctrl_elem));
+
+		ret = virtio_net_ctrl_pop(dev, dev->cvq, &ctrl_elem);
+		if (ret <= 0)
+			break;
+
+		*ctrl_elem.desc_ack = virtio_net_ctrl_handle_req(dev, ctrl_elem.ctrl_req);
+
+		ret = virtio_net_ctrl_push(dev, &ctrl_elem);
+		if (ret < 0)
+			break;
+	}
+
+	vhost_user_iotlb_rd_unlock(dev->cvq);
+	rte_rwlock_read_unlock(&dev->cvq->access_lock);
+
+	return ret;
+}
diff --git a/lib/vhost/virtio_net_ctrl.h b/lib/vhost/virtio_net_ctrl.h
new file mode 100644
index 0000000000..9a90f4b9da
--- /dev/null
+++ b/lib/vhost/virtio_net_ctrl.h
@@ -0,0 +1,10 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2023 Red Hat, Inc.
+ */
+
+#ifndef _VIRTIO_NET_CTRL_H
+#define _VIRTIO_NET_CTRL_H
+
+int virtio_net_ctrl_handle(struct virtio_net *dev);
+
+#endif
-- 
2.40.1
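The control-queue patch above enables/disables data queues according to the requested number of queue pairs, and advances ring indices with an explicit wrap. The two rules can be restated as pure helpers — a minimal sketch, with illustrative names that are not part of the Vhost API:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helpers restating two pieces of the control-queue logic:
 * the MQ rule (data queue i, Rx/Tx interleaved, is enabled iff it
 * belongs to one of the first 'queue_pairs' pairs) and the
 * last_avail_idx/last_used_idx wrap-around used after popping or
 * pushing a request. */
static bool
vq_enabled_after_mq_set(uint32_t vq_index, uint16_t queue_pairs)
{
	return vq_index < (uint32_t)queue_pairs * 2;
}

static uint16_t
advance_ring_idx(uint16_t idx, uint16_t ring_size)
{
	idx++;
	if (idx >= ring_size)
		idx -= ring_size;
	return idx;
}
```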


^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH v5 16/26] vhost: add VDUSE device creation and destruction
  2023-06-06  8:18 [PATCH v5 00/26] Add VDUSE support to Vhost library Maxime Coquelin
                   ` (14 preceding siblings ...)
  2023-06-06  8:18 ` [PATCH v5 15/26] vhost: add control virtqueue support Maxime Coquelin
@ 2023-06-06  8:18 ` Maxime Coquelin
  2023-06-06  8:18 ` [PATCH v5 17/26] vhost: add VDUSE callback for IOTLB miss Maxime Coquelin
                   ` (12 subsequent siblings)
  28 siblings, 0 replies; 36+ messages in thread
From: Maxime Coquelin @ 2023-06-06  8:18 UTC (permalink / raw)
  To: dev, chenbo.xia, david.marchand, mkp, fbl, jasowang,
	cunming.liang, xieyongji, echaudro, eperezma, amorenoz, lulu
  Cc: Maxime Coquelin

This patch adds initial support for VDUSE, which includes
the device creation and destruction.

It does not include the virtqueues configuration, so this is
not functional at this point.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
---
 lib/vhost/meson.build |   4 +
 lib/vhost/socket.c    |  34 ++++---
 lib/vhost/vduse.c     | 201 ++++++++++++++++++++++++++++++++++++++++++
 lib/vhost/vduse.h     |  33 +++++++
 lib/vhost/vhost.h     |   2 +
 5 files changed, 262 insertions(+), 12 deletions(-)
 create mode 100644 lib/vhost/vduse.c
 create mode 100644 lib/vhost/vduse.h

diff --git a/lib/vhost/meson.build b/lib/vhost/meson.build
index d211a0bd37..13e0382c92 100644
--- a/lib/vhost/meson.build
+++ b/lib/vhost/meson.build
@@ -29,6 +29,10 @@ sources = files(
         'virtio_net.c',
         'virtio_net_ctrl.c',
 )
+if cc.has_header('linux/vduse.h')
+    sources += files('vduse.c')
+    cflags += '-DVHOST_HAS_VDUSE'
+endif
 headers = files(
         'rte_vdpa.h',
         'rte_vhost.h',
diff --git a/lib/vhost/socket.c b/lib/vhost/socket.c
index 29f7a8cece..19a7469e45 100644
--- a/lib/vhost/socket.c
+++ b/lib/vhost/socket.c
@@ -19,6 +19,7 @@
 #include <rte_log.h>
 
 #include "fd_man.h"
+#include "vduse.h"
 #include "vhost.h"
 #include "vhost_user.h"
 
@@ -36,6 +37,7 @@ struct vhost_user_socket {
 	int socket_fd;
 	struct sockaddr_un un;
 	bool is_server;
+	bool is_vduse;
 	bool reconnect;
 	bool iommu_support;
 	bool use_builtin_virtio_net;
@@ -991,18 +993,21 @@ rte_vhost_driver_register(const char *path, uint64_t flags)
 #endif
 	}
 
-	if ((flags & RTE_VHOST_USER_CLIENT) != 0) {
-		vsocket->reconnect = !(flags & RTE_VHOST_USER_NO_RECONNECT);
-		if (vsocket->reconnect && reconn_tid == 0) {
-			if (vhost_user_reconnect_init() != 0)
-				goto out_mutex;
-		}
+	if (!strncmp("/dev/vduse/", path, strlen("/dev/vduse/"))) {
+		vsocket->is_vduse = true;
 	} else {
-		vsocket->is_server = true;
-	}
-	ret = create_unix_socket(vsocket);
-	if (ret < 0) {
-		goto out_mutex;
+		if ((flags & RTE_VHOST_USER_CLIENT) != 0) {
+			vsocket->reconnect = !(flags & RTE_VHOST_USER_NO_RECONNECT);
+			if (vsocket->reconnect && reconn_tid == 0) {
+				if (vhost_user_reconnect_init() != 0)
+					goto out_mutex;
+			}
+		} else {
+			vsocket->is_server = true;
+		}
+		ret = create_unix_socket(vsocket);
+		if (ret < 0)
+			goto out_mutex;
 	}
 
 	vhost_user.vsockets[vhost_user.vsocket_cnt++] = vsocket;
@@ -1067,7 +1072,9 @@ rte_vhost_driver_unregister(const char *path)
 		if (strcmp(vsocket->path, path))
 			continue;
 
-		if (vsocket->is_server) {
+		if (vsocket->is_vduse) {
+			vduse_device_destroy(path);
+		} else if (vsocket->is_server) {
 			/*
 			 * If r/wcb is executing, release vhost_user's
 			 * mutex lock, and try again since the r/wcb
@@ -1218,6 +1225,9 @@ rte_vhost_driver_start(const char *path)
 	if (!vsocket)
 		return -1;
 
+	if (vsocket->is_vduse)
+		return vduse_device_create(path);
+
 	if (fdset_tid == 0) {
 		/**
 		 * create a pipe which will be waited by poll and notified to
diff --git a/lib/vhost/vduse.c b/lib/vhost/vduse.c
new file mode 100644
index 0000000000..d67818bfb5
--- /dev/null
+++ b/lib/vhost/vduse.c
@@ -0,0 +1,201 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2023 Red Hat, Inc.
+ */
+
+#include <stdint.h>
+#include <stdio.h>
+#include <unistd.h>
+#include <fcntl.h>
+
+
+#include <linux/vduse.h>
+#include <linux/virtio_net.h>
+
+#include <sys/ioctl.h>
+#include <sys/mman.h>
+
+#include <rte_common.h>
+
+#include "vduse.h"
+#include "vhost.h"
+
+#define VHOST_VDUSE_API_VERSION 0
+#define VDUSE_CTRL_PATH "/dev/vduse/control"
+
+#define VDUSE_NET_SUPPORTED_FEATURES ((1ULL << VIRTIO_NET_F_MRG_RXBUF) | \
+				(1ULL << VIRTIO_F_ANY_LAYOUT) | \
+				(1ULL << VIRTIO_F_VERSION_1)   | \
+				(1ULL << VIRTIO_NET_F_GSO) | \
+				(1ULL << VIRTIO_NET_F_HOST_TSO4) | \
+				(1ULL << VIRTIO_NET_F_HOST_TSO6) | \
+				(1ULL << VIRTIO_NET_F_HOST_UFO) | \
+				(1ULL << VIRTIO_NET_F_HOST_ECN) | \
+				(1ULL << VIRTIO_NET_F_CSUM)    | \
+				(1ULL << VIRTIO_NET_F_GUEST_CSUM) | \
+				(1ULL << VIRTIO_NET_F_GUEST_TSO4) | \
+				(1ULL << VIRTIO_NET_F_GUEST_TSO6) | \
+				(1ULL << VIRTIO_NET_F_GUEST_UFO) | \
+				(1ULL << VIRTIO_NET_F_GUEST_ECN) | \
+				(1ULL << VIRTIO_RING_F_INDIRECT_DESC) | \
+				(1ULL << VIRTIO_F_IN_ORDER) | \
+				(1ULL << VIRTIO_F_IOMMU_PLATFORM))
+
+static struct vhost_backend_ops vduse_backend_ops = {
+};
+
+int
+vduse_device_create(const char *path)
+{
+	int control_fd, dev_fd, vid, ret;
+	uint32_t i;
+	struct virtio_net *dev;
+	uint64_t ver = VHOST_VDUSE_API_VERSION;
+	struct vduse_dev_config *dev_config = NULL;
+	const char *name = path + strlen("/dev/vduse/");
+
+	control_fd = open(VDUSE_CTRL_PATH, O_RDWR);
+	if (control_fd < 0) {
+		VHOST_LOG_CONFIG(name, ERR, "Failed to open %s: %s\n",
+				VDUSE_CTRL_PATH, strerror(errno));
+		return -1;
+	}
+
+	if (ioctl(control_fd, VDUSE_SET_API_VERSION, &ver)) {
+		VHOST_LOG_CONFIG(name, ERR, "Failed to set API version: %" PRIu64 ": %s\n",
+				ver, strerror(errno));
+		ret = -1;
+		goto out_ctrl_close;
+	}
+
+	dev_config = malloc(offsetof(struct vduse_dev_config, config));
+	if (!dev_config) {
+		VHOST_LOG_CONFIG(name, ERR, "Failed to allocate VDUSE config\n");
+		ret = -1;
+		goto out_ctrl_close;
+	}
+
+	memset(dev_config, 0, sizeof(struct vduse_dev_config));
+
+	strncpy(dev_config->name, name, VDUSE_NAME_MAX - 1);
+	dev_config->device_id = VIRTIO_ID_NET;
+	dev_config->vendor_id = 0;
+	dev_config->features = VDUSE_NET_SUPPORTED_FEATURES;
+	dev_config->vq_num = 2;
+	dev_config->vq_align = sysconf(_SC_PAGE_SIZE);
+	dev_config->config_size = 0;
+
+	ret = ioctl(control_fd, VDUSE_CREATE_DEV, dev_config);
+	if (ret < 0) {
+		VHOST_LOG_CONFIG(name, ERR, "Failed to create VDUSE device: %s\n",
+				strerror(errno));
+		goto out_free;
+	}
+
+	dev_fd = open(path, O_RDWR);
+	if (dev_fd < 0) {
+		VHOST_LOG_CONFIG(name, ERR, "Failed to open device %s: %s\n",
+				path, strerror(errno));
+		ret = -1;
+		goto out_dev_close;
+	}
+
+	ret = fcntl(dev_fd, F_SETFL, O_NONBLOCK);
+	if (ret < 0) {
+		VHOST_LOG_CONFIG(name, ERR, "Failed to set chardev as non-blocking: %s\n",
+				strerror(errno));
+		goto out_dev_close;
+	}
+
+	vid = vhost_new_device(&vduse_backend_ops);
+	if (vid < 0) {
+		VHOST_LOG_CONFIG(name, ERR, "Failed to create new Vhost device\n");
+		ret = -1;
+		goto out_dev_close;
+	}
+
+	dev = get_device(vid);
+	if (!dev) {
+		ret = -1;
+		goto out_dev_close;
+	}
+
+	strncpy(dev->ifname, path, IF_NAME_SZ - 1);
+	dev->vduse_ctrl_fd = control_fd;
+	dev->vduse_dev_fd = dev_fd;
+	vhost_setup_virtio_net(dev->vid, true, true, true, true);
+
+	for (i = 0; i < 2; i++) {
+		struct vduse_vq_config vq_cfg = { 0 };
+
+		ret = alloc_vring_queue(dev, i);
+		if (ret) {
+			VHOST_LOG_CONFIG(name, ERR, "Failed to alloc vring %d metadata\n", i);
+			goto out_dev_destroy;
+		}
+
+		vq_cfg.index = i;
+		vq_cfg.max_size = 1024;
+
+		ret = ioctl(dev->vduse_dev_fd, VDUSE_VQ_SETUP, &vq_cfg);
+		if (ret) {
+			VHOST_LOG_CONFIG(name, ERR, "Failed to set-up VQ %d\n", i);
+			goto out_dev_destroy;
+		}
+	}
+
+	free(dev_config);
+
+	return 0;
+
+out_dev_destroy:
+	vhost_destroy_device(vid);
+out_dev_close:
+	if (dev_fd >= 0)
+		close(dev_fd);
+	ioctl(control_fd, VDUSE_DESTROY_DEV, name);
+out_free:
+	free(dev_config);
+out_ctrl_close:
+	close(control_fd);
+
+	return ret;
+}
+
+int
+vduse_device_destroy(const char *path)
+{
+	const char *name = path + strlen("/dev/vduse/");
+	struct virtio_net *dev;
+	int vid, ret;
+
+	for (vid = 0; vid < RTE_MAX_VHOST_DEVICE; vid++) {
+		dev = vhost_devices[vid];
+
+		if (dev == NULL)
+			continue;
+
+		if (!strcmp(path, dev->ifname))
+			break;
+	}
+
+	if (vid == RTE_MAX_VHOST_DEVICE)
+		return -1;
+
+	if (dev->vduse_dev_fd >= 0) {
+		close(dev->vduse_dev_fd);
+		dev->vduse_dev_fd = -1;
+	}
+
+	if (dev->vduse_ctrl_fd >= 0) {
+		ret = ioctl(dev->vduse_ctrl_fd, VDUSE_DESTROY_DEV, name);
+		if (ret)
+			VHOST_LOG_CONFIG(name, ERR, "Failed to destroy VDUSE device: %s\n",
+					strerror(errno));
+		close(dev->vduse_ctrl_fd);
+		dev->vduse_ctrl_fd = -1;
+	}
+
+	vhost_destroy_device(vid);
+
+	return 0;
+}
diff --git a/lib/vhost/vduse.h b/lib/vhost/vduse.h
new file mode 100644
index 0000000000..a15e5d4c16
--- /dev/null
+++ b/lib/vhost/vduse.h
@@ -0,0 +1,33 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2023 Red Hat, Inc.
+ */
+
+#ifndef _VDUSE_H
+#define _VDUSE_H
+
+#include "vhost.h"
+
+#ifdef VHOST_HAS_VDUSE
+
+int vduse_device_create(const char *path);
+int vduse_device_destroy(const char *path);
+
+#else
+
+static inline int
+vduse_device_create(const char *path)
+{
+	VHOST_LOG_CONFIG(path, ERR, "VDUSE support disabled at build time\n");
+	return -1;
+}
+
+static inline int
+vduse_device_destroy(const char *path)
+{
+	VHOST_LOG_CONFIG(path, ERR, "VDUSE support disabled at build time\n");
+	return -1;
+}
+
+#endif /* VHOST_HAS_VDUSE */
+
+#endif /* _VDUSE_H */
diff --git a/lib/vhost/vhost.h b/lib/vhost/vhost.h
index 04267a3ac5..9ecede0f30 100644
--- a/lib/vhost/vhost.h
+++ b/lib/vhost/vhost.h
@@ -526,6 +526,8 @@ struct virtio_net {
 
 	int			postcopy_ufd;
 	int			postcopy_listening;
+	int			vduse_ctrl_fd;
+	int			vduse_dev_fd;
 
 	struct vhost_virtqueue	*cvq;
 
-- 
2.40.1
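As the socket.c hunk in this patch shows, the backend is selected purely from the registered path: anything under /dev/vduse/ takes the VDUSE route, everything else stays a vhost-user socket. A minimal sketch of that rule (the helper name is hypothetical):

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Restates the backend selection performed in
 * rte_vhost_driver_register(): a path with the /dev/vduse/ prefix
 * selects the VDUSE backend. */
static bool
is_vduse_path(const char *path)
{
	static const char prefix[] = "/dev/vduse/";

	return strncmp(prefix, path, strlen(prefix)) == 0;
}
```

An application would then register such a path as usual, e.g. `rte_vhost_driver_register("/dev/vduse/net0", 0)` followed by `rte_vhost_driver_start()` (the device name "net0" here is illustrative).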


^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH v5 17/26] vhost: add VDUSE callback for IOTLB miss
  2023-06-06  8:18 [PATCH v5 00/26] Add VDUSE support to Vhost library Maxime Coquelin
                   ` (15 preceding siblings ...)
  2023-06-06  8:18 ` [PATCH v5 16/26] vhost: add VDUSE device creation and destruction Maxime Coquelin
@ 2023-06-06  8:18 ` Maxime Coquelin
  2023-06-06  8:18 ` [PATCH v5 18/26] vhost: add VDUSE callback for IOTLB entry removal Maxime Coquelin
                   ` (11 subsequent siblings)
  28 siblings, 0 replies; 36+ messages in thread
From: Maxime Coquelin @ 2023-06-06  8:18 UTC (permalink / raw)
  To: dev, chenbo.xia, david.marchand, mkp, fbl, jasowang,
	cunming.liang, xieyongji, echaudro, eperezma, amorenoz, lulu
  Cc: Maxime Coquelin

This patch implements the VDUSE callback for IOTLB misses,
which is done using the VDUSE_IOTLB_GET_FD ioctl.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
---
 lib/vhost/vduse.c | 58 +++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 58 insertions(+)

diff --git a/lib/vhost/vduse.c b/lib/vhost/vduse.c
index d67818bfb5..f72c7bf6ab 100644
--- a/lib/vhost/vduse.c
+++ b/lib/vhost/vduse.c
@@ -13,9 +13,11 @@
 
 #include <sys/ioctl.h>
 #include <sys/mman.h>
+#include <sys/stat.h>
 
 #include <rte_common.h>
 
+#include "iotlb.h"
 #include "vduse.h"
 #include "vhost.h"
 
@@ -40,7 +42,63 @@
 				(1ULL << VIRTIO_F_IN_ORDER) | \
 				(1ULL << VIRTIO_F_IOMMU_PLATFORM))
 
+static int
+vduse_iotlb_miss(struct virtio_net *dev, uint64_t iova, uint8_t perm __rte_unused)
+{
+	struct vduse_iotlb_entry entry;
+	uint64_t size, page_size;
+	struct stat stat;
+	void *mmap_addr;
+	int fd, ret;
+
+	entry.start = iova;
+	entry.last = iova + 1;
+
+	ret = ioctl(dev->vduse_dev_fd, VDUSE_IOTLB_GET_FD, &entry);
+	if (ret < 0) {
+		VHOST_LOG_CONFIG(dev->ifname, ERR, "Failed to get IOTLB entry for 0x%" PRIx64 "\n",
+				iova);
+		return -1;
+	}
+
+	fd = ret;
+
+	VHOST_LOG_CONFIG(dev->ifname, DEBUG, "New IOTLB entry:\n");
+	VHOST_LOG_CONFIG(dev->ifname, DEBUG, "\tIOVA: %" PRIx64 " - %" PRIx64 "\n",
+			(uint64_t)entry.start, (uint64_t)entry.last);
+	VHOST_LOG_CONFIG(dev->ifname, DEBUG, "\toffset: %" PRIx64 "\n", (uint64_t)entry.offset);
+	VHOST_LOG_CONFIG(dev->ifname, DEBUG, "\tfd: %d\n", fd);
+	VHOST_LOG_CONFIG(dev->ifname, DEBUG, "\tperm: %x\n", entry.perm);
+
+	size = entry.last - entry.start + 1;
+	mmap_addr = mmap(0, size + entry.offset, entry.perm, MAP_SHARED, fd, 0);
+	if (mmap_addr == MAP_FAILED) {
+		VHOST_LOG_CONFIG(dev->ifname, ERR,
+				"Failed to mmap IOTLB entry for 0x%" PRIx64 "\n", iova);
+		ret = -1;
+		goto close_fd;
+	}
+
+	ret = fstat(fd, &stat);
+	if (ret < 0) {
+		VHOST_LOG_CONFIG(dev->ifname, ERR, "Failed to get page size.\n");
+		munmap(mmap_addr, entry.offset + size);
+		goto close_fd;
+	}
+	page_size = (uint64_t)stat.st_blksize;
+
+	vhost_user_iotlb_cache_insert(dev, entry.start, (uint64_t)(uintptr_t)mmap_addr,
+		entry.offset, size, page_size, entry.perm);
+
+	ret = 0;
+close_fd:
+	close(fd);
+
+	return ret;
+}
+
 static struct vhost_backend_ops vduse_backend_ops = {
+	.iotlb_miss = vduse_iotlb_miss,
 };
 
 int
-- 
2.40.1
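The IOTLB-miss callback maps `offset + size` bytes from the fd returned by the kernel, so a guest IOVA inside the entry translates to `mmap_addr + offset + (iova - start)`. A self-contained sketch of that arithmetic, assuming the cache stores the mapping address and offset as shown in the patch (helper names are illustrative):

```c
#include <assert.h>
#include <stdint.h>

/* Size of an IOTLB entry covering the inclusive range [start, last]. */
static uint64_t
iotlb_entry_size(uint64_t start, uint64_t last)
{
	return last - start + 1;
}

/* Translate a guest IOVA within the entry to a host virtual address,
 * given the base of the mmap()ed region and the entry's offset into
 * that mapping. */
static uint64_t
iotlb_iova_to_vva(uint64_t mmap_addr, uint64_t offset,
		uint64_t start, uint64_t iova)
{
	return mmap_addr + offset + (iova - start);
}
```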


^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH v5 18/26] vhost: add VDUSE callback for IOTLB entry removal
  2023-06-06  8:18 [PATCH v5 00/26] Add VDUSE support to Vhost library Maxime Coquelin
                   ` (16 preceding siblings ...)
  2023-06-06  8:18 ` [PATCH v5 17/26] vhost: add VDUSE callback for IOTLB miss Maxime Coquelin
@ 2023-06-06  8:18 ` Maxime Coquelin
  2023-06-06  8:18 ` [PATCH v5 19/26] vhost: add VDUSE callback for IRQ injection Maxime Coquelin
                   ` (10 subsequent siblings)
  28 siblings, 0 replies; 36+ messages in thread
From: Maxime Coquelin @ 2023-06-06  8:18 UTC (permalink / raw)
  To: dev, chenbo.xia, david.marchand, mkp, fbl, jasowang,
	cunming.liang, xieyongji, echaudro, eperezma, amorenoz, lulu
  Cc: Maxime Coquelin

This patch implements the VDUSE callback for IOTLB entry
removal, which unmaps the pages backing the invalidated
IOTLB entry.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
---
 lib/vhost/vduse.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/lib/vhost/vduse.c b/lib/vhost/vduse.c
index f72c7bf6ab..58c1b384a8 100644
--- a/lib/vhost/vduse.c
+++ b/lib/vhost/vduse.c
@@ -42,6 +42,12 @@
 				(1ULL << VIRTIO_F_IN_ORDER) | \
 				(1ULL << VIRTIO_F_IOMMU_PLATFORM))
 
+static void
+vduse_iotlb_remove_notify(uint64_t addr, uint64_t offset, uint64_t size)
+{
+	munmap((void *)(uintptr_t)addr, offset + size);
+}
+
 static int
 vduse_iotlb_miss(struct virtio_net *dev, uint64_t iova, uint8_t perm __rte_unused)
 {
@@ -99,6 +105,7 @@ vduse_iotlb_miss(struct virtio_net *dev, uint64_t iova, uint8_t perm __rte_unuse
 
 static struct vhost_backend_ops vduse_backend_ops = {
 	.iotlb_miss = vduse_iotlb_miss,
+	.iotlb_remove_notify = vduse_iotlb_remove_notify,
 };
 
 int
-- 
2.40.1


^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH v5 19/26] vhost: add VDUSE callback for IRQ injection
  2023-06-06  8:18 [PATCH v5 00/26] Add VDUSE support to Vhost library Maxime Coquelin
                   ` (17 preceding siblings ...)
  2023-06-06  8:18 ` [PATCH v5 18/26] vhost: add VDUSE callback for IOTLB entry removal Maxime Coquelin
@ 2023-06-06  8:18 ` Maxime Coquelin
  2023-06-06  8:18 ` [PATCH v5 20/26] vhost: add VDUSE events handler Maxime Coquelin
                   ` (9 subsequent siblings)
  28 siblings, 0 replies; 36+ messages in thread
From: Maxime Coquelin @ 2023-06-06  8:18 UTC (permalink / raw)
  To: dev, chenbo.xia, david.marchand, mkp, fbl, jasowang,
	cunming.liang, xieyongji, echaudro, eperezma, amorenoz, lulu
  Cc: Maxime Coquelin

This patch implements the VDUSE callback for kicking
virtqueues.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
---
 lib/vhost/vduse.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/lib/vhost/vduse.c b/lib/vhost/vduse.c
index 58c1b384a8..d39e39b9dc 100644
--- a/lib/vhost/vduse.c
+++ b/lib/vhost/vduse.c
@@ -42,6 +42,12 @@
 				(1ULL << VIRTIO_F_IN_ORDER) | \
 				(1ULL << VIRTIO_F_IOMMU_PLATFORM))
 
+static int
+vduse_inject_irq(struct virtio_net *dev, struct vhost_virtqueue *vq)
+{
+	return ioctl(dev->vduse_dev_fd, VDUSE_VQ_INJECT_IRQ, &vq->index);
+}
+
 static void
 vduse_iotlb_remove_notify(uint64_t addr, uint64_t offset, uint64_t size)
 {
@@ -106,6 +112,7 @@ vduse_iotlb_miss(struct virtio_net *dev, uint64_t iova, uint8_t perm __rte_unuse
 static struct vhost_backend_ops vduse_backend_ops = {
 	.iotlb_miss = vduse_iotlb_miss,
 	.iotlb_remove_notify = vduse_iotlb_remove_notify,
+	.inject_irq = vduse_inject_irq,
 };
 
 int
-- 
2.40.1


^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH v5 20/26] vhost: add VDUSE events handler
  2023-06-06  8:18 [PATCH v5 00/26] Add VDUSE support to Vhost library Maxime Coquelin
                   ` (18 preceding siblings ...)
  2023-06-06  8:18 ` [PATCH v5 19/26] vhost: add VDUSE callback for IRQ injection Maxime Coquelin
@ 2023-06-06  8:18 ` Maxime Coquelin
  2023-06-06  8:18 ` [PATCH v5 21/26] vhost: add support for virtqueue state get event Maxime Coquelin
                   ` (8 subsequent siblings)
  28 siblings, 0 replies; 36+ messages in thread
From: Maxime Coquelin @ 2023-06-06  8:18 UTC (permalink / raw)
  To: dev, chenbo.xia, david.marchand, mkp, fbl, jasowang,
	cunming.liang, xieyongji, echaudro, eperezma, amorenoz, lulu
  Cc: Maxime Coquelin

This patch makes use of Vhost lib's FD manager to install
a handler for VDUSE events occurring on the VDUSE device FD.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
---
 lib/vhost/vduse.c | 101 ++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 101 insertions(+)

diff --git a/lib/vhost/vduse.c b/lib/vhost/vduse.c
index d39e39b9dc..92c515cff2 100644
--- a/lib/vhost/vduse.c
+++ b/lib/vhost/vduse.c
@@ -17,6 +17,7 @@
 
 #include <rte_common.h>
 
+#include "fd_man.h"
 #include "iotlb.h"
 #include "vduse.h"
 #include "vhost.h"
@@ -42,6 +43,31 @@
 				(1ULL << VIRTIO_F_IN_ORDER) | \
 				(1ULL << VIRTIO_F_IOMMU_PLATFORM))
 
+struct vduse {
+	struct fdset fdset;
+};
+
+static struct vduse vduse = {
+	.fdset = {
+		.fd = { [0 ... MAX_FDS - 1] = {-1, NULL, NULL, NULL, 0} },
+		.fd_mutex = PTHREAD_MUTEX_INITIALIZER,
+		.fd_pooling_mutex = PTHREAD_MUTEX_INITIALIZER,
+		.num = 0
+	},
+};
+
+static bool vduse_events_thread;
+
+static const char * const vduse_reqs_str[] = {
+	"VDUSE_GET_VQ_STATE",
+	"VDUSE_SET_STATUS",
+	"VDUSE_UPDATE_IOTLB",
+};
+
+#define vduse_req_id_to_str(id) \
+	(id < RTE_DIM(vduse_reqs_str) ? \
+	vduse_reqs_str[id] : "Unknown")
+
 static int
 vduse_inject_irq(struct virtio_net *dev, struct vhost_virtqueue *vq)
 {
@@ -115,16 +141,80 @@ static struct vhost_backend_ops vduse_backend_ops = {
 	.inject_irq = vduse_inject_irq,
 };
 
+static void
+vduse_events_handler(int fd, void *arg, int *remove __rte_unused)
+{
+	struct virtio_net *dev = arg;
+	struct vduse_dev_request req;
+	struct vduse_dev_response resp;
+	int ret;
+
+	memset(&resp, 0, sizeof(resp));
+
+	ret = read(fd, &req, sizeof(req));
+	if (ret < 0) {
+		VHOST_LOG_CONFIG(dev->ifname, ERR, "Failed to read request: %s\n",
+				strerror(errno));
+		return;
+	} else if (ret < (int)sizeof(req)) {
+		VHOST_LOG_CONFIG(dev->ifname, ERR, "Incomplete read of request: %d\n", ret);
+		return;
+	}
+
+	VHOST_LOG_CONFIG(dev->ifname, INFO, "New request: %s (%u)\n",
+			vduse_req_id_to_str(req.type), req.type);
+
+	switch (req.type) {
+	default:
+		resp.result = VDUSE_REQ_RESULT_FAILED;
+		break;
+	}
+
+	resp.request_id = req.request_id;
+
+	ret = write(dev->vduse_dev_fd, &resp, sizeof(resp));
+	if (ret != sizeof(resp)) {
+		VHOST_LOG_CONFIG(dev->ifname, ERR, "Failed to write response %s\n",
+				strerror(errno));
+		return;
+	}
+	VHOST_LOG_CONFIG(dev->ifname, INFO, "Request %s (%u) handled successfully\n",
+			vduse_req_id_to_str(req.type), req.type);
+}
+
 int
 vduse_device_create(const char *path)
 {
 	int control_fd, dev_fd, vid, ret;
+	pthread_t fdset_tid;
 	uint32_t i;
 	struct virtio_net *dev;
 	uint64_t ver = VHOST_VDUSE_API_VERSION;
 	struct vduse_dev_config *dev_config = NULL;
 	const char *name = path + strlen("/dev/vduse/");
 
+	/* If first device, create events dispatcher thread */
+	if (vduse_events_thread == false) {
+		/**
+		 * create a pipe which will be waited by poll and notified to
+		 * rebuild the wait list of poll.
+		 */
+		if (fdset_pipe_init(&vduse.fdset) < 0) {
+			VHOST_LOG_CONFIG(path, ERR, "failed to create pipe for vduse fdset\n");
+			return -1;
+		}
+
+		ret = rte_ctrl_thread_create(&fdset_tid, "vduse-events", NULL,
+				fdset_event_dispatch, &vduse.fdset);
+		if (ret != 0) {
+			VHOST_LOG_CONFIG(path, ERR, "failed to create vduse fdset handling thread\n");
+			fdset_pipe_uninit(&vduse.fdset);
+			return -1;
+		}
+
+		vduse_events_thread = true;
+	}
+
 	control_fd = open(VDUSE_CTRL_PATH, O_RDWR);
 	if (control_fd < 0) {
 		VHOST_LOG_CONFIG(name, ERR, "Failed to open %s: %s\n",
@@ -215,6 +305,14 @@ vduse_device_create(const char *path)
 		}
 	}
 
+	ret = fdset_add(&vduse.fdset, dev->vduse_dev_fd, vduse_events_handler, NULL, dev);
+	if (ret) {
+		VHOST_LOG_CONFIG(name, ERR, "Failed to add fd %d to vduse fdset\n",
+				dev->vduse_dev_fd);
+		goto out_dev_destroy;
+	}
+	fdset_pipe_notify(&vduse.fdset);
+
 	free(dev_config);
 
 	return 0;
@@ -253,6 +351,9 @@ vduse_device_destroy(const char *path)
 	if (vid == RTE_MAX_VHOST_DEVICE)
 		return -1;
 
+	fdset_del(&vduse.fdset, dev->vduse_dev_fd);
+	fdset_pipe_notify(&vduse.fdset);
+
 	if (dev->vduse_dev_fd >= 0) {
 		close(dev->vduse_dev_fd);
 		dev->vduse_dev_fd = -1;
-- 
2.40.1
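At this point in the series the handler only rejects requests (the default case); the per-type cases are filled in by the following patches. A self-contained sketch of the eventual dispatch, with the request-type values restated to match the `vduse_reqs_str` indices above (the `SK_` constants are illustrative stand-ins for the linux/vduse.h definitions):

```c
#include <assert.h>
#include <stdint.h>

/* Request types, matching the vduse_reqs_str[] ordering. */
enum { SK_GET_VQ_STATE = 0, SK_SET_STATUS = 1, SK_UPDATE_IOTLB = 2 };
/* Result codes returned in the response. */
enum { SK_RESULT_OK = 0, SK_RESULT_FAILED = 1 };

/* Known request types are handled and acknowledged with OK;
 * anything else is rejected. */
static int
dispatch_request(uint32_t type)
{
	switch (type) {
	case SK_GET_VQ_STATE:
	case SK_SET_STATUS:
	case SK_UPDATE_IOTLB:
		return SK_RESULT_OK;
	default:
		return SK_RESULT_FAILED;
	}
}
```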


^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH v5 21/26] vhost: add support for virtqueue state get event
  2023-06-06  8:18 [PATCH v5 00/26] Add VDUSE support to Vhost library Maxime Coquelin
                   ` (19 preceding siblings ...)
  2023-06-06  8:18 ` [PATCH v5 20/26] vhost: add VDUSE events handler Maxime Coquelin
@ 2023-06-06  8:18 ` Maxime Coquelin
  2023-06-06  8:18 ` [PATCH v5 22/26] vhost: add support for VDUSE status set event Maxime Coquelin
                   ` (7 subsequent siblings)
  28 siblings, 0 replies; 36+ messages in thread
From: Maxime Coquelin @ 2023-06-06  8:18 UTC (permalink / raw)
  To: dev, chenbo.xia, david.marchand, mkp, fbl, jasowang,
	cunming.liang, xieyongji, echaudro, eperezma, amorenoz, lulu
  Cc: Maxime Coquelin

This patch adds support for VDUSE_GET_VQ_STATE event
handling, which consists in providing the backend's last
available index for the specified virtqueue.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
---
 lib/vhost/vduse.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/lib/vhost/vduse.c b/lib/vhost/vduse.c
index 92c515cff2..7e36c50b6c 100644
--- a/lib/vhost/vduse.c
+++ b/lib/vhost/vduse.c
@@ -147,6 +147,7 @@ vduse_events_handler(int fd, void *arg, int *remove __rte_unused)
 	struct virtio_net *dev = arg;
 	struct vduse_dev_request req;
 	struct vduse_dev_response resp;
+	struct vhost_virtqueue *vq;
 	int ret;
 
 	memset(&resp, 0, sizeof(resp));
@@ -165,6 +166,13 @@ vduse_events_handler(int fd, void *arg, int *remove __rte_unused)
 			vduse_req_id_to_str(req.type), req.type);
 
 	switch (req.type) {
+	case VDUSE_GET_VQ_STATE:
+		vq = dev->virtqueue[req.vq_state.index];
+		VHOST_LOG_CONFIG(dev->ifname, INFO, "\tvq index: %u, avail_index: %u\n",
+				req.vq_state.index, vq->last_avail_idx);
+		resp.vq_state.split.avail_index = vq->last_avail_idx;
+		resp.result = VDUSE_REQ_RESULT_OK;
+		break;
 	default:
 		resp.result = VDUSE_REQ_RESULT_FAILED;
 		break;
-- 
2.40.1


^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH v5 22/26] vhost: add support for VDUSE status set event
  2023-06-06  8:18 [PATCH v5 00/26] Add VDUSE support to Vhost library Maxime Coquelin
                   ` (20 preceding siblings ...)
  2023-06-06  8:18 ` [PATCH v5 21/26] vhost: add support for virtqueue state get event Maxime Coquelin
@ 2023-06-06  8:18 ` Maxime Coquelin
  2023-06-06  8:18 ` [PATCH v5 23/26] vhost: add support for VDUSE IOTLB update event Maxime Coquelin
                   ` (6 subsequent siblings)
  28 siblings, 0 replies; 36+ messages in thread
From: Maxime Coquelin @ 2023-06-06  8:18 UTC (permalink / raw)
  To: dev, chenbo.xia, david.marchand, mkp, fbl, jasowang,
	cunming.liang, xieyongji, echaudro, eperezma, amorenoz, lulu
  Cc: Maxime Coquelin

This patch adds support for VDUSE_SET_STATUS event
handling, which consists in updating the Virtio device
status set by the Virtio driver.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
---
 lib/vhost/vduse.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/lib/vhost/vduse.c b/lib/vhost/vduse.c
index 7e36c50b6c..3bf65d4b8b 100644
--- a/lib/vhost/vduse.c
+++ b/lib/vhost/vduse.c
@@ -173,6 +173,12 @@ vduse_events_handler(int fd, void *arg, int *remove __rte_unused)
 		resp.vq_state.split.avail_index = vq->last_avail_idx;
 		resp.result = VDUSE_REQ_RESULT_OK;
 		break;
+	case VDUSE_SET_STATUS:
+		VHOST_LOG_CONFIG(dev->ifname, INFO, "\tnew status: 0x%08x\n",
+				req.s.status);
+		dev->status = req.s.status;
+		resp.result = VDUSE_REQ_RESULT_OK;
+		break;
 	default:
 		resp.result = VDUSE_REQ_RESULT_FAILED;
 		break;
-- 
2.40.1


^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH v5 23/26] vhost: add support for VDUSE IOTLB update event
  2023-06-06  8:18 [PATCH v5 00/26] Add VDUSE support to Vhost library Maxime Coquelin
                   ` (21 preceding siblings ...)
  2023-06-06  8:18 ` [PATCH v5 22/26] vhost: add support for VDUSE status set event Maxime Coquelin
@ 2023-06-06  8:18 ` Maxime Coquelin
  2023-06-06  8:18 ` [PATCH v5 24/26] vhost: add VDUSE device startup Maxime Coquelin
                   ` (5 subsequent siblings)
  28 siblings, 0 replies; 36+ messages in thread
From: Maxime Coquelin @ 2023-06-06  8:18 UTC (permalink / raw)
  To: dev, chenbo.xia, david.marchand, mkp, fbl, jasowang,
	cunming.liang, xieyongji, echaudro, eperezma, amorenoz, lulu
  Cc: Maxime Coquelin

This patch adds support for VDUSE_UPDATE_IOTLB event
handling, which consists in invalidating IOTLB entries for
the range specified in the request.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
---
 lib/vhost/vduse.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/lib/vhost/vduse.c b/lib/vhost/vduse.c
index 3bf65d4b8b..110654ec68 100644
--- a/lib/vhost/vduse.c
+++ b/lib/vhost/vduse.c
@@ -179,6 +179,13 @@ vduse_events_handler(int fd, void *arg, int *remove __rte_unused)
 		dev->status = req.s.status;
 		resp.result = VDUSE_REQ_RESULT_OK;
 		break;
+	case VDUSE_UPDATE_IOTLB:
+		VHOST_LOG_CONFIG(dev->ifname, INFO, "\tIOVA range: %" PRIx64 " - %" PRIx64 "\n",
+				(uint64_t)req.iova.start, (uint64_t)req.iova.last);
+		vhost_user_iotlb_cache_remove(dev, req.iova.start,
+				req.iova.last - req.iova.start + 1);
+		resp.result = VDUSE_REQ_RESULT_OK;
+		break;
 	default:
 		resp.result = VDUSE_REQ_RESULT_FAILED;
 		break;
-- 
2.40.1


^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH v5 24/26] vhost: add VDUSE device startup
  2023-06-06  8:18 [PATCH v5 00/26] Add VDUSE support to Vhost library Maxime Coquelin
                   ` (22 preceding siblings ...)
  2023-06-06  8:18 ` [PATCH v5 23/26] vhost: add support for VDUSE IOTLB update event Maxime Coquelin
@ 2023-06-06  8:18 ` Maxime Coquelin
  2023-06-08  2:10   ` Xia, Chenbo
  2023-06-06  8:18 ` [PATCH v5 25/26] vhost: add multiqueue support to VDUSE Maxime Coquelin
                   ` (4 subsequent siblings)
  28 siblings, 1 reply; 36+ messages in thread
From: Maxime Coquelin @ 2023-06-06  8:18 UTC (permalink / raw)
  To: dev, chenbo.xia, david.marchand, mkp, fbl, jasowang,
	cunming.liang, xieyongji, echaudro, eperezma, amorenoz, lulu
  Cc: Maxime Coquelin

This patch adds the device and its virtqueues
initialization once the Virtio driver has set the DRIVER_OK
in the Virtio status register.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
---
 lib/vhost/vduse.c | 126 ++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 126 insertions(+)

diff --git a/lib/vhost/vduse.c b/lib/vhost/vduse.c
index 110654ec68..ff4c54c321 100644
--- a/lib/vhost/vduse.c
+++ b/lib/vhost/vduse.c
@@ -141,6 +141,128 @@ static struct vhost_backend_ops vduse_backend_ops = {
 	.inject_irq = vduse_inject_irq,
 };
 
+static void
+vduse_vring_setup(struct virtio_net *dev, unsigned int index)
+{
+	struct vhost_virtqueue *vq = dev->virtqueue[index];
+	struct vhost_vring_addr *ra = &vq->ring_addrs;
+	struct vduse_vq_info vq_info;
+	struct vduse_vq_eventfd vq_efd;
+	int ret;
+
+	vq_info.index = index;
+	ret = ioctl(dev->vduse_dev_fd, VDUSE_VQ_GET_INFO, &vq_info);
+	if (ret) {
+		VHOST_LOG_CONFIG(dev->ifname, ERR, "Failed to get VQ %u info: %s\n",
+				index, strerror(errno));
+		return;
+	}
+
+	VHOST_LOG_CONFIG(dev->ifname, INFO, "VQ %u info:\n", index);
+	VHOST_LOG_CONFIG(dev->ifname, INFO, "\tnum: %u\n", vq_info.num);
+	VHOST_LOG_CONFIG(dev->ifname, INFO, "\tdesc_addr: %llx\n", vq_info.desc_addr);
+	VHOST_LOG_CONFIG(dev->ifname, INFO, "\tdriver_addr: %llx\n", vq_info.driver_addr);
+	VHOST_LOG_CONFIG(dev->ifname, INFO, "\tdevice_addr: %llx\n", vq_info.device_addr);
+	VHOST_LOG_CONFIG(dev->ifname, INFO, "\tavail_idx: %u\n", vq_info.split.avail_index);
+	VHOST_LOG_CONFIG(dev->ifname, INFO, "\tready: %u\n", vq_info.ready);
+
+	vq->last_avail_idx = vq_info.split.avail_index;
+	vq->size = vq_info.num;
+	vq->ready = true;
+	vq->enabled = vq_info.ready;
+	ra->desc_user_addr = vq_info.desc_addr;
+	ra->avail_user_addr = vq_info.driver_addr;
+	ra->used_user_addr = vq_info.device_addr;
+
+	vq->kickfd = eventfd(0, EFD_NONBLOCK | EFD_CLOEXEC);
+	if (vq->kickfd < 0) {
+		VHOST_LOG_CONFIG(dev->ifname, ERR, "Failed to init kickfd for VQ %u: %s\n",
+				index, strerror(errno));
+		vq->kickfd = VIRTIO_INVALID_EVENTFD;
+		return;
+	}
+	VHOST_LOG_CONFIG(dev->ifname, INFO, "\tkick fd: %d\n", vq->kickfd);
+
+	vq->shadow_used_split = rte_malloc_socket(NULL,
+				vq->size * sizeof(struct vring_used_elem),
+				RTE_CACHE_LINE_SIZE, 0);
+	vq->batch_copy_elems = rte_malloc_socket(NULL,
+				vq->size * sizeof(struct batch_copy_elem),
+				RTE_CACHE_LINE_SIZE, 0);
+
+	vhost_user_iotlb_rd_lock(vq);
+	if (vring_translate(dev, vq))
+		VHOST_LOG_CONFIG(dev->ifname, ERR, "Failed to translate vring %d addresses\n",
+				index);
+
+	if (vhost_enable_guest_notification(dev, vq, 0))
+		VHOST_LOG_CONFIG(dev->ifname, ERR,
+				"Failed to disable guest notifications on vring %d\n",
+				index);
+	vhost_user_iotlb_rd_unlock(vq);
+
+	vq_efd.index = index;
+	vq_efd.fd = vq->kickfd;
+
+	ret = ioctl(dev->vduse_dev_fd, VDUSE_VQ_SETUP_KICKFD, &vq_efd);
+	if (ret) {
+		VHOST_LOG_CONFIG(dev->ifname, ERR, "Failed to setup kickfd for VQ %u: %s\n",
+				index, strerror(errno));
+		close(vq->kickfd);
+		vq->kickfd = VIRTIO_UNINITIALIZED_EVENTFD;
+		return;
+	}
+}
+
+static void
+vduse_device_start(struct virtio_net *dev)
+{
+	unsigned int i, ret;
+
+	VHOST_LOG_CONFIG(dev->ifname, INFO, "Starting device...\n");
+
+	dev->notify_ops = vhost_driver_callback_get(dev->ifname);
+	if (!dev->notify_ops) {
+		VHOST_LOG_CONFIG(dev->ifname, ERR,
+				"Failed to get callback ops for driver\n");
+		return;
+	}
+
+	ret = ioctl(dev->vduse_dev_fd, VDUSE_DEV_GET_FEATURES, &dev->features);
+	if (ret) {
+		VHOST_LOG_CONFIG(dev->ifname, ERR, "Failed to get features: %s\n",
+				strerror(errno));
+		return;
+	}
+
+	VHOST_LOG_CONFIG(dev->ifname, INFO, "Negotiated Virtio features: 0x%" PRIx64 "\n",
+		dev->features);
+
+	if (dev->features &
+		((1ULL << VIRTIO_NET_F_MRG_RXBUF) |
+		 (1ULL << VIRTIO_F_VERSION_1) |
+		 (1ULL << VIRTIO_F_RING_PACKED))) {
+		dev->vhost_hlen = sizeof(struct virtio_net_hdr_mrg_rxbuf);
+	} else {
+		dev->vhost_hlen = sizeof(struct virtio_net_hdr);
+	}
+
+	for (i = 0; i < dev->nr_vring; i++)
+		vduse_vring_setup(dev, i);
+
+	dev->flags |= VIRTIO_DEV_READY;
+
+	if (dev->notify_ops->new_device(dev->vid) == 0)
+		dev->flags |= VIRTIO_DEV_RUNNING;
+
+	for (i = 0; i < dev->nr_vring; i++) {
+		struct vhost_virtqueue *vq = dev->virtqueue[i];
+
+		if (dev->notify_ops->vring_state_changed)
+			dev->notify_ops->vring_state_changed(dev->vid, i, vq->enabled);
+	}
+}
+
 static void
 vduse_events_handler(int fd, void *arg, int *remove __rte_unused)
 {
@@ -199,6 +321,10 @@ vduse_events_handler(int fd, void *arg, int *remove __rte_unused)
 				strerror(errno));
 		return;
 	}
+
+	if (dev->status & VIRTIO_DEVICE_STATUS_DRIVER_OK)
+		vduse_device_start(dev);
+
 	VHOST_LOG_CONFIG(dev->ifname, INFO, "Request %s (%u) handled successfully\n",
 			vduse_req_id_to_str(req.type), req.type);
 }
-- 
2.40.1


^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH v5 25/26] vhost: add multiqueue support to VDUSE
  2023-06-06  8:18 [PATCH v5 00/26] Add VDUSE support to Vhost library Maxime Coquelin
                   ` (23 preceding siblings ...)
  2023-06-06  8:18 ` [PATCH v5 24/26] vhost: add VDUSE device startup Maxime Coquelin
@ 2023-06-06  8:18 ` Maxime Coquelin
  2023-06-06  8:18 ` [PATCH v5 26/26] vhost: add VDUSE device stop Maxime Coquelin
                   ` (3 subsequent siblings)
  28 siblings, 0 replies; 36+ messages in thread
From: Maxime Coquelin @ 2023-06-06  8:18 UTC (permalink / raw)
  To: dev, chenbo.xia, david.marchand, mkp, fbl, jasowang,
	cunming.liang, xieyongji, echaudro, eperezma, amorenoz, lulu
  Cc: Maxime Coquelin

This patch enables control queue support in order to
support multiqueue.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
---
 lib/vhost/vduse.c | 83 +++++++++++++++++++++++++++++++++++++++++++----
 1 file changed, 76 insertions(+), 7 deletions(-)

diff --git a/lib/vhost/vduse.c b/lib/vhost/vduse.c
index ff4c54c321..d3759077ff 100644
--- a/lib/vhost/vduse.c
+++ b/lib/vhost/vduse.c
@@ -21,6 +21,7 @@
 #include "iotlb.h"
 #include "vduse.h"
 #include "vhost.h"
+#include "virtio_net_ctrl.h"
 
 #define VHOST_VDUSE_API_VERSION 0
 #define VDUSE_CTRL_PATH "/dev/vduse/control"
@@ -41,7 +42,9 @@
 				(1ULL << VIRTIO_NET_F_GUEST_ECN) | \
 				(1ULL << VIRTIO_RING_F_INDIRECT_DESC) | \
 				(1ULL << VIRTIO_F_IN_ORDER) | \
-				(1ULL << VIRTIO_F_IOMMU_PLATFORM))
+				(1ULL << VIRTIO_F_IOMMU_PLATFORM) | \
+				(1ULL << VIRTIO_NET_F_CTRL_VQ) | \
+				(1ULL << VIRTIO_NET_F_MQ))
 
 struct vduse {
 	struct fdset fdset;
@@ -141,6 +144,25 @@ static struct vhost_backend_ops vduse_backend_ops = {
 	.inject_irq = vduse_inject_irq,
 };
 
+static void
+vduse_control_queue_event(int fd, void *arg, int *remove __rte_unused)
+{
+	struct virtio_net *dev = arg;
+	uint64_t buf;
+	int ret;
+
+	ret = read(fd, &buf, sizeof(buf));
+	if (ret < 0) {
+		VHOST_LOG_CONFIG(dev->ifname, ERR, "Failed to read control queue event: %s\n",
+				strerror(errno));
+		return;
+	}
+
+	VHOST_LOG_CONFIG(dev->ifname, DEBUG, "Control queue kicked\n");
+	if (virtio_net_ctrl_handle(dev))
+		VHOST_LOG_CONFIG(dev->ifname, ERR, "Failed to handle ctrl request\n");
+}
+
 static void
 vduse_vring_setup(struct virtio_net *dev, unsigned int index)
 {
@@ -212,6 +234,22 @@ vduse_vring_setup(struct virtio_net *dev, unsigned int index)
 		vq->kickfd = VIRTIO_UNINITIALIZED_EVENTFD;
 		return;
 	}
+
+	if (vq == dev->cvq) {
+		ret = fdset_add(&vduse.fdset, vq->kickfd, vduse_control_queue_event, NULL, dev);
+		if (ret) {
+			VHOST_LOG_CONFIG(dev->ifname, ERR,
+					"Failed to setup kickfd handler for VQ %u: %s\n",
+					index, strerror(errno));
+			vq_efd.fd = VDUSE_EVENTFD_DEASSIGN;
+			ioctl(dev->vduse_dev_fd, VDUSE_VQ_SETUP_KICKFD, &vq_efd);
+			close(vq->kickfd);
+			vq->kickfd = VIRTIO_UNINITIALIZED_EVENTFD;
+		}
+		fdset_pipe_notify(&vduse.fdset);
+		vhost_enable_guest_notification(dev, vq, 1);
+		VHOST_LOG_CONFIG(dev->ifname, INFO, "Ctrl queue event handler installed\n");
+	}
 }
 
 static void
@@ -258,6 +296,9 @@ vduse_device_start(struct virtio_net *dev)
 	for (i = 0; i < dev->nr_vring; i++) {
 		struct vhost_virtqueue *vq = dev->virtqueue[i];
 
+		if (vq == dev->cvq)
+			continue;
+
 		if (dev->notify_ops->vring_state_changed)
 			dev->notify_ops->vring_state_changed(dev->vid, i, vq->enabled);
 	}
@@ -334,9 +375,11 @@ vduse_device_create(const char *path)
 {
 	int control_fd, dev_fd, vid, ret;
 	pthread_t fdset_tid;
-	uint32_t i;
+	uint32_t i, max_queue_pairs, total_queues;
 	struct virtio_net *dev;
+	struct virtio_net_config vnet_config = { 0 };
 	uint64_t ver = VHOST_VDUSE_API_VERSION;
+	uint64_t features = VDUSE_NET_SUPPORTED_FEATURES;
 	struct vduse_dev_config *dev_config = NULL;
 	const char *name = path + strlen("/dev/vduse/");
 
@@ -376,22 +419,39 @@ vduse_device_create(const char *path)
 		goto out_ctrl_close;
 	}
 
-	dev_config = malloc(offsetof(struct vduse_dev_config, config));
+	dev_config = malloc(offsetof(struct vduse_dev_config, config) +
+			sizeof(vnet_config));
 	if (!dev_config) {
 		VHOST_LOG_CONFIG(name, ERR, "Failed to allocate VDUSE config\n");
 		ret = -1;
 		goto out_ctrl_close;
 	}
 
+	ret = rte_vhost_driver_get_queue_num(path, &max_queue_pairs);
+	if (ret < 0) {
+		VHOST_LOG_CONFIG(name, ERR, "Failed to get max queue pairs\n");
+		goto out_free;
+	}
+
+	VHOST_LOG_CONFIG(path, INFO, "VDUSE max queue pairs: %u\n", max_queue_pairs);
+	total_queues = max_queue_pairs * 2;
+
+	if (max_queue_pairs == 1)
+		features &= ~(RTE_BIT64(VIRTIO_NET_F_CTRL_VQ) | RTE_BIT64(VIRTIO_NET_F_MQ));
+	else
+		total_queues += 1; /* Includes ctrl queue */
+
+	vnet_config.max_virtqueue_pairs = max_queue_pairs;
 	memset(dev_config, 0, sizeof(struct vduse_dev_config));
 
 	strncpy(dev_config->name, name, VDUSE_NAME_MAX - 1);
 	dev_config->device_id = VIRTIO_ID_NET;
 	dev_config->vendor_id = 0;
-	dev_config->features = VDUSE_NET_SUPPORTED_FEATURES;
-	dev_config->vq_num = 2;
+	dev_config->features = features;
+	dev_config->vq_num = total_queues;
 	dev_config->vq_align = sysconf(_SC_PAGE_SIZE);
-	dev_config->config_size = 0;
+	dev_config->config_size = sizeof(struct virtio_net_config);
+	memcpy(dev_config->config, &vnet_config, sizeof(vnet_config));
 
 	ret = ioctl(control_fd, VDUSE_CREATE_DEV, dev_config);
 	if (ret < 0) {
@@ -433,7 +493,7 @@ vduse_device_create(const char *path)
 	dev->vduse_dev_fd = dev_fd;
 	vhost_setup_virtio_net(dev->vid, true, true, true, true);
 
-	for (i = 0; i < 2; i++) {
+	for (i = 0; i < total_queues; i++) {
 		struct vduse_vq_config vq_cfg = { 0 };
 
 		ret = alloc_vring_queue(dev, i);
@@ -452,6 +512,8 @@ vduse_device_create(const char *path)
 		}
 	}
 
+	dev->cvq = dev->virtqueue[max_queue_pairs * 2];
+
 	ret = fdset_add(&vduse.fdset, dev->vduse_dev_fd, vduse_events_handler, NULL, dev);
 	if (ret) {
 		VHOST_LOG_CONFIG(name, ERR, "Failed to add fd %d to vduse fdset\n",
@@ -498,6 +560,13 @@ vduse_device_destroy(const char *path)
 	if (vid == RTE_MAX_VHOST_DEVICE)
 		return -1;
 
+	if (dev->cvq && dev->cvq->kickfd >= 0) {
+		fdset_del(&vduse.fdset, dev->cvq->kickfd);
+		fdset_pipe_notify(&vduse.fdset);
+		close(dev->cvq->kickfd);
+		dev->cvq->kickfd = VIRTIO_UNINITIALIZED_EVENTFD;
+	}
+
 	fdset_del(&vduse.fdset, dev->vduse_dev_fd);
 	fdset_pipe_notify(&vduse.fdset);
 
-- 
2.40.1


^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH v5 26/26] vhost: add VDUSE device stop
  2023-06-06  8:18 [PATCH v5 00/26] Add VDUSE support to Vhost library Maxime Coquelin
                   ` (24 preceding siblings ...)
  2023-06-06  8:18 ` [PATCH v5 25/26] vhost: add multiqueue support to VDUSE Maxime Coquelin
@ 2023-06-06  8:18 ` Maxime Coquelin
  2023-06-08  2:11   ` Xia, Chenbo
  2023-06-07  6:48 ` [PATCH v5 00/26] Add VDUSE support to Vhost library Xia, Chenbo
                   ` (2 subsequent siblings)
  28 siblings, 1 reply; 36+ messages in thread
From: Maxime Coquelin @ 2023-06-06  8:18 UTC (permalink / raw)
  To: dev, chenbo.xia, david.marchand, mkp, fbl, jasowang,
	cunming.liang, xieyongji, echaudro, eperezma, amorenoz, lulu
  Cc: Maxime Coquelin

This patch adds VDUSE device stop and cleanup of its
virtqueues.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
---
 doc/guides/rel_notes/release_23_07.rst |  7 +++
 lib/vhost/vduse.c                      | 72 +++++++++++++++++++++++---
 2 files changed, 71 insertions(+), 8 deletions(-)

diff --git a/doc/guides/rel_notes/release_23_07.rst b/doc/guides/rel_notes/release_23_07.rst
index 7034fb664c..0f44c859da 100644
--- a/doc/guides/rel_notes/release_23_07.rst
+++ b/doc/guides/rel_notes/release_23_07.rst
@@ -67,6 +67,13 @@ New Features
   Introduced ``rte_vhost_driver_set_max_queue_num()`` to be able to limit the
   maximum number of supported queue pairs, required for VDUSE support.
 
+* **Added VDUSE support into Vhost library.**
+
+  VDUSE aims at implementing vDPA devices in userspace. It can be used as an
+  alternative to Vhost-user when using Vhost-vDPA, but it also enables
+  providing a virtio-net netdev to the host when using the Virtio-vDPA
+  driver. A limitation in this release is the lack of reconnection support.
+
 
 Removed Items
 -------------
diff --git a/lib/vhost/vduse.c b/lib/vhost/vduse.c
index d3759077ff..a509daf80c 100644
--- a/lib/vhost/vduse.c
+++ b/lib/vhost/vduse.c
@@ -252,6 +252,44 @@ vduse_vring_setup(struct virtio_net *dev, unsigned int index)
 	}
 }
 
+static void
+vduse_vring_cleanup(struct virtio_net *dev, unsigned int index)
+{
+	struct vhost_virtqueue *vq = dev->virtqueue[index];
+	struct vduse_vq_eventfd vq_efd;
+	int ret;
+
+	if (vq == dev->cvq && vq->kickfd >= 0) {
+		fdset_del(&vduse.fdset, vq->kickfd);
+		fdset_pipe_notify(&vduse.fdset);
+	}
+
+	vq_efd.index = index;
+	vq_efd.fd = VDUSE_EVENTFD_DEASSIGN;
+
+	ret = ioctl(dev->vduse_dev_fd, VDUSE_VQ_SETUP_KICKFD, &vq_efd);
+	if (ret)
+		VHOST_LOG_CONFIG(dev->ifname, ERR, "Failed to cleanup kickfd for VQ %u: %s\n",
+				index, strerror(errno));
+
+	close(vq->kickfd);
+	vq->kickfd = VIRTIO_UNINITIALIZED_EVENTFD;
+
+	vring_invalidate(dev, vq);
+
+	rte_free(vq->batch_copy_elems);
+	vq->batch_copy_elems = NULL;
+
+	rte_free(vq->shadow_used_split);
+	vq->shadow_used_split = NULL;
+
+	vq->enabled = false;
+	vq->ready = false;
+	vq->size = 0;
+	vq->last_used_idx = 0;
+	vq->last_avail_idx = 0;
+}
+
 static void
 vduse_device_start(struct virtio_net *dev)
 {
@@ -304,6 +342,23 @@ vduse_device_start(struct virtio_net *dev)
 	}
 }
 
+static void
+vduse_device_stop(struct virtio_net *dev)
+{
+	unsigned int i;
+
+	VHOST_LOG_CONFIG(dev->ifname, INFO, "Stopping device...\n");
+
+	vhost_destroy_device_notify(dev);
+
+	dev->flags &= ~VIRTIO_DEV_READY;
+
+	for (i = 0; i < dev->nr_vring; i++)
+		vduse_vring_cleanup(dev, i);
+
+	vhost_user_iotlb_flush_all(dev);
+}
+
 static void
 vduse_events_handler(int fd, void *arg, int *remove __rte_unused)
 {
@@ -311,6 +366,7 @@ vduse_events_handler(int fd, void *arg, int *remove __rte_unused)
 	struct vduse_dev_request req;
 	struct vduse_dev_response resp;
 	struct vhost_virtqueue *vq;
+	uint8_t old_status = dev->status;
 	int ret;
 
 	memset(&resp, 0, sizeof(resp));
@@ -339,6 +395,7 @@ vduse_events_handler(int fd, void *arg, int *remove __rte_unused)
 	case VDUSE_SET_STATUS:
 		VHOST_LOG_CONFIG(dev->ifname, INFO, "\tnew status: 0x%08x\n",
 				req.s.status);
+		old_status = dev->status;
 		dev->status = req.s.status;
 		resp.result = VDUSE_REQ_RESULT_OK;
 		break;
@@ -363,8 +420,12 @@ vduse_events_handler(int fd, void *arg, int *remove __rte_unused)
 		return;
 	}
 
-	if (dev->status & VIRTIO_DEVICE_STATUS_DRIVER_OK)
-		vduse_device_start(dev);
+	if ((old_status ^ dev->status) & VIRTIO_DEVICE_STATUS_DRIVER_OK) {
+		if (dev->status & VIRTIO_DEVICE_STATUS_DRIVER_OK)
+			vduse_device_start(dev);
+		else
+			vduse_device_stop(dev);
+	}
 
 	VHOST_LOG_CONFIG(dev->ifname, INFO, "Request %s (%u) handled successfully\n",
 			vduse_req_id_to_str(req.type), req.type);
@@ -560,12 +621,7 @@ vduse_device_destroy(const char *path)
 	if (vid == RTE_MAX_VHOST_DEVICE)
 		return -1;
 
-	if (dev->cvq && dev->cvq->kickfd >= 0) {
-		fdset_del(&vduse.fdset, dev->cvq->kickfd);
-		fdset_pipe_notify(&vduse.fdset);
-		close(dev->cvq->kickfd);
-		dev->cvq->kickfd = VIRTIO_UNINITIALIZED_EVENTFD;
-	}
+	vduse_device_stop(dev);
 
 	fdset_del(&vduse.fdset, dev->vduse_dev_fd);
 	fdset_pipe_notify(&vduse.fdset);
-- 
2.40.1


^ permalink raw reply	[flat|nested] 36+ messages in thread

* RE: [PATCH v5 00/26] Add VDUSE support to Vhost library
  2023-06-06  8:18 [PATCH v5 00/26] Add VDUSE support to Vhost library Maxime Coquelin
                   ` (25 preceding siblings ...)
  2023-06-06  8:18 ` [PATCH v5 26/26] vhost: add VDUSE device stop Maxime Coquelin
@ 2023-06-07  6:48 ` Xia, Chenbo
  2023-06-07 14:58   ` Maxime Coquelin
  2023-06-07  8:05 ` David Marchand
  2023-06-08 14:29 ` Maxime Coquelin
  28 siblings, 1 reply; 36+ messages in thread
From: Xia, Chenbo @ 2023-06-07  6:48 UTC (permalink / raw)
  To: Maxime Coquelin, dev, david.marchand, mkp, fbl, jasowang, Liang,
	Cunming, Xie, Yongji, echaudro, eperezma, amorenoz, lulu

Hi Maxime,

> -----Original Message-----
> From: Maxime Coquelin <maxime.coquelin@redhat.com>
> Sent: Tuesday, June 6, 2023 4:18 PM
> To: dev@dpdk.org; Xia, Chenbo <chenbo.xia@intel.com>;
> david.marchand@redhat.com; mkp@redhat.com; fbl@redhat.com;
> jasowang@redhat.com; Liang, Cunming <cunming.liang@intel.com>; Xie, Yongji
> <xieyongji@bytedance.com>; echaudro@redhat.com; eperezma@redhat.com;
> amorenoz@redhat.com; lulu@redhat.com
> Cc: Maxime Coquelin <maxime.coquelin@redhat.com>
> Subject: [PATCH v5 00/26] Add VDUSE support to Vhost library
> 
> This series introduces a new type of backend, VDUSE,
> to the Vhost library.
> 
> VDUSE stands for vDPA device in Userspace; it enables
> implementing a Virtio device in userspace and having it
> attached to the Kernel vDPA bus.
> 
> Once attached to the vDPA bus, it can be used either by
> Kernel Virtio drivers, like virtio-net in our case, via
> the virtio-vdpa driver. Doing that, the device is visible
> to the Kernel networking stack and is exposed to userspace
> as a regular netdev.
> 
> It can also be exposed to userspace thanks to the
> vhost-vdpa driver, via a vhost-vdpa chardev that can be
> passed to QEMU or Virtio-user PMD.
> 
> While VDUSE support is already available in upstream
> Kernel, a couple of patches are required to support
> network device type:
> 
> https://gitlab.com/mcoquelin/linux/-/tree/vduse_networking_rfc
> 
> In order to attach the created VDUSE device to the vDPA
> bus, a recent iproute2 version containing the vdpa tool is
> required.
> 
> Benchmark results:
> ==================
> 
> On this v2, PVP reference benchmark has been run & compared with
> Vhost-user.
> 
> When doing macswap forwarding in the workload, no difference is seen.
> When doing io forwarding in the workload, we see a 4% performance
> degradation with VDUSE, compared to Vhost-user/Virtio-user. It is
> explained by the use of the IOTLB layer in the Vhost library when using
> VDUSE, whereas Vhost-user/Virtio-user does not make use of it.
> 
> Usage:
> ======
> 
> 1. Probe required Kernel modules
> # modprobe vdpa
> # modprobe vduse
> # modprobe virtio-vdpa
> 
> 2. Build (require vduse kernel headers to be available)
> # meson build
> # ninja -C build
> 
> 3. Create a VDUSE device (vduse0) using Vhost PMD with
> testpmd (with 4 queue pairs in this example)
> # ./build/app/dpdk-testpmd --no-pci --
> vdev=net_vhost0,iface=/dev/vduse/vduse0,queues=4 --log-level=*:9  -- -i --
> txq=4 --rxq=4
> 
> 4. Attach the VDUSE device to the vDPA bus
> # vdpa dev add name vduse0 mgmtdev vduse
> => The virtio-net netdev shows up (eth0 here)
> # ip l show eth0
> 21: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP
> mode DEFAULT group default qlen 1000
>     link/ether c2:73:ea:a7:68:6d brd ff:ff:ff:ff:ff:ff
> 
> 5. Start/stop traffic in testpmd
> testpmd> start
> testpmd> show port stats 0
>   ######################## NIC statistics for port 0
> ########################
>   RX-packets: 11         RX-missed: 0          RX-bytes:  1482
>   RX-errors: 0
>   RX-nombuf:  0
>   TX-packets: 1          TX-errors: 0          TX-bytes:  62
> 
>   Throughput (since last show)
>   Rx-pps:            0          Rx-bps:            0
>   Tx-pps:            0          Tx-bps:            0
> 
> ##########################################################################
> ##
> testpmd> stop
> 
> 6. Detach the VDUSE device from the vDPA bus
> # vdpa dev del vduse0
> 
> 7. Quit testpmd
> testpmd> quit
> 
> Known issues & remaining work:
> ==============================
> - Fix issue in FD manager (still polling while FD has been removed)
> - Add Netlink support in Vhost library
> - Support device reconnection
>  -> a temporary patch to support reconnection via a tmpfs file is
> available,
>     upstream solution would be in-kernel and is being developed.
>  -> https://gitlab.com/mcoquelin/dpdk-next-virtio/-
> /commit/5ad06ce14159a9ce36ee168dd13ef389cec91137
> - Support packed ring
> - Provide more performance benchmark results
> 
> Changes in v5:
> ==============
> - Delay starting/stopping the device until after replying to the VDUSE
>   event, in order to avoid a deadlock encountered when testing with OVS.

Could you explain more to help me understand the deadlock issue?

Thanks,
Chenbo

> - Mention reconnection support lack in the release note.
> 
> Changes in v4:
> ==============
> - Applied patch 1 and patch 2 from v3
> - Rebased on top of Eelco series
> - Fix coredump clear in IOTLB cache removal (David)
> - Remove unneeded ret variable in vhost_vring_inject_irq (David)
> - Fixed release note (David, Chenbo)
> 
> Changes in v2/v3:
> =================
> - Fixed mem_set_dump() parameter (patch 4)
> - Fixed accidental comment change (patch 7, Chenbo)
> - Change from __builtin_ctz to __builtin_ctzll (patch 9, Chenbo)
> - move change from patch 12 to 13 (Chenbo)
> - Enable locks annotation for control queue (Patch 17)
> - Send control queue notification when used descriptors enqueued (Patch 17)
> - Lock control queue IOTLB lock (Patch 17)
> - Fix error path in virtio_net_ctrl_pop() (Patch 17, Chenbo)
> - Set VDUSE dev FD as NONBLOCK (Patch 18)
> - Enable more Virtio features (Patch 18)
> - Remove calls to pthread_setcancelstate() (Patch 22)
> - Add calls to fdset_pipe_notify() when adding and deleting FDs from a set
> (Patch 22)
> - Use RTE_DIM() to get requests string array size (Patch 22)
> - Set reply result for IOTLB update message (Patch 25, Chenbo)
> - Fix queues enablement with multiqueue (Patch 26)
> - Move kickfd creation for better logging (Patch 26)
> - Improve logging (Patch 26)
> - Uninstall cvq kickfd in case of handler installation failure (Patch 27)
> - Enable CVQ notifications once handler is installed (Patch 27)
> - Don't advertise multiqueue and control queue if the app only requests a
> single queue pair (Patch 27)
> - Add release notes
> 
> Maxime Coquelin (26):
>   vhost: fix IOTLB entries overlap check with previous entry
>   vhost: add helper of IOTLB entries coredump
>   vhost: add helper for IOTLB entries shared page check
>   vhost: don't dump unneeded pages with IOTLB
>   vhost: change to single IOTLB cache per device
>   vhost: add offset field to IOTLB entries
>   vhost: add page size info to IOTLB entry
>   vhost: retry translating IOVA after IOTLB miss
>   vhost: introduce backend ops
>   vhost: add IOTLB cache entry removal callback
>   vhost: add helper for IOTLB misses
>   vhost: add helper for interrupt injection
>   vhost: add API to set max queue pairs
>   net/vhost: use API to set max queue pairs
>   vhost: add control virtqueue support
>   vhost: add VDUSE device creation and destruction
>   vhost: add VDUSE callback for IOTLB miss
>   vhost: add VDUSE callback for IOTLB entry removal
>   vhost: add VDUSE callback for IRQ injection
>   vhost: add VDUSE events handler
>   vhost: add support for virtqueue state get event
>   vhost: add support for VDUSE status set event
>   vhost: add support for VDUSE IOTLB update event
>   vhost: add VDUSE device startup
>   vhost: add multiqueue support to VDUSE
>   vhost: add VDUSE device stop
> 
>  doc/guides/prog_guide/vhost_lib.rst    |   4 +
>  doc/guides/rel_notes/release_23_07.rst |  12 +
>  drivers/net/vhost/rte_eth_vhost.c      |   3 +
>  lib/vhost/iotlb.c                      | 333 +++++++------
>  lib/vhost/iotlb.h                      |  45 +-
>  lib/vhost/meson.build                  |   5 +
>  lib/vhost/rte_vhost.h                  |  17 +
>  lib/vhost/socket.c                     |  72 ++-
>  lib/vhost/vduse.c                      | 646 +++++++++++++++++++++++++
>  lib/vhost/vduse.h                      |  33 ++
>  lib/vhost/version.map                  |   1 +
>  lib/vhost/vhost.c                      |  70 ++-
>  lib/vhost/vhost.h                      |  57 ++-
>  lib/vhost/vhost_user.c                 |  51 +-
>  lib/vhost/vhost_user.h                 |   2 +-
>  lib/vhost/virtio_net_ctrl.c            | 286 +++++++++++
>  lib/vhost/virtio_net_ctrl.h            |  10 +
>  17 files changed, 1409 insertions(+), 238 deletions(-)
>  create mode 100644 lib/vhost/vduse.c
>  create mode 100644 lib/vhost/vduse.h
>  create mode 100644 lib/vhost/virtio_net_ctrl.c
>  create mode 100644 lib/vhost/virtio_net_ctrl.h
> 
> --
> 2.40.1


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v5 00/26] Add VDUSE support to Vhost library
  2023-06-06  8:18 [PATCH v5 00/26] Add VDUSE support to Vhost library Maxime Coquelin
                   ` (26 preceding siblings ...)
  2023-06-07  6:48 ` [PATCH v5 00/26] Add VDUSE support to Vhost library Xia, Chenbo
@ 2023-06-07  8:05 ` David Marchand
  2023-06-08  9:17   ` Maxime Coquelin
  2023-06-08 14:29 ` Maxime Coquelin
  28 siblings, 1 reply; 36+ messages in thread
From: David Marchand @ 2023-06-07  8:05 UTC (permalink / raw)
  To: Maxime Coquelin
  Cc: dev, chenbo.xia, mkp, fbl, jasowang, cunming.liang, xieyongji,
	echaudro, eperezma, amorenoz, lulu

On Tue, Jun 6, 2023 at 10:19 AM Maxime Coquelin
<maxime.coquelin@redhat.com> wrote:
>
> This series introduces a new type of backend, VDUSE,
> to the Vhost library.
>
> VDUSE stands for vDPA device in Userspace; it enables
> implementing a Virtio device in userspace and having it
> attached to the Kernel vDPA bus.
>
> Once attached to the vDPA bus, it can be used either by
> Kernel Virtio drivers, like virtio-net in our case, via
> the virtio-vdpa driver. Doing that, the device is visible
> to the Kernel networking stack and is exposed to userspace
> as a regular netdev.
>
> It can also be exposed to userspace thanks to the
> vhost-vdpa driver, via a vhost-vdpa chardev that can be
> passed to QEMU or Virtio-user PMD.
>
> While VDUSE support is already available in upstream
> Kernel, a couple of patches are required to support
> network device type:
>
> https://gitlab.com/mcoquelin/linux/-/tree/vduse_networking_rfc
>
> In order to attach the created VDUSE device to the vDPA
> bus, a recent iproute2 version containing the vdpa tool is
> required.
>
> Benchmark results:
> ==================
>
> On this v2, PVP reference benchmark has been run & compared with
> Vhost-user.
>
> When doing macswap forwarding in the workload, no difference is seen.
> When doing io forwarding in the workload, we see a 4% performance
> degradation with VDUSE, compared to Vhost-user/Virtio-user. It is
> explained by the use of the IOTLB layer in the Vhost library when using
> VDUSE, whereas Vhost-user/Virtio-user does not make use of it.
>
> Usage:
> ======
>
> 1. Probe required Kernel modules
> # modprobe vdpa
> # modprobe vduse
> # modprobe virtio-vdpa
>
> 2. Build (require vduse kernel headers to be available)
> # meson build
> # ninja -C build
>
> 3. Create a VDUSE device (vduse0) using Vhost PMD with
> testpmd (with 4 queue pairs in this example)
> # ./build/app/dpdk-testpmd --no-pci --vdev=net_vhost0,iface=/dev/vduse/vduse0,queues=4 --log-level=*:9  -- -i --txq=4 --rxq=4

9 is a nice but undefined value. 8 is enough.
In general, I prefer "human readable" strings, like *:debug ;-).


>
> 4. Attach the VDUSE device to the vDPA bus
> # vdpa dev add name vduse0 mgmtdev vduse
> => The virtio-net netdev shows up (eth0 here)
> # ip l show eth0
> 21: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
>     link/ether c2:73:ea:a7:68:6d brd ff:ff:ff:ff:ff:ff
>
> 5. Start/stop traffic in testpmd
> testpmd> start
> testpmd> show port stats 0
>   ######################## NIC statistics for port 0  ########################
>   RX-packets: 11         RX-missed: 0          RX-bytes:  1482
>   RX-errors: 0
>   RX-nombuf:  0
>   TX-packets: 1          TX-errors: 0          TX-bytes:  62
>
>   Throughput (since last show)
>   Rx-pps:            0          Rx-bps:            0
>   Tx-pps:            0          Tx-bps:            0
>   ############################################################################
> testpmd> stop
>
> 6. Detach the VDUSE device from the vDPA bus
> # vdpa dev del vduse0
>
> 7. Quit testpmd
> testpmd> quit
>
> Known issues & remaining work:
> ==============================
> - Fix issue in FD manager (still polling while FD has been removed)
> - Add Netlink support in Vhost library
> - Support device reconnection
>  -> a temporary patch to support reconnection via a tmpfs file is available,
>     upstream solution would be in-kernel and is being developed.
>  -> https://gitlab.com/mcoquelin/dpdk-next-virtio/-/commit/5ad06ce14159a9ce36ee168dd13ef389cec91137
> - Support packed ring
> - Provide more performance benchmark results

We are missing a reference to the kernel patches required to have
vduse accept net devices.

I had played with the patches at v1 and it was working ok.
I did not review in depth the latest revisions, but I followed your
series from the PoC/start.
Overall, the series lgtm.

For the series,
Acked-by: David Marchand <david.marchand@redhat.com>


-- 
David Marchand


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v5 00/26] Add VDUSE support to Vhost library
  2023-06-07  6:48 ` [PATCH v5 00/26] Add VDUSE support to Vhost library Xia, Chenbo
@ 2023-06-07 14:58   ` Maxime Coquelin
  2023-06-08  1:53     ` Xia, Chenbo
  0 siblings, 1 reply; 36+ messages in thread
From: Maxime Coquelin @ 2023-06-07 14:58 UTC (permalink / raw)
  To: Xia, Chenbo, dev, david.marchand, mkp, fbl, jasowang, Liang,
	Cunming, Xie, Yongji, echaudro, eperezma, amorenoz, lulu



On 6/7/23 08:48, Xia, Chenbo wrote:
> Hi Maxime,
> 
>> -----Original Message-----
>> From: Maxime Coquelin <maxime.coquelin@redhat.com>
>> Sent: Tuesday, June 6, 2023 4:18 PM
>> To: dev@dpdk.org; Xia, Chenbo <chenbo.xia@intel.com>;
>> david.marchand@redhat.com; mkp@redhat.com; fbl@redhat.com;
>> jasowang@redhat.com; Liang, Cunming <cunming.liang@intel.com>; Xie, Yongji
>> <xieyongji@bytedance.com>; echaudro@redhat.com; eperezma@redhat.com;
>> amorenoz@redhat.com; lulu@redhat.com
>> Cc: Maxime Coquelin <maxime.coquelin@redhat.com>
>> Subject: [PATCH v5 00/26] Add VDUSE support to Vhost library
>>
>> This series introduces a new type of backend, VDUSE,
>> to the Vhost library.
>>
>> VDUSE stands for vDPA device in Userspace; it enables
>> implementing a Virtio device in userspace and having it
>> attached to the Kernel vDPA bus.
>>
>> Once attached to the vDPA bus, it can be used either by
>> Kernel Virtio drivers, like virtio-net in our case, via
>> the virtio-vdpa driver. Doing that, the device is visible
>> to the Kernel networking stack and is exposed to userspace
>> as a regular netdev.
>>
>> It can also be exposed to userspace thanks to the
>> vhost-vdpa driver, via a vhost-vdpa chardev that can be
>> passed to QEMU or Virtio-user PMD.
>>
>> While VDUSE support is already available in upstream
>> Kernel, a couple of patches are required to support
>> network device type:
>>
>> https://gitlab.com/mcoquelin/linux/-/tree/vduse_networking_rfc
>>
>> In order to attach the created VDUSE device to the vDPA
>> bus, a recent iproute2 version containing the vdpa tool is
>> required.
>>
>> Benchmark results:
>> ==================
>>
>> On this v2, PVP reference benchmark has been run & compared with
>> Vhost-user.
>>
>> When doing macswap forwarding in the workload, no difference is seen.
>> When doing io forwarding in the workload, we see a 4% performance
>> degradation with VDUSE, compared to Vhost-user/Virtio-user. It is
>> explained by the use of the IOTLB layer in the Vhost library when using
>> VDUSE, whereas Vhost-user/Virtio-user does not make use of it.
>>
>> Usage:
>> ======
>>
>> 1. Probe required Kernel modules
>> # modprobe vdpa
>> # modprobe vduse
>> # modprobe virtio-vdpa
>>
>> 2. Build (require vduse kernel headers to be available)
>> # meson build
>> # ninja -C build
>>
>> 3. Create a VDUSE device (vduse0) using Vhost PMD with
>> testpmd (with 4 queue pairs in this example)
>> # ./build/app/dpdk-testpmd --no-pci --
>> vdev=net_vhost0,iface=/dev/vduse/vduse0,queues=4 --log-level=*:9  -- -i --
>> txq=4 --rxq=4
>>
>> 4. Attach the VDUSE device to the vDPA bus
>> # vdpa dev add name vduse0 mgmtdev vduse
>> => The virtio-net netdev shows up (eth0 here)
>> # ip l show eth0
>> 21: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP
>> mode DEFAULT group default qlen 1000
>>      link/ether c2:73:ea:a7:68:6d brd ff:ff:ff:ff:ff:ff
>>
>> 5. Start/stop traffic in testpmd
>> testpmd> start
>> testpmd> show port stats 0
>>    ######################## NIC statistics for port 0
>> ########################
>>    RX-packets: 11         RX-missed: 0          RX-bytes:  1482
>>    RX-errors: 0
>>    RX-nombuf:  0
>>    TX-packets: 1          TX-errors: 0          TX-bytes:  62
>>
>>    Throughput (since last show)
>>    Rx-pps:            0          Rx-bps:            0
>>    Tx-pps:            0          Tx-bps:            0
>>
>> ############################################################################
>> testpmd> stop
>>
>> 6. Detach the VDUSE device from the vDPA bus
>> # vdpa dev del vduse0
>>
>> 7. Quit testpmd
>> testpmd> quit
>>
>> Known issues & remaining work:
>> ==============================
>> - Fix issue in FD manager (still polling while FD has been removed)
>> - Add Netlink support in Vhost library
>> - Support device reconnection
>>   -> a temporary patch to support reconnection via a tmpfs file is available,
>>      the upstream solution would be in-kernel and is being developed.
>>   -> https://gitlab.com/mcoquelin/dpdk-next-virtio/-/commit/5ad06ce14159a9ce36ee168dd13ef389cec91137
>> - Support packed ring
>> - Provide more performance benchmark results
>>
>> Changes in v5:
>> ==============
>> - Delay starting/stopping the device to after having replied to the VDUSE
>>    event in order to avoid a deadlock encountered when testing with OVS.
> 
> Could you explain more to help me understand the deadlock issue?

Sure.

The V5 fixes an ABBA deadlock involving OVS mutex and kernel
rtnl_lock(), two OVS threads and the vdpa tool process.

We have an OVS bridge with a mlx5 port already added.
We add the vduse port to the same bridge.
Then we use the iproute2 vdpa tool to attach the vduse device to the
kernel vdpa bus. When doing this, the rtnl lock is taken when the
virtio-net device is probed, and VDUSE_SET_STATUS gets sent and waits
for its reply.

This VDUSE_SET_STATUS request is handled by the DPDK VDUSE event
handler, and if the DRIVER_OK bit is set, the Vhost .new_device() callback is
called, which triggers a bridge reconfiguration.

On bridge reconfiguration, the mlx5 port takes the OVS mutex and
performs an ioctl() which tries to take the rtnl lock, but it is already
owned by the vdpa tool.

The vduse_events thread is stuck waiting for the OVS mutex, so the
reply to the VDUSE_SET_STATUS event is never sent, and the vdpa tool
process is stuck for 30 seconds, until a timeout happens.

When the timeout happens, everything is unblocked, but the VDUSE device
has been marked as broken, and so is not usable anymore.

I can reproduce it and provide you with the backtraces of the
different threads if you wish.

Anyway, I think it makes sense to perform the device startup after
having replied to the VDUSE_SET_STATUS request, as it just means the
device has taken the new driver status into account.

Hope it clarifies, let me know if you need more details.

Thanks,
Maxime

> Thanks,
> Chenbo
> 
>> - Mention reconnection support lack in the release note.
>>
>> Changes in v4:
>> ==============
>> - Applied patch 1 and patch 2 from v3
>> - Rebased on top of Eelco series
>> - Fix coredump clear in IOTLB cache removal (David)
>> - Remove unneeded ret variable in vhost_vring_inject_irq (David)
>> - Fixed release note (David, Chenbo)
>>
>> Changes in v2/v3:
>> =================
>> - Fixed mem_set_dump() parameter (patch 4)
>> - Fixed accidental comment change (patch 7, Chenbo)
>> - Change from __builtin_ctz to __builtin_ctzll (patch 9, Chenbo)
>> - move change from patch 12 to 13 (Chenbo)
>> - Enable locks annotation for control queue (Patch 17)
>> - Send control queue notification when used descriptors enqueued (Patch 17)
>> - Lock control queue IOTLB lock (Patch 17)
>> - Fix error path in virtio_net_ctrl_pop() (Patch 17, Chenbo)
>> - Set VDUSE dev FD as NONBLOCK (Patch 18)
>> - Enable more Virtio features (Patch 18)
>> - Remove calls to pthread_setcancelstate() (Patch 22)
>> - Add calls to fdset_pipe_notify() when adding and deleting FDs from a set
>> (Patch 22)
>> - Use RTE_DIM() to get requests string array size (Patch 22)
>> - Set reply result for IOTLB update message (Patch 25, Chenbo)
>> - Fix queues enablement with multiqueue (Patch 26)
>> - Move kickfd creation for better logging (Patch 26)
>> - Improve logging (Patch 26)
>> - Uninstall cvq kickfd in case of handler installation failure (Patch 27)
>> - Enable CVQ notifications once handler is installed (Patch 27)
>> - Don't advertise multiqueue and control queue if app only requests a
>> single queue pair (Patch 27)
>> - Add release notes
>>
>> Maxime Coquelin (26):
>>    vhost: fix IOTLB entries overlap check with previous entry
>>    vhost: add helper of IOTLB entries coredump
>>    vhost: add helper for IOTLB entries shared page check
>>    vhost: don't dump unneeded pages with IOTLB
>>    vhost: change to single IOTLB cache per device
>>    vhost: add offset field to IOTLB entries
>>    vhost: add page size info to IOTLB entry
>>    vhost: retry translating IOVA after IOTLB miss
>>    vhost: introduce backend ops
>>    vhost: add IOTLB cache entry removal callback
>>    vhost: add helper for IOTLB misses
>>    vhost: add helper for interrupt injection
>>    vhost: add API to set max queue pairs
>>    net/vhost: use API to set max queue pairs
>>    vhost: add control virtqueue support
>>    vhost: add VDUSE device creation and destruction
>>    vhost: add VDUSE callback for IOTLB miss
>>    vhost: add VDUSE callback for IOTLB entry removal
>>    vhost: add VDUSE callback for IRQ injection
>>    vhost: add VDUSE events handler
>>    vhost: add support for virtqueue state get event
>>    vhost: add support for VDUSE status set event
>>    vhost: add support for VDUSE IOTLB update event
>>    vhost: add VDUSE device startup
>>    vhost: add multiqueue support to VDUSE
>>    vhost: add VDUSE device stop
>>
>>   doc/guides/prog_guide/vhost_lib.rst    |   4 +
>>   doc/guides/rel_notes/release_23_07.rst |  12 +
>>   drivers/net/vhost/rte_eth_vhost.c      |   3 +
>>   lib/vhost/iotlb.c                      | 333 +++++++------
>>   lib/vhost/iotlb.h                      |  45 +-
>>   lib/vhost/meson.build                  |   5 +
>>   lib/vhost/rte_vhost.h                  |  17 +
>>   lib/vhost/socket.c                     |  72 ++-
>>   lib/vhost/vduse.c                      | 646 +++++++++++++++++++++++++
>>   lib/vhost/vduse.h                      |  33 ++
>>   lib/vhost/version.map                  |   1 +
>>   lib/vhost/vhost.c                      |  70 ++-
>>   lib/vhost/vhost.h                      |  57 ++-
>>   lib/vhost/vhost_user.c                 |  51 +-
>>   lib/vhost/vhost_user.h                 |   2 +-
>>   lib/vhost/virtio_net_ctrl.c            | 286 +++++++++++
>>   lib/vhost/virtio_net_ctrl.h            |  10 +
>>   17 files changed, 1409 insertions(+), 238 deletions(-)
>>   create mode 100644 lib/vhost/vduse.c
>>   create mode 100644 lib/vhost/vduse.h
>>   create mode 100644 lib/vhost/virtio_net_ctrl.c
>>   create mode 100644 lib/vhost/virtio_net_ctrl.h
>>
>> --
>> 2.40.1
> 


^ permalink raw reply	[flat|nested] 36+ messages in thread

* RE: [PATCH v5 00/26] Add VDUSE support to Vhost library
  2023-06-07 14:58   ` Maxime Coquelin
@ 2023-06-08  1:53     ` Xia, Chenbo
  0 siblings, 0 replies; 36+ messages in thread
From: Xia, Chenbo @ 2023-06-08  1:53 UTC (permalink / raw)
  To: Maxime Coquelin, dev, david.marchand, mkp, fbl, jasowang, Liang,
	Cunming, Xie, Yongji, echaudro, eperezma, amorenoz, lulu

> -----Original Message-----
> From: Maxime Coquelin <maxime.coquelin@redhat.com>
> Sent: Wednesday, June 7, 2023 10:59 PM
> To: Xia, Chenbo <chenbo.xia@intel.com>; dev@dpdk.org;
> david.marchand@redhat.com; mkp@redhat.com; fbl@redhat.com;
> jasowang@redhat.com; Liang, Cunming <cunming.liang@intel.com>; Xie, Yongji
> <xieyongji@bytedance.com>; echaudro@redhat.com; eperezma@redhat.com;
> amorenoz@redhat.com; lulu@redhat.com
> Subject: Re: [PATCH v5 00/26] Add VDUSE support to Vhost library
> 
> 
> 
> On 6/7/23 08:48, Xia, Chenbo wrote:
> > Hi Maxime,
> >
> >> -----Original Message-----
> >> From: Maxime Coquelin <maxime.coquelin@redhat.com>
> >> Sent: Tuesday, June 6, 2023 4:18 PM
> >> To: dev@dpdk.org; Xia, Chenbo <chenbo.xia@intel.com>;
> >> david.marchand@redhat.com; mkp@redhat.com; fbl@redhat.com;
> >> jasowang@redhat.com; Liang, Cunming <cunming.liang@intel.com>; Xie,
> Yongji
> >> <xieyongji@bytedance.com>; echaudro@redhat.com; eperezma@redhat.com;
> >> amorenoz@redhat.com; lulu@redhat.com
> >> Cc: Maxime Coquelin <maxime.coquelin@redhat.com>
> >> Subject: [PATCH v5 00/26] Add VDUSE support to Vhost library
> >>
> >> This series introduces a new type of backend, VDUSE,
> >> to the Vhost library.
> >>
> >> VDUSE stands for vDPA device in Userspace, it enables
> >> implementing a Virtio device in userspace and have it
> >> attached to the Kernel vDPA bus.
> >>
> >> Once attached to the vDPA bus, it can be used either by
> >> Kernel Virtio drivers, like virtio-net in our case, via
> >> the virtio-vdpa driver. Doing that, the device is visible
> >> to the Kernel networking stack and is exposed to userspace
> >> as a regular netdev.
> >>
> >> It can also be exposed to userspace thanks to the
> >> vhost-vdpa driver, via a vhost-vdpa chardev that can be
> >> passed to QEMU or Virtio-user PMD.
> >>
> >> While VDUSE support is already available in upstream
> >> Kernel, a couple of patches are required to support
> >> network device type:
> >>
> >> https://gitlab.com/mcoquelin/linux/-/tree/vduse_networking_rfc
> >>
> >> In order to attach the created VDUSE device to the vDPA
> >> bus, a recent iproute2 version containing the vdpa tool is
> >> required.
> >>
> >> Benchmark results:
> >> ==================
> >>
> >> On this v2, PVP reference benchmark has been run & compared with
> >> Vhost-user.
> >>
> >> When doing macswap forwarding in the workload, no difference is seen.
> >> When doing io forwarding in the workload, we see 4% performance
> >> degradation with VDUSE, compared to Vhost-user/Virtio-user. It is
> >> explained by the use of the IOTLB layer in the Vhost library when using
> >> VDUSE, whereas Vhost-user/Virtio-user does not make use of it.
> >>
> >> Usage:
> >> ======
> >>
> >> 1. Probe required Kernel modules
> >> # modprobe vdpa
> >> # modprobe vduse
> >> # modprobe virtio-vdpa
> >>
> >> 2. Build (requires the VDUSE kernel headers to be available)
> >> # meson build
> >> # ninja -C build
> >>
> >> 3. Create a VDUSE device (vduse0) using Vhost PMD with
> >> testpmd (with 4 queue pairs in this example)
> >> # ./build/app/dpdk-testpmd --no-pci \
> >>     --vdev=net_vhost0,iface=/dev/vduse/vduse0,queues=4 \
> >>     --log-level=*:9 -- -i --txq=4 --rxq=4
> >>
> >> 4. Attach the VDUSE device to the vDPA bus
> >> # vdpa dev add name vduse0 mgmtdev vduse
> >> => The virtio-net netdev shows up (eth0 here)
> >> # ip l show eth0
> >> 21: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP
> >> mode DEFAULT group default qlen 1000
> >>      link/ether c2:73:ea:a7:68:6d brd ff:ff:ff:ff:ff:ff
> >>
> >> 5. Start/stop traffic in testpmd
> >> testpmd> start
> >> testpmd> show port stats 0
> >>    ######################## NIC statistics for port 0
> >> ########################
> >>    RX-packets: 11         RX-missed: 0          RX-bytes:  1482
> >>    RX-errors: 0
> >>    RX-nombuf:  0
> >>    TX-packets: 1          TX-errors: 0          TX-bytes:  62
> >>
> >>    Throughput (since last show)
> >>    Rx-pps:            0          Rx-bps:            0
> >>    Tx-pps:            0          Tx-bps:            0
> >>
> >>
> >> ############################################################################
> >> testpmd> stop
> >>
> >> 6. Detach the VDUSE device from the vDPA bus
> >> # vdpa dev del vduse0
> >>
> >> 7. Quit testpmd
> >> testpmd> quit
> >>
> >> Known issues & remaining work:
> >> ==============================
> >> - Fix issue in FD manager (still polling while FD has been removed)
> >> - Add Netlink support in Vhost library
> >> - Support device reconnection
> >>   -> a temporary patch to support reconnection via a tmpfs file is available,
> >>      the upstream solution would be in-kernel and is being developed.
> >>   -> https://gitlab.com/mcoquelin/dpdk-next-virtio/-/commit/5ad06ce14159a9ce36ee168dd13ef389cec91137
> >> - Support packed ring
> >> - Provide more performance benchmark results
> >>
> >> Changes in v5:
> >> ==============
> >> - Delay starting/stopping the device to after having replied to the
> VDUSE
> >>    event in order to avoid a deadlock encountered when testing with OVS.
> >
> > Could you explain more to help me understand the deadlock issue?
> 
> Sure.
> 
> The V5 fixes an ABBA deadlock involving OVS mutex and kernel
> rtnl_lock(), two OVS threads and the vdpa tool process.
> 
> We have an OVS bridge with a mlx5 port already added.
> We add the vduse port to the same bridge.
> Then we use the iproute2 vdpa tool to attach the vduse device to the
> kernel vdpa bus. When doing this, the rtnl lock is taken when the
> virtio-net device is probed, and VDUSE_SET_STATUS gets sent and waits
> for its reply.
> 
> This VDUSE_SET_STATUS request is handled by the DPDK VDUSE event
> handler, and if the DRIVER_OK bit is set, the Vhost .new_device() callback is
> called, which triggers a bridge reconfiguration.
> 
> On bridge reconfiguration, the mlx5 port takes the OVS mutex and
> performs an ioctl() which tries to take the rtnl lock, but it is already
> owned by the vdpa tool.
> 
> The vduse_events thread is stuck waiting for the OVS mutex, so the
> reply to the VDUSE_SET_STATUS event is never sent, and the vdpa tool
> process is stuck for 30 seconds, until a timeout happens.
> 
> When the timeout happens, everything is unblocked, but the VDUSE device
> has been marked as broken, and so is not usable anymore.
> 
> I can reproduce it and provide you with the backtraces of the
> different threads if you wish.
> 
> Anyway, I think it makes sense to perform the device startup after
> having replied to the VDUSE_SET_STATUS request, as it just means the
> device has taken the new driver status into account.
> 
> Hope it clarifies, let me know if you need more details.

It's very clear! Thanks Maxime for the explanation!

/Chenbo

> 
> Thanks,
> Maxime
> 
> > Thanks,
> > Chenbo
> >
> >> - Mention reconnection support lack in the release note.
> >>
> >> Changes in v4:
> >> ==============
> >> - Applied patch 1 and patch 2 from v3
> >> - Rebased on top of Eelco series
> >> - Fix coredump clear in IOTLB cache removal (David)
> >> - Remove unneeded ret variable in vhost_vring_inject_irq (David)
> >> - Fixed release note (David, Chenbo)
> >>
> >> Changes in v2/v3:
> >> =================
> >> - Fixed mem_set_dump() parameter (patch 4)
> >> - Fixed accidental comment change (patch 7, Chenbo)
> >> - Change from __builtin_ctz to __builtin_ctzll (patch 9, Chenbo)
> >> - move change from patch 12 to 13 (Chenbo)
> >> - Enable locks annotation for control queue (Patch 17)
> >> - Send control queue notification when used descriptors enqueued (Patch 17)
> >> - Lock control queue IOTLB lock (Patch 17)
> >> - Fix error path in virtio_net_ctrl_pop() (Patch 17, Chenbo)
> >> - Set VDUSE dev FD as NONBLOCK (Patch 18)
> >> - Enable more Virtio features (Patch 18)
> >> - Remove calls to pthread_setcancelstate() (Patch 22)
> >> - Add calls to fdset_pipe_notify() when adding and deleting FDs from a
> >> set (Patch 22)
> >> - Use RTE_DIM() to get requests string array size (Patch 22)
> >> - Set reply result for IOTLB update message (Patch 25, Chenbo)
> >> - Fix queues enablement with multiqueue (Patch 26)
> >> - Move kickfd creation for better logging (Patch 26)
> >> - Improve logging (Patch 26)
> >> - Uninstall cvq kickfd in case of handler installation failure (Patch 27)
> >> - Enable CVQ notifications once handler is installed (Patch 27)
> >> - Don't advertise multiqueue and control queue if app only requests a
> >> single queue pair (Patch 27)
> >> - Add release notes
> >>
> >> Maxime Coquelin (26):
> >>    vhost: fix IOTLB entries overlap check with previous entry
> >>    vhost: add helper of IOTLB entries coredump
> >>    vhost: add helper for IOTLB entries shared page check
> >>    vhost: don't dump unneeded pages with IOTLB
> >>    vhost: change to single IOTLB cache per device
> >>    vhost: add offset field to IOTLB entries
> >>    vhost: add page size info to IOTLB entry
> >>    vhost: retry translating IOVA after IOTLB miss
> >>    vhost: introduce backend ops
> >>    vhost: add IOTLB cache entry removal callback
> >>    vhost: add helper for IOTLB misses
> >>    vhost: add helper for interrupt injection
> >>    vhost: add API to set max queue pairs
> >>    net/vhost: use API to set max queue pairs
> >>    vhost: add control virtqueue support
> >>    vhost: add VDUSE device creation and destruction
> >>    vhost: add VDUSE callback for IOTLB miss
> >>    vhost: add VDUSE callback for IOTLB entry removal
> >>    vhost: add VDUSE callback for IRQ injection
> >>    vhost: add VDUSE events handler
> >>    vhost: add support for virtqueue state get event
> >>    vhost: add support for VDUSE status set event
> >>    vhost: add support for VDUSE IOTLB update event
> >>    vhost: add VDUSE device startup
> >>    vhost: add multiqueue support to VDUSE
> >>    vhost: add VDUSE device stop
> >>
> >>   doc/guides/prog_guide/vhost_lib.rst    |   4 +
> >>   doc/guides/rel_notes/release_23_07.rst |  12 +
> >>   drivers/net/vhost/rte_eth_vhost.c      |   3 +
> >>   lib/vhost/iotlb.c                      | 333 +++++++------
> >>   lib/vhost/iotlb.h                      |  45 +-
> >>   lib/vhost/meson.build                  |   5 +
> >>   lib/vhost/rte_vhost.h                  |  17 +
> >>   lib/vhost/socket.c                     |  72 ++-
> >>   lib/vhost/vduse.c                      | 646
> +++++++++++++++++++++++++
> >>   lib/vhost/vduse.h                      |  33 ++
> >>   lib/vhost/version.map                  |   1 +
> >>   lib/vhost/vhost.c                      |  70 ++-
> >>   lib/vhost/vhost.h                      |  57 ++-
> >>   lib/vhost/vhost_user.c                 |  51 +-
> >>   lib/vhost/vhost_user.h                 |   2 +-
> >>   lib/vhost/virtio_net_ctrl.c            | 286 +++++++++++
> >>   lib/vhost/virtio_net_ctrl.h            |  10 +
> >>   17 files changed, 1409 insertions(+), 238 deletions(-)
> >>   create mode 100644 lib/vhost/vduse.c
> >>   create mode 100644 lib/vhost/vduse.h
> >>   create mode 100644 lib/vhost/virtio_net_ctrl.c
> >>   create mode 100644 lib/vhost/virtio_net_ctrl.h
> >>
> >> --
> >> 2.40.1
> >


^ permalink raw reply	[flat|nested] 36+ messages in thread

* RE: [PATCH v5 24/26] vhost: add VDUSE device startup
  2023-06-06  8:18 ` [PATCH v5 24/26] vhost: add VDUSE device startup Maxime Coquelin
@ 2023-06-08  2:10   ` Xia, Chenbo
  0 siblings, 0 replies; 36+ messages in thread
From: Xia, Chenbo @ 2023-06-08  2:10 UTC (permalink / raw)
  To: Maxime Coquelin, dev, david.marchand, mkp, fbl, jasowang, Liang,
	Cunming, Xie, Yongji, echaudro, eperezma, amorenoz, lulu

> -----Original Message-----
> From: Maxime Coquelin <maxime.coquelin@redhat.com>
> Sent: Tuesday, June 6, 2023 4:19 PM
> To: dev@dpdk.org; Xia, Chenbo <chenbo.xia@intel.com>;
> david.marchand@redhat.com; mkp@redhat.com; fbl@redhat.com;
> jasowang@redhat.com; Liang, Cunming <cunming.liang@intel.com>; Xie, Yongji
> <xieyongji@bytedance.com>; echaudro@redhat.com; eperezma@redhat.com;
> amorenoz@redhat.com; lulu@redhat.com
> Cc: Maxime Coquelin <maxime.coquelin@redhat.com>
> Subject: [PATCH v5 24/26] vhost: add VDUSE device startup
> 
> This patch adds the device and its virtqueues
> initialization once the Virtio driver has set the DRIVER_OK
> in the Virtio status register.
> 
> Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
> ---
>  lib/vhost/vduse.c | 126 ++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 126 insertions(+)
> 
> diff --git a/lib/vhost/vduse.c b/lib/vhost/vduse.c
> index 110654ec68..ff4c54c321 100644
> --- a/lib/vhost/vduse.c
> +++ b/lib/vhost/vduse.c
> @@ -141,6 +141,128 @@ static struct vhost_backend_ops vduse_backend_ops =
> {
>  	.inject_irq = vduse_inject_irq,
>  };
> 
> +static void
> +vduse_vring_setup(struct virtio_net *dev, unsigned int index)
> +{
> +	struct vhost_virtqueue *vq = dev->virtqueue[index];
> +	struct vhost_vring_addr *ra = &vq->ring_addrs;
> +	struct vduse_vq_info vq_info;
> +	struct vduse_vq_eventfd vq_efd;
> +	int ret;
> +
> +	vq_info.index = index;
> +	ret = ioctl(dev->vduse_dev_fd, VDUSE_VQ_GET_INFO, &vq_info);
> +	if (ret) {
> +		VHOST_LOG_CONFIG(dev->ifname, ERR, "Failed to get VQ %u
> info: %s\n",
> +				index, strerror(errno));
> +		return;
> +	}
> +
> +	VHOST_LOG_CONFIG(dev->ifname, INFO, "VQ %u info:\n", index);
> +	VHOST_LOG_CONFIG(dev->ifname, INFO, "\tnum: %u\n", vq_info.num);
> +	VHOST_LOG_CONFIG(dev->ifname, INFO, "\tdesc_addr: %llx\n",
> vq_info.desc_addr);
> +	VHOST_LOG_CONFIG(dev->ifname, INFO, "\tdriver_addr: %llx\n",
> vq_info.driver_addr);
> +	VHOST_LOG_CONFIG(dev->ifname, INFO, "\tdevice_addr: %llx\n",
> vq_info.device_addr);
> +	VHOST_LOG_CONFIG(dev->ifname, INFO, "\tavail_idx: %u\n",
> vq_info.split.avail_index);
> +	VHOST_LOG_CONFIG(dev->ifname, INFO, "\tready: %u\n", vq_info.ready);
> +
> +	vq->last_avail_idx = vq_info.split.avail_index;
> +	vq->size = vq_info.num;
> +	vq->ready = true;
> +	vq->enabled = vq_info.ready;
> +	ra->desc_user_addr = vq_info.desc_addr;
> +	ra->avail_user_addr = vq_info.driver_addr;
> +	ra->used_user_addr = vq_info.device_addr;
> +
> +	vq->kickfd = eventfd(0, EFD_NONBLOCK | EFD_CLOEXEC);
> +	if (vq->kickfd < 0) {
> +		VHOST_LOG_CONFIG(dev->ifname, ERR, "Failed to init kickfd for
> VQ %u: %s\n",
> +				index, strerror(errno));
> +		vq->kickfd = VIRTIO_INVALID_EVENTFD;
> +		return;
> +	}
> +	VHOST_LOG_CONFIG(dev->ifname, INFO, "\tkick fd: %d\n", vq->kickfd);
> +
> +	vq->shadow_used_split = rte_malloc_socket(NULL,
> +				vq->size * sizeof(struct vring_used_elem),
> +				RTE_CACHE_LINE_SIZE, 0);
> +	vq->batch_copy_elems = rte_malloc_socket(NULL,
> +				vq->size * sizeof(struct batch_copy_elem),
> +				RTE_CACHE_LINE_SIZE, 0);
> +
> +	vhost_user_iotlb_rd_lock(vq);
> +	if (vring_translate(dev, vq))
> +		VHOST_LOG_CONFIG(dev->ifname, ERR, "Failed to translate
> vring %d addresses\n",
> +				index);
> +
> +	if (vhost_enable_guest_notification(dev, vq, 0))
> +		VHOST_LOG_CONFIG(dev->ifname, ERR,
> +				"Failed to disable guest notifications on
> vring %d\n",
> +				index);
> +	vhost_user_iotlb_rd_unlock(vq);
> +
> +	vq_efd.index = index;
> +	vq_efd.fd = vq->kickfd;
> +
> +	ret = ioctl(dev->vduse_dev_fd, VDUSE_VQ_SETUP_KICKFD, &vq_efd);
> +	if (ret) {
> +		VHOST_LOG_CONFIG(dev->ifname, ERR, "Failed to setup kickfd for
> VQ %u: %s\n",
> +				index, strerror(errno));
> +		close(vq->kickfd);
> +		vq->kickfd = VIRTIO_UNINITIALIZED_EVENTFD;
> +		return;
> +	}
> +}
> +
> +static void
> +vduse_device_start(struct virtio_net *dev)
> +{
> +	unsigned int i, ret;
> +
> +	VHOST_LOG_CONFIG(dev->ifname, INFO, "Starting device...\n");
> +
> +	dev->notify_ops = vhost_driver_callback_get(dev->ifname);
> +	if (!dev->notify_ops) {
> +		VHOST_LOG_CONFIG(dev->ifname, ERR,
> +				"Failed to get callback ops for driver\n");
> +		return;
> +	}
> +
> +	ret = ioctl(dev->vduse_dev_fd, VDUSE_DEV_GET_FEATURES, &dev-
> >features);
> +	if (ret) {
> +		VHOST_LOG_CONFIG(dev->ifname, ERR, "Failed to get
> features: %s\n",
> +				strerror(errno));
> +		return;
> +	}
> +
> +	VHOST_LOG_CONFIG(dev->ifname, INFO, "Negotiated Virtio features:
> 0x%" PRIx64 "\n",
> +		dev->features);
> +
> +	if (dev->features &
> +		((1ULL << VIRTIO_NET_F_MRG_RXBUF) |
> +		 (1ULL << VIRTIO_F_VERSION_1) |
> +		 (1ULL << VIRTIO_F_RING_PACKED))) {
> +		dev->vhost_hlen = sizeof(struct virtio_net_hdr_mrg_rxbuf);
> +	} else {
> +		dev->vhost_hlen = sizeof(struct virtio_net_hdr);
> +	}
> +
> +	for (i = 0; i < dev->nr_vring; i++)
> +		vduse_vring_setup(dev, i);
> +
> +	dev->flags |= VIRTIO_DEV_READY;
> +
> +	if (dev->notify_ops->new_device(dev->vid) == 0)
> +		dev->flags |= VIRTIO_DEV_RUNNING;
> +
> +	for (i = 0; i < dev->nr_vring; i++) {
> +		struct vhost_virtqueue *vq = dev->virtqueue[i];
> +
> +		if (dev->notify_ops->vring_state_changed)
> +			dev->notify_ops->vring_state_changed(dev->vid, i, vq-
> >enabled);
> +	}
> +}
> +
>  static void
>  vduse_events_handler(int fd, void *arg, int *remove __rte_unused)
>  {
> @@ -199,6 +321,10 @@ vduse_events_handler(int fd, void *arg, int *remove
> __rte_unused)
>  				strerror(errno));
>  		return;
>  	}
> +
> +	if (dev->status & VIRTIO_DEVICE_STATUS_DRIVER_OK)
> +		vduse_device_start(dev);
> +
>  	VHOST_LOG_CONFIG(dev->ifname, INFO, "Request %s (%u) handled
> successfully\n",
>  			vduse_req_id_to_str(req.type), req.type);
>  }
> --
> 2.40.1

Reviewed-by: Chenbo Xia <chenbo.xia@intel.com> 

^ permalink raw reply	[flat|nested] 36+ messages in thread

* RE: [PATCH v5 26/26] vhost: add VDUSE device stop
  2023-06-06  8:18 ` [PATCH v5 26/26] vhost: add VDUSE device stop Maxime Coquelin
@ 2023-06-08  2:11   ` Xia, Chenbo
  0 siblings, 0 replies; 36+ messages in thread
From: Xia, Chenbo @ 2023-06-08  2:11 UTC (permalink / raw)
  To: Maxime Coquelin, dev, david.marchand, mkp, fbl, jasowang, Liang,
	Cunming, Xie, Yongji, echaudro, eperezma, amorenoz, lulu

> -----Original Message-----
> From: Maxime Coquelin <maxime.coquelin@redhat.com>
> Sent: Tuesday, June 6, 2023 4:19 PM
> To: dev@dpdk.org; Xia, Chenbo <chenbo.xia@intel.com>;
> david.marchand@redhat.com; mkp@redhat.com; fbl@redhat.com;
> jasowang@redhat.com; Liang, Cunming <cunming.liang@intel.com>; Xie, Yongji
> <xieyongji@bytedance.com>; echaudro@redhat.com; eperezma@redhat.com;
> amorenoz@redhat.com; lulu@redhat.com
> Cc: Maxime Coquelin <maxime.coquelin@redhat.com>
> Subject: [PATCH v5 26/26] vhost: add VDUSE device stop
> 
> This patch adds VDUSE device stop and cleanup of its
> virtqueues.
> 
> Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
> ---
>  doc/guides/rel_notes/release_23_07.rst |  7 +++
>  lib/vhost/vduse.c                      | 72 +++++++++++++++++++++++---
>  2 files changed, 71 insertions(+), 8 deletions(-)
> 
> diff --git a/doc/guides/rel_notes/release_23_07.rst
> b/doc/guides/rel_notes/release_23_07.rst
> index 7034fb664c..0f44c859da 100644
> --- a/doc/guides/rel_notes/release_23_07.rst
> +++ b/doc/guides/rel_notes/release_23_07.rst
> @@ -67,6 +67,13 @@ New Features
>    Introduced ``rte_vhost_driver_set_max_queue_num()`` to be able to limit
> the
>    maximum number of supported queue pairs, required for VDUSE support.
> 
> +* **Added VDUSE support into Vhost library.**
> +
> +  VDUSE aims at implementing vDPA devices in userspace. It can be used as
> an
> +  alternative to Vhost-user when using Vhost-vDPA, but also enable
> providing a
> +  virtio-net netdev to the host when using Virtio-vDPA driver. A
> limitation in
> +  this release is the lack of reconnection support.
> +
> 
>  Removed Items
>  -------------
> diff --git a/lib/vhost/vduse.c b/lib/vhost/vduse.c
> index d3759077ff..a509daf80c 100644
> --- a/lib/vhost/vduse.c
> +++ b/lib/vhost/vduse.c
> @@ -252,6 +252,44 @@ vduse_vring_setup(struct virtio_net *dev, unsigned
> int index)
>  	}
>  }
> 
> +static void
> +vduse_vring_cleanup(struct virtio_net *dev, unsigned int index)
> +{
> +	struct vhost_virtqueue *vq = dev->virtqueue[index];
> +	struct vduse_vq_eventfd vq_efd;
> +	int ret;
> +
> +	if (vq == dev->cvq && vq->kickfd >= 0) {
> +		fdset_del(&vduse.fdset, vq->kickfd);
> +		fdset_pipe_notify(&vduse.fdset);
> +	}
> +
> +	vq_efd.index = index;
> +	vq_efd.fd = VDUSE_EVENTFD_DEASSIGN;
> +
> +	ret = ioctl(dev->vduse_dev_fd, VDUSE_VQ_SETUP_KICKFD, &vq_efd);
> +	if (ret)
> +		VHOST_LOG_CONFIG(dev->ifname, ERR, "Failed to cleanup kickfd
> for VQ %u: %s\n",
> +				index, strerror(errno));
> +
> +	close(vq->kickfd);
> +	vq->kickfd = VIRTIO_UNINITIALIZED_EVENTFD;
> +
> +	vring_invalidate(dev, vq);
> +
> +	rte_free(vq->batch_copy_elems);
> +	vq->batch_copy_elems = NULL;
> +
> +	rte_free(vq->shadow_used_split);
> +	vq->shadow_used_split = NULL;
> +
> +	vq->enabled = false;
> +	vq->ready = false;
> +	vq->size = 0;
> +	vq->last_used_idx = 0;
> +	vq->last_avail_idx = 0;
> +}
> +
>  static void
>  vduse_device_start(struct virtio_net *dev)
>  {
> @@ -304,6 +342,23 @@ vduse_device_start(struct virtio_net *dev)
>  	}
>  }
> 
> +static void
> +vduse_device_stop(struct virtio_net *dev)
> +{
> +	unsigned int i;
> +
> +	VHOST_LOG_CONFIG(dev->ifname, INFO, "Stopping device...\n");
> +
> +	vhost_destroy_device_notify(dev);
> +
> +	dev->flags &= ~VIRTIO_DEV_READY;
> +
> +	for (i = 0; i < dev->nr_vring; i++)
> +		vduse_vring_cleanup(dev, i);
> +
> +	vhost_user_iotlb_flush_all(dev);
> +}
> +
>  static void
>  vduse_events_handler(int fd, void *arg, int *remove __rte_unused)
>  {
> @@ -311,6 +366,7 @@ vduse_events_handler(int fd, void *arg, int *remove
> __rte_unused)
>  	struct vduse_dev_request req;
>  	struct vduse_dev_response resp;
>  	struct vhost_virtqueue *vq;
> +	uint8_t old_status = dev->status;
>  	int ret;
> 
>  	memset(&resp, 0, sizeof(resp));
> @@ -339,6 +395,7 @@ vduse_events_handler(int fd, void *arg, int *remove
> __rte_unused)
>  	case VDUSE_SET_STATUS:
>  		VHOST_LOG_CONFIG(dev->ifname, INFO, "\tnew status: 0x%08x\n",
>  				req.s.status);
> +		old_status = dev->status;
>  		dev->status = req.s.status;
>  		resp.result = VDUSE_REQ_RESULT_OK;
>  		break;
> @@ -363,8 +420,12 @@ vduse_events_handler(int fd, void *arg, int *remove
> __rte_unused)
>  		return;
>  	}
> 
> -	if (dev->status & VIRTIO_DEVICE_STATUS_DRIVER_OK)
> -		vduse_device_start(dev);
> +	if ((old_status ^ dev->status) & VIRTIO_DEVICE_STATUS_DRIVER_OK) {
> +		if (dev->status & VIRTIO_DEVICE_STATUS_DRIVER_OK)
> +			vduse_device_start(dev);
> +		else
> +			vduse_device_stop(dev);
> +	}
> 
>  	VHOST_LOG_CONFIG(dev->ifname, INFO, "Request %s (%u) handled
> successfully\n",
>  			vduse_req_id_to_str(req.type), req.type);
> @@ -560,12 +621,7 @@ vduse_device_destroy(const char *path)
>  	if (vid == RTE_MAX_VHOST_DEVICE)
>  		return -1;
> 
> -	if (dev->cvq && dev->cvq->kickfd >= 0) {
> -		fdset_del(&vduse.fdset, dev->cvq->kickfd);
> -		fdset_pipe_notify(&vduse.fdset);
> -		close(dev->cvq->kickfd);
> -		dev->cvq->kickfd = VIRTIO_UNINITIALIZED_EVENTFD;
> -	}
> +	vduse_device_stop(dev);
> 
>  	fdset_del(&vduse.fdset, dev->vduse_dev_fd);
>  	fdset_pipe_notify(&vduse.fdset);
> --
> 2.40.1

Reviewed-by: Chenbo Xia <chenbo.xia@intel.com> 

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v5 00/26] Add VDUSE support to Vhost library
  2023-06-07  8:05 ` David Marchand
@ 2023-06-08  9:17   ` Maxime Coquelin
  2023-06-08 12:44     ` David Marchand
  0 siblings, 1 reply; 36+ messages in thread
From: Maxime Coquelin @ 2023-06-08  9:17 UTC (permalink / raw)
  To: David Marchand
  Cc: dev, chenbo.xia, mkp, fbl, jasowang, cunming.liang, xieyongji,
	echaudro, eperezma, amorenoz, lulu



On 6/7/23 10:05, David Marchand wrote:
> On Tue, Jun 6, 2023 at 10:19 AM Maxime Coquelin
> <maxime.coquelin@redhat.com> wrote:
>>
>> This series introduces a new type of backend, VDUSE,
>> to the Vhost library.
>>
>> VDUSE stands for vDPA device in Userspace, it enables
>> implementing a Virtio device in userspace and have it
>> attached to the Kernel vDPA bus.
>>
>> Once attached to the vDPA bus, it can be used either by
>> Kernel Virtio drivers, like virtio-net in our case, via
>> the virtio-vdpa driver. Doing that, the device is visible
>> to the Kernel networking stack and is exposed to userspace
>> as a regular netdev.
>>
>> It can also be exposed to userspace thanks to the
>> vhost-vdpa driver, via a vhost-vdpa chardev that can be
>> passed to QEMU or Virtio-user PMD.
>>
>> While VDUSE support is already available in upstream
>> Kernel, a couple of patches are required to support
>> network device type:
>>
>> https://gitlab.com/mcoquelin/linux/-/tree/vduse_networking_rfc
>>
>> In order to attach the created VDUSE device to the vDPA
>> bus, a recent iproute2 version containing the vdpa tool is
>> required.
>>
>> Benchmark results:
>> ==================
>>
>> On v2 of this series, the PVP reference benchmark was run and compared
>> with Vhost-user.
>>
>> When doing macswap forwarding in the workload, no difference is seen.
>> When doing io forwarding in the workload, we see a 4% performance
>> degradation with VDUSE, compared to Vhost-user/Virtio-user. It is
>> explained by the use of the IOTLB layer in the Vhost library when using
>> VDUSE, whereas Vhost-user/Virtio-user does not make use of it.
>>
>> Usage:
>> ======
>>
>> 1. Probe required Kernel modules
>> # modprobe vdpa
>> # modprobe vduse
>> # modprobe virtio-vdpa
>>
>> 2. Build (requires vduse kernel headers to be available)
>> # meson build
>> # ninja -C build
>>
>> 3. Create a VDUSE device (vduse0) using Vhost PMD with
>> testpmd (with 4 queue pairs in this example)
>> # ./build/app/dpdk-testpmd --no-pci --vdev=net_vhost0,iface=/dev/vduse/vduse0,queues=4 --log-level=*:9  -- -i --txq=4 --rxq=4
> 
> 9 is a nice but undefined value. 8 is enough.
> In general, I prefer "human readable" strings, like *:debug ;-).
> 
> 
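[Editor's note] Applying the symbolic-level suggestion above, the step-3 invocation from the cover letter would read as below. This is only a sketch: the command is echoed rather than executed, since actually running testpmd requires a DPDK build, VDUSE kernel support, and root privileges.

```shell
# Hypothetical rework of the cover letter's step 3: same arguments, but
# with the symbolic log level "*:debug" in place of the undefined
# numeric "*:9". Echoed only; see the note above.
CMD="./build/app/dpdk-testpmd --no-pci \
  --vdev=net_vhost0,iface=/dev/vduse/vduse0,queues=4 \
  --log-level=*:debug -- -i --txq=4 --rxq=4"
echo "$CMD"
```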
>>
>> 4. Attach the VDUSE device to the vDPA bus
>> # vdpa dev add name vduse0 mgmtdev vduse
>> => The virtio-net netdev shows up (eth0 here)
>> # ip l show eth0
>> 21: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
>>      link/ether c2:73:ea:a7:68:6d brd ff:ff:ff:ff:ff:ff
>>
>> 5. Start/stop traffic in testpmd
>> testpmd> start
>> testpmd> show port stats 0
>>    ######################## NIC statistics for port 0  ########################
>>    RX-packets: 11         RX-missed: 0          RX-bytes:  1482
>>    RX-errors: 0
>>    RX-nombuf:  0
>>    TX-packets: 1          TX-errors: 0          TX-bytes:  62
>>
>>    Throughput (since last show)
>>    Rx-pps:            0          Rx-bps:            0
>>    Tx-pps:            0          Tx-bps:            0
>>    ############################################################################
>> testpmd> stop
>>
>> 6. Detach the VDUSE device from the vDPA bus
>> # vdpa dev del vduse0
>>
>> 7. Quit testpmd
>> testpmd> quit
>>
>> Known issues & remaining work:
>> ==============================
>> - Fix issue in FD manager (still polling while FD has been removed)
>> - Add Netlink support in Vhost library
>> - Support device reconnection
>>   -> a temporary patch to support reconnection via a tmpfs file is available;
>>      the upstream solution will be in-kernel and is being developed.
>>   -> https://gitlab.com/mcoquelin/dpdk-next-virtio/-/commit/5ad06ce14159a9ce36ee168dd13ef389cec91137
>> - Support packed ring
>> - Provide more performance benchmark results
> 
> We are missing a reference to the kernel patches required to have
> vduse accept net devices.

Right, I mention it in the cover letter, but it should be in the release
note also. I propose to append this to the release note:
"While VDUSE support is already available in upstream Kernel, a couple
of patches are required to support network device type, which are being
upstreamed: 
https://lore.kernel.org/all/20230419134329.346825-1-maxime.coquelin@redhat.com/"

Does that sound good to you?

Thanks,
Maxime

> 
> I had played with the patches at v1 and it was working ok.
> I did not review in depth the latest revisions, but I followed your
> series from the PoC/start.
> Overall, the series lgtm.
> 
> For the series,
> Acked-by: David Marchand <david.marchand@redhat.com>
> 
> 



* Re: [PATCH v5 00/26] Add VDUSE support to Vhost library
  2023-06-08  9:17   ` Maxime Coquelin
@ 2023-06-08 12:44     ` David Marchand
  0 siblings, 0 replies; 36+ messages in thread
From: David Marchand @ 2023-06-08 12:44 UTC (permalink / raw)
  To: Maxime Coquelin
  Cc: dev, chenbo.xia, mkp, fbl, jasowang, cunming.liang, xieyongji,
	echaudro, eperezma, amorenoz, lulu, Thomas Monjalon

On Thu, Jun 8, 2023 at 11:17 AM Maxime Coquelin
<maxime.coquelin@redhat.com> wrote:
> On 6/7/23 10:05, David Marchand wrote:
> > On Tue, Jun 6, 2023 at 10:19 AM Maxime Coquelin
> > <maxime.coquelin@redhat.com> wrote:
> >>
> >> This series introduces a new type of backend, VDUSE,
> >> to the Vhost library.
> >>
> >> VDUSE stands for vDPA device in Userspace; it enables
> >> implementing a Virtio device in userspace and having it
> >> attached to the Kernel vDPA bus.
> >>
> >> Once attached to the vDPA bus, it can be used either by
> >> Kernel Virtio drivers, like virtio-net in our case, via
> >> the virtio-vdpa driver. This way, the device is visible
> >> to the Kernel networking stack and is exposed to userspace
> >> as a regular netdev.
> >>
> >> It can also be exposed to userspace thanks to the
> >> vhost-vdpa driver, via a vhost-vdpa chardev that can be
> >> passed to QEMU or Virtio-user PMD.
> >>
> >> While VDUSE support is already available in upstream
> >> Kernel, a couple of patches are required to support
> >> network device type:
> >>
> >> https://gitlab.com/mcoquelin/linux/-/tree/vduse_networking_rfc
> >>
> >> In order to attach the created VDUSE device to the vDPA
> >> bus, a recent iproute2 version containing the vdpa tool is
> >> required.
> >>
> >> Benchmark results:
> >> ==================
> >>
> >> On v2 of this series, the PVP reference benchmark was run and compared
> >> with Vhost-user.
> >>
> >> When doing macswap forwarding in the workload, no difference is seen.
> >> When doing io forwarding in the workload, we see a 4% performance
> >> degradation with VDUSE, compared to Vhost-user/Virtio-user. It is
> >> explained by the use of the IOTLB layer in the Vhost library when using
> >> VDUSE, whereas Vhost-user/Virtio-user does not make use of it.
> >>
> >> Usage:
> >> ======
> >>
> >> 1. Probe required Kernel modules
> >> # modprobe vdpa
> >> # modprobe vduse
> >> # modprobe virtio-vdpa
> >>
> >> 2. Build (requires vduse kernel headers to be available)
> >> # meson build
> >> # ninja -C build
> >>
> >> 3. Create a VDUSE device (vduse0) using Vhost PMD with
> >> testpmd (with 4 queue pairs in this example)
> >> # ./build/app/dpdk-testpmd --no-pci --vdev=net_vhost0,iface=/dev/vduse/vduse0,queues=4 --log-level=*:9  -- -i --txq=4 --rxq=4
> >
> > 9 is a nice but undefined value. 8 is enough.
> > In general, I prefer "human readable" strings, like *:debug ;-).
> >
> >
> >>
> >> 4. Attach the VDUSE device to the vDPA bus
> >> # vdpa dev add name vduse0 mgmtdev vduse
> >> => The virtio-net netdev shows up (eth0 here)
> >> # ip l show eth0
> >> 21: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
> >>      link/ether c2:73:ea:a7:68:6d brd ff:ff:ff:ff:ff:ff
> >>
> >> 5. Start/stop traffic in testpmd
> >> testpmd> start
> >> testpmd> show port stats 0
> >>    ######################## NIC statistics for port 0  ########################
> >>    RX-packets: 11         RX-missed: 0          RX-bytes:  1482
> >>    RX-errors: 0
> >>    RX-nombuf:  0
> >>    TX-packets: 1          TX-errors: 0          TX-bytes:  62
> >>
> >>    Throughput (since last show)
> >>    Rx-pps:            0          Rx-bps:            0
> >>    Tx-pps:            0          Tx-bps:            0
> >>    ############################################################################
> >> testpmd> stop
> >>
> >> 6. Detach the VDUSE device from the vDPA bus
> >> # vdpa dev del vduse0
> >>
> >> 7. Quit testpmd
> >> testpmd> quit
> >>
> >> Known issues & remaining work:
> >> ==============================
> >> - Fix issue in FD manager (still polling while FD has been removed)
> >> - Add Netlink support in Vhost library
> >> - Support device reconnection
> >>   -> a temporary patch to support reconnection via a tmpfs file is available;
> >>      the upstream solution will be in-kernel and is being developed.
> >>   -> https://gitlab.com/mcoquelin/dpdk-next-virtio/-/commit/5ad06ce14159a9ce36ee168dd13ef389cec91137
> >> - Support packed ring
> >> - Provide more performance benchmark results
> >
> > We are missing a reference to the kernel patches required to have
> > vduse accept net devices.
>
> Right, I mention it in the cover letter, but it should be in the release
> note also. I propose to append this to the release note:
> "While VDUSE support is already available in upstream Kernel, a couple
> of patches are required to support network device type, which are being
> upstreamed:
> https://lore.kernel.org/all/20230419134329.346825-1-maxime.coquelin@redhat.com/"
>
> Does that sound good to you?

Ok for me.
Thanks.


-- 
David Marchand



* Re: [PATCH v5 00/26] Add VDUSE support to Vhost library
  2023-06-06  8:18 [PATCH v5 00/26] Add VDUSE support to Vhost library Maxime Coquelin
                   ` (27 preceding siblings ...)
  2023-06-07  8:05 ` David Marchand
@ 2023-06-08 14:29 ` Maxime Coquelin
  28 siblings, 0 replies; 36+ messages in thread
From: Maxime Coquelin @ 2023-06-08 14:29 UTC (permalink / raw)
  To: dev, chenbo.xia, david.marchand, mkp, fbl, jasowang,
	cunming.liang, xieyongji, echaudro, eperezma, amorenoz, lulu



On 6/6/23 10:18, Maxime Coquelin wrote:
> This series introduces a new type of backend, VDUSE,
> to the Vhost library.
> 
> VDUSE stands for vDPA device in Userspace; it enables
> implementing a Virtio device in userspace and having it
> attached to the Kernel vDPA bus.
> 
> Once attached to the vDPA bus, it can be used either by
> Kernel Virtio drivers, like virtio-net in our case, via
> the virtio-vdpa driver. This way, the device is visible
> to the Kernel networking stack and is exposed to userspace
> as a regular netdev.
> 
> It can also be exposed to userspace thanks to the
> vhost-vdpa driver, via a vhost-vdpa chardev that can be
> passed to QEMU or Virtio-user PMD.
> 
> While VDUSE support is already available in upstream
> Kernel, a couple of patches are required to support
> network device type:
> 
> https://gitlab.com/mcoquelin/linux/-/tree/vduse_networking_rfc
> 
> In order to attach the created VDUSE device to the vDPA
> bus, a recent iproute2 version containing the vdpa tool is
> required.
> 
> Benchmark results:
> ==================
> 
> On v2 of this series, the PVP reference benchmark was run and compared
> with Vhost-user.
> 
> When doing macswap forwarding in the workload, no difference is seen.
> When doing io forwarding in the workload, we see a 4% performance
> degradation with VDUSE, compared to Vhost-user/Virtio-user. It is
> explained by the use of the IOTLB layer in the Vhost library when using
> VDUSE, whereas Vhost-user/Virtio-user does not make use of it.
> 
> Usage:
> ======
> 
> 1. Probe required Kernel modules
> # modprobe vdpa
> # modprobe vduse
> # modprobe virtio-vdpa
> 
> 2. Build (requires vduse kernel headers to be available)
> # meson build
> # ninja -C build
> 
> 3. Create a VDUSE device (vduse0) using Vhost PMD with
> testpmd (with 4 queue pairs in this example)
> # ./build/app/dpdk-testpmd --no-pci --vdev=net_vhost0,iface=/dev/vduse/vduse0,queues=4 --log-level=*:9  -- -i --txq=4 --rxq=4
>   
> 4. Attach the VDUSE device to the vDPA bus
> # vdpa dev add name vduse0 mgmtdev vduse
> => The virtio-net netdev shows up (eth0 here)
> # ip l show eth0
> 21: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
>      link/ether c2:73:ea:a7:68:6d brd ff:ff:ff:ff:ff:ff
> 
> 5. Start/stop traffic in testpmd
> testpmd> start
> testpmd> show port stats 0
>    ######################## NIC statistics for port 0  ########################
>    RX-packets: 11         RX-missed: 0          RX-bytes:  1482
>    RX-errors: 0
>    RX-nombuf:  0
>    TX-packets: 1          TX-errors: 0          TX-bytes:  62
> 
>    Throughput (since last show)
>    Rx-pps:            0          Rx-bps:            0
>    Tx-pps:            0          Tx-bps:            0
>    ############################################################################
> testpmd> stop
> 
> 6. Detach the VDUSE device from the vDPA bus
> # vdpa dev del vduse0
> 
> 7. Quit testpmd
> testpmd> quit
> 
> Known issues & remaining work:
> ==============================
> - Fix issue in FD manager (still polling while FD has been removed)
> - Add Netlink support in Vhost library
> - Support device reconnection
>   -> a temporary patch to support reconnection via a tmpfs file is available;
>      the upstream solution will be in-kernel and is being developed.
>   -> https://gitlab.com/mcoquelin/dpdk-next-virtio/-/commit/5ad06ce14159a9ce36ee168dd13ef389cec91137
> - Support packed ring
> - Provide more performance benchmark results
> 
> Changes in v5:
> ==============
> - Delay starting/stopping the device to after having replied to the VDUSE
>    event in order to avoid a deadlock encountered when testing with OVS.
> - Mention reconnection support lack in the release note.
> 
> Changes in v4:
> ==============
> - Applied patch 1 and patch 2 from v3
> - Rebased on top of Eelco series
> - Fix coredump clear in IOTLB cache removal (David)
> - Remove unneeded ret variable in vhost_vring_inject_irq (David)
> - Fixed release note (David, Chenbo)
> 
> Changes in v2/v3:
> =================
> - Fixed mem_set_dump() parameter (patch 4)
> - Fixed accidental comment change (patch 7, Chenbo)
> - Change from __builtin_ctz to __builtin_ctzll (patch 9, Chenbo)
> - move change from patch 12 to 13 (Chenbo)
> - Enable locks annotation for control queue (Patch 17)
> - Send control queue notification when used descriptors enqueued (Patch 17)
> - Lock control queue IOTLB lock (Patch 17)
> - Fix error path in virtio_net_ctrl_pop() (Patch 17, Chenbo)
> - Set VDUSE dev FD as NONBLOCK (Patch 18)
> - Enable more Virtio features (Patch 18)
> - Remove calls to pthread_setcancelstate() (Patch 22)
> - Add calls to fdset_pipe_notify() when adding and deleting FDs from a set (Patch 22)
> - Use RTE_DIM() to get requests string array size (Patch 22)
> - Set reply result for IOTLB update message (Patch 25, Chenbo)
> - Fix queues enablement with multiqueue (Patch 26)
> - Move kickfd creation for better logging (Patch 26)
> - Improve logging (Patch 26)
> - Uninstall cvq kickfd in case of handler installation failure (Patch 27)
> - Enable CVQ notifications once handler is installed (Patch 27)
> - Don't advertise multiqueue and control queue if app only requests a single queue pair (Patch 27)
> - Add release notes
> 
> Maxime Coquelin (26):
>    vhost: fix IOTLB entries overlap check with previous entry
>    vhost: add helper of IOTLB entries coredump
>    vhost: add helper for IOTLB entries shared page check
>    vhost: don't dump unneeded pages with IOTLB
>    vhost: change to single IOTLB cache per device
>    vhost: add offset field to IOTLB entries
>    vhost: add page size info to IOTLB entry
>    vhost: retry translating IOVA after IOTLB miss
>    vhost: introduce backend ops
>    vhost: add IOTLB cache entry removal callback
>    vhost: add helper for IOTLB misses
>    vhost: add helper for interrupt injection
>    vhost: add API to set max queue pairs
>    net/vhost: use API to set max queue pairs
>    vhost: add control virtqueue support
>    vhost: add VDUSE device creation and destruction
>    vhost: add VDUSE callback for IOTLB miss
>    vhost: add VDUSE callback for IOTLB entry removal
>    vhost: add VDUSE callback for IRQ injection
>    vhost: add VDUSE events handler
>    vhost: add support for virtqueue state get event
>    vhost: add support for VDUSE status set event
>    vhost: add support for VDUSE IOTLB update event
>    vhost: add VDUSE device startup
>    vhost: add multiqueue support to VDUSE
>    vhost: add VDUSE device stop
> 
>   doc/guides/prog_guide/vhost_lib.rst    |   4 +
>   doc/guides/rel_notes/release_23_07.rst |  12 +
>   drivers/net/vhost/rte_eth_vhost.c      |   3 +
>   lib/vhost/iotlb.c                      | 333 +++++++------
>   lib/vhost/iotlb.h                      |  45 +-
>   lib/vhost/meson.build                  |   5 +
>   lib/vhost/rte_vhost.h                  |  17 +
>   lib/vhost/socket.c                     |  72 ++-
>   lib/vhost/vduse.c                      | 646 +++++++++++++++++++++++++
>   lib/vhost/vduse.h                      |  33 ++
>   lib/vhost/version.map                  |   1 +
>   lib/vhost/vhost.c                      |  70 ++-
>   lib/vhost/vhost.h                      |  57 ++-
>   lib/vhost/vhost_user.c                 |  51 +-
>   lib/vhost/vhost_user.h                 |   2 +-
>   lib/vhost/virtio_net_ctrl.c            | 286 +++++++++++
>   lib/vhost/virtio_net_ctrl.h            |  10 +
>   17 files changed, 1409 insertions(+), 238 deletions(-)
>   create mode 100644 lib/vhost/vduse.c
>   create mode 100644 lib/vhost/vduse.h
>   create mode 100644 lib/vhost/virtio_net_ctrl.c
>   create mode 100644 lib/vhost/virtio_net_ctrl.h
> 


Applied to dpdk-next-virtio/main.

Thanks,
Maxime

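[Editor's note] For reference, the usage steps quoted in the cover letter can be collected into one shell sketch. The device name (vduse0), queue count, and paths are taken from the example above; by default (DRY_RUN=1) the commands are only echoed rather than executed, since they require root, the patched kernel, and a DPDK build. testpmd itself is interactive, so in practice it runs in its own terminal.

```shell
#!/bin/sh
# Sketch of the VDUSE bring-up flow from the cover letter (steps 1, 3,
# 4 and 6). Set DRY_RUN=0 to actually execute; by default each command
# is echoed so the flow can be reviewed without privileges.
DRY_RUN=${DRY_RUN:-1}
run() {
    if [ "$DRY_RUN" -eq 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# 1. Probe the required kernel modules
run modprobe vdpa
run modprobe vduse
run modprobe virtio-vdpa

# 3. Create the VDUSE device (vduse0) with the Vhost PMD in testpmd.
#    Interactive: start/stop traffic and quit from the testpmd prompt
#    (steps 5 and 7); run this in a dedicated terminal. The symbolic
#    log level "*:debug" follows the suggestion made in this thread.
run ./build/app/dpdk-testpmd --no-pci \
    --vdev=net_vhost0,iface=/dev/vduse/vduse0,queues=4 \
    --log-level='*:debug' -- -i --txq=4 --rxq=4

# 4. Attach the VDUSE device to the vDPA bus; virtio-vdpa binds it and
#    a virtio-net netdev shows up in the kernel networking stack
run vdpa dev add name vduse0 mgmtdev vduse

# 6. Detach the VDUSE device from the vDPA bus when done
run vdpa dev del vduse0
```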


end of thread, other threads:[~2023-06-08 14:29 UTC | newest]

Thread overview: 36+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-06-06  8:18 [PATCH v5 00/26] Add VDUSE support to Vhost library Maxime Coquelin
2023-06-06  8:18 ` [PATCH v5 01/26] vhost: fix IOTLB entries overlap check with previous entry Maxime Coquelin
2023-06-06  8:18 ` [PATCH v5 02/26] vhost: add helper of IOTLB entries coredump Maxime Coquelin
2023-06-06  8:18 ` [PATCH v5 03/26] vhost: add helper for IOTLB entries shared page check Maxime Coquelin
2023-06-06  8:18 ` [PATCH v5 04/26] vhost: don't dump unneeded pages with IOTLB Maxime Coquelin
2023-06-06  8:18 ` [PATCH v5 05/26] vhost: change to single IOTLB cache per device Maxime Coquelin
2023-06-06  8:18 ` [PATCH v5 06/26] vhost: add offset field to IOTLB entries Maxime Coquelin
2023-06-06  8:18 ` [PATCH v5 07/26] vhost: add page size info to IOTLB entry Maxime Coquelin
2023-06-06  8:18 ` [PATCH v5 08/26] vhost: retry translating IOVA after IOTLB miss Maxime Coquelin
2023-06-06  8:18 ` [PATCH v5 09/26] vhost: introduce backend ops Maxime Coquelin
2023-06-06  8:18 ` [PATCH v5 10/26] vhost: add IOTLB cache entry removal callback Maxime Coquelin
2023-06-06  8:18 ` [PATCH v5 11/26] vhost: add helper for IOTLB misses Maxime Coquelin
2023-06-06  8:18 ` [PATCH v5 12/26] vhost: add helper for interrupt injection Maxime Coquelin
2023-06-06  8:18 ` [PATCH v5 13/26] vhost: add API to set max queue pairs Maxime Coquelin
2023-06-06  8:18 ` [PATCH v5 14/26] net/vhost: use " Maxime Coquelin
2023-06-06  8:18 ` [PATCH v5 15/26] vhost: add control virtqueue support Maxime Coquelin
2023-06-06  8:18 ` [PATCH v5 16/26] vhost: add VDUSE device creation and destruction Maxime Coquelin
2023-06-06  8:18 ` [PATCH v5 17/26] vhost: add VDUSE callback for IOTLB miss Maxime Coquelin
2023-06-06  8:18 ` [PATCH v5 18/26] vhost: add VDUSE callback for IOTLB entry removal Maxime Coquelin
2023-06-06  8:18 ` [PATCH v5 19/26] vhost: add VDUSE callback for IRQ injection Maxime Coquelin
2023-06-06  8:18 ` [PATCH v5 20/26] vhost: add VDUSE events handler Maxime Coquelin
2023-06-06  8:18 ` [PATCH v5 21/26] vhost: add support for virtqueue state get event Maxime Coquelin
2023-06-06  8:18 ` [PATCH v5 22/26] vhost: add support for VDUSE status set event Maxime Coquelin
2023-06-06  8:18 ` [PATCH v5 23/26] vhost: add support for VDUSE IOTLB update event Maxime Coquelin
2023-06-06  8:18 ` [PATCH v5 24/26] vhost: add VDUSE device startup Maxime Coquelin
2023-06-08  2:10   ` Xia, Chenbo
2023-06-06  8:18 ` [PATCH v5 25/26] vhost: add multiqueue support to VDUSE Maxime Coquelin
2023-06-06  8:18 ` [PATCH v5 26/26] vhost: add VDUSE device stop Maxime Coquelin
2023-06-08  2:11   ` Xia, Chenbo
2023-06-07  6:48 ` [PATCH v5 00/26] Add VDUSE support to Vhost library Xia, Chenbo
2023-06-07 14:58   ` Maxime Coquelin
2023-06-08  1:53     ` Xia, Chenbo
2023-06-07  8:05 ` David Marchand
2023-06-08  9:17   ` Maxime Coquelin
2023-06-08 12:44     ` David Marchand
2023-06-08 14:29 ` Maxime Coquelin
