DPDK patches and discussions
* [RFC 00/27] Add VDUSE support to Vhost library
@ 2023-03-31 15:42 Maxime Coquelin
  2023-03-31 15:42 ` [RFC 01/27] vhost: fix missing guest notif stat increment Maxime Coquelin
                   ` (29 more replies)
  0 siblings, 30 replies; 79+ messages in thread
From: Maxime Coquelin @ 2023-03-31 15:42 UTC (permalink / raw)
  To: dev, david.marchand, chenbo.xia, mkp, fbl, jasowang,
	cunming.liang, xieyongji, echaudro, eperezma, amorenoz
  Cc: Maxime Coquelin

This series introduces a new type of backend, VDUSE,
to the Vhost library.

VDUSE stands for vDPA device in Userspace. It enables
implementing a Virtio device in userspace and having it
attached to the Kernel vDPA bus.

Once attached to the vDPA bus, the device can either be
used by Kernel Virtio drivers, like virtio-net in our
case, via the virtio-vdpa driver. In that case, the
device is visible to the Kernel networking stack and is
exposed to userspace as a regular netdev.

It can also be exposed to userspace through the
vhost-vdpa driver, via a vhost-vdpa chardev that can be
passed to QEMU or to the Virtio-user PMD.

While VDUSE support is already available in the
upstream Kernel, a couple of patches are required to
support the network device type:

https://gitlab.com/mcoquelin/linux/-/tree/vduse_networking_poc

In order to attach the created VDUSE device to the vDPA
bus, a recent iproute2 version containing the vdpa tool is
required.

Usage:
======

1. Probe required Kernel modules
# modprobe vdpa
# modprobe vduse
# modprobe virtio-vdpa

2. Build (requires VDUSE kernel headers to be available)
# meson build
# ninja -C build

3. Create a VDUSE device (vduse0) using the Vhost PMD with
testpmd (with 4 queue pairs in this example)
# ./build/app/dpdk-testpmd --no-pci --vdev=net_vhost0,iface=/dev/vduse/vduse0,queues=4 --log-level=*:9  -- -i --txq=4 --rxq=4
 
4. Attach the VDUSE device to the vDPA bus
# vdpa dev add name vduse0 mgmtdev vduse
=> The virtio-net netdev shows up (eth0 here)
# ip l show eth0
21: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
    link/ether c2:73:ea:a7:68:6d brd ff:ff:ff:ff:ff:ff

5. Start/stop traffic in testpmd
testpmd> start
testpmd> show port stats 0
  ######################## NIC statistics for port 0  ########################
  RX-packets: 11         RX-missed: 0          RX-bytes:  1482
  RX-errors: 0
  RX-nombuf:  0
  TX-packets: 1          TX-errors: 0          TX-bytes:  62

  Throughput (since last show)
  Rx-pps:            0          Rx-bps:            0
  Tx-pps:            0          Tx-bps:            0
  ############################################################################
testpmd> stop

6. Detach the VDUSE device from the vDPA bus
# vdpa dev del vduse0

7. Quit testpmd
testpmd> quit

Known issues & remaining work:
==============================
- Fix issue in FD manager (still polling after the FD has been removed)
- Add Netlink support in Vhost library
- Support device reconnection
- Support packed ring
- Enable & test more Virtio features
- Provide performance benchmark results


Maxime Coquelin (27):
  vhost: fix missing guest notif stat increment
  vhost: fix invalid call FD handling
  vhost: fix IOTLB entries overlap check with previous entry
  vhost: add helper of IOTLB entries coredump
  vhost: add helper for IOTLB entries shared page check
  vhost: don't dump unneeded pages with IOTLB
  vhost: change to single IOTLB cache per device
  vhost: add offset field to IOTLB entries
  vhost: add page size info to IOTLB entry
  vhost: retry translating IOVA after IOTLB miss
  vhost: introduce backend ops
  vhost: add IOTLB cache entry removal callback
  vhost: add helper for IOTLB misses
  vhost: add helper for interrupt injection
  vhost: add API to set max queue pairs
  net/vhost: use API to set max queue pairs
  vhost: add control virtqueue support
  vhost: add VDUSE device creation and destruction
  vhost: add VDUSE callback for IOTLB miss
  vhost: add VDUSE callback for IOTLB entry removal
  vhost: add VDUSE callback for IRQ injection
  vhost: add VDUSE events handler
  vhost: add support for virtqueue state get event
  vhost: add support for VDUSE status set event
  vhost: add support for VDUSE IOTLB update event
  vhost: add VDUSE device startup
  vhost: add multiqueue support to VDUSE

 doc/guides/prog_guide/vhost_lib.rst |   4 +
 drivers/net/vhost/rte_eth_vhost.c   |   3 +
 lib/vhost/iotlb.c                   | 333 +++++++++--------
 lib/vhost/iotlb.h                   |  45 ++-
 lib/vhost/meson.build               |   5 +
 lib/vhost/rte_vhost.h               |  17 +
 lib/vhost/socket.c                  |  72 +++-
 lib/vhost/vduse.c                   | 553 ++++++++++++++++++++++++++++
 lib/vhost/vduse.h                   |  33 ++
 lib/vhost/version.map               |   3 +
 lib/vhost/vhost.c                   |  51 ++-
 lib/vhost/vhost.h                   |  90 +++--
 lib/vhost/vhost_user.c              |  53 ++-
 lib/vhost/vhost_user.h              |   2 +-
 lib/vhost/virtio_net_ctrl.c         | 282 ++++++++++++++
 lib/vhost/virtio_net_ctrl.h         |  10 +
 16 files changed, 1317 insertions(+), 239 deletions(-)
 create mode 100644 lib/vhost/vduse.c
 create mode 100644 lib/vhost/vduse.h
 create mode 100644 lib/vhost/virtio_net_ctrl.c
 create mode 100644 lib/vhost/virtio_net_ctrl.h

-- 
2.39.2



* [RFC 01/27] vhost: fix missing guest notif stat increment
  2023-03-31 15:42 [RFC 00/27] Add VDUSE support to Vhost library Maxime Coquelin
@ 2023-03-31 15:42 ` Maxime Coquelin
  2023-04-24  2:57   ` Xia, Chenbo
  2023-03-31 15:42 ` [RFC 02/27] vhost: fix invalid call FD handling Maxime Coquelin
                   ` (28 subsequent siblings)
  29 siblings, 1 reply; 79+ messages in thread
From: Maxime Coquelin @ 2023-03-31 15:42 UTC (permalink / raw)
  To: dev, david.marchand, chenbo.xia, mkp, fbl, jasowang,
	cunming.liang, xieyongji, echaudro, eperezma, amorenoz
  Cc: Maxime Coquelin, stable

The guest notification counter was only incremented for
the split ring; this patch also increments it for the
packed ring.

Fixes: 1ea74efd7fa4 ("vhost: add statistics for guest notification")
Cc: stable@dpdk.org

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
---
 lib/vhost/vhost.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/lib/vhost/vhost.h b/lib/vhost/vhost.h
index 8fdab13c70..8554ab4002 100644
--- a/lib/vhost/vhost.h
+++ b/lib/vhost/vhost.h
@@ -973,6 +973,8 @@ vhost_vring_call_packed(struct virtio_net *dev, struct vhost_virtqueue *vq)
 kick:
 	if (kick) {
 		eventfd_write(vq->callfd, (eventfd_t)1);
+		if (dev->flags & VIRTIO_DEV_STATS_ENABLED)
+			vq->stats.guest_notifications++;
 		if (dev->notify_ops->guest_notified)
 			dev->notify_ops->guest_notified(dev->vid);
 	}
-- 
2.39.2



* [RFC 02/27] vhost: fix invalid call FD handling
  2023-03-31 15:42 [RFC 00/27] Add VDUSE support to Vhost library Maxime Coquelin
  2023-03-31 15:42 ` [RFC 01/27] vhost: fix missing guest notif stat increment Maxime Coquelin
@ 2023-03-31 15:42 ` Maxime Coquelin
  2023-04-24  2:58   ` Xia, Chenbo
  2023-03-31 15:42 ` [RFC 03/27] vhost: fix IOTLB entries overlap check with previous entry Maxime Coquelin
                   ` (27 subsequent siblings)
  29 siblings, 1 reply; 79+ messages in thread
From: Maxime Coquelin @ 2023-03-31 15:42 UTC (permalink / raw)
  To: dev, david.marchand, chenbo.xia, mkp, fbl, jasowang,
	cunming.liang, xieyongji, echaudro, eperezma, amorenoz
  Cc: Maxime Coquelin, stable

This patch fixes cases where IRQ injection is attempted
while the call FD is not valid, which should not happen.
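
As an illustration, here is a simplified standalone
sketch (not the library code) of the predicate change in
the split ring path: the call FD validity check must
guard both conditions, otherwise eventfd_write() could
be called on an invalid FD (e.g. -1) whenever
signalled_used_valid is false.

#include <stdbool.h>
#include <stdio.h>

static bool need_event = false;
static bool signalled_used_valid = false;
static int callfd = -1; /* invalid call FD */

static bool should_kick_old(void)
{
	/* Old logic: the FD check only guards the need_event branch */
	return (need_event && callfd >= 0) || !signalled_used_valid;
}

static bool should_kick_new(void)
{
	/* Fixed logic: never kick when the call FD is invalid */
	return (need_event || !signalled_used_valid) && callfd >= 0;
}

int main(void)
{
	printf("old: kick=%d (would write to FD -1)\n", should_kick_old());
	printf("new: kick=%d\n", should_kick_new());
	return 0;
}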

Fixes: b1cce26af1dc ("vhost: add notification for packed ring")
Fixes: e37ff954405a ("vhost: support virtqueue interrupt/notification suppression")
Cc: stable@dpdk.org

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
---
 lib/vhost/vhost.h | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/lib/vhost/vhost.h b/lib/vhost/vhost.h
index 8554ab4002..40863f7bfd 100644
--- a/lib/vhost/vhost.h
+++ b/lib/vhost/vhost.h
@@ -902,9 +902,9 @@ vhost_vring_call_split(struct virtio_net *dev, struct vhost_virtqueue *vq)
 			"%s: used_event_idx=%d, old=%d, new=%d\n",
 			__func__, vhost_used_event(vq), old, new);
 
-		if ((vhost_need_event(vhost_used_event(vq), new, old) &&
-					(vq->callfd >= 0)) ||
-				unlikely(!signalled_used_valid)) {
+		if ((vhost_need_event(vhost_used_event(vq), new, old) ||
+					unlikely(!signalled_used_valid)) &&
+				vq->callfd >= 0) {
 			eventfd_write(vq->callfd, (eventfd_t) 1);
 			if (dev->flags & VIRTIO_DEV_STATS_ENABLED)
 				vq->stats.guest_notifications++;
@@ -971,7 +971,7 @@ vhost_vring_call_packed(struct virtio_net *dev, struct vhost_virtqueue *vq)
 	if (vhost_need_event(off, new, old))
 		kick = true;
 kick:
-	if (kick) {
+	if (kick && vq->callfd >= 0) {
 		eventfd_write(vq->callfd, (eventfd_t)1);
 		if (dev->flags & VIRTIO_DEV_STATS_ENABLED)
 			vq->stats.guest_notifications++;
-- 
2.39.2



* [RFC 03/27] vhost: fix IOTLB entries overlap check with previous entry
  2023-03-31 15:42 [RFC 00/27] Add VDUSE support to Vhost library Maxime Coquelin
  2023-03-31 15:42 ` [RFC 01/27] vhost: fix missing guest notif stat increment Maxime Coquelin
  2023-03-31 15:42 ` [RFC 02/27] vhost: fix invalid call FD handling Maxime Coquelin
@ 2023-03-31 15:42 ` Maxime Coquelin
  2023-04-17 19:15   ` Mike Pattrick
  2023-04-24  2:58   ` Xia, Chenbo
  2023-03-31 15:42 ` [RFC 04/27] vhost: add helper of IOTLB entries coredump Maxime Coquelin
                   ` (26 subsequent siblings)
  29 siblings, 2 replies; 79+ messages in thread
From: Maxime Coquelin @ 2023-03-31 15:42 UTC (permalink / raw)
  To: dev, david.marchand, chenbo.xia, mkp, fbl, jasowang,
	cunming.liang, xieyongji, echaudro, eperezma, amorenoz
  Cc: Maxime Coquelin, stable

Commit 22b6d0ac691a ("vhost: fix madvise IOTLB entries pages overlap check")
fixed the check to ensure the entry to be removed does not
overlap with the next one in the IOTLB cache before marking
it as DONTDUMP with madvise(). This is not enough, because
the same issue is present when comparing with the previous
entry in the cache, where the end address of the previous
entry should be used, not the start one.
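
As a worked example (hypothetical addresses, 4 KiB
pages, not the patch code), the previous entry can end
inside the page where the removed entry starts, even if
it starts in a different page:

#include <assert.h>
#include <stdint.h>

int main(void)
{
	const uint64_t align = 4096, mask = ~(align - 1);
	/* Previous entry: [0x1000, 0x37ff], i.e. it ends in page 0x3000 */
	uint64_t prev_uaddr = 0x1000, prev_size = 0x2800;
	/* Entry being removed: starts at 0x3800, also in page 0x3000 */
	uint64_t node_uaddr = 0x3800;

	/* Old check compared the page of the node start with the page of
	 * the previous entry's *start*: the pages differ (0x3000 vs 0x1000),
	 * so the shared page 0x3000 would wrongly be marked DONTDUMP.
	 */
	assert((node_uaddr & mask) != (prev_uaddr & mask));

	/* Fixed check compares with the page of the previous entry's *end*
	 * address: both are 0x3000, so the shared page keeps being dumped.
	 */
	assert((node_uaddr & mask) == ((prev_uaddr + prev_size - 1) & mask));

	return 0;
}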

Fixes: dea092d0addb ("vhost: fix madvise arguments alignment")
Cc: stable@dpdk.org

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
---
 lib/vhost/iotlb.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/lib/vhost/iotlb.c b/lib/vhost/iotlb.c
index 3f45bc6061..870c8acb88 100644
--- a/lib/vhost/iotlb.c
+++ b/lib/vhost/iotlb.c
@@ -178,8 +178,8 @@ vhost_user_iotlb_cache_random_evict(struct virtio_net *dev, struct vhost_virtque
 			mask = ~(alignment - 1);
 
 			/* Don't disable coredump if the previous node is in the same page */
-			if (prev_node == NULL ||
-					(node->uaddr & mask) != (prev_node->uaddr & mask)) {
+			if (prev_node == NULL || (node->uaddr & mask) !=
+					((prev_node->uaddr + prev_node->size - 1) & mask)) {
 				next_node = RTE_TAILQ_NEXT(node, next);
 				/* Don't disable coredump if the next node is in the same page */
 				if (next_node == NULL || ((node->uaddr + node->size - 1) & mask) !=
@@ -283,8 +283,8 @@ vhost_user_iotlb_cache_remove(struct virtio_net *dev, struct vhost_virtqueue *vq
 			mask = ~(alignment-1);
 
 			/* Don't disable coredump if the previous node is in the same page */
-			if (prev_node == NULL ||
-					(node->uaddr & mask) != (prev_node->uaddr & mask)) {
+			if (prev_node == NULL || (node->uaddr & mask) !=
+					((prev_node->uaddr + prev_node->size - 1) & mask)) {
 				next_node = RTE_TAILQ_NEXT(node, next);
 				/* Don't disable coredump if the next node is in the same page */
 				if (next_node == NULL || ((node->uaddr + node->size - 1) & mask) !=
-- 
2.39.2



* [RFC 04/27] vhost: add helper of IOTLB entries coredump
  2023-03-31 15:42 [RFC 00/27] Add VDUSE support to Vhost library Maxime Coquelin
                   ` (2 preceding siblings ...)
  2023-03-31 15:42 ` [RFC 03/27] vhost: fix IOTLB entries overlap check with previous entry Maxime Coquelin
@ 2023-03-31 15:42 ` Maxime Coquelin
  2023-04-24  2:59   ` Xia, Chenbo
  2023-03-31 15:42 ` [RFC 05/27] vhost: add helper for IOTLB entries shared page check Maxime Coquelin
                   ` (25 subsequent siblings)
  29 siblings, 1 reply; 79+ messages in thread
From: Maxime Coquelin @ 2023-03-31 15:42 UTC (permalink / raw)
  To: dev, david.marchand, chenbo.xia, mkp, fbl, jasowang,
	cunming.liang, xieyongji, echaudro, eperezma, amorenoz
  Cc: Maxime Coquelin

This patch reworks the IOTLB code to extract the
madvise-related bits into a dedicated helper. This
refactoring improves code sharing.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
---
 lib/vhost/iotlb.c | 77 +++++++++++++++++++++++++----------------------
 1 file changed, 41 insertions(+), 36 deletions(-)

diff --git a/lib/vhost/iotlb.c b/lib/vhost/iotlb.c
index 870c8acb88..e8f1cb661e 100644
--- a/lib/vhost/iotlb.c
+++ b/lib/vhost/iotlb.c
@@ -23,6 +23,34 @@ struct vhost_iotlb_entry {
 
 #define IOTLB_CACHE_SIZE 2048
 
+static void
+vhost_user_iotlb_set_dump(struct virtio_net *dev, struct vhost_iotlb_entry *node)
+{
+	uint64_t align;
+
+	align = hua_to_alignment(dev->mem, (void *)(uintptr_t)node->uaddr);
+
+	mem_set_dump((void *)(uintptr_t)node->uaddr, node->size, false, align);
+}
+
+static void
+vhost_user_iotlb_clear_dump(struct virtio_net *dev, struct vhost_iotlb_entry *node,
+		struct vhost_iotlb_entry *prev, struct vhost_iotlb_entry *next)
+{
+	uint64_t align, mask;
+
+	align = hua_to_alignment(dev->mem, (void *)(uintptr_t)node->uaddr);
+	mask = ~(align - 1);
+
+	/* Don't disable coredump if the previous node is in the same page */
+	if (prev == NULL || (node->uaddr & mask) != ((prev->uaddr + prev->size - 1) & mask)) {
+		/* Don't disable coredump if the next node is in the same page */
+		if (next == NULL ||
+				((node->uaddr + node->size - 1) & mask) != (next->uaddr & mask))
+			mem_set_dump((void *)(uintptr_t)node->uaddr, node->size, false, align);
+	}
+}
+
 static struct vhost_iotlb_entry *
 vhost_user_iotlb_pool_get(struct vhost_virtqueue *vq)
 {
@@ -149,8 +177,8 @@ vhost_user_iotlb_cache_remove_all(struct virtio_net *dev, struct vhost_virtqueue
 	rte_rwlock_write_lock(&vq->iotlb_lock);
 
 	RTE_TAILQ_FOREACH_SAFE(node, &vq->iotlb_list, next, temp_node) {
-		mem_set_dump((void *)(uintptr_t)node->uaddr, node->size, false,
-			hua_to_alignment(dev->mem, (void *)(uintptr_t)node->uaddr));
+		vhost_user_iotlb_set_dump(dev, node);
+
 		TAILQ_REMOVE(&vq->iotlb_list, node, next);
 		vhost_user_iotlb_pool_put(vq, node);
 	}
@@ -164,7 +192,6 @@ static void
 vhost_user_iotlb_cache_random_evict(struct virtio_net *dev, struct vhost_virtqueue *vq)
 {
 	struct vhost_iotlb_entry *node, *temp_node, *prev_node = NULL;
-	uint64_t alignment, mask;
 	int entry_idx;
 
 	rte_rwlock_write_lock(&vq->iotlb_lock);
@@ -173,20 +200,10 @@ vhost_user_iotlb_cache_random_evict(struct virtio_net *dev, struct vhost_virtque
 
 	RTE_TAILQ_FOREACH_SAFE(node, &vq->iotlb_list, next, temp_node) {
 		if (!entry_idx) {
-			struct vhost_iotlb_entry *next_node;
-			alignment = hua_to_alignment(dev->mem, (void *)(uintptr_t)node->uaddr);
-			mask = ~(alignment - 1);
-
-			/* Don't disable coredump if the previous node is in the same page */
-			if (prev_node == NULL || (node->uaddr & mask) !=
-					((prev_node->uaddr + prev_node->size - 1) & mask)) {
-				next_node = RTE_TAILQ_NEXT(node, next);
-				/* Don't disable coredump if the next node is in the same page */
-				if (next_node == NULL || ((node->uaddr + node->size - 1) & mask) !=
-						(next_node->uaddr & mask))
-					mem_set_dump((void *)(uintptr_t)node->uaddr, node->size,
-							false, alignment);
-			}
+			struct vhost_iotlb_entry *next_node = RTE_TAILQ_NEXT(node, next);
+
+			vhost_user_iotlb_clear_dump(dev, node, prev_node, next_node);
+
 			TAILQ_REMOVE(&vq->iotlb_list, node, next);
 			vhost_user_iotlb_pool_put(vq, node);
 			vq->iotlb_cache_nr--;
@@ -240,16 +257,16 @@ vhost_user_iotlb_cache_insert(struct virtio_net *dev, struct vhost_virtqueue *vq
 			vhost_user_iotlb_pool_put(vq, new_node);
 			goto unlock;
 		} else if (node->iova > new_node->iova) {
-			mem_set_dump((void *)(uintptr_t)new_node->uaddr, new_node->size, true,
-				hua_to_alignment(dev->mem, (void *)(uintptr_t)new_node->uaddr));
+			vhost_user_iotlb_set_dump(dev, new_node);
+
 			TAILQ_INSERT_BEFORE(node, new_node, next);
 			vq->iotlb_cache_nr++;
 			goto unlock;
 		}
 	}
 
-	mem_set_dump((void *)(uintptr_t)new_node->uaddr, new_node->size, true,
-		hua_to_alignment(dev->mem, (void *)(uintptr_t)new_node->uaddr));
+	vhost_user_iotlb_set_dump(dev, new_node);
+
 	TAILQ_INSERT_TAIL(&vq->iotlb_list, new_node, next);
 	vq->iotlb_cache_nr++;
 
@@ -265,7 +282,6 @@ vhost_user_iotlb_cache_remove(struct virtio_net *dev, struct vhost_virtqueue *vq
 					uint64_t iova, uint64_t size)
 {
 	struct vhost_iotlb_entry *node, *temp_node, *prev_node = NULL;
-	uint64_t alignment, mask;
 
 	if (unlikely(!size))
 		return;
@@ -278,20 +294,9 @@ vhost_user_iotlb_cache_remove(struct virtio_net *dev, struct vhost_virtqueue *vq
 			break;
 
 		if (iova < node->iova + node->size) {
-			struct vhost_iotlb_entry *next_node;
-			alignment = hua_to_alignment(dev->mem, (void *)(uintptr_t)node->uaddr);
-			mask = ~(alignment-1);
-
-			/* Don't disable coredump if the previous node is in the same page */
-			if (prev_node == NULL || (node->uaddr & mask) !=
-					((prev_node->uaddr + prev_node->size - 1) & mask)) {
-				next_node = RTE_TAILQ_NEXT(node, next);
-				/* Don't disable coredump if the next node is in the same page */
-				if (next_node == NULL || ((node->uaddr + node->size - 1) & mask) !=
-						(next_node->uaddr & mask))
-					mem_set_dump((void *)(uintptr_t)node->uaddr, node->size,
-							false, alignment);
-			}
+			struct vhost_iotlb_entry *next_node = RTE_TAILQ_NEXT(node, next);
+
+			vhost_user_iotlb_clear_dump(dev, node, prev_node, next_node);
 
 			TAILQ_REMOVE(&vq->iotlb_list, node, next);
 			vhost_user_iotlb_pool_put(vq, node);
-- 
2.39.2



* [RFC 05/27] vhost: add helper for IOTLB entries shared page check
  2023-03-31 15:42 [RFC 00/27] Add VDUSE support to Vhost library Maxime Coquelin
                   ` (3 preceding siblings ...)
  2023-03-31 15:42 ` [RFC 04/27] vhost: add helper of IOTLB entries coredump Maxime Coquelin
@ 2023-03-31 15:42 ` Maxime Coquelin
  2023-04-17 19:39   ` Mike Pattrick
  2023-04-24  2:59   ` Xia, Chenbo
  2023-03-31 15:42 ` [RFC 06/27] vhost: don't dump unneeded pages with IOTLB Maxime Coquelin
                   ` (24 subsequent siblings)
  29 siblings, 2 replies; 79+ messages in thread
From: Maxime Coquelin @ 2023-03-31 15:42 UTC (permalink / raw)
  To: dev, david.marchand, chenbo.xia, mkp, fbl, jasowang,
	cunming.liang, xieyongji, echaudro, eperezma, amorenoz
  Cc: Maxime Coquelin

This patch introduces a helper to check whether two IOTLB
entries share a page.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
---
 lib/vhost/iotlb.c | 25 ++++++++++++++++++++-----
 1 file changed, 20 insertions(+), 5 deletions(-)

diff --git a/lib/vhost/iotlb.c b/lib/vhost/iotlb.c
index e8f1cb661e..d919f74704 100644
--- a/lib/vhost/iotlb.c
+++ b/lib/vhost/iotlb.c
@@ -23,6 +23,23 @@ struct vhost_iotlb_entry {
 
 #define IOTLB_CACHE_SIZE 2048
 
+static bool
+vhost_user_iotlb_share_page(struct vhost_iotlb_entry *a, struct vhost_iotlb_entry *b,
+		uint64_t align)
+{
+	uint64_t a_end, b_start;
+
+	if (a == NULL || b == NULL)
+		return false;
+
+	/* Assumes entry a lower than entry b */
+	RTE_ASSERT(a->uaddr < b->uaddr);
+	a_end = RTE_ALIGN_CEIL(a->uaddr + a->size, align);
+	b_start = RTE_ALIGN_FLOOR(b->uaddr, align);
+
+	return a_end > b_start;
+}
+
 static void
 vhost_user_iotlb_set_dump(struct virtio_net *dev, struct vhost_iotlb_entry *node)
 {
@@ -37,16 +54,14 @@ static void
 vhost_user_iotlb_clear_dump(struct virtio_net *dev, struct vhost_iotlb_entry *node,
 		struct vhost_iotlb_entry *prev, struct vhost_iotlb_entry *next)
 {
-	uint64_t align, mask;
+	uint64_t align;
 
 	align = hua_to_alignment(dev->mem, (void *)(uintptr_t)node->uaddr);
-	mask = ~(align - 1);
 
 	/* Don't disable coredump if the previous node is in the same page */
-	if (prev == NULL || (node->uaddr & mask) != ((prev->uaddr + prev->size - 1) & mask)) {
+	if (!vhost_user_iotlb_share_page(prev, node, align)) {
 		/* Don't disable coredump if the next node is in the same page */
-		if (next == NULL ||
-				((node->uaddr + node->size - 1) & mask) != (next->uaddr & mask))
+		if (!vhost_user_iotlb_share_page(node, next, align))
 			mem_set_dump((void *)(uintptr_t)node->uaddr, node->size, false, align);
 	}
 }
-- 
2.39.2



* [RFC 06/27] vhost: don't dump unneeded pages with IOTLB
  2023-03-31 15:42 [RFC 00/27] Add VDUSE support to Vhost library Maxime Coquelin
                   ` (4 preceding siblings ...)
  2023-03-31 15:42 ` [RFC 05/27] vhost: add helper for IOTLB entries shared page check Maxime Coquelin
@ 2023-03-31 15:42 ` Maxime Coquelin
  2023-04-20 17:11   ` Mike Pattrick
  2023-04-24  3:00   ` Xia, Chenbo
  2023-03-31 15:42 ` [RFC 07/27] vhost: change to single IOTLB cache per device Maxime Coquelin
                   ` (23 subsequent siblings)
  29 siblings, 2 replies; 79+ messages in thread
From: Maxime Coquelin @ 2023-03-31 15:42 UTC (permalink / raw)
  To: dev, david.marchand, chenbo.xia, mkp, fbl, jasowang,
	cunming.liang, xieyongji, echaudro, eperezma, amorenoz
  Cc: Maxime Coquelin, stable

On IOTLB entry removal, previous fixes took care of not
marking pages shared with other IOTLB entries as
DONTDUMP.

However, if an IOTLB entry spans multiple pages, the
other pages were kept as DODUMP while they might not
have been shared with other entries, needlessly
increasing the coredump size.

This patch addresses this issue by excluding only the
shared pages from madvise's DONTDUMP.
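
A minimal standalone sketch of the new behaviour
(made-up addresses and 4 KiB pages, not the patch code):
only the boundary pages shared with the neighbouring
entries are skipped, the rest of the entry is still
marked DONTDUMP.

#include <inttypes.h>
#include <stdio.h>

#define ALIGN_FLOOR(v, a) ((v) & ~((a) - 1))
#define ALIGN_CEIL(v, a)  ALIGN_FLOOR((v) + (a) - 1, (a))

int main(void)
{
	const uint64_t align = 4096;
	/* Entry spans pages 0x1000..0x3000; first and last pages are shared */
	uint64_t start = 0x1800, end = 0x3800;
	int shares_first_page = 1, shares_last_page = 1;

	if (shares_first_page)
		start = ALIGN_CEIL(start, align);	/* -> 0x2000 */
	if (shares_last_page)
		end = ALIGN_FLOOR(end, align);		/* -> 0x3000 */

	if (end > start)
		/* Only [0x2000, 0x3000) would be madvise(DONTDUMP)'d */
		printf("DONTDUMP range: 0x%" PRIx64 "-0x%" PRIx64 "\n",
			start, end);
	return 0;
}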

Fixes: dea092d0addb ("vhost: fix madvise arguments alignment")
Cc: stable@dpdk.org

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
---
 lib/vhost/iotlb.c | 21 ++++++++++++++-------
 1 file changed, 14 insertions(+), 7 deletions(-)

diff --git a/lib/vhost/iotlb.c b/lib/vhost/iotlb.c
index d919f74704..f598c0a8c4 100644
--- a/lib/vhost/iotlb.c
+++ b/lib/vhost/iotlb.c
@@ -54,16 +54,23 @@ static void
 vhost_user_iotlb_clear_dump(struct virtio_net *dev, struct vhost_iotlb_entry *node,
 		struct vhost_iotlb_entry *prev, struct vhost_iotlb_entry *next)
 {
-	uint64_t align;
+	uint64_t align, start, end;
+
+	start = node->uaddr;
+	end = node->uaddr + node->size;
 
 	align = hua_to_alignment(dev->mem, (void *)(uintptr_t)node->uaddr);
 
-	/* Don't disable coredump if the previous node is in the same page */
-	if (!vhost_user_iotlb_share_page(prev, node, align)) {
-		/* Don't disable coredump if the next node is in the same page */
-		if (!vhost_user_iotlb_share_page(node, next, align))
-			mem_set_dump((void *)(uintptr_t)node->uaddr, node->size, false, align);
-	}
+	/* Skip first page if shared with previous entry. */
+	if (vhost_user_iotlb_share_page(prev, node, align))
+		start = RTE_ALIGN_CEIL(start, align);
+
+	/* Skip last page if shared with next entry. */
+	if (vhost_user_iotlb_share_page(node, next, align))
+		end = RTE_ALIGN_FLOOR(end, align);
+
+	if (end > start)
+		mem_set_dump((void *)(uintptr_t)start, end - start, false, align);
 }
 
 static struct vhost_iotlb_entry *
-- 
2.39.2



* [RFC 07/27] vhost: change to single IOTLB cache per device
  2023-03-31 15:42 [RFC 00/27] Add VDUSE support to Vhost library Maxime Coquelin
                   ` (5 preceding siblings ...)
  2023-03-31 15:42 ` [RFC 06/27] vhost: don't dump unneeded pages with IOTLB Maxime Coquelin
@ 2023-03-31 15:42 ` Maxime Coquelin
  2023-04-25  6:19   ` Xia, Chenbo
  2023-03-31 15:42 ` [RFC 08/27] vhost: add offset field to IOTLB entries Maxime Coquelin
                   ` (22 subsequent siblings)
  29 siblings, 1 reply; 79+ messages in thread
From: Maxime Coquelin @ 2023-03-31 15:42 UTC (permalink / raw)
  To: dev, david.marchand, chenbo.xia, mkp, fbl, jasowang,
	cunming.liang, xieyongji, echaudro, eperezma, amorenoz
  Cc: Maxime Coquelin

This patch simplifies the IOTLB implementation and
reduces IOTLB memory consumption by having a single
IOTLB cache per device, instead of one per virtqueue.

In order not to impact performance, it keeps an IOTLB
lock per virtqueue, so that there is no contention
between multiple queues trying to acquire it.
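
The locking scheme, reduced to a standalone sketch
(plain pthread rwlocks instead of rte_rwlock, not the
library code): datapath threads keep read-locking only
their own virtqueue's iotlb_lock, while cache updates
write-lock every virtqueue's lock, so queues never
contend with each other on the fast path.

#include <pthread.h>

#define NR_VRING 4

struct vq {
	pthread_rwlock_t iotlb_lock;
};

static struct vq vqs[NR_VRING];

/* Datapath: per-queue read lock only (hypothetical lookup function) */
static void iotlb_lookup(struct vq *vq)
{
	pthread_rwlock_rdlock(&vq->iotlb_lock);
	/* ... look up the single, device-wide IOTLB cache ... */
	pthread_rwlock_unlock(&vq->iotlb_lock);
}

/* Control path: take every queue's lock before touching the shared cache */
static void iotlb_cache_update(void)
{
	for (int i = 0; i < NR_VRING; i++)
		pthread_rwlock_wrlock(&vqs[i].iotlb_lock);
	/* ... insert/remove entries in the device-wide cache ... */
	for (int i = 0; i < NR_VRING; i++)
		pthread_rwlock_unlock(&vqs[i].iotlb_lock);
}

int main(void)
{
	for (int i = 0; i < NR_VRING; i++)
		pthread_rwlock_init(&vqs[i].iotlb_lock, NULL);
	iotlb_cache_update();
	iotlb_lookup(&vqs[0]);
	return 0;
}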

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
---
 lib/vhost/iotlb.c      | 212 +++++++++++++++++++----------------------
 lib/vhost/iotlb.h      |  43 ++++++---
 lib/vhost/vhost.c      |  18 ++--
 lib/vhost/vhost.h      |  16 ++--
 lib/vhost/vhost_user.c |  25 +++--
 5 files changed, 160 insertions(+), 154 deletions(-)

diff --git a/lib/vhost/iotlb.c b/lib/vhost/iotlb.c
index f598c0a8c4..a91115cf1c 100644
--- a/lib/vhost/iotlb.c
+++ b/lib/vhost/iotlb.c
@@ -74,86 +74,81 @@ vhost_user_iotlb_clear_dump(struct virtio_net *dev, struct vhost_iotlb_entry *no
 }
 
 static struct vhost_iotlb_entry *
-vhost_user_iotlb_pool_get(struct vhost_virtqueue *vq)
+vhost_user_iotlb_pool_get(struct virtio_net *dev)
 {
 	struct vhost_iotlb_entry *node;
 
-	rte_spinlock_lock(&vq->iotlb_free_lock);
-	node = SLIST_FIRST(&vq->iotlb_free_list);
+	rte_spinlock_lock(&dev->iotlb_free_lock);
+	node = SLIST_FIRST(&dev->iotlb_free_list);
 	if (node != NULL)
-		SLIST_REMOVE_HEAD(&vq->iotlb_free_list, next_free);
-	rte_spinlock_unlock(&vq->iotlb_free_lock);
+		SLIST_REMOVE_HEAD(&dev->iotlb_free_list, next_free);
+	rte_spinlock_unlock(&dev->iotlb_free_lock);
 	return node;
 }
 
 static void
-vhost_user_iotlb_pool_put(struct vhost_virtqueue *vq,
-	struct vhost_iotlb_entry *node)
+vhost_user_iotlb_pool_put(struct virtio_net *dev, struct vhost_iotlb_entry *node)
 {
-	rte_spinlock_lock(&vq->iotlb_free_lock);
-	SLIST_INSERT_HEAD(&vq->iotlb_free_list, node, next_free);
-	rte_spinlock_unlock(&vq->iotlb_free_lock);
+	rte_spinlock_lock(&dev->iotlb_free_lock);
+	SLIST_INSERT_HEAD(&dev->iotlb_free_list, node, next_free);
+	rte_spinlock_unlock(&dev->iotlb_free_lock);
 }
 
 static void
-vhost_user_iotlb_cache_random_evict(struct virtio_net *dev, struct vhost_virtqueue *vq);
+vhost_user_iotlb_cache_random_evict(struct virtio_net *dev);
 
 static void
-vhost_user_iotlb_pending_remove_all(struct vhost_virtqueue *vq)
+vhost_user_iotlb_pending_remove_all(struct virtio_net *dev)
 {
 	struct vhost_iotlb_entry *node, *temp_node;
 
-	rte_rwlock_write_lock(&vq->iotlb_pending_lock);
+	rte_rwlock_write_lock(&dev->iotlb_pending_lock);
 
-	RTE_TAILQ_FOREACH_SAFE(node, &vq->iotlb_pending_list, next, temp_node) {
-		TAILQ_REMOVE(&vq->iotlb_pending_list, node, next);
-		vhost_user_iotlb_pool_put(vq, node);
+	RTE_TAILQ_FOREACH_SAFE(node, &dev->iotlb_pending_list, next, temp_node) {
+		TAILQ_REMOVE(&dev->iotlb_pending_list, node, next);
+		vhost_user_iotlb_pool_put(dev, node);
 	}
 
-	rte_rwlock_write_unlock(&vq->iotlb_pending_lock);
+	rte_rwlock_write_unlock(&dev->iotlb_pending_lock);
 }
 
 bool
-vhost_user_iotlb_pending_miss(struct vhost_virtqueue *vq, uint64_t iova,
-				uint8_t perm)
+vhost_user_iotlb_pending_miss(struct virtio_net *dev, uint64_t iova, uint8_t perm)
 {
 	struct vhost_iotlb_entry *node;
 	bool found = false;
 
-	rte_rwlock_read_lock(&vq->iotlb_pending_lock);
+	rte_rwlock_read_lock(&dev->iotlb_pending_lock);
 
-	TAILQ_FOREACH(node, &vq->iotlb_pending_list, next) {
+	TAILQ_FOREACH(node, &dev->iotlb_pending_list, next) {
 		if ((node->iova == iova) && (node->perm == perm)) {
 			found = true;
 			break;
 		}
 	}
 
-	rte_rwlock_read_unlock(&vq->iotlb_pending_lock);
+	rte_rwlock_read_unlock(&dev->iotlb_pending_lock);
 
 	return found;
 }
 
 void
-vhost_user_iotlb_pending_insert(struct virtio_net *dev, struct vhost_virtqueue *vq,
-				uint64_t iova, uint8_t perm)
+vhost_user_iotlb_pending_insert(struct virtio_net *dev, uint64_t iova, uint8_t perm)
 {
 	struct vhost_iotlb_entry *node;
 
-	node = vhost_user_iotlb_pool_get(vq);
+	node = vhost_user_iotlb_pool_get(dev);
 	if (node == NULL) {
 		VHOST_LOG_CONFIG(dev->ifname, DEBUG,
-			"IOTLB pool for vq %"PRIu32" empty, clear entries for pending insertion\n",
-			vq->index);
-		if (!TAILQ_EMPTY(&vq->iotlb_pending_list))
-			vhost_user_iotlb_pending_remove_all(vq);
+			"IOTLB pool empty, clear entries for pending insertion\n");
+		if (!TAILQ_EMPTY(&dev->iotlb_pending_list))
+			vhost_user_iotlb_pending_remove_all(dev);
 		else
-			vhost_user_iotlb_cache_random_evict(dev, vq);
-		node = vhost_user_iotlb_pool_get(vq);
+			vhost_user_iotlb_cache_random_evict(dev);
+		node = vhost_user_iotlb_pool_get(dev);
 		if (node == NULL) {
 			VHOST_LOG_CONFIG(dev->ifname, ERR,
-				"IOTLB pool vq %"PRIu32" still empty, pending insertion failure\n",
-				vq->index);
+				"IOTLB pool still empty, pending insertion failure\n");
 			return;
 		}
 	}
@@ -161,22 +156,21 @@ vhost_user_iotlb_pending_insert(struct virtio_net *dev, struct vhost_virtqueue *
 	node->iova = iova;
 	node->perm = perm;
 
-	rte_rwlock_write_lock(&vq->iotlb_pending_lock);
+	rte_rwlock_write_lock(&dev->iotlb_pending_lock);
 
-	TAILQ_INSERT_TAIL(&vq->iotlb_pending_list, node, next);
+	TAILQ_INSERT_TAIL(&dev->iotlb_pending_list, node, next);
 
-	rte_rwlock_write_unlock(&vq->iotlb_pending_lock);
+	rte_rwlock_write_unlock(&dev->iotlb_pending_lock);
 }
 
 void
-vhost_user_iotlb_pending_remove(struct vhost_virtqueue *vq,
-				uint64_t iova, uint64_t size, uint8_t perm)
+vhost_user_iotlb_pending_remove(struct virtio_net *dev, uint64_t iova, uint64_t size, uint8_t perm)
 {
 	struct vhost_iotlb_entry *node, *temp_node;
 
-	rte_rwlock_write_lock(&vq->iotlb_pending_lock);
+	rte_rwlock_write_lock(&dev->iotlb_pending_lock);
 
-	RTE_TAILQ_FOREACH_SAFE(node, &vq->iotlb_pending_list, next,
+	RTE_TAILQ_FOREACH_SAFE(node, &dev->iotlb_pending_list, next,
 				temp_node) {
 		if (node->iova < iova)
 			continue;
@@ -184,81 +178,78 @@ vhost_user_iotlb_pending_remove(struct vhost_virtqueue *vq,
 			continue;
 		if ((node->perm & perm) != node->perm)
 			continue;
-		TAILQ_REMOVE(&vq->iotlb_pending_list, node, next);
-		vhost_user_iotlb_pool_put(vq, node);
+		TAILQ_REMOVE(&dev->iotlb_pending_list, node, next);
+		vhost_user_iotlb_pool_put(dev, node);
 	}
 
-	rte_rwlock_write_unlock(&vq->iotlb_pending_lock);
+	rte_rwlock_write_unlock(&dev->iotlb_pending_lock);
 }
 
 static void
-vhost_user_iotlb_cache_remove_all(struct virtio_net *dev, struct vhost_virtqueue *vq)
+vhost_user_iotlb_cache_remove_all(struct virtio_net *dev)
 {
 	struct vhost_iotlb_entry *node, *temp_node;
 
-	rte_rwlock_write_lock(&vq->iotlb_lock);
+	vhost_user_iotlb_wr_lock_all(dev);
 
-	RTE_TAILQ_FOREACH_SAFE(node, &vq->iotlb_list, next, temp_node) {
+	RTE_TAILQ_FOREACH_SAFE(node, &dev->iotlb_list, next, temp_node) {
 		vhost_user_iotlb_set_dump(dev, node);
 
-		TAILQ_REMOVE(&vq->iotlb_list, node, next);
-		vhost_user_iotlb_pool_put(vq, node);
+		TAILQ_REMOVE(&dev->iotlb_list, node, next);
+		vhost_user_iotlb_pool_put(dev, node);
 	}
 
-	vq->iotlb_cache_nr = 0;
+	dev->iotlb_cache_nr = 0;
 
-	rte_rwlock_write_unlock(&vq->iotlb_lock);
+	vhost_user_iotlb_wr_unlock_all(dev);
 }
 
 static void
-vhost_user_iotlb_cache_random_evict(struct virtio_net *dev, struct vhost_virtqueue *vq)
+vhost_user_iotlb_cache_random_evict(struct virtio_net *dev)
 {
 	struct vhost_iotlb_entry *node, *temp_node, *prev_node = NULL;
 	int entry_idx;
 
-	rte_rwlock_write_lock(&vq->iotlb_lock);
+	vhost_user_iotlb_wr_lock_all(dev);
 
-	entry_idx = rte_rand() % vq->iotlb_cache_nr;
+	entry_idx = rte_rand() % dev->iotlb_cache_nr;
 
-	RTE_TAILQ_FOREACH_SAFE(node, &vq->iotlb_list, next, temp_node) {
+	RTE_TAILQ_FOREACH_SAFE(node, &dev->iotlb_list, next, temp_node) {
 		if (!entry_idx) {
 			struct vhost_iotlb_entry *next_node = RTE_TAILQ_NEXT(node, next);
 
 			vhost_user_iotlb_clear_dump(dev, node, prev_node, next_node);
 
-			TAILQ_REMOVE(&vq->iotlb_list, node, next);
-			vhost_user_iotlb_pool_put(vq, node);
-			vq->iotlb_cache_nr--;
+			TAILQ_REMOVE(&dev->iotlb_list, node, next);
+			vhost_user_iotlb_pool_put(dev, node);
+			dev->iotlb_cache_nr--;
 			break;
 		}
 		prev_node = node;
 		entry_idx--;
 	}
 
-	rte_rwlock_write_unlock(&vq->iotlb_lock);
+	vhost_user_iotlb_wr_unlock_all(dev);
 }
 
 void
-vhost_user_iotlb_cache_insert(struct virtio_net *dev, struct vhost_virtqueue *vq,
-				uint64_t iova, uint64_t uaddr,
+vhost_user_iotlb_cache_insert(struct virtio_net *dev, uint64_t iova, uint64_t uaddr,
 				uint64_t size, uint8_t perm)
 {
 	struct vhost_iotlb_entry *node, *new_node;
 
-	new_node = vhost_user_iotlb_pool_get(vq);
+	new_node = vhost_user_iotlb_pool_get(dev);
 	if (new_node == NULL) {
 		VHOST_LOG_CONFIG(dev->ifname, DEBUG,
-			"IOTLB pool vq %"PRIu32" empty, clear entries for cache insertion\n",
-			vq->index);
-		if (!TAILQ_EMPTY(&vq->iotlb_list))
-			vhost_user_iotlb_cache_random_evict(dev, vq);
+			"IOTLB pool empty, clear entries for cache insertion\n");
+		if (!TAILQ_EMPTY(&dev->iotlb_list))
+			vhost_user_iotlb_cache_random_evict(dev);
 		else
-			vhost_user_iotlb_pending_remove_all(vq);
-		new_node = vhost_user_iotlb_pool_get(vq);
+			vhost_user_iotlb_pending_remove_all(dev);
+		new_node = vhost_user_iotlb_pool_get(dev);
 		if (new_node == NULL) {
 			VHOST_LOG_CONFIG(dev->ifname, ERR,
-				"IOTLB pool vq %"PRIu32" still empty, cache insertion failed\n",
-				vq->index);
+				"IOTLB pool still empty, cache insertion failed\n");
 			return;
 		}
 	}
@@ -268,49 +259,47 @@ vhost_user_iotlb_cache_insert(struct virtio_net *dev, struct vhost_virtqueue *vq
 	new_node->size = size;
 	new_node->perm = perm;
 
-	rte_rwlock_write_lock(&vq->iotlb_lock);
+	vhost_user_iotlb_wr_lock_all(dev);
 
-	TAILQ_FOREACH(node, &vq->iotlb_list, next) {
+	TAILQ_FOREACH(node, &dev->iotlb_list, next) {
 		/*
 		 * Entries must be invalidated before being updated.
 		 * So if iova already in list, assume identical.
 		 */
 		if (node->iova == new_node->iova) {
-			vhost_user_iotlb_pool_put(vq, new_node);
+			vhost_user_iotlb_pool_put(dev, new_node);
 			goto unlock;
 		} else if (node->iova > new_node->iova) {
 			vhost_user_iotlb_set_dump(dev, new_node);
 
 			TAILQ_INSERT_BEFORE(node, new_node, next);
-			vq->iotlb_cache_nr++;
+			dev->iotlb_cache_nr++;
 			goto unlock;
 		}
 	}
 
 	vhost_user_iotlb_set_dump(dev, new_node);
 
-	TAILQ_INSERT_TAIL(&vq->iotlb_list, new_node, next);
-	vq->iotlb_cache_nr++;
+	TAILQ_INSERT_TAIL(&dev->iotlb_list, new_node, next);
+	dev->iotlb_cache_nr++;
 
 unlock:
-	vhost_user_iotlb_pending_remove(vq, iova, size, perm);
-
-	rte_rwlock_write_unlock(&vq->iotlb_lock);
+	vhost_user_iotlb_pending_remove(dev, iova, size, perm);
 
+	vhost_user_iotlb_wr_unlock_all(dev);
 }
 
 void
-vhost_user_iotlb_cache_remove(struct virtio_net *dev, struct vhost_virtqueue *vq,
-					uint64_t iova, uint64_t size)
+vhost_user_iotlb_cache_remove(struct virtio_net *dev, uint64_t iova, uint64_t size)
 {
 	struct vhost_iotlb_entry *node, *temp_node, *prev_node = NULL;
 
 	if (unlikely(!size))
 		return;
 
-	rte_rwlock_write_lock(&vq->iotlb_lock);
+	vhost_user_iotlb_wr_lock_all(dev);
 
-	RTE_TAILQ_FOREACH_SAFE(node, &vq->iotlb_list, next, temp_node) {
+	RTE_TAILQ_FOREACH_SAFE(node, &dev->iotlb_list, next, temp_node) {
 		/* Sorted list */
 		if (unlikely(iova + size < node->iova))
 			break;
@@ -320,19 +309,19 @@ vhost_user_iotlb_cache_remove(struct virtio_net *dev, struct vhost_virtqueue *vq
 
 			vhost_user_iotlb_clear_dump(dev, node, prev_node, next_node);
 
-			TAILQ_REMOVE(&vq->iotlb_list, node, next);
-			vhost_user_iotlb_pool_put(vq, node);
-			vq->iotlb_cache_nr--;
-		} else
+			TAILQ_REMOVE(&dev->iotlb_list, node, next);
+			vhost_user_iotlb_pool_put(dev, node);
+			dev->iotlb_cache_nr--;
+		} else {
 			prev_node = node;
+		}
 	}
 
-	rte_rwlock_write_unlock(&vq->iotlb_lock);
+	vhost_user_iotlb_wr_unlock_all(dev);
 }
 
 uint64_t
-vhost_user_iotlb_cache_find(struct vhost_virtqueue *vq, uint64_t iova,
-						uint64_t *size, uint8_t perm)
+vhost_user_iotlb_cache_find(struct virtio_net *dev, uint64_t iova, uint64_t *size, uint8_t perm)
 {
 	struct vhost_iotlb_entry *node;
 	uint64_t offset, vva = 0, mapped = 0;
@@ -340,7 +329,7 @@ vhost_user_iotlb_cache_find(struct vhost_virtqueue *vq, uint64_t iova,
 	if (unlikely(!*size))
 		goto out;
 
-	TAILQ_FOREACH(node, &vq->iotlb_list, next) {
+	TAILQ_FOREACH(node, &dev->iotlb_list, next) {
 		/* List sorted by iova */
 		if (unlikely(iova < node->iova))
 			break;
@@ -373,60 +362,57 @@ vhost_user_iotlb_cache_find(struct vhost_virtqueue *vq, uint64_t iova,
 }
 
 void
-vhost_user_iotlb_flush_all(struct virtio_net *dev, struct vhost_virtqueue *vq)
+vhost_user_iotlb_flush_all(struct virtio_net *dev)
 {
-	vhost_user_iotlb_cache_remove_all(dev, vq);
-	vhost_user_iotlb_pending_remove_all(vq);
+	vhost_user_iotlb_cache_remove_all(dev);
+	vhost_user_iotlb_pending_remove_all(dev);
 }
 
 int
-vhost_user_iotlb_init(struct virtio_net *dev, struct vhost_virtqueue *vq)
+vhost_user_iotlb_init(struct virtio_net *dev)
 {
 	unsigned int i;
 	int socket = 0;
 
-	if (vq->iotlb_pool) {
+	if (dev->iotlb_pool) {
 		/*
 		 * The cache has already been initialized,
 		 * just drop all cached and pending entries.
 		 */
-		vhost_user_iotlb_flush_all(dev, vq);
-		rte_free(vq->iotlb_pool);
+		vhost_user_iotlb_flush_all(dev);
+		rte_free(dev->iotlb_pool);
 	}
 
 #ifdef RTE_LIBRTE_VHOST_NUMA
-	if (get_mempolicy(&socket, NULL, 0, vq, MPOL_F_NODE | MPOL_F_ADDR) != 0)
+	if (get_mempolicy(&socket, NULL, 0, dev, MPOL_F_NODE | MPOL_F_ADDR) != 0)
 		socket = 0;
 #endif
 
-	rte_spinlock_init(&vq->iotlb_free_lock);
-	rte_rwlock_init(&vq->iotlb_lock);
-	rte_rwlock_init(&vq->iotlb_pending_lock);
+	rte_spinlock_init(&dev->iotlb_free_lock);
+	rte_rwlock_init(&dev->iotlb_pending_lock);
 
-	SLIST_INIT(&vq->iotlb_free_list);
-	TAILQ_INIT(&vq->iotlb_list);
-	TAILQ_INIT(&vq->iotlb_pending_list);
+	SLIST_INIT(&dev->iotlb_free_list);
+	TAILQ_INIT(&dev->iotlb_list);
+	TAILQ_INIT(&dev->iotlb_pending_list);
 
 	if (dev->flags & VIRTIO_DEV_SUPPORT_IOMMU) {
-		vq->iotlb_pool = rte_calloc_socket("iotlb", IOTLB_CACHE_SIZE,
+		dev->iotlb_pool = rte_calloc_socket("iotlb", IOTLB_CACHE_SIZE,
 			sizeof(struct vhost_iotlb_entry), 0, socket);
-		if (!vq->iotlb_pool) {
-			VHOST_LOG_CONFIG(dev->ifname, ERR,
-				"Failed to create IOTLB cache pool for vq %"PRIu32"\n",
-				vq->index);
+		if (!dev->iotlb_pool) {
+			VHOST_LOG_CONFIG(dev->ifname, ERR, "Failed to create IOTLB cache pool\n");
 			return -1;
 		}
 		for (i = 0; i < IOTLB_CACHE_SIZE; i++)
-			vhost_user_iotlb_pool_put(vq, &vq->iotlb_pool[i]);
+			vhost_user_iotlb_pool_put(dev, &dev->iotlb_pool[i]);
 	}
 
-	vq->iotlb_cache_nr = 0;
+	dev->iotlb_cache_nr = 0;
 
 	return 0;
 }
 
 void
-vhost_user_iotlb_destroy(struct vhost_virtqueue *vq)
+vhost_user_iotlb_destroy(struct virtio_net *dev)
 {
-	rte_free(vq->iotlb_pool);
+	rte_free(dev->iotlb_pool);
 }
diff --git a/lib/vhost/iotlb.h b/lib/vhost/iotlb.h
index 73b5465b41..3490b9e6be 100644
--- a/lib/vhost/iotlb.h
+++ b/lib/vhost/iotlb.h
@@ -37,20 +37,37 @@ vhost_user_iotlb_wr_unlock(struct vhost_virtqueue *vq)
 	rte_rwlock_write_unlock(&vq->iotlb_lock);
 }
 
-void vhost_user_iotlb_cache_insert(struct virtio_net *dev, struct vhost_virtqueue *vq,
-					uint64_t iova, uint64_t uaddr,
+static __rte_always_inline void
+vhost_user_iotlb_wr_lock_all(struct virtio_net *dev)
+	__rte_no_thread_safety_analysis
+{
+	uint32_t i;
+
+	for (i = 0; i < dev->nr_vring; i++)
+		rte_rwlock_write_lock(&dev->virtqueue[i]->iotlb_lock);
+}
+
+static __rte_always_inline void
+vhost_user_iotlb_wr_unlock_all(struct virtio_net *dev)
+	__rte_no_thread_safety_analysis
+{
+	uint32_t i;
+
+	for (i = 0; i < dev->nr_vring; i++)
+		rte_rwlock_write_unlock(&dev->virtqueue[i]->iotlb_lock);
+}
+
+void vhost_user_iotlb_cache_insert(struct virtio_net *dev, uint64_t iova, uint64_t uaddr,
 					uint64_t size, uint8_t perm);
-void vhost_user_iotlb_cache_remove(struct virtio_net *dev, struct vhost_virtqueue *vq,
-					uint64_t iova, uint64_t size);
-uint64_t vhost_user_iotlb_cache_find(struct vhost_virtqueue *vq, uint64_t iova,
+void vhost_user_iotlb_cache_remove(struct virtio_net *dev, uint64_t iova, uint64_t size);
+uint64_t vhost_user_iotlb_cache_find(struct virtio_net *dev, uint64_t iova,
 					uint64_t *size, uint8_t perm);
-bool vhost_user_iotlb_pending_miss(struct vhost_virtqueue *vq, uint64_t iova,
-						uint8_t perm);
-void vhost_user_iotlb_pending_insert(struct virtio_net *dev, struct vhost_virtqueue *vq,
-						uint64_t iova, uint8_t perm);
-void vhost_user_iotlb_pending_remove(struct vhost_virtqueue *vq, uint64_t iova,
+bool vhost_user_iotlb_pending_miss(struct virtio_net *dev, uint64_t iova, uint8_t perm);
+void vhost_user_iotlb_pending_insert(struct virtio_net *dev, uint64_t iova, uint8_t perm);
+void vhost_user_iotlb_pending_remove(struct virtio_net *dev, uint64_t iova,
 						uint64_t size, uint8_t perm);
-void vhost_user_iotlb_flush_all(struct virtio_net *dev, struct vhost_virtqueue *vq);
-int vhost_user_iotlb_init(struct virtio_net *dev, struct vhost_virtqueue *vq);
-void vhost_user_iotlb_destroy(struct vhost_virtqueue *vq);
+void vhost_user_iotlb_flush_all(struct virtio_net *dev);
+int vhost_user_iotlb_init(struct virtio_net *dev);
+void vhost_user_iotlb_destroy(struct virtio_net *dev);
+
 #endif /* _VHOST_IOTLB_H_ */
diff --git a/lib/vhost/vhost.c b/lib/vhost/vhost.c
index ef37943817..d35075b96c 100644
--- a/lib/vhost/vhost.c
+++ b/lib/vhost/vhost.c
@@ -63,7 +63,7 @@ __vhost_iova_to_vva(struct virtio_net *dev, struct vhost_virtqueue *vq,
 
 	tmp_size = *size;
 
-	vva = vhost_user_iotlb_cache_find(vq, iova, &tmp_size, perm);
+	vva = vhost_user_iotlb_cache_find(dev, iova, &tmp_size, perm);
 	if (tmp_size == *size) {
 		if (dev->flags & VIRTIO_DEV_STATS_ENABLED)
 			vq->stats.iotlb_hits++;
@@ -75,7 +75,7 @@ __vhost_iova_to_vva(struct virtio_net *dev, struct vhost_virtqueue *vq,
 
 	iova += tmp_size;
 
-	if (!vhost_user_iotlb_pending_miss(vq, iova, perm)) {
+	if (!vhost_user_iotlb_pending_miss(dev, iova, perm)) {
 		/*
 		 * iotlb_lock is read-locked for a full burst,
 		 * but it only protects the iotlb cache.
@@ -85,12 +85,12 @@ __vhost_iova_to_vva(struct virtio_net *dev, struct vhost_virtqueue *vq,
 		 */
 		vhost_user_iotlb_rd_unlock(vq);
 
-		vhost_user_iotlb_pending_insert(dev, vq, iova, perm);
+		vhost_user_iotlb_pending_insert(dev, iova, perm);
 		if (vhost_user_iotlb_miss(dev, iova, perm)) {
 			VHOST_LOG_DATA(dev->ifname, ERR,
 				"IOTLB miss req failed for IOVA 0x%" PRIx64 "\n",
 				iova);
-			vhost_user_iotlb_pending_remove(vq, iova, 1, perm);
+			vhost_user_iotlb_pending_remove(dev, iova, 1, perm);
 		}
 
 		vhost_user_iotlb_rd_lock(vq);
@@ -397,7 +397,6 @@ free_vq(struct virtio_net *dev, struct vhost_virtqueue *vq)
 	vhost_free_async_mem(vq);
 	rte_spinlock_unlock(&vq->access_lock);
 	rte_free(vq->batch_copy_elems);
-	vhost_user_iotlb_destroy(vq);
 	rte_free(vq->log_cache);
 	rte_free(vq);
 }
@@ -575,7 +574,7 @@ vring_invalidate(struct virtio_net *dev __rte_unused, struct vhost_virtqueue *vq
 }
 
 static void
-init_vring_queue(struct virtio_net *dev, struct vhost_virtqueue *vq,
+init_vring_queue(struct virtio_net *dev __rte_unused, struct vhost_virtqueue *vq,
 	uint32_t vring_idx)
 {
 	int numa_node = SOCKET_ID_ANY;
@@ -595,8 +594,6 @@ init_vring_queue(struct virtio_net *dev, struct vhost_virtqueue *vq,
 	}
 #endif
 	vq->numa_node = numa_node;
-
-	vhost_user_iotlb_init(dev, vq);
 }
 
 static void
@@ -631,6 +628,7 @@ alloc_vring_queue(struct virtio_net *dev, uint32_t vring_idx)
 		dev->virtqueue[i] = vq;
 		init_vring_queue(dev, vq, i);
 		rte_spinlock_init(&vq->access_lock);
+		rte_rwlock_init(&vq->iotlb_lock);
 		vq->avail_wrap_counter = 1;
 		vq->used_wrap_counter = 1;
 		vq->signalled_used_valid = false;
@@ -795,6 +793,10 @@ vhost_setup_virtio_net(int vid, bool enable, bool compliant_ol_flags, bool stats
 		dev->flags |= VIRTIO_DEV_SUPPORT_IOMMU;
 	else
 		dev->flags &= ~VIRTIO_DEV_SUPPORT_IOMMU;
+
+	if (vhost_user_iotlb_init(dev) < 0)
+		VHOST_LOG_CONFIG("device", ERR, "failed to init IOTLB\n");
+
 }
 
 void
diff --git a/lib/vhost/vhost.h b/lib/vhost/vhost.h
index 40863f7bfd..67cc4a2fdb 100644
--- a/lib/vhost/vhost.h
+++ b/lib/vhost/vhost.h
@@ -302,13 +302,6 @@ struct vhost_virtqueue {
 	struct log_cache_entry	*log_cache;
 
 	rte_rwlock_t	iotlb_lock;
-	rte_rwlock_t	iotlb_pending_lock;
-	struct vhost_iotlb_entry *iotlb_pool;
-	TAILQ_HEAD(, vhost_iotlb_entry) iotlb_list;
-	TAILQ_HEAD(, vhost_iotlb_entry) iotlb_pending_list;
-	int				iotlb_cache_nr;
-	rte_spinlock_t	iotlb_free_lock;
-	SLIST_HEAD(, vhost_iotlb_entry) iotlb_free_list;
 
 	/* Used to notify the guest (trigger interrupt) */
 	int			callfd;
@@ -483,6 +476,15 @@ struct virtio_net {
 	int			extbuf;
 	int			linearbuf;
 	struct vhost_virtqueue	*virtqueue[VHOST_MAX_QUEUE_PAIRS * 2];
+
+	rte_rwlock_t	iotlb_pending_lock;
+	struct vhost_iotlb_entry *iotlb_pool;
+	TAILQ_HEAD(, vhost_iotlb_entry) iotlb_list;
+	TAILQ_HEAD(, vhost_iotlb_entry) iotlb_pending_list;
+	int				iotlb_cache_nr;
+	rte_spinlock_t	iotlb_free_lock;
+	SLIST_HEAD(, vhost_iotlb_entry) iotlb_free_list;
+
 	struct inflight_mem_info *inflight_info;
 #define IF_NAME_SZ (PATH_MAX > IFNAMSIZ ? PATH_MAX : IFNAMSIZ)
 	char			ifname[IF_NAME_SZ];
diff --git a/lib/vhost/vhost_user.c b/lib/vhost/vhost_user.c
index d60e39b6bc..81ebef0137 100644
--- a/lib/vhost/vhost_user.c
+++ b/lib/vhost/vhost_user.c
@@ -7,7 +7,7 @@
  * The vhost-user protocol connection is an external interface, so it must be
  * robust against invalid inputs.
  *
- * This is important because the vhost-user frontend is only one step removed
+* This is important because the vhost-user frontend is only one step removed
  * from the guest.  Malicious guests that have escaped will then launch further
  * attacks from the vhost-user frontend.
  *
@@ -237,6 +237,8 @@ vhost_backend_cleanup(struct virtio_net *dev)
 	}
 
 	dev->postcopy_listening = 0;
+
+	vhost_user_iotlb_destroy(dev);
 }
 
 static void
@@ -539,7 +541,6 @@ numa_realloc(struct virtio_net **pdev, struct vhost_virtqueue **pvq)
 	if (vq != dev->virtqueue[vq->index]) {
 		VHOST_LOG_CONFIG(dev->ifname, INFO, "reallocated virtqueue on node %d\n", node);
 		dev->virtqueue[vq->index] = vq;
-		vhost_user_iotlb_init(dev, vq);
 	}
 
 	if (vq_is_packed(dev)) {
@@ -664,6 +665,8 @@ numa_realloc(struct virtio_net **pdev, struct vhost_virtqueue **pvq)
 		return;
 	}
 	dev->guest_pages = gp;
+
+	vhost_user_iotlb_init(dev);
 }
 #else
 static void
@@ -1360,8 +1363,7 @@ vhost_user_set_mem_table(struct virtio_net **pdev,
 
 		/* Flush IOTLB cache as previous HVAs are now invalid */
 		if (dev->features & (1ULL << VIRTIO_F_IOMMU_PLATFORM))
-			for (i = 0; i < dev->nr_vring; i++)
-				vhost_user_iotlb_flush_all(dev, dev->virtqueue[i]);
+			vhost_user_iotlb_flush_all(dev);
 
 		free_mem_region(dev);
 		rte_free(dev->mem);
@@ -2194,7 +2196,7 @@ vhost_user_get_vring_base(struct virtio_net **pdev,
 	ctx->msg.size = sizeof(ctx->msg.payload.state);
 	ctx->fd_num = 0;
 
-	vhost_user_iotlb_flush_all(dev, vq);
+	vhost_user_iotlb_flush_all(dev);
 
 	vring_invalidate(dev, vq);
 
@@ -2639,15 +2641,14 @@ vhost_user_iotlb_msg(struct virtio_net **pdev,
 		if (!vva)
 			return RTE_VHOST_MSG_RESULT_ERR;
 
+		vhost_user_iotlb_cache_insert(dev, imsg->iova, vva, len, imsg->perm);
+
 		for (i = 0; i < dev->nr_vring; i++) {
 			struct vhost_virtqueue *vq = dev->virtqueue[i];
 
 			if (!vq)
 				continue;
 
-			vhost_user_iotlb_cache_insert(dev, vq, imsg->iova, vva,
-					len, imsg->perm);
-
 			if (is_vring_iotlb(dev, vq, imsg)) {
 				rte_spinlock_lock(&vq->access_lock);
 				translate_ring_addresses(&dev, &vq);
@@ -2657,15 +2658,14 @@ vhost_user_iotlb_msg(struct virtio_net **pdev,
 		}
 		break;
 	case VHOST_IOTLB_INVALIDATE:
+		vhost_user_iotlb_cache_remove(dev, imsg->iova, imsg->size);
+
 		for (i = 0; i < dev->nr_vring; i++) {
 			struct vhost_virtqueue *vq = dev->virtqueue[i];
 
 			if (!vq)
 				continue;
 
-			vhost_user_iotlb_cache_remove(dev, vq, imsg->iova,
-					imsg->size);
-
 			if (is_vring_iotlb(dev, vq, imsg)) {
 				rte_spinlock_lock(&vq->access_lock);
 				vring_invalidate(dev, vq);
@@ -2674,8 +2674,7 @@ vhost_user_iotlb_msg(struct virtio_net **pdev,
 		}
 		break;
 	default:
-		VHOST_LOG_CONFIG(dev->ifname, ERR,
-			"invalid IOTLB message type (%d)\n",
+		VHOST_LOG_CONFIG(dev->ifname, ERR, "invalid IOTLB message type (%d)\n",
 			imsg->type);
 		return RTE_VHOST_MSG_RESULT_ERR;
 	}
-- 
2.39.2



* [RFC 08/27] vhost: add offset field to IOTLB entries
  2023-03-31 15:42 [RFC 00/27] Add VDUSE support to Vhost library Maxime Coquelin
                   ` (6 preceding siblings ...)
  2023-03-31 15:42 ` [RFC 07/27] vhost: change to single IOTLB cache per device Maxime Coquelin
@ 2023-03-31 15:42 ` Maxime Coquelin
  2023-04-25  6:20   ` Xia, Chenbo
  2023-03-31 15:42 ` [RFC 09/27] vhost: add page size info to IOTLB entry Maxime Coquelin
                   ` (21 subsequent siblings)
  29 siblings, 1 reply; 79+ messages in thread
From: Maxime Coquelin @ 2023-03-31 15:42 UTC (permalink / raw)
  To: dev, david.marchand, chenbo.xia, mkp, fbl, jasowang,
	cunming.liang, xieyongji, echaudro, eperezma, amorenoz
  Cc: Maxime Coquelin

This patch is preliminary work to prepare for VDUSE
support, for which we need to keep track of the mmapped
base address and offset in order to be able to unmap it
later when the IOTLB entry is invalidated.
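
To illustrate the split between base and offset
(hypothetical values, not the patch code): address
translation uses uaddr + uoffset, while a later munmap()
has to target the page-aligned base address that mmap()
returned.

#include <inttypes.h>
#include <stdio.h>

struct entry {
	uint64_t uaddr;   /* mmap() return value (page aligned) */
	uint64_t uoffset; /* offset of the region inside the mapping */
	uint64_t size;
};

int main(void)
{
	struct entry e = { .uaddr = 0x7f0000000000, .uoffset = 0x800, .size = 0x1000 };
	uint64_t iova = 0x100000, lookup_iova = 0x100010;

	/* IOVA -> user virtual address translation */
	uint64_t vva = e.uaddr + e.uoffset + (lookup_iova - iova);
	printf("vva = 0x%" PRIx64 "\n", vva);

	/* On invalidation, the unmap would target the mapping base, not vva */
	printf("munmap base = 0x%" PRIx64 "\n", e.uaddr);
	return 0;
}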

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
---
 lib/vhost/iotlb.c      | 30 ++++++++++++++++++------------
 lib/vhost/iotlb.h      |  2 +-
 lib/vhost/vhost_user.c |  2 +-
 3 files changed, 20 insertions(+), 14 deletions(-)

diff --git a/lib/vhost/iotlb.c b/lib/vhost/iotlb.c
index a91115cf1c..51f118bc48 100644
--- a/lib/vhost/iotlb.c
+++ b/lib/vhost/iotlb.c
@@ -17,6 +17,7 @@ struct vhost_iotlb_entry {
 
 	uint64_t iova;
 	uint64_t uaddr;
+	uint64_t uoffset;
 	uint64_t size;
 	uint8_t perm;
 };
@@ -27,15 +28,18 @@ static bool
 vhost_user_iotlb_share_page(struct vhost_iotlb_entry *a, struct vhost_iotlb_entry *b,
 		uint64_t align)
 {
-	uint64_t a_end, b_start;
+	uint64_t a_start, a_end, b_start;
 
 	if (a == NULL || b == NULL)
 		return false;
 
+	a_start = a->uaddr + a->uoffset;
+	b_start = b->uaddr + b->uoffset;
+
 	/* Assumes entry a lower than entry b */
-	RTE_ASSERT(a->uaddr < b->uaddr);
-	a_end = RTE_ALIGN_CEIL(a->uaddr + a->size, align);
-	b_start = RTE_ALIGN_FLOOR(b->uaddr, align);
+	RTE_ASSERT(a_start < b_start);
+	a_end = RTE_ALIGN_CEIL(a_start + a->size, align);
+	b_start = RTE_ALIGN_FLOOR(b_start, align);
 
 	return a_end > b_start;
 }
@@ -43,11 +47,12 @@ vhost_user_iotlb_share_page(struct vhost_iotlb_entry *a, struct vhost_iotlb_entr
 static void
 vhost_user_iotlb_set_dump(struct virtio_net *dev, struct vhost_iotlb_entry *node)
 {
-	uint64_t align;
+	uint64_t align, start;
 
-	align = hua_to_alignment(dev->mem, (void *)(uintptr_t)node->uaddr);
+	start = node->uaddr + node->uoffset;
+	align = hua_to_alignment(dev->mem, (void *)(uintptr_t)start);
 
-	mem_set_dump((void *)(uintptr_t)node->uaddr, node->size, false, align);
+	mem_set_dump((void *)(uintptr_t)start, node->size, false, align);
 }
 
 static void
@@ -56,10 +61,10 @@ vhost_user_iotlb_clear_dump(struct virtio_net *dev, struct vhost_iotlb_entry *no
 {
 	uint64_t align, start, end;
 
-	start = node->uaddr;
-	end = node->uaddr + node->size;
+	start = node->uaddr + node->uoffset;
+	end = start + node->size;
 
-	align = hua_to_alignment(dev->mem, (void *)(uintptr_t)node->uaddr);
+	align = hua_to_alignment(dev->mem, (void *)(uintptr_t)start);
 
 	/* Skip first page if shared with previous entry. */
 	if (vhost_user_iotlb_share_page(prev, node, align))
@@ -234,7 +239,7 @@ vhost_user_iotlb_cache_random_evict(struct virtio_net *dev)
 
 void
 vhost_user_iotlb_cache_insert(struct virtio_net *dev, uint64_t iova, uint64_t uaddr,
-				uint64_t size, uint8_t perm)
+				uint64_t uoffset, uint64_t size, uint8_t perm)
 {
 	struct vhost_iotlb_entry *node, *new_node;
 
@@ -256,6 +261,7 @@ vhost_user_iotlb_cache_insert(struct virtio_net *dev, uint64_t iova, uint64_t ua
 
 	new_node->iova = iova;
 	new_node->uaddr = uaddr;
+	new_node->uoffset = uoffset;
 	new_node->size = size;
 	new_node->perm = perm;
 
@@ -344,7 +350,7 @@ vhost_user_iotlb_cache_find(struct virtio_net *dev, uint64_t iova, uint64_t *siz
 
 		offset = iova - node->iova;
 		if (!vva)
-			vva = node->uaddr + offset;
+			vva = node->uaddr + node->uoffset + offset;
 
 		mapped += node->size - offset;
 		iova = node->iova + node->size;
diff --git a/lib/vhost/iotlb.h b/lib/vhost/iotlb.h
index 3490b9e6be..bee36c5903 100644
--- a/lib/vhost/iotlb.h
+++ b/lib/vhost/iotlb.h
@@ -58,7 +58,7 @@ vhost_user_iotlb_wr_unlock_all(struct virtio_net *dev)
 }
 
 void vhost_user_iotlb_cache_insert(struct virtio_net *dev, uint64_t iova, uint64_t uaddr,
-					uint64_t size, uint8_t perm);
+					uint64_t uoffset, uint64_t size, uint8_t perm);
 void vhost_user_iotlb_cache_remove(struct virtio_net *dev, uint64_t iova, uint64_t size);
 uint64_t vhost_user_iotlb_cache_find(struct virtio_net *dev, uint64_t iova,
 					uint64_t *size, uint8_t perm);
diff --git a/lib/vhost/vhost_user.c b/lib/vhost/vhost_user.c
index 81ebef0137..93673d3902 100644
--- a/lib/vhost/vhost_user.c
+++ b/lib/vhost/vhost_user.c
@@ -2641,7 +2641,7 @@ vhost_user_iotlb_msg(struct virtio_net **pdev,
 		if (!vva)
 			return RTE_VHOST_MSG_RESULT_ERR;
 
-		vhost_user_iotlb_cache_insert(dev, imsg->iova, vva, len, imsg->perm);
+		vhost_user_iotlb_cache_insert(dev, imsg->iova, vva, 0, len, imsg->perm);
 
 		for (i = 0; i < dev->nr_vring; i++) {
 			struct vhost_virtqueue *vq = dev->virtqueue[i];
-- 
2.39.2



* [RFC 09/27] vhost: add page size info to IOTLB entry
  2023-03-31 15:42 [RFC 00/27] Add VDUSE support to Vhost library Maxime Coquelin
                   ` (7 preceding siblings ...)
  2023-03-31 15:42 ` [RFC 08/27] vhost: add offset field to IOTLB entries Maxime Coquelin
@ 2023-03-31 15:42 ` Maxime Coquelin
  2023-04-25  6:20   ` Xia, Chenbo
  2023-03-31 15:42 ` [RFC 10/27] vhost: retry translating IOVA after IOTLB miss Maxime Coquelin
                   ` (20 subsequent siblings)
  29 siblings, 1 reply; 79+ messages in thread
From: Maxime Coquelin @ 2023-03-31 15:42 UTC (permalink / raw)
  To: dev, david.marchand, chenbo.xia, mkp, fbl, jasowang,
	cunming.liang, xieyongji, echaudro, eperezma, amorenoz
  Cc: Maxime Coquelin

VDUSE will close the file descriptor after having
mapped the shared memory, so it will not be possible to
get the page size afterwards.

This patch adds a new page_shift field to the IOTLB
entry, so that the information is passed at IOTLB cache
insertion time. The information is stored as a bit shift
value so that the IOTLB entry keeps fitting in a single
cacheline.
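
A small standalone sketch of the shift encoding (not the
patch code): a single byte is enough for any power-of-two
page size, and the original size is recovered with a
shift when aligning addresses.

#include <assert.h>
#include <stdint.h>

int main(void)
{
	uint64_t page_size = 2UL * 1024 * 1024;	/* e.g. a 2 MiB hugepage */
	uint8_t page_shift = (uint8_t)__builtin_ctzll(page_size);

	assert(page_shift == 21);
	/* Round trip: 1ULL << page_shift gives back the original page size */
	assert((1ULL << page_shift) == page_size);
	return 0;
}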

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
---
 lib/vhost/iotlb.c      | 46 ++++++++++++++++++++----------------------
 lib/vhost/iotlb.h      |  2 +-
 lib/vhost/vhost.h      |  1 -
 lib/vhost/vhost_user.c |  8 +++++---
 4 files changed, 28 insertions(+), 29 deletions(-)

diff --git a/lib/vhost/iotlb.c b/lib/vhost/iotlb.c
index 51f118bc48..188dfb8e38 100644
--- a/lib/vhost/iotlb.c
+++ b/lib/vhost/iotlb.c
@@ -19,14 +19,14 @@ struct vhost_iotlb_entry {
 	uint64_t uaddr;
 	uint64_t uoffset;
 	uint64_t size;
+	uint8_t page_shift;
 	uint8_t perm;
 };
 
 #define IOTLB_CACHE_SIZE 2048
 
 static bool
-vhost_user_iotlb_share_page(struct vhost_iotlb_entry *a, struct vhost_iotlb_entry *b,
-		uint64_t align)
+vhost_user_iotlb_share_page(struct vhost_iotlb_entry *a, struct vhost_iotlb_entry *b)
 {
 	uint64_t a_start, a_end, b_start;
 
@@ -38,44 +38,41 @@ vhost_user_iotlb_share_page(struct vhost_iotlb_entry *a, struct vhost_iotlb_entr
 
 	/* Assumes entry a lower than entry b */
 	RTE_ASSERT(a_start < b_start);
-	a_end = RTE_ALIGN_CEIL(a_start + a->size, align);
-	b_start = RTE_ALIGN_FLOOR(b_start, align);
+	a_end = RTE_ALIGN_CEIL(a_start + a->size, RTE_BIT64(a->page_shift));
+	b_start = RTE_ALIGN_FLOOR(b_start, RTE_BIT64(b->page_shift));
 
 	return a_end > b_start;
 }
 
 static void
-vhost_user_iotlb_set_dump(struct virtio_net *dev, struct vhost_iotlb_entry *node)
+vhost_user_iotlb_set_dump(struct vhost_iotlb_entry *node)
 {
-	uint64_t align, start;
+	uint64_t start;
 
 	start = node->uaddr + node->uoffset;
-	align = hua_to_alignment(dev->mem, (void *)(uintptr_t)start);
-
-	mem_set_dump((void *)(uintptr_t)start, node->size, false, align);
+	mem_set_dump((void *)(uintptr_t)start, node->size, false, RTE_BIT64(node->page_shift));
 }
 
 static void
-vhost_user_iotlb_clear_dump(struct virtio_net *dev, struct vhost_iotlb_entry *node,
+vhost_user_iotlb_clear_dump(struct vhost_iotlb_entry *node,
 		struct vhost_iotlb_entry *prev, struct vhost_iotlb_entry *next)
 {
-	uint64_t align, start, end;
+	uint64_t start, end;
 
 	start = node->uaddr + node->uoffset;
 	end = start + node->size;
 
-	align = hua_to_alignment(dev->mem, (void *)(uintptr_t)start);
-
 	/* Skip first page if shared with previous entry. */
-	if (vhost_user_iotlb_share_page(prev, node, align))
-		start = RTE_ALIGN_CEIL(start, align);
+	if (vhost_user_iotlb_share_page(prev, node))
+		start = RTE_ALIGN_CEIL(start, RTE_BIT64(node->page_shift));
 
 	/* Skip last page if shared with next entry. */
-	if (vhost_user_iotlb_share_page(node, next, align))
-		end = RTE_ALIGN_FLOOR(end, align);
+	if (vhost_user_iotlb_share_page(node, next))
+		end = RTE_ALIGN_FLOOR(end, RTE_BIT64(node->page_shift));
 
 	if (end > start)
-		mem_set_dump((void *)(uintptr_t)start, end - start, false, align);
+		mem_set_dump((void *)(uintptr_t)start, end - start, false,
+			RTE_BIT64(node->page_shift));
 }
 
 static struct vhost_iotlb_entry *
@@ -198,7 +195,7 @@ vhost_user_iotlb_cache_remove_all(struct virtio_net *dev)
 	vhost_user_iotlb_wr_lock_all(dev);
 
 	RTE_TAILQ_FOREACH_SAFE(node, &dev->iotlb_list, next, temp_node) {
-		vhost_user_iotlb_set_dump(dev, node);
+		vhost_user_iotlb_set_dump(node);
 
 		TAILQ_REMOVE(&dev->iotlb_list, node, next);
 		vhost_user_iotlb_pool_put(dev, node);
@@ -223,7 +220,7 @@ vhost_user_iotlb_cache_random_evict(struct virtio_net *dev)
 		if (!entry_idx) {
 			struct vhost_iotlb_entry *next_node = RTE_TAILQ_NEXT(node, next);
 
-			vhost_user_iotlb_clear_dump(dev, node, prev_node, next_node);
+			vhost_user_iotlb_clear_dump(node, prev_node, next_node);
 
 			TAILQ_REMOVE(&dev->iotlb_list, node, next);
 			vhost_user_iotlb_pool_put(dev, node);
@@ -239,7 +236,7 @@ vhost_user_iotlb_cache_random_evict(struct virtio_net *dev)
 
 void
 vhost_user_iotlb_cache_insert(struct virtio_net *dev, uint64_t iova, uint64_t uaddr,
-				uint64_t uoffset, uint64_t size, uint8_t perm)
+				uint64_t uoffset, uint64_t size, uint64_t page_size, uint8_t perm)
 {
 	struct vhost_iotlb_entry *node, *new_node;
 
@@ -263,6 +260,7 @@ vhost_user_iotlb_cache_insert(struct virtio_net *dev, uint64_t iova, uint64_t ua
 	new_node->uaddr = uaddr;
 	new_node->uoffset = uoffset;
 	new_node->size = size;
+	new_node->page_shift = __builtin_ctz(page_size);
 	new_node->perm = perm;
 
 	vhost_user_iotlb_wr_lock_all(dev);
@@ -276,7 +274,7 @@ vhost_user_iotlb_cache_insert(struct virtio_net *dev, uint64_t iova, uint64_t ua
 			vhost_user_iotlb_pool_put(dev, new_node);
 			goto unlock;
 		} else if (node->iova > new_node->iova) {
-			vhost_user_iotlb_set_dump(dev, new_node);
+			vhost_user_iotlb_set_dump(new_node);
 
 			TAILQ_INSERT_BEFORE(node, new_node, next);
 			dev->iotlb_cache_nr++;
@@ -284,7 +282,7 @@ vhost_user_iotlb_cache_insert(struct virtio_net *dev, uint64_t iova, uint64_t ua
 		}
 	}
 
-	vhost_user_iotlb_set_dump(dev, new_node);
+	vhost_user_iotlb_set_dump(new_node);
 
 	TAILQ_INSERT_TAIL(&dev->iotlb_list, new_node, next);
 	dev->iotlb_cache_nr++;
@@ -313,7 +311,7 @@ vhost_user_iotlb_cache_remove(struct virtio_net *dev, uint64_t iova, uint64_t si
 		if (iova < node->iova + node->size) {
 			struct vhost_iotlb_entry *next_node = RTE_TAILQ_NEXT(node, next);
 
-			vhost_user_iotlb_clear_dump(dev, node, prev_node, next_node);
+			vhost_user_iotlb_clear_dump(node, prev_node, next_node);
 
 			TAILQ_REMOVE(&dev->iotlb_list, node, next);
 			vhost_user_iotlb_pool_put(dev, node);
diff --git a/lib/vhost/iotlb.h b/lib/vhost/iotlb.h
index bee36c5903..81ca04df21 100644
--- a/lib/vhost/iotlb.h
+++ b/lib/vhost/iotlb.h
@@ -58,7 +58,7 @@ vhost_user_iotlb_wr_unlock_all(struct virtio_net *dev)
 }
 
 void vhost_user_iotlb_cache_insert(struct virtio_net *dev, uint64_t iova, uint64_t uaddr,
-					uint64_t uoffset, uint64_t size, uint8_t perm);
+		uint64_t uoffset, uint64_t size, uint64_t page_size, uint8_t perm);
 void vhost_user_iotlb_cache_remove(struct virtio_net *dev, uint64_t iova, uint64_t size);
 uint64_t vhost_user_iotlb_cache_find(struct virtio_net *dev, uint64_t iova,
 					uint64_t *size, uint8_t perm);
diff --git a/lib/vhost/vhost.h b/lib/vhost/vhost.h
index 67cc4a2fdb..4ace5ab081 100644
--- a/lib/vhost/vhost.h
+++ b/lib/vhost/vhost.h
@@ -1016,6 +1016,5 @@ mbuf_is_consumed(struct rte_mbuf *m)
 	return true;
 }
 
-uint64_t hua_to_alignment(struct rte_vhost_memory *mem, void *ptr);
 void mem_set_dump(void *ptr, size_t size, bool enable, uint64_t alignment);
 #endif /* _VHOST_NET_CDEV_H_ */
diff --git a/lib/vhost/vhost_user.c b/lib/vhost/vhost_user.c
index 93673d3902..a989f2c46d 100644
--- a/lib/vhost/vhost_user.c
+++ b/lib/vhost/vhost_user.c
@@ -743,7 +743,7 @@ log_addr_to_gpa(struct virtio_net *dev, struct vhost_virtqueue *vq)
 	return log_gpa;
 }
 
-uint64_t
+static uint64_t
 hua_to_alignment(struct rte_vhost_memory *mem, void *ptr)
 {
 	struct rte_vhost_mem_region *r;
@@ -2632,7 +2632,7 @@ vhost_user_iotlb_msg(struct virtio_net **pdev,
 	struct virtio_net *dev = *pdev;
 	struct vhost_iotlb_msg *imsg = &ctx->msg.payload.iotlb;
 	uint16_t i;
-	uint64_t vva, len;
+	uint64_t vva, len, pg_sz;
 
 	switch (imsg->type) {
 	case VHOST_IOTLB_UPDATE:
@@ -2641,7 +2641,9 @@ vhost_user_iotlb_msg(struct virtio_net **pdev,
 		if (!vva)
 			return RTE_VHOST_MSG_RESULT_ERR;
 
-		vhost_user_iotlb_cache_insert(dev, imsg->iova, vva, 0, len, imsg->perm);
+		pg_sz = hua_to_alignment(dev->mem, (void *)(uintptr_t)vva);
+
+		vhost_user_iotlb_cache_insert(dev, imsg->iova, vva, 0, len, pg_sz, imsg->perm);
 
 		for (i = 0; i < dev->nr_vring; i++) {
 			struct vhost_virtqueue *vq = dev->virtqueue[i];
-- 
2.39.2


^ permalink raw reply	[flat|nested] 79+ messages in thread

* [RFC 10/27] vhost: retry translating IOVA after IOTLB miss
  2023-03-31 15:42 [RFC 00/27] Add VDUSE support to Vhost library Maxime Coquelin
                   ` (8 preceding siblings ...)
  2023-03-31 15:42 ` [RFC 09/27] vhost: add page size info to IOTLB entry Maxime Coquelin
@ 2023-03-31 15:42 ` Maxime Coquelin
  2023-05-05  5:07   ` Xia, Chenbo
  2023-03-31 15:42 ` [RFC 11/27] vhost: introduce backend ops Maxime Coquelin
                   ` (19 subsequent siblings)
  29 siblings, 1 reply; 79+ messages in thread
From: Maxime Coquelin @ 2023-03-31 15:42 UTC (permalink / raw)
  To: dev, david.marchand, chenbo.xia, mkp, fbl, jasowang,
	cunming.liang, xieyongji, echaudro, eperezma, amorenoz
  Cc: Maxime Coquelin

Vhost-user backend IOTLB misses and updates are
asynchronous, so the IOVA address translation function
simply fails after sending an IOTLB miss request if the
needed entry is not in the IOTLB cache.

This is not the case for VDUSE, for which the needed IOTLB
update is returned directly when sending an IOTLB miss.

This patch retries the IOTLB cache lookup after the IOTLB
miss has been sent, as illustrated by the sketch below.
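
A hedged, self-contained toy illustrating the lookup / miss / retry
flow; cache_find() and send_miss() are simplified stand-ins for
vhost_user_iotlb_cache_find() and the backend miss request, not the
actual library code.

#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Toy single-entry "IOTLB cache" standing in for the real one. */
static struct { uint64_t iova, vva, size; int valid; } cache;

static uint64_t cache_find(uint64_t iova, uint64_t *size)
{
	if (!cache.valid || iova < cache.iova || iova >= cache.iova + cache.size)
		return 0;
	if (iova + *size > cache.iova + cache.size)
		*size = cache.iova + cache.size - iova;
	return cache.vva + (iova - cache.iova);
}

/* With VDUSE the miss request is synchronous: the entry is inserted
 * before the call returns, which is what makes the retry useful. */
static int send_miss(uint64_t iova)
{
	cache.iova = iova & ~UINT64_C(0xFFF);
	cache.vva = UINT64_C(0x100000) + cache.iova;
	cache.size = 0x1000;
	cache.valid = 1;
	return 0;
}

static uint64_t translate(uint64_t iova, uint64_t *size)
{
	uint64_t tmp = *size;
	uint64_t vva = cache_find(iova, &tmp);

	if (vva && tmp == *size)
		return vva;

	if (send_miss(iova))
		return 0;

	tmp = *size;
	vva = cache_find(iova, &tmp);   /* retry after the synchronous miss */

	return (vva && tmp == *size) ? vva : 0;
}

int main(void)
{
	uint64_t size = 64;

	printf("vva=0x%" PRIx64 "\n", translate(0x2000, &size));
	return 0;
}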

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
---
 lib/vhost/vhost.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/lib/vhost/vhost.c b/lib/vhost/vhost.c
index d35075b96c..4f16307e4d 100644
--- a/lib/vhost/vhost.c
+++ b/lib/vhost/vhost.c
@@ -96,6 +96,12 @@ __vhost_iova_to_vva(struct virtio_net *dev, struct vhost_virtqueue *vq,
 		vhost_user_iotlb_rd_lock(vq);
 	}
 
+	tmp_size = *size;
+	/* Retry in case of VDUSE, as it is synchronous */
+	vva = vhost_user_iotlb_cache_find(dev, iova, &tmp_size, perm);
+	if (tmp_size == *size)
+		return vva;
+
 	return 0;
 }
 
-- 
2.39.2


^ permalink raw reply	[flat|nested] 79+ messages in thread

* [RFC 11/27] vhost: introduce backend ops
  2023-03-31 15:42 [RFC 00/27] Add VDUSE support to Vhost library Maxime Coquelin
                   ` (9 preceding siblings ...)
  2023-03-31 15:42 ` [RFC 10/27] vhost: retry translating IOVA after IOTLB miss Maxime Coquelin
@ 2023-03-31 15:42 ` Maxime Coquelin
  2023-05-05  5:07   ` Xia, Chenbo
  2023-03-31 15:42 ` [RFC 12/27] vhost: add IOTLB cache entry removal callback Maxime Coquelin
                   ` (18 subsequent siblings)
  29 siblings, 1 reply; 79+ messages in thread
From: Maxime Coquelin @ 2023-03-31 15:42 UTC (permalink / raw)
  To: dev, david.marchand, chenbo.xia, mkp, fbl, jasowang,
	cunming.liang, xieyongji, echaudro, eperezma, amorenoz
  Cc: Maxime Coquelin

This patch introduces a backend ops struct, which enables
calling backend-specific callbacks (Vhost-user, VDUSE) from
shared code.

It is an empty shell for now; it will be filled in later
patches. The pattern is sketched below.
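
A hedged illustration of the pattern with hypothetical names (not the
library's actual struct): each backend provides its own ops table and
shared code only calls through the pointer.

#include <stdio.h>

struct backend_ops {
	void (*hello)(void);
};

static void vhost_user_hello(void) { printf("vhost-user backend\n"); }
static void vduse_hello(void)      { printf("VDUSE backend\n"); }

static struct backend_ops vhost_user_ops = { .hello = vhost_user_hello };
static struct backend_ops vduse_ops      = { .hello = vduse_hello };

static void shared_code(struct backend_ops *ops)
{
	if (ops != NULL && ops->hello != NULL)
		ops->hello();   /* shared code stays backend-agnostic */
}

int main(void)
{
	shared_code(&vhost_user_ops);
	shared_code(&vduse_ops);
	return 0;
}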

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
---
 lib/vhost/socket.c     |  2 +-
 lib/vhost/vhost.c      |  8 +++++++-
 lib/vhost/vhost.h      | 10 +++++++++-
 lib/vhost/vhost_user.c |  8 ++++++++
 lib/vhost/vhost_user.h |  1 +
 5 files changed, 26 insertions(+), 3 deletions(-)

diff --git a/lib/vhost/socket.c b/lib/vhost/socket.c
index 669c322e12..ba54263824 100644
--- a/lib/vhost/socket.c
+++ b/lib/vhost/socket.c
@@ -221,7 +221,7 @@ vhost_user_add_connection(int fd, struct vhost_user_socket *vsocket)
 		return;
 	}
 
-	vid = vhost_new_device();
+	vid = vhost_user_new_device();
 	if (vid == -1) {
 		goto err;
 	}
diff --git a/lib/vhost/vhost.c b/lib/vhost/vhost.c
index 4f16307e4d..41f212315e 100644
--- a/lib/vhost/vhost.c
+++ b/lib/vhost/vhost.c
@@ -676,11 +676,16 @@ reset_device(struct virtio_net *dev)
  * there is a new virtio device being attached).
  */
 int
-vhost_new_device(void)
+vhost_new_device(struct vhost_backend_ops *ops)
 {
 	struct virtio_net *dev;
 	int i;
 
+	if (ops == NULL) {
+		VHOST_LOG_CONFIG("device", ERR, "missing backend ops.\n");
+		return -1;
+	}
+
 	pthread_mutex_lock(&vhost_dev_lock);
 	for (i = 0; i < RTE_MAX_VHOST_DEVICE; i++) {
 		if (vhost_devices[i] == NULL)
@@ -708,6 +713,7 @@ vhost_new_device(void)
 	dev->backend_req_fd = -1;
 	dev->postcopy_ufd = -1;
 	rte_spinlock_init(&dev->backend_req_lock);
+	dev->backend_ops = ops;
 
 	return i;
 }
diff --git a/lib/vhost/vhost.h b/lib/vhost/vhost.h
index 4ace5ab081..cc5c707205 100644
--- a/lib/vhost/vhost.h
+++ b/lib/vhost/vhost.h
@@ -89,6 +89,12 @@
 	for (iter = val; iter < num; iter++)
 #endif
 
+/**
+ * Structure that contains backend-specific ops.
+ */
+struct vhost_backend_ops {
+};
+
 /**
  * Structure contains buffer address, length and descriptor index
  * from vring to do scatter RX.
@@ -513,6 +519,8 @@ struct virtio_net {
 	void			*extern_data;
 	/* pre and post vhost user message handlers for the device */
 	struct rte_vhost_user_extern_ops extern_ops;
+
+	struct vhost_backend_ops *backend_ops;
 } __rte_cache_aligned;
 
 static inline void
@@ -812,7 +820,7 @@ get_device(int vid)
 	return dev;
 }
 
-int vhost_new_device(void);
+int vhost_new_device(struct vhost_backend_ops *ops);
 void cleanup_device(struct virtio_net *dev, int destroy);
 void reset_device(struct virtio_net *dev);
 void vhost_destroy_device(int);
diff --git a/lib/vhost/vhost_user.c b/lib/vhost/vhost_user.c
index a989f2c46d..2d5dec5bc1 100644
--- a/lib/vhost/vhost_user.c
+++ b/lib/vhost/vhost_user.c
@@ -3464,3 +3464,11 @@ int rte_vhost_host_notifier_ctrl(int vid, uint16_t qid, bool enable)
 
 	return ret;
 }
+
+static struct vhost_backend_ops vhost_user_backend_ops;
+
+int
+vhost_user_new_device(void)
+{
+	return vhost_new_device(&vhost_user_backend_ops);
+}
diff --git a/lib/vhost/vhost_user.h b/lib/vhost/vhost_user.h
index a0987a58f9..61456049c8 100644
--- a/lib/vhost/vhost_user.h
+++ b/lib/vhost/vhost_user.h
@@ -185,5 +185,6 @@ int vhost_user_iotlb_miss(struct virtio_net *dev, uint64_t iova, uint8_t perm);
 int read_fd_message(char *ifname, int sockfd, char *buf, int buflen, int *fds, int max_fds,
 		int *fd_num);
 int send_fd_message(char *ifname, int sockfd, char *buf, int buflen, int *fds, int fd_num);
+int vhost_user_new_device(void);
 
 #endif
-- 
2.39.2


^ permalink raw reply	[flat|nested] 79+ messages in thread

* [RFC 12/27] vhost: add IOTLB cache entry removal callback
  2023-03-31 15:42 [RFC 00/27] Add VDUSE support to Vhost library Maxime Coquelin
                   ` (10 preceding siblings ...)
  2023-03-31 15:42 ` [RFC 11/27] vhost: introduce backend ops Maxime Coquelin
@ 2023-03-31 15:42 ` Maxime Coquelin
  2023-05-05  5:07   ` Xia, Chenbo
  2023-03-31 15:42 ` [RFC 13/27] vhost: add helper for IOTLB misses Maxime Coquelin
                   ` (17 subsequent siblings)
  29 siblings, 1 reply; 79+ messages in thread
From: Maxime Coquelin @ 2023-03-31 15:42 UTC (permalink / raw)
  To: dev, david.marchand, chenbo.xia, mkp, fbl, jasowang,
	cunming.liang, xieyongji, echaudro, eperezma, amorenoz
  Cc: Maxime Coquelin

VDUSE will need to munmap() the IOTLB entry on removal
from the cache, as it performs mmap() before insertion.

This patch introduces a callback that the VDUSE layer will
implement to achieve this.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
---
 lib/vhost/iotlb.c | 12 ++++++++++++
 lib/vhost/vhost.h |  4 ++++
 2 files changed, 16 insertions(+)

diff --git a/lib/vhost/iotlb.c b/lib/vhost/iotlb.c
index 188dfb8e38..86b0be62b4 100644
--- a/lib/vhost/iotlb.c
+++ b/lib/vhost/iotlb.c
@@ -25,6 +25,15 @@ struct vhost_iotlb_entry {
 
 #define IOTLB_CACHE_SIZE 2048
 
+static void
+vhost_user_iotlb_remove_notify(struct virtio_net *dev, struct vhost_iotlb_entry *entry)
+{
+	if (dev->backend_ops->iotlb_remove_notify == NULL)
+		return;
+
+	dev->backend_ops->iotlb_remove_notify(entry->uaddr, entry->uoffset, entry->size);
+}
+
 static bool
 vhost_user_iotlb_share_page(struct vhost_iotlb_entry *a, struct vhost_iotlb_entry *b)
 {
@@ -198,6 +207,7 @@ vhost_user_iotlb_cache_remove_all(struct virtio_net *dev)
 		vhost_user_iotlb_set_dump(node);
 
 		TAILQ_REMOVE(&dev->iotlb_list, node, next);
+		vhost_user_iotlb_remove_notify(dev, node);
 		vhost_user_iotlb_pool_put(dev, node);
 	}
 
@@ -223,6 +233,7 @@ vhost_user_iotlb_cache_random_evict(struct virtio_net *dev)
 			vhost_user_iotlb_clear_dump(node, prev_node, next_node);
 
 			TAILQ_REMOVE(&dev->iotlb_list, node, next);
+			vhost_user_iotlb_remove_notify(dev, node);
 			vhost_user_iotlb_pool_put(dev, node);
 			dev->iotlb_cache_nr--;
 			break;
@@ -314,6 +325,7 @@ vhost_user_iotlb_cache_remove(struct virtio_net *dev, uint64_t iova, uint64_t si
 			vhost_user_iotlb_clear_dump(node, prev_node, next_node);
 
 			TAILQ_REMOVE(&dev->iotlb_list, node, next);
+			vhost_user_iotlb_remove_notify(dev, node);
 			vhost_user_iotlb_pool_put(dev, node);
 			dev->iotlb_cache_nr--;
 		} else {
diff --git a/lib/vhost/vhost.h b/lib/vhost/vhost.h
index cc5c707205..2ad26f6951 100644
--- a/lib/vhost/vhost.h
+++ b/lib/vhost/vhost.h
@@ -89,10 +89,14 @@
 	for (iter = val; iter < num; iter++)
 #endif
 
+struct virtio_net;
+typedef void (*vhost_iotlb_remove_notify)(uint64_t addr, uint64_t off, uint64_t size);
+
 /**
  * Structure that contains backend-specific ops.
  */
 struct vhost_backend_ops {
+	vhost_iotlb_remove_notify iotlb_remove_notify;
 };
 
 /**
-- 
2.39.2


^ permalink raw reply	[flat|nested] 79+ messages in thread

* [RFC 13/27] vhost: add helper for IOTLB misses
  2023-03-31 15:42 [RFC 00/27] Add VDUSE support to Vhost library Maxime Coquelin
                   ` (11 preceding siblings ...)
  2023-03-31 15:42 ` [RFC 12/27] vhost: add IOTLB cache entry removal callback Maxime Coquelin
@ 2023-03-31 15:42 ` Maxime Coquelin
  2023-03-31 15:42 ` [RFC 14/27] vhost: add helper for interrupt injection Maxime Coquelin
                   ` (16 subsequent siblings)
  29 siblings, 0 replies; 79+ messages in thread
From: Maxime Coquelin @ 2023-03-31 15:42 UTC (permalink / raw)
  To: dev, david.marchand, chenbo.xia, mkp, fbl, jasowang,
	cunming.liang, xieyongji, echaudro, eperezma, amorenoz
  Cc: Maxime Coquelin

This patch adds a helper for sending IOTLB misses, as VDUSE
will use an ioctl while Vhost-user uses a dedicated backend
request.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
---
 lib/vhost/vhost.c      | 13 ++++++++++++-
 lib/vhost/vhost.h      |  3 +++
 lib/vhost/vhost_user.c |  6 ++++--
 lib/vhost/vhost_user.h |  1 -
 4 files changed, 19 insertions(+), 4 deletions(-)

diff --git a/lib/vhost/vhost.c b/lib/vhost/vhost.c
index 41f212315e..790eb06b28 100644
--- a/lib/vhost/vhost.c
+++ b/lib/vhost/vhost.c
@@ -52,6 +52,12 @@ static const struct vhost_vq_stats_name_off vhost_vq_stat_strings[] = {
 
 #define VHOST_NB_VQ_STATS RTE_DIM(vhost_vq_stat_strings)
 
+static int
+vhost_iotlb_miss(struct virtio_net *dev, uint64_t iova, uint8_t perm)
+{
+	return dev->backend_ops->iotlb_miss(dev, iova, perm);
+}
+
 uint64_t
 __vhost_iova_to_vva(struct virtio_net *dev, struct vhost_virtqueue *vq,
 		    uint64_t iova, uint64_t *size, uint8_t perm)
@@ -86,7 +92,7 @@ __vhost_iova_to_vva(struct virtio_net *dev, struct vhost_virtqueue *vq,
 		vhost_user_iotlb_rd_unlock(vq);
 
 		vhost_user_iotlb_pending_insert(dev, iova, perm);
-		if (vhost_user_iotlb_miss(dev, iova, perm)) {
+		if (vhost_iotlb_miss(dev, iova, perm)) {
 			VHOST_LOG_DATA(dev->ifname, ERR,
 				"IOTLB miss req failed for IOVA 0x%" PRIx64 "\n",
 				iova);
@@ -686,6 +692,11 @@ vhost_new_device(struct vhost_backend_ops *ops)
 		return -1;
 	}
 
+	if (ops->iotlb_miss == NULL) {
+		VHOST_LOG_CONFIG("device", ERR, "missing IOTLB miss backend op.\n");
+		return -1;
+	}
+
 	pthread_mutex_lock(&vhost_dev_lock);
 	for (i = 0; i < RTE_MAX_VHOST_DEVICE; i++) {
 		if (vhost_devices[i] == NULL)
diff --git a/lib/vhost/vhost.h b/lib/vhost/vhost.h
index 2ad26f6951..ee7640e901 100644
--- a/lib/vhost/vhost.h
+++ b/lib/vhost/vhost.h
@@ -92,11 +92,14 @@
 struct virtio_net;
 typedef void (*vhost_iotlb_remove_notify)(uint64_t addr, uint64_t off, uint64_t size);
 
+typedef int (*vhost_iotlb_miss_cb)(struct virtio_net *dev, uint64_t iova, uint8_t perm);
+
 /**
  * Structure that contains backend-specific ops.
  */
 struct vhost_backend_ops {
 	vhost_iotlb_remove_notify iotlb_remove_notify;
+	vhost_iotlb_miss_cb iotlb_miss;
 };
 
 /**
diff --git a/lib/vhost/vhost_user.c b/lib/vhost/vhost_user.c
index 2d5dec5bc1..6a9f32972a 100644
--- a/lib/vhost/vhost_user.c
+++ b/lib/vhost/vhost_user.c
@@ -3305,7 +3305,7 @@ vhost_user_msg_handler(int vid, int fd)
 	return ret;
 }
 
-int
+static int
 vhost_user_iotlb_miss(struct virtio_net *dev, uint64_t iova, uint8_t perm)
 {
 	int ret;
@@ -3465,7 +3465,9 @@ int rte_vhost_host_notifier_ctrl(int vid, uint16_t qid, bool enable)
 	return ret;
 }
 
-static struct vhost_backend_ops vhost_user_backend_ops;
+static struct vhost_backend_ops vhost_user_backend_ops = {
+	.iotlb_miss = vhost_user_iotlb_miss,
+};
 
 int
 vhost_user_new_device(void)
diff --git a/lib/vhost/vhost_user.h b/lib/vhost/vhost_user.h
index 61456049c8..1ffeca92f3 100644
--- a/lib/vhost/vhost_user.h
+++ b/lib/vhost/vhost_user.h
@@ -179,7 +179,6 @@ struct __rte_packed vhu_msg_context {
 
 /* vhost_user.c */
 int vhost_user_msg_handler(int vid, int fd);
-int vhost_user_iotlb_miss(struct virtio_net *dev, uint64_t iova, uint8_t perm);
 
 /* socket.c */
 int read_fd_message(char *ifname, int sockfd, char *buf, int buflen, int *fds, int max_fds,
-- 
2.39.2


^ permalink raw reply	[flat|nested] 79+ messages in thread

* [RFC 14/27] vhost: add helper for interrupt injection
  2023-03-31 15:42 [RFC 00/27] Add VDUSE support to Vhost library Maxime Coquelin
                   ` (12 preceding siblings ...)
  2023-03-31 15:42 ` [RFC 13/27] vhost: add helper for IOTLB misses Maxime Coquelin
@ 2023-03-31 15:42 ` Maxime Coquelin
  2023-05-05  5:07   ` Xia, Chenbo
  2023-03-31 15:42 ` [RFC 15/27] vhost: add API to set max queue pairs Maxime Coquelin
                   ` (15 subsequent siblings)
  29 siblings, 1 reply; 79+ messages in thread
From: Maxime Coquelin @ 2023-03-31 15:42 UTC (permalink / raw)
  To: dev, david.marchand, chenbo.xia, mkp, fbl, jasowang,
	cunming.liang, xieyongji, echaudro, eperezma, amorenoz
  Cc: Maxime Coquelin

Vhost-user uses eventfd to inject IRQs, but VDUSE uses
an ioctl.

This patch prepares vhost_vring_call_split() and
vhost_vring_call_packed() to support VDUSE by introducing
a new helper.

It also adds a new counter for guest notification failures,
which can happen for example when the call file descriptor
is not initialized.
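
For reference, the Vhost-user inject_irq callback boils down to an
eventfd write; a minimal standalone demo (assuming Linux eventfd) is:

#include <inttypes.h>
#include <stdio.h>
#include <sys/eventfd.h>
#include <unistd.h>

int main(void)
{
	int callfd = eventfd(0, 0);
	eventfd_t val = 0;

	if (callfd < 0)
		return 1;

	/* Backend side: inject the interrupt. */
	if (eventfd_write(callfd, (eventfd_t)1) < 0)
		return 1;

	/* Consumer side (the guest/frontend in the real setup). */
	eventfd_read(callfd, &val);
	printf("notified, counter=%" PRIu64 "\n", (uint64_t)val);

	close(callfd);
	return 0;
}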

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
---
 lib/vhost/vhost.c      |  6 +++++
 lib/vhost/vhost.h      | 54 +++++++++++++++++++++++-------------------
 lib/vhost/vhost_user.c | 10 ++++++++
 3 files changed, 46 insertions(+), 24 deletions(-)

diff --git a/lib/vhost/vhost.c b/lib/vhost/vhost.c
index 790eb06b28..c07028f2b3 100644
--- a/lib/vhost/vhost.c
+++ b/lib/vhost/vhost.c
@@ -44,6 +44,7 @@ static const struct vhost_vq_stats_name_off vhost_vq_stat_strings[] = {
 	{"size_1024_1518_packets", offsetof(struct vhost_virtqueue, stats.size_bins[6])},
 	{"size_1519_max_packets",  offsetof(struct vhost_virtqueue, stats.size_bins[7])},
 	{"guest_notifications",    offsetof(struct vhost_virtqueue, stats.guest_notifications)},
+	{"guest_notifications_error",    offsetof(struct vhost_virtqueue, stats.guest_notifications_error)},
 	{"iotlb_hits",             offsetof(struct vhost_virtqueue, stats.iotlb_hits)},
 	{"iotlb_misses",           offsetof(struct vhost_virtqueue, stats.iotlb_misses)},
 	{"inflight_submitted",     offsetof(struct vhost_virtqueue, stats.inflight_submitted)},
@@ -697,6 +698,11 @@ vhost_new_device(struct vhost_backend_ops *ops)
 		return -1;
 	}
 
+	if (ops->inject_irq == NULL) {
+		VHOST_LOG_CONFIG("device", ERR, "missing IRQ injection backend op.\n");
+		return -1;
+	}
+
 	pthread_mutex_lock(&vhost_dev_lock);
 	for (i = 0; i < RTE_MAX_VHOST_DEVICE; i++) {
 		if (vhost_devices[i] == NULL)
diff --git a/lib/vhost/vhost.h b/lib/vhost/vhost.h
index ee7640e901..8f0875b4e2 100644
--- a/lib/vhost/vhost.h
+++ b/lib/vhost/vhost.h
@@ -90,16 +90,20 @@
 #endif
 
 struct virtio_net;
+struct vhost_virtqueue;
+
 typedef void (*vhost_iotlb_remove_notify)(uint64_t addr, uint64_t off, uint64_t size);
 
 typedef int (*vhost_iotlb_miss_cb)(struct virtio_net *dev, uint64_t iova, uint8_t perm);
 
+typedef int (*vhost_vring_inject_irq_cb)(struct virtio_net *dev, struct vhost_virtqueue *vq);
 /**
  * Structure that contains backend-specific ops.
  */
 struct vhost_backend_ops {
 	vhost_iotlb_remove_notify iotlb_remove_notify;
 	vhost_iotlb_miss_cb iotlb_miss;
+	vhost_vring_inject_irq_cb inject_irq;
 };
 
 /**
@@ -149,6 +153,7 @@ struct virtqueue_stats {
 	/* Size bins in array as RFC 2819, undersized [0], 64 [1], etc */
 	uint64_t size_bins[8];
 	uint64_t guest_notifications;
+	uint64_t guest_notifications_error;
 	uint64_t iotlb_hits;
 	uint64_t iotlb_misses;
 	uint64_t inflight_submitted;
@@ -900,6 +905,24 @@ vhost_need_event(uint16_t event_idx, uint16_t new_idx, uint16_t old)
 	return (uint16_t)(new_idx - event_idx - 1) < (uint16_t)(new_idx - old);
 }
 
+static __rte_always_inline void
+vhost_vring_inject_irq(struct virtio_net *dev, struct vhost_virtqueue *vq)
+{
+	int ret;
+
+	ret = dev->backend_ops->inject_irq(dev, vq);
+	if (ret) {
+		if (dev->flags & VIRTIO_DEV_STATS_ENABLED)
+			vq->stats.guest_notifications_error++;
+		return;
+	}
+
+	if (dev->flags & VIRTIO_DEV_STATS_ENABLED)
+		vq->stats.guest_notifications++;
+	if (dev->notify_ops->guest_notified)
+		dev->notify_ops->guest_notified(dev->vid);
+}
+
 static __rte_always_inline void
 vhost_vring_call_split(struct virtio_net *dev, struct vhost_virtqueue *vq)
 {
@@ -919,25 +942,13 @@ vhost_vring_call_split(struct virtio_net *dev, struct vhost_virtqueue *vq)
 			"%s: used_event_idx=%d, old=%d, new=%d\n",
 			__func__, vhost_used_event(vq), old, new);
 
-		if ((vhost_need_event(vhost_used_event(vq), new, old) ||
-					unlikely(!signalled_used_valid)) &&
-				vq->callfd >= 0) {
-			eventfd_write(vq->callfd, (eventfd_t) 1);
-			if (dev->flags & VIRTIO_DEV_STATS_ENABLED)
-				vq->stats.guest_notifications++;
-			if (dev->notify_ops->guest_notified)
-				dev->notify_ops->guest_notified(dev->vid);
-		}
+		if (vhost_need_event(vhost_used_event(vq), new, old) ||
+					unlikely(!signalled_used_valid))
+			vhost_vring_inject_irq(dev, vq);
 	} else {
 		/* Kick the guest if necessary. */
-		if (!(vq->avail->flags & VRING_AVAIL_F_NO_INTERRUPT)
-				&& (vq->callfd >= 0)) {
-			eventfd_write(vq->callfd, (eventfd_t)1);
-			if (dev->flags & VIRTIO_DEV_STATS_ENABLED)
-				vq->stats.guest_notifications++;
-			if (dev->notify_ops->guest_notified)
-				dev->notify_ops->guest_notified(dev->vid);
-		}
+		if (!(vq->avail->flags & VRING_AVAIL_F_NO_INTERRUPT))
+			vhost_vring_inject_irq(dev, vq);
 	}
 }
 
@@ -988,13 +999,8 @@ vhost_vring_call_packed(struct virtio_net *dev, struct vhost_virtqueue *vq)
 	if (vhost_need_event(off, new, old))
 		kick = true;
 kick:
-	if (kick && vq->callfd >= 0) {
-		eventfd_write(vq->callfd, (eventfd_t)1);
-		if (dev->flags & VIRTIO_DEV_STATS_ENABLED)
-			vq->stats.guest_notifications++;
-		if (dev->notify_ops->guest_notified)
-			dev->notify_ops->guest_notified(dev->vid);
-	}
+	if (kick)
+		vhost_vring_inject_irq(dev, vq);
 }
 
 static __rte_always_inline void
diff --git a/lib/vhost/vhost_user.c b/lib/vhost/vhost_user.c
index 6a9f32972a..2e4a9fdea4 100644
--- a/lib/vhost/vhost_user.c
+++ b/lib/vhost/vhost_user.c
@@ -3465,8 +3465,18 @@ int rte_vhost_host_notifier_ctrl(int vid, uint16_t qid, bool enable)
 	return ret;
 }
 
+static int
+vhost_user_inject_irq(struct virtio_net *dev __rte_unused, struct vhost_virtqueue *vq)
+{
+	if (vq->callfd < 0)
+		return -1;
+
+	return eventfd_write(vq->callfd, (eventfd_t)1);
+}
+
 static struct vhost_backend_ops vhost_user_backend_ops = {
 	.iotlb_miss = vhost_user_iotlb_miss,
+	.inject_irq = vhost_user_inject_irq,
 };
 
 int
-- 
2.39.2


^ permalink raw reply	[flat|nested] 79+ messages in thread

* [RFC 15/27] vhost: add API to set max queue pairs
  2023-03-31 15:42 [RFC 00/27] Add VDUSE support to Vhost library Maxime Coquelin
                   ` (13 preceding siblings ...)
  2023-03-31 15:42 ` [RFC 14/27] vhost: add helper for interrupt injection Maxime Coquelin
@ 2023-03-31 15:42 ` Maxime Coquelin
  2023-05-05  5:07   ` Xia, Chenbo
  2023-03-31 15:42 ` [RFC 16/27] net/vhost: use " Maxime Coquelin
                   ` (14 subsequent siblings)
  29 siblings, 1 reply; 79+ messages in thread
From: Maxime Coquelin @ 2023-03-31 15:42 UTC (permalink / raw)
  To: dev, david.marchand, chenbo.xia, mkp, fbl, jasowang,
	cunming.liang, xieyongji, echaudro, eperezma, amorenoz
  Cc: Maxime Coquelin

This patch introduces a new rte_vhost_driver_set_max_queues
API as preliminary work for multiqueue support with VDUSE.

Indeed, with VDUSE the vrings need to be pre-allocated at
device creation time, so such an API is needed to avoid
always allocating the 128 queue pairs supported by the
Vhost library.

Calling the API is optional; 128 queue pairs remain the
default. A usage sketch follows.
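
A hedged usage sketch of the new API together with the existing
registration functions; "/tmp/vhost.sock" and the queue pair count are
illustrative values only.

#include <rte_vhost.h>

static int setup_vhost_socket(void)
{
	const char *path = "/tmp/vhost.sock";

	if (rte_vhost_driver_register(path, 0) < 0)
		return -1;

	/* Cap the device at 4 queue pairs instead of the default 128. */
	if (rte_vhost_driver_set_max_queue_num(path, 4) < 0)
		return -1;

	return rte_vhost_driver_start(path);
}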

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
---
 doc/guides/prog_guide/vhost_lib.rst |  4 ++++
 lib/vhost/rte_vhost.h               | 17 ++++++++++++++
 lib/vhost/socket.c                  | 36 +++++++++++++++++++++++++++--
 lib/vhost/version.map               |  3 +++
 4 files changed, 58 insertions(+), 2 deletions(-)

diff --git a/doc/guides/prog_guide/vhost_lib.rst b/doc/guides/prog_guide/vhost_lib.rst
index e8bb8c9b7b..cd4b109139 100644
--- a/doc/guides/prog_guide/vhost_lib.rst
+++ b/doc/guides/prog_guide/vhost_lib.rst
@@ -334,6 +334,10 @@ The following is an overview of some key Vhost API functions:
   Clean DMA vChannel finished to use. After this function is called,
   the specified DMA vChannel should no longer be used by the Vhost library.
 
+* ``rte_vhost_driver_set_max_queue_num(path, max_queue_pairs)``
+
+  Set the maximum number of queue pairs supported by the device.
+
 Vhost-user Implementations
 --------------------------
 
diff --git a/lib/vhost/rte_vhost.h b/lib/vhost/rte_vhost.h
index 58a5d4be92..44cbfcb469 100644
--- a/lib/vhost/rte_vhost.h
+++ b/lib/vhost/rte_vhost.h
@@ -588,6 +588,23 @@ rte_vhost_driver_get_protocol_features(const char *path,
 int
 rte_vhost_driver_get_queue_num(const char *path, uint32_t *queue_num);
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change, or be removed, without prior notice.
+ *
+ * Set the maximum number of queue pairs supported by the device.
+ *
+ * @param path
+ *  The vhost-user socket file path
+ * @param max_queue_pairs
+ *  The maximum number of queue pairs
+ * @return
+ *  0 on success, -1 on failure
+ */
+__rte_experimental
+int
+rte_vhost_driver_set_max_queue_num(const char *path, uint32_t max_queue_pairs);
+
 /**
  * Get the feature bits after negotiation
  *
diff --git a/lib/vhost/socket.c b/lib/vhost/socket.c
index ba54263824..e95c3ffeac 100644
--- a/lib/vhost/socket.c
+++ b/lib/vhost/socket.c
@@ -56,6 +56,8 @@ struct vhost_user_socket {
 
 	uint64_t protocol_features;
 
+	uint32_t max_queue_pairs;
+
 	struct rte_vdpa_device *vdpa_dev;
 
 	struct rte_vhost_device_ops const *notify_ops;
@@ -821,7 +823,7 @@ rte_vhost_driver_get_queue_num(const char *path, uint32_t *queue_num)
 
 	vdpa_dev = vsocket->vdpa_dev;
 	if (!vdpa_dev) {
-		*queue_num = VHOST_MAX_QUEUE_PAIRS;
+		*queue_num = vsocket->max_queue_pairs;
 		goto unlock_exit;
 	}
 
@@ -831,7 +833,36 @@ rte_vhost_driver_get_queue_num(const char *path, uint32_t *queue_num)
 		goto unlock_exit;
 	}
 
-	*queue_num = RTE_MIN((uint32_t)VHOST_MAX_QUEUE_PAIRS, vdpa_queue_num);
+	*queue_num = RTE_MIN(vsocket->max_queue_pairs, vdpa_queue_num);
+
+unlock_exit:
+	pthread_mutex_unlock(&vhost_user.mutex);
+	return ret;
+}
+
+int
+rte_vhost_driver_set_max_queue_num(const char *path, uint32_t max_queue_pairs)
+{
+	struct vhost_user_socket *vsocket;
+	int ret = 0;
+
+	VHOST_LOG_CONFIG(path, INFO, "Setting max queue pairs to %u\n", max_queue_pairs);
+
+	if (max_queue_pairs > VHOST_MAX_QUEUE_PAIRS) {
+		VHOST_LOG_CONFIG(path, ERR, "Library only supports up to %u queue pairs\n",
+				VHOST_MAX_QUEUE_PAIRS);
+		return -1;
+	}
+
+	pthread_mutex_lock(&vhost_user.mutex);
+	vsocket = find_vhost_user_socket(path);
+	if (!vsocket) {
+		VHOST_LOG_CONFIG(path, ERR, "socket file is not registered yet.\n");
+		ret = -1;
+		goto unlock_exit;
+	}
+
+	vsocket->max_queue_pairs = max_queue_pairs;
 
 unlock_exit:
 	pthread_mutex_unlock(&vhost_user.mutex);
@@ -890,6 +921,7 @@ rte_vhost_driver_register(const char *path, uint64_t flags)
 		goto out_free;
 	}
 	vsocket->vdpa_dev = NULL;
+	vsocket->max_queue_pairs = VHOST_MAX_QUEUE_PAIRS;
 	vsocket->extbuf = flags & RTE_VHOST_USER_EXTBUF_SUPPORT;
 	vsocket->linearbuf = flags & RTE_VHOST_USER_LINEARBUF_SUPPORT;
 	vsocket->async_copy = flags & RTE_VHOST_USER_ASYNC_COPY;
diff --git a/lib/vhost/version.map b/lib/vhost/version.map
index d322a4a888..dffb126aa8 100644
--- a/lib/vhost/version.map
+++ b/lib/vhost/version.map
@@ -98,6 +98,9 @@ EXPERIMENTAL {
 	# added in 22.11
 	rte_vhost_async_dma_unconfigure;
 	rte_vhost_vring_call_nonblock;
+
+	# added in 23.07
+	rte_vhost_driver_set_max_queue_num;
 };
 
 INTERNAL {
-- 
2.39.2


^ permalink raw reply	[flat|nested] 79+ messages in thread

* [RFC 16/27] net/vhost: use API to set max queue pairs
  2023-03-31 15:42 [RFC 00/27] Add VDUSE support to Vhost library Maxime Coquelin
                   ` (14 preceding siblings ...)
  2023-03-31 15:42 ` [RFC 15/27] vhost: add API to set max queue pairs Maxime Coquelin
@ 2023-03-31 15:42 ` Maxime Coquelin
  2023-05-05  5:07   ` Xia, Chenbo
  2023-03-31 15:42 ` [RFC 17/27] vhost: add control virtqueue support Maxime Coquelin
                   ` (13 subsequent siblings)
  29 siblings, 1 reply; 79+ messages in thread
From: Maxime Coquelin @ 2023-03-31 15:42 UTC (permalink / raw)
  To: dev, david.marchand, chenbo.xia, mkp, fbl, jasowang,
	cunming.liang, xieyongji, echaudro, eperezma, amorenoz
  Cc: Maxime Coquelin

In order to support multiqueue with VDUSE, we need to be
able to limit the maximum number of queue pairs to avoid
unnecessary memory consumption, since the maximum number of
queue pairs needs to be allocated at device creation time,
as opposed to Vhost-user which allocates them only when the
frontend initializes them.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
---
 drivers/net/vhost/rte_eth_vhost.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/net/vhost/rte_eth_vhost.c b/drivers/net/vhost/rte_eth_vhost.c
index 62ef955ebc..8d37ec9775 100644
--- a/drivers/net/vhost/rte_eth_vhost.c
+++ b/drivers/net/vhost/rte_eth_vhost.c
@@ -1013,6 +1013,9 @@ vhost_driver_setup(struct rte_eth_dev *eth_dev)
 			goto drv_unreg;
 	}
 
+	if (rte_vhost_driver_set_max_queue_num(internal->iface_name, internal->max_queues))
+		goto drv_unreg;
+
 	if (rte_vhost_driver_callback_register(internal->iface_name,
 					       &vhost_ops) < 0) {
 		VHOST_LOG(ERR, "Can't register callbacks\n");
-- 
2.39.2


^ permalink raw reply	[flat|nested] 79+ messages in thread

* [RFC 17/27] vhost: add control virtqueue support
  2023-03-31 15:42 [RFC 00/27] Add VDUSE support to Vhost library Maxime Coquelin
                   ` (15 preceding siblings ...)
  2023-03-31 15:42 ` [RFC 16/27] net/vhost: use " Maxime Coquelin
@ 2023-03-31 15:42 ` Maxime Coquelin
  2023-05-09  5:29   ` Xia, Chenbo
  2023-03-31 15:42 ` [RFC 18/27] vhost: add VDUSE device creation and destruction Maxime Coquelin
                   ` (12 subsequent siblings)
  29 siblings, 1 reply; 79+ messages in thread
From: Maxime Coquelin @ 2023-03-31 15:42 UTC (permalink / raw)
  To: dev, david.marchand, chenbo.xia, mkp, fbl, jasowang,
	cunming.liang, xieyongji, echaudro, eperezma, amorenoz
  Cc: Maxime Coquelin

In order to support multi-queue with VDUSE, control queue
support is required.

This patch adds the control queue implementation; it will be
used later when adding VDUSE support. Only the split ring
layout is supported for now, packed ring support will be
added later.
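
A hedged, self-contained mock of the control message layout handled
here (class/command header, command-specific data, one-byte ack written
back by the device); the constants follow the VirtIO specification,
the rest is illustrative.

#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define VIRTIO_NET_CTRL_MQ               4
#define VIRTIO_NET_CTRL_MQ_VQ_PAIRS_SET  0
#define VIRTIO_NET_OK  0
#define VIRTIO_NET_ERR 1

struct ctrl_hdr {
	uint8_t class;
	uint8_t command;
};

static uint8_t handle_ctrl(const struct ctrl_hdr *hdr, const void *data)
{
	if (hdr->class == VIRTIO_NET_CTRL_MQ &&
			hdr->command == VIRTIO_NET_CTRL_MQ_VQ_PAIRS_SET) {
		uint16_t pairs;

		memcpy(&pairs, data, sizeof(pairs));
		printf("enable %u queue pairs\n", pairs);
		return VIRTIO_NET_OK;
	}
	return VIRTIO_NET_ERR;
}

int main(void)
{
	struct ctrl_hdr hdr = { VIRTIO_NET_CTRL_MQ, VIRTIO_NET_CTRL_MQ_VQ_PAIRS_SET };
	uint16_t pairs = 4;
	uint8_t ack = handle_ctrl(&hdr, &pairs);

	printf("ack=%u\n", ack);
	return 0;
}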

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
---
 lib/vhost/meson.build       |   1 +
 lib/vhost/vhost.h           |   2 +
 lib/vhost/virtio_net_ctrl.c | 282 ++++++++++++++++++++++++++++++++++++
 lib/vhost/virtio_net_ctrl.h |  10 ++
 4 files changed, 295 insertions(+)
 create mode 100644 lib/vhost/virtio_net_ctrl.c
 create mode 100644 lib/vhost/virtio_net_ctrl.h

diff --git a/lib/vhost/meson.build b/lib/vhost/meson.build
index 197a51d936..cdcd403df3 100644
--- a/lib/vhost/meson.build
+++ b/lib/vhost/meson.build
@@ -28,6 +28,7 @@ sources = files(
         'vhost_crypto.c',
         'vhost_user.c',
         'virtio_net.c',
+        'virtio_net_ctrl.c',
 )
 headers = files(
         'rte_vdpa.h',
diff --git a/lib/vhost/vhost.h b/lib/vhost/vhost.h
index 8f0875b4e2..76663aed24 100644
--- a/lib/vhost/vhost.h
+++ b/lib/vhost/vhost.h
@@ -525,6 +525,8 @@ struct virtio_net {
 	int			postcopy_ufd;
 	int			postcopy_listening;
 
+	struct vhost_virtqueue	*cvq;
+
 	struct rte_vdpa_device *vdpa_dev;
 
 	/* context data for the external message handlers */
diff --git a/lib/vhost/virtio_net_ctrl.c b/lib/vhost/virtio_net_ctrl.c
new file mode 100644
index 0000000000..16ea63b42f
--- /dev/null
+++ b/lib/vhost/virtio_net_ctrl.c
@@ -0,0 +1,282 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2023 Red Hat, Inc.
+ */
+
+#undef RTE_ANNOTATE_LOCKS
+
+#include <stdint.h>
+#include <stdio.h>
+#include <unistd.h>
+
+#include "vhost.h"
+#include "virtio_net_ctrl.h"
+
+struct virtio_net_ctrl {
+	uint8_t class;
+	uint8_t command;
+	uint8_t command_data[];
+};
+
+struct virtio_net_ctrl_elem {
+	struct virtio_net_ctrl *ctrl_req;
+	uint16_t head_idx;
+	uint16_t n_descs;
+	uint8_t *desc_ack;
+};
+
+static int
+virtio_net_ctrl_pop(struct virtio_net *dev, struct virtio_net_ctrl_elem *ctrl_elem)
+{
+	struct vhost_virtqueue *cvq = dev->cvq;
+	uint16_t avail_idx, desc_idx, n_descs = 0;
+	uint64_t desc_len, desc_addr, desc_iova, data_len = 0;
+	uint8_t *ctrl_req;
+	struct vring_desc *descs;
+
+	avail_idx = __atomic_load_n(&cvq->avail->idx, __ATOMIC_ACQUIRE);
+	if (avail_idx == cvq->last_avail_idx) {
+		VHOST_LOG_CONFIG(dev->ifname, DEBUG, "Control queue empty\n");
+		return 0;
+	}
+
+	desc_idx = cvq->avail->ring[cvq->last_avail_idx];
+	if (desc_idx >= cvq->size) {
+		VHOST_LOG_CONFIG(dev->ifname, ERR, "Out of range desc index, dropping\n");
+		goto err;
+	}
+
+	ctrl_elem->head_idx = desc_idx;
+
+	if (cvq->desc[desc_idx].flags & VRING_DESC_F_INDIRECT) {
+		desc_len = cvq->desc[desc_idx].len;
+		desc_iova = cvq->desc[desc_idx].addr;
+
+		descs = (struct vring_desc *)(uintptr_t)vhost_iova_to_vva(dev, cvq,
+					desc_iova, &desc_len, VHOST_ACCESS_RO);
+		if (!descs || desc_len != cvq->desc[desc_idx].len) {
+			VHOST_LOG_CONFIG(dev->ifname, ERR, "Failed to map ctrl indirect descs\n");
+			goto err;
+		}
+
+		desc_idx = 0;
+	} else {
+		descs = cvq->desc;
+	}
+
+	while (1) {
+		desc_len = descs[desc_idx].len;
+		desc_iova = descs[desc_idx].addr;
+
+		n_descs++;
+
+		if (descs[desc_idx].flags & VRING_DESC_F_WRITE) {
+			if (ctrl_elem->desc_ack) {
+				VHOST_LOG_CONFIG(dev->ifname, ERR,
+						"Unexpected ctrl chain layout\n");
+				goto err;
+			}
+
+			if (desc_len != sizeof(uint8_t)) {
+				VHOST_LOG_CONFIG(dev->ifname, ERR,
+						"Invalid ack size for ctrl req, dropping\n");
+				goto err;
+			}
+
+			ctrl_elem->desc_ack = (uint8_t *)(uintptr_t)vhost_iova_to_vva(dev, cvq,
+					desc_iova, &desc_len, VHOST_ACCESS_WO);
+			if (!ctrl_elem->desc_ack || desc_len != sizeof(uint8_t)) {
+				VHOST_LOG_CONFIG(dev->ifname, ERR,
+						"Failed to map ctrl ack descriptor\n");
+				goto err;
+			}
+		} else {
+			if (ctrl_elem->desc_ack) {
+				VHOST_LOG_CONFIG(dev->ifname, ERR,
+						"Unexpected ctrl chain layout\n");
+				goto err;
+			}
+
+			data_len += desc_len;
+		}
+
+		if (!(descs[desc_idx].flags & VRING_DESC_F_NEXT))
+			break;
+
+		desc_idx = descs[desc_idx].next;
+	}
+
+	desc_idx = ctrl_elem->head_idx;
+
+	if (cvq->desc[desc_idx].flags & VRING_DESC_F_INDIRECT)
+		ctrl_elem->n_descs = 1;
+	else
+		ctrl_elem->n_descs = n_descs;
+
+	if (!ctrl_elem->desc_ack) {
+		VHOST_LOG_CONFIG(dev->ifname, ERR, "Missing ctrl ack descriptor\n");
+		goto err;
+	}
+
+	if (data_len < sizeof(ctrl_elem->ctrl_req->class) + sizeof(ctrl_elem->ctrl_req->command)) {
+		VHOST_LOG_CONFIG(dev->ifname, ERR, "Invalid control header size\n");
+		goto err;
+	}
+
+	ctrl_elem->ctrl_req = malloc(data_len);
+	if (!ctrl_elem->ctrl_req) {
+		VHOST_LOG_CONFIG(dev->ifname, ERR, "Failed to alloc ctrl request\n");
+		goto err;
+	}
+
+	ctrl_req = (uint8_t *)ctrl_elem->ctrl_req;
+
+	if (cvq->desc[desc_idx].flags & VRING_DESC_F_INDIRECT) {
+		desc_len = cvq->desc[desc_idx].len;
+		desc_iova = cvq->desc[desc_idx].addr;
+
+		descs = (struct vring_desc *)(uintptr_t)vhost_iova_to_vva(dev, cvq,
+					desc_iova, &desc_len, VHOST_ACCESS_RO);
+		if (!descs || desc_len != cvq->desc[desc_idx].len) {
+			VHOST_LOG_CONFIG(dev->ifname, ERR, "Failed to map ctrl indirect descs\n");
+			goto err;
+		}
+
+		desc_idx = 0;
+	} else {
+		descs = cvq->desc;
+	}
+
+	while (!(descs[desc_idx].flags & VRING_DESC_F_WRITE)) {
+		desc_len = descs[desc_idx].len;
+		desc_iova = descs[desc_idx].addr;
+
+		desc_addr = vhost_iova_to_vva(dev, cvq, desc_iova, &desc_len, VHOST_ACCESS_RO);
+		if (!desc_addr || desc_len < descs[desc_idx].len) {
+			VHOST_LOG_CONFIG(dev->ifname, ERR, "Failed to map ctrl descriptor\n");
+			goto free_err;
+		}
+
+		memcpy(ctrl_req, (void *)(uintptr_t)desc_addr, desc_len);
+		ctrl_req += desc_len;
+
+		if (!(descs[desc_idx].flags & VRING_DESC_F_NEXT))
+			break;
+
+		desc_idx = descs[desc_idx].next;
+	}
+
+	cvq->last_avail_idx++;
+	if (cvq->last_avail_idx >= cvq->size)
+		cvq->last_avail_idx -= cvq->size;
+
+	if (dev->features & (1ULL << VIRTIO_RING_F_EVENT_IDX))
+		vhost_avail_event(cvq) = cvq->last_avail_idx;
+
+	return 1;
+
+free_err:
+	free(ctrl_elem->ctrl_req);
+err:
+	cvq->last_avail_idx++;
+	if (cvq->last_avail_idx >= cvq->size)
+		cvq->last_avail_idx -= cvq->size;
+
+	if (dev->features & (1ULL << VIRTIO_RING_F_EVENT_IDX))
+		vhost_avail_event(cvq) = cvq->last_avail_idx;
+
+	return -1;
+}
+
+static uint8_t
+virtio_net_ctrl_handle_req(struct virtio_net *dev, struct virtio_net_ctrl *ctrl_req)
+{
+	uint8_t ret = VIRTIO_NET_ERR;
+
+	if (ctrl_req->class == VIRTIO_NET_CTRL_MQ &&
+			ctrl_req->command == VIRTIO_NET_CTRL_MQ_VQ_PAIRS_SET) {
+		uint16_t queue_pairs;
+		uint32_t i;
+
+		queue_pairs = *(uint16_t *)(uintptr_t)ctrl_req->command_data;
+		VHOST_LOG_CONFIG(dev->ifname, INFO, "Ctrl req: MQ %u queue pairs\n", queue_pairs);
+		ret = VIRTIO_NET_OK;
+
+		for (i = 0; i < dev->nr_vring; i++) {
+			struct vhost_virtqueue *vq = dev->virtqueue[i];
+			bool enable;
+
+			if (vq == dev->cvq)
+				continue;
+
+			if (i < queue_pairs * 2)
+				enable = true;
+			else
+				enable = false;
+
+			vq->enabled = enable;
+			if (dev->notify_ops->vring_state_changed)
+				dev->notify_ops->vring_state_changed(dev->vid, i, enable);
+		}
+	}
+
+	return ret;
+}
+
+static int
+virtio_net_ctrl_push(struct virtio_net *dev, struct virtio_net_ctrl_elem *ctrl_elem)
+{
+	struct vhost_virtqueue *cvq = dev->cvq;
+	struct vring_used_elem *used_elem;
+
+	used_elem = &cvq->used->ring[cvq->last_used_idx];
+	used_elem->id = ctrl_elem->head_idx;
+	used_elem->len = ctrl_elem->n_descs;
+
+	cvq->last_used_idx++;
+	if (cvq->last_used_idx >= cvq->size)
+		cvq->last_used_idx -= cvq->size;
+
+	__atomic_store_n(&cvq->used->idx, cvq->last_used_idx, __ATOMIC_RELEASE);
+
+	free(ctrl_elem->ctrl_req);
+
+	return 0;
+}
+
+int
+virtio_net_ctrl_handle(struct virtio_net *dev)
+{
+	int ret = 0;
+
+	if (dev->features & (1ULL << VIRTIO_F_RING_PACKED)) {
+		VHOST_LOG_CONFIG(dev->ifname, ERR, "Packed ring not supported yet\n");
+		return -1;
+	}
+
+	if (!dev->cvq) {
+		VHOST_LOG_CONFIG(dev->ifname, ERR, "missing control queue\n");
+		return -1;
+	}
+
+	rte_spinlock_lock(&dev->cvq->access_lock);
+
+	while (1) {
+		struct virtio_net_ctrl_elem ctrl_elem;
+
+		memset(&ctrl_elem, 0, sizeof(struct virtio_net_ctrl_elem));
+
+		ret = virtio_net_ctrl_pop(dev, &ctrl_elem);
+		if (ret <= 0)
+			break;
+
+		*ctrl_elem.desc_ack = virtio_net_ctrl_handle_req(dev, ctrl_elem.ctrl_req);
+
+		ret = virtio_net_ctrl_push(dev, &ctrl_elem);
+		if (ret < 0)
+			break;
+	}
+
+	rte_spinlock_unlock(&dev->cvq->access_lock);
+
+	return ret;
+}
diff --git a/lib/vhost/virtio_net_ctrl.h b/lib/vhost/virtio_net_ctrl.h
new file mode 100644
index 0000000000..9a90f4b9da
--- /dev/null
+++ b/lib/vhost/virtio_net_ctrl.h
@@ -0,0 +1,10 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2023 Red Hat, Inc.
+ */
+
+#ifndef _VIRTIO_NET_CTRL_H
+#define _VIRTIO_NET_CTRL_H
+
+int virtio_net_ctrl_handle(struct virtio_net *dev);
+
+#endif
-- 
2.39.2


^ permalink raw reply	[flat|nested] 79+ messages in thread

* [RFC 18/27] vhost: add VDUSE device creation and destruction
  2023-03-31 15:42 [RFC 00/27] Add VDUSE support to Vhost library Maxime Coquelin
                   ` (16 preceding siblings ...)
  2023-03-31 15:42 ` [RFC 17/27] vhost: add control virtqueue support Maxime Coquelin
@ 2023-03-31 15:42 ` Maxime Coquelin
  2023-05-09  5:31   ` Xia, Chenbo
  2023-03-31 15:42 ` [RFC 19/27] vhost: add VDUSE callback for IOTLB miss Maxime Coquelin
                   ` (11 subsequent siblings)
  29 siblings, 1 reply; 79+ messages in thread
From: Maxime Coquelin @ 2023-03-31 15:42 UTC (permalink / raw)
  To: dev, david.marchand, chenbo.xia, mkp, fbl, jasowang,
	cunming.liang, xieyongji, echaudro, eperezma, amorenoz
  Cc: Maxime Coquelin

This patch adds initial support for VDUSE, which includes
the device creation and destruction.

It does not include the virtqueue configuration, so it is
not functional at this point.
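
A hedged sketch of the creation sequence this patch performs (error
handling trimmed, features reduced to VIRTIO_F_VERSION_1 for brevity);
it assumes a VDUSE-enabled kernel, linux/vduse.h and sufficient
privileges.

#include <fcntl.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/vduse.h>
#include <linux/virtio_config.h>
#include <linux/virtio_ids.h>

static int create_vduse_net(const char *name)
{
	struct vduse_dev_config cfg;
	uint64_t ver = 0;   /* VDUSE API version 0 */
	int ctrl_fd = open("/dev/vduse/control", O_RDWR);

	if (ctrl_fd < 0)
		return -1;

	if (ioctl(ctrl_fd, VDUSE_SET_API_VERSION, &ver) < 0)
		goto err;

	memset(&cfg, 0, sizeof(cfg));
	strncpy(cfg.name, name, sizeof(cfg.name) - 1);
	cfg.device_id = VIRTIO_ID_NET;
	cfg.features = 1ULL << VIRTIO_F_VERSION_1;
	cfg.vq_num = 2;
	cfg.vq_align = sysconf(_SC_PAGE_SIZE);

	if (ioctl(ctrl_fd, VDUSE_CREATE_DEV, &cfg) < 0)
		goto err;

	return ctrl_fd;   /* /dev/vduse/<name> can now be opened */
err:
	close(ctrl_fd);
	return -1;
}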

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
---
 lib/vhost/meson.build |   4 +
 lib/vhost/socket.c    |  34 +++++---
 lib/vhost/vduse.c     | 184 ++++++++++++++++++++++++++++++++++++++++++
 lib/vhost/vduse.h     |  33 ++++++++
 lib/vhost/vhost.h     |   2 +
 5 files changed, 245 insertions(+), 12 deletions(-)
 create mode 100644 lib/vhost/vduse.c
 create mode 100644 lib/vhost/vduse.h

diff --git a/lib/vhost/meson.build b/lib/vhost/meson.build
index cdcd403df3..a57a15937f 100644
--- a/lib/vhost/meson.build
+++ b/lib/vhost/meson.build
@@ -30,6 +30,10 @@ sources = files(
         'virtio_net.c',
         'virtio_net_ctrl.c',
 )
+if cc.has_header('linux/vduse.h')
+    sources += files('vduse.c')
+    cflags += '-DVHOST_HAS_VDUSE'
+endif
 headers = files(
         'rte_vdpa.h',
         'rte_vhost.h',
diff --git a/lib/vhost/socket.c b/lib/vhost/socket.c
index e95c3ffeac..a8a1c4cd2b 100644
--- a/lib/vhost/socket.c
+++ b/lib/vhost/socket.c
@@ -18,6 +18,7 @@
 #include <rte_log.h>
 
 #include "fd_man.h"
+#include "vduse.h"
 #include "vhost.h"
 #include "vhost_user.h"
 
@@ -35,6 +36,7 @@ struct vhost_user_socket {
 	int socket_fd;
 	struct sockaddr_un un;
 	bool is_server;
+	bool is_vduse;
 	bool reconnect;
 	bool iommu_support;
 	bool use_builtin_virtio_net;
@@ -992,18 +994,21 @@ rte_vhost_driver_register(const char *path, uint64_t flags)
 #endif
 	}
 
-	if ((flags & RTE_VHOST_USER_CLIENT) != 0) {
-		vsocket->reconnect = !(flags & RTE_VHOST_USER_NO_RECONNECT);
-		if (vsocket->reconnect && reconn_tid == 0) {
-			if (vhost_user_reconnect_init() != 0)
-				goto out_mutex;
-		}
+	if (!strncmp("/dev/vduse/", path, strlen("/dev/vduse/"))) {
+		vsocket->is_vduse = true;
 	} else {
-		vsocket->is_server = true;
-	}
-	ret = create_unix_socket(vsocket);
-	if (ret < 0) {
-		goto out_mutex;
+		if ((flags & RTE_VHOST_USER_CLIENT) != 0) {
+			vsocket->reconnect = !(flags & RTE_VHOST_USER_NO_RECONNECT);
+			if (vsocket->reconnect && reconn_tid == 0) {
+				if (vhost_user_reconnect_init() != 0)
+					goto out_mutex;
+			}
+		} else {
+			vsocket->is_server = true;
+		}
+		ret = create_unix_socket(vsocket);
+		if (ret < 0)
+			goto out_mutex;
 	}
 
 	vhost_user.vsockets[vhost_user.vsocket_cnt++] = vsocket;
@@ -1068,7 +1073,9 @@ rte_vhost_driver_unregister(const char *path)
 		if (strcmp(vsocket->path, path))
 			continue;
 
-		if (vsocket->is_server) {
+		if (vsocket->is_vduse) {
+			vduse_device_destroy(path);
+		} else if (vsocket->is_server) {
 			/*
 			 * If r/wcb is executing, release vhost_user's
 			 * mutex lock, and try again since the r/wcb
@@ -1171,6 +1178,9 @@ rte_vhost_driver_start(const char *path)
 	if (!vsocket)
 		return -1;
 
+	if (vsocket->is_vduse)
+		return vduse_device_create(path);
+
 	if (fdset_tid == 0) {
 		/**
 		 * create a pipe which will be waited by poll and notified to
diff --git a/lib/vhost/vduse.c b/lib/vhost/vduse.c
new file mode 100644
index 0000000000..336761c97a
--- /dev/null
+++ b/lib/vhost/vduse.c
@@ -0,0 +1,184 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2023 Red Hat, Inc.
+ */
+
+#include <stdint.h>
+#include <stdio.h>
+#include <unistd.h>
+#include <fcntl.h>
+
+
+#include <linux/vduse.h>
+#include <linux/virtio_net.h>
+
+#include <sys/ioctl.h>
+#include <sys/mman.h>
+
+#include <rte_common.h>
+
+#include "vduse.h"
+#include "vhost.h"
+
+#define VHOST_VDUSE_API_VERSION 0
+#define VDUSE_CTRL_PATH "/dev/vduse/control"
+
+#define VDUSE_NET_SUPPORTED_FEATURES ((1ULL << VIRTIO_NET_F_MRG_RXBUF) | \
+				(1ULL << VIRTIO_F_ANY_LAYOUT) | \
+				(1ULL << VIRTIO_F_VERSION_1)   | \
+				(1ULL << VIRTIO_RING_F_INDIRECT_DESC) | \
+				(1ULL << VIRTIO_RING_F_EVENT_IDX) | \
+				(1ULL << VIRTIO_F_IN_ORDER) | \
+				(1ULL << VIRTIO_F_IOMMU_PLATFORM))
+
+static struct vhost_backend_ops vduse_backend_ops = {
+};
+
+int
+vduse_device_create(const char *path)
+{
+	int control_fd, dev_fd, vid, ret;
+	uint32_t i;
+	struct virtio_net *dev;
+	uint64_t ver = VHOST_VDUSE_API_VERSION;
+	struct vduse_dev_config *dev_config = NULL;
+	const char *name = path + strlen("/dev/vduse/");
+
+	control_fd = open(VDUSE_CTRL_PATH, O_RDWR);
+	if (control_fd < 0) {
+		VHOST_LOG_CONFIG(name, ERR, "Failed to open %s: %s\n",
+				VDUSE_CTRL_PATH, strerror(errno));
+		return -1;
+	}
+
+	if (ioctl(control_fd, VDUSE_SET_API_VERSION, &ver)) {
+		VHOST_LOG_CONFIG(name, ERR, "Failed to set API version: %" PRIu64 ": %s\n",
+				ver, strerror(errno));
+		ret = -1;
+		goto out_ctrl_close;
+	}
+
+	dev_config = malloc(offsetof(struct vduse_dev_config, config));
+	if (!dev_config) {
+		VHOST_LOG_CONFIG(name, ERR, "Failed to allocate VDUSE config\n");
+		ret = -1;
+		goto out_ctrl_close;
+	}
+
+	memset(dev_config, 0, sizeof(struct vduse_dev_config));
+
+	strncpy(dev_config->name, name, VDUSE_NAME_MAX - 1);
+	dev_config->device_id = VIRTIO_ID_NET;
+	dev_config->vendor_id = 0;
+	dev_config->features = VDUSE_NET_SUPPORTED_FEATURES;
+	dev_config->vq_num = 2;
+	dev_config->vq_align = sysconf(_SC_PAGE_SIZE);
+	dev_config->config_size = 0;
+
+	ret = ioctl(control_fd, VDUSE_CREATE_DEV, dev_config);
+	if (ret < 0) {
+		VHOST_LOG_CONFIG(name, ERR, "Failed to create VDUSE device: %s\n",
+				strerror(errno));
+		goto out_free;
+	}
+
+	dev_fd = open(path, O_RDWR);
+	if (dev_fd < 0) {
+		VHOST_LOG_CONFIG(name, ERR, "Failed to open device %s: %s\n",
+				path, strerror(errno));
+		ret = -1;
+		goto out_dev_close;
+	}
+
+	vid = vhost_new_device(&vduse_backend_ops);
+	if (vid < 0) {
+		VHOST_LOG_CONFIG(name, ERR, "Failed to create new Vhost device\n");
+		ret = -1;
+		goto out_dev_close;
+	}
+
+	dev = get_device(vid);
+	if (!dev) {
+		ret = -1;
+		goto out_dev_close;
+	}
+
+	strncpy(dev->ifname, path, IF_NAME_SZ - 1);
+	dev->vduse_ctrl_fd = control_fd;
+	dev->vduse_dev_fd = dev_fd;
+	vhost_setup_virtio_net(dev->vid, true, true, true, true);
+
+	for (i = 0; i < 2; i++) {
+		struct vduse_vq_config vq_cfg = { 0 };
+
+		ret = alloc_vring_queue(dev, i);
+		if (ret) {
+			VHOST_LOG_CONFIG(name, ERR, "Failed to alloc vring %d metadata\n", i);
+			goto out_dev_destroy;
+		}
+
+		vq_cfg.index = i;
+		vq_cfg.max_size = 1024;
+
+		ret = ioctl(dev->vduse_dev_fd, VDUSE_VQ_SETUP, &vq_cfg);
+		if (ret) {
+			VHOST_LOG_CONFIG(name, ERR, "Failed to set-up VQ %d\n", i);
+			goto out_dev_destroy;
+		}
+	}
+
+	free(dev_config);
+
+	return 0;
+
+out_dev_destroy:
+	vhost_destroy_device(vid);
+out_dev_close:
+	if (dev_fd >= 0)
+		close(dev_fd);
+	ioctl(control_fd, VDUSE_DESTROY_DEV, name);
+out_free:
+	free(dev_config);
+out_ctrl_close:
+	close(control_fd);
+
+	return ret;
+}
+
+int
+vduse_device_destroy(const char *path)
+{
+	const char *name = path + strlen("/dev/vduse/");
+	struct virtio_net *dev;
+	int vid, ret;
+
+	for (vid = 0; vid < RTE_MAX_VHOST_DEVICE; vid++) {
+		dev = vhost_devices[vid];
+
+		if (dev == NULL)
+			continue;
+
+		if (!strcmp(path, dev->ifname))
+			break;
+	}
+
+	if (vid == RTE_MAX_VHOST_DEVICE)
+		return -1;
+
+	if (dev->vduse_dev_fd >= 0) {
+		close(dev->vduse_dev_fd);
+		dev->vduse_dev_fd = -1;
+	}
+
+	if (dev->vduse_ctrl_fd >= 0) {
+		ret = ioctl(dev->vduse_ctrl_fd, VDUSE_DESTROY_DEV, name);
+		if (ret)
+			VHOST_LOG_CONFIG(name, ERR, "Failed to destroy VDUSE device: %s\n",
+					strerror(errno));
+		close(dev->vduse_ctrl_fd);
+		dev->vduse_ctrl_fd = -1;
+	}
+
+	vhost_destroy_device(vid);
+
+	return 0;
+}
diff --git a/lib/vhost/vduse.h b/lib/vhost/vduse.h
new file mode 100644
index 0000000000..a15e5d4c16
--- /dev/null
+++ b/lib/vhost/vduse.h
@@ -0,0 +1,33 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2023 Red Hat, Inc.
+ */
+
+#ifndef _VDUSE_H
+#define _VDUSE_H
+
+#include "vhost.h"
+
+#ifdef VHOST_HAS_VDUSE
+
+int vduse_device_create(const char *path);
+int vduse_device_destroy(const char *path);
+
+#else
+
+static inline int
+vduse_device_create(const char *path)
+{
+	VHOST_LOG_CONFIG(path, ERR, "VDUSE support disabled at build time\n");
+	return -1;
+}
+
+static inline int
+vduse_device_destroy(const char *path)
+{
+	VHOST_LOG_CONFIG(path, ERR, "VDUSE support disabled at build time\n");
+	return -1;
+}
+
+#endif /* VHOST_HAS_VDUSE */
+
+#endif /* _VDUSE_H */
diff --git a/lib/vhost/vhost.h b/lib/vhost/vhost.h
index 76663aed24..c8f2a0d43a 100644
--- a/lib/vhost/vhost.h
+++ b/lib/vhost/vhost.h
@@ -524,6 +524,8 @@ struct virtio_net {
 
 	int			postcopy_ufd;
 	int			postcopy_listening;
+	int			vduse_ctrl_fd;
+	int			vduse_dev_fd;
 
 	struct vhost_virtqueue	*cvq;
 
-- 
2.39.2


^ permalink raw reply	[flat|nested] 79+ messages in thread

* [RFC 19/27] vhost: add VDUSE callback for IOTLB miss
  2023-03-31 15:42 [RFC 00/27] Add VDUSE support to Vhost library Maxime Coquelin
                   ` (17 preceding siblings ...)
  2023-03-31 15:42 ` [RFC 18/27] vhost: add VDUSE device creation and destruction Maxime Coquelin
@ 2023-03-31 15:42 ` Maxime Coquelin
  2023-05-09  5:31   ` Xia, Chenbo
  2023-03-31 15:42 ` [RFC 20/27] vhost: add VDUSE callback for IOTLB entry removal Maxime Coquelin
                   ` (10 subsequent siblings)
  29 siblings, 1 reply; 79+ messages in thread
From: Maxime Coquelin @ 2023-03-31 15:42 UTC (permalink / raw)
  To: dev, david.marchand, chenbo.xia, mkp, fbl, jasowang,
	cunming.liang, xieyongji, echaudro, eperezma, amorenoz
  Cc: Maxime Coquelin

This patch implements the VDUSE callback for IOTLB misses:
it retrieves the file descriptor backing the faulting IOVA
via the VDUSE_IOTLB_GET_FD ioctl, maps it and inserts the
resulting entry into the IOTLB cache.
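
A hedged sketch of the mapping step (assuming linux/vduse.h); the fd
returned by the ioctl can be closed once the mapping is established,
which is why the page size has to be captured at insertion time.

#include <stddef.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <unistd.h>
#include <linux/vduse.h>

static void *map_iova(int vduse_dev_fd, uint64_t iova, uint64_t *size)
{
	struct vduse_iotlb_entry entry;
	void *addr;
	int fd;

	entry.start = iova;
	entry.last = iova + 1;

	fd = ioctl(vduse_dev_fd, VDUSE_IOTLB_GET_FD, &entry);
	if (fd < 0)
		return NULL;

	*size = entry.last - entry.start + 1;

	/* entry.perm values match PROT_READ/PROT_WRITE, as in the patch. */
	addr = mmap(NULL, *size + entry.offset, entry.perm, MAP_SHARED, fd, 0);

	close(fd);   /* the mapping stays valid after close() */

	return addr == MAP_FAILED ? NULL : addr;
}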

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
---
 lib/vhost/vduse.c | 58 +++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 58 insertions(+)

diff --git a/lib/vhost/vduse.c b/lib/vhost/vduse.c
index 336761c97a..f46823f589 100644
--- a/lib/vhost/vduse.c
+++ b/lib/vhost/vduse.c
@@ -13,9 +13,11 @@
 
 #include <sys/ioctl.h>
 #include <sys/mman.h>
+#include <sys/stat.h>
 
 #include <rte_common.h>
 
+#include "iotlb.h"
 #include "vduse.h"
 #include "vhost.h"
 
@@ -30,7 +32,63 @@
 				(1ULL << VIRTIO_F_IN_ORDER) | \
 				(1ULL << VIRTIO_F_IOMMU_PLATFORM))
 
+static int
+vduse_iotlb_miss(struct virtio_net *dev, uint64_t iova, uint8_t perm __rte_unused)
+{
+	struct vduse_iotlb_entry entry;
+	uint64_t size, page_size;
+	struct stat stat;
+	void *mmap_addr;
+	int fd, ret;
+
+	entry.start = iova;
+	entry.last = iova + 1;
+
+	ret = ioctl(dev->vduse_dev_fd, VDUSE_IOTLB_GET_FD, &entry);
+	if (ret < 0) {
+		VHOST_LOG_CONFIG(dev->ifname, ERR, "Failed to get IOTLB entry for 0x%" PRIx64 "\n",
+				iova);
+		return -1;
+	}
+
+	fd = ret;
+
+	VHOST_LOG_CONFIG(dev->ifname, DEBUG, "New IOTLB entry:\n");
+	VHOST_LOG_CONFIG(dev->ifname, DEBUG, "\tIOVA: %" PRIx64 " - %" PRIx64 "\n",
+			(uint64_t)entry.start, (uint64_t)entry.last);
+	VHOST_LOG_CONFIG(dev->ifname, DEBUG, "\toffset: %" PRIx64 "\n", (uint64_t)entry.offset);
+	VHOST_LOG_CONFIG(dev->ifname, DEBUG, "\tfd: %d\n", fd);
+	VHOST_LOG_CONFIG(dev->ifname, DEBUG, "\tperm: %x\n", entry.perm);
+
+	size = entry.last - entry.start + 1;
+	mmap_addr = mmap(0, size + entry.offset, entry.perm, MAP_SHARED, fd, 0);
+	if (!mmap_addr) {
+		VHOST_LOG_CONFIG(dev->ifname, ERR,
+				"Failed to mmap IOTLB entry for 0x%" PRIx64 "\n", iova);
+		ret = -1;
+		goto close_fd;
+	}
+
+	ret = fstat(fd, &stat);
+	if (ret < 0) {
+		VHOST_LOG_CONFIG(dev->ifname, ERR, "Failed to get page size.\n");
+		munmap(mmap_addr, entry.offset + size);
+		goto close_fd;
+	}
+	page_size = (uint64_t)stat.st_blksize;
+
+	vhost_user_iotlb_cache_insert(dev, entry.start, (uint64_t)(uintptr_t)mmap_addr,
+		entry.offset, size, page_size, entry.perm);
+
+	ret = 0;
+close_fd:
+	close(fd);
+
+	return ret;
+}
+
 static struct vhost_backend_ops vduse_backend_ops = {
+	.iotlb_miss = vduse_iotlb_miss,
 };
 
 int
-- 
2.39.2


^ permalink raw reply	[flat|nested] 79+ messages in thread

* [RFC 20/27] vhost: add VDUSE callback for IOTLB entry removal
  2023-03-31 15:42 [RFC 00/27] Add VDUSE support to Vhost library Maxime Coquelin
                   ` (18 preceding siblings ...)
  2023-03-31 15:42 ` [RFC 19/27] vhost: add VDUSE callback for IOTLB miss Maxime Coquelin
@ 2023-03-31 15:42 ` Maxime Coquelin
  2023-05-09  5:32   ` Xia, Chenbo
  2023-03-31 15:42 ` [RFC 21/27] vhost: add VDUSE callback for IRQ injection Maxime Coquelin
                   ` (9 subsequent siblings)
  29 siblings, 1 reply; 79+ messages in thread
From: Maxime Coquelin @ 2023-03-31 15:42 UTC (permalink / raw)
  To: dev, david.marchand, chenbo.xia, mkp, fbl, jasowang,
	cunming.liang, xieyongji, echaudro, eperezma, amorenoz
  Cc: Maxime Coquelin

This patch implements the VDUSE callback for IOTLB entry
removal, where it unmaps the pages of the removed IOTLB
entry.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
---
 lib/vhost/vduse.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/lib/vhost/vduse.c b/lib/vhost/vduse.c
index f46823f589..ff4c9e72f1 100644
--- a/lib/vhost/vduse.c
+++ b/lib/vhost/vduse.c
@@ -32,6 +32,12 @@
 				(1ULL << VIRTIO_F_IN_ORDER) | \
 				(1ULL << VIRTIO_F_IOMMU_PLATFORM))
 
+static void
+vduse_iotlb_remove_notify(uint64_t addr, uint64_t offset, uint64_t size)
+{
+	munmap((void *)(uintptr_t)addr, offset + size);
+}
+
 static int
 vduse_iotlb_miss(struct virtio_net *dev, uint64_t iova, uint8_t perm __rte_unused)
 {
@@ -89,6 +95,7 @@ vduse_iotlb_miss(struct virtio_net *dev, uint64_t iova, uint8_t perm __rte_unuse
 
 static struct vhost_backend_ops vduse_backend_ops = {
 	.iotlb_miss = vduse_iotlb_miss,
+	.iotlb_remove_notify = vduse_iotlb_remove_notify,
 };
 
 int
-- 
2.39.2


^ permalink raw reply	[flat|nested] 79+ messages in thread

* [RFC 21/27] vhost: add VDUSE callback for IRQ injection
  2023-03-31 15:42 [RFC 00/27] Add VDUSE support to Vhost library Maxime Coquelin
                   ` (19 preceding siblings ...)
  2023-03-31 15:42 ` [RFC 20/27] vhost: add VDUSE callback for IOTLB entry removal Maxime Coquelin
@ 2023-03-31 15:42 ` Maxime Coquelin
  2023-05-09  5:33   ` Xia, Chenbo
  2023-03-31 15:42 ` [RFC 22/27] vhost: add VDUSE events handler Maxime Coquelin
                   ` (8 subsequent siblings)
  29 siblings, 1 reply; 79+ messages in thread
From: Maxime Coquelin @ 2023-03-31 15:42 UTC (permalink / raw)
  To: dev, david.marchand, chenbo.xia, mkp, fbl, jasowang,
	cunming.liang, xieyongji, echaudro, eperezma, amorenoz
  Cc: Maxime Coquelin

This patch implements the VDUSE callback for injecting virtqueue
interrupts, which the Vhost library uses to notify the driver that
buffers have been used.
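
For context, the two notification directions use different mechanisms
(the kick side is set up in the device startup patch later in this
series):

    /*
     * driver -> device: per-virtqueue kick eventfd, registered with
     *                   ioctl(dev_fd, VDUSE_VQ_SETUP_KICKFD, &vq_efd);
     * device -> driver: interrupt injection, as implemented here with
     *                   ioctl(dev_fd, VDUSE_VQ_INJECT_IRQ, &vq_index);
     */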

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
---
 lib/vhost/vduse.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/lib/vhost/vduse.c b/lib/vhost/vduse.c
index ff4c9e72f1..afa8a39498 100644
--- a/lib/vhost/vduse.c
+++ b/lib/vhost/vduse.c
@@ -32,6 +32,12 @@
 				(1ULL << VIRTIO_F_IN_ORDER) | \
 				(1ULL << VIRTIO_F_IOMMU_PLATFORM))
 
+static int
+vduse_inject_irq(struct virtio_net *dev, struct vhost_virtqueue *vq)
+{
+	return ioctl(dev->vduse_dev_fd, VDUSE_VQ_INJECT_IRQ, &vq->index);
+}
+
 static void
 vduse_iotlb_remove_notify(uint64_t addr, uint64_t offset, uint64_t size)
 {
@@ -96,6 +102,7 @@ vduse_iotlb_miss(struct virtio_net *dev, uint64_t iova, uint8_t perm __rte_unuse
 static struct vhost_backend_ops vduse_backend_ops = {
 	.iotlb_miss = vduse_iotlb_miss,
 	.iotlb_remove_notify = vduse_iotlb_remove_notify,
+	.inject_irq = vduse_inject_irq,
 };
 
 int
-- 
2.39.2


^ permalink raw reply	[flat|nested] 79+ messages in thread

* [RFC 22/27] vhost: add VDUSE events handler
  2023-03-31 15:42 [RFC 00/27] Add VDUSE support to Vhost library Maxime Coquelin
                   ` (20 preceding siblings ...)
  2023-03-31 15:42 ` [RFC 21/27] vhost: add VDUSE callback for IRQ injection Maxime Coquelin
@ 2023-03-31 15:42 ` Maxime Coquelin
  2023-05-09  5:34   ` Xia, Chenbo
  2023-03-31 15:42 ` [RFC 23/27] vhost: add support for virtqueue state get event Maxime Coquelin
                   ` (7 subsequent siblings)
  29 siblings, 1 reply; 79+ messages in thread
From: Maxime Coquelin @ 2023-03-31 15:42 UTC (permalink / raw)
  To: dev, david.marchand, chenbo.xia, mkp, fbl, jasowang,
	cunming.liang, xieyongji, echaudro, eperezma, amorenoz
  Cc: Maxime Coquelin

This patch makes use of Vhost lib's FD manager to install
a handler for VDUSE events occurring on the VDUSE device FD.
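
The handler reads a struct vduse_dev_request from the device FD and
answers with a struct vduse_dev_response. An abridged view of those
structures, based on how their fields are used in this series (refer to
the linux/vduse.h UAPI header for the authoritative definitions):

    struct vduse_dev_request {
    	__u32 type;        /* VDUSE_GET_VQ_STATE, VDUSE_SET_STATUS, ... */
    	__u32 request_id;
    	/* ... */
    	union {
    		struct vduse_vq_state vq_state;  /* .index, .split.avail_index */
    		struct vduse_dev_status s;       /* .status */
    		struct vduse_iova_range iova;    /* .start, .last (inclusive) */
    	};
    };

    struct vduse_dev_response {
    	__u32 request_id;  /* echoes the request's id */
    	__u32 result;      /* VDUSE_REQ_RESULT_OK or VDUSE_REQ_RESULT_FAILED */
    	/* ... */
    	union {
    		struct vduse_vq_state vq_state;  /* reply to VDUSE_GET_VQ_STATE */
    	};
    };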

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
---
 lib/vhost/vduse.c | 102 ++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 102 insertions(+)

diff --git a/lib/vhost/vduse.c b/lib/vhost/vduse.c
index afa8a39498..2a183130d3 100644
--- a/lib/vhost/vduse.c
+++ b/lib/vhost/vduse.c
@@ -17,6 +17,7 @@
 
 #include <rte_common.h>
 
+#include "fd_man.h"
 #include "iotlb.h"
 #include "vduse.h"
 #include "vhost.h"
@@ -32,6 +33,27 @@
 				(1ULL << VIRTIO_F_IN_ORDER) | \
 				(1ULL << VIRTIO_F_IOMMU_PLATFORM))
 
+struct vduse {
+	struct fdset fdset;
+};
+
+static struct vduse vduse = {
+	.fdset = {
+		.fd = { [0 ... MAX_FDS - 1] = {-1, NULL, NULL, NULL, 0} },
+		.fd_mutex = PTHREAD_MUTEX_INITIALIZER,
+		.fd_pooling_mutex = PTHREAD_MUTEX_INITIALIZER,
+		.num = 0
+	},
+};
+
+static bool vduse_events_thread;
+
+static const char * const vduse_reqs_str[] = {
+	"VDUSE_GET_VQ_STATE",
+	"VDUSE_SET_STATUS",
+	"VDUSE_UPDATE_IOTLB",
+};
+
 static int
 vduse_inject_irq(struct virtio_net *dev, struct vhost_virtqueue *vq)
 {
@@ -105,16 +127,84 @@ static struct vhost_backend_ops vduse_backend_ops = {
 	.inject_irq = vduse_inject_irq,
 };
 
+static void
+vduse_events_handler(int fd, void *arg, int *remove __rte_unused)
+{
+	struct virtio_net *dev = arg;
+	struct vduse_dev_request req;
+	struct vduse_dev_response resp;
+	int ret;
+
+	memset(&resp, 0, sizeof(resp));
+
+	ret = read(fd, &req, sizeof(req));
+	if (ret < 0) {
+		VHOST_LOG_CONFIG(dev->ifname, ERR, "Failed to read request: %s\n",
+				strerror(errno));
+		return;
+	} else if (ret < (int)sizeof(req)) {
+		VHOST_LOG_CONFIG(dev->ifname, ERR, "Incomplete request read: %d bytes\n", ret);
+		return;
+	}
+
+	pthread_setcancelstate(PTHREAD_CANCEL_DISABLE, NULL);
+
+	VHOST_LOG_CONFIG(dev->ifname, INFO, "New request: %s (%u)\n",
+			req.type < RTE_DIM(vduse_reqs_str) ?
+			vduse_reqs_str[req.type] : "Unknown",
+			req.type);
+
+	switch (req.type) {
+	default:
+		resp.result = VDUSE_REQ_RESULT_FAILED;
+		break;
+	}
+
+	pthread_setcancelstate(PTHREAD_CANCEL_ENABLE, NULL);
+
+	resp.request_id = req.request_id;
+
+	ret = write(dev->vduse_dev_fd, &resp, sizeof(resp));
+	if (ret != sizeof(resp)) {
+		VHOST_LOG_CONFIG(dev->ifname, ERR, "Failed to write response %s\n",
+				strerror(errno));
+		return;
+	}
+}
+
 int
 vduse_device_create(const char *path)
 {
 	int control_fd, dev_fd, vid, ret;
+	pthread_t fdset_tid;
 	uint32_t i;
 	struct virtio_net *dev;
 	uint64_t ver = VHOST_VDUSE_API_VERSION;
 	struct vduse_dev_config *dev_config = NULL;
 	const char *name = path + strlen("/dev/vduse/");
 
+	/* If first device, create events dispatcher thread */
+	if (vduse_events_thread == false) {
+		/**
+		 * create a pipe which will be waited by poll and notified to
+		 * rebuild the wait list of poll.
+		 */
+		if (fdset_pipe_init(&vduse.fdset) < 0) {
+			VHOST_LOG_CONFIG(path, ERR, "failed to create pipe for vduse fdset\n");
+			return -1;
+		}
+
+		ret = rte_ctrl_thread_create(&fdset_tid, "vduse-events", NULL,
+				fdset_event_dispatch, &vduse.fdset);
+		if (ret != 0) {
+			VHOST_LOG_CONFIG(path, ERR, "failed to create vduse fdset handling thread\n");
+			fdset_pipe_uninit(&vduse.fdset);
+			return -1;
+		}
+
+		vduse_events_thread = true;
+	}
+
 	control_fd = open(VDUSE_CTRL_PATH, O_RDWR);
 	if (control_fd < 0) {
 		VHOST_LOG_CONFIG(name, ERR, "Failed to open %s: %s\n",
@@ -198,6 +288,13 @@ vduse_device_create(const char *path)
 		}
 	}
 
+	ret = fdset_add(&vduse.fdset, dev->vduse_dev_fd, vduse_events_handler, NULL, dev);
+	if (ret) {
+		VHOST_LOG_CONFIG(name, ERR, "Failed to add fd %d to vduse fdset\n",
+				dev->vduse_dev_fd);
+		goto out_dev_destroy;
+	}
+
 	free(dev_config);
 
 	return 0;
@@ -236,11 +333,16 @@ vduse_device_destroy(const char *path)
 	if (vid == RTE_MAX_VHOST_DEVICE)
 		return -1;
 
+	fdset_del(&vduse.fdset, dev->vduse_dev_fd);
+
 	if (dev->vduse_dev_fd >= 0) {
 		close(dev->vduse_dev_fd);
 		dev->vduse_dev_fd = -1;
 	}
 
+	sleep(2); //ToDo: Need to rework fdman to ensure the deleted FD is no
+		  //more being polled, otherwise VDUSE_DESTROY_DEV will fail.
+
 	if (dev->vduse_ctrl_fd >= 0) {
 		ret = ioctl(dev->vduse_ctrl_fd, VDUSE_DESTROY_DEV, name);
 		if (ret)
-- 
2.39.2


^ permalink raw reply	[flat|nested] 79+ messages in thread

* [RFC 23/27] vhost: add support for virtqueue state get event
  2023-03-31 15:42 [RFC 00/27] Add VDUSE support to Vhost library Maxime Coquelin
                   ` (21 preceding siblings ...)
  2023-03-31 15:42 ` [RFC 22/27] vhost: add VDUSE events handler Maxime Coquelin
@ 2023-03-31 15:42 ` Maxime Coquelin
  2023-05-09  5:34   ` Xia, Chenbo
  2023-03-31 15:42 ` [RFC 24/27] vhost: add support for VDUSE status set event Maxime Coquelin
                   ` (6 subsequent siblings)
  29 siblings, 1 reply; 79+ messages in thread
From: Maxime Coquelin @ 2023-03-31 15:42 UTC (permalink / raw)
  To: dev, david.marchand, chenbo.xia, mkp, fbl, jasowang,
	cunming.liang, xieyongji, echaudro, eperezma, amorenoz
  Cc: Maxime Coquelin

This patch adds support for VDUSE_GET_VQ_STATE event
handling, which consists of providing the backend's last
available index for the specified virtqueue. The reply uses
the split ring state; packed ring support is listed as
remaining work.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
---
 lib/vhost/vduse.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/lib/vhost/vduse.c b/lib/vhost/vduse.c
index 2a183130d3..36028b7315 100644
--- a/lib/vhost/vduse.c
+++ b/lib/vhost/vduse.c
@@ -133,6 +133,7 @@ vduse_events_handler(int fd, void *arg, int *remove __rte_unused)
 	struct virtio_net *dev = arg;
 	struct vduse_dev_request req;
 	struct vduse_dev_response resp;
+	struct vhost_virtqueue *vq;
 	int ret;
 
 	memset(&resp, 0, sizeof(resp));
@@ -155,6 +156,13 @@ vduse_events_handler(int fd, void *arg, int *remove __rte_unused)
 			req.type);
 
 	switch (req.type) {
+	case VDUSE_GET_VQ_STATE:
+		vq = dev->virtqueue[req.vq_state.index];
+		VHOST_LOG_CONFIG(dev->ifname, INFO, "\tvq index: %u, avail_index: %u\n",
+				req.vq_state.index, vq->last_avail_idx);
+		resp.vq_state.split.avail_index = vq->last_avail_idx;
+		resp.result = VDUSE_REQ_RESULT_OK;
+		break;
 	default:
 		resp.result = VDUSE_REQ_RESULT_FAILED;
 		break;
-- 
2.39.2


^ permalink raw reply	[flat|nested] 79+ messages in thread

* [RFC 24/27] vhost: add support for VDUSE status set event
  2023-03-31 15:42 [RFC 00/27] Add VDUSE support to Vhost library Maxime Coquelin
                   ` (22 preceding siblings ...)
  2023-03-31 15:42 ` [RFC 23/27] vhost: add support for virtqueue state get event Maxime Coquelin
@ 2023-03-31 15:42 ` Maxime Coquelin
  2023-05-09  5:34   ` Xia, Chenbo
  2023-03-31 15:42 ` [RFC 25/27] vhost: add support for VDUSE IOTLB update event Maxime Coquelin
                   ` (5 subsequent siblings)
  29 siblings, 1 reply; 79+ messages in thread
From: Maxime Coquelin @ 2023-03-31 15:42 UTC (permalink / raw)
  To: dev, david.marchand, chenbo.xia, mkp, fbl, jasowang,
	cunming.liang, xieyongji, echaudro, eperezma, amorenoz
  Cc: Maxime Coquelin

This patch adds support for VDUSE_SET_STATUS event
handling, which consists of storing the Virtio device
status as set by the Virtio driver.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
---
 lib/vhost/vduse.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/lib/vhost/vduse.c b/lib/vhost/vduse.c
index 36028b7315..7d59a5f709 100644
--- a/lib/vhost/vduse.c
+++ b/lib/vhost/vduse.c
@@ -163,6 +163,12 @@ vduse_events_handler(int fd, void *arg, int *remove __rte_unused)
 		resp.vq_state.split.avail_index = vq->last_avail_idx;
 		resp.result = VDUSE_REQ_RESULT_OK;
 		break;
+	case VDUSE_SET_STATUS:
+		VHOST_LOG_CONFIG(dev->ifname, INFO, "\tnew status: 0x%08x\n",
+				req.s.status);
+		dev->status = req.s.status;
+		resp.result = VDUSE_REQ_RESULT_OK;
+		break;
 	default:
 		resp.result = VDUSE_REQ_RESULT_FAILED;
 		break;
-- 
2.39.2


^ permalink raw reply	[flat|nested] 79+ messages in thread

* [RFC 25/27] vhost: add support for VDUSE IOTLB update event
  2023-03-31 15:42 [RFC 00/27] Add VDUSE support to Vhost library Maxime Coquelin
                   ` (23 preceding siblings ...)
  2023-03-31 15:42 ` [RFC 24/27] vhost: add support for VDUSE status set event Maxime Coquelin
@ 2023-03-31 15:42 ` Maxime Coquelin
  2023-05-09  5:35   ` Xia, Chenbo
  2023-03-31 15:42 ` [RFC 26/27] vhost: add VDUSE device startup Maxime Coquelin
                   ` (4 subsequent siblings)
  29 siblings, 1 reply; 79+ messages in thread
From: Maxime Coquelin @ 2023-03-31 15:42 UTC (permalink / raw)
  To: dev, david.marchand, chenbo.xia, mkp, fbl, jasowang,
	cunming.liang, xieyongji, echaudro, eperezma, amorenoz
  Cc: Maxime Coquelin

This patch adds support for VDUSE_UPDATE_IOTLB event
handling, which consists of invalidating the IOTLB entries
covering the range specified in the request.
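
Note that the IOVA range in the request is inclusive, hence the '+ 1'
when computing the length to invalidate. For example (illustrative
values):

    /* An update covering a single 4 KiB page arrives as:
     *   req.iova.start = 0x1000, req.iova.last = 0x1fff
     * so the invalidated length is last - start + 1 = 0x1000 bytes.
     */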

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
---
 lib/vhost/vduse.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/lib/vhost/vduse.c b/lib/vhost/vduse.c
index 7d59a5f709..b5b9fa2eb1 100644
--- a/lib/vhost/vduse.c
+++ b/lib/vhost/vduse.c
@@ -169,6 +169,12 @@ vduse_events_handler(int fd, void *arg, int *remove __rte_unused)
 		dev->status = req.s.status;
 		resp.result = VDUSE_REQ_RESULT_OK;
 		break;
+	case VDUSE_UPDATE_IOTLB:
+		VHOST_LOG_CONFIG(dev->ifname, INFO, "\tIOVA range: %" PRIx64 " - %" PRIx64 "\n",
+				(uint64_t)req.iova.start, (uint64_t)req.iova.last);
+		vhost_user_iotlb_cache_remove(dev, req.iova.start,
+				req.iova.last - req.iova.start + 1);
+		break;
 	default:
 		resp.result = VDUSE_REQ_RESULT_FAILED;
 		break;
-- 
2.39.2


^ permalink raw reply	[flat|nested] 79+ messages in thread

* [RFC 26/27] vhost: add VDUSE device startup
  2023-03-31 15:42 [RFC 00/27] Add VDUSE support to Vhost library Maxime Coquelin
                   ` (24 preceding siblings ...)
  2023-03-31 15:42 ` [RFC 25/27] vhost: add support for VDUSE IOTLB update event Maxime Coquelin
@ 2023-03-31 15:42 ` Maxime Coquelin
  2023-05-09  5:35   ` Xia, Chenbo
  2023-03-31 15:42 ` [RFC 27/27] vhost: add multiqueue support to VDUSE Maxime Coquelin
                   ` (3 subsequent siblings)
  29 siblings, 1 reply; 79+ messages in thread
From: Maxime Coquelin @ 2023-03-31 15:42 UTC (permalink / raw)
  To: dev, david.marchand, chenbo.xia, mkp, fbl, jasowang,
	cunming.liang, xieyongji, echaudro, eperezma, amorenoz
  Cc: Maxime Coquelin

This patch adds initialization of the device and its
virtqueues once the Virtio driver has set the DRIVER_OK bit
in the Virtio status register.
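
For reference, these are the Virtio device status bits defined by the
Virtio specification; DRIVER_OK is the one that triggers the startup
path added here:

    /*
     * Virtio device status bits (values per the Virtio specification):
     *   ACKNOWLEDGE        = 0x01
     *   DRIVER             = 0x02
     *   DRIVER_OK          = 0x04   <- triggers vduse_device_start()
     *   FEATURES_OK        = 0x08
     *   DEVICE_NEEDS_RESET = 0x40
     *   FAILED             = 0x80
     */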

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
---
 lib/vhost/vduse.c | 118 ++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 118 insertions(+)

diff --git a/lib/vhost/vduse.c b/lib/vhost/vduse.c
index b5b9fa2eb1..1cd04b4872 100644
--- a/lib/vhost/vduse.c
+++ b/lib/vhost/vduse.c
@@ -127,6 +127,120 @@ static struct vhost_backend_ops vduse_backend_ops = {
 	.inject_irq = vduse_inject_irq,
 };
 
+static void
+vduse_vring_setup(struct virtio_net *dev, unsigned int index)
+{
+	struct vhost_virtqueue *vq = dev->virtqueue[index];
+	struct vhost_vring_addr *ra = &vq->ring_addrs;
+	struct vduse_vq_info vq_info;
+	struct vduse_vq_eventfd vq_efd;
+	int ret;
+
+	vq_info.index = index;
+	ret = ioctl(dev->vduse_dev_fd, VDUSE_VQ_GET_INFO, &vq_info);
+	if (ret) {
+		VHOST_LOG_CONFIG(dev->ifname, ERR, "Failed to get VQ %u info: %s\n",
+				index, strerror(errno));
+		return;
+	}
+
+	VHOST_LOG_CONFIG(dev->ifname, INFO, "VQ %u info:\n", index);
+	VHOST_LOG_CONFIG(dev->ifname, INFO, "\tnum: %u\n", vq_info.num);
+	VHOST_LOG_CONFIG(dev->ifname, INFO, "\tdesc_addr: %llx\n", vq_info.desc_addr);
+	VHOST_LOG_CONFIG(dev->ifname, INFO, "\tdriver_addr: %llx\n", vq_info.driver_addr);
+	VHOST_LOG_CONFIG(dev->ifname, INFO, "\tdevice_addr: %llx\n", vq_info.device_addr);
+	VHOST_LOG_CONFIG(dev->ifname, INFO, "\tavail_idx: %u\n", vq_info.split.avail_index);
+	VHOST_LOG_CONFIG(dev->ifname, INFO, "\tready: %u\n", vq_info.ready);
+
+	vq->last_avail_idx = vq_info.split.avail_index;
+	vq->size = vq_info.num;
+	vq->ready = vq_info.ready;
+	vq->enabled = true;
+	ra->desc_user_addr = vq_info.desc_addr;
+	ra->avail_user_addr = vq_info.driver_addr;
+	ra->used_user_addr = vq_info.device_addr;
+
+	vq->shadow_used_split = rte_malloc_socket(NULL,
+				vq->size * sizeof(struct vring_used_elem),
+				RTE_CACHE_LINE_SIZE, 0);
+	vq->batch_copy_elems = rte_malloc_socket(NULL,
+				vq->size * sizeof(struct batch_copy_elem),
+				RTE_CACHE_LINE_SIZE, 0);
+
+	vhost_user_iotlb_rd_lock(vq);
+	if (vring_translate(dev, vq))
+		VHOST_LOG_CONFIG(dev->ifname, ERR, "Failed to translate vring %d addresses\n",
+				index);
+	vhost_user_iotlb_rd_unlock(vq);
+
+	vq->kickfd = eventfd(0, EFD_NONBLOCK | EFD_CLOEXEC);
+	if (vq->kickfd < 0) {
+		VHOST_LOG_CONFIG(dev->ifname, ERR, "Failed to init kickfd for VQ %u: %s\n",
+				index, strerror(errno));
+		vq->kickfd = VIRTIO_INVALID_EVENTFD;
+		return;
+	}
+
+	vq_efd.index = index;
+	vq_efd.fd = vq->kickfd;
+
+	ret = ioctl(dev->vduse_dev_fd, VDUSE_VQ_SETUP_KICKFD, &vq_efd);
+	if (ret) {
+		VHOST_LOG_CONFIG(dev->ifname, ERR, "Failed to setup kickfd for VQ %u: %s\n",
+				index, strerror(errno));
+		close(vq->kickfd);
+		vq->kickfd = VIRTIO_UNINITIALIZED_EVENTFD;
+		return;
+	}
+}
+
+static void
+vduse_device_start(struct virtio_net *dev)
+{
+	unsigned int i, ret;
+
+	dev->notify_ops = vhost_driver_callback_get(dev->ifname);
+	if (!dev->notify_ops) {
+		VHOST_LOG_CONFIG(dev->ifname, ERR,
+				"Failed to get callback ops for driver\n");
+		return;
+	}
+
+	ret = ioctl(dev->vduse_dev_fd, VDUSE_DEV_GET_FEATURES, &dev->features);
+	if (ret) {
+		VHOST_LOG_CONFIG(dev->ifname, ERR, "Failed to get features: %s\n",
+				strerror(errno));
+		return;
+	}
+
+	VHOST_LOG_CONFIG(dev->ifname, INFO, "negotiated Virtio features: 0x%" PRIx64 "\n",
+		dev->features);
+
+	if (dev->features &
+		((1ULL << VIRTIO_NET_F_MRG_RXBUF) |
+		 (1ULL << VIRTIO_F_VERSION_1) |
+		 (1ULL << VIRTIO_F_RING_PACKED))) {
+		dev->vhost_hlen = sizeof(struct virtio_net_hdr_mrg_rxbuf);
+	} else {
+		dev->vhost_hlen = sizeof(struct virtio_net_hdr);
+	}
+
+	for (i = 0; i < dev->nr_vring; i++)
+		vduse_vring_setup(dev, i);
+
+	dev->flags |= VIRTIO_DEV_READY;
+
+	if (dev->notify_ops->new_device(dev->vid) == 0)
+		dev->flags |= VIRTIO_DEV_RUNNING;
+
+	for (i = 0; i < dev->nr_vring; i++) {
+		struct vhost_virtqueue *vq = dev->virtqueue[i];
+
+		if (dev->notify_ops->vring_state_changed)
+			dev->notify_ops->vring_state_changed(dev->vid, i, vq->enabled);
+	}
+}
+
 static void
 vduse_events_handler(int fd, void *arg, int *remove __rte_unused)
 {
@@ -167,6 +281,10 @@ vduse_events_handler(int fd, void *arg, int *remove __rte_unused)
 		VHOST_LOG_CONFIG(dev->ifname, INFO, "\tnew status: 0x%08x\n",
 				req.s.status);
 		dev->status = req.s.status;
+
+		if (dev->status & VIRTIO_DEVICE_STATUS_DRIVER_OK)
+			vduse_device_start(dev);
+
 		resp.result = VDUSE_REQ_RESULT_OK;
 		break;
 	case VDUSE_UPDATE_IOTLB:
-- 
2.39.2


^ permalink raw reply	[flat|nested] 79+ messages in thread

* [RFC 27/27] vhost: add multiqueue support to VDUSE
  2023-03-31 15:42 [RFC 00/27] Add VDUSE support to Vhost library Maxime Coquelin
                   ` (25 preceding siblings ...)
  2023-03-31 15:42 ` [RFC 26/27] vhost: add VDUSE device startup Maxime Coquelin
@ 2023-03-31 15:42 ` Maxime Coquelin
  2023-05-09  5:35   ` Xia, Chenbo
  2023-04-06  3:44 ` [RFC 00/27] Add VDUSE support to Vhost library Yongji Xie
                   ` (2 subsequent siblings)
  29 siblings, 1 reply; 79+ messages in thread
From: Maxime Coquelin @ 2023-03-31 15:42 UTC (permalink / raw)
  To: dev, david.marchand, chenbo.xia, mkp, fbl, jasowang,
	cunming.liang, xieyongji, echaudro, eperezma, amorenoz
  Cc: Maxime Coquelin

This patch enables control virtqueue support, which is
required for multiqueue support.
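
The number of queue pairs is advertised to the driver through the
virtio-net device config space. An abridged sketch of that structure
(field order per the Virtio spec; only the field filled in by this
patch is annotated, and the exact integer typedefs may differ):

    struct virtio_net_config {
    	uint8_t  mac[6];
    	uint16_t status;
    	uint16_t max_virtqueue_pairs; /* set from rte_vhost_driver_get_queue_num() */
    	uint16_t mtu;
    	/* ... */
    };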

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
---
 lib/vhost/vduse.c | 69 ++++++++++++++++++++++++++++++++++++++++++-----
 1 file changed, 63 insertions(+), 6 deletions(-)

diff --git a/lib/vhost/vduse.c b/lib/vhost/vduse.c
index 1cd04b4872..135e78fc35 100644
--- a/lib/vhost/vduse.c
+++ b/lib/vhost/vduse.c
@@ -21,6 +21,7 @@
 #include "iotlb.h"
 #include "vduse.h"
 #include "vhost.h"
+#include "virtio_net_ctrl.h"
 
 #define VHOST_VDUSE_API_VERSION 0
 #define VDUSE_CTRL_PATH "/dev/vduse/control"
@@ -31,7 +32,9 @@
 				(1ULL << VIRTIO_RING_F_INDIRECT_DESC) | \
 				(1ULL << VIRTIO_RING_F_EVENT_IDX) | \
 				(1ULL << VIRTIO_F_IN_ORDER) | \
-				(1ULL << VIRTIO_F_IOMMU_PLATFORM))
+				(1ULL << VIRTIO_F_IOMMU_PLATFORM) | \
+				(1ULL << VIRTIO_NET_F_CTRL_VQ) | \
+				(1ULL << VIRTIO_NET_F_MQ))
 
 struct vduse {
 	struct fdset fdset;
@@ -127,6 +130,25 @@ static struct vhost_backend_ops vduse_backend_ops = {
 	.inject_irq = vduse_inject_irq,
 };
 
+static void
+vduse_control_queue_event(int fd, void *arg, int *remove __rte_unused)
+{
+	struct virtio_net *dev = arg;
+	uint64_t buf;
+	int ret;
+
+	ret = read(fd, &buf, sizeof(buf));
+	if (ret < 0) {
+		VHOST_LOG_CONFIG(dev->ifname, ERR, "Failed to read control queue event: %s\n",
+				strerror(errno));
+		return;
+	}
+
+	VHOST_LOG_CONFIG(dev->ifname, DEBUG, "Control queue kicked\n");
+	if (virtio_net_ctrl_handle(dev))
+		VHOST_LOG_CONFIG(dev->ifname, ERR, "Failed to handle ctrl request\n");
+}
+
 static void
 vduse_vring_setup(struct virtio_net *dev, unsigned int index)
 {
@@ -192,6 +214,18 @@ vduse_vring_setup(struct virtio_net *dev, unsigned int index)
 		vq->kickfd = VIRTIO_UNINITIALIZED_EVENTFD;
 		return;
 	}
+
+	if (vq == dev->cvq) {
+		vhost_enable_guest_notification(dev, vq, 1);
+		ret = fdset_add(&vduse.fdset, vq->kickfd, vduse_control_queue_event, NULL, dev);
+		if (ret) {
+			VHOST_LOG_CONFIG(dev->ifname, ERR,
+					"Failed to setup kickfd handler for VQ %u: %s\n",
+					index, strerror(errno));
+			close(vq->kickfd);
+			vq->kickfd = VIRTIO_UNINITIALIZED_EVENTFD;
+		}
+	}
 }
 
 static void
@@ -236,6 +270,9 @@ vduse_device_start(struct virtio_net *dev)
 	for (i = 0; i < dev->nr_vring; i++) {
 		struct vhost_virtqueue *vq = dev->virtqueue[i];
 
+		if (vq == dev->cvq)
+			continue;
+
 		if (dev->notify_ops->vring_state_changed)
 			dev->notify_ops->vring_state_changed(dev->vid, i, vq->enabled);
 	}
@@ -315,8 +352,9 @@ vduse_device_create(const char *path)
 {
 	int control_fd, dev_fd, vid, ret;
 	pthread_t fdset_tid;
-	uint32_t i;
+	uint32_t i, max_queue_pairs;
 	struct virtio_net *dev;
+	struct virtio_net_config vnet_config = { 0 };
 	uint64_t ver = VHOST_VDUSE_API_VERSION;
 	struct vduse_dev_config *dev_config = NULL;
 	const char *name = path + strlen("/dev/vduse/");
@@ -357,22 +395,33 @@ vduse_device_create(const char *path)
 		goto out_ctrl_close;
 	}
 
-	dev_config = malloc(offsetof(struct vduse_dev_config, config));
+	dev_config = malloc(offsetof(struct vduse_dev_config, config) +
+			sizeof(vnet_config));
 	if (!dev_config) {
 		VHOST_LOG_CONFIG(name, ERR, "Failed to allocate VDUSE config\n");
 		ret = -1;
 		goto out_ctrl_close;
 	}
 
+	ret = rte_vhost_driver_get_queue_num(path, &max_queue_pairs);
+	if (ret < 0) {
+		VHOST_LOG_CONFIG(name, ERR, "Failed to get max queue pairs\n");
+		goto out_free;
+	}
+
+	VHOST_LOG_CONFIG(path, INFO, "VDUSE max queue pairs: %u\n", max_queue_pairs);
+
+	vnet_config.max_virtqueue_pairs = max_queue_pairs;
 	memset(dev_config, 0, sizeof(struct vduse_dev_config));
 
 	strncpy(dev_config->name, name, VDUSE_NAME_MAX - 1);
 	dev_config->device_id = VIRTIO_ID_NET;
 	dev_config->vendor_id = 0;
 	dev_config->features = VDUSE_NET_SUPPORTED_FEATURES;
-	dev_config->vq_num = 2;
+	dev_config->vq_num = max_queue_pairs * 2 + 1; /* Includes ctrl queue */
 	dev_config->vq_align = sysconf(_SC_PAGE_SIZE);
-	dev_config->config_size = 0;
+	dev_config->config_size = sizeof(struct virtio_net_config);
+	memcpy(dev_config->config, &vnet_config, sizeof(vnet_config));
 
 	ret = ioctl(control_fd, VDUSE_CREATE_DEV, dev_config);
 	if (ret < 0) {
@@ -407,7 +456,7 @@ vduse_device_create(const char *path)
 	dev->vduse_dev_fd = dev_fd;
 	vhost_setup_virtio_net(dev->vid, true, true, true, true);
 
-	for (i = 0; i < 2; i++) {
+	for (i = 0; i < max_queue_pairs * 2 + 1; i++) {
 		struct vduse_vq_config vq_cfg = { 0 };
 
 		ret = alloc_vring_queue(dev, i);
@@ -426,6 +475,8 @@ vduse_device_create(const char *path)
 		}
 	}
 
+	dev->cvq = dev->virtqueue[max_queue_pairs * 2];
+
 	ret = fdset_add(&vduse.fdset, dev->vduse_dev_fd, vduse_events_handler, NULL, dev);
 	if (ret) {
 		VHOST_LOG_CONFIG(name, ERR, "Failed to add fd %d to vduse fdset\n",
@@ -471,6 +522,12 @@ vduse_device_destroy(const char *path)
 	if (vid == RTE_MAX_VHOST_DEVICE)
 		return -1;
 
+	if (dev->cvq && dev->cvq->kickfd >= 0) {
+		fdset_del(&vduse.fdset, dev->cvq->kickfd);
+		close(dev->cvq->kickfd);
+		dev->cvq->kickfd = VIRTIO_UNINITIALIZED_EVENTFD;
+	}
+
 	fdset_del(&vduse.fdset, dev->vduse_dev_fd);
 
 	if (dev->vduse_dev_fd >= 0) {
-- 
2.39.2


^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [RFC 00/27] Add VDUSE support to Vhost library
  2023-03-31 15:42 [RFC 00/27] Add VDUSE support to Vhost library Maxime Coquelin
                   ` (26 preceding siblings ...)
  2023-03-31 15:42 ` [RFC 27/27] vhost: add multiqueue support to VDUSE Maxime Coquelin
@ 2023-04-06  3:44 ` Yongji Xie
  2023-04-06  8:16   ` Maxime Coquelin
  2023-04-12 11:33 ` Ferruh Yigit
  2023-05-05  5:53 ` Xia, Chenbo
  29 siblings, 1 reply; 79+ messages in thread
From: Yongji Xie @ 2023-04-06  3:44 UTC (permalink / raw)
  To: Maxime Coquelin
  Cc: dev, David Marchand, chenbo.xia, mkp, fbl, Jason Wang,
	cunming.liang, echaudro, Eugenio Perez Martin,
	Adrian Moreno Zapata

Hi Maxime,

On Fri, Mar 31, 2023 at 11:43 PM Maxime Coquelin
<maxime.coquelin@redhat.com> wrote:
>
> This series introduces a new type of backend, VDUSE,
> to the Vhost library.
>
> VDUSE stands for vDPA device in Userspace, it enables
> implementing a Virtio device in userspace and have it
> attached to the Kernel vDPA bus.
>
> Once attached to the vDPA bus, it can be used either by
> Kernel Virtio drivers, like virtio-net in our case, via
> the virtio-vdpa driver. Doing that, the device is visible
> to the Kernel networking stack and is exposed to userspace
> as a regular netdev.
>
> It can also be exposed to userspace thanks to the
> vhost-vdpa driver, via a vhost-vdpa chardev that can be
> passed to QEMU or Virtio-user PMD.
>
> While VDUSE support is already available in upstream
> Kernel, a couple of patches are required to support
> network device type:
>
> https://gitlab.com/mcoquelin/linux/-/tree/vduse_networking_poc
>
> In order to attach the created VDUSE device to the vDPA
> bus, a recent iproute2 version containing the vdpa tool is
> required.
>
> Usage:
> ======
>
> 1. Probe required Kernel modules
> # modprobe vdpa
> # modprobe vduse
> # modprobe virtio-vdpa
>
> 2. Build (require vduse kernel headers to be available)
> # meson build
> # ninja -C build
>
> 3. Create a VDUSE device (vduse0) using Vhost PMD with
> testpmd (with 4 queue pairs in this example)
> # ./build/app/dpdk-testpmd --no-pci --vdev=net_vhost0,iface=/dev/vduse/vduse0,queues=4 --log-level=*:9  -- -i --txq=4 --rxq=4
>
> 4. Attach the VDUSE device to the vDPA bus
> # vdpa dev add name vduse0 mgmtdev vduse
> => The virtio-net netdev shows up (eth0 here)
> # ip l show eth0
> 21: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
>     link/ether c2:73:ea:a7:68:6d brd ff:ff:ff:ff:ff:ff
>
> 5. Start/stop traffic in testpmd
> testpmd> start
> testpmd> show port stats 0
>   ######################## NIC statistics for port 0  ########################
>   RX-packets: 11         RX-missed: 0          RX-bytes:  1482
>   RX-errors: 0
>   RX-nombuf:  0
>   TX-packets: 1          TX-errors: 0          TX-bytes:  62
>
>   Throughput (since last show)
>   Rx-pps:            0          Rx-bps:            0
>   Tx-pps:            0          Tx-bps:            0
>   ############################################################################
> testpmd> stop
>
> 6. Detach the VDUSE device from the vDPA bus
> # vdpa dev del vduse0
>
> 7. Quit testpmd
> testpmd> quit
>
> Known issues & remaining work:
> ==============================
> - Fix issue in FD manager (still polling while FD has been removed)
> - Add Netlink support in Vhost library
> - Support device reconnection
> - Support packed ring
> - Enable & test more Virtio features
> - Provide performance benchmark results
>

Nice work! Thanks for bringing VDUSE to the network area. I wonder if
you have some plan to support userspace memory registration [1]? I
think this feature can benefit the performance since an extra data
copy could be eliminated in our case.

[1] https://lwn.net/Articles/902809/

Thanks,
Yongji

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [RFC 00/27] Add VDUSE support to Vhost library
  2023-04-06  3:44 ` [RFC 00/27] Add VDUSE support to Vhost library Yongji Xie
@ 2023-04-06  8:16   ` Maxime Coquelin
  2023-04-06 11:04     ` Yongji Xie
  0 siblings, 1 reply; 79+ messages in thread
From: Maxime Coquelin @ 2023-04-06  8:16 UTC (permalink / raw)
  To: Yongji Xie
  Cc: dev, David Marchand, chenbo.xia, mkp, fbl, Jason Wang,
	cunming.liang, echaudro, Eugenio Perez Martin,
	Adrian Moreno Zapata

Hi Yongji,

On 4/6/23 05:44, Yongji Xie wrote:
> Hi Maxime,
> 
> On Fri, Mar 31, 2023 at 11:43 PM Maxime Coquelin
> <maxime.coquelin@redhat.com> wrote:
>>
>> [...]
> 
> Nice work! Thanks for bringing VDUSE to the network area. I wonder if
> you have some plan to support userspace memory registration [1]? I
> think this feature can benefit the performance since an extra data
> copy could be eliminated in our case.

I plan to have a closer look later, once VDUSE support will be added.
I think it will be difficult to support it in the case of DPDK for
networking:

  - For the dequeue path, it would basically mean re-introducing the
dequeue zero-copy support that we removed some time ago. It was a hack
where we replaced the regular mbuf buffer with the descriptor one,
increased the reference counter, and at the next dequeue API calls
checked whether the former mbuf's refcount was back to 1 to restore the
mbuf (a rough sketch of this mechanism is given below). The issue is
that physical NIC drivers usually release sent mbufs by pool, once a
certain threshold is met. So it can cause the virtqueue to run dry, as
the descriptors are not written back into the used ring for quite some
time, depending on the NIC/traffic/...

- For enqueue path, I don't think this is possible with virtual switches
by design, as when a mbuf is received on a physical port, we don't know
in which Vhost/VDUSE port it will be switched to. And for VM to VM
communication, should it use the src VM buffer or the dest VM one?

The only case where it could work is a simple forwarder between a VDUSE
device and a physical port, but I don't think there is much interest in
such a use case.
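
For reference, the crux of the removed dequeue zero-copy mechanism was
a refcount check similar to the sketch below (simplified; the helper
name is illustrative, not the actual code we had):

    #include <stdbool.h>
    #include <rte_mbuf.h>

    /*
     * With dequeue zero-copy, the guest buffer backs the mbuf, so its
     * descriptor can only be returned to the used ring once nothing
     * references the mbuf anymore.
     */
    static inline bool
    zcopy_mbuf_is_consumed(const struct rte_mbuf *m)
    {
    	return rte_mbuf_refcnt_read(m) == 1;
    }

If the NIC driver holds transmitted mbufs until its free threshold is
reached, this check keeps failing and the used ring is not replenished,
which is the starvation issue described above.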

Any thoughts?

Thanks,
Maxime

> [1] https://lwn.net/Articles/902809/
> 
> Thanks,
> Yongji
> 


^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [RFC 00/27] Add VDUSE support to Vhost library
  2023-04-06  8:16   ` Maxime Coquelin
@ 2023-04-06 11:04     ` Yongji Xie
  0 siblings, 0 replies; 79+ messages in thread
From: Yongji Xie @ 2023-04-06 11:04 UTC (permalink / raw)
  To: Maxime Coquelin
  Cc: dev, David Marchand, chenbo.xia, mkp, fbl, Jason Wang,
	cunming.liang, echaudro, Eugenio Perez Martin,
	Adrian Moreno Zapata

On Thu, Apr 6, 2023 at 4:17 PM Maxime Coquelin
<maxime.coquelin@redhat.com> wrote:
>
> Hi Yongji,
>
> On 4/6/23 05:44, Yongji Xie wrote:
> > Hi Maxime,
> >
> > On Fri, Mar 31, 2023 at 11:43 PM Maxime Coquelin
> > <maxime.coquelin@redhat.com> wrote:
> >>
> >> [...]
> >
> > Nice work! Thanks for bringing VDUSE to the network area. I wonder if
> > you have some plan to support userspace memory registration [1]? I
> > think this feature can benefit the performance since an extra data
> > copy could be eliminated in our case.
>
> I plan to have a closer look later, once VDUSE support will be added.
> I think it will be difficult to support it in the case of DPDK for
> networking:
>
>   - For dequeue path it would be basically re-introducing dequeue zero-
> copy support that we removed some time ago. It was a hack where we
> replaced the regular mbuf buffer with the descriptor one, increased the
> reference counter, and at next dequeue API calls checked if the former
> mbufs ref counter is 1 and restore the mbuf. Issue is that physical NIC
> drivers usually release sent mbufs by pool, once a certain threshold is
> met. So it can cause draining of the virtqueue as the descs are not
> written back into the used ring for quite some time, depending on the
> NIC/traffic/...
>

OK, I see. Could this issue be mitigated by releasing sent mbufs one
by one as they are sent out, or simply by increasing the virtqueue size?

> - For enqueue path, I don't think this is possible with virtual switches
> by design, as when a mbuf is received on a physical port, we don't know
> in which Vhost/VDUSE port it will be switched to. And for VM to VM
> communication, should it use the src VM buffer or the dest VM one?
>

Yes, I agree that it's hard to achieve that in the enqueue path.

> Only case it could work is if you had a simple forwarder between a VDUSE
> device and a physical port. But I don't think there is much interest in
> such use-case.
>

OK, I get it.

Thanks,
Yongji

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [RFC 00/27] Add VDUSE support to Vhost library
  2023-03-31 15:42 [RFC 00/27] Add VDUSE support to Vhost library Maxime Coquelin
                   ` (27 preceding siblings ...)
  2023-04-06  3:44 ` [RFC 00/27] Add VDUSE support to Vhost library Yongji Xie
@ 2023-04-12 11:33 ` Ferruh Yigit
  2023-04-12 15:28   ` Maxime Coquelin
  2023-05-05  5:53 ` Xia, Chenbo
  29 siblings, 1 reply; 79+ messages in thread
From: Ferruh Yigit @ 2023-04-12 11:33 UTC (permalink / raw)
  To: Maxime Coquelin, dev, david.marchand, chenbo.xia, mkp, fbl,
	jasowang, cunming.liang, xieyongji, echaudro, eperezma, amorenoz

On 3/31/2023 4:42 PM, Maxime Coquelin wrote:
> This series introduces a new type of backend, VDUSE,
> to the Vhost library.
> 
> VDUSE stands for vDPA device in Userspace, it enables
> implementing a Virtio device in userspace and have it
> attached to the Kernel vDPA bus.
> 
> Once attached to the vDPA bus, it can be used either by
> Kernel Virtio drivers, like virtio-net in our case, via
> the virtio-vdpa driver. Doing that, the device is visible
> to the Kernel networking stack and is exposed to userspace
> as a regular netdev.
> 
> It can also be exposed to userspace thanks to the
> vhost-vdpa driver, via a vhost-vdpa chardev that can be
> passed to QEMU or Virtio-user PMD.
> 
> While VDUSE support is already available in upstream
> Kernel, a couple of patches are required to support
> network device type:
> 
> https://gitlab.com/mcoquelin/linux/-/tree/vduse_networking_poc
> 
> In order to attach the created VDUSE device to the vDPA
> bus, a recent iproute2 version containing the vdpa tool is
> required.

Hi Maxime,

Is this a replacement to the existing DPDK vDPA framework? What is the
plan for long term?

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [RFC 00/27] Add VDUSE support to Vhost library
  2023-04-12 11:33 ` Ferruh Yigit
@ 2023-04-12 15:28   ` Maxime Coquelin
  2023-04-12 19:40     ` Morten Brørup
  0 siblings, 1 reply; 79+ messages in thread
From: Maxime Coquelin @ 2023-04-12 15:28 UTC (permalink / raw)
  To: Ferruh Yigit, dev, david.marchand, chenbo.xia, mkp, fbl,
	jasowang, cunming.liang, xieyongji, echaudro, eperezma, amorenoz

Hi Ferruh,

On 4/12/23 13:33, Ferruh Yigit wrote:
> On 3/31/2023 4:42 PM, Maxime Coquelin wrote:
>> This series introduces a new type of backend, VDUSE,
>> to the Vhost library.
>>
>> VDUSE stands for vDPA device in Userspace, it enables
>> implementing a Virtio device in userspace and have it
>> attached to the Kernel vDPA bus.
>>
>> Once attached to the vDPA bus, it can be used either by
>> Kernel Virtio drivers, like virtio-net in our case, via
>> the virtio-vdpa driver. Doing that, the device is visible
>> to the Kernel networking stack and is exposed to userspace
>> as a regular netdev.
>>
>> It can also be exposed to userspace thanks to the
>> vhost-vdpa driver, via a vhost-vdpa chardev that can be
>> passed to QEMU or Virtio-user PMD.
>>
>> While VDUSE support is already available in upstream
>> Kernel, a couple of patches are required to support
>> network device type:
>>
>> https://gitlab.com/mcoquelin/linux/-/tree/vduse_networking_poc
>>
>> In order to attach the created VDUSE device to the vDPA
>> bus, a recent iproute2 version containing the vdpa tool is
>> required.
> 
> Hi Maxime,
> 
> Is this a replacement to the existing DPDK vDPA framework? What is the
> plan for long term?
> 

No, this is not a replacement for DPDK vDPA framework.

We (Red Hat) don't have plans to support DPDK vDPA framework in our
products, but there are still contributions to DPDK vDPA by several vDPA
hardware vendors (Intel, Nvidia, Xilinx), so I don't think it is going
to be deprecated soon.

Regards,
Maxime


^ permalink raw reply	[flat|nested] 79+ messages in thread

* RE: [RFC 00/27] Add VDUSE support to Vhost library
  2023-04-12 15:28   ` Maxime Coquelin
@ 2023-04-12 19:40     ` Morten Brørup
  2023-04-13  7:08       ` Xia, Chenbo
  0 siblings, 1 reply; 79+ messages in thread
From: Morten Brørup @ 2023-04-12 19:40 UTC (permalink / raw)
  To: Maxime Coquelin, Ferruh Yigit, dev, david.marchand, chenbo.xia,
	mkp, fbl, jasowang, cunming.liang, xieyongji, echaudro, eperezma,
	amorenoz

> From: Maxime Coquelin [mailto:maxime.coquelin@redhat.com]
> Sent: Wednesday, 12 April 2023 17.28
> 
> Hi Ferruh,
> 
> On 4/12/23 13:33, Ferruh Yigit wrote:
> > On 3/31/2023 4:42 PM, Maxime Coquelin wrote:
> >> This series introduces a new type of backend, VDUSE,
> >> to the Vhost library.
> >>
> >> VDUSE stands for vDPA device in Userspace, it enables
> >> implementing a Virtio device in userspace and have it
> >> attached to the Kernel vDPA bus.
> >>
> >> Once attached to the vDPA bus, it can be used either by
> >> Kernel Virtio drivers, like virtio-net in our case, via
> >> the virtio-vdpa driver. Doing that, the device is visible
> >> to the Kernel networking stack and is exposed to userspace
> >> as a regular netdev.
> >>
> >> It can also be exposed to userspace thanks to the
> >> vhost-vdpa driver, via a vhost-vdpa chardev that can be
> >> passed to QEMU or Virtio-user PMD.
> >>
> >> While VDUSE support is already available in upstream
> >> Kernel, a couple of patches are required to support
> >> network device type:
> >>
> >> https://gitlab.com/mcoquelin/linux/-/tree/vduse_networking_poc
> >>
> >> In order to attach the created VDUSE device to the vDPA
> >> bus, a recent iproute2 version containing the vdpa tool is
> >> required.
> >
> > Hi Maxime,
> >
> > Is this a replacement to the existing DPDK vDPA framework? What is the
> > plan for long term?
> >
> 
> No, this is not a replacement for DPDK vDPA framework.
> 
> We (Red Hat) don't have plans to support DPDK vDPA framework in our
> products, but there are still contribution to DPDK vDPA by several vDPA
> hardware vendors (Intel, Nvidia, Xilinx), so I don't think it is going
> to be deprecated soon.

Ferruh's question made me curious...

I don't know anything about VDUSE or vDPA, and don't use any of it, so consider me ignorant in this area.

Is VDUSE an alternative to the existing DPDK vDPA framework? What are the differences, e.g. in which cases would an application developer (or user) choose one or the other?

And if it is a better alternative, perhaps the documentation should mention that it is recommended over DPDK vDPA. Just like we started recommending alternatives to the KNI driver, so we could phase it out and eventually get rid of it.

> 
> Regards,
> Maxime


^ permalink raw reply	[flat|nested] 79+ messages in thread

* RE: [RFC 00/27] Add VDUSE support to Vhost library
  2023-04-12 19:40     ` Morten Brørup
@ 2023-04-13  7:08       ` Xia, Chenbo
  2023-04-13  7:58         ` Morten Brørup
  2023-04-13  7:59         ` Maxime Coquelin
  0 siblings, 2 replies; 79+ messages in thread
From: Xia, Chenbo @ 2023-04-13  7:08 UTC (permalink / raw)
  To: Morten Brørup, Maxime Coquelin, Ferruh Yigit, dev,
	david.marchand, mkp, fbl, jasowang, Liang, Cunming, Xie, Yongji,
	echaudro, eperezma, amorenoz

> -----Original Message-----
> From: Morten Brørup <mb@smartsharesystems.com>
> Sent: Thursday, April 13, 2023 3:41 AM
> To: Maxime Coquelin <maxime.coquelin@redhat.com>; Ferruh Yigit
> <ferruh.yigit@amd.com>; dev@dpdk.org; david.marchand@redhat.com; Xia,
> Chenbo <chenbo.xia@intel.com>; mkp@redhat.com; fbl@redhat.com;
> jasowang@redhat.com; Liang, Cunming <cunming.liang@intel.com>; Xie, Yongji
> <xieyongji@bytedance.com>; echaudro@redhat.com; eperezma@redhat.com;
> amorenoz@redhat.com
> Subject: RE: [RFC 00/27] Add VDUSE support to Vhost library
> 
> > From: Maxime Coquelin [mailto:maxime.coquelin@redhat.com]
> > Sent: Wednesday, 12 April 2023 17.28
> >
> > Hi Ferruh,
> >
> > On 4/12/23 13:33, Ferruh Yigit wrote:
> > > On 3/31/2023 4:42 PM, Maxime Coquelin wrote:
> > >> [...]
> > >
> > > Hi Maxime,
> > >
> > > Is this a replacement to the existing DPDK vDPA framework? What is the
> > > plan for long term?
> > >
> >
> > No, this is not a replacement for DPDK vDPA framework.
> >
> > We (Red Hat) don't have plans to support DPDK vDPA framework in our
> > products, but there are still contribution to DPDK vDPA by several vDPA
> > hardware vendors (Intel, Nvidia, Xilinx), so I don't think it is going
> > to be deprecated soon.
> 
> Ferruh's question made me curious...
> 
> I don't know anything about VDUSE or vDPA, and don't use any of it, so
> consider me ignorant in this area.
> 
> Is VDUSE an alternative to the existing DPDK vDPA framework? What are the
> differences, e.g. in which cases would an application developer (or user)
> choose one or the other?

Maxime should give better explanation.. but let me just explain a bit.

Vendors have vDPA HW that supports the vDPA framework (most likely in their
DPU/IPU products). This work introduces a way to emulate a SW vDPA device in
userspace (DPDK), and this SW vDPA device plugs into the same vDPA framework.

So it's not an alternative to the existing DPDK vDPA framework :)

Thanks,
Chenbo

> 
> And if it is a better alternative, perhaps the documentation should
> mention that it is recommended over DPDK vDPA. Just like we started
> recommending alternatives to the KNI driver, so we could phase it out and
> eventually get rid of it.
> 
> >
> > Regards,
> > Maxime


^ permalink raw reply	[flat|nested] 79+ messages in thread

* RE: [RFC 00/27] Add VDUSE support to Vhost library
  2023-04-13  7:08       ` Xia, Chenbo
@ 2023-04-13  7:58         ` Morten Brørup
  2023-04-13  7:59         ` Maxime Coquelin
  1 sibling, 0 replies; 79+ messages in thread
From: Morten Brørup @ 2023-04-13  7:58 UTC (permalink / raw)
  To: Xia, Chenbo, Maxime Coquelin, Ferruh Yigit, dev, david.marchand,
	mkp, fbl, jasowang, Liang, Cunming, Xie, Yongji, echaudro,
	eperezma, amorenoz

> From: Xia, Chenbo [mailto:chenbo.xia@intel.com]
> Sent: Thursday, 13 April 2023 09.08
> 
> > From: Morten Brørup <mb@smartsharesystems.com>
> > Sent: Thursday, April 13, 2023 3:41 AM
> >
> > > From: Maxime Coquelin [mailto:maxime.coquelin@redhat.com]
> > > Sent: Wednesday, 12 April 2023 17.28
> > >
> > > Hi Ferruh,
> > >
> > > On 4/12/23 13:33, Ferruh Yigit wrote:
> > > > On 3/31/2023 4:42 PM, Maxime Coquelin wrote:
> > > >> [...]
> > > >
> > > > Hi Maxime,
> > > >
> > > > Is this a replacement to the existing DPDK vDPA framework? What is the
> > > > plan for long term?
> > > >
> > >
> > > No, this is not a replacement for DPDK vDPA framework.
> > >
> > > We (Red Hat) don't have plans to support DPDK vDPA framework in our
> > > products, but there are still contribution to DPDK vDPA by several vDPA
> > > hardware vendors (Intel, Nvidia, Xilinx), so I don't think it is going
> > > to be deprecated soon.
> >
> > Ferruh's question made me curious...
> >
> > I don't know anything about VDUSE or vDPA, and don't use any of it, so
> > consider me ignorant in this area.
> >
> > Is VDUSE an alternative to the existing DPDK vDPA framework? What are the
> > differences, e.g. in which cases would an application developer (or user)
> > choose one or the other?
> 
> Maxime should give better explanation.. but let me just explain a bit.
> 
> Vendors have vDPA HW that support vDPA framework (most likely in their DPU/IPU
> products). This work is introducing a way to emulate a SW vDPA device in
> userspace (DPDK), and this SW vDPA device also supports vDPA framework.
> 
> So it's not an alternative to existing DPDK vDPA framework :)
> 
> Thanks,
> Chenbo

Not an alternative, then nothing further from me. :-)

> 
> >
> > And if it is a better alternative, perhaps the documentation should
> > mention that it is recommended over DPDK vDPA. Just like we started
> > recommending alternatives to the KNI driver, so we could phase it out and
> > eventually get rid of it.
> >
> > >
> > > Regards,
> > > Maxime


^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [RFC 00/27] Add VDUSE support to Vhost library
  2023-04-13  7:08       ` Xia, Chenbo
  2023-04-13  7:58         ` Morten Brørup
@ 2023-04-13  7:59         ` Maxime Coquelin
  2023-04-14 10:48           ` Ferruh Yigit
  1 sibling, 1 reply; 79+ messages in thread
From: Maxime Coquelin @ 2023-04-13  7:59 UTC (permalink / raw)
  To: Xia, Chenbo, Morten Brørup, Ferruh Yigit, dev,
	david.marchand, mkp, fbl, jasowang, Liang, Cunming, Xie, Yongji,
	echaudro, eperezma, amorenoz

Hi,

On 4/13/23 09:08, Xia, Chenbo wrote:
>> -----Original Message-----
>> From: Morten Brørup <mb@smartsharesystems.com>
>> Sent: Thursday, April 13, 2023 3:41 AM
>> To: Maxime Coquelin <maxime.coquelin@redhat.com>; Ferruh Yigit
>> <ferruh.yigit@amd.com>; dev@dpdk.org; david.marchand@redhat.com; Xia,
>> Chenbo <chenbo.xia@intel.com>; mkp@redhat.com; fbl@redhat.com;
>> jasowang@redhat.com; Liang, Cunming <cunming.liang@intel.com>; Xie, Yongji
>> <xieyongji@bytedance.com>; echaudro@redhat.com; eperezma@redhat.com;
>> amorenoz@redhat.com
>> Subject: RE: [RFC 00/27] Add VDUSE support to Vhost library
>>
>>> From: Maxime Coquelin [mailto:maxime.coquelin@redhat.com]
>>> Sent: Wednesday, 12 April 2023 17.28
>>>
>>> Hi Ferruh,
>>>
>>> On 4/12/23 13:33, Ferruh Yigit wrote:
>>>> On 3/31/2023 4:42 PM, Maxime Coquelin wrote:
>>>>>> [...]
>>>>
>>>> Hi Maxime,
>>>>
>>>> Is this a replacement to the existing DPDK vDPA framework? What is the
>>>> plan for long term?
>>>>
>>>
>>> No, this is not a replacement for DPDK vDPA framework.
>>>
>>> We (Red Hat) don't have plans to support DPDK vDPA framework in our
>>> products, but there are still contribution to DPDK vDPA by several vDPA
>>> hardware vendors (Intel, Nvidia, Xilinx), so I don't think it is going
>>> to be deprecated soon.
>>
>> Ferruh's question made me curious...
>>
>> I don't know anything about VDUSE or vDPA, and don't use any of it, so
>> consider me ignorant in this area.
>>
>> Is VDUSE an alternative to the existing DPDK vDPA framework? What are the
>> differences, e.g. in which cases would an application developer (or user)
>> choose one or the other?
> 
> Maxime should give better explanation.. but let me just explain a bit.
> 
> Vendors have vDPA HW that support vDPA framework (most likely in their DPU/IPU
> products). This work is introducing a way to emulate a SW vDPA device in
> userspace (DPDK), and this SW vDPA device also supports vDPA framework.
> 
> So it's not an alternative to existing DPDK vDPA framework :)

Correct.

When using DPDK vDPA, the datapath of a Vhost-user port is offloaded to
a compatible physical NIC (i.e. a NIC that implements Virtio ring
support), while the control path remains the same as for a regular
Vhost-user port, i.e. it provides a Vhost-user Unix socket to the
application (like QEMU or the DPDK Virtio-user PMD).

When using Kernel vDPA, the datapath is also offloaded to a vDPA-
compatible device, and the control path is managed by the vDPA bus.
The device can either be consumed by a Kernel Virtio driver (virtio-net
here) when using Virtio-vDPA; in this case it is exposed as a regular
netdev and, in the case of Kubernetes, can be used as the primary
interface for pods.
Or it can be exposed to user-space via Vhost-vDPA, a chardev that can
be seen as an alternative to Vhost-user sockets; in this case it can
for example be used by QEMU or the DPDK Virtio-user PMD. In Kubernetes,
it can be used as a secondary interface.

Now comes VDUSE. VDUSE is a Kernel vDPA device, but instead of being a
physical device where the Virtio datapath is offloaded, the Virtio
datapath is offloaded to a user-space application. With this series, a
DPDK application, like OVS-DPDK for instance, can create VDUSE devices
and expose them either as regular netdevs, when binding them to the
Kernel Virtio-net driver via Virtio-vDPA, or as Vhost-vDPA interfaces
to be consumed by another userspace application like QEMU or a DPDK
application using the Virtio-user PMD. With this solution, OVS-DPDK
could serve both primary and secondary interfaces of Kubernetes pods.
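
To make this more concrete, below is a rough sketch of what the
application side could look like with this series. It is only an
illustration based on the /dev/vduse device path used above, not code
taken from the series; the exact flags, path handling and callback
names may differ:

#include <rte_vhost.h>

/* Hypothetical callbacks implemented by the application (the "VDUSE
 * daemon"), invoked when the device datapath should start/stop. */
static int new_device(int vid) { (void)vid; return 0; }
static void destroy_device(int vid) { (void)vid; }

static const struct rte_vhost_device_ops ops = {
	.new_device = new_device,
	.destroy_device = destroy_device,
};

int create_vduse_port(void)
{
	/* Registering a path under /dev/vduse would select the VDUSE
	 * backend instead of creating a Vhost-user socket. */
	const char *path = "/dev/vduse/vduse0";

	if (rte_vhost_driver_register(path, 0) < 0)
		return -1;
	if (rte_vhost_driver_callback_register(path, &ops) < 0)
		return -1;

	return rte_vhost_driver_start(path);
}

The point being that the Vhost library API stays the same for the
application, whether it serves a Vhost-user socket or a VDUSE device.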

I hope this clarifies things; I will add this information to the
cover letter in the next revisions. Let me know if anything is still
unclear.

I did a presentation at the last DPDK Summit [0]; maybe the diagrams
will help clarify things further.

Regards,
Maxime

> Thanks,
> Chenbo
> 
>>
>> And if it is a better alternative, perhaps the documentation should
>> mention that it is recommended over DPDK vDPA. Just like we started
>> recommending alternatives to the KNI driver, so we could phase it out and
>> eventually get rid of it.
>>
>>>
>>> Regards,
>>> Maxime
> 

[0]: 
https://static.sched.com/hosted_files/dpdkuserspace22/9f/Open%20DPDK%20to%20containers%20networking%20with%20VDUSE.pdf


^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [RFC 00/27] Add VDUSE support to Vhost library
  2023-04-13  7:59         ` Maxime Coquelin
@ 2023-04-14 10:48           ` Ferruh Yigit
  2023-04-14 12:06             ` Maxime Coquelin
  0 siblings, 1 reply; 79+ messages in thread
From: Ferruh Yigit @ 2023-04-14 10:48 UTC (permalink / raw)
  To: Maxime Coquelin, Xia, Chenbo, Morten Brørup, dev,
	david.marchand, mkp, fbl, jasowang, Liang, Cunming, Xie, Yongji,
	echaudro, eperezma, amorenoz

On 4/13/2023 8:59 AM, Maxime Coquelin wrote:
> Hi,
> 
> On 4/13/23 09:08, Xia, Chenbo wrote:
>>> -----Original Message-----
>>> From: Morten Brørup <mb@smartsharesystems.com>
>>> Sent: Thursday, April 13, 2023 3:41 AM
>>> To: Maxime Coquelin <maxime.coquelin@redhat.com>; Ferruh Yigit
>>> <ferruh.yigit@amd.com>; dev@dpdk.org; david.marchand@redhat.com; Xia,
>>> Chenbo <chenbo.xia@intel.com>; mkp@redhat.com; fbl@redhat.com;
>>> jasowang@redhat.com; Liang, Cunming <cunming.liang@intel.com>; Xie,
>>> Yongji
>>> <xieyongji@bytedance.com>; echaudro@redhat.com; eperezma@redhat.com;
>>> amorenoz@redhat.com
>>> Subject: RE: [RFC 00/27] Add VDUSE support to Vhost library
>>>
>>>> From: Maxime Coquelin [mailto:maxime.coquelin@redhat.com]
>>>> Sent: Wednesday, 12 April 2023 17.28
>>>>
>>>> Hi Ferruh,
>>>>
>>>> On 4/12/23 13:33, Ferruh Yigit wrote:
>>>>> On 3/31/2023 4:42 PM, Maxime Coquelin wrote:
>>>>>> This series introduces a new type of backend, VDUSE,
>>>>>> to the Vhost library.
>>>>>>
>>>>>> VDUSE stands for vDPA device in Userspace, it enables
>>>>>> implementing a Virtio device in userspace and have it
>>>>>> attached to the Kernel vDPA bus.
>>>>>>
>>>>>> Once attached to the vDPA bus, it can be used either by
>>>>>> Kernel Virtio drivers, like virtio-net in our case, via
>>>>>> the virtio-vdpa driver. Doing that, the device is visible
>>>>>> to the Kernel networking stack and is exposed to userspace
>>>>>> as a regular netdev.
>>>>>>
>>>>>> It can also be exposed to userspace thanks to the
>>>>>> vhost-vdpa driver, via a vhost-vdpa chardev that can be
>>>>>> passed to QEMU or Virtio-user PMD.
>>>>>>
>>>>>> While VDUSE support is already available in upstream
>>>>>> Kernel, a couple of patches are required to support
>>>>>> network device type:
>>>>>>
>>>>>> https://gitlab.com/mcoquelin/linux/-/tree/vduse_networking_poc
>>>>>>
>>>>>> In order to attach the created VDUSE device to the vDPA
>>>>>> bus, a recent iproute2 version containing the vdpa tool is
>>>>>> required.
>>>>>
>>>>> Hi Maxime,
>>>>>
>>>>> Is this a replacement to the existing DPDK vDPA framework? What is the
>>>>> plan for long term?
>>>>>
>>>>
>>>> No, this is not a replacement for DPDK vDPA framework.
>>>>
>>>> We (Red Hat) don't have plans to support DPDK vDPA framework in our
>>>> products, but there are still contribution to DPDK vDPA by several vDPA
>>>> hardware vendors (Intel, Nvidia, Xilinx), so I don't think it is going
>>>> to be deprecated soon.
>>>
>>> Ferruh's question made me curious...
>>>
>>> I don't know anything about VDUSE or vDPA, and don't use any of it, so
>>> consider me ignorant in this area.
>>>
>>> Is VDUSE an alternative to the existing DPDK vDPA framework? What are
>>> the
>>> differences, e.g. in which cases would an application developer (or
>>> user)
>>> choose one or the other?
>>
>> Maxime should give better explanation.. but let me just explain a bit.
>>
>> Vendors have vDPA HW that support vDPA framework (most likely in their
>> DPU/IPU
>> products). This work is introducing a way to emulate a SW vDPA device in
>> userspace (DPDK), and this SW vDPA device also supports vDPA framework.
>>
>> So it's not an alternative to existing DPDK vDPA framework :)
> 
> Correct.
> 
> When using DPDK vDPA, the datapath of a Vhost-user port is offloaded to
> a compatible physical NIC (i.e. a NIC that implements Virtio rings
> support), the control path remains the same as a regular Vhost-user
> port, i.e. it provides a Vhost-user unix socket to the application (like
> QEMU or DPDK Virtio-user PMD).
> 
> When using Kernel vDPA, the datapath is also offloaded to a vDPA
> compatible device, and the control path is managed by the vDPA bus.
> It can either be consumed by a Kernel Virtio device (here Virtio-net)
> when using Virtio-vDPA. In this case the device is exposed as a regular
> netdev and, in the case of Kubernetes, can be used as primary interfaces
> for the pods.
> Or it can be exposed to user-space via Vhost-vDPA, a chardev that can be
> seen as an alternative to Vhost-user sockets. In this case it can for
> example be used by QEMU or DPDK Virtio-user PMD. In Kubernetes, it can
> be used as a secondary interface.
> 
> Now comes VDUSE. VDUSE is a Kernel vDPA device, but instead of being a
> physical device where the Virtio datapath is offloaded, the Virtio
> datapath is offloaded to a user-space application. With this series, a
> DPDK application, like OVS-DPDK for instance, can create VDUSE device
> and expose them either as regular netdev when binding them to Kernel
> Virtio-net driver via Virtio-vDPA, or as Vhost-vDPA interface to be
> consumed by another userspace appliation like QEMU or DPDK application
> using Virtio-user PMD. With this solution, OVS-DPDK could serve both
> primary and secondary interfaces of Kubernetes pods.
> 
> I hope it clarifies, I will add these information in the cover-letter
> for next revisions. Let me know if anything is still unclear.
> 
> I did a presentation at last DPDK summit [0], maybe the diagrams will
> help to clarify furthermore.
> 

Thanks Chenbo, Maxime for clarification.

After reading a little more, I think I understand it better now; the
slides [0] were useful.

So this is more like a backend/handler, similar to vhost-user, although
it is vDPA device emulation.
Can you please describe in more detail the benefits of VDUSE compared
to vhost-user?

Also what is "VDUSE daemon", which is referred a few times in
documentation, is it another userspace implementation of the vduse?


>>>
>>> And if it is a better alternative, perhaps the documentation should
>>> mention that it is recommended over DPDK vDPA. Just like we started
>>> recommending alternatives to the KNI driver, so we could phase it out
>>> and
>>> eventually get rid of it.
>>>
>>>>
>>>> Regards,
>>>> Maxime
>>
> 
> [0]:
> https://static.sched.com/hosted_files/dpdkuserspace22/9f/Open%20DPDK%20to%20containers%20networking%20with%20VDUSE.pdf
> 


^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [RFC 00/27] Add VDUSE support to Vhost library
  2023-04-14 10:48           ` Ferruh Yigit
@ 2023-04-14 12:06             ` Maxime Coquelin
  2023-04-14 14:25               ` Ferruh Yigit
  0 siblings, 1 reply; 79+ messages in thread
From: Maxime Coquelin @ 2023-04-14 12:06 UTC (permalink / raw)
  To: Ferruh Yigit, Xia, Chenbo, Morten Brørup, dev,
	david.marchand, mkp, fbl, jasowang, Liang, Cunming, Xie, Yongji,
	echaudro, eperezma, amorenoz



On 4/14/23 12:48, Ferruh Yigit wrote:
> On 4/13/2023 8:59 AM, Maxime Coquelin wrote:
>> Hi,
>>
>> On 4/13/23 09:08, Xia, Chenbo wrote:
>>>> -----Original Message-----
>>>> From: Morten Brørup <mb@smartsharesystems.com>
>>>> Sent: Thursday, April 13, 2023 3:41 AM
>>>> To: Maxime Coquelin <maxime.coquelin@redhat.com>; Ferruh Yigit
>>>> <ferruh.yigit@amd.com>; dev@dpdk.org; david.marchand@redhat.com; Xia,
>>>> Chenbo <chenbo.xia@intel.com>; mkp@redhat.com; fbl@redhat.com;
>>>> jasowang@redhat.com; Liang, Cunming <cunming.liang@intel.com>; Xie,
>>>> Yongji
>>>> <xieyongji@bytedance.com>; echaudro@redhat.com; eperezma@redhat.com;
>>>> amorenoz@redhat.com
>>>> Subject: RE: [RFC 00/27] Add VDUSE support to Vhost library
>>>>
>>>>> From: Maxime Coquelin [mailto:maxime.coquelin@redhat.com]
>>>>> Sent: Wednesday, 12 April 2023 17.28
>>>>>
>>>>> Hi Ferruh,
>>>>>
>>>>> On 4/12/23 13:33, Ferruh Yigit wrote:
>>>>>> On 3/31/2023 4:42 PM, Maxime Coquelin wrote:
>>>>>>> This series introduces a new type of backend, VDUSE,
>>>>>>> to the Vhost library.
>>>>>>>
>>>>>>> VDUSE stands for vDPA device in Userspace, it enables
>>>>>>> implementing a Virtio device in userspace and have it
>>>>>>> attached to the Kernel vDPA bus.
>>>>>>>
>>>>>>> Once attached to the vDPA bus, it can be used either by
>>>>>>> Kernel Virtio drivers, like virtio-net in our case, via
>>>>>>> the virtio-vdpa driver. Doing that, the device is visible
>>>>>>> to the Kernel networking stack and is exposed to userspace
>>>>>>> as a regular netdev.
>>>>>>>
>>>>>>> It can also be exposed to userspace thanks to the
>>>>>>> vhost-vdpa driver, via a vhost-vdpa chardev that can be
>>>>>>> passed to QEMU or Virtio-user PMD.
>>>>>>>
>>>>>>> While VDUSE support is already available in upstream
>>>>>>> Kernel, a couple of patches are required to support
>>>>>>> network device type:
>>>>>>>
>>>>>>> https://gitlab.com/mcoquelin/linux/-/tree/vduse_networking_poc
>>>>>>>
>>>>>>> In order to attach the created VDUSE device to the vDPA
>>>>>>> bus, a recent iproute2 version containing the vdpa tool is
>>>>>>> required.
>>>>>>
>>>>>> Hi Maxime,
>>>>>>
>>>>>> Is this a replacement to the existing DPDK vDPA framework? What is the
>>>>>> plan for long term?
>>>>>>
>>>>>
>>>>> No, this is not a replacement for DPDK vDPA framework.
>>>>>
>>>>> We (Red Hat) don't have plans to support DPDK vDPA framework in our
>>>>> products, but there are still contribution to DPDK vDPA by several vDPA
>>>>> hardware vendors (Intel, Nvidia, Xilinx), so I don't think it is going
>>>>> to be deprecated soon.
>>>>
>>>> Ferruh's question made me curious...
>>>>
>>>> I don't know anything about VDUSE or vDPA, and don't use any of it, so
>>>> consider me ignorant in this area.
>>>>
>>>> Is VDUSE an alternative to the existing DPDK vDPA framework? What are
>>>> the
>>>> differences, e.g. in which cases would an application developer (or
>>>> user)
>>>> choose one or the other?
>>>
>>> Maxime should give better explanation.. but let me just explain a bit.
>>>
>>> Vendors have vDPA HW that support vDPA framework (most likely in their
>>> DPU/IPU
>>> products). This work is introducing a way to emulate a SW vDPA device in
>>> userspace (DPDK), and this SW vDPA device also supports vDPA framework.
>>>
>>> So it's not an alternative to existing DPDK vDPA framework :)
>>
>> Correct.
>>
>> When using DPDK vDPA, the datapath of a Vhost-user port is offloaded to
>> a compatible physical NIC (i.e. a NIC that implements Virtio rings
>> support), the control path remains the same as a regular Vhost-user
>> port, i.e. it provides a Vhost-user unix socket to the application (like
>> QEMU or DPDK Virtio-user PMD).
>>
>> When using Kernel vDPA, the datapath is also offloaded to a vDPA
>> compatible device, and the control path is managed by the vDPA bus.
>> It can either be consumed by a Kernel Virtio device (here Virtio-net)
>> when using Virtio-vDPA. In this case the device is exposed as a regular
>> netdev and, in the case of Kubernetes, can be used as primary interfaces
>> for the pods.
>> Or it can be exposed to user-space via Vhost-vDPA, a chardev that can be
>> seen as an alternative to Vhost-user sockets. In this case it can for
>> example be used by QEMU or DPDK Virtio-user PMD. In Kubernetes, it can
>> be used as a secondary interface.
>>
>> Now comes VDUSE. VDUSE is a Kernel vDPA device, but instead of being a
>> physical device where the Virtio datapath is offloaded, the Virtio
>> datapath is offloaded to a user-space application. With this series, a
>> DPDK application, like OVS-DPDK for instance, can create VDUSE device
>> and expose them either as regular netdev when binding them to Kernel
>> Virtio-net driver via Virtio-vDPA, or as Vhost-vDPA interface to be
>> consumed by another userspace appliation like QEMU or DPDK application
>> using Virtio-user PMD. With this solution, OVS-DPDK could serve both
>> primary and secondary interfaces of Kubernetes pods.
>>
>> I hope it clarifies, I will add these information in the cover-letter
>> for next revisions. Let me know if anything is still unclear.
>>
>> I did a presentation at last DPDK summit [0], maybe the diagrams will
>> help to clarify furthermore.
>>
> 
> Thanks Chenbo, Maxime for clarification.
> 
> After reading a little more (I think) I got it better, slides [0] were
> useful.
> 
> So this is more like a backend/handler, similar to vhost-user, although
> it is vDPA device emulation.
> Can you please describe more the benefit of vduse comparing to vhost-user?

The main benefit is that a VDUSE device can be exposed as a regular
netdev, which is not possible with Vhost-user.

> Also what is "VDUSE daemon", which is referred a few times in
> documentation, is it another userspace implementation of the vduse?

The VDUSE daemon is the application that implements the VDUSE device,
e.g. OVS-DPDK with the DPDK Vhost library using this series in our case.

Maxime
> 
>>>>
>>>> And if it is a better alternative, perhaps the documentation should
>>>> mention that it is recommended over DPDK vDPA. Just like we started
>>>> recommending alternatives to the KNI driver, so we could phase it out
>>>> and
>>>> eventually get rid of it.
>>>>
>>>>>
>>>>> Regards,
>>>>> Maxime
>>>
>>
>> [0]:
>> https://static.sched.com/hosted_files/dpdkuserspace22/9f/Open%20DPDK%20to%20containers%20networking%20with%20VDUSE.pdf
>>
> 


^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [RFC 00/27] Add VDUSE support to Vhost library
  2023-04-14 12:06             ` Maxime Coquelin
@ 2023-04-14 14:25               ` Ferruh Yigit
  2023-04-17  3:10                 ` Jason Wang
  0 siblings, 1 reply; 79+ messages in thread
From: Ferruh Yigit @ 2023-04-14 14:25 UTC (permalink / raw)
  To: Maxime Coquelin, Xia, Chenbo, Morten Brørup, dev,
	david.marchand, mkp, fbl, jasowang, Liang, Cunming, Xie, Yongji,
	echaudro, eperezma, amorenoz

On 4/14/2023 1:06 PM, Maxime Coquelin wrote:
> 
> 
> On 4/14/23 12:48, Ferruh Yigit wrote:
>> On 4/13/2023 8:59 AM, Maxime Coquelin wrote:
>>> Hi,
>>>
>>> On 4/13/23 09:08, Xia, Chenbo wrote:
>>>>> -----Original Message-----
>>>>> From: Morten Brørup <mb@smartsharesystems.com>
>>>>> Sent: Thursday, April 13, 2023 3:41 AM
>>>>> To: Maxime Coquelin <maxime.coquelin@redhat.com>; Ferruh Yigit
>>>>> <ferruh.yigit@amd.com>; dev@dpdk.org; david.marchand@redhat.com; Xia,
>>>>> Chenbo <chenbo.xia@intel.com>; mkp@redhat.com; fbl@redhat.com;
>>>>> jasowang@redhat.com; Liang, Cunming <cunming.liang@intel.com>; Xie,
>>>>> Yongji
>>>>> <xieyongji@bytedance.com>; echaudro@redhat.com; eperezma@redhat.com;
>>>>> amorenoz@redhat.com
>>>>> Subject: RE: [RFC 00/27] Add VDUSE support to Vhost library
>>>>>
>>>>>> From: Maxime Coquelin [mailto:maxime.coquelin@redhat.com]
>>>>>> Sent: Wednesday, 12 April 2023 17.28
>>>>>>
>>>>>> Hi Ferruh,
>>>>>>
>>>>>> On 4/12/23 13:33, Ferruh Yigit wrote:
>>>>>>> On 3/31/2023 4:42 PM, Maxime Coquelin wrote:
>>>>>>>> This series introduces a new type of backend, VDUSE,
>>>>>>>> to the Vhost library.
>>>>>>>>
>>>>>>>> VDUSE stands for vDPA device in Userspace, it enables
>>>>>>>> implementing a Virtio device in userspace and have it
>>>>>>>> attached to the Kernel vDPA bus.
>>>>>>>>
>>>>>>>> Once attached to the vDPA bus, it can be used either by
>>>>>>>> Kernel Virtio drivers, like virtio-net in our case, via
>>>>>>>> the virtio-vdpa driver. Doing that, the device is visible
>>>>>>>> to the Kernel networking stack and is exposed to userspace
>>>>>>>> as a regular netdev.
>>>>>>>>
>>>>>>>> It can also be exposed to userspace thanks to the
>>>>>>>> vhost-vdpa driver, via a vhost-vdpa chardev that can be
>>>>>>>> passed to QEMU or Virtio-user PMD.
>>>>>>>>
>>>>>>>> While VDUSE support is already available in upstream
>>>>>>>> Kernel, a couple of patches are required to support
>>>>>>>> network device type:
>>>>>>>>
>>>>>>>> https://gitlab.com/mcoquelin/linux/-/tree/vduse_networking_poc
>>>>>>>>
>>>>>>>> In order to attach the created VDUSE device to the vDPA
>>>>>>>> bus, a recent iproute2 version containing the vdpa tool is
>>>>>>>> required.
>>>>>>>
>>>>>>> Hi Maxime,
>>>>>>>
>>>>>>> Is this a replacement to the existing DPDK vDPA framework? What
>>>>>>> is the
>>>>>>> plan for long term?
>>>>>>>
>>>>>>
>>>>>> No, this is not a replacement for DPDK vDPA framework.
>>>>>>
>>>>>> We (Red Hat) don't have plans to support DPDK vDPA framework in our
>>>>>> products, but there are still contribution to DPDK vDPA by several
>>>>>> vDPA
>>>>>> hardware vendors (Intel, Nvidia, Xilinx), so I don't think it is
>>>>>> going
>>>>>> to be deprecated soon.
>>>>>
>>>>> Ferruh's question made me curious...
>>>>>
>>>>> I don't know anything about VDUSE or vDPA, and don't use any of it, so
>>>>> consider me ignorant in this area.
>>>>>
>>>>> Is VDUSE an alternative to the existing DPDK vDPA framework? What are
>>>>> the
>>>>> differences, e.g. in which cases would an application developer (or
>>>>> user)
>>>>> choose one or the other?
>>>>
>>>> Maxime should give better explanation.. but let me just explain a bit.
>>>>
>>>> Vendors have vDPA HW that support vDPA framework (most likely in their
>>>> DPU/IPU
>>>> products). This work is introducing a way to emulate a SW vDPA
>>>> device in
>>>> userspace (DPDK), and this SW vDPA device also supports vDPA framework.
>>>>
>>>> So it's not an alternative to existing DPDK vDPA framework :)
>>>
>>> Correct.
>>>
>>> When using DPDK vDPA, the datapath of a Vhost-user port is offloaded to
>>> a compatible physical NIC (i.e. a NIC that implements Virtio rings
>>> support), the control path remains the same as a regular Vhost-user
>>> port, i.e. it provides a Vhost-user unix socket to the application (like
>>> QEMU or DPDK Virtio-user PMD).
>>>
>>> When using Kernel vDPA, the datapath is also offloaded to a vDPA
>>> compatible device, and the control path is managed by the vDPA bus.
>>> It can either be consumed by a Kernel Virtio device (here Virtio-net)
>>> when using Virtio-vDPA. In this case the device is exposed as a regular
>>> netdev and, in the case of Kubernetes, can be used as primary interfaces
>>> for the pods.
>>> Or it can be exposed to user-space via Vhost-vDPA, a chardev that can be
>>> seen as an alternative to Vhost-user sockets. In this case it can for
>>> example be used by QEMU or DPDK Virtio-user PMD. In Kubernetes, it can
>>> be used as a secondary interface.
>>>
>>> Now comes VDUSE. VDUSE is a Kernel vDPA device, but instead of being a
>>> physical device where the Virtio datapath is offloaded, the Virtio
>>> datapath is offloaded to a user-space application. With this series, a
>>> DPDK application, like OVS-DPDK for instance, can create VDUSE device
>>> and expose them either as regular netdev when binding them to Kernel
>>> Virtio-net driver via Virtio-vDPA, or as Vhost-vDPA interface to be
>>> consumed by another userspace appliation like QEMU or DPDK application
>>> using Virtio-user PMD. With this solution, OVS-DPDK could serve both
>>> primary and secondary interfaces of Kubernetes pods.
>>>
>>> I hope it clarifies, I will add these information in the cover-letter
>>> for next revisions. Let me know if anything is still unclear.
>>>
>>> I did a presentation at last DPDK summit [0], maybe the diagrams will
>>> help to clarify furthermore.
>>>
>>
>> Thanks Chenbo, Maxime for clarification.
>>
>> After reading a little more (I think) I got it better, slides [0] were
>> useful.
>>
>> So this is more like a backend/handler, similar to vhost-user, although
>> it is vDPA device emulation.
>> Can you please describe more the benefit of vduse comparing to
>> vhost-user?
> 
> The main benefit is that VDUSE device can be exposed as a regular
> netdev, while this is not possible with Vhost-user.
> 

Got it, thanks. I think it would be better to highlight this in the commit logs.

And out of curiosity,
why is there no virtio (guest) to virtio (host) communication support
(without vDPA), something like adding virtio as a backend to vhost-net?
Is it not needed, or are there technical difficulties?

>> Also what is "VDUSE daemon", which is referred a few times in
>> documentation, is it another userspace implementation of the vduse?
> 
> VDUSE daemon is the application that implements the VDUSE device, e.g.
> OVS-DPDK with DPDK Vhost library using this series in our case.
> 
> Maxime
>>
>>>>>
>>>>> And if it is a better alternative, perhaps the documentation should
>>>>> mention that it is recommended over DPDK vDPA. Just like we started
>>>>> recommending alternatives to the KNI driver, so we could phase it out
>>>>> and
>>>>> eventually get rid of it.
>>>>>
>>>>>>
>>>>>> Regards,
>>>>>> Maxime
>>>>
>>>
>>> [0]:
>>> https://static.sched.com/hosted_files/dpdkuserspace22/9f/Open%20DPDK%20to%20containers%20networking%20with%20VDUSE.pdf
>>>
>>
> 


^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [RFC 00/27] Add VDUSE support to Vhost library
  2023-04-14 14:25               ` Ferruh Yigit
@ 2023-04-17  3:10                 ` Jason Wang
  0 siblings, 0 replies; 79+ messages in thread
From: Jason Wang @ 2023-04-17  3:10 UTC (permalink / raw)
  To: Ferruh Yigit
  Cc: Maxime Coquelin, Xia, Chenbo, Morten Brørup, dev,
	david.marchand, mkp, fbl, Liang, Cunming, Xie, Yongji, echaudro,
	eperezma, amorenoz

On Fri, Apr 14, 2023 at 10:25 PM Ferruh Yigit <ferruh.yigit@amd.com> wrote:
>
> On 4/14/2023 1:06 PM, Maxime Coquelin wrote:
> >
> >
> > On 4/14/23 12:48, Ferruh Yigit wrote:
> >> On 4/13/2023 8:59 AM, Maxime Coquelin wrote:
> >>> Hi,
> >>>
> >>> On 4/13/23 09:08, Xia, Chenbo wrote:
> >>>>> -----Original Message-----
> >>>>> From: Morten Brørup <mb@smartsharesystems.com>
> >>>>> Sent: Thursday, April 13, 2023 3:41 AM
> >>>>> To: Maxime Coquelin <maxime.coquelin@redhat.com>; Ferruh Yigit
> >>>>> <ferruh.yigit@amd.com>; dev@dpdk.org; david.marchand@redhat.com; Xia,
> >>>>> Chenbo <chenbo.xia@intel.com>; mkp@redhat.com; fbl@redhat.com;
> >>>>> jasowang@redhat.com; Liang, Cunming <cunming.liang@intel.com>; Xie,
> >>>>> Yongji
> >>>>> <xieyongji@bytedance.com>; echaudro@redhat.com; eperezma@redhat.com;
> >>>>> amorenoz@redhat.com
> >>>>> Subject: RE: [RFC 00/27] Add VDUSE support to Vhost library
> >>>>>
> >>>>>> From: Maxime Coquelin [mailto:maxime.coquelin@redhat.com]
> >>>>>> Sent: Wednesday, 12 April 2023 17.28
> >>>>>>
> >>>>>> Hi Ferruh,
> >>>>>>
> >>>>>> On 4/12/23 13:33, Ferruh Yigit wrote:
> >>>>>>> On 3/31/2023 4:42 PM, Maxime Coquelin wrote:
> >>>>>>>> This series introduces a new type of backend, VDUSE,
> >>>>>>>> to the Vhost library.
> >>>>>>>>
> >>>>>>>> VDUSE stands for vDPA device in Userspace, it enables
> >>>>>>>> implementing a Virtio device in userspace and have it
> >>>>>>>> attached to the Kernel vDPA bus.
> >>>>>>>>
> >>>>>>>> Once attached to the vDPA bus, it can be used either by
> >>>>>>>> Kernel Virtio drivers, like virtio-net in our case, via
> >>>>>>>> the virtio-vdpa driver. Doing that, the device is visible
> >>>>>>>> to the Kernel networking stack and is exposed to userspace
> >>>>>>>> as a regular netdev.
> >>>>>>>>
> >>>>>>>> It can also be exposed to userspace thanks to the
> >>>>>>>> vhost-vdpa driver, via a vhost-vdpa chardev that can be
> >>>>>>>> passed to QEMU or Virtio-user PMD.
> >>>>>>>>
> >>>>>>>> While VDUSE support is already available in upstream
> >>>>>>>> Kernel, a couple of patches are required to support
> >>>>>>>> network device type:
> >>>>>>>>
> >>>>>>>> https://gitlab.com/mcoquelin/linux/-/tree/vduse_networking_poc
> >>>>>>>>
> >>>>>>>> In order to attach the created VDUSE device to the vDPA
> >>>>>>>> bus, a recent iproute2 version containing the vdpa tool is
> >>>>>>>> required.
> >>>>>>>
> >>>>>>> Hi Maxime,
> >>>>>>>
> >>>>>>> Is this a replacement to the existing DPDK vDPA framework? What
> >>>>>>> is the
> >>>>>>> plan for long term?
> >>>>>>>
> >>>>>>
> >>>>>> No, this is not a replacement for DPDK vDPA framework.
> >>>>>>
> >>>>>> We (Red Hat) don't have plans to support DPDK vDPA framework in our
> >>>>>> products, but there are still contribution to DPDK vDPA by several
> >>>>>> vDPA
> >>>>>> hardware vendors (Intel, Nvidia, Xilinx), so I don't think it is
> >>>>>> going
> >>>>>> to be deprecated soon.
> >>>>>
> >>>>> Ferruh's question made me curious...
> >>>>>
> >>>>> I don't know anything about VDUSE or vDPA, and don't use any of it, so
> >>>>> consider me ignorant in this area.
> >>>>>
> >>>>> Is VDUSE an alternative to the existing DPDK vDPA framework? What are
> >>>>> the
> >>>>> differences, e.g. in which cases would an application developer (or
> >>>>> user)
> >>>>> choose one or the other?
> >>>>
> >>>> Maxime should give better explanation.. but let me just explain a bit.
> >>>>
> >>>> Vendors have vDPA HW that support vDPA framework (most likely in their
> >>>> DPU/IPU
> >>>> products). This work is introducing a way to emulate a SW vDPA
> >>>> device in
> >>>> userspace (DPDK), and this SW vDPA device also supports vDPA framework.
> >>>>
> >>>> So it's not an alternative to existing DPDK vDPA framework :)
> >>>
> >>> Correct.
> >>>
> >>> When using DPDK vDPA, the datapath of a Vhost-user port is offloaded to
> >>> a compatible physical NIC (i.e. a NIC that implements Virtio rings
> >>> support), the control path remains the same as a regular Vhost-user
> >>> port, i.e. it provides a Vhost-user unix socket to the application (like
> >>> QEMU or DPDK Virtio-user PMD).
> >>>
> >>> When using Kernel vDPA, the datapath is also offloaded to a vDPA
> >>> compatible device, and the control path is managed by the vDPA bus.
> >>> It can either be consumed by a Kernel Virtio device (here Virtio-net)
> >>> when using Virtio-vDPA. In this case the device is exposed as a regular
> >>> netdev and, in the case of Kubernetes, can be used as primary interfaces
> >>> for the pods.
> >>> Or it can be exposed to user-space via Vhost-vDPA, a chardev that can be
> >>> seen as an alternative to Vhost-user sockets. In this case it can for
> >>> example be used by QEMU or DPDK Virtio-user PMD. In Kubernetes, it can
> >>> be used as a secondary interface.
> >>>
> >>> Now comes VDUSE. VDUSE is a Kernel vDPA device, but instead of being a
> >>> physical device where the Virtio datapath is offloaded, the Virtio
> >>> datapath is offloaded to a user-space application. With this series, a
> >>> DPDK application, like OVS-DPDK for instance, can create VDUSE device
> >>> and expose them either as regular netdev when binding them to Kernel
> >>> Virtio-net driver via Virtio-vDPA, or as Vhost-vDPA interface to be
> >>> consumed by another userspace appliation like QEMU or DPDK application
> >>> using Virtio-user PMD. With this solution, OVS-DPDK could serve both
> >>> primary and secondary interfaces of Kubernetes pods.
> >>>
> >>> I hope it clarifies, I will add these information in the cover-letter
> >>> for next revisions. Let me know if anything is still unclear.
> >>>
> >>> I did a presentation at last DPDK summit [0], maybe the diagrams will
> >>> help to clarify furthermore.
> >>>
> >>
> >> Thanks Chenbo, Maxime for clarification.
> >>
> >> After reading a little more (I think) I got it better, slides [0] were
> >> useful.
> >>
> >> So this is more like a backend/handler, similar to vhost-user, although
> >> it is vDPA device emulation.
> >> Can you please describe more the benefit of vduse comparing to
> >> vhost-user?
> >
> > The main benefit is that VDUSE device can be exposed as a regular
> > netdev, while this is not possible with Vhost-user.
> >
>
> Got it, thanks. I think better to highlight this in commit logs.
>
> And out of curiosity,
> Why there is no virtio(guest) to virtio(host) communication support
> (without vdpa), something like adding virtio as backend to vhost-net, is
> it not needed or technical difficulties?

The main reason is that a lot of operations are not supported by virtio yet:

1) virtqueue saving and restoring
2) provisioning and management
3) address space IDs, etc.

Thanks

>
> >> Also what is "VDUSE daemon", which is referred a few times in
> >> documentation, is it another userspace implementation of the vduse?
> >
> > VDUSE daemon is the application that implements the VDUSE device, e.g.
> > OVS-DPDK with DPDK Vhost library using this series in our case.
> >
> > Maxime
> >>
> >>>>>
> >>>>> And if it is a better alternative, perhaps the documentation should
> >>>>> mention that it is recommended over DPDK vDPA. Just like we started
> >>>>> recommending alternatives to the KNI driver, so we could phase it out
> >>>>> and
> >>>>> eventually get rid of it.
> >>>>>
> >>>>>>
> >>>>>> Regards,
> >>>>>> Maxime
> >>>>
> >>>
> >>> [0]:
> >>> https://static.sched.com/hosted_files/dpdkuserspace22/9f/Open%20DPDK%20to%20containers%20networking%20with%20VDUSE.pdf
> >>>
> >>
> >
>


^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [RFC 03/27] vhost: fix IOTLB entries overlap check with previous entry
  2023-03-31 15:42 ` [RFC 03/27] vhost: fix IOTLB entries overlap check with previous entry Maxime Coquelin
@ 2023-04-17 19:15   ` Mike Pattrick
  2023-04-24  2:58   ` Xia, Chenbo
  1 sibling, 0 replies; 79+ messages in thread
From: Mike Pattrick @ 2023-04-17 19:15 UTC (permalink / raw)
  To: Maxime Coquelin
  Cc: dev, david.marchand, chenbo.xia, fbl, jasowang, cunming.liang,
	xieyongji, echaudro, eperezma, amorenoz, stable

On Fri, Mar 31, 2023 at 11:43 AM Maxime Coquelin
<maxime.coquelin@redhat.com> wrote:
>
> Commit 22b6d0ac691a ("vhost: fix madvise IOTLB entries pages overlap check")
> fixed the check to ensure the entry to be removed does not
> overlap with the next one in the IOTLB cache before marking
> it as DONTDUMP with madvise(). This is not enough, because
> the same issue is present when comparing with the previous
> entry in the cache, where the end address of the previous
> entry should be used, not the start one.
>
> Fixes: dea092d0addb ("vhost: fix madvise arguments alignment")
> Cc: stable@dpdk.org
>
> Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>

Hi Maxime,

This makes sense.
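
For the record, a made-up example of why the previous entry's end
address is the right thing to compare (assuming 4 KiB pages; this is
just an illustration, not code from the patch):

#include <stdint.h>
#include <stdio.h>

int main(void)
{
	uint64_t align = 4096, mask = ~(align - 1);
	/* Previous entry spans two pages and ends inside the node's page. */
	uint64_t prev_uaddr = 0x1800, prev_size = 0x2000; /* last byte 0x37ff */
	uint64_t node_uaddr = 0x3800;

	/* Start-based check: pages 0x1000 vs 0x3000 look distinct, so the
	 * node's page could wrongly be madvise'd DONTDUMP. */
	printf("start-based differs: %d\n",
		(node_uaddr & mask) != (prev_uaddr & mask));                   /* 1 */

	/* End-based check (this patch): 0x3000 vs 0x3000, the page is
	 * shared with the previous entry, so coredump stays enabled. */
	printf("end-based differs:   %d\n",
		(node_uaddr & mask) != ((prev_uaddr + prev_size - 1) & mask)); /* 0 */

	return 0;
}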

Acked-by: Mike Pattrick <mkp@redhat.com>

> ---
>  lib/vhost/iotlb.c | 8 ++++----
>  1 file changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/lib/vhost/iotlb.c b/lib/vhost/iotlb.c
> index 3f45bc6061..870c8acb88 100644
> --- a/lib/vhost/iotlb.c
> +++ b/lib/vhost/iotlb.c
> @@ -178,8 +178,8 @@ vhost_user_iotlb_cache_random_evict(struct virtio_net *dev, struct vhost_virtque
>                         mask = ~(alignment - 1);
>
>                         /* Don't disable coredump if the previous node is in the same page */
> -                       if (prev_node == NULL ||
> -                                       (node->uaddr & mask) != (prev_node->uaddr & mask)) {
> +                       if (prev_node == NULL || (node->uaddr & mask) !=
> +                                       ((prev_node->uaddr + prev_node->size - 1) & mask)) {
>                                 next_node = RTE_TAILQ_NEXT(node, next);
>                                 /* Don't disable coredump if the next node is in the same page */
>                                 if (next_node == NULL || ((node->uaddr + node->size - 1) & mask) !=
> @@ -283,8 +283,8 @@ vhost_user_iotlb_cache_remove(struct virtio_net *dev, struct vhost_virtqueue *vq
>                         mask = ~(alignment-1);
>
>                         /* Don't disable coredump if the previous node is in the same page */
> -                       if (prev_node == NULL ||
> -                                       (node->uaddr & mask) != (prev_node->uaddr & mask)) {
> +                       if (prev_node == NULL || (node->uaddr & mask) !=
> +                                       ((prev_node->uaddr + prev_node->size - 1) & mask)) {
>                                 next_node = RTE_TAILQ_NEXT(node, next);
>                                 /* Don't disable coredump if the next node is in the same page */
>                                 if (next_node == NULL || ((node->uaddr + node->size - 1) & mask) !=
> --
> 2.39.2
>


^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [RFC 05/27] vhost: add helper for IOTLB entries shared page check
  2023-03-31 15:42 ` [RFC 05/27] vhost: add helper for IOTLB entries shared page check Maxime Coquelin
@ 2023-04-17 19:39   ` Mike Pattrick
  2023-04-19  9:35     ` Maxime Coquelin
  2023-04-24  2:59   ` Xia, Chenbo
  1 sibling, 1 reply; 79+ messages in thread
From: Mike Pattrick @ 2023-04-17 19:39 UTC (permalink / raw)
  To: Maxime Coquelin
  Cc: dev, david.marchand, chenbo.xia, fbl, jasowang, cunming.liang,
	xieyongji, echaudro, eperezma, amorenoz

Hi Maxime,

On Fri, Mar 31, 2023 at 11:43 AM Maxime Coquelin
<maxime.coquelin@redhat.com> wrote:
>
> This patch introduces a helper to check whether two IOTLB
> entries share a page.
>
> Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
> ---
>  lib/vhost/iotlb.c | 25 ++++++++++++++++++++-----
>  1 file changed, 20 insertions(+), 5 deletions(-)
>
> diff --git a/lib/vhost/iotlb.c b/lib/vhost/iotlb.c
> index e8f1cb661e..d919f74704 100644
> --- a/lib/vhost/iotlb.c
> +++ b/lib/vhost/iotlb.c
> @@ -23,6 +23,23 @@ struct vhost_iotlb_entry {
>
>  #define IOTLB_CACHE_SIZE 2048
>
> +static bool
> +vhost_user_iotlb_share_page(struct vhost_iotlb_entry *a, struct vhost_iotlb_entry *b,
> +               uint64_t align)
> +{
> +       uint64_t a_end, b_start;
> +
> +       if (a == NULL || b == NULL)
> +               return false;
> +
> +       /* Assumes entry a lower than entry b */
> +       RTE_ASSERT(a->uaddr < b->uaddr);
> +       a_end = RTE_ALIGN_CEIL(a->uaddr + a->size, align);
> +       b_start = RTE_ALIGN_FLOOR(b->uaddr, align);
> +
> +       return a_end > b_start;

This may be a very minor detail, but it might improve readability if
the a_end calculation used addr + size - 1 and the return conditional
used >=.
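
(Just to show what I mean, a rough sketch, not from the patch; for
power-of-two alignments both spellings return the same result:)

#include <stdbool.h>
#include <stdint.h>
#include <rte_common.h> /* RTE_ALIGN_CEIL / RTE_ALIGN_FLOOR */

/* Patch as posted: first byte of the page *after* entry a, strict '>'. */
static bool share_page_ceil(uint64_t a_uaddr, uint64_t a_size,
		uint64_t b_uaddr, uint64_t align)
{
	return RTE_ALIGN_CEIL(a_uaddr + a_size, align) >
			RTE_ALIGN_FLOOR(b_uaddr, align);
}

/* Suggested spelling: page of entry a's *last* byte, '>='. */
static bool share_page_last_byte(uint64_t a_uaddr, uint64_t a_size,
		uint64_t b_uaddr, uint64_t align)
{
	return RTE_ALIGN_FLOOR(a_uaddr + a_size - 1, align) >=
			RTE_ALIGN_FLOOR(b_uaddr, align);
}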


Cheers,
M


> +}
> +
>  static void
>  vhost_user_iotlb_set_dump(struct virtio_net *dev, struct vhost_iotlb_entry *node)
>  {
> @@ -37,16 +54,14 @@ static void
>  vhost_user_iotlb_clear_dump(struct virtio_net *dev, struct vhost_iotlb_entry *node,
>                 struct vhost_iotlb_entry *prev, struct vhost_iotlb_entry *next)
>  {
> -       uint64_t align, mask;
> +       uint64_t align;
>
>         align = hua_to_alignment(dev->mem, (void *)(uintptr_t)node->uaddr);
> -       mask = ~(align - 1);
>
>         /* Don't disable coredump if the previous node is in the same page */
> -       if (prev == NULL || (node->uaddr & mask) != ((prev->uaddr + prev->size - 1) & mask)) {
> +       if (!vhost_user_iotlb_share_page(prev, node, align)) {
>                 /* Don't disable coredump if the next node is in the same page */
> -               if (next == NULL ||
> -                               ((node->uaddr + node->size - 1) & mask) != (next->uaddr & mask))
> +               if (!vhost_user_iotlb_share_page(node, next, align))
>                         mem_set_dump((void *)(uintptr_t)node->uaddr, node->size, false, align);
>         }
>  }
> --
> 2.39.2
>


^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [RFC 05/27] vhost: add helper for IOTLB entries shared page check
  2023-04-17 19:39   ` Mike Pattrick
@ 2023-04-19  9:35     ` Maxime Coquelin
  2023-04-19 14:52       ` Mike Pattrick
  0 siblings, 1 reply; 79+ messages in thread
From: Maxime Coquelin @ 2023-04-19  9:35 UTC (permalink / raw)
  To: Mike Pattrick
  Cc: dev, david.marchand, chenbo.xia, fbl, jasowang, cunming.liang,
	xieyongji, echaudro, eperezma, amorenoz

Hi Mike,

On 4/17/23 21:39, Mike Pattrick wrote:
> Hi Maxime,
> 
> On Fri, Mar 31, 2023 at 11:43 AM Maxime Coquelin
> <maxime.coquelin@redhat.com> wrote:
>>
>> This patch introduces a helper to check whether two IOTLB
>> entries share a page.
>>
>> Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
>> ---
>>   lib/vhost/iotlb.c | 25 ++++++++++++++++++++-----
>>   1 file changed, 20 insertions(+), 5 deletions(-)
>>
>> diff --git a/lib/vhost/iotlb.c b/lib/vhost/iotlb.c
>> index e8f1cb661e..d919f74704 100644
>> --- a/lib/vhost/iotlb.c
>> +++ b/lib/vhost/iotlb.c
>> @@ -23,6 +23,23 @@ struct vhost_iotlb_entry {
>>
>>   #define IOTLB_CACHE_SIZE 2048
>>
>> +static bool
>> +vhost_user_iotlb_share_page(struct vhost_iotlb_entry *a, struct vhost_iotlb_entry *b,
>> +               uint64_t align)
>> +{
>> +       uint64_t a_end, b_start;
>> +
>> +       if (a == NULL || b == NULL)
>> +               return false;
>> +
>> +       /* Assumes entry a lower than entry b */
>> +       RTE_ASSERT(a->uaddr < b->uaddr);
>> +       a_end = RTE_ALIGN_CEIL(a->uaddr + a->size, align);
>> +       b_start = RTE_ALIGN_FLOOR(b->uaddr, align);
>> +
>> +       return a_end > b_start;
> 
> This may be a very minor detail, but it might improve readability if
> the a_end calculation used addr + size - 1 and the return conditional
> used >=.

That's actually how I initially implemented it, but after discussing
with David, we agreed to keep it this way for consistency with
vhost_user_iotlb_clear_dump().

Regards,
Maxime

> 
> Cheers,
> M
> 
> 
>> +}
>> +
>>   static void
>>   vhost_user_iotlb_set_dump(struct virtio_net *dev, struct vhost_iotlb_entry *node)
>>   {
>> @@ -37,16 +54,14 @@ static void
>>   vhost_user_iotlb_clear_dump(struct virtio_net *dev, struct vhost_iotlb_entry *node,
>>                  struct vhost_iotlb_entry *prev, struct vhost_iotlb_entry *next)
>>   {
>> -       uint64_t align, mask;
>> +       uint64_t align;
>>
>>          align = hua_to_alignment(dev->mem, (void *)(uintptr_t)node->uaddr);
>> -       mask = ~(align - 1);
>>
>>          /* Don't disable coredump if the previous node is in the same page */
>> -       if (prev == NULL || (node->uaddr & mask) != ((prev->uaddr + prev->size - 1) & mask)) {
>> +       if (!vhost_user_iotlb_share_page(prev, node, align)) {
>>                  /* Don't disable coredump if the next node is in the same page */
>> -               if (next == NULL ||
>> -                               ((node->uaddr + node->size - 1) & mask) != (next->uaddr & mask))
>> +               if (!vhost_user_iotlb_share_page(node, next, align))
>>                          mem_set_dump((void *)(uintptr_t)node->uaddr, node->size, false, align);
>>          }
>>   }
>> --
>> 2.39.2
>>
> 


^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [RFC 05/27] vhost: add helper for IOTLB entries shared page check
  2023-04-19  9:35     ` Maxime Coquelin
@ 2023-04-19 14:52       ` Mike Pattrick
  0 siblings, 0 replies; 79+ messages in thread
From: Mike Pattrick @ 2023-04-19 14:52 UTC (permalink / raw)
  To: Maxime Coquelin
  Cc: dev, david.marchand, chenbo.xia, fbl, jasowang, cunming.liang,
	xieyongji, echaudro, eperezma, amorenoz

On Wed, Apr 19, 2023 at 5:35 AM Maxime Coquelin
<maxime.coquelin@redhat.com> wrote:
>
> Hi Mike,
>
> On 4/17/23 21:39, Mike Pattrick wrote:
> > Hi Maxime,
> >
> > On Fri, Mar 31, 2023 at 11:43 AM Maxime Coquelin
> > <maxime.coquelin@redhat.com> wrote:
> >>
> >> This patch introduces a helper to check whether two IOTLB
> >> entries share a page.
> >>
> >> Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
> >> ---
> >>   lib/vhost/iotlb.c | 25 ++++++++++++++++++++-----
> >>   1 file changed, 20 insertions(+), 5 deletions(-)
> >>
> >> diff --git a/lib/vhost/iotlb.c b/lib/vhost/iotlb.c
> >> index e8f1cb661e..d919f74704 100644
> >> --- a/lib/vhost/iotlb.c
> >> +++ b/lib/vhost/iotlb.c
> >> @@ -23,6 +23,23 @@ struct vhost_iotlb_entry {
> >>
> >>   #define IOTLB_CACHE_SIZE 2048
> >>
> >> +static bool
> >> +vhost_user_iotlb_share_page(struct vhost_iotlb_entry *a, struct vhost_iotlb_entry *b,
> >> +               uint64_t align)
> >> +{
> >> +       uint64_t a_end, b_start;
> >> +
> >> +       if (a == NULL || b == NULL)
> >> +               return false;
> >> +
> >> +       /* Assumes entry a lower than entry b */
> >> +       RTE_ASSERT(a->uaddr < b->uaddr);
> >> +       a_end = RTE_ALIGN_CEIL(a->uaddr + a->size, align);
> >> +       b_start = RTE_ALIGN_FLOOR(b->uaddr, align);
> >> +
> >> +       return a_end > b_start;
> >
> > This may be a very minor detail, but it might improve readability if
> > the a_end calculation used addr + size - 1 and the return conditional
> > used >=.
>
> That's actually how I initially implemented it, but discussing with
> David we agreed it would be better that way for consistency with
> vhost_user_iotlb_clear_dump().

I can get behind consistency.

Acked-by: Mike Pattrick <mkp@redhat.com>

>
> Regards,
> Maxime
>
> >
> > Cheers,
> > M
> >
> >
> >> +}
> >> +
> >>   static void
> >>   vhost_user_iotlb_set_dump(struct virtio_net *dev, struct vhost_iotlb_entry *node)
> >>   {
> >> @@ -37,16 +54,14 @@ static void
> >>   vhost_user_iotlb_clear_dump(struct virtio_net *dev, struct vhost_iotlb_entry *node,
> >>                  struct vhost_iotlb_entry *prev, struct vhost_iotlb_entry *next)
> >>   {
> >> -       uint64_t align, mask;
> >> +       uint64_t align;
> >>
> >>          align = hua_to_alignment(dev->mem, (void *)(uintptr_t)node->uaddr);
> >> -       mask = ~(align - 1);
> >>
> >>          /* Don't disable coredump if the previous node is in the same page */
> >> -       if (prev == NULL || (node->uaddr & mask) != ((prev->uaddr + prev->size - 1) & mask)) {
> >> +       if (!vhost_user_iotlb_share_page(prev, node, align)) {
> >>                  /* Don't disable coredump if the next node is in the same page */
> >> -               if (next == NULL ||
> >> -                               ((node->uaddr + node->size - 1) & mask) != (next->uaddr & mask))
> >> +               if (!vhost_user_iotlb_share_page(node, next, align))
> >>                          mem_set_dump((void *)(uintptr_t)node->uaddr, node->size, false, align);
> >>          }
> >>   }
> >> --
> >> 2.39.2
> >>
> >
>


^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [RFC 06/27] vhost: don't dump unneeded pages with IOTLB
  2023-03-31 15:42 ` [RFC 06/27] vhost: don't dump unneeded pages with IOTLB Maxime Coquelin
@ 2023-04-20 17:11   ` Mike Pattrick
  2023-04-24  3:00   ` Xia, Chenbo
  1 sibling, 0 replies; 79+ messages in thread
From: Mike Pattrick @ 2023-04-20 17:11 UTC (permalink / raw)
  To: Maxime Coquelin
  Cc: dev, david.marchand, chenbo.xia, fbl, jasowang, cunming.liang,
	xieyongji, echaudro, eperezma, amorenoz, stable

On Fri, Mar 31, 2023 at 11:43 AM Maxime Coquelin
<maxime.coquelin@redhat.com> wrote:
>
> On IOTLB entry removal, previous fixes took care of not
> marking pages shared with other IOTLB entries as DONTDUMP.
>
> However, if an IOTLB entry is spanned on multiple pages,
> the other pages were kept as DODUMP while they might not
> have been shared with other entries, increasing needlessly
> the coredump size.
>
> This patch addresses this issue by excluding only the
> shared pages from madvise's DONTDUMP.
>
> Fixes: dea092d0addb ("vhost: fix madvise arguments alignment")
> Cc: stable@dpdk.org
>
> Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>

Looks good to me.

Acked-by: Mike Pattrick <mkp@redhat.com>


^ permalink raw reply	[flat|nested] 79+ messages in thread

* RE: [RFC 01/27] vhost: fix missing guest notif stat increment
  2023-03-31 15:42 ` [RFC 01/27] vhost: fix missing guest notif stat increment Maxime Coquelin
@ 2023-04-24  2:57   ` Xia, Chenbo
  0 siblings, 0 replies; 79+ messages in thread
From: Xia, Chenbo @ 2023-04-24  2:57 UTC (permalink / raw)
  To: Maxime Coquelin, dev, david.marchand, mkp, fbl, jasowang, Liang,
	Cunming, Xie, Yongji, echaudro, eperezma, amorenoz
  Cc: stable

> -----Original Message-----
> From: Maxime Coquelin <maxime.coquelin@redhat.com>
> Sent: Friday, March 31, 2023 11:43 PM
> To: dev@dpdk.org; david.marchand@redhat.com; Xia, Chenbo
> <chenbo.xia@intel.com>; mkp@redhat.com; fbl@redhat.com;
> jasowang@redhat.com; Liang, Cunming <cunming.liang@intel.com>; Xie, Yongji
> <xieyongji@bytedance.com>; echaudro@redhat.com; eperezma@redhat.com;
> amorenoz@redhat.com
> Cc: Maxime Coquelin <maxime.coquelin@redhat.com>; stable@dpdk.org
> Subject: [RFC 01/27] vhost: fix missing guest notif stat increment
> 
> Guest notification counter was only incremented for split
> ring, this patch adds it also for packed ring.
> 
> Fixes: 1ea74efd7fa4 ("vhost: add statistics for guest notification")
> Cc: stable@dpdk.org
> 
> Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
> ---
>  lib/vhost/vhost.h | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/lib/vhost/vhost.h b/lib/vhost/vhost.h
> index 8fdab13c70..8554ab4002 100644
> --- a/lib/vhost/vhost.h
> +++ b/lib/vhost/vhost.h
> @@ -973,6 +973,8 @@ vhost_vring_call_packed(struct virtio_net *dev, struct
> vhost_virtqueue *vq)
>  kick:
>  	if (kick) {
>  		eventfd_write(vq->callfd, (eventfd_t)1);
> +		if (dev->flags & VIRTIO_DEV_STATS_ENABLED)
> +			vq->stats.guest_notifications++;
>  		if (dev->notify_ops->guest_notified)
>  			dev->notify_ops->guest_notified(dev->vid);
>  	}
> --
> 2.39.2

Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>

^ permalink raw reply	[flat|nested] 79+ messages in thread

* RE: [RFC 02/27] vhost: fix invalid call FD handling
  2023-03-31 15:42 ` [RFC 02/27] vhost: fix invalid call FD handling Maxime Coquelin
@ 2023-04-24  2:58   ` Xia, Chenbo
  0 siblings, 0 replies; 79+ messages in thread
From: Xia, Chenbo @ 2023-04-24  2:58 UTC (permalink / raw)
  To: Maxime Coquelin, dev, david.marchand, mkp, fbl, jasowang, Liang,
	Cunming, Xie, Yongji, echaudro, eperezma, amorenoz
  Cc: stable

> -----Original Message-----
> From: Maxime Coquelin <maxime.coquelin@redhat.com>
> Sent: Friday, March 31, 2023 11:43 PM
> To: dev@dpdk.org; david.marchand@redhat.com; Xia, Chenbo
> <chenbo.xia@intel.com>; mkp@redhat.com; fbl@redhat.com;
> jasowang@redhat.com; Liang, Cunming <cunming.liang@intel.com>; Xie, Yongji
> <xieyongji@bytedance.com>; echaudro@redhat.com; eperezma@redhat.com;
> amorenoz@redhat.com
> Cc: Maxime Coquelin <maxime.coquelin@redhat.com>; stable@dpdk.org
> Subject: [RFC 02/27] vhost: fix invalid call FD handling
> 
> This patch fixes cases where IRQ injection is tried while
> the call FD is not valid, which should not happen.
> 
> Fixes: b1cce26af1dc ("vhost: add notification for packed ring")
> Fixes: e37ff954405a ("vhost: support virtqueue interrupt/notification
> suppression")
> Cc: stable@dpdk.org
> 
> Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
> ---
>  lib/vhost/vhost.h | 8 ++++----
>  1 file changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/lib/vhost/vhost.h b/lib/vhost/vhost.h
> index 8554ab4002..40863f7bfd 100644
> --- a/lib/vhost/vhost.h
> +++ b/lib/vhost/vhost.h
> @@ -902,9 +902,9 @@ vhost_vring_call_split(struct virtio_net *dev, struct
> vhost_virtqueue *vq)
>  			"%s: used_event_idx=%d, old=%d, new=%d\n",
>  			__func__, vhost_used_event(vq), old, new);
> 
> -		if ((vhost_need_event(vhost_used_event(vq), new, old) &&
> -					(vq->callfd >= 0)) ||
> -				unlikely(!signalled_used_valid)) {
> +		if ((vhost_need_event(vhost_used_event(vq), new, old) ||
> +					unlikely(!signalled_used_valid)) &&
> +				vq->callfd >= 0) {
>  			eventfd_write(vq->callfd, (eventfd_t) 1);
>  			if (dev->flags & VIRTIO_DEV_STATS_ENABLED)
>  				vq->stats.guest_notifications++;
> @@ -971,7 +971,7 @@ vhost_vring_call_packed(struct virtio_net *dev, struct
> vhost_virtqueue *vq)
>  	if (vhost_need_event(off, new, old))
>  		kick = true;
>  kick:
> -	if (kick) {
> +	if (kick && vq->callfd >= 0) {
>  		eventfd_write(vq->callfd, (eventfd_t)1);
>  		if (dev->flags & VIRTIO_DEV_STATS_ENABLED)
>  			vq->stats.guest_notifications++;
> --
> 2.39.2

Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>

^ permalink raw reply	[flat|nested] 79+ messages in thread

* RE: [RFC 03/27] vhost: fix IOTLB entries overlap check with previous entry
  2023-03-31 15:42 ` [RFC 03/27] vhost: fix IOTLB entries overlap check with previous entry Maxime Coquelin
  2023-04-17 19:15   ` Mike Pattrick
@ 2023-04-24  2:58   ` Xia, Chenbo
  1 sibling, 0 replies; 79+ messages in thread
From: Xia, Chenbo @ 2023-04-24  2:58 UTC (permalink / raw)
  To: Maxime Coquelin, dev, david.marchand, mkp, fbl, jasowang, Liang,
	Cunming, Xie, Yongji, echaudro, eperezma, amorenoz
  Cc: stable

> -----Original Message-----
> From: Maxime Coquelin <maxime.coquelin@redhat.com>
> Sent: Friday, March 31, 2023 11:43 PM
> To: dev@dpdk.org; david.marchand@redhat.com; Xia, Chenbo
> <chenbo.xia@intel.com>; mkp@redhat.com; fbl@redhat.com;
> jasowang@redhat.com; Liang, Cunming <cunming.liang@intel.com>; Xie, Yongji
> <xieyongji@bytedance.com>; echaudro@redhat.com; eperezma@redhat.com;
> amorenoz@redhat.com
> Cc: Maxime Coquelin <maxime.coquelin@redhat.com>; stable@dpdk.org
> Subject: [RFC 03/27] vhost: fix IOTLB entries overlap check with previous
> entry
> 
> Commit 22b6d0ac691a ("vhost: fix madvise IOTLB entries pages overlap
> check")
> fixed the check to ensure the entry to be removed does not
> overlap with the next one in the IOTLB cache before marking
> it as DONTDUMP with madvise(). This is not enough, because
> the same issue is present when comparing with the previous
> entry in the cache, where the end address of the previous
> entry should be used, not the start one.
> 
> Fixes: dea092d0addb ("vhost: fix madvise arguments alignment")
> Cc: stable@dpdk.org
> 
> Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
> ---
>  lib/vhost/iotlb.c | 8 ++++----
>  1 file changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/lib/vhost/iotlb.c b/lib/vhost/iotlb.c
> index 3f45bc6061..870c8acb88 100644
> --- a/lib/vhost/iotlb.c
> +++ b/lib/vhost/iotlb.c
> @@ -178,8 +178,8 @@ vhost_user_iotlb_cache_random_evict(struct virtio_net
> *dev, struct vhost_virtque
>  			mask = ~(alignment - 1);
> 
>  			/* Don't disable coredump if the previous node is in the
> same page */
> -			if (prev_node == NULL ||
> -					(node->uaddr & mask) != (prev_node->uaddr &
> mask)) {
> +			if (prev_node == NULL || (node->uaddr & mask) !=
> +					((prev_node->uaddr + prev_node->size - 1) &
> mask)) {
>  				next_node = RTE_TAILQ_NEXT(node, next);
>  				/* Don't disable coredump if the next node is in
> the same page */
>  				if (next_node == NULL || ((node->uaddr + node-
> >size - 1) & mask) !=
> @@ -283,8 +283,8 @@ vhost_user_iotlb_cache_remove(struct virtio_net *dev,
> struct vhost_virtqueue *vq
>  			mask = ~(alignment-1);
> 
>  			/* Don't disable coredump if the previous node is in the
> same page */
> -			if (prev_node == NULL ||
> -					(node->uaddr & mask) != (prev_node->uaddr &
> mask)) {
> +			if (prev_node == NULL || (node->uaddr & mask) !=
> +					((prev_node->uaddr + prev_node->size - 1) &
> mask)) {
>  				next_node = RTE_TAILQ_NEXT(node, next);
>  				/* Don't disable coredump if the next node is in
> the same page */
>  				if (next_node == NULL || ((node->uaddr + node-
> >size - 1) & mask) !=
> --
> 2.39.2

Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>

^ permalink raw reply	[flat|nested] 79+ messages in thread

* RE: [RFC 04/27] vhost: add helper of IOTLB entries coredump
  2023-03-31 15:42 ` [RFC 04/27] vhost: add helper of IOTLB entries coredump Maxime Coquelin
@ 2023-04-24  2:59   ` Xia, Chenbo
  0 siblings, 0 replies; 79+ messages in thread
From: Xia, Chenbo @ 2023-04-24  2:59 UTC (permalink / raw)
  To: Maxime Coquelin, dev, david.marchand, mkp, fbl, jasowang, Liang,
	Cunming, Xie, Yongji, echaudro, eperezma, amorenoz

> -----Original Message-----
> From: Maxime Coquelin <maxime.coquelin@redhat.com>
> Sent: Friday, March 31, 2023 11:43 PM
> To: dev@dpdk.org; david.marchand@redhat.com; Xia, Chenbo
> <chenbo.xia@intel.com>; mkp@redhat.com; fbl@redhat.com;
> jasowang@redhat.com; Liang, Cunming <cunming.liang@intel.com>; Xie, Yongji
> <xieyongji@bytedance.com>; echaudro@redhat.com; eperezma@redhat.com;
> amorenoz@redhat.com
> Cc: Maxime Coquelin <maxime.coquelin@redhat.com>
> Subject: [RFC 04/27] vhost: add helper of IOTLB entries coredump
> 
> This patch reworks IOTLB code to extract madvise-related
> bits into dedicated helper. This refactoring improves code
> sharing.
> 
> Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
> ---
>  lib/vhost/iotlb.c | 77 +++++++++++++++++++++++++----------------------
>  1 file changed, 41 insertions(+), 36 deletions(-)
> 
> diff --git a/lib/vhost/iotlb.c b/lib/vhost/iotlb.c
> index 870c8acb88..e8f1cb661e 100644
> --- a/lib/vhost/iotlb.c
> +++ b/lib/vhost/iotlb.c
> @@ -23,6 +23,34 @@ struct vhost_iotlb_entry {
> 
>  #define IOTLB_CACHE_SIZE 2048
> 
> +static void
> +vhost_user_iotlb_set_dump(struct virtio_net *dev, struct
> vhost_iotlb_entry *node)
> +{
> +	uint64_t align;
> +
> +	align = hua_to_alignment(dev->mem, (void *)(uintptr_t)node->uaddr);
> +
> +	mem_set_dump((void *)(uintptr_t)node->uaddr, node->size, false,
> align);
> +}
> +
> +static void
> +vhost_user_iotlb_clear_dump(struct virtio_net *dev, struct
> vhost_iotlb_entry *node,
> +		struct vhost_iotlb_entry *prev, struct vhost_iotlb_entry *next)
> +{
> +	uint64_t align, mask;
> +
> +	align = hua_to_alignment(dev->mem, (void *)(uintptr_t)node->uaddr);
> +	mask = ~(align - 1);
> +
> +	/* Don't disable coredump if the previous node is in the same page
> */
> +	if (prev == NULL || (node->uaddr & mask) != ((prev->uaddr + prev-
> >size - 1) & mask)) {
> +		/* Don't disable coredump if the next node is in the same page
> */
> +		if (next == NULL ||
> +				((node->uaddr + node->size - 1) & mask) != (next-
> >uaddr & mask))
> +			mem_set_dump((void *)(uintptr_t)node->uaddr, node->size,
> false, align);
> +	}
> +}
> +
>  static struct vhost_iotlb_entry *
>  vhost_user_iotlb_pool_get(struct vhost_virtqueue *vq)
>  {
> @@ -149,8 +177,8 @@ vhost_user_iotlb_cache_remove_all(struct virtio_net
> *dev, struct vhost_virtqueue
>  	rte_rwlock_write_lock(&vq->iotlb_lock);
> 
>  	RTE_TAILQ_FOREACH_SAFE(node, &vq->iotlb_list, next, temp_node) {
> -		mem_set_dump((void *)(uintptr_t)node->uaddr, node->size, false,
> -			hua_to_alignment(dev->mem, (void *)(uintptr_t)node-
> >uaddr));
> +		vhost_user_iotlb_set_dump(dev, node);
> +
>  		TAILQ_REMOVE(&vq->iotlb_list, node, next);
>  		vhost_user_iotlb_pool_put(vq, node);
>  	}
> @@ -164,7 +192,6 @@ static void
>  vhost_user_iotlb_cache_random_evict(struct virtio_net *dev, struct
> vhost_virtqueue *vq)
>  {
>  	struct vhost_iotlb_entry *node, *temp_node, *prev_node = NULL;
> -	uint64_t alignment, mask;
>  	int entry_idx;
> 
>  	rte_rwlock_write_lock(&vq->iotlb_lock);
> @@ -173,20 +200,10 @@ vhost_user_iotlb_cache_random_evict(struct
> virtio_net *dev, struct vhost_virtque
> 
>  	RTE_TAILQ_FOREACH_SAFE(node, &vq->iotlb_list, next, temp_node) {
>  		if (!entry_idx) {
> -			struct vhost_iotlb_entry *next_node;
> -			alignment = hua_to_alignment(dev->mem, (void
> *)(uintptr_t)node->uaddr);
> -			mask = ~(alignment - 1);
> -
> -			/* Don't disable coredump if the previous node is in the
> same page */
> -			if (prev_node == NULL || (node->uaddr & mask) !=
> -					((prev_node->uaddr + prev_node->size - 1) &
> mask)) {
> -				next_node = RTE_TAILQ_NEXT(node, next);
> -				/* Don't disable coredump if the next node is in
> the same page */
> -				if (next_node == NULL || ((node->uaddr + node-
> >size - 1) & mask) !=
> -						(next_node->uaddr & mask))
> -					mem_set_dump((void *)(uintptr_t)node->uaddr,
> node->size,
> -							false, alignment);
> -			}
> +			struct vhost_iotlb_entry *next_node =
> RTE_TAILQ_NEXT(node, next);
> +
> +			vhost_user_iotlb_clear_dump(dev, node, prev_node,
> next_node);
> +
>  			TAILQ_REMOVE(&vq->iotlb_list, node, next);
>  			vhost_user_iotlb_pool_put(vq, node);
>  			vq->iotlb_cache_nr--;
> @@ -240,16 +257,16 @@ vhost_user_iotlb_cache_insert(struct virtio_net *dev,
> struct vhost_virtqueue *vq
>  			vhost_user_iotlb_pool_put(vq, new_node);
>  			goto unlock;
>  		} else if (node->iova > new_node->iova) {
> -			mem_set_dump((void *)(uintptr_t)new_node->uaddr,
> new_node->size, true,
> -				hua_to_alignment(dev->mem, (void
> *)(uintptr_t)new_node->uaddr));
> +			vhost_user_iotlb_set_dump(dev, new_node);
> +
>  			TAILQ_INSERT_BEFORE(node, new_node, next);
>  			vq->iotlb_cache_nr++;
>  			goto unlock;
>  		}
>  	}
> 
> -	mem_set_dump((void *)(uintptr_t)new_node->uaddr, new_node->size,
> true,
> -		hua_to_alignment(dev->mem, (void *)(uintptr_t)new_node-
> >uaddr));
> +	vhost_user_iotlb_set_dump(dev, new_node);
> +
>  	TAILQ_INSERT_TAIL(&vq->iotlb_list, new_node, next);
>  	vq->iotlb_cache_nr++;
> 
> @@ -265,7 +282,6 @@ vhost_user_iotlb_cache_remove(struct virtio_net *dev,
> struct vhost_virtqueue *vq
>  					uint64_t iova, uint64_t size)
>  {
>  	struct vhost_iotlb_entry *node, *temp_node, *prev_node = NULL;
> -	uint64_t alignment, mask;
> 
>  	if (unlikely(!size))
>  		return;
> @@ -278,20 +294,9 @@ vhost_user_iotlb_cache_remove(struct virtio_net *dev,
> struct vhost_virtqueue *vq
>  			break;
> 
>  		if (iova < node->iova + node->size) {
> -			struct vhost_iotlb_entry *next_node;
> -			alignment = hua_to_alignment(dev->mem, (void
> *)(uintptr_t)node->uaddr);
> -			mask = ~(alignment-1);
> -
> -			/* Don't disable coredump if the previous node is in the
> same page */
> -			if (prev_node == NULL || (node->uaddr & mask) !=
> -					((prev_node->uaddr + prev_node->size - 1) &
> mask)) {
> -				next_node = RTE_TAILQ_NEXT(node, next);
> -				/* Don't disable coredump if the next node is in
> the same page */
> -				if (next_node == NULL || ((node->uaddr + node-
> >size - 1) & mask) !=
> -						(next_node->uaddr & mask))
> -					mem_set_dump((void *)(uintptr_t)node->uaddr,
> node->size,
> -							false, alignment);
> -			}
> +			struct vhost_iotlb_entry *next_node =
> RTE_TAILQ_NEXT(node, next);
> +
> +			vhost_user_iotlb_clear_dump(dev, node, prev_node,
> next_node);
> 
>  			TAILQ_REMOVE(&vq->iotlb_list, node, next);
>  			vhost_user_iotlb_pool_put(vq, node);
> --
> 2.39.2

Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>

^ permalink raw reply	[flat|nested] 79+ messages in thread
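
For reference, a minimal standalone sketch of the page-sharing test that the new vhost_user_iotlb_clear_dump() helper above performs on a node's neighbours. The mask arithmetic mirrors the quoted patch; the harness and the example addresses are illustrative only:

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* True if the page holding the last byte of [a_start, a_start + a_size)
 * is also the page holding b_start, for a power-of-two page size 'align'.
 */
static bool
last_page_shared(uint64_t a_start, uint64_t a_size, uint64_t b_start, uint64_t align)
{
	uint64_t mask = ~(align - 1);

	return ((a_start + a_size - 1) & mask) == (b_start & mask);
}

int
main(void)
{
	uint64_t align = 4096;

	/* Entry A covers [0x1000, 0x2100), entry B starts at 0x2800: both
	 * touch the page at 0x2000, so that page must stay dumpable.
	 */
	printf("shared: %d\n", last_page_shared(0x1000, 0x1100, 0x2800, align));

	/* Entry B starting at 0x3000 is on its own page: safe to clear. */
	printf("shared: %d\n", last_page_shared(0x1000, 0x1100, 0x3000, align));

	return 0;
}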

* RE: [RFC 05/27] vhost: add helper for IOTLB entries shared page check
  2023-03-31 15:42 ` [RFC 05/27] vhost: add helper for IOTLB entries shared page check Maxime Coquelin
  2023-04-17 19:39   ` Mike Pattrick
@ 2023-04-24  2:59   ` Xia, Chenbo
  1 sibling, 0 replies; 79+ messages in thread
From: Xia, Chenbo @ 2023-04-24  2:59 UTC (permalink / raw)
  To: Maxime Coquelin, dev, david.marchand, mkp, fbl, jasowang, Liang,
	Cunming, Xie, Yongji, echaudro, eperezma, amorenoz

> -----Original Message-----
> From: Maxime Coquelin <maxime.coquelin@redhat.com>
> Sent: Friday, March 31, 2023 11:43 PM
> To: dev@dpdk.org; david.marchand@redhat.com; Xia, Chenbo
> <chenbo.xia@intel.com>; mkp@redhat.com; fbl@redhat.com;
> jasowang@redhat.com; Liang, Cunming <cunming.liang@intel.com>; Xie, Yongji
> <xieyongji@bytedance.com>; echaudro@redhat.com; eperezma@redhat.com;
> amorenoz@redhat.com
> Cc: Maxime Coquelin <maxime.coquelin@redhat.com>
> Subject: [RFC 05/27] vhost: add helper for IOTLB entries shared page check
> 
> This patch introduces a helper to check whether two IOTLB
> entries share a page.
> 
> Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
> ---
>  lib/vhost/iotlb.c | 25 ++++++++++++++++++++-----
>  1 file changed, 20 insertions(+), 5 deletions(-)
> 
> diff --git a/lib/vhost/iotlb.c b/lib/vhost/iotlb.c
> index e8f1cb661e..d919f74704 100644
> --- a/lib/vhost/iotlb.c
> +++ b/lib/vhost/iotlb.c
> @@ -23,6 +23,23 @@ struct vhost_iotlb_entry {
> 
>  #define IOTLB_CACHE_SIZE 2048
> 
> +static bool
> +vhost_user_iotlb_share_page(struct vhost_iotlb_entry *a, struct
> vhost_iotlb_entry *b,
> +		uint64_t align)
> +{
> +	uint64_t a_end, b_start;
> +
> +	if (a == NULL || b == NULL)
> +		return false;
> +
> +	/* Assumes entry a lower than entry b */
> +	RTE_ASSERT(a->uaddr < b->uaddr);
> +	a_end = RTE_ALIGN_CEIL(a->uaddr + a->size, align);
> +	b_start = RTE_ALIGN_FLOOR(b->uaddr, align);
> +
> +	return a_end > b_start;
> +}
> +
>  static void
>  vhost_user_iotlb_set_dump(struct virtio_net *dev, struct
> vhost_iotlb_entry *node)
>  {
> @@ -37,16 +54,14 @@ static void
>  vhost_user_iotlb_clear_dump(struct virtio_net *dev, struct
> vhost_iotlb_entry *node,
>  		struct vhost_iotlb_entry *prev, struct vhost_iotlb_entry *next)
>  {
> -	uint64_t align, mask;
> +	uint64_t align;
> 
>  	align = hua_to_alignment(dev->mem, (void *)(uintptr_t)node->uaddr);
> -	mask = ~(align - 1);
> 
>  	/* Don't disable coredump if the previous node is in the same page
> */
> -	if (prev == NULL || (node->uaddr & mask) != ((prev->uaddr + prev-
> >size - 1) & mask)) {
> +	if (!vhost_user_iotlb_share_page(prev, node, align)) {
>  		/* Don't disable coredump if the next node is in the same page
> */
> -		if (next == NULL ||
> -				((node->uaddr + node->size - 1) & mask) != (next-
> >uaddr & mask))
> +		if (!vhost_user_iotlb_share_page(node, next, align))
>  			mem_set_dump((void *)(uintptr_t)node->uaddr, node->size,
> false, align);
>  	}
>  }
> --
> 2.39.2

Reviewed-by: Chenbo Xia <chenbo.xia@intel.com> 

^ permalink raw reply	[flat|nested] 79+ messages in thread
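
The helper introduced above can be exercised standalone. The sketch below substitutes local macros for RTE_ALIGN_CEIL/RTE_ALIGN_FLOOR (align must be a power of two) and is equivalent to the open-coded mask test it replaces; the entry layout and values are illustrative:

#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define ALIGN_FLOOR(v, a) ((v) & ~((uint64_t)(a) - 1))
#define ALIGN_CEIL(v, a)  ALIGN_FLOOR((v) + (a) - 1, (a))

struct entry {
	uint64_t uaddr;
	uint64_t size;
};

/* Entries a and b (a below b) share a page if the page-rounded end of a
 * extends past the page-rounded start of b.
 */
static bool
share_page(const struct entry *a, const struct entry *b, uint64_t align)
{
	if (a == NULL || b == NULL)
		return false;

	return ALIGN_CEIL(a->uaddr + a->size, align) > ALIGN_FLOOR(b->uaddr, align);
}

int
main(void)
{
	struct entry a = { .uaddr = 0x1000, .size = 0x1100 }; /* ends mid-page  */
	struct entry b = { .uaddr = 0x2800, .size = 0x1000 }; /* same page as a */
	struct entry c = { .uaddr = 0x3000, .size = 0x1000 }; /* next page      */

	assert(share_page(&a, &b, 4096));
	assert(!share_page(&a, &c, 4096));

	return 0;
}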

* RE: [RFC 06/27] vhost: don't dump unneeded pages with IOTLB
  2023-03-31 15:42 ` [RFC 06/27] vhost: don't dump unneeded pages with IOTLB Maxime Coquelin
  2023-04-20 17:11   ` Mike Pattrick
@ 2023-04-24  3:00   ` Xia, Chenbo
  1 sibling, 0 replies; 79+ messages in thread
From: Xia, Chenbo @ 2023-04-24  3:00 UTC (permalink / raw)
  To: Maxime Coquelin, dev, david.marchand, mkp, fbl, jasowang, Liang,
	Cunming, Xie, Yongji, echaudro, eperezma, amorenoz
  Cc: stable

> -----Original Message-----
> From: Maxime Coquelin <maxime.coquelin@redhat.com>
> Sent: Friday, March 31, 2023 11:43 PM
> To: dev@dpdk.org; david.marchand@redhat.com; Xia, Chenbo
> <chenbo.xia@intel.com>; mkp@redhat.com; fbl@redhat.com;
> jasowang@redhat.com; Liang, Cunming <cunming.liang@intel.com>; Xie, Yongji
> <xieyongji@bytedance.com>; echaudro@redhat.com; eperezma@redhat.com;
> amorenoz@redhat.com
> Cc: Maxime Coquelin <maxime.coquelin@redhat.com>; stable@dpdk.org
> Subject: [RFC 06/27] vhost: don't dump unneeded pages with IOTLB
> 
> On IOTLB entry removal, previous fixes took care of not
> marking pages shared with other IOTLB entries as DONTDUMP.
> 
> However, if an IOTLB entry spans multiple pages, the other
> pages were kept as DODUMP even though they might not be
> shared with other entries, needlessly increasing the
> coredump size.
> 
> This patch addresses this issue by excluding only the
> shared pages from madvise's DONTDUMP.
> 
> Fixes: dea092d0addb ("vhost: fix madvise arguments alignment")
> Cc: stable@dpdk.org
> 
> Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
> ---
>  lib/vhost/iotlb.c | 21 ++++++++++++++-------
>  1 file changed, 14 insertions(+), 7 deletions(-)
> 
> diff --git a/lib/vhost/iotlb.c b/lib/vhost/iotlb.c
> index d919f74704..f598c0a8c4 100644
> --- a/lib/vhost/iotlb.c
> +++ b/lib/vhost/iotlb.c
> @@ -54,16 +54,23 @@ static void
>  vhost_user_iotlb_clear_dump(struct virtio_net *dev, struct
> vhost_iotlb_entry *node,
>  		struct vhost_iotlb_entry *prev, struct vhost_iotlb_entry *next)
>  {
> -	uint64_t align;
> +	uint64_t align, start, end;
> +
> +	start = node->uaddr;
> +	end = node->uaddr + node->size;
> 
>  	align = hua_to_alignment(dev->mem, (void *)(uintptr_t)node->uaddr);
> 
> -	/* Don't disable coredump if the previous node is in the same page
> */
> -	if (!vhost_user_iotlb_share_page(prev, node, align)) {
> -		/* Don't disable coredump if the next node is in the same page
> */
> -		if (!vhost_user_iotlb_share_page(node, next, align))
> -			mem_set_dump((void *)(uintptr_t)node->uaddr, node->size,
> false, align);
> -	}
> +	/* Skip first page if shared with previous entry. */
> +	if (vhost_user_iotlb_share_page(prev, node, align))
> +		start = RTE_ALIGN_CEIL(start, align);
> +
> +	/* Skip last page if shared with next entry. */
> +	if (vhost_user_iotlb_share_page(node, next, align))
> +		end = RTE_ALIGN_FLOOR(end, align);
> +
> +	if (end > start)
> +		mem_set_dump((void *)(uintptr_t)start, end - start, false,
> align);
>  }
> 
>  static struct vhost_iotlb_entry *
> --
> 2.39.2

Reviewed-by: Chenbo Xia <chenbo.xia@intel.com> 

^ permalink raw reply	[flat|nested] 79+ messages in thread
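
A minimal sketch of the trimmed-range logic above, calling madvise() directly where the library goes through its mem_set_dump() wrapper. It assumes Linux and a power-of-two page size, so treat it as an illustration rather than the library code:

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

#define ALIGN_FLOOR(v, a) ((v) & ~((uint64_t)(a) - 1))
#define ALIGN_CEIL(v, a)  ALIGN_FLOOR((v) + (a) - 1, (a))

/* Exclude an entry from coredumps, but leave untouched any page it shares
 * with a neighbouring entry, as decided by the caller.
 */
static void
clear_dump_range(uint64_t uaddr, uint64_t size, uint64_t align,
		bool first_page_shared, bool last_page_shared)
{
	uint64_t start = uaddr;
	uint64_t end = uaddr + size;

	if (first_page_shared)
		start = ALIGN_CEIL(start, align);
	if (last_page_shared)
		end = ALIGN_FLOOR(end, align);

	if (end <= start)
		return;

	/* madvise() wants a page-aligned address; rounding down is a no-op
	 * when the first page was skipped, and covers the whole first page
	 * otherwise, which is the intent.
	 */
	start = ALIGN_FLOOR(start, align);
	if (madvise((void *)(uintptr_t)start, end - start, MADV_DONTDUMP))
		perror("madvise");
}

int
main(void)
{
	uint64_t page = (uint64_t)sysconf(_SC_PAGESIZE);
	size_t len = 4 * (size_t)page;
	void *map = mmap(NULL, len, PROT_READ | PROT_WRITE,
			MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	if (map == MAP_FAILED)
		return 1;

	/* Pretend the first page is shared with the previous entry. */
	clear_dump_range((uint64_t)(uintptr_t)map, len, page, true, false);

	munmap(map, len);
	return 0;
}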

* RE: [RFC 07/27] vhost: change to single IOTLB cache per device
  2023-03-31 15:42 ` [RFC 07/27] vhost: change to single IOTLB cache per device Maxime Coquelin
@ 2023-04-25  6:19   ` Xia, Chenbo
  2023-05-03 13:47     ` Maxime Coquelin
  0 siblings, 1 reply; 79+ messages in thread
From: Xia, Chenbo @ 2023-04-25  6:19 UTC (permalink / raw)
  To: Maxime Coquelin, dev, david.marchand, mkp, fbl, jasowang, Liang,
	Cunming, Xie, Yongji, echaudro, eperezma, amorenoz

Hi Maxime,

> -----Original Message-----
> From: Maxime Coquelin <maxime.coquelin@redhat.com>
> Sent: Friday, March 31, 2023 11:43 PM
> To: dev@dpdk.org; david.marchand@redhat.com; Xia, Chenbo
> <chenbo.xia@intel.com>; mkp@redhat.com; fbl@redhat.com;
> jasowang@redhat.com; Liang, Cunming <cunming.liang@intel.com>; Xie, Yongji
> <xieyongji@bytedance.com>; echaudro@redhat.com; eperezma@redhat.com;
> amorenoz@redhat.com
> Cc: Maxime Coquelin <maxime.coquelin@redhat.com>
> Subject: [RFC 07/27] vhost: change to single IOTLB cache per device
> 
> This patch simplifies IOTLB implementation and improves
> IOTLB memory consumption by having a single IOTLB cache
> per device, instead of having one per queue.
> 
> In order to not impact performance, it keeps an IOTLB lock
> per virtqueue, so that there is no contention between
> multiple queues trying to acquire it.
> 
> Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
> ---
>  lib/vhost/iotlb.c      | 212 +++++++++++++++++++----------------------
>  lib/vhost/iotlb.h      |  43 ++++++---
>  lib/vhost/vhost.c      |  18 ++--
>  lib/vhost/vhost.h      |  16 ++--
>  lib/vhost/vhost_user.c |  25 +++--
>  5 files changed, 160 insertions(+), 154 deletions(-)
> 

[...]

> diff --git a/lib/vhost/vhost_user.c b/lib/vhost/vhost_user.c
> index d60e39b6bc..81ebef0137 100644
> --- a/lib/vhost/vhost_user.c
> +++ b/lib/vhost/vhost_user.c
> @@ -7,7 +7,7 @@
>   * The vhost-user protocol connection is an external interface, so it
> must be
>   * robust against invalid inputs.
>   *
> - * This is important because the vhost-user frontend is only one step
> removed
> +* This is important because the vhost-user frontend is only one step
> removed

This is changed by accident?

Thanks,
Chenbo

>   * from the guest.  Malicious guests that have escaped will then launch
> further
>   * attacks from the vhost-user frontend.
>   *
> @@ -237,6 +237,8 @@ vhost_backend_cleanup(struct virtio_net *dev)
>  	}
> 
>  	dev->postcopy_listening = 0;
> +
> +	vhost_user_iotlb_destroy(dev);
>  }
> 
>  static void
> @@ -539,7 +541,6 @@ numa_realloc(struct virtio_net **pdev, struct
> vhost_virtqueue **pvq)
>  	if (vq != dev->virtqueue[vq->index]) {
>  		VHOST_LOG_CONFIG(dev->ifname, INFO, "reallocated virtqueue on
> node %d\n", node);
>  		dev->virtqueue[vq->index] = vq;
> -		vhost_user_iotlb_init(dev, vq);
>  	}
> 
>  	if (vq_is_packed(dev)) {
> @@ -664,6 +665,8 @@ numa_realloc(struct virtio_net **pdev, struct
> vhost_virtqueue **pvq)
>  		return;
>  	}
>  	dev->guest_pages = gp;
> +
> +	vhost_user_iotlb_init(dev);
>  }
>  #else
>  static void
> @@ -1360,8 +1363,7 @@ vhost_user_set_mem_table(struct virtio_net **pdev,
> 
>  		/* Flush IOTLB cache as previous HVAs are now invalid */
>  		if (dev->features & (1ULL << VIRTIO_F_IOMMU_PLATFORM))
> -			for (i = 0; i < dev->nr_vring; i++)
> -				vhost_user_iotlb_flush_all(dev, dev->virtqueue[i]);
> +			vhost_user_iotlb_flush_all(dev);
> 
>  		free_mem_region(dev);
>  		rte_free(dev->mem);
> @@ -2194,7 +2196,7 @@ vhost_user_get_vring_base(struct virtio_net **pdev,
>  	ctx->msg.size = sizeof(ctx->msg.payload.state);
>  	ctx->fd_num = 0;
> 
> -	vhost_user_iotlb_flush_all(dev, vq);
> +	vhost_user_iotlb_flush_all(dev);
> 
>  	vring_invalidate(dev, vq);
> 
> @@ -2639,15 +2641,14 @@ vhost_user_iotlb_msg(struct virtio_net **pdev,
>  		if (!vva)
>  			return RTE_VHOST_MSG_RESULT_ERR;
> 
> +		vhost_user_iotlb_cache_insert(dev, imsg->iova, vva, len, imsg-
> >perm);
> +
>  		for (i = 0; i < dev->nr_vring; i++) {
>  			struct vhost_virtqueue *vq = dev->virtqueue[i];
> 
>  			if (!vq)
>  				continue;
> 
> -			vhost_user_iotlb_cache_insert(dev, vq, imsg->iova, vva,
> -					len, imsg->perm);
> -
>  			if (is_vring_iotlb(dev, vq, imsg)) {
>  				rte_spinlock_lock(&vq->access_lock);
>  				translate_ring_addresses(&dev, &vq);
> @@ -2657,15 +2658,14 @@ vhost_user_iotlb_msg(struct virtio_net **pdev,
>  		}
>  		break;
>  	case VHOST_IOTLB_INVALIDATE:
> +		vhost_user_iotlb_cache_remove(dev, imsg->iova, imsg->size);
> +
>  		for (i = 0; i < dev->nr_vring; i++) {
>  			struct vhost_virtqueue *vq = dev->virtqueue[i];
> 
>  			if (!vq)
>  				continue;
> 
> -			vhost_user_iotlb_cache_remove(dev, vq, imsg->iova,
> -					imsg->size);
> -
>  			if (is_vring_iotlb(dev, vq, imsg)) {
>  				rte_spinlock_lock(&vq->access_lock);
>  				vring_invalidate(dev, vq);
> @@ -2674,8 +2674,7 @@ vhost_user_iotlb_msg(struct virtio_net **pdev,
>  		}
>  		break;
>  	default:
> -		VHOST_LOG_CONFIG(dev->ifname, ERR,
> -			"invalid IOTLB message type (%d)\n",
> +		VHOST_LOG_CONFIG(dev->ifname, ERR, "invalid IOTLB message type
> (%d)\n",
>  			imsg->type);
>  		return RTE_VHOST_MSG_RESULT_ERR;
>  	}
> --
> 2.39.2


^ permalink raw reply	[flat|nested] 79+ messages in thread
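
To picture the locking scheme described in the commit message, here is a rough standalone sketch with pthread rwlocks standing in for rte_rwlock: the datapath only takes its own virtqueue's lock for lookups, while cache updates take every lock, so queues never contend with each other on the read side. The names and the fixed queue count are illustrative, not the library's:

#include <pthread.h>
#include <stdint.h>
#include <sys/queue.h>

#define NR_VRING 8

struct iotlb_entry {
	TAILQ_ENTRY(iotlb_entry) next;
	uint64_t iova, uaddr, size;
};

struct device {
	/* Single IOTLB cache shared by all virtqueues of the device. */
	TAILQ_HEAD(, iotlb_entry) iotlb_list;
	/* One lock per virtqueue so readers do not contend with each other. */
	pthread_rwlock_t iotlb_lock[NR_VRING];
};

/* Datapath lookup for queue 'qid' only takes that queue's lock. */
void
iotlb_rd_lock(struct device *dev, int qid)
{
	pthread_rwlock_rdlock(&dev->iotlb_lock[qid]);
}

void
iotlb_rd_unlock(struct device *dev, int qid)
{
	pthread_rwlock_unlock(&dev->iotlb_lock[qid]);
}

/* Cache insertion, removal and flush take every lock, so no queue can
 * observe a half-updated list.
 */
void
iotlb_wr_lock_all(struct device *dev)
{
	for (int i = 0; i < NR_VRING; i++)
		pthread_rwlock_wrlock(&dev->iotlb_lock[i]);
}

void
iotlb_wr_unlock_all(struct device *dev)
{
	for (int i = 0; i < NR_VRING; i++)
		pthread_rwlock_unlock(&dev->iotlb_lock[i]);
}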

* RE: [RFC 08/27] vhost: add offset field to IOTLB entries
  2023-03-31 15:42 ` [RFC 08/27] vhost: add offset field to IOTLB entries Maxime Coquelin
@ 2023-04-25  6:20   ` Xia, Chenbo
  0 siblings, 0 replies; 79+ messages in thread
From: Xia, Chenbo @ 2023-04-25  6:20 UTC (permalink / raw)
  To: Maxime Coquelin, dev, david.marchand, mkp, fbl, jasowang, Liang,
	Cunming, Xie, Yongji, echaudro, eperezma, amorenoz

> -----Original Message-----
> From: Maxime Coquelin <maxime.coquelin@redhat.com>
> Sent: Friday, March 31, 2023 11:43 PM
> To: dev@dpdk.org; david.marchand@redhat.com; Xia, Chenbo
> <chenbo.xia@intel.com>; mkp@redhat.com; fbl@redhat.com;
> jasowang@redhat.com; Liang, Cunming <cunming.liang@intel.com>; Xie, Yongji
> <xieyongji@bytedance.com>; echaudro@redhat.com; eperezma@redhat.com;
> amorenoz@redhat.com
> Cc: Maxime Coquelin <maxime.coquelin@redhat.com>
> Subject: [RFC 08/27] vhost: add offset field to IOTLB entries
> 
> This patch is preliminary work to prepare for VDUSE
> support, for which we need to keep track of the mmapped base
> address and offset in order to be able to unmap it later
> when the IOTLB entry is invalidated.
> 
> Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
> ---
>  lib/vhost/iotlb.c      | 30 ++++++++++++++++++------------
>  lib/vhost/iotlb.h      |  2 +-
>  lib/vhost/vhost_user.c |  2 +-
>  3 files changed, 20 insertions(+), 14 deletions(-)
> 
> diff --git a/lib/vhost/iotlb.c b/lib/vhost/iotlb.c
> index a91115cf1c..51f118bc48 100644
> --- a/lib/vhost/iotlb.c
> +++ b/lib/vhost/iotlb.c
> @@ -17,6 +17,7 @@ struct vhost_iotlb_entry {
> 
>  	uint64_t iova;
>  	uint64_t uaddr;
> +	uint64_t uoffset;
>  	uint64_t size;
>  	uint8_t perm;
>  };
> @@ -27,15 +28,18 @@ static bool
>  vhost_user_iotlb_share_page(struct vhost_iotlb_entry *a, struct
> vhost_iotlb_entry *b,
>  		uint64_t align)
>  {
> -	uint64_t a_end, b_start;
> +	uint64_t a_start, a_end, b_start;
> 
>  	if (a == NULL || b == NULL)
>  		return false;
> 
> +	a_start = a->uaddr + a->uoffset;
> +	b_start = b->uaddr + b->uoffset;
> +
>  	/* Assumes entry a lower than entry b */
> -	RTE_ASSERT(a->uaddr < b->uaddr);
> -	a_end = RTE_ALIGN_CEIL(a->uaddr + a->size, align);
> -	b_start = RTE_ALIGN_FLOOR(b->uaddr, align);
> +	RTE_ASSERT(a_start < b_start);
> +	a_end = RTE_ALIGN_CEIL(a_start + a->size, align);
> +	b_start = RTE_ALIGN_FLOOR(b_start, align);
> 
>  	return a_end > b_start;
>  }
> @@ -43,11 +47,12 @@ vhost_user_iotlb_share_page(struct vhost_iotlb_entry
> *a, struct vhost_iotlb_entr
>  static void
>  vhost_user_iotlb_set_dump(struct virtio_net *dev, struct
> vhost_iotlb_entry *node)
>  {
> -	uint64_t align;
> +	uint64_t align, start;
> 
> -	align = hua_to_alignment(dev->mem, (void *)(uintptr_t)node->uaddr);
> +	start = node->uaddr + node->uoffset;
> +	align = hua_to_alignment(dev->mem, (void *)(uintptr_t)start);
> 
> -	mem_set_dump((void *)(uintptr_t)node->uaddr, node->size, false,
> align);
> +	mem_set_dump((void *)(uintptr_t)start, node->size, false, align);
>  }
> 
>  static void
> @@ -56,10 +61,10 @@ vhost_user_iotlb_clear_dump(struct virtio_net *dev,
> struct vhost_iotlb_entry *no
>  {
>  	uint64_t align, start, end;
> 
> -	start = node->uaddr;
> -	end = node->uaddr + node->size;
> +	start = node->uaddr + node->uoffset;
> +	end = start + node->size;
> 
> -	align = hua_to_alignment(dev->mem, (void *)(uintptr_t)node->uaddr);
> +	align = hua_to_alignment(dev->mem, (void *)(uintptr_t)start);
> 
>  	/* Skip first page if shared with previous entry. */
>  	if (vhost_user_iotlb_share_page(prev, node, align))
> @@ -234,7 +239,7 @@ vhost_user_iotlb_cache_random_evict(struct virtio_net
> *dev)
> 
>  void
>  vhost_user_iotlb_cache_insert(struct virtio_net *dev, uint64_t iova,
> uint64_t uaddr,
> -				uint64_t size, uint8_t perm)
> +				uint64_t uoffset, uint64_t size, uint8_t perm)
>  {
>  	struct vhost_iotlb_entry *node, *new_node;
> 
> @@ -256,6 +261,7 @@ vhost_user_iotlb_cache_insert(struct virtio_net *dev,
> uint64_t iova, uint64_t ua
> 
>  	new_node->iova = iova;
>  	new_node->uaddr = uaddr;
> +	new_node->uoffset = uoffset;
>  	new_node->size = size;
>  	new_node->perm = perm;
> 
> @@ -344,7 +350,7 @@ vhost_user_iotlb_cache_find(struct virtio_net *dev,
> uint64_t iova, uint64_t *siz
> 
>  		offset = iova - node->iova;
>  		if (!vva)
> -			vva = node->uaddr + offset;
> +			vva = node->uaddr + node->uoffset + offset;
> 
>  		mapped += node->size - offset;
>  		iova = node->iova + node->size;
> diff --git a/lib/vhost/iotlb.h b/lib/vhost/iotlb.h
> index 3490b9e6be..bee36c5903 100644
> --- a/lib/vhost/iotlb.h
> +++ b/lib/vhost/iotlb.h
> @@ -58,7 +58,7 @@ vhost_user_iotlb_wr_unlock_all(struct virtio_net *dev)
>  }
> 
>  void vhost_user_iotlb_cache_insert(struct virtio_net *dev, uint64_t iova,
> uint64_t uaddr,
> -					uint64_t size, uint8_t perm);
> +					uint64_t uoffset, uint64_t size, uint8_t
> perm);
>  void vhost_user_iotlb_cache_remove(struct virtio_net *dev, uint64_t iova,
> uint64_t size);
>  uint64_t vhost_user_iotlb_cache_find(struct virtio_net *dev, uint64_t
> iova,
>  					uint64_t *size, uint8_t perm);
> diff --git a/lib/vhost/vhost_user.c b/lib/vhost/vhost_user.c
> index 81ebef0137..93673d3902 100644
> --- a/lib/vhost/vhost_user.c
> +++ b/lib/vhost/vhost_user.c
> @@ -2641,7 +2641,7 @@ vhost_user_iotlb_msg(struct virtio_net **pdev,
>  		if (!vva)
>  			return RTE_VHOST_MSG_RESULT_ERR;
> 
> -		vhost_user_iotlb_cache_insert(dev, imsg->iova, vva, len, imsg-
> >perm);
> +		vhost_user_iotlb_cache_insert(dev, imsg->iova, vva, 0, len,
> imsg->perm);
> 
>  		for (i = 0; i < dev->nr_vring; i++) {
>  			struct vhost_virtqueue *vq = dev->virtqueue[i];
> --
> 2.39.2

Reviewed-by: Chenbo Xia <chenbo.xia@intel.com> 

^ permalink raw reply	[flat|nested] 79+ messages in thread
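
A pared-down sketch of how the new uoffset field enters the translation path, following the quoted change to vhost_user_iotlb_cache_find(); the entry layout and the linear scan are simplified for illustration:

#include <stddef.h>
#include <stdint.h>

struct iotlb_entry {
	uint64_t iova;    /* guest I/O virtual address            */
	uint64_t uaddr;   /* mmap() base address in the backend   */
	uint64_t uoffset; /* offset of the region inside the mmap */
	uint64_t size;
};

/* Translate a guest IOVA to a backend virtual address.  The mmap() base is
 * kept in uaddr so it can be unmapped on invalidation; the usable start of
 * the region is uaddr + uoffset.
 */
uint64_t
iova_to_vva(const struct iotlb_entry *cache, size_t n, uint64_t iova)
{
	for (size_t i = 0; i < n; i++) {
		const struct iotlb_entry *e = &cache[i];

		if (iova >= e->iova && iova < e->iova + e->size)
			return e->uaddr + e->uoffset + (iova - e->iova);
	}

	return 0; /* miss */
}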

* RE: [RFC 09/27] vhost: add page size info to IOTLB entry
  2023-03-31 15:42 ` [RFC 09/27] vhost: add page size info to IOTLB entry Maxime Coquelin
@ 2023-04-25  6:20   ` Xia, Chenbo
  2023-05-03 13:57     ` Maxime Coquelin
  0 siblings, 1 reply; 79+ messages in thread
From: Xia, Chenbo @ 2023-04-25  6:20 UTC (permalink / raw)
  To: Maxime Coquelin, dev, david.marchand, mkp, fbl, jasowang, Liang,
	Cunming, Xie, Yongji, echaudro, eperezma, amorenoz

Hi Maxime,

> -----Original Message-----
> From: Maxime Coquelin <maxime.coquelin@redhat.com>
> Sent: Friday, March 31, 2023 11:43 PM
> To: dev@dpdk.org; david.marchand@redhat.com; Xia, Chenbo
> <chenbo.xia@intel.com>; mkp@redhat.com; fbl@redhat.com;
> jasowang@redhat.com; Liang, Cunming <cunming.liang@intel.com>; Xie, Yongji
> <xieyongji@bytedance.com>; echaudro@redhat.com; eperezma@redhat.com;
> amorenoz@redhat.com
> Cc: Maxime Coquelin <maxime.coquelin@redhat.com>
> Subject: [RFC 09/27] vhost: add page size info to IOTLB entry
> 
> VDUSE will close the file descriptor after having mapped
> the shared memory, so it will not be possible to get the
> page size afterwards.
> 
> This patch adds a new page_shift field to the IOTLB entry,
> so that the information will be passed at IOTLB cache
> insertion time. The information is stored as a bit shift
> value so that IOTLB entry keeps fitting in a single
> cacheline.
> 
> Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
> ---
>  lib/vhost/iotlb.c      | 46 ++++++++++++++++++++----------------------
>  lib/vhost/iotlb.h      |  2 +-
>  lib/vhost/vhost.h      |  1 -
>  lib/vhost/vhost_user.c |  8 +++++---
>  4 files changed, 28 insertions(+), 29 deletions(-)
> 
> diff --git a/lib/vhost/iotlb.c b/lib/vhost/iotlb.c
> index 51f118bc48..188dfb8e38 100644
> --- a/lib/vhost/iotlb.c
> +++ b/lib/vhost/iotlb.c
> @@ -19,14 +19,14 @@ struct vhost_iotlb_entry {
>  	uint64_t uaddr;
>  	uint64_t uoffset;
>  	uint64_t size;
> +	uint8_t page_shift;
>  	uint8_t perm;
>  };
> 
>  #define IOTLB_CACHE_SIZE 2048
> 
>  static bool
> -vhost_user_iotlb_share_page(struct vhost_iotlb_entry *a, struct
> vhost_iotlb_entry *b,
> -		uint64_t align)
> +vhost_user_iotlb_share_page(struct vhost_iotlb_entry *a, struct
> vhost_iotlb_entry *b)
>  {
>  	uint64_t a_start, a_end, b_start;
> 
> @@ -38,44 +38,41 @@ vhost_user_iotlb_share_page(struct vhost_iotlb_entry
> *a, struct vhost_iotlb_entr
> 
>  	/* Assumes entry a lower than entry b */
>  	RTE_ASSERT(a_start < b_start);
> -	a_end = RTE_ALIGN_CEIL(a_start + a->size, align);
> -	b_start = RTE_ALIGN_FLOOR(b_start, align);
> +	a_end = RTE_ALIGN_CEIL(a_start + a->size, RTE_BIT64(a->page_shift));
> +	b_start = RTE_ALIGN_FLOOR(b_start, RTE_BIT64(b->page_shift));
> 
>  	return a_end > b_start;
>  }
> 
>  static void
> -vhost_user_iotlb_set_dump(struct virtio_net *dev, struct
> vhost_iotlb_entry *node)
> +vhost_user_iotlb_set_dump(struct vhost_iotlb_entry *node)
>  {
> -	uint64_t align, start;
> +	uint64_t start;
> 
>  	start = node->uaddr + node->uoffset;
> -	align = hua_to_alignment(dev->mem, (void *)(uintptr_t)start);
> -
> -	mem_set_dump((void *)(uintptr_t)start, node->size, false, align);
> +	mem_set_dump((void *)(uintptr_t)start, node->size, false,
> RTE_BIT64(node->page_shift));
>  }
> 
>  static void
> -vhost_user_iotlb_clear_dump(struct virtio_net *dev, struct
> vhost_iotlb_entry *node,
> +vhost_user_iotlb_clear_dump(struct vhost_iotlb_entry *node,
>  		struct vhost_iotlb_entry *prev, struct vhost_iotlb_entry *next)
>  {
> -	uint64_t align, start, end;
> +	uint64_t start, end;
> 
>  	start = node->uaddr + node->uoffset;
>  	end = start + node->size;
> 
> -	align = hua_to_alignment(dev->mem, (void *)(uintptr_t)start);
> -
>  	/* Skip first page if shared with previous entry. */
> -	if (vhost_user_iotlb_share_page(prev, node, align))
> -		start = RTE_ALIGN_CEIL(start, align);
> +	if (vhost_user_iotlb_share_page(prev, node))
> +		start = RTE_ALIGN_CEIL(start, RTE_BIT64(node->page_shift));
> 
>  	/* Skip last page if shared with next entry. */
> -	if (vhost_user_iotlb_share_page(node, next, align))
> -		end = RTE_ALIGN_FLOOR(end, align);
> +	if (vhost_user_iotlb_share_page(node, next))
> +		end = RTE_ALIGN_FLOOR(end, RTE_BIT64(node->page_shift));
> 
>  	if (end > start)
> -		mem_set_dump((void *)(uintptr_t)start, end - start, false,
> align);
> +		mem_set_dump((void *)(uintptr_t)start, end - start, false,
> +			RTE_BIT64(node->page_shift));
>  }
> 
>  static struct vhost_iotlb_entry *
> @@ -198,7 +195,7 @@ vhost_user_iotlb_cache_remove_all(struct virtio_net
> *dev)
>  	vhost_user_iotlb_wr_lock_all(dev);
> 
>  	RTE_TAILQ_FOREACH_SAFE(node, &dev->iotlb_list, next, temp_node) {
> -		vhost_user_iotlb_set_dump(dev, node);
> +		vhost_user_iotlb_set_dump(node);
> 
>  		TAILQ_REMOVE(&dev->iotlb_list, node, next);
>  		vhost_user_iotlb_pool_put(dev, node);
> @@ -223,7 +220,7 @@ vhost_user_iotlb_cache_random_evict(struct virtio_net
> *dev)
>  		if (!entry_idx) {
>  			struct vhost_iotlb_entry *next_node =
> RTE_TAILQ_NEXT(node, next);
> 
> -			vhost_user_iotlb_clear_dump(dev, node, prev_node,
> next_node);
> +			vhost_user_iotlb_clear_dump(node, prev_node, next_node);
> 
>  			TAILQ_REMOVE(&dev->iotlb_list, node, next);
>  			vhost_user_iotlb_pool_put(dev, node);
> @@ -239,7 +236,7 @@ vhost_user_iotlb_cache_random_evict(struct virtio_net
> *dev)
> 
>  void
>  vhost_user_iotlb_cache_insert(struct virtio_net *dev, uint64_t iova,
> uint64_t uaddr,
> -				uint64_t uoffset, uint64_t size, uint8_t perm)
> +				uint64_t uoffset, uint64_t size, uint64_t
> page_size, uint8_t perm)
>  {
>  	struct vhost_iotlb_entry *node, *new_node;
> 
> @@ -263,6 +260,7 @@ vhost_user_iotlb_cache_insert(struct virtio_net *dev,
> uint64_t iova, uint64_t ua
>  	new_node->uaddr = uaddr;
>  	new_node->uoffset = uoffset;
>  	new_node->size = size;
> +	new_node->page_shift = __builtin_ctz(page_size);

__builtin_ctzll ?

Thanks,
Chenbo

>  	new_node->perm = perm;
> 
>  	vhost_user_iotlb_wr_lock_all(dev);
> @@ -276,7 +274,7 @@ vhost_user_iotlb_cache_insert(struct virtio_net *dev,
> uint64_t iova, uint64_t ua
>  			vhost_user_iotlb_pool_put(dev, new_node);
>  			goto unlock;
>  		} else if (node->iova > new_node->iova) {
> -			vhost_user_iotlb_set_dump(dev, new_node);
> +			vhost_user_iotlb_set_dump(new_node);
> 
>  			TAILQ_INSERT_BEFORE(node, new_node, next);
>  			dev->iotlb_cache_nr++;
> @@ -284,7 +282,7 @@ vhost_user_iotlb_cache_insert(struct virtio_net *dev,
> uint64_t iova, uint64_t ua
>  		}
>  	}
> 
> -	vhost_user_iotlb_set_dump(dev, new_node);
> +	vhost_user_iotlb_set_dump(new_node);
> 
>  	TAILQ_INSERT_TAIL(&dev->iotlb_list, new_node, next);
>  	dev->iotlb_cache_nr++;
> @@ -313,7 +311,7 @@ vhost_user_iotlb_cache_remove(struct virtio_net *dev,
> uint64_t iova, uint64_t si
>  		if (iova < node->iova + node->size) {
>  			struct vhost_iotlb_entry *next_node =
> RTE_TAILQ_NEXT(node, next);
> 
> -			vhost_user_iotlb_clear_dump(dev, node, prev_node,
> next_node);
> +			vhost_user_iotlb_clear_dump(node, prev_node, next_node);
> 
>  			TAILQ_REMOVE(&dev->iotlb_list, node, next);
>  			vhost_user_iotlb_pool_put(dev, node);
> diff --git a/lib/vhost/iotlb.h b/lib/vhost/iotlb.h
> index bee36c5903..81ca04df21 100644
> --- a/lib/vhost/iotlb.h
> +++ b/lib/vhost/iotlb.h
> @@ -58,7 +58,7 @@ vhost_user_iotlb_wr_unlock_all(struct virtio_net *dev)
>  }
> 
>  void vhost_user_iotlb_cache_insert(struct virtio_net *dev, uint64_t iova,
> uint64_t uaddr,
> -					uint64_t uoffset, uint64_t size, uint8_t
> perm);
> +		uint64_t uoffset, uint64_t size, uint64_t page_size, uint8_t
> perm);
>  void vhost_user_iotlb_cache_remove(struct virtio_net *dev, uint64_t iova,
> uint64_t size);
>  uint64_t vhost_user_iotlb_cache_find(struct virtio_net *dev, uint64_t
> iova,
>  					uint64_t *size, uint8_t perm);
> diff --git a/lib/vhost/vhost.h b/lib/vhost/vhost.h
> index 67cc4a2fdb..4ace5ab081 100644
> --- a/lib/vhost/vhost.h
> +++ b/lib/vhost/vhost.h
> @@ -1016,6 +1016,5 @@ mbuf_is_consumed(struct rte_mbuf *m)
>  	return true;
>  }
> 
> -uint64_t hua_to_alignment(struct rte_vhost_memory *mem, void *ptr);
>  void mem_set_dump(void *ptr, size_t size, bool enable, uint64_t
> alignment);
>  #endif /* _VHOST_NET_CDEV_H_ */
> diff --git a/lib/vhost/vhost_user.c b/lib/vhost/vhost_user.c
> index 93673d3902..a989f2c46d 100644
> --- a/lib/vhost/vhost_user.c
> +++ b/lib/vhost/vhost_user.c
> @@ -743,7 +743,7 @@ log_addr_to_gpa(struct virtio_net *dev, struct
> vhost_virtqueue *vq)
>  	return log_gpa;
>  }
> 
> -uint64_t
> +static uint64_t
>  hua_to_alignment(struct rte_vhost_memory *mem, void *ptr)
>  {
>  	struct rte_vhost_mem_region *r;
> @@ -2632,7 +2632,7 @@ vhost_user_iotlb_msg(struct virtio_net **pdev,
>  	struct virtio_net *dev = *pdev;
>  	struct vhost_iotlb_msg *imsg = &ctx->msg.payload.iotlb;
>  	uint16_t i;
> -	uint64_t vva, len;
> +	uint64_t vva, len, pg_sz;
> 
>  	switch (imsg->type) {
>  	case VHOST_IOTLB_UPDATE:
> @@ -2641,7 +2641,9 @@ vhost_user_iotlb_msg(struct virtio_net **pdev,
>  		if (!vva)
>  			return RTE_VHOST_MSG_RESULT_ERR;
> 
> -		vhost_user_iotlb_cache_insert(dev, imsg->iova, vva, 0, len,
> imsg->perm);
> +		pg_sz = hua_to_alignment(dev->mem, (void *)(uintptr_t)vva);
> +
> +		vhost_user_iotlb_cache_insert(dev, imsg->iova, vva, 0, len,
> pg_sz, imsg->perm);
> 
>  		for (i = 0; i < dev->nr_vring; i++) {
>  			struct vhost_virtqueue *vq = dev->virtqueue[i];
> --
> 2.39.2


^ permalink raw reply	[flat|nested] 79+ messages in thread
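
A pared-down sketch of the idea (the real entry also carries its list linkage): keeping the page size as a shift costs one byte instead of eight, so the entry still fits a 64-byte cacheline, and the conversion uses the 64-bit __builtin_ctzll() form raised in the review above. page_size is assumed to be a power of two:

#include <stdint.h>

struct iotlb_entry {
	uint64_t iova;
	uint64_t uaddr;
	uint64_t uoffset;
	uint64_t size;
	uint8_t page_shift; /* page size stored as a shift: 1 byte, not 8 */
	uint8_t perm;
};

_Static_assert(sizeof(struct iotlb_entry) <= 64, "entry must fit a cacheline");

void
entry_set_page_size(struct iotlb_entry *e, uint64_t page_size)
{
	/* __builtin_ctzll() works on the full 64-bit value; plain
	 * __builtin_ctz() takes an unsigned int and would silently
	 * truncate its argument first.
	 */
	e->page_shift = (uint8_t)__builtin_ctzll(page_size);
}

uint64_t
entry_page_size(const struct iotlb_entry *e)
{
	return UINT64_C(1) << e->page_shift;
}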

* Re: [RFC 07/27] vhost: change to single IOTLB cache per device
  2023-04-25  6:19   ` Xia, Chenbo
@ 2023-05-03 13:47     ` Maxime Coquelin
  0 siblings, 0 replies; 79+ messages in thread
From: Maxime Coquelin @ 2023-05-03 13:47 UTC (permalink / raw)
  To: Xia, Chenbo, dev, david.marchand, mkp, fbl, jasowang, Liang,
	Cunming, Xie, Yongji, echaudro, eperezma, amorenoz

Hi Chenbo,

On 4/25/23 08:19, Xia, Chenbo wrote:
> Hi Maxime,
> 
>> -----Original Message-----
>> From: Maxime Coquelin <maxime.coquelin@redhat.com>
>> Sent: Friday, March 31, 2023 11:43 PM
>> To: dev@dpdk.org; david.marchand@redhat.com; Xia, Chenbo
>> <chenbo.xia@intel.com>; mkp@redhat.com; fbl@redhat.com;
>> jasowang@redhat.com; Liang, Cunming <cunming.liang@intel.com>; Xie, Yongji
>> <xieyongji@bytedance.com>; echaudro@redhat.com; eperezma@redhat.com;
>> amorenoz@redhat.com
>> Cc: Maxime Coquelin <maxime.coquelin@redhat.com>
>> Subject: [RFC 07/27] vhost: change to single IOTLB cache per device
>>
>> This patch simplifies IOTLB implementation and improves
>> IOTLB memory consumption by having a single IOTLB cache
>> per device, instead of having one per queue.
>>
>> In order to not impact performance, it keeps an IOTLB lock
>> per virtqueue, so that there is no contention between
>> multiple queues trying to acquire it.
>>
>> Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
>> ---
>>   lib/vhost/iotlb.c      | 212 +++++++++++++++++++----------------------
>>   lib/vhost/iotlb.h      |  43 ++++++---
>>   lib/vhost/vhost.c      |  18 ++--
>>   lib/vhost/vhost.h      |  16 ++--
>>   lib/vhost/vhost_user.c |  25 +++--
>>   5 files changed, 160 insertions(+), 154 deletions(-)
>>
> 
> [...]
> 
>> diff --git a/lib/vhost/vhost_user.c b/lib/vhost/vhost_user.c
>> index d60e39b6bc..81ebef0137 100644
>> --- a/lib/vhost/vhost_user.c
>> +++ b/lib/vhost/vhost_user.c
>> @@ -7,7 +7,7 @@
>>    * The vhost-user protocol connection is an external interface, so it
>> must be
>>    * robust against invalid inputs.
>>    *
>> - * This is important because the vhost-user frontend is only one step
>> removed
>> +* This is important because the vhost-user frontend is only one step
>> removed
> 
> This is changed by accident?

Yes, this will be fixed in v1.

Thanks,
Maxime

> Thanks,
> Chenbo


^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [RFC 09/27] vhost: add page size info to IOTLB entry
  2023-04-25  6:20   ` Xia, Chenbo
@ 2023-05-03 13:57     ` Maxime Coquelin
  0 siblings, 0 replies; 79+ messages in thread
From: Maxime Coquelin @ 2023-05-03 13:57 UTC (permalink / raw)
  To: Xia, Chenbo, dev, david.marchand, mkp, fbl, jasowang, Liang,
	Cunming, Xie, Yongji, echaudro, eperezma, amorenoz

Hi Chenbo,

On 4/25/23 08:20, Xia, Chenbo wrote:
> Hi Maxime,
> 
>> -----Original Message-----
>> From: Maxime Coquelin <maxime.coquelin@redhat.com>
>> Sent: Friday, March 31, 2023 11:43 PM
>> To: dev@dpdk.org; david.marchand@redhat.com; Xia, Chenbo
>> <chenbo.xia@intel.com>; mkp@redhat.com; fbl@redhat.com;
>> jasowang@redhat.com; Liang, Cunming <cunming.liang@intel.com>; Xie, Yongji
>> <xieyongji@bytedance.com>; echaudro@redhat.com; eperezma@redhat.com;
>> amorenoz@redhat.com
>> Cc: Maxime Coquelin <maxime.coquelin@redhat.com>
>> Subject: [RFC 09/27] vhost: add page size info to IOTLB entry
>>
>> VDUSE will close the file descriptor after having mapped
>> the shared memory, so it will not be possible to get the
>> page size afterwards.
>>
>> This patch adds a new page_shift field to the IOTLB entry,
>> so that the information will be passed at IOTLB cache
>> insertion time. The information is stored as a bit shift
>> value so that IOTLB entry keeps fitting in a single
>> cacheline.
>>
>> Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
>> ---
>>   lib/vhost/iotlb.c      | 46 ++++++++++++++++++++----------------------
>>   lib/vhost/iotlb.h      |  2 +-
>>   lib/vhost/vhost.h      |  1 -
>>   lib/vhost/vhost_user.c |  8 +++++---
>>   4 files changed, 28 insertions(+), 29 deletions(-)
>>
>> diff --git a/lib/vhost/iotlb.c b/lib/vhost/iotlb.c
>> index 51f118bc48..188dfb8e38 100644
>> --- a/lib/vhost/iotlb.c
>> +++ b/lib/vhost/iotlb.c
>> @@ -19,14 +19,14 @@ struct vhost_iotlb_entry {
>>   	uint64_t uaddr;
>>   	uint64_t uoffset;
>>   	uint64_t size;
>> +	uint8_t page_shift;
>>   	uint8_t perm;
>>   };
>>
>>   #define IOTLB_CACHE_SIZE 2048
>>
>>   static bool
>> -vhost_user_iotlb_share_page(struct vhost_iotlb_entry *a, struct
>> vhost_iotlb_entry *b,
>> -		uint64_t align)
>> +vhost_user_iotlb_share_page(struct vhost_iotlb_entry *a, struct
>> vhost_iotlb_entry *b)
>>   {
>>   	uint64_t a_start, a_end, b_start;
>>
>> @@ -38,44 +38,41 @@ vhost_user_iotlb_share_page(struct vhost_iotlb_entry
>> *a, struct vhost_iotlb_entr
>>
>>   	/* Assumes entry a lower than entry b */
>>   	RTE_ASSERT(a_start < b_start);
>> -	a_end = RTE_ALIGN_CEIL(a_start + a->size, align);
>> -	b_start = RTE_ALIGN_FLOOR(b_start, align);
>> +	a_end = RTE_ALIGN_CEIL(a_start + a->size, RTE_BIT64(a->page_shift));
>> +	b_start = RTE_ALIGN_FLOOR(b_start, RTE_BIT64(b->page_shift));
>>
>>   	return a_end > b_start;
>>   }
>>
>>   static void
>> -vhost_user_iotlb_set_dump(struct virtio_net *dev, struct
>> vhost_iotlb_entry *node)
>> +vhost_user_iotlb_set_dump(struct vhost_iotlb_entry *node)
>>   {
>> -	uint64_t align, start;
>> +	uint64_t start;
>>
>>   	start = node->uaddr + node->uoffset;
>> -	align = hua_to_alignment(dev->mem, (void *)(uintptr_t)start);
>> -
>> -	mem_set_dump((void *)(uintptr_t)start, node->size, false, align);
>> +	mem_set_dump((void *)(uintptr_t)start, node->size, false,
>> RTE_BIT64(node->page_shift));
>>   }
>>
>>   static void
>> -vhost_user_iotlb_clear_dump(struct virtio_net *dev, struct
>> vhost_iotlb_entry *node,
>> +vhost_user_iotlb_clear_dump(struct vhost_iotlb_entry *node,
>>   		struct vhost_iotlb_entry *prev, struct vhost_iotlb_entry *next)
>>   {
>> -	uint64_t align, start, end;
>> +	uint64_t start, end;
>>
>>   	start = node->uaddr + node->uoffset;
>>   	end = start + node->size;
>>
>> -	align = hua_to_alignment(dev->mem, (void *)(uintptr_t)start);
>> -
>>   	/* Skip first page if shared with previous entry. */
>> -	if (vhost_user_iotlb_share_page(prev, node, align))
>> -		start = RTE_ALIGN_CEIL(start, align);
>> +	if (vhost_user_iotlb_share_page(prev, node))
>> +		start = RTE_ALIGN_CEIL(start, RTE_BIT64(node->page_shift));
>>
>>   	/* Skip last page if shared with next entry. */
>> -	if (vhost_user_iotlb_share_page(node, next, align))
>> -		end = RTE_ALIGN_FLOOR(end, align);
>> +	if (vhost_user_iotlb_share_page(node, next))
>> +		end = RTE_ALIGN_FLOOR(end, RTE_BIT64(node->page_shift));
>>
>>   	if (end > start)
>> -		mem_set_dump((void *)(uintptr_t)start, end - start, false,
>> align);
>> +		mem_set_dump((void *)(uintptr_t)start, end - start, false,
>> +			RTE_BIT64(node->page_shift));
>>   }
>>
>>   static struct vhost_iotlb_entry *
>> @@ -198,7 +195,7 @@ vhost_user_iotlb_cache_remove_all(struct virtio_net
>> *dev)
>>   	vhost_user_iotlb_wr_lock_all(dev);
>>
>>   	RTE_TAILQ_FOREACH_SAFE(node, &dev->iotlb_list, next, temp_node) {
>> -		vhost_user_iotlb_set_dump(dev, node);
>> +		vhost_user_iotlb_set_dump(node);
>>
>>   		TAILQ_REMOVE(&dev->iotlb_list, node, next);
>>   		vhost_user_iotlb_pool_put(dev, node);
>> @@ -223,7 +220,7 @@ vhost_user_iotlb_cache_random_evict(struct virtio_net
>> *dev)
>>   		if (!entry_idx) {
>>   			struct vhost_iotlb_entry *next_node =
>> RTE_TAILQ_NEXT(node, next);
>>
>> -			vhost_user_iotlb_clear_dump(dev, node, prev_node,
>> next_node);
>> +			vhost_user_iotlb_clear_dump(node, prev_node, next_node);
>>
>>   			TAILQ_REMOVE(&dev->iotlb_list, node, next);
>>   			vhost_user_iotlb_pool_put(dev, node);
>> @@ -239,7 +236,7 @@ vhost_user_iotlb_cache_random_evict(struct virtio_net
>> *dev)
>>
>>   void
>>   vhost_user_iotlb_cache_insert(struct virtio_net *dev, uint64_t iova,
>> uint64_t uaddr,
>> -				uint64_t uoffset, uint64_t size, uint8_t perm)
>> +				uint64_t uoffset, uint64_t size, uint64_t
>> page_size, uint8_t perm)
>>   {
>>   	struct vhost_iotlb_entry *node, *new_node;
>>
>> @@ -263,6 +260,7 @@ vhost_user_iotlb_cache_insert(struct virtio_net *dev,
>> uint64_t iova, uint64_t ua
>>   	new_node->uaddr = uaddr;
>>   	new_node->uoffset = uoffset;
>>   	new_node->size = size;
>> +	new_node->page_shift = __builtin_ctz(page_size);
> 
> __builtin_ctzll ?

Indeed, that's better. Weird I don't get a warning!
Fixed in v1.

Thanks,
Maxime

> Thanks,
> Chenbo


^ permalink raw reply	[flat|nested] 79+ messages in thread
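
The truncation being discussed can be seen with a tiny standalone program. The 4 GiB page size is hypothetical, and the absence of a warning is expected: GCC typically only flags this implicit uint64_t to unsigned int conversion with -Wconversion, which is not part of the default warning set:

#include <stdint.h>
#include <stdio.h>

int
main(void)
{
	uint64_t page_size = UINT64_C(1) << 32; /* hypothetical 4 GiB page */

	/* The implicit conversion to unsigned int drops the high bits. */
	printf("as unsigned int: %u\n", (unsigned int)page_size);    /* 0  */
	printf("ctzll:           %d\n", __builtin_ctzll(page_size)); /* 32 */

	/* __builtin_ctz(page_size) would see 0 here, which is undefined
	 * behaviour, hence the switch to __builtin_ctzll() in v1.
	 */
	return 0;
}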

* RE: [RFC 10/27] vhost: retry translating IOVA after IOTLB miss
  2023-03-31 15:42 ` [RFC 10/27] vhost: retry translating IOVA after IOTLB miss Maxime Coquelin
@ 2023-05-05  5:07   ` Xia, Chenbo
  0 siblings, 0 replies; 79+ messages in thread
From: Xia, Chenbo @ 2023-05-05  5:07 UTC (permalink / raw)
  To: Maxime Coquelin, dev, david.marchand, mkp, fbl, jasowang, Liang,
	Cunming, Xie, Yongji, echaudro, eperezma, amorenoz

> -----Original Message-----
> From: Maxime Coquelin <maxime.coquelin@redhat.com>
> Sent: Friday, March 31, 2023 11:43 PM
> To: dev@dpdk.org; david.marchand@redhat.com; Xia, Chenbo
> <chenbo.xia@intel.com>; mkp@redhat.com; fbl@redhat.com;
> jasowang@redhat.com; Liang, Cunming <cunming.liang@intel.com>; Xie, Yongji
> <xieyongji@bytedance.com>; echaudro@redhat.com; eperezma@redhat.com;
> amorenoz@redhat.com
> Cc: Maxime Coquelin <maxime.coquelin@redhat.com>
> Subject: [RFC 10/27] vhost: retry translating IOVA after IOTLB miss
> 
> Vhost-user backend IOTLB misses and updates are
> asynchronous, so the IOVA address translation function
> just fails after having sent an IOTLB miss update if the
> needed entry was not in the IOTLB cache.
> 
> This is not the case for VDUSE, for which the needed IOTLB
> update is returned directly when sending an IOTLB miss.
> 
> This patch retries finding the needed entry in the
> IOTLB cache after having sent an IOTLB miss.
> 
> Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
> ---
>  lib/vhost/vhost.c | 6 ++++++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/lib/vhost/vhost.c b/lib/vhost/vhost.c
> index d35075b96c..4f16307e4d 100644
> --- a/lib/vhost/vhost.c
> +++ b/lib/vhost/vhost.c
> @@ -96,6 +96,12 @@ __vhost_iova_to_vva(struct virtio_net *dev, struct
> vhost_virtqueue *vq,
>  		vhost_user_iotlb_rd_lock(vq);
>  	}
> 
> +	tmp_size = *size;
> +	/* Retry in case of VDUSE, as it is synchronous */
> +	vva = vhost_user_iotlb_cache_find(dev, iova, &tmp_size, perm);
> +	if (tmp_size == *size)
> +		return vva;
> +
>  	return 0;
>  }
> 
> --
> 2.39.2

Reviewed-by: Chenbo Xia <chenbo.xia@intel.com> 

^ permalink raw reply	[flat|nested] 79+ messages in thread
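
The resulting control flow can be summarised as below, with prototypes standing in for the real cache lookup and the backend miss callback; only the shape of the retry matters here, and the locking done by the actual __vhost_iova_to_vva() is omitted:

#include <stdint.h>

/* Stand-ins for vhost_user_iotlb_cache_find() and the backend miss hook. */
uint64_t cache_find(uint64_t iova, uint64_t *size, uint8_t perm);
int send_iotlb_miss(uint64_t iova, uint8_t perm);

uint64_t
iova_to_vva(uint64_t iova, uint64_t *size, uint8_t perm)
{
	uint64_t tmp_size = *size;
	uint64_t vva = cache_find(iova, &tmp_size, perm);

	if (tmp_size == *size)
		return vva;

	/* Miss: ask the frontend (vhost-user) or the kernel (VDUSE) for the
	 * first unmapped address.
	 */
	if (send_iotlb_miss(iova + tmp_size, perm) < 0)
		return 0;

	/* Vhost-user answers asynchronously, so this lookup usually still
	 * misses and the caller retries later.  VDUSE resolves the miss
	 * before returning, so retrying right away can succeed.
	 */
	tmp_size = *size;
	vva = cache_find(iova, &tmp_size, perm);
	if (tmp_size == *size)
		return vva;

	return 0;
}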

* RE: [RFC 11/27] vhost: introduce backend ops
  2023-03-31 15:42 ` [RFC 11/27] vhost: introduce backend ops Maxime Coquelin
@ 2023-05-05  5:07   ` Xia, Chenbo
  0 siblings, 0 replies; 79+ messages in thread
From: Xia, Chenbo @ 2023-05-05  5:07 UTC (permalink / raw)
  To: Maxime Coquelin, dev, david.marchand, mkp, fbl, jasowang, Liang,
	Cunming, Xie, Yongji, echaudro, eperezma, amorenoz

> -----Original Message-----
> From: Maxime Coquelin <maxime.coquelin@redhat.com>
> Sent: Friday, March 31, 2023 11:43 PM
> To: dev@dpdk.org; david.marchand@redhat.com; Xia, Chenbo
> <chenbo.xia@intel.com>; mkp@redhat.com; fbl@redhat.com;
> jasowang@redhat.com; Liang, Cunming <cunming.liang@intel.com>; Xie, Yongji
> <xieyongji@bytedance.com>; echaudro@redhat.com; eperezma@redhat.com;
> amorenoz@redhat.com
> Cc: Maxime Coquelin <maxime.coquelin@redhat.com>
> Subject: [RFC 11/27] vhost: introduce backend ops
> 
> This patch introduces a backend ops struct, which enables
> calling backend-specific callbacks (Vhost-user, VDUSE) from
> shared code.
> 
> This is an empty shell for now; it will be filled in later
> patches.
> 
> Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
> ---
>  lib/vhost/socket.c     |  2 +-
>  lib/vhost/vhost.c      |  8 +++++++-
>  lib/vhost/vhost.h      | 10 +++++++++-
>  lib/vhost/vhost_user.c |  8 ++++++++
>  lib/vhost/vhost_user.h |  1 +
>  5 files changed, 26 insertions(+), 3 deletions(-)
> 
> diff --git a/lib/vhost/socket.c b/lib/vhost/socket.c
> index 669c322e12..ba54263824 100644
> --- a/lib/vhost/socket.c
> +++ b/lib/vhost/socket.c
> @@ -221,7 +221,7 @@ vhost_user_add_connection(int fd, struct
> vhost_user_socket *vsocket)
>  		return;
>  	}
> 
> -	vid = vhost_new_device();
> +	vid = vhost_user_new_device();
>  	if (vid == -1) {
>  		goto err;
>  	}
> diff --git a/lib/vhost/vhost.c b/lib/vhost/vhost.c
> index 4f16307e4d..41f212315e 100644
> --- a/lib/vhost/vhost.c
> +++ b/lib/vhost/vhost.c
> @@ -676,11 +676,16 @@ reset_device(struct virtio_net *dev)
>   * there is a new virtio device being attached).
>   */
>  int
> -vhost_new_device(void)
> +vhost_new_device(struct vhost_backend_ops *ops)
>  {
>  	struct virtio_net *dev;
>  	int i;
> 
> +	if (ops == NULL) {
> +		VHOST_LOG_CONFIG("device", ERR, "missing backend ops.\n");
> +		return -1;
> +	}
> +
>  	pthread_mutex_lock(&vhost_dev_lock);
>  	for (i = 0; i < RTE_MAX_VHOST_DEVICE; i++) {
>  		if (vhost_devices[i] == NULL)
> @@ -708,6 +713,7 @@ vhost_new_device(void)
>  	dev->backend_req_fd = -1;
>  	dev->postcopy_ufd = -1;
>  	rte_spinlock_init(&dev->backend_req_lock);
> +	dev->backend_ops = ops;
> 
>  	return i;
>  }
> diff --git a/lib/vhost/vhost.h b/lib/vhost/vhost.h
> index 4ace5ab081..cc5c707205 100644
> --- a/lib/vhost/vhost.h
> +++ b/lib/vhost/vhost.h
> @@ -89,6 +89,12 @@
>  	for (iter = val; iter < num; iter++)
>  #endif
> 
> +/**
> + * Structure that contains backend-specific ops.
> + */
> +struct vhost_backend_ops {
> +};
> +
>  /**
>   * Structure contains buffer address, length and descriptor index
>   * from vring to do scatter RX.
> @@ -513,6 +519,8 @@ struct virtio_net {
>  	void			*extern_data;
>  	/* pre and post vhost user message handlers for the device */
>  	struct rte_vhost_user_extern_ops extern_ops;
> +
> +	struct vhost_backend_ops *backend_ops;
>  } __rte_cache_aligned;
> 
>  static inline void
> @@ -812,7 +820,7 @@ get_device(int vid)
>  	return dev;
>  }
> 
> -int vhost_new_device(void);
> +int vhost_new_device(struct vhost_backend_ops *ops);
>  void cleanup_device(struct virtio_net *dev, int destroy);
>  void reset_device(struct virtio_net *dev);
>  void vhost_destroy_device(int);
> diff --git a/lib/vhost/vhost_user.c b/lib/vhost/vhost_user.c
> index a989f2c46d..2d5dec5bc1 100644
> --- a/lib/vhost/vhost_user.c
> +++ b/lib/vhost/vhost_user.c
> @@ -3464,3 +3464,11 @@ int rte_vhost_host_notifier_ctrl(int vid, uint16_t
> qid, bool enable)
> 
>  	return ret;
>  }
> +
> +static struct vhost_backend_ops vhost_user_backend_ops;
> +
> +int
> +vhost_user_new_device(void)
> +{
> +	return vhost_new_device(&vhost_user_backend_ops);
> +}
> diff --git a/lib/vhost/vhost_user.h b/lib/vhost/vhost_user.h
> index a0987a58f9..61456049c8 100644
> --- a/lib/vhost/vhost_user.h
> +++ b/lib/vhost/vhost_user.h
> @@ -185,5 +185,6 @@ int vhost_user_iotlb_miss(struct virtio_net *dev,
> uint64_t iova, uint8_t perm);
>  int read_fd_message(char *ifname, int sockfd, char *buf, int buflen, int
> *fds, int max_fds,
>  		int *fd_num);
>  int send_fd_message(char *ifname, int sockfd, char *buf, int buflen, int
> *fds, int fd_num);
> +int vhost_user_new_device(void);
> 
>  #endif
> --
> 2.39.2

Reviewed-by: Chenbo Xia <chenbo.xia@intel.com> 

^ permalink raw reply	[flat|nested] 79+ messages in thread

* RE: [RFC 12/27] vhost: add IOTLB cache entry removal callback
  2023-03-31 15:42 ` [RFC 12/27] vhost: add IOTLB cache entry removal callback Maxime Coquelin
@ 2023-05-05  5:07   ` Xia, Chenbo
  2023-05-25 11:20     ` Maxime Coquelin
  0 siblings, 1 reply; 79+ messages in thread
From: Xia, Chenbo @ 2023-05-05  5:07 UTC (permalink / raw)
  To: Maxime Coquelin, dev, david.marchand, mkp, fbl, jasowang, Liang,
	Cunming, Xie, Yongji, echaudro, eperezma, amorenoz

Hi Maxime,

> -----Original Message-----
> From: Maxime Coquelin <maxime.coquelin@redhat.com>
> Sent: Friday, March 31, 2023 11:43 PM
> To: dev@dpdk.org; david.marchand@redhat.com; Xia, Chenbo
> <chenbo.xia@intel.com>; mkp@redhat.com; fbl@redhat.com;
> jasowang@redhat.com; Liang, Cunming <cunming.liang@intel.com>; Xie, Yongji
> <xieyongji@bytedance.com>; echaudro@redhat.com; eperezma@redhat.com;
> amorenoz@redhat.com
> Cc: Maxime Coquelin <maxime.coquelin@redhat.com>
> Subject: [RFC 12/27] vhost: add IOTLB cache entry removal callback
> 
> VDUSE will need to munmap() the IOTLB entry on removal
> from the cache, as it performs mmap() before insertion.
> 
> This patch introduces a callback that the VDUSE layer will
> implement to achieve this.
> 
> Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
> ---
>  lib/vhost/iotlb.c | 12 ++++++++++++
>  lib/vhost/vhost.h |  4 ++++
>  2 files changed, 16 insertions(+)
> 
> diff --git a/lib/vhost/iotlb.c b/lib/vhost/iotlb.c
> index 188dfb8e38..86b0be62b4 100644
> --- a/lib/vhost/iotlb.c
> +++ b/lib/vhost/iotlb.c
> @@ -25,6 +25,15 @@ struct vhost_iotlb_entry {
> 
>  #define IOTLB_CACHE_SIZE 2048
> 
> +static void
> +vhost_user_iotlb_remove_notify(struct virtio_net *dev, struct
> vhost_iotlb_entry *entry)
> +{
> +	if (dev->backend_ops->iotlb_remove_notify == NULL)
> +		return;
> +
> +	dev->backend_ops->iotlb_remove_notify(entry->uaddr, entry->uoffset,
> entry->size);
> +}
> +
>  static bool
>  vhost_user_iotlb_share_page(struct vhost_iotlb_entry *a, struct
> vhost_iotlb_entry *b)
>  {
> @@ -198,6 +207,7 @@ vhost_user_iotlb_cache_remove_all(struct virtio_net
> *dev)
>  		vhost_user_iotlb_set_dump(node);
> 
>  		TAILQ_REMOVE(&dev->iotlb_list, node, next);
> +		vhost_user_iotlb_remove_notify(dev, node);
>  		vhost_user_iotlb_pool_put(dev, node);
>  	}
> 
> @@ -223,6 +233,7 @@ vhost_user_iotlb_cache_random_evict(struct virtio_net
> *dev)
>  			vhost_user_iotlb_clear_dump(node, prev_node, next_node);
> 
>  			TAILQ_REMOVE(&dev->iotlb_list, node, next);
> +			vhost_user_iotlb_remove_notify(dev, node);
>  			vhost_user_iotlb_pool_put(dev, node);
>  			dev->iotlb_cache_nr--;
>  			break;
> @@ -314,6 +325,7 @@ vhost_user_iotlb_cache_remove(struct virtio_net *dev,
> uint64_t iova, uint64_t si
>  			vhost_user_iotlb_clear_dump(node, prev_node, next_node);
> 
>  			TAILQ_REMOVE(&dev->iotlb_list, node, next);
> +			vhost_user_iotlb_remove_notify(dev, node);
>  			vhost_user_iotlb_pool_put(dev, node);
>  			dev->iotlb_cache_nr--;
>  		} else {
> diff --git a/lib/vhost/vhost.h b/lib/vhost/vhost.h
> index cc5c707205..2ad26f6951 100644
> --- a/lib/vhost/vhost.h
> +++ b/lib/vhost/vhost.h
> @@ -89,10 +89,14 @@
>  	for (iter = val; iter < num; iter++)
>  #endif
> 
> +struct virtio_net;

Adding this in patch 13 could be better since this patch is not using it.

Thanks,
Chenbo

> +typedef void (*vhost_iotlb_remove_notify)(uint64_t addr, uint64_t off,
> uint64_t size);
> +
>  /**
>   * Structure that contains backend-specific ops.
>   */
>  struct vhost_backend_ops {
> +	vhost_iotlb_remove_notify iotlb_remove_notify;
>  };
> 
>  /**
> --
> 2.39.2


^ permalink raw reply	[flat|nested] 79+ messages in thread
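
As an illustration of what the new hook allows, a VDUSE-side callback could look roughly like the sketch below. The actual implementation comes later in the series, so the length passed to munmap() here (offset plus size from the mmap() base) is an assumption of this sketch:

#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>

/* Hypothetical VDUSE callback: the cache keeps the mmap() base in 'addr'
 * and the usable range at addr + off, so the whole mapping of off + size
 * bytes is unmapped when the entry is evicted or invalidated.
 */
void
vduse_iotlb_remove_notify(uint64_t addr, uint64_t off, uint64_t size)
{
	if (munmap((void *)(uintptr_t)addr, off + size))
		perror("munmap");
}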

* RE: [RFC 14/27] vhost: add helper for interrupt injection
  2023-03-31 15:42 ` [RFC 14/27] vhost: add helper for interrupt injection Maxime Coquelin
@ 2023-05-05  5:07   ` Xia, Chenbo
  0 siblings, 0 replies; 79+ messages in thread
From: Xia, Chenbo @ 2023-05-05  5:07 UTC (permalink / raw)
  To: Maxime Coquelin, dev, david.marchand, mkp, fbl, jasowang, Liang,
	Cunming, Xie, Yongji, echaudro, eperezma, amorenoz

Hi Maxime,

> -----Original Message-----
> From: Maxime Coquelin <maxime.coquelin@redhat.com>
> Sent: Friday, March 31, 2023 11:43 PM
> To: dev@dpdk.org; david.marchand@redhat.com; Xia, Chenbo
> <chenbo.xia@intel.com>; mkp@redhat.com; fbl@redhat.com;
> jasowang@redhat.com; Liang, Cunming <cunming.liang@intel.com>; Xie, Yongji
> <xieyongji@bytedance.com>; echaudro@redhat.com; eperezma@redhat.com;
> amorenoz@redhat.com
> Cc: Maxime Coquelin <maxime.coquelin@redhat.com>
> Subject: [RFC 14/27] vhost: add helper for interrupt injection
> 
> Vhost-user uses eventfd to inject IRQs, but VDUSE uses
> an ioctl.
> 
> This patch prepares vhost_vring_call_split() and
> vhost_vring_call_packed() to support VDUSE by introducing
> a new helper.
> 
> It also adds a new counter to for guest notification

to for -> for?

With this fixed:

Reviewed-by: Chenbo Xia <chenbo.xia@intel.com> 

> failures, which could happen in case of an uninitialized
> call file descriptor, for example.
> 
> Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
> ---
>  lib/vhost/vhost.c      |  6 +++++
>  lib/vhost/vhost.h      | 54 +++++++++++++++++++++++-------------------
>  lib/vhost/vhost_user.c | 10 ++++++++
>  3 files changed, 46 insertions(+), 24 deletions(-)
> 
> diff --git a/lib/vhost/vhost.c b/lib/vhost/vhost.c
> index 790eb06b28..c07028f2b3 100644
> --- a/lib/vhost/vhost.c
> +++ b/lib/vhost/vhost.c
> @@ -44,6 +44,7 @@ static const struct vhost_vq_stats_name_off
> vhost_vq_stat_strings[] = {
>  	{"size_1024_1518_packets", offsetof(struct vhost_virtqueue,
> stats.size_bins[6])},
>  	{"size_1519_max_packets",  offsetof(struct vhost_virtqueue,
> stats.size_bins[7])},
>  	{"guest_notifications",    offsetof(struct vhost_virtqueue,
> stats.guest_notifications)},
> +	{"guest_notifications_error",    offsetof(struct vhost_virtqueue,
> stats.guest_notifications_error)},
>  	{"iotlb_hits",             offsetof(struct vhost_virtqueue,
> stats.iotlb_hits)},
>  	{"iotlb_misses",           offsetof(struct vhost_virtqueue,
> stats.iotlb_misses)},
>  	{"inflight_submitted",     offsetof(struct vhost_virtqueue,
> stats.inflight_submitted)},
> @@ -697,6 +698,11 @@ vhost_new_device(struct vhost_backend_ops *ops)
>  		return -1;
>  	}
> 
> +	if (ops->inject_irq == NULL) {
> +		VHOST_LOG_CONFIG("device", ERR, "missing IRQ injection backend
> op.\n");
> +		return -1;
> +	}
> +
>  	pthread_mutex_lock(&vhost_dev_lock);
>  	for (i = 0; i < RTE_MAX_VHOST_DEVICE; i++) {
>  		if (vhost_devices[i] == NULL)
> diff --git a/lib/vhost/vhost.h b/lib/vhost/vhost.h
> index ee7640e901..8f0875b4e2 100644
> --- a/lib/vhost/vhost.h
> +++ b/lib/vhost/vhost.h
> @@ -90,16 +90,20 @@
>  #endif
> 
>  struct virtio_net;
> +struct vhost_virtqueue;
> +
>  typedef void (*vhost_iotlb_remove_notify)(uint64_t addr, uint64_t off,
> uint64_t size);
> 
>  typedef int (*vhost_iotlb_miss_cb)(struct virtio_net *dev, uint64_t iova,
> uint8_t perm);
> 
> +typedef int (*vhost_vring_inject_irq_cb)(struct virtio_net *dev, struct
> vhost_virtqueue *vq);
>  /**
>   * Structure that contains backend-specific ops.
>   */
>  struct vhost_backend_ops {
>  	vhost_iotlb_remove_notify iotlb_remove_notify;
>  	vhost_iotlb_miss_cb iotlb_miss;
> +	vhost_vring_inject_irq_cb inject_irq;
>  };
> 
>  /**
> @@ -149,6 +153,7 @@ struct virtqueue_stats {
>  	/* Size bins in array as RFC 2819, undersized [0], 64 [1], etc */
>  	uint64_t size_bins[8];
>  	uint64_t guest_notifications;
> +	uint64_t guest_notifications_error;
>  	uint64_t iotlb_hits;
>  	uint64_t iotlb_misses;
>  	uint64_t inflight_submitted;
> @@ -900,6 +905,24 @@ vhost_need_event(uint16_t event_idx, uint16_t new_idx,
> uint16_t old)
>  	return (uint16_t)(new_idx - event_idx - 1) < (uint16_t)(new_idx -
> old);
>  }
> 
> +static __rte_always_inline void
> +vhost_vring_inject_irq(struct virtio_net *dev, struct vhost_virtqueue *vq)
> +{
> +	int ret;
> +
> +	ret = dev->backend_ops->inject_irq(dev, vq);
> +	if (ret) {
> +		if (dev->flags & VIRTIO_DEV_STATS_ENABLED)
> +			vq->stats.guest_notifications_error++;
> +		return;
> +	}
> +
> +	if (dev->flags & VIRTIO_DEV_STATS_ENABLED)
> +		vq->stats.guest_notifications++;
> +	if (dev->notify_ops->guest_notified)
> +		dev->notify_ops->guest_notified(dev->vid);
> +}
> +
>  static __rte_always_inline void
>  vhost_vring_call_split(struct virtio_net *dev, struct vhost_virtqueue *vq)
>  {
> @@ -919,25 +942,13 @@ vhost_vring_call_split(struct virtio_net *dev,
> struct vhost_virtqueue *vq)
>  			"%s: used_event_idx=%d, old=%d, new=%d\n",
>  			__func__, vhost_used_event(vq), old, new);
> 
> -		if ((vhost_need_event(vhost_used_event(vq), new, old) ||
> -					unlikely(!signalled_used_valid)) &&
> -				vq->callfd >= 0) {
> -			eventfd_write(vq->callfd, (eventfd_t) 1);
> -			if (dev->flags & VIRTIO_DEV_STATS_ENABLED)
> -				vq->stats.guest_notifications++;
> -			if (dev->notify_ops->guest_notified)
> -				dev->notify_ops->guest_notified(dev->vid);
> -		}
> +		if (vhost_need_event(vhost_used_event(vq), new, old) ||
> +					unlikely(!signalled_used_valid))
> +			vhost_vring_inject_irq(dev, vq);
>  	} else {
>  		/* Kick the guest if necessary. */
> -		if (!(vq->avail->flags & VRING_AVAIL_F_NO_INTERRUPT)
> -				&& (vq->callfd >= 0)) {
> -			eventfd_write(vq->callfd, (eventfd_t)1);
> -			if (dev->flags & VIRTIO_DEV_STATS_ENABLED)
> -				vq->stats.guest_notifications++;
> -			if (dev->notify_ops->guest_notified)
> -				dev->notify_ops->guest_notified(dev->vid);
> -		}
> +		if (!(vq->avail->flags & VRING_AVAIL_F_NO_INTERRUPT))
> +			vhost_vring_inject_irq(dev, vq);
>  	}
>  }
> 
> @@ -988,13 +999,8 @@ vhost_vring_call_packed(struct virtio_net *dev,
> struct vhost_virtqueue *vq)
>  	if (vhost_need_event(off, new, old))
>  		kick = true;
>  kick:
> -	if (kick && vq->callfd >= 0) {
> -		eventfd_write(vq->callfd, (eventfd_t)1);
> -		if (dev->flags & VIRTIO_DEV_STATS_ENABLED)
> -			vq->stats.guest_notifications++;
> -		if (dev->notify_ops->guest_notified)
> -			dev->notify_ops->guest_notified(dev->vid);
> -	}
> +	if (kick)
> +		vhost_vring_inject_irq(dev, vq);
>  }
> 
>  static __rte_always_inline void
> diff --git a/lib/vhost/vhost_user.c b/lib/vhost/vhost_user.c
> index 6a9f32972a..2e4a9fdea4 100644
> --- a/lib/vhost/vhost_user.c
> +++ b/lib/vhost/vhost_user.c
> @@ -3465,8 +3465,18 @@ int rte_vhost_host_notifier_ctrl(int vid, uint16_t
> qid, bool enable)
>  	return ret;
>  }
> 
> +static int
> +vhost_user_inject_irq(struct virtio_net *dev __rte_unused, struct
> vhost_virtqueue *vq)
> +{
> +	if (vq->callfd < 0)
> +		return -1;
> +
> +	return eventfd_write(vq->callfd, (eventfd_t)1);
> +}
> +
>  static struct vhost_backend_ops vhost_user_backend_ops = {
>  	.iotlb_miss = vhost_user_iotlb_miss,
> +	.inject_irq = vhost_user_inject_irq,
>  };
> 
>  int
> --
> 2.39.2

^ permalink raw reply	[flat|nested] 79+ messages in thread
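
A condensed sketch of the split the patch introduces: the shared helper only accounts for successes and the new failure statistic, while the transport decides how to actually notify the guest (an eventfd write for vhost-user, an ioctl on the VDUSE character device once VDUSE support lands later in the series). The structures are trimmed down for illustration:

#include <stdint.h>
#include <sys/eventfd.h>

struct vhost_virtqueue {
	int callfd;
	struct {
		uint64_t guest_notifications;
		uint64_t guest_notifications_error;
	} stats;
};

struct virtio_net;

/* Vhost-user flavour: kick the guest through the call eventfd. */
int
vhost_user_inject_irq(struct virtio_net *dev, struct vhost_virtqueue *vq)
{
	(void)dev;

	if (vq->callfd < 0)
		return -1;

	return eventfd_write(vq->callfd, (eventfd_t)1);
}

/* Backend-agnostic helper: one place to count notification successes and
 * the new failure counter, whatever the injection mechanism.
 */
void
vring_inject_irq(struct virtio_net *dev, struct vhost_virtqueue *vq,
		int (*inject)(struct virtio_net *, struct vhost_virtqueue *))
{
	if (inject(dev, vq)) {
		vq->stats.guest_notifications_error++;
		return;
	}

	vq->stats.guest_notifications++;
}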

* RE: [RFC 15/27] vhost: add API to set max queue pairs
  2023-03-31 15:42 ` [RFC 15/27] vhost: add API to set max queue pairs Maxime Coquelin
@ 2023-05-05  5:07   ` Xia, Chenbo
  2023-05-25 11:23     ` Maxime Coquelin
  0 siblings, 1 reply; 79+ messages in thread
From: Xia, Chenbo @ 2023-05-05  5:07 UTC (permalink / raw)
  To: Maxime Coquelin, dev, david.marchand, mkp, fbl, jasowang, Liang,
	Cunming, Xie, Yongji, echaudro, eperezma, amorenoz

Hi Maxime,

> -----Original Message-----
> From: Maxime Coquelin <maxime.coquelin@redhat.com>
> Sent: Friday, March 31, 2023 11:43 PM
> To: dev@dpdk.org; david.marchand@redhat.com; Xia, Chenbo
> <chenbo.xia@intel.com>; mkp@redhat.com; fbl@redhat.com;
> jasowang@redhat.com; Liang, Cunming <cunming.liang@intel.com>; Xie, Yongji
> <xieyongji@bytedance.com>; echaudro@redhat.com; eperezma@redhat.com;
> amorenoz@redhat.com
> Cc: Maxime Coquelin <maxime.coquelin@redhat.com>
> Subject: [RFC 15/27] vhost: add API to set max queue pairs
> 
> This patch introduces a new rte_vhost_driver_set_max_queues
> API as preliminary work for multiqueue support with VDUSE.
> 
> Indeed, with VDUSE we need to pre-allocate the vrings at
> device creation time, so we need such an API to avoid
> allocating all 128 queue pairs supported by the Vhost library.
> 
> Calling the API is optional; 128 queue pairs remain the
> default.
> 
> Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
> ---
>  doc/guides/prog_guide/vhost_lib.rst |  4 ++++
>  lib/vhost/rte_vhost.h               | 17 ++++++++++++++
>  lib/vhost/socket.c                  | 36 +++++++++++++++++++++++++++--
>  lib/vhost/version.map               |  3 +++
>  4 files changed, 58 insertions(+), 2 deletions(-)

Also add changes in release notes? Btw: somewhere we should also mention vduse
support is added in release notes.

Thanks,
Chenbo

> 
> diff --git a/doc/guides/prog_guide/vhost_lib.rst
> b/doc/guides/prog_guide/vhost_lib.rst
> index e8bb8c9b7b..cd4b109139 100644
> --- a/doc/guides/prog_guide/vhost_lib.rst
> +++ b/doc/guides/prog_guide/vhost_lib.rst
> @@ -334,6 +334,10 @@ The following is an overview of some key Vhost API
> functions:
>    Clean DMA vChannel finished to use. After this function is called,
>    the specified DMA vChannel should no longer be used by the Vhost
> library.
> 
> +* ``rte_vhost_driver_set_max_queue_num(path, max_queue_pairs)``
> +
> +  Set the maximum number of queue pairs supported by the device.
> +
>  Vhost-user Implementations
>  --------------------------
> 
> diff --git a/lib/vhost/rte_vhost.h b/lib/vhost/rte_vhost.h
> index 58a5d4be92..44cbfcb469 100644
> --- a/lib/vhost/rte_vhost.h
> +++ b/lib/vhost/rte_vhost.h
> @@ -588,6 +588,23 @@ rte_vhost_driver_get_protocol_features(const char
> *path,
>  int
>  rte_vhost_driver_get_queue_num(const char *path, uint32_t *queue_num);
> 
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change, or be removed, without prior
> notice.
> + *
> + * Set the maximum number of queue pairs supported by the device.
> + *
> + * @param path
> + *  The vhost-user socket file path
> + * @param max_queue_pairs
> + *  The maximum number of queue pairs
> + * @return
> + *  0 on success, -1 on failure
> + */
> +__rte_experimental
> +int
> +rte_vhost_driver_set_max_queue_num(const char *path, uint32_t
> max_queue_pairs);
> +
>  /**
>   * Get the feature bits after negotiation
>   *
> diff --git a/lib/vhost/socket.c b/lib/vhost/socket.c
> index ba54263824..e95c3ffeac 100644
> --- a/lib/vhost/socket.c
> +++ b/lib/vhost/socket.c
> @@ -56,6 +56,8 @@ struct vhost_user_socket {
> 
>  	uint64_t protocol_features;
> 
> +	uint32_t max_queue_pairs;
> +
>  	struct rte_vdpa_device *vdpa_dev;
> 
>  	struct rte_vhost_device_ops const *notify_ops;
> @@ -821,7 +823,7 @@ rte_vhost_driver_get_queue_num(const char *path,
> uint32_t *queue_num)
> 
>  	vdpa_dev = vsocket->vdpa_dev;
>  	if (!vdpa_dev) {
> -		*queue_num = VHOST_MAX_QUEUE_PAIRS;
> +		*queue_num = vsocket->max_queue_pairs;
>  		goto unlock_exit;
>  	}
> 
> @@ -831,7 +833,36 @@ rte_vhost_driver_get_queue_num(const char *path,
> uint32_t *queue_num)
>  		goto unlock_exit;
>  	}
> 
> -	*queue_num = RTE_MIN((uint32_t)VHOST_MAX_QUEUE_PAIRS,
> vdpa_queue_num);
> +	*queue_num = RTE_MIN(vsocket->max_queue_pairs, vdpa_queue_num);
> +
> +unlock_exit:
> +	pthread_mutex_unlock(&vhost_user.mutex);
> +	return ret;
> +}
> +
> +int
> +rte_vhost_driver_set_max_queue_num(const char *path, uint32_t
> max_queue_pairs)
> +{
> +	struct vhost_user_socket *vsocket;
> +	int ret = 0;
> +
> +	VHOST_LOG_CONFIG(path, INFO, "Setting max queue pairs to %u\n",
> max_queue_pairs);
> +
> +	if (max_queue_pairs > VHOST_MAX_QUEUE_PAIRS) {
> +		VHOST_LOG_CONFIG(path, ERR, "Library only supports up to %u
> queue pairs\n",
> +				VHOST_MAX_QUEUE_PAIRS);
> +		return -1;
> +	}
> +
> +	pthread_mutex_lock(&vhost_user.mutex);
> +	vsocket = find_vhost_user_socket(path);
> +	if (!vsocket) {
> +		VHOST_LOG_CONFIG(path, ERR, "socket file is not registered
> yet.\n");
> +		ret = -1;
> +		goto unlock_exit;
> +	}
> +
> +	vsocket->max_queue_pairs = max_queue_pairs;
> 
>  unlock_exit:
>  	pthread_mutex_unlock(&vhost_user.mutex);
> @@ -890,6 +921,7 @@ rte_vhost_driver_register(const char *path, uint64_t
> flags)
>  		goto out_free;
>  	}
>  	vsocket->vdpa_dev = NULL;
> +	vsocket->max_queue_pairs = VHOST_MAX_QUEUE_PAIRS;
>  	vsocket->extbuf = flags & RTE_VHOST_USER_EXTBUF_SUPPORT;
>  	vsocket->linearbuf = flags & RTE_VHOST_USER_LINEARBUF_SUPPORT;
>  	vsocket->async_copy = flags & RTE_VHOST_USER_ASYNC_COPY;
> diff --git a/lib/vhost/version.map b/lib/vhost/version.map
> index d322a4a888..dffb126aa8 100644
> --- a/lib/vhost/version.map
> +++ b/lib/vhost/version.map
> @@ -98,6 +98,9 @@ EXPERIMENTAL {
>  	# added in 22.11
>  	rte_vhost_async_dma_unconfigure;
>  	rte_vhost_vring_call_nonblock;
> +
> +	# added in 23.07
> +	rte_vhost_driver_set_max_queue_num;
>  };
> 
>  INTERNAL {
> --
> 2.39.2


^ permalink raw reply	[flat|nested] 79+ messages in thread
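
As a usage note, the new API sits between registration and start in an application's setup path. A minimal sketch follows, with a placeholder socket path and queue count; callback registration and cleanup are omitted:

    #include <rte_vhost.h>

    /* Sketch: cap the device at 4 queue pairs instead of the 128 the
     * library would otherwise expose (and, for VDUSE, pre-allocate). */
    static int
    vhost_port_setup(const char *path)	/* e.g. "/tmp/vhost0.sock" (placeholder) */
    {
    	if (rte_vhost_driver_register(path, 0) < 0)
    		return -1;

    	/* Must be called after register and before start */
    	if (rte_vhost_driver_set_max_queue_num(path, 4) < 0)
    		return -1;

    	return rte_vhost_driver_start(path);
    }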

* RE: [RFC 16/27] net/vhost: use API to set max queue pairs
  2023-03-31 15:42 ` [RFC 16/27] net/vhost: use " Maxime Coquelin
@ 2023-05-05  5:07   ` Xia, Chenbo
  0 siblings, 0 replies; 79+ messages in thread
From: Xia, Chenbo @ 2023-05-05  5:07 UTC (permalink / raw)
  To: Maxime Coquelin, dev, david.marchand, mkp, fbl, jasowang, Liang,
	Cunming, Xie, Yongji, echaudro, eperezma, amorenoz

> -----Original Message-----
> From: Maxime Coquelin <maxime.coquelin@redhat.com>
> Sent: Friday, March 31, 2023 11:43 PM
> To: dev@dpdk.org; david.marchand@redhat.com; Xia, Chenbo
> <chenbo.xia@intel.com>; mkp@redhat.com; fbl@redhat.com;
> jasowang@redhat.com; Liang, Cunming <cunming.liang@intel.com>; Xie, Yongji
> <xieyongji@bytedance.com>; echaudro@redhat.com; eperezma@redhat.com;
> amorenoz@redhat.com
> Cc: Maxime Coquelin <maxime.coquelin@redhat.com>
> Subject: [RFC 16/27] net/vhost: use API to set max queue pairs
> 
> In order to support multiqueue with VDUSE, we need to
> be able to limit the maximum number of queue pairs, to
> avoid unnecessary memory consumption since the maximum
> number of queue pairs needs to be allocated at device
> creation time, as opposed to Vhost-user, which allocates
> them only when the frontend initializes them.
> 
> Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
> ---
>  drivers/net/vhost/rte_eth_vhost.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/drivers/net/vhost/rte_eth_vhost.c
> b/drivers/net/vhost/rte_eth_vhost.c
> index 62ef955ebc..8d37ec9775 100644
> --- a/drivers/net/vhost/rte_eth_vhost.c
> +++ b/drivers/net/vhost/rte_eth_vhost.c
> @@ -1013,6 +1013,9 @@ vhost_driver_setup(struct rte_eth_dev *eth_dev)
>  			goto drv_unreg;
>  	}
> 
> +	if (rte_vhost_driver_set_max_queue_num(internal->iface_name,
> internal->max_queues))
> +		goto drv_unreg;
> +
>  	if (rte_vhost_driver_callback_register(internal->iface_name,
>  					       &vhost_ops) < 0) {
>  		VHOST_LOG(ERR, "Can't register callbacks\n");
> --
> 2.39.2

Reviewed-by: Chenbo Xia <chenbo.xia@intel.com> 

^ permalink raw reply	[flat|nested] 79+ messages in thread

* RE: [RFC 00/27] Add VDUSE support to Vhost library
  2023-03-31 15:42 [RFC 00/27] Add VDUSE support to Vhost library Maxime Coquelin
                   ` (28 preceding siblings ...)
  2023-04-12 11:33 ` Ferruh Yigit
@ 2023-05-05  5:53 ` Xia, Chenbo
  29 siblings, 0 replies; 79+ messages in thread
From: Xia, Chenbo @ 2023-05-05  5:53 UTC (permalink / raw)
  To: Maxime Coquelin, dev, david.marchand, mkp, fbl, jasowang, Liang,
	Cunming, Xie, Yongji, echaudro, eperezma, amorenoz

Hi Maxime,

> -----Original Message-----
> From: Maxime Coquelin <maxime.coquelin@redhat.com>
> Sent: Friday, March 31, 2023 11:43 PM
> To: dev@dpdk.org; david.marchand@redhat.com; Xia, Chenbo
> <chenbo.xia@intel.com>; mkp@redhat.com; fbl@redhat.com;
> jasowang@redhat.com; Liang, Cunming <cunming.liang@intel.com>; Xie, Yongji
> <xieyongji@bytedance.com>; echaudro@redhat.com; eperezma@redhat.com;
> amorenoz@redhat.com
> Cc: Maxime Coquelin <maxime.coquelin@redhat.com>
> Subject: [RFC 00/27] Add VDUSE support to Vhost library
> 
> This series introduces a new type of backend, VDUSE,
> to the Vhost library.
> 
> VDUSE stands for vDPA device in Userspace, it enables
> implementing a Virtio device in userspace and have it
> attached to the Kernel vDPA bus.
> 
> Once attached to the vDPA bus, it can be used either by
> Kernel Virtio drivers, like virtio-net in our case, via
> the virtio-vdpa driver. Doing that, the device is visible
> to the Kernel networking stack and is exposed to userspace
> as a regular netdev.
> 
> It can also be exposed to userspace thanks to the
> vhost-vdpa driver, via a vhost-vdpa chardev that can be
> passed to QEMU or Virtio-user PMD.
> 
> While VDUSE support is already available in upstream
> Kernel, a couple of patches are required to support
> network device type:
> 
> https://gitlab.com/mcoquelin/linux/-/tree/vduse_networking_poc
> 
> In order to attach the created VDUSE device to the vDPA
> bus, a recent iproute2 version containing the vdpa tool is
> required.
> --
> 2.39.2

Btw: when this series is merged in the future, will Red Hat run all the
VDUSE test cases in every release?

Thanks,
Chenbo


^ permalink raw reply	[flat|nested] 79+ messages in thread

* RE: [RFC 17/27] vhost: add control virtqueue support
  2023-03-31 15:42 ` [RFC 17/27] vhost: add control virtqueue support Maxime Coquelin
@ 2023-05-09  5:29   ` Xia, Chenbo
  0 siblings, 0 replies; 79+ messages in thread
From: Xia, Chenbo @ 2023-05-09  5:29 UTC (permalink / raw)
  To: Maxime Coquelin, dev, david.marchand, mkp, fbl, jasowang, Liang,
	Cunming, Xie, Yongji, echaudro, eperezma, amorenoz

Hi Maxime,

> -----Original Message-----
> From: Maxime Coquelin <maxime.coquelin@redhat.com>
> Sent: Friday, March 31, 2023 11:43 PM
> To: dev@dpdk.org; david.marchand@redhat.com; Xia, Chenbo
> <chenbo.xia@intel.com>; mkp@redhat.com; fbl@redhat.com;
> jasowang@redhat.com; Liang, Cunming <cunming.liang@intel.com>; Xie, Yongji
> <xieyongji@bytedance.com>; echaudro@redhat.com; eperezma@redhat.com;
> amorenoz@redhat.com
> Cc: Maxime Coquelin <maxime.coquelin@redhat.com>
> Subject: [RFC 17/27] vhost: add control virtqueue support
> 
> In order to support multi-queue with VDUSE, having
> control queue support in required.

in -> is

> 
> This patch adds control queue implementation, it will be
> used later when adding VDUSE support. Only split ring
> layout is supported for now, packed ring support will be
> added later.
> 
> Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
> ---
>  lib/vhost/meson.build       |   1 +
>  lib/vhost/vhost.h           |   2 +
>  lib/vhost/virtio_net_ctrl.c | 282 ++++++++++++++++++++++++++++++++++++
>  lib/vhost/virtio_net_ctrl.h |  10 ++
>  4 files changed, 295 insertions(+)
>  create mode 100644 lib/vhost/virtio_net_ctrl.c
>  create mode 100644 lib/vhost/virtio_net_ctrl.h
> 
> diff --git a/lib/vhost/meson.build b/lib/vhost/meson.build
> index 197a51d936..cdcd403df3 100644
> --- a/lib/vhost/meson.build
> +++ b/lib/vhost/meson.build
> @@ -28,6 +28,7 @@ sources = files(
>          'vhost_crypto.c',
>          'vhost_user.c',
>          'virtio_net.c',
> +        'virtio_net_ctrl.c',
>  )
>  headers = files(
>          'rte_vdpa.h',
> diff --git a/lib/vhost/vhost.h b/lib/vhost/vhost.h
> index 8f0875b4e2..76663aed24 100644
> --- a/lib/vhost/vhost.h
> +++ b/lib/vhost/vhost.h
> @@ -525,6 +525,8 @@ struct virtio_net {
>  	int			postcopy_ufd;
>  	int			postcopy_listening;
> 
> +	struct vhost_virtqueue	*cvq;
> +
>  	struct rte_vdpa_device *vdpa_dev;
> 
>  	/* context data for the external message handlers */
> diff --git a/lib/vhost/virtio_net_ctrl.c b/lib/vhost/virtio_net_ctrl.c
> new file mode 100644
> index 0000000000..16ea63b42f
> --- /dev/null
> +++ b/lib/vhost/virtio_net_ctrl.c
> @@ -0,0 +1,282 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + * Copyright (c) 2023 Red Hat, Inc.
> + */
> +
> +#undef RTE_ANNOTATE_LOCKS
> +
> +#include <stdint.h>
> +#include <stdio.h>
> +#include <unistd.h>
> +
> +#include "vhost.h"
> +#include "virtio_net_ctrl.h"
> +
> +struct virtio_net_ctrl {
> +	uint8_t class;
> +	uint8_t command;
> +	uint8_t command_data[];
> +};
> +
> +struct virtio_net_ctrl_elem {
> +	struct virtio_net_ctrl *ctrl_req;
> +	uint16_t head_idx;
> +	uint16_t n_descs;
> +	uint8_t *desc_ack;
> +};
> +
> +static int
> +virtio_net_ctrl_pop(struct virtio_net *dev, struct virtio_net_ctrl_elem
> *ctrl_elem)
> +{
> +	struct vhost_virtqueue *cvq = dev->cvq;
> +	uint16_t avail_idx, desc_idx, n_descs = 0;
> +	uint64_t desc_len, desc_addr, desc_iova, data_len = 0;
> +	uint8_t *ctrl_req;
> +	struct vring_desc *descs;
> +
> +	avail_idx = __atomic_load_n(&cvq->avail->idx, __ATOMIC_ACQUIRE);
> +	if (avail_idx == cvq->last_avail_idx) {
> +		VHOST_LOG_CONFIG(dev->ifname, DEBUG, "Control queue empty\n");
> +		return 0;
> +	}
> +
> +	desc_idx = cvq->avail->ring[cvq->last_avail_idx];
> +	if (desc_idx >= cvq->size) {
> +		VHOST_LOG_CONFIG(dev->ifname, ERR, "Out of range desc index,
> dropping\n");
> +		goto err;
> +	}
> +
> +	ctrl_elem->head_idx = desc_idx;
> +
> +	if (cvq->desc[desc_idx].flags & VRING_DESC_F_INDIRECT) {
> +		desc_len = cvq->desc[desc_idx].len;
> +		desc_iova = cvq->desc[desc_idx].addr;
> +
> +		descs = (struct vring_desc *)(uintptr_t)vhost_iova_to_vva(dev,
> cvq,
> +					desc_iova, &desc_len, VHOST_ACCESS_RO);
> +		if (!descs || desc_len != cvq->desc[desc_idx].len) {
> +			VHOST_LOG_CONFIG(dev->ifname, ERR, "Failed to map ctrl
> indirect descs\n");
> +			goto err;
> +		}
> +
> +		desc_idx = 0;
> +	} else {
> +		descs = cvq->desc;
> +	}
> +
> +	while (1) {
> +		desc_len = descs[desc_idx].len;
> +		desc_iova = descs[desc_idx].addr;
> +
> +		n_descs++;
> +
> +		if (descs[desc_idx].flags & VRING_DESC_F_WRITE) {
> +			if (ctrl_elem->desc_ack) {
> +				VHOST_LOG_CONFIG(dev->ifname, ERR,
> +						"Unexpected ctrl chain layout\n");
> +				goto err;
> +			}
> +
> +			if (desc_len != sizeof(uint8_t)) {
> +				VHOST_LOG_CONFIG(dev->ifname, ERR,
> +						"Invalid ack size for ctrl req,
> dropping\n");
> +				goto err;
> +			}
> +
> +			ctrl_elem->desc_ack = (uint8_t
> *)(uintptr_t)vhost_iova_to_vva(dev, cvq,
> +					desc_iova, &desc_len, VHOST_ACCESS_WO);
> +			if (!ctrl_elem->desc_ack || desc_len != sizeof(uint8_t))
> {
> +				VHOST_LOG_CONFIG(dev->ifname, ERR,
> +						"Failed to map ctrl ack descriptor\n");
> +				goto err;
> +			}
> +		} else {
> +			if (ctrl_elem->desc_ack) {
> +				VHOST_LOG_CONFIG(dev->ifname, ERR,
> +						"Unexpected ctrl chain layout\n");
> +				goto err;
> +			}
> +
> +			data_len += desc_len;
> +		}
> +
> +		if (!(descs[desc_idx].flags & VRING_DESC_F_NEXT))
> +			break;
> +
> +		desc_idx = descs[desc_idx].next;
> +	}
> +
> +	desc_idx = ctrl_elem->head_idx;
> +
> +	if (cvq->desc[desc_idx].flags & VRING_DESC_F_INDIRECT)
> +		ctrl_elem->n_descs = 1;
> +	else
> +		ctrl_elem->n_descs = n_descs;
> +
> +	if (!ctrl_elem->desc_ack) {
> +		VHOST_LOG_CONFIG(dev->ifname, ERR, "Missing ctrl ack
> descriptor\n");
> +		goto err;
> +	}
> +
> +	if (data_len < sizeof(ctrl_elem->ctrl_req->class) +
> sizeof(ctrl_elem->ctrl_req->command)) {
> +		VHOST_LOG_CONFIG(dev->ifname, ERR, "Invalid control header
> size\n");
> +		goto err;
> +	}
> +
> +	ctrl_elem->ctrl_req = malloc(data_len);
> +	if (!ctrl_elem->ctrl_req) {
> +		VHOST_LOG_CONFIG(dev->ifname, ERR, "Failed to alloc ctrl
> request\n");
> +		goto err;
> +	}
> +
> +	ctrl_req = (uint8_t *)ctrl_elem->ctrl_req;
> +
> +	if (cvq->desc[desc_idx].flags & VRING_DESC_F_INDIRECT) {
> +		desc_len = cvq->desc[desc_idx].len;
> +		desc_iova = cvq->desc[desc_idx].addr;
> +
> +		descs = (struct vring_desc *)(uintptr_t)vhost_iova_to_vva(dev,
> cvq,
> +					desc_iova, &desc_len, VHOST_ACCESS_RO);
> +		if (!descs || desc_len != cvq->desc[desc_idx].len) {
> +			VHOST_LOG_CONFIG(dev->ifname, ERR, "Failed to map ctrl
> indirect descs\n");
> +			goto err;

goto free_err?

Thanks,
Chenbo 

> +		}
> +
> +		desc_idx = 0;
> +	} else {
> +		descs = cvq->desc;
> +	}
> +
> +	while (!(descs[desc_idx].flags & VRING_DESC_F_WRITE)) {
> +		desc_len = descs[desc_idx].len;
> +		desc_iova = descs[desc_idx].addr;
> +
> +		desc_addr = vhost_iova_to_vva(dev, cvq, desc_iova, &desc_len,
> VHOST_ACCESS_RO);
> +		if (!desc_addr || desc_len < descs[desc_idx].len) {
> +			VHOST_LOG_CONFIG(dev->ifname, ERR, "Failed to map ctrl
> descriptor\n");
> +			goto free_err;
> +		}
> +
> +		memcpy(ctrl_req, (void *)(uintptr_t)desc_addr, desc_len);
> +		ctrl_req += desc_len;
> +
> +		if (!(descs[desc_idx].flags & VRING_DESC_F_NEXT))
> +			break;
> +
> +		desc_idx = descs[desc_idx].next;
> +	}
> +
> +	cvq->last_avail_idx++;
> +	if (cvq->last_avail_idx >= cvq->size)
> +		cvq->last_avail_idx -= cvq->size;
> +
> +	if (dev->features & (1ULL << VIRTIO_RING_F_EVENT_IDX))
> +		vhost_avail_event(cvq) = cvq->last_avail_idx;
> +
> +	return 1;
> +
> +free_err:
> +	free(ctrl_elem->ctrl_req);
> +err:
> +	cvq->last_avail_idx++;
> +	if (cvq->last_avail_idx >= cvq->size)
> +		cvq->last_avail_idx -= cvq->size;
> +
> +	if (dev->features & (1ULL << VIRTIO_RING_F_EVENT_IDX))
> +		vhost_avail_event(cvq) = cvq->last_avail_idx;
> +
> +	return -1;
> +}
> +
> +static uint8_t
> +virtio_net_ctrl_handle_req(struct virtio_net *dev, struct virtio_net_ctrl
> *ctrl_req)
> +{
> +	uint8_t ret = VIRTIO_NET_ERR;
> +
> +	if (ctrl_req->class == VIRTIO_NET_CTRL_MQ &&
> +			ctrl_req->command == VIRTIO_NET_CTRL_MQ_VQ_PAIRS_SET) {
> +		uint16_t queue_pairs;
> +		uint32_t i;
> +
> +		queue_pairs = *(uint16_t *)(uintptr_t)ctrl_req->command_data;
> +		VHOST_LOG_CONFIG(dev->ifname, INFO, "Ctrl req: MQ %u queue
> pairs\n", queue_pairs);
> +		ret = VIRTIO_NET_OK;
> +
> +		for (i = 0; i < dev->nr_vring; i++) {
> +			struct vhost_virtqueue *vq = dev->virtqueue[i];
> +			bool enable;
> +
> +			if (vq == dev->cvq)
> +				continue;
> +
> +			if (i < queue_pairs * 2)
> +				enable = true;
> +			else
> +				enable = false;
> +
> +			vq->enabled = enable;
> +			if (dev->notify_ops->vring_state_changed)
> +				dev->notify_ops->vring_state_changed(dev->vid, i,
> enable);
> +		}
> +	}
> +
> +	return ret;
> +}
> +
> +static int
> +virtio_net_ctrl_push(struct virtio_net *dev, struct virtio_net_ctrl_elem
> *ctrl_elem)
> +{
> +	struct vhost_virtqueue *cvq = dev->cvq;
> +	struct vring_used_elem *used_elem;
> +
> +	used_elem = &cvq->used->ring[cvq->last_used_idx];
> +	used_elem->id = ctrl_elem->head_idx;
> +	used_elem->len = ctrl_elem->n_descs;
> +
> +	cvq->last_used_idx++;
> +	if (cvq->last_used_idx >= cvq->size)
> +		cvq->last_used_idx -= cvq->size;
> +
> +	__atomic_store_n(&cvq->used->idx, cvq->last_used_idx,
> __ATOMIC_RELEASE);
> +
> +	free(ctrl_elem->ctrl_req);
> +
> +	return 0;
> +}
> +
> +int
> +virtio_net_ctrl_handle(struct virtio_net *dev)
> +{
> +	int ret = 0;
> +
> +	if (dev->features & (1ULL << VIRTIO_F_RING_PACKED)) {
> +		VHOST_LOG_CONFIG(dev->ifname, ERR, "Packed ring not supported
> yet\n");
> +		return -1;
> +	}
> +
> +	if (!dev->cvq) {
> +		VHOST_LOG_CONFIG(dev->ifname, ERR, "missing control queue\n");
> +		return -1;
> +	}
> +
> +	rte_spinlock_lock(&dev->cvq->access_lock);
> +
> +	while (1) {
> +		struct virtio_net_ctrl_elem ctrl_elem;
> +
> +		memset(&ctrl_elem, 0, sizeof(struct virtio_net_ctrl_elem));
> +
> +		ret = virtio_net_ctrl_pop(dev, &ctrl_elem);
> +		if (ret <= 0)
> +			break;
> +
> +		*ctrl_elem.desc_ack = virtio_net_ctrl_handle_req(dev,
> ctrl_elem.ctrl_req);
> +
> +		ret = virtio_net_ctrl_push(dev, &ctrl_elem);
> +		if (ret < 0)
> +			break;
> +	}
> +
> +	rte_spinlock_unlock(&dev->cvq->access_lock);
> +
> +	return ret;
> +}
> diff --git a/lib/vhost/virtio_net_ctrl.h b/lib/vhost/virtio_net_ctrl.h
> new file mode 100644
> index 0000000000..9a90f4b9da
> --- /dev/null
> +++ b/lib/vhost/virtio_net_ctrl.h
> @@ -0,0 +1,10 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + * Copyright (c) 2023 Red Hat, Inc.
> + */
> +
> +#ifndef _VIRTIO_NET_CTRL_H
> +#define _VIRTIO_NET_CTRL_H
> +
> +int virtio_net_ctrl_handle(struct virtio_net *dev);
> +
> +#endif
> --
> 2.39.2


^ permalink raw reply	[flat|nested] 79+ messages in thread
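
For reference, the descriptor chain that virtio_net_ctrl_pop() walks follows the standard Virtio control command layout: one or more device-readable descriptors carrying the header plus command-specific data, then a single device-writable one-byte ack. A sketch of the only command handled by this patch; the struct name below is local to the example, not taken from the patch:

    #include <stdint.h>

    /* Device-readable part of a VIRTIO_NET_CTRL_MQ_VQ_PAIRS_SET request */
    struct ctrl_mq_req_sketch {
    	uint8_t  class;            /* VIRTIO_NET_CTRL_MQ */
    	uint8_t  command;          /* VIRTIO_NET_CTRL_MQ_VQ_PAIRS_SET */
    	uint16_t virtqueue_pairs;  /* requested number of queue pairs */
    } __attribute__((packed));

    /* The last descriptor of the chain is device-writable and receives the
     * one-byte ack: VIRTIO_NET_OK on success, VIRTIO_NET_ERR otherwise. */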

* RE: [RFC 18/27] vhost: add VDUSE device creation and destruction
  2023-03-31 15:42 ` [RFC 18/27] vhost: add VDUSE device creation and destruction Maxime Coquelin
@ 2023-05-09  5:31   ` Xia, Chenbo
  0 siblings, 0 replies; 79+ messages in thread
From: Xia, Chenbo @ 2023-05-09  5:31 UTC (permalink / raw)
  To: Maxime Coquelin, dev, david.marchand, mkp, fbl, jasowang, Liang,
	Cunming, Xie, Yongji, echaudro, eperezma, amorenoz

> -----Original Message-----
> From: Maxime Coquelin <maxime.coquelin@redhat.com>
> Sent: Friday, March 31, 2023 11:43 PM
> To: dev@dpdk.org; david.marchand@redhat.com; Xia, Chenbo
> <chenbo.xia@intel.com>; mkp@redhat.com; fbl@redhat.com;
> jasowang@redhat.com; Liang, Cunming <cunming.liang@intel.com>; Xie, Yongji
> <xieyongji@bytedance.com>; echaudro@redhat.com; eperezma@redhat.com;
> amorenoz@redhat.com
> Cc: Maxime Coquelin <maxime.coquelin@redhat.com>
> Subject: [RFC 18/27] vhost: add VDUSE device creation and destruction
> 
> This patch adds initial support for VDUSE, which includes
> the device creation and destruction.
> 
> It does not include the virtqueue configuration, so it is
> not functional at this point.
> 
> Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
> ---
>  lib/vhost/meson.build |   4 +
>  lib/vhost/socket.c    |  34 +++++---
>  lib/vhost/vduse.c     | 184 ++++++++++++++++++++++++++++++++++++++++++
>  lib/vhost/vduse.h     |  33 ++++++++
>  lib/vhost/vhost.h     |   2 +
>  5 files changed, 245 insertions(+), 12 deletions(-)
>  create mode 100644 lib/vhost/vduse.c
>  create mode 100644 lib/vhost/vduse.h
> 
> diff --git a/lib/vhost/meson.build b/lib/vhost/meson.build
> index cdcd403df3..a57a15937f 100644
> --- a/lib/vhost/meson.build
> +++ b/lib/vhost/meson.build
> @@ -30,6 +30,10 @@ sources = files(
>          'virtio_net.c',
>          'virtio_net_ctrl.c',
>  )
> +if cc.has_header('linux/vduse.h')
> +    sources += files('vduse.c')
> +    cflags += '-DVHOST_HAS_VDUSE'
> +endif
>  headers = files(
>          'rte_vdpa.h',
>          'rte_vhost.h',
> diff --git a/lib/vhost/socket.c b/lib/vhost/socket.c
> index e95c3ffeac..a8a1c4cd2b 100644
> --- a/lib/vhost/socket.c
> +++ b/lib/vhost/socket.c
> @@ -18,6 +18,7 @@
>  #include <rte_log.h>
> 
>  #include "fd_man.h"
> +#include "vduse.h"
>  #include "vhost.h"
>  #include "vhost_user.h"
> 
> @@ -35,6 +36,7 @@ struct vhost_user_socket {
>  	int socket_fd;
>  	struct sockaddr_un un;
>  	bool is_server;
> +	bool is_vduse;
>  	bool reconnect;
>  	bool iommu_support;
>  	bool use_builtin_virtio_net;
> @@ -992,18 +994,21 @@ rte_vhost_driver_register(const char *path, uint64_t
> flags)
>  #endif
>  	}
> 
> -	if ((flags & RTE_VHOST_USER_CLIENT) != 0) {
> -		vsocket->reconnect = !(flags & RTE_VHOST_USER_NO_RECONNECT);
> -		if (vsocket->reconnect && reconn_tid == 0) {
> -			if (vhost_user_reconnect_init() != 0)
> -				goto out_mutex;
> -		}
> +	if (!strncmp("/dev/vduse/", path, strlen("/dev/vduse/"))) {
> +		vsocket->is_vduse = true;
>  	} else {
> -		vsocket->is_server = true;
> -	}
> -	ret = create_unix_socket(vsocket);
> -	if (ret < 0) {
> -		goto out_mutex;
> +		if ((flags & RTE_VHOST_USER_CLIENT) != 0) {
> +			vsocket->reconnect = !(flags &
> RTE_VHOST_USER_NO_RECONNECT);
> +			if (vsocket->reconnect && reconn_tid == 0) {
> +				if (vhost_user_reconnect_init() != 0)
> +					goto out_mutex;
> +			}
> +		} else {
> +			vsocket->is_server = true;
> +		}
> +		ret = create_unix_socket(vsocket);
> +		if (ret < 0)
> +			goto out_mutex;
>  	}
> 
>  	vhost_user.vsockets[vhost_user.vsocket_cnt++] = vsocket;
> @@ -1068,7 +1073,9 @@ rte_vhost_driver_unregister(const char *path)
>  		if (strcmp(vsocket->path, path))
>  			continue;
> 
> -		if (vsocket->is_server) {
> +		if (vsocket->is_vduse) {
> +			vduse_device_destroy(path);
> +		} else if (vsocket->is_server) {
>  			/*
>  			 * If r/wcb is executing, release vhost_user's
>  			 * mutex lock, and try again since the r/wcb
> @@ -1171,6 +1178,9 @@ rte_vhost_driver_start(const char *path)
>  	if (!vsocket)
>  		return -1;
> 
> +	if (vsocket->is_vduse)
> +		return vduse_device_create(path);
> +
>  	if (fdset_tid == 0) {
>  		/**
>  		 * create a pipe which will be waited by poll and notified to
> diff --git a/lib/vhost/vduse.c b/lib/vhost/vduse.c
> new file mode 100644
> index 0000000000..336761c97a
> --- /dev/null
> +++ b/lib/vhost/vduse.c
> @@ -0,0 +1,184 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + * Copyright (c) 2023 Red Hat, Inc.
> + */
> +
> +#include <stdint.h>
> +#include <stdio.h>
> +#include <unistd.h>
> +#include <fcntl.h>
> +
> +
> +#include <linux/vduse.h>
> +#include <linux/virtio_net.h>
> +
> +#include <sys/ioctl.h>
> +#include <sys/mman.h>
> +
> +#include <rte_common.h>
> +
> +#include "vduse.h"
> +#include "vhost.h"
> +
> +#define VHOST_VDUSE_API_VERSION 0
> +#define VDUSE_CTRL_PATH "/dev/vduse/control"
> +
> +#define VDUSE_NET_SUPPORTED_FEATURES ((1ULL << VIRTIO_NET_F_MRG_RXBUF) |
> \
> +				(1ULL << VIRTIO_F_ANY_LAYOUT) | \
> +				(1ULL << VIRTIO_F_VERSION_1)   | \
> +				(1ULL << VIRTIO_RING_F_INDIRECT_DESC) | \
> +				(1ULL << VIRTIO_RING_F_EVENT_IDX) | \
> +				(1ULL << VIRTIO_F_IN_ORDER) | \
> +				(1ULL << VIRTIO_F_IOMMU_PLATFORM))
> +
> +static struct vhost_backend_ops vduse_backend_ops = {
> +};
> +
> +int
> +vduse_device_create(const char *path)
> +{
> +	int control_fd, dev_fd, vid, ret;
> +	uint32_t i;
> +	struct virtio_net *dev;
> +	uint64_t ver = VHOST_VDUSE_API_VERSION;
> +	struct vduse_dev_config *dev_config = NULL;
> +	const char *name = path + strlen("/dev/vduse/");
> +
> +	control_fd = open(VDUSE_CTRL_PATH, O_RDWR);
> +	if (control_fd < 0) {
> +		VHOST_LOG_CONFIG(name, ERR, "Failed to open %s: %s\n",
> +				VDUSE_CTRL_PATH, strerror(errno));
> +		return -1;
> +	}
> +
> +	if (ioctl(control_fd, VDUSE_SET_API_VERSION, &ver)) {
> +		VHOST_LOG_CONFIG(name, ERR, "Failed to set API version: %"
> PRIu64 ": %s\n",
> +				ver, strerror(errno));
> +		ret = -1;
> +		goto out_ctrl_close;
> +	}
> +
> +	dev_config = malloc(offsetof(struct vduse_dev_config, config));
> +	if (!dev_config) {
> +		VHOST_LOG_CONFIG(name, ERR, "Failed to allocate VDUSE
> config\n");
> +		ret = -1;
> +		goto out_ctrl_close;
> +	}
> +
> +	memset(dev_config, 0, sizeof(struct vduse_dev_config));
> +
> +	strncpy(dev_config->name, name, VDUSE_NAME_MAX - 1);
> +	dev_config->device_id = VIRTIO_ID_NET;
> +	dev_config->vendor_id = 0;
> +	dev_config->features = VDUSE_NET_SUPPORTED_FEATURES;
> +	dev_config->vq_num = 2;
> +	dev_config->vq_align = sysconf(_SC_PAGE_SIZE);
> +	dev_config->config_size = 0;
> +
> +	ret = ioctl(control_fd, VDUSE_CREATE_DEV, dev_config);
> +	if (ret < 0) {
> +		VHOST_LOG_CONFIG(name, ERR, "Failed to create VDUSE
> device: %s\n",
> +				strerror(errno));
> +		goto out_free;
> +	}
> +
> +	dev_fd = open(path, O_RDWR);
> +	if (dev_fd < 0) {
> +		VHOST_LOG_CONFIG(name, ERR, "Failed to open device %s: %s\n",
> +				path, strerror(errno));
> +		ret = -1;
> +		goto out_dev_close;
> +	}
> +
> +	vid = vhost_new_device(&vduse_backend_ops);
> +	if (vid < 0) {
> +		VHOST_LOG_CONFIG(name, ERR, "Failed to create new Vhost
> device\n");
> +		ret = -1;
> +		goto out_dev_close;
> +	}
> +
> +	dev = get_device(vid);
> +	if (!dev) {
> +		ret = -1;
> +		goto out_dev_close;
> +	}
> +
> +	strncpy(dev->ifname, path, IF_NAME_SZ - 1);
> +	dev->vduse_ctrl_fd = control_fd;
> +	dev->vduse_dev_fd = dev_fd;
> +	vhost_setup_virtio_net(dev->vid, true, true, true, true);
> +
> +	for (i = 0; i < 2; i++) {
> +		struct vduse_vq_config vq_cfg = { 0 };
> +
> +		ret = alloc_vring_queue(dev, i);
> +		if (ret) {
> +			VHOST_LOG_CONFIG(name, ERR, "Failed to alloc vring %d
> metadata\n", i);
> +			goto out_dev_destroy;
> +		}
> +
> +		vq_cfg.index = i;
> +		vq_cfg.max_size = 1024;
> +
> +		ret = ioctl(dev->vduse_dev_fd, VDUSE_VQ_SETUP, &vq_cfg);
> +		if (ret) {
> +			VHOST_LOG_CONFIG(name, ERR, "Failed to set-up VQ %d\n",
> i);
> +			goto out_dev_destroy;
> +		}
> +	}
> +
> +	free(dev_config);
> +
> +	return 0;
> +
> +out_dev_destroy:
> +	vhost_destroy_device(vid);
> +out_dev_close:
> +	if (dev_fd >= 0)
> +		close(dev_fd);
> +	ioctl(control_fd, VDUSE_DESTROY_DEV, name);
> +out_free:
> +	free(dev_config);
> +out_ctrl_close:
> +	close(control_fd);
> +
> +	return ret;
> +}
> +
> +int
> +vduse_device_destroy(const char *path)
> +{
> +	const char *name = path + strlen("/dev/vduse/");
> +	struct virtio_net *dev;
> +	int vid, ret;
> +
> +	for (vid = 0; vid < RTE_MAX_VHOST_DEVICE; vid++) {
> +		dev = vhost_devices[vid];
> +
> +		if (dev == NULL)
> +			continue;
> +
> +		if (!strcmp(path, dev->ifname))
> +			break;
> +	}
> +
> +	if (vid == RTE_MAX_VHOST_DEVICE)
> +		return -1;
> +
> +	if (dev->vduse_dev_fd >= 0) {
> +		close(dev->vduse_dev_fd);
> +		dev->vduse_dev_fd = -1;
> +	}
> +
> +	if (dev->vduse_ctrl_fd >= 0) {
> +		ret = ioctl(dev->vduse_ctrl_fd, VDUSE_DESTROY_DEV, name);
> +		if (ret)
> +			VHOST_LOG_CONFIG(name, ERR, "Failed to destroy VDUSE
> device: %s\n",
> +					strerror(errno));
> +		close(dev->vduse_ctrl_fd);
> +		dev->vduse_ctrl_fd = -1;
> +	}
> +
> +	vhost_destroy_device(vid);
> +
> +	return 0;
> +}
> diff --git a/lib/vhost/vduse.h b/lib/vhost/vduse.h
> new file mode 100644
> index 0000000000..a15e5d4c16
> --- /dev/null
> +++ b/lib/vhost/vduse.h
> @@ -0,0 +1,33 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + * Copyright (c) 2023 Red Hat, Inc.
> + */
> +
> +#ifndef _VDUSE_H
> +#define _VDUSE_H
> +
> +#include "vhost.h"
> +
> +#ifdef VHOST_HAS_VDUSE
> +
> +int vduse_device_create(const char *path);
> +int vduse_device_destroy(const char *path);
> +
> +#else
> +
> +static inline int
> +vduse_device_create(const char *path)
> +{
> +	VHOST_LOG_CONFIG(path, ERR, "VDUSE support disabled at build
> time\n");
> +	return -1;
> +}
> +
> +static inline int
> +vduse_device_destroy(const char *path)
> +{
> +	VHOST_LOG_CONFIG(path, ERR, "VDUSE support disabled at build
> time\n");
> +	return -1;
> +}
> +
> +#endif /* VHOST_HAS_VDUSE */
> +
> +#endif /* _VDUSE_H */
> diff --git a/lib/vhost/vhost.h b/lib/vhost/vhost.h
> index 76663aed24..c8f2a0d43a 100644
> --- a/lib/vhost/vhost.h
> +++ b/lib/vhost/vhost.h
> @@ -524,6 +524,8 @@ struct virtio_net {
> 
>  	int			postcopy_ufd;
>  	int			postcopy_listening;
> +	int			vduse_ctrl_fd;
> +	int			vduse_dev_fd;
> 
>  	struct vhost_virtqueue	*cvq;
> 
> --
> 2.39.2

Reviewed-by: Chenbo Xia <chenbo.xia@intel.com> 

^ permalink raw reply	[flat|nested] 79+ messages in thread
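
From the application side, the path prefix check above means VDUSE devices are driven through the existing registration API. A minimal sketch of the lifecycle, with a placeholder device name:

    #include <rte_vhost.h>

    /* Sketch: creating and destroying a VDUSE device through the Vhost API.
     * The "/dev/vduse/" prefix is what routes register/start/unregister to
     * vduse_device_create()/vduse_device_destroy() internally. */
    static int
    vduse_lifecycle(void)
    {
    	const char *path = "/dev/vduse/net0";	/* placeholder name */

    	if (rte_vhost_driver_register(path, 0) < 0)
    		return -1;
    	if (rte_vhost_driver_start(path) < 0) {	/* VDUSE_CREATE_DEV happens here */
    		rte_vhost_driver_unregister(path);
    		return -1;
    	}

    	/* ... device runs, can be attached to the vDPA bus with the vdpa tool ... */

    	return rte_vhost_driver_unregister(path);	/* VDUSE_DESTROY_DEV */
    }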

* RE: [RFC 19/27] vhost: add VDUSE callback for IOTLB miss
  2023-03-31 15:42 ` [RFC 19/27] vhost: add VDUSE callback for IOTLB miss Maxime Coquelin
@ 2023-05-09  5:31   ` Xia, Chenbo
  0 siblings, 0 replies; 79+ messages in thread
From: Xia, Chenbo @ 2023-05-09  5:31 UTC (permalink / raw)
  To: Maxime Coquelin, dev, david.marchand, mkp, fbl, jasowang, Liang,
	Cunming, Xie, Yongji, echaudro, eperezma, amorenoz

> -----Original Message-----
> From: Maxime Coquelin <maxime.coquelin@redhat.com>
> Sent: Friday, March 31, 2023 11:43 PM
> To: dev@dpdk.org; david.marchand@redhat.com; Xia, Chenbo
> <chenbo.xia@intel.com>; mkp@redhat.com; fbl@redhat.com;
> jasowang@redhat.com; Liang, Cunming <cunming.liang@intel.com>; Xie, Yongji
> <xieyongji@bytedance.com>; echaudro@redhat.com; eperezma@redhat.com;
> amorenoz@redhat.com
> Cc: Maxime Coquelin <maxime.coquelin@redhat.com>
> Subject: [RFC 19/27] vhost: add VDUSE callback for IOTLB miss
> 
> This patch implements the VDUSE callback for IOTLB misses,
> where it unmaps the pages from the invalidated IOTLB entry.
> 
> Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
> ---
>  lib/vhost/vduse.c | 58 +++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 58 insertions(+)
> 
> diff --git a/lib/vhost/vduse.c b/lib/vhost/vduse.c
> index 336761c97a..f46823f589 100644
> --- a/lib/vhost/vduse.c
> +++ b/lib/vhost/vduse.c
> @@ -13,9 +13,11 @@
> 
>  #include <sys/ioctl.h>
>  #include <sys/mman.h>
> +#include <sys/stat.h>
> 
>  #include <rte_common.h>
> 
> +#include "iotlb.h"
>  #include "vduse.h"
>  #include "vhost.h"
> 
> @@ -30,7 +32,63 @@
>  				(1ULL << VIRTIO_F_IN_ORDER) | \
>  				(1ULL << VIRTIO_F_IOMMU_PLATFORM))
> 
> +static int
> +vduse_iotlb_miss(struct virtio_net *dev, uint64_t iova, uint8_t perm
> __rte_unused)
> +{
> +	struct vduse_iotlb_entry entry;
> +	uint64_t size, page_size;
> +	struct stat stat;
> +	void *mmap_addr;
> +	int fd, ret;
> +
> +	entry.start = iova;
> +	entry.last = iova + 1;
> +
> +	ret = ioctl(dev->vduse_dev_fd, VDUSE_IOTLB_GET_FD, &entry);
> +	if (ret < 0) {
> +		VHOST_LOG_CONFIG(dev->ifname, ERR, "Failed to get IOTLB entry
> for 0x%" PRIx64 "\n",
> +				iova);
> +		return -1;
> +	}
> +
> +	fd = ret;
> +
> +	VHOST_LOG_CONFIG(dev->ifname, DEBUG, "New IOTLB entry:\n");
> +	VHOST_LOG_CONFIG(dev->ifname, DEBUG, "\tIOVA: %" PRIx64 " - %"
> PRIx64 "\n",
> +			(uint64_t)entry.start, (uint64_t)entry.last);
> +	VHOST_LOG_CONFIG(dev->ifname, DEBUG, "\toffset: %" PRIx64 "\n",
> (uint64_t)entry.offset);
> +	VHOST_LOG_CONFIG(dev->ifname, DEBUG, "\tfd: %d\n", fd);
> +	VHOST_LOG_CONFIG(dev->ifname, DEBUG, "\tperm: %x\n", entry.perm);
> +
> +	size = entry.last - entry.start + 1;
> +	mmap_addr = mmap(0, size + entry.offset, entry.perm, MAP_SHARED, fd,
> 0);
> +	if (!mmap_addr) {
> +		VHOST_LOG_CONFIG(dev->ifname, ERR,
> +				"Failed to mmap IOTLB entry for 0x%" PRIx64 "\n",
> iova);
> +		ret = -1;
> +		goto close_fd;
> +	}
> +
> +	ret = fstat(fd, &stat);
> +	if (ret < 0) {
> +		VHOST_LOG_CONFIG(dev->ifname, ERR, "Failed to get page
> size.\n");
> +		munmap(mmap_addr, entry.offset + size);
> +		goto close_fd;
> +	}
> +	page_size = (uint64_t)stat.st_blksize;
> +
> +	vhost_user_iotlb_cache_insert(dev, entry.start,
> (uint64_t)(uintptr_t)mmap_addr,
> +		entry.offset, size, page_size, entry.perm);
> +
> +	ret = 0;
> +close_fd:
> +	close(fd);
> +
> +	return ret;
> +}
> +
>  static struct vhost_backend_ops vduse_backend_ops = {
> +	.iotlb_miss = vduse_iotlb_miss,
>  };
> 
>  int
> --
> 2.39.2

Reviewed-by: Chenbo Xia <chenbo.xia@intel.com> 

^ permalink raw reply	[flat|nested] 79+ messages in thread
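
To make the arithmetic concrete: the whole range [0, entry.offset + size) of the returned fd is mapped, and the entry is cached together with its offset, so translating a guest IOVA inside the range is expected to look like the sketch below. Values are purely illustrative and the real translation helper lives elsewhere in the series:

    #include <stdint.h>

    /* Sketch of the translation implied by what vduse_iotlb_miss() caches:
     * uaddr = mmap_addr, iova range [entry.start, entry.last], entry.offset. */
    static inline uint64_t
    iova_to_vva_sketch(uint64_t mmap_addr, uint64_t entry_start,
    		uint64_t entry_offset, uint64_t iova)
    {
    	/* e.g. entry.start = 0x100000, entry.offset = 0x3000, iova = 0x100200
    	 *  => vva = mmap_addr + 0x3000 + 0x200 */
    	return mmap_addr + entry_offset + (iova - entry_start);
    }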

* RE: [RFC 20/27] vhost: add VDUSE callback for IOTLB entry removal
  2023-03-31 15:42 ` [RFC 20/27] vhost: add VDUSE callback for IOTLB entry removal Maxime Coquelin
@ 2023-05-09  5:32   ` Xia, Chenbo
  2023-05-25 11:35     ` Maxime Coquelin
  0 siblings, 1 reply; 79+ messages in thread
From: Xia, Chenbo @ 2023-05-09  5:32 UTC (permalink / raw)
  To: Maxime Coquelin, dev, david.marchand, mkp, fbl, jasowang, Liang,
	Cunming, Xie, Yongji, echaudro, eperezma, amorenoz

Hi Maxime,

> -----Original Message-----
> From: Maxime Coquelin <maxime.coquelin@redhat.com>
> Sent: Friday, March 31, 2023 11:43 PM
> To: dev@dpdk.org; david.marchand@redhat.com; Xia, Chenbo
> <chenbo.xia@intel.com>; mkp@redhat.com; fbl@redhat.com;
> jasowang@redhat.com; Liang, Cunming <cunming.liang@intel.com>; Xie, Yongji
> <xieyongji@bytedance.com>; echaudro@redhat.com; eperezma@redhat.com;
> amorenoz@redhat.com
> Cc: Maxime Coquelin <maxime.coquelin@redhat.com>
> Subject: [RFC 20/27] vhost: add VDUSE callback for IOTLB entry removal
> 
> This patch implements the VDUSE callback for IOTLB misses,

for IOTLB entry removal? This commit message seems the same as patch 19.
You may want to change it :)

Thanks,
Chenbo

> where it unmaps the pages from the invalidated IOTLB entry
> 
> Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
> ---
>  lib/vhost/vduse.c | 7 +++++++
>  1 file changed, 7 insertions(+)
> 
> diff --git a/lib/vhost/vduse.c b/lib/vhost/vduse.c
> index f46823f589..ff4c9e72f1 100644
> --- a/lib/vhost/vduse.c
> +++ b/lib/vhost/vduse.c
> @@ -32,6 +32,12 @@
>  				(1ULL << VIRTIO_F_IN_ORDER) | \
>  				(1ULL << VIRTIO_F_IOMMU_PLATFORM))
> 
> +static void
> +vduse_iotlb_remove_notify(uint64_t addr, uint64_t offset, uint64_t size)
> +{
> +	munmap((void *)(uintptr_t)addr, offset + size);
> +}
> +
>  static int
>  vduse_iotlb_miss(struct virtio_net *dev, uint64_t iova, uint8_t perm
> __rte_unused)
>  {
> @@ -89,6 +95,7 @@ vduse_iotlb_miss(struct virtio_net *dev, uint64_t iova,
> uint8_t perm __rte_unuse
> 
>  static struct vhost_backend_ops vduse_backend_ops = {
>  	.iotlb_miss = vduse_iotlb_miss,
> +	.iotlb_remove_notify = vduse_iotlb_remove_notify,
>  };
> 
>  int
> --
> 2.39.2


^ permalink raw reply	[flat|nested] 79+ messages in thread

* RE: [RFC 21/27] vhost: add VDUSE callback for IRQ injection
  2023-03-31 15:42 ` [RFC 21/27] vhost: add VDUSE callback for IRQ injection Maxime Coquelin
@ 2023-05-09  5:33   ` Xia, Chenbo
  0 siblings, 0 replies; 79+ messages in thread
From: Xia, Chenbo @ 2023-05-09  5:33 UTC (permalink / raw)
  To: Maxime Coquelin, dev, david.marchand, mkp, fbl, jasowang, Liang,
	Cunming, Xie, Yongji, echaudro, eperezma, amorenoz

> -----Original Message-----
> From: Maxime Coquelin <maxime.coquelin@redhat.com>
> Sent: Friday, March 31, 2023 11:43 PM
> To: dev@dpdk.org; david.marchand@redhat.com; Xia, Chenbo
> <chenbo.xia@intel.com>; mkp@redhat.com; fbl@redhat.com;
> jasowang@redhat.com; Liang, Cunming <cunming.liang@intel.com>; Xie, Yongji
> <xieyongji@bytedance.com>; echaudro@redhat.com; eperezma@redhat.com;
> amorenoz@redhat.com
> Cc: Maxime Coquelin <maxime.coquelin@redhat.com>
> Subject: [RFC 21/27] vhost: add VDUSE callback for IRQ injection
> 
> This patch implements the VDUSE callback for injecting
> virtqueue interrupts.
> 
> Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
> ---
>  lib/vhost/vduse.c | 7 +++++++
>  1 file changed, 7 insertions(+)
> 
> diff --git a/lib/vhost/vduse.c b/lib/vhost/vduse.c
> index ff4c9e72f1..afa8a39498 100644
> --- a/lib/vhost/vduse.c
> +++ b/lib/vhost/vduse.c
> @@ -32,6 +32,12 @@
>  				(1ULL << VIRTIO_F_IN_ORDER) | \
>  				(1ULL << VIRTIO_F_IOMMU_PLATFORM))
> 
> +static int
> +vduse_inject_irq(struct virtio_net *dev, struct vhost_virtqueue *vq)
> +{
> +	return ioctl(dev->vduse_dev_fd, VDUSE_VQ_INJECT_IRQ, &vq->index);
> +}
> +
>  static void
>  vduse_iotlb_remove_notify(uint64_t addr, uint64_t offset, uint64_t size)
>  {
> @@ -96,6 +102,7 @@ vduse_iotlb_miss(struct virtio_net *dev, uint64_t iova,
> uint8_t perm __rte_unuse
>  static struct vhost_backend_ops vduse_backend_ops = {
>  	.iotlb_miss = vduse_iotlb_miss,
>  	.iotlb_remove_notify = vduse_iotlb_remove_notify,
> +	.inject_irq = vduse_inject_irq,
>  };
> 
>  int
> --
> 2.39.2

Reviewed-by: Chenbo Xia <chenbo.xia@intel.com> 

^ permalink raw reply	[flat|nested] 79+ messages in thread

* RE: [RFC 22/27] vhost: add VDUSE events handler
  2023-03-31 15:42 ` [RFC 22/27] vhost: add VDUSE events handler Maxime Coquelin
@ 2023-05-09  5:34   ` Xia, Chenbo
  0 siblings, 0 replies; 79+ messages in thread
From: Xia, Chenbo @ 2023-05-09  5:34 UTC (permalink / raw)
  To: Maxime Coquelin, dev, david.marchand, mkp, fbl, jasowang, Liang,
	Cunming, Xie, Yongji, echaudro, eperezma, amorenoz

> -----Original Message-----
> From: Maxime Coquelin <maxime.coquelin@redhat.com>
> Sent: Friday, March 31, 2023 11:43 PM
> To: dev@dpdk.org; david.marchand@redhat.com; Xia, Chenbo
> <chenbo.xia@intel.com>; mkp@redhat.com; fbl@redhat.com;
> jasowang@redhat.com; Liang, Cunming <cunming.liang@intel.com>; Xie, Yongji
> <xieyongji@bytedance.com>; echaudro@redhat.com; eperezma@redhat.com;
> amorenoz@redhat.com
> Cc: Maxime Coquelin <maxime.coquelin@redhat.com>
> Subject: [RFC 22/27] vhost: add VDUSE events handler
> 
> This patch makes use of Vhost lib's FD manager to install
> a handler for VDUSE events occurring on the VDUSE device FD.
> 
> Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
> ---
>  lib/vhost/vduse.c | 102 ++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 102 insertions(+)
> 
> diff --git a/lib/vhost/vduse.c b/lib/vhost/vduse.c
> index afa8a39498..2a183130d3 100644
> --- a/lib/vhost/vduse.c
> +++ b/lib/vhost/vduse.c
> @@ -17,6 +17,7 @@
> 
>  #include <rte_common.h>
> 
> +#include "fd_man.h"
>  #include "iotlb.h"
>  #include "vduse.h"
>  #include "vhost.h"
> @@ -32,6 +33,27 @@
>  				(1ULL << VIRTIO_F_IN_ORDER) | \
>  				(1ULL << VIRTIO_F_IOMMU_PLATFORM))
> 
> +struct vduse {
> +	struct fdset fdset;
> +};
> +
> +static struct vduse vduse = {
> +	.fdset = {
> +		.fd = { [0 ... MAX_FDS - 1] = {-1, NULL, NULL, NULL, 0} },
> +		.fd_mutex = PTHREAD_MUTEX_INITIALIZER,
> +		.fd_pooling_mutex = PTHREAD_MUTEX_INITIALIZER,
> +		.num = 0
> +	},
> +};
> +
> +static bool vduse_events_thread;
> +
> +static const char * const vduse_reqs_str[] = {
> +	"VDUSE_GET_VQ_STATE",
> +	"VDUSE_SET_STATUS",
> +	"VDUSE_UPDATE_IOTLB",
> +};
> +
>  static int
>  vduse_inject_irq(struct virtio_net *dev, struct vhost_virtqueue *vq)
>  {
> @@ -105,16 +127,84 @@ static struct vhost_backend_ops vduse_backend_ops =
> {
>  	.inject_irq = vduse_inject_irq,
>  };
> 
> +static void
> +vduse_events_handler(int fd, void *arg, int *remove __rte_unused)
> +{
> +	struct virtio_net *dev = arg;
> +	struct vduse_dev_request req;
> +	struct vduse_dev_response resp;
> +	int ret;
> +
> +	memset(&resp, 0, sizeof(resp));
> +
> +	ret = read(fd, &req, sizeof(req));
> +	if (ret < 0) {
> +		VHOST_LOG_CONFIG(dev->ifname, ERR, "Failed to read
> request: %s\n",
> +				strerror(errno));
> +		return;
> +	} else if (ret < (int)sizeof(req)) {
> +		VHOST_LOG_CONFIG(dev->ifname, ERR, "Incomplete to read
> request %d\n", ret);
> +		return;
> +	}
> +
> +	pthread_setcancelstate(PTHREAD_CANCEL_DISABLE, NULL);
> +
> +	VHOST_LOG_CONFIG(dev->ifname, INFO, "New request: %s (%u)\n",
> +			req.type < RTE_DIM(vduse_reqs_str) ?
> +			vduse_reqs_str[req.type] : "Unknown",
> +			req.type);
> +
> +	switch (req.type) {
> +	default:
> +		resp.result = VDUSE_REQ_RESULT_FAILED;
> +		break;
> +	}
> +
> +	pthread_setcancelstate(PTHREAD_CANCEL_ENABLE, NULL);
> +
> +	resp.request_id = req.request_id;
> +
> +	ret = write(dev->vduse_dev_fd, &resp, sizeof(resp));
> +	if (ret != sizeof(resp)) {
> +		VHOST_LOG_CONFIG(dev->ifname, ERR, "Failed to write
> response %s\n",
> +				strerror(errno));
> +		return;
> +	}
> +}
> +
>  int
>  vduse_device_create(const char *path)
>  {
>  	int control_fd, dev_fd, vid, ret;
> +	pthread_t fdset_tid;
>  	uint32_t i;
>  	struct virtio_net *dev;
>  	uint64_t ver = VHOST_VDUSE_API_VERSION;
>  	struct vduse_dev_config *dev_config = NULL;
>  	const char *name = path + strlen("/dev/vduse/");
> 
> +	/* If first device, create events dispatcher thread */
> +	if (vduse_events_thread == false) {
> +		/**
> +		 * create a pipe which will be waited by poll and notified to
> +		 * rebuild the wait list of poll.
> +		 */
> +		if (fdset_pipe_init(&vduse.fdset) < 0) {
> +			VHOST_LOG_CONFIG(path, ERR, "failed to create pipe for
> vduse fdset\n");
> +			return -1;
> +		}
> +
> +		ret = rte_ctrl_thread_create(&fdset_tid, "vduse-events", NULL,
> +				fdset_event_dispatch, &vduse.fdset);
> +		if (ret != 0) {
> +			VHOST_LOG_CONFIG(path, ERR, "failed to create vduse
> fdset handling thread\n");
> +			fdset_pipe_uninit(&vduse.fdset);
> +			return -1;
> +		}
> +
> +		vduse_events_thread = true;
> +	}
> +
>  	control_fd = open(VDUSE_CTRL_PATH, O_RDWR);
>  	if (control_fd < 0) {
>  		VHOST_LOG_CONFIG(name, ERR, "Failed to open %s: %s\n",
> @@ -198,6 +288,13 @@ vduse_device_create(const char *path)
>  		}
>  	}
> 
> +	ret = fdset_add(&vduse.fdset, dev->vduse_dev_fd,
> vduse_events_handler, NULL, dev);
> +	if (ret) {
> +		VHOST_LOG_CONFIG(name, ERR, "Failed to add fd %d to vduse
> fdset\n",
> +				dev->vduse_dev_fd);
> +		goto out_dev_destroy;
> +	}
> +
>  	free(dev_config);
> 
>  	return 0;
> @@ -236,11 +333,16 @@ vduse_device_destroy(const char *path)
>  	if (vid == RTE_MAX_VHOST_DEVICE)
>  		return -1;
> 
> +	fdset_del(&vduse.fdset, dev->vduse_dev_fd);
> +
>  	if (dev->vduse_dev_fd >= 0) {
>  		close(dev->vduse_dev_fd);
>  		dev->vduse_dev_fd = -1;
>  	}
> 
> +	sleep(2); //ToDo: Need to rework fdman to ensure the deleted FD is
> no
> +		  //more being polled, otherwise VDUSE_DESTROY_DEV will fail.
> +
>  	if (dev->vduse_ctrl_fd >= 0) {
>  		ret = ioctl(dev->vduse_ctrl_fd, VDUSE_DESTROY_DEV, name);
>  		if (ret)
> --
> 2.39.2

Reviewed-by: Chenbo Xia <chenbo.xia@intel.com> 

^ permalink raw reply	[flat|nested] 79+ messages in thread

* RE: [RFC 23/27] vhost: add support for virtqueue state get event
  2023-03-31 15:42 ` [RFC 23/27] vhost: add support for virtqueue state get event Maxime Coquelin
@ 2023-05-09  5:34   ` Xia, Chenbo
  0 siblings, 0 replies; 79+ messages in thread
From: Xia, Chenbo @ 2023-05-09  5:34 UTC (permalink / raw)
  To: Maxime Coquelin, dev, david.marchand, mkp, fbl, jasowang, Liang,
	Cunming, Xie, Yongji, echaudro, eperezma, amorenoz

> -----Original Message-----
> From: Maxime Coquelin <maxime.coquelin@redhat.com>
> Sent: Friday, March 31, 2023 11:43 PM
> To: dev@dpdk.org; david.marchand@redhat.com; Xia, Chenbo
> <chenbo.xia@intel.com>; mkp@redhat.com; fbl@redhat.com;
> jasowang@redhat.com; Liang, Cunming <cunming.liang@intel.com>; Xie, Yongji
> <xieyongji@bytedance.com>; echaudro@redhat.com; eperezma@redhat.com;
> amorenoz@redhat.com
> Cc: Maxime Coquelin <maxime.coquelin@redhat.com>
> Subject: [RFC 23/27] vhost: add support for virtqueue state get event
> 
> This patch adds support for VDUSE_GET_VQ_STATE event
> handling, which consists in providing the backend's last
> available index for the specified virtqueue.
> 
> Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
> ---
>  lib/vhost/vduse.c | 8 ++++++++
>  1 file changed, 8 insertions(+)
> 
> diff --git a/lib/vhost/vduse.c b/lib/vhost/vduse.c
> index 2a183130d3..36028b7315 100644
> --- a/lib/vhost/vduse.c
> +++ b/lib/vhost/vduse.c
> @@ -133,6 +133,7 @@ vduse_events_handler(int fd, void *arg, int *remove
> __rte_unused)
>  	struct virtio_net *dev = arg;
>  	struct vduse_dev_request req;
>  	struct vduse_dev_response resp;
> +	struct vhost_virtqueue *vq;
>  	int ret;
> 
>  	memset(&resp, 0, sizeof(resp));
> @@ -155,6 +156,13 @@ vduse_events_handler(int fd, void *arg, int *remove
> __rte_unused)
>  			req.type);
> 
>  	switch (req.type) {
> +	case VDUSE_GET_VQ_STATE:
> +		vq = dev->virtqueue[req.vq_state.index];
> +		VHOST_LOG_CONFIG(dev->ifname, INFO, "\tvq index: %u,
> avail_index: %u\n",
> +				req.vq_state.index, vq->last_avail_idx);
> +		resp.vq_state.split.avail_index = vq->last_avail_idx;
> +		resp.result = VDUSE_REQ_RESULT_OK;
> +		break;
>  	default:
>  		resp.result = VDUSE_REQ_RESULT_FAILED;
>  		break;
> --
> 2.39.2

Reviewed-by: Chenbo Xia <chenbo.xia@intel.com> 

^ permalink raw reply	[flat|nested] 79+ messages in thread

* RE: [RFC 24/27] vhost: add support for VDUSE status set event
  2023-03-31 15:42 ` [RFC 24/27] vhost: add support for VDUSE status set event Maxime Coquelin
@ 2023-05-09  5:34   ` Xia, Chenbo
  0 siblings, 0 replies; 79+ messages in thread
From: Xia, Chenbo @ 2023-05-09  5:34 UTC (permalink / raw)
  To: Maxime Coquelin, dev, david.marchand, mkp, fbl, jasowang, Liang,
	Cunming, Xie, Yongji, echaudro, eperezma, amorenoz

> -----Original Message-----
> From: Maxime Coquelin <maxime.coquelin@redhat.com>
> Sent: Friday, March 31, 2023 11:43 PM
> To: dev@dpdk.org; david.marchand@redhat.com; Xia, Chenbo
> <chenbo.xia@intel.com>; mkp@redhat.com; fbl@redhat.com;
> jasowang@redhat.com; Liang, Cunming <cunming.liang@intel.com>; Xie, Yongji
> <xieyongji@bytedance.com>; echaudro@redhat.com; eperezma@redhat.com;
> amorenoz@redhat.com
> Cc: Maxime Coquelin <maxime.coquelin@redhat.com>
> Subject: [RFC 24/27] vhost: add support for VDUSE status set event
> 
> This patch adds support for VDUSE_SET_STATUS event
> handling, which consists in updating the Virtio device
> status set by the Virtio driver.
> 
> Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
> ---
>  lib/vhost/vduse.c | 6 ++++++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/lib/vhost/vduse.c b/lib/vhost/vduse.c
> index 36028b7315..7d59a5f709 100644
> --- a/lib/vhost/vduse.c
> +++ b/lib/vhost/vduse.c
> @@ -163,6 +163,12 @@ vduse_events_handler(int fd, void *arg, int *remove
> __rte_unused)
>  		resp.vq_state.split.avail_index = vq->last_avail_idx;
>  		resp.result = VDUSE_REQ_RESULT_OK;
>  		break;
> +	case VDUSE_SET_STATUS:
> +		VHOST_LOG_CONFIG(dev->ifname, INFO, "\tnew status: 0x%08x\n",
> +				req.s.status);
> +		dev->status = req.s.status;
> +		resp.result = VDUSE_REQ_RESULT_OK;
> +		break;
>  	default:
>  		resp.result = VDUSE_REQ_RESULT_FAILED;
>  		break;
> --
> 2.39.2

Reviewed-by: Chenbo Xia <chenbo.xia@intel.com> 

^ permalink raw reply	[flat|nested] 79+ messages in thread

* RE: [RFC 25/27] vhost: add support for VDUSE IOTLB update event
  2023-03-31 15:42 ` [RFC 25/27] vhost: add support for VDUSE IOTLB update event Maxime Coquelin
@ 2023-05-09  5:35   ` Xia, Chenbo
  2023-05-25 11:43     ` Maxime Coquelin
  0 siblings, 1 reply; 79+ messages in thread
From: Xia, Chenbo @ 2023-05-09  5:35 UTC (permalink / raw)
  To: Maxime Coquelin, dev, david.marchand, mkp, fbl, jasowang, Liang,
	Cunming, Xie, Yongji, echaudro, eperezma, amorenoz

Hi Maxime,

> -----Original Message-----
> From: Maxime Coquelin <maxime.coquelin@redhat.com>
> Sent: Friday, March 31, 2023 11:43 PM
> To: dev@dpdk.org; david.marchand@redhat.com; Xia, Chenbo
> <chenbo.xia@intel.com>; mkp@redhat.com; fbl@redhat.com;
> jasowang@redhat.com; Liang, Cunming <cunming.liang@intel.com>; Xie, Yongji
> <xieyongji@bytedance.com>; echaudro@redhat.com; eperezma@redhat.com;
> amorenoz@redhat.com
> Cc: Maxime Coquelin <maxime.coquelin@redhat.com>
> Subject: [RFC 25/27] vhost: add support for VDUSE IOTLB update event
> 
> This patch adds support for VDUSE_UPDATE_IOTLB event
> handling, which consists in invalidating IOTLB entries for
> the range specified in the request.
> 
> Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
> ---
>  lib/vhost/vduse.c | 6 ++++++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/lib/vhost/vduse.c b/lib/vhost/vduse.c
> index 7d59a5f709..b5b9fa2eb1 100644
> --- a/lib/vhost/vduse.c
> +++ b/lib/vhost/vduse.c
> @@ -169,6 +169,12 @@ vduse_events_handler(int fd, void *arg, int *remove
> __rte_unused)
>  		dev->status = req.s.status;
>  		resp.result = VDUSE_REQ_RESULT_OK;
>  		break;
> +	case VDUSE_UPDATE_IOTLB:
> +		VHOST_LOG_CONFIG(dev->ifname, INFO, "\tIOVA range: %" PRIx64 "
> - %" PRIx64 "\n",
> +				(uint64_t)req.iova.start, (uint64_t)req.iova.last);
> +		vhost_user_iotlb_cache_remove(dev, req.iova.start,
> +				req.iova.last - req.iova.start + 1);
> +		break;

We don't need to set the response result here?

Thanks,
Chenbo

>  	default:
>  		resp.result = VDUSE_REQ_RESULT_FAILED;
>  		break;
> --
> 2.39.2


^ permalink raw reply	[flat|nested] 79+ messages in thread
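
For clarity, the change being suggested above would presumably just set the result before the break, along these lines (a fragment of the switch in vduse_events_handler(), sketch only):

    	case VDUSE_UPDATE_IOTLB:
    		VHOST_LOG_CONFIG(dev->ifname, INFO, "\tIOVA range: %" PRIx64 " - %" PRIx64 "\n",
    				(uint64_t)req.iova.start, (uint64_t)req.iova.last);
    		vhost_user_iotlb_cache_remove(dev, req.iova.start,
    				req.iova.last - req.iova.start + 1);
    		resp.result = VDUSE_REQ_RESULT_OK;
    		break;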

* RE: [RFC 26/27] vhost: add VDUSE device startup
  2023-03-31 15:42 ` [RFC 26/27] vhost: add VDUSE device startup Maxime Coquelin
@ 2023-05-09  5:35   ` Xia, Chenbo
  0 siblings, 0 replies; 79+ messages in thread
From: Xia, Chenbo @ 2023-05-09  5:35 UTC (permalink / raw)
  To: Maxime Coquelin, dev, david.marchand, mkp, fbl, jasowang, Liang,
	Cunming, Xie, Yongji, echaudro, eperezma, amorenoz

> -----Original Message-----
> From: Maxime Coquelin <maxime.coquelin@redhat.com>
> Sent: Friday, March 31, 2023 11:43 PM
> To: dev@dpdk.org; david.marchand@redhat.com; Xia, Chenbo
> <chenbo.xia@intel.com>; mkp@redhat.com; fbl@redhat.com;
> jasowang@redhat.com; Liang, Cunming <cunming.liang@intel.com>; Xie, Yongji
> <xieyongji@bytedance.com>; echaudro@redhat.com; eperezma@redhat.com;
> amorenoz@redhat.com
> Cc: Maxime Coquelin <maxime.coquelin@redhat.com>
> Subject: [RFC 26/27] vhost: add VDUSE device startup
> 
> This patch adds initialization of the device and its
> virtqueues once the Virtio driver has set DRIVER_OK
> in the Virtio status register.
> 
> Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
> ---
>  lib/vhost/vduse.c | 118 ++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 118 insertions(+)
> 
> diff --git a/lib/vhost/vduse.c b/lib/vhost/vduse.c
> index b5b9fa2eb1..1cd04b4872 100644
> --- a/lib/vhost/vduse.c
> +++ b/lib/vhost/vduse.c
> @@ -127,6 +127,120 @@ static struct vhost_backend_ops vduse_backend_ops =
> {
>  	.inject_irq = vduse_inject_irq,
>  };
> 
> +static void
> +vduse_vring_setup(struct virtio_net *dev, unsigned int index)
> +{
> +	struct vhost_virtqueue *vq = dev->virtqueue[index];
> +	struct vhost_vring_addr *ra = &vq->ring_addrs;
> +	struct vduse_vq_info vq_info;
> +	struct vduse_vq_eventfd vq_efd;
> +	int ret;
> +
> +	vq_info.index = index;
> +	ret = ioctl(dev->vduse_dev_fd, VDUSE_VQ_GET_INFO, &vq_info);
> +	if (ret) {
> +		VHOST_LOG_CONFIG(dev->ifname, ERR, "Failed to get VQ %u
> info: %s\n",
> +				index, strerror(errno));
> +		return;
> +	}
> +
> +	VHOST_LOG_CONFIG(dev->ifname, INFO, "VQ %u info:\n", index);
> +	VHOST_LOG_CONFIG(dev->ifname, INFO, "\tnum: %u\n", vq_info.num);
> +	VHOST_LOG_CONFIG(dev->ifname, INFO, "\tdesc_addr: %llx\n",
> vq_info.desc_addr);
> +	VHOST_LOG_CONFIG(dev->ifname, INFO, "\tdriver_addr: %llx\n",
> vq_info.driver_addr);
> +	VHOST_LOG_CONFIG(dev->ifname, INFO, "\tdevice_addr: %llx\n",
> vq_info.device_addr);
> +	VHOST_LOG_CONFIG(dev->ifname, INFO, "\tavail_idx: %u\n",
> vq_info.split.avail_index);
> +	VHOST_LOG_CONFIG(dev->ifname, INFO, "\tready: %u\n", vq_info.ready);
> +
> +	vq->last_avail_idx = vq_info.split.avail_index;
> +	vq->size = vq_info.num;
> +	vq->ready = vq_info.ready;
> +	vq->enabled = true;
> +	ra->desc_user_addr = vq_info.desc_addr;
> +	ra->avail_user_addr = vq_info.driver_addr;
> +	ra->used_user_addr = vq_info.device_addr;
> +
> +	vq->shadow_used_split = rte_malloc_socket(NULL,
> +				vq->size * sizeof(struct vring_used_elem),
> +				RTE_CACHE_LINE_SIZE, 0);
> +	vq->batch_copy_elems = rte_malloc_socket(NULL,
> +				vq->size * sizeof(struct batch_copy_elem),
> +				RTE_CACHE_LINE_SIZE, 0);
> +
> +	vhost_user_iotlb_rd_lock(vq);
> +	if (vring_translate(dev, vq))
> +		VHOST_LOG_CONFIG(dev->ifname, ERR, "Failed to translate
> vring %d addresses\n",
> +				index);
> +	vhost_user_iotlb_rd_unlock(vq);
> +
> +	vq->kickfd = eventfd(0, EFD_NONBLOCK | EFD_CLOEXEC);
> +	if (vq->kickfd < 0) {
> +		VHOST_LOG_CONFIG(dev->ifname, ERR, "Failed to init kickfd for
> VQ %u: %s\n",
> +				index, strerror(errno));
> +		vq->kickfd = VIRTIO_INVALID_EVENTFD;
> +		return;
> +	}
> +
> +	vq_efd.index = index;
> +	vq_efd.fd = vq->kickfd;
> +
> +	ret = ioctl(dev->vduse_dev_fd, VDUSE_VQ_SETUP_KICKFD, &vq_efd);
> +	if (ret) {
> +		VHOST_LOG_CONFIG(dev->ifname, ERR, "Failed to setup kickfd for
> VQ %u: %s\n",
> +				index, strerror(errno));
> +		close(vq->kickfd);
> +		vq->kickfd = VIRTIO_UNINITIALIZED_EVENTFD;
> +		return;
> +	}
> +}
> +
> +static void
> +vduse_device_start(struct virtio_net *dev)
> +{
> +	unsigned int i, ret;
> +
> +	dev->notify_ops = vhost_driver_callback_get(dev->ifname);
> +	if (!dev->notify_ops) {
> +		VHOST_LOG_CONFIG(dev->ifname, ERR,
> +				"Failed to get callback ops for driver\n");
> +		return;
> +	}
> +
> +	ret = ioctl(dev->vduse_dev_fd, VDUSE_DEV_GET_FEATURES, &dev-
> >features);
> +	if (ret) {
> +		VHOST_LOG_CONFIG(dev->ifname, ERR, "Failed to get
> features: %s\n",
> +				strerror(errno));
> +		return;
> +	}
> +
> +	VHOST_LOG_CONFIG(dev->ifname, INFO, "negotiated Virtio features:
> 0x%" PRIx64 "\n",
> +		dev->features);
> +
> +	if (dev->features &
> +		((1ULL << VIRTIO_NET_F_MRG_RXBUF) |
> +		 (1ULL << VIRTIO_F_VERSION_1) |
> +		 (1ULL << VIRTIO_F_RING_PACKED))) {
> +		dev->vhost_hlen = sizeof(struct virtio_net_hdr_mrg_rxbuf);
> +	} else {
> +		dev->vhost_hlen = sizeof(struct virtio_net_hdr);
> +	}
> +
> +	for (i = 0; i < dev->nr_vring; i++)
> +		vduse_vring_setup(dev, i);
> +
> +	dev->flags |= VIRTIO_DEV_READY;
> +
> +	if (dev->notify_ops->new_device(dev->vid) == 0)
> +		dev->flags |= VIRTIO_DEV_RUNNING;
> +
> +	for (i = 0; i < dev->nr_vring; i++) {
> +		struct vhost_virtqueue *vq = dev->virtqueue[i];
> +
> +		if (dev->notify_ops->vring_state_changed)
> +			dev->notify_ops->vring_state_changed(dev->vid, i, vq-
> >enabled);
> +	}
> +}
> +
>  static void
>  vduse_events_handler(int fd, void *arg, int *remove __rte_unused)
>  {
> @@ -167,6 +281,10 @@ vduse_events_handler(int fd, void *arg, int *remove
> __rte_unused)
>  		VHOST_LOG_CONFIG(dev->ifname, INFO, "\tnew status: 0x%08x\n",
>  				req.s.status);
>  		dev->status = req.s.status;
> +
> +		if (dev->status & VIRTIO_DEVICE_STATUS_DRIVER_OK)
> +			vduse_device_start(dev);
> +
>  		resp.result = VDUSE_REQ_RESULT_OK;
>  		break;
>  	case VDUSE_UPDATE_IOTLB:
> --
> 2.39.2

Reviewed-by: Chenbo Xia <chenbo.xia@intel.com> 

^ permalink raw reply	[flat|nested] 79+ messages in thread
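
On the application side, the start sequence above translates into the standard Vhost notification callbacks: new_device() once DRIVER_OK is seen, then vring_state_changed() for each virtqueue. A minimal sketch of what a backend application would register; the names here are illustrative:

    #include <rte_vhost.h>

    static int
    new_device_cb(int vid)
    {
    	/* the datapath may start processing the vrings of 'vid' from here */
    	return 0;
    }

    static int
    vring_state_changed_cb(int vid, uint16_t queue_id, int enable)
    {
    	/* enable or disable polling of this particular queue */
    	(void)vid; (void)queue_id; (void)enable;
    	return 0;
    }

    static const struct rte_vhost_device_ops vduse_notify_ops = {
    	.new_device = new_device_cb,
    	.vring_state_changed = vring_state_changed_cb,
    };

    /* registered before rte_vhost_driver_start() with:
     * rte_vhost_driver_callback_register(path, &vduse_notify_ops); */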

* RE: [RFC 27/27] vhost: add multiqueue support to VDUSE
  2023-03-31 15:42 ` [RFC 27/27] vhost: add multiqueue support to VDUSE Maxime Coquelin
@ 2023-05-09  5:35   ` Xia, Chenbo
  0 siblings, 0 replies; 79+ messages in thread
From: Xia, Chenbo @ 2023-05-09  5:35 UTC (permalink / raw)
  To: Maxime Coquelin, dev, david.marchand, mkp, fbl, jasowang, Liang,
	Cunming, Xie, Yongji, echaudro, eperezma, amorenoz

> -----Original Message-----
> From: Maxime Coquelin <maxime.coquelin@redhat.com>
> Sent: Friday, March 31, 2023 11:43 PM
> To: dev@dpdk.org; david.marchand@redhat.com; Xia, Chenbo
> <chenbo.xia@intel.com>; mkp@redhat.com; fbl@redhat.com;
> jasowang@redhat.com; Liang, Cunming <cunming.liang@intel.com>; Xie, Yongji
> <xieyongji@bytedance.com>; echaudro@redhat.com; eperezma@redhat.com;
> amorenoz@redhat.com
> Cc: Maxime Coquelin <maxime.coquelin@redhat.com>
> Subject: [RFC 27/27] vhost: add multiqueue support to VDUSE
> 
> This patch enables control queue support in order to
> support multiqueue.
> 
> Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
> ---
>  lib/vhost/vduse.c | 69 ++++++++++++++++++++++++++++++++++++++++++-----
>  1 file changed, 63 insertions(+), 6 deletions(-)
> 
> diff --git a/lib/vhost/vduse.c b/lib/vhost/vduse.c
> index 1cd04b4872..135e78fc35 100644
> --- a/lib/vhost/vduse.c
> +++ b/lib/vhost/vduse.c
> @@ -21,6 +21,7 @@
>  #include "iotlb.h"
>  #include "vduse.h"
>  #include "vhost.h"
> +#include "virtio_net_ctrl.h"
> 
>  #define VHOST_VDUSE_API_VERSION 0
>  #define VDUSE_CTRL_PATH "/dev/vduse/control"
> @@ -31,7 +32,9 @@
>  				(1ULL << VIRTIO_RING_F_INDIRECT_DESC) | \
>  				(1ULL << VIRTIO_RING_F_EVENT_IDX) | \
>  				(1ULL << VIRTIO_F_IN_ORDER) | \
> -				(1ULL << VIRTIO_F_IOMMU_PLATFORM))
> +				(1ULL << VIRTIO_F_IOMMU_PLATFORM) | \
> +				(1ULL << VIRTIO_NET_F_CTRL_VQ) | \
> +				(1ULL << VIRTIO_NET_F_MQ))
> 
>  struct vduse {
>  	struct fdset fdset;
> @@ -127,6 +130,25 @@ static struct vhost_backend_ops vduse_backend_ops = {
>  	.inject_irq = vduse_inject_irq,
>  };
> 
> +static void
> +vduse_control_queue_event(int fd, void *arg, int *remove __rte_unused)
> +{
> +	struct virtio_net *dev = arg;
> +	uint64_t buf;
> +	int ret;
> +
> +	ret = read(fd, &buf, sizeof(buf));
> +	if (ret < 0) {
> +		VHOST_LOG_CONFIG(dev->ifname, ERR, "Failed to read control queue event: %s\n",
> +				strerror(errno));
> +		return;
> +	}
> +
> +	VHOST_LOG_CONFIG(dev->ifname, DEBUG, "Control queue kicked\n");
> +	if (virtio_net_ctrl_handle(dev))
> +		VHOST_LOG_CONFIG(dev->ifname, ERR, "Failed to handle ctrl request\n");
> +}
> +
>  static void
>  vduse_vring_setup(struct virtio_net *dev, unsigned int index)
>  {
> @@ -192,6 +214,18 @@ vduse_vring_setup(struct virtio_net *dev, unsigned int index)
>  		vq->kickfd = VIRTIO_UNINITIALIZED_EVENTFD;
>  		return;
>  	}
> +
> +	if (vq == dev->cvq) {
> +		vhost_enable_guest_notification(dev, vq, 1);
> +		ret = fdset_add(&vduse.fdset, vq->kickfd, vduse_control_queue_event, NULL, dev);
> +		if (ret) {
> +			VHOST_LOG_CONFIG(dev->ifname, ERR,
> +					"Failed to setup kickfd handler for VQ %u: %s\n",
> +					index, strerror(errno));
> +			close(vq->kickfd);
> +			vq->kickfd = VIRTIO_UNINITIALIZED_EVENTFD;
> +		}
> +	}
>  }
> 
>  static void
> @@ -236,6 +270,9 @@ vduse_device_start(struct virtio_net *dev)
>  	for (i = 0; i < dev->nr_vring; i++) {
>  		struct vhost_virtqueue *vq = dev->virtqueue[i];
> 
> +		if (vq == dev->cvq)
> +			continue;
> +
>  		if (dev->notify_ops->vring_state_changed)
>  			dev->notify_ops->vring_state_changed(dev->vid, i, vq-
> >enabled);
>  	}
> @@ -315,8 +352,9 @@ vduse_device_create(const char *path)
>  {
>  	int control_fd, dev_fd, vid, ret;
>  	pthread_t fdset_tid;
> -	uint32_t i;
> +	uint32_t i, max_queue_pairs;
>  	struct virtio_net *dev;
> +	struct virtio_net_config vnet_config = { 0 };
>  	uint64_t ver = VHOST_VDUSE_API_VERSION;
>  	struct vduse_dev_config *dev_config = NULL;
>  	const char *name = path + strlen("/dev/vduse/");
> @@ -357,22 +395,33 @@ vduse_device_create(const char *path)
>  		goto out_ctrl_close;
>  	}
> 
> -	dev_config = malloc(offsetof(struct vduse_dev_config, config));
> +	dev_config = malloc(offsetof(struct vduse_dev_config, config) +
> +			sizeof(vnet_config));
>  	if (!dev_config) {
>  		VHOST_LOG_CONFIG(name, ERR, "Failed to allocate VDUSE config\n");
>  		ret = -1;
>  		goto out_ctrl_close;
>  	}
> 
> +	ret = rte_vhost_driver_get_queue_num(path, &max_queue_pairs);
> +	if (ret < 0) {
> +		VHOST_LOG_CONFIG(name, ERR, "Failed to get max queue pairs\n");
> +		goto out_free;
> +	}
> +
> +	VHOST_LOG_CONFIG(path, INFO, "VDUSE max queue pairs: %u\n", max_queue_pairs);
> +
> +	vnet_config.max_virtqueue_pairs = max_queue_pairs;
>  	memset(dev_config, 0, sizeof(struct vduse_dev_config));
> 
>  	strncpy(dev_config->name, name, VDUSE_NAME_MAX - 1);
>  	dev_config->device_id = VIRTIO_ID_NET;
>  	dev_config->vendor_id = 0;
>  	dev_config->features = VDUSE_NET_SUPPORTED_FEATURES;
> -	dev_config->vq_num = 2;
> +	dev_config->vq_num = max_queue_pairs * 2 + 1; /* Includes ctrl queue */
>  	dev_config->vq_align = sysconf(_SC_PAGE_SIZE);
> -	dev_config->config_size = 0;
> +	dev_config->config_size = sizeof(struct virtio_net_config);
> +	memcpy(dev_config->config, &vnet_config, sizeof(vnet_config));
> 
>  	ret = ioctl(control_fd, VDUSE_CREATE_DEV, dev_config);
>  	if (ret < 0) {
> @@ -407,7 +456,7 @@ vduse_device_create(const char *path)
>  	dev->vduse_dev_fd = dev_fd;
>  	vhost_setup_virtio_net(dev->vid, true, true, true, true);
> 
> -	for (i = 0; i < 2; i++) {
> +	for (i = 0; i < max_queue_pairs * 2 + 1; i++) {
>  		struct vduse_vq_config vq_cfg = { 0 };
> 
>  		ret = alloc_vring_queue(dev, i);
> @@ -426,6 +475,8 @@ vduse_device_create(const char *path)
>  		}
>  	}
> 
> +	dev->cvq = dev->virtqueue[max_queue_pairs * 2];
> +
>  	ret = fdset_add(&vduse.fdset, dev->vduse_dev_fd, vduse_events_handler, NULL, dev);
>  	if (ret) {
>  		VHOST_LOG_CONFIG(name, ERR, "Failed to add fd %d to vduse fdset\n",
> @@ -471,6 +522,12 @@ vduse_device_destroy(const char *path)
>  	if (vid == RTE_MAX_VHOST_DEVICE)
>  		return -1;
> 
> +	if (dev->cvq && dev->cvq->kickfd >= 0) {
> +		fdset_del(&vduse.fdset, dev->cvq->kickfd);
> +		close(dev->cvq->kickfd);
> +		dev->cvq->kickfd = VIRTIO_UNINITIALIZED_EVENTFD;
> +	}
> +
>  	fdset_del(&vduse.fdset, dev->vduse_dev_fd);
> 
>  	if (dev->vduse_dev_fd >= 0) {
> --
> 2.39.2

Reviewed-by: Chenbo Xia <chenbo.xia@intel.com> 

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [RFC 12/27] vhost: add IOTLB cache entry removal callback
  2023-05-05  5:07   ` Xia, Chenbo
@ 2023-05-25 11:20     ` Maxime Coquelin
  0 siblings, 0 replies; 79+ messages in thread
From: Maxime Coquelin @ 2023-05-25 11:20 UTC (permalink / raw)
  To: Xia, Chenbo, dev, david.marchand, mkp, fbl, jasowang, Liang,
	Cunming, Xie, Yongji, echaudro, eperezma, amorenoz



On 5/5/23 07:07, Xia, Chenbo wrote:
> Hi Maxime,
> 
>> -----Original Message-----
>> From: Maxime Coquelin <maxime.coquelin@redhat.com>
>> Sent: Friday, March 31, 2023 11:43 PM
>> To: dev@dpdk.org; david.marchand@redhat.com; Xia, Chenbo
>> <chenbo.xia@intel.com>; mkp@redhat.com; fbl@redhat.com;
>> jasowang@redhat.com; Liang, Cunming <cunming.liang@intel.com>; Xie, Yongji
>> <xieyongji@bytedance.com>; echaudro@redhat.com; eperezma@redhat.com;
>> amorenoz@redhat.com
>> Cc: Maxime Coquelin <maxime.coquelin@redhat.com>
>> Subject: [RFC 12/27] vhost: add IOTLB cache entry removal callback
>>
>> VDUSE will need to munmap() the IOTLB entry on removal
>> from the cache, as it performs mmap() before insertion.
>>
>> This patch introduces a callback that the VDUSE layer will
>> implement to achieve this.
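
As an illustration, the VDUSE-side implementation of this callback (added
later in the series, see patch 20 quoted further down this thread) boils
down to a single munmap() of the previously mapped range — a sketch for
context only:

    #include <stdint.h>
    #include <sys/mman.h>

    /* Sketch mirroring the VDUSE implementation from patch 20 of this series:
     * the IOTLB entry was mmap()ed at insertion time, so removal unmaps
     * offset + size bytes starting from the mapped user address.
     */
    static void
    vduse_iotlb_remove_notify(uint64_t addr, uint64_t offset, uint64_t size)
    {
    	munmap((void *)(uintptr_t)addr, offset + size);
    }
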
>>
>> Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
>> ---
>>   lib/vhost/iotlb.c | 12 ++++++++++++
>>   lib/vhost/vhost.h |  4 ++++
>>   2 files changed, 16 insertions(+)
>>
>> diff --git a/lib/vhost/iotlb.c b/lib/vhost/iotlb.c
>> index 188dfb8e38..86b0be62b4 100644
>> --- a/lib/vhost/iotlb.c
>> +++ b/lib/vhost/iotlb.c
>> @@ -25,6 +25,15 @@ struct vhost_iotlb_entry {
>>
>>   #define IOTLB_CACHE_SIZE 2048
>>
>> +static void
>> +vhost_user_iotlb_remove_notify(struct virtio_net *dev, struct vhost_iotlb_entry *entry)
>> +{
>> +	if (dev->backend_ops->iotlb_remove_notify == NULL)
>> +		return;
>> +
>> +	dev->backend_ops->iotlb_remove_notify(entry->uaddr, entry->uoffset, entry->size);
>> +}
>> +
>>   static bool
>>   vhost_user_iotlb_share_page(struct vhost_iotlb_entry *a, struct vhost_iotlb_entry *b)
>>   {
>> @@ -198,6 +207,7 @@ vhost_user_iotlb_cache_remove_all(struct virtio_net *dev)
>>   		vhost_user_iotlb_set_dump(node);
>>
>>   		TAILQ_REMOVE(&dev->iotlb_list, node, next);
>> +		vhost_user_iotlb_remove_notify(dev, node);
>>   		vhost_user_iotlb_pool_put(dev, node);
>>   	}
>>
>> @@ -223,6 +233,7 @@ vhost_user_iotlb_cache_random_evict(struct virtio_net *dev)
>>   			vhost_user_iotlb_clear_dump(node, prev_node, next_node);
>>
>>   			TAILQ_REMOVE(&dev->iotlb_list, node, next);
>> +			vhost_user_iotlb_remove_notify(dev, node);
>>   			vhost_user_iotlb_pool_put(dev, node);
>>   			dev->iotlb_cache_nr--;
>>   			break;
>> @@ -314,6 +325,7 @@ vhost_user_iotlb_cache_remove(struct virtio_net *dev, uint64_t iova, uint64_t si
>>   			vhost_user_iotlb_clear_dump(node, prev_node, next_node);
>>
>>   			TAILQ_REMOVE(&dev->iotlb_list, node, next);
>> +			vhost_user_iotlb_remove_notify(dev, node);
>>   			vhost_user_iotlb_pool_put(dev, node);
>>   			dev->iotlb_cache_nr--;
>>   		} else {
>> diff --git a/lib/vhost/vhost.h b/lib/vhost/vhost.h
>> index cc5c707205..2ad26f6951 100644
>> --- a/lib/vhost/vhost.h
>> +++ b/lib/vhost/vhost.h
>> @@ -89,10 +89,14 @@
>>   	for (iter = val; iter < num; iter++)
>>   #endif
>>
>> +struct virtio_net;
> 
> Adding this in patch 13 could be better since this patch is not using it.

Right, I changed vhost_iotlb_remove_notify cb prototype but forgot to 
remove struct virtio_net afterwards.

Changed in upcoming v2.

Thanks,
Maxime


> Thanks,
> Chenbo
> 
>> +typedef void (*vhost_iotlb_remove_notify)(uint64_t addr, uint64_t off, uint64_t size);
>> +
>>   /**
>>    * Structure that contains backend-specific ops.
>>    */
>>   struct vhost_backend_ops {
>> +	vhost_iotlb_remove_notify iotlb_remove_notify;
>>   };
>>
>>   /**
>> --
>> 2.39.2
> 


^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [RFC 15/27] vhost: add API to set max queue pairs
  2023-05-05  5:07   ` Xia, Chenbo
@ 2023-05-25 11:23     ` Maxime Coquelin
  0 siblings, 0 replies; 79+ messages in thread
From: Maxime Coquelin @ 2023-05-25 11:23 UTC (permalink / raw)
  To: Xia, Chenbo, dev, david.marchand, mkp, fbl, jasowang, Liang,
	Cunming, Xie, Yongji, echaudro, eperezma, amorenoz



On 5/5/23 07:07, Xia, Chenbo wrote:
> Hi Maxime,
> 
>> -----Original Message-----
>> From: Maxime Coquelin <maxime.coquelin@redhat.com>
>> Sent: Friday, March 31, 2023 11:43 PM
>> To: dev@dpdk.org; david.marchand@redhat.com; Xia, Chenbo
>> <chenbo.xia@intel.com>; mkp@redhat.com; fbl@redhat.com;
>> jasowang@redhat.com; Liang, Cunming <cunming.liang@intel.com>; Xie, Yongji
>> <xieyongji@bytedance.com>; echaudro@redhat.com; eperezma@redhat.com;
>> amorenoz@redhat.com
>> Cc: Maxime Coquelin <maxime.coquelin@redhat.com>
>> Subject: [RFC 15/27] vhost: add API to set max queue pairs
>>
>> This patch introduces a new rte_vhost_driver_set_max_queue_num
>> API as preliminary work for multiqueue support with VDUSE.
>>
>> Indeed, with VDUSE the vrings have to be pre-allocated at
>> device creation time, so this API is needed to avoid always
>> allocating the 128 queue pairs supported by the Vhost library.
>>
>> Calling the API is optional; 128 queue pairs remains the
>> default.
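
For context, a rough usage sketch from an application's point of view, based
on the diff below (the socket must already be registered for the call to
succeed; the path, flags and queue-pair count here are placeholders):

    #include <rte_vhost.h>

    /* Sketch: cap the device at 4 queue pairs instead of the default 128.
     * "/dev/vduse/vduse0" and the value 4 are placeholders.
     */
    static int
    setup_vhost_device(void)
    {
    	const char *path = "/dev/vduse/vduse0";

    	if (rte_vhost_driver_register(path, 0) < 0)
    		return -1;

    	/* Optional: without this call, 128 queue pairs remains the default */
    	if (rte_vhost_driver_set_max_queue_num(path, 4) < 0)
    		return -1;

    	return rte_vhost_driver_start(path);
    }
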
>>
>> Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
>> ---
>>   doc/guides/prog_guide/vhost_lib.rst |  4 ++++
>>   lib/vhost/rte_vhost.h               | 17 ++++++++++++++
>>   lib/vhost/socket.c                  | 36 +++++++++++++++++++++++++++--
>>   lib/vhost/version.map               |  3 +++
>>   4 files changed, 58 insertions(+), 2 deletions(-)
> 
> Also add changes in release notes? Btw: somewhere we should also mention vduse
> support is added in release notes.
Correct, I need to update the release note in v2.

Thanks,
Maxime

> Thanks,
> Chenbo
> 
>>
>> diff --git a/doc/guides/prog_guide/vhost_lib.rst b/doc/guides/prog_guide/vhost_lib.rst
>> index e8bb8c9b7b..cd4b109139 100644
>> --- a/doc/guides/prog_guide/vhost_lib.rst
>> +++ b/doc/guides/prog_guide/vhost_lib.rst
>> @@ -334,6 +334,10 @@ The following is an overview of some key Vhost API functions:
>>     Clean DMA vChannel finished to use. After this function is called,
>>     the specified DMA vChannel should no longer be used by the Vhost library.
>>
>> +* ``rte_vhost_driver_set_max_queue_num(path, max_queue_pairs)``
>> +
>> +  Set the maximum number of queue pairs supported by the device.
>> +
>>   Vhost-user Implementations
>>   --------------------------
>>
>> diff --git a/lib/vhost/rte_vhost.h b/lib/vhost/rte_vhost.h
>> index 58a5d4be92..44cbfcb469 100644
>> --- a/lib/vhost/rte_vhost.h
>> +++ b/lib/vhost/rte_vhost.h
>> @@ -588,6 +588,23 @@ rte_vhost_driver_get_protocol_features(const char *path,
>>   int
>>   rte_vhost_driver_get_queue_num(const char *path, uint32_t *queue_num);
>>
>> +/**
>> + * @warning
>> + * @b EXPERIMENTAL: this API may change, or be removed, without prior notice.
>> + *
>> + * Set the maximum number of queue pairs supported by the device.
>> + *
>> + * @param path
>> + *  The vhost-user socket file path
>> + * @param max_queue_pairs
>> + *  The maximum number of queue pairs
>> + * @return
>> + *  0 on success, -1 on failure
>> + */
>> +__rte_experimental
>> +int
>> +rte_vhost_driver_set_max_queue_num(const char *path, uint32_t max_queue_pairs);
>> +
>>   /**
>>    * Get the feature bits after negotiation
>>    *
>> diff --git a/lib/vhost/socket.c b/lib/vhost/socket.c
>> index ba54263824..e95c3ffeac 100644
>> --- a/lib/vhost/socket.c
>> +++ b/lib/vhost/socket.c
>> @@ -56,6 +56,8 @@ struct vhost_user_socket {
>>
>>   	uint64_t protocol_features;
>>
>> +	uint32_t max_queue_pairs;
>> +
>>   	struct rte_vdpa_device *vdpa_dev;
>>
>>   	struct rte_vhost_device_ops const *notify_ops;
>> @@ -821,7 +823,7 @@ rte_vhost_driver_get_queue_num(const char *path, uint32_t *queue_num)
>>
>>   	vdpa_dev = vsocket->vdpa_dev;
>>   	if (!vdpa_dev) {
>> -		*queue_num = VHOST_MAX_QUEUE_PAIRS;
>> +		*queue_num = vsocket->max_queue_pairs;
>>   		goto unlock_exit;
>>   	}
>>
>> @@ -831,7 +833,36 @@ rte_vhost_driver_get_queue_num(const char *path, uint32_t *queue_num)
>>   		goto unlock_exit;
>>   	}
>>
>> -	*queue_num = RTE_MIN((uint32_t)VHOST_MAX_QUEUE_PAIRS, vdpa_queue_num);
>> +	*queue_num = RTE_MIN(vsocket->max_queue_pairs, vdpa_queue_num);
>> +
>> +unlock_exit:
>> +	pthread_mutex_unlock(&vhost_user.mutex);
>> +	return ret;
>> +}
>> +
>> +int
>> +rte_vhost_driver_set_max_queue_num(const char *path, uint32_t max_queue_pairs)
>> +{
>> +	struct vhost_user_socket *vsocket;
>> +	int ret = 0;
>> +
>> +	VHOST_LOG_CONFIG(path, INFO, "Setting max queue pairs to %u\n", max_queue_pairs);
>> +
>> +	if (max_queue_pairs > VHOST_MAX_QUEUE_PAIRS) {
>> +		VHOST_LOG_CONFIG(path, ERR, "Library only supports up to %u queue pairs\n",
>> +				VHOST_MAX_QUEUE_PAIRS);
>> +		return -1;
>> +	}
>> +
>> +	pthread_mutex_lock(&vhost_user.mutex);
>> +	vsocket = find_vhost_user_socket(path);
>> +	if (!vsocket) {
>> +		VHOST_LOG_CONFIG(path, ERR, "socket file is not registered yet.\n");
>> +		ret = -1;
>> +		goto unlock_exit;
>> +	}
>> +
>> +	vsocket->max_queue_pairs = max_queue_pairs;
>>
>>   unlock_exit:
>>   	pthread_mutex_unlock(&vhost_user.mutex);
>> @@ -890,6 +921,7 @@ rte_vhost_driver_register(const char *path, uint64_t flags)
>>   		goto out_free;
>>   	}
>>   	vsocket->vdpa_dev = NULL;
>> +	vsocket->max_queue_pairs = VHOST_MAX_QUEUE_PAIRS;
>>   	vsocket->extbuf = flags & RTE_VHOST_USER_EXTBUF_SUPPORT;
>>   	vsocket->linearbuf = flags & RTE_VHOST_USER_LINEARBUF_SUPPORT;
>>   	vsocket->async_copy = flags & RTE_VHOST_USER_ASYNC_COPY;
>> diff --git a/lib/vhost/version.map b/lib/vhost/version.map
>> index d322a4a888..dffb126aa8 100644
>> --- a/lib/vhost/version.map
>> +++ b/lib/vhost/version.map
>> @@ -98,6 +98,9 @@ EXPERIMENTAL {
>>   	# added in 22.11
>>   	rte_vhost_async_dma_unconfigure;
>>   	rte_vhost_vring_call_nonblock;
>> +
>> +	# added in 23.07
>> +	rte_vhost_driver_set_max_queue_num;
>>   };
>>
>>   INTERNAL {
>> --
>> 2.39.2
> 


^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [RFC 20/27] vhost: add VDUSE callback for IOTLB entry removal
  2023-05-09  5:32   ` Xia, Chenbo
@ 2023-05-25 11:35     ` Maxime Coquelin
  0 siblings, 0 replies; 79+ messages in thread
From: Maxime Coquelin @ 2023-05-25 11:35 UTC (permalink / raw)
  To: Xia, Chenbo, dev, david.marchand, mkp, fbl, jasowang, Liang,
	Cunming, Xie, Yongji, echaudro, eperezma, amorenoz



On 5/9/23 07:32, Xia, Chenbo wrote:
> Hi Maxime,
> 
>> -----Original Message-----
>> From: Maxime Coquelin <maxime.coquelin@redhat.com>
>> Sent: Friday, March 31, 2023 11:43 PM
>> To: dev@dpdk.org; david.marchand@redhat.com; Xia, Chenbo
>> <chenbo.xia@intel.com>; mkp@redhat.com; fbl@redhat.com;
>> jasowang@redhat.com; Liang, Cunming <cunming.liang@intel.com>; Xie, Yongji
>> <xieyongji@bytedance.com>; echaudro@redhat.com; eperezma@redhat.com;
>> amorenoz@redhat.com
>> Cc: Maxime Coquelin <maxime.coquelin@redhat.com>
>> Subject: [RFC 20/27] vhost: add VDUSE callback for IOTLB entry removal
>>
>> This patch implements the VDUSE callback for IOTLB misses,
> 
> for IOTLB entry removal? This commit message seems the same as patch 19's.
> You may want to change it :)
I indeed need to rework the commit messages of both patches 19 & 20.

Thanks for the review,
Maxime

> Thanks,
> Chenbo
> 
>> where it unmaps the pages from the invalidated IOTLB entry
>>
>> Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
>> ---
>>   lib/vhost/vduse.c | 7 +++++++
>>   1 file changed, 7 insertions(+)
>>
>> diff --git a/lib/vhost/vduse.c b/lib/vhost/vduse.c
>> index f46823f589..ff4c9e72f1 100644
>> --- a/lib/vhost/vduse.c
>> +++ b/lib/vhost/vduse.c
>> @@ -32,6 +32,12 @@
>>   				(1ULL << VIRTIO_F_IN_ORDER) | \
>>   				(1ULL << VIRTIO_F_IOMMU_PLATFORM))
>>
>> +static void
>> +vduse_iotlb_remove_notify(uint64_t addr, uint64_t offset, uint64_t size)
>> +{
>> +	munmap((void *)(uintptr_t)addr, offset + size);
>> +}
>> +
>>   static int
>>   vduse_iotlb_miss(struct virtio_net *dev, uint64_t iova, uint8_t perm __rte_unused)
>>   {
>> @@ -89,6 +95,7 @@ vduse_iotlb_miss(struct virtio_net *dev, uint64_t iova, uint8_t perm __rte_unuse
>>
>>   static struct vhost_backend_ops vduse_backend_ops = {
>>   	.iotlb_miss = vduse_iotlb_miss,
>> +	.iotlb_remove_notify = vduse_iotlb_remove_notify,
>>   };
>>
>>   int
>> --
>> 2.39.2
> 


^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [RFC 25/27] vhost: add support for VDUSE IOTLB update event
  2023-05-09  5:35   ` Xia, Chenbo
@ 2023-05-25 11:43     ` Maxime Coquelin
  0 siblings, 0 replies; 79+ messages in thread
From: Maxime Coquelin @ 2023-05-25 11:43 UTC (permalink / raw)
  To: Xia, Chenbo, dev, david.marchand, mkp, fbl, jasowang, Liang,
	Cunming, Xie, Yongji, echaudro, eperezma, amorenoz



On 5/9/23 07:35, Xia, Chenbo wrote:
> Hi Maxime,
> 
>> -----Original Message-----
>> From: Maxime Coquelin <maxime.coquelin@redhat.com>
>> Sent: Friday, March 31, 2023 11:43 PM
>> To: dev@dpdk.org; david.marchand@redhat.com; Xia, Chenbo
>> <chenbo.xia@intel.com>; mkp@redhat.com; fbl@redhat.com;
>> jasowang@redhat.com; Liang, Cunming <cunming.liang@intel.com>; Xie, Yongji
>> <xieyongji@bytedance.com>; echaudro@redhat.com; eperezma@redhat.com;
>> amorenoz@redhat.com
>> Cc: Maxime Coquelin <maxime.coquelin@redhat.com>
>> Subject: [RFC 25/27] vhost: add support for VDUSE IOTLB update event
>>
>> This patch adds support for VDUSE_UPDATE_IOTLB event
>> handling, which consists of invalidating IOTLB entries for
>> the range specified in the request.
>>
>> Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
>> ---
>>   lib/vhost/vduse.c | 6 ++++++
>>   1 file changed, 6 insertions(+)
>>
>> diff --git a/lib/vhost/vduse.c b/lib/vhost/vduse.c
>> index 7d59a5f709..b5b9fa2eb1 100644
>> --- a/lib/vhost/vduse.c
>> +++ b/lib/vhost/vduse.c
>> @@ -169,6 +169,12 @@ vduse_events_handler(int fd, void *arg, int *remove __rte_unused)
>>   		dev->status = req.s.status;
>>   		resp.result = VDUSE_REQ_RESULT_OK;
>>   		break;
>> +	case VDUSE_UPDATE_IOTLB:
>> +		VHOST_LOG_CONFIG(dev->ifname, INFO, "\tIOVA range: %" PRIx64 " - %" PRIx64 "\n",
>> +				(uint64_t)req.iova.start, (uint64_t)req.iova.last);
>> +		vhost_user_iotlb_cache_remove(dev, req.iova.start,
>> +				req.iova.last - req.iova.start + 1);
>> +		break;
> 
> We don't need to set the response result here?

Good catch! We indeed need to send the reply for this message.
I'm fixing it now.

Thanks,
Maxime
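
For reference, the fix discussed above presumably amounts to acknowledging the
request the way the status-change case already does — a sketch of the handler
fragment only, not the actual v2 code:

    	/* Inside the vduse_events_handler() switch on req.type */
    	case VDUSE_UPDATE_IOTLB:
    		VHOST_LOG_CONFIG(dev->ifname, INFO, "\tIOVA range: %" PRIx64 " - %" PRIx64 "\n",
    				(uint64_t)req.iova.start, (uint64_t)req.iova.last);
    		vhost_user_iotlb_cache_remove(dev, req.iova.start,
    				req.iova.last - req.iova.start + 1);
    		/* Acknowledge the request, like the status-change case above */
    		resp.result = VDUSE_REQ_RESULT_OK;
    		break;
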

> Thanks,
> Chenbo
> 
>>   	default:
>>   		resp.result = VDUSE_REQ_RESULT_FAILED;
>>   		break;
>> --
>> 2.39.2
> 


^ permalink raw reply	[flat|nested] 79+ messages in thread

end of thread, other threads:[~2023-05-25 11:43 UTC | newest]

Thread overview: 79+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-03-31 15:42 [RFC 00/27] Add VDUSE support to Vhost library Maxime Coquelin
2023-03-31 15:42 ` [RFC 01/27] vhost: fix missing guest notif stat increment Maxime Coquelin
2023-04-24  2:57   ` Xia, Chenbo
2023-03-31 15:42 ` [RFC 02/27] vhost: fix invalid call FD handling Maxime Coquelin
2023-04-24  2:58   ` Xia, Chenbo
2023-03-31 15:42 ` [RFC 03/27] vhost: fix IOTLB entries overlap check with previous entry Maxime Coquelin
2023-04-17 19:15   ` Mike Pattrick
2023-04-24  2:58   ` Xia, Chenbo
2023-03-31 15:42 ` [RFC 04/27] vhost: add helper of IOTLB entries coredump Maxime Coquelin
2023-04-24  2:59   ` Xia, Chenbo
2023-03-31 15:42 ` [RFC 05/27] vhost: add helper for IOTLB entries shared page check Maxime Coquelin
2023-04-17 19:39   ` Mike Pattrick
2023-04-19  9:35     ` Maxime Coquelin
2023-04-19 14:52       ` Mike Pattrick
2023-04-24  2:59   ` Xia, Chenbo
2023-03-31 15:42 ` [RFC 06/27] vhost: don't dump unneeded pages with IOTLB Maxime Coquelin
2023-04-20 17:11   ` Mike Pattrick
2023-04-24  3:00   ` Xia, Chenbo
2023-03-31 15:42 ` [RFC 07/27] vhost: change to single IOTLB cache per device Maxime Coquelin
2023-04-25  6:19   ` Xia, Chenbo
2023-05-03 13:47     ` Maxime Coquelin
2023-03-31 15:42 ` [RFC 08/27] vhost: add offset field to IOTLB entries Maxime Coquelin
2023-04-25  6:20   ` Xia, Chenbo
2023-03-31 15:42 ` [RFC 09/27] vhost: add page size info to IOTLB entry Maxime Coquelin
2023-04-25  6:20   ` Xia, Chenbo
2023-05-03 13:57     ` Maxime Coquelin
2023-03-31 15:42 ` [RFC 10/27] vhost: retry translating IOVA after IOTLB miss Maxime Coquelin
2023-05-05  5:07   ` Xia, Chenbo
2023-03-31 15:42 ` [RFC 11/27] vhost: introduce backend ops Maxime Coquelin
2023-05-05  5:07   ` Xia, Chenbo
2023-03-31 15:42 ` [RFC 12/27] vhost: add IOTLB cache entry removal callback Maxime Coquelin
2023-05-05  5:07   ` Xia, Chenbo
2023-05-25 11:20     ` Maxime Coquelin
2023-03-31 15:42 ` [RFC 13/27] vhost: add helper for IOTLB misses Maxime Coquelin
2023-03-31 15:42 ` [RFC 14/27] vhost: add helper for interrupt injection Maxime Coquelin
2023-05-05  5:07   ` Xia, Chenbo
2023-03-31 15:42 ` [RFC 15/27] vhost: add API to set max queue pairs Maxime Coquelin
2023-05-05  5:07   ` Xia, Chenbo
2023-05-25 11:23     ` Maxime Coquelin
2023-03-31 15:42 ` [RFC 16/27] net/vhost: use " Maxime Coquelin
2023-05-05  5:07   ` Xia, Chenbo
2023-03-31 15:42 ` [RFC 17/27] vhost: add control virtqueue support Maxime Coquelin
2023-05-09  5:29   ` Xia, Chenbo
2023-03-31 15:42 ` [RFC 18/27] vhost: add VDUSE device creation and destruction Maxime Coquelin
2023-05-09  5:31   ` Xia, Chenbo
2023-03-31 15:42 ` [RFC 19/27] vhost: add VDUSE callback for IOTLB miss Maxime Coquelin
2023-05-09  5:31   ` Xia, Chenbo
2023-03-31 15:42 ` [RFC 20/27] vhost: add VDUSE callback for IOTLB entry removal Maxime Coquelin
2023-05-09  5:32   ` Xia, Chenbo
2023-05-25 11:35     ` Maxime Coquelin
2023-03-31 15:42 ` [RFC 21/27] vhost: add VDUSE callback for IRQ injection Maxime Coquelin
2023-05-09  5:33   ` Xia, Chenbo
2023-03-31 15:42 ` [RFC 22/27] vhost: add VDUSE events handler Maxime Coquelin
2023-05-09  5:34   ` Xia, Chenbo
2023-03-31 15:42 ` [RFC 23/27] vhost: add support for virtqueue state get event Maxime Coquelin
2023-05-09  5:34   ` Xia, Chenbo
2023-03-31 15:42 ` [RFC 24/27] vhost: add support for VDUSE status set event Maxime Coquelin
2023-05-09  5:34   ` Xia, Chenbo
2023-03-31 15:42 ` [RFC 25/27] vhost: add support for VDUSE IOTLB update event Maxime Coquelin
2023-05-09  5:35   ` Xia, Chenbo
2023-05-25 11:43     ` Maxime Coquelin
2023-03-31 15:42 ` [RFC 26/27] vhost: add VDUSE device startup Maxime Coquelin
2023-05-09  5:35   ` Xia, Chenbo
2023-03-31 15:42 ` [RFC 27/27] vhost: add multiqueue support to VDUSE Maxime Coquelin
2023-05-09  5:35   ` Xia, Chenbo
2023-04-06  3:44 ` [RFC 00/27] Add VDUSE support to Vhost library Yongji Xie
2023-04-06  8:16   ` Maxime Coquelin
2023-04-06 11:04     ` Yongji Xie
2023-04-12 11:33 ` Ferruh Yigit
2023-04-12 15:28   ` Maxime Coquelin
2023-04-12 19:40     ` Morten Brørup
2023-04-13  7:08       ` Xia, Chenbo
2023-04-13  7:58         ` Morten Brørup
2023-04-13  7:59         ` Maxime Coquelin
2023-04-14 10:48           ` Ferruh Yigit
2023-04-14 12:06             ` Maxime Coquelin
2023-04-14 14:25               ` Ferruh Yigit
2023-04-17  3:10                 ` Jason Wang
2023-05-05  5:53 ` Xia, Chenbo

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).