* [PATCH v4 00/26] Add VDUSE support to Vhost library
@ 2023-06-01 20:07 Maxime Coquelin
2023-06-01 20:07 ` [PATCH v4 01/26] vhost: fix IOTLB entries overlap check with previous entry Maxime Coquelin
` (25 more replies)
0 siblings, 26 replies; 30+ messages in thread
From: Maxime Coquelin @ 2023-06-01 20:07 UTC (permalink / raw)
To: dev, chenbo.xia, david.marchand, mkp, fbl, jasowang,
cunming.liang, xieyongji, echaudro, eperezma, amorenoz, lulu
Cc: Maxime Coquelin
This series introduces a new type of backend, VDUSE,
to the Vhost library.
VDUSE stands for vDPA Device in Userspace. It enables
implementing a Virtio device in userspace and attaching
it to the Kernel vDPA bus.
Once attached to the vDPA bus, the device can be used by
Kernel Virtio drivers, such as virtio-net in our case, via
the virtio-vdpa driver. In that case, the device is visible
to the Kernel networking stack and is exposed to userspace
as a regular netdev.
It can also be exposed to userspace through the vhost-vdpa
driver, via a vhost-vdpa chardev that can be passed to
QEMU or the Virtio-user PMD.
While VDUSE support is already available in the upstream
Kernel, a couple of patches are required to support the
network device type:
https://gitlab.com/mcoquelin/linux/-/tree/vduse_networking_rfc
In order to attach the created VDUSE device to the vDPA
bus, a recent iproute2 version containing the vdpa tool is
required.
Benchmark results:
==================
With this v2, the PVP reference benchmark has been run and
compared with Vhost-user.
When doing macswap forwarding in the workload, no difference is seen.
When doing io forwarding in the workload, we see a 4% performance
degradation with VDUSE, compared to Vhost-user/Virtio-user. It is
explained by the use of the IOTLB layer in the Vhost library when using
VDUSE, whereas Vhost-user/Virtio-user does not make use of it.
Usage:
======
1. Probe required Kernel modules
# modprobe vdpa
# modprobe vduse
# modprobe virtio-vdpa
2. Build (requires VDUSE kernel headers to be available)
# meson build
# ninja -C build
3. Create a VDUSE device (vduse0) using Vhost PMD with
testpmd (with 4 queue pairs in this example)
# ./build/app/dpdk-testpmd --no-pci --vdev=net_vhost0,iface=/dev/vduse/vduse0,queues=4 --log-level=*:9 -- -i --txq=4 --rxq=4
4. Attach the VDUSE device to the vDPA bus
# vdpa dev add name vduse0 mgmtdev vduse
=> The virtio-net netdev shows up (eth0 here)
# ip l show eth0
21: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
link/ether c2:73:ea:a7:68:6d brd ff:ff:ff:ff:ff:ff
5. Start/stop traffic in testpmd
testpmd> start
testpmd> show port stats 0
######################## NIC statistics for port 0 ########################
RX-packets: 11 RX-missed: 0 RX-bytes: 1482
RX-errors: 0
RX-nombuf: 0
TX-packets: 1 TX-errors: 0 TX-bytes: 62
Throughput (since last show)
Rx-pps: 0 Rx-bps: 0
Tx-pps: 0 Tx-bps: 0
############################################################################
testpmd> stop
6. Detach the VDUSE device from the vDPA bus
# vdpa dev del vduse0
7. Quit testpmd
testpmd> quit
Known issues & remaining work:
==============================
- Fix issue in FD manager (still polling while FD has been removed)
- Add Netlink support in Vhost library
- Support device reconnection
-> a temporary patch to support reconnection via a tmpfs file is available;
the upstream solution will be in-kernel and is being developed.
-> https://gitlab.com/mcoquelin/dpdk-next-virtio/-/commit/5ad06ce14159a9ce36ee168dd13ef389cec91137
- Support packed ring
- Provide more performance benchmark results
Changes in v4:
==============
- Applied patch 1 and patch 2 from v3
- Rebased on top of Eelco series
- Fix coredump clear in IOTLB cache removal (David)
- Remove unneeded ret variable in vhost_vring_inject_irq (David)
- Fixed release note (David, Chenbo)
Changes in v2/v3:
=================
- Fixed mem_set_dump() parameter (patch 4)
- Fixed accidental comment change (patch 7, Chenbo)
- Change from __builtin_ctz to __builtin_ctzll (patch 9, Chenbo)
- move change from patch 12 to 13 (Chenbo)
- Enable locks annotation for control queue (Patch 17)
- Send control queue notification when used descriptors enqueued (Patch 17)
- Lock control queue IOTLB lock (Patch 17)
- Fix error path in virtio_net_ctrl_pop() (Patch 17, Chenbo)
- Set VDUSE dev FD as NONBLOCK (Patch 18)
- Enable more Virtio features (Patch 18)
- Remove calls to pthread_setcancelstate() (Patch 22)
- Add calls to fdset_pipe_notify() when adding and deleting FDs from a set (Patch 22)
- Use RTE_DIM() to get requests string array size (Patch 22)
- Set reply result for IOTLB update message (Patch 25, Chenbo)
- Fix queues enablement with multiqueue (Patch 26)
- Move kickfd creation for better logging (Patch 26)
- Improve logging (Patch 26)
- Uninstall cvq kickfd in case of handler installation failure (Patch 27)
- Enable CVQ notifications once handler is installed (Patch 27)
- Don't advertise multiqueue and control queue if app only requests a single queue pair (Patch 27)
- Add release notes
Maxime Coquelin (26):
vhost: fix IOTLB entries overlap check with previous entry
vhost: add helper of IOTLB entries coredump
vhost: add helper for IOTLB entries shared page check
vhost: don't dump unneeded pages with IOTLB
vhost: change to single IOTLB cache per device
vhost: add offset field to IOTLB entries
vhost: add page size info to IOTLB entry
vhost: retry translating IOVA after IOTLB miss
vhost: introduce backend ops
vhost: add IOTLB cache entry removal callback
vhost: add helper for IOTLB misses
vhost: add helper for interrupt injection
vhost: add API to set max queue pairs
net/vhost: use API to set max queue pairs
vhost: add control virtqueue support
vhost: add VDUSE device creation and destruction
vhost: add VDUSE callback for IOTLB miss
vhost: add VDUSE callback for IOTLB entry removal
vhost: add VDUSE callback for IRQ injection
vhost: add VDUSE events handler
vhost: add support for virtqueue state get event
vhost: add support for VDUSE status set event
vhost: add support for VDUSE IOTLB update event
vhost: add VDUSE device startup
vhost: add multiqueue support to VDUSE
vhost: add VDUSE device stop
doc/guides/prog_guide/vhost_lib.rst | 4 +
doc/guides/rel_notes/release_23_07.rst | 11 +
drivers/net/vhost/rte_eth_vhost.c | 3 +
lib/vhost/iotlb.c | 333 +++++++------
lib/vhost/iotlb.h | 45 +-
lib/vhost/meson.build | 5 +
lib/vhost/rte_vhost.h | 17 +
lib/vhost/socket.c | 72 ++-
lib/vhost/vduse.c | 646 +++++++++++++++++++++++++
lib/vhost/vduse.h | 33 ++
lib/vhost/version.map | 1 +
lib/vhost/vhost.c | 70 ++-
lib/vhost/vhost.h | 57 ++-
lib/vhost/vhost_user.c | 51 +-
lib/vhost/vhost_user.h | 2 +-
lib/vhost/virtio_net_ctrl.c | 286 +++++++++++
lib/vhost/virtio_net_ctrl.h | 10 +
17 files changed, 1408 insertions(+), 238 deletions(-)
create mode 100644 lib/vhost/vduse.c
create mode 100644 lib/vhost/vduse.h
create mode 100644 lib/vhost/virtio_net_ctrl.c
create mode 100644 lib/vhost/virtio_net_ctrl.h
--
2.40.1
^ permalink raw reply [flat|nested] 30+ messages in thread
* [PATCH v4 01/26] vhost: fix IOTLB entries overlap check with previous entry
2023-06-01 20:07 [PATCH v4 00/26] Add VDUSE support to Vhost library Maxime Coquelin
@ 2023-06-01 20:07 ` Maxime Coquelin
2023-06-01 20:07 ` [PATCH v4 02/26] vhost: add helper of IOTLB entries coredump Maxime Coquelin
` (24 subsequent siblings)
25 siblings, 0 replies; 30+ messages in thread
From: Maxime Coquelin @ 2023-06-01 20:07 UTC (permalink / raw)
To: dev, chenbo.xia, david.marchand, mkp, fbl, jasowang,
cunming.liang, xieyongji, echaudro, eperezma, amorenoz, lulu
Cc: Maxime Coquelin, stable
Commit 22b6d0ac691a ("vhost: fix madvise IOTLB entries pages overlap check")
fixed the check to ensure the entry to be removed does not
overlap with the next one in the IOTLB cache before marking
it as DONTDUMP with madvise(). This is not enough, because
the same issue is present when comparing with the previous
entry in the cache, where the end address of the previous
entry should be used, not the start one.
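The difference can be illustrated with a standalone sketch (hypothetical
addresses chosen for illustration, not taken from the patch; 4 KiB page,
so mask = ~0xFFF):

```c
#include <stdint.h>

/*
 * A previous entry starting at 0x0000 with size 0x1800 ends inside the
 * page at 0x1000, which is also the page the entry being removed
 * (uaddr 0x1900) starts in.
 */

/* Buggy check: compares the page of the previous entry's START address. */
static int
share_page_start_based(uint64_t prev_uaddr, uint64_t node_uaddr, uint64_t mask)
{
	return (node_uaddr & mask) == (prev_uaddr & mask);
}

/* Fixed check: compares the page of the previous entry's END address. */
static int
share_page_end_based(uint64_t prev_uaddr, uint64_t prev_size,
		uint64_t node_uaddr, uint64_t mask)
{
	return (node_uaddr & mask) == ((prev_uaddr + prev_size - 1) & mask);
}
```

With these inputs, the start-based check misses the overlap and the page
would wrongly be marked DONTDUMP, while the end-based check catches it.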
Fixes: dea092d0addb ("vhost: fix madvise arguments alignment")
Cc: stable@dpdk.org
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Mike Pattrick <mkp@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
---
lib/vhost/iotlb.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/lib/vhost/iotlb.c b/lib/vhost/iotlb.c
index 3f45bc6061..870c8acb88 100644
--- a/lib/vhost/iotlb.c
+++ b/lib/vhost/iotlb.c
@@ -178,8 +178,8 @@ vhost_user_iotlb_cache_random_evict(struct virtio_net *dev, struct vhost_virtque
mask = ~(alignment - 1);
/* Don't disable coredump if the previous node is in the same page */
- if (prev_node == NULL ||
- (node->uaddr & mask) != (prev_node->uaddr & mask)) {
+ if (prev_node == NULL || (node->uaddr & mask) !=
+ ((prev_node->uaddr + prev_node->size - 1) & mask)) {
next_node = RTE_TAILQ_NEXT(node, next);
/* Don't disable coredump if the next node is in the same page */
if (next_node == NULL || ((node->uaddr + node->size - 1) & mask) !=
@@ -283,8 +283,8 @@ vhost_user_iotlb_cache_remove(struct virtio_net *dev, struct vhost_virtqueue *vq
mask = ~(alignment-1);
/* Don't disable coredump if the previous node is in the same page */
- if (prev_node == NULL ||
- (node->uaddr & mask) != (prev_node->uaddr & mask)) {
+ if (prev_node == NULL || (node->uaddr & mask) !=
+ ((prev_node->uaddr + prev_node->size - 1) & mask)) {
next_node = RTE_TAILQ_NEXT(node, next);
/* Don't disable coredump if the next node is in the same page */
if (next_node == NULL || ((node->uaddr + node->size - 1) & mask) !=
--
2.40.1
* [PATCH v4 02/26] vhost: add helper of IOTLB entries coredump
2023-06-01 20:07 [PATCH v4 00/26] Add VDUSE support to Vhost library Maxime Coquelin
2023-06-01 20:07 ` [PATCH v4 01/26] vhost: fix IOTLB entries overlap check with previous entry Maxime Coquelin
@ 2023-06-01 20:07 ` Maxime Coquelin
2023-06-01 20:07 ` [PATCH v4 03/26] vhost: add helper for IOTLB entries shared page check Maxime Coquelin
` (23 subsequent siblings)
25 siblings, 0 replies; 30+ messages in thread
From: Maxime Coquelin @ 2023-06-01 20:07 UTC (permalink / raw)
To: dev, chenbo.xia, david.marchand, mkp, fbl, jasowang,
cunming.liang, xieyongji, echaudro, eperezma, amorenoz, lulu
Cc: Maxime Coquelin
This patch reworks the IOTLB code to extract the
madvise-related bits into a dedicated helper. This
refactoring improves code sharing.
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
---
lib/vhost/iotlb.c | 77 +++++++++++++++++++++++++----------------------
1 file changed, 41 insertions(+), 36 deletions(-)
diff --git a/lib/vhost/iotlb.c b/lib/vhost/iotlb.c
index 870c8acb88..51d45de446 100644
--- a/lib/vhost/iotlb.c
+++ b/lib/vhost/iotlb.c
@@ -23,6 +23,34 @@ struct vhost_iotlb_entry {
#define IOTLB_CACHE_SIZE 2048
+static void
+vhost_user_iotlb_set_dump(struct virtio_net *dev, struct vhost_iotlb_entry *node)
+{
+ uint64_t align;
+
+ align = hua_to_alignment(dev->mem, (void *)(uintptr_t)node->uaddr);
+
+ mem_set_dump((void *)(uintptr_t)node->uaddr, node->size, true, align);
+}
+
+static void
+vhost_user_iotlb_clear_dump(struct virtio_net *dev, struct vhost_iotlb_entry *node,
+ struct vhost_iotlb_entry *prev, struct vhost_iotlb_entry *next)
+{
+ uint64_t align, mask;
+
+ align = hua_to_alignment(dev->mem, (void *)(uintptr_t)node->uaddr);
+ mask = ~(align - 1);
+
+ /* Don't disable coredump if the previous node is in the same page */
+ if (prev == NULL || (node->uaddr & mask) != ((prev->uaddr + prev->size - 1) & mask)) {
+ /* Don't disable coredump if the next node is in the same page */
+ if (next == NULL ||
+ ((node->uaddr + node->size - 1) & mask) != (next->uaddr & mask))
+ mem_set_dump((void *)(uintptr_t)node->uaddr, node->size, false, align);
+ }
+}
+
static struct vhost_iotlb_entry *
vhost_user_iotlb_pool_get(struct vhost_virtqueue *vq)
{
@@ -149,8 +177,8 @@ vhost_user_iotlb_cache_remove_all(struct virtio_net *dev, struct vhost_virtqueue
rte_rwlock_write_lock(&vq->iotlb_lock);
RTE_TAILQ_FOREACH_SAFE(node, &vq->iotlb_list, next, temp_node) {
- mem_set_dump((void *)(uintptr_t)node->uaddr, node->size, false,
- hua_to_alignment(dev->mem, (void *)(uintptr_t)node->uaddr));
+ vhost_user_iotlb_clear_dump(dev, node, NULL, NULL);
+
TAILQ_REMOVE(&vq->iotlb_list, node, next);
vhost_user_iotlb_pool_put(vq, node);
}
@@ -164,7 +192,6 @@ static void
vhost_user_iotlb_cache_random_evict(struct virtio_net *dev, struct vhost_virtqueue *vq)
{
struct vhost_iotlb_entry *node, *temp_node, *prev_node = NULL;
- uint64_t alignment, mask;
int entry_idx;
rte_rwlock_write_lock(&vq->iotlb_lock);
@@ -173,20 +200,10 @@ vhost_user_iotlb_cache_random_evict(struct virtio_net *dev, struct vhost_virtque
RTE_TAILQ_FOREACH_SAFE(node, &vq->iotlb_list, next, temp_node) {
if (!entry_idx) {
- struct vhost_iotlb_entry *next_node;
- alignment = hua_to_alignment(dev->mem, (void *)(uintptr_t)node->uaddr);
- mask = ~(alignment - 1);
-
- /* Don't disable coredump if the previous node is in the same page */
- if (prev_node == NULL || (node->uaddr & mask) !=
- ((prev_node->uaddr + prev_node->size - 1) & mask)) {
- next_node = RTE_TAILQ_NEXT(node, next);
- /* Don't disable coredump if the next node is in the same page */
- if (next_node == NULL || ((node->uaddr + node->size - 1) & mask) !=
- (next_node->uaddr & mask))
- mem_set_dump((void *)(uintptr_t)node->uaddr, node->size,
- false, alignment);
- }
+ struct vhost_iotlb_entry *next_node = RTE_TAILQ_NEXT(node, next);
+
+ vhost_user_iotlb_clear_dump(dev, node, prev_node, next_node);
+
TAILQ_REMOVE(&vq->iotlb_list, node, next);
vhost_user_iotlb_pool_put(vq, node);
vq->iotlb_cache_nr--;
@@ -240,16 +257,16 @@ vhost_user_iotlb_cache_insert(struct virtio_net *dev, struct vhost_virtqueue *vq
vhost_user_iotlb_pool_put(vq, new_node);
goto unlock;
} else if (node->iova > new_node->iova) {
- mem_set_dump((void *)(uintptr_t)new_node->uaddr, new_node->size, true,
- hua_to_alignment(dev->mem, (void *)(uintptr_t)new_node->uaddr));
+ vhost_user_iotlb_set_dump(dev, new_node);
+
TAILQ_INSERT_BEFORE(node, new_node, next);
vq->iotlb_cache_nr++;
goto unlock;
}
}
- mem_set_dump((void *)(uintptr_t)new_node->uaddr, new_node->size, true,
- hua_to_alignment(dev->mem, (void *)(uintptr_t)new_node->uaddr));
+ vhost_user_iotlb_set_dump(dev, new_node);
+
TAILQ_INSERT_TAIL(&vq->iotlb_list, new_node, next);
vq->iotlb_cache_nr++;
@@ -265,7 +282,6 @@ vhost_user_iotlb_cache_remove(struct virtio_net *dev, struct vhost_virtqueue *vq
uint64_t iova, uint64_t size)
{
struct vhost_iotlb_entry *node, *temp_node, *prev_node = NULL;
- uint64_t alignment, mask;
if (unlikely(!size))
return;
@@ -278,20 +294,9 @@ vhost_user_iotlb_cache_remove(struct virtio_net *dev, struct vhost_virtqueue *vq
break;
if (iova < node->iova + node->size) {
- struct vhost_iotlb_entry *next_node;
- alignment = hua_to_alignment(dev->mem, (void *)(uintptr_t)node->uaddr);
- mask = ~(alignment-1);
-
- /* Don't disable coredump if the previous node is in the same page */
- if (prev_node == NULL || (node->uaddr & mask) !=
- ((prev_node->uaddr + prev_node->size - 1) & mask)) {
- next_node = RTE_TAILQ_NEXT(node, next);
- /* Don't disable coredump if the next node is in the same page */
- if (next_node == NULL || ((node->uaddr + node->size - 1) & mask) !=
- (next_node->uaddr & mask))
- mem_set_dump((void *)(uintptr_t)node->uaddr, node->size,
- false, alignment);
- }
+ struct vhost_iotlb_entry *next_node = RTE_TAILQ_NEXT(node, next);
+
+ vhost_user_iotlb_clear_dump(dev, node, prev_node, next_node);
TAILQ_REMOVE(&vq->iotlb_list, node, next);
vhost_user_iotlb_pool_put(vq, node);
--
2.40.1
* [PATCH v4 03/26] vhost: add helper for IOTLB entries shared page check
2023-06-01 20:07 [PATCH v4 00/26] Add VDUSE support to Vhost library Maxime Coquelin
2023-06-01 20:07 ` [PATCH v4 01/26] vhost: fix IOTLB entries overlap check with previous entry Maxime Coquelin
2023-06-01 20:07 ` [PATCH v4 02/26] vhost: add helper of IOTLB entries coredump Maxime Coquelin
@ 2023-06-01 20:07 ` Maxime Coquelin
2023-06-01 20:07 ` [PATCH v4 04/26] vhost: don't dump unneeded pages with IOTLB Maxime Coquelin
` (22 subsequent siblings)
25 siblings, 0 replies; 30+ messages in thread
From: Maxime Coquelin @ 2023-06-01 20:07 UTC (permalink / raw)
To: dev, chenbo.xia, david.marchand, mkp, fbl, jasowang,
cunming.liang, xieyongji, echaudro, eperezma, amorenoz, lulu
Cc: Maxime Coquelin
This patch introduces a helper to check whether two IOTLB
entries share a page.
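The check can be sketched standalone (illustrative helper names; assumes
a power-of-two alignment, as returned by hua_to_alignment() in the diff
below):

```c
#include <stdint.h>

static uint64_t
align_floor(uint64_t v, uint64_t align)
{
	return v & ~(align - 1);
}

static uint64_t
align_ceil(uint64_t v, uint64_t align)
{
	return (v + align - 1) & ~(align - 1);
}

/*
 * Two entries share a page when the page-aligned end of the lower entry
 * extends past the page-aligned start of the higher one. Assumes
 * a_uaddr < b_uaddr, as in the sorted IOTLB list.
 */
static int
entries_share_page(uint64_t a_uaddr, uint64_t a_size, uint64_t b_uaddr,
		uint64_t align)
{
	return align_ceil(a_uaddr + a_size, align) > align_floor(b_uaddr, align);
}
```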
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Mike Pattrick <mkp@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
---
lib/vhost/iotlb.c | 25 ++++++++++++++++++++-----
1 file changed, 20 insertions(+), 5 deletions(-)
diff --git a/lib/vhost/iotlb.c b/lib/vhost/iotlb.c
index 51d45de446..4ef038adff 100644
--- a/lib/vhost/iotlb.c
+++ b/lib/vhost/iotlb.c
@@ -23,6 +23,23 @@ struct vhost_iotlb_entry {
#define IOTLB_CACHE_SIZE 2048
+static bool
+vhost_user_iotlb_share_page(struct vhost_iotlb_entry *a, struct vhost_iotlb_entry *b,
+ uint64_t align)
+{
+ uint64_t a_end, b_start;
+
+ if (a == NULL || b == NULL)
+ return false;
+
+ /* Assumes entry a lower than entry b */
+ RTE_ASSERT(a->uaddr < b->uaddr);
+ a_end = RTE_ALIGN_CEIL(a->uaddr + a->size, align);
+ b_start = RTE_ALIGN_FLOOR(b->uaddr, align);
+
+ return a_end > b_start;
+}
+
static void
vhost_user_iotlb_set_dump(struct virtio_net *dev, struct vhost_iotlb_entry *node)
{
@@ -37,16 +54,14 @@ static void
vhost_user_iotlb_clear_dump(struct virtio_net *dev, struct vhost_iotlb_entry *node,
struct vhost_iotlb_entry *prev, struct vhost_iotlb_entry *next)
{
- uint64_t align, mask;
+ uint64_t align;
align = hua_to_alignment(dev->mem, (void *)(uintptr_t)node->uaddr);
- mask = ~(align - 1);
/* Don't disable coredump if the previous node is in the same page */
- if (prev == NULL || (node->uaddr & mask) != ((prev->uaddr + prev->size - 1) & mask)) {
+ if (!vhost_user_iotlb_share_page(prev, node, align)) {
/* Don't disable coredump if the next node is in the same page */
- if (next == NULL ||
- ((node->uaddr + node->size - 1) & mask) != (next->uaddr & mask))
+ if (!vhost_user_iotlb_share_page(node, next, align))
mem_set_dump((void *)(uintptr_t)node->uaddr, node->size, false, align);
}
}
--
2.40.1
* [PATCH v4 04/26] vhost: don't dump unneeded pages with IOTLB
2023-06-01 20:07 [PATCH v4 00/26] Add VDUSE support to Vhost library Maxime Coquelin
` (2 preceding siblings ...)
2023-06-01 20:07 ` [PATCH v4 03/26] vhost: add helper for IOTLB entries shared page check Maxime Coquelin
@ 2023-06-01 20:07 ` Maxime Coquelin
2023-06-01 20:07 ` [PATCH v4 05/26] vhost: change to single IOTLB cache per device Maxime Coquelin
` (21 subsequent siblings)
25 siblings, 0 replies; 30+ messages in thread
From: Maxime Coquelin @ 2023-06-01 20:07 UTC (permalink / raw)
To: dev, chenbo.xia, david.marchand, mkp, fbl, jasowang,
cunming.liang, xieyongji, echaudro, eperezma, amorenoz, lulu
Cc: Maxime Coquelin, stable
On IOTLB entry removal, previous fixes took care of not
marking pages shared with other IOTLB entries as DONTDUMP.
However, if an IOTLB entry spans multiple pages, the other
pages were kept as DODUMP while they might not have been
shared with other entries, needlessly increasing the
coredump size.
This patch addresses this issue by excluding only the
shared pages from madvise's DONTDUMP.
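The resulting range computation can be sketched as follows (illustrative
helper, not the patch's code; a 4 KiB page is assumed in the example):

```c
#include <stdint.h>

static uint64_t
align_floor(uint64_t v, uint64_t align)
{
	return v & ~(align - 1);
}

static uint64_t
align_ceil(uint64_t v, uint64_t align)
{
	return (v + align - 1) & ~(align - 1);
}

/*
 * Compute the range to mark DONTDUMP when removing an entry: trim the
 * first (resp. last) page when it is shared with the previous (resp.
 * next) entry, instead of skipping the whole entry. Returns the length,
 * and the start address through *start.
 */
static uint64_t
dontdump_range(uint64_t uaddr, uint64_t size, int share_prev, int share_next,
		uint64_t align, uint64_t *start)
{
	uint64_t s = uaddr, e = uaddr + size;

	if (share_prev)
		s = align_ceil(s, align);
	if (share_next)
		e = align_floor(e, align);

	*start = s;
	return e > s ? e - s : 0;
}
```

For an entry at 0x1800 of size 0x3000 whose first and last pages are both
shared, only the middle range [0x2000, 0x4000) is marked DONTDUMP.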
Fixes: dea092d0addb ("vhost: fix madvise arguments alignment")
Cc: stable@dpdk.org
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Mike Pattrick <mkp@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
---
lib/vhost/iotlb.c | 21 ++++++++++++++-------
1 file changed, 14 insertions(+), 7 deletions(-)
diff --git a/lib/vhost/iotlb.c b/lib/vhost/iotlb.c
index 4ef038adff..95d67ac832 100644
--- a/lib/vhost/iotlb.c
+++ b/lib/vhost/iotlb.c
@@ -54,16 +54,23 @@ static void
vhost_user_iotlb_clear_dump(struct virtio_net *dev, struct vhost_iotlb_entry *node,
struct vhost_iotlb_entry *prev, struct vhost_iotlb_entry *next)
{
- uint64_t align;
+ uint64_t align, start, end;
+
+ start = node->uaddr;
+ end = node->uaddr + node->size;
align = hua_to_alignment(dev->mem, (void *)(uintptr_t)node->uaddr);
- /* Don't disable coredump if the previous node is in the same page */
- if (!vhost_user_iotlb_share_page(prev, node, align)) {
- /* Don't disable coredump if the next node is in the same page */
- if (!vhost_user_iotlb_share_page(node, next, align))
- mem_set_dump((void *)(uintptr_t)node->uaddr, node->size, false, align);
- }
+ /* Skip first page if shared with previous entry. */
+ if (vhost_user_iotlb_share_page(prev, node, align))
+ start = RTE_ALIGN_CEIL(start, align);
+
+ /* Skip last page if shared with next entry. */
+ if (vhost_user_iotlb_share_page(node, next, align))
+ end = RTE_ALIGN_FLOOR(end, align);
+
+ if (end > start)
+ mem_set_dump((void *)(uintptr_t)start, end - start, false, align);
}
static struct vhost_iotlb_entry *
--
2.40.1
* [PATCH v4 05/26] vhost: change to single IOTLB cache per device
2023-06-01 20:07 [PATCH v4 00/26] Add VDUSE support to Vhost library Maxime Coquelin
` (3 preceding siblings ...)
2023-06-01 20:07 ` [PATCH v4 04/26] vhost: don't dump unneeded pages with IOTLB Maxime Coquelin
@ 2023-06-01 20:07 ` Maxime Coquelin
2023-06-01 20:07 ` [PATCH v4 06/26] vhost: add offset field to IOTLB entries Maxime Coquelin
` (20 subsequent siblings)
25 siblings, 0 replies; 30+ messages in thread
From: Maxime Coquelin @ 2023-06-01 20:07 UTC (permalink / raw)
To: dev, chenbo.xia, david.marchand, mkp, fbl, jasowang,
cunming.liang, xieyongji, echaudro, eperezma, amorenoz, lulu
Cc: Maxime Coquelin
This patch simplifies the IOTLB implementation and reduces
IOTLB memory consumption by having a single IOTLB cache
per device, instead of one per queue.
In order not to impact performance, it keeps an IOTLB lock
per virtqueue, so that there is no contention between
multiple queues trying to acquire it.
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
---
lib/vhost/iotlb.c | 212 +++++++++++++++++++----------------------
lib/vhost/iotlb.h | 43 ++++++---
lib/vhost/vhost.c | 18 ++--
lib/vhost/vhost.h | 16 ++--
lib/vhost/vhost_user.c | 23 +++--
5 files changed, 159 insertions(+), 153 deletions(-)
diff --git a/lib/vhost/iotlb.c b/lib/vhost/iotlb.c
index 95d67ac832..6d49bf6b30 100644
--- a/lib/vhost/iotlb.c
+++ b/lib/vhost/iotlb.c
@@ -74,86 +74,81 @@ vhost_user_iotlb_clear_dump(struct virtio_net *dev, struct vhost_iotlb_entry *no
}
static struct vhost_iotlb_entry *
-vhost_user_iotlb_pool_get(struct vhost_virtqueue *vq)
+vhost_user_iotlb_pool_get(struct virtio_net *dev)
{
struct vhost_iotlb_entry *node;
- rte_spinlock_lock(&vq->iotlb_free_lock);
- node = SLIST_FIRST(&vq->iotlb_free_list);
+ rte_spinlock_lock(&dev->iotlb_free_lock);
+ node = SLIST_FIRST(&dev->iotlb_free_list);
if (node != NULL)
- SLIST_REMOVE_HEAD(&vq->iotlb_free_list, next_free);
- rte_spinlock_unlock(&vq->iotlb_free_lock);
+ SLIST_REMOVE_HEAD(&dev->iotlb_free_list, next_free);
+ rte_spinlock_unlock(&dev->iotlb_free_lock);
return node;
}
static void
-vhost_user_iotlb_pool_put(struct vhost_virtqueue *vq,
- struct vhost_iotlb_entry *node)
+vhost_user_iotlb_pool_put(struct virtio_net *dev, struct vhost_iotlb_entry *node)
{
- rte_spinlock_lock(&vq->iotlb_free_lock);
- SLIST_INSERT_HEAD(&vq->iotlb_free_list, node, next_free);
- rte_spinlock_unlock(&vq->iotlb_free_lock);
+ rte_spinlock_lock(&dev->iotlb_free_lock);
+ SLIST_INSERT_HEAD(&dev->iotlb_free_list, node, next_free);
+ rte_spinlock_unlock(&dev->iotlb_free_lock);
}
static void
-vhost_user_iotlb_cache_random_evict(struct virtio_net *dev, struct vhost_virtqueue *vq);
+vhost_user_iotlb_cache_random_evict(struct virtio_net *dev);
static void
-vhost_user_iotlb_pending_remove_all(struct vhost_virtqueue *vq)
+vhost_user_iotlb_pending_remove_all(struct virtio_net *dev)
{
struct vhost_iotlb_entry *node, *temp_node;
- rte_rwlock_write_lock(&vq->iotlb_pending_lock);
+ rte_rwlock_write_lock(&dev->iotlb_pending_lock);
- RTE_TAILQ_FOREACH_SAFE(node, &vq->iotlb_pending_list, next, temp_node) {
- TAILQ_REMOVE(&vq->iotlb_pending_list, node, next);
- vhost_user_iotlb_pool_put(vq, node);
+ RTE_TAILQ_FOREACH_SAFE(node, &dev->iotlb_pending_list, next, temp_node) {
+ TAILQ_REMOVE(&dev->iotlb_pending_list, node, next);
+ vhost_user_iotlb_pool_put(dev, node);
}
- rte_rwlock_write_unlock(&vq->iotlb_pending_lock);
+ rte_rwlock_write_unlock(&dev->iotlb_pending_lock);
}
bool
-vhost_user_iotlb_pending_miss(struct vhost_virtqueue *vq, uint64_t iova,
- uint8_t perm)
+vhost_user_iotlb_pending_miss(struct virtio_net *dev, uint64_t iova, uint8_t perm)
{
struct vhost_iotlb_entry *node;
bool found = false;
- rte_rwlock_read_lock(&vq->iotlb_pending_lock);
+ rte_rwlock_read_lock(&dev->iotlb_pending_lock);
- TAILQ_FOREACH(node, &vq->iotlb_pending_list, next) {
+ TAILQ_FOREACH(node, &dev->iotlb_pending_list, next) {
if ((node->iova == iova) && (node->perm == perm)) {
found = true;
break;
}
}
- rte_rwlock_read_unlock(&vq->iotlb_pending_lock);
+ rte_rwlock_read_unlock(&dev->iotlb_pending_lock);
return found;
}
void
-vhost_user_iotlb_pending_insert(struct virtio_net *dev, struct vhost_virtqueue *vq,
- uint64_t iova, uint8_t perm)
+vhost_user_iotlb_pending_insert(struct virtio_net *dev, uint64_t iova, uint8_t perm)
{
struct vhost_iotlb_entry *node;
- node = vhost_user_iotlb_pool_get(vq);
+ node = vhost_user_iotlb_pool_get(dev);
if (node == NULL) {
VHOST_LOG_CONFIG(dev->ifname, DEBUG,
- "IOTLB pool for vq %"PRIu32" empty, clear entries for pending insertion\n",
- vq->index);
- if (!TAILQ_EMPTY(&vq->iotlb_pending_list))
- vhost_user_iotlb_pending_remove_all(vq);
+ "IOTLB pool empty, clear entries for pending insertion\n");
+ if (!TAILQ_EMPTY(&dev->iotlb_pending_list))
+ vhost_user_iotlb_pending_remove_all(dev);
else
- vhost_user_iotlb_cache_random_evict(dev, vq);
- node = vhost_user_iotlb_pool_get(vq);
+ vhost_user_iotlb_cache_random_evict(dev);
+ node = vhost_user_iotlb_pool_get(dev);
if (node == NULL) {
VHOST_LOG_CONFIG(dev->ifname, ERR,
- "IOTLB pool vq %"PRIu32" still empty, pending insertion failure\n",
- vq->index);
+ "IOTLB pool still empty, pending insertion failure\n");
return;
}
}
@@ -161,22 +156,21 @@ vhost_user_iotlb_pending_insert(struct virtio_net *dev, struct vhost_virtqueue *
node->iova = iova;
node->perm = perm;
- rte_rwlock_write_lock(&vq->iotlb_pending_lock);
+ rte_rwlock_write_lock(&dev->iotlb_pending_lock);
- TAILQ_INSERT_TAIL(&vq->iotlb_pending_list, node, next);
+ TAILQ_INSERT_TAIL(&dev->iotlb_pending_list, node, next);
- rte_rwlock_write_unlock(&vq->iotlb_pending_lock);
+ rte_rwlock_write_unlock(&dev->iotlb_pending_lock);
}
void
-vhost_user_iotlb_pending_remove(struct vhost_virtqueue *vq,
- uint64_t iova, uint64_t size, uint8_t perm)
+vhost_user_iotlb_pending_remove(struct virtio_net *dev, uint64_t iova, uint64_t size, uint8_t perm)
{
struct vhost_iotlb_entry *node, *temp_node;
- rte_rwlock_write_lock(&vq->iotlb_pending_lock);
+ rte_rwlock_write_lock(&dev->iotlb_pending_lock);
- RTE_TAILQ_FOREACH_SAFE(node, &vq->iotlb_pending_list, next,
+ RTE_TAILQ_FOREACH_SAFE(node, &dev->iotlb_pending_list, next,
temp_node) {
if (node->iova < iova)
continue;
@@ -184,81 +178,78 @@ vhost_user_iotlb_pending_remove(struct vhost_virtqueue *vq,
continue;
if ((node->perm & perm) != node->perm)
continue;
- TAILQ_REMOVE(&vq->iotlb_pending_list, node, next);
- vhost_user_iotlb_pool_put(vq, node);
+ TAILQ_REMOVE(&dev->iotlb_pending_list, node, next);
+ vhost_user_iotlb_pool_put(dev, node);
}
- rte_rwlock_write_unlock(&vq->iotlb_pending_lock);
+ rte_rwlock_write_unlock(&dev->iotlb_pending_lock);
}
static void
-vhost_user_iotlb_cache_remove_all(struct virtio_net *dev, struct vhost_virtqueue *vq)
+vhost_user_iotlb_cache_remove_all(struct virtio_net *dev)
{
struct vhost_iotlb_entry *node, *temp_node;
- rte_rwlock_write_lock(&vq->iotlb_lock);
+ vhost_user_iotlb_wr_lock_all(dev);
- RTE_TAILQ_FOREACH_SAFE(node, &vq->iotlb_list, next, temp_node) {
+ RTE_TAILQ_FOREACH_SAFE(node, &dev->iotlb_list, next, temp_node) {
vhost_user_iotlb_clear_dump(dev, node, NULL, NULL);
- TAILQ_REMOVE(&vq->iotlb_list, node, next);
- vhost_user_iotlb_pool_put(vq, node);
+ TAILQ_REMOVE(&dev->iotlb_list, node, next);
+ vhost_user_iotlb_pool_put(dev, node);
}
- vq->iotlb_cache_nr = 0;
+ dev->iotlb_cache_nr = 0;
- rte_rwlock_write_unlock(&vq->iotlb_lock);
+ vhost_user_iotlb_wr_unlock_all(dev);
}
static void
-vhost_user_iotlb_cache_random_evict(struct virtio_net *dev, struct vhost_virtqueue *vq)
+vhost_user_iotlb_cache_random_evict(struct virtio_net *dev)
{
struct vhost_iotlb_entry *node, *temp_node, *prev_node = NULL;
int entry_idx;
- rte_rwlock_write_lock(&vq->iotlb_lock);
+ vhost_user_iotlb_wr_lock_all(dev);
- entry_idx = rte_rand() % vq->iotlb_cache_nr;
+ entry_idx = rte_rand() % dev->iotlb_cache_nr;
- RTE_TAILQ_FOREACH_SAFE(node, &vq->iotlb_list, next, temp_node) {
+ RTE_TAILQ_FOREACH_SAFE(node, &dev->iotlb_list, next, temp_node) {
if (!entry_idx) {
struct vhost_iotlb_entry *next_node = RTE_TAILQ_NEXT(node, next);
vhost_user_iotlb_clear_dump(dev, node, prev_node, next_node);
- TAILQ_REMOVE(&vq->iotlb_list, node, next);
- vhost_user_iotlb_pool_put(vq, node);
- vq->iotlb_cache_nr--;
+ TAILQ_REMOVE(&dev->iotlb_list, node, next);
+ vhost_user_iotlb_pool_put(dev, node);
+ dev->iotlb_cache_nr--;
break;
}
prev_node = node;
entry_idx--;
}
- rte_rwlock_write_unlock(&vq->iotlb_lock);
+ vhost_user_iotlb_wr_unlock_all(dev);
}
void
-vhost_user_iotlb_cache_insert(struct virtio_net *dev, struct vhost_virtqueue *vq,
- uint64_t iova, uint64_t uaddr,
+vhost_user_iotlb_cache_insert(struct virtio_net *dev, uint64_t iova, uint64_t uaddr,
uint64_t size, uint8_t perm)
{
struct vhost_iotlb_entry *node, *new_node;
- new_node = vhost_user_iotlb_pool_get(vq);
+ new_node = vhost_user_iotlb_pool_get(dev);
if (new_node == NULL) {
VHOST_LOG_CONFIG(dev->ifname, DEBUG,
- "IOTLB pool vq %"PRIu32" empty, clear entries for cache insertion\n",
- vq->index);
- if (!TAILQ_EMPTY(&vq->iotlb_list))
- vhost_user_iotlb_cache_random_evict(dev, vq);
+ "IOTLB pool empty, clear entries for cache insertion\n");
+ if (!TAILQ_EMPTY(&dev->iotlb_list))
+ vhost_user_iotlb_cache_random_evict(dev);
else
- vhost_user_iotlb_pending_remove_all(vq);
- new_node = vhost_user_iotlb_pool_get(vq);
+ vhost_user_iotlb_pending_remove_all(dev);
+ new_node = vhost_user_iotlb_pool_get(dev);
if (new_node == NULL) {
VHOST_LOG_CONFIG(dev->ifname, ERR,
- "IOTLB pool vq %"PRIu32" still empty, cache insertion failed\n",
- vq->index);
+ "IOTLB pool still empty, cache insertion failed\n");
return;
}
}
@@ -268,49 +259,47 @@ vhost_user_iotlb_cache_insert(struct virtio_net *dev, struct vhost_virtqueue *vq
new_node->size = size;
new_node->perm = perm;
- rte_rwlock_write_lock(&vq->iotlb_lock);
+ vhost_user_iotlb_wr_lock_all(dev);
- TAILQ_FOREACH(node, &vq->iotlb_list, next) {
+ TAILQ_FOREACH(node, &dev->iotlb_list, next) {
/*
* Entries must be invalidated before being updated.
* So if iova already in list, assume identical.
*/
if (node->iova == new_node->iova) {
- vhost_user_iotlb_pool_put(vq, new_node);
+ vhost_user_iotlb_pool_put(dev, new_node);
goto unlock;
} else if (node->iova > new_node->iova) {
vhost_user_iotlb_set_dump(dev, new_node);
TAILQ_INSERT_BEFORE(node, new_node, next);
- vq->iotlb_cache_nr++;
+ dev->iotlb_cache_nr++;
goto unlock;
}
}
vhost_user_iotlb_set_dump(dev, new_node);
- TAILQ_INSERT_TAIL(&vq->iotlb_list, new_node, next);
- vq->iotlb_cache_nr++;
+ TAILQ_INSERT_TAIL(&dev->iotlb_list, new_node, next);
+ dev->iotlb_cache_nr++;
unlock:
- vhost_user_iotlb_pending_remove(vq, iova, size, perm);
-
- rte_rwlock_write_unlock(&vq->iotlb_lock);
+ vhost_user_iotlb_pending_remove(dev, iova, size, perm);
+ vhost_user_iotlb_wr_unlock_all(dev);
}
void
-vhost_user_iotlb_cache_remove(struct virtio_net *dev, struct vhost_virtqueue *vq,
- uint64_t iova, uint64_t size)
+vhost_user_iotlb_cache_remove(struct virtio_net *dev, uint64_t iova, uint64_t size)
{
struct vhost_iotlb_entry *node, *temp_node, *prev_node = NULL;
if (unlikely(!size))
return;
- rte_rwlock_write_lock(&vq->iotlb_lock);
+ vhost_user_iotlb_wr_lock_all(dev);
- RTE_TAILQ_FOREACH_SAFE(node, &vq->iotlb_list, next, temp_node) {
+ RTE_TAILQ_FOREACH_SAFE(node, &dev->iotlb_list, next, temp_node) {
/* Sorted list */
if (unlikely(iova + size < node->iova))
break;
@@ -320,19 +309,19 @@ vhost_user_iotlb_cache_remove(struct virtio_net *dev, struct vhost_virtqueue *vq
vhost_user_iotlb_clear_dump(dev, node, prev_node, next_node);
- TAILQ_REMOVE(&vq->iotlb_list, node, next);
- vhost_user_iotlb_pool_put(vq, node);
- vq->iotlb_cache_nr--;
- } else
+ TAILQ_REMOVE(&dev->iotlb_list, node, next);
+ vhost_user_iotlb_pool_put(dev, node);
+ dev->iotlb_cache_nr--;
+ } else {
prev_node = node;
+ }
}
- rte_rwlock_write_unlock(&vq->iotlb_lock);
+ vhost_user_iotlb_wr_unlock_all(dev);
}
uint64_t
-vhost_user_iotlb_cache_find(struct vhost_virtqueue *vq, uint64_t iova,
- uint64_t *size, uint8_t perm)
+vhost_user_iotlb_cache_find(struct virtio_net *dev, uint64_t iova, uint64_t *size, uint8_t perm)
{
struct vhost_iotlb_entry *node;
uint64_t offset, vva = 0, mapped = 0;
@@ -340,7 +329,7 @@ vhost_user_iotlb_cache_find(struct vhost_virtqueue *vq, uint64_t iova,
if (unlikely(!*size))
goto out;
- TAILQ_FOREACH(node, &vq->iotlb_list, next) {
+ TAILQ_FOREACH(node, &dev->iotlb_list, next) {
/* List sorted by iova */
if (unlikely(iova < node->iova))
break;
@@ -373,60 +362,57 @@ vhost_user_iotlb_cache_find(struct vhost_virtqueue *vq, uint64_t iova,
}
void
-vhost_user_iotlb_flush_all(struct virtio_net *dev, struct vhost_virtqueue *vq)
+vhost_user_iotlb_flush_all(struct virtio_net *dev)
{
- vhost_user_iotlb_cache_remove_all(dev, vq);
- vhost_user_iotlb_pending_remove_all(vq);
+ vhost_user_iotlb_cache_remove_all(dev);
+ vhost_user_iotlb_pending_remove_all(dev);
}
int
-vhost_user_iotlb_init(struct virtio_net *dev, struct vhost_virtqueue *vq)
+vhost_user_iotlb_init(struct virtio_net *dev)
{
unsigned int i;
int socket = 0;
- if (vq->iotlb_pool) {
+ if (dev->iotlb_pool) {
/*
* The cache has already been initialized,
* just drop all cached and pending entries.
*/
- vhost_user_iotlb_flush_all(dev, vq);
- rte_free(vq->iotlb_pool);
+ vhost_user_iotlb_flush_all(dev);
+ rte_free(dev->iotlb_pool);
}
#ifdef RTE_LIBRTE_VHOST_NUMA
- if (get_mempolicy(&socket, NULL, 0, vq, MPOL_F_NODE | MPOL_F_ADDR) != 0)
+ if (get_mempolicy(&socket, NULL, 0, dev, MPOL_F_NODE | MPOL_F_ADDR) != 0)
socket = 0;
#endif
- rte_spinlock_init(&vq->iotlb_free_lock);
- rte_rwlock_init(&vq->iotlb_lock);
- rte_rwlock_init(&vq->iotlb_pending_lock);
+ rte_spinlock_init(&dev->iotlb_free_lock);
+ rte_rwlock_init(&dev->iotlb_pending_lock);
- SLIST_INIT(&vq->iotlb_free_list);
- TAILQ_INIT(&vq->iotlb_list);
- TAILQ_INIT(&vq->iotlb_pending_list);
+ SLIST_INIT(&dev->iotlb_free_list);
+ TAILQ_INIT(&dev->iotlb_list);
+ TAILQ_INIT(&dev->iotlb_pending_list);
if (dev->flags & VIRTIO_DEV_SUPPORT_IOMMU) {
- vq->iotlb_pool = rte_calloc_socket("iotlb", IOTLB_CACHE_SIZE,
+ dev->iotlb_pool = rte_calloc_socket("iotlb", IOTLB_CACHE_SIZE,
sizeof(struct vhost_iotlb_entry), 0, socket);
- if (!vq->iotlb_pool) {
- VHOST_LOG_CONFIG(dev->ifname, ERR,
- "Failed to create IOTLB cache pool for vq %"PRIu32"\n",
- vq->index);
+ if (!dev->iotlb_pool) {
+ VHOST_LOG_CONFIG(dev->ifname, ERR, "Failed to create IOTLB cache pool\n");
return -1;
}
for (i = 0; i < IOTLB_CACHE_SIZE; i++)
- vhost_user_iotlb_pool_put(vq, &vq->iotlb_pool[i]);
+ vhost_user_iotlb_pool_put(dev, &dev->iotlb_pool[i]);
}
- vq->iotlb_cache_nr = 0;
+ dev->iotlb_cache_nr = 0;
return 0;
}
void
-vhost_user_iotlb_destroy(struct vhost_virtqueue *vq)
+vhost_user_iotlb_destroy(struct virtio_net *dev)
{
- rte_free(vq->iotlb_pool);
+ rte_free(dev->iotlb_pool);
}
diff --git a/lib/vhost/iotlb.h b/lib/vhost/iotlb.h
index 73b5465b41..3490b9e6be 100644
--- a/lib/vhost/iotlb.h
+++ b/lib/vhost/iotlb.h
@@ -37,20 +37,37 @@ vhost_user_iotlb_wr_unlock(struct vhost_virtqueue *vq)
rte_rwlock_write_unlock(&vq->iotlb_lock);
}
-void vhost_user_iotlb_cache_insert(struct virtio_net *dev, struct vhost_virtqueue *vq,
- uint64_t iova, uint64_t uaddr,
+static __rte_always_inline void
+vhost_user_iotlb_wr_lock_all(struct virtio_net *dev)
+ __rte_no_thread_safety_analysis
+{
+ uint32_t i;
+
+ for (i = 0; i < dev->nr_vring; i++)
+ rte_rwlock_write_lock(&dev->virtqueue[i]->iotlb_lock);
+}
+
+static __rte_always_inline void
+vhost_user_iotlb_wr_unlock_all(struct virtio_net *dev)
+ __rte_no_thread_safety_analysis
+{
+ uint32_t i;
+
+ for (i = 0; i < dev->nr_vring; i++)
+ rte_rwlock_write_unlock(&dev->virtqueue[i]->iotlb_lock);
+}
+
+void vhost_user_iotlb_cache_insert(struct virtio_net *dev, uint64_t iova, uint64_t uaddr,
uint64_t size, uint8_t perm);
-void vhost_user_iotlb_cache_remove(struct virtio_net *dev, struct vhost_virtqueue *vq,
- uint64_t iova, uint64_t size);
-uint64_t vhost_user_iotlb_cache_find(struct vhost_virtqueue *vq, uint64_t iova,
+void vhost_user_iotlb_cache_remove(struct virtio_net *dev, uint64_t iova, uint64_t size);
+uint64_t vhost_user_iotlb_cache_find(struct virtio_net *dev, uint64_t iova,
uint64_t *size, uint8_t perm);
-bool vhost_user_iotlb_pending_miss(struct vhost_virtqueue *vq, uint64_t iova,
- uint8_t perm);
-void vhost_user_iotlb_pending_insert(struct virtio_net *dev, struct vhost_virtqueue *vq,
- uint64_t iova, uint8_t perm);
-void vhost_user_iotlb_pending_remove(struct vhost_virtqueue *vq, uint64_t iova,
+bool vhost_user_iotlb_pending_miss(struct virtio_net *dev, uint64_t iova, uint8_t perm);
+void vhost_user_iotlb_pending_insert(struct virtio_net *dev, uint64_t iova, uint8_t perm);
+void vhost_user_iotlb_pending_remove(struct virtio_net *dev, uint64_t iova,
uint64_t size, uint8_t perm);
-void vhost_user_iotlb_flush_all(struct virtio_net *dev, struct vhost_virtqueue *vq);
-int vhost_user_iotlb_init(struct virtio_net *dev, struct vhost_virtqueue *vq);
-void vhost_user_iotlb_destroy(struct vhost_virtqueue *vq);
+void vhost_user_iotlb_flush_all(struct virtio_net *dev);
+int vhost_user_iotlb_init(struct virtio_net *dev);
+void vhost_user_iotlb_destroy(struct virtio_net *dev);
+
#endif /* _VHOST_IOTLB_H_ */
diff --git a/lib/vhost/vhost.c b/lib/vhost/vhost.c
index 79e88f986e..3ddd2a963f 100644
--- a/lib/vhost/vhost.c
+++ b/lib/vhost/vhost.c
@@ -67,7 +67,7 @@ __vhost_iova_to_vva(struct virtio_net *dev, struct vhost_virtqueue *vq,
tmp_size = *size;
- vva = vhost_user_iotlb_cache_find(vq, iova, &tmp_size, perm);
+ vva = vhost_user_iotlb_cache_find(dev, iova, &tmp_size, perm);
if (tmp_size == *size) {
if (dev->flags & VIRTIO_DEV_STATS_ENABLED)
vq->stats.iotlb_hits++;
@@ -79,7 +79,7 @@ __vhost_iova_to_vva(struct virtio_net *dev, struct vhost_virtqueue *vq,
iova += tmp_size;
- if (!vhost_user_iotlb_pending_miss(vq, iova, perm)) {
+ if (!vhost_user_iotlb_pending_miss(dev, iova, perm)) {
/*
* iotlb_lock is read-locked for a full burst,
* but it only protects the iotlb cache.
@@ -89,12 +89,12 @@ __vhost_iova_to_vva(struct virtio_net *dev, struct vhost_virtqueue *vq,
*/
vhost_user_iotlb_rd_unlock(vq);
- vhost_user_iotlb_pending_insert(dev, vq, iova, perm);
+ vhost_user_iotlb_pending_insert(dev, iova, perm);
if (vhost_user_iotlb_miss(dev, iova, perm)) {
VHOST_LOG_DATA(dev->ifname, ERR,
"IOTLB miss req failed for IOVA 0x%" PRIx64 "\n",
iova);
- vhost_user_iotlb_pending_remove(vq, iova, 1, perm);
+ vhost_user_iotlb_pending_remove(dev, iova, 1, perm);
}
vhost_user_iotlb_rd_lock(vq);
@@ -401,7 +401,6 @@ free_vq(struct virtio_net *dev, struct vhost_virtqueue *vq)
vhost_free_async_mem(vq);
rte_rwlock_write_unlock(&vq->access_lock);
rte_free(vq->batch_copy_elems);
- vhost_user_iotlb_destroy(vq);
rte_free(vq->log_cache);
rte_free(vq);
}
@@ -579,7 +578,7 @@ vring_invalidate(struct virtio_net *dev __rte_unused, struct vhost_virtqueue *vq
}
static void
-init_vring_queue(struct virtio_net *dev, struct vhost_virtqueue *vq,
+init_vring_queue(struct virtio_net *dev __rte_unused, struct vhost_virtqueue *vq,
uint32_t vring_idx)
{
int numa_node = SOCKET_ID_ANY;
@@ -599,8 +598,6 @@ init_vring_queue(struct virtio_net *dev, struct vhost_virtqueue *vq,
}
#endif
vq->numa_node = numa_node;
-
- vhost_user_iotlb_init(dev, vq);
}
static void
@@ -635,6 +632,7 @@ alloc_vring_queue(struct virtio_net *dev, uint32_t vring_idx)
dev->virtqueue[i] = vq;
init_vring_queue(dev, vq, i);
rte_rwlock_init(&vq->access_lock);
+ rte_rwlock_init(&vq->iotlb_lock);
vq->avail_wrap_counter = 1;
vq->used_wrap_counter = 1;
vq->signalled_used_valid = false;
@@ -799,6 +797,10 @@ vhost_setup_virtio_net(int vid, bool enable, bool compliant_ol_flags, bool stats
dev->flags |= VIRTIO_DEV_SUPPORT_IOMMU;
else
dev->flags &= ~VIRTIO_DEV_SUPPORT_IOMMU;
+
+ if (vhost_user_iotlb_init(dev) < 0)
+ VHOST_LOG_CONFIG("device", ERR, "failed to init IOTLB\n");
+
}
void
diff --git a/lib/vhost/vhost.h b/lib/vhost/vhost.h
index eaf3b0d392..ee952de175 100644
--- a/lib/vhost/vhost.h
+++ b/lib/vhost/vhost.h
@@ -305,13 +305,6 @@ struct vhost_virtqueue {
struct log_cache_entry *log_cache;
rte_rwlock_t iotlb_lock;
- rte_rwlock_t iotlb_pending_lock;
- struct vhost_iotlb_entry *iotlb_pool;
- TAILQ_HEAD(, vhost_iotlb_entry) iotlb_list;
- TAILQ_HEAD(, vhost_iotlb_entry) iotlb_pending_list;
- int iotlb_cache_nr;
- rte_spinlock_t iotlb_free_lock;
- SLIST_HEAD(, vhost_iotlb_entry) iotlb_free_list;
/* Used to notify the guest (trigger interrupt) */
int callfd;
@@ -486,6 +479,15 @@ struct virtio_net {
int extbuf;
int linearbuf;
struct vhost_virtqueue *virtqueue[VHOST_MAX_QUEUE_PAIRS * 2];
+
+ rte_rwlock_t iotlb_pending_lock;
+ struct vhost_iotlb_entry *iotlb_pool;
+ TAILQ_HEAD(, vhost_iotlb_entry) iotlb_list;
+ TAILQ_HEAD(, vhost_iotlb_entry) iotlb_pending_list;
+ int iotlb_cache_nr;
+ rte_spinlock_t iotlb_free_lock;
+ SLIST_HEAD(, vhost_iotlb_entry) iotlb_free_list;
+
struct inflight_mem_info *inflight_info;
#define IF_NAME_SZ (PATH_MAX > IFNAMSIZ ? PATH_MAX : IFNAMSIZ)
char ifname[IF_NAME_SZ];
diff --git a/lib/vhost/vhost_user.c b/lib/vhost/vhost_user.c
index c9454ce3d9..f2fe7ebc93 100644
--- a/lib/vhost/vhost_user.c
+++ b/lib/vhost/vhost_user.c
@@ -237,6 +237,8 @@ vhost_backend_cleanup(struct virtio_net *dev)
}
dev->postcopy_listening = 0;
+
+ vhost_user_iotlb_destroy(dev);
}
static void
@@ -539,7 +541,6 @@ numa_realloc(struct virtio_net **pdev, struct vhost_virtqueue **pvq)
if (vq != dev->virtqueue[vq->index]) {
VHOST_LOG_CONFIG(dev->ifname, INFO, "reallocated virtqueue on node %d\n", node);
dev->virtqueue[vq->index] = vq;
- vhost_user_iotlb_init(dev, vq);
}
if (vq_is_packed(dev)) {
@@ -664,6 +665,8 @@ numa_realloc(struct virtio_net **pdev, struct vhost_virtqueue **pvq)
return;
}
dev->guest_pages = gp;
+
+ vhost_user_iotlb_init(dev);
}
#else
static void
@@ -1360,8 +1363,7 @@ vhost_user_set_mem_table(struct virtio_net **pdev,
/* Flush IOTLB cache as previous HVAs are now invalid */
if (dev->features & (1ULL << VIRTIO_F_IOMMU_PLATFORM))
- for (i = 0; i < dev->nr_vring; i++)
- vhost_user_iotlb_flush_all(dev, dev->virtqueue[i]);
+ vhost_user_iotlb_flush_all(dev);
free_mem_region(dev);
rte_free(dev->mem);
@@ -2194,7 +2196,7 @@ vhost_user_get_vring_base(struct virtio_net **pdev,
ctx->msg.size = sizeof(ctx->msg.payload.state);
ctx->fd_num = 0;
- vhost_user_iotlb_flush_all(dev, vq);
+ vhost_user_iotlb_flush_all(dev);
vring_invalidate(dev, vq);
@@ -2639,15 +2641,14 @@ vhost_user_iotlb_msg(struct virtio_net **pdev,
if (!vva)
return RTE_VHOST_MSG_RESULT_ERR;
+ vhost_user_iotlb_cache_insert(dev, imsg->iova, vva, len, imsg->perm);
+
for (i = 0; i < dev->nr_vring; i++) {
struct vhost_virtqueue *vq = dev->virtqueue[i];
if (!vq)
continue;
- vhost_user_iotlb_cache_insert(dev, vq, imsg->iova, vva,
- len, imsg->perm);
-
if (is_vring_iotlb(dev, vq, imsg)) {
rte_rwlock_write_lock(&vq->access_lock);
translate_ring_addresses(&dev, &vq);
@@ -2657,15 +2658,14 @@ vhost_user_iotlb_msg(struct virtio_net **pdev,
}
break;
case VHOST_IOTLB_INVALIDATE:
+ vhost_user_iotlb_cache_remove(dev, imsg->iova, imsg->size);
+
for (i = 0; i < dev->nr_vring; i++) {
struct vhost_virtqueue *vq = dev->virtqueue[i];
if (!vq)
continue;
- vhost_user_iotlb_cache_remove(dev, vq, imsg->iova,
- imsg->size);
-
if (is_vring_iotlb(dev, vq, imsg)) {
rte_rwlock_write_lock(&vq->access_lock);
vring_invalidate(dev, vq);
@@ -2674,8 +2674,7 @@ vhost_user_iotlb_msg(struct virtio_net **pdev,
}
break;
default:
- VHOST_LOG_CONFIG(dev->ifname, ERR,
- "invalid IOTLB message type (%d)\n",
+ VHOST_LOG_CONFIG(dev->ifname, ERR, "invalid IOTLB message type (%d)\n",
imsg->type);
return RTE_VHOST_MSG_RESULT_ERR;
}
--
2.40.1
^ permalink raw reply [flat|nested] 30+ messages in thread
* [PATCH v4 06/26] vhost: add offset field to IOTLB entries
2023-06-01 20:07 [PATCH v4 00/26] Add VDUSE support to Vhost library Maxime Coquelin
` (4 preceding siblings ...)
2023-06-01 20:07 ` [PATCH v4 05/26] vhost: change to single IOTLB cache per device Maxime Coquelin
@ 2023-06-01 20:07 ` Maxime Coquelin
2023-06-01 20:07 ` [PATCH v4 07/26] vhost: add page size info to IOTLB entry Maxime Coquelin
` (19 subsequent siblings)
25 siblings, 0 replies; 30+ messages in thread
From: Maxime Coquelin @ 2023-06-01 20:07 UTC (permalink / raw)
To: dev, chenbo.xia, david.marchand, mkp, fbl, jasowang,
cunming.liang, xieyongji, echaudro, eperezma, amorenoz, lulu
Cc: Maxime Coquelin
This patch is preliminary work to prepare for VDUSE
support, for which we need to keep track of the mmapped
base address and offset in order to be able to unmap the
region later, when the IOTLB entry is invalidated.
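The lookup arithmetic this field enables can be sketched as follows, with a simplified stand-in for struct vhost_iotlb_entry (types and helper name are illustrative, not part of the patch):

```c
#include <assert.h>
#include <stdint.h>

/* Simplified stand-in for struct vhost_iotlb_entry. */
struct entry {
	uint64_t iova;    /* guest IOVA the entry starts at */
	uint64_t uaddr;   /* mmap() base address, kept for later munmap() */
	uint64_t uoffset; /* offset of the mapped data within uaddr */
	uint64_t size;
};

/* Translate an IOVA falling inside the entry to a user virtual address:
 * the start of the mapped data is uaddr + uoffset, as in
 * vhost_user_iotlb_cache_find() after this patch. */
static uint64_t
entry_iova_to_uva(const struct entry *e, uint64_t iova)
{
	assert(iova >= e->iova && iova < e->iova + e->size);
	return e->uaddr + e->uoffset + (iova - e->iova);
}
```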
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
---
lib/vhost/iotlb.c | 30 ++++++++++++++++++------------
lib/vhost/iotlb.h | 2 +-
lib/vhost/vhost_user.c | 2 +-
3 files changed, 20 insertions(+), 14 deletions(-)
diff --git a/lib/vhost/iotlb.c b/lib/vhost/iotlb.c
index 6d49bf6b30..aa5100e6e7 100644
--- a/lib/vhost/iotlb.c
+++ b/lib/vhost/iotlb.c
@@ -17,6 +17,7 @@ struct vhost_iotlb_entry {
uint64_t iova;
uint64_t uaddr;
+ uint64_t uoffset;
uint64_t size;
uint8_t perm;
};
@@ -27,15 +28,18 @@ static bool
vhost_user_iotlb_share_page(struct vhost_iotlb_entry *a, struct vhost_iotlb_entry *b,
uint64_t align)
{
- uint64_t a_end, b_start;
+ uint64_t a_start, a_end, b_start;
if (a == NULL || b == NULL)
return false;
+ a_start = a->uaddr + a->uoffset;
+ b_start = b->uaddr + b->uoffset;
+
/* Assumes entry a lower than entry b */
- RTE_ASSERT(a->uaddr < b->uaddr);
- a_end = RTE_ALIGN_CEIL(a->uaddr + a->size, align);
- b_start = RTE_ALIGN_FLOOR(b->uaddr, align);
+ RTE_ASSERT(a_start < b_start);
+ a_end = RTE_ALIGN_CEIL(a_start + a->size, align);
+ b_start = RTE_ALIGN_FLOOR(b_start, align);
return a_end > b_start;
}
@@ -43,11 +47,12 @@ vhost_user_iotlb_share_page(struct vhost_iotlb_entry *a, struct vhost_iotlb_entr
static void
vhost_user_iotlb_set_dump(struct virtio_net *dev, struct vhost_iotlb_entry *node)
{
- uint64_t align;
+ uint64_t align, start;
- align = hua_to_alignment(dev->mem, (void *)(uintptr_t)node->uaddr);
+ start = node->uaddr + node->uoffset;
+ align = hua_to_alignment(dev->mem, (void *)(uintptr_t)start);
- mem_set_dump((void *)(uintptr_t)node->uaddr, node->size, true, align);
+ mem_set_dump((void *)(uintptr_t)start, node->size, true, align);
}
static void
@@ -56,10 +61,10 @@ vhost_user_iotlb_clear_dump(struct virtio_net *dev, struct vhost_iotlb_entry *no
{
uint64_t align, start, end;
- start = node->uaddr;
- end = node->uaddr + node->size;
+ start = node->uaddr + node->uoffset;
+ end = start + node->size;
- align = hua_to_alignment(dev->mem, (void *)(uintptr_t)node->uaddr);
+ align = hua_to_alignment(dev->mem, (void *)(uintptr_t)start);
/* Skip first page if shared with previous entry. */
if (vhost_user_iotlb_share_page(prev, node, align))
@@ -234,7 +239,7 @@ vhost_user_iotlb_cache_random_evict(struct virtio_net *dev)
void
vhost_user_iotlb_cache_insert(struct virtio_net *dev, uint64_t iova, uint64_t uaddr,
- uint64_t size, uint8_t perm)
+ uint64_t uoffset, uint64_t size, uint8_t perm)
{
struct vhost_iotlb_entry *node, *new_node;
@@ -256,6 +261,7 @@ vhost_user_iotlb_cache_insert(struct virtio_net *dev, uint64_t iova, uint64_t ua
new_node->iova = iova;
new_node->uaddr = uaddr;
+ new_node->uoffset = uoffset;
new_node->size = size;
new_node->perm = perm;
@@ -344,7 +350,7 @@ vhost_user_iotlb_cache_find(struct virtio_net *dev, uint64_t iova, uint64_t *siz
offset = iova - node->iova;
if (!vva)
- vva = node->uaddr + offset;
+ vva = node->uaddr + node->uoffset + offset;
mapped += node->size - offset;
iova = node->iova + node->size;
diff --git a/lib/vhost/iotlb.h b/lib/vhost/iotlb.h
index 3490b9e6be..bee36c5903 100644
--- a/lib/vhost/iotlb.h
+++ b/lib/vhost/iotlb.h
@@ -58,7 +58,7 @@ vhost_user_iotlb_wr_unlock_all(struct virtio_net *dev)
}
void vhost_user_iotlb_cache_insert(struct virtio_net *dev, uint64_t iova, uint64_t uaddr,
- uint64_t size, uint8_t perm);
+ uint64_t uoffset, uint64_t size, uint8_t perm);
void vhost_user_iotlb_cache_remove(struct virtio_net *dev, uint64_t iova, uint64_t size);
uint64_t vhost_user_iotlb_cache_find(struct virtio_net *dev, uint64_t iova,
uint64_t *size, uint8_t perm);
diff --git a/lib/vhost/vhost_user.c b/lib/vhost/vhost_user.c
index f2fe7ebc93..7f88a8754f 100644
--- a/lib/vhost/vhost_user.c
+++ b/lib/vhost/vhost_user.c
@@ -2641,7 +2641,7 @@ vhost_user_iotlb_msg(struct virtio_net **pdev,
if (!vva)
return RTE_VHOST_MSG_RESULT_ERR;
- vhost_user_iotlb_cache_insert(dev, imsg->iova, vva, len, imsg->perm);
+ vhost_user_iotlb_cache_insert(dev, imsg->iova, vva, 0, len, imsg->perm);
for (i = 0; i < dev->nr_vring; i++) {
struct vhost_virtqueue *vq = dev->virtqueue[i];
--
2.40.1
* [PATCH v4 07/26] vhost: add page size info to IOTLB entry
2023-06-01 20:07 [PATCH v4 00/26] Add VDUSE support to Vhost library Maxime Coquelin
` (5 preceding siblings ...)
2023-06-01 20:07 ` [PATCH v4 06/26] vhost: add offset field to IOTLB entries Maxime Coquelin
@ 2023-06-01 20:07 ` Maxime Coquelin
2023-06-01 20:07 ` [PATCH v4 08/26] vhost: retry translating IOVA after IOTLB miss Maxime Coquelin
` (18 subsequent siblings)
25 siblings, 0 replies; 30+ messages in thread
From: Maxime Coquelin @ 2023-06-01 20:07 UTC (permalink / raw)
To: dev, chenbo.xia, david.marchand, mkp, fbl, jasowang,
cunming.liang, xieyongji, echaudro, eperezma, amorenoz, lulu
Cc: Maxime Coquelin
VDUSE will close the file descriptor after having mapped
the shared memory, so it will not be possible to get the
page size afterwards.
This patch adds a new page_shift field to the IOTLB entry,
so that the information is passed at IOTLB cache
insertion time. The information is stored as a bit-shift
value so that the IOTLB entry keeps fitting in a single
cache line.
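The shift encoding can be sketched as below; the patch itself uses __builtin_ctzll() at insertion time and RTE_BIT64() at the use sites, and the helper names here are illustrative:

```c
#include <assert.h>
#include <stdint.h>

/* Store a power-of-two page size as a bit shift so it fits in a uint8_t,
 * as vhost_user_iotlb_cache_insert() does with __builtin_ctzll(). */
static uint8_t
page_size_to_shift(uint64_t page_size)
{
	return (uint8_t)__builtin_ctzll(page_size);
}

/* Recover the page size, as the patch does with RTE_BIT64(page_shift). */
static uint64_t
page_shift_to_size(uint8_t page_shift)
{
	return UINT64_C(1) << page_shift;
}
```

An 8-byte page size collapses to a 1-byte shift, which is what lets the entry stay within one cache line after the uoffset field added in the previous patch.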
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
---
lib/vhost/iotlb.c | 46 ++++++++++++++++++++----------------------
lib/vhost/iotlb.h | 2 +-
lib/vhost/vhost.h | 1 -
lib/vhost/vhost_user.c | 8 +++++---
4 files changed, 28 insertions(+), 29 deletions(-)
diff --git a/lib/vhost/iotlb.c b/lib/vhost/iotlb.c
index aa5100e6e7..87986f2489 100644
--- a/lib/vhost/iotlb.c
+++ b/lib/vhost/iotlb.c
@@ -19,14 +19,14 @@ struct vhost_iotlb_entry {
uint64_t uaddr;
uint64_t uoffset;
uint64_t size;
+ uint8_t page_shift;
uint8_t perm;
};
#define IOTLB_CACHE_SIZE 2048
static bool
-vhost_user_iotlb_share_page(struct vhost_iotlb_entry *a, struct vhost_iotlb_entry *b,
- uint64_t align)
+vhost_user_iotlb_share_page(struct vhost_iotlb_entry *a, struct vhost_iotlb_entry *b)
{
uint64_t a_start, a_end, b_start;
@@ -38,44 +38,41 @@ vhost_user_iotlb_share_page(struct vhost_iotlb_entry *a, struct vhost_iotlb_entr
/* Assumes entry a lower than entry b */
RTE_ASSERT(a_start < b_start);
- a_end = RTE_ALIGN_CEIL(a_start + a->size, align);
- b_start = RTE_ALIGN_FLOOR(b_start, align);
+ a_end = RTE_ALIGN_CEIL(a_start + a->size, RTE_BIT64(a->page_shift));
+ b_start = RTE_ALIGN_FLOOR(b_start, RTE_BIT64(b->page_shift));
return a_end > b_start;
}
static void
-vhost_user_iotlb_set_dump(struct virtio_net *dev, struct vhost_iotlb_entry *node)
+vhost_user_iotlb_set_dump(struct vhost_iotlb_entry *node)
{
- uint64_t align, start;
+ uint64_t start;
start = node->uaddr + node->uoffset;
- align = hua_to_alignment(dev->mem, (void *)(uintptr_t)start);
-
- mem_set_dump((void *)(uintptr_t)start, node->size, true, align);
+ mem_set_dump((void *)(uintptr_t)start, node->size, true, RTE_BIT64(node->page_shift));
}
static void
-vhost_user_iotlb_clear_dump(struct virtio_net *dev, struct vhost_iotlb_entry *node,
+vhost_user_iotlb_clear_dump(struct vhost_iotlb_entry *node,
struct vhost_iotlb_entry *prev, struct vhost_iotlb_entry *next)
{
- uint64_t align, start, end;
+ uint64_t start, end;
start = node->uaddr + node->uoffset;
end = start + node->size;
- align = hua_to_alignment(dev->mem, (void *)(uintptr_t)start);
-
/* Skip first page if shared with previous entry. */
- if (vhost_user_iotlb_share_page(prev, node, align))
- start = RTE_ALIGN_CEIL(start, align);
+ if (vhost_user_iotlb_share_page(prev, node))
+ start = RTE_ALIGN_CEIL(start, RTE_BIT64(node->page_shift));
/* Skip last page if shared with next entry. */
- if (vhost_user_iotlb_share_page(node, next, align))
- end = RTE_ALIGN_FLOOR(end, align);
+ if (vhost_user_iotlb_share_page(node, next))
+ end = RTE_ALIGN_FLOOR(end, RTE_BIT64(node->page_shift));
if (end > start)
- mem_set_dump((void *)(uintptr_t)start, end - start, false, align);
+ mem_set_dump((void *)(uintptr_t)start, end - start, false,
+ RTE_BIT64(node->page_shift));
}
static struct vhost_iotlb_entry *
@@ -198,7 +195,7 @@ vhost_user_iotlb_cache_remove_all(struct virtio_net *dev)
vhost_user_iotlb_wr_lock_all(dev);
RTE_TAILQ_FOREACH_SAFE(node, &dev->iotlb_list, next, temp_node) {
- vhost_user_iotlb_clear_dump(dev, node, NULL, NULL);
+ vhost_user_iotlb_clear_dump(node, NULL, NULL);
TAILQ_REMOVE(&dev->iotlb_list, node, next);
vhost_user_iotlb_pool_put(dev, node);
@@ -223,7 +220,7 @@ vhost_user_iotlb_cache_random_evict(struct virtio_net *dev)
if (!entry_idx) {
struct vhost_iotlb_entry *next_node = RTE_TAILQ_NEXT(node, next);
- vhost_user_iotlb_clear_dump(dev, node, prev_node, next_node);
+ vhost_user_iotlb_clear_dump(node, prev_node, next_node);
TAILQ_REMOVE(&dev->iotlb_list, node, next);
vhost_user_iotlb_pool_put(dev, node);
@@ -239,7 +236,7 @@ vhost_user_iotlb_cache_random_evict(struct virtio_net *dev)
void
vhost_user_iotlb_cache_insert(struct virtio_net *dev, uint64_t iova, uint64_t uaddr,
- uint64_t uoffset, uint64_t size, uint8_t perm)
+ uint64_t uoffset, uint64_t size, uint64_t page_size, uint8_t perm)
{
struct vhost_iotlb_entry *node, *new_node;
@@ -263,6 +260,7 @@ vhost_user_iotlb_cache_insert(struct virtio_net *dev, uint64_t iova, uint64_t ua
new_node->uaddr = uaddr;
new_node->uoffset = uoffset;
new_node->size = size;
+ new_node->page_shift = __builtin_ctzll(page_size);
new_node->perm = perm;
vhost_user_iotlb_wr_lock_all(dev);
@@ -276,7 +274,7 @@ vhost_user_iotlb_cache_insert(struct virtio_net *dev, uint64_t iova, uint64_t ua
vhost_user_iotlb_pool_put(dev, new_node);
goto unlock;
} else if (node->iova > new_node->iova) {
- vhost_user_iotlb_set_dump(dev, new_node);
+ vhost_user_iotlb_set_dump(new_node);
TAILQ_INSERT_BEFORE(node, new_node, next);
dev->iotlb_cache_nr++;
@@ -284,7 +282,7 @@ vhost_user_iotlb_cache_insert(struct virtio_net *dev, uint64_t iova, uint64_t ua
}
}
- vhost_user_iotlb_set_dump(dev, new_node);
+ vhost_user_iotlb_set_dump(new_node);
TAILQ_INSERT_TAIL(&dev->iotlb_list, new_node, next);
dev->iotlb_cache_nr++;
@@ -313,7 +311,7 @@ vhost_user_iotlb_cache_remove(struct virtio_net *dev, uint64_t iova, uint64_t si
if (iova < node->iova + node->size) {
struct vhost_iotlb_entry *next_node = RTE_TAILQ_NEXT(node, next);
- vhost_user_iotlb_clear_dump(dev, node, prev_node, next_node);
+ vhost_user_iotlb_clear_dump(node, prev_node, next_node);
TAILQ_REMOVE(&dev->iotlb_list, node, next);
vhost_user_iotlb_pool_put(dev, node);
diff --git a/lib/vhost/iotlb.h b/lib/vhost/iotlb.h
index bee36c5903..81ca04df21 100644
--- a/lib/vhost/iotlb.h
+++ b/lib/vhost/iotlb.h
@@ -58,7 +58,7 @@ vhost_user_iotlb_wr_unlock_all(struct virtio_net *dev)
}
void vhost_user_iotlb_cache_insert(struct virtio_net *dev, uint64_t iova, uint64_t uaddr,
- uint64_t uoffset, uint64_t size, uint8_t perm);
+ uint64_t uoffset, uint64_t size, uint64_t page_size, uint8_t perm);
void vhost_user_iotlb_cache_remove(struct virtio_net *dev, uint64_t iova, uint64_t size);
uint64_t vhost_user_iotlb_cache_find(struct virtio_net *dev, uint64_t iova,
uint64_t *size, uint8_t perm);
diff --git a/lib/vhost/vhost.h b/lib/vhost/vhost.h
index ee952de175..de84c115b7 100644
--- a/lib/vhost/vhost.h
+++ b/lib/vhost/vhost.h
@@ -1032,7 +1032,6 @@ mbuf_is_consumed(struct rte_mbuf *m)
return true;
}
-uint64_t hua_to_alignment(struct rte_vhost_memory *mem, void *ptr);
void mem_set_dump(void *ptr, size_t size, bool enable, uint64_t alignment);
/* Versioned functions */
diff --git a/lib/vhost/vhost_user.c b/lib/vhost/vhost_user.c
index 7f88a8754f..98d8b8ac79 100644
--- a/lib/vhost/vhost_user.c
+++ b/lib/vhost/vhost_user.c
@@ -743,7 +743,7 @@ log_addr_to_gpa(struct virtio_net *dev, struct vhost_virtqueue *vq)
return log_gpa;
}
-uint64_t
+static uint64_t
hua_to_alignment(struct rte_vhost_memory *mem, void *ptr)
{
struct rte_vhost_mem_region *r;
@@ -2632,7 +2632,7 @@ vhost_user_iotlb_msg(struct virtio_net **pdev,
struct virtio_net *dev = *pdev;
struct vhost_iotlb_msg *imsg = &ctx->msg.payload.iotlb;
uint16_t i;
- uint64_t vva, len;
+ uint64_t vva, len, pg_sz;
switch (imsg->type) {
case VHOST_IOTLB_UPDATE:
@@ -2641,7 +2641,9 @@ vhost_user_iotlb_msg(struct virtio_net **pdev,
if (!vva)
return RTE_VHOST_MSG_RESULT_ERR;
- vhost_user_iotlb_cache_insert(dev, imsg->iova, vva, 0, len, imsg->perm);
+ pg_sz = hua_to_alignment(dev->mem, (void *)(uintptr_t)vva);
+
+ vhost_user_iotlb_cache_insert(dev, imsg->iova, vva, 0, len, pg_sz, imsg->perm);
for (i = 0; i < dev->nr_vring; i++) {
struct vhost_virtqueue *vq = dev->virtqueue[i];
--
2.40.1
* [PATCH v4 08/26] vhost: retry translating IOVA after IOTLB miss
2023-06-01 20:07 [PATCH v4 00/26] Add VDUSE support to Vhost library Maxime Coquelin
` (6 preceding siblings ...)
2023-06-01 20:07 ` [PATCH v4 07/26] vhost: add page size info to IOTLB entry Maxime Coquelin
@ 2023-06-01 20:07 ` Maxime Coquelin
2023-06-01 20:07 ` [PATCH v4 09/26] vhost: introduce backend ops Maxime Coquelin
` (17 subsequent siblings)
25 siblings, 0 replies; 30+ messages in thread
From: Maxime Coquelin @ 2023-06-01 20:07 UTC (permalink / raw)
To: dev, chenbo.xia, david.marchand, mkp, fbl, jasowang,
cunming.liang, xieyongji, echaudro, eperezma, amorenoz, lulu
Cc: Maxime Coquelin
Vhost-user backend IOTLB misses and updates are
asynchronous, so the IOVA address translation function
simply fails after sending an IOTLB miss request if the
needed entry was not in the IOTLB cache.
This is not the case for VDUSE, for which the needed IOTLB
update is returned directly when sending an IOTLB miss.
This patch retries finding the needed entry in the
IOTLB cache after having sent an IOTLB miss.
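The retry flow can be sketched with a toy synchronous cache standing in for the VDUSE backend (all names here are illustrative; the real code uses vhost_user_iotlb_cache_find() and vhost_user_iotlb_miss()):

```c
#include <assert.h>
#include <stdint.h>

/* Toy synchronous cache: a miss request installs the entry immediately,
 * mimicking VDUSE's synchronous IOTLB reply. */
struct toy_cache {
	uint64_t iova; /* single cached IOVA, 0 = empty */
	uint64_t vva;
};

static uint64_t
toy_find(struct toy_cache *c, uint64_t iova)
{
	return (c->iova && c->iova == iova) ? c->vva : 0;
}

static void
toy_miss(struct toy_cache *c, uint64_t iova)
{
	c->iova = iova;         /* synchronous backend: filled right away */
	c->vva = iova + 0x1000; /* arbitrary mapping for the demo */
}

/* The pattern added by this patch: look up, send a miss, look up again. */
static uint64_t
translate_with_retry(struct toy_cache *c, uint64_t iova)
{
	uint64_t vva = toy_find(c, iova);

	if (vva)
		return vva;
	toy_miss(c, iova);
	return toy_find(c, iova); /* second lookup may now hit */
}
```

With an asynchronous Vhost-user backend the second lookup would usually still miss and the function would return 0, exactly as before this patch.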
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
---
lib/vhost/vhost.c | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/lib/vhost/vhost.c b/lib/vhost/vhost.c
index 3ddd2a963f..7e1af487c1 100644
--- a/lib/vhost/vhost.c
+++ b/lib/vhost/vhost.c
@@ -100,6 +100,12 @@ __vhost_iova_to_vva(struct virtio_net *dev, struct vhost_virtqueue *vq,
vhost_user_iotlb_rd_lock(vq);
}
+ tmp_size = *size;
+ /* Retry in case of VDUSE, as it is synchronous */
+ vva = vhost_user_iotlb_cache_find(dev, iova, &tmp_size, perm);
+ if (tmp_size == *size)
+ return vva;
+
return 0;
}
--
2.40.1
* [PATCH v4 09/26] vhost: introduce backend ops
2023-06-01 20:07 [PATCH v4 00/26] Add VDUSE support to Vhost library Maxime Coquelin
` (7 preceding siblings ...)
2023-06-01 20:07 ` [PATCH v4 08/26] vhost: retry translating IOVA after IOTLB miss Maxime Coquelin
@ 2023-06-01 20:07 ` Maxime Coquelin
2023-06-01 20:07 ` [PATCH v4 10/26] vhost: add IOTLB cache entry removal callback Maxime Coquelin
` (16 subsequent siblings)
25 siblings, 0 replies; 30+ messages in thread
From: Maxime Coquelin @ 2023-06-01 20:07 UTC (permalink / raw)
To: dev, chenbo.xia, david.marchand, mkp, fbl, jasowang,
cunming.liang, xieyongji, echaudro, eperezma, amorenoz, lulu
Cc: Maxime Coquelin
This patch introduces a backend ops struct, which enables
calling backend-specific callbacks (Vhost-user, VDUSE)
from shared code.
This is an empty shell for now; it will be filled in
later patches.
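The dispatch pattern being set up can be sketched as below; the callback shown is purely illustrative, since struct vhost_backend_ops is still empty at this point in the series:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative ops table: each backend (Vhost-user, VDUSE) provides its
 * own function pointers, and shared code calls through them. */
struct backend_ops {
	int (*iotlb_miss)(void *dev, uint64_t iova, uint8_t perm);
};

static int
demo_iotlb_miss(void *dev, uint64_t iova, uint8_t perm)
{
	(void)dev; (void)iova; (void)perm;
	return 0; /* pretend the miss request succeeded */
}

static const struct backend_ops demo_ops = {
	.iotlb_miss = demo_iotlb_miss,
};

/* Shared code dispatches via the table, tolerating missing callbacks,
 * as vhost_new_device() rejects a NULL ops pointer in this patch. */
static int
dispatch_iotlb_miss(const struct backend_ops *ops, void *dev, uint64_t iova)
{
	if (ops == NULL || ops->iotlb_miss == NULL)
		return -1;
	return ops->iotlb_miss(dev, iova, 0);
}
```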
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
---
lib/vhost/socket.c | 2 +-
lib/vhost/vhost.c | 8 +++++++-
lib/vhost/vhost.h | 10 +++++++++-
lib/vhost/vhost_user.c | 8 ++++++++
lib/vhost/vhost_user.h | 1 +
5 files changed, 26 insertions(+), 3 deletions(-)
diff --git a/lib/vhost/socket.c b/lib/vhost/socket.c
index cb0218b7bc..407d0011c3 100644
--- a/lib/vhost/socket.c
+++ b/lib/vhost/socket.c
@@ -223,7 +223,7 @@ vhost_user_add_connection(int fd, struct vhost_user_socket *vsocket)
return;
}
- vid = vhost_new_device();
+ vid = vhost_user_new_device();
if (vid == -1) {
goto err;
}
diff --git a/lib/vhost/vhost.c b/lib/vhost/vhost.c
index 7e1af487c1..d054772bf8 100644
--- a/lib/vhost/vhost.c
+++ b/lib/vhost/vhost.c
@@ -680,11 +680,16 @@ reset_device(struct virtio_net *dev)
* there is a new virtio device being attached).
*/
int
-vhost_new_device(void)
+vhost_new_device(struct vhost_backend_ops *ops)
{
struct virtio_net *dev;
int i;
+ if (ops == NULL) {
+ VHOST_LOG_CONFIG("device", ERR, "missing backend ops.\n");
+ return -1;
+ }
+
pthread_mutex_lock(&vhost_dev_lock);
for (i = 0; i < RTE_MAX_VHOST_DEVICE; i++) {
if (vhost_devices[i] == NULL)
@@ -712,6 +717,7 @@ vhost_new_device(void)
dev->backend_req_fd = -1;
dev->postcopy_ufd = -1;
rte_spinlock_init(&dev->backend_req_lock);
+ dev->backend_ops = ops;
return i;
}
diff --git a/lib/vhost/vhost.h b/lib/vhost/vhost.h
index de84c115b7..966b6dd67a 100644
--- a/lib/vhost/vhost.h
+++ b/lib/vhost/vhost.h
@@ -89,6 +89,12 @@
for (iter = val; iter < num; iter++)
#endif
+/**
+ * Structure that contains backend-specific ops.
+ */
+struct vhost_backend_ops {
+};
+
/**
* Structure contains buffer address, length and descriptor index
* from vring to do scatter RX.
@@ -516,6 +522,8 @@ struct virtio_net {
void *extern_data;
/* pre and post vhost user message handlers for the device */
struct rte_vhost_user_extern_ops extern_ops;
+
+ struct vhost_backend_ops *backend_ops;
} __rte_cache_aligned;
static inline void
@@ -815,7 +823,7 @@ get_device(int vid)
return dev;
}
-int vhost_new_device(void);
+int vhost_new_device(struct vhost_backend_ops *ops);
void cleanup_device(struct virtio_net *dev, int destroy);
void reset_device(struct virtio_net *dev);
void vhost_destroy_device(int);
diff --git a/lib/vhost/vhost_user.c b/lib/vhost/vhost_user.c
index 98d8b8ac79..2cd86686a5 100644
--- a/lib/vhost/vhost_user.c
+++ b/lib/vhost/vhost_user.c
@@ -3464,3 +3464,11 @@ int rte_vhost_host_notifier_ctrl(int vid, uint16_t qid, bool enable)
return ret;
}
+
+static struct vhost_backend_ops vhost_user_backend_ops;
+
+int
+vhost_user_new_device(void)
+{
+ return vhost_new_device(&vhost_user_backend_ops);
+}
diff --git a/lib/vhost/vhost_user.h b/lib/vhost/vhost_user.h
index a0987a58f9..61456049c8 100644
--- a/lib/vhost/vhost_user.h
+++ b/lib/vhost/vhost_user.h
@@ -185,5 +185,6 @@ int vhost_user_iotlb_miss(struct virtio_net *dev, uint64_t iova, uint8_t perm);
int read_fd_message(char *ifname, int sockfd, char *buf, int buflen, int *fds, int max_fds,
int *fd_num);
int send_fd_message(char *ifname, int sockfd, char *buf, int buflen, int *fds, int fd_num);
+int vhost_user_new_device(void);
#endif
--
2.40.1
^ permalink raw reply [flat|nested] 30+ messages in thread
* [PATCH v4 10/26] vhost: add IOTLB cache entry removal callback
2023-06-01 20:07 [PATCH v4 00/26] Add VDUSE support to Vhost library Maxime Coquelin
` (8 preceding siblings ...)
2023-06-01 20:07 ` [PATCH v4 09/26] vhost: introduce backend ops Maxime Coquelin
@ 2023-06-01 20:07 ` Maxime Coquelin
2023-06-01 20:07 ` [PATCH v4 11/26] vhost: add helper for IOTLB misses Maxime Coquelin
` (15 subsequent siblings)
25 siblings, 0 replies; 30+ messages in thread
From: Maxime Coquelin @ 2023-06-01 20:07 UTC (permalink / raw)
To: dev, chenbo.xia, david.marchand, mkp, fbl, jasowang,
cunming.liang, xieyongji, echaudro, eperezma, amorenoz, lulu
Cc: Maxime Coquelin
VDUSE will need to munmap() the IOTLB entry on removal
from the cache, as it performs mmap() before insertion.
This patch introduces a callback that the VDUSE layer will
implement to achieve this.
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
---
lib/vhost/iotlb.c | 12 ++++++++++++
lib/vhost/vhost.h | 3 +++
2 files changed, 15 insertions(+)
diff --git a/lib/vhost/iotlb.c b/lib/vhost/iotlb.c
index 87986f2489..424121cc00 100644
--- a/lib/vhost/iotlb.c
+++ b/lib/vhost/iotlb.c
@@ -25,6 +25,15 @@ struct vhost_iotlb_entry {
#define IOTLB_CACHE_SIZE 2048
+static void
+vhost_user_iotlb_remove_notify(struct virtio_net *dev, struct vhost_iotlb_entry *entry)
+{
+ if (dev->backend_ops->iotlb_remove_notify == NULL)
+ return;
+
+ dev->backend_ops->iotlb_remove_notify(entry->uaddr, entry->uoffset, entry->size);
+}
+
static bool
vhost_user_iotlb_share_page(struct vhost_iotlb_entry *a, struct vhost_iotlb_entry *b)
{
@@ -198,6 +207,7 @@ vhost_user_iotlb_cache_remove_all(struct virtio_net *dev)
vhost_user_iotlb_clear_dump(node, NULL, NULL);
TAILQ_REMOVE(&dev->iotlb_list, node, next);
+ vhost_user_iotlb_remove_notify(dev, node);
vhost_user_iotlb_pool_put(dev, node);
}
@@ -223,6 +233,7 @@ vhost_user_iotlb_cache_random_evict(struct virtio_net *dev)
vhost_user_iotlb_clear_dump(node, prev_node, next_node);
TAILQ_REMOVE(&dev->iotlb_list, node, next);
+ vhost_user_iotlb_remove_notify(dev, node);
vhost_user_iotlb_pool_put(dev, node);
dev->iotlb_cache_nr--;
break;
@@ -314,6 +325,7 @@ vhost_user_iotlb_cache_remove(struct virtio_net *dev, uint64_t iova, uint64_t si
vhost_user_iotlb_clear_dump(node, prev_node, next_node);
TAILQ_REMOVE(&dev->iotlb_list, node, next);
+ vhost_user_iotlb_remove_notify(dev, node);
vhost_user_iotlb_pool_put(dev, node);
dev->iotlb_cache_nr--;
} else {
diff --git a/lib/vhost/vhost.h b/lib/vhost/vhost.h
index 966b6dd67a..69df8b14c0 100644
--- a/lib/vhost/vhost.h
+++ b/lib/vhost/vhost.h
@@ -89,10 +89,13 @@
for (iter = val; iter < num; iter++)
#endif
+typedef void (*vhost_iotlb_remove_notify)(uint64_t addr, uint64_t off, uint64_t size);
+
/**
* Structure that contains backend-specific ops.
*/
struct vhost_backend_ops {
+ vhost_iotlb_remove_notify iotlb_remove_notify;
};
/**
--
2.40.1
* [PATCH v4 11/26] vhost: add helper for IOTLB misses
2023-06-01 20:07 [PATCH v4 00/26] Add VDUSE support to Vhost library Maxime Coquelin
` (9 preceding siblings ...)
2023-06-01 20:07 ` [PATCH v4 10/26] vhost: add IOTLB cache entry removal callback Maxime Coquelin
@ 2023-06-01 20:07 ` Maxime Coquelin
2023-06-01 20:07 ` [PATCH v4 12/26] vhost: add helper for interrupt injection Maxime Coquelin
` (14 subsequent siblings)
25 siblings, 0 replies; 30+ messages in thread
From: Maxime Coquelin @ 2023-06-01 20:07 UTC (permalink / raw)
To: dev, chenbo.xia, david.marchand, mkp, fbl, jasowang,
cunming.liang, xieyongji, echaudro, eperezma, amorenoz, lulu
Cc: Maxime Coquelin
This patch adds a helper for sending IOTLB misses, as VDUSE
will use an ioctl while Vhost-user uses a dedicated
backend request.
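The indirection this patch introduces can be sketched as below. All names here are illustrative stand-ins for the real `vhost_backend_ops` plumbing: the generic layer calls through a per-backend function pointer, so Vhost-user and VDUSE can each supply their own miss handler.

```c
#include <assert.h>
#include <stdint.h>

/* Sketch of the backend-ops dispatch pattern (names are hypothetical). */
struct sketch_dev;

struct sketch_backend_ops {
	int (*iotlb_miss)(struct sketch_dev *dev, uint64_t iova, uint8_t perm);
};

struct sketch_dev {
	struct sketch_backend_ops *ops;
};

/* Generic helper: dispatches to whichever backend registered the device. */
static int
sketch_iotlb_miss(struct sketch_dev *dev, uint64_t iova, uint8_t perm)
{
	return dev->ops->iotlb_miss(dev, iova, perm);
}

/* A trivial stand-in backend that just records the requested IOVA. */
static uint64_t last_miss_iova;

static int
dummy_miss(struct sketch_dev *dev, uint64_t iova, uint8_t perm)
{
	(void)dev;
	(void)perm;
	last_miss_iova = iova;
	return 0;
}
```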
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
---
lib/vhost/vhost.c | 13 ++++++++++++-
lib/vhost/vhost.h | 4 ++++
lib/vhost/vhost_user.c | 6 ++++--
lib/vhost/vhost_user.h | 1 -
4 files changed, 20 insertions(+), 4 deletions(-)
diff --git a/lib/vhost/vhost.c b/lib/vhost/vhost.c
index d054772bf8..f77f30d674 100644
--- a/lib/vhost/vhost.c
+++ b/lib/vhost/vhost.c
@@ -56,6 +56,12 @@ static const struct vhost_vq_stats_name_off vhost_vq_stat_strings[] = {
#define VHOST_NB_VQ_STATS RTE_DIM(vhost_vq_stat_strings)
+static int
+vhost_iotlb_miss(struct virtio_net *dev, uint64_t iova, uint8_t perm)
+{
+ return dev->backend_ops->iotlb_miss(dev, iova, perm);
+}
+
uint64_t
__vhost_iova_to_vva(struct virtio_net *dev, struct vhost_virtqueue *vq,
uint64_t iova, uint64_t *size, uint8_t perm)
@@ -90,7 +96,7 @@ __vhost_iova_to_vva(struct virtio_net *dev, struct vhost_virtqueue *vq,
vhost_user_iotlb_rd_unlock(vq);
vhost_user_iotlb_pending_insert(dev, iova, perm);
- if (vhost_user_iotlb_miss(dev, iova, perm)) {
+ if (vhost_iotlb_miss(dev, iova, perm)) {
VHOST_LOG_DATA(dev->ifname, ERR,
"IOTLB miss req failed for IOVA 0x%" PRIx64 "\n",
iova);
@@ -690,6 +696,11 @@ vhost_new_device(struct vhost_backend_ops *ops)
return -1;
}
+ if (ops->iotlb_miss == NULL) {
+ VHOST_LOG_CONFIG("device", ERR, "missing IOTLB miss backend op.\n");
+ return -1;
+ }
+
pthread_mutex_lock(&vhost_dev_lock);
for (i = 0; i < RTE_MAX_VHOST_DEVICE; i++) {
if (vhost_devices[i] == NULL)
diff --git a/lib/vhost/vhost.h b/lib/vhost/vhost.h
index 69df8b14c0..25255aaea8 100644
--- a/lib/vhost/vhost.h
+++ b/lib/vhost/vhost.h
@@ -89,13 +89,17 @@
for (iter = val; iter < num; iter++)
#endif
+struct virtio_net;
typedef void (*vhost_iotlb_remove_notify)(uint64_t addr, uint64_t off, uint64_t size);
+typedef int (*vhost_iotlb_miss_cb)(struct virtio_net *dev, uint64_t iova, uint8_t perm);
+
/**
* Structure that contains backend-specific ops.
*/
struct vhost_backend_ops {
vhost_iotlb_remove_notify iotlb_remove_notify;
+ vhost_iotlb_miss_cb iotlb_miss;
};
/**
diff --git a/lib/vhost/vhost_user.c b/lib/vhost/vhost_user.c
index 2cd86686a5..30ad63aba0 100644
--- a/lib/vhost/vhost_user.c
+++ b/lib/vhost/vhost_user.c
@@ -3305,7 +3305,7 @@ vhost_user_msg_handler(int vid, int fd)
return ret;
}
-int
+static int
vhost_user_iotlb_miss(struct virtio_net *dev, uint64_t iova, uint8_t perm)
{
int ret;
@@ -3465,7 +3465,9 @@ int rte_vhost_host_notifier_ctrl(int vid, uint16_t qid, bool enable)
return ret;
}
-static struct vhost_backend_ops vhost_user_backend_ops;
+static struct vhost_backend_ops vhost_user_backend_ops = {
+ .iotlb_miss = vhost_user_iotlb_miss,
+};
int
vhost_user_new_device(void)
diff --git a/lib/vhost/vhost_user.h b/lib/vhost/vhost_user.h
index 61456049c8..1ffeca92f3 100644
--- a/lib/vhost/vhost_user.h
+++ b/lib/vhost/vhost_user.h
@@ -179,7 +179,6 @@ struct __rte_packed vhu_msg_context {
/* vhost_user.c */
int vhost_user_msg_handler(int vid, int fd);
-int vhost_user_iotlb_miss(struct virtio_net *dev, uint64_t iova, uint8_t perm);
/* socket.c */
int read_fd_message(char *ifname, int sockfd, char *buf, int buflen, int *fds, int max_fds,
--
2.40.1
* [PATCH v4 12/26] vhost: add helper for interrupt injection
2023-06-01 20:07 [PATCH v4 00/26] Add VDUSE support to Vhost library Maxime Coquelin
` (10 preceding siblings ...)
2023-06-01 20:07 ` [PATCH v4 11/26] vhost: add helper for IOTLB misses Maxime Coquelin
@ 2023-06-01 20:07 ` Maxime Coquelin
2023-06-01 20:07 ` [PATCH v4 13/26] vhost: add API to set max queue pairs Maxime Coquelin
` (13 subsequent siblings)
25 siblings, 0 replies; 30+ messages in thread
From: Maxime Coquelin @ 2023-06-01 20:07 UTC (permalink / raw)
To: dev, chenbo.xia, david.marchand, mkp, fbl, jasowang,
cunming.liang, xieyongji, echaudro, eperezma, amorenoz, lulu
Cc: Maxime Coquelin
Vhost-user uses eventfd to inject IRQs, but VDUSE uses
an ioctl.
This patch prepares vhost_vring_call_split() and
vhost_vring_call_packed() to support VDUSE by introducing
a new helper.
It also adds a new counter for guest notification failures,
which can happen, for example, when the call file descriptor
is uninitialized.
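The eventfd path that Vhost-user registers as its `inject_irq` implementation can be sketched as below. This is a self-contained illustration, not the library code: `sketch_inject_irq` is a hypothetical name, and the non-zero return on an uninitialized descriptor is what the caller would account in the new failure counter.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* Sketch of the eventfd-based IRQ injection: fail on an uninitialized
 * call file descriptor, otherwise signal the eventfd. A non-zero return
 * maps to the guest_notifications_error counter in the caller. */
static int
sketch_inject_irq(int callfd)
{
	if (callfd < 0)
		return -1;

	return eventfd_write(callfd, (eventfd_t)1);
}
```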
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
---
lib/vhost/vhost.c | 25 +++++++++++++------------
lib/vhost/vhost.h | 19 +++++++++----------
lib/vhost/vhost_user.c | 10 ++++++++++
3 files changed, 32 insertions(+), 22 deletions(-)
diff --git a/lib/vhost/vhost.c b/lib/vhost/vhost.c
index f77f30d674..eb6309b681 100644
--- a/lib/vhost/vhost.c
+++ b/lib/vhost/vhost.c
@@ -701,6 +701,11 @@ vhost_new_device(struct vhost_backend_ops *ops)
return -1;
}
+ if (ops->inject_irq == NULL) {
+ VHOST_LOG_CONFIG("device", ERR, "missing IRQ injection backend op.\n");
+ return -1;
+ }
+
pthread_mutex_lock(&vhost_dev_lock);
for (i = 0; i < RTE_MAX_VHOST_DEVICE; i++) {
if (vhost_devices[i] == NULL)
@@ -1511,20 +1516,16 @@ rte_vhost_notify_guest(int vid, uint16_t queue_id)
rte_rwlock_read_lock(&vq->access_lock);
- if (vq->callfd >= 0) {
- int ret = eventfd_write(vq->callfd, (eventfd_t)1);
-
- if (ret) {
- if (dev->flags & VIRTIO_DEV_STATS_ENABLED)
- __atomic_fetch_add(&vq->stats.guest_notifications_error,
+ if (dev->backend_ops->inject_irq(dev, vq)) {
+ if (dev->flags & VIRTIO_DEV_STATS_ENABLED)
+ __atomic_fetch_add(&vq->stats.guest_notifications_error,
1, __ATOMIC_RELAXED);
- } else {
- if (dev->flags & VIRTIO_DEV_STATS_ENABLED)
- __atomic_fetch_add(&vq->stats.guest_notifications,
+ } else {
+ if (dev->flags & VIRTIO_DEV_STATS_ENABLED)
+ __atomic_fetch_add(&vq->stats.guest_notifications,
1, __ATOMIC_RELAXED);
- if (dev->notify_ops->guest_notified)
- dev->notify_ops->guest_notified(dev->vid);
- }
+ if (dev->notify_ops->guest_notified)
+ dev->notify_ops->guest_notified(dev->vid);
}
rte_rwlock_read_unlock(&vq->access_lock);
diff --git a/lib/vhost/vhost.h b/lib/vhost/vhost.h
index 25255aaea8..ea2798b0bf 100644
--- a/lib/vhost/vhost.h
+++ b/lib/vhost/vhost.h
@@ -90,16 +90,20 @@
#endif
struct virtio_net;
+struct vhost_virtqueue;
+
typedef void (*vhost_iotlb_remove_notify)(uint64_t addr, uint64_t off, uint64_t size);
typedef int (*vhost_iotlb_miss_cb)(struct virtio_net *dev, uint64_t iova, uint8_t perm);
+typedef int (*vhost_vring_inject_irq_cb)(struct virtio_net *dev, struct vhost_virtqueue *vq);
/**
* Structure that contains backend-specific ops.
*/
struct vhost_backend_ops {
vhost_iotlb_remove_notify iotlb_remove_notify;
vhost_iotlb_miss_cb iotlb_miss;
+ vhost_vring_inject_irq_cb inject_irq;
};
/**
@@ -906,8 +910,6 @@ vhost_need_event(uint16_t event_idx, uint16_t new_idx, uint16_t old)
static __rte_always_inline void
vhost_vring_inject_irq(struct virtio_net *dev, struct vhost_virtqueue *vq)
{
- int ret;
-
if (dev->notify_ops->guest_notify &&
dev->notify_ops->guest_notify(dev->vid, vq->index)) {
if (dev->flags & VIRTIO_DEV_STATS_ENABLED)
@@ -916,8 +918,7 @@ vhost_vring_inject_irq(struct virtio_net *dev, struct vhost_virtqueue *vq)
return;
}
- ret = eventfd_write(vq->callfd, (eventfd_t) 1);
- if (ret) {
+ if (dev->backend_ops->inject_irq(dev, vq)) {
if (dev->flags & VIRTIO_DEV_STATS_ENABLED)
__atomic_fetch_add(&vq->stats.guest_notifications_error,
1, __ATOMIC_RELAXED);
@@ -950,14 +951,12 @@ vhost_vring_call_split(struct virtio_net *dev, struct vhost_virtqueue *vq)
"%s: used_event_idx=%d, old=%d, new=%d\n",
__func__, vhost_used_event(vq), old, new);
- if ((vhost_need_event(vhost_used_event(vq), new, old) ||
- unlikely(!signalled_used_valid)) &&
- vq->callfd >= 0)
+ if (vhost_need_event(vhost_used_event(vq), new, old) ||
+ unlikely(!signalled_used_valid))
vhost_vring_inject_irq(dev, vq);
} else {
/* Kick the guest if necessary. */
- if (!(vq->avail->flags & VRING_AVAIL_F_NO_INTERRUPT) &&
- (vq->callfd >= 0))
+ if (!(vq->avail->flags & VRING_AVAIL_F_NO_INTERRUPT))
vhost_vring_inject_irq(dev, vq);
}
}
@@ -1009,7 +1008,7 @@ vhost_vring_call_packed(struct virtio_net *dev, struct vhost_virtqueue *vq)
if (vhost_need_event(off, new, old))
kick = true;
kick:
- if (kick && vq->callfd >= 0)
+ if (kick)
vhost_vring_inject_irq(dev, vq);
}
diff --git a/lib/vhost/vhost_user.c b/lib/vhost/vhost_user.c
index 30ad63aba0..901a80bbaa 100644
--- a/lib/vhost/vhost_user.c
+++ b/lib/vhost/vhost_user.c
@@ -3465,8 +3465,18 @@ int rte_vhost_host_notifier_ctrl(int vid, uint16_t qid, bool enable)
return ret;
}
+static int
+vhost_user_inject_irq(struct virtio_net *dev __rte_unused, struct vhost_virtqueue *vq)
+{
+ if (vq->callfd < 0)
+ return -1;
+
+ return eventfd_write(vq->callfd, (eventfd_t)1);
+}
+
static struct vhost_backend_ops vhost_user_backend_ops = {
.iotlb_miss = vhost_user_iotlb_miss,
+ .inject_irq = vhost_user_inject_irq,
};
int
--
2.40.1
* [PATCH v4 13/26] vhost: add API to set max queue pairs
2023-06-01 20:07 [PATCH v4 00/26] Add VDUSE support to Vhost library Maxime Coquelin
` (11 preceding siblings ...)
2023-06-01 20:07 ` [PATCH v4 12/26] vhost: add helper for interrupt injection Maxime Coquelin
@ 2023-06-01 20:07 ` Maxime Coquelin
2023-06-05 7:56 ` Xia, Chenbo
2023-06-01 20:08 ` [PATCH v4 14/26] net/vhost: use " Maxime Coquelin
` (12 subsequent siblings)
25 siblings, 1 reply; 30+ messages in thread
From: Maxime Coquelin @ 2023-06-01 20:07 UTC (permalink / raw)
To: dev, chenbo.xia, david.marchand, mkp, fbl, jasowang,
cunming.liang, xieyongji, echaudro, eperezma, amorenoz, lulu
Cc: Maxime Coquelin
This patch introduces a new rte_vhost_driver_set_max_queues
API as preliminary work for multiqueue support with VDUSE.
Indeed, with VDUSE the vrings need to be pre-allocated at
device creation time, so such an API is needed to avoid
always allocating the 128 queue pairs supported by the
Vhost library.
Calling the API is optional; 128 queue pairs remains the
default.
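The validation the new API performs can be sketched as follows. This is an illustrative stand-in, not the DPDK function: `sketch_set_max_queue_pairs` models only the clamp against the library limit and the per-socket store, without the socket lookup and locking of the real `rte_vhost_driver_set_max_queue_num()`.

```c
#include <assert.h>
#include <stdint.h>

/* Mirrors VHOST_MAX_QUEUE_PAIRS for this sketch. */
#define SKETCH_MAX_QUEUE_PAIRS 128

/* Sketch of the validation: reject requests above the library limit,
 * otherwise record the new per-socket maximum. Returns 0 on success,
 * -1 on failure, like the real API. */
static int
sketch_set_max_queue_pairs(uint32_t *stored, uint32_t requested)
{
	if (requested > SKETCH_MAX_QUEUE_PAIRS)
		return -1;

	*stored = requested;
	return 0;
}
```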
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
---
doc/guides/prog_guide/vhost_lib.rst | 4 +++
doc/guides/rel_notes/release_23_07.rst | 5 ++++
lib/vhost/rte_vhost.h | 17 ++++++++++++
lib/vhost/socket.c | 36 ++++++++++++++++++++++++--
lib/vhost/version.map | 1 +
5 files changed, 61 insertions(+), 2 deletions(-)
diff --git a/doc/guides/prog_guide/vhost_lib.rst b/doc/guides/prog_guide/vhost_lib.rst
index 0f964d7a4a..0c2b4d020a 100644
--- a/doc/guides/prog_guide/vhost_lib.rst
+++ b/doc/guides/prog_guide/vhost_lib.rst
@@ -339,6 +339,10 @@ The following is an overview of some key Vhost API functions:
Inject the offloaded interrupt received by the 'guest_notify' callback,
into the vhost device's queue.
+* ``rte_vhost_driver_set_max_queue_num(const char *path, uint32_t max_queue_pairs)``
+
+ Set the maximum number of queue pairs supported by the device.
+
Vhost-user Implementations
--------------------------
diff --git a/doc/guides/rel_notes/release_23_07.rst b/doc/guides/rel_notes/release_23_07.rst
index 3eed8ac561..7034fb664c 100644
--- a/doc/guides/rel_notes/release_23_07.rst
+++ b/doc/guides/rel_notes/release_23_07.rst
@@ -62,6 +62,11 @@ New Features
rte_vhost_notify_guest(), is added to raise the interrupt outside of the
fast path.
+* **Added Vhost API to set maximum queue pairs supported.**
+
+ Introduced ``rte_vhost_driver_set_max_queue_num()`` to be able to limit the
+ maximum number of supported queue pairs, required for VDUSE support.
+
Removed Items
-------------
diff --git a/lib/vhost/rte_vhost.h b/lib/vhost/rte_vhost.h
index 7a10bc36cf..7844c9f142 100644
--- a/lib/vhost/rte_vhost.h
+++ b/lib/vhost/rte_vhost.h
@@ -609,6 +609,23 @@ rte_vhost_driver_get_protocol_features(const char *path,
int
rte_vhost_driver_get_queue_num(const char *path, uint32_t *queue_num);
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change, or be removed, without prior notice.
+ *
+ * Set the maximum number of queue pairs supported by the device.
+ *
+ * @param path
+ * The vhost-user socket file path
+ * @param max_queue_pairs
+ * The maximum number of queue pairs
+ * @return
+ * 0 on success, -1 on failure
+ */
+__rte_experimental
+int
+rte_vhost_driver_set_max_queue_num(const char *path, uint32_t max_queue_pairs);
+
/**
* Get the feature bits after negotiation
*
diff --git a/lib/vhost/socket.c b/lib/vhost/socket.c
index 407d0011c3..29f7a8cece 100644
--- a/lib/vhost/socket.c
+++ b/lib/vhost/socket.c
@@ -57,6 +57,8 @@ struct vhost_user_socket {
uint64_t protocol_features;
+ uint32_t max_queue_pairs;
+
struct rte_vdpa_device *vdpa_dev;
struct rte_vhost_device_ops const *notify_ops;
@@ -823,7 +825,7 @@ rte_vhost_driver_get_queue_num(const char *path, uint32_t *queue_num)
vdpa_dev = vsocket->vdpa_dev;
if (!vdpa_dev) {
- *queue_num = VHOST_MAX_QUEUE_PAIRS;
+ *queue_num = vsocket->max_queue_pairs;
goto unlock_exit;
}
@@ -833,7 +835,36 @@ rte_vhost_driver_get_queue_num(const char *path, uint32_t *queue_num)
goto unlock_exit;
}
- *queue_num = RTE_MIN((uint32_t)VHOST_MAX_QUEUE_PAIRS, vdpa_queue_num);
+ *queue_num = RTE_MIN(vsocket->max_queue_pairs, vdpa_queue_num);
+
+unlock_exit:
+ pthread_mutex_unlock(&vhost_user.mutex);
+ return ret;
+}
+
+int
+rte_vhost_driver_set_max_queue_num(const char *path, uint32_t max_queue_pairs)
+{
+ struct vhost_user_socket *vsocket;
+ int ret = 0;
+
+ VHOST_LOG_CONFIG(path, INFO, "Setting max queue pairs to %u\n", max_queue_pairs);
+
+ if (max_queue_pairs > VHOST_MAX_QUEUE_PAIRS) {
+ VHOST_LOG_CONFIG(path, ERR, "Library only supports up to %u queue pairs\n",
+ VHOST_MAX_QUEUE_PAIRS);
+ return -1;
+ }
+
+ pthread_mutex_lock(&vhost_user.mutex);
+ vsocket = find_vhost_user_socket(path);
+ if (!vsocket) {
+ VHOST_LOG_CONFIG(path, ERR, "socket file is not registered yet.\n");
+ ret = -1;
+ goto unlock_exit;
+ }
+
+ vsocket->max_queue_pairs = max_queue_pairs;
unlock_exit:
pthread_mutex_unlock(&vhost_user.mutex);
@@ -889,6 +920,7 @@ rte_vhost_driver_register(const char *path, uint64_t flags)
goto out_free;
}
vsocket->vdpa_dev = NULL;
+ vsocket->max_queue_pairs = VHOST_MAX_QUEUE_PAIRS;
vsocket->extbuf = flags & RTE_VHOST_USER_EXTBUF_SUPPORT;
vsocket->linearbuf = flags & RTE_VHOST_USER_LINEARBUF_SUPPORT;
vsocket->async_copy = flags & RTE_VHOST_USER_ASYNC_COPY;
diff --git a/lib/vhost/version.map b/lib/vhost/version.map
index 7bcbfd12cf..5051c29022 100644
--- a/lib/vhost/version.map
+++ b/lib/vhost/version.map
@@ -107,6 +107,7 @@ EXPERIMENTAL {
# added in 23.07
rte_vhost_notify_guest;
+ rte_vhost_driver_set_max_queue_num;
};
INTERNAL {
--
2.40.1
* [PATCH v4 14/26] net/vhost: use API to set max queue pairs
2023-06-01 20:07 [PATCH v4 00/26] Add VDUSE support to Vhost library Maxime Coquelin
` (12 preceding siblings ...)
2023-06-01 20:07 ` [PATCH v4 13/26] vhost: add API to set max queue pairs Maxime Coquelin
@ 2023-06-01 20:08 ` Maxime Coquelin
2023-06-01 20:08 ` [PATCH v4 15/26] vhost: add control virtqueue support Maxime Coquelin
` (11 subsequent siblings)
25 siblings, 0 replies; 30+ messages in thread
From: Maxime Coquelin @ 2023-06-01 20:08 UTC (permalink / raw)
To: dev, chenbo.xia, david.marchand, mkp, fbl, jasowang,
cunming.liang, xieyongji, echaudro, eperezma, amorenoz, lulu
Cc: Maxime Coquelin
In order to support multiqueue with VDUSE, we need to
be able to limit the maximum number of queue pairs to
avoid unnecessary memory consumption, since the maximum
number of queue pairs needs to be allocated at device
creation time, as opposed to Vhost-user, which allocates
them only when the frontend initializes them.
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
---
drivers/net/vhost/rte_eth_vhost.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/drivers/net/vhost/rte_eth_vhost.c b/drivers/net/vhost/rte_eth_vhost.c
index 62ef955ebc..8d37ec9775 100644
--- a/drivers/net/vhost/rte_eth_vhost.c
+++ b/drivers/net/vhost/rte_eth_vhost.c
@@ -1013,6 +1013,9 @@ vhost_driver_setup(struct rte_eth_dev *eth_dev)
goto drv_unreg;
}
+ if (rte_vhost_driver_set_max_queue_num(internal->iface_name, internal->max_queues))
+ goto drv_unreg;
+
if (rte_vhost_driver_callback_register(internal->iface_name,
&vhost_ops) < 0) {
VHOST_LOG(ERR, "Can't register callbacks\n");
--
2.40.1
* [PATCH v4 15/26] vhost: add control virtqueue support
2023-06-01 20:07 [PATCH v4 00/26] Add VDUSE support to Vhost library Maxime Coquelin
` (13 preceding siblings ...)
2023-06-01 20:08 ` [PATCH v4 14/26] net/vhost: use " Maxime Coquelin
@ 2023-06-01 20:08 ` Maxime Coquelin
2023-06-01 20:08 ` [PATCH v4 16/26] vhost: add VDUSE device creation and destruction Maxime Coquelin
` (10 subsequent siblings)
25 siblings, 0 replies; 30+ messages in thread
From: Maxime Coquelin @ 2023-06-01 20:08 UTC (permalink / raw)
To: dev, chenbo.xia, david.marchand, mkp, fbl, jasowang,
cunming.liang, xieyongji, echaudro, eperezma, amorenoz, lulu
Cc: Maxime Coquelin
In order to support multi-queue with VDUSE, having
control queue support is required.
This patch adds the control queue implementation; it will be
used later when adding VDUSE support. Only the split ring
layout is supported for now, packed ring support will be
added later.
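The core of the control queue handling is dispatching on the request's class/command pair; the multiqueue command (`VIRTIO_NET_CTRL_MQ` / `VIRTIO_NET_CTRL_MQ_VQ_PAIRS_SET`) enables queues 0 to 2*pairs-1 and disables the rest. A minimal sketch of that step, with hypothetical names and the class/command values taken from the virtio spec:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Values per the virtio specification. */
#define SKETCH_CTRL_MQ			4 /* VIRTIO_NET_CTRL_MQ */
#define SKETCH_CTRL_MQ_VQ_PAIRS_SET	0 /* VIRTIO_NET_CTRL_MQ_VQ_PAIRS_SET */

/* Layout of a control request header followed by command data. */
struct sketch_ctrl_req {
	uint8_t class;
	uint8_t command;
	uint8_t command_data[2]; /* uint16_t queue pairs for the MQ command */
};

/* Sketch of the MQ command handling: rings below 2 * pairs are enabled,
 * the rest disabled (the control queue itself is skipped in the real code). */
static void
sketch_handle_mq(const struct sketch_ctrl_req *req, bool *enabled, uint32_t nr_vring)
{
	uint16_t pairs;
	uint32_t i;

	if (req->class != SKETCH_CTRL_MQ || req->command != SKETCH_CTRL_MQ_VQ_PAIRS_SET)
		return;

	/* memcpy avoids an unaligned uint16_t load from the byte buffer. */
	memcpy(&pairs, req->command_data, sizeof(pairs));

	for (i = 0; i < nr_vring; i++)
		enabled[i] = i < (uint32_t)pairs * 2;
}
```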
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
---
lib/vhost/meson.build | 1 +
lib/vhost/vhost.h | 2 +
lib/vhost/virtio_net_ctrl.c | 286 ++++++++++++++++++++++++++++++++++++
lib/vhost/virtio_net_ctrl.h | 10 ++
4 files changed, 299 insertions(+)
create mode 100644 lib/vhost/virtio_net_ctrl.c
create mode 100644 lib/vhost/virtio_net_ctrl.h
diff --git a/lib/vhost/meson.build b/lib/vhost/meson.build
index 05679447db..d211a0bd37 100644
--- a/lib/vhost/meson.build
+++ b/lib/vhost/meson.build
@@ -27,6 +27,7 @@ sources = files(
'vhost_crypto.c',
'vhost_user.c',
'virtio_net.c',
+ 'virtio_net_ctrl.c',
)
headers = files(
'rte_vdpa.h',
diff --git a/lib/vhost/vhost.h b/lib/vhost/vhost.h
index ea2798b0bf..04267a3ac5 100644
--- a/lib/vhost/vhost.h
+++ b/lib/vhost/vhost.h
@@ -527,6 +527,8 @@ struct virtio_net {
int postcopy_ufd;
int postcopy_listening;
+ struct vhost_virtqueue *cvq;
+
struct rte_vdpa_device *vdpa_dev;
/* context data for the external message handlers */
diff --git a/lib/vhost/virtio_net_ctrl.c b/lib/vhost/virtio_net_ctrl.c
new file mode 100644
index 0000000000..6b583a0ce6
--- /dev/null
+++ b/lib/vhost/virtio_net_ctrl.c
@@ -0,0 +1,286 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2023 Red Hat, Inc.
+ */
+
+#include <stdint.h>
+#include <stdio.h>
+#include <unistd.h>
+
+#include "iotlb.h"
+#include "vhost.h"
+#include "virtio_net_ctrl.h"
+
+struct virtio_net_ctrl {
+ uint8_t class;
+ uint8_t command;
+ uint8_t command_data[];
+};
+
+struct virtio_net_ctrl_elem {
+ struct virtio_net_ctrl *ctrl_req;
+ uint16_t head_idx;
+ uint16_t n_descs;
+ uint8_t *desc_ack;
+};
+
+static int
+virtio_net_ctrl_pop(struct virtio_net *dev, struct vhost_virtqueue *cvq,
+ struct virtio_net_ctrl_elem *ctrl_elem)
+ __rte_shared_locks_required(&cvq->iotlb_lock)
+{
+ uint16_t avail_idx, desc_idx, n_descs = 0;
+ uint64_t desc_len, desc_addr, desc_iova, data_len = 0;
+ uint8_t *ctrl_req;
+ struct vring_desc *descs;
+
+ avail_idx = __atomic_load_n(&cvq->avail->idx, __ATOMIC_ACQUIRE);
+ if (avail_idx == cvq->last_avail_idx) {
+ VHOST_LOG_CONFIG(dev->ifname, DEBUG, "Control queue empty\n");
+ return 0;
+ }
+
+ desc_idx = cvq->avail->ring[cvq->last_avail_idx];
+ if (desc_idx >= cvq->size) {
+ VHOST_LOG_CONFIG(dev->ifname, ERR, "Out of range desc index, dropping\n");
+ goto err;
+ }
+
+ ctrl_elem->head_idx = desc_idx;
+
+ if (cvq->desc[desc_idx].flags & VRING_DESC_F_INDIRECT) {
+ desc_len = cvq->desc[desc_idx].len;
+ desc_iova = cvq->desc[desc_idx].addr;
+
+ descs = (struct vring_desc *)(uintptr_t)vhost_iova_to_vva(dev, cvq,
+ desc_iova, &desc_len, VHOST_ACCESS_RO);
+ if (!descs || desc_len != cvq->desc[desc_idx].len) {
+ VHOST_LOG_CONFIG(dev->ifname, ERR, "Failed to map ctrl indirect descs\n");
+ goto err;
+ }
+
+ desc_idx = 0;
+ } else {
+ descs = cvq->desc;
+ }
+
+ while (1) {
+ desc_len = descs[desc_idx].len;
+ desc_iova = descs[desc_idx].addr;
+
+ n_descs++;
+
+ if (descs[desc_idx].flags & VRING_DESC_F_WRITE) {
+ if (ctrl_elem->desc_ack) {
+ VHOST_LOG_CONFIG(dev->ifname, ERR,
+ "Unexpected ctrl chain layout\n");
+ goto err;
+ }
+
+ if (desc_len != sizeof(uint8_t)) {
+ VHOST_LOG_CONFIG(dev->ifname, ERR,
+ "Invalid ack size for ctrl req, dropping\n");
+ goto err;
+ }
+
+ ctrl_elem->desc_ack = (uint8_t *)(uintptr_t)vhost_iova_to_vva(dev, cvq,
+ desc_iova, &desc_len, VHOST_ACCESS_WO);
+ if (!ctrl_elem->desc_ack || desc_len != sizeof(uint8_t)) {
+ VHOST_LOG_CONFIG(dev->ifname, ERR,
+ "Failed to map ctrl ack descriptor\n");
+ goto err;
+ }
+ } else {
+ if (ctrl_elem->desc_ack) {
+ VHOST_LOG_CONFIG(dev->ifname, ERR,
+ "Unexpected ctrl chain layout\n");
+ goto err;
+ }
+
+ data_len += desc_len;
+ }
+
+ if (!(descs[desc_idx].flags & VRING_DESC_F_NEXT))
+ break;
+
+ desc_idx = descs[desc_idx].next;
+ }
+
+ desc_idx = ctrl_elem->head_idx;
+
+ if (cvq->desc[desc_idx].flags & VRING_DESC_F_INDIRECT)
+ ctrl_elem->n_descs = 1;
+ else
+ ctrl_elem->n_descs = n_descs;
+
+ if (!ctrl_elem->desc_ack) {
+ VHOST_LOG_CONFIG(dev->ifname, ERR, "Missing ctrl ack descriptor\n");
+ goto err;
+ }
+
+ if (data_len < sizeof(ctrl_elem->ctrl_req->class) + sizeof(ctrl_elem->ctrl_req->command)) {
+ VHOST_LOG_CONFIG(dev->ifname, ERR, "Invalid control header size\n");
+ goto err;
+ }
+
+ ctrl_elem->ctrl_req = malloc(data_len);
+ if (!ctrl_elem->ctrl_req) {
+ VHOST_LOG_CONFIG(dev->ifname, ERR, "Failed to alloc ctrl request\n");
+ goto err;
+ }
+
+ ctrl_req = (uint8_t *)ctrl_elem->ctrl_req;
+
+ if (cvq->desc[desc_idx].flags & VRING_DESC_F_INDIRECT) {
+ desc_len = cvq->desc[desc_idx].len;
+ desc_iova = cvq->desc[desc_idx].addr;
+
+ descs = (struct vring_desc *)(uintptr_t)vhost_iova_to_vva(dev, cvq,
+ desc_iova, &desc_len, VHOST_ACCESS_RO);
+ if (!descs || desc_len != cvq->desc[desc_idx].len) {
+ VHOST_LOG_CONFIG(dev->ifname, ERR, "Failed to map ctrl indirect descs\n");
+ goto free_err;
+ }
+
+ desc_idx = 0;
+ } else {
+ descs = cvq->desc;
+ }
+
+ while (!(descs[desc_idx].flags & VRING_DESC_F_WRITE)) {
+ desc_len = descs[desc_idx].len;
+ desc_iova = descs[desc_idx].addr;
+
+ desc_addr = vhost_iova_to_vva(dev, cvq, desc_iova, &desc_len, VHOST_ACCESS_RO);
+ if (!desc_addr || desc_len < descs[desc_idx].len) {
+ VHOST_LOG_CONFIG(dev->ifname, ERR, "Failed to map ctrl descriptor\n");
+ goto free_err;
+ }
+
+ memcpy(ctrl_req, (void *)(uintptr_t)desc_addr, desc_len);
+ ctrl_req += desc_len;
+
+ if (!(descs[desc_idx].flags & VRING_DESC_F_NEXT))
+ break;
+
+ desc_idx = descs[desc_idx].next;
+ }
+
+ cvq->last_avail_idx++;
+ if (cvq->last_avail_idx >= cvq->size)
+ cvq->last_avail_idx -= cvq->size;
+
+ if (dev->features & (1ULL << VIRTIO_RING_F_EVENT_IDX))
+ vhost_avail_event(cvq) = cvq->last_avail_idx;
+
+ return 1;
+
+free_err:
+ free(ctrl_elem->ctrl_req);
+err:
+ cvq->last_avail_idx++;
+ if (cvq->last_avail_idx >= cvq->size)
+ cvq->last_avail_idx -= cvq->size;
+
+ if (dev->features & (1ULL << VIRTIO_RING_F_EVENT_IDX))
+ vhost_avail_event(cvq) = cvq->last_avail_idx;
+
+ return -1;
+}
+
+static uint8_t
+virtio_net_ctrl_handle_req(struct virtio_net *dev, struct virtio_net_ctrl *ctrl_req)
+{
+ uint8_t ret = VIRTIO_NET_ERR;
+
+ if (ctrl_req->class == VIRTIO_NET_CTRL_MQ &&
+ ctrl_req->command == VIRTIO_NET_CTRL_MQ_VQ_PAIRS_SET) {
+ uint16_t queue_pairs;
+ uint32_t i;
+
+ queue_pairs = *(uint16_t *)(uintptr_t)ctrl_req->command_data;
+ VHOST_LOG_CONFIG(dev->ifname, INFO, "Ctrl req: MQ %u queue pairs\n", queue_pairs);
+ ret = VIRTIO_NET_OK;
+
+ for (i = 0; i < dev->nr_vring; i++) {
+ struct vhost_virtqueue *vq = dev->virtqueue[i];
+ bool enable;
+
+ if (vq == dev->cvq)
+ continue;
+
+ if (i < queue_pairs * 2)
+ enable = true;
+ else
+ enable = false;
+
+ vq->enabled = enable;
+ if (dev->notify_ops->vring_state_changed)
+ dev->notify_ops->vring_state_changed(dev->vid, i, enable);
+ }
+ }
+
+ return ret;
+}
+
+static int
+virtio_net_ctrl_push(struct virtio_net *dev, struct virtio_net_ctrl_elem *ctrl_elem)
+{
+ struct vhost_virtqueue *cvq = dev->cvq;
+ struct vring_used_elem *used_elem;
+
+ used_elem = &cvq->used->ring[cvq->last_used_idx];
+ used_elem->id = ctrl_elem->head_idx;
+ used_elem->len = ctrl_elem->n_descs;
+
+ cvq->last_used_idx++;
+ if (cvq->last_used_idx >= cvq->size)
+ cvq->last_used_idx -= cvq->size;
+
+ __atomic_store_n(&cvq->used->idx, cvq->last_used_idx, __ATOMIC_RELEASE);
+
+ vhost_vring_call_split(dev, dev->cvq);
+
+ free(ctrl_elem->ctrl_req);
+
+ return 0;
+}
+
+int
+virtio_net_ctrl_handle(struct virtio_net *dev)
+{
+ int ret = 0;
+
+ if (dev->features & (1ULL << VIRTIO_F_RING_PACKED)) {
+ VHOST_LOG_CONFIG(dev->ifname, ERR, "Packed ring not supported yet\n");
+ return -1;
+ }
+
+ if (!dev->cvq) {
+ VHOST_LOG_CONFIG(dev->ifname, ERR, "missing control queue\n");
+ return -1;
+ }
+
+ rte_rwlock_read_lock(&dev->cvq->access_lock);
+ vhost_user_iotlb_rd_lock(dev->cvq);
+
+ while (1) {
+ struct virtio_net_ctrl_elem ctrl_elem;
+
+ memset(&ctrl_elem, 0, sizeof(struct virtio_net_ctrl_elem));
+
+ ret = virtio_net_ctrl_pop(dev, dev->cvq, &ctrl_elem);
+ if (ret <= 0)
+ break;
+
+ *ctrl_elem.desc_ack = virtio_net_ctrl_handle_req(dev, ctrl_elem.ctrl_req);
+
+ ret = virtio_net_ctrl_push(dev, &ctrl_elem);
+ if (ret < 0)
+ break;
+ }
+
+ vhost_user_iotlb_rd_unlock(dev->cvq);
+ rte_rwlock_read_unlock(&dev->cvq->access_lock);
+
+ return ret;
+}
diff --git a/lib/vhost/virtio_net_ctrl.h b/lib/vhost/virtio_net_ctrl.h
new file mode 100644
index 0000000000..9a90f4b9da
--- /dev/null
+++ b/lib/vhost/virtio_net_ctrl.h
@@ -0,0 +1,10 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2023 Red Hat, Inc.
+ */
+
+#ifndef _VIRTIO_NET_CTRL_H
+#define _VIRTIO_NET_CTRL_H
+
+int virtio_net_ctrl_handle(struct virtio_net *dev);
+
+#endif
--
2.40.1
* [PATCH v4 16/26] vhost: add VDUSE device creation and destruction
2023-06-01 20:07 [PATCH v4 00/26] Add VDUSE support to Vhost library Maxime Coquelin
` (14 preceding siblings ...)
2023-06-01 20:08 ` [PATCH v4 15/26] vhost: add control virtqueue support Maxime Coquelin
@ 2023-06-01 20:08 ` Maxime Coquelin
2023-06-01 20:08 ` [PATCH v4 17/26] vhost: add VDUSE callback for IOTLB miss Maxime Coquelin
` (9 subsequent siblings)
25 siblings, 0 replies; 30+ messages in thread
From: Maxime Coquelin @ 2023-06-01 20:08 UTC (permalink / raw)
To: dev, chenbo.xia, david.marchand, mkp, fbl, jasowang,
cunming.liang, xieyongji, echaudro, eperezma, amorenoz, lulu
Cc: Maxime Coquelin
This patch adds initial support for VDUSE, which includes
the device creation and destruction.
It does not include the virtqueue configuration, so the
device is not functional at this point.
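As the socket.c hunk below shows, the backend is selected at registration time from the path prefix: paths under `/dev/vduse/` take the new VDUSE path, everything else keeps the Vhost-user Unix socket path. That check can be sketched in isolation (the function name is illustrative):

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Sketch of the registration-time backend selection: a path under
 * /dev/vduse/ selects the VDUSE backend, any other path keeps the
 * existing Vhost-user Unix socket handling. */
static bool
sketch_is_vduse_path(const char *path)
{
	return strncmp("/dev/vduse/", path, strlen("/dev/vduse/")) == 0;
}
```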
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
---
lib/vhost/meson.build | 4 +
lib/vhost/socket.c | 34 ++++---
lib/vhost/vduse.c | 201 ++++++++++++++++++++++++++++++++++++++++++
lib/vhost/vduse.h | 33 +++++++
lib/vhost/vhost.h | 2 +
5 files changed, 262 insertions(+), 12 deletions(-)
create mode 100644 lib/vhost/vduse.c
create mode 100644 lib/vhost/vduse.h
diff --git a/lib/vhost/meson.build b/lib/vhost/meson.build
index d211a0bd37..13e0382c92 100644
--- a/lib/vhost/meson.build
+++ b/lib/vhost/meson.build
@@ -29,6 +29,10 @@ sources = files(
'virtio_net.c',
'virtio_net_ctrl.c',
)
+if cc.has_header('linux/vduse.h')
+ sources += files('vduse.c')
+ cflags += '-DVHOST_HAS_VDUSE'
+endif
headers = files(
'rte_vdpa.h',
'rte_vhost.h',
diff --git a/lib/vhost/socket.c b/lib/vhost/socket.c
index 29f7a8cece..19a7469e45 100644
--- a/lib/vhost/socket.c
+++ b/lib/vhost/socket.c
@@ -19,6 +19,7 @@
#include <rte_log.h>
#include "fd_man.h"
+#include "vduse.h"
#include "vhost.h"
#include "vhost_user.h"
@@ -36,6 +37,7 @@ struct vhost_user_socket {
int socket_fd;
struct sockaddr_un un;
bool is_server;
+ bool is_vduse;
bool reconnect;
bool iommu_support;
bool use_builtin_virtio_net;
@@ -991,18 +993,21 @@ rte_vhost_driver_register(const char *path, uint64_t flags)
#endif
}
- if ((flags & RTE_VHOST_USER_CLIENT) != 0) {
- vsocket->reconnect = !(flags & RTE_VHOST_USER_NO_RECONNECT);
- if (vsocket->reconnect && reconn_tid == 0) {
- if (vhost_user_reconnect_init() != 0)
- goto out_mutex;
- }
+ if (!strncmp("/dev/vduse/", path, strlen("/dev/vduse/"))) {
+ vsocket->is_vduse = true;
} else {
- vsocket->is_server = true;
- }
- ret = create_unix_socket(vsocket);
- if (ret < 0) {
- goto out_mutex;
+ if ((flags & RTE_VHOST_USER_CLIENT) != 0) {
+ vsocket->reconnect = !(flags & RTE_VHOST_USER_NO_RECONNECT);
+ if (vsocket->reconnect && reconn_tid == 0) {
+ if (vhost_user_reconnect_init() != 0)
+ goto out_mutex;
+ }
+ } else {
+ vsocket->is_server = true;
+ }
+ ret = create_unix_socket(vsocket);
+ if (ret < 0)
+ goto out_mutex;
}
vhost_user.vsockets[vhost_user.vsocket_cnt++] = vsocket;
@@ -1067,7 +1072,9 @@ rte_vhost_driver_unregister(const char *path)
if (strcmp(vsocket->path, path))
continue;
- if (vsocket->is_server) {
+ if (vsocket->is_vduse) {
+ vduse_device_destroy(path);
+ } else if (vsocket->is_server) {
/*
* If r/wcb is executing, release vhost_user's
* mutex lock, and try again since the r/wcb
@@ -1218,6 +1225,9 @@ rte_vhost_driver_start(const char *path)
if (!vsocket)
return -1;
+ if (vsocket->is_vduse)
+ return vduse_device_create(path);
+
if (fdset_tid == 0) {
/**
* create a pipe which will be waited by poll and notified to
diff --git a/lib/vhost/vduse.c b/lib/vhost/vduse.c
new file mode 100644
index 0000000000..d67818bfb5
--- /dev/null
+++ b/lib/vhost/vduse.c
@@ -0,0 +1,201 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2023 Red Hat, Inc.
+ */
+
+#include <stdint.h>
+#include <stdio.h>
+#include <unistd.h>
+#include <fcntl.h>
+
+
+#include <linux/vduse.h>
+#include <linux/virtio_net.h>
+
+#include <sys/ioctl.h>
+#include <sys/mman.h>
+
+#include <rte_common.h>
+
+#include "vduse.h"
+#include "vhost.h"
+
+#define VHOST_VDUSE_API_VERSION 0
+#define VDUSE_CTRL_PATH "/dev/vduse/control"
+
+#define VDUSE_NET_SUPPORTED_FEATURES ((1ULL << VIRTIO_NET_F_MRG_RXBUF) | \
+ (1ULL << VIRTIO_F_ANY_LAYOUT) | \
+ (1ULL << VIRTIO_F_VERSION_1) | \
+ (1ULL << VIRTIO_NET_F_GSO) | \
+ (1ULL << VIRTIO_NET_F_HOST_TSO4) | \
+ (1ULL << VIRTIO_NET_F_HOST_TSO6) | \
+ (1ULL << VIRTIO_NET_F_HOST_UFO) | \
+ (1ULL << VIRTIO_NET_F_HOST_ECN) | \
+ (1ULL << VIRTIO_NET_F_CSUM) | \
+ (1ULL << VIRTIO_NET_F_GUEST_CSUM) | \
+ (1ULL << VIRTIO_NET_F_GUEST_TSO4) | \
+ (1ULL << VIRTIO_NET_F_GUEST_TSO6) | \
+ (1ULL << VIRTIO_NET_F_GUEST_UFO) | \
+ (1ULL << VIRTIO_NET_F_GUEST_ECN) | \
+ (1ULL << VIRTIO_RING_F_INDIRECT_DESC) | \
+ (1ULL << VIRTIO_F_IN_ORDER) | \
+ (1ULL << VIRTIO_F_IOMMU_PLATFORM))
+
+static struct vhost_backend_ops vduse_backend_ops = {
+};
+
+int
+vduse_device_create(const char *path)
+{
+ int control_fd, dev_fd, vid, ret;
+ uint32_t i;
+ struct virtio_net *dev;
+ uint64_t ver = VHOST_VDUSE_API_VERSION;
+ struct vduse_dev_config *dev_config = NULL;
+ const char *name = path + strlen("/dev/vduse/");
+
+ control_fd = open(VDUSE_CTRL_PATH, O_RDWR);
+ if (control_fd < 0) {
+ VHOST_LOG_CONFIG(name, ERR, "Failed to open %s: %s\n",
+ VDUSE_CTRL_PATH, strerror(errno));
+ return -1;
+ }
+
+ if (ioctl(control_fd, VDUSE_SET_API_VERSION, &ver)) {
+ VHOST_LOG_CONFIG(name, ERR, "Failed to set API version: %" PRIu64 ": %s\n",
+ ver, strerror(errno));
+ ret = -1;
+ goto out_ctrl_close;
+ }
+
+ dev_config = malloc(offsetof(struct vduse_dev_config, config));
+ if (!dev_config) {
+ VHOST_LOG_CONFIG(name, ERR, "Failed to allocate VDUSE config\n");
+ ret = -1;
+ goto out_ctrl_close;
+ }
+
+ memset(dev_config, 0, sizeof(struct vduse_dev_config));
+
+ strncpy(dev_config->name, name, VDUSE_NAME_MAX - 1);
+ dev_config->device_id = VIRTIO_ID_NET;
+ dev_config->vendor_id = 0;
+ dev_config->features = VDUSE_NET_SUPPORTED_FEATURES;
+ dev_config->vq_num = 2;
+ dev_config->vq_align = sysconf(_SC_PAGE_SIZE);
+ dev_config->config_size = 0;
+
+ ret = ioctl(control_fd, VDUSE_CREATE_DEV, dev_config);
+ if (ret < 0) {
+ VHOST_LOG_CONFIG(name, ERR, "Failed to create VDUSE device: %s\n",
+ strerror(errno));
+ goto out_free;
+ }
+
+ dev_fd = open(path, O_RDWR);
+ if (dev_fd < 0) {
+ VHOST_LOG_CONFIG(name, ERR, "Failed to open device %s: %s\n",
+ path, strerror(errno));
+ ret = -1;
+ goto out_dev_close;
+ }
+
+ ret = fcntl(dev_fd, F_SETFL, O_NONBLOCK);
+ if (ret < 0) {
+ VHOST_LOG_CONFIG(name, ERR, "Failed to set chardev as non-blocking: %s\n",
+ strerror(errno));
+ goto out_dev_close;
+ }
+
+ vid = vhost_new_device(&vduse_backend_ops);
+ if (vid < 0) {
+ VHOST_LOG_CONFIG(name, ERR, "Failed to create new Vhost device\n");
+ ret = -1;
+ goto out_dev_close;
+ }
+
+ dev = get_device(vid);
+ if (!dev) {
+ ret = -1;
+ goto out_dev_close;
+ }
+
+ strncpy(dev->ifname, path, IF_NAME_SZ - 1);
+ dev->vduse_ctrl_fd = control_fd;
+ dev->vduse_dev_fd = dev_fd;
+ vhost_setup_virtio_net(dev->vid, true, true, true, true);
+
+ for (i = 0; i < 2; i++) {
+ struct vduse_vq_config vq_cfg = { 0 };
+
+ ret = alloc_vring_queue(dev, i);
+ if (ret) {
+ VHOST_LOG_CONFIG(name, ERR, "Failed to alloc vring %d metadata\n", i);
+ goto out_dev_destroy;
+ }
+
+ vq_cfg.index = i;
+ vq_cfg.max_size = 1024;
+
+ ret = ioctl(dev->vduse_dev_fd, VDUSE_VQ_SETUP, &vq_cfg);
+ if (ret) {
+ VHOST_LOG_CONFIG(name, ERR, "Failed to set-up VQ %d\n", i);
+ goto out_dev_destroy;
+ }
+ }
+
+ free(dev_config);
+
+ return 0;
+
+out_dev_destroy:
+ vhost_destroy_device(vid);
+out_dev_close:
+ if (dev_fd >= 0)
+ close(dev_fd);
+ ioctl(control_fd, VDUSE_DESTROY_DEV, name);
+out_free:
+ free(dev_config);
+out_ctrl_close:
+ close(control_fd);
+
+ return ret;
+}
+
+int
+vduse_device_destroy(const char *path)
+{
+ const char *name = path + strlen("/dev/vduse/");
+ struct virtio_net *dev;
+ int vid, ret;
+
+ for (vid = 0; vid < RTE_MAX_VHOST_DEVICE; vid++) {
+ dev = vhost_devices[vid];
+
+ if (dev == NULL)
+ continue;
+
+ if (!strcmp(path, dev->ifname))
+ break;
+ }
+
+ if (vid == RTE_MAX_VHOST_DEVICE)
+ return -1;
+
+ if (dev->vduse_dev_fd >= 0) {
+ close(dev->vduse_dev_fd);
+ dev->vduse_dev_fd = -1;
+ }
+
+ if (dev->vduse_ctrl_fd >= 0) {
+ ret = ioctl(dev->vduse_ctrl_fd, VDUSE_DESTROY_DEV, name);
+ if (ret)
+ VHOST_LOG_CONFIG(name, ERR, "Failed to destroy VDUSE device: %s\n",
+ strerror(errno));
+ close(dev->vduse_ctrl_fd);
+ dev->vduse_ctrl_fd = -1;
+ }
+
+ vhost_destroy_device(vid);
+
+ return 0;
+}
diff --git a/lib/vhost/vduse.h b/lib/vhost/vduse.h
new file mode 100644
index 0000000000..a15e5d4c16
--- /dev/null
+++ b/lib/vhost/vduse.h
@@ -0,0 +1,33 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2023 Red Hat, Inc.
+ */
+
+#ifndef _VDUSE_H
+#define _VDUSE_H
+
+#include "vhost.h"
+
+#ifdef VHOST_HAS_VDUSE
+
+int vduse_device_create(const char *path);
+int vduse_device_destroy(const char *path);
+
+#else
+
+static inline int
+vduse_device_create(const char *path)
+{
+ VHOST_LOG_CONFIG(path, ERR, "VDUSE support disabled at build time\n");
+ return -1;
+}
+
+static inline int
+vduse_device_destroy(const char *path)
+{
+ VHOST_LOG_CONFIG(path, ERR, "VDUSE support disabled at build time\n");
+ return -1;
+}
+
+#endif /* VHOST_HAS_VDUSE */
+
+#endif /* _VDUSE_H */
diff --git a/lib/vhost/vhost.h b/lib/vhost/vhost.h
index 04267a3ac5..9ecede0f30 100644
--- a/lib/vhost/vhost.h
+++ b/lib/vhost/vhost.h
@@ -526,6 +526,8 @@ struct virtio_net {
int postcopy_ufd;
int postcopy_listening;
+ int vduse_ctrl_fd;
+ int vduse_dev_fd;
struct vhost_virtqueue *cvq;
--
2.40.1
^ permalink raw reply [flat|nested] 30+ messages in thread
* [PATCH v4 17/26] vhost: add VDUSE callback for IOTLB miss
2023-06-01 20:07 [PATCH v4 00/26] Add VDUSE support to Vhost library Maxime Coquelin
` (15 preceding siblings ...)
2023-06-01 20:08 ` [PATCH v4 16/26] vhost: add VDUSE device creation and destruction Maxime Coquelin
@ 2023-06-01 20:08 ` Maxime Coquelin
2023-06-01 20:08 ` [PATCH v4 18/26] vhost: add VDUSE callback for IOTLB entry removal Maxime Coquelin
` (8 subsequent siblings)
25 siblings, 0 replies; 30+ messages in thread
From: Maxime Coquelin @ 2023-06-01 20:08 UTC (permalink / raw)
To: dev, chenbo.xia, david.marchand, mkp, fbl, jasowang,
cunming.liang, xieyongji, echaudro, eperezma, amorenoz, lulu
Cc: Maxime Coquelin
This patch implements the VDUSE callback for IOTLB misses,
which is done using the VDUSE_IOTLB_GET_FD ioctl.
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
---
lib/vhost/vduse.c | 58 +++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 58 insertions(+)
diff --git a/lib/vhost/vduse.c b/lib/vhost/vduse.c
index d67818bfb5..f72c7bf6ab 100644
--- a/lib/vhost/vduse.c
+++ b/lib/vhost/vduse.c
@@ -13,9 +13,11 @@
#include <sys/ioctl.h>
#include <sys/mman.h>
+#include <sys/stat.h>
#include <rte_common.h>
+#include "iotlb.h"
#include "vduse.h"
#include "vhost.h"
@@ -40,7 +42,63 @@
(1ULL << VIRTIO_F_IN_ORDER) | \
(1ULL << VIRTIO_F_IOMMU_PLATFORM))
+static int
+vduse_iotlb_miss(struct virtio_net *dev, uint64_t iova, uint8_t perm __rte_unused)
+{
+ struct vduse_iotlb_entry entry;
+ uint64_t size, page_size;
+ struct stat stat;
+ void *mmap_addr;
+ int fd, ret;
+
+ entry.start = iova;
+ entry.last = iova + 1;
+
+ ret = ioctl(dev->vduse_dev_fd, VDUSE_IOTLB_GET_FD, &entry);
+ if (ret < 0) {
+ VHOST_LOG_CONFIG(dev->ifname, ERR, "Failed to get IOTLB entry for 0x%" PRIx64 "\n",
+ iova);
+ return -1;
+ }
+
+ fd = ret;
+
+ VHOST_LOG_CONFIG(dev->ifname, DEBUG, "New IOTLB entry:\n");
+ VHOST_LOG_CONFIG(dev->ifname, DEBUG, "\tIOVA: %" PRIx64 " - %" PRIx64 "\n",
+ (uint64_t)entry.start, (uint64_t)entry.last);
+ VHOST_LOG_CONFIG(dev->ifname, DEBUG, "\toffset: %" PRIx64 "\n", (uint64_t)entry.offset);
+ VHOST_LOG_CONFIG(dev->ifname, DEBUG, "\tfd: %d\n", fd);
+ VHOST_LOG_CONFIG(dev->ifname, DEBUG, "\tperm: %x\n", entry.perm);
+
+ size = entry.last - entry.start + 1;
+ mmap_addr = mmap(0, size + entry.offset, entry.perm, MAP_SHARED, fd, 0);
+ if (!mmap_addr) {
+ VHOST_LOG_CONFIG(dev->ifname, ERR,
+ "Failed to mmap IOTLB entry for 0x%" PRIx64 "\n", iova);
+ ret = -1;
+ goto close_fd;
+ }
+
+ ret = fstat(fd, &stat);
+ if (ret < 0) {
+ VHOST_LOG_CONFIG(dev->ifname, ERR, "Failed to get page size.\n");
+ munmap(mmap_addr, entry.offset + size);
+ goto close_fd;
+ }
+ page_size = (uint64_t)stat.st_blksize;
+
+ vhost_user_iotlb_cache_insert(dev, entry.start, (uint64_t)(uintptr_t)mmap_addr,
+ entry.offset, size, page_size, entry.perm);
+
+ ret = 0;
+close_fd:
+ close(fd);
+
+ return ret;
+}
+
static struct vhost_backend_ops vduse_backend_ops = {
+ .iotlb_miss = vduse_iotlb_miss,
};
int
--
2.40.1
^ permalink raw reply [flat|nested] 30+ messages in thread
* [PATCH v4 18/26] vhost: add VDUSE callback for IOTLB entry removal
2023-06-01 20:07 [PATCH v4 00/26] Add VDUSE support to Vhost library Maxime Coquelin
` (16 preceding siblings ...)
2023-06-01 20:08 ` [PATCH v4 17/26] vhost: add VDUSE callback for IOTLB miss Maxime Coquelin
@ 2023-06-01 20:08 ` Maxime Coquelin
2023-06-01 20:08 ` [PATCH v4 19/26] vhost: add VDUSE callback for IRQ injection Maxime Coquelin
` (7 subsequent siblings)
25 siblings, 0 replies; 30+ messages in thread
From: Maxime Coquelin @ 2023-06-01 20:08 UTC (permalink / raw)
To: dev, chenbo.xia, david.marchand, mkp, fbl, jasowang,
cunming.liang, xieyongji, echaudro, eperezma, amorenoz, lulu
Cc: Maxime Coquelin
This patch implements the VDUSE callback for IOTLB entry
removal, which unmaps the pages of the invalidated
IOTLB entry.
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
---
lib/vhost/vduse.c | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/lib/vhost/vduse.c b/lib/vhost/vduse.c
index f72c7bf6ab..58c1b384a8 100644
--- a/lib/vhost/vduse.c
+++ b/lib/vhost/vduse.c
@@ -42,6 +42,12 @@
(1ULL << VIRTIO_F_IN_ORDER) | \
(1ULL << VIRTIO_F_IOMMU_PLATFORM))
+static void
+vduse_iotlb_remove_notify(uint64_t addr, uint64_t offset, uint64_t size)
+{
+ munmap((void *)(uintptr_t)addr, offset + size);
+}
+
static int
vduse_iotlb_miss(struct virtio_net *dev, uint64_t iova, uint8_t perm __rte_unused)
{
@@ -99,6 +105,7 @@ vduse_iotlb_miss(struct virtio_net *dev, uint64_t iova, uint8_t perm __rte_unuse
static struct vhost_backend_ops vduse_backend_ops = {
.iotlb_miss = vduse_iotlb_miss,
+ .iotlb_remove_notify = vduse_iotlb_remove_notify,
};
int
--
2.40.1
^ permalink raw reply [flat|nested] 30+ messages in thread
* [PATCH v4 19/26] vhost: add VDUSE callback for IRQ injection
2023-06-01 20:07 [PATCH v4 00/26] Add VDUSE support to Vhost library Maxime Coquelin
` (17 preceding siblings ...)
2023-06-01 20:08 ` [PATCH v4 18/26] vhost: add VDUSE callback for IOTLB entry removal Maxime Coquelin
@ 2023-06-01 20:08 ` Maxime Coquelin
2023-06-01 20:08 ` [PATCH v4 20/26] vhost: add VDUSE events handler Maxime Coquelin
` (6 subsequent siblings)
25 siblings, 0 replies; 30+ messages in thread
From: Maxime Coquelin @ 2023-06-01 20:08 UTC (permalink / raw)
To: dev, chenbo.xia, david.marchand, mkp, fbl, jasowang,
cunming.liang, xieyongji, echaudro, eperezma, amorenoz, lulu
Cc: Maxime Coquelin
This patch implements the VDUSE callback for injecting
virtqueue interrupts, used to notify the Virtio driver.
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
---
lib/vhost/vduse.c | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/lib/vhost/vduse.c b/lib/vhost/vduse.c
index 58c1b384a8..d39e39b9dc 100644
--- a/lib/vhost/vduse.c
+++ b/lib/vhost/vduse.c
@@ -42,6 +42,12 @@
(1ULL << VIRTIO_F_IN_ORDER) | \
(1ULL << VIRTIO_F_IOMMU_PLATFORM))
+static int
+vduse_inject_irq(struct virtio_net *dev, struct vhost_virtqueue *vq)
+{
+ return ioctl(dev->vduse_dev_fd, VDUSE_VQ_INJECT_IRQ, &vq->index);
+}
+
static void
vduse_iotlb_remove_notify(uint64_t addr, uint64_t offset, uint64_t size)
{
@@ -106,6 +112,7 @@ vduse_iotlb_miss(struct virtio_net *dev, uint64_t iova, uint8_t perm __rte_unuse
static struct vhost_backend_ops vduse_backend_ops = {
.iotlb_miss = vduse_iotlb_miss,
.iotlb_remove_notify = vduse_iotlb_remove_notify,
+ .inject_irq = vduse_inject_irq,
};
int
--
2.40.1
^ permalink raw reply [flat|nested] 30+ messages in thread
* [PATCH v4 20/26] vhost: add VDUSE events handler
2023-06-01 20:07 [PATCH v4 00/26] Add VDUSE support to Vhost library Maxime Coquelin
` (18 preceding siblings ...)
2023-06-01 20:08 ` [PATCH v4 19/26] vhost: add VDUSE callback for IRQ injection Maxime Coquelin
@ 2023-06-01 20:08 ` Maxime Coquelin
2023-06-01 20:08 ` [PATCH v4 21/26] vhost: add support for virtqueue state get event Maxime Coquelin
` (5 subsequent siblings)
25 siblings, 0 replies; 30+ messages in thread
From: Maxime Coquelin @ 2023-06-01 20:08 UTC (permalink / raw)
To: dev, chenbo.xia, david.marchand, mkp, fbl, jasowang,
cunming.liang, xieyongji, echaudro, eperezma, amorenoz, lulu
Cc: Maxime Coquelin
This patch makes use of the Vhost library's FD manager to
install a handler for VDUSE events occurring on the VDUSE device FD.
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
---
lib/vhost/vduse.c | 101 ++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 101 insertions(+)
diff --git a/lib/vhost/vduse.c b/lib/vhost/vduse.c
index d39e39b9dc..92c515cff2 100644
--- a/lib/vhost/vduse.c
+++ b/lib/vhost/vduse.c
@@ -17,6 +17,7 @@
#include <rte_common.h>
+#include "fd_man.h"
#include "iotlb.h"
#include "vduse.h"
#include "vhost.h"
@@ -42,6 +43,31 @@
(1ULL << VIRTIO_F_IN_ORDER) | \
(1ULL << VIRTIO_F_IOMMU_PLATFORM))
+struct vduse {
+ struct fdset fdset;
+};
+
+static struct vduse vduse = {
+ .fdset = {
+ .fd = { [0 ... MAX_FDS - 1] = {-1, NULL, NULL, NULL, 0} },
+ .fd_mutex = PTHREAD_MUTEX_INITIALIZER,
+ .fd_pooling_mutex = PTHREAD_MUTEX_INITIALIZER,
+ .num = 0
+ },
+};
+
+static bool vduse_events_thread;
+
+static const char * const vduse_reqs_str[] = {
+ "VDUSE_GET_VQ_STATE",
+ "VDUSE_SET_STATUS",
+ "VDUSE_UPDATE_IOTLB",
+};
+
+#define vduse_req_id_to_str(id) \
+ (id < RTE_DIM(vduse_reqs_str) ? \
+ vduse_reqs_str[id] : "Unknown")
+
static int
vduse_inject_irq(struct virtio_net *dev, struct vhost_virtqueue *vq)
{
@@ -115,16 +141,80 @@ static struct vhost_backend_ops vduse_backend_ops = {
.inject_irq = vduse_inject_irq,
};
+static void
+vduse_events_handler(int fd, void *arg, int *remove __rte_unused)
+{
+ struct virtio_net *dev = arg;
+ struct vduse_dev_request req;
+ struct vduse_dev_response resp;
+ int ret;
+
+ memset(&resp, 0, sizeof(resp));
+
+ ret = read(fd, &req, sizeof(req));
+ if (ret < 0) {
+ VHOST_LOG_CONFIG(dev->ifname, ERR, "Failed to read request: %s\n",
+ strerror(errno));
+ return;
+ } else if (ret < (int)sizeof(req)) {
+ VHOST_LOG_CONFIG(dev->ifname, ERR, "Incomplete to read request %d\n", ret);
+ return;
+ }
+
+ VHOST_LOG_CONFIG(dev->ifname, INFO, "New request: %s (%u)\n",
+ vduse_req_id_to_str(req.type), req.type);
+
+ switch (req.type) {
+ default:
+ resp.result = VDUSE_REQ_RESULT_FAILED;
+ break;
+ }
+
+ resp.request_id = req.request_id;
+
+ ret = write(dev->vduse_dev_fd, &resp, sizeof(resp));
+ if (ret != sizeof(resp)) {
+ VHOST_LOG_CONFIG(dev->ifname, ERR, "Failed to write response %s\n",
+ strerror(errno));
+ return;
+ }
+ VHOST_LOG_CONFIG(dev->ifname, INFO, "Request %s (%u) handled successfully\n",
+ vduse_req_id_to_str(req.type), req.type);
+}
+
int
vduse_device_create(const char *path)
{
int control_fd, dev_fd, vid, ret;
+ pthread_t fdset_tid;
uint32_t i;
struct virtio_net *dev;
uint64_t ver = VHOST_VDUSE_API_VERSION;
struct vduse_dev_config *dev_config = NULL;
const char *name = path + strlen("/dev/vduse/");
+ /* If first device, create events dispatcher thread */
+ if (vduse_events_thread == false) {
+ /**
+ * create a pipe which will be waited by poll and notified to
+ * rebuild the wait list of poll.
+ */
+ if (fdset_pipe_init(&vduse.fdset) < 0) {
+ VHOST_LOG_CONFIG(path, ERR, "failed to create pipe for vduse fdset\n");
+ return -1;
+ }
+
+ ret = rte_ctrl_thread_create(&fdset_tid, "vduse-events", NULL,
+ fdset_event_dispatch, &vduse.fdset);
+ if (ret != 0) {
+ VHOST_LOG_CONFIG(path, ERR, "failed to create vduse fdset handling thread\n");
+ fdset_pipe_uninit(&vduse.fdset);
+ return -1;
+ }
+
+ vduse_events_thread = true;
+ }
+
control_fd = open(VDUSE_CTRL_PATH, O_RDWR);
if (control_fd < 0) {
VHOST_LOG_CONFIG(name, ERR, "Failed to open %s: %s\n",
@@ -215,6 +305,14 @@ vduse_device_create(const char *path)
}
}
+ ret = fdset_add(&vduse.fdset, dev->vduse_dev_fd, vduse_events_handler, NULL, dev);
+ if (ret) {
+ VHOST_LOG_CONFIG(name, ERR, "Failed to add fd %d to vduse fdset\n",
+ dev->vduse_dev_fd);
+ goto out_dev_destroy;
+ }
+ fdset_pipe_notify(&vduse.fdset);
+
free(dev_config);
return 0;
@@ -253,6 +351,9 @@ vduse_device_destroy(const char *path)
if (vid == RTE_MAX_VHOST_DEVICE)
return -1;
+ fdset_del(&vduse.fdset, dev->vduse_dev_fd);
+ fdset_pipe_notify(&vduse.fdset);
+
if (dev->vduse_dev_fd >= 0) {
close(dev->vduse_dev_fd);
dev->vduse_dev_fd = -1;
--
2.40.1
^ permalink raw reply [flat|nested] 30+ messages in thread
* [PATCH v4 21/26] vhost: add support for virtqueue state get event
2023-06-01 20:07 [PATCH v4 00/26] Add VDUSE support to Vhost library Maxime Coquelin
` (19 preceding siblings ...)
2023-06-01 20:08 ` [PATCH v4 20/26] vhost: add VDUSE events handler Maxime Coquelin
@ 2023-06-01 20:08 ` Maxime Coquelin
2023-06-01 20:08 ` [PATCH v4 22/26] vhost: add support for VDUSE status set event Maxime Coquelin
` (4 subsequent siblings)
25 siblings, 0 replies; 30+ messages in thread
From: Maxime Coquelin @ 2023-06-01 20:08 UTC (permalink / raw)
To: dev, chenbo.xia, david.marchand, mkp, fbl, jasowang,
cunming.liang, xieyongji, echaudro, eperezma, amorenoz, lulu
Cc: Maxime Coquelin
This patch adds support for VDUSE_GET_VQ_STATE event
handling, which consists in providing the backend's last
available index for the specified virtqueue.
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
---
lib/vhost/vduse.c | 8 ++++++++
1 file changed, 8 insertions(+)
diff --git a/lib/vhost/vduse.c b/lib/vhost/vduse.c
index 92c515cff2..7e36c50b6c 100644
--- a/lib/vhost/vduse.c
+++ b/lib/vhost/vduse.c
@@ -147,6 +147,7 @@ vduse_events_handler(int fd, void *arg, int *remove __rte_unused)
struct virtio_net *dev = arg;
struct vduse_dev_request req;
struct vduse_dev_response resp;
+ struct vhost_virtqueue *vq;
int ret;
memset(&resp, 0, sizeof(resp));
@@ -165,6 +166,13 @@ vduse_events_handler(int fd, void *arg, int *remove __rte_unused)
vduse_req_id_to_str(req.type), req.type);
switch (req.type) {
+ case VDUSE_GET_VQ_STATE:
+ vq = dev->virtqueue[req.vq_state.index];
+ VHOST_LOG_CONFIG(dev->ifname, INFO, "\tvq index: %u, avail_index: %u\n",
+ req.vq_state.index, vq->last_avail_idx);
+ resp.vq_state.split.avail_index = vq->last_avail_idx;
+ resp.result = VDUSE_REQ_RESULT_OK;
+ break;
default:
resp.result = VDUSE_REQ_RESULT_FAILED;
break;
--
2.40.1
^ permalink raw reply [flat|nested] 30+ messages in thread
* [PATCH v4 22/26] vhost: add support for VDUSE status set event
2023-06-01 20:07 [PATCH v4 00/26] Add VDUSE support to Vhost library Maxime Coquelin
` (20 preceding siblings ...)
2023-06-01 20:08 ` [PATCH v4 21/26] vhost: add support for virtqueue state get event Maxime Coquelin
@ 2023-06-01 20:08 ` Maxime Coquelin
2023-06-01 20:08 ` [PATCH v4 23/26] vhost: add support for VDUSE IOTLB update event Maxime Coquelin
` (3 subsequent siblings)
25 siblings, 0 replies; 30+ messages in thread
From: Maxime Coquelin @ 2023-06-01 20:08 UTC (permalink / raw)
To: dev, chenbo.xia, david.marchand, mkp, fbl, jasowang,
cunming.liang, xieyongji, echaudro, eperezma, amorenoz, lulu
Cc: Maxime Coquelin
This patch adds support for VDUSE_SET_STATUS event
handling, which consists in updating the Virtio device
status set by the Virtio driver.
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
---
lib/vhost/vduse.c | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/lib/vhost/vduse.c b/lib/vhost/vduse.c
index 7e36c50b6c..3bf65d4b8b 100644
--- a/lib/vhost/vduse.c
+++ b/lib/vhost/vduse.c
@@ -173,6 +173,12 @@ vduse_events_handler(int fd, void *arg, int *remove __rte_unused)
resp.vq_state.split.avail_index = vq->last_avail_idx;
resp.result = VDUSE_REQ_RESULT_OK;
break;
+ case VDUSE_SET_STATUS:
+ VHOST_LOG_CONFIG(dev->ifname, INFO, "\tnew status: 0x%08x\n",
+ req.s.status);
+ dev->status = req.s.status;
+ resp.result = VDUSE_REQ_RESULT_OK;
+ break;
default:
resp.result = VDUSE_REQ_RESULT_FAILED;
break;
--
2.40.1
^ permalink raw reply [flat|nested] 30+ messages in thread
* [PATCH v4 23/26] vhost: add support for VDUSE IOTLB update event
2023-06-01 20:07 [PATCH v4 00/26] Add VDUSE support to Vhost library Maxime Coquelin
` (21 preceding siblings ...)
2023-06-01 20:08 ` [PATCH v4 22/26] vhost: add support for VDUSE status set event Maxime Coquelin
@ 2023-06-01 20:08 ` Maxime Coquelin
2023-06-01 20:08 ` [PATCH v4 24/26] vhost: add VDUSE device startup Maxime Coquelin
` (2 subsequent siblings)
25 siblings, 0 replies; 30+ messages in thread
From: Maxime Coquelin @ 2023-06-01 20:08 UTC (permalink / raw)
To: dev, chenbo.xia, david.marchand, mkp, fbl, jasowang,
cunming.liang, xieyongji, echaudro, eperezma, amorenoz, lulu
Cc: Maxime Coquelin
This patch adds support for VDUSE_UPDATE_IOTLB event
handling, which consists in invalidating IOTLB entries for
the range specified in the request.
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
---
lib/vhost/vduse.c | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/lib/vhost/vduse.c b/lib/vhost/vduse.c
index 3bf65d4b8b..110654ec68 100644
--- a/lib/vhost/vduse.c
+++ b/lib/vhost/vduse.c
@@ -179,6 +179,13 @@ vduse_events_handler(int fd, void *arg, int *remove __rte_unused)
dev->status = req.s.status;
resp.result = VDUSE_REQ_RESULT_OK;
break;
+ case VDUSE_UPDATE_IOTLB:
+ VHOST_LOG_CONFIG(dev->ifname, INFO, "\tIOVA range: %" PRIx64 " - %" PRIx64 "\n",
+ (uint64_t)req.iova.start, (uint64_t)req.iova.last);
+ vhost_user_iotlb_cache_remove(dev, req.iova.start,
+ req.iova.last - req.iova.start + 1);
+ resp.result = VDUSE_REQ_RESULT_OK;
+ break;
default:
resp.result = VDUSE_REQ_RESULT_FAILED;
break;
--
2.40.1
^ permalink raw reply [flat|nested] 30+ messages in thread
* [PATCH v4 24/26] vhost: add VDUSE device startup
2023-06-01 20:07 [PATCH v4 00/26] Add VDUSE support to Vhost library Maxime Coquelin
` (22 preceding siblings ...)
2023-06-01 20:08 ` [PATCH v4 23/26] vhost: add support for VDUSE IOTLB update event Maxime Coquelin
@ 2023-06-01 20:08 ` Maxime Coquelin
2023-06-01 20:08 ` [PATCH v4 25/26] vhost: add multiqueue support to VDUSE Maxime Coquelin
2023-06-01 20:08 ` [PATCH v4 26/26] vhost: add VDUSE device stop Maxime Coquelin
25 siblings, 0 replies; 30+ messages in thread
From: Maxime Coquelin @ 2023-06-01 20:08 UTC (permalink / raw)
To: dev, chenbo.xia, david.marchand, mkp, fbl, jasowang,
cunming.liang, xieyongji, echaudro, eperezma, amorenoz, lulu
Cc: Maxime Coquelin
This patch adds initialization of the device and its
virtqueues once the Virtio driver has set DRIVER_OK
in the Virtio status register.
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
---
lib/vhost/vduse.c | 126 ++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 126 insertions(+)
diff --git a/lib/vhost/vduse.c b/lib/vhost/vduse.c
index 110654ec68..a10dc24d38 100644
--- a/lib/vhost/vduse.c
+++ b/lib/vhost/vduse.c
@@ -141,6 +141,128 @@ static struct vhost_backend_ops vduse_backend_ops = {
.inject_irq = vduse_inject_irq,
};
+static void
+vduse_vring_setup(struct virtio_net *dev, unsigned int index)
+{
+ struct vhost_virtqueue *vq = dev->virtqueue[index];
+ struct vhost_vring_addr *ra = &vq->ring_addrs;
+ struct vduse_vq_info vq_info;
+ struct vduse_vq_eventfd vq_efd;
+ int ret;
+
+ vq_info.index = index;
+ ret = ioctl(dev->vduse_dev_fd, VDUSE_VQ_GET_INFO, &vq_info);
+ if (ret) {
+ VHOST_LOG_CONFIG(dev->ifname, ERR, "Failed to get VQ %u info: %s\n",
+ index, strerror(errno));
+ return;
+ }
+
+ VHOST_LOG_CONFIG(dev->ifname, INFO, "VQ %u info:\n", index);
+ VHOST_LOG_CONFIG(dev->ifname, INFO, "\tnum: %u\n", vq_info.num);
+ VHOST_LOG_CONFIG(dev->ifname, INFO, "\tdesc_addr: %llx\n", vq_info.desc_addr);
+ VHOST_LOG_CONFIG(dev->ifname, INFO, "\tdriver_addr: %llx\n", vq_info.driver_addr);
+ VHOST_LOG_CONFIG(dev->ifname, INFO, "\tdevice_addr: %llx\n", vq_info.device_addr);
+ VHOST_LOG_CONFIG(dev->ifname, INFO, "\tavail_idx: %u\n", vq_info.split.avail_index);
+ VHOST_LOG_CONFIG(dev->ifname, INFO, "\tready: %u\n", vq_info.ready);
+
+ vq->last_avail_idx = vq_info.split.avail_index;
+ vq->size = vq_info.num;
+ vq->ready = true;
+ vq->enabled = vq_info.ready;
+ ra->desc_user_addr = vq_info.desc_addr;
+ ra->avail_user_addr = vq_info.driver_addr;
+ ra->used_user_addr = vq_info.device_addr;
+
+ vq->kickfd = eventfd(0, EFD_NONBLOCK | EFD_CLOEXEC);
+ if (vq->kickfd < 0) {
+ VHOST_LOG_CONFIG(dev->ifname, ERR, "Failed to init kickfd for VQ %u: %s\n",
+ index, strerror(errno));
+ vq->kickfd = VIRTIO_INVALID_EVENTFD;
+ return;
+ }
+ VHOST_LOG_CONFIG(dev->ifname, INFO, "\tkick fd: %d\n", vq->kickfd);
+
+ vq->shadow_used_split = rte_malloc_socket(NULL,
+ vq->size * sizeof(struct vring_used_elem),
+ RTE_CACHE_LINE_SIZE, 0);
+ vq->batch_copy_elems = rte_malloc_socket(NULL,
+ vq->size * sizeof(struct batch_copy_elem),
+ RTE_CACHE_LINE_SIZE, 0);
+
+ vhost_user_iotlb_rd_lock(vq);
+ if (vring_translate(dev, vq))
+ VHOST_LOG_CONFIG(dev->ifname, ERR, "Failed to translate vring %d addresses\n",
+ index);
+
+ if (vhost_enable_guest_notification(dev, vq, 0))
+ VHOST_LOG_CONFIG(dev->ifname, ERR,
+ "Failed to disable guest notifications on vring %d\n",
+ index);
+ vhost_user_iotlb_rd_unlock(vq);
+
+ vq_efd.index = index;
+ vq_efd.fd = vq->kickfd;
+
+ ret = ioctl(dev->vduse_dev_fd, VDUSE_VQ_SETUP_KICKFD, &vq_efd);
+ if (ret) {
+ VHOST_LOG_CONFIG(dev->ifname, ERR, "Failed to setup kickfd for VQ %u: %s\n",
+ index, strerror(errno));
+ close(vq->kickfd);
+ vq->kickfd = VIRTIO_UNINITIALIZED_EVENTFD;
+ return;
+ }
+}
+
+static void
+vduse_device_start(struct virtio_net *dev)
+{
+ unsigned int i, ret;
+
+ VHOST_LOG_CONFIG(dev->ifname, INFO, "Starting device...\n");
+
+ dev->notify_ops = vhost_driver_callback_get(dev->ifname);
+ if (!dev->notify_ops) {
+ VHOST_LOG_CONFIG(dev->ifname, ERR,
+ "Failed to get callback ops for driver\n");
+ return;
+ }
+
+ ret = ioctl(dev->vduse_dev_fd, VDUSE_DEV_GET_FEATURES, &dev->features);
+ if (ret) {
+ VHOST_LOG_CONFIG(dev->ifname, ERR, "Failed to get features: %s\n",
+ strerror(errno));
+ return;
+ }
+
+ VHOST_LOG_CONFIG(dev->ifname, INFO, "Negotiated Virtio features: 0x%" PRIx64 "\n",
+ dev->features);
+
+ if (dev->features &
+ ((1ULL << VIRTIO_NET_F_MRG_RXBUF) |
+ (1ULL << VIRTIO_F_VERSION_1) |
+ (1ULL << VIRTIO_F_RING_PACKED))) {
+ dev->vhost_hlen = sizeof(struct virtio_net_hdr_mrg_rxbuf);
+ } else {
+ dev->vhost_hlen = sizeof(struct virtio_net_hdr);
+ }
+
+ for (i = 0; i < dev->nr_vring; i++)
+ vduse_vring_setup(dev, i);
+
+ dev->flags |= VIRTIO_DEV_READY;
+
+ if (dev->notify_ops->new_device(dev->vid) == 0)
+ dev->flags |= VIRTIO_DEV_RUNNING;
+
+ for (i = 0; i < dev->nr_vring; i++) {
+ struct vhost_virtqueue *vq = dev->virtqueue[i];
+
+ if (dev->notify_ops->vring_state_changed)
+ dev->notify_ops->vring_state_changed(dev->vid, i, vq->enabled);
+ }
+}
+
static void
vduse_events_handler(int fd, void *arg, int *remove __rte_unused)
{
@@ -177,6 +299,10 @@ vduse_events_handler(int fd, void *arg, int *remove __rte_unused)
VHOST_LOG_CONFIG(dev->ifname, INFO, "\tnew status: 0x%08x\n",
req.s.status);
dev->status = req.s.status;
+
+ if (dev->status & VIRTIO_DEVICE_STATUS_DRIVER_OK)
+ vduse_device_start(dev);
+
resp.result = VDUSE_REQ_RESULT_OK;
break;
case VDUSE_UPDATE_IOTLB:
--
2.40.1
^ permalink raw reply [flat|nested] 30+ messages in thread
* [PATCH v4 25/26] vhost: add multiqueue support to VDUSE
2023-06-01 20:07 [PATCH v4 00/26] Add VDUSE support to Vhost library Maxime Coquelin
` (23 preceding siblings ...)
2023-06-01 20:08 ` [PATCH v4 24/26] vhost: add VDUSE device startup Maxime Coquelin
@ 2023-06-01 20:08 ` Maxime Coquelin
2023-06-01 20:08 ` [PATCH v4 26/26] vhost: add VDUSE device stop Maxime Coquelin
25 siblings, 0 replies; 30+ messages in thread
From: Maxime Coquelin @ 2023-06-01 20:08 UTC (permalink / raw)
To: dev, chenbo.xia, david.marchand, mkp, fbl, jasowang,
cunming.liang, xieyongji, echaudro, eperezma, amorenoz, lulu
Cc: Maxime Coquelin
This patch enables control queue support, which is
required to support multiqueue.
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
---
lib/vhost/vduse.c | 83 +++++++++++++++++++++++++++++++++++++++++++----
1 file changed, 76 insertions(+), 7 deletions(-)
diff --git a/lib/vhost/vduse.c b/lib/vhost/vduse.c
index a10dc24d38..699cfed9e3 100644
--- a/lib/vhost/vduse.c
+++ b/lib/vhost/vduse.c
@@ -21,6 +21,7 @@
#include "iotlb.h"
#include "vduse.h"
#include "vhost.h"
+#include "virtio_net_ctrl.h"
#define VHOST_VDUSE_API_VERSION 0
#define VDUSE_CTRL_PATH "/dev/vduse/control"
@@ -41,7 +42,9 @@
(1ULL << VIRTIO_NET_F_GUEST_ECN) | \
(1ULL << VIRTIO_RING_F_INDIRECT_DESC) | \
(1ULL << VIRTIO_F_IN_ORDER) | \
- (1ULL << VIRTIO_F_IOMMU_PLATFORM))
+ (1ULL << VIRTIO_F_IOMMU_PLATFORM) | \
+ (1ULL << VIRTIO_NET_F_CTRL_VQ) | \
+ (1ULL << VIRTIO_NET_F_MQ))
struct vduse {
struct fdset fdset;
@@ -141,6 +144,25 @@ static struct vhost_backend_ops vduse_backend_ops = {
.inject_irq = vduse_inject_irq,
};
+static void
+vduse_control_queue_event(int fd, void *arg, int *remove __rte_unused)
+{
+ struct virtio_net *dev = arg;
+ uint64_t buf;
+ int ret;
+
+ ret = read(fd, &buf, sizeof(buf));
+ if (ret < 0) {
+ VHOST_LOG_CONFIG(dev->ifname, ERR, "Failed to read control queue event: %s\n",
+ strerror(errno));
+ return;
+ }
+
+ VHOST_LOG_CONFIG(dev->ifname, DEBUG, "Control queue kicked\n");
+ if (virtio_net_ctrl_handle(dev))
+ VHOST_LOG_CONFIG(dev->ifname, ERR, "Failed to handle ctrl request\n");
+}
+
static void
vduse_vring_setup(struct virtio_net *dev, unsigned int index)
{
@@ -212,6 +234,22 @@ vduse_vring_setup(struct virtio_net *dev, unsigned int index)
vq->kickfd = VIRTIO_UNINITIALIZED_EVENTFD;
return;
}
+
+ if (vq == dev->cvq) {
+ ret = fdset_add(&vduse.fdset, vq->kickfd, vduse_control_queue_event, NULL, dev);
+ if (ret) {
+ VHOST_LOG_CONFIG(dev->ifname, ERR,
+ "Failed to setup kickfd handler for VQ %u: %s\n",
+ index, strerror(errno));
+ vq_efd.fd = VDUSE_EVENTFD_DEASSIGN;
+ ioctl(dev->vduse_dev_fd, VDUSE_VQ_SETUP_KICKFD, &vq_efd);
+ close(vq->kickfd);
+ vq->kickfd = VIRTIO_UNINITIALIZED_EVENTFD;
+ }
+ fdset_pipe_notify(&vduse.fdset);
+ vhost_enable_guest_notification(dev, vq, 1);
+ VHOST_LOG_CONFIG(dev->ifname, INFO, "Ctrl queue event handler installed\n");
+ }
}
static void
@@ -258,6 +296,9 @@ vduse_device_start(struct virtio_net *dev)
for (i = 0; i < dev->nr_vring; i++) {
struct vhost_virtqueue *vq = dev->virtqueue[i];
+ if (vq == dev->cvq)
+ continue;
+
if (dev->notify_ops->vring_state_changed)
dev->notify_ops->vring_state_changed(dev->vid, i, vq->enabled);
}
@@ -334,9 +375,11 @@ vduse_device_create(const char *path)
{
int control_fd, dev_fd, vid, ret;
pthread_t fdset_tid;
- uint32_t i;
+ uint32_t i, max_queue_pairs, total_queues;
struct virtio_net *dev;
+ struct virtio_net_config vnet_config = { 0 };
uint64_t ver = VHOST_VDUSE_API_VERSION;
+ uint64_t features = VDUSE_NET_SUPPORTED_FEATURES;
struct vduse_dev_config *dev_config = NULL;
const char *name = path + strlen("/dev/vduse/");
@@ -376,22 +419,39 @@ vduse_device_create(const char *path)
goto out_ctrl_close;
}
- dev_config = malloc(offsetof(struct vduse_dev_config, config));
+ dev_config = malloc(offsetof(struct vduse_dev_config, config) +
+ sizeof(vnet_config));
if (!dev_config) {
VHOST_LOG_CONFIG(name, ERR, "Failed to allocate VDUSE config\n");
ret = -1;
goto out_ctrl_close;
}
+ ret = rte_vhost_driver_get_queue_num(path, &max_queue_pairs);
+ if (ret < 0) {
+ VHOST_LOG_CONFIG(name, ERR, "Failed to get max queue pairs\n");
+ goto out_free;
+ }
+
+ VHOST_LOG_CONFIG(path, INFO, "VDUSE max queue pairs: %u\n", max_queue_pairs);
+ total_queues = max_queue_pairs * 2;
+
+ if (max_queue_pairs == 1)
+ features &= ~(RTE_BIT64(VIRTIO_NET_F_CTRL_VQ) | RTE_BIT64(VIRTIO_NET_F_MQ));
+ else
+ total_queues += 1; /* Includes ctrl queue */
+
+ vnet_config.max_virtqueue_pairs = max_queue_pairs;
memset(dev_config, 0, sizeof(struct vduse_dev_config));
strncpy(dev_config->name, name, VDUSE_NAME_MAX - 1);
dev_config->device_id = VIRTIO_ID_NET;
dev_config->vendor_id = 0;
- dev_config->features = VDUSE_NET_SUPPORTED_FEATURES;
- dev_config->vq_num = 2;
+ dev_config->features = features;
+ dev_config->vq_num = total_queues;
dev_config->vq_align = sysconf(_SC_PAGE_SIZE);
- dev_config->config_size = 0;
+ dev_config->config_size = sizeof(struct virtio_net_config);
+ memcpy(dev_config->config, &vnet_config, sizeof(vnet_config));
ret = ioctl(control_fd, VDUSE_CREATE_DEV, dev_config);
if (ret < 0) {
@@ -433,7 +493,7 @@ vduse_device_create(const char *path)
dev->vduse_dev_fd = dev_fd;
vhost_setup_virtio_net(dev->vid, true, true, true, true);
- for (i = 0; i < 2; i++) {
+ for (i = 0; i < total_queues; i++) {
struct vduse_vq_config vq_cfg = { 0 };
ret = alloc_vring_queue(dev, i);
@@ -452,6 +512,8 @@ vduse_device_create(const char *path)
}
}
+ dev->cvq = dev->virtqueue[max_queue_pairs * 2];
+
ret = fdset_add(&vduse.fdset, dev->vduse_dev_fd, vduse_events_handler, NULL, dev);
if (ret) {
VHOST_LOG_CONFIG(name, ERR, "Failed to add fd %d to vduse fdset\n",
@@ -498,6 +560,13 @@ vduse_device_destroy(const char *path)
if (vid == RTE_MAX_VHOST_DEVICE)
return -1;
+ if (dev->cvq && dev->cvq->kickfd >= 0) {
+ fdset_del(&vduse.fdset, dev->cvq->kickfd);
+ fdset_pipe_notify(&vduse.fdset);
+ close(dev->cvq->kickfd);
+ dev->cvq->kickfd = VIRTIO_UNINITIALIZED_EVENTFD;
+ }
+
fdset_del(&vduse.fdset, dev->vduse_dev_fd);
fdset_pipe_notify(&vduse.fdset);
--
2.40.1
^ permalink raw reply [flat|nested] 30+ messages in thread
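The queue accounting introduced by this patch can be summarized as a small standalone helper: two virtqueues per queue pair, plus one control queue when more than one pair is exposed, with the control-queue and multiqueue feature bits cleared in the single-pair case. This is a minimal sketch, not the patch's code; the feature-bit macros below are illustrative stand-ins for the Linux virtio-net definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative stand-ins for the virtio-net feature bits used by the patch;
 * the real values come from the Linux virtio_net.h header. */
#define F_CTRL_VQ (1ULL << 17) /* VIRTIO_NET_F_CTRL_VQ */
#define F_MQ      (1ULL << 22) /* VIRTIO_NET_F_MQ */

/*
 * Pre-allocation count for a VDUSE net device: one RX and one TX queue per
 * pair, plus one control queue when more than one pair is exposed. With a
 * single pair, the control-queue and multiqueue feature bits are cleared.
 */
static uint32_t
vduse_total_queues(uint32_t max_queue_pairs, uint64_t *features)
{
	uint32_t total_queues = max_queue_pairs * 2;

	if (max_queue_pairs == 1)
		*features &= ~(F_CTRL_VQ | F_MQ);
	else
		total_queues += 1; /* control queue */

	return total_queues;
}
```

With this layout, the control queue is always the last virtqueue, which is why the patch indexes it as `dev->virtqueue[max_queue_pairs * 2]`.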
* [PATCH v4 26/26] vhost: add VDUSE device stop
2023-06-01 20:07 [PATCH v4 00/26] Add VDUSE support to Vhost library Maxime Coquelin
` (24 preceding siblings ...)
2023-06-01 20:08 ` [PATCH v4 25/26] vhost: add multiqueue support to VDUSE Maxime Coquelin
@ 2023-06-01 20:08 ` Maxime Coquelin
2023-06-05 7:56 ` Xia, Chenbo
25 siblings, 1 reply; 30+ messages in thread
From: Maxime Coquelin @ 2023-06-01 20:08 UTC (permalink / raw)
To: dev, chenbo.xia, david.marchand, mkp, fbl, jasowang,
cunming.liang, xieyongji, echaudro, eperezma, amorenoz, lulu
Cc: Maxime Coquelin
This patch adds VDUSE device stop and cleanup of its
virtqueues.
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
---
doc/guides/rel_notes/release_23_07.rst | 6 +++
lib/vhost/vduse.c | 72 +++++++++++++++++++++++---
2 files changed, 70 insertions(+), 8 deletions(-)
diff --git a/doc/guides/rel_notes/release_23_07.rst b/doc/guides/rel_notes/release_23_07.rst
index 7034fb664c..6f43e8e633 100644
--- a/doc/guides/rel_notes/release_23_07.rst
+++ b/doc/guides/rel_notes/release_23_07.rst
@@ -67,6 +67,12 @@ New Features
Introduced ``rte_vhost_driver_set_max_queue_num()`` to be able to limit the
maximum number of supported queue pairs, required for VDUSE support.
+* **Added VDUSE support into Vhost library.**
+
+ VDUSE aims at implementing vDPA devices in userspace. It can be used as an
+ alternative to Vhost-user when using Vhost-vDPA, but also enables providing a
+ virtio-net netdev to the host when using the Virtio-vDPA driver.
+
Removed Items
-------------
diff --git a/lib/vhost/vduse.c b/lib/vhost/vduse.c
index 699cfed9e3..f421b1cf4c 100644
--- a/lib/vhost/vduse.c
+++ b/lib/vhost/vduse.c
@@ -252,6 +252,44 @@ vduse_vring_setup(struct virtio_net *dev, unsigned int index)
}
}
+static void
+vduse_vring_cleanup(struct virtio_net *dev, unsigned int index)
+{
+ struct vhost_virtqueue *vq = dev->virtqueue[index];
+ struct vduse_vq_eventfd vq_efd;
+ int ret;
+
+ if (vq == dev->cvq && vq->kickfd >= 0) {
+ fdset_del(&vduse.fdset, vq->kickfd);
+ fdset_pipe_notify(&vduse.fdset);
+ }
+
+ vq_efd.index = index;
+ vq_efd.fd = VDUSE_EVENTFD_DEASSIGN;
+
+ ret = ioctl(dev->vduse_dev_fd, VDUSE_VQ_SETUP_KICKFD, &vq_efd);
+ if (ret)
+ VHOST_LOG_CONFIG(dev->ifname, ERR, "Failed to cleanup kickfd for VQ %u: %s\n",
+ index, strerror(errno));
+
+ close(vq->kickfd);
+ vq->kickfd = VIRTIO_UNINITIALIZED_EVENTFD;
+
+ vring_invalidate(dev, vq);
+
+ rte_free(vq->batch_copy_elems);
+ vq->batch_copy_elems = NULL;
+
+ rte_free(vq->shadow_used_split);
+ vq->shadow_used_split = NULL;
+
+ vq->enabled = false;
+ vq->ready = false;
+ vq->size = 0;
+ vq->last_used_idx = 0;
+ vq->last_avail_idx = 0;
+}
+
static void
vduse_device_start(struct virtio_net *dev)
{
@@ -304,6 +342,23 @@ vduse_device_start(struct virtio_net *dev)
}
}
+static void
+vduse_device_stop(struct virtio_net *dev)
+{
+ unsigned int i;
+
+ VHOST_LOG_CONFIG(dev->ifname, INFO, "Stopping device...\n");
+
+ vhost_destroy_device_notify(dev);
+
+ dev->flags &= ~VIRTIO_DEV_READY;
+
+ for (i = 0; i < dev->nr_vring; i++)
+ vduse_vring_cleanup(dev, i);
+
+ vhost_user_iotlb_flush_all(dev);
+}
+
static void
vduse_events_handler(int fd, void *arg, int *remove __rte_unused)
{
@@ -311,6 +366,7 @@ vduse_events_handler(int fd, void *arg, int *remove __rte_unused)
struct vduse_dev_request req;
struct vduse_dev_response resp;
struct vhost_virtqueue *vq;
+ uint8_t old_status;
int ret;
memset(&resp, 0, sizeof(resp));
@@ -339,10 +395,15 @@ vduse_events_handler(int fd, void *arg, int *remove __rte_unused)
case VDUSE_SET_STATUS:
VHOST_LOG_CONFIG(dev->ifname, INFO, "\tnew status: 0x%08x\n",
req.s.status);
+ old_status = dev->status;
dev->status = req.s.status;
- if (dev->status & VIRTIO_DEVICE_STATUS_DRIVER_OK)
- vduse_device_start(dev);
+ if ((old_status ^ dev->status) & VIRTIO_DEVICE_STATUS_DRIVER_OK) {
+ if (dev->status & VIRTIO_DEVICE_STATUS_DRIVER_OK)
+ vduse_device_start(dev);
+ else
+ vduse_device_stop(dev);
+ }
resp.result = VDUSE_REQ_RESULT_OK;
break;
@@ -560,12 +621,7 @@ vduse_device_destroy(const char *path)
if (vid == RTE_MAX_VHOST_DEVICE)
return -1;
- if (dev->cvq && dev->cvq->kickfd >= 0) {
- fdset_del(&vduse.fdset, dev->cvq->kickfd);
- fdset_pipe_notify(&vduse.fdset);
- close(dev->cvq->kickfd);
- dev->cvq->kickfd = VIRTIO_UNINITIALIZED_EVENTFD;
- }
+ vduse_device_stop(dev);
fdset_del(&vduse.fdset, dev->vduse_dev_fd);
fdset_pipe_notify(&vduse.fdset);
--
2.40.1
^ permalink raw reply [flat|nested] 30+ messages in thread
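The status handling added to vduse_events_handler() starts the device only on a rising DRIVER_OK edge and stops it only on a falling one, using XOR to detect the transition. The following is a self-contained sketch of that edge detection, assuming a stand-in value for the virtio status constant rather than the real header definition.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define STATUS_DRIVER_OK 0x4 /* stand-in for VIRTIO_DEVICE_STATUS_DRIVER_OK */

/*
 * Report whether the DRIVER_OK bit toggled between two status values, i.e.
 * whether the device must be started or stopped. On a transition, *start is
 * set to true for a start (bit newly set) and false for a stop.
 */
static bool
driver_ok_toggled(uint8_t old_status, uint8_t new_status, bool *start)
{
	if (!((old_status ^ new_status) & STATUS_DRIVER_OK))
		return false; /* bit unchanged: neither start nor stop */

	*start = (new_status & STATUS_DRIVER_OK) != 0;
	return true;
}
```

Checking the XOR first avoids restarting an already-running device when the driver updates unrelated status bits while DRIVER_OK stays set.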
* RE: [PATCH v4 13/26] vhost: add API to set max queue pairs
2023-06-01 20:07 ` [PATCH v4 13/26] vhost: add API to set max queue pairs Maxime Coquelin
@ 2023-06-05 7:56 ` Xia, Chenbo
0 siblings, 0 replies; 30+ messages in thread
From: Xia, Chenbo @ 2023-06-05 7:56 UTC (permalink / raw)
To: Maxime Coquelin, dev, david.marchand, mkp, fbl, jasowang, Liang,
Cunming, Xie, Yongji, echaudro, eperezma, amorenoz, lulu
> -----Original Message-----
> From: Maxime Coquelin <maxime.coquelin@redhat.com>
> Sent: Friday, June 2, 2023 4:08 AM
> To: dev@dpdk.org; Xia, Chenbo <chenbo.xia@intel.com>;
> david.marchand@redhat.com; mkp@redhat.com; fbl@redhat.com;
> jasowang@redhat.com; Liang, Cunming <cunming.liang@intel.com>; Xie, Yongji
> <xieyongji@bytedance.com>; echaudro@redhat.com; eperezma@redhat.com;
> amorenoz@redhat.com; lulu@redhat.com
> Cc: Maxime Coquelin <maxime.coquelin@redhat.com>
> Subject: [PATCH v4 13/26] vhost: add API to set max queue pairs
>
> This patch introduces a new rte_vhost_driver_set_max_queues
> API as preliminary work for multiqueue support with VDUSE.
>
> Indeed, with VDUSE we need to pre-allocate the vrings at
> device creation time, so we need such API not to allocate
> the 128 queue pairs supported by the Vhost library.
>
> Calling the API is optional, 128 queue pairs remaining the
> default.
>
> Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
> ---
> doc/guides/prog_guide/vhost_lib.rst | 4 +++
> doc/guides/rel_notes/release_23_07.rst | 5 ++++
> lib/vhost/rte_vhost.h | 17 ++++++++++++
> lib/vhost/socket.c | 36 ++++++++++++++++++++++++--
> lib/vhost/version.map | 1 +
> 5 files changed, 61 insertions(+), 2 deletions(-)
>
> diff --git a/doc/guides/prog_guide/vhost_lib.rst
> b/doc/guides/prog_guide/vhost_lib.rst
> index 0f964d7a4a..0c2b4d020a 100644
> --- a/doc/guides/prog_guide/vhost_lib.rst
> +++ b/doc/guides/prog_guide/vhost_lib.rst
> @@ -339,6 +339,10 @@ The following is an overview of some key Vhost API
> functions:
> Inject the offloaded interrupt received by the 'guest_notify' callback,
> into the vhost device's queue.
>
> +* ``rte_vhost_driver_set_max_queue_num(const char *path, uint32_t
> max_queue_pairs)``
> +
> + Set the maximum number of queue pairs supported by the device.
> +
> Vhost-user Implementations
> --------------------------
>
> diff --git a/doc/guides/rel_notes/release_23_07.rst
> b/doc/guides/rel_notes/release_23_07.rst
> index 3eed8ac561..7034fb664c 100644
> --- a/doc/guides/rel_notes/release_23_07.rst
> +++ b/doc/guides/rel_notes/release_23_07.rst
> @@ -62,6 +62,11 @@ New Features
> rte_vhost_notify_guest(), is added to raise the interrupt outside of
> the
> fast path.
>
> +* **Added Vhost API to set maximum queue pairs supported.**
> +
> + Introduced ``rte_vhost_driver_set_max_queue_num()`` to be able to limit
> the
> + maximum number of supported queue pairs, required for VDUSE support.
> +
>
> Removed Items
> -------------
> diff --git a/lib/vhost/rte_vhost.h b/lib/vhost/rte_vhost.h
> index 7a10bc36cf..7844c9f142 100644
> --- a/lib/vhost/rte_vhost.h
> +++ b/lib/vhost/rte_vhost.h
> @@ -609,6 +609,23 @@ rte_vhost_driver_get_protocol_features(const char
> *path,
> int
> rte_vhost_driver_get_queue_num(const char *path, uint32_t *queue_num);
>
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change, or be removed, without prior
> notice.
> + *
> + * Set the maximum number of queue pairs supported by the device.
> + *
> + * @param path
> + * The vhost-user socket file path
> + * @param max_queue_pairs
> + * The maximum number of queue pairs
> + * @return
> + * 0 on success, -1 on failure
> + */
> +__rte_experimental
> +int
> +rte_vhost_driver_set_max_queue_num(const char *path, uint32_t
> max_queue_pairs);
> +
> /**
> * Get the feature bits after negotiation
> *
> diff --git a/lib/vhost/socket.c b/lib/vhost/socket.c
> index 407d0011c3..29f7a8cece 100644
> --- a/lib/vhost/socket.c
> +++ b/lib/vhost/socket.c
> @@ -57,6 +57,8 @@ struct vhost_user_socket {
>
> uint64_t protocol_features;
>
> + uint32_t max_queue_pairs;
> +
> struct rte_vdpa_device *vdpa_dev;
>
> struct rte_vhost_device_ops const *notify_ops;
> @@ -823,7 +825,7 @@ rte_vhost_driver_get_queue_num(const char *path,
> uint32_t *queue_num)
>
> vdpa_dev = vsocket->vdpa_dev;
> if (!vdpa_dev) {
> - *queue_num = VHOST_MAX_QUEUE_PAIRS;
> + *queue_num = vsocket->max_queue_pairs;
> goto unlock_exit;
> }
>
> @@ -833,7 +835,36 @@ rte_vhost_driver_get_queue_num(const char *path,
> uint32_t *queue_num)
> goto unlock_exit;
> }
>
> - *queue_num = RTE_MIN((uint32_t)VHOST_MAX_QUEUE_PAIRS,
> vdpa_queue_num);
> + *queue_num = RTE_MIN(vsocket->max_queue_pairs, vdpa_queue_num);
> +
> +unlock_exit:
> + pthread_mutex_unlock(&vhost_user.mutex);
> + return ret;
> +}
> +
> +int
> +rte_vhost_driver_set_max_queue_num(const char *path, uint32_t
> max_queue_pairs)
> +{
> + struct vhost_user_socket *vsocket;
> + int ret = 0;
> +
> + VHOST_LOG_CONFIG(path, INFO, "Setting max queue pairs to %u\n",
> max_queue_pairs);
> +
> + if (max_queue_pairs > VHOST_MAX_QUEUE_PAIRS) {
> + VHOST_LOG_CONFIG(path, ERR, "Library only supports up to %u
> queue pairs\n",
> + VHOST_MAX_QUEUE_PAIRS);
> + return -1;
> + }
> +
> + pthread_mutex_lock(&vhost_user.mutex);
> + vsocket = find_vhost_user_socket(path);
> + if (!vsocket) {
> + VHOST_LOG_CONFIG(path, ERR, "socket file is not registered
> yet.\n");
> + ret = -1;
> + goto unlock_exit;
> + }
> +
> + vsocket->max_queue_pairs = max_queue_pairs;
>
> unlock_exit:
> pthread_mutex_unlock(&vhost_user.mutex);
> @@ -889,6 +920,7 @@ rte_vhost_driver_register(const char *path, uint64_t
> flags)
> goto out_free;
> }
> vsocket->vdpa_dev = NULL;
> + vsocket->max_queue_pairs = VHOST_MAX_QUEUE_PAIRS;
> vsocket->extbuf = flags & RTE_VHOST_USER_EXTBUF_SUPPORT;
> vsocket->linearbuf = flags & RTE_VHOST_USER_LINEARBUF_SUPPORT;
> vsocket->async_copy = flags & RTE_VHOST_USER_ASYNC_COPY;
> diff --git a/lib/vhost/version.map b/lib/vhost/version.map
> index 7bcbfd12cf..5051c29022 100644
> --- a/lib/vhost/version.map
> +++ b/lib/vhost/version.map
> @@ -107,6 +107,7 @@ EXPERIMENTAL {
>
> # added in 23.07
> rte_vhost_notify_guest;
> + rte_vhost_driver_set_max_queue_num;
> };
>
> INTERNAL {
> --
> 2.40.1
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
^ permalink raw reply [flat|nested] 30+ messages in thread
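The validation performed by the quoted rte_vhost_driver_set_max_queue_num() can be sketched independently of the Vhost socket plumbing. This is an illustrative reduction, not the library's code: the constant below stands in for VHOST_MAX_QUEUE_PAIRS and the per-socket field is represented by a plain pointer.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_QUEUE_PAIRS 128 /* stand-in for VHOST_MAX_QUEUE_PAIRS */

/*
 * Mirror of the validation done by the new API: a request above the
 * library's compile-time maximum is rejected with -1, otherwise the
 * per-socket limit is updated and 0 is returned.
 */
static int
set_max_queue_pairs(uint32_t *limit, uint32_t max_queue_pairs)
{
	if (max_queue_pairs > MAX_QUEUE_PAIRS)
		return -1;

	*limit = max_queue_pairs;
	return 0;
}
```

Since the limit defaults to the maximum at socket registration, calling the API is optional, as the cover letter notes.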
* RE: [PATCH v4 26/26] vhost: add VDUSE device stop
2023-06-01 20:08 ` [PATCH v4 26/26] vhost: add VDUSE device stop Maxime Coquelin
@ 2023-06-05 7:56 ` Xia, Chenbo
2023-06-06 8:14 ` Maxime Coquelin
0 siblings, 1 reply; 30+ messages in thread
From: Xia, Chenbo @ 2023-06-05 7:56 UTC (permalink / raw)
To: Maxime Coquelin, dev, david.marchand, mkp, fbl, jasowang, Liang,
Cunming, Xie, Yongji, echaudro, eperezma, amorenoz, lulu
> -----Original Message-----
> From: Maxime Coquelin <maxime.coquelin@redhat.com>
> Sent: Friday, June 2, 2023 4:08 AM
> To: dev@dpdk.org; Xia, Chenbo <chenbo.xia@intel.com>;
> david.marchand@redhat.com; mkp@redhat.com; fbl@redhat.com;
> jasowang@redhat.com; Liang, Cunming <cunming.liang@intel.com>; Xie, Yongji
> <xieyongji@bytedance.com>; echaudro@redhat.com; eperezma@redhat.com;
> amorenoz@redhat.com; lulu@redhat.com
> Cc: Maxime Coquelin <maxime.coquelin@redhat.com>
> Subject: [PATCH v4 26/26] vhost: add VDUSE device stop
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH v4 26/26] vhost: add VDUSE device stop
2023-06-05 7:56 ` Xia, Chenbo
@ 2023-06-06 8:14 ` Maxime Coquelin
0 siblings, 0 replies; 30+ messages in thread
From: Maxime Coquelin @ 2023-06-06 8:14 UTC (permalink / raw)
To: Xia, Chenbo, dev, david.marchand, mkp, fbl, jasowang, Liang,
Cunming, Xie, Yongji, echaudro, eperezma, amorenoz, lulu
On 6/5/23 09:56, Xia, Chenbo wrote:
> Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
>
Thanks Chenbo, I just fixed an issue found when testing with OVS that impacts
both this patch and the one adding VDUSE device startup.
So your R-by will be removed from both these patches in v5.
Maxime
^ permalink raw reply [flat|nested] 30+ messages in thread
end of thread, other threads:[~2023-06-06 8:15 UTC | newest]
Thread overview: 30+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-06-01 20:07 [PATCH v4 00/26] Add VDUSE support to Vhost library Maxime Coquelin
2023-06-01 20:07 ` [PATCH v4 01/26] vhost: fix IOTLB entries overlap check with previous entry Maxime Coquelin
2023-06-01 20:07 ` [PATCH v4 02/26] vhost: add helper of IOTLB entries coredump Maxime Coquelin
2023-06-01 20:07 ` [PATCH v4 03/26] vhost: add helper for IOTLB entries shared page check Maxime Coquelin
2023-06-01 20:07 ` [PATCH v4 04/26] vhost: don't dump unneeded pages with IOTLB Maxime Coquelin
2023-06-01 20:07 ` [PATCH v4 05/26] vhost: change to single IOTLB cache per device Maxime Coquelin
2023-06-01 20:07 ` [PATCH v4 06/26] vhost: add offset field to IOTLB entries Maxime Coquelin
2023-06-01 20:07 ` [PATCH v4 07/26] vhost: add page size info to IOTLB entry Maxime Coquelin
2023-06-01 20:07 ` [PATCH v4 08/26] vhost: retry translating IOVA after IOTLB miss Maxime Coquelin
2023-06-01 20:07 ` [PATCH v4 09/26] vhost: introduce backend ops Maxime Coquelin
2023-06-01 20:07 ` [PATCH v4 10/26] vhost: add IOTLB cache entry removal callback Maxime Coquelin
2023-06-01 20:07 ` [PATCH v4 11/26] vhost: add helper for IOTLB misses Maxime Coquelin
2023-06-01 20:07 ` [PATCH v4 12/26] vhost: add helper for interrupt injection Maxime Coquelin
2023-06-01 20:07 ` [PATCH v4 13/26] vhost: add API to set max queue pairs Maxime Coquelin
2023-06-05 7:56 ` Xia, Chenbo
2023-06-01 20:08 ` [PATCH v4 14/26] net/vhost: use " Maxime Coquelin
2023-06-01 20:08 ` [PATCH v4 15/26] vhost: add control virtqueue support Maxime Coquelin
2023-06-01 20:08 ` [PATCH v4 16/26] vhost: add VDUSE device creation and destruction Maxime Coquelin
2023-06-01 20:08 ` [PATCH v4 17/26] vhost: add VDUSE callback for IOTLB miss Maxime Coquelin
2023-06-01 20:08 ` [PATCH v4 18/26] vhost: add VDUSE callback for IOTLB entry removal Maxime Coquelin
2023-06-01 20:08 ` [PATCH v4 19/26] vhost: add VDUSE callback for IRQ injection Maxime Coquelin
2023-06-01 20:08 ` [PATCH v4 20/26] vhost: add VDUSE events handler Maxime Coquelin
2023-06-01 20:08 ` [PATCH v4 21/26] vhost: add support for virtqueue state get event Maxime Coquelin
2023-06-01 20:08 ` [PATCH v4 22/26] vhost: add support for VDUSE status set event Maxime Coquelin
2023-06-01 20:08 ` [PATCH v4 23/26] vhost: add support for VDUSE IOTLB update event Maxime Coquelin
2023-06-01 20:08 ` [PATCH v4 24/26] vhost: add VDUSE device startup Maxime Coquelin
2023-06-01 20:08 ` [PATCH v4 25/26] vhost: add multiqueue support to VDUSE Maxime Coquelin
2023-06-01 20:08 ` [PATCH v4 26/26] vhost: add VDUSE device stop Maxime Coquelin
2023-06-05 7:56 ` Xia, Chenbo
2023-06-06 8:14 ` Maxime Coquelin