vDPA allows offloading Virtio datapath processing to supported NICs, such as IFCVF. The control path has to be handled by a dedicated vDPA driver, so that it can translate Vhost-user protocol requests into proprietary NIC register accesses. This driver is the vDPA driver for Virtio devices, meaning that Vhost-user protocol requests get translated into Virtio register accesses as defined in the Virtio spec. It can be used either within a guest with a para-virtualized Virtio-net device, or directly on the host with a full Virtio HW offload NIC. Among the main features, all currently supported Virtio spec versions are handled (split & packed rings, though only split ring has been tested so far), and multiqueue support is added by implementing the control virtqueue in the driver. The structure of this driver is heavily based on the IFCVF vDPA driver. Maxime Coquelin (15): vhost: remove vhost kernel header inclusion vhost: configure vDPA as soon as the device is ready net/virtio: move control path functions to virtqueue file net/virtio: add virtio PCI subsystem device ID declaration net/virtio: save notify bar ID in virtio HW struct net/virtio: add skeleton for virtio vDPA driver net/virtio: add vDPA ops to get number of queues net/virtio: add virtio vDPA op to get features net/virtio: add virtio vDPA op to get protocol features net/virtio: add vDPA op to configure and start the device net/virtio: add vDPA op to stop and close the device net/virtio: add vDPA op to set features net/virtio: add vDPA ops to get VFIO FDs net/virtio: add vDPA op to get notification area doc: add documentation for Virtio vDPA driver config/common_linux | 1 + doc/guides/nics/index.rst | 1 + doc/guides/nics/virtio_vdpa.rst | 45 ++ drivers/net/ifc/ifcvf_vdpa.c | 1 + drivers/net/virtio/Makefile | 4 + drivers/net/virtio/meson.build | 3 +- drivers/net/virtio/virtio_ethdev.c | 252 -------- drivers/net/virtio/virtio_pci.c | 6 +- drivers/net/virtio/virtio_pci.h | 2 + 
drivers/net/virtio/virtio_vdpa.c | 918 +++++++++++++++++++++++++++++ drivers/net/virtio/virtqueue.c | 255 ++++++++ drivers/net/virtio/virtqueue.h | 5 + lib/librte_vhost/rte_vdpa.h | 1 - lib/librte_vhost/rte_vhost.h | 9 +- lib/librte_vhost/vhost_user.c | 3 +- 15 files changed, 1243 insertions(+), 263 deletions(-) create mode 100644 doc/guides/nics/virtio_vdpa.rst create mode 100644 drivers/net/virtio/virtio_vdpa.c -- 2.21.0
This is preliminary rework for the virtio-vdpa driver, in order to avoid conflicts with Virtio PMD headers. Generally, I think it is better not to include kernel headers in RTE headers, especially in the case of Vhost and Virtio, which just re-use the kernel definitions and have no runtime dependency on the kernel. In order not to break the IFC driver build, the vhost kernel header is now included directly in that driver. Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com> --- drivers/net/ifc/ifcvf_vdpa.c | 1 + lib/librte_vhost/rte_vdpa.h | 1 - lib/librte_vhost/rte_vhost.h | 9 ++++----- 3 files changed, 5 insertions(+), 6 deletions(-) diff --git a/drivers/net/ifc/ifcvf_vdpa.c b/drivers/net/ifc/ifcvf_vdpa.c index 8de9ef199..40cb15ca8 100644 --- a/drivers/net/ifc/ifcvf_vdpa.c +++ b/drivers/net/ifc/ifcvf_vdpa.c @@ -7,6 +7,7 @@ #include <fcntl.h> #include <sys/ioctl.h> #include <sys/epoll.h> +#include <linux/vhost.h> #include <linux/virtio_net.h> #include <stdbool.h> diff --git a/lib/librte_vhost/rte_vdpa.h b/lib/librte_vhost/rte_vdpa.h index 9a3deb31d..69438210b 100644 --- a/lib/librte_vhost/rte_vdpa.h +++ b/lib/librte_vhost/rte_vdpa.h @@ -14,7 +14,6 @@ #include <stdbool.h> #include <rte_pci.h> -#include "rte_vhost.h" #define MAX_VDPA_NAME_LEN 128 diff --git a/lib/librte_vhost/rte_vhost.h b/lib/librte_vhost/rte_vhost.h index 7fb172912..62d3c3c36 100644 --- a/lib/librte_vhost/rte_vhost.h +++ b/lib/librte_vhost/rte_vhost.h @@ -20,11 +20,6 @@ extern "C" { #endif -/* These are not C++-aware. */ -#include <linux/vhost.h> -#include <linux/virtio_ring.h> -#include <linux/virtio_net.h> - #define RTE_VHOST_USER_CLIENT (1ULL << 0) #define RTE_VHOST_USER_NO_RECONNECT (1ULL << 1) #define RTE_VHOST_USER_DEQUEUE_ZERO_COPY (1ULL << 2) @@ -72,6 +67,10 @@ extern "C" { #define VHOST_USER_PROTOCOL_F_HOST_NOTIFIER 11 #endif +#ifndef VHOST_F_LOG_ALL +#define VHOST_F_LOG_ALL 26 +#endif + /** Indicate whether protocol features negotiation is supported. 
*/ #ifndef VHOST_USER_F_PROTOCOL_FEATURES #define VHOST_USER_F_PROTOCOL_FEATURES 30 -- 2.21.0
There might not be any VHOST_USER_SET_VRING_CALL request sent once the virtio device is ready. When that happens, the vDPA device's dev_conf() callback may never be called. Fixes: 9f9014512822 ("vhost: configure vDPA device after set vring call message") Cc: stable@dpdk.org Cc: xiaolong.ye@intel.com Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com> --- lib/librte_vhost/vhost_user.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/lib/librte_vhost/vhost_user.c b/lib/librte_vhost/vhost_user.c index 0b72648a5..b1ea80c52 100644 --- a/lib/librte_vhost/vhost_user.c +++ b/lib/librte_vhost/vhost_user.c @@ -2112,8 +2112,7 @@ vhost_user_msg_handler(int vid, int fd) did = dev->vdpa_dev_id; vdpa_dev = rte_vdpa_get_device(did); if (vdpa_dev && virtio_is_ready(dev) && - !(dev->flags & VIRTIO_DEV_VDPA_CONFIGURED) && - msg.request.master == VHOST_USER_SET_VRING_CALL) { + !(dev->flags & VIRTIO_DEV_VDPA_CONFIGURED)) { if (vdpa_dev->ops->dev_conf) vdpa_dev->ops->dev_conf(dev->vid); dev->flags |= VIRTIO_DEV_VDPA_CONFIGURED; -- 2.21.0
The Virtio-vdpa driver needs to implement the control path, so move the related functions to the virtqueue file so that they can be used by both the Virtio PMD and Virtio-vdpa drivers. Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com> --- drivers/net/virtio/virtio_ethdev.c | 252 ---------------------------- drivers/net/virtio/virtqueue.c | 255 +++++++++++++++++++++++++++++ drivers/net/virtio/virtqueue.h | 5 + 3 files changed, 260 insertions(+), 252 deletions(-) diff --git a/drivers/net/virtio/virtio_ethdev.c b/drivers/net/virtio/virtio_ethdev.c index f96588b9d..3682ee318 100644 --- a/drivers/net/virtio/virtio_ethdev.c +++ b/drivers/net/virtio/virtio_ethdev.c @@ -140,226 +140,6 @@ static const struct rte_virtio_xstats_name_off rte_virtio_txq_stat_strings[] = { struct virtio_hw_internal virtio_hw_internal[RTE_MAX_ETHPORTS]; -static struct virtio_pmd_ctrl * -virtio_send_command_packed(struct virtnet_ctl *cvq, - struct virtio_pmd_ctrl *ctrl, - int *dlen, int pkt_num) -{ - struct virtqueue *vq = cvq->vq; - int head; - struct vring_packed_desc *desc = vq->vq_packed.ring.desc; - struct virtio_pmd_ctrl *result; - uint16_t flags; - int sum = 0; - int nb_descs = 0; - int k; - - /* - * Format is enforced in qemu code: - * One TX packet for header; - * At least one TX packet per argument; - * One RX packet for ACK. 
- */ - head = vq->vq_avail_idx; - flags = vq->vq_packed.cached_flags; - desc[head].addr = cvq->virtio_net_hdr_mem; - desc[head].len = sizeof(struct virtio_net_ctrl_hdr); - vq->vq_free_cnt--; - nb_descs++; - if (++vq->vq_avail_idx >= vq->vq_nentries) { - vq->vq_avail_idx -= vq->vq_nentries; - vq->vq_packed.cached_flags ^= VRING_PACKED_DESC_F_AVAIL_USED; - } - - for (k = 0; k < pkt_num; k++) { - desc[vq->vq_avail_idx].addr = cvq->virtio_net_hdr_mem - + sizeof(struct virtio_net_ctrl_hdr) - + sizeof(ctrl->status) + sizeof(uint8_t) * sum; - desc[vq->vq_avail_idx].len = dlen[k]; - desc[vq->vq_avail_idx].flags = VRING_DESC_F_NEXT | - vq->vq_packed.cached_flags; - sum += dlen[k]; - vq->vq_free_cnt--; - nb_descs++; - if (++vq->vq_avail_idx >= vq->vq_nentries) { - vq->vq_avail_idx -= vq->vq_nentries; - vq->vq_packed.cached_flags ^= - VRING_PACKED_DESC_F_AVAIL_USED; - } - } - - desc[vq->vq_avail_idx].addr = cvq->virtio_net_hdr_mem - + sizeof(struct virtio_net_ctrl_hdr); - desc[vq->vq_avail_idx].len = sizeof(ctrl->status); - desc[vq->vq_avail_idx].flags = VRING_DESC_F_WRITE | - vq->vq_packed.cached_flags; - vq->vq_free_cnt--; - nb_descs++; - if (++vq->vq_avail_idx >= vq->vq_nentries) { - vq->vq_avail_idx -= vq->vq_nentries; - vq->vq_packed.cached_flags ^= VRING_PACKED_DESC_F_AVAIL_USED; - } - - virtio_wmb(vq->hw->weak_barriers); - desc[head].flags = VRING_DESC_F_NEXT | flags; - - virtio_wmb(vq->hw->weak_barriers); - virtqueue_notify(vq); - - /* wait for used descriptors in virtqueue */ - while (!desc_is_used(&desc[head], vq)) - usleep(100); - - virtio_rmb(vq->hw->weak_barriers); - - /* now get used descriptors */ - vq->vq_free_cnt += nb_descs; - vq->vq_used_cons_idx += nb_descs; - if (vq->vq_used_cons_idx >= vq->vq_nentries) { - vq->vq_used_cons_idx -= vq->vq_nentries; - vq->vq_packed.used_wrap_counter ^= 1; - } - - PMD_INIT_LOG(DEBUG, "vq->vq_free_cnt=%d\n" - "vq->vq_avail_idx=%d\n" - "vq->vq_used_cons_idx=%d\n" - "vq->vq_packed.cached_flags=0x%x\n" - 
"vq->vq_packed.used_wrap_counter=%d\n", - vq->vq_free_cnt, - vq->vq_avail_idx, - vq->vq_used_cons_idx, - vq->vq_packed.cached_flags, - vq->vq_packed.used_wrap_counter); - - result = cvq->virtio_net_hdr_mz->addr; - return result; -} - -static struct virtio_pmd_ctrl * -virtio_send_command_split(struct virtnet_ctl *cvq, - struct virtio_pmd_ctrl *ctrl, - int *dlen, int pkt_num) -{ - struct virtio_pmd_ctrl *result; - struct virtqueue *vq = cvq->vq; - uint32_t head, i; - int k, sum = 0; - - head = vq->vq_desc_head_idx; - - /* - * Format is enforced in qemu code: - * One TX packet for header; - * At least one TX packet per argument; - * One RX packet for ACK. - */ - vq->vq_split.ring.desc[head].flags = VRING_DESC_F_NEXT; - vq->vq_split.ring.desc[head].addr = cvq->virtio_net_hdr_mem; - vq->vq_split.ring.desc[head].len = sizeof(struct virtio_net_ctrl_hdr); - vq->vq_free_cnt--; - i = vq->vq_split.ring.desc[head].next; - - for (k = 0; k < pkt_num; k++) { - vq->vq_split.ring.desc[i].flags = VRING_DESC_F_NEXT; - vq->vq_split.ring.desc[i].addr = cvq->virtio_net_hdr_mem - + sizeof(struct virtio_net_ctrl_hdr) - + sizeof(ctrl->status) + sizeof(uint8_t)*sum; - vq->vq_split.ring.desc[i].len = dlen[k]; - sum += dlen[k]; - vq->vq_free_cnt--; - i = vq->vq_split.ring.desc[i].next; - } - - vq->vq_split.ring.desc[i].flags = VRING_DESC_F_WRITE; - vq->vq_split.ring.desc[i].addr = cvq->virtio_net_hdr_mem - + sizeof(struct virtio_net_ctrl_hdr); - vq->vq_split.ring.desc[i].len = sizeof(ctrl->status); - vq->vq_free_cnt--; - - vq->vq_desc_head_idx = vq->vq_split.ring.desc[i].next; - - vq_update_avail_ring(vq, head); - vq_update_avail_idx(vq); - - PMD_INIT_LOG(DEBUG, "vq->vq_queue_index = %d", vq->vq_queue_index); - - virtqueue_notify(vq); - - rte_rmb(); - while (VIRTQUEUE_NUSED(vq) == 0) { - rte_rmb(); - usleep(100); - } - - while (VIRTQUEUE_NUSED(vq)) { - uint32_t idx, desc_idx, used_idx; - struct vring_used_elem *uep; - - used_idx = (uint32_t)(vq->vq_used_cons_idx - & (vq->vq_nentries - 1)); - 
uep = &vq->vq_split.ring.used->ring[used_idx]; - idx = (uint32_t) uep->id; - desc_idx = idx; - - while (vq->vq_split.ring.desc[desc_idx].flags & - VRING_DESC_F_NEXT) { - desc_idx = vq->vq_split.ring.desc[desc_idx].next; - vq->vq_free_cnt++; - } - - vq->vq_split.ring.desc[desc_idx].next = vq->vq_desc_head_idx; - vq->vq_desc_head_idx = idx; - - vq->vq_used_cons_idx++; - vq->vq_free_cnt++; - } - - PMD_INIT_LOG(DEBUG, "vq->vq_free_cnt=%d\nvq->vq_desc_head_idx=%d", - vq->vq_free_cnt, vq->vq_desc_head_idx); - - result = cvq->virtio_net_hdr_mz->addr; - return result; -} - -static int -virtio_send_command(struct virtnet_ctl *cvq, struct virtio_pmd_ctrl *ctrl, - int *dlen, int pkt_num) -{ - virtio_net_ctrl_ack status = ~0; - struct virtio_pmd_ctrl *result; - struct virtqueue *vq; - - ctrl->status = status; - - if (!cvq || !cvq->vq) { - PMD_INIT_LOG(ERR, "Control queue is not supported."); - return -1; - } - - rte_spinlock_lock(&cvq->lock); - vq = cvq->vq; - - PMD_INIT_LOG(DEBUG, "vq->vq_desc_head_idx = %d, status = %d, " - "vq->hw->cvq = %p vq = %p", - vq->vq_desc_head_idx, status, vq->hw->cvq, vq); - - if (vq->vq_free_cnt < pkt_num + 2 || pkt_num < 1) { - rte_spinlock_unlock(&cvq->lock); - return -1; - } - - memcpy(cvq->virtio_net_hdr_mz->addr, ctrl, - sizeof(struct virtio_pmd_ctrl)); - - if (vtpci_packed_queue(vq->hw)) - result = virtio_send_command_packed(cvq, ctrl, dlen, pkt_num); - else - result = virtio_send_command_split(cvq, ctrl, dlen, pkt_num); - - rte_spinlock_unlock(&cvq->lock); - return result->status; -} - static int virtio_set_multiple_queues(struct rte_eth_dev *dev, uint16_t nb_queues) { @@ -401,38 +181,6 @@ virtio_get_nr_vq(struct virtio_hw *hw) return nr_vq; } -static void -virtio_init_vring(struct virtqueue *vq) -{ - int size = vq->vq_nentries; - uint8_t *ring_mem = vq->vq_ring_virt_mem; - - PMD_INIT_FUNC_TRACE(); - - memset(ring_mem, 0, vq->vq_ring_size); - - vq->vq_used_cons_idx = 0; - vq->vq_desc_head_idx = 0; - vq->vq_avail_idx = 0; - 
vq->vq_desc_tail_idx = (uint16_t)(vq->vq_nentries - 1); - vq->vq_free_cnt = vq->vq_nentries; - memset(vq->vq_descx, 0, sizeof(struct vq_desc_extra) * vq->vq_nentries); - if (vtpci_packed_queue(vq->hw)) { - vring_init_packed(&vq->vq_packed.ring, ring_mem, - VIRTIO_PCI_VRING_ALIGN, size); - vring_desc_init_packed(vq, size); - } else { - struct vring *vr = &vq->vq_split.ring; - - vring_init_split(vr, ring_mem, VIRTIO_PCI_VRING_ALIGN, size); - vring_desc_init_split(vr->desc, size); - } - /* - * Disable device(host) interrupting guest - */ - virtqueue_disable_intr(vq); -} - static int virtio_init_queue(struct rte_eth_dev *dev, uint16_t vtpci_queue_idx) { diff --git a/drivers/net/virtio/virtqueue.c b/drivers/net/virtio/virtqueue.c index 5ff1e3587..db630e07c 100644 --- a/drivers/net/virtio/virtqueue.c +++ b/drivers/net/virtio/virtqueue.c @@ -2,6 +2,7 @@ * Copyright(c) 2010-2015 Intel Corporation */ #include <stdint.h> +#include <unistd.h> #include <rte_mbuf.h> @@ -141,3 +142,257 @@ virtqueue_rxvq_flush(struct virtqueue *vq) else virtqueue_rxvq_flush_split(vq); } + +static struct virtio_pmd_ctrl * +virtio_send_command_packed(struct virtnet_ctl *cvq, + struct virtio_pmd_ctrl *ctrl, + int *dlen, int pkt_num) +{ + struct virtqueue *vq = cvq->vq; + int head; + struct vring_packed_desc *desc = vq->vq_packed.ring.desc; + struct virtio_pmd_ctrl *result; + uint16_t flags; + int sum = 0; + int nb_descs = 0; + int k; + + /* + * Format is enforced in qemu code: + * One TX packet for header; + * At least one TX packet per argument; + * One RX packet for ACK. 
+ */ + head = vq->vq_avail_idx; + flags = vq->vq_packed.cached_flags; + desc[head].addr = cvq->virtio_net_hdr_mem; + desc[head].len = sizeof(struct virtio_net_ctrl_hdr); + vq->vq_free_cnt--; + nb_descs++; + if (++vq->vq_avail_idx >= vq->vq_nentries) { + vq->vq_avail_idx -= vq->vq_nentries; + vq->vq_packed.cached_flags ^= VRING_PACKED_DESC_F_AVAIL_USED; + } + + for (k = 0; k < pkt_num; k++) { + desc[vq->vq_avail_idx].addr = cvq->virtio_net_hdr_mem + + sizeof(struct virtio_net_ctrl_hdr) + + sizeof(ctrl->status) + sizeof(uint8_t) * sum; + desc[vq->vq_avail_idx].len = dlen[k]; + desc[vq->vq_avail_idx].flags = VRING_DESC_F_NEXT | + vq->vq_packed.cached_flags; + sum += dlen[k]; + vq->vq_free_cnt--; + nb_descs++; + if (++vq->vq_avail_idx >= vq->vq_nentries) { + vq->vq_avail_idx -= vq->vq_nentries; + vq->vq_packed.cached_flags ^= + VRING_PACKED_DESC_F_AVAIL_USED; + } + } + + desc[vq->vq_avail_idx].addr = cvq->virtio_net_hdr_mem + + sizeof(struct virtio_net_ctrl_hdr); + desc[vq->vq_avail_idx].len = sizeof(ctrl->status); + desc[vq->vq_avail_idx].flags = VRING_DESC_F_WRITE | + vq->vq_packed.cached_flags; + vq->vq_free_cnt--; + nb_descs++; + if (++vq->vq_avail_idx >= vq->vq_nentries) { + vq->vq_avail_idx -= vq->vq_nentries; + vq->vq_packed.cached_flags ^= VRING_PACKED_DESC_F_AVAIL_USED; + } + + virtio_wmb(vq->hw->weak_barriers); + desc[head].flags = VRING_DESC_F_NEXT | flags; + + virtio_wmb(vq->hw->weak_barriers); + virtqueue_notify(vq); + + /* wait for used descriptors in virtqueue */ + while (!desc_is_used(&desc[head], vq)) + usleep(100); + + virtio_rmb(vq->hw->weak_barriers); + + /* now get used descriptors */ + vq->vq_free_cnt += nb_descs; + vq->vq_used_cons_idx += nb_descs; + if (vq->vq_used_cons_idx >= vq->vq_nentries) { + vq->vq_used_cons_idx -= vq->vq_nentries; + vq->vq_packed.used_wrap_counter ^= 1; + } + + PMD_INIT_LOG(DEBUG, "vq->vq_free_cnt=%d\n" + "vq->vq_avail_idx=%d\n" + "vq->vq_used_cons_idx=%d\n" + "vq->vq_packed.cached_flags=0x%x\n" + 
"vq->vq_packed.used_wrap_counter=%d\n", + vq->vq_free_cnt, + vq->vq_avail_idx, + vq->vq_used_cons_idx, + vq->vq_packed.cached_flags, + vq->vq_packed.used_wrap_counter); + + result = cvq->virtio_net_hdr_mz->addr; + return result; +} + +static struct virtio_pmd_ctrl * +virtio_send_command_split(struct virtnet_ctl *cvq, + struct virtio_pmd_ctrl *ctrl, + int *dlen, int pkt_num) +{ + struct virtio_pmd_ctrl *result; + struct virtqueue *vq = cvq->vq; + uint32_t head, i; + int k, sum = 0; + + head = vq->vq_desc_head_idx; + + /* + * Format is enforced in qemu code: + * One TX packet for header; + * At least one TX packet per argument; + * One RX packet for ACK. + */ + vq->vq_split.ring.desc[head].flags = VRING_DESC_F_NEXT; + vq->vq_split.ring.desc[head].addr = cvq->virtio_net_hdr_mem; + vq->vq_split.ring.desc[head].len = sizeof(struct virtio_net_ctrl_hdr); + vq->vq_free_cnt--; + i = vq->vq_split.ring.desc[head].next; + + for (k = 0; k < pkt_num; k++) { + vq->vq_split.ring.desc[i].flags = VRING_DESC_F_NEXT; + vq->vq_split.ring.desc[i].addr = cvq->virtio_net_hdr_mem + + sizeof(struct virtio_net_ctrl_hdr) + + sizeof(ctrl->status) + sizeof(uint8_t) * sum; + vq->vq_split.ring.desc[i].len = dlen[k]; + sum += dlen[k]; + vq->vq_free_cnt--; + i = vq->vq_split.ring.desc[i].next; + } + + vq->vq_split.ring.desc[i].flags = VRING_DESC_F_WRITE; + vq->vq_split.ring.desc[i].addr = cvq->virtio_net_hdr_mem + + sizeof(struct virtio_net_ctrl_hdr); + vq->vq_split.ring.desc[i].len = sizeof(ctrl->status); + vq->vq_free_cnt--; + + vq->vq_desc_head_idx = vq->vq_split.ring.desc[i].next; + + vq_update_avail_ring(vq, head); + vq_update_avail_idx(vq); + + PMD_INIT_LOG(DEBUG, "vq->vq_queue_index = %d", vq->vq_queue_index); + + virtqueue_notify(vq); + + rte_rmb(); + while (VIRTQUEUE_NUSED(vq) == 0) { + rte_rmb(); + usleep(100); + } + + while (VIRTQUEUE_NUSED(vq)) { + uint32_t idx, desc_idx, used_idx; + struct vring_used_elem *uep; + + used_idx = (uint32_t)(vq->vq_used_cons_idx + & (vq->vq_nentries - 1)); 
+ uep = &vq->vq_split.ring.used->ring[used_idx]; + idx = (uint32_t)uep->id; + desc_idx = idx; + + while (vq->vq_split.ring.desc[desc_idx].flags & + VRING_DESC_F_NEXT) { + desc_idx = vq->vq_split.ring.desc[desc_idx].next; + vq->vq_free_cnt++; + } + + vq->vq_split.ring.desc[desc_idx].next = vq->vq_desc_head_idx; + vq->vq_desc_head_idx = idx; + + vq->vq_used_cons_idx++; + vq->vq_free_cnt++; + } + + PMD_INIT_LOG(DEBUG, "vq->vq_free_cnt=%d\nvq->vq_desc_head_idx=%d", + vq->vq_free_cnt, vq->vq_desc_head_idx); + + result = cvq->virtio_net_hdr_mz->addr; + return result; +} + +int +virtio_send_command(struct virtnet_ctl *cvq, struct virtio_pmd_ctrl *ctrl, + int *dlen, int pkt_num) +{ + virtio_net_ctrl_ack status = ~0; + struct virtio_pmd_ctrl *result; + struct virtqueue *vq; + + ctrl->status = status; + + if (!cvq || !cvq->vq) { + PMD_INIT_LOG(ERR, "Control queue is not supported."); + return -1; + } + + rte_spinlock_lock(&cvq->lock); + vq = cvq->vq; + + PMD_INIT_LOG(DEBUG, "vq->vq_desc_head_idx = %d, status = %d, " + "vq->hw->cvq = %p vq = %p", + vq->vq_desc_head_idx, status, vq->hw->cvq, vq); + + if (vq->vq_free_cnt < pkt_num + 2 || pkt_num < 1) { + rte_spinlock_unlock(&cvq->lock); + return -1; + } + + memcpy(cvq->virtio_net_hdr_mz->addr, ctrl, + sizeof(struct virtio_pmd_ctrl)); + + if (vtpci_packed_queue(vq->hw)) + result = virtio_send_command_packed(cvq, ctrl, dlen, pkt_num); + else + result = virtio_send_command_split(cvq, ctrl, dlen, pkt_num); + + rte_spinlock_unlock(&cvq->lock); + return result->status; +} + +void +virtio_init_vring(struct virtqueue *vq) +{ + int size = vq->vq_nentries; + uint8_t *ring_mem = vq->vq_ring_virt_mem; + + PMD_INIT_FUNC_TRACE(); + + memset(ring_mem, 0, vq->vq_ring_size); + + vq->vq_used_cons_idx = 0; + vq->vq_desc_head_idx = 0; + vq->vq_avail_idx = 0; + vq->vq_desc_tail_idx = (uint16_t)(vq->vq_nentries - 1); + vq->vq_free_cnt = vq->vq_nentries; + if (vq->vq_descx) + memset(vq->vq_descx, 0, + sizeof(struct vq_desc_extra) * vq->vq_nentries); 
+ if (vtpci_packed_queue(vq->hw)) { + vring_init_packed(&vq->vq_packed.ring, ring_mem, + VIRTIO_PCI_VRING_ALIGN, size); + vring_desc_init_packed(vq, size); + } else { + struct vring *vr = &vq->vq_split.ring; + + vring_init_split(vr, ring_mem, VIRTIO_PCI_VRING_ALIGN, size); + vring_desc_init_split(vr->desc, size); + } + /* + * Disable device(host) interrupting guest + */ + virtqueue_disable_intr(vq); +} diff --git a/drivers/net/virtio/virtqueue.h b/drivers/net/virtio/virtqueue.h index c6dd4a347..4d8d069c7 100644 --- a/drivers/net/virtio/virtqueue.h +++ b/drivers/net/virtio/virtqueue.h @@ -480,6 +480,11 @@ virtqueue_notify(struct virtqueue *vq) VTPCI_OPS(vq->hw)->notify_queue(vq->hw, vq); } +int virtio_send_command(struct virtnet_ctl *cvq, struct virtio_pmd_ctrl *ctrl, + int *dlen, int pkt_num); + +void virtio_init_vring(struct virtqueue *vq); + #ifdef RTE_LIBRTE_VIRTIO_DEBUG_DUMP #define VIRTQUEUE_DUMP(vq) do { \ uint16_t used_idx, nused; \ -- 2.21.0
The Virtio PCI subsystem IDs need to be specified to prevent it from probing IFC vDPA VFs. Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com> --- drivers/net/virtio/virtio_pci.h | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/net/virtio/virtio_pci.h b/drivers/net/virtio/virtio_pci.h index a38cb45ad..56f89a454 100644 --- a/drivers/net/virtio/virtio_pci.h +++ b/drivers/net/virtio/virtio_pci.h @@ -19,6 +19,7 @@ struct virtnet_ctl; #define VIRTIO_PCI_VENDORID 0x1AF4 #define VIRTIO_PCI_LEGACY_DEVICEID_NET 0x1000 #define VIRTIO_PCI_MODERN_DEVICEID_NET 0x1041 +#define VIRTIO_PCI_SUBSY_DEVICEID_NET 0x1100 /* VirtIO ABI version, this must match exactly. */ #define VIRTIO_PCI_ABI_VERSION 0 -- 2.21.0
Save the notify bar ID in the virtio HW struct; it will be used by the virtio-vdpa driver in its .get_notify_area() callback. Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com> --- drivers/net/virtio/virtio_pci.c | 6 ++++-- drivers/net/virtio/virtio_pci.h | 1 + 2 files changed, 5 insertions(+), 2 deletions(-) diff --git a/drivers/net/virtio/virtio_pci.c b/drivers/net/virtio/virtio_pci.c index 4468e89cb..762847d70 100644 --- a/drivers/net/virtio/virtio_pci.c +++ b/drivers/net/virtio/virtio_pci.c @@ -617,12 +617,14 @@ virtio_read_caps(struct rte_pci_device *dev, struct virtio_hw *hw) ret = rte_pci_read_config(dev, &hw->notify_off_multiplier, 4, pos + sizeof(cap)); - if (ret != 4) + if (ret != 4) { PMD_INIT_LOG(DEBUG, "failed to read notify_off_multiplier, ret %d", ret); - else + } else { hw->notify_base = get_cfg_addr(dev, &cap); + hw->notify_region = cap.bar; + } break; case VIRTIO_PCI_CAP_DEVICE_CFG: hw->dev_cfg = get_cfg_addr(dev, &cap); diff --git a/drivers/net/virtio/virtio_pci.h b/drivers/net/virtio/virtio_pci.h index 56f89a454..afdcc906f 100644 --- a/drivers/net/virtio/virtio_pci.h +++ b/drivers/net/virtio/virtio_pci.h @@ -256,6 +256,7 @@ struct virtio_hw { uint32_t notify_off_multiplier; uint8_t *isr; uint16_t *notify_base; + uint8_t notify_region; struct virtio_pci_common_cfg *common_cfg; struct virtio_net_config *dev_cfg; void *virtio_user_dev; -- 2.21.0
This patch adds the base for the Virtio vDPA driver. This driver can be used either for development purposes, when probing a para-virtualized Virtio device in a guest, or with real HW supporting full Virtio offload (both data and control paths). Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com> --- config/common_linux | 1 + drivers/net/virtio/Makefile | 4 + drivers/net/virtio/meson.build | 3 +- drivers/net/virtio/virtio_vdpa.c | 301 +++++++++++++++++++++++++++++++ 4 files changed, 308 insertions(+), 1 deletion(-) create mode 100644 drivers/net/virtio/virtio_vdpa.c diff --git a/config/common_linux b/config/common_linux index 6e252553a..293a8ff48 100644 --- a/config/common_linux +++ b/config/common_linux @@ -27,6 +27,7 @@ CONFIG_RTE_LIBRTE_VDEV_NETVSC_PMD=y CONFIG_RTE_LIBRTE_NFP_PMD=y CONFIG_RTE_LIBRTE_POWER=y CONFIG_RTE_VIRTIO_USER=y +CONFIG_RTE_VIRTIO_VDPA=y CONFIG_RTE_PROC_INFO=y CONFIG_RTE_LIBRTE_VMBUS=y diff --git a/drivers/net/virtio/Makefile b/drivers/net/virtio/Makefile index 6c2c9967b..0760074ad 100644 --- a/drivers/net/virtio/Makefile +++ b/drivers/net/virtio/Makefile @@ -45,4 +45,8 @@ SRCS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += virtio_user/virtio_user_dev.c SRCS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += virtio_user_ethdev.c endif +ifeq ($(CONFIG_RTE_VIRTIO_VDPA),y) +SRCS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += virtio_vdpa.c +endif + include $(RTE_SDK)/mk/rte.lib.mk diff --git a/drivers/net/virtio/meson.build b/drivers/net/virtio/meson.build index 794905401..f5f1f6e68 100644 --- a/drivers/net/virtio/meson.build +++ b/drivers/net/virtio/meson.build @@ -6,8 +6,9 @@ sources += files('virtio_ethdev.c', 'virtio_pci.c', 'virtio_rxtx.c', 'virtio_rxtx_simple.c', + 'virtio_vdpa.c', 'virtqueue.c') -deps += ['kvargs', 'bus_pci'] +deps += ['kvargs', 'bus_pci', 'vhost'] if arch_subdir == 'x86' sources += files('virtio_rxtx_simple_sse.c') diff --git a/drivers/net/virtio/virtio_vdpa.c b/drivers/net/virtio/virtio_vdpa.c new file mode 100644 index 
000000000..bbb417b94 --- /dev/null +++ b/drivers/net/virtio/virtio_vdpa.c @@ -0,0 +1,301 @@ +/* SPDX-License-Identifier: BSD-3-Clause + * Copyright(c) 2017 Intel Corporation + * Copyright(c) 2019 Red Hat, Inc. + */ + +#include <unistd.h> +#include <sys/ioctl.h> + +#include <rte_kvargs.h> +#include <rte_malloc.h> +#include <rte_vdpa.h> +#include <rte_vfio.h> +#include <rte_vhost.h> + +#include "virtio_pci.h" +#include "virtqueue.h" + +#ifndef PAGE_SIZE +#define PAGE_SIZE 4096 +#endif + +#define DRV_LOG(level, fmt, args...) \ + rte_log(RTE_LOG_ ## level, virtio_vdpa_logtype, \ + "VIRTIO_VDPA %s(): " fmt "\n", __func__, ##args) + +#define VIRTIO_VDPA_MODE "vdpa" + +static const char * const virtio_vdpa_valid_arguments[] = { + VIRTIO_VDPA_MODE, + NULL +}; + +static int virtio_vdpa_logtype; + +struct virtio_vdpa_device { + struct rte_vdpa_dev_addr dev_addr; + struct rte_pci_device *pdev; + struct virtio_hw hw; + int vfio_container_fd; + int vfio_group_fd; + int vfio_dev_fd; + int iommu_group_num; + int vid; + int did; + uint16_t max_queue_pairs; + bool has_ctrl_vq; + struct virtqueue *vqs; + struct virtqueue *cvq; + rte_spinlock_t lock; +}; + +struct internal_list { + TAILQ_ENTRY(internal_list) next; + struct virtio_vdpa_device *dev; +}; + +TAILQ_HEAD(internal_list_head, internal_list); +static struct internal_list_head internal_list = + TAILQ_HEAD_INITIALIZER(internal_list); + +static pthread_mutex_t internal_list_lock = PTHREAD_MUTEX_INITIALIZER; + +static struct internal_list * +find_internal_resource_by_dev(struct rte_pci_device *pdev) +{ + int found = 0; + struct internal_list *list; + + pthread_mutex_lock(&internal_list_lock); + + TAILQ_FOREACH(list, &internal_list, next) { + if (pdev == list->dev->pdev) { + found = 1; + break; + } + } + + pthread_mutex_unlock(&internal_list_lock); + + if (!found) + return NULL; + + return list; +} + +static int +virtio_vdpa_vfio_setup(struct virtio_vdpa_device *dev) +{ + struct rte_pci_device *pdev = dev->pdev; + char 
devname[RTE_DEV_NAME_MAX_LEN] = {0}; + int iommu_group_num; + + dev->vfio_dev_fd = -1; + dev->vfio_group_fd = -1; + dev->vfio_container_fd = -1; + dev->iommu_group_num = -1; + + rte_pci_device_name(&pdev->addr, devname, RTE_DEV_NAME_MAX_LEN); + rte_vfio_get_group_num(rte_pci_get_sysfs_path(), devname, + &iommu_group_num); + + dev->vfio_container_fd = rte_vfio_container_create(); + if (dev->vfio_container_fd < 0) + return -1; + + dev->vfio_group_fd = + rte_vfio_container_group_bind(dev->vfio_container_fd, + iommu_group_num); + if (dev->vfio_group_fd < 0) + goto err_container_destroy; + + if (rte_pci_map_device(pdev)) + goto err_container_unbind; + + dev->vfio_dev_fd = pdev->intr_handle.vfio_dev_fd; + dev->iommu_group_num = iommu_group_num; + + return 0; + +err_container_unbind: + rte_vfio_container_group_unbind(dev->vfio_container_fd, + iommu_group_num); +err_container_destroy: + rte_vfio_container_destroy(dev->vfio_container_fd); + + dev->vfio_dev_fd = -1; + dev->vfio_group_fd = -1; + dev->vfio_container_fd = -1; + dev->iommu_group_num = -1; + + return -1; +} + +static struct rte_vdpa_dev_ops virtio_vdpa_ops = { +}; + +static inline int +open_int(const char *key __rte_unused, const char *value, void *extra_args) +{ + uint16_t *n = extra_args; + + if (value == NULL || extra_args == NULL) + return -EINVAL; + + *n = (uint16_t)strtoul(value, NULL, 0); + if (*n == USHRT_MAX && errno == ERANGE) + return -1; + + return 0; +} + +static int +virtio_vdpa_pci_probe(struct rte_pci_driver *pci_drv __rte_unused, + struct rte_pci_device *pci_dev) +{ + struct virtio_vdpa_device *dev; + struct internal_list *list = NULL; + struct rte_kvargs *kvlist = NULL; + int ret, vdpa_mode = 0; + + if (rte_eal_process_type() != RTE_PROC_PRIMARY) + return 0; + + if (!pci_dev->device.devargs) + return -1; + + kvlist = rte_kvargs_parse(pci_dev->device.devargs->args, + virtio_vdpa_valid_arguments); + if (kvlist == NULL) + return -1; + + /* probe only when vdpa mode is specified */ + if 
(rte_kvargs_count(kvlist, VIRTIO_VDPA_MODE) == 0) + goto err_free_kvargs; + + ret = rte_kvargs_process(kvlist, VIRTIO_VDPA_MODE, &open_int, + &vdpa_mode); + if (ret < 0 || vdpa_mode == 0) + goto err_free_kvargs; + + list = rte_zmalloc("virtio_vdpa", sizeof(*list), 0); + if (list == NULL) + goto err_free_kvargs; + + dev = rte_zmalloc("virtio_vdpa", sizeof(*dev), 0); + if (!dev) + goto err_free_list; + + dev->pdev = pci_dev; + rte_spinlock_init(&dev->lock); + + if (virtio_vdpa_vfio_setup(dev) < 0) { + DRV_LOG(ERR, "failed to setup device %s", pci_dev->name); + goto err_free_vvdev; + } + + dev->dev_addr.pci_addr = pci_dev->addr; + dev->dev_addr.type = PCI_ADDR; + dev->max_queue_pairs = 1; + list->dev = dev; + + if (vtpci_init(pci_dev, &dev->hw)) + goto err_free_vfio; + + dev->did = rte_vdpa_register_device(&dev->dev_addr, + &virtio_vdpa_ops); + + if (dev->did < 0) { + DRV_LOG(ERR, "failed to register device %s", pci_dev->name); + goto err_free_vfio; + } + + pthread_mutex_lock(&internal_list_lock); + TAILQ_INSERT_TAIL(&internal_list, list, next); + pthread_mutex_unlock(&internal_list_lock); + + rte_kvargs_free(kvlist); + + return 0; + +err_free_vfio: + rte_vfio_container_destroy(dev->vfio_container_fd); +err_free_vvdev: + rte_free(dev); +err_free_list: + rte_free(list); +err_free_kvargs: + rte_kvargs_free(kvlist); + + return 1; +} + +static int +virtio_vdpa_pci_remove(struct rte_pci_device *pci_dev) +{ + struct virtio_vdpa_device *dev; + struct internal_list *list; + + if (rte_eal_process_type() != RTE_PROC_PRIMARY) + return 0; + + list = find_internal_resource_by_dev(pci_dev); + if (list == NULL) { + DRV_LOG(ERR, "Invalid device: %s", pci_dev->name); + return -1; + } + + dev = list->dev; + + rte_vdpa_unregister_device(dev->did); + rte_pci_unmap_device(dev->pdev); + rte_vfio_container_group_unbind(dev->vfio_container_fd, + dev->iommu_group_num); + rte_vfio_container_destroy(dev->vfio_container_fd); + + pthread_mutex_lock(&internal_list_lock); + 
TAILQ_REMOVE(&internal_list, list, next); + pthread_mutex_unlock(&internal_list_lock); + + rte_free(dev->vqs); + rte_free(list); + rte_free(dev); + + return 0; +} + +static const struct rte_pci_id pci_id_virtio_vdpa_map[] = { + { .class_id = RTE_CLASS_ANY_ID, + .vendor_id = VIRTIO_PCI_VENDORID, + .device_id = VIRTIO_PCI_LEGACY_DEVICEID_NET, + .subsystem_vendor_id = VIRTIO_PCI_VENDORID, + .subsystem_device_id = VIRTIO_PCI_SUBSY_DEVICEID_NET, + }, + { .class_id = RTE_CLASS_ANY_ID, + .vendor_id = VIRTIO_PCI_VENDORID, + .device_id = VIRTIO_PCI_MODERN_DEVICEID_NET, + .subsystem_vendor_id = VIRTIO_PCI_VENDORID, + .subsystem_device_id = VIRTIO_PCI_SUBSY_DEVICEID_NET, + }, + { .vendor_id = 0, /* sentinel */ + }, +}; + +static struct rte_pci_driver rte_virtio_vdpa = { + .id_table = pci_id_virtio_vdpa_map, + .drv_flags = 0, + .probe = virtio_vdpa_pci_probe, + .remove = virtio_vdpa_pci_remove, +}; + +RTE_PMD_REGISTER_PCI(net_virtio_vdpa, rte_virtio_vdpa); +RTE_PMD_REGISTER_PCI_TABLE(net_virtio_vdpa, pci_id_virtio_vdpa_map); +RTE_PMD_REGISTER_KMOD_DEP(net_virtio_vdpa, "* vfio-pci"); + +RTE_INIT(virtio_vdpa_init_log) +{ + virtio_vdpa_logtype = rte_log_register("pmd.net.virtio_vdpa"); + if (virtio_vdpa_logtype >= 0) + rte_log_set_level(virtio_vdpa_logtype, RTE_LOG_NOTICE); +} + -- 2.21.0
This patch implements the vDPA .get_queue_num() callback. Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com> --- drivers/net/virtio/virtio_vdpa.c | 43 ++++++++++++++++++++++++++++++++ 1 file changed, 43 insertions(+) diff --git a/drivers/net/virtio/virtio_vdpa.c b/drivers/net/virtio/virtio_vdpa.c index bbb417b94..07ff1e090 100644 --- a/drivers/net/virtio/virtio_vdpa.c +++ b/drivers/net/virtio/virtio_vdpa.c @@ -60,6 +60,29 @@ static struct internal_list_head internal_list = static pthread_mutex_t internal_list_lock = PTHREAD_MUTEX_INITIALIZER; +static struct internal_list * +find_internal_resource_by_did(int did) +{ + int found = 0; + struct internal_list *list; + + pthread_mutex_lock(&internal_list_lock); + + TAILQ_FOREACH(list, &internal_list, next) { + if (did == list->dev->did) { + found = 1; + break; + } + } + + pthread_mutex_unlock(&internal_list_lock); + + if (!found) + return NULL; + + return list; +} + static struct internal_list * find_internal_resource_by_dev(struct rte_pci_device *pdev) { @@ -131,7 +154,27 @@ virtio_vdpa_vfio_setup(struct virtio_vdpa_device *dev) return -1; } +static int +virtio_vdpa_get_queue_num(int did, uint32_t *queue_num) +{ + struct internal_list *list; + struct virtio_vdpa_device *dev; + + list = find_internal_resource_by_did(did); + if (list == NULL) { + DRV_LOG(ERR, "Invalid device id: %d", did); + return -1; + } + + dev = list->dev; + + *queue_num = dev->max_queue_pairs; + + return 0; +} + static struct rte_vdpa_dev_ops virtio_vdpa_ops = { + .get_queue_num = virtio_vdpa_get_queue_num, }; static inline int -- 2.21.0
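The did-indexed lookup added here mirrors the existing find_internal_resource_by_dev(). As a plain-C sketch of the pattern outside DPDK (the types and names below are illustrative stand-ins, not the driver's), assuming a TAILQ protected by a mutex:

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdlib.h>
#include <sys/queue.h>

/* Hypothetical stand-in for virtio_vdpa_device. */
struct vdpa_dev { int did; };

struct entry {
	TAILQ_ENTRY(entry) next;
	struct vdpa_dev *dev;
};

TAILQ_HEAD(entry_head, entry);
static struct entry_head head = TAILQ_HEAD_INITIALIZER(head);
static pthread_mutex_t list_lock = PTHREAD_MUTEX_INITIALIZER;

/* Same shape as find_internal_resource_by_did(): walk the list
 * under the lock, return the matching entry or NULL. */
static struct entry *
find_by_did(int did)
{
	struct entry *e, *found = NULL;

	pthread_mutex_lock(&list_lock);
	TAILQ_FOREACH(e, &head, next) {
		if (e->dev->did == did) {
			found = e;
			break;
		}
	}
	pthread_mutex_unlock(&list_lock);

	return found;
}

static struct entry *
add_dev(struct vdpa_dev *dev)
{
	struct entry *e = calloc(1, sizeof(*e));

	e->dev = dev;
	pthread_mutex_lock(&list_lock);
	TAILQ_INSERT_TAIL(&head, e, next);
	pthread_mutex_unlock(&list_lock);

	return e;
}
```

The driver itself does the same walk, then dereferences `list->dev` to reach the per-device state such as `max_queue_pairs`.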
This patch implements the vDPA .get_features() callback. Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com> --- drivers/net/virtio/virtio_vdpa.c | 32 ++++++++++++++++++++++++++++++++ 1 file changed, 32 insertions(+) diff --git a/drivers/net/virtio/virtio_vdpa.c b/drivers/net/virtio/virtio_vdpa.c index 07ff1e090..9e2af8313 100644 --- a/drivers/net/virtio/virtio_vdpa.c +++ b/drivers/net/virtio/virtio_vdpa.c @@ -173,8 +173,40 @@ virtio_vdpa_get_queue_num(int did, uint32_t *queue_num) return 0; } +static int +virtio_vdpa_get_features(int did, uint64_t *features) +{ + struct internal_list *list; + struct virtio_vdpa_device *dev; + + list = find_internal_resource_by_did(did); + if (list == NULL) { + DRV_LOG(ERR, "Invalid device id: %d", did); + return -1; + } + + dev = list->dev; + + *features = VTPCI_OPS(&dev->hw)->get_features(&dev->hw); + + if (!(*features & (1ULL << VIRTIO_F_IOMMU_PLATFORM))) { + DRV_LOG(ERR, "Device does not support IOMMU"); + return -1; + } + + if (*features & (1ULL << VIRTIO_NET_F_CTRL_VQ)) + dev->has_ctrl_vq = true; + else + dev->has_ctrl_vq = false; + + *features |= (1ULL << VHOST_USER_F_PROTOCOL_FEATURES); + + return 0; +} + static struct rte_vdpa_dev_ops virtio_vdpa_ops = { .get_queue_num = virtio_vdpa_get_queue_num, + .get_features = virtio_vdpa_get_features, }; static inline int -- 2.21.0
This patch implements the vDPA .get_protocol_features() callback. Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com> --- drivers/net/virtio/virtio_vdpa.c | 13 +++++++++++++ 1 file changed, 13 insertions(+) diff --git a/drivers/net/virtio/virtio_vdpa.c b/drivers/net/virtio/virtio_vdpa.c index 9e2af8313..fc52a8e92 100644 --- a/drivers/net/virtio/virtio_vdpa.c +++ b/drivers/net/virtio/virtio_vdpa.c @@ -204,9 +204,22 @@ virtio_vdpa_get_features(int did, uint64_t *features) return 0; } +#define VDPA_SUPPORTED_PROTOCOL_FEATURES \ + (1ULL << VHOST_USER_PROTOCOL_F_REPLY_ACK | \ + 1ULL << VHOST_USER_PROTOCOL_F_SLAVE_REQ | \ + 1ULL << VHOST_USER_PROTOCOL_F_SLAVE_SEND_FD | \ + 1ULL << VHOST_USER_PROTOCOL_F_HOST_NOTIFIER) +static int +virtio_vdpa_get_protocol_features(int did __rte_unused, uint64_t *features) +{ + *features = VDPA_SUPPORTED_PROTOCOL_FEATURES; + return 0; +} + static struct rte_vdpa_dev_ops virtio_vdpa_ops = { .get_queue_num = virtio_vdpa_get_queue_num, .get_features = virtio_vdpa_get_features, + .get_protocol_features = virtio_vdpa_get_protocol_features, }; static inline int -- 2.21.0
In order to support multi-queue, we need to implement the control path. The problem is that both the Vhost-user master and slave use VAs in their processes' address spaces as IOVAs, which creates collisions between the data ring IOVAs managed by the master, and the Control ring IOVAs. The trick here is to remap the Control ring memory to another range, after the slave is aware of the master's ranges. Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com> --- drivers/net/virtio/virtio_vdpa.c | 255 +++++++++++++++++++++++++++++++ 1 file changed, 255 insertions(+) diff --git a/drivers/net/virtio/virtio_vdpa.c b/drivers/net/virtio/virtio_vdpa.c index fc52a8e92..13b4dd07d 100644 --- a/drivers/net/virtio/virtio_vdpa.c +++ b/drivers/net/virtio/virtio_vdpa.c @@ -106,6 +106,127 @@ find_internal_resource_by_dev(struct rte_pci_device *pdev) return list; } +static int +virtio_vdpa_dma_map_ctrl_queue(struct virtio_vdpa_device *dev, int do_map, + uint64_t iova) +{ + const struct rte_memzone *mz; + int ret; + + /* + * IOVAs are process VAs. We cannot use them as the Data and Control + * paths are run in different processes, which may (does) lead to + * collisions. The trick here is to fix up Ctrl path IOVAs so that they + * start after the Data path ranges. 
+ */ + if (do_map) { + mz = dev->cvq->cq.mz; + ret = rte_vfio_container_dma_map(dev->vfio_container_fd, + (uint64_t)(uintptr_t)mz->addr, + iova, mz->len); + if (ret < 0) { + DRV_LOG(ERR, "Failed to map ctrl ring (%d)", ret); + return ret; + } + + dev->cvq->vq_ring_mem = iova; + iova += mz->len; + + mz = dev->cvq->cq.virtio_net_hdr_mz; + ret = rte_vfio_container_dma_map(dev->vfio_container_fd, + (uint64_t)(uintptr_t)mz->addr, + iova, mz->len); + if (ret < 0) { + DRV_LOG(ERR, "Failed to map ctrl headers (%d)", ret); + return ret; + } + + dev->cvq->cq.virtio_net_hdr_mem = iova; + } else { + mz = dev->cvq->cq.mz; + ret = rte_vfio_container_dma_unmap(dev->vfio_container_fd, + (uint64_t)(uintptr_t)mz->addr, + iova, mz->len); + if (ret < 0) { + DRV_LOG(ERR, "Failed to unmap ctrl ring (%d)", ret); + return ret; + } + + dev->cvq->vq_ring_mem = 0; + iova += mz->len; + + mz = dev->cvq->cq.virtio_net_hdr_mz; + ret = rte_vfio_container_dma_unmap(dev->vfio_container_fd, + (uint64_t)(uintptr_t)mz->addr, + iova, mz->len); + if (ret < 0) { + DRV_LOG(ERR, "Failed to unmap ctrl headers (%d)", ret); + return ret; + } + + dev->cvq->cq.virtio_net_hdr_mem = 0; + } + + return 0; +} + +static int +virtio_vdpa_dma_map(struct virtio_vdpa_device *dev, int do_map) +{ + uint32_t i; + int ret; + struct rte_vhost_memory *mem = NULL; + int vfio_container_fd; + uint64_t avail_iova = 0; + + ret = rte_vhost_get_mem_table(dev->vid, &mem); + if (ret < 0 || !mem) { + DRV_LOG(ERR, "failed to get VM memory layout."); + return ret; + } + + vfio_container_fd = dev->vfio_container_fd; + + for (i = 0; i < mem->nregions; i++) { + struct rte_vhost_mem_region *reg; + + reg = &mem->regions[i]; + DRV_LOG(INFO, "%s, region %u: HVA 0x%" PRIx64 ", " + "GPA 0x%" PRIx64 ", size 0x%" PRIx64 ".", + do_map ? 
"DMA map" : "DMA unmap", i, + reg->host_user_addr, reg->guest_phys_addr, reg->size); + + if (reg->guest_phys_addr + reg->size > avail_iova) + avail_iova = reg->guest_phys_addr + reg->size; + + if (do_map) { + ret = rte_vfio_container_dma_map(vfio_container_fd, + reg->host_user_addr, reg->guest_phys_addr, + reg->size); + if (ret < 0) { + DRV_LOG(ERR, "DMA map failed."); + goto exit; + } + } else { + ret = rte_vfio_container_dma_unmap(vfio_container_fd, + reg->host_user_addr, reg->guest_phys_addr, + reg->size); + if (ret < 0) { + DRV_LOG(ERR, "DMA unmap failed."); + goto exit; + } + } + } + + if (dev->cvq) + ret = virtio_vdpa_dma_map_ctrl_queue(dev, do_map, avail_iova); + +exit: + free(mem); + + return ret; +} + static int virtio_vdpa_vfio_setup(struct virtio_vdpa_device *dev) { @@ -216,10 +337,144 @@ virtio_vdpa_get_protocol_features(int did __rte_unused, uint64_t *features) return 0; } +static uint64_t +hva_to_gpa(int vid, uint64_t hva) +{ + struct rte_vhost_memory *mem = NULL; + struct rte_vhost_mem_region *reg; + uint32_t i; + uint64_t gpa = 0; + + if (rte_vhost_get_mem_table(vid, &mem) < 0) + goto exit; + + for (i = 0; i < mem->nregions; i++) { + reg = &mem->regions[i]; + + if (hva >= reg->host_user_addr && + hva < reg->host_user_addr + reg->size) { + gpa = hva - reg->host_user_addr + reg->guest_phys_addr; + break; + } + } + +exit: + if (mem) + free(mem); + return gpa; +} + +static int +virtio_vdpa_start(struct virtio_vdpa_device *dev) +{ + struct virtio_hw *hw = &dev->hw; + int i, vid, nr_vring, ret; + struct rte_vhost_vring vr; + struct virtio_pmd_ctrl ctrl; + int dlen[1]; + + vid = dev->vid; + nr_vring = rte_vhost_get_vring_num(vid); + + if (dev->vqs) + rte_free(dev->vqs); + + dev->vqs = rte_zmalloc("virtio_vdpa", sizeof(*dev->vqs) * nr_vring, 0); + + for (i = 0; i < nr_vring; i++) { + struct virtqueue *vq = &dev->vqs[i]; + + rte_vhost_get_vhost_vring(vid, i, &vr); + + vq->vq_queue_index = i; + vq->vq_nentries = vr.size; + vq->vq_ring_mem = hva_to_gpa(vid, 
(uint64_t)(uintptr_t)vr.desc); + if (vq->vq_ring_mem == 0) { + DRV_LOG(ERR, "Fail to get GPA for descriptor ring."); + ret = -1; + goto out_free_vqs; + } + + ret = VTPCI_OPS(hw)->setup_queue(hw, vq); + if (ret) { + DRV_LOG(ERR, "Fail to setup queue."); + goto out_free_vqs; + } + } + + if (dev->cvq) { + ret = VTPCI_OPS(hw)->setup_queue(hw, dev->cvq); + if (ret) { + DRV_LOG(ERR, "Fail to setup ctrl queue."); + goto out_free_vqs; + } + } + + vtpci_set_status(hw, VIRTIO_CONFIG_STATUS_DRIVER_OK); + + if (!dev->cvq) + return 0; + + ctrl.hdr.class = VIRTIO_NET_CTRL_MQ; + ctrl.hdr.cmd = VIRTIO_NET_CTRL_MQ_VQ_PAIRS_SET; + memcpy(ctrl.data, &dev->max_queue_pairs, sizeof(uint16_t)); + + dlen[0] = sizeof(uint16_t); + + ret = virtio_send_command(hw->cvq, &ctrl, dlen, 1); + if (ret) { + DRV_LOG(ERR, "Multiqueue configured but send command " + "failed, this is too late now..."); + ret = -EINVAL; + goto out_free_vqs; + } + + return 0; +out_free_vqs: + rte_free(dev->vqs); + + return ret; +} + +static int +virtio_vdpa_dev_config(int vid) +{ + int did, ret; + struct internal_list *list; + struct virtio_vdpa_device *dev; + + did = rte_vhost_get_vdpa_device_id(vid); + list = find_internal_resource_by_did(did); + if (list == NULL) { + DRV_LOG(ERR, "Invalid device id: %d", did); + return -1; + } + + dev = list->dev; + dev->vid = vid; + + rte_spinlock_lock(&dev->lock); + + ret = virtio_vdpa_dma_map(dev, 1); + if (ret) + goto out_unlock; + + ret = virtio_vdpa_start(dev); + + if (rte_vhost_host_notifier_ctrl(vid, true) != 0) + DRV_LOG(NOTICE, "vDPA (%d): software relay is used.", did); + +out_unlock: + rte_spinlock_unlock(&dev->lock); + + return ret; +} + static struct rte_vdpa_dev_ops virtio_vdpa_ops = { .get_queue_num = virtio_vdpa_get_queue_num, .get_features = virtio_vdpa_get_features, .get_protocol_features = virtio_vdpa_get_protocol_features, + .dev_conf = virtio_vdpa_dev_config, }; static inline int -- 2.21.0
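The HVA-to-GPA translation at the heart of virtio_vdpa_start() can be exercised outside DPDK. A minimal sketch, with an illustrative region struct standing in for rte_vhost_mem_region (note the same caveat as the driver's hva_to_gpa(): a GPA of 0 is indistinguishable from "not found"):

```c
#include <assert.h>
#include <stdint.h>

/* Simplified stand-in for rte_vhost_mem_region: only the fields
 * the translation needs. */
struct mem_region {
	uint64_t guest_phys_addr;
	uint64_t host_user_addr;
	uint64_t size;
};

/* Same logic as the patch's hva_to_gpa(): find the region that
 * contains the host VA and rebase it onto the guest PA range.
 * Returns 0 when the address is not backed by any region. */
static uint64_t
hva_to_gpa(const struct mem_region *regions, uint32_t nregions, uint64_t hva)
{
	uint32_t i;

	for (i = 0; i < nregions; i++) {
		const struct mem_region *reg = &regions[i];

		if (hva >= reg->host_user_addr &&
		    hva < reg->host_user_addr + reg->size)
			return hva - reg->host_user_addr + reg->guest_phys_addr;
	}

	return 0;
}
```

The driver uses this to convert the descriptor ring address it receives from vhost-user (a host VA) into the guest-physical address that gets programmed into the device via setup_queue().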
This patch implements the vDPA .dev_close() callback. Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com> --- drivers/net/virtio/virtio_vdpa.c | 52 ++++++++++++++++++++++++++++++++ 1 file changed, 52 insertions(+) diff --git a/drivers/net/virtio/virtio_vdpa.c b/drivers/net/virtio/virtio_vdpa.c index 13b4dd07d..691844906 100644 --- a/drivers/net/virtio/virtio_vdpa.c +++ b/drivers/net/virtio/virtio_vdpa.c @@ -436,6 +436,33 @@ virtio_vdpa_start(struct virtio_vdpa_device *dev) return ret; } +static void +virtio_vdpa_stop(struct virtio_vdpa_device *dev) +{ + struct virtio_hw *hw = &dev->hw; + uint32_t i, nr_vring; + int vid = dev->vid; + struct rte_vhost_vring vr; + uint16_t last_used_idx, last_avail_idx; + + nr_vring = rte_vhost_get_vring_num(vid); + + vtpci_reset(hw); + + for (i = 0; i < nr_vring; i++) { + rte_vhost_get_vhost_vring(vid, i, &vr); + + last_used_idx = vr.used->idx; + last_avail_idx = vr.avail->idx; + + rte_vhost_set_vring_base(vid, i, last_avail_idx, + last_used_idx); + } + + rte_free(dev->vqs); + dev->vqs = NULL; +} + static int virtio_vdpa_dev_config(int vid) { @@ -470,11 +497,36 @@ virtio_vdpa_dev_config(int vid) return ret; } +static int +virtio_vdpa_dev_close(int vid) +{ + int did; + struct internal_list *list; + struct virtio_vdpa_device *dev; + + did = rte_vhost_get_vdpa_device_id(vid); + list = find_internal_resource_by_did(did); + if (list == NULL) { + DRV_LOG(ERR, "Invalid device id: %d", did); + return -1; + } + + dev = list->dev; + + rte_spinlock_lock(&dev->lock); + virtio_vdpa_stop(dev); + virtio_vdpa_dma_map(dev, 0); + rte_spinlock_unlock(&dev->lock); + + return 0; +} + static struct rte_vdpa_dev_ops virtio_vdpa_ops = { .get_queue_num = virtio_vdpa_get_queue_num, .get_features = virtio_vdpa_get_features, .get_protocol_features = virtio_vdpa_get_protocol_features, .dev_conf = virtio_vdpa_dev_config, + .dev_close = virtio_vdpa_dev_close, }; static inline int -- 2.21.0
On top of that, it allocates the control virtqueue if multiqueue has been negotiated. Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com> --- drivers/net/virtio/virtio_vdpa.c | 155 +++++++++++++++++++++++++++++++ 1 file changed, 155 insertions(+) diff --git a/drivers/net/virtio/virtio_vdpa.c b/drivers/net/virtio/virtio_vdpa.c index 691844906..9b61688a1 100644 --- a/drivers/net/virtio/virtio_vdpa.c +++ b/drivers/net/virtio/virtio_vdpa.c @@ -521,12 +521,167 @@ virtio_vdpa_dev_close(int vid) return 0; } + +static void +virtio_vdpa_free_ctrl_vq(struct virtio_vdpa_device *dev) +{ + if (!dev->cvq) + return; + + rte_memzone_free(dev->cvq->cq.virtio_net_hdr_mz); + rte_memzone_free(dev->cvq->cq.mz); + rte_free(dev->cvq); + + dev->hw.cvq = NULL; +} + +static int +virtio_vdpa_allocate_ctrl_vq(struct virtio_vdpa_device *dev) +{ + struct virtio_hw *hw = &dev->hw; + char vq_name[VIRTQUEUE_MAX_NAME_SZ]; + char vq_hdr_name[VIRTQUEUE_MAX_NAME_SZ]; + int numa_node = dev->pdev->device.numa_node; + const struct rte_memzone *mz = NULL, *hdr_mz = NULL; + uint16_t ctrl_queue_idx = dev->max_queue_pairs * 2; + uint16_t ctrl_queue_sz; + int size, ret; + + if (dev->cvq) + virtio_vdpa_free_ctrl_vq(dev); + + ctrl_queue_sz = VTPCI_OPS(hw)->get_queue_num(hw, ctrl_queue_idx); + if (ctrl_queue_sz == 0) { + DRV_LOG(ERR, "Ctrl VQ does not exist"); + return -EINVAL; + } + + dev->cvq = rte_zmalloc_socket(vq_name, sizeof(*dev->cvq), + RTE_CACHE_LINE_SIZE, numa_node); + if (!dev->cvq) + return -ENOMEM; + + dev->cvq->hw = &dev->hw; + dev->cvq->vq_queue_index = ctrl_queue_idx; + dev->cvq->vq_nentries = ctrl_queue_sz; + + if (vtpci_packed_queue(hw)) { + dev->cvq->vq_packed.used_wrap_counter = 1; + dev->cvq->vq_packed.cached_flags = VRING_PACKED_DESC_F_AVAIL; + dev->cvq->vq_packed.event_flags_shadow = 0; + } + + size = vring_size(hw, ctrl_queue_sz, VIRTIO_PCI_VRING_ALIGN); + dev->cvq->vq_ring_size = RTE_ALIGN_CEIL(size, VIRTIO_PCI_VRING_ALIGN); + + snprintf(vq_name, sizeof(vq_name), "vdpa_ctrlvq%d", 
+ dev->did); + + mz = rte_memzone_reserve_aligned(vq_name, dev->cvq->vq_ring_size, + numa_node, RTE_MEMZONE_IOVA_CONTIG, + VIRTIO_PCI_VRING_ALIGN); + if (mz == NULL) { + if (rte_errno == EEXIST) + mz = rte_memzone_lookup(vq_name); + if (mz == NULL) { + ret = -ENOMEM; + goto out_free_cvq; + } + } + + memset(mz->addr, 0, mz->len); + dev->cvq->vq_ring_virt_mem = mz->addr; + + virtio_init_vring(dev->cvq); + + snprintf(vq_hdr_name, sizeof(vq_hdr_name), "vdpa_ctrlvq%d_hdr", + dev->did); + + hdr_mz = rte_memzone_reserve_aligned(vq_hdr_name, PAGE_SIZE * 2, + numa_node, RTE_MEMZONE_IOVA_CONTIG, + VIRTIO_PCI_VRING_ALIGN); + if (hdr_mz == NULL) { + if (rte_errno == EEXIST) + hdr_mz = rte_memzone_lookup(vq_hdr_name); + if (hdr_mz == NULL) { + ret = -ENOMEM; + goto out_free_mz; + } + } + + memset(hdr_mz->addr, 0, hdr_mz->len); + + dev->cvq->cq.vq = dev->cvq; + dev->cvq->cq.mz = mz; + dev->cvq->cq.virtio_net_hdr_mz = hdr_mz; + dev->hw.cvq = &dev->cvq->cq; + + return 0; + +out_free_mz: + rte_memzone_free(mz); +out_free_cvq: + rte_free(dev->cvq); + dev->cvq = NULL; + + return ret; +} + +static int +virtio_vdpa_set_features(int vid) +{ + uint64_t features; + int did, ret; + struct internal_list *list; + struct virtio_vdpa_device *dev; + struct virtio_hw *hw; + + did = rte_vhost_get_vdpa_device_id(vid); + list = find_internal_resource_by_did(did); + if (list == NULL) { + DRV_LOG(ERR, "Invalid device id: %d", did); + return -1; + } + + dev = list->dev; + hw = &dev->hw; + rte_vhost_get_negotiated_features(vid, &features); + + features |= (1ULL << VIRTIO_F_IOMMU_PLATFORM); + if (dev->has_ctrl_vq) + features |= (1ULL << VIRTIO_NET_F_CTRL_VQ); + + VTPCI_OPS(&dev->hw)->set_features(&dev->hw, features); + hw->guest_features = features; + + if (vtpci_with_feature(hw, VIRTIO_NET_F_CTRL_VQ)) { + if (vtpci_with_feature(hw, VIRTIO_NET_F_MQ)) { + vtpci_read_dev_config(hw, + offsetof(struct virtio_net_config, + max_virtqueue_pairs), + &dev->max_queue_pairs, + sizeof(dev->max_queue_pairs)); + } 
else { + dev->max_queue_pairs = 1; + } + + ret = virtio_vdpa_allocate_ctrl_vq(dev); + if (ret) { + DRV_LOG(ERR, "Failed to allocate ctrl vq"); + return ret; + } + } + + return 0; +} + static struct rte_vdpa_dev_ops virtio_vdpa_ops = { .get_queue_num = virtio_vdpa_get_queue_num, .get_features = virtio_vdpa_get_features, .get_protocol_features = virtio_vdpa_get_protocol_features, .dev_conf = virtio_vdpa_dev_config, .dev_close = virtio_vdpa_dev_close, + .set_features = virtio_vdpa_set_features, }; static inline int -- 2.21.0
This patch implements the vDPA .get_vfio_group_fd() and .get_vfio_device_fd() callbacks. Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com> --- drivers/net/virtio/virtio_vdpa.c | 34 ++++++++++++++++++++++++++++++++ 1 file changed, 34 insertions(+) diff --git a/drivers/net/virtio/virtio_vdpa.c b/drivers/net/virtio/virtio_vdpa.c index 9b61688a1..e0b2f99ba 100644 --- a/drivers/net/virtio/virtio_vdpa.c +++ b/drivers/net/virtio/virtio_vdpa.c @@ -675,6 +675,38 @@ virtio_vdpa_set_features(int vid) return 0; } +static int +virtio_vdpa_get_vfio_group_fd(int vid) +{ + int did; + struct internal_list *list; + + did = rte_vhost_get_vdpa_device_id(vid); + list = find_internal_resource_by_did(did); + if (list == NULL) { + DRV_LOG(ERR, "Invalid device id: %d", did); + return -1; + } + + return list->dev->vfio_group_fd; +} + +static int +virtio_vdpa_get_vfio_device_fd(int vid) +{ + int did; + struct internal_list *list; + + did = rte_vhost_get_vdpa_device_id(vid); + list = find_internal_resource_by_did(did); + if (list == NULL) { + DRV_LOG(ERR, "Invalid device id: %d", did); + return -1; + } + + return list->dev->vfio_dev_fd; +} + static struct rte_vdpa_dev_ops virtio_vdpa_ops = { .get_queue_num = virtio_vdpa_get_queue_num, .get_features = virtio_vdpa_get_features, @@ -682,6 +714,8 @@ static struct rte_vdpa_dev_ops virtio_vdpa_ops = { .dev_conf = virtio_vdpa_dev_config, .dev_close = virtio_vdpa_dev_close, .set_features = virtio_vdpa_set_features, + .get_vfio_group_fd = virtio_vdpa_get_vfio_group_fd, + .get_vfio_device_fd = virtio_vdpa_get_vfio_device_fd, }; static inline int -- 2.21.0
This patch implements the vDPA .get_notify_area() callback. Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com> --- drivers/net/virtio/virtio_vdpa.c | 33 ++++++++++++++++++++++++++++++++ 1 file changed, 33 insertions(+) diff --git a/drivers/net/virtio/virtio_vdpa.c b/drivers/net/virtio/virtio_vdpa.c index e0b2f99ba..e681f527a 100644 --- a/drivers/net/virtio/virtio_vdpa.c +++ b/drivers/net/virtio/virtio_vdpa.c @@ -707,6 +707,38 @@ virtio_vdpa_get_vfio_device_fd(int vid) return list->dev->vfio_dev_fd; } +static int +virtio_vdpa_get_notify_area(int vid, int qid, uint64_t *offset, uint64_t *size) +{ + int did; + struct internal_list *list; + struct virtio_vdpa_device *dev; + struct vfio_region_info reg = { .argsz = sizeof(reg) }; + int ret; + + did = rte_vhost_get_vdpa_device_id(vid); + list = find_internal_resource_by_did(did); + if (list == NULL) { + DRV_LOG(ERR, "Invalid device id: %d", did); + return -1; + } + + dev = list->dev; + + reg.index = dev->hw.notify_region; + ret = ioctl(dev->vfio_dev_fd, VFIO_DEVICE_GET_REGION_INFO, &reg); + if (ret) { + DRV_LOG(ERR, "Failed to get device region info: %s", + strerror(errno)); + return -1; + } + + *offset = dev->vqs[qid].notify_addr - dev->hw.notify_base + reg.offset; + *size = 0x1000; + + return 0; +} + static struct rte_vdpa_dev_ops virtio_vdpa_ops = { .get_queue_num = virtio_vdpa_get_queue_num, .get_features = virtio_vdpa_get_features, @@ -716,6 +748,7 @@ static struct rte_vdpa_dev_ops virtio_vdpa_ops = { .set_features = virtio_vdpa_set_features, .get_vfio_group_fd = virtio_vdpa_get_vfio_group_fd, .get_vfio_device_fd = virtio_vdpa_get_vfio_device_fd, + .get_notify_area = virtio_vdpa_get_notify_area, }; static inline int -- 2.21.0
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com> --- doc/guides/nics/index.rst | 1 + doc/guides/nics/virtio_vdpa.rst | 45 +++++++++++++++++++++++++++++++++ 2 files changed, 46 insertions(+) create mode 100644 doc/guides/nics/virtio_vdpa.rst diff --git a/doc/guides/nics/index.rst b/doc/guides/nics/index.rst index 9fec02f3e..12aee63d7 100644 --- a/doc/guides/nics/index.rst +++ b/doc/guides/nics/index.rst @@ -55,6 +55,7 @@ Network Interface Controller Drivers thunderx vdev_netvsc virtio + virtio_vdpa vhost vmxnet3 pcap_ring diff --git a/doc/guides/nics/virtio_vdpa.rst b/doc/guides/nics/virtio_vdpa.rst new file mode 100644 index 000000000..b708ef77e --- /dev/null +++ b/doc/guides/nics/virtio_vdpa.rst @@ -0,0 +1,45 @@ +.. SPDX-License-Identifier: BSD-3-Clause + Copyright(c) 2019 Red Hat, Inc. + +Virtio vDPA driver +================== + +The Virtio vDPA driver provides support for either para-virtualized +or fully HW-offloaded Virtio-net devices. + +Pre-Installation Configuration +------------------------------ + +Config File Options +~~~~~~~~~~~~~~~~~~~ + +The following option can be modified in the ``config`` file. + +- ``CONFIG_RTE_VIRTIO_VDPA`` (default ``y`` for Linux) + +Virtio vDPA Implementation +~~~~~~~~~~~~~~~~~~~~~~~~~~ + +To have a Virtio-net device probed by the Virtio vDPA driver, add the +"vdpa=1" device argument to specify that the device is to be used in vDPA +mode rather than poll mode; the Virtio PMD will skip the device when it +detects this parameter. If not specified, the device will not be used as a +vDPA device, and it will be driven by the Virtio PMD. + +This driver requires the use of VFIO with IOMMU enabled, as a second level +of address translation is required. + +Features +-------- + +Features of the Virtio vDPA driver are: + +- Compatibility with virtio 0.95, 1.0 and 1.1. +- Multiqueue support. + +Prerequisites +------------- + +- Platform with IOMMU feature. 
The Virtio device needs the address translation + service to Rx/Tx directly with the Virtio driver in a VM or container. + -- 2.21.0
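A hypothetical usage flow for the ``vdpa=1`` devarg described above (the PCI address, tool paths, and application invocation are illustrative, not taken from the patch):

```console
# Bind the Virtio-net device to vfio-pci so the vDPA driver can use VFIO
$ dpdk-devbind.py --bind=vfio-pci 0000:00:04.0

# Whitelist the device with the "vdpa=1" devarg so the Virtio vDPA
# driver, rather than the Virtio PMD, probes it
$ ./vdpa-app -w 0000:00:04.0,vdpa=1 -- --interactive
```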
On Thu, Aug 29, 2019 at 09:59:46AM +0200, Maxime Coquelin wrote: > This is preliminary rework for virtio-vdpa driver, in > order to avoid conflicts with Virtio PMD headers. > > Generally, I think it is better not to include kernel > headers in RTE headers, especially in the case of Vhost > and Virtio which just re-use the kernel definitions, > and have no runtime dependencies. > > In order to not break IFC driver build, the vhost kernel > header is now included directly in the driver. > > Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com> > --- > drivers/net/ifc/ifcvf_vdpa.c | 1 + > lib/librte_vhost/rte_vdpa.h | 1 - > lib/librte_vhost/rte_vhost.h | 9 ++++----- > 3 files changed, 5 insertions(+), 6 deletions(-) Vhost examples need to be updated as well. Regards, Tiwei > > diff --git a/drivers/net/ifc/ifcvf_vdpa.c b/drivers/net/ifc/ifcvf_vdpa.c > index 8de9ef199..40cb15ca8 100644
> net/virtio: move control path fonctions in virtqueue file s/fonctions/functions/ On Thu, Aug 29, 2019 at 09:59:48AM +0200, Maxime Coquelin wrote: > Virtio-vdpa driver needs to implement the control path, > so move related functions to virtqueue file so that it > can be used by both Virtio PMD and Virtio-vdpa drivers. > > Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com> > --- > drivers/net/virtio/virtio_ethdev.c | 252 ---------------------------- > drivers/net/virtio/virtqueue.c | 255 +++++++++++++++++++++++++++++ > drivers/net/virtio/virtqueue.h | 5 + > 3 files changed, 260 insertions(+), 252 deletions(-) Reviewed-by: Tiwei Bie <tiwei.bie@intel.com>
On Thu, Aug 29, 2019 at 09:59:49AM +0200, Maxime Coquelin wrote: > The Virtio PCI subsystem IDs need to be specified to > prevent it from probing IFC vDPA VFs. > > Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com> > --- > drivers/net/virtio/virtio_pci.h | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/drivers/net/virtio/virtio_pci.h b/drivers/net/virtio/virtio_pci.h > index a38cb45ad..56f89a454 100644 > --- a/drivers/net/virtio/virtio_pci.h > +++ b/drivers/net/virtio/virtio_pci.h > @@ -19,6 +19,7 @@ struct virtnet_ctl; > #define VIRTIO_PCI_VENDORID 0x1AF4 > #define VIRTIO_PCI_LEGACY_DEVICEID_NET 0x1000 > #define VIRTIO_PCI_MODERN_DEVICEID_NET 0x1041 > +#define VIRTIO_PCI_SUBSY_DEVICEID_NET 0x1100 0x1100 is the subsystem device ID used by QEMU. Maybe naming it VIRTIO_PCI_SUBSYS_DEVICEID_QEMU is better? Regards, Tiwei > > /* VirtIO ABI version, this must match exactly. */ > #define VIRTIO_PCI_ABI_VERSION 0 > -- > 2.21.0 >
On Thu, Aug 29, 2019 at 09:59:50AM +0200, Maxime Coquelin wrote: > SAving the notify bar ID in the virtio HW struct will be s/SAving/Saving/ > used by the virtio-vdpa driver in its .get_notify_area() > callback. > > Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com> > --- > drivers/net/virtio/virtio_pci.c | 6 ++++-- > drivers/net/virtio/virtio_pci.h | 1 + > 2 files changed, 5 insertions(+), 2 deletions(-) Reviewed-by: Tiwei Bie <tiwei.bie@intel.com> > > diff --git a/drivers/net/virtio/virtio_pci.c b/drivers/net/virtio/virtio_pci.c > index 4468e89cb..762847d70 100644 > --- a/drivers/net/virtio/virtio_pci.c > +++ b/drivers/net/virtio/virtio_pci.c > @@ -617,12 +617,14 @@ virtio_read_caps(struct rte_pci_device *dev, struct virtio_hw *hw) > ret = rte_pci_read_config(dev, > &hw->notify_off_multiplier, > 4, pos + sizeof(cap)); > - if (ret != 4) > + if (ret != 4) { > PMD_INIT_LOG(DEBUG, > "failed to read notify_off_multiplier, ret %d", > ret); > - else > + } else { > hw->notify_base = get_cfg_addr(dev, &cap); > + hw->notify_region = cap.bar; > + } > break; > case VIRTIO_PCI_CAP_DEVICE_CFG: > hw->dev_cfg = get_cfg_addr(dev, &cap); > diff --git a/drivers/net/virtio/virtio_pci.h b/drivers/net/virtio/virtio_pci.h > index 56f89a454..afdcc906f 100644 > --- a/drivers/net/virtio/virtio_pci.h > +++ b/drivers/net/virtio/virtio_pci.h > @@ -256,6 +256,7 @@ struct virtio_hw { > uint32_t notify_off_multiplier; > uint8_t *isr; > uint16_t *notify_base; > + uint8_t notify_region; > struct virtio_pci_common_cfg *common_cfg; > struct virtio_net_config *dev_cfg; > void *virtio_user_dev; > -- > 2.21.0 >
On Thu, Aug 29, 2019 at 09:59:51AM +0200, Maxime Coquelin wrote: > + > +RTE_PMD_REGISTER_PCI(net_virtio_vdpa, rte_virtio_vdpa); > +RTE_PMD_REGISTER_PCI_TABLE(net_virtio_vdpa, pci_id_virtio_vdpa_map); > +RTE_PMD_REGISTER_KMOD_DEP(net_virtio_vdpa, "* vfio-pci"); > + > +RTE_INIT(virtio_vdpa_init_log) > +{ > + virtio_vdpa_logtype = rte_log_register("pmd.net.virtio_vdpa"); We need to use pmd.net.virtio.init to enable the logs in virtio_pci.c when using virtio vdpa. Maybe naming above logtype e.g. "pmd.net.virtio.vdpa" will be more consistent? Regards, Tiwei > + if (virtio_vdpa_logtype >= 0) > + rte_log_set_level(virtio_vdpa_logtype, RTE_LOG_NOTICE); > +} > + > -- > 2.21.0 >
On Thu, Aug 29, 2019 at 09:59:52AM +0200, Maxime Coquelin wrote:
> This patch implements the vDPA .get_queue_num() callback.
>
> Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
> ---
> drivers/net/virtio/virtio_vdpa.c | 43 ++++++++++++++++++++++++++++++++
> 1 file changed, 43 insertions(+)
Reviewed-by: Tiwei Bie <tiwei.bie@intel.com>
On Thu, Aug 29, 2019 at 09:59:53AM +0200, Maxime Coquelin wrote: > This patch implements the vDPA .get_features() callback. > > Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com> > --- > drivers/net/virtio/virtio_vdpa.c | 32 ++++++++++++++++++++++++++++++++ > 1 file changed, 32 insertions(+) > > diff --git a/drivers/net/virtio/virtio_vdpa.c b/drivers/net/virtio/virtio_vdpa.c > index 07ff1e090..9e2af8313 100644 > --- a/drivers/net/virtio/virtio_vdpa.c > +++ b/drivers/net/virtio/virtio_vdpa.c > @@ -173,8 +173,40 @@ virtio_vdpa_get_queue_num(int did, uint32_t *queue_num) > return 0; > } > > +static int > +virtio_vdpa_get_features(int did, uint64_t *features) > +{ > + struct internal_list *list; > + struct virtio_vdpa_device *dev; > + > + list = find_internal_resource_by_did(did); > + if (list == NULL) { > + DRV_LOG(ERR, "Invalid device id: %d", did); > + return -1; > + } > + > + dev = list->dev; > + > + *features = VTPCI_OPS(&dev->hw)->get_features(&dev->hw); > + > + if (!(*features & (1ULL << VIRTIO_F_IOMMU_PLATFORM))) { > + DRV_LOG(ERR, "Device does not support IOMMU"); > + return -1; > + } > + > + if (*features & (1ULL << VIRTIO_NET_F_CTRL_VQ)) > + dev->has_ctrl_vq = true; > + else > + dev->has_ctrl_vq = false; > + > + *features |= (1ULL << VHOST_USER_F_PROTOCOL_FEATURES); We need to drop the VIRTIO_F_IOMMU_PLATFORM bit before reporting features to the upper layer as the vDPA backend doesn't support the vIOMMU yet. Regards, Tiwei > + > + return 0; > +} > + > static struct rte_vdpa_dev_ops virtio_vdpa_ops = { > .get_queue_num = virtio_vdpa_get_queue_num, > + .get_features = virtio_vdpa_get_features, > }; > > static inline int > -- > 2.21.0 >
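Tiwei's point — require VIRTIO_F_IOMMU_PLATFORM from the device but mask it out before reporting features to the upper layer — can be sketched in plain C as follows (the bit positions are the standard Virtio/vhost-user values; the helper name is hypothetical, not from the patch):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Feature bit positions, matching the Virtio and vhost-user specs. */
#define VIRTIO_NET_F_CTRL_VQ		17
#define VHOST_USER_F_PROTOCOL_FEATURES	30
#define VIRTIO_F_IOMMU_PLATFORM		33

/* Hypothetical fixup: fail if the device cannot do platform IOMMU
 * translation, remember whether it has a control queue, then strip
 * the IOMMU bit (vDPA backend has no vIOMMU support yet) and
 * advertise the vhost-user protocol-features bit. */
static int
fixup_features(uint64_t device_features, uint64_t *features, bool *has_ctrl_vq)
{
	if (!(device_features & (1ULL << VIRTIO_F_IOMMU_PLATFORM)))
		return -1; /* device cannot do the needed translation */

	*has_ctrl_vq = !!(device_features & (1ULL << VIRTIO_NET_F_CTRL_VQ));

	*features = device_features & ~(1ULL << VIRTIO_F_IOMMU_PLATFORM);
	*features |= (1ULL << VHOST_USER_F_PROTOCOL_FEATURES);

	return 0;
}
```

This keeps the IOMMU requirement as a probe-time check while never exposing the bit to the vhost-user master.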
On Thu, Aug 29, 2019 at 09:59:54AM +0200, Maxime Coquelin wrote: > This patch implements the vDPA .get_protocol_features() > callback. > > Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com> > --- > drivers/net/virtio/virtio_vdpa.c | 13 +++++++++++++ > 1 file changed, 13 insertions(+) > > diff --git a/drivers/net/virtio/virtio_vdpa.c b/drivers/net/virtio/virtio_vdpa.c > index 9e2af8313..fc52a8e92 100644 > --- a/drivers/net/virtio/virtio_vdpa.c > +++ b/drivers/net/virtio/virtio_vdpa.c > @@ -204,9 +204,22 @@ virtio_vdpa_get_features(int did, uint64_t *features) > return 0; > } > > +#define VDPA_SUPPORTED_PROTOCOL_FEATURES \ > + (1ULL << VHOST_USER_PROTOCOL_F_REPLY_ACK | \ > + 1ULL << VHOST_USER_PROTOCOL_F_SLAVE_REQ | \ > + 1ULL << VHOST_USER_PROTOCOL_F_SLAVE_SEND_FD | \ > + 1ULL << VHOST_USER_PROTOCOL_F_HOST_NOTIFIER) Looks better to add an empty line here. > +static int > +virtio_vdpa_get_protocol_features(int did __rte_unused, uint64_t *features) > +{ > + *features = VDPA_SUPPORTED_PROTOCOL_FEATURES; > + return 0; > +} > + > static struct rte_vdpa_dev_ops virtio_vdpa_ops = { > .get_queue_num = virtio_vdpa_get_queue_num, > .get_features = virtio_vdpa_get_features, > + .get_protocol_features = virtio_vdpa_get_protocol_features, > }; > > static inline int > -- > 2.21.0 > Reviewed-by: Tiwei Bie <tiwei.bie@intel.com>
On Thu, Aug 29, 2019 at 09:59:56AM +0200, Maxime Coquelin wrote: > This patch implements the vDPA .dev_close() callback. > > Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com> > --- > drivers/net/virtio/virtio_vdpa.c | 52 ++++++++++++++++++++++++++++++++ > 1 file changed, 52 insertions(+) > > diff --git a/drivers/net/virtio/virtio_vdpa.c b/drivers/net/virtio/virtio_vdpa.c > index 13b4dd07d..691844906 100644 > --- a/drivers/net/virtio/virtio_vdpa.c > +++ b/drivers/net/virtio/virtio_vdpa.c > @@ -436,6 +436,33 @@ virtio_vdpa_start(struct virtio_vdpa_device *dev) > return ret; > } > > +static void > +virtio_vdpa_stop(struct virtio_vdpa_device *dev) > +{ > + struct virtio_hw *hw = &dev->hw; > + uint32_t i, nr_vring; > + int vid = dev->vid; > + struct rte_vhost_vring vr; > + uint16_t last_used_idx, last_avail_idx; > + > + nr_vring = rte_vhost_get_vring_num(vid); > + > + vtpci_reset(hw); > + > + for (i = 0; i < nr_vring; i++) { > + rte_vhost_get_vhost_vring(vid, i, &vr); > + > + last_used_idx = vr.used->idx; > + last_avail_idx = vr.avail->idx; This only works in split ring. Regards, Tiwei > + > + rte_vhost_set_vring_base(vid, i, last_avail_idx, > + last_used_idx); > + } > + > + rte_free(dev->vqs); > + dev->vqs = NULL; > +} > +
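The split-ring-only issue Tiwei flags comes from reading `vr.used->idx` and `vr.avail->idx` directly, which do not exist in that form for packed rings. For packed rings, the vring base exchanged via GET/SET_VRING_BASE is not a plain free-running index: per the vhost-user specification, bits 0-14 carry the last avail index and bit 15 the avail wrap counter. A sketch of that encoding (helper names are illustrative, not DPDK API):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Pack a packed-ring avail index and its wrap counter into the
 * 15-bit-index + 1-bit-counter layout used by vhost-user vring base. */
static uint16_t
packed_base_encode(uint16_t idx, bool wrap_counter)
{
	return (uint16_t)((idx & 0x7fff) | ((uint16_t)wrap_counter << 15));
}

/* Inverse operation: recover index and wrap counter from the base. */
static void
packed_base_decode(uint16_t base, uint16_t *idx, bool *wrap_counter)
{
	*idx = base & 0x7fff;
	*wrap_counter = !!(base & 0x8000);
}
```

A packed-ring-aware virtio_vdpa_stop() would need to recover this state from the device rather than from a split-ring used/avail layout.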
On 08/29, Maxime Coquelin wrote:
>There might not be any VHOST_USER_SET_VRING_CALL requests
>sent once the virtio device is ready. When that happens, the vDPA
>device's dev_conf() callback may never be called.
>
>Fixes: 9f9014512822 ("vhost: configure vDPA device after set vring call message")
>Cc: stable@dpdk.org
>Cc: xiaolong.ye@intel.com
>
>Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
>---
> lib/librte_vhost/vhost_user.c | 3 +--
> 1 file changed, 1 insertion(+), 2 deletions(-)
>
>diff --git a/lib/librte_vhost/vhost_user.c b/lib/librte_vhost/vhost_user.c
>index 0b72648a5..b1ea80c52 100644
>--- a/lib/librte_vhost/vhost_user.c
>+++ b/lib/librte_vhost/vhost_user.c
>@@ -2112,8 +2112,7 @@ vhost_user_msg_handler(int vid, int fd)
> did = dev->vdpa_dev_id;
> vdpa_dev = rte_vdpa_get_device(did);
> if (vdpa_dev && virtio_is_ready(dev) &&
>- !(dev->flags & VIRTIO_DEV_VDPA_CONFIGURED) &&
>- msg.request.master == VHOST_USER_SET_VRING_CALL) {
>+ !(dev->flags & VIRTIO_DEV_VDPA_CONFIGURED)) {
> if (vdpa_dev->ops->dev_conf)
> vdpa_dev->ops->dev_conf(dev->vid);
> dev->flags |= VIRTIO_DEV_VDPA_CONFIGURED;
>--
>2.21.0
>
Reviewed-by: Xiaolong Ye <xiaolong.ye@intel.com>
Hi,
> -----Original Message-----
> From: Ye, Xiaolong
> Sent: Monday, September 2, 2019 4:35 PM
> To: Maxime Coquelin <maxime.coquelin@redhat.com>
> Cc: Bie, Tiwei <tiwei.bie@intel.com>; Wang, Zhihong
> <zhihong.wang@intel.com>; amorenoz@redhat.com; Wang, Xiao W
> <xiao.w.wang@intel.com>; dev@dpdk.org; jfreimann@redhat.com;
> stable@dpdk.org
> Subject: Re: [PATCH 02/15] vhost: configure vDPA as soon as the device is
> ready
>
> On 08/29, Maxime Coquelin wrote:
> >There might not be any VHOST_USER_SET_VRING_CALL requests
> >sent once the virtio device is ready. When that happens, the vDPA
> >device's dev_conf() callback may never be called.
> >
> >Fixes: 9f9014512822 ("vhost: configure vDPA device after set vring call
> message")
> >Cc: stable@dpdk.org
> >Cc: xiaolong.ye@intel.com
> >
> >Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
> >---
> > lib/librte_vhost/vhost_user.c | 3 +--
> > 1 file changed, 1 insertion(+), 2 deletions(-)
> >
> >diff --git a/lib/librte_vhost/vhost_user.c b/lib/librte_vhost/vhost_user.c
> >index 0b72648a5..b1ea80c52 100644
> >--- a/lib/librte_vhost/vhost_user.c
> >+++ b/lib/librte_vhost/vhost_user.c
> >@@ -2112,8 +2112,7 @@ vhost_user_msg_handler(int vid, int fd)
> > did = dev->vdpa_dev_id;
> > vdpa_dev = rte_vdpa_get_device(did);
> > if (vdpa_dev && virtio_is_ready(dev) &&
> >- !(dev->flags & VIRTIO_DEV_VDPA_CONFIGURED) &&
> >- msg.request.master ==
> VHOST_USER_SET_VRING_CALL) {
> >+ !(dev->flags & VIRTIO_DEV_VDPA_CONFIGURED)) {
In the early beginning of vhost user messages, there seems to be a VHOST_USER_SET_VRING_CALL with invalid call fd,
not sure if QEMU has any update on this point.
If the virtio_is_ready() is based on that invalid call fd, then vdpa_dev_conf() cannot setup interrupt properly.
I think that's why in our previous implementation, we wait for the real call fd and then call dev_conf().
BRs,
Xiao
On Thu, Aug 29, 2019 at 09:59:58AM +0200, Maxime Coquelin wrote:
> This patch implements the vDPA .get_vfio_group_fd() and
> .get_vfio_device_fd() callbacks.
>
> Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
> ---
> drivers/net/virtio/virtio_vdpa.c | 34 ++++++++++++++++++++++++++++++++
> 1 file changed, 34 insertions(+)
Reviewed-by: Tiwei Bie <tiwei.bie@intel.com>
On Thu, Aug 29, 2019 at 09:59:59AM +0200, Maxime Coquelin wrote:
> This patch implements the vDPA .get_notify_area()
> callback.
>
> Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
> ---
>  drivers/net/virtio/virtio_vdpa.c | 33 ++++++++++++++++++++++++++++++++
>  1 file changed, 33 insertions(+)
>
> diff --git a/drivers/net/virtio/virtio_vdpa.c b/drivers/net/virtio/virtio_vdpa.c
> index e0b2f99ba..e681f527a 100644
> --- a/drivers/net/virtio/virtio_vdpa.c
> +++ b/drivers/net/virtio/virtio_vdpa.c
> @@ -707,6 +707,38 @@ virtio_vdpa_get_vfio_device_fd(int vid)
>  	return list->dev->vfio_dev_fd;
>  }
>
> +static int
> +virtio_vdpa_get_notify_area(int vid, int qid, uint64_t *offset, uint64_t *size)
> +{
> +	int did;
> +	struct internal_list *list;
> +	struct virtio_vdpa_device *dev;
> +	struct vfio_region_info reg = { .argsz = sizeof(reg) };
> +	int ret;
> +
> +	did = rte_vhost_get_vdpa_device_id(vid);
> +	list = find_internal_resource_by_did(did);
> +	if (list == NULL) {
> +		DRV_LOG(ERR, "Invalid device id: %d", did);
> +		return -1;
> +	}
> +
> +	dev = list->dev;
> +
> +	reg.index = dev->hw.notify_region;
> +	ret = ioctl(dev->vfio_dev_fd, VFIO_DEVICE_GET_REGION_INFO, &reg);
> +	if (ret) {
> +		DRV_LOG(ERR, "Get not get device region info: %s",
> +				strerror(errno));
> +		return -1;
> +	}
> +
> +	*offset = dev->vqs[qid].notify_addr - dev->hw.notify_base + reg.offset;
> +	*size = 0x1000;

It would be better to check whether the size is no less
than 0x1000, otherwise it's possible to give guest the
access to other registers of the vdpa device.

Regards,
Tiwei

> +
> +	return 0;
> +}
> +
>  static struct rte_vdpa_dev_ops virtio_vdpa_ops = {
>  	.get_queue_num = virtio_vdpa_get_queue_num,
>  	.get_features = virtio_vdpa_get_features,
> @@ -716,6 +748,7 @@ static struct rte_vdpa_dev_ops virtio_vdpa_ops = {
>  	.set_features = virtio_vdpa_set_features,
>  	.get_vfio_group_fd = virtio_vdpa_get_vfio_group_fd,
>  	.get_vfio_device_fd = virtio_vdpa_get_vfio_device_fd,
> +	.get_notify_area = virtio_vdpa_get_notify_area,
>  };
>
>  static inline int
> --
> 2.21.0
>
On Thu, Aug 29, 2019 at 09:59:55AM +0200, Maxime Coquelin wrote:
> In order to support multi-queue, we need to implement the control
> path. The problem is that both the Vhost-user master and slave use
> VAs in their processes' address spaces as IOVAs, which creates
> collisions between the data ring IOVAs managed by the master, and
> the control ring IOVAs. The trick here is to remap the control
> ring memory to another range, after the slave is aware of the
> master's ranges.
>
> Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
> ---
>  drivers/net/virtio/virtio_vdpa.c | 255 +++++++++++++++++++++++++++++++
>  1 file changed, 255 insertions(+)
>
> diff --git a/drivers/net/virtio/virtio_vdpa.c b/drivers/net/virtio/virtio_vdpa.c
> index fc52a8e92..13b4dd07d 100644
> --- a/drivers/net/virtio/virtio_vdpa.c
> +++ b/drivers/net/virtio/virtio_vdpa.c
> @@ -106,6 +106,127 @@ find_internal_resource_by_dev(struct rte_pci_device *pdev)
>  	return list;
>  }
>
> +static int
> +virtio_vdpa_dma_map_ctrl_queue(struct virtio_vdpa_device *dev, int do_map,
> +		uint64_t iova)
> +{
> +	const struct rte_memzone *mz;
> +	int ret;
> +
> +	/*
> +	 * IOVAs are processes VAs. We cannot use them as the Data and Control
> +	 * paths are run in different processes, which may (does) lead to
> +	 * collusions. The trick here is to fixup Ctrl path IOVAs so that they
> +	 * start after the Data path ranges.
> +	 */
> +	if (do_map) {
> +		mz = dev->cvq->cq.mz;
> +		ret = rte_vfio_container_dma_map(dev->vfio_container_fd,
> +				(uint64_t)(uintptr_t)mz->addr,
> +				iova, mz->len);
> +		if (ret < 0) {
> +			DRV_LOG(ERR, "Failed to map ctrl ring (%d)", ret);
> +			return ret;
> +		}
> +
> +		dev->cvq->vq_ring_mem = iova;
> +		iova += mz->len;
> +
> +		mz = dev->cvq->cq.virtio_net_hdr_mz;
> +		ret = rte_vfio_container_dma_map(dev->vfio_container_fd,
> +				(uint64_t)(uintptr_t)mz->addr,
> +				iova, mz->len);
> +		if (ret < 0) {
> +			DRV_LOG(ERR, "Failed to map ctrl headers (%d)", ret);
> +			return ret;
> +		}

This will allow guest to access the cq.mz and cq.virtio_net_hdr_mz
via the device which may have potential risks.

Regards,
Tiwei

> +
> +		dev->cvq->cq.virtio_net_hdr_mem = iova;
> +	} else {
> +		mz = dev->cvq->cq.mz;
> +		ret = rte_vfio_container_dma_unmap(dev->vfio_container_fd,
> +				(uint64_t)(uintptr_t)mz->addr,
> +				iova, mz->len);
> +		if (ret < 0) {
> +			DRV_LOG(ERR, "Failed to unmap ctrl ring (%d)", ret);
> +			return ret;
> +		}
> +
> +		dev->cvq->vq_ring_mem = 0;
> +		iova += mz->len;
> +
> +		mz = dev->cvq->cq.virtio_net_hdr_mz;
> +		ret = rte_vfio_container_dma_unmap(dev->vfio_container_fd,
> +				(uint64_t)(uintptr_t)mz->addr,
> +				iova, mz->len);
> +		if (ret < 0) {
> +			DRV_LOG(ERR, "Failed to unmap ctrl headers (%d)", ret);
> +			return ret;
> +		}
> +
> +		dev->cvq->cq.virtio_net_hdr_mem = 0;
> +	}
> +
> +	return 0;
> +}
> +
> +static int
> +virtio_vdpa_dma_map(struct virtio_vdpa_device *dev, int do_map)
> +{
> +	uint32_t i;
> +	int ret;
> +	struct rte_vhost_memory *mem = NULL;
> +	int vfio_container_fd;
> +	uint64_t avail_iova = 0;
> +
> +	ret = rte_vhost_get_mem_table(dev->vid, &mem);
> +	if (ret < 0 || !mem) {
> +		DRV_LOG(ERR, "failed to get VM memory layout.");
> +		return ret;
> +	}
> +
> +	vfio_container_fd = dev->vfio_container_fd;
> +
> +	for (i = 0; i < mem->nregions; i++) {
> +		struct rte_vhost_mem_region *reg;
> +
> +		reg = &mem->regions[i];
> +		DRV_LOG(INFO, "%s, region %u: HVA 0x%" PRIx64 ", "
> +			"GPA 0x%" PRIx64 ", size 0x%" PRIx64 ".",
> +			do_map ? "DMA map" : "DMA unmap", i,
> +			reg->host_user_addr, reg->guest_phys_addr, reg->size);
> +
> +		if (reg->guest_phys_addr + reg->size > avail_iova)
> +			avail_iova = reg->guest_phys_addr + reg->size;
> +
> +		if (do_map) {
> +			ret = rte_vfio_container_dma_map(vfio_container_fd,
> +				reg->host_user_addr, reg->guest_phys_addr,
> +				reg->size);
> +			if (ret < 0) {
> +				DRV_LOG(ERR, "DMA map failed.");
> +				goto exit;
> +			}
> +		} else {
> +			ret = rte_vfio_container_dma_unmap(vfio_container_fd,
> +				reg->host_user_addr, reg->guest_phys_addr,
> +				reg->size);
> +			if (ret < 0) {
> +				DRV_LOG(ERR, "DMA unmap failed.");
> +				goto exit;
> +			}
> +		}
> +	}
> +
> +	if (dev->cvq)
> +		ret = virtio_vdpa_dma_map_ctrl_queue(dev, do_map, avail_iova);
> +
> +exit:
> +	free(mem);
> +
> +	return ret;
> +}
> +
>  static int
>  virtio_vdpa_vfio_setup(struct virtio_vdpa_device *dev)
>  {
> @@ -216,10 +337,144 @@ virtio_vdpa_get_protocol_features(int did __rte_unused, uint64_t *features)
>  	return 0;
>  }
>
> +static uint64_t
> +hva_to_gpa(int vid, uint64_t hva)
> +{
> +	struct rte_vhost_memory *mem = NULL;
> +	struct rte_vhost_mem_region *reg;
> +	uint32_t i;
> +	uint64_t gpa = 0;
> +
> +	if (rte_vhost_get_mem_table(vid, &mem) < 0)
> +		goto exit;
> +
> +	for (i = 0; i < mem->nregions; i++) {
> +		reg = &mem->regions[i];
> +
> +		if (hva >= reg->host_user_addr &&
> +				hva < reg->host_user_addr + reg->size) {
> +			gpa = hva - reg->host_user_addr + reg->guest_phys_addr;
> +			break;
> +		}
> +	}
> +
> +exit:
> +	if (mem)
> +		free(mem);
> +	return gpa;
> +}
> +
> +static int
> +virtio_vdpa_start(struct virtio_vdpa_device *dev)
> +{
> +	struct virtio_hw *hw = &dev->hw;
> +	int i, vid, nr_vring, ret;
> +	struct rte_vhost_vring vr;
> +	struct virtio_pmd_ctrl ctrl;
> +	int dlen[1];
> +
> +	vid = dev->vid;
> +	nr_vring = rte_vhost_get_vring_num(vid);
> +
> +	if (dev->vqs)
> +		rte_free(dev->vqs);
> +
> +	dev->vqs = rte_zmalloc("virtio_vdpa", sizeof(*dev->vqs) * nr_vring, 0);
> +
> +	for (i = 0; i < nr_vring; i++) {
> +		struct virtqueue *vq = &dev->vqs[i];
> +
> +		rte_vhost_get_vhost_vring(vid, i, &vr);
> +
> +		vq->vq_queue_index = i;
> +		vq->vq_nentries = vr.size;
> +		vq->vq_ring_mem = hva_to_gpa(vid, (uint64_t)(uintptr_t)vr.desc);
> +		if (vq->vq_ring_mem == 0) {
> +			DRV_LOG(ERR, "Fail to get GPA for descriptor ring.");
> +			ret = -1;
> +			goto out_free_vqs;
> +		}
> +
> +		ret = VTPCI_OPS(hw)->setup_queue(hw, vq);
> +		if (ret) {
> +			DRV_LOG(ERR, "Fail to setup queue.");
> +			goto out_free_vqs;
> +		}
> +	}
> +
> +	if (dev->cvq) {
> +		ret = VTPCI_OPS(hw)->setup_queue(hw, dev->cvq);
> +		if (ret) {
> +			DRV_LOG(ERR, "Fail to setup ctrl queue.");
> +			goto out_free_vqs;
> +		}
> +	}
> +
> +	vtpci_set_status(hw, VIRTIO_CONFIG_STATUS_DRIVER_OK);
> +
> +	if (!dev->cvq)
> +		return 0;
> +
> +	ctrl.hdr.class = VIRTIO_NET_CTRL_MQ;
> +	ctrl.hdr.cmd = VIRTIO_NET_CTRL_MQ_VQ_PAIRS_SET;
> +	memcpy(ctrl.data, &dev->max_queue_pairs, sizeof(uint16_t));
> +
> +	dlen[0] = sizeof(uint16_t);
> +
> +	ret = virtio_send_command(hw->cvq, &ctrl, dlen, 1);
> +	if (ret) {
> +		DRV_LOG(ERR, "Multiqueue configured but send command "
> +			"failed, this is too late now...");
> +		ret = -EINVAL;
> +		goto out_free_vqs;
> +	}
> +
> +	return 0;
> +out_free_vqs:
> +	rte_free(dev->vqs);
> +
> +	return ret;
> +}
> +
> +static int
> +virtio_vdpa_dev_config(int vid)
> +{
> +	int did, ret;
> +	struct internal_list *list;
> +	struct virtio_vdpa_device *dev;
> +
> +	did = rte_vhost_get_vdpa_device_id(vid);
> +	list = find_internal_resource_by_did(did);
> +	if (list == NULL) {
> +		DRV_LOG(ERR, "Invalid device id: %d", did);
> +		return -1;
> +	}
> +
> +	dev = list->dev;
> +	dev->vid = vid;
> +
> +	rte_spinlock_lock(&dev->lock);
> +
> +	ret = virtio_vdpa_dma_map(dev, 1);
> +	if (ret)
> +		goto out_unlock;
> +
> +	ret = virtio_vdpa_start(dev);
> +
> +	if (rte_vhost_host_notifier_ctrl(vid, true) != 0)
> +		DRV_LOG(NOTICE, "vDPA (%d): software relay is used.", did);
> +
> +out_unlock:
> +	rte_spinlock_unlock(&dev->lock);
> +
> +	return ret;
> +}
> +
>  static struct rte_vdpa_dev_ops virtio_vdpa_ops = {
>  	.get_queue_num = virtio_vdpa_get_queue_num,
>  	.get_features = virtio_vdpa_get_features,
>  	.get_protocol_features = virtio_vdpa_get_protocol_features,
> +	.dev_conf = virtio_vdpa_dev_config,
>  };
>
>  static inline int
> --
> 2.21.0
>
On 9/2/19 8:03 AM, Tiwei Bie wrote:
> On Thu, Aug 29, 2019 at 09:59:46AM +0200, Maxime Coquelin wrote:
>> This is preliminary rework for virtio-vdpa driver, in
>> order to avoid conflicts with Virtio PMD headers.
>>
>> Generally, I think it is better not to include kernel
>> headers in RTE headers, especially in the case of Vhost
>> and Virtio which just re-use the kernel definitions,
>> and has no runtime dependencies.
>>
>> In order to not break IFC driver build, the vhost kernel
>> header is now included directly in the driver.
>>
>> Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
>> ---
>>  drivers/net/ifc/ifcvf_vdpa.c | 1 +
>>  lib/librte_vhost/rte_vdpa.h  | 1 -
>>  lib/librte_vhost/rte_vhost.h | 9 ++++-----
>>  3 files changed, 5 insertions(+), 6 deletions(-)
>
> Vhost examples need to be updated as well.

Ha, yes, good catch.
It will be done in v2.

Thanks,
Maxime

> Regards,
> Tiwei
>
>>
>> diff --git a/drivers/net/ifc/ifcvf_vdpa.c b/drivers/net/ifc/ifcvf_vdpa.c
>> index 8de9ef199..40cb15ca8 100644
On 9/2/19 8:14 AM, Tiwei Bie wrote:
> On Thu, Aug 29, 2019 at 09:59:49AM +0200, Maxime Coquelin wrote:
>> The Virtio PCI subsystem IDs need to be specified to
>> prevent it from probing IFC vDPA VFs.
>>
>> Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
>> ---
>>  drivers/net/virtio/virtio_pci.h | 1 +
>>  1 file changed, 1 insertion(+)
>>
>> diff --git a/drivers/net/virtio/virtio_pci.h b/drivers/net/virtio/virtio_pci.h
>> index a38cb45ad..56f89a454 100644
>> --- a/drivers/net/virtio/virtio_pci.h
>> +++ b/drivers/net/virtio/virtio_pci.h
>> @@ -19,6 +19,7 @@ struct virtnet_ctl;
>>  #define VIRTIO_PCI_VENDORID 0x1AF4
>>  #define VIRTIO_PCI_LEGACY_DEVICEID_NET 0x1000
>>  #define VIRTIO_PCI_MODERN_DEVICEID_NET 0x1041
>> +#define VIRTIO_PCI_SUBSY_DEVICEID_NET 0x1100
>
> 0x1100 is the subsystem device ID used by QEMU.
> Maybe naming it VIRTIO_PCI_SUBSYS_DEVICEID_QEMU is better?

Indeed, will do.

Thanks,
Maxime

> Regards,
> Tiwei
>
>>
>>  /* VirtIO ABI version, this must match exactly. */
>>  #define VIRTIO_PCI_ABI_VERSION 0
>> --
>> 2.21.0
>>
On 9/2/19 8:27 AM, Tiwei Bie wrote:
> On Thu, Aug 29, 2019 at 09:59:51AM +0200, Maxime Coquelin wrote:
>> +
>> +RTE_PMD_REGISTER_PCI(net_virtio_vdpa, rte_virtio_vdpa);
>> +RTE_PMD_REGISTER_PCI_TABLE(net_virtio_vdpa, pci_id_virtio_vdpa_map);
>> +RTE_PMD_REGISTER_KMOD_DEP(net_virtio_vdpa, "* vfio-pci");
>> +
>> +RTE_INIT(virtio_vdpa_init_log)
>> +{
>> +	virtio_vdpa_logtype = rte_log_register("pmd.net.virtio_vdpa");
>
> We need to use pmd.net.virtio.init to enable the logs in
> virtio_pci.c when using virtio vdpa. Maybe naming above
> logtype e.g. "pmd.net.virtio.vdpa" will be more consistent?

It makes perfect sense, I will do the change.

> Regards,
> Tiwei
>
>> +	if (virtio_vdpa_logtype >= 0)
>> +		rte_log_set_level(virtio_vdpa_logtype, RTE_LOG_NOTICE);
>> +}
>> +
>> --
>> 2.21.0
>>
On 9/2/19 8:43 AM, Tiwei Bie wrote:
> On Thu, Aug 29, 2019 at 09:59:53AM +0200, Maxime Coquelin wrote:
>> This patch implements the vDPA .get_features() callback.
>>
>> Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
>> ---
>>  drivers/net/virtio/virtio_vdpa.c | 32 ++++++++++++++++++++++++++++++++
>>  1 file changed, 32 insertions(+)
>>
>> diff --git a/drivers/net/virtio/virtio_vdpa.c b/drivers/net/virtio/virtio_vdpa.c
>> index 07ff1e090..9e2af8313 100644
>> --- a/drivers/net/virtio/virtio_vdpa.c
>> +++ b/drivers/net/virtio/virtio_vdpa.c
>> @@ -173,8 +173,40 @@ virtio_vdpa_get_queue_num(int did, uint32_t *queue_num)
>>  	return 0;
>>  }
>>
>> +static int
>> +virtio_vdpa_get_features(int did, uint64_t *features)
>> +{
>> +	struct internal_list *list;
>> +	struct virtio_vdpa_device *dev;
>> +
>> +	list = find_internal_resource_by_did(did);
>> +	if (list == NULL) {
>> +		DRV_LOG(ERR, "Invalid device id: %d", did);
>> +		return -1;
>> +	}
>> +
>> +	dev = list->dev;
>> +
>> +	*features = VTPCI_OPS(&dev->hw)->get_features(&dev->hw);
>> +
>> +	if (!(*features & (1ULL << VIRTIO_F_IOMMU_PLATFORM))) {
>> +		DRV_LOG(ERR, "Device does not support IOMMU");
>> +		return -1;
>> +	}
>> +
>> +	if (*features & (1ULL << VIRTIO_NET_F_CTRL_VQ))
>> +		dev->has_ctrl_vq = true;
>> +	else
>> +		dev->has_ctrl_vq = false;
>> +
>> +	*features |= (1ULL << VHOST_USER_F_PROTOCOL_FEATURES);
>
> We need to drop the VIRTIO_F_IOMMU_PLATFORM bit before
> reporting features to the upper layer as the vDPA backend
> doesn't support the vIOMMU yet.

Indeed, it only works for now because Virtio-user does not advertise
the IOMMU feature.

> Regards,
> Tiwei
>
>> +
>> +	return 0;
>> +}
>> +
>>  static struct rte_vdpa_dev_ops virtio_vdpa_ops = {
>>  	.get_queue_num = virtio_vdpa_get_queue_num,
>> +	.get_features = virtio_vdpa_get_features,
>>  };
>>
>>  static inline int
>> --
>> 2.21.0
>>
On 9/2/19 9:07 AM, Tiwei Bie wrote:
> On Thu, Aug 29, 2019 at 09:59:56AM +0200, Maxime Coquelin wrote:
>> This patch implements the vDPA .dev_close() callback.
>>
>> Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
>> ---
>>  drivers/net/virtio/virtio_vdpa.c | 52 ++++++++++++++++++++++++++++++++
>>  1 file changed, 52 insertions(+)
>>
>> diff --git a/drivers/net/virtio/virtio_vdpa.c b/drivers/net/virtio/virtio_vdpa.c
>> index 13b4dd07d..691844906 100644
>> --- a/drivers/net/virtio/virtio_vdpa.c
>> +++ b/drivers/net/virtio/virtio_vdpa.c
>> @@ -436,6 +436,33 @@ virtio_vdpa_start(struct virtio_vdpa_device *dev)
>>  	return ret;
>>  }
>>
>> +static void
>> +virtio_vdpa_stop(struct virtio_vdpa_device *dev)
>> +{
>> +	struct virtio_hw *hw = &dev->hw;
>> +	uint32_t i, nr_vring;
>> +	int vid = dev->vid;
>> +	struct rte_vhost_vring vr;
>> +	uint16_t last_used_idx, last_avail_idx;
>> +
>> +	nr_vring = rte_vhost_get_vring_num(vid);
>> +
>> +	vtpci_reset(hw);
>> +
>> +	for (i = 0; i < nr_vring; i++) {
>> +		rte_vhost_get_vhost_vring(vid, i, &vr);
>> +
>> +		last_used_idx = vr.used->idx;
>> +		last_avail_idx = vr.avail->idx;
>
> This only works in split ring.

Right, thanks for spotting it.
For the v2, I'll try to fix it and test it with Jason's Qemu series
adding packed ring support.

> Regards,
> Tiwei
>
>> +
>> +		rte_vhost_set_vring_base(vid, i, last_avail_idx,
>> +				last_used_idx);
>> +	}
>> +
>> +	rte_free(dev->vqs);
>> +	dev->vqs = NULL;
>> +}
>> +
On 9/2/19 11:02 AM, Wang, Xiao W wrote:
> Hi,
>
>> -----Original Message-----
>> From: Ye, Xiaolong
>> Sent: Monday, September 2, 2019 4:35 PM
>> To: Maxime Coquelin <maxime.coquelin@redhat.com>
>> Cc: Bie, Tiwei <tiwei.bie@intel.com>; Wang, Zhihong
>> <zhihong.wang@intel.com>; amorenoz@redhat.com; Wang, Xiao W
>> <xiao.w.wang@intel.com>; dev@dpdk.org; jfreimann@redhat.com;
>> stable@dpdk.org
>> Subject: Re: [PATCH 02/15] vhost: configure vDPA as soon as the device is
>> ready
>>
>> On 08/29, Maxime Coquelin wrote:
>>> There might not be any VHOST_USER_SET_VRING_CALL requests
>>> sent once the virtio device is ready. When that happens, the vDPA
>>> device's dev_conf() callback may never be called.
>>>
>>> Fixes: 9f9014512822 ("vhost: configure vDPA device after set vring call
>>> message")
>>> Cc: stable@dpdk.org
>>> Cc: xiaolong.ye@intel.com
>>>
>>> Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
>>> ---
>>>  lib/librte_vhost/vhost_user.c | 3 +--
>>>  1 file changed, 1 insertion(+), 2 deletions(-)
>>>
>>> diff --git a/lib/librte_vhost/vhost_user.c b/lib/librte_vhost/vhost_user.c
>>> index 0b72648a5..b1ea80c52 100644
>>> --- a/lib/librte_vhost/vhost_user.c
>>> +++ b/lib/librte_vhost/vhost_user.c
>>> @@ -2112,8 +2112,7 @@ vhost_user_msg_handler(int vid, int fd)
>>>  		did = dev->vdpa_dev_id;
>>>  		vdpa_dev = rte_vdpa_get_device(did);
>>>  		if (vdpa_dev && virtio_is_ready(dev) &&
>>> -			!(dev->flags & VIRTIO_DEV_VDPA_CONFIGURED) &&
>>> -			msg.request.master == VHOST_USER_SET_VRING_CALL) {
>>> +			!(dev->flags & VIRTIO_DEV_VDPA_CONFIGURED)) {
>
> In the early beginning of vhost user messages, there seems to be a
> VHOST_USER_SET_VRING_CALL with invalid call fd, not sure if QEMU has
> any update on this point.
> If the virtio_is_ready() is based on that invalid call fd, then
> vdpa_dev_conf() cannot setup interrupt properly.
> I think that's why in our previous implementation, we wait for the
> real call fd and then call dev_conf().

I think that when we receive the first SET_VRING_CALL request from
Qemu, virtio_is_ready() should be false because the ring addresses
won't have been received yet.

Did it fix a real issue, or was it based on code/logs review?

Thanks for the explanations,
Maxime

> BRs,
> Xiao
>
On 9/3/19 7:02 AM, Tiwei Bie wrote:
> On Thu, Aug 29, 2019 at 09:59:59AM +0200, Maxime Coquelin wrote:
>> This patch implements the vDPA .get_notify_area()
>> callback.
>>
>> Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
>> ---
>>  drivers/net/virtio/virtio_vdpa.c | 33 ++++++++++++++++++++++++++++++++
>>  1 file changed, 33 insertions(+)
>>
>> diff --git a/drivers/net/virtio/virtio_vdpa.c b/drivers/net/virtio/virtio_vdpa.c
>> index e0b2f99ba..e681f527a 100644
>> --- a/drivers/net/virtio/virtio_vdpa.c
>> +++ b/drivers/net/virtio/virtio_vdpa.c
>> @@ -707,6 +707,38 @@ virtio_vdpa_get_vfio_device_fd(int vid)
>>  	return list->dev->vfio_dev_fd;
>>  }
>>
>> +static int
>> +virtio_vdpa_get_notify_area(int vid, int qid, uint64_t *offset, uint64_t *size)
>> +{
>> +	int did;
>> +	struct internal_list *list;
>> +	struct virtio_vdpa_device *dev;
>> +	struct vfio_region_info reg = { .argsz = sizeof(reg) };
>> +	int ret;
>> +
>> +	did = rte_vhost_get_vdpa_device_id(vid);
>> +	list = find_internal_resource_by_did(did);
>> +	if (list == NULL) {
>> +		DRV_LOG(ERR, "Invalid device id: %d", did);
>> +		return -1;
>> +	}
>> +
>> +	dev = list->dev;
>> +
>> +	reg.index = dev->hw.notify_region;
>> +	ret = ioctl(dev->vfio_dev_fd, VFIO_DEVICE_GET_REGION_INFO, &reg);
>> +	if (ret) {
>> +		DRV_LOG(ERR, "Get not get device region info: %s",
>> +				strerror(errno));
>> +		return -1;
>> +	}
>> +
>> +	*offset = dev->vqs[qid].notify_addr - dev->hw.notify_base + reg.offset;
>> +	*size = 0x1000;
>
> It would be better to check whether the size is no less
> than 0x1000, otherwise it's possible to give guest the
> access to other registers of the vdpa device.

Correct, if the offset is not page-aligned, it would mean one page may
be mapped while not needed. I took the ifcvf driver as example and
forgot to fix it.
(Maybe it should also be fixed in the ifcvf driver?)

Thanks,
Maxime

> Regards,
> Tiwei
>
>> +
>> +	return 0;
>> +}
>> +
>>  static struct rte_vdpa_dev_ops virtio_vdpa_ops = {
>>  	.get_queue_num = virtio_vdpa_get_queue_num,
>>  	.get_features = virtio_vdpa_get_features,
>> @@ -716,6 +748,7 @@ static struct rte_vdpa_dev_ops virtio_vdpa_ops = {
>>  	.set_features = virtio_vdpa_set_features,
>>  	.get_vfio_group_fd = virtio_vdpa_get_vfio_group_fd,
>>  	.get_vfio_device_fd = virtio_vdpa_get_vfio_device_fd,
>> +	.get_notify_area = virtio_vdpa_get_notify_area,
>>  };
>>
>>  static inline int
>> --
>> 2.21.0
>>
On 9/3/19 7:30 AM, Tiwei Bie wrote:
> On Thu, Aug 29, 2019 at 09:59:55AM +0200, Maxime Coquelin wrote:
>> In order to support multi-queue, we need to implement the control
>> path. The problem is that both the Vhost-user master and slave use
>> VAs in their processes' address spaces as IOVAs, which creates
>> collisions between the data ring IOVAs managed by the master, and
>> the control ring IOVAs. The trick here is to remap the control
>> ring memory to another range, after the slave is aware of the
>> master's ranges.
>>
>> Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
>> ---
>>  drivers/net/virtio/virtio_vdpa.c | 255 +++++++++++++++++++++++++++++++
>>  1 file changed, 255 insertions(+)
>>
>> diff --git a/drivers/net/virtio/virtio_vdpa.c b/drivers/net/virtio/virtio_vdpa.c
>> index fc52a8e92..13b4dd07d 100644
>> --- a/drivers/net/virtio/virtio_vdpa.c
>> +++ b/drivers/net/virtio/virtio_vdpa.c
>> @@ -106,6 +106,127 @@ find_internal_resource_by_dev(struct rte_pci_device *pdev)
>>  	return list;
>>  }
>>
>> +static int
>> +virtio_vdpa_dma_map_ctrl_queue(struct virtio_vdpa_device *dev, int do_map,
>> +		uint64_t iova)
>> +{
>> +	const struct rte_memzone *mz;
>> +	int ret;
>> +
>> +	/*
>> +	 * IOVAs are processes VAs. We cannot use them as the Data and Control
>> +	 * paths are run in different processes, which may (does) lead to
>> +	 * collusions. The trick here is to fixup Ctrl path IOVAs so that they
>> +	 * start after the Data path ranges.
>> +	 */
>> +	if (do_map) {
>> +		mz = dev->cvq->cq.mz;
>> +		ret = rte_vfio_container_dma_map(dev->vfio_container_fd,
>> +				(uint64_t)(uintptr_t)mz->addr,
>> +				iova, mz->len);
>> +		if (ret < 0) {
>> +			DRV_LOG(ERR, "Failed to map ctrl ring (%d)", ret);
>> +			return ret;
>> +		}
>> +
>> +		dev->cvq->vq_ring_mem = iova;
>> +		iova += mz->len;
>> +
>> +		mz = dev->cvq->cq.virtio_net_hdr_mz;
>> +		ret = rte_vfio_container_dma_map(dev->vfio_container_fd,
>> +				(uint64_t)(uintptr_t)mz->addr,
>> +				iova, mz->len);
>> +		if (ret < 0) {
>> +			DRV_LOG(ERR, "Failed to map ctrl headers (%d)", ret);
>> +			return ret;
>> +		}
>
> This will allow guest to access the cq.mz and cq.virtio_net_hdr_mz
> via the device which may have potential risks.

I get what you mean, but I'm not sure I see how we could avoid that.
AFAIU, we need to map the control queue in the device IOMMU, otherwise
how could the host (in case of virtual device) or the NIC (in case of
Virtio offload) access the ring?

Any thoughts?

Thanks,
Maxime

> Regards,
> Tiwei
>
>> +
>> +		dev->cvq->cq.virtio_net_hdr_mem = iova;
>> +	} else {
>> +		mz = dev->cvq->cq.mz;
>> +		ret = rte_vfio_container_dma_unmap(dev->vfio_container_fd,
>> +				(uint64_t)(uintptr_t)mz->addr,
>> +				iova, mz->len);
>> +		if (ret < 0) {
>> +			DRV_LOG(ERR, "Failed to unmap ctrl ring (%d)", ret);
>> +			return ret;
>> +		}
>> +
>> +		dev->cvq->vq_ring_mem = 0;
>> +		iova += mz->len;
>> +
>> +		mz = dev->cvq->cq.virtio_net_hdr_mz;
>> +		ret = rte_vfio_container_dma_unmap(dev->vfio_container_fd,
>> +				(uint64_t)(uintptr_t)mz->addr,
>> +				iova, mz->len);
>> +		if (ret < 0) {
>> +			DRV_LOG(ERR, "Failed to unmap ctrl headers (%d)", ret);
>> +			return ret;
>> +		}
>> +
>> +		dev->cvq->cq.virtio_net_hdr_mem = 0;
>> +	}
>> +
>> +	return 0;
>> +}
>> +
>> +static int
>> +virtio_vdpa_dma_map(struct virtio_vdpa_device *dev, int do_map)
>> +{
>> +	uint32_t i;
>> +	int ret;
>> +	struct rte_vhost_memory *mem = NULL;
>> +	int vfio_container_fd;
>> +	uint64_t avail_iova = 0;
>> +
>> +	ret = rte_vhost_get_mem_table(dev->vid, &mem);
>> +	if (ret < 0 || !mem) {
>> +		DRV_LOG(ERR, "failed to get VM memory layout.");
>> +		return ret;
>> +	}
>> +
>> +	vfio_container_fd = dev->vfio_container_fd;
>> +
>> +	for (i = 0; i < mem->nregions; i++) {
>> +		struct rte_vhost_mem_region *reg;
>> +
>> +		reg = &mem->regions[i];
>> +		DRV_LOG(INFO, "%s, region %u: HVA 0x%" PRIx64 ", "
>> +			"GPA 0x%" PRIx64 ", size 0x%" PRIx64 ".",
>> +			do_map ? "DMA map" : "DMA unmap", i,
>> +			reg->host_user_addr, reg->guest_phys_addr, reg->size);
>> +
>> +		if (reg->guest_phys_addr + reg->size > avail_iova)
>> +			avail_iova = reg->guest_phys_addr + reg->size;
>> +
>> +		if (do_map) {
>> +			ret = rte_vfio_container_dma_map(vfio_container_fd,
>> +				reg->host_user_addr, reg->guest_phys_addr,
>> +				reg->size);
>> +			if (ret < 0) {
>> +				DRV_LOG(ERR, "DMA map failed.");
>> +				goto exit;
>> +			}
>> +		} else {
>> +			ret = rte_vfio_container_dma_unmap(vfio_container_fd,
>> +				reg->host_user_addr, reg->guest_phys_addr,
>> +				reg->size);
>> +			if (ret < 0) {
>> +				DRV_LOG(ERR, "DMA unmap failed.");
>> +				goto exit;
>> +			}
>> +		}
>> +	}
>> +
>> +	if (dev->cvq)
>> +		ret = virtio_vdpa_dma_map_ctrl_queue(dev, do_map, avail_iova);
>> +
>> +exit:
>> +	free(mem);
>> +
>> +	return ret;
>> +}
>> +
>>  static int
>>  virtio_vdpa_vfio_setup(struct virtio_vdpa_device *dev)
>>  {
>> @@ -216,10 +337,144 @@ virtio_vdpa_get_protocol_features(int did __rte_unused, uint64_t *features)
>>  	return 0;
>>  }
>>
>> +static uint64_t
>> +hva_to_gpa(int vid, uint64_t hva)
>> +{
>> +	struct rte_vhost_memory *mem = NULL;
>> +	struct rte_vhost_mem_region *reg;
>> +	uint32_t i;
>> +	uint64_t gpa = 0;
>> +
>> +	if (rte_vhost_get_mem_table(vid, &mem) < 0)
>> +		goto exit;
>> +
>> +	for (i = 0; i < mem->nregions; i++) {
>> +		reg = &mem->regions[i];
>> +
>> +		if (hva >= reg->host_user_addr &&
>> +				hva < reg->host_user_addr + reg->size) {
>> +			gpa = hva - reg->host_user_addr + reg->guest_phys_addr;
>> +			break;
>> +		}
>> +	}
>> +
>> +exit:
>> +	if (mem)
>> +		free(mem);
>> +	return gpa;
>> +}
>> +
>> +static int
>> +virtio_vdpa_start(struct virtio_vdpa_device *dev)
>> +{
>> +	struct virtio_hw *hw = &dev->hw;
>> +	int i, vid, nr_vring, ret;
>> +	struct rte_vhost_vring vr;
>> +	struct virtio_pmd_ctrl ctrl;
>> +	int dlen[1];
>> +
>> +	vid = dev->vid;
>> +	nr_vring = rte_vhost_get_vring_num(vid);
>> +
>> +	if (dev->vqs)
>> +		rte_free(dev->vqs);
>> +
>> +	dev->vqs = rte_zmalloc("virtio_vdpa", sizeof(*dev->vqs) * nr_vring, 0);
>> +
>> +	for (i = 0; i < nr_vring; i++) {
>> +		struct virtqueue *vq = &dev->vqs[i];
>> +
>> +		rte_vhost_get_vhost_vring(vid, i, &vr);
>> +
>> +		vq->vq_queue_index = i;
>> +		vq->vq_nentries = vr.size;
>> +		vq->vq_ring_mem = hva_to_gpa(vid, (uint64_t)(uintptr_t)vr.desc);
>> +		if (vq->vq_ring_mem == 0) {
>> +			DRV_LOG(ERR, "Fail to get GPA for descriptor ring.");
>> +			ret = -1;
>> +			goto out_free_vqs;
>> +		}
>> +
>> +		ret = VTPCI_OPS(hw)->setup_queue(hw, vq);
>> +		if (ret) {
>> +			DRV_LOG(ERR, "Fail to setup queue.");
>> +			goto out_free_vqs;
>> +		}
>> +	}
>> +
>> +	if (dev->cvq) {
>> +		ret = VTPCI_OPS(hw)->setup_queue(hw, dev->cvq);
>> +		if (ret) {
>> +			DRV_LOG(ERR, "Fail to setup ctrl queue.");
>> +			goto out_free_vqs;
>> +		}
>> +	}
>> +
>> +	vtpci_set_status(hw, VIRTIO_CONFIG_STATUS_DRIVER_OK);
>> +
>> +	if (!dev->cvq)
>> +		return 0;
>> +
>> +	ctrl.hdr.class = VIRTIO_NET_CTRL_MQ;
>> +	ctrl.hdr.cmd = VIRTIO_NET_CTRL_MQ_VQ_PAIRS_SET;
>> +	memcpy(ctrl.data, &dev->max_queue_pairs, sizeof(uint16_t));
>> +
>> +	dlen[0] = sizeof(uint16_t);
>> +
>> +	ret = virtio_send_command(hw->cvq, &ctrl, dlen, 1);
>> +	if (ret) {
>> +		DRV_LOG(ERR, "Multiqueue configured but send command "
>> +			"failed, this is too late now...");
>> +		ret = -EINVAL;
>> +		goto out_free_vqs;
>> +	}
>> +
>> +	return 0;
>> +out_free_vqs:
>> +	rte_free(dev->vqs);
>> +
>> +	return ret;
>> +}
>> +
>> +static int
>> +virtio_vdpa_dev_config(int vid)
>> +{
>> +	int did, ret;
>> +	struct internal_list *list;
>> +	struct virtio_vdpa_device *dev;
>> +
>> +	did = rte_vhost_get_vdpa_device_id(vid);
>> +	list = find_internal_resource_by_did(did);
>> +	if (list == NULL) {
>> +		DRV_LOG(ERR, "Invalid device id: %d", did);
>> +		return -1;
>> +	}
>> +
>> +	dev = list->dev;
>> +	dev->vid = vid;
>> +
>> +	rte_spinlock_lock(&dev->lock);
>> +
>> +	ret = virtio_vdpa_dma_map(dev, 1);
>> +	if (ret)
>> +		goto out_unlock;
>> +
>> +	ret = virtio_vdpa_start(dev);
>> +
>> +	if (rte_vhost_host_notifier_ctrl(vid, true) != 0)
>> +		DRV_LOG(NOTICE, "vDPA (%d): software relay is used.", did);
>> +
>> +out_unlock:
>> +	rte_spinlock_unlock(&dev->lock);
>> +
>> +	return ret;
>> +}
>> +
>>  static struct rte_vdpa_dev_ops virtio_vdpa_ops = {
>>  	.get_queue_num = virtio_vdpa_get_queue_num,
>>  	.get_features = virtio_vdpa_get_features,
>>  	.get_protocol_features = virtio_vdpa_get_protocol_features,
>> +	.dev_conf = virtio_vdpa_dev_config,
>>  };
>>
>>  static inline int
>> --
>> 2.21.0
>>
On Tue, Sep 03, 2019 at 09:36:54AM +0200, Maxime Coquelin wrote: > On 9/3/19 7:02 AM, Tiwei Bie wrote: > > On Thu, Aug 29, 2019 at 09:59:59AM +0200, Maxime Coquelin wrote: > >> This patch implements the vDPA .get_notify_area() > >> callback. > >> > >> Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com> > >> --- > >> drivers/net/virtio/virtio_vdpa.c | 33 ++++++++++++++++++++++++++++++++ > >> 1 file changed, 33 insertions(+) > >> > >> diff --git a/drivers/net/virtio/virtio_vdpa.c b/drivers/net/virtio/virtio_vdpa.c > >> index e0b2f99ba..e681f527a 100644 > >> --- a/drivers/net/virtio/virtio_vdpa.c > >> +++ b/drivers/net/virtio/virtio_vdpa.c > >> @@ -707,6 +707,38 @@ virtio_vdpa_get_vfio_device_fd(int vid) > >> return list->dev->vfio_dev_fd; > >> } > >> > >> +static int > >> +virtio_vdpa_get_notify_area(int vid, int qid, uint64_t *offset, uint64_t *size) > >> +{ > >> + int did; > >> + struct internal_list *list; > >> + struct virtio_vdpa_device *dev; > >> + struct vfio_region_info reg = { .argsz = sizeof(reg) }; > >> + int ret; > >> + > >> + did = rte_vhost_get_vdpa_device_id(vid); > >> + list = find_internal_resource_by_did(did); > >> + if (list == NULL) { > >> + DRV_LOG(ERR, "Invalid device id: %d", did); > >> + return -1; > >> + } > >> + > >> + dev = list->dev; > >> + > >> + reg.index = dev->hw.notify_region; > >> + ret = ioctl(dev->vfio_dev_fd, VFIO_DEVICE_GET_REGION_INFO, &reg); > >> + if (ret) { > >> + DRV_LOG(ERR, "Failed to get device region info: %s", > >> + strerror(errno)); > >> + return -1; > >> + } > >> + > >> + *offset = dev->vqs[qid].notify_addr - dev->hw.notify_base + reg.offset; > >> + *size = 0x1000; > > > > It would be better to check whether the size is no less > > than 0x1000, otherwise it's possible to give guest the > > access to other registers of the vdpa device. > > Correct, if offset is not page-aligned, then it would mean one page may > be mapped while not needed. I took the ifcvf driver as example and > forgot to fix it.
(Maybe it should also be fixed in ifcvf driver?) The ifcvf hardware will make sure that the notify register will stay in a separate page, so the size is hardcoded in ifcvf driver. Regards, Tiwei > > Thanks, > Maxime > > Regards, > > Tiwei > > > >> + > >> + return 0; > >> +} > >> + > >> static struct rte_vdpa_dev_ops virtio_vdpa_ops = { > >> .get_queue_num = virtio_vdpa_get_queue_num, > >> .get_features = virtio_vdpa_get_features, > >> @@ -716,6 +748,7 @@ static struct rte_vdpa_dev_ops virtio_vdpa_ops = { > >> .set_features = virtio_vdpa_set_features, > >> .get_vfio_group_fd = virtio_vdpa_get_vfio_group_fd, > >> .get_vfio_device_fd = virtio_vdpa_get_vfio_device_fd, > >> + .get_notify_area = virtio_vdpa_get_notify_area, > >> }; > >> > >> static inline int > >> -- > >> 2.21.0 > >>
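Tiwei's remark above (check that the region leaves at least 0x1000 bytes before reporting a full page) can be captured in a small standalone helper. This is an illustrative sketch, not the driver's actual code: notify_area_sketch() and its parameter names are assumptions; reg_offset/reg_size play the role of the VFIO region info, and a failing return would mean falling back to the software relay.

```c
#include <stdint.h>

#define NOTIFY_PAGE_SIZE 0x1000ULL

/* Compute the notification area reported by .get_notify_area().
 * 'door' is the queue doorbell's byte offset inside the notify BAR
 * region (dev->vqs[qid].notify_addr - dev->hw.notify_base in the
 * patch). Per the review comment, refuse to report a full page when
 * fewer than 0x1000 bytes remain in the region, since mapping a whole
 * page would expose neighbouring device registers to the guest.
 * Returns 0 on success, -1 when the caller should fall back to the
 * software relay. */
int
notify_area_sketch(uint64_t reg_offset, uint64_t reg_size, uint64_t door,
		   uint64_t *offset, uint64_t *size)
{
	if (door >= reg_size || reg_size - door < NOTIFY_PAGE_SIZE)
		return -1;

	*offset = reg_offset + door;
	*size = NOTIFY_PAGE_SIZE;

	return 0;
}
```

The IFCVF driver can hardcode the size precisely because, as noted above, its hardware guarantees each notify register sits in its own page.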
On Tue, Sep 03, 2019 at 09:40:25AM +0200, Maxime Coquelin wrote: > On 9/3/19 7:30 AM, Tiwei Bie wrote: > > On Thu, Aug 29, 2019 at 09:59:55AM +0200, Maxime Coquelin wrote: > >> In order to support multi-queue, we need to implement the control > >> path. The problem is that both the Vhost-user master and slave use > >> VAs in their processes' address spaces as IOVAs, which creates > >> collisions between the data ring IOVAs managed by the master, and > >> the Control ring IOVAs. The trick here is to remap the Control > >> ring memory to another range, after the slave is aware of master's > >> ranges. > >> > >> Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com> > >> --- > >> drivers/net/virtio/virtio_vdpa.c | 255 +++++++++++++++++++++++++++++++ > >> 1 file changed, 255 insertions(+) > >> > >> diff --git a/drivers/net/virtio/virtio_vdpa.c b/drivers/net/virtio/virtio_vdpa.c > >> index fc52a8e92..13b4dd07d 100644 > >> --- a/drivers/net/virtio/virtio_vdpa.c > >> +++ b/drivers/net/virtio/virtio_vdpa.c > >> @@ -106,6 +106,127 @@ find_internal_resource_by_dev(struct rte_pci_device *pdev) > >> return list; > >> } > >> > >> +static int > >> +virtio_vdpa_dma_map_ctrl_queue(struct virtio_vdpa_device *dev, int do_map, > >> + uint64_t iova) > >> +{ > >> + const struct rte_memzone *mz; > >> + int ret; > >> + > >> + /* > >> + * IOVAs are processes VAs. We cannot use them as the Data and Control > >> + * paths are run in different processes, which may (does) lead to > >> + * collisions. The trick here is to fixup Ctrl path IOVAs so that they > >> + * start after the Data path ranges. 
> >> + */ > >> + if (do_map) { > >> + mz = dev->cvq->cq.mz; > >> + ret = rte_vfio_container_dma_map(dev->vfio_container_fd, > >> + (uint64_t)(uintptr_t)mz->addr, > >> + iova, mz->len); > >> + if (ret < 0) { > >> + DRV_LOG(ERR, "Failed to map ctrl ring (%d)", ret); > >> + return ret; > >> + } > >> + > >> + dev->cvq->vq_ring_mem = iova; > >> + iova += mz->len; > >> + > >> + mz = dev->cvq->cq.virtio_net_hdr_mz; > >> + ret = rte_vfio_container_dma_map(dev->vfio_container_fd, > >> + (uint64_t)(uintptr_t)mz->addr, > >> + iova, mz->len); > >> + if (ret < 0) { > >> + DRV_LOG(ERR, "Failed to map ctrl headers (%d)", ret); > >> + return ret; > >> + } > > > > This will allow guest to access the cq.mz and cq.virtio_net_hdr_mz > > via the device which may have potential risks. > > I get what you mean, but I'm not sure to see how we could avoid that. > AFAIU, we need to map the control queue in the device IOMMU, otherwise > how could the host (in case of virtual device) or the NIC (in case of > Virtio offload), could access the ring? > Any thoughts? I also don't see a way to avoid that. That's why I said in below thread that I think the control queue based interface seems not a quite good interface for a backend device: https://lkml.org/lkml/2019/9/2/934 In IFCVF NIC, we added a MMIO based interface to replace control queue for the multiqueue setup in vDPA mode. Jason is proposing some changes to make virtio device suitable for backend device. I'm not sure whether it's possible to cover this case as well.. 
Regards, Tiwei > > Thanks, > Maxime > > Regards, > > Tiwei > > > >> + > >> + dev->cvq->cq.virtio_net_hdr_mem = iova; > >> + } else { > >> + mz = dev->cvq->cq.mz; > >> + ret = rte_vfio_container_dma_unmap(dev->vfio_container_fd, > >> + (uint64_t)(uintptr_t)mz->addr, > >> + iova, mz->len); > >> + if (ret < 0) { > >> + DRV_LOG(ERR, "Failed to unmap ctrl ring (%d)", ret); > >> + return ret; > >> + } > >> + > >> + dev->cvq->vq_ring_mem = 0; > >> + iova += mz->len; > >> + > >> + mz = dev->cvq->cq.virtio_net_hdr_mz; > >> + ret = rte_vfio_container_dma_unmap(dev->vfio_container_fd, > >> + (uint64_t)(uintptr_t)mz->addr, > >> + iova, mz->len); > >> + if (ret < 0) { > >> + DRV_LOG(ERR, "Failed to unmap ctrl headers (%d)", ret); > >> + return ret; > >> + } > >> + > >> + dev->cvq->cq.virtio_net_hdr_mem = 0; > >> + } > >> + > >> + return 0; > >> +} > >> + > >> +static int > >> +virtio_vdpa_dma_map(struct virtio_vdpa_device *dev, int do_map) > >> +{ > >> + uint32_t i; > >> + int ret; > >> + struct rte_vhost_memory *mem = NULL; > >> + int vfio_container_fd; > >> + uint64_t avail_iova = 0; > >> + > >> + ret = rte_vhost_get_mem_table(dev->vid, &mem); > >> + if (ret < 0 || !mem) { > >> + DRV_LOG(ERR, "failed to get VM memory layout."); > >> + return ret; > >> + } > >> + > >> + vfio_container_fd = dev->vfio_container_fd; > >> + > >> + for (i = 0; i < mem->nregions; i++) { > >> + struct rte_vhost_mem_region *reg; > >> + > >> + reg = &mem->regions[i]; > >> + DRV_LOG(INFO, "%s, region %u: HVA 0x%" PRIx64 ", " > >> + "GPA 0x%" PRIx64 ", size 0x%" PRIx64 ".", > >> + do_map ? 
"DMA map" : "DMA unmap", i, > >> + reg->host_user_addr, reg->guest_phys_addr, reg->size); > >> + > >> + if (reg->guest_phys_addr + reg->size > avail_iova) > >> + avail_iova = reg->guest_phys_addr + reg->size; > >> + > >> + if (do_map) { > >> + ret = rte_vfio_container_dma_map(vfio_container_fd, > >> + reg->host_user_addr, reg->guest_phys_addr, > >> + reg->size); > >> + if (ret < 0) { > >> + DRV_LOG(ERR, "DMA map failed."); > >> + goto exit; > >> + } > >> + } else { > >> + ret = rte_vfio_container_dma_unmap(vfio_container_fd, > >> + reg->host_user_addr, reg->guest_phys_addr, > >> + reg->size); > >> + if (ret < 0) { > >> + DRV_LOG(ERR, "DMA unmap failed."); > >> + goto exit; > >> + } > >> + } > >> + } > >> + > >> + if (dev->cvq) > >> + ret = virtio_vdpa_dma_map_ctrl_queue(dev, do_map, avail_iova); > >> + > >> +exit: > >> + free(mem); > >> + > >> + return ret; > >> +} > >> + > >> static int > >> virtio_vdpa_vfio_setup(struct virtio_vdpa_device *dev) > >> { > >> @@ -216,10 +337,144 @@ virtio_vdpa_get_protocol_features(int did __rte_unused, uint64_t *features) > >> return 0; > >> } > >> > >> +static uint64_t > >> +hva_to_gpa(int vid, uint64_t hva) > >> +{ > >> + struct rte_vhost_memory *mem = NULL; > >> + struct rte_vhost_mem_region *reg; > >> + uint32_t i; > >> + uint64_t gpa = 0; > >> + > >> + if (rte_vhost_get_mem_table(vid, &mem) < 0) > >> + goto exit; > >> + > >> + for (i = 0; i < mem->nregions; i++) { > >> + reg = &mem->regions[i]; > >> + > >> + if (hva >= reg->host_user_addr && > >> + hva < reg->host_user_addr + reg->size) { > >> + gpa = hva - reg->host_user_addr + reg->guest_phys_addr; > >> + break; > >> + } > >> + } > >> + > >> +exit: > >> + if (mem) > >> + free(mem); > >> + return gpa; > >> +} > >> + > >> +static int > >> +virtio_vdpa_start(struct virtio_vdpa_device *dev) > >> +{ > >> + struct virtio_hw *hw = &dev->hw; > >> + int i, vid, nr_vring, ret; > >> + struct rte_vhost_vring vr; > >> + struct virtio_pmd_ctrl ctrl; > >> + int dlen[1]; > >> + > >> + vid = 
dev->vid; > >> + nr_vring = rte_vhost_get_vring_num(vid); > >> + > >> + if (dev->vqs) > >> + rte_free(dev->vqs); > >> + > >> + dev->vqs = rte_zmalloc("virtio_vdpa", sizeof(*dev->vqs) * nr_vring, 0); > >> + > >> + for (i = 0; i < nr_vring; i++) { > >> + struct virtqueue *vq = &dev->vqs[i]; > >> + > >> + rte_vhost_get_vhost_vring(vid, i, &vr); > >> + > >> + vq->vq_queue_index = i; > >> + vq->vq_nentries = vr.size; > >> + vq->vq_ring_mem = hva_to_gpa(vid, (uint64_t)(uintptr_t)vr.desc); > >> + if (vq->vq_ring_mem == 0) { > >> + DRV_LOG(ERR, "Fail to get GPA for descriptor ring."); > >> + ret = -1; > >> + goto out_free_vqs; > >> + } > >> + > >> + ret = VTPCI_OPS(hw)->setup_queue(hw, vq); > >> + if (ret) { > >> + DRV_LOG(ERR, "Fail to setup queue."); > >> + goto out_free_vqs; > >> + } > >> + } > >> + > >> + if (dev->cvq) { > >> + ret = VTPCI_OPS(hw)->setup_queue(hw, dev->cvq); > >> + if (ret) { > >> + DRV_LOG(ERR, "Fail to setup ctrl queue."); > >> + goto out_free_vqs; > >> + } > >> + } > >> + > >> + vtpci_set_status(hw, VIRTIO_CONFIG_STATUS_DRIVER_OK); > >> + > >> + if (!dev->cvq) > >> + return 0; > >> + > >> + ctrl.hdr.class = VIRTIO_NET_CTRL_MQ; > >> + ctrl.hdr.cmd = VIRTIO_NET_CTRL_MQ_VQ_PAIRS_SET; > >> + memcpy(ctrl.data, &dev->max_queue_pairs, sizeof(uint16_t)); > >> + > >> + dlen[0] = sizeof(uint16_t); > >> + > >> + ret = virtio_send_command(hw->cvq, &ctrl, dlen, 1); > >> + if (ret) { > >> + DRV_LOG(ERR, "Multiqueue configured but send command " > >> + "failed, this is too late now..."); > >> + ret = -EINVAL; > >> + goto out_free_vqs; > >> + } > >> + > >> + return 0; > >> +out_free_vqs: > >> + rte_free(dev->vqs); > >> + > >> + return ret; > >> +} > >> + > >> +static int > >> +virtio_vdpa_dev_config(int vid) > >> +{ > >> + int did, ret; > >> + struct internal_list *list; > >> + struct virtio_vdpa_device *dev; > >> + > >> + did = rte_vhost_get_vdpa_device_id(vid); > >> + list = find_internal_resource_by_did(did); > >> + if (list == NULL) { > >> + DRV_LOG(ERR, 
"Invalid device id: %d", did); > >> + return -1; > >> + } > >> + > >> + dev = list->dev; > >> + dev->vid = vid; > >> + > >> + rte_spinlock_lock(&dev->lock); > >> + > >> + ret = virtio_vdpa_dma_map(dev, 1); > >> + if (ret) > >> + goto out_unlock; > >> + > >> + ret = virtio_vdpa_start(dev); > >> + > >> + if (rte_vhost_host_notifier_ctrl(vid, true) != 0) > >> + DRV_LOG(NOTICE, "vDPA (%d): software relay is used.", did); > >> + > >> +out_unlock: > >> + rte_spinlock_unlock(&dev->lock); > >> + > >> + return ret; > >> +} > >> + > >> static struct rte_vdpa_dev_ops virtio_vdpa_ops = { > >> .get_queue_num = virtio_vdpa_get_queue_num, > >> .get_features = virtio_vdpa_get_features, > >> .get_protocol_features = virtio_vdpa_get_protocol_features, > >> + .dev_conf = virtio_vdpa_dev_config, > >> }; > >> > >> static inline int > >> -- > >> 2.21.0 > >>
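The IOVA layout chosen by virtio_vdpa_dma_map() in the patch above — place the control ring right after the highest guest-physical range so it cannot collide with the data path mappings — boils down to one computation. The sketch below mirrors it standalone; struct guest_region is a simplified stand-in for rte_vhost_mem_region.

```c
#include <stdint.h>

/* Simplified stand-in for rte_vhost_mem_region. */
struct guest_region {
	uint64_t guest_phys_addr;
	uint64_t size;
};

/* Mirror of the 'avail_iova' computation in virtio_vdpa_dma_map():
 * the first IOVA past every guest memory region, used as the base at
 * which the control ring memzone and its header memzone are then
 * mapped at consecutive addresses. */
uint64_t
ctrl_iova_base_sketch(const struct guest_region *regions, uint32_t nregions)
{
	uint64_t avail_iova = 0;
	uint32_t i;

	for (i = 0; i < nregions; i++) {
		uint64_t end = regions[i].guest_phys_addr + regions[i].size;

		if (end > avail_iova)
			avail_iova = end;
	}

	return avail_iova;
}
```

Because the base is derived from the guest memory table received from the master, the slave only learns it once the memory regions have been negotiated, which is why the remapping happens at dev_conf() time.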
Hi > -----Original Message----- > From: Maxime Coquelin [mailto:maxime.coquelin@redhat.com] > Sent: Tuesday, September 3, 2019 3:35 PM > To: Wang, Xiao W <xiao.w.wang@intel.com>; Ye, Xiaolong > <xiaolong.ye@intel.com> > Cc: Bie, Tiwei <tiwei.bie@intel.com>; Wang, Zhihong > <zhihong.wang@intel.com>; amorenoz@redhat.com; dev@dpdk.org; > jfreimann@redhat.com; stable@dpdk.org > Subject: Re: [PATCH 02/15] vhost: configure vDPA as soon as the device is > ready > > > > On 9/2/19 11:02 AM, Wang, Xiao W wrote: > > Hi, > > > >> -----Original Message----- > >> From: Ye, Xiaolong > >> Sent: Monday, September 2, 2019 4:35 PM > >> To: Maxime Coquelin <maxime.coquelin@redhat.com> > >> Cc: Bie, Tiwei <tiwei.bie@intel.com>; Wang, Zhihong > >> <zhihong.wang@intel.com>; amorenoz@redhat.com; Wang, Xiao W > >> <xiao.w.wang@intel.com>; dev@dpdk.org; jfreimann@redhat.com; > >> stable@dpdk.org > >> Subject: Re: [PATCH 02/15] vhost: configure vDPA as soon as the device is > >> ready > >> > >> On 08/29, Maxime Coquelin wrote: > >>> There might not be any VHOST_USER_SET_VRING_CALL requests > >>> sent once the virtio device is ready. When it happens, the vDPA > >>> device's dev_conf() callback may never be called.
> >>> > >>> Fixes: 9f9014512822 ("vhost: configure vDPA device after set vring call > >> message") > >>> Cc: stable@dpdk.org > >>> Cc: xiaolong.ye@intel.com > >>> > >>> Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com> > >>> --- > >>> lib/librte_vhost/vhost_user.c | 3 +-- > >>> 1 file changed, 1 insertion(+), 2 deletions(-) > >>> > >>> diff --git a/lib/librte_vhost/vhost_user.c b/lib/librte_vhost/vhost_user.c > >>> index 0b72648a5..b1ea80c52 100644 > >>> --- a/lib/librte_vhost/vhost_user.c > >>> +++ b/lib/librte_vhost/vhost_user.c > >>> @@ -2112,8 +2112,7 @@ vhost_user_msg_handler(int vid, int fd) > >>> did = dev->vdpa_dev_id; > >>> vdpa_dev = rte_vdpa_get_device(did); > >>> if (vdpa_dev && virtio_is_ready(dev) && > >>> - !(dev->flags & VIRTIO_DEV_VDPA_CONFIGURED) && > >>> - msg.request.master == > >> VHOST_USER_SET_VRING_CALL) { > >>> + !(dev->flags & VIRTIO_DEV_VDPA_CONFIGURED)) { > > > > In the early beginning of vhost user messages, there seems to be a > VHOST_USER_SET_VRING_CALL with an invalid call fd, > > not sure if QEMU has any update on this point. > > If virtio_is_ready() is based on that invalid call fd, then vdpa_dev_conf() > cannot set up the interrupt properly. > > I think that's why in our previous implementation, we wait for the real call fd > and then call dev_conf(). > > I think that when we receive the first SET_VRING_CALL request from Qemu, > virtio_is_ready() should be false because the ring addresses won't be > received yet. In the last phase of vhost communication, there is usually a sequence of "SET_VRING_KICK" and "SET_VRING_CALL" messages. With this patch, at the arrival of "SET_VRING_KICK", virtio_is_ready() will return true, and the invalid call fd will get passed into dev_conf(). I'm not sure if the invalid callfd behavior still exists in the latest QEMU; if yes, we have to handle this case. > > Did it fix a real issue, or was it based on code/logs review? In an old implementation with QEMU 2.6, I met the issue.
To verify it, you need to set up a VM (maybe nested) and verify that virtio can get an interrupt. BRs, Xiao > > Thanks for the explanations, > Maxime > > > BRs, > > Xiao > >
On 2019/9/3 4:49 PM, Tiwei Bie wrote: > On Tue, Sep 03, 2019 at 09:40:25AM +0200, Maxime Coquelin wrote: >> On 9/3/19 7:30 AM, Tiwei Bie wrote: >>> [full quote of the control-queue DMA mapping patch snipped] >>> This will allow guest to access the cq.mz and cq.virtio_net_hdr_mz >>> via the device which may have potential risks. >> I get what you mean, but I'm not sure to see how we could avoid that. >> AFAIU, we need to map the control queue in the device IOMMU, otherwise >> how could the host (in case of virtual device) or the NIC (in case of >> Virtio offload) access the ring? >> Any thoughts? > I also don't see a way to avoid that. That's why I said in below > thread that I think the control queue based interface seems not a > quite good interface for a backend device: > > https://lkml.org/lkml/2019/9/2/934 > > In IFCVF NIC, we added a MMIO based interface to replace control > queue for the multiqueue setup in vDPA mode. > > Jason is proposing some changes to make virtio device suitable > for backend device. I'm not sure whether it's possible to cover > this case as well.. A silly question, can we do dynamic mapping like what kernel driver did here? Thanks > [remainder of quoted patch snipped]
On 9/4/19 6:06 AM, Jason Wang wrote: > On 2019/9/3 4:49 PM, Tiwei Bie wrote: >> [discussion on mapping the control queue through the device IOMMU snipped] > A silly question, can we do dynamic mapping like what kernel driver did > here? Not silly at all, it is of course possible. I will implement that in my v2. Thanks! Maxime > [remainder of quoted patch snipped]
*dev; >>>>> + >>>>> + did = rte_vhost_get_vdpa_device_id(vid); >>>>> + list = find_internal_resource_by_did(did); >>>>> + if (list == NULL) { >>>>> + DRV_LOG(ERR, "Invalid device id: %d", did); >>>>> + return -1; >>>>> + } >>>>> + >>>>> + dev = list->dev; >>>>> + dev->vid = vid; >>>>> + >>>>> + rte_spinlock_lock(&dev->lock); >>>>> + >>>>> + ret = virtio_vdpa_dma_map(dev, 1); >>>>> + if (ret) >>>>> + goto out_unlock; >>>>> + >>>>> + ret = virtio_vdpa_start(dev); >>>>> + >>>>> + if (rte_vhost_host_notifier_ctrl(vid, true) != 0) >>>>> + DRV_LOG(NOTICE, "vDPA (%d): software relay is used.", did); >>>>> + >>>>> +out_unlock: >>>>> + rte_spinlock_unlock(&dev->lock); >>>>> + >>>>> + return ret; >>>>> +} >>>>> + >>>>> static struct rte_vdpa_dev_ops virtio_vdpa_ops = { >>>>> .get_queue_num = virtio_vdpa_get_queue_num, >>>>> .get_features = virtio_vdpa_get_features, >>>>> .get_protocol_features = virtio_vdpa_get_protocol_features, >>>>> + .dev_conf = virtio_vdpa_dev_config, >>>>> }; >>>>> static inline int >>>>> -- >>>>> 2.21.0 >>>>>
On Wed, Sep 04, 2019 at 08:56:32AM +0200, Maxime Coquelin wrote:
> On 9/4/19 6:06 AM, Jason Wang wrote:
> > On 2019/9/3 4:49 PM, Tiwei Bie wrote:
> >> On Tue, Sep 03, 2019 at 09:40:25AM +0200, Maxime Coquelin wrote:
> >>> On 9/3/19 7:30 AM, Tiwei Bie wrote:
> >>>> On Thu, Aug 29, 2019 at 09:59:55AM +0200, Maxime Coquelin wrote:
> >>>>> In order to support multi-queue, we need to implement the control
> >>>>> path. The problem is that both the Vhost-user master and slave use
> >>>>> VAs in their processes' address spaces as IOVAs, which creates
> >>>>> collisions between the data rings' IOVAs managed by the master, and
> >>>>> the control ring IOVAs. The trick here is to remap the control
> >>>>> ring memory to another range, after the slave is aware of the
> >>>>> master's ranges.
> >>>>>
> >>>>> Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
> >>>>> ---
> >>>>>  drivers/net/virtio/virtio_vdpa.c | 255 +++++++++++++++++++++++++++++++
> >>>>>  1 file changed, 255 insertions(+)
> >>>>>
> >>>>> diff --git a/drivers/net/virtio/virtio_vdpa.c b/drivers/net/virtio/virtio_vdpa.c
> >>>>> index fc52a8e92..13b4dd07d 100644
> >>>>> --- a/drivers/net/virtio/virtio_vdpa.c
> >>>>> +++ b/drivers/net/virtio/virtio_vdpa.c
> >>>>> @@ -106,6 +106,127 @@ find_internal_resource_by_dev(struct rte_pci_device *pdev)
> >>>>>  	return list;
> >>>>>  }
> >>>>>
> >>>>> +static int
> >>>>> +virtio_vdpa_dma_map_ctrl_queue(struct virtio_vdpa_device *dev, int do_map,
> >>>>> +		uint64_t iova)
> >>>>> +{
> >>>>> +	const struct rte_memzone *mz;
> >>>>> +	int ret;
> >>>>> +
> >>>>> +	/*
> >>>>> +	 * IOVAs are processes' VAs. We cannot use them as the Data and Control
> >>>>> +	 * paths are run in different processes, which may (does) lead to
> >>>>> +	 * collisions. The trick here is to fixup Ctrl path IOVAs so that they
> >>>>> +	 * start after the Data path ranges.
> >>>>> +	 */
> >>>>> +	if (do_map) {
> >>>>> +		mz = dev->cvq->cq.mz;
> >>>>> +		ret = rte_vfio_container_dma_map(dev->vfio_container_fd,
> >>>>> +				(uint64_t)(uintptr_t)mz->addr,
> >>>>> +				iova, mz->len);
> >>>>> +		if (ret < 0) {
> >>>>> +			DRV_LOG(ERR, "Failed to map ctrl ring (%d)", ret);
> >>>>> +			return ret;
> >>>>> +		}
> >>>>> +
> >>>>> +		dev->cvq->vq_ring_mem = iova;
> >>>>> +		iova += mz->len;
> >>>>> +
> >>>>> +		mz = dev->cvq->cq.virtio_net_hdr_mz;
> >>>>> +		ret = rte_vfio_container_dma_map(dev->vfio_container_fd,
> >>>>> +				(uint64_t)(uintptr_t)mz->addr,
> >>>>> +				iova, mz->len);
> >>>>> +		if (ret < 0) {
> >>>>> +			DRV_LOG(ERR, "Failed to map ctrl headers (%d)", ret);
> >>>>> +			return ret;
> >>>>> +		}
> >>>>
> >>>> This will allow the guest to access the cq.mz and cq.virtio_net_hdr_mz
> >>>> via the device, which may have potential risks.
> >>>
> >>> I get what you mean, but I'm not sure to see how we could avoid that.
> >>> AFAIU, we need to map the control queue in the device IOMMU, otherwise
> >>> how could the host (in case of virtual device) or the NIC (in case of
> >>> Virtio offload) access the ring?
> >>> Any thoughts?
> >>
> >> I also don't see a way to avoid that. That's why I said in the below
> >> thread that I think the control-queue-based interface seems not a
> >> quite good interface for a backend device:
> >>
> >> https://lkml.org/lkml/2019/9/2/934
> >>
> >> In the IFCVF NIC, we added an MMIO-based interface to replace the
> >> control queue for the multiqueue setup in vDPA mode.
> >>
> >> Jason is proposing some changes to make the virtio device suitable
> >> for backend devices. I'm not sure whether it's possible to cover
> >> this case as well..
> >
> > A silly question, can we do dynamic mapping like what the kernel driver
> > does here?
>
> Not silly at all, it is of course possible.

+1. It's a good idea to mitigate the risks (if possible, we should make
the Rx/Tx held while the cvq is being used, or try to make the cvq's IOVA
unpredictable each time from the guest side).

> I will implement that in my v2.

Thanks!
Tiwei

>
> Thanks!
> Maxime
>
> > Thanks
> >
> >> Regards,
> >> Tiwei
> >>
> >>> Thanks,
> >>> Maxime
> >>>> Regards,
> >>>> Tiwei
> >>>>
> >>>>> +
> >>>>> +		dev->cvq->cq.virtio_net_hdr_mem = iova;
> >>>>> +	} else {
> >>>>> +		mz = dev->cvq->cq.mz;
> >>>>> +		ret = rte_vfio_container_dma_unmap(dev->vfio_container_fd,
> >>>>> +				(uint64_t)(uintptr_t)mz->addr,
> >>>>> +				iova, mz->len);
> >>>>> +		if (ret < 0) {
> >>>>> +			DRV_LOG(ERR, "Failed to unmap ctrl ring (%d)", ret);
> >>>>> +			return ret;
> >>>>> +		}
> >>>>> +
> >>>>> +		dev->cvq->vq_ring_mem = 0;
> >>>>> +		iova += mz->len;
> >>>>> +
> >>>>> +		mz = dev->cvq->cq.virtio_net_hdr_mz;
> >>>>> +		ret = rte_vfio_container_dma_unmap(dev->vfio_container_fd,
> >>>>> +				(uint64_t)(uintptr_t)mz->addr,
> >>>>> +				iova, mz->len);
> >>>>> +		if (ret < 0) {
> >>>>> +			DRV_LOG(ERR, "Failed to unmap ctrl headers (%d)", ret);
> >>>>> +			return ret;
> >>>>> +		}
> >>>>> +
> >>>>> +		dev->cvq->cq.virtio_net_hdr_mem = 0;
> >>>>> +	}
> >>>>> +
> >>>>> +	return 0;
> >>>>> +}
> >>>>> +
> >>>>> +static int
> >>>>> +virtio_vdpa_dma_map(struct virtio_vdpa_device *dev, int do_map)
> >>>>> +{
> >>>>> +	uint32_t i;
> >>>>> +	int ret;
> >>>>> +	struct rte_vhost_memory *mem = NULL;
> >>>>> +	int vfio_container_fd;
> >>>>> +	uint64_t avail_iova = 0;
> >>>>> +
> >>>>> +	ret = rte_vhost_get_mem_table(dev->vid, &mem);
> >>>>> +	if (ret < 0 || !mem) {
> >>>>> +		DRV_LOG(ERR, "failed to get VM memory layout.");
> >>>>> +		return ret;
> >>>>> +	}
> >>>>> +
> >>>>> +	vfio_container_fd = dev->vfio_container_fd;
> >>>>> +
> >>>>> +	for (i = 0; i < mem->nregions; i++) {
> >>>>> +		struct rte_vhost_mem_region *reg;
> >>>>> +
> >>>>> +		reg = &mem->regions[i];
> >>>>> +		DRV_LOG(INFO, "%s, region %u: HVA 0x%" PRIx64 ", "
> >>>>> +			"GPA 0x%" PRIx64 ", size 0x%" PRIx64 ".",
> >>>>> +			do_map ? "DMA map" : "DMA unmap", i,
> >>>>> +			reg->host_user_addr, reg->guest_phys_addr, reg->size);
> >>>>> +
> >>>>> +		if (reg->guest_phys_addr + reg->size > avail_iova)
> >>>>> +			avail_iova = reg->guest_phys_addr + reg->size;
> >>>>> +
> >>>>> +		if (do_map) {
> >>>>> +			ret = rte_vfio_container_dma_map(vfio_container_fd,
> >>>>> +				reg->host_user_addr, reg->guest_phys_addr,
> >>>>> +				reg->size);
> >>>>> +			if (ret < 0) {
> >>>>> +				DRV_LOG(ERR, "DMA map failed.");
> >>>>> +				goto exit;
> >>>>> +			}
> >>>>> +		} else {
> >>>>> +			ret = rte_vfio_container_dma_unmap(vfio_container_fd,
> >>>>> +				reg->host_user_addr, reg->guest_phys_addr,
> >>>>> +				reg->size);
> >>>>> +			if (ret < 0) {
> >>>>> +				DRV_LOG(ERR, "DMA unmap failed.");
> >>>>> +				goto exit;
> >>>>> +			}
> >>>>> +		}
> >>>>> +	}
> >>>>> +
> >>>>> +	if (dev->cvq)
> >>>>> +		ret = virtio_vdpa_dma_map_ctrl_queue(dev, do_map, avail_iova);
> >>>>> +
> >>>>> +exit:
> >>>>> +	free(mem);
> >>>>> +
> >>>>> +	return ret;
> >>>>> +}
> >>>>> +
> >>>>>  static int
> >>>>>  virtio_vdpa_vfio_setup(struct virtio_vdpa_device *dev)
> >>>>>  {
> >>>>> @@ -216,10 +337,144 @@ virtio_vdpa_get_protocol_features(int did __rte_unused, uint64_t *features)
> >>>>>  	return 0;
> >>>>>  }
> >>>>>
> >>>>> +static uint64_t
> >>>>> +hva_to_gpa(int vid, uint64_t hva)
> >>>>> +{
> >>>>> +	struct rte_vhost_memory *mem = NULL;
> >>>>> +	struct rte_vhost_mem_region *reg;
> >>>>> +	uint32_t i;
> >>>>> +	uint64_t gpa = 0;
> >>>>> +
> >>>>> +	if (rte_vhost_get_mem_table(vid, &mem) < 0)
> >>>>> +		goto exit;
> >>>>> +
> >>>>> +	for (i = 0; i < mem->nregions; i++) {
> >>>>> +		reg = &mem->regions[i];
> >>>>> +
> >>>>> +		if (hva >= reg->host_user_addr &&
> >>>>> +				hva < reg->host_user_addr + reg->size) {
> >>>>> +			gpa = hva - reg->host_user_addr + reg->guest_phys_addr;
> >>>>> +			break;
> >>>>> +		}
> >>>>> +	}
> >>>>> +
> >>>>> +exit:
> >>>>> +	if (mem)
> >>>>> +		free(mem);
> >>>>> +	return gpa;
> >>>>> +}
> >>>>> +
> >>>>> +static int
> >>>>> +virtio_vdpa_start(struct virtio_vdpa_device *dev)
> >>>>> +{
> >>>>> +	struct virtio_hw *hw = &dev->hw;
> >>>>> +	int i, vid, nr_vring, ret;
> >>>>> +	struct rte_vhost_vring vr;
> >>>>> +	struct virtio_pmd_ctrl ctrl;
> >>>>> +	int dlen[1];
> >>>>> +
> >>>>> +	vid = dev->vid;
> >>>>> +	nr_vring = rte_vhost_get_vring_num(vid);
> >>>>> +
> >>>>> +	if (dev->vqs)
> >>>>> +		rte_free(dev->vqs);
> >>>>> +
> >>>>> +	dev->vqs = rte_zmalloc("virtio_vdpa", sizeof(*dev->vqs) * nr_vring, 0);
> >>>>> +
> >>>>> +	for (i = 0; i < nr_vring; i++) {
> >>>>> +		struct virtqueue *vq = &dev->vqs[i];
> >>>>> +
> >>>>> +		rte_vhost_get_vhost_vring(vid, i, &vr);
> >>>>> +
> >>>>> +		vq->vq_queue_index = i;
> >>>>> +		vq->vq_nentries = vr.size;
> >>>>> +		vq->vq_ring_mem = hva_to_gpa(vid, (uint64_t)(uintptr_t)vr.desc);
> >>>>> +		if (vq->vq_ring_mem == 0) {
> >>>>> +			DRV_LOG(ERR, "Fail to get GPA for descriptor ring.");
> >>>>> +			ret = -1;
> >>>>> +			goto out_free_vqs;
> >>>>> +		}
> >>>>> +
> >>>>> +		ret = VTPCI_OPS(hw)->setup_queue(hw, vq);
> >>>>> +		if (ret) {
> >>>>> +			DRV_LOG(ERR, "Fail to setup queue.");
> >>>>> +			goto out_free_vqs;
> >>>>> +		}
> >>>>> +	}
> >>>>> +
> >>>>> +	if (dev->cvq) {
> >>>>> +		ret = VTPCI_OPS(hw)->setup_queue(hw, dev->cvq);
> >>>>> +		if (ret) {
> >>>>> +			DRV_LOG(ERR, "Fail to setup ctrl queue.");
> >>>>> +			goto out_free_vqs;
> >>>>> +		}
> >>>>> +	}
> >>>>> +
> >>>>> +	vtpci_set_status(hw, VIRTIO_CONFIG_STATUS_DRIVER_OK);
> >>>>> +
> >>>>> +	if (!dev->cvq)
> >>>>> +		return 0;
> >>>>> +
> >>>>> +	ctrl.hdr.class = VIRTIO_NET_CTRL_MQ;
> >>>>> +	ctrl.hdr.cmd = VIRTIO_NET_CTRL_MQ_VQ_PAIRS_SET;
> >>>>> +	memcpy(ctrl.data, &dev->max_queue_pairs, sizeof(uint16_t));
> >>>>> +
> >>>>> +	dlen[0] = sizeof(uint16_t);
> >>>>> +
> >>>>> +	ret = virtio_send_command(hw->cvq, &ctrl, dlen, 1);
> >>>>> +	if (ret) {
> >>>>> +		DRV_LOG(ERR, "Multiqueue configured but send command "
> >>>>> +			"failed, this is too late now...");
> >>>>> +		ret = -EINVAL;
> >>>>> +		goto out_free_vqs;
> >>>>> +	}
> >>>>> +
> >>>>> +	return 0;
> >>>>> +out_free_vqs:
> >>>>> +	rte_free(dev->vqs);
> >>>>> +
> >>>>> +	return ret;
> >>>>> +}
> >>>>> +
> >>>>> +static int
> >>>>> +virtio_vdpa_dev_config(int vid)
> >>>>> +{
> >>>>> +	int did, ret;
> >>>>> +	struct internal_list *list;
> >>>>> +	struct virtio_vdpa_device *dev;
> >>>>> +
> >>>>> +	did = rte_vhost_get_vdpa_device_id(vid);
> >>>>> +	list = find_internal_resource_by_did(did);
> >>>>> +	if (list == NULL) {
> >>>>> +		DRV_LOG(ERR, "Invalid device id: %d", did);
> >>>>> +		return -1;
> >>>>> +	}
> >>>>> +
> >>>>> +	dev = list->dev;
> >>>>> +	dev->vid = vid;
> >>>>> +
> >>>>> +	rte_spinlock_lock(&dev->lock);
> >>>>> +
> >>>>> +	ret = virtio_vdpa_dma_map(dev, 1);
> >>>>> +	if (ret)
> >>>>> +		goto out_unlock;
> >>>>> +
> >>>>> +	ret = virtio_vdpa_start(dev);
> >>>>> +
> >>>>> +	if (rte_vhost_host_notifier_ctrl(vid, true) != 0)
> >>>>> +		DRV_LOG(NOTICE, "vDPA (%d): software relay is used.", did);
> >>>>> +
> >>>>> +out_unlock:
> >>>>> +	rte_spinlock_unlock(&dev->lock);
> >>>>> +
> >>>>> +	return ret;
> >>>>> +}
> >>>>> +
> >>>>>  static struct rte_vdpa_dev_ops virtio_vdpa_ops = {
> >>>>>  	.get_queue_num = virtio_vdpa_get_queue_num,
> >>>>>  	.get_features = virtio_vdpa_get_features,
> >>>>>  	.get_protocol_features = virtio_vdpa_get_protocol_features,
> >>>>> +	.dev_conf = virtio_vdpa_dev_config,
> >>>>>  };
> >>>>>
> >>>>>  static inline int
> >>>>> --
> >>>>> 2.21.0
> >>>>>
Hi Maxime,

Thursday, August 29, 2019 11:00 AM, Maxime Coquelin:
> Subject: [dpdk-dev] [PATCH 00/15] Introduce Virtio vDPA driver
>
> vDPA allows to offload Virtio Datapath processing by supported NICs, like
> IFCVF for example.
>
> The control path has to be handled by a dedicated vDPA driver, so that it
> can translate Vhost-user protocol requests to proprietary NICs registers
> accesses.
>
> This driver is the vDPA driver for Virtio devices, meaning that Vhost-user
> protocol requests get translated to Virtio registers accesses as defined
> in the Virtio spec.
>
> Basically, it can be used within a guest with a para-virtualized Virtio-net
> device, or even with a full Virtio HW offload NIC directly on host.

Can you elaborate more on the use cases for such a driver?

1. If the underlying HW can support a full virtio device, why do we need to
   work with it in vDPA mode? Why not provide it to the VM as a passthrough
   one?
2. Why is it preferable to work with a virtio device as the backend device
   to be used with vDPA vs. working with the underlying HW VF?

Is nested virtualization what you have in mind?

> Amongst the main features, all currently supported Virtio spec versions
> are supported (split & packed rings, but only tested with split ring for
> now) and also multiqueue support is added by implementing the control
> virtqueue in the driver.
>
> The structure of this driver is heavily based on IFCVF vDPA.
>
> Maxime Coquelin (15):
>   vhost: remove vhost kernel header inclusion
>   vhost: configure vDPA as soon as the device is ready
>   net/virtio: move control path functions in virtqueue file
>   net/virtio: add virtio PCI subsystem device ID declaration
>   net/virtio: save notify bar ID in virtio HW struct
>   net/virtio: add skeleton for virtio vDPA driver
>   net/virtio: add vDPA ops to get number of queues
>   net/virtio: add virtio vDPA op to get features
>   net/virtio: add virtio vDPA op to get protocol features
>   net/virtio: add vDPA op to configure and start the device
>   net/virtio: add vDPA op to stop and close the device
>   net/virtio: add vDPA op to set features
>   net/virtio: add vDPA ops to get VFIO FDs
>   net/virtio: add vDPA op to get notification area
>   doc: add documentation for Virtio vDPA driver
>
>  config/common_linux                |   1 +
>  doc/guides/nics/index.rst          |   1 +
>  doc/guides/nics/virtio_vdpa.rst    |  45 ++
>  drivers/net/ifc/ifcvf_vdpa.c       |   1 +
>  drivers/net/virtio/Makefile        |   4 +
>  drivers/net/virtio/meson.build     |   3 +-
>  drivers/net/virtio/virtio_ethdev.c | 252 --------
>  drivers/net/virtio/virtio_pci.c    |   6 +-
>  drivers/net/virtio/virtio_pci.h    |   2 +
>  drivers/net/virtio/virtio_vdpa.c   | 918 +++++++++++++++++++++++++++++
>  drivers/net/virtio/virtqueue.c     | 255 ++++++++
>  drivers/net/virtio/virtqueue.h     |   5 +
>  lib/librte_vhost/rte_vdpa.h        |   1 -
>  lib/librte_vhost/rte_vhost.h       |   9 +-
>  lib/librte_vhost/vhost_user.c      |   3 +-
>  15 files changed, 1243 insertions(+), 263 deletions(-)
>  create mode 100644 doc/guides/nics/virtio_vdpa.rst
>  create mode 100644 drivers/net/virtio/virtio_vdpa.c
>
> --
> 2.21.0
Hi Shahaf,

On 9/9/19 1:55 PM, Shahaf Shuler wrote:
> Hi Maxime,
>
> Thursday, August 29, 2019 11:00 AM, Maxime Coquelin:
>> Subject: [dpdk-dev] [PATCH 00/15] Introduce Virtio vDPA driver
>>
>> [...]
>>
>> Basically, it can be used within a guest with a para-virtualized
>> Virtio-net device, or even with a full Virtio HW offload NIC directly
>> on host.
>
> Can you elaborate more on the use cases for such a driver?
>
> 1. If the underlying HW can support a full virtio device, why do we need
>    to work with it in vDPA mode? Why not provide it to the VM as a
>    passthrough one?
> 2. Why is it preferable to work with a virtio device as the backend
>    device to be used with vDPA vs. working with the underlying HW VF?

IMHO, I see two use cases where it can make sense to use vDPA with a full
offload HW device:

1. Live-migration support: it makes it possible to switch to ring
   processing in SW during the migration, as Virtio HW does not support
   dirty pages logging.

2. It can be used to provide a single standard interface (the vhost-user
   socket) to containers in the scope of CNFs. Doing so, the container
   does not need to be modified, whatever the HW NIC: Virtio datapath
   offload only, full Virtio offload, or no offload at all. In the latter
   case, it would not be optimal as it implies forwarding between the
   Vhost PMD and the HW NIC PMD, but it would work.

> Is nested virtualization what you have in mind?

For the para-virtualized virtio device, either nested virtualization, or
a container running within the guest.

Maxime
Tuesday, September 10, 2019 10:46 AM, Maxime Coquelin:
> Subject: Re: [dpdk-dev] [PATCH 00/15] Introduce Virtio vDPA driver
>
> Hi Shahaf,
>
> [...]
>
> IMHO, I see two use cases where it can make sense to use vDPA with a full
> offload HW device:
> 1. Live-migration support: it makes it possible to switch to ring
>    processing in SW during the migration, as Virtio HW does not support
>    dirty pages logging.

Can you elaborate why specifically using the virtio_vdpa PMD enables this
SW relay during migration? E.g. Intel's vDPA PMD that runs on top of a VF
does that today as well.

> 2. It can be used to provide a single standard interface (the vhost-user
>    socket) to containers in the scope of CNFs. Doing so, the container
>    does not need to be modified, whatever the HW NIC: Virtio datapath
>    offload only, full Virtio offload, or no offload at all. In the latter
>    case, it would not be optimal as it implies forwarding between the
>    Vhost PMD and the HW NIC PMD, but it would work.

It is not clear to me what the interface map is in such a system.
From what I understand, the container will have a virtio-user i/f and the
host will have a virtio i/f. Then the virtio i/f can be programmed to work
with vDPA or not. For full emulation, I guess you will need to expose the
netdev of the fully emulated virtio device to the container?

I am trying to map when it is beneficial to use this virtio_vdpa PMD and
when it is better to use the vendor-specific vDPA PMD on top of a VF.

> > Is nested virtualization what you have in mind?
>
> For the para-virtualized virtio device, either nested virtualization, or
> a container running within the guest.
>
> Maxime
On 9/10/19 3:44 PM, Shahaf Shuler wrote:
> Tuesday, September 10, 2019 10:46 AM, Maxime Coquelin:
>> Subject: Re: [dpdk-dev] [PATCH 00/15] Introduce Virtio vDPA driver
>>
>> [...]
>>
>> IMHO, I see two use cases where it can make sense to use vDPA with a
>> full offload HW device:
>> 1. Live-migration support: it makes it possible to switch to ring
>>    processing in SW during the migration, as Virtio HW does not support
>>    dirty pages logging.
>
> Can you elaborate why specifically using the virtio_vdpa PMD enables this
> SW relay during migration? E.g. Intel's vDPA PMD that runs on top of a VF
> does that today as well.

I think there was a misunderstanding. When I said:
"I see two use cases where it can make sense to use vDPA with a full
offload HW device"
I meant: I see two use cases where it can make sense to use vDPA with a
full offload HW device, instead of having the full offload HW device use
the Virtio PMD.

In other words, I think it is preferable to only offload the datapath, so
that it is possible to support SW live-migration.

>> 2. It can be used to provide a single standard interface (the vhost-user
>>    socket) to containers in the scope of CNFs. Doing so, the container
>>    does not need to be modified, whatever the HW NIC: Virtio datapath
>>    offload only, full Virtio offload, or no offload at all. In the
>>    latter case, it would not be optimal as it implies forwarding between
>>    the Vhost PMD and the HW NIC PMD, but it would work.
>
> It is not clear to me what the interface map is in such a system.
> From what I understand, the container will have a virtio-user i/f and the
> host will have a virtio i/f. Then the virtio i/f can be programmed to
> work with vDPA or not.
> For full emulation, I guess you will need to expose the netdev of the
> fully emulated virtio device to the container?
>
> I am trying to map when it is beneficial to use this virtio_vdpa PMD and
> when it is better to use the vendor-specific vDPA PMD on top of a VF.

I think that with the above clarification, I made it clear that the goal
of this driver is not to replace vendors' vDPA drivers (their control
paths may not even be compatible), but instead to provide a generic driver
that can be used either within a guest with a para-virtualized Virtio-net
device, or with a HW NIC that fully offloads Virtio (both data and control
paths).
Tuesday, September 10, 2019 4:56 PM, Maxime Coquelin:
> Subject: Re: [dpdk-dev] [PATCH 00/15] Introduce Virtio vDPA driver
>
> [...]
>
> I think that with the above clarification, I made it clear that the goal
> of this driver is not to replace vendors' vDPA drivers (their control
> paths may not even be compatible), but instead to provide a generic
> driver that can be used either within a guest with a para-virtualized
> Virtio-net device, or with a HW NIC that fully offloads Virtio (both
> data and control paths).

Thanks Maxime, it is clearer now.
From what I understand, this driver is to be used with vDPA when the
underlying device is virtio.

I can perfectly understand the para-virt (+ nested virtualization /
container inside VM) use case.

Regarding the fully emulated virtio device on the host (instead of a
plain VF) - for me the benefit is still not clear: if you have HW that
can expose a VF, why not use a VF + the vendor-specific vDPA driver?

Anyway - for the series,
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
On 9/11/19 7:15 AM, Shahaf Shuler wrote:
> Tuesday, September 10, 2019 4:56 PM, Maxime Coquelin:
>> Subject: Re: [dpdk-dev] [PATCH 00/15] Introduce Virtio vDPA driver
>>
>> [...]
>>
>> I think that with the above clarification, I made it clear that the
>> goal of this driver is not to replace vendors' vDPA drivers (their
>> control paths may not even be compatible), but instead to provide a
>> generic driver that can be used either within a guest with a
>> para-virtualized Virtio-net device, or with a HW NIC that fully
>> offloads Virtio (both data and control paths).
>
> Thanks Maxime, it is clearer now.
> From what I understand, this driver is to be used with vDPA when the
> underlying device is virtio.
>
> I can perfectly understand the para-virt (+ nested virtualization /
> container inside VM) use case.
>
> Regarding the fully emulated virtio device on the host (instead of a
> plain VF) - for me the benefit is still not clear: if you have HW that
> can expose a VF, why not use a VF + the vendor-specific vDPA driver?

If you need a vendor-specific vDPA driver for the VF, then you definitely
want to use the vendor-specific driver.

However, if there is a HW device or VF that implements the Virtio spec
even for the control path (i.e. the PCI registers layout), one may be
tempted to do device assignment directly to the guest and use the Virtio
PMD. The downside of doing that is that it won't support live-migration.
The benefit of using vDPA with the virtio vDPA driver in this case is to
provide a way to support live-migration (by switching to SW ring
processing and performing dirty pages logging).

> Anyway - for the series,
> Acked-by: Shahaf Shuler <shahafs@mellanox.com>

Thanks!
Maxime
On 8/29/19 9:59 AM, Maxime Coquelin wrote:
> vDPA allows offloading Virtio datapath processing to supported
> NICs, like IFCVF for example.
>
> The control path has to be handled by a dedicated vDPA driver,
> so that it can translate Vhost-user protocol requests into
> proprietary NIC register accesses.
>
> This driver is the vDPA driver for Virtio devices, meaning
> that Vhost-user protocol requests get translated into Virtio
> register accesses, as defined in the Virtio spec.
>
> Basically, it can be used within a guest with a para-virtualized
> Virtio-net device, or even with a full Virtio HW offload NIC
> directly on the host.
>
> Amongst the main features, all currently supported Virtio spec
> versions are supported (split & packed rings, but only tested
> with split rings for now), and multiqueue support is added
> by implementing the control virtqueue in the driver.
>
> The structure of this driver is heavily based on IFCVF vDPA.
>
> Maxime Coquelin (15):
> vhost: remove vhost kernel header inclusion
> vhost: configure vDPA as soon as the device is ready
> net/virtio: move control path functions to virtqueue file
> net/virtio: add virtio PCI subsystem device ID declaration
> net/virtio: save notify bar ID in virtio HW struct
> net/virtio: add skeleton for virtio vDPA driver
> net/virtio: add vDPA ops to get number of queues
> net/virtio: add virtio vDPA op to get features
> net/virtio: add virtio vDPA op to get protocol features
> net/virtio: add vDPA op to configure and start the device
> net/virtio: add vDPA op to stop and close the device
> net/virtio: add vDPA op to set features
> net/virtio: add vDPA ops to get VFIO FDs
> net/virtio: add vDPA op to get notification area
> doc: add documentation for Virtio vDPA driver
>
> config/common_linux | 1 +
> doc/guides/nics/index.rst | 1 +
> doc/guides/nics/virtio_vdpa.rst | 45 ++
> drivers/net/ifc/ifcvf_vdpa.c | 1 +
> drivers/net/virtio/Makefile | 4 +
> drivers/net/virtio/meson.build | 3 +-
> drivers/net/virtio/virtio_ethdev.c | 252 --------
> drivers/net/virtio/virtio_pci.c | 6 +-
> drivers/net/virtio/virtio_pci.h | 2 +
> drivers/net/virtio/virtio_vdpa.c | 918 +++++++++++++++++++++++++++++
> drivers/net/virtio/virtqueue.c | 255 ++++++++
> drivers/net/virtio/virtqueue.h | 5 +
> lib/librte_vhost/rte_vdpa.h | 1 -
> lib/librte_vhost/rte_vhost.h | 9 +-
> lib/librte_vhost/vhost_user.c | 3 +-
> 15 files changed, 1243 insertions(+), 263 deletions(-)
> create mode 100644 doc/guides/nics/virtio_vdpa.rst
> create mode 100644 drivers/net/virtio/virtio_vdpa.c
>
Deferring the series to v20.02.