DPDK patches and discussions
 help / color / Atom feed
* [dpdk-dev] [PATCH 00/15] Introduce Virtio vDPA driver
@ 2019-08-29  7:59 Maxime Coquelin
  2019-08-29  7:59 ` [dpdk-dev] [PATCH 01/15] vhost: remove vhost kernel header inclusion Maxime Coquelin
                   ` (15 more replies)
  0 siblings, 16 replies; 50+ messages in thread
From: Maxime Coquelin @ 2019-08-29  7:59 UTC (permalink / raw)
  To: tiwei.bie, zhihong.wang, amorenoz, xiao.w.wang, dev, jfreimann
  Cc: stable, Maxime Coquelin

vDPA allows to offload Virtio Datapath processing by supported
NICs, like IFCVF for example.

The control path has to be handled by a dedicated vDPA driver,
so that it can translate Vhost-user protocol requests to
proprietary NICs registers accesses.

This driver is the vDPA driver for Virtio devices, meaning
that Vhost-user protocol requests get translated to Virtio
registers accesses as per defined in the Virtio spec.

Basically, it can be used within a guest with a para-virtualized
Virtio-net device, or even with a full Virtio HW offload NIC
directly on host.

Amongst the main features, all currently supported Virtio spec
versions are supported (split & packed rings, but only tested
with split ring for now) and also multiqueue support is added
by implementing the cotnrol virtqueue in the driver.

The structure of this driver is heavily based on IFCVF vDPA.

Maxime Coquelin (15):
  vhost: remove vhost kernel header inclusion
  vhost: configure vDPA as soon as the device is ready
  net/virtio: move control path fonctions in virtqueue file
  net/virtio: add virtio PCI subsystem device ID declaration
  net/virtio: save notify bar ID in virtio HW struct
  net/virtio: add skeleton for virtio vDPA driver
  net/virtio: add vDPA ops to get number of queue
  net/virtio: add virtio vDPA op to get features
  net/virtio: add virtio vDPA op to get protocol features
  net/virtio: add vDPA op to configure and start the device
  net/virtio: add vDPA op to stop and close the device
  net/virtio: add vDPA op to set features
  net/virtio: add vDPA ops to get VFIO FDs
  net/virtio: add vDPA op to get notification area
  doc: add documentation for Virtio vDPA driver

 config/common_linux                |   1 +
 doc/guides/nics/index.rst          |   1 +
 doc/guides/nics/virtio_vdpa.rst    |  45 ++
 drivers/net/ifc/ifcvf_vdpa.c       |   1 +
 drivers/net/virtio/Makefile        |   4 +
 drivers/net/virtio/meson.build     |   3 +-
 drivers/net/virtio/virtio_ethdev.c | 252 --------
 drivers/net/virtio/virtio_pci.c    |   6 +-
 drivers/net/virtio/virtio_pci.h    |   2 +
 drivers/net/virtio/virtio_vdpa.c   | 918 +++++++++++++++++++++++++++++
 drivers/net/virtio/virtqueue.c     | 255 ++++++++
 drivers/net/virtio/virtqueue.h     |   5 +
 lib/librte_vhost/rte_vdpa.h        |   1 -
 lib/librte_vhost/rte_vhost.h       |   9 +-
 lib/librte_vhost/vhost_user.c      |   3 +-
 15 files changed, 1243 insertions(+), 263 deletions(-)
 create mode 100644 doc/guides/nics/virtio_vdpa.rst
 create mode 100644 drivers/net/virtio/virtio_vdpa.c

-- 
2.21.0


^ permalink raw reply	[flat|nested] 50+ messages in thread

* [dpdk-dev] [PATCH 01/15] vhost: remove vhost kernel header inclusion
  2019-08-29  7:59 [dpdk-dev] [PATCH 00/15] Introduce Virtio vDPA driver Maxime Coquelin
@ 2019-08-29  7:59 ` Maxime Coquelin
  2019-09-02  6:03   ` Tiwei Bie
  2019-08-29  7:59 ` [dpdk-dev] [PATCH 02/15] vhost: configure vDPA as soon as the device is ready Maxime Coquelin
                   ` (14 subsequent siblings)
  15 siblings, 1 reply; 50+ messages in thread
From: Maxime Coquelin @ 2019-08-29  7:59 UTC (permalink / raw)
  To: tiwei.bie, zhihong.wang, amorenoz, xiao.w.wang, dev, jfreimann
  Cc: stable, Maxime Coquelin

This is preliminary rework for virtio-vdpa driver, in
order to avoid conflicts with Virtio PMD headers.

Generally, I think it is better not to include kernel
headers in RTE headers, especially in the case of Vhost
and Virtio which just re-use the kernel definitions,
and has no runtime dependencies.

In order to not break IFC driver build, the vhost kernel
header is now included directly in the driver.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
---
 drivers/net/ifc/ifcvf_vdpa.c | 1 +
 lib/librte_vhost/rte_vdpa.h  | 1 -
 lib/librte_vhost/rte_vhost.h | 9 ++++-----
 3 files changed, 5 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ifc/ifcvf_vdpa.c b/drivers/net/ifc/ifcvf_vdpa.c
index 8de9ef199..40cb15ca8 100644
--- a/drivers/net/ifc/ifcvf_vdpa.c
+++ b/drivers/net/ifc/ifcvf_vdpa.c
@@ -7,6 +7,7 @@
 #include <fcntl.h>
 #include <sys/ioctl.h>
 #include <sys/epoll.h>
+#include <linux/vhost.h>
 #include <linux/virtio_net.h>
 #include <stdbool.h>
 
diff --git a/lib/librte_vhost/rte_vdpa.h b/lib/librte_vhost/rte_vdpa.h
index 9a3deb31d..69438210b 100644
--- a/lib/librte_vhost/rte_vdpa.h
+++ b/lib/librte_vhost/rte_vdpa.h
@@ -14,7 +14,6 @@
 #include <stdbool.h>
 
 #include <rte_pci.h>
-#include "rte_vhost.h"
 
 #define MAX_VDPA_NAME_LEN 128
 
diff --git a/lib/librte_vhost/rte_vhost.h b/lib/librte_vhost/rte_vhost.h
index 7fb172912..62d3c3c36 100644
--- a/lib/librte_vhost/rte_vhost.h
+++ b/lib/librte_vhost/rte_vhost.h
@@ -20,11 +20,6 @@
 extern "C" {
 #endif
 
-/* These are not C++-aware. */
-#include <linux/vhost.h>
-#include <linux/virtio_ring.h>
-#include <linux/virtio_net.h>
-
 #define RTE_VHOST_USER_CLIENT		(1ULL << 0)
 #define RTE_VHOST_USER_NO_RECONNECT	(1ULL << 1)
 #define RTE_VHOST_USER_DEQUEUE_ZERO_COPY	(1ULL << 2)
@@ -72,6 +67,10 @@ extern "C" {
 #define VHOST_USER_PROTOCOL_F_HOST_NOTIFIER 11
 #endif
 
+#ifndef VHOST_F_LOG_ALL
+#define VHOST_F_LOG_ALL			26
+#endif
+
 /** Indicate whether protocol features negotiation is supported. */
 #ifndef VHOST_USER_F_PROTOCOL_FEATURES
 #define VHOST_USER_F_PROTOCOL_FEATURES	30
-- 
2.21.0


^ permalink raw reply	[flat|nested] 50+ messages in thread

* [dpdk-dev] [PATCH 02/15] vhost: configure vDPA as soon as the device is ready
  2019-08-29  7:59 [dpdk-dev] [PATCH 00/15] Introduce Virtio vDPA driver Maxime Coquelin
  2019-08-29  7:59 ` [dpdk-dev] [PATCH 01/15] vhost: remove vhost kernel header inclusion Maxime Coquelin
@ 2019-08-29  7:59 ` Maxime Coquelin
  2019-09-02  8:34   ` Ye Xiaolong
  2019-08-29  7:59 ` [dpdk-dev] [PATCH 03/15] net/virtio: move control path fonctions in virtqueue file Maxime Coquelin
                   ` (13 subsequent siblings)
  15 siblings, 1 reply; 50+ messages in thread
From: Maxime Coquelin @ 2019-08-29  7:59 UTC (permalink / raw)
  To: tiwei.bie, zhihong.wang, amorenoz, xiao.w.wang, dev, jfreimann
  Cc: stable, Maxime Coquelin, xiaolong.ye

There might not have any VHOST_USER_SET_VRING_CALL requests
sent once virtio device is ready. When it happens, the vDPA
device's dev_conf() callback may never be called.

Fixes: 9f9014512822 ("vhost: configure vDPA device after set vring call message")
Cc: stable@dpdk.org
Cc: xiaolong.ye@intel.com

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
---
 lib/librte_vhost/vhost_user.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/lib/librte_vhost/vhost_user.c b/lib/librte_vhost/vhost_user.c
index 0b72648a5..b1ea80c52 100644
--- a/lib/librte_vhost/vhost_user.c
+++ b/lib/librte_vhost/vhost_user.c
@@ -2112,8 +2112,7 @@ vhost_user_msg_handler(int vid, int fd)
 	did = dev->vdpa_dev_id;
 	vdpa_dev = rte_vdpa_get_device(did);
 	if (vdpa_dev && virtio_is_ready(dev) &&
-			!(dev->flags & VIRTIO_DEV_VDPA_CONFIGURED) &&
-			msg.request.master == VHOST_USER_SET_VRING_CALL) {
+			!(dev->flags & VIRTIO_DEV_VDPA_CONFIGURED)) {
 		if (vdpa_dev->ops->dev_conf)
 			vdpa_dev->ops->dev_conf(dev->vid);
 		dev->flags |= VIRTIO_DEV_VDPA_CONFIGURED;
-- 
2.21.0


^ permalink raw reply	[flat|nested] 50+ messages in thread

* [dpdk-dev] [PATCH 03/15] net/virtio: move control path fonctions in virtqueue file
  2019-08-29  7:59 [dpdk-dev] [PATCH 00/15] Introduce Virtio vDPA driver Maxime Coquelin
  2019-08-29  7:59 ` [dpdk-dev] [PATCH 01/15] vhost: remove vhost kernel header inclusion Maxime Coquelin
  2019-08-29  7:59 ` [dpdk-dev] [PATCH 02/15] vhost: configure vDPA as soon as the device is ready Maxime Coquelin
@ 2019-08-29  7:59 ` Maxime Coquelin
  2019-09-02  6:05   ` Tiwei Bie
  2019-08-29  7:59 ` [dpdk-dev] [PATCH 04/15] net/virtio: add virtio PCI subsystem device ID declaration Maxime Coquelin
                   ` (12 subsequent siblings)
  15 siblings, 1 reply; 50+ messages in thread
From: Maxime Coquelin @ 2019-08-29  7:59 UTC (permalink / raw)
  To: tiwei.bie, zhihong.wang, amorenoz, xiao.w.wang, dev, jfreimann
  Cc: stable, Maxime Coquelin

Virtio-vdpa driver needs to implement the control path,
so move related functions to virtqueue file so that it
can be used by both Virtio PMD and Virtio-vdpa drivers.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
---
 drivers/net/virtio/virtio_ethdev.c | 252 ----------------------------
 drivers/net/virtio/virtqueue.c     | 255 +++++++++++++++++++++++++++++
 drivers/net/virtio/virtqueue.h     |   5 +
 3 files changed, 260 insertions(+), 252 deletions(-)

diff --git a/drivers/net/virtio/virtio_ethdev.c b/drivers/net/virtio/virtio_ethdev.c
index f96588b9d..3682ee318 100644
--- a/drivers/net/virtio/virtio_ethdev.c
+++ b/drivers/net/virtio/virtio_ethdev.c
@@ -140,226 +140,6 @@ static const struct rte_virtio_xstats_name_off rte_virtio_txq_stat_strings[] = {
 
 struct virtio_hw_internal virtio_hw_internal[RTE_MAX_ETHPORTS];
 
-static struct virtio_pmd_ctrl *
-virtio_send_command_packed(struct virtnet_ctl *cvq,
-			   struct virtio_pmd_ctrl *ctrl,
-			   int *dlen, int pkt_num)
-{
-	struct virtqueue *vq = cvq->vq;
-	int head;
-	struct vring_packed_desc *desc = vq->vq_packed.ring.desc;
-	struct virtio_pmd_ctrl *result;
-	uint16_t flags;
-	int sum = 0;
-	int nb_descs = 0;
-	int k;
-
-	/*
-	 * Format is enforced in qemu code:
-	 * One TX packet for header;
-	 * At least one TX packet per argument;
-	 * One RX packet for ACK.
-	 */
-	head = vq->vq_avail_idx;
-	flags = vq->vq_packed.cached_flags;
-	desc[head].addr = cvq->virtio_net_hdr_mem;
-	desc[head].len = sizeof(struct virtio_net_ctrl_hdr);
-	vq->vq_free_cnt--;
-	nb_descs++;
-	if (++vq->vq_avail_idx >= vq->vq_nentries) {
-		vq->vq_avail_idx -= vq->vq_nentries;
-		vq->vq_packed.cached_flags ^= VRING_PACKED_DESC_F_AVAIL_USED;
-	}
-
-	for (k = 0; k < pkt_num; k++) {
-		desc[vq->vq_avail_idx].addr = cvq->virtio_net_hdr_mem
-			+ sizeof(struct virtio_net_ctrl_hdr)
-			+ sizeof(ctrl->status) + sizeof(uint8_t) * sum;
-		desc[vq->vq_avail_idx].len = dlen[k];
-		desc[vq->vq_avail_idx].flags = VRING_DESC_F_NEXT |
-			vq->vq_packed.cached_flags;
-		sum += dlen[k];
-		vq->vq_free_cnt--;
-		nb_descs++;
-		if (++vq->vq_avail_idx >= vq->vq_nentries) {
-			vq->vq_avail_idx -= vq->vq_nentries;
-			vq->vq_packed.cached_flags ^=
-				VRING_PACKED_DESC_F_AVAIL_USED;
-		}
-	}
-
-	desc[vq->vq_avail_idx].addr = cvq->virtio_net_hdr_mem
-		+ sizeof(struct virtio_net_ctrl_hdr);
-	desc[vq->vq_avail_idx].len = sizeof(ctrl->status);
-	desc[vq->vq_avail_idx].flags = VRING_DESC_F_WRITE |
-		vq->vq_packed.cached_flags;
-	vq->vq_free_cnt--;
-	nb_descs++;
-	if (++vq->vq_avail_idx >= vq->vq_nentries) {
-		vq->vq_avail_idx -= vq->vq_nentries;
-		vq->vq_packed.cached_flags ^= VRING_PACKED_DESC_F_AVAIL_USED;
-	}
-
-	virtio_wmb(vq->hw->weak_barriers);
-	desc[head].flags = VRING_DESC_F_NEXT | flags;
-
-	virtio_wmb(vq->hw->weak_barriers);
-	virtqueue_notify(vq);
-
-	/* wait for used descriptors in virtqueue */
-	while (!desc_is_used(&desc[head], vq))
-		usleep(100);
-
-	virtio_rmb(vq->hw->weak_barriers);
-
-	/* now get used descriptors */
-	vq->vq_free_cnt += nb_descs;
-	vq->vq_used_cons_idx += nb_descs;
-	if (vq->vq_used_cons_idx >= vq->vq_nentries) {
-		vq->vq_used_cons_idx -= vq->vq_nentries;
-		vq->vq_packed.used_wrap_counter ^= 1;
-	}
-
-	PMD_INIT_LOG(DEBUG, "vq->vq_free_cnt=%d\n"
-			"vq->vq_avail_idx=%d\n"
-			"vq->vq_used_cons_idx=%d\n"
-			"vq->vq_packed.cached_flags=0x%x\n"
-			"vq->vq_packed.used_wrap_counter=%d\n",
-			vq->vq_free_cnt,
-			vq->vq_avail_idx,
-			vq->vq_used_cons_idx,
-			vq->vq_packed.cached_flags,
-			vq->vq_packed.used_wrap_counter);
-
-	result = cvq->virtio_net_hdr_mz->addr;
-	return result;
-}
-
-static struct virtio_pmd_ctrl *
-virtio_send_command_split(struct virtnet_ctl *cvq,
-			  struct virtio_pmd_ctrl *ctrl,
-			  int *dlen, int pkt_num)
-{
-	struct virtio_pmd_ctrl *result;
-	struct virtqueue *vq = cvq->vq;
-	uint32_t head, i;
-	int k, sum = 0;
-
-	head = vq->vq_desc_head_idx;
-
-	/*
-	 * Format is enforced in qemu code:
-	 * One TX packet for header;
-	 * At least one TX packet per argument;
-	 * One RX packet for ACK.
-	 */
-	vq->vq_split.ring.desc[head].flags = VRING_DESC_F_NEXT;
-	vq->vq_split.ring.desc[head].addr = cvq->virtio_net_hdr_mem;
-	vq->vq_split.ring.desc[head].len = sizeof(struct virtio_net_ctrl_hdr);
-	vq->vq_free_cnt--;
-	i = vq->vq_split.ring.desc[head].next;
-
-	for (k = 0; k < pkt_num; k++) {
-		vq->vq_split.ring.desc[i].flags = VRING_DESC_F_NEXT;
-		vq->vq_split.ring.desc[i].addr = cvq->virtio_net_hdr_mem
-			+ sizeof(struct virtio_net_ctrl_hdr)
-			+ sizeof(ctrl->status) + sizeof(uint8_t)*sum;
-		vq->vq_split.ring.desc[i].len = dlen[k];
-		sum += dlen[k];
-		vq->vq_free_cnt--;
-		i = vq->vq_split.ring.desc[i].next;
-	}
-
-	vq->vq_split.ring.desc[i].flags = VRING_DESC_F_WRITE;
-	vq->vq_split.ring.desc[i].addr = cvq->virtio_net_hdr_mem
-			+ sizeof(struct virtio_net_ctrl_hdr);
-	vq->vq_split.ring.desc[i].len = sizeof(ctrl->status);
-	vq->vq_free_cnt--;
-
-	vq->vq_desc_head_idx = vq->vq_split.ring.desc[i].next;
-
-	vq_update_avail_ring(vq, head);
-	vq_update_avail_idx(vq);
-
-	PMD_INIT_LOG(DEBUG, "vq->vq_queue_index = %d", vq->vq_queue_index);
-
-	virtqueue_notify(vq);
-
-	rte_rmb();
-	while (VIRTQUEUE_NUSED(vq) == 0) {
-		rte_rmb();
-		usleep(100);
-	}
-
-	while (VIRTQUEUE_NUSED(vq)) {
-		uint32_t idx, desc_idx, used_idx;
-		struct vring_used_elem *uep;
-
-		used_idx = (uint32_t)(vq->vq_used_cons_idx
-				& (vq->vq_nentries - 1));
-		uep = &vq->vq_split.ring.used->ring[used_idx];
-		idx = (uint32_t) uep->id;
-		desc_idx = idx;
-
-		while (vq->vq_split.ring.desc[desc_idx].flags &
-				VRING_DESC_F_NEXT) {
-			desc_idx = vq->vq_split.ring.desc[desc_idx].next;
-			vq->vq_free_cnt++;
-		}
-
-		vq->vq_split.ring.desc[desc_idx].next = vq->vq_desc_head_idx;
-		vq->vq_desc_head_idx = idx;
-
-		vq->vq_used_cons_idx++;
-		vq->vq_free_cnt++;
-	}
-
-	PMD_INIT_LOG(DEBUG, "vq->vq_free_cnt=%d\nvq->vq_desc_head_idx=%d",
-			vq->vq_free_cnt, vq->vq_desc_head_idx);
-
-	result = cvq->virtio_net_hdr_mz->addr;
-	return result;
-}
-
-static int
-virtio_send_command(struct virtnet_ctl *cvq, struct virtio_pmd_ctrl *ctrl,
-		    int *dlen, int pkt_num)
-{
-	virtio_net_ctrl_ack status = ~0;
-	struct virtio_pmd_ctrl *result;
-	struct virtqueue *vq;
-
-	ctrl->status = status;
-
-	if (!cvq || !cvq->vq) {
-		PMD_INIT_LOG(ERR, "Control queue is not supported.");
-		return -1;
-	}
-
-	rte_spinlock_lock(&cvq->lock);
-	vq = cvq->vq;
-
-	PMD_INIT_LOG(DEBUG, "vq->vq_desc_head_idx = %d, status = %d, "
-		"vq->hw->cvq = %p vq = %p",
-		vq->vq_desc_head_idx, status, vq->hw->cvq, vq);
-
-	if (vq->vq_free_cnt < pkt_num + 2 || pkt_num < 1) {
-		rte_spinlock_unlock(&cvq->lock);
-		return -1;
-	}
-
-	memcpy(cvq->virtio_net_hdr_mz->addr, ctrl,
-		sizeof(struct virtio_pmd_ctrl));
-
-	if (vtpci_packed_queue(vq->hw))
-		result = virtio_send_command_packed(cvq, ctrl, dlen, pkt_num);
-	else
-		result = virtio_send_command_split(cvq, ctrl, dlen, pkt_num);
-
-	rte_spinlock_unlock(&cvq->lock);
-	return result->status;
-}
-
 static int
 virtio_set_multiple_queues(struct rte_eth_dev *dev, uint16_t nb_queues)
 {
@@ -401,38 +181,6 @@ virtio_get_nr_vq(struct virtio_hw *hw)
 	return nr_vq;
 }
 
-static void
-virtio_init_vring(struct virtqueue *vq)
-{
-	int size = vq->vq_nentries;
-	uint8_t *ring_mem = vq->vq_ring_virt_mem;
-
-	PMD_INIT_FUNC_TRACE();
-
-	memset(ring_mem, 0, vq->vq_ring_size);
-
-	vq->vq_used_cons_idx = 0;
-	vq->vq_desc_head_idx = 0;
-	vq->vq_avail_idx = 0;
-	vq->vq_desc_tail_idx = (uint16_t)(vq->vq_nentries - 1);
-	vq->vq_free_cnt = vq->vq_nentries;
-	memset(vq->vq_descx, 0, sizeof(struct vq_desc_extra) * vq->vq_nentries);
-	if (vtpci_packed_queue(vq->hw)) {
-		vring_init_packed(&vq->vq_packed.ring, ring_mem,
-				  VIRTIO_PCI_VRING_ALIGN, size);
-		vring_desc_init_packed(vq, size);
-	} else {
-		struct vring *vr = &vq->vq_split.ring;
-
-		vring_init_split(vr, ring_mem, VIRTIO_PCI_VRING_ALIGN, size);
-		vring_desc_init_split(vr->desc, size);
-	}
-	/*
-	 * Disable device(host) interrupting guest
-	 */
-	virtqueue_disable_intr(vq);
-}
-
 static int
 virtio_init_queue(struct rte_eth_dev *dev, uint16_t vtpci_queue_idx)
 {
diff --git a/drivers/net/virtio/virtqueue.c b/drivers/net/virtio/virtqueue.c
index 5ff1e3587..db630e07c 100644
--- a/drivers/net/virtio/virtqueue.c
+++ b/drivers/net/virtio/virtqueue.c
@@ -2,6 +2,7 @@
  * Copyright(c) 2010-2015 Intel Corporation
  */
 #include <stdint.h>
+#include <unistd.h>
 
 #include <rte_mbuf.h>
 
@@ -141,3 +142,257 @@ virtqueue_rxvq_flush(struct virtqueue *vq)
 	else
 		virtqueue_rxvq_flush_split(vq);
 }
+
+static struct virtio_pmd_ctrl *
+virtio_send_command_packed(struct virtnet_ctl *cvq,
+			   struct virtio_pmd_ctrl *ctrl,
+			   int *dlen, int pkt_num)
+{
+	struct virtqueue *vq = cvq->vq;
+	int head;
+	struct vring_packed_desc *desc = vq->vq_packed.ring.desc;
+	struct virtio_pmd_ctrl *result;
+	uint16_t flags;
+	int sum = 0;
+	int nb_descs = 0;
+	int k;
+
+	/*
+	 * Format is enforced in qemu code:
+	 * One TX packet for header;
+	 * At least one TX packet per argument;
+	 * One RX packet for ACK.
+	 */
+	head = vq->vq_avail_idx;
+	flags = vq->vq_packed.cached_flags;
+	desc[head].addr = cvq->virtio_net_hdr_mem;
+	desc[head].len = sizeof(struct virtio_net_ctrl_hdr);
+	vq->vq_free_cnt--;
+	nb_descs++;
+	if (++vq->vq_avail_idx >= vq->vq_nentries) {
+		vq->vq_avail_idx -= vq->vq_nentries;
+		vq->vq_packed.cached_flags ^= VRING_PACKED_DESC_F_AVAIL_USED;
+	}
+
+	for (k = 0; k < pkt_num; k++) {
+		desc[vq->vq_avail_idx].addr = cvq->virtio_net_hdr_mem
+			+ sizeof(struct virtio_net_ctrl_hdr)
+			+ sizeof(ctrl->status) + sizeof(uint8_t) * sum;
+		desc[vq->vq_avail_idx].len = dlen[k];
+		desc[vq->vq_avail_idx].flags = VRING_DESC_F_NEXT |
+			vq->vq_packed.cached_flags;
+		sum += dlen[k];
+		vq->vq_free_cnt--;
+		nb_descs++;
+		if (++vq->vq_avail_idx >= vq->vq_nentries) {
+			vq->vq_avail_idx -= vq->vq_nentries;
+			vq->vq_packed.cached_flags ^=
+				VRING_PACKED_DESC_F_AVAIL_USED;
+		}
+	}
+
+	desc[vq->vq_avail_idx].addr = cvq->virtio_net_hdr_mem
+		+ sizeof(struct virtio_net_ctrl_hdr);
+	desc[vq->vq_avail_idx].len = sizeof(ctrl->status);
+	desc[vq->vq_avail_idx].flags = VRING_DESC_F_WRITE |
+		vq->vq_packed.cached_flags;
+	vq->vq_free_cnt--;
+	nb_descs++;
+	if (++vq->vq_avail_idx >= vq->vq_nentries) {
+		vq->vq_avail_idx -= vq->vq_nentries;
+		vq->vq_packed.cached_flags ^= VRING_PACKED_DESC_F_AVAIL_USED;
+	}
+
+	virtio_wmb(vq->hw->weak_barriers);
+	desc[head].flags = VRING_DESC_F_NEXT | flags;
+
+	virtio_wmb(vq->hw->weak_barriers);
+	virtqueue_notify(vq);
+
+	/* wait for used descriptors in virtqueue */
+	while (!desc_is_used(&desc[head], vq))
+		usleep(100);
+
+	virtio_rmb(vq->hw->weak_barriers);
+
+	/* now get used descriptors */
+	vq->vq_free_cnt += nb_descs;
+	vq->vq_used_cons_idx += nb_descs;
+	if (vq->vq_used_cons_idx >= vq->vq_nentries) {
+		vq->vq_used_cons_idx -= vq->vq_nentries;
+		vq->vq_packed.used_wrap_counter ^= 1;
+	}
+
+	PMD_INIT_LOG(DEBUG, "vq->vq_free_cnt=%d\n"
+			"vq->vq_avail_idx=%d\n"
+			"vq->vq_used_cons_idx=%d\n"
+			"vq->vq_packed.cached_flags=0x%x\n"
+			"vq->vq_packed.used_wrap_counter=%d\n",
+			vq->vq_free_cnt,
+			vq->vq_avail_idx,
+			vq->vq_used_cons_idx,
+			vq->vq_packed.cached_flags,
+			vq->vq_packed.used_wrap_counter);
+
+	result = cvq->virtio_net_hdr_mz->addr;
+	return result;
+}
+
+static struct virtio_pmd_ctrl *
+virtio_send_command_split(struct virtnet_ctl *cvq,
+			  struct virtio_pmd_ctrl *ctrl,
+			  int *dlen, int pkt_num)
+{
+	struct virtio_pmd_ctrl *result;
+	struct virtqueue *vq = cvq->vq;
+	uint32_t head, i;
+	int k, sum = 0;
+
+	head = vq->vq_desc_head_idx;
+
+	/*
+	 * Format is enforced in qemu code:
+	 * One TX packet for header;
+	 * At least one TX packet per argument;
+	 * One RX packet for ACK.
+	 */
+	vq->vq_split.ring.desc[head].flags = VRING_DESC_F_NEXT;
+	vq->vq_split.ring.desc[head].addr = cvq->virtio_net_hdr_mem;
+	vq->vq_split.ring.desc[head].len = sizeof(struct virtio_net_ctrl_hdr);
+	vq->vq_free_cnt--;
+	i = vq->vq_split.ring.desc[head].next;
+
+	for (k = 0; k < pkt_num; k++) {
+		vq->vq_split.ring.desc[i].flags = VRING_DESC_F_NEXT;
+		vq->vq_split.ring.desc[i].addr = cvq->virtio_net_hdr_mem
+			+ sizeof(struct virtio_net_ctrl_hdr)
+			+ sizeof(ctrl->status) + sizeof(uint8_t) * sum;
+		vq->vq_split.ring.desc[i].len = dlen[k];
+		sum += dlen[k];
+		vq->vq_free_cnt--;
+		i = vq->vq_split.ring.desc[i].next;
+	}
+
+	vq->vq_split.ring.desc[i].flags = VRING_DESC_F_WRITE;
+	vq->vq_split.ring.desc[i].addr = cvq->virtio_net_hdr_mem
+			+ sizeof(struct virtio_net_ctrl_hdr);
+	vq->vq_split.ring.desc[i].len = sizeof(ctrl->status);
+	vq->vq_free_cnt--;
+
+	vq->vq_desc_head_idx = vq->vq_split.ring.desc[i].next;
+
+	vq_update_avail_ring(vq, head);
+	vq_update_avail_idx(vq);
+
+	PMD_INIT_LOG(DEBUG, "vq->vq_queue_index = %d", vq->vq_queue_index);
+
+	virtqueue_notify(vq);
+
+	rte_rmb();
+	while (VIRTQUEUE_NUSED(vq) == 0) {
+		rte_rmb();
+		usleep(100);
+	}
+
+	while (VIRTQUEUE_NUSED(vq)) {
+		uint32_t idx, desc_idx, used_idx;
+		struct vring_used_elem *uep;
+
+		used_idx = (uint32_t)(vq->vq_used_cons_idx
+				& (vq->vq_nentries - 1));
+		uep = &vq->vq_split.ring.used->ring[used_idx];
+		idx = (uint32_t)uep->id;
+		desc_idx = idx;
+
+		while (vq->vq_split.ring.desc[desc_idx].flags &
+				VRING_DESC_F_NEXT) {
+			desc_idx = vq->vq_split.ring.desc[desc_idx].next;
+			vq->vq_free_cnt++;
+		}
+
+		vq->vq_split.ring.desc[desc_idx].next = vq->vq_desc_head_idx;
+		vq->vq_desc_head_idx = idx;
+
+		vq->vq_used_cons_idx++;
+		vq->vq_free_cnt++;
+	}
+
+	PMD_INIT_LOG(DEBUG, "vq->vq_free_cnt=%d\nvq->vq_desc_head_idx=%d",
+			vq->vq_free_cnt, vq->vq_desc_head_idx);
+
+	result = cvq->virtio_net_hdr_mz->addr;
+	return result;
+}
+
+int
+virtio_send_command(struct virtnet_ctl *cvq, struct virtio_pmd_ctrl *ctrl,
+		    int *dlen, int pkt_num)
+{
+	virtio_net_ctrl_ack status = ~0;
+	struct virtio_pmd_ctrl *result;
+	struct virtqueue *vq;
+
+	ctrl->status = status;
+
+	if (!cvq || !cvq->vq) {
+		PMD_INIT_LOG(ERR, "Control queue is not supported.");
+		return -1;
+	}
+
+	rte_spinlock_lock(&cvq->lock);
+	vq = cvq->vq;
+
+	PMD_INIT_LOG(DEBUG, "vq->vq_desc_head_idx = %d, status = %d, "
+		"vq->hw->cvq = %p vq = %p",
+		vq->vq_desc_head_idx, status, vq->hw->cvq, vq);
+
+	if (vq->vq_free_cnt < pkt_num + 2 || pkt_num < 1) {
+		rte_spinlock_unlock(&cvq->lock);
+		return -1;
+	}
+
+	memcpy(cvq->virtio_net_hdr_mz->addr, ctrl,
+		sizeof(struct virtio_pmd_ctrl));
+
+	if (vtpci_packed_queue(vq->hw))
+		result = virtio_send_command_packed(cvq, ctrl, dlen, pkt_num);
+	else
+		result = virtio_send_command_split(cvq, ctrl, dlen, pkt_num);
+
+	rte_spinlock_unlock(&cvq->lock);
+	return result->status;
+}
+
+void
+virtio_init_vring(struct virtqueue *vq)
+{
+	int size = vq->vq_nentries;
+	uint8_t *ring_mem = vq->vq_ring_virt_mem;
+
+	PMD_INIT_FUNC_TRACE();
+
+	memset(ring_mem, 0, vq->vq_ring_size);
+
+	vq->vq_used_cons_idx = 0;
+	vq->vq_desc_head_idx = 0;
+	vq->vq_avail_idx = 0;
+	vq->vq_desc_tail_idx = (uint16_t)(vq->vq_nentries - 1);
+	vq->vq_free_cnt = vq->vq_nentries;
+	if (vq->vq_descx)
+		memset(vq->vq_descx, 0,
+			sizeof(struct vq_desc_extra) * vq->vq_nentries);
+	if (vtpci_packed_queue(vq->hw)) {
+		vring_init_packed(&vq->vq_packed.ring, ring_mem,
+				  VIRTIO_PCI_VRING_ALIGN, size);
+		vring_desc_init_packed(vq, size);
+	} else {
+		struct vring *vr = &vq->vq_split.ring;
+
+		vring_init_split(vr, ring_mem, VIRTIO_PCI_VRING_ALIGN, size);
+		vring_desc_init_split(vr->desc, size);
+	}
+	/*
+	 * Disable device(host) interrupting guest
+	 */
+	virtqueue_disable_intr(vq);
+}
diff --git a/drivers/net/virtio/virtqueue.h b/drivers/net/virtio/virtqueue.h
index c6dd4a347..4d8d069c7 100644
--- a/drivers/net/virtio/virtqueue.h
+++ b/drivers/net/virtio/virtqueue.h
@@ -480,6 +480,11 @@ virtqueue_notify(struct virtqueue *vq)
 	VTPCI_OPS(vq->hw)->notify_queue(vq->hw, vq);
 }
 
+int virtio_send_command(struct virtnet_ctl *cvq, struct virtio_pmd_ctrl *ctrl,
+		    int *dlen, int pkt_num);
+
+void virtio_init_vring(struct virtqueue *vq);
+
 #ifdef RTE_LIBRTE_VIRTIO_DEBUG_DUMP
 #define VIRTQUEUE_DUMP(vq) do { \
 	uint16_t used_idx, nused; \
-- 
2.21.0


^ permalink raw reply	[flat|nested] 50+ messages in thread

* [dpdk-dev] [PATCH 04/15] net/virtio: add virtio PCI subsystem device ID declaration
  2019-08-29  7:59 [dpdk-dev] [PATCH 00/15] Introduce Virtio vDPA driver Maxime Coquelin
                   ` (2 preceding siblings ...)
  2019-08-29  7:59 ` [dpdk-dev] [PATCH 03/15] net/virtio: move control path fonctions in virtqueue file Maxime Coquelin
@ 2019-08-29  7:59 ` Maxime Coquelin
  2019-09-02  6:14   ` Tiwei Bie
  2019-08-29  7:59 ` [dpdk-dev] [PATCH 05/15] net/virtio: save notify bar ID in virtio HW struct Maxime Coquelin
                   ` (11 subsequent siblings)
  15 siblings, 1 reply; 50+ messages in thread
From: Maxime Coquelin @ 2019-08-29  7:59 UTC (permalink / raw)
  To: tiwei.bie, zhihong.wang, amorenoz, xiao.w.wang, dev, jfreimann
  Cc: stable, Maxime Coquelin

The Virtio PCI susbsytem IDs need to be specified to
prevent it to probe IFC vDPA VFs.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
---
 drivers/net/virtio/virtio_pci.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/virtio/virtio_pci.h b/drivers/net/virtio/virtio_pci.h
index a38cb45ad..56f89a454 100644
--- a/drivers/net/virtio/virtio_pci.h
+++ b/drivers/net/virtio/virtio_pci.h
@@ -19,6 +19,7 @@ struct virtnet_ctl;
 #define VIRTIO_PCI_VENDORID     0x1AF4
 #define VIRTIO_PCI_LEGACY_DEVICEID_NET 0x1000
 #define VIRTIO_PCI_MODERN_DEVICEID_NET 0x1041
+#define VIRTIO_PCI_SUBSY_DEVICEID_NET 0x1100
 
 /* VirtIO ABI version, this must match exactly. */
 #define VIRTIO_PCI_ABI_VERSION 0
-- 
2.21.0


^ permalink raw reply	[flat|nested] 50+ messages in thread

* [dpdk-dev] [PATCH 05/15] net/virtio: save notify bar ID in virtio HW struct
  2019-08-29  7:59 [dpdk-dev] [PATCH 00/15] Introduce Virtio vDPA driver Maxime Coquelin
                   ` (3 preceding siblings ...)
  2019-08-29  7:59 ` [dpdk-dev] [PATCH 04/15] net/virtio: add virtio PCI subsystem device ID declaration Maxime Coquelin
@ 2019-08-29  7:59 ` Maxime Coquelin
  2019-09-02  6:17   ` Tiwei Bie
  2019-08-29  7:59 ` [dpdk-dev] [PATCH 06/15] net/virtio: add skeleton for virtio vDPA driver Maxime Coquelin
                   ` (10 subsequent siblings)
  15 siblings, 1 reply; 50+ messages in thread
From: Maxime Coquelin @ 2019-08-29  7:59 UTC (permalink / raw)
  To: tiwei.bie, zhihong.wang, amorenoz, xiao.w.wang, dev, jfreimann
  Cc: stable, Maxime Coquelin

SAving the notify bar ID in the virtio HW struct will be
used by the virtio-vdpa driver in its .get_notify_area()
callback.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
---
 drivers/net/virtio/virtio_pci.c | 6 ++++--
 drivers/net/virtio/virtio_pci.h | 1 +
 2 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/drivers/net/virtio/virtio_pci.c b/drivers/net/virtio/virtio_pci.c
index 4468e89cb..762847d70 100644
--- a/drivers/net/virtio/virtio_pci.c
+++ b/drivers/net/virtio/virtio_pci.c
@@ -617,12 +617,14 @@ virtio_read_caps(struct rte_pci_device *dev, struct virtio_hw *hw)
 			ret = rte_pci_read_config(dev,
 					&hw->notify_off_multiplier,
 					4, pos + sizeof(cap));
-			if (ret != 4)
+			if (ret != 4) {
 				PMD_INIT_LOG(DEBUG,
 					"failed to read notify_off_multiplier, ret %d",
 					ret);
-			else
+			} else {
 				hw->notify_base = get_cfg_addr(dev, &cap);
+				hw->notify_region = cap.bar;
+			}
 			break;
 		case VIRTIO_PCI_CAP_DEVICE_CFG:
 			hw->dev_cfg = get_cfg_addr(dev, &cap);
diff --git a/drivers/net/virtio/virtio_pci.h b/drivers/net/virtio/virtio_pci.h
index 56f89a454..afdcc906f 100644
--- a/drivers/net/virtio/virtio_pci.h
+++ b/drivers/net/virtio/virtio_pci.h
@@ -256,6 +256,7 @@ struct virtio_hw {
 	uint32_t    notify_off_multiplier;
 	uint8_t     *isr;
 	uint16_t    *notify_base;
+	uint8_t     notify_region;
 	struct virtio_pci_common_cfg *common_cfg;
 	struct virtio_net_config *dev_cfg;
 	void	    *virtio_user_dev;
-- 
2.21.0


^ permalink raw reply	[flat|nested] 50+ messages in thread

* [dpdk-dev] [PATCH 06/15] net/virtio: add skeleton for virtio vDPA driver
  2019-08-29  7:59 [dpdk-dev] [PATCH 00/15] Introduce Virtio vDPA driver Maxime Coquelin
                   ` (4 preceding siblings ...)
  2019-08-29  7:59 ` [dpdk-dev] [PATCH 05/15] net/virtio: save notify bar ID in virtio HW struct Maxime Coquelin
@ 2019-08-29  7:59 ` Maxime Coquelin
  2019-09-02  6:27   ` Tiwei Bie
  2019-08-29  7:59 ` [dpdk-dev] [PATCH 07/15] net/virtio: add vDPA ops to get number of queue Maxime Coquelin
                   ` (9 subsequent siblings)
  15 siblings, 1 reply; 50+ messages in thread
From: Maxime Coquelin @ 2019-08-29  7:59 UTC (permalink / raw)
  To: tiwei.bie, zhihong.wang, amorenoz, xiao.w.wang, dev, jfreimann
  Cc: stable, Maxime Coquelin

This patch adds the base for the Virtio vDPA driver.
This driver can be used either for development purpose, when
probed with a para-virtualized Virtio device from a guest, or
it can be used with real HW supporting full Virtio offload
(both data and control paths).

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
---
 config/common_linux              |   1 +
 drivers/net/virtio/Makefile      |   4 +
 drivers/net/virtio/meson.build   |   3 +-
 drivers/net/virtio/virtio_vdpa.c | 301 +++++++++++++++++++++++++++++++
 4 files changed, 308 insertions(+), 1 deletion(-)
 create mode 100644 drivers/net/virtio/virtio_vdpa.c

diff --git a/config/common_linux b/config/common_linux
index 6e252553a..293a8ff48 100644
--- a/config/common_linux
+++ b/config/common_linux
@@ -27,6 +27,7 @@ CONFIG_RTE_LIBRTE_VDEV_NETVSC_PMD=y
 CONFIG_RTE_LIBRTE_NFP_PMD=y
 CONFIG_RTE_LIBRTE_POWER=y
 CONFIG_RTE_VIRTIO_USER=y
+CONFIG_RTE_VIRTIO_VDPA=y
 CONFIG_RTE_PROC_INFO=y
 
 CONFIG_RTE_LIBRTE_VMBUS=y
diff --git a/drivers/net/virtio/Makefile b/drivers/net/virtio/Makefile
index 6c2c9967b..0760074ad 100644
--- a/drivers/net/virtio/Makefile
+++ b/drivers/net/virtio/Makefile
@@ -45,4 +45,8 @@ SRCS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += virtio_user/virtio_user_dev.c
 SRCS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += virtio_user_ethdev.c
 endif
 
+ifeq ($(CONFIG_RTE_VIRTIO_VDPA),y)
+SRCS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += virtio_vdpa.c
+endif
+
 include $(RTE_SDK)/mk/rte.lib.mk
diff --git a/drivers/net/virtio/meson.build b/drivers/net/virtio/meson.build
index 794905401..f5f1f6e68 100644
--- a/drivers/net/virtio/meson.build
+++ b/drivers/net/virtio/meson.build
@@ -6,8 +6,9 @@ sources += files('virtio_ethdev.c',
 	'virtio_pci.c',
 	'virtio_rxtx.c',
 	'virtio_rxtx_simple.c',
+	'virtio_vdpa.c',
 	'virtqueue.c')
-deps += ['kvargs', 'bus_pci']
+deps += ['kvargs', 'bus_pci', 'vhost']
 
 if arch_subdir == 'x86'
 	sources += files('virtio_rxtx_simple_sse.c')
diff --git a/drivers/net/virtio/virtio_vdpa.c b/drivers/net/virtio/virtio_vdpa.c
new file mode 100644
index 000000000..bbb417b94
--- /dev/null
+++ b/drivers/net/virtio/virtio_vdpa.c
@@ -0,0 +1,301 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2017 Intel Corporation
+ * Copyright(c) 2019 Red Hat, Inc.
+ */
+
+#include <unistd.h>
+#include <sys/ioctl.h>
+
+#include <rte_kvargs.h>
+#include <rte_malloc.h>
+#include <rte_vdpa.h>
+#include <rte_vfio.h>
+#include <rte_vhost.h>
+
+#include "virtio_pci.h"
+#include "virtqueue.h"
+
+#ifndef PAGE_SIZE
+#define PAGE_SIZE 4096
+#endif
+
+#define DRV_LOG(level, fmt, args...) \
+	rte_log(RTE_LOG_ ## level, virtio_vdpa_logtype, \
+		"VIRTIO_VDPA %s(): " fmt "\n", __func__, ##args)
+
+#define VIRTIO_VDPA_MODE		"vdpa"
+
+static const char * const virtio_vdpa_valid_arguments[] = {
+	VIRTIO_VDPA_MODE,
+	NULL
+};
+
+static int virtio_vdpa_logtype;
+
+struct virtio_vdpa_device {
+	struct rte_vdpa_dev_addr dev_addr;
+	struct rte_pci_device *pdev;
+	struct virtio_hw hw;
+	int vfio_container_fd;
+	int vfio_group_fd;
+	int vfio_dev_fd;
+	int iommu_group_num;
+	int vid;
+	int did;
+	uint16_t max_queue_pairs;
+	bool has_ctrl_vq;
+	struct virtqueue *vqs;
+	struct virtqueue *cvq;
+	rte_spinlock_t lock;
+};
+
+struct internal_list {
+	TAILQ_ENTRY(internal_list) next;
+	struct virtio_vdpa_device *dev;
+};
+
+TAILQ_HEAD(internal_list_head, internal_list);
+static struct internal_list_head internal_list =
+	TAILQ_HEAD_INITIALIZER(internal_list);
+
+static pthread_mutex_t internal_list_lock = PTHREAD_MUTEX_INITIALIZER;
+
+static struct internal_list *
+find_internal_resource_by_dev(struct rte_pci_device *pdev)
+{
+	int found = 0;
+	struct internal_list *list;
+
+	pthread_mutex_lock(&internal_list_lock);
+
+	TAILQ_FOREACH(list, &internal_list, next) {
+		if (pdev == list->dev->pdev) {
+			found = 1;
+			break;
+		}
+	}
+
+	pthread_mutex_unlock(&internal_list_lock);
+
+	if (!found)
+		return NULL;
+
+	return list;
+}
+
+static int
+virtio_vdpa_vfio_setup(struct virtio_vdpa_device *dev)
+{
+	struct rte_pci_device *pdev = dev->pdev;
+	char devname[RTE_DEV_NAME_MAX_LEN] = {0};
+	int iommu_group_num;
+
+	dev->vfio_dev_fd = -1;
+	dev->vfio_group_fd = -1;
+	dev->vfio_container_fd = -1;
+	dev->iommu_group_num = -1;
+
+	rte_pci_device_name(&pdev->addr, devname, RTE_DEV_NAME_MAX_LEN);
+	rte_vfio_get_group_num(rte_pci_get_sysfs_path(), devname,
+			&iommu_group_num);
+
+	dev->vfio_container_fd = rte_vfio_container_create();
+	if (dev->vfio_container_fd < 0)
+		return -1;
+
+	dev->vfio_group_fd =
+		rte_vfio_container_group_bind(dev->vfio_container_fd,
+			iommu_group_num);
+	if (dev->vfio_group_fd < 0)
+		goto err_container_destroy;
+
+	if (rte_pci_map_device(pdev))
+		goto err_container_unbind;
+
+	dev->vfio_dev_fd = pdev->intr_handle.vfio_dev_fd;
+	dev->iommu_group_num = iommu_group_num;
+
+	return 0;
+
+err_container_unbind:
+	rte_vfio_container_group_unbind(dev->vfio_container_fd,
+			iommu_group_num);
+err_container_destroy:
+	rte_vfio_container_destroy(dev->vfio_container_fd);
+
+	dev->vfio_dev_fd = -1;
+	dev->vfio_group_fd = -1;
+	dev->vfio_container_fd = -1;
+	dev->iommu_group_num = -1;
+
+	return -1;
+}
+
+static struct rte_vdpa_dev_ops virtio_vdpa_ops = {
+};
+
+static inline int
+open_int(const char *key __rte_unused, const char *value, void *extra_args)
+{
+	uint16_t *n = extra_args;
+
+	if (value == NULL || extra_args == NULL)
+		return -EINVAL;
+
+	*n = (uint16_t)strtoul(value, NULL, 0);
+	if (*n == USHRT_MAX && errno == ERANGE)
+		return -1;
+
+	return 0;
+}
+
+static int
+virtio_vdpa_pci_probe(struct rte_pci_driver *pci_drv __rte_unused,
+		struct rte_pci_device *pci_dev)
+{
+	struct virtio_vdpa_device *dev;
+	struct internal_list *list = NULL;
+	struct rte_kvargs *kvlist = NULL;
+	int ret, vdpa_mode = 0;
+
+	if (rte_eal_process_type() != RTE_PROC_PRIMARY)
+		return 0;
+
+	if (!pci_dev->device.devargs)
+		return -1;
+
+	kvlist = rte_kvargs_parse(pci_dev->device.devargs->args,
+			virtio_vdpa_valid_arguments);
+	if (kvlist == NULL)
+		return -1;
+
+	/* probe only when vdpa mode is specified */
+	if (rte_kvargs_count(kvlist, VIRTIO_VDPA_MODE) == 0)
+		goto err_free_kvargs;
+
+	ret = rte_kvargs_process(kvlist, VIRTIO_VDPA_MODE, &open_int,
+			&vdpa_mode);
+	if (ret < 0 || vdpa_mode == 0)
+		goto err_free_kvargs;
+
+	list = rte_zmalloc("virtio_vdpa", sizeof(*list), 0);
+	if (list == NULL)
+		goto err_free_kvargs;
+
+	dev = rte_zmalloc("virtio_vdpa", sizeof(*dev), 0);
+	if (!dev)
+		goto err_free_list;
+
+	dev->pdev = pci_dev;
+	rte_spinlock_init(&dev->lock);
+
+	if (virtio_vdpa_vfio_setup(dev) < 0) {
+		DRV_LOG(ERR, "failed to setup device %s", pci_dev->name);
+		goto err_free_vvdev;
+	}
+
+	dev->dev_addr.pci_addr = pci_dev->addr;
+	dev->dev_addr.type = PCI_ADDR;
+	dev->max_queue_pairs = 1;
+	list->dev = dev;
+
+	if (vtpci_init(pci_dev, &dev->hw))
+		goto err_free_vfio;
+
+	dev->did = rte_vdpa_register_device(&dev->dev_addr,
+				&virtio_vdpa_ops);
+
+	if (dev->did < 0) {
+		DRV_LOG(ERR, "failed to register device %s", pci_dev->name);
+		goto err_free_vfio;
+	}
+
+	pthread_mutex_lock(&internal_list_lock);
+	TAILQ_INSERT_TAIL(&internal_list, list, next);
+	pthread_mutex_unlock(&internal_list_lock);
+
+	rte_kvargs_free(kvlist);
+
+	return 0;
+
+err_free_vfio:
+	rte_vfio_container_destroy(dev->vfio_container_fd);
+err_free_vvdev:
+	rte_free(dev);
+err_free_list:
+	rte_free(list);
+err_free_kvargs:
+	rte_kvargs_free(kvlist);
+
+	return 1;
+}
+
+static int
+virtio_vdpa_pci_remove(struct rte_pci_device *pci_dev)
+{
+	struct virtio_vdpa_device *dev;
+	struct internal_list *list;
+
+	if (rte_eal_process_type() != RTE_PROC_PRIMARY)
+		return 0;
+
+	list = find_internal_resource_by_dev(pci_dev);
+	if (list == NULL) {
+		DRV_LOG(ERR, "Invalid device: %s", pci_dev->name);
+		return -1;
+	}
+
+	dev = list->dev;
+
+	rte_vdpa_unregister_device(dev->did);
+	rte_pci_unmap_device(dev->pdev);
+	rte_vfio_container_group_unbind(dev->vfio_container_fd,
+			dev->iommu_group_num);
+	rte_vfio_container_destroy(dev->vfio_container_fd);
+
+	pthread_mutex_lock(&internal_list_lock);
+	TAILQ_REMOVE(&internal_list, list, next);
+	pthread_mutex_unlock(&internal_list_lock);
+
+	rte_free(dev->vqs);
+	rte_free(list);
+	rte_free(dev);
+
+	return 0;
+}
+
+static const struct rte_pci_id pci_id_virtio_vdpa_map[] = {
+	{ .class_id = RTE_CLASS_ANY_ID,
+	  .vendor_id = VIRTIO_PCI_VENDORID,
+	  .device_id = VIRTIO_PCI_LEGACY_DEVICEID_NET,
+	  .subsystem_vendor_id = VIRTIO_PCI_VENDORID,
+	  .subsystem_device_id = VIRTIO_PCI_SUBSY_DEVICEID_NET,
+	},
+	{ .class_id = RTE_CLASS_ANY_ID,
+	  .vendor_id = VIRTIO_PCI_VENDORID,
+	  .device_id = VIRTIO_PCI_MODERN_DEVICEID_NET,
+	  .subsystem_vendor_id = VIRTIO_PCI_VENDORID,
+	  .subsystem_device_id = VIRTIO_PCI_SUBSY_DEVICEID_NET,
+	},
+	{ .vendor_id = 0, /* sentinel */
+	},
+};
+
+static struct rte_pci_driver rte_virtio_vdpa = {
+	.id_table = pci_id_virtio_vdpa_map,
+	.drv_flags = 0,
+	.probe = virtio_vdpa_pci_probe,
+	.remove = virtio_vdpa_pci_remove,
+};
+
+RTE_PMD_REGISTER_PCI(net_virtio_vdpa, rte_virtio_vdpa);
+RTE_PMD_REGISTER_PCI_TABLE(net_virtio_vdpa, pci_id_virtio_vdpa_map);
+RTE_PMD_REGISTER_KMOD_DEP(net_virtio_vdpa, "* vfio-pci");
+
+RTE_INIT(virtio_vdpa_init_log)
+{
+	virtio_vdpa_logtype = rte_log_register("pmd.net.virtio_vdpa");
+	if (virtio_vdpa_logtype >= 0)
+		rte_log_set_level(virtio_vdpa_logtype, RTE_LOG_NOTICE);
+}
+
-- 
2.21.0


^ permalink raw reply	[flat|nested] 50+ messages in thread

* [dpdk-dev] [PATCH 07/15] net/virtio: add vDPA ops to get number of queue
  2019-08-29  7:59 [dpdk-dev] [PATCH 00/15] Introduce Virtio vDPA driver Maxime Coquelin
                   ` (5 preceding siblings ...)
  2019-08-29  7:59 ` [dpdk-dev] [PATCH 06/15] net/virtio: add skeleton for virtio vDPA driver Maxime Coquelin
@ 2019-08-29  7:59 ` Maxime Coquelin
  2019-09-02  6:32   ` Tiwei Bie
  2019-08-29  7:59 ` [dpdk-dev] [PATCH 08/15] net/virtio: add virtio vDPA op to get features Maxime Coquelin
                   ` (8 subsequent siblings)
  15 siblings, 1 reply; 50+ messages in thread
From: Maxime Coquelin @ 2019-08-29  7:59 UTC (permalink / raw)
  To: tiwei.bie, zhihong.wang, amorenoz, xiao.w.wang, dev, jfreimann
  Cc: stable, Maxime Coquelin

This patch implements the vDPA .get_queue_num() callback.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
---
 drivers/net/virtio/virtio_vdpa.c | 43 ++++++++++++++++++++++++++++++++
 1 file changed, 43 insertions(+)

diff --git a/drivers/net/virtio/virtio_vdpa.c b/drivers/net/virtio/virtio_vdpa.c
index bbb417b94..07ff1e090 100644
--- a/drivers/net/virtio/virtio_vdpa.c
+++ b/drivers/net/virtio/virtio_vdpa.c
@@ -60,6 +60,29 @@ static struct internal_list_head internal_list =
 
 static pthread_mutex_t internal_list_lock = PTHREAD_MUTEX_INITIALIZER;
 
+static struct internal_list *
+find_internal_resource_by_did(int did)
+{
+	int found = 0;
+	struct internal_list *list;
+
+	pthread_mutex_lock(&internal_list_lock);
+
+	TAILQ_FOREACH(list, &internal_list, next) {
+		if (did == list->dev->did) {
+			found = 1;
+			break;
+		}
+	}
+
+	pthread_mutex_unlock(&internal_list_lock);
+
+	if (!found)
+		return NULL;
+
+	return list;
+}
+
 static struct internal_list *
 find_internal_resource_by_dev(struct rte_pci_device *pdev)
 {
@@ -131,7 +154,27 @@ virtio_vdpa_vfio_setup(struct virtio_vdpa_device *dev)
 	return -1;
 }
 
+static int
+virtio_vdpa_get_queue_num(int did, uint32_t *queue_num)
+{
+	struct internal_list *list;
+	struct virtio_vdpa_device *dev;
+
+	list = find_internal_resource_by_did(did);
+	if (list == NULL) {
+		DRV_LOG(ERR, "Invalid device id: %d", did);
+		return -1;
+	}
+
+	dev = list->dev;
+
+	*queue_num = dev->max_queue_pairs;
+
+	return 0;
+}
+
 static struct rte_vdpa_dev_ops virtio_vdpa_ops = {
+	.get_queue_num = virtio_vdpa_get_queue_num,
 };
 
 static inline int
-- 
2.21.0


^ permalink raw reply	[flat|nested] 50+ messages in thread

* [dpdk-dev] [PATCH 08/15] net/virtio: add virtio vDPA op to get features
  2019-08-29  7:59 [dpdk-dev] [PATCH 00/15] Introduce Virtio vDPA driver Maxime Coquelin
                   ` (6 preceding siblings ...)
  2019-08-29  7:59 ` [dpdk-dev] [PATCH 07/15] net/virtio: add vDPA ops to get number of queue Maxime Coquelin
@ 2019-08-29  7:59 ` Maxime Coquelin
  2019-09-02  6:43   ` Tiwei Bie
  2019-08-29  7:59 ` [dpdk-dev] [PATCH 09/15] net/virtio: add virtio vDPA op to get protocol features Maxime Coquelin
                   ` (7 subsequent siblings)
  15 siblings, 1 reply; 50+ messages in thread
From: Maxime Coquelin @ 2019-08-29  7:59 UTC (permalink / raw)
  To: tiwei.bie, zhihong.wang, amorenoz, xiao.w.wang, dev, jfreimann
  Cc: stable, Maxime Coquelin

This patch implements the vDPA .get_features() callback.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
---
 drivers/net/virtio/virtio_vdpa.c | 32 ++++++++++++++++++++++++++++++++
 1 file changed, 32 insertions(+)

diff --git a/drivers/net/virtio/virtio_vdpa.c b/drivers/net/virtio/virtio_vdpa.c
index 07ff1e090..9e2af8313 100644
--- a/drivers/net/virtio/virtio_vdpa.c
+++ b/drivers/net/virtio/virtio_vdpa.c
@@ -173,8 +173,40 @@ virtio_vdpa_get_queue_num(int did, uint32_t *queue_num)
 	return 0;
 }
 
+static int
+virtio_vdpa_get_features(int did, uint64_t *features)
+{
+	struct internal_list *list;
+	struct virtio_vdpa_device *dev;
+
+	list = find_internal_resource_by_did(did);
+	if (list == NULL) {
+		DRV_LOG(ERR, "Invalid device id: %d", did);
+		return -1;
+	}
+
+	dev = list->dev;
+
+	*features = VTPCI_OPS(&dev->hw)->get_features(&dev->hw);
+
+	if (!(*features & (1ULL << VIRTIO_F_IOMMU_PLATFORM))) {
+		DRV_LOG(ERR, "Device does not support IOMMU");
+		return -1;
+	}
+
+	if (*features & (1ULL << VIRTIO_NET_F_CTRL_VQ))
+		dev->has_ctrl_vq = true;
+	else
+		dev->has_ctrl_vq = false;
+
+	*features |= (1ULL << VHOST_USER_F_PROTOCOL_FEATURES);
+
+	return 0;
+}
+
 static struct rte_vdpa_dev_ops virtio_vdpa_ops = {
 	.get_queue_num = virtio_vdpa_get_queue_num,
+	.get_features = virtio_vdpa_get_features,
 };
 
 static inline int
-- 
2.21.0


^ permalink raw reply	[flat|nested] 50+ messages in thread

* [dpdk-dev] [PATCH 09/15] net/virtio: add virtio vDPA op to get protocol features
  2019-08-29  7:59 [dpdk-dev] [PATCH 00/15] Introduce Virtio vDPA driver Maxime Coquelin
                   ` (7 preceding siblings ...)
  2019-08-29  7:59 ` [dpdk-dev] [PATCH 08/15] net/virtio: add virtio vDPA op to get features Maxime Coquelin
@ 2019-08-29  7:59 ` Maxime Coquelin
  2019-09-02  6:46   ` Tiwei Bie
  2019-08-29  7:59 ` [dpdk-dev] [PATCH 10/15] net/virtio: add vDPA op to configure and start the device Maxime Coquelin
                   ` (6 subsequent siblings)
  15 siblings, 1 reply; 50+ messages in thread
From: Maxime Coquelin @ 2019-08-29  7:59 UTC (permalink / raw)
  To: tiwei.bie, zhihong.wang, amorenoz, xiao.w.wang, dev, jfreimann
  Cc: stable, Maxime Coquelin

This patch implements the vDPA .get_protocol_features()
callback.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
---
 drivers/net/virtio/virtio_vdpa.c | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/drivers/net/virtio/virtio_vdpa.c b/drivers/net/virtio/virtio_vdpa.c
index 9e2af8313..fc52a8e92 100644
--- a/drivers/net/virtio/virtio_vdpa.c
+++ b/drivers/net/virtio/virtio_vdpa.c
@@ -204,9 +204,22 @@ virtio_vdpa_get_features(int did, uint64_t *features)
 	return 0;
 }
 
+#define VDPA_SUPPORTED_PROTOCOL_FEATURES \
+		(1ULL << VHOST_USER_PROTOCOL_F_REPLY_ACK | \
+		 1ULL << VHOST_USER_PROTOCOL_F_SLAVE_REQ | \
+		 1ULL << VHOST_USER_PROTOCOL_F_SLAVE_SEND_FD | \
+		 1ULL << VHOST_USER_PROTOCOL_F_HOST_NOTIFIER)
+static int
+virtio_vdpa_get_protocol_features(int did __rte_unused, uint64_t *features)
+{
+	*features = VDPA_SUPPORTED_PROTOCOL_FEATURES;
+	return 0;
+}
+
 static struct rte_vdpa_dev_ops virtio_vdpa_ops = {
 	.get_queue_num = virtio_vdpa_get_queue_num,
 	.get_features = virtio_vdpa_get_features,
+	.get_protocol_features = virtio_vdpa_get_protocol_features,
 };
 
 static inline int
-- 
2.21.0


^ permalink raw reply	[flat|nested] 50+ messages in thread

* [dpdk-dev] [PATCH 10/15] net/virtio: add vDPA op to configure and start the device
  2019-08-29  7:59 [dpdk-dev] [PATCH 00/15] Introduce Virtio vDPA driver Maxime Coquelin
                   ` (8 preceding siblings ...)
  2019-08-29  7:59 ` [dpdk-dev] [PATCH 09/15] net/virtio: add virtio vDPA op to get protocol features Maxime Coquelin
@ 2019-08-29  7:59 ` Maxime Coquelin
  2019-09-03  5:30   ` Tiwei Bie
  2019-08-29  7:59 ` [dpdk-dev] [PATCH 11/15] net/virtio: add vDPA op to stop and close " Maxime Coquelin
                   ` (5 subsequent siblings)
  15 siblings, 1 reply; 50+ messages in thread
From: Maxime Coquelin @ 2019-08-29  7:59 UTC (permalink / raw)
  To: tiwei.bie, zhihong.wang, amorenoz, xiao.w.wang, dev, jfreimann
  Cc: stable, Maxime Coquelin

In order to support multi-queue, we need to implement the control
path. The problem is that both the Vhost-user master and slave use
VAs in their processes address spaces as IOVAs, which creates
collusions between the data rings IOVAs managed the master, and
the Control ring IOVAs. The trick here is to remmap the Control
ring memory to another range, after the slave is aware of master's
ranges.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
---
 drivers/net/virtio/virtio_vdpa.c | 255 +++++++++++++++++++++++++++++++
 1 file changed, 255 insertions(+)

diff --git a/drivers/net/virtio/virtio_vdpa.c b/drivers/net/virtio/virtio_vdpa.c
index fc52a8e92..13b4dd07d 100644
--- a/drivers/net/virtio/virtio_vdpa.c
+++ b/drivers/net/virtio/virtio_vdpa.c
@@ -106,6 +106,127 @@ find_internal_resource_by_dev(struct rte_pci_device *pdev)
 	return list;
 }
 
+static int
+virtio_vdpa_dma_map_ctrl_queue(struct virtio_vdpa_device *dev, int do_map,
+		uint64_t iova)
+{
+	const struct rte_memzone *mz;
+	int ret;
+
+	/*
+	 * IOVAs are processes VAs. We cannot use them as the Data and Control
+	 * paths are run in different processes, which may (does) lead to
+	 * collusions. The trick here is to fixup Ctrl path IOVAs so that they
+	 * start after the Data path ranges.
+	 */
+	if (do_map) {
+		mz = dev->cvq->cq.mz;
+		ret = rte_vfio_container_dma_map(dev->vfio_container_fd,
+				(uint64_t)(uintptr_t)mz->addr,
+				iova, mz->len);
+		if (ret < 0) {
+			DRV_LOG(ERR, "Failed to map ctrl ring (%d)", ret);
+			return ret;
+		}
+
+		dev->cvq->vq_ring_mem = iova;
+		iova += mz->len;
+
+		mz = dev->cvq->cq.virtio_net_hdr_mz;
+		ret = rte_vfio_container_dma_map(dev->vfio_container_fd,
+				(uint64_t)(uintptr_t)mz->addr,
+				iova, mz->len);
+		if (ret < 0) {
+			DRV_LOG(ERR, "Failed to map ctrl headers (%d)", ret);
+			return ret;
+		}
+
+		dev->cvq->cq.virtio_net_hdr_mem = iova;
+	} else {
+		mz = dev->cvq->cq.mz;
+		ret = rte_vfio_container_dma_unmap(dev->vfio_container_fd,
+				(uint64_t)(uintptr_t)mz->addr,
+				iova, mz->len);
+		if (ret < 0) {
+			DRV_LOG(ERR, "Failed to unmap ctrl ring (%d)", ret);
+			return ret;
+		}
+
+		dev->cvq->vq_ring_mem = 0;
+		iova += mz->len;
+
+		mz = dev->cvq->cq.virtio_net_hdr_mz;
+		ret = rte_vfio_container_dma_unmap(dev->vfio_container_fd,
+				(uint64_t)(uintptr_t)mz->addr,
+				iova, mz->len);
+		if (ret < 0) {
+			DRV_LOG(ERR, "Failed to unmap ctrl headers (%d)", ret);
+			return ret;
+		}
+
+		dev->cvq->cq.virtio_net_hdr_mem = 0;
+	}
+
+	return 0;
+}
+
+static int
+virtio_vdpa_dma_map(struct virtio_vdpa_device *dev, int do_map)
+{
+	uint32_t i;
+	int ret;
+	struct rte_vhost_memory *mem = NULL;
+	int vfio_container_fd;
+	uint64_t avail_iova = 0;
+
+	ret = rte_vhost_get_mem_table(dev->vid, &mem);
+	if (ret < 0 || !mem) {
+		DRV_LOG(ERR, "failed to get VM memory layout.");
+		return ret;
+	}
+
+	vfio_container_fd = dev->vfio_container_fd;
+
+	for (i = 0; i < mem->nregions; i++) {
+		struct rte_vhost_mem_region *reg;
+
+		reg = &mem->regions[i];
+		DRV_LOG(INFO, "%s, region %u: HVA 0x%" PRIx64 ", "
+			"GPA 0x%" PRIx64 ", size 0x%" PRIx64 ".",
+			do_map ? "DMA map" : "DMA unmap", i,
+			reg->host_user_addr, reg->guest_phys_addr, reg->size);
+
+		if (reg->guest_phys_addr + reg->size > avail_iova)
+			avail_iova = reg->guest_phys_addr + reg->size;
+
+		if (do_map) {
+			ret = rte_vfio_container_dma_map(vfio_container_fd,
+				reg->host_user_addr, reg->guest_phys_addr,
+				reg->size);
+			if (ret < 0) {
+				DRV_LOG(ERR, "DMA map failed.");
+				goto exit;
+			}
+		} else {
+			ret = rte_vfio_container_dma_unmap(vfio_container_fd,
+				reg->host_user_addr, reg->guest_phys_addr,
+				reg->size);
+			if (ret < 0) {
+				DRV_LOG(ERR, "DMA unmap failed.");
+				goto exit;
+			}
+		}
+	}
+
+	if (dev->cvq)
+		ret = virtio_vdpa_dma_map_ctrl_queue(dev, do_map, avail_iova);
+
+exit:
+	free(mem);
+
+	return ret;
+}
+
 static int
 virtio_vdpa_vfio_setup(struct virtio_vdpa_device *dev)
 {
@@ -216,10 +337,144 @@ virtio_vdpa_get_protocol_features(int did __rte_unused, uint64_t *features)
 	return 0;
 }
 
+static uint64_t
+hva_to_gpa(int vid, uint64_t hva)
+{
+	struct rte_vhost_memory *mem = NULL;
+	struct rte_vhost_mem_region *reg;
+	uint32_t i;
+	uint64_t gpa = 0;
+
+	if (rte_vhost_get_mem_table(vid, &mem) < 0)
+		goto exit;
+
+	for (i = 0; i < mem->nregions; i++) {
+		reg = &mem->regions[i];
+
+		if (hva >= reg->host_user_addr &&
+				hva < reg->host_user_addr + reg->size) {
+			gpa = hva - reg->host_user_addr + reg->guest_phys_addr;
+			break;
+		}
+	}
+
+exit:
+	if (mem)
+		free(mem);
+	return gpa;
+}
+
+static int
+virtio_vdpa_start(struct virtio_vdpa_device *dev)
+{
+	struct virtio_hw *hw = &dev->hw;
+	int i, vid, nr_vring, ret;
+	struct rte_vhost_vring vr;
+	struct virtio_pmd_ctrl ctrl;
+	int dlen[1];
+
+	vid = dev->vid;
+	nr_vring = rte_vhost_get_vring_num(vid);
+
+	if (dev->vqs)
+		rte_free(dev->vqs);
+
+	dev->vqs = rte_zmalloc("virtio_vdpa", sizeof(*dev->vqs) * nr_vring, 0);
+
+	for (i = 0; i < nr_vring; i++) {
+		struct virtqueue *vq = &dev->vqs[i];
+
+		rte_vhost_get_vhost_vring(vid, i, &vr);
+
+		vq->vq_queue_index = i;
+		vq->vq_nentries = vr.size;
+		vq->vq_ring_mem = hva_to_gpa(vid, (uint64_t)(uintptr_t)vr.desc);
+		if (vq->vq_ring_mem  == 0) {
+			DRV_LOG(ERR, "Fail to get GPA for descriptor ring.");
+			ret = -1;
+			goto out_free_vqs;
+		}
+
+		ret = VTPCI_OPS(hw)->setup_queue(hw, vq);
+		if (ret) {
+			DRV_LOG(ERR, "Fail to setup queue.");
+			goto out_free_vqs;
+		}
+	}
+
+	if (dev->cvq) {
+		ret = VTPCI_OPS(hw)->setup_queue(hw, dev->cvq);
+		if (ret) {
+			DRV_LOG(ERR, "Fail to setup ctrl queue.");
+			goto out_free_vqs;
+		}
+	}
+
+	vtpci_set_status(hw, VIRTIO_CONFIG_STATUS_DRIVER_OK);
+
+	if (!dev->cvq)
+		return 0;
+
+	ctrl.hdr.class = VIRTIO_NET_CTRL_MQ;
+	ctrl.hdr.cmd = VIRTIO_NET_CTRL_MQ_VQ_PAIRS_SET;
+	memcpy(ctrl.data, &dev->max_queue_pairs, sizeof(uint16_t));
+
+	dlen[0] = sizeof(uint16_t);
+
+	ret = virtio_send_command(hw->cvq, &ctrl, dlen, 1);
+	if (ret) {
+		DRV_LOG(ERR, "Multiqueue configured but send command "
+			  "failed, this is too late now...");
+		ret = -EINVAL;
+		goto out_free_vqs;
+	}
+
+	return 0;
+out_free_vqs:
+	rte_free(dev->vqs);
+
+	return ret;
+}
+
+static int
+virtio_vdpa_dev_config(int vid)
+{
+	int did, ret;
+	struct internal_list *list;
+	struct virtio_vdpa_device *dev;
+
+	did = rte_vhost_get_vdpa_device_id(vid);
+	list = find_internal_resource_by_did(did);
+	if (list == NULL) {
+		DRV_LOG(ERR, "Invalid device id: %d", did);
+		return -1;
+	}
+
+	dev = list->dev;
+	dev->vid = vid;
+
+	rte_spinlock_lock(&dev->lock);
+
+	ret = virtio_vdpa_dma_map(dev, 1);
+	if (ret)
+		goto out_unlock;
+
+	ret = virtio_vdpa_start(dev);
+
+	if (rte_vhost_host_notifier_ctrl(vid, true) != 0)
+		DRV_LOG(NOTICE, "vDPA (%d): software relay is used.", did);
+
+out_unlock:
+	rte_spinlock_unlock(&dev->lock);
+
+	return ret;
+}
+
 static struct rte_vdpa_dev_ops virtio_vdpa_ops = {
 	.get_queue_num = virtio_vdpa_get_queue_num,
 	.get_features = virtio_vdpa_get_features,
 	.get_protocol_features = virtio_vdpa_get_protocol_features,
+	.dev_conf = virtio_vdpa_dev_config,
 };
 
 static inline int
-- 
2.21.0


^ permalink raw reply	[flat|nested] 50+ messages in thread

* [dpdk-dev] [PATCH 11/15] net/virtio: add vDPA op to stop and close the device
  2019-08-29  7:59 [dpdk-dev] [PATCH 00/15] Introduce Virtio vDPA driver Maxime Coquelin
                   ` (9 preceding siblings ...)
  2019-08-29  7:59 ` [dpdk-dev] [PATCH 10/15] net/virtio: add vDPA op to configure and start the device Maxime Coquelin
@ 2019-08-29  7:59 ` " Maxime Coquelin
  2019-09-02  7:07   ` Tiwei Bie
  2019-08-29  7:59 ` [dpdk-dev] [PATCH 12/15] net/virtio: add vDPA op to set features Maxime Coquelin
                   ` (4 subsequent siblings)
  15 siblings, 1 reply; 50+ messages in thread
From: Maxime Coquelin @ 2019-08-29  7:59 UTC (permalink / raw)
  To: tiwei.bie, zhihong.wang, amorenoz, xiao.w.wang, dev, jfreimann
  Cc: stable, Maxime Coquelin

This patch implements the vDPA .dev_close() callback.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
---
 drivers/net/virtio/virtio_vdpa.c | 52 ++++++++++++++++++++++++++++++++
 1 file changed, 52 insertions(+)

diff --git a/drivers/net/virtio/virtio_vdpa.c b/drivers/net/virtio/virtio_vdpa.c
index 13b4dd07d..691844906 100644
--- a/drivers/net/virtio/virtio_vdpa.c
+++ b/drivers/net/virtio/virtio_vdpa.c
@@ -436,6 +436,33 @@ virtio_vdpa_start(struct virtio_vdpa_device *dev)
 	return ret;
 }
 
+static void
+virtio_vdpa_stop(struct virtio_vdpa_device *dev)
+{
+	struct virtio_hw *hw = &dev->hw;
+	uint32_t i, nr_vring;
+	int vid = dev->vid;
+	struct rte_vhost_vring vr;
+	uint16_t last_used_idx, last_avail_idx;
+
+	nr_vring = rte_vhost_get_vring_num(vid);
+
+	vtpci_reset(hw);
+
+	for (i = 0; i < nr_vring; i++) {
+		rte_vhost_get_vhost_vring(vid, i, &vr);
+
+		last_used_idx = vr.used->idx;
+		last_avail_idx = vr.avail->idx;
+
+		rte_vhost_set_vring_base(vid, i, last_avail_idx,
+				last_used_idx);
+	}
+
+	rte_free(dev->vqs);
+	dev->vqs = NULL;
+}
+
 static int
 virtio_vdpa_dev_config(int vid)
 {
@@ -470,11 +497,36 @@ virtio_vdpa_dev_config(int vid)
 	return ret;
 }
 
+static int
+virtio_vdpa_dev_close(int vid)
+{
+	int did;
+	struct internal_list *list;
+	struct virtio_vdpa_device *dev;
+
+	did = rte_vhost_get_vdpa_device_id(vid);
+	list = find_internal_resource_by_did(did);
+	if (list == NULL) {
+		DRV_LOG(ERR, "Invalid device id: %d", did);
+		return -1;
+	}
+
+	dev = list->dev;
+
+	rte_spinlock_lock(&dev->lock);
+	virtio_vdpa_stop(dev);
+	virtio_vdpa_dma_map(dev, 0);
+	rte_spinlock_unlock(&dev->lock);
+
+	return 0;
+}
+
 static struct rte_vdpa_dev_ops virtio_vdpa_ops = {
 	.get_queue_num = virtio_vdpa_get_queue_num,
 	.get_features = virtio_vdpa_get_features,
 	.get_protocol_features = virtio_vdpa_get_protocol_features,
 	.dev_conf = virtio_vdpa_dev_config,
+	.dev_close = virtio_vdpa_dev_close,
 };
 
 static inline int
-- 
2.21.0


^ permalink raw reply	[flat|nested] 50+ messages in thread

* [dpdk-dev] [PATCH 12/15] net/virtio: add vDPA op to set features
  2019-08-29  7:59 [dpdk-dev] [PATCH 00/15] Introduce Virtio vDPA driver Maxime Coquelin
                   ` (10 preceding siblings ...)
  2019-08-29  7:59 ` [dpdk-dev] [PATCH 11/15] net/virtio: add vDPA op to stop and close " Maxime Coquelin
@ 2019-08-29  7:59 ` Maxime Coquelin
  2019-08-29  7:59 ` [dpdk-dev] [PATCH 13/15] net/virtio: add vDPA ops to get VFIO FDs Maxime Coquelin
                   ` (3 subsequent siblings)
  15 siblings, 0 replies; 50+ messages in thread
From: Maxime Coquelin @ 2019-08-29  7:59 UTC (permalink / raw)
  To: tiwei.bie, zhihong.wang, amorenoz, xiao.w.wang, dev, jfreimann
  Cc: stable, Maxime Coquelin

On top of that, it allocates the control virtqueue if
multiqueue has been negotiated.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
---
 drivers/net/virtio/virtio_vdpa.c | 155 +++++++++++++++++++++++++++++++
 1 file changed, 155 insertions(+)

diff --git a/drivers/net/virtio/virtio_vdpa.c b/drivers/net/virtio/virtio_vdpa.c
index 691844906..9b61688a1 100644
--- a/drivers/net/virtio/virtio_vdpa.c
+++ b/drivers/net/virtio/virtio_vdpa.c
@@ -521,12 +521,167 @@ virtio_vdpa_dev_close(int vid)
 	return 0;
 }
 
+
+static void
+virtio_vdpa_free_ctrl_vq(struct virtio_vdpa_device *dev)
+{
+	if (!dev->cvq)
+		return;
+
+	rte_memzone_free(dev->cvq->cq.virtio_net_hdr_mz);
+	rte_memzone_free(dev->cvq->cq.mz);
+	rte_free(dev->cvq);
+
+	dev->hw.cvq = NULL;
+}
+
+static int
+virtio_vdpa_allocate_ctrl_vq(struct virtio_vdpa_device *dev)
+{
+	struct virtio_hw *hw = &dev->hw;
+	char vq_name[VIRTQUEUE_MAX_NAME_SZ];
+	char vq_hdr_name[VIRTQUEUE_MAX_NAME_SZ];
+	int numa_node = dev->pdev->device.numa_node;
+	const struct rte_memzone *mz = NULL, *hdr_mz = NULL;
+	uint16_t ctrl_queue_idx = dev->max_queue_pairs * 2;
+	uint16_t ctrl_queue_sz;
+	int size, ret;
+
+	if (dev->cvq)
+		virtio_vdpa_free_ctrl_vq(dev);
+
+	ctrl_queue_sz = VTPCI_OPS(hw)->get_queue_num(hw, ctrl_queue_idx);
+	if (ctrl_queue_sz == 0) {
+		DRV_LOG(ERR, "Ctrl VQ does not exist");
+		return -EINVAL;
+	}
+
+	dev->cvq = rte_zmalloc_socket(vq_name, sizeof(*dev->cvq),
+			RTE_CACHE_LINE_SIZE, numa_node);
+	if (!dev->cvq)
+		return -ENOMEM;
+
+	dev->cvq->hw = &dev->hw;
+	dev->cvq->vq_queue_index = ctrl_queue_idx;
+	dev->cvq->vq_nentries = ctrl_queue_sz;
+
+	if (vtpci_packed_queue(hw)) {
+		dev->cvq->vq_packed.used_wrap_counter = 1;
+		dev->cvq->vq_packed.cached_flags = VRING_PACKED_DESC_F_AVAIL;
+		dev->cvq->vq_packed.event_flags_shadow = 0;
+	}
+
+	size = vring_size(hw, ctrl_queue_sz, VIRTIO_PCI_VRING_ALIGN);
+	dev->cvq->vq_ring_size = RTE_ALIGN_CEIL(size, VIRTIO_PCI_VRING_ALIGN);
+
+	snprintf(vq_name, sizeof(vq_name), "vdpa_ctrlvq%d",
+		 dev->did);
+
+	mz = rte_memzone_reserve_aligned(vq_name, dev->cvq->vq_ring_size,
+			numa_node, RTE_MEMZONE_IOVA_CONTIG,
+			VIRTIO_PCI_VRING_ALIGN);
+	if (mz == NULL) {
+		if (rte_errno == EEXIST)
+			mz = rte_memzone_lookup(vq_name);
+		if (mz == NULL) {
+			ret = -ENOMEM;
+			goto out_free_cvq;
+		}
+	}
+
+	memset(mz->addr, 0, mz->len);
+	dev->cvq->vq_ring_virt_mem = mz->addr;
+
+	virtio_init_vring(dev->cvq);
+
+	snprintf(vq_hdr_name, sizeof(vq_hdr_name), "vdpa_ctrlvq%d_hdr",
+		 dev->did);
+
+	hdr_mz = rte_memzone_reserve_aligned(vq_hdr_name, PAGE_SIZE * 2,
+			numa_node, RTE_MEMZONE_IOVA_CONTIG,
+			VIRTIO_PCI_VRING_ALIGN);
+	if (hdr_mz == NULL) {
+		if (rte_errno == EEXIST)
+			hdr_mz = rte_memzone_lookup(vq_hdr_name);
+		if (hdr_mz == NULL) {
+			ret = -ENOMEM;
+			goto out_free_mz;
+		}
+	}
+
+	memset(hdr_mz->addr, 0, hdr_mz->len);
+
+	dev->cvq->cq.vq = dev->cvq;
+	dev->cvq->cq.mz = mz;
+	dev->cvq->cq.virtio_net_hdr_mz = hdr_mz;
+	dev->hw.cvq = &dev->cvq->cq;
+
+	return 0;
+
+out_free_mz:
+	rte_memzone_free(mz);
+out_free_cvq:
+	rte_free(dev->cvq);
+	dev->cvq = NULL;
+
+	return ret;
+}
+
+static int
+virtio_vdpa_set_features(int vid)
+{
+	uint64_t features;
+	int did, ret;
+	struct internal_list *list;
+	struct virtio_vdpa_device *dev;
+	struct virtio_hw *hw;
+
+	did = rte_vhost_get_vdpa_device_id(vid);
+	list = find_internal_resource_by_did(did);
+	if (list == NULL) {
+		DRV_LOG(ERR, "Invalid device id: %d", did);
+		return -1;
+	}
+
+	dev = list->dev;
+	hw = &dev->hw;
+	rte_vhost_get_negotiated_features(vid, &features);
+
+	features |= (1ULL << VIRTIO_F_IOMMU_PLATFORM);
+	if (dev->has_ctrl_vq)
+		features |= (1ULL << VIRTIO_NET_F_CTRL_VQ);
+
+	VTPCI_OPS(&dev->hw)->set_features(&dev->hw, features);
+	hw->guest_features = features;
+
+	if (vtpci_with_feature(hw, VIRTIO_NET_F_CTRL_VQ)) {
+		if (vtpci_with_feature(hw, VIRTIO_NET_F_MQ)) {
+			vtpci_read_dev_config(hw,
+				offsetof(struct virtio_net_config,
+					max_virtqueue_pairs),
+				&dev->max_queue_pairs,
+				sizeof(dev->max_queue_pairs));
+		} else {
+			dev->max_queue_pairs = 1;
+		}
+
+		ret = virtio_vdpa_allocate_ctrl_vq(dev);
+		if (ret) {
+			DRV_LOG(ERR, "Failed to allocate ctrl vq");
+			return ret;
+		}
+	}
+
+	return 0;
+}
+
 static struct rte_vdpa_dev_ops virtio_vdpa_ops = {
 	.get_queue_num = virtio_vdpa_get_queue_num,
 	.get_features = virtio_vdpa_get_features,
 	.get_protocol_features = virtio_vdpa_get_protocol_features,
 	.dev_conf = virtio_vdpa_dev_config,
 	.dev_close = virtio_vdpa_dev_close,
+	.set_features = virtio_vdpa_set_features,
 };
 
 static inline int
-- 
2.21.0


^ permalink raw reply	[flat|nested] 50+ messages in thread

* [dpdk-dev] [PATCH 13/15] net/virtio: add vDPA ops to get VFIO FDs
  2019-08-29  7:59 [dpdk-dev] [PATCH 00/15] Introduce Virtio vDPA driver Maxime Coquelin
                   ` (11 preceding siblings ...)
  2019-08-29  7:59 ` [dpdk-dev] [PATCH 12/15] net/virtio: add vDPA op to set features Maxime Coquelin
@ 2019-08-29  7:59 ` Maxime Coquelin
  2019-09-03  4:47   ` Tiwei Bie
  2019-08-29  7:59 ` [dpdk-dev] [PATCH 14/15] net/virtio: add vDPA op to get notification area Maxime Coquelin
                   ` (2 subsequent siblings)
  15 siblings, 1 reply; 50+ messages in thread
From: Maxime Coquelin @ 2019-08-29  7:59 UTC (permalink / raw)
  To: tiwei.bie, zhihong.wang, amorenoz, xiao.w.wang, dev, jfreimann
  Cc: stable, Maxime Coquelin

This patch implements the vDPA .get_vfio_group_fd() and
.get_vfio_device_fd() callbacks.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
---
 drivers/net/virtio/virtio_vdpa.c | 34 ++++++++++++++++++++++++++++++++
 1 file changed, 34 insertions(+)

diff --git a/drivers/net/virtio/virtio_vdpa.c b/drivers/net/virtio/virtio_vdpa.c
index 9b61688a1..e0b2f99ba 100644
--- a/drivers/net/virtio/virtio_vdpa.c
+++ b/drivers/net/virtio/virtio_vdpa.c
@@ -675,6 +675,38 @@ virtio_vdpa_set_features(int vid)
 	return 0;
 }
 
+static int
+virtio_vdpa_get_vfio_group_fd(int vid)
+{
+	int did;
+	struct internal_list *list;
+
+	did = rte_vhost_get_vdpa_device_id(vid);
+	list = find_internal_resource_by_did(did);
+	if (list == NULL) {
+		DRV_LOG(ERR, "Invalid device id: %d", did);
+		return -1;
+	}
+
+	return list->dev->vfio_group_fd;
+}
+
+static int
+virtio_vdpa_get_vfio_device_fd(int vid)
+{
+	int did;
+	struct internal_list *list;
+
+	did = rte_vhost_get_vdpa_device_id(vid);
+	list = find_internal_resource_by_did(did);
+	if (list == NULL) {
+		DRV_LOG(ERR, "Invalid device id: %d", did);
+		return -1;
+	}
+
+	return list->dev->vfio_dev_fd;
+}
+
 static struct rte_vdpa_dev_ops virtio_vdpa_ops = {
 	.get_queue_num = virtio_vdpa_get_queue_num,
 	.get_features = virtio_vdpa_get_features,
@@ -682,6 +714,8 @@ static struct rte_vdpa_dev_ops virtio_vdpa_ops = {
 	.dev_conf = virtio_vdpa_dev_config,
 	.dev_close = virtio_vdpa_dev_close,
 	.set_features = virtio_vdpa_set_features,
+	.get_vfio_group_fd = virtio_vdpa_get_vfio_group_fd,
+	.get_vfio_device_fd = virtio_vdpa_get_vfio_device_fd,
 };
 
 static inline int
-- 
2.21.0


^ permalink raw reply	[flat|nested] 50+ messages in thread

* [dpdk-dev] [PATCH 14/15] net/virtio: add vDPA op to get notification area
  2019-08-29  7:59 [dpdk-dev] [PATCH 00/15] Introduce Virtio vDPA driver Maxime Coquelin
                   ` (12 preceding siblings ...)
  2019-08-29  7:59 ` [dpdk-dev] [PATCH 13/15] net/virtio: add vDPA ops to get VFIO FDs Maxime Coquelin
@ 2019-08-29  7:59 ` Maxime Coquelin
  2019-09-03  5:02   ` Tiwei Bie
  2019-08-29  8:00 ` [dpdk-dev] [PATCH 15/15] doc: add documentation for Virtio vDPA driver Maxime Coquelin
  2019-09-09 11:55 ` [dpdk-dev] [PATCH 00/15] Introduce " Shahaf Shuler
  15 siblings, 1 reply; 50+ messages in thread
From: Maxime Coquelin @ 2019-08-29  7:59 UTC (permalink / raw)
  To: tiwei.bie, zhihong.wang, amorenoz, xiao.w.wang, dev, jfreimann
  Cc: stable, Maxime Coquelin

This patch implements the vDPA .get_notify_area()
callback.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
---
 drivers/net/virtio/virtio_vdpa.c | 33 ++++++++++++++++++++++++++++++++
 1 file changed, 33 insertions(+)

diff --git a/drivers/net/virtio/virtio_vdpa.c b/drivers/net/virtio/virtio_vdpa.c
index e0b2f99ba..e681f527a 100644
--- a/drivers/net/virtio/virtio_vdpa.c
+++ b/drivers/net/virtio/virtio_vdpa.c
@@ -707,6 +707,38 @@ virtio_vdpa_get_vfio_device_fd(int vid)
 	return list->dev->vfio_dev_fd;
 }
 
+static int
+virtio_vdpa_get_notify_area(int vid, int qid, uint64_t *offset, uint64_t *size)
+{
+	int did;
+	struct internal_list *list;
+	struct virtio_vdpa_device *dev;
+	struct vfio_region_info reg = { .argsz = sizeof(reg) };
+	int ret;
+
+	did = rte_vhost_get_vdpa_device_id(vid);
+	list = find_internal_resource_by_did(did);
+	if (list == NULL) {
+		DRV_LOG(ERR, "Invalid device id: %d", did);
+		return -1;
+	}
+
+	dev = list->dev;
+
+	reg.index = dev->hw.notify_region;
+	ret = ioctl(dev->vfio_dev_fd, VFIO_DEVICE_GET_REGION_INFO, &reg);
+	if (ret) {
+		DRV_LOG(ERR, "Get not get device region info: %s",
+				strerror(errno));
+		return -1;
+	}
+
+	*offset = dev->vqs[qid].notify_addr - dev->hw.notify_base + reg.offset;
+	*size = 0x1000;
+
+	return 0;
+}
+
 static struct rte_vdpa_dev_ops virtio_vdpa_ops = {
 	.get_queue_num = virtio_vdpa_get_queue_num,
 	.get_features = virtio_vdpa_get_features,
@@ -716,6 +748,7 @@ static struct rte_vdpa_dev_ops virtio_vdpa_ops = {
 	.set_features = virtio_vdpa_set_features,
 	.get_vfio_group_fd = virtio_vdpa_get_vfio_group_fd,
 	.get_vfio_device_fd = virtio_vdpa_get_vfio_device_fd,
+	.get_notify_area = virtio_vdpa_get_notify_area,
 };
 
 static inline int
-- 
2.21.0


^ permalink raw reply	[flat|nested] 50+ messages in thread

* [dpdk-dev] [PATCH 15/15] doc: add documentation for Virtio vDPA driver
  2019-08-29  7:59 [dpdk-dev] [PATCH 00/15] Introduce Virtio vDPA driver Maxime Coquelin
                   ` (13 preceding siblings ...)
  2019-08-29  7:59 ` [dpdk-dev] [PATCH 14/15] net/virtio: add vDPA op to get notification area Maxime Coquelin
@ 2019-08-29  8:00 ` Maxime Coquelin
  2019-09-09 11:55 ` [dpdk-dev] [PATCH 00/15] Introduce " Shahaf Shuler
  15 siblings, 0 replies; 50+ messages in thread
From: Maxime Coquelin @ 2019-08-29  8:00 UTC (permalink / raw)
  To: tiwei.bie, zhihong.wang, amorenoz, xiao.w.wang, dev, jfreimann
  Cc: stable, Maxime Coquelin

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
---
 doc/guides/nics/index.rst       |  1 +
 doc/guides/nics/virtio_vdpa.rst | 45 +++++++++++++++++++++++++++++++++
 2 files changed, 46 insertions(+)
 create mode 100644 doc/guides/nics/virtio_vdpa.rst

diff --git a/doc/guides/nics/index.rst b/doc/guides/nics/index.rst
index 9fec02f3e..12aee63d7 100644
--- a/doc/guides/nics/index.rst
+++ b/doc/guides/nics/index.rst
@@ -55,6 +55,7 @@ Network Interface Controller Drivers
     thunderx
     vdev_netvsc
     virtio
+    virtio_vdpa
     vhost
     vmxnet3
     pcap_ring
diff --git a/doc/guides/nics/virtio_vdpa.rst b/doc/guides/nics/virtio_vdpa.rst
new file mode 100644
index 000000000..b708ef77e
--- /dev/null
+++ b/doc/guides/nics/virtio_vdpa.rst
@@ -0,0 +1,45 @@
+..  SPDX-License-Identifier: BSD-3-Clause
+    Copyright(c) 2019 Red Hat, Inc.
+
+Virtio vDPA driver
+==================
+
+The Virtio vDPA driver provides support to either para-virtualized
+or fully HW offloaded Virtio-net devices.
+
+Pre-Installation Configuration
+------------------------------
+
+Config File Options
+~~~~~~~~~~~~~~~~~~~
+
+The following option can be modified in the ``config`` file.
+
+- ``CONFIG_RTE_VIRTIO_VDPA`` (default ``y`` for linux)
+
+Virtio vDPA Implementation
+~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+To let the Virtio-net device being probed by the Virtio vDPA driver, adding
+"vdpa=1" parameter helps to specify that this device is to be used in vDPA
+mode, rather than polling mode, virtio pmd will skip when it detects this
+message. If no specified, device will not be used as a vDPA device, and it
+will be driven by virtio pmd.
+
+This driver requires the use of VFIO with IOMMU enabled, as a second level
+of addresses translation is required.
+
+Features
+--------
+
+Features of the Virtio vDPA driver are:
+
+- Compatibility with virtio 0.95, 1.0 and 1.1.
+- Multiqueue support.
+
+Prerequisites
+-------------
+
+- Platform with IOMMU feature. Virtio device needs address translation
+  service to Rx/Tx directly with virtio driver in VM or container.
+
-- 
2.21.0


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [dpdk-dev] [PATCH 01/15] vhost: remove vhost kernel header inclusion
  2019-08-29  7:59 ` [dpdk-dev] [PATCH 01/15] vhost: remove vhost kernel header inclusion Maxime Coquelin
@ 2019-09-02  6:03   ` Tiwei Bie
  2019-09-03  7:24     ` Maxime Coquelin
  0 siblings, 1 reply; 50+ messages in thread
From: Tiwei Bie @ 2019-09-02  6:03 UTC (permalink / raw)
  To: Maxime Coquelin
  Cc: zhihong.wang, amorenoz, xiao.w.wang, dev, jfreimann, stable

On Thu, Aug 29, 2019 at 09:59:46AM +0200, Maxime Coquelin wrote:
> This is preliminary rework for virtio-vdpa driver, in
> order to avoid conflicts with Virtio PMD headers.
> 
> Generally, I think it is better not to include kernel
> headers in RTE headers, especially in the case of Vhost
> and Virtio which just re-use the kernel definitions,
> and has no runtime dependencies.
> 
> In order to not break IFC driver build, the vhost kernel
> header is now included directly in the driver.
> 
> Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
> ---
>  drivers/net/ifc/ifcvf_vdpa.c | 1 +
>  lib/librte_vhost/rte_vdpa.h  | 1 -
>  lib/librte_vhost/rte_vhost.h | 9 ++++-----
>  3 files changed, 5 insertions(+), 6 deletions(-)

Vhost examples need to be updated as well.

Regards,
Tiwei

> 
> diff --git a/drivers/net/ifc/ifcvf_vdpa.c b/drivers/net/ifc/ifcvf_vdpa.c
> index 8de9ef199..40cb15ca8 100644

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [dpdk-dev] [PATCH 03/15] net/virtio: move control path fonctions in virtqueue file
  2019-08-29  7:59 ` [dpdk-dev] [PATCH 03/15] net/virtio: move control path fonctions in virtqueue file Maxime Coquelin
@ 2019-09-02  6:05   ` Tiwei Bie
  0 siblings, 0 replies; 50+ messages in thread
From: Tiwei Bie @ 2019-09-02  6:05 UTC (permalink / raw)
  To: Maxime Coquelin
  Cc: zhihong.wang, amorenoz, xiao.w.wang, dev, jfreimann, stable

> net/virtio: move control path fonctions in virtqueue file

s/fonctions/functions/

On Thu, Aug 29, 2019 at 09:59:48AM +0200, Maxime Coquelin wrote:
> Virtio-vdpa driver needs to implement the control path,
> so move related functions to virtqueue file so that it
> can be used by both Virtio PMD and Virtio-vdpa drivers.
> 
> Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
> ---
>  drivers/net/virtio/virtio_ethdev.c | 252 ----------------------------
>  drivers/net/virtio/virtqueue.c     | 255 +++++++++++++++++++++++++++++
>  drivers/net/virtio/virtqueue.h     |   5 +
>  3 files changed, 260 insertions(+), 252 deletions(-)

Reviewed-by: Tiwei Bie <tiwei.bie@intel.com>

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [dpdk-dev] [PATCH 04/15] net/virtio: add virtio PCI subsystem device ID declaration
  2019-08-29  7:59 ` [dpdk-dev] [PATCH 04/15] net/virtio: add virtio PCI subsystem device ID declaration Maxime Coquelin
@ 2019-09-02  6:14   ` Tiwei Bie
  2019-09-03  7:25     ` Maxime Coquelin
  0 siblings, 1 reply; 50+ messages in thread
From: Tiwei Bie @ 2019-09-02  6:14 UTC (permalink / raw)
  To: Maxime Coquelin
  Cc: zhihong.wang, amorenoz, xiao.w.wang, dev, jfreimann, stable

On Thu, Aug 29, 2019 at 09:59:49AM +0200, Maxime Coquelin wrote:
> The Virtio PCI susbsytem IDs need to be specified to
> prevent it to probe IFC vDPA VFs.
> 
> Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
> ---
>  drivers/net/virtio/virtio_pci.h | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/drivers/net/virtio/virtio_pci.h b/drivers/net/virtio/virtio_pci.h
> index a38cb45ad..56f89a454 100644
> --- a/drivers/net/virtio/virtio_pci.h
> +++ b/drivers/net/virtio/virtio_pci.h
> @@ -19,6 +19,7 @@ struct virtnet_ctl;
>  #define VIRTIO_PCI_VENDORID     0x1AF4
>  #define VIRTIO_PCI_LEGACY_DEVICEID_NET 0x1000
>  #define VIRTIO_PCI_MODERN_DEVICEID_NET 0x1041
> +#define VIRTIO_PCI_SUBSY_DEVICEID_NET 0x1100

0x1100 is the subsystem device ID used by QEMU.
Maybe naming it VIRTIO_PCI_SUBSYS_DEVICEID_QEMU is better?

Regards,
Tiwei

>  
>  /* VirtIO ABI version, this must match exactly. */
>  #define VIRTIO_PCI_ABI_VERSION 0
> -- 
> 2.21.0
> 

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [dpdk-dev] [PATCH 05/15] net/virtio: save notify bar ID in virtio HW struct
  2019-08-29  7:59 ` [dpdk-dev] [PATCH 05/15] net/virtio: save notify bar ID in virtio HW struct Maxime Coquelin
@ 2019-09-02  6:17   ` Tiwei Bie
  0 siblings, 0 replies; 50+ messages in thread
From: Tiwei Bie @ 2019-09-02  6:17 UTC (permalink / raw)
  To: Maxime Coquelin
  Cc: zhihong.wang, amorenoz, xiao.w.wang, dev, jfreimann, stable

On Thu, Aug 29, 2019 at 09:59:50AM +0200, Maxime Coquelin wrote:
> SAving the notify bar ID in the virtio HW struct will be

s/SAving/Saving/

> used by the virtio-vdpa driver in its .get_notify_area()
> callback.
> 
> Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
> ---
>  drivers/net/virtio/virtio_pci.c | 6 ++++--
>  drivers/net/virtio/virtio_pci.h | 1 +
>  2 files changed, 5 insertions(+), 2 deletions(-)

Reviewed-by: Tiwei Bie <tiwei.bie@intel.com>

> 
> diff --git a/drivers/net/virtio/virtio_pci.c b/drivers/net/virtio/virtio_pci.c
> index 4468e89cb..762847d70 100644
> --- a/drivers/net/virtio/virtio_pci.c
> +++ b/drivers/net/virtio/virtio_pci.c
> @@ -617,12 +617,14 @@ virtio_read_caps(struct rte_pci_device *dev, struct virtio_hw *hw)
>  			ret = rte_pci_read_config(dev,
>  					&hw->notify_off_multiplier,
>  					4, pos + sizeof(cap));
> -			if (ret != 4)
> +			if (ret != 4) {
>  				PMD_INIT_LOG(DEBUG,
>  					"failed to read notify_off_multiplier, ret %d",
>  					ret);
> -			else
> +			} else {
>  				hw->notify_base = get_cfg_addr(dev, &cap);
> +				hw->notify_region = cap.bar;
> +			}
>  			break;
>  		case VIRTIO_PCI_CAP_DEVICE_CFG:
>  			hw->dev_cfg = get_cfg_addr(dev, &cap);
> diff --git a/drivers/net/virtio/virtio_pci.h b/drivers/net/virtio/virtio_pci.h
> index 56f89a454..afdcc906f 100644
> --- a/drivers/net/virtio/virtio_pci.h
> +++ b/drivers/net/virtio/virtio_pci.h
> @@ -256,6 +256,7 @@ struct virtio_hw {
>  	uint32_t    notify_off_multiplier;
>  	uint8_t     *isr;
>  	uint16_t    *notify_base;
> +	uint8_t     notify_region;
>  	struct virtio_pci_common_cfg *common_cfg;
>  	struct virtio_net_config *dev_cfg;
>  	void	    *virtio_user_dev;
> -- 
> 2.21.0
> 

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [dpdk-dev] [PATCH 06/15] net/virtio: add skeleton for virtio vDPA driver
  2019-08-29  7:59 ` [dpdk-dev] [PATCH 06/15] net/virtio: add skeleton for virtio vDPA driver Maxime Coquelin
@ 2019-09-02  6:27   ` Tiwei Bie
  2019-09-03  7:25     ` Maxime Coquelin
  0 siblings, 1 reply; 50+ messages in thread
From: Tiwei Bie @ 2019-09-02  6:27 UTC (permalink / raw)
  To: Maxime Coquelin
  Cc: zhihong.wang, amorenoz, xiao.w.wang, dev, jfreimann, stable

On Thu, Aug 29, 2019 at 09:59:51AM +0200, Maxime Coquelin wrote:
> +
> +RTE_PMD_REGISTER_PCI(net_virtio_vdpa, rte_virtio_vdpa);
> +RTE_PMD_REGISTER_PCI_TABLE(net_virtio_vdpa, pci_id_virtio_vdpa_map);
> +RTE_PMD_REGISTER_KMOD_DEP(net_virtio_vdpa, "* vfio-pci");
> +
> +RTE_INIT(virtio_vdpa_init_log)
> +{
> +	virtio_vdpa_logtype = rte_log_register("pmd.net.virtio_vdpa");

We need to use pmd.net.virtio.init to enable the logs in
virtio_pci.c when using virtio vdpa. Maybe naming above
logtype e.g. "pmd.net.virtio.vdpa" will be more consistent?

Regards,
Tiwei

> +	if (virtio_vdpa_logtype >= 0)
> +		rte_log_set_level(virtio_vdpa_logtype, RTE_LOG_NOTICE);
> +}
> +
> -- 
> 2.21.0
> 

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [dpdk-dev] [PATCH 07/15] net/virtio: add vDPA ops to get number of queue
  2019-08-29  7:59 ` [dpdk-dev] [PATCH 07/15] net/virtio: add vDPA ops to get number of queue Maxime Coquelin
@ 2019-09-02  6:32   ` Tiwei Bie
  0 siblings, 0 replies; 50+ messages in thread
From: Tiwei Bie @ 2019-09-02  6:32 UTC (permalink / raw)
  To: Maxime Coquelin
  Cc: zhihong.wang, amorenoz, xiao.w.wang, dev, jfreimann, stable

On Thu, Aug 29, 2019 at 09:59:52AM +0200, Maxime Coquelin wrote:
> This patch implements the vDPA .get_queue_num() callback.
> 
> Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
> ---
>  drivers/net/virtio/virtio_vdpa.c | 43 ++++++++++++++++++++++++++++++++
>  1 file changed, 43 insertions(+)

Reviewed-by: Tiwei Bie <tiwei.bie@intel.com>

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [dpdk-dev] [PATCH 08/15] net/virtio: add virtio vDPA op to get features
  2019-08-29  7:59 ` [dpdk-dev] [PATCH 08/15] net/virtio: add virtio vDPA op to get features Maxime Coquelin
@ 2019-09-02  6:43   ` Tiwei Bie
  2019-09-03  7:27     ` Maxime Coquelin
  0 siblings, 1 reply; 50+ messages in thread
From: Tiwei Bie @ 2019-09-02  6:43 UTC (permalink / raw)
  To: Maxime Coquelin
  Cc: zhihong.wang, amorenoz, xiao.w.wang, dev, jfreimann, stable

On Thu, Aug 29, 2019 at 09:59:53AM +0200, Maxime Coquelin wrote:
> This patch implements the vDPA .get_features() callback.
> 
> Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
> ---
>  drivers/net/virtio/virtio_vdpa.c | 32 ++++++++++++++++++++++++++++++++
>  1 file changed, 32 insertions(+)
> 
> diff --git a/drivers/net/virtio/virtio_vdpa.c b/drivers/net/virtio/virtio_vdpa.c
> index 07ff1e090..9e2af8313 100644
> --- a/drivers/net/virtio/virtio_vdpa.c
> +++ b/drivers/net/virtio/virtio_vdpa.c
> @@ -173,8 +173,40 @@ virtio_vdpa_get_queue_num(int did, uint32_t *queue_num)
>  	return 0;
>  }
>  
> +static int
> +virtio_vdpa_get_features(int did, uint64_t *features)
> +{
> +	struct internal_list *list;
> +	struct virtio_vdpa_device *dev;
> +
> +	list = find_internal_resource_by_did(did);
> +	if (list == NULL) {
> +		DRV_LOG(ERR, "Invalid device id: %d", did);
> +		return -1;
> +	}
> +
> +	dev = list->dev;
> +
> +	*features = VTPCI_OPS(&dev->hw)->get_features(&dev->hw);
> +
> +	if (!(*features & (1ULL << VIRTIO_F_IOMMU_PLATFORM))) {
> +		DRV_LOG(ERR, "Device does not support IOMMU");
> +		return -1;
> +	}
> +
> +	if (*features & (1ULL << VIRTIO_NET_F_CTRL_VQ))
> +		dev->has_ctrl_vq = true;
> +	else
> +		dev->has_ctrl_vq = false;
> +
> +	*features |= (1ULL << VHOST_USER_F_PROTOCOL_FEATURES);

We need to drop the VIRTIO_F_IOMMU_PLATFORM bit before
reporting features to the upper layer as the vDPA backend
doesn't support the vIOMMU yet.

Regards,
Tiwei


> +
> +	return 0;
> +}
> +
>  static struct rte_vdpa_dev_ops virtio_vdpa_ops = {
>  	.get_queue_num = virtio_vdpa_get_queue_num,
> +	.get_features = virtio_vdpa_get_features,
>  };
>  
>  static inline int
> -- 
> 2.21.0
> 

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [dpdk-dev] [PATCH 09/15] net/virtio: add virtio vDPA op to get protocol features
  2019-08-29  7:59 ` [dpdk-dev] [PATCH 09/15] net/virtio: add virtio vDPA op to get protocol features Maxime Coquelin
@ 2019-09-02  6:46   ` Tiwei Bie
  0 siblings, 0 replies; 50+ messages in thread
From: Tiwei Bie @ 2019-09-02  6:46 UTC (permalink / raw)
  To: Maxime Coquelin
  Cc: zhihong.wang, amorenoz, xiao.w.wang, dev, jfreimann, stable

On Thu, Aug 29, 2019 at 09:59:54AM +0200, Maxime Coquelin wrote:
> This patch implements the vDPA .get_protocol_features()
> callback.
> 
> Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
> ---
>  drivers/net/virtio/virtio_vdpa.c | 13 +++++++++++++
>  1 file changed, 13 insertions(+)
> 
> diff --git a/drivers/net/virtio/virtio_vdpa.c b/drivers/net/virtio/virtio_vdpa.c
> index 9e2af8313..fc52a8e92 100644
> --- a/drivers/net/virtio/virtio_vdpa.c
> +++ b/drivers/net/virtio/virtio_vdpa.c
> @@ -204,9 +204,22 @@ virtio_vdpa_get_features(int did, uint64_t *features)
>  	return 0;
>  }
>  
> +#define VDPA_SUPPORTED_PROTOCOL_FEATURES \
> +		(1ULL << VHOST_USER_PROTOCOL_F_REPLY_ACK | \
> +		 1ULL << VHOST_USER_PROTOCOL_F_SLAVE_REQ | \
> +		 1ULL << VHOST_USER_PROTOCOL_F_SLAVE_SEND_FD | \
> +		 1ULL << VHOST_USER_PROTOCOL_F_HOST_NOTIFIER)

Looks better to add an empty line here.

> +static int
> +virtio_vdpa_get_protocol_features(int did __rte_unused, uint64_t *features)
> +{
> +	*features = VDPA_SUPPORTED_PROTOCOL_FEATURES;
> +	return 0;
> +}
> +
>  static struct rte_vdpa_dev_ops virtio_vdpa_ops = {
>  	.get_queue_num = virtio_vdpa_get_queue_num,
>  	.get_features = virtio_vdpa_get_features,
> +	.get_protocol_features = virtio_vdpa_get_protocol_features,
>  };
>  
>  static inline int
> -- 
> 2.21.0
> 

Reviewed-by: Tiwei Bie <tiwei.bie@intel.com>

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [dpdk-dev] [PATCH 11/15] net/virtio: add vDPA op to stop and close the device
  2019-08-29  7:59 ` [dpdk-dev] [PATCH 11/15] net/virtio: add vDPA op to stop and close " Maxime Coquelin
@ 2019-09-02  7:07   ` Tiwei Bie
  2019-09-03  7:30     ` Maxime Coquelin
  0 siblings, 1 reply; 50+ messages in thread
From: Tiwei Bie @ 2019-09-02  7:07 UTC (permalink / raw)
  To: Maxime Coquelin
  Cc: zhihong.wang, amorenoz, xiao.w.wang, dev, jfreimann, stable

On Thu, Aug 29, 2019 at 09:59:56AM +0200, Maxime Coquelin wrote:
> This patch implements the vDPA .dev_close() callback.
> 
> Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
> ---
>  drivers/net/virtio/virtio_vdpa.c | 52 ++++++++++++++++++++++++++++++++
>  1 file changed, 52 insertions(+)
> 
> diff --git a/drivers/net/virtio/virtio_vdpa.c b/drivers/net/virtio/virtio_vdpa.c
> index 13b4dd07d..691844906 100644
> --- a/drivers/net/virtio/virtio_vdpa.c
> +++ b/drivers/net/virtio/virtio_vdpa.c
> @@ -436,6 +436,33 @@ virtio_vdpa_start(struct virtio_vdpa_device *dev)
>  	return ret;
>  }
>  
> +static void
> +virtio_vdpa_stop(struct virtio_vdpa_device *dev)
> +{
> +	struct virtio_hw *hw = &dev->hw;
> +	uint32_t i, nr_vring;
> +	int vid = dev->vid;
> +	struct rte_vhost_vring vr;
> +	uint16_t last_used_idx, last_avail_idx;
> +
> +	nr_vring = rte_vhost_get_vring_num(vid);
> +
> +	vtpci_reset(hw);
> +
> +	for (i = 0; i < nr_vring; i++) {
> +		rte_vhost_get_vhost_vring(vid, i, &vr);
> +
> +		last_used_idx = vr.used->idx;
> +		last_avail_idx = vr.avail->idx;

This only works in split ring.

Regards,
Tiwei

> +
> +		rte_vhost_set_vring_base(vid, i, last_avail_idx,
> +				last_used_idx);
> +	}
> +
> +	rte_free(dev->vqs);
> +	dev->vqs = NULL;
> +}
> +

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [dpdk-dev] [PATCH 02/15] vhost: configure vDPA as soon as the device is ready
  2019-08-29  7:59 ` [dpdk-dev] [PATCH 02/15] vhost: configure vDPA as soon as the device is ready Maxime Coquelin
@ 2019-09-02  8:34   ` Ye Xiaolong
  2019-09-02  9:02     ` Wang, Xiao W
  0 siblings, 1 reply; 50+ messages in thread
From: Ye Xiaolong @ 2019-09-02  8:34 UTC (permalink / raw)
  To: Maxime Coquelin
  Cc: tiwei.bie, zhihong.wang, amorenoz, xiao.w.wang, dev, jfreimann, stable

On 08/29, Maxime Coquelin wrote:
>There might not have any VHOST_USER_SET_VRING_CALL requests
>sent once virtio device is ready. When it happens, the vDPA
>device's dev_conf() callback may never be called.
>
>Fixes: 9f9014512822 ("vhost: configure vDPA device after set vring call message")
>Cc: stable@dpdk.org
>Cc: xiaolong.ye@intel.com
>
>Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
>---
> lib/librte_vhost/vhost_user.c | 3 +--
> 1 file changed, 1 insertion(+), 2 deletions(-)
>
>diff --git a/lib/librte_vhost/vhost_user.c b/lib/librte_vhost/vhost_user.c
>index 0b72648a5..b1ea80c52 100644
>--- a/lib/librte_vhost/vhost_user.c
>+++ b/lib/librte_vhost/vhost_user.c
>@@ -2112,8 +2112,7 @@ vhost_user_msg_handler(int vid, int fd)
> 	did = dev->vdpa_dev_id;
> 	vdpa_dev = rte_vdpa_get_device(did);
> 	if (vdpa_dev && virtio_is_ready(dev) &&
>-			!(dev->flags & VIRTIO_DEV_VDPA_CONFIGURED) &&
>-			msg.request.master == VHOST_USER_SET_VRING_CALL) {
>+			!(dev->flags & VIRTIO_DEV_VDPA_CONFIGURED)) {
> 		if (vdpa_dev->ops->dev_conf)
> 			vdpa_dev->ops->dev_conf(dev->vid);
> 		dev->flags |= VIRTIO_DEV_VDPA_CONFIGURED;
>-- 
>2.21.0
>

Reviewed-by: Xiaolong Ye <xiaolong.ye@intel.com>

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [dpdk-dev] [PATCH 02/15] vhost: configure vDPA as soon as the device is ready
  2019-09-02  8:34   ` Ye Xiaolong
@ 2019-09-02  9:02     ` Wang, Xiao W
  2019-09-03  7:34       ` Maxime Coquelin
  0 siblings, 1 reply; 50+ messages in thread
From: Wang, Xiao W @ 2019-09-02  9:02 UTC (permalink / raw)
  To: Ye, Xiaolong, Maxime Coquelin
  Cc: Bie, Tiwei, Wang, Zhihong, amorenoz, dev, jfreimann, stable

Hi,

> -----Original Message-----
> From: Ye, Xiaolong
> Sent: Monday, September 2, 2019 4:35 PM
> To: Maxime Coquelin <maxime.coquelin@redhat.com>
> Cc: Bie, Tiwei <tiwei.bie@intel.com>; Wang, Zhihong
> <zhihong.wang@intel.com>; amorenoz@redhat.com; Wang, Xiao W
> <xiao.w.wang@intel.com>; dev@dpdk.org; jfreimann@redhat.com;
> stable@dpdk.org
> Subject: Re: [PATCH 02/15] vhost: configure vDPA as soon as the device is
> ready
> 
> On 08/29, Maxime Coquelin wrote:
> >There might not have any VHOST_USER_SET_VRING_CALL requests
> >sent once virtio device is ready. When it happens, the vDPA
> >device's dev_conf() callback may never be called.
> >
> >Fixes: 9f9014512822 ("vhost: configure vDPA device after set vring call
> message")
> >Cc: stable@dpdk.org
> >Cc: xiaolong.ye@intel.com
> >
> >Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
> >---
> > lib/librte_vhost/vhost_user.c | 3 +--
> > 1 file changed, 1 insertion(+), 2 deletions(-)
> >
> >diff --git a/lib/librte_vhost/vhost_user.c b/lib/librte_vhost/vhost_user.c
> >index 0b72648a5..b1ea80c52 100644
> >--- a/lib/librte_vhost/vhost_user.c
> >+++ b/lib/librte_vhost/vhost_user.c
> >@@ -2112,8 +2112,7 @@ vhost_user_msg_handler(int vid, int fd)
> > 	did = dev->vdpa_dev_id;
> > 	vdpa_dev = rte_vdpa_get_device(did);
> > 	if (vdpa_dev && virtio_is_ready(dev) &&
> >-			!(dev->flags & VIRTIO_DEV_VDPA_CONFIGURED) &&
> >-			msg.request.master ==
> VHOST_USER_SET_VRING_CALL) {
> >+			!(dev->flags & VIRTIO_DEV_VDPA_CONFIGURED)) {

In the early beginning of vhost user messages, there seems to be a VHOST_USER_SET_VRING_CALL with invalid call fd,
not sure if QEMU has any update on this point.
If the virtio_is_ready() is based on that invalid call fd, then vdpa_dev_conf() cannot setup interrupt properly.
I think that's why in our previous implementation, we wait for the real call fd and then call dev_conf().

BRs,
Xiao

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [dpdk-dev] [PATCH 13/15] net/virtio: add vDPA ops to get VFIO FDs
  2019-08-29  7:59 ` [dpdk-dev] [PATCH 13/15] net/virtio: add vDPA ops to get VFIO FDs Maxime Coquelin
@ 2019-09-03  4:47   ` Tiwei Bie
  0 siblings, 0 replies; 50+ messages in thread
From: Tiwei Bie @ 2019-09-03  4:47 UTC (permalink / raw)
  To: Maxime Coquelin
  Cc: zhihong.wang, amorenoz, xiao.w.wang, dev, jfreimann, stable

On Thu, Aug 29, 2019 at 09:59:58AM +0200, Maxime Coquelin wrote:
> This patch implements the vDPA .get_vfio_group_fd() and
> .get_vfio_device_fd() callbacks.
> 
> Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
> ---
>  drivers/net/virtio/virtio_vdpa.c | 34 ++++++++++++++++++++++++++++++++
>  1 file changed, 34 insertions(+)

Reviewed-by: Tiwei Bie <tiwei.bie@intel.com>

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [dpdk-dev] [PATCH 14/15] net/virtio: add vDPA op to get notification area
  2019-08-29  7:59 ` [dpdk-dev] [PATCH 14/15] net/virtio: add vDPA op to get notification area Maxime Coquelin
@ 2019-09-03  5:02   ` Tiwei Bie
  2019-09-03  7:36     ` Maxime Coquelin
  0 siblings, 1 reply; 50+ messages in thread
From: Tiwei Bie @ 2019-09-03  5:02 UTC (permalink / raw)
  To: Maxime Coquelin
  Cc: zhihong.wang, amorenoz, xiao.w.wang, dev, jfreimann, stable

On Thu, Aug 29, 2019 at 09:59:59AM +0200, Maxime Coquelin wrote:
> This patch implements the vDPA .get_notify_area()
> callback.
> 
> Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
> ---
>  drivers/net/virtio/virtio_vdpa.c | 33 ++++++++++++++++++++++++++++++++
>  1 file changed, 33 insertions(+)
> 
> diff --git a/drivers/net/virtio/virtio_vdpa.c b/drivers/net/virtio/virtio_vdpa.c
> index e0b2f99ba..e681f527a 100644
> --- a/drivers/net/virtio/virtio_vdpa.c
> +++ b/drivers/net/virtio/virtio_vdpa.c
> @@ -707,6 +707,38 @@ virtio_vdpa_get_vfio_device_fd(int vid)
>  	return list->dev->vfio_dev_fd;
>  }
>  
> +static int
> +virtio_vdpa_get_notify_area(int vid, int qid, uint64_t *offset, uint64_t *size)
> +{
> +	int did;
> +	struct internal_list *list;
> +	struct virtio_vdpa_device *dev;
> +	struct vfio_region_info reg = { .argsz = sizeof(reg) };
> +	int ret;
> +
> +	did = rte_vhost_get_vdpa_device_id(vid);
> +	list = find_internal_resource_by_did(did);
> +	if (list == NULL) {
> +		DRV_LOG(ERR, "Invalid device id: %d", did);
> +		return -1;
> +	}
> +
> +	dev = list->dev;
> +
> +	reg.index = dev->hw.notify_region;
> +	ret = ioctl(dev->vfio_dev_fd, VFIO_DEVICE_GET_REGION_INFO, &reg);
> +	if (ret) {
> +		DRV_LOG(ERR, "Get not get device region info: %s",
> +				strerror(errno));
> +		return -1;
> +	}
> +
> +	*offset = dev->vqs[qid].notify_addr - dev->hw.notify_base + reg.offset;
> +	*size = 0x1000;

It would be better to check whether the size is no less
than 0x1000, otherwise it's possible to give guest the
access to other registers of the vdpa device.

Regards,
Tiwei

> +
> +	return 0;
> +}
> +
>  static struct rte_vdpa_dev_ops virtio_vdpa_ops = {
>  	.get_queue_num = virtio_vdpa_get_queue_num,
>  	.get_features = virtio_vdpa_get_features,
> @@ -716,6 +748,7 @@ static struct rte_vdpa_dev_ops virtio_vdpa_ops = {
>  	.set_features = virtio_vdpa_set_features,
>  	.get_vfio_group_fd = virtio_vdpa_get_vfio_group_fd,
>  	.get_vfio_device_fd = virtio_vdpa_get_vfio_device_fd,
> +	.get_notify_area = virtio_vdpa_get_notify_area,
>  };
>  
>  static inline int
> -- 
> 2.21.0
> 

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [dpdk-dev] [PATCH 10/15] net/virtio: add vDPA op to configure and start the device
  2019-08-29  7:59 ` [dpdk-dev] [PATCH 10/15] net/virtio: add vDPA op to configure and start the device Maxime Coquelin
@ 2019-09-03  5:30   ` Tiwei Bie
  2019-09-03  7:40     ` Maxime Coquelin
  0 siblings, 1 reply; 50+ messages in thread
From: Tiwei Bie @ 2019-09-03  5:30 UTC (permalink / raw)
  To: Maxime Coquelin
  Cc: zhihong.wang, amorenoz, xiao.w.wang, dev, jfreimann, stable

On Thu, Aug 29, 2019 at 09:59:55AM +0200, Maxime Coquelin wrote:
> In order to support multi-queue, we need to implement the control
> path. The problem is that both the Vhost-user master and slave use
> VAs in their processes address spaces as IOVAs, which creates
> collusions between the data rings IOVAs managed the master, and
> the Control ring IOVAs. The trick here is to remmap the Control
> ring memory to another range, after the slave is aware of master's
> ranges.
> 
> Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
> ---
>  drivers/net/virtio/virtio_vdpa.c | 255 +++++++++++++++++++++++++++++++
>  1 file changed, 255 insertions(+)
> 
> diff --git a/drivers/net/virtio/virtio_vdpa.c b/drivers/net/virtio/virtio_vdpa.c
> index fc52a8e92..13b4dd07d 100644
> --- a/drivers/net/virtio/virtio_vdpa.c
> +++ b/drivers/net/virtio/virtio_vdpa.c
> @@ -106,6 +106,127 @@ find_internal_resource_by_dev(struct rte_pci_device *pdev)
>  	return list;
>  }
>  
> +static int
> +virtio_vdpa_dma_map_ctrl_queue(struct virtio_vdpa_device *dev, int do_map,
> +		uint64_t iova)
> +{
> +	const struct rte_memzone *mz;
> +	int ret;
> +
> +	/*
> +	 * IOVAs are processes VAs. We cannot use them as the Data and Control
> +	 * paths are run in different processes, which may (does) lead to
> +	 * collusions. The trick here is to fixup Ctrl path IOVAs so that they
> +	 * start after the Data path ranges.
> +	 */
> +	if (do_map) {
> +		mz = dev->cvq->cq.mz;
> +		ret = rte_vfio_container_dma_map(dev->vfio_container_fd,
> +				(uint64_t)(uintptr_t)mz->addr,
> +				iova, mz->len);
> +		if (ret < 0) {
> +			DRV_LOG(ERR, "Failed to map ctrl ring (%d)", ret);
> +			return ret;
> +		}
> +
> +		dev->cvq->vq_ring_mem = iova;
> +		iova += mz->len;
> +
> +		mz = dev->cvq->cq.virtio_net_hdr_mz;
> +		ret = rte_vfio_container_dma_map(dev->vfio_container_fd,
> +				(uint64_t)(uintptr_t)mz->addr,
> +				iova, mz->len);
> +		if (ret < 0) {
> +			DRV_LOG(ERR, "Failed to map ctrl headers (%d)", ret);
> +			return ret;
> +		}

This will allow guest to access the cq.mz and cq.virtio_net_hdr_mz
via the device which may have potential risks.

Regards,
Tiwei

> +
> +		dev->cvq->cq.virtio_net_hdr_mem = iova;
> +	} else {
> +		mz = dev->cvq->cq.mz;
> +		ret = rte_vfio_container_dma_unmap(dev->vfio_container_fd,
> +				(uint64_t)(uintptr_t)mz->addr,
> +				iova, mz->len);
> +		if (ret < 0) {
> +			DRV_LOG(ERR, "Failed to unmap ctrl ring (%d)", ret);
> +			return ret;
> +		}
> +
> +		dev->cvq->vq_ring_mem = 0;
> +		iova += mz->len;
> +
> +		mz = dev->cvq->cq.virtio_net_hdr_mz;
> +		ret = rte_vfio_container_dma_unmap(dev->vfio_container_fd,
> +				(uint64_t)(uintptr_t)mz->addr,
> +				iova, mz->len);
> +		if (ret < 0) {
> +			DRV_LOG(ERR, "Failed to unmap ctrl headers (%d)", ret);
> +			return ret;
> +		}
> +
> +		dev->cvq->cq.virtio_net_hdr_mem = 0;
> +	}
> +
> +	return 0;
> +}
> +
> +static int
> +virtio_vdpa_dma_map(struct virtio_vdpa_device *dev, int do_map)
> +{
> +	uint32_t i;
> +	int ret;
> +	struct rte_vhost_memory *mem = NULL;
> +	int vfio_container_fd;
> +	uint64_t avail_iova = 0;
> +
> +	ret = rte_vhost_get_mem_table(dev->vid, &mem);
> +	if (ret < 0 || !mem) {
> +		DRV_LOG(ERR, "failed to get VM memory layout.");
> +		return ret;
> +	}
> +
> +	vfio_container_fd = dev->vfio_container_fd;
> +
> +	for (i = 0; i < mem->nregions; i++) {
> +		struct rte_vhost_mem_region *reg;
> +
> +		reg = &mem->regions[i];
> +		DRV_LOG(INFO, "%s, region %u: HVA 0x%" PRIx64 ", "
> +			"GPA 0x%" PRIx64 ", size 0x%" PRIx64 ".",
> +			do_map ? "DMA map" : "DMA unmap", i,
> +			reg->host_user_addr, reg->guest_phys_addr, reg->size);
> +
> +		if (reg->guest_phys_addr + reg->size > avail_iova)
> +			avail_iova = reg->guest_phys_addr + reg->size;
> +
> +		if (do_map) {
> +			ret = rte_vfio_container_dma_map(vfio_container_fd,
> +				reg->host_user_addr, reg->guest_phys_addr,
> +				reg->size);
> +			if (ret < 0) {
> +				DRV_LOG(ERR, "DMA map failed.");
> +				goto exit;
> +			}
> +		} else {
> +			ret = rte_vfio_container_dma_unmap(vfio_container_fd,
> +				reg->host_user_addr, reg->guest_phys_addr,
> +				reg->size);
> +			if (ret < 0) {
> +				DRV_LOG(ERR, "DMA unmap failed.");
> +				goto exit;
> +			}
> +		}
> +	}
> +
> +	if (dev->cvq)
> +		ret = virtio_vdpa_dma_map_ctrl_queue(dev, do_map, avail_iova);
> +
> +exit:
> +	free(mem);
> +
> +	return ret;
> +}
> +
>  static int
>  virtio_vdpa_vfio_setup(struct virtio_vdpa_device *dev)
>  {
> @@ -216,10 +337,144 @@ virtio_vdpa_get_protocol_features(int did __rte_unused, uint64_t *features)
>  	return 0;
>  }
>  
> +static uint64_t
> +hva_to_gpa(int vid, uint64_t hva)
> +{
> +	struct rte_vhost_memory *mem = NULL;
> +	struct rte_vhost_mem_region *reg;
> +	uint32_t i;
> +	uint64_t gpa = 0;
> +
> +	if (rte_vhost_get_mem_table(vid, &mem) < 0)
> +		goto exit;
> +
> +	for (i = 0; i < mem->nregions; i++) {
> +		reg = &mem->regions[i];
> +
> +		if (hva >= reg->host_user_addr &&
> +				hva < reg->host_user_addr + reg->size) {
> +			gpa = hva - reg->host_user_addr + reg->guest_phys_addr;
> +			break;
> +		}
> +	}
> +
> +exit:
> +	if (mem)
> +		free(mem);
> +	return gpa;
> +}
> +
> +static int
> +virtio_vdpa_start(struct virtio_vdpa_device *dev)
> +{
> +	struct virtio_hw *hw = &dev->hw;
> +	int i, vid, nr_vring, ret;
> +	struct rte_vhost_vring vr;
> +	struct virtio_pmd_ctrl ctrl;
> +	int dlen[1];
> +
> +	vid = dev->vid;
> +	nr_vring = rte_vhost_get_vring_num(vid);
> +
> +	if (dev->vqs)
> +		rte_free(dev->vqs);
> +
> +	dev->vqs = rte_zmalloc("virtio_vdpa", sizeof(*dev->vqs) * nr_vring, 0);
> +
> +	for (i = 0; i < nr_vring; i++) {
> +		struct virtqueue *vq = &dev->vqs[i];
> +
> +		rte_vhost_get_vhost_vring(vid, i, &vr);
> +
> +		vq->vq_queue_index = i;
> +		vq->vq_nentries = vr.size;
> +		vq->vq_ring_mem = hva_to_gpa(vid, (uint64_t)(uintptr_t)vr.desc);
> +		if (vq->vq_ring_mem  == 0) {
> +			DRV_LOG(ERR, "Fail to get GPA for descriptor ring.");
> +			ret = -1;
> +			goto out_free_vqs;
> +		}
> +
> +		ret = VTPCI_OPS(hw)->setup_queue(hw, vq);
> +		if (ret) {
> +			DRV_LOG(ERR, "Fail to setup queue.");
> +			goto out_free_vqs;
> +		}
> +	}
> +
> +	if (dev->cvq) {
> +		ret = VTPCI_OPS(hw)->setup_queue(hw, dev->cvq);
> +		if (ret) {
> +			DRV_LOG(ERR, "Fail to setup ctrl queue.");
> +			goto out_free_vqs;
> +		}
> +	}
> +
> +	vtpci_set_status(hw, VIRTIO_CONFIG_STATUS_DRIVER_OK);
> +
> +	if (!dev->cvq)
> +		return 0;
> +
> +	ctrl.hdr.class = VIRTIO_NET_CTRL_MQ;
> +	ctrl.hdr.cmd = VIRTIO_NET_CTRL_MQ_VQ_PAIRS_SET;
> +	memcpy(ctrl.data, &dev->max_queue_pairs, sizeof(uint16_t));
> +
> +	dlen[0] = sizeof(uint16_t);
> +
> +	ret = virtio_send_command(hw->cvq, &ctrl, dlen, 1);
> +	if (ret) {
> +		DRV_LOG(ERR, "Multiqueue configured but send command "
> +			  "failed, this is too late now...");
> +		ret = -EINVAL;
> +		goto out_free_vqs;
> +	}
> +
> +	return 0;
> +out_free_vqs:
> +	rte_free(dev->vqs);
> +
> +	return ret;
> +}
> +
> +static int
> +virtio_vdpa_dev_config(int vid)
> +{
> +	int did, ret;
> +	struct internal_list *list;
> +	struct virtio_vdpa_device *dev;
> +
> +	did = rte_vhost_get_vdpa_device_id(vid);
> +	list = find_internal_resource_by_did(did);
> +	if (list == NULL) {
> +		DRV_LOG(ERR, "Invalid device id: %d", did);
> +		return -1;
> +	}
> +
> +	dev = list->dev;
> +	dev->vid = vid;
> +
> +	rte_spinlock_lock(&dev->lock);
> +
> +	ret = virtio_vdpa_dma_map(dev, 1);
> +	if (ret)
> +		goto out_unlock;
> +
> +	ret = virtio_vdpa_start(dev);
> +
> +	if (rte_vhost_host_notifier_ctrl(vid, true) != 0)
> +		DRV_LOG(NOTICE, "vDPA (%d): software relay is used.", did);
> +
> +out_unlock:
> +	rte_spinlock_unlock(&dev->lock);
> +
> +	return ret;
> +}
> +
>  static struct rte_vdpa_dev_ops virtio_vdpa_ops = {
>  	.get_queue_num = virtio_vdpa_get_queue_num,
>  	.get_features = virtio_vdpa_get_features,
>  	.get_protocol_features = virtio_vdpa_get_protocol_features,
> +	.dev_conf = virtio_vdpa_dev_config,
>  };
>  
>  static inline int
> -- 
> 2.21.0
> 

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [dpdk-dev] [PATCH 01/15] vhost: remove vhost kernel header inclusion
  2019-09-02  6:03   ` Tiwei Bie
@ 2019-09-03  7:24     ` Maxime Coquelin
  0 siblings, 0 replies; 50+ messages in thread
From: Maxime Coquelin @ 2019-09-03  7:24 UTC (permalink / raw)
  To: Tiwei Bie; +Cc: zhihong.wang, amorenoz, xiao.w.wang, dev, jfreimann, stable



On 9/2/19 8:03 AM, Tiwei Bie wrote:
> On Thu, Aug 29, 2019 at 09:59:46AM +0200, Maxime Coquelin wrote:
>> This is preliminary rework for virtio-vdpa driver, in
>> order to avoid conflicts with Virtio PMD headers.
>>
>> Generally, I think it is better not to include kernel
>> headers in RTE headers, especially in the case of Vhost
>> and Virtio which just re-use the kernel definitions,
>> and has no runtime dependencies.
>>
>> In order to not break IFC driver build, the vhost kernel
>> header is now included directly in the driver.
>>
>> Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
>> ---
>>  drivers/net/ifc/ifcvf_vdpa.c | 1 +
>>  lib/librte_vhost/rte_vdpa.h  | 1 -
>>  lib/librte_vhost/rte_vhost.h | 9 ++++-----
>>  3 files changed, 5 insertions(+), 6 deletions(-)
> 
> Vhost examples need to be updated as well.

Ha, yes, good catch.
It will be done in v2.

Thanks,
Maxime

> Regards,
> Tiwei
> 
>>
>> diff --git a/drivers/net/ifc/ifcvf_vdpa.c b/drivers/net/ifc/ifcvf_vdpa.c
>> index 8de9ef199..40cb15ca8 100644

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [dpdk-dev] [PATCH 04/15] net/virtio: add virtio PCI subsystem device ID declaration
  2019-09-02  6:14   ` Tiwei Bie
@ 2019-09-03  7:25     ` Maxime Coquelin
  0 siblings, 0 replies; 50+ messages in thread
From: Maxime Coquelin @ 2019-09-03  7:25 UTC (permalink / raw)
  To: Tiwei Bie; +Cc: zhihong.wang, amorenoz, xiao.w.wang, dev, jfreimann, stable



On 9/2/19 8:14 AM, Tiwei Bie wrote:
> On Thu, Aug 29, 2019 at 09:59:49AM +0200, Maxime Coquelin wrote:
>> The Virtio PCI susbsytem IDs need to be specified to
>> prevent it to probe IFC vDPA VFs.
>>
>> Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
>> ---
>>  drivers/net/virtio/virtio_pci.h | 1 +
>>  1 file changed, 1 insertion(+)
>>
>> diff --git a/drivers/net/virtio/virtio_pci.h b/drivers/net/virtio/virtio_pci.h
>> index a38cb45ad..56f89a454 100644
>> --- a/drivers/net/virtio/virtio_pci.h
>> +++ b/drivers/net/virtio/virtio_pci.h
>> @@ -19,6 +19,7 @@ struct virtnet_ctl;
>>  #define VIRTIO_PCI_VENDORID     0x1AF4
>>  #define VIRTIO_PCI_LEGACY_DEVICEID_NET 0x1000
>>  #define VIRTIO_PCI_MODERN_DEVICEID_NET 0x1041
>> +#define VIRTIO_PCI_SUBSY_DEVICEID_NET 0x1100
> 
> 0x1100 is the subsystem device ID used by QEMU.
> Maybe naming it VIRTIO_PCI_SUBSYS_DEVICEID_QEMU is better?

Indeed, will do.

Thanks,
Maxime

> Regards,
> Tiwei
> 
>>  
>>  /* VirtIO ABI version, this must match exactly. */
>>  #define VIRTIO_PCI_ABI_VERSION 0
>> -- 
>> 2.21.0
>>

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [dpdk-dev] [PATCH 06/15] net/virtio: add skeleton for virtio vDPA driver
  2019-09-02  6:27   ` Tiwei Bie
@ 2019-09-03  7:25     ` Maxime Coquelin
  0 siblings, 0 replies; 50+ messages in thread
From: Maxime Coquelin @ 2019-09-03  7:25 UTC (permalink / raw)
  To: Tiwei Bie; +Cc: zhihong.wang, amorenoz, xiao.w.wang, dev, jfreimann, stable



On 9/2/19 8:27 AM, Tiwei Bie wrote:
> On Thu, Aug 29, 2019 at 09:59:51AM +0200, Maxime Coquelin wrote:
>> +
>> +RTE_PMD_REGISTER_PCI(net_virtio_vdpa, rte_virtio_vdpa);
>> +RTE_PMD_REGISTER_PCI_TABLE(net_virtio_vdpa, pci_id_virtio_vdpa_map);
>> +RTE_PMD_REGISTER_KMOD_DEP(net_virtio_vdpa, "* vfio-pci");
>> +
>> +RTE_INIT(virtio_vdpa_init_log)
>> +{
>> +	virtio_vdpa_logtype = rte_log_register("pmd.net.virtio_vdpa");
> 
> We need to use pmd.net.virtio.init to enable the logs in
> virtio_pci.c when using virtio vdpa. Maybe naming above
> logtype e.g. "pmd.net.virtio.vdpa" will be more consistent?

It makes perfect sense, I will do the change.

> Regards,
> Tiwei
> 
>> +	if (virtio_vdpa_logtype >= 0)
>> +		rte_log_set_level(virtio_vdpa_logtype, RTE_LOG_NOTICE);
>> +}
>> +
>> -- 
>> 2.21.0
>>

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [dpdk-dev] [PATCH 08/15] net/virtio: add virtio vDPA op to get features
  2019-09-02  6:43   ` Tiwei Bie
@ 2019-09-03  7:27     ` Maxime Coquelin
  0 siblings, 0 replies; 50+ messages in thread
From: Maxime Coquelin @ 2019-09-03  7:27 UTC (permalink / raw)
  To: Tiwei Bie; +Cc: zhihong.wang, amorenoz, xiao.w.wang, dev, jfreimann, stable



On 9/2/19 8:43 AM, Tiwei Bie wrote:
> On Thu, Aug 29, 2019 at 09:59:53AM +0200, Maxime Coquelin wrote:
>> This patch implements the vDPA .get_features() callback.
>>
>> Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
>> ---
>>  drivers/net/virtio/virtio_vdpa.c | 32 ++++++++++++++++++++++++++++++++
>>  1 file changed, 32 insertions(+)
>>
>> diff --git a/drivers/net/virtio/virtio_vdpa.c b/drivers/net/virtio/virtio_vdpa.c
>> index 07ff1e090..9e2af8313 100644
>> --- a/drivers/net/virtio/virtio_vdpa.c
>> +++ b/drivers/net/virtio/virtio_vdpa.c
>> @@ -173,8 +173,40 @@ virtio_vdpa_get_queue_num(int did, uint32_t *queue_num)
>>  	return 0;
>>  }
>>  
>> +static int
>> +virtio_vdpa_get_features(int did, uint64_t *features)
>> +{
>> +	struct internal_list *list;
>> +	struct virtio_vdpa_device *dev;
>> +
>> +	list = find_internal_resource_by_did(did);
>> +	if (list == NULL) {
>> +		DRV_LOG(ERR, "Invalid device id: %d", did);
>> +		return -1;
>> +	}
>> +
>> +	dev = list->dev;
>> +
>> +	*features = VTPCI_OPS(&dev->hw)->get_features(&dev->hw);
>> +
>> +	if (!(*features & (1ULL << VIRTIO_F_IOMMU_PLATFORM))) {
>> +		DRV_LOG(ERR, "Device does not support IOMMU");
>> +		return -1;
>> +	}
>> +
>> +	if (*features & (1ULL << VIRTIO_NET_F_CTRL_VQ))
>> +		dev->has_ctrl_vq = true;
>> +	else
>> +		dev->has_ctrl_vq = false;
>> +
>> +	*features |= (1ULL << VHOST_USER_F_PROTOCOL_FEATURES);
> 
> We need to drop the VIRTIO_F_IOMMU_PLATFORM bit before
> reporting features to the upper layer as the vDPA backend
> doesn't support the vIOMMU yet.

Indeed, it just work for now because Virtio-user does not advertise
IOMMU feature.

> Regards,
> Tiwei
> 
> 
>> +
>> +	return 0;
>> +}
>> +
>>  static struct rte_vdpa_dev_ops virtio_vdpa_ops = {
>>  	.get_queue_num = virtio_vdpa_get_queue_num,
>> +	.get_features = virtio_vdpa_get_features,
>>  };
>>  
>>  static inline int
>> -- 
>> 2.21.0
>>

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [dpdk-dev] [PATCH 11/15] net/virtio: add vDPA op to stop and close the device
  2019-09-02  7:07   ` Tiwei Bie
@ 2019-09-03  7:30     ` Maxime Coquelin
  0 siblings, 0 replies; 50+ messages in thread
From: Maxime Coquelin @ 2019-09-03  7:30 UTC (permalink / raw)
  To: Tiwei Bie; +Cc: zhihong.wang, amorenoz, xiao.w.wang, dev, jfreimann, stable



On 9/2/19 9:07 AM, Tiwei Bie wrote:
> On Thu, Aug 29, 2019 at 09:59:56AM +0200, Maxime Coquelin wrote:
>> This patch implements the vDPA .dev_close() callback.
>>
>> Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
>> ---
>>  drivers/net/virtio/virtio_vdpa.c | 52 ++++++++++++++++++++++++++++++++
>>  1 file changed, 52 insertions(+)
>>
>> diff --git a/drivers/net/virtio/virtio_vdpa.c b/drivers/net/virtio/virtio_vdpa.c
>> index 13b4dd07d..691844906 100644
>> --- a/drivers/net/virtio/virtio_vdpa.c
>> +++ b/drivers/net/virtio/virtio_vdpa.c
>> @@ -436,6 +436,33 @@ virtio_vdpa_start(struct virtio_vdpa_device *dev)
>>  	return ret;
>>  }
>>  
>> +static void
>> +virtio_vdpa_stop(struct virtio_vdpa_device *dev)
>> +{
>> +	struct virtio_hw *hw = &dev->hw;
>> +	uint32_t i, nr_vring;
>> +	int vid = dev->vid;
>> +	struct rte_vhost_vring vr;
>> +	uint16_t last_used_idx, last_avail_idx;
>> +
>> +	nr_vring = rte_vhost_get_vring_num(vid);
>> +
>> +	vtpci_reset(hw);
>> +
>> +	for (i = 0; i < nr_vring; i++) {
>> +		rte_vhost_get_vhost_vring(vid, i, &vr);
>> +
>> +		last_used_idx = vr.used->idx;
>> +		last_avail_idx = vr.avail->idx;
> 
> This only works in split ring.

Right, thanks for spotting it.
For the v2, I'll try to fix it and test it with Jason's Qemu series
adding packed ring support.

> Regards,
> Tiwei
> 
>> +
>> +		rte_vhost_set_vring_base(vid, i, last_avail_idx,
>> +				last_used_idx);
>> +	}
>> +
>> +	rte_free(dev->vqs);
>> +	dev->vqs = NULL;
>> +}
>> +

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [dpdk-dev] [PATCH 02/15] vhost: configure vDPA as soon as the device is ready
  2019-09-02  9:02     ` Wang, Xiao W
@ 2019-09-03  7:34       ` Maxime Coquelin
  2019-09-03 10:58         ` Wang, Xiao W
  0 siblings, 1 reply; 50+ messages in thread
From: Maxime Coquelin @ 2019-09-03  7:34 UTC (permalink / raw)
  To: Wang, Xiao W, Ye, Xiaolong
  Cc: Bie, Tiwei, Wang, Zhihong, amorenoz, dev, jfreimann, stable



On 9/2/19 11:02 AM, Wang, Xiao W wrote:
> Hi,
> 
>> -----Original Message-----
>> From: Ye, Xiaolong
>> Sent: Monday, September 2, 2019 4:35 PM
>> To: Maxime Coquelin <maxime.coquelin@redhat.com>
>> Cc: Bie, Tiwei <tiwei.bie@intel.com>; Wang, Zhihong
>> <zhihong.wang@intel.com>; amorenoz@redhat.com; Wang, Xiao W
>> <xiao.w.wang@intel.com>; dev@dpdk.org; jfreimann@redhat.com;
>> stable@dpdk.org
>> Subject: Re: [PATCH 02/15] vhost: configure vDPA as soon as the device is
>> ready
>>
>> On 08/29, Maxime Coquelin wrote:
>>> There might not have any VHOST_USER_SET_VRING_CALL requests
>>> sent once virtio device is ready. When it happens, the vDPA
>>> device's dev_conf() callback may never be called.
>>>
>>> Fixes: 9f9014512822 ("vhost: configure vDPA device after set vring call
>> message")
>>> Cc: stable@dpdk.org
>>> Cc: xiaolong.ye@intel.com
>>>
>>> Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
>>> ---
>>> lib/librte_vhost/vhost_user.c | 3 +--
>>> 1 file changed, 1 insertion(+), 2 deletions(-)
>>>
>>> diff --git a/lib/librte_vhost/vhost_user.c b/lib/librte_vhost/vhost_user.c
>>> index 0b72648a5..b1ea80c52 100644
>>> --- a/lib/librte_vhost/vhost_user.c
>>> +++ b/lib/librte_vhost/vhost_user.c
>>> @@ -2112,8 +2112,7 @@ vhost_user_msg_handler(int vid, int fd)
>>> 	did = dev->vdpa_dev_id;
>>> 	vdpa_dev = rte_vdpa_get_device(did);
>>> 	if (vdpa_dev && virtio_is_ready(dev) &&
>>> -			!(dev->flags & VIRTIO_DEV_VDPA_CONFIGURED) &&
>>> -			msg.request.master ==
>> VHOST_USER_SET_VRING_CALL) {
>>> +			!(dev->flags & VIRTIO_DEV_VDPA_CONFIGURED)) {
> 
> In the early beginning of vhost user messages, there seems to be a VHOST_USER_SET_VRING_CALL with invalid call fd,
> not sure if QEMU has any update on this point.
> If the virtio_is_ready() is based on that invalid call fd, then vdpa_dev_conf() cannot setup interrupt properly.
> I think that's why in our previous implementation, we wait for the real call fd and then call dev_conf().

I think that when we receive the first SET_VRING_CALL request from Qemu,
virtio_is_ready() should be false because the rings addresses won't be
received yet.

Did it fixed a real issue, or was it based on code/logs review?

Thanks for the explanations,
Maxime

> BRs,
> Xiao
> 

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [dpdk-dev] [PATCH 14/15] net/virtio: add vDPA op to get notification area
  2019-09-03  5:02   ` Tiwei Bie
@ 2019-09-03  7:36     ` Maxime Coquelin
  2019-09-03  8:40       ` Tiwei Bie
  0 siblings, 1 reply; 50+ messages in thread
From: Maxime Coquelin @ 2019-09-03  7:36 UTC (permalink / raw)
  To: Tiwei Bie; +Cc: zhihong.wang, amorenoz, xiao.w.wang, dev, jfreimann, stable



On 9/3/19 7:02 AM, Tiwei Bie wrote:
> On Thu, Aug 29, 2019 at 09:59:59AM +0200, Maxime Coquelin wrote:
>> This patch implements the vDPA .get_notify_area()
>> callback.
>>
>> Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
>> ---
>>  drivers/net/virtio/virtio_vdpa.c | 33 ++++++++++++++++++++++++++++++++
>>  1 file changed, 33 insertions(+)
>>
>> diff --git a/drivers/net/virtio/virtio_vdpa.c b/drivers/net/virtio/virtio_vdpa.c
>> index e0b2f99ba..e681f527a 100644
>> --- a/drivers/net/virtio/virtio_vdpa.c
>> +++ b/drivers/net/virtio/virtio_vdpa.c
>> @@ -707,6 +707,38 @@ virtio_vdpa_get_vfio_device_fd(int vid)
>>  	return list->dev->vfio_dev_fd;
>>  }
>>  
>> +static int
>> +virtio_vdpa_get_notify_area(int vid, int qid, uint64_t *offset, uint64_t *size)
>> +{
>> +	int did;
>> +	struct internal_list *list;
>> +	struct virtio_vdpa_device *dev;
>> +	struct vfio_region_info reg = { .argsz = sizeof(reg) };
>> +	int ret;
>> +
>> +	did = rte_vhost_get_vdpa_device_id(vid);
>> +	list = find_internal_resource_by_did(did);
>> +	if (list == NULL) {
>> +		DRV_LOG(ERR, "Invalid device id: %d", did);
>> +		return -1;
>> +	}
>> +
>> +	dev = list->dev;
>> +
>> +	reg.index = dev->hw.notify_region;
>> +	ret = ioctl(dev->vfio_dev_fd, VFIO_DEVICE_GET_REGION_INFO, &reg);
>> +	if (ret) {
>> +		DRV_LOG(ERR, "Get not get device region info: %s",
>> +				strerror(errno));
>> +		return -1;
>> +	}
>> +
>> +	*offset = dev->vqs[qid].notify_addr - dev->hw.notify_base + reg.offset;
>> +	*size = 0x1000;
> 
> It would be better to check whether the size is no less
> than 0x1000, otherwise it's possible to give guest the
> access to other registers of the vdpa device.

Correct, if offset is not page-aligned, then it would mean one page may
be mapped while not needed. I took the ifcvf driver as example and
forgot to fix it. (Maybe it should also be fixed in ifcvf driver?)

Thanks,
Maxime
> Regards,
> Tiwei
> 
>> +
>> +	return 0;
>> +}
>> +
>>  static struct rte_vdpa_dev_ops virtio_vdpa_ops = {
>>  	.get_queue_num = virtio_vdpa_get_queue_num,
>>  	.get_features = virtio_vdpa_get_features,
>> @@ -716,6 +748,7 @@ static struct rte_vdpa_dev_ops virtio_vdpa_ops = {
>>  	.set_features = virtio_vdpa_set_features,
>>  	.get_vfio_group_fd = virtio_vdpa_get_vfio_group_fd,
>>  	.get_vfio_device_fd = virtio_vdpa_get_vfio_device_fd,
>> +	.get_notify_area = virtio_vdpa_get_notify_area,
>>  };
>>  
>>  static inline int
>> -- 
>> 2.21.0
>>

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [dpdk-dev] [PATCH 10/15] net/virtio: add vDPA op to configure and start the device
  2019-09-03  5:30   ` Tiwei Bie
@ 2019-09-03  7:40     ` Maxime Coquelin
  2019-09-03  8:49       ` Tiwei Bie
  0 siblings, 1 reply; 50+ messages in thread
From: Maxime Coquelin @ 2019-09-03  7:40 UTC (permalink / raw)
  To: Tiwei Bie; +Cc: zhihong.wang, amorenoz, xiao.w.wang, dev, jfreimann, stable



On 9/3/19 7:30 AM, Tiwei Bie wrote:
> On Thu, Aug 29, 2019 at 09:59:55AM +0200, Maxime Coquelin wrote:
>> In order to support multi-queue, we need to implement the control
>> path. The problem is that both the Vhost-user master and slave use
>> VAs in their processes address spaces as IOVAs, which creates
>> collusions between the data rings IOVAs managed the master, and
>> the Control ring IOVAs. The trick here is to remmap the Control
>> ring memory to another range, after the slave is aware of master's
>> ranges.
>>
>> Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
>> ---
>>  drivers/net/virtio/virtio_vdpa.c | 255 +++++++++++++++++++++++++++++++
>>  1 file changed, 255 insertions(+)
>>
>> diff --git a/drivers/net/virtio/virtio_vdpa.c b/drivers/net/virtio/virtio_vdpa.c
>> index fc52a8e92..13b4dd07d 100644
>> --- a/drivers/net/virtio/virtio_vdpa.c
>> +++ b/drivers/net/virtio/virtio_vdpa.c
>> @@ -106,6 +106,127 @@ find_internal_resource_by_dev(struct rte_pci_device *pdev)
>>  	return list;
>>  }
>>  
>> +static int
>> +virtio_vdpa_dma_map_ctrl_queue(struct virtio_vdpa_device *dev, int do_map,
>> +		uint64_t iova)
>> +{
>> +	const struct rte_memzone *mz;
>> +	int ret;
>> +
>> +	/*
>> +	 * IOVAs are processes VAs. We cannot use them as the Data and Control
>> +	 * paths are run in different processes, which may (does) lead to
>> +	 * collusions. The trick here is to fixup Ctrl path IOVAs so that they
>> +	 * start after the Data path ranges.
>> +	 */
>> +	if (do_map) {
>> +		mz = dev->cvq->cq.mz;
>> +		ret = rte_vfio_container_dma_map(dev->vfio_container_fd,
>> +				(uint64_t)(uintptr_t)mz->addr,
>> +				iova, mz->len);
>> +		if (ret < 0) {
>> +			DRV_LOG(ERR, "Failed to map ctrl ring (%d)", ret);
>> +			return ret;
>> +		}
>> +
>> +		dev->cvq->vq_ring_mem = iova;
>> +		iova += mz->len;
>> +
>> +		mz = dev->cvq->cq.virtio_net_hdr_mz;
>> +		ret = rte_vfio_container_dma_map(dev->vfio_container_fd,
>> +				(uint64_t)(uintptr_t)mz->addr,
>> +				iova, mz->len);
>> +		if (ret < 0) {
>> +			DRV_LOG(ERR, "Failed to map ctrl headers (%d)", ret);
>> +			return ret;
>> +		}
> 
> This will allow guest to access the cq.mz and cq.virtio_net_hdr_mz
> via the device which may have potential risks.

I get what you mean, but I'm not sure to see how we could avoid that.
AFAIU, we need to map the control queue in the device IOMMU, otherwise
how could the host (in case of virtual device) or the NIC (in case of
Virtio offload), could access the ring?
Any thoughts?

Thanks,
Maxime
> Regards,
> Tiwei
> 
>> +
>> +		dev->cvq->cq.virtio_net_hdr_mem = iova;
>> +	} else {
>> +		mz = dev->cvq->cq.mz;
>> +		ret = rte_vfio_container_dma_unmap(dev->vfio_container_fd,
>> +				(uint64_t)(uintptr_t)mz->addr,
>> +				iova, mz->len);
>> +		if (ret < 0) {
>> +			DRV_LOG(ERR, "Failed to unmap ctrl ring (%d)", ret);
>> +			return ret;
>> +		}
>> +
>> +		dev->cvq->vq_ring_mem = 0;
>> +		iova += mz->len;
>> +
>> +		mz = dev->cvq->cq.virtio_net_hdr_mz;
>> +		ret = rte_vfio_container_dma_unmap(dev->vfio_container_fd,
>> +				(uint64_t)(uintptr_t)mz->addr,
>> +				iova, mz->len);
>> +		if (ret < 0) {
>> +			DRV_LOG(ERR, "Failed to unmap ctrl headers (%d)", ret);
>> +			return ret;
>> +		}
>> +
>> +		dev->cvq->cq.virtio_net_hdr_mem = 0;
>> +	}
>> +
>> +	return 0;
>> +}
>> +
>> +static int
>> +virtio_vdpa_dma_map(struct virtio_vdpa_device *dev, int do_map)
>> +{
>> +	uint32_t i;
>> +	int ret;
>> +	struct rte_vhost_memory *mem = NULL;
>> +	int vfio_container_fd;
>> +	uint64_t avail_iova = 0;
>> +
>> +	ret = rte_vhost_get_mem_table(dev->vid, &mem);
>> +	if (ret < 0 || !mem) {
>> +		DRV_LOG(ERR, "failed to get VM memory layout.");
>> +		return ret;
>> +	}
>> +
>> +	vfio_container_fd = dev->vfio_container_fd;
>> +
>> +	for (i = 0; i < mem->nregions; i++) {
>> +		struct rte_vhost_mem_region *reg;
>> +
>> +		reg = &mem->regions[i];
>> +		DRV_LOG(INFO, "%s, region %u: HVA 0x%" PRIx64 ", "
>> +			"GPA 0x%" PRIx64 ", size 0x%" PRIx64 ".",
>> +			do_map ? "DMA map" : "DMA unmap", i,
>> +			reg->host_user_addr, reg->guest_phys_addr, reg->size);
>> +
>> +		if (reg->guest_phys_addr + reg->size > avail_iova)
>> +			avail_iova = reg->guest_phys_addr + reg->size;
>> +
>> +		if (do_map) {
>> +			ret = rte_vfio_container_dma_map(vfio_container_fd,
>> +				reg->host_user_addr, reg->guest_phys_addr,
>> +				reg->size);
>> +			if (ret < 0) {
>> +				DRV_LOG(ERR, "DMA map failed.");
>> +				goto exit;
>> +			}
>> +		} else {
>> +			ret = rte_vfio_container_dma_unmap(vfio_container_fd,
>> +				reg->host_user_addr, reg->guest_phys_addr,
>> +				reg->size);
>> +			if (ret < 0) {
>> +				DRV_LOG(ERR, "DMA unmap failed.");
>> +				goto exit;
>> +			}
>> +		}
>> +	}
>> +
>> +	if (dev->cvq)
>> +		ret = virtio_vdpa_dma_map_ctrl_queue(dev, do_map, avail_iova);
>> +
>> +exit:
>> +	free(mem);
>> +
>> +	return ret;
>> +}
>> +
>>  static int
>>  virtio_vdpa_vfio_setup(struct virtio_vdpa_device *dev)
>>  {
>> @@ -216,10 +337,144 @@ virtio_vdpa_get_protocol_features(int did __rte_unused, uint64_t *features)
>>  	return 0;
>>  }
>>  
>> +static uint64_t
>> +hva_to_gpa(int vid, uint64_t hva)
>> +{
>> +	struct rte_vhost_memory *mem = NULL;
>> +	struct rte_vhost_mem_region *reg;
>> +	uint32_t i;
>> +	uint64_t gpa = 0;
>> +
>> +	if (rte_vhost_get_mem_table(vid, &mem) < 0)
>> +		goto exit;
>> +
>> +	for (i = 0; i < mem->nregions; i++) {
>> +		reg = &mem->regions[i];
>> +
>> +		if (hva >= reg->host_user_addr &&
>> +				hva < reg->host_user_addr + reg->size) {
>> +			gpa = hva - reg->host_user_addr + reg->guest_phys_addr;
>> +			break;
>> +		}
>> +	}
>> +
>> +exit:
>> +	if (mem)
>> +		free(mem);
>> +	return gpa;
>> +}
>> +
>> +static int
>> +virtio_vdpa_start(struct virtio_vdpa_device *dev)
>> +{
>> +	struct virtio_hw *hw = &dev->hw;
>> +	int i, vid, nr_vring, ret;
>> +	struct rte_vhost_vring vr;
>> +	struct virtio_pmd_ctrl ctrl;
>> +	int dlen[1];
>> +
>> +	vid = dev->vid;
>> +	nr_vring = rte_vhost_get_vring_num(vid);
>> +
>> +	if (dev->vqs)
>> +		rte_free(dev->vqs);
>> +
>> +	dev->vqs = rte_zmalloc("virtio_vdpa", sizeof(*dev->vqs) * nr_vring, 0);
>> +
>> +	for (i = 0; i < nr_vring; i++) {
>> +		struct virtqueue *vq = &dev->vqs[i];
>> +
>> +		rte_vhost_get_vhost_vring(vid, i, &vr);
>> +
>> +		vq->vq_queue_index = i;
>> +		vq->vq_nentries = vr.size;
>> +		vq->vq_ring_mem = hva_to_gpa(vid, (uint64_t)(uintptr_t)vr.desc);
>> +		if (vq->vq_ring_mem  == 0) {
>> +			DRV_LOG(ERR, "Fail to get GPA for descriptor ring.");
>> +			ret = -1;
>> +			goto out_free_vqs;
>> +		}
>> +
>> +		ret = VTPCI_OPS(hw)->setup_queue(hw, vq);
>> +		if (ret) {
>> +			DRV_LOG(ERR, "Fail to setup queue.");
>> +			goto out_free_vqs;
>> +		}
>> +	}
>> +
>> +	if (dev->cvq) {
>> +		ret = VTPCI_OPS(hw)->setup_queue(hw, dev->cvq);
>> +		if (ret) {
>> +			DRV_LOG(ERR, "Fail to setup ctrl queue.");
>> +			goto out_free_vqs;
>> +		}
>> +	}
>> +
>> +	vtpci_set_status(hw, VIRTIO_CONFIG_STATUS_DRIVER_OK);
>> +
>> +	if (!dev->cvq)
>> +		return 0;
>> +
>> +	ctrl.hdr.class = VIRTIO_NET_CTRL_MQ;
>> +	ctrl.hdr.cmd = VIRTIO_NET_CTRL_MQ_VQ_PAIRS_SET;
>> +	memcpy(ctrl.data, &dev->max_queue_pairs, sizeof(uint16_t));
>> +
>> +	dlen[0] = sizeof(uint16_t);
>> +
>> +	ret = virtio_send_command(hw->cvq, &ctrl, dlen, 1);
>> +	if (ret) {
>> +		DRV_LOG(ERR, "Multiqueue configured but send command "
>> +			  "failed, this is too late now...");
>> +		ret = -EINVAL;
>> +		goto out_free_vqs;
>> +	}
>> +
>> +	return 0;
>> +out_free_vqs:
>> +	rte_free(dev->vqs);
>> +
>> +	return ret;
>> +}
>> +
>> +static int
>> +virtio_vdpa_dev_config(int vid)
>> +{
>> +	int did, ret;
>> +	struct internal_list *list;
>> +	struct virtio_vdpa_device *dev;
>> +
>> +	did = rte_vhost_get_vdpa_device_id(vid);
>> +	list = find_internal_resource_by_did(did);
>> +	if (list == NULL) {
>> +		DRV_LOG(ERR, "Invalid device id: %d", did);
>> +		return -1;
>> +	}
>> +
>> +	dev = list->dev;
>> +	dev->vid = vid;
>> +
>> +	rte_spinlock_lock(&dev->lock);
>> +
>> +	ret = virtio_vdpa_dma_map(dev, 1);
>> +	if (ret)
>> +		goto out_unlock;
>> +
>> +	ret = virtio_vdpa_start(dev);
>> +
>> +	if (rte_vhost_host_notifier_ctrl(vid, true) != 0)
>> +		DRV_LOG(NOTICE, "vDPA (%d): software relay is used.", did);
>> +
>> +out_unlock:
>> +	rte_spinlock_unlock(&dev->lock);
>> +
>> +	return ret;
>> +}
>> +
>>  static struct rte_vdpa_dev_ops virtio_vdpa_ops = {
>>  	.get_queue_num = virtio_vdpa_get_queue_num,
>>  	.get_features = virtio_vdpa_get_features,
>>  	.get_protocol_features = virtio_vdpa_get_protocol_features,
>> +	.dev_conf = virtio_vdpa_dev_config,
>>  };
>>  
>>  static inline int
>> -- 
>> 2.21.0
>>

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [dpdk-dev] [PATCH 14/15] net/virtio: add vDPA op to get notification area
  2019-09-03  7:36     ` Maxime Coquelin
@ 2019-09-03  8:40       ` Tiwei Bie
  0 siblings, 0 replies; 50+ messages in thread
From: Tiwei Bie @ 2019-09-03  8:40 UTC (permalink / raw)
  To: Maxime Coquelin
  Cc: zhihong.wang, amorenoz, xiao.w.wang, dev, jfreimann, stable

On Tue, Sep 03, 2019 at 09:36:54AM +0200, Maxime Coquelin wrote:
> On 9/3/19 7:02 AM, Tiwei Bie wrote:
> > On Thu, Aug 29, 2019 at 09:59:59AM +0200, Maxime Coquelin wrote:
> >> This patch implements the vDPA .get_notify_area()
> >> callback.
> >>
> >> Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
> >> ---
> >>  drivers/net/virtio/virtio_vdpa.c | 33 ++++++++++++++++++++++++++++++++
> >>  1 file changed, 33 insertions(+)
> >>
> >> diff --git a/drivers/net/virtio/virtio_vdpa.c b/drivers/net/virtio/virtio_vdpa.c
> >> index e0b2f99ba..e681f527a 100644
> >> --- a/drivers/net/virtio/virtio_vdpa.c
> >> +++ b/drivers/net/virtio/virtio_vdpa.c
> >> @@ -707,6 +707,38 @@ virtio_vdpa_get_vfio_device_fd(int vid)
> >>  	return list->dev->vfio_dev_fd;
> >>  }
> >>  
> >> +static int
> >> +virtio_vdpa_get_notify_area(int vid, int qid, uint64_t *offset, uint64_t *size)
> >> +{
> >> +	int did;
> >> +	struct internal_list *list;
> >> +	struct virtio_vdpa_device *dev;
> >> +	struct vfio_region_info reg = { .argsz = sizeof(reg) };
> >> +	int ret;
> >> +
> >> +	did = rte_vhost_get_vdpa_device_id(vid);
> >> +	list = find_internal_resource_by_did(did);
> >> +	if (list == NULL) {
> >> +		DRV_LOG(ERR, "Invalid device id: %d", did);
> >> +		return -1;
> >> +	}
> >> +
> >> +	dev = list->dev;
> >> +
> >> +	reg.index = dev->hw.notify_region;
> >> +	ret = ioctl(dev->vfio_dev_fd, VFIO_DEVICE_GET_REGION_INFO, &reg);
> >> +	if (ret) {
> >> +		DRV_LOG(ERR, "Get not get device region info: %s",
> >> +				strerror(errno));
> >> +		return -1;
> >> +	}
> >> +
> >> +	*offset = dev->vqs[qid].notify_addr - dev->hw.notify_base + reg.offset;
> >> +	*size = 0x1000;
> > 
> > It would be better to check whether the size is no less
> > than 0x1000, otherwise it's possible to give guest the
> > access to other registers of the vdpa device.
> 
> Correct, if offset is not page-aligned, then it would mean one page may
> be mapped while not needed. I took the ifcvf driver as example and
> forgot to fix it. (Maybe it should also be fixed in ifcvf driver?)

The ifcvf hardware will make sure that the notify register will
stay in a separate page, so the size is hardcoded in ifcvf driver.

Regards,
Tiwei

> 
> Thanks,
> Maxime
> > Regards,
> > Tiwei
> > 
> >> +
> >> +	return 0;
> >> +}
> >> +
> >>  static struct rte_vdpa_dev_ops virtio_vdpa_ops = {
> >>  	.get_queue_num = virtio_vdpa_get_queue_num,
> >>  	.get_features = virtio_vdpa_get_features,
> >> @@ -716,6 +748,7 @@ static struct rte_vdpa_dev_ops virtio_vdpa_ops = {
> >>  	.set_features = virtio_vdpa_set_features,
> >>  	.get_vfio_group_fd = virtio_vdpa_get_vfio_group_fd,
> >>  	.get_vfio_device_fd = virtio_vdpa_get_vfio_device_fd,
> >> +	.get_notify_area = virtio_vdpa_get_notify_area,
> >>  };
> >>  
> >>  static inline int
> >> -- 
> >> 2.21.0
> >>

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [dpdk-dev] [PATCH 10/15] net/virtio: add vDPA op to configure and start the device
  2019-09-03  7:40     ` Maxime Coquelin
@ 2019-09-03  8:49       ` Tiwei Bie
  2019-09-04  4:06         ` Jason Wang
  0 siblings, 1 reply; 50+ messages in thread
From: Tiwei Bie @ 2019-09-03  8:49 UTC (permalink / raw)
  To: Maxime Coquelin
  Cc: zhihong.wang, amorenoz, xiao.w.wang, dev, jfreimann, stable, jasowang

On Tue, Sep 03, 2019 at 09:40:25AM +0200, Maxime Coquelin wrote:
> On 9/3/19 7:30 AM, Tiwei Bie wrote:
> > On Thu, Aug 29, 2019 at 09:59:55AM +0200, Maxime Coquelin wrote:
> >> In order to support multi-queue, we need to implement the control
> >> path. The problem is that both the Vhost-user master and slave use
> >> VAs in their processes address spaces as IOVAs, which creates
> >> collusions between the data rings IOVAs managed the master, and
> >> the Control ring IOVAs. The trick here is to remmap the Control
> >> ring memory to another range, after the slave is aware of master's
> >> ranges.
> >>
> >> Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
> >> ---
> >>  drivers/net/virtio/virtio_vdpa.c | 255 +++++++++++++++++++++++++++++++
> >>  1 file changed, 255 insertions(+)
> >>
> >> diff --git a/drivers/net/virtio/virtio_vdpa.c b/drivers/net/virtio/virtio_vdpa.c
> >> index fc52a8e92..13b4dd07d 100644
> >> --- a/drivers/net/virtio/virtio_vdpa.c
> >> +++ b/drivers/net/virtio/virtio_vdpa.c
> >> @@ -106,6 +106,127 @@ find_internal_resource_by_dev(struct rte_pci_device *pdev)
> >>  	return list;
> >>  }
> >>  
> >> +static int
> >> +virtio_vdpa_dma_map_ctrl_queue(struct virtio_vdpa_device *dev, int do_map,
> >> +		uint64_t iova)
> >> +{
> >> +	const struct rte_memzone *mz;
> >> +	int ret;
> >> +
> >> +	/*
> >> +	 * IOVAs are processes VAs. We cannot use them as the Data and Control
> >> +	 * paths are run in different processes, which may (does) lead to
> >> +	 * collusions. The trick here is to fixup Ctrl path IOVAs so that they
> >> +	 * start after the Data path ranges.
> >> +	 */
> >> +	if (do_map) {
> >> +		mz = dev->cvq->cq.mz;
> >> +		ret = rte_vfio_container_dma_map(dev->vfio_container_fd,
> >> +				(uint64_t)(uintptr_t)mz->addr,
> >> +				iova, mz->len);
> >> +		if (ret < 0) {
> >> +			DRV_LOG(ERR, "Failed to map ctrl ring (%d)", ret);
> >> +			return ret;
> >> +		}
> >> +
> >> +		dev->cvq->vq_ring_mem = iova;
> >> +		iova += mz->len;
> >> +
> >> +		mz = dev->cvq->cq.virtio_net_hdr_mz;
> >> +		ret = rte_vfio_container_dma_map(dev->vfio_container_fd,
> >> +				(uint64_t)(uintptr_t)mz->addr,
> >> +				iova, mz->len);
> >> +		if (ret < 0) {
> >> +			DRV_LOG(ERR, "Failed to map ctrl headers (%d)", ret);
> >> +			return ret;
> >> +		}
> > 
> > This will allow guest to access the cq.mz and cq.virtio_net_hdr_mz
> > via the device which may have potential risks.
> 
> I get what you mean, but I'm not sure to see how we could avoid that.
> AFAIU, we need to map the control queue in the device IOMMU, otherwise
> how could the host (in case of virtual device) or the NIC (in case of
> Virtio offload), could access the ring?
> Any thoughts?

I also don't see a way to avoid that. That's why I said in below
thread that I think the control queue based interface seems not a
quite good interface for a backend device:

https://lkml.org/lkml/2019/9/2/934

In IFCVF NIC, we added a MMIO based interface to replace control
queue for the multiqueue setup in vDPA mode.

Jason is proposing some changes to make virtio device suitable
for backend device. I'm not sure whether it's possible to cover
this case as well..

Regards,
Tiwei

> 
> Thanks,
> Maxime
> > Regards,
> > Tiwei
> > 
> >> +
> >> +		dev->cvq->cq.virtio_net_hdr_mem = iova;
> >> +	} else {
> >> +		mz = dev->cvq->cq.mz;
> >> +		ret = rte_vfio_container_dma_unmap(dev->vfio_container_fd,
> >> +				(uint64_t)(uintptr_t)mz->addr,
> >> +				iova, mz->len);
> >> +		if (ret < 0) {
> >> +			DRV_LOG(ERR, "Failed to unmap ctrl ring (%d)", ret);
> >> +			return ret;
> >> +		}
> >> +
> >> +		dev->cvq->vq_ring_mem = 0;
> >> +		iova += mz->len;
> >> +
> >> +		mz = dev->cvq->cq.virtio_net_hdr_mz;
> >> +		ret = rte_vfio_container_dma_unmap(dev->vfio_container_fd,
> >> +				(uint64_t)(uintptr_t)mz->addr,
> >> +				iova, mz->len);
> >> +		if (ret < 0) {
> >> +			DRV_LOG(ERR, "Failed to unmap ctrl headers (%d)", ret);
> >> +			return ret;
> >> +		}
> >> +
> >> +		dev->cvq->cq.virtio_net_hdr_mem = 0;
> >> +	}
> >> +
> >> +	return 0;
> >> +}
> >> +
> >> +static int
> >> +virtio_vdpa_dma_map(struct virtio_vdpa_device *dev, int do_map)
> >> +{
> >> +	uint32_t i;
> >> +	int ret;
> >> +	struct rte_vhost_memory *mem = NULL;
> >> +	int vfio_container_fd;
> >> +	uint64_t avail_iova = 0;
> >> +
> >> +	ret = rte_vhost_get_mem_table(dev->vid, &mem);
> >> +	if (ret < 0 || !mem) {
> >> +		DRV_LOG(ERR, "failed to get VM memory layout.");
> >> +		return ret;
> >> +	}
> >> +
> >> +	vfio_container_fd = dev->vfio_container_fd;
> >> +
> >> +	for (i = 0; i < mem->nregions; i++) {
> >> +		struct rte_vhost_mem_region *reg;
> >> +
> >> +		reg = &mem->regions[i];
> >> +		DRV_LOG(INFO, "%s, region %u: HVA 0x%" PRIx64 ", "
> >> +			"GPA 0x%" PRIx64 ", size 0x%" PRIx64 ".",
> >> +			do_map ? "DMA map" : "DMA unmap", i,
> >> +			reg->host_user_addr, reg->guest_phys_addr, reg->size);
> >> +
> >> +		if (reg->guest_phys_addr + reg->size > avail_iova)
> >> +			avail_iova = reg->guest_phys_addr + reg->size;
> >> +
> >> +		if (do_map) {
> >> +			ret = rte_vfio_container_dma_map(vfio_container_fd,
> >> +				reg->host_user_addr, reg->guest_phys_addr,
> >> +				reg->size);
> >> +			if (ret < 0) {
> >> +				DRV_LOG(ERR, "DMA map failed.");
> >> +				goto exit;
> >> +			}
> >> +		} else {
> >> +			ret = rte_vfio_container_dma_unmap(vfio_container_fd,
> >> +				reg->host_user_addr, reg->guest_phys_addr,
> >> +				reg->size);
> >> +			if (ret < 0) {
> >> +				DRV_LOG(ERR, "DMA unmap failed.");
> >> +				goto exit;
> >> +			}
> >> +		}
> >> +	}
> >> +
> >> +	if (dev->cvq)
> >> +		ret = virtio_vdpa_dma_map_ctrl_queue(dev, do_map, avail_iova);
> >> +
> >> +exit:
> >> +	free(mem);
> >> +
> >> +	return ret;
> >> +}
> >> +
> >>  static int
> >>  virtio_vdpa_vfio_setup(struct virtio_vdpa_device *dev)
> >>  {
> >> @@ -216,10 +337,144 @@ virtio_vdpa_get_protocol_features(int did __rte_unused, uint64_t *features)
> >>  	return 0;
> >>  }
> >>  
> >> +static uint64_t
> >> +hva_to_gpa(int vid, uint64_t hva)
> >> +{
> >> +	struct rte_vhost_memory *mem = NULL;
> >> +	struct rte_vhost_mem_region *reg;
> >> +	uint32_t i;
> >> +	uint64_t gpa = 0;
> >> +
> >> +	if (rte_vhost_get_mem_table(vid, &mem) < 0)
> >> +		goto exit;
> >> +
> >> +	for (i = 0; i < mem->nregions; i++) {
> >> +		reg = &mem->regions[i];
> >> +
> >> +		if (hva >= reg->host_user_addr &&
> >> +				hva < reg->host_user_addr + reg->size) {
> >> +			gpa = hva - reg->host_user_addr + reg->guest_phys_addr;
> >> +			break;
> >> +		}
> >> +	}
> >> +
> >> +exit:
> >> +	if (mem)
> >> +		free(mem);
> >> +	return gpa;
> >> +}
> >> +
> >> +static int
> >> +virtio_vdpa_start(struct virtio_vdpa_device *dev)
> >> +{
> >> +	struct virtio_hw *hw = &dev->hw;
> >> +	int i, vid, nr_vring, ret;
> >> +	struct rte_vhost_vring vr;
> >> +	struct virtio_pmd_ctrl ctrl;
> >> +	int dlen[1];
> >> +
> >> +	vid = dev->vid;
> >> +	nr_vring = rte_vhost_get_vring_num(vid);
> >> +
> >> +	if (dev->vqs)
> >> +		rte_free(dev->vqs);
> >> +
> >> +	dev->vqs = rte_zmalloc("virtio_vdpa", sizeof(*dev->vqs) * nr_vring, 0);
> >> +
> >> +	for (i = 0; i < nr_vring; i++) {
> >> +		struct virtqueue *vq = &dev->vqs[i];
> >> +
> >> +		rte_vhost_get_vhost_vring(vid, i, &vr);
> >> +
> >> +		vq->vq_queue_index = i;
> >> +		vq->vq_nentries = vr.size;
> >> +		vq->vq_ring_mem = hva_to_gpa(vid, (uint64_t)(uintptr_t)vr.desc);
> >> +		if (vq->vq_ring_mem  == 0) {
> >> +			DRV_LOG(ERR, "Fail to get GPA for descriptor ring.");
> >> +			ret = -1;
> >> +			goto out_free_vqs;
> >> +		}
> >> +
> >> +		ret = VTPCI_OPS(hw)->setup_queue(hw, vq);
> >> +		if (ret) {
> >> +			DRV_LOG(ERR, "Fail to setup queue.");
> >> +			goto out_free_vqs;
> >> +		}
> >> +	}
> >> +
> >> +	if (dev->cvq) {
> >> +		ret = VTPCI_OPS(hw)->setup_queue(hw, dev->cvq);
> >> +		if (ret) {
> >> +			DRV_LOG(ERR, "Fail to setup ctrl queue.");
> >> +			goto out_free_vqs;
> >> +		}
> >> +	}
> >> +
> >> +	vtpci_set_status(hw, VIRTIO_CONFIG_STATUS_DRIVER_OK);
> >> +
> >> +	if (!dev->cvq)
> >> +		return 0;
> >> +
> >> +	ctrl.hdr.class = VIRTIO_NET_CTRL_MQ;
> >> +	ctrl.hdr.cmd = VIRTIO_NET_CTRL_MQ_VQ_PAIRS_SET;
> >> +	memcpy(ctrl.data, &dev->max_queue_pairs, sizeof(uint16_t));
> >> +
> >> +	dlen[0] = sizeof(uint16_t);
> >> +
> >> +	ret = virtio_send_command(hw->cvq, &ctrl, dlen, 1);
> >> +	if (ret) {
> >> +		DRV_LOG(ERR, "Multiqueue configured but send command "
> >> +			  "failed, this is too late now...");
> >> +		ret = -EINVAL;
> >> +		goto out_free_vqs;
> >> +	}
> >> +
> >> +	return 0;
> >> +out_free_vqs:
> >> +	rte_free(dev->vqs);
> >> +
> >> +	return ret;
> >> +}
> >> +
> >> +static int
> >> +virtio_vdpa_dev_config(int vid)
> >> +{
> >> +	int did, ret;
> >> +	struct internal_list *list;
> >> +	struct virtio_vdpa_device *dev;
> >> +
> >> +	did = rte_vhost_get_vdpa_device_id(vid);
> >> +	list = find_internal_resource_by_did(did);
> >> +	if (list == NULL) {
> >> +		DRV_LOG(ERR, "Invalid device id: %d", did);
> >> +		return -1;
> >> +	}
> >> +
> >> +	dev = list->dev;
> >> +	dev->vid = vid;
> >> +
> >> +	rte_spinlock_lock(&dev->lock);
> >> +
> >> +	ret = virtio_vdpa_dma_map(dev, 1);
> >> +	if (ret)
> >> +		goto out_unlock;
> >> +
> >> +	ret = virtio_vdpa_start(dev);
> >> +
> >> +	if (rte_vhost_host_notifier_ctrl(vid, true) != 0)
> >> +		DRV_LOG(NOTICE, "vDPA (%d): software relay is used.", did);
> >> +
> >> +out_unlock:
> >> +	rte_spinlock_unlock(&dev->lock);
> >> +
> >> +	return ret;
> >> +}
> >> +
> >>  static struct rte_vdpa_dev_ops virtio_vdpa_ops = {
> >>  	.get_queue_num = virtio_vdpa_get_queue_num,
> >>  	.get_features = virtio_vdpa_get_features,
> >>  	.get_protocol_features = virtio_vdpa_get_protocol_features,
> >> +	.dev_conf = virtio_vdpa_dev_config,
> >>  };
> >>  
> >>  static inline int
> >> -- 
> >> 2.21.0
> >>

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [dpdk-dev] [PATCH 02/15] vhost: configure vDPA as soon as the device is ready
  2019-09-03  7:34       ` Maxime Coquelin
@ 2019-09-03 10:58         ` Wang, Xiao W
  0 siblings, 0 replies; 50+ messages in thread
From: Wang, Xiao W @ 2019-09-03 10:58 UTC (permalink / raw)
  To: Maxime Coquelin, Ye, Xiaolong
  Cc: Bie, Tiwei, Wang, Zhihong, amorenoz, dev, jfreimann, stable

Hi

> -----Original Message-----
> From: Maxime Coquelin [mailto:maxime.coquelin@redhat.com]
> Sent: Tuesday, September 3, 2019 3:35 PM
> To: Wang, Xiao W <xiao.w.wang@intel.com>; Ye, Xiaolong
> <xiaolong.ye@intel.com>
> Cc: Bie, Tiwei <tiwei.bie@intel.com>; Wang, Zhihong
> <zhihong.wang@intel.com>; amorenoz@redhat.com; dev@dpdk.org;
> jfreimann@redhat.com; stable@dpdk.org
> Subject: Re: [PATCH 02/15] vhost: configure vDPA as soon as the device is
> ready
> 
> 
> 
> On 9/2/19 11:02 AM, Wang, Xiao W wrote:
> > Hi,
> >
> >> -----Original Message-----
> >> From: Ye, Xiaolong
> >> Sent: Monday, September 2, 2019 4:35 PM
> >> To: Maxime Coquelin <maxime.coquelin@redhat.com>
> >> Cc: Bie, Tiwei <tiwei.bie@intel.com>; Wang, Zhihong
> >> <zhihong.wang@intel.com>; amorenoz@redhat.com; Wang, Xiao W
> >> <xiao.w.wang@intel.com>; dev@dpdk.org; jfreimann@redhat.com;
> >> stable@dpdk.org
> >> Subject: Re: [PATCH 02/15] vhost: configure vDPA as soon as the device is
> >> ready
> >>
> >> On 08/29, Maxime Coquelin wrote:
> >>> There might not have any VHOST_USER_SET_VRING_CALL requests
> >>> sent once virtio device is ready. When it happens, the vDPA
> >>> device's dev_conf() callback may never be called.
> >>>
> >>> Fixes: 9f9014512822 ("vhost: configure vDPA device after set vring call
> >> message")
> >>> Cc: stable@dpdk.org
> >>> Cc: xiaolong.ye@intel.com
> >>>
> >>> Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
> >>> ---
> >>> lib/librte_vhost/vhost_user.c | 3 +--
> >>> 1 file changed, 1 insertion(+), 2 deletions(-)
> >>>
> >>> diff --git a/lib/librte_vhost/vhost_user.c b/lib/librte_vhost/vhost_user.c
> >>> index 0b72648a5..b1ea80c52 100644
> >>> --- a/lib/librte_vhost/vhost_user.c
> >>> +++ b/lib/librte_vhost/vhost_user.c
> >>> @@ -2112,8 +2112,7 @@ vhost_user_msg_handler(int vid, int fd)
> >>> 	did = dev->vdpa_dev_id;
> >>> 	vdpa_dev = rte_vdpa_get_device(did);
> >>> 	if (vdpa_dev && virtio_is_ready(dev) &&
> >>> -			!(dev->flags & VIRTIO_DEV_VDPA_CONFIGURED) &&
> >>> -			msg.request.master ==
> >> VHOST_USER_SET_VRING_CALL) {
> >>> +			!(dev->flags & VIRTIO_DEV_VDPA_CONFIGURED)) {
> >
> > In the early beginning of vhost user messages, there seems to be a
> VHOST_USER_SET_VRING_CALL with invalid call fd,
> > not sure if QEMU has any update on this point.
> > If the virtio_is_ready() is based on that invalid call fd, then vdpa_dev_conf()
> cannot setup interrupt properly.
> > I think that's why in our previous implementation, we wait for the real call fd
> and then call dev_conf().
> 
> I think that when we receive the first SET_VRING_CALL request from Qemu,
> virtio_is_ready() should be false because the rings addresses won't be
> received yet.

In the last phase of vhost communication, there're usually a sequence of messages
"SET_VRING_KICK" and "SET_VRING_CALL". With this patch, at the arrival of
"SET_VRING_KICK", virtio_is_ready() will return true, and the invalid call fd will
get involved into dev_con().

I'm not sure if the invalid callfd implementation still exist in latest QEMU, if yes,
We have to handle this case.

> 
> Did it fixed a real issue, or was it based on code/logs review?

In an old implementation with QEMU 2.6, I met the issue. To verify it, you need to
Setup a VM (maybe nested) and verify if virtio can get an interrupt.

BRs,
Xiao

> 
> Thanks for the explanations,
> Maxime
> 
> > BRs,
> > Xiao
> >

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [dpdk-dev] [PATCH 10/15] net/virtio: add vDPA op to configure and start the device
  2019-09-03  8:49       ` Tiwei Bie
@ 2019-09-04  4:06         ` Jason Wang
  2019-09-04  6:56           ` Maxime Coquelin
  0 siblings, 1 reply; 50+ messages in thread
From: Jason Wang @ 2019-09-04  4:06 UTC (permalink / raw)
  To: Tiwei Bie, Maxime Coquelin
  Cc: zhihong.wang, amorenoz, xiao.w.wang, dev, jfreimann, stable


On 2019/9/3 下午4:49, Tiwei Bie wrote:
> On Tue, Sep 03, 2019 at 09:40:25AM +0200, Maxime Coquelin wrote:
>> On 9/3/19 7:30 AM, Tiwei Bie wrote:
>>> On Thu, Aug 29, 2019 at 09:59:55AM +0200, Maxime Coquelin wrote:
>>>> In order to support multi-queue, we need to implement the control
>>>> path. The problem is that both the Vhost-user master and slave use
>>>> VAs in their processes address spaces as IOVAs, which creates
>>>> collusions between the data rings IOVAs managed the master, and
>>>> the Control ring IOVAs. The trick here is to remmap the Control
>>>> ring memory to another range, after the slave is aware of master's
>>>> ranges.
>>>>
>>>> Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
>>>> ---
>>>>   drivers/net/virtio/virtio_vdpa.c | 255 +++++++++++++++++++++++++++++++
>>>>   1 file changed, 255 insertions(+)
>>>>
>>>> diff --git a/drivers/net/virtio/virtio_vdpa.c b/drivers/net/virtio/virtio_vdpa.c
>>>> index fc52a8e92..13b4dd07d 100644
>>>> --- a/drivers/net/virtio/virtio_vdpa.c
>>>> +++ b/drivers/net/virtio/virtio_vdpa.c
>>>> @@ -106,6 +106,127 @@ find_internal_resource_by_dev(struct rte_pci_device *pdev)
>>>>   	return list;
>>>>   }
>>>>   
>>>> +static int
>>>> +virtio_vdpa_dma_map_ctrl_queue(struct virtio_vdpa_device *dev, int do_map,
>>>> +		uint64_t iova)
>>>> +{
>>>> +	const struct rte_memzone *mz;
>>>> +	int ret;
>>>> +
>>>> +	/*
>>>> +	 * IOVAs are processes VAs. We cannot use them as the Data and Control
>>>> +	 * paths are run in different processes, which may (does) lead to
>>>> +	 * collusions. The trick here is to fixup Ctrl path IOVAs so that they
>>>> +	 * start after the Data path ranges.
>>>> +	 */
>>>> +	if (do_map) {
>>>> +		mz = dev->cvq->cq.mz;
>>>> +		ret = rte_vfio_container_dma_map(dev->vfio_container_fd,
>>>> +				(uint64_t)(uintptr_t)mz->addr,
>>>> +				iova, mz->len);
>>>> +		if (ret < 0) {
>>>> +			DRV_LOG(ERR, "Failed to map ctrl ring (%d)", ret);
>>>> +			return ret;
>>>> +		}
>>>> +
>>>> +		dev->cvq->vq_ring_mem = iova;
>>>> +		iova += mz->len;
>>>> +
>>>> +		mz = dev->cvq->cq.virtio_net_hdr_mz;
>>>> +		ret = rte_vfio_container_dma_map(dev->vfio_container_fd,
>>>> +				(uint64_t)(uintptr_t)mz->addr,
>>>> +				iova, mz->len);
>>>> +		if (ret < 0) {
>>>> +			DRV_LOG(ERR, "Failed to map ctrl headers (%d)", ret);
>>>> +			return ret;
>>>> +		}
>>> This will allow guest to access the cq.mz and cq.virtio_net_hdr_mz
>>> via the device which may have potential risks.
>> I get what you mean, but I'm not sure to see how we could avoid that.
>> AFAIU, we need to map the control queue in the device IOMMU, otherwise
>> how could the host (in case of virtual device) or the NIC (in case of
>> Virtio offload), could access the ring?
>> Any thoughts?
> I also don't see a way to avoid that. That's why I said in below
> thread that I think the control queue based interface seems not a
> quite good interface for a backend device:
>
> https://lkml.org/lkml/2019/9/2/934
>
> In IFCVF NIC, we added a MMIO based interface to replace control
> queue for the multiqueue setup in vDPA mode.
>
> Jason is proposing some changes to make virtio device suitable
> for backend device. I'm not sure whether it's possible to cover
> this case as well..


A silly question, can we do dynamic mapping like what kernel driver did 
here?

Thanks


>
> Regards,
> Tiwei
>
>> Thanks,
>> Maxime
>>> Regards,
>>> Tiwei
>>>
>>>> +
>>>> +		dev->cvq->cq.virtio_net_hdr_mem = iova;
>>>> +	} else {
>>>> +		mz = dev->cvq->cq.mz;
>>>> +		ret = rte_vfio_container_dma_unmap(dev->vfio_container_fd,
>>>> +				(uint64_t)(uintptr_t)mz->addr,
>>>> +				iova, mz->len);
>>>> +		if (ret < 0) {
>>>> +			DRV_LOG(ERR, "Failed to unmap ctrl ring (%d)", ret);
>>>> +			return ret;
>>>> +		}
>>>> +
>>>> +		dev->cvq->vq_ring_mem = 0;
>>>> +		iova += mz->len;
>>>> +
>>>> +		mz = dev->cvq->cq.virtio_net_hdr_mz;
>>>> +		ret = rte_vfio_container_dma_unmap(dev->vfio_container_fd,
>>>> +				(uint64_t)(uintptr_t)mz->addr,
>>>> +				iova, mz->len);
>>>> +		if (ret < 0) {
>>>> +			DRV_LOG(ERR, "Failed to unmap ctrl headers (%d)", ret);
>>>> +			return ret;
>>>> +		}
>>>> +
>>>> +		dev->cvq->cq.virtio_net_hdr_mem = 0;
>>>> +	}
>>>> +
>>>> +	return 0;
>>>> +}
>>>> +
>>>> +static int
>>>> +virtio_vdpa_dma_map(struct virtio_vdpa_device *dev, int do_map)
>>>> +{
>>>> +	uint32_t i;
>>>> +	int ret;
>>>> +	struct rte_vhost_memory *mem = NULL;
>>>> +	int vfio_container_fd;
>>>> +	uint64_t avail_iova = 0;
>>>> +
>>>> +	ret = rte_vhost_get_mem_table(dev->vid, &mem);
>>>> +	if (ret < 0 || !mem) {
>>>> +		DRV_LOG(ERR, "failed to get VM memory layout.");
>>>> +		return ret;
>>>> +	}
>>>> +
>>>> +	vfio_container_fd = dev->vfio_container_fd;
>>>> +
>>>> +	for (i = 0; i < mem->nregions; i++) {
>>>> +		struct rte_vhost_mem_region *reg;
>>>> +
>>>> +		reg = &mem->regions[i];
>>>> +		DRV_LOG(INFO, "%s, region %u: HVA 0x%" PRIx64 ", "
>>>> +			"GPA 0x%" PRIx64 ", size 0x%" PRIx64 ".",
>>>> +			do_map ? "DMA map" : "DMA unmap", i,
>>>> +			reg->host_user_addr, reg->guest_phys_addr, reg->size);
>>>> +
>>>> +		if (reg->guest_phys_addr + reg->size > avail_iova)
>>>> +			avail_iova = reg->guest_phys_addr + reg->size;
>>>> +
>>>> +		if (do_map) {
>>>> +			ret = rte_vfio_container_dma_map(vfio_container_fd,
>>>> +				reg->host_user_addr, reg->guest_phys_addr,
>>>> +				reg->size);
>>>> +			if (ret < 0) {
>>>> +				DRV_LOG(ERR, "DMA map failed.");
>>>> +				goto exit;
>>>> +			}
>>>> +		} else {
>>>> +			ret = rte_vfio_container_dma_unmap(vfio_container_fd,
>>>> +				reg->host_user_addr, reg->guest_phys_addr,
>>>> +				reg->size);
>>>> +			if (ret < 0) {
>>>> +				DRV_LOG(ERR, "DMA unmap failed.");
>>>> +				goto exit;
>>>> +			}
>>>> +		}
>>>> +	}
>>>> +
>>>> +	if (dev->cvq)
>>>> +		ret = virtio_vdpa_dma_map_ctrl_queue(dev, do_map, avail_iova);
>>>> +
>>>> +exit:
>>>> +	free(mem);
>>>> +
>>>> +	return ret;
>>>> +}
>>>> +
>>>>   static int
>>>>   virtio_vdpa_vfio_setup(struct virtio_vdpa_device *dev)
>>>>   {
>>>> @@ -216,10 +337,144 @@ virtio_vdpa_get_protocol_features(int did __rte_unused, uint64_t *features)
>>>>   	return 0;
>>>>   }
>>>>   
>>>> +static uint64_t
>>>> +hva_to_gpa(int vid, uint64_t hva)
>>>> +{
>>>> +	struct rte_vhost_memory *mem = NULL;
>>>> +	struct rte_vhost_mem_region *reg;
>>>> +	uint32_t i;
>>>> +	uint64_t gpa = 0;
>>>> +
>>>> +	if (rte_vhost_get_mem_table(vid, &mem) < 0)
>>>> +		goto exit;
>>>> +
>>>> +	for (i = 0; i < mem->nregions; i++) {
>>>> +		reg = &mem->regions[i];
>>>> +
>>>> +		if (hva >= reg->host_user_addr &&
>>>> +				hva < reg->host_user_addr + reg->size) {
>>>> +			gpa = hva - reg->host_user_addr + reg->guest_phys_addr;
>>>> +			break;
>>>> +		}
>>>> +	}
>>>> +
>>>> +exit:
>>>> +	if (mem)
>>>> +		free(mem);
>>>> +	return gpa;
>>>> +}
>>>> +
>>>> +static int
>>>> +virtio_vdpa_start(struct virtio_vdpa_device *dev)
>>>> +{
>>>> +	struct virtio_hw *hw = &dev->hw;
>>>> +	int i, vid, nr_vring, ret;
>>>> +	struct rte_vhost_vring vr;
>>>> +	struct virtio_pmd_ctrl ctrl;
>>>> +	int dlen[1];
>>>> +
>>>> +	vid = dev->vid;
>>>> +	nr_vring = rte_vhost_get_vring_num(vid);
>>>> +
>>>> +	if (dev->vqs)
>>>> +		rte_free(dev->vqs);
>>>> +
>>>> +	dev->vqs = rte_zmalloc("virtio_vdpa", sizeof(*dev->vqs) * nr_vring, 0);
>>>> +
>>>> +	for (i = 0; i < nr_vring; i++) {
>>>> +		struct virtqueue *vq = &dev->vqs[i];
>>>> +
>>>> +		rte_vhost_get_vhost_vring(vid, i, &vr);
>>>> +
>>>> +		vq->vq_queue_index = i;
>>>> +		vq->vq_nentries = vr.size;
>>>> +		vq->vq_ring_mem = hva_to_gpa(vid, (uint64_t)(uintptr_t)vr.desc);
>>>> +		if (vq->vq_ring_mem  == 0) {
>>>> +			DRV_LOG(ERR, "Fail to get GPA for descriptor ring.");
>>>> +			ret = -1;
>>>> +			goto out_free_vqs;
>>>> +		}
>>>> +
>>>> +		ret = VTPCI_OPS(hw)->setup_queue(hw, vq);
>>>> +		if (ret) {
>>>> +			DRV_LOG(ERR, "Fail to setup queue.");
>>>> +			goto out_free_vqs;
>>>> +		}
>>>> +	}
>>>> +
>>>> +	if (dev->cvq) {
>>>> +		ret = VTPCI_OPS(hw)->setup_queue(hw, dev->cvq);
>>>> +		if (ret) {
>>>> +			DRV_LOG(ERR, "Fail to setup ctrl queue.");
>>>> +			goto out_free_vqs;
>>>> +		}
>>>> +	}
>>>> +
>>>> +	vtpci_set_status(hw, VIRTIO_CONFIG_STATUS_DRIVER_OK);
>>>> +
>>>> +	if (!dev->cvq)
>>>> +		return 0;
>>>> +
>>>> +	ctrl.hdr.class = VIRTIO_NET_CTRL_MQ;
>>>> +	ctrl.hdr.cmd = VIRTIO_NET_CTRL_MQ_VQ_PAIRS_SET;
>>>> +	memcpy(ctrl.data, &dev->max_queue_pairs, sizeof(uint16_t));
>>>> +
>>>> +	dlen[0] = sizeof(uint16_t);
>>>> +
>>>> +	ret = virtio_send_command(hw->cvq, &ctrl, dlen, 1);
>>>> +	if (ret) {
>>>> +		DRV_LOG(ERR, "Multiqueue configured but send command "
>>>> +			  "failed, this is too late now...");
>>>> +		ret = -EINVAL;
>>>> +		goto out_free_vqs;
>>>> +	}
>>>> +
>>>> +	return 0;
>>>> +out_free_vqs:
>>>> +	rte_free(dev->vqs);
>>>> +
>>>> +	return ret;
>>>> +}
>>>> +
>>>> +static int
>>>> +virtio_vdpa_dev_config(int vid)
>>>> +{
>>>> +	int did, ret;
>>>> +	struct internal_list *list;
>>>> +	struct virtio_vdpa_device *dev;
>>>> +
>>>> +	did = rte_vhost_get_vdpa_device_id(vid);
>>>> +	list = find_internal_resource_by_did(did);
>>>> +	if (list == NULL) {
>>>> +		DRV_LOG(ERR, "Invalid device id: %d", did);
>>>> +		return -1;
>>>> +	}
>>>> +
>>>> +	dev = list->dev;
>>>> +	dev->vid = vid;
>>>> +
>>>> +	rte_spinlock_lock(&dev->lock);
>>>> +
>>>> +	ret = virtio_vdpa_dma_map(dev, 1);
>>>> +	if (ret)
>>>> +		goto out_unlock;
>>>> +
>>>> +	ret = virtio_vdpa_start(dev);
>>>> +
>>>> +	if (rte_vhost_host_notifier_ctrl(vid, true) != 0)
>>>> +		DRV_LOG(NOTICE, "vDPA (%d): software relay is used.", did);
>>>> +
>>>> +out_unlock:
>>>> +	rte_spinlock_unlock(&dev->lock);
>>>> +
>>>> +	return ret;
>>>> +}
>>>> +
>>>>   static struct rte_vdpa_dev_ops virtio_vdpa_ops = {
>>>>   	.get_queue_num = virtio_vdpa_get_queue_num,
>>>>   	.get_features = virtio_vdpa_get_features,
>>>>   	.get_protocol_features = virtio_vdpa_get_protocol_features,
>>>> +	.dev_conf = virtio_vdpa_dev_config,
>>>>   };
>>>>   
>>>>   static inline int
>>>> -- 
>>>> 2.21.0
>>>>

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [dpdk-dev] [PATCH 10/15] net/virtio: add vDPA op to configure and start the device
  2019-09-04  4:06         ` Jason Wang
@ 2019-09-04  6:56           ` Maxime Coquelin
  2019-09-05  2:57             ` Tiwei Bie
  0 siblings, 1 reply; 50+ messages in thread
From: Maxime Coquelin @ 2019-09-04  6:56 UTC (permalink / raw)
  To: Jason Wang, Tiwei Bie
  Cc: zhihong.wang, amorenoz, xiao.w.wang, dev, jfreimann, stable



On 9/4/19 6:06 AM, Jason Wang wrote:
> 
> On 2019/9/3 下午4:49, Tiwei Bie wrote:
>> On Tue, Sep 03, 2019 at 09:40:25AM +0200, Maxime Coquelin wrote:
>>> On 9/3/19 7:30 AM, Tiwei Bie wrote:
>>>> On Thu, Aug 29, 2019 at 09:59:55AM +0200, Maxime Coquelin wrote:
>>>>> In order to support multi-queue, we need to implement the control
>>>>> path. The problem is that both the Vhost-user master and slave use
>>>>> VAs in their processes address spaces as IOVAs, which creates
>>>>> collusions between the data rings IOVAs managed the master, and
>>>>> the Control ring IOVAs. The trick here is to remmap the Control
>>>>> ring memory to another range, after the slave is aware of master's
>>>>> ranges.
>>>>>
>>>>> Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
>>>>> ---
>>>>>   drivers/net/virtio/virtio_vdpa.c | 255
>>>>> +++++++++++++++++++++++++++++++
>>>>>   1 file changed, 255 insertions(+)
>>>>>
>>>>> diff --git a/drivers/net/virtio/virtio_vdpa.c
>>>>> b/drivers/net/virtio/virtio_vdpa.c
>>>>> index fc52a8e92..13b4dd07d 100644
>>>>> --- a/drivers/net/virtio/virtio_vdpa.c
>>>>> +++ b/drivers/net/virtio/virtio_vdpa.c
>>>>> @@ -106,6 +106,127 @@ find_internal_resource_by_dev(struct
>>>>> rte_pci_device *pdev)
>>>>>       return list;
>>>>>   }
>>>>>   +static int
>>>>> +virtio_vdpa_dma_map_ctrl_queue(struct virtio_vdpa_device *dev, int
>>>>> do_map,
>>>>> +        uint64_t iova)
>>>>> +{
>>>>> +    const struct rte_memzone *mz;
>>>>> +    int ret;
>>>>> +
>>>>> +    /*
>>>>> +     * IOVAs are processes VAs. We cannot use them as the Data and
>>>>> Control
>>>>> +     * paths are run in different processes, which may (does) lead to
>>>>> +     * collusions. The trick here is to fixup Ctrl path IOVAs so
>>>>> that they
>>>>> +     * start after the Data path ranges.
>>>>> +     */
>>>>> +    if (do_map) {
>>>>> +        mz = dev->cvq->cq.mz;
>>>>> +        ret = rte_vfio_container_dma_map(dev->vfio_container_fd,
>>>>> +                (uint64_t)(uintptr_t)mz->addr,
>>>>> +                iova, mz->len);
>>>>> +        if (ret < 0) {
>>>>> +            DRV_LOG(ERR, "Failed to map ctrl ring (%d)", ret);
>>>>> +            return ret;
>>>>> +        }
>>>>> +
>>>>> +        dev->cvq->vq_ring_mem = iova;
>>>>> +        iova += mz->len;
>>>>> +
>>>>> +        mz = dev->cvq->cq.virtio_net_hdr_mz;
>>>>> +        ret = rte_vfio_container_dma_map(dev->vfio_container_fd,
>>>>> +                (uint64_t)(uintptr_t)mz->addr,
>>>>> +                iova, mz->len);
>>>>> +        if (ret < 0) {
>>>>> +            DRV_LOG(ERR, "Failed to map ctrl headers (%d)", ret);
>>>>> +            return ret;
>>>>> +        }
>>>> This will allow guest to access the cq.mz and cq.virtio_net_hdr_mz
>>>> via the device which may have potential risks.
>>> I get what you mean, but I'm not sure to see how we could avoid that.
>>> AFAIU, we need to map the control queue in the device IOMMU, otherwise
>>> how could the host (in case of virtual device) or the NIC (in case of
>>> Virtio offload), could access the ring?
>>> Any thoughts?
>> I also don't see a way to avoid that. That's why I said in below
>> thread that I think the control queue based interface seems not a
>> quite good interface for a backend device:
>>
>> https://lkml.org/lkml/2019/9/2/934
>>
>> In IFCVF NIC, we added a MMIO based interface to replace control
>> queue for the multiqueue setup in vDPA mode.
>>
>> Jason is proposing some changes to make virtio device suitable
>> for backend device. I'm not sure whether it's possible to cover
>> this case as well..
> 
> 
> A silly question, can we do dynamic mapping like what kernel driver did
> here?

Not silly at all, it is of course possible.
I will implement that in my v2.

Thanks!
Maxime

> Thanks
> 
> 
>>
>> Regards,
>> Tiwei
>>
>>> Thanks,
>>> Maxime
>>>> Regards,
>>>> Tiwei
>>>>
>>>>> +
>>>>> +        dev->cvq->cq.virtio_net_hdr_mem = iova;
>>>>> +    } else {
>>>>> +        mz = dev->cvq->cq.mz;
>>>>> +        ret = rte_vfio_container_dma_unmap(dev->vfio_container_fd,
>>>>> +                (uint64_t)(uintptr_t)mz->addr,
>>>>> +                iova, mz->len);
>>>>> +        if (ret < 0) {
>>>>> +            DRV_LOG(ERR, "Failed to unmap ctrl ring (%d)", ret);
>>>>> +            return ret;
>>>>> +        }
>>>>> +
>>>>> +        dev->cvq->vq_ring_mem = 0;
>>>>> +        iova += mz->len;
>>>>> +
>>>>> +        mz = dev->cvq->cq.virtio_net_hdr_mz;
>>>>> +        ret = rte_vfio_container_dma_unmap(dev->vfio_container_fd,
>>>>> +                (uint64_t)(uintptr_t)mz->addr,
>>>>> +                iova, mz->len);
>>>>> +        if (ret < 0) {
>>>>> +            DRV_LOG(ERR, "Failed to unmap ctrl headers (%d)", ret);
>>>>> +            return ret;
>>>>> +        }
>>>>> +
>>>>> +        dev->cvq->cq.virtio_net_hdr_mem = 0;
>>>>> +    }
>>>>> +
>>>>> +    return 0;
>>>>> +}
>>>>> +
>>>>> +static int
>>>>> +virtio_vdpa_dma_map(struct virtio_vdpa_device *dev, int do_map)
>>>>> +{
>>>>> +    uint32_t i;
>>>>> +    int ret;
>>>>> +    struct rte_vhost_memory *mem = NULL;
>>>>> +    int vfio_container_fd;
>>>>> +    uint64_t avail_iova = 0;
>>>>> +
>>>>> +    ret = rte_vhost_get_mem_table(dev->vid, &mem);
>>>>> +    if (ret < 0 || !mem) {
>>>>> +        DRV_LOG(ERR, "failed to get VM memory layout.");
>>>>> +        return ret;
>>>>> +    }
>>>>> +
>>>>> +    vfio_container_fd = dev->vfio_container_fd;
>>>>> +
>>>>> +    for (i = 0; i < mem->nregions; i++) {
>>>>> +        struct rte_vhost_mem_region *reg;
>>>>> +
>>>>> +        reg = &mem->regions[i];
>>>>> +        DRV_LOG(INFO, "%s, region %u: HVA 0x%" PRIx64 ", "
>>>>> +            "GPA 0x%" PRIx64 ", size 0x%" PRIx64 ".",
>>>>> +            do_map ? "DMA map" : "DMA unmap", i,
>>>>> +            reg->host_user_addr, reg->guest_phys_addr, reg->size);
>>>>> +
>>>>> +        if (reg->guest_phys_addr + reg->size > avail_iova)
>>>>> +            avail_iova = reg->guest_phys_addr + reg->size;
>>>>> +
>>>>> +        if (do_map) {
>>>>> +            ret = rte_vfio_container_dma_map(vfio_container_fd,
>>>>> +                reg->host_user_addr, reg->guest_phys_addr,
>>>>> +                reg->size);
>>>>> +            if (ret < 0) {
>>>>> +                DRV_LOG(ERR, "DMA map failed.");
>>>>> +                goto exit;
>>>>> +            }
>>>>> +        } else {
>>>>> +            ret = rte_vfio_container_dma_unmap(vfio_container_fd,
>>>>> +                reg->host_user_addr, reg->guest_phys_addr,
>>>>> +                reg->size);
>>>>> +            if (ret < 0) {
>>>>> +                DRV_LOG(ERR, "DMA unmap failed.");
>>>>> +                goto exit;
>>>>> +            }
>>>>> +        }
>>>>> +    }
>>>>> +
>>>>> +    if (dev->cvq)
>>>>> +        ret = virtio_vdpa_dma_map_ctrl_queue(dev, do_map,
>>>>> avail_iova);
>>>>> +
>>>>> +exit:
>>>>> +    free(mem);
>>>>> +
>>>>> +    return ret;
>>>>> +}
>>>>> +
>>>>>   static int
>>>>>   virtio_vdpa_vfio_setup(struct virtio_vdpa_device *dev)
>>>>>   {
>>>>> @@ -216,10 +337,144 @@ virtio_vdpa_get_protocol_features(int did
>>>>> __rte_unused, uint64_t *features)
>>>>>       return 0;
>>>>>   }
>>>>>   +static uint64_t
>>>>> +hva_to_gpa(int vid, uint64_t hva)
>>>>> +{
>>>>> +    struct rte_vhost_memory *mem = NULL;
>>>>> +    struct rte_vhost_mem_region *reg;
>>>>> +    uint32_t i;
>>>>> +    uint64_t gpa = 0;
>>>>> +
>>>>> +    if (rte_vhost_get_mem_table(vid, &mem) < 0)
>>>>> +        goto exit;
>>>>> +
>>>>> +    for (i = 0; i < mem->nregions; i++) {
>>>>> +        reg = &mem->regions[i];
>>>>> +
>>>>> +        if (hva >= reg->host_user_addr &&
>>>>> +                hva < reg->host_user_addr + reg->size) {
>>>>> +            gpa = hva - reg->host_user_addr + reg->guest_phys_addr;
>>>>> +            break;
>>>>> +        }
>>>>> +    }
>>>>> +
>>>>> +exit:
>>>>> +    if (mem)
>>>>> +        free(mem);
>>>>> +    return gpa;
>>>>> +}
>>>>> +
>>>>> +static int
>>>>> +virtio_vdpa_start(struct virtio_vdpa_device *dev)
>>>>> +{
>>>>> +    struct virtio_hw *hw = &dev->hw;
>>>>> +    int i, vid, nr_vring, ret;
>>>>> +    struct rte_vhost_vring vr;
>>>>> +    struct virtio_pmd_ctrl ctrl;
>>>>> +    int dlen[1];
>>>>> +
>>>>> +    vid = dev->vid;
>>>>> +    nr_vring = rte_vhost_get_vring_num(vid);
>>>>> +
>>>>> +    if (dev->vqs)
>>>>> +        rte_free(dev->vqs);
>>>>> +
>>>>> +    dev->vqs = rte_zmalloc("virtio_vdpa", sizeof(*dev->vqs) *
>>>>> nr_vring, 0);
>>>>> +
>>>>> +    for (i = 0; i < nr_vring; i++) {
>>>>> +        struct virtqueue *vq = &dev->vqs[i];
>>>>> +
>>>>> +        rte_vhost_get_vhost_vring(vid, i, &vr);
>>>>> +
>>>>> +        vq->vq_queue_index = i;
>>>>> +        vq->vq_nentries = vr.size;
>>>>> +        vq->vq_ring_mem = hva_to_gpa(vid,
>>>>> (uint64_t)(uintptr_t)vr.desc);
>>>>> +        if (vq->vq_ring_mem  == 0) {
>>>>> +            DRV_LOG(ERR, "Fail to get GPA for descriptor ring.");
>>>>> +            ret = -1;
>>>>> +            goto out_free_vqs;
>>>>> +        }
>>>>> +
>>>>> +        ret = VTPCI_OPS(hw)->setup_queue(hw, vq);
>>>>> +        if (ret) {
>>>>> +            DRV_LOG(ERR, "Fail to setup queue.");
>>>>> +            goto out_free_vqs;
>>>>> +        }
>>>>> +    }
>>>>> +
>>>>> +    if (dev->cvq) {
>>>>> +        ret = VTPCI_OPS(hw)->setup_queue(hw, dev->cvq);
>>>>> +        if (ret) {
>>>>> +            DRV_LOG(ERR, "Fail to setup ctrl queue.");
>>>>> +            goto out_free_vqs;
>>>>> +        }
>>>>> +    }
>>>>> +
>>>>> +    vtpci_set_status(hw, VIRTIO_CONFIG_STATUS_DRIVER_OK);
>>>>> +
>>>>> +    if (!dev->cvq)
>>>>> +        return 0;
>>>>> +
>>>>> +    ctrl.hdr.class = VIRTIO_NET_CTRL_MQ;
>>>>> +    ctrl.hdr.cmd = VIRTIO_NET_CTRL_MQ_VQ_PAIRS_SET;
>>>>> +    memcpy(ctrl.data, &dev->max_queue_pairs, sizeof(uint16_t));
>>>>> +
>>>>> +    dlen[0] = sizeof(uint16_t);
>>>>> +
>>>>> +    ret = virtio_send_command(hw->cvq, &ctrl, dlen, 1);
>>>>> +    if (ret) {
>>>>> +        DRV_LOG(ERR, "Multiqueue configured but send command "
>>>>> +              "failed, this is too late now...");
>>>>> +        ret = -EINVAL;
>>>>> +        goto out_free_vqs;
>>>>> +    }
>>>>> +
>>>>> +    return 0;
>>>>> +out_free_vqs:
>>>>> +    rte_free(dev->vqs);
>>>>> +
>>>>> +    return ret;
>>>>> +}
>>>>> +
>>>>> +static int
>>>>> +virtio_vdpa_dev_config(int vid)
>>>>> +{
>>>>> +    int did, ret;
>>>>> +    struct internal_list *list;
>>>>> +    struct virtio_vdpa_device *dev;
>>>>> +
>>>>> +    did = rte_vhost_get_vdpa_device_id(vid);
>>>>> +    list = find_internal_resource_by_did(did);
>>>>> +    if (list == NULL) {
>>>>> +        DRV_LOG(ERR, "Invalid device id: %d", did);
>>>>> +        return -1;
>>>>> +    }
>>>>> +
>>>>> +    dev = list->dev;
>>>>> +    dev->vid = vid;
>>>>> +
>>>>> +    rte_spinlock_lock(&dev->lock);
>>>>> +
>>>>> +    ret = virtio_vdpa_dma_map(dev, 1);
>>>>> +    if (ret)
>>>>> +        goto out_unlock;
>>>>> +
>>>>> +    ret = virtio_vdpa_start(dev);
>>>>> +
>>>>> +    if (rte_vhost_host_notifier_ctrl(vid, true) != 0)
>>>>> +        DRV_LOG(NOTICE, "vDPA (%d): software relay is used.", did);
>>>>> +
>>>>> +out_unlock:
>>>>> +    rte_spinlock_unlock(&dev->lock);
>>>>> +
>>>>> +    return ret;
>>>>> +}
>>>>> +
>>>>>   static struct rte_vdpa_dev_ops virtio_vdpa_ops = {
>>>>>       .get_queue_num = virtio_vdpa_get_queue_num,
>>>>>       .get_features = virtio_vdpa_get_features,
>>>>>       .get_protocol_features = virtio_vdpa_get_protocol_features,
>>>>> +    .dev_conf = virtio_vdpa_dev_config,
>>>>>   };
>>>>>     static inline int
>>>>> -- 
>>>>> 2.21.0
>>>>>

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [dpdk-dev] [PATCH 10/15] net/virtio: add vDPA op to configure and start the device
  2019-09-04  6:56           ` Maxime Coquelin
@ 2019-09-05  2:57             ` Tiwei Bie
  0 siblings, 0 replies; 50+ messages in thread
From: Tiwei Bie @ 2019-09-05  2:57 UTC (permalink / raw)
  To: Maxime Coquelin
  Cc: Jason Wang, zhihong.wang, amorenoz, xiao.w.wang, dev, jfreimann, stable

On Wed, Sep 04, 2019 at 08:56:32AM +0200, Maxime Coquelin wrote:
> On 9/4/19 6:06 AM, Jason Wang wrote:
> > On 2019/9/3 下午4:49, Tiwei Bie wrote:
> >> On Tue, Sep 03, 2019 at 09:40:25AM +0200, Maxime Coquelin wrote:
> >>> On 9/3/19 7:30 AM, Tiwei Bie wrote:
> >>>> On Thu, Aug 29, 2019 at 09:59:55AM +0200, Maxime Coquelin wrote:
> >>>>> In order to support multi-queue, we need to implement the control
> >>>>> path. The problem is that both the Vhost-user master and slave use
> >>>>> VAs in their processes address spaces as IOVAs, which creates
> >>>>> collusions between the data rings IOVAs managed the master, and
> >>>>> the Control ring IOVAs. The trick here is to remmap the Control
> >>>>> ring memory to another range, after the slave is aware of master's
> >>>>> ranges.
> >>>>>
> >>>>> Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
> >>>>> ---
> >>>>>   drivers/net/virtio/virtio_vdpa.c | 255
> >>>>> +++++++++++++++++++++++++++++++
> >>>>>   1 file changed, 255 insertions(+)
> >>>>>
> >>>>> diff --git a/drivers/net/virtio/virtio_vdpa.c
> >>>>> b/drivers/net/virtio/virtio_vdpa.c
> >>>>> index fc52a8e92..13b4dd07d 100644
> >>>>> --- a/drivers/net/virtio/virtio_vdpa.c
> >>>>> +++ b/drivers/net/virtio/virtio_vdpa.c
> >>>>> @@ -106,6 +106,127 @@ find_internal_resource_by_dev(struct
> >>>>> rte_pci_device *pdev)
> >>>>>       return list;
> >>>>>   }
> >>>>>   +static int
> >>>>> +virtio_vdpa_dma_map_ctrl_queue(struct virtio_vdpa_device *dev, int
> >>>>> do_map,
> >>>>> +        uint64_t iova)
> >>>>> +{
> >>>>> +    const struct rte_memzone *mz;
> >>>>> +    int ret;
> >>>>> +
> >>>>> +    /*
> >>>>> +     * IOVAs are processes VAs. We cannot use them as the Data and
> >>>>> Control
> >>>>> +     * paths are run in different processes, which may (does) lead to
> >>>>> +     * collusions. The trick here is to fixup Ctrl path IOVAs so
> >>>>> that they
> >>>>> +     * start after the Data path ranges.
> >>>>> +     */
> >>>>> +    if (do_map) {
> >>>>> +        mz = dev->cvq->cq.mz;
> >>>>> +        ret = rte_vfio_container_dma_map(dev->vfio_container_fd,
> >>>>> +                (uint64_t)(uintptr_t)mz->addr,
> >>>>> +                iova, mz->len);
> >>>>> +        if (ret < 0) {
> >>>>> +            DRV_LOG(ERR, "Failed to map ctrl ring (%d)", ret);
> >>>>> +            return ret;
> >>>>> +        }
> >>>>> +
> >>>>> +        dev->cvq->vq_ring_mem = iova;
> >>>>> +        iova += mz->len;
> >>>>> +
> >>>>> +        mz = dev->cvq->cq.virtio_net_hdr_mz;
> >>>>> +        ret = rte_vfio_container_dma_map(dev->vfio_container_fd,
> >>>>> +                (uint64_t)(uintptr_t)mz->addr,
> >>>>> +                iova, mz->len);
> >>>>> +        if (ret < 0) {
> >>>>> +            DRV_LOG(ERR, "Failed to map ctrl headers (%d)", ret);
> >>>>> +            return ret;
> >>>>> +        }
> >>>> This will allow guest to access the cq.mz and cq.virtio_net_hdr_mz
> >>>> via the device which may have potential risks.
> >>> I get what you mean, but I'm not sure to see how we could avoid that.
> >>> AFAIU, we need to map the control queue in the device IOMMU, otherwise
> >>> how could the host (in case of virtual device) or the NIC (in case of
> >>> Virtio offload), could access the ring?
> >>> Any thoughts?
> >> I also don't see a way to avoid that. That's why I said in below
> >> thread that I think the control queue based interface seems not a
> >> quite good interface for a backend device:
> >>
> >> https://lkml.org/lkml/2019/9/2/934
> >>
> >> In IFCVF NIC, we added a MMIO based interface to replace control
> >> queue for the multiqueue setup in vDPA mode.
> >>
> >> Jason is proposing some changes to make virtio device suitable
> >> for backend device. I'm not sure whether it's possible to cover
> >> this case as well..
> > 
> > 
> > A silly question, can we do dynamic mapping like what kernel driver did
> > here?
> 
> Not silly at all, it is of course possible.

+1. It's a good idea to mitigate the risks (if possible, we should
make the Rx/Tx held while cvq is being used, or try to make cvq's
iova unpredictable each time from guest side).

> I will implement that in my v2.

Thanks!
Tiwei

> 
> Thanks!
> Maxime
> 
> > Thanks
> > 
> > 
> >>
> >> Regards,
> >> Tiwei
> >>
> >>> Thanks,
> >>> Maxime
> >>>> Regards,
> >>>> Tiwei
> >>>>
> >>>>> +
> >>>>> +        dev->cvq->cq.virtio_net_hdr_mem = iova;
> >>>>> +    } else {
> >>>>> +        mz = dev->cvq->cq.mz;
> >>>>> +        ret = rte_vfio_container_dma_unmap(dev->vfio_container_fd,
> >>>>> +                (uint64_t)(uintptr_t)mz->addr,
> >>>>> +                iova, mz->len);
> >>>>> +        if (ret < 0) {
> >>>>> +            DRV_LOG(ERR, "Failed to unmap ctrl ring (%d)", ret);
> >>>>> +            return ret;
> >>>>> +        }
> >>>>> +
> >>>>> +        dev->cvq->vq_ring_mem = 0;
> >>>>> +        iova += mz->len;
> >>>>> +
> >>>>> +        mz = dev->cvq->cq.virtio_net_hdr_mz;
> >>>>> +        ret = rte_vfio_container_dma_unmap(dev->vfio_container_fd,
> >>>>> +                (uint64_t)(uintptr_t)mz->addr,
> >>>>> +                iova, mz->len);
> >>>>> +        if (ret < 0) {
> >>>>> +            DRV_LOG(ERR, "Failed to unmap ctrl headers (%d)", ret);
> >>>>> +            return ret;
> >>>>> +        }
> >>>>> +
> >>>>> +        dev->cvq->cq.virtio_net_hdr_mem = 0;
> >>>>> +    }
> >>>>> +
> >>>>> +    return 0;
> >>>>> +}
> >>>>> +
> >>>>> +static int
> >>>>> +virtio_vdpa_dma_map(struct virtio_vdpa_device *dev, int do_map)
> >>>>> +{
> >>>>> +    uint32_t i;
> >>>>> +    int ret;
> >>>>> +    struct rte_vhost_memory *mem = NULL;
> >>>>> +    int vfio_container_fd;
> >>>>> +    uint64_t avail_iova = 0;
> >>>>> +
> >>>>> +    ret = rte_vhost_get_mem_table(dev->vid, &mem);
> >>>>> +    if (ret < 0 || !mem) {
> >>>>> +        DRV_LOG(ERR, "failed to get VM memory layout.");
> >>>>> +        return ret;
> >>>>> +    }
> >>>>> +
> >>>>> +    vfio_container_fd = dev->vfio_container_fd;
> >>>>> +
> >>>>> +    for (i = 0; i < mem->nregions; i++) {
> >>>>> +        struct rte_vhost_mem_region *reg;
> >>>>> +
> >>>>> +        reg = &mem->regions[i];
> >>>>> +        DRV_LOG(INFO, "%s, region %u: HVA 0x%" PRIx64 ", "
> >>>>> +            "GPA 0x%" PRIx64 ", size 0x%" PRIx64 ".",
> >>>>> +            do_map ? "DMA map" : "DMA unmap", i,
> >>>>> +            reg->host_user_addr, reg->guest_phys_addr, reg->size);
> >>>>> +
> >>>>> +        if (reg->guest_phys_addr + reg->size > avail_iova)
> >>>>> +            avail_iova = reg->guest_phys_addr + reg->size;
> >>>>> +
> >>>>> +        if (do_map) {
> >>>>> +            ret = rte_vfio_container_dma_map(vfio_container_fd,
> >>>>> +                reg->host_user_addr, reg->guest_phys_addr,
> >>>>> +                reg->size);
> >>>>> +            if (ret < 0) {
> >>>>> +                DRV_LOG(ERR, "DMA map failed.");
> >>>>> +                goto exit;
> >>>>> +            }
> >>>>> +        } else {
> >>>>> +            ret = rte_vfio_container_dma_unmap(vfio_container_fd,
> >>>>> +                reg->host_user_addr, reg->guest_phys_addr,
> >>>>> +                reg->size);
> >>>>> +            if (ret < 0) {
> >>>>> +                DRV_LOG(ERR, "DMA unmap failed.");
> >>>>> +                goto exit;
> >>>>> +            }
> >>>>> +        }
> >>>>> +    }
> >>>>> +
> >>>>> +    if (dev->cvq)
> >>>>> +        ret = virtio_vdpa_dma_map_ctrl_queue(dev, do_map,
> >>>>> avail_iova);
> >>>>> +
> >>>>> +exit:
> >>>>> +    free(mem);
> >>>>> +
> >>>>> +    return ret;
> >>>>> +}
> >>>>> +
> >>>>>   static int
> >>>>>   virtio_vdpa_vfio_setup(struct virtio_vdpa_device *dev)
> >>>>>   {
> >>>>> @@ -216,10 +337,144 @@ virtio_vdpa_get_protocol_features(int did
> >>>>> __rte_unused, uint64_t *features)
> >>>>>       return 0;
> >>>>>   }
> >>>>>   +static uint64_t
> >>>>> +hva_to_gpa(int vid, uint64_t hva)
> >>>>> +{
> >>>>> +    struct rte_vhost_memory *mem = NULL;
> >>>>> +    struct rte_vhost_mem_region *reg;
> >>>>> +    uint32_t i;
> >>>>> +    uint64_t gpa = 0;
> >>>>> +
> >>>>> +    if (rte_vhost_get_mem_table(vid, &mem) < 0)
> >>>>> +        goto exit;
> >>>>> +
> >>>>> +    for (i = 0; i < mem->nregions; i++) {
> >>>>> +        reg = &mem->regions[i];
> >>>>> +
> >>>>> +        if (hva >= reg->host_user_addr &&
> >>>>> +                hva < reg->host_user_addr + reg->size) {
> >>>>> +            gpa = hva - reg->host_user_addr + reg->guest_phys_addr;
> >>>>> +            break;
> >>>>> +        }
> >>>>> +    }
> >>>>> +
> >>>>> +exit:
> >>>>> +    if (mem)
> >>>>> +        free(mem);
> >>>>> +    return gpa;
> >>>>> +}
> >>>>> +
> >>>>> +static int
> >>>>> +virtio_vdpa_start(struct virtio_vdpa_device *dev)
> >>>>> +{
> >>>>> +    struct virtio_hw *hw = &dev->hw;
> >>>>> +    int i, vid, nr_vring, ret;
> >>>>> +    struct rte_vhost_vring vr;
> >>>>> +    struct virtio_pmd_ctrl ctrl;
> >>>>> +    int dlen[1];
> >>>>> +
> >>>>> +    vid = dev->vid;
> >>>>> +    nr_vring = rte_vhost_get_vring_num(vid);
> >>>>> +
> >>>>> +    if (dev->vqs)
> >>>>> +        rte_free(dev->vqs);
> >>>>> +
> >>>>> +    dev->vqs = rte_zmalloc("virtio_vdpa", sizeof(*dev->vqs) *
> >>>>> nr_vring, 0);
> >>>>> +
> >>>>> +    for (i = 0; i < nr_vring; i++) {
> >>>>> +        struct virtqueue *vq = &dev->vqs[i];
> >>>>> +
> >>>>> +        rte_vhost_get_vhost_vring(vid, i, &vr);
> >>>>> +
> >>>>> +        vq->vq_queue_index = i;
> >>>>> +        vq->vq_nentries = vr.size;
> >>>>> +        vq->vq_ring_mem = hva_to_gpa(vid,
> >>>>> (uint64_t)(uintptr_t)vr.desc);
> >>>>> +        if (vq->vq_ring_mem  == 0) {
> >>>>> +            DRV_LOG(ERR, "Fail to get GPA for descriptor ring.");
> >>>>> +            ret = -1;
> >>>>> +            goto out_free_vqs;
> >>>>> +        }
> >>>>> +
> >>>>> +        ret = VTPCI_OPS(hw)->setup_queue(hw, vq);
> >>>>> +        if (ret) {
> >>>>> +            DRV_LOG(ERR, "Fail to setup queue.");
> >>>>> +            goto out_free_vqs;
> >>>>> +        }
> >>>>> +    }
> >>>>> +
> >>>>> +    if (dev->cvq) {
> >>>>> +        ret = VTPCI_OPS(hw)->setup_queue(hw, dev->cvq);
> >>>>> +        if (ret) {
> >>>>> +            DRV_LOG(ERR, "Fail to setup ctrl queue.");
> >>>>> +            goto out_free_vqs;
> >>>>> +        }
> >>>>> +    }
> >>>>> +
> >>>>> +    vtpci_set_status(hw, VIRTIO_CONFIG_STATUS_DRIVER_OK);
> >>>>> +
> >>>>> +    if (!dev->cvq)
> >>>>> +        return 0;
> >>>>> +
> >>>>> +    ctrl.hdr.class = VIRTIO_NET_CTRL_MQ;
> >>>>> +    ctrl.hdr.cmd = VIRTIO_NET_CTRL_MQ_VQ_PAIRS_SET;
> >>>>> +    memcpy(ctrl.data, &dev->max_queue_pairs, sizeof(uint16_t));
> >>>>> +
> >>>>> +    dlen[0] = sizeof(uint16_t);
> >>>>> +
> >>>>> +    ret = virtio_send_command(hw->cvq, &ctrl, dlen, 1);
> >>>>> +    if (ret) {
> >>>>> +        DRV_LOG(ERR, "Multiqueue configured but send command "
> >>>>> +              "failed, this is too late now...");
> >>>>> +        ret = -EINVAL;
> >>>>> +        goto out_free_vqs;
> >>>>> +    }
> >>>>> +
> >>>>> +    return 0;
> >>>>> +out_free_vqs:
> >>>>> +    rte_free(dev->vqs);
> >>>>> +
> >>>>> +    return ret;
> >>>>> +}
> >>>>> +
> >>>>> +static int
> >>>>> +virtio_vdpa_dev_config(int vid)
> >>>>> +{
> >>>>> +    int did, ret;
> >>>>> +    struct internal_list *list;
> >>>>> +    struct virtio_vdpa_device *dev;
> >>>>> +
> >>>>> +    did = rte_vhost_get_vdpa_device_id(vid);
> >>>>> +    list = find_internal_resource_by_did(did);
> >>>>> +    if (list == NULL) {
> >>>>> +        DRV_LOG(ERR, "Invalid device id: %d", did);
> >>>>> +        return -1;
> >>>>> +    }
> >>>>> +
> >>>>> +    dev = list->dev;
> >>>>> +    dev->vid = vid;
> >>>>> +
> >>>>> +    rte_spinlock_lock(&dev->lock);
> >>>>> +
> >>>>> +    ret = virtio_vdpa_dma_map(dev, 1);
> >>>>> +    if (ret)
> >>>>> +        goto out_unlock;
> >>>>> +
> >>>>> +    ret = virtio_vdpa_start(dev);
> >>>>> +
> >>>>> +    if (rte_vhost_host_notifier_ctrl(vid, true) != 0)
> >>>>> +        DRV_LOG(NOTICE, "vDPA (%d): software relay is used.", did);
> >>>>> +
> >>>>> +out_unlock:
> >>>>> +    rte_spinlock_unlock(&dev->lock);
> >>>>> +
> >>>>> +    return ret;
> >>>>> +}
> >>>>> +
> >>>>>   static struct rte_vdpa_dev_ops virtio_vdpa_ops = {
> >>>>>       .get_queue_num = virtio_vdpa_get_queue_num,
> >>>>>       .get_features = virtio_vdpa_get_features,
> >>>>>       .get_protocol_features = virtio_vdpa_get_protocol_features,
> >>>>> +    .dev_conf = virtio_vdpa_dev_config,
> >>>>>   };
> >>>>>     static inline int
> >>>>> -- 
> >>>>> 2.21.0
> >>>>>

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [dpdk-dev] [PATCH 00/15] Introduce Virtio vDPA driver
  2019-08-29  7:59 [dpdk-dev] [PATCH 00/15] Introduce Virtio vDPA driver Maxime Coquelin
                   ` (14 preceding siblings ...)
  2019-08-29  8:00 ` [dpdk-dev] [PATCH 15/15] doc: add documentation for Virtio vDPA driver Maxime Coquelin
@ 2019-09-09 11:55 ` " Shahaf Shuler
  2019-09-10  7:46   ` Maxime Coquelin
  15 siblings, 1 reply; 50+ messages in thread
From: Shahaf Shuler @ 2019-09-09 11:55 UTC (permalink / raw)
  To: Maxime Coquelin, tiwei.bie, zhihong.wang, amorenoz, xiao.w.wang,
	dev, jfreimann
  Cc: stable, Matan Azrad

Hi Maxime, 

Thursday, August 29, 2019 11:00 AM, Maxime Coquelin:
> Subject: [dpdk-dev] [PATCH 00/15] Introduce Virtio vDPA driver
> 
> vDPA allows to offload Virtio Datapath processing by supported NICs, like
> IFCVF for example.
> 
> The control path has to be handled by a dedicated vDPA driver, so that it can
> translate Vhost-user protocol requests to proprietary NICs registers
> accesses.
> 
> This driver is the vDPA driver for Virtio devices, meaning that Vhost-user
> protocol requests get translated to Virtio registers accesses as per defined in
> the Virtio spec.
> 
> Basically, it can be used within a guest with a para-virtualized Virtio-net
> device, or even with a full Virtio HW offload NIC directly on host.

Can you elaborate more on the use cases to use such driver? 

1. If the underlying HW can support full virtio device, why we need to work w/ it w/ vDPA mode? Why not providing it to the VM as passthrough one?
2. why it is preferable to work w/ virtio device as the backend device to be used w/ vDPA v.s. working w/ the underlying HW VF? 

Is nested virtualization is what you have in mind?

> 
> Amongst the main features, all currently supported Virtio spec versions are
> supported (split & packed rings, but only tested with split ring for now) and
> also multiqueue support is added by implementing the cotnrol virtqueue in
> the driver.
> 
> The structure of this driver is heavily based on IFCVF vDPA.
> 
> Maxime Coquelin (15):
>   vhost: remove vhost kernel header inclusion
>   vhost: configure vDPA as soon as the device is ready
>   net/virtio: move control path fonctions in virtqueue file
>   net/virtio: add virtio PCI subsystem device ID declaration
>   net/virtio: save notify bar ID in virtio HW struct
>   net/virtio: add skeleton for virtio vDPA driver
>   net/virtio: add vDPA ops to get number of queue
>   net/virtio: add virtio vDPA op to get features
>   net/virtio: add virtio vDPA op to get protocol features
>   net/virtio: add vDPA op to configure and start the device
>   net/virtio: add vDPA op to stop and close the device
>   net/virtio: add vDPA op to set features
>   net/virtio: add vDPA ops to get VFIO FDs
>   net/virtio: add vDPA op to get notification area
>   doc: add documentation for Virtio vDPA driver
> 
>  config/common_linux                |   1 +
>  doc/guides/nics/index.rst          |   1 +
>  doc/guides/nics/virtio_vdpa.rst    |  45 ++
>  drivers/net/ifc/ifcvf_vdpa.c       |   1 +
>  drivers/net/virtio/Makefile        |   4 +
>  drivers/net/virtio/meson.build     |   3 +-
>  drivers/net/virtio/virtio_ethdev.c | 252 --------
>  drivers/net/virtio/virtio_pci.c    |   6 +-
>  drivers/net/virtio/virtio_pci.h    |   2 +
>  drivers/net/virtio/virtio_vdpa.c   | 918 +++++++++++++++++++++++++++++
>  drivers/net/virtio/virtqueue.c     | 255 ++++++++
>  drivers/net/virtio/virtqueue.h     |   5 +
>  lib/librte_vhost/rte_vdpa.h        |   1 -
>  lib/librte_vhost/rte_vhost.h       |   9 +-
>  lib/librte_vhost/vhost_user.c      |   3 +-
>  15 files changed, 1243 insertions(+), 263 deletions(-)  create mode 100644
> doc/guides/nics/virtio_vdpa.rst  create mode 100644
> drivers/net/virtio/virtio_vdpa.c
> 
> --
> 2.21.0


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [dpdk-dev] [PATCH 00/15] Introduce Virtio vDPA driver
  2019-09-09 11:55 ` [dpdk-dev] [PATCH 00/15] Introduce " Shahaf Shuler
@ 2019-09-10  7:46   ` Maxime Coquelin
  2019-09-10 13:44     ` Shahaf Shuler
  0 siblings, 1 reply; 50+ messages in thread
From: Maxime Coquelin @ 2019-09-10  7:46 UTC (permalink / raw)
  To: Shahaf Shuler, tiwei.bie, zhihong.wang, amorenoz, xiao.w.wang,
	dev, jfreimann
  Cc: stable, Matan Azrad

Hi Shahaf,

On 9/9/19 1:55 PM, Shahaf Shuler wrote:
> Hi Maxime, 
> 
> Thursday, August 29, 2019 11:00 AM, Maxime Coquelin:
>> Subject: [dpdk-dev] [PATCH 00/15] Introduce Virtio vDPA driver
>>
>> vDPA allows to offload Virtio Datapath processing by supported NICs, like
>> IFCVF for example.
>>
>> The control path has to be handled by a dedicated vDPA driver, so that it can
>> translate Vhost-user protocol requests to proprietary NICs registers
>> accesses.
>>
>> This driver is the vDPA driver for Virtio devices, meaning that Vhost-user
>> protocol requests get translated to Virtio registers accesses as per defined in
>> the Virtio spec.
>>
>> Basically, it can be used within a guest with a para-virtualized Virtio-net
>> device, or even with a full Virtio HW offload NIC directly on host.
> 
> Can you elaborate more on the use cases to use such driver? 
> 
> 1. If the underlying HW can support full virtio device, why we need to work w/ it w/ vDPA mode? Why not providing it to the VM as passthrough one?
> 2. why it is preferable to work w/ virtio device as the backend device to be used w/ vDPA v.s. working w/ the underlying HW VF?


IMHO, I see two uses cases where it can make sense to use vDPA with a
full offload HW device:
1. Live-migration support:  It makes it possible to switch to rings
   processing in SW during the migration as Virtio HH does not support
   dirty pages logging.

2. Can be used to provide a single standard interface (the vhost-user
   socket) to containers in the scope of CNFs. Doing so, the container
   does not need to be modified, whatever the HW NIC: Virtio datapath
   offload only, full Virtio offload, or no offload at all. In the
   latter case, it would not be optimal as it implies forwarding between
   the Vhost PMD and the HW NIC PMD but it would work.

> Is nested virtualization is what you have in mind?

For the para-virtualized virtio device, either nested virtualization,
or container running within the guest.

Maxime

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [dpdk-dev] [PATCH 00/15] Introduce Virtio vDPA driver
  2019-09-10  7:46   ` Maxime Coquelin
@ 2019-09-10 13:44     ` Shahaf Shuler
  2019-09-10 13:56       ` Maxime Coquelin
  0 siblings, 1 reply; 50+ messages in thread
From: Shahaf Shuler @ 2019-09-10 13:44 UTC (permalink / raw)
  To: Maxime Coquelin, tiwei.bie, zhihong.wang, amorenoz, xiao.w.wang,
	dev, jfreimann
  Cc: stable, Matan Azrad

Tuesday, September 10, 2019 10:46 AM, Maxime Coquelin:
> Subject: Re: [dpdk-dev] [PATCH 00/15] Introduce Virtio vDPA driver
> 
> Hi Shahaf,
> 
> On 9/9/19 1:55 PM, Shahaf Shuler wrote:
> > Hi Maxime,
> >
> > Thursday, August 29, 2019 11:00 AM, Maxime Coquelin:
> >> Subject: [dpdk-dev] [PATCH 00/15] Introduce Virtio vDPA driver
> >>
> >> vDPA allows to offload Virtio Datapath processing by supported NICs,
> >> like IFCVF for example.
> >>
> >> The control path has to be handled by a dedicated vDPA driver, so
> >> that it can translate Vhost-user protocol requests to proprietary
> >> NICs registers accesses.
> >>
> >> This driver is the vDPA driver for Virtio devices, meaning that
> >> Vhost-user protocol requests get translated to Virtio registers
> >> accesses as per defined in the Virtio spec.
> >>
> >> Basically, it can be used within a guest with a para-virtualized
> >> Virtio-net device, or even with a full Virtio HW offload NIC directly on
> host.
> >
> > Can you elaborate more on the use cases to use such driver?
> >
> > 1. If the underlying HW can support full virtio device, why we need to work
> w/ it w/ vDPA mode? Why not providing it to the VM as passthrough one?
> > 2. why it is preferable to work w/ virtio device as the backend device to be
> used w/ vDPA v.s. working w/ the underlying HW VF?
> 
> 
> IMHO, I see two uses cases where it can make sense to use vDPA with a full
> offload HW device:
> 1. Live-migration support:  It makes it possible to switch to rings
>    processing in SW during the migration as Virtio HH does not support
>    dirty pages logging.

Can you elaborate why specifically using virtio_vdpa PMD enables this SW relay during migration?
e.g. the vdpa PMD of intel that runs on top of VF do that today as well. 

> 
> 2. Can be used to provide a single standard interface (the vhost-user
>    socket) to containers in the scope of CNFs. Doing so, the container
>    does not need to be modified, whatever the HW NIC: Virtio datapath
>    offload only, full Virtio offload, or no offload at all. In the
>    latter case, it would not be optimal as it implies forwarding between
>    the Vhost PMD and the HW NIC PMD but it would work.

It is not clear to me the interface map in such system.
From what I understand the container will have virtio-user i/f and the host will have virtio i/f. then the virtio i/f can be programmed to work w/ vDPA or not. 
For full emulation I guess you will need to expose the netdev of the fully emulated virtio device to the container?

Am trying to map when it is beneficial to use this virtio_vdpa PMD and when it is better to use the vendor specific vDPA PMD on top of VF. 

> 
> > Is nested virtualization is what you have in mind?
> 
> For the para-virtualized virtio device, either nested virtualization, or
> container running within the guest.
> 
> Maxime

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [dpdk-dev] [PATCH 00/15] Introduce Virtio vDPA driver
  2019-09-10 13:44     ` Shahaf Shuler
@ 2019-09-10 13:56       ` Maxime Coquelin
  2019-09-11  5:15         ` Shahaf Shuler
  0 siblings, 1 reply; 50+ messages in thread
From: Maxime Coquelin @ 2019-09-10 13:56 UTC (permalink / raw)
  To: Shahaf Shuler, tiwei.bie, zhihong.wang, amorenoz, xiao.w.wang,
	dev, jfreimann
  Cc: stable, Matan Azrad



On 9/10/19 3:44 PM, Shahaf Shuler wrote:
> Tuesday, September 10, 2019 10:46 AM, Maxime Coquelin:
>> Subject: Re: [dpdk-dev] [PATCH 00/15] Introduce Virtio vDPA driver
>>
>> Hi Shahaf,
>>
>> On 9/9/19 1:55 PM, Shahaf Shuler wrote:
>>> Hi Maxime,
>>>
>>> Thursday, August 29, 2019 11:00 AM, Maxime Coquelin:
>>>> Subject: [dpdk-dev] [PATCH 00/15] Introduce Virtio vDPA driver
>>>>
>>>> vDPA allows to offload Virtio Datapath processing by supported NICs,
>>>> like IFCVF for example.
>>>>
>>>> The control path has to be handled by a dedicated vDPA driver, so
>>>> that it can translate Vhost-user protocol requests to proprietary
>>>> NICs registers accesses.
>>>>
>>>> This driver is the vDPA driver for Virtio devices, meaning that
>>>> Vhost-user protocol requests get translated to Virtio registers
>>>> accesses as per defined in the Virtio spec.
>>>>
>>>> Basically, it can be used within a guest with a para-virtualized
>>>> Virtio-net device, or even with a full Virtio HW offload NIC directly on
>> host.
>>>
>>> Can you elaborate more on the use cases to use such driver?
>>>
>>> 1. If the underlying HW can support full virtio device, why we need to work
>> w/ it w/ vDPA mode? Why not providing it to the VM as passthrough one?
>>> 2. why it is preferable to work w/ virtio device as the backend device to be
>> used w/ vDPA v.s. working w/ the underlying HW VF?
>>
>>
>> IMHO, I see two uses cases where it can make sense to use vDPA with a full
>> offload HW device:
>> 1. Live-migration support:  It makes it possible to switch to rings
>>    processing in SW during the migration as Virtio HH does not support
>>    dirty pages logging.
> 
> Can you elaborate why specifically using virtio_vdpa PMD enables this SW relay during migration?
> e.g. the vdpa PMD of intel that runs on top of VF do that today as well. 

I think there were a misunderstanding. When I said:
"
I see two uses cases where it can make sense to use vDPA with a full
offload HW device
"

I meant, I see two uses cases where it can make sense to use vDPA with a
full offload HW device, instead of the full offload HW device to use
Virtio PMD.

In other words, I think it is preferable to only offload the datapath,
so that it is possible to support SW live-migration.

>>
>> 2. Can be used to provide a single standard interface (the vhost-user
>>    socket) to containers in the scope of CNFs. Doing so, the container
>>    does not need to be modified, whatever the HW NIC: Virtio datapath
>>    offload only, full Virtio offload, or no offload at all. In the
>>    latter case, it would not be optimal as it implies forwarding between
>>    the Vhost PMD and the HW NIC PMD but it would work.
> 
> It is not clear to me the interface map in such system.
> From what I understand the container will have virtio-user i/f and the host will have virtio i/f. then the virtio i/f can be programmed to work w/ vDPA or not. 
> For full emulation I guess you will need to expose the netdev of the fully emulated virtio device to the container?
> 
> Am trying to map when it is beneficial to use this virtio_vdpa PMD and when it is better to use the vendor specific vDPA PMD on top of VF.

I think that with above clarification, I made it clear that the goal of
this driver is not to replace vendors vDPA drivers (their control path
maybe not even be compatible), but instead to provide a generic driver
that can be used either within a guest with a para-virtualized Virtio-
net device or with HW NIC that fully offloads Virtio (both data and
control paths).

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [dpdk-dev] [PATCH 00/15] Introduce Virtio vDPA driver
  2019-09-10 13:56       ` Maxime Coquelin
@ 2019-09-11  5:15         ` Shahaf Shuler
  2019-09-11  7:15           ` Maxime Coquelin
  0 siblings, 1 reply; 50+ messages in thread
From: Shahaf Shuler @ 2019-09-11  5:15 UTC (permalink / raw)
  To: Maxime Coquelin, tiwei.bie, zhihong.wang, amorenoz, xiao.w.wang,
	dev, jfreimann
  Cc: stable, Matan Azrad

Tuesday, September 10, 2019 4:56 PM, Maxime Coquelin:
> Subject: Re: [dpdk-dev] [PATCH 00/15] Introduce Virtio vDPA driver
> On 9/10/19 3:44 PM, Shahaf Shuler wrote:
> > Tuesday, September 10, 2019 10:46 AM, Maxime Coquelin:
> >> Subject: Re: [dpdk-dev] [PATCH 00/15] Introduce Virtio vDPA driver

[...]

> >>
> >> Hi Shahaf,
> >>
> >>
> >> IMHO, I see two uses cases where it can make sense to use vDPA with a
> >> full offload HW device:
> >> 1. Live-migration support:  It makes it possible to switch to rings
> >>    processing in SW during the migration as Virtio HH does not support
> >>    dirty pages logging.
> >
> > Can you elaborate why specifically using virtio_vdpa PMD enables this SW
> relay during migration?
> > e.g. the vdpa PMD of intel that runs on top of VF do that today as well.
> 
> I think there were a misunderstanding. When I said:
> "
> I see two uses cases where it can make sense to use vDPA with a full offload
> HW device "
> 
> I meant, I see two uses cases where it can make sense to use vDPA with a full
> offload HW device, instead of the full offload HW device to use Virtio PMD.
> 
> In other words, I think it is preferable to only offload the datapath, so that it
> is possible to support SW live-migration.
> 
> >>
> >> 2. Can be used to provide a single standard interface (the vhost-user
> >>    socket) to containers in the scope of CNFs. Doing so, the container
> >>    does not need to be modified, whatever the HW NIC: Virtio datapath
> >>    offload only, full Virtio offload, or no offload at all. In the
> >>    latter case, it would not be optimal as it implies forwarding between
> >>    the Vhost PMD and the HW NIC PMD but it would work.
> >
> > It is not clear to me the interface map in such system.
> > From what I understand the container will have virtio-user i/f and the host
> will have virtio i/f. then the virtio i/f can be programmed to work w/ vDPA or
> not.
> > For full emulation I guess you will need to expose the netdev of the fully
> emulated virtio device to the container?
> >
> > Am trying to map when it is beneficial to use this virtio_vdpa PMD and
> when it is better to use the vendor specific vDPA PMD on top of VF.
> 
> I think that with above clarification, I made it clear that the goal of this driver
> is not to replace vendors vDPA drivers (their control path maybe not even be
> compatible), but instead to provide a generic driver that can be used either
> within a guest with a para-virtualized Virtio- net device or with HW NIC that
> fully offloads Virtio (both data and control paths).

Thanks Maxim, It is clearer now. 
From what I understand this driver is to be used w/ vDPA when the underlying device is virtio. 

I can perfectly understand the para-virt ( + nested virtualization / container inside VM) use case. 

Regarding the fully emulated virtio device on the host (instead of a plain VF) - for me the benefit still not clear - if you have HW that can expose VF why not use VF + vendor specific vDPA driver.

Anyway - for the series,
Acked-by: Shahaf Shuler <shahafs@mellanox.com>


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [dpdk-dev] [PATCH 00/15] Introduce Virtio vDPA driver
  2019-09-11  5:15         ` Shahaf Shuler
@ 2019-09-11  7:15           ` Maxime Coquelin
  0 siblings, 0 replies; 50+ messages in thread
From: Maxime Coquelin @ 2019-09-11  7:15 UTC (permalink / raw)
  To: Shahaf Shuler, tiwei.bie, zhihong.wang, amorenoz, xiao.w.wang,
	dev, jfreimann
  Cc: stable, Matan Azrad



On 9/11/19 7:15 AM, Shahaf Shuler wrote:
> Tuesday, September 10, 2019 4:56 PM, Maxime Coquelin:
>> Subject: Re: [dpdk-dev] [PATCH 00/15] Introduce Virtio vDPA driver
>> On 9/10/19 3:44 PM, Shahaf Shuler wrote:
>>> Tuesday, September 10, 2019 10:46 AM, Maxime Coquelin:
>>>> Subject: Re: [dpdk-dev] [PATCH 00/15] Introduce Virtio vDPA driver
> 
> [...]
> 
>>>>
>>>> Hi Shahaf,
>>>>
>>>>
>>>> IMHO, I see two uses cases where it can make sense to use vDPA with a
>>>> full offload HW device:
>>>> 1. Live-migration support:  It makes it possible to switch to rings
>>>>    processing in SW during the migration as Virtio HH does not support
>>>>    dirty pages logging.
>>>
>>> Can you elaborate why specifically using virtio_vdpa PMD enables this SW
>> relay during migration?
>>> e.g. the vdpa PMD of intel that runs on top of VF do that today as well.
>>
>> I think there were a misunderstanding. When I said:
>> "
>> I see two uses cases where it can make sense to use vDPA with a full offload
>> HW device "
>>
>> I meant, I see two uses cases where it can make sense to use vDPA with a full
>> offload HW device, instead of the full offload HW device to use Virtio PMD.
>>
>> In other words, I think it is preferable to only offload the datapath, so that it
>> is possible to support SW live-migration.
>>
>>>>
>>>> 2. Can be used to provide a single standard interface (the vhost-user
>>>>    socket) to containers in the scope of CNFs. Doing so, the container
>>>>    does not need to be modified, whatever the HW NIC: Virtio datapath
>>>>    offload only, full Virtio offload, or no offload at all. In the
>>>>    latter case, it would not be optimal as it implies forwarding between
>>>>    the Vhost PMD and the HW NIC PMD but it would work.
>>>
>>> It is not clear to me the interface map in such system.
>>> From what I understand the container will have virtio-user i/f and the host
>> will have virtio i/f. then the virtio i/f can be programmed to work w/ vDPA or
>> not.
>>> For full emulation I guess you will need to expose the netdev of the fully
>> emulated virtio device to the container?
>>>
>>> Am trying to map when it is beneficial to use this virtio_vdpa PMD and
>> when it is better to use the vendor specific vDPA PMD on top of VF.
>>
>> I think that with above clarification, I made it clear that the goal of this driver
>> is not to replace vendors vDPA drivers (their control path maybe not even be
>> compatible), but instead to provide a generic driver that can be used either
>> within a guest with a para-virtualized Virtio- net device or with HW NIC that
>> fully offloads Virtio (both data and control paths).
> 
> Thanks Maxim, It is clearer now. 
> From what I understand this driver is to be used w/ vDPA when the underlying device is virtio. 
> 
> I can perfectly understand the para-virt ( + nested virtualization / container inside VM) use case. 
> 
> Regarding the fully emulated virtio device on the host (instead of a plain VF) - for me the benefit still not clear - if you have HW that can expose VF why not use VF + vendor specific vDPA driver.

If you need a vendor specific vDPA driver for the VF, then you
definitely want to use the vendor specific driver.

However, if there is a HW or VF that implements the Virtio Spec even for
the control path (i.e. the PCI registers layout), one may be tempted to
do device assignment directly to the guest and use Virtio PMD.
The downside of doing that is it won't support live-migration.

The benefit of using vDPA with virtio vDPA driver in this case is
provide a way to support live-migration (by switch to SW ring processing
and perform dirty pages logging).

> 
> Anyway - for the series,
> Acked-by: Shahaf Shuler <shahafs@mellanox.com>
> 

Thanks!
Maxime

^ permalink raw reply	[flat|nested] 50+ messages in thread

end of thread, back to index

Thread overview: 50+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-08-29  7:59 [dpdk-dev] [PATCH 00/15] Introduce Virtio vDPA driver Maxime Coquelin
2019-08-29  7:59 ` [dpdk-dev] [PATCH 01/15] vhost: remove vhost kernel header inclusion Maxime Coquelin
2019-09-02  6:03   ` Tiwei Bie
2019-09-03  7:24     ` Maxime Coquelin
2019-08-29  7:59 ` [dpdk-dev] [PATCH 02/15] vhost: configure vDPA as soon as the device is ready Maxime Coquelin
2019-09-02  8:34   ` Ye Xiaolong
2019-09-02  9:02     ` Wang, Xiao W
2019-09-03  7:34       ` Maxime Coquelin
2019-09-03 10:58         ` Wang, Xiao W
2019-08-29  7:59 ` [dpdk-dev] [PATCH 03/15] net/virtio: move control path fonctions in virtqueue file Maxime Coquelin
2019-09-02  6:05   ` Tiwei Bie
2019-08-29  7:59 ` [dpdk-dev] [PATCH 04/15] net/virtio: add virtio PCI subsystem device ID declaration Maxime Coquelin
2019-09-02  6:14   ` Tiwei Bie
2019-09-03  7:25     ` Maxime Coquelin
2019-08-29  7:59 ` [dpdk-dev] [PATCH 05/15] net/virtio: save notify bar ID in virtio HW struct Maxime Coquelin
2019-09-02  6:17   ` Tiwei Bie
2019-08-29  7:59 ` [dpdk-dev] [PATCH 06/15] net/virtio: add skeleton for virtio vDPA driver Maxime Coquelin
2019-09-02  6:27   ` Tiwei Bie
2019-09-03  7:25     ` Maxime Coquelin
2019-08-29  7:59 ` [dpdk-dev] [PATCH 07/15] net/virtio: add vDPA ops to get number of queue Maxime Coquelin
2019-09-02  6:32   ` Tiwei Bie
2019-08-29  7:59 ` [dpdk-dev] [PATCH 08/15] net/virtio: add virtio vDPA op to get features Maxime Coquelin
2019-09-02  6:43   ` Tiwei Bie
2019-09-03  7:27     ` Maxime Coquelin
2019-08-29  7:59 ` [dpdk-dev] [PATCH 09/15] net/virtio: add virtio vDPA op to get protocol features Maxime Coquelin
2019-09-02  6:46   ` Tiwei Bie
2019-08-29  7:59 ` [dpdk-dev] [PATCH 10/15] net/virtio: add vDPA op to configure and start the device Maxime Coquelin
2019-09-03  5:30   ` Tiwei Bie
2019-09-03  7:40     ` Maxime Coquelin
2019-09-03  8:49       ` Tiwei Bie
2019-09-04  4:06         ` Jason Wang
2019-09-04  6:56           ` Maxime Coquelin
2019-09-05  2:57             ` Tiwei Bie
2019-08-29  7:59 ` [dpdk-dev] [PATCH 11/15] net/virtio: add vDPA op to stop and close " Maxime Coquelin
2019-09-02  7:07   ` Tiwei Bie
2019-09-03  7:30     ` Maxime Coquelin
2019-08-29  7:59 ` [dpdk-dev] [PATCH 12/15] net/virtio: add vDPA op to set features Maxime Coquelin
2019-08-29  7:59 ` [dpdk-dev] [PATCH 13/15] net/virtio: add vDPA ops to get VFIO FDs Maxime Coquelin
2019-09-03  4:47   ` Tiwei Bie
2019-08-29  7:59 ` [dpdk-dev] [PATCH 14/15] net/virtio: add vDPA op to get notification area Maxime Coquelin
2019-09-03  5:02   ` Tiwei Bie
2019-09-03  7:36     ` Maxime Coquelin
2019-09-03  8:40       ` Tiwei Bie
2019-08-29  8:00 ` [dpdk-dev] [PATCH 15/15] doc: add documentation for Virtio vDPA driver Maxime Coquelin
2019-09-09 11:55 ` [dpdk-dev] [PATCH 00/15] Introduce " Shahaf Shuler
2019-09-10  7:46   ` Maxime Coquelin
2019-09-10 13:44     ` Shahaf Shuler
2019-09-10 13:56       ` Maxime Coquelin
2019-09-11  5:15         ` Shahaf Shuler
2019-09-11  7:15           ` Maxime Coquelin

DPDK patches and discussions

Archives are clonable:
	git clone --mirror http://inbox.dpdk.org/dev/0 dev/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 dev dev/ http://inbox.dpdk.org/dev \
		dev@dpdk.org
	public-inbox-index dev


Newsgroup available over NNTP:
	nntp://inbox.dpdk.org/inbox.dpdk.dev


AGPL code for this site: git clone https://public-inbox.org/ public-inbox