DPDK patches and discussions
* [dpdk-dev] [PATCH 0/9] support SW assisted VDPA live migration
@ 2018-11-28  9:45 Xiao Wang
  2018-11-28  9:45 ` [dpdk-dev] [PATCH 1/9] vhost: provide helper for host notifier ctrl Xiao Wang
                   ` (8 more replies)
  0 siblings, 9 replies; 86+ messages in thread
From: Xiao Wang @ 2018-11-28  9:45 UTC (permalink / raw)
  To: tiwei.bie, maxime.coquelin; +Cc: dev, zhihong.wang, xiaolong.ye, Xiao Wang

The previous VDPA implementation supports live migration by letting the HW
accelerator do all the work, including dirty page logging and device status
report/restore. In that mode the VDPA sample daemon and the device driver
handle only the control path and are not involved in the data path, so CPU
usage is close to zero. That mode requires the device to have dirty page
logging capability.

This patch series adds live migration support for devices without logging
capability. When live migration happens, the VDPA driver can set up a relay
thread that stands between the guest virtio driver and the physical virtio
accelerator. The relay performs a vring relay on behalf of the device and
logs dirty pages along the way. Some CPU resource is therefore consumed in
this scenario; how much depends on the network throughput.

Some new helpers are added into vhost lib for this VDPA SW fallback:
- rte_vhost_host_notifier_ctrl, to enable/disable the VDPA direct-IO
  datapath.
- rte_vdpa_relay_avail_ring, to relay the available ring from guest vring
  to mediate vring.
- rte_vdpa_relay_used_ring, to relay the used ring from mediate vring to
  guest vring.
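
As a rough illustration of what an available ring relay involves, here is
a minimal, self-contained sketch. It is not the code from this series:
struct toy_avail, TOY_RING_SIZE and toy_relay_avail() are hypothetical
names, the ring is a plain array rather than a struct vring, and memory
barriers and descriptor validation are left out.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_RING_SIZE 8u /* hypothetical queue size, a power of two */

/* Toy stand-in for the avail part of a split virtqueue (not struct vring). */
struct toy_avail {
	uint16_t idx;                 /* free-running producer index */
	uint16_t ring[TOY_RING_SIZE]; /* descriptor head ids */
};

/*
 * Copy the entries that the guest has published but the mediate ring has
 * not seen yet. Returns the number of entries copied.
 */
static int
toy_relay_avail(const struct toy_avail *guest, struct toy_avail *mediate)
{
	uint16_t idx = guest->idx;
	uint16_t idx_m = mediate->idx;
	/* 16-bit subtraction stays correct after idx wraps past 65535. */
	uint16_t pending = (uint16_t)(idx - idx_m);

	/* A well-behaved producer never runs ahead by more than the ring. */
	assert(pending <= TOY_RING_SIZE);

	while (idx_m != idx) {
		mediate->ring[idx_m % TOY_RING_SIZE] =
			guest->ring[idx_m % TOY_RING_SIZE];
		idx_m++;
	}

	/* Publish the new index only after the entries are in place. */
	mediate->idx = idx;

	return pending;
}
```

The point of the cast is that ring indexes are free-running 16-bit
counters, so the number of pending entries has to be computed modulo
65536 rather than as a plain int difference.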

Some existing helpers are also leveraged for SW fallback setup, like VFIO
interrupt configuration, IOMMU table programming, etc.

This series enables SW assisted VDPA live migration in the ifc driver.
Since ifcvf also supports HW dirty page logging, a new devarg is added
for the user to select whether the SW mode is used.

Xiao Wang (9):
  vhost: provide helper for host notifier ctrl
  vhost: provide helpers for virtio ring relay
  net/ifc: dump debug message for error
  net/ifc: store only registered device instance
  net/ifc: detect if VDPA mode is specified
  net/ifc: add devarg for LM mode
  net/ifc: use lib API for used ring logging
  net/ifc: support SW assisted VDPA live migration
  doc: update ifc NIC document

 doc/guides/nics/ifc.rst                |   7 +
 drivers/net/ifc/base/ifcvf.h           |   1 +
 drivers/net/ifc/ifcvf_vdpa.c           | 463 ++++++++++++++++++++++++++++++---
 lib/librte_vhost/rte_vdpa.h            |  56 ++++
 lib/librte_vhost/rte_vhost_version.map |   3 +
 lib/librte_vhost/vdpa.c                | 173 ++++++++++++
 lib/librte_vhost/vhost.c               |   3 +-
 lib/librte_vhost/vhost.h               |  40 +++
 lib/librte_vhost/vhost_user.c          |   7 +-
 lib/librte_vhost/virtio_net.c          |  39 ---
 10 files changed, 714 insertions(+), 78 deletions(-)

-- 
2.15.1


* [dpdk-dev] [PATCH 1/9] vhost: provide helper for host notifier ctrl
  2018-11-28  9:45 [dpdk-dev] [PATCH 0/9] support SW assisted VDPA live migration Xiao Wang
@ 2018-11-28  9:45 ` Xiao Wang
  2018-11-28  9:46 ` [dpdk-dev] [PATCH 2/9] vhost: provide helpers for virtio ring relay Xiao Wang
                   ` (7 subsequent siblings)
  8 siblings, 0 replies; 86+ messages in thread
From: Xiao Wang @ 2018-11-28  9:45 UTC (permalink / raw)
  To: tiwei.bie, maxime.coquelin; +Cc: dev, zhihong.wang, xiaolong.ye, Xiao Wang

A VDPA driver can decide whether it needs to enable or disable the EPT
mapping, so exposing an API for this adds flexibility. A later patch builds
on this.

Signed-off-by: Xiao Wang <xiao.w.wang@intel.com>
---
 drivers/net/ifc/ifcvf_vdpa.c           |  3 +++
 lib/librte_vhost/rte_vdpa.h            | 18 ++++++++++++++++++
 lib/librte_vhost/rte_vhost_version.map |  1 +
 lib/librte_vhost/vhost.c               |  3 +--
 lib/librte_vhost/vhost_user.c          |  7 +------
 5 files changed, 24 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ifc/ifcvf_vdpa.c b/drivers/net/ifc/ifcvf_vdpa.c
index 97a57f182..e844109f3 100644
--- a/drivers/net/ifc/ifcvf_vdpa.c
+++ b/drivers/net/ifc/ifcvf_vdpa.c
@@ -556,6 +556,9 @@ ifcvf_dev_config(int vid)
 	rte_atomic32_set(&internal->dev_attached, 1);
 	update_datapath(internal);
 
+	if (rte_vhost_host_notifier_ctrl(vid, true) != 0)
+		DRV_LOG(NOTICE, "vDPA (%d): software relay is used.", did);
+
 	return 0;
 }
 
diff --git a/lib/librte_vhost/rte_vdpa.h b/lib/librte_vhost/rte_vdpa.h
index a418da47c..89c5bb6b3 100644
--- a/lib/librte_vhost/rte_vdpa.h
+++ b/lib/librte_vhost/rte_vdpa.h
@@ -11,6 +11,8 @@
  * Device specific vhost lib
  */
 
+#include <stdbool.h>
+
 #include <rte_pci.h>
 #include "rte_vhost.h"
 
@@ -155,4 +157,20 @@ rte_vdpa_get_device(int did);
  */
 int __rte_experimental
 rte_vdpa_get_device_num(void);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * Enable/Disable EPT mapping for a vdpa port.
+ *
+ * @param vid
+ *  vhost device id
+ * @param enable
+ *  true for EPT map, false for EPT unmap
+ * @return
+ *  0 on success, -1 on failure
+ */
+int __rte_experimental
+rte_vhost_host_notifier_ctrl(int vid, bool enable);
 #endif /* _RTE_VDPA_H_ */
diff --git a/lib/librte_vhost/rte_vhost_version.map b/lib/librte_vhost/rte_vhost_version.map
index ae39b6e21..22302e972 100644
--- a/lib/librte_vhost/rte_vhost_version.map
+++ b/lib/librte_vhost/rte_vhost_version.map
@@ -83,4 +83,5 @@ EXPERIMENTAL {
 	rte_vhost_crypto_finalize_requests;
 	rte_vhost_crypto_set_zero_copy;
 	rte_vhost_va_from_guest_pa;
+	rte_vhost_host_notifier_ctrl;
 };
diff --git a/lib/librte_vhost/vhost.c b/lib/librte_vhost/vhost.c
index 70ac6bc9c..e7a60e0b4 100644
--- a/lib/librte_vhost/vhost.c
+++ b/lib/librte_vhost/vhost.c
@@ -408,8 +408,7 @@ vhost_detach_vdpa_device(int vid)
 	if (dev == NULL)
 		return;
 
-	vhost_user_host_notifier_ctrl(vid, false);
-
+	vhost_destroy_device_notify(dev);
 	dev->vdpa_dev_id = -1;
 }
 
diff --git a/lib/librte_vhost/vhost_user.c b/lib/librte_vhost/vhost_user.c
index 3ea64eba6..5e0da0589 100644
--- a/lib/librte_vhost/vhost_user.c
+++ b/lib/librte_vhost/vhost_user.c
@@ -2045,11 +2045,6 @@ vhost_user_msg_handler(int vid, int fd)
 		if (vdpa_dev->ops->dev_conf)
 			vdpa_dev->ops->dev_conf(dev->vid);
 		dev->flags |= VIRTIO_DEV_VDPA_CONFIGURED;
-		if (vhost_user_host_notifier_ctrl(dev->vid, true) != 0) {
-			RTE_LOG(INFO, VHOST_CONFIG,
-				"(%d) software relay is used for vDPA, performance may be low.\n",
-				dev->vid);
-		}
 	}
 
 	return 0;
@@ -2144,7 +2139,7 @@ static int vhost_user_slave_set_vring_host_notifier(struct virtio_net *dev,
 	return process_slave_message_reply(dev, &msg);
 }
 
-int vhost_user_host_notifier_ctrl(int vid, bool enable)
+int rte_vhost_host_notifier_ctrl(int vid, bool enable)
 {
 	struct virtio_net *dev;
 	struct rte_vdpa_device *vdpa_dev;
-- 
2.15.1


* [dpdk-dev] [PATCH 2/9] vhost: provide helpers for virtio ring relay
  2018-11-28  9:45 [dpdk-dev] [PATCH 0/9] support SW assisted VDPA live migration Xiao Wang
  2018-11-28  9:45 ` [dpdk-dev] [PATCH 1/9] vhost: provide helper for host notifier ctrl Xiao Wang
@ 2018-11-28  9:46 ` Xiao Wang
  2018-12-04  6:22   ` Tiwei Bie
  2018-12-13  1:10   ` [dpdk-dev] [PATCH v2 0/9] support SW assisted VDPA live migration Xiao Wang
  2018-11-28  9:46 ` [dpdk-dev] [PATCH 3/9] net/ifc: dump debug message for error Xiao Wang
                   ` (6 subsequent siblings)
  8 siblings, 2 replies; 86+ messages in thread
From: Xiao Wang @ 2018-11-28  9:46 UTC (permalink / raw)
  To: tiwei.bie, maxime.coquelin; +Cc: dev, zhihong.wang, xiaolong.ye, Xiao Wang

This patch provides two helpers for a vdpa device driver to perform a
relay between the guest virtio ring and a mediate virtio ring.

The available ring relay synchronizes the available entries from the
guest ring to the mediate ring, and helps to check descriptor validity.

The used ring relay synchronizes the used entries from the mediate ring
to the guest ring, and helps to do dirty page logging for live migration.

A later patch in this series will leverage these two helpers.

Signed-off-by: Xiao Wang <xiao.w.wang@intel.com>
---
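For readers unfamiliar with dirty page logging, the idea behind the
logging step of the used ring relay can be sketched as below. This is a
simplified model, not the vhost lib implementation: toy_log_write() and
TOY_PAGE_SIZE are hypothetical names, a 4 KiB logging granularity is
assumed, and the bitmap is updated without the atomic operations a real
log shared with the hypervisor would need.

```c
#include <stdint.h>

#define TOY_PAGE_SIZE 4096u /* assumed logging granularity */

/*
 * Set one bit per guest page touched by [addr, addr + len).
 * `log` is a caller-provided bitmap in which bit N stands for page N,
 * and `log_bits` is the number of valid bits in it.
 */
static void
toy_log_write(uint8_t *log, uint64_t log_bits, uint64_t addr, uint64_t len)
{
	uint64_t page, last;

	if (len == 0)
		return;

	page = addr / TOY_PAGE_SIZE;
	last = (addr + len - 1) / TOY_PAGE_SIZE;

	/* Pages beyond the end of the bitmap are silently ignored here. */
	for (; page <= last && page < log_bits; page++)
		log[page / 8] |= (uint8_t)(1u << (page % 8));
}
```

A buffer that straddles a page boundary marks both pages, which is why
the loop runs from the first to the last page touched rather than
logging only the page of the start address.
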
 lib/librte_vhost/rte_vdpa.h            |  38 ++++++++
 lib/librte_vhost/rte_vhost_version.map |   2 +
 lib/librte_vhost/vdpa.c                | 173 +++++++++++++++++++++++++++++++++
 lib/librte_vhost/vhost.h               |  40 ++++++++
 lib/librte_vhost/virtio_net.c          |  39 --------
 5 files changed, 253 insertions(+), 39 deletions(-)

diff --git a/lib/librte_vhost/rte_vdpa.h b/lib/librte_vhost/rte_vdpa.h
index 89c5bb6b3..0c44b9080 100644
--- a/lib/librte_vhost/rte_vdpa.h
+++ b/lib/librte_vhost/rte_vdpa.h
@@ -173,4 +173,42 @@ rte_vdpa_get_device_num(void);
  */
 int __rte_experimental
 rte_vhost_host_notifier_ctrl(int vid, bool enable);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * Synchronize the available ring from guest to mediate ring, help to
+ * check desc validity to protect against malicious guest driver.
+ *
+ * @param vid
+ *  vhost device id
+ * @param qid
+ *  vhost queue id
+ * @param m_vring
+ *  mediate virtio ring pointer
+ * @return
+ *  number of synced available entries on success, -1 on failure
+ */
+int __rte_experimental
+rte_vdpa_relay_avail_ring(int vid, int qid, struct vring *m_vring);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * Synchronize the used ring from mediate ring to guest, log dirty
+ * page for each Rx buffer used.
+ *
+ * @param vid
+ *  vhost device id
+ * @param qid
+ *  vhost queue id
+ * @param m_vring
+ *  mediate virtio ring pointer
+ * @return
+ *  number of synced used entries on success, -1 on failure
+ */
+int __rte_experimental
+rte_vdpa_relay_used_ring(int vid, int qid, struct vring *m_vring);
 #endif /* _RTE_VDPA_H_ */
diff --git a/lib/librte_vhost/rte_vhost_version.map b/lib/librte_vhost/rte_vhost_version.map
index 22302e972..0ad0fbea2 100644
--- a/lib/librte_vhost/rte_vhost_version.map
+++ b/lib/librte_vhost/rte_vhost_version.map
@@ -84,4 +84,6 @@ EXPERIMENTAL {
 	rte_vhost_crypto_set_zero_copy;
 	rte_vhost_va_from_guest_pa;
 	rte_vhost_host_notifier_ctrl;
+	rte_vdpa_relay_avail_ring;
+	rte_vdpa_relay_used_ring;
 };
diff --git a/lib/librte_vhost/vdpa.c b/lib/librte_vhost/vdpa.c
index e7d849ee0..e41117776 100644
--- a/lib/librte_vhost/vdpa.c
+++ b/lib/librte_vhost/vdpa.c
@@ -122,3 +122,176 @@ rte_vdpa_get_device_num(void)
 {
 	return vdpa_device_num;
 }
+
+static int
+invalid_desc_check(struct virtio_net *dev, struct vhost_virtqueue *vq,
+		uint64_t desc_iova, uint64_t desc_len, uint8_t perm)
+{
+	uint64_t desc_addr, desc_chunck_len;
+
+	while (desc_len) {
+		desc_chunck_len = desc_len;
+		desc_addr = vhost_iova_to_vva(dev, vq,
+				desc_iova,
+				&desc_chunck_len,
+				perm);
+
+		if (!desc_addr)
+			return -1;
+
+		desc_len -= desc_chunck_len;
+		desc_iova += desc_chunck_len;
+	}
+
+	return 0;
+}
+
+int
+rte_vdpa_relay_avail_ring(int vid, int qid, struct vring *m_vring)
+{
+	struct virtio_net *dev = get_device(vid);
+	uint16_t idx, idx_m, desc_id;
+	struct vring_desc desc;
+	struct vhost_virtqueue *vq;
+	struct vring_desc *desc_ring;
+	struct vring_desc *idesc = NULL;
+	uint64_t dlen;
+	int ret;
+
+	if (!dev)
+		return -1;
+
+	vq = dev->virtqueue[qid];
+	idx = vq->avail->idx;
+	idx_m = m_vring->avail->idx;
+	ret = (uint16_t)(idx - idx_m);
+
+	while (idx_m != idx) {
+		/* avail entry copy */
+		desc_id = vq->avail->ring[idx_m % vq->size];
+		m_vring->avail->ring[idx_m % vq->size] = desc_id;
+		desc_ring = vq->desc;
+
+		if (vq->desc[desc_id].flags & VRING_DESC_F_INDIRECT) {
+			dlen = vq->desc[desc_id].len;
+			desc_ring = (struct vring_desc *)(uintptr_t)
+			vhost_iova_to_vva(dev, vq, vq->desc[desc_id].addr,
+						&dlen,
+						VHOST_ACCESS_RO);
+			if (unlikely(!desc_ring))
+				return -1;
+
+			if (unlikely(dlen < vq->desc[desc_id].len)) {
+				idesc = alloc_copy_ind_table(dev, vq,
+					vq->desc[desc_id].addr, vq->desc[desc_id].len);
+				if (unlikely(!idesc))
+					return -1;
+
+				desc_ring = idesc;
+			}
+
+			desc_id = 0;
+		}
+
+		/* check if the buf addr is within the guest memory */
+		do {
+			desc = desc_ring[desc_id];
+			if (invalid_desc_check(dev, vq, desc.addr, desc.len,
+						VHOST_ACCESS_RW))
+				return -1;
+			desc_id = desc.next;
+		} while (desc.flags & VRING_DESC_F_NEXT);
+
+		if (unlikely(!!idesc)) {
+			free_ind_table(idesc);
+			idesc = NULL;
+		}
+
+		idx_m++;
+	}
+
+	m_vring->avail->idx = idx;
+
+	if (dev->features & (1ULL << VIRTIO_RING_F_EVENT_IDX))
+		vhost_avail_event(vq) = vq->avail->idx;
+
+	return ret;
+}
+
+int
+rte_vdpa_relay_used_ring(int vid, int qid, struct vring *m_vring)
+{
+	struct virtio_net *dev = get_device(vid);
+	uint16_t idx, idx_m, desc_id;
+	struct vhost_virtqueue *vq;
+	struct vring_desc desc;
+	struct vring_desc *desc_ring;
+	struct vring_desc *idesc = NULL;
+	uint64_t dlen;
+	int ret;
+
+	if (!dev)
+		return -1;
+
+	vq = dev->virtqueue[qid];
+	idx = vq->used->idx;
+	idx_m = m_vring->used->idx;
+	ret = (uint16_t)(idx_m - idx);
+
+	while (idx != idx_m) {
+		/* copy used entry */
+		vq->used->ring[idx % vq->size] =
+			m_vring->used->ring[idx % vq->size];
+
+		/* dirty page logging for used ring */
+		vhost_log_used_vring(dev, vq,
+			offsetof(struct vring_used, ring[idx % vq->size]),
+			sizeof(struct vring_used_elem));
+
+		desc_id = vq->used->ring[idx % vq->size].id;
+		desc_ring = vq->desc;
+
+		if (vq->desc[desc_id].flags & VRING_DESC_F_INDIRECT) {
+			dlen = vq->desc[desc_id].len;
+			desc_ring = (struct vring_desc *)(uintptr_t)
+			vhost_iova_to_vva(dev, vq, vq->desc[desc_id].addr,
+						&dlen,
+						VHOST_ACCESS_RO);
+			if (unlikely(!desc_ring))
+				return -1;
+
+			if (unlikely(dlen < vq->desc[desc_id].len)) {
+				idesc = alloc_copy_ind_table(dev, vq,
+					vq->desc[desc_id].addr, vq->desc[desc_id].len);
+				if (unlikely(!idesc))
+					return -1;
+
+				desc_ring = idesc;
+			}
+
+			desc_id = 0;
+		}
+
+		/* dirty page logging for Rx buffer */
+		do {
+			desc = desc_ring[desc_id];
+			if (desc.flags & VRING_DESC_F_WRITE)
+				vhost_log_write(dev, desc.addr, desc.len);
+			desc_id = desc.next;
+		} while (desc.flags & VRING_DESC_F_NEXT);
+
+		if (unlikely(!!idesc)) {
+			free_ind_table(idesc);
+			idesc = NULL;
+		}
+
+		idx++;
+	}
+
+	vq->used->idx = idx_m;
+
+	if (dev->features & (1ULL << VIRTIO_RING_F_EVENT_IDX))
+		vring_used_event(m_vring) = m_vring->used->idx;
+
+	return ret;
+}
diff --git a/lib/librte_vhost/vhost.h b/lib/librte_vhost/vhost.h
index 5218f1b12..2164cd6d9 100644
--- a/lib/librte_vhost/vhost.h
+++ b/lib/librte_vhost/vhost.h
@@ -18,6 +18,7 @@
 #include <rte_log.h>
 #include <rte_ether.h>
 #include <rte_rwlock.h>
+#include <rte_malloc.h>
 
 #include "rte_vhost.h"
 #include "rte_vdpa.h"
@@ -753,4 +754,43 @@ vhost_vring_call_packed(struct virtio_net *dev, struct vhost_virtqueue *vq)
 		eventfd_write(vq->callfd, (eventfd_t)1);
 }
 
+static __rte_always_inline void *
+alloc_copy_ind_table(struct virtio_net *dev, struct vhost_virtqueue *vq,
+		uint64_t desc_addr, uint64_t desc_len)
+{
+	void *idesc;
+	uint64_t src, dst;
+	uint64_t len, remain = desc_len;
+
+	idesc = rte_malloc(__func__, desc_len, 0);
+	if (unlikely(!idesc))
+		return 0;
+
+	dst = (uint64_t)(uintptr_t)idesc;
+
+	while (remain) {
+		len = remain;
+		src = vhost_iova_to_vva(dev, vq, desc_addr, &len,
+				VHOST_ACCESS_RO);
+		if (unlikely(!src || !len)) {
+			rte_free(idesc);
+			return 0;
+		}
+
+		rte_memcpy((void *)(uintptr_t)dst, (void *)(uintptr_t)src, len);
+
+		remain -= len;
+		dst += len;
+		desc_addr += len;
+	}
+
+	return idesc;
+}
+
+static __rte_always_inline void
+free_ind_table(void *idesc)
+{
+	rte_free(idesc);
+}
+
 #endif /* _VHOST_NET_CDEV_H_ */
diff --git a/lib/librte_vhost/virtio_net.c b/lib/librte_vhost/virtio_net.c
index 5e1a1a727..8c657a101 100644
--- a/lib/librte_vhost/virtio_net.c
+++ b/lib/librte_vhost/virtio_net.c
@@ -37,45 +37,6 @@ is_valid_virt_queue_idx(uint32_t idx, int is_tx, uint32_t nr_vring)
 	return (is_tx ^ (idx & 1)) == 0 && idx < nr_vring;
 }
 
-static __rte_always_inline void *
-alloc_copy_ind_table(struct virtio_net *dev, struct vhost_virtqueue *vq,
-		uint64_t desc_addr, uint64_t desc_len)
-{
-	void *idesc;
-	uint64_t src, dst;
-	uint64_t len, remain = desc_len;
-
-	idesc = rte_malloc(__func__, desc_len, 0);
-	if (unlikely(!idesc))
-		return 0;
-
-	dst = (uint64_t)(uintptr_t)idesc;
-
-	while (remain) {
-		len = remain;
-		src = vhost_iova_to_vva(dev, vq, desc_addr, &len,
-				VHOST_ACCESS_RO);
-		if (unlikely(!src || !len)) {
-			rte_free(idesc);
-			return 0;
-		}
-
-		rte_memcpy((void *)(uintptr_t)dst, (void *)(uintptr_t)src, len);
-
-		remain -= len;
-		dst += len;
-		desc_addr += len;
-	}
-
-	return idesc;
-}
-
-static __rte_always_inline void
-free_ind_table(void *idesc)
-{
-	rte_free(idesc);
-}
-
 static __rte_always_inline void
 do_flush_shadow_used_ring_split(struct virtio_net *dev,
 			struct vhost_virtqueue *vq,
-- 
2.15.1


* [dpdk-dev] [PATCH 3/9] net/ifc: dump debug message for error
  2018-11-28  9:45 [dpdk-dev] [PATCH 0/9] support SW assisted VDPA live migration Xiao Wang
  2018-11-28  9:45 ` [dpdk-dev] [PATCH 1/9] vhost: provide helper for host notifier ctrl Xiao Wang
  2018-11-28  9:46 ` [dpdk-dev] [PATCH 2/9] vhost: provide helpers for virtio ring relay Xiao Wang
@ 2018-11-28  9:46 ` Xiao Wang
  2018-11-28  9:46 ` [dpdk-dev] [PATCH 4/9] net/ifc: store only registered device instance Xiao Wang
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 86+ messages in thread
From: Xiao Wang @ 2018-11-28  9:46 UTC (permalink / raw)
  To: tiwei.bie, maxime.coquelin; +Cc: dev, zhihong.wang, xiaolong.ye, Xiao Wang

Driver probe may fail for different reasons. An error message for each
failure helps with debugging.

Signed-off-by: Xiao Wang <xiao.w.wang@intel.com>
---
 drivers/net/ifc/ifcvf_vdpa.c | 19 +++++++++++++------
 1 file changed, 13 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ifc/ifcvf_vdpa.c b/drivers/net/ifc/ifcvf_vdpa.c
index e844109f3..aacd5f9bf 100644
--- a/drivers/net/ifc/ifcvf_vdpa.c
+++ b/drivers/net/ifc/ifcvf_vdpa.c
@@ -22,7 +22,7 @@
 
 #define DRV_LOG(level, fmt, args...) \
 	rte_log(RTE_LOG_ ## level, ifcvf_vdpa_logtype, \
-		"%s(): " fmt "\n", __func__, ##args)
+		"IFCVF %s(): " fmt "\n", __func__, ##args)
 
 #ifndef PAGE_SIZE
 #define PAGE_SIZE 4096
@@ -756,11 +756,16 @@ ifcvf_pci_probe(struct rte_pci_driver *pci_drv __rte_unused,
 
 	internal->pdev = pci_dev;
 	rte_spinlock_init(&internal->lock);
-	if (ifcvf_vfio_setup(internal) < 0)
-		return -1;
 
-	if (ifcvf_init_hw(&internal->hw, internal->pdev) < 0)
-		return -1;
+	if (ifcvf_vfio_setup(internal) < 0) {
+		DRV_LOG(ERR, "failed to setup device %s", pci_dev->name);
+		goto error;
+	}
+
+	if (ifcvf_init_hw(&internal->hw, internal->pdev) < 0) {
+		DRV_LOG(ERR, "failed to init device %s", pci_dev->name);
+		goto error;
+	}
 
 	internal->max_queues = IFCVF_MAX_QUEUES;
 	features = ifcvf_get_features(&internal->hw);
@@ -782,8 +787,10 @@ ifcvf_pci_probe(struct rte_pci_driver *pci_drv __rte_unused,
 
 	internal->did = rte_vdpa_register_device(&internal->dev_addr,
 				&ifcvf_ops);
-	if (internal->did < 0)
+	if (internal->did < 0) {
+		DRV_LOG(ERR, "failed to register device %s", pci_dev->name);
 		goto error;
+	}
 
 	rte_atomic32_set(&internal->started, 1);
 	update_datapath(internal);
-- 
2.15.1


* [dpdk-dev] [PATCH 4/9] net/ifc: store only registered device instance
  2018-11-28  9:45 [dpdk-dev] [PATCH 0/9] support SW assisted VDPA live migration Xiao Wang
                   ` (2 preceding siblings ...)
  2018-11-28  9:46 ` [dpdk-dev] [PATCH 3/9] net/ifc: dump debug message for error Xiao Wang
@ 2018-11-28  9:46 ` Xiao Wang
  2018-11-28  9:46 ` [dpdk-dev] [PATCH 5/9] net/ifc: detect if VDPA mode is specified Xiao Wang
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 86+ messages in thread
From: Xiao Wang @ 2018-11-28  9:46 UTC (permalink / raw)
  To: tiwei.bie, maxime.coquelin
  Cc: dev, zhihong.wang, xiaolong.ye, Xiao Wang, stable

If the driver fails to register the ifc VF device with the vhost lib, the
device should not be stored in the internal list.

Fixes: a3f8150eac6d ("net/ifcvf: add ifcvf vDPA driver")
cc: stable@dpdk.org

Signed-off-by: Xiao Wang <xiao.w.wang@intel.com>
---
 drivers/net/ifc/ifcvf_vdpa.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ifc/ifcvf_vdpa.c b/drivers/net/ifc/ifcvf_vdpa.c
index aacd5f9bf..6fcd50b73 100644
--- a/drivers/net/ifc/ifcvf_vdpa.c
+++ b/drivers/net/ifc/ifcvf_vdpa.c
@@ -781,10 +781,6 @@ ifcvf_pci_probe(struct rte_pci_driver *pci_drv __rte_unused,
 	internal->dev_addr.type = PCI_ADDR;
 	list->internal = internal;
 
-	pthread_mutex_lock(&internal_list_lock);
-	TAILQ_INSERT_TAIL(&internal_list, list, next);
-	pthread_mutex_unlock(&internal_list_lock);
-
 	internal->did = rte_vdpa_register_device(&internal->dev_addr,
 				&ifcvf_ops);
 	if (internal->did < 0) {
@@ -792,6 +788,10 @@ ifcvf_pci_probe(struct rte_pci_driver *pci_drv __rte_unused,
 		goto error;
 	}
 
+	pthread_mutex_lock(&internal_list_lock);
+	TAILQ_INSERT_TAIL(&internal_list, list, next);
+	pthread_mutex_unlock(&internal_list_lock);
+
 	rte_atomic32_set(&internal->started, 1);
 	update_datapath(internal);
 
-- 
2.15.1


* [dpdk-dev] [PATCH 5/9] net/ifc: detect if VDPA mode is specified
  2018-11-28  9:45 [dpdk-dev] [PATCH 0/9] support SW assisted VDPA live migration Xiao Wang
                   ` (3 preceding siblings ...)
  2018-11-28  9:46 ` [dpdk-dev] [PATCH 4/9] net/ifc: store only registered device instance Xiao Wang
@ 2018-11-28  9:46 ` Xiao Wang
  2018-11-28  9:46 ` [dpdk-dev] [PATCH 6/9] net/ifc: add devarg for LM mode Xiao Wang
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 86+ messages in thread
From: Xiao Wang @ 2018-11-28  9:46 UTC (permalink / raw)
  To: tiwei.bie, maxime.coquelin; +Cc: dev, zhihong.wang, xiaolong.ye, Xiao Wang

If the user wants the VF to be used in VDPA (vhost data path acceleration)
mode, the user can add a "vdpa=1" parameter for the device.

If the driver does not find this option, it should quit and let the bus
continue the probe.

Signed-off-by: Xiao Wang <xiao.w.wang@intel.com>
---
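As a side note on parsing a value such as the "1" in "vdpa=1", a stricter
integer parser could look like the sketch below. It is an illustration
under stated assumptions rather than the handler in this patch:
toy_parse_int() is a hypothetical helper that writes into an int, resets
errno before calling strtoul(), and rejects trailing characters. Wiring
it into a kvargs handler is left out.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/*
 * Parse a devarg value into an int.
 * Returns 0 on success, -EINVAL on malformed input, -ERANGE on overflow.
 */
static int
toy_parse_int(const char *value, int *out)
{
	char *end = NULL;
	unsigned long v;

	if (value == NULL || out == NULL || *value == '\0')
		return -EINVAL;

	errno = 0; /* strtoul() does not clear errno on success */
	v = strtoul(value, &end, 0);
	if (errno == ERANGE || v > (unsigned long)INT_MAX)
		return -ERANGE;
	if (*end != '\0')
		return -EINVAL; /* no digits at all, or trailing junk */

	*out = (int)v;
	return 0;
}
```

Because base 0 is passed to strtoul(), both "1" and "0x1" are accepted,
while inputs such as "1x" or "yes" are reported as errors instead of
being read as 1 or 0.
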
 drivers/net/ifc/ifcvf_vdpa.c | 47 ++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 47 insertions(+)

diff --git a/drivers/net/ifc/ifcvf_vdpa.c b/drivers/net/ifc/ifcvf_vdpa.c
index 6fcd50b73..c0e50354a 100644
--- a/drivers/net/ifc/ifcvf_vdpa.c
+++ b/drivers/net/ifc/ifcvf_vdpa.c
@@ -17,6 +17,8 @@
 #include <rte_vfio.h>
 #include <rte_spinlock.h>
 #include <rte_log.h>
+#include <rte_kvargs.h>
+#include <rte_devargs.h>
 
 #include "base/ifcvf.h"
 
@@ -28,6 +30,13 @@
 #define PAGE_SIZE 4096
 #endif
 
+#define IFCVF_VDPA_MODE		"vdpa"
+
+static const char * const ifcvf_valid_arguments[] = {
+	IFCVF_VDPA_MODE,
+	NULL
+};
+
 static int ifcvf_vdpa_logtype;
 
 struct ifcvf_internal {
@@ -735,6 +744,21 @@ static struct rte_vdpa_dev_ops ifcvf_ops = {
 	.get_notify_area = ifcvf_get_notify_area,
 };
 
+static inline int
+open_int(const char *key __rte_unused, const char *value, void *extra_args)
+{
+	uint16_t *n = extra_args;
+
+	if (value == NULL || extra_args == NULL)
+		return -EINVAL;
+
+	*n = (uint16_t)strtoul(value, NULL, 0);
+	if (*n == USHRT_MAX && errno == ERANGE)
+		return -1;
+
+	return 0;
+}
+
 static int
 ifcvf_pci_probe(struct rte_pci_driver *pci_drv __rte_unused,
 		struct rte_pci_device *pci_dev)
@@ -742,10 +766,31 @@ ifcvf_pci_probe(struct rte_pci_driver *pci_drv __rte_unused,
 	uint64_t features;
 	struct ifcvf_internal *internal = NULL;
 	struct internal_list *list = NULL;
+	int vdpa_mode = 0;
+	struct rte_kvargs *kvlist = NULL;
+	int ret = 0;
 
 	if (rte_eal_process_type() != RTE_PROC_PRIMARY)
 		return 0;
 
+	kvlist = rte_kvargs_parse(pci_dev->device.devargs->args,
+			ifcvf_valid_arguments);
+	if (kvlist == NULL)
+		return 1;
+
+	/* probe only when vdpa mode is specified */
+	if (rte_kvargs_count(kvlist, IFCVF_VDPA_MODE) == 0) {
+		rte_kvargs_free(kvlist);
+		return 1;
+	}
+
+	ret = rte_kvargs_process(kvlist, IFCVF_VDPA_MODE, &open_int,
+			&vdpa_mode);
+	if (ret < 0 || vdpa_mode == 0) {
+		rte_kvargs_free(kvlist);
+		return 1;
+	}
+
 	list = rte_zmalloc("ifcvf", sizeof(*list), 0);
 	if (list == NULL)
 		goto error;
@@ -795,9 +840,11 @@ ifcvf_pci_probe(struct rte_pci_driver *pci_drv __rte_unused,
 	rte_atomic32_set(&internal->started, 1);
 	update_datapath(internal);
 
+	rte_kvargs_free(kvlist);
 	return 0;
 
 error:
+	rte_kvargs_free(kvlist);
 	rte_free(list);
 	rte_free(internal);
 	return -1;
-- 
2.15.1


* [dpdk-dev] [PATCH 6/9] net/ifc: add devarg for LM mode
  2018-11-28  9:45 [dpdk-dev] [PATCH 0/9] support SW assisted VDPA live migration Xiao Wang
                   ` (4 preceding siblings ...)
  2018-11-28  9:46 ` [dpdk-dev] [PATCH 5/9] net/ifc: detect if VDPA mode is specified Xiao Wang
@ 2018-11-28  9:46 ` Xiao Wang
  2018-12-04  6:31   ` Tiwei Bie
  2018-12-12 10:15   ` Alejandro Lucero
  2018-11-28  9:46 ` [dpdk-dev] [PATCH 7/9] net/ifc: use lib API for used ring logging Xiao Wang
                   ` (2 subsequent siblings)
  8 siblings, 2 replies; 86+ messages in thread
From: Xiao Wang @ 2018-11-28  9:46 UTC (permalink / raw)
  To: tiwei.bie, maxime.coquelin; +Cc: dev, zhihong.wang, xiaolong.ye, Xiao Wang

This patch series enables a new method for live migration, i.e. software
assisted live migration. This patch provides a device argument for the user
to choose the method.

When "swlm=1" is given, the driver/device does live migration with a relay
thread handling dirty page logging. Without this parameter, the device does
the dirty page logging itself and no relay thread consumes CPU resource.

Signed-off-by: Xiao Wang <xiao.w.wang@intel.com>
---
 drivers/net/ifc/ifcvf_vdpa.c | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/drivers/net/ifc/ifcvf_vdpa.c b/drivers/net/ifc/ifcvf_vdpa.c
index c0e50354a..e9cc8d7bc 100644
--- a/drivers/net/ifc/ifcvf_vdpa.c
+++ b/drivers/net/ifc/ifcvf_vdpa.c
@@ -8,6 +8,7 @@
 #include <sys/ioctl.h>
 #include <sys/epoll.h>
 #include <linux/virtio_net.h>
+#include <stdbool.h>
 
 #include <rte_malloc.h>
 #include <rte_memory.h>
@@ -31,9 +32,11 @@
 #endif
 
 #define IFCVF_VDPA_MODE		"vdpa"
+#define IFCVF_SW_FALLBACK_LM	"swlm"
 
 static const char * const ifcvf_valid_arguments[] = {
 	IFCVF_VDPA_MODE,
+	IFCVF_SW_FALLBACK_LM,
 	NULL
 };
 
@@ -56,6 +59,7 @@ struct ifcvf_internal {
 	rte_atomic32_t dev_attached;
 	rte_atomic32_t running;
 	rte_spinlock_t lock;
+	bool sw_lm;
 };
 
 struct internal_list {
@@ -767,6 +771,7 @@ ifcvf_pci_probe(struct rte_pci_driver *pci_drv __rte_unused,
 	struct ifcvf_internal *internal = NULL;
 	struct internal_list *list = NULL;
 	int vdpa_mode = 0;
+	int sw_fallback_lm = 0;
 	struct rte_kvargs *kvlist = NULL;
 	int ret = 0;
 
@@ -826,6 +831,16 @@ ifcvf_pci_probe(struct rte_pci_driver *pci_drv __rte_unused,
 	internal->dev_addr.type = PCI_ADDR;
 	list->internal = internal;
 
+	if (rte_kvargs_count(kvlist, IFCVF_SW_FALLBACK_LM)) {
+		ret = rte_kvargs_process(kvlist, IFCVF_SW_FALLBACK_LM,
+				&open_int, &sw_fallback_lm);
+		if (ret < 0)
+			goto error;
+		internal->sw_lm = sw_fallback_lm ? true : false;
+	} else {
+		internal->sw_lm = false;
+	}
+
 	internal->did = rte_vdpa_register_device(&internal->dev_addr,
 				&ifcvf_ops);
 	if (internal->did < 0) {
-- 
2.15.1


* [dpdk-dev] [PATCH 7/9] net/ifc: use lib API for used ring logging
  2018-11-28  9:45 [dpdk-dev] [PATCH 0/9] support SW assisted VDPA live migration Xiao Wang
                   ` (5 preceding siblings ...)
  2018-11-28  9:46 ` [dpdk-dev] [PATCH 6/9] net/ifc: add devarg for LM mode Xiao Wang
@ 2018-11-28  9:46 ` Xiao Wang
  2018-11-28  9:46 ` [dpdk-dev] [PATCH 8/9] net/ifc: support SW assisted VDPA live migration Xiao Wang
  2018-11-28  9:46 ` [dpdk-dev] [PATCH 9/9] doc: update ifc NIC document Xiao Wang
  8 siblings, 0 replies; 86+ messages in thread
From: Xiao Wang @ 2018-11-28  9:46 UTC (permalink / raw)
  To: tiwei.bie, maxime.coquelin; +Cc: dev, zhihong.wang, xiaolong.ye, Xiao Wang

The vhost lib already provides a helper for used ring logging; the driver
can use it to reduce code.

Signed-off-by: Xiao Wang <xiao.w.wang@intel.com>
---
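To make the length passed to the logging helper concrete, the arithmetic
behind a macro of this shape can be worked through as follows. The names
are hypothetical: struct toy_used_elem mirrors the two 32-bit fields of a
split-ring used element, and the three uint16_t fields are taken to be
flags, idx and the trailing avail_event.

```c
#include <stdint.h>

#define TOY_PAGE_SIZE 4096u /* assumed page size */

/* Local mirror of a split-ring used element: two 32-bit fields. */
struct toy_used_elem {
	uint32_t id;
	uint32_t len;
};

/* Bytes covered by flags + idx + ring[size] + avail_event. */
static uint64_t
toy_used_ring_len(uint16_t size)
{
	return (uint64_t)size * sizeof(struct toy_used_elem) +
		sizeof(uint16_t) * 3;
}

/* Number of pages touched by `len` bytes starting at address `addr`. */
static uint64_t
toy_pages_spanned(uint64_t addr, uint64_t len)
{
	if (len == 0)
		return 0;

	return (addr + len - 1) / TOY_PAGE_SIZE - addr / TOY_PAGE_SIZE + 1;
}
```

For a queue of 256 entries this gives 256 * 8 + 6 = 2054 bytes, which
fits in one page when the ring starts on a page boundary but can span
two pages when it does not.
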
 drivers/net/ifc/ifcvf_vdpa.c | 27 ++++++++-------------------
 1 file changed, 8 insertions(+), 19 deletions(-)

diff --git a/drivers/net/ifc/ifcvf_vdpa.c b/drivers/net/ifc/ifcvf_vdpa.c
index e9cc8d7bc..6c64ac4f7 100644
--- a/drivers/net/ifc/ifcvf_vdpa.c
+++ b/drivers/net/ifc/ifcvf_vdpa.c
@@ -31,6 +31,9 @@
 #define PAGE_SIZE 4096
 #endif
 
+#define IFCVF_USED_RING_LEN(size) \
+	((size) * sizeof(struct vring_used_elem) + sizeof(uint16_t) * 3)
+
 #define IFCVF_VDPA_MODE		"vdpa"
 #define IFCVF_SW_FALLBACK_LM	"swlm"
 
@@ -288,21 +291,6 @@ vdpa_ifcvf_start(struct ifcvf_internal *internal)
 	return ifcvf_start_hw(&internal->hw);
 }
 
-static void
-ifcvf_used_ring_log(struct ifcvf_hw *hw, uint32_t queue, uint8_t *log_buf)
-{
-	uint32_t i, size;
-	uint64_t pfn;
-
-	pfn = hw->vring[queue].used / PAGE_SIZE;
-	size = hw->vring[queue].size * sizeof(struct vring_used_elem) +
-			sizeof(uint16_t) * 3;
-
-	for (i = 0; i <= size / PAGE_SIZE; i++)
-		__sync_fetch_and_or_8(&log_buf[(pfn + i) / 8],
-				1 << ((pfn + i) % 8));
-}
-
 static void
 vdpa_ifcvf_stop(struct ifcvf_internal *internal)
 {
@@ -311,7 +299,7 @@ vdpa_ifcvf_stop(struct ifcvf_internal *internal)
 	int vid;
 	uint64_t features;
 	uint64_t log_base, log_size;
-	uint8_t *log_buf;
+	uint64_t len;
 
 	vid = internal->vid;
 	ifcvf_stop_hw(hw);
@@ -330,9 +318,10 @@ vdpa_ifcvf_stop(struct ifcvf_internal *internal)
 		 * IFCVF marks dirty memory pages for only packet buffer,
 		 * SW helps to mark the used ring as dirty after device stops.
 		 */
-		log_buf = (uint8_t *)(uintptr_t)log_base;
-		for (i = 0; i < hw->nr_vring; i++)
-			ifcvf_used_ring_log(hw, i, log_buf);
+		for (i = 0; i < hw->nr_vring; i++) {
+			len = IFCVF_USED_RING_LEN(hw->vring[i].size);
+			rte_vhost_log_used_vring(vid, i, 0, len);
+		}
 	}
 }
 
-- 
2.15.1


* [dpdk-dev] [PATCH 8/9] net/ifc: support SW assisted VDPA live migration
  2018-11-28  9:45 [dpdk-dev] [PATCH 0/9] support SW assisted VDPA live migration Xiao Wang
                   ` (6 preceding siblings ...)
  2018-11-28  9:46 ` [dpdk-dev] [PATCH 7/9] net/ifc: use lib API for used ring logging Xiao Wang
@ 2018-11-28  9:46 ` Xiao Wang
  2018-11-28  9:46 ` [dpdk-dev] [PATCH 9/9] doc: update ifc NIC document Xiao Wang
  8 siblings, 0 replies; 86+ messages in thread
From: Xiao Wang @ 2018-11-28  9:46 UTC (permalink / raw)
  To: tiwei.bie, maxime.coquelin; +Cc: dev, zhihong.wang, xiaolong.ye, Xiao Wang

In SW assisted live migration mode, the driver stops the device and sets
up a mediate virtio ring to relay the communication between the virtio
driver and the VDPA device.

This data path intervention allows SW to help with guest dirty page
logging for live migration.

The SW fallback is an event-driven relay thread, so when the network
throughput is low it takes little CPU resource, but when the throughput
goes up, the relay thread's CPU usage goes up accordingly.

The user needs to take all the factors, including CPU usage and guest
performance degradation, into consideration when selecting the live
migration support mode.

Signed-off-by: Xiao Wang <xiao.w.wang@intel.com>
---
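To illustrate why an event-driven relay costs little CPU when traffic is
low, here is a minimal Linux-only sketch of waiting on a kick eventfd
with epoll. It is not the relay thread from this patch: toy_watch_kick()
and toy_wait_kick() are hypothetical helpers, and the actual ring relay
that would follow each wakeup is omitted.

```c
#include <stdint.h>
#include <unistd.h>
#include <sys/epoll.h>
#include <sys/eventfd.h>

/* Register a kick eventfd with an epoll instance; returns 0 or -1. */
static int
toy_watch_kick(int epfd, int kickfd)
{
	struct epoll_event ev = { .events = EPOLLIN };

	ev.data.fd = kickfd;
	return epoll_ctl(epfd, EPOLL_CTL_ADD, kickfd, &ev);
}

/*
 * Sleep for up to timeout_ms until a kick arrives, then drain the counter.
 * Returns the number of kicks coalesced into this wakeup, 0 on timeout,
 * -1 on error.
 */
static int64_t
toy_wait_kick(int epfd, int timeout_ms)
{
	struct epoll_event ev;
	uint64_t count = 0;
	int n;

	n = epoll_wait(epfd, &ev, 1, timeout_ms);
	if (n <= 0)
		return n;

	/* Reading an eventfd returns its counter and resets it to zero. */
	if (read(ev.data.fd, &count, sizeof(count)) != (ssize_t)sizeof(count))
		return -1;

	return (int64_t)count;
}
```

While no kick is pending the thread sleeps inside epoll_wait(), and
several kicks that arrive close together are folded into one wakeup, so
the work done per wakeup grows with the traffic rate.
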
 drivers/net/ifc/base/ifcvf.h |   1 +
 drivers/net/ifc/ifcvf_vdpa.c | 346 ++++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 344 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ifc/base/ifcvf.h b/drivers/net/ifc/base/ifcvf.h
index f026c70ab..8eb70ae9d 100644
--- a/drivers/net/ifc/base/ifcvf.h
+++ b/drivers/net/ifc/base/ifcvf.h
@@ -50,6 +50,7 @@
 #define IFCVF_LM_ENABLE_VF		0x1
 #define IFCVF_LM_ENABLE_PF		0x3
 #define IFCVF_LOG_BASE			0x100000000000
+#define IFCVF_MEDIATE_VRING		0x200000000000
 
 #define IFCVF_32_BIT_MASK		0xffffffff
 
diff --git a/drivers/net/ifc/ifcvf_vdpa.c b/drivers/net/ifc/ifcvf_vdpa.c
index 6c64ac4f7..875a0009d 100644
--- a/drivers/net/ifc/ifcvf_vdpa.c
+++ b/drivers/net/ifc/ifcvf_vdpa.c
@@ -63,6 +63,9 @@ struct ifcvf_internal {
 	rte_atomic32_t running;
 	rte_spinlock_t lock;
 	bool sw_lm;
+	bool sw_fallback_running;
+	/* mediated vring for sw fallback */
+	struct vring m_vring[IFCVF_MAX_QUEUES * 2];
 };
 
 struct internal_list {
@@ -308,6 +311,9 @@ vdpa_ifcvf_stop(struct ifcvf_internal *internal)
 		rte_vhost_set_vring_base(vid, i, hw->vring[i].last_avail_idx,
 				hw->vring[i].last_used_idx);
 
+	if (internal->sw_lm)
+		return;
+
 	rte_vhost_get_negotiated_features(vid, &features);
 	if (RTE_VHOST_NEED_LOG(features)) {
 		ifcvf_disable_logging(hw);
@@ -539,6 +545,318 @@ update_datapath(struct ifcvf_internal *internal)
 	return ret;
 }
 
+static int
+m_ifcvf_start(struct ifcvf_internal *internal)
+{
+	struct ifcvf_hw *hw = &internal->hw;
+	uint32_t i, nr_vring;
+	int vid, ret;
+	struct rte_vhost_vring vq;
+	void *vring_buf;
+	uint64_t m_vring_iova = IFCVF_MEDIATE_VRING;
+	uint64_t size;
+	uint64_t gpa;
+
+	vid = internal->vid;
+	nr_vring = rte_vhost_get_vring_num(vid);
+	rte_vhost_get_negotiated_features(vid, &hw->req_features);
+
+	for (i = 0; i < nr_vring; i++) {
+		rte_vhost_get_vhost_vring(vid, i, &vq);
+
+		size = RTE_ALIGN_CEIL(vring_size(vq.size, PAGE_SIZE),
+				PAGE_SIZE);
+		vring_buf = rte_zmalloc("ifcvf", size, PAGE_SIZE);
+		vring_init(&internal->m_vring[i], vq.size, vring_buf,
+				PAGE_SIZE);
+
+		ret = rte_vfio_container_dma_map(internal->vfio_container_fd,
+			(uint64_t)(uintptr_t)vring_buf, m_vring_iova, size);
+		if (ret < 0) {
+			DRV_LOG(ERR, "mediate vring DMA map failed.");
+			goto error;
+		}
+
+		gpa = hva_to_gpa(vid, (uint64_t)(uintptr_t)vq.desc);
+		if (gpa == 0) {
+			DRV_LOG(ERR, "Fail to get GPA for descriptor ring.");
+			return -1;
+		}
+		hw->vring[i].desc = gpa;
+
+		hw->vring[i].avail = m_vring_iova +
+			(char *)internal->m_vring[i].avail -
+			(char *)internal->m_vring[i].desc;
+
+		hw->vring[i].used = m_vring_iova +
+			(char *)internal->m_vring[i].used -
+			(char *)internal->m_vring[i].desc;
+
+		hw->vring[i].size = vq.size;
+
+		rte_vhost_get_vring_base(vid, i, &hw->vring[i].last_avail_idx,
+				&hw->vring[i].last_used_idx);
+
+		m_vring_iova += size;
+	}
+	hw->nr_vring = nr_vring;
+
+	return ifcvf_start_hw(&internal->hw);
+
+error:
+	for (i = 0; i < nr_vring; i++)
+		if (internal->m_vring[i].desc)
+			rte_free(internal->m_vring[i].desc);
+
+	return -1;
+}
+
+static int
+m_ifcvf_stop(struct ifcvf_internal *internal)
+{
+	int vid;
+	uint32_t i;
+	struct rte_vhost_vring vq;
+	struct ifcvf_hw *hw = &internal->hw;
+	uint64_t m_vring_iova = IFCVF_MEDIATE_VRING;
+	uint64_t size, len;
+
+	vid = internal->vid;
+	ifcvf_stop_hw(hw);
+
+	for (i = 0; i < hw->nr_vring; i++) {
+		rte_vhost_get_vhost_vring(vid, i, &vq);
+		len = IFCVF_USED_RING_LEN(vq.size);
+		rte_vhost_log_used_vring(vid, i, 0, len);
+
+		size = RTE_ALIGN_CEIL(vring_size(vq.size, PAGE_SIZE),
+				PAGE_SIZE);
+		rte_vfio_container_dma_unmap(internal->vfio_container_fd,
+			(uint64_t)(uintptr_t)internal->m_vring[i].desc,
+			m_vring_iova, size);
+
+		rte_vhost_set_vring_base(vid, i, hw->vring[i].last_avail_idx,
+				hw->vring[i].last_used_idx);
+		rte_free(internal->m_vring[i].desc);
+		m_vring_iova += size;
+	}
+
+	return 0;
+}
+
+static int
+m_enable_vfio_intr(struct ifcvf_internal *internal)
+{
+	uint32_t nr_vring;
+	struct rte_intr_handle *intr_handle = &internal->pdev->intr_handle;
+	int ret;
+
+	nr_vring = rte_vhost_get_vring_num(internal->vid);
+
+	ret = rte_intr_efd_enable(intr_handle, nr_vring);
+	if (ret)
+		return -1;
+
+	ret = rte_intr_enable(intr_handle);
+	if (ret)
+		return -1;
+
+	return 0;
+}
+
+static void
+m_disable_vfio_intr(struct ifcvf_internal *internal)
+{
+	struct rte_intr_handle *intr_handle = &internal->pdev->intr_handle;
+
+	rte_intr_efd_disable(intr_handle);
+	rte_intr_disable(intr_handle);
+}
+
+static void
+update_avail_ring(struct ifcvf_internal *internal, int qid)
+{
+	rte_vdpa_relay_avail_ring(internal->vid, qid, &internal->m_vring[qid]);
+	ifcvf_notify_queue(&internal->hw, qid);
+}
+
+static void
+update_used_ring(struct ifcvf_internal *internal, int qid)
+{
+	rte_vdpa_relay_used_ring(internal->vid, qid, &internal->m_vring[qid]);
+	rte_vhost_vring_call(internal->vid, qid);
+}
+
+static void *
+vring_relay(void *arg)
+{
+	int i, vid, epfd, fd, nfds;
+	struct ifcvf_internal *internal = (struct ifcvf_internal *)arg;
+	struct rte_vhost_vring vring;
+	struct rte_intr_handle *intr_handle;
+	uint32_t qid, q_num;
+	struct epoll_event events[IFCVF_MAX_QUEUES * 4];
+	struct epoll_event ev;
+	int nbytes;
+	uint64_t buf;
+
+	vid = internal->vid;
+	q_num = rte_vhost_get_vring_num(vid);
+	/* prepare the mediate vring */
+	for (qid = 0; qid < q_num; qid++) {
+		rte_vhost_get_vring_base(vid, qid,
+				&internal->m_vring[qid].avail->idx,
+				&internal->m_vring[qid].used->idx);
+		rte_vdpa_relay_avail_ring(vid, qid, &internal->m_vring[qid]);
+	}
+
+	/* add notify fd and interrupt fd to epoll */
+	epfd = epoll_create(IFCVF_MAX_QUEUES * 2);
+	if (epfd < 0) {
+		DRV_LOG(ERR, "failed to create epoll instance.");
+		return NULL;
+	}
+	internal->epfd = epfd;
+
+	for (qid = 0; qid < q_num; qid++) {
+		ev.events = EPOLLIN | EPOLLPRI;
+		rte_vhost_get_vhost_vring(vid, qid, &vring);
+		ev.data.u64 = qid << 1 | (uint64_t)vring.kickfd << 32;
+		if (epoll_ctl(epfd, EPOLL_CTL_ADD, vring.kickfd, &ev) < 0) {
+			DRV_LOG(ERR, "epoll add error: %s", strerror(errno));
+			return NULL;
+		}
+	}
+
+	intr_handle = &internal->pdev->intr_handle;
+	for (qid = 0; qid < q_num; qid++) {
+		ev.events = EPOLLIN | EPOLLPRI;
+		ev.data.u64 = 1 | qid << 1 |
+			(uint64_t)intr_handle->efds[qid] << 32;
+		if (epoll_ctl(epfd, EPOLL_CTL_ADD, intr_handle->efds[qid], &ev)
+				< 0) {
+			DRV_LOG(ERR, "epoll add error: %s", strerror(errno));
+			return NULL;
+		}
+	}
+
+	/* start relay with a first kick */
+	for (qid = 0; qid < q_num; qid++)
+		ifcvf_notify_queue(&internal->hw, qid);
+
+	/* listen to the events and react accordingly */
+	for (;;) {
+		nfds = epoll_wait(epfd, events, q_num * 2, -1);
+		if (nfds < 0) {
+			if (errno == EINTR)
+				continue;
+			DRV_LOG(ERR, "epoll_wait return fail\n");
+			return NULL;
+		}
+
+		for (i = 0; i < nfds; i++) {
+			fd = (uint32_t)(events[i].data.u64 >> 32);
+			do {
+				nbytes = read(fd, &buf, 8);
+				if (nbytes < 0) {
+					if (errno == EINTR ||
+					    errno == EWOULDBLOCK ||
+					    errno == EAGAIN)
+						continue;
+					DRV_LOG(INFO, "Error reading "
+						"kickfd: %s",
+						strerror(errno));
+				}
+				break;
+			} while (1);
+
+			qid = events[i].data.u32 >> 1;
+
+			if (events[i].data.u32 & 1)
+				update_used_ring(internal, qid);
+			else
+				update_avail_ring(internal, qid);
+		}
+	}
+
+	return NULL;
+}
+
+static int
+setup_vring_relay(struct ifcvf_internal *internal)
+{
+	int ret;
+
+	ret = pthread_create(&internal->tid, NULL, vring_relay,
+			(void *)internal);
+	if (ret) {
+		DRV_LOG(ERR, "failed to create ring relay pthread.");
+		return -1;
+	}
+	return 0;
+}
+
+static int
+unset_vring_relay(struct ifcvf_internal *internal)
+{
+	void *status;
+
+	if (internal->tid) {
+		pthread_cancel(internal->tid);
+		pthread_join(internal->tid, &status);
+	}
+	internal->tid = 0;
+
+	if (internal->epfd >= 0)
+		close(internal->epfd);
+	internal->epfd = -1;
+
+	return 0;
+}
+
+static int
+ifcvf_sw_fallback_switchover(struct ifcvf_internal *internal)
+{
+	int ret;
+
+	/* stop the direct IO data path */
+	unset_notify_relay(internal);
+	vdpa_ifcvf_stop(internal);
+	vdpa_disable_vfio_intr(internal);
+
+	ret = rte_vhost_host_notifier_ctrl(internal->vid, false);
+	if (ret && ret != -ENOTSUP)
+		goto error;
+
+	/* set up interrupt for interrupt relay */
+	ret = m_enable_vfio_intr(internal);
+	if (ret)
+		goto unmap;
+
+	/* config the VF */
+	ret = m_ifcvf_start(internal);
+	if (ret)
+		goto unset_intr;
+
+	/* set up vring relay thread */
+	ret = setup_vring_relay(internal);
+	if (ret)
+		goto stop_vf;
+
+	internal->sw_fallback_running = true;
+
+	return 0;
+
+stop_vf:
+	m_ifcvf_stop(internal);
+unset_intr:
+	m_disable_vfio_intr(internal);
+unmap:
+	ifcvf_dma_map(internal, 0);
+error:
+	return -1;
+}
+
 static int
 ifcvf_dev_config(int vid)
 {
@@ -579,8 +897,25 @@ ifcvf_dev_close(int vid)
 	}
 
 	internal = list->internal;
-	rte_atomic32_set(&internal->dev_attached, 0);
-	update_datapath(internal);
+
+	if (internal->sw_fallback_running) {
+		/* unset ring relay */
+		unset_vring_relay(internal);
+
+		/* reset VF */
+		m_ifcvf_stop(internal);
+
+		/* remove interrupt setting */
+		m_disable_vfio_intr(internal);
+
+		/* unset DMA map for guest memory */
+		ifcvf_dma_map(internal, 0);
+
+		internal->sw_fallback_running = false;
+	} else {
+		rte_atomic32_set(&internal->dev_attached, 0);
+		update_datapath(internal);
+	}
 
 	return 0;
 }
@@ -604,7 +939,12 @@ ifcvf_set_features(int vid)
 	internal = list->internal;
 	rte_vhost_get_negotiated_features(vid, &features);
 
-	if (RTE_VHOST_NEED_LOG(features)) {
+	if (!RTE_VHOST_NEED_LOG(features))
+		return 0;
+
+	if (internal->sw_lm) {
+		ifcvf_sw_fallback_switchover(internal);
+	} else {
 		rte_vhost_get_log_base(vid, &log_base, &log_size);
 		rte_vfio_container_dma_map(internal->vfio_container_fd,
 				log_base, IFCVF_LOG_BASE, log_size);
-- 
2.15.1
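
The relay thread in this patch tags each epoll registration with a 64-bit
value: bit 0 tells a device interrupt (1) from a guest kick (0), the bits
above it carry the queue id, and the upper 32 bits carry the fd to drain.
A minimal sketch of that packing, assuming hypothetical helper names
(relay_tag_pack and friends are not part of the patch, which open-codes
the same expressions):

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names for illustration; the patch open-codes these. */
enum relay_src {
	RELAY_SRC_KICK = 0,	/* guest kicked: relay the avail ring */
	RELAY_SRC_INTR = 1	/* device interrupt: relay the used ring */
};

/* Pack source flag (bit 0), queue id (bits 1..31) and fd (bits 32..63). */
static inline uint64_t
relay_tag_pack(uint32_t qid, int fd, enum relay_src src)
{
	assert(qid < (1u << 31));	/* bit 0 is reserved for the flag */
	assert(fd >= 0);
	return (uint64_t)src | ((uint64_t)qid << 1) |
		((uint64_t)(uint32_t)fd << 32);
}

/* fd that has to be read to clear the event */
static inline int
relay_tag_fd(uint64_t tag)
{
	return (int)(tag >> 32);
}

/* queue id, taken from the low 32 bits with the flag shifted out */
static inline uint32_t
relay_tag_qid(uint64_t tag)
{
	return (uint32_t)tag >> 1;
}

/* which side raised the event */
static inline enum relay_src
relay_tag_src(uint64_t tag)
{
	return (enum relay_src)(tag & 1);
}
```

The sketch decodes from the 64-bit value with explicit shifts. The patch
reads the low half through data.u32 instead, which gives the same result
only where u32 overlays the low 32 bits of u64 (little-endian hosts).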

^ permalink raw reply	[flat|nested] 86+ messages in thread

* [dpdk-dev] [PATCH 9/9] doc: update ifc NIC document
  2018-11-28  9:45 [dpdk-dev] [PATCH 0/9] support SW assisted VDPA live migration Xiao Wang
                   ` (7 preceding siblings ...)
  2018-11-28  9:46 ` [dpdk-dev] [PATCH 8/9] net/ifc: support SW assisted VDPA live migration Xiao Wang
@ 2018-11-28  9:46 ` Xiao Wang
  8 siblings, 0 replies; 86+ messages in thread
From: Xiao Wang @ 2018-11-28  9:46 UTC (permalink / raw)
  To: tiwei.bie, maxime.coquelin; +Cc: dev, zhihong.wang, xiaolong.ye, Xiao Wang

Signed-off-by: Xiao Wang <xiao.w.wang@intel.com>
---
 doc/guides/nics/ifc.rst | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/doc/guides/nics/ifc.rst b/doc/guides/nics/ifc.rst
index 48f9adf1d..a16f2982f 100644
--- a/doc/guides/nics/ifc.rst
+++ b/doc/guides/nics/ifc.rst
@@ -39,6 +39,12 @@ the driver probe a new container is created for this device, with this
 container vDPA driver can program DMA remapping table with the VM's memory
 region information.
 
+The device argument "swlm=1" will configure the driver into SW assisted live
+migration mode. In this mode, the driver will set up a SW relay thread when LM
+happens; this thread helps the device log dirty pages. Thus this mode does
+not require HW to implement a dirty page logging function block, but will
+consume some percentage of CPU resource depending on the network throughput.
+
 Key IFCVF vDPA driver ops
 ~~~~~~~~~~~~~~~~~~~~~~~~~
 
@@ -70,6 +76,7 @@ Features
 Features of the IFCVF driver are:
 
 - Compatibility with virtio 0.95 and 1.0.
+- SW assisted vDPA for live migration.
 
 
 Prerequisites
-- 
2.15.1

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [dpdk-dev] [PATCH 2/9] vhost: provide helpers for virtio ring relay
  2018-11-28  9:46 ` [dpdk-dev] [PATCH 2/9] vhost: provide helpers for virtio ring relay Xiao Wang
@ 2018-12-04  6:22   ` Tiwei Bie
  2018-12-12  6:51     ` Wang, Xiao W
  2018-12-13  1:10   ` [dpdk-dev] [PATCH v2 0/9] support SW assisted VDPA live migration Xiao Wang
  1 sibling, 1 reply; 86+ messages in thread
From: Tiwei Bie @ 2018-12-04  6:22 UTC (permalink / raw)
  To: Xiao Wang; +Cc: maxime.coquelin, dev, zhihong.wang, xiaolong.ye

On Wed, Nov 28, 2018 at 05:46:00PM +0800, Xiao Wang wrote:
[...]
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice
> + *
> + * Synchronize the available ring from guest to mediate ring, help to
> + * check desc validity to protect against malicious guest driver.
> + *
> + * @param vid
> + *  vhost device id
> + * @param qid
> + *  vhost queue id
> + * @param m_vring
> + *  mediate virtio ring pointer
> + * @return
> + *  number of synced available entries on success, -1 on failure
> + */
> +int __rte_experimental
> +rte_vdpa_relay_avail_ring(int vid, int qid, struct vring *m_vring);
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice
> + *
> + * Synchronize the used ring from mediate ring to guest, log dirty
> + * page for each Rx buffer used.
> + *
> + * @param vid
> + *  vhost device id
> + * @param qid
> + *  vhost queue id
> + * @param m_vring
> + *  mediate virtio ring pointer
> + * @return
> + *  number of synced used entries on success, -1 on failure
> + */
> +int __rte_experimental
> +rte_vdpa_relay_used_ring(int vid, int qid, struct vring *m_vring);

Above APIs are split ring specific. We also need to take
packed ring into consideration.

>  #endif /* _RTE_VDPA_H_ */
[...]
> diff --git a/lib/librte_vhost/vdpa.c b/lib/librte_vhost/vdpa.c
> index e7d849ee0..e41117776 100644
> --- a/lib/librte_vhost/vdpa.c
> +++ b/lib/librte_vhost/vdpa.c
> @@ -122,3 +122,176 @@ rte_vdpa_get_device_num(void)
>  {
>  	return vdpa_device_num;
>  }
> +
> +static int
> +invalid_desc_check(struct virtio_net *dev, struct vhost_virtqueue *vq,
> +		uint64_t desc_iova, uint64_t desc_len, uint8_t perm)
> +{
> +	uint64_t desc_addr, desc_chunck_len;
> +
> +	while (desc_len) {
> +		desc_chunck_len = desc_len;
> +		desc_addr = vhost_iova_to_vva(dev, vq,
> +				desc_iova,
> +				&desc_chunck_len,
> +				perm);
> +
> +		if (!desc_addr)
> +			return -1;
> +
> +		desc_len -= desc_chunck_len;
> +		desc_iova += desc_chunck_len;
> +	}
> +
> +	return 0;
> +}
> +
> +int
> +rte_vdpa_relay_avail_ring(int vid, int qid, struct vring *m_vring)
> +{
> +	struct virtio_net *dev = get_device(vid);
> +	uint16_t idx, idx_m, desc_id;
> +	struct vring_desc desc;
> +	struct vhost_virtqueue *vq;
> +	struct vring_desc *desc_ring;
> +	struct vring_desc *idesc = NULL;
> +	uint64_t dlen;
> +	int ret;
> +
> +	if (!dev)
> +		return -1;
> +
> +	vq = dev->virtqueue[qid];

Better to also validate qid.

> +	idx = vq->avail->idx;
> +	idx_m = m_vring->avail->idx;
> +	ret = idx - idx_m;

Need to cast (idx - idx_m) to uint16_t.

> +
> +	while (idx_m != idx) {
> +		/* avail entry copy */
> +		desc_id = vq->avail->ring[idx_m % vq->size];

idx_m & (vq->size - 1) should be faster.

> +		m_vring->avail->ring[idx_m % vq->size] = desc_id;
> +		desc_ring = vq->desc;
> +
> +		if (vq->desc[desc_id].flags & VRING_DESC_F_INDIRECT) {
> +			dlen = vq->desc[desc_id].len;
> +			desc_ring = (struct vring_desc *)(uintptr_t)
> +			vhost_iova_to_vva(dev, vq, vq->desc[desc_id].addr,

The indent needs to be fixed.

> +						&dlen,
> +						VHOST_ACCESS_RO);
> +			if (unlikely(!desc_ring))
> +				return -1;
> +
> +			if (unlikely(dlen < vq->desc[idx].len)) {
> +				idesc = alloc_copy_ind_table(dev, vq,
> +					vq->desc[idx].addr, vq->desc[idx].len);
> +				if (unlikely(!idesc))
> +					return -1;
> +
> +				desc_ring = idesc;
> +			}
> +
> +			desc_id = 0;
> +		}
> +
> +		/* check if the buf addr is within the guest memory */
> +		do {
> +			desc = desc_ring[desc_id];
> +			if (invalid_desc_check(dev, vq, desc.addr, desc.len,
> +						VHOST_ACCESS_RW))

Should check with < 0, otherwise should return bool.

We may just have RO access.

> +				return -1;

The memory allocated for idesc if any will leak in this case.

> +			desc_id = desc.next;
> +		} while (desc.flags & VRING_DESC_F_NEXT);
> +
> +		if (unlikely(!!idesc)) {

The !! isn't needed.

> +			free_ind_table(idesc);
> +			idesc = NULL;
> +		}
> +
> +		idx_m++;
> +	}
> +

Barrier is needed here.

> +	m_vring->avail->idx = idx;
> +
> +	if (dev->features & (1ULL << VIRTIO_RING_F_EVENT_IDX))
> +		vhost_avail_event(vq) = vq->avail->idx;

Need to use idx instead of vq->avail->idx which may
have already been changed by driver.

> +
> +	return ret;
> +}
> +
> +int
> +rte_vdpa_relay_used_ring(int vid, int qid, struct vring *m_vring)
> +{
> +	struct virtio_net *dev = get_device(vid);
> +	uint16_t idx, idx_m, desc_id;
> +	struct vhost_virtqueue *vq;
> +	struct vring_desc desc;
> +	struct vring_desc *desc_ring;
> +	struct vring_desc *idesc = NULL;
> +	uint64_t dlen;
> +	int ret;
> +
> +	if (!dev)
> +		return -1;
> +
> +	vq = dev->virtqueue[qid];

Better to also validate qid.

> +	idx = vq->used->idx;
> +	idx_m = m_vring->used->idx;
> +	ret = idx_m - idx;

Need to cast (idx_m - idx) to uint16_t.

> +
> +	while (idx != idx_m) {
> +		/* copy used entry, used ring logging is not covered here */

The used ring logging has been covered here by the following call
to vhost_log_used_vring() after used ring is changed.

> +		vq->used->ring[idx % vq->size] =

idx & (vq->size - 1) should be faster.

> +			m_vring->used->ring[idx % vq->size];
> +
> +		/* dirty page logging for used ring */
> +		vhost_log_used_vring(dev, vq,
> +			offsetof(struct vring_used, ring[idx % vq->size]),
> +			sizeof(struct vring_used_elem));
> +
> +		desc_id = vq->used->ring[idx % vq->size].id;
> +		desc_ring = vq->desc;
> +
> +		if (vq->desc[desc_id].flags & VRING_DESC_F_INDIRECT) {
> +			dlen = vq->desc[desc_id].len;
> +			desc_ring = (struct vring_desc *)(uintptr_t)
> +			vhost_iova_to_vva(dev, vq, vq->desc[desc_id].addr,

The indent needs to be fixed.

> +						&dlen,
> +						VHOST_ACCESS_RO);
> +			if (unlikely(!desc_ring))
> +				return -1;
> +
> +			if (unlikely(dlen < vq->desc[idx].len)) {
> +				idesc = alloc_copy_ind_table(dev, vq,
> +					vq->desc[idx].addr, vq->desc[idx].len);
> +				if (unlikely(!idesc))
> +					return -1;
> +
> +				desc_ring = idesc;
> +			}
> +
> +			desc_id = 0;
> +		}
> +
> +		/* dirty page logging for Rx buffer */

Rx is for net, this API isn't net specific.

> +		do {
> +			desc = desc_ring[desc_id];
> +			if (desc.flags & VRING_DESC_F_WRITE)
> +				vhost_log_write(dev, desc.addr, desc.len);
> +			desc_id = desc.next;
> +		} while (desc.flags & VRING_DESC_F_NEXT);
> +
> +		if (unlikely(!!idesc)) {

The !! isn't needed.

> +			free_ind_table(idesc);
> +			idesc = NULL;
> +		}
> +
> +		idx++;
> +	}
> +

Barrier is needed here.

> +	vq->used->idx = idx_m;
> +
> +	if (dev->features & (1ULL << VIRTIO_RING_F_EVENT_IDX))
> +		vring_used_event(m_vring) = m_vring->used->idx;
> +
> +	return ret;
> +}
[...]
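
The two index comments above (cast the difference to uint16_t, mask
instead of modulo) can be sketched as standalone helpers. This is only an
illustration under stated assumptions: ring_pending and ring_slot are
hypothetical names, and the mask form is valid only when the ring size is
a power of two, which split rings require.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Number of entries between two free-running 16-bit ring indices.
 * Without the cast the operands are promoted to int, so new_idx = 2 and
 * old_idx = 65534 would give -65532 instead of 4.
 */
static inline uint16_t
ring_pending(uint16_t new_idx, uint16_t old_idx)
{
	return (uint16_t)(new_idx - old_idx);
}

/*
 * Slot of a free-running index inside the ring. Equivalent to
 * idx % size only when size is a power of two.
 */
static inline uint16_t
ring_slot(uint16_t idx, uint16_t size)
{
	assert(size != 0 && (size & (size - 1)) == 0);
	return idx & (uint16_t)(size - 1);
}
```

With these, the loop bound and the return value stay correct when the
guest index wraps past 65535 while the mediate index has not wrapped yet.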

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [dpdk-dev] [PATCH 6/9] net/ifc: add devarg for LM mode
  2018-11-28  9:46 ` [dpdk-dev] [PATCH 6/9] net/ifc: add devarg for LM mode Xiao Wang
@ 2018-12-04  6:31   ` Tiwei Bie
  2018-12-12  6:53     ` Wang, Xiao W
  2018-12-12 10:15   ` Alejandro Lucero
  1 sibling, 1 reply; 86+ messages in thread
From: Tiwei Bie @ 2018-12-04  6:31 UTC (permalink / raw)
  To: Xiao Wang; +Cc: maxime.coquelin, dev, zhihong.wang, xiaolong.ye

On Wed, Nov 28, 2018 at 05:46:04PM +0800, Xiao Wang wrote:
[...]
> @@ -767,6 +771,7 @@ ifcvf_pci_probe(struct rte_pci_driver *pci_drv __rte_unused,
>  	struct ifcvf_internal *internal = NULL;
>  	struct internal_list *list = NULL;
>  	int vdpa_mode = 0;
> +	int sw_fallback_lm = 0;
>  	struct rte_kvargs *kvlist = NULL;
>  	int ret = 0;
>  
> @@ -826,6 +831,16 @@ ifcvf_pci_probe(struct rte_pci_driver *pci_drv __rte_unused,
>  	internal->dev_addr.type = PCI_ADDR;
>  	list->internal = internal;
>  
> +	if (rte_kvargs_count(kvlist, IFCVF_SW_FALLBACK_LM)) {
> +		ret = rte_kvargs_process(kvlist, IFCVF_SW_FALLBACK_LM,
> +				&open_int, &sw_fallback_lm);
> +		if (ret < 0)
> +			goto error;
> +		internal->sw_lm = sw_fallback_lm ? true : false;
> +	} else {
> +		internal->sw_lm = false;
> +	}

Something like this would be better:

	if (rte_kvargs_count(kvlist, IFCVF_SW_FALLBACK_LM)) {
		ret = rte_kvargs_process(kvlist, IFCVF_SW_FALLBACK_LM,
				&open_int, &sw_fallback_lm);
		if (ret < 0)
			goto error;
	}

	internal->sw_lm = sw_fallback_lm;


>  	internal->did = rte_vdpa_register_device(&internal->dev_addr,
>  				&ifcvf_ops);
>  	if (internal->did < 0) {
> -- 
> 2.15.1
> 

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [dpdk-dev] [PATCH 2/9] vhost: provide helpers for virtio ring relay
  2018-12-04  6:22   ` Tiwei Bie
@ 2018-12-12  6:51     ` Wang, Xiao W
  0 siblings, 0 replies; 86+ messages in thread
From: Wang, Xiao W @ 2018-12-12  6:51 UTC (permalink / raw)
  To: Bie, Tiwei; +Cc: maxime.coquelin, dev, Wang, Zhihong, Ye, Xiaolong

Hi,

> -----Original Message-----
> From: Bie, Tiwei
> Sent: Monday, December 3, 2018 10:23 PM
> To: Wang, Xiao W <xiao.w.wang@intel.com>
> Cc: maxime.coquelin@redhat.com; dev@dpdk.org; Wang, Zhihong
> <zhihong.wang@intel.com>; Ye, Xiaolong <xiaolong.ye@intel.com>
> Subject: Re: [PATCH 2/9] vhost: provide helpers for virtio ring relay
> 
> On Wed, Nov 28, 2018 at 05:46:00PM +0800, Xiao Wang wrote:
> [...]
> > +
> > +/**
> > + * @warning
> > + * @b EXPERIMENTAL: this API may change without prior notice
> > + *
> > + * Synchronize the available ring from guest to mediate ring, help to
> > + * check desc validity to protect against malicious guest driver.
> > + *
> > + * @param vid
> > + *  vhost device id
> > + * @param qid
> > + *  vhost queue id
> > + * @param m_vring
> > + *  mediate virtio ring pointer
> > + * @return
> > + *  number of synced available entries on success, -1 on failure
> > + */
> > +int __rte_experimental
> > +rte_vdpa_relay_avail_ring(int vid, int qid, struct vring *m_vring);
> > +
> > +/**
> > + * @warning
> > + * @b EXPERIMENTAL: this API may change without prior notice
> > + *
> > + * Synchronize the used ring from mediate ring to guest, log dirty
> > + * page for each Rx buffer used.
> > + *
> > + * @param vid
> > + *  vhost device id
> > + * @param qid
> > + *  vhost queue id
> > + * @param m_vring
> > + *  mediate virtio ring pointer
> > + * @return
> > + *  number of synced used entries on success, -1 on failure
> > + */
> > +int __rte_experimental
> > +rte_vdpa_relay_used_ring(int vid, int qid, struct vring *m_vring);
> 
> Above APIs are split ring specific. We also need to take
> packed ring into consideration.

After some study of the current packed ring description, several ideas:
1. These APIs are helpers for setting up a mediate relay layer that does dirty page logging; we may not need
 this kind of ring relay for packed ring at all. The goal of a mediate SW layer is to help the device do dirty page
 logging, so this SW-assisted VDPA looks for a way to intercept the frontend-backend communication. As you
 can see in this patch set, SW captures the device interrupt, then parses the vring and logs dirty pages. We set
 up the mediate vring so that the relay SW can intercept the device interrupt: this way we control the mediate
 vring's interrupt suppression structure.

2. One new point about the packed ring is that it separates the event suppression structure from the
 descriptor ring. So in this case, we can just set up a mediate event suppression structure to intercept event
 notification.

BTW, one troublesome point about the packed ring is that it's hard for a mediate SW to handle the "buffer id"
 quickly. The guest virtio driver understands this id well and keeps some internal info about each id, e.g. chain
 list length, but the relay SW has to parse the packed ring again, which is not efficient.

3. In the split vring, relay SW reuses the guest desc vring, and desc is not written by DMA, so no log for the desc.
 But in the packed vring, desc is written by DMA, so desc ring logging is a new thing.
Packed ring is quite different; it could need a very different mechanism rather than following a vring relay API.
 Also, from a testing point of view, if we come up with a new efficient implementation for packed ring VDPA, it's
 hard to test it with HW. Testing needs HW supporting packed ring DMA and the get_vring_base/set_vring_base
 interface.
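
For readers following the logging discussion, here is a minimal sketch of
what "log dirty page" means for a DMA-written region: every page touched
by the range [gpa, gpa + len) gets its bit set in a bitmap. It assumes a
4 KiB logging granularity and one bit per page. The names dirty_log_page
and dirty_log_range are hypothetical, and the sketch is not atomic, so it
is not a substitute for the vhost library's own logging helpers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LOG_PAGE_SHIFT 12	/* assumption: 4 KiB logging granularity */

/* Set the bit of one page; not atomic, single-writer sketch only. */
static inline void
dirty_log_page(uint8_t *bitmap, uint64_t page)
{
	bitmap[page / 8] |= (uint8_t)(1u << (page % 8));
}

/* Mark every page overlapped by [gpa, gpa + len) as dirty. */
static void
dirty_log_range(uint8_t *bitmap, uint64_t bitmap_bytes,
		uint64_t gpa, uint64_t len)
{
	uint64_t page, last;

	assert(bitmap != NULL);
	if (len == 0)
		return;

	page = gpa >> LOG_PAGE_SHIFT;
	last = (gpa + len - 1) >> LOG_PAGE_SHIFT;
	for (; page <= last; page++) {
		if (page / 8 >= bitmap_bytes)
			break;	/* range runs past the end of the log */
		dirty_log_page(bitmap, page);
	}
}
```

A used-ring update would be logged with the GPA and length of the written
used element, and a device-writable buffer with the descriptor's addr and
len.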

> 
> >  #endif /* _RTE_VDPA_H_ */
> [...]
> > diff --git a/lib/librte_vhost/vdpa.c b/lib/librte_vhost/vdpa.c
> > index e7d849ee0..e41117776 100644
> > --- a/lib/librte_vhost/vdpa.c
> > +++ b/lib/librte_vhost/vdpa.c
> > @@ -122,3 +122,176 @@ rte_vdpa_get_device_num(void)
> >  {
> >  	return vdpa_device_num;
> >  }
> > +
> > +static int
> > +invalid_desc_check(struct virtio_net *dev, struct vhost_virtqueue *vq,
> > +		uint64_t desc_iova, uint64_t desc_len, uint8_t perm)
> > +{
> > +	uint64_t desc_addr, desc_chunck_len;
> > +
> > +	while (desc_len) {
> > +		desc_chunck_len = desc_len;
> > +		desc_addr = vhost_iova_to_vva(dev, vq,
> > +				desc_iova,
> > +				&desc_chunck_len,
> > +				perm);
> > +
> > +		if (!desc_addr)
> > +			return -1;
> > +
> > +		desc_len -= desc_chunck_len;
> > +		desc_iova += desc_chunck_len;
> > +	}
> > +
> > +	return 0;
> > +}
> > +
> > +int
> > +rte_vdpa_relay_avail_ring(int vid, int qid, struct vring *m_vring)
> > +{
> > +	struct virtio_net *dev = get_device(vid);
> > +	uint16_t idx, idx_m, desc_id;
> > +	struct vring_desc desc;
> > +	struct vhost_virtqueue *vq;
> > +	struct vring_desc *desc_ring;
> > +	struct vring_desc *idesc = NULL;
> > +	uint64_t dlen;
> > +	int ret;
> > +
> > +	if (!dev)
> > +		return -1;
> > +
> > +	vq = dev->virtqueue[qid];
> 
> Better to also validate qid.
> 
> > +	idx = vq->avail->idx;
> > +	idx_m = m_vring->avail->idx;
> > +	ret = idx - idx_m;
> 
> Need to cast (idx - idx_m) to uint16_t.
> 
> > +
> > +	while (idx_m != idx) {
> > +		/* avail entry copy */
> > +		desc_id = vq->avail->ring[idx_m % vq->size];
> 
> idx_m & (vq->size - 1) should be faster.
> 
> > +		m_vring->avail->ring[idx_m % vq->size] = desc_id;
> > +		desc_ring = vq->desc;
> > +
> > +		if (vq->desc[desc_id].flags & VRING_DESC_F_INDIRECT) {
> > +			dlen = vq->desc[desc_id].len;
> > +			desc_ring = (struct vring_desc *)(uintptr_t)
> > +			vhost_iova_to_vva(dev, vq, vq->desc[desc_id].addr,
> 
> The indent needs to be fixed.
> 
> > +						&dlen,
> > +						VHOST_ACCESS_RO);
> > +			if (unlikely(!desc_ring))
> > +				return -1;
> > +
> > +			if (unlikely(dlen < vq->desc[idx].len)) {
> > +				idesc = alloc_copy_ind_table(dev, vq,
> > +					vq->desc[idx].addr, vq->desc[idx].len);
> > +				if (unlikely(!idesc))
> > +					return -1;
> > +
> > +				desc_ring = idesc;
> > +			}
> > +
> > +			desc_id = 0;
> > +		}
> > +
> > +		/* check if the buf addr is within the guest memory */
> > +		do {
> > +			desc = desc_ring[desc_id];
> > +			if (invalid_desc_check(dev, vq, desc.addr, desc.len,
> > +						VHOST_ACCESS_RW))
> 
> Should check with < 0, otherwise should return bool.
> 
> We may just have RO access.

The desc may refer to a transmit buffer as well as a receive buffer. I agree with the other comments and nice catches above, and will send a new version.

[...]

BRs,
Xiao

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [dpdk-dev] [PATCH 6/9] net/ifc: add devarg for LM mode
  2018-12-04  6:31   ` Tiwei Bie
@ 2018-12-12  6:53     ` Wang, Xiao W
  0 siblings, 0 replies; 86+ messages in thread
From: Wang, Xiao W @ 2018-12-12  6:53 UTC (permalink / raw)
  To: Bie, Tiwei; +Cc: maxime.coquelin, dev, Wang, Zhihong, Ye, Xiaolong

Hi,

> -----Original Message-----
> From: Bie, Tiwei
> Sent: Monday, December 3, 2018 10:32 PM
> To: Wang, Xiao W <xiao.w.wang@intel.com>
> Cc: maxime.coquelin@redhat.com; dev@dpdk.org; Wang, Zhihong
> <zhihong.wang@intel.com>; Ye, Xiaolong <xiaolong.ye@intel.com>
> Subject: Re: [PATCH 6/9] net/ifc: add devarg for LM mode
> 
> On Wed, Nov 28, 2018 at 05:46:04PM +0800, Xiao Wang wrote:
> [...]
> > @@ -767,6 +771,7 @@ ifcvf_pci_probe(struct rte_pci_driver *pci_drv
> __rte_unused,
> >  	struct ifcvf_internal *internal = NULL;
> >  	struct internal_list *list = NULL;
> >  	int vdpa_mode = 0;
> > +	int sw_fallback_lm = 0;
> >  	struct rte_kvargs *kvlist = NULL;
> >  	int ret = 0;
> >
> > @@ -826,6 +831,16 @@ ifcvf_pci_probe(struct rte_pci_driver *pci_drv
> __rte_unused,
> >  	internal->dev_addr.type = PCI_ADDR;
> >  	list->internal = internal;
> >
> > +	if (rte_kvargs_count(kvlist, IFCVF_SW_FALLBACK_LM)) {
> > +		ret = rte_kvargs_process(kvlist, IFCVF_SW_FALLBACK_LM,
> > +				&open_int, &sw_fallback_lm);
> > +		if (ret < 0)
> > +			goto error;
> > +		internal->sw_lm = sw_fallback_lm ? true : false;
> > +	} else {
> > +		internal->sw_lm = false;
> > +	}
> 
> Something like this would be better:
> 
> 	if (rte_kvargs_count(kvlist, IFCVF_SW_FALLBACK_LM)) {
> 		ret = rte_kvargs_process(kvlist, IFCVF_SW_FALLBACK_LM,
> 				&open_int, &sw_fallback_lm);
> 		if (ret < 0)
> 			goto error;
> 	}
> 
> 	internal->sw_lm = sw_fallback_lm;
> 

Yeah, fewer lines of code; will update.

BRs,
Xiao

> 
> >  	internal->did = rte_vdpa_register_device(&internal->dev_addr,
> >  				&ifcvf_ops);
> >  	if (internal->did < 0) {
> > --
> > 2.15.1
> >

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [dpdk-dev] [PATCH 6/9] net/ifc: add devarg for LM mode
  2018-11-28  9:46 ` [dpdk-dev] [PATCH 6/9] net/ifc: add devarg for LM mode Xiao Wang
  2018-12-04  6:31   ` Tiwei Bie
@ 2018-12-12 10:15   ` Alejandro Lucero
  2018-12-12 10:23     ` Wang, Xiao W
  1 sibling, 1 reply; 86+ messages in thread
From: Alejandro Lucero @ 2018-12-12 10:15 UTC (permalink / raw)
  To: xiao.w.wang; +Cc: tiwei.bie, Maxime Coquelin, dev, zhihong.wang, xiaolong.ye

On Wed, Nov 28, 2018 at 9:56 AM Xiao Wang <xiao.w.wang@intel.com> wrote:

> This patch series enables a new method for live migration, i.e. software
> assisted live migration. This patch provides a device argument for user
> to choose the method.
>
> When "swlm=1", driver/device will do live migration with a relay thread
> dealing with dirty page logging. Without this parameter, device will do
> dirty page logging and there's no relay thread consuming CPU resource.
>
>
I'm a bit confused with this mode. If it is a relay thread doing the dirty
page logging, does it mean that the datapath is through the relay thread
and not between the VM and the vdpa device?


> Signed-off-by: Xiao Wang <xiao.w.wang@intel.com>
> ---
>  drivers/net/ifc/ifcvf_vdpa.c | 15 +++++++++++++++
>  1 file changed, 15 insertions(+)
>
> diff --git a/drivers/net/ifc/ifcvf_vdpa.c b/drivers/net/ifc/ifcvf_vdpa.c
> index c0e50354a..e9cc8d7bc 100644
> --- a/drivers/net/ifc/ifcvf_vdpa.c
> +++ b/drivers/net/ifc/ifcvf_vdpa.c
> @@ -8,6 +8,7 @@
>  #include <sys/ioctl.h>
>  #include <sys/epoll.h>
>  #include <linux/virtio_net.h>
> +#include <stdbool.h>
>
>  #include <rte_malloc.h>
>  #include <rte_memory.h>
> @@ -31,9 +32,11 @@
>  #endif
>
>  #define IFCVF_VDPA_MODE                "vdpa"
> +#define IFCVF_SW_FALLBACK_LM   "swlm"
>
>  static const char * const ifcvf_valid_arguments[] = {
>         IFCVF_VDPA_MODE,
> +       IFCVF_SW_FALLBACK_LM,
>         NULL
>  };
>
> @@ -56,6 +59,7 @@ struct ifcvf_internal {
>         rte_atomic32_t dev_attached;
>         rte_atomic32_t running;
>         rte_spinlock_t lock;
> +       bool sw_lm;
>  };
>
>  struct internal_list {
> @@ -767,6 +771,7 @@ ifcvf_pci_probe(struct rte_pci_driver *pci_drv
> __rte_unused,
>         struct ifcvf_internal *internal = NULL;
>         struct internal_list *list = NULL;
>         int vdpa_mode = 0;
> +       int sw_fallback_lm = 0;
>         struct rte_kvargs *kvlist = NULL;
>         int ret = 0;
>
> @@ -826,6 +831,16 @@ ifcvf_pci_probe(struct rte_pci_driver *pci_drv
> __rte_unused,
>         internal->dev_addr.type = PCI_ADDR;
>         list->internal = internal;
>
> +       if (rte_kvargs_count(kvlist, IFCVF_SW_FALLBACK_LM)) {
> +               ret = rte_kvargs_process(kvlist, IFCVF_SW_FALLBACK_LM,
> +                               &open_int, &sw_fallback_lm);
> +               if (ret < 0)
> +                       goto error;
> +               internal->sw_lm = sw_fallback_lm ? true : false;
> +       } else {
> +               internal->sw_lm = false;
> +       }
> +
>         internal->did = rte_vdpa_register_device(&internal->dev_addr,
>                                 &ifcvf_ops);
>         if (internal->did < 0) {
> --
> 2.15.1
>
>

* Re: [dpdk-dev] [PATCH 6/9] net/ifc: add devarg for LM mode
  2018-12-12 10:15   ` Alejandro Lucero
@ 2018-12-12 10:23     ` Wang, Xiao W
  0 siblings, 0 replies; 86+ messages in thread
From: Wang, Xiao W @ 2018-12-12 10:23 UTC (permalink / raw)
  To: Alejandro Lucero
  Cc: Bie, Tiwei, Maxime Coquelin, dev, Wang, Zhihong, Ye, Xiaolong

Hi Alejandro,

Yes, in this mode the datapath goes through the relay thread while LM is in progress; it is not a direct interaction between the VM and the vDPA device.

BRs,
Xiao

From: Alejandro Lucero [mailto:alejandro.lucero@netronome.com]
Sent: Wednesday, December 12, 2018 2:15 AM
To: Wang, Xiao W <xiao.w.wang@intel.com>
Cc: Bie, Tiwei <tiwei.bie@intel.com>; Maxime Coquelin <maxime.coquelin@redhat.com>; dev <dev@dpdk.org>; Wang, Zhihong <zhihong.wang@intel.com>; Ye, Xiaolong <xiaolong.ye@intel.com>
Subject: Re: [dpdk-dev] [PATCH 6/9] net/ifc: add devarg for LM mode


On Wed, Nov 28, 2018 at 9:56 AM Xiao Wang <xiao.w.wang@intel.com> wrote:
This patch series enables a new method for live migration, i.e. software
assisted live migration. This patch provides a device argument for the user
to choose the method.

When "swlm=1", driver/device will do live migration with a relay thread
dealing with dirty page logging. Without this parameter, device will do
dirty page logging and there's no relay thread consuming CPU resource.

I'm a bit confused with this mode. If it is a relay thread doing the dirty page logging, does it mean that the datapath is through the relay thread and not between the VM and the vdpa device?

Signed-off-by: Xiao Wang <xiao.w.wang@intel.com>
---
 drivers/net/ifc/ifcvf_vdpa.c | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/drivers/net/ifc/ifcvf_vdpa.c b/drivers/net/ifc/ifcvf_vdpa.c
index c0e50354a..e9cc8d7bc 100644
--- a/drivers/net/ifc/ifcvf_vdpa.c
+++ b/drivers/net/ifc/ifcvf_vdpa.c
@@ -8,6 +8,7 @@
 #include <sys/ioctl.h>
 #include <sys/epoll.h>
 #include <linux/virtio_net.h>
+#include <stdbool.h>

 #include <rte_malloc.h>
 #include <rte_memory.h>
@@ -31,9 +32,11 @@
 #endif

 #define IFCVF_VDPA_MODE                "vdpa"
+#define IFCVF_SW_FALLBACK_LM   "swlm"

 static const char * const ifcvf_valid_arguments[] = {
        IFCVF_VDPA_MODE,
+       IFCVF_SW_FALLBACK_LM,
        NULL
 };

@@ -56,6 +59,7 @@ struct ifcvf_internal {
        rte_atomic32_t dev_attached;
        rte_atomic32_t running;
        rte_spinlock_t lock;
+       bool sw_lm;
 };

 struct internal_list {
@@ -767,6 +771,7 @@ ifcvf_pci_probe(struct rte_pci_driver *pci_drv __rte_unused,
        struct ifcvf_internal *internal = NULL;
        struct internal_list *list = NULL;
        int vdpa_mode = 0;
+       int sw_fallback_lm = 0;
        struct rte_kvargs *kvlist = NULL;
        int ret = 0;

@@ -826,6 +831,16 @@ ifcvf_pci_probe(struct rte_pci_driver *pci_drv __rte_unused,
        internal->dev_addr.type = PCI_ADDR;
        list->internal = internal;

+       if (rte_kvargs_count(kvlist, IFCVF_SW_FALLBACK_LM)) {
+               ret = rte_kvargs_process(kvlist, IFCVF_SW_FALLBACK_LM,
+                               &open_int, &sw_fallback_lm);
+               if (ret < 0)
+                       goto error;
+               internal->sw_lm = sw_fallback_lm ? true : false;
+       } else {
+               internal->sw_lm = false;
+       }
+
        internal->did = rte_vdpa_register_device(&internal->dev_addr,
                                &ifcvf_ops);
        if (internal->did < 0) {
--
2.15.1

* [dpdk-dev] [PATCH v2 0/9] support SW assisted VDPA live migration
  2018-11-28  9:46 ` [dpdk-dev] [PATCH 2/9] vhost: provide helpers for virtio ring relay Xiao Wang
  2018-12-04  6:22   ` Tiwei Bie
@ 2018-12-13  1:10   ` Xiao Wang
  2018-12-13  1:10     ` [dpdk-dev] [PATCH v2 1/9] vhost: provide helper for host notifier ctrl Xiao Wang
                       ` (8 more replies)
  1 sibling, 9 replies; 86+ messages in thread
From: Xiao Wang @ 2018-12-13  1:10 UTC (permalink / raw)
  To: alejandro.lucero, tiwei.bie
  Cc: maxime.coquelin, dev, zhihong.wang, xiaolong.ye, Xiao Wang

In the previous VDPA implementation we enabled live migration support
by having the HW accelerator do all the work, including dirty page logging
and device status report/restore. In this mode the VDPA sample daemon and
the device driver only take care of the control path and are not involved
in the data path, so CPU usage is close to zero. This mode requires the
device to have dirty page logging capability.

This patch series adds live migration support for devices without logging
capability. The VDPA driver can set up a relay thread that stands between
the guest and the device while live migration is in progress. The relay
intervenes in the communication between the guest virtio driver and the
physical virtio accelerator: it relays the vrings on behalf of the device
and logs dirty pages along the way. Some CPU resource is therefore
consumed in this scenario, with the amount depending on the network
throughput.

Some new helpers are added into vhost lib for this VDPA SW fallback:
- rte_vhost_host_notifier_ctrl, to enable/disable the VDPA direct-IO
  datapath.
- rte_vdpa_relay_avail_ring, to relay the available ring from guest vring
  to mediate vring.
- rte_vdpa_relay_used_ring, to relay the used ring from mediate vring to
  guest vring.
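
To illustrate the index handling behind the available ring relay, here is
a minimal, self-contained sketch in plain C. It is not the vhost lib code:
struct avail_ring, RING_SIZE and relay_avail() are hypothetical names made
up for this note, and a real relay would also validate descriptors and
issue a write barrier before publishing the index.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RING_SIZE 8u	/* hypothetical ring size; must be a power of two */

/* Simplified stand-in for the avail part of a split virtio ring. */
struct avail_ring {
	uint16_t idx;			/* free-running producer index */
	uint16_t ring[RING_SIZE];	/* descriptor head ids */
};

/*
 * Copy every avail entry that the guest has published but the mediated
 * ring has not seen yet. Returns the number of entries relayed.
 */
uint16_t
relay_avail(const struct avail_ring *guest, struct avail_ring *mediated)
{
	uint16_t idx, idx_m, n;

	assert(guest != NULL && mediated != NULL);

	idx = guest->idx;
	idx_m = mediated->idx;
	/* 16-bit unsigned subtraction stays correct across index wraparound */
	n = (uint16_t)(idx - idx_m);

	while (idx_m != idx) {
		mediated->ring[idx_m & (RING_SIZE - 1)] =
			guest->ring[idx_m & (RING_SIZE - 1)];
		idx_m++;
	}

	/* a real relay needs a write barrier here before publishing idx */
	mediated->idx = idx;

	return n;
}
```

With a guest index of 2 and a mediated index of 65533, the call relays
five entries: the 16-bit subtraction yields the right count even though
the index has wrapped.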

Some existing helpers are also leveraged for SW fallback setup, like VFIO
interrupt configuration, IOMMU table programming, etc.

This series enables SW assisted VDPA live migration in the ifc driver.
Since ifcvf also supports HW dirty page logging, a new devarg lets the
user select whether the SW mode is used.

v2:
* Reword the vdpa host notifier control API comment.
* Make the vring relay API parameter a "void *" to accommodate potential
  future ring layouts, e.g. packed ring.
* Add parameter check for the new API.
* Add memory barrier for ring idx update.
* Remove the used ring logging in the relay.
* Some comment fix and code cleaning according to Tiwei's comment.
* Add release note update.

Xiao Wang (9):
  vhost: provide helper for host notifier ctrl
  vhost: provide helpers for virtio ring relay
  net/ifc: dump debug message for error
  net/ifc: store only registered device instance
  net/ifc: detect if VDPA mode is specified
  net/ifc: add devarg for LM mode
  net/ifc: use lib API for used ring logging
  net/ifc: support SW assisted VDPA live migration
  doc: update ifc NIC document

 doc/guides/nics/ifc.rst                |   7 +
 doc/guides/rel_notes/release_19_02.rst |   5 +
 drivers/net/ifc/base/ifcvf.h           |   1 +
 drivers/net/ifc/ifcvf_vdpa.c           | 461 ++++++++++++++++++++++++++++++---
 lib/librte_vhost/rte_vdpa.h            |  56 ++++
 lib/librte_vhost/rte_vhost_version.map |   3 +
 lib/librte_vhost/vdpa.c                | 187 +++++++++++++
 lib/librte_vhost/vhost.c               |   3 +-
 lib/librte_vhost/vhost.h               |  40 +++
 lib/librte_vhost/vhost_user.c          |   7 +-
 lib/librte_vhost/virtio_net.c          |  39 ---
 11 files changed, 731 insertions(+), 78 deletions(-)

-- 
2.15.1

* [dpdk-dev] [PATCH v2 1/9] vhost: provide helper for host notifier ctrl
  2018-12-13  1:10   ` [dpdk-dev] [PATCH v2 0/9] support SW assisted VDPA live migration Xiao Wang
@ 2018-12-13  1:10     ` Xiao Wang
  2018-12-13  1:10     ` [dpdk-dev] [PATCH v2 2/9] vhost: provide helpers for virtio ring relay Xiao Wang
                       ` (7 subsequent siblings)
  8 siblings, 0 replies; 86+ messages in thread
From: Xiao Wang @ 2018-12-13  1:10 UTC (permalink / raw)
  To: alejandro.lucero, tiwei.bie
  Cc: maxime.coquelin, dev, zhihong.wang, xiaolong.ye, Xiao Wang

A VDPA driver can decide whether it needs to enable or disable the host
notifier mapping, so exposing an API for this adds flexibility. A later
patch builds on it.

Signed-off-by: Xiao Wang <xiao.w.wang@intel.com>
---
v2:
* Reword the vdpa host notifier control API comment.
---
 drivers/net/ifc/ifcvf_vdpa.c           |  3 +++
 lib/librte_vhost/rte_vdpa.h            | 18 ++++++++++++++++++
 lib/librte_vhost/rte_vhost_version.map |  1 +
 lib/librte_vhost/vhost.c               |  3 +--
 lib/librte_vhost/vhost_user.c          |  7 +------
 5 files changed, 24 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ifc/ifcvf_vdpa.c b/drivers/net/ifc/ifcvf_vdpa.c
index 97a57f182..e844109f3 100644
--- a/drivers/net/ifc/ifcvf_vdpa.c
+++ b/drivers/net/ifc/ifcvf_vdpa.c
@@ -556,6 +556,9 @@ ifcvf_dev_config(int vid)
 	rte_atomic32_set(&internal->dev_attached, 1);
 	update_datapath(internal);
 
+	if (rte_vhost_host_notifier_ctrl(vid, true) != 0)
+		DRV_LOG(NOTICE, "vDPA (%d): software relay is used.", did);
+
 	return 0;
 }
 
diff --git a/lib/librte_vhost/rte_vdpa.h b/lib/librte_vhost/rte_vdpa.h
index a418da47c..fff657391 100644
--- a/lib/librte_vhost/rte_vdpa.h
+++ b/lib/librte_vhost/rte_vdpa.h
@@ -11,6 +11,8 @@
  * Device specific vhost lib
  */
 
+#include <stdbool.h>
+
 #include <rte_pci.h>
 #include "rte_vhost.h"
 
@@ -155,4 +157,20 @@ rte_vdpa_get_device(int did);
  */
 int __rte_experimental
 rte_vdpa_get_device_num(void);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * Enable/Disable host notifier mapping for a vdpa port.
+ *
+ * @param vid
+ *  vhost device id
+ * @param enable
+ *  true for host notifier map, false for host notifier unmap
+ * @return
+ *  0 on success, -1 on failure
+ */
+int __rte_experimental
+rte_vhost_host_notifier_ctrl(int vid, bool enable);
 #endif /* _RTE_VDPA_H_ */
diff --git a/lib/librte_vhost/rte_vhost_version.map b/lib/librte_vhost/rte_vhost_version.map
index ae39b6e21..22302e972 100644
--- a/lib/librte_vhost/rte_vhost_version.map
+++ b/lib/librte_vhost/rte_vhost_version.map
@@ -83,4 +83,5 @@ EXPERIMENTAL {
 	rte_vhost_crypto_finalize_requests;
 	rte_vhost_crypto_set_zero_copy;
 	rte_vhost_va_from_guest_pa;
+	rte_vhost_host_notifier_ctrl;
 };
diff --git a/lib/librte_vhost/vhost.c b/lib/librte_vhost/vhost.c
index 70ac6bc9c..e7a60e0b4 100644
--- a/lib/librte_vhost/vhost.c
+++ b/lib/librte_vhost/vhost.c
@@ -408,8 +408,7 @@ vhost_detach_vdpa_device(int vid)
 	if (dev == NULL)
 		return;
 
-	vhost_user_host_notifier_ctrl(vid, false);
-
+	vhost_destroy_device_notify(dev);
 	dev->vdpa_dev_id = -1;
 }
 
diff --git a/lib/librte_vhost/vhost_user.c b/lib/librte_vhost/vhost_user.c
index 3ea64eba6..5e0da0589 100644
--- a/lib/librte_vhost/vhost_user.c
+++ b/lib/librte_vhost/vhost_user.c
@@ -2045,11 +2045,6 @@ vhost_user_msg_handler(int vid, int fd)
 		if (vdpa_dev->ops->dev_conf)
 			vdpa_dev->ops->dev_conf(dev->vid);
 		dev->flags |= VIRTIO_DEV_VDPA_CONFIGURED;
-		if (vhost_user_host_notifier_ctrl(dev->vid, true) != 0) {
-			RTE_LOG(INFO, VHOST_CONFIG,
-				"(%d) software relay is used for vDPA, performance may be low.\n",
-				dev->vid);
-		}
 	}
 
 	return 0;
@@ -2144,7 +2139,7 @@ static int vhost_user_slave_set_vring_host_notifier(struct virtio_net *dev,
 	return process_slave_message_reply(dev, &msg);
 }
 
-int vhost_user_host_notifier_ctrl(int vid, bool enable)
+int rte_vhost_host_notifier_ctrl(int vid, bool enable)
 {
 	struct virtio_net *dev;
 	struct rte_vdpa_device *vdpa_dev;
-- 
2.15.1

* [dpdk-dev] [PATCH v2 2/9] vhost: provide helpers for virtio ring relay
  2018-12-13  1:10   ` [dpdk-dev] [PATCH v2 0/9] support SW assisted VDPA live migration Xiao Wang
  2018-12-13  1:10     ` [dpdk-dev] [PATCH v2 1/9] vhost: provide helper for host notifier ctrl Xiao Wang
@ 2018-12-13  1:10     ` Xiao Wang
  2018-12-13 10:09       ` [dpdk-dev] [PATCH v3 0/9] support SW assisted VDPA live migration Xiao Wang
  2018-12-13  1:10     ` [dpdk-dev] [PATCH v2 3/9] net/ifc: dump debug message for error Xiao Wang
                       ` (6 subsequent siblings)
  8 siblings, 1 reply; 86+ messages in thread
From: Xiao Wang @ 2018-12-13  1:10 UTC (permalink / raw)
  To: alejandro.lucero, tiwei.bie
  Cc: maxime.coquelin, dev, zhihong.wang, xiaolong.ye, Xiao Wang

This patch provides two helpers for vdpa device driver to perform a
relay between the guest virtio ring and a mediate virtio ring.

The available ring relay will synchronize the available entries, and
helps to do desc validity checking.

The used ring relay will synchronize the used entries from mediate ring
to guest ring, and helps to do dirty page logging for live migration.

The next patch will leverage these two helpers.
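
As a rough illustration of what page-granular dirty logging for a used
buffer amounts to, here is a self-contained sketch in plain C. It does not
reproduce the vhost lib logging code: struct dirty_log, log_write(),
page_is_dirty(), the 4 KiB granularity and the 64-page region are all
assumptions chosen for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LOG_PAGE_SHIFT 12u	/* assumes 4 KiB logging granularity */
#define LOG_PAGES 64u		/* hypothetical tracked region, in pages */

/* One bit per guest page; a set bit means the page must be re-sent. */
struct dirty_log {
	uint8_t bitmap[LOG_PAGES / 8];
};

/* Mark every page touched by the byte range [addr, addr + len) as dirty. */
int
log_write(struct dirty_log *log, uint64_t addr, uint64_t len)
{
	uint64_t page, last;

	assert(log != NULL);

	if (len == 0)
		return 0;
	if (len - 1 > UINT64_MAX - addr)
		return -1;	/* range would wrap around the address space */

	page = addr >> LOG_PAGE_SHIFT;
	last = (addr + len - 1) >> LOG_PAGE_SHIFT;
	if (last >= LOG_PAGES)
		return -1;	/* range falls outside the tracked region */

	for (; page <= last; page++)
		log->bitmap[page / 8] |= (uint8_t)(1u << (page % 8));

	return 0;
}

/* Return 1 if the given page is marked dirty, 0 otherwise. */
int
page_is_dirty(const struct dirty_log *log, uint64_t page)
{
	assert(log != NULL && page < LOG_PAGES);

	return (log->bitmap[page / 8] >> (page % 8)) & 1;
}
```

A 10-byte write starting at address 4090 crosses a page boundary, so it
marks both page 0 and page 1. In the relay, only device-writable buffers
need this treatment, since read-only buffers are never modified by the
device.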

Signed-off-by: Xiao Wang <xiao.w.wang@intel.com>
---
v2:
* Make the vring relay API parameter a "void *" to accommodate potential
  future ring layouts, e.g. packed ring.
* Add parameter check for the new API.
* Add memory barrier for ring idx update.
* Remove the used ring logging in the relay.
* Some comment fix and code cleaning according to Tiwei's comment.
---
 lib/librte_vhost/rte_vdpa.h            |  38 +++++++
 lib/librte_vhost/rte_vhost_version.map |   2 +
 lib/librte_vhost/vdpa.c                | 187 +++++++++++++++++++++++++++++++++
 lib/librte_vhost/vhost.h               |  40 +++++++
 lib/librte_vhost/virtio_net.c          |  39 -------
 5 files changed, 267 insertions(+), 39 deletions(-)

diff --git a/lib/librte_vhost/rte_vdpa.h b/lib/librte_vhost/rte_vdpa.h
index fff657391..265250939 100644
--- a/lib/librte_vhost/rte_vdpa.h
+++ b/lib/librte_vhost/rte_vdpa.h
@@ -173,4 +173,42 @@ rte_vdpa_get_device_num(void);
  */
 int __rte_experimental
 rte_vhost_host_notifier_ctrl(int vid, bool enable);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * Synchronize the available ring from guest to mediate ring, help to
+ * check desc validity to protect against malicious guest driver.
+ *
+ * @param vid
+ *  vhost device id
+ * @param qid
+ *  vhost queue id
+ * @param m_vring
+ *  mediate virtio ring pointer
+ * @return
+ *  number of synced available entries on success, -1 on failure
+ */
+int __rte_experimental
+rte_vdpa_relay_avail_ring(int vid, uint16_t qid, void *m_vring);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * Synchronize the used ring from mediate ring to guest, log dirty
+ * page for each Rx buffer used.
+ *
+ * @param vid
+ *  vhost device id
+ * @param qid
+ *  vhost queue id
+ * @param m_vring
+ *  mediate virtio ring pointer
+ * @return
+ *  number of synced used entries on success, -1 on failure
+ */
+int __rte_experimental
+rte_vdpa_relay_used_ring(int vid, uint16_t qid, void *m_vring);
 #endif /* _RTE_VDPA_H_ */
diff --git a/lib/librte_vhost/rte_vhost_version.map b/lib/librte_vhost/rte_vhost_version.map
index 22302e972..0ad0fbea2 100644
--- a/lib/librte_vhost/rte_vhost_version.map
+++ b/lib/librte_vhost/rte_vhost_version.map
@@ -84,4 +84,6 @@ EXPERIMENTAL {
 	rte_vhost_crypto_set_zero_copy;
 	rte_vhost_va_from_guest_pa;
 	rte_vhost_host_notifier_ctrl;
+	rte_vdpa_relay_avail_ring;
+	rte_vdpa_relay_used_ring;
 };
diff --git a/lib/librte_vhost/vdpa.c b/lib/librte_vhost/vdpa.c
index e7d849ee0..16193cfc0 100644
--- a/lib/librte_vhost/vdpa.c
+++ b/lib/librte_vhost/vdpa.c
@@ -122,3 +122,190 @@ rte_vdpa_get_device_num(void)
 {
 	return vdpa_device_num;
 }
+
+static bool
+invalid_desc_check(struct virtio_net *dev, struct vhost_virtqueue *vq,
+		uint64_t desc_iova, uint64_t desc_len, uint8_t perm)
+{
+	uint64_t desc_addr, desc_chunck_len;
+
+	while (desc_len) {
+		desc_chunck_len = desc_len;
+		desc_addr = vhost_iova_to_vva(dev, vq,
+				desc_iova,
+				&desc_chunck_len,
+				perm);
+
+		if (!desc_addr)
+			return true;
+
+		desc_len -= desc_chunck_len;
+		desc_iova += desc_chunck_len;
+	}
+
+	return false;
+}
+
+int
+rte_vdpa_relay_avail_ring(int vid, uint16_t qid, void *m_vring)
+{
+	struct virtio_net *dev = get_device(vid);
+	uint16_t idx, idx_m, desc_id;
+	struct vring_desc desc;
+	struct vhost_virtqueue *vq;
+	struct vring_desc *desc_ring;
+	struct vring_desc *idesc = NULL;
+	struct vring *s_vring;
+	uint64_t dlen;
+	int ret;
+
+	if (!dev || !m_vring)
+		return -1;
+
+	if (qid >= dev->nr_vring)
+		return -1;
+
+	if (vq_is_packed(dev))
+		return -1;
+
+	s_vring = (struct vring *)m_vring;
+	vq = dev->virtqueue[qid];
+	idx = vq->avail->idx;
+	idx_m = s_vring->avail->idx;
+	ret = (uint16_t)(idx - idx_m);
+
+	while (idx_m != idx) {
+		/* avail entry copy */
+		desc_id = vq->avail->ring[idx_m & (vq->size - 1)];
+		s_vring->avail->ring[idx_m & (vq->size - 1)] = desc_id;
+		desc_ring = vq->desc;
+
+		if (vq->desc[desc_id].flags & VRING_DESC_F_INDIRECT) {
+			dlen = vq->desc[desc_id].len;
+			desc_ring = (struct vring_desc *)(uintptr_t)
+			vhost_iova_to_vva(dev, vq, vq->desc[desc_id].addr,
+					&dlen, VHOST_ACCESS_RO);
+			if (unlikely(!desc_ring))
+				return -1;
+
+			if (unlikely(dlen < vq->desc[desc_id].len)) {
+				idesc = alloc_copy_ind_table(dev, vq,
+					vq->desc[desc_id].addr, vq->desc[desc_id].len);
+				if (unlikely(!idesc))
+					return -1;
+
+				desc_ring = idesc;
+			}
+
+			desc_id = 0;
+		}
+
+		/* check if the buf addr is within the guest memory */
+		do {
+			desc = desc_ring[desc_id];
+			if (invalid_desc_check(dev, vq, desc.addr, desc.len,
+						VHOST_ACCESS_RW)) {
+				if (unlikely(idesc))
+					free_ind_table(idesc);
+				return -1;
+			}
+			desc_id = desc.next;
+		} while (desc.flags & VRING_DESC_F_NEXT);
+
+		if (unlikely(idesc)) {
+			free_ind_table(idesc);
+			idesc = NULL;
+		}
+
+		idx_m++;
+	}
+
+	rte_smp_wmb();
+	s_vring->avail->idx = idx;
+
+	if (dev->features & (1ULL << VIRTIO_RING_F_EVENT_IDX))
+		vhost_avail_event(vq) = idx;
+
+	return ret;
+}
+
+int
+rte_vdpa_relay_used_ring(int vid, uint16_t qid, void *m_vring)
+{
+	struct virtio_net *dev = get_device(vid);
+	uint16_t idx, idx_m, desc_id;
+	struct vhost_virtqueue *vq;
+	struct vring_desc desc;
+	struct vring_desc *desc_ring;
+	struct vring_desc *idesc = NULL;
+	struct vring *s_vring;
+	uint64_t dlen;
+	int ret;
+
+	if (!dev || !m_vring)
+		return -1;
+
+	if (qid >= dev->nr_vring)
+		return -1;
+
+	if (vq_is_packed(dev))
+		return -1;
+
+	s_vring = (struct vring *)m_vring;
+	vq = dev->virtqueue[qid];
+	idx = vq->used->idx;
+	idx_m = s_vring->used->idx;
+	ret = (uint16_t)(idx_m - idx);
+
+	while (idx != idx_m) {
+		/* copy used entry, used ring logging is not covered here */
+		vq->used->ring[idx & (vq->size - 1)] =
+			s_vring->used->ring[idx & (vq->size - 1)];
+
+		desc_id = vq->used->ring[idx & (vq->size - 1)].id;
+		desc_ring = vq->desc;
+
+		if (vq->desc[desc_id].flags & VRING_DESC_F_INDIRECT) {
+			dlen = vq->desc[desc_id].len;
+			desc_ring = (struct vring_desc *)(uintptr_t)
+			vhost_iova_to_vva(dev, vq, vq->desc[desc_id].addr,
+					&dlen, VHOST_ACCESS_RO);
+			if (unlikely(!desc_ring))
+				return -1;
+
+			if (unlikely(dlen < vq->desc[desc_id].len)) {
+				idesc = alloc_copy_ind_table(dev, vq,
+					vq->desc[desc_id].addr, vq->desc[desc_id].len);
+				if (unlikely(!idesc))
+					return -1;
+
+				desc_ring = idesc;
+			}
+
+			desc_id = 0;
+		}
+
+		/* dirty page logging for DMA writeable buffer */
+		do {
+			desc = desc_ring[desc_id];
+			if (desc.flags & VRING_DESC_F_WRITE)
+				vhost_log_write(dev, desc.addr, desc.len);
+			desc_id = desc.next;
+		} while (desc.flags & VRING_DESC_F_NEXT);
+
+		if (unlikely(idesc)) {
+			free_ind_table(idesc);
+			idesc = NULL;
+		}
+
+		idx++;
+	}
+
+	rte_smp_wmb();
+	vq->used->idx = idx_m;
+
+	if (dev->features & (1ULL << VIRTIO_RING_F_EVENT_IDX))
+		vring_used_event(s_vring) = idx_m;
+
+	return ret;
+}
diff --git a/lib/librte_vhost/vhost.h b/lib/librte_vhost/vhost.h
index 5218f1b12..2164cd6d9 100644
--- a/lib/librte_vhost/vhost.h
+++ b/lib/librte_vhost/vhost.h
@@ -18,6 +18,7 @@
 #include <rte_log.h>
 #include <rte_ether.h>
 #include <rte_rwlock.h>
+#include <rte_malloc.h>
 
 #include "rte_vhost.h"
 #include "rte_vdpa.h"
@@ -753,4 +754,43 @@ vhost_vring_call_packed(struct virtio_net *dev, struct vhost_virtqueue *vq)
 		eventfd_write(vq->callfd, (eventfd_t)1);
 }
 
+static __rte_always_inline void *
+alloc_copy_ind_table(struct virtio_net *dev, struct vhost_virtqueue *vq,
+		uint64_t desc_addr, uint64_t desc_len)
+{
+	void *idesc;
+	uint64_t src, dst;
+	uint64_t len, remain = desc_len;
+
+	idesc = rte_malloc(__func__, desc_len, 0);
+	if (unlikely(!idesc))
+		return 0;
+
+	dst = (uint64_t)(uintptr_t)idesc;
+
+	while (remain) {
+		len = remain;
+		src = vhost_iova_to_vva(dev, vq, desc_addr, &len,
+				VHOST_ACCESS_RO);
+		if (unlikely(!src || !len)) {
+			rte_free(idesc);
+			return 0;
+		}
+
+		rte_memcpy((void *)(uintptr_t)dst, (void *)(uintptr_t)src, len);
+
+		remain -= len;
+		dst += len;
+		desc_addr += len;
+	}
+
+	return idesc;
+}
+
+static __rte_always_inline void
+free_ind_table(void *idesc)
+{
+	rte_free(idesc);
+}
+
 #endif /* _VHOST_NET_CDEV_H_ */
diff --git a/lib/librte_vhost/virtio_net.c b/lib/librte_vhost/virtio_net.c
index 5e1a1a727..8c657a101 100644
--- a/lib/librte_vhost/virtio_net.c
+++ b/lib/librte_vhost/virtio_net.c
@@ -37,45 +37,6 @@ is_valid_virt_queue_idx(uint32_t idx, int is_tx, uint32_t nr_vring)
 	return (is_tx ^ (idx & 1)) == 0 && idx < nr_vring;
 }
 
-static __rte_always_inline void *
-alloc_copy_ind_table(struct virtio_net *dev, struct vhost_virtqueue *vq,
-		uint64_t desc_addr, uint64_t desc_len)
-{
-	void *idesc;
-	uint64_t src, dst;
-	uint64_t len, remain = desc_len;
-
-	idesc = rte_malloc(__func__, desc_len, 0);
-	if (unlikely(!idesc))
-		return 0;
-
-	dst = (uint64_t)(uintptr_t)idesc;
-
-	while (remain) {
-		len = remain;
-		src = vhost_iova_to_vva(dev, vq, desc_addr, &len,
-				VHOST_ACCESS_RO);
-		if (unlikely(!src || !len)) {
-			rte_free(idesc);
-			return 0;
-		}
-
-		rte_memcpy((void *)(uintptr_t)dst, (void *)(uintptr_t)src, len);
-
-		remain -= len;
-		dst += len;
-		desc_addr += len;
-	}
-
-	return idesc;
-}
-
-static __rte_always_inline void
-free_ind_table(void *idesc)
-{
-	rte_free(idesc);
-}
-
 static __rte_always_inline void
 do_flush_shadow_used_ring_split(struct virtio_net *dev,
 			struct vhost_virtqueue *vq,
-- 
2.15.1

* [dpdk-dev] [PATCH v2 3/9] net/ifc: dump debug message for error
  2018-12-13  1:10   ` [dpdk-dev] [PATCH v2 0/9] support SW assisted VDPA live migration Xiao Wang
  2018-12-13  1:10     ` [dpdk-dev] [PATCH v2 1/9] vhost: provide helper for host notifier ctrl Xiao Wang
  2018-12-13  1:10     ` [dpdk-dev] [PATCH v2 2/9] vhost: provide helpers for virtio ring relay Xiao Wang
@ 2018-12-13  1:10     ` Xiao Wang
  2018-12-13  1:10     ` [dpdk-dev] [PATCH v2 4/9] net/ifc: store only registered device instance Xiao Wang
                       ` (5 subsequent siblings)
  8 siblings, 0 replies; 86+ messages in thread
From: Xiao Wang @ 2018-12-13  1:10 UTC (permalink / raw)
  To: alejandro.lucero, tiwei.bie
  Cc: maxime.coquelin, dev, zhihong.wang, xiaolong.ye, Xiao Wang

Driver probe may fail for various reasons; a debug message helps to
diagnose the issue.

Signed-off-by: Xiao Wang <xiao.w.wang@intel.com>
---
 drivers/net/ifc/ifcvf_vdpa.c | 19 +++++++++++++------
 1 file changed, 13 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ifc/ifcvf_vdpa.c b/drivers/net/ifc/ifcvf_vdpa.c
index e844109f3..aacd5f9bf 100644
--- a/drivers/net/ifc/ifcvf_vdpa.c
+++ b/drivers/net/ifc/ifcvf_vdpa.c
@@ -22,7 +22,7 @@
 
 #define DRV_LOG(level, fmt, args...) \
 	rte_log(RTE_LOG_ ## level, ifcvf_vdpa_logtype, \
-		"%s(): " fmt "\n", __func__, ##args)
+		"IFCVF %s(): " fmt "\n", __func__, ##args)
 
 #ifndef PAGE_SIZE
 #define PAGE_SIZE 4096
@@ -756,11 +756,16 @@ ifcvf_pci_probe(struct rte_pci_driver *pci_drv __rte_unused,
 
 	internal->pdev = pci_dev;
 	rte_spinlock_init(&internal->lock);
-	if (ifcvf_vfio_setup(internal) < 0)
-		return -1;
 
-	if (ifcvf_init_hw(&internal->hw, internal->pdev) < 0)
-		return -1;
+	if (ifcvf_vfio_setup(internal) < 0) {
+		DRV_LOG(ERR, "failed to setup device %s", pci_dev->name);
+		goto error;
+	}
+
+	if (ifcvf_init_hw(&internal->hw, internal->pdev) < 0) {
+		DRV_LOG(ERR, "failed to init device %s", pci_dev->name);
+		goto error;
+	}
 
 	internal->max_queues = IFCVF_MAX_QUEUES;
 	features = ifcvf_get_features(&internal->hw);
@@ -782,8 +787,10 @@ ifcvf_pci_probe(struct rte_pci_driver *pci_drv __rte_unused,
 
 	internal->did = rte_vdpa_register_device(&internal->dev_addr,
 				&ifcvf_ops);
-	if (internal->did < 0)
+	if (internal->did < 0) {
+		DRV_LOG(ERR, "failed to register device %s", pci_dev->name);
 		goto error;
+	}
 
 	rte_atomic32_set(&internal->started, 1);
 	update_datapath(internal);
-- 
2.15.1

* [dpdk-dev] [PATCH v2 4/9] net/ifc: store only registered device instance
  2018-12-13  1:10   ` [dpdk-dev] [PATCH v2 0/9] support SW assisted VDPA live migration Xiao Wang
                       ` (2 preceding siblings ...)
  2018-12-13  1:10     ` [dpdk-dev] [PATCH v2 3/9] net/ifc: dump debug message for error Xiao Wang
@ 2018-12-13  1:10     ` Xiao Wang
  2018-12-13  1:10     ` [dpdk-dev] [PATCH v2 5/9] net/ifc: detect if VDPA mode is specified Xiao Wang
                       ` (4 subsequent siblings)
  8 siblings, 0 replies; 86+ messages in thread
From: Xiao Wang @ 2018-12-13  1:10 UTC (permalink / raw)
  To: alejandro.lucero, tiwei.bie
  Cc: maxime.coquelin, dev, zhihong.wang, xiaolong.ye, Xiao Wang, stable

If driver fails to register ifc VF device into vhost lib, then this
device should not be stored.
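
The ordering described above (register first, store only on success) can
be sketched in isolation as follows. This is a simplified model rather
than the driver code: struct dev_entry, probe_device() and the two stub
registration functions are hypothetical, and the mutex that guards the
real list is omitted for brevity.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/queue.h>

struct dev_entry {
	TAILQ_ENTRY(dev_entry) next;
	int did;	/* device id returned by registration */
};

TAILQ_HEAD(dev_list, dev_entry);

/* Hypothetical registration hook: a negative id means failure. */
typedef int (*register_fn)(void);

/*
 * Register first and link the entry into the list only once the
 * registration has succeeded, so a failed probe leaves no stale entry.
 */
int
probe_device(struct dev_list *list, register_fn do_register)
{
	struct dev_entry *e;

	assert(list != NULL && do_register != NULL);

	e = calloc(1, sizeof(*e));
	if (e == NULL)
		return -1;

	e->did = do_register();
	if (e->did < 0) {
		free(e);	/* never inserted, so nothing to unlink */
		return -1;
	}

	TAILQ_INSERT_TAIL(list, e, next);
	return 0;
}

/* Stub registration functions used only to exercise both paths. */
int register_ok(void) { return 3; }
int register_fail(void) { return -1; }
```

With the insert placed after the registration check, the error path only
has to free the allocation; there is no window in which other code can
find a device that was never registered.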

Fixes: a3f8150eac6d ("net/ifcvf: add ifcvf vDPA driver")
cc: stable@dpdk.org

Signed-off-by: Xiao Wang <xiao.w.wang@intel.com>
---
 drivers/net/ifc/ifcvf_vdpa.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ifc/ifcvf_vdpa.c b/drivers/net/ifc/ifcvf_vdpa.c
index aacd5f9bf..6fcd50b73 100644
--- a/drivers/net/ifc/ifcvf_vdpa.c
+++ b/drivers/net/ifc/ifcvf_vdpa.c
@@ -781,10 +781,6 @@ ifcvf_pci_probe(struct rte_pci_driver *pci_drv __rte_unused,
 	internal->dev_addr.type = PCI_ADDR;
 	list->internal = internal;
 
-	pthread_mutex_lock(&internal_list_lock);
-	TAILQ_INSERT_TAIL(&internal_list, list, next);
-	pthread_mutex_unlock(&internal_list_lock);
-
 	internal->did = rte_vdpa_register_device(&internal->dev_addr,
 				&ifcvf_ops);
 	if (internal->did < 0) {
@@ -792,6 +788,10 @@ ifcvf_pci_probe(struct rte_pci_driver *pci_drv __rte_unused,
 		goto error;
 	}
 
+	pthread_mutex_lock(&internal_list_lock);
+	TAILQ_INSERT_TAIL(&internal_list, list, next);
+	pthread_mutex_unlock(&internal_list_lock);
+
 	rte_atomic32_set(&internal->started, 1);
 	update_datapath(internal);
 
-- 
2.15.1

* [dpdk-dev] [PATCH v2 5/9] net/ifc: detect if VDPA mode is specified
  2018-12-13  1:10   ` [dpdk-dev] [PATCH v2 0/9] support SW assisted VDPA live migration Xiao Wang
                       ` (3 preceding siblings ...)
  2018-12-13  1:10     ` [dpdk-dev] [PATCH v2 4/9] net/ifc: store only registered device instance Xiao Wang
@ 2018-12-13  1:10     ` Xiao Wang
  2018-12-13  1:10     ` [dpdk-dev] [PATCH v2 6/9] net/ifc: add devarg for LM mode Xiao Wang
                       ` (3 subsequent siblings)
  8 siblings, 0 replies; 86+ messages in thread
From: Xiao Wang @ 2018-12-13  1:10 UTC (permalink / raw)
  To: alejandro.lucero, tiwei.bie
  Cc: maxime.coquelin, dev, zhihong.wang, xiaolong.ye, Xiao Wang

If user wants the VF to be used in VDPA (vhost data path acceleration)
mode, then the user can add a "vdpa=1" parameter for the device.

If the driver does not find this option, it should quit and let the
bus continue the probe.
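
For reference, parsing an integer devarg value such as the "1" in
"vdpa=1" with strtoul() might look like the sketch below. It is a
standalone illustration, not the handler from this patch: parse_u16() is
a hypothetical helper, and the choice to reject trailing characters and
values above 16 bits is an assumption made for the example.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Parse an unsigned integer value such as "1" or "0x10".
 * Returns 0 and stores the result in *out on success, -EINVAL on
 * malformed input, -ERANGE if the value does not fit in 16 bits.
 */
int
parse_u16(const char *value, uint16_t *out)
{
	char *end = NULL;
	unsigned long v;

	assert(out != NULL);

	if (value == NULL)
		return -EINVAL;

	errno = 0;	/* strtoul() reports overflow only through errno */
	v = strtoul(value, &end, 0);
	if (end == value || *end != '\0')
		return -EINVAL;	/* no digits, or trailing characters */
	if (errno == ERANGE || v > UINT16_MAX)
		return -ERANGE;

	*out = (uint16_t)v;
	return 0;
}
```

Clearing errno before the call matters because strtoul() does not reset
it on success, so a stale ERANGE from an earlier call could otherwise be
misread as an overflow.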

Signed-off-by: Xiao Wang <xiao.w.wang@intel.com>
---
 drivers/net/ifc/ifcvf_vdpa.c | 47 ++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 47 insertions(+)

diff --git a/drivers/net/ifc/ifcvf_vdpa.c b/drivers/net/ifc/ifcvf_vdpa.c
index 6fcd50b73..c0e50354a 100644
--- a/drivers/net/ifc/ifcvf_vdpa.c
+++ b/drivers/net/ifc/ifcvf_vdpa.c
@@ -17,6 +17,8 @@
 #include <rte_vfio.h>
 #include <rte_spinlock.h>
 #include <rte_log.h>
+#include <rte_kvargs.h>
+#include <rte_devargs.h>
 
 #include "base/ifcvf.h"
 
@@ -28,6 +30,13 @@
 #define PAGE_SIZE 4096
 #endif
 
+#define IFCVF_VDPA_MODE		"vdpa"
+
+static const char * const ifcvf_valid_arguments[] = {
+	IFCVF_VDPA_MODE,
+	NULL
+};
+
 static int ifcvf_vdpa_logtype;
 
 struct ifcvf_internal {
@@ -735,6 +744,21 @@ static struct rte_vdpa_dev_ops ifcvf_ops = {
 	.get_notify_area = ifcvf_get_notify_area,
 };
 
+static inline int
+open_int(const char *key __rte_unused, const char *value, void *extra_args)
+{
+	uint16_t *n = extra_args;
+
+	if (value == NULL || extra_args == NULL)
+		return -EINVAL;
+
+	*n = (uint16_t)strtoul(value, NULL, 0);
+	if (*n == USHRT_MAX && errno == ERANGE)
+		return -1;
+
+	return 0;
+}
+
 static int
 ifcvf_pci_probe(struct rte_pci_driver *pci_drv __rte_unused,
 		struct rte_pci_device *pci_dev)
@@ -742,10 +766,31 @@ ifcvf_pci_probe(struct rte_pci_driver *pci_drv __rte_unused,
 	uint64_t features;
 	struct ifcvf_internal *internal = NULL;
 	struct internal_list *list = NULL;
+	int vdpa_mode = 0;
+	struct rte_kvargs *kvlist = NULL;
+	int ret = 0;
 
 	if (rte_eal_process_type() != RTE_PROC_PRIMARY)
 		return 0;
 
+	kvlist = rte_kvargs_parse(pci_dev->device.devargs->args,
+			ifcvf_valid_arguments);
+	if (kvlist == NULL)
+		return 1;
+
+	/* probe only when vdpa mode is specified */
+	if (rte_kvargs_count(kvlist, IFCVF_VDPA_MODE) == 0) {
+		rte_kvargs_free(kvlist);
+		return 1;
+	}
+
+	ret = rte_kvargs_process(kvlist, IFCVF_VDPA_MODE, &open_int,
+			&vdpa_mode);
+	if (ret < 0 || vdpa_mode == 0) {
+		rte_kvargs_free(kvlist);
+		return 1;
+	}
+
 	list = rte_zmalloc("ifcvf", sizeof(*list), 0);
 	if (list == NULL)
 		goto error;
@@ -795,9 +840,11 @@ ifcvf_pci_probe(struct rte_pci_driver *pci_drv __rte_unused,
 	rte_atomic32_set(&internal->started, 1);
 	update_datapath(internal);
 
+	rte_kvargs_free(kvlist);
 	return 0;
 
 error:
+	rte_kvargs_free(kvlist);
 	rte_free(list);
 	rte_free(internal);
 	return -1;
-- 
2.15.1

* [dpdk-dev] [PATCH v2 6/9] net/ifc: add devarg for LM mode
  2018-12-13  1:10   ` [dpdk-dev] [PATCH v2 0/9] support SW assisted VDPA live migration Xiao Wang
                       ` (4 preceding siblings ...)
  2018-12-13  1:10     ` [dpdk-dev] [PATCH v2 5/9] net/ifc: detect if VDPA mode is specified Xiao Wang
@ 2018-12-13  1:10     ` Xiao Wang
  2018-12-13  1:10     ` [dpdk-dev] [PATCH v2 7/9] net/ifc: use lib API for used ring logging Xiao Wang
                       ` (2 subsequent siblings)
  8 siblings, 0 replies; 86+ messages in thread
From: Xiao Wang @ 2018-12-13  1:10 UTC (permalink / raw)
  To: alejandro.lucero, tiwei.bie
  Cc: maxime.coquelin, dev, zhihong.wang, xiaolong.ye, Xiao Wang

This patch series enables a new method for live migration, i.e. software
assisted live migration. This patch provides a device argument for the user
to choose the method.

When "swlm=1", driver/device will do live migration with a relay thread
dealing with dirty page logging. Without this parameter, device will do
dirty page logging and there's no relay thread consuming CPU resource.
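
The opt-in behaviour described above (SW mode only when "swlm" is present
and non-zero) can be modelled with a small standalone sketch. The driver
itself relies on the kvargs library for this; devarg_int() and
use_sw_lm() below are hypothetical helpers written only to show the
default-off logic on a "k1=v1,k2=v2" style string.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/*
 * Look up an integer-valued key in a "k1=v1,k2=v2" argument string.
 * Returns the parsed value, or def when the key is absent.
 */
long
devarg_int(const char *args, const char *key, long def)
{
	const char *p = args;
	size_t klen;

	assert(key != NULL);
	klen = strlen(key);

	while (p != NULL && *p != '\0') {
		/* match "key=" at the start of a comma-separated token */
		if (strncmp(p, key, klen) == 0 && p[klen] == '=')
			return strtol(p + klen + 1, NULL, 0);

		p = strchr(p, ',');
		if (p != NULL)
			p++;	/* skip the comma */
	}

	return def;
}

/* SW assisted LM is opt-in: an absent or zero "swlm" keeps the HW mode. */
bool
use_sw_lm(const char *args)
{
	return devarg_int(args, "swlm", 0) != 0;
}
```

Under these assumptions "vdpa=1,swlm=1" selects the relay-based mode,
while "vdpa=1" alone or "vdpa=1,swlm=0" leaves dirty page logging to the
device.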

Signed-off-by: Xiao Wang <xiao.w.wang@intel.com>
---
 drivers/net/ifc/ifcvf_vdpa.c | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/drivers/net/ifc/ifcvf_vdpa.c b/drivers/net/ifc/ifcvf_vdpa.c
index c0e50354a..395c5112f 100644
--- a/drivers/net/ifc/ifcvf_vdpa.c
+++ b/drivers/net/ifc/ifcvf_vdpa.c
@@ -8,6 +8,7 @@
 #include <sys/ioctl.h>
 #include <sys/epoll.h>
 #include <linux/virtio_net.h>
+#include <stdbool.h>
 
 #include <rte_malloc.h>
 #include <rte_memory.h>
@@ -31,9 +32,11 @@
 #endif
 
 #define IFCVF_VDPA_MODE		"vdpa"
+#define IFCVF_SW_FALLBACK_LM	"swlm"
 
 static const char * const ifcvf_valid_arguments[] = {
 	IFCVF_VDPA_MODE,
+	IFCVF_SW_FALLBACK_LM,
 	NULL
 };
 
@@ -56,6 +59,7 @@ struct ifcvf_internal {
 	rte_atomic32_t dev_attached;
 	rte_atomic32_t running;
 	rte_spinlock_t lock;
+	bool sw_lm;
 };
 
 struct internal_list {
@@ -767,6 +771,7 @@ ifcvf_pci_probe(struct rte_pci_driver *pci_drv __rte_unused,
 	struct ifcvf_internal *internal = NULL;
 	struct internal_list *list = NULL;
 	int vdpa_mode = 0;
+	int sw_fallback_lm = 0;
 	struct rte_kvargs *kvlist = NULL;
 	int ret = 0;
 
@@ -826,6 +831,14 @@ ifcvf_pci_probe(struct rte_pci_driver *pci_drv __rte_unused,
 	internal->dev_addr.type = PCI_ADDR;
 	list->internal = internal;
 
+	if (rte_kvargs_count(kvlist, IFCVF_SW_FALLBACK_LM)) {
+		ret = rte_kvargs_process(kvlist, IFCVF_SW_FALLBACK_LM,
+				&open_int, &sw_fallback_lm);
+		if (ret < 0)
+			goto error;
+	}
+	internal->sw_lm = sw_fallback_lm;
+
 	internal->did = rte_vdpa_register_device(&internal->dev_addr,
 				&ifcvf_ops);
 	if (internal->did < 0) {
-- 
2.15.1


* [dpdk-dev] [PATCH v2 7/9] net/ifc: use lib API for used ring logging
  2018-12-13  1:10   ` [dpdk-dev] [PATCH v2 0/9] support SW assisted VDPA live migration Xiao Wang
                       ` (5 preceding siblings ...)
  2018-12-13  1:10     ` [dpdk-dev] [PATCH v2 6/9] net/ifc: add devarg for LM mode Xiao Wang
@ 2018-12-13  1:10     ` Xiao Wang
  2018-12-13  1:10     ` [dpdk-dev] [PATCH v2 8/9] net/ifc: support SW assisted VDPA live migration Xiao Wang
  2018-12-13  1:10     ` [dpdk-dev] [PATCH v2 9/9] doc: update ifc NIC document Xiao Wang
  8 siblings, 0 replies; 86+ messages in thread
From: Xiao Wang @ 2018-12-13  1:10 UTC (permalink / raw)
  To: alejandro.lucero, tiwei.bie
  Cc: maxime.coquelin, dev, zhihong.wang, xiaolong.ye, Xiao Wang

The vhost lib already provides a helper for used ring logging; the driver
can use it to reduce code.
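
For illustration, the length computation and the page marking can be sketched
as standalone C. This is a simplified, non-atomic model and not the vhost lib
implementation; the names used_ring_len, log_dirty_range and LOG_PAGE_SIZE are
hypothetical, and an 8-byte used element is assumed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LOG_PAGE_SIZE 4096u	/* assumed logging granularity */

/* Bytes covered by a split used ring: three 16-bit fields + n elements. */
static uint64_t
used_ring_len(uint32_t n)
{
	return (uint64_t)n * 8 + sizeof(uint16_t) * 3;
}

/*
 * Set one bit per page touched by [addr, addr + len) in the dirty bitmap.
 * A real implementation must use atomic OR, since the log is shared.
 */
static void
log_dirty_range(uint8_t *log, uint64_t addr, uint64_t len)
{
	uint64_t page, last;

	assert(log != NULL);
	if (len == 0)
		return;

	last = (addr + len - 1) / LOG_PAGE_SIZE;
	for (page = addr / LOG_PAGE_SIZE; page <= last; page++)
		log[page / 8] |= (uint8_t)(1u << (page % 8));
}
```

The last page is derived from the final byte of the range, so a ring that
ends exactly on a page boundary does not mark the following page.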

Signed-off-by: Xiao Wang <xiao.w.wang@intel.com>
---
 drivers/net/ifc/ifcvf_vdpa.c | 27 ++++++++-------------------
 1 file changed, 8 insertions(+), 19 deletions(-)

diff --git a/drivers/net/ifc/ifcvf_vdpa.c b/drivers/net/ifc/ifcvf_vdpa.c
index 395c5112f..f181c5a6e 100644
--- a/drivers/net/ifc/ifcvf_vdpa.c
+++ b/drivers/net/ifc/ifcvf_vdpa.c
@@ -31,6 +31,9 @@
 #define PAGE_SIZE 4096
 #endif
 
+#define IFCVF_USED_RING_LEN(size) \
+	((size) * sizeof(struct vring_used_elem) + sizeof(uint16_t) * 3)
+
 #define IFCVF_VDPA_MODE		"vdpa"
 #define IFCVF_SW_FALLBACK_LM	"swlm"
 
@@ -288,21 +291,6 @@ vdpa_ifcvf_start(struct ifcvf_internal *internal)
 	return ifcvf_start_hw(&internal->hw);
 }
 
-static void
-ifcvf_used_ring_log(struct ifcvf_hw *hw, uint32_t queue, uint8_t *log_buf)
-{
-	uint32_t i, size;
-	uint64_t pfn;
-
-	pfn = hw->vring[queue].used / PAGE_SIZE;
-	size = hw->vring[queue].size * sizeof(struct vring_used_elem) +
-			sizeof(uint16_t) * 3;
-
-	for (i = 0; i <= size / PAGE_SIZE; i++)
-		__sync_fetch_and_or_8(&log_buf[(pfn + i) / 8],
-				1 << ((pfn + i) % 8));
-}
-
 static void
 vdpa_ifcvf_stop(struct ifcvf_internal *internal)
 {
@@ -311,7 +299,7 @@ vdpa_ifcvf_stop(struct ifcvf_internal *internal)
 	int vid;
 	uint64_t features;
 	uint64_t log_base, log_size;
-	uint8_t *log_buf;
+	uint64_t len;
 
 	vid = internal->vid;
 	ifcvf_stop_hw(hw);
@@ -330,9 +318,10 @@ vdpa_ifcvf_stop(struct ifcvf_internal *internal)
 		 * IFCVF marks dirty memory pages for only packet buffer,
 		 * SW helps to mark the used ring as dirty after device stops.
 		 */
-		log_buf = (uint8_t *)(uintptr_t)log_base;
-		for (i = 0; i < hw->nr_vring; i++)
-			ifcvf_used_ring_log(hw, i, log_buf);
+		for (i = 0; i < hw->nr_vring; i++) {
+			len = IFCVF_USED_RING_LEN(hw->vring[i].size);
+			rte_vhost_log_used_vring(vid, i, 0, len);
+		}
 	}
 }
 
-- 
2.15.1


* [dpdk-dev] [PATCH v2 8/9] net/ifc: support SW assisted VDPA live migration
  2018-12-13  1:10   ` [dpdk-dev] [PATCH v2 0/9] support SW assisted VDPA live migration Xiao Wang
                       ` (6 preceding siblings ...)
  2018-12-13  1:10     ` [dpdk-dev] [PATCH v2 7/9] net/ifc: use lib API for used ring logging Xiao Wang
@ 2018-12-13  1:10     ` Xiao Wang
  2018-12-13  1:10     ` [dpdk-dev] [PATCH v2 9/9] doc: update ifc NIC document Xiao Wang
  8 siblings, 0 replies; 86+ messages in thread
From: Xiao Wang @ 2018-12-13  1:10 UTC (permalink / raw)
  To: alejandro.lucero, tiwei.bie
  Cc: maxime.coquelin, dev, zhihong.wang, xiaolong.ye, Xiao Wang

In SW assisted live migration mode, the driver will stop the device and
set up a mediate virtio ring to relay the communication between the
virtio driver and the VDPA device.
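
The device-visible addresses of the mediate ring follow from the split ring
layout. The sketch below is a standalone model under stated assumptions:
16-byte descriptors, 8-byte used elements, and a power-of-two alignment. The
struct and function names are hypothetical and are not taken from the driver.

```c
#include <assert.h>
#include <stdint.h>

struct m_vring_layout {
	uint64_t avail_iova;	/* IOVA of the mediate avail ring */
	uint64_t used_iova;	/* IOVA of the mediate used ring */
	uint64_t map_size;	/* page-aligned size to DMA map */
};

static uint64_t
align_up(uint64_t v, uint64_t a)
{
	return (v + a - 1) & ~(a - 1);
}

/* Compute where avail and used land when the ring starts at base_iova. */
static struct m_vring_layout
m_vring_compute_layout(uint64_t base_iova, uint32_t num, uint64_t page)
{
	struct m_vring_layout l;
	uint64_t desc_bytes = 16ULL * num;
	uint64_t avail_bytes = 2ULL * (3 + num);
	uint64_t used_bytes = 2ULL * 3 + 8ULL * num;
	uint64_t used_off;

	assert(page != 0 && (page & (page - 1)) == 0);

	/* The used ring starts at the next aligned offset after avail. */
	used_off = align_up(desc_bytes + avail_bytes, page);

	l.avail_iova = base_iova + desc_bytes;
	l.used_iova = base_iova + used_off;
	l.map_size = align_up(used_off + used_bytes, page);
	return l;
}
```

Only the avail and used addresses point into the mediate buffer; in the patch
the descriptor table address given to the device is still the guest's own.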

This data path intervention will allow SW to help on guest dirty page
logging for live migration.

This SW fallback is an event driven relay thread, so when the network
throughput is low it takes little CPU resource, but when the throughput
goes up, the relay thread's CPU usage goes up accordingly.
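
A minimal sketch of this event driven dispatch, assuming Linux eventfd and
epoll. The helper names relay_tag and relay_wait_one are hypothetical. The tag
layout (fd in the upper 32 bits, queue id shifted left by one, low bit set for
a device interrupt and clear for a guest kick) mirrors the one used in this
patch.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>
#include <sys/epoll.h>
#include <sys/eventfd.h>

/* Pack fd, queue id and event type into one epoll user data word. */
static uint64_t
relay_tag(int fd, uint16_t qid, int is_intr)
{
	assert(fd >= 0);
	return (uint64_t)(uint32_t)fd << 32 | (uint64_t)qid << 1 |
		(is_intr ? 1u : 0u);
}

/*
 * Register one eventfd, signal it, and wait for it once.
 * Returns the decoded queue id, or -1 on any failure.
 */
static int
relay_wait_one(uint16_t qid, int is_intr, int *seen_intr)
{
	struct epoll_event ev, out;
	uint64_t one = 1, buf, tag;
	int ret = -1;
	int epfd = epoll_create1(0);
	int efd = eventfd(0, EFD_NONBLOCK);

	if (epfd < 0 || efd < 0)
		goto done;

	memset(&ev, 0, sizeof(ev));
	ev.events = EPOLLIN;
	ev.data.u64 = relay_tag(efd, qid, is_intr);
	if (epoll_ctl(epfd, EPOLL_CTL_ADD, efd, &ev) < 0)
		goto done;

	/* Stand-in for a guest kick or a device interrupt. */
	if (write(efd, &one, sizeof(one)) != (ssize_t)sizeof(one))
		goto done;

	/* The thread sleeps here until something happens. */
	if (epoll_wait(epfd, &out, 1, 1000) != 1)
		goto done;

	tag = out.data.u64;
	/* Drain the counter so the fd stops polling as readable. */
	if (read((int)(tag >> 32), &buf, sizeof(buf)) != (ssize_t)sizeof(buf))
		goto done;

	*seen_intr = (int)(tag & 1);
	ret = (int)((uint32_t)tag >> 1);
done:
	if (efd >= 0)
		close(efd);
	if (epfd >= 0)
		close(epfd);
	return ret;
}
```

Because the thread blocks in epoll_wait, an idle queue costs no CPU time; the
cost grows with the number of kicks and interrupts that have to be relayed.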

User needs to take all the factors including CPU usage, guest perf
degradation, etc. into consideration when selecting the live migration
support mode.

Signed-off-by: Xiao Wang <xiao.w.wang@intel.com>
---
v2:
* Make the parameter parsing code shorter.
---
 drivers/net/ifc/base/ifcvf.h |   1 +
 drivers/net/ifc/ifcvf_vdpa.c | 346 ++++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 344 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ifc/base/ifcvf.h b/drivers/net/ifc/base/ifcvf.h
index f026c70ab..8eb70ae9d 100644
--- a/drivers/net/ifc/base/ifcvf.h
+++ b/drivers/net/ifc/base/ifcvf.h
@@ -50,6 +50,7 @@
 #define IFCVF_LM_ENABLE_VF		0x1
 #define IFCVF_LM_ENABLE_PF		0x3
 #define IFCVF_LOG_BASE			0x100000000000
+#define IFCVF_MEDIATE_VRING		0x200000000000
 
 #define IFCVF_32_BIT_MASK		0xffffffff
 
diff --git a/drivers/net/ifc/ifcvf_vdpa.c b/drivers/net/ifc/ifcvf_vdpa.c
index f181c5a6e..31ea880b2 100644
--- a/drivers/net/ifc/ifcvf_vdpa.c
+++ b/drivers/net/ifc/ifcvf_vdpa.c
@@ -63,6 +63,9 @@ struct ifcvf_internal {
 	rte_atomic32_t running;
 	rte_spinlock_t lock;
 	bool sw_lm;
+	bool sw_fallback_running;
+	/* mediated vring for sw fallback */
+	struct vring m_vring[IFCVF_MAX_QUEUES * 2];
 };
 
 struct internal_list {
@@ -308,6 +311,9 @@ vdpa_ifcvf_stop(struct ifcvf_internal *internal)
 		rte_vhost_set_vring_base(vid, i, hw->vring[i].last_avail_idx,
 				hw->vring[i].last_used_idx);
 
+	if (internal->sw_lm)
+		return;
+
 	rte_vhost_get_negotiated_features(vid, &features);
 	if (RTE_VHOST_NEED_LOG(features)) {
 		ifcvf_disable_logging(hw);
@@ -539,6 +545,318 @@ update_datapath(struct ifcvf_internal *internal)
 	return ret;
 }
 
+static int
+m_ifcvf_start(struct ifcvf_internal *internal)
+{
+	struct ifcvf_hw *hw = &internal->hw;
+	uint32_t i, nr_vring;
+	int vid, ret;
+	struct rte_vhost_vring vq;
+	void *vring_buf;
+	uint64_t m_vring_iova = IFCVF_MEDIATE_VRING;
+	uint64_t size;
+	uint64_t gpa;
+
+	vid = internal->vid;
+	nr_vring = rte_vhost_get_vring_num(vid);
+	rte_vhost_get_negotiated_features(vid, &hw->req_features);
+
+	for (i = 0; i < nr_vring; i++) {
+		rte_vhost_get_vhost_vring(vid, i, &vq);
+
+		size = RTE_ALIGN_CEIL(vring_size(vq.size, PAGE_SIZE),
+				PAGE_SIZE);
+		vring_buf = rte_zmalloc("ifcvf", size, PAGE_SIZE);
+		vring_init(&internal->m_vring[i], vq.size, vring_buf,
+				PAGE_SIZE);
+
+		ret = rte_vfio_container_dma_map(internal->vfio_container_fd,
+			(uint64_t)(uintptr_t)vring_buf, m_vring_iova, size);
+		if (ret < 0) {
+			DRV_LOG(ERR, "mediate vring DMA map failed.");
+			goto error;
+		}
+
+		gpa = hva_to_gpa(vid, (uint64_t)(uintptr_t)vq.desc);
+		if (gpa == 0) {
+			DRV_LOG(ERR, "Fail to get GPA for descriptor ring.");
+			return -1;
+		}
+		hw->vring[i].desc = gpa;
+
+		hw->vring[i].avail = m_vring_iova +
+			(char *)internal->m_vring[i].avail -
+			(char *)internal->m_vring[i].desc;
+
+		hw->vring[i].used = m_vring_iova +
+			(char *)internal->m_vring[i].used -
+			(char *)internal->m_vring[i].desc;
+
+		hw->vring[i].size = vq.size;
+
+		rte_vhost_get_vring_base(vid, i, &hw->vring[i].last_avail_idx,
+				&hw->vring[i].last_used_idx);
+
+		m_vring_iova += size;
+	}
+	hw->nr_vring = nr_vring;
+
+	return ifcvf_start_hw(&internal->hw);
+
+error:
+	for (i = 0; i < nr_vring; i++)
+		if (internal->m_vring[i].desc)
+			rte_free(internal->m_vring[i].desc);
+
+	return -1;
+}
+
+static int
+m_ifcvf_stop(struct ifcvf_internal *internal)
+{
+	int vid;
+	uint32_t i;
+	struct rte_vhost_vring vq;
+	struct ifcvf_hw *hw = &internal->hw;
+	uint64_t m_vring_iova = IFCVF_MEDIATE_VRING;
+	uint64_t size, len;
+
+	vid = internal->vid;
+	ifcvf_stop_hw(hw);
+
+	for (i = 0; i < hw->nr_vring; i++) {
+		rte_vhost_get_vhost_vring(vid, i, &vq);
+		len = IFCVF_USED_RING_LEN(vq.size);
+		rte_vhost_log_used_vring(vid, i, 0, len);
+
+		size = RTE_ALIGN_CEIL(vring_size(vq.size, PAGE_SIZE),
+				PAGE_SIZE);
+		rte_vfio_container_dma_unmap(internal->vfio_container_fd,
+			(uint64_t)(uintptr_t)internal->m_vring[i].desc,
+			m_vring_iova, size);
+
+		rte_vhost_set_vring_base(vid, i, hw->vring[i].last_avail_idx,
+				hw->vring[i].last_used_idx);
+		rte_free(internal->m_vring[i].desc);
+		m_vring_iova += size;
+	}
+
+	return 0;
+}
+
+static int
+m_enable_vfio_intr(struct ifcvf_internal *internal)
+{
+	uint32_t nr_vring;
+	struct rte_intr_handle *intr_handle = &internal->pdev->intr_handle;
+	int ret;
+
+	nr_vring = rte_vhost_get_vring_num(internal->vid);
+
+	ret = rte_intr_efd_enable(intr_handle, nr_vring);
+	if (ret)
+		return -1;
+
+	ret = rte_intr_enable(intr_handle);
+	if (ret)
+		return -1;
+
+	return 0;
+}
+
+static void
+m_disable_vfio_intr(struct ifcvf_internal *internal)
+{
+	struct rte_intr_handle *intr_handle = &internal->pdev->intr_handle;
+
+	rte_intr_efd_disable(intr_handle);
+	rte_intr_disable(intr_handle);
+}
+
+static void
+update_avail_ring(struct ifcvf_internal *internal, uint16_t qid)
+{
+	rte_vdpa_relay_avail_ring(internal->vid, qid, &internal->m_vring[qid]);
+	ifcvf_notify_queue(&internal->hw, qid);
+}
+
+static void
+update_used_ring(struct ifcvf_internal *internal, uint16_t qid)
+{
+	rte_vdpa_relay_used_ring(internal->vid, qid, &internal->m_vring[qid]);
+	rte_vhost_vring_call(internal->vid, qid);
+}
+
+static void *
+vring_relay(void *arg)
+{
+	int i, vid, epfd, fd, nfds;
+	struct ifcvf_internal *internal = (struct ifcvf_internal *)arg;
+	struct rte_vhost_vring vring;
+	struct rte_intr_handle *intr_handle;
+	uint16_t qid, q_num;
+	struct epoll_event events[IFCVF_MAX_QUEUES * 4];
+	struct epoll_event ev;
+	int nbytes;
+	uint64_t buf;
+
+	vid = internal->vid;
+	q_num = rte_vhost_get_vring_num(vid);
+	/* prepare the mediate vring */
+	for (qid = 0; qid < q_num; qid++) {
+		rte_vhost_get_vring_base(vid, qid,
+				&internal->m_vring[qid].avail->idx,
+				&internal->m_vring[qid].used->idx);
+		rte_vdpa_relay_avail_ring(vid, qid, &internal->m_vring[qid]);
+	}
+
+	/* add notify fd and interrupt fd to epoll */
+	epfd = epoll_create(IFCVF_MAX_QUEUES * 2);
+	if (epfd < 0) {
+		DRV_LOG(ERR, "failed to create epoll instance.");
+		return NULL;
+	}
+	internal->epfd = epfd;
+
+	for (qid = 0; qid < q_num; qid++) {
+		ev.events = EPOLLIN | EPOLLPRI;
+		rte_vhost_get_vhost_vring(vid, qid, &vring);
+		ev.data.u64 = qid << 1 | (uint64_t)vring.kickfd << 32;
+		if (epoll_ctl(epfd, EPOLL_CTL_ADD, vring.kickfd, &ev) < 0) {
+			DRV_LOG(ERR, "epoll add error: %s", strerror(errno));
+			return NULL;
+		}
+	}
+
+	intr_handle = &internal->pdev->intr_handle;
+	for (qid = 0; qid < q_num; qid++) {
+		ev.events = EPOLLIN | EPOLLPRI;
+		ev.data.u64 = 1 | qid << 1 |
+			(uint64_t)intr_handle->efds[qid] << 32;
+		if (epoll_ctl(epfd, EPOLL_CTL_ADD, intr_handle->efds[qid], &ev)
+				< 0) {
+			DRV_LOG(ERR, "epoll add error: %s", strerror(errno));
+			return NULL;
+		}
+	}
+
+	/* start relay with a first kick */
+	for (qid = 0; qid < q_num; qid++)
+		ifcvf_notify_queue(&internal->hw, qid);
+
+	/* listen to the events and react accordingly */
+	for (;;) {
+		nfds = epoll_wait(epfd, events, q_num * 2, -1);
+		if (nfds < 0) {
+			if (errno == EINTR)
+				continue;
+			DRV_LOG(ERR, "epoll_wait return fail\n");
+			return NULL;
+		}
+
+		for (i = 0; i < nfds; i++) {
+			fd = (uint32_t)(events[i].data.u64 >> 32);
+			do {
+				nbytes = read(fd, &buf, 8);
+				if (nbytes < 0) {
+					if (errno == EINTR ||
+					    errno == EWOULDBLOCK ||
+					    errno == EAGAIN)
+						continue;
+					DRV_LOG(INFO, "Error reading "
+						"kickfd: %s",
+						strerror(errno));
+				}
+				break;
+			} while (1);
+
+			qid = events[i].data.u32 >> 1;
+
+			if (events[i].data.u32 & 1)
+				update_used_ring(internal, qid);
+			else
+				update_avail_ring(internal, qid);
+		}
+	}
+
+	return NULL;
+}
+
+static int
+setup_vring_relay(struct ifcvf_internal *internal)
+{
+	int ret;
+
+	ret = pthread_create(&internal->tid, NULL, vring_relay,
+			(void *)internal);
+	if (ret) {
+		DRV_LOG(ERR, "failed to create ring relay pthread.");
+		return -1;
+	}
+	return 0;
+}
+
+static int
+unset_vring_relay(struct ifcvf_internal *internal)
+{
+	void *status;
+
+	if (internal->tid) {
+		pthread_cancel(internal->tid);
+		pthread_join(internal->tid, &status);
+	}
+	internal->tid = 0;
+
+	if (internal->epfd >= 0)
+		close(internal->epfd);
+	internal->epfd = -1;
+
+	return 0;
+}
+
+static int
+ifcvf_sw_fallback_switchover(struct ifcvf_internal *internal)
+{
+	int ret;
+
+	/* stop the direct IO data path */
+	unset_notify_relay(internal);
+	vdpa_ifcvf_stop(internal);
+	vdpa_disable_vfio_intr(internal);
+
+	ret = rte_vhost_host_notifier_ctrl(internal->vid, false);
+	if (ret && ret != -ENOTSUP)
+		goto error;
+
+	/* set up interrupt for interrupt relay */
+	ret = m_enable_vfio_intr(internal);
+	if (ret)
+		goto unmap;
+
+	/* config the VF */
+	ret = m_ifcvf_start(internal);
+	if (ret)
+		goto unset_intr;
+
+	/* set up vring relay thread */
+	ret = setup_vring_relay(internal);
+	if (ret)
+		goto stop_vf;
+
+	internal->sw_fallback_running = true;
+
+	return 0;
+
+stop_vf:
+	m_ifcvf_stop(internal);
+unset_intr:
+	m_disable_vfio_intr(internal);
+unmap:
+	ifcvf_dma_map(internal, 0);
+error:
+	return -1;
+}
+
 static int
 ifcvf_dev_config(int vid)
 {
@@ -579,8 +897,25 @@ ifcvf_dev_close(int vid)
 	}
 
 	internal = list->internal;
-	rte_atomic32_set(&internal->dev_attached, 0);
-	update_datapath(internal);
+
+	if (internal->sw_fallback_running) {
+		/* unset ring relay */
+		unset_vring_relay(internal);
+
+		/* reset VF */
+		m_ifcvf_stop(internal);
+
+		/* remove interrupt setting */
+		m_disable_vfio_intr(internal);
+
+		/* unset DMA map for guest memory */
+		ifcvf_dma_map(internal, 0);
+
+		internal->sw_fallback_running = false;
+	} else {
+		rte_atomic32_set(&internal->dev_attached, 0);
+		update_datapath(internal);
+	}
 
 	return 0;
 }
@@ -604,7 +939,12 @@ ifcvf_set_features(int vid)
 	internal = list->internal;
 	rte_vhost_get_negotiated_features(vid, &features);
 
-	if (RTE_VHOST_NEED_LOG(features)) {
+	if (!RTE_VHOST_NEED_LOG(features))
+		return 0;
+
+	if (internal->sw_lm) {
+		ifcvf_sw_fallback_switchover(internal);
+	} else {
 		rte_vhost_get_log_base(vid, &log_base, &log_size);
 		rte_vfio_container_dma_map(internal->vfio_container_fd,
 				log_base, IFCVF_LOG_BASE, log_size);
-- 
2.15.1


* [dpdk-dev] [PATCH v2 9/9] doc: update ifc NIC document
  2018-12-13  1:10   ` [dpdk-dev] [PATCH v2 0/9] support SW assisted VDPA live migration Xiao Wang
                       ` (7 preceding siblings ...)
  2018-12-13  1:10     ` [dpdk-dev] [PATCH v2 8/9] net/ifc: support SW assisted VDPA live migration Xiao Wang
@ 2018-12-13  1:10     ` Xiao Wang
  8 siblings, 0 replies; 86+ messages in thread
From: Xiao Wang @ 2018-12-13  1:10 UTC (permalink / raw)
  To: alejandro.lucero, tiwei.bie
  Cc: maxime.coquelin, dev, zhihong.wang, xiaolong.ye, Xiao Wang

Signed-off-by: Xiao Wang <xiao.w.wang@intel.com>
---
v2:
* Add release note.
---
 doc/guides/nics/ifc.rst                | 7 +++++++
 doc/guides/rel_notes/release_19_02.rst | 5 +++++
 2 files changed, 12 insertions(+)

diff --git a/doc/guides/nics/ifc.rst b/doc/guides/nics/ifc.rst
index 48f9adf1d..858f35f74 100644
--- a/doc/guides/nics/ifc.rst
+++ b/doc/guides/nics/ifc.rst
@@ -39,6 +39,12 @@ the driver probe a new container is created for this device, with this
 container vDPA driver can program DMA remapping table with the VM's memory
 region information.
 
+The device argument "swlm=1" will configure the driver into SW assisted live
+migration mode. In this mode, the driver will set up a SW relay thread when LM
+happens, this thread will help device to log dirty pages. Thus this mode does
+not require HW to implement a dirty page logging function block, but will
+consume some percentage of CPU resource depending on the network throughput.
+
 Key IFCVF vDPA driver ops
 ~~~~~~~~~~~~~~~~~~~~~~~~~
 
@@ -70,6 +76,7 @@ Features
 Features of the IFCVF driver are:
 
 - Compatibility with virtio 0.95 and 1.0.
+- SW assisted vDPA live migration.
 
 
 Prerequisites
diff --git a/doc/guides/rel_notes/release_19_02.rst b/doc/guides/rel_notes/release_19_02.rst
index a94fa86a7..ea3909631 100644
--- a/doc/guides/rel_notes/release_19_02.rst
+++ b/doc/guides/rel_notes/release_19_02.rst
@@ -54,6 +54,11 @@ New Features
      Also, make sure to start the actual text at the margin.
      =========================================================
 
+* **Add support for SW-assisted VDPA live migration.**
+  This SW-assisted VDPA live migration facility helps VDPA devices without
+  logging capability to perform live migration, a mediate SW relay can help
+  devices to track dirty pages caused by DMA. IFC driver has enabled this
+  SW-assisted live migration mode.
 
 Removed Items
 -------------
-- 
2.15.1


* [dpdk-dev] [PATCH v3 0/9] support SW assisted VDPA live migration
  2018-12-13  1:10     ` [dpdk-dev] [PATCH v2 2/9] vhost: provide helpers for virtio ring relay Xiao Wang
@ 2018-12-13 10:09       ` Xiao Wang
  2018-12-13 10:09         ` [dpdk-dev] [PATCH v3 1/9] vhost: provide helper for host notifier ctrl Xiao Wang
                           ` (8 more replies)
  0 siblings, 9 replies; 86+ messages in thread
From: Xiao Wang @ 2018-12-13 10:09 UTC (permalink / raw)
  To: alejandro.lucero, tiwei.bie
  Cc: maxime.coquelin, dev, zhihong.wang, xiaolong.ye, Xiao Wang

In the previous VDPA implementation, live migration is supported by having
the HW accelerator do all the work, including dirty page logging and device
status report/restore. In this mode the VDPA sample daemon and the device
driver only take care of the control path and are not involved in the data
path, so CPU usage is close to zero. This mode requires the device to have
dirty page logging capability.

This patch series adds live migration support for devices without logging
capability. When live migration happens, the VDPA driver can set up a relay
thread that sits between the guest virtio driver and the physical virtio
accelerator. The relay performs a vring relay on behalf of the device and
logs dirty pages along the way. Some CPU resource is therefore consumed in
this scenario, in proportion to the network throughput.

Some new helpers are added into vhost lib for this VDPA SW fallback:
- rte_vhost_host_notifier_ctrl, to enable/disable the VDPA direct-IO
  datapath.
- rte_vdpa_relay_vring_avail, to relay the available request from guest vring
  to mediate vring.
- rte_vdpa_relay_vring_used, to relay the used response from mediate vring to
  guest vring.

Some existing helpers are also leveraged for SW fallback setup, like VFIO
interrupt configuration, IOMMU table programming, etc.

This series enables SW assisted VDPA live migration in the ifc driver.
Since ifcvf also supports HW dirty page logging, we add a new devarg
for the user to select whether the SW mode is used.

v3:
* Fix indent in relay code.
* Fix the iova access mode issue of buffer check.
* Rename the relay API to be more generic, and add more API note for used
  ring handling.
* Add kvargs lib dependency in ifc driver.
* Add commit message for the doc update patch for checkpatch warning.

v2:
* Reword the vdpa host notifier control API comment.
* Make the vring relay API parameter as "void *" to accommodate the future
  potential new ring layout, e.g. packed ring.
* Add parameter check for the new API.
* Add memory barrier for ring idx update.
* Remove the used ring logging in the relay.
* Some comment fix and code cleaning according to Tiwei's comment.
* Add release note update.

Xiao Wang (9):
  vhost: provide helper for host notifier ctrl
  vhost: provide helpers for virtio ring relay
  net/ifc: dump debug message for error
  net/ifc: store only registered device instance
  net/ifc: detect if VDPA mode is specified
  net/ifc: add devarg for LM mode
  net/ifc: use lib API for used ring logging
  net/ifc: support SW assisted VDPA live migration
  doc: update ifc NIC document

 doc/guides/nics/ifc.rst                |   8 +
 doc/guides/rel_notes/release_19_02.rst |   5 +
 drivers/net/ifc/Makefile               |   1 +
 drivers/net/ifc/base/ifcvf.h           |   1 +
 drivers/net/ifc/ifcvf_vdpa.c           | 461 ++++++++++++++++++++++++++++++---
 lib/librte_vhost/rte_vdpa.h            |  57 ++++
 lib/librte_vhost/rte_vhost_version.map |   3 +
 lib/librte_vhost/vdpa.c                | 194 ++++++++++++++
 lib/librte_vhost/vhost.c               |   3 +-
 lib/librte_vhost/vhost.h               |  40 +++
 lib/librte_vhost/vhost_user.c          |   7 +-
 lib/librte_vhost/virtio_net.c          |  39 ---
 12 files changed, 741 insertions(+), 78 deletions(-)

-- 
2.15.1


* [dpdk-dev] [PATCH v3 1/9] vhost: provide helper for host notifier ctrl
  2018-12-13 10:09       ` [dpdk-dev] [PATCH v3 0/9] support SW assisted VDPA live migration Xiao Wang
@ 2018-12-13 10:09         ` Xiao Wang
  2018-12-14 13:33           ` Maxime Coquelin
  2018-12-14 21:16           ` [dpdk-dev] [PATCH v4 00/10] support SW assisted VDPA live migration Xiao Wang
  2018-12-13 10:09         ` [dpdk-dev] [PATCH v3 2/9] vhost: provide helpers for virtio ring relay Xiao Wang
                           ` (7 subsequent siblings)
  8 siblings, 2 replies; 86+ messages in thread
From: Xiao Wang @ 2018-12-13 10:09 UTC (permalink / raw)
  To: alejandro.lucero, tiwei.bie
  Cc: maxime.coquelin, dev, zhihong.wang, xiaolong.ye, Xiao Wang

The VDPA driver can decide whether it needs to enable or disable the host
notifier mapping, so exposing an API allows this flexibility. A later patch
builds on this.

Signed-off-by: Xiao Wang <xiao.w.wang@intel.com>
---
v2:
* Reword the vdpa host notifier control API comment.
---
 drivers/net/ifc/ifcvf_vdpa.c           |  3 +++
 lib/librte_vhost/rte_vdpa.h            | 18 ++++++++++++++++++
 lib/librte_vhost/rte_vhost_version.map |  1 +
 lib/librte_vhost/vhost.c               |  3 +--
 lib/librte_vhost/vhost_user.c          |  7 +------
 5 files changed, 24 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ifc/ifcvf_vdpa.c b/drivers/net/ifc/ifcvf_vdpa.c
index 97a57f182..e844109f3 100644
--- a/drivers/net/ifc/ifcvf_vdpa.c
+++ b/drivers/net/ifc/ifcvf_vdpa.c
@@ -556,6 +556,9 @@ ifcvf_dev_config(int vid)
 	rte_atomic32_set(&internal->dev_attached, 1);
 	update_datapath(internal);
 
+	if (rte_vhost_host_notifier_ctrl(vid, true) != 0)
+		DRV_LOG(NOTICE, "vDPA (%d): software relay is used.", did);
+
 	return 0;
 }
 
diff --git a/lib/librte_vhost/rte_vdpa.h b/lib/librte_vhost/rte_vdpa.h
index a418da47c..fff657391 100644
--- a/lib/librte_vhost/rte_vdpa.h
+++ b/lib/librte_vhost/rte_vdpa.h
@@ -11,6 +11,8 @@
  * Device specific vhost lib
  */
 
+#include <stdbool.h>
+
 #include <rte_pci.h>
 #include "rte_vhost.h"
 
@@ -155,4 +157,20 @@ rte_vdpa_get_device(int did);
  */
 int __rte_experimental
 rte_vdpa_get_device_num(void);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * Enable/Disable host notifier mapping for a vdpa port.
+ *
+ * @param vid
+ *  vhost device id
+ * @param enable
+ *  true for host notifier map, false for host notifier unmap
+ * @return
+ *  0 on success, -1 on failure
+ */
+int __rte_experimental
+rte_vhost_host_notifier_ctrl(int vid, bool enable);
 #endif /* _RTE_VDPA_H_ */
diff --git a/lib/librte_vhost/rte_vhost_version.map b/lib/librte_vhost/rte_vhost_version.map
index ae39b6e21..22302e972 100644
--- a/lib/librte_vhost/rte_vhost_version.map
+++ b/lib/librte_vhost/rte_vhost_version.map
@@ -83,4 +83,5 @@ EXPERIMENTAL {
 	rte_vhost_crypto_finalize_requests;
 	rte_vhost_crypto_set_zero_copy;
 	rte_vhost_va_from_guest_pa;
+	rte_vhost_host_notifier_ctrl;
 };
diff --git a/lib/librte_vhost/vhost.c b/lib/librte_vhost/vhost.c
index 70ac6bc9c..e7a60e0b4 100644
--- a/lib/librte_vhost/vhost.c
+++ b/lib/librte_vhost/vhost.c
@@ -408,8 +408,7 @@ vhost_detach_vdpa_device(int vid)
 	if (dev == NULL)
 		return;
 
-	vhost_user_host_notifier_ctrl(vid, false);
-
+	vhost_destroy_device_notify(dev);
 	dev->vdpa_dev_id = -1;
 }
 
diff --git a/lib/librte_vhost/vhost_user.c b/lib/librte_vhost/vhost_user.c
index 3ea64eba6..5e0da0589 100644
--- a/lib/librte_vhost/vhost_user.c
+++ b/lib/librte_vhost/vhost_user.c
@@ -2045,11 +2045,6 @@ vhost_user_msg_handler(int vid, int fd)
 		if (vdpa_dev->ops->dev_conf)
 			vdpa_dev->ops->dev_conf(dev->vid);
 		dev->flags |= VIRTIO_DEV_VDPA_CONFIGURED;
-		if (vhost_user_host_notifier_ctrl(dev->vid, true) != 0) {
-			RTE_LOG(INFO, VHOST_CONFIG,
-				"(%d) software relay is used for vDPA, performance may be low.\n",
-				dev->vid);
-		}
 	}
 
 	return 0;
@@ -2144,7 +2139,7 @@ static int vhost_user_slave_set_vring_host_notifier(struct virtio_net *dev,
 	return process_slave_message_reply(dev, &msg);
 }
 
-int vhost_user_host_notifier_ctrl(int vid, bool enable)
+int rte_vhost_host_notifier_ctrl(int vid, bool enable)
 {
 	struct virtio_net *dev;
 	struct rte_vdpa_device *vdpa_dev;
-- 
2.15.1


* [dpdk-dev] [PATCH v3 2/9] vhost: provide helpers for virtio ring relay
  2018-12-13 10:09       ` [dpdk-dev] [PATCH v3 0/9] support SW assisted VDPA live migration Xiao Wang
  2018-12-13 10:09         ` [dpdk-dev] [PATCH v3 1/9] vhost: provide helper for host notifier ctrl Xiao Wang
@ 2018-12-13 10:09         ` Xiao Wang
  2018-12-13 10:09         ` [dpdk-dev] [PATCH v3 3/9] net/ifc: dump debug message for error Xiao Wang
                           ` (6 subsequent siblings)
  8 siblings, 0 replies; 86+ messages in thread
From: Xiao Wang @ 2018-12-13 10:09 UTC (permalink / raw)
  To: alejandro.lucero, tiwei.bie
  Cc: maxime.coquelin, dev, zhihong.wang, xiaolong.ye, Xiao Wang

This patch provides two helpers for vdpa device driver to perform a
relay between the guest virtio ring and a mediate virtio ring.

The available ring relay will synchronize the available entries, and
helps to do desc validity checking.
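
The index handling of the available ring relay can be sketched as below. This
is a simplified standalone model, not the vhost lib code: it omits descriptor
validity checking and indirect tables, and the type and function names are
hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define RING_SIZE 8	/* must be a power of two */

struct avail_ring {
	uint16_t idx;			/* free-running producer index */
	uint16_t ring[RING_SIZE];	/* descriptor head ids */
};

/* Copy entries the guest has published but the mediate ring lacks. */
static int
relay_avail(const struct avail_ring *guest, struct avail_ring *mediate)
{
	uint16_t idx, idx_m;
	int n;

	assert(guest != NULL && mediate != NULL);
	idx = guest->idx;
	idx_m = mediate->idx;
	/* 16-bit subtraction keeps the count right across wraparound. */
	n = (uint16_t)(idx - idx_m);

	while (idx_m != idx) {
		mediate->ring[idx_m & (RING_SIZE - 1)] =
			guest->ring[idx_m & (RING_SIZE - 1)];
		idx_m++;
	}

	/* Publish the entries before the index that exposes them. */
	atomic_thread_fence(memory_order_release);
	mediate->idx = idx;
	return n;
}
```

The indices are free running and only masked when used as array subscripts,
which is why the loop compares them for inequality rather than ordering.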

The used ring relay will synchronize the used entries from mediate ring
to guest ring, and helps to do dirty page logging for live migration.
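
A simplified standalone sketch of the used ring relay follows. It copies the
used entries back to the guest ring and marks the pages of every device
writeable buffer in the chain as dirty. It ignores indirect descriptors, uses
a non-atomic bitmap, and all names in it are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define RING_SIZE	8	/* must be a power of two */
#define DESC_F_NEXT	1	/* chain continues in desc.next */
#define DESC_F_WRITE	2	/* buffer is written by the device */

struct desc { uint64_t addr; uint32_t len; uint16_t flags; uint16_t next; };
struct used_elem { uint32_t id; uint32_t len; };
struct used_ring { uint16_t idx; struct used_elem ring[RING_SIZE]; };

/* Mark every 4 KiB page touched by [addr, addr + len) as dirty. */
static void
log_range(uint8_t *log, uint64_t addr, uint64_t len)
{
	uint64_t p;

	if (len == 0)
		return;
	for (p = addr >> 12; p <= (addr + len - 1) >> 12; p++)
		log[p / 8] |= (uint8_t)(1u << (p % 8));
}

static int
relay_used(const struct desc *table, const struct used_ring *mediate,
	   struct used_ring *guest, uint8_t *log)
{
	uint16_t idx, idx_m, d;
	int n;

	assert(table != NULL && mediate != NULL && guest != NULL);
	assert(log != NULL);
	idx = guest->idx;
	idx_m = mediate->idx;
	n = (uint16_t)(idx_m - idx);

	while (idx != idx_m) {
		struct used_elem e = mediate->ring[idx & (RING_SIZE - 1)];

		guest->ring[idx & (RING_SIZE - 1)] = e;

		/* Walk the chain and log the device writeable buffers. */
		d = (uint16_t)e.id;
		for (;;) {
			if (table[d].flags & DESC_F_WRITE)
				log_range(log, table[d].addr, table[d].len);
			if (!(table[d].flags & DESC_F_NEXT))
				break;
			d = table[d].next;
		}
		idx++;
	}

	/* Publish the copied entries before the index update. */
	atomic_thread_fence(memory_order_release);
	guest->idx = idx_m;
	return n;
}
```

Read-only buffers are skipped because the device never writes to them, so
they cannot become dirty through DMA.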

The next patch will leverage these two helpers.

Signed-off-by: Xiao Wang <xiao.w.wang@intel.com>
---
v3:
* Fix indent in relay code.
* Fix the iova access mode issue of buffer check.
* Rename the relay API to be more generic, and add more API note for used
  ring handling.

v2:
* Make the vring relay API parameter as "void *" to accommodate the future
  potential new ring layout, e.g. packed ring.
* Add parameter check for the new API.
* Add memory barrier for ring idx update.
* Remove the used ring logging in the relay.
* Some comment fix and code cleaning according to Tiwei's comment.
---
 lib/librte_vhost/rte_vdpa.h            |  39 +++++++
 lib/librte_vhost/rte_vhost_version.map |   2 +
 lib/librte_vhost/vdpa.c                | 194 +++++++++++++++++++++++++++++++++
 lib/librte_vhost/vhost.h               |  40 +++++++
 lib/librte_vhost/virtio_net.c          |  39 -------
 5 files changed, 275 insertions(+), 39 deletions(-)

diff --git a/lib/librte_vhost/rte_vdpa.h b/lib/librte_vhost/rte_vdpa.h
index fff657391..02b8d14ed 100644
--- a/lib/librte_vhost/rte_vdpa.h
+++ b/lib/librte_vhost/rte_vdpa.h
@@ -173,4 +173,43 @@ rte_vdpa_get_device_num(void);
  */
 int __rte_experimental
 rte_vhost_host_notifier_ctrl(int vid, bool enable);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * Synchronize the available ring from guest to mediate ring, help to
+ * check desc validity to protect against malicious guest driver.
+ *
+ * @param vid
+ *  vhost device id
+ * @param qid
+ *  vhost queue id
+ * @param vring_m
+ *  mediate virtio ring pointer
+ * @return
+ *  number of synced available entries on success, -1 on failure
+ */
+int __rte_experimental
+rte_vdpa_relay_vring_avail(int vid, uint16_t qid, void *vring_m);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * Synchronize the used ring from mediate ring to guest, log dirty
+ * page for each writeable buffer, caller should handle the used
+ * ring logging before device stop.
+ *
+ * @param vid
+ *  vhost device id
+ * @param qid
+ *  vhost queue id
+ * @param vring_m
+ *  mediate virtio ring pointer
+ * @return
+ *  number of synced used entries on success, -1 on failure
+ */
+int __rte_experimental
+rte_vdpa_relay_vring_used(int vid, uint16_t qid, void *vring_m);
 #endif /* _RTE_VDPA_H_ */
diff --git a/lib/librte_vhost/rte_vhost_version.map b/lib/librte_vhost/rte_vhost_version.map
index 22302e972..dd3b4c1cb 100644
--- a/lib/librte_vhost/rte_vhost_version.map
+++ b/lib/librte_vhost/rte_vhost_version.map
@@ -84,4 +84,6 @@ EXPERIMENTAL {
 	rte_vhost_crypto_set_zero_copy;
 	rte_vhost_va_from_guest_pa;
 	rte_vhost_host_notifier_ctrl;
+	rte_vdpa_relay_vring_avail;
+	rte_vdpa_relay_vring_used;
 };
diff --git a/lib/librte_vhost/vdpa.c b/lib/librte_vhost/vdpa.c
index e7d849ee0..dcf6c3b8e 100644
--- a/lib/librte_vhost/vdpa.c
+++ b/lib/librte_vhost/vdpa.c
@@ -122,3 +122,197 @@ rte_vdpa_get_device_num(void)
 {
 	return vdpa_device_num;
 }
+
+static bool
+invalid_desc_check(struct virtio_net *dev, struct vhost_virtqueue *vq,
+		uint64_t desc_iova, uint64_t desc_len, uint8_t perm)
+{
+	uint64_t desc_addr, desc_chunck_len;
+
+	while (desc_len) {
+		desc_chunck_len = desc_len;
+		desc_addr = vhost_iova_to_vva(dev, vq,
+				desc_iova,
+				&desc_chunck_len,
+				perm);
+
+		if (!desc_addr)
+			return true;
+
+		desc_len -= desc_chunck_len;
+		desc_iova += desc_chunck_len;
+	}
+
+	return false;
+}
+
+int
+rte_vdpa_relay_vring_avail(int vid, uint16_t qid, void *vring_m)
+{
+	struct virtio_net *dev = get_device(vid);
+	uint16_t idx, idx_m, desc_id;
+	struct vring_desc desc;
+	struct vhost_virtqueue *vq;
+	struct vring_desc *desc_ring;
+	struct vring_desc *idesc = NULL;
+	struct vring *s_vring;
+	uint64_t dlen;
+	int ret;
+	uint8_t perm;
+
+	if (!dev || !vring_m)
+		return -1;
+
+	if (qid >= dev->nr_vring)
+		return -1;
+
+	if (vq_is_packed(dev))
+		return -1;
+
+	s_vring = (struct vring *)vring_m;
+	vq = dev->virtqueue[qid];
+	idx = vq->avail->idx;
+	idx_m = s_vring->avail->idx;
+	ret = (uint16_t)(idx - idx_m);
+
+	while (idx_m != idx) {
+		/* avail entry copy */
+		desc_id = vq->avail->ring[idx_m & (vq->size - 1)];
+		s_vring->avail->ring[idx_m & (vq->size - 1)] = desc_id;
+		desc_ring = vq->desc;
+
+		if (vq->desc[desc_id].flags & VRING_DESC_F_INDIRECT) {
+			dlen = vq->desc[desc_id].len;
+			desc_ring = (struct vring_desc *)(uintptr_t)
+				vhost_iova_to_vva(dev, vq,
+						vq->desc[desc_id].addr, &dlen,
+						VHOST_ACCESS_RO);
+			if (unlikely(!desc_ring))
+				return -1;
+
+			if (unlikely(dlen < vq->desc[desc_id].len)) {
+				idesc = alloc_copy_ind_table(dev, vq,
+						vq->desc[desc_id].addr,
+						vq->desc[desc_id].len);
+				if (unlikely(!idesc))
+					return -1;
+
+				desc_ring = idesc;
+			}
+
+			desc_id = 0;
+		}
+
+		/* check if the buf addr is within the guest memory */
+		do {
+			desc = desc_ring[desc_id];
+			perm = desc.flags & VRING_DESC_F_WRITE ?
+				VHOST_ACCESS_WO : VHOST_ACCESS_RO;
+			if (invalid_desc_check(dev, vq, desc.addr, desc.len,
+						perm)) {
+				if (unlikely(idesc))
+					free_ind_table(idesc);
+				return -1;
+			}
+			desc_id = desc.next;
+		} while (desc.flags & VRING_DESC_F_NEXT);
+
+		if (unlikely(idesc)) {
+			free_ind_table(idesc);
+			idesc = NULL;
+		}
+
+		idx_m++;
+	}
+
+	rte_smp_wmb();
+	s_vring->avail->idx = idx;
+
+	if (dev->features & (1ULL << VIRTIO_RING_F_EVENT_IDX))
+		vhost_avail_event(vq) = idx;
+
+	return ret;
+}
+
+int
+rte_vdpa_relay_vring_used(int vid, uint16_t qid, void *vring_m)
+{
+	struct virtio_net *dev = get_device(vid);
+	uint16_t idx, idx_m, desc_id;
+	struct vhost_virtqueue *vq;
+	struct vring_desc desc;
+	struct vring_desc *desc_ring;
+	struct vring_desc *idesc = NULL;
+	struct vring *s_vring;
+	uint64_t dlen;
+	int ret;
+
+	if (!dev || !vring_m)
+		return -1;
+
+	if (qid >= dev->nr_vring)
+		return -1;
+
+	if (vq_is_packed(dev))
+		return -1;
+
+	s_vring = (struct vring *)vring_m;
+	vq = dev->virtqueue[qid];
+	idx = vq->used->idx;
+	idx_m = s_vring->used->idx;
+	ret = (uint16_t)(idx_m - idx);
+
+	while (idx != idx_m) {
+		/* copy used entry, used ring logging is not covered here */
+		vq->used->ring[idx & (vq->size - 1)] =
+			s_vring->used->ring[idx & (vq->size - 1)];
+
+		desc_id = vq->used->ring[idx & (vq->size - 1)].id;
+		desc_ring = vq->desc;
+
+		if (vq->desc[desc_id].flags & VRING_DESC_F_INDIRECT) {
+			dlen = vq->desc[desc_id].len;
+			desc_ring = (struct vring_desc *)(uintptr_t)
+				vhost_iova_to_vva(dev, vq,
+						vq->desc[desc_id].addr, &dlen,
+						VHOST_ACCESS_RO);
+			if (unlikely(!desc_ring))
+				return -1;
+
+			if (unlikely(dlen < vq->desc[desc_id].len)) {
+				idesc = alloc_copy_ind_table(dev, vq,
+						vq->desc[desc_id].addr,
+						vq->desc[desc_id].len);
+				if (unlikely(!idesc))
+					return -1;
+
+				desc_ring = idesc;
+			}
+
+			desc_id = 0;
+		}
+
+		/* dirty page logging for DMA writeable buffer */
+		do {
+			desc = desc_ring[desc_id];
+			if (desc.flags & VRING_DESC_F_WRITE)
+				vhost_log_write(dev, desc.addr, desc.len);
+			desc_id = desc.next;
+		} while (desc.flags & VRING_DESC_F_NEXT);
+
+		if (unlikely(idesc)) {
+			free_ind_table(idesc);
+			idesc = NULL;
+		}
+
+		idx++;
+	}
+
+	rte_smp_wmb();
+	vq->used->idx = idx_m;
+
+	if (dev->features & (1ULL << VIRTIO_RING_F_EVENT_IDX))
+		vring_used_event(s_vring) = idx_m;
+
+	return ret;
+}
diff --git a/lib/librte_vhost/vhost.h b/lib/librte_vhost/vhost.h
index 5218f1b12..2164cd6d9 100644
--- a/lib/librte_vhost/vhost.h
+++ b/lib/librte_vhost/vhost.h
@@ -18,6 +18,7 @@
 #include <rte_log.h>
 #include <rte_ether.h>
 #include <rte_rwlock.h>
+#include <rte_malloc.h>
 
 #include "rte_vhost.h"
 #include "rte_vdpa.h"
@@ -753,4 +754,43 @@ vhost_vring_call_packed(struct virtio_net *dev, struct vhost_virtqueue *vq)
 		eventfd_write(vq->callfd, (eventfd_t)1);
 }
 
+static __rte_always_inline void *
+alloc_copy_ind_table(struct virtio_net *dev, struct vhost_virtqueue *vq,
+		uint64_t desc_addr, uint64_t desc_len)
+{
+	void *idesc;
+	uint64_t src, dst;
+	uint64_t len, remain = desc_len;
+
+	idesc = rte_malloc(__func__, desc_len, 0);
+	if (unlikely(!idesc))
+		return 0;
+
+	dst = (uint64_t)(uintptr_t)idesc;
+
+	while (remain) {
+		len = remain;
+		src = vhost_iova_to_vva(dev, vq, desc_addr, &len,
+				VHOST_ACCESS_RO);
+		if (unlikely(!src || !len)) {
+			rte_free(idesc);
+			return 0;
+		}
+
+		rte_memcpy((void *)(uintptr_t)dst, (void *)(uintptr_t)src, len);
+
+		remain -= len;
+		dst += len;
+		desc_addr += len;
+	}
+
+	return idesc;
+}
+
+static __rte_always_inline void
+free_ind_table(void *idesc)
+{
+	rte_free(idesc);
+}
+
 #endif /* _VHOST_NET_CDEV_H_ */
diff --git a/lib/librte_vhost/virtio_net.c b/lib/librte_vhost/virtio_net.c
index 5e1a1a727..8c657a101 100644
--- a/lib/librte_vhost/virtio_net.c
+++ b/lib/librte_vhost/virtio_net.c
@@ -37,45 +37,6 @@ is_valid_virt_queue_idx(uint32_t idx, int is_tx, uint32_t nr_vring)
 	return (is_tx ^ (idx & 1)) == 0 && idx < nr_vring;
 }
 
-static __rte_always_inline void *
-alloc_copy_ind_table(struct virtio_net *dev, struct vhost_virtqueue *vq,
-		uint64_t desc_addr, uint64_t desc_len)
-{
-	void *idesc;
-	uint64_t src, dst;
-	uint64_t len, remain = desc_len;
-
-	idesc = rte_malloc(__func__, desc_len, 0);
-	if (unlikely(!idesc))
-		return 0;
-
-	dst = (uint64_t)(uintptr_t)idesc;
-
-	while (remain) {
-		len = remain;
-		src = vhost_iova_to_vva(dev, vq, desc_addr, &len,
-				VHOST_ACCESS_RO);
-		if (unlikely(!src || !len)) {
-			rte_free(idesc);
-			return 0;
-		}
-
-		rte_memcpy((void *)(uintptr_t)dst, (void *)(uintptr_t)src, len);
-
-		remain -= len;
-		dst += len;
-		desc_addr += len;
-	}
-
-	return idesc;
-}
-
-static __rte_always_inline void
-free_ind_table(void *idesc)
-{
-	rte_free(idesc);
-}
-
 static __rte_always_inline void
 do_flush_shadow_used_ring_split(struct virtio_net *dev,
 			struct vhost_virtqueue *vq,
-- 
2.15.1


* [dpdk-dev] [PATCH v3 3/9] net/ifc: dump debug message for error
  2018-12-13 10:09       ` [dpdk-dev] [PATCH v3 0/9] support SW assisted VDPA live migration Xiao Wang
  2018-12-13 10:09         ` [dpdk-dev] [PATCH v3 1/9] vhost: provide helper for host notifier ctrl Xiao Wang
  2018-12-13 10:09         ` [dpdk-dev] [PATCH v3 2/9] vhost: provide helpers for virtio ring relay Xiao Wang
@ 2018-12-13 10:09         ` Xiao Wang
  2018-12-13 10:09         ` [dpdk-dev] [PATCH v3 4/9] net/ifc: store only registered device instance Xiao Wang
                           ` (5 subsequent siblings)
  8 siblings, 0 replies; 86+ messages in thread
From: Xiao Wang @ 2018-12-13 10:09 UTC (permalink / raw)
  To: alejandro.lucero, tiwei.bie
  Cc: maxime.coquelin, dev, zhihong.wang, xiaolong.ye, Xiao Wang

Driver probe may fail for different reasons, so an error message for each
failure path helps with debugging.

Signed-off-by: Xiao Wang <xiao.w.wang@intel.com>
---
 drivers/net/ifc/ifcvf_vdpa.c | 19 +++++++++++++------
 1 file changed, 13 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ifc/ifcvf_vdpa.c b/drivers/net/ifc/ifcvf_vdpa.c
index e844109f3..aacd5f9bf 100644
--- a/drivers/net/ifc/ifcvf_vdpa.c
+++ b/drivers/net/ifc/ifcvf_vdpa.c
@@ -22,7 +22,7 @@
 
 #define DRV_LOG(level, fmt, args...) \
 	rte_log(RTE_LOG_ ## level, ifcvf_vdpa_logtype, \
-		"%s(): " fmt "\n", __func__, ##args)
+		"IFCVF %s(): " fmt "\n", __func__, ##args)
 
 #ifndef PAGE_SIZE
 #define PAGE_SIZE 4096
@@ -756,11 +756,16 @@ ifcvf_pci_probe(struct rte_pci_driver *pci_drv __rte_unused,
 
 	internal->pdev = pci_dev;
 	rte_spinlock_init(&internal->lock);
-	if (ifcvf_vfio_setup(internal) < 0)
-		return -1;
 
-	if (ifcvf_init_hw(&internal->hw, internal->pdev) < 0)
-		return -1;
+	if (ifcvf_vfio_setup(internal) < 0) {
+		DRV_LOG(ERR, "failed to setup device %s", pci_dev->name);
+		goto error;
+	}
+
+	if (ifcvf_init_hw(&internal->hw, internal->pdev) < 0) {
+		DRV_LOG(ERR, "failed to init device %s", pci_dev->name);
+		goto error;
+	}
 
 	internal->max_queues = IFCVF_MAX_QUEUES;
 	features = ifcvf_get_features(&internal->hw);
@@ -782,8 +787,10 @@ ifcvf_pci_probe(struct rte_pci_driver *pci_drv __rte_unused,
 
 	internal->did = rte_vdpa_register_device(&internal->dev_addr,
 				&ifcvf_ops);
-	if (internal->did < 0)
+	if (internal->did < 0) {
+		DRV_LOG(ERR, "failed to register device %s", pci_dev->name);
 		goto error;
+	}
 
 	rte_atomic32_set(&internal->started, 1);
 	update_datapath(internal);
-- 
2.15.1


* [dpdk-dev] [PATCH v3 4/9] net/ifc: store only registered device instance
  2018-12-13 10:09       ` [dpdk-dev] [PATCH v3 0/9] support SW assisted VDPA live migration Xiao Wang
                           ` (2 preceding siblings ...)
  2018-12-13 10:09         ` [dpdk-dev] [PATCH v3 3/9] net/ifc: dump debug message for error Xiao Wang
@ 2018-12-13 10:09         ` Xiao Wang
  2018-12-13 10:09         ` [dpdk-dev] [PATCH v3 5/9] net/ifc: detect if VDPA mode is specified Xiao Wang
                           ` (4 subsequent siblings)
  8 siblings, 0 replies; 86+ messages in thread
From: Xiao Wang @ 2018-12-13 10:09 UTC (permalink / raw)
  To: alejandro.lucero, tiwei.bie
  Cc: maxime.coquelin, dev, zhihong.wang, xiaolong.ye, Xiao Wang, stable

If the driver fails to register the ifc VF device with the vhost lib, then
this device should not be stored in the internal device list.

Fixes: a3f8150eac6d ("net/ifcvf: add ifcvf vDPA driver")
cc: stable@dpdk.org

Signed-off-by: Xiao Wang <xiao.w.wang@intel.com>
---
 drivers/net/ifc/ifcvf_vdpa.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ifc/ifcvf_vdpa.c b/drivers/net/ifc/ifcvf_vdpa.c
index aacd5f9bf..6fcd50b73 100644
--- a/drivers/net/ifc/ifcvf_vdpa.c
+++ b/drivers/net/ifc/ifcvf_vdpa.c
@@ -781,10 +781,6 @@ ifcvf_pci_probe(struct rte_pci_driver *pci_drv __rte_unused,
 	internal->dev_addr.type = PCI_ADDR;
 	list->internal = internal;
 
-	pthread_mutex_lock(&internal_list_lock);
-	TAILQ_INSERT_TAIL(&internal_list, list, next);
-	pthread_mutex_unlock(&internal_list_lock);
-
 	internal->did = rte_vdpa_register_device(&internal->dev_addr,
 				&ifcvf_ops);
 	if (internal->did < 0) {
@@ -792,6 +788,10 @@ ifcvf_pci_probe(struct rte_pci_driver *pci_drv __rte_unused,
 		goto error;
 	}
 
+	pthread_mutex_lock(&internal_list_lock);
+	TAILQ_INSERT_TAIL(&internal_list, list, next);
+	pthread_mutex_unlock(&internal_list_lock);
+
 	rte_atomic32_set(&internal->started, 1);
 	update_datapath(internal);
 
-- 
2.15.1


* [dpdk-dev] [PATCH v3 5/9] net/ifc: detect if VDPA mode is specified
  2018-12-13 10:09       ` [dpdk-dev] [PATCH v3 0/9] support SW assisted VDPA live migration Xiao Wang
                           ` (3 preceding siblings ...)
  2018-12-13 10:09         ` [dpdk-dev] [PATCH v3 4/9] net/ifc: store only registered device instance Xiao Wang
@ 2018-12-13 10:09         ` Xiao Wang
  2018-12-13 10:09         ` [dpdk-dev] [PATCH v3 6/9] net/ifc: add devarg for LM mode Xiao Wang
                           ` (3 subsequent siblings)
  8 siblings, 0 replies; 86+ messages in thread
From: Xiao Wang @ 2018-12-13 10:09 UTC (permalink / raw)
  To: alejandro.lucero, tiwei.bie
  Cc: maxime.coquelin, dev, zhihong.wang, xiaolong.ye, Xiao Wang

If user wants the VF to be used in VDPA (vhost data path acceleration)
mode, then the user can add a "vdpa=1" parameter for the device.

So if the driver does not find this option, it should quit and let the
bus continue the probe.

Signed-off-by: Xiao Wang <xiao.w.wang@intel.com>
---
v3:
* Add kvargs lib dependency in ifc driver.
---
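For reference, the value handling behind a "vdpa=1" style devarg can be
sketched as below. This is only an illustration under assumptions: the
helper name parse_u16 is hypothetical and is not part of this patch. It
shows one way to parse a devarg value into a 16-bit integer with strtoul()
while resetting errno first and rejecting trailing garbage and out-of-range
values.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical helper (not in the patch): parse a devarg value such as the
 * "1" in "vdpa=1" into a uint16_t. Returns 0 on success, -1 on any error.
 */
static int
parse_u16(const char *value, uint16_t *out)
{
	char *end = NULL;
	unsigned long v;

	if (value == NULL || out == NULL)
		return -1;

	/* strtoul() only sets errno on failure, so clear it beforehand */
	errno = 0;
	v = strtoul(value, &end, 0);

	/* reject conversion errors, empty input, trailing text, overflow */
	if (errno != 0 || end == value || *end != '\0' || v > UINT16_MAX)
		return -1;

	*out = (uint16_t)v;
	return 0;
}
```

A callback in the style of open_int() could call such a helper and store
the result in a variable of matching width.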
 drivers/net/ifc/Makefile     |  1 +
 drivers/net/ifc/ifcvf_vdpa.c | 47 ++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 48 insertions(+)

diff --git a/drivers/net/ifc/Makefile b/drivers/net/ifc/Makefile
index 39b36ae5d..7755a87eb 100644
--- a/drivers/net/ifc/Makefile
+++ b/drivers/net/ifc/Makefile
@@ -10,6 +10,7 @@ LIB = librte_pmd_ifc.a
 
 LDLIBS += -lpthread
 LDLIBS += -lrte_eal -lrte_pci -lrte_vhost -lrte_bus_pci
+LDLIBS += -lrte_kvargs
 
 CFLAGS += -O3
 CFLAGS += $(WERROR_FLAGS)
diff --git a/drivers/net/ifc/ifcvf_vdpa.c b/drivers/net/ifc/ifcvf_vdpa.c
index 6fcd50b73..c0e50354a 100644
--- a/drivers/net/ifc/ifcvf_vdpa.c
+++ b/drivers/net/ifc/ifcvf_vdpa.c
@@ -17,6 +17,8 @@
 #include <rte_vfio.h>
 #include <rte_spinlock.h>
 #include <rte_log.h>
+#include <rte_kvargs.h>
+#include <rte_devargs.h>
 
 #include "base/ifcvf.h"
 
@@ -28,6 +30,13 @@
 #define PAGE_SIZE 4096
 #endif
 
+#define IFCVF_VDPA_MODE		"vdpa"
+
+static const char * const ifcvf_valid_arguments[] = {
+	IFCVF_VDPA_MODE,
+	NULL
+};
+
 static int ifcvf_vdpa_logtype;
 
 struct ifcvf_internal {
@@ -735,6 +744,21 @@ static struct rte_vdpa_dev_ops ifcvf_ops = {
 	.get_notify_area = ifcvf_get_notify_area,
 };
 
+static inline int
+open_int(const char *key __rte_unused, const char *value, void *extra_args)
+{
+	uint16_t *n = extra_args;
+
+	if (value == NULL || extra_args == NULL)
+		return -EINVAL;
+
+	*n = (uint16_t)strtoul(value, NULL, 0);
+	if (*n == USHRT_MAX && errno == ERANGE)
+		return -1;
+
+	return 0;
+}
+
 static int
 ifcvf_pci_probe(struct rte_pci_driver *pci_drv __rte_unused,
 		struct rte_pci_device *pci_dev)
@@ -742,10 +766,31 @@ ifcvf_pci_probe(struct rte_pci_driver *pci_drv __rte_unused,
 	uint64_t features;
 	struct ifcvf_internal *internal = NULL;
 	struct internal_list *list = NULL;
+	int vdpa_mode = 0;
+	struct rte_kvargs *kvlist = NULL;
+	int ret = 0;
 
 	if (rte_eal_process_type() != RTE_PROC_PRIMARY)
 		return 0;
 
+	kvlist = rte_kvargs_parse(pci_dev->device.devargs->args,
+			ifcvf_valid_arguments);
+	if (kvlist == NULL)
+		return 1;
+
+	/* probe only when vdpa mode is specified */
+	if (rte_kvargs_count(kvlist, IFCVF_VDPA_MODE) == 0) {
+		rte_kvargs_free(kvlist);
+		return 1;
+	}
+
+	ret = rte_kvargs_process(kvlist, IFCVF_VDPA_MODE, &open_int,
+			&vdpa_mode);
+	if (ret < 0 || vdpa_mode == 0) {
+		rte_kvargs_free(kvlist);
+		return 1;
+	}
+
 	list = rte_zmalloc("ifcvf", sizeof(*list), 0);
 	if (list == NULL)
 		goto error;
@@ -795,9 +840,11 @@ ifcvf_pci_probe(struct rte_pci_driver *pci_drv __rte_unused,
 	rte_atomic32_set(&internal->started, 1);
 	update_datapath(internal);
 
+	rte_kvargs_free(kvlist);
 	return 0;
 
 error:
+	rte_kvargs_free(kvlist);
 	rte_free(list);
 	rte_free(internal);
 	return -1;
-- 
2.15.1


* [dpdk-dev] [PATCH v3 6/9] net/ifc: add devarg for LM mode
  2018-12-13 10:09       ` [dpdk-dev] [PATCH v3 0/9] support SW assisted VDPA live migration Xiao Wang
                           ` (4 preceding siblings ...)
  2018-12-13 10:09         ` [dpdk-dev] [PATCH v3 5/9] net/ifc: detect if VDPA mode is specified Xiao Wang
@ 2018-12-13 10:09         ` Xiao Wang
  2018-12-13 10:09         ` [dpdk-dev] [PATCH v3 7/9] net/ifc: use lib API for used ring logging Xiao Wang
                           ` (2 subsequent siblings)
  8 siblings, 0 replies; 86+ messages in thread
From: Xiao Wang @ 2018-12-13 10:09 UTC (permalink / raw)
  To: alejandro.lucero, tiwei.bie
  Cc: maxime.coquelin, dev, zhihong.wang, xiaolong.ye, Xiao Wang

This patch series enables a new method for live migration, i.e. software
assisted live migration. This patch provides a device argument for the user
to choose the method.

When "swlm=1", driver/device will do live migration with a relay thread
dealing with dirty page logging. Without this parameter, device will do
dirty page logging and there's no relay thread consuming CPU resource.

Signed-off-by: Xiao Wang <xiao.w.wang@intel.com>
---
 drivers/net/ifc/ifcvf_vdpa.c | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/drivers/net/ifc/ifcvf_vdpa.c b/drivers/net/ifc/ifcvf_vdpa.c
index c0e50354a..395c5112f 100644
--- a/drivers/net/ifc/ifcvf_vdpa.c
+++ b/drivers/net/ifc/ifcvf_vdpa.c
@@ -8,6 +8,7 @@
 #include <sys/ioctl.h>
 #include <sys/epoll.h>
 #include <linux/virtio_net.h>
+#include <stdbool.h>
 
 #include <rte_malloc.h>
 #include <rte_memory.h>
@@ -31,9 +32,11 @@
 #endif
 
 #define IFCVF_VDPA_MODE		"vdpa"
+#define IFCVF_SW_FALLBACK_LM	"swlm"
 
 static const char * const ifcvf_valid_arguments[] = {
 	IFCVF_VDPA_MODE,
+	IFCVF_SW_FALLBACK_LM,
 	NULL
 };
 
@@ -56,6 +59,7 @@ struct ifcvf_internal {
 	rte_atomic32_t dev_attached;
 	rte_atomic32_t running;
 	rte_spinlock_t lock;
+	bool sw_lm;
 };
 
 struct internal_list {
@@ -767,6 +771,7 @@ ifcvf_pci_probe(struct rte_pci_driver *pci_drv __rte_unused,
 	struct ifcvf_internal *internal = NULL;
 	struct internal_list *list = NULL;
 	int vdpa_mode = 0;
+	int sw_fallback_lm = 0;
 	struct rte_kvargs *kvlist = NULL;
 	int ret = 0;
 
@@ -826,6 +831,14 @@ ifcvf_pci_probe(struct rte_pci_driver *pci_drv __rte_unused,
 	internal->dev_addr.type = PCI_ADDR;
 	list->internal = internal;
 
+	if (rte_kvargs_count(kvlist, IFCVF_SW_FALLBACK_LM)) {
+		ret = rte_kvargs_process(kvlist, IFCVF_SW_FALLBACK_LM,
+				&open_int, &sw_fallback_lm);
+		if (ret < 0)
+			goto error;
+	}
+	internal->sw_lm = sw_fallback_lm;
+
 	internal->did = rte_vdpa_register_device(&internal->dev_addr,
 				&ifcvf_ops);
 	if (internal->did < 0) {
-- 
2.15.1


* [dpdk-dev] [PATCH v3 7/9] net/ifc: use lib API for used ring logging
  2018-12-13 10:09       ` [dpdk-dev] [PATCH v3 0/9] support SW assisted VDPA live migration Xiao Wang
                           ` (5 preceding siblings ...)
  2018-12-13 10:09         ` [dpdk-dev] [PATCH v3 6/9] net/ifc: add devarg for LM mode Xiao Wang
@ 2018-12-13 10:09         ` Xiao Wang
  2018-12-13 10:09         ` [dpdk-dev] [PATCH v3 8/9] net/ifc: support SW assisted VDPA live migration Xiao Wang
  2018-12-13 10:09         ` [dpdk-dev] [PATCH v3 9/9] doc: update ifc NIC document Xiao Wang
  8 siblings, 0 replies; 86+ messages in thread
From: Xiao Wang @ 2018-12-13 10:09 UTC (permalink / raw)
  To: alejandro.lucero, tiwei.bie
  Cc: maxime.coquelin, dev, zhihong.wang, xiaolong.ye, Xiao Wang

The vhost lib already provides a helper for used ring logging, so the
driver can use it to reduce code.

Signed-off-by: Xiao Wang <xiao.w.wang@intel.com>
---
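As a worked example of the arithmetic involved, the sketch below computes
the byte length of a split used ring and marks the pages it touches in a
dirty bitmap. It is an illustration only: all sketch_* names and the local
struct are hypothetical stand-ins, the bitmap update is a plain OR rather
than an atomic one, and the vhost lib helper remains the real
implementation.

```c
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u

/* Local stand-in for struct vring_used_elem: two 32-bit fields. */
struct sketch_used_elem {
	uint32_t id;
	uint32_t len;
};

/*
 * Used ring length in bytes: "size" elements plus three 16-bit fields
 * (flags, idx and the trailing event index).
 */
static uint64_t
sketch_used_ring_len(uint16_t size)
{
	return (uint64_t)size * sizeof(struct sketch_used_elem) +
		sizeof(uint16_t) * 3;
}

/*
 * Set one bit per page touched by [addr, addr + len). The caller must
 * provide a bitmap large enough for the highest page number.
 */
static void
sketch_log_range(uint8_t *bitmap, uint64_t addr, uint64_t len)
{
	uint64_t page, last;

	if (len == 0)
		return;

	last = (addr + len - 1) / SKETCH_PAGE_SIZE;
	for (page = addr / SKETCH_PAGE_SIZE; page <= last; page++)
		bitmap[page / 8] |= (uint8_t)(1u << (page % 8));
}
```

For a 256-entry ring this gives 256 * 8 + 6 = 2054 bytes, which spans one
or two 4 KiB pages depending on where the ring starts.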
 drivers/net/ifc/ifcvf_vdpa.c | 27 ++++++++-------------------
 1 file changed, 8 insertions(+), 19 deletions(-)

diff --git a/drivers/net/ifc/ifcvf_vdpa.c b/drivers/net/ifc/ifcvf_vdpa.c
index 395c5112f..f181c5a6e 100644
--- a/drivers/net/ifc/ifcvf_vdpa.c
+++ b/drivers/net/ifc/ifcvf_vdpa.c
@@ -31,6 +31,9 @@
 #define PAGE_SIZE 4096
 #endif
 
+#define IFCVF_USED_RING_LEN(size) \
+	((size) * sizeof(struct vring_used_elem) + sizeof(uint16_t) * 3)
+
 #define IFCVF_VDPA_MODE		"vdpa"
 #define IFCVF_SW_FALLBACK_LM	"swlm"
 
@@ -288,21 +291,6 @@ vdpa_ifcvf_start(struct ifcvf_internal *internal)
 	return ifcvf_start_hw(&internal->hw);
 }
 
-static void
-ifcvf_used_ring_log(struct ifcvf_hw *hw, uint32_t queue, uint8_t *log_buf)
-{
-	uint32_t i, size;
-	uint64_t pfn;
-
-	pfn = hw->vring[queue].used / PAGE_SIZE;
-	size = hw->vring[queue].size * sizeof(struct vring_used_elem) +
-			sizeof(uint16_t) * 3;
-
-	for (i = 0; i <= size / PAGE_SIZE; i++)
-		__sync_fetch_and_or_8(&log_buf[(pfn + i) / 8],
-				1 << ((pfn + i) % 8));
-}
-
 static void
 vdpa_ifcvf_stop(struct ifcvf_internal *internal)
 {
@@ -311,7 +299,7 @@ vdpa_ifcvf_stop(struct ifcvf_internal *internal)
 	int vid;
 	uint64_t features;
 	uint64_t log_base, log_size;
-	uint8_t *log_buf;
+	uint64_t len;
 
 	vid = internal->vid;
 	ifcvf_stop_hw(hw);
@@ -330,9 +318,10 @@ vdpa_ifcvf_stop(struct ifcvf_internal *internal)
 		 * IFCVF marks dirty memory pages for only packet buffer,
 		 * SW helps to mark the used ring as dirty after device stops.
 		 */
-		log_buf = (uint8_t *)(uintptr_t)log_base;
-		for (i = 0; i < hw->nr_vring; i++)
-			ifcvf_used_ring_log(hw, i, log_buf);
+		for (i = 0; i < hw->nr_vring; i++) {
+			len = IFCVF_USED_RING_LEN(hw->vring[i].size);
+			rte_vhost_log_used_vring(vid, i, 0, len);
+		}
 	}
 }
 
-- 
2.15.1


* [dpdk-dev] [PATCH v3 8/9] net/ifc: support SW assisted VDPA live migration
  2018-12-13 10:09       ` [dpdk-dev] [PATCH v3 0/9] support SW assisted VDPA live migration Xiao Wang
                           ` (6 preceding siblings ...)
  2018-12-13 10:09         ` [dpdk-dev] [PATCH v3 7/9] net/ifc: use lib API for used ring logging Xiao Wang
@ 2018-12-13 10:09         ` Xiao Wang
  2018-12-13 10:09         ` [dpdk-dev] [PATCH v3 9/9] doc: update ifc NIC document Xiao Wang
  8 siblings, 0 replies; 86+ messages in thread
From: Xiao Wang @ 2018-12-13 10:09 UTC (permalink / raw)
  To: alejandro.lucero, tiwei.bie
  Cc: maxime.coquelin, dev, zhihong.wang, xiaolong.ye, Xiao Wang

In SW assisted live migration mode, the driver will stop the device and
set up a mediate virtio ring to relay the communication between the
virtio driver and the VDPA device.

This data path intervention will allow SW to help on guest dirty page
logging for live migration.

This SW fallback is an event driven relay thread, so when the network
throughput is low, it will take little CPU resource, but when the
throughput goes up, the relay thread's CPU usage will go up
accordingly.

User needs to take all the factors including CPU usage, guest perf
degradation, etc. into consideration when selecting the live migration
support mode.

Signed-off-by: Xiao Wang <xiao.w.wang@intel.com>
---
v2:
* Make the parameter parsing code shorter.
---
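The relay thread tags each epoll event with the fd, the queue id and a
direction bit packed into one 64-bit value. The sketch below illustrates
that packing under assumptions: the relay_tag_* helper names are
hypothetical and do not appear in the patch, which open-codes the same
shifts and reads the low half through the epoll data union.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Pack an event tag: fd in the upper 32 bits, queue id shifted left by
 * one, and bit 0 set when the event comes from the device (interrupt fd)
 * rather than from the guest (kick fd).
 */
static uint64_t
relay_tag_encode(int fd, uint16_t qid, bool from_device)
{
	return ((uint64_t)(uint32_t)fd << 32) |
		((uint64_t)qid << 1) |
		(from_device ? 1u : 0u);
}

/* fd to read in order to clear the eventfd counter */
static int
relay_tag_fd(uint64_t tag)
{
	return (int)(uint32_t)(tag >> 32);
}

/* queue whose ring needs relaying */
static uint16_t
relay_tag_qid(uint64_t tag)
{
	return (uint16_t)((uint32_t)tag >> 1);
}

/* true: relay the used ring; false: relay the avail ring */
static bool
relay_tag_from_device(uint64_t tag)
{
	return (tag & 1) != 0;
}
```

Decoding from the full 64-bit value avoids depending on which half of the
union a 32-bit access returns on a given byte order.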
 drivers/net/ifc/base/ifcvf.h |   1 +
 drivers/net/ifc/ifcvf_vdpa.c | 346 ++++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 344 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ifc/base/ifcvf.h b/drivers/net/ifc/base/ifcvf.h
index f026c70ab..8eb70ae9d 100644
--- a/drivers/net/ifc/base/ifcvf.h
+++ b/drivers/net/ifc/base/ifcvf.h
@@ -50,6 +50,7 @@
 #define IFCVF_LM_ENABLE_VF		0x1
 #define IFCVF_LM_ENABLE_PF		0x3
 #define IFCVF_LOG_BASE			0x100000000000
+#define IFCVF_MEDIATE_VRING		0x200000000000
 
 #define IFCVF_32_BIT_MASK		0xffffffff
 
diff --git a/drivers/net/ifc/ifcvf_vdpa.c b/drivers/net/ifc/ifcvf_vdpa.c
index f181c5a6e..61757d0b4 100644
--- a/drivers/net/ifc/ifcvf_vdpa.c
+++ b/drivers/net/ifc/ifcvf_vdpa.c
@@ -63,6 +63,9 @@ struct ifcvf_internal {
 	rte_atomic32_t running;
 	rte_spinlock_t lock;
 	bool sw_lm;
+	bool sw_fallback_running;
+	/* mediated vring for sw fallback */
+	struct vring m_vring[IFCVF_MAX_QUEUES * 2];
 };
 
 struct internal_list {
@@ -308,6 +311,9 @@ vdpa_ifcvf_stop(struct ifcvf_internal *internal)
 		rte_vhost_set_vring_base(vid, i, hw->vring[i].last_avail_idx,
 				hw->vring[i].last_used_idx);
 
+	if (internal->sw_lm)
+		return;
+
 	rte_vhost_get_negotiated_features(vid, &features);
 	if (RTE_VHOST_NEED_LOG(features)) {
 		ifcvf_disable_logging(hw);
@@ -539,6 +545,318 @@ update_datapath(struct ifcvf_internal *internal)
 	return ret;
 }
 
+static int
+m_ifcvf_start(struct ifcvf_internal *internal)
+{
+	struct ifcvf_hw *hw = &internal->hw;
+	uint32_t i, nr_vring;
+	int vid, ret;
+	struct rte_vhost_vring vq;
+	void *vring_buf;
+	uint64_t m_vring_iova = IFCVF_MEDIATE_VRING;
+	uint64_t size;
+	uint64_t gpa;
+
+	vid = internal->vid;
+	nr_vring = rte_vhost_get_vring_num(vid);
+	rte_vhost_get_negotiated_features(vid, &hw->req_features);
+
+	for (i = 0; i < nr_vring; i++) {
+		rte_vhost_get_vhost_vring(vid, i, &vq);
+
+		size = RTE_ALIGN_CEIL(vring_size(vq.size, PAGE_SIZE),
+				PAGE_SIZE);
+		vring_buf = rte_zmalloc("ifcvf", size, PAGE_SIZE);
+		vring_init(&internal->m_vring[i], vq.size, vring_buf,
+				PAGE_SIZE);
+
+		ret = rte_vfio_container_dma_map(internal->vfio_container_fd,
+			(uint64_t)(uintptr_t)vring_buf, m_vring_iova, size);
+		if (ret < 0) {
+			DRV_LOG(ERR, "mediate vring DMA map failed.");
+			goto error;
+		}
+
+		gpa = hva_to_gpa(vid, (uint64_t)(uintptr_t)vq.desc);
+		if (gpa == 0) {
+			DRV_LOG(ERR, "Fail to get GPA for descriptor ring.");
+			return -1;
+		}
+		hw->vring[i].desc = gpa;
+
+		hw->vring[i].avail = m_vring_iova +
+			(char *)internal->m_vring[i].avail -
+			(char *)internal->m_vring[i].desc;
+
+		hw->vring[i].used = m_vring_iova +
+			(char *)internal->m_vring[i].used -
+			(char *)internal->m_vring[i].desc;
+
+		hw->vring[i].size = vq.size;
+
+		rte_vhost_get_vring_base(vid, i, &hw->vring[i].last_avail_idx,
+				&hw->vring[i].last_used_idx);
+
+		m_vring_iova += size;
+	}
+	hw->nr_vring = nr_vring;
+
+	return ifcvf_start_hw(&internal->hw);
+
+error:
+	for (i = 0; i < nr_vring; i++)
+		if (internal->m_vring[i].desc)
+			rte_free(internal->m_vring[i].desc);
+
+	return -1;
+}
+
+static int
+m_ifcvf_stop(struct ifcvf_internal *internal)
+{
+	int vid;
+	uint32_t i;
+	struct rte_vhost_vring vq;
+	struct ifcvf_hw *hw = &internal->hw;
+	uint64_t m_vring_iova = IFCVF_MEDIATE_VRING;
+	uint64_t size, len;
+
+	vid = internal->vid;
+	ifcvf_stop_hw(hw);
+
+	for (i = 0; i < hw->nr_vring; i++) {
+		rte_vhost_get_vhost_vring(vid, i, &vq);
+		len = IFCVF_USED_RING_LEN(vq.size);
+		rte_vhost_log_used_vring(vid, i, 0, len);
+
+		size = RTE_ALIGN_CEIL(vring_size(vq.size, PAGE_SIZE),
+				PAGE_SIZE);
+		rte_vfio_container_dma_unmap(internal->vfio_container_fd,
+			(uint64_t)(uintptr_t)internal->m_vring[i].desc,
+			m_vring_iova, size);
+
+		rte_vhost_set_vring_base(vid, i, hw->vring[i].last_avail_idx,
+				hw->vring[i].last_used_idx);
+		rte_free(internal->m_vring[i].desc);
+		m_vring_iova += size;
+	}
+
+	return 0;
+}
+
+static int
+m_enable_vfio_intr(struct ifcvf_internal *internal)
+{
+	uint32_t nr_vring;
+	struct rte_intr_handle *intr_handle = &internal->pdev->intr_handle;
+	int ret;
+
+	nr_vring = rte_vhost_get_vring_num(internal->vid);
+
+	ret = rte_intr_efd_enable(intr_handle, nr_vring);
+	if (ret)
+		return -1;
+
+	ret = rte_intr_enable(intr_handle);
+	if (ret)
+		return -1;
+
+	return 0;
+}
+
+static void
+m_disable_vfio_intr(struct ifcvf_internal *internal)
+{
+	struct rte_intr_handle *intr_handle = &internal->pdev->intr_handle;
+
+	rte_intr_efd_disable(intr_handle);
+	rte_intr_disable(intr_handle);
+}
+
+static void
+update_avail_ring(struct ifcvf_internal *internal, uint16_t qid)
+{
+	rte_vdpa_relay_vring_avail(internal->vid, qid, &internal->m_vring[qid]);
+	ifcvf_notify_queue(&internal->hw, qid);
+}
+
+static void
+update_used_ring(struct ifcvf_internal *internal, uint16_t qid)
+{
+	rte_vdpa_relay_vring_used(internal->vid, qid, &internal->m_vring[qid]);
+	rte_vhost_vring_call(internal->vid, qid);
+}
+
+static void *
+vring_relay(void *arg)
+{
+	int i, vid, epfd, fd, nfds;
+	struct ifcvf_internal *internal = (struct ifcvf_internal *)arg;
+	struct rte_vhost_vring vring;
+	struct rte_intr_handle *intr_handle;
+	uint16_t qid, q_num;
+	struct epoll_event events[IFCVF_MAX_QUEUES * 4];
+	struct epoll_event ev;
+	int nbytes;
+	uint64_t buf;
+
+	vid = internal->vid;
+	q_num = rte_vhost_get_vring_num(vid);
+	/* prepare the mediate vring */
+	for (qid = 0; qid < q_num; qid++) {
+		rte_vhost_get_vring_base(vid, qid,
+				&internal->m_vring[qid].avail->idx,
+				&internal->m_vring[qid].used->idx);
+		rte_vdpa_relay_vring_avail(vid, qid, &internal->m_vring[qid]);
+	}
+
+	/* add notify fd and interrupt fd to epoll */
+	epfd = epoll_create(IFCVF_MAX_QUEUES * 2);
+	if (epfd < 0) {
+		DRV_LOG(ERR, "failed to create epoll instance.");
+		return NULL;
+	}
+	internal->epfd = epfd;
+
+	for (qid = 0; qid < q_num; qid++) {
+		ev.events = EPOLLIN | EPOLLPRI;
+		rte_vhost_get_vhost_vring(vid, qid, &vring);
+		ev.data.u64 = qid << 1 | (uint64_t)vring.kickfd << 32;
+		if (epoll_ctl(epfd, EPOLL_CTL_ADD, vring.kickfd, &ev) < 0) {
+			DRV_LOG(ERR, "epoll add error: %s", strerror(errno));
+			return NULL;
+		}
+	}
+
+	intr_handle = &internal->pdev->intr_handle;
+	for (qid = 0; qid < q_num; qid++) {
+		ev.events = EPOLLIN | EPOLLPRI;
+		ev.data.u64 = 1 | qid << 1 |
+			(uint64_t)intr_handle->efds[qid] << 32;
+		if (epoll_ctl(epfd, EPOLL_CTL_ADD, intr_handle->efds[qid], &ev)
+				< 0) {
+			DRV_LOG(ERR, "epoll add error: %s", strerror(errno));
+			return NULL;
+		}
+	}
+
+	/* start relay with a first kick */
+	for (qid = 0; qid < q_num; qid++)
+		ifcvf_notify_queue(&internal->hw, qid);
+
+	/* listen to the events and react accordingly */
+	for (;;) {
+		nfds = epoll_wait(epfd, events, q_num * 2, -1);
+		if (nfds < 0) {
+			if (errno == EINTR)
+				continue;
+			DRV_LOG(ERR, "epoll_wait return fail");
+			return NULL;
+		}
+
+		for (i = 0; i < nfds; i++) {
+			fd = (uint32_t)(events[i].data.u64 >> 32);
+			do {
+				nbytes = read(fd, &buf, 8);
+				if (nbytes < 0) {
+					if (errno == EINTR ||
+					    errno == EWOULDBLOCK ||
+					    errno == EAGAIN)
+						continue;
+					DRV_LOG(INFO, "Error reading "
+						"kickfd: %s",
+						strerror(errno));
+				}
+				break;
+			} while (1);
+
+			qid = events[i].data.u32 >> 1;
+
+			if (events[i].data.u32 & 1)
+				update_used_ring(internal, qid);
+			else
+				update_avail_ring(internal, qid);
+		}
+	}
+
+	return NULL;
+}
+
+static int
+setup_vring_relay(struct ifcvf_internal *internal)
+{
+	int ret;
+
+	ret = pthread_create(&internal->tid, NULL, vring_relay,
+			(void *)internal);
+	if (ret) {
+		DRV_LOG(ERR, "failed to create ring relay pthread.");
+		return -1;
+	}
+	return 0;
+}
+
+static int
+unset_vring_relay(struct ifcvf_internal *internal)
+{
+	void *status;
+
+	if (internal->tid) {
+		pthread_cancel(internal->tid);
+		pthread_join(internal->tid, &status);
+	}
+	internal->tid = 0;
+
+	if (internal->epfd >= 0)
+		close(internal->epfd);
+	internal->epfd = -1;
+
+	return 0;
+}
+
+static int
+ifcvf_sw_fallback_switchover(struct ifcvf_internal *internal)
+{
+	int ret;
+
+	/* stop the direct IO data path */
+	unset_notify_relay(internal);
+	vdpa_ifcvf_stop(internal);
+	vdpa_disable_vfio_intr(internal);
+
+	ret = rte_vhost_host_notifier_ctrl(internal->vid, false);
+	if (ret && ret != -ENOTSUP)
+		goto error;
+
+	/* set up interrupt for interrupt relay */
+	ret = m_enable_vfio_intr(internal);
+	if (ret)
+		goto unmap;
+
+	/* config the VF */
+	ret = m_ifcvf_start(internal);
+	if (ret)
+		goto unset_intr;
+
+	/* set up vring relay thread */
+	ret = setup_vring_relay(internal);
+	if (ret)
+		goto stop_vf;
+
+	internal->sw_fallback_running = true;
+
+	return 0;
+
+stop_vf:
+	m_ifcvf_stop(internal);
+unset_intr:
+	m_disable_vfio_intr(internal);
+unmap:
+	ifcvf_dma_map(internal, 0);
+error:
+	return -1;
+}
+
 static int
 ifcvf_dev_config(int vid)
 {
@@ -579,8 +897,25 @@ ifcvf_dev_close(int vid)
 	}
 
 	internal = list->internal;
-	rte_atomic32_set(&internal->dev_attached, 0);
-	update_datapath(internal);
+
+	if (internal->sw_fallback_running) {
+		/* unset ring relay */
+		unset_vring_relay(internal);
+
+		/* reset VF */
+		m_ifcvf_stop(internal);
+
+		/* remove interrupt setting */
+		m_disable_vfio_intr(internal);
+
+		/* unset DMA map for guest memory */
+		ifcvf_dma_map(internal, 0);
+
+		internal->sw_fallback_running = false;
+	} else {
+		rte_atomic32_set(&internal->dev_attached, 0);
+		update_datapath(internal);
+	}
 
 	return 0;
 }
@@ -604,7 +939,12 @@ ifcvf_set_features(int vid)
 	internal = list->internal;
 	rte_vhost_get_negotiated_features(vid, &features);
 
-	if (RTE_VHOST_NEED_LOG(features)) {
+	if (!RTE_VHOST_NEED_LOG(features))
+		return 0;
+
+	if (internal->sw_lm) {
+		ifcvf_sw_fallback_switchover(internal);
+	} else {
 		rte_vhost_get_log_base(vid, &log_base, &log_size);
 		rte_vfio_container_dma_map(internal->vfio_container_fd,
 				log_base, IFCVF_LOG_BASE, log_size);
-- 
2.15.1


* [dpdk-dev] [PATCH v3 9/9] doc: update ifc NIC document
  2018-12-13 10:09       ` [dpdk-dev] [PATCH v3 0/9] support SW assisted VDPA live migration Xiao Wang
                           ` (7 preceding siblings ...)
  2018-12-13 10:09         ` [dpdk-dev] [PATCH v3 8/9] net/ifc: support SW assisted VDPA live migration Xiao Wang
@ 2018-12-13 10:09         ` Xiao Wang
  8 siblings, 0 replies; 86+ messages in thread
From: Xiao Wang @ 2018-12-13 10:09 UTC (permalink / raw)
  To: alejandro.lucero, tiwei.bie
  Cc: maxime.coquelin, dev, zhihong.wang, xiaolong.ye, Xiao Wang

Add the SW assisted VDPA live migration feature into NIC doc.

Signed-off-by: Xiao Wang <xiao.w.wang@intel.com>
---
v3:
* Add commit message for the doc update patch.
* More description on the device argument.

v2:
* Add release note.
---
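To illustrate what the SW relay thread described in this doc update has to
do per queue, the sketch below copies pending avail entries from one ring
to another using free-running 16-bit indices. It is a simplified
illustration under assumptions: the function names are hypothetical, the
rings are plain arrays, the ring size is a power of two, and descriptor
validation, memory barriers and dirty page logging are left out.

```c
#include <stdint.h>

/* Entries the source has published that the destination has not seen. */
static uint16_t
ring_pending(uint16_t src_idx, uint16_t dst_idx)
{
	/* unsigned 16-bit subtraction stays correct across wraparound */
	return (uint16_t)(src_idx - dst_idx);
}

/* Map a free-running index to a slot; ring_size must be a power of two. */
static uint16_t
ring_slot(uint16_t idx, uint16_t ring_size)
{
	return (uint16_t)(idx & (uint16_t)(ring_size - 1));
}

/*
 * Copy all pending entries from src_ring to dst_ring, then publish the new
 * destination index. Returns the number of entries relayed.
 */
static uint16_t
relay_avail(const uint16_t *src_ring, uint16_t src_idx,
		uint16_t *dst_ring, uint16_t *dst_idx, uint16_t ring_size)
{
	uint16_t n = ring_pending(src_idx, *dst_idx);
	uint16_t i = *dst_idx;

	while (i != src_idx) {
		dst_ring[ring_slot(i, ring_size)] =
			src_ring[ring_slot(i, ring_size)];
		i++;
	}

	/* a real relay needs a write barrier before this store */
	*dst_idx = src_idx;
	return n;
}
```

The same index arithmetic applies in the opposite direction when used
entries are copied back to the guest ring.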
 doc/guides/nics/ifc.rst                | 8 ++++++++
 doc/guides/rel_notes/release_19_02.rst | 5 +++++
 2 files changed, 13 insertions(+)

diff --git a/doc/guides/nics/ifc.rst b/doc/guides/nics/ifc.rst
index 48f9adf1d..eb55d329a 100644
--- a/doc/guides/nics/ifc.rst
+++ b/doc/guides/nics/ifc.rst
@@ -39,6 +39,13 @@ the driver probe a new container is created for this device, with this
 container vDPA driver can program DMA remapping table with the VM's memory
 region information.
 
+The device argument "swlm=1" configures the driver into SW assisted live
+migration mode. In this mode, the driver sets up a SW relay thread when LM
+happens, and this thread helps the device to log dirty pages. This mode does
+not require HW to implement a dirty page logging function block, but consumes
+some CPU resource, the amount depending on the network throughput. If "swlm=1"
+is not specified, the driver relies on the device's logging capability.
+
 Key IFCVF vDPA driver ops
 ~~~~~~~~~~~~~~~~~~~~~~~~~
 
@@ -70,6 +77,7 @@ Features
 Features of the IFCVF driver are:
 
 - Compatibility with virtio 0.95 and 1.0.
+- SW assisted vDPA live migration.
 
 
 Prerequisites
diff --git a/doc/guides/rel_notes/release_19_02.rst b/doc/guides/rel_notes/release_19_02.rst
index a94fa86a7..ea3909631 100644
--- a/doc/guides/rel_notes/release_19_02.rst
+++ b/doc/guides/rel_notes/release_19_02.rst
@@ -54,6 +54,11 @@ New Features
      Also, make sure to start the actual text at the margin.
      =========================================================
 
+* **Add support for SW-assisted VDPA live migration.**
+  This SW-assisted VDPA live migration facility helps VDPA devices without
+  logging capability to perform live migration. A mediate SW relay helps
+  such devices to track dirty pages caused by DMA. The IFC driver has
+  enabled this SW-assisted live migration mode.
 
 Removed Items
 -------------
-- 
2.15.1


* Re: [dpdk-dev] [PATCH v3 1/9] vhost: provide helper for host notifier ctrl
  2018-12-13 10:09         ` [dpdk-dev] [PATCH v3 1/9] vhost: provide helper for host notifier ctrl Xiao Wang
@ 2018-12-14 13:33           ` Maxime Coquelin
  2018-12-14 19:05             ` Wang, Xiao W
  2018-12-14 21:16           ` [dpdk-dev] [PATCH v4 00/10] support SW assisted VDPA live migration Xiao Wang
  1 sibling, 1 reply; 86+ messages in thread
From: Maxime Coquelin @ 2018-12-14 13:33 UTC (permalink / raw)
  To: Xiao Wang, alejandro.lucero, tiwei.bie; +Cc: dev, zhihong.wang, xiaolong.ye



On 12/13/18 11:09 AM, Xiao Wang wrote:
> VDPA driver can decide if it needs to enable/disable the host notifier
> mapping, so exposing an API allows flexibility. A later patch will
> build on this.
> 
> Signed-off-by: Xiao Wang <xiao.w.wang@intel.com>
> ---
> v2:
> * Reword the vdpa host notifier control API comment.
> ---
>   drivers/net/ifc/ifcvf_vdpa.c           |  3 +++
>   lib/librte_vhost/rte_vdpa.h            | 18 ++++++++++++++++++
>   lib/librte_vhost/rte_vhost_version.map |  1 +
>   lib/librte_vhost/vhost.c               |  3 +--
>   lib/librte_vhost/vhost_user.c          |  7 +------
>   5 files changed, 24 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/net/ifc/ifcvf_vdpa.c b/drivers/net/ifc/ifcvf_vdpa.c
> index 97a57f182..e844109f3 100644
> --- a/drivers/net/ifc/ifcvf_vdpa.c
> +++ b/drivers/net/ifc/ifcvf_vdpa.c
> @@ -556,6 +556,9 @@ ifcvf_dev_config(int vid)
>   	rte_atomic32_set(&internal->dev_attached, 1);
>   	update_datapath(internal);
>   
> +	if (rte_vhost_host_notifier_ctrl(vid, true) != 0)
> +		DRV_LOG(NOTICE, "vDPA (%d): software relay is used.", did);
> +
>   	return 0;
>   }
>   
> diff --git a/lib/librte_vhost/rte_vdpa.h b/lib/librte_vhost/rte_vdpa.h
> index a418da47c..fff657391 100644
> --- a/lib/librte_vhost/rte_vdpa.h
> +++ b/lib/librte_vhost/rte_vdpa.h
> @@ -11,6 +11,8 @@
>    * Device specific vhost lib
>    */
>   
> +#include <stdbool.h>
> +
>   #include <rte_pci.h>
>   #include "rte_vhost.h"
>   
> @@ -155,4 +157,20 @@ rte_vdpa_get_device(int did);
>    */
>   int __rte_experimental
>   rte_vdpa_get_device_num(void);
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice
> + *
> + * Enable/Disable host notifier mapping for a vdpa port.
> + *
> + * @param vid
> + *  vhost device id
> + * @enable
> + *  true for host notifier map, false for host notifier unmap
> + * @return
> + *  0 on success, -1 on failure
> + */
> +int __rte_experimental
> +rte_vhost_host_notifier_ctrl(int vid, bool enable);
>   #endif /* _RTE_VDPA_H_ */
> diff --git a/lib/librte_vhost/rte_vhost_version.map b/lib/librte_vhost/rte_vhost_version.map
> index ae39b6e21..22302e972 100644
> --- a/lib/librte_vhost/rte_vhost_version.map
> +++ b/lib/librte_vhost/rte_vhost_version.map
> @@ -83,4 +83,5 @@ EXPERIMENTAL {
>   	rte_vhost_crypto_finalize_requests;
>   	rte_vhost_crypto_set_zero_copy;
>   	rte_vhost_va_from_guest_pa;
> +	rte_vhost_host_notifier_ctrl;
>   };
> diff --git a/lib/librte_vhost/vhost.c b/lib/librte_vhost/vhost.c
> index 70ac6bc9c..e7a60e0b4 100644
> --- a/lib/librte_vhost/vhost.c
> +++ b/lib/librte_vhost/vhost.c
> @@ -408,8 +408,7 @@ vhost_detach_vdpa_device(int vid)
>   	if (dev == NULL)
>   		return;
>   
> -	vhost_user_host_notifier_ctrl(vid, false);
> -
> +	vhost_destroy_device_notify(dev);
It seems that this addition is not mentioned in the commit message.
Why is it needed now?


>   	dev->vdpa_dev_id = -1;
>   }
>   
> diff --git a/lib/librte_vhost/vhost_user.c b/lib/librte_vhost/vhost_user.c
> index 3ea64eba6..5e0da0589 100644
> --- a/lib/librte_vhost/vhost_user.c
> +++ b/lib/librte_vhost/vhost_user.c
> @@ -2045,11 +2045,6 @@ vhost_user_msg_handler(int vid, int fd)
>   		if (vdpa_dev->ops->dev_conf)
>   			vdpa_dev->ops->dev_conf(dev->vid);
>   		dev->flags |= VIRTIO_DEV_VDPA_CONFIGURED;
> -		if (vhost_user_host_notifier_ctrl(dev->vid, true) != 0) {
> -			RTE_LOG(INFO, VHOST_CONFIG,
> -				"(%d) software relay is used for vDPA, performance may be low.\n",
> -				dev->vid);
> -		}
>   	}
>   
>   	return 0;
> @@ -2144,7 +2139,7 @@ static int vhost_user_slave_set_vring_host_notifier(struct virtio_net *dev,
>   	return process_slave_message_reply(dev, &msg);
>   }
>   
> -int vhost_user_host_notifier_ctrl(int vid, bool enable)
> +int rte_vhost_host_notifier_ctrl(int vid, bool enable)
>   {
>   	struct virtio_net *dev;
>   	struct rte_vdpa_device *vdpa_dev;
> 


* Re: [dpdk-dev] [PATCH v3 1/9] vhost: provide helper for host notifier ctrl
  2018-12-14 13:33           ` Maxime Coquelin
@ 2018-12-14 19:05             ` Wang, Xiao W
  0 siblings, 0 replies; 86+ messages in thread
From: Wang, Xiao W @ 2018-12-14 19:05 UTC (permalink / raw)
  To: Maxime Coquelin, alejandro.lucero, Bie, Tiwei
  Cc: dev, Wang, Zhihong, Ye, Xiaolong

Hi,

> -----Original Message-----
> From: Maxime Coquelin [mailto:maxime.coquelin@redhat.com]
> Sent: Friday, December 14, 2018 5:33 AM
> To: Wang, Xiao W <xiao.w.wang@intel.com>;
> alejandro.lucero@netronome.com; Bie, Tiwei <tiwei.bie@intel.com>
> Cc: dev@dpdk.org; Wang, Zhihong <zhihong.wang@intel.com>; Ye, Xiaolong
> <xiaolong.ye@intel.com>
> Subject: Re: [PATCH v3 1/9] vhost: provide helper for host notifier ctrl
> 
> 
> 
> On 12/13/18 11:09 AM, Xiao Wang wrote:
> > VDPA driver can decide if it needs to enable/disable the host notifier
> > mapping, so exposing an API allows flexibility. A later patch will
> > build on this.
> >
> > Signed-off-by: Xiao Wang <xiao.w.wang@intel.com>
> > ---
> > v2:
> > * Reword the vdpa host notifier control API comment.
> > ---
> >   drivers/net/ifc/ifcvf_vdpa.c           |  3 +++
> >   lib/librte_vhost/rte_vdpa.h            | 18 ++++++++++++++++++
> >   lib/librte_vhost/rte_vhost_version.map |  1 +
> >   lib/librte_vhost/vhost.c               |  3 +--
> >   lib/librte_vhost/vhost_user.c          |  7 +------
> >   5 files changed, 24 insertions(+), 8 deletions(-)
> >
> > diff --git a/drivers/net/ifc/ifcvf_vdpa.c b/drivers/net/ifc/ifcvf_vdpa.c
> > index 97a57f182..e844109f3 100644
> > --- a/drivers/net/ifc/ifcvf_vdpa.c
> > +++ b/drivers/net/ifc/ifcvf_vdpa.c
> > @@ -556,6 +556,9 @@ ifcvf_dev_config(int vid)
> >   	rte_atomic32_set(&internal->dev_attached, 1);
> >   	update_datapath(internal);
> >
> > +	if (rte_vhost_host_notifier_ctrl(vid, true) != 0)
> > +		DRV_LOG(NOTICE, "vDPA (%d): software relay is used.", did);
> > +
> >   	return 0;
> >   }
> >
> > diff --git a/lib/librte_vhost/rte_vdpa.h b/lib/librte_vhost/rte_vdpa.h
> > index a418da47c..fff657391 100644
> > --- a/lib/librte_vhost/rte_vdpa.h
> > +++ b/lib/librte_vhost/rte_vdpa.h
> > @@ -11,6 +11,8 @@
> >    * Device specific vhost lib
> >    */
> >
> > +#include <stdbool.h>
> > +
> >   #include <rte_pci.h>
> >   #include "rte_vhost.h"
> >
> > @@ -155,4 +157,20 @@ rte_vdpa_get_device(int did);
> >    */
> >   int __rte_experimental
> >   rte_vdpa_get_device_num(void);
> > +
> > +/**
> > + * @warning
> > + * @b EXPERIMENTAL: this API may change without prior notice
> > + *
> > + * Enable/Disable host notifier mapping for a vdpa port.
> > + *
> > + * @param vid
> > + *  vhost device id
> > + * @enable
> > + *  true for host notifier map, false for host notifier unmap
> > + * @return
> > + *  0 on success, -1 on failure
> > + */
> > +int __rte_experimental
> > +rte_vhost_host_notifier_ctrl(int vid, bool enable);
> >   #endif /* _RTE_VDPA_H_ */
> > diff --git a/lib/librte_vhost/rte_vhost_version.map b/lib/librte_vhost/rte_vhost_version.map
> > index ae39b6e21..22302e972 100644
> > --- a/lib/librte_vhost/rte_vhost_version.map
> > +++ b/lib/librte_vhost/rte_vhost_version.map
> > @@ -83,4 +83,5 @@ EXPERIMENTAL {
> >   	rte_vhost_crypto_finalize_requests;
> >   	rte_vhost_crypto_set_zero_copy;
> >   	rte_vhost_va_from_guest_pa;
> > +	rte_vhost_host_notifier_ctrl;
> >   };
> > diff --git a/lib/librte_vhost/vhost.c b/lib/librte_vhost/vhost.c
> > index 70ac6bc9c..e7a60e0b4 100644
> > --- a/lib/librte_vhost/vhost.c
> > +++ b/lib/librte_vhost/vhost.c
> > @@ -408,8 +408,7 @@ vhost_detach_vdpa_device(int vid)
> >   	if (dev == NULL)
> >   		return;
> >
> > -	vhost_user_host_notifier_ctrl(vid, false);
> > -
> > +	vhost_destroy_device_notify(dev);
> > It seems that this addition is not mentioned in the commit message.
> > Why is it needed now?

By symmetry with vhost_attach_vdpa_device(), I think detach should not only disable the host notifier but also destroy the vhost port. That said, this internal API is currently not used.
Yes, this point needs to be mentioned in the commit message. BTW, I would prefer to remove this unused internal API in a separate patch.

BRs,
Xiao

> 
> 
> >   	dev->vdpa_dev_id = -1;
> >   }
> >
> > diff --git a/lib/librte_vhost/vhost_user.c b/lib/librte_vhost/vhost_user.c
> > index 3ea64eba6..5e0da0589 100644
> > --- a/lib/librte_vhost/vhost_user.c
> > +++ b/lib/librte_vhost/vhost_user.c
> > @@ -2045,11 +2045,6 @@ vhost_user_msg_handler(int vid, int fd)
> >   		if (vdpa_dev->ops->dev_conf)
> >   			vdpa_dev->ops->dev_conf(dev->vid);
> >   		dev->flags |= VIRTIO_DEV_VDPA_CONFIGURED;
> > -		if (vhost_user_host_notifier_ctrl(dev->vid, true) != 0) {
> > -			RTE_LOG(INFO, VHOST_CONFIG,
> > -				"(%d) software relay is used for vDPA, performance may be low.\n",
> > -				dev->vid);
> > -		}
> >   	}
> >
> >   	return 0;
> > @@ -2144,7 +2139,7 @@ static int vhost_user_slave_set_vring_host_notifier(struct virtio_net *dev,
> >   	return process_slave_message_reply(dev, &msg);
> >   }
> >
> > -int vhost_user_host_notifier_ctrl(int vid, bool enable)
> > +int rte_vhost_host_notifier_ctrl(int vid, bool enable)
> >   {
> >   	struct virtio_net *dev;
> >   	struct rte_vdpa_device *vdpa_dev;
> >


* [dpdk-dev] [PATCH v4 00/10] support SW assisted VDPA live migration
  2018-12-13 10:09         ` [dpdk-dev] [PATCH v3 1/9] vhost: provide helper for host notifier ctrl Xiao Wang
  2018-12-14 13:33           ` Maxime Coquelin
@ 2018-12-14 21:16           ` Xiao Wang
  2018-12-14 21:16             ` [dpdk-dev] [PATCH v4 01/10] vhost: remove unused internal API Xiao Wang
                               ` (10 more replies)
  1 sibling, 11 replies; 86+ messages in thread
From: Xiao Wang @ 2018-12-14 21:16 UTC (permalink / raw)
  To: tiwei.bie, maxime.coquelin
  Cc: alejandro.lucero, dev, zhihong.wang, xiaolong.ye, Xiao Wang

In the previous VDPA implementation, live migration support was enabled by
having the HW accelerator do all the work, including dirty page logging and
device status report/restore. In this mode the VDPA sample daemon and device
driver only take care of the control path and are not involved in the data
path, so there is almost no CPU resource usage. This mode requires the device
to have dirty page logging capability.

This patch series adds live migration support for devices without logging
capability. The VDPA driver can set up a relay thread standing between the
guest and the device when live migration happens. This relay intervenes in
the communication between the guest virtio driver and the physical virtio
accelerator: it helps the device to do a vring relay and logs dirty pages
along the way. Some CPU resource is therefore consumed in this scenario, the
percentage depending on the network throughput.

Some new helpers are added into vhost lib for this VDPA SW fallback:
- rte_vhost_host_notifier_ctrl, to enable/disable the VDPA direct-IO
  datapath.
- rte_vdpa_relay_vring_avail, to relay the available request from guest vring
  to mediate vring.
- rte_vdpa_relay_vring_used, to relay the used response from mediate vring to
  guest vring.

Some existing helpers are also leveraged for SW fallback setup, like VFIO
interrupt configuration, IOMMU table programming, etc.

This patch series enables SW assisted VDPA live migration in the ifc driver.
Since ifcvf also supports HW dirty page logging, a new devarg is added to
let the user select whether the SW mode is used.
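
To illustrate the index handling behind the available ring relay described
above, here is a minimal, self-contained sketch. It is not the code from this
series: struct toy_avail, TOY_RING_SIZE and toy_relay_avail() are hypothetical
names, the ring layout is simplified, and descriptor validation and memory
barriers are left out. It only shows how free-running 16-bit indices and a
power-of-two mask let the relay copy the entries the mediate ring has not
seen yet.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_RING_SIZE 8u /* hypothetical size, must be a power of two */

/* Simplified stand-in for a split-ring available ring (not the real layout). */
struct toy_avail {
	uint16_t idx;                 /* free-running producer index */
	uint16_t ring[TOY_RING_SIZE]; /* descriptor head ids */
};

/*
 * Copy the entries that the guest has published but the mediate ring
 * has not seen yet. Returns the number of relayed entries.
 */
static int
toy_relay_avail(const struct toy_avail *guest, struct toy_avail *mediate)
{
	uint16_t idx = guest->idx;
	uint16_t idx_m = mediate->idx;
	/* the uint16_t cast keeps the difference correct across wrap-around */
	int nr = (uint16_t)(idx - idx_m);

	assert((TOY_RING_SIZE & (TOY_RING_SIZE - 1)) == 0);

	while (idx_m != idx) {
		uint16_t slot = idx_m & (TOY_RING_SIZE - 1);

		mediate->ring[slot] = guest->ring[slot];
		idx_m++;
	}

	/* a real relay needs a write barrier here, before publishing idx */
	mediate->idx = idx;

	return nr;
}
```

A used ring relay would presumably follow the same pattern in the opposite
direction, copying from the mediate ring to the guest ring.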

v4:
* Add a patch to remove the unused vhost internal API: vhost_detach_vdpa_device().

v3:
* Fix indent in relay code.
* Fix the iova access mode issue of buffer check.
* Rename the relay API to be more generic, and add more API note for used
  ring handling.
* Add kvargs lib dependency in ifc driver.
* Add commit message for the doc update patch for checkpatch warning.

v2:
* Reword the vdpa host notifier control API comment.
* Make the vring relay API parameter "void *" to accommodate potential future
  ring layouts, e.g. packed ring.
* Add parameter check for the new API.
* Add memory barrier for ring idx update.
* Remove the used ring logging in the relay.
* Some comment fix and code cleaning according to Tiwei's comment.
* Add release note update.

Xiao Wang (10):
  vhost: remove unused internal API
  vhost: provide helper for host notifier ctrl
  vhost: provide helpers for virtio ring relay
  net/ifc: dump debug message for error
  net/ifc: store only registered device instance
  net/ifc: detect if VDPA mode is specified
  net/ifc: add devarg for LM mode
  net/ifc: use lib API for used ring logging
  net/ifc: support SW assisted VDPA live migration
  doc: update ifc NIC document

 doc/guides/nics/ifc.rst                |   8 +
 doc/guides/rel_notes/release_19_02.rst |   6 +
 drivers/net/ifc/Makefile               |   1 +
 drivers/net/ifc/base/ifcvf.h           |   1 +
 drivers/net/ifc/ifcvf_vdpa.c           | 461 ++++++++++++++++++++++++++++++---
 lib/librte_vhost/rte_vdpa.h            |  57 ++++
 lib/librte_vhost/rte_vhost_version.map |   3 +
 lib/librte_vhost/vdpa.c                | 194 ++++++++++++++
 lib/librte_vhost/vhost.c               |  13 -
 lib/librte_vhost/vhost.h               |  41 ++-
 lib/librte_vhost/vhost_user.c          |   7 +-
 lib/librte_vhost/virtio_net.c          |  39 ---
 12 files changed, 741 insertions(+), 90 deletions(-)

-- 
2.15.1


* [dpdk-dev] [PATCH v4 01/10] vhost: remove unused internal API
  2018-12-14 21:16           ` [dpdk-dev] [PATCH v4 00/10] support SW assisted VDPA live migration Xiao Wang
@ 2018-12-14 21:16             ` Xiao Wang
  2018-12-16  8:58               ` Maxime Coquelin
  2018-12-14 21:16             ` [dpdk-dev] [PATCH v4 02/10] vhost: provide helper for host notifier ctrl Xiao Wang
                               ` (9 subsequent siblings)
  10 siblings, 1 reply; 86+ messages in thread
From: Xiao Wang @ 2018-12-14 21:16 UTC (permalink / raw)
  To: tiwei.bie, maxime.coquelin
  Cc: alejandro.lucero, dev, zhihong.wang, xiaolong.ye, Xiao Wang

vhost_detach_vdpa_device() is defined internally but never used, so remove
it in this patch.

Signed-off-by: Xiao Wang <xiao.w.wang@intel.com>
---
 lib/librte_vhost/vhost.c | 13 -------------
 lib/librte_vhost/vhost.h |  1 -
 2 files changed, 14 deletions(-)

diff --git a/lib/librte_vhost/vhost.c b/lib/librte_vhost/vhost.c
index 70ac6bc9c..b32babee4 100644
--- a/lib/librte_vhost/vhost.c
+++ b/lib/librte_vhost/vhost.c
@@ -400,19 +400,6 @@ vhost_attach_vdpa_device(int vid, int did)
 	dev->vdpa_dev_id = did;
 }
 
-void
-vhost_detach_vdpa_device(int vid)
-{
-	struct virtio_net *dev = get_device(vid);
-
-	if (dev == NULL)
-		return;
-
-	vhost_user_host_notifier_ctrl(vid, false);
-
-	dev->vdpa_dev_id = -1;
-}
-
 void
 vhost_set_ifname(int vid, const char *if_name, unsigned int if_len)
 {
diff --git a/lib/librte_vhost/vhost.h b/lib/librte_vhost/vhost.h
index 552b9298d..d5bab4803 100644
--- a/lib/librte_vhost/vhost.h
+++ b/lib/librte_vhost/vhost.h
@@ -629,7 +629,6 @@ void free_vq(struct virtio_net *dev, struct vhost_virtqueue *vq);
 int alloc_vring_queue(struct virtio_net *dev, uint32_t vring_idx);
 
 void vhost_attach_vdpa_device(int vid, int did);
-void vhost_detach_vdpa_device(int vid);
 
 void vhost_set_ifname(int, const char *if_name, unsigned int if_len);
 void vhost_enable_dequeue_zero_copy(int vid);
-- 
2.15.1


* [dpdk-dev] [PATCH v4 02/10] vhost: provide helper for host notifier ctrl
  2018-12-14 21:16           ` [dpdk-dev] [PATCH v4 00/10] support SW assisted VDPA live migration Xiao Wang
  2018-12-14 21:16             ` [dpdk-dev] [PATCH v4 01/10] vhost: remove unused internal API Xiao Wang
@ 2018-12-14 21:16             ` Xiao Wang
  2018-12-16  9:00               ` Maxime Coquelin
  2018-12-14 21:16             ` [dpdk-dev] [PATCH v4 03/10] vhost: provide helpers for virtio ring relay Xiao Wang
                               ` (8 subsequent siblings)
  10 siblings, 1 reply; 86+ messages in thread
From: Xiao Wang @ 2018-12-14 21:16 UTC (permalink / raw)
  To: tiwei.bie, maxime.coquelin
  Cc: alejandro.lucero, dev, zhihong.wang, xiaolong.ye, Xiao Wang

VDPA driver can decide if it needs to enable/disable the host notifier
mapping, so exposing an API allows flexibility. A later patch will
build on this.

Signed-off-by: Xiao Wang <xiao.w.wang@intel.com>
---
 drivers/net/ifc/ifcvf_vdpa.c           |  3 +++
 lib/librte_vhost/rte_vdpa.h            | 18 ++++++++++++++++++
 lib/librte_vhost/rte_vhost_version.map |  1 +
 lib/librte_vhost/vhost_user.c          |  7 +------
 4 files changed, 23 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ifc/ifcvf_vdpa.c b/drivers/net/ifc/ifcvf_vdpa.c
index 97a57f182..e844109f3 100644
--- a/drivers/net/ifc/ifcvf_vdpa.c
+++ b/drivers/net/ifc/ifcvf_vdpa.c
@@ -556,6 +556,9 @@ ifcvf_dev_config(int vid)
 	rte_atomic32_set(&internal->dev_attached, 1);
 	update_datapath(internal);
 
+	if (rte_vhost_host_notifier_ctrl(vid, true) != 0)
+		DRV_LOG(NOTICE, "vDPA (%d): software relay is used.", did);
+
 	return 0;
 }
 
diff --git a/lib/librte_vhost/rte_vdpa.h b/lib/librte_vhost/rte_vdpa.h
index a418da47c..fff657391 100644
--- a/lib/librte_vhost/rte_vdpa.h
+++ b/lib/librte_vhost/rte_vdpa.h
@@ -11,6 +11,8 @@
  * Device specific vhost lib
  */
 
+#include <stdbool.h>
+
 #include <rte_pci.h>
 #include "rte_vhost.h"
 
@@ -155,4 +157,20 @@ rte_vdpa_get_device(int did);
  */
 int __rte_experimental
 rte_vdpa_get_device_num(void);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * Enable/Disable host notifier mapping for a vdpa port.
+ *
+ * @param vid
+ *  vhost device id
+ * @param enable
+ *  true for host notifier map, false for host notifier unmap
+ * @return
+ *  0 on success, -1 on failure
+ */
+int __rte_experimental
+rte_vhost_host_notifier_ctrl(int vid, bool enable);
 #endif /* _RTE_VDPA_H_ */
diff --git a/lib/librte_vhost/rte_vhost_version.map b/lib/librte_vhost/rte_vhost_version.map
index ae39b6e21..22302e972 100644
--- a/lib/librte_vhost/rte_vhost_version.map
+++ b/lib/librte_vhost/rte_vhost_version.map
@@ -83,4 +83,5 @@ EXPERIMENTAL {
 	rte_vhost_crypto_finalize_requests;
 	rte_vhost_crypto_set_zero_copy;
 	rte_vhost_va_from_guest_pa;
+	rte_vhost_host_notifier_ctrl;
 };
diff --git a/lib/librte_vhost/vhost_user.c b/lib/librte_vhost/vhost_user.c
index 557213491..8fec773d5 100644
--- a/lib/librte_vhost/vhost_user.c
+++ b/lib/librte_vhost/vhost_user.c
@@ -2049,11 +2049,6 @@ vhost_user_msg_handler(int vid, int fd)
 		if (vdpa_dev->ops->dev_conf)
 			vdpa_dev->ops->dev_conf(dev->vid);
 		dev->flags |= VIRTIO_DEV_VDPA_CONFIGURED;
-		if (vhost_user_host_notifier_ctrl(dev->vid, true) != 0) {
-			RTE_LOG(INFO, VHOST_CONFIG,
-				"(%d) software relay is used for vDPA, performance may be low.\n",
-				dev->vid);
-		}
 	}
 
 	return 0;
@@ -2148,7 +2143,7 @@ static int vhost_user_slave_set_vring_host_notifier(struct virtio_net *dev,
 	return process_slave_message_reply(dev, &msg);
 }
 
-int vhost_user_host_notifier_ctrl(int vid, bool enable)
+int rte_vhost_host_notifier_ctrl(int vid, bool enable)
 {
 	struct virtio_net *dev;
 	struct rte_vdpa_device *vdpa_dev;
-- 
2.15.1


* [dpdk-dev] [PATCH v4 03/10] vhost: provide helpers for virtio ring relay
  2018-12-14 21:16           ` [dpdk-dev] [PATCH v4 00/10] support SW assisted VDPA live migration Xiao Wang
  2018-12-14 21:16             ` [dpdk-dev] [PATCH v4 01/10] vhost: remove unused internal API Xiao Wang
  2018-12-14 21:16             ` [dpdk-dev] [PATCH v4 02/10] vhost: provide helper for host notifier ctrl Xiao Wang
@ 2018-12-14 21:16             ` Xiao Wang
  2018-12-16  9:10               ` Maxime Coquelin
  2018-12-18  8:01               ` [dpdk-dev] [PATCH v5 00/10] support SW assisted VDPA live migration Xiao Wang
  2018-12-14 21:16             ` [dpdk-dev] [PATCH v4 04/10] net/ifc: dump debug message for error Xiao Wang
                               ` (7 subsequent siblings)
  10 siblings, 2 replies; 86+ messages in thread
From: Xiao Wang @ 2018-12-14 21:16 UTC (permalink / raw)
  To: tiwei.bie, maxime.coquelin
  Cc: alejandro.lucero, dev, zhihong.wang, xiaolong.ye, Xiao Wang

This patch provides two helpers for a vdpa device driver to perform a
relay between the guest virtio ring and a mediate virtio ring.

The available ring relay synchronizes the available entries from the guest
ring to the mediate ring, and checks descriptor validity along the way.

The used ring relay synchronizes the used entries from the mediate ring
to the guest ring, and does dirty page logging for live migration.

A later patch in this series will leverage these two helpers.
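
As a rough illustration of the dirty page logging step mentioned above, the
sketch below marks every page touched by a device-writable buffer in a
bitmap. It is a toy model under stated assumptions: struct toy_dirty_log,
TOY_PAGE_SIZE, TOY_LOG_PAGES and the toy_* functions are hypothetical and do
not reflect the vhost log format or the implementation of vhost_log_write().

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define TOY_PAGE_SIZE 4096u /* assumed logging granularity */
#define TOY_LOG_PAGES 64u   /* hypothetical size of the tracked region */

/* One bit per page; a set bit means the page must be re-sent. */
struct toy_dirty_log {
	uint8_t bitmap[TOY_LOG_PAGES / 8];
};

static void
toy_log_init(struct toy_dirty_log *log)
{
	assert(log != NULL);
	memset(log->bitmap, 0, sizeof(log->bitmap));
}

/* Mark every page overlapped by [addr, addr + len) as dirty. */
static void
toy_log_write(struct toy_dirty_log *log, uint64_t addr, uint64_t len)
{
	uint64_t page, last;

	if (len == 0)
		return;

	page = addr / TOY_PAGE_SIZE;
	last = (addr + len - 1) / TOY_PAGE_SIZE;

	/* pages beyond the tracked region are ignored in this toy model */
	for (; page <= last && page < TOY_LOG_PAGES; page++)
		log->bitmap[page / 8] |= (uint8_t)(1u << (page % 8));
}

static int
toy_page_is_dirty(const struct toy_dirty_log *log, uint64_t page)
{
	return (log->bitmap[page / 8] >> (page % 8)) & 1;
}
```

In the relay, a call like toy_log_write() would be made once per descriptor
that carries the device-writable flag, since only those buffers can have
been modified by DMA.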

Signed-off-by: Xiao Wang <xiao.w.wang@intel.com>
---
 lib/librte_vhost/rte_vdpa.h            |  39 +++++++
 lib/librte_vhost/rte_vhost_version.map |   2 +
 lib/librte_vhost/vdpa.c                | 194 +++++++++++++++++++++++++++++++++
 lib/librte_vhost/vhost.h               |  40 +++++++
 lib/librte_vhost/virtio_net.c          |  39 -------
 5 files changed, 275 insertions(+), 39 deletions(-)

diff --git a/lib/librte_vhost/rte_vdpa.h b/lib/librte_vhost/rte_vdpa.h
index fff657391..02b8d14ed 100644
--- a/lib/librte_vhost/rte_vdpa.h
+++ b/lib/librte_vhost/rte_vdpa.h
@@ -173,4 +173,43 @@ rte_vdpa_get_device_num(void);
  */
 int __rte_experimental
 rte_vhost_host_notifier_ctrl(int vid, bool enable);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * Synchronize the available ring from guest to mediate ring, help to
+ * check desc validity to protect against malicious guest driver.
+ *
+ * @param vid
+ *  vhost device id
+ * @param qid
+ *  vhost queue id
+ * @param vring_m
+ *  mediate virtio ring pointer
+ * @return
+ *  number of synced available entries on success, -1 on failure
+ */
+int __rte_experimental
+rte_vdpa_relay_vring_avail(int vid, uint16_t qid, void *vring_m);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * Synchronize the used ring from mediate ring to guest, log dirty
+ * page for each writeable buffer, caller should handle the used
+ * ring logging before device stop.
+ *
+ * @param vid
+ *  vhost device id
+ * @param qid
+ *  vhost queue id
+ * @param vring_m
+ *  mediate virtio ring pointer
+ * @return
+ *  number of synced used entries on success, -1 on failure
+ */
+int __rte_experimental
+rte_vdpa_relay_vring_used(int vid, uint16_t qid, void *vring_m);
 #endif /* _RTE_VDPA_H_ */
diff --git a/lib/librte_vhost/rte_vhost_version.map b/lib/librte_vhost/rte_vhost_version.map
index 22302e972..dd3b4c1cb 100644
--- a/lib/librte_vhost/rte_vhost_version.map
+++ b/lib/librte_vhost/rte_vhost_version.map
@@ -84,4 +84,6 @@ EXPERIMENTAL {
 	rte_vhost_crypto_set_zero_copy;
 	rte_vhost_va_from_guest_pa;
 	rte_vhost_host_notifier_ctrl;
+	rte_vdpa_relay_vring_avail;
+	rte_vdpa_relay_vring_used;
 };
diff --git a/lib/librte_vhost/vdpa.c b/lib/librte_vhost/vdpa.c
index e7d849ee0..dcf6c3b8e 100644
--- a/lib/librte_vhost/vdpa.c
+++ b/lib/librte_vhost/vdpa.c
@@ -122,3 +122,197 @@ rte_vdpa_get_device_num(void)
 {
 	return vdpa_device_num;
 }
+
+static bool
+invalid_desc_check(struct virtio_net *dev, struct vhost_virtqueue *vq,
+		uint64_t desc_iova, uint64_t desc_len, uint8_t perm)
+{
+	uint64_t desc_addr, desc_chunck_len;
+
+	while (desc_len) {
+		desc_chunck_len = desc_len;
+		desc_addr = vhost_iova_to_vva(dev, vq,
+				desc_iova,
+				&desc_chunck_len,
+				perm);
+
+		if (!desc_addr)
+			return true;
+
+		desc_len -= desc_chunck_len;
+		desc_iova += desc_chunck_len;
+	}
+
+	return false;
+}
+
+int
+rte_vdpa_relay_vring_avail(int vid, uint16_t qid, void *vring_m)
+{
+	struct virtio_net *dev = get_device(vid);
+	uint16_t idx, idx_m, desc_id;
+	struct vring_desc desc;
+	struct vhost_virtqueue *vq;
+	struct vring_desc *desc_ring;
+	struct vring_desc *idesc = NULL;
+	struct vring *s_vring;
+	uint64_t dlen;
+	int ret;
+	uint8_t perm;
+
+	if (!dev || !vring_m)
+		return -1;
+
+	if (qid >= dev->nr_vring)
+		return -1;
+
+	if (vq_is_packed(dev))
+		return -1;
+
+	s_vring = (struct vring *)vring_m;
+	vq = dev->virtqueue[qid];
+	idx = vq->avail->idx;
+	idx_m = s_vring->avail->idx;
+	ret = (uint16_t)(idx - idx_m);
+
+	while (idx_m != idx) {
+		/* avail entry copy */
+		desc_id = vq->avail->ring[idx_m & (vq->size - 1)];
+		s_vring->avail->ring[idx_m & (vq->size - 1)] = desc_id;
+		desc_ring = vq->desc;
+
+		if (vq->desc[desc_id].flags & VRING_DESC_F_INDIRECT) {
+			dlen = vq->desc[desc_id].len;
+			desc_ring = (struct vring_desc *)(uintptr_t)
+				vhost_iova_to_vva(dev, vq,
+						vq->desc[desc_id].addr, &dlen,
+						VHOST_ACCESS_RO);
+			if (unlikely(!desc_ring))
+				return -1;
+
+			if (unlikely(dlen < vq->desc[desc_id].len)) {
+				idesc = alloc_copy_ind_table(dev, vq,
+						vq->desc[desc_id].addr,
+						vq->desc[desc_id].len);
+				if (unlikely(!idesc))
+					return -1;
+
+				desc_ring = idesc;
+			}
+
+			desc_id = 0;
+		}
+
+		/* check if the buf addr is within the guest memory */
+		do {
+			desc = desc_ring[desc_id];
+			perm = desc.flags & VRING_DESC_F_WRITE ?
+				VHOST_ACCESS_WO : VHOST_ACCESS_RO;
+			if (invalid_desc_check(dev, vq, desc.addr, desc.len,
+						perm)) {
+				if (unlikely(idesc))
+					free_ind_table(idesc);
+				return -1;
+			}
+			desc_id = desc.next;
+		} while (desc.flags & VRING_DESC_F_NEXT);
+
+		if (unlikely(idesc)) {
+			free_ind_table(idesc);
+			idesc = NULL;
+		}
+
+		idx_m++;
+	}
+
+	rte_smp_wmb();
+	s_vring->avail->idx = idx;
+
+	if (dev->features & (1ULL << VIRTIO_RING_F_EVENT_IDX))
+		vhost_avail_event(vq) = idx;
+
+	return ret;
+}
+
+int
+rte_vdpa_relay_vring_used(int vid, uint16_t qid, void *vring_m)
+{
+	struct virtio_net *dev = get_device(vid);
+	uint16_t idx, idx_m, desc_id;
+	struct vhost_virtqueue *vq;
+	struct vring_desc desc;
+	struct vring_desc *desc_ring;
+	struct vring_desc *idesc = NULL;
+	struct vring *s_vring;
+	uint64_t dlen;
+	int ret;
+
+	if (!dev || !vring_m)
+		return -1;
+
+	if (qid >= dev->nr_vring)
+		return -1;
+
+	if (vq_is_packed(dev))
+		return -1;
+
+	s_vring = (struct vring *)vring_m;
+	vq = dev->virtqueue[qid];
+	idx = vq->used->idx;
+	idx_m = s_vring->used->idx;
+	ret = (uint16_t)(idx_m - idx);
+
+	while (idx != idx_m) {
+		/* copy used entry, used ring logging is not covered here */
+		vq->used->ring[idx & (vq->size - 1)] =
+			s_vring->used->ring[idx & (vq->size - 1)];
+
+		desc_id = vq->used->ring[idx & (vq->size - 1)].id;
+		desc_ring = vq->desc;
+
+		if (vq->desc[desc_id].flags & VRING_DESC_F_INDIRECT) {
+			dlen = vq->desc[desc_id].len;
+			desc_ring = (struct vring_desc *)(uintptr_t)
+				vhost_iova_to_vva(dev, vq,
+						vq->desc[desc_id].addr, &dlen,
+						VHOST_ACCESS_RO);
+			if (unlikely(!desc_ring))
+				return -1;
+
+			if (unlikely(dlen < vq->desc[desc_id].len)) {
+				idesc = alloc_copy_ind_table(dev, vq,
+						vq->desc[desc_id].addr,
+						vq->desc[desc_id].len);
+				if (unlikely(!idesc))
+					return -1;
+
+				desc_ring = idesc;
+			}
+
+			desc_id = 0;
+		}
+
+		/* dirty page logging for DMA writeable buffer */
+		do {
+			desc = desc_ring[desc_id];
+			if (desc.flags & VRING_DESC_F_WRITE)
+				vhost_log_write(dev, desc.addr, desc.len);
+			desc_id = desc.next;
+		} while (desc.flags & VRING_DESC_F_NEXT);
+
+		if (unlikely(idesc)) {
+			free_ind_table(idesc);
+			idesc = NULL;
+		}
+
+		idx++;
+	}
+
+	rte_smp_wmb();
+	vq->used->idx = idx_m;
+
+	if (dev->features & (1ULL << VIRTIO_RING_F_EVENT_IDX))
+		vring_used_event(s_vring) = idx_m;
+
+	return ret;
+}
diff --git a/lib/librte_vhost/vhost.h b/lib/librte_vhost/vhost.h
index d5bab4803..3b3265c4b 100644
--- a/lib/librte_vhost/vhost.h
+++ b/lib/librte_vhost/vhost.h
@@ -18,6 +18,7 @@
 #include <rte_log.h>
 #include <rte_ether.h>
 #include <rte_rwlock.h>
+#include <rte_malloc.h>
 
 #include "rte_vhost.h"
 #include "rte_vdpa.h"
@@ -754,4 +755,43 @@ vhost_vring_call_packed(struct virtio_net *dev, struct vhost_virtqueue *vq)
 		eventfd_write(vq->callfd, (eventfd_t)1);
 }
 
+static __rte_always_inline void *
+alloc_copy_ind_table(struct virtio_net *dev, struct vhost_virtqueue *vq,
+		uint64_t desc_addr, uint64_t desc_len)
+{
+	void *idesc;
+	uint64_t src, dst;
+	uint64_t len, remain = desc_len;
+
+	idesc = rte_malloc(__func__, desc_len, 0);
+	if (unlikely(!idesc))
+		return 0;
+
+	dst = (uint64_t)(uintptr_t)idesc;
+
+	while (remain) {
+		len = remain;
+		src = vhost_iova_to_vva(dev, vq, desc_addr, &len,
+				VHOST_ACCESS_RO);
+		if (unlikely(!src || !len)) {
+			rte_free(idesc);
+			return 0;
+		}
+
+		rte_memcpy((void *)(uintptr_t)dst, (void *)(uintptr_t)src, len);
+
+		remain -= len;
+		dst += len;
+		desc_addr += len;
+	}
+
+	return idesc;
+}
+
+static __rte_always_inline void
+free_ind_table(void *idesc)
+{
+	rte_free(idesc);
+}
+
 #endif /* _VHOST_NET_CDEV_H_ */
diff --git a/lib/librte_vhost/virtio_net.c b/lib/librte_vhost/virtio_net.c
index 5e1a1a727..8c657a101 100644
--- a/lib/librte_vhost/virtio_net.c
+++ b/lib/librte_vhost/virtio_net.c
@@ -37,45 +37,6 @@ is_valid_virt_queue_idx(uint32_t idx, int is_tx, uint32_t nr_vring)
 	return (is_tx ^ (idx & 1)) == 0 && idx < nr_vring;
 }
 
-static __rte_always_inline void *
-alloc_copy_ind_table(struct virtio_net *dev, struct vhost_virtqueue *vq,
-		uint64_t desc_addr, uint64_t desc_len)
-{
-	void *idesc;
-	uint64_t src, dst;
-	uint64_t len, remain = desc_len;
-
-	idesc = rte_malloc(__func__, desc_len, 0);
-	if (unlikely(!idesc))
-		return 0;
-
-	dst = (uint64_t)(uintptr_t)idesc;
-
-	while (remain) {
-		len = remain;
-		src = vhost_iova_to_vva(dev, vq, desc_addr, &len,
-				VHOST_ACCESS_RO);
-		if (unlikely(!src || !len)) {
-			rte_free(idesc);
-			return 0;
-		}
-
-		rte_memcpy((void *)(uintptr_t)dst, (void *)(uintptr_t)src, len);
-
-		remain -= len;
-		dst += len;
-		desc_addr += len;
-	}
-
-	return idesc;
-}
-
-static __rte_always_inline void
-free_ind_table(void *idesc)
-{
-	rte_free(idesc);
-}
-
 static __rte_always_inline void
 do_flush_shadow_used_ring_split(struct virtio_net *dev,
 			struct vhost_virtqueue *vq,
-- 
2.15.1


* [dpdk-dev] [PATCH v4 04/10] net/ifc: dump debug message for error
  2018-12-14 21:16           ` [dpdk-dev] [PATCH v4 00/10] support SW assisted VDPA live migration Xiao Wang
                               ` (2 preceding siblings ...)
  2018-12-14 21:16             ` [dpdk-dev] [PATCH v4 03/10] vhost: provide helpers for virtio ring relay Xiao Wang
@ 2018-12-14 21:16             ` Xiao Wang
  2018-12-16  9:11               ` Maxime Coquelin
  2018-12-14 21:16             ` [dpdk-dev] [PATCH v4 05/10] net/ifc: store only registered device instance Xiao Wang
                               ` (6 subsequent siblings)
  10 siblings, 1 reply; 86+ messages in thread
From: Xiao Wang @ 2018-12-14 21:16 UTC (permalink / raw)
  To: tiwei.bie, maxime.coquelin
  Cc: alejandro.lucero, dev, zhihong.wang, xiaolong.ye, Xiao Wang

Driver probe may fail for different reasons; a log message for each
failure helps with debugging.

Signed-off-by: Xiao Wang <xiao.w.wang@intel.com>
---
 drivers/net/ifc/ifcvf_vdpa.c | 19 +++++++++++++------
 1 file changed, 13 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ifc/ifcvf_vdpa.c b/drivers/net/ifc/ifcvf_vdpa.c
index e844109f3..aacd5f9bf 100644
--- a/drivers/net/ifc/ifcvf_vdpa.c
+++ b/drivers/net/ifc/ifcvf_vdpa.c
@@ -22,7 +22,7 @@
 
 #define DRV_LOG(level, fmt, args...) \
 	rte_log(RTE_LOG_ ## level, ifcvf_vdpa_logtype, \
-		"%s(): " fmt "\n", __func__, ##args)
+		"IFCVF %s(): " fmt "\n", __func__, ##args)
 
 #ifndef PAGE_SIZE
 #define PAGE_SIZE 4096
@@ -756,11 +756,16 @@ ifcvf_pci_probe(struct rte_pci_driver *pci_drv __rte_unused,
 
 	internal->pdev = pci_dev;
 	rte_spinlock_init(&internal->lock);
-	if (ifcvf_vfio_setup(internal) < 0)
-		return -1;
 
-	if (ifcvf_init_hw(&internal->hw, internal->pdev) < 0)
-		return -1;
+	if (ifcvf_vfio_setup(internal) < 0) {
+		DRV_LOG(ERR, "failed to setup device %s", pci_dev->name);
+		goto error;
+	}
+
+	if (ifcvf_init_hw(&internal->hw, internal->pdev) < 0) {
+		DRV_LOG(ERR, "failed to init device %s", pci_dev->name);
+		goto error;
+	}
 
 	internal->max_queues = IFCVF_MAX_QUEUES;
 	features = ifcvf_get_features(&internal->hw);
@@ -782,8 +787,10 @@ ifcvf_pci_probe(struct rte_pci_driver *pci_drv __rte_unused,
 
 	internal->did = rte_vdpa_register_device(&internal->dev_addr,
 				&ifcvf_ops);
-	if (internal->did < 0)
+	if (internal->did < 0) {
+		DRV_LOG(ERR, "failed to register device %s", pci_dev->name);
 		goto error;
+	}
 
 	rte_atomic32_set(&internal->started, 1);
 	update_datapath(internal);
-- 
2.15.1

^ permalink raw reply	[flat|nested] 86+ messages in thread

* [dpdk-dev] [PATCH v4 05/10] net/ifc: store only registered device instance
  2018-12-14 21:16           ` [dpdk-dev] [PATCH v4 00/10] support SW assisted VDPA live migration Xiao Wang
                               ` (3 preceding siblings ...)
  2018-12-14 21:16             ` [dpdk-dev] [PATCH v4 04/10] net/ifc: dump debug message for error Xiao Wang
@ 2018-12-14 21:16             ` Xiao Wang
  2018-12-16  9:12               ` Maxime Coquelin
  2018-12-14 21:16             ` [dpdk-dev] [PATCH v4 06/10] net/ifc: detect if VDPA mode is specified Xiao Wang
                               ` (5 subsequent siblings)
  10 siblings, 1 reply; 86+ messages in thread
From: Xiao Wang @ 2018-12-14 21:16 UTC (permalink / raw)
  To: tiwei.bie, maxime.coquelin
  Cc: alejandro.lucero, dev, zhihong.wang, xiaolong.ye, Xiao Wang, stable

If driver fails to register ifc VF device into vhost lib, then this
device should not be stored.

Fixes: a3f8150eac6d ("net/ifcvf: add ifcvf vDPA driver")
cc: stable@dpdk.org

Signed-off-by: Xiao Wang <xiao.w.wang@intel.com>
---
 drivers/net/ifc/ifcvf_vdpa.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ifc/ifcvf_vdpa.c b/drivers/net/ifc/ifcvf_vdpa.c
index aacd5f9bf..6fcd50b73 100644
--- a/drivers/net/ifc/ifcvf_vdpa.c
+++ b/drivers/net/ifc/ifcvf_vdpa.c
@@ -781,10 +781,6 @@ ifcvf_pci_probe(struct rte_pci_driver *pci_drv __rte_unused,
 	internal->dev_addr.type = PCI_ADDR;
 	list->internal = internal;
 
-	pthread_mutex_lock(&internal_list_lock);
-	TAILQ_INSERT_TAIL(&internal_list, list, next);
-	pthread_mutex_unlock(&internal_list_lock);
-
 	internal->did = rte_vdpa_register_device(&internal->dev_addr,
 				&ifcvf_ops);
 	if (internal->did < 0) {
@@ -792,6 +788,10 @@ ifcvf_pci_probe(struct rte_pci_driver *pci_drv __rte_unused,
 		goto error;
 	}
 
+	pthread_mutex_lock(&internal_list_lock);
+	TAILQ_INSERT_TAIL(&internal_list, list, next);
+	pthread_mutex_unlock(&internal_list_lock);
+
 	rte_atomic32_set(&internal->started, 1);
 	update_datapath(internal);
 
-- 
2.15.1

^ permalink raw reply	[flat|nested] 86+ messages in thread

* [dpdk-dev] [PATCH v4 06/10] net/ifc: detect if VDPA mode is specified
  2018-12-14 21:16           ` [dpdk-dev] [PATCH v4 00/10] support SW assisted VDPA live migration Xiao Wang
                               ` (4 preceding siblings ...)
  2018-12-14 21:16             ` [dpdk-dev] [PATCH v4 05/10] net/ifc: store only registered device instance Xiao Wang
@ 2018-12-14 21:16             ` Xiao Wang
  2018-12-16  9:17               ` Maxime Coquelin
  2018-12-14 21:16             ` [dpdk-dev] [PATCH v4 07/10] net/ifc: add devarg for LM mode Xiao Wang
                               ` (4 subsequent siblings)
  10 siblings, 1 reply; 86+ messages in thread
From: Xiao Wang @ 2018-12-14 21:16 UTC (permalink / raw)
  To: tiwei.bie, maxime.coquelin
  Cc: alejandro.lucero, dev, zhihong.wang, xiaolong.ye, Xiao Wang

If user wants the VF to be used in VDPA (vhost data path acceleration)
mode, then the user can add a "vdpa=1" parameter for the device.

So if the driver does not find this option, it should quit and let the
bus continue the probe.
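
As a rough illustration of the value check behind "vdpa=1", here is a
minimal sketch in plain C that does not depend on rte_kvargs. The helper
name sketch_parse_u16 is hypothetical and not part of this patch; it only
shows one way to validate a numeric devarg value with strtoul() before
narrowing it to 16 bits.

```c
#include <errno.h>
#include <limits.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical helper (not part of the patch): parse a devarg value such
 * as the "1" in "vdpa=1" into a uint16_t.
 * Returns 0 on success, -EINVAL on malformed input, -ERANGE if the value
 * does not fit in 16 bits.
 */
static int
sketch_parse_u16(const char *value, uint16_t *out)
{
	char *end = NULL;
	unsigned long n;

	if (value == NULL || out == NULL || *value == '\0')
		return -EINVAL;

	/* errno must be cleared first, strtoul() only sets it on error */
	errno = 0;
	n = strtoul(value, &end, 0);
	if (errno == ERANGE || n > UINT16_MAX)
		return -ERANGE;
	/* reject "abc" (nothing consumed) and "1x" (trailing characters) */
	if (end == value || *end != '\0')
		return -EINVAL;

	*out = (uint16_t)n;
	return 0;
}
```

The range check happens on the unsigned long result, before the cast, so
an oversized value is reported instead of being silently truncated.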

Signed-off-by: Xiao Wang <xiao.w.wang@intel.com>
---
 drivers/net/ifc/Makefile     |  1 +
 drivers/net/ifc/ifcvf_vdpa.c | 47 ++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 48 insertions(+)

diff --git a/drivers/net/ifc/Makefile b/drivers/net/ifc/Makefile
index 39b36ae5d..7755a87eb 100644
--- a/drivers/net/ifc/Makefile
+++ b/drivers/net/ifc/Makefile
@@ -10,6 +10,7 @@ LIB = librte_pmd_ifc.a
 
 LDLIBS += -lpthread
 LDLIBS += -lrte_eal -lrte_pci -lrte_vhost -lrte_bus_pci
+LDLIBS += -lrte_kvargs
 
 CFLAGS += -O3
 CFLAGS += $(WERROR_FLAGS)
diff --git a/drivers/net/ifc/ifcvf_vdpa.c b/drivers/net/ifc/ifcvf_vdpa.c
index 6fcd50b73..c0e50354a 100644
--- a/drivers/net/ifc/ifcvf_vdpa.c
+++ b/drivers/net/ifc/ifcvf_vdpa.c
@@ -17,6 +17,8 @@
 #include <rte_vfio.h>
 #include <rte_spinlock.h>
 #include <rte_log.h>
+#include <rte_kvargs.h>
+#include <rte_devargs.h>
 
 #include "base/ifcvf.h"
 
@@ -28,6 +30,13 @@
 #define PAGE_SIZE 4096
 #endif
 
+#define IFCVF_VDPA_MODE		"vdpa"
+
+static const char * const ifcvf_valid_arguments[] = {
+	IFCVF_VDPA_MODE,
+	NULL
+};
+
 static int ifcvf_vdpa_logtype;
 
 struct ifcvf_internal {
@@ -735,6 +744,21 @@ static struct rte_vdpa_dev_ops ifcvf_ops = {
 	.get_notify_area = ifcvf_get_notify_area,
 };
 
+static inline int
+open_int(const char *key __rte_unused, const char *value, void *extra_args)
+{
+	uint16_t *n = extra_args;
+
+	if (value == NULL || extra_args == NULL)
+		return -EINVAL;
+
+	*n = (uint16_t)strtoul(value, NULL, 0);
+	if (*n == USHRT_MAX && errno == ERANGE)
+		return -1;
+
+	return 0;
+}
+
 static int
 ifcvf_pci_probe(struct rte_pci_driver *pci_drv __rte_unused,
 		struct rte_pci_device *pci_dev)
@@ -742,10 +766,31 @@ ifcvf_pci_probe(struct rte_pci_driver *pci_drv __rte_unused,
 	uint64_t features;
 	struct ifcvf_internal *internal = NULL;
 	struct internal_list *list = NULL;
+	int vdpa_mode = 0;
+	struct rte_kvargs *kvlist = NULL;
+	int ret = 0;
 
 	if (rte_eal_process_type() != RTE_PROC_PRIMARY)
 		return 0;
 
+	kvlist = rte_kvargs_parse(pci_dev->device.devargs->args,
+			ifcvf_valid_arguments);
+	if (kvlist == NULL)
+		return 1;
+
+	/* probe only when vdpa mode is specified */
+	if (rte_kvargs_count(kvlist, IFCVF_VDPA_MODE) == 0) {
+		rte_kvargs_free(kvlist);
+		return 1;
+	}
+
+	ret = rte_kvargs_process(kvlist, IFCVF_VDPA_MODE, &open_int,
+			&vdpa_mode);
+	if (ret < 0 || vdpa_mode == 0) {
+		rte_kvargs_free(kvlist);
+		return 1;
+	}
+
 	list = rte_zmalloc("ifcvf", sizeof(*list), 0);
 	if (list == NULL)
 		goto error;
@@ -795,9 +840,11 @@ ifcvf_pci_probe(struct rte_pci_driver *pci_drv __rte_unused,
 	rte_atomic32_set(&internal->started, 1);
 	update_datapath(internal);
 
+	rte_kvargs_free(kvlist);
 	return 0;
 
 error:
+	rte_kvargs_free(kvlist);
 	rte_free(list);
 	rte_free(internal);
 	return -1;
-- 
2.15.1

^ permalink raw reply	[flat|nested] 86+ messages in thread

* [dpdk-dev] [PATCH v4 07/10] net/ifc: add devarg for LM mode
  2018-12-14 21:16           ` [dpdk-dev] [PATCH v4 00/10] support SW assisted VDPA live migration Xiao Wang
                               ` (5 preceding siblings ...)
  2018-12-14 21:16             ` [dpdk-dev] [PATCH v4 06/10] net/ifc: detect if VDPA mode is specified Xiao Wang
@ 2018-12-14 21:16             ` Xiao Wang
  2018-12-16  9:21               ` Maxime Coquelin
  2018-12-14 21:16             ` [dpdk-dev] [PATCH v4 08/10] net/ifc: use lib API for used ring logging Xiao Wang
                               ` (3 subsequent siblings)
  10 siblings, 1 reply; 86+ messages in thread
From: Xiao Wang @ 2018-12-14 21:16 UTC (permalink / raw)
  To: tiwei.bie, maxime.coquelin
  Cc: alejandro.lucero, dev, zhihong.wang, xiaolong.ye, Xiao Wang

This patch series enables a new method for live migration, i.e. software
assisted live migration. This patch provides a device argument for the
user to choose the method.

When "swlm=1", driver/device will do live migration with a relay thread
dealing with dirty page logging. Without this parameter, device will do
dirty page logging and there's no relay thread consuming CPU resource.
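
The resulting choice can be summarised with a small sketch. The enum and
function names below are hypothetical and do not appear in the driver;
"need_log" stands for the dirty page logging feature having been
negotiated, which is assumed here to be the trigger for live migration.

```c
#include <stdbool.h>

/* Hypothetical names, used only for this sketch. */
enum sketch_lm_mode {
	SKETCH_LM_NONE,     /* no logging requested: direct IO datapath */
	SKETCH_LM_HW_LOG,   /* device logs dirty pages itself */
	SKETCH_LM_SW_RELAY, /* relay thread logs dirty pages in software */
};

/* Pick the live migration mode from the logging request and "swlm". */
static enum sketch_lm_mode
sketch_select_lm_mode(bool need_log, bool sw_lm)
{
	if (!need_log)
		return SKETCH_LM_NONE;
	return sw_lm ? SKETCH_LM_SW_RELAY : SKETCH_LM_HW_LOG;
}
```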

Signed-off-by: Xiao Wang <xiao.w.wang@intel.com>
---
 drivers/net/ifc/ifcvf_vdpa.c | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/drivers/net/ifc/ifcvf_vdpa.c b/drivers/net/ifc/ifcvf_vdpa.c
index c0e50354a..395c5112f 100644
--- a/drivers/net/ifc/ifcvf_vdpa.c
+++ b/drivers/net/ifc/ifcvf_vdpa.c
@@ -8,6 +8,7 @@
 #include <sys/ioctl.h>
 #include <sys/epoll.h>
 #include <linux/virtio_net.h>
+#include <stdbool.h>
 
 #include <rte_malloc.h>
 #include <rte_memory.h>
@@ -31,9 +32,11 @@
 #endif
 
 #define IFCVF_VDPA_MODE		"vdpa"
+#define IFCVF_SW_FALLBACK_LM	"swlm"
 
 static const char * const ifcvf_valid_arguments[] = {
 	IFCVF_VDPA_MODE,
+	IFCVF_SW_FALLBACK_LM,
 	NULL
 };
 
@@ -56,6 +59,7 @@ struct ifcvf_internal {
 	rte_atomic32_t dev_attached;
 	rte_atomic32_t running;
 	rte_spinlock_t lock;
+	bool sw_lm;
 };
 
 struct internal_list {
@@ -767,6 +771,7 @@ ifcvf_pci_probe(struct rte_pci_driver *pci_drv __rte_unused,
 	struct ifcvf_internal *internal = NULL;
 	struct internal_list *list = NULL;
 	int vdpa_mode = 0;
+	int sw_fallback_lm = 0;
 	struct rte_kvargs *kvlist = NULL;
 	int ret = 0;
 
@@ -826,6 +831,14 @@ ifcvf_pci_probe(struct rte_pci_driver *pci_drv __rte_unused,
 	internal->dev_addr.type = PCI_ADDR;
 	list->internal = internal;
 
+	if (rte_kvargs_count(kvlist, IFCVF_SW_FALLBACK_LM)) {
+		ret = rte_kvargs_process(kvlist, IFCVF_SW_FALLBACK_LM,
+				&open_int, &sw_fallback_lm);
+		if (ret < 0)
+			goto error;
+	}
+	internal->sw_lm = sw_fallback_lm;
+
 	internal->did = rte_vdpa_register_device(&internal->dev_addr,
 				&ifcvf_ops);
 	if (internal->did < 0) {
-- 
2.15.1

^ permalink raw reply	[flat|nested] 86+ messages in thread

* [dpdk-dev] [PATCH v4 08/10] net/ifc: use lib API for used ring logging
  2018-12-14 21:16           ` [dpdk-dev] [PATCH v4 00/10] support SW assisted VDPA live migration Xiao Wang
                               ` (6 preceding siblings ...)
  2018-12-14 21:16             ` [dpdk-dev] [PATCH v4 07/10] net/ifc: add devarg for LM mode Xiao Wang
@ 2018-12-14 21:16             ` Xiao Wang
  2018-12-16  9:24               ` Maxime Coquelin
  2018-12-14 21:16             ` [dpdk-dev] [PATCH v4 09/10] net/ifc: support SW assisted VDPA live migration Xiao Wang
                               ` (2 subsequent siblings)
  10 siblings, 1 reply; 86+ messages in thread
From: Xiao Wang @ 2018-12-14 21:16 UTC (permalink / raw)
  To: tiwei.bie, maxime.coquelin
  Cc: alejandro.lucero, dev, zhihong.wang, xiaolong.ye, Xiao Wang

The vhost lib already provides a helper for used ring logging; the
driver can use it to reduce code.
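
For readers unfamiliar with used ring logging, the idea can be sketched
in plain C as below. This is not the vhost lib implementation: the names
are hypothetical, the log is assumed to hold one bit per 4 KiB guest
page, and the bit update is not atomic, whereas real logging code must
use an atomic OR because the log is shared.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096ULL

/* Same layout as struct vring_used_elem: two 32-bit fields. */
struct sketch_used_elem {
	uint32_t id;
	uint32_t len;
};

/* Bytes covered by a used ring with qsize entries: the elements plus
 * three 16-bit fields (flags, idx and the trailing event field). */
static uint64_t
sketch_used_ring_len(uint16_t qsize)
{
	return (uint64_t)qsize * sizeof(struct sketch_used_elem) +
		sizeof(uint16_t) * 3;
}

/* Set one bit per page touched by [addr, addr + len). Not atomic. */
static void
sketch_log_range(uint8_t *log, size_t log_bytes, uint64_t addr, uint64_t len)
{
	uint64_t page, last;

	if (len == 0)
		return;

	last = (addr + len - 1) / SKETCH_PAGE_SIZE;
	for (page = addr / SKETCH_PAGE_SIZE; page <= last; page++) {
		if (page / 8 >= log_bytes)
			break; /* outside the log area */
		log[page / 8] |= (uint8_t)(1u << (page % 8));
	}
}
```

With a 256-entry queue the used ring spans 2054 bytes, so it dirties one
page or two depending on where it starts within a page.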

Signed-off-by: Xiao Wang <xiao.w.wang@intel.com>
---
 drivers/net/ifc/ifcvf_vdpa.c | 27 ++++++++-------------------
 1 file changed, 8 insertions(+), 19 deletions(-)

diff --git a/drivers/net/ifc/ifcvf_vdpa.c b/drivers/net/ifc/ifcvf_vdpa.c
index 395c5112f..f181c5a6e 100644
--- a/drivers/net/ifc/ifcvf_vdpa.c
+++ b/drivers/net/ifc/ifcvf_vdpa.c
@@ -31,6 +31,9 @@
 #define PAGE_SIZE 4096
 #endif
 
+#define IFCVF_USED_RING_LEN(size) \
+	((size) * sizeof(struct vring_used_elem) + sizeof(uint16_t) * 3)
+
 #define IFCVF_VDPA_MODE		"vdpa"
 #define IFCVF_SW_FALLBACK_LM	"swlm"
 
@@ -288,21 +291,6 @@ vdpa_ifcvf_start(struct ifcvf_internal *internal)
 	return ifcvf_start_hw(&internal->hw);
 }
 
-static void
-ifcvf_used_ring_log(struct ifcvf_hw *hw, uint32_t queue, uint8_t *log_buf)
-{
-	uint32_t i, size;
-	uint64_t pfn;
-
-	pfn = hw->vring[queue].used / PAGE_SIZE;
-	size = hw->vring[queue].size * sizeof(struct vring_used_elem) +
-			sizeof(uint16_t) * 3;
-
-	for (i = 0; i <= size / PAGE_SIZE; i++)
-		__sync_fetch_and_or_8(&log_buf[(pfn + i) / 8],
-				1 << ((pfn + i) % 8));
-}
-
 static void
 vdpa_ifcvf_stop(struct ifcvf_internal *internal)
 {
@@ -311,7 +299,7 @@ vdpa_ifcvf_stop(struct ifcvf_internal *internal)
 	int vid;
 	uint64_t features;
 	uint64_t log_base, log_size;
-	uint8_t *log_buf;
+	uint64_t len;
 
 	vid = internal->vid;
 	ifcvf_stop_hw(hw);
@@ -330,9 +318,10 @@ vdpa_ifcvf_stop(struct ifcvf_internal *internal)
 		 * IFCVF marks dirty memory pages for only packet buffer,
 		 * SW helps to mark the used ring as dirty after device stops.
 		 */
-		log_buf = (uint8_t *)(uintptr_t)log_base;
-		for (i = 0; i < hw->nr_vring; i++)
-			ifcvf_used_ring_log(hw, i, log_buf);
+		for (i = 0; i < hw->nr_vring; i++) {
+			len = IFCVF_USED_RING_LEN(hw->vring[i].size);
+			rte_vhost_log_used_vring(vid, i, 0, len);
+		}
 	}
 }
 
-- 
2.15.1

^ permalink raw reply	[flat|nested] 86+ messages in thread

* [dpdk-dev] [PATCH v4 09/10] net/ifc: support SW assisted VDPA live migration
  2018-12-14 21:16           ` [dpdk-dev] [PATCH v4 00/10] support SW assisted VDPA live migration Xiao Wang
                               ` (7 preceding siblings ...)
  2018-12-14 21:16             ` [dpdk-dev] [PATCH v4 08/10] net/ifc: use lib API for used ring logging Xiao Wang
@ 2018-12-14 21:16             ` Xiao Wang
  2018-12-16  9:35               ` Maxime Coquelin
  2018-12-14 21:16             ` [dpdk-dev] [PATCH v4 10/10] doc: update ifc NIC document Xiao Wang
  2018-12-18 14:01             ` [dpdk-dev] [PATCH v4 00/10] support SW assisted VDPA live migration Maxime Coquelin
  10 siblings, 1 reply; 86+ messages in thread
From: Xiao Wang @ 2018-12-14 21:16 UTC (permalink / raw)
  To: tiwei.bie, maxime.coquelin
  Cc: alejandro.lucero, dev, zhihong.wang, xiaolong.ye, Xiao Wang

In SW assisted live migration mode, the driver will stop the device and
set up a mediate virtio ring to relay the communication between the
virtio driver and the VDPA device.

This data path intervention will allow SW to help with guest dirty page
logging for live migration.

This SW fallback is an event-driven relay thread, so when the network
throughput is low it will take little CPU resource, but when the
throughput goes up, the relay thread's CPU usage will go up
accordingly.

User needs to take all the factors including CPU usage, guest perf
degradation, etc. into consideration when selecting the live migration
support mode.
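
The relay thread tells guest kicks from device interrupts by a tag stored
with each epoll registration: bit 0 marks a device interrupt, the next
bits hold the queue index, and the upper 32 bits hold the fd. A minimal
sketch of that encoding follows; the helper names are hypothetical and
are not part of the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: build the 64-bit tag stored in epoll_event data. */
static uint64_t
sketch_ev_pack(uint16_t qid, bool from_device, int fd)
{
	return ((uint64_t)(uint32_t)fd << 32) |
		((uint64_t)qid << 1) |
		(from_device ? 1u : 0u);
}

/* Upper 32 bits: the fd to read from to clear the event. */
static int
sketch_ev_fd(uint64_t data)
{
	return (int)(uint32_t)(data >> 32);
}

/* Bits 1 and up of the lower half: the queue index. */
static uint16_t
sketch_ev_qid(uint64_t data)
{
	return (uint16_t)((uint32_t)data >> 1);
}

/* Bit 0: set for a device interrupt (used ring), clear for a guest kick
 * (available ring). */
static bool
sketch_ev_from_device(uint64_t data)
{
	return (data & 1) != 0;
}
```

Decoding from the full 64-bit value keeps the sketch independent of how
the epoll data union overlays its 32-bit member.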

Signed-off-by: Xiao Wang <xiao.w.wang@intel.com>
---
 drivers/net/ifc/base/ifcvf.h |   1 +
 drivers/net/ifc/ifcvf_vdpa.c | 346 ++++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 344 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ifc/base/ifcvf.h b/drivers/net/ifc/base/ifcvf.h
index c15c69107..e8a30d2c6 100644
--- a/drivers/net/ifc/base/ifcvf.h
+++ b/drivers/net/ifc/base/ifcvf.h
@@ -50,6 +50,7 @@
 #define IFCVF_LM_ENABLE_VF		0x1
 #define IFCVF_LM_ENABLE_PF		0x3
 #define IFCVF_LOG_BASE			0x100000000000
+#define IFCVF_MEDIATE_VRING		0x200000000000
 
 #define IFCVF_32_BIT_MASK		0xffffffff
 
diff --git a/drivers/net/ifc/ifcvf_vdpa.c b/drivers/net/ifc/ifcvf_vdpa.c
index f181c5a6e..61757d0b4 100644
--- a/drivers/net/ifc/ifcvf_vdpa.c
+++ b/drivers/net/ifc/ifcvf_vdpa.c
@@ -63,6 +63,9 @@ struct ifcvf_internal {
 	rte_atomic32_t running;
 	rte_spinlock_t lock;
 	bool sw_lm;
+	bool sw_fallback_running;
+	/* mediated vring for sw fallback */
+	struct vring m_vring[IFCVF_MAX_QUEUES * 2];
 };
 
 struct internal_list {
@@ -308,6 +311,9 @@ vdpa_ifcvf_stop(struct ifcvf_internal *internal)
 		rte_vhost_set_vring_base(vid, i, hw->vring[i].last_avail_idx,
 				hw->vring[i].last_used_idx);
 
+	if (internal->sw_lm)
+		return;
+
 	rte_vhost_get_negotiated_features(vid, &features);
 	if (RTE_VHOST_NEED_LOG(features)) {
 		ifcvf_disable_logging(hw);
@@ -539,6 +545,318 @@ update_datapath(struct ifcvf_internal *internal)
 	return ret;
 }
 
+static int
+m_ifcvf_start(struct ifcvf_internal *internal)
+{
+	struct ifcvf_hw *hw = &internal->hw;
+	uint32_t i, nr_vring;
+	int vid, ret;
+	struct rte_vhost_vring vq;
+	void *vring_buf;
+	uint64_t m_vring_iova = IFCVF_MEDIATE_VRING;
+	uint64_t size;
+	uint64_t gpa;
+
+	vid = internal->vid;
+	nr_vring = rte_vhost_get_vring_num(vid);
+	rte_vhost_get_negotiated_features(vid, &hw->req_features);
+
+	for (i = 0; i < nr_vring; i++) {
+		rte_vhost_get_vhost_vring(vid, i, &vq);
+
+		size = RTE_ALIGN_CEIL(vring_size(vq.size, PAGE_SIZE),
+				PAGE_SIZE);
+		vring_buf = rte_zmalloc("ifcvf", size, PAGE_SIZE);
+		vring_init(&internal->m_vring[i], vq.size, vring_buf,
+				PAGE_SIZE);
+
+		ret = rte_vfio_container_dma_map(internal->vfio_container_fd,
+			(uint64_t)(uintptr_t)vring_buf, m_vring_iova, size);
+		if (ret < 0) {
+			DRV_LOG(ERR, "mediate vring DMA map failed.");
+			goto error;
+		}
+
+		gpa = hva_to_gpa(vid, (uint64_t)(uintptr_t)vq.desc);
+		if (gpa == 0) {
+			DRV_LOG(ERR, "Fail to get GPA for descriptor ring.");
+			return -1;
+		}
+		hw->vring[i].desc = gpa;
+
+		hw->vring[i].avail = m_vring_iova +
+			(char *)internal->m_vring[i].avail -
+			(char *)internal->m_vring[i].desc;
+
+		hw->vring[i].used = m_vring_iova +
+			(char *)internal->m_vring[i].used -
+			(char *)internal->m_vring[i].desc;
+
+		hw->vring[i].size = vq.size;
+
+		rte_vhost_get_vring_base(vid, i, &hw->vring[i].last_avail_idx,
+				&hw->vring[i].last_used_idx);
+
+		m_vring_iova += size;
+	}
+	hw->nr_vring = nr_vring;
+
+	return ifcvf_start_hw(&internal->hw);
+
+error:
+	for (i = 0; i < nr_vring; i++)
+		if (internal->m_vring[i].desc)
+			rte_free(internal->m_vring[i].desc);
+
+	return -1;
+}
+
+static int
+m_ifcvf_stop(struct ifcvf_internal *internal)
+{
+	int vid;
+	uint32_t i;
+	struct rte_vhost_vring vq;
+	struct ifcvf_hw *hw = &internal->hw;
+	uint64_t m_vring_iova = IFCVF_MEDIATE_VRING;
+	uint64_t size, len;
+
+	vid = internal->vid;
+	ifcvf_stop_hw(hw);
+
+	for (i = 0; i < hw->nr_vring; i++) {
+		rte_vhost_get_vhost_vring(vid, i, &vq);
+		len = IFCVF_USED_RING_LEN(vq.size);
+		rte_vhost_log_used_vring(vid, i, 0, len);
+
+		size = RTE_ALIGN_CEIL(vring_size(vq.size, PAGE_SIZE),
+				PAGE_SIZE);
+		rte_vfio_container_dma_unmap(internal->vfio_container_fd,
+			(uint64_t)(uintptr_t)internal->m_vring[i].desc,
+			m_vring_iova, size);
+
+		rte_vhost_set_vring_base(vid, i, hw->vring[i].last_avail_idx,
+				hw->vring[i].last_used_idx);
+		rte_free(internal->m_vring[i].desc);
+		m_vring_iova += size;
+	}
+
+	return 0;
+}
+
+static int
+m_enable_vfio_intr(struct ifcvf_internal *internal)
+{
+	uint32_t nr_vring;
+	struct rte_intr_handle *intr_handle = &internal->pdev->intr_handle;
+	int ret;
+
+	nr_vring = rte_vhost_get_vring_num(internal->vid);
+
+	ret = rte_intr_efd_enable(intr_handle, nr_vring);
+	if (ret)
+		return -1;
+
+	ret = rte_intr_enable(intr_handle);
+	if (ret)
+		return -1;
+
+	return 0;
+}
+
+static void
+m_disable_vfio_intr(struct ifcvf_internal *internal)
+{
+	struct rte_intr_handle *intr_handle = &internal->pdev->intr_handle;
+
+	rte_intr_efd_disable(intr_handle);
+	rte_intr_disable(intr_handle);
+}
+
+static void
+update_avail_ring(struct ifcvf_internal *internal, uint16_t qid)
+{
+	rte_vdpa_relay_vring_avail(internal->vid, qid, &internal->m_vring[qid]);
+	ifcvf_notify_queue(&internal->hw, qid);
+}
+
+static void
+update_used_ring(struct ifcvf_internal *internal, uint16_t qid)
+{
+	rte_vdpa_relay_vring_used(internal->vid, qid, &internal->m_vring[qid]);
+	rte_vhost_vring_call(internal->vid, qid);
+}
+
+static void *
+vring_relay(void *arg)
+{
+	int i, vid, epfd, fd, nfds;
+	struct ifcvf_internal *internal = (struct ifcvf_internal *)arg;
+	struct rte_vhost_vring vring;
+	struct rte_intr_handle *intr_handle;
+	uint16_t qid, q_num;
+	struct epoll_event events[IFCVF_MAX_QUEUES * 4];
+	struct epoll_event ev;
+	int nbytes;
+	uint64_t buf;
+
+	vid = internal->vid;
+	q_num = rte_vhost_get_vring_num(vid);
+	/* prepare the mediate vring */
+	for (qid = 0; qid < q_num; qid++) {
+		rte_vhost_get_vring_base(vid, qid,
+				&internal->m_vring[qid].avail->idx,
+				&internal->m_vring[qid].used->idx);
+		rte_vdpa_relay_vring_avail(vid, qid, &internal->m_vring[qid]);
+	}
+
+	/* add notify fd and interrupt fd to epoll */
+	epfd = epoll_create(IFCVF_MAX_QUEUES * 2);
+	if (epfd < 0) {
+		DRV_LOG(ERR, "failed to create epoll instance.");
+		return NULL;
+	}
+	internal->epfd = epfd;
+
+	for (qid = 0; qid < q_num; qid++) {
+		ev.events = EPOLLIN | EPOLLPRI;
+		rte_vhost_get_vhost_vring(vid, qid, &vring);
+		ev.data.u64 = qid << 1 | (uint64_t)vring.kickfd << 32;
+		if (epoll_ctl(epfd, EPOLL_CTL_ADD, vring.kickfd, &ev) < 0) {
+			DRV_LOG(ERR, "epoll add error: %s", strerror(errno));
+			return NULL;
+		}
+	}
+
+	intr_handle = &internal->pdev->intr_handle;
+	for (qid = 0; qid < q_num; qid++) {
+		ev.events = EPOLLIN | EPOLLPRI;
+		ev.data.u64 = 1 | qid << 1 |
+			(uint64_t)intr_handle->efds[qid] << 32;
+		if (epoll_ctl(epfd, EPOLL_CTL_ADD, intr_handle->efds[qid], &ev)
+				< 0) {
+			DRV_LOG(ERR, "epoll add error: %s", strerror(errno));
+			return NULL;
+		}
+	}
+
+	/* start relay with a first kick */
+	for (qid = 0; qid < q_num; qid++)
+		ifcvf_notify_queue(&internal->hw, qid);
+
+	/* listen to the events and react accordingly */
+	for (;;) {
+		nfds = epoll_wait(epfd, events, q_num * 2, -1);
+		if (nfds < 0) {
+			if (errno == EINTR)
+				continue;
+			DRV_LOG(ERR, "epoll_wait return fail\n");
+			return NULL;
+		}
+
+		for (i = 0; i < nfds; i++) {
+			fd = (uint32_t)(events[i].data.u64 >> 32);
+			do {
+				nbytes = read(fd, &buf, 8);
+				if (nbytes < 0) {
+					if (errno == EINTR ||
+					    errno == EWOULDBLOCK ||
+					    errno == EAGAIN)
+						continue;
+					DRV_LOG(INFO, "Error reading "
+						"kickfd: %s",
+						strerror(errno));
+				}
+				break;
+			} while (1);
+
+			qid = events[i].data.u32 >> 1;
+
+			if (events[i].data.u32 & 1)
+				update_used_ring(internal, qid);
+			else
+				update_avail_ring(internal, qid);
+		}
+	}
+
+	return NULL;
+}
+
+static int
+setup_vring_relay(struct ifcvf_internal *internal)
+{
+	int ret;
+
+	ret = pthread_create(&internal->tid, NULL, vring_relay,
+			(void *)internal);
+	if (ret) {
+		DRV_LOG(ERR, "failed to create ring relay pthread.");
+		return -1;
+	}
+	return 0;
+}
+
+static int
+unset_vring_relay(struct ifcvf_internal *internal)
+{
+	void *status;
+
+	if (internal->tid) {
+		pthread_cancel(internal->tid);
+		pthread_join(internal->tid, &status);
+	}
+	internal->tid = 0;
+
+	if (internal->epfd >= 0)
+		close(internal->epfd);
+	internal->epfd = -1;
+
+	return 0;
+}
+
+static int
+ifcvf_sw_fallback_switchover(struct ifcvf_internal *internal)
+{
+	int ret;
+
+	/* stop the direct IO data path */
+	unset_notify_relay(internal);
+	vdpa_ifcvf_stop(internal);
+	vdpa_disable_vfio_intr(internal);
+
+	ret = rte_vhost_host_notifier_ctrl(internal->vid, false);
+	if (ret && ret != -ENOTSUP)
+		goto error;
+
+	/* set up interrupt for interrupt relay */
+	ret = m_enable_vfio_intr(internal);
+	if (ret)
+		goto unmap;
+
+	/* config the VF */
+	ret = m_ifcvf_start(internal);
+	if (ret)
+		goto unset_intr;
+
+	/* set up vring relay thread */
+	ret = setup_vring_relay(internal);
+	if (ret)
+		goto stop_vf;
+
+	internal->sw_fallback_running = true;
+
+	return 0;
+
+stop_vf:
+	m_ifcvf_stop(internal);
+unset_intr:
+	m_disable_vfio_intr(internal);
+unmap:
+	ifcvf_dma_map(internal, 0);
+error:
+	return -1;
+}
+
 static int
 ifcvf_dev_config(int vid)
 {
@@ -579,8 +897,25 @@ ifcvf_dev_close(int vid)
 	}
 
 	internal = list->internal;
-	rte_atomic32_set(&internal->dev_attached, 0);
-	update_datapath(internal);
+
+	if (internal->sw_fallback_running) {
+		/* unset ring relay */
+		unset_vring_relay(internal);
+
+		/* reset VF */
+		m_ifcvf_stop(internal);
+
+		/* remove interrupt setting */
+		m_disable_vfio_intr(internal);
+
+		/* unset DMA map for guest memory */
+		ifcvf_dma_map(internal, 0);
+
+		internal->sw_fallback_running = false;
+	} else {
+		rte_atomic32_set(&internal->dev_attached, 0);
+		update_datapath(internal);
+	}
 
 	return 0;
 }
@@ -604,7 +939,12 @@ ifcvf_set_features(int vid)
 	internal = list->internal;
 	rte_vhost_get_negotiated_features(vid, &features);
 
-	if (RTE_VHOST_NEED_LOG(features)) {
+	if (!RTE_VHOST_NEED_LOG(features))
+		return 0;
+
+	if (internal->sw_lm) {
+		ifcvf_sw_fallback_switchover(internal);
+	} else {
 		rte_vhost_get_log_base(vid, &log_base, &log_size);
 		rte_vfio_container_dma_map(internal->vfio_container_fd,
 				log_base, IFCVF_LOG_BASE, log_size);
-- 
2.15.1

^ permalink raw reply	[flat|nested] 86+ messages in thread

* [dpdk-dev] [PATCH v4 10/10] doc: update ifc NIC document
  2018-12-14 21:16           ` [dpdk-dev] [PATCH v4 00/10] support SW assisted VDPA live migration Xiao Wang
                               ` (8 preceding siblings ...)
  2018-12-14 21:16             ` [dpdk-dev] [PATCH v4 09/10] net/ifc: support SW assisted VDPA live migration Xiao Wang
@ 2018-12-14 21:16             ` Xiao Wang
  2018-12-16  9:36               ` Maxime Coquelin
  2018-12-18 14:01             ` [dpdk-dev] [PATCH v4 00/10] support SW assisted VDPA live migration Maxime Coquelin
  10 siblings, 1 reply; 86+ messages in thread
From: Xiao Wang @ 2018-12-14 21:16 UTC (permalink / raw)
  To: tiwei.bie, maxime.coquelin
  Cc: alejandro.lucero, dev, zhihong.wang, xiaolong.ye, Xiao Wang

Add the SW assisted VDPA live migration feature into NIC doc.

Signed-off-by: Xiao Wang <xiao.w.wang@intel.com>
---
 doc/guides/nics/ifc.rst                | 8 ++++++++
 doc/guides/rel_notes/release_19_02.rst | 6 ++++++
 2 files changed, 14 insertions(+)

diff --git a/doc/guides/nics/ifc.rst b/doc/guides/nics/ifc.rst
index 48f9adf1d..eb55d329a 100644
--- a/doc/guides/nics/ifc.rst
+++ b/doc/guides/nics/ifc.rst
@@ -39,6 +39,13 @@ the driver probe a new container is created for this device, with this
 container vDPA driver can program DMA remapping table with the VM's memory
 region information.
 
+The device argument "swlm=1" will configure the driver into SW assisted live
+migration mode. In this mode, the driver will set up a SW relay thread when LM
+happens, this thread will help device to log dirty pages. Thus this mode does
+not require HW to implement a dirty page logging function block, but will
+consume some percentage of CPU resource depending on the network throughput.
+If no "swlm=1" specified, driver will rely on device's logging capability.
+
 Key IFCVF vDPA driver ops
 ~~~~~~~~~~~~~~~~~~~~~~~~~
 
@@ -70,6 +77,7 @@ Features
 Features of the IFCVF driver are:
 
 - Compatibility with virtio 0.95 and 1.0.
+- SW assisted vDPA live migration.
 
 
 Prerequisites
diff --git a/doc/guides/rel_notes/release_19_02.rst b/doc/guides/rel_notes/release_19_02.rst
index e86ef9511..ced6af8f0 100644
--- a/doc/guides/rel_notes/release_19_02.rst
+++ b/doc/guides/rel_notes/release_19_02.rst
@@ -60,6 +60,12 @@ New Features
   * Added the handler to get firmware version string.
   * Added support for multicast filtering.
 
+* **Added support for SW-assisted VDPA live migration.**
+
+  This SW-assisted VDPA live migration facility helps VDPA devices without
+  logging capability to perform live migration, a mediate SW relay can help
+  devices to track dirty pages caused by DMA. IFC driver has enabled this
+  SW-assisted live migration mode.
 
 Removed Items
 -------------
-- 
2.15.1

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [dpdk-dev] [PATCH v4 01/10] vhost: remove unused internal API
  2018-12-14 21:16             ` [dpdk-dev] [PATCH v4 01/10] vhost: remove unused internal API Xiao Wang
@ 2018-12-16  8:58               ` Maxime Coquelin
  0 siblings, 0 replies; 86+ messages in thread
From: Maxime Coquelin @ 2018-12-16  8:58 UTC (permalink / raw)
  To: Xiao Wang, tiwei.bie; +Cc: alejandro.lucero, dev, zhihong.wang, xiaolong.ye



On 12/14/18 10:16 PM, Xiao Wang wrote:
> vhost_detach_vdpa_device() is internally defined but not used, remove
> it in this patch.
> 
> Signed-off-by: Xiao Wang <xiao.w.wang@intel.com>
> ---
>   lib/librte_vhost/vhost.c | 13 -------------
>   lib/librte_vhost/vhost.h |  1 -
>   2 files changed, 14 deletions(-)
> 

Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>

Thanks,
Maxime

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [dpdk-dev] [PATCH v4 02/10] vhost: provide helper for host notifier ctrl
  2018-12-14 21:16             ` [dpdk-dev] [PATCH v4 02/10] vhost: provide helper for host notifier ctrl Xiao Wang
@ 2018-12-16  9:00               ` Maxime Coquelin
  0 siblings, 0 replies; 86+ messages in thread
From: Maxime Coquelin @ 2018-12-16  9:00 UTC (permalink / raw)
  To: Xiao Wang, tiwei.bie; +Cc: alejandro.lucero, dev, zhihong.wang, xiaolong.ye



On 12/14/18 10:16 PM, Xiao Wang wrote:
> VDPA driver can decide if it needs to enable/disable the host notifier
> mapping, so exposing a API can allow flexibility. A later patch will
> base on this.
> 
> Signed-off-by: Xiao Wang <xiao.w.wang@intel.com>
> ---
>   drivers/net/ifc/ifcvf_vdpa.c           |  3 +++
>   lib/librte_vhost/rte_vdpa.h            | 18 ++++++++++++++++++
>   lib/librte_vhost/rte_vhost_version.map |  1 +
>   lib/librte_vhost/vhost_user.c          |  7 +------
>   4 files changed, 23 insertions(+), 6 deletions(-)


Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>

Thanks,
Maxime

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [dpdk-dev] [PATCH v4 03/10] vhost: provide helpers for virtio ring relay
  2018-12-14 21:16             ` [dpdk-dev] [PATCH v4 03/10] vhost: provide helpers for virtio ring relay Xiao Wang
@ 2018-12-16  9:10               ` Maxime Coquelin
  2018-12-17  8:51                 ` Wang, Xiao W
  2018-12-18  8:01               ` [dpdk-dev] [PATCH v5 00/10] support SW assisted VDPA live migration Xiao Wang
  1 sibling, 1 reply; 86+ messages in thread
From: Maxime Coquelin @ 2018-12-16  9:10 UTC (permalink / raw)
  To: Xiao Wang, tiwei.bie; +Cc: alejandro.lucero, dev, zhihong.wang, xiaolong.ye



On 12/14/18 10:16 PM, Xiao Wang wrote:
> This patch provides two helpers for vdpa device driver to perform a
> relay between the guest virtio ring and a mediate virtio ring.

s/mediate/mediated/ ?
I'm not 100% sure, but if it is mediated, please change everywhere else
in the patch.

> 
> The available ring relay will synchronize the available entries, and
> helps to do desc validity checking.

s/helps/help/

> 
> The used ring relay will synchronize the used entries from mediate ring
> to guest ring, and helps to do dirty page logging for live migration.

s/helps/help/

> 
> The next patch will leverage these two helpers.
> 
> Signed-off-by: Xiao Wang <xiao.w.wang@intel.com>
> ---
>   lib/librte_vhost/rte_vdpa.h            |  39 +++++++
>   lib/librte_vhost/rte_vhost_version.map |   2 +
>   lib/librte_vhost/vdpa.c                | 194 +++++++++++++++++++++++++++++++++
>   lib/librte_vhost/vhost.h               |  40 +++++++
>   lib/librte_vhost/virtio_net.c          |  39 -------
>   5 files changed, 275 insertions(+), 39 deletions(-)
> 


Apart from that:
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>

Thanks,
Maxime

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [dpdk-dev] [PATCH v4 04/10] net/ifc: dump debug message for error
  2018-12-14 21:16             ` [dpdk-dev] [PATCH v4 04/10] net/ifc: dump debug message for error Xiao Wang
@ 2018-12-16  9:11               ` Maxime Coquelin
  0 siblings, 0 replies; 86+ messages in thread
From: Maxime Coquelin @ 2018-12-16  9:11 UTC (permalink / raw)
  To: Xiao Wang, tiwei.bie; +Cc: alejandro.lucero, dev, zhihong.wang, xiaolong.ye



On 12/14/18 10:16 PM, Xiao Wang wrote:
> Driver probe may fail for different causes, debug message is helpful for
> debugging issue.
> 
> Signed-off-by: Xiao Wang <xiao.w.wang@intel.com>
> ---
>   drivers/net/ifc/ifcvf_vdpa.c | 19 +++++++++++++------
>   1 file changed, 13 insertions(+), 6 deletions(-)
> 

Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>

Thanks,
Maxime

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [dpdk-dev] [PATCH v4 05/10] net/ifc: store only registered device instance
  2018-12-14 21:16             ` [dpdk-dev] [PATCH v4 05/10] net/ifc: store only registered device instance Xiao Wang
@ 2018-12-16  9:12               ` Maxime Coquelin
  0 siblings, 0 replies; 86+ messages in thread
From: Maxime Coquelin @ 2018-12-16  9:12 UTC (permalink / raw)
  To: Xiao Wang, tiwei.bie
  Cc: alejandro.lucero, dev, zhihong.wang, xiaolong.ye, stable



On 12/14/18 10:16 PM, Xiao Wang wrote:
> If driver fails to register ifc VF device into vhost lib, then this
> device should not be stored.
> 
> Fixes: a3f8150eac6d ("net/ifcvf: add ifcvf vDPA driver")
> cc: stable@dpdk.org
> 
> Signed-off-by: Xiao Wang <xiao.w.wang@intel.com>
> ---
>   drivers/net/ifc/ifcvf_vdpa.c | 8 ++++----
>   1 file changed, 4 insertions(+), 4 deletions(-)
> 

Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>

Thanks,
Maxime

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [dpdk-dev] [PATCH v4 06/10] net/ifc: detect if VDPA mode is specified
  2018-12-14 21:16             ` [dpdk-dev] [PATCH v4 06/10] net/ifc: detect if VDPA mode is specified Xiao Wang
@ 2018-12-16  9:17               ` Maxime Coquelin
  2018-12-17  8:54                 ` Wang, Xiao W
  0 siblings, 1 reply; 86+ messages in thread
From: Maxime Coquelin @ 2018-12-16  9:17 UTC (permalink / raw)
  To: Xiao Wang, tiwei.bie; +Cc: alejandro.lucero, dev, zhihong.wang, xiaolong.ye



On 12/14/18 10:16 PM, Xiao Wang wrote:
> If user wants the VF to be used in VDPA (vhost data path acceleration)
> mode, then the user can add a "vdpa=1" parameter for the device.
> 
> So if driver doesn't not find this option, it should quit and let the

s/doesn't not/does not/

> bus continue the probe.
> 
> Signed-off-by: Xiao Wang <xiao.w.wang@intel.com>
> ---
>   drivers/net/ifc/Makefile     |  1 +
>   drivers/net/ifc/ifcvf_vdpa.c | 47 ++++++++++++++++++++++++++++++++++++++++++++
>   2 files changed, 48 insertions(+)
> 

Should this option be documented somewhere?

Apart from that:
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>

Thanks,
Maxime

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [dpdk-dev] [PATCH v4 07/10] net/ifc: add devarg for LM mode
  2018-12-14 21:16             ` [dpdk-dev] [PATCH v4 07/10] net/ifc: add devarg for LM mode Xiao Wang
@ 2018-12-16  9:21               ` Maxime Coquelin
  2018-12-17  9:00                 ` Wang, Xiao W
  0 siblings, 1 reply; 86+ messages in thread
From: Maxime Coquelin @ 2018-12-16  9:21 UTC (permalink / raw)
  To: Xiao Wang, tiwei.bie; +Cc: alejandro.lucero, dev, zhihong.wang, xiaolong.ye



On 12/14/18 10:16 PM, Xiao Wang wrote:
> This patch series enables a new method for live migration, i.e. software
> assisted live migration. This patch provides a device argument for user
> to choose the methold.
> 
> When "swlm=1", driver/device will do live migration with a relay thread
> dealing with dirty page logging. Without this parameter, device will do
> dirty page logging and there's no relay thread consuming CPU resource.
> 
> Signed-off-by: Xiao Wang <xiao.w.wang@intel.com>
> ---
>   drivers/net/ifc/ifcvf_vdpa.c | 13 +++++++++++++
>   1 file changed, 13 insertions(+)
> 
> diff --git a/drivers/net/ifc/ifcvf_vdpa.c b/drivers/net/ifc/ifcvf_vdpa.c
> index c0e50354a..395c5112f 100644
> --- a/drivers/net/ifc/ifcvf_vdpa.c
> +++ b/drivers/net/ifc/ifcvf_vdpa.c
> @@ -8,6 +8,7 @@
>   #include <sys/ioctl.h>
>   #include <sys/epoll.h>
>   #include <linux/virtio_net.h>
> +#include <stdbool.h>
>   
>   #include <rte_malloc.h>
>   #include <rte_memory.h>
> @@ -31,9 +32,11 @@
>   #endif
>   
>   #define IFCVF_VDPA_MODE		"vdpa"
> +#define IFCVF_SW_FALLBACK_LM	"swlm"


The patch looks good, except that I don't like the "swlm" name.
Maybe we could have something less obscure, even if a little bit longer?

What about "sw-live-migration"?

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [dpdk-dev] [PATCH v4 08/10] net/ifc: use lib API for used ring logging
  2018-12-14 21:16             ` [dpdk-dev] [PATCH v4 08/10] net/ifc: use lib API for used ring logging Xiao Wang
@ 2018-12-16  9:24               ` Maxime Coquelin
  0 siblings, 0 replies; 86+ messages in thread
From: Maxime Coquelin @ 2018-12-16  9:24 UTC (permalink / raw)
  To: Xiao Wang, tiwei.bie; +Cc: alejandro.lucero, dev, zhihong.wang, xiaolong.ye



On 12/14/18 10:16 PM, Xiao Wang wrote:
> Vhost lib has already provided a helper for used ring logging, driver
> could use it to reduce code.
> 
> Signed-off-by: Xiao Wang <xiao.w.wang@intel.com>
> ---
>   drivers/net/ifc/ifcvf_vdpa.c | 27 ++++++++-------------------
>   1 file changed, 8 insertions(+), 19 deletions(-)
> 

Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>

Thanks,
Maxime

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [dpdk-dev] [PATCH v4 09/10] net/ifc: support SW assisted VDPA live migration
  2018-12-14 21:16             ` [dpdk-dev] [PATCH v4 09/10] net/ifc: support SW assisted VDPA live migration Xiao Wang
@ 2018-12-16  9:35               ` Maxime Coquelin
  2018-12-17  9:12                 ` Wang, Xiao W
  0 siblings, 1 reply; 86+ messages in thread
From: Maxime Coquelin @ 2018-12-16  9:35 UTC (permalink / raw)
  To: Xiao Wang, tiwei.bie; +Cc: alejandro.lucero, dev, zhihong.wang, xiaolong.ye



On 12/14/18 10:16 PM, Xiao Wang wrote:
> In SW assisted live migration mode, driver will stop the device and
> setup a mediate virtio ring to relay the communication between the
> virtio driver and the VDPA device.
> 
> This data path intervention will allow SW to help on guest dirty page
> logging for live migration.
> 
> This SW fallback is event driven relay thread, so when the network
> throughput is low, this SW fallback will take little CPU resource, but
> when the throughput goes up, the relay thread's CPU usage will goes up
> accordinly.

s/accordinly/accordingly/

> 
> User needs to take all the factors including CPU usage, guest perf
> degradation, etc. into consideration when selecting the live migration
> support mode.
> 
> Signed-off-by: Xiao Wang <xiao.w.wang@intel.com>
> ---
>   drivers/net/ifc/base/ifcvf.h |   1 +
>   drivers/net/ifc/ifcvf_vdpa.c | 346 ++++++++++++++++++++++++++++++++++++++++++-
>   2 files changed, 344 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/net/ifc/base/ifcvf.h b/drivers/net/ifc/base/ifcvf.h
> index c15c69107..e8a30d2c6 100644
> --- a/drivers/net/ifc/base/ifcvf.h
> +++ b/drivers/net/ifc/base/ifcvf.h
> @@ -50,6 +50,7 @@
>   #define IFCVF_LM_ENABLE_VF		0x1
>   #define IFCVF_LM_ENABLE_PF		0x3
>   #define IFCVF_LOG_BASE			0x100000000000
> +#define IFCVF_MEDIATE_VRING		0x200000000000

MEDIATED?

>   
>   #define IFCVF_32_BIT_MASK		0xffffffff
>   
> diff --git a/drivers/net/ifc/ifcvf_vdpa.c b/drivers/net/ifc/ifcvf_vdpa.c
> index f181c5a6e..61757d0b4 100644
> --- a/drivers/net/ifc/ifcvf_vdpa.c
> +++ b/drivers/net/ifc/ifcvf_vdpa.c
> @@ -63,6 +63,9 @@ struct ifcvf_internal {
>   	rte_atomic32_t running;
>   	rte_spinlock_t lock;
>   	bool sw_lm;
> +	bool sw_fallback_running;
> +	/* mediated vring for sw fallback */
> +	struct vring m_vring[IFCVF_MAX_QUEUES * 2];
>   };
>   
>   struct internal_list {
> @@ -308,6 +311,9 @@ vdpa_ifcvf_stop(struct ifcvf_internal *internal)
>   		rte_vhost_set_vring_base(vid, i, hw->vring[i].last_avail_idx,
>   				hw->vring[i].last_used_idx);
>   
> +	if (internal->sw_lm)
> +		return;
> +
>   	rte_vhost_get_negotiated_features(vid, &features);
>   	if (RTE_VHOST_NEED_LOG(features)) {
>   		ifcvf_disable_logging(hw);
> @@ -539,6 +545,318 @@ update_datapath(struct ifcvf_internal *internal)
>   	return ret;
>   }
>   
> +static int
> +m_ifcvf_start(struct ifcvf_internal *internal)
> +{
> +	struct ifcvf_hw *hw = &internal->hw;
> +	uint32_t i, nr_vring;
> +	int vid, ret;
> +	struct rte_vhost_vring vq;
> +	void *vring_buf;
> +	uint64_t m_vring_iova = IFCVF_MEDIATE_VRING;
> +	uint64_t size;
> +	uint64_t gpa;
> +
> +	vid = internal->vid;
> +	nr_vring = rte_vhost_get_vring_num(vid);
> +	rte_vhost_get_negotiated_features(vid, &hw->req_features);
> +
> +	for (i = 0; i < nr_vring; i++) {
> +		rte_vhost_get_vhost_vring(vid, i, &vq);
> +
> +		size = RTE_ALIGN_CEIL(vring_size(vq.size, PAGE_SIZE),
> +				PAGE_SIZE);
> +		vring_buf = rte_zmalloc("ifcvf", size, PAGE_SIZE);
> +		vring_init(&internal->m_vring[i], vq.size, vring_buf,
> +				PAGE_SIZE);
> +
> +		ret = rte_vfio_container_dma_map(internal->vfio_container_fd,
> +			(uint64_t)(uintptr_t)vring_buf, m_vring_iova, size);
> +		if (ret < 0) {
> +			DRV_LOG(ERR, "mediate vring DMA map failed.");
> +			goto error;
> +		}
> +
> +		gpa = hva_to_gpa(vid, (uint64_t)(uintptr_t)vq.desc);
> +		if (gpa == 0) {
> +			DRV_LOG(ERR, "Fail to get GPA for descriptor ring.");
> +			return -1;
> +		}
> +		hw->vring[i].desc = gpa;
> +
> +		hw->vring[i].avail = m_vring_iova +
> +			(char *)internal->m_vring[i].avail -
> +			(char *)internal->m_vring[i].desc;
> +
> +		hw->vring[i].used = m_vring_iova +
> +			(char *)internal->m_vring[i].used -
> +			(char *)internal->m_vring[i].desc;
> +
> +		hw->vring[i].size = vq.size;
> +
> +		rte_vhost_get_vring_base(vid, i, &hw->vring[i].last_avail_idx,
> +				&hw->vring[i].last_used_idx);
> +
> +		m_vring_iova += size;
> +	}
> +	hw->nr_vring = nr_vring;
> +
> +	return ifcvf_start_hw(&internal->hw);
> +
> +error:
> +	for (i = 0; i < nr_vring; i++)
> +		if (internal->m_vring[i].desc)
> +			rte_free(internal->m_vring[i].desc);
> +
> +	return -1;
> +}
> +
> +static int
> +m_ifcvf_stop(struct ifcvf_internal *internal)
> +{
> +	int vid;
> +	uint32_t i;
> +	struct rte_vhost_vring vq;
> +	struct ifcvf_hw *hw = &internal->hw;
> +	uint64_t m_vring_iova = IFCVF_MEDIATE_VRING;
> +	uint64_t size, len;
> +
> +	vid = internal->vid;
> +	ifcvf_stop_hw(hw);
> +
> +	for (i = 0; i < hw->nr_vring; i++) {
> +		rte_vhost_get_vhost_vring(vid, i, &vq);
> +		len = IFCVF_USED_RING_LEN(vq.size);
> +		rte_vhost_log_used_vring(vid, i, 0, len);
> +
> +		size = RTE_ALIGN_CEIL(vring_size(vq.size, PAGE_SIZE),
> +				PAGE_SIZE);
> +		rte_vfio_container_dma_unmap(internal->vfio_container_fd,
> +			(uint64_t)(uintptr_t)internal->m_vring[i].desc,
> +			m_vring_iova, size);
> +
> +		rte_vhost_set_vring_base(vid, i, hw->vring[i].last_avail_idx,
> +				hw->vring[i].last_used_idx);
> +		rte_free(internal->m_vring[i].desc);
> +		m_vring_iova += size;
> +	}
> +
> +	return 0;
> +}
> +
> +static int
> +m_enable_vfio_intr(struct ifcvf_internal *internal)
> +{
> +	uint32_t nr_vring;
> +	struct rte_intr_handle *intr_handle = &internal->pdev->intr_handle;
> +	int ret;
> +
> +	nr_vring = rte_vhost_get_vring_num(internal->vid);
> +
> +	ret = rte_intr_efd_enable(intr_handle, nr_vring);
> +	if (ret)
> +		return -1;
> +
> +	ret = rte_intr_enable(intr_handle);
> +	if (ret)
> +		return -1;
> +
> +	return 0;
> +}
> +
> +static void
> +m_disable_vfio_intr(struct ifcvf_internal *internal)
> +{
> +	struct rte_intr_handle *intr_handle = &internal->pdev->intr_handle;
> +
> +	rte_intr_efd_disable(intr_handle);
> +	rte_intr_disable(intr_handle);
> +}
> +
> +static void
> +update_avail_ring(struct ifcvf_internal *internal, uint16_t qid)
> +{
> +	rte_vdpa_relay_vring_avail(internal->vid, qid, &internal->m_vring[qid]);
> +	ifcvf_notify_queue(&internal->hw, qid);
> +}
> +
> +static void
> +update_used_ring(struct ifcvf_internal *internal, uint16_t qid)
> +{
> +	rte_vdpa_relay_vring_used(internal->vid, qid, &internal->m_vring[qid]);
> +	rte_vhost_vring_call(internal->vid, qid);
> +}
> +
> +static void *
> +vring_relay(void *arg)
> +{
> +	int i, vid, epfd, fd, nfds;
> +	struct ifcvf_internal *internal = (struct ifcvf_internal *)arg;
> +	struct rte_vhost_vring vring;
> +	struct rte_intr_handle *intr_handle;
> +	uint16_t qid, q_num;
> +	struct epoll_event events[IFCVF_MAX_QUEUES * 4];
> +	struct epoll_event ev;
> +	int nbytes;
> +	uint64_t buf;
> +
> +	vid = internal->vid;
> +	q_num = rte_vhost_get_vring_num(vid);
> +	/* prepare the mediate vring */
> +	for (qid = 0; qid < q_num; qid++) {
> +		rte_vhost_get_vring_base(vid, qid,
> +				&internal->m_vring[qid].avail->idx,
> +				&internal->m_vring[qid].used->idx);
> +		rte_vdpa_relay_vring_avail(vid, qid, &internal->m_vring[qid]);
> +	}
> +
> +	/* add notify fd and interrupt fd to epoll */
> +	epfd = epoll_create(IFCVF_MAX_QUEUES * 2);
> +	if (epfd < 0) {
> +		DRV_LOG(ERR, "failed to create epoll instance.");
> +		return NULL;
> +	}
> +	internal->epfd = epfd;
> +
> +	for (qid = 0; qid < q_num; qid++) {
> +		ev.events = EPOLLIN | EPOLLPRI;
> +		rte_vhost_get_vhost_vring(vid, qid, &vring);
> +		ev.data.u64 = qid << 1 | (uint64_t)vring.kickfd << 32;
> +		if (epoll_ctl(epfd, EPOLL_CTL_ADD, vring.kickfd, &ev) < 0) {
> +			DRV_LOG(ERR, "epoll add error: %s", strerror(errno));
> +			return NULL;
> +		}
> +	}
> +
> +	intr_handle = &internal->pdev->intr_handle;
> +	for (qid = 0; qid < q_num; qid++) {
> +		ev.events = EPOLLIN | EPOLLPRI;
> +		ev.data.u64 = 1 | qid << 1 |
> +			(uint64_t)intr_handle->efds[qid] << 32;
> +		if (epoll_ctl(epfd, EPOLL_CTL_ADD, intr_handle->efds[qid], &ev)
> +				< 0) {
> +			DRV_LOG(ERR, "epoll add error: %s", strerror(errno));
> +			return NULL;
> +		}
> +	}
> +
> +	/* start relay with a first kick */
> +	for (qid = 0; qid < q_num; qid++)
> +		ifcvf_notify_queue(&internal->hw, qid);
> +
> +	/* listen to the events and react accordingly */
> +	for (;;) {
> +		nfds = epoll_wait(epfd, events, q_num * 2, -1);
> +		if (nfds < 0) {
> +			if (errno == EINTR)
> +				continue;
> +			DRV_LOG(ERR, "epoll_wait return fail\n");
> +			return NULL;
> +		}
> +
> +		for (i = 0; i < nfds; i++) {
> +			fd = (uint32_t)(events[i].data.u64 >> 32);
> +			do {
> +				nbytes = read(fd, &buf, 8);
> +				if (nbytes < 0) {
> +					if (errno == EINTR ||
> +					    errno == EWOULDBLOCK ||
> +					    errno == EAGAIN)
> +						continue;
> +					DRV_LOG(INFO, "Error reading "
> +						"kickfd: %s",
> +						strerror(errno));
> +				}
> +				break;
> +			} while (1);
> +
> +			qid = events[i].data.u32 >> 1;
> +
> +			if (events[i].data.u32 & 1)
> +				update_used_ring(internal, qid);
> +			else
> +				update_avail_ring(internal, qid);
> +		}
> +	}
> +
> +	return NULL;
> +}
> +
> +static int
> +setup_vring_relay(struct ifcvf_internal *internal)
> +{
> +	int ret;
> +
> +	ret = pthread_create(&internal->tid, NULL, vring_relay,
> +			(void *)internal);

So it will be scheduled without any affinity?
Shouldn't it use a pmd thread instead?

> +	if (ret) {
> +		DRV_LOG(ERR, "failed to create ring relay pthread.");
> +		return -1;
> +	}
> +	return 0;
> +}
> +
> +static int
> +unset_vring_relay(struct ifcvf_internal *internal)
> +{
> +	void *status;
> +
> +	if (internal->tid) {
> +		pthread_cancel(internal->tid);
> +		pthread_join(internal->tid, &status);
> +	}
> +	internal->tid = 0;
> +
> +	if (internal->epfd >= 0)
> +		close(internal->epfd);
> +	internal->epfd = -1;
> +
> +	return 0;
> +}
> +
> +static int
> +ifcvf_sw_fallback_switchover(struct ifcvf_internal *internal)
> +{
> +	int ret;
> +
> +	/* stop the direct IO data path */
> +	unset_notify_relay(internal);
> +	vdpa_ifcvf_stop(internal);
> +	vdpa_disable_vfio_intr(internal);
> +
> +	ret = rte_vhost_host_notifier_ctrl(internal->vid, false);
> +	if (ret && ret != -ENOTSUP)
> +		goto error;
> +
> +	/* set up interrupt for interrupt relay */
> +	ret = m_enable_vfio_intr(internal);
> +	if (ret)
> +		goto unmap;
> +
> +	/* config the VF */
> +	ret = m_ifcvf_start(internal);
> +	if (ret)
> +		goto unset_intr;
> +
> +	/* set up vring relay thread */
> +	ret = setup_vring_relay(internal);
> +	if (ret)
> +		goto stop_vf;
> +
> +	internal->sw_fallback_running = true;
> +
> +	return 0;
> +
> +stop_vf:
> +	m_ifcvf_stop(internal);
> +unset_intr:
> +	m_disable_vfio_intr(internal);
> +unmap:
> +	ifcvf_dma_map(internal, 0);
> +error:
> +	return -1;
> +}
> +
>   static int
>   ifcvf_dev_config(int vid)
>   {
> @@ -579,8 +897,25 @@ ifcvf_dev_close(int vid)
>   	}
>   
>   	internal = list->internal;
> -	rte_atomic32_set(&internal->dev_attached, 0);
> -	update_datapath(internal);
> +
> +	if (internal->sw_fallback_running) {
> +		/* unset ring relay */
> +		unset_vring_relay(internal);
> +
> +		/* reset VF */
> +		m_ifcvf_stop(internal);
> +
> +		/* remove interrupt setting */
> +		m_disable_vfio_intr(internal);
> +
> +		/* unset DMA map for guest memory */
> +		ifcvf_dma_map(internal, 0);
> +
> +		internal->sw_fallback_running = false;
> +	} else {
> +		rte_atomic32_set(&internal->dev_attached, 0);
> +		update_datapath(internal);
> +	}
>   
>   	return 0;
>   }
> @@ -604,7 +939,12 @@ ifcvf_set_features(int vid)
>   	internal = list->internal;
>   	rte_vhost_get_negotiated_features(vid, &features);
>   
> -	if (RTE_VHOST_NEED_LOG(features)) {
> +	if (!RTE_VHOST_NEED_LOG(features))
> +		return 0;
> +
> +	if (internal->sw_lm) {
> +		ifcvf_sw_fallback_switchover(internal);
> +	} else {
>   		rte_vhost_get_log_base(vid, &log_base, &log_size);
>   		rte_vfio_container_dma_map(internal->vfio_container_fd,
>   				log_base, IFCVF_LOG_BASE, log_size);
> 
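
As an aside on the relay loop above: each epoll event carries a 64-bit tag
with the eventfd in the upper 32 bits, the queue id shifted left by one, and
bit 0 set for a device interrupt (clear for a guest kick). A minimal sketch of
that packing follows, assuming hypothetical helper names (relay_tag_pack and
friends) that are not part of the patch, which open-codes these expressions.
The sketch decodes with explicit shifts and masks on the 64-bit value instead
of reading data.u32, so it does not depend on how the union members overlap.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names for illustration only; the patch open-codes all this. */
enum relay_src {
	RELAY_SRC_KICK = 0,	/* guest kick: relay the available ring */
	RELAY_SRC_INTR = 1	/* device interrupt: relay the used ring */
};

/* Pack fd (upper 32 bits), queue id (bits 1..16) and source (bit 0). */
static inline uint64_t
relay_tag_pack(uint16_t qid, enum relay_src src, int fd)
{
	assert(fd >= 0);
	return ((uint64_t)(uint32_t)fd << 32) |
		((uint64_t)qid << 1) | (uint64_t)src;
}

/* Recover the file descriptor from the upper 32 bits. */
static inline int
relay_tag_fd(uint64_t tag)
{
	return (int)(uint32_t)(tag >> 32);
}

/* Recover the queue id from the lower 32 bits, dropping the source bit. */
static inline uint16_t
relay_tag_qid(uint64_t tag)
{
	return (uint16_t)((tag & 0xffffffffu) >> 1);
}

/* Recover the event source from bit 0. */
static inline enum relay_src
relay_tag_src(uint64_t tag)
{
	return (tag & 1) ? RELAY_SRC_INTR : RELAY_SRC_KICK;
}
```

With helpers like these, the loop body would take the fd from
relay_tag_fd(events[i].data.u64) and choose between the used-ring and
avail-ring update based on relay_tag_src(). This is only one possible way to
write it.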

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [dpdk-dev] [PATCH v4 10/10] doc: update ifc NIC document
  2018-12-14 21:16             ` [dpdk-dev] [PATCH v4 10/10] doc: update ifc NIC document Xiao Wang
@ 2018-12-16  9:36               ` Maxime Coquelin
  2018-12-17  9:15                 ` Wang, Xiao W
  0 siblings, 1 reply; 86+ messages in thread
From: Maxime Coquelin @ 2018-12-16  9:36 UTC (permalink / raw)
  To: Xiao Wang, tiwei.bie; +Cc: alejandro.lucero, dev, zhihong.wang, xiaolong.ye



On 12/14/18 10:16 PM, Xiao Wang wrote:
> Add the SW assisted VDPA live migration feature into NIC doc.
> 
> Signed-off-by: Xiao Wang <xiao.w.wang@intel.com>
> ---
>   doc/guides/nics/ifc.rst                | 8 ++++++++
>   doc/guides/rel_notes/release_19_02.rst | 6 ++++++
>   2 files changed, 14 insertions(+)
> 
> diff --git a/doc/guides/nics/ifc.rst b/doc/guides/nics/ifc.rst
> index 48f9adf1d..eb55d329a 100644
> --- a/doc/guides/nics/ifc.rst
> +++ b/doc/guides/nics/ifc.rst
> @@ -39,6 +39,13 @@ the driver probe a new container is created for this device, with this
>   container vDPA driver can program DMA remapping table with the VM's memory
>   region information.
>   
> +The device argument "swlm=1" will configure the driver into SW assisted live
> +migration mode. In this mode, the driver will set up a SW relay thread when LM
> +happens, this thread will help device to log dirty pages. Thus this mode does
> +not require HW to implement a dirty page logging function block, but will
> +consume some percentage of CPU resource depending on the network throughput.
> +If no "swlm=1" specified, driver will rely on device's logging capability.
> +

Ok, so that's documented here.
What about documenting vdpa option too?

>   Key IFCVF vDPA driver ops
>   ~~~~~~~~~~~~~~~~~~~~~~~~~
>   
> @@ -70,6 +77,7 @@ Features
>   Features of the IFCVF driver are:
>   
>   - Compatibility with virtio 0.95 and 1.0.
> +- SW assisted vDPA live migration.
>   
>   
>   Prerequisites
> diff --git a/doc/guides/rel_notes/release_19_02.rst b/doc/guides/rel_notes/release_19_02.rst
> index e86ef9511..ced6af8f0 100644
> --- a/doc/guides/rel_notes/release_19_02.rst
> +++ b/doc/guides/rel_notes/release_19_02.rst
> @@ -60,6 +60,12 @@ New Features
>     * Added the handler to get firmware version string.
>     * Added support for multicast filtering.
>   
> +* **Added support for SW-assisted VDPA live migration.**
> +
> +  This SW-assisted VDPA live migration facility helps VDPA devices without
> +  logging capability to perform live migration, a mediate SW relay can help
> +  devices to track dirty pages caused by DMA. IFC driver has enabled this
> +  SW-assisted live migration mode.
>   
>   Removed Items
>   -------------
> 

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [dpdk-dev] [PATCH v4 03/10] vhost: provide helpers for virtio ring relay
  2018-12-16  9:10               ` Maxime Coquelin
@ 2018-12-17  8:51                 ` Wang, Xiao W
  2018-12-17 11:02                   ` Maxime Coquelin
  0 siblings, 1 reply; 86+ messages in thread
From: Wang, Xiao W @ 2018-12-17  8:51 UTC (permalink / raw)
  To: Maxime Coquelin, Bie, Tiwei
  Cc: alejandro.lucero, dev, Wang, Zhihong, Ye, Xiaolong

Hi Maxime,

> -----Original Message-----
> From: Maxime Coquelin [mailto:maxime.coquelin@redhat.com]
> Sent: Sunday, December 16, 2018 1:11 AM
> To: Wang, Xiao W <xiao.w.wang@intel.com>; Bie, Tiwei <tiwei.bie@intel.com>
> Cc: alejandro.lucero@netronome.com; dev@dpdk.org; Wang, Zhihong
> <zhihong.wang@intel.com>; Ye, Xiaolong <xiaolong.ye@intel.com>
> Subject: Re: [PATCH v4 03/10] vhost: provide helpers for virtio ring relay
> 
> 
> 
> On 12/14/18 10:16 PM, Xiao Wang wrote:
> > This patch provides two helpers for vdpa device driver to perform a
> > relay between the guest virtio ring and a mediate virtio ring.
> 
> s/mediate/mediated/ ?
> I'm not 100% sure, but if it is mediated, please change everywhere else
> in the patch.

"mediate" can also be used as an adjective, so "mediate" is OK here.

> 
> >
> > The available ring relay will synchronize the available entries, and
> > helps to do desc validity checking.
> 
> s/helps/help/

Yes, will update.

> 
> >
> > The used ring relay will synchronize the used entries from mediate ring
> > to guest ring, and helps to do dirty page logging for live migration.
> 
> s/helps/help/

Will update.

Thanks for the comments,
Xiao

> 
> >
> > The next patch will leverage these two helpers.
> >
> > Signed-off-by: Xiao Wang <xiao.w.wang@intel.com>
> > ---
> >   lib/librte_vhost/rte_vdpa.h            |  39 +++++++
> >   lib/librte_vhost/rte_vhost_version.map |   2 +
> >   lib/librte_vhost/vdpa.c                | 194
> +++++++++++++++++++++++++++++++++
> >   lib/librte_vhost/vhost.h               |  40 +++++++
> >   lib/librte_vhost/virtio_net.c          |  39 -------
> >   5 files changed, 275 insertions(+), 39 deletions(-)
> >
> 
> 
> Apart from that:
> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
> 
> Thanks,
> Maxime

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [dpdk-dev] [PATCH v4 06/10] net/ifc: detect if VDPA mode is specified
  2018-12-16  9:17               ` Maxime Coquelin
@ 2018-12-17  8:54                 ` Wang, Xiao W
  0 siblings, 0 replies; 86+ messages in thread
From: Wang, Xiao W @ 2018-12-17  8:54 UTC (permalink / raw)
  To: Maxime Coquelin, Bie, Tiwei
  Cc: alejandro.lucero, dev, Wang, Zhihong, Ye, Xiaolong

Hi Maxime,

> -----Original Message-----
> From: Maxime Coquelin [mailto:maxime.coquelin@redhat.com]
> Sent: Sunday, December 16, 2018 1:17 AM
> To: Wang, Xiao W <xiao.w.wang@intel.com>; Bie, Tiwei <tiwei.bie@intel.com>
> Cc: alejandro.lucero@netronome.com; dev@dpdk.org; Wang, Zhihong
> <zhihong.wang@intel.com>; Ye, Xiaolong <xiaolong.ye@intel.com>
> Subject: Re: [PATCH v4 06/10] net/ifc: detect if VDPA mode is specified
> 
> 
> 
> On 12/14/18 10:16 PM, Xiao Wang wrote:
> > If user wants the VF to be used in VDPA (vhost data path acceleration)
> > mode, then the user can add a "vdpa=1" parameter for the device.
> >
> > So if driver doesn't not find this option, it should quit and let the
> 
> s/doesn't not/does not/

Yes, I will fix the typo.

> 
> > bus continue the probe.
> >
> > Signed-off-by: Xiao Wang <xiao.w.wang@intel.com>
> > ---
> >   drivers/net/ifc/Makefile     |  1 +
> >   drivers/net/ifc/ifcvf_vdpa.c | 47
> ++++++++++++++++++++++++++++++++++++++++++++
> >   2 files changed, 48 insertions(+)
> >
> 
> Should this option be documented somewhere?

Will add a section for this in the last doc patch.

Thanks,
Xiao

> 
> Apart from that:
> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
> 
> Thanks,
> Maxime

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [dpdk-dev] [PATCH v4 07/10] net/ifc: add devarg for LM mode
  2018-12-16  9:21               ` Maxime Coquelin
@ 2018-12-17  9:00                 ` Wang, Xiao W
  0 siblings, 0 replies; 86+ messages in thread
From: Wang, Xiao W @ 2018-12-17  9:00 UTC (permalink / raw)
  To: Maxime Coquelin, Bie, Tiwei
  Cc: alejandro.lucero, dev, Wang, Zhihong, Ye, Xiaolong

Hi Maxime,

> -----Original Message-----
> From: Maxime Coquelin [mailto:maxime.coquelin@redhat.com]
> Sent: Sunday, December 16, 2018 1:21 AM
> To: Wang, Xiao W <xiao.w.wang@intel.com>; Bie, Tiwei <tiwei.bie@intel.com>
> Cc: alejandro.lucero@netronome.com; dev@dpdk.org; Wang, Zhihong
> <zhihong.wang@intel.com>; Ye, Xiaolong <xiaolong.ye@intel.com>
> Subject: Re: [PATCH v4 07/10] net/ifc: add devarg for LM mode
> 
> 
> 
> On 12/14/18 10:16 PM, Xiao Wang wrote:
> > This patch series enables a new method for live migration, i.e. software
> > assisted live migration. This patch provides a device argument for user
> > to choose the methold.
> >
> > When "swlm=1", driver/device will do live migration with a relay thread
> > dealing with dirty page logging. Without this parameter, device will do
> > dirty page logging and there's no relay thread consuming CPU resource.
> >
> > Signed-off-by: Xiao Wang <xiao.w.wang@intel.com>
> > ---
> >   drivers/net/ifc/ifcvf_vdpa.c | 13 +++++++++++++
> >   1 file changed, 13 insertions(+)
> >
> > diff --git a/drivers/net/ifc/ifcvf_vdpa.c b/drivers/net/ifc/ifcvf_vdpa.c
> > index c0e50354a..395c5112f 100644
> > --- a/drivers/net/ifc/ifcvf_vdpa.c
> > +++ b/drivers/net/ifc/ifcvf_vdpa.c
> > @@ -8,6 +8,7 @@
> >   #include <sys/ioctl.h>
> >   #include <sys/epoll.h>
> >   #include <linux/virtio_net.h>
> > +#include <stdbool.h>
> >
> >   #include <rte_malloc.h>
> >   #include <rte_memory.h>
> > @@ -31,9 +32,11 @@
> >   #endif
> >
> >   #define IFCVF_VDPA_MODE		"vdpa"
> > +#define IFCVF_SW_FALLBACK_LM	"swlm"
> 
> 
> The patch looks good, except that I don't like the "swlm" name.
> Maybe we could have something less obscure, even if a little bit longer?
> 
> What about "sw-live-migration"?

I agree with you; a clear name is more reader-friendly than a short one.

Thanks,
Xiao

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [dpdk-dev] [PATCH v4 09/10] net/ifc: support SW assisted VDPA live migration
  2018-12-16  9:35               ` Maxime Coquelin
@ 2018-12-17  9:12                 ` Wang, Xiao W
  2018-12-17 11:08                   ` Maxime Coquelin
  0 siblings, 1 reply; 86+ messages in thread
From: Wang, Xiao W @ 2018-12-17  9:12 UTC (permalink / raw)
  To: Maxime Coquelin, Bie, Tiwei
  Cc: alejandro.lucero, dev, Wang, Zhihong, Ye, Xiaolong

Hi Maxime,

> -----Original Message-----
> From: Maxime Coquelin [mailto:maxime.coquelin@redhat.com]
> Sent: Sunday, December 16, 2018 1:35 AM
> To: Wang, Xiao W <xiao.w.wang@intel.com>; Bie, Tiwei <tiwei.bie@intel.com>
> Cc: alejandro.lucero@netronome.com; dev@dpdk.org; Wang, Zhihong
> <zhihong.wang@intel.com>; Ye, Xiaolong <xiaolong.ye@intel.com>
> Subject: Re: [PATCH v4 09/10] net/ifc: support SW assisted VDPA live migration
> 
> 
> 
> On 12/14/18 10:16 PM, Xiao Wang wrote:
> > In SW assisted live migration mode, driver will stop the device and
> > setup a mediate virtio ring to relay the communication between the
> > virtio driver and the VDPA device.
> >
> > This data path intervention will allow SW to help on guest dirty page
> > logging for live migration.
> >
> > This SW fallback is event driven relay thread, so when the network
> > throughput is low, this SW fallback will take little CPU resource, but
> > when the throughput goes up, the relay thread's CPU usage will goes up
> > accordinly.
> 
> s/accordinly/accordingly/
> 

Will fix it in next version.

> >
> > User needs to take all the factors including CPU usage, guest perf
> > degradation, etc. into consideration when selecting the live migration
> > support mode.
> >
> > Signed-off-by: Xiao Wang <xiao.w.wang@intel.com>
> > ---
> >   drivers/net/ifc/base/ifcvf.h |   1 +
> >   drivers/net/ifc/ifcvf_vdpa.c | 346
> ++++++++++++++++++++++++++++++++++++++++++-
> >   2 files changed, 344 insertions(+), 3 deletions(-)
> >
> > diff --git a/drivers/net/ifc/base/ifcvf.h b/drivers/net/ifc/base/ifcvf.h
> > index c15c69107..e8a30d2c6 100644
> > --- a/drivers/net/ifc/base/ifcvf.h
> > +++ b/drivers/net/ifc/base/ifcvf.h
> > @@ -50,6 +50,7 @@
> >   #define IFCVF_LM_ENABLE_VF		0x1
> >   #define IFCVF_LM_ENABLE_PF		0x3
> >   #define IFCVF_LOG_BASE			0x100000000000
> > +#define IFCVF_MEDIATE_VRING		0x200000000000
> 
> MEDIATED?

"mediate" is used as adjective here.

> 
> >
> >   #define IFCVF_32_BIT_MASK		0xffffffff
> >
> > diff --git a/drivers/net/ifc/ifcvf_vdpa.c b/drivers/net/ifc/ifcvf_vdpa.c
> > index f181c5a6e..61757d0b4 100644
> > --- a/drivers/net/ifc/ifcvf_vdpa.c
> > +++ b/drivers/net/ifc/ifcvf_vdpa.c
> > @@ -63,6 +63,9 @@ struct ifcvf_internal {
> >   	rte_atomic32_t running;
> >   	rte_spinlock_t lock;
> >   	bool sw_lm;

[...]

> > +static void *
> > +vring_relay(void *arg)
> > +{
> > +	int i, vid, epfd, fd, nfds;
> > +	struct ifcvf_internal *internal = (struct ifcvf_internal *)arg;
> > +	struct rte_vhost_vring vring;
> > +	struct rte_intr_handle *intr_handle;
> > +	uint16_t qid, q_num;
> > +	struct epoll_event events[IFCVF_MAX_QUEUES * 4];
> > +	struct epoll_event ev;
> > +	int nbytes;
> > +	uint64_t buf;
> > +
> > +	vid = internal->vid;
> > +	q_num = rte_vhost_get_vring_num(vid);
> > +	/* prepare the mediate vring */
> > +	for (qid = 0; qid < q_num; qid++) {
> > +		rte_vhost_get_vring_base(vid, qid,
> > +				&internal->m_vring[qid].avail->idx,
> > +				&internal->m_vring[qid].used->idx);
> > +		rte_vdpa_relay_vring_avail(vid, qid, &internal->m_vring[qid]);
> > +	}
> > +
> > +	/* add notify fd and interrupt fd to epoll */
> > +	epfd = epoll_create(IFCVF_MAX_QUEUES * 2);
> > +	if (epfd < 0) {
> > +		DRV_LOG(ERR, "failed to create epoll instance.");
> > +		return NULL;
> > +	}
> > +	internal->epfd = epfd;
> > +
> > +	for (qid = 0; qid < q_num; qid++) {
> > +		ev.events = EPOLLIN | EPOLLPRI;
> > +		rte_vhost_get_vhost_vring(vid, qid, &vring);
> > +		ev.data.u64 = qid << 1 | (uint64_t)vring.kickfd << 32;
> > +		if (epoll_ctl(epfd, EPOLL_CTL_ADD, vring.kickfd, &ev) < 0) {
> > +			DRV_LOG(ERR, "epoll add error: %s", strerror(errno));
> > +			return NULL;
> > +		}
> > +	}
> > +
> > +	intr_handle = &internal->pdev->intr_handle;
> > +	for (qid = 0; qid < q_num; qid++) {
> > +		ev.events = EPOLLIN | EPOLLPRI;
> > +		ev.data.u64 = 1 | qid << 1 |
> > +			(uint64_t)intr_handle->efds[qid] << 32;
> > +		if (epoll_ctl(epfd, EPOLL_CTL_ADD, intr_handle->efds[qid],
> &ev)
> > +				< 0) {
> > +			DRV_LOG(ERR, "epoll add error: %s", strerror(errno));
> > +			return NULL;
> > +		}
> > +	}
> > +
> > +	/* start relay with a first kick */
> > +	for (qid = 0; qid < q_num; qid++)
> > +		ifcvf_notify_queue(&internal->hw, qid);
> > +
> > +	/* listen to the events and react accordingly */
> > +	for (;;) {
> > +		nfds = epoll_wait(epfd, events, q_num * 2, -1);
> > +		if (nfds < 0) {
> > +			if (errno == EINTR)
> > +				continue;
> > +			DRV_LOG(ERR, "epoll_wait return fail\n");
> > +			return NULL;
> > +		}
> > +
> > +		for (i = 0; i < nfds; i++) {
> > +			fd = (uint32_t)(events[i].data.u64 >> 32);
> > +			do {
> > +				nbytes = read(fd, &buf, 8);
> > +				if (nbytes < 0) {
> > +					if (errno == EINTR ||
> > +					    errno == EWOULDBLOCK ||
> > +					    errno == EAGAIN)
> > +						continue;
> > +					DRV_LOG(INFO, "Error reading "
> > +						"kickfd: %s",
> > +						strerror(errno));
> > +				}
> > +				break;
> > +			} while (1);
> > +
> > +			qid = events[i].data.u32 >> 1;
> > +
> > +			if (events[i].data.u32 & 1)
> > +				update_used_ring(internal, qid);
> > +			else
> > +				update_avail_ring(internal, qid);
> > +		}
> > +	}
> > +
> > +	return NULL;
> > +}
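
A note on the epoll user data used in vring_relay() above: each registered
fd carries a 64-bit tag with the fd in the upper 32 bits, the queue id
shifted left by one, and bit 0 telling a device interrupt (used ring update)
from a guest kick (avail ring update). The sketch below only restates that
packing. The helper names (relay_tag_pack() and friends) are hypothetical and
not part of the patch. It decodes through the 64-bit value only, so it does
not rely on data.u32 aliasing the low half of data.u64, which holds on
little-endian hosts.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names for illustration; the bit layout mirrors the patch:
 *   bit 0       1 = device interrupt (used ring), 0 = guest kick (avail ring)
 *   bits 1..16  queue id
 *   bits 32..63 file descriptor that triggered the event
 */
enum relay_src { RELAY_KICK = 0, RELAY_INTR = 1 };

static inline uint64_t
relay_tag_pack(uint16_t qid, enum relay_src src, int fd)
{
	assert(fd >= 0);
	return (uint64_t)src | ((uint64_t)qid << 1) |
		((uint64_t)(uint32_t)fd << 32);
}

static inline int
relay_tag_fd(uint64_t tag)
{
	return (int)(uint32_t)(tag >> 32);
}

static inline uint16_t
relay_tag_qid(uint64_t tag)
{
	/* mask to the low 32 bits first, then drop the source bit */
	return (uint16_t)((tag & 0xffffffffu) >> 1);
}

static inline enum relay_src
relay_tag_src(uint64_t tag)
{
	return (enum relay_src)(tag & 1);
}
```

With these helpers the dispatch in the event loop would read as
"if (relay_tag_src(tag) == RELAY_INTR) update used ring, else update avail
ring", which is the same decision the patch makes with data.u32 & 1.
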
> > +
> > +static int
> > +setup_vring_relay(struct ifcvf_internal *internal)
> > +{
> > +	int ret;
> > +
> > +	ret = pthread_create(&internal->tid, NULL, vring_relay,
> > +			(void *)internal);
> 
> So it will be scheduled without any affinity?
> Shouldn't it use a pmd thread instead?

The new thread will inherit the thread affinity from its parent thread.
As you know, vdpa is trying to minimize CPU usage for virtio HW
acceleration, and we assign just one core to vdpa daemon
(doc/guides/sample_app_ug/vdpa.rst), so there's no dedicated pmd worker core.

Thanks,
Xiao

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [dpdk-dev] [PATCH v4 10/10] doc: update ifc NIC document
  2018-12-16  9:36               ` Maxime Coquelin
@ 2018-12-17  9:15                 ` Wang, Xiao W
  0 siblings, 0 replies; 86+ messages in thread
From: Wang, Xiao W @ 2018-12-17  9:15 UTC (permalink / raw)
  To: Maxime Coquelin, Bie, Tiwei
  Cc: alejandro.lucero, dev, Wang, Zhihong, Ye, Xiaolong

Hi Maxime,

> -----Original Message-----
> From: Maxime Coquelin [mailto:maxime.coquelin@redhat.com]
> Sent: Sunday, December 16, 2018 1:36 AM
> To: Wang, Xiao W <xiao.w.wang@intel.com>; Bie, Tiwei <tiwei.bie@intel.com>
> Cc: alejandro.lucero@netronome.com; dev@dpdk.org; Wang, Zhihong
> <zhihong.wang@intel.com>; Ye, Xiaolong <xiaolong.ye@intel.com>
> Subject: Re: [PATCH v4 10/10] doc: update ifc NIC document
> 
> 
> 
> On 12/14/18 10:16 PM, Xiao Wang wrote:
> > Add the SW assisted VDPA live migration feature into NIC doc.
> >
> > Signed-off-by: Xiao Wang <xiao.w.wang@intel.com>
> > ---
> >   doc/guides/nics/ifc.rst                | 8 ++++++++
> >   doc/guides/rel_notes/release_19_02.rst | 6 ++++++
> >   2 files changed, 14 insertions(+)
> >
> > diff --git a/doc/guides/nics/ifc.rst b/doc/guides/nics/ifc.rst
> > index 48f9adf1d..eb55d329a 100644
> > --- a/doc/guides/nics/ifc.rst
> > +++ b/doc/guides/nics/ifc.rst
> > @@ -39,6 +39,13 @@ the driver probe a new container is created for this device, with this
> >   container vDPA driver can program DMA remapping table with the VM's memory
> >   region information.
> >
> > +The device argument "swlm=1" will configure the driver into SW assisted live
> > +migration mode. In this mode, the driver will set up a SW relay thread when LM
> > +happens, this thread will help device to log dirty pages. Thus this mode does
> > +not require HW to implement a dirty page logging function block, but will
> > +consume some percentage of CPU resource depending on the network throughput.
> > +If no "swlm=1" specified, driver will rely on device's logging capability.
> > +
> 
> Ok, so that's documented here.
> What about documenting vdpa option too?

Yes, will explain all the devargs in this doc.

Thanks,
Xiao

> 
> >   Key IFCVF vDPA driver ops
> >   ~~~~~~~~~~~~~~~~~~~~~~~~~
> >
> > @@ -70,6 +77,7 @@ Features
> >   Features of the IFCVF driver are:
> >
> >   - Compatibility with virtio 0.95 and 1.0.
> > +- SW assisted vDPA live migration.
> >
> >
> >   Prerequisites
> > diff --git a/doc/guides/rel_notes/release_19_02.rst b/doc/guides/rel_notes/release_19_02.rst
> > index e86ef9511..ced6af8f0 100644
> > --- a/doc/guides/rel_notes/release_19_02.rst
> > +++ b/doc/guides/rel_notes/release_19_02.rst
> > @@ -60,6 +60,12 @@ New Features
> >     * Added the handler to get firmware version string.
> >     * Added support for multicast filtering.
> >
> > +* **Added support for SW-assisted VDPA live migration.**
> > +
> > +  This SW-assisted VDPA live migration facility helps VDPA devices without
> > +  logging capability to perform live migration, a mediate SW relay can help
> > +  devices to track dirty pages caused by DMA. IFC driver has enabled this
> > +  SW-assisted live migration mode.
> >
> >   Removed Items
> >   -------------
> >

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [dpdk-dev] [PATCH v4 03/10] vhost: provide helpers for virtio ring relay
  2018-12-17  8:51                 ` Wang, Xiao W
@ 2018-12-17 11:02                   ` Maxime Coquelin
  2018-12-17 14:41                     ` Wang, Xiao W
  0 siblings, 1 reply; 86+ messages in thread
From: Maxime Coquelin @ 2018-12-17 11:02 UTC (permalink / raw)
  To: Wang, Xiao W, Bie, Tiwei
  Cc: alejandro.lucero, dev, Wang, Zhihong, Ye, Xiaolong

Hi Xiao,

On 12/17/18 9:51 AM, Wang, Xiao W wrote:
> Hi Maxime,
> 
>> -----Original Message-----
>> From: Maxime Coquelin [mailto:maxime.coquelin@redhat.com]
>> Sent: Sunday, December 16, 2018 1:11 AM
>> To: Wang, Xiao W <xiao.w.wang@intel.com>; Bie, Tiwei <tiwei.bie@intel.com>
>> Cc: alejandro.lucero@netronome.com; dev@dpdk.org; Wang, Zhihong
>> <zhihong.wang@intel.com>; Ye, Xiaolong <xiaolong.ye@intel.com>
>> Subject: Re: [PATCH v4 03/10] vhost: provide helpers for virtio ring relay
>>
>>
>>
>> On 12/14/18 10:16 PM, Xiao Wang wrote:
>>> This patch provides two helpers for vdpa device driver to perform a
>>> relay between the guest virtio ring and a mediate virtio ring.
>>
>> s/mediate/mediated/ ?
>> I'm not 100% sure, but if it is mediated, please change everywhere else
>> in the patch.
> 
> "mediate" can also be used as an adjective, so "mediate" is OK here.

I got the confirmation from a native speaker that mediate sounds wrong
in this context, and mediated should be used.

>>
>>>
>>> The available ring relay will synchronize the available entries, and
>>> helps to do desc validity checking.
>>
>> s/helps/help/
> 
> Yes, will update.
> 
>>
>>>
>>> The used ring relay will synchronize the used entries from mediate ring
>>> to guest ring, and helps to do dirty page logging for live migration.
>>
>> s/helps/help/
> 
> Will update.
> 
> Thanks for the comments,
> Xiao
> 
>>
>>>
>>> The next patch will leverage these two helpers.
>>>
>>> Signed-off-by: Xiao Wang <xiao.w.wang@intel.com>
>>> ---
>>>    lib/librte_vhost/rte_vdpa.h            |  39 +++++++
>>>    lib/librte_vhost/rte_vhost_version.map |   2 +
>>>    lib/librte_vhost/vdpa.c                | 194 +++++++++++++++++++++++++++++++++
>>>    lib/librte_vhost/vhost.h               |  40 +++++++
>>>    lib/librte_vhost/virtio_net.c          |  39 -------
>>>    5 files changed, 275 insertions(+), 39 deletions(-)
>>>
>>
>>
>> Apart from that:
>> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
>>
>> Thanks,
>> Maxime

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [dpdk-dev] [PATCH v4 09/10] net/ifc: support SW assisted VDPA live migration
  2018-12-17  9:12                 ` Wang, Xiao W
@ 2018-12-17 11:08                   ` Maxime Coquelin
  0 siblings, 0 replies; 86+ messages in thread
From: Maxime Coquelin @ 2018-12-17 11:08 UTC (permalink / raw)
  To: Wang, Xiao W, Bie, Tiwei
  Cc: alejandro.lucero, dev, Wang, Zhihong, Ye, Xiaolong



On 12/17/18 10:12 AM, Wang, Xiao W wrote:
> Hi Maxime,
> 
>> -----Original Message-----
>> From: Maxime Coquelin [mailto:maxime.coquelin@redhat.com]
>> Sent: Sunday, December 16, 2018 1:35 AM
>> To: Wang, Xiao W <xiao.w.wang@intel.com>; Bie, Tiwei <tiwei.bie@intel.com>
>> Cc: alejandro.lucero@netronome.com; dev@dpdk.org; Wang, Zhihong
>> <zhihong.wang@intel.com>; Ye, Xiaolong <xiaolong.ye@intel.com>
>> Subject: Re: [PATCH v4 09/10] net/ifc: support SW assisted VDPA live migration
>>
>>
>>
>> On 12/14/18 10:16 PM, Xiao Wang wrote:
>>> In SW assisted live migration mode, driver will stop the device and
>>> setup a mediate virtio ring to relay the communication between the
>>> virtio driver and the VDPA device.
>>>
>>> This data path intervention will allow SW to help on guest dirty page
>>> logging for live migration.
>>>
>>> This SW fallback is event driven relay thread, so when the network
>>> throughput is low, this SW fallback will take little CPU resource, but
>>> when the throughput goes up, the relay thread's CPU usage will goes up
>>> accordinly.
>>
>> s/accordinly/accordingly/
>>
> 
> Will fix it in next version.
> 
>>>
>>> User needs to take all the factors including CPU usage, guest perf
>>> degradation, etc. into consideration when selecting the live migration
>>> support mode.
>>>
>>> Signed-off-by: Xiao Wang <xiao.w.wang@intel.com>
>>> ---
>>>    drivers/net/ifc/base/ifcvf.h |   1 +
>>>    drivers/net/ifc/ifcvf_vdpa.c | 346 ++++++++++++++++++++++++++++++++++++++++++-
>>>    2 files changed, 344 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/drivers/net/ifc/base/ifcvf.h b/drivers/net/ifc/base/ifcvf.h
>>> index c15c69107..e8a30d2c6 100644
>>> --- a/drivers/net/ifc/base/ifcvf.h
>>> +++ b/drivers/net/ifc/base/ifcvf.h
>>> @@ -50,6 +50,7 @@
>>>    #define IFCVF_LM_ENABLE_VF		0x1
>>>    #define IFCVF_LM_ENABLE_PF		0x3
>>>    #define IFCVF_LOG_BASE			0x100000000000
>>> +#define IFCVF_MEDIATE_VRING		0x200000000000
>>
>> MEDIATED?
> 
> "mediate" is used as adjective here.
> 
>>
>>>
>>>    #define IFCVF_32_BIT_MASK		0xffffffff
>>>
>>> diff --git a/drivers/net/ifc/ifcvf_vdpa.c b/drivers/net/ifc/ifcvf_vdpa.c
>>> index f181c5a6e..61757d0b4 100644
>>> --- a/drivers/net/ifc/ifcvf_vdpa.c
>>> +++ b/drivers/net/ifc/ifcvf_vdpa.c
>>> @@ -63,6 +63,9 @@ struct ifcvf_internal {
>>>    	rte_atomic32_t running;
>>>    	rte_spinlock_t lock;
>>>    	bool sw_lm;
> 
> [...]
> 
>>> +static void *
>>> +vring_relay(void *arg)
>>> +{
>>> +	int i, vid, epfd, fd, nfds;
>>> +	struct ifcvf_internal *internal = (struct ifcvf_internal *)arg;
>>> +	struct rte_vhost_vring vring;
>>> +	struct rte_intr_handle *intr_handle;
>>> +	uint16_t qid, q_num;
>>> +	struct epoll_event events[IFCVF_MAX_QUEUES * 4];
>>> +	struct epoll_event ev;
>>> +	int nbytes;
>>> +	uint64_t buf;
>>> +
>>> +	vid = internal->vid;
>>> +	q_num = rte_vhost_get_vring_num(vid);
>>> +	/* prepare the mediate vring */
>>> +	for (qid = 0; qid < q_num; qid++) {
>>> +		rte_vhost_get_vring_base(vid, qid,
>>> +				&internal->m_vring[qid].avail->idx,
>>> +				&internal->m_vring[qid].used->idx);
>>> +		rte_vdpa_relay_vring_avail(vid, qid, &internal->m_vring[qid]);
>>> +	}
>>> +
>>> +	/* add notify fd and interrupt fd to epoll */
>>> +	epfd = epoll_create(IFCVF_MAX_QUEUES * 2);
>>> +	if (epfd < 0) {
>>> +		DRV_LOG(ERR, "failed to create epoll instance.");
>>> +		return NULL;
>>> +	}
>>> +	internal->epfd = epfd;
>>> +
>>> +	for (qid = 0; qid < q_num; qid++) {
>>> +		ev.events = EPOLLIN | EPOLLPRI;
>>> +		rte_vhost_get_vhost_vring(vid, qid, &vring);
>>> +		ev.data.u64 = qid << 1 | (uint64_t)vring.kickfd << 32;
>>> +		if (epoll_ctl(epfd, EPOLL_CTL_ADD, vring.kickfd, &ev) < 0) {
>>> +			DRV_LOG(ERR, "epoll add error: %s", strerror(errno));
>>> +			return NULL;
>>> +		}
>>> +	}
>>> +
>>> +	intr_handle = &internal->pdev->intr_handle;
>>> +	for (qid = 0; qid < q_num; qid++) {
>>> +		ev.events = EPOLLIN | EPOLLPRI;
>>> +		ev.data.u64 = 1 | qid << 1 |
>>> +			(uint64_t)intr_handle->efds[qid] << 32;
>>> +		if (epoll_ctl(epfd, EPOLL_CTL_ADD, intr_handle->efds[qid], &ev)
>>> +				< 0) {
>>> +			DRV_LOG(ERR, "epoll add error: %s", strerror(errno));
>>> +			return NULL;
>>> +		}
>>> +	}
>>> +
>>> +	/* start relay with a first kick */
>>> +	for (qid = 0; qid < q_num; qid++)
>>> +		ifcvf_notify_queue(&internal->hw, qid);
>>> +
>>> +	/* listen to the events and react accordingly */
>>> +	for (;;) {
>>> +		nfds = epoll_wait(epfd, events, q_num * 2, -1);
>>> +		if (nfds < 0) {
>>> +			if (errno == EINTR)
>>> +				continue;
>>> +			DRV_LOG(ERR, "epoll_wait return fail\n");
>>> +			return NULL;
>>> +		}
>>> +
>>> +		for (i = 0; i < nfds; i++) {
>>> +			fd = (uint32_t)(events[i].data.u64 >> 32);
>>> +			do {
>>> +				nbytes = read(fd, &buf, 8);
>>> +				if (nbytes < 0) {
>>> +					if (errno == EINTR ||
>>> +					    errno == EWOULDBLOCK ||
>>> +					    errno == EAGAIN)
>>> +						continue;
>>> +					DRV_LOG(INFO, "Error reading "
>>> +						"kickfd: %s",
>>> +						strerror(errno));
>>> +				}
>>> +				break;
>>> +			} while (1);
>>> +
>>> +			qid = events[i].data.u32 >> 1;
>>> +
>>> +			if (events[i].data.u32 & 1)
>>> +				update_used_ring(internal, qid);
>>> +			else
>>> +				update_avail_ring(internal, qid);
>>> +		}
>>> +	}
>>> +
>>> +	return NULL;
>>> +}
>>> +
>>> +static int
>>> +setup_vring_relay(struct ifcvf_internal *internal)
>>> +{
>>> +	int ret;
>>> +
>>> +	ret = pthread_create(&internal->tid, NULL, vring_relay,
>>> +			(void *)internal);
>>
>> So it will be scheduled without any affinity?
>> Shouldn't it use a pmd thread instead?
> 
> The new thread will inherit the thread affinity from its parent thread.
> As you know, vdpa is trying to minimize CPU usage for virtio HW
> acceleration, and we assign just one core to vdpa daemon
> (doc/guides/sample_app_ug/vdpa.rst), so there's no dedicated pmd worker core.

OK, thanks for the clarification!

> Thanks,
> Xiao
> 

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [dpdk-dev] [PATCH v4 03/10] vhost: provide helpers for virtio ring relay
  2018-12-17 11:02                   ` Maxime Coquelin
@ 2018-12-17 14:41                     ` Wang, Xiao W
  2018-12-17 19:00                       ` Maxime Coquelin
  0 siblings, 1 reply; 86+ messages in thread
From: Wang, Xiao W @ 2018-12-17 14:41 UTC (permalink / raw)
  To: Maxime Coquelin, Bie, Tiwei
  Cc: alejandro.lucero, dev, Wang, Zhihong, Ye, Xiaolong

Thanks for the confirmation.

BRs,
Xiao

> -----Original Message-----
> From: Maxime Coquelin [mailto:maxime.coquelin@redhat.com]
> Sent: Monday, December 17, 2018 7:03 PM
> To: Wang, Xiao W <xiao.w.wang@intel.com>; Bie, Tiwei <tiwei.bie@intel.com>
> Cc: alejandro.lucero@netronome.com; dev@dpdk.org; Wang, Zhihong
> <zhihong.wang@intel.com>; Ye, Xiaolong <xiaolong.ye@intel.com>
> Subject: Re: [PATCH v4 03/10] vhost: provide helpers for virtio ring relay
> 
> Hi Xiao,
> 
> On 12/17/18 9:51 AM, Wang, Xiao W wrote:
> > Hi Maxime,
> >
> >> -----Original Message-----
> >> From: Maxime Coquelin [mailto:maxime.coquelin@redhat.com]
> >> Sent: Sunday, December 16, 2018 1:11 AM
> >> To: Wang, Xiao W <xiao.w.wang@intel.com>; Bie, Tiwei
> <tiwei.bie@intel.com>
> >> Cc: alejandro.lucero@netronome.com; dev@dpdk.org; Wang, Zhihong
> >> <zhihong.wang@intel.com>; Ye, Xiaolong <xiaolong.ye@intel.com>
> >> Subject: Re: [PATCH v4 03/10] vhost: provide helpers for virtio ring relay
> >>
> >>
> >>
> >> On 12/14/18 10:16 PM, Xiao Wang wrote:
> >>> This patch provides two helpers for vdpa device driver to perform a
> >>> relay between the guest virtio ring and a mediate virtio ring.
> >>
> >> s/mediate/mediated/ ?
> >> I'm not 100% sure, but if it is mediated, please change everywhere else
> >> in the patch.
> >
> > "mediate" can also be used as an adjective, so "mediate" is OK here.
> 
> I got the confirmation from a native speaker that mediate sounds wrong
> in this context, and mediated should be used.
> 
> >>
> >>>
> >>> The available ring relay will synchronize the available entries, and
> >>> helps to do desc validity checking.
> >>
> >> s/helps/help/
> >
> > Yes, will update.
> >
> >>
> >>>
> >>> The used ring relay will synchronize the used entries from mediate ring
> >>> to guest ring, and helps to do dirty page logging for live migration.
> >>
> >> s/helps/help/
> >
> > Will update.
> >
> > Thanks for the comments,
> > Xiao
> >
> >>
> >>>
> >>> The next patch will leverage these two helpers.
> >>>
> >>> Signed-off-by: Xiao Wang <xiao.w.wang@intel.com>
> >>> ---
> >>>    lib/librte_vhost/rte_vdpa.h            |  39 +++++++
> >>>    lib/librte_vhost/rte_vhost_version.map |   2 +
> >>>    lib/librte_vhost/vdpa.c                | 194 +++++++++++++++++++++++++++++++++
> >>>    lib/librte_vhost/vhost.h               |  40 +++++++
> >>>    lib/librte_vhost/virtio_net.c          |  39 -------
> >>>    5 files changed, 275 insertions(+), 39 deletions(-)
> >>>
> >>
> >>
> >> Apart from that:
> >> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
> >>
> >> Thanks,
> >> Maxime

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [dpdk-dev] [PATCH v4 03/10] vhost: provide helpers for virtio ring relay
  2018-12-17 14:41                     ` Wang, Xiao W
@ 2018-12-17 19:00                       ` Maxime Coquelin
  2018-12-18  8:27                         ` Wang, Xiao W
  0 siblings, 1 reply; 86+ messages in thread
From: Maxime Coquelin @ 2018-12-17 19:00 UTC (permalink / raw)
  To: Wang, Xiao W, Bie, Tiwei
  Cc: alejandro.lucero, dev, Wang, Zhihong, Ye, Xiaolong



On 12/17/18 3:41 PM, Wang, Xiao W wrote:
> Thanks for the confirmation.

Please note that CI reports a checkpatch issue:
http://patches.dpdk.org/patch/48935/

Thanks,
Maxime

> BRs,
> Xiao
> 
>> -----Original Message-----
>> From: Maxime Coquelin [mailto:maxime.coquelin@redhat.com]
>> Sent: Monday, December 17, 2018 7:03 PM
>> To: Wang, Xiao W <xiao.w.wang@intel.com>; Bie, Tiwei <tiwei.bie@intel.com>
>> Cc: alejandro.lucero@netronome.com; dev@dpdk.org; Wang, Zhihong
>> <zhihong.wang@intel.com>; Ye, Xiaolong <xiaolong.ye@intel.com>
>> Subject: Re: [PATCH v4 03/10] vhost: provide helpers for virtio ring relay
>>
>> Hi Xiao,
>>
>> On 12/17/18 9:51 AM, Wang, Xiao W wrote:
>>> Hi Maxime,
>>>
>>>> -----Original Message-----
>>>> From: Maxime Coquelin [mailto:maxime.coquelin@redhat.com]
>>>> Sent: Sunday, December 16, 2018 1:11 AM
>>>> To: Wang, Xiao W <xiao.w.wang@intel.com>; Bie, Tiwei
>> <tiwei.bie@intel.com>
>>>> Cc: alejandro.lucero@netronome.com; dev@dpdk.org; Wang, Zhihong
>>>> <zhihong.wang@intel.com>; Ye, Xiaolong <xiaolong.ye@intel.com>
>>>> Subject: Re: [PATCH v4 03/10] vhost: provide helpers for virtio ring relay
>>>>
>>>>
>>>>
>>>> On 12/14/18 10:16 PM, Xiao Wang wrote:
>>>>> This patch provides two helpers for vdpa device driver to perform a
>>>>> relay between the guest virtio ring and a mediate virtio ring.
>>>>
>>>> s/mediate/mediated/ ?
>>>> I'm not 100% sure, but if it is mediated, please change everywhere else
>>>> in the patch.
>>>
>>> "mediate" can also be used as an adjective, so "mediate" is OK here.
>>
>> I got the confirmation from a native speaker that mediate sounds wrong
>> in this context, and mediated should be used.
>>
>>>>
>>>>>
>>>>> The available ring relay will synchronize the available entries, and
>>>>> helps to do desc validity checking.
>>>>
>>>> s/helps/help/
>>>
>>> Yes, will update.
>>>
>>>>
>>>>>
>>>>> The used ring relay will synchronize the used entries from mediate ring
>>>>> to guest ring, and helps to do dirty page logging for live migration.
>>>>
>>>> s/helps/help/
>>>
>>> Will update.
>>>
>>> Thanks for the comments,
>>> Xiao
>>>
>>>>
>>>>>
>>>>> The next patch will leverage these two helpers.
>>>>>
>>>>> Signed-off-by: Xiao Wang <xiao.w.wang@intel.com>
>>>>> ---
>>>>>     lib/librte_vhost/rte_vdpa.h            |  39 +++++++
>>>>>     lib/librte_vhost/rte_vhost_version.map |   2 +
>>>>>     lib/librte_vhost/vdpa.c                | 194 +++++++++++++++++++++++++++++++++
>>>>>     lib/librte_vhost/vhost.h               |  40 +++++++
>>>>>     lib/librte_vhost/virtio_net.c          |  39 -------
>>>>>     5 files changed, 275 insertions(+), 39 deletions(-)
>>>>>
>>>>
>>>>
>>>> Apart from that:
>>>> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
>>>>
>>>> Thanks,
>>>> Maxime

^ permalink raw reply	[flat|nested] 86+ messages in thread

* [dpdk-dev] [PATCH v5 00/10] support SW assisted VDPA live migration
  2018-12-14 21:16             ` [dpdk-dev] [PATCH v4 03/10] vhost: provide helpers for virtio ring relay Xiao Wang
  2018-12-16  9:10               ` Maxime Coquelin
@ 2018-12-18  8:01               ` Xiao Wang
  2018-12-18  8:01                 ` [dpdk-dev] [PATCH v5 01/10] vhost: remove unused internal API Xiao Wang
                                   ` (9 more replies)
  1 sibling, 10 replies; 86+ messages in thread
From: Xiao Wang @ 2018-12-18  8:01 UTC (permalink / raw)
  To: tiwei.bie, maxime.coquelin
  Cc: alejandro.lucero, dev, zhihong.wang, xiaolong.ye, Xiao Wang

In the previous VDPA implementation, live migration was supported by having
the HW accelerator do all the work, including dirty page logging and device
status report/restore. In this mode the VDPA sample daemon and the device
driver only take care of the control path and are not involved in the data
path, so CPU usage is close to zero. This mode requires the device to have
dirty page logging capability.

This patch series adds live migration support for devices without logging
capability. The VDPA driver can set up a relay thread that stands between the
guest and the device when live migration happens. This relay intervenes in
the communication between the guest virtio driver and the physical virtio
accelerator: it performs a vring relay on behalf of the device and logs dirty
pages along the way. Some CPU resource is therefore consumed in this scenario,
with the percentage depending on the network throughput.
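
As a rough illustration of what logging a dirty page amounts to, here is a
minimal sketch. It assumes 4 KiB pages and a plain byte-array bitmap with one
bit per guest page. The names toy_log_write() and toy_page_is_dirty() are made
up for this example and are not vhost lib API. The real log area is shared
with the VMM and has to be updated atomically, which this sketch leaves out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_PAGE_SHIFT 12 /* assumption: 4 KiB log granularity */

/* Mark every page touched by a write of len bytes at guest address gpa. */
static void
toy_log_write(uint8_t *bitmap, size_t bitmap_bytes, uint64_t gpa, uint64_t len)
{
	uint64_t page, last;

	if (len == 0)
		return;

	page = gpa >> TOY_PAGE_SHIFT;
	last = (gpa + len - 1) >> TOY_PAGE_SHIFT;
	for (; page <= last; page++) {
		if (page / 8 >= bitmap_bytes)
			break; /* outside the logged range, nothing to mark */
		bitmap[page / 8] |= (uint8_t)(1u << (page % 8));
	}
}

/* Return 1 if the given page number has been marked dirty, 0 otherwise. */
static int
toy_page_is_dirty(const uint8_t *bitmap, uint64_t page)
{
	return (bitmap[page / 8] >> (page % 8)) & 1;
}
```

In the relay case the driver would call something like this for each
writeable buffer of a used descriptor chain, since those are the buffers the
device may have written by DMA.
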

Some new helpers are added into vhost lib for this VDPA SW fallback:
- rte_vhost_host_notifier_ctrl, to enable/disable the VDPA direct-IO
  datapath.
- rte_vdpa_relay_vring_avail, to relay the available request from guest vring
  to mediated vring.
- rte_vdpa_relay_vring_used, to relay the used response from mediated vring to
  guest vring.
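
To give an idea of the avail-side relay, below is a toy sketch of the index
handling only. It assumes a split ring with free-running 16-bit indexes and a
power-of-two ring size. struct toy_avail and toy_relay_avail() are
hypothetical names for this example, not the vhost lib structures, and the
descriptor check is reduced to a range check on the head index. The real
helper also has to walk descriptor chains and needs memory barriers around
the index update.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_RING_SIZE 8u /* must be a power of two */

struct toy_avail {
	uint16_t idx;                 /* free-running producer index */
	uint16_t ring[TOY_RING_SIZE]; /* descriptor head indexes */
};

/*
 * Copy the entries the guest has published since the last sync into the
 * mediated ring. Returns the number of synced entries, or -1 if the guest
 * ring content looks invalid.
 */
static int
toy_relay_avail(const struct toy_avail *guest, struct toy_avail *mediated)
{
	uint16_t from = mediated->idx;
	uint16_t to = guest->idx;
	uint16_t n = (uint16_t)(to - from); /* wraps correctly modulo 2^16 */
	uint16_t i;

	if (n > TOY_RING_SIZE)
		return -1; /* guest claims more entries than the ring holds */

	for (i = 0; i < n; i++) {
		uint16_t slot = (uint16_t)(from + i) & (TOY_RING_SIZE - 1);
		uint16_t head = guest->ring[slot];

		if (head >= TOY_RING_SIZE)
			return -1; /* descriptor index out of range */
		mediated->ring[slot] = head;
	}

	/* a write barrier belongs here in real code, before publishing idx */
	mediated->idx = to;
	return n;
}
```

The used-side relay goes in the opposite direction, from the mediated ring to
the guest ring, and is the natural place to log dirty pages because it sees
every buffer the device has completed.
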

Some existing helpers are also leveraged for SW fallback setup, like VFIO
interrupt configuration, IOMMU table programming, etc.

This series enables SW assisted VDPA live migration in the ifc driver.
Since ifcvf also supports HW dirty page logging, a new devarg is added
to let the user select whether the SW mode is used.

v5:
* Change the devargs parameter from "swlm" to "sw-live-migration" to avoid
  obscurity.
* Change "mediate" to "mediated".
* Fix some commit messages.
* Add description of the "vdpa" parameter in ifc doc.
* Add __rte_experimental modifier for new API in vdpa.c, not only in rte_vdpa.h

v4:
* Add a patch to remove the unused vhost internal API: vhost_detach_vdpa_device().

v3:
* Fix indent in relay code.
* Fix the iova access mode issue of buffer check.
* Rename the relay API to be more generic, and add more API note for used
  ring handling.
* Add kvargs lib dependency in ifc driver.
* Add commit message for the doc update patch for checkpatch warning.

v2:
* Reword the vdpa host notifier control API comment.
* Make the vring relay API parameter "void *" to accommodate potential future
  ring layouts, e.g. packed ring.
* Add parameter check for the new API.
* Add memory barrier for ring idx update.
* Remove the used ring logging in the relay.
* Some comment fix and code cleaning according to Tiwei's comment.
* Add release note update.

Xiao Wang (10):
  vhost: remove unused internal API
  vhost: provide helper for host notifier ctrl
  vhost: provide helpers for virtio ring relay
  net/ifc: dump debug message for error
  net/ifc: store only registered device instance
  net/ifc: detect if VDPA mode is specified
  net/ifc: add devarg for LM mode
  net/ifc: use lib API for used ring logging
  net/ifc: support SW assisted VDPA live migration
  doc: update ifc NIC document

 doc/guides/nics/ifc.rst                |  12 +-
 doc/guides/rel_notes/release_19_02.rst |   6 +
 drivers/net/ifc/Makefile               |   1 +
 drivers/net/ifc/base/ifcvf.h           |   1 +
 drivers/net/ifc/ifcvf_vdpa.c           | 461 ++++++++++++++++++++++++++++++---
 lib/librte_vhost/rte_vdpa.h            |  57 ++++
 lib/librte_vhost/rte_vhost_version.map |   3 +
 lib/librte_vhost/vdpa.c                | 194 ++++++++++++++
 lib/librte_vhost/vhost.c               |  13 -
 lib/librte_vhost/vhost.h               |  41 ++-
 lib/librte_vhost/vhost_user.c          |   7 +-
 lib/librte_vhost/virtio_net.c          |  39 ---
 12 files changed, 744 insertions(+), 91 deletions(-)

-- 
2.15.1

^ permalink raw reply	[flat|nested] 86+ messages in thread

* [dpdk-dev] [PATCH v5 01/10] vhost: remove unused internal API
  2018-12-18  8:01               ` [dpdk-dev] [PATCH v5 00/10] support SW assisted VDPA live migration Xiao Wang
@ 2018-12-18  8:01                 ` Xiao Wang
  2018-12-18  8:01                 ` [dpdk-dev] [PATCH v5 02/10] vhost: provide helper for host notifier ctrl Xiao Wang
                                   ` (8 subsequent siblings)
  9 siblings, 0 replies; 86+ messages in thread
From: Xiao Wang @ 2018-12-18  8:01 UTC (permalink / raw)
  To: tiwei.bie, maxime.coquelin
  Cc: alejandro.lucero, dev, zhihong.wang, xiaolong.ye, Xiao Wang

vhost_detach_vdpa_device() is internally defined but not used, remove
it in this patch.

Signed-off-by: Xiao Wang <xiao.w.wang@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
---
 lib/librte_vhost/vhost.c | 13 -------------
 lib/librte_vhost/vhost.h |  1 -
 2 files changed, 14 deletions(-)

diff --git a/lib/librte_vhost/vhost.c b/lib/librte_vhost/vhost.c
index 70ac6bc9c..b32babee4 100644
--- a/lib/librte_vhost/vhost.c
+++ b/lib/librte_vhost/vhost.c
@@ -400,19 +400,6 @@ vhost_attach_vdpa_device(int vid, int did)
 	dev->vdpa_dev_id = did;
 }
 
-void
-vhost_detach_vdpa_device(int vid)
-{
-	struct virtio_net *dev = get_device(vid);
-
-	if (dev == NULL)
-		return;
-
-	vhost_user_host_notifier_ctrl(vid, false);
-
-	dev->vdpa_dev_id = -1;
-}
-
 void
 vhost_set_ifname(int vid, const char *if_name, unsigned int if_len)
 {
diff --git a/lib/librte_vhost/vhost.h b/lib/librte_vhost/vhost.h
index 552b9298d..d5bab4803 100644
--- a/lib/librte_vhost/vhost.h
+++ b/lib/librte_vhost/vhost.h
@@ -629,7 +629,6 @@ void free_vq(struct virtio_net *dev, struct vhost_virtqueue *vq);
 int alloc_vring_queue(struct virtio_net *dev, uint32_t vring_idx);
 
 void vhost_attach_vdpa_device(int vid, int did);
-void vhost_detach_vdpa_device(int vid);
 
 void vhost_set_ifname(int, const char *if_name, unsigned int if_len);
 void vhost_enable_dequeue_zero_copy(int vid);
-- 
2.15.1

^ permalink raw reply	[flat|nested] 86+ messages in thread

* [dpdk-dev] [PATCH v5 02/10] vhost: provide helper for host notifier ctrl
  2018-12-18  8:01               ` [dpdk-dev] [PATCH v5 00/10] support SW assisted VDPA live migration Xiao Wang
  2018-12-18  8:01                 ` [dpdk-dev] [PATCH v5 01/10] vhost: remove unused internal API Xiao Wang
@ 2018-12-18  8:01                 ` Xiao Wang
  2018-12-18 15:37                   ` Ferruh Yigit
  2018-12-18  8:02                 ` [dpdk-dev] [PATCH v5 03/10] vhost: provide helpers for virtio ring relay Xiao Wang
                                   ` (7 subsequent siblings)
  9 siblings, 1 reply; 86+ messages in thread
From: Xiao Wang @ 2018-12-18  8:01 UTC (permalink / raw)
  To: tiwei.bie, maxime.coquelin
  Cc: alejandro.lucero, dev, zhihong.wang, xiaolong.ye, Xiao Wang

The VDPA driver can decide whether it needs to enable or disable the host
notifier mapping, so exposing an API allows flexibility. A later patch will
build on this.

Signed-off-by: Xiao Wang <xiao.w.wang@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
---
 drivers/net/ifc/ifcvf_vdpa.c           |  3 +++
 lib/librte_vhost/rte_vdpa.h            | 18 ++++++++++++++++++
 lib/librte_vhost/rte_vhost_version.map |  1 +
 lib/librte_vhost/vhost_user.c          |  7 +------
 4 files changed, 23 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ifc/ifcvf_vdpa.c b/drivers/net/ifc/ifcvf_vdpa.c
index 97a57f182..e844109f3 100644
--- a/drivers/net/ifc/ifcvf_vdpa.c
+++ b/drivers/net/ifc/ifcvf_vdpa.c
@@ -556,6 +556,9 @@ ifcvf_dev_config(int vid)
 	rte_atomic32_set(&internal->dev_attached, 1);
 	update_datapath(internal);
 
+	if (rte_vhost_host_notifier_ctrl(vid, true) != 0)
+		DRV_LOG(NOTICE, "vDPA (%d): software relay is used.", did);
+
 	return 0;
 }
 
diff --git a/lib/librte_vhost/rte_vdpa.h b/lib/librte_vhost/rte_vdpa.h
index a418da47c..fff657391 100644
--- a/lib/librte_vhost/rte_vdpa.h
+++ b/lib/librte_vhost/rte_vdpa.h
@@ -11,6 +11,8 @@
  * Device specific vhost lib
  */
 
+#include <stdbool.h>
+
 #include <rte_pci.h>
 #include "rte_vhost.h"
 
@@ -155,4 +157,20 @@ rte_vdpa_get_device(int did);
  */
 int __rte_experimental
 rte_vdpa_get_device_num(void);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * Enable/Disable host notifier mapping for a vdpa port.
+ *
+ * @param vid
+ *  vhost device id
+ * @param enable
+ *  true for host notifier map, false for host notifier unmap
+ * @return
+ *  0 on success, -1 on failure
+ */
+int __rte_experimental
+rte_vhost_host_notifier_ctrl(int vid, bool enable);
 #endif /* _RTE_VDPA_H_ */
diff --git a/lib/librte_vhost/rte_vhost_version.map b/lib/librte_vhost/rte_vhost_version.map
index ae39b6e21..22302e972 100644
--- a/lib/librte_vhost/rte_vhost_version.map
+++ b/lib/librte_vhost/rte_vhost_version.map
@@ -83,4 +83,5 @@ EXPERIMENTAL {
 	rte_vhost_crypto_finalize_requests;
 	rte_vhost_crypto_set_zero_copy;
 	rte_vhost_va_from_guest_pa;
+	rte_vhost_host_notifier_ctrl;
 };
diff --git a/lib/librte_vhost/vhost_user.c b/lib/librte_vhost/vhost_user.c
index 557213491..8fec773d5 100644
--- a/lib/librte_vhost/vhost_user.c
+++ b/lib/librte_vhost/vhost_user.c
@@ -2049,11 +2049,6 @@ vhost_user_msg_handler(int vid, int fd)
 		if (vdpa_dev->ops->dev_conf)
 			vdpa_dev->ops->dev_conf(dev->vid);
 		dev->flags |= VIRTIO_DEV_VDPA_CONFIGURED;
-		if (vhost_user_host_notifier_ctrl(dev->vid, true) != 0) {
-			RTE_LOG(INFO, VHOST_CONFIG,
-				"(%d) software relay is used for vDPA, performance may be low.\n",
-				dev->vid);
-		}
 	}
 
 	return 0;
@@ -2148,7 +2143,7 @@ static int vhost_user_slave_set_vring_host_notifier(struct virtio_net *dev,
 	return process_slave_message_reply(dev, &msg);
 }
 
-int vhost_user_host_notifier_ctrl(int vid, bool enable)
+int rte_vhost_host_notifier_ctrl(int vid, bool enable)
 {
 	struct virtio_net *dev;
 	struct rte_vdpa_device *vdpa_dev;
-- 
2.15.1

^ permalink raw reply	[flat|nested] 86+ messages in thread

* [dpdk-dev] [PATCH v5 03/10] vhost: provide helpers for virtio ring relay
  2018-12-18  8:01               ` [dpdk-dev] [PATCH v5 00/10] support SW assisted VDPA live migration Xiao Wang
  2018-12-18  8:01                 ` [dpdk-dev] [PATCH v5 01/10] vhost: remove unused internal API Xiao Wang
  2018-12-18  8:01                 ` [dpdk-dev] [PATCH v5 02/10] vhost: provide helper for host notifier ctrl Xiao Wang
@ 2018-12-18  8:02                 ` Xiao Wang
  2018-12-18  8:02                 ` [dpdk-dev] [PATCH v5 04/10] net/ifc: dump debug message for error Xiao Wang
                                   ` (6 subsequent siblings)
  9 siblings, 0 replies; 86+ messages in thread
From: Xiao Wang @ 2018-12-18  8:02 UTC (permalink / raw)
  To: tiwei.bie, maxime.coquelin
  Cc: alejandro.lucero, dev, zhihong.wang, xiaolong.ye, Xiao Wang

This patch provides two helpers for vdpa device driver to perform a
relay between the guest virtio ring and a mediated virtio ring.

The available ring relay will synchronize the available entries, and
help to do desc validity checking.

The used ring relay will synchronize the used entries from mediated ring
to guest ring, and help to do dirty page logging for live migration.

A later patch will leverage these two helpers.

Signed-off-by: Xiao Wang <xiao.w.wang@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
---
 lib/librte_vhost/rte_vdpa.h            |  39 +++++++
 lib/librte_vhost/rte_vhost_version.map |   2 +
 lib/librte_vhost/vdpa.c                | 194 +++++++++++++++++++++++++++++++++
 lib/librte_vhost/vhost.h               |  40 +++++++
 lib/librte_vhost/virtio_net.c          |  39 -------
 5 files changed, 275 insertions(+), 39 deletions(-)

diff --git a/lib/librte_vhost/rte_vdpa.h b/lib/librte_vhost/rte_vdpa.h
index fff657391..462df6bf7 100644
--- a/lib/librte_vhost/rte_vdpa.h
+++ b/lib/librte_vhost/rte_vdpa.h
@@ -173,4 +173,43 @@ rte_vdpa_get_device_num(void);
  */
 int __rte_experimental
 rte_vhost_host_notifier_ctrl(int vid, bool enable);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * Synchronize the available ring from guest to mediated ring, help to
+ * check desc validity to protect against malicious guest driver.
+ *
+ * @param vid
+ *  vhost device id
+ * @param qid
+ *  vhost queue id
+ * @param vring_m
+ *  mediated virtio ring pointer
+ * @return
+ *  number of synced available entries on success, -1 on failure
+ */
+int __rte_experimental
+rte_vdpa_relay_vring_avail(int vid, uint16_t qid, void *vring_m);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * Synchronize the used ring from mediated ring to guest, log dirty
+ * page for each writeable buffer, caller should handle the used
+ * ring logging before device stop.
+ *
+ * @param vid
+ *  vhost device id
+ * @param qid
+ *  vhost queue id
+ * @param vring_m
+ *  mediated virtio ring pointer
+ * @return
+ *  number of synced used entries on success, -1 on failure
+ */
+int __rte_experimental
+rte_vdpa_relay_vring_used(int vid, uint16_t qid, void *vring_m);
 #endif /* _RTE_VDPA_H_ */
diff --git a/lib/librte_vhost/rte_vhost_version.map b/lib/librte_vhost/rte_vhost_version.map
index 22302e972..dd3b4c1cb 100644
--- a/lib/librte_vhost/rte_vhost_version.map
+++ b/lib/librte_vhost/rte_vhost_version.map
@@ -84,4 +84,6 @@ EXPERIMENTAL {
 	rte_vhost_crypto_set_zero_copy;
 	rte_vhost_va_from_guest_pa;
 	rte_vhost_host_notifier_ctrl;
+	rte_vdpa_relay_vring_avail;
+	rte_vdpa_relay_vring_used;
 };
diff --git a/lib/librte_vhost/vdpa.c b/lib/librte_vhost/vdpa.c
index e7d849ee0..240a1fe3a 100644
--- a/lib/librte_vhost/vdpa.c
+++ b/lib/librte_vhost/vdpa.c
@@ -122,3 +122,197 @@ rte_vdpa_get_device_num(void)
 {
 	return vdpa_device_num;
 }
+
+static bool
+invalid_desc_check(struct virtio_net *dev, struct vhost_virtqueue *vq,
+		uint64_t desc_iova, uint64_t desc_len, uint8_t perm)
+{
+	uint64_t desc_addr, desc_chunck_len;
+
+	while (desc_len) {
+		desc_chunck_len = desc_len;
+		desc_addr = vhost_iova_to_vva(dev, vq,
+				desc_iova,
+				&desc_chunck_len,
+				perm);
+
+		if (!desc_addr)
+			return true;
+
+		desc_len -= desc_chunck_len;
+		desc_iova += desc_chunck_len;
+	}
+
+	return false;
+}
+
+int __rte_experimental
+rte_vdpa_relay_vring_avail(int vid, uint16_t qid, void *vring_m)
+{
+	struct virtio_net *dev = get_device(vid);
+	uint16_t idx, idx_m, desc_id;
+	struct vring_desc desc;
+	struct vhost_virtqueue *vq;
+	struct vring_desc *desc_ring;
+	struct vring_desc *idesc = NULL;
+	struct vring *s_vring;
+	uint64_t dlen;
+	int ret;
+	uint8_t perm;
+
+	if (!dev || !vring_m)
+		return -1;
+
+	if (qid >= dev->nr_vring)
+		return -1;
+
+	if (vq_is_packed(dev))
+		return -1;
+
+	s_vring = (struct vring *)vring_m;
+	vq = dev->virtqueue[qid];
+	idx = vq->avail->idx;
+	idx_m = s_vring->avail->idx;
+	ret = (uint16_t)(idx - idx_m);
+
+	while (idx_m != idx) {
+		/* avail entry copy */
+		desc_id = vq->avail->ring[idx_m & (vq->size - 1)];
+		s_vring->avail->ring[idx_m & (vq->size - 1)] = desc_id;
+		desc_ring = vq->desc;
+
+		if (vq->desc[desc_id].flags & VRING_DESC_F_INDIRECT) {
+			dlen = vq->desc[desc_id].len;
+			desc_ring = (struct vring_desc *)(uintptr_t)
+				vhost_iova_to_vva(dev, vq,
+						vq->desc[desc_id].addr, &dlen,
+						VHOST_ACCESS_RO);
+			if (unlikely(!desc_ring))
+				return -1;
+
+			if (unlikely(dlen < vq->desc[desc_id].len)) {
+				idesc = alloc_copy_ind_table(dev, vq,
+						vq->desc[desc_id].addr,
+						vq->desc[desc_id].len);
+				if (unlikely(!idesc))
+					return -1;
+
+				desc_ring = idesc;
+			}
+
+			desc_id = 0;
+		}
+
+		/* check if the buf addr is within the guest memory */
+		do {
+			desc = desc_ring[desc_id];
+			perm = desc.flags & VRING_DESC_F_WRITE ?
+				VHOST_ACCESS_WO : VHOST_ACCESS_RO;
+			if (invalid_desc_check(dev, vq, desc.addr, desc.len,
+						perm)) {
+				if (unlikely(idesc))
+					free_ind_table(idesc);
+				return -1;
+			}
+			desc_id = desc.next;
+		} while (desc.flags & VRING_DESC_F_NEXT);
+
+		if (unlikely(idesc)) {
+			free_ind_table(idesc);
+			idesc = NULL;
+		}
+
+		idx_m++;
+	}
+
+	rte_smp_wmb();
+	s_vring->avail->idx = idx;
+
+	if (dev->features & (1ULL << VIRTIO_RING_F_EVENT_IDX))
+		vhost_avail_event(vq) = idx;
+
+	return ret;
+}
+
+int __rte_experimental
+rte_vdpa_relay_vring_used(int vid, uint16_t qid, void *vring_m)
+{
+	struct virtio_net *dev = get_device(vid);
+	uint16_t idx, idx_m, desc_id;
+	struct vhost_virtqueue *vq;
+	struct vring_desc desc;
+	struct vring_desc *desc_ring;
+	struct vring_desc *idesc = NULL;
+	struct vring *s_vring;
+	uint64_t dlen;
+	int ret;
+
+	if (!dev || !vring_m)
+		return -1;
+
+	if (qid >= dev->nr_vring)
+		return -1;
+
+	if (vq_is_packed(dev))
+		return -1;
+
+	s_vring = (struct vring *)vring_m;
+	vq = dev->virtqueue[qid];
+	idx = vq->used->idx;
+	idx_m = s_vring->used->idx;
+	ret = (uint16_t)(idx_m - idx);
+
+	while (idx != idx_m) {
+		/* copy used entry, used ring logging is not covered here */
+		vq->used->ring[idx & (vq->size - 1)] =
+			s_vring->used->ring[idx & (vq->size - 1)];
+
+		desc_id = vq->used->ring[idx & (vq->size - 1)].id;
+		desc_ring = vq->desc;
+
+		if (vq->desc[desc_id].flags & VRING_DESC_F_INDIRECT) {
+			dlen = vq->desc[desc_id].len;
+			desc_ring = (struct vring_desc *)(uintptr_t)
+				vhost_iova_to_vva(dev, vq,
+						vq->desc[desc_id].addr, &dlen,
+						VHOST_ACCESS_RO);
+			if (unlikely(!desc_ring))
+				return -1;
+
+			if (unlikely(dlen < vq->desc[desc_id].len)) {
+				idesc = alloc_copy_ind_table(dev, vq,
+						vq->desc[desc_id].addr,
+						vq->desc[desc_id].len);
+				if (unlikely(!idesc))
+					return -1;
+
+				desc_ring = idesc;
+			}
+
+			desc_id = 0;
+		}
+
+		/* dirty page logging for DMA writeable buffer */
+		do {
+			desc = desc_ring[desc_id];
+			if (desc.flags & VRING_DESC_F_WRITE)
+				vhost_log_write(dev, desc.addr, desc.len);
+			desc_id = desc.next;
+		} while (desc.flags & VRING_DESC_F_NEXT);
+
+		if (unlikely(idesc)) {
+			free_ind_table(idesc);
+			idesc = NULL;
+		}
+
+		idx++;
+	}
+
+	rte_smp_wmb();
+	vq->used->idx = idx_m;
+
+	if (dev->features & (1ULL << VIRTIO_RING_F_EVENT_IDX))
+		vring_used_event(s_vring) = idx_m;
+
+	return ret;
+}
diff --git a/lib/librte_vhost/vhost.h b/lib/librte_vhost/vhost.h
index d5bab4803..3b3265c4b 100644
--- a/lib/librte_vhost/vhost.h
+++ b/lib/librte_vhost/vhost.h
@@ -18,6 +18,7 @@
 #include <rte_log.h>
 #include <rte_ether.h>
 #include <rte_rwlock.h>
+#include <rte_malloc.h>
 
 #include "rte_vhost.h"
 #include "rte_vdpa.h"
@@ -754,4 +755,43 @@ vhost_vring_call_packed(struct virtio_net *dev, struct vhost_virtqueue *vq)
 		eventfd_write(vq->callfd, (eventfd_t)1);
 }
 
+static __rte_always_inline void *
+alloc_copy_ind_table(struct virtio_net *dev, struct vhost_virtqueue *vq,
+		uint64_t desc_addr, uint64_t desc_len)
+{
+	void *idesc;
+	uint64_t src, dst;
+	uint64_t len, remain = desc_len;
+
+	idesc = rte_malloc(__func__, desc_len, 0);
+	if (unlikely(!idesc))
+		return 0;
+
+	dst = (uint64_t)(uintptr_t)idesc;
+
+	while (remain) {
+		len = remain;
+		src = vhost_iova_to_vva(dev, vq, desc_addr, &len,
+				VHOST_ACCESS_RO);
+		if (unlikely(!src || !len)) {
+			rte_free(idesc);
+			return 0;
+		}
+
+		rte_memcpy((void *)(uintptr_t)dst, (void *)(uintptr_t)src, len);
+
+		remain -= len;
+		dst += len;
+		desc_addr += len;
+	}
+
+	return idesc;
+}
+
+static __rte_always_inline void
+free_ind_table(void *idesc)
+{
+	rte_free(idesc);
+}
+
 #endif /* _VHOST_NET_CDEV_H_ */
diff --git a/lib/librte_vhost/virtio_net.c b/lib/librte_vhost/virtio_net.c
index 5e1a1a727..8c657a101 100644
--- a/lib/librte_vhost/virtio_net.c
+++ b/lib/librte_vhost/virtio_net.c
@@ -37,45 +37,6 @@ is_valid_virt_queue_idx(uint32_t idx, int is_tx, uint32_t nr_vring)
 	return (is_tx ^ (idx & 1)) == 0 && idx < nr_vring;
 }
 
-static __rte_always_inline void *
-alloc_copy_ind_table(struct virtio_net *dev, struct vhost_virtqueue *vq,
-		uint64_t desc_addr, uint64_t desc_len)
-{
-	void *idesc;
-	uint64_t src, dst;
-	uint64_t len, remain = desc_len;
-
-	idesc = rte_malloc(__func__, desc_len, 0);
-	if (unlikely(!idesc))
-		return 0;
-
-	dst = (uint64_t)(uintptr_t)idesc;
-
-	while (remain) {
-		len = remain;
-		src = vhost_iova_to_vva(dev, vq, desc_addr, &len,
-				VHOST_ACCESS_RO);
-		if (unlikely(!src || !len)) {
-			rte_free(idesc);
-			return 0;
-		}
-
-		rte_memcpy((void *)(uintptr_t)dst, (void *)(uintptr_t)src, len);
-
-		remain -= len;
-		dst += len;
-		desc_addr += len;
-	}
-
-	return idesc;
-}
-
-static __rte_always_inline void
-free_ind_table(void *idesc)
-{
-	rte_free(idesc);
-}
-
 static __rte_always_inline void
 do_flush_shadow_used_ring_split(struct virtio_net *dev,
 			struct vhost_virtqueue *vq,
-- 
2.15.1


* [dpdk-dev] [PATCH v5 04/10] net/ifc: dump debug message for error
  2018-12-18  8:01               ` [dpdk-dev] [PATCH v5 00/10] support SW assisted VDPA live migration Xiao Wang
                                   ` (2 preceding siblings ...)
  2018-12-18  8:02                 ` [dpdk-dev] [PATCH v5 03/10] vhost: provide helpers for virtio ring relay Xiao Wang
@ 2018-12-18  8:02                 ` Xiao Wang
  2018-12-18  8:02                 ` [dpdk-dev] [PATCH v5 05/10] net/ifc: store only registered device instance Xiao Wang
                                   ` (5 subsequent siblings)
  9 siblings, 0 replies; 86+ messages in thread
From: Xiao Wang @ 2018-12-18  8:02 UTC (permalink / raw)
  To: tiwei.bie, maxime.coquelin
  Cc: alejandro.lucero, dev, zhihong.wang, xiaolong.ye, Xiao Wang

Driver probe may fail for different causes, so log an error message at
each failure point to help debugging.

Signed-off-by: Xiao Wang <xiao.w.wang@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
---
 drivers/net/ifc/ifcvf_vdpa.c | 19 +++++++++++++------
 1 file changed, 13 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ifc/ifcvf_vdpa.c b/drivers/net/ifc/ifcvf_vdpa.c
index e844109f3..aacd5f9bf 100644
--- a/drivers/net/ifc/ifcvf_vdpa.c
+++ b/drivers/net/ifc/ifcvf_vdpa.c
@@ -22,7 +22,7 @@
 
 #define DRV_LOG(level, fmt, args...) \
 	rte_log(RTE_LOG_ ## level, ifcvf_vdpa_logtype, \
-		"%s(): " fmt "\n", __func__, ##args)
+		"IFCVF %s(): " fmt "\n", __func__, ##args)
 
 #ifndef PAGE_SIZE
 #define PAGE_SIZE 4096
@@ -756,11 +756,16 @@ ifcvf_pci_probe(struct rte_pci_driver *pci_drv __rte_unused,
 
 	internal->pdev = pci_dev;
 	rte_spinlock_init(&internal->lock);
-	if (ifcvf_vfio_setup(internal) < 0)
-		return -1;
 
-	if (ifcvf_init_hw(&internal->hw, internal->pdev) < 0)
-		return -1;
+	if (ifcvf_vfio_setup(internal) < 0) {
+		DRV_LOG(ERR, "failed to setup device %s", pci_dev->name);
+		goto error;
+	}
+
+	if (ifcvf_init_hw(&internal->hw, internal->pdev) < 0) {
+		DRV_LOG(ERR, "failed to init device %s", pci_dev->name);
+		goto error;
+	}
 
 	internal->max_queues = IFCVF_MAX_QUEUES;
 	features = ifcvf_get_features(&internal->hw);
@@ -782,8 +787,10 @@ ifcvf_pci_probe(struct rte_pci_driver *pci_drv __rte_unused,
 
 	internal->did = rte_vdpa_register_device(&internal->dev_addr,
 				&ifcvf_ops);
-	if (internal->did < 0)
+	if (internal->did < 0) {
+		DRV_LOG(ERR, "failed to register device %s", pci_dev->name);
 		goto error;
+	}
 
 	rte_atomic32_set(&internal->started, 1);
 	update_datapath(internal);
-- 
2.15.1


* [dpdk-dev] [PATCH v5 05/10] net/ifc: store only registered device instance
  2018-12-18  8:01               ` [dpdk-dev] [PATCH v5 00/10] support SW assisted VDPA live migration Xiao Wang
                                   ` (3 preceding siblings ...)
  2018-12-18  8:02                 ` [dpdk-dev] [PATCH v5 04/10] net/ifc: dump debug message for error Xiao Wang
@ 2018-12-18  8:02                 ` Xiao Wang
  2018-12-18  8:02                 ` [dpdk-dev] [PATCH v5 06/10] net/ifc: detect if VDPA mode is specified Xiao Wang
                                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 86+ messages in thread
From: Xiao Wang @ 2018-12-18  8:02 UTC (permalink / raw)
  To: tiwei.bie, maxime.coquelin
  Cc: alejandro.lucero, dev, zhihong.wang, xiaolong.ye, Xiao Wang, stable

If driver fails to register ifc VF device into vhost lib, then this
device should not be stored.

Fixes: a3f8150eac6d ("net/ifcvf: add ifcvf vDPA driver")
cc: stable@dpdk.org

Signed-off-by: Xiao Wang <xiao.w.wang@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
---
 drivers/net/ifc/ifcvf_vdpa.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ifc/ifcvf_vdpa.c b/drivers/net/ifc/ifcvf_vdpa.c
index aacd5f9bf..6fcd50b73 100644
--- a/drivers/net/ifc/ifcvf_vdpa.c
+++ b/drivers/net/ifc/ifcvf_vdpa.c
@@ -781,10 +781,6 @@ ifcvf_pci_probe(struct rte_pci_driver *pci_drv __rte_unused,
 	internal->dev_addr.type = PCI_ADDR;
 	list->internal = internal;
 
-	pthread_mutex_lock(&internal_list_lock);
-	TAILQ_INSERT_TAIL(&internal_list, list, next);
-	pthread_mutex_unlock(&internal_list_lock);
-
 	internal->did = rte_vdpa_register_device(&internal->dev_addr,
 				&ifcvf_ops);
 	if (internal->did < 0) {
@@ -792,6 +788,10 @@ ifcvf_pci_probe(struct rte_pci_driver *pci_drv __rte_unused,
 		goto error;
 	}
 
+	pthread_mutex_lock(&internal_list_lock);
+	TAILQ_INSERT_TAIL(&internal_list, list, next);
+	pthread_mutex_unlock(&internal_list_lock);
+
 	rte_atomic32_set(&internal->started, 1);
 	update_datapath(internal);
 
-- 
2.15.1


* [dpdk-dev] [PATCH v5 06/10] net/ifc: detect if VDPA mode is specified
  2018-12-18  8:01               ` [dpdk-dev] [PATCH v5 00/10] support SW assisted VDPA live migration Xiao Wang
                                   ` (4 preceding siblings ...)
  2018-12-18  8:02                 ` [dpdk-dev] [PATCH v5 05/10] net/ifc: store only registered device instance Xiao Wang
@ 2018-12-18  8:02                 ` Xiao Wang
  2018-12-18  8:02                 ` [dpdk-dev] [PATCH v5 07/10] net/ifc: add devarg for LM mode Xiao Wang
                                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 86+ messages in thread
From: Xiao Wang @ 2018-12-18  8:02 UTC (permalink / raw)
  To: tiwei.bie, maxime.coquelin
  Cc: alejandro.lucero, dev, zhihong.wang, xiaolong.ye, Xiao Wang

If user wants the VF to be used in VDPA (vhost data path acceleration)
mode, then the user can add a "vdpa=1" parameter for the device.

So if driver does not find this option, it should quit and let the bus
continue the probe.
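
The probe decision described above can be sketched without the
rte_kvargs library as follows. This is only an illustration under
stated assumptions: toy_devarg_int() and toy_should_probe() are
hypothetical helpers, not part of DPDK, and the driver itself uses
rte_kvargs_parse() and rte_kvargs_process().

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper, not the rte_kvargs API: look up "key=<int>" in a
 * comma-separated devargs string. Returns 1 and stores the value when
 * the key is present, 0 when it is absent, -1 on a malformed value.
 */
static int
toy_devarg_int(const char *args, const char *key, long *out)
{
	size_t klen = strlen(key);
	const char *p = args;

	assert(out != NULL);
	while (p != NULL && *p != '\0') {
		if (strncmp(p, key, klen) == 0 && p[klen] == '=') {
			char *end;
			long v = strtol(p + klen + 1, &end, 0);

			if (end == p + klen + 1 ||
			    (*end != ',' && *end != '\0'))
				return -1;
			*out = v;
			return 1;
		}
		/* advance to the token after the next comma, if any */
		p = strchr(p, ',');
		if (p != NULL)
			p++;
	}
	return 0;
}

/* Claim the VF only when "vdpa" is present and non-zero. */
static int
toy_should_probe(const char *args)
{
	long vdpa = 0;

	if (args == NULL)
		return 0;
	return toy_devarg_int(args, "vdpa", &vdpa) == 1 && vdpa != 0;
}
```

With this sketch, "vdpa=1" is accepted while "vdpa=0", a missing key or
a missing argument string all make the driver step aside.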

Signed-off-by: Xiao Wang <xiao.w.wang@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
---
 drivers/net/ifc/Makefile     |  1 +
 drivers/net/ifc/ifcvf_vdpa.c | 47 ++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 48 insertions(+)

diff --git a/drivers/net/ifc/Makefile b/drivers/net/ifc/Makefile
index 39b36ae5d..7755a87eb 100644
--- a/drivers/net/ifc/Makefile
+++ b/drivers/net/ifc/Makefile
@@ -10,6 +10,7 @@ LIB = librte_pmd_ifc.a
 
 LDLIBS += -lpthread
 LDLIBS += -lrte_eal -lrte_pci -lrte_vhost -lrte_bus_pci
+LDLIBS += -lrte_kvargs
 
 CFLAGS += -O3
 CFLAGS += $(WERROR_FLAGS)
diff --git a/drivers/net/ifc/ifcvf_vdpa.c b/drivers/net/ifc/ifcvf_vdpa.c
index 6fcd50b73..c0e50354a 100644
--- a/drivers/net/ifc/ifcvf_vdpa.c
+++ b/drivers/net/ifc/ifcvf_vdpa.c
@@ -17,6 +17,8 @@
 #include <rte_vfio.h>
 #include <rte_spinlock.h>
 #include <rte_log.h>
+#include <rte_kvargs.h>
+#include <rte_devargs.h>
 
 #include "base/ifcvf.h"
 
@@ -28,6 +30,13 @@
 #define PAGE_SIZE 4096
 #endif
 
+#define IFCVF_VDPA_MODE		"vdpa"
+
+static const char * const ifcvf_valid_arguments[] = {
+	IFCVF_VDPA_MODE,
+	NULL
+};
+
 static int ifcvf_vdpa_logtype;
 
 struct ifcvf_internal {
@@ -735,6 +744,21 @@ static struct rte_vdpa_dev_ops ifcvf_ops = {
 	.get_notify_area = ifcvf_get_notify_area,
 };
 
+static inline int
+open_int(const char *key __rte_unused, const char *value, void *extra_args)
+{
+	uint16_t *n = extra_args;
+
+	if (value == NULL || extra_args == NULL)
+		return -EINVAL;
+
+	*n = (uint16_t)strtoul(value, NULL, 0);
+	if (*n == USHRT_MAX && errno == ERANGE)
+		return -1;
+
+	return 0;
+}
+
 static int
 ifcvf_pci_probe(struct rte_pci_driver *pci_drv __rte_unused,
 		struct rte_pci_device *pci_dev)
@@ -742,10 +766,31 @@ ifcvf_pci_probe(struct rte_pci_driver *pci_drv __rte_unused,
 	uint64_t features;
 	struct ifcvf_internal *internal = NULL;
 	struct internal_list *list = NULL;
+	int vdpa_mode = 0;
+	struct rte_kvargs *kvlist = NULL;
+	int ret = 0;
 
 	if (rte_eal_process_type() != RTE_PROC_PRIMARY)
 		return 0;
 
+	kvlist = rte_kvargs_parse(pci_dev->device.devargs->args,
+			ifcvf_valid_arguments);
+	if (kvlist == NULL)
+		return 1;
+
+	/* probe only when vdpa mode is specified */
+	if (rte_kvargs_count(kvlist, IFCVF_VDPA_MODE) == 0) {
+		rte_kvargs_free(kvlist);
+		return 1;
+	}
+
+	ret = rte_kvargs_process(kvlist, IFCVF_VDPA_MODE, &open_int,
+			&vdpa_mode);
+	if (ret < 0 || vdpa_mode == 0) {
+		rte_kvargs_free(kvlist);
+		return 1;
+	}
+
 	list = rte_zmalloc("ifcvf", sizeof(*list), 0);
 	if (list == NULL)
 		goto error;
@@ -795,9 +840,11 @@ ifcvf_pci_probe(struct rte_pci_driver *pci_drv __rte_unused,
 	rte_atomic32_set(&internal->started, 1);
 	update_datapath(internal);
 
+	rte_kvargs_free(kvlist);
 	return 0;
 
 error:
+	rte_kvargs_free(kvlist);
 	rte_free(list);
 	rte_free(internal);
 	return -1;
-- 
2.15.1


* [dpdk-dev] [PATCH v5 07/10] net/ifc: add devarg for LM mode
  2018-12-18  8:01               ` [dpdk-dev] [PATCH v5 00/10] support SW assisted VDPA live migration Xiao Wang
                                   ` (5 preceding siblings ...)
  2018-12-18  8:02                 ` [dpdk-dev] [PATCH v5 06/10] net/ifc: detect if VDPA mode is specified Xiao Wang
@ 2018-12-18  8:02                 ` Xiao Wang
  2018-12-18 11:23                   ` Maxime Coquelin
  2018-12-18  8:02                 ` [dpdk-dev] [PATCH v5 08/10] net/ifc: use lib API for used ring logging Xiao Wang
                                   ` (2 subsequent siblings)
  9 siblings, 1 reply; 86+ messages in thread
From: Xiao Wang @ 2018-12-18  8:02 UTC (permalink / raw)
  To: tiwei.bie, maxime.coquelin
  Cc: alejandro.lucero, dev, zhihong.wang, xiaolong.ye, Xiao Wang

This patch series enables a new method for live migration, i.e. software
assisted live migration. This patch provides a device argument for the
user to choose the method.

When "sw-live-migration=1", driver/device will do live migration with a
relay thread dealing with dirty page logging. Without this parameter,
device will do dirty page logging and there's no relay thread consuming
CPU resource.

Signed-off-by: Xiao Wang <xiao.w.wang@intel.com>
---
 drivers/net/ifc/ifcvf_vdpa.c | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/drivers/net/ifc/ifcvf_vdpa.c b/drivers/net/ifc/ifcvf_vdpa.c
index c0e50354a..2f73d3c7c 100644
--- a/drivers/net/ifc/ifcvf_vdpa.c
+++ b/drivers/net/ifc/ifcvf_vdpa.c
@@ -8,6 +8,7 @@
 #include <sys/ioctl.h>
 #include <sys/epoll.h>
 #include <linux/virtio_net.h>
+#include <stdbool.h>
 
 #include <rte_malloc.h>
 #include <rte_memory.h>
@@ -31,9 +32,11 @@
 #endif
 
 #define IFCVF_VDPA_MODE		"vdpa"
+#define IFCVF_SW_FALLBACK_LM	"sw-live-migration"
 
 static const char * const ifcvf_valid_arguments[] = {
 	IFCVF_VDPA_MODE,
+	IFCVF_SW_FALLBACK_LM,
 	NULL
 };
 
@@ -56,6 +59,7 @@ struct ifcvf_internal {
 	rte_atomic32_t dev_attached;
 	rte_atomic32_t running;
 	rte_spinlock_t lock;
+	bool sw_lm;
 };
 
 struct internal_list {
@@ -767,6 +771,7 @@ ifcvf_pci_probe(struct rte_pci_driver *pci_drv __rte_unused,
 	struct ifcvf_internal *internal = NULL;
 	struct internal_list *list = NULL;
 	int vdpa_mode = 0;
+	int sw_fallback_lm = 0;
 	struct rte_kvargs *kvlist = NULL;
 	int ret = 0;
 
@@ -826,6 +831,14 @@ ifcvf_pci_probe(struct rte_pci_driver *pci_drv __rte_unused,
 	internal->dev_addr.type = PCI_ADDR;
 	list->internal = internal;
 
+	if (rte_kvargs_count(kvlist, IFCVF_SW_FALLBACK_LM)) {
+		ret = rte_kvargs_process(kvlist, IFCVF_SW_FALLBACK_LM,
+				&open_int, &sw_fallback_lm);
+		if (ret < 0)
+			goto error;
+	}
+	internal->sw_lm = sw_fallback_lm;
+
 	internal->did = rte_vdpa_register_device(&internal->dev_addr,
 				&ifcvf_ops);
 	if (internal->did < 0) {
-- 
2.15.1


* [dpdk-dev] [PATCH v5 08/10] net/ifc: use lib API for used ring logging
  2018-12-18  8:01               ` [dpdk-dev] [PATCH v5 00/10] support SW assisted VDPA live migration Xiao Wang
                                   ` (6 preceding siblings ...)
  2018-12-18  8:02                 ` [dpdk-dev] [PATCH v5 07/10] net/ifc: add devarg for LM mode Xiao Wang
@ 2018-12-18  8:02                 ` Xiao Wang
  2018-12-18  8:02                 ` [dpdk-dev] [PATCH v5 09/10] net/ifc: support SW assisted VDPA live migration Xiao Wang
  2018-12-18  8:02                 ` [dpdk-dev] [PATCH v5 10/10] doc: update ifc NIC document Xiao Wang
  9 siblings, 0 replies; 86+ messages in thread
From: Xiao Wang @ 2018-12-18  8:02 UTC (permalink / raw)
  To: tiwei.bie, maxime.coquelin
  Cc: alejandro.lucero, dev, zhihong.wang, xiaolong.ye, Xiao Wang

The vhost lib already provides a helper for used ring logging, so the
driver can use it and drop its own implementation.

Signed-off-by: Xiao Wang <xiao.w.wang@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
---
 drivers/net/ifc/ifcvf_vdpa.c | 27 ++++++++-------------------
 1 file changed, 8 insertions(+), 19 deletions(-)

diff --git a/drivers/net/ifc/ifcvf_vdpa.c b/drivers/net/ifc/ifcvf_vdpa.c
index 2f73d3c7c..4edba3e5f 100644
--- a/drivers/net/ifc/ifcvf_vdpa.c
+++ b/drivers/net/ifc/ifcvf_vdpa.c
@@ -31,6 +31,9 @@
 #define PAGE_SIZE 4096
 #endif
 
+#define IFCVF_USED_RING_LEN(size) \
+	((size) * sizeof(struct vring_used_elem) + sizeof(uint16_t) * 3)
+
 #define IFCVF_VDPA_MODE		"vdpa"
 #define IFCVF_SW_FALLBACK_LM	"sw-live-migration"
 
@@ -288,21 +291,6 @@ vdpa_ifcvf_start(struct ifcvf_internal *internal)
 	return ifcvf_start_hw(&internal->hw);
 }
 
-static void
-ifcvf_used_ring_log(struct ifcvf_hw *hw, uint32_t queue, uint8_t *log_buf)
-{
-	uint32_t i, size;
-	uint64_t pfn;
-
-	pfn = hw->vring[queue].used / PAGE_SIZE;
-	size = hw->vring[queue].size * sizeof(struct vring_used_elem) +
-			sizeof(uint16_t) * 3;
-
-	for (i = 0; i <= size / PAGE_SIZE; i++)
-		__sync_fetch_and_or_8(&log_buf[(pfn + i) / 8],
-				1 << ((pfn + i) % 8));
-}
-
 static void
 vdpa_ifcvf_stop(struct ifcvf_internal *internal)
 {
@@ -311,7 +299,7 @@ vdpa_ifcvf_stop(struct ifcvf_internal *internal)
 	int vid;
 	uint64_t features;
 	uint64_t log_base, log_size;
-	uint8_t *log_buf;
+	uint64_t len;
 
 	vid = internal->vid;
 	ifcvf_stop_hw(hw);
@@ -330,9 +318,10 @@ vdpa_ifcvf_stop(struct ifcvf_internal *internal)
 		 * IFCVF marks dirty memory pages for only packet buffer,
 		 * SW helps to mark the used ring as dirty after device stops.
 		 */
-		log_buf = (uint8_t *)(uintptr_t)log_base;
-		for (i = 0; i < hw->nr_vring; i++)
-			ifcvf_used_ring_log(hw, i, log_buf);
+		for (i = 0; i < hw->nr_vring; i++) {
+			len = IFCVF_USED_RING_LEN(hw->vring[i].size);
+			rte_vhost_log_used_vring(vid, i, 0, len);
+		}
 	}
 }
 
-- 
2.15.1


* [dpdk-dev] [PATCH v5 09/10] net/ifc: support SW assisted VDPA live migration
  2018-12-18  8:01               ` [dpdk-dev] [PATCH v5 00/10] support SW assisted VDPA live migration Xiao Wang
                                   ` (7 preceding siblings ...)
  2018-12-18  8:02                 ` [dpdk-dev] [PATCH v5 08/10] net/ifc: use lib API for used ring logging Xiao Wang
@ 2018-12-18  8:02                 ` Xiao Wang
  2018-12-18 11:33                   ` Maxime Coquelin
  2018-12-18  8:02                 ` [dpdk-dev] [PATCH v5 10/10] doc: update ifc NIC document Xiao Wang
  9 siblings, 1 reply; 86+ messages in thread
From: Xiao Wang @ 2018-12-18  8:02 UTC (permalink / raw)
  To: tiwei.bie, maxime.coquelin
  Cc: alejandro.lucero, dev, zhihong.wang, xiaolong.ye, Xiao Wang

In SW assisted live migration mode, driver will stop the device and
setup a mediated virtio ring to relay the communication between the
virtio driver and the VDPA device.

This data path intervention will allow SW to help on guest dirty page
logging for live migration.

This SW fallback is an event-driven relay thread, so when the network
throughput is low, it takes little CPU resource, but when the
throughput goes up, the relay thread's CPU usage goes up accordingly.

User needs to take all the factors including CPU usage, guest perf
degradation, etc. into consideration when selecting the live migration
support mode.
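
The relay thread waits on two kinds of eventfds per queue (guest kicks
and device interrupts) and tags each one in the 64-bit epoll user data.
A minimal sketch of such an encoding follows. The toy_ev_* names are
hypothetical, and the decoding uses explicit shifts and masks instead of
reading the u32 member of the epoll data union.

```c
#include <assert.h>
#include <stdint.h>

/* Event kinds multiplexed on one epoll instance. */
enum toy_ev_kind { TOY_EV_KICK = 0, TOY_EV_INTR = 1 };

/* Pack fd (high 32 bits), queue id (bits 1..16) and kind (bit 0). */
static uint64_t
toy_ev_pack(int fd, uint16_t qid, enum toy_ev_kind kind)
{
	assert(fd >= 0);
	return ((uint64_t)(uint32_t)fd << 32) | ((uint64_t)qid << 1) |
		(uint64_t)kind;
}

/* Recover the file descriptor to read from. */
static int
toy_ev_fd(uint64_t data)
{
	return (int)(data >> 32);
}

/* Recover the queue id the event belongs to. */
static uint16_t
toy_ev_qid(uint64_t data)
{
	return (uint16_t)((uint32_t)data >> 1);
}

/* Kick events relay the avail ring; interrupts relay the used ring. */
static enum toy_ev_kind
toy_ev_kind_of(uint64_t data)
{
	return (enum toy_ev_kind)(data & 1);
}
```

Keeping the kind in bit 0 lets the event loop dispatch to the avail or
used relay with a single test after draining the eventfd.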

Signed-off-by: Xiao Wang <xiao.w.wang@intel.com>
---
 drivers/net/ifc/base/ifcvf.h |   1 +
 drivers/net/ifc/ifcvf_vdpa.c | 346 ++++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 344 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ifc/base/ifcvf.h b/drivers/net/ifc/base/ifcvf.h
index c15c69107..9be2770fe 100644
--- a/drivers/net/ifc/base/ifcvf.h
+++ b/drivers/net/ifc/base/ifcvf.h
@@ -50,6 +50,7 @@
 #define IFCVF_LM_ENABLE_VF		0x1
 #define IFCVF_LM_ENABLE_PF		0x3
 #define IFCVF_LOG_BASE			0x100000000000
+#define IFCVF_MEDIATED_VRING		0x200000000000
 
 #define IFCVF_32_BIT_MASK		0xffffffff
 
diff --git a/drivers/net/ifc/ifcvf_vdpa.c b/drivers/net/ifc/ifcvf_vdpa.c
index 4edba3e5f..972033b86 100644
--- a/drivers/net/ifc/ifcvf_vdpa.c
+++ b/drivers/net/ifc/ifcvf_vdpa.c
@@ -63,6 +63,9 @@ struct ifcvf_internal {
 	rte_atomic32_t running;
 	rte_spinlock_t lock;
 	bool sw_lm;
+	bool sw_fallback_running;
+	/* mediated vring for sw fallback */
+	struct vring m_vring[IFCVF_MAX_QUEUES * 2];
 };
 
 struct internal_list {
@@ -308,6 +311,9 @@ vdpa_ifcvf_stop(struct ifcvf_internal *internal)
 		rte_vhost_set_vring_base(vid, i, hw->vring[i].last_avail_idx,
 				hw->vring[i].last_used_idx);
 
+	if (internal->sw_lm)
+		return;
+
 	rte_vhost_get_negotiated_features(vid, &features);
 	if (RTE_VHOST_NEED_LOG(features)) {
 		ifcvf_disable_logging(hw);
@@ -539,6 +545,318 @@ update_datapath(struct ifcvf_internal *internal)
 	return ret;
 }
 
+static int
+m_ifcvf_start(struct ifcvf_internal *internal)
+{
+	struct ifcvf_hw *hw = &internal->hw;
+	uint32_t i, nr_vring;
+	int vid, ret;
+	struct rte_vhost_vring vq;
+	void *vring_buf;
+	uint64_t m_vring_iova = IFCVF_MEDIATED_VRING;
+	uint64_t size;
+	uint64_t gpa;
+
+	vid = internal->vid;
+	nr_vring = rte_vhost_get_vring_num(vid);
+	rte_vhost_get_negotiated_features(vid, &hw->req_features);
+
+	for (i = 0; i < nr_vring; i++) {
+		rte_vhost_get_vhost_vring(vid, i, &vq);
+
+		size = RTE_ALIGN_CEIL(vring_size(vq.size, PAGE_SIZE),
+				PAGE_SIZE);
+		vring_buf = rte_zmalloc("ifcvf", size, PAGE_SIZE);
+		vring_init(&internal->m_vring[i], vq.size, vring_buf,
+				PAGE_SIZE);
+
+		ret = rte_vfio_container_dma_map(internal->vfio_container_fd,
+			(uint64_t)(uintptr_t)vring_buf, m_vring_iova, size);
+		if (ret < 0) {
+			DRV_LOG(ERR, "mediated vring DMA map failed.");
+			goto error;
+		}
+
+		gpa = hva_to_gpa(vid, (uint64_t)(uintptr_t)vq.desc);
+		if (gpa == 0) {
+			DRV_LOG(ERR, "Fail to get GPA for descriptor ring.");
+			return -1;
+		}
+		hw->vring[i].desc = gpa;
+
+		hw->vring[i].avail = m_vring_iova +
+			(char *)internal->m_vring[i].avail -
+			(char *)internal->m_vring[i].desc;
+
+		hw->vring[i].used = m_vring_iova +
+			(char *)internal->m_vring[i].used -
+			(char *)internal->m_vring[i].desc;
+
+		hw->vring[i].size = vq.size;
+
+		rte_vhost_get_vring_base(vid, i, &hw->vring[i].last_avail_idx,
+				&hw->vring[i].last_used_idx);
+
+		m_vring_iova += size;
+	}
+	hw->nr_vring = nr_vring;
+
+	return ifcvf_start_hw(&internal->hw);
+
+error:
+	for (i = 0; i < nr_vring; i++)
+		if (internal->m_vring[i].desc)
+			rte_free(internal->m_vring[i].desc);
+
+	return -1;
+}
+
+static int
+m_ifcvf_stop(struct ifcvf_internal *internal)
+{
+	int vid;
+	uint32_t i;
+	struct rte_vhost_vring vq;
+	struct ifcvf_hw *hw = &internal->hw;
+	uint64_t m_vring_iova = IFCVF_MEDIATED_VRING;
+	uint64_t size, len;
+
+	vid = internal->vid;
+	ifcvf_stop_hw(hw);
+
+	for (i = 0; i < hw->nr_vring; i++) {
+		rte_vhost_get_vhost_vring(vid, i, &vq);
+		len = IFCVF_USED_RING_LEN(vq.size);
+		rte_vhost_log_used_vring(vid, i, 0, len);
+
+		size = RTE_ALIGN_CEIL(vring_size(vq.size, PAGE_SIZE),
+				PAGE_SIZE);
+		rte_vfio_container_dma_unmap(internal->vfio_container_fd,
+			(uint64_t)(uintptr_t)internal->m_vring[i].desc,
+			m_vring_iova, size);
+
+		rte_vhost_set_vring_base(vid, i, hw->vring[i].last_avail_idx,
+				hw->vring[i].last_used_idx);
+		rte_free(internal->m_vring[i].desc);
+		m_vring_iova += size;
+	}
+
+	return 0;
+}
+
+static int
+m_enable_vfio_intr(struct ifcvf_internal *internal)
+{
+	uint32_t nr_vring;
+	struct rte_intr_handle *intr_handle = &internal->pdev->intr_handle;
+	int ret;
+
+	nr_vring = rte_vhost_get_vring_num(internal->vid);
+
+	ret = rte_intr_efd_enable(intr_handle, nr_vring);
+	if (ret)
+		return -1;
+
+	ret = rte_intr_enable(intr_handle);
+	if (ret)
+		return -1;
+
+	return 0;
+}
+
+static void
+m_disable_vfio_intr(struct ifcvf_internal *internal)
+{
+	struct rte_intr_handle *intr_handle = &internal->pdev->intr_handle;
+
+	rte_intr_efd_disable(intr_handle);
+	rte_intr_disable(intr_handle);
+}
+
+static void
+update_avail_ring(struct ifcvf_internal *internal, uint16_t qid)
+{
+	rte_vdpa_relay_vring_avail(internal->vid, qid, &internal->m_vring[qid]);
+	ifcvf_notify_queue(&internal->hw, qid);
+}
+
+static void
+update_used_ring(struct ifcvf_internal *internal, uint16_t qid)
+{
+	rte_vdpa_relay_vring_used(internal->vid, qid, &internal->m_vring[qid]);
+	rte_vhost_vring_call(internal->vid, qid);
+}
+
+static void *
+vring_relay(void *arg)
+{
+	int i, vid, epfd, fd, nfds;
+	struct ifcvf_internal *internal = (struct ifcvf_internal *)arg;
+	struct rte_vhost_vring vring;
+	struct rte_intr_handle *intr_handle;
+	uint16_t qid, q_num;
+	struct epoll_event events[IFCVF_MAX_QUEUES * 4];
+	struct epoll_event ev;
+	int nbytes;
+	uint64_t buf;
+
+	vid = internal->vid;
+	q_num = rte_vhost_get_vring_num(vid);
+	/* prepare the mediated vring */
+	for (qid = 0; qid < q_num; qid++) {
+		rte_vhost_get_vring_base(vid, qid,
+				&internal->m_vring[qid].avail->idx,
+				&internal->m_vring[qid].used->idx);
+		rte_vdpa_relay_vring_avail(vid, qid, &internal->m_vring[qid]);
+	}
+
+	/* add notify fd and interrupt fd to epoll */
+	epfd = epoll_create(IFCVF_MAX_QUEUES * 2);
+	if (epfd < 0) {
+		DRV_LOG(ERR, "failed to create epoll instance.");
+		return NULL;
+	}
+	internal->epfd = epfd;
+
+	for (qid = 0; qid < q_num; qid++) {
+		ev.events = EPOLLIN | EPOLLPRI;
+		rte_vhost_get_vhost_vring(vid, qid, &vring);
+		ev.data.u64 = qid << 1 | (uint64_t)vring.kickfd << 32;
+		if (epoll_ctl(epfd, EPOLL_CTL_ADD, vring.kickfd, &ev) < 0) {
+			DRV_LOG(ERR, "epoll add error: %s", strerror(errno));
+			return NULL;
+		}
+	}
+
+	intr_handle = &internal->pdev->intr_handle;
+	for (qid = 0; qid < q_num; qid++) {
+		ev.events = EPOLLIN | EPOLLPRI;
+		ev.data.u64 = 1 | qid << 1 |
+			(uint64_t)intr_handle->efds[qid] << 32;
+		if (epoll_ctl(epfd, EPOLL_CTL_ADD, intr_handle->efds[qid], &ev)
+				< 0) {
+			DRV_LOG(ERR, "epoll add error: %s", strerror(errno));
+			return NULL;
+		}
+	}
+
+	/* start relay with a first kick */
+	for (qid = 0; qid < q_num; qid++)
+		ifcvf_notify_queue(&internal->hw, qid);
+
+	/* listen to the events and react accordingly */
+	for (;;) {
+		nfds = epoll_wait(epfd, events, q_num * 2, -1);
+		if (nfds < 0) {
+			if (errno == EINTR)
+				continue;
+			DRV_LOG(ERR, "epoll_wait return fail");
+			return NULL;
+		}
+
+		for (i = 0; i < nfds; i++) {
+			fd = (uint32_t)(events[i].data.u64 >> 32);
+			do {
+				nbytes = read(fd, &buf, 8);
+				if (nbytes < 0) {
+					if (errno == EINTR ||
+					    errno == EWOULDBLOCK ||
+					    errno == EAGAIN)
+						continue;
+					DRV_LOG(INFO, "Error reading "
+						"kickfd: %s",
+						strerror(errno));
+				}
+				break;
+			} while (1);
+
+			qid = events[i].data.u32 >> 1;
+
+			if (events[i].data.u32 & 1)
+				update_used_ring(internal, qid);
+			else
+				update_avail_ring(internal, qid);
+		}
+	}
+
+	return NULL;
+}
+
+static int
+setup_vring_relay(struct ifcvf_internal *internal)
+{
+	int ret;
+
+	ret = pthread_create(&internal->tid, NULL, vring_relay,
+			(void *)internal);
+	if (ret) {
+		DRV_LOG(ERR, "failed to create ring relay pthread.");
+		return -1;
+	}
+	return 0;
+}
+
+static int
+unset_vring_relay(struct ifcvf_internal *internal)
+{
+	void *status;
+
+	if (internal->tid) {
+		pthread_cancel(internal->tid);
+		pthread_join(internal->tid, &status);
+	}
+	internal->tid = 0;
+
+	if (internal->epfd >= 0)
+		close(internal->epfd);
+	internal->epfd = -1;
+
+	return 0;
+}
+
+static int
+ifcvf_sw_fallback_switchover(struct ifcvf_internal *internal)
+{
+	int ret;
+
+	/* stop the direct IO data path */
+	unset_notify_relay(internal);
+	vdpa_ifcvf_stop(internal);
+	vdpa_disable_vfio_intr(internal);
+
+	ret = rte_vhost_host_notifier_ctrl(internal->vid, false);
+	if (ret && ret != -ENOTSUP)
+		goto error;
+
+	/* set up interrupt for interrupt relay */
+	ret = m_enable_vfio_intr(internal);
+	if (ret)
+		goto unmap;
+
+	/* config the VF */
+	ret = m_ifcvf_start(internal);
+	if (ret)
+		goto unset_intr;
+
+	/* set up vring relay thread */
+	ret = setup_vring_relay(internal);
+	if (ret)
+		goto stop_vf;
+
+	internal->sw_fallback_running = true;
+
+	return 0;
+
+stop_vf:
+	m_ifcvf_stop(internal);
+unset_intr:
+	m_disable_vfio_intr(internal);
+unmap:
+	ifcvf_dma_map(internal, 0);
+error:
+	return -1;
+}
+
 static int
 ifcvf_dev_config(int vid)
 {
@@ -579,8 +897,25 @@ ifcvf_dev_close(int vid)
 	}
 
 	internal = list->internal;
-	rte_atomic32_set(&internal->dev_attached, 0);
-	update_datapath(internal);
+
+	if (internal->sw_fallback_running) {
+		/* unset ring relay */
+		unset_vring_relay(internal);
+
+		/* reset VF */
+		m_ifcvf_stop(internal);
+
+		/* remove interrupt setting */
+		m_disable_vfio_intr(internal);
+
+		/* unset DMA map for guest memory */
+		ifcvf_dma_map(internal, 0);
+
+		internal->sw_fallback_running = false;
+	} else {
+		rte_atomic32_set(&internal->dev_attached, 0);
+		update_datapath(internal);
+	}
 
 	return 0;
 }
@@ -604,7 +939,12 @@ ifcvf_set_features(int vid)
 	internal = list->internal;
 	rte_vhost_get_negotiated_features(vid, &features);
 
-	if (RTE_VHOST_NEED_LOG(features)) {
+	if (!RTE_VHOST_NEED_LOG(features))
+		return 0;
+
+	if (internal->sw_lm) {
+		ifcvf_sw_fallback_switchover(internal);
+	} else {
 		rte_vhost_get_log_base(vid, &log_base, &log_size);
 		rte_vfio_container_dma_map(internal->vfio_container_fd,
 				log_base, IFCVF_LOG_BASE, log_size);
-- 
2.15.1

^ permalink raw reply	[flat|nested] 86+ messages in thread
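
The relay loop in the patch above stores three things in each epoll event's 64-bit user data: the fd in the upper 32 bits, the queue id shifted left by one, and bit 0 to tell a device interrupt (1) from a guest kick (0). A minimal sketch of that layout follows. The helper names are hypothetical, since the patch open-codes these expressions, and the sketch decodes with shifts and masks on the 64-bit value instead of reading the `u32` union member as the patch does.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers; the patch open-codes the same bit layout. */

/* Pack an fd, a queue id and a 1-bit source flag into one cookie. */
static inline uint64_t
relay_cookie_pack(int fd, uint16_t qid, int is_intr)
{
	assert(fd >= 0);
	return ((uint64_t)(uint32_t)fd << 32) |
	       ((uint64_t)qid << 1) |
	       (is_intr ? 1u : 0u);
}

/* Upper 32 bits: the eventfd to drain. */
static inline int
relay_cookie_fd(uint64_t cookie)
{
	return (int)(uint32_t)(cookie >> 32);
}

/* Bits 1..16 of the lower half: the queue id. */
static inline uint16_t
relay_cookie_qid(uint64_t cookie)
{
	return (uint16_t)((cookie & 0xffffffffu) >> 1);
}

/* Bit 0: 1 for a device interrupt, 0 for a guest kick. */
static inline int
relay_cookie_is_intr(uint64_t cookie)
{
	return (int)(cookie & 1);
}
```

Decoding from the 64-bit value keeps the result independent of host byte order, whereas reading `data.u32` yields the low 32 bits only on little-endian hosts.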

* [dpdk-dev] [PATCH v5 10/10] doc: update ifc NIC document
  2018-12-18  8:01               ` [dpdk-dev] [PATCH v5 00/10] support SW assisted VDPA live migration Xiao Wang
                                   ` (8 preceding siblings ...)
  2018-12-18  8:02                 ` [dpdk-dev] [PATCH v5 09/10] net/ifc: support SW assisted VDPA live migration Xiao Wang
@ 2018-12-18  8:02                 ` Xiao Wang
  2018-12-18 11:35                   ` Maxime Coquelin
  9 siblings, 1 reply; 86+ messages in thread
From: Xiao Wang @ 2018-12-18  8:02 UTC (permalink / raw)
  To: tiwei.bie, maxime.coquelin
  Cc: alejandro.lucero, dev, zhihong.wang, xiaolong.ye, Xiao Wang

Add the SW assisted VDPA live migration feature into NIC doc.

Signed-off-by: Xiao Wang <xiao.w.wang@intel.com>
---
 doc/guides/nics/ifc.rst                | 12 +++++++++++-
 doc/guides/rel_notes/release_19_02.rst |  6 ++++++
 2 files changed, 17 insertions(+), 1 deletion(-)

diff --git a/doc/guides/nics/ifc.rst b/doc/guides/nics/ifc.rst
index 48f9adf1d..bdf7b4e4a 100644
--- a/doc/guides/nics/ifc.rst
+++ b/doc/guides/nics/ifc.rst
@@ -31,7 +31,8 @@ IFCVF's vendor ID and device ID are same as that of virtio net pci device,
 with its specific subsystem vendor ID and device ID. To let the device be
 probed by IFCVF driver, adding "vdpa=1" parameter helps to specify that this
 device is to be used in vDPA mode, rather than polling mode, virtio pmd will
-skip when it detects this message.
+skip when it detects this message. If this parameter is not specified, the
+device will not be used as a vDPA device and will be driven by virtio pmd.
 
 Different VF devices serve different virtio frontends which are in different
 VMs, so each VF needs to have its own DMA address translation service. During
@@ -39,6 +40,14 @@ the driver probe a new container is created for this device, with this
 container vDPA driver can program DMA remapping table with the VM's memory
 region information.
 
+The device argument "sw-live-migration=1" will configure the driver into SW
+assisted live migration mode. In this mode, the driver sets up a SW relay
+thread when live migration (LM) happens; this thread helps the device log
+dirty pages. This mode therefore does not require HW to implement a dirty
+page logging function block, but it consumes some CPU resource, depending on
+the network throughput. If this parameter is not specified, the driver relies
+on the device's logging capability.
+
 Key IFCVF vDPA driver ops
 ~~~~~~~~~~~~~~~~~~~~~~~~~
 
@@ -70,6 +79,7 @@ Features
 Features of the IFCVF driver are:
 
 - Compatibility with virtio 0.95 and 1.0.
+- SW assisted vDPA live migration.
 
 
 Prerequisites
diff --git a/doc/guides/rel_notes/release_19_02.rst b/doc/guides/rel_notes/release_19_02.rst
index e86ef9511..131216e19 100644
--- a/doc/guides/rel_notes/release_19_02.rst
+++ b/doc/guides/rel_notes/release_19_02.rst
@@ -60,6 +60,12 @@ New Features
   * Added the handler to get firmware version string.
   * Added support for multicast filtering.
 
+* **Added support for SW-assisted VDPA live migration.**
+
+  This SW-assisted VDPA live migration facility helps VDPA devices without
+  logging capability to perform live migration: a mediated SW relay helps
+  such devices track dirty pages caused by DMA. The IFC driver has enabled
+  this SW-assisted live migration mode.
 
 Removed Items
 -------------
-- 
2.15.1

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [dpdk-dev] [PATCH v4 03/10] vhost: provide helpers for virtio ring relay
  2018-12-17 19:00                       ` Maxime Coquelin
@ 2018-12-18  8:27                         ` Wang, Xiao W
  2018-12-18  8:44                           ` Thomas Monjalon
  0 siblings, 1 reply; 86+ messages in thread
From: Wang, Xiao W @ 2018-12-18  8:27 UTC (permalink / raw)
  To: Maxime Coquelin, thomas
  Cc: alejandro.lucero, dev, Wang, Zhihong, Ye, Xiaolong, Bie, Tiwei

Hi,

> -----Original Message-----
> From: Maxime Coquelin [mailto:maxime.coquelin@redhat.com]
> Sent: Tuesday, December 18, 2018 3:01 AM
> To: Wang, Xiao W <xiao.w.wang@intel.com>; Bie, Tiwei <tiwei.bie@intel.com>
> Cc: alejandro.lucero@netronome.com; dev@dpdk.org; Wang, Zhihong
> <zhihong.wang@intel.com>; Ye, Xiaolong <xiaolong.ye@intel.com>
> Subject: Re: [PATCH v4 03/10] vhost: provide helpers for virtio ring relay
> 
> 
> 
> On 12/17/18 3:41 PM, Wang, Xiao W wrote:
> > Thanks for the confirmation.
> 
> Please note that CI reports a checkpatch issue:
> http://patches.dpdk.org/patch/48935/

+ Thomas.

I've tried the checkpatch.pl from CentOS 7.4 & 7.5 and also from the latest kernel, and I get no warning in my
self-check with dpdk/devtools/checkpatches.sh.
I don't know which checkpatch.pl the CI uses; it depends on the DPDK_CHECKPATCH_PATH environment
variable setting. In the v5 patch, I added the __rte_experimental flag for the new API even in the vdpa.c file,
but CI still reports this warning.

BRs,
Xiao

> 
> Thanks,
> Maxime
> 
> > BRs,
> > Xiao
> >
> >> -----Original Message-----
> >> From: Maxime Coquelin [mailto:maxime.coquelin@redhat.com]
> >> Sent: Monday, December 17, 2018 7:03 PM
> >> To: Wang, Xiao W <xiao.w.wang@intel.com>; Bie, Tiwei
> <tiwei.bie@intel.com>
> >> Cc: alejandro.lucero@netronome.com; dev@dpdk.org; Wang, Zhihong
> >> <zhihong.wang@intel.com>; Ye, Xiaolong <xiaolong.ye@intel.com>
> >> Subject: Re: [PATCH v4 03/10] vhost: provide helpers for virtio ring relay
> >>
> >> Hi Xiao,
> >>
> >> On 12/17/18 9:51 AM, Wang, Xiao W wrote:
> >>> Hi Maxime,
> >>>
> >>>> -----Original Message-----
> >>>> From: Maxime Coquelin [mailto:maxime.coquelin@redhat.com]
> >>>> Sent: Sunday, December 16, 2018 1:11 AM
> >>>> To: Wang, Xiao W <xiao.w.wang@intel.com>; Bie, Tiwei
> >> <tiwei.bie@intel.com>
> >>>> Cc: alejandro.lucero@netronome.com; dev@dpdk.org; Wang, Zhihong
> >>>> <zhihong.wang@intel.com>; Ye, Xiaolong <xiaolong.ye@intel.com>
> >>>> Subject: Re: [PATCH v4 03/10] vhost: provide helpers for virtio ring relay
> >>>>
> >>>>
> >>>>
> >>>> On 12/14/18 10:16 PM, Xiao Wang wrote:
> >>>>> This patch provides two helpers for vdpa device driver to perform a
> >>>>> relay between the guest virtio ring and a mediate virtio ring.
> >>>>
> >>>> s/mediate/mediated/ ?
> >>>> I'm not 100% sure, but if it is mediated, please change everywhere else
> >>>> in the patch.
> >>>
> >>> "mediate" can also be used as an adjective, so "mediate" is OK here.
> >>
> >> I got the confirmation from a native speaker that mediate sounds wrong
> >> in this context, and mediated should be used.
> >>
> >>>>
> >>>>>
> >>>>> The available ring relay will synchronize the available entries, and
> >>>>> helps to do desc validity checking.
> >>>>
> >>>> s/helps/help/
> >>>
> >>> Yes, will update.
> >>>
> >>>>
> >>>>>
> >>>>> The used ring relay will synchronize the used entries from mediate ring
> >>>>> to guest ring, and helps to do dirty page logging for live migration.
> >>>>
> >>>> s/helps/help/
> >>>
> >>> Will update.
> >>>
> >>> Thanks for the comments,
> >>> Xiao
> >>>
> >>>>
> >>>>>
> >>>>> The next patch will leverage these two helpers.
> >>>>>
> >>>>> Signed-off-by: Xiao Wang <xiao.w.wang@intel.com>
> >>>>> ---
> >>>>>     lib/librte_vhost/rte_vdpa.h            |  39 +++++++
> >>>>>     lib/librte_vhost/rte_vhost_version.map |   2 +
> >>>>>     lib/librte_vhost/vdpa.c                | 194
> >>>> +++++++++++++++++++++++++++++++++
> >>>>>     lib/librte_vhost/vhost.h               |  40 +++++++
> >>>>>     lib/librte_vhost/virtio_net.c          |  39 -------
> >>>>>     5 files changed, 275 insertions(+), 39 deletions(-)
> >>>>>
> >>>>
> >>>>
> >>>> Apart from that:
> >>>> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
> >>>>
> >>>> Thanks,
> >>>> Maxime

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [dpdk-dev] [PATCH v4 03/10] vhost: provide helpers for virtio ring relay
  2018-12-18  8:27                         ` Wang, Xiao W
@ 2018-12-18  8:44                           ` Thomas Monjalon
  0 siblings, 0 replies; 86+ messages in thread
From: Thomas Monjalon @ 2018-12-18  8:44 UTC (permalink / raw)
  To: Wang, Xiao W
  Cc: Maxime Coquelin, alejandro.lucero, dev, Wang, Zhihong, Ye,
	Xiaolong, Bie, Tiwei

18/12/2018 09:27, Wang, Xiao W:
> From: Maxime Coquelin [mailto:maxime.coquelin@redhat.com]
> > On 12/17/18 3:41 PM, Wang, Xiao W wrote:
> > > Thanks for the confirmation.
> > 
> > Please note that CI reports a checkpatch issue:
> > http://patches.dpdk.org/patch/48935/
> 
> + Thomas.
> 
> I've tried the checkpatch.pl from CentOS 7.4 & 7.5 and also from the latest kernel, and I get no warning in my
> self-check with dpdk/devtools/checkpatches.sh.
> I don't know which checkpatch.pl the CI uses; it depends on the DPDK_CHECKPATCH_PATH environment
> variable setting. In the v5 patch, I added the __rte_experimental flag for the new API even in the vdpa.c file,
> but CI still reports this warning.

It needed to be updated on dpdk.org.
It should be fixed now.

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [dpdk-dev] [PATCH v5 07/10] net/ifc: add devarg for LM mode
  2018-12-18  8:02                 ` [dpdk-dev] [PATCH v5 07/10] net/ifc: add devarg for LM mode Xiao Wang
@ 2018-12-18 11:23                   ` Maxime Coquelin
  0 siblings, 0 replies; 86+ messages in thread
From: Maxime Coquelin @ 2018-12-18 11:23 UTC (permalink / raw)
  To: Xiao Wang, tiwei.bie; +Cc: alejandro.lucero, dev, zhihong.wang, xiaolong.ye



On 12/18/18 9:02 AM, Xiao Wang wrote:
> This patch series enables a new method for live migration, i.e. software
> assisted live migration. This patch provides a device argument for user
> to choose the method.
> 
> When "sw-live-migration=1", driver/device will do live migration with a
> relay thread dealing with dirty page logging. Without this parameter,
> device will do dirty page logging and there's no relay thread consuming
> CPU resource.
> 
> Signed-off-by: Xiao Wang <xiao.w.wang@intel.com>
> ---
>   drivers/net/ifc/ifcvf_vdpa.c | 13 +++++++++++++
>   1 file changed, 13 insertions(+)
> 

Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>

Thanks,
Maxime

^ permalink raw reply	[flat|nested] 86+ messages in thread
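
The commit message quoted above describes a boolean devarg, "sw-live-migration=1", that selects the SW relay mode. The series parses it through the kvargs library (see the v3 changelog); the sketch below is only a plain-C stand-in that shows the intended semantics for a comma-separated "key=value" list. The function name `devarg_flag_enabled` is hypothetical and is not part of the driver.

```c
#include <assert.h>
#include <string.h>

/*
 * Return 1 if the comma-separated "key=value" list in args contains
 * key with the value "1", and 0 otherwise. Hypothetical stand-in for
 * the kvargs-based parsing done by the driver.
 */
static int
devarg_flag_enabled(const char *args, const char *key)
{
	size_t klen;
	const char *p = args;

	assert(key != NULL);
	if (args == NULL)
		return 0;
	klen = strlen(key);

	while (*p != '\0') {
		const char *end = strchr(p, ',');
		size_t len = end ? (size_t)(end - p) : strlen(p);

		/* The token must be exactly "<key>=1". */
		if (len == klen + 2 && strncmp(p, key, klen) == 0 &&
		    p[klen] == '=' && p[klen + 1] == '1')
			return 1;

		if (end == NULL)
			break;
		p = end + 1;
	}
	return 0;
}
```

With this reading, "vdpa=1,sw-live-migration=1" selects the relay mode, while "vdpa=1" alone leaves dirty page logging to the device.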

* Re: [dpdk-dev] [PATCH v5 09/10] net/ifc: support SW assisted VDPA live migration
  2018-12-18  8:02                 ` [dpdk-dev] [PATCH v5 09/10] net/ifc: support SW assisted VDPA live migration Xiao Wang
@ 2018-12-18 11:33                   ` Maxime Coquelin
  0 siblings, 0 replies; 86+ messages in thread
From: Maxime Coquelin @ 2018-12-18 11:33 UTC (permalink / raw)
  To: Xiao Wang, tiwei.bie; +Cc: alejandro.lucero, dev, zhihong.wang, xiaolong.ye



On 12/18/18 9:02 AM, Xiao Wang wrote:
> In SW assisted live migration mode, the driver will stop the device and
> set up a mediated virtio ring to relay the communication between the
> virtio driver and the VDPA device.
> 
> This data path intervention will allow SW to help with guest dirty page
> logging for live migration.
> 
> This SW fallback is an event-driven relay thread, so when the network
> throughput is low, this SW fallback will take little CPU resource, but
> when the throughput goes up, the relay thread's CPU usage will go up
> accordingly.
> 
> User needs to take all the factors including CPU usage, guest perf
> degradation, etc. into consideration when selecting the live migration
> support mode.
> 
> Signed-off-by: Xiao Wang <xiao.w.wang@intel.com>
> ---
>   drivers/net/ifc/base/ifcvf.h |   1 +
>   drivers/net/ifc/ifcvf_vdpa.c | 346 ++++++++++++++++++++++++++++++++++++++++++-
>   2 files changed, 344 insertions(+), 3 deletions(-)

Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>

Thanks,
Maxime

^ permalink raw reply	[flat|nested] 86+ messages in thread
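
The "event driven" behaviour described in the commit message above comes from blocking in epoll_wait() on eventfds (guest kick fds and device interrupt fds), so the thread sleeps while the rings are idle. A minimal Linux-only sketch of one wait-and-drain step is shown below; `wait_and_drain` is a hypothetical helper, and it assumes the fd was registered with `ev.data.fd` set, which differs from the packed cookie the patch uses.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <sys/epoll.h>
#include <sys/eventfd.h>
#include <unistd.h>

/*
 * Wait up to timeout_ms for one registered eventfd to fire, then drain it.
 * Returns the eventfd counter value that was read, 0 on timeout, or -1 on
 * error. Hypothetical helper, not taken from the patch.
 */
static int64_t
wait_and_drain(int epfd, int timeout_ms)
{
	struct epoll_event ev;
	uint64_t cnt;
	int n;

	assert(epfd >= 0);

	/* Sleep until an event arrives; retry if a signal interrupts us. */
	do {
		n = epoll_wait(epfd, &ev, 1, timeout_ms);
	} while (n < 0 && errno == EINTR);
	if (n <= 0)
		return n;	/* 0: timeout, -1: error */

	/* Reading an eventfd returns its counter and resets it to zero. */
	if (read(ev.data.fd, &cnt, sizeof(cnt)) != (ssize_t)sizeof(cnt))
		return -1;
	return (int64_t)cnt;
}
```

A relay thread built this way would call the helper with a timeout of -1 and dispatch on the returned event, so CPU use follows the kick and interrupt rate.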

* Re: [dpdk-dev] [PATCH v5 10/10] doc: update ifc NIC document
  2018-12-18  8:02                 ` [dpdk-dev] [PATCH v5 10/10] doc: update ifc NIC document Xiao Wang
@ 2018-12-18 11:35                   ` Maxime Coquelin
  0 siblings, 0 replies; 86+ messages in thread
From: Maxime Coquelin @ 2018-12-18 11:35 UTC (permalink / raw)
  To: Xiao Wang, tiwei.bie; +Cc: alejandro.lucero, dev, zhihong.wang, xiaolong.ye



On 12/18/18 9:02 AM, Xiao Wang wrote:
> Add the SW assisted VDPA live migration feature into NIC doc.
> 
> Signed-off-by: Xiao Wang <xiao.w.wang@intel.com>
> ---
>   doc/guides/nics/ifc.rst                | 12 +++++++++++-
>   doc/guides/rel_notes/release_19_02.rst |  6 ++++++
>   2 files changed, 17 insertions(+), 1 deletion(-)
> 

Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>

Thanks,
Maxime

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [dpdk-dev] [PATCH v4 00/10] support SW assisted VDPA live migration
  2018-12-14 21:16           ` [dpdk-dev] [PATCH v4 00/10] support SW assisted VDPA live migration Xiao Wang
                               ` (9 preceding siblings ...)
  2018-12-14 21:16             ` [dpdk-dev] [PATCH v4 10/10] doc: update ifc NIC document Xiao Wang
@ 2018-12-18 14:01             ` Maxime Coquelin
  10 siblings, 0 replies; 86+ messages in thread
From: Maxime Coquelin @ 2018-12-18 14:01 UTC (permalink / raw)
  To: Xiao Wang, tiwei.bie; +Cc: alejandro.lucero, dev, zhihong.wang, xiaolong.ye



On 12/14/18 10:16 PM, Xiao Wang wrote:
> In the previous VDPA implementation we enabled live migration support
> by having the HW accelerator do all the work, including dirty page logging
> and device status report/restore. In this mode the VDPA sample daemon and
> device driver only take care of the control path and are not involved in
> the data path, so there's almost 0 CPU resource usage. This mode requires
> the device to have dirty page logging capability.
> 
> This patch series adds live migration support for devices without logging
> capability. The VDPA driver can set up a relay thread standing between the
> guest and the device when live migration happens. This relay intervenes in
> the communication between the guest virtio driver and the physical virtio
> accelerator: it helps the device do a vring relay and logs dirty pages
> along the way. Thus some CPU resource will be consumed in this scenario,
> the percentage depending on the network throughput.
> 
> Some new helpers are added into vhost lib for this VDPA SW fallback:
> - rte_vhost_host_notifier_ctrl, to enable/disable the VDPA direct-IO
>    datapath.
> - rte_vdpa_relay_vring_avail, to relay the available requests from the guest
>    vring to the mediated vring.
> - rte_vdpa_relay_vring_used, to relay the used responses from the mediated
>    vring to the guest vring.
> 
> Some existing helpers are also leveraged for SW fallback setup, like VFIO
> interrupt configuration, IOMMU table programming, etc.
> 
> This patch enables this SW assisted VDPA live migration in ifc driver.
> Since ifcvf also supports HW dirty page logging, we add a new devarg
> for user to select if the SW mode is used or not.
> 
> v4:
> * Add a patch to remove the unused vhost internal API: vhost_detach_vdpa_device().
> 
> v3:
> * Fix indent in relay code.
> * Fix the iova access mode issue of buffer check.
> * Rename the relay API to be more generic, and add more API note for used
>    ring handling.
> * Add kvargs lib dependency in ifc driver.
> * Add commit message for the doc update patch for checkpatch warning.
> 
> v2:
> * Reword the vdpa host notifier control API comment.
> * Make the vring relay API parameter as "void *" to accommodate the future
>    potential new ring layout, e.g. packed ring.
> * Add parameter check for the new API.
> * Add memory barrier for ring idx update.
> * Remove the used ring logging in the relay.
> * Some comment fix and code cleaning according to Tiwei's comment.
> * Add release note update.
> 
> Xiao Wang (10):
>    vhost: remove unused internal API
>    vhost: provide helper for host notifier ctrl
>    vhost: provide helpers for virtio ring relay
>    net/ifc: dump debug message for error
>    net/ifc: store only registered device instance
>    net/ifc: detect if VDPA mode is specified
>    net/ifc: add devarg for LM mode
>    net/ifc: use lib API for used ring logging
>    net/ifc: support SW assisted VDPA live migration
>    doc: update ifc NIC document
> 
>   doc/guides/nics/ifc.rst                |   8 +
>   doc/guides/rel_notes/release_19_02.rst |   6 +
>   drivers/net/ifc/Makefile               |   1 +
>   drivers/net/ifc/base/ifcvf.h           |   1 +
>   drivers/net/ifc/ifcvf_vdpa.c           | 461 ++++++++++++++++++++++++++++++---
>   lib/librte_vhost/rte_vdpa.h            |  57 ++++
>   lib/librte_vhost/rte_vhost_version.map |   3 +
>   lib/librte_vhost/vdpa.c                | 194 ++++++++++++++
>   lib/librte_vhost/vhost.c               |  13 -
>   lib/librte_vhost/vhost.h               |  41 ++-
>   lib/librte_vhost/vhost_user.c          |   7 +-
>   lib/librte_vhost/virtio_net.c          |  39 ---
>   12 files changed, 741 insertions(+), 90 deletions(-)
> 


Applied to dpdk-next-virtio

Thanks,
Maxime

^ permalink raw reply	[flat|nested] 86+ messages in thread
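
The cover letter quoted above says the relay logs dirty pages while it copies used entries back to the guest ring. The core of such logging can be sketched as setting one bit per guest page in a shared bitmap for every buffer the device has written. The sketch below assumes a 4096-byte logging granularity and a plain, non-atomic bit set; `log_dirty_range` and `LOG_PAGE_SIZE` are hypothetical names, not the vhost library's internals.

```c
#include <assert.h>
#include <stdint.h>

#define LOG_PAGE_SIZE 4096ULL	/* assumed logging granularity */

/*
 * Mark every page touched by [addr, addr + len) as dirty in a
 * bit-per-page log. Hypothetical helper for illustration only.
 */
static void
log_dirty_range(uint8_t *log, uint64_t log_size, uint64_t addr, uint64_t len)
{
	uint64_t page, last;

	if (len == 0)
		return;

	page = addr / LOG_PAGE_SIZE;
	last = (addr + len - 1) / LOG_PAGE_SIZE;

	for (; page <= last; page++) {
		/* The log must cover the page being marked. */
		assert(page / 8 < log_size);
		log[page / 8] |= (uint8_t)(1u << (page % 8));
	}
}
```

Because the migration process reads and clears the same bitmap concurrently, a real implementation would presumably need an atomic OR where this sketch uses a plain one.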

* Re: [dpdk-dev] [PATCH v5 02/10] vhost: provide helper for host notifier ctrl
  2018-12-18  8:01                 ` [dpdk-dev] [PATCH v5 02/10] vhost: provide helper for host notifier ctrl Xiao Wang
@ 2018-12-18 15:37                   ` Ferruh Yigit
  0 siblings, 0 replies; 86+ messages in thread
From: Ferruh Yigit @ 2018-12-18 15:37 UTC (permalink / raw)
  To: Xiao Wang, tiwei.bie, maxime.coquelin
  Cc: alejandro.lucero, dev, zhihong.wang, xiaolong.ye

On 12/18/2018 8:01 AM, Xiao Wang wrote:
> @@ -155,4 +157,20 @@ rte_vdpa_get_device(int did);
>   */
>  int __rte_experimental
>  rte_vdpa_get_device_num(void);
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice
> + *
> + * Enable/Disable host notifier mapping for a vdpa port.
> + *
> + * @param vid
> + * vhost device id
> + * @enable
> + * true for host notifier map, false for host notifier unmap

'@enable' is causing a doc build warning; will fix in the repo.

^ permalink raw reply	[flat|nested] 86+ messages in thread

end of thread, other threads:[~2018-12-18 15:37 UTC | newest]

Thread overview: 86+ messages
2018-11-28  9:45 [dpdk-dev] [PATCH 0/9] support SW assisted VDPA live migration Xiao Wang
2018-11-28  9:45 ` [dpdk-dev] [PATCH 1/9] vhost: provide helper for host notifier ctrl Xiao Wang
2018-11-28  9:46 ` [dpdk-dev] [PATCH 2/9] vhost: provide helpers for virtio ring relay Xiao Wang
2018-12-04  6:22   ` Tiwei Bie
2018-12-12  6:51     ` Wang, Xiao W
2018-12-13  1:10   ` [dpdk-dev] [PATCH v2 0/9] support SW assisted VDPA live migration Xiao Wang
2018-12-13  1:10     ` [dpdk-dev] [PATCH v2 1/9] vhost: provide helper for host notifier ctrl Xiao Wang
2018-12-13  1:10     ` [dpdk-dev] [PATCH v2 2/9] vhost: provide helpers for virtio ring relay Xiao Wang
2018-12-13 10:09       ` [dpdk-dev] [PATCH v3 0/9] support SW assisted VDPA live migration Xiao Wang
2018-12-13 10:09         ` [dpdk-dev] [PATCH v3 1/9] vhost: provide helper for host notifier ctrl Xiao Wang
2018-12-14 13:33           ` Maxime Coquelin
2018-12-14 19:05             ` Wang, Xiao W
2018-12-14 21:16           ` [dpdk-dev] [PATCH v4 00/10] support SW assisted VDPA live migration Xiao Wang
2018-12-14 21:16             ` [dpdk-dev] [PATCH v4 01/10] vhost: remove unused internal API Xiao Wang
2018-12-16  8:58               ` Maxime Coquelin
2018-12-14 21:16             ` [dpdk-dev] [PATCH v4 02/10] vhost: provide helper for host notifier ctrl Xiao Wang
2018-12-16  9:00               ` Maxime Coquelin
2018-12-14 21:16             ` [dpdk-dev] [PATCH v4 03/10] vhost: provide helpers for virtio ring relay Xiao Wang
2018-12-16  9:10               ` Maxime Coquelin
2018-12-17  8:51                 ` Wang, Xiao W
2018-12-17 11:02                   ` Maxime Coquelin
2018-12-17 14:41                     ` Wang, Xiao W
2018-12-17 19:00                       ` Maxime Coquelin
2018-12-18  8:27                         ` Wang, Xiao W
2018-12-18  8:44                           ` Thomas Monjalon
2018-12-18  8:01               ` [dpdk-dev] [PATCH v5 00/10] support SW assisted VDPA live migration Xiao Wang
2018-12-18  8:01                 ` [dpdk-dev] [PATCH v5 01/10] vhost: remove unused internal API Xiao Wang
2018-12-18  8:01                 ` [dpdk-dev] [PATCH v5 02/10] vhost: provide helper for host notifier ctrl Xiao Wang
2018-12-18 15:37                   ` Ferruh Yigit
2018-12-18  8:02                 ` [dpdk-dev] [PATCH v5 03/10] vhost: provide helpers for virtio ring relay Xiao Wang
2018-12-18  8:02                 ` [dpdk-dev] [PATCH v5 04/10] net/ifc: dump debug message for error Xiao Wang
2018-12-18  8:02                 ` [dpdk-dev] [PATCH v5 05/10] net/ifc: store only registered device instance Xiao Wang
2018-12-18  8:02                 ` [dpdk-dev] [PATCH v5 06/10] net/ifc: detect if VDPA mode is specified Xiao Wang
2018-12-18  8:02                 ` [dpdk-dev] [PATCH v5 07/10] net/ifc: add devarg for LM mode Xiao Wang
2018-12-18 11:23                   ` Maxime Coquelin
2018-12-18  8:02                 ` [dpdk-dev] [PATCH v5 08/10] net/ifc: use lib API for used ring logging Xiao Wang
2018-12-18  8:02                 ` [dpdk-dev] [PATCH v5 09/10] net/ifc: support SW assisted VDPA live migration Xiao Wang
2018-12-18 11:33                   ` Maxime Coquelin
2018-12-18  8:02                 ` [dpdk-dev] [PATCH v5 10/10] doc: update ifc NIC document Xiao Wang
2018-12-18 11:35                   ` Maxime Coquelin
2018-12-14 21:16             ` [dpdk-dev] [PATCH v4 04/10] net/ifc: dump debug message for error Xiao Wang
2018-12-16  9:11               ` Maxime Coquelin
2018-12-14 21:16             ` [dpdk-dev] [PATCH v4 05/10] net/ifc: store only registered device instance Xiao Wang
2018-12-16  9:12               ` Maxime Coquelin
2018-12-14 21:16             ` [dpdk-dev] [PATCH v4 06/10] net/ifc: detect if VDPA mode is specified Xiao Wang
2018-12-16  9:17               ` Maxime Coquelin
2018-12-17  8:54                 ` Wang, Xiao W
2018-12-14 21:16             ` [dpdk-dev] [PATCH v4 07/10] net/ifc: add devarg for LM mode Xiao Wang
2018-12-16  9:21               ` Maxime Coquelin
2018-12-17  9:00                 ` Wang, Xiao W
2018-12-14 21:16             ` [dpdk-dev] [PATCH v4 08/10] net/ifc: use lib API for used ring logging Xiao Wang
2018-12-16  9:24               ` Maxime Coquelin
2018-12-14 21:16             ` [dpdk-dev] [PATCH v4 09/10] net/ifc: support SW assisted VDPA live migration Xiao Wang
2018-12-16  9:35               ` Maxime Coquelin
2018-12-17  9:12                 ` Wang, Xiao W
2018-12-17 11:08                   ` Maxime Coquelin
2018-12-14 21:16             ` [dpdk-dev] [PATCH v4 10/10] doc: update ifc NIC document Xiao Wang
2018-12-16  9:36               ` Maxime Coquelin
2018-12-17  9:15                 ` Wang, Xiao W
2018-12-18 14:01             ` [dpdk-dev] [PATCH v4 00/10] support SW assisted VDPA live migration Maxime Coquelin
2018-12-13 10:09         ` [dpdk-dev] [PATCH v3 2/9] vhost: provide helpers for virtio ring relay Xiao Wang
2018-12-13 10:09         ` [dpdk-dev] [PATCH v3 3/9] net/ifc: dump debug message for error Xiao Wang
2018-12-13 10:09         ` [dpdk-dev] [PATCH v3 4/9] net/ifc: store only registered device instance Xiao Wang
2018-12-13 10:09         ` [dpdk-dev] [PATCH v3 5/9] net/ifc: detect if VDPA mode is specified Xiao Wang
2018-12-13 10:09         ` [dpdk-dev] [PATCH v3 6/9] net/ifc: add devarg for LM mode Xiao Wang
2018-12-13 10:09         ` [dpdk-dev] [PATCH v3 7/9] net/ifc: use lib API for used ring logging Xiao Wang
2018-12-13 10:09         ` [dpdk-dev] [PATCH v3 8/9] net/ifc: support SW assisted VDPA live migration Xiao Wang
2018-12-13 10:09         ` [dpdk-dev] [PATCH v3 9/9] doc: update ifc NIC document Xiao Wang
2018-12-13  1:10     ` [dpdk-dev] [PATCH v2 3/9] net/ifc: dump debug message for error Xiao Wang
2018-12-13  1:10     ` [dpdk-dev] [PATCH v2 4/9] net/ifc: store only registered device instance Xiao Wang
2018-12-13  1:10     ` [dpdk-dev] [PATCH v2 5/9] net/ifc: detect if VDPA mode is specified Xiao Wang
2018-12-13  1:10     ` [dpdk-dev] [PATCH v2 6/9] net/ifc: add devarg for LM mode Xiao Wang
2018-12-13  1:10     ` [dpdk-dev] [PATCH v2 7/9] net/ifc: use lib API for used ring logging Xiao Wang
2018-12-13  1:10     ` [dpdk-dev] [PATCH v2 8/9] net/ifc: support SW assisted VDPA live migration Xiao Wang
2018-12-13  1:10     ` [dpdk-dev] [PATCH v2 9/9] doc: update ifc NIC document Xiao Wang
2018-11-28  9:46 ` [dpdk-dev] [PATCH 3/9] net/ifc: dump debug message for error Xiao Wang
2018-11-28  9:46 ` [dpdk-dev] [PATCH 4/9] net/ifc: store only registered device instance Xiao Wang
2018-11-28  9:46 ` [dpdk-dev] [PATCH 5/9] net/ifc: detect if VDPA mode is specified Xiao Wang
2018-11-28  9:46 ` [dpdk-dev] [PATCH 6/9] net/ifc: add devarg for LM mode Xiao Wang
2018-12-04  6:31   ` Tiwei Bie
2018-12-12  6:53     ` Wang, Xiao W
2018-12-12 10:15   ` Alejandro Lucero
2018-12-12 10:23     ` Wang, Xiao W
2018-11-28  9:46 ` [dpdk-dev] [PATCH 7/9] net/ifc: use lib API for used ring logging Xiao Wang
2018-11-28  9:46 ` [dpdk-dev] [PATCH 8/9] net/ifc: support SW assisted VDPA live migration Xiao Wang
2018-11-28  9:46 ` [dpdk-dev] [PATCH 9/9] doc: update ifc NIC document Xiao Wang
