* [dpdk-dev] [PATCH 0/4] support async dequeue for split ring
@ 2021-09-06 20:48 Wenwu Ma
2021-09-06 20:48 ` [dpdk-dev] [PATCH 1/4] vhost: " Wenwu Ma
` (6 more replies)
0 siblings, 7 replies; 28+ messages in thread
From: Wenwu Ma @ 2021-09-06 20:48 UTC (permalink / raw)
To: dev
Cc: maxime.coquelin, chenbo.xia, cheng1.jiang, jiayu.hu, Sunil.Pai.G,
yvonnex.yang, Wenwu Ma
This patch set implements an asynchronous dequeue data path for the split
ring. A new asynchronous dequeue function is introduced. With this
function, the application can try to receive packets from the guest while
offloading the copies to the DMA engine, thus saving precious CPU cycles.
note: PATCH 2/4 depends on vhost patch from Jiayu Hu
(http://patches.dpdk.org/project/dpdk/patch/1629463466-450012-1-git-send-email-jiayu.hu@intel.com/)
Wenwu Ma (3):
examples/vhost: refactor vhost enqueue and dequeue datapaths
examples/vhost: use a new API to query remaining ring space
examples/vhost: support vhost async dequeue data path
Yuan Wang (1):
vhost: support async dequeue for split ring
doc/guides/prog_guide/vhost_lib.rst | 9 +
doc/guides/sample_app_ug/vhost.rst | 9 +-
examples/vhost/ioat.c | 67 +++-
examples/vhost/ioat.h | 25 ++
examples/vhost/main.c | 269 +++++++++-----
examples/vhost/main.h | 34 +-
examples/vhost/virtio_net.c | 16 +-
lib/vhost/rte_vhost_async.h | 36 +-
lib/vhost/version.map | 3 +
lib/vhost/vhost.h | 3 +-
lib/vhost/virtio_net.c | 531 ++++++++++++++++++++++++++++
11 files changed, 881 insertions(+), 121 deletions(-)
--
2.25.1
^ permalink raw reply [flat|nested] 28+ messages in thread
* [dpdk-dev] [PATCH 1/4] vhost: support async dequeue for split ring
2021-09-06 20:48 [dpdk-dev] [PATCH 0/4] support async dequeue for split ring Wenwu Ma
@ 2021-09-06 20:48 ` Wenwu Ma
2021-09-10 7:36 ` Yang, YvonneX
2021-09-15 2:51 ` Xia, Chenbo
2021-09-06 20:48 ` [dpdk-dev] [PATCH 2/4] examples/vhost: refactor vhost enqueue and dequeue datapaths Wenwu Ma
` (5 subsequent siblings)
6 siblings, 2 replies; 28+ messages in thread
From: Wenwu Ma @ 2021-09-06 20:48 UTC (permalink / raw)
To: dev
Cc: maxime.coquelin, chenbo.xia, cheng1.jiang, jiayu.hu, Sunil.Pai.G,
yvonnex.yang, Yuan Wang, Wenwu Ma, Yinan Wang
From: Yuan Wang <yuanx.wang@intel.com>
This patch implements an asynchronous dequeue data path for the split
ring. A new asynchronous dequeue function is introduced. With this
function, the application can try to receive packets from the guest
while offloading the copies to the async channel, thus saving precious
CPU cycles.
Signed-off-by: Yuan Wang <yuanx.wang@intel.com>
Signed-off-by: Jiayu Hu <jiayu.hu@intel.com>
Signed-off-by: Wenwu Ma <wenwux.ma@intel.com>
Tested-by: Yinan Wang <yinan.wang@intel.com>
---
doc/guides/prog_guide/vhost_lib.rst | 9 +
lib/vhost/rte_vhost_async.h | 36 +-
lib/vhost/version.map | 3 +
lib/vhost/vhost.h | 3 +-
lib/vhost/virtio_net.c | 531 ++++++++++++++++++++++++++++
5 files changed, 579 insertions(+), 3 deletions(-)
diff --git a/doc/guides/prog_guide/vhost_lib.rst b/doc/guides/prog_guide/vhost_lib.rst
index 171e0096f6..9ed544db7a 100644
--- a/doc/guides/prog_guide/vhost_lib.rst
+++ b/doc/guides/prog_guide/vhost_lib.rst
@@ -303,6 +303,15 @@ The following is an overview of some key Vhost API functions:
Clear inflight packets which are submitted to DMA engine in vhost async data
path. Completed packets are returned to applications through ``pkts``.
+* ``rte_vhost_async_try_dequeue_burst(vid, queue_id, mbuf_pool, pkts, count, nr_inflight)``
+
+ This function tries to receive packets from the guest by offloading
+ the copies to the async channel. Packets whose copies are completed
+ are returned in ``pkts``. Packets whose copies have been submitted
+ to the async channel but are not yet completed are called
+ "in-flight packets". This function does not return in-flight
+ packets until their copies are completed by the async channel.
+
Vhost-user Implementations
--------------------------
diff --git a/lib/vhost/rte_vhost_async.h b/lib/vhost/rte_vhost_async.h
index ad71555a7f..5e2429ab70 100644
--- a/lib/vhost/rte_vhost_async.h
+++ b/lib/vhost/rte_vhost_async.h
@@ -83,12 +83,18 @@ struct rte_vhost_async_channel_ops {
uint16_t max_packets);
};
+struct async_nethdr {
+ struct virtio_net_hdr hdr;
+ bool valid;
+};
+
/**
- * inflight async packet information
+ * in-flight async packet information
*/
struct async_inflight_info {
struct rte_mbuf *mbuf;
- uint16_t descs; /* num of descs inflight */
+ struct async_nethdr nethdr;
+ uint16_t descs; /* num of descs in-flight */
uint16_t nr_buffers; /* num of buffers inflight for packed ring */
};
@@ -255,5 +261,31 @@ int rte_vhost_async_get_inflight(int vid, uint16_t queue_id);
__rte_experimental
uint16_t rte_vhost_clear_queue_thread_unsafe(int vid, uint16_t queue_id,
struct rte_mbuf **pkts, uint16_t count);
+/**
+ * This function tries to receive packets from the guest by offloading
+ * the copies to the async channel. Packets whose copies are completed
+ * are returned in "pkts". Packets whose copies have been submitted to
+ * the async channel but are not yet completed are called "in-flight
+ * packets". This function does not return in-flight packets until
+ * their copies are completed by the async channel.
+ *
+ * @param vid
+ * id of vhost device to dequeue data
+ * @param queue_id
+ * queue id to dequeue data
+ * @param pkts
+ * blank array to keep successfully dequeued packets
+ * @param count
+ * size of the packet array
+ * @param nr_inflight
+ * the number of in-flight packets. If an error occurred, its value is set to -1.
+ * @return
+ * num of successfully dequeued packets
+ */
+__rte_experimental
+uint16_t
+rte_vhost_async_try_dequeue_burst(int vid, uint16_t queue_id,
+ struct rte_mempool *mbuf_pool, struct rte_mbuf **pkts, uint16_t count,
+ int *nr_inflight);
#endif /* _RTE_VHOST_ASYNC_H_ */
diff --git a/lib/vhost/version.map b/lib/vhost/version.map
index c92a9d4962..1e033ad8e2 100644
--- a/lib/vhost/version.map
+++ b/lib/vhost/version.map
@@ -85,4 +85,7 @@ EXPERIMENTAL {
rte_vhost_async_channel_register_thread_unsafe;
rte_vhost_async_channel_unregister_thread_unsafe;
rte_vhost_clear_queue_thread_unsafe;
+
+ # added in 21.11
+ rte_vhost_async_try_dequeue_burst;
};
diff --git a/lib/vhost/vhost.h b/lib/vhost/vhost.h
index 1e56311725..89a31e4ca8 100644
--- a/lib/vhost/vhost.h
+++ b/lib/vhost/vhost.h
@@ -49,7 +49,8 @@
#define MAX_PKT_BURST 32
#define VHOST_MAX_ASYNC_IT (MAX_PKT_BURST * 2)
-#define VHOST_MAX_ASYNC_VEC (BUF_VECTOR_MAX * 4)
+#define MAX_ASYNC_COPY_VECTOR 1024
+#define VHOST_MAX_ASYNC_VEC (MAX_ASYNC_COPY_VECTOR * 2)
#define PACKED_DESC_ENQUEUE_USED_FLAG(w) \
((w) ? (VRING_DESC_F_AVAIL | VRING_DESC_F_USED | VRING_DESC_F_WRITE) : \
diff --git a/lib/vhost/virtio_net.c b/lib/vhost/virtio_net.c
index 0350f6fcce..67a8cd2c41 100644
--- a/lib/vhost/virtio_net.c
+++ b/lib/vhost/virtio_net.c
@@ -3170,3 +3170,534 @@ rte_vhost_dequeue_burst(int vid, uint16_t queue_id,
return count;
}
+
+static __rte_always_inline int
+async_desc_to_mbuf(struct virtio_net *dev,
+ struct buf_vector *buf_vec, uint16_t nr_vec,
+ struct rte_mbuf *m, struct rte_mempool *mbuf_pool,
+ struct iovec *src_iovec, struct iovec *dst_iovec,
+ struct rte_vhost_iov_iter *src_it,
+ struct rte_vhost_iov_iter *dst_it,
+ struct async_nethdr *nethdr,
+ int nr_iovec)
+{
+ uint64_t buf_addr, buf_iova;
+ uint64_t mapped_len;
+ uint32_t tlen = 0;
+ uint32_t buf_avail, buf_offset, buf_len;
+ uint32_t mbuf_avail, mbuf_offset;
+ uint32_t cpy_len;
+ /* A counter to avoid an endless loop in the desc chain */
+ uint16_t vec_idx = 0;
+ int tvec_idx = 0;
+ struct rte_mbuf *cur = m, *prev = m;
+ struct virtio_net_hdr tmp_hdr;
+ struct virtio_net_hdr *hdr = NULL;
+
+ buf_addr = buf_vec[vec_idx].buf_addr;
+ buf_len = buf_vec[vec_idx].buf_len;
+ buf_iova = buf_vec[vec_idx].buf_iova;
+
+ if (unlikely(buf_len < dev->vhost_hlen && nr_vec <= 1))
+ return -1;
+
+ if (virtio_net_with_host_offload(dev)) {
+ if (unlikely(buf_len < sizeof(struct virtio_net_hdr))) {
+ /*
+ * No luck, the virtio-net header doesn't fit
+ * in a contiguous virtual area.
+ */
+ copy_vnet_hdr_from_desc(&tmp_hdr, buf_vec);
+ hdr = &tmp_hdr;
+ } else {
+ hdr = (struct virtio_net_hdr *)((uintptr_t)buf_addr);
+ }
+ }
+
+ /*
+ * A virtio driver normally uses at least 2 desc buffers
+ * for Tx: the first for storing the header, and others
+ * for storing the data.
+ */
+ if (unlikely(buf_len < dev->vhost_hlen)) {
+ buf_offset = dev->vhost_hlen - buf_len;
+ vec_idx++;
+ buf_addr = buf_vec[vec_idx].buf_addr;
+ buf_iova = buf_vec[vec_idx].buf_iova;
+ buf_len = buf_vec[vec_idx].buf_len;
+ buf_avail = buf_len - buf_offset;
+ } else if (buf_len == dev->vhost_hlen) {
+ if (unlikely(++vec_idx >= nr_vec))
+ return -1;
+ buf_addr = buf_vec[vec_idx].buf_addr;
+ buf_iova = buf_vec[vec_idx].buf_iova;
+ buf_len = buf_vec[vec_idx].buf_len;
+
+ buf_offset = 0;
+ buf_avail = buf_len;
+ } else {
+ buf_offset = dev->vhost_hlen;
+ buf_avail = buf_vec[vec_idx].buf_len - dev->vhost_hlen;
+ }
+
+ PRINT_PACKET(dev, (uintptr_t)(buf_addr + buf_offset), (uint32_t)buf_avail, 0);
+
+ mbuf_offset = 0;
+ mbuf_avail = m->buf_len - RTE_PKTMBUF_HEADROOM;
+ while (1) {
+ cpy_len = RTE_MIN(buf_avail, mbuf_avail);
+
+ while (cpy_len) {
+ void *hpa = (void *)(uintptr_t)gpa_to_first_hpa(dev,
+ buf_iova + buf_offset, cpy_len,
+ &mapped_len);
+ if (unlikely(!hpa)) {
+ VHOST_LOG_DATA(ERR, "(%d) %s: failed to get hpa.\n",
+ dev->vid, __func__);
+ return -1;
+ }
+ if (unlikely(tvec_idx >= nr_iovec)) {
+ VHOST_LOG_DATA(ERR, "iovec is not enough for offloading\n");
+ return -1;
+ }
+
+ async_fill_vec(src_iovec + tvec_idx, hpa, (size_t)mapped_len);
+ async_fill_vec(dst_iovec + tvec_idx,
+ (void *)(uintptr_t)rte_pktmbuf_iova_offset(cur, mbuf_offset),
+ (size_t)mapped_len);
+
+ tvec_idx++;
+ tlen += (uint32_t)mapped_len;
+ cpy_len -= (uint32_t)mapped_len;
+ mbuf_avail -= (uint32_t)mapped_len;
+ mbuf_offset += (uint32_t)mapped_len;
+ buf_avail -= (uint32_t)mapped_len;
+ buf_offset += (uint32_t)mapped_len;
+ }
+
+ /* This buf reaches its end, get the next one */
+ if (buf_avail == 0) {
+ if (++vec_idx >= nr_vec)
+ break;
+
+ buf_addr = buf_vec[vec_idx].buf_addr;
+ buf_iova = buf_vec[vec_idx].buf_iova;
+ buf_len = buf_vec[vec_idx].buf_len;
+
+ buf_offset = 0;
+ buf_avail = buf_len;
+
+ PRINT_PACKET(dev, (uintptr_t)buf_addr, (uint32_t)buf_avail, 0);
+ }
+
+ /*
+ * This mbuf reaches its end, get a new one
+ * to hold more data.
+ */
+ if (mbuf_avail == 0) {
+ cur = rte_pktmbuf_alloc(mbuf_pool);
+ if (unlikely(cur == NULL)) {
+ VHOST_LOG_DATA(ERR, "Failed to allocate memory for mbuf.\n");
+ return -1;
+ }
+
+ prev->next = cur;
+ prev->data_len = mbuf_offset;
+ m->nb_segs += 1;
+ m->pkt_len += mbuf_offset;
+ prev = cur;
+
+ mbuf_offset = 0;
+ mbuf_avail = cur->buf_len - RTE_PKTMBUF_HEADROOM;
+ }
+ }
+
+ prev->data_len = mbuf_offset;
+ m->pkt_len += mbuf_offset;
+
+ if (tlen) {
+ async_fill_iter(src_it, tlen, src_iovec, tvec_idx);
+ async_fill_iter(dst_it, tlen, dst_iovec, tvec_idx);
+ if (hdr) {
+ nethdr->valid = true;
+ nethdr->hdr = *hdr;
+ } else
+ nethdr->valid = false;
+ }
+
+ return 0;
+}
+
+static __rte_always_inline uint16_t
+async_poll_dequeue_completed_split(struct virtio_net *dev,
+ struct vhost_virtqueue *vq, uint16_t queue_id,
+ struct rte_mbuf **pkts, uint16_t count, bool legacy_ol_flags)
+{
+ uint16_t n_pkts_cpl = 0, n_pkts_put = 0;
+ uint16_t start_idx, pkt_idx, from;
+ struct async_inflight_info *pkts_info;
+
+ pkt_idx = vq->async_pkts_idx & (vq->size - 1);
+ pkts_info = vq->async_pkts_info;
+ start_idx = virtio_dev_rx_async_get_info_idx(pkt_idx, vq->size,
+ vq->async_pkts_inflight_n);
+
+ if (count > vq->async_last_pkts_n) {
+ int ret;
+
+ ret = vq->async_ops.check_completed_copies(dev->vid, queue_id,
+ 0, count - vq->async_last_pkts_n);
+ if (unlikely(ret < 0)) {
+ VHOST_LOG_DATA(ERR, "(%d) async channel poll error\n", dev->vid);
+ ret = 0;
+ }
+ n_pkts_cpl = ret;
+ }
+
+ n_pkts_cpl += vq->async_last_pkts_n;
+ if (unlikely(n_pkts_cpl == 0))
+ return 0;
+
+ n_pkts_put = RTE_MIN(count, n_pkts_cpl);
+
+ for (pkt_idx = 0; pkt_idx < n_pkts_put; pkt_idx++) {
+ from = (start_idx + pkt_idx) & (vq->size - 1);
+ pkts[pkt_idx] = pkts_info[from].mbuf;
+
+ if (pkts_info[from].nethdr.valid) {
+ vhost_dequeue_offload(&pkts_info[from].nethdr.hdr,
+ pkts[pkt_idx], legacy_ol_flags);
+ }
+ }
+
+ /* write back completed descs to used ring and update used idx */
+ write_back_completed_descs_split(vq, n_pkts_put);
+ __atomic_add_fetch(&vq->used->idx, n_pkts_put, __ATOMIC_RELEASE);
+ vhost_vring_call_split(dev, vq);
+
+ vq->async_last_pkts_n = n_pkts_cpl - n_pkts_put;
+ vq->async_pkts_inflight_n -= n_pkts_put;
+
+ return n_pkts_put;
+}
+
+static __rte_always_inline uint16_t
+virtio_dev_tx_async_split(struct virtio_net *dev,
+ struct vhost_virtqueue *vq, uint16_t queue_id,
+ struct rte_mempool *mbuf_pool, struct rte_mbuf **pkts,
+ uint16_t count, bool legacy_ol_flags)
+{
+ static bool allocerr_warned;
+ bool dropped = false;
+ uint16_t free_entries;
+ uint16_t pkt_idx, slot_idx = 0;
+ uint16_t nr_done_pkts = 0;
+ uint16_t nr_async_burst = 0;
+ uint16_t pkt_err = 0;
+ uint16_t iovec_idx = 0, it_idx = 0;
+ struct rte_vhost_iov_iter *it_pool = vq->it_pool;
+ struct iovec *vec_pool = vq->vec_pool;
+ struct iovec *src_iovec = vec_pool;
+ struct iovec *dst_iovec = vec_pool + (VHOST_MAX_ASYNC_VEC >> 1);
+ struct rte_vhost_async_desc tdes[MAX_PKT_BURST];
+ struct async_inflight_info *pkts_info = vq->async_pkts_info;
+ struct rte_mbuf *pkts_prealloc[MAX_PKT_BURST];
+
+ /**
+ * The ordering between avail index and
+ * desc reads needs to be enforced.
+ */
+ free_entries = __atomic_load_n(&vq->avail->idx, __ATOMIC_ACQUIRE) - vq->last_avail_idx;
+ if (free_entries == 0)
+ goto out;
+
+ rte_prefetch0(&vq->avail->ring[vq->last_avail_idx & (vq->size - 1)]);
+
+ count = RTE_MIN(count, MAX_PKT_BURST);
+ count = RTE_MIN(count, free_entries);
+ VHOST_LOG_DATA(DEBUG, "(%d) about to dequeue %u buffers\n", dev->vid, count);
+
+ if (rte_pktmbuf_alloc_bulk(mbuf_pool, pkts_prealloc, count))
+ goto out;
+
+ for (pkt_idx = 0; pkt_idx < count; pkt_idx++) {
+ uint16_t head_idx = 0;
+ uint16_t nr_vec = 0;
+ uint16_t to;
+ uint32_t buf_len;
+ int err;
+ struct buf_vector buf_vec[BUF_VECTOR_MAX];
+ struct rte_mbuf *pkt = pkts_prealloc[pkt_idx];
+
+ if (unlikely(fill_vec_buf_split(dev, vq, vq->last_avail_idx,
+ &nr_vec, buf_vec,
+ &head_idx, &buf_len,
+ VHOST_ACCESS_RO) < 0)) {
+ dropped = true;
+ break;
+ }
+
+ err = virtio_dev_pktmbuf_prep(dev, pkt, buf_len);
+ if (unlikely(err)) {
+ /**
+ * mbuf allocation fails for jumbo packets when external
+ * buffer allocation is not allowed and linear buffer
+ * is required. Drop this packet.
+ */
+ if (!allocerr_warned) {
+ VHOST_LOG_DATA(ERR,
+ "Failed mbuf alloc of size %d from %s on %s.\n",
+ buf_len, mbuf_pool->name, dev->ifname);
+ allocerr_warned = true;
+ }
+ dropped = true;
+ break;
+ }
+
+ slot_idx = (vq->async_pkts_idx + pkt_idx) & (vq->size - 1);
+ err = async_desc_to_mbuf(dev, buf_vec, nr_vec, pkt,
+ mbuf_pool, &src_iovec[iovec_idx],
+ &dst_iovec[iovec_idx], &it_pool[it_idx],
+ &it_pool[it_idx + 1],
+ &pkts_info[slot_idx].nethdr,
+ (VHOST_MAX_ASYNC_VEC >> 1) - iovec_idx);
+ if (unlikely(err)) {
+ if (!allocerr_warned) {
+ VHOST_LOG_DATA(ERR,
+ "Failed to offload copies to async channel %s.\n",
+ dev->ifname);
+ allocerr_warned = true;
+ }
+ dropped = true;
+ break;
+ }
+
+ async_fill_desc(&tdes[nr_async_burst], &it_pool[it_idx], &it_pool[it_idx + 1]);
+ pkts_info[slot_idx].mbuf = pkt;
+ nr_async_burst++;
+
+ iovec_idx += it_pool[it_idx].nr_segs;
+ it_idx += 2;
+
+ /* store used descs */
+ to = vq->async_desc_idx_split & (vq->size - 1);
+ vq->async_descs_split[to].id = head_idx;
+ vq->async_descs_split[to].len = 0;
+ vq->async_desc_idx_split++;
+
+ vq->last_avail_idx++;
+
+ if (unlikely(nr_async_burst >= VHOST_ASYNC_BATCH_THRESHOLD)) {
+ uint16_t nr_pkts;
+ int32_t ret;
+
+ ret = vq->async_ops.transfer_data(dev->vid, queue_id,
+ tdes, 0, nr_async_burst);
+ if (unlikely(ret < 0)) {
+ VHOST_LOG_DATA(ERR, "(%d) async channel submit error\n", dev->vid);
+ ret = 0;
+ }
+ nr_pkts = ret;
+
+ vq->async_pkts_inflight_n += nr_pkts;
+ it_idx = 0;
+ iovec_idx = 0;
+
+ if (unlikely(nr_pkts < nr_async_burst)) {
+ pkt_err = nr_async_burst - nr_pkts;
+ nr_async_burst = 0;
+ pkt_idx++;
+ break;
+ }
+ nr_async_burst = 0;
+ }
+ }
+
+ if (unlikely(dropped))
+ rte_pktmbuf_free_bulk(&pkts_prealloc[pkt_idx], count - pkt_idx);
+
+ if (nr_async_burst) {
+ uint16_t nr_pkts;
+ int32_t ret;
+
+ ret = vq->async_ops.transfer_data(dev->vid, queue_id, tdes, 0, nr_async_burst);
+ if (unlikely(ret < 0)) {
+ VHOST_LOG_DATA(ERR, "(%d) async channel submit error\n", dev->vid);
+ ret = 0;
+ }
+ nr_pkts = ret;
+
+ vq->async_pkts_inflight_n += nr_pkts;
+
+ if (unlikely(nr_pkts < nr_async_burst))
+ pkt_err = nr_async_burst - nr_pkts;
+ }
+
+ if (unlikely(pkt_err)) {
+ uint16_t nr_err_dma = pkt_err;
+
+ pkt_idx -= nr_err_dma;
+
+ /**
+ * recover async channel copy related structures and free pktmbufs
+ * for error pkts.
+ */
+ vq->async_desc_idx_split -= nr_err_dma;
+ while (nr_err_dma-- > 0) {
+ rte_pktmbuf_free(pkts_info[slot_idx & (vq->size - 1)].mbuf);
+ slot_idx--;
+ }
+
+ /* recover available ring */
+ vq->last_avail_idx -= pkt_err;
+ }
+
+ vq->async_pkts_idx += pkt_idx;
+
+out:
+ if (vq->async_pkts_inflight_n > 0) {
+ nr_done_pkts = async_poll_dequeue_completed_split(dev, vq,
+ queue_id, pkts, count, legacy_ol_flags);
+ }
+
+ return nr_done_pkts;
+}
+
+__rte_noinline
+static uint16_t
+virtio_dev_tx_async_split_legacy(struct virtio_net *dev,
+ struct vhost_virtqueue *vq, uint16_t queue_id,
+ struct rte_mempool *mbuf_pool, struct rte_mbuf **pkts,
+ uint16_t count)
+{
+ return virtio_dev_tx_async_split(dev, vq, queue_id, mbuf_pool,
+ pkts, count, true);
+}
+
+__rte_noinline
+static uint16_t
+virtio_dev_tx_async_split_compliant(struct virtio_net *dev,
+ struct vhost_virtqueue *vq, uint16_t queue_id,
+ struct rte_mempool *mbuf_pool, struct rte_mbuf **pkts,
+ uint16_t count)
+{
+ return virtio_dev_tx_async_split(dev, vq, queue_id, mbuf_pool,
+ pkts, count, false);
+}
+
+uint16_t
+rte_vhost_async_try_dequeue_burst(int vid, uint16_t queue_id,
+ struct rte_mempool *mbuf_pool, struct rte_mbuf **pkts, uint16_t count,
+ int *nr_inflight)
+{
+ struct virtio_net *dev;
+ struct rte_mbuf *rarp_mbuf = NULL;
+ struct vhost_virtqueue *vq;
+ int16_t success = 1;
+
+ *nr_inflight = -1;
+
+ dev = get_device(vid);
+ if (!dev)
+ return 0;
+
+ if (unlikely(!(dev->flags & VIRTIO_DEV_BUILTIN_VIRTIO_NET))) {
+ VHOST_LOG_DATA(ERR,
+ "(%d) %s: built-in vhost net backend is disabled.\n",
+ dev->vid, __func__);
+ return 0;
+ }
+
+ if (unlikely(!is_valid_virt_queue_idx(queue_id, 1, dev->nr_vring))) {
+ VHOST_LOG_DATA(ERR,
+ "(%d) %s: invalid virtqueue idx %d.\n",
+ dev->vid, __func__, queue_id);
+ return 0;
+ }
+
+ vq = dev->virtqueue[queue_id];
+
+ if (unlikely(rte_spinlock_trylock(&vq->access_lock) == 0))
+ return 0;
+
+ if (unlikely(vq->enabled == 0)) {
+ count = 0;
+ goto out_access_unlock;
+ }
+
+ if (unlikely(!vq->async_registered)) {
+ VHOST_LOG_DATA(ERR, "(%d) %s: async not registered for queue id %d.\n",
+ dev->vid, __func__, queue_id);
+ count = 0;
+ goto out_access_unlock;
+ }
+
+ if (dev->features & (1ULL << VIRTIO_F_IOMMU_PLATFORM))
+ vhost_user_iotlb_rd_lock(vq);
+
+ if (unlikely(vq->access_ok == 0))
+ if (unlikely(vring_translate(dev, vq) < 0)) {
+ count = 0;
+ goto out_access_unlock;
+ }
+
+ /*
+ * Construct a RARP broadcast packet and inject it into the "pkts"
+ * array, making it look like the guest actually sent such a packet.
+ *
+ * Check user_send_rarp() for more information.
+ *
+ * broadcast_rarp shares a cacheline in the virtio_net structure
+ * with some fields that are accessed during enqueue and
+ * __atomic_compare_exchange_n causes a write if performed compare
+ * and exchange. This could result in false sharing between enqueue
+ * and dequeue.
+ *
+ * Prevent unnecessary false sharing by reading broadcast_rarp first
+ * and only performing compare and exchange if the read indicates it
+ * is likely to be set.
+ */
+ if (unlikely(__atomic_load_n(&dev->broadcast_rarp, __ATOMIC_ACQUIRE) &&
+ __atomic_compare_exchange_n(&dev->broadcast_rarp,
+ &success, 0, 0, __ATOMIC_RELEASE, __ATOMIC_RELAXED))) {
+
+ rarp_mbuf = rte_net_make_rarp_packet(mbuf_pool, &dev->mac);
+ if (rarp_mbuf == NULL) {
+ VHOST_LOG_DATA(ERR, "Failed to make RARP packet.\n");
+ count = 0;
+ goto out;
+ }
+ count -= 1;
+ }
+
+ if (unlikely(vq_is_packed(dev)))
+ return 0;
+
+ if (dev->flags & VIRTIO_DEV_LEGACY_OL_FLAGS)
+ count = virtio_dev_tx_async_split_legacy(dev, vq, queue_id,
+ mbuf_pool, pkts, count);
+ else
+ count = virtio_dev_tx_async_split_compliant(dev, vq, queue_id,
+ mbuf_pool, pkts, count);
+
+out:
+ *nr_inflight = vq->async_pkts_inflight_n;
+
+ if (dev->features & (1ULL << VIRTIO_F_IOMMU_PLATFORM))
+ vhost_user_iotlb_rd_unlock(vq);
+
+out_access_unlock:
+ rte_spinlock_unlock(&vq->access_lock);
+
+ if (unlikely(rarp_mbuf != NULL)) {
+ /*
+ * Inject it at the head of the "pkts" array, so that the switch's
+ * MAC learning table gets updated first.
+ */
+ memmove(&pkts[1], pkts, count * sizeof(struct rte_mbuf *));
+ pkts[0] = rarp_mbuf;
+ count += 1;
+ }
+
+ return count;
+}
--
2.25.1
* [dpdk-dev] [PATCH 2/4] examples/vhost: refactor vhost enqueue and dequeue datapaths
2021-09-06 20:48 [dpdk-dev] [PATCH 0/4] support async dequeue for split ring Wenwu Ma
2021-09-06 20:48 ` [dpdk-dev] [PATCH 1/4] vhost: " Wenwu Ma
@ 2021-09-06 20:48 ` Wenwu Ma
2021-09-10 7:38 ` Yang, YvonneX
2021-09-15 3:02 ` Xia, Chenbo
2021-09-06 20:48 ` [dpdk-dev] [PATCH 3/4] examples/vhost: use a new API to query remaining ring space Wenwu Ma
` (4 subsequent siblings)
6 siblings, 2 replies; 28+ messages in thread
From: Wenwu Ma @ 2021-09-06 20:48 UTC (permalink / raw)
To: dev
Cc: maxime.coquelin, chenbo.xia, cheng1.jiang, jiayu.hu, Sunil.Pai.G,
yvonnex.yang, Wenwu Ma
Previously, we checked a flag in the data path to decide which
enqueue/dequeue functions to call.
Now, we use an ops table that is initialized when the vhost device is
created, so the data path can call through the ops directly without
any further flag checks.
Signed-off-by: Wenwu Ma <wenwux.ma@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
---
examples/vhost/main.c | 100 +++++++++++++++++++++---------------
examples/vhost/main.h | 31 +++++++++--
examples/vhost/virtio_net.c | 16 +++++-
3 files changed, 101 insertions(+), 46 deletions(-)
diff --git a/examples/vhost/main.c b/examples/vhost/main.c
index a4a8214e05..e246b640ea 100644
--- a/examples/vhost/main.c
+++ b/examples/vhost/main.c
@@ -106,6 +106,8 @@ static uint32_t burst_rx_retry_num = BURST_RX_RETRIES;
static char *socket_files;
static int nb_sockets;
+static struct vhost_queue_ops vdev_queue_ops[MAX_VHOST_DEVICE];
+
/* empty vmdq configuration structure. Filled in programatically */
static struct rte_eth_conf vmdq_conf_default = {
.rxmode = {
@@ -888,22 +890,8 @@ drain_vhost(struct vhost_dev *vdev)
uint16_t nr_xmit = vhost_txbuff[buff_idx]->len;
struct rte_mbuf **m = vhost_txbuff[buff_idx]->m_table;
- if (builtin_net_driver) {
- ret = vs_enqueue_pkts(vdev, VIRTIO_RXQ, m, nr_xmit);
- } else if (async_vhost_driver) {
- uint16_t enqueue_fail = 0;
-
- complete_async_pkts(vdev);
- ret = rte_vhost_submit_enqueue_burst(vdev->vid, VIRTIO_RXQ, m, nr_xmit);
- __atomic_add_fetch(&vdev->pkts_inflight, ret, __ATOMIC_SEQ_CST);
-
- enqueue_fail = nr_xmit - ret;
- if (enqueue_fail)
- free_pkts(&m[ret], nr_xmit - ret);
- } else {
- ret = rte_vhost_enqueue_burst(vdev->vid, VIRTIO_RXQ,
- m, nr_xmit);
- }
+ ret = vdev_queue_ops[vdev->vid].enqueue_pkt_burst(vdev,
+ VIRTIO_RXQ, m, nr_xmit);
if (enable_stats) {
__atomic_add_fetch(&vdev->stats.rx_total_atomic, nr_xmit,
@@ -1182,6 +1170,33 @@ drain_mbuf_table(struct mbuf_table *tx_q)
}
}
+uint16_t
+async_enqueue_pkts(struct vhost_dev *vdev, uint16_t queue_id,
+ struct rte_mbuf **pkts, uint32_t rx_count)
+{
+ uint16_t enqueue_count;
+ uint16_t enqueue_fail = 0;
+
+ complete_async_pkts(vdev);
+ enqueue_count = rte_vhost_submit_enqueue_burst(vdev->vid,
+ queue_id, pkts, rx_count);
+ __atomic_add_fetch(&vdev->pkts_inflight, enqueue_count,
+ __ATOMIC_SEQ_CST);
+
+ enqueue_fail = rx_count - enqueue_count;
+ if (enqueue_fail)
+ free_pkts(&pkts[enqueue_count], enqueue_fail);
+
+ return enqueue_count;
+}
+
+uint16_t
+sync_enqueue_pkts(struct vhost_dev *vdev, uint16_t queue_id,
+ struct rte_mbuf **pkts, uint32_t rx_count)
+{
+ return rte_vhost_enqueue_burst(vdev->vid, queue_id, pkts, rx_count);
+}
+
static __rte_always_inline void
drain_eth_rx(struct vhost_dev *vdev)
{
@@ -1212,25 +1227,8 @@ drain_eth_rx(struct vhost_dev *vdev)
}
}
- if (builtin_net_driver) {
- enqueue_count = vs_enqueue_pkts(vdev, VIRTIO_RXQ,
- pkts, rx_count);
- } else if (async_vhost_driver) {
- uint16_t enqueue_fail = 0;
-
- complete_async_pkts(vdev);
- enqueue_count = rte_vhost_submit_enqueue_burst(vdev->vid,
- VIRTIO_RXQ, pkts, rx_count);
- __atomic_add_fetch(&vdev->pkts_inflight, enqueue_count, __ATOMIC_SEQ_CST);
-
- enqueue_fail = rx_count - enqueue_count;
- if (enqueue_fail)
- free_pkts(&pkts[enqueue_count], enqueue_fail);
-
- } else {
- enqueue_count = rte_vhost_enqueue_burst(vdev->vid, VIRTIO_RXQ,
- pkts, rx_count);
- }
+ enqueue_count = vdev_queue_ops[vdev->vid].enqueue_pkt_burst(vdev,
+ VIRTIO_RXQ, pkts, rx_count);
if (enable_stats) {
__atomic_add_fetch(&vdev->stats.rx_total_atomic, rx_count,
@@ -1243,6 +1241,14 @@ drain_eth_rx(struct vhost_dev *vdev)
free_pkts(pkts, rx_count);
}
+uint16_t sync_dequeue_pkts(struct vhost_dev *dev, uint16_t queue_id,
+ struct rte_mempool *mbuf_pool,
+ struct rte_mbuf **pkts, uint16_t count)
+{
+ return rte_vhost_dequeue_burst(dev->vid, queue_id,
+ mbuf_pool, pkts, count);
+}
+
static __rte_always_inline void
drain_virtio_tx(struct vhost_dev *vdev)
{
@@ -1250,13 +1256,8 @@ drain_virtio_tx(struct vhost_dev *vdev)
uint16_t count;
uint16_t i;
- if (builtin_net_driver) {
- count = vs_dequeue_pkts(vdev, VIRTIO_TXQ, mbuf_pool,
- pkts, MAX_PKT_BURST);
- } else {
- count = rte_vhost_dequeue_burst(vdev->vid, VIRTIO_TXQ,
- mbuf_pool, pkts, MAX_PKT_BURST);
- }
+ count = vdev_queue_ops[vdev->vid].dequeue_pkt_burst(vdev,
+ VIRTIO_TXQ, mbuf_pool, pkts, MAX_PKT_BURST);
/* setup VMDq for the first packet */
if (unlikely(vdev->ready == DEVICE_MAC_LEARNING) && count) {
@@ -1441,6 +1442,21 @@ new_device(int vid)
}
}
+ if (builtin_net_driver) {
+ vdev_queue_ops[vid].enqueue_pkt_burst = builtin_enqueue_pkts;
+ vdev_queue_ops[vid].dequeue_pkt_burst = builtin_dequeue_pkts;
+ } else {
+ if (async_vhost_driver) {
+ vdev_queue_ops[vid].enqueue_pkt_burst =
+ async_enqueue_pkts;
+ } else {
+ vdev_queue_ops[vid].enqueue_pkt_burst =
+ sync_enqueue_pkts;
+ }
+
+ vdev_queue_ops[vid].dequeue_pkt_burst = sync_dequeue_pkts;
+ }
+
if (builtin_net_driver)
vs_vhost_net_setup(vdev);
diff --git a/examples/vhost/main.h b/examples/vhost/main.h
index e7b1ac60a6..948a23efa6 100644
--- a/examples/vhost/main.h
+++ b/examples/vhost/main.h
@@ -61,6 +61,19 @@ struct vhost_dev {
struct vhost_queue queues[MAX_QUEUE_PAIRS * 2];
} __rte_cache_aligned;
+typedef uint16_t (*vhost_enqueue_burst_t)(struct vhost_dev *dev,
+ uint16_t queue_id, struct rte_mbuf **pkts,
+ uint32_t count);
+
+typedef uint16_t (*vhost_dequeue_burst_t)(struct vhost_dev *dev,
+ uint16_t queue_id, struct rte_mempool *mbuf_pool,
+ struct rte_mbuf **pkts, uint16_t count);
+
+struct vhost_queue_ops {
+ vhost_enqueue_burst_t enqueue_pkt_burst;
+ vhost_dequeue_burst_t dequeue_pkt_burst;
+};
+
TAILQ_HEAD(vhost_dev_tailq_list, vhost_dev);
@@ -87,7 +100,19 @@ void vs_vhost_net_remove(struct vhost_dev *dev);
uint16_t vs_enqueue_pkts(struct vhost_dev *dev, uint16_t queue_id,
struct rte_mbuf **pkts, uint32_t count);
-uint16_t vs_dequeue_pkts(struct vhost_dev *dev, uint16_t queue_id,
- struct rte_mempool *mbuf_pool,
- struct rte_mbuf **pkts, uint16_t count);
+uint16_t builtin_enqueue_pkts(struct vhost_dev *dev, uint16_t queue_id,
+ struct rte_mbuf **pkts, uint32_t count);
+uint16_t builtin_dequeue_pkts(struct vhost_dev *dev, uint16_t queue_id,
+ struct rte_mempool *mbuf_pool,
+ struct rte_mbuf **pkts, uint16_t count);
+uint16_t sync_enqueue_pkts(struct vhost_dev *dev, uint16_t queue_id,
+ struct rte_mbuf **pkts, uint32_t count);
+uint16_t sync_dequeue_pkts(struct vhost_dev *dev, uint16_t queue_id,
+ struct rte_mempool *mbuf_pool,
+ struct rte_mbuf **pkts, uint16_t count);
+uint16_t async_enqueue_pkts(struct vhost_dev *dev, uint16_t queue_id,
+ struct rte_mbuf **pkts, uint32_t count);
+uint16_t async_dequeue_pkts(struct vhost_dev *dev, uint16_t queue_id,
+ struct rte_mempool *mbuf_pool,
+ struct rte_mbuf **pkts, uint16_t count);
#endif /* _MAIN_H_ */
diff --git a/examples/vhost/virtio_net.c b/examples/vhost/virtio_net.c
index 9064fc3a82..2432a96566 100644
--- a/examples/vhost/virtio_net.c
+++ b/examples/vhost/virtio_net.c
@@ -238,6 +238,13 @@ vs_enqueue_pkts(struct vhost_dev *dev, uint16_t queue_id,
return count;
}
+uint16_t
+builtin_enqueue_pkts(struct vhost_dev *dev, uint16_t queue_id,
+ struct rte_mbuf **pkts, uint32_t count)
+{
+ return vs_enqueue_pkts(dev, queue_id, pkts, count);
+}
+
static __rte_always_inline int
dequeue_pkt(struct vhost_dev *dev, struct rte_vhost_vring *vr,
struct rte_mbuf *m, uint16_t desc_idx,
@@ -363,7 +370,7 @@ dequeue_pkt(struct vhost_dev *dev, struct rte_vhost_vring *vr,
return 0;
}
-uint16_t
+static uint16_t
vs_dequeue_pkts(struct vhost_dev *dev, uint16_t queue_id,
struct rte_mempool *mbuf_pool, struct rte_mbuf **pkts, uint16_t count)
{
@@ -440,3 +447,10 @@ vs_dequeue_pkts(struct vhost_dev *dev, uint16_t queue_id,
return i;
}
+
+uint16_t
+builtin_dequeue_pkts(struct vhost_dev *dev, uint16_t queue_id,
+ struct rte_mempool *mbuf_pool, struct rte_mbuf **pkts, uint16_t count)
+{
+ return vs_dequeue_pkts(dev, queue_id, mbuf_pool, pkts, count);
+}
--
2.25.1
* [dpdk-dev] [PATCH 3/4] examples/vhost: use a new API to query remaining ring space
2021-09-06 20:48 [dpdk-dev] [PATCH 0/4] support async dequeue for split ring Wenwu Ma
2021-09-06 20:48 ` [dpdk-dev] [PATCH 1/4] vhost: " Wenwu Ma
2021-09-06 20:48 ` [dpdk-dev] [PATCH 2/4] examples/vhost: refactor vhost enqueue and dequeue datapaths Wenwu Ma
@ 2021-09-06 20:48 ` Wenwu Ma
2021-09-10 7:38 ` Yang, YvonneX
2021-09-15 3:04 ` Xia, Chenbo
2021-09-06 20:48 ` [dpdk-dev] [PATCH 4/4] examples/vhost: support vhost async dequeue data path Wenwu Ma
` (3 subsequent siblings)
6 siblings, 2 replies; 28+ messages in thread
From: Wenwu Ma @ 2021-09-06 20:48 UTC (permalink / raw)
To: dev
Cc: maxime.coquelin, chenbo.xia, cheng1.jiang, jiayu.hu, Sunil.Pai.G,
yvonnex.yang, Wenwu Ma
A new API for querying the remaining descriptor ring capacity
is available, so use it instead of tracking the ring space in software.
Signed-off-by: Wenwu Ma <wenwux.ma@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
---
examples/vhost/ioat.c | 6 +-----
1 file changed, 1 insertion(+), 5 deletions(-)
diff --git a/examples/vhost/ioat.c b/examples/vhost/ioat.c
index 457f8171f0..6adc30b622 100644
--- a/examples/vhost/ioat.c
+++ b/examples/vhost/ioat.c
@@ -17,7 +17,6 @@ struct packet_tracker {
unsigned short next_read;
unsigned short next_write;
unsigned short last_remain;
- unsigned short ioat_space;
};
struct packet_tracker cb_tracker[MAX_VHOST_DEVICE];
@@ -113,7 +112,6 @@ open_ioat(const char *value)
goto out;
}
rte_rawdev_start(dev_id);
- cb_tracker[dev_id].ioat_space = IOAT_RING_SIZE - 1;
dma_info->nr++;
i++;
}
@@ -140,7 +138,7 @@ ioat_transfer_data_cb(int vid, uint16_t queue_id,
src = descs[i_desc].src;
dst = descs[i_desc].dst;
i_seg = 0;
- if (cb_tracker[dev_id].ioat_space < src->nr_segs)
+ if (rte_ioat_burst_capacity(dev_id) < src->nr_segs)
break;
while (i_seg < src->nr_segs) {
rte_ioat_enqueue_copy(dev_id,
@@ -155,7 +153,6 @@ ioat_transfer_data_cb(int vid, uint16_t queue_id,
}
write &= mask;
cb_tracker[dev_id].size_track[write] = src->nr_segs;
- cb_tracker[dev_id].ioat_space -= src->nr_segs;
write++;
}
} else {
@@ -194,7 +191,6 @@ ioat_check_completed_copies_cb(int vid, uint16_t queue_id,
if (n_seg == 0)
return 0;
- cb_tracker[dev_id].ioat_space += n_seg;
n_seg += cb_tracker[dev_id].last_remain;
read = cb_tracker[dev_id].next_read;
--
2.25.1
^ permalink raw reply [flat|nested] 28+ messages in thread
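The patch above drops the sample app's hand-maintained `ioat_space` counter in favor of asking the device directly via `rte_ioat_burst_capacity()`. The sketch below is a simplified stand-in (no DPDK; `fake_ioat`, `burst_capacity`, and `try_enqueue` are hypothetical names) that models why the two are equivalent: the remaining capacity is just the ring size minus one minus the in-flight operations, so the device can report it without the caller keeping its own tally.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for an IOAT ring.  Models the bookkeeping the
 * patch removes: capacity = ring size - 1 - in-flight segments, which
 * is what a query like rte_ioat_burst_capacity() can report itself. */

#define RING_SIZE 4096

struct fake_ioat {
	uint16_t enqueued;	/* segments submitted, not yet completed */
};

/* Models rte_ioat_burst_capacity(): how many more ops fit right now. */
static uint16_t burst_capacity(const struct fake_ioat *dev)
{
	return (uint16_t)(RING_SIZE - 1 - dev->enqueued);
}

static int try_enqueue(struct fake_ioat *dev, uint16_t nr_segs)
{
	if (burst_capacity(dev) < nr_segs)
		return -1;	/* same check the patched callback performs */
	dev->enqueued += nr_segs;
	return 0;
}

static void complete(struct fake_ioat *dev, uint16_t nr_segs)
{
	dev->enqueued -= nr_segs;	/* the old code added nr_segs back to
					 * ioat_space here; now there is
					 * nothing for the app to track */
}
```

With the query available, the completion callback no longer has to mirror the device state, which removes one source of drift between the app's counter and the hardware ring.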
* [dpdk-dev] [PATCH 4/4] examples/vhost: support vhost async dequeue data path
2021-09-06 20:48 [dpdk-dev] [PATCH 0/4] support async dequeue for split ring Wenwu Ma
` (2 preceding siblings ...)
2021-09-06 20:48 ` [dpdk-dev] [PATCH 3/4] examples/vhost: use a new API to query remaining ring space Wenwu Ma
@ 2021-09-06 20:48 ` Wenwu Ma
2021-09-10 7:39 ` Yang, YvonneX
2021-09-15 3:27 ` Xia, Chenbo
2021-09-10 7:33 ` [dpdk-dev] [PATCH 0/4] support async dequeue for split ring Yang, YvonneX
` (2 subsequent siblings)
6 siblings, 2 replies; 28+ messages in thread
From: Wenwu Ma @ 2021-09-06 20:48 UTC (permalink / raw)
To: dev
Cc: maxime.coquelin, chenbo.xia, cheng1.jiang, jiayu.hu, Sunil.Pai.G,
yvonnex.yang, Wenwu Ma
This patch adds the vhost async dequeue data path to the vhost sample.
The vswitch can leverage IOAT to accelerate the vhost async dequeue data path.
Signed-off-by: Wenwu Ma <wenwux.ma@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
---
doc/guides/sample_app_ug/vhost.rst | 9 +-
examples/vhost/ioat.c | 61 +++++++--
examples/vhost/ioat.h | 25 ++++
examples/vhost/main.c | 201 +++++++++++++++++++----------
examples/vhost/main.h | 3 +-
5 files changed, 216 insertions(+), 83 deletions(-)
diff --git a/doc/guides/sample_app_ug/vhost.rst b/doc/guides/sample_app_ug/vhost.rst
index 9afde9c7f5..63dcf181e1 100644
--- a/doc/guides/sample_app_ug/vhost.rst
+++ b/doc/guides/sample_app_ug/vhost.rst
@@ -169,9 +169,12 @@ demonstrates how to use the async vhost APIs. It's used in combination with dmas
**--dmas**
This parameter is used to specify the assigned DMA device of a vhost device.
Async vhost-user net driver will be used if --dmas is set. For example
---dmas [txd0@00:04.0,txd1@00:04.1] means use DMA channel 00:04.0 for vhost
-device 0 enqueue operation and use DMA channel 00:04.1 for vhost device 1
-enqueue operation.
+--dmas [txd0@00:04.0,txd1@00:04.1,rxd0@00:04.2,rxd1@00:04.3] means use
+DMA channel 00:04.0/00:04.2 for vhost device 0 enqueue/dequeue operation
+and use DMA channel 00:04.1/00:04.3 for vhost device 1 enqueue/dequeue
+operation. Device indexes correspond to socket files in order: vhost
+device 0 is created through the first socket file, vhost device 1
+through the second socket file, and so on.
Common Issues
-------------
diff --git a/examples/vhost/ioat.c b/examples/vhost/ioat.c
index 6adc30b622..540b61fff6 100644
--- a/examples/vhost/ioat.c
+++ b/examples/vhost/ioat.c
@@ -21,6 +21,8 @@ struct packet_tracker {
struct packet_tracker cb_tracker[MAX_VHOST_DEVICE];
+int vid2socketid[MAX_VHOST_DEVICE];
+
int
open_ioat(const char *value)
{
@@ -29,7 +31,7 @@ open_ioat(const char *value)
char *addrs = input;
char *ptrs[2];
char *start, *end, *substr;
- int64_t vid, vring_id;
+ int64_t socketid, vring_id;
struct rte_ioat_rawdev_config config;
struct rte_rawdev_info info = { .dev_private = &config };
char name[32];
@@ -60,6 +62,7 @@ open_ioat(const char *value)
goto out;
}
while (i < args_nr) {
+ bool is_txd;
char *arg_temp = dma_arg[i];
uint8_t sub_nr;
sub_nr = rte_strsplit(arg_temp, strlen(arg_temp), ptrs, 2, '@');
@@ -68,27 +71,39 @@ open_ioat(const char *value)
goto out;
}
- start = strstr(ptrs[0], "txd");
- if (start == NULL) {
+ int async_flag;
+ char *txd, *rxd;
+ txd = strstr(ptrs[0], "txd");
+ rxd = strstr(ptrs[0], "rxd");
+ if (txd) {
+ is_txd = true;
+ start = txd;
+ async_flag = ASYNC_ENQUEUE_VHOST;
+ } else if (rxd) {
+ is_txd = false;
+ start = rxd;
+ async_flag = ASYNC_DEQUEUE_VHOST;
+ } else {
ret = -1;
goto out;
}
start += 3;
- vid = strtol(start, &end, 0);
+ socketid = strtol(start, &end, 0);
if (end == start) {
ret = -1;
goto out;
}
- vring_id = 0 + VIRTIO_RXQ;
+ vring_id = is_txd ? VIRTIO_RXQ : VIRTIO_TXQ;
+
if (rte_pci_addr_parse(ptrs[1],
- &(dma_info + vid)->dmas[vring_id].addr) < 0) {
+ &(dma_info + socketid)->dmas[vring_id].addr) < 0) {
ret = -1;
goto out;
}
- rte_pci_device_name(&(dma_info + vid)->dmas[vring_id].addr,
+ rte_pci_device_name(&(dma_info + socketid)->dmas[vring_id].addr,
name, sizeof(name));
dev_id = rte_rawdev_get_dev_id(name);
if (dev_id == (uint16_t)(-ENODEV) ||
@@ -103,8 +118,9 @@ open_ioat(const char *value)
goto out;
}
- (dma_info + vid)->dmas[vring_id].dev_id = dev_id;
- (dma_info + vid)->dmas[vring_id].is_valid = true;
+ (dma_info + socketid)->dmas[vring_id].dev_id = dev_id;
+ (dma_info + socketid)->dmas[vring_id].is_valid = true;
+ (dma_info + socketid)->async_flag |= async_flag;
config.ring_size = IOAT_RING_SIZE;
config.hdls_disable = true;
if (rte_rawdev_configure(dev_id, &info, sizeof(config)) < 0) {
@@ -126,13 +142,16 @@ ioat_transfer_data_cb(int vid, uint16_t queue_id,
struct rte_vhost_async_status *opaque_data, uint16_t count)
{
uint32_t i_desc;
- uint16_t dev_id = dma_bind[vid].dmas[queue_id * 2 + VIRTIO_RXQ].dev_id;
struct rte_vhost_iov_iter *src = NULL;
struct rte_vhost_iov_iter *dst = NULL;
unsigned long i_seg;
unsigned short mask = MAX_ENQUEUED_SIZE - 1;
- unsigned short write = cb_tracker[dev_id].next_write;
+ if (queue_id >= MAX_RING_COUNT)
+ return -1;
+
+ uint16_t dev_id = dma_bind[vid2socketid[vid]].dmas[queue_id].dev_id;
+ unsigned short write = cb_tracker[dev_id].next_write;
if (!opaque_data) {
for (i_desc = 0; i_desc < count; i_desc++) {
src = descs[i_desc].src;
@@ -170,16 +189,16 @@ ioat_check_completed_copies_cb(int vid, uint16_t queue_id,
struct rte_vhost_async_status *opaque_data,
uint16_t max_packets)
{
- if (!opaque_data) {
+ if (!opaque_data && (queue_id < MAX_RING_COUNT)) {
uintptr_t dump[255];
int n_seg;
unsigned short read, write;
unsigned short nb_packet = 0;
unsigned short mask = MAX_ENQUEUED_SIZE - 1;
unsigned short i;
+ uint16_t dev_id;
- uint16_t dev_id = dma_bind[vid].dmas[queue_id * 2
- + VIRTIO_RXQ].dev_id;
+ dev_id = dma_bind[vid2socketid[vid]].dmas[queue_id].dev_id;
n_seg = rte_ioat_completed_ops(dev_id, 255, NULL, NULL, dump, dump);
if (n_seg < 0) {
RTE_LOG(ERR,
@@ -215,4 +234,18 @@ ioat_check_completed_copies_cb(int vid, uint16_t queue_id,
return -1;
}
+uint32_t get_async_flag_by_vid(int vid)
+{
+ return dma_bind[vid2socketid[vid]].async_flag;
+}
+
+uint32_t get_async_flag_by_socketid(int socketid)
+{
+ return dma_bind[socketid].async_flag;
+}
+
+void init_vid2socketid_array(int vid, int socketid)
+{
+ vid2socketid[vid] = socketid;
+}
#endif /* RTE_RAW_IOAT */
diff --git a/examples/vhost/ioat.h b/examples/vhost/ioat.h
index 62e163c585..fa5086e662 100644
--- a/examples/vhost/ioat.h
+++ b/examples/vhost/ioat.h
@@ -12,6 +12,9 @@
#define MAX_VHOST_DEVICE 1024
#define IOAT_RING_SIZE 4096
#define MAX_ENQUEUED_SIZE 4096
+#define MAX_RING_COUNT 2
+#define ASYNC_ENQUEUE_VHOST 1
+#define ASYNC_DEQUEUE_VHOST 2
struct dma_info {
struct rte_pci_addr addr;
@@ -20,6 +23,7 @@ struct dma_info {
};
struct dma_for_vhost {
+ int async_flag;
struct dma_info dmas[RTE_MAX_QUEUES_PER_PORT * 2];
uint16_t nr;
};
@@ -36,6 +40,10 @@ int32_t
ioat_check_completed_copies_cb(int vid, uint16_t queue_id,
struct rte_vhost_async_status *opaque_data,
uint16_t max_packets);
+
+uint32_t get_async_flag_by_vid(int vid);
+uint32_t get_async_flag_by_socketid(int socketid);
+void init_vid2socketid_array(int vid, int socketid);
#else
static int open_ioat(const char *value __rte_unused)
{
@@ -59,5 +67,22 @@ ioat_check_completed_copies_cb(int vid __rte_unused,
{
return -1;
}
+
+static uint32_t
+get_async_flag_by_vid(int vid __rte_unused)
+{
+ return 0;
+}
+
+static uint32_t
+get_async_flag_by_socketid(int socketid __rte_unused)
+{
+ return 0;
+}
+
+static void
+init_vid2socketid_array(int vid __rte_unused, int socketid __rte_unused)
+{
+}
#endif
#endif /* _IOAT_H_ */
diff --git a/examples/vhost/main.c b/examples/vhost/main.c
index e246b640ea..b34534111d 100644
--- a/examples/vhost/main.c
+++ b/examples/vhost/main.c
@@ -93,8 +93,6 @@ static int client_mode;
static int builtin_net_driver;
-static int async_vhost_driver;
-
static char *dma_type;
/* Specify timeout (in useconds) between retries on RX. */
@@ -679,7 +677,6 @@ us_vhost_parse_args(int argc, char **argv)
us_vhost_usage(prgname);
return -1;
}
- async_vhost_driver = 1;
break;
case OPT_CLIENT_NUM:
@@ -855,7 +852,8 @@ complete_async_pkts(struct vhost_dev *vdev)
VIRTIO_RXQ, p_cpl, MAX_PKT_BURST);
if (complete_count) {
free_pkts(p_cpl, complete_count);
- __atomic_sub_fetch(&vdev->pkts_inflight, complete_count, __ATOMIC_SEQ_CST);
+ __atomic_sub_fetch(&vdev->pkts_enq_inflight,
+ complete_count, __ATOMIC_SEQ_CST);
}
}
@@ -900,7 +898,7 @@ drain_vhost(struct vhost_dev *vdev)
__ATOMIC_SEQ_CST);
}
- if (!async_vhost_driver)
+ if ((get_async_flag_by_vid(vdev->vid) & ASYNC_ENQUEUE_VHOST) == 0)
free_pkts(m, nr_xmit);
}
@@ -1180,8 +1178,8 @@ async_enqueue_pkts(struct vhost_dev *vdev, uint16_t queue_id,
complete_async_pkts(vdev);
enqueue_count = rte_vhost_submit_enqueue_burst(vdev->vid,
queue_id, pkts, rx_count);
- __atomic_add_fetch(&vdev->pkts_inflight, enqueue_count,
- __ATOMIC_SEQ_CST);
+ __atomic_add_fetch(&vdev->pkts_enq_inflight,
+ enqueue_count, __ATOMIC_SEQ_CST);
enqueue_fail = rx_count - enqueue_count;
if (enqueue_fail)
@@ -1237,10 +1235,23 @@ drain_eth_rx(struct vhost_dev *vdev)
__ATOMIC_SEQ_CST);
}
- if (!async_vhost_driver)
+ if ((get_async_flag_by_vid(vdev->vid) & ASYNC_ENQUEUE_VHOST) == 0)
free_pkts(pkts, rx_count);
}
+uint16_t async_dequeue_pkts(struct vhost_dev *dev, uint16_t queue_id,
+ struct rte_mempool *mbuf_pool,
+ struct rte_mbuf **pkts, uint16_t count)
+{
+ int nr_inflight;
+ uint16_t dequeue_count;
+ dequeue_count = rte_vhost_async_try_dequeue_burst(dev->vid, queue_id,
+ mbuf_pool, pkts, count, &nr_inflight);
+ if (likely(nr_inflight != -1))
+ dev->pkts_deq_inflight = nr_inflight;
+ return dequeue_count;
+}
+
uint16_t sync_dequeue_pkts(struct vhost_dev *dev, uint16_t queue_id,
struct rte_mempool *mbuf_pool,
struct rte_mbuf **pkts, uint16_t count)
@@ -1336,6 +1347,32 @@ switch_worker(void *arg __rte_unused)
return 0;
}
+static void
+vhost_clear_queue_thread_unsafe(struct vhost_dev *vdev, uint16_t queue_id)
+{
+ uint16_t n_pkt = 0;
+ struct rte_mbuf *m_enq_cpl[vdev->pkts_enq_inflight];
+ struct rte_mbuf *m_deq_cpl[vdev->pkts_deq_inflight];
+
+ if ((queue_id % VIRTIO_QNUM) == 0) {
+ while (vdev->pkts_enq_inflight) {
+ n_pkt = rte_vhost_clear_queue_thread_unsafe(vdev->vid,
+ queue_id, m_enq_cpl, vdev->pkts_enq_inflight);
+ free_pkts(m_enq_cpl, n_pkt);
+ __atomic_sub_fetch(&vdev->pkts_enq_inflight,
+ n_pkt, __ATOMIC_SEQ_CST);
+ }
+ } else {
+ while (vdev->pkts_deq_inflight) {
+ n_pkt = rte_vhost_clear_queue_thread_unsafe(vdev->vid,
+ queue_id, m_deq_cpl, vdev->pkts_deq_inflight);
+ free_pkts(m_deq_cpl, n_pkt);
+ __atomic_sub_fetch(&vdev->pkts_deq_inflight,
+ n_pkt, __ATOMIC_SEQ_CST);
+ }
+ }
+}
+
/*
* Remove a device from the specific data core linked list and from the
* main linked list. Synchonization occurs through the use of the
@@ -1392,21 +1429,91 @@ destroy_device(int vid)
"(%d) device has been removed from data core\n",
vdev->vid);
- if (async_vhost_driver) {
- uint16_t n_pkt = 0;
- struct rte_mbuf *m_cpl[vdev->pkts_inflight];
+ if (get_async_flag_by_vid(vid) & ASYNC_ENQUEUE_VHOST) {
+ vhost_clear_queue_thread_unsafe(vdev, VIRTIO_RXQ);
+ rte_vhost_async_channel_unregister(vid, VIRTIO_RXQ);
+ }
+ if (get_async_flag_by_vid(vid) & ASYNC_DEQUEUE_VHOST) {
+ vhost_clear_queue_thread_unsafe(vdev, VIRTIO_TXQ);
+ rte_vhost_async_channel_unregister(vid, VIRTIO_TXQ);
+ }
+
+ rte_free(vdev);
+}
+
+static int
+get_socketid_by_vid(int vid)
+{
+ int i;
+ char ifname[PATH_MAX];
+ rte_vhost_get_ifname(vid, ifname, sizeof(ifname));
+
+ for (i = 0; i < nb_sockets; i++) {
+ char *file = socket_files + i * PATH_MAX;
+ if (strcmp(file, ifname) == 0)
+ return i;
+ }
+
+ return -1;
+}
+
+static int
+init_vhost_queue_ops(int vid)
+{
+ int socketid = get_socketid_by_vid(vid);
+ if (socketid == -1)
+ return -1;
+
+ init_vid2socketid_array(vid, socketid);
+ if (builtin_net_driver) {
+ vdev_queue_ops[vid].enqueue_pkt_burst = builtin_enqueue_pkts;
+ vdev_queue_ops[vid].dequeue_pkt_burst = builtin_dequeue_pkts;
+ } else {
+ if (get_async_flag_by_vid(vid) & ASYNC_ENQUEUE_VHOST) {
+ vdev_queue_ops[vid].enqueue_pkt_burst =
+ async_enqueue_pkts;
+ } else {
+ vdev_queue_ops[vid].enqueue_pkt_burst =
+ sync_enqueue_pkts;
+ }
- while (vdev->pkts_inflight) {
- n_pkt = rte_vhost_clear_queue_thread_unsafe(vid, VIRTIO_RXQ,
- m_cpl, vdev->pkts_inflight);
- free_pkts(m_cpl, n_pkt);
- __atomic_sub_fetch(&vdev->pkts_inflight, n_pkt, __ATOMIC_SEQ_CST);
+ if (get_async_flag_by_vid(vid) & ASYNC_DEQUEUE_VHOST) {
+ vdev_queue_ops[vid].dequeue_pkt_burst =
+ async_dequeue_pkts;
+ } else {
+ vdev_queue_ops[vid].dequeue_pkt_burst =
+ sync_dequeue_pkts;
}
+ }
- rte_vhost_async_channel_unregister(vid, VIRTIO_RXQ);
+ return 0;
+}
+
+static int
+vhost_async_channel_register(int vid)
+{
+ int ret = 0;
+ struct rte_vhost_async_config config = {0};
+ struct rte_vhost_async_channel_ops channel_ops;
+
+ if (dma_type != NULL && strncmp(dma_type, "ioat", 4) == 0) {
+ channel_ops.transfer_data = ioat_transfer_data_cb;
+ channel_ops.check_completed_copies =
+ ioat_check_completed_copies_cb;
+
+ config.features = RTE_VHOST_ASYNC_INORDER;
+
+ if (get_async_flag_by_vid(vid) & ASYNC_ENQUEUE_VHOST) {
+ ret |= rte_vhost_async_channel_register(vid, VIRTIO_RXQ,
+ config, &channel_ops);
+ }
+ if (get_async_flag_by_vid(vid) & ASYNC_DEQUEUE_VHOST) {
+ ret |= rte_vhost_async_channel_register(vid, VIRTIO_TXQ,
+ config, &channel_ops);
+ }
}
- rte_free(vdev);
+ return ret;
}
/*
@@ -1442,20 +1549,8 @@ new_device(int vid)
}
}
- if (builtin_net_driver) {
- vdev_queue_ops[vid].enqueue_pkt_burst = builtin_enqueue_pkts;
- vdev_queue_ops[vid].dequeue_pkt_burst = builtin_dequeue_pkts;
- } else {
- if (async_vhost_driver) {
- vdev_queue_ops[vid].enqueue_pkt_burst =
- async_enqueue_pkts;
- } else {
- vdev_queue_ops[vid].enqueue_pkt_burst =
- sync_enqueue_pkts;
- }
-
- vdev_queue_ops[vid].dequeue_pkt_burst = sync_dequeue_pkts;
- }
+ if (init_vhost_queue_ops(vid) != 0)
+ return -1;
if (builtin_net_driver)
vs_vhost_net_setup(vdev);
@@ -1484,27 +1579,13 @@ new_device(int vid)
rte_vhost_enable_guest_notification(vid, VIRTIO_RXQ, 0);
rte_vhost_enable_guest_notification(vid, VIRTIO_TXQ, 0);
+ int ret = vhost_async_channel_register(vid);
+
RTE_LOG(INFO, VHOST_DATA,
"(%d) device has been added to data core %d\n",
vid, vdev->coreid);
- if (async_vhost_driver) {
- struct rte_vhost_async_config config = {0};
- struct rte_vhost_async_channel_ops channel_ops;
-
- if (dma_type != NULL && strncmp(dma_type, "ioat", 4) == 0) {
- channel_ops.transfer_data = ioat_transfer_data_cb;
- channel_ops.check_completed_copies =
- ioat_check_completed_copies_cb;
-
- config.features = RTE_VHOST_ASYNC_INORDER;
-
- return rte_vhost_async_channel_register(vid, VIRTIO_RXQ,
- config, &channel_ops);
- }
- }
-
- return 0;
+ return ret;
}
static int
@@ -1522,19 +1603,8 @@ vring_state_changed(int vid, uint16_t queue_id, int enable)
if (queue_id != VIRTIO_RXQ)
return 0;
- if (async_vhost_driver) {
- if (!enable) {
- uint16_t n_pkt = 0;
- struct rte_mbuf *m_cpl[vdev->pkts_inflight];
-
- while (vdev->pkts_inflight) {
- n_pkt = rte_vhost_clear_queue_thread_unsafe(vid, queue_id,
- m_cpl, vdev->pkts_inflight);
- free_pkts(m_cpl, n_pkt);
- __atomic_sub_fetch(&vdev->pkts_inflight, n_pkt, __ATOMIC_SEQ_CST);
- }
- }
- }
+ if (!enable)
+ vhost_clear_queue_thread_unsafe(vdev, queue_id);
return 0;
}
@@ -1778,10 +1848,11 @@ main(int argc, char *argv[])
for (i = 0; i < nb_sockets; i++) {
char *file = socket_files + i * PATH_MAX;
- if (async_vhost_driver)
- flags = flags | RTE_VHOST_USER_ASYNC_COPY;
+ uint64_t flag = flags;
+ if (get_async_flag_by_socketid(i) != 0)
+ flag |= RTE_VHOST_USER_ASYNC_COPY;
- ret = rte_vhost_driver_register(file, flags);
+ ret = rte_vhost_driver_register(file, flag);
if (ret != 0) {
unregister_drivers(i);
rte_exit(EXIT_FAILURE,
diff --git a/examples/vhost/main.h b/examples/vhost/main.h
index 948a23efa6..5af7e7d97f 100644
--- a/examples/vhost/main.h
+++ b/examples/vhost/main.h
@@ -51,7 +51,8 @@ struct vhost_dev {
uint64_t features;
size_t hdr_len;
uint16_t nr_vrings;
- uint16_t pkts_inflight;
+ uint16_t pkts_enq_inflight;
+ uint16_t pkts_deq_inflight;
struct rte_vhost_memory *mem;
struct device_statistics stats;
TAILQ_ENTRY(vhost_dev) global_vdev_entry;
--
2.25.1
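The `open_ioat()` changes above key the `--dmas` entries by direction prefix: `txdN` binds a DMA channel to socket N's enqueue (guest RX) ring and `rxdN` to its dequeue (guest TX) ring. The following is a minimal, self-contained sketch of that prefix parsing; the flag values and `VIRTIO_RXQ`/`VIRTIO_TXQ` indexes mirror the patch, while `dma_request` and `parse_dma_prefix` are simplified stand-ins, not the actual sample-app code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Constants mirroring the patch; the rest is a simplified stand-in. */
#define ASYNC_ENQUEUE_VHOST 1
#define ASYNC_DEQUEUE_VHOST 2
#define VIRTIO_RXQ 0
#define VIRTIO_TXQ 1

struct dma_request {
	long socketid;
	int vring_id;
	int async_flag;
};

/* Parse the part before '@' in a --dmas entry, e.g. "txd0" or "rxd1".
 * Returns 0 on success, -1 on a malformed token. */
static int parse_dma_prefix(const char *token, struct dma_request *req)
{
	const char *txd = strstr(token, "txd");
	const char *rxd = strstr(token, "rxd");
	const char *start;
	char *end;
	bool is_txd;

	if (txd) {
		is_txd = true;
		start = txd;
		req->async_flag = ASYNC_ENQUEUE_VHOST;
	} else if (rxd) {
		is_txd = false;
		start = rxd;
		req->async_flag = ASYNC_DEQUEUE_VHOST;
	} else {
		return -1;
	}

	start += 3;			/* skip "txd"/"rxd" */
	req->socketid = strtol(start, &end, 0);
	if (end == start)
		return -1;		/* no digits after the prefix */

	/* txd feeds the guest RX ring; rxd drains the guest TX ring. */
	req->vring_id = is_txd ? VIRTIO_RXQ : VIRTIO_TXQ;
	return 0;
}
```

Note the number after the prefix is now a *socket* index rather than a vid, which is why the patch also introduces the `vid2socketid` mapping filled in at `new_device()` time.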
* Re: [dpdk-dev] [PATCH 0/4] support async dequeue for split ring
2021-09-06 20:48 [dpdk-dev] [PATCH 0/4] support async dequeue for split ring Wenwu Ma
` (3 preceding siblings ...)
2021-09-06 20:48 ` [dpdk-dev] [PATCH 4/4] examples/vhost: support vhost async dequeue data path Wenwu Ma
@ 2021-09-10 7:33 ` Yang, YvonneX
2021-09-17 19:26 ` [dpdk-dev] [PATCH v2 " Wenwu Ma
2021-09-28 18:56 ` [dpdk-dev] [PATCH v3 0/4] support async dequeue for split ring Wenwu Ma
6 siblings, 0 replies; 28+ messages in thread
From: Yang, YvonneX @ 2021-09-10 7:33 UTC (permalink / raw)
To: Ma, WenwuX, dev
Cc: maxime.coquelin, Xia, Chenbo, Jiang, Cheng1, Hu, Jiayu, Pai G, Sunil
> -----Original Message-----
> From: Ma, WenwuX <wenwux.ma@intel.com>
> Sent: Tuesday, September 7, 2021 4:49 AM
> To: dev@dpdk.org
> Cc: maxime.coquelin@redhat.com; Xia, Chenbo <chenbo.xia@intel.com>;
> Jiang, Cheng1 <cheng1.jiang@intel.com>; Hu, Jiayu <jiayu.hu@intel.com>;
> Pai G, Sunil <sunil.pai.g@intel.com>; Yang, YvonneX
> <yvonnex.yang@intel.com>; Ma, WenwuX <wenwux.ma@intel.com>
> Subject: [PATCH 0/4] support async dequeue for split ring
>
> This patch implements asynchronous dequeue data path for split ring.
> A new asynchronous dequeue function is introduced. With this function, the
> application can try to receive packets from the guest with offloading copies
> to the DMA engine, thus saving precious CPU cycles.
>
> note: PATCH 2/4 depends on vhost patch from Jiayu Hu
> (http://patches.dpdk.org/project/dpdk/patch/1629463466-450012-1-git-
> send-email-jiayu.hu@intel.com/)
>
> Wenwu Ma (3):
> examples/vhost: refactor vhost enqueue and dequeue datapaths
> examples/vhost: use a new API to query remaining ring space
> examples/vhost: support vhost async dequeue data path
>
> Yuan Wang (1):
> vhost: support async dequeue for split ring
>
> doc/guides/prog_guide/vhost_lib.rst | 9 +
> doc/guides/sample_app_ug/vhost.rst | 9 +-
> examples/vhost/ioat.c | 67 +++-
> examples/vhost/ioat.h | 25 ++
> examples/vhost/main.c | 269 +++++++++-----
> examples/vhost/main.h | 34 +-
> examples/vhost/virtio_net.c | 16 +-
> lib/vhost/rte_vhost_async.h | 36 +-
> lib/vhost/version.map | 3 +
> lib/vhost/vhost.h | 3 +-
> lib/vhost/virtio_net.c | 531 ++++++++++++++++++++++++++++
> 11 files changed, 881 insertions(+), 121 deletions(-)
>
> --
> 2.25.1
Tested-by: Yvonne Yang <yvonnex.yang@intel.com>
* Re: [dpdk-dev] [PATCH 1/4] vhost: support async dequeue for split ring
2021-09-06 20:48 ` [dpdk-dev] [PATCH 1/4] vhost: " Wenwu Ma
@ 2021-09-10 7:36 ` Yang, YvonneX
2021-09-15 2:51 ` Xia, Chenbo
1 sibling, 0 replies; 28+ messages in thread
From: Yang, YvonneX @ 2021-09-10 7:36 UTC (permalink / raw)
To: Ma, WenwuX, dev
Cc: maxime.coquelin, Xia, Chenbo, Jiang, Cheng1, Hu, Jiayu, Pai G,
Sunil, Wang, YuanX, Wang, Yinan
> -----Original Message-----
> From: Ma, WenwuX <wenwux.ma@intel.com>
> Sent: Tuesday, September 7, 2021 4:49 AM
> To: dev@dpdk.org
> Cc: maxime.coquelin@redhat.com; Xia, Chenbo <chenbo.xia@intel.com>;
> Jiang, Cheng1 <cheng1.jiang@intel.com>; Hu, Jiayu <jiayu.hu@intel.com>;
> Pai G, Sunil <sunil.pai.g@intel.com>; Yang, YvonneX
> <yvonnex.yang@intel.com>; Wang, YuanX <yuanx.wang@intel.com>; Ma,
> WenwuX <wenwux.ma@intel.com>; Wang, Yinan <yinan.wang@intel.com>
> Subject: [PATCH 1/4] vhost: support async dequeue for split ring
>
> From: Yuan Wang <yuanx.wang@intel.com>
>
> This patch implements asynchronous dequeue data path for split ring.
> A new asynchronous dequeue function is introduced. With this function, the
> application can try to receive packets from the guest with offloading copies
> to the async channel, thus saving precious CPU cycles.
>
> Signed-off-by: Yuan Wang <yuanx.wang@intel.com>
> Signed-off-by: Jiayu Hu <jiayu.hu@intel.com>
> Signed-off-by: Wenwu Ma <wenwux.ma@intel.com>
> Tested-by: Yinan Wang <yinan.wang@intel.com>
> ---
> doc/guides/prog_guide/vhost_lib.rst | 9 +
> lib/vhost/rte_vhost_async.h | 36 +-
> lib/vhost/version.map | 3 +
> lib/vhost/vhost.h | 3 +-
> lib/vhost/virtio_net.c | 531 ++++++++++++++++++++++++++++
> 5 files changed, 579 insertions(+), 3 deletions(-)
>
Tested-by: Yvonne Yang <yvonnex.yang@intel.com>
* Re: [dpdk-dev] [PATCH 2/4] examples/vhost: refactor vhost enqueue and dequeue datapaths
2021-09-06 20:48 ` [dpdk-dev] [PATCH 2/4] examples/vhost: refactor vhost enqueue and dequeue datapaths Wenwu Ma
@ 2021-09-10 7:38 ` Yang, YvonneX
2021-09-15 3:02 ` Xia, Chenbo
1 sibling, 0 replies; 28+ messages in thread
From: Yang, YvonneX @ 2021-09-10 7:38 UTC (permalink / raw)
To: Ma, WenwuX, dev
Cc: maxime.coquelin, Xia, Chenbo, Jiang, Cheng1, Hu, Jiayu, Pai G, Sunil
> -----Original Message-----
> From: Ma, WenwuX <wenwux.ma@intel.com>
> Sent: Tuesday, September 7, 2021 4:49 AM
> To: dev@dpdk.org
> Cc: maxime.coquelin@redhat.com; Xia, Chenbo <chenbo.xia@intel.com>;
> Jiang, Cheng1 <cheng1.jiang@intel.com>; Hu, Jiayu <jiayu.hu@intel.com>;
> Pai G, Sunil <sunil.pai.g@intel.com>; Yang, YvonneX
> <yvonnex.yang@intel.com>; Ma, WenwuX <wenwux.ma@intel.com>
> Subject: [PATCH 2/4] examples/vhost: refactor vhost enqueue and dequeue
> datapaths
>
> Previously, by judging the flag, we call different enqueue/dequeue functions
> in data path.
>
> Now, we use an ops that was initialized when Vhost was created, so that we
> can call ops directly in Vhost data path without any more flag judgment.
>
> Signed-off-by: Wenwu Ma <wenwux.ma@intel.com>
> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
> ---
> examples/vhost/main.c | 100 +++++++++++++++++++++---------------
> examples/vhost/main.h | 31 +++++++++--
> examples/vhost/virtio_net.c | 16 +++++-
> 3 files changed, 101 insertions(+), 46 deletions(-)
>
Tested-by: Yvonne Yang <yvonnex.yang@intel.com>
* Re: [dpdk-dev] [PATCH 3/4] examples/vhost: use a new API to query remaining ring space
2021-09-06 20:48 ` [dpdk-dev] [PATCH 3/4] examples/vhost: use a new API to query remaining ring space Wenwu Ma
@ 2021-09-10 7:38 ` Yang, YvonneX
2021-09-15 3:04 ` Xia, Chenbo
1 sibling, 0 replies; 28+ messages in thread
From: Yang, YvonneX @ 2021-09-10 7:38 UTC (permalink / raw)
To: Ma, WenwuX, dev
Cc: maxime.coquelin, Xia, Chenbo, Jiang, Cheng1, Hu, Jiayu, Pai G, Sunil
> -----Original Message-----
> From: Ma, WenwuX <wenwux.ma@intel.com>
> Sent: Tuesday, September 7, 2021 4:49 AM
> To: dev@dpdk.org
> Cc: maxime.coquelin@redhat.com; Xia, Chenbo <chenbo.xia@intel.com>;
> Jiang, Cheng1 <cheng1.jiang@intel.com>; Hu, Jiayu <jiayu.hu@intel.com>;
> Pai G, Sunil <sunil.pai.g@intel.com>; Yang, YvonneX
> <yvonnex.yang@intel.com>; Ma, WenwuX <wenwux.ma@intel.com>
> Subject: [PATCH 3/4] examples/vhost: use a new API to query remaining ring
> space
>
> A new API for querying the remaining descriptor ring capacity is available, so
> we use the new one instead of the old one.
>
> Signed-off-by: Wenwu Ma <wenwux.ma@intel.com>
> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
> ---
> examples/vhost/ioat.c | 6 +-----
> 1 file changed, 1 insertion(+), 5 deletions(-)
>
Tested-by: Yvonne Yang <yvonnex.yang@intel.com>
* Re: [dpdk-dev] [PATCH 4/4] examples/vhost: support vhost async dequeue data path
2021-09-06 20:48 ` [dpdk-dev] [PATCH 4/4] examples/vhost: support vhost async dequeue data path Wenwu Ma
@ 2021-09-10 7:39 ` Yang, YvonneX
2021-09-15 3:27 ` Xia, Chenbo
1 sibling, 0 replies; 28+ messages in thread
From: Yang, YvonneX @ 2021-09-10 7:39 UTC (permalink / raw)
To: Ma, WenwuX, dev
Cc: maxime.coquelin, Xia, Chenbo, Jiang, Cheng1, Hu, Jiayu, Pai G, Sunil
> -----Original Message-----
> From: Ma, WenwuX <wenwux.ma@intel.com>
> Sent: Tuesday, September 7, 2021 4:49 AM
> To: dev@dpdk.org
> Cc: maxime.coquelin@redhat.com; Xia, Chenbo <chenbo.xia@intel.com>;
> Jiang, Cheng1 <cheng1.jiang@intel.com>; Hu, Jiayu <jiayu.hu@intel.com>;
> Pai G, Sunil <sunil.pai.g@intel.com>; Yang, YvonneX
> <yvonnex.yang@intel.com>; Ma, WenwuX <wenwux.ma@intel.com>
> Subject: [PATCH 4/4] examples/vhost: support vhost async dequeue data
> path
>
> This patch is to add vhost async dequeue data-path in vhost sample.
> vswitch can leverage IOAT to accelerate vhost async dequeue data-path.
>
> Signed-off-by: Wenwu Ma <wenwux.ma@intel.com>
> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
> ---
> doc/guides/sample_app_ug/vhost.rst | 9 +-
> examples/vhost/ioat.c | 61 +++++++--
> examples/vhost/ioat.h | 25 ++++
> examples/vhost/main.c | 201 +++++++++++++++++++----------
> examples/vhost/main.h | 3 +-
> 5 files changed, 216 insertions(+), 83 deletions(-)
>
Tested-by: Yvonne Yang <yvonnex.yang@intel.com>
* Re: [dpdk-dev] [PATCH 1/4] vhost: support async dequeue for split ring
2021-09-06 20:48 ` [dpdk-dev] [PATCH 1/4] vhost: " Wenwu Ma
2021-09-10 7:36 ` Yang, YvonneX
@ 2021-09-15 2:51 ` Xia, Chenbo
[not found] ` <CO1PR11MB4897F3D5ABDE7133DB99791385DB9@CO1PR11MB4897.namprd11.prod.outlook.com>
1 sibling, 1 reply; 28+ messages in thread
From: Xia, Chenbo @ 2021-09-15 2:51 UTC (permalink / raw)
To: Ma, WenwuX, dev
Cc: maxime.coquelin, Jiang, Cheng1, Hu, Jiayu, Pai G, Sunil, Yang,
YvonneX, Wang, YuanX, Wang, Yinan
Hi,
> -----Original Message-----
> From: Ma, WenwuX <wenwux.ma@intel.com>
> Sent: Tuesday, September 7, 2021 4:49 AM
> To: dev@dpdk.org
> Cc: maxime.coquelin@redhat.com; Xia, Chenbo <chenbo.xia@intel.com>; Jiang,
> Cheng1 <cheng1.jiang@intel.com>; Hu, Jiayu <jiayu.hu@intel.com>; Pai G, Sunil
> <sunil.pai.g@intel.com>; Yang, YvonneX <yvonnex.yang@intel.com>; Wang, YuanX
> <yuanx.wang@intel.com>; Ma, WenwuX <wenwux.ma@intel.com>; Wang, Yinan
> <yinan.wang@intel.com>
> Subject: [PATCH 1/4] vhost: support async dequeue for split ring
>
> From: Yuan Wang <yuanx.wang@intel.com>
>
> This patch implements asynchronous dequeue data path for split ring.
> A new asynchronous dequeue function is introduced. With this function,
> the application can try to receive packets from the guest with
> offloading copies to the async channel, thus saving precious CPU
> cycles.
>
> Signed-off-by: Yuan Wang <yuanx.wang@intel.com>
> Signed-off-by: Jiayu Hu <jiayu.hu@intel.com>
> Signed-off-by: Wenwu Ma <wenwux.ma@intel.com>
> Tested-by: Yinan Wang <yinan.wang@intel.com>
> ---
> doc/guides/prog_guide/vhost_lib.rst | 9 +
> lib/vhost/rte_vhost_async.h | 36 +-
> lib/vhost/version.map | 3 +
> lib/vhost/vhost.h | 3 +-
> lib/vhost/virtio_net.c | 531 ++++++++++++++++++++++++++++
> 5 files changed, 579 insertions(+), 3 deletions(-)
>
> diff --git a/doc/guides/prog_guide/vhost_lib.rst
> b/doc/guides/prog_guide/vhost_lib.rst
> index 171e0096f6..9ed544db7a 100644
> --- a/doc/guides/prog_guide/vhost_lib.rst
> +++ b/doc/guides/prog_guide/vhost_lib.rst
> @@ -303,6 +303,15 @@ The following is an overview of some key Vhost API
> functions:
> Clear inflight packets which are submitted to DMA engine in vhost async
> data
> path. Completed packets are returned to applications through ``pkts``.
>
> +* ``rte_vhost_async_try_dequeue_burst(vid, queue_id, mbuf_pool, pkts, count,
> nr_inflight)``
> +
> + This function tries to receive packets from the guest with offloading
> + copies to the async channel. The packets that are transfer completed
> + are returned in ``pkts``. The other packets that their copies are submitted
> + to the async channel but not completed are called "in-flight packets".
> + This function will not return in-flight packets until their copies are
> + completed by the async channel.
> +
> Vhost-user Implementations
> --------------------------
>
> diff --git a/lib/vhost/rte_vhost_async.h b/lib/vhost/rte_vhost_async.h
> index ad71555a7f..5e2429ab70 100644
> --- a/lib/vhost/rte_vhost_async.h
> +++ b/lib/vhost/rte_vhost_async.h
> @@ -83,12 +83,18 @@ struct rte_vhost_async_channel_ops {
> uint16_t max_packets);
> };
>
> +struct async_nethdr {
> + struct virtio_net_hdr hdr;
> + bool valid;
> +};
> +
As a struct exposed in public headers, it's better to prefix it with rte_.
In this case I would prefer rte_async_net_hdr.
> /**
> - * inflight async packet information
> + * in-flight async packet information
> */
> struct async_inflight_info {
Could you help to rename it too? Like rte_async_inflight_info.
> struct rte_mbuf *mbuf;
> - uint16_t descs; /* num of descs inflight */
> + struct async_nethdr nethdr;
> + uint16_t descs; /* num of descs in-flight */
> uint16_t nr_buffers; /* num of buffers inflight for packed ring */
> };
>
> @@ -255,5 +261,31 @@ int rte_vhost_async_get_inflight(int vid, uint16_t
> queue_id);
> __rte_experimental
> uint16_t rte_vhost_clear_queue_thread_unsafe(int vid, uint16_t queue_id,
> struct rte_mbuf **pkts, uint16_t count);
> +/**
> + * This function tries to receive packets from the guest with offloading
> + * copies to the async channel. The packets that are transfer completed
> + * are returned in "pkts". The other packets that their copies are submitted
> to
> + * the async channel but not completed are called "in-flight packets".
> + * This function will not return in-flight packets until their copies are
> + * completed by the async channel.
> + *
> + * @param vid
> + * id of vhost device to dequeue data
> + * @param queue_id
> + * queue id to dequeue data
Param mbuf_pool is missed.
> + * @param pkts
> + * blank array to keep successfully dequeued packets
> + * @param count
> + * size of the packet array
> + * @param nr_inflight
> + * the amount of in-flight packets. If error occurred, its value is set to -
> 1.
> + * @return
> + * num of successfully dequeued packets
> + */
> +__rte_experimental
> +uint16_t
> +rte_vhost_async_try_dequeue_burst(int vid, uint16_t queue_id,
> + struct rte_mempool *mbuf_pool, struct rte_mbuf **pkts, uint16_t count,
> + int *nr_inflight);
>
> #endif /* _RTE_VHOST_ASYNC_H_ */
> diff --git a/lib/vhost/version.map b/lib/vhost/version.map
> index c92a9d4962..1e033ad8e2 100644
> --- a/lib/vhost/version.map
> +++ b/lib/vhost/version.map
> @@ -85,4 +85,7 @@ EXPERIMENTAL {
> rte_vhost_async_channel_register_thread_unsafe;
> rte_vhost_async_channel_unregister_thread_unsafe;
> rte_vhost_clear_queue_thread_unsafe;
> +
> + # added in 21.11
> + rte_vhost_async_try_dequeue_burst;
> };
> diff --git a/lib/vhost/vhost.h b/lib/vhost/vhost.h
> index 1e56311725..89a31e4ca8 100644
> --- a/lib/vhost/vhost.h
> +++ b/lib/vhost/vhost.h
> @@ -49,7 +49,8 @@
[...]
> +uint16_t
> +rte_vhost_async_try_dequeue_burst(int vid, uint16_t queue_id,
> + struct rte_mempool *mbuf_pool, struct rte_mbuf **pkts, uint16_t count,
> + int *nr_inflight)
> +{
> + struct virtio_net *dev;
> + struct rte_mbuf *rarp_mbuf = NULL;
> + struct vhost_virtqueue *vq;
> + int16_t success = 1;
> +
> + *nr_inflight = -1;
> +
> + dev = get_device(vid);
> + if (!dev)
> + return 0;
> +
> + if (unlikely(!(dev->flags & VIRTIO_DEV_BUILTIN_VIRTIO_NET))) {
> + VHOST_LOG_DATA(ERR,
> + "(%d) %s: built-in vhost net backend is disabled.\n",
> + dev->vid, __func__);
> + return 0;
> + }
> +
> + if (unlikely(!is_valid_virt_queue_idx(queue_id, 1, dev->nr_vring))) {
> + VHOST_LOG_DATA(ERR,
> + "(%d) %s: invalid virtqueue idx %d.\n",
> + dev->vid, __func__, queue_id);
> + return 0;
> + }
> +
> + vq = dev->virtqueue[queue_id];
> +
> + if (unlikely(rte_spinlock_trylock(&vq->access_lock) == 0))
> + return 0;
> +
> + if (unlikely(vq->enabled == 0)) {
> + count = 0;
> + goto out_access_unlock;
> + }
> +
> + if (unlikely(!vq->async_registered)) {
> + VHOST_LOG_DATA(ERR, "(%d) %s: async not registered for queue id %d.\n",
> + dev->vid, __func__, queue_id);
> + count = 0;
> + goto out_access_unlock;
> + }
> +
> + if (dev->features & (1ULL << VIRTIO_F_IOMMU_PLATFORM))
> + vhost_user_iotlb_rd_lock(vq);
> +
> + if (unlikely(vq->access_ok == 0))
> + if (unlikely(vring_translate(dev, vq) < 0)) {
> + count = 0;
> + goto out_access_unlock;
> + }
> +
> + /*
> + * Construct a RARP broadcast packet, and inject it to the "pkts"
> + * array, to make it look like the guest actually sent such a packet.
> + *
> + * Check user_send_rarp() for more information.
> + *
> + * broadcast_rarp shares a cacheline in the virtio_net structure
> + * with some fields that are accessed during enqueue and
> + * __atomic_compare_exchange_n causes a write if performed compare
> + * and exchange. This could result in false sharing between enqueue
> + * and dequeue.
> + *
> + * Prevent unnecessary false sharing by reading broadcast_rarp first
> + * and only performing compare and exchange if the read indicates it
> + * is likely to be set.
> + */
> + if (unlikely(__atomic_load_n(&dev->broadcast_rarp, __ATOMIC_ACQUIRE) &&
> + __atomic_compare_exchange_n(&dev->broadcast_rarp,
> + &success, 0, 0, __ATOMIC_RELEASE, __ATOMIC_RELAXED))) {
> +
> + rarp_mbuf = rte_net_make_rarp_packet(mbuf_pool, &dev->mac);
> + if (rarp_mbuf == NULL) {
> + VHOST_LOG_DATA(ERR, "Failed to make RARP packet.\n");
> + count = 0;
> + goto out;
> + }
> + count -= 1;
> + }
> +
> + if (unlikely(vq_is_packed(dev)))
> + return 0;
Should add a log here.
Thanks,
Chenbo
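For illustration, the missing log for the packed-ring early return might look something like the sketch below. The `VHOST_LOG_DATA` stub and the helper name are hypothetical stand-ins so the snippet is self-contained; they are not the actual patch code.

```c
#include <stdbool.h>
#include <stdio.h>

/* Stand-in for DPDK's VHOST_LOG_DATA macro (the real one lives in
 * lib/vhost/vhost.h); defined here only to keep the sketch self-contained. */
#define VHOST_LOG_DATA(level, fmt, ...) \
	fprintf(stderr, "VHOST_DATA: " fmt, ##__VA_ARGS__)

/* Sketch of the guarded early return: packed ring is not supported by
 * this async dequeue path, so bail out with an explanatory message
 * instead of silently returning 0 packets. Returns 0 when rejected. */
static inline unsigned short
async_dequeue_guard_packed(int vid, bool is_packed)
{
	if (is_packed) {
		VHOST_LOG_DATA(ERR,
			"(%d) async dequeue does not support packed ring.\n",
			vid);
		return 0;
	}
	return 1;
}
```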
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [dpdk-dev] [PATCH 2/4] examples/vhost: refactor vhost enqueue and dequeue datapaths
2021-09-06 20:48 ` [dpdk-dev] [PATCH 2/4] examples/vhost: refactor vhost enqueue and dequeue datapaths Wenwu Ma
2021-09-10 7:38 ` Yang, YvonneX
@ 2021-09-15 3:02 ` Xia, Chenbo
1 sibling, 0 replies; 28+ messages in thread
From: Xia, Chenbo @ 2021-09-15 3:02 UTC (permalink / raw)
To: Ma, WenwuX, dev
Cc: maxime.coquelin, Jiang, Cheng1, Hu, Jiayu, Pai G, Sunil, Yang, YvonneX
Hi,
> -----Original Message-----
> From: Ma, WenwuX <wenwux.ma@intel.com>
> Sent: Tuesday, September 7, 2021 4:49 AM
> To: dev@dpdk.org
> Cc: maxime.coquelin@redhat.com; Xia, Chenbo <chenbo.xia@intel.com>; Jiang,
> Cheng1 <cheng1.jiang@intel.com>; Hu, Jiayu <jiayu.hu@intel.com>; Pai G, Sunil
> <sunil.pai.g@intel.com>; Yang, YvonneX <yvonnex.yang@intel.com>; Ma, WenwuX
> <wenwux.ma@intel.com>
> Subject: [PATCH 2/4] examples/vhost: refactor vhost enqueue and dequeue
> datapaths
>
> Previously, we checked a flag to decide which enqueue/dequeue
> functions to call in the data path.
>
> Now, we use an ops struct that is initialized when the vhost device
> is created, so that we can call the ops directly in the vhost data
> path without any further flag checks.
>
> Signed-off-by: Wenwu Ma <wenwux.ma@intel.com>
> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
> ---
> examples/vhost/main.c | 100 +++++++++++++++++++++---------------
> examples/vhost/main.h | 31 +++++++++--
> examples/vhost/virtio_net.c | 16 +++++-
> 3 files changed, 101 insertions(+), 46 deletions(-)
>
> diff --git a/examples/vhost/main.c b/examples/vhost/main.c
> index a4a8214e05..e246b640ea 100644
> --- a/examples/vhost/main.c
> +++ b/examples/vhost/main.c
> @@ -106,6 +106,8 @@ static uint32_t burst_rx_retry_num = BURST_RX_RETRIES;
> static char *socket_files;
[...]
>
> @@ -87,7 +100,19 @@ void vs_vhost_net_remove(struct vhost_dev *dev);
> uint16_t vs_enqueue_pkts(struct vhost_dev *dev, uint16_t queue_id,
> struct rte_mbuf **pkts, uint32_t count);
>
> -uint16_t vs_dequeue_pkts(struct vhost_dev *dev, uint16_t queue_id,
> - struct rte_mempool *mbuf_pool,
> - struct rte_mbuf **pkts, uint16_t count);
> +uint16_t builtin_enqueue_pkts(struct vhost_dev *dev, uint16_t queue_id,
> + struct rte_mbuf **pkts, uint32_t count);
> +uint16_t builtin_dequeue_pkts(struct vhost_dev *dev, uint16_t queue_id,
> + struct rte_mempool *mbuf_pool,
> + struct rte_mbuf **pkts, uint16_t count);
> +uint16_t sync_enqueue_pkts(struct vhost_dev *dev, uint16_t queue_id,
> + struct rte_mbuf **pkts, uint32_t count);
> +uint16_t sync_dequeue_pkts(struct vhost_dev *dev, uint16_t queue_id,
> + struct rte_mempool *mbuf_pool,
> + struct rte_mbuf **pkts, uint16_t count);
> +uint16_t async_enqueue_pkts(struct vhost_dev *dev, uint16_t queue_id,
> + struct rte_mbuf **pkts, uint32_t count);
> +uint16_t async_dequeue_pkts(struct vhost_dev *dev, uint16_t queue_id,
> + struct rte_mempool *mbuf_pool,
> + struct rte_mbuf **pkts, uint16_t count);
This function is defined in the 4th patch, so please remove the declaration here.
Thanks,
Chenbo
* Re: [dpdk-dev] [PATCH 3/4] examples/vhost: use a new API to query remaining ring space
2021-09-06 20:48 ` [dpdk-dev] [PATCH 3/4] examples/vhost: use a new API to query remaining ring space Wenwu Ma
2021-09-10 7:38 ` Yang, YvonneX
@ 2021-09-15 3:04 ` Xia, Chenbo
1 sibling, 0 replies; 28+ messages in thread
From: Xia, Chenbo @ 2021-09-15 3:04 UTC (permalink / raw)
To: Ma, WenwuX, dev
Cc: maxime.coquelin, Jiang, Cheng1, Hu, Jiayu, Pai G, Sunil, Yang, YvonneX
> -----Original Message-----
> From: Ma, WenwuX <wenwux.ma@intel.com>
> Sent: Tuesday, September 7, 2021 4:49 AM
> To: dev@dpdk.org
> Cc: maxime.coquelin@redhat.com; Xia, Chenbo <chenbo.xia@intel.com>; Jiang,
> Cheng1 <cheng1.jiang@intel.com>; Hu, Jiayu <jiayu.hu@intel.com>; Pai G, Sunil
> <sunil.pai.g@intel.com>; Yang, YvonneX <yvonnex.yang@intel.com>; Ma, WenwuX
> <wenwux.ma@intel.com>
> Subject: [PATCH 3/4] examples/vhost: use a new API to query remaining ring
> space
>
> A new API for querying the remaining descriptor ring capacity
> is available, so we use the new one instead of the old one.
>
> Signed-off-by: Wenwu Ma <wenwux.ma@intel.com>
> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
> ---
> examples/vhost/ioat.c | 6 +-----
> 1 file changed, 1 insertion(+), 5 deletions(-)
>
> diff --git a/examples/vhost/ioat.c b/examples/vhost/ioat.c
> index 457f8171f0..6adc30b622 100644
> --- a/examples/vhost/ioat.c
> +++ b/examples/vhost/ioat.c
> @@ -17,7 +17,6 @@ struct packet_tracker {
> unsigned short next_read;
> unsigned short next_write;
> unsigned short last_remain;
> - unsigned short ioat_space;
> };
>
> struct packet_tracker cb_tracker[MAX_VHOST_DEVICE];
> @@ -113,7 +112,6 @@ open_ioat(const char *value)
> goto out;
> }
> rte_rawdev_start(dev_id);
> - cb_tracker[dev_id].ioat_space = IOAT_RING_SIZE - 1;
> dma_info->nr++;
> i++;
> }
> @@ -140,7 +138,7 @@ ioat_transfer_data_cb(int vid, uint16_t queue_id,
> src = descs[i_desc].src;
> dst = descs[i_desc].dst;
> i_seg = 0;
> - if (cb_tracker[dev_id].ioat_space < src->nr_segs)
> + if (rte_ioat_burst_capacity(dev_id) < src->nr_segs)
> break;
> while (i_seg < src->nr_segs) {
> rte_ioat_enqueue_copy(dev_id,
> @@ -155,7 +153,6 @@ ioat_transfer_data_cb(int vid, uint16_t queue_id,
> }
> write &= mask;
> cb_tracker[dev_id].size_track[write] = src->nr_segs;
> - cb_tracker[dev_id].ioat_space -= src->nr_segs;
> write++;
> }
> } else {
> @@ -194,7 +191,6 @@ ioat_check_completed_copies_cb(int vid, uint16_t queue_id,
> if (n_seg == 0)
> return 0;
>
> - cb_tracker[dev_id].ioat_space += n_seg;
> n_seg += cb_tracker[dev_id].last_remain;
>
> read = cb_tracker[dev_id].next_read;
> --
> 2.25.1
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
* Re: [dpdk-dev] [PATCH 4/4] examples/vhost: support vhost async dequeue data path
2021-09-06 20:48 ` [dpdk-dev] [PATCH 4/4] examples/vhost: support vhost async dequeue data path Wenwu Ma
2021-09-10 7:39 ` Yang, YvonneX
@ 2021-09-15 3:27 ` Xia, Chenbo
1 sibling, 0 replies; 28+ messages in thread
From: Xia, Chenbo @ 2021-09-15 3:27 UTC (permalink / raw)
To: Ma, WenwuX, dev
Cc: maxime.coquelin, Jiang, Cheng1, Hu, Jiayu, Pai G, Sunil, Yang, YvonneX
Hi,
> -----Original Message-----
> From: Ma, WenwuX <wenwux.ma@intel.com>
> Sent: Tuesday, September 7, 2021 4:49 AM
> To: dev@dpdk.org
> Cc: maxime.coquelin@redhat.com; Xia, Chenbo <chenbo.xia@intel.com>; Jiang,
> Cheng1 <cheng1.jiang@intel.com>; Hu, Jiayu <jiayu.hu@intel.com>; Pai G, Sunil
> <sunil.pai.g@intel.com>; Yang, YvonneX <yvonnex.yang@intel.com>; Ma, WenwuX
> <wenwux.ma@intel.com>
> Subject: [PATCH 4/4] examples/vhost: support vhost async dequeue data path
>
> This patch adds the vhost async dequeue data path to the vhost sample.
> The vswitch can leverage IOAT to accelerate the vhost async dequeue
> data path.
>
> Signed-off-by: Wenwu Ma <wenwux.ma@intel.com>
> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
> ---
> doc/guides/sample_app_ug/vhost.rst | 9 +-
> examples/vhost/ioat.c | 61 +++++++--
> examples/vhost/ioat.h | 25 ++++
> examples/vhost/main.c | 201 +++++++++++++++++++----------
> examples/vhost/main.h | 3 +-
> 5 files changed, 216 insertions(+), 83 deletions(-)
>
> diff --git a/doc/guides/sample_app_ug/vhost.rst
> b/doc/guides/sample_app_ug/vhost.rst
> index 9afde9c7f5..63dcf181e1 100644
> --- a/doc/guides/sample_app_ug/vhost.rst
> +++ b/doc/guides/sample_app_ug/vhost.rst
> @@ -169,9 +169,12 @@ demonstrates how to use the async vhost APIs. It's used
> in combination with dmas
> **--dmas**
> This parameter is used to specify the assigned DMA device of a vhost device.
> Async vhost-user net driver will be used if --dmas is set. For example
> ---dmas [txd0@00:04.0,txd1@00:04.1] means use DMA channel 00:04.0 for vhost
> -device 0 enqueue operation and use DMA channel 00:04.1 for vhost device 1
> -enqueue operation.
> +--dmas [txd0@00:04.0,txd1@00:04.1,rxd0@00:04.2,rxd1@00:04.3] means use
> +DMA channel 00:04.0/00:04.2 for vhost device 0 enqueue/dequeue operation
> +and use DMA channel 00:04.1/00:04.3 for vhost device 1 enqueue/dequeue
> +operation. The index of the device corresponds to the socket file in order,
> +that means vhost device 0 is created through the first socket file, vhost
> +device 1 is created through the second socket file, and so on.
>
> Common Issues
> -------------
> diff --git a/examples/vhost/ioat.c b/examples/vhost/ioat.c
> index 6adc30b622..540b61fff6 100644
> --- a/examples/vhost/ioat.c
> +++ b/examples/vhost/ioat.c
> @@ -21,6 +21,8 @@ struct packet_tracker {
>
> struct packet_tracker cb_tracker[MAX_VHOST_DEVICE];
>
> +int vid2socketid[MAX_VHOST_DEVICE];
> +
> int
> open_ioat(const char *value)
> {
> @@ -29,7 +31,7 @@ open_ioat(const char *value)
> char *addrs = input;
> char *ptrs[2];
> char *start, *end, *substr;
> - int64_t vid, vring_id;
> + int64_t socketid, vring_id;
> struct rte_ioat_rawdev_config config;
> struct rte_rawdev_info info = { .dev_private = &config };
> char name[32];
> @@ -60,6 +62,7 @@ open_ioat(const char *value)
> goto out;
> }
> while (i < args_nr) {
> + bool is_txd;
> char *arg_temp = dma_arg[i];
> uint8_t sub_nr;
> sub_nr = rte_strsplit(arg_temp, strlen(arg_temp), ptrs, 2, '@');
> @@ -68,27 +71,39 @@ open_ioat(const char *value)
> goto out;
> }
>
> - start = strstr(ptrs[0], "txd");
> - if (start == NULL) {
> + int async_flag;
> + char *txd, *rxd;
> + txd = strstr(ptrs[0], "txd");
> + rxd = strstr(ptrs[0], "rxd");
> + if (txd) {
> + is_txd = true;
> + start = txd;
> + async_flag = ASYNC_ENQUEUE_VHOST;
> + } else if (rxd) {
> + is_txd = false;
> + start = rxd;
> + async_flag = ASYNC_DEQUEUE_VHOST;
> + } else {
> ret = -1;
> goto out;
> }
>
> start += 3;
> - vid = strtol(start, &end, 0);
> + socketid = strtol(start, &end, 0);
> if (end == start) {
> ret = -1;
> goto out;
> }
>
> - vring_id = 0 + VIRTIO_RXQ;
> + vring_id = is_txd ? VIRTIO_RXQ : VIRTIO_TXQ;
> +
> if (rte_pci_addr_parse(ptrs[1],
> - &(dma_info + vid)->dmas[vring_id].addr) < 0) {
> + &(dma_info + socketid)->dmas[vring_id].addr) < 0) {
> ret = -1;
> goto out;
> }
>
> - rte_pci_device_name(&(dma_info + vid)->dmas[vring_id].addr,
> + rte_pci_device_name(&(dma_info + socketid)->dmas[vring_id].addr,
> name, sizeof(name));
> dev_id = rte_rawdev_get_dev_id(name);
> if (dev_id == (uint16_t)(-ENODEV) ||
> @@ -103,8 +118,9 @@ open_ioat(const char *value)
> goto out;
> }
>
> - (dma_info + vid)->dmas[vring_id].dev_id = dev_id;
> - (dma_info + vid)->dmas[vring_id].is_valid = true;
> + (dma_info + socketid)->dmas[vring_id].dev_id = dev_id;
> + (dma_info + socketid)->dmas[vring_id].is_valid = true;
> + (dma_info + socketid)->async_flag |= async_flag;
> config.ring_size = IOAT_RING_SIZE;
> config.hdls_disable = true;
> if (rte_rawdev_configure(dev_id, &info, sizeof(config)) < 0) {
> @@ -126,13 +142,16 @@ ioat_transfer_data_cb(int vid, uint16_t queue_id,
> struct rte_vhost_async_status *opaque_data, uint16_t count)
> {
> uint32_t i_desc;
> - uint16_t dev_id = dma_bind[vid].dmas[queue_id * 2 + VIRTIO_RXQ].dev_id;
> struct rte_vhost_iov_iter *src = NULL;
> struct rte_vhost_iov_iter *dst = NULL;
> unsigned long i_seg;
> unsigned short mask = MAX_ENQUEUED_SIZE - 1;
> - unsigned short write = cb_tracker[dev_id].next_write;
>
> + if (queue_id >= MAX_RING_COUNT)
> + return -1;
> +
> + uint16_t dev_id = dma_bind[vid2socketid[vid]].dmas[queue_id].dev_id;
> + unsigned short write = cb_tracker[dev_id].next_write;
> if (!opaque_data) {
> for (i_desc = 0; i_desc < count; i_desc++) {
> src = descs[i_desc].src;
> @@ -170,16 +189,16 @@ ioat_check_completed_copies_cb(int vid, uint16_t
> queue_id,
> struct rte_vhost_async_status *opaque_data,
> uint16_t max_packets)
> {
> - if (!opaque_data) {
> + if (!opaque_data && (queue_id < MAX_RING_COUNT)) {
Should be: if (!opaque_data && queue_id < MAX_RING_COUNT) {
> uintptr_t dump[255];
> int n_seg;
> unsigned short read, write;
> unsigned short nb_packet = 0;
> unsigned short mask = MAX_ENQUEUED_SIZE - 1;
> unsigned short i;
> + uint16_t dev_id;
>
> - uint16_t dev_id = dma_bind[vid].dmas[queue_id * 2
> - + VIRTIO_RXQ].dev_id;
> + dev_id = dma_bind[vid2socketid[vid]].dmas[queue_id].dev_id;
> n_seg = rte_ioat_completed_ops(dev_id, 255, NULL, NULL, dump,
> dump);
> if (n_seg < 0) {
> RTE_LOG(ERR,
> @@ -215,4 +234,18 @@ ioat_check_completed_copies_cb(int vid, uint16_t queue_id,
> return -1;
> }
>
> +uint32_t get_async_flag_by_vid(int vid)
> +{
> + return dma_bind[vid2socketid[vid]].async_flag;
async_flag is sometimes int and sometimes uint32_t. Please check code that uses
the flag and make it all uint32_t.
> +}
> +
> +uint32_t get_async_flag_by_socketid(int socketid)
> +{
> + return dma_bind[socketid].async_flag;
> +}
> +
> +void init_vid2socketid_array(int vid, int socketid)
> +{
> + vid2socketid[vid] = socketid;
> +}
> #endif /* RTE_RAW_IOAT */
> diff --git a/examples/vhost/ioat.h b/examples/vhost/ioat.h
> index 62e163c585..fa5086e662 100644
> --- a/examples/vhost/ioat.h
> +++ b/examples/vhost/ioat.h
> @@ -12,6 +12,9 @@
> #define MAX_VHOST_DEVICE 1024
> #define IOAT_RING_SIZE 4096
> #define MAX_ENQUEUED_SIZE 4096
> +#define MAX_RING_COUNT 2
> +#define ASYNC_ENQUEUE_VHOST 1
> +#define ASYNC_DEQUEUE_VHOST 2
>
> struct dma_info {
> struct rte_pci_addr addr;
> @@ -20,6 +23,7 @@ struct dma_info {
> };
>
> struct dma_for_vhost {
> + int async_flag;
> struct dma_info dmas[RTE_MAX_QUEUES_PER_PORT * 2];
> uint16_t nr;
> };
> @@ -36,6 +40,10 @@ int32_t
> ioat_check_completed_copies_cb(int vid, uint16_t queue_id,
> struct rte_vhost_async_status *opaque_data,
> uint16_t max_packets);
> +
> +uint32_t get_async_flag_by_vid(int vid);
> +uint32_t get_async_flag_by_socketid(int socketid);
> +void init_vid2socketid_array(int vid, int socketid);
> #else
> static int open_ioat(const char *value __rte_unused)
> {
> @@ -59,5 +67,22 @@ ioat_check_completed_copies_cb(int vid __rte_unused,
> {
> return -1;
> }
> +
> +static uint32_t
> +get_async_flag_by_vid(int vid __rte_unused)
> +{
> + return 0;
> +}
> +
> +static uint32_t
> +get_async_flag_by_socketid(int socketid __rte_unused)
> +{
> + return 0;
> +}
> +
> +static void
> +init_vid2socketid_array(int vid __rte_unused, int socketid __rte_unused)
> +{
> +}
> #endif
> #endif /* _IOAT_H_ */
> diff --git a/examples/vhost/main.c b/examples/vhost/main.c
> index e246b640ea..b34534111d 100644
> --- a/examples/vhost/main.c
> +++ b/examples/vhost/main.c
> @@ -93,8 +93,6 @@ static int client_mode;
>
> static int builtin_net_driver;
>
> -static int async_vhost_driver;
> -
> static char *dma_type;
>
> /* Specify timeout (in useconds) between retries on RX. */
> @@ -679,7 +677,6 @@ us_vhost_parse_args(int argc, char **argv)
> us_vhost_usage(prgname);
> return -1;
> }
> - async_vhost_driver = 1;
> break;
>
> case OPT_CLIENT_NUM:
> @@ -855,7 +852,8 @@ complete_async_pkts(struct vhost_dev *vdev)
> VIRTIO_RXQ, p_cpl, MAX_PKT_BURST);
> if (complete_count) {
> free_pkts(p_cpl, complete_count);
> - __atomic_sub_fetch(&vdev->pkts_inflight, complete_count,
> __ATOMIC_SEQ_CST);
> + __atomic_sub_fetch(&vdev->pkts_enq_inflight,
> + complete_count, __ATOMIC_SEQ_CST);
> }
>
> }
> @@ -900,7 +898,7 @@ drain_vhost(struct vhost_dev *vdev)
> __ATOMIC_SEQ_CST);
> }
>
> - if (!async_vhost_driver)
> + if ((get_async_flag_by_vid(vdev->vid) & ASYNC_ENQUEUE_VHOST) == 0)
> free_pkts(m, nr_xmit);
> }
>
> @@ -1180,8 +1178,8 @@ async_enqueue_pkts(struct vhost_dev *vdev, uint16_t
> queue_id,
> complete_async_pkts(vdev);
> enqueue_count = rte_vhost_submit_enqueue_burst(vdev->vid,
> queue_id, pkts, rx_count);
> - __atomic_add_fetch(&vdev->pkts_inflight, enqueue_count,
> - __ATOMIC_SEQ_CST);
> + __atomic_add_fetch(&vdev->pkts_enq_inflight,
> + enqueue_count, __ATOMIC_SEQ_CST);
>
> enqueue_fail = rx_count - enqueue_count;
> if (enqueue_fail)
> @@ -1237,10 +1235,23 @@ drain_eth_rx(struct vhost_dev *vdev)
> __ATOMIC_SEQ_CST);
> }
>
> - if (!async_vhost_driver)
> + if ((get_async_flag_by_vid(vdev->vid) & ASYNC_ENQUEUE_VHOST) == 0)
> free_pkts(pkts, rx_count);
> }
>
> +uint16_t async_dequeue_pkts(struct vhost_dev *dev, uint16_t queue_id,
> + struct rte_mempool *mbuf_pool,
> + struct rte_mbuf **pkts, uint16_t count)
> +{
> + int nr_inflight;
> + uint16_t dequeue_count;
> + dequeue_count = rte_vhost_async_try_dequeue_burst(dev->vid, queue_id,
> + mbuf_pool, pkts, count, &nr_inflight);
> + if (likely(nr_inflight != -1))
> + dev->pkts_deq_inflight = nr_inflight;
> + return dequeue_count;
> +}
> +
> uint16_t sync_dequeue_pkts(struct vhost_dev *dev, uint16_t queue_id,
> struct rte_mempool *mbuf_pool,
> struct rte_mbuf **pkts, uint16_t count)
> @@ -1336,6 +1347,32 @@ switch_worker(void *arg __rte_unused)
> return 0;
> }
>
> +static void
> +vhost_clear_queue_thread_unsafe(struct vhost_dev *vdev, uint16_t queue_id)
> +{
> + uint16_t n_pkt = 0;
> + struct rte_mbuf *m_enq_cpl[vdev->pkts_enq_inflight];
> + struct rte_mbuf *m_deq_cpl[vdev->pkts_deq_inflight];
> +
> + if ((queue_id % VIRTIO_QNUM) == 0) {
You are assuming VIRTIO_QNUM equals 2 here. The correct logic should be
'queue_id % 2 == 0' or '(queue_id & 0x1) == 0'.
Thanks,
Chenbo
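The even/odd virtqueue convention the comment relies on can be sketched as follows. The helper name is invented for illustration; only the bit-0 test is the point.

```c
#include <stdbool.h>

/* In the split-ring layout the vhost sample uses, even virtqueue
 * indices are RX queues (VIRTIO_RXQ == 0) and odd ones are TX queues
 * (VIRTIO_TXQ == 1); the pair repeats every two queues. The test must
 * therefore be on bit 0 of queue_id, not on VIRTIO_QNUM, which only
 * happens to equal 2 today. */
static inline bool
is_rx_virtqueue(unsigned short queue_id)
{
	return (queue_id & 0x1) == 0;
}
```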
> + while (vdev->pkts_enq_inflight) {
> + n_pkt = rte_vhost_clear_queue_thread_unsafe(vdev->vid,
> + queue_id, m_enq_cpl, vdev->pkts_enq_inflight);
> + free_pkts(m_enq_cpl, n_pkt);
> + __atomic_sub_fetch(&vdev->pkts_enq_inflight,
> + n_pkt, __ATOMIC_SEQ_CST);
> + }
* Re: [dpdk-dev] [PATCH 1/4] vhost: support async dequeue for split ring
[not found] ` <CO1PR11MB4897F3D5ABDE7133DB99791385DB9@CO1PR11MB4897.namprd11.prod.outlook.com>
@ 2021-09-15 11:35 ` Xia, Chenbo
0 siblings, 0 replies; 28+ messages in thread
From: Xia, Chenbo @ 2021-09-15 11:35 UTC (permalink / raw)
To: Wang, YuanX, Ma, WenwuX, dev, maxime.coquelin
Cc: Jiang, Cheng1, Hu, Jiayu, Pai G, Sunil, Yang, YvonneX, Wang, Yinan
Hi Maxime & Yuan,
> -----Original Message-----
> From: Wang, YuanX <yuanx.wang@intel.com>
> Sent: Wednesday, September 15, 2021 5:09 PM
> To: Xia, Chenbo <chenbo.xia@intel.com>; Ma, WenwuX <wenwux.ma@intel.com>;
> dev@dpdk.org
> Cc: maxime.coquelin@redhat.com; Jiang, Cheng1 <cheng1.jiang@intel.com>; Hu,
> Jiayu <jiayu.hu@intel.com>; Pai G, Sunil <sunil.pai.g@intel.com>; Yang,
> YvonneX <yvonnex.yang@intel.com>; Wang, Yinan <yinan.wang@intel.com>
> Subject: RE: [PATCH 1/4] vhost: support async dequeue for split ring
>
> Hi Chenbo,
>
> > -----Original Message-----
> > From: Xia, Chenbo <chenbo.xia@intel.com>
> > Sent: Wednesday, September 15, 2021 10:52 AM
> > To: Ma, WenwuX <wenwux.ma@intel.com>; dev@dpdk.org
> > Cc: maxime.coquelin@redhat.com; Jiang, Cheng1 <cheng1.jiang@intel.com>;
> > Hu, Jiayu <jiayu.hu@intel.com>; Pai G, Sunil <sunil.pai.g@intel.com>; Yang,
> > YvonneX <yvonnex.yang@intel.com>; Wang, YuanX
> > <yuanx.wang@intel.com>; Wang, Yinan <yinan.wang@intel.com>
> > Subject: RE: [PATCH 1/4] vhost: support async dequeue for split ring
> >
> > Hi,
> >
> > > -----Original Message-----
> > > From: Ma, WenwuX <wenwux.ma@intel.com>
> > > Sent: Tuesday, September 7, 2021 4:49 AM
> > > To: dev@dpdk.org
> > > Cc: maxime.coquelin@redhat.com; Xia, Chenbo <chenbo.xia@intel.com>;
> > > Jiang,
> > > Cheng1 <cheng1.jiang@intel.com>; Hu, Jiayu <jiayu.hu@intel.com>; Pai
> > > G, Sunil <sunil.pai.g@intel.com>; Yang, YvonneX
> > > <yvonnex.yang@intel.com>; Wang, YuanX <yuanx.wang@intel.com>; Ma,
> > > WenwuX <wenwux.ma@intel.com>; Wang, Yinan <yinan.wang@intel.com>
> > > Subject: [PATCH 1/4] vhost: support async dequeue for split ring
> > >
> > > From: Yuan Wang <yuanx.wang@intel.com>
> > >
> > > This patch implements asynchronous dequeue data path for split ring.
> > > A new asynchronous dequeue function is introduced. With this function,
> > > the application can try to receive packets from the guest with
> > > offloading copies to the async channel, thus saving precious CPU
> > > cycles.
> > >
> > > Signed-off-by: Yuan Wang <yuanx.wang@intel.com>
> > > Signed-off-by: Jiayu Hu <jiayu.hu@intel.com>
> > > Signed-off-by: Wenwu Ma <wenwux.ma@intel.com>
> > > Tested-by: Yinan Wang <yinan.wang@intel.com>
> > > ---
> > > doc/guides/prog_guide/vhost_lib.rst | 9 +
> > > lib/vhost/rte_vhost_async.h | 36 +-
> > > lib/vhost/version.map | 3 +
> > > lib/vhost/vhost.h | 3 +-
> > > lib/vhost/virtio_net.c | 531 ++++++++++++++++++++++++++++
> > > 5 files changed, 579 insertions(+), 3 deletions(-)
> > >
> > > diff --git a/doc/guides/prog_guide/vhost_lib.rst
> > > b/doc/guides/prog_guide/vhost_lib.rst
> > > index 171e0096f6..9ed544db7a 100644
> > > --- a/doc/guides/prog_guide/vhost_lib.rst
> > > +++ b/doc/guides/prog_guide/vhost_lib.rst
> > > @@ -303,6 +303,15 @@ The following is an overview of some key Vhost
> > > API
> > > functions:
> > > Clear inflight packets which are submitted to DMA engine in vhost
> > > async data
> > > path. Completed packets are returned to applications through ``pkts``.
> > >
> > > +* ``rte_vhost_async_try_dequeue_burst(vid, queue_id, mbuf_pool, pkts, count, nr_inflight)``
> > > +
> > > + This function tries to receive packets from the guest with
> > > + offloading copies to the async channel. The packets that are
> > > + transfer completed are returned in ``pkts``. The other packets that
> > > + their copies are submitted to the async channel but not completed are called "in-flight packets".
> > > + This function will not return in-flight packets until their copies
> > > + are completed by the async channel.
> > > +
> > > Vhost-user Implementations
> > > --------------------------
> > >
> > > diff --git a/lib/vhost/rte_vhost_async.h b/lib/vhost/rte_vhost_async.h
> > > index ad71555a7f..5e2429ab70 100644
> > > --- a/lib/vhost/rte_vhost_async.h
> > > +++ b/lib/vhost/rte_vhost_async.h
> > > @@ -83,12 +83,18 @@ struct rte_vhost_async_channel_ops {
> > > uint16_t max_packets);
> > > };
> > >
> > > +struct async_nethdr {
> > > + struct virtio_net_hdr hdr;
> > > + bool valid;
> > > +};
> > > +
> >
> > As a struct exposed in public headers, it's better to prefix it with rte_.
> > In this case I would prefer rte_async_net_hdr.
> >
> > > /**
> > > - * inflight async packet information
> > > + * in-flight async packet information
> > > */
> > > struct async_inflight_info {
> >
> > Could you help to rename it too? Like rte_async_inflight_info.
>
> You are right, these two structs are for internal use and are not suitable
> for exposure in the public header. However, since they are used by the async
> channel, I don't think they fit in the other existing headers either.
> Could you give some advice on which file to put them in?
@Maxime, what do you think of this? Changing, renaming, or moving the struct is
an ABI breakage either way, but since it is never used by any application, I guess
it is not a big problem. So what should we do with the struct? I would vote for
moving it temporarily to a header like vhost.h. At some point we can create a new
internal async header for structs like this. Or should we create it now?
@Yuan, thinking again about struct async_nethdr: do we really need to define it?
As of now, the header can only be invalid when virtio_net_with_host_offload(dev)
is false, right? So why not use that check to determine whether the header is
valid whenever you need to know?
Thanks,
Chenbo
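To illustrate the suggestion above, here is a sketch of deriving header validity from the negotiated features rather than carrying a per-packet flag. The feature-bit values are an illustrative subset of the virtio host-offload bits, and the helper name is hypothetical; the real check would mirror virtio_net_with_host_offload() in lib/vhost/virtio_net.c.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative subset of virtio-net feature bit positions (per the
 * virtio spec); the real DPDK check covers the full host-offload set. */
#define VIRTIO_NET_F_CSUM	0
#define VIRTIO_NET_F_HOST_TSO4	11
#define VIRTIO_NET_F_HOST_TSO6	12

/* The net header carries offload metadata, so it is only meaningful
 * when at least one host-offload feature was negotiated. Checking the
 * feature mask once avoids storing a `valid` flag per packet. */
static inline bool
net_hdr_is_meaningful(uint64_t negotiated_features)
{
	const uint64_t offload_mask =
		(1ULL << VIRTIO_NET_F_CSUM) |
		(1ULL << VIRTIO_NET_F_HOST_TSO4) |
		(1ULL << VIRTIO_NET_F_HOST_TSO6);

	return (negotiated_features & offload_mask) != 0;
}
```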
>
> >
> > > struct rte_mbuf *mbuf;
> > > - uint16_t descs; /* num of descs inflight */
> > > + struct async_nethdr nethdr;
> > > + uint16_t descs; /* num of descs in-flight */
> > > uint16_t nr_buffers; /* num of buffers inflight for packed ring */
> > > };
> > >
> > > @@ -255,5 +261,31 @@ int rte_vhost_async_get_inflight(int vid, uint16_t queue_id);
> > > __rte_experimental
> > > uint16_t rte_vhost_clear_queue_thread_unsafe(int vid, uint16_t queue_id,
> > > 	struct rte_mbuf **pkts, uint16_t count);
> > > +/**
> > > + * This function tries to receive packets from the guest with offloading
> > > + * copies to the async channel. The packets that are transfer completed
> > > + * are returned in "pkts". The other packets that their copies are
> > > + * submitted to the async channel but not completed are called
> > > + * "in-flight packets". This function will not return in-flight packets
> > > + * until their copies are completed by the async channel.
> > > + *
> > > + * @param vid
> > > + * id of vhost device to dequeue data
> > > + * @param queue_id
> > > + * queue id to dequeue data
> >
Param mbuf_pool is missing.
>
> Thanks, will fix it in next version.
>
> Regards,
> Yuan
>
> >
> > > + * @param pkts
> > > + * blank array to keep successfully dequeued packets
> > > + * @param count
> > > + * size of the packet array
> > > + * @param nr_inflight
> > > + * the amount of in-flight packets. If error occurred, its value is set to -1.
> > > + * @return
> > > + * num of successfully dequeued packets
> > > + */
> > > +__rte_experimental
> > > +uint16_t rte_vhost_async_try_dequeue_burst(int vid, uint16_t queue_id,
> > > + struct rte_mempool *mbuf_pool, struct rte_mbuf **pkts, uint16_t count,
> > > + int *nr_inflight);
> > >
> > > #endif /* _RTE_VHOST_ASYNC_H_ */
> > > diff --git a/lib/vhost/version.map b/lib/vhost/version.map index
> > > c92a9d4962..1e033ad8e2 100644
> > > --- a/lib/vhost/version.map
> > > +++ b/lib/vhost/version.map
> > > @@ -85,4 +85,7 @@ EXPERIMENTAL {
> > > rte_vhost_async_channel_register_thread_unsafe;
> > > rte_vhost_async_channel_unregister_thread_unsafe;
> > > rte_vhost_clear_queue_thread_unsafe;
> > > +
> > > + # added in 21.11
> > > + rte_vhost_async_try_dequeue_burst;
> > > };
> > > diff --git a/lib/vhost/vhost.h b/lib/vhost/vhost.h index
> > > 1e56311725..89a31e4ca8 100644
> > > --- a/lib/vhost/vhost.h
> > > +++ b/lib/vhost/vhost.h
> > > @@ -49,7 +49,8 @@
> >
> > [...]
> >
> > > +uint16_t
> > > +rte_vhost_async_try_dequeue_burst(int vid, uint16_t queue_id,
> > > + struct rte_mempool *mbuf_pool, struct rte_mbuf **pkts, uint16_t count,
> > > + int *nr_inflight)
> > > +{
> > > + struct virtio_net *dev;
> > > + struct rte_mbuf *rarp_mbuf = NULL;
> > > + struct vhost_virtqueue *vq;
> > > + int16_t success = 1;
> > > +
> > > + *nr_inflight = -1;
> > > +
> > > + dev = get_device(vid);
> > > + if (!dev)
> > > + return 0;
> > > +
> > > + if (unlikely(!(dev->flags & VIRTIO_DEV_BUILTIN_VIRTIO_NET))) {
> > > + VHOST_LOG_DATA(ERR,
> > > + "(%d) %s: built-in vhost net backend is disabled.\n",
> > > + dev->vid, __func__);
> > > + return 0;
> > > + }
> > > +
> > > + if (unlikely(!is_valid_virt_queue_idx(queue_id, 1, dev->nr_vring))) {
> > > + VHOST_LOG_DATA(ERR,
> > > + "(%d) %s: invalid virtqueue idx %d.\n",
> > > + dev->vid, __func__, queue_id);
> > > + return 0;
> > > + }
> > > +
> > > + vq = dev->virtqueue[queue_id];
> > > +
> > > + if (unlikely(rte_spinlock_trylock(&vq->access_lock) == 0))
> > > + return 0;
> > > +
> > > + if (unlikely(vq->enabled == 0)) {
> > > + count = 0;
> > > + goto out_access_unlock;
> > > + }
> > > +
> > > + if (unlikely(!vq->async_registered)) {
> > > + VHOST_LOG_DATA(ERR, "(%d) %s: async not registered for queue id %d.\n",
> > > + dev->vid, __func__, queue_id);
> > > + count = 0;
> > > + goto out_access_unlock;
> > > + }
> > > +
> > > + if (dev->features & (1ULL << VIRTIO_F_IOMMU_PLATFORM))
> > > + vhost_user_iotlb_rd_lock(vq);
> > > +
> > > + if (unlikely(vq->access_ok == 0))
> > > + if (unlikely(vring_translate(dev, vq) < 0)) {
> > > + count = 0;
> > > + goto out_access_unlock;
> > > + }
> > > +
> > > + /*
> > > + * Construct a RARP broadcast packet, and inject it to the "pkts"
> > > + * array, to make it look like the guest actually sent such a packet.
> > > + *
> > > + * Check user_send_rarp() for more information.
> > > + *
> > > + * broadcast_rarp shares a cacheline in the virtio_net structure
> > > + * with some fields that are accessed during enqueue and
> > > + * __atomic_compare_exchange_n causes a write if performed compare
> > > + * and exchange. This could result in false sharing between enqueue
> > > + * and dequeue.
> > > + *
> > > + * Prevent unnecessary false sharing by reading broadcast_rarp first
> > > + * and only performing compare and exchange if the read indicates it
> > > + * is likely to be set.
> > > + */
> > > + if (unlikely(__atomic_load_n(&dev->broadcast_rarp, __ATOMIC_ACQUIRE) &&
> > > + __atomic_compare_exchange_n(&dev->broadcast_rarp,
> > > + &success, 0, 0, __ATOMIC_RELEASE, __ATOMIC_RELAXED))) {
> > > +
> > > + rarp_mbuf = rte_net_make_rarp_packet(mbuf_pool, &dev->mac);
> > > + if (rarp_mbuf == NULL) {
> > > + VHOST_LOG_DATA(ERR, "Failed to make RARP packet.\n");
> > > + count = 0;
> > > + goto out;
> > > + }
> > > + count -= 1;
> > > + }
> > > +
> > > + if (unlikely(vq_is_packed(dev)))
> > > + return 0;
> >
> > Should add a log here.
> >
> > Thanks,
> > Chenbo
* [dpdk-dev] [PATCH v2 0/4] support async dequeue for split ring
2021-09-06 20:48 [dpdk-dev] [PATCH 0/4] support async dequeue for split ring Wenwu Ma
` (4 preceding siblings ...)
2021-09-10 7:33 ` [dpdk-dev] [PATCH 0/4] support async dequeue for split ring Yang, YvonneX
@ 2021-09-17 19:26 ` Wenwu Ma
2021-09-17 19:27 ` [dpdk-dev] [PATCH v2 1/4] vhost: " Wenwu Ma
` (3 more replies)
2021-09-28 18:56 ` [dpdk-dev] [PATCH v3 0/4] support async dequeue for split ring Wenwu Ma
6 siblings, 4 replies; 28+ messages in thread
From: Wenwu Ma @ 2021-09-17 19:26 UTC (permalink / raw)
To: dev
Cc: maxime.coquelin, chenbo.xia, cheng1.jiang, jiayu.hu, Sunil.Pai.G,
yvonnex.yang, Wenwu Ma
This patch implements asynchronous dequeue data path for split ring.
A new asynchronous dequeue function is introduced. With this function,
the application can try to receive packets from the guest with offloading
copies to the DMA engine, thus saving precious CPU cycles.
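As a rough sketch of how an application might consume the new API described above: the burst function is stubbed out here so the snippet stands alone, and `poll_guest_tx`, the stub's return values, and `MAX_PKT_BURST` are hypothetical; a real application would link against librte_vhost and pass a real mempool and mbuf array.

```c
#include <stdint.h>
#include <stddef.h>

struct rte_mempool;
struct rte_mbuf;

#define MAX_PKT_BURST 32

/* Stub standing in for the library call: pretend 3 packets completed
 * and 5 copies are still in flight on the DMA engine. The signature
 * matches the prototype added by this patch. */
static uint16_t
rte_vhost_async_try_dequeue_burst(int vid, uint16_t queue_id,
		struct rte_mempool *mbuf_pool, struct rte_mbuf **pkts,
		uint16_t count, int *nr_inflight)
{
	(void)vid; (void)queue_id; (void)mbuf_pool; (void)pkts; (void)count;
	*nr_inflight = 5;
	return 3;
}

/* One polling step: dequeue completed packets and record how many
 * copies are still owned by the async channel. */
static uint16_t
poll_guest_tx(int vid, uint16_t queue_id, struct rte_mempool *pool,
		int *inflight)
{
	struct rte_mbuf *pkts[MAX_PKT_BURST];
	uint16_t n;

	n = rte_vhost_async_try_dequeue_burst(vid, queue_id, pool,
			pkts, MAX_PKT_BURST, inflight);
	/* nr_inflight == -1 signals an error (bad vid/queue id, or async
	 * not registered); otherwise it reports copies still in flight. */
	if (*inflight == -1)
		return 0;
	return n;
}
```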
v2:
- Removed struct async_nethdr in 1/4.
- Removed a useless function declaration in 2/4,
and fixed some coding style in 4/4.
Wenwu Ma (3):
examples/vhost: refactor vhost enqueue and dequeue datapaths
examples/vhost: use a new API to query remaining ring space
examples/vhost: support vhost async dequeue data path
Yuan Wang (1):
vhost: support async dequeue for split ring
doc/guides/prog_guide/vhost_lib.rst | 9 +
doc/guides/sample_app_ug/vhost.rst | 9 +-
examples/vhost/ioat.c | 67 +++-
examples/vhost/ioat.h | 25 ++
examples/vhost/main.c | 269 +++++++++-----
examples/vhost/main.h | 34 +-
examples/vhost/virtio_net.c | 16 +-
lib/vhost/rte_vhost_async.h | 33 +-
lib/vhost/version.map | 3 +
lib/vhost/vhost.h | 3 +-
lib/vhost/virtio_net.c | 530 ++++++++++++++++++++++++++++
11 files changed, 877 insertions(+), 121 deletions(-)
--
2.25.1
^ permalink raw reply [flat|nested] 28+ messages in thread
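The cover letter above promises that only transfer-completed packets are returned to the caller, while the remainder stay "in flight" until a later poll. As a rough, self-contained sketch of that contract (toy names and a toy channel, not the actual DPDK vhost API):

```c
#include <assert.h>

/* Toy model of the async dequeue contract: copies are submitted to a
 * channel, the engine completes them asynchronously, and a dequeue call
 * hands back only completed packets while reporting the in-flight count.
 * All names here are illustrative, not the real vhost API. */
struct toy_channel {
	int submitted;	/* copies handed to the "DMA engine" */
	int completed;	/* copies the engine has finished */
};

static void toy_submit(struct toy_channel *ch, int n)
{
	ch->submitted += n;
}

/* The engine makes progress on up to n pending copies. */
static void toy_engine_progress(struct toy_channel *ch, int n)
{
	int pending = ch->submitted - ch->completed;

	ch->completed += (n < pending) ? n : pending;
}

/* Return completed packets only; the rest remain in flight. */
static int toy_try_dequeue(struct toy_channel *ch, int *nr_inflight)
{
	int done = ch->completed;

	ch->submitted -= done;
	ch->completed = 0;
	*nr_inflight = ch->submitted;
	return done;
}
```

A caller polls in a loop: whatever the engine has finished is returned immediately, and the out-parameter tells the application how many copies are still pending, mirroring the `nr_inflight` argument of the new API.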
* [dpdk-dev] [PATCH v2 1/4] vhost: support async dequeue for split ring
2021-09-17 19:26 ` [dpdk-dev] [PATCH v2 " Wenwu Ma
@ 2021-09-17 19:27 ` Wenwu Ma
2021-09-27 6:33 ` Jiang, Cheng1
2021-09-17 19:27 ` [dpdk-dev] [PATCH v2 2/4] examples/vhost: refactor vhost enqueue and dequeue datapaths Wenwu Ma
` (2 subsequent siblings)
3 siblings, 1 reply; 28+ messages in thread
From: Wenwu Ma @ 2021-09-17 19:27 UTC (permalink / raw)
To: dev
Cc: maxime.coquelin, chenbo.xia, cheng1.jiang, jiayu.hu, Sunil.Pai.G,
yvonnex.yang, Yuan Wang, Wenwu Ma, Yinan Wang
From: Yuan Wang <yuanx.wang@intel.com>
This patch implements asynchronous dequeue data path for split ring.
A new asynchronous dequeue function is introduced. With this function,
the application can try to receive packets from the guest with
offloading copies to the async channel, thus saving precious CPU
cycles.
Signed-off-by: Yuan Wang <yuanx.wang@intel.com>
Signed-off-by: Jiayu Hu <jiayu.hu@intel.com>
Signed-off-by: Wenwu Ma <wenwux.ma@intel.com>
Tested-by: Yinan Wang <yinan.wang@intel.com>
Tested-by: Yvonne Yang <yvonnex.yang@intel.com>
---
doc/guides/prog_guide/vhost_lib.rst | 9 +
lib/vhost/rte_vhost_async.h | 33 +-
lib/vhost/version.map | 3 +
lib/vhost/vhost.h | 3 +-
lib/vhost/virtio_net.c | 530 ++++++++++++++++++++++++++++
5 files changed, 575 insertions(+), 3 deletions(-)
diff --git a/doc/guides/prog_guide/vhost_lib.rst b/doc/guides/prog_guide/vhost_lib.rst
index 171e0096f6..9ed544db7a 100644
--- a/doc/guides/prog_guide/vhost_lib.rst
+++ b/doc/guides/prog_guide/vhost_lib.rst
@@ -303,6 +303,15 @@ The following is an overview of some key Vhost API functions:
Clear inflight packets which are submitted to DMA engine in vhost async data
path. Completed packets are returned to applications through ``pkts``.
+* ``rte_vhost_async_try_dequeue_burst(vid, queue_id, mbuf_pool, pkts, count, nr_inflight)``
+
+ This function tries to receive packets from the guest with offloading
+ copies to the async channel. Packets whose copies have completed
+ are returned in ``pkts``. Packets whose copies have been submitted
+ to the async channel but not yet completed are called "in-flight packets".
+ This function will not return in-flight packets until their copies are
+ completed by the async channel.
+
Vhost-user Implementations
--------------------------
diff --git a/lib/vhost/rte_vhost_async.h b/lib/vhost/rte_vhost_async.h
index ad71555a7f..973efa19b1 100644
--- a/lib/vhost/rte_vhost_async.h
+++ b/lib/vhost/rte_vhost_async.h
@@ -84,11 +84,12 @@ struct rte_vhost_async_channel_ops {
};
/**
- * inflight async packet information
+ * in-flight async packet information
*/
struct async_inflight_info {
struct rte_mbuf *mbuf;
- uint16_t descs; /* num of descs inflight */
+ struct virtio_net_hdr nethdr;
+ uint16_t descs; /* num of descs in-flight */
uint16_t nr_buffers; /* num of buffers inflight for packed ring */
};
@@ -255,5 +256,33 @@ int rte_vhost_async_get_inflight(int vid, uint16_t queue_id);
__rte_experimental
uint16_t rte_vhost_clear_queue_thread_unsafe(int vid, uint16_t queue_id,
struct rte_mbuf **pkts, uint16_t count);
+/**
+ * This function tries to receive packets from the guest with offloading
+ * copies to the async channel. Packets whose copies have completed
+ * are returned in "pkts". Packets whose copies have been submitted to
+ * the async channel but not yet completed are called "in-flight packets".
+ * This function will not return in-flight packets until their copies are
+ * completed by the async channel.
+ *
+ * @param vid
+ * id of vhost device to dequeue data
+ * @param queue_id
+ * queue id to dequeue data
+ * @param mbuf_pool
+ * mbuf_pool where host mbuf is allocated.
+ * @param pkts
+ * blank array to keep successfully dequeued packets
+ * @param count
+ * size of the packet array
+ * @param nr_inflight
+ * the number of in-flight packets. If an error occurred, the value is set to -1.
+ * @return
+ * num of successfully dequeued packets
+ */
+__rte_experimental
+uint16_t
+rte_vhost_async_try_dequeue_burst(int vid, uint16_t queue_id,
+ struct rte_mempool *mbuf_pool, struct rte_mbuf **pkts, uint16_t count,
+ int *nr_inflight);
#endif /* _RTE_VHOST_ASYNC_H_ */
diff --git a/lib/vhost/version.map b/lib/vhost/version.map
index c92a9d4962..1e033ad8e2 100644
--- a/lib/vhost/version.map
+++ b/lib/vhost/version.map
@@ -85,4 +85,7 @@ EXPERIMENTAL {
rte_vhost_async_channel_register_thread_unsafe;
rte_vhost_async_channel_unregister_thread_unsafe;
rte_vhost_clear_queue_thread_unsafe;
+
+ # added in 21.11
+ rte_vhost_async_try_dequeue_burst;
};
diff --git a/lib/vhost/vhost.h b/lib/vhost/vhost.h
index 1e56311725..89a31e4ca8 100644
--- a/lib/vhost/vhost.h
+++ b/lib/vhost/vhost.h
@@ -49,7 +49,8 @@
#define MAX_PKT_BURST 32
#define VHOST_MAX_ASYNC_IT (MAX_PKT_BURST * 2)
-#define VHOST_MAX_ASYNC_VEC (BUF_VECTOR_MAX * 4)
+#define MAX_ASYNC_COPY_VECTOR 1024
+#define VHOST_MAX_ASYNC_VEC (MAX_ASYNC_COPY_VECTOR * 2)
#define PACKED_DESC_ENQUEUE_USED_FLAG(w) \
((w) ? (VRING_DESC_F_AVAIL | VRING_DESC_F_USED | VRING_DESC_F_WRITE) : \
diff --git a/lib/vhost/virtio_net.c b/lib/vhost/virtio_net.c
index 0350f6fcce..e7a802688f 100644
--- a/lib/vhost/virtio_net.c
+++ b/lib/vhost/virtio_net.c
@@ -3170,3 +3170,533 @@ rte_vhost_dequeue_burst(int vid, uint16_t queue_id,
return count;
}
+
+static __rte_always_inline int
+async_desc_to_mbuf(struct virtio_net *dev,
+ struct buf_vector *buf_vec, uint16_t nr_vec,
+ struct rte_mbuf *m, struct rte_mempool *mbuf_pool,
+ struct iovec *src_iovec, struct iovec *dst_iovec,
+ struct rte_vhost_iov_iter *src_it,
+ struct rte_vhost_iov_iter *dst_it,
+ struct virtio_net_hdr *nethdr,
+ int nr_iovec)
+{
+ uint64_t buf_addr, buf_iova;
+ uint64_t mapped_len;
+ uint32_t tlen = 0;
+ uint32_t buf_avail, buf_offset, buf_len;
+ uint32_t mbuf_avail, mbuf_offset;
+ uint32_t cpy_len;
+ /* A counter to avoid desc dead loop chain */
+ uint16_t vec_idx = 0;
+ int tvec_idx = 0;
+ struct rte_mbuf *cur = m, *prev = m;
+ struct virtio_net_hdr tmp_hdr;
+ struct virtio_net_hdr *hdr = NULL;
+
+ buf_addr = buf_vec[vec_idx].buf_addr;
+ buf_len = buf_vec[vec_idx].buf_len;
+ buf_iova = buf_vec[vec_idx].buf_iova;
+
+ if (unlikely(buf_len < dev->vhost_hlen && nr_vec <= 1))
+ return -1;
+
+ if (virtio_net_with_host_offload(dev)) {
+ if (unlikely(buf_len < sizeof(struct virtio_net_hdr))) {
+ /*
+ * No luck, the virtio-net header doesn't fit
+ * in a contiguous virtual area.
+ */
+ copy_vnet_hdr_from_desc(&tmp_hdr, buf_vec);
+ hdr = &tmp_hdr;
+ } else {
+ hdr = (struct virtio_net_hdr *)((uintptr_t)buf_addr);
+ }
+ }
+
+ /*
+ * A virtio driver normally uses at least 2 desc buffers
+ * for Tx: the first for storing the header, and others
+ * for storing the data.
+ */
+ if (unlikely(buf_len < dev->vhost_hlen)) {
+ buf_offset = dev->vhost_hlen - buf_len;
+ vec_idx++;
+ buf_addr = buf_vec[vec_idx].buf_addr;
+ buf_iova = buf_vec[vec_idx].buf_iova;
+ buf_len = buf_vec[vec_idx].buf_len;
+ buf_avail = buf_len - buf_offset;
+ } else if (buf_len == dev->vhost_hlen) {
+ if (unlikely(++vec_idx >= nr_vec))
+ return -1;
+ buf_addr = buf_vec[vec_idx].buf_addr;
+ buf_iova = buf_vec[vec_idx].buf_iova;
+ buf_len = buf_vec[vec_idx].buf_len;
+
+ buf_offset = 0;
+ buf_avail = buf_len;
+ } else {
+ buf_offset = dev->vhost_hlen;
+ buf_avail = buf_vec[vec_idx].buf_len - dev->vhost_hlen;
+ }
+
+ PRINT_PACKET(dev, (uintptr_t)(buf_addr + buf_offset), (uint32_t)buf_avail, 0);
+
+ mbuf_offset = 0;
+ mbuf_avail = m->buf_len - RTE_PKTMBUF_HEADROOM;
+ while (1) {
+ cpy_len = RTE_MIN(buf_avail, mbuf_avail);
+
+ while (cpy_len) {
+ void *hpa = (void *)(uintptr_t)gpa_to_first_hpa(dev,
+ buf_iova + buf_offset, cpy_len,
+ &mapped_len);
+ if (unlikely(!hpa)) {
+ VHOST_LOG_DATA(ERR, "(%d) %s: failed to get hpa.\n",
+ dev->vid, __func__);
+ return -1;
+ }
+ if (unlikely(tvec_idx >= nr_iovec)) {
+ VHOST_LOG_DATA(ERR, "iovec is not enough for offloading\n");
+ return -1;
+ }
+
+ async_fill_vec(src_iovec + tvec_idx, hpa, (size_t)mapped_len);
+ async_fill_vec(dst_iovec + tvec_idx,
+ (void *)(uintptr_t)rte_pktmbuf_iova_offset(cur, mbuf_offset),
+ (size_t)mapped_len);
+
+ tvec_idx++;
+ tlen += (uint32_t)mapped_len;
+ cpy_len -= (uint32_t)mapped_len;
+ mbuf_avail -= (uint32_t)mapped_len;
+ mbuf_offset += (uint32_t)mapped_len;
+ buf_avail -= (uint32_t)mapped_len;
+ buf_offset += (uint32_t)mapped_len;
+ }
+
+ /* This buf reaches its end, get the next one */
+ if (buf_avail == 0) {
+ if (++vec_idx >= nr_vec)
+ break;
+
+ buf_addr = buf_vec[vec_idx].buf_addr;
+ buf_iova = buf_vec[vec_idx].buf_iova;
+ buf_len = buf_vec[vec_idx].buf_len;
+
+ buf_offset = 0;
+ buf_avail = buf_len;
+
+ PRINT_PACKET(dev, (uintptr_t)buf_addr, (uint32_t)buf_avail, 0);
+ }
+
+ /*
+ * This mbuf reaches its end, get a new one
+ * to hold more data.
+ */
+ if (mbuf_avail == 0) {
+ cur = rte_pktmbuf_alloc(mbuf_pool);
+ if (unlikely(cur == NULL)) {
+ VHOST_LOG_DATA(ERR, "Failed to allocate memory for mbuf.\n");
+ return -1;
+ }
+
+ prev->next = cur;
+ prev->data_len = mbuf_offset;
+ m->nb_segs += 1;
+ m->pkt_len += mbuf_offset;
+ prev = cur;
+
+ mbuf_offset = 0;
+ mbuf_avail = cur->buf_len - RTE_PKTMBUF_HEADROOM;
+ }
+ }
+
+ prev->data_len = mbuf_offset;
+ m->pkt_len += mbuf_offset;
+
+ if (tlen) {
+ async_fill_iter(src_it, tlen, src_iovec, tvec_idx);
+ async_fill_iter(dst_it, tlen, dst_iovec, tvec_idx);
+ if (hdr)
+ *nethdr = *hdr;
+ }
+ return 0;
+}
+
+static __rte_always_inline uint16_t
+async_poll_dequeue_completed_split(struct virtio_net *dev,
+ struct vhost_virtqueue *vq, uint16_t queue_id,
+ struct rte_mbuf **pkts, uint16_t count, bool legacy_ol_flags)
+{
+ uint16_t n_pkts_cpl = 0, n_pkts_put = 0;
+ uint16_t start_idx, pkt_idx, from;
+ struct async_inflight_info *pkts_info;
+
+ pkt_idx = vq->async_pkts_idx & (vq->size - 1);
+ pkts_info = vq->async_pkts_info;
+ start_idx = virtio_dev_rx_async_get_info_idx(pkt_idx, vq->size,
+ vq->async_pkts_inflight_n);
+
+ if (count > vq->async_last_pkts_n) {
+ int ret;
+
+ ret = vq->async_ops.check_completed_copies(dev->vid, queue_id,
+ 0, count - vq->async_last_pkts_n);
+ if (unlikely(ret < 0)) {
+ VHOST_LOG_DATA(ERR, "(%d) async channel poll error\n", dev->vid);
+ ret = 0;
+ }
+ n_pkts_cpl = ret;
+ }
+
+ n_pkts_cpl += vq->async_last_pkts_n;
+ if (unlikely(n_pkts_cpl == 0))
+ return 0;
+
+ n_pkts_put = RTE_MIN(count, n_pkts_cpl);
+
+ for (pkt_idx = 0; pkt_idx < n_pkts_put; pkt_idx++) {
+ from = (start_idx + pkt_idx) & (vq->size - 1);
+ pkts[pkt_idx] = pkts_info[from].mbuf;
+
+ if (virtio_net_with_host_offload(dev))
+ vhost_dequeue_offload(&pkts_info[from].nethdr,
+ pkts[pkt_idx], legacy_ol_flags);
+ }
+
+ /* write back completed descs to used ring and update used idx */
+ write_back_completed_descs_split(vq, n_pkts_put);
+ __atomic_add_fetch(&vq->used->idx, n_pkts_put, __ATOMIC_RELEASE);
+ vhost_vring_call_split(dev, vq);
+
+ vq->async_last_pkts_n = n_pkts_cpl - n_pkts_put;
+ vq->async_pkts_inflight_n -= n_pkts_put;
+
+ return n_pkts_put;
+}
+
+static __rte_always_inline uint16_t
+virtio_dev_tx_async_split(struct virtio_net *dev,
+ struct vhost_virtqueue *vq, uint16_t queue_id,
+ struct rte_mempool *mbuf_pool, struct rte_mbuf **pkts,
+ uint16_t count, bool legacy_ol_flags)
+{
+ static bool allocerr_warned;
+ bool dropped = false;
+ uint16_t free_entries;
+ uint16_t pkt_idx, slot_idx = 0;
+ uint16_t nr_done_pkts = 0;
+ uint16_t nr_async_burst = 0;
+ uint16_t pkt_err = 0;
+ uint16_t iovec_idx = 0, it_idx = 0;
+ struct rte_vhost_iov_iter *it_pool = vq->it_pool;
+ struct iovec *vec_pool = vq->vec_pool;
+ struct iovec *src_iovec = vec_pool;
+ struct iovec *dst_iovec = vec_pool + (VHOST_MAX_ASYNC_VEC >> 1);
+ struct rte_vhost_async_desc tdes[MAX_PKT_BURST];
+ struct async_inflight_info *pkts_info = vq->async_pkts_info;
+ struct rte_mbuf *pkts_prealloc[MAX_PKT_BURST];
+
+ /**
+ * The ordering between avail index and
+ * desc reads needs to be enforced.
+ */
+ free_entries = __atomic_load_n(&vq->avail->idx, __ATOMIC_ACQUIRE) - vq->last_avail_idx;
+ if (free_entries == 0)
+ goto out;
+
+ rte_prefetch0(&vq->avail->ring[vq->last_avail_idx & (vq->size - 1)]);
+
+ count = RTE_MIN(count, MAX_PKT_BURST);
+ count = RTE_MIN(count, free_entries);
+ VHOST_LOG_DATA(DEBUG, "(%d) about to dequeue %u buffers\n", dev->vid, count);
+
+ if (rte_pktmbuf_alloc_bulk(mbuf_pool, pkts_prealloc, count))
+ goto out;
+
+ for (pkt_idx = 0; pkt_idx < count; pkt_idx++) {
+ uint16_t head_idx = 0;
+ uint16_t nr_vec = 0;
+ uint16_t to;
+ uint32_t buf_len;
+ int err;
+ struct buf_vector buf_vec[BUF_VECTOR_MAX];
+ struct rte_mbuf *pkt = pkts_prealloc[pkt_idx];
+
+ if (unlikely(fill_vec_buf_split(dev, vq, vq->last_avail_idx,
+ &nr_vec, buf_vec,
+ &head_idx, &buf_len,
+ VHOST_ACCESS_RO) < 0)) {
+ dropped = true;
+ break;
+ }
+
+ err = virtio_dev_pktmbuf_prep(dev, pkt, buf_len);
+ if (unlikely(err)) {
+ /**
+ * mbuf allocation fails for jumbo packets when external
+ * buffer allocation is not allowed and linear buffer
+ * is required. Drop this packet.
+ */
+ if (!allocerr_warned) {
+ VHOST_LOG_DATA(ERR,
+ "Failed mbuf alloc of size %d from %s on %s.\n",
+ buf_len, mbuf_pool->name, dev->ifname);
+ allocerr_warned = true;
+ }
+ dropped = true;
+ break;
+ }
+
+ slot_idx = (vq->async_pkts_idx + pkt_idx) & (vq->size - 1);
+ err = async_desc_to_mbuf(dev, buf_vec, nr_vec, pkt,
+ mbuf_pool, &src_iovec[iovec_idx],
+ &dst_iovec[iovec_idx], &it_pool[it_idx],
+ &it_pool[it_idx + 1],
+ &pkts_info[slot_idx].nethdr,
+ (VHOST_MAX_ASYNC_VEC >> 1) - iovec_idx);
+ if (unlikely(err)) {
+ if (!allocerr_warned) {
+ VHOST_LOG_DATA(ERR,
+ "Failed to offload copies to async channel %s.\n",
+ dev->ifname);
+ allocerr_warned = true;
+ }
+ dropped = true;
+ break;
+ }
+
+ async_fill_desc(&tdes[nr_async_burst], &it_pool[it_idx], &it_pool[it_idx + 1]);
+ pkts_info[slot_idx].mbuf = pkt;
+ nr_async_burst++;
+
+ iovec_idx += it_pool[it_idx].nr_segs;
+ it_idx += 2;
+
+ /* store used descs */
+ to = vq->async_desc_idx_split & (vq->size - 1);
+ vq->async_descs_split[to].id = head_idx;
+ vq->async_descs_split[to].len = 0;
+ vq->async_desc_idx_split++;
+
+ vq->last_avail_idx++;
+
+ if (unlikely(nr_async_burst >= VHOST_ASYNC_BATCH_THRESHOLD)) {
+ uint16_t nr_pkts;
+ int32_t ret;
+
+ ret = vq->async_ops.transfer_data(dev->vid, queue_id,
+ tdes, 0, nr_async_burst);
+ if (unlikely(ret < 0)) {
+ VHOST_LOG_DATA(ERR, "(%d) async channel submit error\n", dev->vid);
+ ret = 0;
+ }
+ nr_pkts = ret;
+
+ vq->async_pkts_inflight_n += nr_pkts;
+ it_idx = 0;
+ iovec_idx = 0;
+
+ if (unlikely(nr_pkts < nr_async_burst)) {
+ pkt_err = nr_async_burst - nr_pkts;
+ nr_async_burst = 0;
+ pkt_idx++;
+ break;
+ }
+ nr_async_burst = 0;
+ }
+ }
+
+ if (unlikely(dropped))
+ rte_pktmbuf_free_bulk(&pkts_prealloc[pkt_idx], count - pkt_idx);
+
+ if (nr_async_burst) {
+ uint16_t nr_pkts;
+ int32_t ret;
+
+ ret = vq->async_ops.transfer_data(dev->vid, queue_id, tdes, 0, nr_async_burst);
+ if (unlikely(ret < 0)) {
+ VHOST_LOG_DATA(ERR, "(%d) async channel submit error\n", dev->vid);
+ ret = 0;
+ }
+ nr_pkts = ret;
+
+ vq->async_pkts_inflight_n += nr_pkts;
+
+ if (unlikely(nr_pkts < nr_async_burst))
+ pkt_err = nr_async_burst - nr_pkts;
+ }
+
+ if (unlikely(pkt_err)) {
+ uint16_t nr_err_dma = pkt_err;
+
+ pkt_idx -= nr_err_dma;
+
+ /**
+ * recover async channel copy related structures and free pktmbufs
+ * for error pkts.
+ */
+ vq->async_desc_idx_split -= nr_err_dma;
+ while (nr_err_dma-- > 0) {
+ rte_pktmbuf_free(pkts_info[slot_idx & (vq->size - 1)].mbuf);
+ slot_idx--;
+ }
+
+ /* recover available ring */
+ vq->last_avail_idx -= pkt_err;
+ }
+
+ vq->async_pkts_idx += pkt_idx;
+
+out:
+ if (vq->async_pkts_inflight_n > 0) {
+ nr_done_pkts = async_poll_dequeue_completed_split(dev, vq,
+ queue_id, pkts, count, legacy_ol_flags);
+ }
+
+ return nr_done_pkts;
+}
+
+__rte_noinline
+static uint16_t
+virtio_dev_tx_async_split_legacy(struct virtio_net *dev,
+ struct vhost_virtqueue *vq, uint16_t queue_id,
+ struct rte_mempool *mbuf_pool, struct rte_mbuf **pkts,
+ uint16_t count)
+{
+ return virtio_dev_tx_async_split(dev, vq, queue_id, mbuf_pool,
+ pkts, count, true);
+}
+
+__rte_noinline
+static uint16_t
+virtio_dev_tx_async_split_compliant(struct virtio_net *dev,
+ struct vhost_virtqueue *vq, uint16_t queue_id,
+ struct rte_mempool *mbuf_pool, struct rte_mbuf **pkts,
+ uint16_t count)
+{
+ return virtio_dev_tx_async_split(dev, vq, queue_id, mbuf_pool,
+ pkts, count, false);
+}
+
+uint16_t
+rte_vhost_async_try_dequeue_burst(int vid, uint16_t queue_id,
+ struct rte_mempool *mbuf_pool, struct rte_mbuf **pkts, uint16_t count,
+ int *nr_inflight)
+{
+ struct virtio_net *dev;
+ struct rte_mbuf *rarp_mbuf = NULL;
+ struct vhost_virtqueue *vq;
+ int16_t success = 1;
+
+ *nr_inflight = -1;
+
+ dev = get_device(vid);
+ if (!dev)
+ return 0;
+
+ if (unlikely(!(dev->flags & VIRTIO_DEV_BUILTIN_VIRTIO_NET))) {
+ VHOST_LOG_DATA(ERR,
+ "(%d) %s: built-in vhost net backend is disabled.\n",
+ dev->vid, __func__);
+ return 0;
+ }
+
+ if (unlikely(!is_valid_virt_queue_idx(queue_id, 1, dev->nr_vring))) {
+ VHOST_LOG_DATA(ERR,
+ "(%d) %s: invalid virtqueue idx %d.\n",
+ dev->vid, __func__, queue_id);
+ return 0;
+ }
+
+ vq = dev->virtqueue[queue_id];
+
+ if (unlikely(rte_spinlock_trylock(&vq->access_lock) == 0))
+ return 0;
+
+ if (unlikely(vq->enabled == 0)) {
+ count = 0;
+ goto out_access_unlock;
+ }
+
+ if (unlikely(!vq->async_registered)) {
+ VHOST_LOG_DATA(ERR, "(%d) %s: async not registered for queue id %d.\n",
+ dev->vid, __func__, queue_id);
+ count = 0;
+ goto out_access_unlock;
+ }
+
+ if (dev->features & (1ULL << VIRTIO_F_IOMMU_PLATFORM))
+ vhost_user_iotlb_rd_lock(vq);
+
+ if (unlikely(vq->access_ok == 0))
+ if (unlikely(vring_translate(dev, vq) < 0)) {
+ count = 0;
+ goto out;
+ }
+
+ /*
+ * Construct a RARP broadcast packet, and inject it to the "pkts"
+ * array, to make it look like the guest actually sent such a packet.
+ *
+ * Check user_send_rarp() for more information.
+ *
+ * broadcast_rarp shares a cacheline in the virtio_net structure
+ * with some fields that are accessed during enqueue and
+ * __atomic_compare_exchange_n causes a write if performed compare
+ * and exchange. This could result in false sharing between enqueue
+ * and dequeue.
+ *
+ * Prevent unnecessary false sharing by reading broadcast_rarp first
+ * and only performing compare and exchange if the read indicates it
+ * is likely to be set.
+ */
+ if (unlikely(__atomic_load_n(&dev->broadcast_rarp, __ATOMIC_ACQUIRE) &&
+ __atomic_compare_exchange_n(&dev->broadcast_rarp,
+ &success, 0, 0, __ATOMIC_RELEASE, __ATOMIC_RELAXED))) {
+
+ rarp_mbuf = rte_net_make_rarp_packet(mbuf_pool, &dev->mac);
+ if (rarp_mbuf == NULL) {
+ VHOST_LOG_DATA(ERR, "Failed to make RARP packet.\n");
+ count = 0;
+ goto out;
+ }
+ count -= 1;
+ }
+
+ if (unlikely(vq_is_packed(dev))) {
+ VHOST_LOG_DATA(ERR,
+ "(%d) %s: async dequeue does not support packed ring.\n",
+ dev->vid, __func__);
+ count = 0;
+ goto out;
+ }
+
+ if (dev->flags & VIRTIO_DEV_LEGACY_OL_FLAGS)
+ count = virtio_dev_tx_async_split_legacy(dev, vq, queue_id,
+ mbuf_pool, pkts, count);
+ else
+ count = virtio_dev_tx_async_split_compliant(dev, vq, queue_id,
+ mbuf_pool, pkts, count);
+
+out:
+ *nr_inflight = vq->async_pkts_inflight_n;
+
+ if (dev->features & (1ULL << VIRTIO_F_IOMMU_PLATFORM))
+ vhost_user_iotlb_rd_unlock(vq);
+
+out_access_unlock:
+ rte_spinlock_unlock(&vq->access_lock);
+
+ if (unlikely(rarp_mbuf != NULL)) {
+ /*
+ * Inject it to the head of "pkts" array, so that switch's mac
+ * learning table will get updated first.
+ */
+ memmove(&pkts[1], pkts, count * sizeof(struct rte_mbuf *));
+ pkts[0] = rarp_mbuf;
+ count += 1;
+ }
+
+ return count;
+}
--
2.25.1
^ permalink raw reply [flat|nested] 28+ messages in thread
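Two idioms recur in the split-ring code of this patch: ring indices are reduced with `idx & (vq->size - 1)`, which assumes a power-of-two ring size, and completions beyond what the caller asked for are carried over in `async_last_pkts_n` for the next poll. A minimal standalone illustration of both (toy state, not the vhost structures):

```c
#include <assert.h>

#define VQ_SIZE 8	/* must be a power of two for the mask trick */

/* Cheap modulo for power-of-two ring sizes, as used for slot_idx/from. */
static unsigned int slot(unsigned int idx)
{
	return idx & (VQ_SIZE - 1);
}

struct poll_state {
	unsigned int last_pkts_n;	/* completions carried over */
};

/* Hand back at most 'count' packets; stash any surplus completions. */
static unsigned int poll_completed(struct poll_state *st,
		unsigned int newly_done, unsigned int count)
{
	unsigned int n_cpl = newly_done + st->last_pkts_n;
	unsigned int n_put = (count < n_cpl) ? count : n_cpl;

	st->last_pkts_n = n_cpl - n_put;
	return n_put;
}
```

Because indices grow monotonically and only get masked at use, a poll that completes more packets than the caller requested loses nothing: the surplus is simply returned on the next call.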
* [dpdk-dev] [PATCH v2 2/4] examples/vhost: refactor vhost enqueue and dequeue datapaths
2021-09-17 19:26 ` [dpdk-dev] [PATCH v2 " Wenwu Ma
2021-09-17 19:27 ` [dpdk-dev] [PATCH v2 1/4] vhost: " Wenwu Ma
@ 2021-09-17 19:27 ` Wenwu Ma
2021-09-27 6:56 ` Jiang, Cheng1
2021-09-17 19:27 ` [dpdk-dev] [PATCH v2 3/4] examples/vhost: use a new API to query remaining ring space Wenwu Ma
2021-09-17 19:27 ` [dpdk-dev] [PATCH v2 4/4] examples/vhost: support vhost async dequeue data path Wenwu Ma
3 siblings, 1 reply; 28+ messages in thread
From: Wenwu Ma @ 2021-09-17 19:27 UTC (permalink / raw)
To: dev
Cc: maxime.coquelin, chenbo.xia, cheng1.jiang, jiayu.hu, Sunil.Pai.G,
yvonnex.yang, Wenwu Ma
Previously, the data path checked driver flags on every call to choose
between different enqueue/dequeue functions.
Now, an ops table is initialized when the vhost device is created,
so the data path can call through it directly without any further
flag checks.
Signed-off-by: Wenwu Ma <wenwux.ma@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Tested-by: Yvonne Yang <yvonnex.yang@intel.com>
---
examples/vhost/main.c | 100 +++++++++++++++++++++---------------
examples/vhost/main.h | 28 ++++++++--
examples/vhost/virtio_net.c | 16 +++++-
3 files changed, 98 insertions(+), 46 deletions(-)
diff --git a/examples/vhost/main.c b/examples/vhost/main.c
index d0bf1f31e3..254f7097bc 100644
--- a/examples/vhost/main.c
+++ b/examples/vhost/main.c
@@ -106,6 +106,8 @@ static uint32_t burst_rx_retry_num = BURST_RX_RETRIES;
static char *socket_files;
static int nb_sockets;
+static struct vhost_queue_ops vdev_queue_ops[MAX_VHOST_DEVICE];
+
/* empty vmdq configuration structure. Filled in programmatically */
static struct rte_eth_conf vmdq_conf_default = {
.rxmode = {
@@ -879,22 +881,8 @@ drain_vhost(struct vhost_dev *vdev)
uint16_t nr_xmit = vhost_txbuff[buff_idx]->len;
struct rte_mbuf **m = vhost_txbuff[buff_idx]->m_table;
- if (builtin_net_driver) {
- ret = vs_enqueue_pkts(vdev, VIRTIO_RXQ, m, nr_xmit);
- } else if (async_vhost_driver) {
- uint16_t enqueue_fail = 0;
-
- complete_async_pkts(vdev);
- ret = rte_vhost_submit_enqueue_burst(vdev->vid, VIRTIO_RXQ, m, nr_xmit);
- __atomic_add_fetch(&vdev->pkts_inflight, ret, __ATOMIC_SEQ_CST);
-
- enqueue_fail = nr_xmit - ret;
- if (enqueue_fail)
- free_pkts(&m[ret], nr_xmit - ret);
- } else {
- ret = rte_vhost_enqueue_burst(vdev->vid, VIRTIO_RXQ,
- m, nr_xmit);
- }
+ ret = vdev_queue_ops[vdev->vid].enqueue_pkt_burst(vdev,
+ VIRTIO_RXQ, m, nr_xmit);
if (enable_stats) {
__atomic_add_fetch(&vdev->stats.rx_total_atomic, nr_xmit,
@@ -1173,6 +1161,33 @@ drain_mbuf_table(struct mbuf_table *tx_q)
}
}
+uint16_t
+async_enqueue_pkts(struct vhost_dev *vdev, uint16_t queue_id,
+ struct rte_mbuf **pkts, uint32_t rx_count)
+{
+ uint16_t enqueue_count;
+ uint16_t enqueue_fail = 0;
+
+ complete_async_pkts(vdev);
+ enqueue_count = rte_vhost_submit_enqueue_burst(vdev->vid,
+ queue_id, pkts, rx_count);
+ __atomic_add_fetch(&vdev->pkts_inflight, enqueue_count,
+ __ATOMIC_SEQ_CST);
+
+ enqueue_fail = rx_count - enqueue_count;
+ if (enqueue_fail)
+ free_pkts(&pkts[enqueue_count], enqueue_fail);
+
+ return enqueue_count;
+}
+
+uint16_t
+sync_enqueue_pkts(struct vhost_dev *vdev, uint16_t queue_id,
+ struct rte_mbuf **pkts, uint32_t rx_count)
+{
+ return rte_vhost_enqueue_burst(vdev->vid, queue_id, pkts, rx_count);
+}
+
static __rte_always_inline void
drain_eth_rx(struct vhost_dev *vdev)
{
@@ -1203,25 +1218,8 @@ drain_eth_rx(struct vhost_dev *vdev)
}
}
- if (builtin_net_driver) {
- enqueue_count = vs_enqueue_pkts(vdev, VIRTIO_RXQ,
- pkts, rx_count);
- } else if (async_vhost_driver) {
- uint16_t enqueue_fail = 0;
-
- complete_async_pkts(vdev);
- enqueue_count = rte_vhost_submit_enqueue_burst(vdev->vid,
- VIRTIO_RXQ, pkts, rx_count);
- __atomic_add_fetch(&vdev->pkts_inflight, enqueue_count, __ATOMIC_SEQ_CST);
-
- enqueue_fail = rx_count - enqueue_count;
- if (enqueue_fail)
- free_pkts(&pkts[enqueue_count], enqueue_fail);
-
- } else {
- enqueue_count = rte_vhost_enqueue_burst(vdev->vid, VIRTIO_RXQ,
- pkts, rx_count);
- }
+ enqueue_count = vdev_queue_ops[vdev->vid].enqueue_pkt_burst(vdev,
+ VIRTIO_RXQ, pkts, rx_count);
if (enable_stats) {
__atomic_add_fetch(&vdev->stats.rx_total_atomic, rx_count,
@@ -1234,6 +1232,14 @@ drain_eth_rx(struct vhost_dev *vdev)
free_pkts(pkts, rx_count);
}
+uint16_t sync_dequeue_pkts(struct vhost_dev *dev, uint16_t queue_id,
+ struct rte_mempool *mbuf_pool,
+ struct rte_mbuf **pkts, uint16_t count)
+{
+ return rte_vhost_dequeue_burst(dev->vid, queue_id,
+ mbuf_pool, pkts, count);
+}
+
static __rte_always_inline void
drain_virtio_tx(struct vhost_dev *vdev)
{
@@ -1241,13 +1247,8 @@ drain_virtio_tx(struct vhost_dev *vdev)
uint16_t count;
uint16_t i;
- if (builtin_net_driver) {
- count = vs_dequeue_pkts(vdev, VIRTIO_TXQ, mbuf_pool,
- pkts, MAX_PKT_BURST);
- } else {
- count = rte_vhost_dequeue_burst(vdev->vid, VIRTIO_TXQ,
- mbuf_pool, pkts, MAX_PKT_BURST);
- }
+ count = vdev_queue_ops[vdev->vid].dequeue_pkt_burst(vdev,
+ VIRTIO_TXQ, mbuf_pool, pkts, MAX_PKT_BURST);
/* setup VMDq for the first packet */
if (unlikely(vdev->ready == DEVICE_MAC_LEARNING) && count) {
@@ -1432,6 +1433,21 @@ new_device(int vid)
}
}
+ if (builtin_net_driver) {
+ vdev_queue_ops[vid].enqueue_pkt_burst = builtin_enqueue_pkts;
+ vdev_queue_ops[vid].dequeue_pkt_burst = builtin_dequeue_pkts;
+ } else {
+ if (async_vhost_driver) {
+ vdev_queue_ops[vid].enqueue_pkt_burst =
+ async_enqueue_pkts;
+ } else {
+ vdev_queue_ops[vid].enqueue_pkt_burst =
+ sync_enqueue_pkts;
+ }
+
+ vdev_queue_ops[vid].dequeue_pkt_burst = sync_dequeue_pkts;
+ }
+
if (builtin_net_driver)
vs_vhost_net_setup(vdev);
diff --git a/examples/vhost/main.h b/examples/vhost/main.h
index e7b1ac60a6..2c5a558f12 100644
--- a/examples/vhost/main.h
+++ b/examples/vhost/main.h
@@ -61,6 +61,19 @@ struct vhost_dev {
struct vhost_queue queues[MAX_QUEUE_PAIRS * 2];
} __rte_cache_aligned;
+typedef uint16_t (*vhost_enqueue_burst_t)(struct vhost_dev *dev,
+ uint16_t queue_id, struct rte_mbuf **pkts,
+ uint32_t count);
+
+typedef uint16_t (*vhost_dequeue_burst_t)(struct vhost_dev *dev,
+ uint16_t queue_id, struct rte_mempool *mbuf_pool,
+ struct rte_mbuf **pkts, uint16_t count);
+
+struct vhost_queue_ops {
+ vhost_enqueue_burst_t enqueue_pkt_burst;
+ vhost_dequeue_burst_t dequeue_pkt_burst;
+};
+
TAILQ_HEAD(vhost_dev_tailq_list, vhost_dev);
@@ -87,7 +100,16 @@ void vs_vhost_net_remove(struct vhost_dev *dev);
uint16_t vs_enqueue_pkts(struct vhost_dev *dev, uint16_t queue_id,
struct rte_mbuf **pkts, uint32_t count);
-uint16_t vs_dequeue_pkts(struct vhost_dev *dev, uint16_t queue_id,
- struct rte_mempool *mbuf_pool,
- struct rte_mbuf **pkts, uint16_t count);
+uint16_t builtin_enqueue_pkts(struct vhost_dev *dev, uint16_t queue_id,
+ struct rte_mbuf **pkts, uint32_t count);
+uint16_t builtin_dequeue_pkts(struct vhost_dev *dev, uint16_t queue_id,
+ struct rte_mempool *mbuf_pool,
+ struct rte_mbuf **pkts, uint16_t count);
+uint16_t sync_enqueue_pkts(struct vhost_dev *dev, uint16_t queue_id,
+ struct rte_mbuf **pkts, uint32_t count);
+uint16_t sync_dequeue_pkts(struct vhost_dev *dev, uint16_t queue_id,
+ struct rte_mempool *mbuf_pool,
+ struct rte_mbuf **pkts, uint16_t count);
+uint16_t async_enqueue_pkts(struct vhost_dev *dev, uint16_t queue_id,
+ struct rte_mbuf **pkts, uint32_t count);
#endif /* _MAIN_H_ */
diff --git a/examples/vhost/virtio_net.c b/examples/vhost/virtio_net.c
index 9064fc3a82..2432a96566 100644
--- a/examples/vhost/virtio_net.c
+++ b/examples/vhost/virtio_net.c
@@ -238,6 +238,13 @@ vs_enqueue_pkts(struct vhost_dev *dev, uint16_t queue_id,
return count;
}
+uint16_t
+builtin_enqueue_pkts(struct vhost_dev *dev, uint16_t queue_id,
+ struct rte_mbuf **pkts, uint32_t count)
+{
+ return vs_enqueue_pkts(dev, queue_id, pkts, count);
+}
+
static __rte_always_inline int
dequeue_pkt(struct vhost_dev *dev, struct rte_vhost_vring *vr,
struct rte_mbuf *m, uint16_t desc_idx,
@@ -363,7 +370,7 @@ dequeue_pkt(struct vhost_dev *dev, struct rte_vhost_vring *vr,
return 0;
}
-uint16_t
+static uint16_t
vs_dequeue_pkts(struct vhost_dev *dev, uint16_t queue_id,
struct rte_mempool *mbuf_pool, struct rte_mbuf **pkts, uint16_t count)
{
@@ -440,3 +447,10 @@ vs_dequeue_pkts(struct vhost_dev *dev, uint16_t queue_id,
return i;
}
+
+uint16_t
+builtin_dequeue_pkts(struct vhost_dev *dev, uint16_t queue_id,
+ struct rte_mempool *mbuf_pool, struct rte_mbuf **pkts, uint16_t count)
+{
+ return vs_dequeue_pkts(dev, queue_id, mbuf_pool, pkts, count);
+}
--
2.25.1
^ permalink raw reply [flat|nested] 28+ messages in thread
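The refactor above replaces per-packet flag checks with an ops table filled once per device in `new_device()`. A stripped-down sketch of the same dispatch pattern (hypothetical stub callbacks, not the sample's real enqueue paths):

```c
#include <assert.h>

/* One-time selection of the datapath at device creation; the hot path
 * then calls through the table with no branching on driver flags.
 * The callbacks below are stand-ins for the real enqueue functions. */
typedef int (*enqueue_fn)(int vid, int count);

static int sync_enqueue_stub(int vid, int count)
{
	(void)vid;
	return count;		/* pretend all packets were enqueued */
}

static int async_enqueue_stub(int vid, int count)
{
	(void)vid;
	return count / 2;	/* pretend only half were submitted */
}

struct queue_ops {
	enqueue_fn enqueue;
};

#define MAX_DEVS 4
static struct queue_ops dev_ops[MAX_DEVS];

/* Mirrors new_device(): pick the callbacks once, keyed by vid. */
static void toy_new_device(int vid, int use_async)
{
	dev_ops[vid].enqueue = use_async ? async_enqueue_stub
					 : sync_enqueue_stub;
}
```

The win is that the branch is taken once at setup rather than on every burst, and adding another datapath later only means adding another pair of callbacks.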
* [dpdk-dev] [PATCH v2 3/4] examples/vhost: use a new API to query remaining ring space
2021-09-17 19:26 ` [dpdk-dev] [PATCH v2 " Wenwu Ma
2021-09-17 19:27 ` [dpdk-dev] [PATCH v2 1/4] vhost: " Wenwu Ma
2021-09-17 19:27 ` [dpdk-dev] [PATCH v2 2/4] examples/vhost: refactor vhost enqueue and dequeue datapaths Wenwu Ma
@ 2021-09-17 19:27 ` Wenwu Ma
2021-09-17 19:27 ` [dpdk-dev] [PATCH v2 4/4] examples/vhost: support vhost async dequeue data path Wenwu Ma
3 siblings, 0 replies; 28+ messages in thread
From: Wenwu Ma @ 2021-09-17 19:27 UTC (permalink / raw)
To: dev
Cc: maxime.coquelin, chenbo.xia, cheng1.jiang, jiayu.hu, Sunil.Pai.G,
yvonnex.yang, Wenwu Ma
A new API for querying the remaining descriptor ring capacity
is available, so use it instead of tracking the free space manually.
Signed-off-by: Wenwu Ma <wenwux.ma@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
Tested-by: Yvonne Yang <yvonnex.yang@intel.com>
---
examples/vhost/ioat.c | 6 +-----
1 file changed, 1 insertion(+), 5 deletions(-)
diff --git a/examples/vhost/ioat.c b/examples/vhost/ioat.c
index 457f8171f0..6adc30b622 100644
--- a/examples/vhost/ioat.c
+++ b/examples/vhost/ioat.c
@@ -17,7 +17,6 @@ struct packet_tracker {
unsigned short next_read;
unsigned short next_write;
unsigned short last_remain;
- unsigned short ioat_space;
};
struct packet_tracker cb_tracker[MAX_VHOST_DEVICE];
@@ -113,7 +112,6 @@ open_ioat(const char *value)
goto out;
}
rte_rawdev_start(dev_id);
- cb_tracker[dev_id].ioat_space = IOAT_RING_SIZE - 1;
dma_info->nr++;
i++;
}
@@ -140,7 +138,7 @@ ioat_transfer_data_cb(int vid, uint16_t queue_id,
src = descs[i_desc].src;
dst = descs[i_desc].dst;
i_seg = 0;
- if (cb_tracker[dev_id].ioat_space < src->nr_segs)
+ if (rte_ioat_burst_capacity(dev_id) < src->nr_segs)
break;
while (i_seg < src->nr_segs) {
rte_ioat_enqueue_copy(dev_id,
@@ -155,7 +153,6 @@ ioat_transfer_data_cb(int vid, uint16_t queue_id,
}
write &= mask;
cb_tracker[dev_id].size_track[write] = src->nr_segs;
- cb_tracker[dev_id].ioat_space -= src->nr_segs;
write++;
}
} else {
@@ -194,7 +191,6 @@ ioat_check_completed_copies_cb(int vid, uint16_t queue_id,
if (n_seg == 0)
return 0;
- cb_tracker[dev_id].ioat_space += n_seg;
n_seg += cb_tracker[dev_id].last_remain;
read = cb_tracker[dev_id].next_read;
--
2.25.1
^ permalink raw reply [flat|nested] 28+ messages in thread
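The patch above drops the hand-maintained `ioat_space` counter in favor of asking the device via `rte_ioat_burst_capacity()`. The toy ring below shows the bookkeeping that counter was duplicating; deriving capacity from the ring's own head/tail state removes one more place that had to be updated on every enqueue and completion. Illustrative only, not the rawdev implementation:

```c
#include <assert.h>

#define TOY_RING_SIZE 16

/* A completion ring: 'head' counts descriptors enqueued, 'tail' counts
 * descriptors completed; both grow monotonically and would be masked
 * only when used as array indices. */
struct toy_ring {
	unsigned int head;
	unsigned int tail;
};

/* Remaining capacity derived from the ring state itself, rather than
 * from a separately maintained free-space counter. */
static unsigned int toy_burst_capacity(const struct toy_ring *r)
{
	return TOY_RING_SIZE - (r->head - r->tail);
}
```

With unsigned monotonic counters, `head - tail` stays correct across wrap-around, so the query is always consistent even when a shadow counter would have drifted after a missed update.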
* [dpdk-dev] [PATCH v2 4/4] examples/vhost: support vhost async dequeue data path
2021-09-17 19:26 ` [dpdk-dev] [PATCH v2 " Wenwu Ma
` (2 preceding siblings ...)
2021-09-17 19:27 ` [dpdk-dev] [PATCH v2 3/4] examples/vhost: use a new API to query remaining ring space Wenwu Ma
@ 2021-09-17 19:27 ` Wenwu Ma
3 siblings, 0 replies; 28+ messages in thread
From: Wenwu Ma @ 2021-09-17 19:27 UTC (permalink / raw)
To: dev
Cc: maxime.coquelin, chenbo.xia, cheng1.jiang, jiayu.hu, Sunil.Pai.G,
yvonnex.yang, Wenwu Ma
This patch adds the vhost async dequeue data path to the vhost sample.
The vswitch can leverage IOAT to accelerate the async dequeue data path.
Signed-off-by: Wenwu Ma <wenwux.ma@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Tested-by: Yvonne Yang <yvonnex.yang@intel.com>
---
doc/guides/sample_app_ug/vhost.rst | 9 +-
examples/vhost/ioat.c | 61 +++++++--
examples/vhost/ioat.h | 25 ++++
examples/vhost/main.c | 201 +++++++++++++++++++----------
examples/vhost/main.h | 6 +-
5 files changed, 219 insertions(+), 83 deletions(-)
diff --git a/doc/guides/sample_app_ug/vhost.rst b/doc/guides/sample_app_ug/vhost.rst
index 9afde9c7f5..63dcf181e1 100644
--- a/doc/guides/sample_app_ug/vhost.rst
+++ b/doc/guides/sample_app_ug/vhost.rst
@@ -169,9 +169,12 @@ demonstrates how to use the async vhost APIs. It's used in combination with dmas
**--dmas**
This parameter is used to specify the assigned DMA device of a vhost device.
Async vhost-user net driver will be used if --dmas is set. For example
---dmas [txd0@00:04.0,txd1@00:04.1] means use DMA channel 00:04.0 for vhost
-device 0 enqueue operation and use DMA channel 00:04.1 for vhost device 1
-enqueue operation.
+--dmas [txd0@00:04.0,txd1@00:04.1,rxd0@00:04.2,rxd1@00:04.3] means use
+DMA channel 00:04.0/00:04.2 for vhost device 0 enqueue/dequeue operation
+and use DMA channel 00:04.1/00:04.3 for vhost device 1 enqueue/dequeue
+operation. The index of the device corresponds to the socket file in order,
+that means vhost device 0 is created through the first socket file, vhost
+device 1 is created through the second socket file, and so on.
Common Issues
-------------
diff --git a/examples/vhost/ioat.c b/examples/vhost/ioat.c
index 6adc30b622..3a256b0f4c 100644
--- a/examples/vhost/ioat.c
+++ b/examples/vhost/ioat.c
@@ -21,6 +21,8 @@ struct packet_tracker {
struct packet_tracker cb_tracker[MAX_VHOST_DEVICE];
+int vid2socketid[MAX_VHOST_DEVICE];
+
int
open_ioat(const char *value)
{
@@ -29,7 +31,7 @@ open_ioat(const char *value)
char *addrs = input;
char *ptrs[2];
char *start, *end, *substr;
- int64_t vid, vring_id;
+ int64_t socketid, vring_id;
struct rte_ioat_rawdev_config config;
struct rte_rawdev_info info = { .dev_private = &config };
char name[32];
@@ -60,6 +62,7 @@ open_ioat(const char *value)
goto out;
}
while (i < args_nr) {
+ bool is_txd;
char *arg_temp = dma_arg[i];
uint8_t sub_nr;
sub_nr = rte_strsplit(arg_temp, strlen(arg_temp), ptrs, 2, '@');
@@ -68,27 +71,39 @@ open_ioat(const char *value)
goto out;
}
- start = strstr(ptrs[0], "txd");
- if (start == NULL) {
+ int async_flag;
+ char *txd, *rxd;
+ txd = strstr(ptrs[0], "txd");
+ rxd = strstr(ptrs[0], "rxd");
+ if (txd) {
+ is_txd = true;
+ start = txd;
+ async_flag = ASYNC_ENQUEUE_VHOST;
+ } else if (rxd) {
+ is_txd = false;
+ start = rxd;
+ async_flag = ASYNC_DEQUEUE_VHOST;
+ } else {
ret = -1;
goto out;
}
start += 3;
- vid = strtol(start, &end, 0);
+ socketid = strtol(start, &end, 0);
if (end == start) {
ret = -1;
goto out;
}
- vring_id = 0 + VIRTIO_RXQ;
+ vring_id = is_txd ? VIRTIO_RXQ : VIRTIO_TXQ;
+
if (rte_pci_addr_parse(ptrs[1],
- &(dma_info + vid)->dmas[vring_id].addr) < 0) {
+ &(dma_info + socketid)->dmas[vring_id].addr) < 0) {
ret = -1;
goto out;
}
- rte_pci_device_name(&(dma_info + vid)->dmas[vring_id].addr,
+ rte_pci_device_name(&(dma_info + socketid)->dmas[vring_id].addr,
name, sizeof(name));
dev_id = rte_rawdev_get_dev_id(name);
if (dev_id == (uint16_t)(-ENODEV) ||
@@ -103,8 +118,9 @@ open_ioat(const char *value)
goto out;
}
- (dma_info + vid)->dmas[vring_id].dev_id = dev_id;
- (dma_info + vid)->dmas[vring_id].is_valid = true;
+ (dma_info + socketid)->dmas[vring_id].dev_id = dev_id;
+ (dma_info + socketid)->dmas[vring_id].is_valid = true;
+ (dma_info + socketid)->async_flag |= async_flag;
config.ring_size = IOAT_RING_SIZE;
config.hdls_disable = true;
if (rte_rawdev_configure(dev_id, &info, sizeof(config)) < 0) {
@@ -126,13 +142,16 @@ ioat_transfer_data_cb(int vid, uint16_t queue_id,
struct rte_vhost_async_status *opaque_data, uint16_t count)
{
uint32_t i_desc;
- uint16_t dev_id = dma_bind[vid].dmas[queue_id * 2 + VIRTIO_RXQ].dev_id;
struct rte_vhost_iov_iter *src = NULL;
struct rte_vhost_iov_iter *dst = NULL;
unsigned long i_seg;
unsigned short mask = MAX_ENQUEUED_SIZE - 1;
- unsigned short write = cb_tracker[dev_id].next_write;
+ if (queue_id >= MAX_RING_COUNT)
+ return -1;
+
+ uint16_t dev_id = dma_bind[vid2socketid[vid]].dmas[queue_id].dev_id;
+ unsigned short write = cb_tracker[dev_id].next_write;
if (!opaque_data) {
for (i_desc = 0; i_desc < count; i_desc++) {
src = descs[i_desc].src;
@@ -170,16 +189,16 @@ ioat_check_completed_copies_cb(int vid, uint16_t queue_id,
struct rte_vhost_async_status *opaque_data,
uint16_t max_packets)
{
- if (!opaque_data) {
+ if (!opaque_data && queue_id < MAX_RING_COUNT) {
uintptr_t dump[255];
int n_seg;
unsigned short read, write;
unsigned short nb_packet = 0;
unsigned short mask = MAX_ENQUEUED_SIZE - 1;
unsigned short i;
+ uint16_t dev_id;
- uint16_t dev_id = dma_bind[vid].dmas[queue_id * 2
- + VIRTIO_RXQ].dev_id;
+ dev_id = dma_bind[vid2socketid[vid]].dmas[queue_id].dev_id;
n_seg = rte_ioat_completed_ops(dev_id, 255, NULL, NULL, dump, dump);
if (n_seg < 0) {
RTE_LOG(ERR,
@@ -215,4 +234,18 @@ ioat_check_completed_copies_cb(int vid, uint16_t queue_id,
return -1;
}
+uint32_t get_async_flag_by_vid(int vid)
+{
+ return dma_bind[vid2socketid[vid]].async_flag;
+}
+
+uint32_t get_async_flag_by_socketid(int socketid)
+{
+ return dma_bind[socketid].async_flag;
+}
+
+void init_vid2socketid_array(int vid, int socketid)
+{
+ vid2socketid[vid] = socketid;
+}
#endif /* RTE_RAW_IOAT */
diff --git a/examples/vhost/ioat.h b/examples/vhost/ioat.h
index 62e163c585..105cee556d 100644
--- a/examples/vhost/ioat.h
+++ b/examples/vhost/ioat.h
@@ -12,6 +12,9 @@
#define MAX_VHOST_DEVICE 1024
#define IOAT_RING_SIZE 4096
#define MAX_ENQUEUED_SIZE 4096
+#define MAX_RING_COUNT 2
+#define ASYNC_ENQUEUE_VHOST 1
+#define ASYNC_DEQUEUE_VHOST 2
struct dma_info {
struct rte_pci_addr addr;
@@ -20,6 +23,7 @@ struct dma_info {
};
struct dma_for_vhost {
+ uint32_t async_flag;
struct dma_info dmas[RTE_MAX_QUEUES_PER_PORT * 2];
uint16_t nr;
};
@@ -36,6 +40,10 @@ int32_t
ioat_check_completed_copies_cb(int vid, uint16_t queue_id,
struct rte_vhost_async_status *opaque_data,
uint16_t max_packets);
+
+uint32_t get_async_flag_by_vid(int vid);
+uint32_t get_async_flag_by_socketid(int socketid);
+void init_vid2socketid_array(int vid, int socketid);
#else
static int open_ioat(const char *value __rte_unused)
{
@@ -59,5 +67,22 @@ ioat_check_completed_copies_cb(int vid __rte_unused,
{
return -1;
}
+
+static uint32_t
+get_async_flag_by_vid(int vid __rte_unused)
+{
+ return 0;
+}
+
+static uint32_t
+get_async_flag_by_socketid(int socketid __rte_unused)
+{
+ return 0;
+}
+
+static void
+init_vid2socketid_array(int vid __rte_unused, int socketid __rte_unused)
+{
+}
#endif
#endif /* _IOAT_H_ */
diff --git a/examples/vhost/main.c b/examples/vhost/main.c
index 254f7097bc..572ffc12ae 100644
--- a/examples/vhost/main.c
+++ b/examples/vhost/main.c
@@ -93,8 +93,6 @@ static int client_mode;
static int builtin_net_driver;
-static int async_vhost_driver;
-
static char *dma_type;
/* Specify timeout (in useconds) between retries on RX. */
@@ -673,7 +671,6 @@ us_vhost_parse_args(int argc, char **argv)
us_vhost_usage(prgname);
return -1;
}
- async_vhost_driver = 1;
break;
case OPT_CLIENT_NUM:
@@ -846,7 +843,8 @@ complete_async_pkts(struct vhost_dev *vdev)
VIRTIO_RXQ, p_cpl, MAX_PKT_BURST);
if (complete_count) {
free_pkts(p_cpl, complete_count);
- __atomic_sub_fetch(&vdev->pkts_inflight, complete_count, __ATOMIC_SEQ_CST);
+ __atomic_sub_fetch(&vdev->pkts_enq_inflight,
+ complete_count, __ATOMIC_SEQ_CST);
}
}
@@ -891,7 +889,7 @@ drain_vhost(struct vhost_dev *vdev)
__ATOMIC_SEQ_CST);
}
- if (!async_vhost_driver)
+ if ((get_async_flag_by_vid(vdev->vid) & ASYNC_ENQUEUE_VHOST) == 0)
free_pkts(m, nr_xmit);
}
@@ -1171,8 +1169,8 @@ async_enqueue_pkts(struct vhost_dev *vdev, uint16_t queue_id,
complete_async_pkts(vdev);
enqueue_count = rte_vhost_submit_enqueue_burst(vdev->vid,
queue_id, pkts, rx_count);
- __atomic_add_fetch(&vdev->pkts_inflight, enqueue_count,
- __ATOMIC_SEQ_CST);
+ __atomic_add_fetch(&vdev->pkts_enq_inflight,
+ enqueue_count, __ATOMIC_SEQ_CST);
enqueue_fail = rx_count - enqueue_count;
if (enqueue_fail)
@@ -1228,10 +1226,23 @@ drain_eth_rx(struct vhost_dev *vdev)
__ATOMIC_SEQ_CST);
}
- if (!async_vhost_driver)
+ if ((get_async_flag_by_vid(vdev->vid) & ASYNC_ENQUEUE_VHOST) == 0)
free_pkts(pkts, rx_count);
}
+uint16_t async_dequeue_pkts(struct vhost_dev *dev, uint16_t queue_id,
+ struct rte_mempool *mbuf_pool,
+ struct rte_mbuf **pkts, uint16_t count)
+{
+ int nr_inflight;
+ uint16_t dequeue_count;
+ dequeue_count = rte_vhost_async_try_dequeue_burst(dev->vid, queue_id,
+ mbuf_pool, pkts, count, &nr_inflight);
+ if (likely(nr_inflight != -1))
+ dev->pkts_deq_inflight = nr_inflight;
+ return dequeue_count;
+}
+
uint16_t sync_dequeue_pkts(struct vhost_dev *dev, uint16_t queue_id,
struct rte_mempool *mbuf_pool,
struct rte_mbuf **pkts, uint16_t count)
@@ -1327,6 +1338,32 @@ switch_worker(void *arg __rte_unused)
return 0;
}
+static void
+vhost_clear_queue_thread_unsafe(struct vhost_dev *vdev, uint16_t queue_id)
+{
+ uint16_t n_pkt = 0;
+ struct rte_mbuf *m_enq_cpl[vdev->pkts_enq_inflight];
+ struct rte_mbuf *m_deq_cpl[vdev->pkts_deq_inflight];
+
+ if (queue_id % 2 == 0) {
+ while (vdev->pkts_enq_inflight) {
+ n_pkt = rte_vhost_clear_queue_thread_unsafe(vdev->vid,
+ queue_id, m_enq_cpl, vdev->pkts_enq_inflight);
+ free_pkts(m_enq_cpl, n_pkt);
+ __atomic_sub_fetch(&vdev->pkts_enq_inflight,
+ n_pkt, __ATOMIC_SEQ_CST);
+ }
+ } else {
+ while (vdev->pkts_deq_inflight) {
+ n_pkt = rte_vhost_clear_queue_thread_unsafe(vdev->vid,
+ queue_id, m_deq_cpl, vdev->pkts_deq_inflight);
+ free_pkts(m_deq_cpl, n_pkt);
+ __atomic_sub_fetch(&vdev->pkts_deq_inflight,
+ n_pkt, __ATOMIC_SEQ_CST);
+ }
+ }
+}
+
/*
* Remove a device from the specific data core linked list and from the
* main linked list. Synchonization occurs through the use of the
@@ -1383,21 +1420,91 @@ destroy_device(int vid)
"(%d) device has been removed from data core\n",
vdev->vid);
- if (async_vhost_driver) {
- uint16_t n_pkt = 0;
- struct rte_mbuf *m_cpl[vdev->pkts_inflight];
+ if (get_async_flag_by_vid(vid) & ASYNC_ENQUEUE_VHOST) {
+ vhost_clear_queue_thread_unsafe(vdev, VIRTIO_RXQ);
+ rte_vhost_async_channel_unregister(vid, VIRTIO_RXQ);
+ }
+ if (get_async_flag_by_vid(vid) & ASYNC_DEQUEUE_VHOST) {
+ vhost_clear_queue_thread_unsafe(vdev, VIRTIO_TXQ);
+ rte_vhost_async_channel_unregister(vid, VIRTIO_TXQ);
+ }
+
+ rte_free(vdev);
+}
+
+static int
+get_socketid_by_vid(int vid)
+{
+ int i;
+ char ifname[PATH_MAX];
+ rte_vhost_get_ifname(vid, ifname, sizeof(ifname));
+
+ for (i = 0; i < nb_sockets; i++) {
+ char *file = socket_files + i * PATH_MAX;
+ if (strcmp(file, ifname) == 0)
+ return i;
+ }
+
+ return -1;
+}
+
+static int
+init_vhost_queue_ops(int vid)
+{
+ int socketid = get_socketid_by_vid(vid);
+ if (socketid == -1)
+ return -1;
+
+ init_vid2socketid_array(vid, socketid);
+ if (builtin_net_driver) {
+ vdev_queue_ops[vid].enqueue_pkt_burst = builtin_enqueue_pkts;
+ vdev_queue_ops[vid].dequeue_pkt_burst = builtin_dequeue_pkts;
+ } else {
+ if (get_async_flag_by_vid(vid) & ASYNC_ENQUEUE_VHOST) {
+ vdev_queue_ops[vid].enqueue_pkt_burst =
+ async_enqueue_pkts;
+ } else {
+ vdev_queue_ops[vid].enqueue_pkt_burst =
+ sync_enqueue_pkts;
+ }
- while (vdev->pkts_inflight) {
- n_pkt = rte_vhost_clear_queue_thread_unsafe(vid, VIRTIO_RXQ,
- m_cpl, vdev->pkts_inflight);
- free_pkts(m_cpl, n_pkt);
- __atomic_sub_fetch(&vdev->pkts_inflight, n_pkt, __ATOMIC_SEQ_CST);
+ if (get_async_flag_by_vid(vid) & ASYNC_DEQUEUE_VHOST) {
+ vdev_queue_ops[vid].dequeue_pkt_burst =
+ async_dequeue_pkts;
+ } else {
+ vdev_queue_ops[vid].dequeue_pkt_burst =
+ sync_dequeue_pkts;
}
+ }
- rte_vhost_async_channel_unregister(vid, VIRTIO_RXQ);
+ return 0;
+}
+
+static int
+vhost_async_channel_register(int vid)
+{
+ int ret = 0;
+ struct rte_vhost_async_config config = {0};
+ struct rte_vhost_async_channel_ops channel_ops;
+
+ if (dma_type != NULL && strncmp(dma_type, "ioat", 4) == 0) {
+ channel_ops.transfer_data = ioat_transfer_data_cb;
+ channel_ops.check_completed_copies =
+ ioat_check_completed_copies_cb;
+
+ config.features = RTE_VHOST_ASYNC_INORDER;
+
+ if (get_async_flag_by_vid(vid) & ASYNC_ENQUEUE_VHOST) {
+ ret |= rte_vhost_async_channel_register(vid, VIRTIO_RXQ,
+ config, &channel_ops);
+ }
+ if (get_async_flag_by_vid(vid) & ASYNC_DEQUEUE_VHOST) {
+ ret |= rte_vhost_async_channel_register(vid, VIRTIO_TXQ,
+ config, &channel_ops);
+ }
}
- rte_free(vdev);
+ return ret;
}
/*
@@ -1433,20 +1540,8 @@ new_device(int vid)
}
}
- if (builtin_net_driver) {
- vdev_queue_ops[vid].enqueue_pkt_burst = builtin_enqueue_pkts;
- vdev_queue_ops[vid].dequeue_pkt_burst = builtin_dequeue_pkts;
- } else {
- if (async_vhost_driver) {
- vdev_queue_ops[vid].enqueue_pkt_burst =
- async_enqueue_pkts;
- } else {
- vdev_queue_ops[vid].enqueue_pkt_burst =
- sync_enqueue_pkts;
- }
-
- vdev_queue_ops[vid].dequeue_pkt_burst = sync_dequeue_pkts;
- }
+ if (init_vhost_queue_ops(vid) != 0)
+ return -1;
if (builtin_net_driver)
vs_vhost_net_setup(vdev);
@@ -1475,27 +1570,13 @@ new_device(int vid)
rte_vhost_enable_guest_notification(vid, VIRTIO_RXQ, 0);
rte_vhost_enable_guest_notification(vid, VIRTIO_TXQ, 0);
+ int ret = vhost_async_channel_register(vid);
+
RTE_LOG(INFO, VHOST_DATA,
"(%d) device has been added to data core %d\n",
vid, vdev->coreid);
- if (async_vhost_driver) {
- struct rte_vhost_async_config config = {0};
- struct rte_vhost_async_channel_ops channel_ops;
-
- if (dma_type != NULL && strncmp(dma_type, "ioat", 4) == 0) {
- channel_ops.transfer_data = ioat_transfer_data_cb;
- channel_ops.check_completed_copies =
- ioat_check_completed_copies_cb;
-
- config.features = RTE_VHOST_ASYNC_INORDER;
-
- return rte_vhost_async_channel_register(vid, VIRTIO_RXQ,
- config, &channel_ops);
- }
- }
-
- return 0;
+ return ret;
}
static int
@@ -1513,19 +1594,8 @@ vring_state_changed(int vid, uint16_t queue_id, int enable)
if (queue_id != VIRTIO_RXQ)
return 0;
- if (async_vhost_driver) {
- if (!enable) {
- uint16_t n_pkt = 0;
- struct rte_mbuf *m_cpl[vdev->pkts_inflight];
-
- while (vdev->pkts_inflight) {
- n_pkt = rte_vhost_clear_queue_thread_unsafe(vid, queue_id,
- m_cpl, vdev->pkts_inflight);
- free_pkts(m_cpl, n_pkt);
- __atomic_sub_fetch(&vdev->pkts_inflight, n_pkt, __ATOMIC_SEQ_CST);
- }
- }
- }
+ if (!enable)
+ vhost_clear_queue_thread_unsafe(vdev, queue_id);
return 0;
}
@@ -1769,10 +1839,11 @@ main(int argc, char *argv[])
for (i = 0; i < nb_sockets; i++) {
char *file = socket_files + i * PATH_MAX;
- if (async_vhost_driver)
- flags = flags | RTE_VHOST_USER_ASYNC_COPY;
+ uint64_t flag = flags;
+ if (get_async_flag_by_socketid(i) != 0)
+ flag |= RTE_VHOST_USER_ASYNC_COPY;
- ret = rte_vhost_driver_register(file, flags);
+ ret = rte_vhost_driver_register(file, flag);
if (ret != 0) {
unregister_drivers(i);
rte_exit(EXIT_FAILURE,
diff --git a/examples/vhost/main.h b/examples/vhost/main.h
index 2c5a558f12..5af7e7d97f 100644
--- a/examples/vhost/main.h
+++ b/examples/vhost/main.h
@@ -51,7 +51,8 @@ struct vhost_dev {
uint64_t features;
size_t hdr_len;
uint16_t nr_vrings;
- uint16_t pkts_inflight;
+ uint16_t pkts_enq_inflight;
+ uint16_t pkts_deq_inflight;
struct rte_vhost_memory *mem;
struct device_statistics stats;
TAILQ_ENTRY(vhost_dev) global_vdev_entry;
@@ -112,4 +113,7 @@ uint16_t sync_dequeue_pkts(struct vhost_dev *dev, uint16_t queue_id,
struct rte_mbuf **pkts, uint16_t count);
uint16_t async_enqueue_pkts(struct vhost_dev *dev, uint16_t queue_id,
struct rte_mbuf **pkts, uint32_t count);
+uint16_t async_dequeue_pkts(struct vhost_dev *dev, uint16_t queue_id,
+ struct rte_mempool *mbuf_pool,
+ struct rte_mbuf **pkts, uint16_t count);
#endif /* _MAIN_H_ */
--
2.25.1
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [dpdk-dev] [PATCH v2 1/4] vhost: support async dequeue for split ring
2021-09-17 19:27 ` [dpdk-dev] [PATCH v2 1/4] vhost: " Wenwu Ma
@ 2021-09-27 6:33 ` Jiang, Cheng1
0 siblings, 0 replies; 28+ messages in thread
From: Jiang, Cheng1 @ 2021-09-27 6:33 UTC (permalink / raw)
To: Ma, WenwuX, dev
Cc: maxime.coquelin, Xia, Chenbo, Hu, Jiayu, Pai G, Sunil, Yang,
YvonneX, Wang, YuanX, Wang, Yinan
Hi Wenwu,
Comments are inline.
> -----Original Message-----
> From: Ma, WenwuX <wenwux.ma@intel.com>
> Sent: Saturday, September 18, 2021 3:27 AM
> To: dev@dpdk.org
> Cc: maxime.coquelin@redhat.com; Xia, Chenbo <chenbo.xia@intel.com>;
> Jiang, Cheng1 <cheng1.jiang@intel.com>; Hu, Jiayu <jiayu.hu@intel.com>;
> Pai G, Sunil <sunil.pai.g@intel.com>; Yang, YvonneX
> <yvonnex.yang@intel.com>; Wang, YuanX <yuanx.wang@intel.com>; Ma,
> WenwuX <wenwux.ma@intel.com>; Wang, Yinan <yinan.wang@intel.com>
> Subject: [PATCH v2 1/4] vhost: support async dequeue for split ring
>
> From: Yuan Wang <yuanx.wang@intel.com>
>
> This patch implements asynchronous dequeue data path for split ring.
> A new asynchronous dequeue function is introduced. With this function, the
> application can try to receive packets from the guest with offloading copies
> to the async channel, thus saving precious CPU cycles.
>
> Signed-off-by: Yuan Wang <yuanx.wang@intel.com>
> Signed-off-by: Jiayu Hu <jiayu.hu@intel.com>
> Signed-off-by: Wenwu Ma <wenwux.ma@intel.com>
> Tested-by: Yinan Wang <yinan.wang@intel.com>
> Tested-by: Yvonne Yang <yvonnex.yang@intel.com>
> ---
> doc/guides/prog_guide/vhost_lib.rst | 9 +
> lib/vhost/rte_vhost_async.h | 33 +-
> lib/vhost/version.map | 3 +
> lib/vhost/vhost.h | 3 +-
> lib/vhost/virtio_net.c | 530 ++++++++++++++++++++++++++++
> 5 files changed, 575 insertions(+), 3 deletions(-)
>
> diff --git a/doc/guides/prog_guide/vhost_lib.rst
> b/doc/guides/prog_guide/vhost_lib.rst
> index 171e0096f6..9ed544db7a 100644
> --- a/doc/guides/prog_guide/vhost_lib.rst
> +++ b/doc/guides/prog_guide/vhost_lib.rst
> @@ -303,6 +303,15 @@ The following is an overview of some key Vhost API
> functions:
> Clear inflight packets which are submitted to DMA engine in vhost async
> data
> path. Completed packets are returned to applications through ``pkts``.
>
> +* ``rte_vhost_async_try_dequeue_burst(vid, queue_id, mbuf_pool, pkts,
> +count, nr_inflight)``
> +
> + This function tries to receive packets from the guest with offloading
> + copies to the async channel. The packets that are transfer completed
> + are returned in ``pkts``. The other packets that their copies are
> + submitted to the async channel but not completed are called "in-flight
> packets".
> + This function will not return in-flight packets until their copies
> + are completed by the async channel.
> +
> Vhost-user Implementations
> --------------------------
>
> diff --git a/lib/vhost/rte_vhost_async.h b/lib/vhost/rte_vhost_async.h index
> ad71555a7f..973efa19b1 100644
> --- a/lib/vhost/rte_vhost_async.h
> +++ b/lib/vhost/rte_vhost_async.h
> @@ -84,11 +84,12 @@ struct rte_vhost_async_channel_ops { };
>
> /**
> - * inflight async packet information
> + * in-flight async packet information
> */
> struct async_inflight_info {
> struct rte_mbuf *mbuf;
> - uint16_t descs; /* num of descs inflight */
> + struct virtio_net_hdr nethdr;
> + uint16_t descs; /* num of descs in-flight */
> uint16_t nr_buffers; /* num of buffers inflight for packed ring */ };
>
> @@ -255,5 +256,33 @@ int rte_vhost_async_get_inflight(int vid, uint16_t
> queue_id); __rte_experimental uint16_t
> rte_vhost_clear_queue_thread_unsafe(int vid, uint16_t queue_id,
> struct rte_mbuf **pkts, uint16_t count);
Blank line is needed here.
> +/**
> + * This function tries to receive packets from the guest with
> +offloading
> + * copies to the async channel. The packets that are transfer completed
> + * are returned in "pkts". The other packets that their copies are
> +submitted to
> + * the async channel but not completed are called "in-flight packets".
> + * This function will not return in-flight packets until their copies
> +are
> + * completed by the async channel.
> + *
> + * @param vid
> + * id of vhost device to dequeue data
The initial letter should be uppercase. The following lines also need the same change.
> + * @param queue_id
> + * queue id to dequeue data
Should be 'ID of virtqueue ......'.
Thanks,
Cheng
> + * @param mbuf_pool
> + * mbuf_pool where host mbuf is allocated.
> + * @param pkts
> + * blank array to keep successfully dequeued packets
> + * @param count
> + * size of the packet array
> + * @param nr_inflight
> + * the amount of in-flight packets. If error occurred, its value is set to -1.
> + * @return
> + * num of successfully dequeued packets */ __rte_experimental
> +uint16_t rte_vhost_async_try_dequeue_burst(int vid, uint16_t queue_id,
> + struct rte_mempool *mbuf_pool, struct rte_mbuf **pkts, uint16_t
> count,
> + int *nr_inflight);
>
> #endif /* _RTE_VHOST_ASYNC_H_ */
> diff --git a/lib/vhost/version.map b/lib/vhost/version.map index
> c92a9d4962..1e033ad8e2 100644
> --- a/lib/vhost/version.map
> +++ b/lib/vhost/version.map
> @@ -85,4 +85,7 @@ EXPERIMENTAL {
> rte_vhost_async_channel_register_thread_unsafe;
> rte_vhost_async_channel_unregister_thread_unsafe;
> rte_vhost_clear_queue_thread_unsafe;
> +
> + # added in 21.11
> + rte_vhost_async_try_dequeue_burst;
> };
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [dpdk-dev] [PATCH v2 2/4] examples/vhost: refactor vhost enqueue and dequeue datapaths
2021-09-17 19:27 ` [dpdk-dev] [PATCH v2 2/4] examples/vhost: refactor vhost enqueue and dequeue datapaths Wenwu Ma
@ 2021-09-27 6:56 ` Jiang, Cheng1
0 siblings, 0 replies; 28+ messages in thread
From: Jiang, Cheng1 @ 2021-09-27 6:56 UTC (permalink / raw)
To: Ma, WenwuX, dev
Cc: maxime.coquelin, Xia, Chenbo, Hu, Jiayu, Pai G, Sunil, Yang, YvonneX
Hi,
> -----Original Message-----
> From: Ma, WenwuX <wenwux.ma@intel.com>
> Sent: Saturday, September 18, 2021 3:27 AM
> To: dev@dpdk.org
> Cc: maxime.coquelin@redhat.com; Xia, Chenbo <chenbo.xia@intel.com>;
> Jiang, Cheng1 <cheng1.jiang@intel.com>; Hu, Jiayu <jiayu.hu@intel.com>;
> Pai G, Sunil <sunil.pai.g@intel.com>; Yang, YvonneX
> <yvonnex.yang@intel.com>; Ma, WenwuX <wenwux.ma@intel.com>
> Subject: [PATCH v2 2/4] examples/vhost: refactor vhost enqueue and
> dequeue datapaths
>
> Previously, by judging the flag, we call different enqueue/dequeue
> functions in data path.
>
> Now, we use an ops that was initialized when Vhost was created,
> so that we can call ops directly in Vhost data path without any more
> flag judgment.
>
> Signed-off-by: Wenwu Ma <wenwux.ma@intel.com>
> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
> Tested-by: Yvonne Yang <yvonnex.yang@intel.com>
> ---
> examples/vhost/main.c | 100 +++++++++++++++++++++---------------
> examples/vhost/main.h | 28 ++++++++--
> examples/vhost/virtio_net.c | 16 +++++-
> 3 files changed, 98 insertions(+), 46 deletions(-)
>
> diff --git a/examples/vhost/main.c b/examples/vhost/main.c
> index d0bf1f31e3..254f7097bc 100644
> --- a/examples/vhost/main.c
> +++ b/examples/vhost/main.c
> @@ -106,6 +106,8 @@ static uint32_t burst_rx_retry_num =
> BURST_RX_RETRIES;
> static char *socket_files;
> static int nb_sockets;
>
> +static struct vhost_queue_ops vdev_queue_ops[MAX_VHOST_DEVICE];
> +
> /* empty vmdq configuration structure. Filled in programatically */
> static struct rte_eth_conf vmdq_conf_default = {
> .rxmode = {
> @@ -879,22 +881,8 @@ drain_vhost(struct vhost_dev *vdev)
> uint16_t nr_xmit = vhost_txbuff[buff_idx]->len;
> struct rte_mbuf **m = vhost_txbuff[buff_idx]->m_table;
>
> - if (builtin_net_driver) {
> - ret = vs_enqueue_pkts(vdev, VIRTIO_RXQ, m, nr_xmit);
> - } else if (async_vhost_driver) {
> - uint16_t enqueue_fail = 0;
> -
> - complete_async_pkts(vdev);
> - ret = rte_vhost_submit_enqueue_burst(vdev->vid,
> VIRTIO_RXQ, m, nr_xmit);
> - __atomic_add_fetch(&vdev->pkts_inflight, ret,
> __ATOMIC_SEQ_CST);
> -
> - enqueue_fail = nr_xmit - ret;
> - if (enqueue_fail)
> - free_pkts(&m[ret], nr_xmit - ret);
> - } else {
> - ret = rte_vhost_enqueue_burst(vdev->vid, VIRTIO_RXQ,
> - m, nr_xmit);
> - }
> + ret = vdev_queue_ops[vdev->vid].enqueue_pkt_burst(vdev,
> + VIRTIO_RXQ, m, nr_xmit);
>
The line length limit is now 100 characters, so this does not need to be split across two lines.
> if (enable_stats) {
> __atomic_add_fetch(&vdev->stats.rx_total_atomic, nr_xmit,
> @@ -1173,6 +1161,33 @@ drain_mbuf_table(struct mbuf_table *tx_q)
> }
> }
>
> +uint16_t
> +async_enqueue_pkts(struct vhost_dev *vdev, uint16_t queue_id,
> + struct rte_mbuf **pkts, uint32_t rx_count)
> +{
> + uint16_t enqueue_count;
> + uint16_t enqueue_fail = 0;
> +
> + complete_async_pkts(vdev);
> + enqueue_count = rte_vhost_submit_enqueue_burst(vdev->vid,
> + queue_id, pkts, rx_count);
Same here.
> + __atomic_add_fetch(&vdev->pkts_inflight, enqueue_count,
> + __ATOMIC_SEQ_CST);
Same here.
> +
> + enqueue_fail = rx_count - enqueue_count;
> + if (enqueue_fail)
> + free_pkts(&pkts[enqueue_count], enqueue_fail);
> +
> + return enqueue_count;
> +}
> +
> +uint16_t
> +sync_enqueue_pkts(struct vhost_dev *vdev, uint16_t queue_id,
> + struct rte_mbuf **pkts, uint32_t rx_count)
> +{
> + return rte_vhost_enqueue_burst(vdev->vid, queue_id, pkts,
> rx_count);
> +}
> +
> static __rte_always_inline void
> drain_eth_rx(struct vhost_dev *vdev)
> {
> @@ -1203,25 +1218,8 @@ drain_eth_rx(struct vhost_dev *vdev)
> }
> }
>
> - if (builtin_net_driver) {
> - enqueue_count = vs_enqueue_pkts(vdev, VIRTIO_RXQ,
> - pkts, rx_count);
> - } else if (async_vhost_driver) {
> - uint16_t enqueue_fail = 0;
> -
> - complete_async_pkts(vdev);
> - enqueue_count = rte_vhost_submit_enqueue_burst(vdev-
> >vid,
> - VIRTIO_RXQ, pkts, rx_count);
> - __atomic_add_fetch(&vdev->pkts_inflight, enqueue_count,
> __ATOMIC_SEQ_CST);
> -
> - enqueue_fail = rx_count - enqueue_count;
> - if (enqueue_fail)
> - free_pkts(&pkts[enqueue_count], enqueue_fail);
> -
> - } else {
> - enqueue_count = rte_vhost_enqueue_burst(vdev->vid,
> VIRTIO_RXQ,
> - pkts, rx_count);
> - }
> + enqueue_count = vdev_queue_ops[vdev-
> >vid].enqueue_pkt_burst(vdev,
> + VIRTIO_RXQ, pkts, rx_count);
>
> if (enable_stats) {
> __atomic_add_fetch(&vdev->stats.rx_total_atomic,
> rx_count,
> @@ -1234,6 +1232,14 @@ drain_eth_rx(struct vhost_dev *vdev)
> free_pkts(pkts, rx_count);
> }
>
> +uint16_t sync_dequeue_pkts(struct vhost_dev *dev, uint16_t queue_id,
> + struct rte_mempool *mbuf_pool,
> + struct rte_mbuf **pkts, uint16_t count)
> +{
> + return rte_vhost_dequeue_burst(dev->vid, queue_id,
> + mbuf_pool, pkts, count);
Same here.
> +}
> +
> static __rte_always_inline void
> drain_virtio_tx(struct vhost_dev *vdev)
> {
> @@ -1241,13 +1247,8 @@ drain_virtio_tx(struct vhost_dev *vdev)
> uint16_t count;
> uint16_t i;
>
> - if (builtin_net_driver) {
> - count = vs_dequeue_pkts(vdev, VIRTIO_TXQ, mbuf_pool,
> - pkts, MAX_PKT_BURST);
> - } else {
> - count = rte_vhost_dequeue_burst(vdev->vid, VIRTIO_TXQ,
> - mbuf_pool, pkts, MAX_PKT_BURST);
> - }
> + count = vdev_queue_ops[vdev->vid].dequeue_pkt_burst(vdev,
> + VIRTIO_TXQ, mbuf_pool, pkts,
> MAX_PKT_BURST);
>
> /* setup VMDq for the first packet */
> if (unlikely(vdev->ready == DEVICE_MAC_LEARNING) && count) {
> @@ -1432,6 +1433,21 @@ new_device(int vid)
> }
> }
>
> + if (builtin_net_driver) {
> + vdev_queue_ops[vid].enqueue_pkt_burst =
> builtin_enqueue_pkts;
> + vdev_queue_ops[vid].dequeue_pkt_burst =
> builtin_dequeue_pkts;
> + } else {
> + if (async_vhost_driver) {
> + vdev_queue_ops[vid].enqueue_pkt_burst =
> + async_enqueue_pkts;
Same here.
> + } else {
> + vdev_queue_ops[vid].enqueue_pkt_burst =
> + sync_enqueue_pkts;
> + }
Same here. And it seems we don't need '{ }' here.
Thanks,
Cheng
> +
> + vdev_queue_ops[vid].dequeue_pkt_burst =
> sync_dequeue_pkts;
> + }
> +
> if (builtin_net_driver)
> vs_vhost_net_setup(vdev);
>
> diff --git a/examples/vhost/main.h b/examples/vhost/main.h
> index e7b1ac60a6..2c5a558f12 100644
> --- a/examples/vhost/main.h
> +++ b/examples/vhost/main.h
> @@ -61,6 +61,19 @@ struct vhost_dev {
> struct vhost_queue queues[MAX_QUEUE_PAIRS * 2];
> } __rte_cache_aligned;
>
> +typedef uint16_t (*vhost_enqueue_burst_t)(struct vhost_dev *dev,
> + uint16_t queue_id, struct rte_mbuf **pkts,
> + uint32_t count);
> +
> +typedef uint16_t (*vhost_dequeue_burst_t)(struct vhost_dev *dev,
> + uint16_t queue_id, struct rte_mempool *mbuf_pool,
> + struct rte_mbuf **pkts, uint16_t count);
> +
> +struct vhost_queue_ops {
> + vhost_enqueue_burst_t enqueue_pkt_burst;
> + vhost_dequeue_burst_t dequeue_pkt_burst;
> +};
> +
> TAILQ_HEAD(vhost_dev_tailq_list, vhost_dev);
>
>
> @@ -87,7 +100,16 @@ void vs_vhost_net_remove(struct vhost_dev *dev);
> uint16_t vs_enqueue_pkts(struct vhost_dev *dev, uint16_t queue_id,
> struct rte_mbuf **pkts, uint32_t count);
>
> -uint16_t vs_dequeue_pkts(struct vhost_dev *dev, uint16_t queue_id,
> - struct rte_mempool *mbuf_pool,
> - struct rte_mbuf **pkts, uint16_t count);
> +uint16_t builtin_enqueue_pkts(struct vhost_dev *dev, uint16_t queue_id,
> + struct rte_mbuf **pkts, uint32_t count);
> +uint16_t builtin_dequeue_pkts(struct vhost_dev *dev, uint16_t queue_id,
> + struct rte_mempool *mbuf_pool,
> + struct rte_mbuf **pkts, uint16_t count);
> +uint16_t sync_enqueue_pkts(struct vhost_dev *dev, uint16_t queue_id,
> + struct rte_mbuf **pkts, uint32_t count);
> +uint16_t sync_dequeue_pkts(struct vhost_dev *dev, uint16_t queue_id,
> + struct rte_mempool *mbuf_pool,
> + struct rte_mbuf **pkts, uint16_t count);
> +uint16_t async_enqueue_pkts(struct vhost_dev *dev, uint16_t queue_id,
> + struct rte_mbuf **pkts, uint32_t count);
> #endif /* _MAIN_H_ */
> diff --git a/examples/vhost/virtio_net.c b/examples/vhost/virtio_net.c
> index 9064fc3a82..2432a96566 100644
> --- a/examples/vhost/virtio_net.c
> +++ b/examples/vhost/virtio_net.c
> @@ -238,6 +238,13 @@ vs_enqueue_pkts(struct vhost_dev *dev, uint16_t
> queue_id,
> return count;
> }
>
> +uint16_t
> +builtin_enqueue_pkts(struct vhost_dev *dev, uint16_t queue_id,
> + struct rte_mbuf **pkts, uint32_t count)
> +{
> + return vs_enqueue_pkts(dev, queue_id, pkts, count);
> +}
> +
> static __rte_always_inline int
> dequeue_pkt(struct vhost_dev *dev, struct rte_vhost_vring *vr,
> struct rte_mbuf *m, uint16_t desc_idx,
> @@ -363,7 +370,7 @@ dequeue_pkt(struct vhost_dev *dev, struct
> rte_vhost_vring *vr,
> return 0;
> }
>
> -uint16_t
> +static uint16_t
> vs_dequeue_pkts(struct vhost_dev *dev, uint16_t queue_id,
> struct rte_mempool *mbuf_pool, struct rte_mbuf **pkts, uint16_t
> count)
> {
> @@ -440,3 +447,10 @@ vs_dequeue_pkts(struct vhost_dev *dev, uint16_t
> queue_id,
>
> return i;
> }
> +
> +uint16_t
> +builtin_dequeue_pkts(struct vhost_dev *dev, uint16_t queue_id,
> + struct rte_mempool *mbuf_pool, struct rte_mbuf **pkts, uint16_t
> count)
> +{
> + return vs_dequeue_pkts(dev, queue_id, mbuf_pool, pkts, count);
> +}
> --
> 2.25.1
^ permalink raw reply [flat|nested] 28+ messages in thread
* [dpdk-dev] [PATCH v3 0/4] support async dequeue for split ring
2021-09-06 20:48 [dpdk-dev] [PATCH 0/4] support async dequeue for split ring Wenwu Ma
` (5 preceding siblings ...)
2021-09-17 19:26 ` [dpdk-dev] [PATCH v2 " Wenwu Ma
@ 2021-09-28 18:56 ` Wenwu Ma
2021-09-28 18:56 ` [dpdk-dev] [PATCH v3 1/4] vhost: " Wenwu Ma
` (3 more replies)
6 siblings, 4 replies; 28+ messages in thread
From: Wenwu Ma @ 2021-09-28 18:56 UTC (permalink / raw)
To: dev
Cc: maxime.coquelin, chenbo.xia, cheng1.jiang, jiayu.hu, Sunil.Pai.G,
yvonnex.yang, Wenwu Ma
This patch implements the asynchronous dequeue data path for split ring.
A new asynchronous dequeue function is introduced. With this function,
the application can try to receive packets from the guest while offloading
the copies to the DMA engine, thus saving precious CPU cycles.
v3:
- Update release note.
- Update function comments.
v2:
- Removed struct async_nethdr in 1/4.
- Removed a useless function declaration in 2/4,
and fixed some coding style in 4/4.
Wenwu Ma (3):
examples/vhost: refactor vhost enqueue and dequeue datapaths
examples/vhost: use a new API to query remaining ring space
examples/vhost: support vhost async dequeue data path
Yuan Wang (1):
vhost: support async dequeue for split ring
doc/guides/prog_guide/vhost_lib.rst | 9 +
doc/guides/rel_notes/release_21_11.rst | 3 +
doc/guides/sample_app_ug/vhost.rst | 9 +-
examples/vhost/ioat.c | 67 +++-
examples/vhost/ioat.h | 25 ++
examples/vhost/main.c | 269 ++++++++-----
examples/vhost/main.h | 34 +-
examples/vhost/virtio_net.c | 16 +-
lib/vhost/rte_vhost_async.h | 34 +-
lib/vhost/version.map | 3 +
lib/vhost/vhost.h | 3 +-
lib/vhost/virtio_net.c | 530 +++++++++++++++++++++++++
12 files changed, 881 insertions(+), 121 deletions(-)
--
2.25.1
^ permalink raw reply [flat|nested] 28+ messages in thread
* [dpdk-dev] [PATCH v3 1/4] vhost: support async dequeue for split ring
2021-09-28 18:56 ` [dpdk-dev] [PATCH v3 0/4] support async dequeue for split ring Wenwu Ma
@ 2021-09-28 18:56 ` Wenwu Ma
2021-09-28 18:56 ` [dpdk-dev] [PATCH v3 2/4] examples/vhost: refactor vhost enqueue and dequeue datapaths Wenwu Ma
` (2 subsequent siblings)
3 siblings, 0 replies; 28+ messages in thread
From: Wenwu Ma @ 2021-09-28 18:56 UTC (permalink / raw)
To: dev
Cc: maxime.coquelin, chenbo.xia, cheng1.jiang, jiayu.hu, Sunil.Pai.G,
yvonnex.yang, Yuan Wang, Wenwu Ma, Yinan Wang
From: Yuan Wang <yuanx.wang@intel.com>
This patch implements the asynchronous dequeue data path for split
ring. A new asynchronous dequeue function is introduced. With this
function, the application can try to receive packets from the guest
while offloading the copies to the async channel, thus saving
precious CPU cycles.
Signed-off-by: Yuan Wang <yuanx.wang@intel.com>
Signed-off-by: Jiayu Hu <jiayu.hu@intel.com>
Signed-off-by: Wenwu Ma <wenwux.ma@intel.com>
Tested-by: Yinan Wang <yinan.wang@intel.com>
Tested-by: Yvonne Yang <yvonnex.yang@intel.com>
---
doc/guides/prog_guide/vhost_lib.rst | 9 +
doc/guides/rel_notes/release_21_11.rst | 3 +
lib/vhost/rte_vhost_async.h | 34 +-
lib/vhost/version.map | 3 +
lib/vhost/vhost.h | 3 +-
lib/vhost/virtio_net.c | 530 +++++++++++++++++++++++++
6 files changed, 579 insertions(+), 3 deletions(-)
diff --git a/doc/guides/prog_guide/vhost_lib.rst b/doc/guides/prog_guide/vhost_lib.rst
index 171e0096f6..9ed544db7a 100644
--- a/doc/guides/prog_guide/vhost_lib.rst
+++ b/doc/guides/prog_guide/vhost_lib.rst
@@ -303,6 +303,15 @@ The following is an overview of some key Vhost API functions:
Clear inflight packets which are submitted to DMA engine in vhost async data
path. Completed packets are returned to applications through ``pkts``.
+* ``rte_vhost_async_try_dequeue_burst(vid, queue_id, mbuf_pool, pkts, count, nr_inflight)``
+
+ This function tries to receive packets from the guest by offloading
+ copies to the async channel. Packets whose copies have completed are
+ returned in ``pkts``. Packets whose copies have been submitted to the
+ async channel but have not yet completed are called "in-flight
+ packets"; this function does not return them until the async channel
+ has completed their copies.
+
Vhost-user Implementations
--------------------------
diff --git a/doc/guides/rel_notes/release_21_11.rst b/doc/guides/rel_notes/release_21_11.rst
index ad7c1afec0..79e4297ff9 100644
--- a/doc/guides/rel_notes/release_21_11.rst
+++ b/doc/guides/rel_notes/release_21_11.rst
@@ -91,6 +91,9 @@ New Features
Added command-line options to specify total number of processes and
current process ID. Each process owns subset of Rx and Tx queues.
+* **Added support for vhost async split ring data path.**
+
+ Added async dequeue support for split ring in the vhost async data path.
Removed Items
-------------
diff --git a/lib/vhost/rte_vhost_async.h b/lib/vhost/rte_vhost_async.h
index ad71555a7f..703c81753a 100644
--- a/lib/vhost/rte_vhost_async.h
+++ b/lib/vhost/rte_vhost_async.h
@@ -84,11 +84,12 @@ struct rte_vhost_async_channel_ops {
};
/**
- * inflight async packet information
+ * in-flight async packet information
*/
struct async_inflight_info {
struct rte_mbuf *mbuf;
- uint16_t descs; /* num of descs inflight */
+ struct virtio_net_hdr nethdr;
+ uint16_t descs; /* num of descs in-flight */
uint16_t nr_buffers; /* num of buffers inflight for packed ring */
};
@@ -256,4 +257,33 @@ __rte_experimental
uint16_t rte_vhost_clear_queue_thread_unsafe(int vid, uint16_t queue_id,
struct rte_mbuf **pkts, uint16_t count);
+/**
+ * This function tries to receive packets from the guest by offloading
+ * copies to the async channel. Packets whose copies have completed are
+ * returned in "pkts". Packets whose copies have been submitted to the
+ * async channel but have not yet completed are called "in-flight
+ * packets"; this function does not return them until the async channel
+ * has completed their copies.
+ *
+ * @param vid
+ * ID of vhost device to dequeue data
+ * @param queue_id
+ * ID of virtqueue to dequeue data
+ * @param mbuf_pool
+ * Mbuf_pool where host mbuf is allocated.
+ * @param pkts
+ * Blank array to keep successfully dequeued packets
+ * @param count
+ * Size of the packet array
+ * @param nr_inflight
+ * The number of in-flight packets. If an error occurs, its value is set to -1.
+ * @return
+ * Num of successfully dequeued packets
+ */
+__rte_experimental
+uint16_t
+rte_vhost_async_try_dequeue_burst(int vid, uint16_t queue_id,
+ struct rte_mempool *mbuf_pool, struct rte_mbuf **pkts, uint16_t count,
+ int *nr_inflight);
+
#endif /* _RTE_VHOST_ASYNC_H_ */
diff --git a/lib/vhost/version.map b/lib/vhost/version.map
index 8ebde3f694..8eb7e92c32 100644
--- a/lib/vhost/version.map
+++ b/lib/vhost/version.map
@@ -85,4 +85,7 @@ EXPERIMENTAL {
rte_vhost_async_channel_register_thread_unsafe;
rte_vhost_async_channel_unregister_thread_unsafe;
rte_vhost_clear_queue_thread_unsafe;
+
+ # added in 21.11
+ rte_vhost_async_try_dequeue_burst;
};
diff --git a/lib/vhost/vhost.h b/lib/vhost/vhost.h
index 1e56311725..89a31e4ca8 100644
--- a/lib/vhost/vhost.h
+++ b/lib/vhost/vhost.h
@@ -49,7 +49,8 @@
#define MAX_PKT_BURST 32
#define VHOST_MAX_ASYNC_IT (MAX_PKT_BURST * 2)
-#define VHOST_MAX_ASYNC_VEC (BUF_VECTOR_MAX * 4)
+#define MAX_ASYNC_COPY_VECTOR 1024
+#define VHOST_MAX_ASYNC_VEC (MAX_ASYNC_COPY_VECTOR * 2)
#define PACKED_DESC_ENQUEUE_USED_FLAG(w) \
((w) ? (VRING_DESC_F_AVAIL | VRING_DESC_F_USED | VRING_DESC_F_WRITE) : \
diff --git a/lib/vhost/virtio_net.c b/lib/vhost/virtio_net.c
index ec2c91e7a7..4bc69b9081 100644
--- a/lib/vhost/virtio_net.c
+++ b/lib/vhost/virtio_net.c
@@ -3170,3 +3170,533 @@ rte_vhost_dequeue_burst(int vid, uint16_t queue_id,
return count;
}
+
+static __rte_always_inline int
+async_desc_to_mbuf(struct virtio_net *dev,
+ struct buf_vector *buf_vec, uint16_t nr_vec,
+ struct rte_mbuf *m, struct rte_mempool *mbuf_pool,
+ struct iovec *src_iovec, struct iovec *dst_iovec,
+ struct rte_vhost_iov_iter *src_it,
+ struct rte_vhost_iov_iter *dst_it,
+ struct virtio_net_hdr *nethdr,
+ int nr_iovec)
+{
+ uint64_t buf_addr, buf_iova;
+ uint64_t mapped_len;
+ uint32_t tlen = 0;
+ uint32_t buf_avail, buf_offset, buf_len;
+ uint32_t mbuf_avail, mbuf_offset;
+ uint32_t cpy_len;
+ /* A counter to avoid desc dead loop chain */
+ uint16_t vec_idx = 0;
+ int tvec_idx = 0;
+ struct rte_mbuf *cur = m, *prev = m;
+ struct virtio_net_hdr tmp_hdr;
+ struct virtio_net_hdr *hdr = NULL;
+
+ buf_addr = buf_vec[vec_idx].buf_addr;
+ buf_len = buf_vec[vec_idx].buf_len;
+ buf_iova = buf_vec[vec_idx].buf_iova;
+
+ if (unlikely(buf_len < dev->vhost_hlen && nr_vec <= 1))
+ return -1;
+
+ if (virtio_net_with_host_offload(dev)) {
+ if (unlikely(buf_len < sizeof(struct virtio_net_hdr))) {
+ /*
+ * No luck, the virtio-net header doesn't fit
+ * in a contiguous virtual area.
+ */
+ copy_vnet_hdr_from_desc(&tmp_hdr, buf_vec);
+ hdr = &tmp_hdr;
+ } else {
+ hdr = (struct virtio_net_hdr *)((uintptr_t)buf_addr);
+ }
+ }
+
+ /*
+ * A virtio driver normally uses at least 2 desc buffers
+ * for Tx: the first for storing the header, and others
+ * for storing the data.
+ */
+ if (unlikely(buf_len < dev->vhost_hlen)) {
+ buf_offset = dev->vhost_hlen - buf_len;
+ vec_idx++;
+ buf_addr = buf_vec[vec_idx].buf_addr;
+ buf_iova = buf_vec[vec_idx].buf_iova;
+ buf_len = buf_vec[vec_idx].buf_len;
+ buf_avail = buf_len - buf_offset;
+ } else if (buf_len == dev->vhost_hlen) {
+ if (unlikely(++vec_idx >= nr_vec))
+ return -1;
+ buf_addr = buf_vec[vec_idx].buf_addr;
+ buf_iova = buf_vec[vec_idx].buf_iova;
+ buf_len = buf_vec[vec_idx].buf_len;
+
+ buf_offset = 0;
+ buf_avail = buf_len;
+ } else {
+ buf_offset = dev->vhost_hlen;
+ buf_avail = buf_vec[vec_idx].buf_len - dev->vhost_hlen;
+ }
+
+ PRINT_PACKET(dev, (uintptr_t)(buf_addr + buf_offset), (uint32_t)buf_avail, 0);
+
+ mbuf_offset = 0;
+ mbuf_avail = m->buf_len - RTE_PKTMBUF_HEADROOM;
+ while (1) {
+ cpy_len = RTE_MIN(buf_avail, mbuf_avail);
+
+ while (cpy_len) {
+ void *hpa = (void *)(uintptr_t)gpa_to_first_hpa(dev,
+ buf_iova + buf_offset, cpy_len,
+ &mapped_len);
+ if (unlikely(!hpa)) {
+ VHOST_LOG_DATA(ERR, "(%d) %s: failed to get hpa.\n",
+ dev->vid, __func__);
+ return -1;
+ }
+ if (unlikely(tvec_idx >= nr_iovec)) {
+ VHOST_LOG_DATA(ERR, "iovec is not enough for offloading\n");
+ return -1;
+ }
+
+ async_fill_vec(src_iovec + tvec_idx, hpa, (size_t)mapped_len);
+ async_fill_vec(dst_iovec + tvec_idx,
+ (void *)(uintptr_t)rte_pktmbuf_iova_offset(cur, mbuf_offset),
+ (size_t)mapped_len);
+
+ tvec_idx++;
+ tlen += (uint32_t)mapped_len;
+ cpy_len -= (uint32_t)mapped_len;
+ mbuf_avail -= (uint32_t)mapped_len;
+ mbuf_offset += (uint32_t)mapped_len;
+ buf_avail -= (uint32_t)mapped_len;
+ buf_offset += (uint32_t)mapped_len;
+ }
+
+ /* This buf reaches to its end, get the next one */
+ if (buf_avail == 0) {
+ if (++vec_idx >= nr_vec)
+ break;
+
+ buf_addr = buf_vec[vec_idx].buf_addr;
+ buf_iova = buf_vec[vec_idx].buf_iova;
+ buf_len = buf_vec[vec_idx].buf_len;
+
+ buf_offset = 0;
+ buf_avail = buf_len;
+
+ PRINT_PACKET(dev, (uintptr_t)buf_addr, (uint32_t)buf_avail, 0);
+ }
+
+ /*
+ * This mbuf reaches to its end, get a new one
+ * to hold more data.
+ */
+ if (mbuf_avail == 0) {
+ cur = rte_pktmbuf_alloc(mbuf_pool);
+ if (unlikely(cur == NULL)) {
+ VHOST_LOG_DATA(ERR, "Failed to allocate memory for mbuf.\n");
+ return -1;
+ }
+
+ prev->next = cur;
+ prev->data_len = mbuf_offset;
+ m->nb_segs += 1;
+ m->pkt_len += mbuf_offset;
+ prev = cur;
+
+ mbuf_offset = 0;
+ mbuf_avail = cur->buf_len - RTE_PKTMBUF_HEADROOM;
+ }
+ }
+
+ prev->data_len = mbuf_offset;
+ m->pkt_len += mbuf_offset;
+
+ if (tlen) {
+ async_fill_iter(src_it, tlen, src_iovec, tvec_idx);
+ async_fill_iter(dst_it, tlen, dst_iovec, tvec_idx);
+ if (hdr)
+ *nethdr = *hdr;
+ }
+ return 0;
+}
+
+static __rte_always_inline uint16_t
+async_poll_dequeue_completed_split(struct virtio_net *dev,
+ struct vhost_virtqueue *vq, uint16_t queue_id,
+ struct rte_mbuf **pkts, uint16_t count, bool legacy_ol_flags)
+{
+ uint16_t n_pkts_cpl = 0, n_pkts_put = 0;
+ uint16_t start_idx, pkt_idx, from;
+ struct async_inflight_info *pkts_info;
+
+ pkt_idx = vq->async_pkts_idx & (vq->size - 1);
+ pkts_info = vq->async_pkts_info;
+ start_idx = virtio_dev_rx_async_get_info_idx(pkt_idx, vq->size,
+ vq->async_pkts_inflight_n);
+
+ if (count > vq->async_last_pkts_n) {
+ int ret;
+
+ ret = vq->async_ops.check_completed_copies(dev->vid, queue_id,
+ 0, count - vq->async_last_pkts_n);
+ if (unlikely(ret < 0)) {
+ VHOST_LOG_DATA(ERR, "(%d) async channel poll error\n", dev->vid);
+ ret = 0;
+ }
+ n_pkts_cpl = ret;
+ }
+
+ n_pkts_cpl += vq->async_last_pkts_n;
+ if (unlikely(n_pkts_cpl == 0))
+ return 0;
+
+ n_pkts_put = RTE_MIN(count, n_pkts_cpl);
+
+ for (pkt_idx = 0; pkt_idx < n_pkts_put; pkt_idx++) {
+ from = (start_idx + pkt_idx) & (vq->size - 1);
+ pkts[pkt_idx] = pkts_info[from].mbuf;
+
+ if (virtio_net_with_host_offload(dev))
+ vhost_dequeue_offload(&pkts_info[from].nethdr,
+ pkts[pkt_idx], legacy_ol_flags);
+ }
+
+ /* write back completed descs to used ring and update used idx */
+ write_back_completed_descs_split(vq, n_pkts_put);
+ __atomic_add_fetch(&vq->used->idx, n_pkts_put, __ATOMIC_RELEASE);
+ vhost_vring_call_split(dev, vq);
+
+ vq->async_last_pkts_n = n_pkts_cpl - n_pkts_put;
+ vq->async_pkts_inflight_n -= n_pkts_put;
+
+ return n_pkts_put;
+}
+
+static __rte_always_inline uint16_t
+virtio_dev_tx_async_split(struct virtio_net *dev,
+ struct vhost_virtqueue *vq, uint16_t queue_id,
+ struct rte_mempool *mbuf_pool, struct rte_mbuf **pkts,
+ uint16_t count, bool legacy_ol_flags)
+{
+ static bool allocerr_warned;
+ bool dropped = false;
+ uint16_t free_entries;
+ uint16_t pkt_idx, slot_idx = 0;
+ uint16_t nr_done_pkts = 0;
+ uint16_t nr_async_burst = 0;
+ uint16_t pkt_err = 0;
+ uint16_t iovec_idx = 0, it_idx = 0;
+ struct rte_vhost_iov_iter *it_pool = vq->it_pool;
+ struct iovec *vec_pool = vq->vec_pool;
+ struct iovec *src_iovec = vec_pool;
+ struct iovec *dst_iovec = vec_pool + (VHOST_MAX_ASYNC_VEC >> 1);
+ struct rte_vhost_async_desc tdes[MAX_PKT_BURST];
+ struct async_inflight_info *pkts_info = vq->async_pkts_info;
+ struct rte_mbuf *pkts_prealloc[MAX_PKT_BURST];
+
+ /**
+ * The ordering between avail index and
+ * desc reads needs to be enforced.
+ */
+ free_entries = __atomic_load_n(&vq->avail->idx, __ATOMIC_ACQUIRE) - vq->last_avail_idx;
+ if (free_entries == 0)
+ goto out;
+
+ rte_prefetch0(&vq->avail->ring[vq->last_avail_idx & (vq->size - 1)]);
+
+ count = RTE_MIN(count, MAX_PKT_BURST);
+ count = RTE_MIN(count, free_entries);
+ VHOST_LOG_DATA(DEBUG, "(%d) about to dequeue %u buffers\n", dev->vid, count);
+
+ if (rte_pktmbuf_alloc_bulk(mbuf_pool, pkts_prealloc, count))
+ goto out;
+
+ for (pkt_idx = 0; pkt_idx < count; pkt_idx++) {
+ uint16_t head_idx = 0;
+ uint16_t nr_vec = 0;
+ uint16_t to;
+ uint32_t buf_len;
+ int err;
+ struct buf_vector buf_vec[BUF_VECTOR_MAX];
+ struct rte_mbuf *pkt = pkts_prealloc[pkt_idx];
+
+ if (unlikely(fill_vec_buf_split(dev, vq, vq->last_avail_idx,
+ &nr_vec, buf_vec,
+ &head_idx, &buf_len,
+ VHOST_ACCESS_RO) < 0)) {
+ dropped = true;
+ break;
+ }
+
+ err = virtio_dev_pktmbuf_prep(dev, pkt, buf_len);
+ if (unlikely(err)) {
+ /**
+ * mbuf allocation fails for jumbo packets when external
+ * buffer allocation is not allowed and linear buffer
+ * is required. Drop this packet.
+ */
+ if (!allocerr_warned) {
+ VHOST_LOG_DATA(ERR,
+ "Failed mbuf alloc of size %d from %s on %s.\n",
+ buf_len, mbuf_pool->name, dev->ifname);
+ allocerr_warned = true;
+ }
+ dropped = true;
+ break;
+ }
+
+ slot_idx = (vq->async_pkts_idx + pkt_idx) & (vq->size - 1);
+ err = async_desc_to_mbuf(dev, buf_vec, nr_vec, pkt,
+ mbuf_pool, &src_iovec[iovec_idx],
+ &dst_iovec[iovec_idx], &it_pool[it_idx],
+ &it_pool[it_idx + 1],
+ &pkts_info[slot_idx].nethdr,
+ (VHOST_MAX_ASYNC_VEC >> 1) - iovec_idx);
+ if (unlikely(err)) {
+ if (!allocerr_warned) {
+ VHOST_LOG_DATA(ERR,
+ "Failed to offload copies to async channel %s.\n",
+ dev->ifname);
+ allocerr_warned = true;
+ }
+ dropped = true;
+ break;
+ }
+
+ async_fill_desc(&tdes[nr_async_burst], &it_pool[it_idx], &it_pool[it_idx + 1]);
+ pkts_info[slot_idx].mbuf = pkt;
+ nr_async_burst++;
+
+ iovec_idx += it_pool[it_idx].nr_segs;
+ it_idx += 2;
+
+ /* store used descs */
+ to = vq->async_desc_idx_split & (vq->size - 1);
+ vq->async_descs_split[to].id = head_idx;
+ vq->async_descs_split[to].len = 0;
+ vq->async_desc_idx_split++;
+
+ vq->last_avail_idx++;
+
+ if (unlikely(nr_async_burst >= VHOST_ASYNC_BATCH_THRESHOLD)) {
+ uint16_t nr_pkts;
+ int32_t ret;
+
+ ret = vq->async_ops.transfer_data(dev->vid, queue_id,
+ tdes, 0, nr_async_burst);
+ if (unlikely(ret < 0)) {
+ VHOST_LOG_DATA(ERR, "(%d) async channel submit error\n", dev->vid);
+ ret = 0;
+ }
+ nr_pkts = ret;
+
+ vq->async_pkts_inflight_n += nr_pkts;
+ it_idx = 0;
+ iovec_idx = 0;
+
+ if (unlikely(nr_pkts < nr_async_burst)) {
+ pkt_err = nr_async_burst - nr_pkts;
+ nr_async_burst = 0;
+ pkt_idx++;
+ break;
+ }
+ nr_async_burst = 0;
+ }
+ }
+
+ if (unlikely(dropped))
+ rte_pktmbuf_free_bulk(&pkts_prealloc[pkt_idx], count - pkt_idx);
+
+ if (nr_async_burst) {
+ uint16_t nr_pkts;
+ int32_t ret;
+
+ ret = vq->async_ops.transfer_data(dev->vid, queue_id, tdes, 0, nr_async_burst);
+ if (unlikely(ret < 0)) {
+ VHOST_LOG_DATA(ERR, "(%d) async channel submit error\n", dev->vid);
+ ret = 0;
+ }
+ nr_pkts = ret;
+
+ vq->async_pkts_inflight_n += nr_pkts;
+
+ if (unlikely(nr_pkts < nr_async_burst))
+ pkt_err = nr_async_burst - nr_pkts;
+ }
+
+ if (unlikely(pkt_err)) {
+ uint16_t nr_err_dma = pkt_err;
+
+ pkt_idx -= nr_err_dma;
+
+ /**
+ * recover async channel copy related structures and free pktmbufs
+ * for error pkts.
+ */
+ vq->async_desc_idx_split -= nr_err_dma;
+ while (nr_err_dma-- > 0) {
+ rte_pktmbuf_free(pkts_info[slot_idx & (vq->size - 1)].mbuf);
+ slot_idx--;
+ }
+
+ /* recover available ring */
+ vq->last_avail_idx -= pkt_err;
+ }
+
+ vq->async_pkts_idx += pkt_idx;
+
+out:
+ if (vq->async_pkts_inflight_n > 0) {
+ nr_done_pkts = async_poll_dequeue_completed_split(dev, vq,
+ queue_id, pkts, count, legacy_ol_flags);
+ }
+
+ return nr_done_pkts;
+}
+
+__rte_noinline
+static uint16_t
+virtio_dev_tx_async_split_legacy(struct virtio_net *dev,
+ struct vhost_virtqueue *vq, uint16_t queue_id,
+ struct rte_mempool *mbuf_pool, struct rte_mbuf **pkts,
+ uint16_t count)
+{
+ return virtio_dev_tx_async_split(dev, vq, queue_id, mbuf_pool,
+ pkts, count, true);
+}
+
+__rte_noinline
+static uint16_t
+virtio_dev_tx_async_split_compliant(struct virtio_net *dev,
+ struct vhost_virtqueue *vq, uint16_t queue_id,
+ struct rte_mempool *mbuf_pool, struct rte_mbuf **pkts,
+ uint16_t count)
+{
+ return virtio_dev_tx_async_split(dev, vq, queue_id, mbuf_pool,
+ pkts, count, false);
+}
+
+uint16_t
+rte_vhost_async_try_dequeue_burst(int vid, uint16_t queue_id,
+ struct rte_mempool *mbuf_pool, struct rte_mbuf **pkts, uint16_t count,
+ int *nr_inflight)
+{
+ struct virtio_net *dev;
+ struct rte_mbuf *rarp_mbuf = NULL;
+ struct vhost_virtqueue *vq;
+ int16_t success = 1;
+
+ *nr_inflight = -1;
+
+ dev = get_device(vid);
+ if (!dev)
+ return 0;
+
+ if (unlikely(!(dev->flags & VIRTIO_DEV_BUILTIN_VIRTIO_NET))) {
+ VHOST_LOG_DATA(ERR,
+ "(%d) %s: built-in vhost net backend is disabled.\n",
+ dev->vid, __func__);
+ return 0;
+ }
+
+ if (unlikely(!is_valid_virt_queue_idx(queue_id, 1, dev->nr_vring))) {
+ VHOST_LOG_DATA(ERR,
+ "(%d) %s: invalid virtqueue idx %d.\n",
+ dev->vid, __func__, queue_id);
+ return 0;
+ }
+
+ vq = dev->virtqueue[queue_id];
+
+ if (unlikely(rte_spinlock_trylock(&vq->access_lock) == 0))
+ return 0;
+
+ if (unlikely(vq->enabled == 0)) {
+ count = 0;
+ goto out_access_unlock;
+ }
+
+ if (unlikely(!vq->async_registered)) {
+ VHOST_LOG_DATA(ERR, "(%d) %s: async not registered for queue id %d.\n",
+ dev->vid, __func__, queue_id);
+ count = 0;
+ goto out_access_unlock;
+ }
+
+ if (dev->features & (1ULL << VIRTIO_F_IOMMU_PLATFORM))
+ vhost_user_iotlb_rd_lock(vq);
+
+ if (unlikely(vq->access_ok == 0))
+ if (unlikely(vring_translate(dev, vq) < 0)) {
+ count = 0;
+ goto out_access_unlock;
+ }
+
+ /*
+ * Construct a RARP broadcast packet and inject it into the "pkts"
+ * array, so that it looks like the guest actually sent such a packet.
+ *
+ * Check user_send_rarp() for more information.
+ *
+ * broadcast_rarp shares a cacheline in the virtio_net structure
+ * with some fields that are accessed during enqueue and
+ * __atomic_compare_exchange_n causes a write if performed compare
+ * and exchange. This could result in false sharing between enqueue
+ * and dequeue.
+ *
+ * Prevent unnecessary false sharing by reading broadcast_rarp first
+ * and only performing compare and exchange if the read indicates it
+ * is likely to be set.
+ */
+ if (unlikely(__atomic_load_n(&dev->broadcast_rarp, __ATOMIC_ACQUIRE) &&
+ __atomic_compare_exchange_n(&dev->broadcast_rarp,
+ &success, 0, 0, __ATOMIC_RELEASE, __ATOMIC_RELAXED))) {
+
+ rarp_mbuf = rte_net_make_rarp_packet(mbuf_pool, &dev->mac);
+ if (rarp_mbuf == NULL) {
+ VHOST_LOG_DATA(ERR, "Failed to make RARP packet.\n");
+ count = 0;
+ goto out;
+ }
+ count -= 1;
+ }
+
+ if (unlikely(vq_is_packed(dev))) {
+ VHOST_LOG_DATA(ERR,
+ "(%d) %s: async dequeue does not support packed ring.\n",
+ dev->vid, __func__);
+ return 0;
+ }
+
+ if (dev->flags & VIRTIO_DEV_LEGACY_OL_FLAGS)
+ count = virtio_dev_tx_async_split_legacy(dev, vq, queue_id,
+ mbuf_pool, pkts, count);
+ else
+ count = virtio_dev_tx_async_split_compliant(dev, vq, queue_id,
+ mbuf_pool, pkts, count);
+
+out:
+ *nr_inflight = vq->async_pkts_inflight_n;
+
+ if (dev->features & (1ULL << VIRTIO_F_IOMMU_PLATFORM))
+ vhost_user_iotlb_rd_unlock(vq);
+
+out_access_unlock:
+ rte_spinlock_unlock(&vq->access_lock);
+
+ if (unlikely(rarp_mbuf != NULL)) {
+ /*
+ * Inject it to the head of "pkts" array, so that switch's mac
+ * learning table will get updated first.
+ */
+ memmove(&pkts[1], pkts, count * sizeof(struct rte_mbuf *));
+ pkts[0] = rarp_mbuf;
+ count += 1;
+ }
+
+ return count;
+}
--
2.25.1
^ permalink raw reply [flat|nested] 28+ messages in thread
* [dpdk-dev] [PATCH v3 2/4] examples/vhost: refactor vhost enqueue and dequeue datapaths
2021-09-28 18:56 ` [dpdk-dev] [PATCH v3 0/4] support async dequeue for split ring Wenwu Ma
2021-09-28 18:56 ` [dpdk-dev] [PATCH v3 1/4] vhost: " Wenwu Ma
@ 2021-09-28 18:56 ` Wenwu Ma
2021-09-28 18:56 ` [dpdk-dev] [PATCH v3 3/4] examples/vhost: use a new API to query remaining ring space Wenwu Ma
2021-09-28 18:56 ` [dpdk-dev] [PATCH v3 4/4] examples/vhost: support vhost async dequeue data path Wenwu Ma
3 siblings, 0 replies; 28+ messages in thread
From: Wenwu Ma @ 2021-09-28 18:56 UTC (permalink / raw)
To: dev
Cc: maxime.coquelin, chenbo.xia, cheng1.jiang, jiayu.hu, Sunil.Pai.G,
yvonnex.yang, Wenwu Ma
Previously, the data path checked a flag to decide which
enqueue/dequeue function to call.
Now, an ops table is initialized when the vhost device is created,
so the data path can call through it directly without any further
flag checks.
Signed-off-by: Wenwu Ma <wenwux.ma@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Tested-by: Yvonne Yang <yvonnex.yang@intel.com>
---
examples/vhost/main.c | 100 +++++++++++++++++++++---------------
examples/vhost/main.h | 28 ++++++++--
examples/vhost/virtio_net.c | 16 +++++-
3 files changed, 98 insertions(+), 46 deletions(-)
diff --git a/examples/vhost/main.c b/examples/vhost/main.c
index d0bf1f31e3..254f7097bc 100644
--- a/examples/vhost/main.c
+++ b/examples/vhost/main.c
@@ -106,6 +106,8 @@ static uint32_t burst_rx_retry_num = BURST_RX_RETRIES;
static char *socket_files;
static int nb_sockets;
+static struct vhost_queue_ops vdev_queue_ops[MAX_VHOST_DEVICE];
+
/* empty vmdq configuration structure. Filled in programatically */
static struct rte_eth_conf vmdq_conf_default = {
.rxmode = {
@@ -879,22 +881,8 @@ drain_vhost(struct vhost_dev *vdev)
uint16_t nr_xmit = vhost_txbuff[buff_idx]->len;
struct rte_mbuf **m = vhost_txbuff[buff_idx]->m_table;
- if (builtin_net_driver) {
- ret = vs_enqueue_pkts(vdev, VIRTIO_RXQ, m, nr_xmit);
- } else if (async_vhost_driver) {
- uint16_t enqueue_fail = 0;
-
- complete_async_pkts(vdev);
- ret = rte_vhost_submit_enqueue_burst(vdev->vid, VIRTIO_RXQ, m, nr_xmit);
- __atomic_add_fetch(&vdev->pkts_inflight, ret, __ATOMIC_SEQ_CST);
-
- enqueue_fail = nr_xmit - ret;
- if (enqueue_fail)
- free_pkts(&m[ret], nr_xmit - ret);
- } else {
- ret = rte_vhost_enqueue_burst(vdev->vid, VIRTIO_RXQ,
- m, nr_xmit);
- }
+ ret = vdev_queue_ops[vdev->vid].enqueue_pkt_burst(vdev,
+ VIRTIO_RXQ, m, nr_xmit);
if (enable_stats) {
__atomic_add_fetch(&vdev->stats.rx_total_atomic, nr_xmit,
@@ -1173,6 +1161,33 @@ drain_mbuf_table(struct mbuf_table *tx_q)
}
}
+uint16_t
+async_enqueue_pkts(struct vhost_dev *vdev, uint16_t queue_id,
+ struct rte_mbuf **pkts, uint32_t rx_count)
+{
+ uint16_t enqueue_count;
+ uint16_t enqueue_fail = 0;
+
+ complete_async_pkts(vdev);
+ enqueue_count = rte_vhost_submit_enqueue_burst(vdev->vid,
+ queue_id, pkts, rx_count);
+ __atomic_add_fetch(&vdev->pkts_inflight, enqueue_count,
+ __ATOMIC_SEQ_CST);
+
+ enqueue_fail = rx_count - enqueue_count;
+ if (enqueue_fail)
+ free_pkts(&pkts[enqueue_count], enqueue_fail);
+
+ return enqueue_count;
+}
+
+uint16_t
+sync_enqueue_pkts(struct vhost_dev *vdev, uint16_t queue_id,
+ struct rte_mbuf **pkts, uint32_t rx_count)
+{
+ return rte_vhost_enqueue_burst(vdev->vid, queue_id, pkts, rx_count);
+}
+
static __rte_always_inline void
drain_eth_rx(struct vhost_dev *vdev)
{
@@ -1203,25 +1218,8 @@ drain_eth_rx(struct vhost_dev *vdev)
}
}
- if (builtin_net_driver) {
- enqueue_count = vs_enqueue_pkts(vdev, VIRTIO_RXQ,
- pkts, rx_count);
- } else if (async_vhost_driver) {
- uint16_t enqueue_fail = 0;
-
- complete_async_pkts(vdev);
- enqueue_count = rte_vhost_submit_enqueue_burst(vdev->vid,
- VIRTIO_RXQ, pkts, rx_count);
- __atomic_add_fetch(&vdev->pkts_inflight, enqueue_count, __ATOMIC_SEQ_CST);
-
- enqueue_fail = rx_count - enqueue_count;
- if (enqueue_fail)
- free_pkts(&pkts[enqueue_count], enqueue_fail);
-
- } else {
- enqueue_count = rte_vhost_enqueue_burst(vdev->vid, VIRTIO_RXQ,
- pkts, rx_count);
- }
+ enqueue_count = vdev_queue_ops[vdev->vid].enqueue_pkt_burst(vdev,
+ VIRTIO_RXQ, pkts, rx_count);
if (enable_stats) {
__atomic_add_fetch(&vdev->stats.rx_total_atomic, rx_count,
@@ -1234,6 +1232,14 @@ drain_eth_rx(struct vhost_dev *vdev)
free_pkts(pkts, rx_count);
}
+uint16_t sync_dequeue_pkts(struct vhost_dev *dev, uint16_t queue_id,
+ struct rte_mempool *mbuf_pool,
+ struct rte_mbuf **pkts, uint16_t count)
+{
+ return rte_vhost_dequeue_burst(dev->vid, queue_id,
+ mbuf_pool, pkts, count);
+}
+
static __rte_always_inline void
drain_virtio_tx(struct vhost_dev *vdev)
{
@@ -1241,13 +1247,8 @@ drain_virtio_tx(struct vhost_dev *vdev)
uint16_t count;
uint16_t i;
- if (builtin_net_driver) {
- count = vs_dequeue_pkts(vdev, VIRTIO_TXQ, mbuf_pool,
- pkts, MAX_PKT_BURST);
- } else {
- count = rte_vhost_dequeue_burst(vdev->vid, VIRTIO_TXQ,
- mbuf_pool, pkts, MAX_PKT_BURST);
- }
+ count = vdev_queue_ops[vdev->vid].dequeue_pkt_burst(vdev,
+ VIRTIO_TXQ, mbuf_pool, pkts, MAX_PKT_BURST);
/* setup VMDq for the first packet */
if (unlikely(vdev->ready == DEVICE_MAC_LEARNING) && count) {
@@ -1432,6 +1433,21 @@ new_device(int vid)
}
}
+ if (builtin_net_driver) {
+ vdev_queue_ops[vid].enqueue_pkt_burst = builtin_enqueue_pkts;
+ vdev_queue_ops[vid].dequeue_pkt_burst = builtin_dequeue_pkts;
+ } else {
+ if (async_vhost_driver) {
+ vdev_queue_ops[vid].enqueue_pkt_burst =
+ async_enqueue_pkts;
+ } else {
+ vdev_queue_ops[vid].enqueue_pkt_burst =
+ sync_enqueue_pkts;
+ }
+
+ vdev_queue_ops[vid].dequeue_pkt_burst = sync_dequeue_pkts;
+ }
+
if (builtin_net_driver)
vs_vhost_net_setup(vdev);
diff --git a/examples/vhost/main.h b/examples/vhost/main.h
index e7b1ac60a6..2c5a558f12 100644
--- a/examples/vhost/main.h
+++ b/examples/vhost/main.h
@@ -61,6 +61,19 @@ struct vhost_dev {
struct vhost_queue queues[MAX_QUEUE_PAIRS * 2];
} __rte_cache_aligned;
+typedef uint16_t (*vhost_enqueue_burst_t)(struct vhost_dev *dev,
+ uint16_t queue_id, struct rte_mbuf **pkts,
+ uint32_t count);
+
+typedef uint16_t (*vhost_dequeue_burst_t)(struct vhost_dev *dev,
+ uint16_t queue_id, struct rte_mempool *mbuf_pool,
+ struct rte_mbuf **pkts, uint16_t count);
+
+struct vhost_queue_ops {
+ vhost_enqueue_burst_t enqueue_pkt_burst;
+ vhost_dequeue_burst_t dequeue_pkt_burst;
+};
+
TAILQ_HEAD(vhost_dev_tailq_list, vhost_dev);
@@ -87,7 +100,16 @@ void vs_vhost_net_remove(struct vhost_dev *dev);
uint16_t vs_enqueue_pkts(struct vhost_dev *dev, uint16_t queue_id,
struct rte_mbuf **pkts, uint32_t count);
-uint16_t vs_dequeue_pkts(struct vhost_dev *dev, uint16_t queue_id,
- struct rte_mempool *mbuf_pool,
- struct rte_mbuf **pkts, uint16_t count);
+uint16_t builtin_enqueue_pkts(struct vhost_dev *dev, uint16_t queue_id,
+ struct rte_mbuf **pkts, uint32_t count);
+uint16_t builtin_dequeue_pkts(struct vhost_dev *dev, uint16_t queue_id,
+ struct rte_mempool *mbuf_pool,
+ struct rte_mbuf **pkts, uint16_t count);
+uint16_t sync_enqueue_pkts(struct vhost_dev *dev, uint16_t queue_id,
+ struct rte_mbuf **pkts, uint32_t count);
+uint16_t sync_dequeue_pkts(struct vhost_dev *dev, uint16_t queue_id,
+ struct rte_mempool *mbuf_pool,
+ struct rte_mbuf **pkts, uint16_t count);
+uint16_t async_enqueue_pkts(struct vhost_dev *dev, uint16_t queue_id,
+ struct rte_mbuf **pkts, uint32_t count);
#endif /* _MAIN_H_ */
diff --git a/examples/vhost/virtio_net.c b/examples/vhost/virtio_net.c
index 9064fc3a82..2432a96566 100644
--- a/examples/vhost/virtio_net.c
+++ b/examples/vhost/virtio_net.c
@@ -238,6 +238,13 @@ vs_enqueue_pkts(struct vhost_dev *dev, uint16_t queue_id,
return count;
}
+uint16_t
+builtin_enqueue_pkts(struct vhost_dev *dev, uint16_t queue_id,
+ struct rte_mbuf **pkts, uint32_t count)
+{
+ return vs_enqueue_pkts(dev, queue_id, pkts, count);
+}
+
static __rte_always_inline int
dequeue_pkt(struct vhost_dev *dev, struct rte_vhost_vring *vr,
struct rte_mbuf *m, uint16_t desc_idx,
@@ -363,7 +370,7 @@ dequeue_pkt(struct vhost_dev *dev, struct rte_vhost_vring *vr,
return 0;
}
-uint16_t
+static uint16_t
vs_dequeue_pkts(struct vhost_dev *dev, uint16_t queue_id,
struct rte_mempool *mbuf_pool, struct rte_mbuf **pkts, uint16_t count)
{
@@ -440,3 +447,10 @@ vs_dequeue_pkts(struct vhost_dev *dev, uint16_t queue_id,
return i;
}
+
+uint16_t
+builtin_dequeue_pkts(struct vhost_dev *dev, uint16_t queue_id,
+ struct rte_mempool *mbuf_pool, struct rte_mbuf **pkts, uint16_t count)
+{
+ return vs_dequeue_pkts(dev, queue_id, mbuf_pool, pkts, count);
+}
--
2.25.1
^ permalink raw reply [flat|nested] 28+ messages in thread
* [dpdk-dev] [PATCH v3 3/4] examples/vhost: use a new API to query remaining ring space
2021-09-28 18:56 ` [dpdk-dev] [PATCH v3 0/4] support async dequeue for split ring Wenwu Ma
2021-09-28 18:56 ` [dpdk-dev] [PATCH v3 1/4] vhost: " Wenwu Ma
2021-09-28 18:56 ` [dpdk-dev] [PATCH v3 2/4] examples/vhost: refactor vhost enqueue and dequeue datapaths Wenwu Ma
@ 2021-09-28 18:56 ` Wenwu Ma
2021-09-28 18:56 ` [dpdk-dev] [PATCH v3 4/4] examples/vhost: support vhost async dequeue data path Wenwu Ma
3 siblings, 0 replies; 28+ messages in thread
From: Wenwu Ma @ 2021-09-28 18:56 UTC (permalink / raw)
To: dev
Cc: maxime.coquelin, chenbo.xia, cheng1.jiang, jiayu.hu, Sunil.Pai.G,
yvonnex.yang, Wenwu Ma
A new API for querying the remaining descriptor ring capacity
is available, so use it instead of tracking the free space manually.
Signed-off-by: Wenwu Ma <wenwux.ma@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
Tested-by: Yvonne Yang <yvonnex.yang@intel.com>
---
examples/vhost/ioat.c | 6 +-----
1 file changed, 1 insertion(+), 5 deletions(-)
diff --git a/examples/vhost/ioat.c b/examples/vhost/ioat.c
index 457f8171f0..6adc30b622 100644
--- a/examples/vhost/ioat.c
+++ b/examples/vhost/ioat.c
@@ -17,7 +17,6 @@ struct packet_tracker {
unsigned short next_read;
unsigned short next_write;
unsigned short last_remain;
- unsigned short ioat_space;
};
struct packet_tracker cb_tracker[MAX_VHOST_DEVICE];
@@ -113,7 +112,6 @@ open_ioat(const char *value)
goto out;
}
rte_rawdev_start(dev_id);
- cb_tracker[dev_id].ioat_space = IOAT_RING_SIZE - 1;
dma_info->nr++;
i++;
}
@@ -140,7 +138,7 @@ ioat_transfer_data_cb(int vid, uint16_t queue_id,
src = descs[i_desc].src;
dst = descs[i_desc].dst;
i_seg = 0;
- if (cb_tracker[dev_id].ioat_space < src->nr_segs)
+ if (rte_ioat_burst_capacity(dev_id) < src->nr_segs)
break;
while (i_seg < src->nr_segs) {
rte_ioat_enqueue_copy(dev_id,
@@ -155,7 +153,6 @@ ioat_transfer_data_cb(int vid, uint16_t queue_id,
}
write &= mask;
cb_tracker[dev_id].size_track[write] = src->nr_segs;
- cb_tracker[dev_id].ioat_space -= src->nr_segs;
write++;
}
} else {
@@ -194,7 +191,6 @@ ioat_check_completed_copies_cb(int vid, uint16_t queue_id,
if (n_seg == 0)
return 0;
- cb_tracker[dev_id].ioat_space += n_seg;
n_seg += cb_tracker[dev_id].last_remain;
read = cb_tracker[dev_id].next_read;
--
2.25.1
^ permalink raw reply [flat|nested] 28+ messages in thread
* [dpdk-dev] [PATCH v3 4/4] examples/vhost: support vhost async dequeue data path
2021-09-28 18:56 ` [dpdk-dev] [PATCH v3 0/4] support async dequeue for split ring Wenwu Ma
` (2 preceding siblings ...)
2021-09-28 18:56 ` [dpdk-dev] [PATCH v3 3/4] examples/vhost: use a new API to query remaining ring space Wenwu Ma
@ 2021-09-28 18:56 ` Wenwu Ma
3 siblings, 0 replies; 28+ messages in thread
From: Wenwu Ma @ 2021-09-28 18:56 UTC (permalink / raw)
To: dev
Cc: maxime.coquelin, chenbo.xia, cheng1.jiang, jiayu.hu, Sunil.Pai.G,
yvonnex.yang, Wenwu Ma
This patch adds the vhost async dequeue data path to the vhost sample.
The vswitch can leverage IOAT to accelerate the vhost async dequeue data path.
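The mechanism this patch builds on can be sketched as a per-device dispatch: each vhost device gets its enqueue and dequeue handlers chosen once at setup, so the hot data path never re-tests the async flag. A minimal standalone sketch of that model (simplified hypothetical types, not the actual struct vhost_dev or rte_mbuf API):

```c
/* Hypothetical simplified dequeue signature; the real sample passes a
 * struct vhost_dev, a mempool, and an rte_mbuf burst. */
typedef unsigned short (*dequeue_fn)(int vid, unsigned short count);

static unsigned short sync_deq(int vid, unsigned short count)
{
	(void)vid;
	return count; /* pretend the CPU copied every packet */
}

static unsigned short async_deq(int vid, unsigned short count)
{
	(void)vid;
	/* pretend one packet stays in flight on the DMA engine */
	return count ? (unsigned short)(count - 1) : 0;
}

#define MAX_DEV 4
static dequeue_fn dev_dequeue[MAX_DEV];

/* Mirrors init_vhost_queue_ops(): pick the path once per device. */
static void init_queue_ops(int vid, int use_async)
{
	dev_dequeue[vid] = use_async ? async_deq : sync_deq;
}
```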
Signed-off-by: Wenwu Ma <wenwux.ma@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Tested-by: Yvonne Yang <yvonnex.yang@intel.com>
---
doc/guides/sample_app_ug/vhost.rst | 9 +-
examples/vhost/ioat.c | 61 +++++++--
examples/vhost/ioat.h | 25 ++++
examples/vhost/main.c | 201 +++++++++++++++++++----------
examples/vhost/main.h | 6 +-
5 files changed, 219 insertions(+), 83 deletions(-)
diff --git a/doc/guides/sample_app_ug/vhost.rst b/doc/guides/sample_app_ug/vhost.rst
index 9afde9c7f5..63dcf181e1 100644
--- a/doc/guides/sample_app_ug/vhost.rst
+++ b/doc/guides/sample_app_ug/vhost.rst
@@ -169,9 +169,12 @@ demonstrates how to use the async vhost APIs. It's used in combination with dmas
**--dmas**
This parameter is used to specify the assigned DMA device of a vhost device.
Async vhost-user net driver will be used if --dmas is set. For example
---dmas [txd0@00:04.0,txd1@00:04.1] means use DMA channel 00:04.0 for vhost
-device 0 enqueue operation and use DMA channel 00:04.1 for vhost device 1
-enqueue operation.
+--dmas [txd0@00:04.0,txd1@00:04.1,rxd0@00:04.2,rxd1@00:04.3] means use
+DMA channel 00:04.0/00:04.2 for vhost device 0 enqueue/dequeue operation
+and use DMA channel 00:04.1/00:04.3 for vhost device 1 enqueue/dequeue
+operation. The index of the device corresponds to the socket file in order,
+that means vhost device 0 is created through the first socket file, vhost
+device 1 is created through the second socket file, and so on.
Common Issues
-------------
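The --dmas mapping described in the documentation hunk above (txdN for enqueue, rxdN for dequeue, N being the socket-file index) reduces to a small token parser. A standalone sketch, simplified from the open_ioat() changes below, with error handling collapsed to a -1 return:

```c
#include <stdlib.h>
#include <string.h>

/* Direction values are illustrative stand-ins for the sample's
 * ASYNC_ENQUEUE_VHOST / ASYNC_DEQUEUE_VHOST flags. */
enum { DIR_ENQUEUE = 1, DIR_DEQUEUE = 2 };

/* Parse one token such as "txd0" or "rxd1": the prefix selects the
 * direction and the trailing number is the socket index.  Returns the
 * direction, or -1 on a malformed token. */
static int parse_dma_token(const char *tok, long *socketid)
{
	const char *num;
	char *end;
	int dir;

	if (strncmp(tok, "txd", 3) == 0)
		dir = DIR_ENQUEUE;
	else if (strncmp(tok, "rxd", 3) == 0)
		dir = DIR_DEQUEUE;
	else
		return -1;

	num = tok + 3;
	*socketid = strtol(num, &end, 0);
	if (end == num) /* no digits after the prefix */
		return -1;
	return dir;
}
```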
diff --git a/examples/vhost/ioat.c b/examples/vhost/ioat.c
index 6adc30b622..3a256b0f4c 100644
--- a/examples/vhost/ioat.c
+++ b/examples/vhost/ioat.c
@@ -21,6 +21,8 @@ struct packet_tracker {
struct packet_tracker cb_tracker[MAX_VHOST_DEVICE];
+int vid2socketid[MAX_VHOST_DEVICE];
+
int
open_ioat(const char *value)
{
@@ -29,7 +31,7 @@ open_ioat(const char *value)
char *addrs = input;
char *ptrs[2];
char *start, *end, *substr;
- int64_t vid, vring_id;
+ int64_t socketid, vring_id;
struct rte_ioat_rawdev_config config;
struct rte_rawdev_info info = { .dev_private = &config };
char name[32];
@@ -60,6 +62,7 @@ open_ioat(const char *value)
goto out;
}
while (i < args_nr) {
+ bool is_txd;
char *arg_temp = dma_arg[i];
uint8_t sub_nr;
sub_nr = rte_strsplit(arg_temp, strlen(arg_temp), ptrs, 2, '@');
@@ -68,27 +71,39 @@ open_ioat(const char *value)
goto out;
}
- start = strstr(ptrs[0], "txd");
- if (start == NULL) {
+ int async_flag;
+ char *txd, *rxd;
+ txd = strstr(ptrs[0], "txd");
+ rxd = strstr(ptrs[0], "rxd");
+ if (txd) {
+ is_txd = true;
+ start = txd;
+ async_flag = ASYNC_ENQUEUE_VHOST;
+ } else if (rxd) {
+ is_txd = false;
+ start = rxd;
+ async_flag = ASYNC_DEQUEUE_VHOST;
+ } else {
ret = -1;
goto out;
}
start += 3;
- vid = strtol(start, &end, 0);
+ socketid = strtol(start, &end, 0);
if (end == start) {
ret = -1;
goto out;
}
- vring_id = 0 + VIRTIO_RXQ;
+ vring_id = is_txd ? VIRTIO_RXQ : VIRTIO_TXQ;
+
if (rte_pci_addr_parse(ptrs[1],
- &(dma_info + vid)->dmas[vring_id].addr) < 0) {
+ &(dma_info + socketid)->dmas[vring_id].addr) < 0) {
ret = -1;
goto out;
}
- rte_pci_device_name(&(dma_info + vid)->dmas[vring_id].addr,
+ rte_pci_device_name(&(dma_info + socketid)->dmas[vring_id].addr,
name, sizeof(name));
dev_id = rte_rawdev_get_dev_id(name);
if (dev_id == (uint16_t)(-ENODEV) ||
@@ -103,8 +118,9 @@ open_ioat(const char *value)
goto out;
}
- (dma_info + vid)->dmas[vring_id].dev_id = dev_id;
- (dma_info + vid)->dmas[vring_id].is_valid = true;
+ (dma_info + socketid)->dmas[vring_id].dev_id = dev_id;
+ (dma_info + socketid)->dmas[vring_id].is_valid = true;
+ (dma_info + socketid)->async_flag |= async_flag;
config.ring_size = IOAT_RING_SIZE;
config.hdls_disable = true;
if (rte_rawdev_configure(dev_id, &info, sizeof(config)) < 0) {
@@ -126,13 +142,16 @@ ioat_transfer_data_cb(int vid, uint16_t queue_id,
struct rte_vhost_async_status *opaque_data, uint16_t count)
{
uint32_t i_desc;
- uint16_t dev_id = dma_bind[vid].dmas[queue_id * 2 + VIRTIO_RXQ].dev_id;
struct rte_vhost_iov_iter *src = NULL;
struct rte_vhost_iov_iter *dst = NULL;
unsigned long i_seg;
unsigned short mask = MAX_ENQUEUED_SIZE - 1;
- unsigned short write = cb_tracker[dev_id].next_write;
+ if (queue_id >= MAX_RING_COUNT)
+ return -1;
+
+ uint16_t dev_id = dma_bind[vid2socketid[vid]].dmas[queue_id].dev_id;
+ unsigned short write = cb_tracker[dev_id].next_write;
if (!opaque_data) {
for (i_desc = 0; i_desc < count; i_desc++) {
src = descs[i_desc].src;
@@ -170,16 +189,16 @@ ioat_check_completed_copies_cb(int vid, uint16_t queue_id,
struct rte_vhost_async_status *opaque_data,
uint16_t max_packets)
{
- if (!opaque_data) {
+ if (!opaque_data && queue_id < MAX_RING_COUNT) {
uintptr_t dump[255];
int n_seg;
unsigned short read, write;
unsigned short nb_packet = 0;
unsigned short mask = MAX_ENQUEUED_SIZE - 1;
unsigned short i;
+ uint16_t dev_id;
- uint16_t dev_id = dma_bind[vid].dmas[queue_id * 2
- + VIRTIO_RXQ].dev_id;
+ dev_id = dma_bind[vid2socketid[vid]].dmas[queue_id].dev_id;
n_seg = rte_ioat_completed_ops(dev_id, 255, NULL, NULL, dump, dump);
if (n_seg < 0) {
RTE_LOG(ERR,
@@ -215,4 +234,18 @@ ioat_check_completed_copies_cb(int vid, uint16_t queue_id,
return -1;
}
+uint32_t get_async_flag_by_vid(int vid)
+{
+ return dma_bind[vid2socketid[vid]].async_flag;
+}
+
+uint32_t get_async_flag_by_socketid(int socketid)
+{
+ return dma_bind[socketid].async_flag;
+}
+
+void init_vid2socketid_array(int vid, int socketid)
+{
+ vid2socketid[vid] = socketid;
+}
#endif /* RTE_RAW_IOAT */
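The helpers added at the end of ioat.c above route every per-vid lookup through a vid-to-socket map, because DMA channels are configured per socket file while the async callbacks only receive a vid. A standalone sketch of that indirection (plain arrays; sizes are illustrative, the sample uses MAX_VHOST_DEVICE = 1024):

```c
#define N_DEV 8
#define FLAG_ENQ 1u /* stand-in for ASYNC_ENQUEUE_VHOST */
#define FLAG_DEQ 2u /* stand-in for ASYNC_DEQUEUE_VHOST */

static int vid2socket[N_DEV];          /* filled at new_device() time */
static unsigned int socket_flag[N_DEV]; /* filled while parsing --dmas */

/* Mirrors init_vid2socketid_array(). */
static void map_vid(int vid, int socketid)
{
	vid2socket[vid] = socketid;
}

/* Mirrors get_async_flag_by_vid(): vid -> socket -> flags. */
static unsigned int flag_by_vid(int vid)
{
	return socket_flag[vid2socket[vid]];
}
```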
diff --git a/examples/vhost/ioat.h b/examples/vhost/ioat.h
index 62e163c585..105cee556d 100644
--- a/examples/vhost/ioat.h
+++ b/examples/vhost/ioat.h
@@ -12,6 +12,9 @@
#define MAX_VHOST_DEVICE 1024
#define IOAT_RING_SIZE 4096
#define MAX_ENQUEUED_SIZE 4096
+#define MAX_RING_COUNT 2
+#define ASYNC_ENQUEUE_VHOST 1
+#define ASYNC_DEQUEUE_VHOST 2
struct dma_info {
struct rte_pci_addr addr;
@@ -20,6 +23,7 @@ struct dma_info {
};
struct dma_for_vhost {
+ uint32_t async_flag;
struct dma_info dmas[RTE_MAX_QUEUES_PER_PORT * 2];
uint16_t nr;
};
@@ -36,6 +40,10 @@ int32_t
ioat_check_completed_copies_cb(int vid, uint16_t queue_id,
struct rte_vhost_async_status *opaque_data,
uint16_t max_packets);
+
+uint32_t get_async_flag_by_vid(int vid);
+uint32_t get_async_flag_by_socketid(int socketid);
+void init_vid2socketid_array(int vid, int socketid);
#else
static int open_ioat(const char *value __rte_unused)
{
@@ -59,5 +67,22 @@ ioat_check_completed_copies_cb(int vid __rte_unused,
{
return -1;
}
+
+static uint32_t
+get_async_flag_by_vid(int vid __rte_unused)
+{
+ return 0;
+}
+
+static uint32_t
+get_async_flag_by_socketid(int socketid __rte_unused)
+{
+ return 0;
+}
+
+static void
+init_vid2socketid_array(int vid __rte_unused, int socketid __rte_unused)
+{
+}
#endif
#endif /* _IOAT_H_ */
diff --git a/examples/vhost/main.c b/examples/vhost/main.c
index 254f7097bc..572ffc12ae 100644
--- a/examples/vhost/main.c
+++ b/examples/vhost/main.c
@@ -93,8 +93,6 @@ static int client_mode;
static int builtin_net_driver;
-static int async_vhost_driver;
-
static char *dma_type;
/* Specify timeout (in useconds) between retries on RX. */
@@ -673,7 +671,6 @@ us_vhost_parse_args(int argc, char **argv)
us_vhost_usage(prgname);
return -1;
}
- async_vhost_driver = 1;
break;
case OPT_CLIENT_NUM:
@@ -846,7 +843,8 @@ complete_async_pkts(struct vhost_dev *vdev)
VIRTIO_RXQ, p_cpl, MAX_PKT_BURST);
if (complete_count) {
free_pkts(p_cpl, complete_count);
- __atomic_sub_fetch(&vdev->pkts_inflight, complete_count, __ATOMIC_SEQ_CST);
+ __atomic_sub_fetch(&vdev->pkts_enq_inflight,
+ complete_count, __ATOMIC_SEQ_CST);
}
}
@@ -891,7 +889,7 @@ drain_vhost(struct vhost_dev *vdev)
__ATOMIC_SEQ_CST);
}
- if (!async_vhost_driver)
+ if ((get_async_flag_by_vid(vdev->vid) & ASYNC_ENQUEUE_VHOST) == 0)
free_pkts(m, nr_xmit);
}
@@ -1171,8 +1169,8 @@ async_enqueue_pkts(struct vhost_dev *vdev, uint16_t queue_id,
complete_async_pkts(vdev);
enqueue_count = rte_vhost_submit_enqueue_burst(vdev->vid,
queue_id, pkts, rx_count);
- __atomic_add_fetch(&vdev->pkts_inflight, enqueue_count,
- __ATOMIC_SEQ_CST);
+ __atomic_add_fetch(&vdev->pkts_enq_inflight,
+ enqueue_count, __ATOMIC_SEQ_CST);
enqueue_fail = rx_count - enqueue_count;
if (enqueue_fail)
@@ -1228,10 +1226,23 @@ drain_eth_rx(struct vhost_dev *vdev)
__ATOMIC_SEQ_CST);
}
- if (!async_vhost_driver)
+ if ((get_async_flag_by_vid(vdev->vid) & ASYNC_ENQUEUE_VHOST) == 0)
free_pkts(pkts, rx_count);
}
+uint16_t async_dequeue_pkts(struct vhost_dev *dev, uint16_t queue_id,
+ struct rte_mempool *mbuf_pool,
+ struct rte_mbuf **pkts, uint16_t count)
+{
+ int nr_inflight;
+ uint16_t dequeue_count;
+ dequeue_count = rte_vhost_async_try_dequeue_burst(dev->vid, queue_id,
+ mbuf_pool, pkts, count, &nr_inflight);
+ if (likely(nr_inflight != -1))
+ dev->pkts_deq_inflight = nr_inflight;
+ return dequeue_count;
+}
+
uint16_t sync_dequeue_pkts(struct vhost_dev *dev, uint16_t queue_id,
struct rte_mempool *mbuf_pool,
struct rte_mbuf **pkts, uint16_t count)
@@ -1327,6 +1338,32 @@ switch_worker(void *arg __rte_unused)
return 0;
}
+static void
+vhost_clear_queue_thread_unsafe(struct vhost_dev *vdev, uint16_t queue_id)
+{
+ uint16_t n_pkt = 0;
+ struct rte_mbuf *m_enq_cpl[vdev->pkts_enq_inflight];
+ struct rte_mbuf *m_deq_cpl[vdev->pkts_deq_inflight];
+
+ if (queue_id % 2 == 0) {
+ while (vdev->pkts_enq_inflight) {
+ n_pkt = rte_vhost_clear_queue_thread_unsafe(vdev->vid,
+ queue_id, m_enq_cpl, vdev->pkts_enq_inflight);
+ free_pkts(m_enq_cpl, n_pkt);
+ __atomic_sub_fetch(&vdev->pkts_enq_inflight,
+ n_pkt, __ATOMIC_SEQ_CST);
+ }
+ } else {
+ while (vdev->pkts_deq_inflight) {
+ n_pkt = rte_vhost_clear_queue_thread_unsafe(vdev->vid,
+ queue_id, m_deq_cpl, vdev->pkts_deq_inflight);
+ free_pkts(m_deq_cpl, n_pkt);
+ __atomic_sub_fetch(&vdev->pkts_deq_inflight,
+ n_pkt, __ATOMIC_SEQ_CST);
+ }
+ }
+}
+
/*
* Remove a device from the specific data core linked list and from the
* main linked list. Synchonization occurs through the use of the
@@ -1383,21 +1420,91 @@ destroy_device(int vid)
"(%d) device has been removed from data core\n",
vdev->vid);
- if (async_vhost_driver) {
- uint16_t n_pkt = 0;
- struct rte_mbuf *m_cpl[vdev->pkts_inflight];
+ if (get_async_flag_by_vid(vid) & ASYNC_ENQUEUE_VHOST) {
+ vhost_clear_queue_thread_unsafe(vdev, VIRTIO_RXQ);
+ rte_vhost_async_channel_unregister(vid, VIRTIO_RXQ);
+ }
+ if (get_async_flag_by_vid(vid) & ASYNC_DEQUEUE_VHOST) {
+ vhost_clear_queue_thread_unsafe(vdev, VIRTIO_TXQ);
+ rte_vhost_async_channel_unregister(vid, VIRTIO_TXQ);
+ }
+
+ rte_free(vdev);
+}
+
+static int
+get_socketid_by_vid(int vid)
+{
+ int i;
+ char ifname[PATH_MAX];
+ rte_vhost_get_ifname(vid, ifname, sizeof(ifname));
+
+ for (i = 0; i < nb_sockets; i++) {
+ char *file = socket_files + i * PATH_MAX;
+ if (strcmp(file, ifname) == 0)
+ return i;
+ }
+
+ return -1;
+}
+
+static int
+init_vhost_queue_ops(int vid)
+{
+ int socketid = get_socketid_by_vid(vid);
+ if (socketid == -1)
+ return -1;
+
+ init_vid2socketid_array(vid, socketid);
+ if (builtin_net_driver) {
+ vdev_queue_ops[vid].enqueue_pkt_burst = builtin_enqueue_pkts;
+ vdev_queue_ops[vid].dequeue_pkt_burst = builtin_dequeue_pkts;
+ } else {
+ if (get_async_flag_by_vid(vid) & ASYNC_ENQUEUE_VHOST) {
+ vdev_queue_ops[vid].enqueue_pkt_burst =
+ async_enqueue_pkts;
+ } else {
+ vdev_queue_ops[vid].enqueue_pkt_burst =
+ sync_enqueue_pkts;
+ }
- while (vdev->pkts_inflight) {
- n_pkt = rte_vhost_clear_queue_thread_unsafe(vid, VIRTIO_RXQ,
- m_cpl, vdev->pkts_inflight);
- free_pkts(m_cpl, n_pkt);
- __atomic_sub_fetch(&vdev->pkts_inflight, n_pkt, __ATOMIC_SEQ_CST);
+ if (get_async_flag_by_vid(vid) & ASYNC_DEQUEUE_VHOST) {
+ vdev_queue_ops[vid].dequeue_pkt_burst =
+ async_dequeue_pkts;
+ } else {
+ vdev_queue_ops[vid].dequeue_pkt_burst =
+ sync_dequeue_pkts;
}
+ }
- rte_vhost_async_channel_unregister(vid, VIRTIO_RXQ);
+ return 0;
+}
+
+static int
+vhost_async_channel_register(int vid)
+{
+ int ret = 0;
+ struct rte_vhost_async_config config = {0};
+ struct rte_vhost_async_channel_ops channel_ops;
+
+ if (dma_type != NULL && strncmp(dma_type, "ioat", 4) == 0) {
+ channel_ops.transfer_data = ioat_transfer_data_cb;
+ channel_ops.check_completed_copies =
+ ioat_check_completed_copies_cb;
+
+ config.features = RTE_VHOST_ASYNC_INORDER;
+
+ if (get_async_flag_by_vid(vid) & ASYNC_ENQUEUE_VHOST) {
+ ret |= rte_vhost_async_channel_register(vid, VIRTIO_RXQ,
+ config, &channel_ops);
+ }
+ if (get_async_flag_by_vid(vid) & ASYNC_DEQUEUE_VHOST) {
+ ret |= rte_vhost_async_channel_register(vid, VIRTIO_TXQ,
+ config, &channel_ops);
+ }
}
- rte_free(vdev);
+ return ret;
}
/*
@@ -1433,20 +1540,8 @@ new_device(int vid)
}
}
- if (builtin_net_driver) {
- vdev_queue_ops[vid].enqueue_pkt_burst = builtin_enqueue_pkts;
- vdev_queue_ops[vid].dequeue_pkt_burst = builtin_dequeue_pkts;
- } else {
- if (async_vhost_driver) {
- vdev_queue_ops[vid].enqueue_pkt_burst =
- async_enqueue_pkts;
- } else {
- vdev_queue_ops[vid].enqueue_pkt_burst =
- sync_enqueue_pkts;
- }
-
- vdev_queue_ops[vid].dequeue_pkt_burst = sync_dequeue_pkts;
- }
+ if (init_vhost_queue_ops(vid) != 0)
+ return -1;
if (builtin_net_driver)
vs_vhost_net_setup(vdev);
@@ -1475,27 +1570,13 @@ new_device(int vid)
rte_vhost_enable_guest_notification(vid, VIRTIO_RXQ, 0);
rte_vhost_enable_guest_notification(vid, VIRTIO_TXQ, 0);
+ int ret = vhost_async_channel_register(vid);
+
RTE_LOG(INFO, VHOST_DATA,
"(%d) device has been added to data core %d\n",
vid, vdev->coreid);
- if (async_vhost_driver) {
- struct rte_vhost_async_config config = {0};
- struct rte_vhost_async_channel_ops channel_ops;
-
- if (dma_type != NULL && strncmp(dma_type, "ioat", 4) == 0) {
- channel_ops.transfer_data = ioat_transfer_data_cb;
- channel_ops.check_completed_copies =
- ioat_check_completed_copies_cb;
-
- config.features = RTE_VHOST_ASYNC_INORDER;
-
- return rte_vhost_async_channel_register(vid, VIRTIO_RXQ,
- config, &channel_ops);
- }
- }
-
- return 0;
+ return ret;
}
static int
@@ -1513,19 +1594,8 @@ vring_state_changed(int vid, uint16_t queue_id, int enable)
if (queue_id != VIRTIO_RXQ)
return 0;
- if (async_vhost_driver) {
- if (!enable) {
- uint16_t n_pkt = 0;
- struct rte_mbuf *m_cpl[vdev->pkts_inflight];
-
- while (vdev->pkts_inflight) {
- n_pkt = rte_vhost_clear_queue_thread_unsafe(vid, queue_id,
- m_cpl, vdev->pkts_inflight);
- free_pkts(m_cpl, n_pkt);
- __atomic_sub_fetch(&vdev->pkts_inflight, n_pkt, __ATOMIC_SEQ_CST);
- }
- }
- }
+ if (!enable)
+ vhost_clear_queue_thread_unsafe(vdev, queue_id);
return 0;
}
@@ -1769,10 +1839,11 @@ main(int argc, char *argv[])
for (i = 0; i < nb_sockets; i++) {
char *file = socket_files + i * PATH_MAX;
- if (async_vhost_driver)
- flags = flags | RTE_VHOST_USER_ASYNC_COPY;
+ uint64_t flag = flags;
+ if (get_async_flag_by_socketid(i) != 0)
+ flag |= RTE_VHOST_USER_ASYNC_COPY;
- ret = rte_vhost_driver_register(file, flags);
+ ret = rte_vhost_driver_register(file, flag);
if (ret != 0) {
unregister_drivers(i);
rte_exit(EXIT_FAILURE,
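The even/odd test in the vhost_clear_queue_thread_unsafe() hunk above relies on the virtio queue-index convention: even indices (VIRTIO_RXQ = 0) carry guest-RX traffic drained by enqueue completions, odd indices (VIRTIO_TXQ = 1) carry guest-TX traffic drained by dequeue completions. A minimal sketch of that parity check:

```c
#define VIRTIO_RXQ 0
#define VIRTIO_TXQ 1

/* True when queue_id addresses a guest-RX (enqueue-side) ring. */
static int is_enqueue_queue(unsigned short queue_id)
{
	return (queue_id % 2) == VIRTIO_RXQ;
}
```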
diff --git a/examples/vhost/main.h b/examples/vhost/main.h
index 2c5a558f12..5af7e7d97f 100644
--- a/examples/vhost/main.h
+++ b/examples/vhost/main.h
@@ -51,7 +51,8 @@ struct vhost_dev {
uint64_t features;
size_t hdr_len;
uint16_t nr_vrings;
- uint16_t pkts_inflight;
+ uint16_t pkts_enq_inflight;
+ uint16_t pkts_deq_inflight;
struct rte_vhost_memory *mem;
struct device_statistics stats;
TAILQ_ENTRY(vhost_dev) global_vdev_entry;
@@ -112,4 +113,7 @@ uint16_t sync_dequeue_pkts(struct vhost_dev *dev, uint16_t queue_id,
struct rte_mbuf **pkts, uint16_t count);
uint16_t async_enqueue_pkts(struct vhost_dev *dev, uint16_t queue_id,
struct rte_mbuf **pkts, uint32_t count);
+uint16_t async_dequeue_pkts(struct vhost_dev *dev, uint16_t queue_id,
+ struct rte_mempool *mbuf_pool,
+ struct rte_mbuf **pkts, uint16_t count);
#endif /* _MAIN_H_ */
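The main.h hunk above splits the single pkts_inflight counter into per-direction enqueue and dequeue counters, each updated with __atomic builtins in main.c. The accounting pattern can be sketched with C11 atomics (a simplified stand-in, not the sample's actual types):

```c
#include <stdatomic.h>

/* One in-flight counter per direction, as in the patched vhost_dev. */
struct dev_inflight {
	atomic_ushort enq;
	atomic_ushort deq;
};

/* Submission path: count packets handed to the DMA engine. */
static void submit_enq(struct dev_inflight *d, unsigned short n)
{
	atomic_fetch_add(&d->enq, n);
}

/* Completion path: drain finished packets back out of the counter. */
static void complete_enq(struct dev_inflight *d, unsigned short n)
{
	atomic_fetch_sub(&d->enq, n);
}
```

Splitting the counters lets the clear-queue path wait on exactly the direction being torn down instead of one shared count.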
--
2.25.1
* [dpdk-dev] [PATCH v3 4/4] examples/vhost: support vhost async dequeue data path
2021-06-23 15:00 ` [dpdk-dev] [PATCH v3 0/4] vhost: " Wenwu Ma
@ 2021-06-23 15:00 ` Wenwu Ma
0 siblings, 0 replies; 28+ messages in thread
From: Wenwu Ma @ 2021-06-23 15:00 UTC (permalink / raw)
To: dev; +Cc: maxime.coquelin, chenbo.xia, cheng1.jiang, Wenwu Ma
This patch adds the vhost async dequeue data path to the vhost sample.
The vswitch can leverage IOAT to accelerate the vhost async dequeue data path.
Signed-off-by: Wenwu Ma <wenwux.ma@intel.com>
---
doc/guides/sample_app_ug/vhost.rst | 9 +-
examples/vhost/ioat.c | 61 ++++++++++---
examples/vhost/ioat.h | 25 ++++++
examples/vhost/main.c | 140 ++++++++++++++++++++---------
4 files changed, 177 insertions(+), 58 deletions(-)
diff --git a/doc/guides/sample_app_ug/vhost.rst b/doc/guides/sample_app_ug/vhost.rst
index 9afde9c7f5..63dcf181e1 100644
--- a/doc/guides/sample_app_ug/vhost.rst
+++ b/doc/guides/sample_app_ug/vhost.rst
@@ -169,9 +169,12 @@ demonstrates how to use the async vhost APIs. It's used in combination with dmas
**--dmas**
This parameter is used to specify the assigned DMA device of a vhost device.
Async vhost-user net driver will be used if --dmas is set. For example
---dmas [txd0@00:04.0,txd1@00:04.1] means use DMA channel 00:04.0 for vhost
-device 0 enqueue operation and use DMA channel 00:04.1 for vhost device 1
-enqueue operation.
+--dmas [txd0@00:04.0,txd1@00:04.1,rxd0@00:04.2,rxd1@00:04.3] means use
+DMA channel 00:04.0/00:04.2 for vhost device 0 enqueue/dequeue operation
+and use DMA channel 00:04.1/00:04.3 for vhost device 1 enqueue/dequeue
+operation. The index of the device corresponds to the socket file in order,
+that means vhost device 0 is created through the first socket file, vhost
+device 1 is created through the second socket file, and so on.
Common Issues
-------------
diff --git a/examples/vhost/ioat.c b/examples/vhost/ioat.c
index bf4e033bdb..a305100b47 100644
--- a/examples/vhost/ioat.c
+++ b/examples/vhost/ioat.c
@@ -21,6 +21,8 @@ struct packet_tracker {
struct packet_tracker cb_tracker[MAX_VHOST_DEVICE];
+int vid2socketid[MAX_VHOST_DEVICE];
+
int
open_ioat(const char *value)
{
@@ -29,7 +31,7 @@ open_ioat(const char *value)
char *addrs = input;
char *ptrs[2];
char *start, *end, *substr;
- int64_t vid, vring_id;
+ int64_t socketid, vring_id;
struct rte_ioat_rawdev_config config;
struct rte_rawdev_info info = { .dev_private = &config };
char name[32];
@@ -60,6 +62,8 @@ open_ioat(const char *value)
goto out;
}
while (i < args_nr) {
+ char *txd, *rxd;
+ bool is_txd;
char *arg_temp = dma_arg[i];
uint8_t sub_nr;
sub_nr = rte_strsplit(arg_temp, strlen(arg_temp), ptrs, 2, '@');
@@ -68,27 +72,38 @@ open_ioat(const char *value)
goto out;
}
- start = strstr(ptrs[0], "txd");
- if (start == NULL) {
+ int async_flag;
+ txd = strstr(ptrs[0], "txd");
+ rxd = strstr(ptrs[0], "rxd");
+ if (txd == NULL && rxd == NULL) {
ret = -1;
goto out;
+ } else if (txd) {
+ is_txd = true;
+ start = txd;
+ async_flag = ASYNC_RX_VHOST;
+ } else {
+ is_txd = false;
+ start = rxd;
+ async_flag = ASYNC_TX_VHOST;
}
start += 3;
- vid = strtol(start, &end, 0);
+ socketid = strtol(start, &end, 0);
if (end == start) {
ret = -1;
goto out;
}
- vring_id = 0 + VIRTIO_RXQ;
+ vring_id = is_txd ? VIRTIO_RXQ : VIRTIO_TXQ;
+
if (rte_pci_addr_parse(ptrs[1],
- &(dma_info + vid)->dmas[vring_id].addr) < 0) {
+ &(dma_info + socketid)->dmas[vring_id].addr) < 0) {
ret = -1;
goto out;
}
- rte_pci_device_name(&(dma_info + vid)->dmas[vring_id].addr,
+ rte_pci_device_name(&(dma_info + socketid)->dmas[vring_id].addr,
name, sizeof(name));
dev_id = rte_rawdev_get_dev_id(name);
if (dev_id == (uint16_t)(-ENODEV) ||
@@ -103,8 +118,9 @@ open_ioat(const char *value)
goto out;
}
- (dma_info + vid)->dmas[vring_id].dev_id = dev_id;
- (dma_info + vid)->dmas[vring_id].is_valid = true;
+ (dma_info + socketid)->dmas[vring_id].dev_id = dev_id;
+ (dma_info + socketid)->dmas[vring_id].is_valid = true;
+ (dma_info + socketid)->async_flag |= async_flag;
config.ring_size = IOAT_RING_SIZE;
config.hdls_disable = true;
if (rte_rawdev_configure(dev_id, &info, sizeof(config)) < 0) {
@@ -126,13 +142,16 @@ ioat_transfer_data_cb(int vid, uint16_t queue_id,
struct rte_vhost_async_status *opaque_data, uint16_t count)
{
uint32_t i_desc;
- uint16_t dev_id = dma_bind[vid].dmas[queue_id * 2 + VIRTIO_RXQ].dev_id;
struct rte_vhost_iov_iter *src = NULL;
struct rte_vhost_iov_iter *dst = NULL;
unsigned long i_seg;
unsigned short mask = MAX_ENQUEUED_SIZE - 1;
- unsigned short write = cb_tracker[dev_id].next_write;
+ if (queue_id >= MAX_RING_COUNT)
+ return -1;
+
+ uint16_t dev_id = dma_bind[vid2socketid[vid]].dmas[queue_id].dev_id;
+ unsigned short write = cb_tracker[dev_id].next_write;
if (!opaque_data) {
for (i_desc = 0; i_desc < count; i_desc++) {
src = descs[i_desc].src;
@@ -170,16 +189,16 @@ ioat_check_completed_copies_cb(int vid, uint16_t queue_id,
struct rte_vhost_async_status *opaque_data,
uint16_t max_packets)
{
- if (!opaque_data) {
+ if (!opaque_data && (queue_id < MAX_RING_COUNT)) {
uintptr_t dump[255];
int n_seg;
unsigned short read, write;
unsigned short nb_packet = 0;
unsigned short mask = MAX_ENQUEUED_SIZE - 1;
unsigned short i;
+ uint16_t dev_id;
- uint16_t dev_id = dma_bind[vid].dmas[queue_id * 2
- + VIRTIO_RXQ].dev_id;
+ dev_id = dma_bind[vid2socketid[vid]].dmas[queue_id].dev_id;
n_seg = rte_ioat_completed_ops(dev_id, 255, NULL, NULL, dump, dump);
if (n_seg < 0) {
RTE_LOG(ERR,
@@ -215,4 +234,18 @@ ioat_check_completed_copies_cb(int vid, uint16_t queue_id,
return -1;
}
+uint32_t get_async_flag_by_vid(int vid)
+{
+ return dma_bind[vid2socketid[vid]].async_flag;
+}
+
+uint32_t get_async_flag_by_socketid(int socketid)
+{
+ return dma_bind[socketid].async_flag;
+}
+
+void init_vid2socketid_array(int vid, int socketid)
+{
+ vid2socketid[vid] = socketid;
+}
#endif /* RTE_RAW_IOAT */
diff --git a/examples/vhost/ioat.h b/examples/vhost/ioat.h
index 1aa28ed6a3..51111d65af 100644
--- a/examples/vhost/ioat.h
+++ b/examples/vhost/ioat.h
@@ -12,6 +12,9 @@
#define MAX_VHOST_DEVICE 1024
#define IOAT_RING_SIZE 4096
#define MAX_ENQUEUED_SIZE 4096
+#define MAX_RING_COUNT 2
+#define ASYNC_RX_VHOST 1
+#define ASYNC_TX_VHOST 2
struct dma_info {
struct rte_pci_addr addr;
@@ -20,6 +23,7 @@ struct dma_info {
};
struct dma_for_vhost {
+ int async_flag;
struct dma_info dmas[RTE_MAX_QUEUES_PER_PORT * 2];
uint16_t nr;
};
@@ -36,6 +40,10 @@ uint32_t
ioat_check_completed_copies_cb(int vid, uint16_t queue_id,
struct rte_vhost_async_status *opaque_data,
uint16_t max_packets);
+
+uint32_t get_async_flag_by_vid(int vid);
+uint32_t get_async_flag_by_socketid(int socketid);
+void init_vid2socketid_array(int vid, int socketid);
#else
static int open_ioat(const char *value __rte_unused)
{
@@ -59,5 +67,22 @@ ioat_check_completed_copies_cb(int vid __rte_unused,
{
return -1;
}
+
+static uint32_t
+get_async_flag_by_vid(int vid __rte_unused)
+{
+ return 0;
+}
+
+static uint32_t
+get_async_flag_by_socketid(int socketid __rte_unused)
+{
+ return 0;
+}
+
+static void
+init_vid2socketid_array(int vid __rte_unused, int socketid __rte_unused)
+{
+}
#endif
#endif /* _IOAT_H_ */
diff --git a/examples/vhost/main.c b/examples/vhost/main.c
index aebdc3a566..81d7e4cbd3 100644
--- a/examples/vhost/main.c
+++ b/examples/vhost/main.c
@@ -93,8 +93,6 @@ static int client_mode;
static int builtin_net_driver;
-static int async_vhost_driver;
-
static char *dma_type;
/* Specify timeout (in useconds) between retries on RX. */
@@ -679,7 +677,6 @@ us_vhost_parse_args(int argc, char **argv)
us_vhost_usage(prgname);
return -1;
}
- async_vhost_driver = 1;
break;
case OPT_CLIENT_NUM:
@@ -897,7 +894,7 @@ drain_vhost(struct vhost_dev *vdev)
__ATOMIC_SEQ_CST);
}
- if (!async_vhost_driver)
+ if ((get_async_flag_by_vid(vdev->vid) & ASYNC_RX_VHOST) == 0)
free_pkts(m, nr_xmit);
}
@@ -1237,10 +1234,19 @@ drain_eth_rx(struct vhost_dev *vdev)
__ATOMIC_SEQ_CST);
}
- if (!async_vhost_driver)
+ if ((get_async_flag_by_vid(vdev->vid) & ASYNC_RX_VHOST) == 0)
free_pkts(pkts, rx_count);
}
+uint16_t async_dequeue_pkts(struct vhost_dev *dev, uint16_t queue_id,
+ struct rte_mempool *mbuf_pool,
+ struct rte_mbuf **pkts, uint16_t count)
+{
+ int nr_inflight;
+ return rte_vhost_async_try_dequeue_burst(dev->vid, queue_id,
+ mbuf_pool, pkts, count, &nr_inflight);
+}
+
uint16_t sync_dequeue_pkts(struct vhost_dev *dev, uint16_t queue_id,
struct rte_mempool *mbuf_pool,
struct rte_mbuf **pkts, uint16_t count)
@@ -1392,12 +1398,90 @@ destroy_device(int vid)
"(%d) device has been removed from data core\n",
vdev->vid);
- if (async_vhost_driver)
+ if (get_async_flag_by_vid(vid) & ASYNC_RX_VHOST)
rte_vhost_async_channel_unregister(vid, VIRTIO_RXQ);
+ if (get_async_flag_by_vid(vid) & ASYNC_TX_VHOST)
+ rte_vhost_async_channel_unregister(vid, VIRTIO_TXQ);
rte_free(vdev);
}
+static int
+get_socketid_by_vid(int vid)
+{
+ int i;
+ char ifname[PATH_MAX];
+ rte_vhost_get_ifname(vid, ifname, sizeof(ifname));
+
+ for (i = 0; i < nb_sockets; i++) {
+ char *file = socket_files + i * PATH_MAX;
+ if (strcmp(file, ifname) == 0)
+ return i;
+ }
+
+ return -1;
+}
+
+static int
+init_vhost_queue_ops(int vid)
+{
+ int socketid = get_socketid_by_vid(vid);
+ if (socketid == -1)
+ return -1;
+
+ init_vid2socketid_array(vid, socketid);
+ if (builtin_net_driver) {
+ vdev_queue_ops[vid].enqueue_pkt_burst = builtin_enqueue_pkts;
+ vdev_queue_ops[vid].dequeue_pkt_burst = builtin_dequeue_pkts;
+ } else {
+ if (get_async_flag_by_vid(vid) & ASYNC_RX_VHOST) {
+ vdev_queue_ops[vid].enqueue_pkt_burst =
+ async_enqueue_pkts;
+ } else {
+ vdev_queue_ops[vid].enqueue_pkt_burst =
+ sync_enqueue_pkts;
+ }
+
+ if (get_async_flag_by_vid(vid) & ASYNC_TX_VHOST) {
+ vdev_queue_ops[vid].dequeue_pkt_burst =
+ async_dequeue_pkts;
+ } else {
+ vdev_queue_ops[vid].dequeue_pkt_burst =
+ sync_dequeue_pkts;
+ }
+ }
+
+ return 0;
+}
+
+static int
+vhost_async_channel_register(int vid)
+{
+ int ret = 0;
+ struct rte_vhost_async_features f;
+ struct rte_vhost_async_channel_ops channel_ops;
+
+ if (dma_type != NULL && strncmp(dma_type, "ioat", 4) == 0) {
+ channel_ops.transfer_data = ioat_transfer_data_cb;
+ channel_ops.check_completed_copies =
+ ioat_check_completed_copies_cb;
+
+ f.async_inorder = 1;
+ f.async_threshold = 256;
+
+ if (get_async_flag_by_vid(vid) & ASYNC_RX_VHOST) {
+ ret |= rte_vhost_async_channel_register(vid, VIRTIO_RXQ,
+ f.intval, &channel_ops);
+ }
+ if (get_async_flag_by_vid(vid) & ASYNC_TX_VHOST) {
+ ret |= rte_vhost_async_channel_register(vid, VIRTIO_TXQ,
+ f.intval, &channel_ops);
+ }
+ }
+
+ return ret;
+}
+
/*
* A new device is added to a data core. First the device is added to the main linked list
* and then allocated to a specific data core.
@@ -1431,20 +1515,8 @@ new_device(int vid)
}
}
- if (builtin_net_driver) {
- vdev_queue_ops[vid].enqueue_pkt_burst = builtin_enqueue_pkts;
- vdev_queue_ops[vid].dequeue_pkt_burst = builtin_dequeue_pkts;
- } else {
- if (async_vhost_driver) {
- vdev_queue_ops[vid].enqueue_pkt_burst =
- async_enqueue_pkts;
- } else {
- vdev_queue_ops[vid].enqueue_pkt_burst =
- sync_enqueue_pkts;
- }
-
- vdev_queue_ops[vid].dequeue_pkt_burst = sync_dequeue_pkts;
- }
+ if (init_vhost_queue_ops(vid) != 0)
+ return -1;
if (builtin_net_driver)
vs_vhost_net_setup(vdev);
@@ -1473,28 +1545,13 @@ new_device(int vid)
rte_vhost_enable_guest_notification(vid, VIRTIO_RXQ, 0);
rte_vhost_enable_guest_notification(vid, VIRTIO_TXQ, 0);
+ int ret = vhost_async_channel_register(vid);
+
RTE_LOG(INFO, VHOST_DATA,
"(%d) device has been added to data core %d\n",
vid, vdev->coreid);
- if (async_vhost_driver) {
- struct rte_vhost_async_features f;
- struct rte_vhost_async_channel_ops channel_ops;
-
- if (dma_type != NULL && strncmp(dma_type, "ioat", 4) == 0) {
- channel_ops.transfer_data = ioat_transfer_data_cb;
- channel_ops.check_completed_copies =
- ioat_check_completed_copies_cb;
-
- f.async_inorder = 1;
- f.async_threshold = 256;
-
- return rte_vhost_async_channel_register(vid, VIRTIO_RXQ,
- f.intval, &channel_ops);
- }
- }
-
- return 0;
+ return ret;
}
/*
@@ -1735,10 +1792,11 @@ main(int argc, char *argv[])
for (i = 0; i < nb_sockets; i++) {
char *file = socket_files + i * PATH_MAX;
- if (async_vhost_driver)
- flags = flags | RTE_VHOST_USER_ASYNC_COPY;
+ uint64_t flag = flags;
+ if (get_async_flag_by_socketid(i) != 0)
+ flag |= RTE_VHOST_USER_ASYNC_COPY;
- ret = rte_vhost_driver_register(file, flags);
+ ret = rte_vhost_driver_register(file, flag);
if (ret != 0) {
unregister_drivers(i);
rte_exit(EXIT_FAILURE,
--
2.25.1
end of thread, other threads:[~2021-09-28 7:04 UTC | newest]
Thread overview: 28+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-09-06 20:48 [dpdk-dev] [PATCH 0/4] support async dequeue for split ring Wenwu Ma
2021-09-06 20:48 ` [dpdk-dev] [PATCH 1/4] vhost: " Wenwu Ma
2021-09-10 7:36 ` Yang, YvonneX
2021-09-15 2:51 ` Xia, Chenbo
[not found] ` <CO1PR11MB4897F3D5ABDE7133DB99791385DB9@CO1PR11MB4897.namprd11.prod.outlook.com>
2021-09-15 11:35 ` Xia, Chenbo
2021-09-06 20:48 ` [dpdk-dev] [PATCH 2/4] examples/vhost: refactor vhost enqueue and dequeue datapaths Wenwu Ma
2021-09-10 7:38 ` Yang, YvonneX
2021-09-15 3:02 ` Xia, Chenbo
2021-09-06 20:48 ` [dpdk-dev] [PATCH 3/4] examples/vhost: use a new API to query remaining ring space Wenwu Ma
2021-09-10 7:38 ` Yang, YvonneX
2021-09-15 3:04 ` Xia, Chenbo
2021-09-06 20:48 ` [dpdk-dev] [PATCH 4/4] examples/vhost: support vhost async dequeue data path Wenwu Ma
2021-09-10 7:39 ` Yang, YvonneX
2021-09-15 3:27 ` Xia, Chenbo
2021-09-10 7:33 ` [dpdk-dev] [PATCH 0/4] support async dequeue for split ring Yang, YvonneX
2021-09-17 19:26 ` [dpdk-dev] [PATCH v2 " Wenwu Ma
2021-09-17 19:27 ` [dpdk-dev] [PATCH v2 1/4] vhost: " Wenwu Ma
2021-09-27 6:33 ` Jiang, Cheng1
2021-09-17 19:27 ` [dpdk-dev] [PATCH v2 2/4] examples/vhost: refactor vhost enqueue and dequeue datapaths Wenwu Ma
2021-09-27 6:56 ` Jiang, Cheng1
2021-09-17 19:27 ` [dpdk-dev] [PATCH v2 3/4] examples/vhost: use a new API to query remaining ring space Wenwu Ma
2021-09-17 19:27 ` [dpdk-dev] [PATCH v2 4/4] examples/vhost: support vhost async dequeue data path Wenwu Ma
2021-09-28 18:56 ` [dpdk-dev] [PATCH v3 0/4] support async dequeue for split ring Wenwu Ma
2021-09-28 18:56 ` [dpdk-dev] [PATCH v3 1/4] vhost: " Wenwu Ma
2021-09-28 18:56 ` [dpdk-dev] [PATCH v3 2/4] examples/vhost: refactor vhost enqueue and dequeue datapaths Wenwu Ma
2021-09-28 18:56 ` [dpdk-dev] [PATCH v3 3/4] examples/vhost: use a new API to query remaining ring space Wenwu Ma
2021-09-28 18:56 ` [dpdk-dev] [PATCH v3 4/4] examples/vhost: support vhost async dequeue data path Wenwu Ma
-- strict thread matches above, loose matches on Subject: below --
2021-06-02 8:31 [dpdk-dev] [PATCH 0/1] lib/vhost: support async dequeue for split ring Yuan Wang
2021-06-23 15:00 ` [dpdk-dev] [PATCH v3 0/4] vhost: " Wenwu Ma
2021-06-23 15:00 ` [dpdk-dev] [PATCH v3 4/4] examples/vhost: support vhost async dequeue data path Wenwu Ma