DPDK patches and discussions
* [PATCH v1 0/5] vhost: support async dequeue data path
@ 2022-04-07 15:25 xuan.ding
  2022-04-07 15:25 ` [PATCH v1 1/5] vhost: prepare sync for descriptor to mbuf refactoring xuan.ding
                   ` (11 more replies)
  0 siblings, 12 replies; 73+ messages in thread
From: xuan.ding @ 2022-04-07 15:25 UTC (permalink / raw)
  To: maxime.coquelin, chenbo.xia
  Cc: dev, jiayu.hu, cheng1.jiang, sunil.pai.g, liangma, Xuan Ding

From: Xuan Ding <xuan.ding@intel.com>

The presence of an asynchronous path allows applications to offload
memory copies to a DMA engine, saving CPU cycles and improving copy
performance. This patch set implements the vhost async dequeue data
path for split ring.

This patch set is a new design and implementation of [2]. Since dmadev
was introduced in DPDK 21.11, to simplify application logic, this patch
set integrates dmadev into vhost. With dmadev integrated, vhost supports M:N
mapping between vrings and DMA virtual channels. Specifically, one vring
can use multiple different DMA channels and one DMA channel can be
shared by multiple vrings at the same time.
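
As a rough setup sketch (error handling omitted; dma_id and vid are
assumed to be obtained by the application, and the call sequence mirrors
what the example application in patch 5/5 does):

        struct rte_dma_conf dev_conf = { .nb_vchans = 1 };
        struct rte_dma_vchan_conf qconf = {
                .direction = RTE_DMA_DIR_MEM_TO_MEM,
                .nb_desc = 4096,        /* DMA_RING_SIZE in the example app */
        };

        /* Bring up one virtual channel on the DMA device. */
        rte_dma_configure(dma_id, &dev_conf);
        rte_dma_vchan_setup(dma_id, 0, &qconf);
        rte_dma_start(dma_id);

        /* Make vchan 0 of this DMA device known to vhost ... */
        rte_vhost_async_dma_configure(dma_id, 0);
        /* ... and enable the async path on the guest TX queue of "vid",
         * which is the queue the new dequeue API reads from. */
        rte_vhost_async_channel_register(vid, VIRTIO_TXQ);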

A new asynchronous dequeue function is introduced:
        1) rte_vhost_async_try_dequeue_burst(int vid, uint16_t queue_id,
                struct rte_mempool *mbuf_pool, struct rte_mbuf **pkts,
                uint16_t count, int *nr_inflight,
                uint16_t dma_id, uint16_t vchan_id)

        Receives packets from the guest and offloads the copies to a DMA
virtual channel.
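
A minimal polling sketch (mbuf_pool, dma_id, port_id and MAX_PKT_BURST
are assumed to be set up/defined by the application; VIRTIO_TXQ is the
guest TX queue the host dequeues from):

        struct rte_mbuf *pkts[MAX_PKT_BURST];
        int nr_inflight;
        uint16_t n, sent;

        /* Completed copies are returned; packets whose copies are still
         * in flight on the DMA channel are held back until a later call. */
        n = rte_vhost_async_try_dequeue_burst(vid, VIRTIO_TXQ, mbuf_pool,
                        pkts, MAX_PKT_BURST, &nr_inflight, dma_id, 0);
        if (n > 0) {
                sent = rte_eth_tx_burst(port_id, 0, pkts, n);
                if (sent < n)
                        rte_pktmbuf_free_bulk(&pkts[sent], n - sent);
        }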

[1] https://mails.dpdk.org/archives/dev/2022-February/234555.html
[2] https://mails.dpdk.org/archives/dev/2021-September/218591.html

RFC v3 -> v1:
* add sync and async path descriptor to mbuf refactoring
* add API description in docs

RFC v2 -> RFC v3:
* rebase to latest DPDK version

RFC v1 -> RFC v2:
* fix one bug in example
* rename vchan to vchan_id
* check if dma_id and vchan_id are valid
* rework all the logs to the new standard

Xuan Ding (5):
  vhost: prepare sync for descriptor to mbuf refactoring
  vhost: prepare async for descriptor to mbuf refactoring
  vhost: merge sync and async descriptor to mbuf filling
  vhost: support async dequeue for split ring
  examples/vhost: support async dequeue data path

 doc/guides/prog_guide/vhost_lib.rst    |   7 +
 doc/guides/rel_notes/release_22_07.rst |   4 +
 doc/guides/sample_app_ug/vhost.rst     |   9 +-
 examples/vhost/main.c                  | 292 +++++++++++-----
 examples/vhost/main.h                  |  35 +-
 examples/vhost/virtio_net.c            |  16 +-
 lib/vhost/rte_vhost_async.h            |  33 ++
 lib/vhost/version.map                  |   3 +
 lib/vhost/vhost.h                      |   1 +
 lib/vhost/virtio_net.c                 | 459 ++++++++++++++++++++++---
 10 files changed, 711 insertions(+), 148 deletions(-)

-- 
2.17.1


* [PATCH v1 1/5] vhost: prepare sync for descriptor to mbuf refactoring
  2022-04-07 15:25 [PATCH v1 0/5] vhost: support async dequeue data path xuan.ding
@ 2022-04-07 15:25 ` xuan.ding
  2022-04-07 15:25 ` [PATCH v1 2/5] vhost: prepare async " xuan.ding
                   ` (10 subsequent siblings)
  11 siblings, 0 replies; 73+ messages in thread
From: xuan.ding @ 2022-04-07 15:25 UTC (permalink / raw)
  To: maxime.coquelin, chenbo.xia
  Cc: dev, jiayu.hu, cheng1.jiang, sunil.pai.g, liangma, Xuan Ding

From: Xuan Ding <xuan.ding@intel.com>

This patch extracts the descriptor-to-buffer filling from
copy_desc_to_mbuf() into a dedicated function. Besides, the enqueue
and dequeue paths are refactored to use the same function
sync_fill_seg() for preparing batch elements, which simplifies
the code without performance degradation.

Signed-off-by: Xuan Ding <xuan.ding@intel.com>
---
 lib/vhost/virtio_net.c | 66 +++++++++++++++++++-----------------------
 1 file changed, 29 insertions(+), 37 deletions(-)

diff --git a/lib/vhost/virtio_net.c b/lib/vhost/virtio_net.c
index 5f432b0d77..a2d04a1f60 100644
--- a/lib/vhost/virtio_net.c
+++ b/lib/vhost/virtio_net.c
@@ -1030,9 +1030,9 @@ async_mbuf_to_desc_seg(struct virtio_net *dev, struct vhost_virtqueue *vq,
 }
 
 static __rte_always_inline void
-sync_mbuf_to_desc_seg(struct virtio_net *dev, struct vhost_virtqueue *vq,
+sync_fill_seg(struct virtio_net *dev, struct vhost_virtqueue *vq,
 		struct rte_mbuf *m, uint32_t mbuf_offset,
-		uint64_t buf_addr, uint64_t buf_iova, uint32_t cpy_len)
+		uint64_t buf_addr, uint64_t buf_iova, uint32_t cpy_len, bool to_desc)
 {
 	struct batch_copy_elem *batch_copy = vq->batch_copy_elems;
 
@@ -1043,10 +1043,17 @@ sync_mbuf_to_desc_seg(struct virtio_net *dev, struct vhost_virtqueue *vq,
 		vhost_log_cache_write_iova(dev, vq, buf_iova, cpy_len);
 		PRINT_PACKET(dev, (uintptr_t)(buf_addr), cpy_len, 0);
 	} else {
-		batch_copy[vq->batch_copy_nb_elems].dst =
-			(void *)((uintptr_t)(buf_addr));
-		batch_copy[vq->batch_copy_nb_elems].src =
-			rte_pktmbuf_mtod_offset(m, void *, mbuf_offset);
+		if (to_desc) {
+			batch_copy[vq->batch_copy_nb_elems].dst =
+				(void *)((uintptr_t)(buf_addr));
+			batch_copy[vq->batch_copy_nb_elems].src =
+				rte_pktmbuf_mtod_offset(m, void *, mbuf_offset);
+		} else {
+			batch_copy[vq->batch_copy_nb_elems].dst =
+				rte_pktmbuf_mtod_offset(m, void *, mbuf_offset);
+			batch_copy[vq->batch_copy_nb_elems].src =
+				(void *)((uintptr_t)(buf_addr));
+		}
 		batch_copy[vq->batch_copy_nb_elems].log_addr = buf_iova;
 		batch_copy[vq->batch_copy_nb_elems].len = cpy_len;
 		vq->batch_copy_nb_elems++;
@@ -1158,9 +1165,9 @@ mbuf_to_desc(struct virtio_net *dev, struct vhost_virtqueue *vq,
 						buf_iova + buf_offset, cpy_len) < 0)
 				goto error;
 		} else {
-			sync_mbuf_to_desc_seg(dev, vq, m, mbuf_offset,
+			sync_fill_seg(dev, vq, m, mbuf_offset,
 					buf_addr + buf_offset,
-					buf_iova + buf_offset, cpy_len);
+					buf_iova + buf_offset, cpy_len, true);
 		}
 
 		mbuf_avail  -= cpy_len;
@@ -2474,7 +2481,7 @@ copy_desc_to_mbuf(struct virtio_net *dev, struct vhost_virtqueue *vq,
 		  bool legacy_ol_flags)
 {
 	uint32_t buf_avail, buf_offset;
-	uint64_t buf_addr, buf_len;
+	uint64_t buf_addr, buf_iova, buf_len;
 	uint32_t mbuf_avail, mbuf_offset;
 	uint32_t cpy_len;
 	struct rte_mbuf *cur = m, *prev = m;
@@ -2482,16 +2489,13 @@ copy_desc_to_mbuf(struct virtio_net *dev, struct vhost_virtqueue *vq,
 	struct virtio_net_hdr *hdr = NULL;
 	/* A counter to avoid desc dead loop chain */
 	uint16_t vec_idx = 0;
-	struct batch_copy_elem *batch_copy = vq->batch_copy_elems;
-	int error = 0;
 
 	buf_addr = buf_vec[vec_idx].buf_addr;
+	buf_iova = buf_vec[vec_idx].buf_iova;
 	buf_len = buf_vec[vec_idx].buf_len;
 
-	if (unlikely(buf_len < dev->vhost_hlen && nr_vec <= 1)) {
-		error = -1;
-		goto out;
-	}
+	if (unlikely(buf_len < dev->vhost_hlen && nr_vec <= 1))
+		return -1;
 
 	if (virtio_net_with_host_offload(dev)) {
 		if (unlikely(buf_len < sizeof(struct virtio_net_hdr))) {
@@ -2515,11 +2519,12 @@ copy_desc_to_mbuf(struct virtio_net *dev, struct vhost_virtqueue *vq,
 		buf_offset = dev->vhost_hlen - buf_len;
 		vec_idx++;
 		buf_addr = buf_vec[vec_idx].buf_addr;
+		buf_iova = buf_vec[vec_idx].buf_iova;
 		buf_len = buf_vec[vec_idx].buf_len;
 		buf_avail  = buf_len - buf_offset;
 	} else if (buf_len == dev->vhost_hlen) {
 		if (unlikely(++vec_idx >= nr_vec))
-			goto out;
+			goto error;
 		buf_addr = buf_vec[vec_idx].buf_addr;
 		buf_len = buf_vec[vec_idx].buf_len;
 
@@ -2539,22 +2544,9 @@ copy_desc_to_mbuf(struct virtio_net *dev, struct vhost_virtqueue *vq,
 	while (1) {
 		cpy_len = RTE_MIN(buf_avail, mbuf_avail);
 
-		if (likely(cpy_len > MAX_BATCH_LEN ||
-					vq->batch_copy_nb_elems >= vq->size ||
-					(hdr && cur == m))) {
-			rte_memcpy(rte_pktmbuf_mtod_offset(cur, void *,
-						mbuf_offset),
-					(void *)((uintptr_t)(buf_addr +
-							buf_offset)), cpy_len);
-		} else {
-			batch_copy[vq->batch_copy_nb_elems].dst =
-				rte_pktmbuf_mtod_offset(cur, void *,
-						mbuf_offset);
-			batch_copy[vq->batch_copy_nb_elems].src =
-				(void *)((uintptr_t)(buf_addr + buf_offset));
-			batch_copy[vq->batch_copy_nb_elems].len = cpy_len;
-			vq->batch_copy_nb_elems++;
-		}
+		sync_fill_seg(dev, vq, m, mbuf_offset,
+			      buf_addr + buf_offset,
+			      buf_iova + buf_offset, cpy_len, true);
 
 		mbuf_avail  -= cpy_len;
 		mbuf_offset += cpy_len;
@@ -2567,6 +2559,7 @@ copy_desc_to_mbuf(struct virtio_net *dev, struct vhost_virtqueue *vq,
 				break;
 
 			buf_addr = buf_vec[vec_idx].buf_addr;
+			buf_iova = buf_vec[vec_idx].buf_iova;
 			buf_len = buf_vec[vec_idx].buf_len;
 
 			buf_offset = 0;
@@ -2585,8 +2578,7 @@ copy_desc_to_mbuf(struct virtio_net *dev, struct vhost_virtqueue *vq,
 			if (unlikely(cur == NULL)) {
 				VHOST_LOG_DATA(ERR, "(%s) failed to allocate memory for mbuf.\n",
 						dev->ifname);
-				error = -1;
-				goto out;
+				goto error;
 			}
 
 			prev->next = cur;
@@ -2606,9 +2598,9 @@ copy_desc_to_mbuf(struct virtio_net *dev, struct vhost_virtqueue *vq,
 	if (hdr)
 		vhost_dequeue_offload(dev, hdr, m, legacy_ol_flags);
 
-out:
-
-	return error;
+	return 0;
+error:
+	return -1;
 }
 
 static void
-- 
2.17.1


* [PATCH v1 2/5] vhost: prepare async for descriptor to mbuf refactoring
  2022-04-07 15:25 [PATCH v1 0/5] vhost: support async dequeue data path xuan.ding
  2022-04-07 15:25 ` [PATCH v1 1/5] vhost: prepare sync for descriptor to mbuf refactoring xuan.ding
@ 2022-04-07 15:25 ` xuan.ding
  2022-04-07 15:25 ` [PATCH v1 3/5] vhost: merge sync and async descriptor to mbuf filling xuan.ding
                   ` (9 subsequent siblings)
  11 siblings, 0 replies; 73+ messages in thread
From: xuan.ding @ 2022-04-07 15:25 UTC (permalink / raw)
  To: maxime.coquelin, chenbo.xia
  Cc: dev, jiayu.hu, cheng1.jiang, sunil.pai.g, liangma, Xuan Ding

From: Xuan Ding <xuan.ding@intel.com>

This patch refactors the vhost async enqueue and dequeue paths to use
the same function async_fill_seg() for preparing batch elements,
which simplifies the code without performance degradation.

Signed-off-by: Xuan Ding <xuan.ding@intel.com>
---
 lib/vhost/virtio_net.c | 23 +++++++++++++++--------
 1 file changed, 15 insertions(+), 8 deletions(-)

diff --git a/lib/vhost/virtio_net.c b/lib/vhost/virtio_net.c
index a2d04a1f60..709ff483a3 100644
--- a/lib/vhost/virtio_net.c
+++ b/lib/vhost/virtio_net.c
@@ -997,13 +997,14 @@ async_iter_reset(struct vhost_async *async)
 }
 
 static __rte_always_inline int
-async_mbuf_to_desc_seg(struct virtio_net *dev, struct vhost_virtqueue *vq,
+async_fill_seg(struct virtio_net *dev, struct vhost_virtqueue *vq,
 		struct rte_mbuf *m, uint32_t mbuf_offset,
-		uint64_t buf_iova, uint32_t cpy_len)
+		uint64_t buf_iova, uint32_t cpy_len, bool to_desc)
 {
 	struct vhost_async *async = vq->async;
 	uint64_t mapped_len;
 	uint32_t buf_offset = 0;
+	void *src, *dst;
 	void *host_iova;
 
 	while (cpy_len) {
@@ -1015,10 +1016,16 @@ async_mbuf_to_desc_seg(struct virtio_net *dev, struct vhost_virtqueue *vq,
 			return -1;
 		}
 
-		if (unlikely(async_iter_add_iovec(dev, async,
-						(void *)(uintptr_t)rte_pktmbuf_iova_offset(m,
-							mbuf_offset),
-						host_iova, (size_t)mapped_len)))
+		if (to_desc) {
+			src = (void *)(uintptr_t)rte_pktmbuf_iova_offset(m, mbuf_offset);
+			dst = host_iova;
+		} else {
+			src = host_iova;
+			dst = (void *)(uintptr_t)rte_pktmbuf_iova_offset(m, mbuf_offset);
+		}
+
+		if (unlikely(async_iter_add_iovec(dev, async, src, dst,
+						 (size_t)mapped_len)))
 			return -1;
 
 		cpy_len -= (uint32_t)mapped_len;
@@ -1161,8 +1168,8 @@ mbuf_to_desc(struct virtio_net *dev, struct vhost_virtqueue *vq,
 		cpy_len = RTE_MIN(buf_avail, mbuf_avail);
 
 		if (is_async) {
-			if (async_mbuf_to_desc_seg(dev, vq, m, mbuf_offset,
-						buf_iova + buf_offset, cpy_len) < 0)
+			if (async_fill_seg(dev, vq, m, mbuf_offset,
+						buf_iova + buf_offset, cpy_len, true) < 0)
 				goto error;
 		} else {
 			sync_fill_seg(dev, vq, m, mbuf_offset,
-- 
2.17.1


* [PATCH v1 3/5] vhost: merge sync and async descriptor to mbuf filling
  2022-04-07 15:25 [PATCH v1 0/5] vhost: support async dequeue data path xuan.ding
  2022-04-07 15:25 ` [PATCH v1 1/5] vhost: prepare sync for descriptor to mbuf refactoring xuan.ding
  2022-04-07 15:25 ` [PATCH v1 2/5] vhost: prepare async " xuan.ding
@ 2022-04-07 15:25 ` xuan.ding
  2022-04-07 15:25 ` [PATCH v1 4/5] vhost: support async dequeue for split ring xuan.ding
                   ` (8 subsequent siblings)
  11 siblings, 0 replies; 73+ messages in thread
From: xuan.ding @ 2022-04-07 15:25 UTC (permalink / raw)
  To: maxime.coquelin, chenbo.xia
  Cc: dev, jiayu.hu, cheng1.jiang, sunil.pai.g, liangma, Xuan Ding

From: Xuan Ding <xuan.ding@intel.com>

This patch refactors copy_desc_to_mbuf(), used by the sync
path, to support both sync and async descriptor-to-mbuf filling.

Signed-off-by: Xuan Ding <xuan.ding@intel.com>
---
 lib/vhost/vhost.h      |  1 +
 lib/vhost/virtio_net.c | 47 ++++++++++++++++++++++++++++++++----------
 2 files changed, 37 insertions(+), 11 deletions(-)

diff --git a/lib/vhost/vhost.h b/lib/vhost/vhost.h
index a9edc271aa..9209558465 100644
--- a/lib/vhost/vhost.h
+++ b/lib/vhost/vhost.h
@@ -177,6 +177,7 @@ extern struct async_dma_info dma_copy_track[RTE_DMADEV_DEFAULT_MAX];
  * inflight async packet information
  */
 struct async_inflight_info {
+	struct virtio_net_hdr nethdr;
 	struct rte_mbuf *mbuf;
 	uint16_t descs; /* num of descs inflight */
 	uint16_t nr_buffers; /* num of buffers inflight for packed ring */
diff --git a/lib/vhost/virtio_net.c b/lib/vhost/virtio_net.c
index 709ff483a3..382e953c2d 100644
--- a/lib/vhost/virtio_net.c
+++ b/lib/vhost/virtio_net.c
@@ -2482,10 +2482,10 @@ copy_vnet_hdr_from_desc(struct virtio_net_hdr *hdr,
 }
 
 static __rte_always_inline int
-copy_desc_to_mbuf(struct virtio_net *dev, struct vhost_virtqueue *vq,
+desc_to_mbuf(struct virtio_net *dev, struct vhost_virtqueue *vq,
 		  struct buf_vector *buf_vec, uint16_t nr_vec,
 		  struct rte_mbuf *m, struct rte_mempool *mbuf_pool,
-		  bool legacy_ol_flags)
+		  bool legacy_ol_flags, uint16_t slot_idx, bool is_async)
 {
 	uint32_t buf_avail, buf_offset;
 	uint64_t buf_addr, buf_iova, buf_len;
@@ -2496,6 +2496,8 @@ copy_desc_to_mbuf(struct virtio_net *dev, struct vhost_virtqueue *vq,
 	struct virtio_net_hdr *hdr = NULL;
 	/* A counter to avoid desc dead loop chain */
 	uint16_t vec_idx = 0;
+	struct vhost_async *async = vq->async;
+	struct async_inflight_info *pkts_info;
 
 	buf_addr = buf_vec[vec_idx].buf_addr;
 	buf_iova = buf_vec[vec_idx].buf_iova;
@@ -2548,12 +2550,25 @@ copy_desc_to_mbuf(struct virtio_net *dev, struct vhost_virtqueue *vq,
 
 	mbuf_offset = 0;
 	mbuf_avail  = m->buf_len - RTE_PKTMBUF_HEADROOM;
+
+	if (is_async) {
+		pkts_info = async->pkts_info;
+		if (async_iter_initialize(dev, async))
+			return -1;
+	}
+
 	while (1) {
 		cpy_len = RTE_MIN(buf_avail, mbuf_avail);
 
-		sync_fill_seg(dev, vq, m, mbuf_offset,
-			      buf_addr + buf_offset,
-			      buf_iova + buf_offset, cpy_len, true);
+		if (is_async) {
+			if (async_fill_seg(dev, vq, m, mbuf_offset,
+						buf_iova + buf_offset, cpy_len, false) < 0)
+				goto error;
+		} else {
+			sync_fill_seg(dev, vq, m, mbuf_offset,
+				buf_addr + buf_offset,
+				buf_iova + buf_offset, cpy_len, true);
+		}
 
 		mbuf_avail  -= cpy_len;
 		mbuf_offset += cpy_len;
@@ -2602,11 +2617,20 @@ copy_desc_to_mbuf(struct virtio_net *dev, struct vhost_virtqueue *vq,
 	prev->data_len = mbuf_offset;
 	m->pkt_len    += mbuf_offset;
 
-	if (hdr)
-		vhost_dequeue_offload(dev, hdr, m, legacy_ol_flags);
+	if (hdr) {
+		if (is_async) {
+			async_iter_finalize(async);
+			pkts_info[slot_idx].nethdr = *hdr;
+		} else {
+			vhost_dequeue_offload(dev, hdr, m, legacy_ol_flags);
+		}
+	}
 
 	return 0;
 error:
+	if (is_async)
+		async_iter_cancel(async);
+
 	return -1;
 }
 
@@ -2738,8 +2762,8 @@ virtio_dev_tx_split(struct virtio_net *dev, struct vhost_virtqueue *vq,
 			break;
 		}
 
-		err = copy_desc_to_mbuf(dev, vq, buf_vec, nr_vec, pkts[i],
-				mbuf_pool, legacy_ol_flags);
+		err = desc_to_mbuf(dev, vq, buf_vec, nr_vec, pkts[i],
+				mbuf_pool, legacy_ol_flags, 0, false);
 		if (unlikely(err)) {
 			if (!allocerr_warned) {
 				VHOST_LOG_DATA(ERR, "(%s) failed to copy desc to mbuf.\n",
@@ -2750,6 +2774,7 @@ virtio_dev_tx_split(struct virtio_net *dev, struct vhost_virtqueue *vq,
 			i++;
 			break;
 		}
+
 	}
 
 	if (dropped)
@@ -2931,8 +2956,8 @@ vhost_dequeue_single_packed(struct virtio_net *dev,
 		return -1;
 	}
 
-	err = copy_desc_to_mbuf(dev, vq, buf_vec, nr_vec, pkts,
-				mbuf_pool, legacy_ol_flags);
+	err = desc_to_mbuf(dev, vq, buf_vec, nr_vec, pkts,
+				mbuf_pool, legacy_ol_flags, 0, false);
 	if (unlikely(err)) {
 		if (!allocerr_warned) {
 			VHOST_LOG_DATA(ERR, "(%s) failed to copy desc to mbuf.\n",
-- 
2.17.1


* [PATCH v1 4/5] vhost: support async dequeue for split ring
  2022-04-07 15:25 [PATCH v1 0/5] vhost: support async dequeue data path xuan.ding
                   ` (2 preceding siblings ...)
  2022-04-07 15:25 ` [PATCH v1 3/5] vhost: merge sync and async descriptor to mbuf filling xuan.ding
@ 2022-04-07 15:25 ` xuan.ding
  2022-04-07 15:25 ` [PATCH v1 5/5] examples/vhost: support async dequeue data path xuan.ding
                   ` (7 subsequent siblings)
  11 siblings, 0 replies; 73+ messages in thread
From: xuan.ding @ 2022-04-07 15:25 UTC (permalink / raw)
  To: maxime.coquelin, chenbo.xia
  Cc: dev, jiayu.hu, cheng1.jiang, sunil.pai.g, liangma, Xuan Ding, Yuan Wang

From: Xuan Ding <xuan.ding@intel.com>

This patch implements the asynchronous dequeue data path for vhost split
ring. A new API, rte_vhost_async_try_dequeue_burst(), is introduced.

Signed-off-by: Xuan Ding <xuan.ding@intel.com>
Signed-off-by: Yuan Wang <yuanx.wang@intel.com>
---
 doc/guides/prog_guide/vhost_lib.rst    |   7 +
 doc/guides/rel_notes/release_22_07.rst |   4 +
 lib/vhost/rte_vhost_async.h            |  33 +++
 lib/vhost/version.map                  |   3 +
 lib/vhost/virtio_net.c                 | 335 +++++++++++++++++++++++++
 5 files changed, 382 insertions(+)

diff --git a/doc/guides/prog_guide/vhost_lib.rst b/doc/guides/prog_guide/vhost_lib.rst
index 886f8f5e72..40cf315170 100644
--- a/doc/guides/prog_guide/vhost_lib.rst
+++ b/doc/guides/prog_guide/vhost_lib.rst
@@ -276,6 +276,13 @@ The following is an overview of some key Vhost API functions:
   Clear inflight packets which are submitted to DMA engine in vhost async data
   path. Completed packets are returned to applications through ``pkts``.
 
+* ``rte_vhost_async_try_dequeue_burst(int vid, uint16_t queue_id,
+  struct rte_mempool *mbuf_pool, struct rte_mbuf **pkts, uint16_t count,
+	int *nr_inflight, uint16_t dma_id, uint16_t vchan_id)``
+
+  Receives (dequeues) ``count`` packets from guest to host in the async data
+  path, and stores them at ``pkts``.
+
 Vhost-user Implementations
 --------------------------
 
diff --git a/doc/guides/rel_notes/release_22_07.rst b/doc/guides/rel_notes/release_22_07.rst
index 42a5f2d990..422a6673cb 100644
--- a/doc/guides/rel_notes/release_22_07.rst
+++ b/doc/guides/rel_notes/release_22_07.rst
@@ -55,6 +55,10 @@ New Features
      Also, make sure to start the actual text at the margin.
      =======================================================
 
+* **Added vhost async dequeue API to receive pkts from guest.**
+
+  Added vhost async dequeue API which can leverage DMA devices to accelerate
+  receiving pkts from guest.
 
 Removed Items
 -------------
diff --git a/lib/vhost/rte_vhost_async.h b/lib/vhost/rte_vhost_async.h
index f1293c6a9d..23fe1a7316 100644
--- a/lib/vhost/rte_vhost_async.h
+++ b/lib/vhost/rte_vhost_async.h
@@ -187,6 +187,39 @@ uint16_t rte_vhost_clear_queue_thread_unsafe(int vid, uint16_t queue_id,
 __rte_experimental
 int rte_vhost_async_dma_configure(int16_t dma_id, uint16_t vchan_id);
 
+/**
+ * This function tries to receive packets from the guest with offloading
+ * copies to the async channel. The packets whose copies are completed
+ * are returned in "pkts". The packets whose copies have been submitted to
+ * the async channel but are not yet completed are called "in-flight packets".
+ * This function will not return in-flight packets until their copies are
+ * completed by the async channel.
+ *
+ * @param vid
+ *  ID of vhost device to dequeue data
+ * @param queue_id
+ *  ID of virtqueue to dequeue data
+ * @param mbuf_pool
+ *  Mbuf_pool where host mbuf is allocated
+ * @param pkts
+ *  Blank array to keep successfully dequeued packets
+ * @param count
+ *  Size of the packet array
+ * @param nr_inflight
+ *  The number of in-flight packets. If an error occurred, its value is set to -1.
+ * @param dma_id
+ *  The identifier of DMA device
+ * @param vchan_id
+ *  The identifier of virtual DMA channel
+ * @return
+ *  Number of successfully dequeued packets
+ */
+__rte_experimental
+uint16_t
+rte_vhost_async_try_dequeue_burst(int vid, uint16_t queue_id,
+	struct rte_mempool *mbuf_pool, struct rte_mbuf **pkts, uint16_t count,
+	int *nr_inflight, uint16_t dma_id, uint16_t vchan_id);
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/vhost/version.map b/lib/vhost/version.map
index 0a66c5840c..514e3ff6a6 100644
--- a/lib/vhost/version.map
+++ b/lib/vhost/version.map
@@ -87,6 +87,9 @@ EXPERIMENTAL {
 
 	# added in 22.03
 	rte_vhost_async_dma_configure;
+
+	# added in 22.07
+	rte_vhost_async_try_dequeue_burst;
 };
 
 INTERNAL {
diff --git a/lib/vhost/virtio_net.c b/lib/vhost/virtio_net.c
index 382e953c2d..3085905d17 100644
--- a/lib/vhost/virtio_net.c
+++ b/lib/vhost/virtio_net.c
@@ -3165,3 +3165,338 @@ rte_vhost_dequeue_burst(int vid, uint16_t queue_id,
 
 	return count;
 }
+
+static __rte_always_inline uint16_t
+async_poll_dequeue_completed_split(struct virtio_net *dev, uint16_t queue_id,
+		struct rte_mbuf **pkts, uint16_t count, uint16_t dma_id,
+		uint16_t vchan_id, bool legacy_ol_flags)
+{
+	uint16_t start_idx, from, i;
+	uint16_t nr_cpl_pkts = 0;
+	struct async_inflight_info *pkts_info;
+	struct vhost_virtqueue *vq = dev->virtqueue[queue_id];
+
+	pkts_info = vq->async->pkts_info;
+
+	vhost_async_dma_check_completed(dev, dma_id, vchan_id, VHOST_DMA_MAX_COPY_COMPLETE);
+
+	start_idx = async_get_first_inflight_pkt_idx(vq);
+
+	from = start_idx;
+	while (vq->async->pkts_cmpl_flag[from] && count--) {
+		vq->async->pkts_cmpl_flag[from] = false;
+		from = (from + 1) & (vq->size - 1);
+		nr_cpl_pkts++;
+	}
+
+	if (nr_cpl_pkts == 0)
+		return 0;
+
+	for (i = 0; i < nr_cpl_pkts; i++) {
+		from = (start_idx + i) & (vq->size - 1);
+		pkts[i] = pkts_info[from].mbuf;
+
+		if (virtio_net_with_host_offload(dev))
+			vhost_dequeue_offload(dev, &pkts_info[from].nethdr, pkts[i],
+					      legacy_ol_flags);
+	}
+
+	/* write back completed descs to used ring and update used idx */
+	write_back_completed_descs_split(vq, nr_cpl_pkts);
+	__atomic_add_fetch(&vq->used->idx, nr_cpl_pkts, __ATOMIC_RELEASE);
+	vhost_vring_call_split(dev, vq);
+
+	vq->async->pkts_inflight_n -= nr_cpl_pkts;
+
+	return nr_cpl_pkts;
+}
+
+static __rte_always_inline uint16_t
+virtio_dev_tx_async_split(struct virtio_net *dev, struct vhost_virtqueue *vq,
+		uint16_t queue_id, struct rte_mempool *mbuf_pool, struct rte_mbuf **pkts,
+		uint16_t count, uint16_t dma_id, uint16_t vchan_id, bool legacy_ol_flags)
+{
+	static bool allocerr_warned;
+	bool dropped = false;
+	uint16_t free_entries;
+	uint16_t pkt_idx, slot_idx = 0;
+	uint16_t nr_done_pkts = 0;
+	uint16_t pkt_err = 0;
+	uint16_t n_xfer;
+	struct vhost_async *async = vq->async;
+	struct async_inflight_info *pkts_info = async->pkts_info;
+	struct rte_mbuf *pkts_prealloc[MAX_PKT_BURST];
+	uint16_t pkts_size = count;
+
+	/**
+	 * The ordering between avail index and
+	 * desc reads needs to be enforced.
+	 */
+	free_entries = __atomic_load_n(&vq->avail->idx, __ATOMIC_ACQUIRE) - vq->last_avail_idx;
+	if (free_entries == 0)
+		goto out;
+
+	rte_prefetch0(&vq->avail->ring[vq->last_avail_idx & (vq->size - 1)]);
+
+	async_iter_reset(async);
+
+	count = RTE_MIN(count, MAX_PKT_BURST);
+	count = RTE_MIN(count, free_entries);
+	VHOST_LOG_DATA(DEBUG, "(%s) about to dequeue %u buffers\n", dev->ifname, count);
+
+	if (rte_pktmbuf_alloc_bulk(mbuf_pool, pkts_prealloc, count))
+		goto out;
+
+	for (pkt_idx = 0; pkt_idx < count; pkt_idx++) {
+		uint16_t head_idx = 0;
+		uint16_t nr_vec = 0;
+		uint16_t to;
+		uint32_t buf_len;
+		int err;
+		struct buf_vector buf_vec[BUF_VECTOR_MAX];
+		struct rte_mbuf *pkt = pkts_prealloc[pkt_idx];
+
+		if (unlikely(fill_vec_buf_split(dev, vq, vq->last_avail_idx,
+						&nr_vec, buf_vec,
+						&head_idx, &buf_len,
+						VHOST_ACCESS_RO) < 0)) {
+			dropped = true;
+			break;
+		}
+
+		err = virtio_dev_pktmbuf_prep(dev, pkt, buf_len);
+		if (unlikely(err)) {
+			/**
+			 * mbuf allocation fails for jumbo packets when external
+			 * buffer allocation is not allowed and linear buffer
+			 * is required. Drop this packet.
+			 */
+			if (!allocerr_warned) {
+				VHOST_LOG_DATA(ERR,
+					"(%s) %s: Failed mbuf alloc of size %d from %s\n",
+					dev->ifname, __func__, buf_len, mbuf_pool->name);
+				allocerr_warned = true;
+			}
+			dropped = true;
+			break;
+		}
+
+		slot_idx = (async->pkts_idx + pkt_idx) & (vq->size - 1);
+		err = desc_to_mbuf(dev, vq, buf_vec, nr_vec, pkt, mbuf_pool,
+					legacy_ol_flags, slot_idx, true);
+		if (unlikely(err)) {
+			if (!allocerr_warned) {
+				VHOST_LOG_DATA(ERR,
+					"(%s) %s: Failed to offload copies to async channel.\n",
+					dev->ifname, __func__);
+				allocerr_warned = true;
+			}
+			dropped = true;
+			break;
+		}
+
+		pkts_info[slot_idx].mbuf = pkt;
+
+		/* store used descs */
+		to = async->desc_idx_split & (vq->size - 1);
+		async->descs_split[to].id = head_idx;
+		async->descs_split[to].len = 0;
+		async->desc_idx_split++;
+
+		vq->last_avail_idx++;
+	}
+
+	if (unlikely(dropped))
+		rte_pktmbuf_free_bulk(&pkts_prealloc[pkt_idx], count - pkt_idx);
+
+	n_xfer = vhost_async_dma_transfer(dev, vq, dma_id, vchan_id, async->pkts_idx,
+					  async->iov_iter, pkt_idx);
+
+	async->pkts_inflight_n += n_xfer;
+
+	pkt_err = pkt_idx - n_xfer;
+	if (unlikely(pkt_err)) {
+		VHOST_LOG_DATA(DEBUG,
+			"(%s) %s: failed to transfer data for queue id %d.\n",
+			dev->ifname, __func__, queue_id);
+
+		pkt_idx = n_xfer;
+		/* recover available ring */
+		vq->last_avail_idx -= pkt_err;
+
+		/**
+		 * recover async channel copy related structures and free pktmbufs
+		 * for error pkts.
+		 */
+		async->desc_idx_split -= pkt_err;
+		while (pkt_err-- > 0) {
+			rte_pktmbuf_free(pkts_info[slot_idx & (vq->size - 1)].mbuf);
+			slot_idx--;
+		}
+	}
+
+	async->pkts_idx += pkt_idx;
+	if (async->pkts_idx >= vq->size)
+		async->pkts_idx -= vq->size;
+
+out:
+	/* DMA device may serve other queues, unconditionally check completed. */
+	nr_done_pkts = async_poll_dequeue_completed_split(dev, queue_id, pkts, pkts_size,
+							  dma_id, vchan_id, legacy_ol_flags);
+
+	return nr_done_pkts;
+}
+
+__rte_noinline
+static uint16_t
+virtio_dev_tx_async_split_legacy(struct virtio_net *dev,
+		struct vhost_virtqueue *vq, uint16_t queue_id,
+		struct rte_mempool *mbuf_pool, struct rte_mbuf **pkts,
+		uint16_t count, uint16_t dma_id, uint16_t vchan_id)
+{
+	return virtio_dev_tx_async_split(dev, vq, queue_id, mbuf_pool,
+				pkts, count, dma_id, vchan_id, true);
+}
+
+__rte_noinline
+static uint16_t
+virtio_dev_tx_async_split_compliant(struct virtio_net *dev,
+		struct vhost_virtqueue *vq, uint16_t queue_id,
+		struct rte_mempool *mbuf_pool, struct rte_mbuf **pkts,
+		uint16_t count, uint16_t dma_id, uint16_t vchan_id)
+{
+	return virtio_dev_tx_async_split(dev, vq, queue_id, mbuf_pool,
+				pkts, count, dma_id, vchan_id, false);
+}
+
+uint16_t
+rte_vhost_async_try_dequeue_burst(int vid, uint16_t queue_id,
+	struct rte_mempool *mbuf_pool, struct rte_mbuf **pkts, uint16_t count,
+	int *nr_inflight, uint16_t dma_id, uint16_t vchan_id)
+{
+	struct virtio_net *dev;
+	struct rte_mbuf *rarp_mbuf = NULL;
+	struct vhost_virtqueue *vq;
+	int16_t success = 1;
+
+	*nr_inflight = -1;
+
+	dev = get_device(vid);
+	if (!dev)
+		return 0;
+
+	if (unlikely(!(dev->flags & VIRTIO_DEV_BUILTIN_VIRTIO_NET))) {
+		VHOST_LOG_DATA(ERR,
+			"(%s) %s: built-in vhost net backend is disabled.\n",
+			dev->ifname, __func__);
+		return 0;
+	}
+
+	if (unlikely(!is_valid_virt_queue_idx(queue_id, 1, dev->nr_vring))) {
+		VHOST_LOG_DATA(ERR,
+			"(%s) %s: invalid virtqueue idx %d.\n",
+			dev->ifname, __func__, queue_id);
+		return 0;
+	}
+
+	if (unlikely(!dma_copy_track[dma_id].vchans ||
+				!dma_copy_track[dma_id].vchans[vchan_id].pkts_cmpl_flag_addr)) {
+		VHOST_LOG_DATA(ERR, "(%s) %s: invalid channel %d:%u.\n", dev->ifname, __func__,
+			       dma_id, vchan_id);
+		return 0;
+	}
+
+	vq = dev->virtqueue[queue_id];
+
+	if (unlikely(rte_spinlock_trylock(&vq->access_lock) == 0))
+		return 0;
+
+	if (unlikely(vq->enabled == 0)) {
+		count = 0;
+		goto out_access_unlock;
+	}
+
+	if (unlikely(!vq->async)) {
+		VHOST_LOG_DATA(ERR, "(%s) %s: async not registered for queue id %d.\n",
+			dev->ifname, __func__, queue_id);
+		count = 0;
+		goto out_access_unlock;
+	}
+
+	if (dev->features & (1ULL << VIRTIO_F_IOMMU_PLATFORM))
+		vhost_user_iotlb_rd_lock(vq);
+
+	if (unlikely(vq->access_ok == 0))
+		if (unlikely(vring_translate(dev, vq) < 0)) {
+			count = 0;
+			goto out;
+		}
+
+	/*
+	 * array, so it looks like the guest actually sent such a packet.
+	 * array, to looks like that guest actually send such packet.
+	 *
+	 * Check user_send_rarp() for more information.
+	 *
+	 * broadcast_rarp shares a cacheline in the virtio_net structure
+	 * with some fields that are accessed during enqueue and
+	 * __atomic_compare_exchange_n causes a write if performed compare
+	 * and exchange. This could result in false sharing between enqueue
+	 * and dequeue.
+	 *
+	 * Prevent unnecessary false sharing by reading broadcast_rarp first
+	 * and only performing compare and exchange if the read indicates it
+	 * is likely to be set.
+	 */
+	if (unlikely(__atomic_load_n(&dev->broadcast_rarp, __ATOMIC_ACQUIRE) &&
+			__atomic_compare_exchange_n(&dev->broadcast_rarp,
+			&success, 0, 0, __ATOMIC_RELEASE, __ATOMIC_RELAXED))) {
+
+		rarp_mbuf = rte_net_make_rarp_packet(mbuf_pool, &dev->mac);
+		if (rarp_mbuf == NULL) {
+			VHOST_LOG_DATA(ERR, "Failed to make RARP packet.\n");
+			count = 0;
+			goto out;
+		}
+		/*
+		 * Inject it to the head of "pkts" array, so that switch's mac
+		 * learning table will get updated first.
+		 */
+		pkts[0] = rarp_mbuf;
+		pkts++;
+		count -= 1;
+	}
+
+	if (unlikely(vq_is_packed(dev))) {
+		static bool not_support_pack_log;
+		if (!not_support_pack_log) {
+			VHOST_LOG_DATA(ERR,
+				"(%s) %s: async dequeue does not support packed ring.\n",
+				dev->ifname, __func__);
+			not_support_pack_log = true;
+		}
+		count = 0;
+		goto out;
+	}
+
+	if (dev->flags & VIRTIO_DEV_LEGACY_OL_FLAGS)
+		count = virtio_dev_tx_async_split_legacy(dev, vq, queue_id,
+				mbuf_pool, pkts, count, dma_id, vchan_id);
+	else
+		count = virtio_dev_tx_async_split_compliant(dev, vq, queue_id,
+				mbuf_pool, pkts, count, dma_id, vchan_id);
+
+	*nr_inflight = vq->async->pkts_inflight_n;
+
+out:
+	if (dev->features & (1ULL << VIRTIO_F_IOMMU_PLATFORM))
+		vhost_user_iotlb_rd_unlock(vq);
+
+out_access_unlock:
+	rte_spinlock_unlock(&vq->access_lock);
+
+	if (unlikely(rarp_mbuf != NULL))
+		count += 1;
+
+	return count;
+}
-- 
2.17.1


* [PATCH v1 5/5] examples/vhost: support async dequeue data path
  2022-04-07 15:25 [PATCH v1 0/5] vhost: support async dequeue data path xuan.ding
                   ` (3 preceding siblings ...)
  2022-04-07 15:25 ` [PATCH v1 4/5] vhost: support async dequeue for split ring xuan.ding
@ 2022-04-07 15:25 ` xuan.ding
  2022-04-11 10:00 ` [PATCH v2 0/5] vhost: " xuan.ding
                   ` (6 subsequent siblings)
  11 siblings, 0 replies; 73+ messages in thread
From: xuan.ding @ 2022-04-07 15:25 UTC (permalink / raw)
  To: maxime.coquelin, chenbo.xia
  Cc: dev, jiayu.hu, cheng1.jiang, sunil.pai.g, liangma, Xuan Ding,
	Wenwu Ma, Yuan Wang

From: Xuan Ding <xuan.ding@intel.com>

This patch adds a use case for the async dequeue API. The vswitch can
leverage DMA devices to accelerate the vhost async dequeue path.
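
Before a vhost device or queue is torn down, the in-flight dequeue
copies have to be drained; a sketch of that step, following the
clear-queue logic this patch adds to main.c (pkts_inflight is the
application's counter kept from nr_inflight; vid, dma_id and
MAX_PKT_BURST as elsewhere in the example; the loop busy-waits until
the DMA engine finishes the outstanding copies):

        struct rte_mbuf *cpl[MAX_PKT_BURST];
        uint16_t n;

        while (pkts_inflight) {
                n = rte_vhost_clear_queue_thread_unsafe(vid, VIRTIO_TXQ, cpl,
                                RTE_MIN(pkts_inflight, MAX_PKT_BURST), dma_id, 0);
                rte_pktmbuf_free_bulk(cpl, n);
                pkts_inflight -= n;
        }
        rte_vhost_async_channel_unregister(vid, VIRTIO_TXQ);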

Signed-off-by: Wenwu Ma <wenwux.ma@intel.com>
Signed-off-by: Yuan Wang <yuanx.wang@intel.com>
Signed-off-by: Xuan Ding <xuan.ding@intel.com>
---
 doc/guides/sample_app_ug/vhost.rst |   9 +-
 examples/vhost/main.c              | 292 ++++++++++++++++++++---------
 examples/vhost/main.h              |  35 +++-
 examples/vhost/virtio_net.c        |  16 +-
 4 files changed, 254 insertions(+), 98 deletions(-)

diff --git a/doc/guides/sample_app_ug/vhost.rst b/doc/guides/sample_app_ug/vhost.rst
index a6ce4bc8ac..09db965e70 100644
--- a/doc/guides/sample_app_ug/vhost.rst
+++ b/doc/guides/sample_app_ug/vhost.rst
@@ -169,9 +169,12 @@ demonstrates how to use the async vhost APIs. It's used in combination with dmas
 **--dmas**
 This parameter is used to specify the assigned DMA device of a vhost device.
 Async vhost-user net driver will be used if --dmas is set. For example
---dmas [txd0@00:04.0,txd1@00:04.1] means use DMA channel 00:04.0 for vhost
-device 0 enqueue operation and use DMA channel 00:04.1 for vhost device 1
-enqueue operation.
+--dmas [txd0@00:04.0,txd1@00:04.1,rxd0@00:04.2,rxd1@00:04.3] means use
+DMA channel 00:04.0/00:04.2 for vhost device 0 enqueue/dequeue operation
+and use DMA channel 00:04.1/00:04.3 for vhost device 1 enqueue/dequeue
+operation. The index of the device corresponds to the socket file in order,
+that means vhost device 0 is created through the first socket file, vhost
+device 1 is created through the second socket file, and so on.
 
 Common Issues
 -------------
diff --git a/examples/vhost/main.c b/examples/vhost/main.c
index d94fabb060..d26e40ab73 100644
--- a/examples/vhost/main.c
+++ b/examples/vhost/main.c
@@ -63,6 +63,9 @@
 
 #define DMA_RING_SIZE 4096
 
+#define ASYNC_ENQUEUE_VHOST 1
+#define ASYNC_DEQUEUE_VHOST 2
+
 /* number of mbufs in all pools - if specified on command-line. */
 static int total_num_mbufs = NUM_MBUFS_DEFAULT;
 
@@ -116,6 +119,8 @@ static uint32_t burst_rx_retry_num = BURST_RX_RETRIES;
 static char *socket_files;
 static int nb_sockets;
 
+static struct vhost_queue_ops vdev_queue_ops[RTE_MAX_VHOST_DEVICE];
+
 /* empty VMDq configuration structure. Filled in programmatically */
 static struct rte_eth_conf vmdq_conf_default = {
 	.rxmode = {
@@ -205,6 +210,18 @@ struct vhost_bufftable *vhost_txbuff[RTE_MAX_LCORE * RTE_MAX_VHOST_DEVICE];
 #define MBUF_TABLE_DRAIN_TSC	((rte_get_tsc_hz() + US_PER_S - 1) \
 				 / US_PER_S * BURST_TX_DRAIN_US)
 
+static int vid2socketid[RTE_MAX_VHOST_DEVICE];
+
+static uint32_t get_async_flag_by_socketid(int socketid)
+{
+	return dma_bind[socketid].async_flag;
+}
+
+static void init_vid2socketid_array(int vid, int socketid)
+{
+	vid2socketid[vid] = socketid;
+}
+
 static inline bool
 is_dma_configured(int16_t dev_id)
 {
@@ -224,7 +241,7 @@ open_dma(const char *value)
 	char *addrs = input;
 	char *ptrs[2];
 	char *start, *end, *substr;
-	int64_t vid;
+	int64_t socketid, vring_id;
 
 	struct rte_dma_info info;
 	struct rte_dma_conf dev_config = { .nb_vchans = 1 };
@@ -262,7 +279,9 @@ open_dma(const char *value)
 
 	while (i < args_nr) {
 		char *arg_temp = dma_arg[i];
+		char *txd, *rxd;
 		uint8_t sub_nr;
+		int async_flag;
 
 		sub_nr = rte_strsplit(arg_temp, strlen(arg_temp), ptrs, 2, '@');
 		if (sub_nr != 2) {
@@ -270,14 +289,23 @@ open_dma(const char *value)
 			goto out;
 		}
 
-		start = strstr(ptrs[0], "txd");
-		if (start == NULL) {
+		txd = strstr(ptrs[0], "txd");
+		rxd = strstr(ptrs[0], "rxd");
+		if (txd) {
+			start = txd;
+			vring_id = VIRTIO_RXQ;
+			async_flag = ASYNC_ENQUEUE_VHOST;
+		} else if (rxd) {
+			start = rxd;
+			vring_id = VIRTIO_TXQ;
+			async_flag = ASYNC_DEQUEUE_VHOST;
+		} else {
 			ret = -1;
 			goto out;
 		}
 
 		start += 3;
-		vid = strtol(start, &end, 0);
+		socketid = strtol(start, &end, 0);
 		if (end == start) {
 			ret = -1;
 			goto out;
@@ -338,7 +366,8 @@ open_dma(const char *value)
 		dmas_id[dma_count++] = dev_id;
 
 done:
-		(dma_info + vid)->dmas[VIRTIO_RXQ].dev_id = dev_id;
+		(dma_info + socketid)->dmas[vring_id].dev_id = dev_id;
+		(dma_info + socketid)->async_flag |= async_flag;
 		i++;
 	}
 out:
@@ -990,13 +1019,13 @@ complete_async_pkts(struct vhost_dev *vdev)
 {
 	struct rte_mbuf *p_cpl[MAX_PKT_BURST];
 	uint16_t complete_count;
-	int16_t dma_id = dma_bind[vdev->vid].dmas[VIRTIO_RXQ].dev_id;
+	int16_t dma_id = dma_bind[vid2socketid[vdev->vid]].dmas[VIRTIO_RXQ].dev_id;
 
 	complete_count = rte_vhost_poll_enqueue_completed(vdev->vid,
 					VIRTIO_RXQ, p_cpl, MAX_PKT_BURST, dma_id, 0);
 	if (complete_count) {
 		free_pkts(p_cpl, complete_count);
-		__atomic_sub_fetch(&vdev->pkts_inflight, complete_count, __ATOMIC_SEQ_CST);
+		__atomic_sub_fetch(&vdev->pkts_enq_inflight, complete_count, __ATOMIC_SEQ_CST);
 	}
 
 }
@@ -1031,23 +1060,7 @@ drain_vhost(struct vhost_dev *vdev)
 	uint16_t nr_xmit = vhost_txbuff[buff_idx]->len;
 	struct rte_mbuf **m = vhost_txbuff[buff_idx]->m_table;
 
-	if (builtin_net_driver) {
-		ret = vs_enqueue_pkts(vdev, VIRTIO_RXQ, m, nr_xmit);
-	} else if (dma_bind[vdev->vid].dmas[VIRTIO_RXQ].async_enabled) {
-		uint16_t enqueue_fail = 0;
-		int16_t dma_id = dma_bind[vdev->vid].dmas[VIRTIO_RXQ].dev_id;
-
-		complete_async_pkts(vdev);
-		ret = rte_vhost_submit_enqueue_burst(vdev->vid, VIRTIO_RXQ, m, nr_xmit, dma_id, 0);
-		__atomic_add_fetch(&vdev->pkts_inflight, ret, __ATOMIC_SEQ_CST);
-
-		enqueue_fail = nr_xmit - ret;
-		if (enqueue_fail)
-			free_pkts(&m[ret], nr_xmit - ret);
-	} else {
-		ret = rte_vhost_enqueue_burst(vdev->vid, VIRTIO_RXQ,
-						m, nr_xmit);
-	}
+	ret = vdev_queue_ops[vdev->vid].enqueue_pkt_burst(vdev, VIRTIO_RXQ, m, nr_xmit);
 
 	if (enable_stats) {
 		__atomic_add_fetch(&vdev->stats.rx_total_atomic, nr_xmit,
@@ -1056,7 +1069,7 @@ drain_vhost(struct vhost_dev *vdev)
 				__ATOMIC_SEQ_CST);
 	}
 
-	if (!dma_bind[vdev->vid].dmas[VIRTIO_RXQ].async_enabled)
+	if (!dma_bind[vid2socketid[vdev->vid]].dmas[VIRTIO_RXQ].async_enabled)
 		free_pkts(m, nr_xmit);
 }
 
@@ -1328,6 +1341,33 @@ drain_mbuf_table(struct mbuf_table *tx_q)
 	}
 }
 
+uint16_t
+async_enqueue_pkts(struct vhost_dev *dev, uint16_t queue_id,
+		struct rte_mbuf **pkts, uint32_t rx_count)
+{
+	uint16_t enqueue_count;
+	uint16_t enqueue_fail = 0;
+	uint16_t dma_id = dma_bind[vid2socketid[dev->vid]].dmas[VIRTIO_RXQ].dev_id;
+
+	complete_async_pkts(dev);
+	enqueue_count = rte_vhost_submit_enqueue_burst(dev->vid, queue_id,
+					pkts, rx_count, dma_id, 0);
+	__atomic_add_fetch(&dev->pkts_enq_inflight, enqueue_count, __ATOMIC_SEQ_CST);
+
+	enqueue_fail = rx_count - enqueue_count;
+	if (enqueue_fail)
+		free_pkts(&pkts[enqueue_count], enqueue_fail);
+
+	return enqueue_count;
+}
+
+uint16_t
+sync_enqueue_pkts(struct vhost_dev *dev, uint16_t queue_id,
+		struct rte_mbuf **pkts, uint32_t rx_count)
+{
+	return rte_vhost_enqueue_burst(dev->vid, queue_id, pkts, rx_count);
+}
+
 static __rte_always_inline void
 drain_eth_rx(struct vhost_dev *vdev)
 {
@@ -1358,26 +1398,8 @@ drain_eth_rx(struct vhost_dev *vdev)
 		}
 	}
 
-	if (builtin_net_driver) {
-		enqueue_count = vs_enqueue_pkts(vdev, VIRTIO_RXQ,
-						pkts, rx_count);
-	} else if (dma_bind[vdev->vid].dmas[VIRTIO_RXQ].async_enabled) {
-		uint16_t enqueue_fail = 0;
-		int16_t dma_id = dma_bind[vdev->vid].dmas[VIRTIO_RXQ].dev_id;
-
-		complete_async_pkts(vdev);
-		enqueue_count = rte_vhost_submit_enqueue_burst(vdev->vid,
-					VIRTIO_RXQ, pkts, rx_count, dma_id, 0);
-		__atomic_add_fetch(&vdev->pkts_inflight, enqueue_count, __ATOMIC_SEQ_CST);
-
-		enqueue_fail = rx_count - enqueue_count;
-		if (enqueue_fail)
-			free_pkts(&pkts[enqueue_count], enqueue_fail);
-
-	} else {
-		enqueue_count = rte_vhost_enqueue_burst(vdev->vid, VIRTIO_RXQ,
-						pkts, rx_count);
-	}
+	enqueue_count = vdev_queue_ops[vdev->vid].enqueue_pkt_burst(vdev,
+					VIRTIO_RXQ, pkts, rx_count);
 
 	if (enable_stats) {
 		__atomic_add_fetch(&vdev->stats.rx_total_atomic, rx_count,
@@ -1386,10 +1408,33 @@ drain_eth_rx(struct vhost_dev *vdev)
 				__ATOMIC_SEQ_CST);
 	}
 
-	if (!dma_bind[vdev->vid].dmas[VIRTIO_RXQ].async_enabled)
+	if (!dma_bind[vid2socketid[vdev->vid]].dmas[VIRTIO_RXQ].async_enabled)
 		free_pkts(pkts, rx_count);
 }
 
+uint16_t async_dequeue_pkts(struct vhost_dev *dev, uint16_t queue_id,
+			    struct rte_mempool *mbuf_pool,
+			    struct rte_mbuf **pkts, uint16_t count)
+{
+	int nr_inflight;
+	uint16_t dequeue_count;
+	uint16_t dma_id = dma_bind[vid2socketid[dev->vid]].dmas[VIRTIO_TXQ].dev_id;
+
+	dequeue_count = rte_vhost_async_try_dequeue_burst(dev->vid, queue_id,
+			mbuf_pool, pkts, count, &nr_inflight, dma_id, 0);
+	if (likely(nr_inflight != -1))
+		dev->pkts_deq_inflight = nr_inflight;
+
+	return dequeue_count;
+}
+
+uint16_t sync_dequeue_pkts(struct vhost_dev *dev, uint16_t queue_id,
+			   struct rte_mempool *mbuf_pool,
+			   struct rte_mbuf **pkts, uint16_t count)
+{
+	return rte_vhost_dequeue_burst(dev->vid, queue_id, mbuf_pool, pkts, count);
+}
+
 static __rte_always_inline void
 drain_virtio_tx(struct vhost_dev *vdev)
 {
@@ -1397,13 +1442,8 @@ drain_virtio_tx(struct vhost_dev *vdev)
 	uint16_t count;
 	uint16_t i;
 
-	if (builtin_net_driver) {
-		count = vs_dequeue_pkts(vdev, VIRTIO_TXQ, mbuf_pool,
-					pkts, MAX_PKT_BURST);
-	} else {
-		count = rte_vhost_dequeue_burst(vdev->vid, VIRTIO_TXQ,
-					mbuf_pool, pkts, MAX_PKT_BURST);
-	}
+	count = vdev_queue_ops[vdev->vid].dequeue_pkt_burst(vdev,
+				VIRTIO_TXQ, mbuf_pool, pkts, MAX_PKT_BURST);
 
 	/* setup VMDq for the first packet */
 	if (unlikely(vdev->ready == DEVICE_MAC_LEARNING) && count) {
@@ -1482,6 +1522,31 @@ switch_worker(void *arg __rte_unused)
 	return 0;
 }
 
+static void
+vhost_clear_queue_thread_unsafe(struct vhost_dev *vdev, uint16_t queue_id)
+{
+	uint16_t n_pkt = 0;
+	uint16_t dma_id = dma_bind[vid2socketid[vdev->vid]].dmas[queue_id].dev_id;
+	struct rte_mbuf *m_enq_cpl[vdev->pkts_enq_inflight];
+	struct rte_mbuf *m_deq_cpl[vdev->pkts_deq_inflight];
+
+	if (queue_id % 2 == 0) {
+		while (vdev->pkts_enq_inflight) {
+			n_pkt = rte_vhost_clear_queue_thread_unsafe(vdev->vid,
+				queue_id, m_enq_cpl, vdev->pkts_enq_inflight, dma_id, 0);
+			free_pkts(m_enq_cpl, n_pkt);
+			__atomic_sub_fetch(&vdev->pkts_enq_inflight, n_pkt, __ATOMIC_SEQ_CST);
+		}
+	} else {
+		while (vdev->pkts_deq_inflight) {
+			n_pkt = rte_vhost_clear_queue_thread_unsafe(vdev->vid,
+				queue_id, m_deq_cpl, vdev->pkts_deq_inflight, dma_id, 0);
+			free_pkts(m_deq_cpl, n_pkt);
+			__atomic_sub_fetch(&vdev->pkts_deq_inflight, n_pkt, __ATOMIC_SEQ_CST);
+		}
+	}
+}
+
 /*
  * Remove a device from the specific data core linked list and from the
  * main linked list. Synchronization  occurs through the use of the
@@ -1538,25 +1603,78 @@ destroy_device(int vid)
 		"(%d) device has been removed from data core\n",
 		vdev->vid);
 
-	if (dma_bind[vid].dmas[VIRTIO_RXQ].async_enabled) {
-		uint16_t n_pkt = 0;
-		int16_t dma_id = dma_bind[vid].dmas[VIRTIO_RXQ].dev_id;
-		struct rte_mbuf *m_cpl[vdev->pkts_inflight];
-
-		while (vdev->pkts_inflight) {
-			n_pkt = rte_vhost_clear_queue_thread_unsafe(vid, VIRTIO_RXQ,
-						m_cpl, vdev->pkts_inflight, dma_id, 0);
-			free_pkts(m_cpl, n_pkt);
-			__atomic_sub_fetch(&vdev->pkts_inflight, n_pkt, __ATOMIC_SEQ_CST);
-		}
-
+	if (dma_bind[vid2socketid[vid]].dmas[VIRTIO_RXQ].async_enabled) {
+		vhost_clear_queue_thread_unsafe(vdev, VIRTIO_RXQ);
 		rte_vhost_async_channel_unregister(vid, VIRTIO_RXQ);
-		dma_bind[vid].dmas[VIRTIO_RXQ].async_enabled = false;
+		dma_bind[vid2socketid[vid]].dmas[VIRTIO_RXQ].async_enabled = false;
+	}
+	if (dma_bind[vid2socketid[vid]].dmas[VIRTIO_TXQ].async_enabled) {
+		vhost_clear_queue_thread_unsafe(vdev, VIRTIO_TXQ);
+		rte_vhost_async_channel_unregister(vid, VIRTIO_TXQ);
+		dma_bind[vid2socketid[vid]].dmas[VIRTIO_TXQ].async_enabled = false;
 	}
 
 	rte_free(vdev);
 }
 
+static int
+get_socketid_by_vid(int vid)
+{
+	int i;
+	char ifname[PATH_MAX];
+	rte_vhost_get_ifname(vid, ifname, sizeof(ifname));
+
+	for (i = 0; i < nb_sockets; i++) {
+		char *file = socket_files + i * PATH_MAX;
+		if (strcmp(file, ifname) == 0)
+			return i;
+	}
+
+	return -1;
+}
+
+static int
+init_vhost_queue_ops(int vid)
+{
+	if (builtin_net_driver) {
+		vdev_queue_ops[vid].enqueue_pkt_burst = builtin_enqueue_pkts;
+		vdev_queue_ops[vid].dequeue_pkt_burst = builtin_dequeue_pkts;
+	} else {
+		if (dma_bind[vid2socketid[vid]].dmas[VIRTIO_RXQ].async_enabled)
+			vdev_queue_ops[vid].enqueue_pkt_burst = async_enqueue_pkts;
+		else
+			vdev_queue_ops[vid].enqueue_pkt_burst = sync_enqueue_pkts;
+
+		if (dma_bind[vid2socketid[vid]].dmas[VIRTIO_TXQ].async_enabled)
+			vdev_queue_ops[vid].dequeue_pkt_burst = async_dequeue_pkts;
+		else
+			vdev_queue_ops[vid].dequeue_pkt_burst = sync_dequeue_pkts;
+	}
+
+	return 0;
+}
+
+static int
+vhost_async_channel_register(int vid)
+{
+	int rx_ret = 0, tx_ret = 0;
+
+	if (dma_bind[vid2socketid[vid]].dmas[VIRTIO_RXQ].dev_id != INVALID_DMA_ID) {
+		rx_ret = rte_vhost_async_channel_register(vid, VIRTIO_RXQ);
+		if (rx_ret == 0)
+			dma_bind[vid2socketid[vid]].dmas[VIRTIO_RXQ].async_enabled = true;
+	}
+
+	if (dma_bind[vid2socketid[vid]].dmas[VIRTIO_TXQ].dev_id != INVALID_DMA_ID) {
+		tx_ret = rte_vhost_async_channel_register(vid, VIRTIO_TXQ);
+		if (tx_ret == 0)
+			dma_bind[vid2socketid[vid]].dmas[VIRTIO_TXQ].async_enabled = true;
+	}
+
+	return rx_ret | tx_ret;
+}
+
+
 /*
  * A new device is added to a data core. First the device is added to the main linked list
  * and then allocated to a specific data core.
@@ -1568,6 +1686,8 @@ new_device(int vid)
 	uint16_t i;
 	uint32_t device_num_min = num_devices;
 	struct vhost_dev *vdev;
+	int ret;
+
 	vdev = rte_zmalloc("vhost device", sizeof(*vdev), RTE_CACHE_LINE_SIZE);
 	if (vdev == NULL) {
 		RTE_LOG(INFO, VHOST_DATA,
@@ -1590,6 +1710,17 @@ new_device(int vid)
 		}
 	}
 
+	int socketid = get_socketid_by_vid(vid);
+	if (socketid == -1)
+		return -1;
+
+	init_vid2socketid_array(vid, socketid);
+
+	ret =  vhost_async_channel_register(vid);
+
+	if (init_vhost_queue_ops(vid) != 0)
+		return -1;
+
 	if (builtin_net_driver)
 		vs_vhost_net_setup(vdev);
 
@@ -1621,16 +1752,7 @@ new_device(int vid)
 		"(%d) device has been added to data core %d\n",
 		vid, vdev->coreid);
 
-	if (dma_bind[vid].dmas[VIRTIO_RXQ].dev_id != INVALID_DMA_ID) {
-		int ret;
-
-		ret = rte_vhost_async_channel_register(vid, VIRTIO_RXQ);
-		if (ret == 0)
-			dma_bind[vid].dmas[VIRTIO_RXQ].async_enabled = true;
-		return ret;
-	}
-
-	return 0;
+	return ret;
 }
 
 static int
@@ -1648,19 +1770,9 @@ vring_state_changed(int vid, uint16_t queue_id, int enable)
 	if (queue_id != VIRTIO_RXQ)
 		return 0;
 
-	if (dma_bind[vid].dmas[queue_id].async_enabled) {
-		if (!enable) {
-			uint16_t n_pkt = 0;
-			int16_t dma_id = dma_bind[vid].dmas[VIRTIO_RXQ].dev_id;
-			struct rte_mbuf *m_cpl[vdev->pkts_inflight];
-
-			while (vdev->pkts_inflight) {
-				n_pkt = rte_vhost_clear_queue_thread_unsafe(vid, queue_id,
-							m_cpl, vdev->pkts_inflight, dma_id, 0);
-				free_pkts(m_cpl, n_pkt);
-				__atomic_sub_fetch(&vdev->pkts_inflight, n_pkt, __ATOMIC_SEQ_CST);
-			}
-		}
+	if (dma_bind[vid2socketid[vid]].dmas[queue_id].async_enabled) {
+		if (!enable)
+			vhost_clear_queue_thread_unsafe(vdev, queue_id);
 	}
 
 	return 0;
@@ -1885,7 +1997,7 @@ main(int argc, char *argv[])
 	for (i = 0; i < nb_sockets; i++) {
 		char *file = socket_files + i * PATH_MAX;
 
-		if (dma_count)
+		if (dma_count && get_async_flag_by_socketid(i) != 0)
 			flags = flags | RTE_VHOST_USER_ASYNC_COPY;
 
 		ret = rte_vhost_driver_register(file, flags);
diff --git a/examples/vhost/main.h b/examples/vhost/main.h
index b4a453e77e..40ac2841d1 100644
--- a/examples/vhost/main.h
+++ b/examples/vhost/main.h
@@ -52,7 +52,8 @@ struct vhost_dev {
 	uint64_t features;
 	size_t hdr_len;
 	uint16_t nr_vrings;
-	uint16_t pkts_inflight;
+	uint16_t pkts_enq_inflight;
+	uint16_t pkts_deq_inflight;
 	struct rte_vhost_memory *mem;
 	struct device_statistics stats;
 	TAILQ_ENTRY(vhost_dev) global_vdev_entry;
@@ -62,6 +63,19 @@ struct vhost_dev {
 	struct vhost_queue queues[MAX_QUEUE_PAIRS * 2];
 } __rte_cache_aligned;
 
+typedef uint16_t (*vhost_enqueue_burst_t)(struct vhost_dev *dev,
+			uint16_t queue_id, struct rte_mbuf **pkts,
+			uint32_t count);
+
+typedef uint16_t (*vhost_dequeue_burst_t)(struct vhost_dev *dev,
+			uint16_t queue_id, struct rte_mempool *mbuf_pool,
+			struct rte_mbuf **pkts, uint16_t count);
+
+struct vhost_queue_ops {
+	vhost_enqueue_burst_t enqueue_pkt_burst;
+	vhost_dequeue_burst_t dequeue_pkt_burst;
+};
+
 TAILQ_HEAD(vhost_dev_tailq_list, vhost_dev);
 
 
@@ -88,6 +102,7 @@ struct dma_info {
 
 struct dma_for_vhost {
 	struct dma_info dmas[RTE_MAX_QUEUES_PER_PORT * 2];
+	uint32_t async_flag;
 };
 
 /* we implement non-extra virtio net features */
@@ -98,7 +113,19 @@ void vs_vhost_net_remove(struct vhost_dev *dev);
 uint16_t vs_enqueue_pkts(struct vhost_dev *dev, uint16_t queue_id,
 			 struct rte_mbuf **pkts, uint32_t count);
 
-uint16_t vs_dequeue_pkts(struct vhost_dev *dev, uint16_t queue_id,
-			 struct rte_mempool *mbuf_pool,
-			 struct rte_mbuf **pkts, uint16_t count);
+uint16_t builtin_enqueue_pkts(struct vhost_dev *dev, uint16_t queue_id,
+			struct rte_mbuf **pkts, uint32_t count);
+uint16_t builtin_dequeue_pkts(struct vhost_dev *dev, uint16_t queue_id,
+			struct rte_mempool *mbuf_pool,
+			struct rte_mbuf **pkts, uint16_t count);
+uint16_t sync_enqueue_pkts(struct vhost_dev *dev, uint16_t queue_id,
+			 struct rte_mbuf **pkts, uint32_t count);
+uint16_t sync_dequeue_pkts(struct vhost_dev *dev, uint16_t queue_id,
+			struct rte_mempool *mbuf_pool,
+			struct rte_mbuf **pkts, uint16_t count);
+uint16_t async_enqueue_pkts(struct vhost_dev *dev, uint16_t queue_id,
+			 struct rte_mbuf **pkts, uint32_t count);
+uint16_t async_dequeue_pkts(struct vhost_dev *dev, uint16_t queue_id,
+			struct rte_mempool *mbuf_pool,
+			struct rte_mbuf **pkts, uint16_t count);
 #endif /* _MAIN_H_ */
diff --git a/examples/vhost/virtio_net.c b/examples/vhost/virtio_net.c
index 9064fc3a82..2432a96566 100644
--- a/examples/vhost/virtio_net.c
+++ b/examples/vhost/virtio_net.c
@@ -238,6 +238,13 @@ vs_enqueue_pkts(struct vhost_dev *dev, uint16_t queue_id,
 	return count;
 }
 
+uint16_t
+builtin_enqueue_pkts(struct vhost_dev *dev, uint16_t queue_id,
+		struct rte_mbuf **pkts, uint32_t count)
+{
+	return vs_enqueue_pkts(dev, queue_id, pkts, count);
+}
+
 static __rte_always_inline int
 dequeue_pkt(struct vhost_dev *dev, struct rte_vhost_vring *vr,
 	    struct rte_mbuf *m, uint16_t desc_idx,
@@ -363,7 +370,7 @@ dequeue_pkt(struct vhost_dev *dev, struct rte_vhost_vring *vr,
 	return 0;
 }
 
-uint16_t
+static uint16_t
 vs_dequeue_pkts(struct vhost_dev *dev, uint16_t queue_id,
 	struct rte_mempool *mbuf_pool, struct rte_mbuf **pkts, uint16_t count)
 {
@@ -440,3 +447,10 @@ vs_dequeue_pkts(struct vhost_dev *dev, uint16_t queue_id,
 
 	return i;
 }
+
+uint16_t
+builtin_dequeue_pkts(struct vhost_dev *dev, uint16_t queue_id,
+	struct rte_mempool *mbuf_pool, struct rte_mbuf **pkts, uint16_t count)
+{
+	return vs_dequeue_pkts(dev, queue_id, mbuf_pool, pkts, count);
+}
-- 
2.17.1


* [PATCH v2 0/5] vhost: support async dequeue data path
  2022-04-07 15:25 [PATCH v1 0/5] vhost: support async dequeue data path xuan.ding
                   ` (4 preceding siblings ...)
  2022-04-07 15:25 ` [PATCH v1 5/5] examples/vhost: support async dequeue data path xuan.ding
@ 2022-04-11 10:00 ` xuan.ding
  2022-04-11 10:00   ` [PATCH v2 1/5] vhost: prepare sync for descriptor to mbuf refactoring xuan.ding
                     ` (4 more replies)
  2022-04-19  3:43 ` [PATCH v3 0/5] vhost: " xuan.ding
                   ` (5 subsequent siblings)
  11 siblings, 5 replies; 73+ messages in thread
From: xuan.ding @ 2022-04-11 10:00 UTC (permalink / raw)
  To: maxime.coquelin, chenbo.xia
  Cc: dev, jiayu.hu, cheng1.jiang, sunil.pai.g, liangma, Xuan Ding

From: Xuan Ding <xuan.ding@intel.com>

The presence of an asynchronous path allows applications to offload memory
copies to a DMA engine, saving CPU cycles and improving copy
performance. This patch set implements the vhost async dequeue data path
for split ring. The code is based on the latest enqueue changes [1].

This patch set is a new design and implementation of [2]. Since dmadev
was introduced in DPDK 21.11, to simplify application logic, this patch
set integrates dmadev into vhost. With dmadev integrated, vhost supports M:N
mapping between vrings and DMA virtual channels. Specifically, one vring
can use multiple different DMA channels and one DMA channel can be
shared by multiple vrings at the same time.

A new asynchronous dequeue function is introduced:
        1) rte_vhost_async_try_dequeue_burst(int vid, uint16_t queue_id,
                struct rte_mempool *mbuf_pool, struct rte_mbuf **pkts,
                uint16_t count, int *nr_inflight,
                uint16_t dma_id, uint16_t vchan_id)

        Receives packets from the guest and offloads the copies to a DMA
virtual channel.

[1] https://mails.dpdk.org/archives/dev/2022-February/234555.html
[2] https://mails.dpdk.org/archives/dev/2021-September/218591.html

v1->v2:
* fix a typo
* fix a bug in desc_to_mbuf filling

RFC v3 -> v1:
* add sync and async path descriptor to mbuf refactoring
* add API description in docs

RFC v2 -> RFC v3:
* rebase to latest DPDK version

RFC v1 -> RFC v2:
* fix one bug in example
* rename vchan to vchan_id
* check if dma_id and vchan_id are valid
* rework all the logs to the new standard

Xuan Ding (5):
  vhost: prepare sync for descriptor to mbuf refactoring
  vhost: prepare async for descriptor to mbuf refactoring
  vhost: merge sync and async descriptor to mbuf filling
  vhost: support async dequeue for split ring
  examples/vhost: support async dequeue data path

 doc/guides/prog_guide/vhost_lib.rst    |   7 +
 doc/guides/rel_notes/release_22_07.rst |   4 +
 doc/guides/sample_app_ug/vhost.rst     |   9 +-
 examples/vhost/main.c                  | 292 +++++++++++-----
 examples/vhost/main.h                  |  35 +-
 examples/vhost/virtio_net.c            |  16 +-
 lib/vhost/rte_vhost_async.h            |  33 ++
 lib/vhost/version.map                  |   3 +
 lib/vhost/vhost.h                      |   1 +
 lib/vhost/virtio_net.c                 | 460 ++++++++++++++++++++++---
 10 files changed, 712 insertions(+), 148 deletions(-)

-- 
2.17.1


* [PATCH v2 1/5] vhost: prepare sync for descriptor to mbuf refactoring
  2022-04-11 10:00 ` [PATCH v2 0/5] vhost: " xuan.ding
@ 2022-04-11 10:00   ` xuan.ding
  2022-04-11 10:00   ` [PATCH v2 2/5] vhost: prepare async " xuan.ding
                     ` (3 subsequent siblings)
  4 siblings, 0 replies; 73+ messages in thread
From: xuan.ding @ 2022-04-11 10:00 UTC (permalink / raw)
  To: maxime.coquelin, chenbo.xia
  Cc: dev, jiayu.hu, cheng1.jiang, sunil.pai.g, liangma, Xuan Ding

From: Xuan Ding <xuan.ding@intel.com>

This patch extracts the descriptor-to-buffer filling from
copy_desc_to_mbuf() into a dedicated function. Besides, the enqueue
and dequeue paths are refactored to use the same function
sync_fill_seg() for preparing batch elements, which simplifies
the code without performance degradation.

Signed-off-by: Xuan Ding <xuan.ding@intel.com>
---
 lib/vhost/virtio_net.c | 66 +++++++++++++++++++-----------------------
 1 file changed, 29 insertions(+), 37 deletions(-)

diff --git a/lib/vhost/virtio_net.c b/lib/vhost/virtio_net.c
index 5f432b0d77..a2d04a1f60 100644
--- a/lib/vhost/virtio_net.c
+++ b/lib/vhost/virtio_net.c
@@ -1030,9 +1030,9 @@ async_mbuf_to_desc_seg(struct virtio_net *dev, struct vhost_virtqueue *vq,
 }
 
 static __rte_always_inline void
-sync_mbuf_to_desc_seg(struct virtio_net *dev, struct vhost_virtqueue *vq,
+sync_fill_seg(struct virtio_net *dev, struct vhost_virtqueue *vq,
 		struct rte_mbuf *m, uint32_t mbuf_offset,
-		uint64_t buf_addr, uint64_t buf_iova, uint32_t cpy_len)
+		uint64_t buf_addr, uint64_t buf_iova, uint32_t cpy_len, bool to_desc)
 {
 	struct batch_copy_elem *batch_copy = vq->batch_copy_elems;
 
@@ -1043,10 +1043,17 @@ sync_mbuf_to_desc_seg(struct virtio_net *dev, struct vhost_virtqueue *vq,
 		vhost_log_cache_write_iova(dev, vq, buf_iova, cpy_len);
 		PRINT_PACKET(dev, (uintptr_t)(buf_addr), cpy_len, 0);
 	} else {
-		batch_copy[vq->batch_copy_nb_elems].dst =
-			(void *)((uintptr_t)(buf_addr));
-		batch_copy[vq->batch_copy_nb_elems].src =
-			rte_pktmbuf_mtod_offset(m, void *, mbuf_offset);
+		if (to_desc) {
+			batch_copy[vq->batch_copy_nb_elems].dst =
+				(void *)((uintptr_t)(buf_addr));
+			batch_copy[vq->batch_copy_nb_elems].src =
+				rte_pktmbuf_mtod_offset(m, void *, mbuf_offset);
+		} else {
+			batch_copy[vq->batch_copy_nb_elems].dst =
+				rte_pktmbuf_mtod_offset(m, void *, mbuf_offset);
+			batch_copy[vq->batch_copy_nb_elems].src =
+				(void *)((uintptr_t)(buf_addr));
+		}
 		batch_copy[vq->batch_copy_nb_elems].log_addr = buf_iova;
 		batch_copy[vq->batch_copy_nb_elems].len = cpy_len;
 		vq->batch_copy_nb_elems++;
@@ -1158,9 +1165,9 @@ mbuf_to_desc(struct virtio_net *dev, struct vhost_virtqueue *vq,
 						buf_iova + buf_offset, cpy_len) < 0)
 				goto error;
 		} else {
-			sync_mbuf_to_desc_seg(dev, vq, m, mbuf_offset,
+			sync_fill_seg(dev, vq, m, mbuf_offset,
 					buf_addr + buf_offset,
-					buf_iova + buf_offset, cpy_len);
+					buf_iova + buf_offset, cpy_len, true);
 		}
 
 		mbuf_avail  -= cpy_len;
@@ -2474,7 +2481,7 @@ copy_desc_to_mbuf(struct virtio_net *dev, struct vhost_virtqueue *vq,
 		  bool legacy_ol_flags)
 {
 	uint32_t buf_avail, buf_offset;
-	uint64_t buf_addr, buf_len;
+	uint64_t buf_addr, buf_iova, buf_len;
 	uint32_t mbuf_avail, mbuf_offset;
 	uint32_t cpy_len;
 	struct rte_mbuf *cur = m, *prev = m;
@@ -2482,16 +2489,13 @@ copy_desc_to_mbuf(struct virtio_net *dev, struct vhost_virtqueue *vq,
 	struct virtio_net_hdr *hdr = NULL;
 	/* A counter to avoid desc dead loop chain */
 	uint16_t vec_idx = 0;
-	struct batch_copy_elem *batch_copy = vq->batch_copy_elems;
-	int error = 0;
 
 	buf_addr = buf_vec[vec_idx].buf_addr;
+	buf_iova = buf_vec[vec_idx].buf_iova;
 	buf_len = buf_vec[vec_idx].buf_len;
 
-	if (unlikely(buf_len < dev->vhost_hlen && nr_vec <= 1)) {
-		error = -1;
-		goto out;
-	}
+	if (unlikely(buf_len < dev->vhost_hlen && nr_vec <= 1))
+		return -1;
 
 	if (virtio_net_with_host_offload(dev)) {
 		if (unlikely(buf_len < sizeof(struct virtio_net_hdr))) {
@@ -2515,11 +2519,12 @@ copy_desc_to_mbuf(struct virtio_net *dev, struct vhost_virtqueue *vq,
 		buf_offset = dev->vhost_hlen - buf_len;
 		vec_idx++;
 		buf_addr = buf_vec[vec_idx].buf_addr;
+		buf_iova = buf_vec[vec_idx].buf_iova;
 		buf_len = buf_vec[vec_idx].buf_len;
 		buf_avail  = buf_len - buf_offset;
 	} else if (buf_len == dev->vhost_hlen) {
 		if (unlikely(++vec_idx >= nr_vec))
-			goto out;
+			goto error;
 		buf_addr = buf_vec[vec_idx].buf_addr;
 		buf_len = buf_vec[vec_idx].buf_len;
 
@@ -2539,22 +2544,9 @@ copy_desc_to_mbuf(struct virtio_net *dev, struct vhost_virtqueue *vq,
 	while (1) {
 		cpy_len = RTE_MIN(buf_avail, mbuf_avail);
 
-		if (likely(cpy_len > MAX_BATCH_LEN ||
-					vq->batch_copy_nb_elems >= vq->size ||
-					(hdr && cur == m))) {
-			rte_memcpy(rte_pktmbuf_mtod_offset(cur, void *,
-						mbuf_offset),
-					(void *)((uintptr_t)(buf_addr +
-							buf_offset)), cpy_len);
-		} else {
-			batch_copy[vq->batch_copy_nb_elems].dst =
-				rte_pktmbuf_mtod_offset(cur, void *,
-						mbuf_offset);
-			batch_copy[vq->batch_copy_nb_elems].src =
-				(void *)((uintptr_t)(buf_addr + buf_offset));
-			batch_copy[vq->batch_copy_nb_elems].len = cpy_len;
-			vq->batch_copy_nb_elems++;
-		}
+		sync_fill_seg(dev, vq, m, mbuf_offset,
+			      buf_addr + buf_offset,
+			      buf_iova + buf_offset, cpy_len, true);
 
 		mbuf_avail  -= cpy_len;
 		mbuf_offset += cpy_len;
@@ -2567,6 +2559,7 @@ copy_desc_to_mbuf(struct virtio_net *dev, struct vhost_virtqueue *vq,
 				break;
 
 			buf_addr = buf_vec[vec_idx].buf_addr;
+			buf_iova = buf_vec[vec_idx].buf_iova;
 			buf_len = buf_vec[vec_idx].buf_len;
 
 			buf_offset = 0;
@@ -2585,8 +2578,7 @@ copy_desc_to_mbuf(struct virtio_net *dev, struct vhost_virtqueue *vq,
 			if (unlikely(cur == NULL)) {
 				VHOST_LOG_DATA(ERR, "(%s) failed to allocate memory for mbuf.\n",
 						dev->ifname);
-				error = -1;
-				goto out;
+				goto error;
 			}
 
 			prev->next = cur;
@@ -2606,9 +2598,9 @@ copy_desc_to_mbuf(struct virtio_net *dev, struct vhost_virtqueue *vq,
 	if (hdr)
 		vhost_dequeue_offload(dev, hdr, m, legacy_ol_flags);
 
-out:
-
-	return error;
+	return 0;
+error:
+	return -1;
 }
 
 static void
-- 
2.17.1


^ permalink raw reply	[flat|nested] 73+ messages in thread

* [PATCH v2 2/5] vhost: prepare async for descriptor to mbuf refactoring
  2022-04-11 10:00 ` [PATCH v2 0/5] vhost: " xuan.ding
  2022-04-11 10:00   ` [PATCH v2 1/5] vhost: prepare sync for descriptor to mbuf refactoring xuan.ding
@ 2022-04-11 10:00   ` xuan.ding
  2022-04-11 10:00   ` [PATCH v2 3/5] vhost: merge sync and async descriptor to mbuf filling xuan.ding
                     ` (2 subsequent siblings)
  4 siblings, 0 replies; 73+ messages in thread
From: xuan.ding @ 2022-04-11 10:00 UTC (permalink / raw)
  To: maxime.coquelin, chenbo.xia
  Cc: dev, jiayu.hu, cheng1.jiang, sunil.pai.g, liangma, Xuan Ding

From: Xuan Ding <xuan.ding@intel.com>

This patch refactors the vhost async enqueue and dequeue paths to use
the same function, async_fill_seg(), for preparing batch elements,
which simplifies the code without performance degradation.
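
As a rough sketch of the idea (hypothetical, simplified types; the real
code builds iovecs for the DMA engine inside async_fill_seg()):

    #include <stdbool.h>
    #include <stddef.h>

    struct dma_iovec_sketch {
        void   *src;
        void   *dst;
        size_t  len;
    };

    /* The direction only swaps which side is the source and which is the
     * destination of the DMA transfer; the segment walk stays shared. */
    static inline void
    fill_dma_iovec_sketch(struct dma_iovec_sketch *iov, void *host_iova,
                          void *mbuf_iova, size_t len, bool to_desc)
    {
        iov->src = to_desc ? mbuf_iova : host_iova;
        iov->dst = to_desc ? host_iova : mbuf_iova;
        iov->len = len;
    }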

Signed-off-by: Xuan Ding <xuan.ding@intel.com>
---
 lib/vhost/virtio_net.c | 23 +++++++++++++++--------
 1 file changed, 15 insertions(+), 8 deletions(-)

diff --git a/lib/vhost/virtio_net.c b/lib/vhost/virtio_net.c
index a2d04a1f60..709ff483a3 100644
--- a/lib/vhost/virtio_net.c
+++ b/lib/vhost/virtio_net.c
@@ -997,13 +997,14 @@ async_iter_reset(struct vhost_async *async)
 }
 
 static __rte_always_inline int
-async_mbuf_to_desc_seg(struct virtio_net *dev, struct vhost_virtqueue *vq,
+async_fill_seg(struct virtio_net *dev, struct vhost_virtqueue *vq,
 		struct rte_mbuf *m, uint32_t mbuf_offset,
-		uint64_t buf_iova, uint32_t cpy_len)
+		uint64_t buf_iova, uint32_t cpy_len, bool to_desc)
 {
 	struct vhost_async *async = vq->async;
 	uint64_t mapped_len;
 	uint32_t buf_offset = 0;
+	void *src, *dst;
 	void *host_iova;
 
 	while (cpy_len) {
@@ -1015,10 +1016,16 @@ async_mbuf_to_desc_seg(struct virtio_net *dev, struct vhost_virtqueue *vq,
 			return -1;
 		}
 
-		if (unlikely(async_iter_add_iovec(dev, async,
-						(void *)(uintptr_t)rte_pktmbuf_iova_offset(m,
-							mbuf_offset),
-						host_iova, (size_t)mapped_len)))
+		if (to_desc) {
+			src = (void *)(uintptr_t)rte_pktmbuf_iova_offset(m, mbuf_offset);
+			dst = host_iova;
+		} else {
+			src = host_iova;
+			dst = (void *)(uintptr_t)rte_pktmbuf_iova_offset(m, mbuf_offset);
+		}
+
+		if (unlikely(async_iter_add_iovec(dev, async, src, dst,
+						 (size_t)mapped_len)))
 			return -1;
 
 		cpy_len -= (uint32_t)mapped_len;
@@ -1161,8 +1168,8 @@ mbuf_to_desc(struct virtio_net *dev, struct vhost_virtqueue *vq,
 		cpy_len = RTE_MIN(buf_avail, mbuf_avail);
 
 		if (is_async) {
-			if (async_mbuf_to_desc_seg(dev, vq, m, mbuf_offset,
-						buf_iova + buf_offset, cpy_len) < 0)
+			if (async_fill_seg(dev, vq, m, mbuf_offset,
+						buf_iova + buf_offset, cpy_len, true) < 0)
 				goto error;
 		} else {
 			sync_fill_seg(dev, vq, m, mbuf_offset,
-- 
2.17.1


^ permalink raw reply	[flat|nested] 73+ messages in thread

* [PATCH v2 3/5] vhost: merge sync and async descriptor to mbuf filling
  2022-04-11 10:00 ` [PATCH v2 0/5] vhost: " xuan.ding
  2022-04-11 10:00   ` [PATCH v2 1/5] vhost: prepare sync for descriptor to mbuf refactoring xuan.ding
  2022-04-11 10:00   ` [PATCH v2 2/5] vhost: prepare async " xuan.ding
@ 2022-04-11 10:00   ` xuan.ding
  2022-04-11 10:00   ` [PATCH v2 4/5] vhost: support async dequeue for split ring xuan.ding
  2022-04-11 10:00   ` [PATCH v2 5/5] examples/vhost: support async dequeue data path xuan.ding
  4 siblings, 0 replies; 73+ messages in thread
From: xuan.ding @ 2022-04-11 10:00 UTC (permalink / raw)
  To: maxime.coquelin, chenbo.xia
  Cc: dev, jiayu.hu, cheng1.jiang, sunil.pai.g, liangma, Xuan Ding

From: Xuan Ding <xuan.ding@intel.com>

This patch refactors copy_desc_to_mbuf(), used by the sync
path, to support both sync and async descriptor to mbuf filling.
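
A minimal sketch of the merged fill step (standalone and simplified; the
actual loop and bookkeeping live in desc_to_mbuf() below):

    #include <stdbool.h>
    #include <stddef.h>
    #include <stdint.h>
    #include <string.h>

    typedef int (*record_dma_seg_t)(void *dst, void *src, size_t len);

    /* The same descriptor walk either copies with the CPU right away (sync)
     * or records the segment so it can be offloaded to a DMA engine (async). */
    static int
    fill_one_seg_sketch(uint8_t *mbuf_data, uint8_t *desc_buf, size_t len,
                        bool is_async, record_dma_seg_t record_dma_seg)
    {
        if (is_async)
            return record_dma_seg(mbuf_data, desc_buf, len);
        memcpy(mbuf_data, desc_buf, len);
        return 0;
    }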

Signed-off-by: Xuan Ding <xuan.ding@intel.com>
---
 lib/vhost/vhost.h      |  1 +
 lib/vhost/virtio_net.c | 48 ++++++++++++++++++++++++++++++++----------
 2 files changed, 38 insertions(+), 11 deletions(-)

diff --git a/lib/vhost/vhost.h b/lib/vhost/vhost.h
index a9edc271aa..9209558465 100644
--- a/lib/vhost/vhost.h
+++ b/lib/vhost/vhost.h
@@ -177,6 +177,7 @@ extern struct async_dma_info dma_copy_track[RTE_DMADEV_DEFAULT_MAX];
  * inflight async packet information
  */
 struct async_inflight_info {
+	struct virtio_net_hdr nethdr;
 	struct rte_mbuf *mbuf;
 	uint16_t descs; /* num of descs inflight */
 	uint16_t nr_buffers; /* num of buffers inflight for packed ring */
diff --git a/lib/vhost/virtio_net.c b/lib/vhost/virtio_net.c
index 709ff483a3..56904ad9a5 100644
--- a/lib/vhost/virtio_net.c
+++ b/lib/vhost/virtio_net.c
@@ -2482,10 +2482,10 @@ copy_vnet_hdr_from_desc(struct virtio_net_hdr *hdr,
 }
 
 static __rte_always_inline int
-copy_desc_to_mbuf(struct virtio_net *dev, struct vhost_virtqueue *vq,
+desc_to_mbuf(struct virtio_net *dev, struct vhost_virtqueue *vq,
 		  struct buf_vector *buf_vec, uint16_t nr_vec,
 		  struct rte_mbuf *m, struct rte_mempool *mbuf_pool,
-		  bool legacy_ol_flags)
+		  bool legacy_ol_flags, uint16_t slot_idx, bool is_async)
 {
 	uint32_t buf_avail, buf_offset;
 	uint64_t buf_addr, buf_iova, buf_len;
@@ -2496,6 +2496,8 @@ copy_desc_to_mbuf(struct virtio_net *dev, struct vhost_virtqueue *vq,
 	struct virtio_net_hdr *hdr = NULL;
 	/* A counter to avoid desc dead loop chain */
 	uint16_t vec_idx = 0;
+	struct vhost_async *async = vq->async;
+	struct async_inflight_info *pkts_info;
 
 	buf_addr = buf_vec[vec_idx].buf_addr;
 	buf_iova = buf_vec[vec_idx].buf_iova;
@@ -2533,6 +2535,7 @@ copy_desc_to_mbuf(struct virtio_net *dev, struct vhost_virtqueue *vq,
 		if (unlikely(++vec_idx >= nr_vec))
 			goto error;
 		buf_addr = buf_vec[vec_idx].buf_addr;
+		buf_iova = buf_vec[vec_idx].buf_iova;
 		buf_len = buf_vec[vec_idx].buf_len;
 
 		buf_offset = 0;
@@ -2548,12 +2551,25 @@ copy_desc_to_mbuf(struct virtio_net *dev, struct vhost_virtqueue *vq,
 
 	mbuf_offset = 0;
 	mbuf_avail  = m->buf_len - RTE_PKTMBUF_HEADROOM;
+
+	if (is_async) {
+		pkts_info = async->pkts_info;
+		if (async_iter_initialize(dev, async))
+			return -1;
+	}
+
 	while (1) {
 		cpy_len = RTE_MIN(buf_avail, mbuf_avail);
 
-		sync_fill_seg(dev, vq, m, mbuf_offset,
-			      buf_addr + buf_offset,
-			      buf_iova + buf_offset, cpy_len, true);
+		if (is_async) {
+			if (async_fill_seg(dev, vq, m, mbuf_offset,
+						buf_iova + buf_offset, cpy_len, false) < 0)
+				goto error;
+		} else {
+			sync_fill_seg(dev, vq, m, mbuf_offset,
+				buf_addr + buf_offset,
+				buf_iova + buf_offset, cpy_len, false);
+		}
 
 		mbuf_avail  -= cpy_len;
 		mbuf_offset += cpy_len;
@@ -2602,11 +2618,20 @@ copy_desc_to_mbuf(struct virtio_net *dev, struct vhost_virtqueue *vq,
 	prev->data_len = mbuf_offset;
 	m->pkt_len    += mbuf_offset;
 
-	if (hdr)
-		vhost_dequeue_offload(dev, hdr, m, legacy_ol_flags);
+	if (is_async) {
+		async_iter_finalize(async);
+		if (hdr)
+			pkts_info[slot_idx].nethdr = *hdr;
+	} else {
+		if (hdr)
+			vhost_dequeue_offload(dev, hdr, m, legacy_ol_flags);
+	}
 
 	return 0;
 error:
+	if (is_async)
+		async_iter_cancel(async);
+
 	return -1;
 }
 
@@ -2738,8 +2763,8 @@ virtio_dev_tx_split(struct virtio_net *dev, struct vhost_virtqueue *vq,
 			break;
 		}
 
-		err = copy_desc_to_mbuf(dev, vq, buf_vec, nr_vec, pkts[i],
-				mbuf_pool, legacy_ol_flags);
+		err = desc_to_mbuf(dev, vq, buf_vec, nr_vec, pkts[i],
+				mbuf_pool, legacy_ol_flags, 0, false);
 		if (unlikely(err)) {
 			if (!allocerr_warned) {
 				VHOST_LOG_DATA(ERR, "(%s) failed to copy desc to mbuf.\n",
@@ -2750,6 +2775,7 @@ virtio_dev_tx_split(struct virtio_net *dev, struct vhost_virtqueue *vq,
 			i++;
 			break;
 		}
+
 	}
 
 	if (dropped)
@@ -2931,8 +2957,8 @@ vhost_dequeue_single_packed(struct virtio_net *dev,
 		return -1;
 	}
 
-	err = copy_desc_to_mbuf(dev, vq, buf_vec, nr_vec, pkts,
-				mbuf_pool, legacy_ol_flags);
+	err = desc_to_mbuf(dev, vq, buf_vec, nr_vec, pkts,
+				mbuf_pool, legacy_ol_flags, 0, false);
 	if (unlikely(err)) {
 		if (!allocerr_warned) {
 			VHOST_LOG_DATA(ERR, "(%s) failed to copy desc to mbuf.\n",
-- 
2.17.1


^ permalink raw reply	[flat|nested] 73+ messages in thread

* [PATCH v2 4/5] vhost: support async dequeue for split ring
  2022-04-11 10:00 ` [PATCH v2 0/5] vhost: " xuan.ding
                     ` (2 preceding siblings ...)
  2022-04-11 10:00   ` [PATCH v2 3/5] vhost: merge sync and async descriptor to mbuf filling xuan.ding
@ 2022-04-11 10:00   ` xuan.ding
  2022-04-11 10:00   ` [PATCH v2 5/5] examples/vhost: support async dequeue data path xuan.ding
  4 siblings, 0 replies; 73+ messages in thread
From: xuan.ding @ 2022-04-11 10:00 UTC (permalink / raw)
  To: maxime.coquelin, chenbo.xia
  Cc: dev, jiayu.hu, cheng1.jiang, sunil.pai.g, liangma, Xuan Ding, Yuan Wang

From: Xuan Ding <xuan.ding@intel.com>

This patch implements the asynchronous dequeue data path for vhost split
ring. A new API, rte_vhost_async_try_dequeue_burst(), is introduced.
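
For illustration, a hedged usage sketch of the new API (it assumes the
async channel and the DMA virtual channel were set up elsewhere; the
helper name and burst size below are made up for the example):

    #include <rte_mbuf.h>
    #include <rte_mempool.h>
    #include <rte_vhost_async.h>

    #define DEQ_BURST 32 /* illustrative burst size */

    static uint16_t
    poll_async_dequeue(int vid, uint16_t queue_id, struct rte_mempool *pool,
                       uint16_t dma_id, uint16_t vchan_id)
    {
        struct rte_mbuf *pkts[DEQ_BURST];
        int nr_inflight;
        uint16_t i, n;

        /* Only packets whose DMA copies are completed are returned; packets
         * still in flight will show up in later calls. */
        n = rte_vhost_async_try_dequeue_burst(vid, queue_id, pool, pkts,
                DEQ_BURST, &nr_inflight, dma_id, vchan_id);
        for (i = 0; i < n; i++)
            rte_pktmbuf_free(pkts[i]); /* or forward to a port */

        return n;
    }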

Signed-off-by: Xuan Ding <xuan.ding@intel.com>
Signed-off-by: Yuan Wang <yuanx.wang@intel.com>
---
 doc/guides/prog_guide/vhost_lib.rst    |   7 +
 doc/guides/rel_notes/release_22_07.rst |   4 +
 lib/vhost/rte_vhost_async.h            |  33 +++
 lib/vhost/version.map                  |   3 +
 lib/vhost/virtio_net.c                 | 335 +++++++++++++++++++++++++
 5 files changed, 382 insertions(+)

diff --git a/doc/guides/prog_guide/vhost_lib.rst b/doc/guides/prog_guide/vhost_lib.rst
index 886f8f5e72..40cf315170 100644
--- a/doc/guides/prog_guide/vhost_lib.rst
+++ b/doc/guides/prog_guide/vhost_lib.rst
@@ -276,6 +276,13 @@ The following is an overview of some key Vhost API functions:
   Clear inflight packets which are submitted to DMA engine in vhost async data
   path. Completed packets are returned to applications through ``pkts``.
 
+* ``rte_vhost_async_try_dequeue_burst(int vid, uint16_t queue_id,
+  struct rte_mempool *mbuf_pool, struct rte_mbuf **pkts, uint16_t count,
+	int *nr_inflight, uint16_t dma_id, uint16_t vchan_id)``
+
+  Receives (dequeues) ``count`` packets from guest to host in async data path,
+  and stores them in ``pkts``.
+
 Vhost-user Implementations
 --------------------------
 
diff --git a/doc/guides/rel_notes/release_22_07.rst b/doc/guides/rel_notes/release_22_07.rst
index 42a5f2d990..422a6673cb 100644
--- a/doc/guides/rel_notes/release_22_07.rst
+++ b/doc/guides/rel_notes/release_22_07.rst
@@ -55,6 +55,10 @@ New Features
      Also, make sure to start the actual text at the margin.
      =======================================================
 
+* **Added vhost async dequeue API to receive pkts from guest.**
+
+  Added vhost async dequeue API which can leverage DMA devices to accelerate
+  receiving pkts from guest.
 
 Removed Items
 -------------
diff --git a/lib/vhost/rte_vhost_async.h b/lib/vhost/rte_vhost_async.h
index f1293c6a9d..23fe1a7316 100644
--- a/lib/vhost/rte_vhost_async.h
+++ b/lib/vhost/rte_vhost_async.h
@@ -187,6 +187,39 @@ uint16_t rte_vhost_clear_queue_thread_unsafe(int vid, uint16_t queue_id,
 __rte_experimental
 int rte_vhost_async_dma_configure(int16_t dma_id, uint16_t vchan_id);
 
+/**
+ * This function tries to receive packets from the guest by offloading
+ * the copies to the async channel. Packets whose copies have completed
+ * are returned in "pkts". Packets whose copies have been submitted to
+ * the async channel but are not yet completed are called "in-flight packets".
+ * This function does not return in-flight packets until their copies are
+ * completed by the async channel.
+ *
+ * @param vid
+ *  ID of vhost device to dequeue data
+ * @param queue_id
+ *  ID of virtqueue to dequeue data
+ * @param mbuf_pool
+ *  Mbuf_pool where host mbuf is allocated
+ * @param pkts
+ *  Blank array to keep successfully dequeued packets
+ * @param count
+ *  Size of the packet array
+ * @param nr_inflight
+ *  The amount of in-flight packets. If error occurred, its value is set to -1.
+ * @param dma_id
+ *  The identifier of DMA device
+ * @param vchan_id
+ *  The identifier of virtual DMA channel
+ * @return
+ *  Number of successfully dequeued packets
+ */
+__rte_experimental
+uint16_t
+rte_vhost_async_try_dequeue_burst(int vid, uint16_t queue_id,
+	struct rte_mempool *mbuf_pool, struct rte_mbuf **pkts, uint16_t count,
+	int *nr_inflight, uint16_t dma_id, uint16_t vchan_id);
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/vhost/version.map b/lib/vhost/version.map
index 0a66c5840c..514e3ff6a6 100644
--- a/lib/vhost/version.map
+++ b/lib/vhost/version.map
@@ -87,6 +87,9 @@ EXPERIMENTAL {
 
 	# added in 22.03
 	rte_vhost_async_dma_configure;
+
+	# added in 22.07
+	rte_vhost_async_try_dequeue_burst;
 };
 
 INTERNAL {
diff --git a/lib/vhost/virtio_net.c b/lib/vhost/virtio_net.c
index 56904ad9a5..514315ef50 100644
--- a/lib/vhost/virtio_net.c
+++ b/lib/vhost/virtio_net.c
@@ -3166,3 +3166,338 @@ rte_vhost_dequeue_burst(int vid, uint16_t queue_id,
 
 	return count;
 }
+
+static __rte_always_inline uint16_t
+async_poll_dequeue_completed_split(struct virtio_net *dev, uint16_t queue_id,
+		struct rte_mbuf **pkts, uint16_t count, uint16_t dma_id,
+		uint16_t vchan_id, bool legacy_ol_flags)
+{
+	uint16_t start_idx, from, i;
+	uint16_t nr_cpl_pkts = 0;
+	struct async_inflight_info *pkts_info;
+	struct vhost_virtqueue *vq = dev->virtqueue[queue_id];
+
+	pkts_info = vq->async->pkts_info;
+
+	vhost_async_dma_check_completed(dev, dma_id, vchan_id, VHOST_DMA_MAX_COPY_COMPLETE);
+
+	start_idx = async_get_first_inflight_pkt_idx(vq);
+
+	from = start_idx;
+	while (vq->async->pkts_cmpl_flag[from] && count--) {
+		vq->async->pkts_cmpl_flag[from] = false;
+		from = (from + 1) & (vq->size - 1);
+		nr_cpl_pkts++;
+	}
+
+	if (nr_cpl_pkts == 0)
+		return 0;
+
+	for (i = 0; i < nr_cpl_pkts; i++) {
+		from = (start_idx + i) & (vq->size - 1);
+		pkts[i] = pkts_info[from].mbuf;
+
+		if (virtio_net_with_host_offload(dev))
+			vhost_dequeue_offload(dev, &pkts_info[from].nethdr, pkts[i],
+					      legacy_ol_flags);
+	}
+
+	/* write back completed descs to used ring and update used idx */
+	write_back_completed_descs_split(vq, nr_cpl_pkts);
+	__atomic_add_fetch(&vq->used->idx, nr_cpl_pkts, __ATOMIC_RELEASE);
+	vhost_vring_call_split(dev, vq);
+
+	vq->async->pkts_inflight_n -= nr_cpl_pkts;
+
+	return nr_cpl_pkts;
+}
+
+static __rte_always_inline uint16_t
+virtio_dev_tx_async_split(struct virtio_net *dev, struct vhost_virtqueue *vq,
+		uint16_t queue_id, struct rte_mempool *mbuf_pool, struct rte_mbuf **pkts,
+		uint16_t count, uint16_t dma_id, uint16_t vchan_id, bool legacy_ol_flags)
+{
+	static bool allocerr_warned;
+	bool dropped = false;
+	uint16_t free_entries;
+	uint16_t pkt_idx, slot_idx = 0;
+	uint16_t nr_done_pkts = 0;
+	uint16_t pkt_err = 0;
+	uint16_t n_xfer;
+	struct vhost_async *async = vq->async;
+	struct async_inflight_info *pkts_info = async->pkts_info;
+	struct rte_mbuf *pkts_prealloc[MAX_PKT_BURST];
+	uint16_t pkts_size = count;
+
+	/**
+	 * The ordering between avail index and
+	 * desc reads needs to be enforced.
+	 */
+	free_entries = __atomic_load_n(&vq->avail->idx, __ATOMIC_ACQUIRE) - vq->last_avail_idx;
+	if (free_entries == 0)
+		goto out;
+
+	rte_prefetch0(&vq->avail->ring[vq->last_avail_idx & (vq->size - 1)]);
+
+	async_iter_reset(async);
+
+	count = RTE_MIN(count, MAX_PKT_BURST);
+	count = RTE_MIN(count, free_entries);
+	VHOST_LOG_DATA(DEBUG, "(%s) about to dequeue %u buffers\n", dev->ifname, count);
+
+	if (rte_pktmbuf_alloc_bulk(mbuf_pool, pkts_prealloc, count))
+		goto out;
+
+	for (pkt_idx = 0; pkt_idx < count; pkt_idx++) {
+		uint16_t head_idx = 0;
+		uint16_t nr_vec = 0;
+		uint16_t to;
+		uint32_t buf_len;
+		int err;
+		struct buf_vector buf_vec[BUF_VECTOR_MAX];
+		struct rte_mbuf *pkt = pkts_prealloc[pkt_idx];
+
+		if (unlikely(fill_vec_buf_split(dev, vq, vq->last_avail_idx,
+						&nr_vec, buf_vec,
+						&head_idx, &buf_len,
+						VHOST_ACCESS_RO) < 0)) {
+			dropped = true;
+			break;
+		}
+
+		err = virtio_dev_pktmbuf_prep(dev, pkt, buf_len);
+		if (unlikely(err)) {
+			/**
+			 * mbuf allocation fails for jumbo packets when external
+			 * buffer allocation is not allowed and linear buffer
+			 * is required. Drop this packet.
+			 */
+			if (!allocerr_warned) {
+				VHOST_LOG_DATA(ERR,
+					"(%s) %s: Failed mbuf alloc of size %d from %s\n",
+					dev->ifname, __func__, buf_len, mbuf_pool->name);
+				allocerr_warned = true;
+			}
+			dropped = true;
+			break;
+		}
+
+		slot_idx = (async->pkts_idx + pkt_idx) & (vq->size - 1);
+		err = desc_to_mbuf(dev, vq, buf_vec, nr_vec, pkt, mbuf_pool,
+					legacy_ol_flags, slot_idx, true);
+		if (unlikely(err)) {
+			if (!allocerr_warned) {
+				VHOST_LOG_DATA(ERR,
+					"(%s) %s: Failed to offload copies to async channel.\n",
+					dev->ifname, __func__);
+				allocerr_warned = true;
+			}
+			dropped = true;
+			break;
+		}
+
+		pkts_info[slot_idx].mbuf = pkt;
+
+		/* store used descs */
+		to = async->desc_idx_split & (vq->size - 1);
+		async->descs_split[to].id = head_idx;
+		async->descs_split[to].len = 0;
+		async->desc_idx_split++;
+
+		vq->last_avail_idx++;
+	}
+
+	if (unlikely(dropped))
+		rte_pktmbuf_free_bulk(&pkts_prealloc[pkt_idx], count - pkt_idx);
+
+	n_xfer = vhost_async_dma_transfer(dev, vq, dma_id, vchan_id, async->pkts_idx,
+					  async->iov_iter, pkt_idx);
+
+	async->pkts_inflight_n += n_xfer;
+
+	pkt_err = pkt_idx - n_xfer;
+	if (unlikely(pkt_err)) {
+		VHOST_LOG_DATA(DEBUG,
+			"(%s) %s: failed to transfer data for queue id %d.\n",
+			dev->ifname, __func__, queue_id);
+
+		pkt_idx = n_xfer;
+		/* recover available ring */
+		vq->last_avail_idx -= pkt_err;
+
+		/**
+		 * recover async channel copy related structures and free pktmbufs
+		 * for error pkts.
+		 */
+		async->desc_idx_split -= pkt_err;
+		while (pkt_err-- > 0) {
+			rte_pktmbuf_free(pkts_info[slot_idx & (vq->size - 1)].mbuf);
+			slot_idx--;
+		}
+	}
+
+	async->pkts_idx += pkt_idx;
+	if (async->pkts_idx >= vq->size)
+		async->pkts_idx -= vq->size;
+
+out:
+	/* DMA device may serve other queues, unconditionally check completed. */
+	nr_done_pkts = async_poll_dequeue_completed_split(dev, queue_id, pkts, pkts_size,
+							  dma_id, vchan_id, legacy_ol_flags);
+
+	return nr_done_pkts;
+}
+
+__rte_noinline
+static uint16_t
+virtio_dev_tx_async_split_legacy(struct virtio_net *dev,
+		struct vhost_virtqueue *vq, uint16_t queue_id,
+		struct rte_mempool *mbuf_pool, struct rte_mbuf **pkts,
+		uint16_t count, uint16_t dma_id, uint16_t vchan_id)
+{
+	return virtio_dev_tx_async_split(dev, vq, queue_id, mbuf_pool,
+				pkts, count, dma_id, vchan_id, true);
+}
+
+__rte_noinline
+static uint16_t
+virtio_dev_tx_async_split_compliant(struct virtio_net *dev,
+		struct vhost_virtqueue *vq, uint16_t queue_id,
+		struct rte_mempool *mbuf_pool, struct rte_mbuf **pkts,
+		uint16_t count, uint16_t dma_id, uint16_t vchan_id)
+{
+	return virtio_dev_tx_async_split(dev, vq, queue_id, mbuf_pool,
+				pkts, count, dma_id, vchan_id, false);
+}
+
+uint16_t
+rte_vhost_async_try_dequeue_burst(int vid, uint16_t queue_id,
+	struct rte_mempool *mbuf_pool, struct rte_mbuf **pkts, uint16_t count,
+	int *nr_inflight, uint16_t dma_id, uint16_t vchan_id)
+{
+	struct virtio_net *dev;
+	struct rte_mbuf *rarp_mbuf = NULL;
+	struct vhost_virtqueue *vq;
+	int16_t success = 1;
+
+	*nr_inflight = -1;
+
+	dev = get_device(vid);
+	if (!dev)
+		return 0;
+
+	if (unlikely(!(dev->flags & VIRTIO_DEV_BUILTIN_VIRTIO_NET))) {
+		VHOST_LOG_DATA(ERR,
+			"(%s) %s: built-in vhost net backend is disabled.\n",
+			dev->ifname, __func__);
+		return 0;
+	}
+
+	if (unlikely(!is_valid_virt_queue_idx(queue_id, 1, dev->nr_vring))) {
+		VHOST_LOG_DATA(ERR,
+			"(%s) %s: invalid virtqueue idx %d.\n",
+			dev->ifname, __func__, queue_id);
+		return 0;
+	}
+
+	if (unlikely(!dma_copy_track[dma_id].vchans ||
+				!dma_copy_track[dma_id].vchans[vchan_id].pkts_cmpl_flag_addr)) {
+		VHOST_LOG_DATA(ERR, "(%s) %s: invalid channel %d:%u.\n", dev->ifname, __func__,
+			       dma_id, vchan_id);
+		return 0;
+	}
+
+	vq = dev->virtqueue[queue_id];
+
+	if (unlikely(rte_spinlock_trylock(&vq->access_lock) == 0))
+		return 0;
+
+	if (unlikely(vq->enabled == 0)) {
+		count = 0;
+		goto out_access_unlock;
+	}
+
+	if (unlikely(!vq->async)) {
+		VHOST_LOG_DATA(ERR, "(%s) %s: async not registered for queue id %d.\n",
+			dev->ifname, __func__, queue_id);
+		count = 0;
+		goto out_access_unlock;
+	}
+
+	if (dev->features & (1ULL << VIRTIO_F_IOMMU_PLATFORM))
+		vhost_user_iotlb_rd_lock(vq);
+
+	if (unlikely(vq->access_ok == 0))
+		if (unlikely(vring_translate(dev, vq) < 0)) {
+			count = 0;
+			goto out;
+		}
+
+	/*
+	 * Construct a RARP broadcast packet, and inject it to the "pkts"
+	 * array, so it looks like the guest actually sent such a packet.
+	 *
+	 * Check user_send_rarp() for more information.
+	 *
+	 * broadcast_rarp shares a cacheline in the virtio_net structure
+	 * with some fields that are accessed during enqueue and
+	 * __atomic_compare_exchange_n causes a write if performed compare
+	 * and exchange. This could result in false sharing between enqueue
+	 * and dequeue.
+	 *
+	 * Prevent unnecessary false sharing by reading broadcast_rarp first
+	 * and only performing compare and exchange if the read indicates it
+	 * is likely to be set.
+	 */
+	if (unlikely(__atomic_load_n(&dev->broadcast_rarp, __ATOMIC_ACQUIRE) &&
+			__atomic_compare_exchange_n(&dev->broadcast_rarp,
+			&success, 0, 0, __ATOMIC_RELEASE, __ATOMIC_RELAXED))) {
+
+		rarp_mbuf = rte_net_make_rarp_packet(mbuf_pool, &dev->mac);
+		if (rarp_mbuf == NULL) {
+			VHOST_LOG_DATA(ERR, "Failed to make RARP packet.\n");
+			count = 0;
+			goto out;
+		}
+		/*
+		 * Inject it to the head of "pkts" array, so that switch's mac
+		 * learning table will get updated first.
+		 */
+		pkts[0] = rarp_mbuf;
+		pkts++;
+		count -= 1;
+	}
+
+	if (unlikely(vq_is_packed(dev))) {
+		static bool not_support_pack_log;
+		if (!not_support_pack_log) {
+			VHOST_LOG_DATA(ERR,
+				"(%s) %s: async dequeue does not support packed ring.\n",
+				dev->ifname, __func__);
+			not_support_pack_log = true;
+		}
+		count = 0;
+		goto out;
+	}
+
+	if (dev->flags & VIRTIO_DEV_LEGACY_OL_FLAGS)
+		count = virtio_dev_tx_async_split_legacy(dev, vq, queue_id,
+				mbuf_pool, pkts, count, dma_id, vchan_id);
+	else
+		count = virtio_dev_tx_async_split_compliant(dev, vq, queue_id,
+				mbuf_pool, pkts, count, dma_id, vchan_id);
+
+	*nr_inflight = vq->async->pkts_inflight_n;
+
+out:
+	if (dev->features & (1ULL << VIRTIO_F_IOMMU_PLATFORM))
+		vhost_user_iotlb_rd_unlock(vq);
+
+out_access_unlock:
+	rte_spinlock_unlock(&vq->access_lock);
+
+	if (unlikely(rarp_mbuf != NULL))
+		count += 1;
+
+	return count;
+}
-- 
2.17.1


^ permalink raw reply	[flat|nested] 73+ messages in thread

* [PATCH v2 5/5] examples/vhost: support async dequeue data path
  2022-04-11 10:00 ` [PATCH v2 0/5] vhost: " xuan.ding
                     ` (3 preceding siblings ...)
  2022-04-11 10:00   ` [PATCH v2 4/5] vhost: support async dequeue for split ring xuan.ding
@ 2022-04-11 10:00   ` xuan.ding
  4 siblings, 0 replies; 73+ messages in thread
From: xuan.ding @ 2022-04-11 10:00 UTC (permalink / raw)
  To: maxime.coquelin, chenbo.xia
  Cc: dev, jiayu.hu, cheng1.jiang, sunil.pai.g, liangma, Xuan Ding,
	Wenwu Ma, Yuan Wang

From: Xuan Ding <xuan.ding@intel.com>

This patch adds a use case for the async dequeue API. The vswitch can
leverage DMA devices to accelerate the vhost async dequeue path.
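
As context, a simplified sketch of the dispatch scheme the example uses
(hypothetical, reduced types; the real selection happens in
init_vhost_queue_ops() in the diff below):

    #include <stdbool.h>
    #include <stdint.h>

    /* Per-device callback, chosen once when the device is added, depending
     * on whether an rxd DMA channel was bound to its socket via --dmas. */
    typedef uint16_t (*deq_burst_sketch_t)(int vid, uint16_t queue_id,
                                           void **pkts, uint16_t count);

    struct queue_ops_sketch {
        deq_burst_sketch_t dequeue;
    };

    static void
    pick_dequeue_ops(struct queue_ops_sketch *ops, bool dma_bound,
                     deq_burst_sketch_t async_deq, deq_burst_sketch_t sync_deq)
    {
        ops->dequeue = dma_bound ? async_deq : sync_deq;
    }

Resolving the async/sync choice once per device keeps the per-burst data
path free of extra branching.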

Signed-off-by: Wenwu Ma <wenwux.ma@intel.com>
Signed-off-by: Yuan Wang <yuanx.wang@intel.com>
Signed-off-by: Xuan Ding <xuan.ding@intel.com>
---
 doc/guides/sample_app_ug/vhost.rst |   9 +-
 examples/vhost/main.c              | 292 ++++++++++++++++++++---------
 examples/vhost/main.h              |  35 +++-
 examples/vhost/virtio_net.c        |  16 +-
 4 files changed, 254 insertions(+), 98 deletions(-)

diff --git a/doc/guides/sample_app_ug/vhost.rst b/doc/guides/sample_app_ug/vhost.rst
index a6ce4bc8ac..09db965e70 100644
--- a/doc/guides/sample_app_ug/vhost.rst
+++ b/doc/guides/sample_app_ug/vhost.rst
@@ -169,9 +169,12 @@ demonstrates how to use the async vhost APIs. It's used in combination with dmas
 **--dmas**
 This parameter is used to specify the assigned DMA device of a vhost device.
 Async vhost-user net driver will be used if --dmas is set. For example
---dmas [txd0@00:04.0,txd1@00:04.1] means use DMA channel 00:04.0 for vhost
-device 0 enqueue operation and use DMA channel 00:04.1 for vhost device 1
-enqueue operation.
+--dmas [txd0@00:04.0,txd1@00:04.1,rxd0@00:04.2,rxd1@00:04.3] means use
+DMA channels 00:04.0/00:04.2 for vhost device 0 enqueue/dequeue operations
+and DMA channels 00:04.1/00:04.3 for vhost device 1 enqueue/dequeue
+operations. The device index follows the order of the socket files:
+vhost device 0 is created through the first socket file, vhost device 1
+through the second socket file, and so on.
 
 Common Issues
 -------------
diff --git a/examples/vhost/main.c b/examples/vhost/main.c
index d94fabb060..d26e40ab73 100644
--- a/examples/vhost/main.c
+++ b/examples/vhost/main.c
@@ -63,6 +63,9 @@
 
 #define DMA_RING_SIZE 4096
 
+#define ASYNC_ENQUEUE_VHOST 1
+#define ASYNC_DEQUEUE_VHOST 2
+
 /* number of mbufs in all pools - if specified on command-line. */
 static int total_num_mbufs = NUM_MBUFS_DEFAULT;
 
@@ -116,6 +119,8 @@ static uint32_t burst_rx_retry_num = BURST_RX_RETRIES;
 static char *socket_files;
 static int nb_sockets;
 
+static struct vhost_queue_ops vdev_queue_ops[RTE_MAX_VHOST_DEVICE];
+
 /* empty VMDq configuration structure. Filled in programmatically */
 static struct rte_eth_conf vmdq_conf_default = {
 	.rxmode = {
@@ -205,6 +210,18 @@ struct vhost_bufftable *vhost_txbuff[RTE_MAX_LCORE * RTE_MAX_VHOST_DEVICE];
 #define MBUF_TABLE_DRAIN_TSC	((rte_get_tsc_hz() + US_PER_S - 1) \
 				 / US_PER_S * BURST_TX_DRAIN_US)
 
+static int vid2socketid[RTE_MAX_VHOST_DEVICE];
+
+static uint32_t get_async_flag_by_socketid(int socketid)
+{
+	return dma_bind[socketid].async_flag;
+}
+
+static void init_vid2socketid_array(int vid, int socketid)
+{
+	vid2socketid[vid] = socketid;
+}
+
 static inline bool
 is_dma_configured(int16_t dev_id)
 {
@@ -224,7 +241,7 @@ open_dma(const char *value)
 	char *addrs = input;
 	char *ptrs[2];
 	char *start, *end, *substr;
-	int64_t vid;
+	int64_t socketid, vring_id;
 
 	struct rte_dma_info info;
 	struct rte_dma_conf dev_config = { .nb_vchans = 1 };
@@ -262,7 +279,9 @@ open_dma(const char *value)
 
 	while (i < args_nr) {
 		char *arg_temp = dma_arg[i];
+		char *txd, *rxd;
 		uint8_t sub_nr;
+		int async_flag;
 
 		sub_nr = rte_strsplit(arg_temp, strlen(arg_temp), ptrs, 2, '@');
 		if (sub_nr != 2) {
@@ -270,14 +289,23 @@ open_dma(const char *value)
 			goto out;
 		}
 
-		start = strstr(ptrs[0], "txd");
-		if (start == NULL) {
+		txd = strstr(ptrs[0], "txd");
+		rxd = strstr(ptrs[0], "rxd");
+		if (txd) {
+			start = txd;
+			vring_id = VIRTIO_RXQ;
+			async_flag = ASYNC_ENQUEUE_VHOST;
+		} else if (rxd) {
+			start = rxd;
+			vring_id = VIRTIO_TXQ;
+			async_flag = ASYNC_DEQUEUE_VHOST;
+		} else {
 			ret = -1;
 			goto out;
 		}
 
 		start += 3;
-		vid = strtol(start, &end, 0);
+		socketid = strtol(start, &end, 0);
 		if (end == start) {
 			ret = -1;
 			goto out;
@@ -338,7 +366,8 @@ open_dma(const char *value)
 		dmas_id[dma_count++] = dev_id;
 
 done:
-		(dma_info + vid)->dmas[VIRTIO_RXQ].dev_id = dev_id;
+		(dma_info + socketid)->dmas[vring_id].dev_id = dev_id;
+		(dma_info + socketid)->async_flag |= async_flag;
 		i++;
 	}
 out:
@@ -990,13 +1019,13 @@ complete_async_pkts(struct vhost_dev *vdev)
 {
 	struct rte_mbuf *p_cpl[MAX_PKT_BURST];
 	uint16_t complete_count;
-	int16_t dma_id = dma_bind[vdev->vid].dmas[VIRTIO_RXQ].dev_id;
+	int16_t dma_id = dma_bind[vid2socketid[vdev->vid]].dmas[VIRTIO_RXQ].dev_id;
 
 	complete_count = rte_vhost_poll_enqueue_completed(vdev->vid,
 					VIRTIO_RXQ, p_cpl, MAX_PKT_BURST, dma_id, 0);
 	if (complete_count) {
 		free_pkts(p_cpl, complete_count);
-		__atomic_sub_fetch(&vdev->pkts_inflight, complete_count, __ATOMIC_SEQ_CST);
+		__atomic_sub_fetch(&vdev->pkts_enq_inflight, complete_count, __ATOMIC_SEQ_CST);
 	}
 
 }
@@ -1031,23 +1060,7 @@ drain_vhost(struct vhost_dev *vdev)
 	uint16_t nr_xmit = vhost_txbuff[buff_idx]->len;
 	struct rte_mbuf **m = vhost_txbuff[buff_idx]->m_table;
 
-	if (builtin_net_driver) {
-		ret = vs_enqueue_pkts(vdev, VIRTIO_RXQ, m, nr_xmit);
-	} else if (dma_bind[vdev->vid].dmas[VIRTIO_RXQ].async_enabled) {
-		uint16_t enqueue_fail = 0;
-		int16_t dma_id = dma_bind[vdev->vid].dmas[VIRTIO_RXQ].dev_id;
-
-		complete_async_pkts(vdev);
-		ret = rte_vhost_submit_enqueue_burst(vdev->vid, VIRTIO_RXQ, m, nr_xmit, dma_id, 0);
-		__atomic_add_fetch(&vdev->pkts_inflight, ret, __ATOMIC_SEQ_CST);
-
-		enqueue_fail = nr_xmit - ret;
-		if (enqueue_fail)
-			free_pkts(&m[ret], nr_xmit - ret);
-	} else {
-		ret = rte_vhost_enqueue_burst(vdev->vid, VIRTIO_RXQ,
-						m, nr_xmit);
-	}
+	ret = vdev_queue_ops[vdev->vid].enqueue_pkt_burst(vdev, VIRTIO_RXQ, m, nr_xmit);
 
 	if (enable_stats) {
 		__atomic_add_fetch(&vdev->stats.rx_total_atomic, nr_xmit,
@@ -1056,7 +1069,7 @@ drain_vhost(struct vhost_dev *vdev)
 				__ATOMIC_SEQ_CST);
 	}
 
-	if (!dma_bind[vdev->vid].dmas[VIRTIO_RXQ].async_enabled)
+	if (!dma_bind[vid2socketid[vdev->vid]].dmas[VIRTIO_RXQ].async_enabled)
 		free_pkts(m, nr_xmit);
 }
 
@@ -1328,6 +1341,33 @@ drain_mbuf_table(struct mbuf_table *tx_q)
 	}
 }
 
+uint16_t
+async_enqueue_pkts(struct vhost_dev *dev, uint16_t queue_id,
+		struct rte_mbuf **pkts, uint32_t rx_count)
+{
+	uint16_t enqueue_count;
+	uint16_t enqueue_fail = 0;
+	uint16_t dma_id = dma_bind[vid2socketid[dev->vid]].dmas[VIRTIO_RXQ].dev_id;
+
+	complete_async_pkts(dev);
+	enqueue_count = rte_vhost_submit_enqueue_burst(dev->vid, queue_id,
+					pkts, rx_count, dma_id, 0);
+	__atomic_add_fetch(&dev->pkts_enq_inflight, enqueue_count, __ATOMIC_SEQ_CST);
+
+	enqueue_fail = rx_count - enqueue_count;
+	if (enqueue_fail)
+		free_pkts(&pkts[enqueue_count], enqueue_fail);
+
+	return enqueue_count;
+}
+
+uint16_t
+sync_enqueue_pkts(struct vhost_dev *dev, uint16_t queue_id,
+		struct rte_mbuf **pkts, uint32_t rx_count)
+{
+	return rte_vhost_enqueue_burst(dev->vid, queue_id, pkts, rx_count);
+}
+
 static __rte_always_inline void
 drain_eth_rx(struct vhost_dev *vdev)
 {
@@ -1358,26 +1398,8 @@ drain_eth_rx(struct vhost_dev *vdev)
 		}
 	}
 
-	if (builtin_net_driver) {
-		enqueue_count = vs_enqueue_pkts(vdev, VIRTIO_RXQ,
-						pkts, rx_count);
-	} else if (dma_bind[vdev->vid].dmas[VIRTIO_RXQ].async_enabled) {
-		uint16_t enqueue_fail = 0;
-		int16_t dma_id = dma_bind[vdev->vid].dmas[VIRTIO_RXQ].dev_id;
-
-		complete_async_pkts(vdev);
-		enqueue_count = rte_vhost_submit_enqueue_burst(vdev->vid,
-					VIRTIO_RXQ, pkts, rx_count, dma_id, 0);
-		__atomic_add_fetch(&vdev->pkts_inflight, enqueue_count, __ATOMIC_SEQ_CST);
-
-		enqueue_fail = rx_count - enqueue_count;
-		if (enqueue_fail)
-			free_pkts(&pkts[enqueue_count], enqueue_fail);
-
-	} else {
-		enqueue_count = rte_vhost_enqueue_burst(vdev->vid, VIRTIO_RXQ,
-						pkts, rx_count);
-	}
+	enqueue_count = vdev_queue_ops[vdev->vid].enqueue_pkt_burst(vdev,
+					VIRTIO_RXQ, pkts, rx_count);
 
 	if (enable_stats) {
 		__atomic_add_fetch(&vdev->stats.rx_total_atomic, rx_count,
@@ -1386,10 +1408,33 @@ drain_eth_rx(struct vhost_dev *vdev)
 				__ATOMIC_SEQ_CST);
 	}
 
-	if (!dma_bind[vdev->vid].dmas[VIRTIO_RXQ].async_enabled)
+	if (!dma_bind[vid2socketid[vdev->vid]].dmas[VIRTIO_RXQ].async_enabled)
 		free_pkts(pkts, rx_count);
 }
 
+uint16_t async_dequeue_pkts(struct vhost_dev *dev, uint16_t queue_id,
+			    struct rte_mempool *mbuf_pool,
+			    struct rte_mbuf **pkts, uint16_t count)
+{
+	int nr_inflight;
+	uint16_t dequeue_count;
+	uint16_t dma_id = dma_bind[vid2socketid[dev->vid]].dmas[VIRTIO_TXQ].dev_id;
+
+	dequeue_count = rte_vhost_async_try_dequeue_burst(dev->vid, queue_id,
+			mbuf_pool, pkts, count, &nr_inflight, dma_id, 0);
+	if (likely(nr_inflight != -1))
+		dev->pkts_deq_inflight = nr_inflight;
+
+	return dequeue_count;
+}
+
+uint16_t sync_dequeue_pkts(struct vhost_dev *dev, uint16_t queue_id,
+			   struct rte_mempool *mbuf_pool,
+			   struct rte_mbuf **pkts, uint16_t count)
+{
+	return rte_vhost_dequeue_burst(dev->vid, queue_id, mbuf_pool, pkts, count);
+}
+
 static __rte_always_inline void
 drain_virtio_tx(struct vhost_dev *vdev)
 {
@@ -1397,13 +1442,8 @@ drain_virtio_tx(struct vhost_dev *vdev)
 	uint16_t count;
 	uint16_t i;
 
-	if (builtin_net_driver) {
-		count = vs_dequeue_pkts(vdev, VIRTIO_TXQ, mbuf_pool,
-					pkts, MAX_PKT_BURST);
-	} else {
-		count = rte_vhost_dequeue_burst(vdev->vid, VIRTIO_TXQ,
-					mbuf_pool, pkts, MAX_PKT_BURST);
-	}
+	count = vdev_queue_ops[vdev->vid].dequeue_pkt_burst(vdev,
+				VIRTIO_TXQ, mbuf_pool, pkts, MAX_PKT_BURST);
 
 	/* setup VMDq for the first packet */
 	if (unlikely(vdev->ready == DEVICE_MAC_LEARNING) && count) {
@@ -1482,6 +1522,31 @@ switch_worker(void *arg __rte_unused)
 	return 0;
 }
 
+static void
+vhost_clear_queue_thread_unsafe(struct vhost_dev *vdev, uint16_t queue_id)
+{
+	uint16_t n_pkt = 0;
+	uint16_t dma_id = dma_bind[vid2socketid[vdev->vid]].dmas[queue_id].dev_id;
+	struct rte_mbuf *m_enq_cpl[vdev->pkts_enq_inflight];
+	struct rte_mbuf *m_deq_cpl[vdev->pkts_deq_inflight];
+
+	if (queue_id % 2 == 0) {
+		while (vdev->pkts_enq_inflight) {
+			n_pkt = rte_vhost_clear_queue_thread_unsafe(vdev->vid,
+				queue_id, m_enq_cpl, vdev->pkts_enq_inflight, dma_id, 0);
+			free_pkts(m_enq_cpl, n_pkt);
+			__atomic_sub_fetch(&vdev->pkts_enq_inflight, n_pkt, __ATOMIC_SEQ_CST);
+		}
+	} else {
+		while (vdev->pkts_deq_inflight) {
+			n_pkt = rte_vhost_clear_queue_thread_unsafe(vdev->vid,
+				queue_id, m_deq_cpl, vdev->pkts_deq_inflight, dma_id, 0);
+			free_pkts(m_deq_cpl, n_pkt);
+			__atomic_sub_fetch(&vdev->pkts_deq_inflight, n_pkt, __ATOMIC_SEQ_CST);
+		}
+	}
+}
+
 /*
  * Remove a device from the specific data core linked list and from the
  * main linked list. Synchronization  occurs through the use of the
@@ -1538,25 +1603,78 @@ destroy_device(int vid)
 		"(%d) device has been removed from data core\n",
 		vdev->vid);
 
-	if (dma_bind[vid].dmas[VIRTIO_RXQ].async_enabled) {
-		uint16_t n_pkt = 0;
-		int16_t dma_id = dma_bind[vid].dmas[VIRTIO_RXQ].dev_id;
-		struct rte_mbuf *m_cpl[vdev->pkts_inflight];
-
-		while (vdev->pkts_inflight) {
-			n_pkt = rte_vhost_clear_queue_thread_unsafe(vid, VIRTIO_RXQ,
-						m_cpl, vdev->pkts_inflight, dma_id, 0);
-			free_pkts(m_cpl, n_pkt);
-			__atomic_sub_fetch(&vdev->pkts_inflight, n_pkt, __ATOMIC_SEQ_CST);
-		}
-
+	if (dma_bind[vid2socketid[vid]].dmas[VIRTIO_RXQ].async_enabled) {
+		vhost_clear_queue_thread_unsafe(vdev, VIRTIO_RXQ);
 		rte_vhost_async_channel_unregister(vid, VIRTIO_RXQ);
-		dma_bind[vid].dmas[VIRTIO_RXQ].async_enabled = false;
+		dma_bind[vid2socketid[vid]].dmas[VIRTIO_RXQ].async_enabled = false;
+	}
+	if (dma_bind[vid2socketid[vid]].dmas[VIRTIO_TXQ].async_enabled) {
+		vhost_clear_queue_thread_unsafe(vdev, VIRTIO_TXQ);
+		rte_vhost_async_channel_unregister(vid, VIRTIO_TXQ);
+		dma_bind[vid2socketid[vid]].dmas[VIRTIO_TXQ].async_enabled = false;
 	}
 
 	rte_free(vdev);
 }
 
+static int
+get_socketid_by_vid(int vid)
+{
+	int i;
+	char ifname[PATH_MAX];
+	rte_vhost_get_ifname(vid, ifname, sizeof(ifname));
+
+	for (i = 0; i < nb_sockets; i++) {
+		char *file = socket_files + i * PATH_MAX;
+		if (strcmp(file, ifname) == 0)
+			return i;
+	}
+
+	return -1;
+}
+
+static int
+init_vhost_queue_ops(int vid)
+{
+	if (builtin_net_driver) {
+		vdev_queue_ops[vid].enqueue_pkt_burst = builtin_enqueue_pkts;
+		vdev_queue_ops[vid].dequeue_pkt_burst = builtin_dequeue_pkts;
+	} else {
+		if (dma_bind[vid2socketid[vid]].dmas[VIRTIO_RXQ].async_enabled)
+			vdev_queue_ops[vid].enqueue_pkt_burst = async_enqueue_pkts;
+		else
+			vdev_queue_ops[vid].enqueue_pkt_burst = sync_enqueue_pkts;
+
+		if (dma_bind[vid2socketid[vid]].dmas[VIRTIO_TXQ].async_enabled)
+			vdev_queue_ops[vid].dequeue_pkt_burst = async_dequeue_pkts;
+		else
+			vdev_queue_ops[vid].dequeue_pkt_burst = sync_dequeue_pkts;
+	}
+
+	return 0;
+}
+
+static int
+vhost_async_channel_register(int vid)
+{
+	int rx_ret = 0, tx_ret = 0;
+
+	if (dma_bind[vid2socketid[vid]].dmas[VIRTIO_RXQ].dev_id != INVALID_DMA_ID) {
+		rx_ret = rte_vhost_async_channel_register(vid, VIRTIO_RXQ);
+		if (rx_ret == 0)
+			dma_bind[vid2socketid[vid]].dmas[VIRTIO_RXQ].async_enabled = true;
+	}
+
+	if (dma_bind[vid2socketid[vid]].dmas[VIRTIO_TXQ].dev_id != INVALID_DMA_ID) {
+		tx_ret = rte_vhost_async_channel_register(vid, VIRTIO_TXQ);
+		if (tx_ret == 0)
+			dma_bind[vid2socketid[vid]].dmas[VIRTIO_TXQ].async_enabled = true;
+	}
+
+	return rx_ret | tx_ret;
+}
+
+
 /*
  * A new device is added to a data core. First the device is added to the main linked list
  * and then allocated to a specific data core.
@@ -1568,6 +1686,8 @@ new_device(int vid)
 	uint16_t i;
 	uint32_t device_num_min = num_devices;
 	struct vhost_dev *vdev;
+	int ret;
+
 	vdev = rte_zmalloc("vhost device", sizeof(*vdev), RTE_CACHE_LINE_SIZE);
 	if (vdev == NULL) {
 		RTE_LOG(INFO, VHOST_DATA,
@@ -1590,6 +1710,17 @@ new_device(int vid)
 		}
 	}
 
+	int socketid = get_socketid_by_vid(vid);
+	if (socketid == -1)
+		return -1;
+
+	init_vid2socketid_array(vid, socketid);
+
+	ret =  vhost_async_channel_register(vid);
+
+	if (init_vhost_queue_ops(vid) != 0)
+		return -1;
+
 	if (builtin_net_driver)
 		vs_vhost_net_setup(vdev);
 
@@ -1621,16 +1752,7 @@ new_device(int vid)
 		"(%d) device has been added to data core %d\n",
 		vid, vdev->coreid);
 
-	if (dma_bind[vid].dmas[VIRTIO_RXQ].dev_id != INVALID_DMA_ID) {
-		int ret;
-
-		ret = rte_vhost_async_channel_register(vid, VIRTIO_RXQ);
-		if (ret == 0)
-			dma_bind[vid].dmas[VIRTIO_RXQ].async_enabled = true;
-		return ret;
-	}
-
-	return 0;
+	return ret;
 }
 
 static int
@@ -1648,19 +1770,9 @@ vring_state_changed(int vid, uint16_t queue_id, int enable)
 	if (queue_id != VIRTIO_RXQ)
 		return 0;
 
-	if (dma_bind[vid].dmas[queue_id].async_enabled) {
-		if (!enable) {
-			uint16_t n_pkt = 0;
-			int16_t dma_id = dma_bind[vid].dmas[VIRTIO_RXQ].dev_id;
-			struct rte_mbuf *m_cpl[vdev->pkts_inflight];
-
-			while (vdev->pkts_inflight) {
-				n_pkt = rte_vhost_clear_queue_thread_unsafe(vid, queue_id,
-							m_cpl, vdev->pkts_inflight, dma_id, 0);
-				free_pkts(m_cpl, n_pkt);
-				__atomic_sub_fetch(&vdev->pkts_inflight, n_pkt, __ATOMIC_SEQ_CST);
-			}
-		}
+	if (dma_bind[vid2socketid[vid]].dmas[queue_id].async_enabled) {
+		if (!enable)
+			vhost_clear_queue_thread_unsafe(vdev, queue_id);
 	}
 
 	return 0;
@@ -1885,7 +1997,7 @@ main(int argc, char *argv[])
 	for (i = 0; i < nb_sockets; i++) {
 		char *file = socket_files + i * PATH_MAX;
 
-		if (dma_count)
+		if (dma_count && get_async_flag_by_socketid(i) != 0)
 			flags = flags | RTE_VHOST_USER_ASYNC_COPY;
 
 		ret = rte_vhost_driver_register(file, flags);
diff --git a/examples/vhost/main.h b/examples/vhost/main.h
index b4a453e77e..40ac2841d1 100644
--- a/examples/vhost/main.h
+++ b/examples/vhost/main.h
@@ -52,7 +52,8 @@ struct vhost_dev {
 	uint64_t features;
 	size_t hdr_len;
 	uint16_t nr_vrings;
-	uint16_t pkts_inflight;
+	uint16_t pkts_enq_inflight;
+	uint16_t pkts_deq_inflight;
 	struct rte_vhost_memory *mem;
 	struct device_statistics stats;
 	TAILQ_ENTRY(vhost_dev) global_vdev_entry;
@@ -62,6 +63,19 @@ struct vhost_dev {
 	struct vhost_queue queues[MAX_QUEUE_PAIRS * 2];
 } __rte_cache_aligned;
 
+typedef uint16_t (*vhost_enqueue_burst_t)(struct vhost_dev *dev,
+			uint16_t queue_id, struct rte_mbuf **pkts,
+			uint32_t count);
+
+typedef uint16_t (*vhost_dequeue_burst_t)(struct vhost_dev *dev,
+			uint16_t queue_id, struct rte_mempool *mbuf_pool,
+			struct rte_mbuf **pkts, uint16_t count);
+
+struct vhost_queue_ops {
+	vhost_enqueue_burst_t enqueue_pkt_burst;
+	vhost_dequeue_burst_t dequeue_pkt_burst;
+};
+
 TAILQ_HEAD(vhost_dev_tailq_list, vhost_dev);
 
 
@@ -88,6 +102,7 @@ struct dma_info {
 
 struct dma_for_vhost {
 	struct dma_info dmas[RTE_MAX_QUEUES_PER_PORT * 2];
+	uint32_t async_flag;
 };
 
 /* we implement non-extra virtio net features */
@@ -98,7 +113,19 @@ void vs_vhost_net_remove(struct vhost_dev *dev);
 uint16_t vs_enqueue_pkts(struct vhost_dev *dev, uint16_t queue_id,
 			 struct rte_mbuf **pkts, uint32_t count);
 
-uint16_t vs_dequeue_pkts(struct vhost_dev *dev, uint16_t queue_id,
-			 struct rte_mempool *mbuf_pool,
-			 struct rte_mbuf **pkts, uint16_t count);
+uint16_t builtin_enqueue_pkts(struct vhost_dev *dev, uint16_t queue_id,
+			struct rte_mbuf **pkts, uint32_t count);
+uint16_t builtin_dequeue_pkts(struct vhost_dev *dev, uint16_t queue_id,
+			struct rte_mempool *mbuf_pool,
+			struct rte_mbuf **pkts, uint16_t count);
+uint16_t sync_enqueue_pkts(struct vhost_dev *dev, uint16_t queue_id,
+			 struct rte_mbuf **pkts, uint32_t count);
+uint16_t sync_dequeue_pkts(struct vhost_dev *dev, uint16_t queue_id,
+			struct rte_mempool *mbuf_pool,
+			struct rte_mbuf **pkts, uint16_t count);
+uint16_t async_enqueue_pkts(struct vhost_dev *dev, uint16_t queue_id,
+			 struct rte_mbuf **pkts, uint32_t count);
+uint16_t async_dequeue_pkts(struct vhost_dev *dev, uint16_t queue_id,
+			struct rte_mempool *mbuf_pool,
+			struct rte_mbuf **pkts, uint16_t count);
 #endif /* _MAIN_H_ */
diff --git a/examples/vhost/virtio_net.c b/examples/vhost/virtio_net.c
index 9064fc3a82..2432a96566 100644
--- a/examples/vhost/virtio_net.c
+++ b/examples/vhost/virtio_net.c
@@ -238,6 +238,13 @@ vs_enqueue_pkts(struct vhost_dev *dev, uint16_t queue_id,
 	return count;
 }
 
+uint16_t
+builtin_enqueue_pkts(struct vhost_dev *dev, uint16_t queue_id,
+		struct rte_mbuf **pkts, uint32_t count)
+{
+	return vs_enqueue_pkts(dev, queue_id, pkts, count);
+}
+
 static __rte_always_inline int
 dequeue_pkt(struct vhost_dev *dev, struct rte_vhost_vring *vr,
 	    struct rte_mbuf *m, uint16_t desc_idx,
@@ -363,7 +370,7 @@ dequeue_pkt(struct vhost_dev *dev, struct rte_vhost_vring *vr,
 	return 0;
 }
 
-uint16_t
+static uint16_t
 vs_dequeue_pkts(struct vhost_dev *dev, uint16_t queue_id,
 	struct rte_mempool *mbuf_pool, struct rte_mbuf **pkts, uint16_t count)
 {
@@ -440,3 +447,10 @@ vs_dequeue_pkts(struct vhost_dev *dev, uint16_t queue_id,
 
 	return i;
 }
+
+uint16_t
+builtin_dequeue_pkts(struct vhost_dev *dev, uint16_t queue_id,
+	struct rte_mempool *mbuf_pool, struct rte_mbuf **pkts, uint16_t count)
+{
+	return vs_dequeue_pkts(dev, queue_id, mbuf_pool, pkts, count);
+}
-- 
2.17.1


^ permalink raw reply	[flat|nested] 73+ messages in thread

* [PATCH v3 0/5] vhost: support async dequeue data path
  2022-04-07 15:25 [PATCH v1 0/5] vhost: support async dequeue data path xuan.ding
                   ` (5 preceding siblings ...)
  2022-04-11 10:00 ` [PATCH v2 0/5] vhost: " xuan.ding
@ 2022-04-19  3:43 ` xuan.ding
  2022-04-19  3:43   ` [PATCH v3 1/5] vhost: prepare sync for descriptor to mbuf refactoring xuan.ding
                     ` (4 more replies)
  2022-05-05  6:23 ` [PATCH v4 0/5] vhost: " xuan.ding
                   ` (4 subsequent siblings)
  11 siblings, 5 replies; 73+ messages in thread
From: xuan.ding @ 2022-04-19  3:43 UTC (permalink / raw)
  To: maxime.coquelin, chenbo.xia
  Cc: dev, jiayu.hu, cheng1.jiang, sunil.pai.g, liangma, Xuan Ding

From: Xuan Ding <xuan.ding@intel.com>

The presence of the asynchronous path allows applications to offload memory
copies to the DMA engine, saving CPU cycles and improving copy
performance. This patch set implements the vhost async dequeue data path
for split ring. The code is based on the latest enqueue changes [1].

This patch set is a new design and implementation of [2]. Since dmadev
was introduced in DPDK 21.11, to simplify application logic, this patch
set integrates dmadev in vhost. With dmadev integrated, vhost supports M:N
mapping between vrings and DMA virtual channels. Specifically, one vring
can use multiple different DMA channels and one DMA channel can be
shared by multiple vrings at the same time.

A new asynchronous dequeue function is introduced:
        1) rte_vhost_async_try_dequeue_burst(int vid, uint16_t queue_id,
                struct rte_mempool *mbuf_pool, struct rte_mbuf **pkts,
                uint16_t count, int *nr_inflight,
                uint16_t dma_id, uint16_t vchan_id)

        Receives packets from the guest and offloads the copies to the DMA
virtual channel.
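
For illustration, a hedged sketch of the setup an application would do
before polling this API (hypothetical helper name; error handling trimmed;
dma_id and vchan_id are assumed to come from dmadev probing done elsewhere):

        #include <rte_vhost.h>
        #include <rte_vhost_async.h>

        static int
        setup_async_dequeue(int vid, int16_t dma_id, uint16_t vchan_id)
        {
                /* Let vhost track completions on this DMA virtual channel. */
                if (rte_vhost_async_dma_configure(dma_id, vchan_id) < 0)
                        return -1;

                /* Enable the async data path on the guest TX virtqueue,
                 * i.e. the host dequeue side. */
                if (rte_vhost_async_channel_register(vid, VIRTIO_TXQ) < 0)
                        return -1;

                return 0;
        }

With the M:N mapping, the same (dma_id, vchan_id) pair may then be passed
to rte_vhost_async_try_dequeue_burst() for several vrings, and a vring may
use different DMA channels across calls.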

[1] https://mails.dpdk.org/archives/dev/2022-February/234555.html
[2] https://mails.dpdk.org/archives/dev/2021-September/218591.html

v2->v3:
* fix mbuf not updated correctly for large packets

v1->v2:
* fix a typo
* fix a bug in desc_to_mbuf filling

RFC v3 -> v1:
* add sync and async path descriptor to mbuf refactoring
* add API description in docs

RFC v2 -> RFC v3:
* rebase to latest DPDK version

RFC v1 -> RFC v2:
* fix one bug in example
* rename vchan to vchan_id
* check if dma_id and vchan_id valid
* rework all the logs to new standard

Xuan Ding (5):
  vhost: prepare sync for descriptor to mbuf refactoring
  vhost: prepare async for descriptor to mbuf refactoring
  vhost: merge sync and async descriptor to mbuf filling
  vhost: support async dequeue for split ring
  examples/vhost: support async dequeue data path

 doc/guides/prog_guide/vhost_lib.rst    |   7 +
 doc/guides/rel_notes/release_22_07.rst |   4 +
 doc/guides/sample_app_ug/vhost.rst     |   9 +-
 examples/vhost/main.c                  | 292 ++++++++++-----
 examples/vhost/main.h                  |  35 +-
 examples/vhost/virtio_net.c            |  16 +-
 lib/vhost/rte_vhost_async.h            |  33 ++
 lib/vhost/version.map                  |   3 +
 lib/vhost/vhost.h                      |   1 +
 lib/vhost/virtio_net.c                 | 470 ++++++++++++++++++++++---
 10 files changed, 720 insertions(+), 150 deletions(-)

-- 
2.17.1


^ permalink raw reply	[flat|nested] 73+ messages in thread

* [PATCH v3 1/5] vhost: prepare sync for descriptor to mbuf refactoring
  2022-04-19  3:43 ` [PATCH v3 0/5] vhost: " xuan.ding
@ 2022-04-19  3:43   ` xuan.ding
  2022-04-22 15:30     ` Maxime Coquelin
  2022-04-19  3:43   ` [PATCH v3 2/5] vhost: prepare async " xuan.ding
                     ` (3 subsequent siblings)
  4 siblings, 1 reply; 73+ messages in thread
From: xuan.ding @ 2022-04-19  3:43 UTC (permalink / raw)
  To: maxime.coquelin, chenbo.xia
  Cc: dev, jiayu.hu, cheng1.jiang, sunil.pai.g, liangma, Xuan Ding

From: Xuan Ding <xuan.ding@intel.com>

This patch extracts the descriptor-to-buffer filling from
copy_desc_to_mbuf() into a dedicated function. Besides, the enqueue
and dequeue paths are refactored to use the same function,
sync_fill_seg(), for preparing batch elements, which simplifies
the code without performance degradation.

Signed-off-by: Xuan Ding <xuan.ding@intel.com>
---
 lib/vhost/virtio_net.c | 76 ++++++++++++++++++++----------------------
 1 file changed, 37 insertions(+), 39 deletions(-)

diff --git a/lib/vhost/virtio_net.c b/lib/vhost/virtio_net.c
index 5f432b0d77..6d53016c75 100644
--- a/lib/vhost/virtio_net.c
+++ b/lib/vhost/virtio_net.c
@@ -1030,23 +1030,36 @@ async_mbuf_to_desc_seg(struct virtio_net *dev, struct vhost_virtqueue *vq,
 }
 
 static __rte_always_inline void
-sync_mbuf_to_desc_seg(struct virtio_net *dev, struct vhost_virtqueue *vq,
+sync_fill_seg(struct virtio_net *dev, struct vhost_virtqueue *vq,
 		struct rte_mbuf *m, uint32_t mbuf_offset,
-		uint64_t buf_addr, uint64_t buf_iova, uint32_t cpy_len)
+		uint64_t buf_addr, uint64_t buf_iova, uint32_t cpy_len, bool to_desc)
 {
 	struct batch_copy_elem *batch_copy = vq->batch_copy_elems;
 
 	if (likely(cpy_len > MAX_BATCH_LEN || vq->batch_copy_nb_elems >= vq->size)) {
-		rte_memcpy((void *)((uintptr_t)(buf_addr)),
+		if (to_desc) {
+			rte_memcpy((void *)((uintptr_t)(buf_addr)),
 				rte_pktmbuf_mtod_offset(m, void *, mbuf_offset),
 				cpy_len);
+		} else {
+			rte_memcpy(rte_pktmbuf_mtod_offset(m, void *, mbuf_offset),
+				(void *)((uintptr_t)(buf_addr)),
+				cpy_len);
+		}
 		vhost_log_cache_write_iova(dev, vq, buf_iova, cpy_len);
 		PRINT_PACKET(dev, (uintptr_t)(buf_addr), cpy_len, 0);
 	} else {
-		batch_copy[vq->batch_copy_nb_elems].dst =
-			(void *)((uintptr_t)(buf_addr));
-		batch_copy[vq->batch_copy_nb_elems].src =
-			rte_pktmbuf_mtod_offset(m, void *, mbuf_offset);
+		if (to_desc) {
+			batch_copy[vq->batch_copy_nb_elems].dst =
+				(void *)((uintptr_t)(buf_addr));
+			batch_copy[vq->batch_copy_nb_elems].src =
+				rte_pktmbuf_mtod_offset(m, void *, mbuf_offset);
+		} else {
+			batch_copy[vq->batch_copy_nb_elems].dst =
+				rte_pktmbuf_mtod_offset(m, void *, mbuf_offset);
+			batch_copy[vq->batch_copy_nb_elems].src =
+				(void *)((uintptr_t)(buf_addr));
+		}
 		batch_copy[vq->batch_copy_nb_elems].log_addr = buf_iova;
 		batch_copy[vq->batch_copy_nb_elems].len = cpy_len;
 		vq->batch_copy_nb_elems++;
@@ -1158,9 +1171,9 @@ mbuf_to_desc(struct virtio_net *dev, struct vhost_virtqueue *vq,
 						buf_iova + buf_offset, cpy_len) < 0)
 				goto error;
 		} else {
-			sync_mbuf_to_desc_seg(dev, vq, m, mbuf_offset,
+			sync_fill_seg(dev, vq, m, mbuf_offset,
 					buf_addr + buf_offset,
-					buf_iova + buf_offset, cpy_len);
+					buf_iova + buf_offset, cpy_len, true);
 		}
 
 		mbuf_avail  -= cpy_len;
@@ -2473,8 +2486,8 @@ copy_desc_to_mbuf(struct virtio_net *dev, struct vhost_virtqueue *vq,
 		  struct rte_mbuf *m, struct rte_mempool *mbuf_pool,
 		  bool legacy_ol_flags)
 {
-	uint32_t buf_avail, buf_offset;
-	uint64_t buf_addr, buf_len;
+	uint32_t buf_avail, buf_offset, buf_len;
+	uint64_t buf_addr, buf_iova;
 	uint32_t mbuf_avail, mbuf_offset;
 	uint32_t cpy_len;
 	struct rte_mbuf *cur = m, *prev = m;
@@ -2482,16 +2495,13 @@ copy_desc_to_mbuf(struct virtio_net *dev, struct vhost_virtqueue *vq,
 	struct virtio_net_hdr *hdr = NULL;
 	/* A counter to avoid desc dead loop chain */
 	uint16_t vec_idx = 0;
-	struct batch_copy_elem *batch_copy = vq->batch_copy_elems;
-	int error = 0;
 
 	buf_addr = buf_vec[vec_idx].buf_addr;
+	buf_iova = buf_vec[vec_idx].buf_iova;
 	buf_len = buf_vec[vec_idx].buf_len;
 
-	if (unlikely(buf_len < dev->vhost_hlen && nr_vec <= 1)) {
-		error = -1;
-		goto out;
-	}
+	if (unlikely(buf_len < dev->vhost_hlen && nr_vec <= 1))
+		return -1;
 
 	if (virtio_net_with_host_offload(dev)) {
 		if (unlikely(buf_len < sizeof(struct virtio_net_hdr))) {
@@ -2515,11 +2525,12 @@ copy_desc_to_mbuf(struct virtio_net *dev, struct vhost_virtqueue *vq,
 		buf_offset = dev->vhost_hlen - buf_len;
 		vec_idx++;
 		buf_addr = buf_vec[vec_idx].buf_addr;
+		buf_iova = buf_vec[vec_idx].buf_iova;
 		buf_len = buf_vec[vec_idx].buf_len;
 		buf_avail  = buf_len - buf_offset;
 	} else if (buf_len == dev->vhost_hlen) {
 		if (unlikely(++vec_idx >= nr_vec))
-			goto out;
+			goto error;
 		buf_addr = buf_vec[vec_idx].buf_addr;
 		buf_len = buf_vec[vec_idx].buf_len;
 
@@ -2539,22 +2550,9 @@ copy_desc_to_mbuf(struct virtio_net *dev, struct vhost_virtqueue *vq,
 	while (1) {
 		cpy_len = RTE_MIN(buf_avail, mbuf_avail);
 
-		if (likely(cpy_len > MAX_BATCH_LEN ||
-					vq->batch_copy_nb_elems >= vq->size ||
-					(hdr && cur == m))) {
-			rte_memcpy(rte_pktmbuf_mtod_offset(cur, void *,
-						mbuf_offset),
-					(void *)((uintptr_t)(buf_addr +
-							buf_offset)), cpy_len);
-		} else {
-			batch_copy[vq->batch_copy_nb_elems].dst =
-				rte_pktmbuf_mtod_offset(cur, void *,
-						mbuf_offset);
-			batch_copy[vq->batch_copy_nb_elems].src =
-				(void *)((uintptr_t)(buf_addr + buf_offset));
-			batch_copy[vq->batch_copy_nb_elems].len = cpy_len;
-			vq->batch_copy_nb_elems++;
-		}
+		sync_fill_seg(dev, vq, cur, mbuf_offset,
+			      buf_addr + buf_offset,
+			      buf_iova + buf_offset, cpy_len, false);
 
 		mbuf_avail  -= cpy_len;
 		mbuf_offset += cpy_len;
@@ -2567,6 +2565,7 @@ copy_desc_to_mbuf(struct virtio_net *dev, struct vhost_virtqueue *vq,
 				break;
 
 			buf_addr = buf_vec[vec_idx].buf_addr;
+			buf_iova = buf_vec[vec_idx].buf_iova;
 			buf_len = buf_vec[vec_idx].buf_len;
 
 			buf_offset = 0;
@@ -2585,8 +2584,7 @@ copy_desc_to_mbuf(struct virtio_net *dev, struct vhost_virtqueue *vq,
 			if (unlikely(cur == NULL)) {
 				VHOST_LOG_DATA(ERR, "(%s) failed to allocate memory for mbuf.\n",
 						dev->ifname);
-				error = -1;
-				goto out;
+				goto error;
 			}
 
 			prev->next = cur;
@@ -2606,9 +2604,9 @@ copy_desc_to_mbuf(struct virtio_net *dev, struct vhost_virtqueue *vq,
 	if (hdr)
 		vhost_dequeue_offload(dev, hdr, m, legacy_ol_flags);
 
-out:
-
-	return error;
+	return 0;
+error:
+	return -1;
 }
 
 static void
-- 
2.17.1


^ permalink raw reply	[flat|nested] 73+ messages in thread

* [PATCH v3 2/5] vhost: prepare async for descriptor to mbuf refactoring
  2022-04-19  3:43 ` [PATCH v3 0/5] vhost: " xuan.ding
  2022-04-19  3:43   ` [PATCH v3 1/5] vhost: prepare sync for descriptor to mbuf refactoring xuan.ding
@ 2022-04-19  3:43   ` xuan.ding
  2022-04-22 15:32     ` Maxime Coquelin
  2022-04-19  3:43   ` [PATCH v3 3/5] vhost: merge sync and async descriptor to mbuf filling xuan.ding
                     ` (2 subsequent siblings)
  4 siblings, 1 reply; 73+ messages in thread
From: xuan.ding @ 2022-04-19  3:43 UTC (permalink / raw)
  To: maxime.coquelin, chenbo.xia
  Cc: dev, jiayu.hu, cheng1.jiang, sunil.pai.g, liangma, Xuan Ding

From: Xuan Ding <xuan.ding@intel.com>

This patch refactors vhost async enqueue path and dequeue path to use
the same function async_fill_seg() for preparing batch elements,
which simplifies the code without performance degradation.

Signed-off-by: Xuan Ding <xuan.ding@intel.com>
---
 lib/vhost/virtio_net.c | 23 +++++++++++++++--------
 1 file changed, 15 insertions(+), 8 deletions(-)

diff --git a/lib/vhost/virtio_net.c b/lib/vhost/virtio_net.c
index 6d53016c75..391fb82f0e 100644
--- a/lib/vhost/virtio_net.c
+++ b/lib/vhost/virtio_net.c
@@ -997,13 +997,14 @@ async_iter_reset(struct vhost_async *async)
 }
 
 static __rte_always_inline int
-async_mbuf_to_desc_seg(struct virtio_net *dev, struct vhost_virtqueue *vq,
+async_fill_seg(struct virtio_net *dev, struct vhost_virtqueue *vq,
 		struct rte_mbuf *m, uint32_t mbuf_offset,
-		uint64_t buf_iova, uint32_t cpy_len)
+		uint64_t buf_iova, uint32_t cpy_len, bool to_desc)
 {
 	struct vhost_async *async = vq->async;
 	uint64_t mapped_len;
 	uint32_t buf_offset = 0;
+	void *src, *dst;
 	void *host_iova;
 
 	while (cpy_len) {
@@ -1015,10 +1016,16 @@ async_mbuf_to_desc_seg(struct virtio_net *dev, struct vhost_virtqueue *vq,
 			return -1;
 		}
 
-		if (unlikely(async_iter_add_iovec(dev, async,
-						(void *)(uintptr_t)rte_pktmbuf_iova_offset(m,
-							mbuf_offset),
-						host_iova, (size_t)mapped_len)))
+		if (to_desc) {
+			src = (void *)(uintptr_t)rte_pktmbuf_iova_offset(m, mbuf_offset);
+			dst = host_iova;
+		} else {
+			src = host_iova;
+			dst = (void *)(uintptr_t)rte_pktmbuf_iova_offset(m, mbuf_offset);
+		}
+
+		if (unlikely(async_iter_add_iovec(dev, async, src, dst,
+						 (size_t)mapped_len)))
 			return -1;
 
 		cpy_len -= (uint32_t)mapped_len;
@@ -1167,8 +1174,8 @@ mbuf_to_desc(struct virtio_net *dev, struct vhost_virtqueue *vq,
 		cpy_len = RTE_MIN(buf_avail, mbuf_avail);
 
 		if (is_async) {
-			if (async_mbuf_to_desc_seg(dev, vq, m, mbuf_offset,
-						buf_iova + buf_offset, cpy_len) < 0)
+			if (async_fill_seg(dev, vq, m, mbuf_offset,
+						buf_iova + buf_offset, cpy_len, true) < 0)
 				goto error;
 		} else {
 			sync_fill_seg(dev, vq, m, mbuf_offset,
-- 
2.17.1


^ permalink raw reply	[flat|nested] 73+ messages in thread

* [PATCH v3 3/5] vhost: merge sync and async descriptor to mbuf filling
  2022-04-19  3:43 ` [PATCH v3 0/5] vhost: " xuan.ding
  2022-04-19  3:43   ` [PATCH v3 1/5] vhost: prepare sync for descriptor to mbuf refactoring xuan.ding
  2022-04-19  3:43   ` [PATCH v3 2/5] vhost: prepare async " xuan.ding
@ 2022-04-19  3:43   ` xuan.ding
  2022-04-22 11:06     ` David Marchand
  2022-04-22 15:43     ` Maxime Coquelin
  2022-04-19  3:43   ` [PATCH v3 4/5] vhost: support async dequeue for split ring xuan.ding
  2022-04-19  3:43   ` [PATCH v3 5/5] examples/vhost: support async dequeue data path xuan.ding
  4 siblings, 2 replies; 73+ messages in thread
From: xuan.ding @ 2022-04-19  3:43 UTC (permalink / raw)
  To: maxime.coquelin, chenbo.xia
  Cc: dev, jiayu.hu, cheng1.jiang, sunil.pai.g, liangma, Xuan Ding

From: Xuan Ding <xuan.ding@intel.com>

This patch refactors copy_desc_to_mbuf() used by the sync
path to support both sync and async descriptor to mbuf filling.

Signed-off-by: Xuan Ding <xuan.ding@intel.com>
---
 lib/vhost/vhost.h      |  1 +
 lib/vhost/virtio_net.c | 48 ++++++++++++++++++++++++++++++++----------
 2 files changed, 38 insertions(+), 11 deletions(-)

diff --git a/lib/vhost/vhost.h b/lib/vhost/vhost.h
index a9edc271aa..9209558465 100644
--- a/lib/vhost/vhost.h
+++ b/lib/vhost/vhost.h
@@ -177,6 +177,7 @@ extern struct async_dma_info dma_copy_track[RTE_DMADEV_DEFAULT_MAX];
  * inflight async packet information
  */
 struct async_inflight_info {
+	struct virtio_net_hdr nethdr;
 	struct rte_mbuf *mbuf;
 	uint16_t descs; /* num of descs inflight */
 	uint16_t nr_buffers; /* num of buffers inflight for packed ring */
diff --git a/lib/vhost/virtio_net.c b/lib/vhost/virtio_net.c
index 391fb82f0e..6f5bd21946 100644
--- a/lib/vhost/virtio_net.c
+++ b/lib/vhost/virtio_net.c
@@ -2488,10 +2488,10 @@ copy_vnet_hdr_from_desc(struct virtio_net_hdr *hdr,
 }
 
 static __rte_always_inline int
-copy_desc_to_mbuf(struct virtio_net *dev, struct vhost_virtqueue *vq,
+desc_to_mbuf(struct virtio_net *dev, struct vhost_virtqueue *vq,
 		  struct buf_vector *buf_vec, uint16_t nr_vec,
 		  struct rte_mbuf *m, struct rte_mempool *mbuf_pool,
-		  bool legacy_ol_flags)
+		  bool legacy_ol_flags, uint16_t slot_idx, bool is_async)
 {
 	uint32_t buf_avail, buf_offset, buf_len;
 	uint64_t buf_addr, buf_iova;
@@ -2502,6 +2502,8 @@ copy_desc_to_mbuf(struct virtio_net *dev, struct vhost_virtqueue *vq,
 	struct virtio_net_hdr *hdr = NULL;
 	/* A counter to avoid desc dead loop chain */
 	uint16_t vec_idx = 0;
+	struct vhost_async *async = vq->async;
+	struct async_inflight_info *pkts_info;
 
 	buf_addr = buf_vec[vec_idx].buf_addr;
 	buf_iova = buf_vec[vec_idx].buf_iova;
@@ -2539,6 +2541,7 @@ copy_desc_to_mbuf(struct virtio_net *dev, struct vhost_virtqueue *vq,
 		if (unlikely(++vec_idx >= nr_vec))
 			goto error;
 		buf_addr = buf_vec[vec_idx].buf_addr;
+		buf_iova = buf_vec[vec_idx].buf_iova;
 		buf_len = buf_vec[vec_idx].buf_len;
 
 		buf_offset = 0;
@@ -2554,12 +2557,25 @@ copy_desc_to_mbuf(struct virtio_net *dev, struct vhost_virtqueue *vq,
 
 	mbuf_offset = 0;
 	mbuf_avail  = m->buf_len - RTE_PKTMBUF_HEADROOM;
+
+	if (is_async) {
+		pkts_info = async->pkts_info;
+		if (async_iter_initialize(dev, async))
+			return -1;
+	}
+
 	while (1) {
 		cpy_len = RTE_MIN(buf_avail, mbuf_avail);
 
-		sync_fill_seg(dev, vq, cur, mbuf_offset,
-			      buf_addr + buf_offset,
-			      buf_iova + buf_offset, cpy_len, false);
+		if (is_async) {
+			if (async_fill_seg(dev, vq, cur, mbuf_offset,
+					   buf_iova + buf_offset, cpy_len, false) < 0)
+				goto error;
+		} else {
+			sync_fill_seg(dev, vq, cur, mbuf_offset,
+				buf_addr + buf_offset,
+				buf_iova + buf_offset, cpy_len, false);
+		}
 
 		mbuf_avail  -= cpy_len;
 		mbuf_offset += cpy_len;
@@ -2608,11 +2624,20 @@ copy_desc_to_mbuf(struct virtio_net *dev, struct vhost_virtqueue *vq,
 	prev->data_len = mbuf_offset;
 	m->pkt_len    += mbuf_offset;
 
-	if (hdr)
-		vhost_dequeue_offload(dev, hdr, m, legacy_ol_flags);
+	if (is_async) {
+		async_iter_finalize(async);
+		if (hdr)
+			pkts_info[slot_idx].nethdr = *hdr;
+	} else {
+		if (hdr)
+			vhost_dequeue_offload(dev, hdr, m, legacy_ol_flags);
+	}
 
 	return 0;
 error:
+	if (is_async)
+		async_iter_cancel(async);
+
 	return -1;
 }
 
@@ -2744,8 +2769,8 @@ virtio_dev_tx_split(struct virtio_net *dev, struct vhost_virtqueue *vq,
 			break;
 		}
 
-		err = copy_desc_to_mbuf(dev, vq, buf_vec, nr_vec, pkts[i],
-				mbuf_pool, legacy_ol_flags);
+		err = desc_to_mbuf(dev, vq, buf_vec, nr_vec, pkts[i],
+				mbuf_pool, legacy_ol_flags, 0, false);
 		if (unlikely(err)) {
 			if (!allocerr_warned) {
 				VHOST_LOG_DATA(ERR, "(%s) failed to copy desc to mbuf.\n",
@@ -2756,6 +2781,7 @@ virtio_dev_tx_split(struct virtio_net *dev, struct vhost_virtqueue *vq,
 			i++;
 			break;
 		}
+
 	}
 
 	if (dropped)
@@ -2937,8 +2963,8 @@ vhost_dequeue_single_packed(struct virtio_net *dev,
 		return -1;
 	}
 
-	err = copy_desc_to_mbuf(dev, vq, buf_vec, nr_vec, pkts,
-				mbuf_pool, legacy_ol_flags);
+	err = desc_to_mbuf(dev, vq, buf_vec, nr_vec, pkts,
+				mbuf_pool, legacy_ol_flags, 0, false);
 	if (unlikely(err)) {
 		if (!allocerr_warned) {
 			VHOST_LOG_DATA(ERR, "(%s) failed to copy desc to mbuf.\n",
-- 
2.17.1


^ permalink raw reply	[flat|nested] 73+ messages in thread

* [PATCH v3 4/5] vhost: support async dequeue for split ring
  2022-04-19  3:43 ` [PATCH v3 0/5] vhost: " xuan.ding
                     ` (2 preceding siblings ...)
  2022-04-19  3:43   ` [PATCH v3 3/5] vhost: merge sync and async descriptor to mbuf filling xuan.ding
@ 2022-04-19  3:43   ` xuan.ding
  2022-04-19  3:43   ` [PATCH v3 5/5] examples/vhost: support async dequeue data path xuan.ding
  4 siblings, 0 replies; 73+ messages in thread
From: xuan.ding @ 2022-04-19  3:43 UTC (permalink / raw)
  To: maxime.coquelin, chenbo.xia
  Cc: dev, jiayu.hu, cheng1.jiang, sunil.pai.g, liangma, Xuan Ding, Yuan Wang

From: Xuan Ding <xuan.ding@intel.com>

This patch implements the asynchronous dequeue data path for vhost split
ring; a new API, rte_vhost_async_try_dequeue_burst(), is introduced.
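
For reference, a minimal sketch of how an application could consume the new
API is shown below. It assumes the DMA vchannel has already been configured
with rte_vhost_async_dma_configure() and the async channel registered on the
virtqueue with rte_vhost_async_channel_register(); the burst size and simply
freeing the returned mbufs are placeholders for real application logic.

    #include <rte_mbuf.h>
    #include <rte_vhost_async.h>

    #define DEQ_BURST 32

    /* Poll one guest TX virtqueue once; returns how many packets completed. */
    static uint16_t
    poll_async_dequeue(int vid, uint16_t virtio_txq, struct rte_mempool *pool,
            uint16_t dma_id, uint16_t vchan_id)
    {
        struct rte_mbuf *pkts[DEQ_BURST];
        int nr_inflight = 0;
        uint16_t i, nb;

        nb = rte_vhost_async_try_dequeue_burst(vid, virtio_txq, pool, pkts,
                DEQ_BURST, &nr_inflight, dma_id, vchan_id);

        /* Only packets whose DMA copies completed are returned; nr_inflight
         * reports how many are still pending (-1 on error). */
        for (i = 0; i < nb; i++)
            rte_pktmbuf_free(pkts[i]); /* or forward them to a NIC port */

        return nb;
    }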

Signed-off-by: Xuan Ding <xuan.ding@intel.com>
Signed-off-by: Yuan Wang <yuanx.wang@intel.com>
---
 doc/guides/prog_guide/vhost_lib.rst    |   7 +
 doc/guides/rel_notes/release_22_07.rst |   4 +
 lib/vhost/rte_vhost_async.h            |  33 +++
 lib/vhost/version.map                  |   3 +
 lib/vhost/virtio_net.c                 | 335 +++++++++++++++++++++++++
 5 files changed, 382 insertions(+)

diff --git a/doc/guides/prog_guide/vhost_lib.rst b/doc/guides/prog_guide/vhost_lib.rst
index 886f8f5e72..40cf315170 100644
--- a/doc/guides/prog_guide/vhost_lib.rst
+++ b/doc/guides/prog_guide/vhost_lib.rst
@@ -276,6 +276,13 @@ The following is an overview of some key Vhost API functions:
   Clear inflight packets which are submitted to DMA engine in vhost async data
   path. Completed packets are returned to applications through ``pkts``.
 
+* ``rte_vhost_async_try_dequeue_burst(int vid, uint16_t queue_id,
+  struct rte_mempool *mbuf_pool, struct rte_mbuf **pkts, uint16_t count,
+	int *nr_inflight, uint16_t dma_id, uint16_t vchan_id)``
+
+  Receives (dequeues) ``count`` packets from guest to host in async data path,
+  and stores them in ``pkts``.
+
 Vhost-user Implementations
 --------------------------
 
diff --git a/doc/guides/rel_notes/release_22_07.rst b/doc/guides/rel_notes/release_22_07.rst
index 42a5f2d990..422a6673cb 100644
--- a/doc/guides/rel_notes/release_22_07.rst
+++ b/doc/guides/rel_notes/release_22_07.rst
@@ -55,6 +55,10 @@ New Features
      Also, make sure to start the actual text at the margin.
      =======================================================
 
+* **Added vhost async dequeue API to receive packets from the guest.**
+
+  Added vhost async dequeue API which can leverage DMA devices to accelerate
+  receiving packets from the guest.
 
 Removed Items
 -------------
diff --git a/lib/vhost/rte_vhost_async.h b/lib/vhost/rte_vhost_async.h
index f1293c6a9d..23fe1a7316 100644
--- a/lib/vhost/rte_vhost_async.h
+++ b/lib/vhost/rte_vhost_async.h
@@ -187,6 +187,39 @@ uint16_t rte_vhost_clear_queue_thread_unsafe(int vid, uint16_t queue_id,
 __rte_experimental
 int rte_vhost_async_dma_configure(int16_t dma_id, uint16_t vchan_id);
 
+/**
+ * This function tries to receive packets from the guest by offloading
+ * copies to the async channel. Packets whose copies have completed are
+ * returned in "pkts". Packets whose copies are submitted to the async
+ * channel but not yet completed are called "in-flight packets"; this
+ * function will not return in-flight packets until their copies are
+ * completed by the async channel.
+ *
+ * @param vid
+ *  ID of vhost device to dequeue data
+ * @param queue_id
+ *  ID of virtqueue to dequeue data
+ * @param mbuf_pool
+ *  Mbuf_pool where host mbuf is allocated
+ * @param pkts
+ *  Blank array to keep successfully dequeued packets
+ * @param count
+ *  Size of the packet array
+ * @param nr_inflight
+ *  The number of in-flight packets. If an error occurs, it is set to -1.
+ * @param dma_id
+ *  The identifier of DMA device
+ * @param vchan_id
+ *  The identifier of virtual DMA channel
+ * @return
+ *  Number of successfully dequeued packets
+ */
+__rte_experimental
+uint16_t
+rte_vhost_async_try_dequeue_burst(int vid, uint16_t queue_id,
+	struct rte_mempool *mbuf_pool, struct rte_mbuf **pkts, uint16_t count,
+	int *nr_inflight, uint16_t dma_id, uint16_t vchan_id);
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/vhost/version.map b/lib/vhost/version.map
index 0a66c5840c..514e3ff6a6 100644
--- a/lib/vhost/version.map
+++ b/lib/vhost/version.map
@@ -87,6 +87,9 @@ EXPERIMENTAL {
 
 	# added in 22.03
 	rte_vhost_async_dma_configure;
+
+	# added in 22.07
+	rte_vhost_async_try_dequeue_burst;
 };
 
 INTERNAL {
diff --git a/lib/vhost/virtio_net.c b/lib/vhost/virtio_net.c
index 6f5bd21946..4cead0374b 100644
--- a/lib/vhost/virtio_net.c
+++ b/lib/vhost/virtio_net.c
@@ -3172,3 +3172,338 @@ rte_vhost_dequeue_burst(int vid, uint16_t queue_id,
 
 	return count;
 }
+
+static __rte_always_inline uint16_t
+async_poll_dequeue_completed_split(struct virtio_net *dev, uint16_t queue_id,
+		struct rte_mbuf **pkts, uint16_t count, uint16_t dma_id,
+		uint16_t vchan_id, bool legacy_ol_flags)
+{
+	uint16_t start_idx, from, i;
+	uint16_t nr_cpl_pkts = 0;
+	struct async_inflight_info *pkts_info;
+	struct vhost_virtqueue *vq = dev->virtqueue[queue_id];
+
+	pkts_info = vq->async->pkts_info;
+
+	vhost_async_dma_check_completed(dev, dma_id, vchan_id, VHOST_DMA_MAX_COPY_COMPLETE);
+
+	start_idx = async_get_first_inflight_pkt_idx(vq);
+
+	from = start_idx;
+	while (vq->async->pkts_cmpl_flag[from] && count--) {
+		vq->async->pkts_cmpl_flag[from] = false;
+		from = (from + 1) & (vq->size - 1);
+		nr_cpl_pkts++;
+	}
+
+	if (nr_cpl_pkts == 0)
+		return 0;
+
+	for (i = 0; i < nr_cpl_pkts; i++) {
+		from = (start_idx + i) & (vq->size - 1);
+		pkts[i] = pkts_info[from].mbuf;
+
+		if (virtio_net_with_host_offload(dev))
+			vhost_dequeue_offload(dev, &pkts_info[from].nethdr, pkts[i],
+					      legacy_ol_flags);
+	}
+
+	/* write back completed descs to used ring and update used idx */
+	write_back_completed_descs_split(vq, nr_cpl_pkts);
+	__atomic_add_fetch(&vq->used->idx, nr_cpl_pkts, __ATOMIC_RELEASE);
+	vhost_vring_call_split(dev, vq);
+
+	vq->async->pkts_inflight_n -= nr_cpl_pkts;
+
+	return nr_cpl_pkts;
+}
+
+static __rte_always_inline uint16_t
+virtio_dev_tx_async_split(struct virtio_net *dev, struct vhost_virtqueue *vq,
+		uint16_t queue_id, struct rte_mempool *mbuf_pool, struct rte_mbuf **pkts,
+		uint16_t count, uint16_t dma_id, uint16_t vchan_id, bool legacy_ol_flags)
+{
+	static bool allocerr_warned;
+	bool dropped = false;
+	uint16_t free_entries;
+	uint16_t pkt_idx, slot_idx = 0;
+	uint16_t nr_done_pkts = 0;
+	uint16_t pkt_err = 0;
+	uint16_t n_xfer;
+	struct vhost_async *async = vq->async;
+	struct async_inflight_info *pkts_info = async->pkts_info;
+	struct rte_mbuf *pkts_prealloc[MAX_PKT_BURST];
+	uint16_t pkts_size = count;
+
+	/**
+	 * The ordering between avail index and
+	 * desc reads needs to be enforced.
+	 */
+	free_entries = __atomic_load_n(&vq->avail->idx, __ATOMIC_ACQUIRE) - vq->last_avail_idx;
+	if (free_entries == 0)
+		goto out;
+
+	rte_prefetch0(&vq->avail->ring[vq->last_avail_idx & (vq->size - 1)]);
+
+	async_iter_reset(async);
+
+	count = RTE_MIN(count, MAX_PKT_BURST);
+	count = RTE_MIN(count, free_entries);
+	VHOST_LOG_DATA(DEBUG, "(%s) about to dequeue %u buffers\n", dev->ifname, count);
+
+	if (rte_pktmbuf_alloc_bulk(mbuf_pool, pkts_prealloc, count))
+		goto out;
+
+	for (pkt_idx = 0; pkt_idx < count; pkt_idx++) {
+		uint16_t head_idx = 0;
+		uint16_t nr_vec = 0;
+		uint16_t to;
+		uint32_t buf_len;
+		int err;
+		struct buf_vector buf_vec[BUF_VECTOR_MAX];
+		struct rte_mbuf *pkt = pkts_prealloc[pkt_idx];
+
+		if (unlikely(fill_vec_buf_split(dev, vq, vq->last_avail_idx,
+						&nr_vec, buf_vec,
+						&head_idx, &buf_len,
+						VHOST_ACCESS_RO) < 0)) {
+			dropped = true;
+			break;
+		}
+
+		err = virtio_dev_pktmbuf_prep(dev, pkt, buf_len);
+		if (unlikely(err)) {
+			/**
+			 * mbuf allocation fails for jumbo packets when external
+			 * buffer allocation is not allowed and linear buffer
+			 * is required. Drop this packet.
+			 */
+			if (!allocerr_warned) {
+				VHOST_LOG_DATA(ERR,
+					"(%s) %s: Failed mbuf alloc of size %d from %s\n",
+					dev->ifname, __func__, buf_len, mbuf_pool->name);
+				allocerr_warned = true;
+			}
+			dropped = true;
+			break;
+		}
+
+		slot_idx = (async->pkts_idx + pkt_idx) & (vq->size - 1);
+		err = desc_to_mbuf(dev, vq, buf_vec, nr_vec, pkt, mbuf_pool,
+					legacy_ol_flags, slot_idx, true);
+		if (unlikely(err)) {
+			if (!allocerr_warned) {
+				VHOST_LOG_DATA(ERR,
+					"(%s) %s: Failed to offload copies to async channel.\n",
+					dev->ifname, __func__);
+				allocerr_warned = true;
+			}
+			dropped = true;
+			break;
+		}
+
+		pkts_info[slot_idx].mbuf = pkt;
+
+		/* store used descs */
+		to = async->desc_idx_split & (vq->size - 1);
+		async->descs_split[to].id = head_idx;
+		async->descs_split[to].len = 0;
+		async->desc_idx_split++;
+
+		vq->last_avail_idx++;
+	}
+
+	if (unlikely(dropped))
+		rte_pktmbuf_free_bulk(&pkts_prealloc[pkt_idx], count - pkt_idx);
+
+	n_xfer = vhost_async_dma_transfer(dev, vq, dma_id, vchan_id, async->pkts_idx,
+					  async->iov_iter, pkt_idx);
+
+	async->pkts_inflight_n += n_xfer;
+
+	pkt_err = pkt_idx - n_xfer;
+	if (unlikely(pkt_err)) {
+		VHOST_LOG_DATA(DEBUG,
+			"(%s) %s: failed to transfer data for queue id %d.\n",
+			dev->ifname, __func__, queue_id);
+
+		pkt_idx = n_xfer;
+		/* recover available ring */
+		vq->last_avail_idx -= pkt_err;
+
+		/**
+		 * recover async channel copy related structures and free pktmbufs
+		 * for error pkts.
+		 */
+		async->desc_idx_split -= pkt_err;
+		while (pkt_err-- > 0) {
+			rte_pktmbuf_free(pkts_info[slot_idx & (vq->size - 1)].mbuf);
+			slot_idx--;
+		}
+	}
+
+	async->pkts_idx += pkt_idx;
+	if (async->pkts_idx >= vq->size)
+		async->pkts_idx -= vq->size;
+
+out:
+	/* DMA device may serve other queues, unconditionally check completed. */
+	nr_done_pkts = async_poll_dequeue_completed_split(dev, queue_id, pkts, pkts_size,
+							  dma_id, vchan_id, legacy_ol_flags);
+
+	return nr_done_pkts;
+}
+
+__rte_noinline
+static uint16_t
+virtio_dev_tx_async_split_legacy(struct virtio_net *dev,
+		struct vhost_virtqueue *vq, uint16_t queue_id,
+		struct rte_mempool *mbuf_pool, struct rte_mbuf **pkts,
+		uint16_t count, uint16_t dma_id, uint16_t vchan_id)
+{
+	return virtio_dev_tx_async_split(dev, vq, queue_id, mbuf_pool,
+				pkts, count, dma_id, vchan_id, true);
+}
+
+__rte_noinline
+static uint16_t
+virtio_dev_tx_async_split_compliant(struct virtio_net *dev,
+		struct vhost_virtqueue *vq, uint16_t queue_id,
+		struct rte_mempool *mbuf_pool, struct rte_mbuf **pkts,
+		uint16_t count, uint16_t dma_id, uint16_t vchan_id)
+{
+	return virtio_dev_tx_async_split(dev, vq, queue_id, mbuf_pool,
+				pkts, count, dma_id, vchan_id, false);
+}
+
+uint16_t
+rte_vhost_async_try_dequeue_burst(int vid, uint16_t queue_id,
+	struct rte_mempool *mbuf_pool, struct rte_mbuf **pkts, uint16_t count,
+	int *nr_inflight, uint16_t dma_id, uint16_t vchan_id)
+{
+	struct virtio_net *dev;
+	struct rte_mbuf *rarp_mbuf = NULL;
+	struct vhost_virtqueue *vq;
+	int16_t success = 1;
+
+	*nr_inflight = -1;
+
+	dev = get_device(vid);
+	if (!dev)
+		return 0;
+
+	if (unlikely(!(dev->flags & VIRTIO_DEV_BUILTIN_VIRTIO_NET))) {
+		VHOST_LOG_DATA(ERR,
+			"(%s) %s: built-in vhost net backend is disabled.\n",
+			dev->ifname, __func__);
+		return 0;
+	}
+
+	if (unlikely(!is_valid_virt_queue_idx(queue_id, 1, dev->nr_vring))) {
+		VHOST_LOG_DATA(ERR,
+			"(%s) %s: invalid virtqueue idx %d.\n",
+			dev->ifname, __func__, queue_id);
+		return 0;
+	}
+
+	if (unlikely(!dma_copy_track[dma_id].vchans ||
+				!dma_copy_track[dma_id].vchans[vchan_id].pkts_cmpl_flag_addr)) {
+		VHOST_LOG_DATA(ERR, "(%s) %s: invalid channel %d:%u.\n", dev->ifname, __func__,
+			       dma_id, vchan_id);
+		return 0;
+	}
+
+	vq = dev->virtqueue[queue_id];
+
+	if (unlikely(rte_spinlock_trylock(&vq->access_lock) == 0))
+		return 0;
+
+	if (unlikely(vq->enabled == 0)) {
+		count = 0;
+		goto out_access_unlock;
+	}
+
+	if (unlikely(!vq->async)) {
+		VHOST_LOG_DATA(ERR, "(%s) %s: async not registered for queue id %d.\n",
+			dev->ifname, __func__, queue_id);
+		count = 0;
+		goto out_access_unlock;
+	}
+
+	if (dev->features & (1ULL << VIRTIO_F_IOMMU_PLATFORM))
+		vhost_user_iotlb_rd_lock(vq);
+
+	if (unlikely(vq->access_ok == 0))
+		if (unlikely(vring_translate(dev, vq) < 0)) {
+			count = 0;
+			goto out;
+		}
+
+	/*
+	 * Construct a RARP broadcast packet, and inject it to the "pkts"
+	 * array, so that it looks like the guest actually sent such a packet.
+	 *
+	 * Check user_send_rarp() for more information.
+	 *
+	 * broadcast_rarp shares a cacheline in the virtio_net structure
+	 * with some fields that are accessed during enqueue and
+	 * __atomic_compare_exchange_n causes a write if performed compare
+	 * and exchange. This could result in false sharing between enqueue
+	 * and dequeue.
+	 *
+	 * Prevent unnecessary false sharing by reading broadcast_rarp first
+	 * and only performing compare and exchange if the read indicates it
+	 * is likely to be set.
+	 */
+	if (unlikely(__atomic_load_n(&dev->broadcast_rarp, __ATOMIC_ACQUIRE) &&
+			__atomic_compare_exchange_n(&dev->broadcast_rarp,
+			&success, 0, 0, __ATOMIC_RELEASE, __ATOMIC_RELAXED))) {
+
+		rarp_mbuf = rte_net_make_rarp_packet(mbuf_pool, &dev->mac);
+		if (rarp_mbuf == NULL) {
+			VHOST_LOG_DATA(ERR, "Failed to make RARP packet.\n");
+			count = 0;
+			goto out;
+		}
+		/*
+		 * Inject it to the head of "pkts" array, so that switch's mac
+		 * learning table will get updated first.
+		 */
+		pkts[0] = rarp_mbuf;
+		pkts++;
+		count -= 1;
+	}
+
+	if (unlikely(vq_is_packed(dev))) {
+		static bool not_support_pack_log;
+		if (!not_support_pack_log) {
+			VHOST_LOG_DATA(ERR,
+				"(%s) %s: async dequeue does not support packed ring.\n",
+				dev->ifname, __func__);
+			not_support_pack_log = true;
+		}
+		count = 0;
+		goto out;
+	}
+
+	if (dev->flags & VIRTIO_DEV_LEGACY_OL_FLAGS)
+		count = virtio_dev_tx_async_split_legacy(dev, vq, queue_id,
+				mbuf_pool, pkts, count, dma_id, vchan_id);
+	else
+		count = virtio_dev_tx_async_split_compliant(dev, vq, queue_id,
+				mbuf_pool, pkts, count, dma_id, vchan_id);
+
+	*nr_inflight = vq->async->pkts_inflight_n;
+
+out:
+	if (dev->features & (1ULL << VIRTIO_F_IOMMU_PLATFORM))
+		vhost_user_iotlb_rd_unlock(vq);
+
+out_access_unlock:
+	rte_spinlock_unlock(&vq->access_lock);
+
+	if (unlikely(rarp_mbuf != NULL))
+		count += 1;
+
+	return count;
+}
-- 
2.17.1


^ permalink raw reply	[flat|nested] 73+ messages in thread

* [PATCH v3 5/5] examples/vhost: support async dequeue data path
  2022-04-19  3:43 ` [PATCH v3 0/5] vhost: " xuan.ding
                     ` (3 preceding siblings ...)
  2022-04-19  3:43   ` [PATCH v3 4/5] vhost: support async dequeue for split ring xuan.ding
@ 2022-04-19  3:43   ` xuan.ding
  4 siblings, 0 replies; 73+ messages in thread
From: xuan.ding @ 2022-04-19  3:43 UTC (permalink / raw)
  To: maxime.coquelin, chenbo.xia
  Cc: dev, jiayu.hu, cheng1.jiang, sunil.pai.g, liangma, Xuan Ding,
	Wenwu Ma, Yuan Wang

From: Xuan Ding <xuan.ding@intel.com>

This patch adds a use case for the async dequeue API. The vswitch can
leverage a DMA device to accelerate the vhost async dequeue path.
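
For instance, the dequeue path of vhost device 0 can be bound to a DMA
channel from the sample application command line as follows (the core list,
port mask, socket path and DMA channel addresses are placeholders and depend
on the actual setup):

    ./dpdk-vhost -l 2-3 -n 4 -- -p 0x1 --mergeable 1 --vm2vm 1 \
            --socket-file /tmp/sock0 \
            --dmas [txd0@00:04.0,rxd0@00:04.2]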

Signed-off-by: Wenwu Ma <wenwux.ma@intel.com>
Signed-off-by: Yuan Wang <yuanx.wang@intel.com>
Signed-off-by: Xuan Ding <xuan.ding@intel.com>
---
 doc/guides/sample_app_ug/vhost.rst |   9 +-
 examples/vhost/main.c              | 292 ++++++++++++++++++++---------
 examples/vhost/main.h              |  35 +++-
 examples/vhost/virtio_net.c        |  16 +-
 4 files changed, 254 insertions(+), 98 deletions(-)

diff --git a/doc/guides/sample_app_ug/vhost.rst b/doc/guides/sample_app_ug/vhost.rst
index a6ce4bc8ac..09db965e70 100644
--- a/doc/guides/sample_app_ug/vhost.rst
+++ b/doc/guides/sample_app_ug/vhost.rst
@@ -169,9 +169,12 @@ demonstrates how to use the async vhost APIs. It's used in combination with dmas
 **--dmas**
 This parameter is used to specify the assigned DMA device of a vhost device.
 Async vhost-user net driver will be used if --dmas is set. For example
---dmas [txd0@00:04.0,txd1@00:04.1] means use DMA channel 00:04.0 for vhost
-device 0 enqueue operation and use DMA channel 00:04.1 for vhost device 1
-enqueue operation.
+--dmas [txd0@00:04.0,txd1@00:04.1,rxd0@00:04.2,rxd1@00:04.3] means use
+DMA channel 00:04.0/00:04.2 for vhost device 0 enqueue/dequeue operation
+and use DMA channel 00:04.1/00:04.3 for vhost device 1 enqueue/dequeue
+operation. The index of the device corresponds to the socket file in order;
+that is, vhost device 0 is created through the first socket file, vhost
+device 1 is created through the second socket file, and so on.
 
 Common Issues
 -------------
diff --git a/examples/vhost/main.c b/examples/vhost/main.c
index d94fabb060..d26e40ab73 100644
--- a/examples/vhost/main.c
+++ b/examples/vhost/main.c
@@ -63,6 +63,9 @@
 
 #define DMA_RING_SIZE 4096
 
+#define ASYNC_ENQUEUE_VHOST 1
+#define ASYNC_DEQUEUE_VHOST 2
+
 /* number of mbufs in all pools - if specified on command-line. */
 static int total_num_mbufs = NUM_MBUFS_DEFAULT;
 
@@ -116,6 +119,8 @@ static uint32_t burst_rx_retry_num = BURST_RX_RETRIES;
 static char *socket_files;
 static int nb_sockets;
 
+static struct vhost_queue_ops vdev_queue_ops[RTE_MAX_VHOST_DEVICE];
+
 /* empty VMDq configuration structure. Filled in programmatically */
 static struct rte_eth_conf vmdq_conf_default = {
 	.rxmode = {
@@ -205,6 +210,18 @@ struct vhost_bufftable *vhost_txbuff[RTE_MAX_LCORE * RTE_MAX_VHOST_DEVICE];
 #define MBUF_TABLE_DRAIN_TSC	((rte_get_tsc_hz() + US_PER_S - 1) \
 				 / US_PER_S * BURST_TX_DRAIN_US)
 
+static int vid2socketid[RTE_MAX_VHOST_DEVICE];
+
+static uint32_t get_async_flag_by_socketid(int socketid)
+{
+	return dma_bind[socketid].async_flag;
+}
+
+static void init_vid2socketid_array(int vid, int socketid)
+{
+	vid2socketid[vid] = socketid;
+}
+
 static inline bool
 is_dma_configured(int16_t dev_id)
 {
@@ -224,7 +241,7 @@ open_dma(const char *value)
 	char *addrs = input;
 	char *ptrs[2];
 	char *start, *end, *substr;
-	int64_t vid;
+	int64_t socketid, vring_id;
 
 	struct rte_dma_info info;
 	struct rte_dma_conf dev_config = { .nb_vchans = 1 };
@@ -262,7 +279,9 @@ open_dma(const char *value)
 
 	while (i < args_nr) {
 		char *arg_temp = dma_arg[i];
+		char *txd, *rxd;
 		uint8_t sub_nr;
+		int async_flag;
 
 		sub_nr = rte_strsplit(arg_temp, strlen(arg_temp), ptrs, 2, '@');
 		if (sub_nr != 2) {
@@ -270,14 +289,23 @@ open_dma(const char *value)
 			goto out;
 		}
 
-		start = strstr(ptrs[0], "txd");
-		if (start == NULL) {
+		txd = strstr(ptrs[0], "txd");
+		rxd = strstr(ptrs[0], "rxd");
+		if (txd) {
+			start = txd;
+			vring_id = VIRTIO_RXQ;
+			async_flag = ASYNC_ENQUEUE_VHOST;
+		} else if (rxd) {
+			start = rxd;
+			vring_id = VIRTIO_TXQ;
+			async_flag = ASYNC_DEQUEUE_VHOST;
+		} else {
 			ret = -1;
 			goto out;
 		}
 
 		start += 3;
-		vid = strtol(start, &end, 0);
+		socketid = strtol(start, &end, 0);
 		if (end == start) {
 			ret = -1;
 			goto out;
@@ -338,7 +366,8 @@ open_dma(const char *value)
 		dmas_id[dma_count++] = dev_id;
 
 done:
-		(dma_info + vid)->dmas[VIRTIO_RXQ].dev_id = dev_id;
+		(dma_info + socketid)->dmas[vring_id].dev_id = dev_id;
+		(dma_info + socketid)->async_flag |= async_flag;
 		i++;
 	}
 out:
@@ -990,13 +1019,13 @@ complete_async_pkts(struct vhost_dev *vdev)
 {
 	struct rte_mbuf *p_cpl[MAX_PKT_BURST];
 	uint16_t complete_count;
-	int16_t dma_id = dma_bind[vdev->vid].dmas[VIRTIO_RXQ].dev_id;
+	int16_t dma_id = dma_bind[vid2socketid[vdev->vid]].dmas[VIRTIO_RXQ].dev_id;
 
 	complete_count = rte_vhost_poll_enqueue_completed(vdev->vid,
 					VIRTIO_RXQ, p_cpl, MAX_PKT_BURST, dma_id, 0);
 	if (complete_count) {
 		free_pkts(p_cpl, complete_count);
-		__atomic_sub_fetch(&vdev->pkts_inflight, complete_count, __ATOMIC_SEQ_CST);
+		__atomic_sub_fetch(&vdev->pkts_enq_inflight, complete_count, __ATOMIC_SEQ_CST);
 	}
 
 }
@@ -1031,23 +1060,7 @@ drain_vhost(struct vhost_dev *vdev)
 	uint16_t nr_xmit = vhost_txbuff[buff_idx]->len;
 	struct rte_mbuf **m = vhost_txbuff[buff_idx]->m_table;
 
-	if (builtin_net_driver) {
-		ret = vs_enqueue_pkts(vdev, VIRTIO_RXQ, m, nr_xmit);
-	} else if (dma_bind[vdev->vid].dmas[VIRTIO_RXQ].async_enabled) {
-		uint16_t enqueue_fail = 0;
-		int16_t dma_id = dma_bind[vdev->vid].dmas[VIRTIO_RXQ].dev_id;
-
-		complete_async_pkts(vdev);
-		ret = rte_vhost_submit_enqueue_burst(vdev->vid, VIRTIO_RXQ, m, nr_xmit, dma_id, 0);
-		__atomic_add_fetch(&vdev->pkts_inflight, ret, __ATOMIC_SEQ_CST);
-
-		enqueue_fail = nr_xmit - ret;
-		if (enqueue_fail)
-			free_pkts(&m[ret], nr_xmit - ret);
-	} else {
-		ret = rte_vhost_enqueue_burst(vdev->vid, VIRTIO_RXQ,
-						m, nr_xmit);
-	}
+	ret = vdev_queue_ops[vdev->vid].enqueue_pkt_burst(vdev, VIRTIO_RXQ, m, nr_xmit);
 
 	if (enable_stats) {
 		__atomic_add_fetch(&vdev->stats.rx_total_atomic, nr_xmit,
@@ -1056,7 +1069,7 @@ drain_vhost(struct vhost_dev *vdev)
 				__ATOMIC_SEQ_CST);
 	}
 
-	if (!dma_bind[vdev->vid].dmas[VIRTIO_RXQ].async_enabled)
+	if (!dma_bind[vid2socketid[vdev->vid]].dmas[VIRTIO_RXQ].async_enabled)
 		free_pkts(m, nr_xmit);
 }
 
@@ -1328,6 +1341,33 @@ drain_mbuf_table(struct mbuf_table *tx_q)
 	}
 }
 
+uint16_t
+async_enqueue_pkts(struct vhost_dev *dev, uint16_t queue_id,
+		struct rte_mbuf **pkts, uint32_t rx_count)
+{
+	uint16_t enqueue_count;
+	uint16_t enqueue_fail = 0;
+	uint16_t dma_id = dma_bind[vid2socketid[dev->vid]].dmas[VIRTIO_RXQ].dev_id;
+
+	complete_async_pkts(dev);
+	enqueue_count = rte_vhost_submit_enqueue_burst(dev->vid, queue_id,
+					pkts, rx_count, dma_id, 0);
+	__atomic_add_fetch(&dev->pkts_enq_inflight, enqueue_count, __ATOMIC_SEQ_CST);
+
+	enqueue_fail = rx_count - enqueue_count;
+	if (enqueue_fail)
+		free_pkts(&pkts[enqueue_count], enqueue_fail);
+
+	return enqueue_count;
+}
+
+uint16_t
+sync_enqueue_pkts(struct vhost_dev *dev, uint16_t queue_id,
+		struct rte_mbuf **pkts, uint32_t rx_count)
+{
+	return rte_vhost_enqueue_burst(dev->vid, queue_id, pkts, rx_count);
+}
+
 static __rte_always_inline void
 drain_eth_rx(struct vhost_dev *vdev)
 {
@@ -1358,26 +1398,8 @@ drain_eth_rx(struct vhost_dev *vdev)
 		}
 	}
 
-	if (builtin_net_driver) {
-		enqueue_count = vs_enqueue_pkts(vdev, VIRTIO_RXQ,
-						pkts, rx_count);
-	} else if (dma_bind[vdev->vid].dmas[VIRTIO_RXQ].async_enabled) {
-		uint16_t enqueue_fail = 0;
-		int16_t dma_id = dma_bind[vdev->vid].dmas[VIRTIO_RXQ].dev_id;
-
-		complete_async_pkts(vdev);
-		enqueue_count = rte_vhost_submit_enqueue_burst(vdev->vid,
-					VIRTIO_RXQ, pkts, rx_count, dma_id, 0);
-		__atomic_add_fetch(&vdev->pkts_inflight, enqueue_count, __ATOMIC_SEQ_CST);
-
-		enqueue_fail = rx_count - enqueue_count;
-		if (enqueue_fail)
-			free_pkts(&pkts[enqueue_count], enqueue_fail);
-
-	} else {
-		enqueue_count = rte_vhost_enqueue_burst(vdev->vid, VIRTIO_RXQ,
-						pkts, rx_count);
-	}
+	enqueue_count = vdev_queue_ops[vdev->vid].enqueue_pkt_burst(vdev,
+					VIRTIO_RXQ, pkts, rx_count);
 
 	if (enable_stats) {
 		__atomic_add_fetch(&vdev->stats.rx_total_atomic, rx_count,
@@ -1386,10 +1408,33 @@ drain_eth_rx(struct vhost_dev *vdev)
 				__ATOMIC_SEQ_CST);
 	}
 
-	if (!dma_bind[vdev->vid].dmas[VIRTIO_RXQ].async_enabled)
+	if (!dma_bind[vid2socketid[vdev->vid]].dmas[VIRTIO_RXQ].async_enabled)
 		free_pkts(pkts, rx_count);
 }
 
+uint16_t async_dequeue_pkts(struct vhost_dev *dev, uint16_t queue_id,
+			    struct rte_mempool *mbuf_pool,
+			    struct rte_mbuf **pkts, uint16_t count)
+{
+	int nr_inflight;
+	uint16_t dequeue_count;
+	uint16_t dma_id = dma_bind[vid2socketid[dev->vid]].dmas[VIRTIO_TXQ].dev_id;
+
+	dequeue_count = rte_vhost_async_try_dequeue_burst(dev->vid, queue_id,
+			mbuf_pool, pkts, count, &nr_inflight, dma_id, 0);
+	if (likely(nr_inflight != -1))
+		dev->pkts_deq_inflight = nr_inflight;
+
+	return dequeue_count;
+}
+
+uint16_t sync_dequeue_pkts(struct vhost_dev *dev, uint16_t queue_id,
+			   struct rte_mempool *mbuf_pool,
+			   struct rte_mbuf **pkts, uint16_t count)
+{
+	return rte_vhost_dequeue_burst(dev->vid, queue_id, mbuf_pool, pkts, count);
+}
+
 static __rte_always_inline void
 drain_virtio_tx(struct vhost_dev *vdev)
 {
@@ -1397,13 +1442,8 @@ drain_virtio_tx(struct vhost_dev *vdev)
 	uint16_t count;
 	uint16_t i;
 
-	if (builtin_net_driver) {
-		count = vs_dequeue_pkts(vdev, VIRTIO_TXQ, mbuf_pool,
-					pkts, MAX_PKT_BURST);
-	} else {
-		count = rte_vhost_dequeue_burst(vdev->vid, VIRTIO_TXQ,
-					mbuf_pool, pkts, MAX_PKT_BURST);
-	}
+	count = vdev_queue_ops[vdev->vid].dequeue_pkt_burst(vdev,
+				VIRTIO_TXQ, mbuf_pool, pkts, MAX_PKT_BURST);
 
 	/* setup VMDq for the first packet */
 	if (unlikely(vdev->ready == DEVICE_MAC_LEARNING) && count) {
@@ -1482,6 +1522,31 @@ switch_worker(void *arg __rte_unused)
 	return 0;
 }
 
+static void
+vhost_clear_queue_thread_unsafe(struct vhost_dev *vdev, uint16_t queue_id)
+{
+	uint16_t n_pkt = 0;
+	uint16_t dma_id = dma_bind[vid2socketid[vdev->vid]].dmas[queue_id].dev_id;
+	struct rte_mbuf *m_enq_cpl[vdev->pkts_enq_inflight];
+	struct rte_mbuf *m_deq_cpl[vdev->pkts_deq_inflight];
+
+	if (queue_id % 2 == 0) {
+		while (vdev->pkts_enq_inflight) {
+			n_pkt = rte_vhost_clear_queue_thread_unsafe(vdev->vid,
+				queue_id, m_enq_cpl, vdev->pkts_enq_inflight, dma_id, 0);
+			free_pkts(m_enq_cpl, n_pkt);
+			__atomic_sub_fetch(&vdev->pkts_enq_inflight, n_pkt, __ATOMIC_SEQ_CST);
+		}
+	} else {
+		while (vdev->pkts_deq_inflight) {
+			n_pkt = rte_vhost_clear_queue_thread_unsafe(vdev->vid,
+				queue_id, m_deq_cpl, vdev->pkts_deq_inflight, dma_id, 0);
+			free_pkts(m_deq_cpl, n_pkt);
+			__atomic_sub_fetch(&vdev->pkts_deq_inflight, n_pkt, __ATOMIC_SEQ_CST);
+		}
+	}
+}
+
 /*
  * Remove a device from the specific data core linked list and from the
  * main linked list. Synchronization  occurs through the use of the
@@ -1538,25 +1603,78 @@ destroy_device(int vid)
 		"(%d) device has been removed from data core\n",
 		vdev->vid);
 
-	if (dma_bind[vid].dmas[VIRTIO_RXQ].async_enabled) {
-		uint16_t n_pkt = 0;
-		int16_t dma_id = dma_bind[vid].dmas[VIRTIO_RXQ].dev_id;
-		struct rte_mbuf *m_cpl[vdev->pkts_inflight];
-
-		while (vdev->pkts_inflight) {
-			n_pkt = rte_vhost_clear_queue_thread_unsafe(vid, VIRTIO_RXQ,
-						m_cpl, vdev->pkts_inflight, dma_id, 0);
-			free_pkts(m_cpl, n_pkt);
-			__atomic_sub_fetch(&vdev->pkts_inflight, n_pkt, __ATOMIC_SEQ_CST);
-		}
-
+	if (dma_bind[vid2socketid[vid]].dmas[VIRTIO_RXQ].async_enabled) {
+		vhost_clear_queue_thread_unsafe(vdev, VIRTIO_RXQ);
 		rte_vhost_async_channel_unregister(vid, VIRTIO_RXQ);
-		dma_bind[vid].dmas[VIRTIO_RXQ].async_enabled = false;
+		dma_bind[vid2socketid[vid]].dmas[VIRTIO_RXQ].async_enabled = false;
+	}
+	if (dma_bind[vid2socketid[vid]].dmas[VIRTIO_TXQ].async_enabled) {
+		vhost_clear_queue_thread_unsafe(vdev, VIRTIO_TXQ);
+		rte_vhost_async_channel_unregister(vid, VIRTIO_TXQ);
+		dma_bind[vid2socketid[vid]].dmas[VIRTIO_TXQ].async_enabled = false;
 	}
 
 	rte_free(vdev);
 }
 
+static int
+get_socketid_by_vid(int vid)
+{
+	int i;
+	char ifname[PATH_MAX];
+	rte_vhost_get_ifname(vid, ifname, sizeof(ifname));
+
+	for (i = 0; i < nb_sockets; i++) {
+		char *file = socket_files + i * PATH_MAX;
+		if (strcmp(file, ifname) == 0)
+			return i;
+	}
+
+	return -1;
+}
+
+static int
+init_vhost_queue_ops(int vid)
+{
+	if (builtin_net_driver) {
+		vdev_queue_ops[vid].enqueue_pkt_burst = builtin_enqueue_pkts;
+		vdev_queue_ops[vid].dequeue_pkt_burst = builtin_dequeue_pkts;
+	} else {
+		if (dma_bind[vid2socketid[vid]].dmas[VIRTIO_RXQ].async_enabled)
+			vdev_queue_ops[vid].enqueue_pkt_burst = async_enqueue_pkts;
+		else
+			vdev_queue_ops[vid].enqueue_pkt_burst = sync_enqueue_pkts;
+
+		if (dma_bind[vid2socketid[vid]].dmas[VIRTIO_TXQ].async_enabled)
+			vdev_queue_ops[vid].dequeue_pkt_burst = async_dequeue_pkts;
+		else
+			vdev_queue_ops[vid].dequeue_pkt_burst = sync_dequeue_pkts;
+	}
+
+	return 0;
+}
+
+static int
+vhost_async_channel_register(int vid)
+{
+	int rx_ret = 0, tx_ret = 0;
+
+	if (dma_bind[vid2socketid[vid]].dmas[VIRTIO_RXQ].dev_id != INVALID_DMA_ID) {
+		rx_ret = rte_vhost_async_channel_register(vid, VIRTIO_RXQ);
+		if (rx_ret == 0)
+			dma_bind[vid2socketid[vid]].dmas[VIRTIO_RXQ].async_enabled = true;
+	}
+
+	if (dma_bind[vid2socketid[vid]].dmas[VIRTIO_TXQ].dev_id != INVALID_DMA_ID) {
+		tx_ret = rte_vhost_async_channel_register(vid, VIRTIO_TXQ);
+		if (tx_ret == 0)
+			dma_bind[vid2socketid[vid]].dmas[VIRTIO_TXQ].async_enabled = true;
+	}
+
+	return rx_ret | tx_ret;
+}
+
+
 /*
  * A new device is added to a data core. First the device is added to the main linked list
  * and then allocated to a specific data core.
@@ -1568,6 +1686,8 @@ new_device(int vid)
 	uint16_t i;
 	uint32_t device_num_min = num_devices;
 	struct vhost_dev *vdev;
+	int ret;
+
 	vdev = rte_zmalloc("vhost device", sizeof(*vdev), RTE_CACHE_LINE_SIZE);
 	if (vdev == NULL) {
 		RTE_LOG(INFO, VHOST_DATA,
@@ -1590,6 +1710,17 @@ new_device(int vid)
 		}
 	}
 
+	int socketid = get_socketid_by_vid(vid);
+	if (socketid == -1)
+		return -1;
+
+	init_vid2socketid_array(vid, socketid);
+
+	ret =  vhost_async_channel_register(vid);
+
+	if (init_vhost_queue_ops(vid) != 0)
+		return -1;
+
 	if (builtin_net_driver)
 		vs_vhost_net_setup(vdev);
 
@@ -1621,16 +1752,7 @@ new_device(int vid)
 		"(%d) device has been added to data core %d\n",
 		vid, vdev->coreid);
 
-	if (dma_bind[vid].dmas[VIRTIO_RXQ].dev_id != INVALID_DMA_ID) {
-		int ret;
-
-		ret = rte_vhost_async_channel_register(vid, VIRTIO_RXQ);
-		if (ret == 0)
-			dma_bind[vid].dmas[VIRTIO_RXQ].async_enabled = true;
-		return ret;
-	}
-
-	return 0;
+	return ret;
 }
 
 static int
@@ -1648,19 +1770,9 @@ vring_state_changed(int vid, uint16_t queue_id, int enable)
 	if (queue_id != VIRTIO_RXQ)
 		return 0;
 
-	if (dma_bind[vid].dmas[queue_id].async_enabled) {
-		if (!enable) {
-			uint16_t n_pkt = 0;
-			int16_t dma_id = dma_bind[vid].dmas[VIRTIO_RXQ].dev_id;
-			struct rte_mbuf *m_cpl[vdev->pkts_inflight];
-
-			while (vdev->pkts_inflight) {
-				n_pkt = rte_vhost_clear_queue_thread_unsafe(vid, queue_id,
-							m_cpl, vdev->pkts_inflight, dma_id, 0);
-				free_pkts(m_cpl, n_pkt);
-				__atomic_sub_fetch(&vdev->pkts_inflight, n_pkt, __ATOMIC_SEQ_CST);
-			}
-		}
+	if (dma_bind[vid2socketid[vid]].dmas[queue_id].async_enabled) {
+		if (!enable)
+			vhost_clear_queue_thread_unsafe(vdev, queue_id);
 	}
 
 	return 0;
@@ -1885,7 +1997,7 @@ main(int argc, char *argv[])
 	for (i = 0; i < nb_sockets; i++) {
 		char *file = socket_files + i * PATH_MAX;
 
-		if (dma_count)
+		if (dma_count && get_async_flag_by_socketid(i) != 0)
 			flags = flags | RTE_VHOST_USER_ASYNC_COPY;
 
 		ret = rte_vhost_driver_register(file, flags);
diff --git a/examples/vhost/main.h b/examples/vhost/main.h
index b4a453e77e..40ac2841d1 100644
--- a/examples/vhost/main.h
+++ b/examples/vhost/main.h
@@ -52,7 +52,8 @@ struct vhost_dev {
 	uint64_t features;
 	size_t hdr_len;
 	uint16_t nr_vrings;
-	uint16_t pkts_inflight;
+	uint16_t pkts_enq_inflight;
+	uint16_t pkts_deq_inflight;
 	struct rte_vhost_memory *mem;
 	struct device_statistics stats;
 	TAILQ_ENTRY(vhost_dev) global_vdev_entry;
@@ -62,6 +63,19 @@ struct vhost_dev {
 	struct vhost_queue queues[MAX_QUEUE_PAIRS * 2];
 } __rte_cache_aligned;
 
+typedef uint16_t (*vhost_enqueue_burst_t)(struct vhost_dev *dev,
+			uint16_t queue_id, struct rte_mbuf **pkts,
+			uint32_t count);
+
+typedef uint16_t (*vhost_dequeue_burst_t)(struct vhost_dev *dev,
+			uint16_t queue_id, struct rte_mempool *mbuf_pool,
+			struct rte_mbuf **pkts, uint16_t count);
+
+struct vhost_queue_ops {
+	vhost_enqueue_burst_t enqueue_pkt_burst;
+	vhost_dequeue_burst_t dequeue_pkt_burst;
+};
+
 TAILQ_HEAD(vhost_dev_tailq_list, vhost_dev);
 
 
@@ -88,6 +102,7 @@ struct dma_info {
 
 struct dma_for_vhost {
 	struct dma_info dmas[RTE_MAX_QUEUES_PER_PORT * 2];
+	uint32_t async_flag;
 };
 
 /* we implement non-extra virtio net features */
@@ -98,7 +113,19 @@ void vs_vhost_net_remove(struct vhost_dev *dev);
 uint16_t vs_enqueue_pkts(struct vhost_dev *dev, uint16_t queue_id,
 			 struct rte_mbuf **pkts, uint32_t count);
 
-uint16_t vs_dequeue_pkts(struct vhost_dev *dev, uint16_t queue_id,
-			 struct rte_mempool *mbuf_pool,
-			 struct rte_mbuf **pkts, uint16_t count);
+uint16_t builtin_enqueue_pkts(struct vhost_dev *dev, uint16_t queue_id,
+			struct rte_mbuf **pkts, uint32_t count);
+uint16_t builtin_dequeue_pkts(struct vhost_dev *dev, uint16_t queue_id,
+			struct rte_mempool *mbuf_pool,
+			struct rte_mbuf **pkts, uint16_t count);
+uint16_t sync_enqueue_pkts(struct vhost_dev *dev, uint16_t queue_id,
+			 struct rte_mbuf **pkts, uint32_t count);
+uint16_t sync_dequeue_pkts(struct vhost_dev *dev, uint16_t queue_id,
+			struct rte_mempool *mbuf_pool,
+			struct rte_mbuf **pkts, uint16_t count);
+uint16_t async_enqueue_pkts(struct vhost_dev *dev, uint16_t queue_id,
+			 struct rte_mbuf **pkts, uint32_t count);
+uint16_t async_dequeue_pkts(struct vhost_dev *dev, uint16_t queue_id,
+			struct rte_mempool *mbuf_pool,
+			struct rte_mbuf **pkts, uint16_t count);
 #endif /* _MAIN_H_ */
diff --git a/examples/vhost/virtio_net.c b/examples/vhost/virtio_net.c
index 9064fc3a82..2432a96566 100644
--- a/examples/vhost/virtio_net.c
+++ b/examples/vhost/virtio_net.c
@@ -238,6 +238,13 @@ vs_enqueue_pkts(struct vhost_dev *dev, uint16_t queue_id,
 	return count;
 }
 
+uint16_t
+builtin_enqueue_pkts(struct vhost_dev *dev, uint16_t queue_id,
+		struct rte_mbuf **pkts, uint32_t count)
+{
+	return vs_enqueue_pkts(dev, queue_id, pkts, count);
+}
+
 static __rte_always_inline int
 dequeue_pkt(struct vhost_dev *dev, struct rte_vhost_vring *vr,
 	    struct rte_mbuf *m, uint16_t desc_idx,
@@ -363,7 +370,7 @@ dequeue_pkt(struct vhost_dev *dev, struct rte_vhost_vring *vr,
 	return 0;
 }
 
-uint16_t
+static uint16_t
 vs_dequeue_pkts(struct vhost_dev *dev, uint16_t queue_id,
 	struct rte_mempool *mbuf_pool, struct rte_mbuf **pkts, uint16_t count)
 {
@@ -440,3 +447,10 @@ vs_dequeue_pkts(struct vhost_dev *dev, uint16_t queue_id,
 
 	return i;
 }
+
+uint16_t
+builtin_dequeue_pkts(struct vhost_dev *dev, uint16_t queue_id,
+	struct rte_mempool *mbuf_pool, struct rte_mbuf **pkts, uint16_t count)
+{
+	return vs_dequeue_pkts(dev, queue_id, mbuf_pool, pkts, count);
+}
-- 
2.17.1


^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH v3 3/5] vhost: merge sync and async descriptor to mbuf filling
  2022-04-19  3:43   ` [PATCH v3 3/5] vhost: merge sync and async descriptor to mbuf filling xuan.ding
@ 2022-04-22 11:06     ` David Marchand
  2022-04-22 15:46       ` Maxime Coquelin
  2022-04-22 15:43     ` Maxime Coquelin
  1 sibling, 1 reply; 73+ messages in thread
From: David Marchand @ 2022-04-22 11:06 UTC (permalink / raw)
  To: Xuan Ding
  Cc: Maxime Coquelin, Xia, Chenbo, dev, Jiayu Hu, Cheng Jiang,
	Sunil Pai G, liangma

We (at RH) have some issues with our email infrastructure, so I can't
reply inline of the patch.

Copy/pasting the code:

+static __rte_always_inline uint16_t
+async_poll_dequeue_completed_split(struct virtio_net *dev, uint16_t queue_id,
+		struct rte_mbuf **pkts, uint16_t count, uint16_t dma_id,
+		uint16_t vchan_id, bool legacy_ol_flags)
+{
+	uint16_t start_idx, from, i;
+	uint16_t nr_cpl_pkts = 0;
+	struct async_inflight_info *pkts_info;
+	struct vhost_virtqueue *vq = dev->virtqueue[queue_id];
+

Please, don't pass queue_id as an input parameter for
async_poll_dequeue_completed_split().
The caller of this helper already dereferenced the vq.
You can pass vq.
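
Something along these lines (a sketch only, exact parameter order left to
the author):

    static __rte_always_inline uint16_t
    async_poll_dequeue_completed_split(struct virtio_net *dev,
            struct vhost_virtqueue *vq,
            struct rte_mbuf **pkts, uint16_t count, uint16_t dma_id,
            uint16_t vchan_id, bool legacy_ol_flags);

with the vq->async lookups then relying on the vq handed in by the caller
rather than on dev->virtqueue[queue_id].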


-- 
David Marchand


^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH v3 1/5] vhost: prepare sync for descriptor to mbuf refactoring
  2022-04-19  3:43   ` [PATCH v3 1/5] vhost: prepare sync for descriptor to mbuf refactoring xuan.ding
@ 2022-04-22 15:30     ` Maxime Coquelin
  0 siblings, 0 replies; 73+ messages in thread
From: Maxime Coquelin @ 2022-04-22 15:30 UTC (permalink / raw)
  To: xuan.ding, chenbo.xia; +Cc: dev, jiayu.hu, cheng1.jiang, sunil.pai.g, liangma

Hi Xuan,

On 4/19/22 05:43, xuan.ding@intel.com wrote:
> From: Xuan Ding <xuan.ding@intel.com>
> 
> This patch extracts the descriptors to buffers filling from
> copy_desc_to_mbuf() into a dedicated function. Besides, enqueue
> and dequeue path are refactored to use the same function
> sync_fill_seg() for preparing batch elements, which simplifies
> the code without performance degradation.
> 
> Signed-off-by: Xuan Ding <xuan.ding@intel.com>
> ---
>   lib/vhost/virtio_net.c | 76 ++++++++++++++++++++----------------------
>   1 file changed, 37 insertions(+), 39 deletions(-)
> 

Nice refactoring, thanks for doing it:

Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>

Thanks,
Maxime


^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH v3 2/5] vhost: prepare async for descriptor to mbuf refactoring
  2022-04-19  3:43   ` [PATCH v3 2/5] vhost: prepare async " xuan.ding
@ 2022-04-22 15:32     ` Maxime Coquelin
  0 siblings, 0 replies; 73+ messages in thread
From: Maxime Coquelin @ 2022-04-22 15:32 UTC (permalink / raw)
  To: xuan.ding, chenbo.xia; +Cc: dev, jiayu.hu, cheng1.jiang, sunil.pai.g, liangma



On 4/19/22 05:43, xuan.ding@intel.com wrote:
> From: Xuan Ding <xuan.ding@intel.com>
> 
> This patch refactors vhost async enqueue path and dequeue path to use
> the same function async_fill_seg() for preparing batch elements,
> which simplifies the code without performance degradation.
> 
> Signed-off-by: Xuan Ding <xuan.ding@intel.com>
> ---
>   lib/vhost/virtio_net.c | 23 +++++++++++++++--------
>   1 file changed, 15 insertions(+), 8 deletions(-)
> 
> diff --git a/lib/vhost/virtio_net.c b/lib/vhost/virtio_net.c
> index 6d53016c75..391fb82f0e 100644
> --- a/lib/vhost/virtio_net.c
> +++ b/lib/vhost/virtio_net.c
> @@ -997,13 +997,14 @@ async_iter_reset(struct vhost_async *async)
>   }
>   
>   static __rte_always_inline int
> -async_mbuf_to_desc_seg(struct virtio_net *dev, struct vhost_virtqueue *vq,
> +async_fill_seg(struct virtio_net *dev, struct vhost_virtqueue *vq,
>   		struct rte_mbuf *m, uint32_t mbuf_offset,
> -		uint64_t buf_iova, uint32_t cpy_len)
> +		uint64_t buf_iova, uint32_t cpy_len, bool to_desc)
>   {
>   	struct vhost_async *async = vq->async;
>   	uint64_t mapped_len;
>   	uint32_t buf_offset = 0;
> +	void *src, *dst;
>   	void *host_iova;
>   
>   	while (cpy_len) {
> @@ -1015,10 +1016,16 @@ async_mbuf_to_desc_seg(struct virtio_net *dev, struct vhost_virtqueue *vq,
>   			return -1;
>   		}
>   
> -		if (unlikely(async_iter_add_iovec(dev, async,
> -						(void *)(uintptr_t)rte_pktmbuf_iova_offset(m,
> -							mbuf_offset),
> -						host_iova, (size_t)mapped_len)))
> +		if (to_desc) {
> +			src = (void *)(uintptr_t)rte_pktmbuf_iova_offset(m, mbuf_offset);
> +			dst = host_iova;
> +		} else {
> +			src = host_iova;
> +			dst = (void *)(uintptr_t)rte_pktmbuf_iova_offset(m, mbuf_offset);
> +		}
> +
> +		if (unlikely(async_iter_add_iovec(dev, async, src, dst,
> +						 (size_t)mapped_len)))

Minor, but it may fit in a single line.

>   			return -1;
>   
>   		cpy_len -= (uint32_t)mapped_len;
> @@ -1167,8 +1174,8 @@ mbuf_to_desc(struct virtio_net *dev, struct vhost_virtqueue *vq,
>   		cpy_len = RTE_MIN(buf_avail, mbuf_avail);
>   
>   		if (is_async) {
> -			if (async_mbuf_to_desc_seg(dev, vq, m, mbuf_offset,
> -						buf_iova + buf_offset, cpy_len) < 0)
> +			if (async_fill_seg(dev, vq, m, mbuf_offset,
> +						buf_iova + buf_offset, cpy_len, true) < 0)
>   				goto error;
>   		} else {
>   			sync_fill_seg(dev, vq, m, mbuf_offset,

Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>

Thanks,
Maxime


^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH v3 3/5] vhost: merge sync and async descriptor to mbuf filling
  2022-04-19  3:43   ` [PATCH v3 3/5] vhost: merge sync and async descriptor to mbuf filling xuan.ding
  2022-04-22 11:06     ` David Marchand
@ 2022-04-22 15:43     ` Maxime Coquelin
  1 sibling, 0 replies; 73+ messages in thread
From: Maxime Coquelin @ 2022-04-22 15:43 UTC (permalink / raw)
  To: xuan.ding, chenbo.xia; +Cc: dev, jiayu.hu, cheng1.jiang, sunil.pai.g, liangma



On 4/19/22 05:43, xuan.ding@intel.com wrote:
> From: Xuan Ding <xuan.ding@intel.com>
> 
> This patch refactors copy_desc_to_mbuf() used by the sync
> path to support both sync and async descriptor to mbuf filling.
> 
> Signed-off-by: Xuan Ding <xuan.ding@intel.com>
> ---
>   lib/vhost/vhost.h      |  1 +
>   lib/vhost/virtio_net.c | 48 ++++++++++++++++++++++++++++++++----------
>   2 files changed, 38 insertions(+), 11 deletions(-)
> 

Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>

Thanks,
Maxime


^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH v3 3/5] vhost: merge sync and async descriptor to mbuf filling
  2022-04-22 11:06     ` David Marchand
@ 2022-04-22 15:46       ` Maxime Coquelin
  2022-04-24  2:02         ` Ding, Xuan
  0 siblings, 1 reply; 73+ messages in thread
From: Maxime Coquelin @ 2022-04-22 15:46 UTC (permalink / raw)
  To: David Marchand, Xuan Ding
  Cc: Xia, Chenbo, dev, Jiayu Hu, Cheng Jiang, Sunil Pai G, liangma



On 4/22/22 13:06, David Marchand wrote:
> We (at RH) have some issues with our email infrastructure, so I can't
> reply inline of the patch.
> 
> Copy/pasting the code:
> 
> +static __rte_always_inline uint16_t
> +async_poll_dequeue_completed_split(struct virtio_net *dev, uint16_t queue_id,
> +		struct rte_mbuf **pkts, uint16_t count, uint16_t dma_id,
> +		uint16_t vchan_id, bool legacy_ol_flags)
> +{
> +	uint16_t start_idx, from, i;
> +	uint16_t nr_cpl_pkts = 0;
> +	struct async_inflight_info *pkts_info;
> +	struct vhost_virtqueue *vq = dev->virtqueue[queue_id];
> +
> 
> Please, don't pass queue_id as an input parameter for
> async_poll_dequeue_completed_split().
> The caller of this helper already dereferenced the vq.
> You can pass vq.
> 
> 


I think David's comment was intended to be a reply to patch 4, but I 
agree with him.

Could you please fix this and also fix the build issues reported by the
CI? I'll continue the review on V4.

Thanks,
Maxime


^ permalink raw reply	[flat|nested] 73+ messages in thread

* RE: [PATCH v3 3/5] vhost: merge sync and async descriptor to mbuf filling
  2022-04-22 15:46       ` Maxime Coquelin
@ 2022-04-24  2:02         ` Ding, Xuan
  0 siblings, 0 replies; 73+ messages in thread
From: Ding, Xuan @ 2022-04-24  2:02 UTC (permalink / raw)
  To: Maxime Coquelin, David Marchand
  Cc: Xia, Chenbo, dev, Hu, Jiayu, Jiang, Cheng1, Pai G, Sunil, liangma

Hi Maxime, David,

> -----Original Message-----
> From: Maxime Coquelin <maxime.coquelin@redhat.com>
> Sent: Friday, April 22, 2022 11:46 PM
> To: David Marchand <david.marchand@redhat.com>; Ding, Xuan
> <xuan.ding@intel.com>
> Cc: Xia, Chenbo <chenbo.xia@intel.com>; dev <dev@dpdk.org>; Hu, Jiayu
> <jiayu.hu@intel.com>; Jiang, Cheng1 <cheng1.jiang@intel.com>; Pai G, Sunil
> <sunil.pai.g@intel.com>; liangma@liangbit.com
> Subject: Re: [PATCH v3 3/5] vhost: merge sync and async descriptor to mbuf
> filling
> 
> 
> 
> On 4/22/22 13:06, David Marchand wrote:
> > We (at RH) have some issues with our email infrastructure, so I can't
> > reply inline of the patch.
> >
> > Copy/pasting the code:
> >
> > +static __rte_always_inline uint16_t
> > +async_poll_dequeue_completed_split(struct virtio_net *dev, uint16_t queue_id,
> > +		struct rte_mbuf **pkts, uint16_t count, uint16_t dma_id,
> > +		uint16_t vchan_id, bool legacy_ol_flags)
> > +{
> > +	uint16_t start_idx, from, i;
> > +	uint16_t nr_cpl_pkts = 0;
> > +	struct async_inflight_info *pkts_info;
> > +	struct vhost_virtqueue *vq = dev->virtqueue[queue_id];
> > +
> >
> > Please, don't pass queue_id as an input parameter for
> > async_poll_dequeue_completed_split().
> > The caller of this helper already dereferenced the vq.
> > You can pass vq.
> >
> >
> 
> 
> I think David's comment was intended to be a reply to patch 4, but I agree
> with him.
> 
> Could you please fix this and also fix the build issues reported by the CI? I'll
> continue the review on V4.

Thanks for your suggestion, please see v4.

Regards,
Xuan

> 
> Thanks,
> Maxime


^ permalink raw reply	[flat|nested] 73+ messages in thread

* [PATCH v4 0/5] vhost: support async dequeue data path
  2022-04-07 15:25 [PATCH v1 0/5] vhost: support async dequeue data path xuan.ding
                   ` (6 preceding siblings ...)
  2022-04-19  3:43 ` [PATCH v3 0/5] vhost: " xuan.ding
@ 2022-05-05  6:23 ` xuan.ding
  2022-05-05  6:23   ` [PATCH v4 1/5] vhost: prepare sync for descriptor to mbuf refactoring xuan.ding
                     ` (5 more replies)
  2022-05-13  2:00 ` [PATCH v5 " xuan.ding
                   ` (3 subsequent siblings)
  11 siblings, 6 replies; 73+ messages in thread
From: xuan.ding @ 2022-05-05  6:23 UTC (permalink / raw)
  To: maxime.coquelin, chenbo.xia
  Cc: dev, jiayu.hu, cheng1.jiang, sunil.pai.g, liangma, Xuan Ding

From: Xuan Ding <xuan.ding@intel.com>

The presence of asynchronous path allows applications to offload memory
copies to DMA engine, so as to save CPU cycles and improve the copy
performance. This patch set implements vhost async dequeue data path
for split ring. The code is based on latest enqueue changes [1].

This patch set is a new design and implementation of [2]. Since dmadev
was introduced in DPDK 21.11, to simplify application logics, this patch
integrates dmadev in vhost. With dmadev integrated, vhost supports M:N
mapping between vrings and DMA virtual channels. Specifically, one vring
can use multiple different DMA channels and one DMA channel can be
shared by multiple vrings at the same time.

A new asynchronous dequeue function is introduced:
        1) rte_vhost_async_try_dequeue_burst(int vid, uint16_t queue_id,
                struct rte_mempool *mbuf_pool, struct rte_mbuf **pkts,
                uint16_t count, int *nr_inflight,
                uint16_t dma_id, uint16_t vchan_id)

        Receive packets from the guest and offloads copies to DMA
virtual channel.

[1] https://mails.dpdk.org/archives/dev/2022-February/234555.html
[2] https://mails.dpdk.org/archives/dev/2021-September/218591.html

v3->v4:
* fix CI build warnings
* adjust some indentation
* pass vq instead of queue_id

v2->v3:
* fix mbuf not updated correctly for large packets

v1->v2:
* fix a typo
* fix a bug in desc_to_mbuf filling

RFC v3 -> v1:
* add sync and async path descriptor to mbuf refactoring
* add API description in docs

RFC v2 -> RFC v3:
* rebase to latest DPDK version

RFC v1 -> RFC v2:
* fix one bug in example
* rename vchan to vchan_id
* check if dma_id and vchan_id valid
* rework all the logs to new standard

Xuan Ding (5):
  vhost: prepare sync for descriptor to mbuf refactoring
  vhost: prepare async for descriptor to mbuf refactoring
  vhost: merge sync and async descriptor to mbuf filling
  vhost: support async dequeue for split ring
  examples/vhost: support async dequeue data path

 doc/guides/prog_guide/vhost_lib.rst    |   7 +
 doc/guides/rel_notes/release_22_07.rst |   4 +
 doc/guides/sample_app_ug/vhost.rst     |   9 +-
 examples/vhost/main.c                  | 292 +++++++++++-----
 examples/vhost/main.h                  |  35 +-
 examples/vhost/virtio_net.c            |  16 +-
 lib/vhost/rte_vhost_async.h            |  33 ++
 lib/vhost/version.map                  |   3 +
 lib/vhost/vhost.h                      |   1 +
 lib/vhost/virtio_net.c                 | 467 ++++++++++++++++++++++---
 10 files changed, 716 insertions(+), 151 deletions(-)

-- 
2.17.1


^ permalink raw reply	[flat|nested] 73+ messages in thread

* [PATCH v4 1/5] vhost: prepare sync for descriptor to mbuf refactoring
  2022-05-05  6:23 ` [PATCH v4 0/5] vhost: " xuan.ding
@ 2022-05-05  6:23   ` xuan.ding
  2022-05-05  7:37     ` Yang, YvonneX
  2022-05-05  6:23   ` [PATCH v4 2/5] vhost: prepare async " xuan.ding
                     ` (4 subsequent siblings)
  5 siblings, 1 reply; 73+ messages in thread
From: xuan.ding @ 2022-05-05  6:23 UTC (permalink / raw)
  To: maxime.coquelin, chenbo.xia
  Cc: dev, jiayu.hu, cheng1.jiang, sunil.pai.g, liangma, Xuan Ding

From: Xuan Ding <xuan.ding@intel.com>

This patch extracts the descriptors to buffers filling from
copy_desc_to_mbuf() into a dedicated function. Besides, enqueue
and dequeue path are refactored to use the same function
sync_fill_seg() for preparing batch elements, which simplifies
the code without performance degradation.

Signed-off-by: Xuan Ding <xuan.ding@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
---
 lib/vhost/virtio_net.c | 78 ++++++++++++++++++++----------------------
 1 file changed, 38 insertions(+), 40 deletions(-)

diff --git a/lib/vhost/virtio_net.c b/lib/vhost/virtio_net.c
index 5f432b0d77..d4c94d2a9b 100644
--- a/lib/vhost/virtio_net.c
+++ b/lib/vhost/virtio_net.c
@@ -1030,23 +1030,36 @@ async_mbuf_to_desc_seg(struct virtio_net *dev, struct vhost_virtqueue *vq,
 }
 
 static __rte_always_inline void
-sync_mbuf_to_desc_seg(struct virtio_net *dev, struct vhost_virtqueue *vq,
+sync_fill_seg(struct virtio_net *dev, struct vhost_virtqueue *vq,
 		struct rte_mbuf *m, uint32_t mbuf_offset,
-		uint64_t buf_addr, uint64_t buf_iova, uint32_t cpy_len)
+		uint64_t buf_addr, uint64_t buf_iova, uint32_t cpy_len, bool to_desc)
 {
 	struct batch_copy_elem *batch_copy = vq->batch_copy_elems;
 
 	if (likely(cpy_len > MAX_BATCH_LEN || vq->batch_copy_nb_elems >= vq->size)) {
-		rte_memcpy((void *)((uintptr_t)(buf_addr)),
+		if (to_desc) {
+			rte_memcpy((void *)((uintptr_t)(buf_addr)),
 				rte_pktmbuf_mtod_offset(m, void *, mbuf_offset),
 				cpy_len);
+		} else {
+			rte_memcpy(rte_pktmbuf_mtod_offset(m, void *, mbuf_offset),
+				(void *)((uintptr_t)(buf_addr)),
+				cpy_len);
+		}
 		vhost_log_cache_write_iova(dev, vq, buf_iova, cpy_len);
 		PRINT_PACKET(dev, (uintptr_t)(buf_addr), cpy_len, 0);
 	} else {
-		batch_copy[vq->batch_copy_nb_elems].dst =
-			(void *)((uintptr_t)(buf_addr));
-		batch_copy[vq->batch_copy_nb_elems].src =
-			rte_pktmbuf_mtod_offset(m, void *, mbuf_offset);
+		if (to_desc) {
+			batch_copy[vq->batch_copy_nb_elems].dst =
+				(void *)((uintptr_t)(buf_addr));
+			batch_copy[vq->batch_copy_nb_elems].src =
+				rte_pktmbuf_mtod_offset(m, void *, mbuf_offset);
+		} else {
+			batch_copy[vq->batch_copy_nb_elems].dst =
+				rte_pktmbuf_mtod_offset(m, void *, mbuf_offset);
+			batch_copy[vq->batch_copy_nb_elems].src =
+				(void *)((uintptr_t)(buf_addr));
+		}
 		batch_copy[vq->batch_copy_nb_elems].log_addr = buf_iova;
 		batch_copy[vq->batch_copy_nb_elems].len = cpy_len;
 		vq->batch_copy_nb_elems++;
@@ -1158,9 +1171,9 @@ mbuf_to_desc(struct virtio_net *dev, struct vhost_virtqueue *vq,
 						buf_iova + buf_offset, cpy_len) < 0)
 				goto error;
 		} else {
-			sync_mbuf_to_desc_seg(dev, vq, m, mbuf_offset,
-					buf_addr + buf_offset,
-					buf_iova + buf_offset, cpy_len);
+			sync_fill_seg(dev, vq, m, mbuf_offset,
+				      buf_addr + buf_offset,
+				      buf_iova + buf_offset, cpy_len, true);
 		}
 
 		mbuf_avail  -= cpy_len;
@@ -2473,8 +2486,8 @@ copy_desc_to_mbuf(struct virtio_net *dev, struct vhost_virtqueue *vq,
 		  struct rte_mbuf *m, struct rte_mempool *mbuf_pool,
 		  bool legacy_ol_flags)
 {
-	uint32_t buf_avail, buf_offset;
-	uint64_t buf_addr, buf_len;
+	uint32_t buf_avail, buf_offset, buf_len;
+	uint64_t buf_addr, buf_iova;
 	uint32_t mbuf_avail, mbuf_offset;
 	uint32_t cpy_len;
 	struct rte_mbuf *cur = m, *prev = m;
@@ -2482,16 +2495,13 @@ copy_desc_to_mbuf(struct virtio_net *dev, struct vhost_virtqueue *vq,
 	struct virtio_net_hdr *hdr = NULL;
 	/* A counter to avoid desc dead loop chain */
 	uint16_t vec_idx = 0;
-	struct batch_copy_elem *batch_copy = vq->batch_copy_elems;
-	int error = 0;
 
 	buf_addr = buf_vec[vec_idx].buf_addr;
+	buf_iova = buf_vec[vec_idx].buf_iova;
 	buf_len = buf_vec[vec_idx].buf_len;
 
-	if (unlikely(buf_len < dev->vhost_hlen && nr_vec <= 1)) {
-		error = -1;
-		goto out;
-	}
+	if (unlikely(buf_len < dev->vhost_hlen && nr_vec <= 1))
+		return -1;
 
 	if (virtio_net_with_host_offload(dev)) {
 		if (unlikely(buf_len < sizeof(struct virtio_net_hdr))) {
@@ -2515,11 +2525,12 @@ copy_desc_to_mbuf(struct virtio_net *dev, struct vhost_virtqueue *vq,
 		buf_offset = dev->vhost_hlen - buf_len;
 		vec_idx++;
 		buf_addr = buf_vec[vec_idx].buf_addr;
+		buf_iova = buf_vec[vec_idx].buf_iova;
 		buf_len = buf_vec[vec_idx].buf_len;
 		buf_avail  = buf_len - buf_offset;
 	} else if (buf_len == dev->vhost_hlen) {
 		if (unlikely(++vec_idx >= nr_vec))
-			goto out;
+			goto error;
 		buf_addr = buf_vec[vec_idx].buf_addr;
 		buf_len = buf_vec[vec_idx].buf_len;
 
@@ -2539,22 +2550,9 @@ copy_desc_to_mbuf(struct virtio_net *dev, struct vhost_virtqueue *vq,
 	while (1) {
 		cpy_len = RTE_MIN(buf_avail, mbuf_avail);
 
-		if (likely(cpy_len > MAX_BATCH_LEN ||
-					vq->batch_copy_nb_elems >= vq->size ||
-					(hdr && cur == m))) {
-			rte_memcpy(rte_pktmbuf_mtod_offset(cur, void *,
-						mbuf_offset),
-					(void *)((uintptr_t)(buf_addr +
-							buf_offset)), cpy_len);
-		} else {
-			batch_copy[vq->batch_copy_nb_elems].dst =
-				rte_pktmbuf_mtod_offset(cur, void *,
-						mbuf_offset);
-			batch_copy[vq->batch_copy_nb_elems].src =
-				(void *)((uintptr_t)(buf_addr + buf_offset));
-			batch_copy[vq->batch_copy_nb_elems].len = cpy_len;
-			vq->batch_copy_nb_elems++;
-		}
+		sync_fill_seg(dev, vq, cur, mbuf_offset,
+			      buf_addr + buf_offset,
+			      buf_iova + buf_offset, cpy_len, false);
 
 		mbuf_avail  -= cpy_len;
 		mbuf_offset += cpy_len;
@@ -2567,6 +2565,7 @@ copy_desc_to_mbuf(struct virtio_net *dev, struct vhost_virtqueue *vq,
 				break;
 
 			buf_addr = buf_vec[vec_idx].buf_addr;
+			buf_iova = buf_vec[vec_idx].buf_iova;
 			buf_len = buf_vec[vec_idx].buf_len;
 
 			buf_offset = 0;
@@ -2585,8 +2584,7 @@ copy_desc_to_mbuf(struct virtio_net *dev, struct vhost_virtqueue *vq,
 			if (unlikely(cur == NULL)) {
 				VHOST_LOG_DATA(ERR, "(%s) failed to allocate memory for mbuf.\n",
 						dev->ifname);
-				error = -1;
-				goto out;
+				goto error;
 			}
 
 			prev->next = cur;
@@ -2606,9 +2604,9 @@ copy_desc_to_mbuf(struct virtio_net *dev, struct vhost_virtqueue *vq,
 	if (hdr)
 		vhost_dequeue_offload(dev, hdr, m, legacy_ol_flags);
 
-out:
-
-	return error;
+	return 0;
+error:
+	return -1;
 }
 
 static void
-- 
2.17.1


^ permalink raw reply	[flat|nested] 73+ messages in thread

* [PATCH v4 2/5] vhost: prepare async for descriptor to mbuf refactoring
  2022-05-05  6:23 ` [PATCH v4 0/5] vhost: " xuan.ding
  2022-05-05  6:23   ` [PATCH v4 1/5] vhost: prepare sync for descriptor to mbuf refactoring xuan.ding
@ 2022-05-05  6:23   ` xuan.ding
  2022-05-05  7:38     ` Yang, YvonneX
  2022-05-05  6:23   ` [PATCH v4 3/5] vhost: merge sync and async descriptor to mbuf filling xuan.ding
                     ` (3 subsequent siblings)
  5 siblings, 1 reply; 73+ messages in thread
From: xuan.ding @ 2022-05-05  6:23 UTC (permalink / raw)
  To: maxime.coquelin, chenbo.xia
  Cc: dev, jiayu.hu, cheng1.jiang, sunil.pai.g, liangma, Xuan Ding

From: Xuan Ding <xuan.ding@intel.com>

This patch refactors the vhost async enqueue and dequeue paths to use
the same function, async_fill_seg(), for preparing batch elements,
which simplifies the code without performance degradation.
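
The same direction trick is used here for the DMA path; roughly (again a
sketch under assumed names, not the patch code), only the source and
destination roles of the queued copy change:

        #include <stdbool.h>

        /* Sketch: select DMA source/destination from the direction flag.
         * In the real async_fill_seg() the pair is then queued with
         * async_iter_add_iovec(); IOVA handling is omitted here. */
        static inline void
        pick_dma_dir(void *host_buf, void *mbuf_buf, bool to_desc,
                     void **src, void **dst)
        {
                if (to_desc) {          /* enqueue: mbuf -> descriptor buffer */
                        *src = mbuf_buf;
                        *dst = host_buf;
                } else {                /* dequeue: descriptor buffer -> mbuf */
                        *src = host_buf;
                        *dst = mbuf_buf;
                }
        }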

Signed-off-by: Xuan Ding <xuan.ding@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
---
 lib/vhost/virtio_net.c | 22 ++++++++++++++--------
 1 file changed, 14 insertions(+), 8 deletions(-)

diff --git a/lib/vhost/virtio_net.c b/lib/vhost/virtio_net.c
index d4c94d2a9b..a9e2dcd9ce 100644
--- a/lib/vhost/virtio_net.c
+++ b/lib/vhost/virtio_net.c
@@ -997,13 +997,14 @@ async_iter_reset(struct vhost_async *async)
 }
 
 static __rte_always_inline int
-async_mbuf_to_desc_seg(struct virtio_net *dev, struct vhost_virtqueue *vq,
+async_fill_seg(struct virtio_net *dev, struct vhost_virtqueue *vq,
 		struct rte_mbuf *m, uint32_t mbuf_offset,
-		uint64_t buf_iova, uint32_t cpy_len)
+		uint64_t buf_iova, uint32_t cpy_len, bool to_desc)
 {
 	struct vhost_async *async = vq->async;
 	uint64_t mapped_len;
 	uint32_t buf_offset = 0;
+	void *src, *dst;
 	void *host_iova;
 
 	while (cpy_len) {
@@ -1015,10 +1016,15 @@ async_mbuf_to_desc_seg(struct virtio_net *dev, struct vhost_virtqueue *vq,
 			return -1;
 		}
 
-		if (unlikely(async_iter_add_iovec(dev, async,
-						(void *)(uintptr_t)rte_pktmbuf_iova_offset(m,
-							mbuf_offset),
-						host_iova, (size_t)mapped_len)))
+		if (to_desc) {
+			src = (void *)(uintptr_t)rte_pktmbuf_iova_offset(m, mbuf_offset);
+			dst = host_iova;
+		} else {
+			src = host_iova;
+			dst = (void *)(uintptr_t)rte_pktmbuf_iova_offset(m, mbuf_offset);
+		}
+
+		if (unlikely(async_iter_add_iovec(dev, async, src, dst, (size_t)mapped_len)))
 			return -1;
 
 		cpy_len -= (uint32_t)mapped_len;
@@ -1167,8 +1173,8 @@ mbuf_to_desc(struct virtio_net *dev, struct vhost_virtqueue *vq,
 		cpy_len = RTE_MIN(buf_avail, mbuf_avail);
 
 		if (is_async) {
-			if (async_mbuf_to_desc_seg(dev, vq, m, mbuf_offset,
-						buf_iova + buf_offset, cpy_len) < 0)
+			if (async_fill_seg(dev, vq, m, mbuf_offset,
+					   buf_iova + buf_offset, cpy_len, true) < 0)
 				goto error;
 		} else {
 			sync_fill_seg(dev, vq, m, mbuf_offset,
-- 
2.17.1


^ permalink raw reply	[flat|nested] 73+ messages in thread

* [PATCH v4 3/5] vhost: merge sync and async descriptor to mbuf filling
  2022-05-05  6:23 ` [PATCH v4 0/5] vhost: " xuan.ding
  2022-05-05  6:23   ` [PATCH v4 1/5] vhost: prepare sync for descriptor to mbuf refactoring xuan.ding
  2022-05-05  6:23   ` [PATCH v4 2/5] vhost: prepare async " xuan.ding
@ 2022-05-05  6:23   ` xuan.ding
  2022-05-05  7:39     ` Yang, YvonneX
  2022-05-05  6:23   ` [PATCH v4 4/5] vhost: support async dequeue for split ring xuan.ding
                     ` (2 subsequent siblings)
  5 siblings, 1 reply; 73+ messages in thread
From: xuan.ding @ 2022-05-05  6:23 UTC (permalink / raw)
  To: maxime.coquelin, chenbo.xia
  Cc: dev, jiayu.hu, cheng1.jiang, sunil.pai.g, liangma, Xuan Ding

From: Xuan Ding <xuan.ding@intel.com>

This patch refactors copy_desc_to_mbuf() used by the sync
path to support both sync and async descriptor to mbuf filling.
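
A side effect of the merge is that the async case cannot apply offload
information right away: the mbuf payload is only valid once the DMA copy
completes, so the virtio net header is saved per in-flight slot and the
offload flags are applied at completion time (this is why the patch adds
a nethdr field to struct async_inflight_info). A simplified sketch with
hypothetical names:

        #include <linux/virtio_net.h>
        #include <rte_mbuf.h>

        /* Hypothetical in-flight bookkeeping: save the header at submit
         * time, apply offloads only after the DMA copy has finished. */
        struct inflight_slot {
                struct rte_mbuf *mbuf;
                struct virtio_net_hdr hdr;
        };

        static inline void
        slot_on_submit(struct inflight_slot *slot, struct rte_mbuf *m,
                       const struct virtio_net_hdr *hdr)
        {
                slot->mbuf = m;
                slot->hdr = *hdr; /* payload not copied yet, defer offloads */
        }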

Signed-off-by: Xuan Ding <xuan.ding@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
---
 lib/vhost/vhost.h      |  1 +
 lib/vhost/virtio_net.c | 48 ++++++++++++++++++++++++++++++++----------
 2 files changed, 38 insertions(+), 11 deletions(-)

diff --git a/lib/vhost/vhost.h b/lib/vhost/vhost.h
index a9edc271aa..00744b234f 100644
--- a/lib/vhost/vhost.h
+++ b/lib/vhost/vhost.h
@@ -180,6 +180,7 @@ struct async_inflight_info {
 	struct rte_mbuf *mbuf;
 	uint16_t descs; /* num of descs inflight */
 	uint16_t nr_buffers; /* num of buffers inflight for packed ring */
+	struct virtio_net_hdr nethdr;
 };
 
 struct vhost_async {
diff --git a/lib/vhost/virtio_net.c b/lib/vhost/virtio_net.c
index a9e2dcd9ce..5904839d5c 100644
--- a/lib/vhost/virtio_net.c
+++ b/lib/vhost/virtio_net.c
@@ -2487,10 +2487,10 @@ copy_vnet_hdr_from_desc(struct virtio_net_hdr *hdr,
 }
 
 static __rte_always_inline int
-copy_desc_to_mbuf(struct virtio_net *dev, struct vhost_virtqueue *vq,
+desc_to_mbuf(struct virtio_net *dev, struct vhost_virtqueue *vq,
 		  struct buf_vector *buf_vec, uint16_t nr_vec,
 		  struct rte_mbuf *m, struct rte_mempool *mbuf_pool,
-		  bool legacy_ol_flags)
+		  bool legacy_ol_flags, uint16_t slot_idx, bool is_async)
 {
 	uint32_t buf_avail, buf_offset, buf_len;
 	uint64_t buf_addr, buf_iova;
@@ -2501,6 +2501,8 @@ copy_desc_to_mbuf(struct virtio_net *dev, struct vhost_virtqueue *vq,
 	struct virtio_net_hdr *hdr = NULL;
 	/* A counter to avoid desc dead loop chain */
 	uint16_t vec_idx = 0;
+	struct vhost_async *async = vq->async;
+	struct async_inflight_info *pkts_info;
 
 	buf_addr = buf_vec[vec_idx].buf_addr;
 	buf_iova = buf_vec[vec_idx].buf_iova;
@@ -2538,6 +2540,7 @@ copy_desc_to_mbuf(struct virtio_net *dev, struct vhost_virtqueue *vq,
 		if (unlikely(++vec_idx >= nr_vec))
 			goto error;
 		buf_addr = buf_vec[vec_idx].buf_addr;
+		buf_iova = buf_vec[vec_idx].buf_iova;
 		buf_len = buf_vec[vec_idx].buf_len;
 
 		buf_offset = 0;
@@ -2553,12 +2556,25 @@ copy_desc_to_mbuf(struct virtio_net *dev, struct vhost_virtqueue *vq,
 
 	mbuf_offset = 0;
 	mbuf_avail  = m->buf_len - RTE_PKTMBUF_HEADROOM;
+
+	if (is_async) {
+		pkts_info = async->pkts_info;
+		if (async_iter_initialize(dev, async))
+			return -1;
+	}
+
 	while (1) {
 		cpy_len = RTE_MIN(buf_avail, mbuf_avail);
 
-		sync_fill_seg(dev, vq, cur, mbuf_offset,
-			      buf_addr + buf_offset,
-			      buf_iova + buf_offset, cpy_len, false);
+		if (is_async) {
+			if (async_fill_seg(dev, vq, cur, mbuf_offset,
+					   buf_iova + buf_offset, cpy_len, false) < 0)
+				goto error;
+		} else {
+			sync_fill_seg(dev, vq, cur, mbuf_offset,
+				      buf_addr + buf_offset,
+				      buf_iova + buf_offset, cpy_len, false);
+		}
 
 		mbuf_avail  -= cpy_len;
 		mbuf_offset += cpy_len;
@@ -2607,11 +2623,20 @@ copy_desc_to_mbuf(struct virtio_net *dev, struct vhost_virtqueue *vq,
 	prev->data_len = mbuf_offset;
 	m->pkt_len    += mbuf_offset;
 
-	if (hdr)
-		vhost_dequeue_offload(dev, hdr, m, legacy_ol_flags);
+	if (is_async) {
+		async_iter_finalize(async);
+		if (hdr)
+			pkts_info[slot_idx].nethdr = *hdr;
+	} else {
+		if (hdr)
+			vhost_dequeue_offload(dev, hdr, m, legacy_ol_flags);
+	}
 
 	return 0;
 error:
+	if (is_async)
+		async_iter_cancel(async);
+
 	return -1;
 }
 
@@ -2743,8 +2768,8 @@ virtio_dev_tx_split(struct virtio_net *dev, struct vhost_virtqueue *vq,
 			break;
 		}
 
-		err = copy_desc_to_mbuf(dev, vq, buf_vec, nr_vec, pkts[i],
-				mbuf_pool, legacy_ol_flags);
+		err = desc_to_mbuf(dev, vq, buf_vec, nr_vec, pkts[i],
+				   mbuf_pool, legacy_ol_flags, 0, false);
 		if (unlikely(err)) {
 			if (!allocerr_warned) {
 				VHOST_LOG_DATA(ERR, "(%s) failed to copy desc to mbuf.\n",
@@ -2755,6 +2780,7 @@ virtio_dev_tx_split(struct virtio_net *dev, struct vhost_virtqueue *vq,
 			i++;
 			break;
 		}
+
 	}
 
 	if (dropped)
@@ -2936,8 +2962,8 @@ vhost_dequeue_single_packed(struct virtio_net *dev,
 		return -1;
 	}
 
-	err = copy_desc_to_mbuf(dev, vq, buf_vec, nr_vec, pkts,
-				mbuf_pool, legacy_ol_flags);
+	err = desc_to_mbuf(dev, vq, buf_vec, nr_vec, pkts,
+			   mbuf_pool, legacy_ol_flags, 0, false);
 	if (unlikely(err)) {
 		if (!allocerr_warned) {
 			VHOST_LOG_DATA(ERR, "(%s) failed to copy desc to mbuf.\n",
-- 
2.17.1


^ permalink raw reply	[flat|nested] 73+ messages in thread

* [PATCH v4 4/5] vhost: support async dequeue for split ring
  2022-05-05  6:23 ` [PATCH v4 0/5] vhost: " xuan.ding
                     ` (2 preceding siblings ...)
  2022-05-05  6:23   ` [PATCH v4 3/5] vhost: merge sync and async descriptor to mbuf filling xuan.ding
@ 2022-05-05  6:23   ` xuan.ding
  2022-05-05  7:40     ` Yang, YvonneX
  2022-05-05 19:36     ` Maxime Coquelin
  2022-05-05  6:23   ` [PATCH v4 5/5] examples/vhost: support async dequeue data path xuan.ding
  2022-05-05 19:52   ` [PATCH v4 0/5] vhost: " Maxime Coquelin
  5 siblings, 2 replies; 73+ messages in thread
From: xuan.ding @ 2022-05-05  6:23 UTC (permalink / raw)
  To: maxime.coquelin, chenbo.xia
  Cc: dev, jiayu.hu, cheng1.jiang, sunil.pai.g, liangma, Xuan Ding, Yuan Wang

From: Xuan Ding <xuan.ding@intel.com>

This patch implements the asynchronous dequeue data path for vhost
split ring. A new API, rte_vhost_async_try_dequeue_burst(), is introduced.
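
Submitted copies may still be in flight when a virtqueue is disabled or a
device is destroyed, so callers are expected to drain them first. A rough
sketch, modelled on the sample application in patch 5/5 (the inflight
counter, dma_id and vchan_id are assumptions of the sketch):

        #include <rte_common.h>
        #include <rte_mbuf.h>
        #include <rte_vhost_async.h>

        /* Sketch only: spin until all dequeue copies in flight on virtqueue 1
         * have completed. Uses the thread-unsafe clear variant, so it must
         * not race with the core polling this virtqueue. */
        static void
        drain_deq_inflight(int vid, uint16_t inflight,
                           int16_t dma_id, uint16_t vchan_id)
        {
                struct rte_mbuf *cpl[512];
                uint16_t n;

                while (inflight > 0) {
                        n = rte_vhost_clear_queue_thread_unsafe(vid, 1, cpl,
                                        RTE_MIN(inflight, (uint16_t)512),
                                        dma_id, vchan_id);
                        rte_pktmbuf_free_bulk(cpl, n);
                        inflight -= n;
                }
        }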

Signed-off-by: Xuan Ding <xuan.ding@intel.com>
Signed-off-by: Yuan Wang <yuanx.wang@intel.com>
---
 doc/guides/prog_guide/vhost_lib.rst    |   7 +
 doc/guides/rel_notes/release_22_07.rst |   4 +
 lib/vhost/rte_vhost_async.h            |  33 +++
 lib/vhost/version.map                  |   3 +
 lib/vhost/virtio_net.c                 | 331 +++++++++++++++++++++++++
 5 files changed, 378 insertions(+)

diff --git a/doc/guides/prog_guide/vhost_lib.rst b/doc/guides/prog_guide/vhost_lib.rst
index 886f8f5e72..40cf315170 100644
--- a/doc/guides/prog_guide/vhost_lib.rst
+++ b/doc/guides/prog_guide/vhost_lib.rst
@@ -276,6 +276,13 @@ The following is an overview of some key Vhost API functions:
   Clear inflight packets which are submitted to DMA engine in vhost async data
   path. Completed packets are returned to applications through ``pkts``.
 
+* ``rte_vhost_async_try_dequeue_burst(int vid, uint16_t queue_id,
+  struct rte_mempool *mbuf_pool, struct rte_mbuf **pkts, uint16_t count,
+	int *nr_inflight, uint16_t dma_id, uint16_t vchan_id)``
+
+  Receives (dequeues) ``count`` packets from guest to host in async data path,
+  and stores them in ``pkts``.
+
 Vhost-user Implementations
 --------------------------
 
diff --git a/doc/guides/rel_notes/release_22_07.rst b/doc/guides/rel_notes/release_22_07.rst
index 42a5f2d990..422a6673cb 100644
--- a/doc/guides/rel_notes/release_22_07.rst
+++ b/doc/guides/rel_notes/release_22_07.rst
@@ -55,6 +55,10 @@ New Features
      Also, make sure to start the actual text at the margin.
      =======================================================
 
+* **Added vhost async dequeue API to receive pkts from guest.**
+
+  Added vhost async dequeue API which can leverage DMA devices to accelerate
+  receiving pkts from guest.
 
 Removed Items
 -------------
diff --git a/lib/vhost/rte_vhost_async.h b/lib/vhost/rte_vhost_async.h
index f1293c6a9d..23fe1a7316 100644
--- a/lib/vhost/rte_vhost_async.h
+++ b/lib/vhost/rte_vhost_async.h
@@ -187,6 +187,39 @@ uint16_t rte_vhost_clear_queue_thread_unsafe(int vid, uint16_t queue_id,
 __rte_experimental
 int rte_vhost_async_dma_configure(int16_t dma_id, uint16_t vchan_id);
 
+/**
+ * This function tries to receive packets from the guest with offloading
+ * copies to the async channel. Packets whose copies have completed are
+ * returned in "pkts". Packets whose copies have been submitted to the
+ * async channel but have not yet completed are called "in-flight
+ * packets". This function does not return in-flight packets until their
+ * copies are completed by the async channel.
+ *
+ * @param vid
+ *  ID of vhost device to dequeue data
+ * @param queue_id
+ *  ID of virtqueue to dequeue data
+ * @param mbuf_pool
+ *  Mbuf_pool where host mbuf is allocated
+ * @param pkts
+ *  Blank array to keep successfully dequeued packets
+ * @param count
+ *  Size of the packet array
+ * @param nr_inflight
+ *  The amount of in-flight packets. If error occurred, its value is set to -1.
+ * @param dma_id
+ *  The identifier of DMA device
+ * @param vchan_id
+ *  The identifier of virtual DMA channel
+ * @return
+ *  Number of successfully dequeued packets
+ */
+__rte_experimental
+uint16_t
+rte_vhost_async_try_dequeue_burst(int vid, uint16_t queue_id,
+	struct rte_mempool *mbuf_pool, struct rte_mbuf **pkts, uint16_t count,
+	int *nr_inflight, uint16_t dma_id, uint16_t vchan_id);
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/vhost/version.map b/lib/vhost/version.map
index 0a66c5840c..514e3ff6a6 100644
--- a/lib/vhost/version.map
+++ b/lib/vhost/version.map
@@ -87,6 +87,9 @@ EXPERIMENTAL {
 
 	# added in 22.03
 	rte_vhost_async_dma_configure;
+
+	# added in 22.07
+	rte_vhost_async_try_dequeue_burst;
 };
 
 INTERNAL {
diff --git a/lib/vhost/virtio_net.c b/lib/vhost/virtio_net.c
index 5904839d5c..0e5fecfd2c 100644
--- a/lib/vhost/virtio_net.c
+++ b/lib/vhost/virtio_net.c
@@ -3171,3 +3171,334 @@ rte_vhost_dequeue_burst(int vid, uint16_t queue_id,
 
 	return count;
 }
+
+static __rte_always_inline uint16_t
+async_poll_dequeue_completed_split(struct virtio_net *dev, struct vhost_virtqueue *vq,
+		struct rte_mbuf **pkts, uint16_t count, uint16_t dma_id,
+		uint16_t vchan_id, bool legacy_ol_flags)
+{
+	uint16_t start_idx, from, i;
+	uint16_t nr_cpl_pkts = 0;
+	struct async_inflight_info *pkts_info = vq->async->pkts_info;
+
+	vhost_async_dma_check_completed(dev, dma_id, vchan_id, VHOST_DMA_MAX_COPY_COMPLETE);
+
+	start_idx = async_get_first_inflight_pkt_idx(vq);
+
+	from = start_idx;
+	while (vq->async->pkts_cmpl_flag[from] && count--) {
+		vq->async->pkts_cmpl_flag[from] = false;
+		from = (from + 1) & (vq->size - 1);
+		nr_cpl_pkts++;
+	}
+
+	if (nr_cpl_pkts == 0)
+		return 0;
+
+	for (i = 0; i < nr_cpl_pkts; i++) {
+		from = (start_idx + i) & (vq->size - 1);
+		pkts[i] = pkts_info[from].mbuf;
+
+		if (virtio_net_with_host_offload(dev))
+			vhost_dequeue_offload(dev, &pkts_info[from].nethdr, pkts[i],
+					      legacy_ol_flags);
+	}
+
+	/* write back completed descs to used ring and update used idx */
+	write_back_completed_descs_split(vq, nr_cpl_pkts);
+	__atomic_add_fetch(&vq->used->idx, nr_cpl_pkts, __ATOMIC_RELEASE);
+	vhost_vring_call_split(dev, vq);
+
+	vq->async->pkts_inflight_n -= nr_cpl_pkts;
+
+	return nr_cpl_pkts;
+}
+
+static __rte_always_inline uint16_t
+virtio_dev_tx_async_split(struct virtio_net *dev, struct vhost_virtqueue *vq,
+		struct rte_mempool *mbuf_pool, struct rte_mbuf **pkts, uint16_t count,
+		uint16_t dma_id, uint16_t vchan_id, bool legacy_ol_flags)
+{
+	static bool allocerr_warned;
+	bool dropped = false;
+	uint16_t free_entries;
+	uint16_t pkt_idx, slot_idx = 0;
+	uint16_t nr_done_pkts = 0;
+	uint16_t pkt_err = 0;
+	uint16_t n_xfer;
+	struct vhost_async *async = vq->async;
+	struct async_inflight_info *pkts_info = async->pkts_info;
+	struct rte_mbuf *pkts_prealloc[MAX_PKT_BURST];
+	uint16_t pkts_size = count;
+
+	/**
+	 * The ordering between avail index and
+	 * desc reads needs to be enforced.
+	 */
+	free_entries = __atomic_load_n(&vq->avail->idx, __ATOMIC_ACQUIRE) -
+			vq->last_avail_idx;
+	if (free_entries == 0)
+		goto out;
+
+	rte_prefetch0(&vq->avail->ring[vq->last_avail_idx & (vq->size - 1)]);
+
+	async_iter_reset(async);
+
+	count = RTE_MIN(count, MAX_PKT_BURST);
+	count = RTE_MIN(count, free_entries);
+	VHOST_LOG_DATA(DEBUG, "(%s) about to dequeue %u buffers\n",
+			dev->ifname, count);
+
+	if (rte_pktmbuf_alloc_bulk(mbuf_pool, pkts_prealloc, count))
+		goto out;
+
+	for (pkt_idx = 0; pkt_idx < count; pkt_idx++) {
+		uint16_t head_idx = 0;
+		uint16_t nr_vec = 0;
+		uint16_t to;
+		uint32_t buf_len;
+		int err;
+		struct buf_vector buf_vec[BUF_VECTOR_MAX];
+		struct rte_mbuf *pkt = pkts_prealloc[pkt_idx];
+
+		if (unlikely(fill_vec_buf_split(dev, vq, vq->last_avail_idx,
+						&nr_vec, buf_vec,
+						&head_idx, &buf_len,
+						VHOST_ACCESS_RO) < 0)) {
+			dropped = true;
+			break;
+		}
+
+		err = virtio_dev_pktmbuf_prep(dev, pkt, buf_len);
+		if (unlikely(err)) {
+			/**
+			 * mbuf allocation fails for jumbo packets when external
+			 * buffer allocation is not allowed and linear buffer
+			 * is required. Drop this packet.
+			 */
+			if (!allocerr_warned) {
+				VHOST_LOG_DATA(ERR,
+					"(%s) %s: Failed mbuf alloc of size %d from %s\n",
+					dev->ifname, __func__, buf_len, mbuf_pool->name);
+				allocerr_warned = true;
+			}
+			dropped = true;
+			break;
+		}
+
+		slot_idx = (async->pkts_idx + pkt_idx) & (vq->size - 1);
+		err = desc_to_mbuf(dev, vq, buf_vec, nr_vec, pkt, mbuf_pool,
+					legacy_ol_flags, slot_idx, true);
+		if (unlikely(err)) {
+			if (!allocerr_warned) {
+				VHOST_LOG_DATA(ERR,
+					"(%s) %s: Failed to offload copies to async channel.\n",
+					dev->ifname, __func__);
+				allocerr_warned = true;
+			}
+			dropped = true;
+			break;
+		}
+
+		pkts_info[slot_idx].mbuf = pkt;
+
+		/* store used descs */
+		to = async->desc_idx_split & (vq->size - 1);
+		async->descs_split[to].id = head_idx;
+		async->descs_split[to].len = 0;
+		async->desc_idx_split++;
+
+		vq->last_avail_idx++;
+	}
+
+	if (unlikely(dropped))
+		rte_pktmbuf_free_bulk(&pkts_prealloc[pkt_idx], count - pkt_idx);
+
+	n_xfer = vhost_async_dma_transfer(dev, vq, dma_id, vchan_id, async->pkts_idx,
+					  async->iov_iter, pkt_idx);
+
+	async->pkts_inflight_n += n_xfer;
+
+	pkt_err = pkt_idx - n_xfer;
+	if (unlikely(pkt_err)) {
+		VHOST_LOG_DATA(DEBUG, "(%s) %s: failed to transfer data.\n",
+				dev->ifname, __func__);
+
+		pkt_idx = n_xfer;
+		/* recover available ring */
+		vq->last_avail_idx -= pkt_err;
+
+		/**
+		 * recover async channel copy related structures and free pktmbufs
+		 * for error pkts.
+		 */
+		async->desc_idx_split -= pkt_err;
+		while (pkt_err-- > 0) {
+			rte_pktmbuf_free(pkts_info[slot_idx & (vq->size - 1)].mbuf);
+			slot_idx--;
+		}
+	}
+
+	async->pkts_idx += pkt_idx;
+	if (async->pkts_idx >= vq->size)
+		async->pkts_idx -= vq->size;
+
+out:
+	/* DMA device may serve other queues, unconditionally check completed. */
+	nr_done_pkts = async_poll_dequeue_completed_split(dev, vq, pkts, pkts_size,
+							  dma_id, vchan_id, legacy_ol_flags);
+
+	return nr_done_pkts;
+}
+
+__rte_noinline
+static uint16_t
+virtio_dev_tx_async_split_legacy(struct virtio_net *dev,
+		struct vhost_virtqueue *vq, struct rte_mempool *mbuf_pool,
+		struct rte_mbuf **pkts, uint16_t count,
+		uint16_t dma_id, uint16_t vchan_id)
+{
+	return virtio_dev_tx_async_split(dev, vq, mbuf_pool,
+				pkts, count, dma_id, vchan_id, true);
+}
+
+__rte_noinline
+static uint16_t
+virtio_dev_tx_async_split_compliant(struct virtio_net *dev,
+		struct vhost_virtqueue *vq, struct rte_mempool *mbuf_pool,
+		struct rte_mbuf **pkts, uint16_t count,
+		uint16_t dma_id, uint16_t vchan_id)
+{
+	return virtio_dev_tx_async_split(dev, vq, mbuf_pool,
+				pkts, count, dma_id, vchan_id, false);
+}
+
+uint16_t
+rte_vhost_async_try_dequeue_burst(int vid, uint16_t queue_id,
+	struct rte_mempool *mbuf_pool, struct rte_mbuf **pkts, uint16_t count,
+	int *nr_inflight, uint16_t dma_id, uint16_t vchan_id)
+{
+	struct virtio_net *dev;
+	struct rte_mbuf *rarp_mbuf = NULL;
+	struct vhost_virtqueue *vq;
+	int16_t success = 1;
+
+	*nr_inflight = -1;
+
+	dev = get_device(vid);
+	if (!dev)
+		return 0;
+
+	if (unlikely(!(dev->flags & VIRTIO_DEV_BUILTIN_VIRTIO_NET))) {
+		VHOST_LOG_DATA(ERR, "(%s) %s: built-in vhost net backend is disabled.\n",
+				dev->ifname, __func__);
+		return 0;
+	}
+
+	if (unlikely(!is_valid_virt_queue_idx(queue_id, 1, dev->nr_vring))) {
+		VHOST_LOG_DATA(ERR, "(%s) %s: invalid virtqueue idx %d.\n",
+				dev->ifname, __func__, queue_id);
+		return 0;
+	}
+
+	if (unlikely(!dma_copy_track[dma_id].vchans ||
+				!dma_copy_track[dma_id].vchans[vchan_id].pkts_cmpl_flag_addr)) {
+		VHOST_LOG_DATA(ERR, "(%s) %s: invalid channel %d:%u.\n", dev->ifname, __func__,
+				dma_id, vchan_id);
+		return 0;
+	}
+
+	vq = dev->virtqueue[queue_id];
+
+	if (unlikely(rte_spinlock_trylock(&vq->access_lock) == 0))
+		return 0;
+
+	if (unlikely(vq->enabled == 0)) {
+		count = 0;
+		goto out_access_unlock;
+	}
+
+	if (unlikely(!vq->async)) {
+		VHOST_LOG_DATA(ERR, "(%s) %s: async not registered for queue id %d.\n",
+				dev->ifname, __func__, queue_id);
+		count = 0;
+		goto out_access_unlock;
+	}
+
+	if (dev->features & (1ULL << VIRTIO_F_IOMMU_PLATFORM))
+		vhost_user_iotlb_rd_lock(vq);
+
+	if (unlikely(vq->access_ok == 0))
+		if (unlikely(vring_translate(dev, vq) < 0)) {
+			count = 0;
+			goto out;
+		}
+
+	/*
+	 * Construct a RARP broadcast packet, and inject it to the "pkts"
+	 * array, to looks like that guest actually send such packet.
+	 *
+	 * Check user_send_rarp() for more information.
+	 *
+	 * broadcast_rarp shares a cacheline in the virtio_net structure
+	 * with some fields that are accessed during enqueue and
+	 * __atomic_compare_exchange_n causes a write if performed compare
+	 * and exchange. This could result in false sharing between enqueue
+	 * and dequeue.
+	 *
+	 * Prevent unnecessary false sharing by reading broadcast_rarp first
+	 * and only performing compare and exchange if the read indicates it
+	 * is likely to be set.
+	 */
+	if (unlikely(__atomic_load_n(&dev->broadcast_rarp, __ATOMIC_ACQUIRE) &&
+			__atomic_compare_exchange_n(&dev->broadcast_rarp,
+			&success, 0, 0, __ATOMIC_RELEASE, __ATOMIC_RELAXED))) {
+
+		rarp_mbuf = rte_net_make_rarp_packet(mbuf_pool, &dev->mac);
+		if (rarp_mbuf == NULL) {
+			VHOST_LOG_DATA(ERR, "Failed to make RARP packet.\n");
+			count = 0;
+			goto out;
+		}
+		/*
+		 * Inject it to the head of "pkts" array, so that switch's mac
+		 * learning table will get updated first.
+		 */
+		pkts[0] = rarp_mbuf;
+		pkts++;
+		count -= 1;
+	}
+
+	if (unlikely(vq_is_packed(dev))) {
+		static bool not_support_pack_log;
+		if (!not_support_pack_log) {
+			VHOST_LOG_DATA(ERR,
+				"(%s) %s: async dequeue does not support packed ring.\n",
+				dev->ifname, __func__);
+			not_support_pack_log = true;
+		}
+		count = 0;
+		goto out;
+	}
+
+	if (dev->flags & VIRTIO_DEV_LEGACY_OL_FLAGS)
+		count = virtio_dev_tx_async_split_legacy(dev, vq, mbuf_pool, pkts,
+							 count, dma_id, vchan_id);
+	else
+		count = virtio_dev_tx_async_split_compliant(dev, vq, mbuf_pool, pkts,
+							    count, dma_id, vchan_id);
+
+	*nr_inflight = vq->async->pkts_inflight_n;
+
+out:
+	if (dev->features & (1ULL << VIRTIO_F_IOMMU_PLATFORM))
+		vhost_user_iotlb_rd_unlock(vq);
+
+out_access_unlock:
+	rte_spinlock_unlock(&vq->access_lock);
+
+	if (unlikely(rarp_mbuf != NULL))
+		count += 1;
+
+	return count;
+}
-- 
2.17.1


^ permalink raw reply	[flat|nested] 73+ messages in thread

* [PATCH v4 5/5] examples/vhost: support async dequeue data path
  2022-05-05  6:23 ` [PATCH v4 0/5] vhost: " xuan.ding
                     ` (3 preceding siblings ...)
  2022-05-05  6:23   ` [PATCH v4 4/5] vhost: support async dequeue for split ring xuan.ding
@ 2022-05-05  6:23   ` xuan.ding
  2022-05-05  7:39     ` Yang, YvonneX
  2022-05-05 19:38     ` Maxime Coquelin
  2022-05-05 19:52   ` [PATCH v4 0/5] vhost: " Maxime Coquelin
  5 siblings, 2 replies; 73+ messages in thread
From: xuan.ding @ 2022-05-05  6:23 UTC (permalink / raw)
  To: maxime.coquelin, chenbo.xia
  Cc: dev, jiayu.hu, cheng1.jiang, sunil.pai.g, liangma, Xuan Ding,
	Wenwu Ma, Yuan Wang

From: Xuan Ding <xuan.ding@intel.com>

This patch adds a use case for the async dequeue API. The vswitch can
leverage a DMA device to accelerate the vhost async dequeue path.
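
For instance (an illustrative command line only; the binary path, EAL
arguments, core list and DMA device BDFs are placeholders), two vhost
ports with both enqueue and dequeue offloaded to DMA could be started as:

        ./dpdk-vhost -l 2-4 -n 4 -- -p 0x1 \
                --socket-file /tmp/sock0 --socket-file /tmp/sock1 \
                --dmas [txd0@00:04.0,rxd0@00:04.2,txd1@00:04.1,rxd1@00:04.3]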

Signed-off-by: Wenwu Ma <wenwux.ma@intel.com>
Signed-off-by: Yuan Wang <yuanx.wang@intel.com>
Signed-off-by: Xuan Ding <xuan.ding@intel.com>
---
 doc/guides/sample_app_ug/vhost.rst |   9 +-
 examples/vhost/main.c              | 292 ++++++++++++++++++++---------
 examples/vhost/main.h              |  35 +++-
 examples/vhost/virtio_net.c        |  16 +-
 4 files changed, 254 insertions(+), 98 deletions(-)

diff --git a/doc/guides/sample_app_ug/vhost.rst b/doc/guides/sample_app_ug/vhost.rst
index a6ce4bc8ac..09db965e70 100644
--- a/doc/guides/sample_app_ug/vhost.rst
+++ b/doc/guides/sample_app_ug/vhost.rst
@@ -169,9 +169,12 @@ demonstrates how to use the async vhost APIs. It's used in combination with dmas
 **--dmas**
 This parameter is used to specify the assigned DMA device of a vhost device.
 Async vhost-user net driver will be used if --dmas is set. For example
---dmas [txd0@00:04.0,txd1@00:04.1] means use DMA channel 00:04.0 for vhost
-device 0 enqueue operation and use DMA channel 00:04.1 for vhost device 1
-enqueue operation.
+--dmas [txd0@00:04.0,txd1@00:04.1,rxd0@00:04.2,rxd1@00:04.3] means use
+DMA channel 00:04.0/00:04.2 for vhost device 0 enqueue/dequeue operation
+and use DMA channel 00:04.1/00:04.3 for vhost device 1 enqueue/dequeue
+operation. The index of the device corresponds to the socket file in order,
+that means vhost device 0 is created through the first socket file, vhost
+device 1 is created through the second socket file, and so on.
 
 Common Issues
 -------------
diff --git a/examples/vhost/main.c b/examples/vhost/main.c
index d94fabb060..d26e40ab73 100644
--- a/examples/vhost/main.c
+++ b/examples/vhost/main.c
@@ -63,6 +63,9 @@
 
 #define DMA_RING_SIZE 4096
 
+#define ASYNC_ENQUEUE_VHOST 1
+#define ASYNC_DEQUEUE_VHOST 2
+
 /* number of mbufs in all pools - if specified on command-line. */
 static int total_num_mbufs = NUM_MBUFS_DEFAULT;
 
@@ -116,6 +119,8 @@ static uint32_t burst_rx_retry_num = BURST_RX_RETRIES;
 static char *socket_files;
 static int nb_sockets;
 
+static struct vhost_queue_ops vdev_queue_ops[RTE_MAX_VHOST_DEVICE];
+
 /* empty VMDq configuration structure. Filled in programmatically */
 static struct rte_eth_conf vmdq_conf_default = {
 	.rxmode = {
@@ -205,6 +210,18 @@ struct vhost_bufftable *vhost_txbuff[RTE_MAX_LCORE * RTE_MAX_VHOST_DEVICE];
 #define MBUF_TABLE_DRAIN_TSC	((rte_get_tsc_hz() + US_PER_S - 1) \
 				 / US_PER_S * BURST_TX_DRAIN_US)
 
+static int vid2socketid[RTE_MAX_VHOST_DEVICE];
+
+static uint32_t get_async_flag_by_socketid(int socketid)
+{
+	return dma_bind[socketid].async_flag;
+}
+
+static void init_vid2socketid_array(int vid, int socketid)
+{
+	vid2socketid[vid] = socketid;
+}
+
 static inline bool
 is_dma_configured(int16_t dev_id)
 {
@@ -224,7 +241,7 @@ open_dma(const char *value)
 	char *addrs = input;
 	char *ptrs[2];
 	char *start, *end, *substr;
-	int64_t vid;
+	int64_t socketid, vring_id;
 
 	struct rte_dma_info info;
 	struct rte_dma_conf dev_config = { .nb_vchans = 1 };
@@ -262,7 +279,9 @@ open_dma(const char *value)
 
 	while (i < args_nr) {
 		char *arg_temp = dma_arg[i];
+		char *txd, *rxd;
 		uint8_t sub_nr;
+		int async_flag;
 
 		sub_nr = rte_strsplit(arg_temp, strlen(arg_temp), ptrs, 2, '@');
 		if (sub_nr != 2) {
@@ -270,14 +289,23 @@ open_dma(const char *value)
 			goto out;
 		}
 
-		start = strstr(ptrs[0], "txd");
-		if (start == NULL) {
+		txd = strstr(ptrs[0], "txd");
+		rxd = strstr(ptrs[0], "rxd");
+		if (txd) {
+			start = txd;
+			vring_id = VIRTIO_RXQ;
+			async_flag = ASYNC_ENQUEUE_VHOST;
+		} else if (rxd) {
+			start = rxd;
+			vring_id = VIRTIO_TXQ;
+			async_flag = ASYNC_DEQUEUE_VHOST;
+		} else {
 			ret = -1;
 			goto out;
 		}
 
 		start += 3;
-		vid = strtol(start, &end, 0);
+		socketid = strtol(start, &end, 0);
 		if (end == start) {
 			ret = -1;
 			goto out;
@@ -338,7 +366,8 @@ open_dma(const char *value)
 		dmas_id[dma_count++] = dev_id;
 
 done:
-		(dma_info + vid)->dmas[VIRTIO_RXQ].dev_id = dev_id;
+		(dma_info + socketid)->dmas[vring_id].dev_id = dev_id;
+		(dma_info + socketid)->async_flag |= async_flag;
 		i++;
 	}
 out:
@@ -990,13 +1019,13 @@ complete_async_pkts(struct vhost_dev *vdev)
 {
 	struct rte_mbuf *p_cpl[MAX_PKT_BURST];
 	uint16_t complete_count;
-	int16_t dma_id = dma_bind[vdev->vid].dmas[VIRTIO_RXQ].dev_id;
+	int16_t dma_id = dma_bind[vid2socketid[vdev->vid]].dmas[VIRTIO_RXQ].dev_id;
 
 	complete_count = rte_vhost_poll_enqueue_completed(vdev->vid,
 					VIRTIO_RXQ, p_cpl, MAX_PKT_BURST, dma_id, 0);
 	if (complete_count) {
 		free_pkts(p_cpl, complete_count);
-		__atomic_sub_fetch(&vdev->pkts_inflight, complete_count, __ATOMIC_SEQ_CST);
+		__atomic_sub_fetch(&vdev->pkts_enq_inflight, complete_count, __ATOMIC_SEQ_CST);
 	}
 
 }
@@ -1031,23 +1060,7 @@ drain_vhost(struct vhost_dev *vdev)
 	uint16_t nr_xmit = vhost_txbuff[buff_idx]->len;
 	struct rte_mbuf **m = vhost_txbuff[buff_idx]->m_table;
 
-	if (builtin_net_driver) {
-		ret = vs_enqueue_pkts(vdev, VIRTIO_RXQ, m, nr_xmit);
-	} else if (dma_bind[vdev->vid].dmas[VIRTIO_RXQ].async_enabled) {
-		uint16_t enqueue_fail = 0;
-		int16_t dma_id = dma_bind[vdev->vid].dmas[VIRTIO_RXQ].dev_id;
-
-		complete_async_pkts(vdev);
-		ret = rte_vhost_submit_enqueue_burst(vdev->vid, VIRTIO_RXQ, m, nr_xmit, dma_id, 0);
-		__atomic_add_fetch(&vdev->pkts_inflight, ret, __ATOMIC_SEQ_CST);
-
-		enqueue_fail = nr_xmit - ret;
-		if (enqueue_fail)
-			free_pkts(&m[ret], nr_xmit - ret);
-	} else {
-		ret = rte_vhost_enqueue_burst(vdev->vid, VIRTIO_RXQ,
-						m, nr_xmit);
-	}
+	ret = vdev_queue_ops[vdev->vid].enqueue_pkt_burst(vdev, VIRTIO_RXQ, m, nr_xmit);
 
 	if (enable_stats) {
 		__atomic_add_fetch(&vdev->stats.rx_total_atomic, nr_xmit,
@@ -1056,7 +1069,7 @@ drain_vhost(struct vhost_dev *vdev)
 				__ATOMIC_SEQ_CST);
 	}
 
-	if (!dma_bind[vdev->vid].dmas[VIRTIO_RXQ].async_enabled)
+	if (!dma_bind[vid2socketid[vdev->vid]].dmas[VIRTIO_RXQ].async_enabled)
 		free_pkts(m, nr_xmit);
 }
 
@@ -1328,6 +1341,33 @@ drain_mbuf_table(struct mbuf_table *tx_q)
 	}
 }
 
+uint16_t
+async_enqueue_pkts(struct vhost_dev *dev, uint16_t queue_id,
+		struct rte_mbuf **pkts, uint32_t rx_count)
+{
+	uint16_t enqueue_count;
+	uint16_t enqueue_fail = 0;
+	uint16_t dma_id = dma_bind[vid2socketid[dev->vid]].dmas[VIRTIO_RXQ].dev_id;
+
+	complete_async_pkts(dev);
+	enqueue_count = rte_vhost_submit_enqueue_burst(dev->vid, queue_id,
+					pkts, rx_count, dma_id, 0);
+	__atomic_add_fetch(&dev->pkts_enq_inflight, enqueue_count, __ATOMIC_SEQ_CST);
+
+	enqueue_fail = rx_count - enqueue_count;
+	if (enqueue_fail)
+		free_pkts(&pkts[enqueue_count], enqueue_fail);
+
+	return enqueue_count;
+}
+
+uint16_t
+sync_enqueue_pkts(struct vhost_dev *dev, uint16_t queue_id,
+		struct rte_mbuf **pkts, uint32_t rx_count)
+{
+	return rte_vhost_enqueue_burst(dev->vid, queue_id, pkts, rx_count);
+}
+
 static __rte_always_inline void
 drain_eth_rx(struct vhost_dev *vdev)
 {
@@ -1358,26 +1398,8 @@ drain_eth_rx(struct vhost_dev *vdev)
 		}
 	}
 
-	if (builtin_net_driver) {
-		enqueue_count = vs_enqueue_pkts(vdev, VIRTIO_RXQ,
-						pkts, rx_count);
-	} else if (dma_bind[vdev->vid].dmas[VIRTIO_RXQ].async_enabled) {
-		uint16_t enqueue_fail = 0;
-		int16_t dma_id = dma_bind[vdev->vid].dmas[VIRTIO_RXQ].dev_id;
-
-		complete_async_pkts(vdev);
-		enqueue_count = rte_vhost_submit_enqueue_burst(vdev->vid,
-					VIRTIO_RXQ, pkts, rx_count, dma_id, 0);
-		__atomic_add_fetch(&vdev->pkts_inflight, enqueue_count, __ATOMIC_SEQ_CST);
-
-		enqueue_fail = rx_count - enqueue_count;
-		if (enqueue_fail)
-			free_pkts(&pkts[enqueue_count], enqueue_fail);
-
-	} else {
-		enqueue_count = rte_vhost_enqueue_burst(vdev->vid, VIRTIO_RXQ,
-						pkts, rx_count);
-	}
+	enqueue_count = vdev_queue_ops[vdev->vid].enqueue_pkt_burst(vdev,
+					VIRTIO_RXQ, pkts, rx_count);
 
 	if (enable_stats) {
 		__atomic_add_fetch(&vdev->stats.rx_total_atomic, rx_count,
@@ -1386,10 +1408,33 @@ drain_eth_rx(struct vhost_dev *vdev)
 				__ATOMIC_SEQ_CST);
 	}
 
-	if (!dma_bind[vdev->vid].dmas[VIRTIO_RXQ].async_enabled)
+	if (!dma_bind[vid2socketid[vdev->vid]].dmas[VIRTIO_RXQ].async_enabled)
 		free_pkts(pkts, rx_count);
 }
 
+uint16_t async_dequeue_pkts(struct vhost_dev *dev, uint16_t queue_id,
+			    struct rte_mempool *mbuf_pool,
+			    struct rte_mbuf **pkts, uint16_t count)
+{
+	int nr_inflight;
+	uint16_t dequeue_count;
+	uint16_t dma_id = dma_bind[vid2socketid[dev->vid]].dmas[VIRTIO_TXQ].dev_id;
+
+	dequeue_count = rte_vhost_async_try_dequeue_burst(dev->vid, queue_id,
+			mbuf_pool, pkts, count, &nr_inflight, dma_id, 0);
+	if (likely(nr_inflight != -1))
+		dev->pkts_deq_inflight = nr_inflight;
+
+	return dequeue_count;
+}
+
+uint16_t sync_dequeue_pkts(struct vhost_dev *dev, uint16_t queue_id,
+			   struct rte_mempool *mbuf_pool,
+			   struct rte_mbuf **pkts, uint16_t count)
+{
+	return rte_vhost_dequeue_burst(dev->vid, queue_id, mbuf_pool, pkts, count);
+}
+
 static __rte_always_inline void
 drain_virtio_tx(struct vhost_dev *vdev)
 {
@@ -1397,13 +1442,8 @@ drain_virtio_tx(struct vhost_dev *vdev)
 	uint16_t count;
 	uint16_t i;
 
-	if (builtin_net_driver) {
-		count = vs_dequeue_pkts(vdev, VIRTIO_TXQ, mbuf_pool,
-					pkts, MAX_PKT_BURST);
-	} else {
-		count = rte_vhost_dequeue_burst(vdev->vid, VIRTIO_TXQ,
-					mbuf_pool, pkts, MAX_PKT_BURST);
-	}
+	count = vdev_queue_ops[vdev->vid].dequeue_pkt_burst(vdev,
+				VIRTIO_TXQ, mbuf_pool, pkts, MAX_PKT_BURST);
 
 	/* setup VMDq for the first packet */
 	if (unlikely(vdev->ready == DEVICE_MAC_LEARNING) && count) {
@@ -1482,6 +1522,31 @@ switch_worker(void *arg __rte_unused)
 	return 0;
 }
 
+static void
+vhost_clear_queue_thread_unsafe(struct vhost_dev *vdev, uint16_t queue_id)
+{
+	uint16_t n_pkt = 0;
+	uint16_t dma_id = dma_bind[vid2socketid[vdev->vid]].dmas[queue_id].dev_id;
+	struct rte_mbuf *m_enq_cpl[vdev->pkts_enq_inflight];
+	struct rte_mbuf *m_deq_cpl[vdev->pkts_deq_inflight];
+
+	if (queue_id % 2 == 0) {
+		while (vdev->pkts_enq_inflight) {
+			n_pkt = rte_vhost_clear_queue_thread_unsafe(vdev->vid,
+				queue_id, m_enq_cpl, vdev->pkts_enq_inflight, dma_id, 0);
+			free_pkts(m_enq_cpl, n_pkt);
+			__atomic_sub_fetch(&vdev->pkts_enq_inflight, n_pkt, __ATOMIC_SEQ_CST);
+		}
+	} else {
+		while (vdev->pkts_deq_inflight) {
+			n_pkt = rte_vhost_clear_queue_thread_unsafe(vdev->vid,
+				queue_id, m_deq_cpl, vdev->pkts_deq_inflight, dma_id, 0);
+			free_pkts(m_deq_cpl, n_pkt);
+			__atomic_sub_fetch(&vdev->pkts_deq_inflight, n_pkt, __ATOMIC_SEQ_CST);
+		}
+	}
+}
+
 /*
  * Remove a device from the specific data core linked list and from the
  * main linked list. Synchronization  occurs through the use of the
@@ -1538,25 +1603,78 @@ destroy_device(int vid)
 		"(%d) device has been removed from data core\n",
 		vdev->vid);
 
-	if (dma_bind[vid].dmas[VIRTIO_RXQ].async_enabled) {
-		uint16_t n_pkt = 0;
-		int16_t dma_id = dma_bind[vid].dmas[VIRTIO_RXQ].dev_id;
-		struct rte_mbuf *m_cpl[vdev->pkts_inflight];
-
-		while (vdev->pkts_inflight) {
-			n_pkt = rte_vhost_clear_queue_thread_unsafe(vid, VIRTIO_RXQ,
-						m_cpl, vdev->pkts_inflight, dma_id, 0);
-			free_pkts(m_cpl, n_pkt);
-			__atomic_sub_fetch(&vdev->pkts_inflight, n_pkt, __ATOMIC_SEQ_CST);
-		}
-
+	if (dma_bind[vid2socketid[vid]].dmas[VIRTIO_RXQ].async_enabled) {
+		vhost_clear_queue_thread_unsafe(vdev, VIRTIO_RXQ);
 		rte_vhost_async_channel_unregister(vid, VIRTIO_RXQ);
-		dma_bind[vid].dmas[VIRTIO_RXQ].async_enabled = false;
+		dma_bind[vid2socketid[vid]].dmas[VIRTIO_RXQ].async_enabled = false;
+	}
+	if (dma_bind[vid2socketid[vid]].dmas[VIRTIO_TXQ].async_enabled) {
+		vhost_clear_queue_thread_unsafe(vdev, VIRTIO_TXQ);
+		rte_vhost_async_channel_unregister(vid, VIRTIO_TXQ);
+		dma_bind[vid2socketid[vid]].dmas[VIRTIO_TXQ].async_enabled = false;
 	}
 
 	rte_free(vdev);
 }
 
+static int
+get_socketid_by_vid(int vid)
+{
+	int i;
+	char ifname[PATH_MAX];
+	rte_vhost_get_ifname(vid, ifname, sizeof(ifname));
+
+	for (i = 0; i < nb_sockets; i++) {
+		char *file = socket_files + i * PATH_MAX;
+		if (strcmp(file, ifname) == 0)
+			return i;
+	}
+
+	return -1;
+}
+
+static int
+init_vhost_queue_ops(int vid)
+{
+	if (builtin_net_driver) {
+		vdev_queue_ops[vid].enqueue_pkt_burst = builtin_enqueue_pkts;
+		vdev_queue_ops[vid].dequeue_pkt_burst = builtin_dequeue_pkts;
+	} else {
+		if (dma_bind[vid2socketid[vid]].dmas[VIRTIO_RXQ].async_enabled)
+			vdev_queue_ops[vid].enqueue_pkt_burst = async_enqueue_pkts;
+		else
+			vdev_queue_ops[vid].enqueue_pkt_burst = sync_enqueue_pkts;
+
+		if (dma_bind[vid2socketid[vid]].dmas[VIRTIO_TXQ].async_enabled)
+			vdev_queue_ops[vid].dequeue_pkt_burst = async_dequeue_pkts;
+		else
+			vdev_queue_ops[vid].dequeue_pkt_burst = sync_dequeue_pkts;
+	}
+
+	return 0;
+}
+
+static int
+vhost_async_channel_register(int vid)
+{
+	int rx_ret = 0, tx_ret = 0;
+
+	if (dma_bind[vid2socketid[vid]].dmas[VIRTIO_RXQ].dev_id != INVALID_DMA_ID) {
+		rx_ret = rte_vhost_async_channel_register(vid, VIRTIO_RXQ);
+		if (rx_ret == 0)
+			dma_bind[vid2socketid[vid]].dmas[VIRTIO_RXQ].async_enabled = true;
+	}
+
+	if (dma_bind[vid2socketid[vid]].dmas[VIRTIO_TXQ].dev_id != INVALID_DMA_ID) {
+		tx_ret = rte_vhost_async_channel_register(vid, VIRTIO_TXQ);
+		if (tx_ret == 0)
+			dma_bind[vid2socketid[vid]].dmas[VIRTIO_TXQ].async_enabled = true;
+	}
+
+	return rx_ret | tx_ret;
+}
+
+
 /*
  * A new device is added to a data core. First the device is added to the main linked list
  * and then allocated to a specific data core.
@@ -1568,6 +1686,8 @@ new_device(int vid)
 	uint16_t i;
 	uint32_t device_num_min = num_devices;
 	struct vhost_dev *vdev;
+	int ret;
+
 	vdev = rte_zmalloc("vhost device", sizeof(*vdev), RTE_CACHE_LINE_SIZE);
 	if (vdev == NULL) {
 		RTE_LOG(INFO, VHOST_DATA,
@@ -1590,6 +1710,17 @@ new_device(int vid)
 		}
 	}
 
+	int socketid = get_socketid_by_vid(vid);
+	if (socketid == -1)
+		return -1;
+
+	init_vid2socketid_array(vid, socketid);
+
+	ret =  vhost_async_channel_register(vid);
+
+	if (init_vhost_queue_ops(vid) != 0)
+		return -1;
+
 	if (builtin_net_driver)
 		vs_vhost_net_setup(vdev);
 
@@ -1621,16 +1752,7 @@ new_device(int vid)
 		"(%d) device has been added to data core %d\n",
 		vid, vdev->coreid);
 
-	if (dma_bind[vid].dmas[VIRTIO_RXQ].dev_id != INVALID_DMA_ID) {
-		int ret;
-
-		ret = rte_vhost_async_channel_register(vid, VIRTIO_RXQ);
-		if (ret == 0)
-			dma_bind[vid].dmas[VIRTIO_RXQ].async_enabled = true;
-		return ret;
-	}
-
-	return 0;
+	return ret;
 }
 
 static int
@@ -1648,19 +1770,9 @@ vring_state_changed(int vid, uint16_t queue_id, int enable)
 	if (queue_id != VIRTIO_RXQ)
 		return 0;
 
-	if (dma_bind[vid].dmas[queue_id].async_enabled) {
-		if (!enable) {
-			uint16_t n_pkt = 0;
-			int16_t dma_id = dma_bind[vid].dmas[VIRTIO_RXQ].dev_id;
-			struct rte_mbuf *m_cpl[vdev->pkts_inflight];
-
-			while (vdev->pkts_inflight) {
-				n_pkt = rte_vhost_clear_queue_thread_unsafe(vid, queue_id,
-							m_cpl, vdev->pkts_inflight, dma_id, 0);
-				free_pkts(m_cpl, n_pkt);
-				__atomic_sub_fetch(&vdev->pkts_inflight, n_pkt, __ATOMIC_SEQ_CST);
-			}
-		}
+	if (dma_bind[vid2socketid[vid]].dmas[queue_id].async_enabled) {
+		if (!enable)
+			vhost_clear_queue_thread_unsafe(vdev, queue_id);
 	}
 
 	return 0;
@@ -1885,7 +1997,7 @@ main(int argc, char *argv[])
 	for (i = 0; i < nb_sockets; i++) {
 		char *file = socket_files + i * PATH_MAX;
 
-		if (dma_count)
+		if (dma_count && get_async_flag_by_socketid(i) != 0)
 			flags = flags | RTE_VHOST_USER_ASYNC_COPY;
 
 		ret = rte_vhost_driver_register(file, flags);
diff --git a/examples/vhost/main.h b/examples/vhost/main.h
index b4a453e77e..40ac2841d1 100644
--- a/examples/vhost/main.h
+++ b/examples/vhost/main.h
@@ -52,7 +52,8 @@ struct vhost_dev {
 	uint64_t features;
 	size_t hdr_len;
 	uint16_t nr_vrings;
-	uint16_t pkts_inflight;
+	uint16_t pkts_enq_inflight;
+	uint16_t pkts_deq_inflight;
 	struct rte_vhost_memory *mem;
 	struct device_statistics stats;
 	TAILQ_ENTRY(vhost_dev) global_vdev_entry;
@@ -62,6 +63,19 @@ struct vhost_dev {
 	struct vhost_queue queues[MAX_QUEUE_PAIRS * 2];
 } __rte_cache_aligned;
 
+typedef uint16_t (*vhost_enqueue_burst_t)(struct vhost_dev *dev,
+			uint16_t queue_id, struct rte_mbuf **pkts,
+			uint32_t count);
+
+typedef uint16_t (*vhost_dequeue_burst_t)(struct vhost_dev *dev,
+			uint16_t queue_id, struct rte_mempool *mbuf_pool,
+			struct rte_mbuf **pkts, uint16_t count);
+
+struct vhost_queue_ops {
+	vhost_enqueue_burst_t enqueue_pkt_burst;
+	vhost_dequeue_burst_t dequeue_pkt_burst;
+};
+
 TAILQ_HEAD(vhost_dev_tailq_list, vhost_dev);
 
 
@@ -88,6 +102,7 @@ struct dma_info {
 
 struct dma_for_vhost {
 	struct dma_info dmas[RTE_MAX_QUEUES_PER_PORT * 2];
+	uint32_t async_flag;
 };
 
 /* we implement non-extra virtio net features */
@@ -98,7 +113,19 @@ void vs_vhost_net_remove(struct vhost_dev *dev);
 uint16_t vs_enqueue_pkts(struct vhost_dev *dev, uint16_t queue_id,
 			 struct rte_mbuf **pkts, uint32_t count);
 
-uint16_t vs_dequeue_pkts(struct vhost_dev *dev, uint16_t queue_id,
-			 struct rte_mempool *mbuf_pool,
-			 struct rte_mbuf **pkts, uint16_t count);
+uint16_t builtin_enqueue_pkts(struct vhost_dev *dev, uint16_t queue_id,
+			struct rte_mbuf **pkts, uint32_t count);
+uint16_t builtin_dequeue_pkts(struct vhost_dev *dev, uint16_t queue_id,
+			struct rte_mempool *mbuf_pool,
+			struct rte_mbuf **pkts, uint16_t count);
+uint16_t sync_enqueue_pkts(struct vhost_dev *dev, uint16_t queue_id,
+			 struct rte_mbuf **pkts, uint32_t count);
+uint16_t sync_dequeue_pkts(struct vhost_dev *dev, uint16_t queue_id,
+			struct rte_mempool *mbuf_pool,
+			struct rte_mbuf **pkts, uint16_t count);
+uint16_t async_enqueue_pkts(struct vhost_dev *dev, uint16_t queue_id,
+			 struct rte_mbuf **pkts, uint32_t count);
+uint16_t async_dequeue_pkts(struct vhost_dev *dev, uint16_t queue_id,
+			struct rte_mempool *mbuf_pool,
+			struct rte_mbuf **pkts, uint16_t count);
 #endif /* _MAIN_H_ */
diff --git a/examples/vhost/virtio_net.c b/examples/vhost/virtio_net.c
index 9064fc3a82..2432a96566 100644
--- a/examples/vhost/virtio_net.c
+++ b/examples/vhost/virtio_net.c
@@ -238,6 +238,13 @@ vs_enqueue_pkts(struct vhost_dev *dev, uint16_t queue_id,
 	return count;
 }
 
+uint16_t
+builtin_enqueue_pkts(struct vhost_dev *dev, uint16_t queue_id,
+		struct rte_mbuf **pkts, uint32_t count)
+{
+	return vs_enqueue_pkts(dev, queue_id, pkts, count);
+}
+
 static __rte_always_inline int
 dequeue_pkt(struct vhost_dev *dev, struct rte_vhost_vring *vr,
 	    struct rte_mbuf *m, uint16_t desc_idx,
@@ -363,7 +370,7 @@ dequeue_pkt(struct vhost_dev *dev, struct rte_vhost_vring *vr,
 	return 0;
 }
 
-uint16_t
+static uint16_t
 vs_dequeue_pkts(struct vhost_dev *dev, uint16_t queue_id,
 	struct rte_mempool *mbuf_pool, struct rte_mbuf **pkts, uint16_t count)
 {
@@ -440,3 +447,10 @@ vs_dequeue_pkts(struct vhost_dev *dev, uint16_t queue_id,
 
 	return i;
 }
+
+uint16_t
+builtin_dequeue_pkts(struct vhost_dev *dev, uint16_t queue_id,
+	struct rte_mempool *mbuf_pool, struct rte_mbuf **pkts, uint16_t count)
+{
+	return vs_dequeue_pkts(dev, queue_id, mbuf_pool, pkts, count);
+}
-- 
2.17.1


^ permalink raw reply	[flat|nested] 73+ messages in thread

* RE: [PATCH v4 1/5] vhost: prepare sync for descriptor to mbuf refactoring
  2022-05-05  6:23   ` [PATCH v4 1/5] vhost: prepare sync for descriptor to mbuf refactoring xuan.ding
@ 2022-05-05  7:37     ` Yang, YvonneX
  0 siblings, 0 replies; 73+ messages in thread
From: Yang, YvonneX @ 2022-05-05  7:37 UTC (permalink / raw)
  To: Ding, Xuan, maxime.coquelin, Xia, Chenbo
  Cc: dev, Hu, Jiayu, Jiang, Cheng1, Pai G, Sunil, liangma, Ding, Xuan



> -----Original Message-----
> From: xuan.ding@intel.com <xuan.ding@intel.com>
> Sent: May 5, 2022 14:24
> To: maxime.coquelin@redhat.com; Xia, Chenbo <chenbo.xia@intel.com>
> Cc: dev@dpdk.org; Hu, Jiayu <jiayu.hu@intel.com>; Jiang, Cheng1
> <cheng1.jiang@intel.com>; Pai G, Sunil <sunil.pai.g@intel.com>;
> liangma@liangbit.com; Ding, Xuan <xuan.ding@intel.com>
> Subject: [PATCH v4 1/5] vhost: prepare sync for descriptor to mbuf refactoring
> 
> From: Xuan Ding <xuan.ding@intel.com>
> 
> This patch extracts the descriptors to buffers filling from
> copy_desc_to_mbuf() into a dedicated function. Besides, enqueue and dequeue
> path are refactored to use the same function
> sync_fill_seg() for preparing batch elements, which simplifies the code without
> performance degradation.
> 
> Signed-off-by: Xuan Ding <xuan.ding@intel.com>
> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
> ---
>  lib/vhost/virtio_net.c | 78 ++++++++++++++++++++----------------------
>  1 file changed, 38 insertions(+), 40 deletions(-)
> 

Tested-by: Yvonne Yang <yvonnex.yang@intel.com>

^ permalink raw reply	[flat|nested] 73+ messages in thread

* RE: [PATCH v4 2/5] vhost: prepare async for descriptor to mbuf refactoring
  2022-05-05  6:23   ` [PATCH v4 2/5] vhost: prepare async " xuan.ding
@ 2022-05-05  7:38     ` Yang, YvonneX
  0 siblings, 0 replies; 73+ messages in thread
From: Yang, YvonneX @ 2022-05-05  7:38 UTC (permalink / raw)
  To: Ding, Xuan, maxime.coquelin, Xia, Chenbo
  Cc: dev, Hu, Jiayu, Jiang, Cheng1, Pai G, Sunil, liangma, Ding, Xuan



> -----Original Message-----
> From: xuan.ding@intel.com <xuan.ding@intel.com>
> Sent: May 5, 2022 14:24
> To: maxime.coquelin@redhat.com; Xia, Chenbo <chenbo.xia@intel.com>
> Cc: dev@dpdk.org; Hu, Jiayu <jiayu.hu@intel.com>; Jiang, Cheng1
> <cheng1.jiang@intel.com>; Pai G, Sunil <sunil.pai.g@intel.com>;
> liangma@liangbit.com; Ding, Xuan <xuan.ding@intel.com>
> Subject: [PATCH v4 2/5] vhost: prepare async for descriptor to mbuf refactoring
> 
> From: Xuan Ding <xuan.ding@intel.com>
> 
> This patch refactors vhost async enqueue path and dequeue path to use the
> same function async_fill_seg() for preparing batch elements, which simplifies the
> code without performance degradation.
> 
> Signed-off-by: Xuan Ding <xuan.ding@intel.com>
> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
> ---
>  lib/vhost/virtio_net.c | 22 ++++++++++++++--------
>  1 file changed, 14 insertions(+), 8 deletions(-)
> 


Tested-by: Yvonne Yang <yvonnex.yang@intel.com>

^ permalink raw reply	[flat|nested] 73+ messages in thread

* RE: [PATCH v4 3/5] vhost: merge sync and async descriptor to mbuf filling
  2022-05-05  6:23   ` [PATCH v4 3/5] vhost: merge sync and async descriptor to mbuf filling xuan.ding
@ 2022-05-05  7:39     ` Yang, YvonneX
  0 siblings, 0 replies; 73+ messages in thread
From: Yang, YvonneX @ 2022-05-05  7:39 UTC (permalink / raw)
  To: Ding, Xuan, maxime.coquelin, Xia, Chenbo
  Cc: dev, Hu, Jiayu, Jiang, Cheng1, Pai G, Sunil, liangma, Ding, Xuan



> -----Original Message-----
> From: xuan.ding@intel.com <xuan.ding@intel.com>
> Sent: May 5, 2022 14:24
> To: maxime.coquelin@redhat.com; Xia, Chenbo <chenbo.xia@intel.com>
> Cc: dev@dpdk.org; Hu, Jiayu <jiayu.hu@intel.com>; Jiang, Cheng1
> <cheng1.jiang@intel.com>; Pai G, Sunil <sunil.pai.g@intel.com>;
> liangma@liangbit.com; Ding, Xuan <xuan.ding@intel.com>
> Subject: [PATCH v4 3/5] vhost: merge sync and async descriptor to mbuf filling
> 
> From: Xuan Ding <xuan.ding@intel.com>
> 
> This patch refactors copy_desc_to_mbuf() used by the sync path to support both
> sync and async descriptor to mbuf filling.
> 
> Signed-off-by: Xuan Ding <xuan.ding@intel.com>
> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
> ---
>  lib/vhost/vhost.h      |  1 +
>  lib/vhost/virtio_net.c | 48 ++++++++++++++++++++++++++++++++----------
>  2 files changed, 38 insertions(+), 11 deletions(-)
> 

Tested-by: Yvonne Yang <yvonnex.yang@intel.com>

^ permalink raw reply	[flat|nested] 73+ messages in thread

* RE: [PATCH v4 5/5] examples/vhost: support async dequeue data path
  2022-05-05  6:23   ` [PATCH v4 5/5] examples/vhost: support async dequeue data path xuan.ding
@ 2022-05-05  7:39     ` Yang, YvonneX
  2022-05-05 19:38     ` Maxime Coquelin
  1 sibling, 0 replies; 73+ messages in thread
From: Yang, YvonneX @ 2022-05-05  7:39 UTC (permalink / raw)
  To: Ding, Xuan, maxime.coquelin, Xia, Chenbo
  Cc: dev, Hu, Jiayu, Jiang, Cheng1, Pai G, Sunil, liangma, Ding, Xuan,
	Ma, WenwuX, Wang, YuanX



> -----Original Message-----
> From: xuan.ding@intel.com <xuan.ding@intel.com>
> Sent: May 5, 2022 14:24
> To: maxime.coquelin@redhat.com; Xia, Chenbo <chenbo.xia@intel.com>
> Cc: dev@dpdk.org; Hu, Jiayu <jiayu.hu@intel.com>; Jiang, Cheng1
> <cheng1.jiang@intel.com>; Pai G, Sunil <sunil.pai.g@intel.com>;
> liangma@liangbit.com; Ding, Xuan <xuan.ding@intel.com>; Ma, WenwuX
> <wenwux.ma@intel.com>; Wang, YuanX <yuanx.wang@intel.com>
> Subject: [PATCH v4 5/5] examples/vhost: support async dequeue data path
> 
> From: Xuan Ding <xuan.ding@intel.com>
> 
> This patch adds the use case for async dequeue API. Vswitch can leverage DMA
> device to accelerate vhost async dequeue path.
> 
> Signed-off-by: Wenwu Ma <wenwux.ma@intel.com>
> Signed-off-by: Yuan Wang <yuanx.wang@intel.com>
> Signed-off-by: Xuan Ding <xuan.ding@intel.com>
> ---
>  doc/guides/sample_app_ug/vhost.rst |   9 +-
>  examples/vhost/main.c              | 292 ++++++++++++++++++++---------
>  examples/vhost/main.h              |  35 +++-
>  examples/vhost/virtio_net.c        |  16 +-
>  4 files changed, 254 insertions(+), 98 deletions(-)
> 
> diff --git a/doc/guides/sample_app_ug/vhost.rst
> b/doc/guides/sample_app_ug/vhost.rst
> index a6ce4bc8ac..09db965e70 100644
> --- a/doc/guides/sample_app_ug/vhost.rst
> +++ b/doc/guides/sample_app_ug/vhost.rst
> @@ -169,9 +169,12 @@ demonstrates how to use the async vhost APIs. It's
> used in combination with dmas

Tested-by: Yvonne Yang <yvonnex.yang@intel.com>

^ permalink raw reply	[flat|nested] 73+ messages in thread

* RE: [PATCH v4 4/5] vhost: support async dequeue for split ring
  2022-05-05  6:23   ` [PATCH v4 4/5] vhost: support async dequeue for split ring xuan.ding
@ 2022-05-05  7:40     ` Yang, YvonneX
  2022-05-05 19:36     ` Maxime Coquelin
  1 sibling, 0 replies; 73+ messages in thread
From: Yang, YvonneX @ 2022-05-05  7:40 UTC (permalink / raw)
  To: Ding, Xuan, maxime.coquelin, Xia, Chenbo
  Cc: dev, Hu, Jiayu, Jiang, Cheng1, Pai G, Sunil, liangma, Ding, Xuan,
	Wang, YuanX



> -----Original Message-----
> From: xuan.ding@intel.com <xuan.ding@intel.com>
> Sent: May 5, 2022 14:24
> To: maxime.coquelin@redhat.com; Xia, Chenbo <chenbo.xia@intel.com>
> Cc: dev@dpdk.org; Hu, Jiayu <jiayu.hu@intel.com>; Jiang, Cheng1
> <cheng1.jiang@intel.com>; Pai G, Sunil <sunil.pai.g@intel.com>;
> liangma@liangbit.com; Ding, Xuan <xuan.ding@intel.com>; Wang, YuanX
> <yuanx.wang@intel.com>
> Subject: [PATCH v4 4/5] vhost: support async dequeue for split ring
> 
> From: Xuan Ding <xuan.ding@intel.com>
> 
> This patch implements asynchronous dequeue data path for vhost split ring, a
> new API rte_vhost_async_try_dequeue_burst() is introduced.
> 
> Signed-off-by: Xuan Ding <xuan.ding@intel.com>
> Signed-off-by: Yuan Wang <yuanx.wang@intel.com>
> ---
>  doc/guides/prog_guide/vhost_lib.rst    |   7 +
>  doc/guides/rel_notes/release_22_07.rst |   4 +
>  lib/vhost/rte_vhost_async.h            |  33 +++
>  lib/vhost/version.map                  |   3 +
>  lib/vhost/virtio_net.c                 | 331
> +++++++++++++++++++++++++
>  5 files changed, 378 insertions(+)
> 
> diff --git a/doc/guides/prog_guide/vhost_lib.rst
> b/doc/guides/prog_guide/vhost_lib.rst
> index 886f8f5e72..40cf315170 100644
> --- a/doc/guides/prog_guide/vhost_lib.rst
> +++ b/doc/guides/prog_guide/vhost_lib.rst

Tested-by: Yvonne Yang <yvonnex.yang@intel.com>

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH v4 4/5] vhost: support async dequeue for split ring
  2022-05-05  6:23   ` [PATCH v4 4/5] vhost: support async dequeue for split ring xuan.ding
  2022-05-05  7:40     ` Yang, YvonneX
@ 2022-05-05 19:36     ` Maxime Coquelin
  1 sibling, 0 replies; 73+ messages in thread
From: Maxime Coquelin @ 2022-05-05 19:36 UTC (permalink / raw)
  To: xuan.ding, chenbo.xia
  Cc: dev, jiayu.hu, cheng1.jiang, sunil.pai.g, liangma, Yuan Wang



On 5/5/22 08:23, xuan.ding@intel.com wrote:
> From: Xuan Ding <xuan.ding@intel.com>
> 
> This patch implements asynchronous dequeue data path for vhost split
> ring, a new API rte_vhost_async_try_dequeue_burst() is introduced.
> 
> Signed-off-by: Xuan Ding <xuan.ding@intel.com>
> Signed-off-by: Yuan Wang <yuanx.wang@intel.com>
> ---
>   doc/guides/prog_guide/vhost_lib.rst    |   7 +
>   doc/guides/rel_notes/release_22_07.rst |   4 +
>   lib/vhost/rte_vhost_async.h            |  33 +++
>   lib/vhost/version.map                  |   3 +
>   lib/vhost/virtio_net.c                 | 331 +++++++++++++++++++++++++
>   5 files changed, 378 insertions(+)
> 

Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>

Thanks,
Maxime


^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH v4 5/5] examples/vhost: support async dequeue data path
  2022-05-05  6:23   ` [PATCH v4 5/5] examples/vhost: support async dequeue data path xuan.ding
  2022-05-05  7:39     ` Yang, YvonneX
@ 2022-05-05 19:38     ` Maxime Coquelin
  1 sibling, 0 replies; 73+ messages in thread
From: Maxime Coquelin @ 2022-05-05 19:38 UTC (permalink / raw)
  To: xuan.ding, chenbo.xia
  Cc: dev, jiayu.hu, cheng1.jiang, sunil.pai.g, liangma, Wenwu Ma, Yuan Wang



On 5/5/22 08:23, xuan.ding@intel.com wrote:
> From: Xuan Ding <xuan.ding@intel.com>
> 
> This patch adds the use case for async dequeue API. Vswitch can
> leverage DMA device to accelerate vhost async dequeue path.
> 
> Signed-off-by: Wenwu Ma <wenwux.ma@intel.com>
> Signed-off-by: Yuan Wang <yuanx.wang@intel.com>
> Signed-off-by: Xuan Ding <xuan.ding@intel.com>
> ---
>   doc/guides/sample_app_ug/vhost.rst |   9 +-
>   examples/vhost/main.c              | 292 ++++++++++++++++++++---------
>   examples/vhost/main.h              |  35 +++-
>   examples/vhost/virtio_net.c        |  16 +-
>   4 files changed, 254 insertions(+), 98 deletions(-)
> 

Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>

Thanks,
Maxime


^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH v4 0/5] vhost: support async dequeue data path
  2022-05-05  6:23 ` [PATCH v4 0/5] vhost: " xuan.ding
                     ` (4 preceding siblings ...)
  2022-05-05  6:23   ` [PATCH v4 5/5] examples/vhost: support async dequeue data path xuan.ding
@ 2022-05-05 19:52   ` Maxime Coquelin
  2022-05-06  1:49     ` Ding, Xuan
  5 siblings, 1 reply; 73+ messages in thread
From: Maxime Coquelin @ 2022-05-05 19:52 UTC (permalink / raw)
  To: xuan.ding, chenbo.xia; +Cc: dev, jiayu.hu, cheng1.jiang, sunil.pai.g, liangma

Hi Xuan,

On 5/5/22 08:23, xuan.ding@intel.com wrote:
> From: Xuan Ding <xuan.ding@intel.com>
> 
> The presence of asynchronous path allows applications to offload memory
> copies to DMA engine, so as to save CPU cycles and improve the copy
> performance. This patch set implements vhost async dequeue data path
> for split ring. The code is based on latest enqueue changes [1].
> 
> This patch set is a new design and implementation of [2]. Since dmadev
> was introduced in DPDK 21.11, to simplify application logics, this patch
> integrates dmadev in vhost. With dmadev integrated, vhost supports M:N
> mapping between vrings and DMA virtual channels. Specifically, one vring
> can use multiple different DMA channels and one DMA channel can be
> shared by multiple vrings at the same time.
> 
> A new asynchronous dequeue function is introduced:
>          1) rte_vhost_async_try_dequeue_burst(int vid, uint16_t queue_id,
>                  struct rte_mempool *mbuf_pool, struct rte_mbuf **pkts,
>                  uint16_t count, int *nr_inflight,
>                  uint16_t dma_id, uint16_t vchan_id)
> 
>          Receive packets from the guest and offloads copies to DMA
> virtual channel.
> 
> [1] https://mails.dpdk.org/archives/dev/2022-February/234555.html
> [2] https://mails.dpdk.org/archives/dev/2021-September/218591.html
> 
> v3->v4:
> * fix CI build warnings
> * adjust some indentation
> * pass vq instead of queue_id
> 
> v2->v3:
> * fix mbuf not updated correctly for large packets
> 
> v1->v2:
> * fix a typo
> * fix a bug in desc_to_mbuf filling
> 
> RFC v3 -> v1:
> * add sync and async path descriptor to mbuf refactoring
> * add API description in docs
> 
> RFC v2 -> RFC v3:
> * rebase to latest DPDK version
> 
> RFC v1 -> RFC v2:
> * fix one bug in example
> * rename vchan to vchan_id
> * check if dma_id and vchan_id valid
> * rework all the logs to new standard
> 
> Xuan Ding (5):
>    vhost: prepare sync for descriptor to mbuf refactoring
>    vhost: prepare async for descriptor to mbuf refactoring
>    vhost: merge sync and async descriptor to mbuf filling
>    vhost: support async dequeue for split ring
>    examples/vhost: support async dequeue data path
> 
>   doc/guides/prog_guide/vhost_lib.rst    |   7 +
>   doc/guides/rel_notes/release_22_07.rst |   4 +
>   doc/guides/sample_app_ug/vhost.rst     |   9 +-
>   examples/vhost/main.c                  | 292 +++++++++++-----
>   examples/vhost/main.h                  |  35 +-
>   examples/vhost/virtio_net.c            |  16 +-
>   lib/vhost/rte_vhost_async.h            |  33 ++
>   lib/vhost/version.map                  |   3 +
>   lib/vhost/vhost.h                      |   1 +
>   lib/vhost/virtio_net.c                 | 467 ++++++++++++++++++++++---
>   10 files changed, 716 insertions(+), 151 deletions(-)
> 

I applied your other series about unsafe API to get DMA inflight
packets, so I have some conflicts when applying this series.

Could you please rebase on top of next-virtio/main branch and repost?

Thanks in advance,
Maxime


^ permalink raw reply	[flat|nested] 73+ messages in thread

* RE: [PATCH v4 0/5] vhost: support async dequeue data path
  2022-05-05 19:52   ` [PATCH v4 0/5] vhost: " Maxime Coquelin
@ 2022-05-06  1:49     ` Ding, Xuan
  0 siblings, 0 replies; 73+ messages in thread
From: Ding, Xuan @ 2022-05-06  1:49 UTC (permalink / raw)
  To: Maxime Coquelin, Xia, Chenbo
  Cc: dev, Hu, Jiayu, Jiang, Cheng1, Pai G, Sunil, liangma

Hi Maxime,

> -----Original Message-----
> From: Maxime Coquelin <maxime.coquelin@redhat.com>
> Sent: Friday, May 6, 2022 3:53 AM
> To: Ding, Xuan <xuan.ding@intel.com>; Xia, Chenbo <chenbo.xia@intel.com>
> Cc: dev@dpdk.org; Hu, Jiayu <jiayu.hu@intel.com>; Jiang, Cheng1
> <cheng1.jiang@intel.com>; Pai G, Sunil <sunil.pai.g@intel.com>;
> liangma@liangbit.com
> Subject: Re: [PATCH v4 0/5] vhost: support async dequeue data path
> 
> Hi Xuan,
> 
> On 5/5/22 08:23, xuan.ding@intel.com wrote:
> > From: Xuan Ding <xuan.ding@intel.com>
> >
> > The presence of asynchronous path allows applications to offload
> > memory copies to DMA engine, so as to save CPU cycles and improve the
> > copy performance. This patch set implements vhost async dequeue data
> > path for split ring. The code is based on latest enqueue changes [1].
> >
> > This patch set is a new design and implementation of [2]. Since dmadev
> > was introduced in DPDK 21.11, to simplify application logics, this
> > patch integrates dmadev in vhost. With dmadev integrated, vhost
> > supports M:N mapping between vrings and DMA virtual channels.
> > Specifically, one vring can use multiple different DMA channels and
> > one DMA channel can be shared by multiple vrings at the same time.
> >
> > A new asynchronous dequeue function is introduced:
> >          1) rte_vhost_async_try_dequeue_burst(int vid, uint16_t queue_id,
> >                  struct rte_mempool *mbuf_pool, struct rte_mbuf **pkts,
> >                  uint16_t count, int *nr_inflight,
> >                  uint16_t dma_id, uint16_t vchan_id)
> >
> >          Receive packets from the guest and offloads copies to DMA
> > virtual channel.
> >
> > [1] https://mails.dpdk.org/archives/dev/2022-February/234555.html
> > [2] https://mails.dpdk.org/archives/dev/2021-September/218591.html
> >
> > v3->v4:
> > * fix CI build warnings
> > * adjust some indentation
> > * pass vq instead of queue_id
> >
> > v2->v3:
> > * fix mbuf not updated correctly for large packets
> >
> > v1->v2:
> > * fix a typo
> > * fix a bug in desc_to_mbuf filling
> >
> > RFC v3 -> v1:
> > * add sync and async path descriptor to mbuf refactoring
> > * add API description in docs
> >
> > RFC v2 -> RFC v3:
> > * rebase to latest DPDK version
> >
> > RFC v1 -> RFC v2:
> > * fix one bug in example
> > * rename vchan to vchan_id
> > * check if dma_id and vchan_id valid
> > * rework all the logs to new standard
> >
> > Xuan Ding (5):
> >    vhost: prepare sync for descriptor to mbuf refactoring
> >    vhost: prepare async for descriptor to mbuf refactoring
> >    vhost: merge sync and async descriptor to mbuf filling
> >    vhost: support async dequeue for split ring
> >    examples/vhost: support async dequeue data path
> >
> >   doc/guides/prog_guide/vhost_lib.rst    |   7 +
> >   doc/guides/rel_notes/release_22_07.rst |   4 +
> >   doc/guides/sample_app_ug/vhost.rst     |   9 +-
> >   examples/vhost/main.c                  | 292 +++++++++++-----
> >   examples/vhost/main.h                  |  35 +-
> >   examples/vhost/virtio_net.c            |  16 +-
> >   lib/vhost/rte_vhost_async.h            |  33 ++
> >   lib/vhost/version.map                  |   3 +
> >   lib/vhost/vhost.h                      |   1 +
> >   lib/vhost/virtio_net.c                 | 467 ++++++++++++++++++++++---
> >   10 files changed, 716 insertions(+), 151 deletions(-)
> >
> 
> I applied your other series about unsafe API to get DMA inflight packets, so I
> have some conflicts when applying this series.
> 
> Could you please rebase on top of next-virtio/main branch and repost?

Sure, I will rebase on top of the latest main branch; please see v5.

Thanks,
Xuan

> 
> Thanks in advance,
> Maxime


^ permalink raw reply	[flat|nested] 73+ messages in thread

* [PATCH v5 0/5] vhost: support async dequeue data path
  2022-04-07 15:25 [PATCH v1 0/5] vhost: support async dequeue data path xuan.ding
                   ` (7 preceding siblings ...)
  2022-05-05  6:23 ` [PATCH v4 0/5] vhost: " xuan.ding
@ 2022-05-13  2:00 ` xuan.ding
  2022-05-13  2:00   ` [PATCH v5 1/5] vhost: prepare sync for descriptor to mbuf refactoring xuan.ding
                     ` (4 more replies)
  2022-05-13  2:50 ` [PATCH v6 0/5] vhost: " xuan.ding
                   ` (2 subsequent siblings)
  11 siblings, 5 replies; 73+ messages in thread
From: xuan.ding @ 2022-05-13  2:00 UTC (permalink / raw)
  To: maxime.coquelin, chenbo.xia
  Cc: dev, jiayu.hu, cheng1.jiang, sunil.pai.g, liangma, Xuan Ding

From: Xuan Ding <xuan.ding@intel.com>

The presence of the asynchronous path allows applications to offload memory
copies to the DMA engine, so as to save CPU cycles and improve the copy
performance. This patch set implements vhost async dequeue data path
for split ring. The code is based on latest enqueue changes [1].

This patch set is a new design and implementation of [2]. Since dmadev
was introduced in DPDK 21.11, to simplify application logic, this patch
set integrates dmadev in vhost. With dmadev integrated, vhost supports M:N
mapping between vrings and DMA virtual channels. Specifically, one vring
can use multiple different DMA channels and one DMA channel can be
shared by multiple vrings at the same time.

A new asynchronous dequeue function is introduced:
        1) rte_vhost_async_try_dequeue_burst(int vid, uint16_t queue_id,
                struct rte_mempool *mbuf_pool, struct rte_mbuf **pkts,
                uint16_t count, int *nr_inflight,
                uint16_t dma_id, uint16_t vchan_id)

        Receives packets from the guest and offloads the copies to a DMA
virtual channel.

[1] https://mails.dpdk.org/archives/dev/2022-February/234555.html
[2] https://mails.dpdk.org/archives/dev/2021-September/218591.html
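
As a quick illustration of how an application might consume this API, below
is a minimal usage sketch. It assumes the vhost device, its async channel and
the DMA virtual channel are already configured; the burst size and the idea
of simply freeing the dequeued mbufs are placeholders for real forwarding
logic, not part of this series.

    /*
     * Minimal usage sketch (illustrative only). vid, queue_id, dma_id
     * and vchan_id are assumed to be valid and already set up.
     */
    #include <rte_mbuf.h>
    #include <rte_mempool.h>
    #include <rte_vhost_async.h>

    #define DEQ_BURST 32

    static uint16_t
    poll_guest_tx(int vid, uint16_t queue_id, struct rte_mempool *mp,
                  uint16_t dma_id, uint16_t vchan_id)
    {
            struct rte_mbuf *pkts[DEQ_BURST];
            int nr_inflight;
            uint16_t i, n;

            /* Completed copies come back in pkts[]; copies still owned
             * by the DMA engine are reported through nr_inflight. */
            n = rte_vhost_async_try_dequeue_burst(vid, queue_id, mp, pkts,
                                                  DEQ_BURST, &nr_inflight,
                                                  dma_id, vchan_id);

            /* A real application would forward the packets here. */
            for (i = 0; i < n; i++)
                    rte_pktmbuf_free(pkts[i]);

            return n;
    }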

v4->v5:
* rebase to latest DPDK
* add some checks

v3->v4:
* fix CI build warnings
* adjust some indentation
* pass vq instead of queue_id

v2->v3:
* fix mbuf not updated correctly for large packets

v1->v2:
* fix a typo
* fix a bug in desc_to_mbuf filling

RFC v3 -> v1:
* add sync and async path descriptor to mbuf refactoring
* add API description in docs

RFC v2 -> RFC v3:
* rebase to latest DPDK version

RFC v1 -> RFC v2:
* fix one bug in example
* rename vchan to vchan_id
* check if dma_id and vchan_id valid
* rework all the logs to new standard

Xuan Ding (5):
  vhost: prepare sync for descriptor to mbuf refactoring
  vhost: prepare async for descriptor to mbuf refactoring
  vhost: merge sync and async descriptor to mbuf filling
  vhost: support async dequeue for split ring
  examples/vhost: support async dequeue data path

 doc/guides/prog_guide/vhost_lib.rst    |   7 +
 doc/guides/rel_notes/release_22_07.rst |   5 +
 doc/guides/sample_app_ug/vhost.rst     |   9 +-
 examples/vhost/main.c                  | 284 ++++++++++-----
 examples/vhost/main.h                  |  32 +-
 examples/vhost/virtio_net.c            |  16 +-
 lib/vhost/rte_vhost_async.h            |  34 ++
 lib/vhost/version.map                  |   2 +-
 lib/vhost/vhost.h                      |   1 +
 lib/vhost/virtio_net.c                 | 473 ++++++++++++++++++++++---
 10 files changed, 711 insertions(+), 152 deletions(-)

-- 
2.17.1


^ permalink raw reply	[flat|nested] 73+ messages in thread

* [PATCH v5 1/5] vhost: prepare sync for descriptor to mbuf refactoring
  2022-05-13  2:00 ` [PATCH v5 " xuan.ding
@ 2022-05-13  2:00   ` xuan.ding
  2022-05-13  2:00   ` [PATCH v5 2/5] vhost: prepare async " xuan.ding
                     ` (3 subsequent siblings)
  4 siblings, 0 replies; 73+ messages in thread
From: xuan.ding @ 2022-05-13  2:00 UTC (permalink / raw)
  To: maxime.coquelin, chenbo.xia
  Cc: dev, jiayu.hu, cheng1.jiang, sunil.pai.g, liangma, Xuan Ding

From: Xuan Ding <xuan.ding@intel.com>

This patch extracts the descriptor-to-buffer filling from
copy_desc_to_mbuf() into a dedicated function. Besides, the enqueue
and dequeue paths are refactored to use the same function,
sync_fill_seg(), for preparing batch elements, which simplifies
the code without performance degradation.
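
The shape of the refactoring is a single segment-copy helper parameterised on
the copy direction. A simplified, standalone sketch of that to_desc pattern
(not the actual vhost code; batching and dirty-page logging are omitted):

    /* Simplified illustration of the direction flag used by sync_fill_seg();
     * the real helper also handles batched copies and vhost dirty logging. */
    #include <stdbool.h>
    #include <stdint.h>
    #include <string.h>

    static void
    fill_seg(void *desc_buf, void *mbuf_buf, uint32_t cpy_len, bool to_desc)
    {
            if (to_desc)            /* enqueue: mbuf -> descriptor buffer */
                    memcpy(desc_buf, mbuf_buf, cpy_len);
            else                    /* dequeue: descriptor buffer -> mbuf */
                    memcpy(mbuf_buf, desc_buf, cpy_len);
    }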

Signed-off-by: Xuan Ding <xuan.ding@intel.com>
Tested-by: Yvonne Yang <yvonnex.yang@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
---
 lib/vhost/virtio_net.c | 78 ++++++++++++++++++++----------------------
 1 file changed, 38 insertions(+), 40 deletions(-)

diff --git a/lib/vhost/virtio_net.c b/lib/vhost/virtio_net.c
index 5f432b0d77..d4c94d2a9b 100644
--- a/lib/vhost/virtio_net.c
+++ b/lib/vhost/virtio_net.c
@@ -1030,23 +1030,36 @@ async_mbuf_to_desc_seg(struct virtio_net *dev, struct vhost_virtqueue *vq,
 }
 
 static __rte_always_inline void
-sync_mbuf_to_desc_seg(struct virtio_net *dev, struct vhost_virtqueue *vq,
+sync_fill_seg(struct virtio_net *dev, struct vhost_virtqueue *vq,
 		struct rte_mbuf *m, uint32_t mbuf_offset,
-		uint64_t buf_addr, uint64_t buf_iova, uint32_t cpy_len)
+		uint64_t buf_addr, uint64_t buf_iova, uint32_t cpy_len, bool to_desc)
 {
 	struct batch_copy_elem *batch_copy = vq->batch_copy_elems;
 
 	if (likely(cpy_len > MAX_BATCH_LEN || vq->batch_copy_nb_elems >= vq->size)) {
-		rte_memcpy((void *)((uintptr_t)(buf_addr)),
+		if (to_desc) {
+			rte_memcpy((void *)((uintptr_t)(buf_addr)),
 				rte_pktmbuf_mtod_offset(m, void *, mbuf_offset),
 				cpy_len);
+		} else {
+			rte_memcpy(rte_pktmbuf_mtod_offset(m, void *, mbuf_offset),
+				(void *)((uintptr_t)(buf_addr)),
+				cpy_len);
+		}
 		vhost_log_cache_write_iova(dev, vq, buf_iova, cpy_len);
 		PRINT_PACKET(dev, (uintptr_t)(buf_addr), cpy_len, 0);
 	} else {
-		batch_copy[vq->batch_copy_nb_elems].dst =
-			(void *)((uintptr_t)(buf_addr));
-		batch_copy[vq->batch_copy_nb_elems].src =
-			rte_pktmbuf_mtod_offset(m, void *, mbuf_offset);
+		if (to_desc) {
+			batch_copy[vq->batch_copy_nb_elems].dst =
+				(void *)((uintptr_t)(buf_addr));
+			batch_copy[vq->batch_copy_nb_elems].src =
+				rte_pktmbuf_mtod_offset(m, void *, mbuf_offset);
+		} else {
+			batch_copy[vq->batch_copy_nb_elems].dst =
+				rte_pktmbuf_mtod_offset(m, void *, mbuf_offset);
+			batch_copy[vq->batch_copy_nb_elems].src =
+				(void *)((uintptr_t)(buf_addr));
+		}
 		batch_copy[vq->batch_copy_nb_elems].log_addr = buf_iova;
 		batch_copy[vq->batch_copy_nb_elems].len = cpy_len;
 		vq->batch_copy_nb_elems++;
@@ -1158,9 +1171,9 @@ mbuf_to_desc(struct virtio_net *dev, struct vhost_virtqueue *vq,
 						buf_iova + buf_offset, cpy_len) < 0)
 				goto error;
 		} else {
-			sync_mbuf_to_desc_seg(dev, vq, m, mbuf_offset,
-					buf_addr + buf_offset,
-					buf_iova + buf_offset, cpy_len);
+			sync_fill_seg(dev, vq, m, mbuf_offset,
+				      buf_addr + buf_offset,
+				      buf_iova + buf_offset, cpy_len, true);
 		}
 
 		mbuf_avail  -= cpy_len;
@@ -2473,8 +2486,8 @@ copy_desc_to_mbuf(struct virtio_net *dev, struct vhost_virtqueue *vq,
 		  struct rte_mbuf *m, struct rte_mempool *mbuf_pool,
 		  bool legacy_ol_flags)
 {
-	uint32_t buf_avail, buf_offset;
-	uint64_t buf_addr, buf_len;
+	uint32_t buf_avail, buf_offset, buf_len;
+	uint64_t buf_addr, buf_iova;
 	uint32_t mbuf_avail, mbuf_offset;
 	uint32_t cpy_len;
 	struct rte_mbuf *cur = m, *prev = m;
@@ -2482,16 +2495,13 @@ copy_desc_to_mbuf(struct virtio_net *dev, struct vhost_virtqueue *vq,
 	struct virtio_net_hdr *hdr = NULL;
 	/* A counter to avoid desc dead loop chain */
 	uint16_t vec_idx = 0;
-	struct batch_copy_elem *batch_copy = vq->batch_copy_elems;
-	int error = 0;
 
 	buf_addr = buf_vec[vec_idx].buf_addr;
+	buf_iova = buf_vec[vec_idx].buf_iova;
 	buf_len = buf_vec[vec_idx].buf_len;
 
-	if (unlikely(buf_len < dev->vhost_hlen && nr_vec <= 1)) {
-		error = -1;
-		goto out;
-	}
+	if (unlikely(buf_len < dev->vhost_hlen && nr_vec <= 1))
+		return -1;
 
 	if (virtio_net_with_host_offload(dev)) {
 		if (unlikely(buf_len < sizeof(struct virtio_net_hdr))) {
@@ -2515,11 +2525,12 @@ copy_desc_to_mbuf(struct virtio_net *dev, struct vhost_virtqueue *vq,
 		buf_offset = dev->vhost_hlen - buf_len;
 		vec_idx++;
 		buf_addr = buf_vec[vec_idx].buf_addr;
+		buf_iova = buf_vec[vec_idx].buf_iova;
 		buf_len = buf_vec[vec_idx].buf_len;
 		buf_avail  = buf_len - buf_offset;
 	} else if (buf_len == dev->vhost_hlen) {
 		if (unlikely(++vec_idx >= nr_vec))
-			goto out;
+			goto error;
 		buf_addr = buf_vec[vec_idx].buf_addr;
 		buf_len = buf_vec[vec_idx].buf_len;
 
@@ -2539,22 +2550,9 @@ copy_desc_to_mbuf(struct virtio_net *dev, struct vhost_virtqueue *vq,
 	while (1) {
 		cpy_len = RTE_MIN(buf_avail, mbuf_avail);
 
-		if (likely(cpy_len > MAX_BATCH_LEN ||
-					vq->batch_copy_nb_elems >= vq->size ||
-					(hdr && cur == m))) {
-			rte_memcpy(rte_pktmbuf_mtod_offset(cur, void *,
-						mbuf_offset),
-					(void *)((uintptr_t)(buf_addr +
-							buf_offset)), cpy_len);
-		} else {
-			batch_copy[vq->batch_copy_nb_elems].dst =
-				rte_pktmbuf_mtod_offset(cur, void *,
-						mbuf_offset);
-			batch_copy[vq->batch_copy_nb_elems].src =
-				(void *)((uintptr_t)(buf_addr + buf_offset));
-			batch_copy[vq->batch_copy_nb_elems].len = cpy_len;
-			vq->batch_copy_nb_elems++;
-		}
+		sync_fill_seg(dev, vq, cur, mbuf_offset,
+			      buf_addr + buf_offset,
+			      buf_iova + buf_offset, cpy_len, false);
 
 		mbuf_avail  -= cpy_len;
 		mbuf_offset += cpy_len;
@@ -2567,6 +2565,7 @@ copy_desc_to_mbuf(struct virtio_net *dev, struct vhost_virtqueue *vq,
 				break;
 
 			buf_addr = buf_vec[vec_idx].buf_addr;
+			buf_iova = buf_vec[vec_idx].buf_iova;
 			buf_len = buf_vec[vec_idx].buf_len;
 
 			buf_offset = 0;
@@ -2585,8 +2584,7 @@ copy_desc_to_mbuf(struct virtio_net *dev, struct vhost_virtqueue *vq,
 			if (unlikely(cur == NULL)) {
 				VHOST_LOG_DATA(ERR, "(%s) failed to allocate memory for mbuf.\n",
 						dev->ifname);
-				error = -1;
-				goto out;
+				goto error;
 			}
 
 			prev->next = cur;
@@ -2606,9 +2604,9 @@ copy_desc_to_mbuf(struct virtio_net *dev, struct vhost_virtqueue *vq,
 	if (hdr)
 		vhost_dequeue_offload(dev, hdr, m, legacy_ol_flags);
 
-out:
-
-	return error;
+	return 0;
+error:
+	return -1;
 }
 
 static void
-- 
2.17.1


^ permalink raw reply	[flat|nested] 73+ messages in thread

* [PATCH v5 2/5] vhost: prepare async for descriptor to mbuf refactoring
  2022-05-13  2:00 ` [PATCH v5 " xuan.ding
  2022-05-13  2:00   ` [PATCH v5 1/5] vhost: prepare sync for descriptor to mbuf refactoring xuan.ding
@ 2022-05-13  2:00   ` xuan.ding
  2022-05-13  2:00   ` [PATCH v5 3/5] vhost: merge sync and async descriptor to mbuf filling xuan.ding
                     ` (2 subsequent siblings)
  4 siblings, 0 replies; 73+ messages in thread
From: xuan.ding @ 2022-05-13  2:00 UTC (permalink / raw)
  To: maxime.coquelin, chenbo.xia
  Cc: dev, jiayu.hu, cheng1.jiang, sunil.pai.g, liangma, Xuan Ding

From: Xuan Ding <xuan.ding@intel.com>

This patch refactors the vhost async enqueue and dequeue paths to use
the same function, async_fill_seg(), for preparing batch elements,
which simplifies the code without performance degradation.
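
Unlike the sync helper, the async variant performs no CPU copy at this point:
it only records a source/destination pair for the DMA engine, and the
direction flag decides which side is which. A simplified sketch (the struct
and helper names below are illustrative, not the vhost internals):

    /* Illustrative only: records one DMA copy job, mirroring how
     * async_fill_seg() picks src/dst from the to_desc flag. */
    #include <stdbool.h>
    #include <stddef.h>

    struct copy_job {
            void *src;
            void *dst;
            size_t len;
    };

    static void
    record_copy(struct copy_job *job, void *host_iova, void *mbuf_data,
                size_t len, bool to_desc)
    {
            /* enqueue: mbuf is the source; dequeue: guest buffer is the source */
            job->src = to_desc ? mbuf_data : host_iova;
            job->dst = to_desc ? host_iova : mbuf_data;
            job->len = len;
    }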

Signed-off-by: Xuan Ding <xuan.ding@intel.com>
Tested-by: Yvonne Yang <yvonnex.yang@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
---
 lib/vhost/virtio_net.c | 22 ++++++++++++++--------
 1 file changed, 14 insertions(+), 8 deletions(-)

diff --git a/lib/vhost/virtio_net.c b/lib/vhost/virtio_net.c
index d4c94d2a9b..a9e2dcd9ce 100644
--- a/lib/vhost/virtio_net.c
+++ b/lib/vhost/virtio_net.c
@@ -997,13 +997,14 @@ async_iter_reset(struct vhost_async *async)
 }
 
 static __rte_always_inline int
-async_mbuf_to_desc_seg(struct virtio_net *dev, struct vhost_virtqueue *vq,
+async_fill_seg(struct virtio_net *dev, struct vhost_virtqueue *vq,
 		struct rte_mbuf *m, uint32_t mbuf_offset,
-		uint64_t buf_iova, uint32_t cpy_len)
+		uint64_t buf_iova, uint32_t cpy_len, bool to_desc)
 {
 	struct vhost_async *async = vq->async;
 	uint64_t mapped_len;
 	uint32_t buf_offset = 0;
+	void *src, *dst;
 	void *host_iova;
 
 	while (cpy_len) {
@@ -1015,10 +1016,15 @@ async_mbuf_to_desc_seg(struct virtio_net *dev, struct vhost_virtqueue *vq,
 			return -1;
 		}
 
-		if (unlikely(async_iter_add_iovec(dev, async,
-						(void *)(uintptr_t)rte_pktmbuf_iova_offset(m,
-							mbuf_offset),
-						host_iova, (size_t)mapped_len)))
+		if (to_desc) {
+			src = (void *)(uintptr_t)rte_pktmbuf_iova_offset(m, mbuf_offset);
+			dst = host_iova;
+		} else {
+			src = host_iova;
+			dst = (void *)(uintptr_t)rte_pktmbuf_iova_offset(m, mbuf_offset);
+		}
+
+		if (unlikely(async_iter_add_iovec(dev, async, src, dst, (size_t)mapped_len)))
 			return -1;
 
 		cpy_len -= (uint32_t)mapped_len;
@@ -1167,8 +1173,8 @@ mbuf_to_desc(struct virtio_net *dev, struct vhost_virtqueue *vq,
 		cpy_len = RTE_MIN(buf_avail, mbuf_avail);
 
 		if (is_async) {
-			if (async_mbuf_to_desc_seg(dev, vq, m, mbuf_offset,
-						buf_iova + buf_offset, cpy_len) < 0)
+			if (async_fill_seg(dev, vq, m, mbuf_offset,
+					   buf_iova + buf_offset, cpy_len, true) < 0)
 				goto error;
 		} else {
 			sync_fill_seg(dev, vq, m, mbuf_offset,
-- 
2.17.1


^ permalink raw reply	[flat|nested] 73+ messages in thread

* [PATCH v5 3/5] vhost: merge sync and async descriptor to mbuf filling
  2022-05-13  2:00 ` [PATCH v5 " xuan.ding
  2022-05-13  2:00   ` [PATCH v5 1/5] vhost: prepare sync for descriptor to mbuf refactoring xuan.ding
  2022-05-13  2:00   ` [PATCH v5 2/5] vhost: prepare async " xuan.ding
@ 2022-05-13  2:00   ` xuan.ding
  2022-05-13  2:00   ` [PATCH v5 4/5] vhost: support async dequeue for split ring xuan.ding
  2022-05-13  2:00   ` [PATCH v5 5/5] examples/vhost: support async dequeue data path xuan.ding
  4 siblings, 0 replies; 73+ messages in thread
From: xuan.ding @ 2022-05-13  2:00 UTC (permalink / raw)
  To: maxime.coquelin, chenbo.xia
  Cc: dev, jiayu.hu, cheng1.jiang, sunil.pai.g, liangma, Xuan Ding

From: Xuan Ding <xuan.ding@intel.com>

This patch refactors copy_desc_to_mbuf() used by the sync
path to support both sync and async descriptor to mbuf filling.
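
After the merge, a single routine walks the descriptor chain and dispatches
each segment to either the sync or the async helper; a stripped-down sketch
of that dispatch (names and stub helpers below are illustrative only):

    /* Illustrative dispatch only; the real desc_to_mbuf() also parses the
     * virtio-net header and handles mbuf chaining. */
    #include <stdbool.h>
    #include <stdint.h>
    #include <string.h>

    static void
    copy_seg_sync(void *mbuf_data, const void *desc_buf, uint32_t len)
    {
            memcpy(mbuf_data, desc_buf, len);       /* immediate CPU copy */
    }

    static void
    copy_seg_async(void *mbuf_data, const void *desc_buf, uint32_t len)
    {
            /* A real implementation would only record a DMA job here,
             * to be completed asynchronously. */
            (void)mbuf_data; (void)desc_buf; (void)len;
    }

    static void
    fill_seg_dispatch(void *mbuf_data, const void *desc_buf, uint32_t len,
                      bool is_async)
    {
            if (is_async)
                    copy_seg_async(mbuf_data, desc_buf, len);
            else
                    copy_seg_sync(mbuf_data, desc_buf, len);
    }

One visible consequence in the diff is that, for the async case, the
virtio-net header is stashed per in-flight slot (the new nethdr field in
struct async_inflight_info) so that offload processing can be applied only
once the DMA copy has completed.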

Signed-off-by: Xuan Ding <xuan.ding@intel.com>
Tested-by: Yvonne Yang <yvonnex.yang@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
---
 lib/vhost/vhost.h      |  1 +
 lib/vhost/virtio_net.c | 48 ++++++++++++++++++++++++++++++++----------
 2 files changed, 38 insertions(+), 11 deletions(-)

diff --git a/lib/vhost/vhost.h b/lib/vhost/vhost.h
index a9edc271aa..00744b234f 100644
--- a/lib/vhost/vhost.h
+++ b/lib/vhost/vhost.h
@@ -180,6 +180,7 @@ struct async_inflight_info {
 	struct rte_mbuf *mbuf;
 	uint16_t descs; /* num of descs inflight */
 	uint16_t nr_buffers; /* num of buffers inflight for packed ring */
+	struct virtio_net_hdr nethdr;
 };
 
 struct vhost_async {
diff --git a/lib/vhost/virtio_net.c b/lib/vhost/virtio_net.c
index a9e2dcd9ce..5904839d5c 100644
--- a/lib/vhost/virtio_net.c
+++ b/lib/vhost/virtio_net.c
@@ -2487,10 +2487,10 @@ copy_vnet_hdr_from_desc(struct virtio_net_hdr *hdr,
 }
 
 static __rte_always_inline int
-copy_desc_to_mbuf(struct virtio_net *dev, struct vhost_virtqueue *vq,
+desc_to_mbuf(struct virtio_net *dev, struct vhost_virtqueue *vq,
 		  struct buf_vector *buf_vec, uint16_t nr_vec,
 		  struct rte_mbuf *m, struct rte_mempool *mbuf_pool,
-		  bool legacy_ol_flags)
+		  bool legacy_ol_flags, uint16_t slot_idx, bool is_async)
 {
 	uint32_t buf_avail, buf_offset, buf_len;
 	uint64_t buf_addr, buf_iova;
@@ -2501,6 +2501,8 @@ copy_desc_to_mbuf(struct virtio_net *dev, struct vhost_virtqueue *vq,
 	struct virtio_net_hdr *hdr = NULL;
 	/* A counter to avoid desc dead loop chain */
 	uint16_t vec_idx = 0;
+	struct vhost_async *async = vq->async;
+	struct async_inflight_info *pkts_info;
 
 	buf_addr = buf_vec[vec_idx].buf_addr;
 	buf_iova = buf_vec[vec_idx].buf_iova;
@@ -2538,6 +2540,7 @@ copy_desc_to_mbuf(struct virtio_net *dev, struct vhost_virtqueue *vq,
 		if (unlikely(++vec_idx >= nr_vec))
 			goto error;
 		buf_addr = buf_vec[vec_idx].buf_addr;
+		buf_iova = buf_vec[vec_idx].buf_iova;
 		buf_len = buf_vec[vec_idx].buf_len;
 
 		buf_offset = 0;
@@ -2553,12 +2556,25 @@ copy_desc_to_mbuf(struct virtio_net *dev, struct vhost_virtqueue *vq,
 
 	mbuf_offset = 0;
 	mbuf_avail  = m->buf_len - RTE_PKTMBUF_HEADROOM;
+
+	if (is_async) {
+		pkts_info = async->pkts_info;
+		if (async_iter_initialize(dev, async))
+			return -1;
+	}
+
 	while (1) {
 		cpy_len = RTE_MIN(buf_avail, mbuf_avail);
 
-		sync_fill_seg(dev, vq, cur, mbuf_offset,
-			      buf_addr + buf_offset,
-			      buf_iova + buf_offset, cpy_len, false);
+		if (is_async) {
+			if (async_fill_seg(dev, vq, cur, mbuf_offset,
+					   buf_iova + buf_offset, cpy_len, false) < 0)
+				goto error;
+		} else {
+			sync_fill_seg(dev, vq, cur, mbuf_offset,
+				      buf_addr + buf_offset,
+				      buf_iova + buf_offset, cpy_len, false);
+		}
 
 		mbuf_avail  -= cpy_len;
 		mbuf_offset += cpy_len;
@@ -2607,11 +2623,20 @@ copy_desc_to_mbuf(struct virtio_net *dev, struct vhost_virtqueue *vq,
 	prev->data_len = mbuf_offset;
 	m->pkt_len    += mbuf_offset;
 
-	if (hdr)
-		vhost_dequeue_offload(dev, hdr, m, legacy_ol_flags);
+	if (is_async) {
+		async_iter_finalize(async);
+		if (hdr)
+			pkts_info[slot_idx].nethdr = *hdr;
+	} else {
+		if (hdr)
+			vhost_dequeue_offload(dev, hdr, m, legacy_ol_flags);
+	}
 
 	return 0;
 error:
+	if (is_async)
+		async_iter_cancel(async);
+
 	return -1;
 }
 
@@ -2743,8 +2768,8 @@ virtio_dev_tx_split(struct virtio_net *dev, struct vhost_virtqueue *vq,
 			break;
 		}
 
-		err = copy_desc_to_mbuf(dev, vq, buf_vec, nr_vec, pkts[i],
-				mbuf_pool, legacy_ol_flags);
+		err = desc_to_mbuf(dev, vq, buf_vec, nr_vec, pkts[i],
+				   mbuf_pool, legacy_ol_flags, 0, false);
 		if (unlikely(err)) {
 			if (!allocerr_warned) {
 				VHOST_LOG_DATA(ERR, "(%s) failed to copy desc to mbuf.\n",
@@ -2755,6 +2780,7 @@ virtio_dev_tx_split(struct virtio_net *dev, struct vhost_virtqueue *vq,
 			i++;
 			break;
 		}
+
 	}
 
 	if (dropped)
@@ -2936,8 +2962,8 @@ vhost_dequeue_single_packed(struct virtio_net *dev,
 		return -1;
 	}
 
-	err = copy_desc_to_mbuf(dev, vq, buf_vec, nr_vec, pkts,
-				mbuf_pool, legacy_ol_flags);
+	err = desc_to_mbuf(dev, vq, buf_vec, nr_vec, pkts,
+			   mbuf_pool, legacy_ol_flags, 0, false);
 	if (unlikely(err)) {
 		if (!allocerr_warned) {
 			VHOST_LOG_DATA(ERR, "(%s) failed to copy desc to mbuf.\n",
-- 
2.17.1


^ permalink raw reply	[flat|nested] 73+ messages in thread

* [PATCH v5 4/5] vhost: support async dequeue for split ring
  2022-05-13  2:00 ` [PATCH v5 " xuan.ding
                     ` (2 preceding siblings ...)
  2022-05-13  2:00   ` [PATCH v5 3/5] vhost: merge sync and async descriptor to mbuf filling xuan.ding
@ 2022-05-13  2:00   ` xuan.ding
  2022-05-13  2:24     ` Stephen Hemminger
  2022-05-13  2:00   ` [PATCH v5 5/5] examples/vhost: support async dequeue data path xuan.ding
  4 siblings, 1 reply; 73+ messages in thread
From: xuan.ding @ 2022-05-13  2:00 UTC (permalink / raw)
  To: maxime.coquelin, chenbo.xia
  Cc: dev, jiayu.hu, cheng1.jiang, sunil.pai.g, liangma, Xuan Ding, Yuan Wang

From: Xuan Ding <xuan.ding@intel.com>

This patch implements the asynchronous dequeue data path for the vhost split
ring; a new API, rte_vhost_async_try_dequeue_burst(), is introduced.
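
Completion handling follows the usual async-vhost pattern: copies are first
submitted to the DMA virtual channel, and on each call the per-packet
completion flags are polled so that only fully copied mbufs are returned to
the application. A small, self-contained sketch of that flag-ring walk (ring
size and names are assumptions for illustration, not the vhost code):

    /* Harvest completed copies from a boolean completion-flag ring, similar
     * in spirit to async_poll_dequeue_completed_split(). */
    #include <stdbool.h>
    #include <stdint.h>

    static uint16_t
    harvest_completed(bool *cmpl_flag, uint16_t ring_size, /* power of two */
                      uint16_t start_idx, uint16_t max)
    {
            uint16_t from = start_idx;
            uint16_t n = 0;

            /* Stop at the first still-in-flight copy to preserve ordering. */
            while (n < max && cmpl_flag[from]) {
                    cmpl_flag[from] = false;
                    from = (from + 1) & (ring_size - 1);
                    n++;
            }
            return n;
    }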

Signed-off-by: Xuan Ding <xuan.ding@intel.com>
Signed-off-by: Yuan Wang <yuanx.wang@intel.com>
Tested-by: Yvonne Yang <yvonnex.yang@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
---
 doc/guides/prog_guide/vhost_lib.rst    |   7 +
 doc/guides/rel_notes/release_22_07.rst |   5 +
 lib/vhost/rte_vhost_async.h            |  34 +++
 lib/vhost/version.map                  |   2 +-
 lib/vhost/virtio_net.c                 | 337 +++++++++++++++++++++++++
 5 files changed, 384 insertions(+), 1 deletion(-)

diff --git a/doc/guides/prog_guide/vhost_lib.rst b/doc/guides/prog_guide/vhost_lib.rst
index f287b76ebf..09c1c24b48 100644
--- a/doc/guides/prog_guide/vhost_lib.rst
+++ b/doc/guides/prog_guide/vhost_lib.rst
@@ -282,6 +282,13 @@ The following is an overview of some key Vhost API functions:
   Clear inflight packets which are submitted to DMA engine in vhost async data
   path. Completed packets are returned to applications through ``pkts``.
 
+* ``rte_vhost_async_try_dequeue_burst(int vid, uint16_t queue_id,
+  struct rte_mempool *mbuf_pool, struct rte_mbuf **pkts, uint16_t count,
+	int *nr_inflight, uint16_t dma_id, uint16_t vchan_id)``
+
+  Receives (dequeues) ``count`` packets from guest to host in async data path,
+  and stores them in ``pkts``.
+
 Vhost-user Implementations
 --------------------------
 
diff --git a/doc/guides/rel_notes/release_22_07.rst b/doc/guides/rel_notes/release_22_07.rst
index 88b1e478d4..564d88623e 100644
--- a/doc/guides/rel_notes/release_22_07.rst
+++ b/doc/guides/rel_notes/release_22_07.rst
@@ -70,6 +70,11 @@ New Features
   Added an API which can get the number of inflight packets in
   vhost async data path without using lock.
 
+* **Added vhost async dequeue API to receive pkts from guest.**
+
+  Added vhost async dequeue API which can leverage DMA devices to
+  accelerate receiving pkts from guest.
+
 Removed Items
 -------------
 
diff --git a/lib/vhost/rte_vhost_async.h b/lib/vhost/rte_vhost_async.h
index 70234debf9..161dabc652 100644
--- a/lib/vhost/rte_vhost_async.h
+++ b/lib/vhost/rte_vhost_async.h
@@ -204,6 +204,40 @@ uint16_t rte_vhost_clear_queue_thread_unsafe(int vid, uint16_t queue_id,
 __rte_experimental
 int rte_vhost_async_dma_configure(int16_t dma_id, uint16_t vchan_id);
 
+/**
+ * This function tries to receive packets from the guest with offloading
+ * copies to the async channel. Packets whose copies have completed are
+ * returned in "pkts". Packets whose copies have been submitted to the
+ * async channel but have not yet completed are called "in-flight packets".
+ * This function will not return in-flight packets until their copies are
+ * completed by the async channel.
+ *
+ * @param vid
+ *  ID of vhost device to dequeue data
+ * @param queue_id
+ *  ID of virtqueue to dequeue data
+ * @param mbuf_pool
+ *  Mbuf_pool where host mbuf is allocated
+ * @param pkts
+ *  Blank array to keep successfully dequeued packets
+ * @param count
+ *  Size of the packet array
+ * @param nr_inflight
+ *  >= 0: The amount of in-flight packets
+ *  -1: meaningless, indicates failed lock acquisition or invalid queue_id/dma_id
+ * @param dma_id
+ *  The identifier of DMA device
+ * @param vchan_id
+ *  The identifier of virtual DMA channel
+ * @return
+ *  Number of successfully dequeued packets
+ */
+__rte_experimental
+uint16_t
+rte_vhost_async_try_dequeue_burst(int vid, uint16_t queue_id,
+	struct rte_mempool *mbuf_pool, struct rte_mbuf **pkts, uint16_t count,
+	int *nr_inflight, uint16_t dma_id, uint16_t vchan_id);
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/vhost/version.map b/lib/vhost/version.map
index 5841315386..8c7211bf0d 100644
--- a/lib/vhost/version.map
+++ b/lib/vhost/version.map
@@ -90,7 +90,7 @@ EXPERIMENTAL {
 
 	# added in 22.07
 	rte_vhost_async_get_inflight_thread_unsafe;
-
+	rte_vhost_async_try_dequeue_burst;
 };
 
 INTERNAL {
diff --git a/lib/vhost/virtio_net.c b/lib/vhost/virtio_net.c
index 5904839d5c..8290514e65 100644
--- a/lib/vhost/virtio_net.c
+++ b/lib/vhost/virtio_net.c
@@ -3171,3 +3171,340 @@ rte_vhost_dequeue_burst(int vid, uint16_t queue_id,
 
 	return count;
 }
+
+static __rte_always_inline uint16_t
+async_poll_dequeue_completed_split(struct virtio_net *dev, struct vhost_virtqueue *vq,
+		struct rte_mbuf **pkts, uint16_t count, uint16_t dma_id,
+		uint16_t vchan_id, bool legacy_ol_flags)
+{
+	uint16_t start_idx, from, i;
+	uint16_t nr_cpl_pkts = 0;
+	struct async_inflight_info *pkts_info = vq->async->pkts_info;
+
+	vhost_async_dma_check_completed(dev, dma_id, vchan_id, VHOST_DMA_MAX_COPY_COMPLETE);
+
+	start_idx = async_get_first_inflight_pkt_idx(vq);
+
+	from = start_idx;
+	while (vq->async->pkts_cmpl_flag[from] && count--) {
+		vq->async->pkts_cmpl_flag[from] = false;
+		from = (from + 1) & (vq->size - 1);
+		nr_cpl_pkts++;
+	}
+
+	if (nr_cpl_pkts == 0)
+		return 0;
+
+	for (i = 0; i < nr_cpl_pkts; i++) {
+		from = (start_idx + i) & (vq->size - 1);
+		pkts[i] = pkts_info[from].mbuf;
+
+		if (virtio_net_with_host_offload(dev))
+			vhost_dequeue_offload(dev, &pkts_info[from].nethdr, pkts[i],
+					      legacy_ol_flags);
+	}
+
+	/* write back completed descs to used ring and update used idx */
+	write_back_completed_descs_split(vq, nr_cpl_pkts);
+	__atomic_add_fetch(&vq->used->idx, nr_cpl_pkts, __ATOMIC_RELEASE);
+	vhost_vring_call_split(dev, vq);
+
+	vq->async->pkts_inflight_n -= nr_cpl_pkts;
+
+	return nr_cpl_pkts;
+}
+
+static __rte_always_inline uint16_t
+virtio_dev_tx_async_split(struct virtio_net *dev, struct vhost_virtqueue *vq,
+		struct rte_mempool *mbuf_pool, struct rte_mbuf **pkts, uint16_t count,
+		uint16_t dma_id, uint16_t vchan_id, bool legacy_ol_flags)
+{
+	static bool allocerr_warned;
+	bool dropped = false;
+	uint16_t free_entries;
+	uint16_t pkt_idx, slot_idx = 0;
+	uint16_t nr_done_pkts = 0;
+	uint16_t pkt_err = 0;
+	uint16_t n_xfer;
+	struct vhost_async *async = vq->async;
+	struct async_inflight_info *pkts_info = async->pkts_info;
+	struct rte_mbuf *pkts_prealloc[MAX_PKT_BURST];
+	uint16_t pkts_size = count;
+
+	/**
+	 * The ordering between avail index and
+	 * desc reads needs to be enforced.
+	 */
+	free_entries = __atomic_load_n(&vq->avail->idx, __ATOMIC_ACQUIRE) -
+			vq->last_avail_idx;
+	if (free_entries == 0)
+		goto out;
+
+	rte_prefetch0(&vq->avail->ring[vq->last_avail_idx & (vq->size - 1)]);
+
+	async_iter_reset(async);
+
+	count = RTE_MIN(count, MAX_PKT_BURST);
+	count = RTE_MIN(count, free_entries);
+	VHOST_LOG_DATA(DEBUG, "(%s) about to dequeue %u buffers\n",
+			dev->ifname, count);
+
+	if (rte_pktmbuf_alloc_bulk(mbuf_pool, pkts_prealloc, count))
+		goto out;
+
+	for (pkt_idx = 0; pkt_idx < count; pkt_idx++) {
+		uint16_t head_idx = 0;
+		uint16_t nr_vec = 0;
+		uint16_t to;
+		uint32_t buf_len;
+		int err;
+		struct buf_vector buf_vec[BUF_VECTOR_MAX];
+		struct rte_mbuf *pkt = pkts_prealloc[pkt_idx];
+
+		if (unlikely(fill_vec_buf_split(dev, vq, vq->last_avail_idx,
+						&nr_vec, buf_vec,
+						&head_idx, &buf_len,
+						VHOST_ACCESS_RO) < 0)) {
+			dropped = true;
+			break;
+		}
+
+		err = virtio_dev_pktmbuf_prep(dev, pkt, buf_len);
+		if (unlikely(err)) {
+			/**
+			 * mbuf allocation fails for jumbo packets when external
+			 * buffer allocation is not allowed and linear buffer
+			 * is required. Drop this packet.
+			 */
+			if (!allocerr_warned) {
+				VHOST_LOG_DATA(ERR,
+					"(%s) %s: Failed mbuf alloc of size %d from %s\n",
+					dev->ifname, __func__, buf_len, mbuf_pool->name);
+				allocerr_warned = true;
+			}
+			dropped = true;
+			break;
+		}
+
+		slot_idx = (async->pkts_idx + pkt_idx) & (vq->size - 1);
+		err = desc_to_mbuf(dev, vq, buf_vec, nr_vec, pkt, mbuf_pool,
+					legacy_ol_flags, slot_idx, true);
+		if (unlikely(err)) {
+			if (!allocerr_warned) {
+				VHOST_LOG_DATA(ERR,
+					"(%s) %s: Failed to offload copies to async channel.\n",
+					dev->ifname, __func__);
+				allocerr_warned = true;
+			}
+			dropped = true;
+			break;
+		}
+
+		pkts_info[slot_idx].mbuf = pkt;
+
+		/* store used descs */
+		to = async->desc_idx_split & (vq->size - 1);
+		async->descs_split[to].id = head_idx;
+		async->descs_split[to].len = 0;
+		async->desc_idx_split++;
+
+		vq->last_avail_idx++;
+	}
+
+	if (unlikely(dropped))
+		rte_pktmbuf_free_bulk(&pkts_prealloc[pkt_idx], count - pkt_idx);
+
+	n_xfer = vhost_async_dma_transfer(dev, vq, dma_id, vchan_id, async->pkts_idx,
+					  async->iov_iter, pkt_idx);
+
+	async->pkts_inflight_n += n_xfer;
+
+	pkt_err = pkt_idx - n_xfer;
+	if (unlikely(pkt_err)) {
+		VHOST_LOG_DATA(DEBUG, "(%s) %s: failed to transfer data.\n",
+				dev->ifname, __func__);
+
+		pkt_idx = n_xfer;
+		/* recover available ring */
+		vq->last_avail_idx -= pkt_err;
+
+		/**
+		 * recover async channel copy related structures and free pktmbufs
+		 * for error pkts.
+		 */
+		async->desc_idx_split -= pkt_err;
+		while (pkt_err-- > 0) {
+			rte_pktmbuf_free(pkts_info[slot_idx & (vq->size - 1)].mbuf);
+			slot_idx--;
+		}
+	}
+
+	async->pkts_idx += pkt_idx;
+	if (async->pkts_idx >= vq->size)
+		async->pkts_idx -= vq->size;
+
+out:
+	/* DMA device may serve other queues, unconditionally check completed. */
+	nr_done_pkts = async_poll_dequeue_completed_split(dev, vq, pkts, pkts_size,
+							  dma_id, vchan_id, legacy_ol_flags);
+
+	return nr_done_pkts;
+}
+
+__rte_noinline
+static uint16_t
+virtio_dev_tx_async_split_legacy(struct virtio_net *dev,
+		struct vhost_virtqueue *vq, struct rte_mempool *mbuf_pool,
+		struct rte_mbuf **pkts, uint16_t count,
+		uint16_t dma_id, uint16_t vchan_id)
+{
+	return virtio_dev_tx_async_split(dev, vq, mbuf_pool,
+				pkts, count, dma_id, vchan_id, true);
+}
+
+__rte_noinline
+static uint16_t
+virtio_dev_tx_async_split_compliant(struct virtio_net *dev,
+		struct vhost_virtqueue *vq, struct rte_mempool *mbuf_pool,
+		struct rte_mbuf **pkts, uint16_t count,
+		uint16_t dma_id, uint16_t vchan_id)
+{
+	return virtio_dev_tx_async_split(dev, vq, mbuf_pool,
+				pkts, count, dma_id, vchan_id, false);
+}
+
+uint16_t
+rte_vhost_async_try_dequeue_burst(int vid, uint16_t queue_id,
+	struct rte_mempool *mbuf_pool, struct rte_mbuf **pkts, uint16_t count,
+	int *nr_inflight, uint16_t dma_id, uint16_t vchan_id)
+{
+	struct virtio_net *dev;
+	struct rte_mbuf *rarp_mbuf = NULL;
+	struct vhost_virtqueue *vq;
+	int16_t success = 1;
+
+	dev = get_device(vid);
+	if (!dev || !nr_inflight)
+		return 0;
+
+	*nr_inflight = -1;
+
+	if (unlikely(!(dev->flags & VIRTIO_DEV_BUILTIN_VIRTIO_NET))) {
+		VHOST_LOG_DATA(ERR, "(%s) %s: built-in vhost net backend is disabled.\n",
+				dev->ifname, __func__);
+		return 0;
+	}
+
+	if (unlikely(!is_valid_virt_queue_idx(queue_id, 1, dev->nr_vring))) {
+		VHOST_LOG_DATA(ERR, "(%s) %s: invalid virtqueue idx %d.\n",
+				dev->ifname, __func__, queue_id);
+		return 0;
+	}
+
+	if (unlikely(dma_id >= RTE_DMADEV_DEFAULT_MAX)) {
+		VHOST_LOG_DATA(ERR, "(%s) %s: invalid dma id %u.\n",
+				dev->ifname, __func__, dma_id);
+		return 0;
+	}
+
+	if (unlikely(!dma_copy_track[dma_id].vchans ||
+				!dma_copy_track[dma_id].vchans[vchan_id].pkts_cmpl_flag_addr)) {
+		VHOST_LOG_DATA(ERR, "(%s) %s: invalid channel %d:%u.\n", dev->ifname, __func__,
+				dma_id, vchan_id);
+		return 0;
+	}
+
+	vq = dev->virtqueue[queue_id];
+
+	if (unlikely(rte_spinlock_trylock(&vq->access_lock) == 0))
+		return 0;
+
+	if (unlikely(vq->enabled == 0)) {
+		count = 0;
+		goto out_access_unlock;
+	}
+
+	if (unlikely(!vq->async)) {
+		VHOST_LOG_DATA(ERR, "(%s) %s: async not registered for queue id %d.\n",
+				dev->ifname, __func__, queue_id);
+		count = 0;
+		goto out_access_unlock;
+	}
+
+	if (dev->features & (1ULL << VIRTIO_F_IOMMU_PLATFORM))
+		vhost_user_iotlb_rd_lock(vq);
+
+	if (unlikely(vq->access_ok == 0))
+		if (unlikely(vring_translate(dev, vq) < 0)) {
+			count = 0;
+			goto out;
+		}
+
+	/*
+	 * Construct a RARP broadcast packet, and inject it to the "pkts"
+	 * array, to make it look like the guest actually sent such a packet.
+	 *
+	 * Check user_send_rarp() for more information.
+	 *
+	 * broadcast_rarp shares a cacheline in the virtio_net structure
+	 * with some fields that are accessed during enqueue and
+	 * __atomic_compare_exchange_n causes a write if performed compare
+	 * and exchange. This could result in false sharing between enqueue
+	 * and dequeue.
+	 *
+	 * Prevent unnecessary false sharing by reading broadcast_rarp first
+	 * and only performing compare and exchange if the read indicates it
+	 * is likely to be set.
+	 */
+	if (unlikely(__atomic_load_n(&dev->broadcast_rarp, __ATOMIC_ACQUIRE) &&
+			__atomic_compare_exchange_n(&dev->broadcast_rarp,
+			&success, 0, 0, __ATOMIC_RELEASE, __ATOMIC_RELAXED))) {
+
+		rarp_mbuf = rte_net_make_rarp_packet(mbuf_pool, &dev->mac);
+		if (rarp_mbuf == NULL) {
+			VHOST_LOG_DATA(ERR, "Failed to make RARP packet.\n");
+			count = 0;
+			goto out;
+		}
+		/*
+		 * Inject it to the head of "pkts" array, so that switch's mac
+		 * learning table will get updated first.
+		 */
+		pkts[0] = rarp_mbuf;
+		pkts++;
+		count -= 1;
+	}
+
+	if (unlikely(vq_is_packed(dev))) {
+		static bool not_support_pack_log;
+		if (!not_support_pack_log) {
+			VHOST_LOG_DATA(ERR,
+				"(%s) %s: async dequeue does not support packed ring.\n",
+				dev->ifname, __func__);
+			not_support_pack_log = true;
+		}
+		count = 0;
+		goto out;
+	}
+
+	if (dev->flags & VIRTIO_DEV_LEGACY_OL_FLAGS)
+		count = virtio_dev_tx_async_split_legacy(dev, vq, mbuf_pool, pkts,
+							 count, dma_id, vchan_id);
+	else
+		count = virtio_dev_tx_async_split_compliant(dev, vq, mbuf_pool, pkts,
+							    count, dma_id, vchan_id);
+
+	*nr_inflight = vq->async->pkts_inflight_n;
+
+out:
+	if (dev->features & (1ULL << VIRTIO_F_IOMMU_PLATFORM))
+		vhost_user_iotlb_rd_unlock(vq);
+
+out_access_unlock:
+	rte_spinlock_unlock(&vq->access_lock);
+
+	if (unlikely(rarp_mbuf != NULL))
+		count += 1;
+
+	return count;
+}
-- 
2.17.1


^ permalink raw reply	[flat|nested] 73+ messages in thread

* [PATCH v5 5/5] examples/vhost: support async dequeue data path
  2022-05-13  2:00 ` [PATCH v5 " xuan.ding
                     ` (3 preceding siblings ...)
  2022-05-13  2:00   ` [PATCH v5 4/5] vhost: support async dequeue for split ring xuan.ding
@ 2022-05-13  2:00   ` xuan.ding
  4 siblings, 0 replies; 73+ messages in thread
From: xuan.ding @ 2022-05-13  2:00 UTC (permalink / raw)
  To: maxime.coquelin, chenbo.xia
  Cc: dev, jiayu.hu, cheng1.jiang, sunil.pai.g, liangma, Xuan Ding,
	Wenwu Ma, Yuan Wang

From: Xuan Ding <xuan.ding@intel.com>

This patch adds a use case for the async dequeue API. The vswitch can
leverage a DMA device to accelerate the vhost async dequeue path.
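
The sample keeps the datapath branch-free by selecting the enqueue/dequeue
handlers once per device, at new_device() time, through a small ops table.
A simplified sketch of that dispatch idea (types and stub handlers below are
trimmed-down, illustrative versions of what main.h/main.c introduce):

    /* Simplified per-device ops table; the real sample binds builtin, sync
     * or async burst handlers in init_vhost_queue_ops(). */
    #include <stdbool.h>
    #include <stdint.h>

    struct rte_mbuf;
    struct rte_mempool;

    typedef uint16_t (*dequeue_burst_t)(int vid, uint16_t queue_id,
                                        struct rte_mempool *mbuf_pool,
                                        struct rte_mbuf **pkts, uint16_t count);

    struct queue_ops {
            dequeue_burst_t dequeue_pkt_burst;
    };

    static uint16_t
    sync_deq(int vid, uint16_t qid, struct rte_mempool *mp,
             struct rte_mbuf **pkts, uint16_t count)
    {
            /* placeholder for rte_vhost_dequeue_burst() */
            (void)vid; (void)qid; (void)mp; (void)pkts; (void)count;
            return 0;
    }

    static uint16_t
    async_deq(int vid, uint16_t qid, struct rte_mempool *mp,
              struct rte_mbuf **pkts, uint16_t count)
    {
            /* placeholder for rte_vhost_async_try_dequeue_burst() */
            (void)vid; (void)qid; (void)mp; (void)pkts; (void)count;
            return 0;
    }

    static void
    bind_queue_ops(struct queue_ops *ops, bool rxd_dma_bound)
    {
            /* Decided once per device instead of branching on every burst. */
            ops->dequeue_pkt_burst = rxd_dma_bound ? async_deq : sync_deq;
    }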

Signed-off-by: Wenwu Ma <wenwux.ma@intel.com>
Signed-off-by: Yuan Wang <yuanx.wang@intel.com>
Signed-off-by: Xuan Ding <xuan.ding@intel.com>
Tested-by: Yvonne Yang <yvonnex.yang@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
---
 doc/guides/sample_app_ug/vhost.rst |   9 +-
 examples/vhost/main.c              | 284 ++++++++++++++++++++---------
 examples/vhost/main.h              |  32 +++-
 examples/vhost/virtio_net.c        |  16 +-
 4 files changed, 243 insertions(+), 98 deletions(-)

diff --git a/doc/guides/sample_app_ug/vhost.rst b/doc/guides/sample_app_ug/vhost.rst
index a6ce4bc8ac..09db965e70 100644
--- a/doc/guides/sample_app_ug/vhost.rst
+++ b/doc/guides/sample_app_ug/vhost.rst
@@ -169,9 +169,12 @@ demonstrates how to use the async vhost APIs. It's used in combination with dmas
 **--dmas**
 This parameter is used to specify the assigned DMA device of a vhost device.
 Async vhost-user net driver will be used if --dmas is set. For example
---dmas [txd0@00:04.0,txd1@00:04.1] means use DMA channel 00:04.0 for vhost
-device 0 enqueue operation and use DMA channel 00:04.1 for vhost device 1
-enqueue operation.
+--dmas [txd0@00:04.0,txd1@00:04.1,rxd0@00:04.2,rxd1@00:04.3] means use
+DMA channel 00:04.0/00:04.2 for vhost device 0 enqueue/dequeue operation
+and use DMA channel 00:04.1/00:04.3 for vhost device 1 enqueue/dequeue
+operation. The index of the device corresponds to the socket file in order,
+that means vhost device 0 is created through the first socket file, vhost
+device 1 is created through the second socket file, and so on.
 
 Common Issues
 -------------
diff --git a/examples/vhost/main.c b/examples/vhost/main.c
index c4d46de1c5..d070391727 100644
--- a/examples/vhost/main.c
+++ b/examples/vhost/main.c
@@ -63,6 +63,9 @@
 
 #define DMA_RING_SIZE 4096
 
+#define ASYNC_ENQUEUE_VHOST 1
+#define ASYNC_DEQUEUE_VHOST 2
+
 /* number of mbufs in all pools - if specified on command-line. */
 static int total_num_mbufs = NUM_MBUFS_DEFAULT;
 
@@ -116,6 +119,8 @@ static uint32_t burst_rx_retry_num = BURST_RX_RETRIES;
 static char *socket_files;
 static int nb_sockets;
 
+static struct vhost_queue_ops vdev_queue_ops[RTE_MAX_VHOST_DEVICE];
+
 /* empty VMDq configuration structure. Filled in programmatically */
 static struct rte_eth_conf vmdq_conf_default = {
 	.rxmode = {
@@ -205,6 +210,18 @@ struct vhost_bufftable *vhost_txbuff[RTE_MAX_LCORE * RTE_MAX_VHOST_DEVICE];
 #define MBUF_TABLE_DRAIN_TSC	((rte_get_tsc_hz() + US_PER_S - 1) \
 				 / US_PER_S * BURST_TX_DRAIN_US)
 
+static int vid2socketid[RTE_MAX_VHOST_DEVICE];
+
+static uint32_t get_async_flag_by_socketid(int socketid)
+{
+	return dma_bind[socketid].async_flag;
+}
+
+static void init_vid2socketid_array(int vid, int socketid)
+{
+	vid2socketid[vid] = socketid;
+}
+
 static inline bool
 is_dma_configured(int16_t dev_id)
 {
@@ -224,7 +241,7 @@ open_dma(const char *value)
 	char *addrs = input;
 	char *ptrs[2];
 	char *start, *end, *substr;
-	int64_t vid;
+	int64_t socketid, vring_id;
 
 	struct rte_dma_info info;
 	struct rte_dma_conf dev_config = { .nb_vchans = 1 };
@@ -262,7 +279,9 @@ open_dma(const char *value)
 
 	while (i < args_nr) {
 		char *arg_temp = dma_arg[i];
+		char *txd, *rxd;
 		uint8_t sub_nr;
+		int async_flag;
 
 		sub_nr = rte_strsplit(arg_temp, strlen(arg_temp), ptrs, 2, '@');
 		if (sub_nr != 2) {
@@ -270,14 +289,23 @@ open_dma(const char *value)
 			goto out;
 		}
 
-		start = strstr(ptrs[0], "txd");
-		if (start == NULL) {
+		txd = strstr(ptrs[0], "txd");
+		rxd = strstr(ptrs[0], "rxd");
+		if (txd) {
+			start = txd;
+			vring_id = VIRTIO_RXQ;
+			async_flag = ASYNC_ENQUEUE_VHOST;
+		} else if (rxd) {
+			start = rxd;
+			vring_id = VIRTIO_TXQ;
+			async_flag = ASYNC_DEQUEUE_VHOST;
+		} else {
 			ret = -1;
 			goto out;
 		}
 
 		start += 3;
-		vid = strtol(start, &end, 0);
+		socketid = strtol(start, &end, 0);
 		if (end == start) {
 			ret = -1;
 			goto out;
@@ -338,7 +366,8 @@ open_dma(const char *value)
 		dmas_id[dma_count++] = dev_id;
 
 done:
-		(dma_info + vid)->dmas[VIRTIO_RXQ].dev_id = dev_id;
+		(dma_info + socketid)->dmas[vring_id].dev_id = dev_id;
+		(dma_info + socketid)->async_flag |= async_flag;
 		i++;
 	}
 out:
@@ -990,7 +1019,7 @@ complete_async_pkts(struct vhost_dev *vdev)
 {
 	struct rte_mbuf *p_cpl[MAX_PKT_BURST];
 	uint16_t complete_count;
-	int16_t dma_id = dma_bind[vdev->vid].dmas[VIRTIO_RXQ].dev_id;
+	int16_t dma_id = dma_bind[vid2socketid[vdev->vid]].dmas[VIRTIO_RXQ].dev_id;
 
 	complete_count = rte_vhost_poll_enqueue_completed(vdev->vid,
 					VIRTIO_RXQ, p_cpl, MAX_PKT_BURST, dma_id, 0);
@@ -1029,22 +1058,7 @@ drain_vhost(struct vhost_dev *vdev)
 	uint16_t nr_xmit = vhost_txbuff[buff_idx]->len;
 	struct rte_mbuf **m = vhost_txbuff[buff_idx]->m_table;
 
-	if (builtin_net_driver) {
-		ret = vs_enqueue_pkts(vdev, VIRTIO_RXQ, m, nr_xmit);
-	} else if (dma_bind[vdev->vid].dmas[VIRTIO_RXQ].async_enabled) {
-		uint16_t enqueue_fail = 0;
-		int16_t dma_id = dma_bind[vdev->vid].dmas[VIRTIO_RXQ].dev_id;
-
-		complete_async_pkts(vdev);
-		ret = rte_vhost_submit_enqueue_burst(vdev->vid, VIRTIO_RXQ, m, nr_xmit, dma_id, 0);
-
-		enqueue_fail = nr_xmit - ret;
-		if (enqueue_fail)
-			free_pkts(&m[ret], nr_xmit - ret);
-	} else {
-		ret = rte_vhost_enqueue_burst(vdev->vid, VIRTIO_RXQ,
-						m, nr_xmit);
-	}
+	ret = vdev_queue_ops[vdev->vid].enqueue_pkt_burst(vdev, VIRTIO_RXQ, m, nr_xmit);
 
 	if (enable_stats) {
 		__atomic_add_fetch(&vdev->stats.rx_total_atomic, nr_xmit,
@@ -1053,7 +1067,7 @@ drain_vhost(struct vhost_dev *vdev)
 				__ATOMIC_SEQ_CST);
 	}
 
-	if (!dma_bind[vdev->vid].dmas[VIRTIO_RXQ].async_enabled)
+	if (!dma_bind[vid2socketid[vdev->vid]].dmas[VIRTIO_RXQ].async_enabled)
 		free_pkts(m, nr_xmit);
 }
 
@@ -1325,6 +1339,32 @@ drain_mbuf_table(struct mbuf_table *tx_q)
 	}
 }
 
+uint16_t
+async_enqueue_pkts(struct vhost_dev *dev, uint16_t queue_id,
+		struct rte_mbuf **pkts, uint32_t rx_count)
+{
+	uint16_t enqueue_count;
+	uint16_t enqueue_fail = 0;
+	uint16_t dma_id = dma_bind[vid2socketid[dev->vid]].dmas[VIRTIO_RXQ].dev_id;
+
+	complete_async_pkts(dev);
+	enqueue_count = rte_vhost_submit_enqueue_burst(dev->vid, queue_id,
+					pkts, rx_count, dma_id, 0);
+
+	enqueue_fail = rx_count - enqueue_count;
+	if (enqueue_fail)
+		free_pkts(&pkts[enqueue_count], enqueue_fail);
+
+	return enqueue_count;
+}
+
+uint16_t
+sync_enqueue_pkts(struct vhost_dev *dev, uint16_t queue_id,
+		struct rte_mbuf **pkts, uint32_t rx_count)
+{
+	return rte_vhost_enqueue_burst(dev->vid, queue_id, pkts, rx_count);
+}
+
 static __rte_always_inline void
 drain_eth_rx(struct vhost_dev *vdev)
 {
@@ -1355,25 +1395,8 @@ drain_eth_rx(struct vhost_dev *vdev)
 		}
 	}
 
-	if (builtin_net_driver) {
-		enqueue_count = vs_enqueue_pkts(vdev, VIRTIO_RXQ,
-						pkts, rx_count);
-	} else if (dma_bind[vdev->vid].dmas[VIRTIO_RXQ].async_enabled) {
-		uint16_t enqueue_fail = 0;
-		int16_t dma_id = dma_bind[vdev->vid].dmas[VIRTIO_RXQ].dev_id;
-
-		complete_async_pkts(vdev);
-		enqueue_count = rte_vhost_submit_enqueue_burst(vdev->vid,
-					VIRTIO_RXQ, pkts, rx_count, dma_id, 0);
-
-		enqueue_fail = rx_count - enqueue_count;
-		if (enqueue_fail)
-			free_pkts(&pkts[enqueue_count], enqueue_fail);
-
-	} else {
-		enqueue_count = rte_vhost_enqueue_burst(vdev->vid, VIRTIO_RXQ,
-						pkts, rx_count);
-	}
+	enqueue_count = vdev_queue_ops[vdev->vid].enqueue_pkt_burst(vdev,
+					VIRTIO_RXQ, pkts, rx_count);
 
 	if (enable_stats) {
 		__atomic_add_fetch(&vdev->stats.rx_total_atomic, rx_count,
@@ -1382,10 +1405,31 @@ drain_eth_rx(struct vhost_dev *vdev)
 				__ATOMIC_SEQ_CST);
 	}
 
-	if (!dma_bind[vdev->vid].dmas[VIRTIO_RXQ].async_enabled)
+	if (!dma_bind[vid2socketid[vdev->vid]].dmas[VIRTIO_RXQ].async_enabled)
 		free_pkts(pkts, rx_count);
 }
 
+uint16_t async_dequeue_pkts(struct vhost_dev *dev, uint16_t queue_id,
+			    struct rte_mempool *mbuf_pool,
+			    struct rte_mbuf **pkts, uint16_t count)
+{
+	int nr_inflight;
+	uint16_t dequeue_count;
+	uint16_t dma_id = dma_bind[vid2socketid[dev->vid]].dmas[VIRTIO_TXQ].dev_id;
+
+	dequeue_count = rte_vhost_async_try_dequeue_burst(dev->vid, queue_id,
+			mbuf_pool, pkts, count, &nr_inflight, dma_id, 0);
+
+	return dequeue_count;
+}
+
+uint16_t sync_dequeue_pkts(struct vhost_dev *dev, uint16_t queue_id,
+			   struct rte_mempool *mbuf_pool,
+			   struct rte_mbuf **pkts, uint16_t count)
+{
+	return rte_vhost_dequeue_burst(dev->vid, queue_id, mbuf_pool, pkts, count);
+}
+
 static __rte_always_inline void
 drain_virtio_tx(struct vhost_dev *vdev)
 {
@@ -1393,13 +1437,8 @@ drain_virtio_tx(struct vhost_dev *vdev)
 	uint16_t count;
 	uint16_t i;
 
-	if (builtin_net_driver) {
-		count = vs_dequeue_pkts(vdev, VIRTIO_TXQ, mbuf_pool,
-					pkts, MAX_PKT_BURST);
-	} else {
-		count = rte_vhost_dequeue_burst(vdev->vid, VIRTIO_TXQ,
-					mbuf_pool, pkts, MAX_PKT_BURST);
-	}
+	count = vdev_queue_ops[vdev->vid].dequeue_pkt_burst(vdev,
+				VIRTIO_TXQ, mbuf_pool, pkts, MAX_PKT_BURST);
 
 	/* setup VMDq for the first packet */
 	if (unlikely(vdev->ready == DEVICE_MAC_LEARNING) && count) {
@@ -1478,6 +1517,26 @@ switch_worker(void *arg __rte_unused)
 	return 0;
 }
 
+static void
+vhost_clear_queue_thread_unsafe(struct vhost_dev *vdev, uint16_t queue_id)
+{
+	uint16_t n_pkt = 0;
+	int pkts_inflight;
+
+	int16_t dma_id = dma_bind[vid2socketid[vdev->vid]].dmas[queue_id].dev_id;
+	pkts_inflight = rte_vhost_async_get_inflight_thread_unsafe(vdev->vid, queue_id);
+
+	struct rte_mbuf *m_cpl[pkts_inflight];
+
+	while (pkts_inflight) {
+		n_pkt = rte_vhost_clear_queue_thread_unsafe(vdev->vid, queue_id, m_cpl,
+							pkts_inflight, dma_id, 0);
+		free_pkts(m_cpl, n_pkt);
+		pkts_inflight = rte_vhost_async_get_inflight_thread_unsafe(vdev->vid,
+									queue_id);
+	}
+}
+
 /*
  * Remove a device from the specific data core linked list and from the
  * main linked list. Synchronization  occurs through the use of the
@@ -1535,27 +1594,79 @@ destroy_device(int vid)
 		vdev->vid);
 
 	if (dma_bind[vid].dmas[VIRTIO_RXQ].async_enabled) {
-		uint16_t n_pkt = 0;
-		int pkts_inflight;
-		int16_t dma_id = dma_bind[vid].dmas[VIRTIO_RXQ].dev_id;
-		pkts_inflight = rte_vhost_async_get_inflight_thread_unsafe(vid, VIRTIO_RXQ);
-		struct rte_mbuf *m_cpl[pkts_inflight];
-
-		while (pkts_inflight) {
-			n_pkt = rte_vhost_clear_queue_thread_unsafe(vid, VIRTIO_RXQ,
-						m_cpl, pkts_inflight, dma_id, 0);
-			free_pkts(m_cpl, n_pkt);
-			pkts_inflight = rte_vhost_async_get_inflight_thread_unsafe(vid,
-										VIRTIO_RXQ);
-		}
-
+		vhost_clear_queue_thread_unsafe(vdev, VIRTIO_RXQ);
 		rte_vhost_async_channel_unregister(vid, VIRTIO_RXQ);
 		dma_bind[vid].dmas[VIRTIO_RXQ].async_enabled = false;
 	}
 
+	if (dma_bind[vid].dmas[VIRTIO_TXQ].async_enabled) {
+		vhost_clear_queue_thread_unsafe(vdev, VIRTIO_TXQ);
+		rte_vhost_async_channel_unregister(vid, VIRTIO_TXQ);
+		dma_bind[vid].dmas[VIRTIO_TXQ].async_enabled = false;
+	}
+
 	rte_free(vdev);
 }
 
+static int
+get_socketid_by_vid(int vid)
+{
+	int i;
+	char ifname[PATH_MAX];
+	rte_vhost_get_ifname(vid, ifname, sizeof(ifname));
+
+	for (i = 0; i < nb_sockets; i++) {
+		char *file = socket_files + i * PATH_MAX;
+		if (strcmp(file, ifname) == 0)
+			return i;
+	}
+
+	return -1;
+}
+
+static int
+init_vhost_queue_ops(int vid)
+{
+	if (builtin_net_driver) {
+		vdev_queue_ops[vid].enqueue_pkt_burst = builtin_enqueue_pkts;
+		vdev_queue_ops[vid].dequeue_pkt_burst = builtin_dequeue_pkts;
+	} else {
+		if (dma_bind[vid2socketid[vid]].dmas[VIRTIO_RXQ].async_enabled)
+			vdev_queue_ops[vid].enqueue_pkt_burst = async_enqueue_pkts;
+		else
+			vdev_queue_ops[vid].enqueue_pkt_burst = sync_enqueue_pkts;
+
+		if (dma_bind[vid2socketid[vid]].dmas[VIRTIO_TXQ].async_enabled)
+			vdev_queue_ops[vid].dequeue_pkt_burst = async_dequeue_pkts;
+		else
+			vdev_queue_ops[vid].dequeue_pkt_burst = sync_dequeue_pkts;
+	}
+
+	return 0;
+}
+
+static int
+vhost_async_channel_register(int vid)
+{
+	int rx_ret = 0, tx_ret = 0;
+
+	if (dma_bind[vid2socketid[vid]].dmas[VIRTIO_RXQ].dev_id != INVALID_DMA_ID) {
+		rx_ret = rte_vhost_async_channel_register(vid, VIRTIO_RXQ);
+		if (rx_ret == 0)
+			dma_bind[vid2socketid[vid]].dmas[VIRTIO_RXQ].async_enabled = true;
+	}
+
+	if (dma_bind[vid2socketid[vid]].dmas[VIRTIO_TXQ].dev_id != INVALID_DMA_ID) {
+		tx_ret = rte_vhost_async_channel_register(vid, VIRTIO_TXQ);
+		if (tx_ret == 0)
+			dma_bind[vid2socketid[vid]].dmas[VIRTIO_TXQ].async_enabled = true;
+	}
+
+	return rx_ret | tx_ret;
+}
+
+
+
 /*
  * A new device is added to a data core. First the device is added to the main linked list
  * and then allocated to a specific data core.
@@ -1567,6 +1678,8 @@ new_device(int vid)
 	uint16_t i;
 	uint32_t device_num_min = num_devices;
 	struct vhost_dev *vdev;
+	int ret;
+
 	vdev = rte_zmalloc("vhost device", sizeof(*vdev), RTE_CACHE_LINE_SIZE);
 	if (vdev == NULL) {
 		RTE_LOG(INFO, VHOST_DATA,
@@ -1589,6 +1702,17 @@ new_device(int vid)
 		}
 	}
 
+	int socketid = get_socketid_by_vid(vid);
+	if (socketid == -1)
+		return -1;
+
+	init_vid2socketid_array(vid, socketid);
+
+	ret =  vhost_async_channel_register(vid);
+
+	if (init_vhost_queue_ops(vid) != 0)
+		return -1;
+
 	if (builtin_net_driver)
 		vs_vhost_net_setup(vdev);
 
@@ -1620,16 +1744,7 @@ new_device(int vid)
 		"(%d) device has been added to data core %d\n",
 		vid, vdev->coreid);
 
-	if (dma_bind[vid].dmas[VIRTIO_RXQ].dev_id != INVALID_DMA_ID) {
-		int ret;
-
-		ret = rte_vhost_async_channel_register(vid, VIRTIO_RXQ);
-		if (ret == 0)
-			dma_bind[vid].dmas[VIRTIO_RXQ].async_enabled = true;
-		return ret;
-	}
-
-	return 0;
+	return ret;
 }
 
 static int
@@ -1647,22 +1762,9 @@ vring_state_changed(int vid, uint16_t queue_id, int enable)
 	if (queue_id != VIRTIO_RXQ)
 		return 0;
 
-	if (dma_bind[vid].dmas[queue_id].async_enabled) {
-		if (!enable) {
-			uint16_t n_pkt = 0;
-			int pkts_inflight;
-			pkts_inflight = rte_vhost_async_get_inflight_thread_unsafe(vid, queue_id);
-			int16_t dma_id = dma_bind[vid].dmas[VIRTIO_RXQ].dev_id;
-			struct rte_mbuf *m_cpl[pkts_inflight];
-
-			while (pkts_inflight) {
-				n_pkt = rte_vhost_clear_queue_thread_unsafe(vid, queue_id,
-							m_cpl, pkts_inflight, dma_id, 0);
-				free_pkts(m_cpl, n_pkt);
-				pkts_inflight = rte_vhost_async_get_inflight_thread_unsafe(vid,
-											queue_id);
-			}
-		}
+	if (dma_bind[vid2socketid[vid]].dmas[queue_id].async_enabled) {
+		if (!enable)
+			vhost_clear_queue_thread_unsafe(vdev, queue_id);
 	}
 
 	return 0;
@@ -1887,7 +1989,7 @@ main(int argc, char *argv[])
 	for (i = 0; i < nb_sockets; i++) {
 		char *file = socket_files + i * PATH_MAX;
 
-		if (dma_count)
+		if (dma_count && get_async_flag_by_socketid(i) != 0)
 			flags = flags | RTE_VHOST_USER_ASYNC_COPY;
 
 		ret = rte_vhost_driver_register(file, flags);
diff --git a/examples/vhost/main.h b/examples/vhost/main.h
index e7f395c3c9..2fcb8376c5 100644
--- a/examples/vhost/main.h
+++ b/examples/vhost/main.h
@@ -61,6 +61,19 @@ struct vhost_dev {
 	struct vhost_queue queues[MAX_QUEUE_PAIRS * 2];
 } __rte_cache_aligned;
 
+typedef uint16_t (*vhost_enqueue_burst_t)(struct vhost_dev *dev,
+			uint16_t queue_id, struct rte_mbuf **pkts,
+			uint32_t count);
+
+typedef uint16_t (*vhost_dequeue_burst_t)(struct vhost_dev *dev,
+			uint16_t queue_id, struct rte_mempool *mbuf_pool,
+			struct rte_mbuf **pkts, uint16_t count);
+
+struct vhost_queue_ops {
+	vhost_enqueue_burst_t enqueue_pkt_burst;
+	vhost_dequeue_burst_t dequeue_pkt_burst;
+};
+
 TAILQ_HEAD(vhost_dev_tailq_list, vhost_dev);
 
 
@@ -87,6 +100,7 @@ struct dma_info {
 
 struct dma_for_vhost {
 	struct dma_info dmas[RTE_MAX_QUEUES_PER_PORT * 2];
+	uint32_t async_flag;
 };
 
 /* we implement non-extra virtio net features */
@@ -97,7 +111,19 @@ void vs_vhost_net_remove(struct vhost_dev *dev);
 uint16_t vs_enqueue_pkts(struct vhost_dev *dev, uint16_t queue_id,
 			 struct rte_mbuf **pkts, uint32_t count);
 
-uint16_t vs_dequeue_pkts(struct vhost_dev *dev, uint16_t queue_id,
-			 struct rte_mempool *mbuf_pool,
-			 struct rte_mbuf **pkts, uint16_t count);
+uint16_t builtin_enqueue_pkts(struct vhost_dev *dev, uint16_t queue_id,
+			struct rte_mbuf **pkts, uint32_t count);
+uint16_t builtin_dequeue_pkts(struct vhost_dev *dev, uint16_t queue_id,
+			struct rte_mempool *mbuf_pool,
+			struct rte_mbuf **pkts, uint16_t count);
+uint16_t sync_enqueue_pkts(struct vhost_dev *dev, uint16_t queue_id,
+			 struct rte_mbuf **pkts, uint32_t count);
+uint16_t sync_dequeue_pkts(struct vhost_dev *dev, uint16_t queue_id,
+			struct rte_mempool *mbuf_pool,
+			struct rte_mbuf **pkts, uint16_t count);
+uint16_t async_enqueue_pkts(struct vhost_dev *dev, uint16_t queue_id,
+			 struct rte_mbuf **pkts, uint32_t count);
+uint16_t async_dequeue_pkts(struct vhost_dev *dev, uint16_t queue_id,
+			struct rte_mempool *mbuf_pool,
+			struct rte_mbuf **pkts, uint16_t count);
 #endif /* _MAIN_H_ */
diff --git a/examples/vhost/virtio_net.c b/examples/vhost/virtio_net.c
index 9064fc3a82..2432a96566 100644
--- a/examples/vhost/virtio_net.c
+++ b/examples/vhost/virtio_net.c
@@ -238,6 +238,13 @@ vs_enqueue_pkts(struct vhost_dev *dev, uint16_t queue_id,
 	return count;
 }
 
+uint16_t
+builtin_enqueue_pkts(struct vhost_dev *dev, uint16_t queue_id,
+		struct rte_mbuf **pkts, uint32_t count)
+{
+	return vs_enqueue_pkts(dev, queue_id, pkts, count);
+}
+
 static __rte_always_inline int
 dequeue_pkt(struct vhost_dev *dev, struct rte_vhost_vring *vr,
 	    struct rte_mbuf *m, uint16_t desc_idx,
@@ -363,7 +370,7 @@ dequeue_pkt(struct vhost_dev *dev, struct rte_vhost_vring *vr,
 	return 0;
 }
 
-uint16_t
+static uint16_t
 vs_dequeue_pkts(struct vhost_dev *dev, uint16_t queue_id,
 	struct rte_mempool *mbuf_pool, struct rte_mbuf **pkts, uint16_t count)
 {
@@ -440,3 +447,10 @@ vs_dequeue_pkts(struct vhost_dev *dev, uint16_t queue_id,
 
 	return i;
 }
+
+uint16_t
+builtin_dequeue_pkts(struct vhost_dev *dev, uint16_t queue_id,
+	struct rte_mempool *mbuf_pool, struct rte_mbuf **pkts, uint16_t count)
+{
+	return vs_dequeue_pkts(dev, queue_id, mbuf_pool, pkts, count);
+}
-- 
2.17.1


^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH v5 4/5] vhost: support async dequeue for split ring
  2022-05-13  2:00   ` [PATCH v5 4/5] vhost: support async dequeue for split ring xuan.ding
@ 2022-05-13  2:24     ` Stephen Hemminger
  2022-05-13  2:33       ` Ding, Xuan
  0 siblings, 1 reply; 73+ messages in thread
From: Stephen Hemminger @ 2022-05-13  2:24 UTC (permalink / raw)
  To: xuan.ding
  Cc: maxime.coquelin, chenbo.xia, dev, jiayu.hu, cheng1.jiang,
	sunil.pai.g, liangma, Yuan Wang

On Fri, 13 May 2022 02:00:21 +0000
xuan.ding@intel.com wrote:

>  
> +/**
> + * This function tries to receive packets from the guest with offloading
> + * copies to the async channel. The packets that are transfer completed
> + * are returned in "pkts". The other packets that their copies are submitted to
> + * the async channel but not completed are called "in-flight packets".
> + * This function will not return in-flight packets until their copies are
> + * completed by the async channel.
> + *

Please add an EXPERIMENTAL header like this, so it shows up in the documentation
correctly.

 *
 * @warning
 * @b EXPERIMENTAL:
 * All functions in this file may be changed or removed without prior notice.

^ permalink raw reply	[flat|nested] 73+ messages in thread

* RE: [PATCH v5 4/5] vhost: support async dequeue for split ring
  2022-05-13  2:24     ` Stephen Hemminger
@ 2022-05-13  2:33       ` Ding, Xuan
  0 siblings, 0 replies; 73+ messages in thread
From: Ding, Xuan @ 2022-05-13  2:33 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: maxime.coquelin, Xia, Chenbo, dev, Hu, Jiayu, Jiang, Cheng1,
	Pai G, Sunil, liangma, Wang, YuanX



> -----Original Message-----
> From: Stephen Hemminger <stephen@networkplumber.org>
> Sent: Friday, May 13, 2022 10:24 AM
> To: Ding, Xuan <xuan.ding@intel.com>
> Cc: maxime.coquelin@redhat.com; Xia, Chenbo <chenbo.xia@intel.com>;
> dev@dpdk.org; Hu, Jiayu <jiayu.hu@intel.com>; Jiang, Cheng1
> <cheng1.jiang@intel.com>; Pai G, Sunil <sunil.pai.g@intel.com>;
> liangma@liangbit.com; Wang, YuanX <yuanx.wang@intel.com>
> Subject: Re: [PATCH v5 4/5] vhost: support async dequeue for split ring
> 
> On Fri, 13 May 2022 02:00:21 +0000
> xuan.ding@intel.com wrote:
> 
> >
> > +/**
> > + * This function tries to receive packets from the guest with
> > +offloading
> > + * copies to the async channel. The packets that are transfer
> > +completed
> > + * are returned in "pkts". The other packets that their copies are
> > +submitted to
> > + * the async channel but not completed are called "in-flight packets".
> > + * This function will not return in-flight packets until their copies
> > +are
> > + * completed by the async channel.
> > + *
> 
> Please add an EXPERIMENTAL header like this, so it shows up in the documentation
> correctly.

Got it, thanks for your suggestion.

Regards,
Xuan

> 
>  *
>  * @warning
>  * @b EXPERIMENTAL:
>  * All functions in this file may be changed or removed without prior notice.

^ permalink raw reply	[flat|nested] 73+ messages in thread

* [PATCH v6 0/5] vhost: support async dequeue data path
  2022-04-07 15:25 [PATCH v1 0/5] vhost: support async dequeue data path xuan.ding
                   ` (8 preceding siblings ...)
  2022-05-13  2:00 ` [PATCH v5 " xuan.ding
@ 2022-05-13  2:50 ` xuan.ding
  2022-05-13  2:50   ` [PATCH v6 1/5] vhost: prepare sync for descriptor to mbuf refactoring xuan.ding
                     ` (4 more replies)
  2022-05-16  2:43 ` [PATCH v7 0/5] vhost: " xuan.ding
  2022-05-16 11:10 ` [PATCH v8 0/5] vhost: " xuan.ding
  11 siblings, 5 replies; 73+ messages in thread
From: xuan.ding @ 2022-05-13  2:50 UTC (permalink / raw)
  To: maxime.coquelin, chenbo.xia
  Cc: dev, jiayu.hu, cheng1.jiang, sunil.pai.g, liangma, Xuan Ding

From: Xuan Ding <xuan.ding@intel.com>

The presence of an asynchronous path allows applications to offload memory
copies to a DMA engine, saving CPU cycles and improving copy
performance. This patch set implements the vhost async dequeue data path
for the split ring. The code is based on the latest enqueue changes [1].

This patch set is a new design and implementation of [2]. Since dmadev
was introduced in DPDK 21.11, to simplify application logic, this patch
integrates dmadev in vhost. With dmadev integrated, vhost supports M:N
mapping between vrings and DMA virtual channels. Specifically, one vring
can use multiple different DMA channels and one DMA channel can be
shared by multiple vrings at the same time.
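
For clarity, the vhost-side wiring this enables could look roughly like the
sketch below. It is illustrative only and assumes the dmadev identified by
dma_id/vchan_id has already been configured and started by the application
(as the vhost example does), and that vid refers to a ready vhost device:

        /* Illustrative sketch, not part of this patch set. */
        if (rte_vhost_async_dma_configure(dma_id, vchan_id) < 0)
                rte_exit(EXIT_FAILURE, "vhost cannot use DMA %d\n", dma_id);

        /* The same DMA virtual channel may then back both directions of
         * one or more vhost devices.
         */
        if (rte_vhost_async_channel_register(vid, VIRTIO_RXQ) != 0 ||
            rte_vhost_async_channel_register(vid, VIRTIO_TXQ) != 0)
                rte_exit(EXIT_FAILURE, "async channel register failed\n");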

A new asynchronous dequeue function is introduced:
        1) rte_vhost_async_try_dequeue_burst(int vid, uint16_t queue_id,
                struct rte_mempool *mbuf_pool, struct rte_mbuf **pkts,
                uint16_t count, int *nr_inflight,
                uint16_t dma_id, uint16_t vchan_id)

        Receives packets from the guest and offloads copies to a DMA
virtual channel.
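
As an illustration of how an application might poll this API (not part of the
patch set; MAX_PKT_BURST, mbuf_pool, vid and dma_id are placeholders owned by
the application):

        static uint16_t
        poll_async_dequeue(int vid, struct rte_mempool *mbuf_pool, uint16_t dma_id)
        {
                struct rte_mbuf *pkts[MAX_PKT_BURST];
                int nr_inflight = 0;
                uint16_t n;

                n = rte_vhost_async_try_dequeue_burst(vid, VIRTIO_TXQ, mbuf_pool,
                                pkts, MAX_PKT_BURST, &nr_inflight, dma_id, 0);
                /* Forward or free the n completed packets here; copies still
                 * in flight are returned by later calls once the DMA engine
                 * completes them.
                 */
                return n;
        }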

[1] https://mails.dpdk.org/archives/dev/2022-February/234555.html
[2] https://mails.dpdk.org/archives/dev/2021-September/218591.html

v5->v6:
* adjust EXPERIMENTAL header

v4->v5:
* rebase to latest DPDK
* add some checks

v3->v4:
* fix CI build warnings
* adjust some indentation
* pass vq instead of queue_id

v2->v3:
* fix mbuf not updated correctly for large packets

v1->v2:
* fix a typo
* fix a bug in desc_to_mbuf filling

RFC v3 -> v1:
* add sync and async path descriptor to mbuf refactoring
* add API description in docs

RFC v2 -> RFC v3:
* rebase to latest DPDK version

RFC v1 -> RFC v2:
* fix one bug in example
* rename vchan to vchan_id
* check if dma_id and vchan_id valid
* rework all the logs to new standard

Xuan Ding (5):
  vhost: prepare sync for descriptor to mbuf refactoring
  vhost: prepare async for descriptor to mbuf refactoring
  vhost: merge sync and async descriptor to mbuf filling
  vhost: support async dequeue for split ring
  examples/vhost: support async dequeue data path

 doc/guides/prog_guide/vhost_lib.rst    |   7 +
 doc/guides/rel_notes/release_22_07.rst |   5 +
 doc/guides/sample_app_ug/vhost.rst     |   9 +-
 examples/vhost/main.c                  | 284 ++++++++++-----
 examples/vhost/main.h                  |  32 +-
 examples/vhost/virtio_net.c            |  16 +-
 lib/vhost/rte_vhost_async.h            |  37 ++
 lib/vhost/version.map                  |   2 +-
 lib/vhost/vhost.h                      |   1 +
 lib/vhost/virtio_net.c                 | 473 ++++++++++++++++++++++---
 10 files changed, 714 insertions(+), 152 deletions(-)

-- 
2.17.1


^ permalink raw reply	[flat|nested] 73+ messages in thread

* [PATCH v6 1/5] vhost: prepare sync for descriptor to mbuf refactoring
  2022-05-13  2:50 ` [PATCH v6 0/5] vhost: " xuan.ding
@ 2022-05-13  2:50   ` xuan.ding
  2022-05-13  2:50   ` [PATCH v6 2/5] vhost: prepare async " xuan.ding
                     ` (3 subsequent siblings)
  4 siblings, 0 replies; 73+ messages in thread
From: xuan.ding @ 2022-05-13  2:50 UTC (permalink / raw)
  To: maxime.coquelin, chenbo.xia
  Cc: dev, jiayu.hu, cheng1.jiang, sunil.pai.g, liangma, Xuan Ding

From: Xuan Ding <xuan.ding@intel.com>

This patch extracts the descriptor-to-buffer filling from
copy_desc_to_mbuf() into a dedicated function. Besides, the enqueue
and dequeue paths are refactored to use the same function,
sync_fill_seg(), for preparing batch elements, which simplifies
the code without performance degradation.

Signed-off-by: Xuan Ding <xuan.ding@intel.com>
Tested-by: Yvonne Yang <yvonnex.yang@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
---
 lib/vhost/virtio_net.c | 78 ++++++++++++++++++++----------------------
 1 file changed, 38 insertions(+), 40 deletions(-)

diff --git a/lib/vhost/virtio_net.c b/lib/vhost/virtio_net.c
index 5f432b0d77..d4c94d2a9b 100644
--- a/lib/vhost/virtio_net.c
+++ b/lib/vhost/virtio_net.c
@@ -1030,23 +1030,36 @@ async_mbuf_to_desc_seg(struct virtio_net *dev, struct vhost_virtqueue *vq,
 }
 
 static __rte_always_inline void
-sync_mbuf_to_desc_seg(struct virtio_net *dev, struct vhost_virtqueue *vq,
+sync_fill_seg(struct virtio_net *dev, struct vhost_virtqueue *vq,
 		struct rte_mbuf *m, uint32_t mbuf_offset,
-		uint64_t buf_addr, uint64_t buf_iova, uint32_t cpy_len)
+		uint64_t buf_addr, uint64_t buf_iova, uint32_t cpy_len, bool to_desc)
 {
 	struct batch_copy_elem *batch_copy = vq->batch_copy_elems;
 
 	if (likely(cpy_len > MAX_BATCH_LEN || vq->batch_copy_nb_elems >= vq->size)) {
-		rte_memcpy((void *)((uintptr_t)(buf_addr)),
+		if (to_desc) {
+			rte_memcpy((void *)((uintptr_t)(buf_addr)),
 				rte_pktmbuf_mtod_offset(m, void *, mbuf_offset),
 				cpy_len);
+		} else {
+			rte_memcpy(rte_pktmbuf_mtod_offset(m, void *, mbuf_offset),
+				(void *)((uintptr_t)(buf_addr)),
+				cpy_len);
+		}
 		vhost_log_cache_write_iova(dev, vq, buf_iova, cpy_len);
 		PRINT_PACKET(dev, (uintptr_t)(buf_addr), cpy_len, 0);
 	} else {
-		batch_copy[vq->batch_copy_nb_elems].dst =
-			(void *)((uintptr_t)(buf_addr));
-		batch_copy[vq->batch_copy_nb_elems].src =
-			rte_pktmbuf_mtod_offset(m, void *, mbuf_offset);
+		if (to_desc) {
+			batch_copy[vq->batch_copy_nb_elems].dst =
+				(void *)((uintptr_t)(buf_addr));
+			batch_copy[vq->batch_copy_nb_elems].src =
+				rte_pktmbuf_mtod_offset(m, void *, mbuf_offset);
+		} else {
+			batch_copy[vq->batch_copy_nb_elems].dst =
+				rte_pktmbuf_mtod_offset(m, void *, mbuf_offset);
+			batch_copy[vq->batch_copy_nb_elems].src =
+				(void *)((uintptr_t)(buf_addr));
+		}
 		batch_copy[vq->batch_copy_nb_elems].log_addr = buf_iova;
 		batch_copy[vq->batch_copy_nb_elems].len = cpy_len;
 		vq->batch_copy_nb_elems++;
@@ -1158,9 +1171,9 @@ mbuf_to_desc(struct virtio_net *dev, struct vhost_virtqueue *vq,
 						buf_iova + buf_offset, cpy_len) < 0)
 				goto error;
 		} else {
-			sync_mbuf_to_desc_seg(dev, vq, m, mbuf_offset,
-					buf_addr + buf_offset,
-					buf_iova + buf_offset, cpy_len);
+			sync_fill_seg(dev, vq, m, mbuf_offset,
+				      buf_addr + buf_offset,
+				      buf_iova + buf_offset, cpy_len, true);
 		}
 
 		mbuf_avail  -= cpy_len;
@@ -2473,8 +2486,8 @@ copy_desc_to_mbuf(struct virtio_net *dev, struct vhost_virtqueue *vq,
 		  struct rte_mbuf *m, struct rte_mempool *mbuf_pool,
 		  bool legacy_ol_flags)
 {
-	uint32_t buf_avail, buf_offset;
-	uint64_t buf_addr, buf_len;
+	uint32_t buf_avail, buf_offset, buf_len;
+	uint64_t buf_addr, buf_iova;
 	uint32_t mbuf_avail, mbuf_offset;
 	uint32_t cpy_len;
 	struct rte_mbuf *cur = m, *prev = m;
@@ -2482,16 +2495,13 @@ copy_desc_to_mbuf(struct virtio_net *dev, struct vhost_virtqueue *vq,
 	struct virtio_net_hdr *hdr = NULL;
 	/* A counter to avoid desc dead loop chain */
 	uint16_t vec_idx = 0;
-	struct batch_copy_elem *batch_copy = vq->batch_copy_elems;
-	int error = 0;
 
 	buf_addr = buf_vec[vec_idx].buf_addr;
+	buf_iova = buf_vec[vec_idx].buf_iova;
 	buf_len = buf_vec[vec_idx].buf_len;
 
-	if (unlikely(buf_len < dev->vhost_hlen && nr_vec <= 1)) {
-		error = -1;
-		goto out;
-	}
+	if (unlikely(buf_len < dev->vhost_hlen && nr_vec <= 1))
+		return -1;
 
 	if (virtio_net_with_host_offload(dev)) {
 		if (unlikely(buf_len < sizeof(struct virtio_net_hdr))) {
@@ -2515,11 +2525,12 @@ copy_desc_to_mbuf(struct virtio_net *dev, struct vhost_virtqueue *vq,
 		buf_offset = dev->vhost_hlen - buf_len;
 		vec_idx++;
 		buf_addr = buf_vec[vec_idx].buf_addr;
+		buf_iova = buf_vec[vec_idx].buf_iova;
 		buf_len = buf_vec[vec_idx].buf_len;
 		buf_avail  = buf_len - buf_offset;
 	} else if (buf_len == dev->vhost_hlen) {
 		if (unlikely(++vec_idx >= nr_vec))
-			goto out;
+			goto error;
 		buf_addr = buf_vec[vec_idx].buf_addr;
 		buf_len = buf_vec[vec_idx].buf_len;
 
@@ -2539,22 +2550,9 @@ copy_desc_to_mbuf(struct virtio_net *dev, struct vhost_virtqueue *vq,
 	while (1) {
 		cpy_len = RTE_MIN(buf_avail, mbuf_avail);
 
-		if (likely(cpy_len > MAX_BATCH_LEN ||
-					vq->batch_copy_nb_elems >= vq->size ||
-					(hdr && cur == m))) {
-			rte_memcpy(rte_pktmbuf_mtod_offset(cur, void *,
-						mbuf_offset),
-					(void *)((uintptr_t)(buf_addr +
-							buf_offset)), cpy_len);
-		} else {
-			batch_copy[vq->batch_copy_nb_elems].dst =
-				rte_pktmbuf_mtod_offset(cur, void *,
-						mbuf_offset);
-			batch_copy[vq->batch_copy_nb_elems].src =
-				(void *)((uintptr_t)(buf_addr + buf_offset));
-			batch_copy[vq->batch_copy_nb_elems].len = cpy_len;
-			vq->batch_copy_nb_elems++;
-		}
+		sync_fill_seg(dev, vq, cur, mbuf_offset,
+			      buf_addr + buf_offset,
+			      buf_iova + buf_offset, cpy_len, false);
 
 		mbuf_avail  -= cpy_len;
 		mbuf_offset += cpy_len;
@@ -2567,6 +2565,7 @@ copy_desc_to_mbuf(struct virtio_net *dev, struct vhost_virtqueue *vq,
 				break;
 
 			buf_addr = buf_vec[vec_idx].buf_addr;
+			buf_iova = buf_vec[vec_idx].buf_iova;
 			buf_len = buf_vec[vec_idx].buf_len;
 
 			buf_offset = 0;
@@ -2585,8 +2584,7 @@ copy_desc_to_mbuf(struct virtio_net *dev, struct vhost_virtqueue *vq,
 			if (unlikely(cur == NULL)) {
 				VHOST_LOG_DATA(ERR, "(%s) failed to allocate memory for mbuf.\n",
 						dev->ifname);
-				error = -1;
-				goto out;
+				goto error;
 			}
 
 			prev->next = cur;
@@ -2606,9 +2604,9 @@ copy_desc_to_mbuf(struct virtio_net *dev, struct vhost_virtqueue *vq,
 	if (hdr)
 		vhost_dequeue_offload(dev, hdr, m, legacy_ol_flags);
 
-out:
-
-	return error;
+	return 0;
+error:
+	return -1;
 }
 
 static void
-- 
2.17.1


^ permalink raw reply	[flat|nested] 73+ messages in thread

* [PATCH v6 2/5] vhost: prepare async for descriptor to mbuf refactoring
  2022-05-13  2:50 ` [PATCH v6 0/5] vhost: " xuan.ding
  2022-05-13  2:50   ` [PATCH v6 1/5] vhost: prepare sync for descriptor to mbuf refactoring xuan.ding
@ 2022-05-13  2:50   ` xuan.ding
  2022-05-13  2:50   ` [PATCH v6 3/5] vhost: merge sync and async descriptor to mbuf filling xuan.ding
                     ` (2 subsequent siblings)
  4 siblings, 0 replies; 73+ messages in thread
From: xuan.ding @ 2022-05-13  2:50 UTC (permalink / raw)
  To: maxime.coquelin, chenbo.xia
  Cc: dev, jiayu.hu, cheng1.jiang, sunil.pai.g, liangma, Xuan Ding

From: Xuan Ding <xuan.ding@intel.com>

This patch refactors the vhost async enqueue and dequeue paths to use
the same function, async_fill_seg(), for preparing batch elements,
which simplifies the code without performance degradation.

Signed-off-by: Xuan Ding <xuan.ding@intel.com>
Tested-by: Yvonne Yang <yvonnex.yang@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
---
 lib/vhost/virtio_net.c | 22 ++++++++++++++--------
 1 file changed, 14 insertions(+), 8 deletions(-)

diff --git a/lib/vhost/virtio_net.c b/lib/vhost/virtio_net.c
index d4c94d2a9b..a9e2dcd9ce 100644
--- a/lib/vhost/virtio_net.c
+++ b/lib/vhost/virtio_net.c
@@ -997,13 +997,14 @@ async_iter_reset(struct vhost_async *async)
 }
 
 static __rte_always_inline int
-async_mbuf_to_desc_seg(struct virtio_net *dev, struct vhost_virtqueue *vq,
+async_fill_seg(struct virtio_net *dev, struct vhost_virtqueue *vq,
 		struct rte_mbuf *m, uint32_t mbuf_offset,
-		uint64_t buf_iova, uint32_t cpy_len)
+		uint64_t buf_iova, uint32_t cpy_len, bool to_desc)
 {
 	struct vhost_async *async = vq->async;
 	uint64_t mapped_len;
 	uint32_t buf_offset = 0;
+	void *src, *dst;
 	void *host_iova;
 
 	while (cpy_len) {
@@ -1015,10 +1016,15 @@ async_mbuf_to_desc_seg(struct virtio_net *dev, struct vhost_virtqueue *vq,
 			return -1;
 		}
 
-		if (unlikely(async_iter_add_iovec(dev, async,
-						(void *)(uintptr_t)rte_pktmbuf_iova_offset(m,
-							mbuf_offset),
-						host_iova, (size_t)mapped_len)))
+		if (to_desc) {
+			src = (void *)(uintptr_t)rte_pktmbuf_iova_offset(m, mbuf_offset);
+			dst = host_iova;
+		} else {
+			src = host_iova;
+			dst = (void *)(uintptr_t)rte_pktmbuf_iova_offset(m, mbuf_offset);
+		}
+
+		if (unlikely(async_iter_add_iovec(dev, async, src, dst, (size_t)mapped_len)))
 			return -1;
 
 		cpy_len -= (uint32_t)mapped_len;
@@ -1167,8 +1173,8 @@ mbuf_to_desc(struct virtio_net *dev, struct vhost_virtqueue *vq,
 		cpy_len = RTE_MIN(buf_avail, mbuf_avail);
 
 		if (is_async) {
-			if (async_mbuf_to_desc_seg(dev, vq, m, mbuf_offset,
-						buf_iova + buf_offset, cpy_len) < 0)
+			if (async_fill_seg(dev, vq, m, mbuf_offset,
+					   buf_iova + buf_offset, cpy_len, true) < 0)
 				goto error;
 		} else {
 			sync_fill_seg(dev, vq, m, mbuf_offset,
-- 
2.17.1


^ permalink raw reply	[flat|nested] 73+ messages in thread

* [PATCH v6 3/5] vhost: merge sync and async descriptor to mbuf filling
  2022-05-13  2:50 ` [PATCH v6 0/5] vhost: " xuan.ding
  2022-05-13  2:50   ` [PATCH v6 1/5] vhost: prepare sync for descriptor to mbuf refactoring xuan.ding
  2022-05-13  2:50   ` [PATCH v6 2/5] vhost: prepare async " xuan.ding
@ 2022-05-13  2:50   ` xuan.ding
  2022-05-13  2:50   ` [PATCH v6 4/5] vhost: support async dequeue for split ring xuan.ding
  2022-05-13  2:50   ` [PATCH v6 5/5] examples/vhost: support async dequeue data path xuan.ding
  4 siblings, 0 replies; 73+ messages in thread
From: xuan.ding @ 2022-05-13  2:50 UTC (permalink / raw)
  To: maxime.coquelin, chenbo.xia
  Cc: dev, jiayu.hu, cheng1.jiang, sunil.pai.g, liangma, Xuan Ding

From: Xuan Ding <xuan.ding@intel.com>

This patch refactors copy_desc_to_mbuf() used by the sync
path to support both sync and async descriptor to mbuf filling.

Signed-off-by: Xuan Ding <xuan.ding@intel.com>
Tested-by: Yvonne Yang <yvonnex.yang@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
---
 lib/vhost/vhost.h      |  1 +
 lib/vhost/virtio_net.c | 48 ++++++++++++++++++++++++++++++++----------
 2 files changed, 38 insertions(+), 11 deletions(-)

diff --git a/lib/vhost/vhost.h b/lib/vhost/vhost.h
index a9edc271aa..00744b234f 100644
--- a/lib/vhost/vhost.h
+++ b/lib/vhost/vhost.h
@@ -180,6 +180,7 @@ struct async_inflight_info {
 	struct rte_mbuf *mbuf;
 	uint16_t descs; /* num of descs inflight */
 	uint16_t nr_buffers; /* num of buffers inflight for packed ring */
+	struct virtio_net_hdr nethdr;
 };
 
 struct vhost_async {
diff --git a/lib/vhost/virtio_net.c b/lib/vhost/virtio_net.c
index a9e2dcd9ce..5904839d5c 100644
--- a/lib/vhost/virtio_net.c
+++ b/lib/vhost/virtio_net.c
@@ -2487,10 +2487,10 @@ copy_vnet_hdr_from_desc(struct virtio_net_hdr *hdr,
 }
 
 static __rte_always_inline int
-copy_desc_to_mbuf(struct virtio_net *dev, struct vhost_virtqueue *vq,
+desc_to_mbuf(struct virtio_net *dev, struct vhost_virtqueue *vq,
 		  struct buf_vector *buf_vec, uint16_t nr_vec,
 		  struct rte_mbuf *m, struct rte_mempool *mbuf_pool,
-		  bool legacy_ol_flags)
+		  bool legacy_ol_flags, uint16_t slot_idx, bool is_async)
 {
 	uint32_t buf_avail, buf_offset, buf_len;
 	uint64_t buf_addr, buf_iova;
@@ -2501,6 +2501,8 @@ copy_desc_to_mbuf(struct virtio_net *dev, struct vhost_virtqueue *vq,
 	struct virtio_net_hdr *hdr = NULL;
 	/* A counter to avoid desc dead loop chain */
 	uint16_t vec_idx = 0;
+	struct vhost_async *async = vq->async;
+	struct async_inflight_info *pkts_info;
 
 	buf_addr = buf_vec[vec_idx].buf_addr;
 	buf_iova = buf_vec[vec_idx].buf_iova;
@@ -2538,6 +2540,7 @@ copy_desc_to_mbuf(struct virtio_net *dev, struct vhost_virtqueue *vq,
 		if (unlikely(++vec_idx >= nr_vec))
 			goto error;
 		buf_addr = buf_vec[vec_idx].buf_addr;
+		buf_iova = buf_vec[vec_idx].buf_iova;
 		buf_len = buf_vec[vec_idx].buf_len;
 
 		buf_offset = 0;
@@ -2553,12 +2556,25 @@ copy_desc_to_mbuf(struct virtio_net *dev, struct vhost_virtqueue *vq,
 
 	mbuf_offset = 0;
 	mbuf_avail  = m->buf_len - RTE_PKTMBUF_HEADROOM;
+
+	if (is_async) {
+		pkts_info = async->pkts_info;
+		if (async_iter_initialize(dev, async))
+			return -1;
+	}
+
 	while (1) {
 		cpy_len = RTE_MIN(buf_avail, mbuf_avail);
 
-		sync_fill_seg(dev, vq, cur, mbuf_offset,
-			      buf_addr + buf_offset,
-			      buf_iova + buf_offset, cpy_len, false);
+		if (is_async) {
+			if (async_fill_seg(dev, vq, cur, mbuf_offset,
+					   buf_iova + buf_offset, cpy_len, false) < 0)
+				goto error;
+		} else {
+			sync_fill_seg(dev, vq, cur, mbuf_offset,
+				      buf_addr + buf_offset,
+				      buf_iova + buf_offset, cpy_len, false);
+		}
 
 		mbuf_avail  -= cpy_len;
 		mbuf_offset += cpy_len;
@@ -2607,11 +2623,20 @@ copy_desc_to_mbuf(struct virtio_net *dev, struct vhost_virtqueue *vq,
 	prev->data_len = mbuf_offset;
 	m->pkt_len    += mbuf_offset;
 
-	if (hdr)
-		vhost_dequeue_offload(dev, hdr, m, legacy_ol_flags);
+	if (is_async) {
+		async_iter_finalize(async);
+		if (hdr)
+			pkts_info[slot_idx].nethdr = *hdr;
+	} else {
+		if (hdr)
+			vhost_dequeue_offload(dev, hdr, m, legacy_ol_flags);
+	}
 
 	return 0;
 error:
+	if (is_async)
+		async_iter_cancel(async);
+
 	return -1;
 }
 
@@ -2743,8 +2768,8 @@ virtio_dev_tx_split(struct virtio_net *dev, struct vhost_virtqueue *vq,
 			break;
 		}
 
-		err = copy_desc_to_mbuf(dev, vq, buf_vec, nr_vec, pkts[i],
-				mbuf_pool, legacy_ol_flags);
+		err = desc_to_mbuf(dev, vq, buf_vec, nr_vec, pkts[i],
+				   mbuf_pool, legacy_ol_flags, 0, false);
 		if (unlikely(err)) {
 			if (!allocerr_warned) {
 				VHOST_LOG_DATA(ERR, "(%s) failed to copy desc to mbuf.\n",
@@ -2755,6 +2780,7 @@ virtio_dev_tx_split(struct virtio_net *dev, struct vhost_virtqueue *vq,
 			i++;
 			break;
 		}
+
 	}
 
 	if (dropped)
@@ -2936,8 +2962,8 @@ vhost_dequeue_single_packed(struct virtio_net *dev,
 		return -1;
 	}
 
-	err = copy_desc_to_mbuf(dev, vq, buf_vec, nr_vec, pkts,
-				mbuf_pool, legacy_ol_flags);
+	err = desc_to_mbuf(dev, vq, buf_vec, nr_vec, pkts,
+			   mbuf_pool, legacy_ol_flags, 0, false);
 	if (unlikely(err)) {
 		if (!allocerr_warned) {
 			VHOST_LOG_DATA(ERR, "(%s) failed to copy desc to mbuf.\n",
-- 
2.17.1


^ permalink raw reply	[flat|nested] 73+ messages in thread

* [PATCH v6 4/5] vhost: support async dequeue for split ring
  2022-05-13  2:50 ` [PATCH v6 0/5] vhost: " xuan.ding
                     ` (2 preceding siblings ...)
  2022-05-13  2:50   ` [PATCH v6 3/5] vhost: merge sync and async descriptor to mbuf filling xuan.ding
@ 2022-05-13  2:50   ` xuan.ding
  2022-05-13  2:50   ` [PATCH v6 5/5] examples/vhost: support async dequeue data path xuan.ding
  4 siblings, 0 replies; 73+ messages in thread
From: xuan.ding @ 2022-05-13  2:50 UTC (permalink / raw)
  To: maxime.coquelin, chenbo.xia
  Cc: dev, jiayu.hu, cheng1.jiang, sunil.pai.g, liangma, Xuan Ding, Yuan Wang

From: Xuan Ding <xuan.ding@intel.com>

This patch implements the asynchronous dequeue data path for the vhost
split ring. A new API, rte_vhost_async_try_dequeue_burst(), is introduced.

Signed-off-by: Xuan Ding <xuan.ding@intel.com>
Signed-off-by: Yuan Wang <yuanx.wang@intel.com>
Tested-by: Yvonne Yang <yvonnex.yang@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
---
 doc/guides/prog_guide/vhost_lib.rst    |   7 +
 doc/guides/rel_notes/release_22_07.rst |   5 +
 lib/vhost/rte_vhost_async.h            |  37 +++
 lib/vhost/version.map                  |   2 +-
 lib/vhost/virtio_net.c                 | 337 +++++++++++++++++++++++++
 5 files changed, 387 insertions(+), 1 deletion(-)

diff --git a/doc/guides/prog_guide/vhost_lib.rst b/doc/guides/prog_guide/vhost_lib.rst
index f287b76ebf..09c1c24b48 100644
--- a/doc/guides/prog_guide/vhost_lib.rst
+++ b/doc/guides/prog_guide/vhost_lib.rst
@@ -282,6 +282,13 @@ The following is an overview of some key Vhost API functions:
   Clear inflight packets which are submitted to DMA engine in vhost async data
   path. Completed packets are returned to applications through ``pkts``.
 
+* ``rte_vhost_async_try_dequeue_burst(int vid, uint16_t queue_id,
+  struct rte_mempool *mbuf_pool, struct rte_mbuf **pkts, uint16_t count,
+  int *nr_inflight, uint16_t dma_id, uint16_t vchan_id)``
+
+  Receives (dequeues) ``count`` packets from guest to host in the async data path,
+  and stores them in ``pkts``.
+
 Vhost-user Implementations
 --------------------------
 
diff --git a/doc/guides/rel_notes/release_22_07.rst b/doc/guides/rel_notes/release_22_07.rst
index 88b1e478d4..564d88623e 100644
--- a/doc/guides/rel_notes/release_22_07.rst
+++ b/doc/guides/rel_notes/release_22_07.rst
@@ -70,6 +70,11 @@ New Features
   Added an API which can get the number of inflight packets in
   vhost async data path without using lock.
 
+* **Added vhost async dequeue API to receive pkts from guest.**
+
+  Added vhost async dequeue API which can leverage DMA devices to
+  accelerate receiving pkts from guest.
+
 Removed Items
 -------------
 
diff --git a/lib/vhost/rte_vhost_async.h b/lib/vhost/rte_vhost_async.h
index 70234debf9..2789492e38 100644
--- a/lib/vhost/rte_vhost_async.h
+++ b/lib/vhost/rte_vhost_async.h
@@ -204,6 +204,43 @@ uint16_t rte_vhost_clear_queue_thread_unsafe(int vid, uint16_t queue_id,
 __rte_experimental
 int rte_vhost_async_dma_configure(int16_t dma_id, uint16_t vchan_id);
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
+ *
+ * This function tries to receive packets from the guest, offloading
+ * copies to the async channel. Packets whose copies have completed are
+ * returned in "pkts". Packets whose copies have been submitted to the
+ * async channel but have not yet completed are called "in-flight packets".
+ * This function does not return in-flight packets until their copies are
+ * completed by the async channel.
+ *
+ * @param vid
+ *  ID of vhost device to dequeue data
+ * @param queue_id
+ *  ID of virtqueue to dequeue data
+ * @param mbuf_pool
+ *  Mbuf_pool where host mbuf is allocated
+ * @param pkts
+ *  Blank array to keep successfully dequeued packets
+ * @param count
+ *  Size of the packet array
+ * @param nr_inflight
+ *  >= 0: The number of in-flight packets
+ *  -1: Meaningless, indicates failed lock acquisition or invalid queue_id/dma_id
+ * @param dma_id
+ *  The identifier of DMA device
+ * @param vchan_id
+ *  The identifier of virtual DMA channel
+ * @return
+ *  Number of successfully dequeued packets
+ */
+__rte_experimental
+uint16_t
+rte_vhost_async_try_dequeue_burst(int vid, uint16_t queue_id,
+	struct rte_mempool *mbuf_pool, struct rte_mbuf **pkts, uint16_t count,
+	int *nr_inflight, uint16_t dma_id, uint16_t vchan_id);
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/vhost/version.map b/lib/vhost/version.map
index 5841315386..8c7211bf0d 100644
--- a/lib/vhost/version.map
+++ b/lib/vhost/version.map
@@ -90,7 +90,7 @@ EXPERIMENTAL {
 
 	# added in 22.07
 	rte_vhost_async_get_inflight_thread_unsafe;
-
+	rte_vhost_async_try_dequeue_burst;
 };
 
 INTERNAL {
diff --git a/lib/vhost/virtio_net.c b/lib/vhost/virtio_net.c
index 5904839d5c..8290514e65 100644
--- a/lib/vhost/virtio_net.c
+++ b/lib/vhost/virtio_net.c
@@ -3171,3 +3171,340 @@ rte_vhost_dequeue_burst(int vid, uint16_t queue_id,
 
 	return count;
 }
+
+static __rte_always_inline uint16_t
+async_poll_dequeue_completed_split(struct virtio_net *dev, struct vhost_virtqueue *vq,
+		struct rte_mbuf **pkts, uint16_t count, uint16_t dma_id,
+		uint16_t vchan_id, bool legacy_ol_flags)
+{
+	uint16_t start_idx, from, i;
+	uint16_t nr_cpl_pkts = 0;
+	struct async_inflight_info *pkts_info = vq->async->pkts_info;
+
+	vhost_async_dma_check_completed(dev, dma_id, vchan_id, VHOST_DMA_MAX_COPY_COMPLETE);
+
+	start_idx = async_get_first_inflight_pkt_idx(vq);
+
+	from = start_idx;
+	while (vq->async->pkts_cmpl_flag[from] && count--) {
+		vq->async->pkts_cmpl_flag[from] = false;
+		from = (from + 1) & (vq->size - 1);
+		nr_cpl_pkts++;
+	}
+
+	if (nr_cpl_pkts == 0)
+		return 0;
+
+	for (i = 0; i < nr_cpl_pkts; i++) {
+		from = (start_idx + i) & (vq->size - 1);
+		pkts[i] = pkts_info[from].mbuf;
+
+		if (virtio_net_with_host_offload(dev))
+			vhost_dequeue_offload(dev, &pkts_info[from].nethdr, pkts[i],
+					      legacy_ol_flags);
+	}
+
+	/* write back completed descs to used ring and update used idx */
+	write_back_completed_descs_split(vq, nr_cpl_pkts);
+	__atomic_add_fetch(&vq->used->idx, nr_cpl_pkts, __ATOMIC_RELEASE);
+	vhost_vring_call_split(dev, vq);
+
+	vq->async->pkts_inflight_n -= nr_cpl_pkts;
+
+	return nr_cpl_pkts;
+}
+
+static __rte_always_inline uint16_t
+virtio_dev_tx_async_split(struct virtio_net *dev, struct vhost_virtqueue *vq,
+		struct rte_mempool *mbuf_pool, struct rte_mbuf **pkts, uint16_t count,
+		uint16_t dma_id, uint16_t vchan_id, bool legacy_ol_flags)
+{
+	static bool allocerr_warned;
+	bool dropped = false;
+	uint16_t free_entries;
+	uint16_t pkt_idx, slot_idx = 0;
+	uint16_t nr_done_pkts = 0;
+	uint16_t pkt_err = 0;
+	uint16_t n_xfer;
+	struct vhost_async *async = vq->async;
+	struct async_inflight_info *pkts_info = async->pkts_info;
+	struct rte_mbuf *pkts_prealloc[MAX_PKT_BURST];
+	uint16_t pkts_size = count;
+
+	/**
+	 * The ordering between avail index and
+	 * desc reads needs to be enforced.
+	 */
+	free_entries = __atomic_load_n(&vq->avail->idx, __ATOMIC_ACQUIRE) -
+			vq->last_avail_idx;
+	if (free_entries == 0)
+		goto out;
+
+	rte_prefetch0(&vq->avail->ring[vq->last_avail_idx & (vq->size - 1)]);
+
+	async_iter_reset(async);
+
+	count = RTE_MIN(count, MAX_PKT_BURST);
+	count = RTE_MIN(count, free_entries);
+	VHOST_LOG_DATA(DEBUG, "(%s) about to dequeue %u buffers\n",
+			dev->ifname, count);
+
+	if (rte_pktmbuf_alloc_bulk(mbuf_pool, pkts_prealloc, count))
+		goto out;
+
+	for (pkt_idx = 0; pkt_idx < count; pkt_idx++) {
+		uint16_t head_idx = 0;
+		uint16_t nr_vec = 0;
+		uint16_t to;
+		uint32_t buf_len;
+		int err;
+		struct buf_vector buf_vec[BUF_VECTOR_MAX];
+		struct rte_mbuf *pkt = pkts_prealloc[pkt_idx];
+
+		if (unlikely(fill_vec_buf_split(dev, vq, vq->last_avail_idx,
+						&nr_vec, buf_vec,
+						&head_idx, &buf_len,
+						VHOST_ACCESS_RO) < 0)) {
+			dropped = true;
+			break;
+		}
+
+		err = virtio_dev_pktmbuf_prep(dev, pkt, buf_len);
+		if (unlikely(err)) {
+			/**
+			 * mbuf allocation fails for jumbo packets when external
+			 * buffer allocation is not allowed and linear buffer
+			 * is required. Drop this packet.
+			 */
+			if (!allocerr_warned) {
+				VHOST_LOG_DATA(ERR,
+					"(%s) %s: Failed mbuf alloc of size %d from %s\n",
+					dev->ifname, __func__, buf_len, mbuf_pool->name);
+				allocerr_warned = true;
+			}
+			dropped = true;
+			break;
+		}
+
+		slot_idx = (async->pkts_idx + pkt_idx) & (vq->size - 1);
+		err = desc_to_mbuf(dev, vq, buf_vec, nr_vec, pkt, mbuf_pool,
+					legacy_ol_flags, slot_idx, true);
+		if (unlikely(err)) {
+			if (!allocerr_warned) {
+				VHOST_LOG_DATA(ERR,
+					"(%s) %s: Failed to offload copies to async channel.\n",
+					dev->ifname, __func__);
+				allocerr_warned = true;
+			}
+			dropped = true;
+			break;
+		}
+
+		pkts_info[slot_idx].mbuf = pkt;
+
+		/* store used descs */
+		to = async->desc_idx_split & (vq->size - 1);
+		async->descs_split[to].id = head_idx;
+		async->descs_split[to].len = 0;
+		async->desc_idx_split++;
+
+		vq->last_avail_idx++;
+	}
+
+	if (unlikely(dropped))
+		rte_pktmbuf_free_bulk(&pkts_prealloc[pkt_idx], count - pkt_idx);
+
+	n_xfer = vhost_async_dma_transfer(dev, vq, dma_id, vchan_id, async->pkts_idx,
+					  async->iov_iter, pkt_idx);
+
+	async->pkts_inflight_n += n_xfer;
+
+	pkt_err = pkt_idx - n_xfer;
+	if (unlikely(pkt_err)) {
+		VHOST_LOG_DATA(DEBUG, "(%s) %s: failed to transfer data.\n",
+				dev->ifname, __func__);
+
+		pkt_idx = n_xfer;
+		/* recover available ring */
+		vq->last_avail_idx -= pkt_err;
+
+		/**
+		 * recover async channel copy related structures and free pktmbufs
+		 * for error pkts.
+		 */
+		async->desc_idx_split -= pkt_err;
+		while (pkt_err-- > 0) {
+			rte_pktmbuf_free(pkts_info[slot_idx & (vq->size - 1)].mbuf);
+			slot_idx--;
+		}
+	}
+
+	async->pkts_idx += pkt_idx;
+	if (async->pkts_idx >= vq->size)
+		async->pkts_idx -= vq->size;
+
+out:
+	/* DMA device may serve other queues, unconditionally check completed. */
+	nr_done_pkts = async_poll_dequeue_completed_split(dev, vq, pkts, pkts_size,
+							  dma_id, vchan_id, legacy_ol_flags);
+
+	return nr_done_pkts;
+}
+
+__rte_noinline
+static uint16_t
+virtio_dev_tx_async_split_legacy(struct virtio_net *dev,
+		struct vhost_virtqueue *vq, struct rte_mempool *mbuf_pool,
+		struct rte_mbuf **pkts, uint16_t count,
+		uint16_t dma_id, uint16_t vchan_id)
+{
+	return virtio_dev_tx_async_split(dev, vq, mbuf_pool,
+				pkts, count, dma_id, vchan_id, true);
+}
+
+__rte_noinline
+static uint16_t
+virtio_dev_tx_async_split_compliant(struct virtio_net *dev,
+		struct vhost_virtqueue *vq, struct rte_mempool *mbuf_pool,
+		struct rte_mbuf **pkts, uint16_t count,
+		uint16_t dma_id, uint16_t vchan_id)
+{
+	return virtio_dev_tx_async_split(dev, vq, mbuf_pool,
+				pkts, count, dma_id, vchan_id, false);
+}
+
+uint16_t
+rte_vhost_async_try_dequeue_burst(int vid, uint16_t queue_id,
+	struct rte_mempool *mbuf_pool, struct rte_mbuf **pkts, uint16_t count,
+	int *nr_inflight, uint16_t dma_id, uint16_t vchan_id)
+{
+	struct virtio_net *dev;
+	struct rte_mbuf *rarp_mbuf = NULL;
+	struct vhost_virtqueue *vq;
+	int16_t success = 1;
+
+	dev = get_device(vid);
+	if (!dev || !nr_inflight)
+		return 0;
+
+	*nr_inflight = -1;
+
+	if (unlikely(!(dev->flags & VIRTIO_DEV_BUILTIN_VIRTIO_NET))) {
+		VHOST_LOG_DATA(ERR, "(%s) %s: built-in vhost net backend is disabled.\n",
+				dev->ifname, __func__);
+		return 0;
+	}
+
+	if (unlikely(!is_valid_virt_queue_idx(queue_id, 1, dev->nr_vring))) {
+		VHOST_LOG_DATA(ERR, "(%s) %s: invalid virtqueue idx %d.\n",
+				dev->ifname, __func__, queue_id);
+		return 0;
+	}
+
+	if (unlikely(dma_id >= RTE_DMADEV_DEFAULT_MAX)) {
+		VHOST_LOG_DATA(ERR, "(%s) %s: invalid dma id %u.\n",
+				dev->ifname, __func__, dma_id);
+		return 0;
+	}
+
+	if (unlikely(!dma_copy_track[dma_id].vchans ||
+				!dma_copy_track[dma_id].vchans[vchan_id].pkts_cmpl_flag_addr)) {
+		VHOST_LOG_DATA(ERR, "(%s) %s: invalid channel %d:%u.\n", dev->ifname, __func__,
+				dma_id, vchan_id);
+		return 0;
+	}
+
+	vq = dev->virtqueue[queue_id];
+
+	if (unlikely(rte_spinlock_trylock(&vq->access_lock) == 0))
+		return 0;
+
+	if (unlikely(vq->enabled == 0)) {
+		count = 0;
+		goto out_access_unlock;
+	}
+
+	if (unlikely(!vq->async)) {
+		VHOST_LOG_DATA(ERR, "(%s) %s: async not registered for queue id %d.\n",
+				dev->ifname, __func__, queue_id);
+		count = 0;
+		goto out_access_unlock;
+	}
+
+	if (dev->features & (1ULL << VIRTIO_F_IOMMU_PLATFORM))
+		vhost_user_iotlb_rd_lock(vq);
+
+	if (unlikely(vq->access_ok == 0))
+		if (unlikely(vring_translate(dev, vq) < 0)) {
+			count = 0;
+			goto out;
+		}
+
+	/*
+	 * Construct a RARP broadcast packet and inject it into the "pkts"
+	 * array, so it looks like the guest actually sent such a packet.
+	 *
+	 * Check user_send_rarp() for more information.
+	 *
+	 * broadcast_rarp shares a cacheline in the virtio_net structure
+	 * with some fields that are accessed during enqueue and
+	 * __atomic_compare_exchange_n causes a write if performed compare
+	 * and exchange. This could result in false sharing between enqueue
+	 * and dequeue.
+	 *
+	 * Prevent unnecessary false sharing by reading broadcast_rarp first
+	 * and only performing compare and exchange if the read indicates it
+	 * is likely to be set.
+	 */
+	if (unlikely(__atomic_load_n(&dev->broadcast_rarp, __ATOMIC_ACQUIRE) &&
+			__atomic_compare_exchange_n(&dev->broadcast_rarp,
+			&success, 0, 0, __ATOMIC_RELEASE, __ATOMIC_RELAXED))) {
+
+		rarp_mbuf = rte_net_make_rarp_packet(mbuf_pool, &dev->mac);
+		if (rarp_mbuf == NULL) {
+			VHOST_LOG_DATA(ERR, "Failed to make RARP packet.\n");
+			count = 0;
+			goto out;
+		}
+		/*
+		 * Inject it to the head of "pkts" array, so that switch's mac
+		 * learning table will get updated first.
+		 */
+		pkts[0] = rarp_mbuf;
+		pkts++;
+		count -= 1;
+	}
+
+	if (unlikely(vq_is_packed(dev))) {
+		static bool not_support_pack_log;
+		if (!not_support_pack_log) {
+			VHOST_LOG_DATA(ERR,
+				"(%s) %s: async dequeue does not support packed ring.\n",
+				dev->ifname, __func__);
+			not_support_pack_log = true;
+		}
+		count = 0;
+		goto out;
+	}
+
+	if (dev->flags & VIRTIO_DEV_LEGACY_OL_FLAGS)
+		count = virtio_dev_tx_async_split_legacy(dev, vq, mbuf_pool, pkts,
+							 count, dma_id, vchan_id);
+	else
+		count = virtio_dev_tx_async_split_compliant(dev, vq, mbuf_pool, pkts,
+							    count, dma_id, vchan_id);
+
+	*nr_inflight = vq->async->pkts_inflight_n;
+
+out:
+	if (dev->features & (1ULL << VIRTIO_F_IOMMU_PLATFORM))
+		vhost_user_iotlb_rd_unlock(vq);
+
+out_access_unlock:
+	rte_spinlock_unlock(&vq->access_lock);
+
+	if (unlikely(rarp_mbuf != NULL))
+		count += 1;
+
+	return count;
+}
-- 
2.17.1


^ permalink raw reply	[flat|nested] 73+ messages in thread

* [PATCH v6 5/5] examples/vhost: support async dequeue data path
  2022-05-13  2:50 ` [PATCH v6 0/5] vhost: " xuan.ding
                     ` (3 preceding siblings ...)
  2022-05-13  2:50   ` [PATCH v6 4/5] vhost: support async dequeue for split ring xuan.ding
@ 2022-05-13  2:50   ` xuan.ding
  2022-05-13  3:27     ` Xia, Chenbo
  4 siblings, 1 reply; 73+ messages in thread
From: xuan.ding @ 2022-05-13  2:50 UTC (permalink / raw)
  To: maxime.coquelin, chenbo.xia
  Cc: dev, jiayu.hu, cheng1.jiang, sunil.pai.g, liangma, Xuan Ding,
	Wenwu Ma, Yuan Wang

From: Xuan Ding <xuan.ding@intel.com>

This patch adds a use case for the async dequeue API. The vswitch can
leverage DMA devices to accelerate the vhost async dequeue path.
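
For reference only, a hypothetical invocation of the sample could look like
the line below; cores, port mask, socket path and DMA addresses are
placeholders:

        ./dpdk-vhost -l 2-4 -n 4 -- -p 0x1 --socket-file /tmp/sock0 \
                --dmas [txd0@00:04.0,rxd0@00:04.1]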

Signed-off-by: Wenwu Ma <wenwux.ma@intel.com>
Signed-off-by: Yuan Wang <yuanx.wang@intel.com>
Signed-off-by: Xuan Ding <xuan.ding@intel.com>
Tested-by: Yvonne Yang <yvonnex.yang@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
---
 doc/guides/sample_app_ug/vhost.rst |   9 +-
 examples/vhost/main.c              | 284 ++++++++++++++++++++---------
 examples/vhost/main.h              |  32 +++-
 examples/vhost/virtio_net.c        |  16 +-
 4 files changed, 243 insertions(+), 98 deletions(-)

diff --git a/doc/guides/sample_app_ug/vhost.rst b/doc/guides/sample_app_ug/vhost.rst
index a6ce4bc8ac..09db965e70 100644
--- a/doc/guides/sample_app_ug/vhost.rst
+++ b/doc/guides/sample_app_ug/vhost.rst
@@ -169,9 +169,12 @@ demonstrates how to use the async vhost APIs. It's used in combination with dmas
 **--dmas**
 This parameter is used to specify the assigned DMA device of a vhost device.
 Async vhost-user net driver will be used if --dmas is set. For example
---dmas [txd0@00:04.0,txd1@00:04.1] means use DMA channel 00:04.0 for vhost
-device 0 enqueue operation and use DMA channel 00:04.1 for vhost device 1
-enqueue operation.
+--dmas [txd0@00:04.0,txd1@00:04.1,rxd0@00:04.2,rxd1@00:04.3] means use
+DMA channel 00:04.0/00:04.2 for vhost device 0 enqueue/dequeue operation
+and use DMA channel 00:04.1/00:04.3 for vhost device 1 enqueue/dequeue
+operation. The index of the device corresponds to the socket file in order;
+that is, vhost device 0 is created through the first socket file, vhost
+device 1 is created through the second socket file, and so on.
 
 Common Issues
 -------------
diff --git a/examples/vhost/main.c b/examples/vhost/main.c
index c4d46de1c5..d070391727 100644
--- a/examples/vhost/main.c
+++ b/examples/vhost/main.c
@@ -63,6 +63,9 @@
 
 #define DMA_RING_SIZE 4096
 
+#define ASYNC_ENQUEUE_VHOST 1
+#define ASYNC_DEQUEUE_VHOST 2
+
 /* number of mbufs in all pools - if specified on command-line. */
 static int total_num_mbufs = NUM_MBUFS_DEFAULT;
 
@@ -116,6 +119,8 @@ static uint32_t burst_rx_retry_num = BURST_RX_RETRIES;
 static char *socket_files;
 static int nb_sockets;
 
+static struct vhost_queue_ops vdev_queue_ops[RTE_MAX_VHOST_DEVICE];
+
 /* empty VMDq configuration structure. Filled in programmatically */
 static struct rte_eth_conf vmdq_conf_default = {
 	.rxmode = {
@@ -205,6 +210,18 @@ struct vhost_bufftable *vhost_txbuff[RTE_MAX_LCORE * RTE_MAX_VHOST_DEVICE];
 #define MBUF_TABLE_DRAIN_TSC	((rte_get_tsc_hz() + US_PER_S - 1) \
 				 / US_PER_S * BURST_TX_DRAIN_US)
 
+static int vid2socketid[RTE_MAX_VHOST_DEVICE];
+
+static uint32_t get_async_flag_by_socketid(int socketid)
+{
+	return dma_bind[socketid].async_flag;
+}
+
+static void init_vid2socketid_array(int vid, int socketid)
+{
+	vid2socketid[vid] = socketid;
+}
+
 static inline bool
 is_dma_configured(int16_t dev_id)
 {
@@ -224,7 +241,7 @@ open_dma(const char *value)
 	char *addrs = input;
 	char *ptrs[2];
 	char *start, *end, *substr;
-	int64_t vid;
+	int64_t socketid, vring_id;
 
 	struct rte_dma_info info;
 	struct rte_dma_conf dev_config = { .nb_vchans = 1 };
@@ -262,7 +279,9 @@ open_dma(const char *value)
 
 	while (i < args_nr) {
 		char *arg_temp = dma_arg[i];
+		char *txd, *rxd;
 		uint8_t sub_nr;
+		int async_flag;
 
 		sub_nr = rte_strsplit(arg_temp, strlen(arg_temp), ptrs, 2, '@');
 		if (sub_nr != 2) {
@@ -270,14 +289,23 @@ open_dma(const char *value)
 			goto out;
 		}
 
-		start = strstr(ptrs[0], "txd");
-		if (start == NULL) {
+		txd = strstr(ptrs[0], "txd");
+		rxd = strstr(ptrs[0], "rxd");
+		if (txd) {
+			start = txd;
+			vring_id = VIRTIO_RXQ;
+			async_flag = ASYNC_ENQUEUE_VHOST;
+		} else if (rxd) {
+			start = rxd;
+			vring_id = VIRTIO_TXQ;
+			async_flag = ASYNC_DEQUEUE_VHOST;
+		} else {
 			ret = -1;
 			goto out;
 		}
 
 		start += 3;
-		vid = strtol(start, &end, 0);
+		socketid = strtol(start, &end, 0);
 		if (end == start) {
 			ret = -1;
 			goto out;
@@ -338,7 +366,8 @@ open_dma(const char *value)
 		dmas_id[dma_count++] = dev_id;
 
 done:
-		(dma_info + vid)->dmas[VIRTIO_RXQ].dev_id = dev_id;
+		(dma_info + socketid)->dmas[vring_id].dev_id = dev_id;
+		(dma_info + socketid)->async_flag |= async_flag;
 		i++;
 	}
 out:
@@ -990,7 +1019,7 @@ complete_async_pkts(struct vhost_dev *vdev)
 {
 	struct rte_mbuf *p_cpl[MAX_PKT_BURST];
 	uint16_t complete_count;
-	int16_t dma_id = dma_bind[vdev->vid].dmas[VIRTIO_RXQ].dev_id;
+	int16_t dma_id = dma_bind[vid2socketid[vdev->vid]].dmas[VIRTIO_RXQ].dev_id;
 
 	complete_count = rte_vhost_poll_enqueue_completed(vdev->vid,
 					VIRTIO_RXQ, p_cpl, MAX_PKT_BURST, dma_id, 0);
@@ -1029,22 +1058,7 @@ drain_vhost(struct vhost_dev *vdev)
 	uint16_t nr_xmit = vhost_txbuff[buff_idx]->len;
 	struct rte_mbuf **m = vhost_txbuff[buff_idx]->m_table;
 
-	if (builtin_net_driver) {
-		ret = vs_enqueue_pkts(vdev, VIRTIO_RXQ, m, nr_xmit);
-	} else if (dma_bind[vdev->vid].dmas[VIRTIO_RXQ].async_enabled) {
-		uint16_t enqueue_fail = 0;
-		int16_t dma_id = dma_bind[vdev->vid].dmas[VIRTIO_RXQ].dev_id;
-
-		complete_async_pkts(vdev);
-		ret = rte_vhost_submit_enqueue_burst(vdev->vid, VIRTIO_RXQ, m, nr_xmit, dma_id, 0);
-
-		enqueue_fail = nr_xmit - ret;
-		if (enqueue_fail)
-			free_pkts(&m[ret], nr_xmit - ret);
-	} else {
-		ret = rte_vhost_enqueue_burst(vdev->vid, VIRTIO_RXQ,
-						m, nr_xmit);
-	}
+	ret = vdev_queue_ops[vdev->vid].enqueue_pkt_burst(vdev, VIRTIO_RXQ, m, nr_xmit);
 
 	if (enable_stats) {
 		__atomic_add_fetch(&vdev->stats.rx_total_atomic, nr_xmit,
@@ -1053,7 +1067,7 @@ drain_vhost(struct vhost_dev *vdev)
 				__ATOMIC_SEQ_CST);
 	}
 
-	if (!dma_bind[vdev->vid].dmas[VIRTIO_RXQ].async_enabled)
+	if (!dma_bind[vid2socketid[vdev->vid]].dmas[VIRTIO_RXQ].async_enabled)
 		free_pkts(m, nr_xmit);
 }
 
@@ -1325,6 +1339,32 @@ drain_mbuf_table(struct mbuf_table *tx_q)
 	}
 }
 
+uint16_t
+async_enqueue_pkts(struct vhost_dev *dev, uint16_t queue_id,
+		struct rte_mbuf **pkts, uint32_t rx_count)
+{
+	uint16_t enqueue_count;
+	uint16_t enqueue_fail = 0;
+	uint16_t dma_id = dma_bind[vid2socketid[dev->vid]].dmas[VIRTIO_RXQ].dev_id;
+
+	complete_async_pkts(dev);
+	enqueue_count = rte_vhost_submit_enqueue_burst(dev->vid, queue_id,
+					pkts, rx_count, dma_id, 0);
+
+	enqueue_fail = rx_count - enqueue_count;
+	if (enqueue_fail)
+		free_pkts(&pkts[enqueue_count], enqueue_fail);
+
+	return enqueue_count;
+}
+
+uint16_t
+sync_enqueue_pkts(struct vhost_dev *dev, uint16_t queue_id,
+		struct rte_mbuf **pkts, uint32_t rx_count)
+{
+	return rte_vhost_enqueue_burst(dev->vid, queue_id, pkts, rx_count);
+}
+
 static __rte_always_inline void
 drain_eth_rx(struct vhost_dev *vdev)
 {
@@ -1355,25 +1395,8 @@ drain_eth_rx(struct vhost_dev *vdev)
 		}
 	}
 
-	if (builtin_net_driver) {
-		enqueue_count = vs_enqueue_pkts(vdev, VIRTIO_RXQ,
-						pkts, rx_count);
-	} else if (dma_bind[vdev->vid].dmas[VIRTIO_RXQ].async_enabled) {
-		uint16_t enqueue_fail = 0;
-		int16_t dma_id = dma_bind[vdev->vid].dmas[VIRTIO_RXQ].dev_id;
-
-		complete_async_pkts(vdev);
-		enqueue_count = rte_vhost_submit_enqueue_burst(vdev->vid,
-					VIRTIO_RXQ, pkts, rx_count, dma_id, 0);
-
-		enqueue_fail = rx_count - enqueue_count;
-		if (enqueue_fail)
-			free_pkts(&pkts[enqueue_count], enqueue_fail);
-
-	} else {
-		enqueue_count = rte_vhost_enqueue_burst(vdev->vid, VIRTIO_RXQ,
-						pkts, rx_count);
-	}
+	enqueue_count = vdev_queue_ops[vdev->vid].enqueue_pkt_burst(vdev,
+					VIRTIO_RXQ, pkts, rx_count);
 
 	if (enable_stats) {
 		__atomic_add_fetch(&vdev->stats.rx_total_atomic, rx_count,
@@ -1382,10 +1405,31 @@ drain_eth_rx(struct vhost_dev *vdev)
 				__ATOMIC_SEQ_CST);
 	}
 
-	if (!dma_bind[vdev->vid].dmas[VIRTIO_RXQ].async_enabled)
+	if (!dma_bind[vid2socketid[vdev->vid]].dmas[VIRTIO_RXQ].async_enabled)
 		free_pkts(pkts, rx_count);
 }
 
+uint16_t async_dequeue_pkts(struct vhost_dev *dev, uint16_t queue_id,
+			    struct rte_mempool *mbuf_pool,
+			    struct rte_mbuf **pkts, uint16_t count)
+{
+	int nr_inflight;
+	uint16_t dequeue_count;
+	uint16_t dma_id = dma_bind[vid2socketid[dev->vid]].dmas[VIRTIO_TXQ].dev_id;
+
+	dequeue_count = rte_vhost_async_try_dequeue_burst(dev->vid, queue_id,
+			mbuf_pool, pkts, count, &nr_inflight, dma_id, 0);
+
+	return dequeue_count;
+}
+
+uint16_t sync_dequeue_pkts(struct vhost_dev *dev, uint16_t queue_id,
+			   struct rte_mempool *mbuf_pool,
+			   struct rte_mbuf **pkts, uint16_t count)
+{
+	return rte_vhost_dequeue_burst(dev->vid, queue_id, mbuf_pool, pkts, count);
+}
+
 static __rte_always_inline void
 drain_virtio_tx(struct vhost_dev *vdev)
 {
@@ -1393,13 +1437,8 @@ drain_virtio_tx(struct vhost_dev *vdev)
 	uint16_t count;
 	uint16_t i;
 
-	if (builtin_net_driver) {
-		count = vs_dequeue_pkts(vdev, VIRTIO_TXQ, mbuf_pool,
-					pkts, MAX_PKT_BURST);
-	} else {
-		count = rte_vhost_dequeue_burst(vdev->vid, VIRTIO_TXQ,
-					mbuf_pool, pkts, MAX_PKT_BURST);
-	}
+	count = vdev_queue_ops[vdev->vid].dequeue_pkt_burst(vdev,
+				VIRTIO_TXQ, mbuf_pool, pkts, MAX_PKT_BURST);
 
 	/* setup VMDq for the first packet */
 	if (unlikely(vdev->ready == DEVICE_MAC_LEARNING) && count) {
@@ -1478,6 +1517,26 @@ switch_worker(void *arg __rte_unused)
 	return 0;
 }
 
+static void
+vhost_clear_queue_thread_unsafe(struct vhost_dev *vdev, uint16_t queue_id)
+{
+	uint16_t n_pkt = 0;
+	int pkts_inflight;
+
+	int16_t dma_id = dma_bind[vid2socketid[vdev->vid]].dmas[queue_id].dev_id;
+	pkts_inflight = rte_vhost_async_get_inflight_thread_unsafe(vdev->vid, queue_id);
+
+	struct rte_mbuf *m_cpl[pkts_inflight];
+
+	while (pkts_inflight) {
+		n_pkt = rte_vhost_clear_queue_thread_unsafe(vdev->vid, queue_id, m_cpl,
+							pkts_inflight, dma_id, 0);
+		free_pkts(m_cpl, n_pkt);
+		pkts_inflight = rte_vhost_async_get_inflight_thread_unsafe(vdev->vid,
+									queue_id);
+	}
+}
+
 /*
  * Remove a device from the specific data core linked list and from the
  * main linked list. Synchronization  occurs through the use of the
@@ -1535,27 +1594,79 @@ destroy_device(int vid)
 		vdev->vid);
 
 	if (dma_bind[vid].dmas[VIRTIO_RXQ].async_enabled) {
-		uint16_t n_pkt = 0;
-		int pkts_inflight;
-		int16_t dma_id = dma_bind[vid].dmas[VIRTIO_RXQ].dev_id;
-		pkts_inflight = rte_vhost_async_get_inflight_thread_unsafe(vid, VIRTIO_RXQ);
-		struct rte_mbuf *m_cpl[pkts_inflight];
-
-		while (pkts_inflight) {
-			n_pkt = rte_vhost_clear_queue_thread_unsafe(vid, VIRTIO_RXQ,
-						m_cpl, pkts_inflight, dma_id, 0);
-			free_pkts(m_cpl, n_pkt);
-			pkts_inflight = rte_vhost_async_get_inflight_thread_unsafe(vid,
-										VIRTIO_RXQ);
-		}
-
+		vhost_clear_queue_thread_unsafe(vdev, VIRTIO_RXQ);
 		rte_vhost_async_channel_unregister(vid, VIRTIO_RXQ);
 		dma_bind[vid].dmas[VIRTIO_RXQ].async_enabled = false;
 	}
 
+	if (dma_bind[vid].dmas[VIRTIO_TXQ].async_enabled) {
+		vhost_clear_queue_thread_unsafe(vdev, VIRTIO_TXQ);
+		rte_vhost_async_channel_unregister(vid, VIRTIO_TXQ);
+		dma_bind[vid].dmas[VIRTIO_TXQ].async_enabled = false;
+	}
+
 	rte_free(vdev);
 }
 
+static int
+get_socketid_by_vid(int vid)
+{
+	int i;
+	char ifname[PATH_MAX];
+	rte_vhost_get_ifname(vid, ifname, sizeof(ifname));
+
+	for (i = 0; i < nb_sockets; i++) {
+		char *file = socket_files + i * PATH_MAX;
+		if (strcmp(file, ifname) == 0)
+			return i;
+	}
+
+	return -1;
+}
+
+static int
+init_vhost_queue_ops(int vid)
+{
+	if (builtin_net_driver) {
+		vdev_queue_ops[vid].enqueue_pkt_burst = builtin_enqueue_pkts;
+		vdev_queue_ops[vid].dequeue_pkt_burst = builtin_dequeue_pkts;
+	} else {
+		if (dma_bind[vid2socketid[vid]].dmas[VIRTIO_RXQ].async_enabled)
+			vdev_queue_ops[vid].enqueue_pkt_burst = async_enqueue_pkts;
+		else
+			vdev_queue_ops[vid].enqueue_pkt_burst = sync_enqueue_pkts;
+
+		if (dma_bind[vid2socketid[vid]].dmas[VIRTIO_TXQ].async_enabled)
+			vdev_queue_ops[vid].dequeue_pkt_burst = async_dequeue_pkts;
+		else
+			vdev_queue_ops[vid].dequeue_pkt_burst = sync_dequeue_pkts;
+	}
+
+	return 0;
+}
+
+static int
+vhost_async_channel_register(int vid)
+{
+	int rx_ret = 0, tx_ret = 0;
+
+	if (dma_bind[vid2socketid[vid]].dmas[VIRTIO_RXQ].dev_id != INVALID_DMA_ID) {
+		rx_ret = rte_vhost_async_channel_register(vid, VIRTIO_RXQ);
+		if (rx_ret == 0)
+			dma_bind[vid2socketid[vid]].dmas[VIRTIO_RXQ].async_enabled = true;
+	}
+
+	if (dma_bind[vid2socketid[vid]].dmas[VIRTIO_TXQ].dev_id != INVALID_DMA_ID) {
+		tx_ret = rte_vhost_async_channel_register(vid, VIRTIO_TXQ);
+		if (tx_ret == 0)
+			dma_bind[vid2socketid[vid]].dmas[VIRTIO_TXQ].async_enabled = true;
+	}
+
+	return rx_ret | tx_ret;
+}
+
+
+
 /*
  * A new device is added to a data core. First the device is added to the main linked list
  * and then allocated to a specific data core.
@@ -1567,6 +1678,8 @@ new_device(int vid)
 	uint16_t i;
 	uint32_t device_num_min = num_devices;
 	struct vhost_dev *vdev;
+	int ret;
+
 	vdev = rte_zmalloc("vhost device", sizeof(*vdev), RTE_CACHE_LINE_SIZE);
 	if (vdev == NULL) {
 		RTE_LOG(INFO, VHOST_DATA,
@@ -1589,6 +1702,17 @@ new_device(int vid)
 		}
 	}
 
+	int socketid = get_socketid_by_vid(vid);
+	if (socketid == -1)
+		return -1;
+
+	init_vid2socketid_array(vid, socketid);
+
+	ret = vhost_async_channel_register(vid);
+
+	if (init_vhost_queue_ops(vid) != 0)
+		return -1;
+
 	if (builtin_net_driver)
 		vs_vhost_net_setup(vdev);
 
@@ -1620,16 +1744,7 @@ new_device(int vid)
 		"(%d) device has been added to data core %d\n",
 		vid, vdev->coreid);
 
-	if (dma_bind[vid].dmas[VIRTIO_RXQ].dev_id != INVALID_DMA_ID) {
-		int ret;
-
-		ret = rte_vhost_async_channel_register(vid, VIRTIO_RXQ);
-		if (ret == 0)
-			dma_bind[vid].dmas[VIRTIO_RXQ].async_enabled = true;
-		return ret;
-	}
-
-	return 0;
+	return ret;
 }
 
 static int
@@ -1647,22 +1762,9 @@ vring_state_changed(int vid, uint16_t queue_id, int enable)
 	if (queue_id != VIRTIO_RXQ)
 		return 0;
 
-	if (dma_bind[vid].dmas[queue_id].async_enabled) {
-		if (!enable) {
-			uint16_t n_pkt = 0;
-			int pkts_inflight;
-			pkts_inflight = rte_vhost_async_get_inflight_thread_unsafe(vid, queue_id);
-			int16_t dma_id = dma_bind[vid].dmas[VIRTIO_RXQ].dev_id;
-			struct rte_mbuf *m_cpl[pkts_inflight];
-
-			while (pkts_inflight) {
-				n_pkt = rte_vhost_clear_queue_thread_unsafe(vid, queue_id,
-							m_cpl, pkts_inflight, dma_id, 0);
-				free_pkts(m_cpl, n_pkt);
-				pkts_inflight = rte_vhost_async_get_inflight_thread_unsafe(vid,
-											queue_id);
-			}
-		}
+	if (dma_bind[vid2socketid[vid]].dmas[queue_id].async_enabled) {
+		if (!enable)
+			vhost_clear_queue_thread_unsafe(vdev, queue_id);
 	}
 
 	return 0;
@@ -1887,7 +1989,7 @@ main(int argc, char *argv[])
 	for (i = 0; i < nb_sockets; i++) {
 		char *file = socket_files + i * PATH_MAX;
 
-		if (dma_count)
+		if (dma_count && get_async_flag_by_socketid(i) != 0)
 			flags = flags | RTE_VHOST_USER_ASYNC_COPY;
 
 		ret = rte_vhost_driver_register(file, flags);
diff --git a/examples/vhost/main.h b/examples/vhost/main.h
index e7f395c3c9..2fcb8376c5 100644
--- a/examples/vhost/main.h
+++ b/examples/vhost/main.h
@@ -61,6 +61,19 @@ struct vhost_dev {
 	struct vhost_queue queues[MAX_QUEUE_PAIRS * 2];
 } __rte_cache_aligned;
 
+typedef uint16_t (*vhost_enqueue_burst_t)(struct vhost_dev *dev,
+			uint16_t queue_id, struct rte_mbuf **pkts,
+			uint32_t count);
+
+typedef uint16_t (*vhost_dequeue_burst_t)(struct vhost_dev *dev,
+			uint16_t queue_id, struct rte_mempool *mbuf_pool,
+			struct rte_mbuf **pkts, uint16_t count);
+
+struct vhost_queue_ops {
+	vhost_enqueue_burst_t enqueue_pkt_burst;
+	vhost_dequeue_burst_t dequeue_pkt_burst;
+};
+
 TAILQ_HEAD(vhost_dev_tailq_list, vhost_dev);
 
 
@@ -87,6 +100,7 @@ struct dma_info {
 
 struct dma_for_vhost {
 	struct dma_info dmas[RTE_MAX_QUEUES_PER_PORT * 2];
+	uint32_t async_flag;
 };
 
 /* we implement non-extra virtio net features */
@@ -97,7 +111,19 @@ void vs_vhost_net_remove(struct vhost_dev *dev);
 uint16_t vs_enqueue_pkts(struct vhost_dev *dev, uint16_t queue_id,
 			 struct rte_mbuf **pkts, uint32_t count);
 
-uint16_t vs_dequeue_pkts(struct vhost_dev *dev, uint16_t queue_id,
-			 struct rte_mempool *mbuf_pool,
-			 struct rte_mbuf **pkts, uint16_t count);
+uint16_t builtin_enqueue_pkts(struct vhost_dev *dev, uint16_t queue_id,
+			struct rte_mbuf **pkts, uint32_t count);
+uint16_t builtin_dequeue_pkts(struct vhost_dev *dev, uint16_t queue_id,
+			struct rte_mempool *mbuf_pool,
+			struct rte_mbuf **pkts, uint16_t count);
+uint16_t sync_enqueue_pkts(struct vhost_dev *dev, uint16_t queue_id,
+			 struct rte_mbuf **pkts, uint32_t count);
+uint16_t sync_dequeue_pkts(struct vhost_dev *dev, uint16_t queue_id,
+			struct rte_mempool *mbuf_pool,
+			struct rte_mbuf **pkts, uint16_t count);
+uint16_t async_enqueue_pkts(struct vhost_dev *dev, uint16_t queue_id,
+			 struct rte_mbuf **pkts, uint32_t count);
+uint16_t async_dequeue_pkts(struct vhost_dev *dev, uint16_t queue_id,
+			struct rte_mempool *mbuf_pool,
+			struct rte_mbuf **pkts, uint16_t count);
 #endif /* _MAIN_H_ */
diff --git a/examples/vhost/virtio_net.c b/examples/vhost/virtio_net.c
index 9064fc3a82..2432a96566 100644
--- a/examples/vhost/virtio_net.c
+++ b/examples/vhost/virtio_net.c
@@ -238,6 +238,13 @@ vs_enqueue_pkts(struct vhost_dev *dev, uint16_t queue_id,
 	return count;
 }
 
+uint16_t
+builtin_enqueue_pkts(struct vhost_dev *dev, uint16_t queue_id,
+		struct rte_mbuf **pkts, uint32_t count)
+{
+	return vs_enqueue_pkts(dev, queue_id, pkts, count);
+}
+
 static __rte_always_inline int
 dequeue_pkt(struct vhost_dev *dev, struct rte_vhost_vring *vr,
 	    struct rte_mbuf *m, uint16_t desc_idx,
@@ -363,7 +370,7 @@ dequeue_pkt(struct vhost_dev *dev, struct rte_vhost_vring *vr,
 	return 0;
 }
 
-uint16_t
+static uint16_t
 vs_dequeue_pkts(struct vhost_dev *dev, uint16_t queue_id,
 	struct rte_mempool *mbuf_pool, struct rte_mbuf **pkts, uint16_t count)
 {
@@ -440,3 +447,10 @@ vs_dequeue_pkts(struct vhost_dev *dev, uint16_t queue_id,
 
 	return i;
 }
+
+uint16_t
+builtin_dequeue_pkts(struct vhost_dev *dev, uint16_t queue_id,
+	struct rte_mempool *mbuf_pool, struct rte_mbuf **pkts, uint16_t count)
+{
+	return vs_dequeue_pkts(dev, queue_id, mbuf_pool, pkts, count);
+}
-- 
2.17.1


^ permalink raw reply	[flat|nested] 73+ messages in thread

* RE: [PATCH v6 5/5] examples/vhost: support async dequeue data path
  2022-05-13  2:50   ` [PATCH v6 5/5] examples/vhost: support async dequeue data path xuan.ding
@ 2022-05-13  3:27     ` Xia, Chenbo
  2022-05-13  3:51       ` Ding, Xuan
  0 siblings, 1 reply; 73+ messages in thread
From: Xia, Chenbo @ 2022-05-13  3:27 UTC (permalink / raw)
  To: Ding, Xuan, maxime.coquelin
  Cc: dev, Hu, Jiayu, Jiang, Cheng1, Pai G, Sunil, liangma, Ma, WenwuX,
	Wang, YuanX

> -----Original Message-----
> From: Ding, Xuan <xuan.ding@intel.com>
> Sent: Friday, May 13, 2022 10:51 AM
> To: maxime.coquelin@redhat.com; Xia, Chenbo <chenbo.xia@intel.com>
> Cc: dev@dpdk.org; Hu, Jiayu <jiayu.hu@intel.com>; Jiang, Cheng1
> <cheng1.jiang@intel.com>; Pai G, Sunil <sunil.pai.g@intel.com>;
> liangma@liangbit.com; Ding, Xuan <xuan.ding@intel.com>; Ma, WenwuX
> <wenwux.ma@intel.com>; Wang, YuanX <yuanx.wang@intel.com>
> Subject: [PATCH v6 5/5] examples/vhost: support async dequeue data path
> 
> From: Xuan Ding <xuan.ding@intel.com>
> 
> This patch adds the use case for async dequeue API. Vswitch can
> leverage DMA device to accelerate vhost async dequeue path.
> 
> Signed-off-by: Wenwu Ma <wenwux.ma@intel.com>
> Signed-off-by: Yuan Wang <yuanx.wang@intel.com>
> Signed-off-by: Xuan Ding <xuan.ding@intel.com>
> Tested-by: Yvonne Yang <yvonnex.yang@intel.com>
> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
> ---
>  doc/guides/sample_app_ug/vhost.rst |   9 +-
>  examples/vhost/main.c              | 284 ++++++++++++++++++++---------
>  examples/vhost/main.h              |  32 +++-
>  examples/vhost/virtio_net.c        |  16 +-
>  4 files changed, 243 insertions(+), 98 deletions(-)
> 
> diff --git a/doc/guides/sample_app_ug/vhost.rst
> b/doc/guides/sample_app_ug/vhost.rst
> index a6ce4bc8ac..09db965e70 100644
> --- a/doc/guides/sample_app_ug/vhost.rst
> +++ b/doc/guides/sample_app_ug/vhost.rst
> @@ -169,9 +169,12 @@ demonstrates how to use the async vhost APIs. It's
> used in combination with dmas
>  **--dmas**
>  This parameter is used to specify the assigned DMA device of a vhost
> device.
>  Async vhost-user net driver will be used if --dmas is set. For example
> ---dmas [txd0@00:04.0,txd1@00:04.1] means use DMA channel 00:04.0 for
> vhost
> -device 0 enqueue operation and use DMA channel 00:04.1 for vhost device 1
> -enqueue operation.
> +--dmas [txd0@00:04.0,txd1@00:04.1,rxd0@00:04.2,rxd1@00:04.3] means use
> +DMA channel 00:04.0/00:04.2 for vhost device 0 enqueue/dequeue operation
> +and use DMA channel 00:04.1/00:04.3 for vhost device 1 enqueue/dequeue
> +operation. The index of the device corresponds to the socket file in
> order,
> +that means vhost device 0 is created through the first socket file, vhost
> +device 1 is created through the second socket file, and so on.
> 
>  Common Issues
>  -------------
> diff --git a/examples/vhost/main.c b/examples/vhost/main.c
> index c4d46de1c5..d070391727 100644
> --- a/examples/vhost/main.c
> +++ b/examples/vhost/main.c
> @@ -63,6 +63,9 @@
> 
>  #define DMA_RING_SIZE 4096
> 
> +#define ASYNC_ENQUEUE_VHOST 1
> +#define ASYNC_DEQUEUE_VHOST 2
> +
>  /* number of mbufs in all pools - if specified on command-line. */
>  static int total_num_mbufs = NUM_MBUFS_DEFAULT;
> 
> @@ -116,6 +119,8 @@ static uint32_t burst_rx_retry_num = BURST_RX_RETRIES;
>  static char *socket_files;
>  static int nb_sockets;
> 
> +static struct vhost_queue_ops vdev_queue_ops[RTE_MAX_VHOST_DEVICE];
> +
>  /* empty VMDq configuration structure. Filled in programmatically */
>  static struct rte_eth_conf vmdq_conf_default = {
>  	.rxmode = {
> @@ -205,6 +210,18 @@ struct vhost_bufftable *vhost_txbuff[RTE_MAX_LCORE *
> RTE_MAX_VHOST_DEVICE];
>  #define MBUF_TABLE_DRAIN_TSC	((rte_get_tsc_hz() + US_PER_S - 1) \
>  				 / US_PER_S * BURST_TX_DRAIN_US)
> 
> +static int vid2socketid[RTE_MAX_VHOST_DEVICE];
> +
> +static uint32_t get_async_flag_by_socketid(int socketid)
> +{
> +	return dma_bind[socketid].async_flag;
> +}
> +
> +static void init_vid2socketid_array(int vid, int socketid)
> +{
> +	vid2socketid[vid] = socketid;
> +}

Return value and func name should be on separate lines as per the coding style.
Also, the funcs above can be inline; same suggestion for the short funcs below,
especially the ones in the data path.
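
For example, a minimal sketch of the requested layout (a hypothetical rewrite
of the two short helpers quoted above, just for illustration):

static inline uint32_t
get_async_flag_by_socketid(int socketid)
{
	return dma_bind[socketid].async_flag;
}

static inline void
init_vid2socketid_array(int vid, int socketid)
{
	vid2socketid[vid] = socketid;
}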

Thanks,
Chenbo

> +
>  static inline bool
>  is_dma_configured(int16_t dev_id)
>  {
> @@ -224,7 +241,7 @@ open_dma(const char *value)
>  	char *addrs = input;
>  	char *ptrs[2];
>  	char *start, *end, *substr;
> -	int64_t vid;
> +	int64_t socketid, vring_id;
> 
>  	struct rte_dma_info info;
>  	struct rte_dma_conf dev_config = { .nb_vchans = 1 };
> @@ -262,7 +279,9 @@ open_dma(const char *value)
> 
>  	while (i < args_nr) {
>  		char *arg_temp = dma_arg[i];
> +		char *txd, *rxd;
>  		uint8_t sub_nr;
> +		int async_flag;
> 
>  		sub_nr = rte_strsplit(arg_temp, strlen(arg_temp), ptrs, 2,
> '@');
>  		if (sub_nr != 2) {
> @@ -270,14 +289,23 @@ open_dma(const char *value)
>  			goto out;
>  		}
> 
> -		start = strstr(ptrs[0], "txd");
> -		if (start == NULL) {
> +		txd = strstr(ptrs[0], "txd");
> +		rxd = strstr(ptrs[0], "rxd");
> +		if (txd) {
> +			start = txd;
> +			vring_id = VIRTIO_RXQ;
> +			async_flag = ASYNC_ENQUEUE_VHOST;
> +		} else if (rxd) {
> +			start = rxd;
> +			vring_id = VIRTIO_TXQ;
> +			async_flag = ASYNC_DEQUEUE_VHOST;
> +		} else {
>  			ret = -1;
>  			goto out;
>  		}
> 
>  		start += 3;
> -		vid = strtol(start, &end, 0);
> +		socketid = strtol(start, &end, 0);
>  		if (end == start) {
>  			ret = -1;
>  			goto out;
> @@ -338,7 +366,8 @@ open_dma(const char *value)
>  		dmas_id[dma_count++] = dev_id;
> 
>  done:
> -		(dma_info + vid)->dmas[VIRTIO_RXQ].dev_id = dev_id;
> +		(dma_info + socketid)->dmas[vring_id].dev_id = dev_id;
> +		(dma_info + socketid)->async_flag |= async_flag;
>  		i++;
>  	}
>  out:
> @@ -990,7 +1019,7 @@ complete_async_pkts(struct vhost_dev *vdev)
>  {
>  	struct rte_mbuf *p_cpl[MAX_PKT_BURST];
>  	uint16_t complete_count;
> -	int16_t dma_id = dma_bind[vdev->vid].dmas[VIRTIO_RXQ].dev_id;
> +	int16_t dma_id = dma_bind[vid2socketid[vdev-
> >vid]].dmas[VIRTIO_RXQ].dev_id;
> 
>  	complete_count = rte_vhost_poll_enqueue_completed(vdev->vid,
>  					VIRTIO_RXQ, p_cpl, MAX_PKT_BURST, dma_id, 0);
> @@ -1029,22 +1058,7 @@ drain_vhost(struct vhost_dev *vdev)
>  	uint16_t nr_xmit = vhost_txbuff[buff_idx]->len;
>  	struct rte_mbuf **m = vhost_txbuff[buff_idx]->m_table;
> 
> -	if (builtin_net_driver) {
> -		ret = vs_enqueue_pkts(vdev, VIRTIO_RXQ, m, nr_xmit);
> -	} else if (dma_bind[vdev->vid].dmas[VIRTIO_RXQ].async_enabled) {
> -		uint16_t enqueue_fail = 0;
> -		int16_t dma_id = dma_bind[vdev->vid].dmas[VIRTIO_RXQ].dev_id;
> -
> -		complete_async_pkts(vdev);
> -		ret = rte_vhost_submit_enqueue_burst(vdev->vid, VIRTIO_RXQ, m,
> nr_xmit, dma_id, 0);
> -
> -		enqueue_fail = nr_xmit - ret;
> -		if (enqueue_fail)
> -			free_pkts(&m[ret], nr_xmit - ret);
> -	} else {
> -		ret = rte_vhost_enqueue_burst(vdev->vid, VIRTIO_RXQ,
> -						m, nr_xmit);
> -	}
> +	ret = vdev_queue_ops[vdev->vid].enqueue_pkt_burst(vdev, VIRTIO_RXQ,
> m, nr_xmit);
> 
>  	if (enable_stats) {
>  		__atomic_add_fetch(&vdev->stats.rx_total_atomic, nr_xmit,
> @@ -1053,7 +1067,7 @@ drain_vhost(struct vhost_dev *vdev)
>  				__ATOMIC_SEQ_CST);
>  	}
> 
> -	if (!dma_bind[vdev->vid].dmas[VIRTIO_RXQ].async_enabled)
> +	if (!dma_bind[vid2socketid[vdev-
> >vid]].dmas[VIRTIO_RXQ].async_enabled)
>  		free_pkts(m, nr_xmit);
>  }
> 
> @@ -1325,6 +1339,32 @@ drain_mbuf_table(struct mbuf_table *tx_q)
>  	}
>  }
> 
> +uint16_t
> +async_enqueue_pkts(struct vhost_dev *dev, uint16_t queue_id,
> +		struct rte_mbuf **pkts, uint32_t rx_count)
> +{
> +	uint16_t enqueue_count;
> +	uint16_t enqueue_fail = 0;
> +	uint16_t dma_id = dma_bind[vid2socketid[dev-
> >vid]].dmas[VIRTIO_RXQ].dev_id;
> +
> +	complete_async_pkts(dev);
> +	enqueue_count = rte_vhost_submit_enqueue_burst(dev->vid, queue_id,
> +					pkts, rx_count, dma_id, 0);
> +
> +	enqueue_fail = rx_count - enqueue_count;
> +	if (enqueue_fail)
> +		free_pkts(&pkts[enqueue_count], enqueue_fail);
> +
> +	return enqueue_count;
> +}
> +
> +uint16_t
> +sync_enqueue_pkts(struct vhost_dev *dev, uint16_t queue_id,
> +		struct rte_mbuf **pkts, uint32_t rx_count)
> +{
> +	return rte_vhost_enqueue_burst(dev->vid, queue_id, pkts, rx_count);
> +}
> +
>  static __rte_always_inline void
>  drain_eth_rx(struct vhost_dev *vdev)
>  {
> @@ -1355,25 +1395,8 @@ drain_eth_rx(struct vhost_dev *vdev)
>  		}
>  	}
> 
> -	if (builtin_net_driver) {
> -		enqueue_count = vs_enqueue_pkts(vdev, VIRTIO_RXQ,
> -						pkts, rx_count);
> -	} else if (dma_bind[vdev->vid].dmas[VIRTIO_RXQ].async_enabled) {
> -		uint16_t enqueue_fail = 0;
> -		int16_t dma_id = dma_bind[vdev->vid].dmas[VIRTIO_RXQ].dev_id;
> -
> -		complete_async_pkts(vdev);
> -		enqueue_count = rte_vhost_submit_enqueue_burst(vdev->vid,
> -					VIRTIO_RXQ, pkts, rx_count, dma_id, 0);
> -
> -		enqueue_fail = rx_count - enqueue_count;
> -		if (enqueue_fail)
> -			free_pkts(&pkts[enqueue_count], enqueue_fail);
> -
> -	} else {
> -		enqueue_count = rte_vhost_enqueue_burst(vdev->vid, VIRTIO_RXQ,
> -						pkts, rx_count);
> -	}
> +	enqueue_count = vdev_queue_ops[vdev->vid].enqueue_pkt_burst(vdev,
> +					VIRTIO_RXQ, pkts, rx_count);
> 
>  	if (enable_stats) {
>  		__atomic_add_fetch(&vdev->stats.rx_total_atomic, rx_count,
> @@ -1382,10 +1405,31 @@ drain_eth_rx(struct vhost_dev *vdev)
>  				__ATOMIC_SEQ_CST);
>  	}
> 
> -	if (!dma_bind[vdev->vid].dmas[VIRTIO_RXQ].async_enabled)
> +	if (!dma_bind[vid2socketid[vdev-
> >vid]].dmas[VIRTIO_RXQ].async_enabled)
>  		free_pkts(pkts, rx_count);
>  }
> 
> +uint16_t async_dequeue_pkts(struct vhost_dev *dev, uint16_t queue_id,
> +			    struct rte_mempool *mbuf_pool,
> +			    struct rte_mbuf **pkts, uint16_t count)
> +{
> +	int nr_inflight;
> +	uint16_t dequeue_count;
> +	uint16_t dma_id = dma_bind[vid2socketid[dev-
> >vid]].dmas[VIRTIO_TXQ].dev_id;
> +
> +	dequeue_count = rte_vhost_async_try_dequeue_burst(dev->vid, queue_id,
> +			mbuf_pool, pkts, count, &nr_inflight, dma_id, 0);
> +
> +	return dequeue_count;
> +}
> +
> +uint16_t sync_dequeue_pkts(struct vhost_dev *dev, uint16_t queue_id,
> +			   struct rte_mempool *mbuf_pool,
> +			   struct rte_mbuf **pkts, uint16_t count)
> +{
> +	return rte_vhost_dequeue_burst(dev->vid, queue_id, mbuf_pool, pkts,
> count);
> +}
> +
>  static __rte_always_inline void
>  drain_virtio_tx(struct vhost_dev *vdev)
>  {
> @@ -1393,13 +1437,8 @@ drain_virtio_tx(struct vhost_dev *vdev)
>  	uint16_t count;
>  	uint16_t i;
> 
> -	if (builtin_net_driver) {
> -		count = vs_dequeue_pkts(vdev, VIRTIO_TXQ, mbuf_pool,
> -					pkts, MAX_PKT_BURST);
> -	} else {
> -		count = rte_vhost_dequeue_burst(vdev->vid, VIRTIO_TXQ,
> -					mbuf_pool, pkts, MAX_PKT_BURST);
> -	}
> +	count = vdev_queue_ops[vdev->vid].dequeue_pkt_burst(vdev,
> +				VIRTIO_TXQ, mbuf_pool, pkts, MAX_PKT_BURST);
> 
>  	/* setup VMDq for the first packet */
>  	if (unlikely(vdev->ready == DEVICE_MAC_LEARNING) && count) {
> @@ -1478,6 +1517,26 @@ switch_worker(void *arg __rte_unused)
>  	return 0;
>  }
> 
> +static void
> +vhost_clear_queue_thread_unsafe(struct vhost_dev *vdev, uint16_t queue_id)
> +{
> +	uint16_t n_pkt = 0;
> +	int pkts_inflight;
> +
> +	int16_t dma_id = dma_bind[vid2socketid[vdev-
> >vid]].dmas[queue_id].dev_id;
> +	pkts_inflight = rte_vhost_async_get_inflight_thread_unsafe(vdev->vid,
> queue_id);
> +
> +	struct rte_mbuf *m_cpl[pkts_inflight];
> +
> +	while (pkts_inflight) {
> +		n_pkt = rte_vhost_clear_queue_thread_unsafe(vdev->vid,
> queue_id, m_cpl,
> +							pkts_inflight, dma_id, 0);
> +		free_pkts(m_cpl, n_pkt);
> +		pkts_inflight =
> rte_vhost_async_get_inflight_thread_unsafe(vdev->vid,
> +									queue_id);
> +	}
> +}
> +
>  /*
>   * Remove a device from the specific data core linked list and from the
>   * main linked list. Synchronization  occurs through the use of the
> @@ -1535,27 +1594,79 @@ destroy_device(int vid)
>  		vdev->vid);
> 
>  	if (dma_bind[vid].dmas[VIRTIO_RXQ].async_enabled) {
> -		uint16_t n_pkt = 0;
> -		int pkts_inflight;
> -		int16_t dma_id = dma_bind[vid].dmas[VIRTIO_RXQ].dev_id;
> -		pkts_inflight = rte_vhost_async_get_inflight_thread_unsafe(vid,
> VIRTIO_RXQ);
> -		struct rte_mbuf *m_cpl[pkts_inflight];
> -
> -		while (pkts_inflight) {
> -			n_pkt = rte_vhost_clear_queue_thread_unsafe(vid,
> VIRTIO_RXQ,
> -						m_cpl, pkts_inflight, dma_id, 0);
> -			free_pkts(m_cpl, n_pkt);
> -			pkts_inflight =
> rte_vhost_async_get_inflight_thread_unsafe(vid,
> -										VIRTIO_RXQ);
> -		}
> -
> +		vhost_clear_queue_thread_unsafe(vdev, VIRTIO_RXQ);
>  		rte_vhost_async_channel_unregister(vid, VIRTIO_RXQ);
>  		dma_bind[vid].dmas[VIRTIO_RXQ].async_enabled = false;
>  	}
> 
> +	if (dma_bind[vid].dmas[VIRTIO_TXQ].async_enabled) {
> +		vhost_clear_queue_thread_unsafe(vdev, VIRTIO_TXQ);
> +		rte_vhost_async_channel_unregister(vid, VIRTIO_TXQ);
> +		dma_bind[vid].dmas[VIRTIO_TXQ].async_enabled = false;
> +	}
> +
>  	rte_free(vdev);
>  }
> 
> +static int
> +get_socketid_by_vid(int vid)
> +{
> +	int i;
> +	char ifname[PATH_MAX];
> +	rte_vhost_get_ifname(vid, ifname, sizeof(ifname));
> +
> +	for (i = 0; i < nb_sockets; i++) {
> +		char *file = socket_files + i * PATH_MAX;
> +		if (strcmp(file, ifname) == 0)
> +			return i;
> +	}
> +
> +	return -1;
> +}
> +
> +static int
> +init_vhost_queue_ops(int vid)
> +{
> +	if (builtin_net_driver) {
> +		vdev_queue_ops[vid].enqueue_pkt_burst = builtin_enqueue_pkts;
> +		vdev_queue_ops[vid].dequeue_pkt_burst = builtin_dequeue_pkts;
> +	} else {
> +		if (dma_bind[vid2socketid[vid]].dmas[VIRTIO_RXQ].async_enabled)
> +			vdev_queue_ops[vid].enqueue_pkt_burst =
> async_enqueue_pkts;
> +		else
> +			vdev_queue_ops[vid].enqueue_pkt_burst =
> sync_enqueue_pkts;
> +
> +		if (dma_bind[vid2socketid[vid]].dmas[VIRTIO_TXQ].async_enabled)
> +			vdev_queue_ops[vid].dequeue_pkt_burst =
> async_dequeue_pkts;
> +		else
> +			vdev_queue_ops[vid].dequeue_pkt_burst =
> sync_dequeue_pkts;
> +	}
> +
> +	return 0;
> +}
> +
> +static int
> +vhost_async_channel_register(int vid)
> +{
> +	int rx_ret = 0, tx_ret = 0;
> +
> +	if (dma_bind[vid2socketid[vid]].dmas[VIRTIO_RXQ].dev_id !=
> INVALID_DMA_ID) {
> +		rx_ret = rte_vhost_async_channel_register(vid, VIRTIO_RXQ);
> +		if (rx_ret == 0)
> +
> 	dma_bind[vid2socketid[vid]].dmas[VIRTIO_RXQ].async_enabled = true;
> +	}
> +
> +	if (dma_bind[vid2socketid[vid]].dmas[VIRTIO_TXQ].dev_id !=
> INVALID_DMA_ID) {
> +		tx_ret = rte_vhost_async_channel_register(vid, VIRTIO_TXQ);
> +		if (tx_ret == 0)
> +
> 	dma_bind[vid2socketid[vid]].dmas[VIRTIO_TXQ].async_enabled = true;
> +	}
> +
> +	return rx_ret | tx_ret;
> +}
> +
> +
> +
>  /*
>   * A new device is added to a data core. First the device is added to the
> main linked list
>   * and then allocated to a specific data core.
> @@ -1567,6 +1678,8 @@ new_device(int vid)
>  	uint16_t i;
>  	uint32_t device_num_min = num_devices;
>  	struct vhost_dev *vdev;
> +	int ret;
> +
>  	vdev = rte_zmalloc("vhost device", sizeof(*vdev),
> RTE_CACHE_LINE_SIZE);
>  	if (vdev == NULL) {
>  		RTE_LOG(INFO, VHOST_DATA,
> @@ -1589,6 +1702,17 @@ new_device(int vid)
>  		}
>  	}
> 
> +	int socketid = get_socketid_by_vid(vid);
> +	if (socketid == -1)
> +		return -1;
> +
> +	init_vid2socketid_array(vid, socketid);
> +
> +	ret =  vhost_async_channel_register(vid);
> +
> +	if (init_vhost_queue_ops(vid) != 0)
> +		return -1;
> +
>  	if (builtin_net_driver)
>  		vs_vhost_net_setup(vdev);
> 
> @@ -1620,16 +1744,7 @@ new_device(int vid)
>  		"(%d) device has been added to data core %d\n",
>  		vid, vdev->coreid);
> 
> -	if (dma_bind[vid].dmas[VIRTIO_RXQ].dev_id != INVALID_DMA_ID) {
> -		int ret;
> -
> -		ret = rte_vhost_async_channel_register(vid, VIRTIO_RXQ);
> -		if (ret == 0)
> -			dma_bind[vid].dmas[VIRTIO_RXQ].async_enabled = true;
> -		return ret;
> -	}
> -
> -	return 0;
> +	return ret;
>  }
> 
>  static int
> @@ -1647,22 +1762,9 @@ vring_state_changed(int vid, uint16_t queue_id, int
> enable)
>  	if (queue_id != VIRTIO_RXQ)
>  		return 0;
> 
> -	if (dma_bind[vid].dmas[queue_id].async_enabled) {
> -		if (!enable) {
> -			uint16_t n_pkt = 0;
> -			int pkts_inflight;
> -			pkts_inflight =
> rte_vhost_async_get_inflight_thread_unsafe(vid, queue_id);
> -			int16_t dma_id = dma_bind[vid].dmas[VIRTIO_RXQ].dev_id;
> -			struct rte_mbuf *m_cpl[pkts_inflight];
> -
> -			while (pkts_inflight) {
> -				n_pkt = rte_vhost_clear_queue_thread_unsafe(vid,
> queue_id,
> -							m_cpl, pkts_inflight, dma_id, 0);
> -				free_pkts(m_cpl, n_pkt);
> -				pkts_inflight =
> rte_vhost_async_get_inflight_thread_unsafe(vid,
> -
> 	queue_id);
> -			}
> -		}
> +	if (dma_bind[vid2socketid[vid]].dmas[queue_id].async_enabled) {
> +		if (!enable)
> +			vhost_clear_queue_thread_unsafe(vdev, queue_id);
>  	}
> 
>  	return 0;
> @@ -1887,7 +1989,7 @@ main(int argc, char *argv[])
>  	for (i = 0; i < nb_sockets; i++) {
>  		char *file = socket_files + i * PATH_MAX;
> 
> -		if (dma_count)
> +		if (dma_count && get_async_flag_by_socketid(i) != 0)
>  			flags = flags | RTE_VHOST_USER_ASYNC_COPY;
> 
>  		ret = rte_vhost_driver_register(file, flags);
> diff --git a/examples/vhost/main.h b/examples/vhost/main.h
> index e7f395c3c9..2fcb8376c5 100644
> --- a/examples/vhost/main.h
> +++ b/examples/vhost/main.h
> @@ -61,6 +61,19 @@ struct vhost_dev {
>  	struct vhost_queue queues[MAX_QUEUE_PAIRS * 2];
>  } __rte_cache_aligned;
> 
> +typedef uint16_t (*vhost_enqueue_burst_t)(struct vhost_dev *dev,
> +			uint16_t queue_id, struct rte_mbuf **pkts,
> +			uint32_t count);
> +
> +typedef uint16_t (*vhost_dequeue_burst_t)(struct vhost_dev *dev,
> +			uint16_t queue_id, struct rte_mempool *mbuf_pool,
> +			struct rte_mbuf **pkts, uint16_t count);
> +
> +struct vhost_queue_ops {
> +	vhost_enqueue_burst_t enqueue_pkt_burst;
> +	vhost_dequeue_burst_t dequeue_pkt_burst;
> +};
> +
>  TAILQ_HEAD(vhost_dev_tailq_list, vhost_dev);
> 
> 
> @@ -87,6 +100,7 @@ struct dma_info {
> 
>  struct dma_for_vhost {
>  	struct dma_info dmas[RTE_MAX_QUEUES_PER_PORT * 2];
> +	uint32_t async_flag;
>  };
> 
>  /* we implement non-extra virtio net features */
> @@ -97,7 +111,19 @@ void vs_vhost_net_remove(struct vhost_dev *dev);
>  uint16_t vs_enqueue_pkts(struct vhost_dev *dev, uint16_t queue_id,
>  			 struct rte_mbuf **pkts, uint32_t count);
> 
> -uint16_t vs_dequeue_pkts(struct vhost_dev *dev, uint16_t queue_id,
> -			 struct rte_mempool *mbuf_pool,
> -			 struct rte_mbuf **pkts, uint16_t count);
> +uint16_t builtin_enqueue_pkts(struct vhost_dev *dev, uint16_t queue_id,
> +			struct rte_mbuf **pkts, uint32_t count);
> +uint16_t builtin_dequeue_pkts(struct vhost_dev *dev, uint16_t queue_id,
> +			struct rte_mempool *mbuf_pool,
> +			struct rte_mbuf **pkts, uint16_t count);
> +uint16_t sync_enqueue_pkts(struct vhost_dev *dev, uint16_t queue_id,
> +			 struct rte_mbuf **pkts, uint32_t count);
> +uint16_t sync_dequeue_pkts(struct vhost_dev *dev, uint16_t queue_id,
> +			struct rte_mempool *mbuf_pool,
> +			struct rte_mbuf **pkts, uint16_t count);
> +uint16_t async_enqueue_pkts(struct vhost_dev *dev, uint16_t queue_id,
> +			 struct rte_mbuf **pkts, uint32_t count);
> +uint16_t async_dequeue_pkts(struct vhost_dev *dev, uint16_t queue_id,
> +			struct rte_mempool *mbuf_pool,
> +			struct rte_mbuf **pkts, uint16_t count);
>  #endif /* _MAIN_H_ */
> diff --git a/examples/vhost/virtio_net.c b/examples/vhost/virtio_net.c
> index 9064fc3a82..2432a96566 100644
> --- a/examples/vhost/virtio_net.c
> +++ b/examples/vhost/virtio_net.c
> @@ -238,6 +238,13 @@ vs_enqueue_pkts(struct vhost_dev *dev, uint16_t
> queue_id,
>  	return count;
>  }
> 
> +uint16_t
> +builtin_enqueue_pkts(struct vhost_dev *dev, uint16_t queue_id,
> +		struct rte_mbuf **pkts, uint32_t count)
> +{
> +	return vs_enqueue_pkts(dev, queue_id, pkts, count);
> +}
> +
>  static __rte_always_inline int
>  dequeue_pkt(struct vhost_dev *dev, struct rte_vhost_vring *vr,
>  	    struct rte_mbuf *m, uint16_t desc_idx,
> @@ -363,7 +370,7 @@ dequeue_pkt(struct vhost_dev *dev, struct
> rte_vhost_vring *vr,
>  	return 0;
>  }
> 
> -uint16_t
> +static uint16_t
>  vs_dequeue_pkts(struct vhost_dev *dev, uint16_t queue_id,
>  	struct rte_mempool *mbuf_pool, struct rte_mbuf **pkts, uint16_t
> count)
>  {
> @@ -440,3 +447,10 @@ vs_dequeue_pkts(struct vhost_dev *dev, uint16_t
> queue_id,
> 
>  	return i;
>  }
> +
> +uint16_t
> +builtin_dequeue_pkts(struct vhost_dev *dev, uint16_t queue_id,
> +	struct rte_mempool *mbuf_pool, struct rte_mbuf **pkts, uint16_t
> count)
> +{
> +	return vs_dequeue_pkts(dev, queue_id, mbuf_pool, pkts, count);
> +}
> --
> 2.17.1


^ permalink raw reply	[flat|nested] 73+ messages in thread

* RE: [PATCH v6 5/5] examples/vhost: support async dequeue data path
  2022-05-13  3:27     ` Xia, Chenbo
@ 2022-05-13  3:51       ` Ding, Xuan
  0 siblings, 0 replies; 73+ messages in thread
From: Ding, Xuan @ 2022-05-13  3:51 UTC (permalink / raw)
  To: Xia, Chenbo, maxime.coquelin
  Cc: dev, Hu, Jiayu, Jiang, Cheng1, Pai G, Sunil, liangma, Ma, WenwuX,
	Wang, YuanX



> -----Original Message-----
> From: Xia, Chenbo <chenbo.xia@intel.com>
> Sent: Friday, May 13, 2022 11:27 AM
> To: Ding, Xuan <xuan.ding@intel.com>; maxime.coquelin@redhat.com
> Cc: dev@dpdk.org; Hu, Jiayu <jiayu.hu@intel.com>; Jiang, Cheng1
> <cheng1.jiang@intel.com>; Pai G, Sunil <sunil.pai.g@intel.com>;
> liangma@liangbit.com; Ma, WenwuX <wenwux.ma@intel.com>; Wang,
> YuanX <yuanx.wang@intel.com>
> Subject: RE: [PATCH v6 5/5] examples/vhost: support async dequeue data
> path
> 
> > -----Original Message-----
> > From: Ding, Xuan <xuan.ding@intel.com>
> > Sent: Friday, May 13, 2022 10:51 AM
> > To: maxime.coquelin@redhat.com; Xia, Chenbo <chenbo.xia@intel.com>
> > Cc: dev@dpdk.org; Hu, Jiayu <jiayu.hu@intel.com>; Jiang, Cheng1
> > <cheng1.jiang@intel.com>; Pai G, Sunil <sunil.pai.g@intel.com>;
> > liangma@liangbit.com; Ding, Xuan <xuan.ding@intel.com>; Ma, WenwuX
> > <wenwux.ma@intel.com>; Wang, YuanX <yuanx.wang@intel.com>
> > Subject: [PATCH v6 5/5] examples/vhost: support async dequeue data
> > path
> >
> > From: Xuan Ding <xuan.ding@intel.com>
> >
> > This patch adds the use case for async dequeue API. Vswitch can
> > leverage DMA device to accelerate vhost async dequeue path.
> >
> > Signed-off-by: Wenwu Ma <wenwux.ma@intel.com>
> > Signed-off-by: Yuan Wang <yuanx.wang@intel.com>
> > Signed-off-by: Xuan Ding <xuan.ding@intel.com>
> > Tested-by: Yvonne Yang <yvonnex.yang@intel.com>
> > Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
> > ---
> >  doc/guides/sample_app_ug/vhost.rst |   9 +-
> >  examples/vhost/main.c              | 284 ++++++++++++++++++++---------
> >  examples/vhost/main.h              |  32 +++-
> >  examples/vhost/virtio_net.c        |  16 +-
> >  4 files changed, 243 insertions(+), 98 deletions(-)
> >
> > diff --git a/doc/guides/sample_app_ug/vhost.rst
> > b/doc/guides/sample_app_ug/vhost.rst
> > index a6ce4bc8ac..09db965e70 100644
> > --- a/doc/guides/sample_app_ug/vhost.rst
> > +++ b/doc/guides/sample_app_ug/vhost.rst
> > @@ -169,9 +169,12 @@ demonstrates how to use the async vhost APIs.
> > It's used in combination with dmas
> >  **--dmas**
> >  This parameter is used to specify the assigned DMA device of a vhost
> > device.
> >  Async vhost-user net driver will be used if --dmas is set. For
> > example ---dmas [txd0@00:04.0,txd1@00:04.1] means use DMA channel
> > 00:04.0 for vhost -device 0 enqueue operation and use DMA channel
> > 00:04.1 for vhost device 1 -enqueue operation.
> > +--dmas [txd0@00:04.0,txd1@00:04.1,rxd0@00:04.2,rxd1@00:04.3] means
> > +use DMA channel 00:04.0/00:04.2 for vhost device 0 enqueue/dequeue
> > +operation and use DMA channel 00:04.1/00:04.3 for vhost device 1
> > +enqueue/dequeue operation. The index of the device corresponds to the
> > +socket file in
> > order,
> > +that means vhost device 0 is created through the first socket file,
> > +vhost device 1 is created through the second socket file, and so on.
> >
> >  Common Issues
> >  -------------
> > diff --git a/examples/vhost/main.c b/examples/vhost/main.c index
> > c4d46de1c5..d070391727 100644
> > --- a/examples/vhost/main.c
> > +++ b/examples/vhost/main.c
> > @@ -63,6 +63,9 @@
> >
> >  #define DMA_RING_SIZE 4096
> >
> > +#define ASYNC_ENQUEUE_VHOST 1
> > +#define ASYNC_DEQUEUE_VHOST 2
> > +
> >  /* number of mbufs in all pools - if specified on command-line. */
> > static int total_num_mbufs = NUM_MBUFS_DEFAULT;
> >
> > @@ -116,6 +119,8 @@ static uint32_t burst_rx_retry_num =
> > BURST_RX_RETRIES;  static char *socket_files;  static int nb_sockets;
> >
> > +static struct vhost_queue_ops
> vdev_queue_ops[RTE_MAX_VHOST_DEVICE];
> > +
> >  /* empty VMDq configuration structure. Filled in programmatically */
> > static struct rte_eth_conf vmdq_conf_default = {
> >  	.rxmode = {
> > @@ -205,6 +210,18 @@ struct vhost_bufftable
> > *vhost_txbuff[RTE_MAX_LCORE * RTE_MAX_VHOST_DEVICE];
> >  #define MBUF_TABLE_DRAIN_TSC	((rte_get_tsc_hz() + US_PER_S - 1) \
> >  				 / US_PER_S * BURST_TX_DRAIN_US)
> >
> > +static int vid2socketid[RTE_MAX_VHOST_DEVICE];
> > +
> > +static uint32_t get_async_flag_by_socketid(int socketid) {
> > +	return dma_bind[socketid].async_flag; }
> > +
> > +static void init_vid2socketid_array(int vid, int socketid) {
> > +	vid2socketid[vid] = socketid;
> > +}
> 
> Return value and func name should be on separate lines as per coding style.
> And above func can be inline, same suggestion for short func below,
> especially ones in data path.

Thanks Chenbo, I will fix it in the next version.

Regards,
Xuan

> 
> Thanks,
> Chenbo
> 
> > +
> >  static inline bool
> >  is_dma_configured(int16_t dev_id)
> >  {
> > @@ -224,7 +241,7 @@ open_dma(const char *value)
> >  	char *addrs = input;
> >  	char *ptrs[2];
> >  	char *start, *end, *substr;
> > -	int64_t vid;
> > +	int64_t socketid, vring_id;
> >
> >  	struct rte_dma_info info;
> >  	struct rte_dma_conf dev_config = { .nb_vchans = 1 }; @@ -262,7
> > +279,9 @@ open_dma(const char *value)
> >
> >  	while (i < args_nr) {
> >  		char *arg_temp = dma_arg[i];
> > +		char *txd, *rxd;
> >  		uint8_t sub_nr;
> > +		int async_flag;
> >
> >  		sub_nr = rte_strsplit(arg_temp, strlen(arg_temp), ptrs, 2, '@');
> >  		if (sub_nr != 2) {
> > @@ -270,14 +289,23 @@ open_dma(const char *value)
> >  			goto out;
> >  		}
> >
> > -		start = strstr(ptrs[0], "txd");
> > -		if (start == NULL) {
> > +		txd = strstr(ptrs[0], "txd");
> > +		rxd = strstr(ptrs[0], "rxd");
> > +		if (txd) {
> > +			start = txd;
> > +			vring_id = VIRTIO_RXQ;
> > +			async_flag = ASYNC_ENQUEUE_VHOST;
> > +		} else if (rxd) {
> > +			start = rxd;
> > +			vring_id = VIRTIO_TXQ;
> > +			async_flag = ASYNC_DEQUEUE_VHOST;
> > +		} else {
> >  			ret = -1;
> >  			goto out;
> >  		}
> >
> >  		start += 3;
> > -		vid = strtol(start, &end, 0);
> > +		socketid = strtol(start, &end, 0);
> >  		if (end == start) {
> >  			ret = -1;
> >  			goto out;
> > @@ -338,7 +366,8 @@ open_dma(const char *value)
> >  		dmas_id[dma_count++] = dev_id;
> >
> >  done:
> > -		(dma_info + vid)->dmas[VIRTIO_RXQ].dev_id = dev_id;
> > +		(dma_info + socketid)->dmas[vring_id].dev_id = dev_id;
> > +		(dma_info + socketid)->async_flag |= async_flag;
> >  		i++;
> >  	}
> >  out:
> > @@ -990,7 +1019,7 @@ complete_async_pkts(struct vhost_dev *vdev)  {
> >  	struct rte_mbuf *p_cpl[MAX_PKT_BURST];
> >  	uint16_t complete_count;
> > -	int16_t dma_id = dma_bind[vdev->vid].dmas[VIRTIO_RXQ].dev_id;
> > +	int16_t dma_id = dma_bind[vid2socketid[vdev-
> > >vid]].dmas[VIRTIO_RXQ].dev_id;
> >
> >  	complete_count = rte_vhost_poll_enqueue_completed(vdev->vid,
> >  					VIRTIO_RXQ, p_cpl, MAX_PKT_BURST,
> dma_id, 0); @@ -1029,22
> > +1058,7 @@ drain_vhost(struct vhost_dev *vdev)
> >  	uint16_t nr_xmit = vhost_txbuff[buff_idx]->len;
> >  	struct rte_mbuf **m = vhost_txbuff[buff_idx]->m_table;
> >
> > -	if (builtin_net_driver) {
> > -		ret = vs_enqueue_pkts(vdev, VIRTIO_RXQ, m, nr_xmit);
> > -	} else if (dma_bind[vdev->vid].dmas[VIRTIO_RXQ].async_enabled) {
> > -		uint16_t enqueue_fail = 0;
> > -		int16_t dma_id = dma_bind[vdev-
> >vid].dmas[VIRTIO_RXQ].dev_id;
> > -
> > -		complete_async_pkts(vdev);
> > -		ret = rte_vhost_submit_enqueue_burst(vdev->vid,
> VIRTIO_RXQ, m,
> > nr_xmit, dma_id, 0);
> > -
> > -		enqueue_fail = nr_xmit - ret;
> > -		if (enqueue_fail)
> > -			free_pkts(&m[ret], nr_xmit - ret);
> > -	} else {
> > -		ret = rte_vhost_enqueue_burst(vdev->vid, VIRTIO_RXQ,
> > -						m, nr_xmit);
> > -	}
> > +	ret = vdev_queue_ops[vdev->vid].enqueue_pkt_burst(vdev,
> VIRTIO_RXQ,
> > m, nr_xmit);
> >
> >  	if (enable_stats) {
> >  		__atomic_add_fetch(&vdev->stats.rx_total_atomic, nr_xmit,
> @@
> > -1053,7 +1067,7 @@ drain_vhost(struct vhost_dev *vdev)
> >  				__ATOMIC_SEQ_CST);
> >  	}
> >
> > -	if (!dma_bind[vdev->vid].dmas[VIRTIO_RXQ].async_enabled)
> > +	if (!dma_bind[vid2socketid[vdev-
> > >vid]].dmas[VIRTIO_RXQ].async_enabled)
> >  		free_pkts(m, nr_xmit);
> >  }
> >
> > @@ -1325,6 +1339,32 @@ drain_mbuf_table(struct mbuf_table *tx_q)
> >  	}
> >  }
> >
> > +uint16_t
> > +async_enqueue_pkts(struct vhost_dev *dev, uint16_t queue_id,
> > +		struct rte_mbuf **pkts, uint32_t rx_count) {
> > +	uint16_t enqueue_count;
> > +	uint16_t enqueue_fail = 0;
> > +	uint16_t dma_id = dma_bind[vid2socketid[dev-
> > >vid]].dmas[VIRTIO_RXQ].dev_id;
> > +
> > +	complete_async_pkts(dev);
> > +	enqueue_count = rte_vhost_submit_enqueue_burst(dev->vid,
> queue_id,
> > +					pkts, rx_count, dma_id, 0);
> > +
> > +	enqueue_fail = rx_count - enqueue_count;
> > +	if (enqueue_fail)
> > +		free_pkts(&pkts[enqueue_count], enqueue_fail);
> > +
> > +	return enqueue_count;
> > +}
> > +
> > +uint16_t
> > +sync_enqueue_pkts(struct vhost_dev *dev, uint16_t queue_id,
> > +		struct rte_mbuf **pkts, uint32_t rx_count) {
> > +	return rte_vhost_enqueue_burst(dev->vid, queue_id, pkts, rx_count);
> > +}
> > +
> >  static __rte_always_inline void
> >  drain_eth_rx(struct vhost_dev *vdev)
> >  {
> > @@ -1355,25 +1395,8 @@ drain_eth_rx(struct vhost_dev *vdev)
> >  		}
> >  	}
> >
> > -	if (builtin_net_driver) {
> > -		enqueue_count = vs_enqueue_pkts(vdev, VIRTIO_RXQ,
> > -						pkts, rx_count);
> > -	} else if (dma_bind[vdev->vid].dmas[VIRTIO_RXQ].async_enabled) {
> > -		uint16_t enqueue_fail = 0;
> > -		int16_t dma_id = dma_bind[vdev-
> >vid].dmas[VIRTIO_RXQ].dev_id;
> > -
> > -		complete_async_pkts(vdev);
> > -		enqueue_count = rte_vhost_submit_enqueue_burst(vdev-
> >vid,
> > -					VIRTIO_RXQ, pkts, rx_count, dma_id,
> 0);
> > -
> > -		enqueue_fail = rx_count - enqueue_count;
> > -		if (enqueue_fail)
> > -			free_pkts(&pkts[enqueue_count], enqueue_fail);
> > -
> > -	} else {
> > -		enqueue_count = rte_vhost_enqueue_burst(vdev->vid,
> VIRTIO_RXQ,
> > -						pkts, rx_count);
> > -	}
> > +	enqueue_count = vdev_queue_ops[vdev-
> >vid].enqueue_pkt_burst(vdev,
> > +					VIRTIO_RXQ, pkts, rx_count);
> >
> >  	if (enable_stats) {
> >  		__atomic_add_fetch(&vdev->stats.rx_total_atomic, rx_count,
> @@
> > -1382,10 +1405,31 @@ drain_eth_rx(struct vhost_dev *vdev)
> >  				__ATOMIC_SEQ_CST);
> >  	}
> >
> > -	if (!dma_bind[vdev->vid].dmas[VIRTIO_RXQ].async_enabled)
> > +	if (!dma_bind[vid2socketid[vdev-
> > >vid]].dmas[VIRTIO_RXQ].async_enabled)
> >  		free_pkts(pkts, rx_count);
> >  }
> >
> > +uint16_t async_dequeue_pkts(struct vhost_dev *dev, uint16_t queue_id,
> > +			    struct rte_mempool *mbuf_pool,
> > +			    struct rte_mbuf **pkts, uint16_t count) {
> > +	int nr_inflight;
> > +	uint16_t dequeue_count;
> > +	uint16_t dma_id = dma_bind[vid2socketid[dev-
> > >vid]].dmas[VIRTIO_TXQ].dev_id;
> > +
> > +	dequeue_count = rte_vhost_async_try_dequeue_burst(dev->vid,
> queue_id,
> > +			mbuf_pool, pkts, count, &nr_inflight, dma_id, 0);
> > +
> > +	return dequeue_count;
> > +}
> > +
> > +uint16_t sync_dequeue_pkts(struct vhost_dev *dev, uint16_t queue_id,
> > +			   struct rte_mempool *mbuf_pool,
> > +			   struct rte_mbuf **pkts, uint16_t count) {
> > +	return rte_vhost_dequeue_burst(dev->vid, queue_id, mbuf_pool,
> pkts,
> > count);
> > +}
> > +
> >  static __rte_always_inline void
> >  drain_virtio_tx(struct vhost_dev *vdev)  { @@ -1393,13 +1437,8 @@
> > drain_virtio_tx(struct vhost_dev *vdev)
> >  	uint16_t count;
> >  	uint16_t i;
> >
> > -	if (builtin_net_driver) {
> > -		count = vs_dequeue_pkts(vdev, VIRTIO_TXQ, mbuf_pool,
> > -					pkts, MAX_PKT_BURST);
> > -	} else {
> > -		count = rte_vhost_dequeue_burst(vdev->vid, VIRTIO_TXQ,
> > -					mbuf_pool, pkts, MAX_PKT_BURST);
> > -	}
> > +	count = vdev_queue_ops[vdev->vid].dequeue_pkt_burst(vdev,
> > +				VIRTIO_TXQ, mbuf_pool, pkts,
> MAX_PKT_BURST);
> >
> >  	/* setup VMDq for the first packet */
> >  	if (unlikely(vdev->ready == DEVICE_MAC_LEARNING) && count)
> { @@
> > -1478,6 +1517,26 @@ switch_worker(void *arg __rte_unused)
> >  	return 0;
> >  }
> >
> > +static void
> > +vhost_clear_queue_thread_unsafe(struct vhost_dev *vdev, uint16_t
> > +queue_id) {
> > +	uint16_t n_pkt = 0;
> > +	int pkts_inflight;
> > +
> > +	int16_t dma_id = dma_bind[vid2socketid[vdev-
> > >vid]].dmas[queue_id].dev_id;
> > +	pkts_inflight =
> > +rte_vhost_async_get_inflight_thread_unsafe(vdev->vid,
> > queue_id);
> > +
> > +	struct rte_mbuf *m_cpl[pkts_inflight];
> > +
> > +	while (pkts_inflight) {
> > +		n_pkt = rte_vhost_clear_queue_thread_unsafe(vdev->vid,
> > queue_id, m_cpl,
> > +							pkts_inflight, dma_id,
> 0);
> > +		free_pkts(m_cpl, n_pkt);
> > +		pkts_inflight =
> > rte_vhost_async_get_inflight_thread_unsafe(vdev->vid,
> > +
> 	queue_id);
> > +	}
> > +}
> > +
> >  /*
> >   * Remove a device from the specific data core linked list and from the
> >   * main linked list. Synchronization  occurs through the use of the
> > @@ -1535,27 +1594,79 @@ destroy_device(int vid)
> >  		vdev->vid);
> >
> >  	if (dma_bind[vid].dmas[VIRTIO_RXQ].async_enabled) {
> > -		uint16_t n_pkt = 0;
> > -		int pkts_inflight;
> > -		int16_t dma_id = dma_bind[vid].dmas[VIRTIO_RXQ].dev_id;
> > -		pkts_inflight =
> rte_vhost_async_get_inflight_thread_unsafe(vid,
> > VIRTIO_RXQ);
> > -		struct rte_mbuf *m_cpl[pkts_inflight];
> > -
> > -		while (pkts_inflight) {
> > -			n_pkt = rte_vhost_clear_queue_thread_unsafe(vid,
> > VIRTIO_RXQ,
> > -						m_cpl, pkts_inflight, dma_id,
> 0);
> > -			free_pkts(m_cpl, n_pkt);
> > -			pkts_inflight =
> > rte_vhost_async_get_inflight_thread_unsafe(vid,
> > -
> 	VIRTIO_RXQ);
> > -		}
> > -
> > +		vhost_clear_queue_thread_unsafe(vdev, VIRTIO_RXQ);
> >  		rte_vhost_async_channel_unregister(vid, VIRTIO_RXQ);
> >  		dma_bind[vid].dmas[VIRTIO_RXQ].async_enabled = false;
> >  	}
> >
> > +	if (dma_bind[vid].dmas[VIRTIO_TXQ].async_enabled) {
> > +		vhost_clear_queue_thread_unsafe(vdev, VIRTIO_TXQ);
> > +		rte_vhost_async_channel_unregister(vid, VIRTIO_TXQ);
> > +		dma_bind[vid].dmas[VIRTIO_TXQ].async_enabled = false;
> > +	}
> > +
> >  	rte_free(vdev);
> >  }
> >
> > +static int
> > +get_socketid_by_vid(int vid)
> > +{
> > +	int i;
> > +	char ifname[PATH_MAX];
> > +	rte_vhost_get_ifname(vid, ifname, sizeof(ifname));
> > +
> > +	for (i = 0; i < nb_sockets; i++) {
> > +		char *file = socket_files + i * PATH_MAX;
> > +		if (strcmp(file, ifname) == 0)
> > +			return i;
> > +	}
> > +
> > +	return -1;
> > +}
> > +
> > +static int
> > +init_vhost_queue_ops(int vid)
> > +{
> > +	if (builtin_net_driver) {
> > +		vdev_queue_ops[vid].enqueue_pkt_burst =
> builtin_enqueue_pkts;
> > +		vdev_queue_ops[vid].dequeue_pkt_burst =
> builtin_dequeue_pkts;
> > +	} else {
> > +		if
> (dma_bind[vid2socketid[vid]].dmas[VIRTIO_RXQ].async_enabled)
> > +			vdev_queue_ops[vid].enqueue_pkt_burst =
> > async_enqueue_pkts;
> > +		else
> > +			vdev_queue_ops[vid].enqueue_pkt_burst =
> > sync_enqueue_pkts;
> > +
> > +		if
> (dma_bind[vid2socketid[vid]].dmas[VIRTIO_TXQ].async_enabled)
> > +			vdev_queue_ops[vid].dequeue_pkt_burst =
> > async_dequeue_pkts;
> > +		else
> > +			vdev_queue_ops[vid].dequeue_pkt_burst =
> > sync_dequeue_pkts;
> > +	}
> > +
> > +	return 0;
> > +}
> > +
> > +static int
> > +vhost_async_channel_register(int vid) {
> > +	int rx_ret = 0, tx_ret = 0;
> > +
> > +	if (dma_bind[vid2socketid[vid]].dmas[VIRTIO_RXQ].dev_id !=
> > INVALID_DMA_ID) {
> > +		rx_ret = rte_vhost_async_channel_register(vid, VIRTIO_RXQ);
> > +		if (rx_ret == 0)
> > +
> > 	dma_bind[vid2socketid[vid]].dmas[VIRTIO_RXQ].async_enabled =
> true;
> > +	}
> > +
> > +	if (dma_bind[vid2socketid[vid]].dmas[VIRTIO_TXQ].dev_id !=
> > INVALID_DMA_ID) {
> > +		tx_ret = rte_vhost_async_channel_register(vid, VIRTIO_TXQ);
> > +		if (tx_ret == 0)
> > +
> > 	dma_bind[vid2socketid[vid]].dmas[VIRTIO_TXQ].async_enabled =
> true;
> > +	}
> > +
> > +	return rx_ret | tx_ret;
> > +}
> > +
> > +
> > +
> >  /*
> >   * A new device is added to a data core. First the device is added to
> > the main linked list
> >   * and then allocated to a specific data core.
> > @@ -1567,6 +1678,8 @@ new_device(int vid)
> >  	uint16_t i;
> >  	uint32_t device_num_min = num_devices;
> >  	struct vhost_dev *vdev;
> > +	int ret;
> > +
> >  	vdev = rte_zmalloc("vhost device", sizeof(*vdev),
> > RTE_CACHE_LINE_SIZE);
> >  	if (vdev == NULL) {
> >  		RTE_LOG(INFO, VHOST_DATA,
> > @@ -1589,6 +1702,17 @@ new_device(int vid)
> >  		}
> >  	}
> >
> > +	int socketid = get_socketid_by_vid(vid);
> > +	if (socketid == -1)
> > +		return -1;
> > +
> > +	init_vid2socketid_array(vid, socketid);
> > +
> > +	ret =  vhost_async_channel_register(vid);
> > +
> > +	if (init_vhost_queue_ops(vid) != 0)
> > +		return -1;
> > +
> >  	if (builtin_net_driver)
> >  		vs_vhost_net_setup(vdev);
> >
> > @@ -1620,16 +1744,7 @@ new_device(int vid)
> >  		"(%d) device has been added to data core %d\n",
> >  		vid, vdev->coreid);
> >
> > -	if (dma_bind[vid].dmas[VIRTIO_RXQ].dev_id != INVALID_DMA_ID) {
> > -		int ret;
> > -
> > -		ret = rte_vhost_async_channel_register(vid, VIRTIO_RXQ);
> > -		if (ret == 0)
> > -			dma_bind[vid].dmas[VIRTIO_RXQ].async_enabled =
> true;
> > -		return ret;
> > -	}
> > -
> > -	return 0;
> > +	return ret;
> >  }
> >
> >  static int
> > @@ -1647,22 +1762,9 @@ vring_state_changed(int vid, uint16_t queue_id,
> > int
> > enable)
> >  	if (queue_id != VIRTIO_RXQ)
> >  		return 0;
> >
> > -	if (dma_bind[vid].dmas[queue_id].async_enabled) {
> > -		if (!enable) {
> > -			uint16_t n_pkt = 0;
> > -			int pkts_inflight;
> > -			pkts_inflight =
> > rte_vhost_async_get_inflight_thread_unsafe(vid, queue_id);
> > -			int16_t dma_id =
> dma_bind[vid].dmas[VIRTIO_RXQ].dev_id;
> > -			struct rte_mbuf *m_cpl[pkts_inflight];
> > -
> > -			while (pkts_inflight) {
> > -				n_pkt =
> rte_vhost_clear_queue_thread_unsafe(vid,
> > queue_id,
> > -							m_cpl, pkts_inflight,
> dma_id, 0);
> > -				free_pkts(m_cpl, n_pkt);
> > -				pkts_inflight =
> > rte_vhost_async_get_inflight_thread_unsafe(vid,
> > -
> > 	queue_id);
> > -			}
> > -		}
> > +	if (dma_bind[vid2socketid[vid]].dmas[queue_id].async_enabled) {
> > +		if (!enable)
> > +			vhost_clear_queue_thread_unsafe(vdev, queue_id);
> >  	}
> >
> >  	return 0;
> > @@ -1887,7 +1989,7 @@ main(int argc, char *argv[])
> >  	for (i = 0; i < nb_sockets; i++) {
> >  		char *file = socket_files + i * PATH_MAX;
> >
> > -		if (dma_count)
> > +		if (dma_count && get_async_flag_by_socketid(i) != 0)
> >  			flags = flags | RTE_VHOST_USER_ASYNC_COPY;
> >
> >  		ret = rte_vhost_driver_register(file, flags); diff --git
> > a/examples/vhost/main.h b/examples/vhost/main.h index
> > e7f395c3c9..2fcb8376c5 100644
> > --- a/examples/vhost/main.h
> > +++ b/examples/vhost/main.h
> > @@ -61,6 +61,19 @@ struct vhost_dev {
> >  	struct vhost_queue queues[MAX_QUEUE_PAIRS * 2];  }
> > __rte_cache_aligned;
> >
> > +typedef uint16_t (*vhost_enqueue_burst_t)(struct vhost_dev *dev,
> > +			uint16_t queue_id, struct rte_mbuf **pkts,
> > +			uint32_t count);
> > +
> > +typedef uint16_t (*vhost_dequeue_burst_t)(struct vhost_dev *dev,
> > +			uint16_t queue_id, struct rte_mempool *mbuf_pool,
> > +			struct rte_mbuf **pkts, uint16_t count);
> > +
> > +struct vhost_queue_ops {
> > +	vhost_enqueue_burst_t enqueue_pkt_burst;
> > +	vhost_dequeue_burst_t dequeue_pkt_burst; };
> > +
> >  TAILQ_HEAD(vhost_dev_tailq_list, vhost_dev);
> >
> >
> > @@ -87,6 +100,7 @@ struct dma_info {
> >
> >  struct dma_for_vhost {
> >  	struct dma_info dmas[RTE_MAX_QUEUES_PER_PORT * 2];
> > +	uint32_t async_flag;
> >  };
> >
> >  /* we implement non-extra virtio net features */ @@ -97,7 +111,19 @@
> > void vs_vhost_net_remove(struct vhost_dev *dev);  uint16_t
> > vs_enqueue_pkts(struct vhost_dev *dev, uint16_t queue_id,
> >  			 struct rte_mbuf **pkts, uint32_t count);
> >
> > -uint16_t vs_dequeue_pkts(struct vhost_dev *dev, uint16_t queue_id,
> > -			 struct rte_mempool *mbuf_pool,
> > -			 struct rte_mbuf **pkts, uint16_t count);
> > +uint16_t builtin_enqueue_pkts(struct vhost_dev *dev, uint16_t queue_id,
> > +			struct rte_mbuf **pkts, uint32_t count); uint16_t
> > +builtin_dequeue_pkts(struct vhost_dev *dev, uint16_t queue_id,
> > +			struct rte_mempool *mbuf_pool,
> > +			struct rte_mbuf **pkts, uint16_t count); uint16_t
> > +sync_enqueue_pkts(struct vhost_dev *dev, uint16_t queue_id,
> > +			 struct rte_mbuf **pkts, uint32_t count); uint16_t
> > +sync_dequeue_pkts(struct vhost_dev *dev, uint16_t queue_id,
> > +			struct rte_mempool *mbuf_pool,
> > +			struct rte_mbuf **pkts, uint16_t count); uint16_t
> > +async_enqueue_pkts(struct vhost_dev *dev, uint16_t queue_id,
> > +			 struct rte_mbuf **pkts, uint32_t count); uint16_t
> > +async_dequeue_pkts(struct vhost_dev *dev, uint16_t queue_id,
> > +			struct rte_mempool *mbuf_pool,
> > +			struct rte_mbuf **pkts, uint16_t count);
> >  #endif /* _MAIN_H_ */
> > diff --git a/examples/vhost/virtio_net.c b/examples/vhost/virtio_net.c
> > index 9064fc3a82..2432a96566 100644
> > --- a/examples/vhost/virtio_net.c
> > +++ b/examples/vhost/virtio_net.c
> > @@ -238,6 +238,13 @@ vs_enqueue_pkts(struct vhost_dev *dev, uint16_t
> > queue_id,
> >  	return count;
> >  }
> >
> > +uint16_t
> > +builtin_enqueue_pkts(struct vhost_dev *dev, uint16_t queue_id,
> > +		struct rte_mbuf **pkts, uint32_t count) {
> > +	return vs_enqueue_pkts(dev, queue_id, pkts, count); }
> > +
> >  static __rte_always_inline int
> >  dequeue_pkt(struct vhost_dev *dev, struct rte_vhost_vring *vr,
> >  	    struct rte_mbuf *m, uint16_t desc_idx, @@ -363,7 +370,7 @@
> > dequeue_pkt(struct vhost_dev *dev, struct rte_vhost_vring *vr,
> >  	return 0;
> >  }
> >
> > -uint16_t
> > +static uint16_t
> >  vs_dequeue_pkts(struct vhost_dev *dev, uint16_t queue_id,
> >  	struct rte_mempool *mbuf_pool, struct rte_mbuf **pkts, uint16_t
> > count)
> >  {
> > @@ -440,3 +447,10 @@ vs_dequeue_pkts(struct vhost_dev *dev, uint16_t
> > queue_id,
> >
> >  	return i;
> >  }
> > +
> > +uint16_t
> > +builtin_dequeue_pkts(struct vhost_dev *dev, uint16_t queue_id,
> > +	struct rte_mempool *mbuf_pool, struct rte_mbuf **pkts, uint16_t
> > count)
> > +{
> > +	return vs_dequeue_pkts(dev, queue_id, mbuf_pool, pkts, count); }
> > --
> > 2.17.1


^ permalink raw reply	[flat|nested] 73+ messages in thread

* [PATCH v7 0/5] vhost: support async dequeue data path
  2022-04-07 15:25 [PATCH v1 0/5] vhost: support async dequeue data path xuan.ding
                   ` (9 preceding siblings ...)
  2022-05-13  2:50 ` [PATCH v6 0/5] vhost: " xuan.ding
@ 2022-05-16  2:43 ` xuan.ding
  2022-05-16  2:43   ` [PATCH v7 1/5] vhost: prepare sync for descriptor to mbuf refactoring xuan.ding
                     ` (4 more replies)
  2022-05-16 11:10 ` [PATCH v8 0/5] vhost: " xuan.ding
  11 siblings, 5 replies; 73+ messages in thread
From: xuan.ding @ 2022-05-16  2:43 UTC (permalink / raw)
  To: maxime.coquelin, chenbo.xia
  Cc: dev, jiayu.hu, cheng1.jiang, sunil.pai.g, liangma, Xuan Ding

From: Xuan Ding <xuan.ding@intel.com>

The presence of asynchronous path allows applications to offload memory
copies to DMA engine, so as to save CPU cycles and improve the copy
performance. This patch set implements vhost async dequeue data path
for split ring. The code is based on latest enqueue changes [1].

This patch set is a new design and implementation of [2]. Since dmadev
was introduced in DPDK 21.11, to simplify application logics, this patch
integrates dmadev in vhost. With dmadev integrated, vhost supports M:N
mapping between vrings and DMA virtual channels. Specifically, one vring
can use multiple different DMA channels and one DMA channel can be
shared by multiple vrings at the same time.

A new asynchronous dequeue function is introduced:
        1) rte_vhost_async_try_dequeue_burst(int vid, uint16_t queue_id,
                struct rte_mempool *mbuf_pool, struct rte_mbuf **pkts,
                uint16_t count, int *nr_inflight,
                uint16_t dma_id, uint16_t vchan_id)

        Receive packets from the guest and offloads copies to DMA
virtual channel.
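
Below is a minimal usage sketch (illustrative only, not part of the patch set;
it assumes "vid" refers to a ready vhost device, "mbuf_pool" is an initialized
mbuf mempool, and vchan 0 of "dma_id" has already been configured for vhost
async use):

        #include <rte_mbuf.h>
        #include <rte_vhost.h>
        #include <rte_vhost_async.h>

        static uint16_t
        poll_guest_tx(int vid, struct rte_mempool *mbuf_pool, uint16_t dma_id,
                        struct rte_mbuf **pkts, uint16_t count)
        {
                int nr_inflight;

                /* Copies from the guest TX ring are offloaded to vchan 0 of
                 * dma_id; packets whose copies have completed are returned
                 * in pkts[].
                 */
                return rte_vhost_async_try_dequeue_burst(vid, VIRTIO_TXQ,
                                mbuf_pool, pkts, count, &nr_inflight,
                                dma_id, 0);
        }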

[1] https://mails.dpdk.org/archives/dev/2022-February/234555.html
[2] https://mails.dpdk.org/archives/dev/2021-September/218591.html

v6->v7:
* correct code formatting
* change some functions to inline

v5->v6:
* adjust EXPERIMENTAL header

v4->v5:
* rebase to latest DPDK
* add some checks

v3->v4:
* fix CI build warnings
* adjust some indentation
* pass vq instead of queue_id

v2->v3:
* fix mbuf not updated correctly for large packets

v1->v2:
* fix a typo
* fix a bug in desc_to_mbuf filling

RFC v3 -> v1:
* add sync and async path descriptor to mbuf refactoring
* add API description in docs

RFC v2 -> RFC v3:
* rebase to latest DPDK version

RFC v1 -> RFC v2:
* fix one bug in example
* rename vchan to vchan_id
* check if dma_id and vchan_id valid
* rework all the logs to new standard

Xuan Ding (5):
  vhost: prepare sync for descriptor to mbuf refactoring
  vhost: prepare async for descriptor to mbuf refactoring
  vhost: merge sync and async descriptor to mbuf filling
  vhost: support async dequeue for split ring
  examples/vhost: support async dequeue data path

 doc/guides/prog_guide/vhost_lib.rst    |   7 +
 doc/guides/rel_notes/release_22_07.rst |   5 +
 doc/guides/sample_app_ug/vhost.rst     |   9 +-
 examples/vhost/main.c                  | 286 ++++++++++-----
 examples/vhost/main.h                  |  32 +-
 examples/vhost/virtio_net.c            |  16 +-
 lib/vhost/rte_vhost_async.h            |  37 ++
 lib/vhost/version.map                  |   2 +-
 lib/vhost/vhost.h                      |   1 +
 lib/vhost/virtio_net.c                 | 473 ++++++++++++++++++++++---
 10 files changed, 716 insertions(+), 152 deletions(-)

-- 
2.17.1


^ permalink raw reply	[flat|nested] 73+ messages in thread

* [PATCH v7 1/5] vhost: prepare sync for descriptor to mbuf refactoring
  2022-05-16  2:43 ` [PATCH v7 0/5] vhost: " xuan.ding
@ 2022-05-16  2:43   ` xuan.ding
  2022-05-16  2:43   ` [PATCH v7 2/5] vhost: prepare async " xuan.ding
                     ` (3 subsequent siblings)
  4 siblings, 0 replies; 73+ messages in thread
From: xuan.ding @ 2022-05-16  2:43 UTC (permalink / raw)
  To: maxime.coquelin, chenbo.xia
  Cc: dev, jiayu.hu, cheng1.jiang, sunil.pai.g, liangma, Xuan Ding

From: Xuan Ding <xuan.ding@intel.com>

This patch extracts the descriptor-to-buffer filling from
copy_desc_to_mbuf() into a dedicated function. Besides, the enqueue
and dequeue paths are refactored to use the same function,
sync_fill_seg(), for preparing batch elements, which simplifies
the code without performance degradation.

Signed-off-by: Xuan Ding <xuan.ding@intel.com>
Tested-by: Yvonne Yang <yvonnex.yang@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
---
 lib/vhost/virtio_net.c | 78 ++++++++++++++++++++----------------------
 1 file changed, 38 insertions(+), 40 deletions(-)

diff --git a/lib/vhost/virtio_net.c b/lib/vhost/virtio_net.c
index 5f432b0d77..d4c94d2a9b 100644
--- a/lib/vhost/virtio_net.c
+++ b/lib/vhost/virtio_net.c
@@ -1030,23 +1030,36 @@ async_mbuf_to_desc_seg(struct virtio_net *dev, struct vhost_virtqueue *vq,
 }
 
 static __rte_always_inline void
-sync_mbuf_to_desc_seg(struct virtio_net *dev, struct vhost_virtqueue *vq,
+sync_fill_seg(struct virtio_net *dev, struct vhost_virtqueue *vq,
 		struct rte_mbuf *m, uint32_t mbuf_offset,
-		uint64_t buf_addr, uint64_t buf_iova, uint32_t cpy_len)
+		uint64_t buf_addr, uint64_t buf_iova, uint32_t cpy_len, bool to_desc)
 {
 	struct batch_copy_elem *batch_copy = vq->batch_copy_elems;
 
 	if (likely(cpy_len > MAX_BATCH_LEN || vq->batch_copy_nb_elems >= vq->size)) {
-		rte_memcpy((void *)((uintptr_t)(buf_addr)),
+		if (to_desc) {
+			rte_memcpy((void *)((uintptr_t)(buf_addr)),
 				rte_pktmbuf_mtod_offset(m, void *, mbuf_offset),
 				cpy_len);
+		} else {
+			rte_memcpy(rte_pktmbuf_mtod_offset(m, void *, mbuf_offset),
+				(void *)((uintptr_t)(buf_addr)),
+				cpy_len);
+		}
 		vhost_log_cache_write_iova(dev, vq, buf_iova, cpy_len);
 		PRINT_PACKET(dev, (uintptr_t)(buf_addr), cpy_len, 0);
 	} else {
-		batch_copy[vq->batch_copy_nb_elems].dst =
-			(void *)((uintptr_t)(buf_addr));
-		batch_copy[vq->batch_copy_nb_elems].src =
-			rte_pktmbuf_mtod_offset(m, void *, mbuf_offset);
+		if (to_desc) {
+			batch_copy[vq->batch_copy_nb_elems].dst =
+				(void *)((uintptr_t)(buf_addr));
+			batch_copy[vq->batch_copy_nb_elems].src =
+				rte_pktmbuf_mtod_offset(m, void *, mbuf_offset);
+		} else {
+			batch_copy[vq->batch_copy_nb_elems].dst =
+				rte_pktmbuf_mtod_offset(m, void *, mbuf_offset);
+			batch_copy[vq->batch_copy_nb_elems].src =
+				(void *)((uintptr_t)(buf_addr));
+		}
 		batch_copy[vq->batch_copy_nb_elems].log_addr = buf_iova;
 		batch_copy[vq->batch_copy_nb_elems].len = cpy_len;
 		vq->batch_copy_nb_elems++;
@@ -1158,9 +1171,9 @@ mbuf_to_desc(struct virtio_net *dev, struct vhost_virtqueue *vq,
 						buf_iova + buf_offset, cpy_len) < 0)
 				goto error;
 		} else {
-			sync_mbuf_to_desc_seg(dev, vq, m, mbuf_offset,
-					buf_addr + buf_offset,
-					buf_iova + buf_offset, cpy_len);
+			sync_fill_seg(dev, vq, m, mbuf_offset,
+				      buf_addr + buf_offset,
+				      buf_iova + buf_offset, cpy_len, true);
 		}
 
 		mbuf_avail  -= cpy_len;
@@ -2473,8 +2486,8 @@ copy_desc_to_mbuf(struct virtio_net *dev, struct vhost_virtqueue *vq,
 		  struct rte_mbuf *m, struct rte_mempool *mbuf_pool,
 		  bool legacy_ol_flags)
 {
-	uint32_t buf_avail, buf_offset;
-	uint64_t buf_addr, buf_len;
+	uint32_t buf_avail, buf_offset, buf_len;
+	uint64_t buf_addr, buf_iova;
 	uint32_t mbuf_avail, mbuf_offset;
 	uint32_t cpy_len;
 	struct rte_mbuf *cur = m, *prev = m;
@@ -2482,16 +2495,13 @@ copy_desc_to_mbuf(struct virtio_net *dev, struct vhost_virtqueue *vq,
 	struct virtio_net_hdr *hdr = NULL;
 	/* A counter to avoid desc dead loop chain */
 	uint16_t vec_idx = 0;
-	struct batch_copy_elem *batch_copy = vq->batch_copy_elems;
-	int error = 0;
 
 	buf_addr = buf_vec[vec_idx].buf_addr;
+	buf_iova = buf_vec[vec_idx].buf_iova;
 	buf_len = buf_vec[vec_idx].buf_len;
 
-	if (unlikely(buf_len < dev->vhost_hlen && nr_vec <= 1)) {
-		error = -1;
-		goto out;
-	}
+	if (unlikely(buf_len < dev->vhost_hlen && nr_vec <= 1))
+		return -1;
 
 	if (virtio_net_with_host_offload(dev)) {
 		if (unlikely(buf_len < sizeof(struct virtio_net_hdr))) {
@@ -2515,11 +2525,12 @@ copy_desc_to_mbuf(struct virtio_net *dev, struct vhost_virtqueue *vq,
 		buf_offset = dev->vhost_hlen - buf_len;
 		vec_idx++;
 		buf_addr = buf_vec[vec_idx].buf_addr;
+		buf_iova = buf_vec[vec_idx].buf_iova;
 		buf_len = buf_vec[vec_idx].buf_len;
 		buf_avail  = buf_len - buf_offset;
 	} else if (buf_len == dev->vhost_hlen) {
 		if (unlikely(++vec_idx >= nr_vec))
-			goto out;
+			goto error;
 		buf_addr = buf_vec[vec_idx].buf_addr;
 		buf_len = buf_vec[vec_idx].buf_len;
 
@@ -2539,22 +2550,9 @@ copy_desc_to_mbuf(struct virtio_net *dev, struct vhost_virtqueue *vq,
 	while (1) {
 		cpy_len = RTE_MIN(buf_avail, mbuf_avail);
 
-		if (likely(cpy_len > MAX_BATCH_LEN ||
-					vq->batch_copy_nb_elems >= vq->size ||
-					(hdr && cur == m))) {
-			rte_memcpy(rte_pktmbuf_mtod_offset(cur, void *,
-						mbuf_offset),
-					(void *)((uintptr_t)(buf_addr +
-							buf_offset)), cpy_len);
-		} else {
-			batch_copy[vq->batch_copy_nb_elems].dst =
-				rte_pktmbuf_mtod_offset(cur, void *,
-						mbuf_offset);
-			batch_copy[vq->batch_copy_nb_elems].src =
-				(void *)((uintptr_t)(buf_addr + buf_offset));
-			batch_copy[vq->batch_copy_nb_elems].len = cpy_len;
-			vq->batch_copy_nb_elems++;
-		}
+		sync_fill_seg(dev, vq, cur, mbuf_offset,
+			      buf_addr + buf_offset,
+			      buf_iova + buf_offset, cpy_len, false);
 
 		mbuf_avail  -= cpy_len;
 		mbuf_offset += cpy_len;
@@ -2567,6 +2565,7 @@ copy_desc_to_mbuf(struct virtio_net *dev, struct vhost_virtqueue *vq,
 				break;
 
 			buf_addr = buf_vec[vec_idx].buf_addr;
+			buf_iova = buf_vec[vec_idx].buf_iova;
 			buf_len = buf_vec[vec_idx].buf_len;
 
 			buf_offset = 0;
@@ -2585,8 +2584,7 @@ copy_desc_to_mbuf(struct virtio_net *dev, struct vhost_virtqueue *vq,
 			if (unlikely(cur == NULL)) {
 				VHOST_LOG_DATA(ERR, "(%s) failed to allocate memory for mbuf.\n",
 						dev->ifname);
-				error = -1;
-				goto out;
+				goto error;
 			}
 
 			prev->next = cur;
@@ -2606,9 +2604,9 @@ copy_desc_to_mbuf(struct virtio_net *dev, struct vhost_virtqueue *vq,
 	if (hdr)
 		vhost_dequeue_offload(dev, hdr, m, legacy_ol_flags);
 
-out:
-
-	return error;
+	return 0;
+error:
+	return -1;
 }
 
 static void
-- 
2.17.1


^ permalink raw reply	[flat|nested] 73+ messages in thread

* [PATCH v7 2/5] vhost: prepare async for descriptor to mbuf refactoring
  2022-05-16  2:43 ` [PATCH v7 0/5] vhost: " xuan.ding
  2022-05-16  2:43   ` [PATCH v7 1/5] vhost: prepare sync for descriptor to mbuf refactoring xuan.ding
@ 2022-05-16  2:43   ` xuan.ding
  2022-05-16  2:43   ` [PATCH v7 3/5] vhost: merge sync and async descriptor to mbuf filling xuan.ding
                     ` (2 subsequent siblings)
  4 siblings, 0 replies; 73+ messages in thread
From: xuan.ding @ 2022-05-16  2:43 UTC (permalink / raw)
  To: maxime.coquelin, chenbo.xia
  Cc: dev, jiayu.hu, cheng1.jiang, sunil.pai.g, liangma, Xuan Ding

From: Xuan Ding <xuan.ding@intel.com>

This patch refactors the vhost async enqueue and dequeue paths to use
the same function, async_fill_seg(), for preparing batch elements,
which simplifies the code without performance degradation.

Signed-off-by: Xuan Ding <xuan.ding@intel.com>
Tested-by: Yvonne Yang <yvonnex.yang@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
---
 lib/vhost/virtio_net.c | 22 ++++++++++++++--------
 1 file changed, 14 insertions(+), 8 deletions(-)

diff --git a/lib/vhost/virtio_net.c b/lib/vhost/virtio_net.c
index d4c94d2a9b..a9e2dcd9ce 100644
--- a/lib/vhost/virtio_net.c
+++ b/lib/vhost/virtio_net.c
@@ -997,13 +997,14 @@ async_iter_reset(struct vhost_async *async)
 }
 
 static __rte_always_inline int
-async_mbuf_to_desc_seg(struct virtio_net *dev, struct vhost_virtqueue *vq,
+async_fill_seg(struct virtio_net *dev, struct vhost_virtqueue *vq,
 		struct rte_mbuf *m, uint32_t mbuf_offset,
-		uint64_t buf_iova, uint32_t cpy_len)
+		uint64_t buf_iova, uint32_t cpy_len, bool to_desc)
 {
 	struct vhost_async *async = vq->async;
 	uint64_t mapped_len;
 	uint32_t buf_offset = 0;
+	void *src, *dst;
 	void *host_iova;
 
 	while (cpy_len) {
@@ -1015,10 +1016,15 @@ async_mbuf_to_desc_seg(struct virtio_net *dev, struct vhost_virtqueue *vq,
 			return -1;
 		}
 
-		if (unlikely(async_iter_add_iovec(dev, async,
-						(void *)(uintptr_t)rte_pktmbuf_iova_offset(m,
-							mbuf_offset),
-						host_iova, (size_t)mapped_len)))
+		if (to_desc) {
+			src = (void *)(uintptr_t)rte_pktmbuf_iova_offset(m, mbuf_offset);
+			dst = host_iova;
+		} else {
+			src = host_iova;
+			dst = (void *)(uintptr_t)rte_pktmbuf_iova_offset(m, mbuf_offset);
+		}
+
+		if (unlikely(async_iter_add_iovec(dev, async, src, dst, (size_t)mapped_len)))
 			return -1;
 
 		cpy_len -= (uint32_t)mapped_len;
@@ -1167,8 +1173,8 @@ mbuf_to_desc(struct virtio_net *dev, struct vhost_virtqueue *vq,
 		cpy_len = RTE_MIN(buf_avail, mbuf_avail);
 
 		if (is_async) {
-			if (async_mbuf_to_desc_seg(dev, vq, m, mbuf_offset,
-						buf_iova + buf_offset, cpy_len) < 0)
+			if (async_fill_seg(dev, vq, m, mbuf_offset,
+					   buf_iova + buf_offset, cpy_len, true) < 0)
 				goto error;
 		} else {
 			sync_fill_seg(dev, vq, m, mbuf_offset,
-- 
2.17.1


^ permalink raw reply	[flat|nested] 73+ messages in thread

* [PATCH v7 3/5] vhost: merge sync and async descriptor to mbuf filling
  2022-05-16  2:43 ` [PATCH v7 0/5] vhost: " xuan.ding
  2022-05-16  2:43   ` [PATCH v7 1/5] vhost: prepare sync for descriptor to mbuf refactoring xuan.ding
  2022-05-16  2:43   ` [PATCH v7 2/5] vhost: prepare async " xuan.ding
@ 2022-05-16  2:43   ` xuan.ding
  2022-05-16  2:43   ` [PATCH v7 4/5] vhost: support async dequeue for split ring xuan.ding
  2022-05-16  2:43   ` [PATCH v7 5/5] examples/vhost: support async dequeue data path xuan.ding
  4 siblings, 0 replies; 73+ messages in thread
From: xuan.ding @ 2022-05-16  2:43 UTC (permalink / raw)
  To: maxime.coquelin, chenbo.xia
  Cc: dev, jiayu.hu, cheng1.jiang, sunil.pai.g, liangma, Xuan Ding

From: Xuan Ding <xuan.ding@intel.com>

This patch refactors copy_desc_to_mbuf() used by the sync
path to support both sync and async descriptor to mbuf filling.
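
As a rough sketch of the resulting structure (illustration only, not the
patch code; every name below is a placeholder), the merged routine
dispatches per segment on an is_async flag and rolls back the partially
built async state on failure:

#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Toy stand-ins for the real per-segment helpers. */
static int
async_seg(uint8_t *dst, const uint8_t *src, uint32_t len)
{
        /* The real code would append a DMA iovec entry instead. */
        memcpy(dst, src, len);
        return 0;
}

static void
sync_seg(uint8_t *dst, const uint8_t *src, uint32_t len)
{
        memcpy(dst, src, len);
}

static void
async_cancel(void)
{
        /* The real code would drop the partially built iovec list. */
}

/* One filling routine serves both paths: dispatch per segment on is_async
 * and undo the async bookkeeping if any segment fails. */
static int
fill_segments(uint8_t *dst, const uint8_t **src, const uint32_t *len,
              uint16_t nr, bool is_async)
{
        uint32_t off = 0;
        uint16_t i;

        for (i = 0; i < nr; i++) {
                if (is_async) {
                        if (async_seg(dst + off, src[i], len[i]) < 0)
                                goto error;
                } else {
                        sync_seg(dst + off, src[i], len[i]);
                }
                off += len[i];
        }
        return 0;
error:
        if (is_async)
                async_cancel();
        return -1;
}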

Signed-off-by: Xuan Ding <xuan.ding@intel.com>
Tested-by: Yvonne Yang <yvonnex.yang@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
---
 lib/vhost/vhost.h      |  1 +
 lib/vhost/virtio_net.c | 48 ++++++++++++++++++++++++++++++++----------
 2 files changed, 38 insertions(+), 11 deletions(-)

diff --git a/lib/vhost/vhost.h b/lib/vhost/vhost.h
index a9edc271aa..00744b234f 100644
--- a/lib/vhost/vhost.h
+++ b/lib/vhost/vhost.h
@@ -180,6 +180,7 @@ struct async_inflight_info {
 	struct rte_mbuf *mbuf;
 	uint16_t descs; /* num of descs inflight */
 	uint16_t nr_buffers; /* num of buffers inflight for packed ring */
+	struct virtio_net_hdr nethdr;
 };
 
 struct vhost_async {
diff --git a/lib/vhost/virtio_net.c b/lib/vhost/virtio_net.c
index a9e2dcd9ce..5904839d5c 100644
--- a/lib/vhost/virtio_net.c
+++ b/lib/vhost/virtio_net.c
@@ -2487,10 +2487,10 @@ copy_vnet_hdr_from_desc(struct virtio_net_hdr *hdr,
 }
 
 static __rte_always_inline int
-copy_desc_to_mbuf(struct virtio_net *dev, struct vhost_virtqueue *vq,
+desc_to_mbuf(struct virtio_net *dev, struct vhost_virtqueue *vq,
 		  struct buf_vector *buf_vec, uint16_t nr_vec,
 		  struct rte_mbuf *m, struct rte_mempool *mbuf_pool,
-		  bool legacy_ol_flags)
+		  bool legacy_ol_flags, uint16_t slot_idx, bool is_async)
 {
 	uint32_t buf_avail, buf_offset, buf_len;
 	uint64_t buf_addr, buf_iova;
@@ -2501,6 +2501,8 @@ copy_desc_to_mbuf(struct virtio_net *dev, struct vhost_virtqueue *vq,
 	struct virtio_net_hdr *hdr = NULL;
 	/* A counter to avoid desc dead loop chain */
 	uint16_t vec_idx = 0;
+	struct vhost_async *async = vq->async;
+	struct async_inflight_info *pkts_info;
 
 	buf_addr = buf_vec[vec_idx].buf_addr;
 	buf_iova = buf_vec[vec_idx].buf_iova;
@@ -2538,6 +2540,7 @@ copy_desc_to_mbuf(struct virtio_net *dev, struct vhost_virtqueue *vq,
 		if (unlikely(++vec_idx >= nr_vec))
 			goto error;
 		buf_addr = buf_vec[vec_idx].buf_addr;
+		buf_iova = buf_vec[vec_idx].buf_iova;
 		buf_len = buf_vec[vec_idx].buf_len;
 
 		buf_offset = 0;
@@ -2553,12 +2556,25 @@ copy_desc_to_mbuf(struct virtio_net *dev, struct vhost_virtqueue *vq,
 
 	mbuf_offset = 0;
 	mbuf_avail  = m->buf_len - RTE_PKTMBUF_HEADROOM;
+
+	if (is_async) {
+		pkts_info = async->pkts_info;
+		if (async_iter_initialize(dev, async))
+			return -1;
+	}
+
 	while (1) {
 		cpy_len = RTE_MIN(buf_avail, mbuf_avail);
 
-		sync_fill_seg(dev, vq, cur, mbuf_offset,
-			      buf_addr + buf_offset,
-			      buf_iova + buf_offset, cpy_len, false);
+		if (is_async) {
+			if (async_fill_seg(dev, vq, cur, mbuf_offset,
+					   buf_iova + buf_offset, cpy_len, false) < 0)
+				goto error;
+		} else {
+			sync_fill_seg(dev, vq, cur, mbuf_offset,
+				      buf_addr + buf_offset,
+				      buf_iova + buf_offset, cpy_len, false);
+		}
 
 		mbuf_avail  -= cpy_len;
 		mbuf_offset += cpy_len;
@@ -2607,11 +2623,20 @@ copy_desc_to_mbuf(struct virtio_net *dev, struct vhost_virtqueue *vq,
 	prev->data_len = mbuf_offset;
 	m->pkt_len    += mbuf_offset;
 
-	if (hdr)
-		vhost_dequeue_offload(dev, hdr, m, legacy_ol_flags);
+	if (is_async) {
+		async_iter_finalize(async);
+		if (hdr)
+			pkts_info[slot_idx].nethdr = *hdr;
+	} else {
+		if (hdr)
+			vhost_dequeue_offload(dev, hdr, m, legacy_ol_flags);
+	}
 
 	return 0;
 error:
+	if (is_async)
+		async_iter_cancel(async);
+
 	return -1;
 }
 
@@ -2743,8 +2768,8 @@ virtio_dev_tx_split(struct virtio_net *dev, struct vhost_virtqueue *vq,
 			break;
 		}
 
-		err = copy_desc_to_mbuf(dev, vq, buf_vec, nr_vec, pkts[i],
-				mbuf_pool, legacy_ol_flags);
+		err = desc_to_mbuf(dev, vq, buf_vec, nr_vec, pkts[i],
+				   mbuf_pool, legacy_ol_flags, 0, false);
 		if (unlikely(err)) {
 			if (!allocerr_warned) {
 				VHOST_LOG_DATA(ERR, "(%s) failed to copy desc to mbuf.\n",
@@ -2755,6 +2780,7 @@ virtio_dev_tx_split(struct virtio_net *dev, struct vhost_virtqueue *vq,
 			i++;
 			break;
 		}
+
 	}
 
 	if (dropped)
@@ -2936,8 +2962,8 @@ vhost_dequeue_single_packed(struct virtio_net *dev,
 		return -1;
 	}
 
-	err = copy_desc_to_mbuf(dev, vq, buf_vec, nr_vec, pkts,
-				mbuf_pool, legacy_ol_flags);
+	err = desc_to_mbuf(dev, vq, buf_vec, nr_vec, pkts,
+			   mbuf_pool, legacy_ol_flags, 0, false);
 	if (unlikely(err)) {
 		if (!allocerr_warned) {
 			VHOST_LOG_DATA(ERR, "(%s) failed to copy desc to mbuf.\n",
-- 
2.17.1


^ permalink raw reply	[flat|nested] 73+ messages in thread

* [PATCH v7 4/5] vhost: support async dequeue for split ring
  2022-05-16  2:43 ` [PATCH v7 0/5] vhost: " xuan.ding
                     ` (2 preceding siblings ...)
  2022-05-16  2:43   ` [PATCH v7 3/5] vhost: merge sync and async descriptor to mbuf filling xuan.ding
@ 2022-05-16  2:43   ` xuan.ding
  2022-05-16  5:52     ` Hu, Jiayu
  2022-05-16  2:43   ` [PATCH v7 5/5] examples/vhost: support async dequeue data path xuan.ding
  4 siblings, 1 reply; 73+ messages in thread
From: xuan.ding @ 2022-05-16  2:43 UTC (permalink / raw)
  To: maxime.coquelin, chenbo.xia
  Cc: dev, jiayu.hu, cheng1.jiang, sunil.pai.g, liangma, Xuan Ding, Yuan Wang

From: Xuan Ding <xuan.ding@intel.com>

This patch implements the asynchronous dequeue data path for vhost split
ring; a new API, rte_vhost_async_try_dequeue_burst(), is introduced.

Signed-off-by: Xuan Ding <xuan.ding@intel.com>
Signed-off-by: Yuan Wang <yuanx.wang@intel.com>
Tested-by: Yvonne Yang <yvonnex.yang@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
---
 doc/guides/prog_guide/vhost_lib.rst    |   7 +
 doc/guides/rel_notes/release_22_07.rst |   5 +
 lib/vhost/rte_vhost_async.h            |  37 +++
 lib/vhost/version.map                  |   2 +-
 lib/vhost/virtio_net.c                 | 337 +++++++++++++++++++++++++
 5 files changed, 387 insertions(+), 1 deletion(-)

diff --git a/doc/guides/prog_guide/vhost_lib.rst b/doc/guides/prog_guide/vhost_lib.rst
index f287b76ebf..09c1c24b48 100644
--- a/doc/guides/prog_guide/vhost_lib.rst
+++ b/doc/guides/prog_guide/vhost_lib.rst
@@ -282,6 +282,13 @@ The following is an overview of some key Vhost API functions:
   Clear inflight packets which are submitted to DMA engine in vhost async data
   path. Completed packets are returned to applications through ``pkts``.
 
+* ``rte_vhost_async_try_dequeue_burst(int vid, uint16_t queue_id,
+  struct rte_mempool *mbuf_pool, struct rte_mbuf **pkts, uint16_t count,
+	int *nr_inflight, uint16_t dma_id, uint16_t vchan_id)``
+
+  Receives (dequeues) ``count`` packets from guest to host in async data path,
+  and stored them at ``pkts``.
+
 Vhost-user Implementations
 --------------------------
 
diff --git a/doc/guides/rel_notes/release_22_07.rst b/doc/guides/rel_notes/release_22_07.rst
index 88b1e478d4..564d88623e 100644
--- a/doc/guides/rel_notes/release_22_07.rst
+++ b/doc/guides/rel_notes/release_22_07.rst
@@ -70,6 +70,11 @@ New Features
   Added an API which can get the number of inflight packets in
   vhost async data path without using lock.
 
+* **Added vhost async dequeue API to receive pkts from guest.**
+
+  Added vhost async dequeue API which can leverage DMA devices to
+  accelerate receiving pkts from guest.
+
 Removed Items
 -------------
 
diff --git a/lib/vhost/rte_vhost_async.h b/lib/vhost/rte_vhost_async.h
index 70234debf9..2789492e38 100644
--- a/lib/vhost/rte_vhost_async.h
+++ b/lib/vhost/rte_vhost_async.h
@@ -204,6 +204,43 @@ uint16_t rte_vhost_clear_queue_thread_unsafe(int vid, uint16_t queue_id,
 __rte_experimental
 int rte_vhost_async_dma_configure(int16_t dma_id, uint16_t vchan_id);
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
+ *
+ * This function tries to receive packets from the guest with offloading
+ * copies to the async channel. The packets that are transfer completed
+ * are returned in "pkts". The other packets that their copies are submitted to
+ * the async channel but not completed are called "in-flight packets".
+ * This function will not return in-flight packets until their copies are
+ * completed by the async channel.
+ *
+ * @param vid
+ *  ID of vhost device to dequeue data
+ * @param queue_id
+ *  ID of virtqueue to dequeue data
+ * @param mbuf_pool
+ *  Mbuf_pool where host mbuf is allocated
+ * @param pkts
+ *  Blank array to keep successfully dequeued packets
+ * @param count
+ *  Size of the packet array
+ * @param nr_inflight
+ *  >= 0: The amount of in-flight packets
+ *  -1: Meaningless, indicates failed lock acquisition or invalid queue_id/dma_id
+ * @param dma_id
+ *  The identifier of DMA device
+ * @param vchan_id
+ *  The identifier of virtual DMA channel
+ * @return
+ *  Number of successfully dequeued packets
+ */
+__rte_experimental
+uint16_t
+rte_vhost_async_try_dequeue_burst(int vid, uint16_t queue_id,
+	struct rte_mempool *mbuf_pool, struct rte_mbuf **pkts, uint16_t count,
+	int *nr_inflight, uint16_t dma_id, uint16_t vchan_id);
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/vhost/version.map b/lib/vhost/version.map
index 5841315386..8c7211bf0d 100644
--- a/lib/vhost/version.map
+++ b/lib/vhost/version.map
@@ -90,7 +90,7 @@ EXPERIMENTAL {
 
 	# added in 22.07
 	rte_vhost_async_get_inflight_thread_unsafe;
-
+	rte_vhost_async_try_dequeue_burst;
 };
 
 INTERNAL {
diff --git a/lib/vhost/virtio_net.c b/lib/vhost/virtio_net.c
index 5904839d5c..8290514e65 100644
--- a/lib/vhost/virtio_net.c
+++ b/lib/vhost/virtio_net.c
@@ -3171,3 +3171,340 @@ rte_vhost_dequeue_burst(int vid, uint16_t queue_id,
 
 	return count;
 }
+
+static __rte_always_inline uint16_t
+async_poll_dequeue_completed_split(struct virtio_net *dev, struct vhost_virtqueue *vq,
+		struct rte_mbuf **pkts, uint16_t count, uint16_t dma_id,
+		uint16_t vchan_id, bool legacy_ol_flags)
+{
+	uint16_t start_idx, from, i;
+	uint16_t nr_cpl_pkts = 0;
+	struct async_inflight_info *pkts_info = vq->async->pkts_info;
+
+	vhost_async_dma_check_completed(dev, dma_id, vchan_id, VHOST_DMA_MAX_COPY_COMPLETE);
+
+	start_idx = async_get_first_inflight_pkt_idx(vq);
+
+	from = start_idx;
+	while (vq->async->pkts_cmpl_flag[from] && count--) {
+		vq->async->pkts_cmpl_flag[from] = false;
+		from = (from + 1) & (vq->size - 1);
+		nr_cpl_pkts++;
+	}
+
+	if (nr_cpl_pkts == 0)
+		return 0;
+
+	for (i = 0; i < nr_cpl_pkts; i++) {
+		from = (start_idx + i) & (vq->size - 1);
+		pkts[i] = pkts_info[from].mbuf;
+
+		if (virtio_net_with_host_offload(dev))
+			vhost_dequeue_offload(dev, &pkts_info[from].nethdr, pkts[i],
+					      legacy_ol_flags);
+	}
+
+	/* write back completed descs to used ring and update used idx */
+	write_back_completed_descs_split(vq, nr_cpl_pkts);
+	__atomic_add_fetch(&vq->used->idx, nr_cpl_pkts, __ATOMIC_RELEASE);
+	vhost_vring_call_split(dev, vq);
+
+	vq->async->pkts_inflight_n -= nr_cpl_pkts;
+
+	return nr_cpl_pkts;
+}
+
+static __rte_always_inline uint16_t
+virtio_dev_tx_async_split(struct virtio_net *dev, struct vhost_virtqueue *vq,
+		struct rte_mempool *mbuf_pool, struct rte_mbuf **pkts, uint16_t count,
+		uint16_t dma_id, uint16_t vchan_id, bool legacy_ol_flags)
+{
+	static bool allocerr_warned;
+	bool dropped = false;
+	uint16_t free_entries;
+	uint16_t pkt_idx, slot_idx = 0;
+	uint16_t nr_done_pkts = 0;
+	uint16_t pkt_err = 0;
+	uint16_t n_xfer;
+	struct vhost_async *async = vq->async;
+	struct async_inflight_info *pkts_info = async->pkts_info;
+	struct rte_mbuf *pkts_prealloc[MAX_PKT_BURST];
+	uint16_t pkts_size = count;
+
+	/**
+	 * The ordering between avail index and
+	 * desc reads needs to be enforced.
+	 */
+	free_entries = __atomic_load_n(&vq->avail->idx, __ATOMIC_ACQUIRE) -
+			vq->last_avail_idx;
+	if (free_entries == 0)
+		goto out;
+
+	rte_prefetch0(&vq->avail->ring[vq->last_avail_idx & (vq->size - 1)]);
+
+	async_iter_reset(async);
+
+	count = RTE_MIN(count, MAX_PKT_BURST);
+	count = RTE_MIN(count, free_entries);
+	VHOST_LOG_DATA(DEBUG, "(%s) about to dequeue %u buffers\n",
+			dev->ifname, count);
+
+	if (rte_pktmbuf_alloc_bulk(mbuf_pool, pkts_prealloc, count))
+		goto out;
+
+	for (pkt_idx = 0; pkt_idx < count; pkt_idx++) {
+		uint16_t head_idx = 0;
+		uint16_t nr_vec = 0;
+		uint16_t to;
+		uint32_t buf_len;
+		int err;
+		struct buf_vector buf_vec[BUF_VECTOR_MAX];
+		struct rte_mbuf *pkt = pkts_prealloc[pkt_idx];
+
+		if (unlikely(fill_vec_buf_split(dev, vq, vq->last_avail_idx,
+						&nr_vec, buf_vec,
+						&head_idx, &buf_len,
+						VHOST_ACCESS_RO) < 0)) {
+			dropped = true;
+			break;
+		}
+
+		err = virtio_dev_pktmbuf_prep(dev, pkt, buf_len);
+		if (unlikely(err)) {
+			/**
+			 * mbuf allocation fails for jumbo packets when external
+			 * buffer allocation is not allowed and linear buffer
+			 * is required. Drop this packet.
+			 */
+			if (!allocerr_warned) {
+				VHOST_LOG_DATA(ERR,
+					"(%s) %s: Failed mbuf alloc of size %d from %s\n",
+					dev->ifname, __func__, buf_len, mbuf_pool->name);
+				allocerr_warned = true;
+			}
+			dropped = true;
+			break;
+		}
+
+		slot_idx = (async->pkts_idx + pkt_idx) & (vq->size - 1);
+		err = desc_to_mbuf(dev, vq, buf_vec, nr_vec, pkt, mbuf_pool,
+					legacy_ol_flags, slot_idx, true);
+		if (unlikely(err)) {
+			if (!allocerr_warned) {
+				VHOST_LOG_DATA(ERR,
+					"(%s) %s: Failed to offload copies to async channel.\n",
+					dev->ifname, __func__);
+				allocerr_warned = true;
+			}
+			dropped = true;
+			break;
+		}
+
+		pkts_info[slot_idx].mbuf = pkt;
+
+		/* store used descs */
+		to = async->desc_idx_split & (vq->size - 1);
+		async->descs_split[to].id = head_idx;
+		async->descs_split[to].len = 0;
+		async->desc_idx_split++;
+
+		vq->last_avail_idx++;
+	}
+
+	if (unlikely(dropped))
+		rte_pktmbuf_free_bulk(&pkts_prealloc[pkt_idx], count - pkt_idx);
+
+	n_xfer = vhost_async_dma_transfer(dev, vq, dma_id, vchan_id, async->pkts_idx,
+					  async->iov_iter, pkt_idx);
+
+	async->pkts_inflight_n += n_xfer;
+
+	pkt_err = pkt_idx - n_xfer;
+	if (unlikely(pkt_err)) {
+		VHOST_LOG_DATA(DEBUG, "(%s) %s: failed to transfer data.\n",
+				dev->ifname, __func__);
+
+		pkt_idx = n_xfer;
+		/* recover available ring */
+		vq->last_avail_idx -= pkt_err;
+
+		/**
+		 * recover async channel copy related structures and free pktmbufs
+		 * for error pkts.
+		 */
+		async->desc_idx_split -= pkt_err;
+		while (pkt_err-- > 0) {
+			rte_pktmbuf_free(pkts_info[slot_idx & (vq->size - 1)].mbuf);
+			slot_idx--;
+		}
+	}
+
+	async->pkts_idx += pkt_idx;
+	if (async->pkts_idx >= vq->size)
+		async->pkts_idx -= vq->size;
+
+out:
+	/* DMA device may serve other queues, unconditionally check completed. */
+	nr_done_pkts = async_poll_dequeue_completed_split(dev, vq, pkts, pkts_size,
+							  dma_id, vchan_id, legacy_ol_flags);
+
+	return nr_done_pkts;
+}
+
+__rte_noinline
+static uint16_t
+virtio_dev_tx_async_split_legacy(struct virtio_net *dev,
+		struct vhost_virtqueue *vq, struct rte_mempool *mbuf_pool,
+		struct rte_mbuf **pkts, uint16_t count,
+		uint16_t dma_id, uint16_t vchan_id)
+{
+	return virtio_dev_tx_async_split(dev, vq, mbuf_pool,
+				pkts, count, dma_id, vchan_id, true);
+}
+
+__rte_noinline
+static uint16_t
+virtio_dev_tx_async_split_compliant(struct virtio_net *dev,
+		struct vhost_virtqueue *vq, struct rte_mempool *mbuf_pool,
+		struct rte_mbuf **pkts, uint16_t count,
+		uint16_t dma_id, uint16_t vchan_id)
+{
+	return virtio_dev_tx_async_split(dev, vq, mbuf_pool,
+				pkts, count, dma_id, vchan_id, false);
+}
+
+uint16_t
+rte_vhost_async_try_dequeue_burst(int vid, uint16_t queue_id,
+	struct rte_mempool *mbuf_pool, struct rte_mbuf **pkts, uint16_t count,
+	int *nr_inflight, uint16_t dma_id, uint16_t vchan_id)
+{
+	struct virtio_net *dev;
+	struct rte_mbuf *rarp_mbuf = NULL;
+	struct vhost_virtqueue *vq;
+	int16_t success = 1;
+
+	dev = get_device(vid);
+	if (!dev || !nr_inflight)
+		return 0;
+
+	*nr_inflight = -1;
+
+	if (unlikely(!(dev->flags & VIRTIO_DEV_BUILTIN_VIRTIO_NET))) {
+		VHOST_LOG_DATA(ERR, "(%s) %s: built-in vhost net backend is disabled.\n",
+				dev->ifname, __func__);
+		return 0;
+	}
+
+	if (unlikely(!is_valid_virt_queue_idx(queue_id, 1, dev->nr_vring))) {
+		VHOST_LOG_DATA(ERR, "(%s) %s: invalid virtqueue idx %d.\n",
+				dev->ifname, __func__, queue_id);
+		return 0;
+	}
+
+	if (unlikely(dma_id >= RTE_DMADEV_DEFAULT_MAX)) {
+		VHOST_LOG_DATA(ERR, "(%s) %s: invalid dma id %u.\n",
+				dev->ifname, __func__, dma_id);
+		return 0;
+	}
+
+	if (unlikely(!dma_copy_track[dma_id].vchans ||
+				!dma_copy_track[dma_id].vchans[vchan_id].pkts_cmpl_flag_addr)) {
+		VHOST_LOG_DATA(ERR, "(%s) %s: invalid channel %d:%u.\n", dev->ifname, __func__,
+				dma_id, vchan_id);
+		return 0;
+	}
+
+	vq = dev->virtqueue[queue_id];
+
+	if (unlikely(rte_spinlock_trylock(&vq->access_lock) == 0))
+		return 0;
+
+	if (unlikely(vq->enabled == 0)) {
+		count = 0;
+		goto out_access_unlock;
+	}
+
+	if (unlikely(!vq->async)) {
+		VHOST_LOG_DATA(ERR, "(%s) %s: async not registered for queue id %d.\n",
+				dev->ifname, __func__, queue_id);
+		count = 0;
+		goto out_access_unlock;
+	}
+
+	if (dev->features & (1ULL << VIRTIO_F_IOMMU_PLATFORM))
+		vhost_user_iotlb_rd_lock(vq);
+
+	if (unlikely(vq->access_ok == 0))
+		if (unlikely(vring_translate(dev, vq) < 0)) {
+			count = 0;
+			goto out;
+		}
+
+	/*
+	 * Construct a RARP broadcast packet, and inject it to the "pkts"
+	 * array, to looks like that guest actually send such packet.
+	 *
+	 * Check user_send_rarp() for more information.
+	 *
+	 * broadcast_rarp shares a cacheline in the virtio_net structure
+	 * with some fields that are accessed during enqueue and
+	 * __atomic_compare_exchange_n causes a write if performed compare
+	 * and exchange. This could result in false sharing between enqueue
+	 * and dequeue.
+	 *
+	 * Prevent unnecessary false sharing by reading broadcast_rarp first
+	 * and only performing compare and exchange if the read indicates it
+	 * is likely to be set.
+	 */
+	if (unlikely(__atomic_load_n(&dev->broadcast_rarp, __ATOMIC_ACQUIRE) &&
+			__atomic_compare_exchange_n(&dev->broadcast_rarp,
+			&success, 0, 0, __ATOMIC_RELEASE, __ATOMIC_RELAXED))) {
+
+		rarp_mbuf = rte_net_make_rarp_packet(mbuf_pool, &dev->mac);
+		if (rarp_mbuf == NULL) {
+			VHOST_LOG_DATA(ERR, "Failed to make RARP packet.\n");
+			count = 0;
+			goto out;
+		}
+		/*
+		 * Inject it to the head of "pkts" array, so that switch's mac
+		 * learning table will get updated first.
+		 */
+		pkts[0] = rarp_mbuf;
+		pkts++;
+		count -= 1;
+	}
+
+	if (unlikely(vq_is_packed(dev))) {
+		static bool not_support_pack_log;
+		if (!not_support_pack_log) {
+			VHOST_LOG_DATA(ERR,
+				"(%s) %s: async dequeue does not support packed ring.\n",
+				dev->ifname, __func__);
+			not_support_pack_log = true;
+		}
+		count = 0;
+		goto out;
+	}
+
+	if (dev->flags & VIRTIO_DEV_LEGACY_OL_FLAGS)
+		count = virtio_dev_tx_async_split_legacy(dev, vq, mbuf_pool, pkts,
+							 count, dma_id, vchan_id);
+	else
+		count = virtio_dev_tx_async_split_compliant(dev, vq, mbuf_pool, pkts,
+							    count, dma_id, vchan_id);
+
+	*nr_inflight = vq->async->pkts_inflight_n;
+
+out:
+	if (dev->features & (1ULL << VIRTIO_F_IOMMU_PLATFORM))
+		vhost_user_iotlb_rd_unlock(vq);
+
+out_access_unlock:
+	rte_spinlock_unlock(&vq->access_lock);
+
+	if (unlikely(rarp_mbuf != NULL))
+		count += 1;
+
+	return count;
+}
-- 
2.17.1


^ permalink raw reply	[flat|nested] 73+ messages in thread

* [PATCH v7 5/5] examples/vhost: support async dequeue data path
  2022-05-16  2:43 ` [PATCH v7 0/5] vhost: " xuan.ding
                     ` (3 preceding siblings ...)
  2022-05-16  2:43   ` [PATCH v7 4/5] vhost: support async dequeue for split ring xuan.ding
@ 2022-05-16  2:43   ` xuan.ding
  4 siblings, 0 replies; 73+ messages in thread
From: xuan.ding @ 2022-05-16  2:43 UTC (permalink / raw)
  To: maxime.coquelin, chenbo.xia
  Cc: dev, jiayu.hu, cheng1.jiang, sunil.pai.g, liangma, Xuan Ding,
	Wenwu Ma, Yuan Wang

From: Xuan Ding <xuan.ding@intel.com>

This patch adds a use case for the async dequeue API. The vswitch can
leverage DMA devices to accelerate the vhost async dequeue path.
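
For illustration only (not part of the patch; all names below are
placeholders), the dispatch pattern introduced here boils down to
selecting a burst handler per device once at setup time, so the
per-burst datapath no longer branches on builtin/sync/async:

#include <stdint.h>

/* Placeholders standing in for the sample's real types. */
struct pkt;
struct pool;
struct dev_ctx;

typedef uint16_t (*deq_burst_t)(struct dev_ctx *dev, uint16_t queue_id,
                                struct pool *mbuf_pool, struct pkt **pkts,
                                uint16_t count);

#define MAX_DEVS 64

/* One handler slot per device, filled in when the device is created. */
static deq_burst_t deq_ops[MAX_DEVS];

static void
pick_dequeue_handler(int vid, int dma_assigned,
                     deq_burst_t sync_handler, deq_burst_t async_handler)
{
        /* Decide sync vs. async once; the datapath then simply calls
         * deq_ops[vid](...) for every burst without re-checking. */
        deq_ops[vid] = dma_assigned ? async_handler : sync_handler;
}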

Signed-off-by: Wenwu Ma <wenwux.ma@intel.com>
Signed-off-by: Yuan Wang <yuanx.wang@intel.com>
Signed-off-by: Xuan Ding <xuan.ding@intel.com>
Tested-by: Yvonne Yang <yvonnex.yang@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
---
 doc/guides/sample_app_ug/vhost.rst |   9 +-
 examples/vhost/main.c              | 286 ++++++++++++++++++++---------
 examples/vhost/main.h              |  32 +++-
 examples/vhost/virtio_net.c        |  16 +-
 4 files changed, 245 insertions(+), 98 deletions(-)

diff --git a/doc/guides/sample_app_ug/vhost.rst b/doc/guides/sample_app_ug/vhost.rst
index a6ce4bc8ac..09db965e70 100644
--- a/doc/guides/sample_app_ug/vhost.rst
+++ b/doc/guides/sample_app_ug/vhost.rst
@@ -169,9 +169,12 @@ demonstrates how to use the async vhost APIs. It's used in combination with dmas
 **--dmas**
 This parameter is used to specify the assigned DMA device of a vhost device.
 Async vhost-user net driver will be used if --dmas is set. For example
---dmas [txd0@00:04.0,txd1@00:04.1] means use DMA channel 00:04.0 for vhost
-device 0 enqueue operation and use DMA channel 00:04.1 for vhost device 1
-enqueue operation.
+--dmas [txd0@00:04.0,txd1@00:04.1,rxd0@00:04.2,rxd1@00:04.3] means use
+DMA channel 00:04.0/00:04.2 for vhost device 0 enqueue/dequeue operation
+and use DMA channel 00:04.1/00:04.3 for vhost device 1 enqueue/dequeue
+operation. The index of the device corresponds to the socket file in order,
+that means vhost device 0 is created through the first socket file, vhost
+device 1 is created through the second socket file, and so on.
 
 Common Issues
 -------------
diff --git a/examples/vhost/main.c b/examples/vhost/main.c
index c4d46de1c5..50e55de22c 100644
--- a/examples/vhost/main.c
+++ b/examples/vhost/main.c
@@ -63,6 +63,9 @@
 
 #define DMA_RING_SIZE 4096
 
+#define ASYNC_ENQUEUE_VHOST 1
+#define ASYNC_DEQUEUE_VHOST 2
+
 /* number of mbufs in all pools - if specified on command-line. */
 static int total_num_mbufs = NUM_MBUFS_DEFAULT;
 
@@ -116,6 +119,8 @@ static uint32_t burst_rx_retry_num = BURST_RX_RETRIES;
 static char *socket_files;
 static int nb_sockets;
 
+static struct vhost_queue_ops vdev_queue_ops[RTE_MAX_VHOST_DEVICE];
+
 /* empty VMDq configuration structure. Filled in programmatically */
 static struct rte_eth_conf vmdq_conf_default = {
 	.rxmode = {
@@ -205,6 +210,20 @@ struct vhost_bufftable *vhost_txbuff[RTE_MAX_LCORE * RTE_MAX_VHOST_DEVICE];
 #define MBUF_TABLE_DRAIN_TSC	((rte_get_tsc_hz() + US_PER_S - 1) \
 				 / US_PER_S * BURST_TX_DRAIN_US)
 
+static int vid2socketid[RTE_MAX_VHOST_DEVICE];
+
+static inline uint32_t
+get_async_flag_by_socketid(int socketid)
+{
+	return dma_bind[socketid].async_flag;
+}
+
+static inline void
+init_vid2socketid_array(int vid, int socketid)
+{
+	vid2socketid[vid] = socketid;
+}
+
 static inline bool
 is_dma_configured(int16_t dev_id)
 {
@@ -224,7 +243,7 @@ open_dma(const char *value)
 	char *addrs = input;
 	char *ptrs[2];
 	char *start, *end, *substr;
-	int64_t vid;
+	int64_t socketid, vring_id;
 
 	struct rte_dma_info info;
 	struct rte_dma_conf dev_config = { .nb_vchans = 1 };
@@ -262,7 +281,9 @@ open_dma(const char *value)
 
 	while (i < args_nr) {
 		char *arg_temp = dma_arg[i];
+		char *txd, *rxd;
 		uint8_t sub_nr;
+		int async_flag;
 
 		sub_nr = rte_strsplit(arg_temp, strlen(arg_temp), ptrs, 2, '@');
 		if (sub_nr != 2) {
@@ -270,14 +291,23 @@ open_dma(const char *value)
 			goto out;
 		}
 
-		start = strstr(ptrs[0], "txd");
-		if (start == NULL) {
+		txd = strstr(ptrs[0], "txd");
+		rxd = strstr(ptrs[0], "rxd");
+		if (txd) {
+			start = txd;
+			vring_id = VIRTIO_RXQ;
+			async_flag = ASYNC_ENQUEUE_VHOST;
+		} else if (rxd) {
+			start = rxd;
+			vring_id = VIRTIO_TXQ;
+			async_flag = ASYNC_DEQUEUE_VHOST;
+		} else {
 			ret = -1;
 			goto out;
 		}
 
 		start += 3;
-		vid = strtol(start, &end, 0);
+		socketid = strtol(start, &end, 0);
 		if (end == start) {
 			ret = -1;
 			goto out;
@@ -338,7 +368,8 @@ open_dma(const char *value)
 		dmas_id[dma_count++] = dev_id;
 
 done:
-		(dma_info + vid)->dmas[VIRTIO_RXQ].dev_id = dev_id;
+		(dma_info + socketid)->dmas[vring_id].dev_id = dev_id;
+		(dma_info + socketid)->async_flag |= async_flag;
 		i++;
 	}
 out:
@@ -990,7 +1021,7 @@ complete_async_pkts(struct vhost_dev *vdev)
 {
 	struct rte_mbuf *p_cpl[MAX_PKT_BURST];
 	uint16_t complete_count;
-	int16_t dma_id = dma_bind[vdev->vid].dmas[VIRTIO_RXQ].dev_id;
+	int16_t dma_id = dma_bind[vid2socketid[vdev->vid]].dmas[VIRTIO_RXQ].dev_id;
 
 	complete_count = rte_vhost_poll_enqueue_completed(vdev->vid,
 					VIRTIO_RXQ, p_cpl, MAX_PKT_BURST, dma_id, 0);
@@ -1029,22 +1060,7 @@ drain_vhost(struct vhost_dev *vdev)
 	uint16_t nr_xmit = vhost_txbuff[buff_idx]->len;
 	struct rte_mbuf **m = vhost_txbuff[buff_idx]->m_table;
 
-	if (builtin_net_driver) {
-		ret = vs_enqueue_pkts(vdev, VIRTIO_RXQ, m, nr_xmit);
-	} else if (dma_bind[vdev->vid].dmas[VIRTIO_RXQ].async_enabled) {
-		uint16_t enqueue_fail = 0;
-		int16_t dma_id = dma_bind[vdev->vid].dmas[VIRTIO_RXQ].dev_id;
-
-		complete_async_pkts(vdev);
-		ret = rte_vhost_submit_enqueue_burst(vdev->vid, VIRTIO_RXQ, m, nr_xmit, dma_id, 0);
-
-		enqueue_fail = nr_xmit - ret;
-		if (enqueue_fail)
-			free_pkts(&m[ret], nr_xmit - ret);
-	} else {
-		ret = rte_vhost_enqueue_burst(vdev->vid, VIRTIO_RXQ,
-						m, nr_xmit);
-	}
+	ret = vdev_queue_ops[vdev->vid].enqueue_pkt_burst(vdev, VIRTIO_RXQ, m, nr_xmit);
 
 	if (enable_stats) {
 		__atomic_add_fetch(&vdev->stats.rx_total_atomic, nr_xmit,
@@ -1053,7 +1069,7 @@ drain_vhost(struct vhost_dev *vdev)
 				__ATOMIC_SEQ_CST);
 	}
 
-	if (!dma_bind[vdev->vid].dmas[VIRTIO_RXQ].async_enabled)
+	if (!dma_bind[vid2socketid[vdev->vid]].dmas[VIRTIO_RXQ].async_enabled)
 		free_pkts(m, nr_xmit);
 }
 
@@ -1325,6 +1341,32 @@ drain_mbuf_table(struct mbuf_table *tx_q)
 	}
 }
 
+uint16_t
+async_enqueue_pkts(struct vhost_dev *dev, uint16_t queue_id,
+		struct rte_mbuf **pkts, uint32_t rx_count)
+{
+	uint16_t enqueue_count;
+	uint16_t enqueue_fail = 0;
+	uint16_t dma_id = dma_bind[vid2socketid[dev->vid]].dmas[VIRTIO_RXQ].dev_id;
+
+	complete_async_pkts(dev);
+	enqueue_count = rte_vhost_submit_enqueue_burst(dev->vid, queue_id,
+					pkts, rx_count, dma_id, 0);
+
+	enqueue_fail = rx_count - enqueue_count;
+	if (enqueue_fail)
+		free_pkts(&pkts[enqueue_count], enqueue_fail);
+
+	return enqueue_count;
+}
+
+uint16_t
+sync_enqueue_pkts(struct vhost_dev *dev, uint16_t queue_id,
+		struct rte_mbuf **pkts, uint32_t rx_count)
+{
+	return rte_vhost_enqueue_burst(dev->vid, queue_id, pkts, rx_count);
+}
+
 static __rte_always_inline void
 drain_eth_rx(struct vhost_dev *vdev)
 {
@@ -1355,25 +1397,8 @@ drain_eth_rx(struct vhost_dev *vdev)
 		}
 	}
 
-	if (builtin_net_driver) {
-		enqueue_count = vs_enqueue_pkts(vdev, VIRTIO_RXQ,
-						pkts, rx_count);
-	} else if (dma_bind[vdev->vid].dmas[VIRTIO_RXQ].async_enabled) {
-		uint16_t enqueue_fail = 0;
-		int16_t dma_id = dma_bind[vdev->vid].dmas[VIRTIO_RXQ].dev_id;
-
-		complete_async_pkts(vdev);
-		enqueue_count = rte_vhost_submit_enqueue_burst(vdev->vid,
-					VIRTIO_RXQ, pkts, rx_count, dma_id, 0);
-
-		enqueue_fail = rx_count - enqueue_count;
-		if (enqueue_fail)
-			free_pkts(&pkts[enqueue_count], enqueue_fail);
-
-	} else {
-		enqueue_count = rte_vhost_enqueue_burst(vdev->vid, VIRTIO_RXQ,
-						pkts, rx_count);
-	}
+	enqueue_count = vdev_queue_ops[vdev->vid].enqueue_pkt_burst(vdev,
+					VIRTIO_RXQ, pkts, rx_count);
 
 	if (enable_stats) {
 		__atomic_add_fetch(&vdev->stats.rx_total_atomic, rx_count,
@@ -1382,10 +1407,31 @@ drain_eth_rx(struct vhost_dev *vdev)
 				__ATOMIC_SEQ_CST);
 	}
 
-	if (!dma_bind[vdev->vid].dmas[VIRTIO_RXQ].async_enabled)
+	if (!dma_bind[vid2socketid[vdev->vid]].dmas[VIRTIO_RXQ].async_enabled)
 		free_pkts(pkts, rx_count);
 }
 
+uint16_t async_dequeue_pkts(struct vhost_dev *dev, uint16_t queue_id,
+			    struct rte_mempool *mbuf_pool,
+			    struct rte_mbuf **pkts, uint16_t count)
+{
+	int nr_inflight;
+	uint16_t dequeue_count;
+	uint16_t dma_id = dma_bind[vid2socketid[dev->vid]].dmas[VIRTIO_TXQ].dev_id;
+
+	dequeue_count = rte_vhost_async_try_dequeue_burst(dev->vid, queue_id,
+			mbuf_pool, pkts, count, &nr_inflight, dma_id, 0);
+
+	return dequeue_count;
+}
+
+uint16_t sync_dequeue_pkts(struct vhost_dev *dev, uint16_t queue_id,
+			   struct rte_mempool *mbuf_pool,
+			   struct rte_mbuf **pkts, uint16_t count)
+{
+	return rte_vhost_dequeue_burst(dev->vid, queue_id, mbuf_pool, pkts, count);
+}
+
 static __rte_always_inline void
 drain_virtio_tx(struct vhost_dev *vdev)
 {
@@ -1393,13 +1439,8 @@ drain_virtio_tx(struct vhost_dev *vdev)
 	uint16_t count;
 	uint16_t i;
 
-	if (builtin_net_driver) {
-		count = vs_dequeue_pkts(vdev, VIRTIO_TXQ, mbuf_pool,
-					pkts, MAX_PKT_BURST);
-	} else {
-		count = rte_vhost_dequeue_burst(vdev->vid, VIRTIO_TXQ,
-					mbuf_pool, pkts, MAX_PKT_BURST);
-	}
+	count = vdev_queue_ops[vdev->vid].dequeue_pkt_burst(vdev,
+				VIRTIO_TXQ, mbuf_pool, pkts, MAX_PKT_BURST);
 
 	/* setup VMDq for the first packet */
 	if (unlikely(vdev->ready == DEVICE_MAC_LEARNING) && count) {
@@ -1478,6 +1519,26 @@ switch_worker(void *arg __rte_unused)
 	return 0;
 }
 
+static void
+vhost_clear_queue_thread_unsafe(struct vhost_dev *vdev, uint16_t queue_id)
+{
+	uint16_t n_pkt = 0;
+	int pkts_inflight;
+
+	int16_t dma_id = dma_bind[vid2socketid[vdev->vid]].dmas[queue_id].dev_id;
+	pkts_inflight = rte_vhost_async_get_inflight_thread_unsafe(vdev->vid, queue_id);
+
+	struct rte_mbuf *m_cpl[pkts_inflight];
+
+	while (pkts_inflight) {
+		n_pkt = rte_vhost_clear_queue_thread_unsafe(vdev->vid, queue_id, m_cpl,
+							pkts_inflight, dma_id, 0);
+		free_pkts(m_cpl, n_pkt);
+		pkts_inflight = rte_vhost_async_get_inflight_thread_unsafe(vdev->vid,
+									queue_id);
+	}
+}
+
 /*
  * Remove a device from the specific data core linked list and from the
  * main linked list. Synchronization  occurs through the use of the
@@ -1535,27 +1596,79 @@ destroy_device(int vid)
 		vdev->vid);
 
 	if (dma_bind[vid].dmas[VIRTIO_RXQ].async_enabled) {
-		uint16_t n_pkt = 0;
-		int pkts_inflight;
-		int16_t dma_id = dma_bind[vid].dmas[VIRTIO_RXQ].dev_id;
-		pkts_inflight = rte_vhost_async_get_inflight_thread_unsafe(vid, VIRTIO_RXQ);
-		struct rte_mbuf *m_cpl[pkts_inflight];
-
-		while (pkts_inflight) {
-			n_pkt = rte_vhost_clear_queue_thread_unsafe(vid, VIRTIO_RXQ,
-						m_cpl, pkts_inflight, dma_id, 0);
-			free_pkts(m_cpl, n_pkt);
-			pkts_inflight = rte_vhost_async_get_inflight_thread_unsafe(vid,
-										VIRTIO_RXQ);
-		}
-
+		vhost_clear_queue_thread_unsafe(vdev, VIRTIO_RXQ);
 		rte_vhost_async_channel_unregister(vid, VIRTIO_RXQ);
 		dma_bind[vid].dmas[VIRTIO_RXQ].async_enabled = false;
 	}
 
+	if (dma_bind[vid].dmas[VIRTIO_TXQ].async_enabled) {
+		vhost_clear_queue_thread_unsafe(vdev, VIRTIO_TXQ);
+		rte_vhost_async_channel_unregister(vid, VIRTIO_TXQ);
+		dma_bind[vid].dmas[VIRTIO_TXQ].async_enabled = false;
+	}
+
 	rte_free(vdev);
 }
 
+static inline int
+get_socketid_by_vid(int vid)
+{
+	int i;
+	char ifname[PATH_MAX];
+	rte_vhost_get_ifname(vid, ifname, sizeof(ifname));
+
+	for (i = 0; i < nb_sockets; i++) {
+		char *file = socket_files + i * PATH_MAX;
+		if (strcmp(file, ifname) == 0)
+			return i;
+	}
+
+	return -1;
+}
+
+static int
+init_vhost_queue_ops(int vid)
+{
+	if (builtin_net_driver) {
+		vdev_queue_ops[vid].enqueue_pkt_burst = builtin_enqueue_pkts;
+		vdev_queue_ops[vid].dequeue_pkt_burst = builtin_dequeue_pkts;
+	} else {
+		if (dma_bind[vid2socketid[vid]].dmas[VIRTIO_RXQ].async_enabled)
+			vdev_queue_ops[vid].enqueue_pkt_burst = async_enqueue_pkts;
+		else
+			vdev_queue_ops[vid].enqueue_pkt_burst = sync_enqueue_pkts;
+
+		if (dma_bind[vid2socketid[vid]].dmas[VIRTIO_TXQ].async_enabled)
+			vdev_queue_ops[vid].dequeue_pkt_burst = async_dequeue_pkts;
+		else
+			vdev_queue_ops[vid].dequeue_pkt_burst = sync_dequeue_pkts;
+	}
+
+	return 0;
+}
+
+static inline int
+vhost_async_channel_register(int vid)
+{
+	int rx_ret = 0, tx_ret = 0;
+
+	if (dma_bind[vid2socketid[vid]].dmas[VIRTIO_RXQ].dev_id != INVALID_DMA_ID) {
+		rx_ret = rte_vhost_async_channel_register(vid, VIRTIO_RXQ);
+		if (rx_ret == 0)
+			dma_bind[vid2socketid[vid]].dmas[VIRTIO_RXQ].async_enabled = true;
+	}
+
+	if (dma_bind[vid2socketid[vid]].dmas[VIRTIO_TXQ].dev_id != INVALID_DMA_ID) {
+		tx_ret = rte_vhost_async_channel_register(vid, VIRTIO_TXQ);
+		if (tx_ret == 0)
+			dma_bind[vid2socketid[vid]].dmas[VIRTIO_TXQ].async_enabled = true;
+	}
+
+	return rx_ret | tx_ret;
+}
+
+
+
 /*
  * A new device is added to a data core. First the device is added to the main linked list
  * and then allocated to a specific data core.
@@ -1567,6 +1680,8 @@ new_device(int vid)
 	uint16_t i;
 	uint32_t device_num_min = num_devices;
 	struct vhost_dev *vdev;
+	int ret;
+
 	vdev = rte_zmalloc("vhost device", sizeof(*vdev), RTE_CACHE_LINE_SIZE);
 	if (vdev == NULL) {
 		RTE_LOG(INFO, VHOST_DATA,
@@ -1589,6 +1704,17 @@ new_device(int vid)
 		}
 	}
 
+	int socketid = get_socketid_by_vid(vid);
+	if (socketid == -1)
+		return -1;
+
+	init_vid2socketid_array(vid, socketid);
+
+	ret =  vhost_async_channel_register(vid);
+
+	if (init_vhost_queue_ops(vid) != 0)
+		return -1;
+
 	if (builtin_net_driver)
 		vs_vhost_net_setup(vdev);
 
@@ -1620,16 +1746,7 @@ new_device(int vid)
 		"(%d) device has been added to data core %d\n",
 		vid, vdev->coreid);
 
-	if (dma_bind[vid].dmas[VIRTIO_RXQ].dev_id != INVALID_DMA_ID) {
-		int ret;
-
-		ret = rte_vhost_async_channel_register(vid, VIRTIO_RXQ);
-		if (ret == 0)
-			dma_bind[vid].dmas[VIRTIO_RXQ].async_enabled = true;
-		return ret;
-	}
-
-	return 0;
+	return ret;
 }
 
 static int
@@ -1647,22 +1764,9 @@ vring_state_changed(int vid, uint16_t queue_id, int enable)
 	if (queue_id != VIRTIO_RXQ)
 		return 0;
 
-	if (dma_bind[vid].dmas[queue_id].async_enabled) {
-		if (!enable) {
-			uint16_t n_pkt = 0;
-			int pkts_inflight;
-			pkts_inflight = rte_vhost_async_get_inflight_thread_unsafe(vid, queue_id);
-			int16_t dma_id = dma_bind[vid].dmas[VIRTIO_RXQ].dev_id;
-			struct rte_mbuf *m_cpl[pkts_inflight];
-
-			while (pkts_inflight) {
-				n_pkt = rte_vhost_clear_queue_thread_unsafe(vid, queue_id,
-							m_cpl, pkts_inflight, dma_id, 0);
-				free_pkts(m_cpl, n_pkt);
-				pkts_inflight = rte_vhost_async_get_inflight_thread_unsafe(vid,
-											queue_id);
-			}
-		}
+	if (dma_bind[vid2socketid[vid]].dmas[queue_id].async_enabled) {
+		if (!enable)
+			vhost_clear_queue_thread_unsafe(vdev, queue_id);
 	}
 
 	return 0;
@@ -1887,7 +1991,7 @@ main(int argc, char *argv[])
 	for (i = 0; i < nb_sockets; i++) {
 		char *file = socket_files + i * PATH_MAX;
 
-		if (dma_count)
+		if (dma_count && get_async_flag_by_socketid(i) != 0)
 			flags = flags | RTE_VHOST_USER_ASYNC_COPY;
 
 		ret = rte_vhost_driver_register(file, flags);
diff --git a/examples/vhost/main.h b/examples/vhost/main.h
index e7f395c3c9..2fcb8376c5 100644
--- a/examples/vhost/main.h
+++ b/examples/vhost/main.h
@@ -61,6 +61,19 @@ struct vhost_dev {
 	struct vhost_queue queues[MAX_QUEUE_PAIRS * 2];
 } __rte_cache_aligned;
 
+typedef uint16_t (*vhost_enqueue_burst_t)(struct vhost_dev *dev,
+			uint16_t queue_id, struct rte_mbuf **pkts,
+			uint32_t count);
+
+typedef uint16_t (*vhost_dequeue_burst_t)(struct vhost_dev *dev,
+			uint16_t queue_id, struct rte_mempool *mbuf_pool,
+			struct rte_mbuf **pkts, uint16_t count);
+
+struct vhost_queue_ops {
+	vhost_enqueue_burst_t enqueue_pkt_burst;
+	vhost_dequeue_burst_t dequeue_pkt_burst;
+};
+
 TAILQ_HEAD(vhost_dev_tailq_list, vhost_dev);
 
 
@@ -87,6 +100,7 @@ struct dma_info {
 
 struct dma_for_vhost {
 	struct dma_info dmas[RTE_MAX_QUEUES_PER_PORT * 2];
+	uint32_t async_flag;
 };
 
 /* we implement non-extra virtio net features */
@@ -97,7 +111,19 @@ void vs_vhost_net_remove(struct vhost_dev *dev);
 uint16_t vs_enqueue_pkts(struct vhost_dev *dev, uint16_t queue_id,
 			 struct rte_mbuf **pkts, uint32_t count);
 
-uint16_t vs_dequeue_pkts(struct vhost_dev *dev, uint16_t queue_id,
-			 struct rte_mempool *mbuf_pool,
-			 struct rte_mbuf **pkts, uint16_t count);
+uint16_t builtin_enqueue_pkts(struct vhost_dev *dev, uint16_t queue_id,
+			struct rte_mbuf **pkts, uint32_t count);
+uint16_t builtin_dequeue_pkts(struct vhost_dev *dev, uint16_t queue_id,
+			struct rte_mempool *mbuf_pool,
+			struct rte_mbuf **pkts, uint16_t count);
+uint16_t sync_enqueue_pkts(struct vhost_dev *dev, uint16_t queue_id,
+			 struct rte_mbuf **pkts, uint32_t count);
+uint16_t sync_dequeue_pkts(struct vhost_dev *dev, uint16_t queue_id,
+			struct rte_mempool *mbuf_pool,
+			struct rte_mbuf **pkts, uint16_t count);
+uint16_t async_enqueue_pkts(struct vhost_dev *dev, uint16_t queue_id,
+			 struct rte_mbuf **pkts, uint32_t count);
+uint16_t async_dequeue_pkts(struct vhost_dev *dev, uint16_t queue_id,
+			struct rte_mempool *mbuf_pool,
+			struct rte_mbuf **pkts, uint16_t count);
 #endif /* _MAIN_H_ */
diff --git a/examples/vhost/virtio_net.c b/examples/vhost/virtio_net.c
index 9064fc3a82..2432a96566 100644
--- a/examples/vhost/virtio_net.c
+++ b/examples/vhost/virtio_net.c
@@ -238,6 +238,13 @@ vs_enqueue_pkts(struct vhost_dev *dev, uint16_t queue_id,
 	return count;
 }
 
+uint16_t
+builtin_enqueue_pkts(struct vhost_dev *dev, uint16_t queue_id,
+		struct rte_mbuf **pkts, uint32_t count)
+{
+	return vs_enqueue_pkts(dev, queue_id, pkts, count);
+}
+
 static __rte_always_inline int
 dequeue_pkt(struct vhost_dev *dev, struct rte_vhost_vring *vr,
 	    struct rte_mbuf *m, uint16_t desc_idx,
@@ -363,7 +370,7 @@ dequeue_pkt(struct vhost_dev *dev, struct rte_vhost_vring *vr,
 	return 0;
 }
 
-uint16_t
+static uint16_t
 vs_dequeue_pkts(struct vhost_dev *dev, uint16_t queue_id,
 	struct rte_mempool *mbuf_pool, struct rte_mbuf **pkts, uint16_t count)
 {
@@ -440,3 +447,10 @@ vs_dequeue_pkts(struct vhost_dev *dev, uint16_t queue_id,
 
 	return i;
 }
+
+uint16_t
+builtin_dequeue_pkts(struct vhost_dev *dev, uint16_t queue_id,
+	struct rte_mempool *mbuf_pool, struct rte_mbuf **pkts, uint16_t count)
+{
+	return vs_dequeue_pkts(dev, queue_id, mbuf_pool, pkts, count);
+}
-- 
2.17.1


^ permalink raw reply	[flat|nested] 73+ messages in thread

* RE: [PATCH v7 4/5] vhost: support async dequeue for split ring
  2022-05-16  2:43   ` [PATCH v7 4/5] vhost: support async dequeue for split ring xuan.ding
@ 2022-05-16  5:52     ` Hu, Jiayu
  2022-05-16  6:10       ` Ding, Xuan
  0 siblings, 1 reply; 73+ messages in thread
From: Hu, Jiayu @ 2022-05-16  5:52 UTC (permalink / raw)
  To: Ding, Xuan, maxime.coquelin, Xia, Chenbo
  Cc: dev, Jiang, Cheng1, Pai G, Sunil, liangma, Wang, YuanX

Hi Xuan,

> -----Original Message-----
> From: Ding, Xuan <xuan.ding@intel.com>
> Sent: Monday, May 16, 2022 10:43 AM
> To: maxime.coquelin@redhat.com; Xia, Chenbo <chenbo.xia@intel.com>
> Cc: dev@dpdk.org; Hu, Jiayu <jiayu.hu@intel.com>; Jiang, Cheng1
> <cheng1.jiang@intel.com>; Pai G, Sunil <sunil.pai.g@intel.com>;
> liangma@liangbit.com; Ding, Xuan <xuan.ding@intel.com>; Wang, YuanX
> <yuanx.wang@intel.com>
> Subject: [PATCH v7 4/5] vhost: support async dequeue for split ring
> 
> From: Xuan Ding <xuan.ding@intel.com>
> 
> This patch implements asynchronous dequeue data path for vhost split ring,
> a new API rte_vhost_async_try_dequeue_burst() is introduced.
> 
> Signed-off-by: Xuan Ding <xuan.ding@intel.com>
> Signed-off-by: Yuan Wang <yuanx.wang@intel.com>
> Tested-by: Yvonne Yang <yvonnex.yang@intel.com>
> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
> ---
>  doc/guides/prog_guide/vhost_lib.rst    |   7 +
>  doc/guides/rel_notes/release_22_07.rst |   5 +
>  lib/vhost/rte_vhost_async.h            |  37 +++
>  lib/vhost/version.map                  |   2 +-
>  lib/vhost/virtio_net.c                 | 337 +++++++++++++++++++++++++
>  5 files changed, 387 insertions(+), 1 deletion(-)
> 
> diff --git a/doc/guides/prog_guide/vhost_lib.rst
> b/doc/guides/prog_guide/vhost_lib.rst
> index f287b76ebf..09c1c24b48 100644
> --- a/doc/guides/prog_guide/vhost_lib.rst
> +++ b/doc/guides/prog_guide/vhost_lib.rst
> @@ -282,6 +282,13 @@ The following is an overview of some key Vhost API
> functions:
>    Clear inflight packets which are submitted to DMA engine in vhost async
> data
>    path. Completed packets are returned to applications through ``pkts``.
> 
> +* ``rte_vhost_async_try_dequeue_burst(int vid, uint16_t queue_id,
> +  struct rte_mempool *mbuf_pool, struct rte_mbuf **pkts, uint16_t count,
> +	int *nr_inflight, uint16_t dma_id, uint16_t vchan_id)``

In the dmadev library, dma_id is int16_t, rather than uint16_t.

> +
> +  Receives (dequeues) ``count`` packets from guest to host in async
> + data path,  and stored them at ``pkts``.
> +
>  Vhost-user Implementations
>  --------------------------
> 
> diff --git a/doc/guides/rel_notes/release_22_07.rst
> b/doc/guides/rel_notes/release_22_07.rst
> index 88b1e478d4..564d88623e 100644
> --- a/doc/guides/rel_notes/release_22_07.rst
> +++ b/doc/guides/rel_notes/release_22_07.rst
> @@ -70,6 +70,11 @@ New Features
>    Added an API which can get the number of inflight packets in
>    vhost async data path without using lock.
> 
> +* **Added vhost async dequeue API to receive pkts from guest.**
> +
> +  Added vhost async dequeue API which can leverage DMA devices to
> + accelerate receiving pkts from guest.
> +
>  Removed Items
>  -------------
> 
> diff --git a/lib/vhost/rte_vhost_async.h b/lib/vhost/rte_vhost_async.h index
> 70234debf9..2789492e38 100644
> --- a/lib/vhost/rte_vhost_async.h
> +++ b/lib/vhost/rte_vhost_async.h
> @@ -204,6 +204,43 @@ uint16_t rte_vhost_clear_queue_thread_unsafe(int
> vid, uint16_t queue_id,  __rte_experimental  int
> rte_vhost_async_dma_configure(int16_t dma_id, uint16_t vchan_id);
> 
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change, or be removed, without prior
> +notice
> + *
> + * This function tries to receive packets from the guest with
> +offloading
> + * copies to the async channel. The packets that are transfer completed

In rte_vhost_async.h, the DMA vChannel is referred to as "async channel" or
"async copy engine". I think it's better to replace all of these with
"DMA vChannel" to be consistent with the rest of DPDK.

> + * are returned in "pkts". The other packets that their copies are
> +submitted to
> + * the async channel but not completed are called "in-flight packets".
> + * This function will not return in-flight packets until their copies
> +are
> + * completed by the async channel.
> + *
> + * @param vid
> + *  ID of vhost device to dequeue data
> + * @param queue_id
> + *  ID of virtqueue to dequeue data
> + * @param mbuf_pool
> + *  Mbuf_pool where host mbuf is allocated
> + * @param pkts
> + *  Blank array to keep successfully dequeued packets
> + * @param count
> + *  Size of the packet array
> + * @param nr_inflight
> + *  >= 0: The amount of in-flight packets

It would be better to add more description of the meaning of "nr_inflight".

Thanks,
Jiayu

^ permalink raw reply	[flat|nested] 73+ messages in thread

* RE: [PATCH v7 4/5] vhost: support async dequeue for split ring
  2022-05-16  5:52     ` Hu, Jiayu
@ 2022-05-16  6:10       ` Ding, Xuan
  0 siblings, 0 replies; 73+ messages in thread
From: Ding, Xuan @ 2022-05-16  6:10 UTC (permalink / raw)
  To: Hu, Jiayu, maxime.coquelin, Xia, Chenbo
  Cc: dev, Jiang, Cheng1, Pai G, Sunil, liangma, Wang, YuanX

Hi Jiayu,

> -----Original Message-----
> From: Hu, Jiayu <jiayu.hu@intel.com>
> Sent: Monday, May 16, 2022 1:53 PM
> To: Ding, Xuan <xuan.ding@intel.com>; maxime.coquelin@redhat.com; Xia,
> Chenbo <chenbo.xia@intel.com>
> Cc: dev@dpdk.org; Jiang, Cheng1 <cheng1.jiang@intel.com>; Pai G, Sunil
> <sunil.pai.g@intel.com>; liangma@liangbit.com; Wang, YuanX
> <yuanx.wang@intel.com>
> Subject: RE: [PATCH v7 4/5] vhost: support async dequeue for split ring
> 
> Hi Xuan,
> 
> > -----Original Message-----
> > From: Ding, Xuan <xuan.ding@intel.com>
> > Sent: Monday, May 16, 2022 10:43 AM
> > To: maxime.coquelin@redhat.com; Xia, Chenbo <chenbo.xia@intel.com>
> > Cc: dev@dpdk.org; Hu, Jiayu <jiayu.hu@intel.com>; Jiang, Cheng1
> > <cheng1.jiang@intel.com>; Pai G, Sunil <sunil.pai.g@intel.com>;
> > liangma@liangbit.com; Ding, Xuan <xuan.ding@intel.com>; Wang, YuanX
> > <yuanx.wang@intel.com>
> > Subject: [PATCH v7 4/5] vhost: support async dequeue for split ring
> >
> > From: Xuan Ding <xuan.ding@intel.com>
> >
> > This patch implements asynchronous dequeue data path for vhost split
> > ring, a new API rte_vhost_async_try_dequeue_burst() is introduced.
> >
> > Signed-off-by: Xuan Ding <xuan.ding@intel.com>
> > Signed-off-by: Yuan Wang <yuanx.wang@intel.com>
> > Tested-by: Yvonne Yang <yvonnex.yang@intel.com>
> > Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
> > ---
> >  doc/guides/prog_guide/vhost_lib.rst    |   7 +
> >  doc/guides/rel_notes/release_22_07.rst |   5 +
> >  lib/vhost/rte_vhost_async.h            |  37 +++
> >  lib/vhost/version.map                  |   2 +-
> >  lib/vhost/virtio_net.c                 | 337 +++++++++++++++++++++++++
> >  5 files changed, 387 insertions(+), 1 deletion(-)
> >
> > diff --git a/doc/guides/prog_guide/vhost_lib.rst
> > b/doc/guides/prog_guide/vhost_lib.rst
> > index f287b76ebf..09c1c24b48 100644
> > --- a/doc/guides/prog_guide/vhost_lib.rst
> > +++ b/doc/guides/prog_guide/vhost_lib.rst
> > @@ -282,6 +282,13 @@ The following is an overview of some key Vhost
> > API
> > functions:
> >    Clear inflight packets which are submitted to DMA engine in vhost
> > async data
> >    path. Completed packets are returned to applications through ``pkts``.
> >
> > +* ``rte_vhost_async_try_dequeue_burst(int vid, uint16_t queue_id,
> > +  struct rte_mempool *mbuf_pool, struct rte_mbuf **pkts, uint16_t
> > +count, int *nr_inflight, uint16_t dma_id, uint16_t vchan_id)``
> 
> In dmadev library, dma_id is int16_t, rather than uint16_t.

Thanks for the reminder. I will fix it in v8.

> 
> > +
> > +  Receives (dequeues) ``count`` packets from guest to host in async
> > + data path,  and stored them at ``pkts``.
> > +
> >  Vhost-user Implementations
> >  --------------------------
> >
> > diff --git a/doc/guides/rel_notes/release_22_07.rst
> > b/doc/guides/rel_notes/release_22_07.rst
> > index 88b1e478d4..564d88623e 100644
> > --- a/doc/guides/rel_notes/release_22_07.rst
> > +++ b/doc/guides/rel_notes/release_22_07.rst
> > @@ -70,6 +70,11 @@ New Features
> >    Added an API which can get the number of inflight packets in
> >    vhost async data path without using lock.
> >
> > +* **Added vhost async dequeue API to receive pkts from guest.**
> > +
> > +  Added vhost async dequeue API which can leverage DMA devices to
> > + accelerate receiving pkts from guest.
> > +
> >  Removed Items
> >  -------------
> >
> > diff --git a/lib/vhost/rte_vhost_async.h b/lib/vhost/rte_vhost_async.h
> > index
> > 70234debf9..2789492e38 100644
> > --- a/lib/vhost/rte_vhost_async.h
> > +++ b/lib/vhost/rte_vhost_async.h
> > @@ -204,6 +204,43 @@ uint16_t
> rte_vhost_clear_queue_thread_unsafe(int
> > vid, uint16_t queue_id,  __rte_experimental  int
> > rte_vhost_async_dma_configure(int16_t dma_id, uint16_t vchan_id);
> >
> > +/**
> > + * @warning
> > + * @b EXPERIMENTAL: this API may change, or be removed, without prior
> > +notice
> > + *
> > + * This function tries to receive packets from the guest with
> > +offloading
> > + * copies to the async channel. The packets that are transfer
> > +completed
> 
> In rte_vhost_async.h, DMA vChannel is referred as async channel or async
> copy engine. But I think it's better to replace all with DMA vChannel to be
> consistent with DPDK.

Thanks for your suggestion. I will refine the API description in the next version.

> 
> > + * are returned in "pkts". The other packets that their copies are
> > +submitted to
> > + * the async channel but not completed are called "in-flight packets".
> > + * This function will not return in-flight packets until their copies
> > +are
> > + * completed by the async channel.
> > + *
> > + * @param vid
> > + *  ID of vhost device to dequeue data
> > + * @param queue_id
> > + *  ID of virtqueue to dequeue data
> > + * @param mbuf_pool
> > + *  Mbuf_pool where host mbuf is allocated
> > + * @param pkts
> > + *  Blank array to keep successfully dequeued packets
> > + * @param count
> > + *  Size of the packet array
> > + * @param nr_inflight
> > + *  >= 0: The amount of in-flight packets
> 
> Better to add more descriptions about the meaning of "nr_inflight".

Same as above.

Regards,
Xuan

> 
> Thanks,
> Jiayu


^ permalink raw reply	[flat|nested] 73+ messages in thread

* [PATCH v8 0/5] vhost: support async dequeue data path
  2022-04-07 15:25 [PATCH v1 0/5] vhost: support async dequeue data path xuan.ding
                   ` (10 preceding siblings ...)
  2022-05-16  2:43 ` [PATCH v7 0/5] vhost: " xuan.ding
@ 2022-05-16 11:10 ` xuan.ding
  2022-05-16 11:10   ` [PATCH v8 1/5] vhost: prepare sync for descriptor to mbuf refactoring xuan.ding
                     ` (5 more replies)
  11 siblings, 6 replies; 73+ messages in thread
From: xuan.ding @ 2022-05-16 11:10 UTC (permalink / raw)
  To: maxime.coquelin, chenbo.xia
  Cc: dev, jiayu.hu, cheng1.jiang, sunil.pai.g, liangma, Xuan Ding

From: Xuan Ding <xuan.ding@intel.com>

The presence of the asynchronous path allows applications to offload memory
copies to the DMA engine, so as to save CPU cycles and improve copy
performance. This patch set implements the vhost async dequeue data path
for split ring. The code is based on the latest enqueue changes [1].

This patch set is a new design and implementation of [2]. Since dmadev
was introduced in DPDK 21.11, to simplify application logic, this patch
set integrates dmadev into vhost. With dmadev integrated, vhost supports
M:N mapping between vrings and DMA virtual channels. Specifically, one
vring can use multiple different DMA channels and one DMA channel can be
shared by multiple vrings at the same time.

A new asynchronous dequeue function is introduced:
        1) rte_vhost_async_try_dequeue_burst(int vid, uint16_t queue_id,
                struct rte_mempool *mbuf_pool, struct rte_mbuf **pkts,
                uint16_t count, int *nr_inflight,
                uint16_t dma_id, uint16_t vchan_id)

        Receives packets from the guest and offloads copies to the DMA
virtual channel.
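
For clarity, a minimal usage sketch of the new API follows (illustration
only, not part of the patch set). It assumes the async channel is already
registered on the virtio TX queue and the DMA device/vchan is configured
via rte_vhost_async_dma_configure(); vid, mbuf_pool, dma_id and vchan_id
are placeholders supplied by the caller, and dma_id is int16_t as of this
revision.

#include <rte_mbuf.h>
#include <rte_mempool.h>
#include <rte_vhost_async.h>

#define VIRTIO_TXQ 1    /* queue index convention used in examples/vhost */
#define BURST_SZ   32

static uint16_t
poll_guest_tx(int vid, struct rte_mempool *mbuf_pool,
              int16_t dma_id, uint16_t vchan_id)
{
        struct rte_mbuf *pkts[BURST_SZ];
        int nr_inflight = 0;
        uint16_t i, n;

        /* Completed packets land in pkts[]; copies still owned by the DMA
         * engine stay in flight and are reported through nr_inflight. */
        n = rte_vhost_async_try_dequeue_burst(vid, VIRTIO_TXQ, mbuf_pool,
                                              pkts, BURST_SZ, &nr_inflight,
                                              dma_id, vchan_id);

        /* A real switch would forward these; the sketch just frees them. */
        for (i = 0; i < n; i++)
                rte_pktmbuf_free(pkts[i]);

        return n;
}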

[1] https://mails.dpdk.org/archives/dev/2022-February/234555.html
[2] https://mails.dpdk.org/archives/dev/2021-September/218591.html

v7->v8:
* change dma_id to int16_t
* refine API documentation

v6->v7:
* correct code formatting
* change some functions to inline

v5->v6:
* adjust EXPERIMENTAL header

v4->v5:
* rebase to latest DPDK
* add some checks

v3->v4:
* fix CI build warnings
* adjust some indentation
* pass vq instead of queue_id

v2->v3:
* fix mbuf not updated correctly for large packets

v1->v2:
* fix a typo
* fix a bug in desc_to_mbuf filling

RFC v3 -> v1:
* add sync and async path descriptor to mbuf refactoring
* add API description in docs

RFC v2 -> RFC v3:
* rebase to latest DPDK version

RFC v1 -> RFC v2:
* fix one bug in example
* rename vchan to vchan_id
* check if dma_id and vchan_id valid
* rework all the logs to new standard

Xuan Ding (5):
  vhost: prepare sync for descriptor to mbuf refactoring
  vhost: prepare async for descriptor to mbuf refactoring
  vhost: merge sync and async descriptor to mbuf filling
  vhost: support async dequeue for split ring
  examples/vhost: support async dequeue data path

 doc/guides/prog_guide/vhost_lib.rst    |   6 +
 doc/guides/rel_notes/release_22_07.rst |   5 +
 doc/guides/sample_app_ug/vhost.rst     |   9 +-
 examples/vhost/main.c                  | 286 ++++++++++-----
 examples/vhost/main.h                  |  32 +-
 examples/vhost/virtio_net.c            |  16 +-
 lib/vhost/rte_vhost_async.h            |  37 ++
 lib/vhost/version.map                  |   2 +-
 lib/vhost/vhost.h                      |   1 +
 lib/vhost/virtio_net.c                 | 473 ++++++++++++++++++++++---
 10 files changed, 715 insertions(+), 152 deletions(-)

-- 
2.17.1


^ permalink raw reply	[flat|nested] 73+ messages in thread

* [PATCH v8 1/5] vhost: prepare sync for descriptor to mbuf refactoring
  2022-05-16 11:10 ` [PATCH v8 0/5] vhost: " xuan.ding
@ 2022-05-16 11:10   ` xuan.ding
  2022-05-16 11:10   ` [PATCH v8 2/5] vhost: prepare async " xuan.ding
                     ` (4 subsequent siblings)
  5 siblings, 0 replies; 73+ messages in thread
From: xuan.ding @ 2022-05-16 11:10 UTC (permalink / raw)
  To: maxime.coquelin, chenbo.xia
  Cc: dev, jiayu.hu, cheng1.jiang, sunil.pai.g, liangma, Xuan Ding

From: Xuan Ding <xuan.ding@intel.com>

This patch extracts the descriptor-to-buffer filling from
copy_desc_to_mbuf() into a dedicated function. Besides, the enqueue
and dequeue paths are refactored to use the same function,
sync_fill_seg(), for preparing batch elements, which simplifies
the code without performance degradation.

Signed-off-by: Xuan Ding <xuan.ding@intel.com>
Tested-by: Yvonne Yang <yvonnex.yang@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
---
 lib/vhost/virtio_net.c | 78 ++++++++++++++++++++----------------------
 1 file changed, 38 insertions(+), 40 deletions(-)

diff --git a/lib/vhost/virtio_net.c b/lib/vhost/virtio_net.c
index 5f432b0d77..d4c94d2a9b 100644
--- a/lib/vhost/virtio_net.c
+++ b/lib/vhost/virtio_net.c
@@ -1030,23 +1030,36 @@ async_mbuf_to_desc_seg(struct virtio_net *dev, struct vhost_virtqueue *vq,
 }
 
 static __rte_always_inline void
-sync_mbuf_to_desc_seg(struct virtio_net *dev, struct vhost_virtqueue *vq,
+sync_fill_seg(struct virtio_net *dev, struct vhost_virtqueue *vq,
 		struct rte_mbuf *m, uint32_t mbuf_offset,
-		uint64_t buf_addr, uint64_t buf_iova, uint32_t cpy_len)
+		uint64_t buf_addr, uint64_t buf_iova, uint32_t cpy_len, bool to_desc)
 {
 	struct batch_copy_elem *batch_copy = vq->batch_copy_elems;
 
 	if (likely(cpy_len > MAX_BATCH_LEN || vq->batch_copy_nb_elems >= vq->size)) {
-		rte_memcpy((void *)((uintptr_t)(buf_addr)),
+		if (to_desc) {
+			rte_memcpy((void *)((uintptr_t)(buf_addr)),
 				rte_pktmbuf_mtod_offset(m, void *, mbuf_offset),
 				cpy_len);
+		} else {
+			rte_memcpy(rte_pktmbuf_mtod_offset(m, void *, mbuf_offset),
+				(void *)((uintptr_t)(buf_addr)),
+				cpy_len);
+		}
 		vhost_log_cache_write_iova(dev, vq, buf_iova, cpy_len);
 		PRINT_PACKET(dev, (uintptr_t)(buf_addr), cpy_len, 0);
 	} else {
-		batch_copy[vq->batch_copy_nb_elems].dst =
-			(void *)((uintptr_t)(buf_addr));
-		batch_copy[vq->batch_copy_nb_elems].src =
-			rte_pktmbuf_mtod_offset(m, void *, mbuf_offset);
+		if (to_desc) {
+			batch_copy[vq->batch_copy_nb_elems].dst =
+				(void *)((uintptr_t)(buf_addr));
+			batch_copy[vq->batch_copy_nb_elems].src =
+				rte_pktmbuf_mtod_offset(m, void *, mbuf_offset);
+		} else {
+			batch_copy[vq->batch_copy_nb_elems].dst =
+				rte_pktmbuf_mtod_offset(m, void *, mbuf_offset);
+			batch_copy[vq->batch_copy_nb_elems].src =
+				(void *)((uintptr_t)(buf_addr));
+		}
 		batch_copy[vq->batch_copy_nb_elems].log_addr = buf_iova;
 		batch_copy[vq->batch_copy_nb_elems].len = cpy_len;
 		vq->batch_copy_nb_elems++;
@@ -1158,9 +1171,9 @@ mbuf_to_desc(struct virtio_net *dev, struct vhost_virtqueue *vq,
 						buf_iova + buf_offset, cpy_len) < 0)
 				goto error;
 		} else {
-			sync_mbuf_to_desc_seg(dev, vq, m, mbuf_offset,
-					buf_addr + buf_offset,
-					buf_iova + buf_offset, cpy_len);
+			sync_fill_seg(dev, vq, m, mbuf_offset,
+				      buf_addr + buf_offset,
+				      buf_iova + buf_offset, cpy_len, true);
 		}
 
 		mbuf_avail  -= cpy_len;
@@ -2473,8 +2486,8 @@ copy_desc_to_mbuf(struct virtio_net *dev, struct vhost_virtqueue *vq,
 		  struct rte_mbuf *m, struct rte_mempool *mbuf_pool,
 		  bool legacy_ol_flags)
 {
-	uint32_t buf_avail, buf_offset;
-	uint64_t buf_addr, buf_len;
+	uint32_t buf_avail, buf_offset, buf_len;
+	uint64_t buf_addr, buf_iova;
 	uint32_t mbuf_avail, mbuf_offset;
 	uint32_t cpy_len;
 	struct rte_mbuf *cur = m, *prev = m;
@@ -2482,16 +2495,13 @@ copy_desc_to_mbuf(struct virtio_net *dev, struct vhost_virtqueue *vq,
 	struct virtio_net_hdr *hdr = NULL;
 	/* A counter to avoid desc dead loop chain */
 	uint16_t vec_idx = 0;
-	struct batch_copy_elem *batch_copy = vq->batch_copy_elems;
-	int error = 0;
 
 	buf_addr = buf_vec[vec_idx].buf_addr;
+	buf_iova = buf_vec[vec_idx].buf_iova;
 	buf_len = buf_vec[vec_idx].buf_len;
 
-	if (unlikely(buf_len < dev->vhost_hlen && nr_vec <= 1)) {
-		error = -1;
-		goto out;
-	}
+	if (unlikely(buf_len < dev->vhost_hlen && nr_vec <= 1))
+		return -1;
 
 	if (virtio_net_with_host_offload(dev)) {
 		if (unlikely(buf_len < sizeof(struct virtio_net_hdr))) {
@@ -2515,11 +2525,12 @@ copy_desc_to_mbuf(struct virtio_net *dev, struct vhost_virtqueue *vq,
 		buf_offset = dev->vhost_hlen - buf_len;
 		vec_idx++;
 		buf_addr = buf_vec[vec_idx].buf_addr;
+		buf_iova = buf_vec[vec_idx].buf_iova;
 		buf_len = buf_vec[vec_idx].buf_len;
 		buf_avail  = buf_len - buf_offset;
 	} else if (buf_len == dev->vhost_hlen) {
 		if (unlikely(++vec_idx >= nr_vec))
-			goto out;
+			goto error;
 		buf_addr = buf_vec[vec_idx].buf_addr;
 		buf_len = buf_vec[vec_idx].buf_len;
 
@@ -2539,22 +2550,9 @@ copy_desc_to_mbuf(struct virtio_net *dev, struct vhost_virtqueue *vq,
 	while (1) {
 		cpy_len = RTE_MIN(buf_avail, mbuf_avail);
 
-		if (likely(cpy_len > MAX_BATCH_LEN ||
-					vq->batch_copy_nb_elems >= vq->size ||
-					(hdr && cur == m))) {
-			rte_memcpy(rte_pktmbuf_mtod_offset(cur, void *,
-						mbuf_offset),
-					(void *)((uintptr_t)(buf_addr +
-							buf_offset)), cpy_len);
-		} else {
-			batch_copy[vq->batch_copy_nb_elems].dst =
-				rte_pktmbuf_mtod_offset(cur, void *,
-						mbuf_offset);
-			batch_copy[vq->batch_copy_nb_elems].src =
-				(void *)((uintptr_t)(buf_addr + buf_offset));
-			batch_copy[vq->batch_copy_nb_elems].len = cpy_len;
-			vq->batch_copy_nb_elems++;
-		}
+		sync_fill_seg(dev, vq, cur, mbuf_offset,
+			      buf_addr + buf_offset,
+			      buf_iova + buf_offset, cpy_len, false);
 
 		mbuf_avail  -= cpy_len;
 		mbuf_offset += cpy_len;
@@ -2567,6 +2565,7 @@ copy_desc_to_mbuf(struct virtio_net *dev, struct vhost_virtqueue *vq,
 				break;
 
 			buf_addr = buf_vec[vec_idx].buf_addr;
+			buf_iova = buf_vec[vec_idx].buf_iova;
 			buf_len = buf_vec[vec_idx].buf_len;
 
 			buf_offset = 0;
@@ -2585,8 +2584,7 @@ copy_desc_to_mbuf(struct virtio_net *dev, struct vhost_virtqueue *vq,
 			if (unlikely(cur == NULL)) {
 				VHOST_LOG_DATA(ERR, "(%s) failed to allocate memory for mbuf.\n",
 						dev->ifname);
-				error = -1;
-				goto out;
+				goto error;
 			}
 
 			prev->next = cur;
@@ -2606,9 +2604,9 @@ copy_desc_to_mbuf(struct virtio_net *dev, struct vhost_virtqueue *vq,
 	if (hdr)
 		vhost_dequeue_offload(dev, hdr, m, legacy_ol_flags);
 
-out:
-
-	return error;
+	return 0;
+error:
+	return -1;
 }
 
 static void
-- 
2.17.1


^ permalink raw reply	[flat|nested] 73+ messages in thread

* [PATCH v8 2/5] vhost: prepare async for descriptor to mbuf refactoring
  2022-05-16 11:10 ` [PATCH v8 0/5] vhost: " xuan.ding
  2022-05-16 11:10   ` [PATCH v8 1/5] vhost: prepare sync for descriptor to mbuf refactoring xuan.ding
@ 2022-05-16 11:10   ` xuan.ding
  2022-05-16 11:10   ` [PATCH v8 3/5] vhost: merge sync and async descriptor to mbuf filling xuan.ding
                     ` (3 subsequent siblings)
  5 siblings, 0 replies; 73+ messages in thread
From: xuan.ding @ 2022-05-16 11:10 UTC (permalink / raw)
  To: maxime.coquelin, chenbo.xia
  Cc: dev, jiayu.hu, cheng1.jiang, sunil.pai.g, liangma, Xuan Ding

From: Xuan Ding <xuan.ding@intel.com>

This patch refactors the vhost async enqueue and dequeue paths to use
the same function, async_fill_seg(), for preparing batch elements,
which simplifies the code without performance degradation.

Signed-off-by: Xuan Ding <xuan.ding@intel.com>
Tested-by: Yvonne Yang <yvonnex.yang@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
---
 lib/vhost/virtio_net.c | 22 ++++++++++++++--------
 1 file changed, 14 insertions(+), 8 deletions(-)

diff --git a/lib/vhost/virtio_net.c b/lib/vhost/virtio_net.c
index d4c94d2a9b..a9e2dcd9ce 100644
--- a/lib/vhost/virtio_net.c
+++ b/lib/vhost/virtio_net.c
@@ -997,13 +997,14 @@ async_iter_reset(struct vhost_async *async)
 }
 
 static __rte_always_inline int
-async_mbuf_to_desc_seg(struct virtio_net *dev, struct vhost_virtqueue *vq,
+async_fill_seg(struct virtio_net *dev, struct vhost_virtqueue *vq,
 		struct rte_mbuf *m, uint32_t mbuf_offset,
-		uint64_t buf_iova, uint32_t cpy_len)
+		uint64_t buf_iova, uint32_t cpy_len, bool to_desc)
 {
 	struct vhost_async *async = vq->async;
 	uint64_t mapped_len;
 	uint32_t buf_offset = 0;
+	void *src, *dst;
 	void *host_iova;
 
 	while (cpy_len) {
@@ -1015,10 +1016,15 @@ async_mbuf_to_desc_seg(struct virtio_net *dev, struct vhost_virtqueue *vq,
 			return -1;
 		}
 
-		if (unlikely(async_iter_add_iovec(dev, async,
-						(void *)(uintptr_t)rte_pktmbuf_iova_offset(m,
-							mbuf_offset),
-						host_iova, (size_t)mapped_len)))
+		if (to_desc) {
+			src = (void *)(uintptr_t)rte_pktmbuf_iova_offset(m, mbuf_offset);
+			dst = host_iova;
+		} else {
+			src = host_iova;
+			dst = (void *)(uintptr_t)rte_pktmbuf_iova_offset(m, mbuf_offset);
+		}
+
+		if (unlikely(async_iter_add_iovec(dev, async, src, dst, (size_t)mapped_len)))
 			return -1;
 
 		cpy_len -= (uint32_t)mapped_len;
@@ -1167,8 +1173,8 @@ mbuf_to_desc(struct virtio_net *dev, struct vhost_virtqueue *vq,
 		cpy_len = RTE_MIN(buf_avail, mbuf_avail);
 
 		if (is_async) {
-			if (async_mbuf_to_desc_seg(dev, vq, m, mbuf_offset,
-						buf_iova + buf_offset, cpy_len) < 0)
+			if (async_fill_seg(dev, vq, m, mbuf_offset,
+					   buf_iova + buf_offset, cpy_len, true) < 0)
 				goto error;
 		} else {
 			sync_fill_seg(dev, vq, m, mbuf_offset,
-- 
2.17.1


^ permalink raw reply	[flat|nested] 73+ messages in thread

* [PATCH v8 3/5] vhost: merge sync and async descriptor to mbuf filling
  2022-05-16 11:10 ` [PATCH v8 0/5] vhost: " xuan.ding
  2022-05-16 11:10   ` [PATCH v8 1/5] vhost: prepare sync for descriptor to mbuf refactoring xuan.ding
  2022-05-16 11:10   ` [PATCH v8 2/5] vhost: prepare async " xuan.ding
@ 2022-05-16 11:10   ` xuan.ding
  2022-05-16 11:10   ` [PATCH v8 4/5] vhost: support async dequeue for split ring xuan.ding
                     ` (2 subsequent siblings)
  5 siblings, 0 replies; 73+ messages in thread
From: xuan.ding @ 2022-05-16 11:10 UTC (permalink / raw)
  To: maxime.coquelin, chenbo.xia
  Cc: dev, jiayu.hu, cheng1.jiang, sunil.pai.g, liangma, Xuan Ding

From: Xuan Ding <xuan.ding@intel.com>

This patch refactors copy_desc_to_mbuf(), used by the sync path, to
support both sync and async descriptor-to-mbuf filling.
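
For context, a simplified sketch of the two resulting call patterns is
shown below (the sync call is taken from this patch, the async call from
the next patch in this series): sync callers pass slot_idx = 0 and
is_async = false, while the async dequeue path records the packet slot
so the virtio net header can be processed at completion time.

        /* sync dequeue */
        err = desc_to_mbuf(dev, vq, buf_vec, nr_vec, pkts[i],
                           mbuf_pool, legacy_ol_flags, 0, false);

        /* async dequeue */
        slot_idx = (async->pkts_idx + pkt_idx) & (vq->size - 1);
        err = desc_to_mbuf(dev, vq, buf_vec, nr_vec, pkt, mbuf_pool,
                           legacy_ol_flags, slot_idx, true);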

Signed-off-by: Xuan Ding <xuan.ding@intel.com>
Tested-by: Yvonne Yang <yvonnex.yang@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
---
 lib/vhost/vhost.h      |  1 +
 lib/vhost/virtio_net.c | 48 ++++++++++++++++++++++++++++++++----------
 2 files changed, 38 insertions(+), 11 deletions(-)

diff --git a/lib/vhost/vhost.h b/lib/vhost/vhost.h
index a9edc271aa..00744b234f 100644
--- a/lib/vhost/vhost.h
+++ b/lib/vhost/vhost.h
@@ -180,6 +180,7 @@ struct async_inflight_info {
 	struct rte_mbuf *mbuf;
 	uint16_t descs; /* num of descs inflight */
 	uint16_t nr_buffers; /* num of buffers inflight for packed ring */
+	struct virtio_net_hdr nethdr;
 };
 
 struct vhost_async {
diff --git a/lib/vhost/virtio_net.c b/lib/vhost/virtio_net.c
index a9e2dcd9ce..5904839d5c 100644
--- a/lib/vhost/virtio_net.c
+++ b/lib/vhost/virtio_net.c
@@ -2487,10 +2487,10 @@ copy_vnet_hdr_from_desc(struct virtio_net_hdr *hdr,
 }
 
 static __rte_always_inline int
-copy_desc_to_mbuf(struct virtio_net *dev, struct vhost_virtqueue *vq,
+desc_to_mbuf(struct virtio_net *dev, struct vhost_virtqueue *vq,
 		  struct buf_vector *buf_vec, uint16_t nr_vec,
 		  struct rte_mbuf *m, struct rte_mempool *mbuf_pool,
-		  bool legacy_ol_flags)
+		  bool legacy_ol_flags, uint16_t slot_idx, bool is_async)
 {
 	uint32_t buf_avail, buf_offset, buf_len;
 	uint64_t buf_addr, buf_iova;
@@ -2501,6 +2501,8 @@ copy_desc_to_mbuf(struct virtio_net *dev, struct vhost_virtqueue *vq,
 	struct virtio_net_hdr *hdr = NULL;
 	/* A counter to avoid desc dead loop chain */
 	uint16_t vec_idx = 0;
+	struct vhost_async *async = vq->async;
+	struct async_inflight_info *pkts_info;
 
 	buf_addr = buf_vec[vec_idx].buf_addr;
 	buf_iova = buf_vec[vec_idx].buf_iova;
@@ -2538,6 +2540,7 @@ copy_desc_to_mbuf(struct virtio_net *dev, struct vhost_virtqueue *vq,
 		if (unlikely(++vec_idx >= nr_vec))
 			goto error;
 		buf_addr = buf_vec[vec_idx].buf_addr;
+		buf_iova = buf_vec[vec_idx].buf_iova;
 		buf_len = buf_vec[vec_idx].buf_len;
 
 		buf_offset = 0;
@@ -2553,12 +2556,25 @@ copy_desc_to_mbuf(struct virtio_net *dev, struct vhost_virtqueue *vq,
 
 	mbuf_offset = 0;
 	mbuf_avail  = m->buf_len - RTE_PKTMBUF_HEADROOM;
+
+	if (is_async) {
+		pkts_info = async->pkts_info;
+		if (async_iter_initialize(dev, async))
+			return -1;
+	}
+
 	while (1) {
 		cpy_len = RTE_MIN(buf_avail, mbuf_avail);
 
-		sync_fill_seg(dev, vq, cur, mbuf_offset,
-			      buf_addr + buf_offset,
-			      buf_iova + buf_offset, cpy_len, false);
+		if (is_async) {
+			if (async_fill_seg(dev, vq, cur, mbuf_offset,
+					   buf_iova + buf_offset, cpy_len, false) < 0)
+				goto error;
+		} else {
+			sync_fill_seg(dev, vq, cur, mbuf_offset,
+				      buf_addr + buf_offset,
+				      buf_iova + buf_offset, cpy_len, false);
+		}
 
 		mbuf_avail  -= cpy_len;
 		mbuf_offset += cpy_len;
@@ -2607,11 +2623,20 @@ copy_desc_to_mbuf(struct virtio_net *dev, struct vhost_virtqueue *vq,
 	prev->data_len = mbuf_offset;
 	m->pkt_len    += mbuf_offset;
 
-	if (hdr)
-		vhost_dequeue_offload(dev, hdr, m, legacy_ol_flags);
+	if (is_async) {
+		async_iter_finalize(async);
+		if (hdr)
+			pkts_info[slot_idx].nethdr = *hdr;
+	} else {
+		if (hdr)
+			vhost_dequeue_offload(dev, hdr, m, legacy_ol_flags);
+	}
 
 	return 0;
 error:
+	if (is_async)
+		async_iter_cancel(async);
+
 	return -1;
 }
 
@@ -2743,8 +2768,8 @@ virtio_dev_tx_split(struct virtio_net *dev, struct vhost_virtqueue *vq,
 			break;
 		}
 
-		err = copy_desc_to_mbuf(dev, vq, buf_vec, nr_vec, pkts[i],
-				mbuf_pool, legacy_ol_flags);
+		err = desc_to_mbuf(dev, vq, buf_vec, nr_vec, pkts[i],
+				   mbuf_pool, legacy_ol_flags, 0, false);
 		if (unlikely(err)) {
 			if (!allocerr_warned) {
 				VHOST_LOG_DATA(ERR, "(%s) failed to copy desc to mbuf.\n",
@@ -2755,6 +2780,7 @@ virtio_dev_tx_split(struct virtio_net *dev, struct vhost_virtqueue *vq,
 			i++;
 			break;
 		}
+
 	}
 
 	if (dropped)
@@ -2936,8 +2962,8 @@ vhost_dequeue_single_packed(struct virtio_net *dev,
 		return -1;
 	}
 
-	err = copy_desc_to_mbuf(dev, vq, buf_vec, nr_vec, pkts,
-				mbuf_pool, legacy_ol_flags);
+	err = desc_to_mbuf(dev, vq, buf_vec, nr_vec, pkts,
+			   mbuf_pool, legacy_ol_flags, 0, false);
 	if (unlikely(err)) {
 		if (!allocerr_warned) {
 			VHOST_LOG_DATA(ERR, "(%s) failed to copy desc to mbuf.\n",
-- 
2.17.1


^ permalink raw reply	[flat|nested] 73+ messages in thread

* [PATCH v8 4/5] vhost: support async dequeue for split ring
  2022-05-16 11:10 ` [PATCH v8 0/5] vhost: " xuan.ding
                     ` (2 preceding siblings ...)
  2022-05-16 11:10   ` [PATCH v8 3/5] vhost: merge sync and async descriptor to mbuf filling xuan.ding
@ 2022-05-16 11:10   ` xuan.ding
  2022-06-16 14:38     ` David Marchand
  2022-05-16 11:10   ` [PATCH v8 5/5] examples/vhost: support async dequeue data path xuan.ding
  2022-05-17 13:22   ` [PATCH v8 0/5] vhost: " Maxime Coquelin
  5 siblings, 1 reply; 73+ messages in thread
From: xuan.ding @ 2022-05-16 11:10 UTC (permalink / raw)
  To: maxime.coquelin, chenbo.xia
  Cc: dev, jiayu.hu, cheng1.jiang, sunil.pai.g, liangma, Xuan Ding, Yuan Wang

From: Xuan Ding <xuan.ding@intel.com>

This patch implements the asynchronous dequeue data path for the vhost
split ring; a new API, rte_vhost_async_try_dequeue_burst(), is
introduced.
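
A minimal usage sketch from an application's point of view is shown
below. It assumes the DMA device (dma_id, vchannel 0) has already been
configured with rte_vhost_async_dma_configure() and the virtqueue
registered with rte_vhost_async_channel_register(); vid, port_id,
mbuf_pool and MAX_PKT_BURST are application-defined.

        struct rte_mbuf *pkts[MAX_PKT_BURST];
        int nr_inflight;
        uint16_t n, sent;

        n = rte_vhost_async_try_dequeue_burst(vid, VIRTIO_TXQ, mbuf_pool,
                        pkts, MAX_PKT_BURST, &nr_inflight, dma_id, 0);
        if (n > 0) {
                sent = rte_eth_tx_burst(port_id, 0, pkts, n);
                if (sent < n)
                        rte_pktmbuf_free_bulk(&pkts[sent], n - sent);
        }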

Signed-off-by: Xuan Ding <xuan.ding@intel.com>
Signed-off-by: Yuan Wang <yuanx.wang@intel.com>
Tested-by: Yvonne Yang <yvonnex.yang@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
---
 doc/guides/prog_guide/vhost_lib.rst    |   6 +
 doc/guides/rel_notes/release_22_07.rst |   5 +
 lib/vhost/rte_vhost_async.h            |  37 +++
 lib/vhost/version.map                  |   2 +-
 lib/vhost/virtio_net.c                 | 337 +++++++++++++++++++++++++
 5 files changed, 386 insertions(+), 1 deletion(-)

diff --git a/doc/guides/prog_guide/vhost_lib.rst b/doc/guides/prog_guide/vhost_lib.rst
index f287b76ebf..98f4509d1a 100644
--- a/doc/guides/prog_guide/vhost_lib.rst
+++ b/doc/guides/prog_guide/vhost_lib.rst
@@ -282,6 +282,12 @@ The following is an overview of some key Vhost API functions:
   Clear inflight packets which are submitted to DMA engine in vhost async data
   path. Completed packets are returned to applications through ``pkts``.
 
+* ``rte_vhost_async_try_dequeue_burst(vid, queue_id, mbuf_pool, pkts, count,
+  nr_inflight, dma_id, vchan_id)``
+
+  Receive ``count`` packets from guest to host in async data path,
+  and store them at ``pkts``.
+
 Vhost-user Implementations
 --------------------------
 
diff --git a/doc/guides/rel_notes/release_22_07.rst b/doc/guides/rel_notes/release_22_07.rst
index 88b1e478d4..564d88623e 100644
--- a/doc/guides/rel_notes/release_22_07.rst
+++ b/doc/guides/rel_notes/release_22_07.rst
@@ -70,6 +70,11 @@ New Features
   Added an API which can get the number of inflight packets in
   vhost async data path without using lock.
 
+* **Added vhost async dequeue API to receive pkts from guest.**
+
+  Added vhost async dequeue API which can leverage DMA devices to
+  accelerate receiving pkts from guest.
+
 Removed Items
 -------------
 
diff --git a/lib/vhost/rte_vhost_async.h b/lib/vhost/rte_vhost_async.h
index 70234debf9..a1e7f674ed 100644
--- a/lib/vhost/rte_vhost_async.h
+++ b/lib/vhost/rte_vhost_async.h
@@ -204,6 +204,43 @@ uint16_t rte_vhost_clear_queue_thread_unsafe(int vid, uint16_t queue_id,
 __rte_experimental
 int rte_vhost_async_dma_configure(int16_t dma_id, uint16_t vchan_id);
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
+ *
+ * This function tries to receive packets from the guest with offloading
+ * copies to the DMA vChannels. Successfully dequeued packets are returned
+ * in "pkts". The other packets that their copies are submitted to
+ * the DMA vChannels but not completed are called "in-flight packets".
+ * This function will not return in-flight packets until their copies are
+ * completed by the DMA vChannels.
+ *
+ * @param vid
+ *  ID of vhost device to dequeue data
+ * @param queue_id
+ *  ID of virtqueue to dequeue data
+ * @param mbuf_pool
+ *  Mbuf_pool where host mbuf is allocated
+ * @param pkts
+ *  Blank array to keep successfully dequeued packets
+ * @param count
+ *  Size of the packet array
+ * @param nr_inflight
+ *  >= 0: The amount of in-flight packets
+ *  -1: Meaningless, indicates failed lock acquisition or invalid queue_id/dma_id
+ * @param dma_id
+ *  The identifier of DMA device
+ * @param vchan_id
+ *  The identifier of virtual DMA channel
+ * @return
+ *  Number of successfully dequeued packets
+ */
+__rte_experimental
+uint16_t
+rte_vhost_async_try_dequeue_burst(int vid, uint16_t queue_id,
+	struct rte_mempool *mbuf_pool, struct rte_mbuf **pkts, uint16_t count,
+	int *nr_inflight, int16_t dma_id, uint16_t vchan_id);
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/vhost/version.map b/lib/vhost/version.map
index 5841315386..8c7211bf0d 100644
--- a/lib/vhost/version.map
+++ b/lib/vhost/version.map
@@ -90,7 +90,7 @@ EXPERIMENTAL {
 
 	# added in 22.07
 	rte_vhost_async_get_inflight_thread_unsafe;
-
+	rte_vhost_async_try_dequeue_burst;
 };
 
 INTERNAL {
diff --git a/lib/vhost/virtio_net.c b/lib/vhost/virtio_net.c
index 5904839d5c..c6b11bcb6f 100644
--- a/lib/vhost/virtio_net.c
+++ b/lib/vhost/virtio_net.c
@@ -3171,3 +3171,340 @@ rte_vhost_dequeue_burst(int vid, uint16_t queue_id,
 
 	return count;
 }
+
+static __rte_always_inline uint16_t
+async_poll_dequeue_completed_split(struct virtio_net *dev, struct vhost_virtqueue *vq,
+		struct rte_mbuf **pkts, uint16_t count, int16_t dma_id,
+		uint16_t vchan_id, bool legacy_ol_flags)
+{
+	uint16_t start_idx, from, i;
+	uint16_t nr_cpl_pkts = 0;
+	struct async_inflight_info *pkts_info = vq->async->pkts_info;
+
+	vhost_async_dma_check_completed(dev, dma_id, vchan_id, VHOST_DMA_MAX_COPY_COMPLETE);
+
+	start_idx = async_get_first_inflight_pkt_idx(vq);
+
+	from = start_idx;
+	while (vq->async->pkts_cmpl_flag[from] && count--) {
+		vq->async->pkts_cmpl_flag[from] = false;
+		from = (from + 1) & (vq->size - 1);
+		nr_cpl_pkts++;
+	}
+
+	if (nr_cpl_pkts == 0)
+		return 0;
+
+	for (i = 0; i < nr_cpl_pkts; i++) {
+		from = (start_idx + i) & (vq->size - 1);
+		pkts[i] = pkts_info[from].mbuf;
+
+		if (virtio_net_with_host_offload(dev))
+			vhost_dequeue_offload(dev, &pkts_info[from].nethdr, pkts[i],
+					      legacy_ol_flags);
+	}
+
+	/* write back completed descs to used ring and update used idx */
+	write_back_completed_descs_split(vq, nr_cpl_pkts);
+	__atomic_add_fetch(&vq->used->idx, nr_cpl_pkts, __ATOMIC_RELEASE);
+	vhost_vring_call_split(dev, vq);
+
+	vq->async->pkts_inflight_n -= nr_cpl_pkts;
+
+	return nr_cpl_pkts;
+}
+
+static __rte_always_inline uint16_t
+virtio_dev_tx_async_split(struct virtio_net *dev, struct vhost_virtqueue *vq,
+		struct rte_mempool *mbuf_pool, struct rte_mbuf **pkts, uint16_t count,
+		int16_t dma_id, uint16_t vchan_id, bool legacy_ol_flags)
+{
+	static bool allocerr_warned;
+	bool dropped = false;
+	uint16_t free_entries;
+	uint16_t pkt_idx, slot_idx = 0;
+	uint16_t nr_done_pkts = 0;
+	uint16_t pkt_err = 0;
+	uint16_t n_xfer;
+	struct vhost_async *async = vq->async;
+	struct async_inflight_info *pkts_info = async->pkts_info;
+	struct rte_mbuf *pkts_prealloc[MAX_PKT_BURST];
+	uint16_t pkts_size = count;
+
+	/**
+	 * The ordering between avail index and
+	 * desc reads needs to be enforced.
+	 */
+	free_entries = __atomic_load_n(&vq->avail->idx, __ATOMIC_ACQUIRE) -
+			vq->last_avail_idx;
+	if (free_entries == 0)
+		goto out;
+
+	rte_prefetch0(&vq->avail->ring[vq->last_avail_idx & (vq->size - 1)]);
+
+	async_iter_reset(async);
+
+	count = RTE_MIN(count, MAX_PKT_BURST);
+	count = RTE_MIN(count, free_entries);
+	VHOST_LOG_DATA(DEBUG, "(%s) about to dequeue %u buffers\n",
+			dev->ifname, count);
+
+	if (rte_pktmbuf_alloc_bulk(mbuf_pool, pkts_prealloc, count))
+		goto out;
+
+	for (pkt_idx = 0; pkt_idx < count; pkt_idx++) {
+		uint16_t head_idx = 0;
+		uint16_t nr_vec = 0;
+		uint16_t to;
+		uint32_t buf_len;
+		int err;
+		struct buf_vector buf_vec[BUF_VECTOR_MAX];
+		struct rte_mbuf *pkt = pkts_prealloc[pkt_idx];
+
+		if (unlikely(fill_vec_buf_split(dev, vq, vq->last_avail_idx,
+						&nr_vec, buf_vec,
+						&head_idx, &buf_len,
+						VHOST_ACCESS_RO) < 0)) {
+			dropped = true;
+			break;
+		}
+
+		err = virtio_dev_pktmbuf_prep(dev, pkt, buf_len);
+		if (unlikely(err)) {
+			/**
+			 * mbuf allocation fails for jumbo packets when external
+			 * buffer allocation is not allowed and linear buffer
+			 * is required. Drop this packet.
+			 */
+			if (!allocerr_warned) {
+				VHOST_LOG_DATA(ERR,
+					"(%s) %s: Failed mbuf alloc of size %d from %s\n",
+					dev->ifname, __func__, buf_len, mbuf_pool->name);
+				allocerr_warned = true;
+			}
+			dropped = true;
+			break;
+		}
+
+		slot_idx = (async->pkts_idx + pkt_idx) & (vq->size - 1);
+		err = desc_to_mbuf(dev, vq, buf_vec, nr_vec, pkt, mbuf_pool,
+					legacy_ol_flags, slot_idx, true);
+		if (unlikely(err)) {
+			if (!allocerr_warned) {
+				VHOST_LOG_DATA(ERR,
+					"(%s) %s: Failed to offload copies to async channel.\n",
+					dev->ifname, __func__);
+				allocerr_warned = true;
+			}
+			dropped = true;
+			break;
+		}
+
+		pkts_info[slot_idx].mbuf = pkt;
+
+		/* store used descs */
+		to = async->desc_idx_split & (vq->size - 1);
+		async->descs_split[to].id = head_idx;
+		async->descs_split[to].len = 0;
+		async->desc_idx_split++;
+
+		vq->last_avail_idx++;
+	}
+
+	if (unlikely(dropped))
+		rte_pktmbuf_free_bulk(&pkts_prealloc[pkt_idx], count - pkt_idx);
+
+	n_xfer = vhost_async_dma_transfer(dev, vq, dma_id, vchan_id, async->pkts_idx,
+					  async->iov_iter, pkt_idx);
+
+	async->pkts_inflight_n += n_xfer;
+
+	pkt_err = pkt_idx - n_xfer;
+	if (unlikely(pkt_err)) {
+		VHOST_LOG_DATA(DEBUG, "(%s) %s: failed to transfer data.\n",
+				dev->ifname, __func__);
+
+		pkt_idx = n_xfer;
+		/* recover available ring */
+		vq->last_avail_idx -= pkt_err;
+
+		/**
+		 * recover async channel copy related structures and free pktmbufs
+		 * for error pkts.
+		 */
+		async->desc_idx_split -= pkt_err;
+		while (pkt_err-- > 0) {
+			rte_pktmbuf_free(pkts_info[slot_idx & (vq->size - 1)].mbuf);
+			slot_idx--;
+		}
+	}
+
+	async->pkts_idx += pkt_idx;
+	if (async->pkts_idx >= vq->size)
+		async->pkts_idx -= vq->size;
+
+out:
+	/* DMA device may serve other queues, unconditionally check completed. */
+	nr_done_pkts = async_poll_dequeue_completed_split(dev, vq, pkts, pkts_size,
+							  dma_id, vchan_id, legacy_ol_flags);
+
+	return nr_done_pkts;
+}
+
+__rte_noinline
+static uint16_t
+virtio_dev_tx_async_split_legacy(struct virtio_net *dev,
+		struct vhost_virtqueue *vq, struct rte_mempool *mbuf_pool,
+		struct rte_mbuf **pkts, uint16_t count,
+		int16_t dma_id, uint16_t vchan_id)
+{
+	return virtio_dev_tx_async_split(dev, vq, mbuf_pool,
+				pkts, count, dma_id, vchan_id, true);
+}
+
+__rte_noinline
+static uint16_t
+virtio_dev_tx_async_split_compliant(struct virtio_net *dev,
+		struct vhost_virtqueue *vq, struct rte_mempool *mbuf_pool,
+		struct rte_mbuf **pkts, uint16_t count,
+		int16_t dma_id, uint16_t vchan_id)
+{
+	return virtio_dev_tx_async_split(dev, vq, mbuf_pool,
+				pkts, count, dma_id, vchan_id, false);
+}
+
+uint16_t
+rte_vhost_async_try_dequeue_burst(int vid, uint16_t queue_id,
+	struct rte_mempool *mbuf_pool, struct rte_mbuf **pkts, uint16_t count,
+	int *nr_inflight, int16_t dma_id, uint16_t vchan_id)
+{
+	struct virtio_net *dev;
+	struct rte_mbuf *rarp_mbuf = NULL;
+	struct vhost_virtqueue *vq;
+	int16_t success = 1;
+
+	dev = get_device(vid);
+	if (!dev || !nr_inflight)
+		return 0;
+
+	*nr_inflight = -1;
+
+	if (unlikely(!(dev->flags & VIRTIO_DEV_BUILTIN_VIRTIO_NET))) {
+		VHOST_LOG_DATA(ERR, "(%s) %s: built-in vhost net backend is disabled.\n",
+				dev->ifname, __func__);
+		return 0;
+	}
+
+	if (unlikely(!is_valid_virt_queue_idx(queue_id, 1, dev->nr_vring))) {
+		VHOST_LOG_DATA(ERR, "(%s) %s: invalid virtqueue idx %d.\n",
+				dev->ifname, __func__, queue_id);
+		return 0;
+	}
+
+	if (unlikely(dma_id < 0 || dma_id >= RTE_DMADEV_DEFAULT_MAX)) {
+		VHOST_LOG_DATA(ERR, "(%s) %s: invalid dma id %d.\n",
+				dev->ifname, __func__, dma_id);
+		return 0;
+	}
+
+	if (unlikely(!dma_copy_track[dma_id].vchans ||
+				!dma_copy_track[dma_id].vchans[vchan_id].pkts_cmpl_flag_addr)) {
+		VHOST_LOG_DATA(ERR, "(%s) %s: invalid channel %d:%u.\n", dev->ifname, __func__,
+				dma_id, vchan_id);
+		return 0;
+	}
+
+	vq = dev->virtqueue[queue_id];
+
+	if (unlikely(rte_spinlock_trylock(&vq->access_lock) == 0))
+		return 0;
+
+	if (unlikely(vq->enabled == 0)) {
+		count = 0;
+		goto out_access_unlock;
+	}
+
+	if (unlikely(!vq->async)) {
+		VHOST_LOG_DATA(ERR, "(%s) %s: async not registered for queue id %d.\n",
+				dev->ifname, __func__, queue_id);
+		count = 0;
+		goto out_access_unlock;
+	}
+
+	if (dev->features & (1ULL << VIRTIO_F_IOMMU_PLATFORM))
+		vhost_user_iotlb_rd_lock(vq);
+
+	if (unlikely(vq->access_ok == 0))
+		if (unlikely(vring_translate(dev, vq) < 0)) {
+			count = 0;
+			goto out;
+		}
+
+	/*
+	 * Construct a RARP broadcast packet, and inject it to the "pkts"
+	 * array, to looks like that guest actually send such packet.
+	 *
+	 * Check user_send_rarp() for more information.
+	 *
+	 * broadcast_rarp shares a cacheline in the virtio_net structure
+	 * with some fields that are accessed during enqueue and
+	 * __atomic_compare_exchange_n causes a write if performed compare
+	 * and exchange. This could result in false sharing between enqueue
+	 * and dequeue.
+	 *
+	 * Prevent unnecessary false sharing by reading broadcast_rarp first
+	 * and only performing compare and exchange if the read indicates it
+	 * is likely to be set.
+	 */
+	if (unlikely(__atomic_load_n(&dev->broadcast_rarp, __ATOMIC_ACQUIRE) &&
+			__atomic_compare_exchange_n(&dev->broadcast_rarp,
+			&success, 0, 0, __ATOMIC_RELEASE, __ATOMIC_RELAXED))) {
+
+		rarp_mbuf = rte_net_make_rarp_packet(mbuf_pool, &dev->mac);
+		if (rarp_mbuf == NULL) {
+			VHOST_LOG_DATA(ERR, "Failed to make RARP packet.\n");
+			count = 0;
+			goto out;
+		}
+		/*
+		 * Inject it to the head of "pkts" array, so that switch's mac
+		 * learning table will get updated first.
+		 */
+		pkts[0] = rarp_mbuf;
+		pkts++;
+		count -= 1;
+	}
+
+	if (unlikely(vq_is_packed(dev))) {
+		static bool not_support_pack_log;
+		if (!not_support_pack_log) {
+			VHOST_LOG_DATA(ERR,
+				"(%s) %s: async dequeue does not support packed ring.\n",
+				dev->ifname, __func__);
+			not_support_pack_log = true;
+		}
+		count = 0;
+		goto out;
+	}
+
+	if (dev->flags & VIRTIO_DEV_LEGACY_OL_FLAGS)
+		count = virtio_dev_tx_async_split_legacy(dev, vq, mbuf_pool, pkts,
+							 count, dma_id, vchan_id);
+	else
+		count = virtio_dev_tx_async_split_compliant(dev, vq, mbuf_pool, pkts,
+							    count, dma_id, vchan_id);
+
+	*nr_inflight = vq->async->pkts_inflight_n;
+
+out:
+	if (dev->features & (1ULL << VIRTIO_F_IOMMU_PLATFORM))
+		vhost_user_iotlb_rd_unlock(vq);
+
+out_access_unlock:
+	rte_spinlock_unlock(&vq->access_lock);
+
+	if (unlikely(rarp_mbuf != NULL))
+		count += 1;
+
+	return count;
+}
-- 
2.17.1


^ permalink raw reply	[flat|nested] 73+ messages in thread

* [PATCH v8 5/5] examples/vhost: support async dequeue data path
  2022-05-16 11:10 ` [PATCH v8 0/5] vhost: " xuan.ding
                     ` (3 preceding siblings ...)
  2022-05-16 11:10   ` [PATCH v8 4/5] vhost: support async dequeue for split ring xuan.ding
@ 2022-05-16 11:10   ` xuan.ding
  2022-05-17 13:22   ` [PATCH v8 0/5] vhost: " Maxime Coquelin
  5 siblings, 0 replies; 73+ messages in thread
From: xuan.ding @ 2022-05-16 11:10 UTC (permalink / raw)
  To: maxime.coquelin, chenbo.xia
  Cc: dev, jiayu.hu, cheng1.jiang, sunil.pai.g, liangma, Xuan Ding,
	Wenwu Ma, Yuan Wang

From: Xuan Ding <xuan.ding@intel.com>

This patch adds a use case for the async dequeue API: the vswitch can
leverage DMA devices to accelerate the vhost async dequeue path.
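
As an illustration, the sample application could be launched as below to
assign DMA channels to both directions (cores, port mask, socket paths
and DMA device addresses are examples only; see the updated --dmas
description in this patch):

        ./dpdk-vhost -l 1-3 -n 4 -- -p 0x1 \
                --socket-file /tmp/vhost-net0 --socket-file /tmp/vhost-net1 \
                --dmas [txd0@00:04.0,rxd0@00:04.2,txd1@00:04.1,rxd1@00:04.3]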

Signed-off-by: Wenwu Ma <wenwux.ma@intel.com>
Signed-off-by: Yuan Wang <yuanx.wang@intel.com>
Signed-off-by: Xuan Ding <xuan.ding@intel.com>
Tested-by: Yvonne Yang <yvonnex.yang@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
---
 doc/guides/sample_app_ug/vhost.rst |   9 +-
 examples/vhost/main.c              | 286 ++++++++++++++++++++---------
 examples/vhost/main.h              |  32 +++-
 examples/vhost/virtio_net.c        |  16 +-
 4 files changed, 245 insertions(+), 98 deletions(-)

diff --git a/doc/guides/sample_app_ug/vhost.rst b/doc/guides/sample_app_ug/vhost.rst
index a6ce4bc8ac..09db965e70 100644
--- a/doc/guides/sample_app_ug/vhost.rst
+++ b/doc/guides/sample_app_ug/vhost.rst
@@ -169,9 +169,12 @@ demonstrates how to use the async vhost APIs. It's used in combination with dmas
 **--dmas**
 This parameter is used to specify the assigned DMA device of a vhost device.
 Async vhost-user net driver will be used if --dmas is set. For example
---dmas [txd0@00:04.0,txd1@00:04.1] means use DMA channel 00:04.0 for vhost
-device 0 enqueue operation and use DMA channel 00:04.1 for vhost device 1
-enqueue operation.
+--dmas [txd0@00:04.0,txd1@00:04.1,rxd0@00:04.2,rxd1@00:04.3] means use
+DMA channel 00:04.0/00:04.2 for vhost device 0 enqueue/dequeue operation
+and use DMA channel 00:04.1/00:04.3 for vhost device 1 enqueue/dequeue
+operation. The index of the device corresponds to the socket file in order,
+that means vhost device 0 is created through the first socket file, vhost
+device 1 is created through the second socket file, and so on.
 
 Common Issues
 -------------
diff --git a/examples/vhost/main.c b/examples/vhost/main.c
index c4d46de1c5..5bc34b0c52 100644
--- a/examples/vhost/main.c
+++ b/examples/vhost/main.c
@@ -63,6 +63,9 @@
 
 #define DMA_RING_SIZE 4096
 
+#define ASYNC_ENQUEUE_VHOST 1
+#define ASYNC_DEQUEUE_VHOST 2
+
 /* number of mbufs in all pools - if specified on command-line. */
 static int total_num_mbufs = NUM_MBUFS_DEFAULT;
 
@@ -116,6 +119,8 @@ static uint32_t burst_rx_retry_num = BURST_RX_RETRIES;
 static char *socket_files;
 static int nb_sockets;
 
+static struct vhost_queue_ops vdev_queue_ops[RTE_MAX_VHOST_DEVICE];
+
 /* empty VMDq configuration structure. Filled in programmatically */
 static struct rte_eth_conf vmdq_conf_default = {
 	.rxmode = {
@@ -205,6 +210,20 @@ struct vhost_bufftable *vhost_txbuff[RTE_MAX_LCORE * RTE_MAX_VHOST_DEVICE];
 #define MBUF_TABLE_DRAIN_TSC	((rte_get_tsc_hz() + US_PER_S - 1) \
 				 / US_PER_S * BURST_TX_DRAIN_US)
 
+static int vid2socketid[RTE_MAX_VHOST_DEVICE];
+
+static inline uint32_t
+get_async_flag_by_socketid(int socketid)
+{
+	return dma_bind[socketid].async_flag;
+}
+
+static inline void
+init_vid2socketid_array(int vid, int socketid)
+{
+	vid2socketid[vid] = socketid;
+}
+
 static inline bool
 is_dma_configured(int16_t dev_id)
 {
@@ -224,7 +243,7 @@ open_dma(const char *value)
 	char *addrs = input;
 	char *ptrs[2];
 	char *start, *end, *substr;
-	int64_t vid;
+	int64_t socketid, vring_id;
 
 	struct rte_dma_info info;
 	struct rte_dma_conf dev_config = { .nb_vchans = 1 };
@@ -262,7 +281,9 @@ open_dma(const char *value)
 
 	while (i < args_nr) {
 		char *arg_temp = dma_arg[i];
+		char *txd, *rxd;
 		uint8_t sub_nr;
+		int async_flag;
 
 		sub_nr = rte_strsplit(arg_temp, strlen(arg_temp), ptrs, 2, '@');
 		if (sub_nr != 2) {
@@ -270,14 +291,23 @@ open_dma(const char *value)
 			goto out;
 		}
 
-		start = strstr(ptrs[0], "txd");
-		if (start == NULL) {
+		txd = strstr(ptrs[0], "txd");
+		rxd = strstr(ptrs[0], "rxd");
+		if (txd) {
+			start = txd;
+			vring_id = VIRTIO_RXQ;
+			async_flag = ASYNC_ENQUEUE_VHOST;
+		} else if (rxd) {
+			start = rxd;
+			vring_id = VIRTIO_TXQ;
+			async_flag = ASYNC_DEQUEUE_VHOST;
+		} else {
 			ret = -1;
 			goto out;
 		}
 
 		start += 3;
-		vid = strtol(start, &end, 0);
+		socketid = strtol(start, &end, 0);
 		if (end == start) {
 			ret = -1;
 			goto out;
@@ -338,7 +368,8 @@ open_dma(const char *value)
 		dmas_id[dma_count++] = dev_id;
 
 done:
-		(dma_info + vid)->dmas[VIRTIO_RXQ].dev_id = dev_id;
+		(dma_info + socketid)->dmas[vring_id].dev_id = dev_id;
+		(dma_info + socketid)->async_flag |= async_flag;
 		i++;
 	}
 out:
@@ -990,7 +1021,7 @@ complete_async_pkts(struct vhost_dev *vdev)
 {
 	struct rte_mbuf *p_cpl[MAX_PKT_BURST];
 	uint16_t complete_count;
-	int16_t dma_id = dma_bind[vdev->vid].dmas[VIRTIO_RXQ].dev_id;
+	int16_t dma_id = dma_bind[vid2socketid[vdev->vid]].dmas[VIRTIO_RXQ].dev_id;
 
 	complete_count = rte_vhost_poll_enqueue_completed(vdev->vid,
 					VIRTIO_RXQ, p_cpl, MAX_PKT_BURST, dma_id, 0);
@@ -1029,22 +1060,7 @@ drain_vhost(struct vhost_dev *vdev)
 	uint16_t nr_xmit = vhost_txbuff[buff_idx]->len;
 	struct rte_mbuf **m = vhost_txbuff[buff_idx]->m_table;
 
-	if (builtin_net_driver) {
-		ret = vs_enqueue_pkts(vdev, VIRTIO_RXQ, m, nr_xmit);
-	} else if (dma_bind[vdev->vid].dmas[VIRTIO_RXQ].async_enabled) {
-		uint16_t enqueue_fail = 0;
-		int16_t dma_id = dma_bind[vdev->vid].dmas[VIRTIO_RXQ].dev_id;
-
-		complete_async_pkts(vdev);
-		ret = rte_vhost_submit_enqueue_burst(vdev->vid, VIRTIO_RXQ, m, nr_xmit, dma_id, 0);
-
-		enqueue_fail = nr_xmit - ret;
-		if (enqueue_fail)
-			free_pkts(&m[ret], nr_xmit - ret);
-	} else {
-		ret = rte_vhost_enqueue_burst(vdev->vid, VIRTIO_RXQ,
-						m, nr_xmit);
-	}
+	ret = vdev_queue_ops[vdev->vid].enqueue_pkt_burst(vdev, VIRTIO_RXQ, m, nr_xmit);
 
 	if (enable_stats) {
 		__atomic_add_fetch(&vdev->stats.rx_total_atomic, nr_xmit,
@@ -1053,7 +1069,7 @@ drain_vhost(struct vhost_dev *vdev)
 				__ATOMIC_SEQ_CST);
 	}
 
-	if (!dma_bind[vdev->vid].dmas[VIRTIO_RXQ].async_enabled)
+	if (!dma_bind[vid2socketid[vdev->vid]].dmas[VIRTIO_RXQ].async_enabled)
 		free_pkts(m, nr_xmit);
 }
 
@@ -1325,6 +1341,32 @@ drain_mbuf_table(struct mbuf_table *tx_q)
 	}
 }
 
+uint16_t
+async_enqueue_pkts(struct vhost_dev *dev, uint16_t queue_id,
+		struct rte_mbuf **pkts, uint32_t rx_count)
+{
+	uint16_t enqueue_count;
+	uint16_t enqueue_fail = 0;
+	uint16_t dma_id = dma_bind[vid2socketid[dev->vid]].dmas[VIRTIO_RXQ].dev_id;
+
+	complete_async_pkts(dev);
+	enqueue_count = rte_vhost_submit_enqueue_burst(dev->vid, queue_id,
+					pkts, rx_count, dma_id, 0);
+
+	enqueue_fail = rx_count - enqueue_count;
+	if (enqueue_fail)
+		free_pkts(&pkts[enqueue_count], enqueue_fail);
+
+	return enqueue_count;
+}
+
+uint16_t
+sync_enqueue_pkts(struct vhost_dev *dev, uint16_t queue_id,
+		struct rte_mbuf **pkts, uint32_t rx_count)
+{
+	return rte_vhost_enqueue_burst(dev->vid, queue_id, pkts, rx_count);
+}
+
 static __rte_always_inline void
 drain_eth_rx(struct vhost_dev *vdev)
 {
@@ -1355,25 +1397,8 @@ drain_eth_rx(struct vhost_dev *vdev)
 		}
 	}
 
-	if (builtin_net_driver) {
-		enqueue_count = vs_enqueue_pkts(vdev, VIRTIO_RXQ,
-						pkts, rx_count);
-	} else if (dma_bind[vdev->vid].dmas[VIRTIO_RXQ].async_enabled) {
-		uint16_t enqueue_fail = 0;
-		int16_t dma_id = dma_bind[vdev->vid].dmas[VIRTIO_RXQ].dev_id;
-
-		complete_async_pkts(vdev);
-		enqueue_count = rte_vhost_submit_enqueue_burst(vdev->vid,
-					VIRTIO_RXQ, pkts, rx_count, dma_id, 0);
-
-		enqueue_fail = rx_count - enqueue_count;
-		if (enqueue_fail)
-			free_pkts(&pkts[enqueue_count], enqueue_fail);
-
-	} else {
-		enqueue_count = rte_vhost_enqueue_burst(vdev->vid, VIRTIO_RXQ,
-						pkts, rx_count);
-	}
+	enqueue_count = vdev_queue_ops[vdev->vid].enqueue_pkt_burst(vdev,
+					VIRTIO_RXQ, pkts, rx_count);
 
 	if (enable_stats) {
 		__atomic_add_fetch(&vdev->stats.rx_total_atomic, rx_count,
@@ -1382,10 +1407,31 @@ drain_eth_rx(struct vhost_dev *vdev)
 				__ATOMIC_SEQ_CST);
 	}
 
-	if (!dma_bind[vdev->vid].dmas[VIRTIO_RXQ].async_enabled)
+	if (!dma_bind[vid2socketid[vdev->vid]].dmas[VIRTIO_RXQ].async_enabled)
 		free_pkts(pkts, rx_count);
 }
 
+uint16_t async_dequeue_pkts(struct vhost_dev *dev, uint16_t queue_id,
+			    struct rte_mempool *mbuf_pool,
+			    struct rte_mbuf **pkts, uint16_t count)
+{
+	int nr_inflight;
+	uint16_t dequeue_count;
+	int16_t dma_id = dma_bind[vid2socketid[dev->vid]].dmas[VIRTIO_TXQ].dev_id;
+
+	dequeue_count = rte_vhost_async_try_dequeue_burst(dev->vid, queue_id,
+			mbuf_pool, pkts, count, &nr_inflight, dma_id, 0);
+
+	return dequeue_count;
+}
+
+uint16_t sync_dequeue_pkts(struct vhost_dev *dev, uint16_t queue_id,
+			   struct rte_mempool *mbuf_pool,
+			   struct rte_mbuf **pkts, uint16_t count)
+{
+	return rte_vhost_dequeue_burst(dev->vid, queue_id, mbuf_pool, pkts, count);
+}
+
 static __rte_always_inline void
 drain_virtio_tx(struct vhost_dev *vdev)
 {
@@ -1393,13 +1439,8 @@ drain_virtio_tx(struct vhost_dev *vdev)
 	uint16_t count;
 	uint16_t i;
 
-	if (builtin_net_driver) {
-		count = vs_dequeue_pkts(vdev, VIRTIO_TXQ, mbuf_pool,
-					pkts, MAX_PKT_BURST);
-	} else {
-		count = rte_vhost_dequeue_burst(vdev->vid, VIRTIO_TXQ,
-					mbuf_pool, pkts, MAX_PKT_BURST);
-	}
+	count = vdev_queue_ops[vdev->vid].dequeue_pkt_burst(vdev,
+				VIRTIO_TXQ, mbuf_pool, pkts, MAX_PKT_BURST);
 
 	/* setup VMDq for the first packet */
 	if (unlikely(vdev->ready == DEVICE_MAC_LEARNING) && count) {
@@ -1478,6 +1519,26 @@ switch_worker(void *arg __rte_unused)
 	return 0;
 }
 
+static void
+vhost_clear_queue_thread_unsafe(struct vhost_dev *vdev, uint16_t queue_id)
+{
+	uint16_t n_pkt = 0;
+	int pkts_inflight;
+
+	int16_t dma_id = dma_bind[vid2socketid[vdev->vid]].dmas[queue_id].dev_id;
+	pkts_inflight = rte_vhost_async_get_inflight_thread_unsafe(vdev->vid, queue_id);
+
+	struct rte_mbuf *m_cpl[pkts_inflight];
+
+	while (pkts_inflight) {
+		n_pkt = rte_vhost_clear_queue_thread_unsafe(vdev->vid, queue_id, m_cpl,
+							pkts_inflight, dma_id, 0);
+		free_pkts(m_cpl, n_pkt);
+		pkts_inflight = rte_vhost_async_get_inflight_thread_unsafe(vdev->vid,
+									queue_id);
+	}
+}
+
 /*
  * Remove a device from the specific data core linked list and from the
  * main linked list. Synchronization  occurs through the use of the
@@ -1535,27 +1596,79 @@ destroy_device(int vid)
 		vdev->vid);
 
 	if (dma_bind[vid].dmas[VIRTIO_RXQ].async_enabled) {
-		uint16_t n_pkt = 0;
-		int pkts_inflight;
-		int16_t dma_id = dma_bind[vid].dmas[VIRTIO_RXQ].dev_id;
-		pkts_inflight = rte_vhost_async_get_inflight_thread_unsafe(vid, VIRTIO_RXQ);
-		struct rte_mbuf *m_cpl[pkts_inflight];
-
-		while (pkts_inflight) {
-			n_pkt = rte_vhost_clear_queue_thread_unsafe(vid, VIRTIO_RXQ,
-						m_cpl, pkts_inflight, dma_id, 0);
-			free_pkts(m_cpl, n_pkt);
-			pkts_inflight = rte_vhost_async_get_inflight_thread_unsafe(vid,
-										VIRTIO_RXQ);
-		}
-
+		vhost_clear_queue_thread_unsafe(vdev, VIRTIO_RXQ);
 		rte_vhost_async_channel_unregister(vid, VIRTIO_RXQ);
 		dma_bind[vid].dmas[VIRTIO_RXQ].async_enabled = false;
 	}
 
+	if (dma_bind[vid].dmas[VIRTIO_TXQ].async_enabled) {
+		vhost_clear_queue_thread_unsafe(vdev, VIRTIO_TXQ);
+		rte_vhost_async_channel_unregister(vid, VIRTIO_TXQ);
+		dma_bind[vid].dmas[VIRTIO_TXQ].async_enabled = false;
+	}
+
 	rte_free(vdev);
 }
 
+static inline int
+get_socketid_by_vid(int vid)
+{
+	int i;
+	char ifname[PATH_MAX];
+	rte_vhost_get_ifname(vid, ifname, sizeof(ifname));
+
+	for (i = 0; i < nb_sockets; i++) {
+		char *file = socket_files + i * PATH_MAX;
+		if (strcmp(file, ifname) == 0)
+			return i;
+	}
+
+	return -1;
+}
+
+static int
+init_vhost_queue_ops(int vid)
+{
+	if (builtin_net_driver) {
+		vdev_queue_ops[vid].enqueue_pkt_burst = builtin_enqueue_pkts;
+		vdev_queue_ops[vid].dequeue_pkt_burst = builtin_dequeue_pkts;
+	} else {
+		if (dma_bind[vid2socketid[vid]].dmas[VIRTIO_RXQ].async_enabled)
+			vdev_queue_ops[vid].enqueue_pkt_burst = async_enqueue_pkts;
+		else
+			vdev_queue_ops[vid].enqueue_pkt_burst = sync_enqueue_pkts;
+
+		if (dma_bind[vid2socketid[vid]].dmas[VIRTIO_TXQ].async_enabled)
+			vdev_queue_ops[vid].dequeue_pkt_burst = async_dequeue_pkts;
+		else
+			vdev_queue_ops[vid].dequeue_pkt_burst = sync_dequeue_pkts;
+	}
+
+	return 0;
+}
+
+static inline int
+vhost_async_channel_register(int vid)
+{
+	int rx_ret = 0, tx_ret = 0;
+
+	if (dma_bind[vid2socketid[vid]].dmas[VIRTIO_RXQ].dev_id != INVALID_DMA_ID) {
+		rx_ret = rte_vhost_async_channel_register(vid, VIRTIO_RXQ);
+		if (rx_ret == 0)
+			dma_bind[vid2socketid[vid]].dmas[VIRTIO_RXQ].async_enabled = true;
+	}
+
+	if (dma_bind[vid2socketid[vid]].dmas[VIRTIO_TXQ].dev_id != INVALID_DMA_ID) {
+		tx_ret = rte_vhost_async_channel_register(vid, VIRTIO_TXQ);
+		if (tx_ret == 0)
+			dma_bind[vid2socketid[vid]].dmas[VIRTIO_TXQ].async_enabled = true;
+	}
+
+	return rx_ret | tx_ret;
+}
+
+
+
 /*
  * A new device is added to a data core. First the device is added to the main linked list
  * and then allocated to a specific data core.
@@ -1567,6 +1680,8 @@ new_device(int vid)
 	uint16_t i;
 	uint32_t device_num_min = num_devices;
 	struct vhost_dev *vdev;
+	int ret;
+
 	vdev = rte_zmalloc("vhost device", sizeof(*vdev), RTE_CACHE_LINE_SIZE);
 	if (vdev == NULL) {
 		RTE_LOG(INFO, VHOST_DATA,
@@ -1589,6 +1704,17 @@ new_device(int vid)
 		}
 	}
 
+	int socketid = get_socketid_by_vid(vid);
+	if (socketid == -1)
+		return -1;
+
+	init_vid2socketid_array(vid, socketid);
+
+	ret =  vhost_async_channel_register(vid);
+
+	if (init_vhost_queue_ops(vid) != 0)
+		return -1;
+
 	if (builtin_net_driver)
 		vs_vhost_net_setup(vdev);
 
@@ -1620,16 +1746,7 @@ new_device(int vid)
 		"(%d) device has been added to data core %d\n",
 		vid, vdev->coreid);
 
-	if (dma_bind[vid].dmas[VIRTIO_RXQ].dev_id != INVALID_DMA_ID) {
-		int ret;
-
-		ret = rte_vhost_async_channel_register(vid, VIRTIO_RXQ);
-		if (ret == 0)
-			dma_bind[vid].dmas[VIRTIO_RXQ].async_enabled = true;
-		return ret;
-	}
-
-	return 0;
+	return ret;
 }
 
 static int
@@ -1647,22 +1764,9 @@ vring_state_changed(int vid, uint16_t queue_id, int enable)
 	if (queue_id != VIRTIO_RXQ)
 		return 0;
 
-	if (dma_bind[vid].dmas[queue_id].async_enabled) {
-		if (!enable) {
-			uint16_t n_pkt = 0;
-			int pkts_inflight;
-			pkts_inflight = rte_vhost_async_get_inflight_thread_unsafe(vid, queue_id);
-			int16_t dma_id = dma_bind[vid].dmas[VIRTIO_RXQ].dev_id;
-			struct rte_mbuf *m_cpl[pkts_inflight];
-
-			while (pkts_inflight) {
-				n_pkt = rte_vhost_clear_queue_thread_unsafe(vid, queue_id,
-							m_cpl, pkts_inflight, dma_id, 0);
-				free_pkts(m_cpl, n_pkt);
-				pkts_inflight = rte_vhost_async_get_inflight_thread_unsafe(vid,
-											queue_id);
-			}
-		}
+	if (dma_bind[vid2socketid[vid]].dmas[queue_id].async_enabled) {
+		if (!enable)
+			vhost_clear_queue_thread_unsafe(vdev, queue_id);
 	}
 
 	return 0;
@@ -1887,7 +1991,7 @@ main(int argc, char *argv[])
 	for (i = 0; i < nb_sockets; i++) {
 		char *file = socket_files + i * PATH_MAX;
 
-		if (dma_count)
+		if (dma_count && get_async_flag_by_socketid(i) != 0)
 			flags = flags | RTE_VHOST_USER_ASYNC_COPY;
 
 		ret = rte_vhost_driver_register(file, flags);
diff --git a/examples/vhost/main.h b/examples/vhost/main.h
index e7f395c3c9..2fcb8376c5 100644
--- a/examples/vhost/main.h
+++ b/examples/vhost/main.h
@@ -61,6 +61,19 @@ struct vhost_dev {
 	struct vhost_queue queues[MAX_QUEUE_PAIRS * 2];
 } __rte_cache_aligned;
 
+typedef uint16_t (*vhost_enqueue_burst_t)(struct vhost_dev *dev,
+			uint16_t queue_id, struct rte_mbuf **pkts,
+			uint32_t count);
+
+typedef uint16_t (*vhost_dequeue_burst_t)(struct vhost_dev *dev,
+			uint16_t queue_id, struct rte_mempool *mbuf_pool,
+			struct rte_mbuf **pkts, uint16_t count);
+
+struct vhost_queue_ops {
+	vhost_enqueue_burst_t enqueue_pkt_burst;
+	vhost_dequeue_burst_t dequeue_pkt_burst;
+};
+
 TAILQ_HEAD(vhost_dev_tailq_list, vhost_dev);
 
 
@@ -87,6 +100,7 @@ struct dma_info {
 
 struct dma_for_vhost {
 	struct dma_info dmas[RTE_MAX_QUEUES_PER_PORT * 2];
+	uint32_t async_flag;
 };
 
 /* we implement non-extra virtio net features */
@@ -97,7 +111,19 @@ void vs_vhost_net_remove(struct vhost_dev *dev);
 uint16_t vs_enqueue_pkts(struct vhost_dev *dev, uint16_t queue_id,
 			 struct rte_mbuf **pkts, uint32_t count);
 
-uint16_t vs_dequeue_pkts(struct vhost_dev *dev, uint16_t queue_id,
-			 struct rte_mempool *mbuf_pool,
-			 struct rte_mbuf **pkts, uint16_t count);
+uint16_t builtin_enqueue_pkts(struct vhost_dev *dev, uint16_t queue_id,
+			struct rte_mbuf **pkts, uint32_t count);
+uint16_t builtin_dequeue_pkts(struct vhost_dev *dev, uint16_t queue_id,
+			struct rte_mempool *mbuf_pool,
+			struct rte_mbuf **pkts, uint16_t count);
+uint16_t sync_enqueue_pkts(struct vhost_dev *dev, uint16_t queue_id,
+			 struct rte_mbuf **pkts, uint32_t count);
+uint16_t sync_dequeue_pkts(struct vhost_dev *dev, uint16_t queue_id,
+			struct rte_mempool *mbuf_pool,
+			struct rte_mbuf **pkts, uint16_t count);
+uint16_t async_enqueue_pkts(struct vhost_dev *dev, uint16_t queue_id,
+			 struct rte_mbuf **pkts, uint32_t count);
+uint16_t async_dequeue_pkts(struct vhost_dev *dev, uint16_t queue_id,
+			struct rte_mempool *mbuf_pool,
+			struct rte_mbuf **pkts, uint16_t count);
 #endif /* _MAIN_H_ */
diff --git a/examples/vhost/virtio_net.c b/examples/vhost/virtio_net.c
index 9064fc3a82..2432a96566 100644
--- a/examples/vhost/virtio_net.c
+++ b/examples/vhost/virtio_net.c
@@ -238,6 +238,13 @@ vs_enqueue_pkts(struct vhost_dev *dev, uint16_t queue_id,
 	return count;
 }
 
+uint16_t
+builtin_enqueue_pkts(struct vhost_dev *dev, uint16_t queue_id,
+		struct rte_mbuf **pkts, uint32_t count)
+{
+	return vs_enqueue_pkts(dev, queue_id, pkts, count);
+}
+
 static __rte_always_inline int
 dequeue_pkt(struct vhost_dev *dev, struct rte_vhost_vring *vr,
 	    struct rte_mbuf *m, uint16_t desc_idx,
@@ -363,7 +370,7 @@ dequeue_pkt(struct vhost_dev *dev, struct rte_vhost_vring *vr,
 	return 0;
 }
 
-uint16_t
+static uint16_t
 vs_dequeue_pkts(struct vhost_dev *dev, uint16_t queue_id,
 	struct rte_mempool *mbuf_pool, struct rte_mbuf **pkts, uint16_t count)
 {
@@ -440,3 +447,10 @@ vs_dequeue_pkts(struct vhost_dev *dev, uint16_t queue_id,
 
 	return i;
 }
+
+uint16_t
+builtin_dequeue_pkts(struct vhost_dev *dev, uint16_t queue_id,
+	struct rte_mempool *mbuf_pool, struct rte_mbuf **pkts, uint16_t count)
+{
+	return vs_dequeue_pkts(dev, queue_id, mbuf_pool, pkts, count);
+}
-- 
2.17.1


^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH v8 0/5] vhost: support async dequeue data path
  2022-05-16 11:10 ` [PATCH v8 0/5] vhost: " xuan.ding
                     ` (4 preceding siblings ...)
  2022-05-16 11:10   ` [PATCH v8 5/5] examples/vhost: support async dequeue data path xuan.ding
@ 2022-05-17 13:22   ` Maxime Coquelin
  5 siblings, 0 replies; 73+ messages in thread
From: Maxime Coquelin @ 2022-05-17 13:22 UTC (permalink / raw)
  To: xuan.ding, chenbo.xia; +Cc: dev, jiayu.hu, cheng1.jiang, sunil.pai.g, liangma



On 5/16/22 13:10, xuan.ding@intel.com wrote:
> From: Xuan Ding <xuan.ding@intel.com>
> 
> The presence of asynchronous path allows applications to offload memory
> copies to DMA engine, so as to save CPU cycles and improve the copy
> performance. This patch set implements vhost async dequeue data path
> for split ring. The code is based on latest enqueue changes [1].
> 
> This patch set is a new design and implementation of [2]. Since dmadev
> was introduced in DPDK 21.11, to simplify application logics, this patch
> integrates dmadev in vhost. With dmadev integrated, vhost supports M:N
> mapping between vrings and DMA virtual channels. Specifically, one vring
> can use multiple different DMA channels and one DMA channel can be
> shared by multiple vrings at the same time.
> 
> A new asynchronous dequeue function is introduced:
>          1) rte_vhost_async_try_dequeue_burst(int vid, uint16_t queue_id,
>                  struct rte_mempool *mbuf_pool, struct rte_mbuf **pkts,
>                  uint16_t count, int *nr_inflight,
>                  uint16_t dma_id, uint16_t vchan_id)
> 
>          Receive packets from the guest and offloads copies to DMA
> virtual channel.
> 
> [1] https://mails.dpdk.org/archives/dev/2022-February/234555.html
> [2] https://mails.dpdk.org/archives/dev/2021-September/218591.html
> 
> v7->v8:
> * change dma_id to int16_t
> * refine API documentation
> 
> v6->v7:
> * correct code formatting
> * change some functions to inline
> 
> v5->v6:
> * adjust EXPERIMENTAL header
> 
> v4->v5:
> * rebase to latest DPDK
> * add some checks
> 
> v3->v4:
> * fix CI build warnings
> * adjust some indentation
> * pass vq instead of queue_id
> 
> v2->v3:
> * fix mbuf not updated correctly for large packets
> 
> v1->v2:
> * fix a typo
> * fix a bug in desc_to_mbuf filling
> 
> RFC v3 -> v1:
> * add sync and async path descriptor to mbuf refactoring
> * add API description in docs
> 
> RFC v2 -> RFC v3:
> * rebase to latest DPDK version
> 
> RFC v1 -> RFC v2:
> * fix one bug in example
> * rename vchan to vchan_id
> * check if dma_id and vchan_id valid
> * rework all the logs to new standard
> 
> Xuan Ding (5):
>    vhost: prepare sync for descriptor to mbuf refactoring
>    vhost: prepare async for descriptor to mbuf refactoring
>    vhost: merge sync and async descriptor to mbuf filling
>    vhost: support async dequeue for split ring
>    examples/vhost: support async dequeue data path
> 
>   doc/guides/prog_guide/vhost_lib.rst    |   6 +
>   doc/guides/rel_notes/release_22_07.rst |   5 +
>   doc/guides/sample_app_ug/vhost.rst     |   9 +-
>   examples/vhost/main.c                  | 286 ++++++++++-----
>   examples/vhost/main.h                  |  32 +-
>   examples/vhost/virtio_net.c            |  16 +-
>   lib/vhost/rte_vhost_async.h            |  37 ++
>   lib/vhost/version.map                  |   2 +-
>   lib/vhost/vhost.h                      |   1 +
>   lib/vhost/virtio_net.c                 | 473 ++++++++++++++++++++++---
>   10 files changed, 715 insertions(+), 152 deletions(-)
> 

Applied to dpdk-next-virtio/main.

Thanks,
Maxime


^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH v8 4/5] vhost: support async dequeue for split ring
  2022-05-16 11:10   ` [PATCH v8 4/5] vhost: support async dequeue for split ring xuan.ding
@ 2022-06-16 14:38     ` David Marchand
  2022-06-16 14:40       ` David Marchand
  0 siblings, 1 reply; 73+ messages in thread
From: David Marchand @ 2022-06-16 14:38 UTC (permalink / raw)
  To: Xuan Ding
  Cc: Maxime Coquelin, Xia, Chenbo, dev, Jiayu Hu, Cheng Jiang,
	Sunil Pai G, liangma, Yuan Wang, Mcnamara, John

On Mon, May 16, 2022 at 1:16 PM <xuan.ding@intel.com> wrote:
> +static __rte_always_inline uint16_t
> +virtio_dev_tx_async_split(struct virtio_net *dev, struct vhost_virtqueue *vq,
> +               struct rte_mempool *mbuf_pool, struct rte_mbuf **pkts, uint16_t count,
> +               int16_t dma_id, uint16_t vchan_id, bool legacy_ol_flags)
> +{
> +       static bool allocerr_warned;
> +       bool dropped = false;
> +       uint16_t free_entries;
> +       uint16_t pkt_idx, slot_idx = 0;
> +       uint16_t nr_done_pkts = 0;
> +       uint16_t pkt_err = 0;
> +       uint16_t n_xfer;
> +       struct vhost_async *async = vq->async;
> +       struct async_inflight_info *pkts_info = async->pkts_info;
> +       struct rte_mbuf *pkts_prealloc[MAX_PKT_BURST];

Why do we need this array?
Plus, see below.

> +       uint16_t pkts_size = count;
> +
> +       /**
> +        * The ordering between avail index and
> +        * desc reads needs to be enforced.
> +        */
> +       free_entries = __atomic_load_n(&vq->avail->idx, __ATOMIC_ACQUIRE) -
> +                       vq->last_avail_idx;
> +       if (free_entries == 0)
> +               goto out;
> +
> +       rte_prefetch0(&vq->avail->ring[vq->last_avail_idx & (vq->size - 1)]);
> +
> +       async_iter_reset(async);
> +
> +       count = RTE_MIN(count, MAX_PKT_BURST);
> +       count = RTE_MIN(count, free_entries);
> +       VHOST_LOG_DATA(DEBUG, "(%s) about to dequeue %u buffers\n",
> +                       dev->ifname, count);
> +
> +       if (rte_pktmbuf_alloc_bulk(mbuf_pool, pkts_prealloc, count))

'count' is provided by the user of the vhost async dequeue public API.
There is no check that it is not bigger than MAX_PKT_BURST.

Calling rte_pktmbuf_alloc_bulk on a fixed-size array pkts_prealloc,
allocated on the stack, it may cause a stack overflow.



This code is mostly copy/pasted from the "sync" code.
I see a fix on the stats has been sent.
I point here another bug.
There are probably more...

<grmbl>
I don't like how async code has been added in the vhost library by Intel.

Maxime did a cleanup on the enqueue patch
https://patchwork.dpdk.org/project/dpdk/list/?series=20020&state=%2A&archive=both.
I see that the recent dequeue path additions have the same method of
copying/pasting code and adding some branches in a non systematic way.
Please clean this code and stop copy/pasting without a valid reason.
</grmbl>


-- 
David Marchand


^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH v8 4/5] vhost: support async dequeue for split ring
  2022-06-16 14:38     ` David Marchand
@ 2022-06-16 14:40       ` David Marchand
  2022-06-17  6:34         ` Ding, Xuan
  0 siblings, 1 reply; 73+ messages in thread
From: David Marchand @ 2022-06-16 14:40 UTC (permalink / raw)
  To: Xuan Ding
  Cc: Maxime Coquelin, Xia, Chenbo, dev, Jiayu Hu, Cheng Jiang,
	Sunil Pai G, liangma, Yuan Wang, Mcnamara, John

On Thu, Jun 16, 2022 at 4:38 PM David Marchand
<david.marchand@redhat.com> wrote:
>
> On Mon, May 16, 2022 at 1:16 PM <xuan.ding@intel.com> wrote:
> > +static __rte_always_inline uint16_t
> > +virtio_dev_tx_async_split(struct virtio_net *dev, struct vhost_virtqueue *vq,
> > +               struct rte_mempool *mbuf_pool, struct rte_mbuf **pkts, uint16_t count,
> > +               int16_t dma_id, uint16_t vchan_id, bool legacy_ol_flags)
> > +{
> > +       static bool allocerr_warned;
> > +       bool dropped = false;
> > +       uint16_t free_entries;
> > +       uint16_t pkt_idx, slot_idx = 0;
> > +       uint16_t nr_done_pkts = 0;
> > +       uint16_t pkt_err = 0;
> > +       uint16_t n_xfer;
> > +       struct vhost_async *async = vq->async;
> > +       struct async_inflight_info *pkts_info = async->pkts_info;
> > +       struct rte_mbuf *pkts_prealloc[MAX_PKT_BURST];
>
> Why do we need this array?
> Plus, see below.
>
> > +       uint16_t pkts_size = count;
> > +
> > +       /**
> > +        * The ordering between avail index and
> > +        * desc reads needs to be enforced.
> > +        */
> > +       free_entries = __atomic_load_n(&vq->avail->idx, __ATOMIC_ACQUIRE) -
> > +                       vq->last_avail_idx;
> > +       if (free_entries == 0)
> > +               goto out;
> > +
> > +       rte_prefetch0(&vq->avail->ring[vq->last_avail_idx & (vq->size - 1)]);
> > +
> > +       async_iter_reset(async);
> > +
> > +       count = RTE_MIN(count, MAX_PKT_BURST);

^^^
Ok, my point about the overflow does not stand.
Just the pkts_prealloc array is probably useless.

> > +       count = RTE_MIN(count, free_entries);
> > +       VHOST_LOG_DATA(DEBUG, "(%s) about to dequeue %u buffers\n",
> > +                       dev->ifname, count);
> > +
> > +       if (rte_pktmbuf_alloc_bulk(mbuf_pool, pkts_prealloc, count))
>
> 'count' is provided by the user of the vhost async dequeue public API.
> There is no check that it is not bigger than MAX_PKT_BURST.
>
> Calling rte_pktmbuf_alloc_bulk() on pkts_prealloc, a fixed-size array
> allocated on the stack, may cause a stack overflow.

The rest still stands for me.
vvv

>
>
>
> This code is mostly copy/pasted from the "sync" code.
> I see a fix on the stats has been sent.
> I point here another bug.
> There are probably more...
>
> <grmbl>
> I don't like how async code has been added in the vhost library by Intel.
>
> Maxime did a cleanup on the enqueue patch
> https://patchwork.dpdk.org/project/dpdk/list/?series=20020&state=%2A&archive=both.
> I see that the recent dequeue path additions have the same method of
> copying/pasting code and adding some branches in a non systematic way.
> Please clean this code and stop copy/pasting without a valid reason.
> </grmbl>


-- 
David Marchand


^ permalink raw reply	[flat|nested] 73+ messages in thread

* RE: [PATCH v8 4/5] vhost: support async dequeue for split ring
  2022-06-16 14:40       ` David Marchand
@ 2022-06-17  6:34         ` Ding, Xuan
  0 siblings, 0 replies; 73+ messages in thread
From: Ding, Xuan @ 2022-06-17  6:34 UTC (permalink / raw)
  To: David Marchand
  Cc: Maxime Coquelin, Xia, Chenbo, dev, Hu, Jiayu, Jiang, Cheng1,
	Pai G, Sunil, liangma, Wang, YuanX, Mcnamara, John

Hi David,

> -----Original Message-----
> From: David Marchand <david.marchand@redhat.com>
> Sent: Thursday, June 16, 2022 10:40 PM
> To: Ding, Xuan <xuan.ding@intel.com>
> Cc: Maxime Coquelin <maxime.coquelin@redhat.com>; Xia, Chenbo
> <chenbo.xia@intel.com>; dev <dev@dpdk.org>; Hu, Jiayu
> <jiayu.hu@intel.com>; Jiang, Cheng1 <cheng1.jiang@intel.com>; Pai G, Sunil
> <sunil.pai.g@intel.com>; liangma@liangbit.com; Wang, YuanX
> <yuanx.wang@intel.com>; Mcnamara, John <john.mcnamara@intel.com>
> Subject: Re: [PATCH v8 4/5] vhost: support async dequeue for split ring
> 
> On Thu, Jun 16, 2022 at 4:38 PM David Marchand
> <david.marchand@redhat.com> wrote:
> >
> > On Mon, May 16, 2022 at 1:16 PM <xuan.ding@intel.com> wrote:
> > > +static __rte_always_inline uint16_t
> > > +virtio_dev_tx_async_split(struct virtio_net *dev, struct vhost_virtqueue *vq,
> > > +               struct rte_mempool *mbuf_pool, struct rte_mbuf **pkts, uint16_t count,
> > > +               int16_t dma_id, uint16_t vchan_id, bool legacy_ol_flags)
> > > +{
> > > +       static bool allocerr_warned;
> > > +       bool dropped = false;
> > > +       uint16_t free_entries;
> > > +       uint16_t pkt_idx, slot_idx = 0;
> > > +       uint16_t nr_done_pkts = 0;
> > > +       uint16_t pkt_err = 0;
> > > +       uint16_t n_xfer;
> > > +       struct vhost_async *async = vq->async;
> > > +       struct async_inflight_info *pkts_info = async->pkts_info;
> > > +       struct rte_mbuf *pkts_prealloc[MAX_PKT_BURST];
> >
> > Why do we need this array?
> > Plus, see below.
> >
> > > +       uint16_t pkts_size = count;
> > > +
> > > +       /**
> > > +        * The ordering between avail index and
> > > +        * desc reads needs to be enforced.
> > > +        */
> > > +       free_entries = __atomic_load_n(&vq->avail->idx, __ATOMIC_ACQUIRE) -
> > > +                       vq->last_avail_idx;
> > > +       if (free_entries == 0)
> > > +               goto out;
> > > +
> > > +       rte_prefetch0(&vq->avail->ring[vq->last_avail_idx & (vq->size - 1)]);
> > > +
> > > +       async_iter_reset(async);
> > > +
> > > +       count = RTE_MIN(count, MAX_PKT_BURST);
> 
> ^^^
> Ok, my point about the overflow does not stand.
> Just the pkts_prealloc array is probably useless.

Allocating the mbufs in bulk with rte_pktmbuf_alloc_bulk() is done for performance reasons.
The pkts_prealloc array is a temporary holder for the freshly allocated mbufs so that
async_inflight_info can be updated in order, which the async path requires.
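To illustrate the intent, here is a simplified sketch of the pattern (not the exact patch
code: the helper name prealloc_and_record is made up for illustration, the descriptor
mapping, DMA submission and error handling are omitted, and MAX_PKT_BURST and
struct async_inflight_info refer to the vhost-internal definitions):

static uint16_t
prealloc_and_record(struct rte_mempool *mbuf_pool,
		struct async_inflight_info *pkts_info,
		uint16_t pkts_idx, uint16_t vq_size,
		uint16_t count, uint16_t free_entries)
{
	struct rte_mbuf *pkts_prealloc[MAX_PKT_BURST];
	uint16_t pkt_idx;

	count = RTE_MIN(count, MAX_PKT_BURST);	/* bounds the stack array */
	count = RTE_MIN(count, free_entries);

	/* one allocation call for the whole burst */
	if (rte_pktmbuf_alloc_bulk(mbuf_pool, pkts_prealloc, count))
		return 0;

	for (pkt_idx = 0; pkt_idx < count; pkt_idx++) {
		/* vq_size is a power of two for split rings */
		uint16_t slot_idx = (pkts_idx + pkt_idx) & (vq_size - 1);

		/* ordered update of the in-flight bookkeeping */
		pkts_info[slot_idx].mbuf = pkts_prealloc[pkt_idx];
	}

	return count;
}

So the array lets a single bulk allocation serve the whole burst while each in-flight
slot is still filled in ring order.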

> 
> > > +       count = RTE_MIN(count, free_entries);
> > > +       VHOST_LOG_DATA(DEBUG, "(%s) about to dequeue %u buffers\n",
> > > +                       dev->ifname, count);
> > > +
> > > +       if (rte_pktmbuf_alloc_bulk(mbuf_pool, pkts_prealloc, count))
> >
> > 'count' is provided by the user of the vhost async dequeue public API.
> > There is no check that it is not bigger than MAX_PKT_BURST.
> >
> > Calling rte_pktmbuf_alloc_bulk() on pkts_prealloc, a fixed-size array
> > allocated on the stack, may cause a stack overflow.
> 
> The rest still stands for me.
> vvv
> 
> >
> >
> >
> > This code is mostly copy/pasted from the "sync" code.
> > I see a fix on the stats has been sent.

Let me explain the fix here.

The async dequeue patches and the stats patches were both sent out and merged in this
release, and the stats patch requires changes in both the sync/async enqueue and
dequeue paths. Sorry for not noticing that change in the stats patch, which was merged
first; that is the reason for this bug.

> > I point here another bug.
> > There are probably more...
> >
> > <grmbl>
> > I don't like how async code has been added in the vhost library by Intel.
> >
> > Maxime did a cleanup on the enqueue patch
> > https://patchwork.dpdk.org/project/dpdk/list/?series=20020&state=%2A&archive=both.
> > I see that the recent dequeue path additions have the same method of
> > copying/pasting code and adding some branches in a non-systematic way.
> > Please clean this code and stop copy/pasting without a valid reason.
> > </grmbl>

The cleanup of the code in the dequeue patch was suggested in RFC v3.

https://patchwork.dpdk.org/project/dpdk/patch/20220310065407.17145-2-xuan.ding@intel.com/

By merging copy_desc_to_mbuf and async_desc_to_mbuf into a single function and reusing the
fill_seg function, the code can be simplified without performance degradation.
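To make the shape of that refactoring concrete, here is a small standalone sketch (all
names here are illustrative, it is not the vhost code itself): the walk over the buffer
segments is written once, and only the per-segment fill step differs between the sync
and async paths:

#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

struct seg { const uint8_t *addr; size_t len; };
struct copy_job { uint8_t *dst; const uint8_t *src; size_t len; };

/* sync fill: copy the segment immediately on the CPU */
static void sync_fill_seg(uint8_t *dst, const struct seg *s)
{
	memcpy(dst, s->addr, s->len);
}

/* async fill: only record the copy so a DMA engine could do it later */
static void async_fill_seg(struct copy_job *job, uint8_t *dst, const struct seg *s)
{
	job->dst = dst;
	job->src = s->addr;
	job->len = s->len;
}

/* the merged helper: one walk, the is_async flag selects the fill */
static size_t fill_segs(uint8_t *dst, const struct seg *segs, int nr_seg,
		bool is_async, struct copy_job *jobs)
{
	size_t off = 0;
	int i;

	for (i = 0; i < nr_seg; i++) {
		if (is_async)
			async_fill_seg(&jobs[i], dst + off, &segs[i]);
		else
			sync_fill_seg(dst + off, &segs[i]);
		off += segs[i].len;
	}
	return off;
}

int main(void)
{
	const struct seg segs[2] = {
		{ (const uint8_t *)"hello ", 6 },
		{ (const uint8_t *)"vhost", 5 },
	};
	uint8_t buf[16] = { 0 };
	struct copy_job jobs[2];

	fill_segs(buf, segs, 2, false, NULL);	/* sync: data copied now */
	printf("sync: %s\n", (const char *)buf);

	fill_segs(buf, segs, 2, true, jobs);	/* async: jobs recorded only */
	printf("async: %zu bytes described in 2 jobs\n",
			jobs[0].len + jobs[1].len);
	return 0;
}

The real code of course also handles the virtio-net header, guest address translation
and the async iterator, but the control flow follows this shape.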

Could you help point out which parts of the current code need further cleanup?
Are you referring to abstracting the common parts of the async and sync dequeue APIs
into a shared function, such as ARAP?

https://patchwork.dpdk.org/project/dpdk/patch/20220516111041.63914-5-xuan.ding@intel.com/

There is code duplication between the async and sync dequeue APIs, but those parts are both needed.
The dequeue cleanup is by no means mindless copy/pasting.

Hope to get your insights.

Thanks,
Xuan

> 
> 
> --
> David Marchand


^ permalink raw reply	[flat|nested] 73+ messages in thread

end of thread, other threads:[~2022-06-17  6:34 UTC | newest]

Thread overview: 73+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-04-07 15:25 [PATCH v1 0/5] vhost: support async dequeue data path xuan.ding
2022-04-07 15:25 ` [PATCH v1 1/5] vhost: prepare sync for descriptor to mbuf refactoring xuan.ding
2022-04-07 15:25 ` [PATCH v1 2/5] vhost: prepare async " xuan.ding
2022-04-07 15:25 ` [PATCH v1 3/5] vhost: merge sync and async descriptor to mbuf filling xuan.ding
2022-04-07 15:25 ` [PATCH v1 4/5] vhost: support async dequeue for split ring xuan.ding
2022-04-07 15:25 ` [PATCH v1 5/5] examples/vhost: support async dequeue data path xuan.ding
2022-04-11 10:00 ` [PATCH v2 0/5] vhost: " xuan.ding
2022-04-11 10:00   ` [PATCH v2 1/5] vhost: prepare sync for descriptor to mbuf refactoring xuan.ding
2022-04-11 10:00   ` [PATCH v2 2/5] vhost: prepare async " xuan.ding
2022-04-11 10:00   ` [PATCH v2 3/5] vhost: merge sync and async descriptor to mbuf filling xuan.ding
2022-04-11 10:00   ` [PATCH v2 4/5] vhost: support async dequeue for split ring xuan.ding
2022-04-11 10:00   ` [PATCH v2 5/5] examples/vhost: support async dequeue data path xuan.ding
2022-04-19  3:43 ` [PATCH v3 0/5] vhost: " xuan.ding
2022-04-19  3:43   ` [PATCH v3 1/5] vhost: prepare sync for descriptor to mbuf refactoring xuan.ding
2022-04-22 15:30     ` Maxime Coquelin
2022-04-19  3:43   ` [PATCH v3 2/5] vhost: prepare async " xuan.ding
2022-04-22 15:32     ` Maxime Coquelin
2022-04-19  3:43   ` [PATCH v3 3/5] vhost: merge sync and async descriptor to mbuf filling xuan.ding
2022-04-22 11:06     ` David Marchand
2022-04-22 15:46       ` Maxime Coquelin
2022-04-24  2:02         ` Ding, Xuan
2022-04-22 15:43     ` Maxime Coquelin
2022-04-19  3:43   ` [PATCH v3 4/5] vhost: support async dequeue for split ring xuan.ding
2022-04-19  3:43   ` [PATCH v3 5/5] examples/vhost: support async dequeue data path xuan.ding
2022-05-05  6:23 ` [PATCH v4 0/5] vhost: " xuan.ding
2022-05-05  6:23   ` [PATCH v4 1/5] vhost: prepare sync for descriptor to mbuf refactoring xuan.ding
2022-05-05  7:37     ` Yang, YvonneX
2022-05-05  6:23   ` [PATCH v4 2/5] vhost: prepare async " xuan.ding
2022-05-05  7:38     ` Yang, YvonneX
2022-05-05  6:23   ` [PATCH v4 3/5] vhost: merge sync and async descriptor to mbuf filling xuan.ding
2022-05-05  7:39     ` Yang, YvonneX
2022-05-05  6:23   ` [PATCH v4 4/5] vhost: support async dequeue for split ring xuan.ding
2022-05-05  7:40     ` Yang, YvonneX
2022-05-05 19:36     ` Maxime Coquelin
2022-05-05  6:23   ` [PATCH v4 5/5] examples/vhost: support async dequeue data path xuan.ding
2022-05-05  7:39     ` Yang, YvonneX
2022-05-05 19:38     ` Maxime Coquelin
2022-05-05 19:52   ` [PATCH v4 0/5] vhost: " Maxime Coquelin
2022-05-06  1:49     ` Ding, Xuan
2022-05-13  2:00 ` [PATCH v5 " xuan.ding
2022-05-13  2:00   ` [PATCH v5 1/5] vhost: prepare sync for descriptor to mbuf refactoring xuan.ding
2022-05-13  2:00   ` [PATCH v5 2/5] vhost: prepare async " xuan.ding
2022-05-13  2:00   ` [PATCH v5 3/5] vhost: merge sync and async descriptor to mbuf filling xuan.ding
2022-05-13  2:00   ` [PATCH v5 4/5] vhost: support async dequeue for split ring xuan.ding
2022-05-13  2:24     ` Stephen Hemminger
2022-05-13  2:33       ` Ding, Xuan
2022-05-13  2:00   ` [PATCH v5 5/5] examples/vhost: support async dequeue data path xuan.ding
2022-05-13  2:50 ` [PATCH v6 0/5] vhost: " xuan.ding
2022-05-13  2:50   ` [PATCH v6 1/5] vhost: prepare sync for descriptor to mbuf refactoring xuan.ding
2022-05-13  2:50   ` [PATCH v6 2/5] vhost: prepare async " xuan.ding
2022-05-13  2:50   ` [PATCH v6 3/5] vhost: merge sync and async descriptor to mbuf filling xuan.ding
2022-05-13  2:50   ` [PATCH v6 4/5] vhost: support async dequeue for split ring xuan.ding
2022-05-13  2:50   ` [PATCH v6 5/5] examples/vhost: support async dequeue data path xuan.ding
2022-05-13  3:27     ` Xia, Chenbo
2022-05-13  3:51       ` Ding, Xuan
2022-05-16  2:43 ` [PATCH v7 0/5] vhost: " xuan.ding
2022-05-16  2:43   ` [PATCH v7 1/5] vhost: prepare sync for descriptor to mbuf refactoring xuan.ding
2022-05-16  2:43   ` [PATCH v7 2/5] vhost: prepare async " xuan.ding
2022-05-16  2:43   ` [PATCH v7 3/5] vhost: merge sync and async descriptor to mbuf filling xuan.ding
2022-05-16  2:43   ` [PATCH v7 4/5] vhost: support async dequeue for split ring xuan.ding
2022-05-16  5:52     ` Hu, Jiayu
2022-05-16  6:10       ` Ding, Xuan
2022-05-16  2:43   ` [PATCH v7 5/5] examples/vhost: support async dequeue data path xuan.ding
2022-05-16 11:10 ` [PATCH v8 0/5] vhost: " xuan.ding
2022-05-16 11:10   ` [PATCH v8 1/5] vhost: prepare sync for descriptor to mbuf refactoring xuan.ding
2022-05-16 11:10   ` [PATCH v8 2/5] vhost: prepare async " xuan.ding
2022-05-16 11:10   ` [PATCH v8 3/5] vhost: merge sync and async descriptor to mbuf filling xuan.ding
2022-05-16 11:10   ` [PATCH v8 4/5] vhost: support async dequeue for split ring xuan.ding
2022-06-16 14:38     ` David Marchand
2022-06-16 14:40       ` David Marchand
2022-06-17  6:34         ` Ding, Xuan
2022-05-16 11:10   ` [PATCH v8 5/5] examples/vhost: support async dequeue data path xuan.ding
2022-05-17 13:22   ` [PATCH v8 0/5] vhost: " Maxime Coquelin
