Date: Tue, 8 Feb 2022 18:46:07 +0100
From: Maxime Coquelin
To: Jiayu Hu , dev@dpdk.org
Cc: i.maximets@ovn.org, chenbo.xia@intel.com, xuan.ding@intel.com, cheng1.jiang@intel.com, liangma@liangbit.com, Sunil Pai G
Subject: Re: [PATCH v3 1/1] vhost: integrate dmadev in asynchronous data-path
In-Reply-To: <20220208104031.1885640-2-jiayu.hu@intel.com>
References: <20220124164011.1402593-2-jiayu.hu@intel.com> <20220208104031.1885640-1-jiayu.hu@intel.com> <20220208104031.1885640-2-jiayu.hu@intel.com>

Hi Jiayu,

On 2/8/22 11:40, Jiayu Hu wrote:
> Since dmadev is introduced in 21.11, to avoid the overhead of vhost DMA
> abstraction layer and simplify application logics, this patch integrates
> dmadev in asynchronous data path.
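(Aside for readers of the archive: the sketch below only illustrates how an application is expected to wire up a DMA device for the new API, based on the function signatures introduced by this patch. The device ID, vchannel ID and descriptor count are placeholder values and error handling is minimal; it is not code from the patch.)

#include <rte_dmadev.h>
#include <rte_vhost_async.h>

/* Bind one dmadev vchannel to one vhost virtqueue (illustrative values only). */
static int
setup_vhost_async(int vid, uint16_t queue_id, int16_t dma_id, uint16_t vchan_id)
{
	struct rte_dma_conf dev_conf = { .nb_vchans = 1 };
	struct rte_dma_vchan_conf vchan_conf = {
		.direction = RTE_DMA_DIR_MEM_TO_MEM,
		.nb_desc = 1024,	/* placeholder ring size */
	};

	if (rte_dma_configure(dma_id, &dev_conf) != 0 ||
	    rte_dma_vchan_setup(dma_id, vchan_id, &vchan_conf) != 0 ||
	    rte_dma_start(dma_id) != 0)
		return -1;

	/* New with this patch: expose the DMA vchannel to vhost... */
	if (rte_vhost_async_dma_configure(dma_id, vchan_id) != 0)
		return -1;

	/* ...and enable the asynchronous data path on the virtqueue. */
	return rte_vhost_async_channel_register(vid, queue_id);
}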
> > Signed-off-by: Jiayu Hu > Signed-off-by: Sunil Pai G > --- > doc/guides/prog_guide/vhost_lib.rst | 97 +++++----- > examples/vhost/Makefile | 2 +- > examples/vhost/ioat.c | 218 ---------------------- > examples/vhost/ioat.h | 63 ------- > examples/vhost/main.c | 252 +++++++++++++++++++++----- > examples/vhost/main.h | 11 ++ > examples/vhost/meson.build | 6 +- > lib/vhost/meson.build | 2 +- > lib/vhost/rte_vhost.h | 2 + > lib/vhost/rte_vhost_async.h | 145 ++++----------- > lib/vhost/version.map | 3 + > lib/vhost/vhost.c | 122 +++++++++---- > lib/vhost/vhost.h | 85 ++++++++- > lib/vhost/virtio_net.c | 271 +++++++++++++++++++++++----- > 14 files changed, 689 insertions(+), 590 deletions(-) > delete mode 100644 examples/vhost/ioat.c > delete mode 100644 examples/vhost/ioat.h Thanks for the quick follow-up patch. I notice checkpatch is complaining for a few too long lines, I can fix them by myself if the rest is good, otherwise remember to run checkpatch for the next iteration. > --- a/lib/vhost/vhost.c > +++ b/lib/vhost/vhost.c > @@ -25,7 +25,7 @@ > #include "vhost.h" > #include "vhost_user.h" > > -struct virtio_net *vhost_devices[MAX_VHOST_DEVICE]; > +struct virtio_net *vhost_devices[RTE_MAX_VHOST_DEVICE]; > pthread_mutex_t vhost_dev_lock = PTHREAD_MUTEX_INITIALIZER; > > /* Called with iotlb_lock read-locked */ > @@ -343,6 +343,7 @@ vhost_free_async_mem(struct vhost_virtqueue *vq) > return; > > rte_free(vq->async->pkts_info); > + rte_free(vq->async->pkts_cmpl_flag); > > rte_free(vq->async->buffers_packed); > vq->async->buffers_packed = NULL; > @@ -665,12 +666,12 @@ vhost_new_device(void) > int i; > > pthread_mutex_lock(&vhost_dev_lock); > - for (i = 0; i < MAX_VHOST_DEVICE; i++) { > + for (i = 0; i < RTE_MAX_VHOST_DEVICE; i++) { > if (vhost_devices[i] == NULL) > break; > } > > - if (i == MAX_VHOST_DEVICE) { > + if (i == RTE_MAX_VHOST_DEVICE) { > VHOST_LOG_CONFIG(ERR, "failed to find a free slot for new device.\n"); > pthread_mutex_unlock(&vhost_dev_lock); > return -1; > @@ -1621,8 +1622,7 @@ rte_vhost_extern_callback_register(int vid, > } > > static __rte_always_inline int > -async_channel_register(int vid, uint16_t queue_id, > - struct rte_vhost_async_channel_ops *ops) > +async_channel_register(int vid, uint16_t queue_id) > { > struct virtio_net *dev = get_device(vid); > struct vhost_virtqueue *vq = dev->virtqueue[queue_id]; > @@ -1651,6 +1651,14 @@ async_channel_register(int vid, uint16_t queue_id, > goto out_free_async; > } > > + async->pkts_cmpl_flag = rte_zmalloc_socket(NULL, vq->size * sizeof(bool), RTE_CACHE_LINE_SIZE, > + node); > + if (!async->pkts_cmpl_flag) { > + VHOST_LOG_CONFIG(ERR, "(%s) failed to allocate async pkts_cmpl_flag (qid: %d)\n", > + dev->ifname, queue_id); > + goto out_free_async; > + } > + > if (vq_is_packed(dev)) { > async->buffers_packed = rte_malloc_socket(NULL, > vq->size * sizeof(struct vring_used_elem_packed), > @@ -1671,9 +1679,6 @@ async_channel_register(int vid, uint16_t queue_id, > } > } > > - async->ops.check_completed_copies = ops->check_completed_copies; > - async->ops.transfer_data = ops->transfer_data; > - > vq->async = async; > > return 0; > @@ -1686,15 +1691,13 @@ async_channel_register(int vid, uint16_t queue_id, > } > > int > -rte_vhost_async_channel_register(int vid, uint16_t queue_id, > - struct rte_vhost_async_config config, > - struct rte_vhost_async_channel_ops *ops) > +rte_vhost_async_channel_register(int vid, uint16_t queue_id) > { > struct vhost_virtqueue *vq; > struct virtio_net *dev = get_device(vid); > int ret; > > - if (dev == NULL 
|| ops == NULL) > + if (dev == NULL) > return -1; > > if (queue_id >= VHOST_MAX_VRING) > @@ -1705,33 +1708,20 @@ rte_vhost_async_channel_register(int vid, uint16_t queue_id, > if (unlikely(vq == NULL || !dev->async_copy)) > return -1; > > - if (unlikely(!(config.features & RTE_VHOST_ASYNC_INORDER))) { > - VHOST_LOG_CONFIG(ERR, > - "(%s) async copy is not supported on non-inorder mode (qid: %d)\n", > - dev->ifname, queue_id); > - return -1; > - } > - > - if (unlikely(ops->check_completed_copies == NULL || > - ops->transfer_data == NULL)) > - return -1; > - > rte_spinlock_lock(&vq->access_lock); > - ret = async_channel_register(vid, queue_id, ops); > + ret = async_channel_register(vid, queue_id); > rte_spinlock_unlock(&vq->access_lock); > > return ret; > } > > int > -rte_vhost_async_channel_register_thread_unsafe(int vid, uint16_t queue_id, > - struct rte_vhost_async_config config, > - struct rte_vhost_async_channel_ops *ops) > +rte_vhost_async_channel_register_thread_unsafe(int vid, uint16_t queue_id) > { > struct vhost_virtqueue *vq; > struct virtio_net *dev = get_device(vid); > > - if (dev == NULL || ops == NULL) > + if (dev == NULL) > return -1; > > if (queue_id >= VHOST_MAX_VRING) > @@ -1742,18 +1732,7 @@ rte_vhost_async_channel_register_thread_unsafe(int vid, uint16_t queue_id, > if (unlikely(vq == NULL || !dev->async_copy)) > return -1; > > - if (unlikely(!(config.features & RTE_VHOST_ASYNC_INORDER))) { > - VHOST_LOG_CONFIG(ERR, > - "(%s) async copy is not supported on non-inorder mode (qid: %d)\n", > - dev->ifname, queue_id); > - return -1; > - } > - > - if (unlikely(ops->check_completed_copies == NULL || > - ops->transfer_data == NULL)) > - return -1; > - > - return async_channel_register(vid, queue_id, ops); > + return async_channel_register(vid, queue_id); > } > > int > @@ -1832,6 +1811,69 @@ rte_vhost_async_channel_unregister_thread_unsafe(int vid, uint16_t queue_id) > return 0; > } > > +int > +rte_vhost_async_dma_configure(int16_t dma_id, uint16_t vchan_id) > +{ > + struct rte_dma_info info; > + void *pkts_cmpl_flag_addr; > + uint16_t max_desc; > + > + if (!rte_dma_is_valid(dma_id)) { > + VHOST_LOG_CONFIG(ERR, "DMA %d is not found. Cannot use it in vhost\n", dma_id); > + return -1; > + } > + > + rte_dma_info_get(dma_id, &info); > + if (vchan_id >= info.max_vchans) { > + VHOST_LOG_CONFIG(ERR, "Invalid vChannel ID. Cannot use DMA %d vChannel %u for " > + "vhost\n", dma_id, vchan_id); > + return -1; > + } > + > + if (!dma_copy_track[dma_id].vchans) { > + struct async_dma_vchan_info *vchans; > + > + vchans = rte_zmalloc(NULL, sizeof(struct async_dma_vchan_info) * info.max_vchans, > + RTE_CACHE_LINE_SIZE); > + if (vchans == NULL) { > + VHOST_LOG_CONFIG(ERR, "Failed to allocate vchans, Cannot use DMA %d " > + "vChannel %u for vhost.\n", dma_id, vchan_id); Please remove the "cannot use in Vhost" here and above and below. The messages are already prefixed with VHOST_CONFIG, so this is redundant. > + return -1; > + } > + > + dma_copy_track[dma_id].vchans = vchans; > + } > + > + if (dma_copy_track[dma_id].vchans[vchan_id].pkts_cmpl_flag_addr) { > + VHOST_LOG_CONFIG(INFO, "DMA %d vChannel %u has registered in vhost. 
Ignore\n", "DMA %d vChannel %u already registered.\n" > + dma_id, vchan_id); > + return 0; > + } > + > + max_desc = info.max_desc; > + if (!rte_is_power_of_2(max_desc)) > + max_desc = rte_align32pow2(max_desc); > + > + pkts_cmpl_flag_addr = rte_zmalloc(NULL, sizeof(bool *) * max_desc, RTE_CACHE_LINE_SIZE); > + if (!pkts_cmpl_flag_addr) { > + VHOST_LOG_CONFIG(ERR, "Failed to allocate pkts_cmpl_flag_addr for DMA %d " > + "vChannel %u. Cannot use it for vhost\n", dma_id, vchan_id); > + > + if (dma_copy_track[dma_id].nr_vchans == 0) { > + rte_free(dma_copy_track[dma_id].vchans); > + dma_copy_track[dma_id].vchans = NULL; > + } > + return -1; > + } > + > + dma_copy_track[dma_id].vchans[vchan_id].pkts_cmpl_flag_addr = pkts_cmpl_flag_addr; > + dma_copy_track[dma_id].vchans[vchan_id].ring_size = max_desc; > + dma_copy_track[dma_id].vchans[vchan_id].ring_mask = max_desc - 1; > + dma_copy_track[dma_id].nr_vchans++; > + > + return 0; > +} > + > int > rte_vhost_async_get_inflight(int vid, uint16_t queue_id) > { > diff --git a/lib/vhost/vhost.h b/lib/vhost/vhost.h > index b3f0c1d07c..1c2ee29600 100644 > --- a/lib/vhost/vhost.h > +++ b/lib/vhost/vhost.h > @@ -19,6 +19,7 @@ > #include > #include > #include > +#include > > #include "rte_vhost.h" > #include "rte_vdpa.h" > @@ -50,6 +51,9 @@ > > #define VHOST_MAX_ASYNC_IT (MAX_PKT_BURST) > #define VHOST_MAX_ASYNC_VEC 2048 > +#define VIRTIO_MAX_RX_PKTLEN 9728U > +#define VHOST_DMA_MAX_COPY_COMPLETE ((VIRTIO_MAX_RX_PKTLEN / RTE_MBUF_DEFAULT_DATAROOM) \ > + * MAX_PKT_BURST) > > #define PACKED_DESC_ENQUEUE_USED_FLAG(w) \ > ((w) ? (VRING_DESC_F_AVAIL | VRING_DESC_F_USED | VRING_DESC_F_WRITE) : \ > @@ -119,6 +123,58 @@ struct vring_used_elem_packed { > uint32_t count; > }; > > +/** > + * iovec > + */ > +struct vhost_iovec { > + void *src_addr; > + void *dst_addr; > + size_t len; > +}; > + > +/** > + * iovec iterator > + */ > +struct vhost_iov_iter { > + /** pointer to the iovec array */ > + struct vhost_iovec *iov; > + /** number of iovec in this iterator */ > + unsigned long nr_segs; > +}; > + > +struct async_dma_vchan_info { > + /* circular array to track if packet copy completes */ > + bool **pkts_cmpl_flag_addr; > + > + /* max elements in 'pkts_cmpl_flag_addr' */ > + uint16_t ring_size; > + /* ring index mask for 'pkts_cmpl_flag_addr' */ > + uint16_t ring_mask; > + > + /** > + * DMA virtual channel lock. Although it is able to bind DMA > + * virtual channels to data plane threads, vhost control plane > + * thread could call data plane functions too, thus causing > + * DMA device contention. > + * > + * For example, in VM exit case, vhost control plane thread needs > + * to clear in-flight packets before disable vring, but there could > + * be anotther data plane thread is enqueuing packets to the same > + * vring with the same DMA virtual channel. As dmadev PMD functions > + * are lock-free, the control plane and data plane threads could > + * operate the same DMA virtual channel at the same time. 
> + */ > + rte_spinlock_t dma_lock; > +}; > + > +struct async_dma_info { > + struct async_dma_vchan_info *vchans; > + /* number of registered virtual channels */ > + uint16_t nr_vchans; > +}; > + > +extern struct async_dma_info dma_copy_track[RTE_DMADEV_DEFAULT_MAX]; > + > /** > * inflight async packet information > */ > @@ -129,16 +185,32 @@ struct async_inflight_info { > }; > > struct vhost_async { > - /* operation callbacks for DMA */ > - struct rte_vhost_async_channel_ops ops; > - > - struct rte_vhost_iov_iter iov_iter[VHOST_MAX_ASYNC_IT]; > - struct rte_vhost_iovec iovec[VHOST_MAX_ASYNC_VEC]; > + struct vhost_iov_iter iov_iter[VHOST_MAX_ASYNC_IT]; > + struct vhost_iovec iovec[VHOST_MAX_ASYNC_VEC]; > uint16_t iter_idx; > uint16_t iovec_idx; > > /* data transfer status */ > struct async_inflight_info *pkts_info; > + /** > + * Packet reorder array. "true" indicates that DMA device > + * completes all copies for the packet. > + * > + * Note that this array could be written by multiple threads > + * simultaneously. For example, in the case of thread0 and > + * thread1 RX packets from NIC and then enqueue packets to > + * vring0 and vring1 with own DMA device DMA0 and DMA1, it's > + * possible for thread0 to get completed copies belonging to > + * vring1 from DMA0, while thread0 is calling rte_vhost_poll > + * _enqueue_completed() for vring0 and thread1 is calling > + * rte_vhost_submit_enqueue_burst() for vring1. In this case, > + * vq->access_lock cannot protect pkts_cmpl_flag of vring1. > + * > + * However, since offloading is per-packet basis, each packet > + * flag will only be written by one thread. And single byte > + * write is atomic, so no lock for pkts_cmpl_flag is needed. > + */ > + bool *pkts_cmpl_flag; > uint16_t pkts_idx; > uint16_t pkts_inflight_n; > union { > @@ -568,8 +640,7 @@ extern int vhost_data_log_level; > #define PRINT_PACKET(device, addr, size, header) do {} while (0) > #endif > > -#define MAX_VHOST_DEVICE 1024 > -extern struct virtio_net *vhost_devices[MAX_VHOST_DEVICE]; > +extern struct virtio_net *vhost_devices[RTE_MAX_VHOST_DEVICE]; > > #define VHOST_BINARY_SEARCH_THRESH 256 > > diff --git a/lib/vhost/virtio_net.c b/lib/vhost/virtio_net.c > index f19713137c..cc4e2504ac 100644 > --- a/lib/vhost/virtio_net.c > +++ b/lib/vhost/virtio_net.c > @@ -11,6 +11,7 @@ > #include > #include > #include > +#include > #include > #include > #include > @@ -25,6 +26,9 @@ > > #define MAX_BATCH_LEN 256 > > +/* DMA device copy operation tracking array. 
*/ > +struct async_dma_info dma_copy_track[RTE_DMADEV_DEFAULT_MAX]; > + > static __rte_always_inline bool > rxvq_is_mergeable(struct virtio_net *dev) > { > @@ -43,6 +47,136 @@ is_valid_virt_queue_idx(uint32_t idx, int is_tx, uint32_t nr_vring) > return (is_tx ^ (idx & 1)) == 0 && idx < nr_vring; > } > > +static __rte_always_inline int64_t > +vhost_async_dma_transfer_one(struct virtio_net *dev, struct vhost_virtqueue *vq, > + int16_t dma_id, uint16_t vchan_id, uint16_t flag_idx, > + struct vhost_iov_iter *pkt) > +{ > + struct async_dma_vchan_info *dma_info = &dma_copy_track[dma_id].vchans[vchan_id]; > + uint16_t ring_mask = dma_info->ring_mask; > + static bool vhost_async_dma_copy_log; > + > + > + struct vhost_iovec *iov = pkt->iov; > + int copy_idx = 0; > + uint32_t nr_segs = pkt->nr_segs; > + uint16_t i; > + > + if (rte_dma_burst_capacity(dma_id, vchan_id) < nr_segs) > + return -1; > + > + for (i = 0; i < nr_segs; i++) { > + copy_idx = rte_dma_copy(dma_id, vchan_id, (rte_iova_t)iov[i].src_addr, > + (rte_iova_t)iov[i].dst_addr, iov[i].len, RTE_DMA_OP_FLAG_LLC); > + /** > + * Since all memory is pinned and DMA vChannel > + * ring has enough space, failure should be a > + * rare case. If failure happens, it means DMA > + * device encounters serious errors; in this > + * case, please stop async data-path and check > + * what has happened to DMA device. > + */ > + if (unlikely(copy_idx < 0)) { > + if (!vhost_async_dma_copy_log) { > + VHOST_LOG_DATA(ERR, "(%s) DMA %d vChannel %u reports error in " > + "rte_dma_copy(). Please stop async data-path and " > + "debug what has happened to DMA device\n", Please try to make the log message shorter. Something like "DMA copy failed for channel %d:%d". > + dev->ifname, dma_id, vchan_id); > + vhost_async_dma_copy_log = true; > + } > + return -1; > + } > + } > + > + /** > + * Only store packet completion flag address in the last copy's > + * slot, and other slots are set to NULL. > + */ > + dma_info->pkts_cmpl_flag_addr[copy_idx & ring_mask] = &vq->async->pkts_cmpl_flag[flag_idx]; > + > + return nr_segs; > +} > + > +static __rte_always_inline uint16_t > +vhost_async_dma_transfer(struct virtio_net *dev, struct vhost_virtqueue *vq, > + int16_t dma_id, uint16_t vchan_id, uint16_t head_idx, > + struct vhost_iov_iter *pkts, uint16_t nr_pkts) > +{ > + struct async_dma_vchan_info *dma_info = &dma_copy_track[dma_id].vchans[vchan_id]; > + int64_t ret, nr_copies = 0; > + uint16_t pkt_idx; > + > + rte_spinlock_lock(&dma_info->dma_lock); > + > + for (pkt_idx = 0; pkt_idx < nr_pkts; pkt_idx++) { > + ret = vhost_async_dma_transfer_one(dev, vq, dma_id, vchan_id, head_idx, &pkts[pkt_idx]); > + if (unlikely(ret < 0)) > + break; > + > + nr_copies += ret; > + head_idx++; > + if (head_idx >= vq->size) > + head_idx -= vq->size; > + } > + > + if (likely(nr_copies > 0)) > + rte_dma_submit(dma_id, vchan_id); > + > + rte_spinlock_unlock(&dma_info->dma_lock); > + > + return pkt_idx; > +} Thanks for reworking vhost_async_dma_transfer()! It is much cleaner and easier to understand IMHO.
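(Side note for readers, not a change request: as far as I can tell, max_desc is rounded up to a power of two in rte_vhost_async_dma_configure() precisely so that the "copy_idx & ring_mask" bookkeeping above stays correct when the free-running uint16_t dmadev ring index wraps at 2^16; the mask only behaves like a plain modulo across the wrap when the ring size divides 65536. A tiny standalone illustration, not code from the patch:)

#include <stdio.h>
#include <stdint.h>

int main(void)
{
	const uint16_t ring_size = 1024;	/* power of two, as enforced by the patch */
	const uint16_t ring_mask = ring_size - 1;
	uint16_t idx = 65533;			/* dmadev ring index about to wrap */
	int i;

	/* Slot numbers stay continuous (1021, 1022, 1023, 0, 1, 2) across the wrap. */
	for (i = 0; i < 6; i++, idx++)
		printf("ring idx %5u -> slot %4u\n", idx, idx & ring_mask);
	return 0;
}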
> +static __rte_always_inline uint16_t > +vhost_async_dma_check_completed(struct virtio_net *dev, int16_t dma_id, uint16_t vchan_id, > + uint16_t max_pkts) > +{ > + struct async_dma_vchan_info *dma_info = &dma_copy_track[dma_id].vchans[vchan_id]; > + uint16_t ring_mask = dma_info->ring_mask; > + uint16_t last_idx = 0; > + uint16_t nr_copies; > + uint16_t copy_idx; > + uint16_t i; > + bool has_error = false; > + static bool vhost_async_dma_complete_log; > + > + rte_spinlock_lock(&dma_info->dma_lock); > + > + /** > + * Print error log for debugging, if DMA reports error during > + * DMA transfer. We do not handle error in vhost level. > + */ > + nr_copies = rte_dma_completed(dma_id, vchan_id, max_pkts, &last_idx, &has_error); > + if (unlikely(!vhost_async_dma_complete_log && has_error)) { > + VHOST_LOG_DATA(ERR, "(%s) DMA %d vChannel %u reports error in " > + "rte_dma_completed()\n", dev->ifname, dma_id, vchan_id); "(%s) DMA completion failure on channel %d:%d" > + vhost_async_dma_complete_log = true; > + } else if (nr_copies == 0) { > + goto out; > + } > + > + copy_idx = last_idx - nr_copies + 1; > + for (i = 0; i < nr_copies; i++) { > + bool *flag; > + > + flag = dma_info->pkts_cmpl_flag_addr[copy_idx & ring_mask]; > + if (flag) { > + /** > + * Mark the packet flag as received. The flag > + * could belong to another virtqueue but write > + * is atomic. > + */ > + *flag = true; > + dma_info->pkts_cmpl_flag_addr[copy_idx & ring_mask] = NULL; > + } > + copy_idx++; > + } > + > +out: > + rte_spinlock_unlock(&dma_info->dma_lock); > + return nr_copies; > +} > + > static inline void > do_data_copy_enqueue(struct virtio_net *dev, struct vhost_virtqueue *vq) > { > @@ -794,7 +928,7 @@ copy_vnet_hdr_to_desc(struct virtio_net *dev, struct vhost_virtqueue *vq, > static __rte_always_inline int > async_iter_initialize(struct virtio_net *dev, struct vhost_async *async) > { > - struct rte_vhost_iov_iter *iter; > + struct vhost_iov_iter *iter; > > if (unlikely(async->iovec_idx >= VHOST_MAX_ASYNC_VEC)) { > VHOST_LOG_DATA(ERR, "(%s) no more async iovec available\n", dev->ifname); > @@ -812,8 +946,8 @@ static __rte_always_inline int > async_iter_add_iovec(struct virtio_net *dev, struct vhost_async *async, > void *src, void *dst, size_t len) > { > - struct rte_vhost_iov_iter *iter; > - struct rte_vhost_iovec *iovec; > + struct vhost_iov_iter *iter; > + struct vhost_iovec *iovec; > > if (unlikely(async->iovec_idx >= VHOST_MAX_ASYNC_VEC)) { > static bool vhost_max_async_vec_log; > @@ -848,7 +982,7 @@ async_iter_finalize(struct vhost_async *async) > static __rte_always_inline void > async_iter_cancel(struct vhost_async *async) > { > - struct rte_vhost_iov_iter *iter; > + struct vhost_iov_iter *iter; > > iter = async->iov_iter + async->iter_idx; > async->iovec_idx -= iter->nr_segs; > @@ -1448,9 +1582,9 @@ store_dma_desc_info_packed(struct vring_used_elem_packed *s_ring, > } > > static __rte_noinline uint32_t > -virtio_dev_rx_async_submit_split(struct virtio_net *dev, > - struct vhost_virtqueue *vq, uint16_t queue_id, > - struct rte_mbuf **pkts, uint32_t count) > +virtio_dev_rx_async_submit_split(struct virtio_net *dev, struct vhost_virtqueue *vq, > + uint16_t queue_id, struct rte_mbuf **pkts, uint32_t count, > + int16_t dma_id, uint16_t vchan_id) > { > struct buf_vector buf_vec[BUF_VECTOR_MAX]; > uint32_t pkt_idx = 0; > @@ -1460,7 +1594,7 @@ virtio_dev_rx_async_submit_split(struct virtio_net *dev, > struct vhost_async *async = vq->async; > struct async_inflight_info *pkts_info = async->pkts_info; > uint32_t pkt_err 
= 0; > - int32_t n_xfer; > + uint16_t n_xfer; > uint16_t slot_idx = 0; > > /* > @@ -1502,17 +1636,16 @@ virtio_dev_rx_async_submit_split(struct virtio_net *dev, > if (unlikely(pkt_idx == 0)) > return 0; > > - n_xfer = async->ops.transfer_data(dev->vid, queue_id, async->iov_iter, 0, pkt_idx); > - if (unlikely(n_xfer < 0)) { > - VHOST_LOG_DATA(ERR, "(%s) %s: failed to transfer data for queue id %d.\n", > - dev->ifname, __func__, queue_id); > - n_xfer = 0; > - } > + n_xfer = vhost_async_dma_transfer(dev, vq, dma_id, vchan_id, async->pkts_idx, > + async->iov_iter, pkt_idx); > > pkt_err = pkt_idx - n_xfer; > if (unlikely(pkt_err)) { > uint16_t num_descs = 0; > > + VHOST_LOG_DATA(DEBUG, "(%s) %s: failed to transfer %u packets for queue %u.\n", > + dev->ifname, __func__, pkt_err, queue_id); > + > /* update number of completed packets */ > pkt_idx = n_xfer; > > @@ -1655,13 +1788,13 @@ dma_error_handler_packed(struct vhost_virtqueue *vq, uint16_t slot_idx, > } > > static __rte_noinline uint32_t > -virtio_dev_rx_async_submit_packed(struct virtio_net *dev, > - struct vhost_virtqueue *vq, uint16_t queue_id, > - struct rte_mbuf **pkts, uint32_t count) > +virtio_dev_rx_async_submit_packed(struct virtio_net *dev, struct vhost_virtqueue *vq, > + uint16_t queue_id, struct rte_mbuf **pkts, uint32_t count, > + int16_t dma_id, uint16_t vchan_id) > { > uint32_t pkt_idx = 0; > uint32_t remained = count; > - int32_t n_xfer; > + uint16_t n_xfer; > uint16_t num_buffers; > uint16_t num_descs; > > @@ -1693,19 +1826,17 @@ virtio_dev_rx_async_submit_packed(struct virtio_net *dev, > if (unlikely(pkt_idx == 0)) > return 0; > > - n_xfer = async->ops.transfer_data(dev->vid, queue_id, async->iov_iter, 0, pkt_idx); > - if (unlikely(n_xfer < 0)) { > - VHOST_LOG_DATA(ERR, "(%s) %s: failed to transfer data for queue id %d.\n", > - dev->ifname, __func__, queue_id); > - n_xfer = 0; > - } > - > - pkt_err = pkt_idx - n_xfer; > + n_xfer = vhost_async_dma_transfer(dev, vq, dma_id, vchan_id, async->pkts_idx, async->iov_iter, > + pkt_idx); > > async_iter_reset(async); > > - if (unlikely(pkt_err)) > + pkt_err = pkt_idx - n_xfer; > + if (unlikely(pkt_err)) { > + VHOST_LOG_DATA(DEBUG, "(%s) %s: failed to transfer %u packets for queue %u.\n", > + dev->ifname, __func__, pkt_err, queue_id); > dma_error_handler_packed(vq, slot_idx, pkt_err, &pkt_idx); > + } > > if (likely(vq->shadow_used_idx)) { > /* keep used descriptors. */ > @@ -1825,28 +1956,40 @@ write_back_completed_descs_packed(struct vhost_virtqueue *vq, > > static __rte_always_inline uint16_t > vhost_poll_enqueue_completed(struct virtio_net *dev, uint16_t queue_id, > - struct rte_mbuf **pkts, uint16_t count) > + struct rte_mbuf **pkts, uint16_t count, int16_t dma_id, > + uint16_t vchan_id) > { > struct vhost_virtqueue *vq = dev->virtqueue[queue_id]; > struct vhost_async *async = vq->async; > struct async_inflight_info *pkts_info = async->pkts_info; > - int32_t n_cpl; > + uint16_t nr_cpl_pkts = 0; > uint16_t n_descs = 0, n_buffers = 0; > uint16_t start_idx, from, i; > > - n_cpl = async->ops.check_completed_copies(dev->vid, queue_id, 0, count); > - if (unlikely(n_cpl < 0)) { > - VHOST_LOG_DATA(ERR, "(%s) %s: failed to check completed copies for queue id %d.\n", > - dev->ifname, __func__, queue_id); > - return 0; > + /* Check completed copies for the given DMA vChannel */ > + vhost_async_dma_check_completed(dev, dma_id, vchan_id, VHOST_DMA_MAX_COPY_COMPLETE); > + > + start_idx = async_get_first_inflight_pkt_idx(vq); > + /** > + * Calculate the number of copy completed packets. 
> + * Note that there may be completed packets even if > + * no copies are reported done by the given DMA vChannel。 > + * For example, multiple data plane threads enqueue packets > + * to the same virtqueue with their own DMA vChannels. > + */ > + from = start_idx; > + while (vq->async->pkts_cmpl_flag[from] && count--) { > + vq->async->pkts_cmpl_flag[from] = false; > + from++; > + if (from >= vq->size) > + from -= vq->size; > + nr_cpl_pkts++; > } > > - if (n_cpl == 0) > + if (nr_cpl_pkts == 0) > return 0; > > - start_idx = async_get_first_inflight_pkt_idx(vq); > - > - for (i = 0; i < n_cpl; i++) { > + for (i = 0; i < nr_cpl_pkts; i++) { > from = (start_idx + i) % vq->size; > /* Only used with packed ring */ > n_buffers += pkts_info[from].nr_buffers; > @@ -1855,7 +1998,7 @@ vhost_poll_enqueue_completed(struct virtio_net *dev, uint16_t queue_id, > pkts[i] = pkts_info[from].mbuf; > } > > - async->pkts_inflight_n -= n_cpl; > + async->pkts_inflight_n -= nr_cpl_pkts; > > if (likely(vq->enabled && vq->access_ok)) { > if (vq_is_packed(dev)) { > @@ -1876,12 +2019,13 @@ vhost_poll_enqueue_completed(struct virtio_net *dev, uint16_t queue_id, > } > } > > - return n_cpl; > + return nr_cpl_pkts; > } > > uint16_t > rte_vhost_poll_enqueue_completed(int vid, uint16_t queue_id, > - struct rte_mbuf **pkts, uint16_t count) > + struct rte_mbuf **pkts, uint16_t count, int16_t dma_id, > + uint16_t vchan_id) > { > struct virtio_net *dev = get_device(vid); > struct vhost_virtqueue *vq; > @@ -1905,9 +2049,20 @@ rte_vhost_poll_enqueue_completed(int vid, uint16_t queue_id, > return 0; > } > > - rte_spinlock_lock(&vq->access_lock); > + if (unlikely(!dma_copy_track[dma_id].vchans || > + !dma_copy_track[dma_id].vchans[vchan_id].pkts_cmpl_flag_addr)) { > + VHOST_LOG_DATA(ERR, "(%s) %s: invalid DMA %d vChannel %u.\n", dev->ifname, __func__, > + dma_id, vchan_id); > + return 0; > + } > > - n_pkts_cpl = vhost_poll_enqueue_completed(dev, queue_id, pkts, count); > + if (!rte_spinlock_trylock(&vq->access_lock)) { > + VHOST_LOG_DATA(DEBUG, "(%s) failed to poll completed packets from queue id %u. " > + "virtqueue busy.\n", dev->ifname, queue_id); Please try to make it shorter so that the string fits in a single line. 
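(Also an illustration-only note for application writers: with the trylock above, a return value of 0 from rte_vhost_poll_enqueue_completed() can now mean "virtqueue temporarily busy" as well as "nothing completed yet", so the caller should simply poll again on the next iteration. A rough sketch of one data-path iteration using only the signatures from this patch; the burst size and the choice to drop unenqueued mbufs are assumptions of the sketch:)

#include <rte_mbuf.h>
#include <rte_vhost_async.h>

#define BURST_SZ 32

static void
async_enqueue_iteration(int vid, uint16_t queue_id, int16_t dma_id, uint16_t vchan_id,
		struct rte_mbuf **rx_pkts, uint16_t nb_rx)
{
	struct rte_mbuf *cpl[BURST_SZ];
	uint16_t n_enq, n_cpl;

	n_enq = rte_vhost_submit_enqueue_burst(vid, queue_id, rx_pkts, nb_rx,
			dma_id, vchan_id);
	/* Packets that were not accepted stay owned by the application. */
	if (n_enq < nb_rx)
		rte_pktmbuf_free_bulk(&rx_pkts[n_enq], nb_rx - n_enq);

	/* 0 may just mean the virtqueue lock was busy; try again next time. */
	n_cpl = rte_vhost_poll_enqueue_completed(vid, queue_id, cpl, BURST_SZ,
			dma_id, vchan_id);
	if (n_cpl > 0)
		rte_pktmbuf_free_bulk(cpl, n_cpl);
}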
> + return 0; > + } > + > + n_pkts_cpl = vhost_poll_enqueue_completed(dev, queue_id, pkts, count, dma_id, vchan_id); > > rte_spinlock_unlock(&vq->access_lock); > > @@ -1916,7 +2071,8 @@ rte_vhost_poll_enqueue_completed(int vid, uint16_t queue_id, > > uint16_t > rte_vhost_clear_queue_thread_unsafe(int vid, uint16_t queue_id, > - struct rte_mbuf **pkts, uint16_t count) > + struct rte_mbuf **pkts, uint16_t count, int16_t dma_id, > + uint16_t vchan_id) > { > struct virtio_net *dev = get_device(vid); > struct vhost_virtqueue *vq; > @@ -1940,14 +2096,21 @@ rte_vhost_clear_queue_thread_unsafe(int vid, uint16_t queue_id, > return 0; > } > > - n_pkts_cpl = vhost_poll_enqueue_completed(dev, queue_id, pkts, count); > + if (unlikely(!dma_copy_track[dma_id].vchans || > + !dma_copy_track[dma_id].vchans[vchan_id].pkts_cmpl_flag_addr)) { > + VHOST_LOG_DATA(ERR, "(%s) %s: invalid DMA %d vChannel %u.\n", dev->ifname, __func__, > + dma_id, vchan_id); > + return 0; > + } > + > + n_pkts_cpl = vhost_poll_enqueue_completed(dev, queue_id, pkts, count, dma_id, vchan_id); > > return n_pkts_cpl; > } > > static __rte_always_inline uint32_t > virtio_dev_rx_async_submit(struct virtio_net *dev, uint16_t queue_id, > - struct rte_mbuf **pkts, uint32_t count) > + struct rte_mbuf **pkts, uint32_t count, int16_t dma_id, uint16_t vchan_id) > { > struct vhost_virtqueue *vq; > uint32_t nb_tx = 0; > @@ -1959,6 +2122,13 @@ virtio_dev_rx_async_submit(struct virtio_net *dev, uint16_t queue_id, > return 0; > } > > + if (unlikely(!dma_copy_track[dma_id].vchans || > + !dma_copy_track[dma_id].vchans[vchan_id].pkts_cmpl_flag_addr)) { > + VHOST_LOG_DATA(ERR, "(%s) %s: invalid DMA %d vChannel %u.\n", dev->ifname, __func__, > + dma_id, vchan_id); > + return 0; > + } > + > vq = dev->virtqueue[queue_id]; > > rte_spinlock_lock(&vq->access_lock); > @@ -1979,10 +2149,10 @@ virtio_dev_rx_async_submit(struct virtio_net *dev, uint16_t queue_id, > > if (vq_is_packed(dev)) > nb_tx = virtio_dev_rx_async_submit_packed(dev, vq, queue_id, > - pkts, count); > + pkts, count, dma_id, vchan_id); > else > nb_tx = virtio_dev_rx_async_submit_split(dev, vq, queue_id, > - pkts, count); > + pkts, count, dma_id, vchan_id); > > out: > if (dev->features & (1ULL << VIRTIO_F_IOMMU_PLATFORM)) > @@ -1996,7 +2166,8 @@ virtio_dev_rx_async_submit(struct virtio_net *dev, uint16_t queue_id, > > uint16_t > rte_vhost_submit_enqueue_burst(int vid, uint16_t queue_id, > - struct rte_mbuf **pkts, uint16_t count) > + struct rte_mbuf **pkts, uint16_t count, int16_t dma_id, > + uint16_t vchan_id) > { > struct virtio_net *dev = get_device(vid); > > @@ -2009,7 +2180,7 @@ rte_vhost_submit_enqueue_burst(int vid, uint16_t queue_id, > return 0; > } > > - return virtio_dev_rx_async_submit(dev, queue_id, pkts, count); > + return virtio_dev_rx_async_submit(dev, queue_id, pkts, count, dma_id, vchan_id); > } > > static inline bool Overall, it looks good to me; only cosmetic issues remain to be fixed. It is too late for -rc1, but please send a new version and I'll pick it up for -rc2. Thanks, Maxime