From: "Liu, Yong"
To: "Fu, Patrick"
CC: "Fu, Patrick", "Jiang, Cheng1", "Liang, Cunming", "dev@dpdk.org",
 "maxime.coquelin@redhat.com", "Xia, Chenbo", "Wang, Zhihong",
 "Ye, Xiaolong"
Date: Thu, 18 Jun 2020 05:50:51 +0000
Message-ID: <86228AFD5BCD8E4EBFD2B90117B5E81E635F601B@SHSMSX103.ccr.corp.intel.com>
References: <1591869725-13331-1-git-send-email-patrick.fu@intel.com>
 <1591869725-13331-2-git-send-email-patrick.fu@intel.com>
In-Reply-To: <1591869725-13331-2-git-send-email-patrick.fu@intel.com>
Subject: Re: [dpdk-dev] [PATCH v1 1/2] vhost: introduce async data path
 registration API
List-Id: DPDK patches and discussions

Thanks, Patrick. Some comments are inline.

> -----Original Message-----
> From: dev On Behalf Of patrick.fu@intel.com
> Sent: Thursday, June 11, 2020 6:02 PM
> To: dev@dpdk.org; maxime.coquelin@redhat.com; Xia, Chenbo;
> Wang, Zhihong; Ye, Xiaolong
> Cc: Fu, Patrick; Jiang, Cheng1; Liang, Cunming
> Subject: [dpdk-dev] [PATCH v1 1/2] vhost: introduce async data path
> registration API
>
> From: Patrick
>
> This patch introduces registration/un-registration APIs
> for async data path together with all required data
> structures and DMA callback function proto-types.
>
> Signed-off-by: Patrick
> ---
>  lib/librte_vhost/Makefile          |   3 +-
>  lib/librte_vhost/rte_vhost.h       |   1 +
>  lib/librte_vhost/rte_vhost_async.h | 134
> +++++++++++++++++++++++++++++++++++++
>  lib/librte_vhost/socket.c          |  20 ++++++
>  lib/librte_vhost/vhost.c           |  74 +++++++++++++++++++-
>  lib/librte_vhost/vhost.h           |  30 ++++++++-
>  lib/librte_vhost/vhost_user.c      |  28 ++++++--
>  7 files changed, 283 insertions(+), 7 deletions(-)
>  create mode 100644 lib/librte_vhost/rte_vhost_async.h
>
> diff --git a/lib/librte_vhost/Makefile b/lib/librte_vhost/Makefile
> index e592795..3aed094 100644
> --- a/lib/librte_vhost/Makefile
> +++ b/lib/librte_vhost/Makefile
> @@ -41,7 +41,8 @@ SRCS-$(CONFIG_RTE_LIBRTE_VHOST) := fd_man.c
> iotlb.c socket.c vhost.c \
>                                      vhost_user.c virtio_net.c vdpa.c
>
>  # install includes
> -SYMLINK-$(CONFIG_RTE_LIBRTE_VHOST)-include += rte_vhost.h rte_vdpa.h
> +SYMLINK-$(CONFIG_RTE_LIBRTE_VHOST)-include += rte_vhost.h rte_vdpa.h
> \
> +                                     rte_vhost_async.h
>

Hi Patrick,
Please also update the meson build for the newly added file.

Thanks,
Marvin

>  # only compile vhost crypto when cryptodev is enabled
>  ifeq ($(CONFIG_RTE_LIBRTE_CRYPTODEV),y)
> diff --git a/lib/librte_vhost/rte_vhost.h b/lib/librte_vhost/rte_vhost.h
> index d43669f..cec4d07 100644
> --- a/lib/librte_vhost/rte_vhost.h
> +++ b/lib/librte_vhost/rte_vhost.h
> @@ -35,6 +35,7 @@
>  #define RTE_VHOST_USER_EXTBUF_SUPPORT    (1ULL << 5)
>  /* support only linear buffers (no chained mbufs) */
>  #define RTE_VHOST_USER_LINEARBUF_SUPPORT    (1ULL << 6)
> +#define RTE_VHOST_USER_ASYNC_COPY    (1ULL << 7)
>
>  /** Protocol features. */
>  #ifndef VHOST_USER_PROTOCOL_F_MQ
> diff --git a/lib/librte_vhost/rte_vhost_async.h
> b/lib/librte_vhost/rte_vhost_async.h
> new file mode 100644
> index 0000000..82f2ebe
> --- /dev/null
> +++ b/lib/librte_vhost/rte_vhost_async.h
> @@ -0,0 +1,134 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + * Copyright(c) 2018 Intel Corporation
> + */

s/2018/2020/

> +
> +#ifndef _RTE_VHOST_ASYNC_H_
> +#define _RTE_VHOST_ASYNC_H_
> +
> +#include "rte_vhost.h"
> +
> +/**
> + * iovec iterator
> + */
> +struct iov_it {
> +    /** offset to the first byte of interesting data */
> +    size_t offset;
> +    /** total bytes of data in this iterator */
> +    size_t count;
> +    /** pointer to the iovec array */
> +    struct iovec *iov;
> +    /** number of iovec in this iterator */
> +    unsigned long nr_segs;
> +};

Patrick,
I think a structure name ending in "it" is too generic to be understood;
please use a more meaningful name like "iov_iter".

> +
> +/**
> + * dma transfer descriptor pair
> + */
> +struct dma_trans_desc {
> +    /** source memory iov_it */
> +    struct iov_it *src;
> +    /** destination memory iov_it */
> +    struct iov_it *dst;
> +};
> +

This patch series is named async copy, and DMA is just one async copy
method, supplied by the underlying hardware.
IMHO, the structure is better named "async_copy_desc", which matches the
overall concept.

> +/**
> + * dma transfer status
> + */
> +struct dma_trans_status {
> +    /** An array of application specific data for source memory */
> +    uintptr_t *src_opaque_data;
> +    /** An array of application specific data for destination memory */
> +    uintptr_t *dst_opaque_data;
> +};
> +

Same as the previous comment.
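To make the naming suggestions concrete, here is a rough sketch of the three structures with the proposed names. The fields are copied from the patch; "async_copy_status" and the helper async_copy_desc_is_balanced() are names made up for this mail only and exist neither in the patch nor in DPDK. The helper also assumes that a valid descriptor copies the same number of bytes on both sides, which the patch does not state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/uio.h>    /* struct iovec */

/* iovec iterator, renamed from iov_it (fields as in the patch) */
struct iov_iter {
	size_t offset;          /* offset to the first byte of interesting data */
	size_t count;           /* total bytes of data in this iterator */
	struct iovec *iov;      /* pointer to the iovec array */
	unsigned long nr_segs;  /* number of iovec in this iterator */
};

/* copy descriptor pair, renamed from dma_trans_desc */
struct async_copy_desc {
	struct iov_iter *src;
	struct iov_iter *dst;
};

/* copy status, renamed from dma_trans_status (hypothetical name) */
struct async_copy_status {
	uintptr_t *src_opaque_data;
	uintptr_t *dst_opaque_data;
};

/*
 * Hypothetical sanity check: true when source and destination
 * describe the same number of bytes.
 */
static inline bool
async_copy_desc_is_balanced(const struct async_copy_desc *desc)
{
	assert(desc != NULL && desc->src != NULL && desc->dst != NULL);
	return desc->src->count == desc->dst->count;
}
```

None of this changes behaviour; it only shows that the rename is mechanical and keeps the layout of the original structures.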
> +/**
> + * dma operation callbacks to be implemented by applications
> + */
> +struct rte_vhost_async_channel_ops {
> +    /**
> +     * instruct a DMA channel to perform copies for a batch of packets
> +     *
> +     * @param vid
> +     *  id of vhost device to perform data copies
> +     * @param queue_id
> +     *  queue id to perform data copies
> +     * @param descs
> +     *  an array of DMA transfer memory descriptors
> +     * @param opaque_data
> +     *  opaque data pair sending to DMA engine
> +     * @param count
> +     *  number of elements in the "descs" array
> +     * @return
> +     *  -1 on failure, number of descs processed on success
> +     */
> +    int (*transfer_data)(int vid, uint16_t queue_id,
> +        struct dma_trans_desc *descs,
> +        struct dma_trans_status *opaque_data,
> +        uint16_t count);
> +    /**
> +     * check copy-completed packets from a DMA channel
> +     * @param vid
> +     *  id of vhost device to check copy completion
> +     * @param queue_id
> +     *  queue id to check copyp completion
> +     * @param opaque_data
> +     *  buffer to receive the opaque data pair from DMA engine
> +     * @param max_packets
> +     *  max number of packets could be completed
> +     * @return
> +     *  -1 on failure, number of iov segments completed on success
> +     */
> +    int (*check_completed_copies)(int vid, uint16_t queue_id,
> +        struct dma_trans_status *opaque_data,
> +        uint16_t max_packets);
> +};
> +
> +/**
> + * dma channel feature bit definition
> + */
> +struct dma_channel_features {
> +    union {
> +        uint32_t intval;
> +        struct {
> +            uint32_t inorder:1;
> +            uint32_t resvd0115:15;
> +            uint32_t threshold:12;
> +            uint32_t resvd2831:4;
> +        };
> +    };
> +};
> +

Naming the feature bits "intval" may cause confusion; why not name the
field after its meaning, e.g. "engine_features"?
I'm not sure whether the "resvd0115" format matches the DPDK coding style.
As far as I remember, DPDK uses resvd_0 and resvd_1 for two reserved
elements.
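For illustration only, the 32-bit feature word documented in this patch (b0: in-order, b16 - b27: packet length threshold) could also be packed and unpacked with explicit masks, which sidesteps both the field-naming question and any dependence on compiler bit-field layout. All macro and function names below are made up for this mail; they are not part of the patch or of DPDK.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit positions taken from the doc comment in the patch. */
#define ASYNC_FEAT_INORDER_BIT      (UINT32_C(1) << 0)
#define ASYNC_FEAT_THRESHOLD_SHIFT  16
#define ASYNC_FEAT_THRESHOLD_MASK   UINT32_C(0xFFF)   /* 12 bits: b16 - b27 */

/* Build the feature word; reserved bits are left as zero. */
static inline uint32_t
async_features_pack(bool inorder, uint16_t threshold)
{
	assert(threshold <= ASYNC_FEAT_THRESHOLD_MASK);
	return (inorder ? ASYNC_FEAT_INORDER_BIT : 0) |
		((uint32_t)threshold << ASYNC_FEAT_THRESHOLD_SHIFT);
}

/* True when the channel reports in-order completion (b0). */
static inline bool
async_features_inorder(uint32_t features)
{
	return (features & ASYNC_FEAT_INORDER_BIT) != 0;
}

/* Packet length threshold stored in b16 - b27. */
static inline uint16_t
async_features_threshold(uint32_t features)
{
	return (uint16_t)((features >> ASYNC_FEAT_THRESHOLD_SHIFT) &
			ASYNC_FEAT_THRESHOLD_MASK);
}
```

An application would then pass something like async_features_pack(true, 256) as the "features" argument of the register call.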
> +/**
> + * register a dma channel for vhost
> + *
> + * @param vid
> + *  vhost device id DMA channel to be attached to
> + * @param queue_id
> + *  vhost queue id DMA channel to be attached to
> + * @param features
> + *  DMA channel feature bit
> + *    b0       : DMA supports inorder data transfer
> + *    b1  - b15: reserved
> + *    b16 - b27: Packet length threshold for DMA transfer
> + *    b28 - b31: reserved
> + * @param ops
> + *  DMA operation callbacks
> + * @return
> + *  0 on success, -1 on failures
> + */
> +int rte_vhost_async_channel_register(int vid, uint16_t queue_id,
> +    uint32_t features, struct rte_vhost_async_channel_ops *ops);
> +
> +/**
> + * unregister a dma channel for vhost
> + *
> + * @param vid
> + *  vhost device id DMA channel to be detached
> + * @param queue_id
> + *  vhost queue id DMA channel to be detached
> + * @return
> + *  0 on success, -1 on failures
> + */
> +int rte_vhost_async_channel_unregister(int vid, uint16_t queue_id);
> +
> +#endif /* _RTE_VDPA_H_ */
> diff --git a/lib/librte_vhost/socket.c b/lib/librte_vhost/socket.c
> index 0a66ef9..f817783 100644
> --- a/lib/librte_vhost/socket.c
> +++ b/lib/librte_vhost/socket.c
> @@ -42,6 +42,7 @@ struct vhost_user_socket {
>      bool use_builtin_virtio_net;
>      bool extbuf;
>      bool linearbuf;
> +    bool async_copy;
>
>      /*
>       * The "supported_features" indicates the feature bits the
> @@ -210,6 +211,7 @@ struct vhost_user {
>      size_t size;
>      struct vhost_user_connection *conn;
>      int ret;
> +    struct virtio_net *dev;
>
>      if (vsocket == NULL)
>          return;
> @@ -241,6 +243,13 @@ struct vhost_user {
>      if (vsocket->linearbuf)
>          vhost_enable_linearbuf(vid);
>
> +    if (vsocket->async_copy) {
> +        dev = get_device(vid);
> +
> +        if (dev)
> +            dev->async_copy = 1;
> +    }
> +

IMHO, the user should be able to choose which queues use async copy, since
backend hardware resources are limited.
So should the async_copy enable flag be saved in the virtqueue structure?

>      VHOST_LOG_CONFIG(INFO, "new device, handle is %d\n", vid);
>
>      if (vsocket->notify_ops->new_connection) {
> @@ -891,6 +900,17 @@ struct vhost_user_reconnect_list {
>          goto out_mutex;
>      }
>
> +    vsocket->async_copy = flags & RTE_VHOST_USER_ASYNC_COPY;
> +
> +    if (vsocket->async_copy &&
> +        (flags & (RTE_VHOST_USER_IOMMU_SUPPORT |
> +        RTE_VHOST_USER_POSTCOPY_SUPPORT))) {
> +        VHOST_LOG_CONFIG(ERR, "error: enabling async copy and
> IOMMU "
> +            "or post-copy feature simultaneously is not "
> +            "supported\n");
> +        goto out_mutex;
> +    }
> +
>      /*
>       * Set the supported features correctly for the builtin vhost-user
>       * net driver.
> diff --git a/lib/librte_vhost/vhost.c b/lib/librte_vhost/vhost.c
> index 0266318..e6b688a 100644
> --- a/lib/librte_vhost/vhost.c
> +++ b/lib/librte_vhost/vhost.c
> @@ -332,8 +332,13 @@
>  {
>      if (vq_is_packed(dev))
>          rte_free(vq->shadow_used_packed);
> -    else
> +    else {
>          rte_free(vq->shadow_used_split);
> +        if (vq->async_pkts_pending)
> +            rte_free(vq->async_pkts_pending);
> +        if (vq->async_pending_info)
> +            rte_free(vq->async_pending_info);
> +    }
>      rte_free(vq->batch_copy_elems);
>      rte_mempool_free(vq->iotlb_pool);
>      rte_free(vq);
> @@ -1527,3 +1532,70 @@ int rte_vhost_extern_callback_register(int vid,
>      if (vhost_data_log_level >= 0)
>          rte_log_set_level(vhost_data_log_level,
> RTE_LOG_WARNING);
>  }
> +
> +int rte_vhost_async_channel_register(int vid, uint16_t queue_id,
> +                    uint32_t features,
> +                    struct rte_vhost_async_channel_ops
> *ops)
> +{
> +    struct vhost_virtqueue *vq;
> +    struct virtio_net *dev = get_device(vid);
> +    struct dma_channel_features f;
> +
> +    if (dev == NULL || ops == NULL)
> +        return -1;
> +
> +    f.intval = features;
> +
> +    vq = dev->virtqueue[queue_id];
> +
> +    if (vq == NULL)
> +        return -1;
> +
> +    /** packed queue is not supported */
> +    if (vq_is_packed(dev) || !f.inorder)
> +        return -1;
> +

Virtio already has an in_order concept; the two names are so alike that
they can easily be mixed up. Please consider how to distinguish them.
> +    if (ops->check_completed_copies == NULL ||
> +        ops->transfer_data == NULL)
> +        return -1;
> +

The previous error condition is unlikely to be true; the unlikely() macro
may help readability.

> +    rte_spinlock_lock(&vq->access_lock);
> +
> +    vq->async_ops.check_completed_copies = ops-
> >check_completed_copies;
> +    vq->async_ops.transfer_data = ops->transfer_data;
> +
> +    vq->async_inorder = f.inorder;
> +    vq->async_threshold = f.threshold;
> +
> +    vq->async_registered = true;
> +
> +    rte_spinlock_unlock(&vq->access_lock);
> +
> +    return 0;
> +}
> +
> +int rte_vhost_async_channel_unregister(int vid, uint16_t queue_id)
> +{
> +    struct vhost_virtqueue *vq;
> +    struct virtio_net *dev = get_device(vid);
> +
> +    if (dev == NULL)
> +        return -1;
> +
> +    vq = dev->virtqueue[queue_id];
> +
> +    if (vq == NULL)
> +        return -1;
> +
> +    rte_spinlock_lock(&vq->access_lock);
> +
> +    vq->async_ops.transfer_data = NULL;
> +    vq->async_ops.check_completed_copies = NULL;
> +
> +    vq->async_registered = false;
> +
> +    rte_spinlock_unlock(&vq->access_lock);
> +
> +    return 0;
> +}
> +
> diff --git a/lib/librte_vhost/vhost.h b/lib/librte_vhost/vhost.h
> index df98d15..a7fbe23 100644
> --- a/lib/librte_vhost/vhost.h
> +++ b/lib/librte_vhost/vhost.h
> @@ -23,6 +23,8 @@
>  #include "rte_vhost.h"
>  #include "rte_vdpa.h"
>
> +#include "rte_vhost_async.h"
> +
>  /* Used to indicate that the device is running on a data core */
>  #define VIRTIO_DEV_RUNNING 1
>  /* Used to indicate that the device is ready to operate */
> @@ -39,6 +41,11 @@
>
>  #define VHOST_LOG_CACHE_NR 32
>
> +#define MAX_PKT_BURST 32
> +
> +#define VHOST_MAX_ASYNC_IT (MAX_PKT_BURST * 2)
> +#define VHOST_MAX_ASYNC_VEC (BUF_VECTOR_MAX * 2)
> +
>  #define PACKED_DESC_ENQUEUE_USED_FLAG(w) \
>      ((w) ? (VRING_DESC_F_AVAIL | VRING_DESC_F_USED |
> VRING_DESC_F_WRITE) : \
>          VRING_DESC_F_WRITE)
> @@ -200,6 +207,25 @@ struct vhost_virtqueue {
>      TAILQ_HEAD(, vhost_iotlb_entry) iotlb_list;
>      int iotlb_cache_nr;
>      TAILQ_HEAD(, vhost_iotlb_entry) iotlb_pending_list;
> +
> +    /* operation callbacks for async dma */
> +    struct rte_vhost_async_channel_ops async_ops;
> +
> +    struct iov_it it_pool[VHOST_MAX_ASYNC_IT];
> +    struct iovec vec_pool[VHOST_MAX_ASYNC_VEC];
> +
> +    /* async data transfer status */
> +    uintptr_t **async_pkts_pending;
> +    #define ASYNC_PENDING_INFO_N_MSK 0xFFFF
> +    #define ASYNC_PENDING_INFO_N_SFT 16
> +    uint64_t *async_pending_info;
> +    uint16_t async_pkts_idx;
> +    uint16_t async_pkts_inflight_n;
> +
> +    /* vq async features */
> +    bool async_inorder;
> +    bool async_registered;
> +    uint16_t async_threshold;
>  } __rte_cache_aligned;
>
>  /* Old kernels have no such macros defined */
> @@ -353,6 +379,7 @@ struct virtio_net {
>      int16_t broadcast_rarp;
>      uint32_t nr_vring;
>      int dequeue_zero_copy;
> +    int async_copy;
>      int extbuf;
>      int linearbuf;
>      struct vhost_virtqueue *virtqueue[VHOST_MAX_QUEUE_PAIRS * 2];
> @@ -702,7 +729,8 @@ uint64_t translate_log_addr(struct virtio_net *dev,
> struct vhost_virtqueue *vq,
>      /* Don't kick guest if we don't reach index specified by guest. */
>      if (dev->features & (1ULL << VIRTIO_RING_F_EVENT_IDX)) {
>          uint16_t old = vq->signalled_used;
> -        uint16_t new = vq->last_used_idx;
> +        uint16_t new = vq->async_pkts_inflight_n ?
> +            vq->used->idx:vq->last_used_idx;
>          bool signalled_used_valid = vq->signalled_used_valid;
>
>          vq->signalled_used = new;
> diff --git a/lib/librte_vhost/vhost_user.c b/lib/librte_vhost/vhost_user.c
> index 84bebad..d7600bf 100644
> --- a/lib/librte_vhost/vhost_user.c
> +++ b/lib/librte_vhost/vhost_user.c
> @@ -464,12 +464,25 @@
>      } else {
>          if (vq->shadow_used_split)
>              rte_free(vq->shadow_used_split);
> +        if (vq->async_pkts_pending)
> +            rte_free(vq->async_pkts_pending);
> +        if (vq->async_pending_info)
> +            rte_free(vq->async_pending_info);
> +
>          vq->shadow_used_split = rte_malloc(NULL,
>                  vq->size * sizeof(struct vring_used_elem),
>                  RTE_CACHE_LINE_SIZE);
> -        if (!vq->shadow_used_split) {
> +        vq->async_pkts_pending = rte_malloc(NULL,
> +                vq->size * sizeof(uintptr_t),
> +                RTE_CACHE_LINE_SIZE);
> +        vq->async_pending_info = rte_malloc(NULL,
> +                vq->size * sizeof(uint64_t),
> +                RTE_CACHE_LINE_SIZE);
> +        if (!vq->shadow_used_split ||
> +            !vq->async_pkts_pending ||
> +            !vq->async_pending_info) {
>              VHOST_LOG_CONFIG(ERR,
> -                    "failed to allocate memory for
> shadow used ring.\n");
> +                    "failed to allocate memory for vq
> internal data.\n");

If async copy is not enabled, there is no need to allocate the related
structures.

>              return RTE_VHOST_MSG_RESULT_ERR;
>          }
>      }
> @@ -1147,7 +1160,8 @@
>              goto err_mmap;
>          }
>
> -        populate = (dev->dequeue_zero_copy) ? MAP_POPULATE : 0;
> +        populate = (dev->dequeue_zero_copy || dev->async_copy) ?
> +            MAP_POPULATE : 0;
>          mmap_addr = mmap(NULL, mmap_size, PROT_READ |
> PROT_WRITE,
>                   MAP_SHARED | populate, fd, 0);
>
> @@ -1162,7 +1176,7 @@
>          reg->host_user_addr = (uint64_t)(uintptr_t)mmap_addr +
>                        mmap_offset;
>
> -        if (dev->dequeue_zero_copy)
> +        if (dev->dequeue_zero_copy || dev->async_copy)
>              if (add_guest_pages(dev, reg, alignment) < 0) {
>                  VHOST_LOG_CONFIG(ERR,
>                      "adding guest pages to region %u
> failed.\n",
> @@ -1945,6 +1959,12 @@ static int vhost_user_set_vring_err(struct
> virtio_net **pdev __rte_unused,
>      } else {
>          rte_free(vq->shadow_used_split);
>          vq->shadow_used_split = NULL;
> +        if (vq->async_pkts_pending)
> +            rte_free(vq->async_pkts_pending);
> +        if (vq->async_pending_info)
> +            rte_free(vq->async_pending_info);
> +        vq->async_pkts_pending = NULL;
> +        vq->async_pending_info = NULL;
>      }
>
>      rte_free(vq->batch_copy_elems);
> --
> 1.8.3.1
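
To summarise the two allocation-related comments (a per-queue enable flag, and no async bookkeeping when async copy is off), a minimal sketch follows. It is not DPDK code: "struct vq_sketch" is a simplified stand-in for struct vhost_virtqueue, the "async_enabled" field and both function names are hypothetical, and plain calloc()/free() stand in for rte_malloc()/rte_free(). It also assumes the ring size is non-zero.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Simplified stand-in for struct vhost_virtqueue (assumption). */
struct vq_sketch {
	uint32_t size;                   /* ring size, assumed > 0 */
	bool async_enabled;              /* hypothetical per-queue flag */
	uintptr_t **async_pkts_pending;
	uint64_t *async_pending_info;
};

/* Release the async bookkeeping arrays; safe to call repeatedly. */
static void
vq_sketch_free_async(struct vq_sketch *vq)
{
	free(vq->async_pkts_pending);    /* free(NULL) is a no-op */
	free(vq->async_pending_info);
	vq->async_pkts_pending = NULL;
	vq->async_pending_info = NULL;
}

/* Allocate async bookkeeping only for queues that use async copy. */
static int
vq_sketch_alloc_async(struct vq_sketch *vq)
{
	vq_sketch_free_async(vq);

	if (!vq->async_enabled)
		return 0;                /* nothing to allocate */

	vq->async_pkts_pending = calloc(vq->size,
			sizeof(*vq->async_pkts_pending));
	vq->async_pending_info = calloc(vq->size,
			sizeof(*vq->async_pending_info));
	if (vq->async_pkts_pending == NULL ||
			vq->async_pending_info == NULL) {
		vq_sketch_free_async(vq);
		return -1;
	}
	return 0;
}
```

With this shape, queues that never register an async channel pay no memory cost, and the free path needs no NULL checks before releasing the arrays.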