From: "Liu, Yong"
To: "Fu, Patrick"
CC: "Jiang, Cheng1", "Liang, Cunming", "dev@dpdk.org", "maxime.coquelin@redhat.com", "Xia, Chenbo", "Wang, Zhihong", "Ye, Xiaolong"
Date: Fri, 19 Jun 2020 00:40:22 +0000
Message-ID: <86228AFD5BCD8E4EBFD2B90117B5E81E635F6A2C@SHSMSX103.ccr.corp.intel.com>
Subject: Re: [dpdk-dev] [PATCH v1 1/2] vhost: introduce async data path registration API

> -----Original Message-----
> From: Fu, Patrick
> Sent: Thursday, June 18, 2020 5:09 PM
> To: Liu, Yong
> Cc: Jiang, Cheng1; Liang, Cunming; dev@dpdk.org; maxime.coquelin@redhat.com;
> Xia, Chenbo; Wang, Zhihong; Ye, Xiaolong
> Subject: RE: [dpdk-dev] [PATCH v1 1/2] vhost: introduce async data path
> registration API
>
>
>
> > -----Original Message-----
> > From: Liu, Yong
> > Sent: Thursday, June 18, 2020 1:51 PM
> > To: Fu, Patrick
> > Cc: Fu, Patrick; Jiang, Cheng1; Liang, Cunming;
> > dev@dpdk.org; maxime.coquelin@redhat.com; Xia, Chenbo;
> > Wang, Zhihong; Ye, Xiaolong
> > Subject: RE: [dpdk-dev] [PATCH v1 1/2] vhost: introduce async data path
> > registration API
> >
> > Thanks, Patrick. Some comments are inline.
> >
> > > -----Original Message-----
> > > From: dev On Behalf Of patrick.fu@intel.com
> > > Sent: Thursday, June 11, 2020 6:02 PM
> > > To: dev@dpdk.org; maxime.coquelin@redhat.com; Xia, Chenbo;
> > > Wang, Zhihong; Ye, Xiaolong
> > > Cc: Fu, Patrick; Jiang, Cheng1; Liang, Cunming
> > > Subject: [dpdk-dev] [PATCH v1 1/2] vhost: introduce async data path
> > > registration API
> > >
> > > From: Patrick
> > >
> > > This patch introduces registration/un-registration APIs for async data
> > > path together with all required data structures and DMA callback
> > > function prototypes.
> > >
> > > Signed-off-by: Patrick
> > > ---
> > >  lib/librte_vhost/Makefile          |   3 +-
> > >  lib/librte_vhost/rte_vhost.h       |   1 +
> > >  lib/librte_vhost/rte_vhost_async.h | 134 +++++++++++++++++++++++++++++++++++++
> > >  lib/librte_vhost/socket.c          |  20 ++++++
> > >  lib/librte_vhost/vhost.c           |  74 +++++++++++++++++++-
> > >  lib/librte_vhost/vhost.h           |  30 ++++++++-
> > >  lib/librte_vhost/vhost_user.c      |  28 ++++++--
> > >  7 files changed, 283 insertions(+), 7 deletions(-)
> > >  create mode 100644 lib/librte_vhost/rte_vhost_async.h
> > >
> > > diff --git a/lib/librte_vhost/Makefile b/lib/librte_vhost/Makefile
> > > index e592795..3aed094 100644
> > > --- a/lib/librte_vhost/Makefile
> > > +++ b/lib/librte_vhost/Makefile
> > > @@ -41,7 +41,8 @@ SRCS-$(CONFIG_RTE_LIBRTE_VHOST) := fd_man.c iotlb.c socket.c vhost.c \
> > >  					vhost_user.c virtio_net.c vdpa.c
> > >
> > >  # install includes
> > > -SYMLINK-$(CONFIG_RTE_LIBRTE_VHOST)-include += rte_vhost.h rte_vdpa.h
> > > +SYMLINK-$(CONFIG_RTE_LIBRTE_VHOST)-include += rte_vhost.h rte_vdpa.h \
> > > +						rte_vhost_async.h
> > >
> >
> > Hi Patrick,
> > Please also update the meson build for the newly added file.
> >
> > Thanks,
> > Marvin
> >
> > >  # only compile vhost crypto when cryptodev is enabled
> > >  ifeq ($(CONFIG_RTE_LIBRTE_CRYPTODEV),y)
> > > diff --git a/lib/librte_vhost/rte_vhost.h b/lib/librte_vhost/rte_vhost.h
> > > index d43669f..cec4d07 100644
> > > --- a/lib/librte_vhost/rte_vhost.h
> > > +++ b/lib/librte_vhost/rte_vhost.h
> > > @@ -35,6 +35,7 @@
> > >  #define RTE_VHOST_USER_EXTBUF_SUPPORT	(1ULL << 5)
> > >  /* support only linear buffers (no chained mbufs) */
> > >  #define RTE_VHOST_USER_LINEARBUF_SUPPORT	(1ULL << 6)
> > > +#define RTE_VHOST_USER_ASYNC_COPY	(1ULL << 7)
> > >
> > >  /** Protocol features. */
> > >  #ifndef VHOST_USER_PROTOCOL_F_MQ
> > > diff --git a/lib/librte_vhost/rte_vhost_async.h b/lib/librte_vhost/rte_vhost_async.h
> > > new file mode 100644
> > > index 0000000..82f2ebe
> > > --- /dev/null
> > > +++ b/lib/librte_vhost/rte_vhost_async.h
> > > @@ -0,0 +1,134 @@
> > > +/* SPDX-License-Identifier: BSD-3-Clause
> > > + * Copyright(c) 2018 Intel Corporation
> > > + */
> >
> > s/2018/2020/
> >
> > > +
> > > +#ifndef _RTE_VHOST_ASYNC_H_
> > > +#define _RTE_VHOST_ASYNC_H_
> > > +
> > > +#include "rte_vhost.h"
> > > +
> > > +/**
> > > + * iovec iterator
> > > + */
> > > +struct iov_it {
> > > +	/** offset to the first byte of interesting data */
> > > +	size_t offset;
> > > +	/** total bytes of data in this iterator */
> > > +	size_t count;
> > > +	/** pointer to the iovec array */
> > > +	struct iovec *iov;
> > > +	/** number of iovec in this iterator */
> > > +	unsigned long nr_segs;
> > > +};
> >
> > Patrick,
> > I think the structure name "it" is too generic to be understood; please
> > use a more meaningful name like "iov_iter".
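To make the suggested rename concrete, here is a minimal C sketch of the iterator under the name `iov_iter`, together with a CPU `memcpy` copy from a source iterator into a destination iterator, i.e. the work a copy engine would perform for one source/destination pair. The helpers `iov_iter_init` and `iov_iter_copy` are hypothetical, and the sketch assumes `count` covers only the bytes after `offset`; the patch comment does not pin that down.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/uio.h>

/* Hypothetical rename of the patch's struct iov_it. */
struct iov_iter {
	size_t offset;         /* offset to the first byte of interesting data */
	size_t count;          /* bytes available after offset (assumption) */
	struct iovec *iov;     /* pointer to the iovec array */
	unsigned long nr_segs; /* number of iovecs in this iterator */
};

/* Hypothetical helper: cover iov[0..nr_segs), starting `offset` bytes in. */
static void
iov_iter_init(struct iov_iter *it, struct iovec *iov,
	      unsigned long nr_segs, size_t offset)
{
	size_t total = 0;
	unsigned long i;

	for (i = 0; i < nr_segs; i++)
		total += iov[i].iov_len;

	assert(offset <= total);
	it->offset = offset;
	it->count = total - offset;
	it->iov = iov;
	it->nr_segs = nr_segs;
}

/*
 * Hypothetical CPU fallback: copy min(src->count, dst->count) bytes from
 * src's segments into dst's segments. Returns the number of bytes copied.
 */
static size_t
iov_iter_copy(const struct iov_iter *dst, const struct iov_iter *src)
{
	size_t todo = src->count < dst->count ? src->count : dst->count;
	size_t soff = src->offset, doff = dst->offset, done = 0;
	unsigned long si = 0, di = 0;

	while (done < todo) {
		/* step over segments already consumed or skipped by offset */
		while (soff >= src->iov[si].iov_len) {
			soff -= src->iov[si].iov_len;
			si++;
			assert(si < src->nr_segs);
		}
		while (doff >= dst->iov[di].iov_len) {
			doff -= dst->iov[di].iov_len;
			di++;
			assert(di < dst->nr_segs);
		}

		/* copy as much as fits in both current segments */
		size_t n = src->iov[si].iov_len - soff;
		if (dst->iov[di].iov_len - doff < n)
			n = dst->iov[di].iov_len - doff;
		if (todo - done < n)
			n = todo - done;

		memcpy((char *)dst->iov[di].iov_base + doff,
		       (const char *)src->iov[si].iov_base + soff, n);
		soff += n;
		doff += n;
		done += n;
	}
	return done;
}
```

Because the source and destination may be segmented differently (for example a 3+5 byte source into a 4+4 byte destination), each step copies the smaller of the two remaining segment lengths.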
> > > +
> > > +/**
> > > + * dma transfer descriptor pair
> > > + */
> > > +struct dma_trans_desc {
> > > +	/** source memory iov_it */
> > > +	struct iov_it *src;
> > > +	/** destination memory iov_it */
> > > +	struct iov_it *dst;
> > > +};
> > > +
> >
> > This patch series is named async copy, and DMA is just one async copy
> > method supplied by the underlying hardware.
> > IMHO, the structure would be better named "async_copy_desc", which matches
> > the overall concept.
> >
> > > +/**
> > > + * dma transfer status
> > > + */
> > > +struct dma_trans_status {
> > > +	/** An array of application specific data for source memory */
> > > +	uintptr_t *src_opaque_data;
> > > +	/** An array of application specific data for destination memory */
> > > +	uintptr_t *dst_opaque_data;
> > > +};
> > > +
> > Same as previous comment.
> >
> > > +/**
> > > + * dma operation callbacks to be implemented by applications
> > > + */
> > > +struct rte_vhost_async_channel_ops {
> > > +	/**
> > > +	 * instruct a DMA channel to perform copies for a batch of packets
> > > +	 *
> > > +	 * @param vid
> > > +	 *  id of vhost device to perform data copies
> > > +	 * @param queue_id
> > > +	 *  queue id to perform data copies
> > > +	 * @param descs
> > > +	 *  an array of DMA transfer memory descriptors
> > > +	 * @param opaque_data
> > > +	 *  opaque data pair sending to DMA engine
> > > +	 * @param count
> > > +	 *  number of elements in the "descs" array
> > > +	 * @return
> > > +	 *  -1 on failure, number of descs processed on success
> > > +	 */
> > > +	int (*transfer_data)(int vid, uint16_t queue_id,
> > > +		struct dma_trans_desc *descs,
> > > +		struct dma_trans_status *opaque_data,
> > > +		uint16_t count);
> > > +	/**
> > > +	 * check copy-completed packets from a DMA channel
> > > +	 * @param vid
> > > +	 *  id of vhost device to check copy completion
> > > +	 * @param queue_id
> > > +	 *  queue id to check copy completion
> > > +	 * @param opaque_data
> > > +	 *  buffer to receive the opaque data pair from DMA engine
> > > +	 * @param max_packets
> > > +	 *  max number of packets could be completed
> > > +	 * @return
> > > +	 *  -1 on failure, number of iov segments completed on success
> > > +	 */
> > > +	int (*check_completed_copies)(int vid, uint16_t queue_id,
> > > +		struct dma_trans_status *opaque_data,
> > > +		uint16_t max_packets);
> > > +};
> > > +
> > > +/**
> > > + * dma channel feature bit definition
> > > + */
> > > +struct dma_channel_features {
> > > +	union {
> > > +		uint32_t intval;
> > > +		struct {
> > > +			uint32_t inorder:1;
> > > +			uint32_t resvd0115:15;
> > > +			uint32_t threshold:12;
> > > +			uint32_t resvd2831:4;
> > > +		};
> > > +	};
> > > +};
> > > +
> >
> > Naming the feature bits "intval" may cause confusion; why not use a name
> > that reflects their meaning, like "engine_features"?
> > I'm not sure whether the format "resvd0115" matches DPDK coding style. In
> > my mind, DPDK would use resvd_0 and resvd_1 for two reserved elements.
> >
> For the comments above, I will make the changes in the v2 patch.
>
> > > +	if (dev)
> > > +		dev->async_copy = 1;
> > > +}
> > > +
> >
> > IMHO, users can choose which queues use async copy, since backend
> > hardware resources are limited.
> > So should the async_copy enable flag be saved in the virtqueue structure?
> >
> We have a per-queue flag to identify the enabling status of a specific queue.
> The "async_copy" flag is a device-level flag which identifies the async
> capability of a vhost device.
> This is necessary because we rely on this flag to do initialization work if
> the vhost backend needs to support async mode on any of its queues.
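A minimal C sketch of how a device-level capability flag, as described above, could gate per-queue registration, with the feature word decoded into per-queue fields. All `demo_*` names are hypothetical stand-ins for `virtio_net`, `vhost_virtqueue`, and `rte_vhost_async_channel_register`; the field names `async_copy_enabled` and `async_inorder` follow suggestions made in this thread. The bit positions (in-order at bit 0, threshold at bits 16-27) assume the bit-fields of `dma_channel_features` are allocated LSB-first, as GCC and Clang do on little-endian targets. Unlike the quoted patch, the sketch also bounds-checks `queue_id` before indexing the queue array.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed layout of dma_channel_features: bit 0 in-order, bits 16-27 threshold. */
#define DEMO_F_ASYNC_INORDER	(UINT32_C(1) << 0)
#define DEMO_F_THRESHOLD_SHIFT	16
#define DEMO_F_THRESHOLD_MASK	UINT32_C(0xFFF)

static_assert(DEMO_F_THRESHOLD_SHIFT + 12 <= 32,
	      "threshold field must fit in the 32-bit feature word");

#define DEMO_MAX_QUEUES		4

/* Hypothetical stand-in for the async fields of vhost_virtqueue. */
struct demo_vq {
	bool async_registered;
	bool async_inorder;
	uint16_t async_threshold;
};

/* Hypothetical stand-in for virtio_net. */
struct demo_dev {
	/* device-level capability, set once from the socket flags */
	bool async_copy_enabled;
	struct demo_vq *virtqueue[DEMO_MAX_QUEUES];
};

/* Hypothetical stand-in for rte_vhost_async_channel_register(). */
static int
demo_async_register(struct demo_dev *dev, uint16_t queue_id,
		    uint32_t engine_features)
{
	struct demo_vq *vq;

	/* the device must have been created with async copy enabled */
	if (dev == NULL || !dev->async_copy_enabled)
		return -1;

	if (queue_id >= DEMO_MAX_QUEUES)
		return -1;

	vq = dev->virtqueue[queue_id];
	if (vq == NULL)
		return -1;

	/* only in-order completion is accepted, as in the patch */
	if (!(engine_features & DEMO_F_ASYNC_INORDER))
		return -1;

	vq->async_inorder = true;
	vq->async_threshold = (uint16_t)((engine_features >> DEMO_F_THRESHOLD_SHIFT) &
					 DEMO_F_THRESHOLD_MASK);
	vq->async_registered = true;
	return 0;
}
```

Decoding with shifts and masks rather than through the bit-field union keeps the layout explicit, since C leaves the allocation order of bit-fields within a storage unit implementation-defined.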
>
Got it. How about renaming this variable to "async_copy_enabled", which is
more aligned with its real meaning?

Thanks,
Marvin

> > >  	VHOST_LOG_CONFIG(INFO, "new device, handle is %d\n", vid);
> > >
> > >  	if (vsocket->notify_ops->new_connection) {
> > > @@ -891,6 +900,17 @@ struct vhost_user_reconnect_list {
> > >  		goto out_mutex;
> > >  	}
> > >
> > > +	vsocket->async_copy = flags & RTE_VHOST_USER_ASYNC_COPY;
> > > +
> > > +	if (vsocket->async_copy &&
> > > +		(flags & (RTE_VHOST_USER_IOMMU_SUPPORT |
> > > +		RTE_VHOST_USER_POSTCOPY_SUPPORT))) {
> > > +		VHOST_LOG_CONFIG(ERR, "error: enabling async copy and IOMMU "
> > > +			"or post-copy feature simultaneously is not "
> > > +			"supported\n");
> > > +		goto out_mutex;
> > > +	}
> > > +
> > >  	/*
> > >  	 * Set the supported features correctly for the builtin vhost-user
> > >  	 * net driver.
> > > diff --git a/lib/librte_vhost/vhost.c b/lib/librte_vhost/vhost.c
> > > index 0266318..e6b688a 100644
> > > --- a/lib/librte_vhost/vhost.c
> > > +++ b/lib/librte_vhost/vhost.c
> > > @@ -332,8 +332,13 @@
> > >  {
> > >  	if (vq_is_packed(dev))
> > >  		rte_free(vq->shadow_used_packed);
> > > -	else
> > > +	else {
> > >  		rte_free(vq->shadow_used_split);
> > > +		if (vq->async_pkts_pending)
> > > +			rte_free(vq->async_pkts_pending);
> > > +		if (vq->async_pending_info)
> > > +			rte_free(vq->async_pending_info);
> > > +	}
> > >  	rte_free(vq->batch_copy_elems);
> > >  	rte_mempool_free(vq->iotlb_pool);
> > >  	rte_free(vq);
> > > @@ -1527,3 +1532,70 @@ int rte_vhost_extern_callback_register(int vid,
> > >  	if (vhost_data_log_level >= 0)
> > >  		rte_log_set_level(vhost_data_log_level, RTE_LOG_WARNING);
> > >  }
> > > +
> > > +int rte_vhost_async_channel_register(int vid, uint16_t queue_id,
> > > +					uint32_t features,
> > > +					struct rte_vhost_async_channel_ops *ops)
> > > +{
> > > +	struct vhost_virtqueue *vq;
> > > +	struct virtio_net *dev = get_device(vid);
> > > +	struct dma_channel_features f;
> > > +
> > > +	if (dev == NULL || ops == NULL)
> > > +		return -1;
> > > +
> > > +	f.intval = features;
> > > +
> > > +	vq = dev->virtqueue[queue_id];
> > > +
> > > +	if (vq == NULL)
> > > +		return -1;
> > > +
> > > +	/** packed queue is not supported */
> > > +	if (vq_is_packed(dev) || !f.inorder)
> > > +		return -1;
> > > +
> > Virtio already has an in_order concept; these two names are so alike that
> > they can easily be mixed up. Please consider how to distinguish them.
> >
> What about "async_inorder"?
Looks great to me.
>
> > > +	if (ops->check_completed_copies == NULL ||
> > > +		ops->transfer_data == NULL)
> > > +		return -1;
> > > +
> >
> > The error conditions above are unlikely to be true; wrapping them in the
> > unlikely() macro may help readers understand that.
> >
> Will update in v2 patch
>
> > > +	rte_spinlock_lock(&vq->access_lock);
> > > +
> > > +	vq->async_ops.check_completed_copies = ops->check_completed_copies;
> > > +	vq->async_ops.transfer_data = ops->transfer_data;
> > > +
> > > +	vq->async_inorder = f.inorder;
> > > +	vq->async_threshold = f.threshold;
> > > +
> > > +	vq->async_registered = true;
> > > +
> > > +	rte_spinlock_unlock(&vq->access_lock);
> > > +
> > > +	return 0;
> > > +}
> > > +
> > > +int rte_vhost_async_channel_unregister(int vid, uint16_t queue_id)
> > > +{
> > > +	struct vhost_virtqueue *vq;
> > > +	struct virtio_net *dev = get_device(vid);
> > > +
> > > +	if (dev == NULL)
> > > +		return -1;
> > > +
> > > +	vq = dev->virtqueue[queue_id];
> > > +
> > > +	if (vq == NULL)
> > > +		return -1;
> > > +
> > > +	rte_spinlock_lock(&vq->access_lock);
> > > +
> > > +	vq->async_ops.transfer_data = NULL;
> > > +	vq->async_ops.check_completed_copies = NULL;
> > > +
> > > +	vq->async_registered = false;
> > > +
> > > +	rte_spinlock_unlock(&vq->access_lock);
> > > +
> > > +	return 0;
> > > +}
> > > +
> > > diff --git a/lib/librte_vhost/vhost.h b/lib/librte_vhost/vhost.h
> > > index df98d15..a7fbe23 100644
> > > --- a/lib/librte_vhost/vhost.h
> > > +++ b/lib/librte_vhost/vhost.h
> > > @@ -23,6 +23,8 @@
> > >  #include "rte_vhost.h"
> > >  #include "rte_vdpa.h"
> > >
> > > +#include "rte_vhost_async.h"
> > > +
> > >  /* Used to indicate that the device is running on a data core */
> > >  #define VIRTIO_DEV_RUNNING 1
> > >  /* Used to indicate that the device is ready to operate */
> > > @@ -39,6 +41,11 @@
> > >
> > >  #define VHOST_LOG_CACHE_NR 32
> > >
> > > +#define MAX_PKT_BURST 32
> > > +
> > > +#define VHOST_MAX_ASYNC_IT (MAX_PKT_BURST * 2)
> > > +#define VHOST_MAX_ASYNC_VEC (BUF_VECTOR_MAX * 2)
> > > +
> > >  #define PACKED_DESC_ENQUEUE_USED_FLAG(w)	\
> > >  	((w) ? (VRING_DESC_F_AVAIL | VRING_DESC_F_USED | VRING_DESC_F_WRITE) : \
> > >  		VRING_DESC_F_WRITE)
> > > @@ -200,6 +207,25 @@ struct vhost_virtqueue {
> > >  	TAILQ_HEAD(, vhost_iotlb_entry) iotlb_list;
> > >  	int			iotlb_cache_nr;
> > >  	TAILQ_HEAD(, vhost_iotlb_entry) iotlb_pending_list;
> > > +
> > > +	/* operation callbacks for async dma */
> > > +	struct rte_vhost_async_channel_ops	async_ops;
> > > +
> > > +	struct iov_it it_pool[VHOST_MAX_ASYNC_IT];
> > > +	struct iovec vec_pool[VHOST_MAX_ASYNC_VEC];
> > > +
> > > +	/* async data transfer status */
> > > +	uintptr_t	**async_pkts_pending;
> > > +	#define		ASYNC_PENDING_INFO_N_MSK 0xFFFF
> > > +	#define		ASYNC_PENDING_INFO_N_SFT 16
> > > +	uint64_t	*async_pending_info;
> > > +	uint16_t	async_pkts_idx;
> > > +	uint16_t	async_pkts_inflight_n;
> > > +
> > > +	/* vq async features */
> > > +	bool		async_inorder;
> > > +	bool		async_registered;
> > > +	uint16_t	async_threshold;
> > >  } __rte_cache_aligned;
> > >
> > >  /* Old kernels have no such macros defined */
> > > @@ -353,6 +379,7 @@ struct virtio_net {
> > >  	int16_t			broadcast_rarp;
> > >  	uint32_t		nr_vring;
> > >  	int			dequeue_zero_copy;
> > > +	int			async_copy;
> > >  	int			extbuf;
> > >  	int			linearbuf;
> > >  	struct vhost_virtqueue	*virtqueue[VHOST_MAX_QUEUE_PAIRS * 2];
> > > @@ -702,7 +729,8 @@ uint64_t translate_log_addr(struct virtio_net *dev, struct vhost_virtqueue *vq,
> > >  	/* Don't kick guest if we don't reach index specified by guest. */
> > >  	if (dev->features & (1ULL << VIRTIO_RING_F_EVENT_IDX)) {
> > >  		uint16_t old = vq->signalled_used;
> > > -		uint16_t new = vq->last_used_idx;
> > > +		uint16_t new = vq->async_pkts_inflight_n ?
> > > +					vq->used->idx:vq->last_used_idx;
> > >  		bool signalled_used_valid = vq->signalled_used_valid;
> > >
> > >  		vq->signalled_used = new;
> > > diff --git a/lib/librte_vhost/vhost_user.c b/lib/librte_vhost/vhost_user.c
> > > index 84bebad..d7600bf 100644
> > > --- a/lib/librte_vhost/vhost_user.c
> > > +++ b/lib/librte_vhost/vhost_user.c
> > > @@ -464,12 +464,25 @@
> > >  	} else {
> > >  		if (vq->shadow_used_split)
> > >  			rte_free(vq->shadow_used_split);
> > > +		if (vq->async_pkts_pending)
> > > +			rte_free(vq->async_pkts_pending);
> > > +		if (vq->async_pending_info)
> > > +			rte_free(vq->async_pending_info);
> > > +
> > >  		vq->shadow_used_split = rte_malloc(NULL,
> > >  				vq->size * sizeof(struct vring_used_elem),
> > >  				RTE_CACHE_LINE_SIZE);
> > > -		if (!vq->shadow_used_split) {
> > > +		vq->async_pkts_pending = rte_malloc(NULL,
> > > +				vq->size * sizeof(uintptr_t),
> > > +				RTE_CACHE_LINE_SIZE);
> > > +		vq->async_pending_info = rte_malloc(NULL,
> > > +				vq->size * sizeof(uint64_t),
> > > +				RTE_CACHE_LINE_SIZE);
> > > +		if (!vq->shadow_used_split ||
> > > +			!vq->async_pkts_pending ||
> > > +			!vq->async_pending_info) {
> > >  			VHOST_LOG_CONFIG(ERR,
> > > -					"failed to allocate memory for shadow used ring.\n");
> > > +					"failed to allocate memory for vq internal data.\n");
> >
> > If async copy is not enabled, there is no need to allocate the related
> > structures.
> >
> Will update in v2 patch
>
> Thanks,
>
> Patrick
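Following the last review comment, a minimal C sketch of allocating the async bookkeeping arrays only when async copy is enabled on the device. It uses plain `malloc`/`free` in place of `rte_malloc`/`rte_free`, and `demo_split_vq`, `demo_used_elem`, and both helpers are hypothetical; `demo_used_elem` mirrors the two 32-bit fields (`id`, `len`) of the virtio used-ring element.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical mirror of struct vring_used_elem (id, len). */
struct demo_used_elem {
	uint32_t id;
	uint32_t len;
};

/* Hypothetical subset of vhost_virtqueue for the split-ring case. */
struct demo_split_vq {
	uint16_t size;
	struct demo_used_elem *shadow_used_split;
	uintptr_t **async_pkts_pending;
	uint64_t *async_pending_info;
};

static void
demo_vq_free_split(struct demo_split_vq *vq)
{
	/* free(NULL) is a no-op, so no NULL checks are needed */
	free(vq->shadow_used_split);
	free(vq->async_pkts_pending);
	free(vq->async_pending_info);
	vq->shadow_used_split = NULL;
	vq->async_pkts_pending = NULL;
	vq->async_pending_info = NULL;
}

/* Returns 0 on success, -1 on allocation failure (everything freed). */
static int
demo_vq_alloc_split(struct demo_split_vq *vq, bool async_copy_enabled)
{
	assert(vq != NULL);
	demo_vq_free_split(vq);

	vq->shadow_used_split = malloc(vq->size * sizeof(*vq->shadow_used_split));
	if (vq->shadow_used_split == NULL)
		return -1;

	/* async bookkeeping is only needed when the device enabled async copy */
	if (!async_copy_enabled)
		return 0;

	vq->async_pkts_pending = malloc(vq->size * sizeof(*vq->async_pkts_pending));
	vq->async_pending_info = malloc(vq->size * sizeof(*vq->async_pending_info));
	if (vq->async_pkts_pending == NULL || vq->async_pending_info == NULL) {
		demo_vq_free_split(vq);
		return -1;
	}
	return 0;
}
```

Sizing each array with `sizeof(*ptr)` ties the element size to the pointer's declared type. In the quoted hunk, `async_pkts_pending` is a `uintptr_t **` but is sized with `sizeof(uintptr_t)`, which works only because the two sizes happen to match on common platforms.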