From: "Xia, Chenbo"
To: "Fu, Patrick", "dev@dpdk.org", "maxime.coquelin@redhat.com", "Wang, Zhihong"
CC: "Wang, Yinan", "Jiang, Cheng1", "Liang, Cunming"
Date: Tue, 7 Jul 2020 08:22:08 +0000
Subject: Re: [dpdk-dev] [PATCH v6 1/2] vhost: introduce async enqueue registration API
References: <1591869725-13331-1-git-send-email-patrick.fu@intel.com>
 <20200707050709.205480-1-patrick.fu@intel.com>
 <20200707050709.205480-2-patrick.fu@intel.com>
In-Reply-To: <20200707050709.205480-2-patrick.fu@intel.com>

> -----Original Message-----
> From: Fu, Patrick
> Sent: Tuesday, July 7, 2020 1:07 PM
> To: dev@dpdk.org; maxime.coquelin@redhat.com; Xia, Chenbo; Wang, Zhihong
> Cc: Fu, Patrick; Wang, Yinan; Jiang, Cheng1; Liang, Cunming
> Subject: [PATCH v6 1/2] vhost: introduce async enqueue registration API
>
> From: Patrick Fu
>
> Performing large memory copies usually takes up a major part of CPU
> cycles and becomes the hot spot in vhost-user enqueue operation. To
> offload the large copies from CPU to the DMA devices, asynchronous APIs
> are introduced, with which the CPU just submits copy jobs to the DMA but
> without waiting for its copy completion. Thus, there is no CPU
> intervention during data transfer. We can save precious CPU cycles and
> improve the overall throughput for vhost-user based applications. This
> patch introduces registration/un-registration APIs for vhost async data
> enqueue operation. Together with the registration APIs implementations,
> data structures and the prototype of the async callback functions
> required for async enqueue data path are also defined.
>
> Signed-off-by: Patrick Fu
> ---
>  lib/librte_vhost/Makefile              |   2 +-
>  lib/librte_vhost/meson.build           |   2 +-
>  lib/librte_vhost/rte_vhost.h           |   1 +
>  lib/librte_vhost/rte_vhost_async.h     | 136 +++++++++++++++++++++++++
>  lib/librte_vhost/rte_vhost_version.map |   4 +
>  lib/librte_vhost/socket.c              |  27 +++++
>  lib/librte_vhost/vhost.c               | 127 ++++++++++++++++++++++-
>  lib/librte_vhost/vhost.h               |  30 +++++-
>  lib/librte_vhost/vhost_user.c          |  23 ++++-
>  9 files changed, 345 insertions(+), 7 deletions(-)
>  create mode 100644 lib/librte_vhost/rte_vhost_async.h
>
> diff --git a/lib/librte_vhost/Makefile b/lib/librte_vhost/Makefile
> index b7ff7dc4b..4f2f3e47d 100644
> --- a/lib/librte_vhost/Makefile
> +++ b/lib/librte_vhost/Makefile
> @@ -42,7 +42,7 @@ SRCS-$(CONFIG_RTE_LIBRTE_VHOST) := fd_man.c iotlb.c socket.c vhost.c \
>
>  # install includes
>  SYMLINK-$(CONFIG_RTE_LIBRTE_VHOST)-include += rte_vhost.h rte_vdpa.h \
> -						rte_vdpa_dev.h
> +						rte_vdpa_dev.h rte_vhost_async.h
>
>  # only compile vhost crypto when cryptodev is enabled
>  ifeq ($(CONFIG_RTE_LIBRTE_CRYPTODEV),y)
> diff --git a/lib/librte_vhost/meson.build b/lib/librte_vhost/meson.build
> index 882a0eaf4..cc9aa65c6 100644
> --- a/lib/librte_vhost/meson.build
> +++ b/lib/librte_vhost/meson.build
> @@ -22,5 +22,5 @@ sources = files('fd_man.c', 'iotlb.c', 'socket.c', 'vdpa.c',
>  		'vhost.c', 'vhost_user.c',
>  		'virtio_net.c', 'vhost_crypto.c')
>  headers = files('rte_vhost.h', 'rte_vdpa.h', 'rte_vdpa_dev.h',
> -		'rte_vhost_crypto.h')
> +		'rte_vhost_crypto.h', 'rte_vhost_async.h')
>  deps += ['ethdev', 'cryptodev', 'hash', 'pci']
> diff --git a/lib/librte_vhost/rte_vhost.h b/lib/librte_vhost/rte_vhost.h
> index 8a5c332c8..f93f9595a 100644
> --- a/lib/librte_vhost/rte_vhost.h
> +++ b/lib/librte_vhost/rte_vhost.h
> @@ -35,6 +35,7 @@ extern "C" {
>  #define RTE_VHOST_USER_EXTBUF_SUPPORT	(1ULL << 5)
>  /* support only linear buffers (no chained mbufs) */
>  #define RTE_VHOST_USER_LINEARBUF_SUPPORT	(1ULL << 6)
> +#define RTE_VHOST_USER_ASYNC_COPY	(1ULL << 7)
>
>  /* Features. */
>  #ifndef VIRTIO_NET_F_GUEST_ANNOUNCE
> diff --git a/lib/librte_vhost/rte_vhost_async.h b/lib/librte_vhost/rte_vhost_async.h
> new file mode 100644
> index 000000000..d5a59279a
> --- /dev/null
> +++ b/lib/librte_vhost/rte_vhost_async.h
> @@ -0,0 +1,136 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + * Copyright(c) 2020 Intel Corporation
> + */
> +
> +#ifndef _RTE_VHOST_ASYNC_H_
> +#define _RTE_VHOST_ASYNC_H_
> +
> +#include "rte_vhost.h"
> +
> +/**
> + * iovec iterator
> + */
> +struct rte_vhost_iov_iter {
> +	/** offset to the first byte of interesting data */
> +	size_t offset;
> +	/** total bytes of data in this iterator */
> +	size_t count;
> +	/** pointer to the iovec array */
> +	struct iovec *iov;
> +	/** number of iovec in this iterator */
> +	unsigned long nr_segs;
> +};
> +
> +/**
> + * dma transfer descriptor pair
> + */
> +struct rte_vhost_async_desc {
> +	/** source memory iov_iter */
> +	struct rte_vhost_iov_iter *src;
> +	/** destination memory iov_iter */
> +	struct rte_vhost_iov_iter *dst;
> +};
> +
> +/**
> + * dma transfer status
> + */
> +struct rte_vhost_async_status {
> +	/** An array of application specific data for source memory */
> +	uintptr_t *src_opaque_data;
> +	/** An array of application specific data for destination memory */
> +	uintptr_t *dst_opaque_data;
> +};
> +
> +/**
> + * dma operation callbacks to be implemented by applications
> + */
> +struct rte_vhost_async_channel_ops {
> +	/**
> +	 * instruct async engines to perform copies for a batch of packets
> +	 *
> +	 * @param vid
> +	 *  id of vhost device to perform data copies
> +	 * @param queue_id
> +	 *  queue id to perform data copies
> +	 * @param descs
> +	 *  an array of DMA transfer memory descriptors
> +	 * @param opaque_data
> +	 *  opaque data pair sending to DMA engine
> +	 * @param count
> +	 *  number of elements in the "descs" array
> +	 * @return
> +	 *  -1 on failure, number of descs processed on success
> +	 */
> +	int (*transfer_data)(int vid, uint16_t queue_id,
> +		struct rte_vhost_async_desc *descs,
> +		struct rte_vhost_async_status *opaque_data,
> +		uint16_t count);
> +	/**
> +	 * check copy-completed packets from the async engine
> +	 * @param vid
> +	 *  id of vhost device to check copy completion
> +	 * @param queue_id
> +	 *  queue id to check copyp completion
> +	 * @param opaque_data
> +	 *  buffer to receive the opaque data pair from DMA engine
> +	 * @param max_packets
> +	 *  max number of packets could be completed
> +	 * @return
> +	 *  -1 on failure, number of iov segments completed on success
> +	 */
> +	int (*check_completed_copies)(int vid, uint16_t queue_id,
> +		struct rte_vhost_async_status *opaque_data,
> +		uint16_t max_packets);
> +};
> +
> +/**
> + * dma channel feature bit definition
> + */
> +struct rte_vhost_async_features {
> +	union {
> +		uint32_t intval;
> +		struct {
> +			uint32_t async_inorder:1;
> +			uint32_t resvd_0:15;
> +			uint32_t async_threshold:12;
> +			uint32_t resvd_1:4;
> +		};
> +	};
> +};
> +
> +/**
> + * register a async channel for vhost
> + *
> + * @param vid
> + *  vhost device id async channel to be attached to
> + * @param queue_id
> + *  vhost queue id async channel to be attached to
> + * @param features
> + *  DMA channel feature bit
> + *    b0       : DMA supports inorder data transfer
> + *    b1  - b15: reserved
> + *    b16 - b27: Packet length threshold for DMA transfer
> + *    b28 - b31: reserved
> + * @param ops
> + *  DMA operation callbacks
> + * @return
> + *  0 on success, -1 on failures
> + */
> +__rte_experimental
> +int rte_vhost_async_channel_register(int vid, uint16_t queue_id,
> +	uint32_t features, struct rte_vhost_async_channel_ops *ops);
> +
> +/**
> + * unregister a dma channel for vhost
> + *
> + * @param vid
> + *  vhost device id DMA channel to be detached
> + * @param queue_id
> + *  vhost queue id DMA channel to be detached
> + * @return
> + *  0 on success, -1 on failures
> + */
> +__rte_experimental
> +int rte_vhost_async_channel_unregister(int vid, uint16_t queue_id);
> +
> +#endif /* _RTE_VHOST_ASYNC_H_ */
> diff --git a/lib/librte_vhost/rte_vhost_version.map b/lib/librte_vhost/rte_vhost_version.map
> index 86784405a..13ec53b63 100644
> --- a/lib/librte_vhost/rte_vhost_version.map
> +++ b/lib/librte_vhost/rte_vhost_version.map
> @@ -71,4 +71,8 @@ EXPERIMENTAL {
>  	rte_vdpa_get_queue_num;
>  	rte_vdpa_get_features;
>  	rte_vdpa_get_protocol_features;
> +	rte_vhost_async_channel_register;
> +	rte_vhost_async_channel_unregister;
> +	rte_vhost_submit_enqueue_burst;
> +	rte_vhost_poll_enqueue_completed;
>  };
> diff --git a/lib/librte_vhost/socket.c b/lib/librte_vhost/socket.c
> index 49267cebf..c4626d2c4 100644
> --- a/lib/librte_vhost/socket.c
> +++ b/lib/librte_vhost/socket.c
> @@ -42,6 +42,7 @@ struct vhost_user_socket {
>  	bool use_builtin_virtio_net;
>  	bool extbuf;
>  	bool linearbuf;
> +	bool async_copy;
>
>  	/*
>  	 * The "supported_features" indicates the feature bits the
> @@ -205,6 +206,7 @@ vhost_user_add_connection(int fd, struct vhost_user_socket *vsocket)
>  	size_t size;
>  	struct vhost_user_connection *conn;
>  	int ret;
> +	struct virtio_net *dev;
>
>  	if (vsocket == NULL)
>  		return;
> @@ -236,6 +238,13 @@ vhost_user_add_connection(int fd, struct vhost_user_socket *vsocket)
>  	if (vsocket->linearbuf)
>  		vhost_enable_linearbuf(vid);
>
> +	if (vsocket->async_copy) {
> +		dev = get_device(vid);
> +
> +		if (dev)
> +			dev->async_copy = 1;
> +	}
> +
>  	VHOST_LOG_CONFIG(INFO, "new device, handle is %d\n", vid);
>
>  	if (vsocket->notify_ops->new_connection) {
> @@ -881,6 +890,17 @@ rte_vhost_driver_register(const char *path, uint64_t flags)
>  		goto out_mutex;
>  	}
>
> +	vsocket->async_copy = flags & RTE_VHOST_USER_ASYNC_COPY;
> +
> +	if (vsocket->async_copy &&
> +		(flags & (RTE_VHOST_USER_IOMMU_SUPPORT |
> +		RTE_VHOST_USER_POSTCOPY_SUPPORT))) {
> +		VHOST_LOG_CONFIG(ERR, "error: enabling async copy and IOMMU "
> +			"or post-copy feature simultaneously is not "
> +			"supported\n");
> +		goto out_mutex;
> +	}
> +
>  	/*
>  	 * Set the supported features correctly for the builtin vhost-user
>  	 * net driver.
> @@ -931,6 +951,13 @@ rte_vhost_driver_register(const char *path, uint64_t flags)
>  			~(1ULL << VHOST_USER_PROTOCOL_F_PAGEFAULT);
>  	}
>
> +	if (vsocket->async_copy) {
> +		vsocket->supported_features &= ~(1ULL << VHOST_F_LOG_ALL);
> +		vsocket->features &= ~(1ULL << VHOST_F_LOG_ALL);
> +		VHOST_LOG_CONFIG(INFO,
> +			"Logging feature is disabled in async copy mode\n");
> +	}
> +
>  	/*
>  	 * We'll not be able to receive a buffer from guest in linear mode
>  	 * without external buffer if it will not fit in a single mbuf, which is
> diff --git a/lib/librte_vhost/vhost.c b/lib/librte_vhost/vhost.c
> index 0d822d6a3..a11385f39 100644
> --- a/lib/librte_vhost/vhost.c
> +++ b/lib/librte_vhost/vhost.c
> @@ -332,8 +332,13 @@ free_vq(struct virtio_net *dev, struct vhost_virtqueue *vq)
>  {
>  	if (vq_is_packed(dev))
>  		rte_free(vq->shadow_used_packed);
> -	else
> +	else {
>  		rte_free(vq->shadow_used_split);
> +		if (vq->async_pkts_pending)
> +			rte_free(vq->async_pkts_pending);
> +		if (vq->async_pending_info)
> +			rte_free(vq->async_pending_info);
> +	}
>  	rte_free(vq->batch_copy_elems);
>  	rte_mempool_free(vq->iotlb_pool);
>  	rte_free(vq);
> @@ -1522,3 +1527,123 @@ RTE_INIT(vhost_log_init)
>  	if (vhost_data_log_level >= 0)
>  		rte_log_set_level(vhost_data_log_level, RTE_LOG_WARNING);
>  }
> +
> +int rte_vhost_async_channel_register(int vid, uint16_t queue_id,
> +					uint32_t features,
> +					struct rte_vhost_async_channel_ops *ops)
> +{
> +	struct vhost_virtqueue *vq;
> +	struct virtio_net *dev = get_device(vid);
> +	struct rte_vhost_async_features f;
> +
> +	if (dev == NULL || ops == NULL)
> +		return -1;
> +
> +	f.intval = features;
> +
> +	vq = dev->virtqueue[queue_id];
> +
> +	if (unlikely(vq == NULL || !dev->async_copy))
> +		return -1;
> +
> +	/* packed queue is not supported */
> +	if (unlikely(vq_is_packed(dev) || !f.async_inorder)) {
> +		VHOST_LOG_CONFIG(ERR,
> +			"async copy is not supported on packed queue or non-inorder mode "
> +			"(vid %d, qid: %d)\n", vid, queue_id);
> +		return -1;
> +	}
> +
> +	if (unlikely(ops->check_completed_copies == NULL ||
> +		ops->transfer_data == NULL))
> +		return -1;
> +
> +	rte_spinlock_lock(&vq->access_lock);
> +
> +	if (unlikely(vq->async_registered)) {
> +		VHOST_LOG_CONFIG(ERR,
> +			"async register failed: channel already registered "
> +			"(vid %d, qid: %d)\n", vid, queue_id);
> +		goto reg_out;
> +	}
> +
> +	vq->async_pkts_pending = rte_malloc(NULL,
> +			vq->size * sizeof(uintptr_t),
> +			RTE_CACHE_LINE_SIZE);
> +	vq->async_pending_info = rte_malloc(NULL,
> +			vq->size * sizeof(uint64_t),
> +			RTE_CACHE_LINE_SIZE);
> +	if (!vq->async_pkts_pending || !vq->async_pending_info) {
> +		if (vq->async_pkts_pending)
> +			rte_free(vq->async_pkts_pending);
> +
> +		if (vq->async_pending_info)
> +			rte_free(vq->async_pending_info);
> +
> +		VHOST_LOG_CONFIG(ERR,
> +				"async register failed: cannot allocate memory for vq data "
> +				"(vid %d, qid: %d)\n", vid, queue_id);
> +		goto reg_out;
> +	}
> +
> +	vq->async_ops.check_completed_copies = ops->check_completed_copies;
> +	vq->async_ops.transfer_data = ops->transfer_data;
> +
> +	vq->async_inorder = f.async_inorder;
> +	vq->async_threshold = f.async_threshold;
> +
> +	vq->async_registered = true;
> +
> +reg_out:
> +	rte_spinlock_unlock(&vq->access_lock);
> +
> +	return 0;
> +}
> +
> +int rte_vhost_async_channel_unregister(int vid, uint16_t queue_id)
> +{
> +	struct vhost_virtqueue *vq;
> +	struct virtio_net *dev = get_device(vid);
> +	int ret = -1;
> +
> +	if (dev == NULL)
> +		return ret;
> +
> +	vq = dev->virtqueue[queue_id];
> +
> +	if (vq == NULL)
> +		return ret;
> +
> +	ret = 0;
> +	rte_spinlock_lock(&vq->access_lock);
> +
> +	if (!vq->async_registered)
> +		goto out;
> +
> +	if (vq->async_pkts_inflight_n) {
> +		VHOST_LOG_CONFIG(ERR, "Failed to unregister async channel. "
> +			"async inflight packets must be completed before unregistration.\n");
> +		ret = -1;
> +		goto out;
> +	}
> +
> +	if (vq->async_pkts_pending) {
> +		rte_free(vq->async_pkts_pending);
> +		vq->async_pkts_pending = NULL;
> +	}
> +
> +	if (vq->async_pending_info) {
> +		rte_free(vq->async_pending_info);
> +		vq->async_pending_info = NULL;
> +	}
> +
> +	vq->async_ops.transfer_data = NULL;
> +	vq->async_ops.check_completed_copies = NULL;
> +	vq->async_registered = false;
> +
> +out:
> +	rte_spinlock_unlock(&vq->access_lock);
> +
> +	return ret;
> +}
> +
> diff --git a/lib/librte_vhost/vhost.h b/lib/librte_vhost/vhost.h
> index 034463699..f3731982b 100644
> --- a/lib/librte_vhost/vhost.h
> +++ b/lib/librte_vhost/vhost.h
> @@ -24,6 +24,8 @@
>  #include "rte_vdpa.h"
>  #include "rte_vdpa_dev.h"
>
> +#include "rte_vhost_async.h"
> +
>  /* Used to indicate that the device is running on a data core */
>  #define VIRTIO_DEV_RUNNING 1
>  /* Used to indicate that the device is ready to operate */
> @@ -40,6 +42,11 @@
>
>  #define VHOST_LOG_CACHE_NR 32
>
> +#define MAX_PKT_BURST 32
> +
> +#define VHOST_MAX_ASYNC_IT (MAX_PKT_BURST * 2)
> +#define VHOST_MAX_ASYNC_VEC (BUF_VECTOR_MAX * 2)
> +
>  #define PACKED_DESC_ENQUEUE_USED_FLAG(w)	\
>  	((w) ? (VRING_DESC_F_AVAIL | VRING_DESC_F_USED | VRING_DESC_F_WRITE) : \
>  		VRING_DESC_F_WRITE)
> @@ -202,6 +209,25 @@ struct vhost_virtqueue {
>  	TAILQ_HEAD(, vhost_iotlb_entry) iotlb_list;
>  	int				iotlb_cache_nr;
>  	TAILQ_HEAD(, vhost_iotlb_entry) iotlb_pending_list;
> +
> +	/* operation callbacks for async dma */
> +	struct rte_vhost_async_channel_ops	async_ops;
> +
> +	struct rte_vhost_iov_iter it_pool[VHOST_MAX_ASYNC_IT];
> +	struct iovec vec_pool[VHOST_MAX_ASYNC_VEC];
> +
> +	/* async data transfer status */
> +	uintptr_t	**async_pkts_pending;
> +	#define		ASYNC_PENDING_INFO_N_MSK 0xFFFF
> +	#define		ASYNC_PENDING_INFO_N_SFT 16
> +	uint64_t	*async_pending_info;
> +	uint16_t	async_pkts_idx;
> +	uint16_t	async_pkts_inflight_n;
> +
> +	/* vq async features */
> +	bool		async_inorder;
> +	bool		async_registered;
> +	uint16_t	async_threshold;
>  } __rte_cache_aligned;
>
>  #define VHOST_MAX_VRING			0x100
> @@ -338,6 +364,7 @@ struct virtio_net {
>  	int16_t			broadcast_rarp;
>  	uint32_t		nr_vring;
>  	int			dequeue_zero_copy;
> +	int			async_copy;
>  	int			extbuf;
>  	int			linearbuf;
>  	struct vhost_virtqueue	*virtqueue[VHOST_MAX_QUEUE_PAIRS * 2];
> @@ -683,7 +710,8 @@ vhost_vring_call_split(struct virtio_net *dev, struct vhost_virtqueue *vq)
>  	/* Don't kick guest if we don't reach index specified by guest. */
>  	if (dev->features & (1ULL << VIRTIO_RING_F_EVENT_IDX)) {
>  		uint16_t old = vq->signalled_used;
> -		uint16_t new = vq->last_used_idx;
> +		uint16_t new = vq->async_pkts_inflight_n ?
> +					vq->used->idx:vq->last_used_idx;
>  		bool signalled_used_valid = vq->signalled_used_valid;
>
>  		vq->signalled_used = new;
> diff --git a/lib/librte_vhost/vhost_user.c b/lib/librte_vhost/vhost_user.c
> index 6039a8fdb..aa8605523 100644
> --- a/lib/librte_vhost/vhost_user.c
> +++ b/lib/librte_vhost/vhost_user.c
> @@ -476,12 +476,14 @@ vhost_user_set_vring_num(struct virtio_net **pdev,
>  	} else {
>  		if (vq->shadow_used_split)
>  			rte_free(vq->shadow_used_split);
> +
>  		vq->shadow_used_split = rte_malloc(NULL,
>  				vq->size * sizeof(struct vring_used_elem),
>  				RTE_CACHE_LINE_SIZE);
> +
>  		if (!vq->shadow_used_split) {
>  			VHOST_LOG_CONFIG(ERR,
> -					"failed to allocate memory for shadow used ring.\n");
> +					"failed to allocate memory for vq internal data.\n");
>  			return RTE_VHOST_MSG_RESULT_ERR;
>  		}
>  	}
> @@ -1166,7 +1168,8 @@ vhost_user_set_mem_table(struct virtio_net **pdev, struct VhostUserMsg *msg,
>  			goto err_mmap;
>  		}
>
> -		populate = (dev->dequeue_zero_copy) ? MAP_POPULATE : 0;
> +		populate = (dev->dequeue_zero_copy || dev->async_copy) ?
> +			MAP_POPULATE : 0;
>  		mmap_addr = mmap(NULL, mmap_size, PROT_READ | PROT_WRITE,
>  				 MAP_SHARED | populate, fd, 0);
>
> @@ -1181,7 +1184,7 @@ vhost_user_set_mem_table(struct virtio_net **pdev, struct VhostUserMsg *msg,
>  		reg->host_user_addr = (uint64_t)(uintptr_t)mmap_addr +
>  				      mmap_offset;
>
> -		if (dev->dequeue_zero_copy)
> +		if (dev->dequeue_zero_copy || dev->async_copy)
>  			if (add_guest_pages(dev, reg, alignment) < 0) {
>  				VHOST_LOG_CONFIG(ERR,
>  					"adding guest pages to region %u failed.\n",
> @@ -1979,6 +1982,12 @@ vhost_user_get_vring_base(struct virtio_net **pdev,
>  	} else {
>  		rte_free(vq->shadow_used_split);
>  		vq->shadow_used_split = NULL;
> +		if (vq->async_pkts_pending)
> +			rte_free(vq->async_pkts_pending);
> +		if (vq->async_pending_info)
> +			rte_free(vq->async_pending_info);
> +		vq->async_pkts_pending = NULL;
> +		vq->async_pending_info = NULL;
>  	}
>
>  	rte_free(vq->batch_copy_elems);
> @@ -2012,6 +2021,14 @@ vhost_user_set_vring_enable(struct virtio_net **pdev,
>  		"set queue enable: %d to qp idx: %d\n",
>  		enable, index);
>
> +	if (!enable && dev->virtqueue[index]->async_registered) {
> +		if (dev->virtqueue[index]->async_pkts_inflight_n) {
> +			VHOST_LOG_CONFIG(ERR, "failed to disable vring. "
> +			"async inflight packets must be completed first\n");
> +			return RTE_VHOST_MSG_RESULT_ERR;
> +		}
> +	}
> +
>  	/* On disable, rings have to be stopped being processed. */
>  	if (!enable && dev->dequeue_zero_copy)
>  		drain_zmbuf_list(dev->virtqueue[index]);
> --
> 2.18.4

Reviewed-by: Chenbo Xia
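
For readers following the thread: the "features" word taken by rte_vhost_async_channel_register() is documented in the quoted header as b0 = in-order transfer and b16 - b27 = packet length threshold. A minimal sketch of building and reading such a word with plain shifts is below. It assumes only that documented bit layout; the macro and function names are hypothetical and are not part of the patch or of DPDK.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Bit positions copied from the doc comment on
 * rte_vhost_async_channel_register() in the quoted header. */
#define ASYNC_INORDER_SHIFT	0u	/* b0 */
#define ASYNC_THRESHOLD_SHIFT	16u	/* b16 - b27 */
#define ASYNC_THRESHOLD_MASK	0xFFFu	/* 12-bit field */

/* Hypothetical helper: build the 32-bit features word.
 * Returns 0 on success, -1 if the threshold does not fit in 12 bits. */
static int
async_features_pack(bool inorder, uint32_t threshold, uint32_t *features)
{
	if (features == NULL || threshold > ASYNC_THRESHOLD_MASK)
		return -1;

	*features = ((uint32_t)inorder << ASYNC_INORDER_SHIFT) |
		    (threshold << ASYNC_THRESHOLD_SHIFT);
	return 0;
}

/* Hypothetical helper: read back the in-order bit (b0). */
static bool
async_features_inorder(uint32_t features)
{
	return ((features >> ASYNC_INORDER_SHIFT) & 1u) != 0;
}

/* Hypothetical helper: read back the threshold field (b16 - b27). */
static uint32_t
async_features_threshold(uint32_t features)
{
	return (features >> ASYNC_THRESHOLD_SHIFT) & ASYNC_THRESHOLD_MASK;
}

/* Usage: in-order channel with a 256-byte copy threshold. */
static void
async_features_example(void)
{
	uint32_t features = 0;

	assert(async_features_pack(true, 256, &features) == 0);
	assert(async_features_inorder(features));
	assert(async_features_threshold(features) == 256);
}
```

A word built this way would be passed as the "features" argument at registration. Note that the quoted implementation rejects registration when the in-order bit is clear, and that it decodes the word through a bit-field union, whose layout is compiler-dependent; the shifts above follow the documented bit numbers rather than any particular compiler's bit-field ordering.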