From: Maxime Coquelin <maxime.coquelin@redhat.com>
To: "Hu, Jiayu", dev@dpdk.org
Cc: "Ye, Xiaolong", "Wang, Zhihong"
References: <1584436885-18651-1-git-send-email-jiayu.hu@intel.com>
 <370798e0-b006-4a33-d8d9-1aea7bf4af49@redhat.com>
 <33221483053a41e8bd8d4bd0cb582634@intel.com>
Subject: Re: [dpdk-dev] [PATCH 0/4] Support DMA-accelerated Tx operations for vhost-user PMD
Date: Thu, 19 Mar 2020 10:10:28 +0100
Message-ID: <63c79431-96bf-79e9-fb75-1714e194257f@redhat.com>
In-Reply-To: <33221483053a41e8bd8d4bd0cb582634@intel.com>

Hi Jiayu,

On 3/19/20 8:33 AM, Hu, Jiayu wrote:
> Hi Maxime,
>
> Thanks for your comments. Replies are inline.
>
>> -----Original Message-----
>> From: Maxime Coquelin
>> Sent: Tuesday, March 17, 2020 5:54 PM
>> To: Hu, Jiayu; dev@dpdk.org
>> Cc: Ye, Xiaolong; Wang, Zhihong
>> Subject: Re: [PATCH 0/4] Support DMA-accelerated Tx operations for
>> vhost-user PMD
>>
>> Hi Jiayu,
>>
>> On 3/17/20 10:21 AM, Jiayu Hu wrote:
>>> In vhost-user PMD's Tx operations, where data movement is heavily
>>> involved, performing large memory copies usually takes up a major
>>> part of the CPU cycles and becomes the hot spot. To offload these
>>> expensive memory operations from the CPU, this patch set proposes
>>> to leverage DMA engines, e.g. I/OAT, a DMA engine in Intel
>>> processors, to accelerate large copies for vhost-user.
>>>
>>> Large copies are offloaded from the CPU to the DMA engine in an
>>> asynchronous manner. The CPU just submits copy jobs to the DMA
>>> engine without waiting for their completion. Thus, there is no CPU
>>> intervention during data transfer; we can save precious CPU cycles
>>> and improve the overall throughput for vhost-user PMD based
>>> applications, like OVS. During packet transmission, large copies
>>> are offloaded to the DMA engine while small copies are performed by
>>> the CPU, due to the startup overheads associated with the DMA
>>> engine.
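A minimal sketch of this size-based dispatch, assuming the rawdev
I/OAT copy API of this period (rte_ioat_rawdev.h); the threshold
value, the per-queue context struct and the ring-full fallback are
illustrative, not taken from the patches:

#include <stdint.h>
#include <rte_memcpy.h>
#include <rte_memory.h>
#include <rte_ioat_rawdev.h>

#define DMA_COPY_THRESHOLD 256		/* illustrative cut-off, in bytes */

struct dma_vring_ctx {			/* illustrative per-queue context */
	int dev_id;			/* I/OAT rawdev bound to this txq */
};

static inline void
copy_desc(struct dma_vring_ctx *ctx, void *dst, void *src, uint32_t len)
{
	/* Small copies: the engine's startup overhead exceeds the
	 * benefit, so keep them on the CPU. */
	if (len <= DMA_COPY_THRESHOLD) {
		rte_memcpy(dst, src, len);
		return;
	}

	/* Large copies: enqueue to the I/OAT ring and return without
	 * waiting; completions are harvested later in the Tx path.
	 * rte_ioat_enqueue_copy() returns the number of jobs enqueued
	 * (0 when the ring is full), so fall back to a CPU copy then. */
	if (rte_ioat_enqueue_copy(ctx->dev_id,
			rte_mem_virt2iova(src), rte_mem_virt2iova(dst),
			len, (uintptr_t)src, (uintptr_t)dst, 0) != 1)
		rte_memcpy(dst, src, len);
}

A burst-oriented Tx path would then kick the engine once per burst
with rte_ioat_do_copies() and poll rte_ioat_completed_copies() to
recycle the descriptors of finished jobs.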
>>>
>>> vhost-user PMD is able to support various DMA engines, but it only
>>> supports I/OAT devices currently. In addition, I/OAT acceleration
>>> is only enabled for Tx operations of split rings. Users can
>>> explicitly assign an I/OAT device to a queue with the 'dmas'
>>> parameter. However, one I/OAT device can only be used by one queue,
>>> and a queue can use only one I/OAT device at a time.
>>>
>>> We measured the performance in testpmd. With 1024-byte packets,
>>> compared with the original SW data path, the DMA-enabled vhost-user
>>> PMD improves throughput by around 20%~30% in the VM2VM and PVP
>>> cases. Furthermore, with larger packets, the throughput improvement
>>> is higher.
>>
>> I'm not sure it should be done like that, for several reasons.
>>
>> First, it seems really complex for the user to get the command line
>> right. There is no mention in the doc patch of how to bind the DMAs
>> to the DPDK application. Are all the DMAs on the system capable of
>> doing it?
>
> DMA engines in Intel CPUs are able to move data within system memory.
> Currently we have I/OAT, and we will have DSA in the future.

OK, can you give me an example of how many I/OAT instances there are
on a given CPU?

>> I think it should be made transparent to the user, who should not
>> have to specify the DMA device address on the command line. The user
>> should just pass a devarg specifying he wants to use DMAs, if
>> available.
>
> What do you think of replacing the DMA address with specific DMA
> capabilities, like "dmas=[txq0@DMACOPY]"? As I/OAT only supports data
> movement, we can just provide a basic DMA copy capability now. But
> when there are more DMA devices, we can add more capabilities to the
> devargs later.

"dmas=[txq0@DMACOPY]" is still too complex IMHO. We should just have a
flag to enable DMA or not (tx_dma=1 / tx_dma=0), and this would be
used for all queues, as we do for zero-copy.

>> Second, it looks too vendor-specific. IMHO, we should have a DMA
>> framework, so that the driver can request DMA channels based on
>> capabilities.
>
> We only have one DMA engine, I/OAT, in DPDK, and it is implemented as
> a rawdev. IMO, it will be very hard to provide a generic DMA
> abstraction currently. In addition, the I/OAT-specific API is called
> inside the vhost-user PMD, so we can replace these function calls
> when we have a DMA framework in the future. Users are unaware of the
> changes. Does it make sense to you?

Having an abstraction might be hard, but it does not seem impossible.
Such a DMA abstraction has been done in the kernel for I/OAT. For
example: https://lore.kernel.org/patchwork/cover/56714/

>> Also, I don't think implementing ring processing in the vhost PMD is
>> welcome; the vhost PMD should just be a wrapper around the vhost
>> library. Doing that in the vhost PMD causes code duplication, and
>> will be a maintenance burden in the long run.
>>
>> As I/OAT is a kind of acceleration, why not implement it through the
>> vDPA framework? The vDPA framework should be extended to support
>> this kind of acceleration, which requires some CPU processing, as
>> opposed to the full offload of the ring processing it only supports
>> today.
>
> The main reason for implementing the data path in the vhost PMD is to
> avoid impacting the SW data path in the vhost library. Even if we
> implement it as an instance of vDPA, we would still have to implement
> the data path in a new vdev PMD, as the DMA just accelerates memory
> copy and all ring operations have to be done by the CPU. There is
> still the code duplication issue.

Ok, so what about:

1. Introducing a pair of callbacks in struct virtio_net for DMA
   enqueue and dequeue.

2. Adding lib/librte_vhost/ioat.c, which would implement the
   dma_enqueue and dma_dequeue callbacks for I/OAT. As it will live in
   the vhost lib directory, it will be easy to refactor the code to
   share as much as possible and so avoid code duplication.

3. In rte_vhost_enqueue/dequeue_burst, if the DMA callback is set,
   calling it instead of the SW datapath.

It adds a few cycles, but this is much more sane IMHO.
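In code, something like this (just a sketch; the ops struct and the
internal helpers are illustrative simplifications of the vhost
library's internals, not an actual API):

#include <stdint.h>
#include <rte_mbuf.h>

struct virtio_net;

struct vhost_dma_ops {
	/* Same contract as the SW datapath: packets processed. */
	uint16_t (*dma_enqueue)(struct virtio_net *dev, uint16_t queue_id,
			struct rte_mbuf **pkts, uint16_t count);
	uint16_t (*dma_dequeue)(struct virtio_net *dev, uint16_t queue_id,
			struct rte_mempool *mbuf_pool,
			struct rte_mbuf **pkts, uint16_t count);
};

struct virtio_net {
	/* ... existing vhost.h fields omitted ... */
	struct vhost_dma_ops *dma_ops;	/* NULL when DMA is not in use */
};

/* Internal helpers, provided elsewhere in the library. */
extern struct virtio_net *get_device(int vid);
extern uint32_t virtio_dev_rx(struct virtio_net *dev, uint16_t queue_id,
		struct rte_mbuf **pkts, uint32_t count);

uint16_t
rte_vhost_enqueue_burst(int vid, uint16_t queue_id,
		struct rte_mbuf **pkts, uint16_t count)
{
	struct virtio_net *dev = get_device(vid);

	if (dev == NULL)
		return 0;

	/* Take the DMA path only when callbacks were registered
	 * (e.g. by ioat.c); the SW datapath is untouched otherwise. */
	if (dev->dma_ops != NULL && dev->dma_ops->dma_enqueue != NULL)
		return dev->dma_ops->dma_enqueue(dev, queue_id, pkts, count);

	return virtio_dev_rx(dev, queue_id, pkts, count);
}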
What do you think?

Thanks,
Maxime

> Thanks,
> Jiayu
>
>> Kind regards,
>> Maxime
>>
>>> Jiayu Hu (4):
>>>   vhost: populate guest memory for DMA-accelerated vhost-user
>>>   net/vhost: setup vrings for DMA-accelerated datapath
>>>   net/vhost: leverage DMA engines to accelerate Tx operations
>>>   doc: add I/OAT acceleration support for vhost-user PMD
>>>
>>>  doc/guides/nics/vhost.rst         |  14 +
>>>  drivers/Makefile                  |   2 +-
>>>  drivers/net/vhost/Makefile        |   6 +-
>>>  drivers/net/vhost/internal.h      | 160 +++++++
>>>  drivers/net/vhost/meson.build     |   5 +-
>>>  drivers/net/vhost/rte_eth_vhost.c | 308 +++++++++++---
>>>  drivers/net/vhost/virtio_net.c    | 861 ++++++++++++++++++++++++++++++++++++++
>>>  drivers/net/vhost/virtio_net.h    | 288 +++++++++++++
>>>  lib/librte_vhost/rte_vhost.h      |   1 +
>>>  lib/librte_vhost/socket.c         |  20 +
>>>  lib/librte_vhost/vhost.h          |   2 +
>>>  lib/librte_vhost/vhost_user.c     |   3 +-
>>>  12 files changed, 1597 insertions(+), 73 deletions(-)
>>>  create mode 100644 drivers/net/vhost/internal.h
>>>  create mode 100644 drivers/net/vhost/virtio_net.c
>>>  create mode 100644 drivers/net/vhost/virtio_net.h
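P.S. To make the usability point concrete, here is how the two devarg
styles discussed above would compare on a testpmd command line. Both
lines are illustrative: the socket path and core list are examples,
'dmas=[txq0@DMACOPY]' is the capability-based syntax floated above,
and 'tx_dma' is the hypothetical flag I suggest, not an existing
devarg.

  # Per-queue DMA capability binding:
  testpmd -l 0-3 -n 4 \
      --vdev 'net_vhost0,iface=/tmp/vhost0,queues=1,dmas=[txq0@DMACOPY]' \
      -- -i

  # Single flag enabling DMA for all queues, as with zero-copy:
  testpmd -l 0-3 -n 4 \
      --vdev 'net_vhost0,iface=/tmp/vhost0,queues=1,tx_dma=1' \
      -- -i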