From: Yuanhan Liu
To: Maxime Coquelin
Cc: dev@dpdk.org, remy.horton@intel.com, tiwei.bie@intel.com,
	mst@redhat.com, jfreiman@redhat.com, vkaplans@redhat.com,
	jasowang@redhat.com
Date: Fri, 6 Oct 2017 14:24:26 +0800
Subject: Re: [dpdk-dev] [PATCH v3 00/19] Vhost-user: Implement device IOTLB support
Message-ID: <20171006062426.GA1545@yliu-home>
In-Reply-To: <20171005083627.27828-1-maxime.coquelin@redhat.com>
References: <20171005083627.27828-1-maxime.coquelin@redhat.com>

Series applied to dpdk-next-virtio.

Thanks.

	--yliu

On Thu, Oct 05, 2017 at 10:36:08AM +0200, Maxime Coquelin wrote:
> This v3 lists the feature in the release note, and fixes the bug in
> is_vring_iotlb_update() reported by Yuanhan.
>
> The purpose of this series is to add support for the
> VIRTIO_F_IOMMU_PLATFORM feature, by implementing device IOTLB in the
> vhost-user backend. It improves guest safety by making it possible
> to isolate the Virtio device.
>
> It makes it possible to use the Virtio PMD in the guest with the VFIO
> driver without the enable_unsafe_noiommu_mode parameter set, so that
> the DPDK application in the guest can only access memory it has been
> allowed to, which prevents a malicious or buggy DPDK application in
> the guest from making the vhost-user backend write to random guest
> memory. Note that the Virtio-net kernel driver also supports IOMMU.
>
> The series depends on Qemu's "vhost-user: Specify and implement
> device IOTLB support" [0], available upstream and which will be part
> of the Qemu v2.10 release.
>
> Performance-wise, even if this RFC still has room for optimizations,
> no performance degradation is noticed with static mappings (i.e. DPDK
> on guest) with the PVP benchmark:
> Traffic Generator: Moongen (lua-trafficgen)
> Acceptable Loss: 0.005%
> Validation run time: 1 min
> Guest DPDK version/commit: v17.05
> QEMU version/commit: master (6db174aed1fd)
> Virtio features: default
> CPU: Intel(R) Xeon(R) CPU E5-2667 v4 @ 3.20GHz
> NIC: 2 x X710
> Page size: 1G host/1G guest
> Results (bidirectional, total of the two flows):
>  - base: 18.8Mpps
>  - base + IOTLB series, IOMMU OFF: 18.8Mpps
>  - base + IOTLB series, IOMMU ON: 18.8Mpps (14.5Mpps w/o PATCH 21/21)
>
> The explanation is that IOTLB misses, which are very costly, only
> happen at startup time. Indeed, once used, the buffers are not
> invalidated, so if the IOTLB cache is large enough, there will be only
> cache hits. Also, the use of 1G huge pages improves the IOTLB cache
> searching time by reducing the number of entries.
>
> With 2M hugepages, a performance degradation is seen with IOMMU on:
> Traffic Generator: Moongen (lua-trafficgen)
> Acceptable Loss: 0.005%
> Validation run time: 1 min
> Guest DPDK version/commit: v17.05
> QEMU version/commit: master (6db174aed1fd)
> Virtio features: default
> CPU: Intel(R) Xeon(R) CPU E5-2667 v4 @ 3.20GHz
> NIC: 2 x X710
> Page size: 2M host/2M guest
> Results (bidirectional, total of the two flows):
>  - base: 18.8Mpps
>  - base + IOTLB series, IOMMU OFF: 18.8Mpps
>  - base + IOTLB series, IOMMU ON: 13.5Mpps (12.4Mpps w/o PATCH 21/21)
>
> A possible improvement would be to merge contiguous IOTLB entries sharing
> the same permissions. A very rough patch implementing this idea fixes
> the performance degradation (18.8Mpps), but the work required to clean
> it up would delay this series until after v17.11.
>
> With dynamic mappings (i.e. Virtio-net kernel driver), this is another
> story. The performance is so poor it makes it almost unusable. Indeed,
> since the kernel driver unmaps the buffers as soon as they are handled,
> almost all descriptor buffer address translations result in an IOTLB
> miss.
> There is not much that can be done on the DPDK side, except maybe
> batching IOTLB miss requests so as not to break bursts, but it would
> require a big rework. In Qemu, we may consider enabling IOMMU MAP
> notifications, so that DPDK receives the IOTLB updates without having
> to send IOTLB miss requests.
>
> Regarding the design choices:
> - I initially intended to use the userspace RCU library[1] for the cache
>   implementation, but it would have added an external dependency, and the
>   lib is not available in all distros. Qemu for example got rid of this
>   dependency by copying some of the userspace RCU lib parts into the Qemu
>   tree, but this is not possible with DPDK due to licensing issues (RCU
>   lib is LGPL v2). Thanks to Jason's advice, I implemented the cache
>   using rd/wr locks.
> - I initially implemented a per-device IOTLB cache, but the concurrent
>   accesses on the IOTLB lock had a huge impact on performance (~-40% in
>   bidirectional, expect even worse with multiqueue). I moved to a
>   per-virtqueue IOTLB design, which prevents this concurrency.
> - The slave IOTLB miss request supports the reply-ack feature in the spec,
>   but this version doesn't block or busy-wait for the corresponding update
>   so that other queues sharing the same lcore can be processed in the
>   meantime.
>
> For those who would like to test the series, I made it available on
> gitlab[2] (vhost_user_iotlb_v1 tag). The guest kernel command line requires
> the intel_iommu=on parameter, and the guest should be started with an
> IOMMU device attached to the virtio-net device. For example:
>
> ./qemu-system-x86_64 \
>   -enable-kvm -m 4096 -smp 2 \
>   -M q35,kernel-irqchip=split \
>   -cpu host \
>   -device intel-iommu,device-iotlb=on,intremap \
>   -device ioh3420,id=root.1,chassis=1 \
>   -chardev socket,id=char0,path=/tmp/vhost-user1 \
>   -netdev type=vhost-user,id=hn2,chardev=char0 \
>   -device virtio-net-pci,netdev=hn2,id=v0,mq=off,mac=$MAC,bus=root.1,disable-modern=off,disable-legacy=on,iommu_platform=on,ats=on \
>   ...
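
Editorial illustration of the design choices quoted above (a per-virtqueue cache whose lookups take a read lock and whose updates take a write lock). This is a minimal sketch in C, not the code from the series: every name (`vq_iotlb`, `vq_iotlb_find`, `IOTLB_MAX`, the `PERM_*` bits) is hypothetical, the small spinning lock built on C11 atomics merely stands in for whatever rd/wr lock the real implementation uses, and a full cache simply rejects the insert instead of evicting an entry.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define IOTLB_MAX 8   /* hypothetical, deliberately tiny capacity */
#define PERM_R 1u     /* hypothetical permission bits */
#define PERM_W 2u

struct iotlb_entry { uint64_t iova, uaddr, size; uint8_t perm; };

/* Stand-in rd/wr spinlock: cnt > 0 readers, cnt == -1 one writer, 0 free. */
struct rwlock { atomic_int cnt; };

void rd_lock(struct rwlock *l)
{
    for (;;) {
        int c = atomic_load(&l->cnt);
        if (c >= 0 && atomic_compare_exchange_weak(&l->cnt, &c, c + 1))
            return;
    }
}
void rd_unlock(struct rwlock *l) { atomic_fetch_sub(&l->cnt, 1); }
void wr_lock(struct rwlock *l)
{
    for (;;) {
        int c = 0;
        if (atomic_compare_exchange_weak(&l->cnt, &c, -1))
            return;
    }
}
void wr_unlock(struct rwlock *l) { atomic_store(&l->cnt, 0); }

/* One cache per virtqueue, so queues never contend on the same lock. */
struct vq_iotlb {
    struct rwlock lock;
    size_t nr;
    struct iotlb_entry e[IOTLB_MAX];
};

void vq_iotlb_init(struct vq_iotlb *c)
{
    atomic_init(&c->lock.cnt, 0);
    c->nr = 0;
}

/* Add a mapping under the write lock; returns -1 when the cache is full. */
int vq_iotlb_insert(struct vq_iotlb *c, uint64_t iova, uint64_t uaddr,
                    uint64_t size, uint8_t perm)
{
    int ret = -1;

    assert(size > 0);
    wr_lock(&c->lock);
    if (c->nr < IOTLB_MAX) {
        c->e[c->nr++] = (struct iotlb_entry){ iova, uaddr, size, perm };
        ret = 0;
    }
    wr_unlock(&c->lock);
    return ret;
}

/* Translate an IOVA under the read lock; 0 denotes a miss in this sketch. */
uint64_t vq_iotlb_find(struct vq_iotlb *c, uint64_t iova, uint8_t perm)
{
    uint64_t vva = 0;

    rd_lock(&c->lock);
    for (size_t i = 0; i < c->nr; i++) {
        const struct iotlb_entry *e = &c->e[i];

        if (iova >= e->iova && iova - e->iova < e->size &&
            (e->perm & perm) == perm) {
            vva = e->uaddr + (iova - e->iova);
            break;
        }
    }
    rd_unlock(&c->lock);
    return vva;
}
```

In this sketch a datapath thread would call `vq_iotlb_find()` for each descriptor address, and on a miss it would stop the burst and request an update rather than block, which matches the non-blocking behaviour described in the third design bullet.
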
>
> [0]: https://lists.gnu.org/archive/html/qemu-devel/2017-06/msg00520.html
> [1]: http://liburcu.org/
> [2]: https://gitlab.com/mcoquelin/dpdk-next-virtio/commits/vhost_user_iotlb_v2
>
> Changes since v2:
> =================
> - Add feature support in release note (Yuanhan)
> - Fix return value in is_vring_iotlb_update() (Yuanhan)
>
> Changes since v1:
> =================
> - Implement random cache entry eviction instead of removing all entries
>   when cache is full (Yuanhan)
> - Add bounds check on vring_idx (Remy)
> - Initialize slave_req_fd to -1 (Tiwei)
> - Remove virtio-net device lock (Tiwei)
> - Simplify vhost_user_iotlb_cache_remove() (Tiwei)
> - Squash iotlb lock usage optimization patch
> - Inline noiommu part of vhost_iova_to_vva to remove performance
>   regressions with IOMMU=off
> - Rework iotlb files copyrights
>
> Changes since RFC:
> ==================
> - Fix memory leak in error path reported by Jens
> - Rework wait for IOTLB update by stopping the burst to let other
>   queues be processed, if any. It implies the introduction of an
>   iotlb_pending_list, so that iotlb miss requests aren't sent multiple
>   times for the same address.
> - Optimize iotlb lock usage to recover the same performance as with
>   IOMMU off
> - Fix device locking issue in rte_vhost_dequeue_burst() error path
> - Change virtio_dev_rx error handling for consistency with mergeable rx,
>   and to ease returning in case of IOTLB misses.
> - Fix checkpatch warnings reported by checkpatch@dpdk.org
>
> Maxime Coquelin (19):
>   Revert "vhost: workaround MQ fails to startup"
>   vhost: make error handling consistent in rx path
>   vhost: prepare send_vhost_message() to slave requests
>   vhost: add support to slave requests channel
>   vhost: declare missing IOMMU-related definitions for old kernels
>   vhost: add iotlb helper functions
>   vhost: iotlb: add pending miss request list and helpers
>   vhost-user: add support to IOTLB miss slave requests
>   vhost: initialize vrings IOTLB caches
>   vhost-user: handle IOTLB update and invalidate requests
>   vhost: introduce guest IOVA to backend VA helper
>   vhost: use the guest IOVA to host VA helper
>   vhost: enable rings at the right time
>   vhost: don't dereference invalid dev pointer after its reallocation
>   vhost: postpone rings addresses translation
>   vhost-user: translate ring addresses when IOMMU enabled
>   vhost-user: iommu: postpone device creation until ring are mapped
>   vhost: iommu: Invalidate vring in case of matching IOTLB invalidate
>   vhost: enable IOMMU support
>
>  doc/guides/rel_notes/release_17_11.rst |   4 +
>  lib/librte_vhost/Makefile              |   4 +-
>  lib/librte_vhost/iotlb.c               | 352 +++++++++++++++++++++++++++++++++
>  lib/librte_vhost/iotlb.h               |  76 +++++++
>  lib/librte_vhost/vhost.c               | 115 +++++++++--
>  lib/librte_vhost/vhost.h               |  61 +++++-
>  lib/librte_vhost/vhost_user.c          | 312 +++++++++++++++++++++++++----
>  lib/librte_vhost/vhost_user.h          |  20 +-
>  lib/librte_vhost/virtio_net.c          |  96 +++++++--
>  9 files changed, 962 insertions(+), 78 deletions(-)
>  create mode 100644 lib/librte_vhost/iotlb.c
>  create mode 100644 lib/librte_vhost/iotlb.h
>
> --
> 2.13.6
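
Editorial illustration of the possible improvement mentioned in the cover letter, namely merging contiguous IOTLB entries that share the same permissions so that a 2M-page setup has fewer entries to search. The "very rough patch" referred to there is not shown in this thread, so what follows is only a sketch under stated assumptions: the entry layout and the `iotlb_merge()` name are hypothetical, the array is assumed to be sorted by IOVA, and two entries are merged only when they are adjacent in both IOVA space and backend virtual address space.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical entry layout: guest IOVA, backend VA, length, permissions. */
struct iotlb_entry { uint64_t iova, uaddr, size; uint8_t perm; };

/*
 * Coalesce neighbouring entries in place. The array must be sorted by
 * iova. Returns the number of entries left after merging.
 */
size_t iotlb_merge(struct iotlb_entry *e, size_t n)
{
    size_t out = 0; /* index of the last entry kept so far */

    assert(e != NULL || n == 0);
    if (n == 0)
        return 0;

    for (size_t i = 1; i < n; i++) {
        struct iotlb_entry *last = &e[out];

        if (e[i].perm == last->perm &&
            e[i].iova == last->iova + last->size &&
            e[i].uaddr == last->uaddr + last->size)
            last->size += e[i].size;   /* extend the previous entry */
        else
            e[++out] = e[i];           /* keep as a separate entry */
    }
    return out + 1;
}
```

With three 2M entries of which the first two are adjacent and share permissions, the call leaves two entries, the first covering 4M. A real cache would also have to split or drop a merged entry when an invalidate request covers only part of it, which this sketch does not attempt.
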