From: Maxime Coquelin
To: dev@dpdk.org, Yuanhan Liu
Cc: mst@redhat.com, vkaplans@redhat.com, jasowang@redhat.com, jfreiman@redhat.com, Maxime Coquelin
Date: Tue, 4 Jul 2017 11:49:03 +0200
Message-Id: <20170704094922.11405-1-maxime.coquelin@redhat.com>
Subject: [dpdk-dev] [RFC 00/19] Vhost-user: Implement device IOTLB support

This first RFC, which targets v17.11, adds support for the
VIRTIO_F_IOMMU_PLATFORM feature by implementing a device IOTLB in the
vhost-user backend.
It improves guest safety by making it possible to isolate the Virtio
device. The Virtio PMD can then be used in the guest with the VFIO driver
without setting the enable_unsafe_noiommu_mode parameter, so that the DPDK
application in the guest can only access memory it has been allowed to.
This prevents a malicious or buggy DPDK application in the guest from
making the vhost-user backend write to random guest memory. Note that the
Virtio-net kernel driver also supports IOMMU.

The series depends on Qemu's "vhost-user: Specify and implement device
IOTLB support" [0], which is available upstream and will be part of the
Qemu v2.10 release.

Performance-wise, even if this RFC still has room for optimizations, no
performance degradation is noticed with static mappings (i.e. DPDK in the
guest) with the PVP benchmark:

Traffic Generator: Moongen (lua-trafficgen)
Acceptable Loss: 0.005%
Validation run time: 1 min
Guest DPDK version/commit: v17.05
QEMU version/commit: master (6db174aed1fd)
Virtio features: default
CPU: Intel(R) Xeon(R) CPU E5-2667 v4 @ 3.20GHz
NIC: 2 x X710
Page size: 1G host/1G guest

Results (bidirectional, total of the two flows):
 - base: 18.8Mpps
 - base + IOTLB series, IOMMU OFF: 18.8Mpps
 - base + IOTLB series, IOMMU ON: 18.8Mpps

This is because IOTLB misses, which are very costly, only happen at
startup time. Indeed, once used, the buffers are not invalidated, so if
the IOTLB cache is large enough, there will only be cache hits. Also, the
use of 1G huge pages improves the IOTLB cache lookup time by reducing the
number of entries. In the next revision of this series, I plan to provide
PVP results with 2MB pages.

With dynamic mappings (i.e. Virtio-net kernel driver), this is another
story. The performance is so poor that it is almost unusable. Indeed,
since the kernel driver unmaps the buffers as soon as they are handled,
almost all descriptor buffer address translations result in an IOTLB
miss. There is not much that can be done on the DPDK side.
In Qemu, we may consider enabling IOMMU MAP notifications, so that DPDK
receives the IOTLB updates without having to send IOTLB miss requests.

Regarding the design choices:
 - I initially intended to use the userspace RCU library[1] for the cache
   implementation, but it would have added an external dependency, and
   the lib is not available in all distros. Qemu, for example, got rid of
   this dependency by copying some parts of the userspace RCU lib into
   the Qemu tree, but this is not possible with DPDK due to licensing
   issues (RCU lib is LGPL v2). Thanks to Jason's advice, I implemented
   the cache using rd/wr locks.
 - I initially implemented a per-device IOTLB cache, but the concurrent
   accesses on the IOTLB lock had a huge impact on performance (~-40% in
   bidirectional, expect even worse with multiqueue). I moved to a
   per-virtqueue IOTLB design, which prevents this concurrency.
 - The slave IOTLB miss request supports the reply-ack feature in the
   spec, but I moved to busy-polling on the IOTLB event to avoid a
   deadlock happening when the device is stopped while a processing
   thread is waiting for an IOTLB update.

For those who would like to test the series, I made it available on
gitlab[2] (vhost_user_iotlb_upstream_rfc tag). The guest kernel command
line requires the intel_iommu=on parameter, and the guest should be
started with an iommu device attached to the virtio-net device. For
example:

./qemu-system-x86_64 \
  -enable-kvm -m 4096 -smp 2 \
  -M q35,kernel-irqchip=split \
  -cpu host \
  -device intel-iommu,device-iotlb=on,intremap \
  -device ioh3420,id=root.1,chassis=1 \
  -chardev socket,id=char0,path=/tmp/vhost-user1 \
  -netdev type=vhost-user,id=hn2,chardev=char0 \
  -device virtio-net-pci,netdev=hn2,id=v0,mq=off,mac=$MAC,bus=root.1,disable-modern=off,disable-legacy=on,iommu_platform=on,ats=on \
  ...

[0]: https://lists.gnu.org/archive/html/qemu-devel/2017-06/msg00520.html
[1]: http://liburcu.org/
[2]: https://gitlab.com/mcoquelin/dpdk-next-virtio/tree/vhost_user_iotlb_upstream_rfc

Maxime Coquelin (19):
  vhost: protect virtio_net device struct
  Revert "vhost: workaround MQ fails to startup"
  vhost: prepare send_vhost_message() to slave requests
  vhost: add support to slave requests channel
  vhost: declare missing IOMMU-related definitions for old kernels
  vhost: add iotlb helper functions
  vhost-user: add support to IOTLB miss slave requests
  vhost: initialize vrings IOTLB caches
  vhost: implement IOTLB events notification mechanism
  vhost-user: handle IOTLB update and invalidate requests
  vhost: introduce guest IOVA to backend VA helper
  vhost: use the guest IOVA to host VA helper
  vhost: enable rings at the right time
  vhost: don't dereference invalid dev pointer after its reallocation
  vhost: postpone rings adresses translation
  vhost-user: translate ring addresses when IOMMU enabled
  vhost-user: iommu: postpone device creation until ring are mapped
  vhost: iommu: Invalidate vring in case of matching IOTLB invalidate
  vhost: enable IOMMU support

 lib/librte_vhost/Makefile     |   4 +-
 lib/librte_vhost/iotlb.c      | 236 ++++++++++++++++++++++++++
 lib/librte_vhost/iotlb.h      |  47 +++++
 lib/librte_vhost/vhost.c      | 350 +++++++++++++++++++++++++++++++++-----
 lib/librte_vhost/vhost.h      |  53 +++++-
 lib/librte_vhost/vhost_user.c | 387 +++++++++++++++++++++++++++++++++---------
 lib/librte_vhost/vhost_user.h |  20 ++-
 lib/librte_vhost/virtio_net.c | 100 ++++++++---
 8 files changed, 1033 insertions(+), 164 deletions(-)
 create mode 100644 lib/librte_vhost/iotlb.c
 create mode 100644 lib/librte_vhost/iotlb.h

--
2.9.4