From: Yuanhan Liu <yliu@fridaylinux.org>
To: Maxime Coquelin <maxime.coquelin@redhat.com>
Cc: dev@dpdk.org, remy.horton@intel.com, tiwei.bie@intel.com,
mst@redhat.com, jfreiman@redhat.com, vkaplans@redhat.com,
jasowang@redhat.com
Subject: Re: [dpdk-dev] [PATCH v3 00/19] Vhost-user: Implement device IOTLB support
Date: Fri, 6 Oct 2017 14:24:26 +0800 [thread overview]
Message-ID: <20171006062426.GA1545@yliu-home> (raw)
In-Reply-To: <20171005083627.27828-1-maxime.coquelin@redhat.com>
Series applied to dpdk-next-virtio.
Thanks.
--yliu
On Thu, Oct 05, 2017 at 10:36:08AM +0200, Maxime Coquelin wrote:
> This v3 lists the feature in the release note, and fixes the bug in
> is_vring_iotlb_update() reported by Yuanhan.
>
> The purpose of this series is to add support for
> VIRTIO_F_IOMMU_PLATFORM feature, by implementing device IOTLB in the
> vhost-user backend. It improves the guest safety by enabling the
> possibility to isolate the Virtio device.
>
> It makes possible to use Virtio PMD in guest with using VFIO driver
> without enable_unsafe_noiommu_mode parameter set, so that the DPDK
> application on guest can only access memory its has been allowed to,
> and preventing malicious/buggy DPDK application in guest to make
> vhost-user backend write random guest memory. Note that Virtio-net
> Kernel driver also support IOMMU.
>
> The series depends on Qemu's "vhost-user: Specify and implement
> device IOTLB support" [0], available upstream and which will be part
> of Qemu v2.10 release.
>
> Performance-wise, even if this RFC has still room for optimizations,
> no performance degradation is noticed with static mappings (i.e. DPDK
> on guest) with PVP benchmark:
> Traffic Generator: Moongen (lua-trafficgen)
> Acceptable Loss: 0.005%
> Validation run time: 1 min
> Guest DPDK version/commit: v17.05
> QEMU version/commit: master (6db174aed1fd)
> Virtio features: default
> CPU: Intel(R) Xeon(R) CPU E5-2667 v4 @ 3.20GHz
> NIC: 2 x X710
> Page size: 1G host/1G guest
> Results (bidirectional, total of the two flows):
> - base: 18.8Mpps
> - base + IOTLB series, IOMMU OFF: 18.8Mpps
> - base + IOTLB series, IOMMU ON: 18.8Mpps (14.5Mpps w/o PATCH 21/21)
>
> This is explained because IOTLB misses, which are very costly, only
> happen at startup time. Indeed, once used, the buffers are not
> invalidated, so if the IOTLB cache is large enough, there will be only
> cache hits. Also, the use of 1G huge pages improves the IOTLB cache
> searching time by reducing the number of entries.
>
> With 2M hugepages, a performance degradation is seen with IOMMU on:
> Traffic Generator: Moongen (lua-trafficgen)
> Acceptable Loss: 0.005%
> Validation run time: 1 min
> Guest DPDK version/commit: v17.05
> QEMU version/commit: master (6db174aed1fd)
> Virtio features: default
> CPU: Intel(R) Xeon(R) CPU E5-2667 v4 @ 3.20GHz
> NIC: 2 x X710
> Page size: 2M host/2M guest
> Results (bidirectional, total of the two flows):
> - base: 18.8Mpps
> - base + IOTLB series, IOMMU OFF: 18.8Mpps
> - base + IOTLB series, IOMMU ON: 13.5Mpps (12.4Mpps wo PATCH 21/21)
>
> A possible improvement would be to merge contiguous IOTLB entries sharing
> the same permissions. A very rough patch implementing this idea fixes
> the performance degradation (18.8Mpps), but the required work to clean
> it would delay this series after v17.11.
>
> With dynamic mappings (i.e. Virtio-net kernel driver), this is another
> story. The performance is so poor it makes it almost unusable. Indeed,
> since the Kernel driver unmaps the buffers as soon as they are handled,
> almost all descriptors buffers addresses translations result in an IOTLB
> miss. There is not much that can be done on DPDK side, except maybe
> batching IOTLB miss requests no to break bursts, but it would require
> a big rework. In Qemu, we may consider enabling IOMMU MAP notifications,
> so that DPDK receives the IOTLB updates without having to send IOTLB miss
> request.
>
> Regarding the design choices:
> - I initially intended to use userspace RCU library[1] for the cache
> implementation, but it would have added an external dependency, and the
> lib is not available in all distros. Qemu for example got rid of this
> dependency by copying some of the userspace RCU lib parts into Qemu tree,
> but this is not possible with DPDK due to licensing issues (RCU lib is
> LGPL v2). Thanks to Jason advice, I implemented the cache using rd/wr
> locks.
> - I initially implemented a per-device IOTLB cache, but the concurrent
> acccesses on the IOTLB lock had huge impact on performance (~-40% in
> bidirectionnal, expect even worse with multiqueue). I move to a per-
> virtqueue IOTLB design, which prevents this concurrency.
> - The slave IOTLB miss request supports reply-ack feature in spec, but
> this version doesn't block or busy-wait for the corresponding update so
> that other queues sharing the same lcore can be processed in the meantime.
>
> For those who would like to test the series, I made it available on
> gitlab[2] (vhost_user_iotlb_v1 tag). The guest kernel command line requires
> the intel_iommu=on parameter, and the guest should be started with and
> iommu device attached to the virtio-net device. For example:
>
> ./qemu-system-x86_64 \
> -enable-kvm -m 4096 -smp 2 \
> -M q35,kernel-irqchip=split \
> -cpu host \
> -device intel-iommu,device-iotlb=on,intremap \
> -device ioh3420,id=root.1,chassis=1 \
> -chardev socket,id=char0,path=/tmp/vhost-user1 \
> -netdev type=vhost-user,id=hn2,chardev=char0 \
> -device virtio-net-pci,netdev=hn2,id=v0,mq=off,mac=$MAC,bus=root.1,disable-modern=off,disable-legacy=on,iommu_platform=on,ats=on \
> ...
>
> [0]: https://lists.gnu.org/archive/html/qemu-devel/2017-06/msg00520.html
> [1]: http://liburcu.org/
> [2]: https://gitlab.com/mcoquelin/dpdk-next-virtio/commits/vhost_user_iotlb_v2
>
> Changes since v1:
> =================
> - Add feature support in release note (Yuanhan)
> - Fix return value in is_vring_iotlb_update() (Yuanhan)
>
> Changes since v1:
> =================
> - Implement random cache entry eviction instead of removing all entries
> when cache is full (Yuanhan)
> - Adds bounds check on vring_idx (Remy)
> - Initialize slave_req_fd to -1 (Tiwei)
> - Remove virtio-net device lock (Tiwei)
> - Simplify vhost_user_iotlb_cache_remove() (Tiwei)
> - Squash iotlb lock usage optimization patch
> - Inline noiommu part of vhost_iova_to_vva to remove performance
> regressions with IOMMU=off
> - Reworked iotlb files copyrights
>
> Changes since RFC:
> ==================
> - Fix memory leak in error patch reported by Jens
> - Rework wait for IOTLB update by stopping the burst to let other
> queues to be processed, if any. It implies the introduction an
> iotlb_pending_list, so that iotlb miss requests aren't sent multiple
> times for a same address.
> - Optimize iotlb lock usage to recover to same as IOMMU off performance
> - Fix device locking issue in rte_vhost_dequeue_burst() error path
> - Change virtio_dev_rx error handling for consistency with mergeable rx,
> and to ease returning in case of IOTLB misses.
> - Fix checkpatch warnings reported by checkpatch@dpdk.org
>
> Maxime Coquelin (19):
> Revert "vhost: workaround MQ fails to startup"
> vhost: make error handling consistent in rx path
> vhost: prepare send_vhost_message() to slave requests
> vhost: add support to slave requests channel
> vhost: declare missing IOMMU-related definitions for old kernels
> vhost: add iotlb helper functions
> vhost: iotlb: add pending miss request list and helpers
> vhost-user: add support to IOTLB miss slave requests
> vhost: initialize vrings IOTLB caches
> vhost-user: handle IOTLB update and invalidate requests
> vhost: introduce guest IOVA to backend VA helper
> vhost: use the guest IOVA to host VA helper
> vhost: enable rings at the right time
> vhost: don't dereference invalid dev pointer after its reallocation
> vhost: postpone rings addresses translation
> vhost-user: translate ring addresses when IOMMU enabled
> vhost-user: iommu: postpone device creation until ring are mapped
> vhost: iommu: Invalidate vring in case of matching IOTLB invalidate
> vhost: enable IOMMU support
>
> doc/guides/rel_notes/release_17_11.rst | 4 +
> lib/librte_vhost/Makefile | 4 +-
> lib/librte_vhost/iotlb.c | 352 +++++++++++++++++++++++++++++++++
> lib/librte_vhost/iotlb.h | 76 +++++++
> lib/librte_vhost/vhost.c | 115 +++++++++--
> lib/librte_vhost/vhost.h | 61 +++++-
> lib/librte_vhost/vhost_user.c | 312 +++++++++++++++++++++++++----
> lib/librte_vhost/vhost_user.h | 20 +-
> lib/librte_vhost/virtio_net.c | 96 +++++++--
> 9 files changed, 962 insertions(+), 78 deletions(-)
> create mode 100644 lib/librte_vhost/iotlb.c
> create mode 100644 lib/librte_vhost/iotlb.h
>
> --
> 2.13.6
prev parent reply other threads:[~2017-10-06 6:24 UTC|newest]
Thread overview: 48+ messages / expand[flat|nested] mbox.gz Atom feed top
2017-10-05 8:36 Maxime Coquelin
2017-10-05 8:36 ` [dpdk-dev] [PATCH v3 01/19] Revert "vhost: workaround MQ fails to startup" Maxime Coquelin
2017-11-01 17:11 ` Kavanagh, Mark B
2017-11-02 9:40 ` Maxime Coquelin
2017-11-03 13:05 ` Yuanhan Liu
2017-11-03 14:28 ` Maxime Coquelin
2017-11-06 12:00 ` Yuanhan Liu
2017-11-06 12:07 ` Maxime Coquelin
2017-11-06 12:24 ` Yuanhan Liu
2017-11-06 12:50 ` Maxime Coquelin
2017-11-06 13:36 ` Yuanhan Liu
2017-11-03 15:34 ` Thomas Monjalon
2017-11-03 16:31 ` Kavanagh, Mark B
2017-10-05 8:36 ` [dpdk-dev] [PATCH v3 02/19] vhost: make error handling consistent in rx path Maxime Coquelin
2017-10-05 8:36 ` [dpdk-dev] [PATCH v3 03/19] vhost: prepare send_vhost_message() to slave requests Maxime Coquelin
2017-10-05 8:36 ` [dpdk-dev] [PATCH v3 04/19] vhost: add support to slave requests channel Maxime Coquelin
2017-10-05 8:36 ` [dpdk-dev] [PATCH v3 05/19] vhost: declare missing IOMMU-related definitions for old kernels Maxime Coquelin
2017-10-05 8:36 ` [dpdk-dev] [PATCH v3 06/19] vhost: add iotlb helper functions Maxime Coquelin
2017-10-05 8:36 ` [dpdk-dev] [PATCH v3 07/19] vhost: iotlb: add pending miss request list and helpers Maxime Coquelin
2017-10-05 8:36 ` [dpdk-dev] [PATCH v3 08/19] vhost-user: add support to IOTLB miss slave requests Maxime Coquelin
2017-10-05 8:36 ` [dpdk-dev] [PATCH v3 09/19] vhost: initialize vrings IOTLB caches Maxime Coquelin
2017-10-05 8:36 ` [dpdk-dev] [PATCH v3 10/19] vhost-user: handle IOTLB update and invalidate requests Maxime Coquelin
2017-10-05 8:36 ` [dpdk-dev] [PATCH v3 11/19] vhost: introduce guest IOVA to backend VA helper Maxime Coquelin
2017-10-05 8:36 ` [dpdk-dev] [PATCH v3 12/19] vhost: use the guest IOVA to host " Maxime Coquelin
2017-10-05 8:36 ` [dpdk-dev] [PATCH v3 13/19] vhost: enable rings at the right time Maxime Coquelin
2017-10-05 8:36 ` [dpdk-dev] [PATCH v3 14/19] vhost: don't dereference invalid dev pointer after its reallocation Maxime Coquelin
2017-10-05 8:36 ` [dpdk-dev] [PATCH v3 15/19] vhost: postpone rings addresses translation Maxime Coquelin
2017-10-13 1:47 ` Yao, Lei A
2017-10-13 7:32 ` Maxime Coquelin
2017-10-13 7:55 ` Yao, Lei A
2017-10-13 7:56 ` Maxime Coquelin
2017-10-16 5:59 ` Yao, Lei A
2017-10-16 6:23 ` Yao, Lei A
2017-10-16 9:47 ` Maxime Coquelin
2017-10-16 10:47 ` Maxime Coquelin
2017-10-17 1:24 ` Yao, Lei A
2017-10-17 8:06 ` Maxime Coquelin
2017-10-05 8:36 ` [dpdk-dev] [PATCH v3 16/19] vhost-user: translate ring addresses when IOMMU enabled Maxime Coquelin
2017-10-05 8:36 ` [dpdk-dev] [PATCH v3 17/19] vhost-user: iommu: postpone device creation until ring are mapped Maxime Coquelin
2017-11-02 7:21 ` Yao, Lei A
2017-11-02 8:21 ` Maxime Coquelin
2017-11-02 16:02 ` Maxime Coquelin
2017-11-03 8:25 ` Maxime Coquelin
2017-11-03 15:15 ` Michael S. Tsirkin
2017-11-03 15:54 ` Maxime Coquelin
2017-10-05 8:36 ` [dpdk-dev] [PATCH v3 18/19] vhost: iommu: Invalidate vring in case of matching IOTLB invalidate Maxime Coquelin
2017-10-05 8:36 ` [dpdk-dev] [PATCH v3 19/19] vhost: enable IOMMU support Maxime Coquelin
2017-10-06 6:24 ` Yuanhan Liu [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20171006062426.GA1545@yliu-home \
--to=yliu@fridaylinux.org \
--cc=dev@dpdk.org \
--cc=jasowang@redhat.com \
--cc=jfreiman@redhat.com \
--cc=maxime.coquelin@redhat.com \
--cc=mst@redhat.com \
--cc=remy.horton@intel.com \
--cc=tiwei.bie@intel.com \
--cc=vkaplans@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).