DPDK patches and discussions
 help / color / mirror / Atom feed
From: Yuanhan Liu <yliu@fridaylinux.org>
To: Maxime Coquelin <maxime.coquelin@redhat.com>
Cc: dev@dpdk.org, remy.horton@intel.com, tiwei.bie@intel.com,
	mst@redhat.com, jfreiman@redhat.com, vkaplans@redhat.com,
	jasowang@redhat.com
Subject: Re: [dpdk-dev] [PATCH v3 00/19] Vhost-user: Implement device IOTLB support
Date: Fri, 6 Oct 2017 14:24:26 +0800	[thread overview]
Message-ID: <20171006062426.GA1545@yliu-home> (raw)
In-Reply-To: <20171005083627.27828-1-maxime.coquelin@redhat.com>

Series applied to dpdk-next-virtio.

Thanks.

	--yliu

On Thu, Oct 05, 2017 at 10:36:08AM +0200, Maxime Coquelin wrote:
> This v3 lists the feature in the release note, and fixes the bug in
> is_vring_iotlb_update() reported by Yuanhan.
> 
> The purpose of this series is to add support for
> VIRTIO_F_IOMMU_PLATFORM feature, by implementing device IOTLB in the
> vhost-user backend. It improves the guest safety by enabling the
> possibility to isolate the Virtio device.
> 
> It makes possible to use Virtio PMD in guest with using VFIO driver
> without enable_unsafe_noiommu_mode parameter set, so that the DPDK
> application on guest can only access memory its has been allowed to,
> and preventing malicious/buggy DPDK application in guest to make
> vhost-user backend write random guest memory. Note that Virtio-net
> Kernel driver also support IOMMU.
> 
> The series depends on Qemu's "vhost-user: Specify and implement
> device IOTLB support" [0], available upstream and which will be part
> of Qemu v2.10 release.
> 
> Performance-wise, even if this RFC has still room for optimizations,
> no performance degradation is noticed with static mappings (i.e. DPDK
> on guest) with PVP benchmark:
> 	Traffic Generator: Moongen (lua-trafficgen)
> 	Acceptable Loss: 0.005%
> 	Validation run time: 1 min
> 	Guest DPDK version/commit: v17.05
> 	QEMU version/commit: master (6db174aed1fd)
> 	Virtio features: default
> 	CPU: Intel(R) Xeon(R) CPU E5-2667 v4 @ 3.20GHz
> 	NIC: 2 x X710
> 	Page size: 1G host/1G guest
> 	Results (bidirectional, total of the two flows):
> 	 - base: 18.8Mpps
> 	 - base + IOTLB series, IOMMU OFF: 18.8Mpps
> 	 - base + IOTLB series, IOMMU ON: 18.8Mpps (14.5Mpps w/o PATCH 21/21)
> 
> This is explained because IOTLB misses, which are very costly, only
> happen at startup time. Indeed, once used, the buffers are not
> invalidated, so if the IOTLB cache is large enough, there will be only
> cache hits. Also, the use of 1G huge pages improves the IOTLB cache
> searching time by reducing the number of entries.
> 
> With 2M hugepages, a performance degradation is seen with IOMMU on:
> 	Traffic Generator: Moongen (lua-trafficgen)
> 	Acceptable Loss: 0.005%
> 	Validation run time: 1 min
> 	Guest DPDK version/commit: v17.05
> 	QEMU version/commit: master (6db174aed1fd)
> 	Virtio features: default
> 	CPU: Intel(R) Xeon(R) CPU E5-2667 v4 @ 3.20GHz
> 	NIC: 2 x X710
> 	Page size: 2M host/2M guest
> 	Results (bidirectional, total of the two flows):
> 	 - base: 18.8Mpps
> 	 - base + IOTLB series, IOMMU OFF: 18.8Mpps
> 	 - base + IOTLB series, IOMMU ON: 13.5Mpps (12.4Mpps wo PATCH 21/21)
> 
> A possible improvement would be to merge contiguous IOTLB entries sharing
> the same permissions. A very rough patch implementing this idea fixes
> the performance degradation (18.8Mpps), but the required work to clean
> it would delay this series after v17.11.
> 
> With dynamic mappings (i.e. Virtio-net kernel driver), this is another
> story. The performance is so poor it makes it almost unusable. Indeed,
> since the Kernel driver unmaps the buffers as soon as they are handled,
> almost all descriptors buffers addresses translations result in an IOTLB
> miss. There is not much that can be done on DPDK side, except maybe
> batching IOTLB miss requests no to break bursts, but it would require
> a big rework. In Qemu, we may consider enabling IOMMU MAP notifications,
> so that DPDK receives the IOTLB updates without having to send IOTLB miss
> request.
> 
> Regarding the design choices:
>  - I initially intended to use userspace RCU library[1] for the cache
> implementation, but it would have added an external dependency, and the
> lib is not available in all distros. Qemu for example got rid of this
> dependency by copying some of the userspace RCU lib parts into Qemu tree,
> but this is not possible with DPDK due to licensing issues (RCU lib is
> LGPL v2). Thanks to Jason advice, I implemented the cache using rd/wr
> locks.
>  - I initially implemented a per-device IOTLB cache, but the concurrent
> acccesses on the IOTLB lock had huge impact on performance (~-40% in
> bidirectionnal, expect even worse with multiqueue). I move to a per-
> virtqueue IOTLB design, which prevents this concurrency.
>  - The slave IOTLB miss request supports reply-ack feature in spec, but
> this version doesn't block or busy-wait for the corresponding update so
> that other queues sharing the same lcore can be processed in the meantime.
> 
> For those who would like to test the series, I made it available on
> gitlab[2] (vhost_user_iotlb_v1 tag). The guest kernel command line requires
> the intel_iommu=on parameter, and the guest should be started with and
> iommu device attached to the virtio-net device. For example:
> 
> ./qemu-system-x86_64 \
>   -enable-kvm -m 4096 -smp 2 \
>   -M q35,kernel-irqchip=split \
>   -cpu host \
>   -device intel-iommu,device-iotlb=on,intremap \
>   -device ioh3420,id=root.1,chassis=1 \
>   -chardev socket,id=char0,path=/tmp/vhost-user1 \
>   -netdev type=vhost-user,id=hn2,chardev=char0 \
>   -device virtio-net-pci,netdev=hn2,id=v0,mq=off,mac=$MAC,bus=root.1,disable-modern=off,disable-legacy=on,iommu_platform=on,ats=on \
> ...
> 
> [0]: https://lists.gnu.org/archive/html/qemu-devel/2017-06/msg00520.html
> [1]: http://liburcu.org/
> [2]: https://gitlab.com/mcoquelin/dpdk-next-virtio/commits/vhost_user_iotlb_v2
> 
> Changes since v1:
> =================
> - Add feature support in release note (Yuanhan)
> - Fix return value in is_vring_iotlb_update() (Yuanhan)
> 
> Changes since v1:
> =================
> - Implement random cache entry eviction instead of removing all entries
>   when cache is full (Yuanhan)
> - Adds bounds check on vring_idx (Remy)
> - Initialize slave_req_fd to -1 (Tiwei)
> - Remove virtio-net device lock (Tiwei)
> - Simplify vhost_user_iotlb_cache_remove() (Tiwei)
> - Squash iotlb lock usage optimization patch
> - Inline noiommu part of vhost_iova_to_vva to remove performance
>   regressions with IOMMU=off
> - Reworked iotlb files copyrights
> 
> Changes since RFC:
> ==================
> - Fix memory leak in error patch reported by Jens
> - Rework wait for IOTLB update by stopping the burst to let other
>   queues to be processed, if any. It implies the introduction an
>   iotlb_pending_list, so that iotlb miss requests aren't sent multiple
>   times for a same address.
> - Optimize iotlb lock usage to recover to same as IOMMU off performance
> - Fix device locking issue in rte_vhost_dequeue_burst() error path
> - Change virtio_dev_rx error handling for consistency with mergeable rx,
>   and to ease returning in case of IOTLB misses.
> - Fix checkpatch warnings reported by checkpatch@dpdk.org
> 
> Maxime Coquelin (19):
>   Revert "vhost: workaround MQ fails to startup"
>   vhost: make error handling consistent in rx path
>   vhost: prepare send_vhost_message() to slave requests
>   vhost: add support to slave requests channel
>   vhost: declare missing IOMMU-related definitions for old kernels
>   vhost: add iotlb helper functions
>   vhost: iotlb: add pending miss request list and helpers
>   vhost-user: add support to IOTLB miss slave requests
>   vhost: initialize vrings IOTLB caches
>   vhost-user: handle IOTLB update and invalidate requests
>   vhost: introduce guest IOVA to backend VA helper
>   vhost: use the guest IOVA to host VA helper
>   vhost: enable rings at the right time
>   vhost: don't dereference invalid dev pointer after its reallocation
>   vhost: postpone rings addresses translation
>   vhost-user: translate ring addresses when IOMMU enabled
>   vhost-user: iommu: postpone device creation until ring are mapped
>   vhost: iommu: Invalidate vring in case of matching IOTLB invalidate
>   vhost: enable IOMMU support
> 
>  doc/guides/rel_notes/release_17_11.rst |   4 +
>  lib/librte_vhost/Makefile              |   4 +-
>  lib/librte_vhost/iotlb.c               | 352 +++++++++++++++++++++++++++++++++
>  lib/librte_vhost/iotlb.h               |  76 +++++++
>  lib/librte_vhost/vhost.c               | 115 +++++++++--
>  lib/librte_vhost/vhost.h               |  61 +++++-
>  lib/librte_vhost/vhost_user.c          | 312 +++++++++++++++++++++++++----
>  lib/librte_vhost/vhost_user.h          |  20 +-
>  lib/librte_vhost/virtio_net.c          |  96 +++++++--
>  9 files changed, 962 insertions(+), 78 deletions(-)
>  create mode 100644 lib/librte_vhost/iotlb.c
>  create mode 100644 lib/librte_vhost/iotlb.h
> 
> -- 
> 2.13.6

      parent reply	other threads:[~2017-10-06  6:24 UTC|newest]

Thread overview: 48+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-10-05  8:36 Maxime Coquelin
2017-10-05  8:36 ` [dpdk-dev] [PATCH v3 01/19] Revert "vhost: workaround MQ fails to startup" Maxime Coquelin
2017-11-01 17:11   ` Kavanagh, Mark B
2017-11-02  9:40     ` Maxime Coquelin
2017-11-03 13:05       ` Yuanhan Liu
2017-11-03 14:28         ` Maxime Coquelin
2017-11-06 12:00           ` Yuanhan Liu
2017-11-06 12:07             ` Maxime Coquelin
2017-11-06 12:24               ` Yuanhan Liu
2017-11-06 12:50                 ` Maxime Coquelin
2017-11-06 13:36                   ` Yuanhan Liu
2017-11-03 15:34       ` Thomas Monjalon
2017-11-03 16:31         ` Kavanagh, Mark B
2017-10-05  8:36 ` [dpdk-dev] [PATCH v3 02/19] vhost: make error handling consistent in rx path Maxime Coquelin
2017-10-05  8:36 ` [dpdk-dev] [PATCH v3 03/19] vhost: prepare send_vhost_message() to slave requests Maxime Coquelin
2017-10-05  8:36 ` [dpdk-dev] [PATCH v3 04/19] vhost: add support to slave requests channel Maxime Coquelin
2017-10-05  8:36 ` [dpdk-dev] [PATCH v3 05/19] vhost: declare missing IOMMU-related definitions for old kernels Maxime Coquelin
2017-10-05  8:36 ` [dpdk-dev] [PATCH v3 06/19] vhost: add iotlb helper functions Maxime Coquelin
2017-10-05  8:36 ` [dpdk-dev] [PATCH v3 07/19] vhost: iotlb: add pending miss request list and helpers Maxime Coquelin
2017-10-05  8:36 ` [dpdk-dev] [PATCH v3 08/19] vhost-user: add support to IOTLB miss slave requests Maxime Coquelin
2017-10-05  8:36 ` [dpdk-dev] [PATCH v3 09/19] vhost: initialize vrings IOTLB caches Maxime Coquelin
2017-10-05  8:36 ` [dpdk-dev] [PATCH v3 10/19] vhost-user: handle IOTLB update and invalidate requests Maxime Coquelin
2017-10-05  8:36 ` [dpdk-dev] [PATCH v3 11/19] vhost: introduce guest IOVA to backend VA helper Maxime Coquelin
2017-10-05  8:36 ` [dpdk-dev] [PATCH v3 12/19] vhost: use the guest IOVA to host " Maxime Coquelin
2017-10-05  8:36 ` [dpdk-dev] [PATCH v3 13/19] vhost: enable rings at the right time Maxime Coquelin
2017-10-05  8:36 ` [dpdk-dev] [PATCH v3 14/19] vhost: don't dereference invalid dev pointer after its reallocation Maxime Coquelin
2017-10-05  8:36 ` [dpdk-dev] [PATCH v3 15/19] vhost: postpone rings addresses translation Maxime Coquelin
2017-10-13  1:47   ` Yao, Lei A
2017-10-13  7:32     ` Maxime Coquelin
2017-10-13  7:55       ` Yao, Lei A
2017-10-13  7:56         ` Maxime Coquelin
2017-10-16  5:59         ` Yao, Lei A
2017-10-16  6:23           ` Yao, Lei A
2017-10-16  9:47             ` Maxime Coquelin
2017-10-16 10:47               ` Maxime Coquelin
2017-10-17  1:24                 ` Yao, Lei A
2017-10-17  8:06                   ` Maxime Coquelin
2017-10-05  8:36 ` [dpdk-dev] [PATCH v3 16/19] vhost-user: translate ring addresses when IOMMU enabled Maxime Coquelin
2017-10-05  8:36 ` [dpdk-dev] [PATCH v3 17/19] vhost-user: iommu: postpone device creation until ring are mapped Maxime Coquelin
2017-11-02  7:21   ` Yao, Lei A
2017-11-02  8:21     ` Maxime Coquelin
2017-11-02 16:02       ` Maxime Coquelin
2017-11-03  8:25         ` Maxime Coquelin
2017-11-03 15:15           ` Michael S. Tsirkin
2017-11-03 15:54             ` Maxime Coquelin
2017-10-05  8:36 ` [dpdk-dev] [PATCH v3 18/19] vhost: iommu: Invalidate vring in case of matching IOTLB invalidate Maxime Coquelin
2017-10-05  8:36 ` [dpdk-dev] [PATCH v3 19/19] vhost: enable IOMMU support Maxime Coquelin
2017-10-06  6:24 ` Yuanhan Liu [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20171006062426.GA1545@yliu-home \
    --to=yliu@fridaylinux.org \
    --cc=dev@dpdk.org \
    --cc=jasowang@redhat.com \
    --cc=jfreiman@redhat.com \
    --cc=maxime.coquelin@redhat.com \
    --cc=mst@redhat.com \
    --cc=remy.horton@intel.com \
    --cc=tiwei.bie@intel.com \
    --cc=vkaplans@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).