DPDK patches and discussions
From: Maxime Coquelin <maxime.coquelin@redhat.com>
To: dev@dpdk.org, Yuanhan Liu <yliu@fridaylinux.org>
Cc: mst@redhat.com, vkaplans@redhat.com, jasowang@redhat.com,
	jfreiman@redhat.com, Maxime Coquelin <maxime.coquelin@redhat.com>
Subject: [dpdk-dev] [RFC 00/19] Vhost-user: Implement device IOTLB support
Date: Tue,  4 Jul 2017 11:49:03 +0200
Message-ID: <20170704094922.11405-1-maxime.coquelin@redhat.com>

This first RFC, which targets v17.11, adds support for the
VIRTIO_F_IOMMU_PLATFORM feature by implementing device IOTLB in the
vhost-user backend. It improves guest safety by making it possible to
isolate the Virtio device.

It makes it possible to use the Virtio PMD in the guest with the VFIO
driver without setting the enable_unsafe_noiommu_mode parameter, so that
the DPDK application in the guest can only access memory it has been
allowed to access. This prevents a malicious or buggy DPDK application
in the guest from making the vhost-user backend write to arbitrary guest
memory. Note that the Virtio-net kernel driver also supports the IOMMU.
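To illustrate the isolation this provides, here is a minimal C sketch of
a backend translating a guest IOVA through IOTLB entries and refusing any
access that is not covered by a mapping with the required permission.
The struct, the permission flags and iova_to_vva() are hypothetical names
chosen for illustration, not the helpers added by this series, and
adjacent entries are deliberately not merged.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical access permissions for an IOTLB entry. */
#define IOTLB_PERM_RO 0x1
#define IOTLB_PERM_WO 0x2
#define IOTLB_PERM_RW (IOTLB_PERM_RO | IOTLB_PERM_WO)

/* One mapping granted by the guest IOMMU: guest IOVAs in
 * [iova, iova + size) are reachable by the backend at
 * [uaddr, uaddr + size). */
struct iotlb_entry {
	uint64_t iova;
	uint64_t uaddr;
	uint64_t size;
	uint8_t perm;
};

/* Translate the guest IOVA range [iova, iova + len) to a backend
 * virtual address. Returns 0 if no single entry covers the whole
 * range with the requested permission; in this sketch such an
 * access is simply refused. */
static uint64_t
iova_to_vva(const struct iotlb_entry *tlb, size_t n,
	    uint64_t iova, uint64_t len, uint8_t perm)
{
	assert(len > 0);
	for (size_t i = 0; i < n; i++) {
		const struct iotlb_entry *e = &tlb[i];
		uint64_t off;

		if (iova < e->iova || iova - e->iova >= e->size)
			continue;       /* start address not in this entry */
		off = iova - e->iova;
		if (len > e->size - off)
			return 0;       /* range runs past the end of the entry */
		if ((e->perm & perm) != perm)
			return 0;       /* e.g. write to a read-only mapping */
		return e->uaddr + off;
	}
	return 0;                       /* IOVA not mapped at all */
}
```

A write into a read-only mapping, or into an IOVA the guest never
mapped, yields 0 instead of a usable pointer, so the backend cannot
scribble on memory the guest did not expose.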

The series depends on Qemu's "vhost-user: Specify and implement
device IOTLB support" series [0], which is available upstream and will
be part of the Qemu v2.10 release.

Performance-wise, even though this RFC still has room for optimization,
no performance degradation is observed with static mappings (i.e. DPDK
in the guest) in the PVP benchmark:
	Traffic Generator: Moongen (lua-trafficgen)
	Acceptable Loss: 0.005%
	Validation run time: 1 min
	Guest DPDK version/commit: v17.05
	QEMU version/commit: master (6db174aed1fd)
	Virtio features: default
	CPU: Intel(R) Xeon(R) CPU E5-2667 v4 @ 3.20GHz
	NIC: 2 x X710
	Page size: 1G host/1G guest
	Results (bidirectional, total of the two flows):
	 - base: 18.8Mpps
	 - base + IOTLB series, IOMMU OFF: 18.8Mpps
	 - base + IOTLB series, IOMMU ON: 18.8Mpps

This is because IOTLB misses, which are very costly, only happen at
startup. Once used, the buffers are not invalidated, so if the IOTLB
cache is large enough, every subsequent lookup is a cache hit. Also,
1G huge pages reduce the number of cache entries, which shortens the
IOTLB cache search time. In the next revision of this series, I plan to
provide PVP results with 2MB pages.
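The effect of page size on the number of cache entries is simple
arithmetic. The sketch below assumes, hypothetically, one IOTLB entry per
guest page with no merging of contiguous pages; iotlb_entries_needed() is
an illustrative helper, not part of the series.

```c
#include <assert.h>
#include <stdint.h>

#define SZ_2M (UINT64_C(2) << 20)
#define SZ_1G (UINT64_C(1) << 30)

/* Number of IOTLB entries needed to cover `mem` bytes of guest
 * memory, assuming one entry per `page` bytes (rounded up). */
static uint64_t
iotlb_entries_needed(uint64_t mem, uint64_t page)
{
	assert(page > 0);
	return (mem + page - 1) / page;
}
```

Mapping 4 GiB of guest memory gives 4 entries with 1G pages versus 2048
entries with 2MB pages, so a cache that is searched entry by entry
(assumed here) has far fewer entries to walk with 1G pages.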

With dynamic mappings (i.e. the Virtio-net kernel driver), it is another
story: performance is so poor that it is almost unusable. Since the
kernel driver unmaps the buffers as soon as they are handled, almost
every descriptor buffer address translation results in an IOTLB miss.
There is not much that can be done on the DPDK side. In Qemu, we may
consider enabling IOMMU MAP notifications, so that DPDK receives the
IOTLB updates without having to send IOTLB miss requests.
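A toy model of why static and dynamic mappings behave so differently:
count IOTLB misses while the backend walks descriptors that cycle over a
fixed set of buffers, with or without the guest unmapping each buffer
after use. count_iotlb_misses() and NB_BUFS are hypothetical; the cache
is assumed large enough to hold every entry, and the model ignores that
a real kernel driver may remap buffers at different IOVAs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NB_BUFS 256 /* hypothetical number of distinct buffers */

/* Count IOTLB misses over `nb_ops` descriptors cycling through
 * NB_BUFS buffers. With static mappings a translation stays cached
 * once fetched; with dynamic mappings the guest unmaps the buffer
 * after each use, invalidating the entry so the next use misses. */
static size_t
count_iotlb_misses(size_t nb_ops, bool unmap_after_use)
{
	bool cached[NB_BUFS] = { false };
	size_t misses = 0;

	for (size_t i = 0; i < nb_ops; i++) {
		size_t buf = i % NB_BUFS;

		if (!cached[buf]) {
			misses++;            /* IOTLB miss request to Qemu */
			cached[buf] = true;  /* IOTLB update received */
		}
		/* ... the descriptor buffer would be accessed here ... */
		if (unmap_after_use)
			cached[buf] = false; /* IOTLB invalidate from guest */
	}
	assert(misses <= nb_ops);
	return misses;
}
```

For 1,000,000 descriptors, the static case gives 256 misses, all during
the first pass over the buffers, while the dynamic case misses on every
one of the 1,000,000 descriptors.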

Regarding the design choices:
 - I initially intended to use the userspace RCU library [1] for the cache
implementation, but it would have added an external dependency, and the
library is not available in all distros. Qemu, for example, got rid of this
dependency by copying parts of the userspace RCU library into the Qemu tree,
but this is not possible with DPDK due to licensing issues (the RCU library
is LGPL v2). Following Jason's advice, I implemented the cache using
read/write locks.
 - I initially implemented a per-device IOTLB cache, but concurrent
accesses on the IOTLB lock had a huge impact on performance (~40% drop
in bidirectional tests, and expected to be even worse with multiqueue).
I moved to a per-virtqueue IOTLB design, which prevents this contention.
 - The slave IOTLB miss request supports the reply-ack feature in the spec,
but I moved to busy-polling on the IOTLB event to avoid a deadlock that
happens when the device is stopped while a processing thread is waiting
for an IOTLB update.
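The last two design choices can be sketched together: a per-virtqueue
cache guarded by its own read/write lock, and a busy-polling wait that
gives up when the device is being stopped. POSIX pthread_rwlock_t is used
so the sketch builds without DPDK; the struct and function names, the
fixed-size entry array and the `stopping` flag are hypothetical, and the
code that actually sends the miss request is left out.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define VQ_IOTLB_MAX 64 /* hypothetical per-virtqueue cache capacity */

struct vq_iotlb_entry {
	uint64_t iova;
	uint64_t uaddr;
	uint64_t size;
};

/* Hypothetical per-virtqueue IOTLB cache. Each virtqueue owns its
 * lock, so threads serving different queues never contend on it. */
struct vq_iotlb {
	pthread_rwlock_t lock;
	size_t nb;
	struct vq_iotlb_entry e[VQ_IOTLB_MAX];
};

static int
vq_iotlb_init(struct vq_iotlb *c)
{
	c->nb = 0;
	return pthread_rwlock_init(&c->lock, NULL);
}

/* Datapath lookup under the read lock; returns 0 on a miss. */
static uint64_t
vq_iotlb_lookup(struct vq_iotlb *c, uint64_t iova)
{
	uint64_t vva = 0;

	pthread_rwlock_rdlock(&c->lock);
	for (size_t i = 0; i < c->nb; i++) {
		const struct vq_iotlb_entry *e = &c->e[i];

		if (iova >= e->iova && iova - e->iova < e->size) {
			vva = e->uaddr + (iova - e->iova);
			break;
		}
	}
	pthread_rwlock_unlock(&c->lock);
	return vva;
}

/* Insert under the write lock, e.g. from the thread handling an
 * IOTLB update message. Returns false when the cache is full. */
static bool
vq_iotlb_insert(struct vq_iotlb *c, uint64_t iova, uint64_t uaddr,
		uint64_t size)
{
	bool ok = false;

	assert(size > 0);
	pthread_rwlock_wrlock(&c->lock);
	if (c->nb < VQ_IOTLB_MAX) {
		c->e[c->nb++] = (struct vq_iotlb_entry){ iova, uaddr, size };
		ok = true;
	}
	pthread_rwlock_unlock(&c->lock);
	return ok;
}

/* After sending an IOTLB miss request (not shown), busy-poll the
 * cache instead of blocking on a reply-ack. Returning as soon as
 * `stopping` is set means stopping the device cannot deadlock with
 * a thread waiting for an update that will never arrive. */
static uint64_t
vq_iotlb_wait(struct vq_iotlb *c, uint64_t iova, atomic_bool *stopping)
{
	for (;;) {
		uint64_t vva = vq_iotlb_lookup(c, iova);

		if (vva != 0)
			return vva;
		if (atomic_load(stopping))
			return 0;
		sched_yield();
	}
}
```

Because the lock lives in each virtqueue, threads serving different
queues never touch the same lock, and readers on a given queue only wait
while an update or invalidation holds its write lock.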

For those who would like to test the series, I made it available on
gitlab [2] (vhost_user_iotlb_upstream_rfc tag). The guest kernel command
line requires the intel_iommu=on parameter, and the guest should be
started with an IOMMU device attached to the virtio-net device. For
example:

./qemu-system-x86_64 \
  -enable-kvm -m 4096 -smp 2 \
  -M q35,kernel-irqchip=split \
  -cpu host \
  -device intel-iommu,device-iotlb=on,intremap \
  -device ioh3420,id=root.1,chassis=1 \
  -chardev socket,id=char0,path=/tmp/vhost-user1 \
  -netdev type=vhost-user,id=hn2,chardev=char0 \
  -device virtio-net-pci,netdev=hn2,id=v0,mq=off,mac=$MAC,bus=root.1,disable-modern=off,disable-legacy=on,iommu_platform=on,ats=on \
...

[0]: https://lists.gnu.org/archive/html/qemu-devel/2017-06/msg00520.html
[1]: http://liburcu.org/
[2]: https://gitlab.com/mcoquelin/dpdk-next-virtio/tree/vhost_user_iotlb_upstream_rfc

Maxime Coquelin (19):
  vhost: protect virtio_net device struct
  Revert "vhost: workaround MQ fails to startup"
  vhost: prepare send_vhost_message() to slave requests
  vhost: add support to slave requests channel
  vhost: declare missing IOMMU-related definitions for old kernels
  vhost: add iotlb helper functions
  vhost-user: add support to IOTLB miss slave requests
  vhost: initialize vrings IOTLB caches
  vhost: implement IOTLB events notification mechanism
  vhost-user: handle IOTLB update and invalidate requests
  vhost: introduce guest IOVA to backend VA helper
  vhost: use the guest IOVA to host VA helper
  vhost: enable rings at the right time
  vhost: don't dereference invalid dev pointer after its reallocation
  vhost: postpone rings adresses translation
  vhost-user: translate ring addresses when IOMMU enabled
  vhost-user: iommu: postpone device creation until ring are mapped
  vhost: iommu: Invalidate vring in case of matching IOTLB invalidate
  vhost: enable IOMMU support

 lib/librte_vhost/Makefile     |   4 +-
 lib/librte_vhost/iotlb.c      | 236 ++++++++++++++++++++++++++
 lib/librte_vhost/iotlb.h      |  47 +++++
 lib/librte_vhost/vhost.c      | 350 +++++++++++++++++++++++++++++++++-----
 lib/librte_vhost/vhost.h      |  53 +++++-
 lib/librte_vhost/vhost_user.c | 387 +++++++++++++++++++++++++++++++++---------
 lib/librte_vhost/vhost_user.h |  20 ++-
 lib/librte_vhost/virtio_net.c | 100 ++++++++---
 8 files changed, 1033 insertions(+), 164 deletions(-)
 create mode 100644 lib/librte_vhost/iotlb.c
 create mode 100644 lib/librte_vhost/iotlb.h

-- 
2.9.4
