DPDK patches and discussions
 help / color / mirror / Atom feed
From: Maxime Coquelin <maxime.coquelin@redhat.com>
To: "Xia, Chenbo" <chenbo.xia@intel.com>,
	"dev@dpdk.org" <dev@dpdk.org>,
	"david.marchand@redhat.com" <david.marchand@redhat.com>,
	"mkp@redhat.com" <mkp@redhat.com>,
	"fbl@redhat.com" <fbl@redhat.com>,
	"jasowang@redhat.com" <jasowang@redhat.com>,
	"Liang, Cunming" <cunming.liang@intel.com>,
	"Xie, Yongji" <xieyongji@bytedance.com>,
	"echaudro@redhat.com" <echaudro@redhat.com>,
	"eperezma@redhat.com" <eperezma@redhat.com>,
	"amorenoz@redhat.com" <amorenoz@redhat.com>,
	"lulu@redhat.com" <lulu@redhat.com>
Subject: Re: [PATCH v5 00/26] Add VDUSE support to Vhost library
Date: Wed, 7 Jun 2023 16:58:46 +0200	[thread overview]
Message-ID: <6e196d2c-fb70-68a9-ca38-58629bb3ceda@redhat.com> (raw)
In-Reply-To: <SN6PR11MB35041E06BE8EABADBE2117E19C53A@SN6PR11MB3504.namprd11.prod.outlook.com>



On 6/7/23 08:48, Xia, Chenbo wrote:
> Hi Maxime,
> 
>> -----Original Message-----
>> From: Maxime Coquelin <maxime.coquelin@redhat.com>
>> Sent: Tuesday, June 6, 2023 4:18 PM
>> To: dev@dpdk.org; Xia, Chenbo <chenbo.xia@intel.com>;
>> david.marchand@redhat.com; mkp@redhat.com; fbl@redhat.com;
>> jasowang@redhat.com; Liang, Cunming <cunming.liang@intel.com>; Xie, Yongji
>> <xieyongji@bytedance.com>; echaudro@redhat.com; eperezma@redhat.com;
>> amorenoz@redhat.com; lulu@redhat.com
>> Cc: Maxime Coquelin <maxime.coquelin@redhat.com>
>> Subject: [PATCH v5 00/26] Add VDUSE support to Vhost library
>>
>> This series introduces a new type of backend, VDUSE,
>> to the Vhost library.
>>
>> VDUSE stands for vDPA device in Userspace, it enables
>> implementing a Virtio device in userspace and have it
>> attached to the Kernel vDPA bus.
>>
>> Once attached to the vDPA bus, it can be used either by
>> Kernel Virtio drivers, like virtio-net in our case, via
>> the virtio-vdpa driver. Doing that, the device is visible
>> to the Kernel networking stack and is exposed to userspace
>> as a regular netdev.
>>
>> It can also be exposed to userspace thanks to the
>> vhost-vdpa driver, via a vhost-vdpa chardev that can be
>> passed to QEMU or Virtio-user PMD.
>>
>> While VDUSE support is already available in upstream
>> Kernel, a couple of patches are required to support
>> network device type:
>>
>> https://gitlab.com/mcoquelin/linux/-/tree/vduse_networking_rfc
>>
>> In order to attach the created VDUSE device to the vDPA
>> bus, a recent iproute2 version containing the vdpa tool is
>> required.
>>
>> Benchmark results:
>> ==================
>>
>> On this v2, PVP reference benchmark has been run & compared with
>> Vhost-user.
>>
>> When doing macswap forwarding in the worload, no difference is seen.
>> When doing io forwarding in the workload, we see 4% performance
>> degradation with VDUSE, comapred to Vhost-user/Virtio-user. It is
>> explained by the use of the IOTLB layer in the Vhost-library when using
>> VDUSE, whereas Vhost-user/Virtio-user does not make use of it.
>>
>> Usage:
>> ======
>>
>> 1. Probe required Kernel modules
>> # modprobe vdpa
>> # modprobe vduse
>> # modprobe virtio-vdpa
>>
>> 2. Build (require vduse kernel headers to be available)
>> # meson build
>> # ninja -C build
>>
>> 3. Create a VDUSE device (vduse0) using Vhost PMD with
>> testpmd (with 4 queue pairs in this example)
>> # ./build/app/dpdk-testpmd --no-pci --
>> vdev=net_vhost0,iface=/dev/vduse/vduse0,queues=4 --log-level=*:9  -- -i --
>> txq=4 --rxq=4
>>
>> 4. Attach the VDUSE device to the vDPA bus
>> # vdpa dev add name vduse0 mgmtdev vduse
>> => The virtio-net netdev shows up (eth0 here)
>> # ip l show eth0
>> 21: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP
>> mode DEFAULT group default qlen 1000
>>      link/ether c2:73:ea:a7:68:6d brd ff:ff:ff:ff:ff:ff
>>
>> 5. Start/stop traffic in testpmd
>> testpmd> start
>> testpmd> show port stats 0
>>    ######################## NIC statistics for port 0
>> ########################
>>    RX-packets: 11         RX-missed: 0          RX-bytes:  1482
>>    RX-errors: 0
>>    RX-nombuf:  0
>>    TX-packets: 1          TX-errors: 0          TX-bytes:  62
>>
>>    Throughput (since last show)
>>    Rx-pps:            0          Rx-bps:            0
>>    Tx-pps:            0          Tx-bps:            0
>>
>> ##########################################################################
>> ##
>> testpmd> stop
>>
>> 6. Detach the VDUSE device from the vDPA bus
>> # vdpa dev del vduse0
>>
>> 7. Quit testpmd
>> testpmd> quit
>>
>> Known issues & remaining work:
>> ==============================
>> - Fix issue in FD manager (still polling while FD has been removed)
>> - Add Netlink support in Vhost library
>> - Support device reconnection
>>   -> a temporary patch to support reconnection via a tmpfs file is
>> available,
>>      upstream solution would be in-kernel and is being developed.
>>   -> https://gitlab.com/mcoquelin/dpdk-next-virtio/-
>> /commit/5ad06ce14159a9ce36ee168dd13ef389cec91137
>> - Support packed ring
>> - Provide more performance benchmark results
>>
>> Changes in v5:
>> ==============
>> - Delay starting/stopping the device to after having replied to the VDUSE
>>    event in order to avoid a deadlock encountered when testing with OVS.
> 
> Could you explain more to help me understand the deadlock issue?

Sure.

The V5 fixes an ABBA deadlock involving OVS mutex and kernel
rtnl_lock(), two OVS threads and the vdpa tool process.

We have an OVS bridge with a mlx5 port already added.
We add the vduse port to the same bridge.
Then we use the iproute2 vdpa tool to attach the vduse device the the
kernel vdpa bus. when doing this the rtnl lock is taken when the virtio-
net device is probed, and VDUSE_SET_STATUS gets sent and waits for its
reply.

This VDUSE_SET_STATUS request is handled by the DPDK VDUSE event
handler, and if DRIVER_OK bit is set the Vhsot .new_device() callback is
called, which triggers a bridge reconfiguration.

On bridge reconfiguration, the mlx5 port takes the OVS mutex and
performs an ioctl() which tries to take the rtnl lock, but is is already
owned by the vdpa tool.

The vduse_events thread is stucked waiting for the OVS mutex, so the 
reply to the VDUSE_SET_STATUS event is never sent, and the vdpa tool
process is stucked for 30 seconds, until a timeout happens.

When the timeourt happen, everything is unblocked, but the VDUSE device
has been marked as broken, and so not usable anymore.

I could reproduce and provide you the backtraces of the different
threads if you wish.

Anyway, I think it makes sense to perform the device startup after
having replied to VDUSE_SET_STATUS request, as it just mean the device
has taken into account the new status of the driver.

Hope it clarifies, let me know if you need more details.

Thanks,
Maxime

> Thanks,
> Chenbo
> 
>> - Mention reconnection support lack in the release note.
>>
>> Changes in v4:
>> ==============
>> - Applied patch 1 and patch 2 from v3
>> - Rebased on top of Eelco series
>> - Fix coredump clear in IOTLB cache removal (David)
>> - Remove uneeded ret variable in vhost_vring_inject_irq (David)
>> - Fixed release note (David, Chenbo)
>>
>> Changes in v2/v3:
>> =================
>> - Fixed mem_set_dump() parameter (patch 4)
>> - Fixed accidental comment change (patch 7, Chenbo)
>> - Change from __builtin_ctz to __builtin_ctzll (patch 9, Chenbo)
>> - move change from patch 12 to 13 (Chenbo)
>> - Enable locks annotation for control queue (Patch 17)
>> - Send control queue notification when used descriptors enqueued (Patch 17)
>> - Lock control queue IOTLB lock (Patch 17)
>> - Fix error path in virtio_net_ctrl_pop() (Patch 17, Chenbo)
>> - Set VDUSE dev FD as NONBLOCK (Patch 18)
>> - Enable more Virtio features (Patch 18)
>> - Remove calls to pthread_setcancelstate() (Patch 22)
>> - Add calls to fdset_pipe_notify() when adding and deleting FDs from a set
>> (Patch 22)
>> - Use RTE_DIM() to get requests string array size (Patch 22)
>> - Set reply result for IOTLB update message (Patch 25, Chenbo)
>> - Fix queues enablement with multiqueue (Patch 26)
>> - Move kickfd creation for better logging (Patch 26)
>> - Improve logging (Patch 26)
>> - Uninstall cvq kickfd in case of handler installation failure (Patch 27)
>> - Enable CVQ notifications once handler is installed (Patch 27)
>> - Don't advertise multiqueue and control queue if app only request single
>> queue pair (Patch 27)
>> - Add release notes
>>
>> Maxime Coquelin (26):
>>    vhost: fix IOTLB entries overlap check with previous entry
>>    vhost: add helper of IOTLB entries coredump
>>    vhost: add helper for IOTLB entries shared page check
>>    vhost: don't dump unneeded pages with IOTLB
>>    vhost: change to single IOTLB cache per device
>>    vhost: add offset field to IOTLB entries
>>    vhost: add page size info to IOTLB entry
>>    vhost: retry translating IOVA after IOTLB miss
>>    vhost: introduce backend ops
>>    vhost: add IOTLB cache entry removal callback
>>    vhost: add helper for IOTLB misses
>>    vhost: add helper for interrupt injection
>>    vhost: add API to set max queue pairs
>>    net/vhost: use API to set max queue pairs
>>    vhost: add control virtqueue support
>>    vhost: add VDUSE device creation and destruction
>>    vhost: add VDUSE callback for IOTLB miss
>>    vhost: add VDUSE callback for IOTLB entry removal
>>    vhost: add VDUSE callback for IRQ injection
>>    vhost: add VDUSE events handler
>>    vhost: add support for virtqueue state get event
>>    vhost: add support for VDUSE status set event
>>    vhost: add support for VDUSE IOTLB update event
>>    vhost: add VDUSE device startup
>>    vhost: add multiqueue support to VDUSE
>>    vhost: add VDUSE device stop
>>
>>   doc/guides/prog_guide/vhost_lib.rst    |   4 +
>>   doc/guides/rel_notes/release_23_07.rst |  12 +
>>   drivers/net/vhost/rte_eth_vhost.c      |   3 +
>>   lib/vhost/iotlb.c                      | 333 +++++++------
>>   lib/vhost/iotlb.h                      |  45 +-
>>   lib/vhost/meson.build                  |   5 +
>>   lib/vhost/rte_vhost.h                  |  17 +
>>   lib/vhost/socket.c                     |  72 ++-
>>   lib/vhost/vduse.c                      | 646 +++++++++++++++++++++++++
>>   lib/vhost/vduse.h                      |  33 ++
>>   lib/vhost/version.map                  |   1 +
>>   lib/vhost/vhost.c                      |  70 ++-
>>   lib/vhost/vhost.h                      |  57 ++-
>>   lib/vhost/vhost_user.c                 |  51 +-
>>   lib/vhost/vhost_user.h                 |   2 +-
>>   lib/vhost/virtio_net_ctrl.c            | 286 +++++++++++
>>   lib/vhost/virtio_net_ctrl.h            |  10 +
>>   17 files changed, 1409 insertions(+), 238 deletions(-)
>>   create mode 100644 lib/vhost/vduse.c
>>   create mode 100644 lib/vhost/vduse.h
>>   create mode 100644 lib/vhost/virtio_net_ctrl.c
>>   create mode 100644 lib/vhost/virtio_net_ctrl.h
>>
>> --
>> 2.40.1
> 


  reply	other threads:[~2023-06-07 14:58 UTC|newest]

Thread overview: 36+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-06-06  8:18 Maxime Coquelin
2023-06-06  8:18 ` [PATCH v5 01/26] vhost: fix IOTLB entries overlap check with previous entry Maxime Coquelin
2023-06-06  8:18 ` [PATCH v5 02/26] vhost: add helper of IOTLB entries coredump Maxime Coquelin
2023-06-06  8:18 ` [PATCH v5 03/26] vhost: add helper for IOTLB entries shared page check Maxime Coquelin
2023-06-06  8:18 ` [PATCH v5 04/26] vhost: don't dump unneeded pages with IOTLB Maxime Coquelin
2023-06-06  8:18 ` [PATCH v5 05/26] vhost: change to single IOTLB cache per device Maxime Coquelin
2023-06-06  8:18 ` [PATCH v5 06/26] vhost: add offset field to IOTLB entries Maxime Coquelin
2023-06-06  8:18 ` [PATCH v5 07/26] vhost: add page size info to IOTLB entry Maxime Coquelin
2023-06-06  8:18 ` [PATCH v5 08/26] vhost: retry translating IOVA after IOTLB miss Maxime Coquelin
2023-06-06  8:18 ` [PATCH v5 09/26] vhost: introduce backend ops Maxime Coquelin
2023-06-06  8:18 ` [PATCH v5 10/26] vhost: add IOTLB cache entry removal callback Maxime Coquelin
2023-06-06  8:18 ` [PATCH v5 11/26] vhost: add helper for IOTLB misses Maxime Coquelin
2023-06-06  8:18 ` [PATCH v5 12/26] vhost: add helper for interrupt injection Maxime Coquelin
2023-06-06  8:18 ` [PATCH v5 13/26] vhost: add API to set max queue pairs Maxime Coquelin
2023-06-06  8:18 ` [PATCH v5 14/26] net/vhost: use " Maxime Coquelin
2023-06-06  8:18 ` [PATCH v5 15/26] vhost: add control virtqueue support Maxime Coquelin
2023-06-06  8:18 ` [PATCH v5 16/26] vhost: add VDUSE device creation and destruction Maxime Coquelin
2023-06-06  8:18 ` [PATCH v5 17/26] vhost: add VDUSE callback for IOTLB miss Maxime Coquelin
2023-06-06  8:18 ` [PATCH v5 18/26] vhost: add VDUSE callback for IOTLB entry removal Maxime Coquelin
2023-06-06  8:18 ` [PATCH v5 19/26] vhost: add VDUSE callback for IRQ injection Maxime Coquelin
2023-06-06  8:18 ` [PATCH v5 20/26] vhost: add VDUSE events handler Maxime Coquelin
2023-06-06  8:18 ` [PATCH v5 21/26] vhost: add support for virtqueue state get event Maxime Coquelin
2023-06-06  8:18 ` [PATCH v5 22/26] vhost: add support for VDUSE status set event Maxime Coquelin
2023-06-06  8:18 ` [PATCH v5 23/26] vhost: add support for VDUSE IOTLB update event Maxime Coquelin
2023-06-06  8:18 ` [PATCH v5 24/26] vhost: add VDUSE device startup Maxime Coquelin
2023-06-08  2:10   ` Xia, Chenbo
2023-06-06  8:18 ` [PATCH v5 25/26] vhost: add multiqueue support to VDUSE Maxime Coquelin
2023-06-06  8:18 ` [PATCH v5 26/26] vhost: add VDUSE device stop Maxime Coquelin
2023-06-08  2:11   ` Xia, Chenbo
2023-06-07  6:48 ` [PATCH v5 00/26] Add VDUSE support to Vhost library Xia, Chenbo
2023-06-07 14:58   ` Maxime Coquelin [this message]
2023-06-08  1:53     ` Xia, Chenbo
2023-06-07  8:05 ` David Marchand
2023-06-08  9:17   ` Maxime Coquelin
2023-06-08 12:44     ` David Marchand
2023-06-08 14:29 ` Maxime Coquelin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=6e196d2c-fb70-68a9-ca38-58629bb3ceda@redhat.com \
    --to=maxime.coquelin@redhat.com \
    --cc=amorenoz@redhat.com \
    --cc=chenbo.xia@intel.com \
    --cc=cunming.liang@intel.com \
    --cc=david.marchand@redhat.com \
    --cc=dev@dpdk.org \
    --cc=echaudro@redhat.com \
    --cc=eperezma@redhat.com \
    --cc=fbl@redhat.com \
    --cc=jasowang@redhat.com \
    --cc=lulu@redhat.com \
    --cc=mkp@redhat.com \
    --cc=xieyongji@bytedance.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).