DPDK patches and discussions
From: Tiwei Bie <tiwei.bie@intel.com>
To: Dariusz Stojaczyk <darek.stojaczyk@gmail.com>
Cc: dev@dpdk.org, Maxime Coquelin <maxime.coquelin@redhat.com>,
	Tetsuya Mukawa <mtetsuyah@gmail.com>,
	Stefan Hajnoczi <stefanha@redhat.com>,
	Thomas Monjalon <thomas@monjalon.net>,
	yliu@fridaylinux.org, James Harris <james.r.harris@intel.com>,
	Tomasz Kulasek <tomaszx.kulasek@intel.com>,
	Pawel Wodkowski <pawelx.wodkowski@intel.com>
Subject: Re: [dpdk-dev] [RFC v3 0/7] vhost2: new librte_vhost2 proposal
Date: Mon, 25 Jun 2018 19:01:46 +0800
Message-ID: <20180625110146.GA18211@debian>
In-Reply-To: <20180607151227.23660-1-darek.stojaczyk@gmail.com>

On Thu, Jun 07, 2018 at 05:12:20PM +0200, Dariusz Stojaczyk wrote:
> rte_vhost is not vhost-user spec compliant. Some vhost drivers have
> already been confirmed not to work with rte_vhost. vhost-user-scsi-pci
> in QEMU 2.12 doesn't fully initialize its management queues at the
> SeaBIOS stage. This is perfectly fine from the vhost-user spec
> perspective, but doesn't meet rte_vhost's expectations: rte_vhost waits
> for all queues to be fully initialized before it allows the entire
> device to be processed.
> 
> The problem can be temporarily worked around by stopping and restarting
> the entire device (all queues) on each vhost-user message that could
> change queue state. However, this would increase overall VM boot time
> and is unacceptable.
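To illustrate the difference described above, here is a minimal, self-contained model in C. Everything in it (vdev_model, set_queue_ready and the two policy functions) is hypothetical and does not come from rte_vhost or the RFC. It assumes the standard virtio-scsi queue layout (queue 0 = control, queue 1 = event, queues 2 and up = requests) and a guest that has only initialized a request queue so far.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model for illustration only; not the rte_vhost or rte_vhost2 API. */
#define MAX_QUEUES 8

struct vdev_model {
    int num_queues;
    bool queue_ready[MAX_QUEUES];   /* queue fully initialized by the driver */
    bool queue_started[MAX_QUEUES]; /* per-queue policy: queue handed to the backend */
    bool device_started;            /* all-or-nothing policy: whole device handed over */
};

/* All-or-nothing policy (rte_vhost behaviour as described above):
 * nothing is processed until every queue of the device is ready. */
static void update_all_or_nothing(struct vdev_model *d)
{
    for (int i = 0; i < d->num_queues; i++) {
        if (!d->queue_ready[i])
            return;
    }
    d->device_started = true;
}

/* Per-queue policy (as proposed for rte_vhost2):
 * each queue is started as soon as that particular queue is ready. */
static void update_per_queue(struct vdev_model *d)
{
    for (int i = 0; i < d->num_queues; i++)
        d->queue_started[i] = d->queue_ready[i];
}

/* Mark one queue as initialized and re-evaluate both policies. */
static void set_queue_ready(struct vdev_model *d, int qid)
{
    assert(d->num_queues <= MAX_QUEUES);
    assert(qid >= 0 && qid < d->num_queues);
    d->queue_ready[qid] = true;
    update_all_or_nothing(d);
    update_per_queue(d);
}
```

With three queues and only queue 2 ready, the all-or-nothing policy never starts the device, while the per-queue policy starts queue 2 immediately; the device as a whole only starts once queues 0 and 1 are initialized too.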
> 
> Fixing rte_vhost properly would require a large number of changes
> that would completely break backward compatibility. We're therefore
> introducing a new rte_vhost2 library intended to smooth the transition.
> It exposes a low-level API for implementing new vhost devices.
> The existing rte_vhost is to be refactored to use rte_vhost2
> underneath, and demanding users can use rte_vhost2 directly.
> 
> We're designing a fresh library here, which opens up many
> possibilities for improving the public API; these improvements will
> pay off significantly for future specification/library extensions.
> 
> rte_vhost2 abstracts away most vhost-user/virtio-vhost-user specifics
> and allows developers to implement vhost devices with ease.
> It calls user-provided callbacks once the proper device initialization
> state has been reached, that is, when memory mappings have changed,
> virtqueues are ready to be processed, features have changed at
> runtime, etc.
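A rough sketch of what user-provided callbacks with asynchronous completion could look like. All names (sketch_ops, sketch_deliver_queue_start, sketch_op_done, app_queue_start) are hypothetical and are not the rte_vhost2.h API from this series. The sketch only assumes that at most one callback per device is in flight and that the application reports completion explicitly.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names throughout; this is not the rte_vhost2.h API. */
struct sketch_dev;

struct sketch_ops {
    /* Called when a virtqueue becomes ready. The application may defer
     * the actual work and report the result later via sketch_op_done(). */
    void (*queue_start)(struct sketch_dev *dev, int qid);
};

struct sketch_dev {
    const struct sketch_ops *ops;
    bool op_pending; /* a callback was issued but has not completed yet */
    int last_rc;     /* result reported by the last completed callback */
};

/* Library side: deliver a queue-start event unless a previous callback
 * is still in flight. Returns false if the event has to be retried later. */
static bool sketch_deliver_queue_start(struct sketch_dev *dev, int qid)
{
    if (dev->op_pending)
        return false;
    dev->op_pending = true;
    dev->ops->queue_start(dev, qid);
    return true;
}

/* Application side: report completion of the pending callback, e.g. from
 * a later iteration of the application's own event loop. */
static void sketch_op_done(struct sketch_dev *dev, int rc)
{
    assert(dev->op_pending);
    dev->op_pending = false;
    dev->last_rc = rc;
}

/* Example asynchronous application backend: it only records the request
 * here and finishes it later by calling sketch_op_done(). */
static int g_deferred_qid = -1;

static void app_queue_start(struct sketch_dev *dev, int qid)
{
    (void)dev;
    g_deferred_qid = qid;
}

static const struct sketch_ops app_ops = { .queue_start = app_queue_start };
```

A synchronous application would simply call sketch_op_done() at the end of its own queue_start callback, so the asynchronous contract costs it nothing. The sketch does no locking; serializing callbacks per device is just one way the thread-safety point in the list below could be approached.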
> 
> Compared to rte_vhost, this library additionally provides the following:
> 
> * the ability to start/stop individual queues
> * most callbacks are now asynchronous, which greatly simplifies
>   event handling for asynchronous applications and doesn't
>   make anything harder for synchronous ones
> * this is a low-level vhost API. It doesn't have any vhost-net, nvme
>   or crypto references. These backend-specific libraries will
>   later be refactored to use *this* generic library underneath.
>   This implies that the library doesn't do any virtqueue processing;
>   it only delivers vring addresses to the user, who then processes
>   the virtqueues.
> * abstraction of Unix domain sockets (vhost-user) and PCI
>   (virtio-vhost-user)
> * APIs for interrupt-driven drivers
> * replacement of the gpa_to_vva function with an IOMMU-aware variant
> * the API constrains how public functions can be called and how
>   internal data can change, so only minimal work is required
>   to ensure thread safety. Possibly no mutexes are required at all.
> * full Virtio 1.0/vhost-user specification compliance
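One way to read the IOMMU-aware translation item: instead of a gpa_to_vva() that returns a single address, the lookup takes an in/out length so the caller learns how many bytes are contiguous in host memory. The sketch below is hypothetical (sketch_mem, sketch_mem_region and sketch_iova_to_vva are made-up names) and only walks a flat region table; a real IOMMU-enabled path would consult IOTLB entries instead, which this series still lists as TBD.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical structures for illustration; not the rte_vhost2 API. */
struct sketch_mem_region {
    uint64_t guest_addr; /* start of the region in the guest's (IO) address space */
    uint64_t size;       /* region length in bytes */
    uint64_t host_addr;  /* where the region is mapped in this process */
};

struct sketch_mem {
    size_t nregions;
    const struct sketch_mem_region *regions;
};

/* Translate [iova, iova + *len) to a host virtual address.
 * On success, *len is clamped to the number of bytes that are contiguous
 * in host memory, so callers can detect buffers that span regions.
 * Returns 0 and sets *len to 0 if iova is not mapped at all. */
static uint64_t sketch_iova_to_vva(const struct sketch_mem *mem,
                                   uint64_t iova, uint64_t *len)
{
    assert(mem != NULL && len != NULL);

    for (size_t i = 0; i < mem->nregions; i++) {
        const struct sketch_mem_region *r = &mem->regions[i];

        /* Skip regions that do not contain iova (written to avoid overflow). */
        if (iova < r->guest_addr || iova - r->guest_addr >= r->size)
            continue;

        uint64_t offset = iova - r->guest_addr;
        uint64_t avail = r->size - offset;

        if (*len > avail)
            *len = avail;
        return r->host_addr + offset;
    }

    *len = 0;
    return 0;
}
```

For a region mapping guest addresses 0x1000 to 0x1fff, asking for 0x1000 bytes at 0x1800 returns the translated address and shrinks the length to 0x800, signalling that the buffer crosses the end of the region.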
> 
> The proposed structure for the new library is described below.
>  * rte_vhost2.h
>    - public API
>    - registering targets with provided ops
>    - unregistering targets
>    - iova_to_vva()
>  * transport.h/.c
>    - implements rte_vhost2.h
>    - allows registering vhost transports, which are opaquely required by the
>      rte_vhost2.h API (the target register function requires a transport name).
>    - exposes a set of callbacks to be implemented by each transport
>  * vhost_user.c
>    - vhost-user Unix domain socket transport
>    - does recvmsg()
>    - uses the underlying vhost-user helper lib to process messages, but still
>      handles some transport-specific ones, e.g. SET_MEM_TABLE
>    - calls some of the rte_vhost2.h ops registered with a target
>  * fd_man.h/.c
>    - polls provided descriptors, calls user callbacks on fd events
>    - based on the original rte_vhost version
>    - additionally allows calling user-provided callbacks on the poll thread
>  * vhost.h/.c
>    - a transport-agnostic vhost-user library
>    - calls most of the rte_vhost2.h ops registered with a target
>    - manages virtqueues state
>    - hopefully to be reused by the virtio-vhost-user transport
>    - exposes a set of callbacks to be implemented by each transport
>      (e.g. for sending message replies)
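The name-based transport lookup described for transport.h/.c can be pictured roughly as follows. This is a hypothetical sketch (sketch_transport_ops, sketch_transport_register, sketch_tgt_register and the stub transport are made-up names), not the code from the series; it only shows how a target registration call that takes a transport name can stay agnostic of vhost-user vs virtio-vhost-user.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical names; a sketch of name-based transport lookup, not the RFC code. */
struct sketch_transport_ops {
    const char *name;                      /* e.g. "vhost-user" */
    int (*tgt_register)(const char *addr); /* e.g. a Unix socket path */
};

#define SKETCH_MAX_TRANSPORTS 4

static const struct sketch_transport_ops *g_transports[SKETCH_MAX_TRANSPORTS];
static size_t g_num_transports;

/* Called by each transport implementation, e.g. at library init. */
static int sketch_transport_register(const struct sketch_transport_ops *ops)
{
    assert(ops != NULL && ops->name != NULL);
    if (g_num_transports == SKETCH_MAX_TRANSPORTS)
        return -1;
    g_transports[g_num_transports++] = ops;
    return 0;
}

/* Public entry point: the caller only names the transport; all
 * transport-specific details stay behind the ops table. */
static int sketch_tgt_register(const char *trtype, const char *addr)
{
    for (size_t i = 0; i < g_num_transports; i++) {
        if (strcmp(g_transports[i]->name, trtype) == 0)
            return g_transports[i]->tgt_register(addr);
    }
    return -1; /* unknown transport */
}

/* Stub standing in for the vhost-user Unix domain socket transport. */
static const char *g_last_addr;

static int stub_user_tgt_register(const char *addr)
{
    g_last_addr = addr; /* a real transport would create and listen on a socket here */
    return 0;
}

static const struct sketch_transport_ops stub_user_transport = {
    .name = "vhost-user",
    .tgt_register = stub_user_tgt_register,
};
```

In this sketch, registering a target for "virtio-vhost-user" fails simply because no such transport has been registered, which mirrors this series shipping only the vhost-user transport.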
> 
> This series includes only vhost-user transport. Virtio-vhost-user
> is to be done later.
> 
> The following items are still TBD:
>   * vhost-user slave channel
>   * get/set_config
>   * cfg_call() implementation
>   * IOTLB
>   * NUMA awareness
>   * Live migration
>   * vhost-net implementation (to port rte_vhost over)
>   * vhost-crypto implementation (to port rte_vhost over)
>   * vhost-user client mode (connecting to an external socket)
>   * various error logging
> 
> Dariusz Stojaczyk (7):
>   vhost2: add initial implementation
>   vhost2: add vhost-user helper layer
>   vhost2/vhost: handle queue_stop/device_destroy failure
>   vhost2: add asynchronous fd_man
>   vhost2: add initial vhost-user transport
>   vhost2/vhost: support protocol features
>   vhost2/user: implement tgt unregister path

Hi Dariusz,

Thank you for putting effort into making DPDK
vhost more generic!

From my understanding, your proposal is as follows:

1) Introduce rte_vhost2 to provide the APIs which
   allow users to implement vhost backends like
   SCSI, net, crypto, ..

2) Refactor the existing rte_vhost to use rte_vhost2.
   rte_vhost will still provide the following
   existing sets of APIs:
    1. The APIs which allow users to implement
       external vhost backends (these APIs were
       designed for SPDK previously)
    2. The APIs provided by the net backend
    3. The APIs provided by the crypto backend
   These APIs in rte_vhost will remain unchanged.
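For point 2, one possible way to keep rte_vhost's whole-device callbacks unchanged on top of a per-queue lower layer is a small shim like the one below. It is a hypothetical sketch: legacy_ops only mirrors the shape of rte_vhost's new_device()/destroy_device() callbacks, the other rte_vhost callbacks (such as vring_state_changed) are left out, and the shim_* and app_* names are made up.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical shim: legacy whole-device callbacks (shaped like rte_vhost's
 * new_device/destroy_device) driven by per-queue start/stop events. */
#define SHIM_MAX_QUEUES 8

struct legacy_ops {
    int (*new_device)(int vid);
    void (*destroy_device)(int vid);
};

struct shim_dev {
    int vid;
    int num_queues;
    bool started[SHIM_MAX_QUEUES];
    int num_started;
    bool legacy_running;
    const struct legacy_ops *ops;
};

/* Per-queue "started" event from the lower layer. */
static void shim_queue_start(struct shim_dev *d, int qid)
{
    assert(qid >= 0 && qid < d->num_queues && d->num_queues <= SHIM_MAX_QUEUES);
    if (d->started[qid])
        return;
    d->started[qid] = true;
    d->num_started++;
    /* Preserve the old semantics: announce the device only once every queue is up. */
    if (d->num_started == d->num_queues && !d->legacy_running)
        d->legacy_running = (d->ops->new_device(d->vid) == 0);
}

/* Per-queue "stopped" event. In this sketch the legacy layer only has
 * whole-device callbacks, so the device is torn down on the first stop. */
static void shim_queue_stop(struct shim_dev *d, int qid)
{
    assert(qid >= 0 && qid < d->num_queues);
    if (!d->started[qid])
        return;
    d->started[qid] = false;
    d->num_started--;
    if (d->legacy_running) {
        d->ops->destroy_device(d->vid);
        d->legacy_running = false;
    }
}

/* Example legacy application callbacks that just count invocations. */
static int g_new_calls, g_destroy_calls;
static int app_new_device(int vid) { (void)vid; g_new_calls++; return 0; }
static void app_destroy_device(int vid) { (void)vid; g_destroy_calls++; }
static const struct legacy_ops app_ops = { app_new_device, app_destroy_device };
```

The trade-off is that a legacy application keeps the all-or-nothing behaviour, while applications written directly against the lower layer can act on each queue as soon as it is ready.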

Is my understanding above correct? Thanks!

Best regards,
Tiwei Bie


> 
>  lib/Makefile                             |   4 +-
>  lib/librte_vhost2/Makefile               |  25 +
>  lib/librte_vhost2/fd_man.c               | 288 +++++++++++
>  lib/librte_vhost2/fd_man.h               | 125 +++++
>  lib/librte_vhost2/rte_vhost2.h           | 304 +++++++++++
>  lib/librte_vhost2/rte_vhost2_version.map |  12 +
>  lib/librte_vhost2/transport.c            |  74 +++
>  lib/librte_vhost2/transport.h            |  32 ++
>  lib/librte_vhost2/vhost.c                | 728 ++++++++++++++++++++++++++
>  lib/librte_vhost2/vhost.h                | 203 ++++++++
>  lib/librte_vhost2/vhost_user.c           | 845 +++++++++++++++++++++++++++++++
>  11 files changed, 2638 insertions(+), 2 deletions(-)
>  create mode 100644 lib/librte_vhost2/Makefile
>  create mode 100644 lib/librte_vhost2/fd_man.c
>  create mode 100644 lib/librte_vhost2/fd_man.h
>  create mode 100644 lib/librte_vhost2/rte_vhost2.h
>  create mode 100644 lib/librte_vhost2/rte_vhost2_version.map
>  create mode 100644 lib/librte_vhost2/transport.c
>  create mode 100644 lib/librte_vhost2/transport.h
>  create mode 100644 lib/librte_vhost2/vhost.c
>  create mode 100644 lib/librte_vhost2/vhost.h
>  create mode 100644 lib/librte_vhost2/vhost_user.c
> 
> -- 
> 2.11.0

Thread overview: 17+ messages
2018-05-10 13:22 [dpdk-dev] [RFC] vhost: new rte_vhost API proposal Dariusz Stojaczyk
     [not found] ` <20180510163643.GD9308@stefanha-x1.localdomain>
2018-05-11  5:55   ` Stojaczyk, DariuszX
     [not found]     ` <20180511100531.GA19894@stefanha-x1.localdomain>
2018-05-18  7:51       ` Stojaczyk, DariuszX
2018-05-18 13:01 ` [dpdk-dev] [RFC v2] " Dariusz Stojaczyk
2018-05-18 13:50   ` Maxime Coquelin
2018-05-20  7:07     ` Yuanhan Liu
2018-05-22 10:19     ` Stojaczyk, DariuszX
     [not found]   ` <20180525100550.GD14757@stefanha-x1.localdomain>
2018-05-29 13:38     ` Stojaczyk, DariuszX
     [not found]       ` <20180530085700.GC14623@stefanha-x1.localdomain>
2018-05-30 12:24         ` Stojaczyk, DariuszX
     [not found]   ` <20180607151227.23660-1-darek.stojaczyk@gmail.com>
     [not found]     ` <20180608100852.GA31164@stefanha-x1.localdomain>
2018-06-13  9:41       ` [dpdk-dev] [RFC v3 0/7] vhost2: new librte_vhost2 proposal Dariusz Stojaczyk
2018-06-25 11:01     ` Tiwei Bie [this message]
2018-06-25 12:17       ` Stojaczyk, DariuszX
2018-06-26  8:22         ` Tiwei Bie
2018-06-26  8:30           ` Thomas Monjalon
2018-06-26  8:47           ` Stojaczyk, DariuszX
2018-06-26  9:14             ` Tiwei Bie
2018-06-26  9:38               ` Maxime Coquelin
