From: "Stojaczyk, DariuszX" <dariuszx.stojaczyk@intel.com>
To: Stefan Hajnoczi <stefanha@redhat.com>
Cc: "dev@dpdk.org" <dev@dpdk.org>,
Maxime Coquelin <maxime.coquelin@redhat.com>,
"Bie, Tiwei" <tiwei.bie@intel.com>,
"Tetsuya Mukawa" <mtetsuyah@gmail.com>,
Thomas Monjalon <thomas@monjalon.net>,
"yliu@fridaylinux.org" <yliu@fridaylinux.org>,
"Liu, Changpeng" <changpeng.liu@intel.com>
Subject: Re: [dpdk-dev] [RFC] vhost: new rte_vhost API proposal
Date: Fri, 18 May 2018 07:51:51 +0000 [thread overview]
Message-ID: <FBE7E039FA50BF47A673AD0BD3CD56A8449A2C91@HASMSX105.ger.corp.intel.com> (raw)
In-Reply-To: <20180511100531.GA19894@stefanha-x1.localdomain>
> -----Original Message-----
> From: Stefan Hajnoczi [mailto:stefanha@redhat.com]
> Sent: Friday, May 11, 2018 6:06 PM
> On Fri, May 11, 2018 at 05:55:45AM +0000, Stojaczyk, DariuszX wrote:
> > > -----Original Message-----
> > > From: Stefan Hajnoczi [mailto:stefanha@redhat.com]
> > > Sent: Friday, May 11, 2018 12:37 AM
> > > On Thu, May 10, 2018 at 03:22:53PM +0200, Dariusz Stojaczyk wrote:
> > > > rte_virtio would offer both vhost and virtio driver APIs. These two
> > > > have a lot of common code for vhost-user handling or PCI access for
> > > > initiator/virtio-vhost-user (and possibly vDPA) so there's little
> > > > sense to keep target and initiator code separated between different
> > > > libs. Of course, the APIs would be separate - only some parts of
> > > > the code would be shared.
> > >
> > > The API below seems to be for vhost backends (aka slaves).
> rte_virtio_*
> > > is a misnomer because vhost and virtio are two different things. This
> > > is not for implementing virtio devices, it's specifically for vhost
> > > devices.
> >
> > I agree it's named a bit off if we're talking about vhost. My idea was to
> introduce a generic library for userspace Virtio processing and that's
> where the name came from. Even when you use just the vhost API that's
> introduced here, you are only required to implement vring processing,
> config access, and possibly interrupt handling, all of which are typical
> Virtio things. The vhost logic is hidden inside.
>
> No, the vhost logic is not hidden: there is custom_msg() and the whole
> tgt_ops struct is an abstraction of the vhost protocol, not virtio.
>
> It sounds like you're hoping to create a single API that can support
> both vhost and virtio access. For example, one "net" device backend
> implementation using rte_virtio can be accessed via vhost or virtio.
>
> This won't work because vhost and virtio are not equivalent. vhost-net
> devices don't implement the virtio-net config space and they only have a
> subset of the virtqueues. vhost-net devices support special vhost
> messages that don't exist in virtio-net.
>
> Additionally, the virtio and vhost-user specifications are independent
> and make no promise of a 1:1 mapping. They have the freedom to
> change
> in ways which will break any abstraction you come up with today.
>
> I hope it will be possible to unify the two in the future, but that
> needs to happen at the spec level first, before trying to unify them in
> code.
>
> This is why I'm belaboring the point that vhost should not be confused
> with virtio. Each needs to be separate and clearly identified to avoid
> confusion.
>
Ok, I'm convinced now. Thanks for the explanation. I'll name the lib rte_vhost2 in v2.
> >
> > >
> > > Vhost does not offer the full virtio device model - otherwise it would
> > > be just another transport in the VIRTIO specification. Instead vhost is
> > > a protocol for vhost devices, which are subsets of virtio devices.
> > >
> > > I suggest calling it rte_vhost2 since it's basically a new, incompatible
> > > rte_vhost API.
> >
> > Rte_vhost2 sounds better for what we have now, but would that name
> be still valid once we add a true Virtio driver functionality? (I believe it's
> called Virtio PMD in DPDK right now). That driver would reuse a lot of the
> vhost code for PCI and vhost-user, so it makes some sense to put these
> two together.
> >
> > I don't think rte_vhost2 is a permanent name anyway, so maybe we
> could call it like so for now, and rename it later once I introduce that
> additional Virtio functionality? Would that work?
>
> The natural layering for is that vhost depends on virtio. Virtio header
> files (feature bits, config space layout, vring layout) and the vring
> API can be reused by vhost.
>
> Virtio doesn't need knowledge of virtio though and the two can be in
> separate packages without code duplication.
>
> That said, it doesn't really matter whether there are rte_virtio +
> rte_vhost2 packages or a single rte_virtio package, as long as the
> function and struct names for vhost interfaces contain the name "vhost"
> so they cannot be confused with virtio.
>
> > > > + * or before actions that require changing vhost device/vq state.
> > > > + */
> > > > + void (*queue_stop)(struct rte_virtio_dev *vdev, struct
> rte_virtio_vq
> > > *vq);
> > > > + /** Device disconnected. All queues are guaranteed to be stopped
> by
> > > now */
> > > > + void (*device_destroy)(struct rte_virtio_dev *vdev);
> > > > + /**
> > > > + * Custom message handler. `vdev` and `vq` can be NULL. This is
> called
> > > > + * for backend-specific actions. The `id` should be prefixed by the
> > >
> > > Since vdev can be NULL, does this mean custom_msg() may be invoked
> at
> > > any time during the lifecycle and even before/after
> > > device_create()/device_destroy()?
> >
> > Theoretically. I was thinking of some poorly-written backends notifying
> they're out of internal resources, but I agree it's just poor. I'll remove the
> `vdev can be NULL` part.
>
> Okay, I wasn't suggesting it's bad, I just wanted the docs to state at
> which points in the lifecycle this callback can be invoked.
>
> > > > + */
> > > > + void (*custom_msg)(struct rte_virtio_dev *vdev, struct
> rte_virtio_vq
> > > *vq,
> > > > + char *id, void *ctx);
> > >
> > > What is the purpose of id and why is it char* instead of const char*?
> >
> > Ack, It should be const. (same thing goes to every other char* in this
> patch)
> >
> > For example vhost-crypto introduces two new vhost-user messages for
> initializing and destroying crypto session. The underlying vhost-crypto
> vhost-user backend after receiving such message could execute this
> callback as follows:
> >
> > struct my_crypto_data *data = calloc();
> > [...]
> > Ops->custom_msg(vdev, NULL, "crypto_sess_init", data);
>
> So it's necessary to modify rte_virtio vhost code when implementing new
> device backends with custom messages?
>
> It seems like rte_virtio needs to have knowledge of how to parse any
> custom messages :(. It would be cleaner for rte_virtio to have no
> knowledge of device-specific messages.
>
> And how does the device backend reply to custom messages?
>
The library would send proper response after receiving rte_virtio_tgt_cb_complete(). If it needs additional data from the user, there's the `ctx` field in custom_msg that he [the user] can write into.
However, I started to work on the implementation and came to conclusion that it's unnecessarily difficult to implement new Vhost device backends this way. I've changed the custom_msg callback to parse raw Vhost-user messages now. Still, new Vhost-user messages are usually a type of a protocol extension negotiated by a protocol feature flag, and protocol extensions should be implemented inside the lib in my opinion. If a protocol extension changes existing message rather than introduces new one, we'll *need* to implement it inside the lib.
Both solutions have their good and bad points.
I'm sending v2 in a couple minutes, maybe it'll help us decide which one is better.
> > > > +/**
> > > > + * Unregisters a vhost target asynchronously.
> > >
> > > How are existing device instances affected?
> >
> > Ack. How about:
> > All active queues will be stopped and all devices destroyed.
> >
> > This is analogous to what rte_vhost has now.
>
> Sounds good.
>
> Stefan
Regards,
D.
next prev parent reply other threads:[~2018-05-18 7:51 UTC|newest]
Thread overview: 17+ messages / expand[flat|nested] mbox.gz Atom feed top
2018-05-10 13:22 Dariusz Stojaczyk
[not found] ` <20180510163643.GD9308@stefanha-x1.localdomain>
2018-05-11 5:55 ` Stojaczyk, DariuszX
[not found] ` <20180511100531.GA19894@stefanha-x1.localdomain>
2018-05-18 7:51 ` Stojaczyk, DariuszX [this message]
2018-05-18 13:01 ` [dpdk-dev] [RFC v2] " Dariusz Stojaczyk
2018-05-18 13:50 ` Maxime Coquelin
2018-05-20 7:07 ` Yuanhan Liu
2018-05-22 10:19 ` Stojaczyk, DariuszX
[not found] ` <20180525100550.GD14757@stefanha-x1.localdomain>
2018-05-29 13:38 ` Stojaczyk, DariuszX
[not found] ` <20180530085700.GC14623@stefanha-x1.localdomain>
2018-05-30 12:24 ` Stojaczyk, DariuszX
[not found] ` <20180607151227.23660-1-darek.stojaczyk@gmail.com>
[not found] ` <20180608100852.GA31164@stefanha-x1.localdomain>
2018-06-13 9:41 ` [dpdk-dev] [RFC v3 0/7] vhost2: new librte_vhost2 proposal Dariusz Stojaczyk
2018-06-25 11:01 ` Tiwei Bie
2018-06-25 12:17 ` Stojaczyk, DariuszX
2018-06-26 8:22 ` Tiwei Bie
2018-06-26 8:30 ` Thomas Monjalon
2018-06-26 8:47 ` Stojaczyk, DariuszX
2018-06-26 9:14 ` Tiwei Bie
2018-06-26 9:38 ` Maxime Coquelin
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=FBE7E039FA50BF47A673AD0BD3CD56A8449A2C91@HASMSX105.ger.corp.intel.com \
--to=dariuszx.stojaczyk@intel.com \
--cc=changpeng.liu@intel.com \
--cc=dev@dpdk.org \
--cc=maxime.coquelin@redhat.com \
--cc=mtetsuyah@gmail.com \
--cc=stefanha@redhat.com \
--cc=thomas@monjalon.net \
--cc=tiwei.bie@intel.com \
--cc=yliu@fridaylinux.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).