From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <dariuszx.stojaczyk@intel.com>
Received: from mga18.intel.com (mga18.intel.com [134.134.136.126])
 by dpdk.org (Postfix) with ESMTP id CA3508D9D
 for <dev@dpdk.org>; Fri, 18 May 2018 09:51:56 +0200 (CEST)
X-Amp-Result: SKIPPED(no attachment in message)
X-Amp-File-Uploaded: False
Received: from fmsmga008.fm.intel.com ([10.253.24.58])
 by orsmga106.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384;
 18 May 2018 00:51:55 -0700
X-ExtLoop1: 1
X-IronPort-AV: E=Sophos;i="5.49,413,1520924400"; d="scan'208";a="40937131"
Received: from fmsmsx104.amr.corp.intel.com ([10.18.124.202])
 by fmsmga008.fm.intel.com with ESMTP; 18 May 2018 00:51:55 -0700
Received: from fmsmsx102.amr.corp.intel.com (10.18.124.200) by
 fmsmsx104.amr.corp.intel.com (10.18.124.202) with Microsoft SMTP Server (TLS)
 id 14.3.319.2; Fri, 18 May 2018 00:51:54 -0700
Received: from lcsmsx153.ger.corp.intel.com (10.186.165.228) by
 FMSMSX102.amr.corp.intel.com (10.18.124.200) with Microsoft SMTP Server (TLS)
 id 14.3.319.2; Fri, 18 May 2018 00:51:54 -0700
Received: from hasmsx105.ger.corp.intel.com ([169.254.1.197]) by
 LCSMSX153.ger.corp.intel.com ([169.254.8.118]) with mapi id 14.03.0319.002;
 Fri, 18 May 2018 10:51:52 +0300
From: "Stojaczyk, DariuszX" <dariuszx.stojaczyk@intel.com>
To: Stefan Hajnoczi <stefanha@redhat.com>
CC: "dev@dpdk.org" <dev@dpdk.org>, Maxime Coquelin
 <maxime.coquelin@redhat.com>, "Bie, Tiwei" <tiwei.bie@intel.com>, "Tetsuya
 Mukawa" <mtetsuyah@gmail.com>, Thomas Monjalon <thomas@monjalon.net>,
 "yliu@fridaylinux.org" <yliu@fridaylinux.org>, "Liu, Changpeng"
 <changpeng.liu@intel.com>
Thread-Topic: [RFC] vhost: new rte_vhost API proposal
Thread-Index: AQHT6EQfHS5LsMfYdU+ohMObYPEAS6Qo91WAgADPx1CAAFVBgIAK/4Vg
Date: Fri, 18 May 2018 07:51:51 +0000
Message-ID: <FBE7E039FA50BF47A673AD0BD3CD56A8449A2C91@HASMSX105.ger.corp.intel.com>
References: <1525958573-184361-1-git-send-email-dariuszx.stojaczyk@intel.com>
 <20180510163643.GD9308@stefanha-x1.localdomain>
 <FBE7E039FA50BF47A673AD0BD3CD56A84499C958@HASMSX105.ger.corp.intel.com>
 <20180511100531.GA19894@stefanha-x1.localdomain>
In-Reply-To: <20180511100531.GA19894@stefanha-x1.localdomain>
Accept-Language: en-US
Content-Language: en-US
X-MS-Has-Attach: 
X-MS-TNEF-Correlator: 
dlp-product: dlpe-windows
dlp-version: 11.0.200.100
dlp-reaction: no-action
x-originating-ip: [10.104.19.213]
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: quoted-printable
MIME-Version: 1.0
Subject: Re: [dpdk-dev] [RFC] vhost: new rte_vhost API proposal
X-BeenThere: dev@dpdk.org
X-Mailman-Version: 2.1.15
Precedence: list
List-Id: DPDK patches and discussions <dev.dpdk.org>
List-Unsubscribe: <https://dpdk.org/ml/options/dev>,
 <mailto:dev-request@dpdk.org?subject=unsubscribe>
List-Archive: <http://dpdk.org/ml/archives/dev/>
List-Post: <mailto:dev@dpdk.org>
List-Help: <mailto:dev-request@dpdk.org?subject=help>
List-Subscribe: <https://dpdk.org/ml/listinfo/dev>,
 <mailto:dev-request@dpdk.org?subject=subscribe>
X-List-Received-Date: Fri, 18 May 2018 07:51:57 -0000


> -----Original Message-----
> From: Stefan Hajnoczi [mailto:stefanha@redhat.com]
> Sent: Friday, May 11, 2018 6:06 PM
> On Fri, May 11, 2018 at 05:55:45AM +0000, Stojaczyk, DariuszX wrote:
> > > -----Original Message-----
> > > From: Stefan Hajnoczi [mailto:stefanha@redhat.com]
> > > Sent: Friday, May 11, 2018 12:37 AM
> > > On Thu, May 10, 2018 at 03:22:53PM +0200, Dariusz Stojaczyk wrote:
> > > > rte_virtio would offer both vhost and virtio driver APIs. These two
> > > > have a lot of common code for vhost-user handling or PCI access for
> > > > initiator/virtio-vhost-user (and possibly vDPA) so there's little
> > > > sense to keep target and initiator code separated between different
> > > > libs. Of course, the APIs would be separate - only some parts of
> > > > the code would be shared.
> > >
> > > The API below seems to be for vhost backends (aka slaves).
> rte_virtio_*
> > > is a misnomer because vhost and virtio are two different things.  Thi=
s
> > > is not for implementing virtio devices, it's specifically for vhost
> > > devices.
> >
> > I agree it's named a bit off if we're talking about vhost. My idea was =
to
> introduce a generic library for userspace Virtio processing and that's
> where the name came from. Even when you use just the vhost API that's
> introduced here, you are only required to implement vring processing,
> config access, and possibly interrupt handling, all of which are typical
> Virtio things. The vhost logic is hidden inside.
>=20
> No, the vhost logic is not hidden: there is custom_msg() and the whole
> tgt_ops struct is an abstraction of the vhost protocol, not virtio.
>=20
> It sounds like you're hoping to create a single API that can support
> both vhost and virtio access.  For example, one "net" device backend
> implementation using rte_virtio can be accessed via vhost or virtio.
>=20
> This won't work because vhost and virtio are not equivalent.  vhost-net
> devices don't implement the virtio-net config space and they only have a
> subset of the virtqueues.  vhost-net devices support special vhost
> messages that don't exist in virtio-net.
>=20
> Additionally, the virtio and vhost-user specifications are independent
> and make no promise of a 1:1 mapping.  They have the freedom to
> change
> in ways which will break any abstraction you come up with today.
>=20
> I hope it will be possible to unify the two in the future, but that
> needs to happen at the spec level first, before trying to unify them in
> code.
>=20
> This is why I'm belaboring the point that vhost should not be confused
> with virtio.  Each needs to be separate and clearly identified to avoid
> confusion.
> =09

Ok, I'm convinced now. Thanks for the explanation. I'll name the lib rte_vh=
ost2 in v2.


> >
> > >
> > > Vhost does not offer the full virtio device model - otherwise it woul=
d
> > > be just another transport in the VIRTIO specification.  Instead vhost=
 is
> > > a protocol for vhost devices, which are subsets of virtio devices.
> > >
> > > I suggest calling it rte_vhost2 since it's basically a new, incompati=
ble
> > > rte_vhost API.
> >
> > Rte_vhost2 sounds better for what we have now, but would that name
> be still valid once we add a true Virtio driver functionality? (I believe=
 it's
> called Virtio PMD in DPDK right now). That driver would reuse a lot of th=
e
> vhost code for PCI and vhost-user, so it makes some sense to put these
> two together.
> >
> > I don't think rte_vhost2 is a permanent name anyway, so maybe we
> could call it like so for now, and rename it later once I introduce that
> additional Virtio functionality? Would that work?
>=20
> The natural layering for is that vhost depends on virtio.  Virtio header
> files (feature bits, config space layout, vring layout) and the vring
> API can be reused by vhost.
>=20
> Virtio doesn't need knowledge of virtio though and the two can be in
> separate packages without code duplication.
>=20
> That said, it doesn't really matter whether there are rte_virtio +
> rte_vhost2 packages or a single rte_virtio package, as long as the
> function and struct names for vhost interfaces contain the name "vhost"
> so they cannot be confused with virtio.
>=20
> > > > + * or before actions that require changing vhost device/vq state.
> > > > + */
> > > > + void (*queue_stop)(struct rte_virtio_dev *vdev, struct
> rte_virtio_vq
> > > *vq);
> > > > + /** Device disconnected. All queues are guaranteed to be stopped
> by
> > > now */
> > > > + void (*device_destroy)(struct rte_virtio_dev *vdev);
> > > > + /**
> > > > + * Custom message handler. `vdev` and `vq` can be NULL. This is
> called
> > > > + * for backend-specific actions. The `id` should be prefixed by th=
e
> > >
> > > Since vdev can be NULL, does this mean custom_msg() may be invoked
> at
> > > any time during the lifecycle and even before/after
> > > device_create()/device_destroy()?
> >
> > Theoretically. I was thinking of some poorly-written backends notifying
> they're out of internal resources, but I agree it's just poor. I'll remov=
e the
> `vdev can be NULL` part.
>=20
> Okay, I wasn't suggesting it's bad, I just wanted the docs to state at
> which points in the lifecycle this callback can be invoked.
>=20
> > > > + */
> > > > + void (*custom_msg)(struct rte_virtio_dev *vdev, struct
> rte_virtio_vq
> > > *vq,
> > > > +   char *id, void *ctx);
> > >
> > > What is the purpose of id and why is it char* instead of const char*?
> >
> > Ack, It should be const. (same thing goes to every other char* in this
> patch)
> >
> > For example vhost-crypto introduces two new vhost-user messages for
> initializing and destroying crypto session. The underlying vhost-crypto
> vhost-user backend after receiving such message could execute this
> callback as follows:
> >
> > struct my_crypto_data *data =3D calloc();
> > [...]
> > Ops->custom_msg(vdev, NULL, "crypto_sess_init", data);
>=20
> So it's necessary to modify rte_virtio vhost code when implementing new
> device backends with custom messages?
>=20
> It seems like rte_virtio needs to have knowledge of how to parse any
> custom messages :(.  It would be cleaner for rte_virtio to have no
> knowledge of device-specific messages.
>=20
> And how does the device backend reply to custom messages?
>

The library would send proper response after receiving rte_virtio_tgt_cb_co=
mplete(). If it needs additional data from the user, there's the `ctx` fiel=
d in custom_msg that he [the user] can write into.

However, I started to work on the implementation and came to conclusion tha=
t it's unnecessarily difficult to implement new Vhost device backends this =
way. I've changed the custom_msg callback to parse raw Vhost-user messages =
now. Still, new Vhost-user messages are usually a type of a protocol extens=
ion negotiated by a protocol feature flag, and protocol extensions should b=
e implemented inside the lib in my opinion. If a protocol extension changes=
 existing message rather than introduces new one, we'll *need* to implement=
 it inside the lib.=20

Both solutions have their good and bad points.
I'm sending v2 in a couple minutes, maybe it'll help us decide which one is=
 better.

> > > > +/**
> > > > + * Unregisters a vhost target asynchronously.
> > >
> > > How are existing device instances affected?
> >
> > Ack. How about:
> > All active queues will be stopped and all devices destroyed.
> >
> > This is analogous to what rte_vhost has now.
>=20
> Sounds good.
>=20
> Stefan

Regards,
D.