From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mga18.intel.com (mga18.intel.com [134.134.136.126]) by dpdk.org (Postfix) with ESMTP id CA3508D9D for ; Fri, 18 May 2018 09:51:56 +0200 (CEST) X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga008.fm.intel.com ([10.253.24.58]) by orsmga106.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 18 May 2018 00:51:55 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.49,413,1520924400"; d="scan'208";a="40937131" Received: from fmsmsx104.amr.corp.intel.com ([10.18.124.202]) by fmsmga008.fm.intel.com with ESMTP; 18 May 2018 00:51:55 -0700 Received: from fmsmsx102.amr.corp.intel.com (10.18.124.200) by fmsmsx104.amr.corp.intel.com (10.18.124.202) with Microsoft SMTP Server (TLS) id 14.3.319.2; Fri, 18 May 2018 00:51:54 -0700 Received: from lcsmsx153.ger.corp.intel.com (10.186.165.228) by FMSMSX102.amr.corp.intel.com (10.18.124.200) with Microsoft SMTP Server (TLS) id 14.3.319.2; Fri, 18 May 2018 00:51:54 -0700 Received: from hasmsx105.ger.corp.intel.com ([169.254.1.197]) by LCSMSX153.ger.corp.intel.com ([169.254.8.118]) with mapi id 14.03.0319.002; Fri, 18 May 2018 10:51:52 +0300 From: "Stojaczyk, DariuszX" To: Stefan Hajnoczi CC: "dev@dpdk.org" , Maxime Coquelin , "Bie, Tiwei" , "Tetsuya Mukawa" , Thomas Monjalon , "yliu@fridaylinux.org" , "Liu, Changpeng" Thread-Topic: [RFC] vhost: new rte_vhost API proposal Thread-Index: AQHT6EQfHS5LsMfYdU+ohMObYPEAS6Qo91WAgADPx1CAAFVBgIAK/4Vg Date: Fri, 18 May 2018 07:51:51 +0000 Message-ID: References: <1525958573-184361-1-git-send-email-dariuszx.stojaczyk@intel.com> <20180510163643.GD9308@stefanha-x1.localdomain> <20180511100531.GA19894@stefanha-x1.localdomain> In-Reply-To: <20180511100531.GA19894@stefanha-x1.localdomain> Accept-Language: en-US Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: dlp-product: dlpe-windows dlp-version: 11.0.200.100 dlp-reaction: no-action x-originating-ip: [10.104.19.213] Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 Subject: Re: [dpdk-dev] [RFC] vhost: new rte_vhost API proposal X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 18 May 2018 07:51:57 -0000 > -----Original Message----- > From: Stefan Hajnoczi [mailto:stefanha@redhat.com] > Sent: Friday, May 11, 2018 6:06 PM > On Fri, May 11, 2018 at 05:55:45AM +0000, Stojaczyk, DariuszX wrote: > > > -----Original Message----- > > > From: Stefan Hajnoczi [mailto:stefanha@redhat.com] > > > Sent: Friday, May 11, 2018 12:37 AM > > > On Thu, May 10, 2018 at 03:22:53PM +0200, Dariusz Stojaczyk wrote: > > > > rte_virtio would offer both vhost and virtio driver APIs. These two > > > > have a lot of common code for vhost-user handling or PCI access for > > > > initiator/virtio-vhost-user (and possibly vDPA) so there's little > > > > sense to keep target and initiator code separated between different > > > > libs. Of course, the APIs would be separate - only some parts of > > > > the code would be shared. > > > > > > The API below seems to be for vhost backends (aka slaves). > rte_virtio_* > > > is a misnomer because vhost and virtio are two different things. Thi= s > > > is not for implementing virtio devices, it's specifically for vhost > > > devices. > > > > I agree it's named a bit off if we're talking about vhost. My idea was = to > introduce a generic library for userspace Virtio processing and that's > where the name came from. Even when you use just the vhost API that's > introduced here, you are only required to implement vring processing, > config access, and possibly interrupt handling, all of which are typical > Virtio things. The vhost logic is hidden inside. >=20 > No, the vhost logic is not hidden: there is custom_msg() and the whole > tgt_ops struct is an abstraction of the vhost protocol, not virtio. >=20 > It sounds like you're hoping to create a single API that can support > both vhost and virtio access. For example, one "net" device backend > implementation using rte_virtio can be accessed via vhost or virtio. >=20 > This won't work because vhost and virtio are not equivalent. vhost-net > devices don't implement the virtio-net config space and they only have a > subset of the virtqueues. vhost-net devices support special vhost > messages that don't exist in virtio-net. >=20 > Additionally, the virtio and vhost-user specifications are independent > and make no promise of a 1:1 mapping. They have the freedom to > change > in ways which will break any abstraction you come up with today. >=20 > I hope it will be possible to unify the two in the future, but that > needs to happen at the spec level first, before trying to unify them in > code. >=20 > This is why I'm belaboring the point that vhost should not be confused > with virtio. Each needs to be separate and clearly identified to avoid > confusion. > =09 Ok, I'm convinced now. Thanks for the explanation. I'll name the lib rte_vh= ost2 in v2. > > > > > > > > Vhost does not offer the full virtio device model - otherwise it woul= d > > > be just another transport in the VIRTIO specification. Instead vhost= is > > > a protocol for vhost devices, which are subsets of virtio devices. > > > > > > I suggest calling it rte_vhost2 since it's basically a new, incompati= ble > > > rte_vhost API. > > > > Rte_vhost2 sounds better for what we have now, but would that name > be still valid once we add a true Virtio driver functionality? (I believe= it's > called Virtio PMD in DPDK right now). That driver would reuse a lot of th= e > vhost code for PCI and vhost-user, so it makes some sense to put these > two together. > > > > I don't think rte_vhost2 is a permanent name anyway, so maybe we > could call it like so for now, and rename it later once I introduce that > additional Virtio functionality? Would that work? >=20 > The natural layering for is that vhost depends on virtio. Virtio header > files (feature bits, config space layout, vring layout) and the vring > API can be reused by vhost. >=20 > Virtio doesn't need knowledge of virtio though and the two can be in > separate packages without code duplication. >=20 > That said, it doesn't really matter whether there are rte_virtio + > rte_vhost2 packages or a single rte_virtio package, as long as the > function and struct names for vhost interfaces contain the name "vhost" > so they cannot be confused with virtio. >=20 > > > > + * or before actions that require changing vhost device/vq state. > > > > + */ > > > > + void (*queue_stop)(struct rte_virtio_dev *vdev, struct > rte_virtio_vq > > > *vq); > > > > + /** Device disconnected. All queues are guaranteed to be stopped > by > > > now */ > > > > + void (*device_destroy)(struct rte_virtio_dev *vdev); > > > > + /** > > > > + * Custom message handler. `vdev` and `vq` can be NULL. This is > called > > > > + * for backend-specific actions. The `id` should be prefixed by th= e > > > > > > Since vdev can be NULL, does this mean custom_msg() may be invoked > at > > > any time during the lifecycle and even before/after > > > device_create()/device_destroy()? > > > > Theoretically. I was thinking of some poorly-written backends notifying > they're out of internal resources, but I agree it's just poor. I'll remov= e the > `vdev can be NULL` part. >=20 > Okay, I wasn't suggesting it's bad, I just wanted the docs to state at > which points in the lifecycle this callback can be invoked. >=20 > > > > + */ > > > > + void (*custom_msg)(struct rte_virtio_dev *vdev, struct > rte_virtio_vq > > > *vq, > > > > + char *id, void *ctx); > > > > > > What is the purpose of id and why is it char* instead of const char*? > > > > Ack, It should be const. (same thing goes to every other char* in this > patch) > > > > For example vhost-crypto introduces two new vhost-user messages for > initializing and destroying crypto session. The underlying vhost-crypto > vhost-user backend after receiving such message could execute this > callback as follows: > > > > struct my_crypto_data *data =3D calloc(); > > [...] > > Ops->custom_msg(vdev, NULL, "crypto_sess_init", data); >=20 > So it's necessary to modify rte_virtio vhost code when implementing new > device backends with custom messages? >=20 > It seems like rte_virtio needs to have knowledge of how to parse any > custom messages :(. It would be cleaner for rte_virtio to have no > knowledge of device-specific messages. >=20 > And how does the device backend reply to custom messages? > The library would send proper response after receiving rte_virtio_tgt_cb_co= mplete(). If it needs additional data from the user, there's the `ctx` fiel= d in custom_msg that he [the user] can write into. However, I started to work on the implementation and came to conclusion tha= t it's unnecessarily difficult to implement new Vhost device backends this = way. I've changed the custom_msg callback to parse raw Vhost-user messages = now. Still, new Vhost-user messages are usually a type of a protocol extens= ion negotiated by a protocol feature flag, and protocol extensions should b= e implemented inside the lib in my opinion. If a protocol extension changes= existing message rather than introduces new one, we'll *need* to implement= it inside the lib.=20 Both solutions have their good and bad points. I'm sending v2 in a couple minutes, maybe it'll help us decide which one is= better. > > > > +/** > > > > + * Unregisters a vhost target asynchronously. > > > > > > How are existing device instances affected? > > > > Ack. How about: > > All active queues will be stopped and all devices destroyed. > > > > This is analogous to what rte_vhost has now. >=20 > Sounds good. >=20 > Stefan Regards, D.