From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mga05.intel.com (mga05.intel.com [192.55.52.43]) by dpdk.org (Postfix) with ESMTP id B00FC1BD95 for ; Fri, 11 May 2018 07:55:50 +0200 (CEST) X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga001.fm.intel.com ([10.253.24.23]) by fmsmga105.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 10 May 2018 22:55:48 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.49,387,1520924400"; d="scan'208";a="54255583" Received: from fmsmsx103.amr.corp.intel.com ([10.18.124.201]) by fmsmga001.fm.intel.com with ESMTP; 10 May 2018 22:55:48 -0700 Received: from fmsmsx114.amr.corp.intel.com (10.18.116.8) by FMSMSX103.amr.corp.intel.com (10.18.124.201) with Microsoft SMTP Server (TLS) id 14.3.319.2; Thu, 10 May 2018 22:55:48 -0700 Received: from HASMSX109.ger.corp.intel.com (10.184.198.21) by FMSMSX114.amr.corp.intel.com (10.18.116.8) with Microsoft SMTP Server (TLS) id 14.3.319.2; Thu, 10 May 2018 22:55:47 -0700 Received: from hasmsx105.ger.corp.intel.com ([169.254.1.197]) by hasmsx109.ger.corp.intel.com ([169.254.3.213]) with mapi id 14.03.0319.002; Fri, 11 May 2018 08:55:45 +0300 From: "Stojaczyk, DariuszX" To: Stefan Hajnoczi CC: "dev@dpdk.org" , Maxime Coquelin , "Bie, Tiwei" , "Tetsuya Mukawa" , Thomas Monjalon , "yliu@fridaylinux.org" , "Liu, Changpeng" Thread-Topic: [RFC] vhost: new rte_vhost API proposal Thread-Index: AQHT6EQfHS5LsMfYdU+ohMObYPEAS6Qo91WAgADPx1A= Date: Fri, 11 May 2018 05:55:45 +0000 Message-ID: References: <1525958573-184361-1-git-send-email-dariuszx.stojaczyk@intel.com> <20180510163643.GD9308@stefanha-x1.localdomain> In-Reply-To: <20180510163643.GD9308@stefanha-x1.localdomain> Accept-Language: en-US Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: dlp-product: dlpe-windows dlp-version: 11.0.200.100 dlp-reaction: no-action x-originating-ip: [10.104.57.19] Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 Subject: Re: [dpdk-dev] [RFC] vhost: new rte_vhost API proposal X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 11 May 2018 05:55:51 -0000 Hi, > -----Original Message----- > From: Stefan Hajnoczi [mailto:stefanha@redhat.com] > Sent: Friday, May 11, 2018 12:37 AM > On Thu, May 10, 2018 at 03:22:53PM +0200, Dariusz Stojaczyk wrote: > > rte_virtio would offer both vhost and virtio driver APIs. These two > > have a lot of common code for vhost-user handling or PCI access for > > initiator/virtio-vhost-user (and possibly vDPA) so there's little > > sense to keep target and initiator code separated between different > > libs. Of course, the APIs would be separate - only some parts of > > the code would be shared. >=20 > The API below seems to be for vhost backends (aka slaves). rte_virtio_* > is a misnomer because vhost and virtio are two different things. This > is not for implementing virtio devices, it's specifically for vhost > devices. I agree it's named a bit off if we're talking about vhost. My idea was to i= ntroduce a generic library for userspace Virtio processing and that's where= the name came from. Even when you use just the vhost API that's introduced= here, you are only required to implement vring processing, config access, = and possibly interrupt handling, all of which are typical Virtio things. Th= e vhost logic is hidden inside. >=20 > Vhost does not offer the full virtio device model - otherwise it would > be just another transport in the VIRTIO specification. Instead vhost is > a protocol for vhost devices, which are subsets of virtio devices. >=20 > I suggest calling it rte_vhost2 since it's basically a new, incompatible > rte_vhost API. Rte_vhost2 sounds better for what we have now, but would that name be still= valid once we add a true Virtio driver functionality? (I believe it's call= ed Virtio PMD in DPDK right now). That driver would reuse a lot of the vhos= t code for PCI and vhost-user, so it makes some sense to put these two toge= ther.=20 I don't think rte_vhost2 is a permanent name anyway, so maybe we could call= it like so for now, and rename it later once I introduce that additional V= irtio functionality? Would that work? >=20 > Also, the initiator/target terminology does not match the vhost-user > specification. It uses master/client and slave/backend/server. Adding > another pair of words makes things more confusing. Please stick to the > words used by the spec. Ack. >=20 > > +/** > > + * Device/queue related callbacks, all optional. Provided callback > > + * parameters are guaranteed not to be NULL until explicitly specified= . >=20 > s/until/unless/ ? Ack. > > + /** > > + * Stop processing vq. It shouldn't be accessed after this callback > > + * completes (via tgt_cb_complete). This can be called prior to > shutdown >=20 > s/tgt_cb_complete/rte_virtio_tgt_cb_complete/ Ack. >=20 > > + * or before actions that require changing vhost device/vq state. > > + */ > > + void (*queue_stop)(struct rte_virtio_dev *vdev, struct rte_virtio_vq > *vq); > > + /** Device disconnected. All queues are guaranteed to be stopped by > now */ > > + void (*device_destroy)(struct rte_virtio_dev *vdev); > > + /** > > + * Custom message handler. `vdev` and `vq` can be NULL. This is called > > + * for backend-specific actions. The `id` should be prefixed by the >=20 > Since vdev can be NULL, does this mean custom_msg() may be invoked at > any time during the lifecycle and even before/after > device_create()/device_destroy()? Theoretically. I was thinking of some poorly-written backends notifying the= y're out of internal resources, but I agree it's just poor. I'll remove the= `vdev can be NULL` part. > > + */ > > + void (*custom_msg)(struct rte_virtio_dev *vdev, struct rte_virtio_vq > *vq, > > + char *id, void *ctx); >=20 > What is the purpose of id and why is it char* instead of const char*? Ack, It should be const. (same thing goes to every other char* in this patc= h) For example vhost-crypto introduces two new vhost-user messages for initial= izing and destroying crypto session. The underlying vhost-crypto vhost-user= backend after receiving such message could execute this callback as follow= s: struct my_crypto_data *data =3D calloc(); [...] Ops->custom_msg(vdev, NULL, "crypto_sess_init", data); >=20 > Is ctx the "message"? If ctx is untrusted message data from an external > process, why is there no size argument? Who validates the message size? Ack. Will add size parameter. >=20 > > + > > + /** > > + * Interrupt handler, synchronous. If this callback is set to NULL, > > + * rte_virtio will hint the initiators not to send any interrupts. > > + */ > > + void (*queue_kick)(struct rte_virtio_dev *vdev, struct rte_virtio_vq > *vq); >=20 > Devices often have multiple types of queues. Some of them may be > suitable for polling, others may be suitable for an interrupt-driven > model. Is there a way to enable/disable interrupts for specific queues? Thanks, I didn't think of that. I'll need to move the responsibility of set= ting vring.avail->flags & VRING_AVAIL_F_NO_INTERRUPT to the user. >=20 > > + /** Device config read, synchronous. */ >=20 > What is the meaning of the return value? How about the following: \return 0 if `config` has been successfully set, -1 otherwise. An error ( -1 ) is propagated all the way to the master so he can handle it= his way. >=20 > > + int (*get_config)(struct rte_virtio_dev *vdev, uint8_t *config, > > + uint32_t config_len); > > + /** Device config changed by the driver, synchronous. */ >=20 > What is the meaning of the return value? >=20 > What is the meaning of the flags? Good call. I actually can't find any doc/usage of this API. Changpeng (the original get/set_config author, +CC'ed), could you document = this function briefly here? >=20 > > + int (*set_config)(struct rte_virtio_dev *vdev, uint8_t *config, > > + uint32_t offset, uint32_t len, uint32_t flags); > > +}; > > + > > +/** > > + * Registers a new vhost target accepting remote connections. Multiple > > + * available transports are available. It is possible to create a Vhos= t- > user > > + * Unix domain socket polling local connections or connect to a > physical > > + * Virtio device and install an interrupt handler . > > + * \param trtype type of the transport used, e.g. "PCI", "PCI-vhost- > user", > > + * "PCI-vDPA", "vhost-user". > > + * \param trid identifier of the device. For PCI this would be the BDF > address, > > + * for vhost-user the socket name. > > + * \param trctx additional data for the specified transport. Can be > NULL. > > + * \param tgt_ops callbacks to be called upon reaching specific > initialization > > + * states. > > + * \param features supported Virtio features. To be negotiated with > the > > + * driver ones. rte_virtio will append a couple of generic feature bit= s > > + * which are required by the Virtio spec. TODO list these features her= e > > + * \return 0 on success, negative errno otherwise > > + */ > > +int rte_virtio_tgt_register(char *trtype, char *trid, void *trctx, > > + struct rte_virtio_tgt_ops *tgt_ops, > > + uint64_t features); > > + > > +/** > > + * Finish async device tgt ops callback. Unless a tgt op has been > documented > > + * as 'synchronous' this function must be called at the end of the op > handler. > > + * It can be called either before or after the op handler returns. > rte_virtio > > + * won't call any callbacks while another one hasn't been finished yet= . > > + * \param vdev vhost device > > + * \param rc 0 on success, negative errno otherwise. > > + */ > > +int rte_virtio_tgt_cb_complete(struct rte_virtio_dev *vdev, int rc); >=20 > How can this function fail and how is the caller supposed to handle > failure? If -1 is returned, the current callback will be perceived as failed. So if = ` device_create` is completed with rc !=3D 0, this lib will teardown the de= vice and no subsequent `device_destroy` will be called. Similar thing goes = with queues - if a queue failed to start, it won't need to be stopped. Sinc= e you pointed it out - I'll mention it somewhere in the doc. I didn't do so= in the first place because it's analogous to how rte_vhost works now. >=20 > Are there any thread-safety rules regarding this API? Can it be called > from a different thread than the callback? Yes. I should mention it. >=20 > > + > > +/** > > + * Unregisters a vhost target asynchronously. >=20 > How are existing device instances affected? Ack. How about: All active queues will be stopped and all devices destroyed. This is analogous to what rte_vhost has now. >=20 > > + * \param cb_fn callback to be called on finish > > + * \param cb_arg argument for \c cb_fn > > + */ > > +void rte_virtio_tgt_unregister(char *trid, >=20 > One of the rte_vhost API limitations is that the ID namespace is shared > between transports. The same seems to be the case here. >=20 > It assumes that "PCI", "PCI-vhost-user", "PCI-vDPA", and "vhost-user" > are never instantiated with the same trid. UNIX domain sockets can have > arbitrary filenames (that resemble a PCI BDF). And who knows what > other > transports will be added in the future. >=20 > I think namespace collisions could be a problem. Ack, I'll add `char *trtype` param to the unregister func. >=20 > > + void (*cb_fn)(void *arg), void *cb_arg); >=20 > Which thread is the callback invoked from? It'll be called from the same thread that calls rte_virtio_tgt_ops. I'll me= ntion it in the doc, thanks. rte_virtio_tgt_unregister should also return an error code for cases where = a device with given trtype/trid couldn't be found. It'll prevent some imple= mentations from waiting endlessly for cb_fn to be called. Thanks, D.