From mboxrd@z Thu Jan 1 00:00:00 1970
From: Dariusz Stojaczyk
To: dev@dpdk.org, Maxime Coquelin, Tiwei Bie, Tetsuya Mukawa, Thomas Monjalon
Cc: yliu@fridaylinux.org, Stefan Hajnoczi, Dariusz Stojaczyk
Date: Thu, 10 May 2018 15:22:53 +0200
Message-Id: <1525958573-184361-1-git-send-email-dariuszx.stojaczyk@intel.com>
X-Mailer: git-send-email 2.7.4
Subject: [dpdk-dev] [RFC] vhost: new rte_vhost API proposal
List-Id: DPDK patches and discussions

rte_vhost has been confirmed not to work with some Virtio devices
(it is not vhost-user spec compliant, see details below). Fixing it
directly would require a large number of changes that would completely
break backwards compatibility.

This patch proposes a new library, rte_virtio, intended to smooth out
the transition. It exposes a low-level API for implementing new Virtio
drivers/targets. The existing rte_vhost is about to be refactored to
use the rte_virtio library underneath, and demanding drivers could use
rte_virtio directly.

rte_virtio would offer both vhost and virtio driver APIs. These two
share a lot of code for vhost-user handling or PCI access for
initiator/virtio-vhost-user (and possibly vDPA), so there is little
sense in keeping target and initiator code in separate libraries.
The APIs would still be separate; only some parts of the code would
be shared.

rte_virtio intends to abstract away most vhost-user/virtio-vhost-user
specifics and to let developers implement Virtio targets/drivers with
ease. It calls user-provided callbacks once a proper device
initialization state has been reached, that is: memory mappings have
changed, virtqueues are ready to be processed, features have changed
at runtime, etc.

Compared to rte_vhost, this library additionally offers the following:

* Ability to start/stop particular queues. This is required by the
  vhost-user spec. rte_vhost has already been confirmed not to work
  with some Virtio devices which do not initialize some of their
  management queues.
* Most callbacks are now asynchronous. This greatly simplifies event
  handling for asynchronous applications and does not make anything
  harder for synchronous ones.
* This is a low-level API. It does not have any vhost-net, nvme or
  crypto references. These backend-specific libraries will later be
  refactored to use *this* generic library underneath. This implies
  that the library does not do any virtqueue processing. It only
  delivers vring addresses to the user, who then processes the
  virtqueues on their own.
* Abstracting away PCI/vhost-user.
* The API restricts how public functions can be called and how
  internal data can change, so only minimal work is required to
  ensure thread-safety. Possibly no mutexes are required at all.
* Full Virtio 1.0/vhost-user specification compliance.

This patch only introduces the API. Some additional functions for vDPA
might still be required, but everything present here so far should not
need changing.

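To illustrate the asynchronous callback model described above, here is a
minimal, self-contained sketch. It does not use the real library: the
sketch_* and app_* names are hypothetical stand-ins, and the -EBUSY return
is only an assumption used to make the "no callback is issued while another
one is unfinished" rule visible. sketch_cb_complete() plays the role that
rte_virtio_tgt_cb_complete() has in the proposed header.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for a device; not the real rte_virtio_dev. */
struct sketch_dev {
	int cb_pending;    /* 1 while an async op awaits completion */
	int last_rc;       /* rc passed to the last completion */
	int queue_running; /* application-side queue state */
};

/* Plays the role of rte_virtio_tgt_cb_complete(): finish the pending op. */
static int
sketch_cb_complete(struct sketch_dev *dev, int rc)
{
	if (!dev->cb_pending)
		return -EINVAL; /* nothing to complete */
	dev->cb_pending = 0;
	dev->last_rc = rc;
	return 0;
}

/* Library side (assumed behavior): never issue a callback while busy. */
static int
sketch_dispatch(struct sketch_dev *dev, void (*op)(struct sketch_dev *))
{
	if (dev->cb_pending)
		return -EBUSY;
	if (op == NULL)
		return 0; /* callbacks are optional */
	dev->cb_pending = 1;
	op(dev);
	return 0;
}

/* Synchronous-style handler: completes before it returns. */
static void
app_queue_start(struct sketch_dev *dev)
{
	dev->queue_running = 1;
	sketch_cb_complete(dev, 0);
}

/* Asynchronous-style handler: returns first, completes later. */
static void
app_queue_stop(struct sketch_dev *dev)
{
	dev->queue_running = 0; /* e.g. in-flight I/O is still draining */
}

/* Walks through start, deferred stop, and restart. Returns queue state. */
static int
sketch_demo(void)
{
	struct sketch_dev dev = {0, 0, 0};
	int rc_start, rc_stop, rc_busy, rc_done, rc_restart;

	rc_start = sketch_dispatch(&dev, app_queue_start);
	rc_stop = sketch_dispatch(&dev, app_queue_stop);
	/* queue_stop has not completed yet, so this must be refused */
	rc_busy = sketch_dispatch(&dev, app_queue_start);
	/* the application finishes draining and completes the op */
	rc_done = sketch_cb_complete(&dev, 0);
	rc_restart = sketch_dispatch(&dev, app_queue_start);

	assert(rc_start == 0 && rc_stop == 0);
	assert(rc_busy == -EBUSY);
	assert(rc_done == 0 && rc_restart == 0);
	return dev.queue_running;
}
```

Calling sketch_demo() from any test harness exercises both handler styles.
A synchronous application simply completes inside the handler, which is why
the asynchronous model should not make its life harder.
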
Signed-off-by: Dariusz Stojaczyk
---
 lib/librte_virtio/rte_virtio.h | 245 +++++++++++++++++++++++++++++++++++++++++
 1 file changed, 245 insertions(+)
 create mode 100644 lib/librte_virtio/rte_virtio.h

diff --git a/lib/librte_virtio/rte_virtio.h b/lib/librte_virtio/rte_virtio.h
new file mode 100644
index 0000000..0203d5e
--- /dev/null
+++ b/lib/librte_virtio/rte_virtio.h
@@ -0,0 +1,245 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright (c) Intel Corporation.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include
+
+/** Single memory region. Both physically and virtually contiguous */
+struct rte_virtio_mem_region {
+	uint64_t guest_phys_addr;
+	uint64_t guest_user_addr;
+	uint64_t host_user_addr;
+	uint64_t size;
+	void *mmap_addr;
+	uint64_t mmap_size;
+	int fd;
+};
+
+struct rte_virtio_memory {
+	uint32_t nregions;
+	struct rte_virtio_mem_region regions[];
+};
+
+/**
+ * Vhost device created and managed by rte_virtio. Accessible via
+ * \c rte_virtio_tgt_ops callbacks. This is only a part of the real
+ * vhost device data. This struct is published just for inline vdev
+ * functions to access their data directly.
+ */
+struct rte_virtio_dev {
+	struct rte_virtio_memory *mem;
+	uint64_t features;
+};
+
+/**
+ * Virtqueue created and managed by rte_virtio. Accessible via
+ * \c rte_virtio_tgt_ops callbacks.
+ */
+struct rte_virtio_vq {
+	struct vring_desc *desc;
+	struct vring_avail *avail;
+	struct vring_used *used;
+	/* available only if F_LOG_ALL has been negotiated */
+	void *log;
+	uint16_t size;
+};
+
+/**
+ * Device/queue related callbacks, all optional. Provided callback
+ * parameters are guaranteed not to be NULL unless explicitly specified.
+ */
+struct rte_virtio_tgt_ops {
+	/** New initiator connected. */
+	void (*device_create)(struct rte_virtio_dev *vdev);
+	/**
+	 * Device is ready to operate. vdev->mem is now available.
+	 * This callback may be called multiple times as memory mappings
+	 * can change dynamically. All queues are guaranteed to be stopped
+	 * by now.
+	 */
+	void (*device_init)(struct rte_virtio_dev *vdev);
+	/**
+	 * Features have changed at runtime. Queues might still be running
+	 * at this point.
+	 */
+	void (*device_features_changed)(struct rte_virtio_dev *vdev);
+	/**
+	 * Start processing vq. The `vq` is guaranteed not to be modified before
+	 * `queue_stop` is called.
+	 */
+	void (*queue_start)(struct rte_virtio_dev *vdev, struct rte_virtio_vq *vq);
+	/**
+	 * Stop processing vq. It shouldn't be accessed after this callback
+	 * completes (via tgt_cb_complete). This can be called prior to shutdown
+	 * or before actions that require changing vhost device/vq state.
+	 */
+	void (*queue_stop)(struct rte_virtio_dev *vdev, struct rte_virtio_vq *vq);
+	/** Device disconnected. All queues are guaranteed to be stopped by now */
+	void (*device_destroy)(struct rte_virtio_dev *vdev);
+	/**
+	 * Custom message handler. `vdev` and `vq` can be NULL. This is called
+	 * for backend-specific actions. The `id` should be prefixed by the
+	 * backend name (net/crypto/scsi) and `ctx` is message-specific data
+	 * that should be available until tgt_cb_complete is called.
+	 */
+	void (*custom_msg)(struct rte_virtio_dev *vdev, struct rte_virtio_vq *vq,
+			   char *id, void *ctx);
+
+	/**
+	 * Interrupt handler, synchronous. If this callback is set to NULL,
+	 * rte_virtio will hint the initiators not to send any interrupts.
+	 */
+	void (*queue_kick)(struct rte_virtio_dev *vdev, struct rte_virtio_vq *vq);
+	/** Device config read, synchronous. */
+	int (*get_config)(struct rte_virtio_dev *vdev, uint8_t *config,
+			  uint32_t config_len);
+	/** Device config changed by the driver, synchronous. */
+	int (*set_config)(struct rte_virtio_dev *vdev, uint8_t *config,
+			  uint32_t offset, uint32_t len, uint32_t flags);
+};
+
+/**
+ * Registers a new vhost target accepting remote connections. Multiple
+ * transports are available. It is possible to create a Vhost-user
+ * Unix domain socket polling local connections or connect to a physical
+ * Virtio device and install an interrupt handler.
+ * \param trtype type of the transport used, e.g. "PCI", "PCI-vhost-user",
+ * "PCI-vDPA", "vhost-user".
+ * \param trid identifier of the device. For PCI this would be the BDF address,
+ * for vhost-user the socket name.
+ * \param trctx additional data for the specified transport. Can be NULL.
+ * \param tgt_ops callbacks to be called upon reaching specific initialization
+ * states.
+ * \param features supported Virtio features. To be negotiated with the
+ * driver ones. rte_virtio will append a couple of generic feature bits
+ * which are required by the Virtio spec. TODO list these features here
+ * \return 0 on success, negative errno otherwise
+ */
+int rte_virtio_tgt_register(char *trtype, char *trid, void *trctx,
+			    struct rte_virtio_tgt_ops *tgt_ops,
+			    uint64_t features);
+
+/**
+ * Finish async device tgt ops callback. Unless a tgt op has been documented
+ * as 'synchronous' this function must be called at the end of the op handler.
+ * It can be called either before or after the op handler returns. rte_virtio
+ * won't call any callbacks while another one hasn't been finished yet.
+ * \param vdev vhost device
+ * \param rc 0 on success, negative errno otherwise.
+ */
+int rte_virtio_tgt_cb_complete(struct rte_virtio_dev *vdev, int rc);
+
+/**
+ * Unregisters a vhost target asynchronously.
+ * \param cb_fn callback to be called on finish
+ * \param cb_arg argument for \c cb_fn
+ */
+void rte_virtio_tgt_unregister(char *trid,
+			       void (*cb_fn)(void *arg), void *cb_arg);
+
+/**
+ * Bypass F_IOMMU_PLATFORM and translate gpa directly.
+ * \param mem vhost device memory
+ * \param gpa guest physical address
+ * \param len length of the memory to translate (in bytes). If requested
+ * memory chunk crosses memory region boundary, the *len will be set to
+ * the remaining, maximum length of virtually contiguous memory. In such
+ * case the user will be required to call another gpa_to_vva(gpa + *len).
+ * \return vhost virtual address or NULL if requested `gpa` is not mapped.
+ */
+static inline void *
+rte_virtio_gpa_to_vva(struct rte_virtio_memory *mem, uint64_t gpa, uint64_t *len)
+{
+	struct rte_virtio_mem_region *r;
+	uint32_t i;
+
+	for (i = 0; i < mem->nregions; i++) {
+		r = &mem->regions[i];
+		if (gpa >= r->guest_phys_addr &&
+		    gpa < r->guest_phys_addr + r->size) {
+
+			if (unlikely(*len > r->guest_phys_addr + r->size - gpa)) {
+				*len = r->guest_phys_addr + r->size - gpa;
+			}
+
+			return (void *)(uintptr_t)(gpa - r->guest_phys_addr +
+						   r->host_user_addr);
+		}
+	}
+	*len = 0;
+
+	return NULL;
+}
+
+/**
+ * Translate I/O virtual address to vhost address space.
+ * If F_IOMMU_PLATFORM has been negotiated, this might potentially
+ * send a TLB miss and wait for the TLB update response.
+ * If F_IOMMU_PLATFORM has not been negotiated, `iova` is
+ * a physical address and `perm` is ignored.
+ * \param vdev vhost device
+ * \param iova I/O virtual address
+ * \param len length of the memory to translate (in bytes). If requested
+ * memory chunk crosses memory region boundary, the *len will be set to
+ * the remaining, maximum length of virtually contiguous memory. In such
+ * case the user will be required to call another iova_to_vva(iova + *len).
+ * \param perm VHOST_ACCESS_RO, VHOST_ACCESS_WO or VHOST_ACCESS_RW
+ * \return vhost virtual address or NULL if requested `iova` is not mapped
+ * or the `perm` doesn't match.
+ */
+static inline void *
+rte_virtio_iova_to_vva(struct rte_virtio_dev *vdev, struct rte_virtio_vq *vq,
+		       uint64_t iova, uint32_t *len, uint8_t perm)
+{
+	void *__vhost_iova_to_vva(struct virtio_net * dev, struct vhost_virtqueue * vq,
+				  uint64_t iova, uint64_t size, uint8_t perm);
+
+	if (!(vdev->features & (1ULL << VIRTIO_F_IOMMU_PLATFORM))) {
+		return rte_virtio_gpa_to_vva(vdev->mem, iova, len);
+	}
+
+	return __vhost_iova_to_vva(vdev, vq, iova, len, perm);
+}
+
+/**
+ * Notify the driver about vq change. This is an eventfd_write for vhost-user
+ * or MMIO write for PCI devices.
+ */
+void rte_virtio_dev_call(struct rte_virtio_dev *vdev, struct rte_virtio_vq *vq);
+
+/**
+ * Notify the driver about device config change. This will result in \c
+ * rte_virtio_tgt_ops->get_config being called. This is an eventfd_write
+ * for vhost-user or MMIO write for PCI devices.
+ */
+void rte_virtio_dev_cfg_call(struct rte_virtio_dev *vdev, struct rte_virtio_vq *vq);
+
-- 
2.7.4
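
The `len` contract of rte_virtio_gpa_to_vva() (a request that crosses a
memory region boundary is truncated, and the caller has to translate the
remainder again) may be easier to follow with a small standalone sketch.
The sketch_* types and functions below are hypothetical simplifications
written for this note, not part of the proposed header. The translation
helper follows the same logic as the inline function in the patch, and the
copy loop is just one possible way a user could consume it.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, trimmed-down versions of the region/memory structs. */
struct sketch_region {
	uint64_t guest_phys_addr;
	uint64_t host_user_addr;
	uint64_t size;
};

struct sketch_memory {
	uint32_t nregions;
	struct sketch_region regions[2];
};

/* Same logic as rte_virtio_gpa_to_vva(): clamp *len to the region end. */
static void *
sketch_gpa_to_vva(const struct sketch_memory *mem, uint64_t gpa, uint64_t *len)
{
	uint32_t i;

	for (i = 0; i < mem->nregions; i++) {
		const struct sketch_region *r = &mem->regions[i];

		if (gpa >= r->guest_phys_addr &&
		    gpa < r->guest_phys_addr + r->size) {
			uint64_t left = r->guest_phys_addr + r->size - gpa;

			if (*len > left)
				*len = left;
			return (void *)(uintptr_t)(gpa - r->guest_phys_addr +
						   r->host_user_addr);
		}
	}
	*len = 0;
	return NULL;
}

/* Copy `len` guest bytes, re-translating at every region boundary. */
static int
sketch_copy_from_guest(const struct sketch_memory *mem, uint64_t gpa,
		       void *dst, uint64_t len)
{
	uint8_t *out = dst;

	while (len > 0) {
		uint64_t chunk = len;
		void *vva = sketch_gpa_to_vva(mem, gpa, &chunk);

		if (vva == NULL)
			return -1; /* gpa is not mapped */
		memcpy(out, vva, chunk);
		out += chunk;
		gpa += chunk;
		len -= chunk;
	}
	return 0;
}

/* Two regions adjacent in guest space but backed by separate buffers. */
static int
sketch_demo_copy(void)
{
	static uint8_t host_a[4] = { 'A', 'B', 'C', 'D' };
	static uint8_t host_b[4] = { 'E', 'F', 'G', 'H' };
	struct sketch_memory mem = {
		2,
		{
			{ 0x1000, (uint64_t)(uintptr_t)host_a, 4 },
			{ 0x1004, (uint64_t)(uintptr_t)host_b, 4 },
		},
	};
	uint8_t out[6];

	/* 6 bytes starting at 0x1002 span both regions */
	if (sketch_copy_from_guest(&mem, 0x1002, out, sizeof(out)) != 0)
		return -1;
	if (memcmp(out, "CDEFGH", sizeof(out)) != 0)
		return -1;
	/* a range running past the last region must fail */
	if (sketch_copy_from_guest(&mem, 0x1006, out, sizeof(out)) == 0)
		return -1;
	return 0;
}
```

sketch_demo_copy() returns 0 when the cross-boundary copy yields the
expected bytes and the out-of-range request is rejected. A real target
would more likely process descriptors in place than copy them, so the
memcpy here only serves to make the chunking visible.
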