From: Yuanhan Liu
To: dev@dpdk.org
Cc: Maxime Coquelin, Harris James R, Liu Changpeng, Yuanhan Liu
Date: Fri, 3 Mar 2017 17:51:05 +0800
Message-Id: <1488534682-3494-1-git-send-email-yuanhan.liu@linux.intel.com>
Subject: [dpdk-dev] [PATCH 00/17] vhost: generic vhost API

This is a first attempt to make the DPDK vhost library generic enough
that users can build their own vhost-user drivers on top of it. For
example, SPDK (Storage Performance Development Kit) is trying to enable
vhost-user SCSI.

The basic idea is to let DPDK vhost act as a vhost-user agent. It stores
all the info about the virtio device (e.g. vring addresses, negotiated
features) and lets the specific vhost-user driver fetch it through the
API provided by the DPDK vhost lib. With that info, the vhost-user
driver can get/put vring entries and thus exchange data between the
guest and the host.

The last patch demonstrates how to use these new APIs to implement a
very simple vhost-user net driver, without any fancy features enabled.

API/ABI Changes summary
=======================

- some renames

  * "struct virtio_net_device_ops" ==> "struct vhost_device_ops"
  * "rte_virtio_net.h"             ==> "rte_vhost.h"

- driver related APIs are bound to the socket file

  * rte_vhost_driver_set_features(socket_file, features);
  * rte_vhost_driver_get_features(socket_file, features);
  * rte_vhost_driver_enable_features(socket_file, features)
  * rte_vhost_driver_disable_features(socket_file, features)
  * rte_vhost_driver_callback_register(socket_file, notify_ops);

- new APIs to fetch guest and vring info

  * rte_vhost_get_vhost_memory(int vid, struct rte_vhost_memory **mem);
  * rte_vhost_get_negotiated_features(int vid);
  * rte_vhost_get_vhost_vring(int vid, uint16_t vring_idx,
                              struct rte_vhost_vring *vring);

- new exported structures

  * struct rte_vhost_vring
  * struct rte_vhost_mem_region
  * struct rte_vhost_memory


Some design choices
===================

While making this patchset, I ran into quite a few design choices. Here
are two of them, with the issue and the reason for the choice I made.
Please let me know if you have any comments (or better ideas).

Export public structures or not
-------------------------------

I did an ABI refactor last time (v16.07): all the structures were moved
inside the library, and applications use a "vid" to reference the
internal struct. With that, I hoped we would never have to worry about
the annoying ABI issues again.

It has worked well (and as expected) since then, as long as we only
support virtio-net and can handle all the descs inside the vhost lib. It
becomes problematic when a user wants to implement a vhost-user driver
somewhere else. For example, such a driver needs to do the GPA (guest
physical address) to VVA (vhost virtual address) translation. Without
any structs exported, functions like gpa_to_vva() can't be inlined.
Calling it would be costly, especially since it has to be invoked for
every vring desc processed.

For that reason, the guest memory regions are exported. With that,
gpa_to_vva() can be inlined.

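To illustrate why exporting the memory regions matters, here is a minimal sketch of an inlinable GPA to VVA lookup. The struct and field names (mem_region, guest_memory, guest_phys_addr, host_user_addr) and the fixed-size region array are hypothetical stand-ins chosen to keep the example self-contained; they are not the definitions from the patches.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical, simplified stand-ins for the exported structures.
 * The real layout is whatever the patches in this series define.
 */
struct mem_region {
	uint64_t guest_phys_addr; /* region start in guest physical space */
	uint64_t host_user_addr;  /* where the region is mapped in the vhost process */
	uint64_t size;            /* region length in bytes */
};

struct guest_memory {
	uint32_t nregions;            /* number of valid entries in regions[] */
	struct mem_region regions[8]; /* assumed fixed upper bound for the sketch */
};

/*
 * Translate a guest physical address into a vhost virtual address by
 * scanning the region table. Returns 0 when no region covers gpa.
 * Being a static inline over a visible struct, the compiler can fold
 * it into the per-descriptor loop of the caller.
 */
static inline uint64_t
gpa_to_vva(const struct guest_memory *mem, uint64_t gpa)
{
	uint32_t i;

	for (i = 0; i < mem->nregions; i++) {
		const struct mem_region *r = &mem->regions[i];

		if (gpa >= r->guest_phys_addr &&
		    gpa - r->guest_phys_addr < r->size)
			return gpa - r->guest_phys_addr + r->host_user_addr;
	}

	return 0;
}
```

A driver would fetch the region table once per device and then call the helper for each desc address; with the structs hidden behind a "vid", the same lookup would need a real function call per desc.
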
Add helper functions to fetch/update descs or not
-------------------------------------------------

I intended to do it this way: introduce one function to get @count descs
from a specific vring and another one to update the used descs.
Something like:

    rte_vhost_vring_get_descs(vid, vring_idx, count, offset, iov, descs);
    rte_vhost_vring_update_used_descs(vid, vring_idx, count, offset, descs);

With that, the vhost-user driver programmer's task would be easier, as
he/she wouldn't have to parse the descs any more (e.g. to handle
indirect descs).

But given that virtio 1.1 has just emerged and proposes a completely new
ring layout and, most importantly, a changed vring desc structure, I'd
like to hold off on introducing these two functions. Otherwise, it's
very likely the two would become invalid when virtio 1.1 is out. That
said, I think this could be addressed with a careful design, something
like making the IOV generic enough:

    struct rte_vhost_iov {
            uint64_t        gpa;
            uint64_t        vva;
            uint64_t        len;
    };

Instead, I went the other way: introduce a few APIs to export all the
vring info (vring size, vring addr, callfd, etc.) and let the vhost-user
driver read and update the descs. That info could be passed to the
vhost-user driver by introducing one API for each field, but to save a
few APIs and a few calls for the programmer, I packed a few key fields
into a new structure, so that they can be fetched with one call:

    struct rte_vhost_vring {
            struct vring_desc       *desc;
            struct vring_avail      *avail;
            struct vring_used       *used;
            uint64_t                log_guest_addr;

            int                     callfd;
            int                     kickfd;
            uint16_t                size;
    };

When virtio 1.1 comes out, a simple change like the following would
likely just work:

    struct rte_vhost_vring {
            union {
                    struct {
                            struct vring_desc       *desc;
                            struct vring_avail      *avail;
                            struct vring_used       *used;
                            uint64_t                log_guest_addr;
                    };
                    struct desc     *desc_1_1;      /* vring addr for virtio 1.1 */
            };

            int                     callfd;
            int                     kickfd;
            uint16_t                size;
    };

AFAIK, it's not an ABI breakage.
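
The layout part of that claim can be sanity-checked with a small sketch that compares field offsets before and after wrapping the ring pointers in a union. The names vring_v1, vring_v2 and layout_unchanged() are hypothetical and exist only for this comparison; the ring types are left opaque because only pointers to them are stored. Matching offsets are a necessary condition for ABI compatibility, not a full proof of it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Opaque stand-ins: only pointers are stored, so contents don't matter. */
struct vring_desc;
struct vring_avail;
struct vring_used;
struct desc;

/* Layout as proposed in this series (hypothetical name). */
struct vring_v1 {
	struct vring_desc  *desc;
	struct vring_avail *avail;
	struct vring_used  *used;
	uint64_t            log_guest_addr;

	int                 callfd;
	int                 kickfd;
	uint16_t            size;
};

/* Possible virtio 1.1 extension using an anonymous union (hypothetical name). */
struct vring_v2 {
	union {
		struct {
			struct vring_desc  *desc;
			struct vring_avail *avail;
			struct vring_used  *used;
			uint64_t            log_guest_addr;
		};
		struct desc *desc_1_1; /* vring addr for virtio 1.1 */
	};

	int                 callfd;
	int                 kickfd;
	uint16_t            size;
};

/* Returns 1 when every pre-existing field keeps its offset and the
 * overall struct size is the same in both layouts, 0 otherwise. */
static int
layout_unchanged(void)
{
	return sizeof(struct vring_v1) == sizeof(struct vring_v2) &&
	       offsetof(struct vring_v1, desc) == offsetof(struct vring_v2, desc) &&
	       offsetof(struct vring_v1, avail) == offsetof(struct vring_v2, avail) &&
	       offsetof(struct vring_v1, used) == offsetof(struct vring_v2, used) &&
	       offsetof(struct vring_v1, log_guest_addr) ==
	               offsetof(struct vring_v2, log_guest_addr) &&
	       offsetof(struct vring_v1, callfd) == offsetof(struct vring_v2, callfd) &&
	       offsetof(struct vring_v1, kickfd) == offsetof(struct vring_v2, kickfd) &&
	       offsetof(struct vring_v1, size) == offsetof(struct vring_v2, size);
}
```
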
Even if that change did break the ABI, we could introduce a new API to
get the virtio 1.1 ring address.

Those fields are the minimum set I came up with for a specific vring,
with the aim of minimizing the chance of breaking the ABI in future
extensions. If we need more info, we could introduce a new API.

OTOH, to get the best performance, the two functions would also have to
be inlined (the "vid + vring_idx" combo is replaced with "vring"):

    rte_vhost_vring_get_descs(vring, count, offset, iov, descs);
    rte_vhost_vring_update_used_descs(vring, count, offset, descs);

That said, one way or another, we have to export the rte_vhost_vring
struct. For this reason, I didn't rush into introducing the two APIs.

TODOs
=====

This series still has a few small items to finish:

- update release note
- fill API comments
- set protocol features

	--yliu

---
Yuanhan Liu (17):
  vhost: introduce driver features related APIs
  net/vhost: remove feature related APIs
  vhost: use new APIs to handle features
  vhost: make notify ops per vhost driver
  vhost: export guest memory regions
  vhost: introduce API to fetch negotiated features
  vhost: export vhost vring info
  vhost: export API to translate gpa to vva
  vhost: turn queue pair to vring
  vhost: export the number of vrings
  vhost: move the device ready check at proper place
  vhost: drop the Rx and Tx queue macro
  vhost: do not include net specific headers
  vhost: rename device ops struct
  vhost: rename virtio-net to vhost
  vhost: rename header file
  examples/vhost: demonstrate the new generic vhost APIs

 doc/guides/rel_notes/deprecation.rst        |   9 -
 drivers/net/vhost/rte_eth_vhost.c           |  51 ++--
 drivers/net/vhost/rte_eth_vhost.h           |  32 +--
 drivers/net/vhost/rte_pmd_vhost_version.map |   3 -
 examples/tep_termination/main.c             |  11 +-
 examples/tep_termination/main.h             |   2 +
 examples/tep_termination/vxlan_setup.c      |   2 +-
 examples/vhost/Makefile                     |   2 +-
 examples/vhost/main.c                       |  88 ++++--
 examples/vhost/main.h                       |  33 ++-
 examples/vhost/virtio_net.c                 | 405 ++++++++++++++++++++++++++++
 lib/librte_vhost/Makefile                   |   4 +-
 lib/librte_vhost/rte_vhost.h                | 259 ++++++++++++++++++
 lib/librte_vhost/rte_vhost_version.map      |  18 +-
 lib/librte_vhost/rte_virtio_net.h           | 193 -------------
 lib/librte_vhost/socket.c                   | 143 ++++++++++
 lib/librte_vhost/vhost.c                    | 209 +++++++-------
 lib/librte_vhost/vhost.h                    |  82 +++---
 lib/librte_vhost/vhost_user.c               |  91 +++----
 lib/librte_vhost/vhost_user.h               |   2 +-
 lib/librte_vhost/virtio_net.c               |  35 +--
 21 files changed, 1140 insertions(+), 534 deletions(-)
 create mode 100644 examples/vhost/virtio_net.c
 create mode 100644 lib/librte_vhost/rte_vhost.h
 delete mode 100644 lib/librte_vhost/rte_virtio_net.h

-- 
1.9.0