From: Jianfeng Tan
To: dev@dpdk.org
Cc: Jianfeng Tan, Huawei Xie, rich.lane@bigswitch.com,
 yuanhan.liu@linux.intel.com, mst@redhat.com,
 nakajima.yoshihiro@lab.ntt.co.jp, p.fedin@samsung.com,
 ann.zhuangyanying@huawei.com, mukawa@igel.co.jp, nhorman@tuxdriver.com
Date: Wed, 15 Jun 2016 09:03:19 +0000
Message-Id: <1465981405-37485-1-git-send-email-jianfeng.tan@intel.com>
In-Reply-To: <1446748276-132087-1-git-send-email-jianfeng.tan@intel.com>
References: <1446748276-132087-1-git-send-email-jianfeng.tan@intel.com>
Subject: [dpdk-dev] [PATCH v9 0/6] virtio support for container

v9:
- Squash a patch from the mq support of virtio-user, which uses virtual
  addresses in the control queue, into this series.
- Fix a regression: a missing "%s" when printing some messages.
- Run check-git-log.sh, change pci to PCI.

v8:
- Change to use max_queue_pairs instead of queue_pairs to initialize
  and deinitialize queues.
- Remove vhost-kernel support.

v7:
- CONFIG_RTE_VIRTIO_VDEV -> CONFIG_RTE_VIRTIO_USER; and correspondingly,
  RTE_VIRTIO_VDEV -> RTE_VIRTIO_USER.
- uint64_t -> uintptr_t, so that it can be compiled on 32-bit platforms.
- Rebase on latest dpdk-next-virtio branch.
- Abandon abstracting the related code into vring_hdr_desc_init();
  instead, just move it after setup_queue().

v6:
- Move driver-related code from driver/net/virtio/virtio-user/ into the
  driver/net/virtio/ directory, inside virtio_user_ethdev.c.
- Rename vdev to virtio_user in comments and code.
- Merge the code in virtio_user_pci.c into virtio_user_ethdev.c.
- Add some comments on the virtio-user special handling in
  virtio_dev_ethdev.c.
- Merge the document update into the 7th commit, where virtio-user is
  added.
- Add usage with vhost-switch in vhost.rst.

v5:
- Rename struct virtio_user_hw to struct virtio_user_dev.
- Rename "vdev_private" to "virtio_user_dev".
- Move special handling into virtio_ethdev.c from queue_setup().
- Add vring in virtio_user_dev (remove rte_eth_dev_data), so that the
  device does not depend on the driver's data structure
  (rte_eth_dev_data).
- Remove the update to doc/guides/nics/overview.rst, because virtio-user
  has exactly the same feature set as virtio.
- Change "unsigned long int" to "uint64_t", "unsigned" to "uint32_t".
- Remove unnecessary cast in vdev_read_dev_config().
- Add functions in virtio_user_dev.c with the prefix "virtio_user_".
- Rebase on virtio-next-virtio.

v4:
- Avoid using dev_type; instead use (eth_dev->pci_device is NULL) to
  judge whether it is a virtual device or a physical device.
- Change the added device name to virtio-user.
- Split into vhost_user.c, vhost_kernel.c, vhost.c, virtio_user_pci.c,
  virtio_user_dev.c.
- Move virtio-user specific data from struct virtio_hw into struct
  virtio_user_hw.
- Add support for sending the reset_owner message.
- Change the del_queue implementation. (This needs more checking.)
- Remove rte_panic() and replace it with logging.
- Add reset_owner into virtio_pci_ops.reset.
- Merge parameters "rx" and "tx" into "queues" to eliminate confusion.
- Move get_features to after set_owner.
- Redefine path in virtio_user_hw from char * to char [].

v3:
- Remove the --single-file option; make no change to EAL memory.
- Remove the added API rte_eal_get_backfile_info(); instead, check all
  opened files against HUGEFILE_FMT to find hugepage files owned by
  DPDK.
- Accordingly, add more restrictions to the "Known issues" section.
- Rename parameter from queue_num to queue_size to avoid confusion.
- Rename vhost_embedded.c to rte_eth_virtio_vdev.c.
- Move code related to the newly added vdev to rte_eth_virtio_vdev.c;
  to reuse eth_virtio_dev_init(), remove its static declaration.
- Implement dev_uninit() for rte_eth_dev_detach().
- WARN -> ERR, in vhost_embedded.c.
- Expand the commit message to clarify the model.

v2:
- Rebase on the patchset of virtio 1.0 support.
- Fix failure to create non-hugepage memory.
- Fix wrong size of memory region when "single-file" is used.
- Fix setting of offset in virtqueue to use virtual address.
- Fix setting TUNSETVNETHDRSZ in vhost-user's branch.
- Add mac option to specify the MAC address of this virtual device.
- Update doc.

This patchset provides a high-performance networking interface (virtio)
for container-based DPDK applications. Starting DPDK apps in containers
with exclusive ownership of NIC devices is beyond its scope. The basic
idea is to present a new virtual device (named virtio-user), which can
be discovered and initialized by DPDK. To minimize the change, we reuse
the already-existing virtio PMD code (driver/net/virtio/).

Background: Previously, a virtio device was usually used in the context
of QEMU/VM, as the picture below shows. The virtio NIC is emulated in
QEMU, and usually presented in the VM as a PCI device.

  ------------------
  |  virtio driver |  ----->  VM
  ------------------
        |
        | ----------> (over PCI bus or MMIO or Channel I/O)
        |
  ------------------
  | device emulate |
  |                |  ----->  QEMU
  | vhost adapter  |
  ------------------
        |
        | ----------> (vhost-user protocol or vhost-net ioctls)
        |
  ------------------
  | vhost backend  |
  ------------------

Compared to the QEMU/VM case, virtio support for containers requires
embedding the device framework inside the virtio PMD.
So this converged driver actually plays three roles:
- the virtio driver, to drive this new kind of virtual device;
- device emulation, to present this virtual device and respond to the
  virtio driver, which is originally done by QEMU;
- and the role of communicating with the vhost backend, which is also
  originally done by QEMU.

The code layout and functionality of each module:

  ----------------------
  | ------------------ |
  | | virtio driver  | |----> (virtio_user_ethdev.c)
  | ------------------ |
  |         |          |
  | ------------------ | ------>  virtio-user PMD
  | | device emulate |-|----> (virtio_user_dev.c)
  | |                | |
  | | vhost adapter  |-|----> (vhost_user.c, vhost_kernel.c, vhost.c)
  | ------------------ |
  ----------------------
            |
            | -------------- --> (vhost-user protocol)
            |
  ------------------
  | vhost backend  |
  ------------------

How to share memory? In the VM case, QEMU always shares the whole
physical memory layout with the backend. It is not feasible, however,
for a container, as a process, to share all of its virtual memory
regions with the backend. So only specified virtual memory regions (of
shared type) are sent to the backend. As a limitation, only addresses in
these regions can be used to transmit or receive packets.

Known issues:
- Control queue and multi-queue are not supported yet.
- Cannot work with --huge-unlink.
- Cannot work with --no-huge.
- Cannot work when there are more than VHOST_MEMORY_MAX_NREGIONS(8)
  hugepages.
- Root privilege is a must (mainly because of sorting hugepages
  according to physical address).
- Applications should not use file names like HUGEFILE_FMT ("%smap_%d").
- Cannot work with vhost kernel.

How to use?

a. Apply this patchset.

b. To compile container apps:
$: make config RTE_SDK=`pwd` T=x86_64-native-linuxapp-gcc
$: make install RTE_SDK=`pwd` T=x86_64-native-linuxapp-gcc
$: make -C examples/l2fwd RTE_SDK=`pwd` T=x86_64-native-linuxapp-gcc
$: make -C examples/vhost RTE_SDK=`pwd` T=x86_64-native-linuxapp-gcc

c. To build a docker image using the Dockerfile below:
$: cat ./Dockerfile
FROM ubuntu:latest
WORKDIR /usr/src/dpdk
COPY . /usr/src/dpdk
ENV PATH "$PATH:/usr/src/dpdk/examples/l2fwd/build/"
$: docker build -t dpdk-app-l2fwd .

d. Used with vhost-user
$: ./examples/vhost/build/vhost-switch -c 3 -n 4 \
	--socket-mem 1024,1024 -- -p 0x1 --stats 1
$: docker run -i -t -v :/var/run/usvhost \
	-v /dev/hugepages:/dev/hugepages \
	dpdk-app-l2fwd l2fwd -c 0x4 -n 4 -m 1024 --no-pci \
	--vdev=virtio-user0,path=/var/run/usvhost -- -p 0x1

By the way, it's not necessary to run in a container.

Signed-off-by: Huawei Xie
Signed-off-by: Jianfeng Tan
Acked-by: Yuanhan Liu

Jianfeng Tan (6):
  virtio: hide phys addr check inside PCI ops
  virtio: enable use virtual address to fill desc
  virtio-user: add vhost user adapter layer
  virtio-user: add device emulation layer APIs
  virtio-user: add new virtual PCI driver for virtio
  virtio-user: add a new vdev named virtio-user

 config/common_linuxapp                           |   1 +
 doc/guides/rel_notes/release_16_07.rst           |  12 +
 doc/guides/sample_app_ug/vhost.rst               |  17 +
 drivers/net/virtio/Makefile                      |   6 +
 drivers/net/virtio/virtio_ethdev.c               |  83 +++--
 drivers/net/virtio/virtio_ethdev.h               |   2 +
 drivers/net/virtio/virtio_pci.c                  |  30 +-
 drivers/net/virtio/virtio_pci.h                  |   3 +-
 drivers/net/virtio/virtio_rxtx.c                 |   5 +-
 drivers/net/virtio/virtio_rxtx_simple.c          |  13 +-
 drivers/net/virtio/virtio_user/vhost.h           | 141 ++++++++
 drivers/net/virtio/virtio_user/vhost_user.c      | 404 +++++++++++++++++++++
 drivers/net/virtio/virtio_user/virtio_user_dev.c | 227 ++++++++++++
 drivers/net/virtio/virtio_user/virtio_user_dev.h |  62 ++++
 drivers/net/virtio/virtio_user_ethdev.c          | 427 +++++++++++++++++++++++
 drivers/net/virtio/virtqueue.h                   |  10 +
 16 files changed, 1398 insertions(+), 45 deletions(-)
 create mode 100644 drivers/net/virtio/virtio_user/vhost.h
 create mode 100644 drivers/net/virtio/virtio_user/vhost_user.c
 create mode 100644 drivers/net/virtio/virtio_user/virtio_user_dev.c
 create mode 100644 drivers/net/virtio/virtio_user/virtio_user_dev.h
 create mode 100644 drivers/net/virtio/virtio_user_ethdev.c

--
2.1.4