From: Jianfeng Tan <jianfeng.tan@intel.com>
To: dev@dpdk.org
Date: Sun, 10 Jan 2016 19:42:58 +0800
Message-Id: <1452426182-86851-1-git-send-email-jianfeng.tan@intel.com>
X-Mailer: git-send-email 2.1.4
In-Reply-To: <1446748276-132087-1-git-send-email-jianfeng.tan@intel.com>
References: <1446748276-132087-1-git-send-email-jianfeng.tan@intel.com>
Cc: nakajima.yoshihiro@lab.ntt.co.jp, mst@redhat.com, ann.zhuangyanying@huawei.com
Subject: [dpdk-dev] [PATCH 0/4] virtio support for container

This patchset provides a high-performance networking interface (virtio)
for container-based DPDK applications. Starting DPDK apps in containers
with exclusive ownership of NIC devices is beyond its scope.

The basic idea here is to present a new virtual device (named eth_cvio)
which can be discovered and initialized by container-based DPDK apps
using rte_eal_init(). To minimize the change, we reuse the existing
virtio frontend driver code (drivers/net/virtio/).

Compared to the QEMU/VM case, the virtio device framework (which
translates I/O port r/w operations into the unix socket/cuse protocol,
and is originally provided by QEMU) is integrated into the virtio
frontend driver. This converged driver therefore plays both the role of
the original frontend driver and the role of QEMU's device framework.

The major difference lies in how addresses are calculated for vhost.
The principle is: based on one or more shared memory segments, vhost
maintains a reference table with the base address and length of each
segment, so that an address coming from the frontend (usually a GPA,
Guest Physical Address) can be translated into a vhost-recognizable
address (named VVA, Vhost Virtual Address). To decrease the overhead of
address translation, we should maintain as few segments as possible.
In the VM case, the GPA is locally continuous within each segment. In
the container case, the CVA (Container Virtual Address) can be used
instead. Specifically:
  a. when setting the base address (set_base_addr), the CVA is used;
  b. when preparing RX descriptors, the CVA is used;
  c. when transmitting packets, the CVA is filled into TX descriptors;
  d. in TX and control queue (CQ) headers, the CVA is used.
Sketches of the resulting lookup and descriptor handling follow the
next paragraph.

How to share memory? In the VM case, QEMU always shares the whole
physical memory layout with the backend. It is not feasible, however,
for a container, being an ordinary process, to share all of its virtual
memory regions with the backend. So only explicitly specified virtual
memory regions (mapped as shared) are sent to the backend. The
limitation is that only addresses within these regions can be used to
transmit or receive packets.
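To make the translation concrete, here is a minimal sketch of the
vhost-side lookup described above. The names (struct mem_region,
to_vhost_va) are illustrative assumptions for this cover letter, not
the API added by the patchset:

    #include <stddef.h>
    #include <stdint.h>

    /*
     * Illustrative per-segment reference entry kept by vhost. For a VM,
     * front_addr is a GPA; for a container, it is a CVA. host_addr is
     * the VVA at which vhost mapped the same segment.
     */
    struct mem_region {
            uint64_t front_addr;
            uint64_t host_addr;
            uint64_t size;
    };

    /* Translate a frontend address (GPA or CVA) into a VVA. */
    static void *
    to_vhost_va(const struct mem_region *reg, int nregions, uint64_t addr)
    {
            int i;

            for (i = 0; i < nregions; i++) {
                    if (addr >= reg[i].front_addr &&
                        addr - reg[i].front_addr < reg[i].size)
                            return (void *)(uintptr_t)(reg[i].host_addr +
                                    (addr - reg[i].front_addr));
            }
            return NULL; /* not within any shared segment */
    }

The fewer the segments, the cheaper this lookup; that is the motivation
for keeping the segment count low (and for the --single-file option).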
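On the frontend side, points b./c. above amount to writing a process
virtual address into the ring descriptor where the VM case writes a
guest-physical one. A hedged sketch, not the patch itself; the
descriptor layout is as in the virtio ring specification, and
buf_physaddr/data_off are the mbuf fields of this DPDK era:

    #include <stdint.h>
    #include <rte_mbuf.h>

    /* Descriptor layout as defined by the virtio ring specification. */
    struct vring_desc {
            uint64_t addr;
            uint32_t len;
            uint16_t flags;
            uint16_t next;
    };

    static inline void
    fill_tx_desc(struct vring_desc *desc, struct rte_mbuf *m, int is_cvio)
    {
            if (is_cvio)    /* eth_cvio: CVA goes in directly */
                    desc->addr = (uint64_t)(uintptr_t)
                            rte_pktmbuf_mtod(m, void *);
            else            /* QEMU/VM: guest-physical address */
                    desc->addr = m->buf_physaddr + m->data_off;
            desc->len = rte_pktmbuf_data_len(m);
    }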
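Finally, a sketch of the kind of mapping a container process needs so
that the backend can see its buffers. The file name and size below are
made-up examples; arranging such a file-backed, shared mapping for the
DPDK memory is what the --single-file option is meant to automate:

    #include <fcntl.h>
    #include <stddef.h>
    #include <sys/mman.h>
    #include <unistd.h>

    /*
     * Memory visible to the backend must come from a file-backed,
     * MAP_SHARED mapping, so that vhost can mmap() the same file on
     * its own side to obtain the VVA view of the segment.
     */
    int main(void)
    {
            const size_t size = 1UL << 30; /* e.g. one 1 GB region */
            int fd = open("/dev/hugepages/cvio_example",
                          O_CREAT | O_RDWR, 0600);
            void *va;

            if (fd < 0 || ftruncate(fd, size) < 0)
                    return 1;
            va = mmap(NULL, size, PROT_READ | PROT_WRITE,
                      MAP_SHARED, fd, 0);
            if (va == MAP_FAILED)
                    return 1;
            /*
             * (va, size, fd) is what gets handed to the backend, via
             * VHOST_USER_SET_MEM_TABLE (vhost-user) or the
             * VHOST_SET_MEM_TABLE ioctl (vhost-net). Addresses inside
             * this range are valid CVAs for rx/tx buffers.
             */
            return 0;
    }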
Known issues:
  a. When used with vhost-net, root privilege is required to create the
     tap device inside the container.
  b. Control queue and multi-queue are not supported yet.
  c. When the --single-file option is used, the socket_id of the memory
     may be wrong. (Use "numactl -N x -m x" to work around this for now.)

How to use?

a. Apply this patchset.

b. To compile container apps:
$: make config RTE_SDK=`pwd` T=x86_64-native-linuxapp-gcc
$: make install RTE_SDK=`pwd` T=x86_64-native-linuxapp-gcc
$: make -C examples/l2fwd RTE_SDK=`pwd` T=x86_64-native-linuxapp-gcc
$: make -C examples/vhost RTE_SDK=`pwd` T=x86_64-native-linuxapp-gcc

c. To build a docker image, use the Dockerfile below:
$: cat ./Dockerfile
FROM ubuntu:latest
WORKDIR /usr/src/dpdk
COPY . /usr/src/dpdk
ENV PATH "$PATH:/usr/src/dpdk/examples/l2fwd/build/"
$: docker build -t dpdk-app-l2fwd .

d. To use with vhost-user:
$: ./examples/vhost/build/vhost-switch -c 3 -n 4 \
       --socket-mem 1024,1024 -- -p 0x1 --stats 1
$: docker run -i -t -v <path-to-vhost-user-socket>:/var/run/usvhost \
       -v /dev/hugepages:/dev/hugepages \
       dpdk-app-l2fwd l2fwd -c 0x4 -n 4 -m 1024 --no-pci \
       --vdev=eth_cvio0,path=/var/run/usvhost -- -p 0x1

e. To use with vhost-net:
$: modprobe vhost
$: modprobe vhost-net
$: docker run -i -t --privileged \
       -v /dev/vhost-net:/dev/vhost-net \
       -v /dev/net/tun:/dev/net/tun \
       -v /dev/hugepages:/dev/hugepages \
       dpdk-app-l2fwd l2fwd -c 0x4 -n 4 -m 1024 --no-pci \
       --vdev=eth_cvio0,path=/dev/vhost-net -- -p 0x1

By the way, it's not necessary to run in a container.

Signed-off-by: Huawei Xie
Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>

Jianfeng Tan (4):
  mem: add --single-file to create single mem-backed file
  mem: add API to obtain memory-backed file info
  virtio/vdev: add ways to interact with vhost
  virtio/vdev: add a new vdev named eth_cvio

 config/common_linuxapp                     |   5 +
 drivers/net/virtio/Makefile                |   4 +
 drivers/net/virtio/vhost.c                 | 734 +++++++++++++++++++++++++++++
 drivers/net/virtio/vhost.h                 | 192 ++++++++
 drivers/net/virtio/virtio_ethdev.c         | 338 ++++++++++---
 drivers/net/virtio/virtio_ethdev.h         |   4 +
 drivers/net/virtio/virtio_pci.h            |  52 +-
 drivers/net/virtio/virtio_rxtx.c           |  11 +-
 drivers/net/virtio/virtio_rxtx_simple.c    |  14 +-
 drivers/net/virtio/virtqueue.h             |  13 +-
 lib/librte_eal/common/eal_common_options.c |  17 +
 lib/librte_eal/common/eal_internal_cfg.h   |   1 +
 lib/librte_eal/common/eal_options.h        |   2 +
 lib/librte_eal/common/include/rte_memory.h |  16 +
 lib/librte_eal/linuxapp/eal/eal_memory.c   |  82 +++-
 15 files changed, 1392 insertions(+), 93 deletions(-)
 create mode 100644 drivers/net/virtio/vhost.c
 create mode 100644 drivers/net/virtio/vhost.h

-- 
2.1.4