From mboxrd@z Thu Jan 1 00:00:00 1970
From: Jianfeng Tan
To: dev@dpdk.org
Cc: nakajima.yoshihiro@lab.ntt.co.jp, zhbzg@huawei.com, mst@redhat.com,
 gaoxiaoqiu@huawei.com, oscar.zhangbo@huawei.com,
 ann.zhuangyanying@huawei.com, zhoujingbin@huawei.com,
 guohongzhen@huawei.com
Date: Fri, 6 Nov 2015 02:31:11 +0800
Message-Id: <1446748276-132087-1-git-send-email-jianfeng.tan@intel.com>
X-Mailer: git-send-email 2.1.4
Subject: [dpdk-dev] [RFC 0/5] virtio support for container
List-Id: patches and discussions about DPDK

This patchset is only a PoC, posted to request comments from the
community.

This patchset provides a high performance networking interface (virtio)
for container-based DPDK applications. How to start DPDK applications in
containers with exclusive ownership of NIC devices is beyond its scope.
The basic idea is to present a new virtual device (named eth_cvio),
which can be discovered and initialized in container-based DPDK
applications through rte_eal_init(). To minimize the change, we reuse
the already-existing virtio frontend driver code (drivers/net/virtio/).

Compared to the QEMU/VM case, the virtio device framework (which
translates I/O port r/w operations into the unix socket/cuse protocol,
and is originally provided by QEMU) is integrated into the virtio
frontend driver.
In other words, this new converged driver plays the role of both the
original frontend driver and the QEMU device framework.

The biggest difference lies in how to calculate the relative address for
the backend. The principle of virtio is that, based on one or multiple
shared memory segments, vhost maintains a reference system with the base
addresses and lengths of these segments, so that when an address from
the VM arrives (usually a GPA, Guest Physical Address), vhost can
translate it into an address it recognizes itself (aka VVA, Vhost
Virtual Address). To decrease the overhead of address translation, we
should maintain as few segments as possible. In the context of virtual
machines, GPA is always locally contiguous, so it is a good choice. In
the container's case, CVA (Container Virtual Address) can be used. This
means that:
  a. when set_base_addr, CVA address is used;
  b. when preparing RX's descriptors, CVA address is used;
  c. when transmitting packets, CVA is filled in TX's descriptors;
  d. in TX and CQ's header, CVA is used.

How to share memory? In the VM's case, QEMU always shares the whole
physical layout with the backend. But it is not feasible for a
container, as a process, to share all of its virtual memory regions with
the backend. So only specified virtual memory regions (of type shared)
are sent to the backend. This leads to the limitation that only
addresses in these areas can be used to transmit or receive packets. For
now, the shared memory is created in /dev/shm using shm_open() during
the memory initialization process.

How to use?

a. Apply the patch of virtio for container. We need two copies of the
   patched code (referred to as dpdk-app/ and dpdk-vhost/).

b. To compile container apps:
$: cd dpdk-app
$: vim config/common_linuxapp (uncomment "CONFIG_RTE_VIRTIO_VDEV=y")
$: make config RTE_SDK=`pwd` T=x86_64-native-linuxapp-gcc
$: make install RTE_SDK=`pwd` T=x86_64-native-linuxapp-gcc
$: make -C examples/l2fwd RTE_SDK=`pwd` T=x86_64-native-linuxapp-gcc

c. To build a docker image using the Dockerfile below:
$: cat ./Dockerfile
FROM ubuntu:latest
WORKDIR /usr/src/dpdk
COPY . /usr/src/dpdk
CMD ["/usr/src/dpdk/examples/l2fwd/build/l2fwd", "-c", "0xc", "-n", "4", "--no-huge", "--no-pci", "--vdev=eth_cvio0,queue_num=256,rx=1,tx=1,cq=0,path=/var/run/usvhost", "--", "-p", "0x1"]
$: docker build -t dpdk-app-l2fwd .

d. To compile vhost:
$: cd dpdk-vhost
$: make config RTE_SDK=`pwd` T=x86_64-native-linuxapp-gcc
$: make install RTE_SDK=`pwd` T=x86_64-native-linuxapp-gcc
$: make -C examples/vhost RTE_SDK=`pwd` T=x86_64-native-linuxapp-gcc

e. Start vhost-switch
$: ./examples/vhost/build/vhost-switch -c 3 -n 4 --socket-mem 1024,1024 -- -p 0x1 --stats 1

f. Start docker
$: docker run -i -t -v :/var/run/usvhost dpdk-app-l2fwd

Signed-off-by: Huawei Xie
Signed-off-by: Jianfeng Tan

Jianfeng Tan (5):
  virtio/container: add handler for ioport rd/wr
  virtio/container: add a new virtual device named eth_cvio
  virtio/container: unify desc->addr assignment
  virtio/container: adjust memory initialization process
  vhost/container: change mode of vhost listening socket

 config/common_linuxapp                       |   5 +
 drivers/net/virtio/Makefile                  |   4 +
 drivers/net/virtio/vhost-user.c              | 433 +++++++++++++++++++++++++++
 drivers/net/virtio/vhost-user.h              | 137 +++++++++
 drivers/net/virtio/virtio_ethdev.c           | 319 +++++++++++++++-----
 drivers/net/virtio/virtio_ethdev.h           |  16 +
 drivers/net/virtio/virtio_pci.h              |  32 +-
 drivers/net/virtio/virtio_rxtx.c             |   9 +-
 drivers/net/virtio/virtio_rxtx_simple.c      |   9 +-
 drivers/net/virtio/virtqueue.h               |   9 +-
 lib/librte_eal/common/include/rte_memory.h   |   5 +
 lib/librte_eal/linuxapp/eal/eal_memory.c     |  58 +++-
 lib/librte_mempool/rte_mempool.c             |  16 +-
 lib/librte_vhost/vhost_user/vhost-net-user.c |   5 +
 14 files changed, 967 insertions(+), 90 deletions(-)
 create mode 100644 drivers/net/virtio/vhost-user.c
 create mode 100644 drivers/net/virtio/vhost-user.h

-- 
2.1.4
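
For illustration only, the address translation described in the cover
letter (vhost keeps the base address and length of each shared segment
and maps a frontend address, GPA or CVA, to its own VVA) might be
sketched as below. This is not code from the patchset: struct mem_region
and frontend_to_vva() are hypothetical names, and returning 0 for an
address outside every shared region is an assumption made for the
sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical descriptor of one shared memory segment (not from the patchset). */
struct mem_region {
	uint64_t frontend_addr; /* segment start as seen by the frontend (GPA or CVA) */
	uint64_t size;          /* segment length in bytes */
	uint64_t backend_addr;  /* where the backend mapped the same segment (VVA) */
};

/*
 * Translate a frontend address into a backend virtual address by a
 * linear walk over the region table. Returns 0 when the address is not
 * inside any shared region (assumed convention for this sketch).
 */
static uint64_t
frontend_to_vva(const struct mem_region *regions, size_t nregions,
		uint64_t addr)
{
	assert(regions != NULL || nregions == 0);

	for (size_t i = 0; i < nregions; i++) {
		const struct mem_region *r = &regions[i];

		/* Compare offsets so that frontend_addr + size cannot overflow. */
		if (addr >= r->frontend_addr &&
		    addr - r->frontend_addr < r->size)
			return r->backend_addr + (addr - r->frontend_addr);
	}
	return 0;
}
```

The walk is linear in the number of regions, which is one way to see why
the cover letter wants as few segments as possible. It also shows the
stated limitation: an address outside the shared regions simply has no
translation.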