From: Zhuangyanying
To: Jianfeng Tan, dev@dpdk.org
Cc: nakajima.yoshihiro@lab.ntt.co.jp, Zhbzg, mst@redhat.com, gaoxiaoqiu, "Zhangbo (Oscar)", Zhoujingbin, Guohongzhen
Date: Tue, 24 Nov 2015 03:53:00 +0000
Subject: Re: [dpdk-dev] [RFC 0/5] virtio support for container

> -----Original Message-----
> From: Jianfeng Tan [mailto:jianfeng.tan@intel.com]
> Sent: Friday, November 06, 2015 2:31 AM
> To: dev@dpdk.org
> Cc: mst@redhat.com; mukawa@igel.co.jp; nakajima.yoshihiro@lab.ntt.co.jp;
> michael.qiu@intel.com; Guohongzhen; Zhoujingbin; Zhuangyanying; Zhangbo
> (Oscar); gaoxiaoqiu; Zhbzg; huawei.xie@intel.com; Jianfeng Tan
> Subject: [RFC 0/5] virtio support for container
>
> This patchset acts only as a PoC to request comments from the community.
>
> This patchset provides a high-performance networking interface (virtio)
> for container-based DPDK applications. How to start DPDK applications in
> containers with exclusive ownership of NIC devices is beyond its scope.
> The basic idea is to present a new virtual device (named eth_cvio), which
> can be discovered and initialized by a container-based DPDK application's
> rte_eal_init(). To minimize the change, we reuse the already-existing
> virtio frontend driver code (driver/net/virtio/).
>
> Compared to the QEMU/VM case, the virtio device framework (which
> translates I/O port read/write operations into the unix socket/cuse
> protocol, and is originally provided by QEMU) is integrated into the
> virtio frontend driver. In other words, this new converged driver plays
> both the role of the original frontend driver and that of the QEMU
> device framework.
>
> The biggest difference here lies in how the relative address is
> calculated for the backend.
> The principle of virtio is: based on one or multiple shared memory
> segments, vhost maintains a reference system with the base addresses and
> lengths of these segments, so that when an address arrives from the VM
> (usually a GPA, Guest Physical Address), vhost can translate it into an
> address it can use itself (a VVA, Vhost Virtual Address). To decrease the
> overhead of address translation, we should maintain as few segments as
> possible. In the context of virtual machines, the GPA space is always
> contiguous, so it is a good choice. In the container case, the CVA
> (Container Virtual Address) can be used instead. This means that:
> a. when set_base_addr is called, the CVA is used;
> b. when preparing RX descriptors, the CVA is used;
> c. when transmitting packets, the CVA is filled into TX descriptors;
> d. in TX and CQ headers, the CVA is used.
>
> How to share memory? In the VM case, QEMU always shares the whole
> physical memory layout with the backend. But it is not feasible for a
> container, being just a process, to share all of its virtual memory
> regions with the backend. So only specified virtual memory regions (of
> shared type) are sent to the backend. This leads to the limitation that
> only addresses within these areas can be used to transmit or receive
> packets. For now, the shared memory is created in /dev/shm using
> shm_open() during the memory initialization process.
>
> How to use?
>
> a. Apply the virtio-for-container patches. Two copies of the patched
> code are needed (referred to as dpdk-app/ and dpdk-vhost/).
>
> b. Compile the container apps:
> $: cd dpdk-app
> $: vim config/common_linuxapp (uncomment "CONFIG_RTE_VIRTIO_VDEV=y")
> $: make config RTE_SDK=`pwd` T=x86_64-native-linuxapp-gcc
> $: make install RTE_SDK=`pwd` T=x86_64-native-linuxapp-gcc
> $: make -C examples/l2fwd RTE_SDK=`pwd` T=x86_64-native-linuxapp-gcc
>
> c. Build a docker image using the Dockerfile below:
> $: cat ./Dockerfile
> FROM ubuntu:latest
> WORKDIR /usr/src/dpdk
> COPY . /usr/src/dpdk
> CMD ["/usr/src/dpdk/examples/l2fwd/build/l2fwd", "-c", "0xc", "-n", "4",
> "--no-huge", "--no-pci",
> "--vdev=eth_cvio0,queue_num=256,rx=1,tx=1,cq=0,path=/var/run/usvhost",
> "--", "-p", "0x1"]
> $: docker build -t dpdk-app-l2fwd .
>
> d. Compile vhost:
> $: cd dpdk-vhost
> $: make config RTE_SDK=`pwd` T=x86_64-native-linuxapp-gcc
> $: make install RTE_SDK=`pwd` T=x86_64-native-linuxapp-gcc
> $: make -C examples/vhost RTE_SDK=`pwd` T=x86_64-native-linuxapp-gcc
>
> e. Start vhost-switch:
> $: ./examples/vhost/build/vhost-switch -c 3 -n 4 --socket-mem 1024,1024
> -- -p 0x1 --stats 1
>
> f. Start the docker container:
> $: docker run -i -t -v :/var/run/usvhost dpdk-app-l2fwd
>
> Signed-off-by: Huawei Xie
> Signed-off-by: Jianfeng Tan
>
> Jianfeng Tan (5):
>   virtio/container: add handler for ioport rd/wr
>   virtio/container: add a new virtual device named eth_cvio
>   virtio/container: unify desc->addr assignment
>   virtio/container: adjust memory initialization process
>   vhost/container: change mode of vhost listening socket
>
>  config/common_linuxapp                       |   5 +
>  drivers/net/virtio/Makefile                  |   4 +
>  drivers/net/virtio/vhost-user.c              | 433 +++++++++++++++++++++++++
>  drivers/net/virtio/vhost-user.h              | 137 +++++++++
>  drivers/net/virtio/virtio_ethdev.c           | 319 +++++++++++++++-----
>  drivers/net/virtio/virtio_ethdev.h           |  16 +
>  drivers/net/virtio/virtio_pci.h              |  32 +-
>  drivers/net/virtio/virtio_rxtx.c             |   9 +-
>  drivers/net/virtio/virtio_rxtx_simple.c      |   9 +-
>  drivers/net/virtio/virtqueue.h               |   9 +-
>  lib/librte_eal/common/include/rte_memory.h   |   5 +
>  lib/librte_eal/linuxapp/eal/eal_memory.c     |  58 +++-
>  lib/librte_mempool/rte_mempool.c             |  16 +-
>  lib/librte_vhost/vhost_user/vhost-net-user.c |   5 +
>  14 files changed, 967 insertions(+), 90 deletions(-)
>  create mode 100644 drivers/net/virtio/vhost-user.c
>  create mode 100644 drivers/net/virtio/vhost-user.h
>
> --
> 2.1.4

This patchset raises a good idea: adding an extra abstracted I/O layer,
which would make it simple to extend the functionality to a kernel-mode
switch (such as OVS). That's great. But I have one question here, about
VHOST_USER_SET_MEM_TABLE. You allocate memory from a tmpfs filesystem
with just one fd, so the memory topology can be obtained directly with
rte_memseg_info_get(). However, things change in the kernel-space case,
because the mempool has to be created on each container's hugetlbfs
(rather than tmpfs), which is separate for each container; and finally,
the ioctl's parameters have to be taken into account.

My solution, for your reference, is as follows:

    /* Region 0: the RX queue's mempool element area. */
    reg = mem->regions;
    reg->guest_phys_addr = (__u64)((struct virtqueue *)
            (dev->data->rx_queues[0]))->mpool->elt_va_start;
    reg->userspace_addr = reg->guest_phys_addr;
    reg->memory_size = ((struct virtqueue *)
            (dev->data->rx_queues[0]))->mpool->elt_va_end
            - reg->guest_phys_addr;

    /* Region 1: the TX queue's virtio-net header area. */
    reg = mem->regions + 1;
    reg->guest_phys_addr = (__u64)(((struct virtqueue *)
            (dev->data->tx_queues[0]))->virtio_net_hdr_mem);
    reg->userspace_addr = reg->guest_phys_addr;
    reg->memory_size = vq_size * internals->vtnet_hdr_size;

But it's a little ugly. Any better idea?
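
For context, the structure being filled above is the vhost-user memory
table. Below is a minimal, self-contained sketch of the two-region idea:
the VhostUserMemory/VhostUserMemoryRegion layouts mirror the vhost-user
protocol's memory-table message, while the add_region()/build_mem_table()
helpers and the mpool_base/hdr_base names are hypothetical illustrations,
not part of the patchset:

    #include <stdint.h>
    #include <stddef.h>
    #include <string.h>

    #define VHOST_MEMORY_MAX_NREGIONS 8

    /* Mirrors the memory-region layout of the vhost-user protocol. */
    typedef struct VhostUserMemoryRegion {
            uint64_t guest_phys_addr;
            uint64_t memory_size;
            uint64_t userspace_addr;
            uint64_t mmap_offset;
    } VhostUserMemoryRegion;

    typedef struct VhostUserMemory {
            uint32_t nregions;
            uint32_t padding;
            VhostUserMemoryRegion regions[VHOST_MEMORY_MAX_NREGIONS];
    } VhostUserMemory;

    /*
     * Describe one virtually-contiguous area as a region. In the
     * container case there is no real GPA, so guest_phys_addr is simply
     * set to the CVA; the backend's GPA->VVA translation then works
     * unchanged.
     */
    static void
    add_region(VhostUserMemory *mem, void *base, size_t len)
    {
            VhostUserMemoryRegion *reg = &mem->regions[mem->nregions++];

            reg->guest_phys_addr = (uint64_t)(uintptr_t)base;
            reg->userspace_addr  = reg->guest_phys_addr;
            reg->memory_size     = len;
            reg->mmap_offset     = 0;
    }

    /* Usage: one region per area the backend must be able to translate,
     * here the mempool element area and the virtio-net header area. */
    static void
    build_mem_table(VhostUserMemory *mem,
                    void *mpool_base, size_t mpool_len,
                    void *hdr_base, size_t hdr_len)
    {
            memset(mem, 0, sizeof(*mem));
            add_region(mem, mpool_base, mpool_len);
            add_region(mem, hdr_base, hdr_len);
    }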