From: Neil Horman
To: "Tan, Jianfeng"
Cc: Neil Horman, dev@dpdk.org
Date: Thu, 24 Mar 2016 09:45:40 -0400
Message-ID: <20160324134540.GA19236@hmsreliant.think-freely.org>
References: <1446748276-132087-1-git-send-email-jianfeng.tan@intel.com>
 <1454671228-33284-1-git-send-email-jianfeng.tan@intel.com>
 <20160323191743.GB13829@hmsreliant.think-freely.org>
 <56F35ABA.40403@intel.com>
In-Reply-To: <56F35ABA.40403@intel.com>
Subject: Re: [dpdk-dev] [PATCH v2 0/5] virtio support for container

On Thu, Mar 24, 2016 at 11:10:50AM +0800, Tan, Jianfeng wrote:
> Hi Neil,
>
> On 3/24/2016 3:17 AM, Neil Horman wrote:
> >On Fri, Feb 05, 2016 at 07:20:23PM +0800, Jianfeng Tan wrote:
> >>v1->v2:
> >> - Rebase on the patchset of virtio 1.0 support.
> >> - Fix failure to create non-hugepage memory.
> >> - Fix wrong size of memory region when "single-file" is used.
> >> - Fix setting of offset in virtqueue to use virtual address.
> >> - Fix setting TUNSETVNETHDRSZ in vhost-user's branch.
> >> - Add a mac option to specify the MAC address of this virtual device.
> >> - Update doc.
> >>
> >>This patchset provides a high performance networking interface (virtio)
> >>for container-based DPDK applications. Starting DPDK apps in containers
> >>with exclusive ownership of NIC devices is beyond its scope.
> >>The basic idea here is to present a new virtual device (named eth_cvio),
> >>which can be discovered and initialized in container-based DPDK apps using
> >>rte_eal_init(). To minimize the change, we reuse the already-existing
> >>virtio frontend driver code (drivers/net/virtio/).
> >>Compared to the QEMU/VM case, the virtio device framework (which
> >>translates I/O port r/w operations into the unix socket/cuse protocol, and
> >>is originally provided by QEMU) is integrated into the virtio frontend
> >>driver. So this converged driver actually plays both the role of the
> >>original frontend driver and the role of the QEMU device framework.
> >>The major difference lies in how to calculate the relative address for
> >>vhost. The principle of virtio is: based on one or multiple shared memory
> >>segments, vhost maintains a reference system with the base address and
> >>length of each segment, so that an address coming from the VM (usually a
> >>GPA, Guest Physical Address) can be translated into a vhost-recognizable
> >>address (VVA, Vhost Virtual Address). To decrease the overhead of address
> >>translation, we should maintain as few segments as possible. In the VM's
> >>case, the GPA is always locally continuous. In the container's case, the
> >>CVA (Container Virtual Address) can be used. Specifically:
> >>a. when set_base_addr is called, the CVA is used;
> >>b. when preparing RX's descriptors, the CVA is used;
> >>c. when transmitting packets, the CVA is filled in TX's descriptors;
> >>d. in the TX and CQ's header, the CVA is used.
> >>How to share memory? In the VM's case, QEMU always shares the whole
> >>physical memory layout with the backend. But it's not feasible for a
> >>container, as a process, to share all of its virtual memory regions with
> >>the backend. So only specified virtual memory regions (of shared type) are
> >>sent to the backend. The limitation is that only addresses in these areas
> >>can be used to transmit or receive packets.
> >>
> >>Known issues:
> >>
> >>a. When used with vhost-net, root privilege is required to create the tap
> >>device inside the container.
> >>b. Control queue and multi-queue are not supported yet.
> >>c. When the --single-file option is used, the socket_id of the memory may
> >>be wrong. (Use "numactl -N x -m x" to work around this for now.)
> >>
> >>How to use?
> >>
> >>a. Apply this patchset.
> >>
> >>b. To compile container apps:
> >>$: make config RTE_SDK=`pwd` T=x86_64-native-linuxapp-gcc
> >>$: make install RTE_SDK=`pwd` T=x86_64-native-linuxapp-gcc
> >>$: make -C examples/l2fwd RTE_SDK=`pwd` T=x86_64-native-linuxapp-gcc
> >>$: make -C examples/vhost RTE_SDK=`pwd` T=x86_64-native-linuxapp-gcc
> >>
> >>c. To build a docker image using the Dockerfile below:
> >>$: cat ./Dockerfile
> >>FROM ubuntu:latest
> >>WORKDIR /usr/src/dpdk
> >>COPY . /usr/src/dpdk
> >>ENV PATH "$PATH:/usr/src/dpdk/examples/l2fwd/build/"
> >>$: docker build -t dpdk-app-l2fwd .
> >>
> >>d. Used with vhost-user:
> >>$: ./examples/vhost/build/vhost-switch -c 3 -n 4 \
> >>   --socket-mem 1024,1024 -- -p 0x1 --stats 1
> >>$: docker run -i -t -v :/var/run/usvhost \
> >>   -v /dev/hugepages:/dev/hugepages \
> >>   dpdk-app-l2fwd l2fwd -c 0x4 -n 4 -m 1024 --no-pci \
> >>   --vdev=eth_cvio0,path=/var/run/usvhost -- -p 0x1
> >>
> >>f. Used with vhost-net:
> >>$: modprobe vhost
> >>$: modprobe vhost-net
> >>$: docker run -i -t --privileged \
> >>   -v /dev/vhost-net:/dev/vhost-net \
> >>   -v /dev/net/tun:/dev/net/tun \
> >>   -v /dev/hugepages:/dev/hugepages \
> >>   dpdk-app-l2fwd l2fwd -c 0x4 -n 4 -m 1024 --no-pci \
> >>   --vdev=eth_cvio0,path=/dev/vhost-net -- -p 0x1
> >>
> >>By the way, it's not necessary to run in a container.
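To make the address translation described in the cover letter concrete, here
is a minimal sketch of the idea, not code from this series: the backend keeps
one entry per shared segment (base address as seen by the frontend, base
address as mapped by the backend, and length), and any GPA/CVA that falls
inside a segment maps to a VVA by a simple offset. All struct and function
names below are hypothetical.

/* Hypothetical illustration of per-segment address translation. */
#include <stdint.h>
#include <stddef.h>

struct mem_region {
	uint64_t front_addr; /* GPA in the VM case, CVA in the container case */
	uint64_t back_addr;  /* VVA: where the backend mapped the same segment */
	uint64_t size;       /* length of the shared segment */
};

/* Translate a frontend address into a backend (vhost) virtual address. */
static inline void *
translate_to_vva(const struct mem_region *regions, unsigned int nregions,
		 uint64_t addr)
{
	unsigned int i;

	/* Fewer segments means a cheaper lookup, which is why the cover
	 * letter argues for keeping the number of segments small. */
	for (i = 0; i < nregions; i++) {
		const struct mem_region *r = &regions[i];

		if (addr >= r->front_addr && addr < r->front_addr + r->size)
			return (void *)(uintptr_t)(r->back_addr +
						   (addr - r->front_addr));
	}
	return NULL; /* address not covered by any shared region */
}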
> >>
> >>Signed-off-by: Huawei Xie
> >>Signed-off-by: Jianfeng Tan
> >>
> >>Jianfeng Tan (5):
> >>  mem: add --single-file to create single mem-backed file
> >>  mem: add API to obtain memory-backed file info
> >>  virtio/vdev: add embeded device emulation
> >>  virtio/vdev: add a new vdev named eth_cvio
> >>  docs: add release note for virtio for container
> >>
> >> config/common_linuxapp                     |   5 +
> >> doc/guides/rel_notes/release_2_3.rst       |   4 +
> >> drivers/net/virtio/Makefile                |   4 +
> >> drivers/net/virtio/vhost.h                 | 194 +++++++
> >> drivers/net/virtio/vhost_embedded.c        | 809 +++++++++++++++++++++++++++++
> >> drivers/net/virtio/virtio_ethdev.c         | 329 +++++++++---
> >> drivers/net/virtio/virtio_ethdev.h         |   6 +-
> >> drivers/net/virtio/virtio_pci.h            |  15 +-
> >> drivers/net/virtio/virtio_rxtx.c           |   6 +-
> >> drivers/net/virtio/virtio_rxtx_simple.c    |  13 +-
> >> drivers/net/virtio/virtqueue.h             |  15 +-
> >> lib/librte_eal/common/eal_common_options.c |  17 +
> >> lib/librte_eal/common/eal_internal_cfg.h   |   1 +
> >> lib/librte_eal/common/eal_options.h        |   2 +
> >> lib/librte_eal/common/include/rte_memory.h |  16 +
> >> lib/librte_eal/linuxapp/eal/eal.c          |   4 +-
> >> lib/librte_eal/linuxapp/eal/eal_memory.c   |  88 +++-
> >> 17 files changed, 1435 insertions(+), 93 deletions(-)
> >> create mode 100644 drivers/net/virtio/vhost.h
> >> create mode 100644 drivers/net/virtio/vhost_embedded.c
> >>
> >>--
> >>2.1.4
> >>
> >So, first off, apologies for being so late to review this patch; it's been
> >on my todo list forever, and I've just not gotten to it.
> >
> >I've taken a cursory look at the code, and I can't find anything glaringly
> >wrong with it.
>
> Thanks very much for reviewing this series.
>
> >
> >That said, I'm a bit confused about the overall purpose of this PMD. I've
> >read the description several times now, and I _think_ I understand the
> >purpose and construction of the PMD. Please correct me if this is not the
> >(admittedly very generalized) overview:
> >
> >1) You've created a vdev PMD, generally named eth_cvio%n, which serves as a
> >virtual NIC suitable for use in a containerized space.
> >
> >2) The PMD in (1) establishes a connection to the host via the vhost
> >backend (which is either a socket or a character device), which it uses to
> >forward data from the containerized dpdk application.
>
> The socket or the character device is used just for control plane messages
> to set up the datapath. The data does not go through the socket or the
> character device.
>
> >
> >3) The system hosting the containerized dpdk application ties the other end
> >of the tun/tap interface established in (2) to some other forwarding
> >mechanism (ostensibly a host-based dpdk forwarder) to send the frame out on
> >the physical wire.
>
> There are two kinds of vhost backend:
> (1) vhost-user: no need for a tun/tap. The cvio PMD connects to the backend
> socket and communicates memory region information with the vhost-user
> backend (the backend is another DPDK application using the vhost PMD by
> Tetsuya, or using the vhost library like the vhost example does).
> (2) vhost-net: here we need a tun/tap. When we open the /dev/vhost-net char
> device and do some ioctls on it, it starts a kthread (the backend). We need
> an interface (tun/tap) as an agent to blend into kernel networking, so that
> the kthread knows where to send the packets (sent by the frontend), and
> where to receive packets to send to the frontend.
>
> To be honest, vhost-user is the preferred way to achieve high performance.
> As far as vhost-net is concerned, it goes through the kernel network stack,
> which is the performance bottleneck.
>

Sure, that makes sense. So in the vhost-user case, we just read/write to a
shared memory region? I.e. no user/kernel space transition for the nominal
data path? If that's the case, then that's the piece I'm missing.

Neil
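As a rough illustration of the "just read/write to a shared memory region"
point above (this is not code from the series, and the ring layout is heavily
simplified; all names are hypothetical): once the control channel has handed
the backend the shared-memory layout, transmitting a packet is only plain
stores into the shared ring plus, at most, an eventfd doorbell.

/* Hypothetical, simplified TX path over memory shared with the backend. */
#include <stdint.h>
#include <unistd.h>

struct ring_desc {
	uint64_t addr;  /* buffer address inside a shared region (a CVA here) */
	uint32_t len;   /* buffer length */
};

struct tx_ring {
	struct ring_desc *desc; /* descriptor array in shared memory */
	uint16_t *avail_idx;    /* producer index, also in shared memory */
	uint16_t size;          /* number of descriptors */
	int kickfd;             /* eventfd the backend polls, or -1 */
};

static void
tx_enqueue(struct tx_ring *r, uint64_t buf_cva, uint32_t len)
{
	uint16_t slot = (uint16_t)(*r->avail_idx % r->size);

	/* Plain stores: the backend sees them through the shared mapping,
	 * so no user/kernel transition is involved for the data itself. */
	r->desc[slot].addr = buf_cva;
	r->desc[slot].len = len;
	(*r->avail_idx)++;

	/* The only syscall is the optional doorbell to wake the backend. */
	if (r->kickfd >= 0) {
		uint64_t one = 1;
		(void)write(r->kickfd, &one, sizeof(one));
	}
}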