From mboxrd@z Thu Jan 1 00:00:00 1970
From: Tetsuya Mukawa
To: Jianfeng Tan, dev@dpdk.org, huawei.xie@intel.com
Cc: nakajima.yoshihiro@lab.ntt.co.jp, mst@redhat.com, ann.zhuangyanying@huawei.com
Subject: Re: [dpdk-dev] [PATCH 0/4] virtio support for container
Date: Tue, 12 Jan 2016 14:36:25 +0900
Message-ID: <569490D9.10803@igel.co.jp>
In-Reply-To: <1452426182-86851-1-git-send-email-jianfeng.tan@intel.com>
References: <1446748276-132087-1-git-send-email-jianfeng.tan@intel.com>
 <1452426182-86851-1-git-send-email-jianfeng.tan@intel.com>
List-Id: patches and discussions about DPDK

On 2016/01/10 20:42, Jianfeng Tan wrote:
> This patchset is to provide high performance networking interface (virtio)
> for container-based DPDK applications. The way of starting DPDK apps in
> containers with ownership of NIC devices exclusively is beyond the scope.
> The basic idea here is to present a new virtual device (named eth_cvio),
> which can be discovered and initialized in container-based DPDK apps using
> rte_eal_init(). To minimize the change, we reuse already-existing virtio
> frontend driver code (driver/net/virtio/).
>
> Compared to QEMU/VM case, virtio device framework (translates I/O port r/w
> operations into unix socket/cuse protocol, which is originally provided in
> QEMU), is integrated in virtio frontend driver.
> So this converged driver actually plays the role of original frontend
> driver and the role of QEMU device framework.
>
> The major difference lies in how to calculate relative address for vhost.
> The principle of virtio is that: based on one or multiple shared memory
> segments, vhost maintains a reference system with the base addresses and
> length for each segment so that an address from VM comes (usually GPA,
> Guest Physical Address) can be translated into vhost-recognizable address
> (named VVA, Vhost Virtual Address). To decrease the overhead of address
> translation, we should maintain as few segments as possible. In VM's case,
> GPA is always locally continuous. In container's case, CVA (Container
> Virtual Address) can be used. Specifically:
> a. when set_base_addr, CVA address is used;
> b. when preparing RX's descriptors, CVA address is used;
> c. when transmitting packets, CVA is filled in TX's descriptors;
> d. in TX and CQ's header, CVA is used.
>
> How to share memory? In VM's case, qemu always shares all physical layout
> to backend. But it's not feasible for a container, as a process, to share
> all virtual memory regions to backend. So only specified virtual memory
> regions (with type of shared) are sent to backend. It's a limitation that
> only addresses in these areas can be used to transmit or receive packets.
>
> Known issues
>
> a. When used with vhost-net, root privilege is required to create tap
>    device inside.
> b. Control queue and multi-queue are not supported yet.
> c. When --single-file option is used, socket_id of the memory may be
>    wrong. (Use "numactl -N x -m x" to work around this for now)
>
> How to use?
>
> a. Apply this patchset.
>
> b. To compile container apps:
> $: make config RTE_SDK=`pwd` T=x86_64-native-linuxapp-gcc
> $: make install RTE_SDK=`pwd` T=x86_64-native-linuxapp-gcc
> $: make -C examples/l2fwd RTE_SDK=`pwd` T=x86_64-native-linuxapp-gcc
> $: make -C examples/vhost RTE_SDK=`pwd` T=x86_64-native-linuxapp-gcc
>
> c. To build a docker image using Dockerfile below.
> $: cat ./Dockerfile
> FROM ubuntu:latest
> WORKDIR /usr/src/dpdk
> COPY . /usr/src/dpdk
> ENV PATH "$PATH:/usr/src/dpdk/examples/l2fwd/build/"
> $: docker build -t dpdk-app-l2fwd .
>
> d. Used with vhost-user
> $: ./examples/vhost/build/vhost-switch -c 3 -n 4 \
>     --socket-mem 1024,1024 -- -p 0x1 --stats 1
> $: docker run -i -t -v :/var/run/usvhost \
>     -v /dev/hugepages:/dev/hugepages \
>     dpdk-app-l2fwd l2fwd -c 0x4 -n 4 -m 1024 --no-pci \
>     --vdev=eth_cvio0,path=/var/run/usvhost -- -p 0x1
>
> f. Used with vhost-net
> $: modprobe vhost
> $: modprobe vhost-net
> $: docker run -i -t --privileged \
>     -v /dev/vhost-net:/dev/vhost-net \
>     -v /dev/net/tun:/dev/net/tun \
>     -v /dev/hugepages:/dev/hugepages \
>     dpdk-app-l2fwd l2fwd -c 0x4 -n 4 -m 1024 --no-pci \
>     --vdev=eth_cvio0,path=/dev/vhost-net -- -p 0x1
>
> By the way, it's not necessary to run in a container.
>
> Signed-off-by: Huawei Xie
> Signed-off-by: Jianfeng Tan
>
> Jianfeng Tan (4):
>   mem: add --single-file to create single mem-backed file
>   mem: add API to obstain memory-backed file info
>   virtio/vdev: add ways to interact with vhost
>   virtio/vdev: add a new vdev named eth_cvio
>
>  config/common_linuxapp                     |   5 +
>  drivers/net/virtio/Makefile                |   4 +
>  drivers/net/virtio/vhost.c                 | 734 +++++++++++++++++++++++++++++
>  drivers/net/virtio/vhost.h                 | 192 ++++++++
>  drivers/net/virtio/virtio_ethdev.c         | 338 ++++++++++---
>  drivers/net/virtio/virtio_ethdev.h         |   4 +
>  drivers/net/virtio/virtio_pci.h            |  52 +-
>  drivers/net/virtio/virtio_rxtx.c           |  11 +-
>  drivers/net/virtio/virtio_rxtx_simple.c    |  14 +-
>  drivers/net/virtio/virtqueue.h             |  13 +-
>  lib/librte_eal/common/eal_common_options.c |  17 +
>  lib/librte_eal/common/eal_internal_cfg.h   |   1 +
>  lib/librte_eal/common/eal_options.h        |   2 +
>  lib/librte_eal/common/include/rte_memory.h |  16 +
>  lib/librte_eal/linuxapp/eal/eal_memory.c   |  82 +++-
>  15 files changed, 1392 insertions(+), 93 deletions(-)
>  create mode 100644 drivers/net/virtio/vhost.c
>  create mode 100644 drivers/net/virtio/vhost.h
>

Hi Jianfeng and Xie,

I guess my implementation and yours have a lot of common code, so I will
try to rebase my patch on yours.

BTW, one thing I would need to change in your memory allocation is that
the mmapped address must be below 44 bits (32 + PAGE_SHIFT) to work with
my patch. This is because the VIRTIO_PCI_QUEUE_PFN register only accepts
such addresses.
(I may need to add one more EAL parameter like "--mmap-under ")

Thanks,
Tetsuya