From: "Xie, Huawei"
To: Tetsuya Mukawa, dev@dpdk.org, yuanhan.liu@linux.intel.com, "Tan, Jianfeng"
Date: Wed, 27 Jan 2016 09:39:04 +0000
Subject: Re: [dpdk-dev] [RFC PATCH 5/5] virtio: Extend virtio-net PMD to support container environment

On 1/26/2016 10:58 AM, Tetsuya Mukawa wrote:
> On 2016/01/25 19:15, Xie, Huawei wrote:
>> On 1/22/2016 6:38 PM, Tetsuya Mukawa wrote:
>>> On 2016/01/22 17:14, Xie, Huawei wrote:
>>>> On 1/21/2016 7:09 PM, Tetsuya Mukawa wrote:
>>>>> virtio: Extend virtio-net PMD to support container environment
>>>>>
>>>>> The patch adds a new virtio-net PMD configuration that allows the PMD
>>>>> to work on the host as if it were running in a VM.
>>>>> Here is the new configuration for the virtio-net PMD:
>>>>>  - CONFIG_RTE_LIBRTE_VIRTIO_HOST_MODE
>>>>> To use this mode, EAL needs physically contiguous memory. To allocate
>>>>> such memory, add the "--shm" option to the application command line.
>>>>>
>>>>> To prepare a virtio-net device on the host, the user needs to invoke a
>>>>> QEMU process in the special qtest mode. This mode is mainly used for
>>>>> testing QEMU devices from an outer process. In this mode, no guest runs.
>>>>> Here is the QEMU command line:
>>>>>
>>>>>  $ qemu-system-x86_64 \
>>>>>      -machine pc-i440fx-1.4,accel=qtest \
>>>>>      -display none -qtest-log /dev/null \
>>>>>      -qtest unix:/tmp/socket,server \
>>>>>      -netdev type=tap,script=/etc/qemu-ifup,id=net0,queues=1 \
>>>>>      -device virtio-net-pci,netdev=net0,mq=on \
>>>>>      -chardev socket,id=chr1,path=/tmp/ivshmem,server \
>>>>>      -device ivshmem,size=1G,chardev=chr1,vectors=1
>>>>>
>>>>> * One QEMU process is needed per port.
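(For others following the thread: in qtest mode the PMD has to drive the
device entirely through that qtest socket, which speaks a simple line-based
text protocol of outl/inl/read/write commands answered with "OK ...". Below
is only a rough sketch of what a single PCI config read looks like over that
socket, assuming the socket path from the command line above; the helper and
the slot number are made up for illustration and are not code from this
series.)

    /* Illustration only: one PCI config-space read issued over the qtest
     * socket (error handling and response framing kept minimal). */
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>
    #include <sys/socket.h>
    #include <sys/un.h>

    static unsigned int qtest_pci_cfg_read(int fd, int dev, int func, int off)
    {
        char buf[128];
        ssize_t n;
        unsigned int val = 0;

        /* Select bus 0 / dev / func / offset through CONFIG_ADDRESS (0xcf8). */
        snprintf(buf, sizeof(buf), "outl 0xcf8 0x%x\n",
                 0x80000000u | (dev << 11) | (func << 8) | (off & 0xfc));
        write(fd, buf, strlen(buf));
        n = read(fd, buf, sizeof(buf) - 1);    /* expect "OK" */

        /* Read CONFIG_DATA (0xcfc); qtest answers "OK 0x<value>". */
        snprintf(buf, sizeof(buf), "inl 0xcfc\n");
        write(fd, buf, strlen(buf));
        n = read(fd, buf, sizeof(buf) - 1);
        if (n > 0) {
            buf[n] = '\0';
            sscanf(buf, "OK %x", &val);
        }
        return val;
    }

    int main(void)
    {
        struct sockaddr_un sa = { .sun_family = AF_UNIX };
        int fd = socket(AF_UNIX, SOCK_STREAM, 0);

        strcpy(sa.sun_path, "/tmp/socket");    /* -qtest unix:/tmp/socket,server */
        connect(fd, (struct sockaddr *)&sa, sizeof(sa));

        /* Vendor/device ID of the device assumed to sit in slot 3. */
        printf("ID: 0x%08x\n", qtest_pci_cfg_read(fd, 3, 0, 0));
        close(fd);
        return 0;
    }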
>>>> Does qtest support hot-plugging a virtio-net PCI device, so that we
>>>> could run one QEMU process on the host which provisions the virtio-net
>>>> virtual devices for the container?
>>> Theoretically, we can use hot plug in some cases.
>>> But I guess we have 3 concerns here.
>>>
>>> 1. Security.
>>> If we share one QEMU process between multiple DPDK applications, this
>>> QEMU process will hold the fds of applications in different containers.
>>> In some cases this will be a security concern.
>>> So I guess we need to support the current 1:1 configuration at least.
>>>
>>> 2. Shared memory.
>>> Currently, QEMU and the DPDK application map the shared memory at the
>>> same virtual address.
>>> So if multiple DPDK applications connect to one QEMU process, each DPDK
>>> application would have to use a different address for its shared memory.
>>> I guess this will be a big limitation.
>>>
>>> 3. PCI bridge.
>>> So far, QEMU has one PCI bridge, so we can connect roughly 10 PCI
>>> devices to QEMU.
>>> (I forget the correct number, but it's about 10, because some slots are
>>> reserved by QEMU.)
>>> A DPDK application needs both a virtio-net and an ivshmem device, so I
>>> guess about 5 DPDK applications can connect to one QEMU process for now.
>>> Adding more PCI bridges would solve this, but we would need a lot of
>>> implementation to support cascaded PCI bridges and PCI devices.
>>> (Also, we would need to solve the 2nd concern above.)
>>>
>>> Anyway, if we use the virtio-net PMD and the vhost-user PMD, the QEMU
>>> process will not do anything after initialization.
>>> (QEMU will try to read the qtest socket and then block, because there
>>> are no more messages after initialization.)
>>> So I guess we can ignore the overhead of these QEMU processes.
>>> If someone cannot ignore it, I guess this is one of the cases where it
>>> is nice to use your lightweight container implementation.
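On the shared memory concern (2) above: right, since QEMU and the
application have to see the region at one agreed virtual address, the
application side conceptually ends up doing something like the sketch
below. The shm name, size and fixed address here are made up, and this is
only meant to illustrate the constraint, not how the series actually
implements it.

    /* Illustration only: attaching a shared region at an agreed, fixed
     * virtual address. Every process attaching to the same QEMU instance
     * would need this exact range free. */
    #include <fcntl.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <unistd.h>

    #define SHM_NAME  "/ivshmem"       /* hypothetical name */
    #define SHM_SIZE  (1ULL << 30)     /* 1G, as in -device ivshmem,size=1G */
    #define FIXED_VA  ((void *)(uintptr_t)0x7f0000000000ULL)  /* agreed with QEMU; made up */

    int main(void)
    {
        int fd = shm_open(SHM_NAME, O_RDWR, 0600);
        if (fd < 0)
            return 1;

        /* MAP_FIXED pins the mapping to the agreed address (and would
         * clobber anything already mapped there), so each process must
         * guarantee that this range is free -- which is why sharing one
         * QEMU instance among applications with different address layouts
         * is hard, as discussed above. */
        void *va = mmap(FIXED_VA, SHM_SIZE, PROT_READ | PROT_WRITE,
                        MAP_SHARED | MAP_FIXED, fd, 0);
        if (va == MAP_FAILED)
            return 1;

        printf("shared region mapped at %p\n", va);
        munmap(va, SHM_SIZE);
        close(fd);
        return 0;
    }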
>> Thanks for the explanation. Also, in your opinion, where is the best
>> place to run the QEMU instance? If we run QEMU instances on the host,
>> for vhost-kernel support, we could get rid of the root privilege issue.
> Do you mean the following?
> If we deploy the QEMU instance on the host, we can start a container
> without root privilege.
> (But on the host, the QEMU instance still needs the privilege to access
> vhost-kernel.)

There is no issue running the QEMU instance with root privilege on the
host, but I think it is not acceptable to grant the container root
privilege.

>
> If so, I agree that deploying the QEMU instance on the host or in another
> privileged container will be nice.
> In the case of vhost-user, deploying on the host or in a non-privileged
> container will be good.
>
>> Another issue: do you plan to support multiple virtio devices in a
>> container? Currently I find the code assumes only one virtio-net device
>> in QEMU, right?
> Yes, so far 1 port needs 1 QEMU instance.
> So if you need multiple virtio devices, you need to invoke multiple QEMU
> instances.
>
> Do you want to deploy 1 QEMU instance for each DPDK application, even if
> the application has multiple virtio-net ports?
>
> So far, I am not sure whether we need it, because this type of DPDK
> application will need only one port in most cases.
> But if you need this, yes, I can implement it using the QEMU PCI hotplug
> feature.
> (But probably we can only attach about 10 ports. This will be a
> limitation.)

I am OK with supporting one virtio device for the first version.

>
>> Btw, I have read most of your qtest code. No obvious issues found so
>> far, but quite a few nits. You must have spent a lot of time on this.
>> It is great work!
> I appreciate your review!
>
> BTW, my container implementation needed a QEMU patch in the case of
> vhost-user.
> But that patch has been merged into upstream QEMU, so we don't have this
> limitation any more.

Great. Better to put the QEMU dependency information in the commit message.

>
> Thanks,
> Tetsuya
>