From: Thomas Monjalon
To: Jianfeng Tan
Cc: dev@dpdk.org, nakajima.yoshihiro@lab.ntt.co.jp, mst@redhat.com, ann.zhuangyanying@huawei.com
Date: Wed, 13 Apr 2016 18:14:41 +0200
Message-ID: <1642018.IWC2Tt5SYA@xps13>
In-Reply-To: <1454671228-33284-1-git-send-email-jianfeng.tan@intel.com>
Subject: Re: [dpdk-dev] [PATCH v2 0/5] virtio support for container

Hi Jianfeng,

Thanks for raising the container issues and proposing some solutions.
General comments below.

2016-02-05 19:20, Jianfeng Tan:
> This patchset provides a high-performance networking interface (virtio)
> for container-based DPDK applications. Starting DPDK apps in containers
> with exclusive ownership of NIC devices is beyond its scope.
> The basic idea is to present a new virtual device (named eth_cvio),
> which can be discovered and initialized by container-based DPDK apps
> using rte_eal_init(). To minimize changes, we reuse the existing virtio
> frontend driver code (driver/net/virtio/).
>
> Compared to the QEMU/VM case, the virtio device framework (which
> translates I/O port read/write operations into the unix socket/cuse
> protocol, and is originally provided by QEMU) is integrated into the
> virtio frontend driver.
So this converged driver
> actually plays both the role of the original frontend driver and the
> role of the QEMU device framework.
>
> The major difference lies in how the addresses handed to vhost are
> calculated. The principle of virtio is this: based on one or more shared
> memory segments, vhost maintains a reference table with the base address
> and length of each segment, so that an address coming from the VM
> (usually a GPA, Guest Physical Address) can be translated into a
> vhost-recognizable address (named VVA, Vhost Virtual Address). To
> decrease the overhead of address translation, we should maintain as few
> segments as possible. In the VM case, GPAs are always locally
> contiguous. In the container case, the CVA (Container Virtual Address)
> can be used instead. Specifically:
> a. in set_base_addr, the CVA is used;
> b. when preparing RX descriptors, CVAs are used;
> c. when transmitting packets, CVAs are filled into TX descriptors;
> d. in the TX and CQ headers, CVAs are used.
>
> How to share memory? In the VM case, qemu always shares the whole
> physical memory layout with the backend. But it is not feasible for a
> container, as a process, to share all of its virtual memory regions with
> the backend. So only specified virtual memory regions (of the shared
> type) are sent to the backend. It is a limitation that only addresses
> within these areas can be used to transmit or receive packets.
>
> Known issues:
>
> a. When used with vhost-net, root privilege is required to create the
> tap device inside.
> b. Control queue and multi-queue are not supported yet.
> c. When the --single-file option is used, the socket_id of the memory
> may be wrong. (Use "numactl -N x -m x" to work around this for now.)

There are 2 different topics in this patchset:
1/ How to provide networking in containers
2/ How to provide memory in containers

1/ You have decided to use the virtio spec to bridge the host with its
containers. But there is no virtio device in a container and no vhost
interface in the host (except the kernel one).
So you are extending virtio to work as a vdev inside the container.
Could you explain what the datapath between virtio and the host app is?
Does it need to use a fake device from Qemu as Tetsuya has done?
Do you think there can be some alternatives to vhost/virtio in
containers?

2/ The memory management is already a mess and it is getting worse.
I think we need to define the requirements first and then write a proper
implementation covering every identified need.
I have started a new thread to cover this part:
http://thread.gmane.org/gmane.comp.networking.dpdk.devel/37445
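
For readers following the thread: the segment-based translation the cover
letter describes can be sketched roughly as below. The names (mem_region,
to_vva) are illustrative only, not the actual DPDK/vhost API; the real
implementation differs.

```c
/* Minimal sketch of the vhost-side address translation described above:
 * vhost keeps (base address, length, mapped address) per shared memory
 * segment, and translates an incoming guest/container address (GPA in
 * the VM case, CVA in the container case) into a VVA. */
#include <stdint.h>
#include <stddef.h>

struct mem_region {
	uint64_t guest_addr;  /* base as seen by the VM (GPA) or container (CVA) */
	uint64_t size;        /* segment length */
	uint64_t host_vaddr;  /* where vhost mapped the segment (VVA base) */
};

/* Translate addr to a Vhost Virtual Address; return 0 if no segment
 * covers it (hence the limitation that only addresses inside shared
 * regions can be used for packet buffers). */
static uint64_t
to_vva(const struct mem_region *regs, size_t n, uint64_t addr)
{
	for (size_t i = 0; i < n; i++) {
		if (addr >= regs[i].guest_addr &&
		    addr - regs[i].guest_addr < regs[i].size)
			return regs[i].host_vaddr +
			       (addr - regs[i].guest_addr);
	}
	return 0;
}
```

This also shows why few segments matter: the lookup is per descriptor, so
its cost grows with the number of regions.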
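
On topic 2/, sharing a region with an external backend basically amounts
to backing it with a file descriptor and mapping it MAP_SHARED, so the fd
can be handed to the backend (as vhost-user does over its unix socket
with SCM_RIGHTS). A minimal sketch, under the assumption of a plain
temporary file rather than the hugepage file the --single-file option
uses; map_shared_file is a hypothetical helper, not part of the patchset:

```c
/* Back a region with an fd and map it MAP_SHARED, so another process
 * given the fd can mmap the same pages. Sketch only. */
#include <stdlib.h>
#include <unistd.h>
#include <sys/mman.h>

static void *
map_shared_file(size_t len, int *out_fd)
{
	char path[] = "/tmp/cvio-mem-XXXXXX";  /* illustrative path */
	int fd = mkstemp(path);

	if (fd < 0)
		return MAP_FAILED;
	/* The name is not needed once we have the fd: the fd itself is
	 * what would be passed to the backend over the unix socket. */
	unlink(path);

	if (ftruncate(fd, (off_t)len) < 0) {
		close(fd);
		return MAP_FAILED;
	}

	void *va = mmap(NULL, len, PROT_READ | PROT_WRITE,
			MAP_SHARED, fd, 0);
	if (va == MAP_FAILED) {
		close(fd);
		return MAP_FAILED;
	}
	*out_fd = fd;
	return va;  /* this (va, len) pair is the CVA region sent to vhost */
}
```

The backend would mmap the received fd and record (va, len, its own
mapping) as one entry of its translation table.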