From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <55DBD9E1.3050609@igel.co.jp>
Date: Tue, 25 Aug 2015 11:58:41 +0900
From: Tetsuya Mukawa
To: "Xie, Huawei", Zhuangyanying, "dev@dpdk.org"
Cc: "nakajima.yoshihiro@lab.ntt.co.jp", "zhbzg@huawei.com",
 "gaoxiaoqiu@huawei.com", "oscar.zhangbo@huawei.com",
 "zhoujingbin@huawei.com", "guohongzhen@huawei.com"
Subject: Re: [dpdk-dev] vhost compliant virtio based networking interface in
 container

Hi Xie and Yanping,

May I ask you some questions?
It seems we are also developing almost the same thing.

On 2015/08/20 19:14, Xie, Huawei wrote:
> Added dev@dpdk.org
>
> On 8/20/2015 6:04 PM, Xie, Huawei wrote:
>> Yanping:
>> I read your mail; it seems what we did is quite similar. Here I wrote a
>> quick mail to describe our design. Let me know if it is the same thing.
>>
>> Problem Statement:
>> We don't have a high performance networking interface in containers for
>> NFV. The current veth pair based interface cannot easily be accelerated.
>>
>> The key components involved:
>> 1. DPDK based virtio PMD driver in the container.
>> 2. Device simulation framework in the container.
>> 3. DPDK (or kernel) vhost running in the host.
>>
>> How is virtio created?
>> A: There is no "real" virtio-pci device in the container environment.
>> 1) The host maintains pools of memory and shares memory with the
>> container. This can be accomplished by the host sharing a huge page file
>> with the container.
>> 2) The container creates virtio rings on the shared memory.
>> 3) The container creates mbuf memory pools on the shared memory.
>> 4) The container sends the memory and vring information to vhost through
>> vhost messages. This can be done either through an ioctl call or a
>> vhost-user message.
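If I read steps 1)-2) correctly, the container-side setup would look roughly
like the sketch below. This is only my own illustration to confirm my
understanding, not code from either design; the huge page file path and the
2MB size are placeholders, and the real vring/mbuf layout would be whatever
is negotiated with vhost.

    /*
     * Minimal sketch: the container maps a huge page file that the host
     * shared with it, and would then lay virtio rings and mbuf pools out
     * on that memory.  Path and size below are hypothetical.
     */
    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/mman.h>
    #include <unistd.h>

    #define SHM_PATH "/dev/hugepages/container-shm"  /* placeholder path  */
    #define SHM_SIZE (2UL * 1024 * 1024)             /* one 2MB huge page */

    int main(void)
    {
        int fd = open(SHM_PATH, O_RDWR);
        if (fd < 0) {
            perror("open");
            return 1;
        }

        /* MAP_SHARED so the host-side vhost sees the same ring contents
         * through its own mapping of the same file. */
        void *base = mmap(NULL, SHM_SIZE, PROT_READ | PROT_WRITE,
                          MAP_SHARED, fd, 0);
        if (base == MAP_FAILED) {
            perror("mmap");
            close(fd);
            return 1;
        }

        /* The vring/mbuf pool would live at agreed offsets inside this
         * area; here we only clear it as a stand-in for that layout. */
        memset(base, 0, SHM_SIZE);
        printf("shared memory mapped at %p\n", base);

        munmap(base, SHM_SIZE);
        close(fd);
        return 0;
    }

The MAP_SHARED mapping is the whole point here: both sides touch the same
physical huge page, each at its own virtual address.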
>>
>> How is the vhost message sent?
>> A: There are two alternative ways to do this.
>> 1) The customized virtio PMD is responsible for all the vring creation
>> and vhost message sending.

Above is our approach so far.
It seems Yanping also takes this kind of approach.
We are using the vhost-user functionality instead of the vhost-net
kernel module.
Probably this is the difference between Yanping and us.

BTW, we are going to submit a vhost PMD for DPDK-2.2.
This PMD is implemented on librte_vhost.
It allows a DPDK application to handle a vhost-user (or vhost-cuse)
backend as a normal NIC port.
This PMD should work with both Xie's and Yanping's approaches.
(In the case of Yanping's approach, we may need vhost-cuse.)

>> 2) We could do this through a lightweight device simulation framework.
>> The device simulation creates a simple PCI bus. On the PCI bus,
>> virtio-net PCI devices are created. The device simulation provides an
>> IOAPI for MMIO/IO access.

Does this mean you implemented a kernel module?
If so, do you still need the vhost-cuse functionality to handle vhost
messages in userspace?

>> 2.1 The virtio PMD configures the pseudo virtio device just as it does
>> in a KVM guest environment.
>> 2.2 Rather than using io instructions, the virtio PMD uses the IOAPI for
>> IO operations on the virtio-net PCI device.
>> 2.3 The device simulation is responsible for device state machine
>> simulation.
>> 2.4 The device simulation is responsible for talking to vhost.
>> With this approach, we could minimize the virtio PMD modifications.
>> For the virtio PMD it is like configuring a real virtio-net PCI device.
>>
>> Memory mapping?
>> A: QEMU can access the whole guest memory in a KVM environment. We need
>> to fill that gap.
>> The container maps the shared memory into the container's virtual
>> address space and the host maps it into the host's virtual address
>> space. There is a fixed offset mapping.
>> The container creates the shared vrings on that memory. The container
>> also creates the mbuf memory pools on the shared memory.
>> In the VHOST_SET_MEMORY_TABLE message, we send the memory mapping
>> information for the shared memory. As we require the mbuf pools to be
>> created on the shared memory, and buffers are allocated from those mbuf
>> pools, dpdk vhost can translate the GPA in a vring desc to a host
>> virtual address.
>>
>> GPA or CVA in vring desc?
>> To ease the memory translation, rather than using GPA, here we use
>> CVA (container virtual address). This is the tricky thing here.
>> 1) The virtio PMD writes the vring's VFN rather than the PFN to the PFN
>> register through the IOAPI.
>> 2) The device simulation framework will use the VFN as the PFN.
>> 3) The device simulation sends SET_VRING_ADDR with the CVA.
>> 4) The virtio PMD fills the vring desc with the CVA of the mbuf data
>> pointer rather than the GPA.
>> So when the host sees the CVA, it can translate it to an HVA (host
>> virtual address).
>>
>> Worth noting:
>> The virtio interface in the container follows the vhost message format
>> and is compliant with the dpdk vhost implementation, i.e. no dpdk vhost
>> modification is needed.
>> vHost isn't aware of whether the incoming virtio comes from a KVM guest
>> or a container.
>>
>> That pretty much covers the high level design. There are quite some low
>> level issues. For example, a 32-bit PFN is enough for a KVM guest, but
>> since we use a 64-bit VFN (virtual page frame number), a trick is done
>> here through a special IOAPI.

In addition to the above, we might need to consider the kernel
"namespace" functionality.
Technically it would not be a big problem, but it is related to
security, so it would be nice to take it into account.

Regards,
Tetsuya

>> /huawei
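P.S. Just to check my understanding of the "GPA or CVA in vring desc?" part:
with the shared region mapped at different base addresses in the container
and in the host, the host-side translation should reduce to a fixed-offset
add, roughly as sketched below. This is only my own illustration; the struct
and function names are made up.

    /*
     * Fixed-offset CVA -> HVA translation for one shared region.
     * container_base is the CVA at which the container mapped the huge
     * page file, host_base is the HVA at which the host mapped it.
     */
    #include <stddef.h>
    #include <stdint.h>

    struct shm_region {
        uintptr_t container_base; /* base CVA of the shared region */
        uintptr_t host_base;      /* base HVA of the same region   */
        size_t    len;            /* region length in bytes        */
    };

    /* Translate a CVA found in a vring desc into an HVA.
     * Returns 0 if the address is outside the shared region. */
    static inline uintptr_t cva_to_hva(const struct shm_region *r,
                                       uintptr_t cva)
    {
        if (cva < r->container_base || cva >= r->container_base + r->len)
            return 0;
        return cva - r->container_base + r->host_base;
    }

So the only thing the host really needs from the memory table message is the
region's container base address and length; every CVA it later finds in a
descriptor can then be resolved with that one offset.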