DPDK patches and discussions
From: Tetsuya Mukawa <mukawa@igel.co.jp>
To: "Xie, Huawei" <huawei.xie@intel.com>,
	 Zhuangyanying <ann.zhuangyanying@huawei.com>,
	"dev@dpdk.org" <dev@dpdk.org>
Cc: "nakajima.yoshihiro@lab.ntt.co.jp"
	<nakajima.yoshihiro@lab.ntt.co.jp>,
	"zhbzg@huawei.com" <zhbzg@huawei.com>,
	"gaoxiaoqiu@huawei.com" <gaoxiaoqiu@huawei.com>,
	"oscar.zhangbo@huawei.com" <oscar.zhangbo@huawei.com>,
	"zhoujingbin@huawei.com" <zhoujingbin@huawei.com>,
	"guohongzhen@huawei.com" <guohongzhen@huawei.com>
Subject: Re: [dpdk-dev] vhost compliant virtio based networking interface in container
Date: Tue, 25 Aug 2015 11:58:41 +0900	[thread overview]
Message-ID: <55DBD9E1.3050609@igel.co.jp> (raw)
In-Reply-To: <C37D651A908B024F974696C65296B57B2BDA19E3@SHSMSX101.ccr.corp.intel.com>

Hi Xie and Yanping,


May I ask you some questions?
It seems we are also developing almost the same thing.

On 2015/08/20 19:14, Xie, Huawei wrote:
> Added dev@dpdk.org
>
> On 8/20/2015 6:04 PM, Xie, Huawei wrote:
>> Yanping:
>> I read your mail; it seems what we did is quite similar. Here I wrote a
>> quick mail to describe our design. Let me know if it is the same thing.
>>
>> Problem Statement:
>> We don't have a high-performance networking interface in containers for
>> NFV. The current veth-pair-based interface can't be easily accelerated.
>>
>> The key components involved:
>>     1.    DPDK-based virtio PMD driver in the container.
>>     2.    Device simulation framework in the container.
>>     3.    DPDK (or kernel) vhost running on the host.
>>
>> How is the virtio device created?
>> A: There is no "real" virtio-pci device in the container environment.
>> 1). The host maintains pools of memory and shares memory with the
>> container. This could be accomplished by having the host share a huge
>> page file with the container.
>> 2). The container creates virtio rings on the shared memory.
>> 3). The container creates mbuf memory pools on the shared memory.
>> 4). The container sends the memory and vring information to vhost
>> through vhost messages. This could be done either through an ioctl call
>> or a vhost-user message.
>>
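Just to check that we read the shared-memory setup the same way, here
is a minimal sketch of the container side mapping such a shared huge
page file before building the rings and mbuf pools on it. The file path
and size are assumptions for illustration only, not your actual
implementation:

/* Container side: map a huge page file that the host has shared.
 * Path and size below are placeholders.
 */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

#define SHM_PATH "/dev/hugepages/container_shm"   /* assumed path */
#define SHM_SIZE (256UL * 1024 * 1024)            /* assumed size */

int main(void)
{
        int fd = open(SHM_PATH, O_RDWR);

        if (fd < 0) {
                perror("open");
                return EXIT_FAILURE;
        }

        /* Host and container map the same file; only the virtual base
         * addresses differ, which is the fixed-offset mapping mentioned
         * later in this mail. The vrings and mbuf pools then live inside
         * this area.
         */
        void *base = mmap(NULL, SHM_SIZE, PROT_READ | PROT_WRITE,
                          MAP_SHARED, fd, 0);
        if (base == MAP_FAILED) {
                perror("mmap");
                close(fd);
                return EXIT_FAILURE;
        }

        printf("shared memory mapped at %p\n", base);

        munmap(base, SHM_SIZE);
        close(fd);
        return EXIT_SUCCESS;
}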
>> How are vhost messages sent?
>> A: There are two alternative ways to do this.
>> 1) The customized virtio PMD is responsible for all vring creation and
>> vhost message sending.

The above is our approach so far, and it seems Yanping also takes this
kind of approach. We are using the vhost-user functionality instead of
the vhost-net kernel module; this is probably the difference between
Yanping's implementation and ours.

BTW, we are going to submit a vhost PMD for DPDK 2.2. This PMD is
implemented on top of librte_vhost. It allows a DPDK application to
handle a vhost-user (or vhost-cuse) backend as a normal NIC port.
This PMD should work with both Xie's and Yanping's approaches.
(In the case of Yanping's approach, we may need vhost-cuse.)
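To make the "normal NIC port" point concrete, here is a rough sketch of
how an application would drive such a port through the standard ethdev
API. The --vdev arguments in the comment and the port id are
assumptions, since the PMD is not merged yet:

/* Rough sketch: a vhost PMD port is used like any other ethdev port.
 * Assumed EAL command line (vdev name and args are placeholders):
 *   ./app -c 0x1 -n 4 --vdev 'eth_vhost0,iface=/tmp/sock0' --
 */
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#include <rte_debug.h>
#include <rte_eal.h>
#include <rte_ethdev.h>
#include <rte_lcore.h>
#include <rte_mbuf.h>
#include <rte_mempool.h>

#define PORT_ID  0      /* assume the vhost vdev shows up as port 0 */
#define NB_DESC  128
#define BURST_SZ 32

int main(int argc, char **argv)
{
        struct rte_eth_conf port_conf;
        struct rte_mempool *pool;
        struct rte_mbuf *bufs[BURST_SZ];
        uint16_t i, nb;

        memset(&port_conf, 0, sizeof(port_conf));

        if (rte_eal_init(argc, argv) < 0)
                rte_exit(EXIT_FAILURE, "EAL init failed\n");

        pool = rte_pktmbuf_pool_create("mbuf_pool", 8192, 256, 0,
                                       RTE_MBUF_DEFAULT_BUF_SIZE,
                                       rte_socket_id());
        if (pool == NULL)
                rte_exit(EXIT_FAILURE, "mbuf pool creation failed\n");

        /* Configure and start the port exactly as for a physical NIC. */
        if (rte_eth_dev_configure(PORT_ID, 1, 1, &port_conf) < 0 ||
            rte_eth_rx_queue_setup(PORT_ID, 0, NB_DESC, rte_socket_id(),
                                   NULL, pool) < 0 ||
            rte_eth_tx_queue_setup(PORT_ID, 0, NB_DESC, rte_socket_id(),
                                   NULL) < 0 ||
            rte_eth_dev_start(PORT_ID) < 0)
                rte_exit(EXIT_FAILURE, "port setup failed\n");

        for (;;) {
                nb = rte_eth_rx_burst(PORT_ID, 0, bufs, BURST_SZ);
                for (i = 0; i < nb; i++)
                        rte_pktmbuf_free(bufs[i]);  /* just drop packets */
        }

        return 0;
}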

>> 2) We could do this through a lightweight device simulation framework.
>>     The device simulation creates a simple PCI bus. On the PCI bus,
>> virtio-net PCI devices are created. The device simulation provides an
>> IOAPI for MMIO/IO access.

Does this mean you implemented a kernel module?
If so, do you still need the vhost-cuse functionality to handle vhost
messages in userspace?
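Just to clarify what I am asking about: I imagine the IOAPI as
something like the function-pointer interface sketched below, called by
the virtio PMD instead of inb/outb. The names and signatures are purely
my guess, not your actual framework:

/* Hypothetical sketch of an "IOAPI"; names and signatures are guesses. */
#include <stddef.h>
#include <stdint.h>

struct pseudo_pci_dev;   /* opaque handle owned by the device simulation */

struct ioapi_ops {
        /* Replacement for inb/inw/inl on the virtio I/O BAR. */
        uint64_t (*io_read)(struct pseudo_pci_dev *dev,
                            uint64_t offset, size_t len);
        /* Replacement for outb/outw/outl on the virtio I/O BAR.
         * The device simulation behind this call updates its device
         * state machine and, when needed, talks to vhost.
         */
        void (*io_write)(struct pseudo_pci_dev *dev,
                         uint64_t offset, uint64_t value, size_t len);
};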

>>    2.1  The virtio PMD configures the pseudo virtio device just as it
>> does in a KVM guest environment.
>>    2.2  Rather than using I/O instructions, the virtio PMD uses the
>> IOAPI for I/O operations on the virtio-net PCI device.
>>    2.3  The device simulation is responsible for the device state
>> machine simulation.
>>    2.4  The device simulation is responsible for talking to vhost.
>>      With this approach, we could minimize the virtio PMD modifications.
>> To the virtio PMD, it is just like configuring a real virtio-net PCI
>> device.
>>
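For 2.1/2.2, I picture something like the usual legacy virtio-pci
initialization sequence, only expressed against a hypothetical
ioapi_write(). The register offsets and status bits below are the
standard legacy virtio-pci ones; everything else is an assumption:

/* Legacy virtio-pci init through a hypothetical ioapi_write(); the stub
 * below only prints, while a real framework would drive its device
 * state machine and talk to vhost.
 */
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Legacy virtio-pci register offsets and status bits. */
#define VIRTIO_PCI_GUEST_FEATURES  4
#define VIRTIO_PCI_QUEUE_PFN       8
#define VIRTIO_PCI_QUEUE_SEL      14
#define VIRTIO_PCI_STATUS         18

#define VIRTIO_STATUS_ACK       0x01
#define VIRTIO_STATUS_DRIVER    0x02
#define VIRTIO_STATUS_DRIVER_OK 0x04

struct pseudo_pci_dev;   /* opaque, owned by the device simulation */

static void ioapi_write(struct pseudo_pci_dev *dev, uint64_t off,
                        uint64_t val, size_t len)
{
        (void)dev;   /* print-only stand-in so the sketch runs */
        printf("io write: off=%" PRIu64 " val=0x%" PRIx64 " len=%zu\n",
               off, val, len);
}

static void pseudo_virtio_init(struct pseudo_pci_dev *dev,
                               uint32_t features, uint64_t vring_vfn)
{
        /* 1. Reset, then acknowledge and claim the device. */
        ioapi_write(dev, VIRTIO_PCI_STATUS, 0, 1);
        ioapi_write(dev, VIRTIO_PCI_STATUS, VIRTIO_STATUS_ACK, 1);
        ioapi_write(dev, VIRTIO_PCI_STATUS,
                    VIRTIO_STATUS_ACK | VIRTIO_STATUS_DRIVER, 1);

        /* 2. Feature negotiation, as for a real device. */
        ioapi_write(dev, VIRTIO_PCI_GUEST_FEATURES, features, 4);

        /* 3. Select queue 0 and publish the ring location; the twist is
         * that a VFN (virtual frame number) goes into the PFN register,
         * as described further down in this mail.
         */
        ioapi_write(dev, VIRTIO_PCI_QUEUE_SEL, 0, 2);
        ioapi_write(dev, VIRTIO_PCI_QUEUE_PFN, vring_vfn, 4);

        /* 4. Tell the device simulation we are ready. */
        ioapi_write(dev, VIRTIO_PCI_STATUS,
                    VIRTIO_STATUS_ACK | VIRTIO_STATUS_DRIVER |
                    VIRTIO_STATUS_DRIVER_OK, 1);
}

int main(void)
{
        pseudo_virtio_init(NULL, 0, 0x7f0000000ULL);
        return 0;
}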
>> Memory mapping?
>> A: QEMU can access the whole guest memory in a KVM environment. We
>> need to fill that gap here.
>> The container maps the shared memory into its virtual address space,
>> and the host maps it into its own virtual address space; there is a
>> fixed-offset mapping between the two.
>> The container creates the shared vrings on this memory. The container
>> also creates the mbuf memory pool on the shared memory.
>> In the VHOST_SET_MEM_TABLE message, we send the memory mapping
>> information for the shared memory. Since we require the mbuf pool to
>> be created on the shared memory, and buffers are allocated from that
>> mbuf pool, the DPDK vhost can translate the GPA in a vring descriptor
>> to a host virtual address.
>>
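A small sketch of what we build before that message, as we understand
it. The layout follows struct vhost_memory from <linux/vhost.h>, with
the container virtual address doubling as the "guest physical" base
(the CVA trick described below); the sizes are placeholders:

/* Build a one-region memory table describing the shared area. In the
 * real code the base comes from the mmap() of the shared huge page
 * file; malloc() here just keeps the sketch self-contained.
 */
#include <linux/vhost.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#define SHM_SIZE (256UL * 1024 * 1024)     /* assumed region size */

int main(void)
{
        void *container_base = malloc(4096);   /* stand-in for the mmap() */
        struct vhost_memory *mem;

        mem = calloc(1, sizeof(*mem) + sizeof(struct vhost_memory_region));
        if (mem == NULL || container_base == NULL)
                return EXIT_FAILURE;

        mem->nregions = 1;
        /* The CVA is reused as the "guest physical" address of the region. */
        mem->regions[0].guest_phys_addr = (uintptr_t)container_base;
        mem->regions[0].memory_size     = SHM_SIZE;
        mem->regions[0].userspace_addr  = (uintptr_t)container_base;

        /* Kernel vhost:  ioctl(vhost_fd, VHOST_SET_MEM_TABLE, mem);
         * vhost-user:    the same values travel in the SET_MEM_TABLE
         *                message together with the fd of the shared
         *                file, so the host can mmap() it by itself.
         */
        printf("region: gpa=0x%llx size=0x%llx uva=0x%llx\n",
               (unsigned long long)mem->regions[0].guest_phys_addr,
               (unsigned long long)mem->regions[0].memory_size,
               (unsigned long long)mem->regions[0].userspace_addr);

        free(mem);
        free(container_base);
        return 0;
}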
>>
>> GPA or CVA in the vring desc?
>> To ease the memory translation, rather than using a GPA, here we use a
>> CVA (container virtual address). This is the tricky part.
>> 1) The virtio PMD writes the vring's VFN rather than a PFN to the PFN
>> register through the IOAPI.
>> 2) The device simulation framework will use the VFN as the PFN.
>> 3) The device simulation sends SET_VRING_ADDR with the CVA.
>> 4) The virtio PMD fills the vring descriptors with the CVA of the mbuf
>> data pointer rather than a GPA.
>> So when the host sees the CVA, it can translate it to an HVA (host
>> virtual address).
>>
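For what it is worth, the host-side translation this scheme relies on
then reduces to one fixed offset per region, something like the sketch
below. The structure and numbers are illustrative, not the DPDK vhost
internals:

/* Translate a descriptor address (really a CVA) to a host virtual
 * address using the fixed offset between the two mappings.
 */
#include <stdint.h>
#include <stdio.h>

struct shared_region {
        uint64_t cva_base;   /* container VA of the region (sent as "GPA") */
        uint64_t size;
        uint64_t hva_base;   /* where the host mmap()ed the same file      */
};

static uint64_t cva_to_hva(const struct shared_region *r, uint64_t cva)
{
        if (cva < r->cva_base || cva >= r->cva_base + r->size)
                return 0;    /* outside the shared region */
        return cva - r->cva_base + r->hva_base;
}

int main(void)
{
        struct shared_region r = {
                .cva_base = 0x7f0000000000ULL,   /* made-up numbers */
                .size     = 256ULL << 20,
                .hva_base = 0x7f8000000000ULL,
        };
        uint64_t desc_addr = 0x7f0000001000ULL;  /* CVA from a vring desc */

        printf("desc 0x%llx -> host VA 0x%llx\n",
               (unsigned long long)desc_addr,
               (unsigned long long)cva_to_hva(&r, desc_addr));
        return 0;
}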
>> Worth noting:
>> The virtio interface in the container follows the vhost message format
>> and is compliant with the DPDK vhost implementation, i.e., no DPDK
>> vhost modification is needed.
>> vhost isn't aware of whether the incoming virtio frontend comes from a
>> KVM guest or a container.
>>
>> That pretty much covers the high-level design. There are quite a few
>> low-level issues. For example, a 32-bit PFN is enough for a KVM guest,
>> but since we use a 64-bit VFN (virtual page frame number), a trick is
>> done here through a special IOAPI.
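As a tiny illustration of why the 32-bit PFN register is a problem here
(just the arithmetic, not your actual workaround):

/* A 64-bit user-space virtual address shifted by the 4 KiB page shift
 * generally does not fit in the 32-bit legacy QUEUE_PFN register.
 */
#include <stdint.h>
#include <stdio.h>

int main(void)
{
        uint64_t vring_cva = 0x7f0000000000ULL;  /* typical 64-bit user VA */
        uint64_t vfn = vring_cva >> 12;

        printf("VFN = 0x%llx, fits in 32 bits: %s\n",
               (unsigned long long)vfn,
               vfn <= UINT32_MAX ? "yes" : "no");
        return 0;
}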

In addition to the above, we might need to consider the kernel
"namespace" functionality. Technically, it would not be a big problem,
but it is related to security, so it would be nice to take it into
account.

Regards,
Tetsuya

>> /huawei

Thread overview: 7+ messages
     [not found] <C37D651A908B024F974696C65296B57B2BD9F976@SHSMSX101.ccr.corp.intel.com>
2015-08-20 10:14 ` Xie, Huawei
2015-08-25  2:58   ` Tetsuya Mukawa [this message]
2015-08-25  9:56     ` Xie, Huawei
2015-08-26  9:23       ` Tetsuya Mukawa
2015-09-07  5:54         ` Xie, Huawei
2015-09-08  4:44           ` Tetsuya Mukawa
2015-09-14  3:15             ` Xie, Huawei
