DPDK patches and discussions
From: Tetsuya Mukawa <mukawa@igel.co.jp>
To: "Tan, Jianfeng" <jianfeng.tan@intel.com>, "dev@dpdk.org" <dev@dpdk.org>
Cc: "nakajima.yoshihiro@lab.ntt.co.jp"
	<nakajima.yoshihiro@lab.ntt.co.jp>,
	"mst@redhat.com" <mst@redhat.com>
Subject: Re: [dpdk-dev] [PATCH v1 0/2] Virtio-net PMD Extension to work on host
Date: Mon, 28 Dec 2015 20:06:19 +0900
Message-ID: <568117AB.1080605@igel.co.jp>
In-Reply-To: <ED26CBA2FAD1BF48A8719AEF02201E36680E67@SHSMSX103.ccr.corp.intel.com>

On 2015/12/24 23:05, Tan, Jianfeng wrote:
> Hi Tetsuya,
>
> After several days' studying your patch, I have some questions as follows:
>
> 1. Is physically-contig memory really necessary?
> This is too strong a requirement IMHO. IVSHMEM doesn't require this in its original meaning. So what do you think of
> Huawei Xie's idea of using virtual addresses for address translation? (In addition, the virtual address of mem_table could be
> different in the application and QTest, but this can be addressed because the SET_MEM_TABLE msg will be intercepted by
> QTest)

Hi Jianfeng,

Thanks for your suggestion.
Huawei's idea may remove the contiguous-memory restriction.
Please give me some time to look into it further.
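
For reference, here is a minimal sketch of the kind of region-based
translation that idea implies, assuming the backend is given a table of
regions, each pairing the address the driver writes into descriptors
with the backend's own mapping of the same memory. The struct and the
function names are hypothetical; they are not taken from the patch or
from the DPDK vhost library.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical region descriptor: one entry per shared mapping. */
struct mem_region {
	uint64_t drv_addr;   /* address the driver writes into descriptors */
	uint64_t size;       /* length of the region in bytes */
	uint64_t local_addr; /* where the backend mapped the same memory */
};

/*
 * Translate a driver-side address into a backend-side address.
 * Returns 0 if [addr, addr + len) does not fit inside a single region.
 */
static uint64_t
translate_addr(const struct mem_region *tbl, size_t n,
	       uint64_t addr, uint64_t len)
{
	size_t i;

	for (i = 0; i < n; i++) {
		const struct mem_region *r = &tbl[i];
		uint64_t off;

		if (addr < r->drv_addr)
			continue;
		off = addr - r->drv_addr;
		/* Written this way to avoid overflow in off + len. */
		if (off < r->size && len <= r->size - off)
			return r->local_addr + off;
	}
	return 0;
}
```

With a table like this, the driver-side address does not have to be a
physical address at all; any address space works as long as both sides
agree on the table.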

> 2. Is root privilege OK in the container case?
> Another reason we'd like to give up the physically-contig feature is that it needs root privilege to read the /proc/self/pagemap
> file. Containers have already been widely criticized for weak security isolation. Enabling root privilege will make it worse.

I haven't checked how to run a DPDK application in a non-privileged
container, but it would be great if we could.

I guess that if we allocate memory the way you did, we probably won't
need to read "/proc/self/pagemap", and we would then be able to run a
DPDK application in a non-privileged container.
Is this correct?
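
To make the dependency concrete, here is a rough sketch of how a
virtual address is usually turned into a physical one through
"/proc/self/pagemap". It is not the code from the patch or from EAL;
the function names are made up for illustration. It relies on the
documented pagemap layout: one 64-bit entry per page, bit 63 meaning
"present" and bits 0-54 holding the page frame number.

```c
#include <fcntl.h>
#include <stdint.h>
#include <unistd.h>

#define PM_PRESENT  (1ULL << 63)       /* page is present in RAM */
#define PM_PFN_MASK ((1ULL << 55) - 1) /* bits 0-54: page frame number */

/* Byte offset of the 8-byte pagemap entry that describes vaddr. */
static uint64_t
pagemap_offset(uint64_t vaddr, uint64_t page_size)
{
	return (vaddr / page_size) * sizeof(uint64_t);
}

/*
 * Turn one pagemap entry into a physical address.
 * Returns 0 when the page is not present or the PFN field is zero,
 * which is what a reader sees when the kernel hides the PFN from it.
 */
static uint64_t
pagemap_entry_to_phys(uint64_t entry, uint64_t vaddr, uint64_t page_size)
{
	uint64_t pfn = entry & PM_PFN_MASK;

	if (!(entry & PM_PRESENT) || pfn == 0)
		return 0;
	return pfn * page_size + vaddr % page_size;
}

/* Look up the physical address behind p; returns 0 on any failure. */
static uint64_t
virt2phys(const void *p)
{
	uint64_t page_size = (uint64_t)sysconf(_SC_PAGESIZE);
	uint64_t vaddr = (uint64_t)(uintptr_t)p;
	uint64_t entry;
	ssize_t n;
	int fd;

	fd = open("/proc/self/pagemap", O_RDONLY);
	if (fd < 0)
		return 0;
	n = pread(fd, &entry, sizeof(entry),
		  (off_t)pagemap_offset(vaddr, page_size));
	close(fd);
	if (n != (ssize_t)sizeof(entry))
		return 0;
	return pagemap_entry_to_phys(entry, vaddr, page_size);
}
```

If the PMD never needs a physical address, none of this is called, and
the privilege requirement that comes with it goes away.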

> On the other hand, it's not easy to remove root privilege either. If we use vhost-net as the backend, the kernel will definitely
> require root privilege to create a tap device/raw socket. We tend to move such work, which requires root, into the runtime
> preparation of a container. Do you agree?

Yes, I agree. It's not easy to remove the root privilege requirement in
some cases.
I guess that if we can remove it in the vhost-user case, that will be
enough for DPDK users.
What do you think?

> 3. Is one QTest process per virtio device too heavy?
> Although we can foresee that each container will own only one virtio device, taking possible high density
> into consideration, hundreds or even thousands of containers would require the same number of QTest processes. As
> you mentioned that port hotplug is supported, is it possible to use just one QTest process for all virtio device
> emulation?

Yes, we can use PCI hotplug for that purpose.
But it may depend on the security policy.
A shared QEMU process knows all the file descriptors of the DPDK
applications' memory.
Because of this, I guess some users won't want to share a QEMU process.

If vhost-user is used, the QEMU process doesn't consume CPU resources.
So I am not sure a sleeping QEMU process would be real overhead.

BTW, if we use PCI hotplug, we need a (virtual) PCI bridge to cascade
PCI devices, so the implementation will be more complex.
Honestly, I am not sure I can finish it by the next DPDK release.
How about starting from this implementation?
If we really need this feature, we can add it then.

> As you know, we have another solution for this (which is under heavy internal review). But I think we have lots
> of common problems to be solved, right?

Yes, I think so. And thanks for the good suggestions.

Tetsuya,

> Thanks for your great work!
>
> Thanks,
> Jianfeng
>
>> -----Original Message-----
>> From: Tetsuya Mukawa [mailto:mukawa@igel.co.jp]
>> Sent: Wednesday, December 16, 2015 4:37 PM
>> To: dev@dpdk.org
>> Cc: nakajima.yoshihiro@lab.ntt.co.jp; Tan, Jianfeng; Xie, Huawei;
>> mst@redhat.com; marcandre.lureau@gmail.com; Tetsuya Mukawa
>> Subject: [PATCH v1 0/2] Virtio-net PMD Extension to work on host
>>
>> [Change log]
>>
>> PATCH v1:
>> (Listing only functionality changes and important bug fixes)
>> * Support virtio-net interrupt handling.
>>   (This means the virtio-net PMD on host and guest have the same virtio-net features)
>> * Fix memory allocation method to allocate contiguous memory correctly.
>> * Port Hotplug is supported.
>> * Rebase on DPDK-2.2.
>>
>>
>> [Abstract]
>>
>> Normally, the virtio-net PMD only works in a VM, because there is no virtio-net
>> device on the host.
>> This RFC patch extends the virtio-net PMD so that it can work on the host as a
>> virtual PMD.
>> But we didn't implement the virtio-net device as a part of the virtio-net PMD.
>> To prepare a virtio-net device for the PMD, start a QEMU process in the special
>> QTest mode, then connect to it from the virtio-net PMD through a unix domain
>> socket.
>>
>> The virtio-net PMD on the host is fully compatible with the PMD on a guest.
>> We can use the same functionality, and connect to anything a QEMU virtio-net
>> device can.
>> For example, the PMD can use the virtio-net multi-queue function. It can also
>> connect to the vhost-net kernel module and a vhost-user backend application.
>> As with the virtio-net PMD on QEMU, the memory of the application that uses the
>> virtio-net PMD will be shared with the vhost backend application. But the vhost
>> backend application's memory will not be shared.
>>
>> The main target of this PMD is containers such as docker, rkt, lxc, etc.
>> We can isolate the related processes (virtio-net PMD process, QEMU and
>> vhost-user backend process) with containers.
>> But, to communicate through a unix domain socket, a shared directory will be
>> needed.
>>
>>
>> [How to use]
>>
>> So far, we need a QEMU patch to connect to a vhost-user backend.
>> See the patch below.
>>  - http://patchwork.ozlabs.org/patch/552549/
>> For usage, check its commit log.
>>
>>
>> [Detailed Description]
>>
>>  - virtio-net device implementation
>> This host mode PMD uses the QEMU virtio-net device. To do that, the QEMU QTest
>> functionality is used.
>> QTest is a test framework for QEMU devices. It allows us to implement a
>> device driver outside of QEMU.
>> With QTest, we can implement the DPDK application and virtio-net PMD as a
>> standalone process on the host.
>> When QEMU is invoked in QTest mode, no guest code will run.
>> To know more about QTest, see below.
>>  - http://wiki.qemu.org/Features/QTest
>>
>>  - probing devices
>> QTest provides a unix domain socket. Through this socket, the driver process can
>> access the I/O ports and memory of the QEMU virtual machine.
>> The PMD will send I/O port accesses to probe PCI devices.
>> If it finds virtio-net and ivshmem devices, it initializes them.
>> Also, the I/O port accesses of the virtio-net PMD will be sent through the socket,
>> so the virtio-net PMD can initialize the virtio-net device on QEMU correctly.
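
As an illustration of the probing step quoted above, the sketch below
builds the QTest text commands for one PCI configuration read. It is
not the code in qtest.c. It assumes the probe uses the standard PCI
configuration mechanism (address written to port 0xcf8, data read from
port 0xcfc) and the QTest "outl"/"inl" commands, which QEMU answers
with "OK" and "OK <value>" lines. All function names are hypothetical.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define PCI_CONFIG_ADDR 0xcf8u /* configuration address port */
#define PCI_CONFIG_DATA 0xcfcu /* configuration data port */

/* Encode bus/device/function/register into a config address word. */
static uint32_t
pci_cfg_addr(unsigned bus, unsigned dev, unsigned fn, unsigned reg)
{
	return 0x80000000u | ((bus & 0xffu) << 16) | ((dev & 0x1fu) << 11) |
	       ((fn & 0x7u) << 8) | (reg & 0xfcu);
}

/*
 * Format the two QTest lines for a 32-bit config read.
 * QEMU replies "OK" to the outl and "OK <value>" to the inl.
 * Returns the snprintf() result.
 */
static int
qtest_fmt_cfg_read(char *buf, size_t len,
		   unsigned bus, unsigned dev, unsigned fn, unsigned reg)
{
	return snprintf(buf, len, "outl 0x%x 0x%x\ninl 0x%x\n",
			PCI_CONFIG_ADDR,
			(unsigned)pci_cfg_addr(bus, dev, fn, reg),
			PCI_CONFIG_DATA);
}

/* Parse an "OK <value>" reply. Returns 0 on success, -1 otherwise. */
static int
qtest_parse_value(const char *reply, uint32_t *val)
{
	if (strncmp(reply, "OK ", 3) != 0)
		return -1;
	*val = (uint32_t)strtoul(reply + 3, NULL, 0);
	return 0;
}
```

Register 0 of each function holds the vendor ID in the low 16 bits and
the device ID in the high 16 bits, so a reply of 0x10001af4 would
identify a legacy virtio-net device (vendor 0x1af4, device 0x1000).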
>>
>>  - ivshmem device to share memory
>> To share the memory that the virtio-net PMD process uses, an ivshmem device will
>> be used.
>> Because an ivshmem device can only handle one file descriptor, the shared memory
>> should consist of one file.
>> To allocate such memory, EAL has a new option called "--contig-mem".
>> If the option is specified, EAL will open a file and allocate memory from
>> hugepages.
>> While initializing the ivshmem device, we can set its BAR (Base Address Register).
>> It determines the address at which a QEMU vcpu can access this shared memory.
>> We will specify the host physical address of the shared memory as this address.
>> This is very useful because we don't need to patch QEMU to calculate an
>> address offset.
>> (For example, if the virtio-net PMD process allocates memory from the shared
>> memory and then writes its physical address to a virtio-net register, the QEMU
>> virtio-net device can understand it without calculating an address offset.)
>>
>>
>> [Known issues]
>>
>>  - vhost-user
>> So far, to use vhost-user, we need to apply a patch to QEMU.
>> This is because QEMU will not send the memory information and file descriptor
>> of the ivshmem device to the vhost-user backend.
>> I have submitted the patch to QEMU.
>> See "http://patchwork.ozlabs.org/patch/552549/".
>> Also, we may have an issue in the DPDK vhost library in handling kickfd and callfd.
>> A patch for this issue is needed. I have a workaround patch, but let me
>> check it more.
>> If someone wants to check vhost-user behavior, I will describe it more in
>> a later email.
>>
>>
>>
>>
>> Tetsuya Mukawa (2):
>>   EAL: Add new EAL "--contig-mem" option
>>   virtio: Extend virtio-net PMD to support container environment
>>
>>  config/common_linuxapp                     |    1 +
>>  drivers/net/virtio/Makefile                |    4 +
>>  drivers/net/virtio/qtest.c                 | 1107
>> ++++++++++++++++++++++++++++
>>  drivers/net/virtio/virtio_ethdev.c         |  341 ++++++++-
>>  drivers/net/virtio/virtio_ethdev.h         |   12 +
>>  drivers/net/virtio/virtio_pci.h            |   25 +
>>  lib/librte_eal/common/eal_common_options.c |    7 +
>>  lib/librte_eal/common/eal_internal_cfg.h   |    1 +
>>  lib/librte_eal/common/eal_options.h        |    2 +
>>  lib/librte_eal/linuxapp/eal/eal_memory.c   |   77 +-
>>  10 files changed, 1543 insertions(+), 34 deletions(-)
>>  create mode 100644 drivers/net/virtio/qtest.c
>>
>> --
>> 2.1.4
