DPDK patches and discussions
 help / color / mirror / Atom feed
From: Tetsuya Mukawa <mukawa@igel.co.jp>
To: "Xie, Huawei" <huawei.xie@intel.com>, "dev@dpdk.org" <dev@dpdk.org>
Subject: Re: [dpdk-dev] [PATCH RFC v2 08/12] lib/librte_vhost: vhost-user support
Date: Fri, 19 Dec 2014 12:36:07 +0900
Message-ID: <54939D27.5060308@igel.co.jp> (raw)
In-Reply-To: <C37D651A908B024F974696C65296B57B0F3277D0@SHSMSX101.ccr.corp.intel.com>

(2014/12/18 2:31), Xie, Huawei wrote:
>
>> -----Original Message-----
>> From: Tetsuya Mukawa [mailto:mukawa@igel.co.jp]
>> Sent: Tuesday, December 16, 2014 9:22 PM
>> To: Xie, Huawei; dev@dpdk.org
>> Cc: Linhaifeng (haifeng.lin@huawei.com)
>> Subject: Re: [PATCH RFC v2 08/12] lib/librte_vhost: vhost-user support
>>
>> (2014/12/17 12:31), Tetsuya Mukawa wrote:
>>> (2014/12/17 10:06), Xie, Huawei wrote:
>>>>>> +{
>>>>>> +	struct virtio_net *dev = get_device(ctx);
>>>>>> +
>>>>>> +	/* We have to stop the queue (virtio) if it is running. */
>>>>>> +	if (dev->flags & VIRTIO_DEV_RUNNING)
>>>>>> +		notify_ops->destroy_device(dev);
>>>>> I have an one concern about finalization of vrings.
>>>>> Can vhost-backend stop accessing RX/TX to the vring before replying to
>>>>> this message?
>>>>>
>>>>> QEMU sends this message when virtio-net device is finalized by
>>>>> virtio-net driver on the guest.
>>>>> After finalization, memories used by the vring will be freed by
>>>>> virtio-net driver, because these memories are allocated by virtio-net
>>>>> driver.
>>>>> Because of this, I guess vhost-backend must stop accessing to vring
>>>>> before replying to this message.
>>>>>
>>>>> I am not sure what is a good way to stop accessing.
>>>>> One idea is adding a condition checking when rte_vhost_dequeue_burst()
>>>>> and rte_vhost_enqueue_burst() is called.
>>>>> Anyway we probably need to wait for stopping access before replying.
>>>>>
>>>>> Thanks,
>>>>> Tetsuya
>>>>>
>>>> I think we have discussed the similar question.
>>> Sorry, probably I might not be able to got your point correctly at the
>>> former email.
>>>
>>>> It is actually the same issue whether the virtio APP in guest is crashed, or is
>> finalized.
>>> I guess when the APP is finalized correctly, we can have a solution.
>>> Could you please read comment I wrote later?
>>>
>>>> The virtio APP will only write to the STATUS register without waiting/syncing
>> to vhost backend.
>>> Yes, virtio APP only write to the STATUS register. I agree with it.
>>>
>>> When the register is written by guest, KVM will catch it, and the
>>> context will be change to QEMU. And QEMU will works like below.
>>> (Also while doing following steps, guest is waiting because the context
>>> is in QEMU)
>>>
>>> Could you please see below with latest QEMU code?
>>> 1. virtio_ioport_write() [hw/virtio/virtio-pci.c] <= virtio APP will
>>> wait for replying of this function.
>>> 2. virtio_set_status() [hw/virtio/virtio.c]
>>> 3. virtio_net_set_status() [hw/net/virtio-net.c]
>>> 4. virtio_net_vhost_status() [hw/net/virtio-net.c]
>>> 5. vhost_net_stop() [hw/net/vhost_net.c]
>>> 6. vhost_net_stop_one() [hw/net/vhost_net.c]
>>> 7. vhost_dev_stop() [hw/virtio/vhost.c]
>>> 8. vhost_virtqueue_stop() [hw/virtio/vhost.c]
>>> 9. vhost_user_call() [virtio/vhost-user.c]
>>> 10. VHOST_USER_GET_VRING_BASE message is sent to backend. And waiting
>>> for backend reply.
>>>
>>> When the vhost-user backend receives GET_VRING_BASE, I guess the guest
>>> APP is stopped.
>>> Also QEMU will wait for vhost-user backend reply because GET_VRING_BASE
>>> is synchronous message.
>>> Because of above, I guess virtio APP can wait for vhost-backend
>>> finalization.
>>>
>>>> After that, not only the guest memory pointed by vring entry but also the
>> vring itself isn't usable any more.
>>>> The memory for vring or pointed by vring entry might be used by other APPs.
>>> I agree.
>>>
>>>> This will crash guest(rather than the vhost, do you agree?).
>>> I guess we cannot assume how the freed memory is used.
>>> In some cases, a new APP still works, but vhost backend can access
>>> inconsistency vring structure.
>>> In the case vhost backend could receive illegal packets.
>>> For example, avail_idx value might be changed to be 0xFFFF by a new APP.
>>> (I am not sure RX/TX functions can handle such a value correctly)
> Yes, I fully understand your concern and this is a very good point.
> my point is, in theory, a well written vhost backend is either able to detect the error,
> or will crash the guest VM(virtio APP or even guest OS)  if it couldn't.
> For example, if avail_idx is set to 0Xfffff, if vhost_backend accepts this value blindly, it will 
> 1. for tx ring, receive illegal packets. This is ok.
> 2. for rx ring, dma to guest memory before the availd_idx, which will crash guest virtual machine.
> In current implementation, if there is chance our vhost backend aborts itself in error handling, that is incorrect. We
> need to check our code if there is such case.

I agree. I will also check the code.

>
>>> Anyway, my point is if we can finalize vhost backend correctly, we only
>>> need to take care of crashing case.
>>> (If so, it's very nice :))
>>> So let's make sure whether virtio APP can wait for finalization, or not.
>>> I am thinking how to do it now.
> Sorry for the confuse. 
> The STATUS write must be synchronized with qemu.
> The vcpu thread for the virtio APP willn't continue until qemu  finishes the simulation
> and resumes the vcpu thread.
>
> In the RFC patch, the message handler will first call destroy_device provided by vSwitch
> which will cause the vSwitch stop processing this virtio_device, then handler will send the reply.
> Is there an issue with the RFC patch?

Thanks. I had a miss understanding for destroy_device handler.
I agree with your implementation. It should work correctly.

Thanks,
Tetsuya

>> I added sleep() like below.
>>
>> --- a/hw/virtio/virtio-pci.c
>> +++ b/hw/virtio/virtio-pci.c
>> @@ -300,7 +300,10 @@ static void virtio_ioport_write(void *opaque,
>> uint32_t addr, uint32_t val)
>>              virtio_pci_stop_ioeventfd(proxy);
>>          }
>>
>>          virtio_set_status(vdev, val & 0xFF);
>> +        if (val == 0)
>> +            sleep(10);
> Yes, the register simulation for virtio device must be synching operation. :).
>>          if (val & VIRTIO_CONFIG_S_DRIVER_OK) {
>>              virtio_pci_start_ioeventfd(proxy);
>>
>> When I type 'dpdk_nic_bind.py' to cause GET_VRING_BASE, this command
>> takes 10 seconds to be finished.
>> So we can assume that virtio APP is able to wait for finalization of
>> vhost backend.
>>
>> Thanks,
>> Tetsuya
>>
>>>> If you mean this issue, I think we have no solution but one walk around: keep
>> the huge page files of crashed app, and
>>>> bind virtio to igb_uio and then delete the huge page files.
>>> Yes I agree.
>>> If the virtio APP is crashed, this will be a solution.
>>>
>>> Thanks,
>>> Tetsuya
>>>
>>>> In our implementation, when vhost sends the message,  we will call the
>> destroy_device provided by the vSwitch to ask the
>>>> vSwitch to stop processing the vring, but this willn't solve the issue I mention
>> above, because the virtio APP in guest will n't
>>>> wait us.
>>>>
>>>> Could you explain a bit more? Is it the same issue?
>>>>
>>>>
>>>> -huawei
>>>>
>>>>
>>>>

  reply	other threads:[~2014-12-19  3:36 UTC|newest]

Thread overview: 43+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-12-10 21:37 [dpdk-dev] [PATCH RFC v2 00/12] " Huawei Xie
2014-12-10 21:37 ` [dpdk-dev] [PATCH RFC v2 01/12] lib/librte_vhost: mov vhost-cuse implementation to vhost_cuse directory Huawei Xie
2014-12-10 21:37 ` [dpdk-dev] [PATCH RFC v2 02/12] lib/librte_vhost: rename vhost-net-cdev.h as vhost-net.h Huawei Xie
2014-12-10 21:37 ` [dpdk-dev] [PATCH RFC v2 03/12] lib/librte_vhost: move event_copy logic from virtio-net.c to vhost-net-cdev.c Huawei Xie
2015-01-07  9:10   ` Xie, Huawei
2014-12-10 21:37 ` [dpdk-dev] [PATCH RFC v2 04/12] lib/librte_vhost: copy of host_memory_map from virtio-net.c to new file virtio-net-cdev.c Huawei Xie
2014-12-10 21:37 ` [dpdk-dev] [PATCH RFC v2 05/12] lib/librte_vhost: host_memory_map refine Huawei Xie
2014-12-10 21:37 ` [dpdk-dev] [PATCH RFC v2 06/12] lib/librte_vhost: cuse_set_memory_table Huawei Xie
2014-12-15  5:20   ` Tetsuya Mukawa
2014-12-10 21:37 ` [dpdk-dev] [PATCH RFC v2 07/12] lib/librte_vhost: async event and callback Huawei Xie
2014-12-15  5:20   ` Tetsuya Mukawa
2014-12-17 17:51     ` Xie, Huawei
2014-12-10 21:37 ` [dpdk-dev] [PATCH RFC v2 08/12] lib/librte_vhost: vhost-user support Huawei Xie
2014-12-11  5:36   ` Linhaifeng
2015-01-05 10:21     ` Xie, Huawei
2015-01-23  3:40     ` Xie, Huawei
2015-01-23  3:53       ` Linhaifeng
2014-12-11  6:04   ` Linhaifeng
2014-12-11 17:13     ` Xie, Huawei
2014-12-12  2:25       ` Linhaifeng
2014-12-11 20:16     ` Xie, Huawei
2015-01-23  3:36     ` Xie, Huawei
2015-01-23  8:36       ` Linhaifeng
2014-12-16  3:05   ` Tetsuya Mukawa
2014-12-17  1:06     ` Xie, Huawei
2014-12-17  3:31       ` Tetsuya Mukawa
2014-12-17  4:22         ` Tetsuya Mukawa
2014-12-17 17:31           ` Xie, Huawei
2014-12-19  3:36             ` Tetsuya Mukawa [this message]
2014-12-24  7:21   ` Tetsuya Mukawa
2015-01-04  9:53     ` Xie, Huawei
2014-12-10 21:37 ` [dpdk-dev] [PATCH RFC v2 09/12] lib/librte_vhost: minor fix Huawei Xie
2014-12-10 21:37 ` [dpdk-dev] [PATCH RFC v2 10/12] lib/librte_vhost: vhost-user memory region map Huawei Xie
2014-12-16  2:38   ` Tetsuya Mukawa
2014-12-10 21:37 ` [dpdk-dev] [PATCH RFC v2 11/12] lib/librte_vhost: kick/callfd fix Huawei Xie
2014-12-10 21:37 ` [dpdk-dev] [PATCH RFC v2 12/12] lib/librte_vhost: cleanup when vhost user socket connection is closed Huawei Xie
2014-12-10 22:04 ` [dpdk-dev] [PATCH RFC v2 00/12] lib/librte_vhost: vhost-user support Xie, Huawei
2014-12-11  2:21   ` Tetsuya Mukawa
2014-12-15  5:26 ` Tetsuya Mukawa
2014-12-17 17:43   ` Xie, Huawei
2015-01-07 12:43     ` Qiu, Michael
2015-01-23  8:16 ` Linhaifeng
2015-01-26  7:24   ` Xie, Huawei

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=54939D27.5060308@igel.co.jp \
    --to=mukawa@igel.co.jp \
    --cc=dev@dpdk.org \
    --cc=huawei.xie@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

DPDK patches and discussions

This inbox may be cloned and mirrored by anyone:

	git clone --mirror https://inbox.dpdk.org/dev/0 dev/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 dev dev/ https://inbox.dpdk.org/dev \
		dev@dpdk.org
	public-inbox-index dev

Example config snippet for mirrors.
Newsgroup available over NNTP:
	nntp://inbox.dpdk.org/inbox.dpdk.dev


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git