DPDK patches and discussions
From: "Xie, Huawei" <huawei.xie@intel.com>
To: Tetsuya Mukawa <mukawa@igel.co.jp>, "dev@dpdk.org" <dev@dpdk.org>
Subject: Re: [dpdk-dev] [PATCH RFC v2 08/12] lib/librte_vhost: vhost-user support
Date: Wed, 17 Dec 2014 17:31:12 +0000
Message-ID: <C37D651A908B024F974696C65296B57B0F3277D0@SHSMSX101.ccr.corp.intel.com> (raw)
In-Reply-To: <549104F7.20906@igel.co.jp>



> -----Original Message-----
> From: Tetsuya Mukawa [mailto:mukawa@igel.co.jp]
> Sent: Tuesday, December 16, 2014 9:22 PM
> To: Xie, Huawei; dev@dpdk.org
> Cc: Linhaifeng (haifeng.lin@huawei.com)
> Subject: Re: [PATCH RFC v2 08/12] lib/librte_vhost: vhost-user support
> 
> (2014/12/17 12:31), Tetsuya Mukawa wrote:
> > (2014/12/17 10:06), Xie, Huawei wrote:
> >>>> +{
> >>>> +	struct virtio_net *dev = get_device(ctx);
> >>>> +
> >>>> +	/* We have to stop the queue (virtio) if it is running. */
> >>>> +	if (dev->flags & VIRTIO_DEV_RUNNING)
> >>>> +		notify_ops->destroy_device(dev);
> >>> I have an one concern about finalization of vrings.
> >>> Can vhost-backend stop accessing RX/TX to the vring before replying to
> >>> this message?
> >>>
> >>> QEMU sends this message when virtio-net device is finalized by
> >>> virtio-net driver on the guest.
> >>> After finalization, memories used by the vring will be freed by
> >>> virtio-net driver, because these memories are allocated by virtio-net
> >>> driver.
> >>> Because of this, I guess vhost-backend must stop accessing to vring
> >>> before replying to this message.
> >>>
> >>> I am not sure what is a good way to stop accessing.
> >>> One idea is adding a condition checking when rte_vhost_dequeue_burst()
> >>> and rte_vhost_enqueue_burst() is called.
> >>> Anyway we probably need to wait for stopping access before replying.
> >>>
> >>> Thanks,
> >>> Tetsuya
> >>>
> >> I think we have discussed the similar question.
> > Sorry, probably I might not be able to got your point correctly at the
> > former email.
> >
> >> It is actually the same issue whether the virtio APP in guest is crashed, or is
> >> finalized.
> > I guess when the APP is finalized correctly, we can have a solution.
> > Could you please read comment I wrote later?
> >
> >> The virtio APP will only write to the STATUS register without waiting/syncing
> >> to vhost backend.
> > Yes, virtio APP only write to the STATUS register. I agree with it.
> >
> > When the register is written by guest, KVM will catch it, and the
> > context will be change to QEMU. And QEMU will works like below.
> > (Also while doing following steps, guest is waiting because the context
> > is in QEMU)
> >
> > Could you please see below with latest QEMU code?
> > 1. virtio_ioport_write() [hw/virtio/virtio-pci.c] <= virtio APP will
> > wait for replying of this function.
> > 2. virtio_set_status() [hw/virtio/virtio.c]
> > 3. virtio_net_set_status() [hw/net/virtio-net.c]
> > 4. virtio_net_vhost_status() [hw/net/virtio-net.c]
> > 5. vhost_net_stop() [hw/net/vhost_net.c]
> > 6. vhost_net_stop_one() [hw/net/vhost_net.c]
> > 7. vhost_dev_stop() [hw/virtio/vhost.c]
> > 8. vhost_virtqueue_stop() [hw/virtio/vhost.c]
> > 9. vhost_user_call() [hw/virtio/vhost-user.c]
> > 10. VHOST_USER_GET_VRING_BASE message is sent to backend. And waiting
> > for backend reply.
> >
> > When the vhost-user backend receives GET_VRING_BASE, I guess the guest
> > APP is stopped.
> > Also QEMU will wait for vhost-user backend reply because GET_VRING_BASE
> > is synchronous message.
> > Because of above, I guess virtio APP can wait for vhost-backend
> > finalization.
> >
> >> After that, not only the guest memory pointed by vring entry but also the
> >> vring itself isn't usable any more.
> >> The memory for vring or pointed by vring entry might be used by other APPs.
> > I agree.
> >
> >> This will crash guest(rather than the vhost, do you agree?).
> > I guess we cannot assume how the freed memory is used.
> > In some cases, a new APP still works, but vhost backend can access
> > inconsistency vring structure.
> > In the case vhost backend could receive illegal packets.
> > For example, avail_idx value might be changed to be 0xFFFF by a new APP.
> > (I am not sure RX/TX functions can handle such a value correctly)

Yes, I fully understand your concern, and this is a very good point.
My point is that, in theory, a well-written vhost backend either detects the error
or, if it cannot, crashes the guest VM (the virtio APP or even the guest OS).
For example, if avail_idx is set to 0xFFFF and the vhost backend accepts this value blindly, it will:
1. for the tx ring, receive illegal packets. This is OK.
2. for the rx ring, DMA to guest memory before the avail_idx, which will crash the guest virtual machine.
In the current implementation, if there is any chance that our vhost backend aborts itself in error handling,
that is incorrect. We need to check our code for such cases.
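To make "able to detect the error" concrete, here is a minimal sketch of one possible sanity check on avail_idx. It is not the code in the RFC patch: struct vq_state and vq_pending_entries() are hypothetical names used only for illustration. The idea is that the guest can never have more entries outstanding than the ring has descriptors, so a larger difference means the vring memory can no longer be trusted.

```c
#include <stdint.h>

/* Hypothetical, simplified per-vring state kept by a backend. */
struct vq_state {
	uint16_t size;           /* number of descriptors in the ring */
	uint16_t last_avail_idx; /* last avail index the backend consumed */
};

/*
 * Return how many new entries the guest claims to have made available,
 * or -1 if the claim is impossible for a ring of this size.
 * The subtraction is done modulo 2^16, so normal index wrap-around
 * is handled correctly.
 */
static int
vq_pending_entries(const struct vq_state *vq, uint16_t avail_idx)
{
	uint16_t pending = (uint16_t)(avail_idx - vq->last_avail_idx);

	if (pending > vq->size)
		return -1; /* vring looks corrupted or reused: do not touch it */
	return pending;
}
```

This only catches values that are impossible for the ring size. A bogus avail_idx that happens to fall within size entries of last_avail_idx would still be accepted, so the descriptors themselves would need validating as well.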

> >
> > Anyway, my point is if we can finalize vhost backend correctly, we only
> > need to take care of crashing case.
> > (If so, it's very nice :))
> > So let's make sure whether virtio APP can wait for finalization, or not.
> > I am thinking how to do it now.

Sorry for the confusion.
The STATUS write must be synchronized with QEMU.
The vcpu thread for the virtio APP won't continue until QEMU finishes the emulation
and resumes the vcpu thread.

In the RFC patch, the message handler first calls the destroy_device callback provided by the vSwitch,
which causes the vSwitch to stop processing this virtio device; only then does the handler send the reply.
Is there an issue with the RFC patch?
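The ordering I mean can be sketched as below. This is a simplified illustration, not the patch itself: dev_sketch, ops_sketch, handle_get_vring_base() and the two stub callbacks are hypothetical stand-ins for struct virtio_net, notify_ops and the real message handler, and DEV_RUNNING stands in for VIRTIO_DEV_RUNNING.

```c
#define DEV_RUNNING 0x1u /* stand-in for VIRTIO_DEV_RUNNING */

struct dev_sketch {
	unsigned int flags;
};

struct ops_sketch {
	void (*destroy_device)(struct dev_sketch *dev);
};

/*
 * Handle a request that stops the vring: first ask the switch to stop
 * processing the device, and only then send the reply that unblocks QEMU.
 * send_reply is a hypothetical callback; its return value is passed back.
 */
static int
handle_get_vring_base(struct dev_sketch *dev, const struct ops_sketch *ops,
		      int (*send_reply)(struct dev_sketch *dev))
{
	if (dev->flags & DEV_RUNNING)
		ops->destroy_device(dev);
	return send_reply(dev);
}

/* Stub callbacks that only record the order in which they were called. */
enum { EV_DESTROY = 1, EV_REPLY = 2 };
static int call_log[2];
static int call_count;

static void
stub_destroy(struct dev_sketch *dev)
{
	(void)dev;
	call_log[call_count++] = EV_DESTROY;
}

static int
stub_reply(struct dev_sketch *dev)
{
	(void)dev;
	call_log[call_count++] = EV_REPLY;
	return 0;
}
```

Calling handle_get_vring_base() with the two stubs records EV_DESTROY before EV_REPLY. Since QEMU blocks until the reply arrives, the guest cannot free the vring memory before destroy_device has returned, provided destroy_device itself does not return while the switch is still accessing the vring.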

> >
> 
> I added sleep() like below.
> 
> --- a/hw/virtio/virtio-pci.c
> +++ b/hw/virtio/virtio-pci.c
> @@ -300,7 +300,10 @@ static void virtio_ioport_write(void *opaque, uint32_t addr, uint32_t val)
>              virtio_pci_stop_ioeventfd(proxy);
>          }
> 
>          virtio_set_status(vdev, val & 0xFF);
> +        if (val == 0)
> +            sleep(10);
Yes, the register emulation for a virtio device must be a synchronous operation. :)
> 
>          if (val & VIRTIO_CONFIG_S_DRIVER_OK) {
>              virtio_pci_start_ioeventfd(proxy);
> 
> When I type 'dpdk_nic_bind.py' to cause GET_VRING_BASE, this command
> takes 10 seconds to be finished.
> So we can assume that virtio APP is able to wait for finalization of
> vhost backend.
> 
> Thanks,
> Tetsuya
> 
> >> If you mean this issue, I think we have no solution but one workaround: keep
> >> the huge page files of the crashed app, and
> >> bind virtio to igb_uio and then delete the huge page files.
> > Yes I agree.
> > If the virtio APP is crashed, this will be a solution.
> >
> > Thanks,
> > Tetsuya
> >
> >> In our implementation, when vhost sends the message, we will call the
> >> destroy_device provided by the vSwitch to ask the
> >> vSwitch to stop processing the vring, but this won't solve the issue I mention
> >> above, because the virtio APP in guest won't
> >> wait for us.
> >>
> >> Could you explain a bit more? Is it the same issue?
> >>
> >>
> >> -huawei
> >>
> >>
> >>
> >
> 

