From: "Xie, Huawei"
To: Tetsuya Mukawa, "dev@dpdk.org"
Date: Wed, 17 Dec 2014 17:31:12 +0000
Subject: Re: [dpdk-dev] [PATCH RFC v2 08/12] lib/librte_vhost: vhost-user support

> -----Original Message-----
> From: Tetsuya Mukawa [mailto:mukawa@igel.co.jp]
> Sent: Tuesday, December 16, 2014 9:22 PM
> To: Xie, Huawei; dev@dpdk.org
> Cc: Linhaifeng (haifeng.lin@huawei.com)
> Subject: Re: [PATCH RFC v2 08/12] lib/librte_vhost: vhost-user support
>
> (2014/12/17 12:31), Tetsuya Mukawa wrote:
> > (2014/12/17 10:06), Xie, Huawei wrote:
> >>>> +{
> >>>> +	struct virtio_net *dev = get_device(ctx);
> >>>> +
> >>>> +	/* We have to stop the queue (virtio) if it is running. */
> >>>> +	if (dev->flags & VIRTIO_DEV_RUNNING)
> >>>> +		notify_ops->destroy_device(dev);
> >>> I have one concern about the finalization of vrings.
> >>> Can the vhost backend stop RX/TX access to the vring before replying to
> >>> this message?
> >>>
> >>> QEMU sends this message when the virtio-net device is finalized by the
> >>> virtio-net driver on the guest.
> >>> After finalization, the memory used by the vring will be freed by the
> >>> virtio-net driver, because that memory was allocated by the virtio-net
> >>> driver.
> >>> Because of this, I guess the vhost backend must stop accessing the vring
> >>> before replying to this message.
> >>>
> >>> I am not sure what a good way to stop the access is.
> >>> One idea is adding a condition check when rte_vhost_dequeue_burst()
> >>> and rte_vhost_enqueue_burst() are called.
> >>> Anyway, we probably need to wait for the access to stop before replying.
> >>>
> >>> Thanks,
> >>> Tetsuya
> >>>
> >> I think we have discussed a similar question.
> > Sorry, I probably did not get your point correctly in the former email.
> >
> >> It is actually the same issue whether the virtio APP in the guest has crashed
> >> or is finalized.
> > I guess when the APP is finalized correctly, we can have a solution.
> > Could you please read the comment I wrote later?
> >
> >> The virtio APP will only write to the STATUS register without waiting/syncing
> >> to the vhost backend.
> > Yes, the virtio APP only writes to the STATUS register. I agree with that.
> >
> > When the register is written by the guest, KVM will catch it, and the
> > context will change to QEMU. And QEMU will work as below.
> > (Also, while doing the following steps, the guest is waiting because the
> > context is in QEMU.)
> >
> > Could you please check the following against the latest QEMU code?
> > 1. virtio_ioport_write() [hw/virtio/virtio-pci.c] <= the virtio APP
> > waits for this function to return.
> > 2. virtio_set_status() [hw/virtio/virtio.c]
> > 3. virtio_net_set_status() [hw/net/virtio-net.c]
> > 4. virtio_net_vhost_status() [hw/net/virtio-net.c]
> > 5. vhost_net_stop() [hw/net/vhost_net.c]
> > 6. vhost_net_stop_one() [hw/net/vhost_net.c]
> > 7. vhost_dev_stop() [hw/virtio/vhost.c]
> > 8. vhost_virtqueue_stop() [hw/virtio/vhost.c]
> > 9. vhost_user_call() [virtio/vhost-user.c]
> > 10. The VHOST_USER_GET_VRING_BASE message is sent to the backend, and QEMU
> > waits for the backend's reply.
> >
> > When the vhost-user backend receives GET_VRING_BASE, I guess the guest
> > APP is stopped.
> > Also, QEMU will wait for the vhost-user backend's reply, because GET_VRING_BASE
> > is a synchronous message.
> > Because of the above, I guess the virtio APP can wait for vhost-backend
> > finalization.
> >
> >> After that, not only the guest memory pointed to by the vring entries but
> >> also the vring itself isn't usable any more.
> >> The memory for the vring, or pointed to by the vring entries, might be used
> >> by other APPs.
> > I agree.
> >
> >> This will crash the guest (rather than the vhost, do you agree?).
> > I guess we cannot assume how the freed memory is used.
> > In some cases, a new APP still works, but the vhost backend can access an
> > inconsistent vring structure.
> > In that case the vhost backend could receive illegal packets.
> > For example, the avail_idx value might be changed to 0xFFFF by a new APP.
> > (I am not sure the RX/TX functions can handle such a value correctly.)

Yes, I fully understand your concern, and this is a very good point.
My point is that, in theory, a well-written vhost backend is either able to detect
the error, or will crash the guest VM (the virtio APP or even the guest OS) if it
couldn't. For example, if avail_idx is set to 0xFFFF and the vhost backend accepts
this value blindly, it will
1. for the TX ring, receive illegal packets. This is OK.
2. for the RX ring, DMA into guest memory for entries up to that avail_idx, which
will crash the guest virtual machine.
In the current implementation, if there is any chance our vhost backend aborts
itself in error handling, that is incorrect. We need to check our code for such
cases. (A rough sketch of this kind of check is included further below.)

> >
> > Anyway, my point is that if we can finalize the vhost backend correctly, we
> > only need to take care of the crash case.
> > (If so, that's very nice :))
> > So let's make sure whether the virtio APP can wait for finalization, or not.
> > I am thinking about how to do that now.

Sorry for the confusion.
The STATUS write must be synchronized with QEMU.
The vcpu thread for the virtio APP won't continue until QEMU finishes the
simulation and resumes the vcpu thread.
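For illustration only, a rough sketch of the kind of running-flag and avail_idx
sanity check discussed above (the condition check in rte_vhost_dequeue_burst()/
rte_vhost_enqueue_burst(), plus rejecting a corrupt avail_idx). This is not the
code from the patch; the structures below are simplified stand-ins for the
librte_vhost ones:

#include <stdint.h>

#define VIRTIO_DEV_RUNNING 1

struct avail_ring_stub {
	uint16_t flags;
	uint16_t idx;          /* written by the guest driver */
	uint16_t ring[];
};

struct vq_stub {
	struct avail_ring_stub *avail;
	uint16_t size;         /* number of descriptors in the ring */
	uint16_t last_used_idx;
};

struct dev_stub {
	uint32_t flags;
};

/*
 * Return how many new avail entries may safely be consumed, or 0 if the
 * device is being torn down or the index looks corrupt (e.g. 0xFFFF left
 * behind by another APP reusing the freed guest memory).
 */
static uint16_t
safe_avail_entries(struct dev_stub *dev, struct vq_stub *vq)
{
	uint16_t count;

	/* Stop touching the ring once destroy_device() has cleared the flag. */
	if (!(dev->flags & VIRTIO_DEV_RUNNING))
		return 0;

	count = (uint16_t)(vq->avail->idx - vq->last_used_idx);

	/* More entries than the ring can hold cannot be valid; drop them
	 * instead of DMA-ing into guest memory. */
	if (count > vq->size)
		return 0;

	return count;
}

The enqueue/dequeue paths could call such a helper at their top, before
dereferencing any descriptor.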
In the RFC patch, the message handler will first call the destroy_device callback
provided by the vSwitch, which causes the vSwitch to stop processing this
virtio_device; only then will the handler send the reply. Is there an issue with
the RFC patch?

> >
>
> I added sleep() like below.
>
> --- a/hw/virtio/virtio-pci.c
> +++ b/hw/virtio/virtio-pci.c
> @@ -300,7 +300,10 @@ static void virtio_ioport_write(void *opaque,
> uint32_t addr, uint32_t val)
>         virtio_pci_stop_ioeventfd(proxy);
>     }
>
>     virtio_set_status(vdev, val & 0xFF);
> +    if (val == 0)
> +        sleep(10);

Yes, the register simulation for the virtio device must be a synchronous
operation. :)

>
>     if (val & VIRTIO_CONFIG_S_DRIVER_OK) {
>         virtio_pci_start_ioeventfd(proxy);
>
> When I run 'dpdk_nic_bind.py' to trigger GET_VRING_BASE, the command takes
> 10 seconds to finish.
> So we can assume that the virtio APP is able to wait for finalization of the
> vhost backend.
>
> Thanks,
> Tetsuya
>
> >> If you mean this issue, I think we have no solution but one workaround:
> >> keep the huge page files of the crashed app, bind virtio to igb_uio, and
> >> then delete the huge page files.
> > Yes, I agree.
> > If the virtio APP has crashed, this will be a solution.
> >
> > Thanks,
> > Tetsuya
> >
> >> In our implementation, when vhost receives the message, we will call the
> >> destroy_device provided by the vSwitch to ask the vSwitch to stop processing
> >> the vring, but this won't solve the issue I mention above, because the
> >> virtio APP in the guest won't wait for us.
> >>
> >> Could you explain a bit more? Is it the same issue?
> >>
> >> -huawei
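For completeness, a rough sketch of the GET_VRING_BASE ordering described earlier
in this mail (stop the datapath via destroy_device, then reply). The stub types
and the reply helper below are placeholders for illustration, not the RFC code:

#include <stdint.h>
#include <stdio.h>

#define VIRTIO_DEV_RUNNING 1

struct virtio_net_stub {
	uint32_t flags;
	uint16_t last_used_idx;
};

struct notify_ops_stub {
	void (*destroy_device)(struct virtio_net_stub *dev);
};

static struct notify_ops_stub *notify_ops;	/* registered by the vSwitch */

/* Stand-in for writing the reply back on the vhost-user unix socket. */
static void
send_get_vring_base_reply(int fd, uint16_t last_used_idx)
{
	printf("reply on fd %d: last_used_idx=%u\n", fd, last_used_idx);
}

static void
handle_get_vring_base(struct virtio_net_stub *dev, int fd)
{
	/* 1. Ask the vSwitch to stop processing this device's rings. */
	if (dev->flags & VIRTIO_DEV_RUNNING) {
		notify_ops->destroy_device(dev);
		dev->flags &= ~VIRTIO_DEV_RUNNING;
	}

	/*
	 * 2. Only now answer QEMU: until this reply arrives, the vcpu that
	 * wrote the STATUS register stays blocked, so the guest cannot free
	 * and reuse the vring memory while the backend is still touching it.
	 */
	send_get_vring_base_reply(fd, dev->last_used_idx);
}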