From: "Xie, Huawei"
To: Tetsuya Mukawa, "dev@dpdk.org"
Date: Wed, 17 Dec 2014 17:31:12 +0000
Subject: Re: [dpdk-dev] [PATCH RFC v2 08/12] lib/librte_vhost: vhost-user support

> -----Original Message-----
> From: Tetsuya Mukawa [mailto:mukawa@igel.co.jp]
> Sent: Tuesday, December 16, 2014 9:22 PM
> To: Xie, Huawei; dev@dpdk.org
> Cc: Linhaifeng (haifeng.lin@huawei.com)
> Subject: Re: [PATCH RFC v2 08/12] lib/librte_vhost: vhost-user support
>
> (2014/12/17 12:31), Tetsuya Mukawa wrote:
> > (2014/12/17 10:06), Xie, Huawei wrote:
> >>>> +{
> >>>> +	struct virtio_net *dev = get_device(ctx);
> >>>> +
> >>>> +	/* We have to stop the queue (virtio) if it is running. */
> >>>> +	if (dev->flags & VIRTIO_DEV_RUNNING)
> >>>> +		notify_ops->destroy_device(dev);
> >>> I have one concern about the finalization of vrings.
> >>> Can the vhost backend stop RX/TX access to the vring before replying to
> >>> this message?
> >>>
> >>> QEMU sends this message when the virtio-net device is finalized by the
> >>> virtio-net driver on the guest.
> >>> After finalization, the memory used by the vring will be freed by the
> >>> virtio-net driver, because that memory was allocated by the virtio-net
> >>> driver.
> >>> Because of this, I guess the vhost backend must stop accessing the vring
> >>> before replying to this message.
> >>>
> >>> I am not sure what a good way to stop the access is.
> >>> One idea is adding a condition check when rte_vhost_dequeue_burst()
> >>> and rte_vhost_enqueue_burst() are called.
> >>> Anyway, we probably need to wait for the access to stop before replying.
> >>>
> >>> Thanks,
> >>> Tetsuya
> >>>
> >> I think we have discussed a similar question.
> > Sorry, I probably did not get your point correctly in the former email.
> >
> >> It is actually the same issue whether the virtio APP in the guest has crashed
> >> or is finalized.
> > I guess when the APP is finalized correctly, we can have a solution.
> > Could you please read the comment I wrote later?
> >
> >> The virtio APP will only write to the STATUS register without waiting/syncing
> >> to the vhost backend.
> > Yes, the virtio APP only writes to the STATUS register. I agree with that.
> >
> > When the register is written by the guest, KVM will catch it, and the
> > context will change to QEMU. And QEMU will work as below.
> > (Also, while doing the following steps, the guest is waiting because the
> > context is in QEMU.)
> >
> > Could you please check the following against the latest QEMU code?
> > 1. virtio_ioport_write() [hw/virtio/virtio-pci.c] <= the virtio APP
> > waits for this function to return.
> > 2. virtio_set_status() [hw/virtio/virtio.c]
> > 3. virtio_net_set_status() [hw/net/virtio-net.c]
> > 4. virtio_net_vhost_status() [hw/net/virtio-net.c]
> > 5. vhost_net_stop() [hw/net/vhost_net.c]
> > 6. vhost_net_stop_one() [hw/net/vhost_net.c]
> > 7. vhost_dev_stop() [hw/virtio/vhost.c]
> > 8. vhost_virtqueue_stop() [hw/virtio/vhost.c]
> > 9. vhost_user_call() [virtio/vhost-user.c]
> > 10. The VHOST_USER_GET_VRING_BASE message is sent to the backend, and QEMU
> > waits for the backend's reply.
> >
> > When the vhost-user backend receives GET_VRING_BASE, I guess the guest
> > APP is stopped.
> > Also, QEMU will wait for the vhost-user backend's reply, because GET_VRING_BASE
> > is a synchronous message.
> > Because of the above, I guess the virtio APP can wait for vhost-backend
> > finalization.
> >
> >> After that, not only the guest memory pointed to by the vring entries but
> >> also the vring itself isn't usable any more.
> >> The memory for the vring, or pointed to by the vring entries, might be used
> >> by other APPs.
> > I agree.
> >
> >> This will crash the guest (rather than the vhost, do you agree?).
> > I guess we cannot assume how the freed memory is used.
> > In some cases, a new APP still works, but the vhost backend can access an
> > inconsistent vring structure.
> > In that case the vhost backend could receive illegal packets.
> > For example, the avail_idx value might be changed to 0xFFFF by a new APP.
> > (I am not sure the RX/TX functions can handle such a value correctly.)

Yes, I fully understand your concern, and this is a very good point.
My point is that, in theory, a well-written vhost backend is either able to detect
the error, or will crash the guest VM (the virtio APP or even the guest OS) if it
couldn't. For example, if avail_idx is set to 0xFFFF and the vhost backend accepts
this value blindly, it will
1. for the TX ring, receive illegal packets. This is OK.
2. for the RX ring, DMA into guest memory for entries up to that avail_idx, which
will crash the guest virtual machine.
In the current implementation, if there is any chance our vhost backend aborts
itself in error handling, that is incorrect. We need to check our code for such
cases. (A rough sketch of this kind of check is included further below.)

> >
> > Anyway, my point is that if we can finalize the vhost backend correctly, we
> > only need to take care of the crash case.
> > (If so, that's very nice :))
> > So let's make sure whether the virtio APP can wait for finalization, or not.
> > I am thinking about how to do that now.

Sorry for the confusion.
The STATUS write must be synchronized with QEMU.
The vcpu thread for the virtio APP won't continue until QEMU finishes the
simulation and resumes the vcpu thread.
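For illustration only, a rough sketch of the kind of running-flag and avail_idx
sanity check discussed above (the condition check in rte_vhost_dequeue_burst()/
rte_vhost_enqueue_burst(), plus rejecting a corrupt avail_idx). This is not the
code from the patch; the structures below are simplified stand-ins for the
librte_vhost ones:

#include <stdint.h>

#define VIRTIO_DEV_RUNNING 1

struct avail_ring_stub {
	uint16_t flags;
	uint16_t idx;          /* written by the guest driver */
	uint16_t ring[];
};

struct vq_stub {
	struct avail_ring_stub *avail;
	uint16_t size;         /* number of descriptors in the ring */
	uint16_t last_used_idx;
};

struct dev_stub {
	uint32_t flags;
};

/*
 * Return how many new avail entries may safely be consumed, or 0 if the
 * device is being torn down or the index looks corrupt (e.g. 0xFFFF left
 * behind by another APP reusing the freed guest memory).
 */
static uint16_t
safe_avail_entries(struct dev_stub *dev, struct vq_stub *vq)
{
	uint16_t count;

	/* Stop touching the ring once destroy_device() has cleared the flag. */
	if (!(dev->flags & VIRTIO_DEV_RUNNING))
		return 0;

	count = (uint16_t)(vq->avail->idx - vq->last_used_idx);

	/* More entries than the ring can hold cannot be valid; drop them
	 * instead of DMA-ing into guest memory. */
	if (count > vq->size)
		return 0;

	return count;
}

The enqueue/dequeue paths could call such a helper at their top, before
dereferencing any descriptor.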
In the RFC patch, the message handler will first call the destroy_device callback
provided by the vSwitch, which causes the vSwitch to stop processing this
virtio_device; only then will the handler send the reply. Is there an issue with
the RFC patch?

> >
>
> I added sleep() like below.
>
> --- a/hw/virtio/virtio-pci.c
> +++ b/hw/virtio/virtio-pci.c
> @@ -300,7 +300,10 @@ static void virtio_ioport_write(void *opaque,
> uint32_t addr, uint32_t val)
>         virtio_pci_stop_ioeventfd(proxy);
>     }
>
>     virtio_set_status(vdev, val & 0xFF);
> +    if (val == 0)
> +        sleep(10);

Yes, the register simulation for the virtio device must be a synchronous
operation. :)

>
>     if (val & VIRTIO_CONFIG_S_DRIVER_OK) {
>         virtio_pci_start_ioeventfd(proxy);
>
> When I run 'dpdk_nic_bind.py' to trigger GET_VRING_BASE, the command takes
> 10 seconds to finish.
> So we can assume that the virtio APP is able to wait for finalization of the
> vhost backend.
>
> Thanks,
> Tetsuya
>
> >> If you mean this issue, I think we have no solution but one workaround:
> >> keep the huge page files of the crashed app, bind virtio to igb_uio, and
> >> then delete the huge page files.
> > Yes, I agree.
> > If the virtio APP has crashed, this will be a solution.
> >
> > Thanks,
> > Tetsuya
> >
> >> In our implementation, when vhost receives the message, we will call the
> >> destroy_device provided by the vSwitch to ask the vSwitch to stop processing
> >> the vring, but this won't solve the issue I mention above, because the
> >> virtio APP in the guest won't wait for us.
> >>
> >> Could you explain a bit more? Is it the same issue?
> >>
> >> -huawei
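For completeness, a rough sketch of the GET_VRING_BASE ordering described earlier
in this mail (stop the datapath via destroy_device, then reply). The stub types
and the reply helper below are placeholders for illustration, not the RFC code:

#include <stdint.h>
#include <stdio.h>

#define VIRTIO_DEV_RUNNING 1

struct virtio_net_stub {
	uint32_t flags;
	uint16_t last_used_idx;
};

struct notify_ops_stub {
	void (*destroy_device)(struct virtio_net_stub *dev);
};

static struct notify_ops_stub *notify_ops;	/* registered by the vSwitch */

/* Stand-in for writing the reply back on the vhost-user unix socket. */
static void
send_get_vring_base_reply(int fd, uint16_t last_used_idx)
{
	printf("reply on fd %d: last_used_idx=%u\n", fd, last_used_idx);
}

static void
handle_get_vring_base(struct virtio_net_stub *dev, int fd)
{
	/* 1. Ask the vSwitch to stop processing this device's rings. */
	if (dev->flags & VIRTIO_DEV_RUNNING) {
		notify_ops->destroy_device(dev);
		dev->flags &= ~VIRTIO_DEV_RUNNING;
	}

	/*
	 * 2. Only now answer QEMU: until this reply arrives, the vcpu that
	 * wrote the STATUS register stays blocked, so the guest cannot free
	 * and reuse the vring memory while the backend is still touching it.
	 */
	send_get_vring_base_reply(fd, dev->last_used_idx);
}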