From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <54939D27.5060308@igel.co.jp>
Date: Fri, 19 Dec 2014 12:36:07 +0900
From: Tetsuya Mukawa
To: "Xie, Huawei", "dev@dpdk.org"
References: <1418247477-13920-1-git-send-email-huawei.xie@intel.com>
 <1418247477-13920-9-git-send-email-huawei.xie@intel.com>
 <548FA172.5030604@igel.co.jp> <5490F90E.6050701@igel.co.jp>
 <549104F7.20906@igel.co.jp>
Subject: Re: [dpdk-dev] [PATCH RFC v2 08/12] lib/librte_vhost: vhost-user support

(2014/12/18 2:31), Xie, Huawei wrote:
>
>> -----Original Message-----
>> From: Tetsuya Mukawa [mailto:mukawa@igel.co.jp]
>> Sent: Tuesday, December 16, 2014 9:22 PM
>> To: Xie, Huawei; dev@dpdk.org
>> Cc: Linhaifeng (haifeng.lin@huawei.com)
>> Subject: Re: [PATCH RFC v2 08/12] lib/librte_vhost: vhost-user support
>>
>> (2014/12/17 12:31), Tetsuya Mukawa wrote:
>>> (2014/12/17 10:06), Xie, Huawei wrote:
>>>>>> +{
>>>>>> +	struct virtio_net *dev = get_device(ctx);
>>>>>> +
>>>>>> +	/* We have to stop the queue (virtio) if it is running. */
>>>>>> +	if (dev->flags & VIRTIO_DEV_RUNNING)
>>>>>> +		notify_ops->destroy_device(dev);
>>>>> I have one concern about the finalization of vrings.
>>>>> Can the vhost backend stop RX/TX access to the vring before replying to
>>>>> this message?
>>>>>
>>>>> QEMU sends this message when the virtio-net device is finalized by the
>>>>> virtio-net driver in the guest.
>>>>> After finalization, the memory used by the vring will be freed by the
>>>>> virtio-net driver, because that memory was allocated by the virtio-net
>>>>> driver.
>>>>> Because of this, I guess the vhost backend must stop accessing the vring
>>>>> before replying to this message.
>>>>>
>>>>> I am not sure what a good way to stop the access would be.
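
(Jumping ahead to the idea in the next quoted lines: one possible way to stop
the access is a flag that destroy_device() clears and that the RX/TX paths
test before touching the ring. The sketch below is only an illustration with
made-up names, not the actual librte_vhost code.)

    #include <stdint.h>
    #include <stdbool.h>

    /* Simplified stand-in for the real device structure. */
    struct demo_device {
            volatile bool running;   /* plays the role of VIRTIO_DEV_RUNNING */
    };

    /* Called from the message handler before the reply is sent back. */
    static void demo_destroy_device(struct demo_device *dev)
    {
            dev->running = false;    /* RX/TX must observe this before we reply */
    }

    /* Tested at the top of the enqueue/dequeue (RX/TX) paths. */
    static bool demo_vring_usable(const struct demo_device *dev)
    {
            return dev->running;     /* if false, do not touch the vring */
    }
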
>>>>> One idea is adding a condition check when rte_vhost_dequeue_burst()
>>>>> and rte_vhost_enqueue_burst() are called.
>>>>> In any case, we probably need to wait for the access to stop before replying.
>>>>>
>>>>> Thanks,
>>>>> Tetsuya
>>>>>
>>>> I think we have discussed a similar question.
>>> Sorry, I probably did not get your point correctly in my earlier email.
>>>
>>>> It is actually the same issue whether the virtio APP in the guest has crashed
>>>> or has been finalized.
>>> I guess that when the APP is finalized correctly, we can have a solution.
>>> Could you please read the comment I wrote below?
>>>
>>>> The virtio APP will only write to the STATUS register, without waiting for or
>>>> syncing with the vhost backend.
>>> Yes, the virtio APP only writes to the STATUS register. I agree with that.
>>>
>>> When the register is written by the guest, KVM will catch it, and the
>>> context will change to QEMU. QEMU then works as below.
>>> (While these steps run, the guest is waiting, because the context
>>> is in QEMU.)
>>>
>>> Could you please check the steps below against the latest QEMU code?
>>> 1. virtio_ioport_write() [hw/virtio/virtio-pci.c] <= the virtio APP
>>>    waits for this function to return.
>>> 2. virtio_set_status() [hw/virtio/virtio.c]
>>> 3. virtio_net_set_status() [hw/net/virtio-net.c]
>>> 4. virtio_net_vhost_status() [hw/net/virtio-net.c]
>>> 5. vhost_net_stop() [hw/net/vhost_net.c]
>>> 6. vhost_net_stop_one() [hw/net/vhost_net.c]
>>> 7. vhost_dev_stop() [hw/virtio/vhost.c]
>>> 8. vhost_virtqueue_stop() [hw/virtio/vhost.c]
>>> 9. vhost_user_call() [hw/virtio/vhost-user.c]
>>> 10. The VHOST_USER_GET_VRING_BASE message is sent to the backend, and QEMU
>>>     waits for the backend's reply.
>>>
>>> When the vhost-user backend receives GET_VRING_BASE, I guess the guest
>>> APP is stopped.
>>> QEMU will also wait for the vhost-user backend's reply, because
>>> GET_VRING_BASE is a synchronous message.
>>> Because of the above, I guess the virtio APP can wait for the vhost
>>> backend's finalization.
>>>
>>>> After that, not only the guest memory pointed to by the vring entries but
>>>> also the vring itself isn't usable any more.
>>>> The memory for the vring, or pointed to by the vring entries, might be used
>>>> by other APPs.
>>> I agree.
>>>
>>>> This will crash the guest (rather than the vhost, do you agree?).
>>> I guess we cannot assume how the freed memory is reused.
>>> In some cases a new APP may still work, but the vhost backend can access an
>>> inconsistent vring structure.
>>> In that case, the vhost backend could receive illegal packets.
>>> For example, the avail_idx value might be changed to 0xFFFF by a new APP.
>>> (I am not sure the RX/TX functions can handle such a value correctly.)
> Yes, I fully understand your concern, and this is a very good point.
> My point is that, in theory, a well-written vhost backend is either able to
> detect the error, or will crash the guest VM (the virtio APP or even the
> guest OS) if it cannot.
> For example, if avail_idx is set to 0xFFFF and the vhost backend accepts this
> value blindly, it will
> 1. for the tx ring, receive illegal packets. This is ok.
> 2. for the rx ring, DMA to guest memory before the avail_idx, which will
>    crash the guest virtual machine.
> In the current implementation, if there is any chance that our vhost backend
> aborts itself in error handling, that is incorrect. We need to check our code
> for such cases.

I agree. I will also check the code. (A sketch of the kind of check I mean
follows the quoted text below.)

>
>>> Anyway, my point is that if we can finalize the vhost backend correctly, we
>>> only need to take care of the crash case.
>>> (If so, it's very nice :))
>>> So let's make sure whether the virtio APP can wait for the finalization, or not.
>>> I am thinking about how to do it now.
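
(As mentioned above, a rough sketch of the kind of avail_idx sanity check I
have in mind. The types are simplified stand-ins, not the actual
rte_vhost_enqueue_burst()/rte_vhost_dequeue_burst() code.)

    #include <stdint.h>

    /* Simplified stand-in for the virtqueue bookkeeping. */
    struct demo_virtqueue {
            volatile uint16_t avail_idx;     /* written by the guest driver */
            uint16_t          last_used_idx; /* backend's consumption point */
            uint16_t          size;          /* number of ring entries      */
    };

    /* Number of descriptors the guest claims to have made available.
     * A sane value can never exceed the ring size; anything larger
     * (e.g. 0xFFFF left behind after the ring memory was reused) means
     * the ring must not be trusted, and the burst call should process
     * zero packets instead of touching guest memory. */
    static uint16_t demo_avail_entries(const struct demo_virtqueue *vq)
    {
            uint16_t n = (uint16_t)(vq->avail_idx - vq->last_used_idx);

            return (n > vq->size) ? 0 : n;
    }
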
> Sorry for the confusion.
> The STATUS write must be synchronized with qemu.
> The vcpu thread for the virtio APP won't continue until qemu finishes the
> simulation and resumes the vcpu thread.
>
> In the RFC patch, the message handler will first call the destroy_device
> provided by the vSwitch, which causes the vSwitch to stop processing this
> virtio_device, and then the handler will send the reply.
> Is there an issue with the RFC patch? Thanks.

I had a misunderstanding about the destroy_device handler.
I agree with your implementation. It should work correctly.

Thanks,
Tetsuya

>> I added sleep() like below.
>>
>> --- a/hw/virtio/virtio-pci.c
>> +++ b/hw/virtio/virtio-pci.c
>> @@ -300,7 +300,10 @@ static void virtio_ioport_write(void *opaque, uint32_t addr, uint32_t val)
>>              virtio_pci_stop_ioeventfd(proxy);
>>          }
>>
>>          virtio_set_status(vdev, val & 0xFF);
>> +        if (val == 0)
>> +            sleep(10);
> Yes, the register simulation for a virtio device must be a synchronous operation. :)
>>          if (val & VIRTIO_CONFIG_S_DRIVER_OK) {
>>              virtio_pci_start_ioeventfd(proxy);
>>
>> When I run 'dpdk_nic_bind.py' to trigger GET_VRING_BASE, the command takes
>> 10 seconds to finish.
>> So we can assume that the virtio APP is able to wait for the finalization of
>> the vhost backend.
>>
>> Thanks,
>> Tetsuya
>>
>>>> If you mean this issue, I think we have no solution but one workaround:
>>>> keep the huge page files of the crashed app, bind virtio to igb_uio, and
>>>> then delete the huge page files.
>>> Yes, I agree.
>>> If the virtio APP has crashed, this will be a solution.
>>>
>>> Thanks,
>>> Tetsuya
>>>
>>>> In our implementation, when vhost sends the message, we will call the
>>>> destroy_device provided by the vSwitch to ask the vSwitch to stop
>>>> processing the vring, but this won't solve the issue I mentioned above,
>>>> because the virtio APP in the guest won't wait for us.
>>>>
>>>> Could you explain a bit more? Is it the same issue?
>>>>
>>>> -huawei
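
(For completeness, a rough sketch of the ordering described above: call
destroy_device() first, wait until the RX/TX paths have really stopped
touching the ring, and only then send the reply. The names are made up for
illustration; this is not the actual code of the RFC patch.)

    #include <stdint.h>
    #include <stdbool.h>

    /* Simplified stand-ins: 'running' is cleared when the device goes away,
     * and 'in_flight' counts RX/TX burst calls currently inside the ring
     * code (in real code this would be an atomic counter). */
    struct demo_device {
            volatile bool     running;
            volatile uint32_t in_flight;
    };

    /* GET_VRING_BASE handling: stop the vSwitch's processing first, wait
     * until no RX/TX call is still touching the ring, and only then let
     * the caller send the reply back to QEMU.  Once QEMU returns from the
     * guest's STATUS write, the guest driver may free the vring memory. */
    static void demo_handle_get_vring_base(struct demo_device *dev)
    {
            dev->running = false;          /* the destroy_device() step */

            while (dev->in_flight != 0)
                    ;                      /* or rte_pause()/sched_yield() */

            /* The reply is sent by the caller after this returns. */
    }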