From: Tetsuya Mukawa
To: "Xie, Huawei", "dev@dpdk.org"
Subject: Re: [dpdk-dev] [PATCH RFC v2 08/12] lib/librte_vhost: vhost-user support
Date: Wed, 17 Dec 2014 13:22:15 +0900
Message-ID: <549104F7.20906@igel.co.jp>
In-Reply-To: <5490F90E.6050701@igel.co.jp>
References: <1418247477-13920-1-git-send-email-huawei.xie@intel.com> <1418247477-13920-9-git-send-email-huawei.xie@intel.com> <548FA172.5030604@igel.co.jp> <5490F90E.6050701@igel.co.jp>

(2014/12/17 12:31), Tetsuya Mukawa wrote:
> (2014/12/17 10:06), Xie, Huawei wrote:
>>>> +{
>>>> +	struct virtio_net *dev = get_device(ctx);
>>>> +
>>>> +	/* We have to stop the queue (virtio) if it is running. */
>>>> +	if (dev->flags & VIRTIO_DEV_RUNNING)
>>>> +		notify_ops->destroy_device(dev);
>>> I have one concern about the finalization of vrings.
>>> Can the vhost backend stop RX/TX access to the vring before replying to
>>> this message?
>>>
>>> QEMU sends this message when the virtio-net device is finalized by the
>>> virtio-net driver in the guest.
>>> After finalization, the memory used by the vring will be freed by the
>>> virtio-net driver, because that memory was allocated by the virtio-net
>>> driver.
>>> Because of this, I guess the vhost backend must stop accessing the vring
>>> before replying to this message.
>>>
>>> I am not sure what a good way to stop the access is.
>>> One idea is adding a condition check when rte_vhost_dequeue_burst()
>>> and rte_vhost_enqueue_burst() are called.
>>> Anyway, we probably need to wait for the access to stop before replying.
>>>
>>> Thanks,
>>> Tetsuya
>>>
>> I think we have discussed a similar question.
> Sorry, I probably did not get your point correctly in the former email.
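
(For illustration only: a rough sketch of the "condition check" idea mentioned
above. The names, the per-device flag, and the in-flight counter are
hypothetical, not the actual librte_vhost code; the point is just that the
RX/TX paths bail out once the device is being stopped, and that the stop path
waits for them before the GET_VRING_BASE reply goes out.)

/*
 * Hypothetical guard for the enqueue/dequeue paths plus a drain step
 * executed while GET_VRING_BASE is being handled.
 */
#include <stdatomic.h>
#include <stdbool.h>

struct dev_state {
	atomic_bool running;   /* cleared when the device is being stopped */
	atomic_int  inflight;  /* RX/TX calls currently inside the vring */
};

/* Called at the top of an enqueue/dequeue path such as
 * rte_vhost_enqueue_burst()/rte_vhost_dequeue_burst(). */
static bool dev_enter(struct dev_state *s)
{
	if (!atomic_load(&s->running))
		return false;                    /* stopped: do not touch the vring */
	atomic_fetch_add(&s->inflight, 1);
	if (!atomic_load(&s->running)) {         /* re-check after announcing ourselves */
		atomic_fetch_sub(&s->inflight, 1);
		return false;
	}
	return true;
}

/* Called when the enqueue/dequeue path is done with the vring. */
static void dev_leave(struct dev_state *s)
{
	atomic_fetch_sub(&s->inflight, 1);
}

/* Called while handling GET_VRING_BASE, before replying to QEMU. */
static void dev_stop_and_drain(struct dev_state *s)
{
	atomic_store(&s->running, false);
	while (atomic_load(&s->inflight) != 0)
		;  /* a real backend would sleep or yield here */
}

In such a scheme, dev_enter()/dev_leave() would bracket the vring access in
the burst functions, and dev_stop_and_drain() would be called from the
GET_VRING_BASE handler before the reply is sent.
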
>
>> It is actually the same issue whether the virtio APP in the guest has crashed or has been finalized.
> I guess that when the APP is finalized correctly, we can have a solution.
> Could you please read the comment I wrote below?
>
>> The virtio APP will only write to the STATUS register, without waiting/syncing with the vhost backend.
> Yes, the virtio APP only writes to the STATUS register. I agree with that.
>
> When the register is written by the guest, KVM will catch it, and the
> context will change to QEMU. QEMU then works as below.
> (Also, while the following steps are running, the guest is waiting, because
> the context is in QEMU.)
>
> Could you please check the following against the latest QEMU code?
> 1. virtio_ioport_write() [hw/virtio/virtio-pci.c] <= the virtio APP
> waits for this function to return.
> 2. virtio_set_status() [hw/virtio/virtio.c]
> 3. virtio_net_set_status() [hw/net/virtio-net.c]
> 4. virtio_net_vhost_status() [hw/net/virtio-net.c]
> 5. vhost_net_stop() [hw/net/vhost_net.c]
> 6. vhost_net_stop_one() [hw/net/vhost_net.c]
> 7. vhost_dev_stop() [hw/virtio/vhost.c]
> 8. vhost_virtqueue_stop() [hw/virtio/vhost.c]
> 9. vhost_user_call() [virtio/vhost-user.c]
> 10. The VHOST_USER_GET_VRING_BASE message is sent to the backend, and QEMU
> waits for the backend's reply.
>
> When the vhost-user backend receives GET_VRING_BASE, I guess the guest
> APP is stopped.
> Also, QEMU will wait for the vhost-user backend's reply, because
> GET_VRING_BASE is a synchronous message.
> Because of the above, I guess the virtio APP can wait for vhost-backend
> finalization.
>
>> After that, not only the guest memory pointed to by the vring entries but also the vring itself is not usable any more.
>> The memory for the vring, or pointed to by the vring entries, might be used by other APPs.
> I agree.
>
>> This will crash the guest (rather than the vhost, do you agree?).
> I guess we cannot assume how the freed memory is reused.
> In some cases a new APP may still work, but the vhost backend can access an
> inconsistent vring structure.
> In that case the vhost backend could receive illegal packets.
> For example, the avail_idx value might be changed to 0xFFFF by a new APP.
> (I am not sure the RX/TX functions can handle such a value correctly.)
>
> Anyway, my point is that if we can finalize the vhost backend correctly, we
> only need to take care of the crash case.
> (If so, it's very nice :))
> So let's make sure whether the virtio APP can wait for finalization or not.
> I am thinking about how to do it now.

I added a sleep() like below.

--- a/hw/virtio/virtio-pci.c
+++ b/hw/virtio/virtio-pci.c
@@ -300,7 +300,10 @@ static void virtio_ioport_write(void *opaque, uint32_t addr, uint32_t val)
             virtio_pci_stop_ioeventfd(proxy);
         }
 
         virtio_set_status(vdev, val & 0xFF);
+        if (val == 0)
+            sleep(10);
 
         if (val & VIRTIO_CONFIG_S_DRIVER_OK) {
             virtio_pci_start_ioeventfd(proxy);

When I run 'dpdk_nic_bind.py' to trigger GET_VRING_BASE, the command takes
10 seconds to finish.
So we can assume the virtio APP is able to wait for the finalization of the
vhost backend.

Thanks,
Tetsuya

>> If you mean this issue, I think we have no solution but one workaround: keep the huge page files of the crashed app,
>> bind virtio to igb_uio, and then delete the huge page files.
> Yes, I agree.
> If the virtio APP has crashed, this will be a solution.
>
> Thanks,
> Tetsuya
>
>> In our implementation, when vhost sends the message, we call the destroy_device callback provided by the vSwitch to ask the
>> vSwitch to stop processing the vring, but this won't solve the issue I mention above, because the virtio APP in the guest
>> won't wait for us.
>>
>> Could you explain a bit more? Is it the same issue?
>>
>> -huawei
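
For reference, the ordering the exchange above converges on (have the vSwitch
stop using the vring via destroy_device() before the GET_VRING_BASE reply is
produced) could look roughly like the sketch below. The struct, flag, and
callback names are simplified stand-ins, not the real librte_vhost
definitions.

/*
 * Sketch: on GET_VRING_BASE, stop the datapath before producing the reply,
 * because QEMU (and the guest driver that will free the vring memory)
 * blocks on that reply.
 */
#include <stdint.h>

#define DEV_RUNNING 0x1  /* stands in for VIRTIO_DEV_RUNNING */

struct vdev {
	uint32_t flags;
	uint16_t last_used_idx;  /* value handed back to QEMU in the reply */
};

struct vdev_ops {
	void (*destroy_device)(struct vdev *dev);  /* vSwitch stops the datapath */
};

static uint16_t handle_get_vring_base(struct vdev *dev, const struct vdev_ops *ops)
{
	/* 1. Make sure nothing touches the vring any more. */
	if (dev->flags & DEV_RUNNING) {
		ops->destroy_device(dev);
		dev->flags &= ~DEV_RUNNING;
	}

	/* 2. Only now is it safe to reply; once QEMU gets this value, the
	 *    guest driver may free the vring memory. */
	return dev->last_used_idx;
}

The essential point is only the ordering: the reply value is returned after
destroy_device() has finished, so by the time QEMU unblocks the guest driver
and the vring memory is freed, the backend is no longer touching it.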