Message-ID: <54D0C3AB.2060304@huawei.com>
Date: Tue, 3 Feb 2015 20:48:43 +0800
From: Linhaifeng
To: "Xie, Huawei", "dpd >> dev@dpdk.org", "ms >> Michael S. Tsirkin"
Cc: "liuyongan@huawei.com"
References: <54C744D9.3060900@huawei.com>
Subject: Re: [dpdk-dev] vhost: virtio-net rx-ring stop work after work many hours, bug?
List-Id: patches and discussions about DPDK

On 2015/1/28 17:51, Xie, Huawei wrote:
>
>
>> -----Original Message-----
>> From: Linhaifeng [mailto:haifeng.lin@huawei.com]
>> Sent: Tuesday, January 27, 2015 3:57 PM
>> To: dpd >> dev@dpdk.org; ms >> Michael S.
Tsirkin
>> Cc: lilijun; liuyongan@huawei.com; Xie, Huawei
>> Subject: vhost: virtio-net rx-ring stop work after work many hours,bug?
>>
>> Hi, all
>>
>> I use vhost-user to send data to a VM. At first it works well, but after many
>> hours the VM can no longer receive data, although it can still send data.
>>
>> (gdb) p avail_idx
>> $4 = 2668
>> (gdb) p free_entries
>> $5 = 0
>> (gdb) l
>>         /* check that we have enough buffers */
>>         if (unlikely(count > free_entries))
>>                 count = free_entries;
>>
>>         if (count == 0) {
>>                 int b = 0;
>>                 if (b) { // when b is set to 1 to notify the guest, the rx_ring starts working again
>>                         if (!(vq->avail->flags & VRING_AVAIL_F_NO_INTERRUPT)) {
>>                                 eventfd_write(vq->callfd, 1);
>>                         }
>>                 }
>>                 return 0;
>>         }
>>
>> Some info I printed in the guest:
>>
>> net eth3:vi->num=199
>> net eth3:rvq info: num_free=57, used->idx=2668, avail->idx=2668
>> net eth3:svq info: num_free=254, used->idx=1644, avail->idx=1644
>>
>> net eth3:vi->num=199
>> net eth3:rvq info: num_free=57, used->idx=2668, avail->idx=2668
>> net eth3:svq info: num_free=254, used->idx=1645, avail->idx=1645
>>
>> net eth3:vi->num=199
>> net eth3:rvq info: num_free=57, used->idx=2668, avail->idx=2668
>> net eth3:svq info: num_free=254, used->idx=1646, avail->idx=1646
>>
>> # free
>>              total       used       free     shared    buffers     cached
>> Mem:       3924100     337252    3586848          0      95984     138060
>> -/+ buffers/cache:     103208    3820892
>> Swap:       970748          0     970748
>>
>> I have two questions:
>> 1. Should we notify the guest when there is no buffer in vq->avail?
>> 2. Why does virtio_net stop filling avail?
>>
>>
>
> Haifeng:
> Thanks for reporting this issue.
> It might not be vhost-user specific, because as long as vhost-user has received all the vring information correctly, it shares the same packet receive/transmit code with vhost-cuse.
> Are you using the latest patch or the old patch?

Xie:
Sorry, I did not see this mail until now.
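To make the gdb numbers in the quoted report concrete, here is a minimal sketch of the check that ends with count == 0. It is not the vhost library code: the function names and the SKETCH_ macro are made up for illustration, and it assumes free_entries is the 16-bit difference between the guest's avail index and the index the host has already consumed. The flag value 1 is the one the virtio ring ABI uses for VRING_AVAIL_F_NO_INTERRUPT.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical local copy of VRING_AVAIL_F_NO_INTERRUPT (value 1 in the virtio ring ABI). */
#define SKETCH_VRING_AVAIL_F_NO_INTERRUPT 1

/*
 * Assumed model: number of rx buffers the guest has made available and the
 * host has not consumed yet. Both indexes are free-running 16-bit counters,
 * so the subtraction must wrap modulo 65536.
 */
uint16_t sketch_free_entries(uint16_t avail_idx, uint16_t last_used_idx)
{
    return (uint16_t)(avail_idx - last_used_idx);
}

/*
 * The guest asks not to be interrupted by setting the flag in avail->flags;
 * the host may notify (kick) only when the flag is clear.
 */
bool sketch_should_kick(uint16_t avail_flags)
{
    return !(avail_flags & SKETCH_VRING_AVAIL_F_NO_INTERRUPT);
}
```

With the values from the report, sketch_free_entries(2668, 2668) is 0, so the host has nothing to fill and returns early. Whether it should also write to the callfd at that point, when sketch_should_kick() allows it, is exactly question 1.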
I use the old code, not the latest patch. The latest patch is OK because, when merge-able is enabled, it notifies the guest after copying each packet. (It may not be OK when the merge-able feature is turned off.)

> 1 Do you disable merge-able feature support in vhost example? There is a bug in vhost-user feature negotiation which is fixed in the latest patch. It could cause the guest not to receive packets at all. So if you are testing only using linux net device, this isn't the cause.

Yes, I disabled it.

> 2. Do you still have the spot? Could you check whether there are available descriptors by checking the desc ring, or even dump the vring status? Check the notify_on_empty flag Michael mentioned? I find a bug in vhost library when processing three or more chained descriptors. But if you never re-configure eth0 with different features, this isn't the cause.
> 3. Is this reproducible? Next time you run a long-hours stability test, could you try disabling the guest virtio features?
> -device virtio-net-pci,netdev=mynet0,mac=54:00:00:54:00:01,csum=off,gso=off,guest_tso4=off,guest_tso6=off,guest_ecn=off
>
> I have run nightly tests of more than ten hours many times before, and haven't met this issue.
> We will check
> * if there is an issue in the vhost code delivering interrupts to the guest which could cause a potential deadlock
> * if there are places where we should deliver interrupts to the guest but miss doing so.
>
>>
>>
>>
>>
>> --
>> Regards,
>> Haifeng
>

-- 
Regards,
Haifeng