From: Ilya Maximets
To: Yuanhan Liu
Cc: dev@dpdk.org, Huawei Xie, Dyasly Sergey, Heetae Ahn, Jianfeng Tan,
 Stephen Hemminger, Thomas Monjalon
Subject: Re: [dpdk-dev] [PATCH] vhost: fix segfault on bad descriptor address.
Date: Wed, 13 Jul 2016 10:34:07 +0300
Message-id: <5785EEEF.3080400@samsung.com>
In-reply-to: <578485CC.8070809@samsung.com>
References: <1463748604-27251-1-git-send-email-i.maximets@samsung.com>
 <20160701073506.GQ2831@yliu-dev.sh.intel.com>
 <577CE930.2070007@samsung.com>
 <20160706122446.GO26521@yliu-dev.sh.intel.com>
 <577F9328.1030901@samsung.com>
 <20160710131731.GS26521@yliu-dev.sh.intel.com>
 <20160711083825.GY26521@yliu-dev.sh.intel.com>
 <57836BE0.2070401@samsung.com>
 <20160711110503.GZ26521@yliu-dev.sh.intel.com>
 <5783876C.1050103@samsung.com>
 <20160712024305.GB26521@yliu-dev.sh.intel.com>
 <578485CC.8070809@samsung.com>

On 12.07.2016 08:53, Ilya Maximets wrote:
> On 12.07.2016 05:43, Yuanhan Liu wrote:
>> On Mon, Jul 11, 2016 at 02:47:56PM +0300, Ilya Maximets wrote:
>>> On 11.07.2016 14:05, Yuanhan Liu wrote:
>>>> On Mon, Jul 11, 2016 at 12:50:24PM +0300, Ilya Maximets wrote:
>>>>> On 11.07.2016 11:38, Yuanhan Liu wrote:
>>>>>> On Sun, Jul 10, 2016 at 09:17:31PM +0800, Yuanhan Liu wrote:
>>>>>>> On Fri, Jul 08, 2016 at 02:48:56PM +0300, Ilya Maximets wrote:
>>>>>>>>
>>>>>>>> Another point is that crash constantly happens on queue_id=3 (second RX queue) in
>>>>>>>> my scenario. It is newly allocated virtqueue while reconfiguration from rxq=1 to
>>>>>>>> rxq=2.
>>>>>>>
>>>>>>> That's a valuable message: what's your DPDK HEAD commit while triggering
>>>>>>> this issue?
>>>>>
>>>>> fbfd99551ca3 ("mbuf: add raw allocation function")
>>>>>
>>>>>>
>>>>>> I guess I have understood what goes wrong in your case.
>>>>>>
>>>>>> I would guess that your vhost has 2 queues (here I mean queue-pairs,
>>>>>> including one Tx and Rx queue; below usage is the same) configured,
>>>>>> so does to your QEMU. However, you just enabled 1 queue while starting
>>>>>> testpmd inside the guest, and you want to enable 2 queues by running
>>>>>> following testpmd commands:
>>>>>>
>>>>>>     stop
>>>>>>     port stop all
>>>>>>     port config all rxq 2
>>>>>>     port config all txq 2
>>>>>>     port start all
>>>>>>
>>>>>> Badly, that won't work for current virtio PMD implementation, and what's
>>>>>> worse, it triggers a vhost crash, the one you saw.
>>>>>>
>>>>>> Here is how it comes. Since you just enabled 1 queue while starting
>>>>>> testpmd, it will set up 1 queue only, meaning only one queue's **valid**
>>>>>> information will be sent to vhost. You might see SET_VRING_ADDR
>>>>>> (and related vhost messages) for the other queue as well, but they
>>>>>> are just the dummy messages: they don't include any valid/real
>>>>>> information about the 2nd queue: the driver doesn't set it up after all.
>>>>>>
>>>>>> So far, so good. It became broken when you run above commands. Those
>>>>>> commands do setup for the 2nd queue, however, they failed to trigger
>>>>>> the QEMU virtio device to start the vhost-user negotiation, meaning
>>>>>> no SET_VRING_ADDR will be sent for the 2nd queue, leaving vhost
>>>>>> untold and not updated.
>>>>>>
>>>>>> What's worse, above commands trigger the QEMU to send SET_VRING_ENABLE
>>>>>> messages, to enable all the vrings. And since the vrings for the 2nd
>>>>>> queue are not properly configured, the crash happens.
>>>>>
>>>>> Hmm, why 2nd queue works properly with my fix to vhost in this case?
>>>>
>>>> Hmm, really? You are sure that data flows in your 2nd queue after those
>>>> commands?
>>>> From what I know is that your patch just avoids a crash, but
>>>> does not fix it.
>>>
>>> Oh, sorry. Yes, it doesn't work. With my patch applied I have a QEMU hang.
>>
>> The crash actually could be avoided by commit 0823c1cb0a73 ("vhost:
>> workaround stale vring base"), accidentally. That's why I asked you
>> above the HEAD commit you were using.
>
> Thanks for pointing this out. I'll check it.

I've checked my DPDK HEAD with the above commit backported. Yes, it helps to
avoid the vhost crash in my scenario. As expected, after reconfiguration the
new virtqueue doesn't work, and QEMU sometimes hangs.

>>>>>> So maybe we should do virtio reset on port start?
>>>>>
>>>>> I guess it was removed by this patch:
>>>>> a85786dc816f ("virtio: fix states handling during initialization").
>>>>
>>>> Seems yes.
>>
>> Actually, we should not do that: do reset on port start. The right fix
>> should be allocating MAX queues virtio device supports (2 here). This
>> would allow us changing the queue number dynamically.
>
> Yes, I agree that this is the right way to fix this issue.
>
>> But this doesn't sound a simple fix; it involves many code changes, due
>> to it was not designed this way before. Therefore, we will not fix it
>> in this release, due to it's too late. Let's fix it in the next release
>> instead. For the crash issue, it will not happen with the latest HEAD.
>> Though it's accident fix, I think we are fine here.

This scenario is fixed somehow, I agree. But this patch is still needed to
protect vhost from an untrusted VM, i.e. from a malicious or buggy virtio
application. Maybe we could change the commit message and resend this patch
as a security enhancement? What do you think?
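
To illustrate the kind of protection meant here, below is a minimal sketch
of a checked guest-to-host address translation. It is not the actual patch
and not the real DPDK code: the mem_region structure and the
gpa_to_hva_checked() function are hypothetical names made up for this
example. The idea is only that a descriptor address coming from the guest
is validated against the known memory regions before it is dereferenced,
so a bad address results in a dropped descriptor rather than a segfault.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical guest memory region; NOT the real DPDK structure. */
struct mem_region {
    uint64_t guest_phys_addr; /* start of the region in guest physical space */
    uint64_t size;            /* length of the region in bytes */
    uint64_t host_user_addr;  /* where the region is mapped in the host process */
};

/*
 * Translate the guest physical range [gpa, gpa + len) to a host address.
 * Returns 0 if the range is not fully contained in a single region, so
 * the caller can skip the descriptor instead of touching a bogus pointer.
 */
static uint64_t
gpa_to_hva_checked(const struct mem_region *regions, size_t nregions,
                   uint64_t gpa, uint64_t len)
{
    for (size_t i = 0; i < nregions; i++) {
        const struct mem_region *r = &regions[i];

        if (gpa < r->guest_phys_addr)
            continue;

        uint64_t off = gpa - r->guest_phys_addr;

        /* Compare by subtraction so a huge gpa or len cannot overflow. */
        if (off < r->size && len <= r->size - off)
            return r->host_user_addr + off;
    }
    return 0; /* address is outside all regions: treat as invalid */
}
```

A caller would check the result for 0 and stop processing that descriptor
chain, instead of assuming that every address supplied by the guest is valid.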