From: Ilya Maximets
To: Yuanhan Liu
Cc: dev@dpdk.org, Huawei Xie, Dyasly Sergey, Heetae Ahn, Jianfeng Tan,
 Stephen Hemminger, Thomas Monjalon
Date: Tue, 12 Jul 2016 08:53:16 +0300
Subject: Re: [dpdk-dev] [PATCH] vhost: fix segfault on bad descriptor address.
Message-id: <578485CC.8070809@samsung.com>
In-reply-to: <20160712024305.GB26521@yliu-dev.sh.intel.com>
References: <1463748604-27251-1-git-send-email-i.maximets@samsung.com>
 <20160701073506.GQ2831@yliu-dev.sh.intel.com>
 <577CE930.2070007@samsung.com>
 <20160706122446.GO26521@yliu-dev.sh.intel.com>
 <577F9328.1030901@samsung.com>
 <20160710131731.GS26521@yliu-dev.sh.intel.com>
 <20160711083825.GY26521@yliu-dev.sh.intel.com>
 <57836BE0.2070401@samsung.com>
 <20160711110503.GZ26521@yliu-dev.sh.intel.com>
 <5783876C.1050103@samsung.com>
 <20160712024305.GB26521@yliu-dev.sh.intel.com>

On 12.07.2016 05:43, Yuanhan Liu wrote:
> On Mon, Jul 11, 2016 at 02:47:56PM +0300, Ilya Maximets wrote:
>> On 11.07.2016 14:05, Yuanhan Liu wrote:
>>> On Mon, Jul 11, 2016 at 12:50:24PM +0300, Ilya Maximets wrote:
>>>> On 11.07.2016 11:38, Yuanhan Liu wrote:
>>>>> On Sun, Jul 10, 2016 at 09:17:31PM +0800, Yuanhan Liu wrote:
>>>>>> On Fri, Jul 08, 2016 at 02:48:56PM +0300, Ilya Maximets wrote:
>>>>>>>
>>>>>>> Another point is that the crash constantly happens on queue_id=3
>>>>>>> (second RX queue) in my scenario. It is a virtqueue newly allocated
>>>>>>> during reconfiguration from rxq=1 to rxq=2.
>>>>>>
>>>>>> That's valuable information: what was your DPDK HEAD commit when
>>>>>> you triggered this issue?
>>>>
>>>> fbfd99551ca3 ("mbuf: add raw allocation function")
>>>>
>>>>>
>>>>> I guess I have understood what goes wrong in your case.
>>>>>
>>>>> I would guess that your vhost has 2 queues (here I mean queue-pairs,
>>>>> each including one Tx and one Rx queue; the usage below is the same)
>>>>> configured, and so does your QEMU. However, you enabled just 1 queue
>>>>> while starting testpmd inside the guest, and you want to enable
>>>>> 2 queues by running the following testpmd commands:
>>>>>
>>>>>     stop
>>>>>     port stop all
>>>>>     port config all rxq 2
>>>>>     port config all txq 2
>>>>>     port start all
>>>>>
>>>>> Unfortunately, that won't work with the current virtio PMD
>>>>> implementation, and what's worse, it triggers a vhost crash, the one
>>>>> you saw.
>>>>>
>>>>> Here is how it happens. Since you enabled just 1 queue while starting
>>>>> testpmd, it will set up 1 queue only, meaning only one queue's **valid**
>>>>> information will be sent to vhost. You might see SET_VRING_ADDR
>>>>> (and related vhost messages) for the other queue as well, but they
>>>>> are just dummy messages: they don't include any valid/real
>>>>> information about the 2nd queue, since the driver doesn't set it up
>>>>> after all.
>>>>>
>>>>> So far, so good. It breaks when you run the above commands. Those
>>>>> commands do set up the 2nd queue; however, they fail to trigger
>>>>> the QEMU virtio device to start the vhost-user negotiation, meaning
>>>>> no SET_VRING_ADDR will be sent for the 2nd queue, leaving vhost
>>>>> uninformed and not updated.
>>>>>
>>>>> What's worse, the above commands trigger QEMU to send SET_VRING_ENABLE
>>>>> messages to enable all the vrings. And since the vrings for the 2nd
>>>>> queue are not properly configured, the crash happens.
>>>>
>>>> Hmm, why does the 2nd queue work properly with my fix to vhost in
>>>> this case?
>>>
>>> Hmm, really? Are you sure that data flows in your 2nd queue after those
>>> commands? From what I know, your patch just avoids a crash, but
>>> does not fix it.
>>
>> Oh, sorry. Yes, it doesn't work. With my patch applied I have a QEMU hang.
>
> The crash actually could be avoided by commit 0823c1cb0a73 ("vhost:
> workaround stale vring base"), accidentally. That's why I asked you
> above which HEAD commit you were using.

Thanks for pointing this out. I'll check it.

>>>>> So maybe we should do virtio reset on port start?
>>>>
>>>> I guess it was removed by this patch:
>>>> a85786dc816f ("virtio: fix states handling during initialization").
>>>
>>> Seems yes.
>
> Actually, we should not do that (reset on port start). The right fix
> is to allocate the MAX number of queues the virtio device supports
> (2 here). This would allow us to change the queue number dynamically.

Yes, I agree that this is the right way to fix this issue.

> But this doesn't sound like a simple fix; it involves many code changes,
> because it was not designed this way before. Therefore, we will not fix
> it in this release, since it's too late. Let's fix it in the next release
> instead. As for the crash issue, it will not happen with the latest HEAD.
> Though it's an accidental fix, I think we are fine here.
>
> --yliu
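
For illustration only, the failure sequence and the proposed direction
discussed above can be modelled in a few lines of C. This is a toy sketch
under stated assumptions, not the patch under discussion and not DPDK or
QEMU code: every toy_* name, the struct layout, and the guard in
toy_set_vring_enable() are hypothetical. It only shows why enabling a vring
that never received a valid SET_VRING_ADDR is dangerous, and why setting up
all supported vrings at init time would avoid the problem.

```c
#include <stddef.h>

/* Hypothetical, simplified model. None of these names exist in DPDK or QEMU. */
#define TOY_MAX_QUEUE_PAIRS 2
#define TOY_MAX_VRINGS (TOY_MAX_QUEUE_PAIRS * 2)

struct toy_vring {
	void *desc;   /* NULL until a valid SET_VRING_ADDR-like message arrives */
	int enabled;
};

struct toy_vhost_dev {
	struct toy_vring vring[TOY_MAX_VRINGS];
};

/* Models a *valid* SET_VRING_ADDR: remember where the descriptor table is. */
static int
toy_set_vring_addr(struct toy_vhost_dev *dev, unsigned int idx, void *desc)
{
	if (idx >= TOY_MAX_VRINGS || desc == NULL)
		return -1;
	dev->vring[idx].desc = desc;
	return 0;
}

/*
 * Models SET_VRING_ENABLE. The check for an unset descriptor table is an
 * assumed guard added for this sketch: without it, a later access to the
 * ring would dereference a bad address.
 */
static int
toy_set_vring_enable(struct toy_vhost_dev *dev, unsigned int idx, int enable)
{
	if (idx >= TOY_MAX_VRINGS)
		return -1;
	if (enable && dev->vring[idx].desc == NULL)
		return -1;
	dev->vring[idx].enabled = enable;
	return 0;
}

/*
 * Driver-side sketch of the direction agreed on above: configure every
 * vring the device supports at init, so that changing the number of
 * queues in use later needs no new address negotiation.
 */
static int
toy_driver_init_all(struct toy_vhost_dev *dev, void *tables[TOY_MAX_VRINGS])
{
	for (unsigned int i = 0; i < TOY_MAX_VRINGS; i++) {
		if (toy_set_vring_addr(dev, i, tables[i]) != 0)
			return -1;
	}
	return 0;
}
```

With only vrings 0 and 1 configured, toy_set_vring_enable(&dev, 3, 1) is
rejected in this model; after toy_driver_init_all() the same call succeeds.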