From: Maxime Coquelin
To: Luca Boccassi, stable@dpdk.org, Yuanhan Liu
Cc: ktraynor@redhat.com
Date: Mon, 5 Mar 2018 15:25:07 +0100
Subject: Re: [dpdk-stable] [PATCH v16.11 LTS] vhost: protect active rings from async ring changes
Message-ID: <024f758d-97c8-3f2c-ff11-b17d462ee899@redhat.com>
In-Reply-To: <1520255133.27712.6.camel@debian.org>
References: <20180302171042.26094-1-maxime.coquelin@redhat.com> <1520011690.22753.86.camel@debian.org> <3106e381-ebd5-d628-fbc7-23a47f523711@redhat.com> <97270ccb-e40c-f141-0e80-33ed8ff2091f@redhat.com> <1520255133.27712.6.camel@debian.org>
List-Id: patches for DPDK stable branches

On 03/05/2018 02:05 PM, Luca Boccassi wrote:
> On Mon, 2018-03-05 at 13:34 +0100, Maxime Coquelin wrote:
>> With up-to-date Yuanhan address.
>>
>> On 03/05/2018 01:32 PM, Maxime Coquelin wrote:
>>>
>>>
>>> On 03/02/2018 06:28 PM, Luca Boccassi wrote:
>>>> On Fri, 2018-03-02 at 18:10 +0100, Maxime Coquelin wrote:
>>>>> From: Victor Kaplansky
>>>>>
>>>>> [ backported from upstream commit
>>>>> a3688046995f88c518fa27c45b39ae389260b18d ]
>>>>>
>>>>> When performing live migration or memory hot-plugging,
>>>>> the changes to the device and vrings made by message handler
>>>>> done independently from vring usage by PMD threads.
>>>>>
>>>>> This causes for example segfaults during live-migration
>>>>> with MQ enable, but in general virtually any request
>>>>> sent by qemu changing the state of device can cause
>>>>> problems.
>>>>>
>>>>> These patches fixes all above issues by adding a spinlock
>>>>> to every vring and requiring message handler to start operation
>>>>> only after ensuring that all PMD threads related to the device
>>>>> are out of critical section accessing the vring data.
>>>>>
>>>>> Each vring has its own lock in order to not create contention
>>>>> between PMD threads of different vrings and to prevent
>>>>> performance degradation by scaling queue pair number.
>>>>>
>>>>> See https://bugzilla.redhat.com/show_bug.cgi?id=1450680
>>>>>
>>>>> Cc: stable@dpdk.org
>>>>> Signed-off-by: Victor Kaplansky
>>>>> Reviewed-by: Maxime Coquelin
>>>>> Acked-by: Yuanhan Liu
>>>>>
>>>>> Backport conflicts:
>>>>>     lib/librte_vhost/vhost.c
>>>>>     lib/librte_vhost/vhost.h
>>>>>     lib/librte_vhost/vhost_user.c
>>>>>     lib/librte_vhost/virtio_net.c
>>>>>
>>>>> Signed-off-by: Maxime Coquelin
>>>>> ---
>>>>>
>>>>> Hi Luca, All,
>>>>>
>>>>> This is the v16.11 backport for Victor's patch already available in
>>>>> master and v17.11 LTS. It needed some rework to be applied to v16.11.
>>>>
>>>> Thank you, applied and pushed to dpdk-stable/16.11.
>>>>
>>>
>>> Thanks Luca,
>>>
>>> There is another patch that would be applied on top of it, as Victor's
>>> patch introduce a regression with Virtio-user. I see it is neither in
>>> 16.11 nor 17.11 LTS:
>>>
>>> commit 9fce5d0b401fc2c13a860bbbfdebcf85080334e1
>>> Author: Maxime Coquelin
>>> Date:   Mon Feb 12 16:46:12 2018 +0100
>>>
>>>      vhost: do not take lock on owner reset
>>>
>>>      A deadlock happens when handling VHOST_USER_RESET_OWNER request
>>>      for the same reason the lock is not taken for
>>>      VHOST_USER_GET_VRING_BASE.
>>>
>>>      It is safe not to take the lock, as the queues are no more used
>>>      by the application when the virtqueues and the device are reset.
>>>
>>>      Fixes: a3688046995f ("vhost: protect active rings from async ring changes")
>>>      Cc: stable@dpdk.org
>>>
>>>      Signed-off-by: Maxime Coquelin
>>>      Reviewed-by: Tiwei Bie
>>>      Reviewed-by: Jianfeng Tan
>>>
>>>
>>> Let me know if you want me to post the backport to stable@dpdk.org,
>>> or if you can pick it directly from upstream master.
>>>
>>> Cheers,
>>> Maxime
>
> I can take care of that for 16.11 - have you tested it on top of the
> current dpdk-stable/16.11 ?
> We are in the 11th hour so I want to make
> sure any new patches that I pick are tested :-)

I understand!

So I just tried to test the patch with virtio-user, but it is not
enabled in the default config, and I didn't manage to make it work
when enabled.

The issue it solves can also be reproduced with old QEMU versions
(v2.4 and earlier), which send the VHOST_USER_RESET_OWNER request.
With such a QEMU, I managed to reproduce the bug, and once the patch
is applied, the deadlock no longer appears, so:

Tested-by: Maxime Coquelin

Thanks!
Maxime