From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-wr0-f180.google.com (mail-wr0-f180.google.com [209.85.128.180]) by dpdk.org (Postfix) with ESMTP id 7B129126B for ; Mon, 5 Mar 2018 16:54:52 +0100 (CET) Received: by mail-wr0-f180.google.com with SMTP id o76so17794899wrb.7 for ; Mon, 05 Mar 2018 07:54:52 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:message-id:subject:from:to:cc:date:in-reply-to :references:content-transfer-encoding:mime-version; bh=xaZOZAxzFGnmibGYRBVAuA/gG2wYmesScccZEe1kfbw=; b=ngKCHViv2/h0wAm0/Sr/G/lF9ld5oH4GfvS9DWzsrhydTYpCK8ryXAm1NdaXIQzUQU okAbb6gMa3PIDjcyXNK/0q7vwggpiaFOH4TsXXRuoEe7al13vSlN7zXvcy8aL8GOa5ef 25aAsX8fmOzdOzCprQ+ecZ8UGr10qWiFw3A/q8Ig/O7a//BKSMIenQs5drODW2Nmwm00 k+Ev09N9rm4fpKBPMsV3zF0QjDk+bn7AgtqlUliSzd8798eZAr8jOpgZFt8dXKo9XsYR r7QR3LY71y56A5bUogh1fFA6M5CGdb5zegr/4GdHA4RtbkmLMiDz9Lh6JAZyW4eA7wIU DPDw== X-Gm-Message-State: APf1xPDo3thT1t9e0WhlhfUIELu3j4T+IlCNgs89RXZ/ccFjJc+KrKBc vPZfR7xs3hZqrMqMqzilwJE= X-Google-Smtp-Source: AG47ELvEvaMZAzyaKA5ZeHr6tKCPFa7GQ7hVxKwp72NS2FzWOm9FRiFlY6xob4GVMenY1QyVOKCP4Q== X-Received: by 10.223.177.132 with SMTP id q4mr12614953wra.27.1520265291790; Mon, 05 Mar 2018 07:54:51 -0800 (PST) Received: from localhost ([2a00:23c5:be85:1400:6930:196b:56ac:33db]) by smtp.gmail.com with ESMTPSA id v17sm11633512wrd.5.2018.03.05.07.54.50 (version=TLS1_2 cipher=ECDHE-RSA-CHACHA20-POLY1305 bits=256/256); Mon, 05 Mar 2018 07:54:51 -0800 (PST) Message-ID: <1520265290.27712.8.camel@debian.org> From: Luca Boccassi To: Maxime Coquelin , stable@dpdk.org, Yuanhan Liu Cc: ktraynor@redhat.com Date: Mon, 05 Mar 2018 15:54:50 +0000 In-Reply-To: <024f758d-97c8-3f2c-ff11-b17d462ee899@redhat.com> References: <20180302171042.26094-1-maxime.coquelin@redhat.com> <1520011690.22753.86.camel@debian.org> <3106e381-ebd5-d628-fbc7-23a47f523711@redhat.com> <97270ccb-e40c-f141-0e80-33ed8ff2091f@redhat.com> <1520255133.27712.6.camel@debian.org> <024f758d-97c8-3f2c-ff11-b17d462ee899@redhat.com> Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Mailer: Evolution 3.22.6-1+deb9u1 Mime-Version: 1.0 Subject: Re: [dpdk-stable] [PATCH v16.11 LTS] vhost: protect active rings from async ring changes X-BeenThere: stable@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: patches for DPDK stable branches List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 05 Mar 2018 15:54:52 -0000 On Mon, 2018-03-05 at 15:25 +0100, Maxime Coquelin wrote: >=20 > On 03/05/2018 02:05 PM, Luca Boccassi wrote: > > On Mon, 2018-03-05 at 13:34 +0100, Maxime Coquelin wrote: > > > With up-to-date Yuanhan address. > > >=20 > > > On 03/05/2018 01:32 PM, Maxime Coquelin wrote: > > > >=20 > > > >=20 > > > > On 03/02/2018 06:28 PM, Luca Boccassi wrote: > > > > > On Fri, 2018-03-02 at 18:10 +0100, Maxime Coquelin wrote: > > > > > > From: Victor Kaplansky > > > > > >=20 > > > > > > [ backported from upstream commit > > > > > > a3688046995f88c518fa27c45b39ae389260b18d ] > > > > > >=20 > > > > > > When performing live migration or memory hot-plugging, > > > > > > the changes to the device and vrings made by message > > > > > > handler > > > > > > done independently from vring usage by PMD threads. > > > > > >=20 > > > > > > This causes for example segfaults during live-migration > > > > > > with MQ enable, but in general virtually any request > > > > > > sent by qemu changing the state of device can cause > > > > > > problems. > > > > > >=20 > > > > > > These patches fixes all above issues by adding a spinlock > > > > > > to every vring and requiring message handler to start > > > > > > operation > > > > > > only after ensuring that all PMD threads related to the > > > > > > device > > > > > > are out of critical section accessing the vring data. > > > > > >=20 > > > > > > Each vring has its own lock in order to not create > > > > > > contention > > > > > > between PMD threads of different vrings and to prevent > > > > > > performance degradation by scaling queue pair number. > > > > > >=20 > > > > > > See https://bugzilla.redhat.com/show_bug.cgi?id=3D1450680 > > > > > >=20 > > > > > > Cc: stable@dpdk.org > > > > > > Signed-off-by: Victor Kaplansky > > > > > > Reviewed-by: Maxime Coquelin > > > > > > Acked-by: Yuanhan Liu > > > > > >=20 > > > > > > Backport conflicts: > > > > > > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0lib/librte_vhost/vhost.c > > > > > > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0lib/librte_vhost/vhost.h > > > > > > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0lib/librte_vhost/vhost_user.c > > > > > > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0lib/librte_vhost/virtio_net.c > > > > > >=20 > > > > > > Signed-off-by: Maxime Coquelin > > > > > > --- > > > > > >=20 > > > > > > Hi Luca, All, > > > > > >=20 > > > > > > This is the v16.11 backport for Victor's patch already > > > > > > available in > > > > > > master and v17.11 LTS. It needed some rework to be applied > > > > > > to > > > > > > v16.11. > > > > >=20 > > > > > Thank you, applied and pushed to dpdk-stable/16.11. > > > > >=20 > > > >=20 > > > > Thanks Luca, > > > >=20 > > > > There is another patch that would be applied on top of it, as > > > > Victor's > > > > patch introduce a regression with Virtio-user. I see it is > > > > neither > > > > in > > > > 16.11 nor 17.11 LTS: > > > >=20 > > > > commit 9fce5d0b401fc2c13a860bbbfdebcf85080334e1 > > > > Author: Maxime Coquelin > > > > Date:=C2=A0=C2=A0 Mon Feb 12 16:46:12 2018 +0100 > > > >=20 > > > > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 vhost: do not take lock on owner res= et > > > >=20 > > > > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 A deadlock happens when handling VHO= ST_USER_RESET_OWNER > > > > request > > > > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 for the same reason the lock is not = taken for > > > > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 VHOST_USER_GET_VRING_BASE. > > > >=20 > > > > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 It is safe not to take the lock, as = the queues are no > > > > more > > > > used > > > > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 by the application when the virtqueu= es and the device are > > > > reset. > > > >=20 > > > > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 Fixes: a3688046995f ("vhost: protect= active rings from > > > > async > > > > ring > > > > changes") > > > > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 Cc: stable@dpdk.org > > > >=20 > > > > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 Signed-off-by: Maxime Coquelin > > > m> > > > > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 Reviewed-by: Tiwei Bie > > > > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 Reviewed-by: Jianfeng Tan > > > >=20 > > > >=20 > > > > Let me know if you want me to post the backport to stable@dpdk. > > > > org, > > > > or if you can pick it directly from upstream master. > > > >=20 > > > > Cheers, > > > > Maxime > >=20 > > I can take care of that for 16.11 - have you tested it on top of > > the > > current dpdk-stable/16.11 ? We are in the 11th hour so I want to > > make > > sure any new patches that I pick are tested :-) >=20 > I understand! > So I just tried to test the patch with virtio-usr, but it is not > enabled=C2=A0 > in default config, and I didn't made it to work when enabled. >=20 > The issue it solves can be reproduced with old QEMU that sends > _RESET_OWNER request (v2.4 and earlier). With this, I manage to > reproduce the bug, and once patch is applied, the deadlock no more > appear, so: >=20 > Tested-by: Maxime Coquelin >=20 > Thanks! > Maxime Great, thank you, applied & pushed to dpdk-stable/16.11 --=20 Kind regards, Luca Boccassi