From: Igor Ryzhov
Date: Tue, 28 Jul 2020 11:56:26 +0300
To: Stephen Hemminger
Cc: Ferruh Yigit, Thomas Monjalon, dev, dpdk stable
Subject: Re: [dpdk-stable] [dpdk-dev] [PATCH] kni: fix kernel deadlock when using mlx devices
References: <20191222175551.17684-1-stephen@networkplumber.org> <3101970.h16uAIiOU7@xps> <20200505171454.00274f10@hermes.lan> <20200727105255.74981391@hermes.lan>
In-Reply-To: <20200727105255.74981391@hermes.lan>

Hi,

Didn't see this patch previously, but we came up with the same idea
internally and also faced a hang during the application shutdown.
We didn't dig deep, but it occurred in kni_release function.

Igor

On Mon, Jul 27, 2020 at 8:53 PM Stephen Hemminger <stephen@networkplumber.org> wrote:

> On Mon, 27 Jul 2020 18:33:08 +0100
> Ferruh Yigit wrote:
>
> > On 5/6/2020 1:14 AM, Stephen Hemminger wrote:
> > > On Wed, 18 Mar 2020 16:17:57 +0100
> > > Thomas Monjalon wrote:
> > >
> > >> 17/01/2020 17:43, Ferruh Yigit:
> > >>> On 12/22/2019 5:55 PM, Stephen Hemminger wrote:
> > >>>> This fixes a deadlock when using KNI with bifurcated drivers.
> > >>>> Bringing kni device up always times out when using Mellanox
> > >>>> devices.
> > >>>>
> > >>>> The kernel KNI driver sends message to userspace to complete
> > >>>> the request. For the case of bifurcated driver, this may involve
> > >>>> an additional request to kernel to change state.
> > >>>> This request would deadlock because KNI was holding the RTNL mutex.
> > >>>>
> > >>>> This was a bad design which goes back to the original code.
> > >>>> A workaround is for KNI driver to drop RTNL while waiting.
> > >>>> To prevent the device from disappearing while the operation
> > >>>> is in progress, it needs to hold reference to network device
> > >>>> while waiting.
> > >>>>
> > >>>> As an added benefit, a useless error check can also be removed.
> > >>>>
> > >>>> Fixes: 3fc5ca2f6352 ("kni: initial import")
> > >>>> Cc: stable@dpdk.org
> > >>>> Signed-off-by: Stephen Hemminger
> > >>>> ---
> > >>>
> > >>> This patch causes a hang on my server. Not sure what exactly the
> > >>> problem was, but the kernel log was continuously printing
> > >>> "Cannot send to req_q". Will dig more.
> > >>
> > >> Ferruh, did you have a chance to check what is hanging?
> > >> Stephen, is there any news on your side?
> > >>
> > >>
> > >
> > > It did not hang when I tested it. The bug report is still open
> > >
> >
> > Sorry for the delay. Since I am working remotely I was worried about
> > losing the connection to my server; I finally created a virtual
> > environment to test again.
> >
> > I confirm the hang is observed 100% of the time when two different
> > processes update the kni interface, e.g. when two different processes
> > set the MTU. Without this patch this works fine.
> >
> > I understand the motivation of the patch, but with this change there
> > is a possibility of hanging the server, which we can't allow, so we
> > need to find another way. Can updating the mlx interface wait for the
> > KNI interface operation to complete?
>
> Still KNI driver is broken. Calling userspace with RTNL held is
> fundamentally broken design. If KNI were to be incorporated in upstream
> kernel, then the netdev developer would see this.
>
> Whatever solution you think is best.
> I will continue to recommend against anyone using KNI.
>
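
For readers following the thread, the locking pattern described in the quoted patch can be sketched in userspace. This is only an analogy in Python, not KNI or kernel code: `rtnl` is an ordinary `threading.Lock` standing in for the RTNL mutex, and `responder` and `change_state` are hypothetical names for the userspace handler and the kernel-side request path. The timeouts are arbitrary. The device reference that the patch holds while waiting is not modelled, and neither is the two-process hang Ferruh reports.

```python
import threading

# Stand-in for the kernel's RTNL mutex (illustrative only, not kernel code).
rtnl = threading.Lock()


def responder(request_posted, reply_ready):
    """Hypothetical userspace side: finishing the request needs the lock too."""
    request_posted.wait()
    # Bifurcated-driver case: handling the request calls back into a path
    # that takes the same lock the requester may still be holding.
    if rtnl.acquire(timeout=0.1):
        rtnl.release()
        reply_ready.set()
    # If the lock could not be taken, no reply is ever sent.


def change_state(drop_lock_while_waiting, timeout=0.3):
    """Hypothetical request path. Returns True if a reply arrived in time."""
    request_posted = threading.Event()
    reply_ready = threading.Event()
    worker = threading.Thread(target=responder,
                              args=(request_posted, reply_ready))
    worker.start()

    rtnl.acquire()              # the caller enters with the lock already held
    try:
        request_posted.set()    # hand the request to the responder
        if drop_lock_while_waiting:
            rtnl.release()      # workaround: drop the lock while waiting
            try:
                ok = reply_ready.wait(timeout)
            finally:
                rtnl.acquire()  # take it back before returning to the caller
        else:
            ok = reply_ready.wait(timeout)  # original behaviour: wait holding it
    finally:
        rtnl.release()
    worker.join()
    return ok


if __name__ == "__main__":
    print("lock held while waiting   ->", change_state(False))
    print("lock dropped while waiting ->", change_state(True))
```

With the lock held across the wait, the responder can never take it and the request times out, which matches the "always times out" symptom in the patch description. Dropping the lock lets the responder finish, but it also opens a window in which other callers can take the lock, which may be relevant to the hang discussed above.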