From mboxrd@z Thu Jan 1 00:00:00 1970
References: <20191222175551.17684-1-stephen@networkplumber.org> <3101970.h16uAIiOU7@xps> <20200505171454.00274f10@hermes.lan> <20200727105255.74981391@hermes.lan>
In-Reply-To: <20200727105255.74981391@hermes.lan>
From: Igor Ryzhov
Date: Tue, 28 Jul 2020 11:56:26 +0300
To: Stephen Hemminger
Cc: Ferruh Yigit, Thomas Monjalon, dev, dpdk stable
Subject: Re: [dpdk-dev] [PATCH] kni: fix kernel deadlock when using mlx devices
List-Id: DPDK patches and discussions

Hi,

Didn't see this patch previously, but we came up with the same idea
internally and also faced a hang during the application shutdown.
We didn't dig deep, but it occurred in kni_release function.

Igor

On Mon, Jul 27, 2020 at 8:53 PM Stephen Hemminger <stephen@networkplumber.org> wrote:

> On Mon, 27 Jul 2020 18:33:08 +0100
> Ferruh Yigit wrote:
>
> > On 5/6/2020 1:14 AM, Stephen Hemminger wrote:
> > > On Wed, 18 Mar 2020 16:17:57 +0100
> > > Thomas Monjalon wrote:
> > >
> > >> 17/01/2020 17:43, Ferruh Yigit:
> > >>> On 12/22/2019 5:55 PM, Stephen Hemminger wrote:
> > >>>> This fixes a deadlock when using KNI with bifurcated drivers.
> > >>>> Bringing kni device up always times out when using Mellanox
> > >>>> devices.
> > >>>>
> > >>>> The kernel KNI driver sends message to userspace to complete
> > >>>> the request. For the case of bifurcated driver, this may involve
> > >>>> an additional request to kernel to change state.
> > >>>> This request would deadlock because KNI was holding the RTNL mutex.
> > >>>>
> > >>>> This was a bad design which goes back to the original code.
> > >>>> A workaround is for KNI driver to drop RTNL while waiting.
> > >>>> To prevent the device from disappearing while the operation
> > >>>> is in progress, it needs to hold reference to network device
> > >>>> while waiting.
> > >>>>
> > >>>> As an added benefit, a useless error check can also be removed.
> > >>>>
> > >>>> Fixes: 3fc5ca2f6352 ("kni: initial import")
> > >>>> Cc: stable@dpdk.org
> > >>>> Signed-off-by: Stephen Hemminger
> > >>>> ---
> > >>>
> > >>> This patch causes a hang on my server, not sure what exactly was the
> > >>> problem but kernel log was continuously printing "Cannot send to req_q".
> > >>> Will dig more.
> > >>
> > >> Ferruh, did you have a chance to check what is hanging?
> > >> Stephen, is there any news on your side?
> > >>
> > >>
> > >
> > > It did not hang when I tested it. The bug report is still open
> > >
> >
> > Sorry for the delay, since I am working remotely I was worried about
> > losing the connection to my server, finally I did create a virtual
> > environment to test again.
> >
> > I confirm the hang is observed 100% of the time when two different
> > processes update the kni interface, like two different processes setting
> > the mtu. Without this patch this works fine.
> >
> > I understand the motivation of the patch, but with the change there is a
> > possibility to hang the server, which we can't allow, so we need to find
> > another way. Can updating the mlx interface wait for the KNI interface
> > operation to complete?
>
> Still, the KNI driver is broken. Calling userspace with RTNL held is a
> fundamentally broken design. If KNI were to be incorporated in the upstream
> kernel, then the netdev developers would see this.
>
> Whatever solution you think is best.
> I will continue to recommend against anyone using KNI.
>
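
For readers following the thread, the deadlock described in the quoted
commit message can be illustrated with a small userspace analogy. This is
only a sketch under stated assumptions, not the KNI code and not the patch
itself: `rtnl`, `userspace_handler` and `kni_request` are hypothetical
names, a Python lock stands in for the kernel's RTNL mutex, and a thread
stands in for the DPDK application. The device reference that the patch
holds while waiting is not modelled, and neither is the two-process hang
reported by Ferruh.

```python
import threading

# Stand-in for the kernel's RTNL mutex (assumption: one global lock).
rtnl = threading.Lock()


def userspace_handler(request_ready, response_done):
    """Hypothetical stand-in for the application servicing a KNI request.

    With a bifurcated driver, completing the request needs another kernel
    call that itself takes RTNL; acquiring `rtnl` models that call.
    """
    request_ready.wait()
    with rtnl:
        pass  # e.g. the kernel driver changes device state here
    response_done.set()


def kni_request(drop_rtnl_while_waiting, timeout=0.5):
    """Return True if the handler answered before the timeout expired."""
    request_ready = threading.Event()
    response_done = threading.Event()
    worker = threading.Thread(
        target=userspace_handler,
        args=(request_ready, response_done),
        daemon=True,
    )
    worker.start()

    rtnl.acquire()  # the request path is entered with RTNL already held
    try:
        request_ready.set()  # hand the request to "userspace"
        if drop_rtnl_while_waiting:
            # Workaround from the commit message: drop RTNL while waiting.
            rtnl.release()
            try:
                answered = response_done.wait(timeout)
            finally:
                rtnl.acquire()
        else:
            # Original behaviour: wait with RTNL held, so the handler
            # can never take the lock and the wait times out.
            answered = response_done.wait(timeout)
    finally:
        rtnl.release()
    worker.join()
    return answered


if __name__ == "__main__":
    print("RTNL held while waiting:   ", kni_request(False))
    print("RTNL dropped while waiting:", kni_request(True))
```

In this model the first call times out because the handler cannot take the
lock until the requester gives up, which matches the "always times out"
symptom in the commit message. The second call completes because the lock
is free during the wait.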