Date: Mon, 27 Jul 2020 10:52:55 -0700
From: Stephen Hemminger
To: Ferruh Yigit
Cc: Thomas Monjalon, dev@dpdk.org, stable@dpdk.org
Subject: Re: [dpdk-stable] [dpdk-dev] [PATCH] kni: fix kernel deadlock when using mlx devices
Message-ID: <20200727105255.74981391@hermes.lan>
References: <20191222175551.17684-1-stephen@networkplumber.org> <3101970.h16uAIiOU7@xps> <20200505171454.00274f10@hermes.lan>

On Mon, 27 Jul 2020 18:33:08 +0100
Ferruh Yigit wrote:

> On 5/6/2020 1:14 AM, Stephen Hemminger wrote:
> > On Wed, 18 Mar 2020 16:17:57 +0100
> > Thomas Monjalon wrote:
> >
> >> 17/01/2020 17:43, Ferruh Yigit:
> >>> On 12/22/2019 5:55 PM, Stephen Hemminger wrote:
> >>>> This fixes a deadlock when using KNI with bifurcated drivers.
> >>>> Bringing the KNI device up always times out when using Mellanox
> >>>> devices.
> >>>>
> >>>> The kernel KNI driver sends a message to userspace to complete
> >>>> the request. In the case of a bifurcated driver, this may involve
> >>>> an additional request to the kernel to change state. That request
> >>>> would deadlock because KNI was holding the RTNL mutex.
> >>>>
> >>>> This was a bad design which goes back to the original code.
> >>>> A workaround is for the KNI driver to drop RTNL while waiting.
> >>>> To prevent the device from disappearing while the operation
> >>>> is in progress, it needs to hold a reference to the network
> >>>> device while waiting.
> >>>>
> >>>> As an added benefit, a useless error check can also be removed.
> >>>>
> >>>> Fixes: 3fc5ca2f6352 ("kni: initial import")
> >>>> Cc: stable@dpdk.org
> >>>> Signed-off-by: Stephen Hemminger
> >>>> ---
> >>>
> >>> This patch causes a hang on my server. I am not sure what exactly the
> >>> problem was, but the kernel log was continuously printing
> >>> "Cannot send to req_q". Will dig more.
> >>
> >> Ferruh, did you have a chance to check what is hanging?
> >> Stephen, is there any news on your side?
> >>
> >>
> >
> > It did not hang when I tested it. The bug report is still open.
> >
>
> Sorry for the delay. Since I am working remotely, I was worried about losing
> the connection to my server; I finally created a virtual environment to test
> again.
>
> I confirm the hang is observed 100% of the time when two different processes
> update the KNI interface, e.g. when two different processes set the MTU.
> Without this patch this works fine.
>
> I understand the motivation of the patch, but with this change there is a
> possibility of hanging the server, which we can't allow; we need to find
> another way. Can the mlx interface update wait for the KNI interface
> operation to complete?

The KNI driver is still broken. Calling userspace with RTNL held is a
fundamentally broken design. If KNI were incorporated into the upstream
kernel, the netdev developers would see this.

Use whatever solution you think is best. I will continue to recommend
against anyone using KNI.
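The deadlock described in the patch, and the shape of the proposed workaround, can be illustrated with a deliberately simplified, single-threaded C model. It is a sketch under stated assumptions, not the KNI driver: every name in it is hypothetical, the bool rtnl_held stands in for the RTNL mutex, refcnt stands in for the device reference (taken with dev_hold()/dev_put() in the kernel), and model_userspace_handler() stands in for the userspace side of a bifurcated driver. A real mutex would simply block, so the model uses a trylock-style check to turn the deadlock into a false return value.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Simplified single-threaded model of the lock ordering problem.
 * All names are hypothetical; this is not the KNI driver code.
 */

/* Stands in for the RTNL mutex. */
static bool rtnl_held;

/* Stands in for a net_device and its reference count. */
struct model_dev {
	int refcnt;
};

static void model_rtnl_lock(void)
{
	/* Single-threaded model: the lock must be free when taken here. */
	assert(!rtnl_held);
	rtnl_held = true;
}

static void model_rtnl_unlock(void)
{
	assert(rtnl_held);
	rtnl_held = false;
}

/*
 * Like a trylock: reports failure instead of blocking, so the deadlock
 * shows up as a return value rather than a hang.
 */
static bool model_rtnl_trylock(void)
{
	if (rtnl_held)
		return false;
	rtnl_held = true;
	return true;
}

/*
 * Userspace side of a bifurcated driver: completing the request needs
 * one more kernel state change, which itself needs RTNL.
 */
static bool model_userspace_handler(void)
{
	if (!model_rtnl_trylock())
		return false;	/* a real blocking lock would wait forever here */
	/* ... change device state ... */
	model_rtnl_unlock();
	return true;
}

/*
 * Kernel side, entered with RTNL held: forward the request to userspace
 * and wait for the reply. drop_rtnl selects the workaround from the
 * patch: take a device reference, release RTNL while waiting, then
 * re-acquire RTNL and drop the reference.
 */
static bool model_process_request(struct model_dev *dev, bool drop_rtnl)
{
	bool ok;

	assert(rtnl_held);
	if (drop_rtnl) {
		dev->refcnt++;		/* keep the device alive while unlocked */
		model_rtnl_unlock();
	}

	ok = model_userspace_handler();	/* "wait" for the response */

	if (drop_rtnl) {
		model_rtnl_lock();	/* caller expects RTNL held on return */
		dev->refcnt--;
	}
	return ok;
}
```

Entered with rtnl_held set, model_process_request(&dev, false) returns false, which is the point where the original design would hang; model_process_request(&dev, true) returns true, with the lock held again and refcnt back to its starting value on return. Because the model has only one thread, it does not capture what happens when a second caller runs while the lock is dropped, which is the situation in the two-process hang reported in the thread.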