Date: Mon, 27 Jul 2020 10:52:55 -0700
From: Stephen Hemminger
To: Ferruh Yigit
Cc: Thomas Monjalon, dev@dpdk.org, stable@dpdk.org
Message-ID: <20200727105255.74981391@hermes.lan>
References: <20191222175551.17684-1-stephen@networkplumber.org> <3101970.h16uAIiOU7@xps> <20200505171454.00274f10@hermes.lan>
Subject: Re: [dpdk-dev] [PATCH] kni: fix kernel deadlock when using mlx devices
List-Id: DPDK patches and discussions

On Mon, 27 Jul 2020 18:33:08 +0100
Ferruh Yigit wrote:

> On 5/6/2020 1:14 AM, Stephen Hemminger wrote:
> > On Wed, 18 Mar 2020 16:17:57 +0100
> > Thomas Monjalon wrote:
> >
> >> 17/01/2020 17:43, Ferruh Yigit:
> >>> On 12/22/2019 5:55 PM, Stephen Hemminger wrote:
> >>>> This fixes a deadlock when using KNI with bifurcated drivers.
> >>>> Bringing kni device up always times out when using Mellanox
> >>>> devices.
> >>>>
> >>>> The kernel KNI driver sends message to userspace to complete
> >>>> the request. For the case of bifurcated driver, this may involve
> >>>> an additional request to kernel to change state. This request
> >>>> would deadlock because KNI was holding the RTNL mutex.
> >>>>
> >>>> This was a bad design which goes back to the original code.
> >>>> A workaround is for KNI driver to drop RTNL while waiting.
> >>>> To prevent the device from disappearing while the operation
> >>>> is in progress, it needs to hold reference to network device
> >>>> while waiting.
> >>>>
> >>>> As an added benefit, a useless error check can also be removed.
> >>>>
> >>>> Fixes: 3fc5ca2f6352 ("kni: initial import")
> >>>> Cc: stable@dpdk.org
> >>>> Signed-off-by: Stephen Hemminger
> >>>> ---
> >>>
> >>> This patch causes a hang on my server. I am not sure what exactly the
> >>> problem was, but the kernel log was continuously printing
> >>> "Cannot send to req_q". Will dig more.
> >>
> >> Ferruh, did you have a chance to check what is hanging?
> >> Stephen, is there any news on your side?
> >>
> >
> > It did not hang when I tested it. The bug report is still open.
> >
>
> Sorry for the delay. Since I am working remotely, I was worried about losing the
> connection to my server; I finally created a virtual environment to test again.
>
> I confirm the hang is observed 100% of the time when two different processes
> update the kni interface, for example two different processes setting the MTU.
> Without this patch this works fine.
>
> I understand the motivation of the patch, but with this change there is a
> possibility of hanging the server, which we can't allow; we need to find
> another way. Can updating the mlx interface wait for the KNI interface
> operation to complete?

Still, the KNI driver is broken. Calling userspace with RTNL held is a
fundamentally broken design. If KNI were to be incorporated into the upstream
kernel, the netdev developers would see this.

Whatever solution you think is best.
I will continue to recommend against anyone using KNI.
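
For anyone following the thread who wants to see the locking problem in
isolation, here is a rough userspace sketch in C with pthreads. It is not the
KNI patch and not kernel code. All names in it (rtnl, handler, run_request,
dev_refcnt) are made up for illustration: a plain mutex stands in for RTNL, a
thread stands in for the userspace side that needs RTNL to finish the request,
and an atomic counter stands in for holding a reference on the network device.
It only models the timeout described in the commit message; it does not
reproduce the hang seen with two processes updating the interface.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <time.h>

/* Userspace stand-ins with hypothetical names; none of this is KNI source. */
static pthread_mutex_t rtnl = PTHREAD_MUTEX_INITIALIZER;     /* plays RTNL */
static pthread_mutex_t req_lock = PTHREAD_MUTEX_INITIALIZER; /* guards req_done */
static pthread_cond_t req_cond = PTHREAD_COND_INITIALIZER;
static bool req_done;
static atomic_int dev_refcnt; /* plays a reference held on the net device */

/* The responder: completing the request requires taking "RTNL" itself. */
static void *handler(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&rtnl);   /* blocks for as long as the requester holds it */
    pthread_mutex_unlock(&rtnl);

    pthread_mutex_lock(&req_lock);
    req_done = true;
    pthread_cond_signal(&req_cond);
    pthread_mutex_unlock(&req_lock);
    return NULL;
}

/* Wait up to 500 ms for the reply; returns 0 or ETIMEDOUT. */
static int wait_for_reply(void)
{
    struct timespec deadline;
    int rc = 0;

    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_nsec += 500L * 1000 * 1000;
    if (deadline.tv_nsec >= 1000000000L) {
        deadline.tv_sec += 1;
        deadline.tv_nsec -= 1000000000L;
    }

    pthread_mutex_lock(&req_lock);
    while (!req_done && rc == 0)
        rc = pthread_cond_timedwait(&req_cond, &req_lock, &deadline);
    rc = req_done ? 0 : rc;
    pthread_mutex_unlock(&req_lock);
    return rc;
}

/* Issue one request with "RTNL" held; optionally drop it while waiting. */
int run_request(bool drop_rtnl_while_waiting)
{
    pthread_t tid;
    int rc;

    req_done = false;
    pthread_mutex_lock(&rtnl);              /* caller context holds "RTNL" */
    if (pthread_create(&tid, NULL, handler, NULL) != 0) {
        pthread_mutex_unlock(&rtnl);
        return -1;
    }

    if (drop_rtnl_while_waiting) {
        atomic_fetch_add(&dev_refcnt, 1);   /* keep the device alive */
        pthread_mutex_unlock(&rtnl);        /* let the responder make progress */
        rc = wait_for_reply();
        pthread_mutex_lock(&rtnl);
        atomic_fetch_sub(&dev_refcnt, 1);
    } else {
        rc = wait_for_reply();              /* responder is stuck on "RTNL" */
    }

    pthread_mutex_unlock(&rtnl);
    pthread_join(tid, NULL);
    assert(atomic_load(&dev_refcnt) == 0);  /* every hold was released */
    return rc;
}
```

Calling run_request(false) waits with the lock held, so the responder cannot
run and the wait ends in ETIMEDOUT. Calling run_request(true) drops the lock
for the duration of the wait and returns 0. The sketch says nothing about what
else may run while the lock is dropped, which is where the concern raised in
this thread lies.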