From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mails.dpdk.org (mails.dpdk.org [217.70.189.124]) by inbox.dpdk.org (Postfix) with ESMTP id D7928A034F; Tue, 23 Feb 2021 13:53:31 +0100 (CET) Received: from [217.70.189.124] (localhost [127.0.0.1]) by mails.dpdk.org (Postfix) with ESMTP id 51C0B4068C; Tue, 23 Feb 2021 13:53:31 +0100 (CET) Received: from mga03.intel.com (mga03.intel.com [134.134.136.65]) by mails.dpdk.org (Postfix) with ESMTP id 3A7AB40041 for ; Tue, 23 Feb 2021 13:53:29 +0100 (CET) IronPort-SDR: iz1kfyfsfXU2r7xAD1G+mQJq/dn8aoWBxsxhuzbyokOiZs5PaydVh1oRt1OL38ayv68U3yrWth +zEy1+50NFFQ== X-IronPort-AV: E=McAfee;i="6000,8403,9903"; a="184841252" X-IronPort-AV: E=Sophos;i="5.81,200,1610438400"; d="scan'208";a="184841252" Received: from fmsmga008.fm.intel.com ([10.253.24.58]) by orsmga103.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 23 Feb 2021 04:53:27 -0800 IronPort-SDR: KJtiYqOeV9L7F1NY8X+FbqFT/Pt1hc7Pn/pMgfxin5WrJc91sbi2ppSCd9IipdJSJ4fIi/b57I I1VrGHwStpdA== X-IronPort-AV: E=Sophos;i="5.81,200,1610438400"; d="scan'208";a="390878149" Received: from fyigit-mobl1.ger.corp.intel.com (HELO [10.213.234.66]) ([10.213.234.66]) by fmsmga008-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 23 Feb 2021 04:53:26 -0800 To: Elad Nachman Cc: iryzhov@nfware.com, stephen@networkplumber.org, dev@dpdk.org References: <20201126144613.4986-1-eladv6@gmail.com> <20210223120512.29216-1-eladv6@gmail.com> From: Ferruh Yigit X-User: ferruhy Message-ID: <741c0e4c-0adc-c849-48ad-01cba46c7212@intel.com> Date: Tue, 23 Feb 2021 12:53:23 +0000 MIME-Version: 1.0 In-Reply-To: <20210223120512.29216-1-eladv6@gmail.com> Content-Type: text/plain; charset=utf-8; format=flowed Content-Language: en-US Content-Transfer-Encoding: 8bit Subject: Re: [dpdk-dev] [PATCH V2] kni: fix rtnl deadlocks and race conditions v2 X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" On 2/23/2021 12:05 PM, Elad Nachman wrote: > This version 2 of the patch leverages on Stephen Hemminger's 64106 > patch from Dec 2019, > and fixes the issues reported by Ferruh and Igor: > > A. KNI sync lock is being locked while rtnl is held. > If two threads are calling kni_net_process_request() , > then the first one will take the sync lock, release rtnl lock then sleep. > The second thread will try to lock sync lock while holding rtnl. > The first thread will wake, and try to lock rtnl, resulting in a deadlock. > The remedy is to release rtnl before locking the KNI sync lock. > Since in between nothing is accessing Linux network-wise, > no rtnl locking is needed. > > B. There is a race condition in __dev_close_many() processing the > close_list while the application terminates. > It looks like if two vEth devices are terminating, > and one releases the rtnl lock, the other takes it, > updating the close_list in an unstable state, > causing the close_list to become a circular linked list, > hence list_for_each_entry() will endlessly loop inside > __dev_close_many() . > Since the description for the original patch indicate the > original motivation was bringing the device up, > I have changed kni_net_process_request() to hold the rtnl mutex > in case of bringing the device down since this is the path called > from __dev_close_many() , causing the corruption of the close_list. > > Depends-on: patch-64106 ("kni: fix kernel deadlock when using mlx devices") > Can you please make new version of the patches on top of latest git head, not exiting patches, we don't support incremental updates. > > Signed-off-by: Elad Nachman > --- > V2: > * rebuild the patch as increment from patch 64106 > * fix comment and blank lines > > --- > kernel/linux/kni/kni_net.c | 25 +++++++++++++++++-------- > 1 file changed, 17 insertions(+), 8 deletions(-) > > diff --git a/kernel/linux/kni/kni_net.c b/kernel/linux/kni/kni_net.c > index f0b6e9a8d..b41360220 100644 > --- a/kernel/linux/kni/kni_net.c > +++ b/kernel/linux/kni/kni_net.c > @@ -110,9 +110,22 @@ kni_net_process_request(struct net_device *dev, struct rte_kni_request *req) > void *resp_va; > uint32_t num; > int ret_val; > + int req_is_dev_stop = 0; > + One more thing, can you please add comment to code why "stop" request is special? You have it in the commit log, but a short description in code also cna be helpful.