From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from mails.dpdk.org (mails.dpdk.org [217.70.189.124])
	by inbox.dpdk.org (Postfix) with ESMTP id 2FC79A0547;
	Mon, 29 Mar 2021 16:37:15 +0200 (CEST)
Received: from [217.70.189.124] (localhost [127.0.0.1])
	by mails.dpdk.org (Postfix) with ESMTP id 93794140DB4;
	Mon, 29 Mar 2021 16:37:08 +0200 (CEST)
Received: from mga03.intel.com (mga03.intel.com [134.134.136.65])
	by mails.dpdk.org (Postfix) with ESMTP id 67BC3140DA5;
	Mon, 29 Mar 2021 16:37:06 +0200 (CEST)
IronPort-SDR: TbYrxaDisUDWJ800QBon06oXVwK/MgE6wnuvxBYwNxN6Du+iWv5i/3imdOs7E8ndrmGmUezr09
	FpNohSEjBUtA==
X-IronPort-AV: E=McAfee;i="6000,8403,9938"; a="191589153"
X-IronPort-AV: E=Sophos;i="5.81,288,1610438400"; d="scan'208";a="191589153"
Received: from orsmga004.jf.intel.com ([10.7.209.38])
	by orsmga103.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384;
	29 Mar 2021 07:37:05 -0700
IronPort-SDR: erv4UKTQHhOH9GFq/NFM9glhzxG0ykD4VSurSTrRcwh8meyjeB79NEAF7sJh9//04xCnqjba1A
	Dik3zAkvTgmg==
X-ExtLoop1: 1
X-IronPort-AV: E=Sophos;i="5.81,288,1610438400"; d="scan'208";a="526979731"
Received: from silpixa00399752.ir.intel.com (HELO silpixa00399752.ger.corp.intel.com) ([10.237.222.27])
	by orsmga004.jf.intel.com with ESMTP; 29 Mar 2021 07:37:04 -0700
From: Ferruh Yigit
To: dev@dpdk.org
Cc: stable@dpdk.org, Elad Nachman, Stephen Hemminger, Igor Ryzhov, Dan Gora
Date: Mon, 29 Mar 2021 15:36:55 +0100
Message-Id: <20210329143655.521750-3-ferruh.yigit@intel.com>
X-Mailer: git-send-email 2.30.2
In-Reply-To: <20210329143655.521750-1-ferruh.yigit@intel.com>
References: <20201126144613.4986-1-eladv6@gmail.com>
	<20210329143655.521750-1-ferruh.yigit@intel.com>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Subject: [dpdk-dev] [PATCH v5 3/3] kni: fix kernel deadlock when using mlx devices
X-BeenThere: dev@dpdk.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: DPDK patches and discussions
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Errors-To: dev-bounces@dpdk.org
Sender: "dev"

KNI runs the userspace callback with the rtnl lock held. This does not
work with devices that need to interact with the kernel interface from
within the callback, such as Mellanox devices.

The solution is to release the rtnl lock before calling the userspace
callback. This requires two considerations:

1. The rtnl lock needs to be released before 'kni->sync_lock' is taken,
   otherwise it causes a deadlock with multiple KNI devices. See A.
   below for the details of the deadlock condition.

2. Releasing the rtnl lock for the interface down event causes a
   regression and a deadlock, so the rtnl lock can't be released for
   the interface down event. See B. below for the details.

As a solution, the interface down event is handled asynchronously, and
for all other events the rtnl lock is released before processing the
callback.

A. The KNI sync lock is taken while rtnl is held.
If two threads are calling kni_net_process_request(), the first one
takes the sync lock, releases the rtnl lock and then sleeps.
The second thread then tries to take the sync lock while holding rtnl.
When the first thread wakes, it tries to take rtnl while still holding
the sync lock, resulting in a deadlock.
The remedy is to release rtnl before taking the KNI sync lock.
Since nothing in between accesses the Linux network stack, no rtnl
locking is needed there.

B. There is a race condition in __dev_close_many() processing the
close_list while the application terminates.
It looks like, if two KNI interfaces are terminating and one releases
the rtnl lock, the other takes it and updates the close_list while it
is in an unstable state. This turns the close_list into a circular
linked list, so list_for_each_entry() loops endlessly inside
__dev_close_many().

To summarize:
request != interface down: unlock rtnl, send request to user-space,
wait for response, send the response error code to caller in
user-space.

request == interface down: send request to user-space, return
immediately with error code of 0 (success) to user-space.

Fixes: 3fc5ca2f6352 ("kni: initial import")
Cc: stable@dpdk.org

Signed-off-by: Elad Nachman
---
Cc: Stephen Hemminger
Cc: Igor Ryzhov
Cc: Dan Gora

# kernel/linux/kni/kni_net.c.rej
---
 kernel/linux/kni/kni_net.c | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/kernel/linux/kni/kni_net.c b/kernel/linux/kni/kni_net.c
index 6cf99da0dc92..f259327954b2 100644
--- a/kernel/linux/kni/kni_net.c
+++ b/kernel/linux/kni/kni_net.c
@@ -113,6 +113,14 @@ kni_net_process_request(struct net_device *dev, struct rte_kni_request *req)
 
 	ASSERT_RTNL();
 
+	/* If we need to wait and RTNL mutex is held
+	 * drop the mutex and hold reference to keep device
+	 */
+	if (req->async == 0) {
+		dev_hold(dev);
+		rtnl_unlock();
+	}
+
 	mutex_lock(&kni->sync_lock);
 
 	/* Construct data */
@@ -152,6 +160,10 @@ kni_net_process_request(struct net_device *dev, struct rte_kni_request *req)
 
 fail:
 	mutex_unlock(&kni->sync_lock);
+	if (req->async == 0) {
+		rtnl_lock();
+		dev_put(dev);
+	}
 	return ret;
 }
 
@@ -194,6 +206,10 @@ kni_net_release(struct net_device *dev)
 
 	/* Setting if_up to 0 means down */
 	req.if_up = 0;
+
+	/* request async because of the deadlock problem */
+	req.async = 1;
+
 	ret = kni_net_process_request(dev, &req);
 
 	return (ret == 0) ? req.result : ret;
-- 
2.30.2
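
Note on the lock ordering described in A.: the following is a minimal
userspace sketch, not kernel code and not part of the patch. It models
the rtnl lock and 'kni->sync_lock' with two pthread mutexes to show why
dropping rtnl before taking the sync lock, and re-taking it only after
the sync lock is released, avoids the deadlock. All names here
(run_model, process_request, the two static mutexes, the shared counter)
are assumptions made for illustration; the device reference counting
done by dev_hold()/dev_put() and the real request/response exchange are
not modelled.

```c
#include <assert.h>
#include <pthread.h>

#define MAX_THREADS 8

/* Hypothetical stand-ins for the kernel locks, for illustration only. */
static pthread_mutex_t rtnl = PTHREAD_MUTEX_INITIALIZER;      /* models the rtnl lock */
static pthread_mutex_t sync_lock = PTHREAD_MUTEX_INITIALIZER; /* models kni->sync_lock */
static long completed; /* requests processed, protected by sync_lock */

static void lock(pthread_mutex_t *m)
{
	int rc = pthread_mutex_lock(m);
	assert(rc == 0);
	(void)rc;
}

static void unlock(pthread_mutex_t *m)
{
	int rc = pthread_mutex_unlock(m);
	assert(rc == 0);
	(void)rc;
}

/* Models the patched request path. The caller holds rtnl on entry and on return. */
static void process_request(int async)
{
	if (!async)
		unlock(&rtnl);  /* drop rtnl BEFORE taking the sync lock */

	lock(&sync_lock);
	completed++;            /* stands in for "send request, wait for response" */
	unlock(&sync_lock);

	if (!async)
		lock(&rtnl);    /* re-take rtnl only after the sync lock is released */
}

static void *worker(void *arg)
{
	int iterations = *(int *)arg;
	int i;

	for (i = 0; i < iterations; i++) {
		lock(&rtnl);
		/* Every fourth request models "interface down" (async, rtnl kept). */
		process_request(i % 4 == 0);
		unlock(&rtnl);
	}
	return NULL;
}

/* Runs nthreads workers and returns the total number of completed requests. */
long run_model(int nthreads, int iterations)
{
	pthread_t threads[MAX_THREADS];
	int i, rc;

	if (nthreads > MAX_THREADS)
		nthreads = MAX_THREADS;

	completed = 0;
	for (i = 0; i < nthreads; i++) {
		rc = pthread_create(&threads[i], NULL, worker, &iterations);
		assert(rc == 0);
	}
	for (i = 0; i < nthreads; i++) {
		rc = pthread_join(threads[i], NULL);
		assert(rc == 0);
	}
	(void)rc;
	return completed;
}
```

Calling run_model(2, 1000) from a main() and building with -pthread
should return once both workers finish. In this model no thread ever
waits for rtnl while holding the sync lock: the synchronous path holds
neither lock when it takes the sync lock, and the async path only takes
the sync lock while already holding rtnl. Re-taking rtnl while the sync
lock is still held would reintroduce the ordering described in A.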