Subject: Re: [dpdk-dev] [PATCH] kni: fix rtnl deadlocks and race conditions
From: Ferruh Yigit
To: Elad Nachman
Cc: dev@dpdk.org, Igor Ryzhov, Stephen Hemminger
Date: Mon, 22 Feb 2021 15:58:33 +0000
Message-ID: <334c7e47-cee5-0523-6858-a9074a12b7e0@intel.com>
References: <20201126144613.4986-1-eladv6@gmail.com>
 <9f7284a8-4158-c246-c329-bfec27f543a5@intel.com>

On 2/21/2021 8:03 AM, Elad Nachman wrote:
> Hi,
>
> Regarding the asynchronous call - thought about it, but then the
> request will always return OK to user-space and I will have no way to
> return failure error codes back to user-space.
>

Right, let's continue with this patch. Can you please send a new
version with the updates mentioned below?

> If the above explanation is acceptable, per your other comments - I
> can send a new patch without the parameter change, without the empty
> line, and with the comment moved to the proper place in the code.
>
> Waiting for your decision,
>
> Elad.
>
> On Fri, Feb 19, 2021 at 8:42 PM Ferruh Yigit wrote:
>>
>> On 11/26/2020 2:46 PM, Elad Nachman wrote:
>>> This patch leverages Stephen Hemminger's 64106 patch from Dec 2019
>>> and fixes the issues reported by Ferruh and Igor:
>>>
>>> A. The KNI sync lock is being locked while rtnl is held.
>>> If two threads are calling kni_net_process_request(),
>>> the first one will take the sync lock, release the rtnl lock, then sleep.
>>> The second thread will try to take the sync lock while holding rtnl.
>>> The first thread will wake and try to lock rtnl, resulting in a deadlock.
>>> The remedy is to release rtnl before locking the KNI sync lock.
>>> Since nothing in between accesses the Linux network stack,
>>> no rtnl locking is needed there.
>>
>> Hi Elad,
>>
>> Thanks for the explanation, that clarifies the issue.
>> Also I confirm I don't see the hang, at least as much as I test.
>>
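To make the ordering in item A concrete, a rough, illustrative-only
sketch of the locking pattern the remedy describes; this is not the
actual patch (the real change is in the diff below), but every call
used here is a standard kernel primitive:

	/* Entered with rtnl held by the network core (e.g. from
	 * ndo_open). Sleeping on sync_lock while still holding rtnl
	 * is what allows the two-thread deadlock described above, so
	 * rtnl is dropped first and a reference keeps the net_device
	 * alive while it is unlocked.
	 */
	ASSERT_RTNL();
	dev_hold(dev);
	rtnl_unlock();

	mutex_lock(&kni->sync_lock);
	/* ... post the request and wait for the userspace response ... */
	mutex_unlock(&kni->sync_lock);

	rtnl_lock();
	dev_put(dev);
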
>>>
>>> B. There is a race condition in __dev_close_many() processing the
>>> close_list while the application terminates.
>>> It looks like, if two vEth devices are terminating and one releases
>>> the rtnl lock, the other takes it and updates the close_list while
>>> it is in an unstable state, causing the close_list to become a
>>> circular linked list, hence list_for_each_entry() loops endlessly
>>> inside __dev_close_many().
>>> Since the description of the original patch indicates the original
>>> motivation was bringing the device up, I have changed
>>> kni_net_process_request() to hold the rtnl mutex when bringing the
>>> device down, since this is the path called from __dev_close_many()
>>> that causes the corruption of the close_list.
>>>
>>
>> I can't reproduce this case; I see the protection in the code, but it
>> is better to get confirmation from Igor.
>>
>> Overall the issue seems to be calling a function pointed to by
>> 'rte_kni_ops' which requires acquiring the rtnl lock.
>> So I wonder if this can't be handled in the ops function, by processing
>> the request asynchronously, like recording the request, returning from
>> 'rte_kni_ops', and processing the request afterwards?

(A rough sketch of that deferred approach is at the end of this mail.)

>>
>> I assume the application we mention is not the kni sample application.
>>
>>>
>>> Signed-off-by: Elad Nachman
>>> ---
>>>  kernel/linux/kni/kni_net.c | 47 +++++++++++++++++++++++++-------------
>>>  1 file changed, 31 insertions(+), 16 deletions(-)
>>>
>>> diff --git a/kernel/linux/kni/kni_net.c b/kernel/linux/kni/kni_net.c
>>> index 4b752083d..cf5b0845d 100644
>>> --- a/kernel/linux/kni/kni_net.c
>>> +++ b/kernel/linux/kni/kni_net.c
>>> @@ -17,6 +17,7 @@
>>>  #include 
>>>  #include 
>>>  #include 
>>> +#include 
>>>
>>>  #include 
>>>  #include 
>>> @@ -102,18 +103,26 @@ get_data_kva(struct kni_dev *kni, void *pkt_kva)
>>>   * It can be called to process the request.
>>>   */
>>>  static int
>>> -kni_net_process_request(struct kni_dev *kni, struct rte_kni_request *req)
>>> +kni_net_process_request(struct net_device *dev, struct rte_kni_request *req)
>>>  {
>>> +	struct kni_dev *kni = netdev_priv(dev);
>>>  	int ret = -1;
>>>  	void *resp_va;
>>>  	uint32_t num;
>>>  	int ret_val;
>>> +	int req_is_dev_stop = 0;
>>>
>>> -	if (!kni || !req) {
>>> -		pr_err("No kni instance or request\n");
>>> -		return -EINVAL;
>>> -	}
>>> +	if (req->req_id == RTE_KNI_REQ_CFG_NETWORK_IF &&
>>> +	    req->if_up == 0)
>>> +		req_is_dev_stop = 1;
>>>
>>> +	ASSERT_RTNL();
>>> +
>>> +	if (!req_is_dev_stop) {
>>> +		dev_hold(dev);
>>> +		rtnl_unlock();
>>> +	}
>>> +
>>>  	mutex_lock(&kni->sync_lock);
>>>
>>>  	/* Construct data */
>>> @@ -125,8 +134,13 @@ kni_net_process_request(struct kni_dev *kni, struct rte_kni_request *req)
>>>  		goto fail;
>>>  	}
>>>
>>> +	/* Since we need to wait and RTNL mutex is held
>>> +	 * drop the mutex and hold refernce to keep device
>>> +	 */
>>> +
>>
>> The comment seems left here, it needs to go up.
>> s/refernce/reference
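Presumably something like the following is meant - the comment moved up,
above the unlock, and the typo fixed (sketch only, using the identifiers
already introduced by this patch):

	/* Since we need to wait and the RTNL mutex is held,
	 * drop the mutex and hold a reference to keep the device.
	 */
	if (!req_is_dev_stop) {
		dev_hold(dev);
		rtnl_unlock();
	}
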
>>
>>>  	ret_val = wait_event_interruptible_timeout(kni->wq,
>>>  			kni_fifo_count(kni->resp_q), 3 * HZ);
>>> +
>>>  	if (signal_pending(current) || ret_val <= 0) {
>>>  		ret = -ETIME;
>>>  		goto fail;
>>> @@ -144,6 +158,13 @@ kni_net_process_request(struct kni_dev *kni, struct rte_kni_request *req)
>>>
>>>  fail:
>>>  	mutex_unlock(&kni->sync_lock);
>>> +
>>> +
>>
>> Extra empty line.
>>
>>> +	if (!req_is_dev_stop) {
>>> +		rtnl_lock();
>>> +		dev_put(dev);
>>> +	}
>>> +
>>>  	return ret;
>>>  }
>>>
>>> @@ -155,7 +176,6 @@ kni_net_open(struct net_device *dev)
>>>  {
>>>  	int ret;
>>>  	struct rte_kni_request req;
>>> -	struct kni_dev *kni = netdev_priv(dev);
>>>
>>>  	netif_start_queue(dev);
>>>  	if (kni_dflt_carrier == 1)
>>> @@ -168,7 +188,7 @@ kni_net_open(struct net_device *dev)
>>>
>>>  	/* Setting if_up to non-zero means up */
>>>  	req.if_up = 1;
>>> -	ret = kni_net_process_request(kni, &req);
>>> +	ret = kni_net_process_request(dev, &req);
>>>
>>
>> Although it is not so confusing, these lines and the following ones are
>> noise for this patch, they are just for the 'kni_net_process_request'
>> parameter change.
>>
>> What do you think about doing the 'kni_net_process_request' parameter
>> change in a first patch and fixing the issue in a second one, so that
>> the second patch contains only the actual changes required for the fix?
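
For reference, the asynchronous handling discussed at the top of this
thread - and set aside because the callback's return value is what gets
reported back to the kernel - would look roughly like the sketch below.
It assumes the 'config_network_if' callback shape from rte_kni.h and a
purely hypothetical application-side ring; it is not a proposal for this
patch:

	/* Hypothetical application-side code, sketch only. */
	#include <errno.h>
	#include <stdint.h>
	#include <stdlib.h>
	#include <rte_ring.h>

	struct if_cfg_req {
		uint16_t port_id;
		uint8_t if_up;
	};

	/* Created elsewhere by the application with rte_ring_create(). */
	static struct rte_ring *cfg_ring;

	static int
	kni_config_network_if_async(uint16_t port_id, uint8_t if_up)
	{
		struct if_cfg_req *req = malloc(sizeof(*req));

		if (req == NULL)
			return -ENOMEM;
		req->port_id = port_id;
		req->if_up = if_up;

		/* Record the request and return immediately; the main loop
		 * dequeues and applies it later. The limitation Elad points
		 * out is visible here: the kernel gets this return value as
		 * the response, before the deferred work has actually run.
		 */
		if (rte_ring_enqueue(cfg_ring, req) != 0) {
			free(req);
			return -ENOBUFS;
		}
		return 0;
	}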