From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mails.dpdk.org (mails.dpdk.org [217.70.189.124]) by inbox.dpdk.org (Postfix) with ESMTP id 765E6A054F; Wed, 24 Feb 2021 17:31:50 +0100 (CET) Received: from [217.70.189.124] (localhost [127.0.0.1]) by mails.dpdk.org (Postfix) with ESMTP id 359F74069B; Wed, 24 Feb 2021 17:31:50 +0100 (CET) Received: from mail-ej1-f42.google.com (mail-ej1-f42.google.com [209.85.218.42]) by mails.dpdk.org (Postfix) with ESMTP id D2F0940040 for ; Wed, 24 Feb 2021 17:31:48 +0100 (CET) Received: by mail-ej1-f42.google.com with SMTP id t11so4079326ejx.6 for ; Wed, 24 Feb 2021 08:31:48 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=nfware-com.20150623.gappssmtp.com; s=20150623; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=Qu9H73CecLRZp3QlNNr9HiDxKJ1Z1WxUWCvtgEEivbw=; b=nfyYqfK/GaTcr/q2UhpdRdhp/IvehBr4wCLWhQkGyjSpKHp21FQ+rN9pVSl5bo6gzP WAVlhx6Rn0vUncgoJftH7qBlMQM/AYvMIBhlL777kB17brAn3Ak8Nq/xQw0kqbIGBNQk EFM3LQ1R5iJ640TNgbl387YG16aGafGqT6Qb/tPeu7oyHBVhk9if0qK+0zZTCO5Uq2jc yxRqYhin6JuIW8w1StA2d9tC5PgbZUGxIH3f2VcejV3AzD8JQ+3ltdWxswwCHpciIApR b5DbbKV++TBAG9R+K+W6i8MPCa9YHyeGPMZvayYhAWR1bOMlX8QkDdRA0uUln3u2mSWg WdRg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=Qu9H73CecLRZp3QlNNr9HiDxKJ1Z1WxUWCvtgEEivbw=; b=tyh5RmNEMjZP0juAEfAOGT6Tf3oh6/sDfSRlOvzC9OFEWcTdENTp9gLXAgxeUk1dO5 Yy5hFZvtss5ZxAzHmfJUY970/1cqfg2l3J/gi3M9dakGB4CMziyKnZ5URZCHaSY0ZN2m GpiAYoM9TNCJim3ypWVBFD1mTAglI7oAXmBR5ryYAXey4imhic/ycnpacNC/5XRfp220 i+uz1D2Yp1m2zFxXL0rxP6BmGnZ13CgvG+XxxLLy3qUzMO/nfVvLlO9BHGB3JS/HOy/w 2rFPU+tTirMwMCzguW0ijiZmql14XJt6n/PfyWwX7T/+i8xdjGhl1XD61jXU5QVunF8S SdoA== X-Gm-Message-State: AOAM533dJSe7ugQgbTtidAtkHFXwpmnAll67/5I1HGVhKRP/cxtKp+ph ++lUhzNUyKNRhlcz16EmpD7ZsttJAgPH9JFLjbdSi0NiyKCZzUhB X-Google-Smtp-Source: ABdhPJwY7oBiw+A4V66wwaxUy79t2iUKm8XhS4b9U8LNawC3ktLYz4FE2aIEyf98iPAuvRxpF+JA+HtOZPJOCztcf+s= X-Received: by 2002:a17:906:ad87:: with SMTP id la7mr28927008ejb.534.1614184308495; Wed, 24 Feb 2021 08:31:48 -0800 (PST) MIME-Version: 1.0 References: <20201126144613.4986-1-eladv6@gmail.com> <20210223134504.699-1-eladv6@gmail.com> In-Reply-To: From: Igor Ryzhov Date: Wed, 24 Feb 2021 19:31:37 +0300 Message-ID: To: Elad Nachman Cc: Ferruh Yigit , Stephen Hemminger , dev Content-Type: text/plain; charset="UTF-8" X-Content-Filtered-By: Mailman/MimeDel 2.1.29 Subject: Re: [dpdk-dev] [PATCH 2/2] kni: fix rtnl deadlocks and race conditions v3 X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" Elad, To make it work on Mellanox NIC, you need to find a way to send ALL requests to userspace without rtnl_lock held, including link down. As I understand, the race condition in __dev_close_many must be solved somehow. I can't provide remote access, but I am happy to test on Mellanox NICs, if you find a way to fix link down requests. On Wed, Feb 24, 2021 at 7:11 PM Elad Nachman wrote: > Sorry, don't have Mellanox NIC currently. Will have one in 8-12 weeks. > Will be happy to test it remotely if anyone can provide remote HW or > VM (Azure, for example). > > Elad. > > On Wed, Feb 24, 2021 at 5:18 PM Igor Ryzhov wrote: > > > > Stephen's idea was to fix the deadlock when working with the bifurcated > driver. > > Your rework breaks this because you still send link down requests under > rtnl_lock. > > Did you test your patch with Mellanox devices? > > > > On Wed, Feb 24, 2021 at 5:56 PM Elad Nachman wrote: > >> > >> The deadlock scenarios are explained below: > >> > >> It is described in Stephen Hemminger's original patch: > >> > >> " > >> > >> This fixes a deadlock when using KNI with bifurcated drivers. > >> Bringing kni device up always times out when using Mellanox > >> devices. > >> > >> The kernel KNI driver sends message to userspace to complete > >> the request. For the case of bifurcated driver, this may involve > >> an additional request to kernel to change state. This request > >> would deadlock because KNI was holding the RTNL mutex. > >> > >> " > >> > >> And also in my patch: > >> > >> " > >> KNI sync lock is being locked while rtnl is held. > >> If two threads are calling kni_net_process_request() , > >> then the first one will take the sync lock, release rtnl lock then > sleep. > >> The second thread will try to lock sync lock while holding rtnl. > >> The first thread will wake, and try to lock rtnl, resulting in a > deadlock. > >> The remedy is to release rtnl before locking the KNI sync lock. > >> Since in between nothing is accessing Linux network-wise, > >> no rtnl locking is needed. > >> " > >> > >> FYI, > >> > >> Elad. > >> > >> On Wed, Feb 24, 2021 at 4:41 PM Igor Ryzhov wrote: > >> > > >> > Both link up and link down also work for me without this patch. > >> > So what's the point in merging it? > >> > > >> > Just to clarify - I am not against the idea of this patch. > >> > Talking to userspace under rtnl_lock is a bad idea. > >> > I just think that any patch should fix some specified problem. > >> > > >> > If this patch is trying to solve the overall "userspace request under > rtnl_lock" problem, > >> > then it doesn't solve it correctly, because we still send link down > requests under the lock. > >> > > >> > If this patch is trying to solve some other issue, for example, some > "KNI deadlocks" > >> > you're talking about, then you should explain what these deadlocks > are, how to reproduce > >> > them and why this patch solves the issue. > >> > > >> > On Wed, Feb 24, 2021 at 5:07 PM Elad Nachman > wrote: > >> >> > >> >> I tested both link up and link down many times without any problems > on > >> >> 100 restarts of the application. > >> >> > >> >> Having KNI deadlock frequently for real life applications is far > worst, IMHO. > >> >> > >> >> FYI > >> >> > >> >> Elad. > >> >> > >> >> On Wed, Feb 24, 2021 at 4:04 PM Igor Ryzhov > wrote: > >> >> > > >> >> > Elad, > >> >> > > >> >> > I understand your point. > >> >> > But the fact that this fix works for you doesn't mean that it will > work for all DPDK users. > >> >> > > >> >> > For example, I provided two simple commands: "ip link set up" and > "ip link set down". > >> >> > Your fix works for only one of them. For me, this is not a proper > fix. > >> >> > It may work for you because you don't disable interfaces, but it > will fail for users who do. > >> >> > > >> >> > On Wed, Feb 24, 2021 at 4:33 PM Elad Nachman > wrote: > >> >> >> > >> >> >> Currently KNI has a lot of issues with deadlocks locking the code, > >> >> >> after this commit, they are gone, and the code runs properly > without > >> >> >> crashing. > >> >> >> That was tested with over 100 restarts of the application, which > >> >> >> previously required a hard reset of the board. > >> >> >> > >> >> >> I think this benefit overweights the complication of the code. > >> >> >> > >> >> >> The function is called with rtnl locked because this is how the > Linux > >> >> >> kernel is designed to work - it is not designed to work with > deferral > >> >> >> to user-space mid-function. > >> >> >> > >> >> >> To fix all such requests you need to reach an agreement with Linux > >> >> >> netdev, which is unlikely. > >> >> >> > >> >> >> Calling user-space can be done asynchronously, as Ferruh asked, > but > >> >> >> then you will always have to return success, even on failure, as > Linux > >> >> >> kernel does not have a mechanism to asynchronously report on > failure > >> >> >> for such system calls. > >> >> >> > >> >> >> IMHO - weighting the non-reporting of failure versus how the code > >> >> >> looks (as it functions perfectly OK), I decided to go with > >> >> >> functionality. > >> >> >> > >> >> >> FYI, > >> >> >> > >> >> >> Elad. > >> >> >> > >> >> >> On Wed, Feb 24, 2021 at 2:50 PM Igor Ryzhov > wrote: > >> >> >> > > >> >> >> > This looks more like a hack than an actual fix to me. > >> >> >> > > >> >> >> > After this commit: > >> >> >> > "ip link set up" is sent to the userspace with unlocked > rtnl_lock > >> >> >> > "ip link set down" is sent to the userspace with locked > rtnl_lock > >> >> >> > > >> >> >> > How is this really fixing anything? IMHO it only complicates > the code. > >> >> >> > If talking with userspace under rtnl_lock is a problem, then we > should fix all such requests, not only part of them. > >> >> >> > If it is not a problem, then I don't see any point in merging > this. > >> >> >> > > >> >> >> > On Tue, Feb 23, 2021 at 4:45 PM Elad Nachman > wrote: > >> >> >> >> > >> >> >> >> This part of the series includes my fixes for the issues > reported > >> >> >> >> by Ferruh and Igor on top of part 1 of the patch series: > >> >> >> >> > >> >> >> >> A. KNI sync lock is being locked while rtnl is held. > >> >> >> >> If two threads are calling kni_net_process_request() , > >> >> >> >> then the first one will take the sync lock, release rtnl lock > then sleep. > >> >> >> >> The second thread will try to lock sync lock while holding > rtnl. > >> >> >> >> The first thread will wake, and try to lock rtnl, resulting in > a deadlock. > >> >> >> >> The remedy is to release rtnl before locking the KNI sync lock. > >> >> >> >> Since in between nothing is accessing Linux network-wise, > >> >> >> >> no rtnl locking is needed. > >> >> >> >> > >> >> >> >> B. There is a race condition in __dev_close_many() processing > the > >> >> >> >> close_list while the application terminates. > >> >> >> >> It looks like if two vEth devices are terminating, > >> >> >> >> and one releases the rtnl lock, the other takes it, > >> >> >> >> updating the close_list in an unstable state, > >> >> >> >> causing the close_list to become a circular linked list, > >> >> >> >> hence list_for_each_entry() will endlessly loop inside > >> >> >> >> __dev_close_many() . > >> >> >> >> Since the description for the original patch indicate the > >> >> >> >> original motivation was bringing the device up, > >> >> >> >> I have changed kni_net_process_request() to hold the rtnl mutex > >> >> >> >> in case of bringing the device down since this is the path > called > >> >> >> >> from __dev_close_many() , causing the corruption of the > close_list. > >> >> >> >> > >> >> >> >> Signed-off-by: Elad Nachman > >> >> >> >> --- > >> >> >> >> v3: > >> >> >> >> * Include original patch and new patch as a series of patch, > added a > >> >> >> >> comment to the new patch > >> >> >> >> v2: > >> >> >> >> * rebuild the patch as increment from patch 64106 > >> >> >> >> * fix comment and blank lines > >> >> >> >> --- > >> >> >> >> kernel/linux/kni/kni_net.c | 29 +++++++++++++++++++++-------- > >> >> >> >> 1 file changed, 21 insertions(+), 8 deletions(-) > >> >> >> >> > >> >> >> >> diff --git a/kernel/linux/kni/kni_net.c > b/kernel/linux/kni/kni_net.c > >> >> >> >> index f0b6e9a8d..017e44812 100644 > >> >> >> >> --- a/kernel/linux/kni/kni_net.c > >> >> >> >> +++ b/kernel/linux/kni/kni_net.c > >> >> >> >> @@ -110,9 +110,26 @@ kni_net_process_request(struct net_device > *dev, struct rte_kni_request *req) > >> >> >> >> void *resp_va; > >> >> >> >> uint32_t num; > >> >> >> >> int ret_val; > >> >> >> >> + int req_is_dev_stop = 0; > >> >> >> >> + > >> >> >> >> + /* For configuring the interface to down, > >> >> >> >> + * rtnl must be held all the way to prevent race > condition > >> >> >> >> + * inside __dev_close_many() between two netdev > instances of KNI > >> >> >> >> + */ > >> >> >> >> + if (req->req_id == RTE_KNI_REQ_CFG_NETWORK_IF && > >> >> >> >> + req->if_up == 0) > >> >> >> >> + req_is_dev_stop = 1; > >> >> >> >> > >> >> >> >> ASSERT_RTNL(); > >> >> >> >> > >> >> >> >> + /* Since we need to wait and RTNL mutex is held > >> >> >> >> + * drop the mutex and hold reference to keep device > >> >> >> >> + */ > >> >> >> >> + if (!req_is_dev_stop) { > >> >> >> >> + dev_hold(dev); > >> >> >> >> + rtnl_unlock(); > >> >> >> >> + } > >> >> >> >> + > >> >> >> >> mutex_lock(&kni->sync_lock); > >> >> >> >> > >> >> >> >> /* Construct data */ > >> >> >> >> @@ -124,16 +141,8 @@ kni_net_process_request(struct net_device > *dev, struct rte_kni_request *req) > >> >> >> >> goto fail; > >> >> >> >> } > >> >> >> >> > >> >> >> >> - /* Since we need to wait and RTNL mutex is held > >> >> >> >> - * drop the mutex and hold refernce to keep device > >> >> >> >> - */ > >> >> >> >> - dev_hold(dev); > >> >> >> >> - rtnl_unlock(); > >> >> >> >> - > >> >> >> >> ret_val = wait_event_interruptible_timeout(kni->wq, > >> >> >> >> kni_fifo_count(kni->resp_q), 3 * HZ); > >> >> >> >> - rtnl_lock(); > >> >> >> >> - dev_put(dev); > >> >> >> >> > >> >> >> >> if (signal_pending(current) || ret_val <= 0) { > >> >> >> >> ret = -ETIME; > >> >> >> >> @@ -152,6 +161,10 @@ kni_net_process_request(struct net_device > *dev, struct rte_kni_request *req) > >> >> >> >> > >> >> >> >> fail: > >> >> >> >> mutex_unlock(&kni->sync_lock); > >> >> >> >> + if (!req_is_dev_stop) { > >> >> >> >> + rtnl_lock(); > >> >> >> >> + dev_put(dev); > >> >> >> >> + } > >> >> >> >> return ret; > >> >> >> >> } > >> >> >> >> > >> >> >> >> -- > >> >> >> >> 2.17.1 > >> >> >> >> >