From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mails.dpdk.org (mails.dpdk.org [217.70.189.124]) by inbox.dpdk.org (Postfix) with ESMTP id DABFAA034F; Wed, 24 Feb 2021 13:50:01 +0100 (CET) Received: from [217.70.189.124] (localhost [127.0.0.1]) by mails.dpdk.org (Postfix) with ESMTP id C6FBE16083C; Wed, 24 Feb 2021 13:50:01 +0100 (CET) Received: from mail-ej1-f52.google.com (mail-ej1-f52.google.com [209.85.218.52]) by mails.dpdk.org (Postfix) with ESMTP id 029C74069B for ; Wed, 24 Feb 2021 13:50:00 +0100 (CET) Received: by mail-ej1-f52.google.com with SMTP id do6so2865756ejc.3 for ; Wed, 24 Feb 2021 04:50:00 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=nfware-com.20150623.gappssmtp.com; s=20150623; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=qZl4sO5J/D1sXc7XCUgyVpghuAhaeZ8sEJxXWhjXOQ0=; b=HEQ68iuZu1wtEcsp3mp0eZ+Ko1XmrRYMP8EsQwo6knvVgwYIxqN//uSg0ejXeOB+9b jcK+1hobi/wP6UGFKLeGqOE9rfdx/vCcYVtGmHRtF+aQT6f7IFj5Puae9wDStBSjOYzE hlbNC4qNW3/0VyEfQZuQ1g1RX14xMoqcSHHNk6jgPPgoz03oYH0HSFvuDE5OZwNFoRc7 wsLGD+zrlVCnqEWlxzh+euNTHtLwqDRzUBIjZV/+lzLhYx5oXaRhUF/nKCNnAPaJ7nHH bbi7Yq8NmbYeEeBXX7UxYYA0Wcd8JSd8C4WSFl7U0C2auEQ06JPKecuOXNp+MLLtzJXy syIw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=qZl4sO5J/D1sXc7XCUgyVpghuAhaeZ8sEJxXWhjXOQ0=; b=XuB6aB24dhnTxgNANlRh1uiZE4HbgAwC0JBLH+POX60+tVQxdYGuChaS1ZKiM0wQeX NaVOD84zHDc8lfeemSLTER94DngUsGu6vaXGlBlO4b318LpZ6oCu927uaM3nM+zO7FHE pgU4agB1c70jGQtucpm0dbShLRycYWfJLkH6mTFSDTlaGTk6aLNUoiegOwaqn76UTl6I onQFxU8HQB3Rzt0w/o0y4SZGMwyqGx8vAdExHU3x1UQNHw2R7vQiPbGFuCT5zKVmbEy/ edx/7Vs5R2iEqnAotOxv+3GuCh0W24MTipbXoijJdMZnIhpTWmVwW2ivk5y2a/gjYsBU piQg== X-Gm-Message-State: AOAM533esVYzYJyHua4xgRN25dAzXzN/tw3uXP20dCSgAoRKJITSTNFq WAMmsg3Xakr9lwNArhSoU8iovCTRr0kEp4jP9dNlaA== X-Google-Smtp-Source: ABdhPJwE5ru1p8RfgKz289OexWSZl/VlumqrGpjSxgzZgQBpmOyzzQtcPUQy4VgDu3Cx0oaaHj71+VjfDSQHlAuDhLU= X-Received: by 2002:a17:907:98af:: with SMTP id ju15mr20805180ejc.263.1614171000678; Wed, 24 Feb 2021 04:50:00 -0800 (PST) MIME-Version: 1.0 References: <20201126144613.4986-1-eladv6@gmail.com> <20210223134504.699-1-eladv6@gmail.com> In-Reply-To: <20210223134504.699-1-eladv6@gmail.com> From: Igor Ryzhov Date: Wed, 24 Feb 2021 15:49:49 +0300 Message-ID: To: Elad Nachman Cc: Ferruh Yigit , Stephen Hemminger , dev Content-Type: text/plain; charset="UTF-8" X-Content-Filtered-By: Mailman/MimeDel 2.1.29 Subject: Re: [dpdk-dev] [PATCH 2/2] kni: fix rtnl deadlocks and race conditions v3 X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" This looks more like a hack than an actual fix to me. After this commit: "ip link set up" is sent to the userspace with unlocked rtnl_lock "ip link set down" is sent to the userspace with locked rtnl_lock How is this really fixing anything? IMHO it only complicates the code. If talking with userspace under rtnl_lock is a problem, then we should fix all such requests, not only part of them. If it is not a problem, then I don't see any point in merging this. On Tue, Feb 23, 2021 at 4:45 PM Elad Nachman wrote: > This part of the series includes my fixes for the issues reported > by Ferruh and Igor on top of part 1 of the patch series: > > A. KNI sync lock is being locked while rtnl is held. > If two threads are calling kni_net_process_request() , > then the first one will take the sync lock, release rtnl lock then sleep. > The second thread will try to lock sync lock while holding rtnl. > The first thread will wake, and try to lock rtnl, resulting in a deadlock. > The remedy is to release rtnl before locking the KNI sync lock. > Since in between nothing is accessing Linux network-wise, > no rtnl locking is needed. > > B. There is a race condition in __dev_close_many() processing the > close_list while the application terminates. > It looks like if two vEth devices are terminating, > and one releases the rtnl lock, the other takes it, > updating the close_list in an unstable state, > causing the close_list to become a circular linked list, > hence list_for_each_entry() will endlessly loop inside > __dev_close_many() . > Since the description for the original patch indicate the > original motivation was bringing the device up, > I have changed kni_net_process_request() to hold the rtnl mutex > in case of bringing the device down since this is the path called > from __dev_close_many() , causing the corruption of the close_list. > > Signed-off-by: Elad Nachman > --- > v3: > * Include original patch and new patch as a series of patch, added a > comment to the new patch > v2: > * rebuild the patch as increment from patch 64106 > * fix comment and blank lines > --- > kernel/linux/kni/kni_net.c | 29 +++++++++++++++++++++-------- > 1 file changed, 21 insertions(+), 8 deletions(-) > > diff --git a/kernel/linux/kni/kni_net.c b/kernel/linux/kni/kni_net.c > index f0b6e9a8d..017e44812 100644 > --- a/kernel/linux/kni/kni_net.c > +++ b/kernel/linux/kni/kni_net.c > @@ -110,9 +110,26 @@ kni_net_process_request(struct net_device *dev, > struct rte_kni_request *req) > void *resp_va; > uint32_t num; > int ret_val; > + int req_is_dev_stop = 0; > + > + /* For configuring the interface to down, > + * rtnl must be held all the way to prevent race condition > + * inside __dev_close_many() between two netdev instances of KNI > + */ > + if (req->req_id == RTE_KNI_REQ_CFG_NETWORK_IF && > + req->if_up == 0) > + req_is_dev_stop = 1; > > ASSERT_RTNL(); > > + /* Since we need to wait and RTNL mutex is held > + * drop the mutex and hold reference to keep device > + */ > + if (!req_is_dev_stop) { > + dev_hold(dev); > + rtnl_unlock(); > + } > + > mutex_lock(&kni->sync_lock); > > /* Construct data */ > @@ -124,16 +141,8 @@ kni_net_process_request(struct net_device *dev, > struct rte_kni_request *req) > goto fail; > } > > - /* Since we need to wait and RTNL mutex is held > - * drop the mutex and hold refernce to keep device > - */ > - dev_hold(dev); > - rtnl_unlock(); > - > ret_val = wait_event_interruptible_timeout(kni->wq, > kni_fifo_count(kni->resp_q), 3 * HZ); > - rtnl_lock(); > - dev_put(dev); > > if (signal_pending(current) || ret_val <= 0) { > ret = -ETIME; > @@ -152,6 +161,10 @@ kni_net_process_request(struct net_device *dev, > struct rte_kni_request *req) > > fail: > mutex_unlock(&kni->sync_lock); > + if (!req_is_dev_stop) { > + rtnl_lock(); > + dev_put(dev); > + } > return ret; > } > > -- > 2.17.1 > >