Subject: Re: [dpdk-dev] [PATCH] kni: fix rtnl deadlocks and race conditions
From: Ferruh Yigit
To: Elad Nachman
Cc: dev@dpdk.org, Igor Ryzhov, Stephen Hemminger
Date: Mon, 22 Feb 2021 15:58:33 +0000
Message-ID: <334c7e47-cee5-0523-6858-a9074a12b7e0@intel.com>
References: <20201126144613.4986-1-eladv6@gmail.com>
 <9f7284a8-4158-c246-c329-bfec27f543a5@intel.com>

On 2/21/2021 8:03 AM, Elad Nachman wrote:
> Hi,
>
> Regarding the asynchronous call - thought about it, but then the
> request will always return OK to user-space and I will have no way to
> return failure error codes back to user-space.
>

Right, let's continue with this patch. Can you please send a new
version with the updates mentioned below?

> If the above explanation is acceptable, per your other comments - I
> can send a new patch without the parameter change, without the empty
> line, and with the comment moved to the proper place in the code.
>
> Waiting for your decision,
>
> Elad.
>
> On Fri, Feb 19, 2021 at 8:42 PM Ferruh Yigit wrote:
>>
>> On 11/26/2020 2:46 PM, Elad Nachman wrote:
>>> This patch leverages Stephen Hemminger's 64106 patch from Dec 2019
>>> and fixes the issues reported by Ferruh and Igor:
>>>
>>> A. The KNI sync lock is being locked while rtnl is held.
>>> If two threads are calling kni_net_process_request(),
>>> the first one will take the sync lock, release the rtnl lock, then sleep.
>>> The second thread will try to take the sync lock while holding rtnl.
>>> The first thread will wake and try to lock rtnl, resulting in a deadlock.
>>> The remedy is to release rtnl before locking the KNI sync lock.
>>> Since nothing in between accesses the Linux network stack,
>>> no rtnl locking is needed there.
>>
>> Hi Elad,
>>
>> Thanks for the explanation, that clarifies the issue.
>> Also I confirm I don't see the hang, at least as much as I test.
>>
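To make the ordering in item A concrete, a rough, illustrative-only
sketch of the locking pattern the remedy describes; this is not the
actual patch (the real change is in the diff below), but every call
used here is a standard kernel primitive:

	/* Entered with rtnl held by the network core (e.g. from
	 * ndo_open). Sleeping on sync_lock while still holding rtnl
	 * is what allows the two-thread deadlock described above, so
	 * rtnl is dropped first and a reference keeps the net_device
	 * alive while it is unlocked.
	 */
	ASSERT_RTNL();
	dev_hold(dev);
	rtnl_unlock();

	mutex_lock(&kni->sync_lock);
	/* ... post the request and wait for the userspace response ... */
	mutex_unlock(&kni->sync_lock);

	rtnl_lock();
	dev_put(dev);
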
>>>
>>> B. There is a race condition in __dev_close_many() processing the
>>> close_list while the application terminates.
>>> It looks like, if two vEth devices are terminating and one releases
>>> the rtnl lock, the other takes it and updates the close_list while
>>> it is in an unstable state, causing the close_list to become a
>>> circular linked list, hence list_for_each_entry() loops endlessly
>>> inside __dev_close_many().
>>> Since the description of the original patch indicates the original
>>> motivation was bringing the device up, I have changed
>>> kni_net_process_request() to hold the rtnl mutex when bringing the
>>> device down, since this is the path called from __dev_close_many()
>>> that causes the corruption of the close_list.
>>>
>>
>> I can't reproduce this case; I see the protection in the code, but it
>> is better to get confirmation from Igor.
>>
>> Overall the issue seems to be calling a function pointed to by
>> 'rte_kni_ops' which requires acquiring the rtnl lock.
>> So I wonder if this can't be handled in the ops function, by processing
>> the request asynchronously, like recording the request, returning from
>> 'rte_kni_ops', and processing the request afterwards?

(A rough sketch of that deferred approach is at the end of this mail.)

>>
>> I assume the application we mention is not the kni sample application.
>>
>>>
>>> Signed-off-by: Elad Nachman
>>> ---
>>>  kernel/linux/kni/kni_net.c | 47 +++++++++++++++++++++++++-------------
>>>  1 file changed, 31 insertions(+), 16 deletions(-)
>>>
>>> diff --git a/kernel/linux/kni/kni_net.c b/kernel/linux/kni/kni_net.c
>>> index 4b752083d..cf5b0845d 100644
>>> --- a/kernel/linux/kni/kni_net.c
>>> +++ b/kernel/linux/kni/kni_net.c
>>> @@ -17,6 +17,7 @@
>>>  #include 
>>>  #include 
>>>  #include 
>>> +#include 
>>>
>>>  #include 
>>>  #include 
>>> @@ -102,18 +103,26 @@ get_data_kva(struct kni_dev *kni, void *pkt_kva)
>>>   * It can be called to process the request.
>>>   */
>>>  static int
>>> -kni_net_process_request(struct kni_dev *kni, struct rte_kni_request *req)
>>> +kni_net_process_request(struct net_device *dev, struct rte_kni_request *req)
>>>  {
>>> +	struct kni_dev *kni = netdev_priv(dev);
>>>  	int ret = -1;
>>>  	void *resp_va;
>>>  	uint32_t num;
>>>  	int ret_val;
>>> +	int req_is_dev_stop = 0;
>>>
>>> -	if (!kni || !req) {
>>> -		pr_err("No kni instance or request\n");
>>> -		return -EINVAL;
>>> -	}
>>> +	if (req->req_id == RTE_KNI_REQ_CFG_NETWORK_IF &&
>>> +	    req->if_up == 0)
>>> +		req_is_dev_stop = 1;
>>>
>>> +	ASSERT_RTNL();
>>> +
>>> +	if (!req_is_dev_stop) {
>>> +		dev_hold(dev);
>>> +		rtnl_unlock();
>>> +	}
>>> +
>>>  	mutex_lock(&kni->sync_lock);
>>>
>>>  	/* Construct data */
>>> @@ -125,8 +134,13 @@ kni_net_process_request(struct kni_dev *kni, struct rte_kni_request *req)
>>>  		goto fail;
>>>  	}
>>>
>>> +	/* Since we need to wait and RTNL mutex is held
>>> +	 * drop the mutex and hold refernce to keep device
>>> +	 */
>>> +
>>
>> The comment seems left here, it needs to go up.
>> s/refernce/reference
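Presumably something like the following is meant - the comment moved up,
above the unlock, and the typo fixed (sketch only, using the identifiers
already introduced by this patch):

	/* Since we need to wait and the RTNL mutex is held,
	 * drop the mutex and hold a reference to keep the device.
	 */
	if (!req_is_dev_stop) {
		dev_hold(dev);
		rtnl_unlock();
	}
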
>>
>>>  	ret_val = wait_event_interruptible_timeout(kni->wq,
>>>  			kni_fifo_count(kni->resp_q), 3 * HZ);
>>> +
>>>  	if (signal_pending(current) || ret_val <= 0) {
>>>  		ret = -ETIME;
>>>  		goto fail;
>>> @@ -144,6 +158,13 @@ kni_net_process_request(struct kni_dev *kni, struct rte_kni_request *req)
>>>
>>>  fail:
>>>  	mutex_unlock(&kni->sync_lock);
>>> +
>>> +
>>
>> Extra empty line.
>>
>>> +	if (!req_is_dev_stop) {
>>> +		rtnl_lock();
>>> +		dev_put(dev);
>>> +	}
>>> +
>>>  	return ret;
>>>  }
>>>
>>> @@ -155,7 +176,6 @@ kni_net_open(struct net_device *dev)
>>>  {
>>>  	int ret;
>>>  	struct rte_kni_request req;
>>> -	struct kni_dev *kni = netdev_priv(dev);
>>>
>>>  	netif_start_queue(dev);
>>>  	if (kni_dflt_carrier == 1)
>>> @@ -168,7 +188,7 @@ kni_net_open(struct net_device *dev)
>>>
>>>  	/* Setting if_up to non-zero means up */
>>>  	req.if_up = 1;
>>> -	ret = kni_net_process_request(kni, &req);
>>> +	ret = kni_net_process_request(dev, &req);
>>>
>>
>> Although it is not so confusing, these lines and the following ones are
>> noise for this patch, they are just for the 'kni_net_process_request'
>> parameter change.
>>
>> What do you think about doing the 'kni_net_process_request' parameter
>> change in a first patch and fixing the issue in a second one, so that
>> the second patch contains only the actual changes required for the fix?
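
For reference, the asynchronous handling discussed at the top of this
thread - and set aside because the callback's return value is what gets
reported back to the kernel - would look roughly like the sketch below.
It assumes the 'config_network_if' callback shape from rte_kni.h and a
purely hypothetical application-side ring; it is not a proposal for this
patch:

	/* Hypothetical application-side code, sketch only. */
	#include <errno.h>
	#include <stdint.h>
	#include <stdlib.h>
	#include <rte_ring.h>

	struct if_cfg_req {
		uint16_t port_id;
		uint8_t if_up;
	};

	/* Created elsewhere by the application with rte_ring_create(). */
	static struct rte_ring *cfg_ring;

	static int
	kni_config_network_if_async(uint16_t port_id, uint8_t if_up)
	{
		struct if_cfg_req *req = malloc(sizeof(*req));

		if (req == NULL)
			return -ENOMEM;
		req->port_id = port_id;
		req->if_up = if_up;

		/* Record the request and return immediately; the main loop
		 * dequeues and applies it later. The limitation Elad points
		 * out is visible here: the kernel gets this return value as
		 * the response, before the deferred work has actually run.
		 */
		if (rte_ring_enqueue(cfg_ring, req) != 0) {
			free(req);
			return -ENOBUFS;
		}
		return 0;
	}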