From: Elad Nachman
Date: Mon, 12 Apr 2021 17:35:32 +0300
To: Ferruh Yigit
Cc: Igor Ryzhov, stable@dpdk.org, Stephen Hemminger, Dan Gora, dev
Subject: Re: [dpdk-dev] [dpdk-stable] [PATCH v5 3/3] kni: fix kernel deadlock when using mlx devices
References: <20201126144613.4986-1-eladv6@gmail.com> <20210329143655.521750-1-ferruh.yigit@intel.com> <20210329143655.521750-3-ferruh.yigit@intel.com>

Hi,

The new patch is fine by me.
Tested several dozen restarts of our proprietary application without any
apparent problem.

FYI,

Elad.

On Fri, 9 Apr 2021 at 17:56, Ferruh Yigit <ferruh.yigit@intel.com> wrote:

> On 3/29/2021 3:36 PM, Ferruh Yigit wrote:
> > KNI runs the userspace callback with the rtnl lock held. This does not
> > work with some devices that need to interact with the kernel interface
> > in the callback, like Mellanox devices.
> >
> > The solution is releasing the rtnl lock before calling the userspace
> > callback. But it requires two considerations:
> >
> > 1. The rtnl lock needs to be released before 'kni->sync_lock', otherwise
> >    it causes a deadlock with multiple KNI devices, please check A. below
> >    for the details of the deadlock condition.
> >
> > 2. When the rtnl lock is released for the interface down event, it causes
> >    a regression and deadlock, so the rtnl lock can't be released for the
> >    interface down event, please check B. below for the details.
> >
> > As a solution, the interface down event is handled asynchronously and
> > for all other events the rtnl lock is released before processing the
> > callback.
> >
> > A. KNI sync lock is being locked while rtnl is held.
> > If two threads are calling kni_net_process_request(),
> > then the first one will take the sync lock, release the rtnl lock, then
> > sleep.
> > The second thread will try to lock the sync lock while holding rtnl.
> > The first thread will wake and try to lock rtnl, resulting in a
> > deadlock. The remedy is to release rtnl before locking the KNI sync
> > lock.
> > Since nothing in between accesses the Linux network stack, no rtnl
> > locking is needed.
> >
> > B. There is a race condition in __dev_close_many() processing the
> > close_list while the application terminates.
> > It looks like if two KNI interfaces are terminating,
> > and one releases the rtnl lock, the other takes it,
> > updating the close_list in an unstable state,
> > causing the close_list to become a circular linked list,
> > hence list_for_each_entry() will endlessly loop inside
> > __dev_close_many().
> >
> > To summarize:
> > request != interface down: unlock rtnl, send request to user-space,
> > wait for response, send the response error code to caller in user-space.
> >
> > request == interface down: send request to user-space, return immediately
> > with error code of 0 (success) to user-space.
> >
> > Fixes: 3fc5ca2f6352 ("kni: initial import")
> > Cc: stable@dpdk.org
> >
> > Signed-off-by: Elad Nachman
> > ---
> > Cc: Stephen Hemminger
> > Cc: Igor Ryzhov
> > Cc: Dan Gora
> >
>
> Hi Elad, Igor,
>
> Can you please review/test this set when you have time?
>
> Thanks,
> ferruh
>
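
For readers following the thread, the request handling summarized in the quoted commit message can be modeled roughly as below. This is only a userspace toy sketch, not the code from the patch: the struct kni_model, the req_kind values and both functions are hypothetical names made up for illustration, the locks are plain flags rather than real kernel locks, and re-taking rtnl before returning is an assumption that the commit message does not spell out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical request kinds; the real KNI request IDs are not shown in this thread. */
enum req_kind { REQ_IF_UP, REQ_IF_DOWN, REQ_CHANGE_MTU };

/* Toy model of the locking state; not a kernel data structure. */
struct kni_model {
	bool rtnl_held;       /* stands in for the rtnl lock */
	bool sync_held;       /* stands in for kni->sync_lock */
	int sent;             /* requests handed to user space */
	int waits;            /* times the caller slept waiting for a response */
	int waited_with_rtnl; /* waits entered with rtnl held; should stay 0 */
};

/* Stand-in for sleeping until user space answers; only records the lock state. */
static int wait_for_response(struct kni_model *m, int user_rc)
{
	m->waits++;
	if (m->rtnl_held)
		m->waited_with_rtnl++;
	return user_rc;
}

/* Called with rtnl held; returns with rtnl held (assumption of this sketch). */
static int process_request(struct kni_model *m, enum req_kind kind, int user_rc)
{
	int rc;

	assert(m->rtnl_held);

	if (kind == REQ_IF_DOWN) {
		/* Interface down: keep rtnl, hand off the request, do not wait. */
		m->sent++;
		return 0;
	}

	/* Every other request: drop rtnl first, only then take the sync lock. */
	m->rtnl_held = false;
	m->sync_held = true;
	m->sent++;
	rc = wait_for_response(m, user_rc);
	m->sync_held = false;
	m->rtnl_held = true; /* assumed: rtnl is re-taken before returning */
	return rc;
}
```

Calling process_request() on a model that starts with rtnl_held set to true returns 0 for REQ_IF_DOWN whatever user space would have answered, and returns the user-space error code for any other request. The waited_with_rtnl counter stays at zero in both paths, which is the ordering property described in A.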