From: Eric Christian
Date: Mon, 4 Oct 2021 12:59:52 -0400
To: Elad Nachman
Cc: Ferruh Yigit, dev, Igor Ryzhov
Subject: Re: [dpdk-dev] [PATCH v2] kni: Fix request overwritten
List-Id: DPDK patches and discussions

I am not sure that only we can recreate the KNI request overwrite. We may
be the only ones with a current use case that exposes the vulnerability.
It is possible for any KNI operation to encounter this issue with the new
async mechanism. As long as the call to kni_net_process_request() runs in
a separate thread from rte_kni_handle_request(), this has the potential to
occur with the use of async requests. All you need is one async KNI
request followed closely by a second KNI request before
rte_kni_handle_request() has had a chance to process the first request.

The kernel dev driver simply returns the error value back to the caller if
it is less than zero.
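To make the sequence above concrete, here is a minimal single-threaded
sketch in C of a one-slot request mailbox. It is not the KNI code: struct
demo_request, demo_post_request() and demo_handle_request() are
hypothetical names, and the slot only stands in for the shared request
memory at kni->sync_kva. The req_in_progress field mirrors the flag from
the patch under discussion. Without the check, a second post would
silently overwrite the first request; with it, the second caller sees
-EAGAIN until the handler has consumed the first one.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the single shared request slot. */
struct demo_request {
    uint32_t req_id;       /* which operation is being requested */
    bool req_in_progress;  /* set by the poster, cleared by the handler */
    int result;            /* filled in by the handler */
};

/*
 * Requester side (the role kni_net_process_request() plays in the thread):
 * refuse to post while the previous request has not been consumed.
 */
static int demo_post_request(struct demo_request *slot, uint32_t req_id)
{
    assert(slot != NULL);
    if (slot->req_in_progress)
        return -EAGAIN;    /* slot still holds an unprocessed request */
    slot->req_id = req_id;
    slot->req_in_progress = true;
    return 0;
}

/*
 * Handler side (the role rte_kni_handle_request() plays in the thread):
 * consume the pending request, if any. Returns the handled request id,
 * or 0 when nothing was pending (0 is assumed not to be a valid id here).
 */
static uint32_t demo_handle_request(struct demo_request *slot)
{
    assert(slot != NULL);
    if (!slot->req_in_progress)
        return 0;
    uint32_t handled = slot->req_id;
    slot->result = 0;               /* pretend the operation succeeded */
    slot->req_in_progress = false;  /* slot is free for the next request */
    return handled;
}
```

Posting request 1 and then request 2 before any call to
demo_handle_request() leaves request 1 in the slot and returns -EAGAIN for
request 2. A real implementation would also need locking and memory
ordering between the kernel and user-space sides, which this sketch leaves
out.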
Eric

On Mon, Oct 4, 2021 at 12:19 PM Elad Nachman wrote:

> On Mon, Oct 4, 2021 at 7:05 PM Ferruh Yigit wrote:
>
>> On 10/4/2021 3:58 PM, Elad Nachman wrote:
>> > On Mon, Oct 4, 2021, 17:51, Ferruh Yigit <ferruh.yigit@intel.com> wrote:
>> >
>> >> On 10/4/2021 3:25 PM, Elad Nachman wrote:
>> >>
>> >> Can you please try not to top post? It makes it impossible to follow
>> >> this discussion later from the mail archives.
>> >>
>> >>> 1. Userspace will get an error
>> >>
>> >> So there is nothing special about returning '-EAGAIN'; the user will
>> >> only observe an error.
>> >> Wasn't the initial intention to use '-EAGAIN' to try the request again?
>> >>
>> > To signal user-space to retry the operation.
>> >
>> >> Not sure if it will reach the end user. If the user is calling
>> >> "ifconfig down", it will just fail, right? It won't recognize the
>> >> error type.
>>
>> Unless this is common usage in the Linux network drivers, having this
>> usage in KNI won't help much. I am for handling this on the kernel side
>> if we can.
>>
>
> If the user calls ifconfig down, it will not happen. It requires some
> multi-core race condition only Eric can recreate.
>
>> >>
>> >>> 2. Waiting with rtnl locked causes a deadlock; waiting with rtnl
>> >>> unlocked for the interface down command causes a crash because of a
>> >>> race condition in the device delete/unregister list in the kernel.
>> >>>
>> >>
>> >> Why does waiting with the rtnl lock cause a deadlock? As said below, we
>> >> are already doing it; why is it different with retry logic?
>> >>
>> > Because it can be an interface down request.
>> >
>>
>> (sure you like short answers)
>>
>> Please help me see why "interface down" is special. Isn't the point of
>> your patch to wait for the request execution in userspace even if it is
>> an async request?
>>
>> And yet again, the number of retries can be limited.
>>
>
> No, it is not. Please look again:
> https://patches.dpdk.org/project/dpdk/patch/20210924105409.21711-1-eladv6@gmail.com/
>
>> >
>> >> I agree to not wait with rtnl unlocked.
>> >>
>> >>> FYI,
>> >>>
>> >>> Elad.
>> >>>
>> >>> On Mon, Oct 4, 2021, 17:13, Ferruh Yigit <ferruh.yigit@intel.com> wrote:
>> >>>
>> >>>> On 10/4/2021 2:09 PM, Elad Nachman wrote:
>> >>>>> Hi,
>> >>>>>
>> >>>>> EAGAIN is propagated back to the kernel and to the caller.
>> >>>>>
>> >>>>
>> >>>> So will the user get an error, or will it be handled by the kernel
>> >>>> and retried?
>> >>>>
>> >>>>> We cannot retry from the kni kernel module since we hold the rtnl
>> >>>>> lock.
>> >>>>>
>> >>>>
>> >>>> Why not? We are already waiting until a command times out, so, for
>> >>>> example, 'kni_net_open()' can retry if 'kni_net_process_request()'
>> >>>> returns '-EAGAIN'. And we can limit the number of retries for safety.
>> >>>>
>> >>>>> FYI,
>> >>>>>
>> >>>>> Elad
>> >>>>>
>> >>>>> On Mon, Oct 4, 2021, 16:05, Ferruh Yigit <ferruh.yigit@intel.com> wrote:
>> >>>>>
>> >>>>>> On 9/24/2021 11:54 AM, Elad Nachman wrote:
>> >>>>>>> Fix lack of multiple KNI requests handling support by introducing
>> >>>>>>> a request in progress flag which will fail additional requests
>> >>>>>>> with EAGAIN return code if the original request has not been
>> >>>>>>> processed by user-space.
>> >>>>>>>
>> >>>>>>> Bugzilla ID: 809
>> >>>>>>
>> >>>>>> Hi Eric,
>> >>>>>>
>> >>>>>> Can you please test this patch, if it solves the issue you reported?
>> >>>>>>
>> >>>>>>>
>> >>>>>>> Signed-off-by: Elad Nachman
>> >>>>>>> ---
>> >>>>>>>  kernel/linux/kni/kni_net.c | 9 +++++++++
>> >>>>>>>  lib/kni/rte_kni.c          | 2 ++
>> >>>>>>>  lib/kni/rte_kni_common.h   | 1 +
>> >>>>>>>  3 files changed, 12 insertions(+)
>> >>>>>>>
>> >>>>>>
>> >>>>>> <...>
>> >>>>>>
>> >>>>>>> @@ -123,7 +124,15 @@ kni_net_process_request(struct net_device *dev, struct rte_kni_request *req)
>> >>>>>>>
>> >>>>>>>       mutex_lock(&kni->sync_lock);
>> >>>>>>>
>> >>>>>>> +     /* Check that existing request has been processed: */
>> >>>>>>> +     cur_req = (struct rte_kni_request *)kni->sync_kva;
>> >>>>>>> +     if (cur_req->req_in_progress) {
>> >>>>>>> +             ret = -EAGAIN;
>> >>>>>>
>> >>>>>> Overall logic in the KNI looks good to me, this helps to serialize
>> >>>>>> the requests even for async ones.
>> >>>>>>
>> >>>>>> But can you please clarify how it behaves on the kernel side with
>> >>>>>> the '-EAGAIN' return type? Will Linux call the ndo again, or will
>> >>>>>> it just fail?
>> >>>>>>
>> >>>>>> If it just fails, should we handle the retry on '-EAGAIN' within
>> >>>>>> the kni module?
>> >>>>>>
>> >>>>
>> >>
>> >> Elad.
>> >>
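
The bounded retry suggested in the quoted discussion can be sketched in
plain C as follows. This is only an illustration under stated assumptions:
demo_retry_on_eagain(), demo_request_fn and the demo_busy_* helpers are
hypothetical names, not part of the KNI module, and the sketch does not
address the rtnl deadlock concern raised in the thread. A kernel
implementation would presumably also need to sleep or yield between
attempts, which is omitted here.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical request callback: returns 0, or a negative errno value. */
typedef int (*demo_request_fn)(void *ctx);

/*
 * Call fn once, then retry while it keeps returning -EAGAIN, up to
 * max_retries additional attempts. Any other return value (success or a
 * different error) is passed straight back to the caller.
 */
static int demo_retry_on_eagain(demo_request_fn fn, void *ctx,
                                unsigned int max_retries)
{
    int ret = fn(ctx);

    for (unsigned int i = 0; ret == -EAGAIN && i < max_retries; i++)
        ret = fn(ctx);

    return ret;
}

/* Hypothetical request that reports "busy" a fixed number of times. */
struct demo_busy_ctx {
    unsigned int busy_left;  /* remaining calls that will return -EAGAIN */
    unsigned int calls;      /* total number of calls observed */
};

static int demo_busy_request(void *ctx)
{
    struct demo_busy_ctx *c = ctx;

    c->calls++;
    if (c->busy_left > 0) {
        c->busy_left--;
        return -EAGAIN;
    }
    return 0;
}
```

With a request that is busy twice and a limit of three retries, the helper
returns 0 on the third call. With a request that stays busy longer than
the limit allows, the caller still receives -EAGAIN, so the limit bounds
the wait without hiding the failure.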