From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mails.dpdk.org (mails.dpdk.org [217.70.189.124]) by inbox.dpdk.org (Postfix) with ESMTP id 179EAA0C52; Mon, 4 Oct 2021 20:27:50 +0200 (CEST) Received: from [217.70.189.124] (localhost [127.0.0.1]) by mails.dpdk.org (Postfix) with ESMTP id 8BC6F41336; Mon, 4 Oct 2021 20:27:49 +0200 (CEST) Received: from mail-io1-f48.google.com (mail-io1-f48.google.com [209.85.166.48]) by mails.dpdk.org (Postfix) with ESMTP id 4C2A54068F for ; Mon, 4 Oct 2021 20:27:48 +0200 (CEST) Received: by mail-io1-f48.google.com with SMTP id y197so21414627iof.11 for ; Mon, 04 Oct 2021 11:27:48 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=m8UWdp9PuLjFVX/ZKTfZzTNzYdSZCCQshCtE2lGrS4o=; b=IW5grMZAKJlbXCtTjrwQhOsEEs/wgo5zLOfyF21Or+wnu4NCX8cfL5irLsLaMnCPr+ GHJf+0waF4cun7Xh6eImQDk7DVI5D+lF2e61ulnHHe0A+G55t2CDurL7tK+vXp/6Q0Y9 pUBcgECBaBjJunG4cASxdeb+2J7cjAC8NQjt2+1kJ14BFDSgscCk5xLF0o7KhNqulIZS E40wUrkzCa5wlwb7l/6ef10N1Va081Ebthd0Y0cLKy/xlUMAId3y3y4So0vPqRTiZTsm QlFt/uXoGTnTytgIuKGqEXJ9p00gPXdEOJPM3dcjF0MBxbE1Us6FmeJo7pEqHfh68uiY oH8A== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=m8UWdp9PuLjFVX/ZKTfZzTNzYdSZCCQshCtE2lGrS4o=; b=C0kzCwPaAR5V8t9bM9YIYg6FrnKWb1851i/mWhAbdDFBNR79Q8drqubKAhNSIdnEO/ gXd91AJMFl3ggaH3AyprCMjTx+xy19QrjjYqNtbQjaUVhJP0v/MPa5QAG71U/tNkd/74 HBjrVF8k2NCC70AJtia2N3vHSkm+EXh8v9YOFJSZA6903qmrSbveImxpIy9nPYSs0yxO fb4xYnhN4qHVMiDaxQIIlaeiEacn+Z1pOkokQo4gQhd5cFsIC5mk7NHl8y8ZwmhppNjb NsfwM1GqXOK8hSmejLpCu8OzO37ZjR6OX9P1DpKMq7h9SxTS498zqfKKr8WidoOX281Q mw5Q== X-Gm-Message-State: AOAM531C2eY7eFZv29CGyR0T3OQ4e/GkzUQOBpjen8QF5Qs2r2u4oUaB R/Z5T9Rx86fAN+W1ln1lmPx2Wf4hsc6r30aI96mgchmRs38= X-Google-Smtp-Source: ABdhPJzI21EK3Yq43FWhN/4lfadjCQNYfYd5Rd0jf2B91DVIxuUCH2pYCr6fPJYg3JqpQtyoVLqMiJe+X3q2s8W2Gn4= X-Received: by 2002:a6b:c9d8:: with SMTP id z207mr10469955iof.150.1633372067517; Mon, 04 Oct 2021 11:27:47 -0700 (PDT) MIME-Version: 1.0 References: <20210924105409.21711-1-eladv6@gmail.com> <3ae193df-292c-4907-df4a-88ce3d6735fc@intel.com> <1a17d552-8b81-04f9-7594-61e84ea7990f@intel.com> <8525082f-eb28-92db-11d3-ef4d24144be4@intel.com> In-Reply-To: From: Elad Nachman Date: Mon, 4 Oct 2021 21:27:35 +0300 Message-ID: To: Eric Christian Cc: Ferruh Yigit , dev , Igor Ryzhov Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Content-Filtered-By: Mailman/MimeDel 2.1.29 Subject: Re: [dpdk-dev] [PATCH v2] kni: Fix request overwritten X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" =D7=91=D7=AA=D7=90=D7=A8=D7=99=D7=9A =D7=99=D7=95=D7=9D =D7=91=D7=B3, 4 =D7= =91=D7=90=D7=95=D7=A7=D7=B3 2021, 20:00, =D7=9E=D7=90=D7=AA Eric Christian = =E2=80=8F: > I am not sure that only we can recreate the KNI request overwrite. We ma= y > be the only ones with a current use case that exposes the vulnerability. > It is possible for any KNI operation to encounter this issue with the ne= w > async mechanism. As long as the call to kni_net_process_request() is a > separate thread from rte_kni_handle_request() this has the potential to > occur with the use of async requests. > > All you need is one async KNI request followed closely by a second KNI > request before the rte_kni_handle_request() has had a chance to process t= he > first request. > > The kernel dev driver simply returns the error value back to the caller i= f > it is less than zero. > > > Eric > What I did in order to attempt to recreate this issue was: Create a qemu VM with four cores. Run the KNI sample application. Then: 1. Run ifconfig vEth0_0 down followed immediately by ifconfig vEth0_0 up 2. Same as above but in a sample userspace C application which calls ioctl twice in a row to achieve the same. Both did not recreate the problem. The only way to recreate it IMHO is to either: run the ioctl from two parallel threads or perhaps assign RT fifo priority to the application calling ioctl in a row. Elad > > On Mon, Oct 4, 2021 at 12:19 PM Elad Nachman wrote: > >> >> >> On Mon, Oct 4, 2021 at 7:05 PM Ferruh Yigit >> wrote: >> >>> On 10/4/2021 3:58 PM, Elad Nachman wrote: >>> > =D7=91=D7=AA=D7=90=D7=A8=D7=99=D7=9A =D7=99=D7=95=D7=9D =D7=91=D7=B3,= 4 =D7=91=D7=90=D7=95=D7=A7=D7=B3 2021, 17:51, =D7=9E=D7=90=D7=AA Ferruh Yi= git =E2=80=8F< >>> > ferruh.yigit@intel.com>: >>> > >>> >> On 10/4/2021 3:25 PM, Elad Nachman wrote: >>> >> >>> >> Can you please try to not top post, it will make impossible to follo= w >>> this >>> >> discussion later from the mail archives. >>> >> >>> >>> 1. Userspace will get an error >>> >> >>> >> So there is nothing special with returning '-EAGAIN', user will only >>> >> observe an >>> >> error. >>> >> Wasn't initial intention to use '-EAGAIN' to try request again? >>> >> >>> > To signal user-space to retry the operation. >>> > >>> >>> Not sure if it will reach to the end user. If user is calling "ifconfig >>> >>> down", it will just fail right, it won't recognize the error type. >>> >>> Unless this is common usage by the Linux network drivers, having this >>> usage in >>> KNI won't help much. I am for handling this in the kernel side if we ca= n. >>> >>> >> If user calls ifconfig down it will not happen. It requires some >> multi-core race condition only Eric can recreate. >> >> >>> >> >>> >>> 2. Waiting with rtnl locked causes a deadlock; waiting with rtnl >>> unlocked >>> >>> for interface down command causes a crash because of a race >>> condition in >>> >>> the device delete/unregister list in the kernel. >>> >>> >>> >> >>> >> Why waiting with rthnl lock causes a deadlock? As said below we are >>> already >>> >> doing it, why it is different with retry logic? >>> >> >>> > Because it can be interface down request. >>> > >>> >>> (sure you like short answers) >>> >>> Please help me to see why "interface down" is special. Isn't it point o= f >>> your >>> patch to wait the request execution in the userspace even it is an asyn= c >>> request? >>> >>> And yet again, number of retry can be limited. >>> >>> >> No, it is not. Please look again: >> https://patches.dpdk.org/project/dpdk/patch/20210924105409.21711-1-eladv= 6@gmail.com/ >> >> >> >>> >>> > >>> >> I agree to not wait with rtnl unlocked. >>> >> >>> >>> FYI, >>> >>> >>> >>> Elad. >>> >>> >>> >>> =D7=91=D7=AA=D7=90=D7=A8=D7=99=D7=9A =D7=99=D7=95=D7=9D =D7=91=D7= =B3, 4 =D7=91=D7=90=D7=95=D7=A7=D7=B3 2021, 17:13, =D7=9E=D7=90=D7=AA Ferru= h Yigit =E2=80=8F< >>> >>> ferruh.yigit@intel.com>: >>> >>> >>> >>>> On 10/4/2021 2:09 PM, Elad Nachman wrote: >>> >>>>> Hi, >>> >>>>> >>> >>>>> EAGAIN is propogated back to the kernel and to the caller. >>> >>>>> >>> >>>> >>> >>>> So will the user get an error, or it will be handled by the kernel >>> and >>> >>>> retried? >>> >>>> >>> >>>>> We cannot retry from the kni kernel module since we hold the rtnl >>> lock. >>> >>>>> >>> >>>> >>> >>>> Why not? We are already waiting until a command time out, like >>> >>>> 'kni_net_open()' >>> >>>> can retry if 'kni_net_process_request()' returns '-EAGAIN'. And we >>> can >>> >>>> limit the >>> >>>> number of retry for safety. >>> >>>> >>> >>>>> FYI, >>> >>>>> >>> >>>>> Elad >>> >>>>> >>> >>>>> =D7=91=D7=AA=D7=90=D7=A8=D7=99=D7=9A =D7=99=D7=95=D7=9D =D7=91=D7= =B3, 4 =D7=91=D7=90=D7=95=D7=A7=D7=B3 2021, 16:05, =D7=9E=D7=90=D7=AA Ferru= h Yigit =E2=80=8F< >>> >>>>> ferruh.yigit@intel.com>: >>> >>>>> >>> >>>>>> On 9/24/2021 11:54 AM, Elad Nachman wrote: >>> >>>>>>> Fix lack of multiple KNI requests handling support by >>> introducing a >>> >>>>>>> request in progress flag which will fail additional requests wi= th >>> >>>>>>> EAGAIN return code if the original request has not been process= ed >>> >>>>>>> by user-space. >>> >>>>>>> >>> >>>>>>> Bugzilla ID: 809 >>> >>>>>> >>> >>>>>> Hi Eric, >>> >>>>>> >>> >>>>>> Can you please test this patch, if it solves the issue you >>> reported? >>> >>>>>> >>> >>>>>>> >>> >>>>>>> Signed-off-by: Elad Nachman >>> >>>>>>> --- >>> >>>>>>> kernel/linux/kni/kni_net.c | 9 +++++++++ >>> >>>>>>> lib/kni/rte_kni.c | 2 ++ >>> >>>>>>> lib/kni/rte_kni_common.h | 1 + >>> >>>>>>> 3 files changed, 12 insertions(+) >>> >>>>>>> >>> >>>>>> >>> >>>>>> <...> >>> >>>>>> >>> >>>>>>> @@ -123,7 +124,15 @@ kni_net_process_request(struct net_device >>> *dev, >>> >>>>>> struct rte_kni_request *req) >>> >>>>>>> >>> >>>>>>> mutex_lock(&kni->sync_lock); >>> >>>>>>> >>> >>>>>>> + /* Check that existing request has been processed: */ >>> >>>>>>> + cur_req =3D (struct rte_kni_request *)kni->sync_kva; >>> >>>>>>> + if (cur_req->req_in_progress) { >>> >>>>>>> + ret =3D -EAGAIN; >>> >>>>>> >>> >>>>>> Overall logic in the KNI looks good to me, this helps to >>> serialize the >>> >>>>>> requests >>> >>>>>> even for async ones. >>> >>>>>> >>> >>>>>> But can you please clarify how it behaves in the kernel side wit= h >>> >>>> '-EAGAIN' >>> >>>>>> return type? Will linux call the ndo again, or will it just fail= . >>> >>>>>> >>> >>>>>> If it just fails should we handle the re-try on '-EAGAIN' within >>> the >>> >> kni >>> >>>>>> module? >>> >>>>>> >>> >>>>>> >>> >>>> >>> >>>> >>> >> >>> >> Elad. >>> >>>