From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <dev-bounces@dpdk.org>
Received: from mails.dpdk.org (mails.dpdk.org [217.70.189.124])
	by inbox.dpdk.org (Postfix) with ESMTP id 179EAA0C52;
	Mon,  4 Oct 2021 20:27:50 +0200 (CEST)
Received: from [217.70.189.124] (localhost [127.0.0.1])
	by mails.dpdk.org (Postfix) with ESMTP id 8BC6F41336;
	Mon,  4 Oct 2021 20:27:49 +0200 (CEST)
Received: from mail-io1-f48.google.com (mail-io1-f48.google.com
 [209.85.166.48]) by mails.dpdk.org (Postfix) with ESMTP id 4C2A54068F
 for <dev@dpdk.org>; Mon,  4 Oct 2021 20:27:48 +0200 (CEST)
Received: by mail-io1-f48.google.com with SMTP id y197so21414627iof.11
 for <dev@dpdk.org>; Mon, 04 Oct 2021 11:27:48 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112;
 h=mime-version:references:in-reply-to:from:date:message-id:subject:to
 :cc; bh=m8UWdp9PuLjFVX/ZKTfZzTNzYdSZCCQshCtE2lGrS4o=;
 b=IW5grMZAKJlbXCtTjrwQhOsEEs/wgo5zLOfyF21Or+wnu4NCX8cfL5irLsLaMnCPr+
 GHJf+0waF4cun7Xh6eImQDk7DVI5D+lF2e61ulnHHe0A+G55t2CDurL7tK+vXp/6Q0Y9
 pUBcgECBaBjJunG4cASxdeb+2J7cjAC8NQjt2+1kJ14BFDSgscCk5xLF0o7KhNqulIZS
 E40wUrkzCa5wlwb7l/6ef10N1Va081Ebthd0Y0cLKy/xlUMAId3y3y4So0vPqRTiZTsm
 QlFt/uXoGTnTytgIuKGqEXJ9p00gPXdEOJPM3dcjF0MBxbE1Us6FmeJo7pEqHfh68uiY
 oH8A==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=1e100.net; s=20210112;
 h=x-gm-message-state:mime-version:references:in-reply-to:from:date
 :message-id:subject:to:cc;
 bh=m8UWdp9PuLjFVX/ZKTfZzTNzYdSZCCQshCtE2lGrS4o=;
 b=C0kzCwPaAR5V8t9bM9YIYg6FrnKWb1851i/mWhAbdDFBNR79Q8drqubKAhNSIdnEO/
 gXd91AJMFl3ggaH3AyprCMjTx+xy19QrjjYqNtbQjaUVhJP0v/MPa5QAG71U/tNkd/74
 HBjrVF8k2NCC70AJtia2N3vHSkm+EXh8v9YOFJSZA6903qmrSbveImxpIy9nPYSs0yxO
 fb4xYnhN4qHVMiDaxQIIlaeiEacn+Z1pOkokQo4gQhd5cFsIC5mk7NHl8y8ZwmhppNjb
 NsfwM1GqXOK8hSmejLpCu8OzO37ZjR6OX9P1DpKMq7h9SxTS498zqfKKr8WidoOX281Q
 mw5Q==
X-Gm-Message-State: AOAM531C2eY7eFZv29CGyR0T3OQ4e/GkzUQOBpjen8QF5Qs2r2u4oUaB
 R/Z5T9Rx86fAN+W1ln1lmPx2Wf4hsc6r30aI96mgchmRs38=
X-Google-Smtp-Source: ABdhPJzI21EK3Yq43FWhN/4lfadjCQNYfYd5Rd0jf2B91DVIxuUCH2pYCr6fPJYg3JqpQtyoVLqMiJe+X3q2s8W2Gn4=
X-Received: by 2002:a6b:c9d8:: with SMTP id
 z207mr10469955iof.150.1633372067517; 
 Mon, 04 Oct 2021 11:27:47 -0700 (PDT)
MIME-Version: 1.0
References: <20210924105409.21711-1-eladv6@gmail.com>
 <f52600ed-3d80-457d-b096-8814850990f3@intel.com>
 <CACXF7qkpzZiSbtDf8YFU_vVNLc6ytmjXh4yUe+xHhYUnOdzyRQ@mail.gmail.com>
 <3ae193df-292c-4907-df4a-88ce3d6735fc@intel.com>
 <CACXF7qna7NexGNtpmjoC6k75V4=iOXQA7nTuwAC0cWwOEZu_DA@mail.gmail.com>
 <1a17d552-8b81-04f9-7594-61e84ea7990f@intel.com>
 <CACXF7q=p_GSsQEbRwW+547vEsGT2j+8TyB=ViOuzOp6HNm7Ksw@mail.gmail.com>
 <8525082f-eb28-92db-11d3-ef4d24144be4@intel.com>
 <CACXF7qnqp=hdbvZSq5psbmcaTdYTPvXV0N7rqirvpEGPKBNpGg@mail.gmail.com>
 <CAFbSC55tJhKD0KzMbpbxeDwAj=SKj=vZJcm6Zf0dWCQL7AY=pQ@mail.gmail.com>
In-Reply-To: <CAFbSC55tJhKD0KzMbpbxeDwAj=SKj=vZJcm6Zf0dWCQL7AY=pQ@mail.gmail.com>
From: Elad Nachman <eladv6@gmail.com>
Date: Mon, 4 Oct 2021 21:27:35 +0300
Message-ID: <CACXF7qkoC-8EkYY7WrAqAK7pUDUYZV4yONc-VZVUr-zXtnfdFg@mail.gmail.com>
To: Eric Christian <erclists@gmail.com>
Cc: Ferruh Yigit <ferruh.yigit@intel.com>, dev <dev@dpdk.org>, 
 Igor Ryzhov <iryzhov@nfware.com>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Content-Filtered-By: Mailman/MimeDel 2.1.29
Subject: Re: [dpdk-dev] [PATCH v2] kni: Fix request overwritten
X-BeenThere: dev@dpdk.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: DPDK patches and discussions <dev.dpdk.org>
List-Unsubscribe: <https://mails.dpdk.org/options/dev>,
 <mailto:dev-request@dpdk.org?subject=unsubscribe>
List-Archive: <http://mails.dpdk.org/archives/dev/>
List-Post: <mailto:dev@dpdk.org>
List-Help: <mailto:dev-request@dpdk.org?subject=help>
List-Subscribe: <https://mails.dpdk.org/listinfo/dev>,
 <mailto:dev-request@dpdk.org?subject=subscribe>
Errors-To: dev-bounces@dpdk.org
Sender: "dev" <dev-bounces@dpdk.org>

=D7=91=D7=AA=D7=90=D7=A8=D7=99=D7=9A =D7=99=D7=95=D7=9D =D7=91=D7=B3, 4 =D7=
=91=D7=90=D7=95=D7=A7=D7=B3 2021, 20:00, =D7=9E=D7=90=D7=AA Eric Christian =
=E2=80=8F<erclists@gmail.com
>:

> I am not sure that only we can recreate the KNI request overwrite.  We ma=
y
> be the only ones with a current use case that exposes the vulnerability.
>  It is possible for any KNI operation to encounter this issue with the ne=
w
> async mechanism.  As long as the call to kni_net_process_request() is a
> separate thread from rte_kni_handle_request() this has the potential to
> occur with the use of async requests.
>
> All you need is one async KNI request followed closely by a second KNI
> request before the rte_kni_handle_request() has had a chance to process t=
he
> first request.
>
> The kernel dev driver simply returns the error value back to the caller i=
f
> it is less than zero.
>
>
> Eric
>

What I did in order to attempt to recreate this issue was:

Create a qemu VM with four cores.
Run the KNI sample application.

Then:

1. Run ifconfig vEth0_0 down followed immediately by ifconfig vEth0_0 up

2. Same as above but in a sample userspace C application which calls ioctl
twice in a row to achieve the same.


Both did not recreate the problem.

The only way to recreate it IMHO is to either:
 run the ioctl from two parallel threads or perhaps assign RT fifo priority
to the application calling ioctl in a row.

Elad


>
> On Mon, Oct 4, 2021 at 12:19 PM Elad Nachman <eladv6@gmail.com> wrote:
>
>>
>>
>> On Mon, Oct 4, 2021 at 7:05 PM Ferruh Yigit <ferruh.yigit@intel.com>
>> wrote:
>>
>>> On 10/4/2021 3:58 PM, Elad Nachman wrote:
>>> > =D7=91=D7=AA=D7=90=D7=A8=D7=99=D7=9A =D7=99=D7=95=D7=9D =D7=91=D7=B3,=
 4 =D7=91=D7=90=D7=95=D7=A7=D7=B3 2021, 17:51, =D7=9E=D7=90=D7=AA Ferruh Yi=
git =E2=80=8F<
>>> > ferruh.yigit@intel.com>:
>>> >
>>> >> On 10/4/2021 3:25 PM, Elad Nachman wrote:
>>> >>
>>> >> Can you please try to not top post, it will make impossible to follo=
w
>>> this
>>> >> discussion later from the mail archives.
>>> >>
>>> >>> 1. Userspace will get an error
>>> >>
>>> >> So there is nothing special with returning '-EAGAIN', user will only
>>> >> observe an
>>> >> error.
>>> >> Wasn't initial intention to use '-EAGAIN' to try request again?
>>> >>
>>> > To signal user-space to retry the operation.
>>> >
>>>
>>> Not sure if it will reach to the end user. If user is calling "ifconfig
>>> <iface>
>>> down", it will just fail right, it won't recognize the error type.
>>>
>>> Unless this is common usage by the Linux network drivers, having this
>>> usage in
>>> KNI won't help much. I am for handling this in the kernel side if we ca=
n.
>>>
>>>
>> If user calls ifconfig <iface> down it will not happen. It requires some
>> multi-core race condition only Eric can recreate.
>>
>>
>>> >>
>>> >>> 2. Waiting with rtnl locked causes a deadlock; waiting with rtnl
>>> unlocked
>>> >>> for interface down command causes a crash because of a race
>>> condition in
>>> >>> the device delete/unregister list in the kernel.
>>> >>>
>>> >>
>>> >> Why waiting with rthnl lock causes a deadlock? As said below we are
>>> already
>>> >> doing it, why it is different with retry logic?
>>> >>
>>> > Because it can be interface down request.
>>> >
>>>
>>> (sure you like short answers)
>>>
>>> Please help me to see why "interface down" is special. Isn't it point o=
f
>>> your
>>> patch to wait the request execution in the userspace even it is an asyn=
c
>>> request?
>>>
>>> And yet again, number of retry can be limited.
>>>
>>>
>> No, it is not. Please look again:
>> https://patches.dpdk.org/project/dpdk/patch/20210924105409.21711-1-eladv=
6@gmail.com/
>>
>>
>>
>>>
>>> >
>>> >> I agree to not wait with rtnl unlocked.
>>> >>
>>> >>> FYI,
>>> >>>
>>> >>> Elad.
>>> >>>
>>> >>> =D7=91=D7=AA=D7=90=D7=A8=D7=99=D7=9A =D7=99=D7=95=D7=9D =D7=91=D7=
=B3, 4 =D7=91=D7=90=D7=95=D7=A7=D7=B3 2021, 17:13, =D7=9E=D7=90=D7=AA Ferru=
h Yigit =E2=80=8F<
>>> >>> ferruh.yigit@intel.com>:
>>> >>>
>>> >>>> On 10/4/2021 2:09 PM, Elad Nachman wrote:
>>> >>>>> Hi,
>>> >>>>>
>>> >>>>> EAGAIN is propogated back to the kernel and to the caller.
>>> >>>>>
>>> >>>>
>>> >>>> So will the user get an error, or it will be handled by the kernel
>>> and
>>> >>>> retried?
>>> >>>>
>>> >>>>> We cannot retry from the kni kernel module since we hold the rtnl
>>> lock.
>>> >>>>>
>>> >>>>
>>> >>>> Why not? We are already waiting until a command time out, like
>>> >>>> 'kni_net_open()'
>>> >>>> can retry if 'kni_net_process_request()' returns '-EAGAIN'. And we
>>> can
>>> >>>> limit the
>>> >>>> number of retry for safety.
>>> >>>>
>>> >>>>> FYI,
>>> >>>>>
>>> >>>>> Elad
>>> >>>>>
>>> >>>>> =D7=91=D7=AA=D7=90=D7=A8=D7=99=D7=9A =D7=99=D7=95=D7=9D =D7=91=D7=
=B3, 4 =D7=91=D7=90=D7=95=D7=A7=D7=B3 2021, 16:05, =D7=9E=D7=90=D7=AA Ferru=
h Yigit =E2=80=8F<
>>> >>>>> ferruh.yigit@intel.com>:
>>> >>>>>
>>> >>>>>> On 9/24/2021 11:54 AM, Elad Nachman wrote:
>>> >>>>>>> Fix lack of multiple KNI requests handling support by
>>> introducing a
>>> >>>>>>> request in progress flag which will fail additional requests wi=
th
>>> >>>>>>> EAGAIN return code if the original request has not been process=
ed
>>> >>>>>>> by user-space.
>>> >>>>>>>
>>> >>>>>>> Bugzilla ID: 809
>>> >>>>>>
>>> >>>>>> Hi Eric,
>>> >>>>>>
>>> >>>>>> Can you please test this patch, if it solves the issue you
>>> reported?
>>> >>>>>>
>>> >>>>>>>
>>> >>>>>>> Signed-off-by: Elad Nachman <eladv6@gmail.com>
>>> >>>>>>> ---
>>> >>>>>>>  kernel/linux/kni/kni_net.c | 9 +++++++++
>>> >>>>>>>  lib/kni/rte_kni.c          | 2 ++
>>> >>>>>>>  lib/kni/rte_kni_common.h   | 1 +
>>> >>>>>>>  3 files changed, 12 insertions(+)
>>> >>>>>>>
>>> >>>>>>
>>> >>>>>> <...>
>>> >>>>>>
>>> >>>>>>> @@ -123,7 +124,15 @@ kni_net_process_request(struct net_device
>>> *dev,
>>> >>>>>> struct rte_kni_request *req)
>>> >>>>>>>
>>> >>>>>>>       mutex_lock(&kni->sync_lock);
>>> >>>>>>>
>>> >>>>>>> +     /* Check that existing request has been processed: */
>>> >>>>>>> +     cur_req =3D (struct rte_kni_request *)kni->sync_kva;
>>> >>>>>>> +     if (cur_req->req_in_progress) {
>>> >>>>>>> +             ret =3D -EAGAIN;
>>> >>>>>>
>>> >>>>>> Overall logic in the KNI looks good to me, this helps to
>>> serialize the
>>> >>>>>> requests
>>> >>>>>> even for async ones.
>>> >>>>>>
>>> >>>>>> But can you please clarify how it behaves in the kernel side wit=
h
>>> >>>> '-EAGAIN'
>>> >>>>>> return type? Will linux call the ndo again, or will it just fail=
.
>>> >>>>>>
>>> >>>>>> If it just fails should we handle the re-try on '-EAGAIN' within
>>> the
>>> >> kni
>>> >>>>>> module?
>>> >>>>>>
>>> >>>>>>
>>> >>>>
>>> >>>>
>>> >>
>>> >> Elad.
>>>
>>>