From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <dev-bounces@dpdk.org>
Received: from mails.dpdk.org (mails.dpdk.org [217.70.189.124])
	by inbox.dpdk.org (Postfix) with ESMTP id 3A72CA054F;
	Mon,  1 Mar 2021 22:26:55 +0100 (CET)
Received: from [217.70.189.124] (localhost [127.0.0.1])
	by mails.dpdk.org (Postfix) with ESMTP id DFBEA4069E;
	Mon,  1 Mar 2021 22:26:54 +0100 (CET)
Received: from mail-io1-f44.google.com (mail-io1-f44.google.com
 [209.85.166.44]) by mails.dpdk.org (Postfix) with ESMTP id 04BFD40041
 for <dev@dpdk.org>; Mon,  1 Mar 2021 22:26:53 +0100 (CET)
Received: by mail-io1-f44.google.com with SMTP id o9so4502994iow.6
 for <dev@dpdk.org>; Mon, 01 Mar 2021 13:26:53 -0800 (PST)
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=1e100.net; s=20161025;
 h=x-gm-message-state:mime-version:references:in-reply-to:from:date
 :message-id:subject:to:cc;
 bh=mz2OQJv30T3xRhS8OxrAIBz1TedFm5VmysiCO9+6QxM=;
 b=kOAKv3OGqyoiHYdfvhzqqHzBqDoS8YUXJN9Ctqjj9WD/W2TGQ5iSKtRHeoQTmE6CVP
 6s+RDcDVr53FdaZ59rDpmtuQ1wKCagIVFN59rR01ltHhT6XdYAqra7BDXxWn74X6CiHr
 ZWuPzdbj1exImifEe+53JXnRArn9yQwpWILZrvHaBPaYrcY+4ggDsNlVrc4qWlijQgpD
 2TefGcXUizHxq7+3Z/Mhv6cIVrN2acRtp93Ph4u9xVACfKlPAiR9sVPwLPs71PGVPXGY
 0ec1ZWHTw7x0tH6N0CccuoozkxLva2xv0MN8T2fOjqT7nZE6xEE+zHsAsj5kAw6dUkwa
 lRsQ==
X-Gm-Message-State: AOAM532VWLNrxLRFSe2LM6FV1EU2OOKr/M9UJywtJl49QQWX/rBNHEAt
 OwExIy+ne+T4MykXkGESWF3KKYbi/ZJLr2eXKso=
X-Google-Smtp-Source: ABdhPJzUJGkrnAdsiH9SRn530KcDDAGCF2MAkD9ClSEeg5hoVLpKA7YpelzGIo4OIjcuVBXtd44O9Q3XEniAHkT5gLA=
X-Received: by 2002:a5d:93c2:: with SMTP id j2mr15829208ioo.166.1614634013222; 
 Mon, 01 Mar 2021 13:26:53 -0800 (PST)
MIME-Version: 1.0
References: <20201126144613.4986-1-eladv6@gmail.com>
 <20210225143239.14220-1-eladv6@gmail.com>
 <20210225143239.14220-2-eladv6@gmail.com>
 <CAF+s_Fx54vCh5dR-HGGmBZVbOHZkAez+aW4W74KBMC62qp2VTg@mail.gmail.com>
 <20210226074817.7e9e0a71@hermes.local>
 <CACXF7qk=Jozo9ahfmjX5yNdBvFBNeaEKCpCCq5KoyZVuyknbjA@mail.gmail.com>
 <CAF+s_FwHiHs9TkS90UpLBtzowFo-c667nWVOF+y1p8Qn_-_5WQ@mail.gmail.com>
 <CAGyogRbUzB9s4FuEbD+85KSepQGzTAE9M2MwzicKEpbQzp2X9g@mail.gmail.com>
In-Reply-To: <CAGyogRbUzB9s4FuEbD+85KSepQGzTAE9M2MwzicKEpbQzp2X9g@mail.gmail.com>
From: Dan Gora <dg@adax.com>
Date: Mon, 1 Mar 2021 18:26:17 -0300
Message-ID: <CAGyogRZDMfAfM8_RbqCJeqbJk=jkrcPe-EdoSmHKmEbMW60s9Q@mail.gmail.com>
To: Igor Ryzhov <iryzhov@nfware.com>
Cc: Elad Nachman <eladv6@gmail.com>,
 Stephen Hemminger <stephen@networkplumber.org>, 
 Ferruh Yigit <ferruh.yigit@intel.com>, dev <dev@dpdk.org>
Content-Type: text/plain; charset="UTF-8"
Subject: Re: [dpdk-dev] [PATCH 2/2] kni: fix rtnl deadlocks and race
 conditions v4
X-BeenThere: dev@dpdk.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: DPDK patches and discussions <dev.dpdk.org>
List-Unsubscribe: <https://mails.dpdk.org/options/dev>,
 <mailto:dev-request@dpdk.org?subject=unsubscribe>
List-Archive: <http://mails.dpdk.org/archives/dev/>
List-Post: <mailto:dev@dpdk.org>
List-Help: <mailto:dev-request@dpdk.org?subject=help>
List-Subscribe: <https://mails.dpdk.org/listinfo/dev>,
 <mailto:dev-request@dpdk.org?subject=subscribe>
Errors-To: dev-bounces@dpdk.org
Sender: "dev" <dev-bounces@dpdk.org>

This is from my git commit fixing this:

kni: separate releasing netdev from freeing KNI interface

    Currently the rte_kni kernel driver suffers from a problem where
    when the interface is released, it generates a callback to the DPDK
    application to change the interface state to Down.  However, after the
    DPDK application handles the callback and generates a response back to
    the kernel, the rte_kni driver cannot wake the thread which is asleep
    waiting for the response, because it is holding the kni_link_lock
    semaphore and it has already removed the 'struct kni_dev' from the
    list of interfaces to poll for responses.

    This means that if the KNI interface is in the Up state when
    rte_kni_release() is called, it will always sleep for three seconds
    until kni_net_release gives up waiting for a response from the DPDK
    application.

    To fix this, we must separate the step to release the kernel network
    interface from the steps to remove the KNI interface from the list
    of interfaces to poll.

    When the kernel network interface is removed with unregister_netdev(),
    if the interface is up, it will generate a callback to mark the
    interface down, which calls kni_net_release().  kni_net_release() will
    block waiting for the DPDK application to call rte_kni_handle_request()
    to handle the callback, but it also needs the thread in the KNI driver
    (either the per-dev thread for multi-thread or the per-driver thread)
    to call kni_net_poll_resp() in order to wake the thread sleeping in
    kni_net_release (actually kni_net_process_request()).

    So now, KNI interfaces should be removed as such:

    1) The user calls rte_kni_release().  This only unregisters the
    netdev in the kernel, but touches nothing else.  This allows all the
    threads to run which are necessary to handle the callback into the
    DPDK application to mark the interface down.

    2) The user stops the thread running rte_kni_handle_request().
    After rte_kni_release() has been called, there will be no more
    callbacks for that interface so it is not necessary.  It cannot be
    running at the same time that rte_kni_free() frees all of the FIFOs
    and DPDK memory for that KNI interface.

    3) The user calls rte_kni_free().  This performs the RTE_KNI_IOCTL_FREE
    ioctl which calls kni_ioctl_free().  This function removes the struct
    kni_dev from the list of interfaces to poll (and kills the per-dev
    kthread, if configured for multi-thread), then frees the memory in
    the FIFOs.

    Signed-off-by: Dan Gora <dg@adax.com>

I'm not sure that this is exactly the problem that you're seeing, but
it sounds like it to me.

thanks
dan

On Mon, Mar 1, 2021 at 5:27 PM Dan Gora <dg@adax.com> wrote:
>
> Hi All,
>
> Sorry to butt in on this, but I fixed this same issue about 3 years
> ago in my application, but I was never able to get the changes
> integrated and eventually just gave up trying.
>
> The rule with KNI is:
> 1) The app should have a separate control thread per rte_kni which
> just spins calling rte_kni_handle_request().  This ensures that other
> threads calling rte_kni_XXX functions will always get a response.
>
> 2) In order to deal with lockups and timeouts when closing the device, I sent
> patches which separated the closing process into two steps:
> rte_kni_release() which would unregister the underlying netdev, then
> rte_kni_free() which would free the KNI portions of the KNI device.
> When rte_kni_release() is called the kernel netdev is unregistered and
> a response is sent back to the application, the control thread calling
> rte_kni_handle_request() is still running, so the application will
> still get a response back from the kernel and not lock up, the
> application then kills the control thread so that
> rte_kni_handle_request() is not called again, then the application
> calls rte_kni_free() which frees all of the FIFOs and closes the
> device.
>
> If anyone is interested the patches are probably still floating around
> patchwork.  If not you can check them out here:
>
> https://github.com/danielgora/dpdk.git
>
> thanks-
> dan
>
> On Mon, Mar 1, 2021 at 5:10 AM Igor Ryzhov <iryzhov@nfware.com> wrote:
> >
> > Stephen,
> >
> > No, I don't have a better proposal, but I think it is not correct to change
> > the behavior of KNI (making link down without a real response).
> > Even though we know that communicating with userspace under rtnl_lock is a
> > bad idea, it works as it is for many years already.
> >
> > Elad,
> >
> > I agree with you that KNI should be removed from the main tree if it is not
> > possible to fix this __dev_close_many issue.
> > There were discussions about this multiple times already, but no one is
> > working on this AFAIK.
> > Last time the discussion was a month ago:
> > https://www.mail-archive.com/dev@dpdk.org/msg196033.html
> >
> > Igor
> >
> > On Fri, Feb 26, 2021 at 8:43 PM Elad Nachman <eladv6@gmail.com> wrote:
> >
> > > The way the kernel handles its locks and lists for the dev close many
> > > path, there is no way you can go around this with rtnl unlocked :
> > > "
> > >
> > > There is a race condition in __dev_close_many() processing the
> > > close_list while the application terminates.
> > > It looks like if two vEth devices are terminating,
> > > and one releases the rtnl lock, the other takes it,
> > > updating the close_list in an unstable state,
> > > causing the close_list to become a circular linked list,
> > > hence list_for_each_entry() will endlessly loop inside
> > > __dev_close_many() .
> > >
> > > "
> > > And I don't expect David Miller will bend the kernel networking for DPDK
> > > or KNI.
> > >
> > > But - Stephen - if you can personally convince David to accept a
> > > kernel patch which will separate the close_list locking mechanism to a
> > > separate (RCU?) lock, then I can introduce first a patch to the kernel
> > > which will add a lock for the close_list, this way rtnl can be
> > > unlocked for the if down case.
> > >
> > > After that kernel patch, your original patch + relocation of the sync
> > > mutex locking will do the job .
> > >lockups
> > > Otherwise, rtnl has to be kept locked all of the way for the if down
> > > case in order to prevent corruption causing a circular linked list out
> > > of the close_list, causing a hang in the kernel.
> > >lockups
> > > Currently, the rtnl lock is the only thing keeping the close_list from
> > > corruption.
> > >
> > > If you doubt rtnl cannot be unlocked for dev close path, you can
> > > consult David for his opinion, as I think it is critical to understand
> > > what the kernel can or cannot do, or expects to be done before we can
> > > unlock its locks as we wish inside rte_kni.ko .
> > >
> > > Otherwise, if we are still in disagreement on how to patch this set of
> > > problems, I think the responsible way around it is to completely
> > > remove kni from the main dpdk tree and move it to dpdk-kmods
> > > repository.
> > >
> > > I know BSD style open-source does not carry legal responsibility from
> > > the developers, but I think when a bunch of developers know a piece of
> > > code is highly buggy, they should not leave it for countless new users
> > > to bounce their head desperately against, if they cannot agree on a
> > > correct way to solve the bunch of problems, of which I think we all
> > > agree exist (we just do not agree on the proper solution or patch)...
> > >
> > > That's my two cents,
> > >
> > > Elad.
> > >
> > > On Fri, Feb 26, 2021 at 5:49 PM Stephen Hemminger
> > > <stephen@networkplumber.org> wrote:
> > > >
> > > > On Fri, 26 Feb 2021 00:01:01 +0300
> > > > Igor Ryzhov <iryzhov@nfware.com> wrote:
> > > >
> > > > > Hi Elad,
> > > > >
> > > > > Thanks for the patch, but this is still NACK from me.
> > > > >
> > > > > The only real advantage of KNI over other exceptional-path techniques
> > > > > like virtio-user is the ability to configure DPDK-managed interfaces
> > > > > directly
> > > > > from the kernel using well-known utils like iproute2. A very important
> > > part
> > > > > of this is getting responses from the DPDK app and knowing the actual
> > > > > result of command execution.
> > > > > If you're making async requests to the application and you don't know
> > > > > the result, then what's the point of using KNI at all?
> > > > >
> > > > > Igor
> > > >
> > > > Do you have a better proposal that keeps the request result but does not
> > > > call userspace with lock held.
> > > >
> > > > PS: I also have strong dislike of KNI, as designed it would have been
> > > rejected
> > > > by Linux kernel developers.  A better solution would be userspace
> > > version of
> > > > something like devlink devices. But doing control operations by proxy is
> > > > a locking nightmare.
> > >


-- 
Dan Gora
Software Engineer

Adax, Inc.
Rua Dona Maria Alves, 1070 Casa 5
Centro
Ubatuba, SP
CEP 11680-000
Brasil

Tel: +55 (12) 3833-1021  (Brazil and outside of US)
    : +1 (510) 859-4801  (Inside of US)
    : dan_gora (Skype)

email: dg@adax.com