DPDK patches and discussions
 help / color / mirror / Atom feed
From: Dan Gora <dg@adax.com>
To: Igor Ryzhov <iryzhov@nfware.com>
Cc: Elad Nachman <eladv6@gmail.com>,
	Stephen Hemminger <stephen@networkplumber.org>,
	 Ferruh Yigit <ferruh.yigit@intel.com>, dev <dev@dpdk.org>
Subject: Re: [dpdk-dev] [PATCH 2/2] kni: fix rtnl deadlocks and race conditions v4
Date: Mon, 1 Mar 2021 18:26:17 -0300
Message-ID: <CAGyogRZDMfAfM8_RbqCJeqbJk=jkrcPe-EdoSmHKmEbMW60s9Q@mail.gmail.com> (raw)
In-Reply-To: <CAGyogRbUzB9s4FuEbD+85KSepQGzTAE9M2MwzicKEpbQzp2X9g@mail.gmail.com>

This is from my git commit fixing this:

kni: separate releasing netdev from freeing KNI interface

    Currently the rte_kni kernel driver suffers from a problem where
    when the interface is released, it generates a callback to the DPDK
    application to change the interface state to Down.  However, after the
    DPDK application handles the callback and generates a response back to
    the kernel, the rte_kni driver cannot wake the thread which is asleep
    waiting for the response, because it is holding the kni_link_lock
    semaphore and it has already removed the 'struct kni_dev' from the
    list of interfaces to poll for responses.

    This means that if the KNI interface is in the Up state when
    rte_kni_release() is called, it will always sleep for three seconds
    until kni_net_release gives up waiting for a response from the DPDK
    application.

    To fix this, we must separate the step to release the kernel network
    interface from the steps to remove the KNI interface from the list
    of interfaces to poll.

    When the kernel network interface is removed with unregister_netdev(),
    if the interface is up, it will generate a callback to mark the
    interface down, which calls kni_net_release().  kni_net_release() will
    block waiting for the DPDK application to call rte_kni_handle_request()
    to handle the callback, but it also needs the thread in the KNI driver
    (either the per-dev thread for multi-thread or the per-driver thread)
    to call kni_net_poll_resp() in order to wake the thread sleeping in
    kni_net_release (actually kni_net_process_request()).

    So now, KNI interfaces should be removed as such:

    1) The user calls rte_kni_release().  This only unregisters the
    netdev in the kernel, but touches nothing else.  This allows all the
    threads to run which are necessary to handle the callback into the
    DPDK application to mark the interface down.

    2) The user stops the thread running rte_kni_handle_request().
    After rte_kni_release() has been called, there will be no more
    callbacks for that interface so it is not necessary.  It cannot be
    running at the same time that rte_kni_free() frees all of the FIFOs
    and DPDK memory for that KNI interface.

    3) The user calls rte_kni_free().  This performs the RTE_KNI_IOCTL_FREE
    ioctl which calls kni_ioctl_free().  This function removes the struct
    kni_dev from the list of interfaces to poll (and kills the per-dev
    kthread, if configured for multi-thread), then frees the memory in
    the FIFOs.

    Signed-off-by: Dan Gora <dg@adax.com>

I'm not sure that this is exactly the problem that you're seeing, but
it sounds like it to me.

thanks
dan

On Mon, Mar 1, 2021 at 5:27 PM Dan Gora <dg@adax.com> wrote:
>
> Hi All,
>
> Sorry to butt in on this, but I fixed this same issue about 3 years
> ago in my application, but I was never able to get the changes
> integrated and eventually just gave up trying.
>
> The rule with KNI is:
> 1) The app should have a separate control thread per rte_kni which
> just spins calling rte_kni_handle_request().  This ensures that other
> threads calling rte_kni_XXX functions will always get a response.
>
> 2) In order to deal with lockups and timeouts when closing the device, I sent
> patches which separated the closing process into two steps:
> rte_kni_release() which would unregister the underlying netdev, then
> rte_kni_free() which would free the KNI portions of the KNI device.
> When rte_kni_release() is called the kernel netdev is unregistered and
> a response is sent back to the application, the control thread calling
> rte_kni_handle_request() is still running, so the application will
> still get a response back from the kernel and not lock up, the
> application then kills the control thread so that
> rte_kni_handle_request() is not called again, then the application
> calls rte_kni_free() which frees all of the FIFOs and closes the
> device.
>
> If anyone is interested the patches are probably still floating around
> patchwork.  If not you can check them out here:
>
> https://github.com/danielgora/dpdk.git
>
> thanks-
> dan
>
> On Mon, Mar 1, 2021 at 5:10 AM Igor Ryzhov <iryzhov@nfware.com> wrote:
> >
> > Stephen,
> >
> > No, I don't have a better proposal, but I think it is not correct to change
> > the behavior of KNI (making link down without a real response).
> > Even though we know that communicating with userspace under rtnl_lock is a
> > bad idea, it works as it is for many years already.
> >
> > Elad,
> >
> > I agree with you that KNI should be removed from the main tree if it is not
> > possible to fix this __dev_close_many issue.
> > There were discussions about this multiple times already, but no one is
> > working on this AFAIK.
> > Last time the discussion was a month ago:
> > https://www.mail-archive.com/dev@dpdk.org/msg196033.html
> >
> > Igor
> >
> > On Fri, Feb 26, 2021 at 8:43 PM Elad Nachman <eladv6@gmail.com> wrote:
> >
> > > The way the kernel handles its locks and lists for the dev close many
> > > path, there is no way you can go around this with rtnl unlocked :
> > > "
> > >
> > > There is a race condition in __dev_close_many() processing the
> > > close_list while the application terminates.
> > > It looks like if two vEth devices are terminating,
> > > and one releases the rtnl lock, the other takes it,
> > > updating the close_list in an unstable state,
> > > causing the close_list to become a circular linked list,
> > > hence list_for_each_entry() will endlessly loop inside
> > > __dev_close_many() .
> > >
> > > "
> > > And I don't expect David Miller will bend the kernel networking for DPDK
> > > or KNI.
> > >
> > > But - Stephen - if you can personally convince David to accept a
> > > kernel patch which will separate the close_list locking mechanism to a
> > > separate (RCU?) lock, then I can introduce first a patch to the kernel
> > > which will add a lock for the close_list, this way rtnl can be
> > > unlocked for the if down case.
> > >
> > > After that kernel patch, your original patch + relocation of the sync
> > > mutex locking will do the job .
> > >lockups
> > > Otherwise, rtnl has to be kept locked all of the way for the if down
> > > case in order to prevent corruption causing a circular linked list out
> > > of the close_list, causing a hang in the kernel.
> > >lockups
> > > Currently, the rtnl lock is the only thing keeping the close_list from
> > > corruption.
> > >
> > > If you doubt rtnl cannot be unlocked for dev close path, you can
> > > consult David for his opinion, as I think it is critical to understand
> > > what the kernel can or cannot do, or expects to be done before we can
> > > unlock its locks as we wish inside rte_kni.ko .
> > >
> > > Otherwise, if we are still in disagreement on how to patch this set of
> > > problems, I think the responsible way around it is to completely
> > > remove kni from the main dpdk tree and move it to dpdk-kmods
> > > repository.
> > >
> > > I know BSD style open-source does not carry legal responsibility from
> > > the developers, but I think when a bunch of developers know a piece of
> > > code is highly buggy, they should not leave it for countless new users
> > > to bounce their head desperately against, if they cannot agree on a
> > > correct way to solve the bunch of problems, of which I think we all
> > > agree exist (we just do not agree on the proper solution or patch)...
> > >
> > > That's my two cents,
> > >
> > > Elad.
> > >
> > > On Fri, Feb 26, 2021 at 5:49 PM Stephen Hemminger
> > > <stephen@networkplumber.org> wrote:
> > > >
> > > > On Fri, 26 Feb 2021 00:01:01 +0300
> > > > Igor Ryzhov <iryzhov@nfware.com> wrote:
> > > >
> > > > > Hi Elad,
> > > > >
> > > > > Thanks for the patch, but this is still NACK from me.
> > > > >
> > > > > The only real advantage of KNI over other exceptional-path techniques
> > > > > like virtio-user is the ability to configure DPDK-managed interfaces
> > > > > directly
> > > > > from the kernel using well-known utils like iproute2. A very important
> > > part
> > > > > of this is getting responses from the DPDK app and knowing the actual
> > > > > result of command execution.
> > > > > If you're making async requests to the application and you don't know
> > > > > the result, then what's the point of using KNI at all?
> > > > >
> > > > > Igor
> > > >
> > > > Do you have a better proposal that keeps the request result but does not
> > > > call userspace with lock held.
> > > >
> > > > PS: I also have strong dislike of KNI, as designed it would have been
> > > rejected
> > > > by Linux kernel developers.  A better solution would be userspace
> > > version of
> > > > something like devlink devices. But doing control operations by proxy is
> > > > a locking nightmare.
> > >



-- 
Dan Gora
Software Engineer

Adax, Inc.
Rua Dona Maria Alves, 1070 Casa 5
Centro
Ubatuba, SP
CEP 11680-000
Brasil

Tel: +55 (12) 3833-1021  (Brazil and outside of US)
    : +1 (510) 859-4801  (Inside of US)
    : dan_gora (Skype)

email: dg@adax.com

  reply	other threads:[~2021-03-01 21:26 UTC|newest]

Thread overview: 42+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-11-26 14:46 [dpdk-dev] [PATCH] kni: fix rtnl deadlocks and race conditions Elad Nachman
2021-02-19 18:41 ` Ferruh Yigit
2021-02-21  8:03   ` Elad Nachman
2021-02-22 15:58     ` Ferruh Yigit
2021-02-23 12:05 ` [dpdk-dev] [PATCH V2] kni: fix rtnl deadlocks and race conditions v2 Elad Nachman
2021-02-23 12:53   ` Ferruh Yigit
2021-02-23 13:44 ` [dpdk-dev] [PATCH 1/2] kni: fix rtnl deadlocks and race conditions v3 Elad Nachman
2021-02-23 13:45 ` [dpdk-dev] [PATCH 2/2] " Elad Nachman
2021-02-24 12:49   ` Igor Ryzhov
2021-02-24 13:33     ` Elad Nachman
2021-02-24 14:04       ` Igor Ryzhov
2021-02-24 14:06         ` Elad Nachman
2021-02-24 14:41           ` Igor Ryzhov
2021-02-24 14:56             ` Elad Nachman
2021-02-24 15:18               ` Igor Ryzhov
     [not found]                 ` <CACXF7qkhkzFc-=v=iiBzh2V7rLjk1U34VUfPbNrnYJND_0TKHQ@mail.gmail.com>
2021-02-24 16:31                   ` Igor Ryzhov
2021-02-24 15:54     ` Stephen Hemminger
2021-02-25 14:32 ` [dpdk-dev] [PATCH 1/2] kni: fix kernel deadlock when using mlx devices Elad Nachman
2021-02-25 14:32   ` [dpdk-dev] [PATCH 2/2] kni: fix rtnl deadlocks and race conditions v4 Elad Nachman
2021-02-25 21:01     ` Igor Ryzhov
2021-02-26 15:48       ` Stephen Hemminger
2021-02-26 17:43         ` Elad Nachman
2021-03-01  8:10           ` Igor Ryzhov
2021-03-01 16:38             ` Stephen Hemminger
2021-03-15 16:58               ` Ferruh Yigit
2021-03-01 20:27             ` Dan Gora
2021-03-01 21:26               ` Dan Gora [this message]
2021-03-02 16:44                 ` Elad Nachman
2021-03-15 17:17     ` Ferruh Yigit
2021-03-16 18:35       ` Elad Nachman
2021-03-16 18:42         ` Ferruh Yigit
2021-03-15 17:17   ` [dpdk-dev] [PATCH 1/2] kni: fix kernel deadlock when using mlx devices Ferruh Yigit
2021-03-29 14:36 ` [dpdk-dev] [PATCH v5 1/3] kni: refactor user request processing Ferruh Yigit
2021-03-29 14:36   ` [dpdk-dev] [PATCH v5 2/3] kni: support async user request Ferruh Yigit
2021-03-29 14:36   ` [dpdk-dev] [PATCH v5 3/3] kni: fix kernel deadlock when using mlx devices Ferruh Yigit
2021-04-09 14:56     ` [dpdk-dev] [dpdk-stable] " Ferruh Yigit
2021-04-12 14:35       ` Elad Nachman
2021-04-20 23:07         ` Thomas Monjalon
2021-04-23  8:41           ` Igor Ryzhov
2021-04-23  8:59             ` Ferruh Yigit
2021-04-23 12:43               ` Igor Ryzhov
2021-04-23 12:58                 ` Igor Ryzhov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAGyogRZDMfAfM8_RbqCJeqbJk=jkrcPe-EdoSmHKmEbMW60s9Q@mail.gmail.com' \
    --to=dg@adax.com \
    --cc=dev@dpdk.org \
    --cc=eladv6@gmail.com \
    --cc=ferruh.yigit@intel.com \
    --cc=iryzhov@nfware.com \
    --cc=stephen@networkplumber.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

DPDK patches and discussions

This inbox may be cloned and mirrored by anyone:

	git clone --mirror https://inbox.dpdk.org/dev/0 dev/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 dev dev/ https://inbox.dpdk.org/dev \
		dev@dpdk.org
	public-inbox-index dev

Example config snippet for mirrors.
Newsgroup available over NNTP:
	nntp://inbox.dpdk.org/inbox.dpdk.dev


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git