DPDK patches and discussions
 help / color / mirror / Atom feed
From: Jay Rolette <rolette@infiniteio.com>
To: Matthew Hall <mhall@mhcomputing.net>
Cc: Dev <dev@dpdk.org>
Subject: Re: [dpdk-dev] kernel: BUG: soft lockup - CPU#1 stuck for 22s! [kni_single:1782]
Date: Tue, 17 Feb 2015 09:57:38 -0600	[thread overview]
Message-ID: <CADNuJVr0X4fDn6=gqFvKHKvm2=Jyo77754wm4+WO9ihjCLc_Ag@mail.gmail.com> (raw)
In-Reply-To: <20150217010024.GB30617@mhcomputing.net>

On Mon, Feb 16, 2015 at 7:00 PM, Matthew Hall <mhall@mhcomputing.net> wrote:

> On Mon, Feb 16, 2015 at 10:33:52AM -0600, Jay Rolette wrote:
> > In kni_net_rx_normal(), it was calling netif_receive_skb() instead of
> > netif_rx(). The source for netif_receive_skb() point out that it should
> > only be called from soft-irq context, which isn't the case for KNI.
>
> For the uninitiated among us, what was the practical effect of the coding
> error? Waiting forever for a lock which will never be available in IRQ
> context, or causing unintended re-entrancy, or what?
>

Sadly, I'm not really one of the enlightened ones when it comes to the
Linux kernel. VxWorks? sure. Linux kernel? learning as required.

I didn't chase it down to the specific mechanism in this case. Unusual for
me, but this time I took the expedient route of finding a likely
explanation plus Yao's fix on that same code with his explanation of a
deadlock and went with it. It'll be a few more days before we've had enough
run time on it to absolutely confirm (not an easy bug to repro).

If I get hand-wavy about it, my assumption is that the requirement for
netif_receive_skb() be called in soft-irq context means it doesn't expect
to be pre-empted or rentrant.  When you call netif_rx() instead, it puts
the skb on the backlog and it gets processed from there. Part of that code
disables interrupts during part of the processing. Not sure what else is
coming in and actually deadlocking things.

Honestly, I don't understand enough details of how everything works in the
Linux network stack yet. I've done tons of work on the network path of
stack-less systems, a bit of work in device drivers, but have only touched
the surface of the internals of Linux network stack. The meat of my product
avoids that like the plague because it is too slow.

Sorry, lots of words but not much light being shed this time...
Jay

      reply	other threads:[~2015-02-17 15:57 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-02-11  1:33 Jay Rolette
2015-02-11 16:25 ` Alejandro Lucero
2015-02-16 13:32   ` Jay Rolette
2015-02-16 16:33 ` Jay Rolette
2015-02-17  1:00   ` Matthew Hall
2015-02-17 15:57     ` Jay Rolette [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CADNuJVr0X4fDn6=gqFvKHKvm2=Jyo77754wm4+WO9ihjCLc_Ag@mail.gmail.com' \
    --to=rolette@infiniteio.com \
    --cc=dev@dpdk.org \
    --cc=mhall@mhcomputing.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).