DPDK patches and discussions
From: Paul Barrette <paul.barrette@windriver.com>
To: Stefan Baranoff <sbaranoff@gmail.com>, <dev@dpdk.org>
Subject: Re: [dpdk-dev] Random mbuf corruption
Date: Fri, 20 Jun 2014 09:59:58 -0400
Message-ID: <53A43E5E.3030809@windriver.com>
In-Reply-To: <CAHzKxpZUOVKbCYTb66D8cQbm0ceSt7rfYo6VU3f2qhi2ZBvytQ@mail.gmail.com>


On 06/20/2014 07:20 AM, Stefan Baranoff wrote:
> All,
>
> We are seeing 'random' memory corruption in mbufs coming from the ixgbe UIO
> driver and I am looking for some pointers on debugging it. Our software was
> running flawlessly for weeks at a time on our old Westmere systems (CentOS
> 6.4) but since moving to a new Sandy Bridge v2 server (also CentOS 6.4) it
> runs for 1-2 minutes and then at least one mbuf is overwritten with
> arbitrary data (pointers/lengths/RSS value/num segs/etc. are all
> ridiculous). Both servers are using the 82599EB chipset (x520) and the DPDK
> version (1.6.0r2) is identical. We recently also tested on a third server
> running RHEL 6.4 with the same hardware as the failing Sandy Bridge based
> system and it is fine (days of runtime no failures).
>
> Running all of this in GDB with 'record' enabled and setting a watchpoint
> on the address which contains the corrupted data and executing a
> 'reverse-continue' never hits the watchpoint [GDB newbie here -- assuming
> 'watch *(uint64_t*)0x7FB.....' should work]. My first thought was memory
> corruption but the BIOS memcheck on the ECC RAM shows no issues.
>
> Also, looking at mbuf->pkt.data as an example, the corrupt value was the
> same in 6 of 12 trials, but I could not find that value elsewhere in the
> process's memory. This doesn't seem "random" and points to a software bug,
> but I cannot for the life of me get GDB to tell me where the program is
> when that memory is written to. Incidentally, trying this with the PCAP
> driver and --no-huge to run valgrind shows no memory access
> errors/uninitialized values/etc.
>
> Thoughts? Pointers? Ways to rule in/out hardware other than going 1 by 1
> removing each of the 24 DIMMs?
>
> Thanks so much in advance!
> Stefan

Run memtest to rule out bad RAM.

Pb
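
For the software side of the question quoted above, one option is to
validate each packet's metadata as soon as it is received, so the
corruption is noticed close to the moment it happens rather than later
in the pipeline. This may help because a GDB watchpoint only traps
writes made by the CPU; if the bad write arrives by DMA from the NIC,
the watchpoint would not be expected to fire. What follows is only a
minimal sketch: struct pkt_meta, MAX_SEGS and meta_is_sane() are
hypothetical names invented for illustration, and the struct is a
stand-in for the fields named in the report, not the real struct
rte_mbuf layout.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the fields mentioned in the report
 * (data pointer, lengths, segment count). NOT the real rte_mbuf. */
struct pkt_meta {
    void    *buf_addr;  /* start of the buffer backing this packet */
    uint16_t buf_len;   /* size of that buffer in bytes */
    void    *data;      /* start of packet data; must lie in the buffer */
    uint16_t data_len;  /* bytes of data in this segment */
    uint32_t pkt_len;   /* total bytes across all segments */
    uint8_t  nb_segs;   /* number of segments in the chain */
};

/* Assumed upper bound on chain length; pick one that fits the application. */
#define MAX_SEGS 8

/* Return true if the metadata is internally consistent. */
bool meta_is_sane(const struct pkt_meta *m)
{
    uintptr_t start = (uintptr_t)m->buf_addr;
    uintptr_t data  = (uintptr_t)m->data;

    if (start == 0 || data == 0)
        return false;
    /* The data pointer must fall inside the backing buffer. */
    if (data < start || data - start > m->buf_len)
        return false;
    /* The segment's data must not run past the end of the buffer. */
    if (m->data_len > m->buf_len - (data - start))
        return false;
    /* The segment count must be plausible. */
    if (m->nb_segs == 0 || m->nb_segs > MAX_SEGS)
        return false;
    /* The total length cannot be smaller than this segment's length. */
    if (m->pkt_len < m->data_len)
        return false;
    return true;
}
```

Calling a check like this on every packet right after the receive
burst, and again before transmit or free, narrows the window in which
the overwrite occurred. If the metadata is already bad immediately
after receive, that points away from the application code that runs
later.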
