DPDK usage discussions
From: Sushil Adhikari <sushil446@gmail.com>
To: "Wiles, Keith" <keith.wiles@intel.com>, ferruh.yigit@intel.com
Cc: "users@dpdk.org" <users@dpdk.org>
Subject: Re: [dpdk-users] BUG: unable to handle kernel paging request
Date: Wed, 8 Mar 2017 08:23:34 -0600
Message-ID: <CAPO9LfS5v6jA9e0WNfbrEDQpDwkTT+2=HR-7jJModXz6Wzm00A@mail.gmail.com>
In-Reply-To: <CAPO9LfRAFCuqVfMmr5ek-Fh_uzfreOowz5KdJ84qDcjHMYTh3w@mail.gmail.com>

Another update on this issue: the problem seems to be caused by the 1024
2 MB hugepages not being contiguous in memory. I don't see the problem if I
use four 1 GB hugepages. Any idea why the 1024 hugepages would not be
contiguous? Any suggestions for a system or kernel configuration that might
resolve the problem?
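
To make "not contiguous" concrete, here is a minimal sketch that groups hugepage start addresses into physically contiguous runs. The addresses in `pages` are hypothetical and chosen only for illustration; on a real system they would have to be collected separately (for example from /proc/self/pagemap, which needs root). The helper name `contiguous_runs` is also made up for this example.

```python
def contiguous_runs(phys_addrs, page_size):
    """Group page start addresses into runs of back-to-back pages.

    Returns a list of (start_address, page_count) tuples, sorted by address.
    """
    runs = []
    for addr in sorted(phys_addrs):
        # A page extends the current run only if it starts exactly where
        # the run ends.
        if runs and addr == runs[-1][0] + runs[-1][1] * page_size:
            runs[-1][1] += 1
        else:
            runs.append([addr, 1])
    return [(start, count) for start, count in runs]


PAGE_2M = 2 * 1024 * 1024

# Hypothetical physical addresses of five 2 MB hugepages (illustration only).
pages = [0x40000000, 0x40200000, 0x40400000, 0x80000000, 0x80200000]

for start, count in contiguous_runs(pages, PAGE_2M):
    print(f"start=0x{start:x} pages={count} bytes={count * PAGE_2M}")
```

With many small pages, the allocation can end up split into several runs like this, whereas each 1 GB hugepage is by itself one contiguous 1 GB region.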

Thank you
Sushil

On Tue, Feb 28, 2017 at 10:49 AM, Sushil Adhikari <sushil446@gmail.com>
wrote:

> I tried to print a byte @ the data_kva address and it fails right there
> without printing anything.
>
> Thank you Keith for your help and support.
>
> @ferruh.yigit, can you please look into this problem and suggest
> what could be wrong?
>
> I will summarize my progress and findings so far.
>
> I was trying to run the DPDK KNI application with DPDK version 16.07.2.
>
> For that I first unbound the ports from ixgbe and bound them to the igb_uio
> module with the following commands:
>
> echo 0000:05:00.1 > /sys/bus/pci/drivers/ixgbe/unbind
> echo 0000:05:00.0 > /sys/bus/pci/drivers/ixgbe/unbind
> echo 0x8086 0x1528 > /sys/bus/pci/drivers/igb_uio/new_id
>
> I compiled the kni application for target machine with Linux version
> 4.4.20 (sushila@dev03) (gcc version 4.9.2 (crosstool-NG 1.20.0) ) #1 SMP
> Fri Feb 24 14:32:28 CST 2017
>
> and when I ran the application it hung with the following message
>
> Feb 28 10:09:37 (none) user.alert kernel: [   87.029554] BUG: unable to
> handle kernel paging request at 0000077e1d012900
> Feb 28 10:09:37 (none) user.alert kernel: [   87.029695] IP:
> [<ffffffffa0033722>] kni_net_rx_normal+0x2e2/0x440 [rte_kni]
> Feb 28 10:09:37 (none) user.warn kernel: [   87.029801] PGD 0
> Feb 28 10:09:37 (none) user.warn kernel: [   87.029889] Oops: 0000 [#1] SMP
> Feb 28 10:09:37 (none) user.warn kernel: [   87.030010] Modules linked in:
> rte_kni(O) igb_uio(O)
> Feb 28 10:09:37 (none) user.warn kernel: [   87.030167] CPU: 7 PID: 709
> Comm: kni_single Tainted: G          IO    4.4.20 #1
> Feb 28 10:09:37 (none) user.warn kernel: [   87.030242] Hardware name:
>              /DX58SO2, BIOS SOX5820J.86A.0603.2010.1117.1506 11/17/2010
> Feb 28 10:09:37 (none) user.warn kernel: [   87.030320] task:
> ffff8805a8ad8000 ti: ffff8805a7ae0000 task.ti: ffff8805a7ae0000
> Feb 28 10:09:37 (none) user.warn kernel: [   87.030395] RIP:
> 0010:[<ffffffffa0033722>]  [<ffffffffa0033722>]
> kni_net_rx_normal+0x2e2/0x440 [rte_kni]
> Feb 28 10:09:37 (none) user.warn kernel: [   87.030517] RSP:
> 0018:ffff8805a7ae3d30  EFLAGS: 00010286
> Feb 28 10:09:37 (none) user.warn kernel: [   87.030576] RAX:
> 0000077e1d012900 RBX: 0000000000000020 RCX: 0000000000000010
> Feb 28 10:09:37 (none) user.warn kernel: [   87.030639] RDX:
> 0000000000000001 RSI: 0000000000000246 RDI: ffffffffa00388a3
> Feb 28 10:09:37 (none) user.warn kernel: [   87.030701] RBP:
> ffff8805a7ae3e80 R08: 000000000000000a R09: 00000000fffffffe
> Feb 28 10:09:37 (none) user.warn kernel: [   87.030766] R10:
> 00000000ffff2fea R11: 0000000000000006 R12: ffff8805a8a75000
> Feb 28 10:09:37 (none) user.warn kernel: [   87.030829] R13:
> ffff8800b8c12800 R14: 0000000000000000 R15: ffff8805a8a75800
> Feb 28 10:09:37 (none) user.warn kernel: [   87.030893] FS:
>  0000000000000000(0000) GS:ffff88062fce0000(0000) knlGS:0000000000000000
> Feb 28 10:09:37 (none) user.warn kernel: [   87.030971] CS:  0010 DS: 0000
> ES: 0000 CR0: 000000008005003b
> Feb 28 10:09:37 (none) user.warn kernel: [   87.031031] CR2:
> 0000077e1d012900 CR3: 0000000001e0a000 CR4: 00000000000006e0
> Feb 28 10:09:37 (none) user.warn kernel: [   87.031094] Stack:
> Feb 28 10:09:37 (none) user.warn kernel: [   87.031148]  ffff88062fcf5940
> ffff8805a8ad8560 0000000000000000 ffff88060000054e
> Feb 28 10:09:37 (none) user.warn kernel: [   87.031367]  0000077e1d012900
> 00000000b8c12800 00000000b8c11ec0 00000000b8c11580
> Feb 28 10:09:37 (none) user.warn kernel: [   87.031587]  00000000b8c10c40
> 00000000b8c10300 00000000b8c0f9c0 00000000b8c0f080
> Feb 28 10:09:37 (none) user.warn kernel: [   87.031811] Call Trace:
> Feb 28 10:09:37 (none) user.warn kernel: [   87.031871]
>  [<ffffffffa00343af>] kni_net_rx+0xf/0x20 [rte_kni]
> Feb 28 10:09:37 (none) user.warn kernel: [   87.031937]
>  [<ffffffffa0032f05>] kni_thread_single+0x45/0xb0 [rte_kni]
> Feb 28 10:09:37 (none) user.warn kernel: [   87.032004]
>  [<ffffffffa0032ec0>] ? kni_init_net+0x50/0x50 [rte_kni]
> Feb 28 10:09:37 (none) user.warn kernel: [   87.032067]
>  [<ffffffff8107b7cb>] kthread+0xdb/0x100
> Feb 28 10:09:37 (none) user.warn kernel: [   87.032125]
>  [<ffffffff8107b6f0>] ? kthread_park+0x60/0x60
> Feb 28 10:09:37 (none) user.warn kernel: [   87.032186]
>  [<ffffffff81834c2f>] ret_from_fork+0x3f/0x70
> Feb 28 10:09:37 (none) user.warn kernel: [   87.032246]
>  [<ffffffff8107b6f0>] ? kthread_park+0x60/0x60
> Feb 28 10:09:37 (none) user.warn kernel: [   87.032306] Code: 48 89 85 d0
> fe ff ff eb 80 41 f6 c6 0f 75 0e 48 c7 c7 9f 88 03 a0 31 c0 e8 02 e9 11 e1
> 48 8b 85 d0 fe ff ff 48 c7 c7 a3 88 03 a0 <42> 0f b6 34 30 31 c0 49 83 c6
> 01 e8 e4 e8 11 e1 e9 5e fe ff ff
> Feb 28 10:09:37 (none) user.alert kernel: [   87.034742] RIP
>  [<ffffffffa0033722>] kni_net_rx_normal+0x2e2/0x440 [rte_kni]
> Feb 28 10:09:37 (none) user.warn kernel: [   87.034844]  RSP
> <ffff8805a7ae3d30>
> Feb 28 10:09:37 (none) user.warn kernel: [   87.034900] CR2:
> 0000077e1d012900
> Feb 28 10:09:37 (none) user.warn kernel: [   87.034956] ---[ end trace
> 5b31765eb0372d51 ]---
>
> From that I saw it was failing somewhere in the kni_net_rx_normal() function in
> kni_net.c.
>
> I narrowed down the failing line of code to
> line 169, where the memcpy happens.
> Next I printed some addresses in that function, which gave me
> kva data addresses: data_kva 0000077e1d012900 kva->buff_add
> 00007f7e1d012880 kva->data_off 128 kni->mbuf_va  (null) and kni->mbuf_kva
> ffff880000000000
> Next I tried to print the data at the data_kva address, and it
> failed there, so the failure happens when I try to access data_kva at
> 0000077e1d012900. I guess the address is wrong, but I don't know why. Can you give
> me some ideas on this, or some things to try to debug the problem?
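
The printed values can be checked with a little arithmetic. The sketch below assumes that data_kva is derived with a single-offset translation, buf_addr + data_off - mbuf_va + mbuf_kva, truncated to 64 bits the way C pointer arithmetic would be. That formula is an assumption made for illustration, not something confirmed in this thread, and the function name `translate` is hypothetical. Under that assumption the inputs reproduce both faulting addresses reported in this thread.

```python
MASK64 = (1 << 64) - 1


def translate(buf_addr, data_off, mbuf_va, mbuf_kva):
    """Assumed user-VA to kernel-VA translation, wrapped to 64 bits."""
    return (buf_addr + data_off - mbuf_va + mbuf_kva) & MASK64


# Values from the printk output quoted above; mbuf_va was printed as (null).
data_kva = translate(0x00007f7e1d012880, 128, 0x0, 0xffff880000000000)
print(f"{data_kva:016x}")

# Values from the earlier run quoted further down the thread.
earlier = translate(0x00007f529d212880, 128, 0x0, 0xffff880000000000)
print(f"{earlier:016x}")
```

If the assumption holds, the sum overflows 64 bits and wraps to a low address, which would explain why the kernel faults on it. A null mbuf_va means the user-space address is never rebased, so the result cannot be a valid kernel address.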
>
>
> Thank you
>
> Sushil
>
> On Tue, Feb 28, 2017 at 10:00 AM, Wiles, Keith <keith.wiles@intel.com>
> wrote:
>
>>
>> > On Feb 28, 2017, at 9:42 AM, Sushil Adhikari <sushil446@gmail.com>
>> wrote:
>> >
>> > Since printf is not working here I'm using printk. Do you mean whether
>> skb_put(skb, len) fails or the memcpy fails (line 169)? I separated
>> line 169 into two statements, one for skb_put and another for just the memcpy,
>> and it doesn't fail on skb_put, so it is the memory copy that causes the
>> failure. Since the value of data_kva matches the address in "BUG: unable to
>> handle kernel paging request at 000007529d212900", I thought the data_kva
>> address is not correct or something.
>>
>> If you try printing a byte or word at the data_kva address, does the
>> printf fail?
>>
>> If not, try looping on the address, reading every 128 bytes, and see how far
>> you get. If it fails at the first address then I am guessing the mbuf_kva address
>> is bad. Then we need to look in the MAINTAINERS file and email the
>> maintainer directly to see if he knows what is happening.
>>
>> >
>> > On Tue, Feb 28, 2017 at 9:34 AM, Wiles, Keith <keith.wiles@intel.com>
>> wrote:
>> >
>> > > On Feb 28, 2017, at 9:30 AM, Sushil Adhikari <sushil446@gmail.com>
>> wrote:
>> > >
>> > > its failing at data_kva address because this is where I'm getting the
>> kernel paging request fail
>> > > BUG: unable to handle kernel paging request at 000007529d212900
>> > > and this is what my debug shows
>> > > kva data addresses: data_kva 000007529d212900, kva->buff_add
>> 00007f529d212880, kva->data_off 128, kni->mbuf_va  (null), and
>> kni->mbuf_kva ffff880000000000
>> >
>> > I was thinking of using GDB to dump memory or use printf to see which
>> one is failing.
>> >
>> > >
>> > > I'm not sure how to verify that these are normal
>> > >
>> > > On Mon, Feb 27, 2017 at 4:41 PM, Wiles, Keith <keith.wiles@intel.com>
>> wrote:
>> > >
>> > > > On Feb 27, 2017, at 4:22 PM, Sushil Adhikari <sushil446@gmail.com>
>> wrote:
>> > > >
>> > > > I narrowed it down to the location where it was failing. It is coming from
>> http://dpdk.org/browse/dpdk-stable/tree/lib/librte_eal/linuxapp/kni/kni_net.c?h=v16.07.2
>> line 169. I am getting a len value of 1358 from len = kva->pkt_len, which
>> seems right for an IP packet, and the memory allocation on line 157 also seems
>> to be working fine. When I print sizeof(*skb) or sizeof(struct sk_buff) it
>> gives me 208. I don't know whether it should be the size we allocate on
>> line 157, which is len + 2 = 1360, or whether it is a fixed-size structure of
>> 208 bytes. I would appreciate any insight.
>> > > > Linux version 4.4.20 (sushila@dev03) (gcc version 4.9.2
>> (crosstool-NG 1.20.0) ) #1 SMP Fri Feb 24 14:32:28 CST 2017
>> > >
>> > > Looks like we need to determine which address is failing: the
>> skb_put() or the data_kva address. If the address that fails is at the end of
>> the skb_put() then I would think the len is wrong, meaning we are stepping
>> on memory just past a page for the skb. If the address that fails is in
>> the data_kva then the calculations for that address are wrong in line 154.
>> You may want to print out the kva->data_off, buf_addr, mbuf_va and mbuf_kva
>> to verify these values seem normal. The data_off value should be reasonable
>> (I guess), meaning within a 2K range.
>> > >
>> > > Also print out the two values skb_put() and data_kva. You can use gdb
>> to examine the memory using the dump memory command. (is it x/nn <address>)
>> nn is the read width, but you could leave off the ‘/nn’ for the default.
>> > >
>> > > >
>> > > > Thank you
>> > > > Sushil
>> > > >
>> > > > On Sat, Feb 25, 2017 at 10:31 AM, Wiles, Keith <
>> keith.wiles@intel.com> wrote:
>> > > >
>> > > > > On Feb 24, 2017, at 8:07 AM, Sushil Adhikari <sushil446@gmail.com>
>> wrote:
>> > > > >
>> > > > > Resending because of unsupported email content type
>> > > > >
>> > > > >
>> > > > > Yes, hanging is the better word I guess; ctrl + c is not working
>> to actually stop the program. I also had a display connected to the target
>> machine, and I have attached a picture that shows the messages on that
>> display. That is where I saw "BUG:Unable to handle kernel paging request at
>> xxxxxx", which made me think that the program is in a bad state.
>> > > >
>> > > > Sorry, I do not see why you are getting this message. All I can
>> suggest is to use GDB and see if you can determine why the message is
>> happening.
>> > > >
>> > > > >
>> > > > > info thread in gdb shows one thread running
>> > > > > Id   Target Id         Frame
>> > > > > * 1    LWP 843 "dpdkKni" 0x000000000044eaee in rte_kni_tx_burst ()
>> > > > >
>> > > > > On Thu, Feb 23, 2017 at 5:41 PM, Wiles, Keith <
>> keith.wiles@intel.com> wrote:
>> > > > >
>> > > > > > On Feb 23, 2017, at 2:38 PM, Sushil Adhikari <
>> sushil446@gmail.com> wrote:
>> > > > > >
>> > > > > > While trying to run dpdk Kni application I ran in to a problem,
>> with
>> > > > > > following error message
>> > > > > > BUG: unable to handle kernel paging request at 000007ffe2b92780
>> > > > > >
>> > > > > > To run the application I first unbound the ports from the kernel
>> module and
>> > > > > > bound them to igb_uio
>> > > > > >> echo 0000:05:00.1 > /sys/bus/pci/drivers/ixgbe/unbind
>> > > > > >> echo 0000:05:00.0 > /sys/bus/pci/drivers/ixgbe/unbind
>> > > > > >> echo 0x8086 0x1528 > /sys/bus/pci/drivers/igb_uio/new_id
>> > > > > >
>> > > > > > I ran the application using gdb as
>> > > > > >
>> > > > > > [~]$ /root/gdb dpdkKni
>> > > > > > GNU gdb (crosstool-NG 1.20.0) 7.8
>> > > > > > Copyright (C) 2014 Free Software Foundation, Inc.
>> > > > > > License GPLv3+: GNU GPL version 3 or later <
>> http://gnu.org/licenses/gpl.html
>> > > > > >>
>> > > > > > This is free software: you are free to change and redistribute
>> it.
>> > > > > > There is NO WARRANTY, to the extent permitted by law.  Type
>> "show copying"
>> > > > > > and "show warranty" for details.
>> > > > > > This GDB was configured as "x86_64-unknown-linux-gnu".
>> > > > > > Type "show configuration" for configuration details.
>> > > > > > For bug reporting instructions, please see:
>> > > > > > <http://www.gnu.org/software/gdb/bugs/>.
>> > > > > > Find the GDB manual and other documentation resources online at:
>> > > > > > <http://www.gnu.org/software/gdb/documentation/>.
>> > > > > > For help, type "help".
>> > > > > > Type "apropos word" to search for commands related to "word"...
>> > > > > > Reading symbols from dpdkKni...(no debugging symbols
>> found)...done.
>> > > > > > (gdb) Run dpdkKni -c 0x0f -n 4 -- -P -p 0x3
>> --config="(0,0,1),(1,2,3)"
>> > > > > > Starting program: /root/dpdkKni dpdkKni -c 0x0f -n 4 -- -P -p
>> 0x3
>> > > > > > --config="(0,0,1),(1,2,3)"
>> > > > > > warning: Could not load shared library symbols for
>> linux-vdso.so.1.
>> > > > > > Do you need "set solib-search-path" or "set sysroot"?
>> > > > > > warning: Unable to find libthread_db matching inferior's thread
>> library,
>> > > > > > thread debugging will not be available.
>> > > > > > EAL: Detected 4 lcore(s)
>> > > > > > EAL: Probing VFIO support...
>> > > > > > EAL: PCI device 0000:05:00.0 on NUMA socket -1
>> > > > > > EAL:   probe driver: 8086:1528 net_ixgbe
>> > > > > > EAL: PCI device 0000:05:00.1 on NUMA socket -1
>> > > > > > EAL:   probe driver: 8086:1528 net_ixgbe
>> > > > > > Address of pktmbuf_pool 0x7ffff5a7dec0
>> > > > > > APP: Initialising port 0 ...
>> > > > > > KNI: pci: 05:00:00       8086:1528
>> > > > > > kni created for port 0 with kni[i] address 0x7fff75638280 with
>> i 0
>> > > > > > APP: Initialising port 1 ...
>> > > > > > KNI: pci: 05:00:01       8086:1528
>> > > > > > kni created for port 1 with kni[i] address 0x7fff75629e00 with
>> i 0
>> > > > > > APP: Lcore 1 is writing to port 0
>> > > > > > APP: Lcore 2 is reading from port 1
>> > > > > > APP: Lcore 3 is writing to port 1
>> > > > > > APP: Lcore 0 is reading from port 0
>> > > > > > ^C
>> > > > > > Program received signal SIGINT, Interrupt.
>> > > > >
>> > > > > The program did not crash or get a segfault, but you hit
>> control-c which stopped the application. When you ran the application you
>> started 4 threads and this is why it would appear in different places when
>> stopped.
>> > > > >
>> > > > > If the application is hanging then you can use control-C and then
>> do ‘info threads’ command to see the location of all threads. You can use
>> the ‘thread X’ command to switch between threads. Please check the command
>> usage here I am going from memory.
>> > > > >
>> > > > > I am not sure if the application has a -i option to get a command
>> line; if so, that may be useful to enable. Check the application to see if it
>> uses the cmdline feature.
>> > > > >
>> > > > > It may be that the application just sits running and you have to use
>> other tools or apps to send traffic to the KNI application. Sorry, I have
>> not really used the KNI example.
>> > > > >
>> > > > > > 0x000000000044e916 in rte_kni_tx_burst ()
>> > > > > > (gdb) backtrace
>> > > > > > #0  0x000000000044e916 in rte_kni_tx_burst ()
>> > > > > > #1  0x0000000000619758 in main_loop(void*) ()
>> > > > > > #2  0x0000000000431183 in rte_eal_mp_remote_launch ()
>> > > > > > #3  0x000000000040d312 in main ()
>> > > > > >
>> > > > > > (this is where the program crashes)
>> > > > > >
>> > > > > > I tried to trace the crash with gdb(I am new to gdb)
>> > > > > >
>> > > > > > and when I do the backtrace it ends up in different functions
>> each time:
>> > > > > > this time it gave me rte_kni_tx_burst()
>> > > > > >
>> > > > > > I'm running latest dpdk version 17.02 and linux kernel is
>> > > > > > Linux version 4.4.20 (tcuser@cibuild08) (gcc version 4.9.2
>> (crosstool-NG
>> > > > > > 1.20.0) ) #1 SMP
>> > > > > >
>> > > > > > I would appreciate any suggestion or insight regarding this
>> issue.
>> > > > >
>> > > > > Regards,
>> > > > > Keith
>> > > > >
>> > > > >
>> > > > > <kni.jpg>
>> > > >
>> > > > Regards,
>> > > > Keith
>> > > >
>> > > >
>> > >
>> > > Regards,
>> > > Keith
>> > >
>> > >
>> >
>> > Regards,
>> > Keith
>> >
>> >
>>
>> Regards,
>> Keith
>>
>>
>

Thread overview: 13+ messages
2017-02-23 22:38 Sushil Adhikari
2017-02-23 23:41 ` Wiles, Keith
2017-02-24 15:35   ` Sushil Adhikari
2017-02-24 16:07   ` Sushil Adhikari
2017-02-25 16:31     ` Wiles, Keith
2017-02-27 22:22       ` Sushil Adhikari
2017-02-27 22:41         ` Wiles, Keith
2017-02-28 15:30           ` Sushil Adhikari
2017-02-28 15:34             ` Wiles, Keith
2017-02-28 15:42               ` Sushil Adhikari
2017-02-28 16:00                 ` Wiles, Keith
2017-02-28 16:49                   ` Sushil Adhikari
2017-03-08 14:23                     ` Sushil Adhikari [this message]