From: Sushil Adhikari
Date: Tue, 28 Feb 2017 10:49:42 -0600
To: "Wiles, Keith", ferruh.yigit@intel.com
Cc: "users@dpdk.org"
Subject: Re: [dpdk-users] BUG: unable to handle kernel paging request
In-Reply-To: <860B4104-E1E7-4C9D-811A-7F2D5CF399E0@intel.com>
References: <6F62EB32-671C-458D-B24B-268491C2DA1E@intel.com> <860B4104-E1E7-4C9D-811A-7F2D5CF399E0@intel.com>
List-Id: DPDK usage discussions

I tried to print a byte at the data_kva address, and it fails right there
without printing anything. Thank you, Keith, for your help and support.

@ferruh.yigit, can you please look into this problem and suggest what could
be wrong? I will summarize my progress and findings so far.

I was trying to run the DPDK KNI application with DPDK version 16.07.2. For
that I first unbound the ports from ixgbe and bound them to the igb_uio
module with the following commands:

echo 0000:05:00.1 > /sys/bus/pci/drivers/ixgbe/unbind
echo 0000:05:00.0 > /sys/bus/pci/drivers/ixgbe/unbind
echo 0x8086 0x1528 > /sys/bus/pci/drivers/igb_uio/new_id

I compiled the KNI application for the target machine, which runs

Linux version 4.4.20 (sushila@dev03) (gcc version 4.9.2 (crosstool-NG 1.20.0) ) #1 SMP Fri Feb 24 14:32:28 CST 2017

and when I ran the application it hung with the following messages:

Feb 28 10:09:37 (none) user.alert kernel: [ 87.029554] BUG: unable to handle kernel paging request at 0000077e1d012900
Feb 28 10:09:37 (none) user.alert kernel: [ 87.029695] IP: [] kni_net_rx_normal+0x2e2/0x440 [rte_kni]
Feb 28 10:09:37 (none) user.warn kernel: [ 87.029801] PGD 0
Feb 28 10:09:37 (none) user.warn kernel: [ 87.029889] Oops: 0000 [#1] SMP
Feb 28 10:09:37 (none) user.warn kernel: [ 87.030010] Modules linked in: rte_kni(O) igb_uio(O)
Feb 28 10:09:37 (none) user.warn kernel: [ 87.030167] CPU: 7 PID: 709 Comm: kni_single Tainted: G IO 4.4.20 #1
Feb 28 10:09:37 (none) user.warn kernel: [ 87.030242] Hardware name: /DX58SO2, BIOS SOX5820J.86A.0603.2010.1117.1506 11/17/2010
Feb 28 10:09:37 (none) user.warn kernel: [ 87.030320] task: ffff8805a8ad8000 ti: ffff8805a7ae0000 task.ti: ffff8805a7ae0000
Feb 28 10:09:37 (none) user.warn kernel: [ 87.030395] RIP: 0010:[] [] kni_net_rx_normal+0x2e2/0x440 [rte_kni]
Feb 28 10:09:37 (none) user.warn kernel: [ 87.030517] RSP: 0018:ffff8805a7ae3d30 EFLAGS: 00010286
Feb 28 10:09:37 (none) user.warn kernel: [ 87.030576] RAX: 0000077e1d012900 RBX: 0000000000000020 RCX: 0000000000000010
Feb 28 10:09:37 (none) user.warn kernel: [ 87.030639] RDX: 0000000000000001 RSI: 0000000000000246 RDI: ffffffffa00388a3
Feb 28 10:09:37 (none) user.warn kernel: [ 87.030701] RBP: ffff8805a7ae3e80 R08: 000000000000000a R09: 00000000fffffffe
Feb 28 10:09:37 (none) user.warn kernel: [ 87.030766] R10: 00000000ffff2fea R11: 0000000000000006 R12: ffff8805a8a75000
Feb 28 10:09:37 (none) user.warn kernel: [ 87.030829] R13: ffff8800b8c12800 R14: 0000000000000000 R15: ffff8805a8a75800
Feb 28 10:09:37 (none) user.warn kernel: [ 87.030893] FS: 0000000000000000(0000) GS:ffff88062fce0000(0000) knlGS:0000000000000000
Feb 28 10:09:37 (none) user.warn kernel: [ 87.030971] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
Feb 28 10:09:37 (none) user.warn kernel: [ 87.031031] CR2: 0000077e1d012900 CR3: 0000000001e0a000 CR4: 00000000000006e0
Feb 28 10:09:37 (none) user.warn kernel: [ 87.031094] Stack:
Feb 28 10:09:37 (none) user.warn kernel: [ 87.031148] ffff88062fcf5940 ffff8805a8ad8560 0000000000000000 ffff88060000054e
Feb 28 10:09:37 (none) user.warn kernel: [ 87.031367] 0000077e1d012900 00000000b8c12800 00000000b8c11ec0 00000000b8c11580
Feb 28 10:09:37 (none) user.warn kernel: [ 87.031587] 00000000b8c10c40 00000000b8c10300 00000000b8c0f9c0 00000000b8c0f080
Feb 28 10:09:37 (none) user.warn kernel: [ 87.031811] Call Trace:
Feb 28 10:09:37 (none) user.warn kernel: [ 87.031871] [] kni_net_rx+0xf/0x20 [rte_kni]
Feb 28 10:09:37 (none) user.warn kernel: [ 87.031937] [] kni_thread_single+0x45/0xb0 [rte_kni]
Feb 28 10:09:37 (none) user.warn kernel: [ 87.032004] [] ? kni_init_net+0x50/0x50 [rte_kni]
Feb 28 10:09:37 (none) user.warn kernel: [ 87.032067] [] kthread+0xdb/0x100
Feb 28 10:09:37 (none) user.warn kernel: [ 87.032125] [] ? kthread_park+0x60/0x60
Feb 28 10:09:37 (none) user.warn kernel: [ 87.032186] [] ret_from_fork+0x3f/0x70
Feb 28 10:09:37 (none) user.warn kernel: [ 87.032246] [] ? kthread_park+0x60/0x60
Feb 28 10:09:37 (none) user.warn kernel: [ 87.032306] Code: 48 89 85 d0 fe ff ff eb 80 41 f6 c6 0f 75 0e 48 c7 c7 9f 88 03 a0 31 c0 e8 02 e9 11 e1 48 8b 85 d0 fe ff ff 48 c7 c7 a3 88 03 a0 <42> 0f b6 34 30 31 c0 49 83 c6 01 e8 e4 e8 11 e1 e9 5e fe ff ff
Feb 28 10:09:37 (none) user.alert kernel: [ 87.034742] RIP [] kni_net_rx_normal+0x2e2/0x440 [rte_kni]
Feb 28 10:09:37 (none) user.warn kernel: [ 87.034844] RSP
Feb 28 10:09:37 (none) user.warn kernel: [ 87.034900] CR2: 0000077e1d012900
Feb 28 10:09:37 (none) user.warn kernel: [ 87.034956] ---[ end trace 5b31765eb0372d51 ]---

From this I saw that it was failing somewhere in the kni_net_rx_normal()
function in kni_net.c. I narrowed the failure down to line 169, where the
memcpy happens.

Next I printed some addresses in that function, which gave me:

kva data addresses: data_kva 0000077e1d012900 kva->buff_add 00007f7e1d012880 kva->data_off 128 kni->mbuf_va (null) and kni->mbuf_kva ffff880000000000

Next I tried to print the data at the data_kva address, and it failed there.
So it looks like it fails when I try to access data_kva at 0000077e1d012900.
I guess the address is wrong, but I don't know why. Can you give me some
idea about this, or some things to try in order to debug the problem?

Thank you
Sushil

On Tue, Feb 28, 2017 at 10:00 AM, Wiles, Keith wrote:

>
> > On Feb 28, 2017, at 9:42 AM, Sushil Adhikari wrote:
> >
> > Since printf is not working here I'm using printk, do you mean whether
> > skb_put(skb, len) fails or the memcpy fails( line 169)? I had separated
> > the line 169 in to two one for skb_put and another just memcpy, and it
> > doesn't fail on skb_put so its memory copy that what causing the fail.
> > Since the memory location of data_skv and the location in "BUG: unable
> > to handle kernel paging request at 000007529d212900" matches I thought
> > the data_skv address is not correct or something.
>
> If you try printing the a byte or word at the data_kva address does the
> printf fail?
>
> If not try looping on the address read every 128 bytes and see how far
> you get. If it is the first address then I am guessing the mbuf_kva
> address is bad. Then we need to look in the MAINTAINERS file and email
> the maintainer directly to see if he knows what is happening.
>
> >
> > On Tue, Feb 28, 2017 at 9:34 AM, Wiles, Keith wrote:
> >
> > > On Feb 28, 2017, at 9:30 AM, Sushil Adhikari wrote:
> > >
> > > its failing at data_kva address because this is where I'm getting the
> > > kernel paging request fail
> > > BUG: unable to handle kernel paging request at 000007529d212900
> > > and this is what my debug shows
> > > kva data addresses: data_kva 000007529d212900, kva->buff_add
> > > 00007f529d212880, kva->data_off 128, kni->mbuf_va (null), and
> > > kni->mbuf_kva ffff880000000000
> >
> > I was thinking of using GDB to dump memory or use printf to see which
> > one is failing.
> >
> > >
> > > I'm not sure how to verify that these are normal
> > >
> > > On Mon, Feb 27, 2017 at 4:41 PM, Wiles, Keith wrote:
> > >
> > > > On Feb 27, 2017, at 4:22 PM, Sushil Adhikari wrote:
> > > >
> > > > I narrowed it to location where it was failing, its coming from
> > > > http://dpdk.org/browse/dpdk-stable/tree/lib/librte_eal/linuxapp/kni/kni_net.c?h=v16.07.2
> > > > line 169, I am getting the value of len to be 1358 from
> > > > len=kva->pkt_len; which seems right for ip packet and the memory
> > > > allocation from line 157 also seems to be working fine. when I print
> > > > the sizeof(*skb) or sizeof(struct sk_buff) its giving me 208, I guess
> > > > I dont know whether it should be the size we allocate from line 157,
> > > > which is len + 2 = 1360 or its fixed size structure of 208 byte. I
> > > > would appreciate any insight.
> > > > Linux version 4.4.20 (sushila@dev03) (gcc version 4.9.2 (crosstool-NG 1.20.0) ) #1 SMP Fri Feb 24 14:32:28 CST 2017
> > >
> > > Looks like we need to determine which address is failing the skb_put()
> > > or data_kva address. If the address that fails is at the end of the
> > > skb_put() then I would think the len is wrong, meaning we are stepping
> > > on memory just passed a page for the skb. If the address that fails is
> > > in the data_kva then the calculations for that address are wrong in
> > > line 154. You may want to printout the kva->data_off, buf_addr, mbuf_va
> > > and mbuf_kva to verify these values seem normal. The data_off value
> > > should be reasonable (I guess) meaning with a 2K range.
> > >
> > > Also print out the two values skb_put() and data_kva. You can use gdb
> > > to example the memory using the dump memory command. (is it x/nn )
> > > nn is the read width, but you could leave off the ‘/nn’ for the
> > > default.
> > >
> > > >
> > > > Thank you
> > > > Sushil
> > > >
> > > > On Sat, Feb 25, 2017 at 10:31 AM, Wiles, Keith <keith.wiles@intel.com> wrote:
> > > >
> > > > > On Feb 24, 2017, at 8:07 AM, Sushil Adhikari wrote:
> > > > >
> > > > > Resending because of unsupported email content type
> > > > >
> > > > >
> > > > > yes hanging is the better word I guess, ctrl + c is not working to
> > > > > actually stop the program. I also had display connected to the
> > > > > target manchine and I have attached a picture that shows the
> > > > > messages in that display that is where I saw "BUG:Unable to handle
> > > > > kernel paging request at xxxxxx", which made me think that the
> > > > > program is in bad state.
> > > >
> > > > Sorry, I do not see why you are getting this message. All I can
> > > > suggest is to use GDB and see if you can determine why the message
> > > > is happening.
> > > >
> > > > >
> > > > > info thread in gdb shows one thread running
> > > > > Id Target Id Frame
> > > > > * 1 LWP 843 "dpdkKni" 0x000000000044eaee in rte_kni_tx_burst ()
> > > > >
> > > > > On Thu, Feb 23, 2017 at 5:41 PM, Wiles, Keith <keith.wiles@intel.com> wrote:
> > > > >
> > > > > > On Feb 23, 2017, at 2:38 PM, Sushil Adhikari <sushil446@gmail.com> wrote:
> > > > > >
> > > > > > While trying to run dpdk Kni application I ran in to a problem, with
> > > > > > following error message
> > > > > > BUG: unable to handle kernel paging request at 000007ffe2b92780
> > > > > >
> > > > > > To run the application I first unbinded the ports from kernel module and
> > > > > > binded them to igb_uio
> > > > > >> echo 0000:05:00.1 > /sys/bus/pci/drivers/ixgbe/unbind
> > > > > >> echo 0000:05:00.0 > /sys/bus/pci/drivers/ixgbe/unbind
> > > > > >> echo 0x8086 0x1528 > /sys/bus/pci/drivers/igb_uio/new_id
> > > > > >
> > > > > > I ran the application using gdb as
> > > > > >
> > > > > > [~]$ /root/gdb dpdkKni
> > > > > > GNU gdb (crosstool-NG 1.20.0) 7.8
> > > > > > Copyright (C) 2014 Free Software Foundation, Inc.
> > > > > > License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
> > > > > > This is free software: you are free to change and redistribute it.
> > > > > > There is NO WARRANTY, to the extent permitted by law. Type "show copying"
> > > > > > and "show warranty" for details.
> > > > > > This GDB was configured as "x86_64-unknown-linux-gnu".
> > > > > > Type "show configuration" for configuration details.
> > > > > > For bug reporting instructions, please see:
> > > > > > .
> > > > > > Find the GDB manual and other documentation resources online at:
> > > > > > .
> > > > > > For help, type "help".
> > > > > > Type "apropos word" to search for commands related to "word"...
> > > > > > Reading symbols from dpdkKni...(no debugging symbols found)...done.
> > > > > > (gdb) Run dpdkKni -c 0x0f -n 4 -- -P -p 0x3 --config="(0,0,1),(1,2,3)"
> > > > > > Starting program: /root/dpdkKni dpdkKni -c 0x0f -n 4 -- -P -p 0x3
> > > > > > --config="(0,0,1),(1,2,3)"
> > > > > > warning: Could not load shared library symbols for linux-vdso.so.1.
> > > > > > Do you need "set solib-search-path" or "set sysroot"?
> > > > > > warning: Unable to find libthread_db matching inferior's thread library,
> > > > > > thread debugging will not be available.
> > > > > > EAL: Detected 4 lcore(s)
> > > > > > EAL: Probing VFIO support...
> > > > > > EAL: PCI device 0000:05:00.0 on NUMA socket -1
> > > > > > EAL: probe driver: 8086:1528 net_ixgbe
> > > > > > EAL: PCI device 0000:05:00.1 on NUMA socket -1
> > > > > > EAL: probe driver: 8086:1528 net_ixgbe
> > > > > > Address of pktmbuf_pool 0x7ffff5a7dec0
> > > > > > APP: Initialising port 0 ...
> > > > > > KNI: pci: 05:00:00 8086:1528
> > > > > > kni created for port 0 with kni[i] address 0x7fff75638280 with i 0
> > > > > > APP: Initialising port 1 ...
> > > > > > KNI: pci: 05:00:01 8086:1528
> > > > > > kni created for port 1 with kni[i] address 0x7fff75629e00 with i 0
> > > > > > APP: Lcore 1 is writing to port 0
> > > > > > APP: Lcore 2 is reading from port 1
> > > > > > APP: Lcore 3 is writing to port 1
> > > > > > APP: Lcore 0 is reading from port 0
> > > > > > ^C
> > > > > > Program received signal SIGINT, Interrupt.
> > > > >
> > > > > The program did not crash or get a segfault, but you hit control-c
> > > > > which stopped the application. When you ran the application you
> > > > > started 4 threads and this is why it would appear in different
> > > > > places when stopped.
> > > > >
> > > > > If the application is hanging then you can use control-C and then
> > > > > do ‘info threads’ command to see the location of all threads. You
> > > > > can use the ‘thread X’ command to switch between threads. Please
> > > > > check the command usage here I am going from memory.
> > > > >
> > > > > I am not sure if the application has a -i option to get a command
> > > > > line if so that maybe useful to enable, check the application to
> > > > > see if it used cmdline feature.
> > > > >
> > > > > It maybe the application just sits running and you have to use
> > > > > other tools or apps to send traffic on the KNI application, sorry
> > > > > I have not really used the KNI example.
> > > > >
> > > > > > 0x000000000044e916 in rte_kni_tx_burst ()
> > > > > > (gdb) backtrace
> > > > > > #0 0x000000000044e916 in rte_kni_tx_burst ()
> > > > > > #1 0x0000000000619758 in main_loop(void*) ()
> > > > > > #2 0x0000000000431183 in rte_eal_mp_remote_launch ()
> > > > > > #3 0x000000000040d312 in main ()
> > > > > >
> > > > > > (this is where the program crashes)
> > > > > >
> > > > > > I tried to trace the crash with gdb(I am new to gdb)
> > > > > >
> > > > > > and when I do the backtrace it ends up in different functions each time:
> > > > > > this time it gave me rte_kni_tx_burst()
> > > > > >
> > > > > > I'm running latest dpdk version 17.02 and linux kernel is
> > > > > > Linux version 4.4.20 (tcuser@cibuild08) (gcc version 4.9.2 (crosstool-NG
> > > > > > 1.20.0) ) #1 SMP
> > > > > >
> > > > > > I would appreciate any suggestion or insight regarding this issue.
> > > > >
> > > > > Regards,
> > > > > Keith
> > > > >
> > > >
> > > > Regards,
> > > > Keith
> > > >
> > >
> > > Regards,
> > > Keith
> > >
> >
> > Regards,
> > Keith
> >
>
> Regards,
> Keith
>