From: Sushil Adhikari
Date: Wed, 8 Mar 2017 08:23:34 -0600
To: "Wiles, Keith", ferruh.yigit@intel.com
Cc: "users@dpdk.org"
Subject: Re: [dpdk-users] BUG: unable to handle kernel paging request
References: <6F62EB32-671C-458D-B24B-268491C2DA1E@intel.com> <860B4104-E1E7-4C9D-811A-7F2D5CF399E0@intel.com>
List-Id: DPDK usage discussions

Another update on this issue: the problem seems to be caused by the 1024
hugepages of 2MB not being contiguous memory. I don't see the problem if I
use four 1GB hugepages. Any idea why the 1024 hugepages would not be
contiguous? Any suggestions on system configuration or kernel configuration
that might resolve the problem?

Thank you
Sushil

On Tue, Feb 28, 2017 at 10:49 AM, Sushil Adhikari wrote:

> I tried to print a byte at the data_kva address and it fails right there
> without printing anything.
>
> Thank you Keith for your help and support.
>
> @ferruh.yigit can you please look into this problem and suggest some
> ideas on what could be wrong.
>
> I will summarize my progress and findings so far.
>
> I was trying to run the DPDK KNI application with DPDK version 16.07.2.
>
> For that I first unbound the ports from ixgbe and bound them to the
> igb_uio module with the following commands:
>
> echo 0000:05:00.1 > /sys/bus/pci/drivers/ixgbe/unbind
> echo 0000:05:00.0 > /sys/bus/pci/drivers/ixgbe/unbind
> echo 0x8086 0x1528 > /sys/bus/pci/drivers/igb_uio/new_id
>
> I compiled the KNI application for the target machine with Linux version
> 4.4.20 (sushila@dev03) (gcc version 4.9.2 (crosstool-NG 1.20.0) ) #1 SMP
> Fri Feb 24 14:32:28 CST 2017
>
> and when I ran the application it hung with the following message:
>
> Feb 28 10:09:37 (none) user.alert kernel: [ 87.029554] BUG: unable to handle kernel paging request at 0000077e1d012900
> Feb 28 10:09:37 (none) user.alert kernel: [ 87.029695] IP: [] kni_net_rx_normal+0x2e2/0x440 [rte_kni]
> Feb 28 10:09:37 (none) user.warn kernel: [ 87.029801] PGD 0
> Feb 28 10:09:37 (none) user.warn kernel: [ 87.029889] Oops: 0000 [#1] SMP
> Feb 28 10:09:37 (none) user.warn kernel: [ 87.030010] Modules linked in: rte_kni(O) igb_uio(O)
> Feb 28 10:09:37 (none) user.warn kernel: [ 87.030167] CPU: 7 PID: 709 Comm: kni_single Tainted: G IO 4.4.20 #1
> Feb 28 10:09:37 (none) user.warn kernel: [ 87.030242] Hardware name: /DX58SO2, BIOS SOX5820J.86A.0603.2010.1117.1506 11/17/2010
> Feb 28 10:09:37 (none) user.warn kernel: [ 87.030320] task: ffff8805a8ad8000 ti: ffff8805a7ae0000 task.ti: ffff8805a7ae0000
> Feb 28 10:09:37 (none) user.warn kernel: [ 87.030395] RIP: 0010:[] [] kni_net_rx_normal+0x2e2/0x440 [rte_kni]
> Feb 28 10:09:37 (none) user.warn kernel: [ 87.030517] RSP: 0018:ffff8805a7ae3d30 EFLAGS: 00010286
> Feb 28 10:09:37 (none) user.warn kernel: [ 87.030576] RAX: 0000077e1d012900 RBX: 0000000000000020 RCX: 0000000000000010
> Feb 28 10:09:37 (none) user.warn kernel: [ 87.030639] RDX: 0000000000000001 RSI: 0000000000000246 RDI: ffffffffa00388a3
> Feb 28 10:09:37 (none) user.warn kernel: [ 87.030701] RBP: ffff8805a7ae3e80 R08: 000000000000000a R09: 00000000fffffffe
> Feb 28 10:09:37 (none) user.warn kernel: [ 87.030766] R10: 00000000ffff2fea R11: 0000000000000006 R12: ffff8805a8a75000
> Feb 28 10:09:37 (none) user.warn kernel: [ 87.030829] R13: ffff8800b8c12800 R14: 0000000000000000 R15: ffff8805a8a75800
> Feb 28 10:09:37 (none) user.warn kernel: [ 87.030893] FS: 0000000000000000(0000) GS:ffff88062fce0000(0000) knlGS:0000000000000000
> Feb 28 10:09:37 (none) user.warn kernel: [ 87.030971] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> Feb 28 10:09:37 (none) user.warn kernel: [ 87.031031] CR2: 0000077e1d012900 CR3: 0000000001e0a000 CR4: 00000000000006e0
> Feb 28 10:09:37 (none) user.warn kernel: [ 87.031094] Stack:
> Feb 28 10:09:37 (none) user.warn kernel: [ 87.031148] ffff88062fcf5940 ffff8805a8ad8560 0000000000000000 ffff88060000054e
> Feb 28 10:09:37 (none) user.warn kernel: [ 87.031367] 0000077e1d012900 00000000b8c12800 00000000b8c11ec0 00000000b8c11580
> Feb 28 10:09:37 (none) user.warn kernel: [ 87.031587] 00000000b8c10c40 00000000b8c10300 00000000b8c0f9c0 00000000b8c0f080
> Feb 28 10:09:37 (none) user.warn kernel: [ 87.031811] Call Trace:
> Feb 28 10:09:37 (none) user.warn kernel: [ 87.031871] [] kni_net_rx+0xf/0x20 [rte_kni]
> Feb 28 10:09:37 (none) user.warn kernel: [ 87.031937] [] kni_thread_single+0x45/0xb0 [rte_kni]
> Feb 28 10:09:37 (none) user.warn kernel: [ 87.032004] [] ? kni_init_net+0x50/0x50 [rte_kni]
> Feb 28 10:09:37 (none) user.warn kernel: [ 87.032067] [] kthread+0xdb/0x100
> Feb 28 10:09:37 (none) user.warn kernel: [ 87.032125] [] ? kthread_park+0x60/0x60
> Feb 28 10:09:37 (none) user.warn kernel: [ 87.032186] [] ret_from_fork+0x3f/0x70
> Feb 28 10:09:37 (none) user.warn kernel: [ 87.032246] [] ? kthread_park+0x60/0x60
> Feb 28 10:09:37 (none) user.warn kernel: [ 87.032306] Code: 48 89 85 d0 fe ff ff eb 80 41 f6 c6 0f 75 0e 48 c7 c7 9f 88 03 a0 31 c0 e8 02 e9 11 e1 48 8b 85 d0 fe ff ff 48 c7 c7 a3 88 03 a0 <42> 0f b6 34 30 31 c0 49 83 c6 01 e8 e4 e8 11 e1 e9 5e fe ff ff
> Feb 28 10:09:37 (none) user.alert kernel: [ 87.034742] RIP [] kni_net_rx_normal+0x2e2/0x440 [rte_kni]
> Feb 28 10:09:37 (none) user.warn kernel: [ 87.034844] RSP
> Feb 28 10:09:37 (none) user.warn kernel: [ 87.034900] CR2: 0000077e1d012900
> Feb 28 10:09:37 (none) user.warn kernel: [ 87.034956] ---[ end trace 5b31765eb0372d51 ]---
>
> From this I saw that it was failing somewhere in the kni_net_rx_normal()
> function of the kni_net.c file.
>
> I narrowed down the failing line of code to line 169, where the memcpy
> happens. Next I printed some addresses in that function and it gave me:
>
> kva data addresses: data_kva 0000077e1d012900 kva->buff_add
> 00007f7e1d012880 kva->data_off 128 kni->mbuf_va (null) and kni->mbuf_kva
> ffff880000000000
>
> Next I tried to print the data at the data_kva address and it failed
> there, so it looks like it fails when I try to access data_kva at
> 0000077e1d012900. I guess the address is wrong, but I don't know why. Can
> you give me some ideas, or some things to try, to debug the problem?
>
>
> Thank you
>
> Sushil
>
> On Tue, Feb 28, 2017 at 10:00 AM, Wiles, Keith wrote:
>
>>
>> > On Feb 28, 2017, at 9:42 AM, Sushil Adhikari wrote:
>> >
>> > Since printf is not working here I'm using printk. Do you mean whether
>> > skb_put(skb, len) fails or the memcpy fails (line 169)? I separated
>> > line 169 into two statements, one for skb_put and another for just the
>> > memcpy, and it doesn't fail on skb_put, so it's the memory copy that is
>> > causing the failure.
>> > Since the memory location of data_kva and the location in "BUG: unable
>> > to handle kernel paging request at 000007529d212900" match, I thought
>> > the data_kva address is not correct or something.
>>
>> If you try printing a byte or word at the data_kva address, does the
>> printf fail?
>>
>> If not, try looping on the address, reading every 128 bytes, and see how
>> far you get. If it is the first address then I am guessing the mbuf_kva
>> address is bad. Then we need to look in the MAINTAINERS file and email the
>> maintainer directly to see if he knows what is happening.
>>
>> >
>> > On Tue, Feb 28, 2017 at 9:34 AM, Wiles, Keith wrote:
>> >
>> > > On Feb 28, 2017, at 9:30 AM, Sushil Adhikari wrote:
>> > >
>> > > It's failing at the data_kva address, because this is where I'm
>> > > getting the kernel paging request failure:
>> > > BUG: unable to handle kernel paging request at 000007529d212900
>> > > and this is what my debug shows:
>> > > kva data addresses: data_kva 000007529d212900, kva->buff_add
>> > > 00007f529d212880, kva->data_off 128, kni->mbuf_va (null), and
>> > > kni->mbuf_kva ffff880000000000
>> >
>> > I was thinking of using GDB to dump memory or use printf to see which
>> > one is failing.
>> >
>> > >
>> > > I'm not sure how to verify that these are normal
>> > >
>> > > On Mon, Feb 27, 2017 at 4:41 PM, Wiles, Keith wrote:
>> > >
>> > > > On Feb 27, 2017, at 4:22 PM, Sushil Adhikari wrote:
>> > > >
>> > > > I narrowed it down to the location where it was failing. It's coming from
>> > > > http://dpdk.org/browse/dpdk-stable/tree/lib/librte_eal/linuxapp/kni/kni_net.c?h=v16.07.2
>> > > > line 169. I am getting a len value of 1358 from len=kva->pkt_len;
>> > > > which seems right for an IP packet, and the memory allocation on
>> > > > line 157 also seems to be working fine. When I print
>> > > > sizeof(*skb) or sizeof(struct sk_buff) it gives me 208. I don't
>> > > > know whether it should be the size we allocate on line 157, which is
>> > > > len + 2 = 1360, or whether it's a fixed-size structure of 208 bytes.
>> > > > I would appreciate any insight.
>> > > > Linux version 4.4.20 (sushila@dev03) (gcc version 4.9.2
>> > > > (crosstool-NG 1.20.0) ) #1 SMP Fri Feb 24 14:32:28 CST 2017
>> > >
>> > > Looks like we need to determine which address is failing: the
>> > > skb_put() or the data_kva address. If the address that fails is at the
>> > > end of the skb_put() then I would think the len is wrong, meaning we
>> > > are stepping on memory just past a page for the skb. If the address
>> > > that fails is in the data_kva then the calculations for that address
>> > > are wrong in line 154. You may want to print out the kva->data_off,
>> > > buf_addr, mbuf_va and mbuf_kva to verify these values seem normal. The
>> > > data_off value should be reasonable (I guess), meaning within a 2K
>> > > range.
>> > >
>> > > Also print out the two values skb_put() and data_kva. You can use gdb
>> > > to examine the memory using the dump memory command. (is it x/nn )
>> > > nn is the read width, but you could leave off the ‘/nn’ for the
>> > > default.
>> > >
>> > > >
>> > > > Thank you
>> > > > Sushil
>> > > >
>> > > > On Sat, Feb 25, 2017 at 10:31 AM, Wiles, Keith <keith.wiles@intel.com> wrote:
>> > > >
>> > > > > On Feb 24, 2017, at 8:07 AM, Sushil Adhikari wrote:
>> > > > >
>> > > > > Resending because of unsupported email content type
>> > > > >
>> > > > >
>> > > > > Yes, hanging is the better word I guess; ctrl + c is not working
>> > > > > to actually stop the program. I also had a display connected to the
>> > > > > target machine, and I have attached a picture that shows the
>> > > > > messages on that display. That is where I saw "BUG:Unable to handle
>> > > > > kernel paging request at xxxxxx", which made me think that the
>> > > > > program is in a bad state.
>> > > >
>> > > > Sorry, I do not see why you are getting this message. All I can
>> > > > suggest is to use GDB and see if you can determine why the message is
>> > > > happening.
>> > > >
>> > > > >
>> > > > > info thread in gdb shows one thread running
>> > > > > Id Target Id Frame
>> > > > > * 1 LWP 843 "dpdkKni" 0x000000000044eaee in rte_kni_tx_burst ()
>> > > > >
>> > > > > On Thu, Feb 23, 2017 at 5:41 PM, Wiles, Keith <keith.wiles@intel.com> wrote:
>> > > > >
>> > > > > > On Feb 23, 2017, at 2:38 PM, Sushil Adhikari <sushil446@gmail.com> wrote:
>> > > > > >
>> > > > > > While trying to run the DPDK KNI application I ran into a problem,
>> > > > > > with the following error message:
>> > > > > > BUG: unable to handle kernel paging request at 000007ffe2b92780
>> > > > > >
>> > > > > > To run the application I first unbound the ports from the kernel
>> > > > > > module and bound them to igb_uio:
>> > > > > >> echo 0000:05:00.1 > /sys/bus/pci/drivers/ixgbe/unbind
>> > > > > >> echo 0000:05:00.0 > /sys/bus/pci/drivers/ixgbe/unbind
>> > > > > >> echo 0x8086 0x1528 > /sys/bus/pci/drivers/igb_uio/new_id
>> > > > > >
>> > > > > > I ran the application using gdb as
>> > > > > >
>> > > > > > [~]$ /root/gdb dpdkKni
>> > > > > > GNU gdb (crosstool-NG 1.20.0) 7.8
>> > > > > > Copyright (C) 2014 Free Software Foundation, Inc.
>> > > > > > License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
>> > > > > > This is free software: you are free to change and redistribute it.
>> > > > > > There is NO WARRANTY, to the extent permitted by law. Type "show copying"
>> > > > > > and "show warranty" for details.
>> > > > > > This GDB was configured as "x86_64-unknown-linux-gnu".
>> > > > > > Type "show configuration" for configuration details.
>> > > > > > For bug reporting instructions, please see:
>> > > > > > .
>> > > > > > Find the GDB manual and other documentation resources online at:
>> > > > > > .
>> > > > > > For help, type "help".
>> > > > > > Type "apropos word" to search for commands related to "word"...
>> > > > > > Reading symbols from dpdkKni...(no debugging symbols found)...done.
>> > > > > > (gdb) Run dpdkKni -c 0x0f -n 4 -- -P -p 0x3 --config="(0,0,1),(1,2,3)"
>> > > > > > Starting program: /root/dpdkKni dpdkKni -c 0x0f -n 4 -- -P -p 0x3
>> > > > > > --config="(0,0,1),(1,2,3)"
>> > > > > > warning: Could not load shared library symbols for linux-vdso.so.1.
>> > > > > > Do you need "set solib-search-path" or "set sysroot"?
>> > > > > > warning: Unable to find libthread_db matching inferior's thread library,
>> > > > > > thread debugging will not be available.
>> > > > > > EAL: Detected 4 lcore(s)
>> > > > > > EAL: Probing VFIO support...
>> > > > > > EAL: PCI device 0000:05:00.0 on NUMA socket -1
>> > > > > > EAL: probe driver: 8086:1528 net_ixgbe
>> > > > > > EAL: PCI device 0000:05:00.1 on NUMA socket -1
>> > > > > > EAL: probe driver: 8086:1528 net_ixgbe
>> > > > > > Address of pktmbuf_pool 0x7ffff5a7dec0
>> > > > > > APP: Initialising port 0 ...
>> > > > > > KNI: pci: 05:00:00 8086:1528
>> > > > > > kni created for port 0 with kni[i] address 0x7fff75638280 with i 0
>> > > > > > APP: Initialising port 1 ...
>> > > > > > KNI: pci: 05:00:01 8086:1528
>> > > > > > kni created for port 1 with kni[i] address 0x7fff75629e00 with i 0
>> > > > > > APP: Lcore 1 is writing to port 0
>> > > > > > APP: Lcore 2 is reading from port 1
>> > > > > > APP: Lcore 3 is writing to port 1
>> > > > > > APP: Lcore 0 is reading from port 0
>> > > > > > ^C
>> > > > > > Program received signal SIGINT, Interrupt.
>> > > > >
>> > > > > The program did not crash or get a segfault, but you hit
>> > > > > control-c which stopped the application. When you ran the application you
>> > > > > started 4 threads and this is why it would appear in different places when
>> > > > > stopped.
>> > > > >
>> > > > > If the application is hanging then you can use control-C and then
>> > > > > run the ‘info threads’ command to see the location of all threads.
>> > > > > You can use the ‘thread X’ command to switch between threads. Please
>> > > > > check the command usage here; I am going from memory.
>> > > > >
>> > > > > I am not sure if the application has a -i option to get a command
>> > > > > line; if so, that may be useful to enable. Check the application to
>> > > > > see if it uses the cmdline feature.
>> > > > >
>> > > > > It may be that the application just sits running and you have to use
>> > > > > other tools or apps to send traffic to the KNI application. Sorry, I
>> > > > > have not really used the KNI example.
>> > > > >
>> > > > > > 0x000000000044e916 in rte_kni_tx_burst ()
>> > > > > > (gdb) backtrace
>> > > > > > #0 0x000000000044e916 in rte_kni_tx_burst ()
>> > > > > > #1 0x0000000000619758 in main_loop(void*) ()
>> > > > > > #2 0x0000000000431183 in rte_eal_mp_remote_launch ()
>> > > > > > #3 0x000000000040d312 in main ()
>> > > > > >
>> > > > > > (this is where the program crashes)
>> > > > > >
>> > > > > > I tried to trace the crash with gdb (I am new to gdb),
>> > > > > > and when I do the backtrace it ends up in different functions
>> > > > > > each time; this time it gave me rte_kni_tx_burst().
>> > > > > >
>> > > > > > I'm running the latest DPDK version, 17.02, and the Linux kernel is
>> > > > > > Linux version 4.4.20 (tcuser@cibuild08) (gcc version 4.9.2
>> > > > > > (crosstool-NG 1.20.0) ) #1 SMP
>> > > > > >
>> > > > > > I would appreciate any suggestion or insight regarding this
>> > > > > > issue.
>> > > > >
>> > > > > Regards,
>> > > > > Keith
>> > > > >
>> > > >
>> > > > Regards,
>> > > > Keith
>> > > >
>> > >
>> > > Regards,
>> > > Keith
>> > >
>> >
>> > Regards,
>> > Keith
>> >
>>
>> Regards,
>> Keith
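
The two sets of debug values quoted in the thread can be cross-checked with a little 64-bit arithmetic. The sketch below assumes that the KNI receive path derives data_kva as buf_addr + data_off - mbuf_va + mbuf_kva; the thread only says the calculation sits on line 154 of kni_net.c, so treat the formula as an assumption to verify against your own tree. The function names are hypothetical, and the canonical-address check assumes x86-64 with 48-bit virtual addresses.

```python
MASK64 = (1 << 64) - 1  # pointer arithmetic in the kernel wraps at 64 bits


def kni_data_kva(buf_addr, data_off, mbuf_va, mbuf_kva):
    """Assumed user-to-kernel translation (hypothetical helper, not DPDK API)."""
    return (buf_addr + data_off - mbuf_va + mbuf_kva) & MASK64


def is_canonical_kernel_address(addr):
    """True if bits 63..47 are all set (kernel half, 48-bit addressing assumed)."""
    return (addr >> 47) == 0x1FFFF


# Values copied from the two debug printouts in the thread; mbuf_va was (null).
first = kni_data_kva(0x00007f7e1d012880, 128, 0, 0xffff880000000000)
second = kni_data_kva(0x00007f529d212880, 128, 0, 0xffff880000000000)

for addr in (first, second):
    print(f"{addr:016x} kernel-half: {is_canonical_kernel_address(addr)}")
```

Under that assumption, both computed values equal the faulting addresses reported by the kernel (0000077e1d012900 and 000007529d212900), and neither lies in the kernel half of the address space. That would point at how mbuf_va and mbuf_kva were set up rather than at the packet length.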