From: "Sanford, Robert"
To: Jay Rolette, DPDK
Date: Wed, 6 Apr 2016 22:30:29 +0000
Subject: Re: [dpdk-dev] Kernel panic in KNI

Hi Jay,

I won't try to interpret your kernel stack trace, but I'll tell you about a KNI-related problem that we once experienced, where the symptom was a kernel hang.

The problem was that we were passing mbufs allocated out of one mempool to a KNI context that we had set up with a different mempool (on a different CPU socket). The KNI kernel driver converts the user-space mbuf virtual address (VA) to a kernel VA by adding the difference between the user and kernel VAs of the mempool used to create the KNI context. So, if an mbuf comes from a different mempool, the calculated address will probably be VERY BAD.

Could this be your problem?

--
Robert

On 4/6/16 4:16 PM, "Jay Rolette" wrote:

>I had a system lock up hard a couple of days ago, and all we were able to get
>was a photo of the LCD monitor with most of the kernel panic on it. No way
>to scroll back the buffer and nothing in the logs after we rebooted. Not
>surprising with a kernel panic due to an exception during interrupt
>processing. We have a serial console attached in case we are able to get it
>to happen again, but it's not easy to reproduce (hours of runtime for this
>instance).
>
>Ran the photo through OCR software to get a text version of the dump, so it's
>possible I missed some fixups in this:
>
>[39178.433262] RDX: 00000000000000ba RSI: ffff881fd2f350ee RDI: a12520669126180a
>[39178.464020] RBP: ffff880433966970 R08: a12520669126180a R09: ffff881fd2f35000
>[39178.495091] R10: 000000000000ffff R11: ffff881fd2f88000 R12: ffff883fd1a75ee8
>[39178.526594] R13: 00000000000000ba R14: 00007fdad5a66780 R15: ffff883715ab6780
>[39178.559011] FS:  00007ffff7fea740(0000) GS:ffff881fffc00000(0000) knlGS:0000000000000000
>[39178.592005] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>[39178.623931] CR2: 00007ffff7ea2000 CR3: 0000001fd156f000 CR4: 00000000001407f0
>[39178.656187] Stack:
>[39178.689025]  ffffffffc067c7ef 00000000000000ba 00000000000000ba ffff881fd2f88000
>[39178.722682]  0000000000004000 ffff883fd0bbd09c ffff883fd1a75ee8 ffff8804339bb9c8
>[39178.756525]  ffffffff81658456 ffff881fcd2ec40c ffffffffc0680700 ffff880436bad800
>[39178.790577] Call Trace:
>[39178.824420]  [] ? kni_net_tx+0xef/0x1a0 [rte_kni]
>[39178.859190]  [] dev_hard_start_xmit+0x316/0x5c0
>[39178.893426]  [] sch_direct_xmit+0xee/0x1c0
>[39178.927435]  [] __dev_queue_xmit+0x200/0x4d0
>[39178.961684]  [] dev_queue_xmit+0x10/0x20
>[39178.996194]  [] neigh_connected_output+0x67/0x100
>[39179.031098]  [] ip_finish_output+0x1d8/0x850
>[39179.066709]  [] ip_output+0x58/0x90
>[39179.101551]  [] ip_local_out_sk+0x30/0x40
>[39179.136823]  [] ip_queue_xmit+0x13f/0x3d0
>[39179.171742]  [] tcp_transmit_skb+0x47c/0x900
>[39179.206854]  [] tcp_write_xmit+0x110/0xcb0
>[39179.242335]  [] __tcp_push_pending_frames+0x2e/0xc0
>[39179.277632]  [] tcp_push+0xec/0x120
>[39179.311768]  [] tcp_sendmsg+0xb9/0xce0
>[39179.346934]  [] ? tcp_recvmsg+0x6e2/0xba0
>[39179.385586]  [] inet_sendmsg+0x64/0x60
>[39179.424228]  [] ? apparmor_socket_sendmsg+0x21/0x30
>[39179.458658]  [] sock_sendmsg+0x86/0xc0
>[39179.493220]  [] ? __inet_stream_connect+0xa5/0x320
>[39179.528033]  [] ? __fdget+0x13/0x20
>[39179.561214]  [] SYSC_sendto+0x121/0x1c0
>[39179.594665]  [] ? aa_sk_perm.isra.4+0x6d/0x150
>[39179.626893]  [] ? read_tsc+0x9/0x20
>[39179.658654]  [] ? ktime_get_ts+0x48/0xe0
>[39179.689944]  [] SyS_sendto+0xe/0x10
>[39179.719575]  [] system_call_fastpath+0x1a/0x1f
>[39179.748760] Code: 43 58 48 2b 43 50 88 43 4e 5b 5d c3 66 0f 1f 84 00 00 00 00 00 e8 fb fb ff ff eb e2 90 90 90 90 90 90 90 90 48 89 f8 48 89 d1 a4 c3 03 83 e2 07 f3 48 a5 89 d1 f3 a4 c3 20 4c 8b % 4c 86
>[39179.808690] RIP  [] memcpy+0x6/0x110
>[39179.837238]  RSP
>[39179.933755] ---[ end trace 2971562f425e2cf8 ]---
>[39179.964856] Kernel panic - not syncing: Fatal exception in interrupt
>[39179.992896] Kernel Offset: 0x0 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
>[39180.024617] ---[ end Kernel panic - not syncing: Fatal exception in interrupt
>
>It blew up when kni_net_tx() called memcpy() to copy data from the skb to
>an mbuf.
>
>Disclosure: I'm not a Linux device driver guy. I dip into the kernel as
>needed. Plenty of experience doing RTOS and bare-metal development, but not
>a Linux kernel expert.
>
>What context does kni_net_tx() run in? On the receive path, my
>understanding is that KNI always runs in process context on a kthread.
>I've been assuming that the transmit path was also in process context
>(albeit on the app's process), so the "Fatal exception in interrupt" is
>throwing me.
>
>Does kni_net_tx() ever run in interrupt (or soft-interrupt) context?
>
>Thanks,
>Jay