From: Jay Rolette
To: DPDK
Date: Wed, 6 Apr 2016 15:16:10 -0500
Subject: [dpdk-dev] Kernel panic in KNI
List-Id: patches and discussions about DPDK

I had a system lock up hard a couple of days ago, and all we were able to get was a photo of the LCD monitor with most of the kernel panic on it. There was no way to scroll back the buffer and nothing in the logs after we rebooted. That's not surprising for a kernel panic caused by an exception during interrupt processing.

We have a serial console attached in case we are able to get it to happen again, but it's not easy to reproduce (hours of runtime for this instance).

I ran the photo through OCR software to get a text version of the dump, so it's possible I missed some fixups in this:

[39178.433262] RDX: 00000000000000ba RSI: ffff881fd2f350ee RDI: a12520669126180a
[39178.464020] RBP: ffff880433966970 R08: a12520669126180a R09: ffff881fd2f35000
[39178.495091] R10: 000000000000ffff R11: ffff881fd2f88000 R12: ffff883fdla75ee8
[39178.526594] R13: 00000000000000ba R14: 00007fdad5a66780 R15: ffff883715ab6780
[39178.559011] FS: 00007ffff7fea740(0000) GS:ffff88lfffc00000(0000) knlGS:0000000000000000
[39178.592005] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[39178.623931] CR2: 00007ffff7ea2000 CR3: 0000001fd156f000 CR4: 00000000001407f0
[39178.656187] Stack:
[39178.689025] ffffffffc067c7ef 00000000000000ba 00000000000000ba ffff881fd2f88000
[39178.722682] 0000000000004000 ffff8B3fd0bbd09c ffff883fdla75ee8 ffff8804339bb9c8
[39178.756525] ffffffff81658456 ffff881fcd2ec40c ffffffffc0680700 ffff880436bad800
[39178.790577] Call Trace:
[39178.824420] [] ? kni_net_tx+0xef/0x1a0 [rte_kni]
[39178.859190] [] dev_hard_start_xmit+0x316/0x5c0
[39178.893426] [] sch_direct_xmit+0xee/0xic0
[39178.927435] [l __dev_queue_xmit+0x200/0x4d0
[39178.961684] [l dev_queue_xmit+0x10/0x20
[39178.996194] [] neigh_connected_output+0x67/0x100
[39179.031098] [] ip_finish_output+0xid8/0x850
[39179.066709] [l ip_output+0x58/0x90
[39179.101551] [] ip_local_out_sk+0x30/0x40
[39179.136823] [] ip_queue_xmit+0xl3f/0x3d0
[39179.171742] [] tcp_transmit_skb+0x47c/0x900
[39179.206854] [l tcp_write_xmit+0x110/0xcb0
[39179.242335] [] __tcp_push_pending_frames+0x2e/0xc0
[39179.277632] [] tcp_push+0xec/0x120
[39179.311768] [] tcp_sendmsg+0xb9/0xce0
[39179.346934] [] ? tcp_recvmsg+0x6e2/0xba0
[39179.385586] [] inet_sendmsg+0x64/0x60
[39179.424228] [] ? apparmor_socket_sendmsg+0x21/0x30
[39179.4586581 [] sock_sendmsg+0x86/0xc0
[39179.493220] [] ? __inet_stream_connect+0xa5/0x320
[39179.528033] [] ? __fdget+0x13/0x20
[39179.561214] [] SYSC_sendto+0x121/0x1c0
[39179.594665] [] ? aa_sk_perm.isra.4+0x6d/0x150
[39179.6268931 [] ? read_tsc+0x9/0x20
[39179.6586541 [] ? ktime_get_ts+0x48/0xe0
[39179.689944] [] SyS_sendto+0xe/0x10
[39179.719575] [] system_call_fastpath+0xia/0xif
[39179.748760] Code: 43 58 48 Zb 43 50 88 43 4e 5b 5d c3 66 Of if 84 00 00 00 00 00 e8 fb fb ff ff eb e2 90 90 90 90 90 90 90 90 48 89 f8 48 89 d1 a4 c3 03 83 eZ 07 f3 48 .15 89 di f3 a4 c3 20 4c 8b % 4c 86
[39179.808690] RIP [] memcpy+0x6/0x110
[39179.837238] RSP
[39179.933755] ---[ end trace 2971562f425e2cf8 ]---
[39179.964856] Kernel panic - not syncing: Fatal exception in interrupt
[39179.992896] Kernel Offset: 0x0 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[39180.024617] ---[ end Kernel panic - not syncing: Fatal exception in interrupt

It blew up when kni_net_tx() called memcpy() to copy data from the skb to an mbuf.

Disclosure: I'm not a Linux device driver guy. I dip into the kernel as needed.
Plenty of experience doing RTOS and bare-metal development, but not a Linux kernel expert.

What context does kni_net_tx() run in? On the receive path, my understanding is that KNI always runs in process context on a kthread. I've been assuming that the transmit path was also in process context (albeit on the app's process), so the "Fatal exception in interrupt" is throwing me.

Does kni_net_tx() ever run in interrupt (or soft-interrupt) context?

Thanks,
Jay
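
Because the dump above came out of OCR, a quick sanity check on the call-trace entries can catch misread characters before anyone tries to resolve the offsets. The following is a minimal user-space sketch, assuming each entry has the form symbol+0xOFFSET/0xSIZE. The function names are hypothetical, and the check only flags entries whose hex fields contain non-hex characters; it cannot recover the correct digits.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Return true if s[0..len) is "0x" followed by at least one hex digit. */
static bool is_hex_literal(const char *s, size_t len)
{
    if (len < 3 || s[0] != '0' || s[1] != 'x')
        return false;
    for (size_t i = 2; i < len; i++) {
        if (!isxdigit((unsigned char)s[i]))
            return false;
    }
    return true;
}

/*
 * Check one call-trace entry of the assumed form "symbol+0xOFF/0xSIZE".
 * Anything after the size field (for example " [rte_kni]") is ignored.
 * Returns false when either hex field is malformed, which in an OCR'd
 * dump usually means a misread character such as 'i' or 'l' for '1'.
 */
static bool trace_entry_looks_valid(const char *entry)
{
    const char *plus = strchr(entry, '+');
    if (plus == NULL)
        return false;

    const char *slash = strchr(plus, '/');
    if (slash == NULL)
        return false;

    /* The size field ends at the first whitespace or end of string. */
    const char *end = slash + 1;
    while (*end != '\0' && !isspace((unsigned char)*end))
        end++;

    return is_hex_literal(plus + 1, (size_t)(slash - plus - 1)) &&
           is_hex_literal(slash + 1, (size_t)(end - slash - 1));
}
```

With this check, kni_net_tx+0xef/0x1a0 passes, while sch_direct_xmit+0xee/0xic0 and ip_queue_xmit+0xl3f/0x3d0 are flagged because of the 'i' and 'l' in their hex fields.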