From: Kyle Larose
To: dev@dpdk.org
Date: Tue, 13 Oct 2015 08:27:11 -0400
Subject: [dpdk-dev] Host kernel panic when running ixgbe NIC in pci passthrough

Hello,

I have a system using dpdk 1.8 with 82599ES ixgbe NICs. These are provided to a virtual guest via pci passthrough. Our dpdk application on the guest takes control of the NICs using igb_uio. On certain systems, under conditions we have not yet figured out, sending traffic causes the host to kernel panic. It looks like a pci device is reporting a fatal error.
From the error, the issue looks to be either the bridge connected to the ixgbe, or the ixgbe itself; I cannot decipher the message beyond that. This has happened on three different machines, so I do not think it is bad hardware.

I was wondering if anybody has run into this before, and if they have any solutions. I tried searching the mailing list, but couldn't find anything related.

[3108395.524535] {1}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 3
[3108395.533959] {1}[Hardware Error]: APEI generic hardware error status
[3108395.541149] {1}[Hardware Error]: severity: 1, fatal
[3108395.546785] {1}[Hardware Error]: section: 0, severity: 1, fatal
[3108395.553586] {1}[Hardware Error]: flags: 0x01
[3108395.558543] {1}[Hardware Error]: primary
[3108395.563113] {1}[Hardware Error]: section_type: PCIe error
[3108395.569332] {1}[Hardware Error]: port_type: 6, downstream switch port
[3108395.576715] {1}[Hardware Error]: version: 1.16
[3108395.581866] {1}[Hardware Error]: command: 0x0407, status: 0x0010
[3108395.588763] {1}[Hardware Error]: device_id: 0000:05:01.0
[3108395.594886] {1}[Hardware Error]: slot: 0
[3108395.599455] {1}[Hardware Error]: secondary_bus: 0x06
[3108395.605189] {1}[Hardware Error]: vendor_id: 0x10b5, device_id: 0x8724
[3108395.612572] {1}[Hardware Error]: class_code: 000406
[3108395.618208] {1}[Hardware Error]: bridge: secondary_status: 0x0000, control: 0x0003
[3108395.626853] {1}[Hardware Error]: section: 1, severity: 1, fatal
[3108395.633653] {1}[Hardware Error]: flags: 0x01
[3108395.638611] {1}[Hardware Error]: primary
[3108395.643179] {1}[Hardware Error]: section_type: PCIe error
[3108395.649396] {1}[Hardware Error]: port_type: 6, downstream switch port
[3108395.656778] {1}[Hardware Error]: version: 1.16
[3108395.661930] {1}[Hardware Error]: command: 0x0407, status: 0x0010
[3108395.668829] {1}[Hardware Error]: device_id: 0000:05:09.0
[3108395.674951] {1}[Hardware Error]: slot: 0
[3108395.679521] {1}[Hardware Error]: secondary_bus: 0x09
[3108395.685254] {1}[Hardware Error]: vendor_id: 0x10b5, device_id: 0x8724
[3108395.692636] {1}[Hardware Error]: class_code: 000406
[3108395.698272] {1}[Hardware Error]: bridge: secondary_status: 0x0000, control: 0x0003
[3108395.706915] Kernel panic - not syncing: Fatal hardware error!

0000:05:01.0 is a PLX pci bridge. It has two ixgbe NICs connected to it. Likewise with 0000:05:09.0.

Here is the boot cmdline on the host (we're using iommu):

BOOT_IMAGE=/vmlinuz-3.10.0-123.el7.x86_64 root=UUID=57d79ff0-1152-46fb-a619-b2a102de3d5f ro console=ttyS0,115200n8 vconsole.font=latarcyrheb-sun16 crashkernel=auto rd.lvm.lv=VolGrp/Vol1 rd.lvm.lv=VolGrp/Vol0 vconsole.keymap=us LANG=en_US.UTF-8 intel_iommu=on

Any help would be greatly appreciated.

Thanks,

Kyle
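
As a reading aid for the register values in the log above, here is a minimal sketch that decodes the command, status, and bridge control fields reported for the two PLX ports. The bit names are assumed from the standard PCI configuration-space layout; the helper names (decode_bits, split_bdf) are hypothetical and come from neither DPDK nor the kernel.

```python
# Bit positions assumed from the standard PCI configuration-space layout.
COMMAND_BITS = {
    0: "I/O space enable",
    1: "memory space enable",
    2: "bus master enable",
    6: "parity error response",
    8: "SERR# enable",
    10: "interrupt disable",
}
STATUS_BITS = {
    3: "interrupt status",
    4: "capabilities list",
    8: "master data parity error",
    11: "signaled target abort",
    12: "received target abort",
    13: "received master abort",
    14: "signaled system error",
    15: "detected parity error",
}
BRIDGE_CONTROL_BITS = {
    0: "parity error response enable",
    1: "SERR# enable",
    2: "ISA enable",
    3: "VGA enable",
    5: "master abort mode",
    6: "secondary bus reset",
}


def decode_bits(value, names):
    """Return the names of the bits set in a 16-bit register value.

    Bits without an entry in `names` are reported by index.
    """
    return [names.get(bit, f"bit {bit}") for bit in range(16) if value & (1 << bit)]


def split_bdf(address):
    """Split 'dddd:bb:dd.f' into (domain, bus, device, function) integers."""
    domain, bus, rest = address.split(":")
    device, function = rest.split(".")
    return int(domain, 16), int(bus, 16), int(device, 16), int(function, 16)


if __name__ == "__main__":
    # Values copied from the log; both PLX ports report the same registers.
    print("command:", decode_bits(0x0407, COMMAND_BITS))
    print("status:", decode_bits(0x0010, STATUS_BITS))
    print("bridge control:", decode_bits(0x0003, BRIDGE_CONTROL_BITS))
    print("port:", split_bdf("0000:05:01.0"))
```

Under those assumptions, status 0x0010 sets only the capabilities-list bit and secondary_status is 0x0000, so none of the legacy error bits are latched on either port. Control 0x0003 shows parity error response and SERR# forwarding enabled on the bridge. This does not identify the root cause; it only narrows what the generic registers in this log can tell us.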