From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <ferruh.yigit@intel.com>
Received: from mga07.intel.com (mga07.intel.com [134.134.136.100])
 by dpdk.org (Postfix) with ESMTP id 179101B613
 for <dev@dpdk.org>; Fri, 10 Nov 2017 02:40:17 +0100 (CET)
Received: from fmsmga003.fm.intel.com ([10.253.24.29])
 by orsmga105.jf.intel.com with ESMTP; 09 Nov 2017 17:40:16 -0800
X-ExtLoop1: 1
X-IronPort-AV: E=Sophos;i="5.44,371,1505804400"; 
   d="scan'208";a="369135"
Received: from fyigit-mobl1.ger.corp.intel.com (HELO [10.241.224.59])
 ([10.241.224.59])
 by FMSMGA003.fm.intel.com with ESMTP; 09 Nov 2017 17:40:15 -0800
To: Chas Williams <3chas3@gmail.com>
Cc: Thomas Monjalon <thomas@monjalon.net>, dev@dpdk.org,
 Jianfeng Tan <jianfeng.tan@intel.com>, Jingjing Wu <jingjing.wu@intel.com>,
 Shijith Thotton <shijith.thotton@caviumnetworks.com>,
 Gregory Etelson <gregory@weka.io>, Harish Patil <harish.patil@cavium.com>,
 George Prekas <george.prekas@epfl.ch>,
 Sergio Gonzalez Monroy <sergio.gonzalez.monroy@intel.com>,
 Rasesh Mody <rasesh.mody@cavium.com>, Lee Roberts <lee.roberts@hpe.com>,
 Stephen Hemminger <stephen@networkplumber.org>
References: <20171103223822.28852-1-ferruh.yigit@intel.com>
 <2004961.P5XXAOnQC2@xps>
 <CAG2-Gkk1Z3ZE3XpyK_FQVLnQb9qeY3tYnny6n9x2m2E0MkFRyA@mail.gmail.com>
 <1537486.rYyyKMFLXg@xps>
 <CAG2-GknAx_N_dbQAcVnmSDSF01ivfP7shZUd5ewJdZK1GCsbbA@mail.gmail.com>
 <3c0fb383-552b-b212-c0a0-9267a12afad9@intel.com>
 <CAG2-GkmO_kwy1NwBZ5VbfP1E7TU9-U2soaDfmjSB+FwFsHv=yg@mail.gmail.com>
 <d1e9ad62-0b2b-c67e-8425-551a0898a757@intel.com>
 <CAG2-Gkn1+Tqy_Cs-ybQ1HdoNNA8OzYEaptDfsVsRp+11t+-_bQ@mail.gmail.com>
From: Ferruh Yigit <ferruh.yigit@intel.com>
Message-ID: <b4359849-e079-23c9-2918-ad809e52addc@intel.com>
Date: Thu, 9 Nov 2017 17:40:13 -0800
User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:52.0) Gecko/20100101
 Thunderbird/52.4.0
MIME-Version: 1.0
In-Reply-To: <CAG2-Gkn1+Tqy_Cs-ybQ1HdoNNA8OzYEaptDfsVsRp+11t+-_bQ@mail.gmail.com>
Content-Type: text/plain; charset=utf-8
Content-Language: en-US
Content-Transfer-Encoding: 8bit
Subject: Re: [dpdk-dev] [dpdk-stable] [PATCH v2] igb_uio: prevent reset for
 a list of devices
X-BeenThere: dev@dpdk.org
X-Mailman-Version: 2.1.15
Precedence: list
List-Id: DPDK patches and discussions <dev.dpdk.org>
List-Unsubscribe: <http://dpdk.org/ml/options/dev>,
 <mailto:dev-request@dpdk.org?subject=unsubscribe>
List-Archive: <http://dpdk.org/ml/archives/dev/>
List-Post: <mailto:dev@dpdk.org>
List-Help: <mailto:dev-request@dpdk.org?subject=help>
List-Subscribe: <http://dpdk.org/ml/listinfo/dev>,
 <mailto:dev-request@dpdk.org?subject=subscribe>
X-List-Received-Date: Fri, 10 Nov 2017 01:40:18 -0000

On 11/8/2017 4:00 AM, Chas Williams wrote:
> 
> 
> On Tue, Nov 7, 2017 at 5:26 PM, Ferruh Yigit <ferruh.yigit@intel.com> wrote:
> 
>     On 11/7/2017 12:47 PM, Chas Williams wrote:
>     > I will confess I haven't looked into the issue too hard since I have a
>     > workaround.  My first guess is that there is something going on with the IOMMU
>     > and quiescing a PCI pass-through device/function from the guest (since I don't
>     > think the IOMMU is "visible" to the guest) seems iffy.
>     >
>     > Most devices have some sort of reset to put the device into a known state for
>     > setup/configuration (or enable/disable for the DMA engines).  If this is done at
>     > .dev_close(), shouldn't that be as sufficient as resetting the function?
> 
>     This is for the cases where the DPDK app terminates unexpectedly; the proper
>     exit path already does the cleanup.
> 
> 
> Call a usermode helper from igb_uio that does an open/close on the device about
> to be released?

Can generic userspace code know how to clean up various devices?
I guess a driver is required for this work, and the DPDK application that has
the drivers has already exited at that stage.

>  
> 
> 
>     >
>     > On Tue, Nov 7, 2017 at 1:49 PM, Ferruh Yigit <ferruh.yigit@intel.com> wrote:
>     >
>     >     On 11/7/2017 10:12 AM, Chas Williams wrote:
>     >     > Environment: Dell PowerEdge R730, Intel Corporation 82599ES 10-Gigabit
>     >     SFI/SFP+
>     >     > Network Connection shared via PCI pass-through
>     >     > Host: Debian 8
>     >     > Guest: Custom Debian 8 with DPDK application based on 17.11
>     >     >
>     >     > When we shutdown the guest, the kernel panics with:
>     >     >
>     >     > [  279.021818] Do you have a strange power saving mode enabled?
>     >     > [  279.021819] Dazed and confused, but trying to continue
>     >     > [  279.021847] {1}[Hardware Error]: Hardware error from APEI Generic
>     Hardware
>     >     > Error Source: 3
>     >     > [  279.021849] {1}[Hardware Error]: event severity: fatal
>     >     > [  279.021850] {1}[Hardware Error]:  Error 0, type: fatal
>     >     > [  279.021851] {1}[Hardware Error]:   section_type: PCIe error
>     >     > [  279.021852] {1}[Hardware Error]:   port_type: 0, PCIe end point
>     >     > [  279.021853] {1}[Hardware Error]:   version: 1.16
>     >     > [  279.021854] {1}[Hardware Error]:   command: 0x0507, status: 0x4010
>     >     > [  279.021855] {1}[Hardware Error]:   device_id: 0000:03:00.0
>     >     > [  279.021855] {1}[Hardware Error]:   slot: 0
>     >     > [  279.021856] {1}[Hardware Error]:   secondary_bus: 0x00
>     >     > [  279.021857] {1}[Hardware Error]:   vendor_id: 0x8086, device_id:
>     0x10fb
>     >     > [  279.021858] {1}[Hardware Error]:   class_code: 000002
>     >     > [  279.021859] Kernel panic - not syncing: Fatal hardware error!
>     >     > [  279.021977] sched: Unexpected reschedule of offline CPU#1!
>     >     > [  279.021984] ------------[ cut here ]------------
>     >     > [  279.021992] WARNING: CPU: 43 PID: 2807 at
>     >     > /build/linux-fHlJSJ/linux-4.12.6/arch/x86/kernel/smp.c:128
>     >     > native_smp_send_reschedule+0x34/0x40
>     >     > [  279.021993] Modules linked in: vfio_pci vfio_virqfd
>     vfio_iommu_type1 vfio
>     >     > openvswitch nf_conntrack_ipv6 nf_nat_ipv6 nf_conntrack_ipv4
>     nf_defrag_ipv4
>     >     > nf_nat_ipv4 nf_defrag_ipv6 nf_nat nf_conntrack libcrc32c
>     crc32c_generic nfsd
>     >     > nfs_aclr
>     >     > pcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace sunrpc
>     >     fscache tun
>     >     > intel_rapl sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp
>     >     kvm_intel kvm
>     >     > irqbypass mgag200 ttm drm_kms_helper drm joydev crct10dif_pclmul
>     crc32_pclmu
>     >     > l ghash_clmulni_intel i2c_algo_bit ipmi_si ipmi_devintf iTCO_wdt
>     intel_cstate
>     >     > iTCO_vendor_support evdev intel_uncore mxm_wmi lpc_ich ipmi_msghandler
>     >     mfd_core
>     >     > ioatdma intel_rapl_perf dcdbas pcspkr shpchp mei_me button wmi mei
>     >     acpi_power_m
>     >     > eter tpm_crb autofs4 ext4 crc16 jbd2 fscrypto mbcache sr_mod cdrom sg
>     >     > hid_generic usbhid hid sd_mod
>     >     > [  279.022044]  crc32c_intel aesni_intel aes_x86_64 crypto_simd cryptd
>     >     > glue_helper ahci ehci_pci libahci ehci_hcd ixgbe libata megaraid_sas
>     >     usbcore dca
>     >     > i40e usb_common ptp pps_core scsi_mod mdio
>     >     > [  279.022060] CPU: 43 PID: 2807 Comm: revalidator85 Not tainted
>     >     4.12.0-1-amd64
>     >     > #1 Debian 4.12.6-1
>     >     > [  279.022061] Hardware name: Dell Inc. PowerEdge R730/0WCJNT, BIOS
>     2.3.4
>     >     11/08/2016
>     >     > [  279.022062] task: ffff91d0473f7100 task.stack: ffffafef8f4a4000
>     >     > [  279.022066] RIP: 0010:native_smp_send_reschedule+0x34/0x40
>     >     > [  279.022067] RSP: 0018:ffffafef8f4a7c98 EFLAGS: 00010082
>     >     > [  279.022069] RAX: 000000000000002e RBX: ffff91d059d24080 RCX:
>     >     0000000000000001
>     >     > [  279.022070] RDX: 0000000000000000 RSI: 0000000000000002 RDI:
>     >     0000000000000046
>     >     > [  279.022071] RBP: ffff91d04691d100 R08: 0000000000000000 R09:
>     >     000000000000002e
>     >     > [  279.022072] R10: ffffafef8f4a7c90 R11: 00000000001cbb78 R12:
>     >     ffff91d85d21ae80
>     >     > [  279.022073] R13: ffff91d059d24000 R14: 0000000000000002 R15:
>     >     0000000000000008
>     >     > [  279.022075] FS:  00007f726affd700(0000) GS:ffff91d85d740000(0000)
>     >     > knlGS:0000000000000000
>     >     > [  279.022076] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>     >     > [  279.022077] CR2: 00007fd422a52c48 CR3: 000000042d90f000 CR4:
>     >     00000000003426e0
>     >     > [  279.022078] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
>     >     0000000000000000
>     >     > [  279.022079] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7:
>     >     0000000000000400
>     >     > [  279.022080] Call Trace:
>     >     > [  279.022086]  ? check_preempt_wakeup+0x181/0x220
>     >     > [  279.022091]  ? check_preempt_curr+0x74/0x80
>     >     > [  279.022094]  ? ttwu_do_wakeup+0x19/0x140
>     >     > [  279.022098]  ? try_to_wake_up+0x1b8/0x470
>     >     > [  279.022101]  ? wake_up_q+0x3f/0x70
>     >     > [  279.022106]  ? futex_wake+0x15a/0x170
>     >     > [  279.022108]  ? do_futex+0x2df/0xa90
>     >     > [  279.022111]  ? SyS_futex+0x7a/0x170
>     >     > [  279.022113]  ? SyS_read+0x76/0xc0
>     >     > [  279.022118]  ? system_call_fast_compare_end+0xc/0x97
>     >     > [  279.022119] Code: a3 05 51 fb cc 00 73 15 48 8b 05 28 74 a3 00 be
>     fd 00
>     >     00 00
>     >     > 48 8b 80 a0 00 00 00 ff e0 89 fe 48 c7 c7 88 5c de b6 e8 e2 c9 13 00
>     <0f>
>     >     ff c3
>     >     > 66 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 8b 05 5d 00
>     >     > [  279.022151] ---[ end trace eddc980dc8648163 ]---
>     >     > [  279.454274] Kernel Offset: 0x35400000 from 0xffffffff81000000
>     (relocation
>     >     > range: 0xffffffff80000000-0xffffffffbfffffff)
>     >     >
>     >     > The test engineer says this doesn't happen if we use SRIOV (which makes
>     >     > sense since the device isn't directly shared between the guest and the
>     >     > host).  If I remove the pci_reset_function() from igb_uio's .release,
>     >     > then all is well.
>     >
>     >     This was tougher than expected, with so much unexpected behavior. Why does
>     >     resetting a pass-through device in the guest cause a crash in the host?
>     >
>     >     Finally, I will send a patch to remove the reset. Hopefully no more
>     >     surprises for the release.
>     >
>     >     Two improvements will still remain in igb_uio for better security:
>     >     disabling the device interrupt on exit and clearing master on exit.
>     >
>     >     >
>     >     >
>     >     > On Tue, Nov 7, 2017 at 8:02 AM, Thomas Monjalon <thomas@monjalon.net> wrote:
>     >     >
>     >     >     07/11/2017 12:50, Chas Williams:
>     >     >     > We still have an issue with this and PCI pass-through.  If a
>     >     >     > guest is restarted while using PCI pass-through and igb_uio
>     >     >     > issues a pci_reset_function(), this causes the host to crash.
>     >     >
>     >     >     Please, could you better explain the exact scenario and the
>     >     >     cause of the crash?
>     >     >     Thanks
>     >     >
>     >     >
>     >
>     >
> 
>