From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mga02.intel.com (mga02.intel.com [134.134.136.20]) by dpdk.org (Postfix) with ESMTP id 275361B26C; Fri, 20 Oct 2017 18:32:14 +0200 (CEST) Received: from fmsmga003.fm.intel.com ([10.253.24.29]) by orsmga101.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 20 Oct 2017 09:32:06 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.43,405,1503385200"; d="scan'208";a="911972764" Received: from fyigit-mobl1.ger.corp.intel.com (HELO [10.241.224.220]) ([10.241.224.220]) by FMSMGA003.fm.intel.com with ESMTP; 20 Oct 2017 09:32:04 -0700 To: "Tan, Jianfeng" , "Patil, Harish" , Thomas Monjalon Cc: "dev@dpdk.org" , Jingjing Wu , "Thotton, Shijith" , Gregory Etelson , George Prekas , "stable@dpdk.org" , Sergio Gonzalez Monroy , "Glynn, Michael J" References: <20171017201436.65270-1-ferruh.yigit@intel.com> <2156540.XK3Rr5JXAh@xps> <1f37e107-eb01-3a23-04a5-ba4c6e60d63d@intel.com> From: Ferruh Yigit Message-ID: Date: Fri, 20 Oct 2017 09:32:04 -0700 User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:52.0) Gecko/20100101 Thunderbird/52.4.0 MIME-Version: 1.0 In-Reply-To: <1f37e107-eb01-3a23-04a5-ba4c6e60d63d@intel.com> Content-Type: text/plain; charset=utf-8 Content-Language: en-US Content-Transfer-Encoding: 8bit Subject: Re: [dpdk-stable] [PATCH] igb_uio: revert open and release operations X-BeenThere: stable@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: patches for DPDK stable branches List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 20 Oct 2017 16:32:15 -0000 On 10/20/2017 8:26 AM, Tan, Jianfeng wrote: > Hi Ferruh & Harish, > > > On 10/20/2017 9:15 AM, Ferruh Yigit wrote: >> On 10/19/2017 3:43 PM, Patil, Harish wrote: >>> -----Original Message----- >>> From: Harish Patil >>> Date: Tuesday, October 17, 2017 at 9:50 PM >>> To: Thomas Monjalon , Ferruh Yigit >>> >>> Cc: "dev@dpdk.org" , Jianfeng Tan , >>> Jingjing Wu , "Thotton, Shijith" >>> , Gregory Etelson , George >>> Prekas , "stable@dpdk.org" >>> Subject: Re: [PATCH] igb_uio: revert open and release operations >>>> >>>> -----Original Message----- >>>> From: Thomas Monjalon >>>> Date: Tuesday, October 17, 2017 at 1:33 PM >>>> To: Ferruh Yigit , Harish Patil >>>> >>>> Cc: "dev@dpdk.org" , Jianfeng Tan , >>>> Jingjing Wu , "Thotton, Shijith" >>>> , Gregory Etelson , George >>>> Prekas , "stable@dpdk.org" >>>> Subject: Re: [PATCH] igb_uio: revert open and release operations >>>> >>>>> 17/10/2017 22:14, Ferruh Yigit: >>>>>> There were bug reports about terminated application may leave device in >>>>>> undesired state: >>>>>> http://dpdk.org/ml/archives/dev/2016-November/049745.html >>>>>> http://dpdk.org/ml/archives/dev/2016-November/050932.html >>>>>> >>>>>> And a proposal to fix: >>>>>> http://dpdk.org/ml/archives/dev/2016-December/051844.html >>>>>> >>>>>> Later another proposal triggered the discussion: >>>>>> http://dpdk.org/ml/archives/dev/2017-May/066317.html >>>>>> >>>>>> Finally a fix patch pushed into v17.08: >>>>>> Commit: b58eedfc7dd5 ("igb_uio: issue FLR during open and release of >>>>>> device file") >>>>>> >>>>>> Later a regression report sent related to the pushed patch: >>>>>> http://dpdk.org/ml/archives/dev/2017-September/075236.html >>>>>> >>>>>> And a fix for regression integrated into v17.11-rc1: >>>>>> http://dpdk.org/ml/archives/dev/2017-October/079166.html >>>>>> Commit: 5f6ff30dc507 ("igb_uio: fix interrupt enablement after FLR in >>>>>> VM") >>>>>> Commit: 6b9ed026a870 ("igb_uio: fix build with kernel <= 3.17") >>>>>> >>>>>> Even after the fix qede PMD reported to be broken: >>>>>> http://dpdk.org/ml/archives/dev/2017-October/079359.html >>>>>> >>>>>> So this patch reverts original fix and related commits. The related >>>>>> igb_uio code part turns back to v17.05 base. >>>>> [...] >>>>>> --- >>>>>> It would be nice to solve this issue in LTS release, but being close to >>>>>> the release and the error report without details makes it hard to work >>>>>> more on this issue. >>>>> With this revert, we are back to the original issue. >>>>> We must really try the proposed solution. >>>>> Harish, please describe your issue and think how it could be fixed. >>>>> Jingjing made it work for i40e. >>>>> I know it is less effort to request a simple revert. >>>>> Please let's try to fix it once for all. >>>> Hi Ferruh/Thomas, >>>> I’m discussing it internally, so please hold on committing this patch till >>>> I revert back to you. >>>> >>>> Thanks. >>>> [Harish] >>>> 1) With the introduction of: >>>> Commit: b58eedfc7dd5 ("igb_uio: issue FLR during open and release of >>>> device file”) >>> We saw failures with qede PF & SR-IOV VF initialization in PCI passthru >>> mode. >>> >>>> PF PCI passthru mode initialization failure was resolved by: >>>> “Commit: 5f6ff30dc507 ("igb_uio: fix interrupt enablement after FLR in >>>> VM") >> Thank you for the update. >> >>>> SR-IOV VF PCI passthru mode initialization issue is that PCI FLR and >>>> related device cleanup is not completed by the time VF driver starts >>>> loading. It results in the mbox command failure sent over the HW channel >>>> between VF and PF. >> This seems same reason why i40e added a check and wait loop. >> >>>> Even though pci_reset_function() waits for the stipulated amount of time >>>> per standards, VF FLR takes longer than that and pci_reset_function() & >>>> igb_uio_open() call returns before FLR completes and VF PMD driver tries >>>> to load before FLR completes leading to VF PMD initialization failure. >>>> >>>> >>>> We can work around this problem by adding driver delay/retry logic since >>>> there is no deterministic way of detecting FLR completion. But this is >>>> going to increase the driver load time. >>>> >>>> 2) With the above patch ("igb_uio: issue FLR during open and release of >>>> device file), FLR is going to be issued unconditionally on all devices >>>> during igb_uio_open. We think it’s an over kill. FLR is required only for >>>> previous abnormal app termination. We already handle the abnormal app >>>> termination by doing necessary cleanup in the driver during load. This >>>> cleanup is more efficient as it is done only when required. So we feel >>>> that the drivers/devices needing such cleanup (the two cases listed >>>> below) should do it conditionally when required rather than >>>> igb_uio_open() unconditionally performing FLR all the time. >>>> >>>> e.g., >>> - cdb166963cae ("net/liquidio: add API for VF FLR”) >> Both 1) and 2) related to the pci_reset during open(). >> But the main functionality we are looking for is the pci_reset in release(). >> So we can remove reset during open() [1]. >> >> Will disabling pci_reset in open() [2] solve your problems? >> >> Jingjing, Jianfeng, >> What do you think? > > If [2] can work for all drivers, I agree with this option. OK, I will send this as a patch to make it easy to test and discuss. > > Thanks, > Jianfeng > >> >> >> [1] >> Perhaps will need to revert or partially revert liquidio patch after this. >> >> [2] >> disable line "pci_reset_function(dev);" in the igbuio_pci_open() >> http://dpdk.org/browse/dpdk/tree/lib/librte_eal/linuxapp/igb_uio/igb_uio.c#n339 >> >