From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mga11.intel.com (mga11.intel.com [192.55.52.93]) by dpdk.org (Postfix) with ESMTP id BC0671B23C; Fri, 20 Oct 2017 17:26:20 +0200 (CEST) Received: from fmsmga004.fm.intel.com ([10.253.24.48]) by fmsmga102.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 20 Oct 2017 08:26:18 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.43,405,1503385200"; d="scan'208";a="325659613" Received: from tanjianf-mobl.ccr.corp.intel.com (HELO [10.255.30.27]) ([10.255.30.27]) by fmsmga004.fm.intel.com with ESMTP; 20 Oct 2017 08:26:16 -0700 To: Ferruh Yigit , "Patil, Harish" , Thomas Monjalon References: <20171017201436.65270-1-ferruh.yigit@intel.com> <2156540.XK3Rr5JXAh@xps> Cc: "dev@dpdk.org" , Jingjing Wu , "Thotton, Shijith" , Gregory Etelson , George Prekas , "stable@dpdk.org" , Sergio Gonzalez Monroy , "Glynn, Michael J" From: "Tan, Jianfeng" Message-ID: <1f37e107-eb01-3a23-04a5-ba4c6e60d63d@intel.com> Date: Fri, 20 Oct 2017 23:26:15 +0800 User-Agent: Mozilla/5.0 (Windows NT 6.3; WOW64; rv:45.0) Gecko/20100101 Thunderbird/45.8.0 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 8bit Subject: Re: [dpdk-stable] [PATCH] igb_uio: revert open and release operations X-BeenThere: stable@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: patches for DPDK stable branches List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 20 Oct 2017 15:26:21 -0000 Hi Ferruh & Harish, On 10/20/2017 9:15 AM, Ferruh Yigit wrote: > On 10/19/2017 3:43 PM, Patil, Harish wrote: >> -----Original Message----- >> From: Harish Patil >> Date: Tuesday, October 17, 2017 at 9:50 PM >> To: Thomas Monjalon , Ferruh Yigit >> >> Cc: "dev@dpdk.org" , Jianfeng Tan , >> Jingjing Wu , "Thotton, Shijith" >> , Gregory Etelson , George >> Prekas , "stable@dpdk.org" >> Subject: Re: [PATCH] igb_uio: revert open and release operations >>> >>> -----Original Message----- >>> From: Thomas Monjalon >>> Date: Tuesday, October 17, 2017 at 1:33 PM >>> To: Ferruh Yigit , Harish Patil >>> >>> Cc: "dev@dpdk.org" , Jianfeng Tan , >>> Jingjing Wu , "Thotton, Shijith" >>> , Gregory Etelson , George >>> Prekas , "stable@dpdk.org" >>> Subject: Re: [PATCH] igb_uio: revert open and release operations >>> >>>> 17/10/2017 22:14, Ferruh Yigit: >>>>> There were bug reports about terminated application may leave device in >>>>> undesired state: >>>>> http://dpdk.org/ml/archives/dev/2016-November/049745.html >>>>> http://dpdk.org/ml/archives/dev/2016-November/050932.html >>>>> >>>>> And a proposal to fix: >>>>> http://dpdk.org/ml/archives/dev/2016-December/051844.html >>>>> >>>>> Later another proposal triggered the discussion: >>>>> http://dpdk.org/ml/archives/dev/2017-May/066317.html >>>>> >>>>> Finally a fix patch pushed into v17.08: >>>>> Commit: b58eedfc7dd5 ("igb_uio: issue FLR during open and release of >>>>> device file") >>>>> >>>>> Later a regression report sent related to the pushed patch: >>>>> http://dpdk.org/ml/archives/dev/2017-September/075236.html >>>>> >>>>> And a fix for regression integrated into v17.11-rc1: >>>>> http://dpdk.org/ml/archives/dev/2017-October/079166.html >>>>> Commit: 5f6ff30dc507 ("igb_uio: fix interrupt enablement after FLR in >>>>> VM") >>>>> Commit: 6b9ed026a870 ("igb_uio: fix build with kernel <= 3.17") >>>>> >>>>> Even after the fix qede PMD reported to be broken: >>>>> http://dpdk.org/ml/archives/dev/2017-October/079359.html >>>>> >>>>> So this patch reverts original fix and related commits. The related >>>>> igb_uio code part turns back to v17.05 base. >>>> [...] >>>>> --- >>>>> It would be nice to solve this issue in LTS release, but being close to >>>>> the release and the error report without details makes it hard to work >>>>> more on this issue. >>>> With this revert, we are back to the original issue. >>>> We must really try the proposed solution. >>>> Harish, please describe your issue and think how it could be fixed. >>>> Jingjing made it work for i40e. >>>> I know it is less effort to request a simple revert. >>>> Please let's try to fix it once for all. >>> Hi Ferruh/Thomas, >>> I’m discussing it internally, so please hold on committing this patch till >>> I revert back to you. >>> >>> Thanks. >>> [Harish] >>> 1) With the introduction of: >>> Commit: b58eedfc7dd5 ("igb_uio: issue FLR during open and release of >>> device file”) >> We saw failures with qede PF & SR-IOV VF initialization in PCI passthru >> mode. >> >>> PF PCI passthru mode initialization failure was resolved by: >>> “Commit: 5f6ff30dc507 ("igb_uio: fix interrupt enablement after FLR in >>> VM") > Thank you for the update. > >>> SR-IOV VF PCI passthru mode initialization issue is that PCI FLR and >>> related device cleanup is not completed by the time VF driver starts >>> loading. It results in the mbox command failure sent over the HW channel >>> between VF and PF. > This seems same reason why i40e added a check and wait loop. > >>> Even though pci_reset_function() waits for the stipulated amount of time >>> per standards, VF FLR takes longer than that and pci_reset_function() & >>> igb_uio_open() call returns before FLR completes and VF PMD driver tries >>> to load before FLR completes leading to VF PMD initialization failure. >>> >>> >>> We can work around this problem by adding driver delay/retry logic since >>> there is no deterministic way of detecting FLR completion. But this is >>> going to increase the driver load time. >>> >>> 2) With the above patch ("igb_uio: issue FLR during open and release of >>> device file), FLR is going to be issued unconditionally on all devices >>> during igb_uio_open. We think it’s an over kill. FLR is required only for >>> previous abnormal app termination. We already handle the abnormal app >>> termination by doing necessary cleanup in the driver during load. This >>> cleanup is more efficient as it is done only when required. So we feel >>> that the drivers/devices needing such cleanup (the two cases listed >>> below) should do it conditionally when required rather than >>> igb_uio_open() unconditionally performing FLR all the time. >>> >>> e.g., >> - cdb166963cae ("net/liquidio: add API for VF FLR”) > Both 1) and 2) related to the pci_reset during open(). > But the main functionality we are looking for is the pci_reset in release(). > So we can remove reset during open() [1]. > > Will disabling pci_reset in open() [2] solve your problems? > > Jingjing, Jianfeng, > What do you think? If [2] can work for all drivers, I agree with this option. Thanks, Jianfeng > > > [1] > Perhaps will need to revert or partially revert liquidio patch after this. > > [2] > disable line "pci_reset_function(dev);" in the igbuio_pci_open() > http://dpdk.org/browse/dpdk/tree/lib/librte_eal/linuxapp/igb_uio/igb_uio.c#n339 >