From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mga03.intel.com (mga03.intel.com [134.134.136.65]) by dpdk.org (Postfix) with ESMTP id A4BB34CBB for ; Wed, 14 Nov 2018 10:32:15 +0100 (CET) X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga001.fm.intel.com ([10.253.24.23]) by orsmga103.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 14 Nov 2018 01:32:13 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.56,231,1539673200"; d="scan'208";a="107951670" Received: from jguo15x-mobl.ccr.corp.intel.com (HELO [10.67.68.76]) ([10.67.68.76]) by fmsmga001.fm.intel.com with ESMTP; 14 Nov 2018 01:32:11 -0800 To: Thomas Monjalon Cc: dev@dpdk.org, Matan Azrad , "Ananyev, Konstantin" , "Burakov, Anatoly" , "Iremonger, Bernard" , "Wu, Jingjing" , "Lu, Wenzhuo" , "Yigit, Ferruh" , "Zhang, Helin" , "He, Shaopeng" References: <1541484436-91320-1-git-send-email-jia.guo@intel.com> <8bac8c10-d30b-ab93-1d6c-03e7d93b97c3@intel.com> <4481513.QvlXUHQEVQ@xps> From: Jeff Guo Message-ID: <0c5b7cbe-59e8-1e85-3205-f046d5a6dd0e@intel.com> Date: Wed, 14 Nov 2018 17:32:11 +0800 User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:60.0) Gecko/20100101 Thunderbird/60.3.0 MIME-Version: 1.0 In-Reply-To: <4481513.QvlXUHQEVQ@xps> Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 8bit Content-Language: en-US Subject: Re: [dpdk-dev] [PATCH 3/3] app/testpmd: fix callback issue for hot-unplug X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 14 Nov 2018 09:32:16 -0000 On 11/12/2018 9:35 AM, Thomas Monjalon wrote: > 11/11/2018 08:31, Matan Azrad: >> From: Jeff Guo >>> On 11/9/2018 1:24 PM, Matan Azrad wrote: >>>> From: Jeff Guo >>>>> On 11/8/2018 5:35 PM, Matan Azrad wrote: >>>>>> From: Jeff Guo >>>>>>> On 11/8/2018 3:28 PM, Matan Azrad wrote: >>>>>>>> From: Ananyev, Konstantin >>>>>>>>> From: Guo, Jia >>>>>>>>>> On 11/6/2018 2:36 PM, Matan Azrad wrote: >>>>>>>>>>> Hi Jeff >>>>>>>>>>> >>>>>>>>>>> From: Jeff Guo >>>>>>>>>>>> Before detach device when device be hot-unplugged, the failure >>>>>>>>>>>> process in user space and kernel space both need to be >>>>>>>>>>>> finished, such as eal interrupt callback need to be inactive >>>>>>>>>>>> before the callback be unregistered when device is being >>>>>>>>>>>> cleaned. This patch add rte alarm for device detaching, with >>>>>>>>>>>> that it could finish interrupt callback soon and give time to >>>>>>>>>>>> let the failure process done >>>>>>>>> before detaching. >>>>>>>>>>>> Fixes: 2049c5113fe8 ("app/testpmd: use hotplug failure >>>>>>>>>>>> handler") >>>>>>>>>>>> Signed-off-by: Jeff Guo >>>>>>>>>>>> --- >>>>>>>>>>>> app/test-pmd/testpmd.c | 13 ++++++++++++- >>>>>>>>>>>> 1 file changed, 12 insertions(+), 1 deletion(-) >>>>>>>>>>>> >>>>>>>>>>>> diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c >>>>>>>>>>>> index 9c0edca..9c673cf 100644 >>>>>>>>>>>> --- a/app/test-pmd/testpmd.c >>>>>>>>>>>> +++ b/app/test-pmd/testpmd.c >>>>>>>>>>>> @@ -2620,7 +2620,18 @@ eth_dev_event_callback(const char >>>>>>>>>>>> *device_name, enum rte_dev_event_type type, >>>>>>>>>>>> device_name); >>>>>>>>>>>> return; >>>>>>>>>>>> } >>>>>>>>>>>> - rmv_event_callback((void *)(intptr_t)port_id); >>>>>>>>>>>> + /* >>>>>>>>>>>> + * Before detach device, the hot-unplug failure >>>>>>> process in >>>>>>>>>>>> + * user space and kernel space both need to be >>>>>>> finished, >>>>>>>>>>>> + * such as eal interrupt callback need to be inactive >>>>>>> before >>>>>>>>>>>> + * the callback be unregistered when device is being >>>>>>> cleaned. >>>>>>>>>>>> + * So finished interrupt callback soon here and give >>>>>>> time to >>>>>>>>>>>> + * let the work done before detaching. >>>>>>>>>>>> + */ >>>>>>>>>>>> + if (rte_eal_alarm_set(100000, >>>>>>>>>>>> + rmv_event_callback, (void >>>>>>>>>>>> *)(intptr_t)port_id)) >>>>>>>>>>>> + RTE_LOG(ERR, EAL, >>>>>>>>>>>> + "Could not set up deferred device >>>>>>>>>>> It looks me strange to use callback and alarm to remove a device: >>>>>>>>>>> Why not to use callback and that is it? >>>>>>>>>>> >>>>>>>>>>> I think that it's better to let the EAL to detach the device >>>>>>>>>>> after all the >>>>>>>>> callbacks were done and not to do it by the user callback. >>>>>>>>>>> So the application\callback owners just need to clean its >>>>>>>>>>> resources with understanding that after the callback the >>>>>>>>>>> device(and the callback >>>>>>>>>> itself) will be detached by the EAL. >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> Firstly, at the currently framework and solution, such as >>>>>>>>>> callback for RTE_ETH_EVENT_INTR_RMV, still need to use the >>>>>>>>>> deferred device >>>>>>>>> removal, >>>>>>>>>> we tend to give the control of detaching device to the >>>>>>>>>> application, and the whole process is located on the user's >>>>>>>>>> callback. Notify app to detach device by callback but make it >>>>>>>>>> deferred, >>>>> i think it is fine. >>>>>>>> But the device must be detached in remove event, why not to do it >>>>>>>> in >>>>> EAL? >>>>>>> I think it because of before detached the device, application >>>>>>> should be stop the forwarding at first, then stop the device, then >>>>>>> close >>>>>>> >>>>>>> the device, finally call eal unplug API to detach device. If eal >>>>>>> directly detach device at the first step, there will be mountain >>>>>>> user space error need to handle, so that is one reason why need to >>>>>>> provider the remove notification to app, and let app to process it. >>>>>> This is why the EAL need to detach the device only after all the >>>>>> user >>>>> callbacks were done. >>>>> >>>>> >>>>> If i correctly got your meaning, you suppose to let eal to mandatory >>>>> detach device but not app, app just need to stop/close port, right? >>>> Yes, the app should stop,close,clean its own resources of the removed >>>> device, Then, EAL to detach the device. >>>> >>>>> If so, it will need to modify rmv_event_callback by not invoke the >>>>> detaching func and add some detaching logic to hotplug framework in eal. >>>>> >>>> rmv_event_callback is using by other hotplug mechanism (ETHDEV RMV >>> event), so you need to use another one\ add parameter to it. >>>> And yes, you need to detach the device from EAL, should be simple. >>> >>> I think rmv_event_callback is original use for other hotplug event (ETHDEV >>> RMV event), but it still use the common hotplug mechanism(app callback and >>> app detach), >>> >>> so i think it will still need to face this callback issue and you could check that >>> eth_event_callback also use the rte alarm to detach device. >> The ETHDEV layer cannot know if more than one ethdev port is associated to the rte_device, >> So, it is not correct to detach a rte_device by the ETHDEV layer. Hence, the application should decide in the ETHDEV event case. > Yes, we must not detach a device (rte_dev_remove) if we are not sure all ports are closed. > >> But, in the EAL event case this is different as I explained before. >> >> Moreover, I think that the ethdev RMV event should be deprecated one day, after the EAL hot-plug mechanism will be stable. > Yes, we may replace the ethdev RMV event by the EAL event RTE_DEV_EVENT_REMOVE. > >>> so my suggestion is that, you maybe propose a good idea but let we keep on >>> current mechanism until we come to a final good solution agreement, before >>> that, just let it functional. >> I still suggest to fix the issue by an EAL detaching. > I don't understand the issue, but yes we can call rte_dev_remove in EAL, > after the callback return. > Ideally, the callback should be able to return the decision of cleaning > the device or not. > > I suggest several steps (a roadmap for HW unplug): > > I mainly agree on that we need to modify currently hotplug framework for HW hot-unplug. But i think we might be think about that. 1) simple delete detach_port_device from rmv_event_callback will affect both currently ethdev event and eal event hot-unplug mechanism. 2) If we remove call to rmv_event_callback in eth_dev_event_callback() , we also need to remove call to rmv_event_callback in eth_event_callback(),  since it also use deferred device removal to waiting the interrupt callback be unregistered in mlx5_dev_interrupt_handler_uninstall. so i think that basically the multiple port free and the hotplug framework integrate should be an RFC at the begin of 19.02, but should not affect currently mechanism. And i suggest the several steps might be better as. - in 18.11, rename eth_dev_event_callback to dev_event_callback - in 18.11, rename rmv_event_callback() to rmv_port_callback(), keep current function calling when detach one port both for ethdev event and eal event. (we could document it for the limitation.) - in 19.02, remove call to detach_port_device() from rmv_event_callback() - in 19.02, call rte_dev_remove() in EAL after RTE_DEV_EVENT_REMOVE callback - in 19.02, remove call to rmv_event_callback() in eth_dev_event_callback() - in 19.02, convert all PMDs to free port resources on rte_eth_dev_close() - in 19.05, automatically call rte_dev_remove() from PMD when closing last port - in 19.08, we may try to unbind the kernel driver in rte_dev_remove()