From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <thomas@monjalon.net>
Received: from out5-smtp.messagingengine.com (out5-smtp.messagingengine.com
 [66.111.4.29]) by dpdk.org (Postfix) with ESMTP id DCBC491
 for <dev@dpdk.org>; Mon, 12 Nov 2018 02:35:46 +0100 (CET)
Received: from compute1.internal (compute1.nyi.internal [10.202.2.41])
 by mailout.nyi.internal (Postfix) with ESMTP id 5838D20EA9;
 Sun, 11 Nov 2018 20:35:45 -0500 (EST)
Received: from mailfrontend1 ([10.202.2.162])
 by compute1.internal (MEProxy); Sun, 11 Nov 2018 20:35:45 -0500
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=monjalon.net; h=
 from:to:cc:subject:date:message-id:in-reply-to:references
 :mime-version:content-transfer-encoding:content-type; s=mesmtp;
 bh=vvZ2F3/9p3Nwf5pRszFzIYHDnZEGiswav8B3H5LCD6o=; b=k5OtEVPUK+jP
 649A0X5rGQxh+7OKjHMkyWHX1Bx+fjNn459/axzbV5uHrKc9AzfGmyqusBdmuzdv
 oS1r4WfbVnBzPU9SBKPrvwlMTm/Wr8+S+LDF3zLmui4f0MVvXtbJz+/7Ae9OFr/9
 YmrmRl9Fv11Egcjd6NJ/siuRU5ZDU8s=
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=
 messagingengine.com; h=cc:content-transfer-encoding:content-type
 :date:from:in-reply-to:message-id:mime-version:references
 :subject:to:x-me-proxy:x-me-proxy:x-me-sender:x-me-sender
 :x-sasl-enc; s=fm1; bh=vvZ2F3/9p3Nwf5pRszFzIYHDnZEGiswav8B3H5LCD
 6o=; b=AhLzv59aWRCX+t/B+X7all8hg872pi3qFjCf1txIPvqwi6kVMRrXrbvJx
 JpsIwA6InfcEoIu9TQjio11aA5Mglw6KMoPUiYcivSxV7RLOQcgr5m94PhuG0Zxr
 cccyv4xZLgRaTxxtOzBkD516j5LuqfPGedltubDRcyhdoIoG2eMgoFjqrlSawK+J
 QB3qJVWT1UxuuEQC4gWCh512LfGp2nwJVruQknfvI0aJAREkZctWS+HVTy+qoztl
 MYGD0Hevdom/lvwyEtxyPdzyJgtyjJi9d2HT76zonDy8ODCDqdJPxh60V35pFQkT
 FfMkTdwOddjzb5QOZEcgQZx026jSg==
X-ME-Sender: <xms:8NjoWwVC-F0JZVeX-OsKp8cputSIGF_8JT34pkvvwKEiojfGualm9g>
X-ME-Proxy: <xmx:8NjoWzrzVNSfGzXBP4DONGouJgBRy4blBcvbdgrPKnZekd4OJPyH9Q>
 <xmx:8NjoW92hcnSZB6anoRbXbRJP9o__q5JwFogwxB2T3uy-g5ceTxtXIw>
 <xmx:8NjoWyDC0ZcwMYiCAkL6hQ1ofvRBPj-KJF3Ie9BCFAM0Oo0HEjmmBQ>
 <xmx:8NjoW1fl_rfj3JxOHW5nc0AuDho4GfdAuGwsa6GRpNQPhKgtyfyPdA>
 <xmx:8NjoW3i7X9XnKXIPLeKCoENX8-qLoAoSNoe02oCjUNBkvWPW0iEhug>
 <xmx:8djoW99sNg_5P1KBvJrSCqJ--U3yniDwst_5hmT_uYfjxifDpU6frg>
Received: from xps.localnet (184.203.134.77.rev.sfr.net [77.134.203.184])
 by mail.messagingengine.com (Postfix) with ESMTPA id 78AEFE476E;
 Sun, 11 Nov 2018 20:35:43 -0500 (EST)
From: Thomas Monjalon <thomas@monjalon.net>
To: Jeff Guo <jia.guo@intel.com>
Cc: dev@dpdk.org, Matan Azrad <matan@mellanox.com>, "Ananyev,
 Konstantin" <konstantin.ananyev@intel.com>, "Burakov,
 Anatoly" <anatoly.burakov@intel.com>, "Iremonger,
 Bernard" <bernard.iremonger@intel.com>, "Wu, Jingjing" <jingjing.wu@intel.com>,
 "Lu, Wenzhuo" <wenzhuo.lu@intel.com>, "Yigit, Ferruh" <ferruh.yigit@intel.com>,
 "Zhang, Helin" <helin.zhang@intel.com>, "He, Shaopeng" <shaopeng.he@intel.com>
Date: Mon, 12 Nov 2018 02:35:42 +0100
Message-ID: <4481513.QvlXUHQEVQ@xps>
In-Reply-To: <AM0PR0502MB401938411A7E2BA9E76576A2D2C00@AM0PR0502MB4019.eurprd05.prod.outlook.com>
References: <1541484436-91320-1-git-send-email-jia.guo@intel.com>
 <8bac8c10-d30b-ab93-1d6c-03e7d93b97c3@intel.com>
 <AM0PR0502MB401938411A7E2BA9E76576A2D2C00@AM0PR0502MB4019.eurprd05.prod.outlook.com>
MIME-Version: 1.0
Content-Transfer-Encoding: 7Bit
Content-Type: text/plain; charset="us-ascii"
Subject: Re: [dpdk-dev] [PATCH 3/3] app/testpmd: fix callback issue for
	hot-unplug
X-BeenThere: dev@dpdk.org
X-Mailman-Version: 2.1.15
Precedence: list
List-Id: DPDK patches and discussions <dev.dpdk.org>
List-Unsubscribe: <https://mails.dpdk.org/options/dev>,
 <mailto:dev-request@dpdk.org?subject=unsubscribe>
List-Archive: <http://mails.dpdk.org/archives/dev/>
List-Post: <mailto:dev@dpdk.org>
List-Help: <mailto:dev-request@dpdk.org?subject=help>
List-Subscribe: <https://mails.dpdk.org/listinfo/dev>,
 <mailto:dev-request@dpdk.org?subject=subscribe>
X-List-Received-Date: Mon, 12 Nov 2018 01:35:47 -0000

11/11/2018 08:31, Matan Azrad:
> From: Jeff Guo 
> > On 11/9/2018 1:24 PM, Matan Azrad wrote:
> > >   From: Jeff Guo
> > >> On 11/8/2018 5:35 PM, Matan Azrad wrote:
> > >>> From: Jeff Guo
> > >>>> On 11/8/2018 3:28 PM, Matan Azrad wrote:
> > >>>>> From: Ananyev, Konstantin
> > >>>>>> From: Guo, Jia
> > >>>>>>> On 11/6/2018 2:36 PM, Matan Azrad wrote:
> > >>>>>>>> Hi Jeff
> > >>>>>>>>
> > >>>>>>>>      From: Jeff Guo <jia.guo@intel.com>
> > >>>>>>>>> Before detach device when device be hot-unplugged, the failure
> > >>>>>>>>> process in user space and kernel space both need to be
> > >>>>>>>>> finished, such as eal interrupt callback need to be inactive
> > >>>>>>>>> before the callback be unregistered when device is being
> > >>>>>>>>> cleaned. This patch add rte alarm for device detaching, with
> > >>>>>>>>> that it could finish interrupt callback soon and give time to
> > >>>>>>>>> let the failure process done
> > >>>>>> before detaching.
> > >>>>>>>>> Fixes: 2049c5113fe8 ("app/testpmd: use hotplug failure
> > >>>>>>>>> handler")
> > >>>>>>>>> Signed-off-by: Jeff Guo <jia.guo@intel.com>
> > >>>>>>>>> ---
> > >>>>>>>>>      app/test-pmd/testpmd.c | 13 ++++++++++++-
> > >>>>>>>>>      1 file changed, 12 insertions(+), 1 deletion(-)
> > >>>>>>>>>
> > >>>>>>>>> diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
> > >>>>>>>>> index 9c0edca..9c673cf 100644
> > >>>>>>>>> --- a/app/test-pmd/testpmd.c
> > >>>>>>>>> +++ b/app/test-pmd/testpmd.c
> > >>>>>>>>> @@ -2620,7 +2620,18 @@ eth_dev_event_callback(const char
> > >>>>>>>>> *device_name, enum rte_dev_event_type type,
> > >>>>>>>>>      				device_name);
> > >>>>>>>>>      			return;
> > >>>>>>>>>      		}
> > >>>>>>>>> -		rmv_event_callback((void *)(intptr_t)port_id);
> > >>>>>>>>> +		/*
> > >>>>>>>>> +		 * Before detach device, the hot-unplug failure
> > >>>> process in
> > >>>>>>>>> +		 * user space and kernel space both need to be
> > >>>> finished,
> > >>>>>>>>> +		 * such as eal interrupt callback need to be inactive
> > >>>> before
> > >>>>>>>>> +		 * the callback be unregistered when device is being
> > >>>> cleaned.
> > >>>>>>>>> +		 * So finished interrupt callback soon here and give
> > >>>> time to
> > >>>>>>>>> +		 * let the work done before detaching.
> > >>>>>>>>> +		 */
> > >>>>>>>>> +		if (rte_eal_alarm_set(100000,
> > >>>>>>>>> +				rmv_event_callback, (void
> > >>>>>>>>> *)(intptr_t)port_id))
> > >>>>>>>>> +			RTE_LOG(ERR, EAL,
> > >>>>>>>>> +				"Could not set up deferred device
> > >>>>>>>> It looks me strange to use callback and alarm to remove a device:
> > >>>>>>>> Why not to use callback and that is it?
> > >>>>>>>>
> > >>>>>>>> I think that it's better to let the EAL to detach the device
> > >>>>>>>> after all the
> > >>>>>> callbacks were done and not to do it by the user callback.
> > >>>>>>>> So the application\callback owners just need to clean its
> > >>>>>>>> resources with understanding that after the callback the
> > >>>>>>>> device(and the callback
> > >>>>>>> itself) will be detached by the EAL.
> > >>>>>>>
> > >>>>>>>
> > >>>>>>> Firstly, at the currently framework and solution, such as
> > >>>>>>> callback for RTE_ETH_EVENT_INTR_RMV, still need to use the
> > >>>>>>> deferred device
> > >>>>>> removal,
> > >>>>>>> we tend to give the control of detaching device to the
> > >>>>>>> application, and the whole process is located on the user's
> > >>>>>>> callback. Notify app to detach device by callback but make it
> > >>>>>>> deferred,
> > >> i think it is fine.
> > >>>>> But the device must be detached in remove event, why not to do it
> > >>>>> in
> > >> EAL?
> > >>>> I think it because of before detached the device, application
> > >>>> should be stop the forwarding at first, then stop the device, then
> > >>>> close
> > >>>>
> > >>>> the device, finally call eal unplug API to detach device. If eal
> > >>>> directly detach device at the first step, there will be mountain
> > >>>> user space error need to handle, so that is one reason why need to
> > >>>> provider the remove notification to app, and let app to process it.
> > >>> This is why the EAL need to detach the device only after all the
> > >>> user
> > >> callbacks were done.
> > >>
> > >>
> > >> If i correctly got your meaning, you suppose to let eal to mandatory
> > >> detach device but not app, app just need to stop/close port, right?
> > > Yes, the app should stop,close,clean its own resources of the removed
> > > device, Then, EAL to detach the device.
> > >
> > >> If so, it will need to modify rmv_event_callback by not invoke the
> > >> detaching func and add some detaching logic to hotplug framework in eal.
> > >>
> > > rmv_event_callback is using by other hotplug mechanism (ETHDEV RMV
> > event), so you need to use another one\ add parameter to it.
> > > And yes, you need to detach the device from EAL, should be simple.
> > 
> > 
> > I think rmv_event_callback is original use for other hotplug event (ETHDEV
> > RMV event), but it still use the common hotplug mechanism(app callback and
> > app detach),
> > 
> > so i think it will still need to face this callback issue and you could check that
> > eth_event_callback also use the rte alarm to detach device.
> 
> The ETHDEV layer cannot know if more than one ethdev port is associated to the rte_device,
> So, it is not correct to detach a rte_device by the ETHDEV layer. Hence, the application should decide in the ETHDEV event case.

Yes, we must not detach a device (rte_dev_remove) if we are not sure all ports are closed.

> But, in the EAL event case this is different as I explained before.
> 
> Moreover, I think that the ethdev RMV event should be deprecated one day, after the EAL hot-plug mechanism will be stable.

Yes, we may replace the ethdev RMV event by the EAL event RTE_DEV_EVENT_REMOVE.

> > so my suggestion is that, you maybe propose a good idea but let we keep on
> > current mechanism until we come to a final good solution agreement, before
> > that, just let it functional.
> 
> I still suggest to fix the issue by an EAL detaching.

I don't understand the issue, but yes we can call rte_dev_remove in EAL,
after the callback return.
Ideally, the callback should be able to return the decision of cleaning
the device or not.

I suggest several steps (a roadmap for HW unplug):
	- in 18.11, remove call to detach_port_device() from rmv_event_callback()
	- in 18.11, call rte_dev_remove() in EAL after RTE_DEV_EVENT_REMOVE callback
	- in 18.11, remove call to rmv_event_callback() in eth_dev_event_callback()
	- in 18.11, rename eth_dev_event_callback to dev_event_callback
	- in 19.02, convert all PMDs to free port resources on rte_eth_dev_close()
	- in 19.05, automatically call rte_dev_remove() from PMD when closing last port
	- in 19.08, we may try to unbind the kernel driver in rte_dev_remove()