<?
$subject_val = "Re: [OMPI users] Open MPI 1.10.0: Works on one Sandybridge Node, not on another: tcp_peer_send_blocking";
include("../../include/msg-header.inc");
?>
<!-- received="Thu Sep 24 12:30:08 2015" -->
<!-- isoreceived="20150924163008" -->
<!-- sent="Thu, 24 Sep 2015 09:30:02 -0700" -->
<!-- isosent="20150924163002" -->
<!-- name="Ralph Castain" -->
<!-- email="rhc_at_[hidden]" -->
<!-- subject="Re: [OMPI users] Open MPI 1.10.0: Works on one Sandybridge Node, not on another: tcp_peer_send_blocking" -->
<!-- id="EF6116E9-10D2-49C2-9A99-56C3F47CE7E9_at_open-mpi.org" -->
<!-- charset="utf-8" -->
<!-- inreplyto="CAFb48S_x7TGL9+aqQC27mN==7_Ho=AwkeTP9SV5NsWoEfqJ2aw_at_mail.gmail.com" -->
<!-- expires="-1" -->
<div class="center">
<table border="2" width="100%" class="links">
<tr>
<th><a href="date.php">Date view</a></th>
<th><a href="index.php">Thread view</a></th>
<th><a href="subject.php">Subject view</a></th>
<th><a href="author.php">Author view</a></th>
</tr>
</table>
</div>
<p class="headers">
<strong>Subject:</strong> Re: [OMPI users] Open MPI 1.10.0: Works on one Sandybridge Node, not on another: tcp_peer_send_blocking<br>
<strong>From:</strong> Ralph Castain (<em>rhc_at_[hidden]</em>)<br>
<strong>Date:</strong> 2015-09-24 12:30:02
</p>
<ul class="links">
<!-- next="start" -->
<li><strong>Next message:</strong> <a href="27675.php">Yang Zhang: "[OMPI users] How does MPI_Allreduce work?"</a>
<li><strong>Previous message:</strong> <a href="27673.php">Matt Thompson: "Re: [OMPI users] Open MPI 1.10.0: Works on one Sandybridge Node, not on another: tcp_peer_send_blocking"</a>
<li><strong>In reply to:</strong> <a href="27673.php">Matt Thompson: "Re: [OMPI users] Open MPI 1.10.0: Works on one Sandybridge Node, not on another: tcp_peer_send_blocking"</a>
<!-- nextthread="start" -->
<!-- reply="end" -->
</ul>
<hr>
<!-- body="start" -->
<p>
Try adding --novm to that cmd line - if all the nodes are topologically identical (minus the GPU), then this will speed things up.
<br>
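<p>
For example, building on the oob_tcp_if_include command that worked for you below, that would look something like:
<br>
<br>
mpirun --novm --mca oob_tcp_if_include ib0 -np 2 ./helloWorld.x
<br>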
<p><p><span class="quotelev1">&gt; On Sep 24, 2015, at 9:26 AM, Matt Thompson &lt;fortran_at_[hidden]&gt; wrote:
</span><br>
<span class="quotelev1">&gt; 
</span><br>
<span class="quotelev1">&gt; On Thu, Sep 24, 2015 at 12:10 PM, Ralph Castain &lt;rhc_at_[hidden] &lt;mailto:rhc_at_[hidden]&gt;&gt; wrote:
</span><br>
<span class="quotelev1">&gt; Ah, sorry - wrong param. It&#226;&#128;&#153;s the out-of-band that is having the problem. Try adding &#226;&#128;&#148;mca oob_tcp_if_include &lt;foo&gt;
</span><br>
<span class="quotelev1">&gt; 
</span><br>
<span class="quotelev1">&gt; Ooh. Okay. Look at this:
</span><br>
<span class="quotelev1">&gt; 
</span><br>
<span class="quotelev1">&gt; (13) $ mpirun --mca oob_tcp_if_include ib0 -np 2 ./helloWorld.x
</span><br>
<span class="quotelev1">&gt; Process 1 of 2 is on r509i2n17 
</span><br>
<span class="quotelev1">&gt; Process 0 of 2 is on r509i2n17 
</span><br>
<span class="quotelev1">&gt; 
</span><br>
<span class="quotelev1">&gt; So that is nice. Now the spin up if I have 8 or so nodes is rather...slow. But at this point I'll take working over efficient. Quick startup can come later.
</span><br>
<span class="quotelev1">&gt; 
</span><br>
<span class="quotelev1">&gt; Matt
</span><br>
<span class="quotelev1">&gt; 
</span><br>
<span class="quotelev1">&gt;  
</span><br>
<span class="quotelev1">&gt; 
</span><br>
<span class="quotelev1">&gt; 
</span><br>
<span class="quotelev2">&gt;&gt; On Sep 24, 2015, at 8:56 AM, Matt Thompson &lt;fortran_at_[hidden] &lt;mailto:fortran_at_[hidden]&gt;&gt; wrote:
</span><br>
<span class="quotelev2">&gt;&gt; 
</span><br>
<span class="quotelev2">&gt;&gt; Ralph,
</span><br>
<span class="quotelev2">&gt;&gt; 
</span><br>
<span class="quotelev2">&gt;&gt; I believe these nodes might have both an Ethernet and Infiniband port where the Ethernet port is not the one to use. Is there a way to tell Open MPI to ignore any ethernet devices it sees? I've tried:
</span><br>
<span class="quotelev2">&gt;&gt; --mca btl sm,openib,self
</span><br>
<span class="quotelev2">&gt;&gt; and (based on the advice of the much more intelligent support at NAS):
</span><br>
<span class="quotelev2">&gt;&gt; --mca btl openib,self --mca btl_openib_if_include mlx4_0,mlx4_1
</span><br>
<span class="quotelev2">&gt;&gt; But neither worked.
</span><br>
<span class="quotelev2">&gt;&gt; 
</span><br>
<span class="quotelev2">&gt;&gt; Matt
</span><br>
<span class="quotelev2">&gt;&gt; 
</span><br>
<span class="quotelev2">&gt;&gt; 
</span><br>
<span class="quotelev2">&gt;&gt; On Thu, Sep 24, 2015 at 11:41 AM, Ralph Castain &lt;rhc_at_[hidden] &lt;mailto:rhc_at_[hidden]&gt;&gt; wrote:
</span><br>
<span class="quotelev2">&gt;&gt; Starting in the 1.7 series, OMPI by default launches daemons on all nodes in the allocation during startup. This is done so we can &#226;&#128;&#156;probe&#226;&#128;&#157; the topology of the nodes and use that info during the process mapping procedure - e.g., if you want to map-by NUMA regions.
</span><br>
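<span class="quotelev2">&gt;&gt; For illustration (a sketch, not a command from this thread), such a mapping policy would be requested with something like:
</span><br>
<span class="quotelev2">&gt;&gt; mpirun --map-by numa -np 2 ./helloWorld.x
</span><br>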
<span class="quotelev2">&gt;&gt; 
</span><br>
<span class="quotelev2">&gt;&gt; What is happening here is that some of the nodes in your allocation aren&#226;&#128;&#153;t allowing those daemons to callback to mpirun. Either a firewall is in the way, or something is preventing it.
</span><br>
<span class="quotelev2">&gt;&gt; 
</span><br>
<span class="quotelev2">&gt;&gt; If you don&#226;&#128;&#153;t want to launch on those other nodes, you could just add &#226;&#128;&#148;novm to your cmd line, or use the &#226;&#128;&#148;host option to restrict us to your local node. However, I imagine you got the bigger allocation so you could use it :-)
</span><br>
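<span class="quotelev2">&gt;&gt; For example (a sketch only), restricting the run to the local node would look something like:
</span><br>
<span class="quotelev2">&gt;&gt; mpirun --host &lt;your-local-node&gt; -np 2 ./helloWorld.x
</span><br>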
<span class="quotelev2">&gt;&gt; 
</span><br>
<span class="quotelev2">&gt;&gt; In which case, you need to remove the obstacle. You might check for firewall, or check to see if multiple NICs are on the non-maia nodes (this can sometimes confuse things, especially if someone put the NICs on the same IP subnet)
</span><br>
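<span class="quotelev2">&gt;&gt; One quick way to check (just a suggestion) is to list the interfaces on one of the non-maia nodes and look for two NICs configured on the same IP subnet, e.g.:
</span><br>
<span class="quotelev2">&gt;&gt; ip addr show    (or: ifconfig -a)
</span><br>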
<span class="quotelev2">&gt;&gt; 
</span><br>
<span class="quotelev2">&gt;&gt; HTH
</span><br>
<span class="quotelev2">&gt;&gt; Ralph
</span><br>
<span class="quotelev2">&gt;&gt; 
</span><br>
<span class="quotelev2">&gt;&gt; 
</span><br>
<span class="quotelev2">&gt;&gt; 
</span><br>
<span class="quotelev3">&gt;&gt;&gt; On Sep 24, 2015, at 8:18 AM, Matt Thompson &lt;fortran_at_[hidden] &lt;mailto:fortran_at_[hidden]&gt;&gt; wrote:
</span><br>
<span class="quotelev3">&gt;&gt;&gt; 
</span><br>
<span class="quotelev3">&gt;&gt;&gt; Open MPI Users,
</span><br>
<span class="quotelev3">&gt;&gt;&gt; 
</span><br>
<span class="quotelev3">&gt;&gt;&gt; I'm hoping someone here can help. I built Open MPI 1.10.0 with PGI 15.7 using this configure string:
</span><br>
<span class="quotelev3">&gt;&gt;&gt; 
</span><br>
<span class="quotelev3">&gt;&gt;&gt;  ./configure --disable-vt --with-tm=/PBS --with-verbs --disable-wrapper-rpath \
</span><br>
<span class="quotelev3">&gt;&gt;&gt;     CC=pgcc CXX=pgCC FC=pgf90 F77=pgf77 CFLAGS='-fpic -m64' \
</span><br>
<span class="quotelev3">&gt;&gt;&gt;     CXXFLAGS='-fpic -m64' FCFLAGS='-fpic -m64' FFLAGS='-fpic -m64' \
</span><br>
<span class="quotelev3">&gt;&gt;&gt;     --prefix=/nobackup/gmao_SIteam/MPI/pgi_15.7-openmpi_1.10.0 |&amp; tee configure.pgi15.7.log
</span><br>
<span class="quotelev3">&gt;&gt;&gt; 
</span><br>
<span class="quotelev3">&gt;&gt;&gt; It seemed to pass 'make check'. 
</span><br>
<span class="quotelev3">&gt;&gt;&gt; 
</span><br>
<span class="quotelev3">&gt;&gt;&gt; I'm working at pleiades at NAS, and there they have both Sandy Bridge nodes with GPUs (maia) and regular Sandy Bridge compute nodes (here after called Sandy) without. To be extra careful (since PGI compiles to the architecture you build on) I took a Westmere node and built Open MPI there just in case.
</span><br>
<span class="quotelev3">&gt;&gt;&gt; 
</span><br>
<span class="quotelev3">&gt;&gt;&gt; So, as I said, all seems to work with a test. I now grab a maia node, maia1, of an allocation of 4 I had:
</span><br>
<span class="quotelev3">&gt;&gt;&gt; 
</span><br>
<span class="quotelev3">&gt;&gt;&gt; (102) $ mpicc -tp=px-64 -o helloWorld.x helloWorld.c
</span><br>
<span class="quotelev3">&gt;&gt;&gt; (103) $ mpirun -np 2 ./helloWorld.x
</span><br>
<span class="quotelev3">&gt;&gt;&gt; Process 0 of 2 is on maia1 
</span><br>
<span class="quotelev3">&gt;&gt;&gt; Process 1 of 2 is on maia1 
</span><br>
<span class="quotelev3">&gt;&gt;&gt; 
</span><br>
<span class="quotelev3">&gt;&gt;&gt; Good. Now, let's go to a Sandy Bridge (non-GPU) node, r321i7n16, of an allocation of 8 I had:
</span><br>
<span class="quotelev3">&gt;&gt;&gt; 
</span><br>
<span class="quotelev3">&gt;&gt;&gt; (49) $ mpicc -tp=px-64 -o helloWorld.x helloWorld.c
</span><br>
<span class="quotelev3">&gt;&gt;&gt; (50) $ mpirun -np 2 ./helloWorld.x
</span><br>
<span class="quotelev3">&gt;&gt;&gt; [r323i5n11:13063] [[62995,0],7] tcp_peer_send_blocking: send() to socket 9 failed: Broken pipe (32)
</span><br>
<span class="quotelev3">&gt;&gt;&gt; [r323i5n6:57417] [[62995,0],2] tcp_peer_send_blocking: send() to socket 9 failed: Broken pipe (32)
</span><br>
<span class="quotelev3">&gt;&gt;&gt; [r323i5n7:67287] [[62995,0],3] tcp_peer_send_blocking: send() to socket 9 failed: Broken pipe (32)
</span><br>
<span class="quotelev3">&gt;&gt;&gt; [r323i5n8:57429] [[62995,0],4] tcp_peer_send_blocking: send() to socket 9 failed: Broken pipe (32)
</span><br>
<span class="quotelev3">&gt;&gt;&gt; [r323i5n10:35329] [[62995,0],6] tcp_peer_send_blocking: send() to socket 9 failed: Broken pipe (32)
</span><br>
<span class="quotelev3">&gt;&gt;&gt; [r323i5n9:13456] [[62995,0],5] tcp_peer_send_blocking: send() to socket 9 failed: Broken pipe (32)
</span><br>
<span class="quotelev3">&gt;&gt;&gt; 
</span><br>
<span class="quotelev3">&gt;&gt;&gt; Hmm. Let's try turning off tcp (often my first thought when on an Infiniband system):
</span><br>
<span class="quotelev3">&gt;&gt;&gt; 
</span><br>
<span class="quotelev3">&gt;&gt;&gt; (51) $ mpirun --mca btl sm,openib,self -np 2 ./helloWorld.x
</span><br>
<span class="quotelev3">&gt;&gt;&gt; [r323i5n6:57420] [[62996,0],2] tcp_peer_send_blocking: send() to socket 9 failed: Broken pipe (32)
</span><br>
<span class="quotelev3">&gt;&gt;&gt; [r323i5n9:13459] [[62996,0],5] tcp_peer_send_blocking: send() to socket 9 failed: Broken pipe (32)
</span><br>
<span class="quotelev3">&gt;&gt;&gt; [r323i5n8:57432] [[62996,0],4] tcp_peer_send_blocking: send() to socket 9 failed: Broken pipe (32)
</span><br>
<span class="quotelev3">&gt;&gt;&gt; [r323i5n7:67290] [[62996,0],3] tcp_peer_send_blocking: send() to socket 9 failed: Broken pipe (32)
</span><br>
<span class="quotelev3">&gt;&gt;&gt; [r323i5n11:13066] [[62996,0],7] tcp_peer_send_blocking: send() to socket 9 failed: Broken pipe (32)
</span><br>
<span class="quotelev3">&gt;&gt;&gt; [r323i5n10:35332] [[62996,0],6] tcp_peer_send_blocking: send() to socket 9 failed: Broken pipe (32)
</span><br>
<span class="quotelev3">&gt;&gt;&gt; 
</span><br>
<span class="quotelev3">&gt;&gt;&gt; Now, the nodes reporting the issue seem to be the &quot;other&quot; nodes on the allocation that are in a different rack:
</span><br>
<span class="quotelev3">&gt;&gt;&gt; 
</span><br>
<span class="quotelev3">&gt;&gt;&gt; (52) $ cat $PBS_NODEFILE | uniq
</span><br>
<span class="quotelev3">&gt;&gt;&gt; r321i7n16
</span><br>
<span class="quotelev3">&gt;&gt;&gt; r321i7n17
</span><br>
<span class="quotelev3">&gt;&gt;&gt; r323i5n6
</span><br>
<span class="quotelev3">&gt;&gt;&gt; r323i5n7
</span><br>
<span class="quotelev3">&gt;&gt;&gt; r323i5n8
</span><br>
<span class="quotelev3">&gt;&gt;&gt; r323i5n9
</span><br>
<span class="quotelev3">&gt;&gt;&gt; r323i5n10
</span><br>
<span class="quotelev3">&gt;&gt;&gt; r323i5n11
</span><br>
<span class="quotelev3">&gt;&gt;&gt; 
</span><br>
<span class="quotelev3">&gt;&gt;&gt; Maybe that's a clue? I didn't think this would matter if I only ran two processes...and it works on the multi-node maia allocation.
</span><br>
<span class="quotelev3">&gt;&gt;&gt; 
</span><br>
<span class="quotelev3">&gt;&gt;&gt; I've tried searching the web, but the only place I've seen tcp_peer_send_blocking is in a PDF where they say it's an error that can be seen:
</span><br>
<span class="quotelev3">&gt;&gt;&gt; 
</span><br>
<span class="quotelev3">&gt;&gt;&gt; <a href="http://www.hpc.mcgill.ca/downloads/checkpointing_workshop/20150326%20-%20McGill%20-%20Checkpointing%20Techniques.pdf">http://www.hpc.mcgill.ca/downloads/checkpointing_workshop/20150326%20-%20McGill%20-%20Checkpointing%20Techniques.pdf</a> &lt;<a href="http://www.hpc.mcgill.ca/downloads/checkpointing_workshop/20150326%20-%20McGill%20-%20Checkpointing%20Techniques.pdf">http://www.hpc.mcgill.ca/downloads/checkpointing_workshop/20150326%20-%20McGill%20-%20Checkpointing%20Techniques.pdf</a>&gt;
</span><br>
<span class="quotelev3">&gt;&gt;&gt; 
</span><br>
<span class="quotelev3">&gt;&gt;&gt; Any ideas for what this error can mean?
</span><br>
<span class="quotelev3">&gt;&gt;&gt; 
</span><br>
<span class="quotelev3">&gt;&gt;&gt; -- 
</span><br>
<span class="quotelev3">&gt;&gt;&gt; Matt Thompson
</span><br>
<span class="quotelev3">&gt;&gt;&gt; Man Among Men
</span><br>
<span class="quotelev3">&gt;&gt;&gt; Fulcrum of History
</span><br>
<span class="quotelev3">&gt;&gt;&gt; _______________________________________________
</span><br>
<span class="quotelev3">&gt;&gt;&gt; users mailing list
</span><br>
<span class="quotelev3">&gt;&gt;&gt; users_at_[hidden] &lt;mailto:users_at_[hidden]&gt;
</span><br>
<span class="quotelev3">&gt;&gt;&gt; Subscription: <a href="http://www.open-mpi.org/mailman/listinfo.cgi/users">http://www.open-mpi.org/mailman/listinfo.cgi/users</a> &lt;<a href="http://www.open-mpi.org/mailman/listinfo.cgi/users">http://www.open-mpi.org/mailman/listinfo.cgi/users</a>&gt;
</span><br>
<span class="quotelev3">&gt;&gt;&gt; Link to this post: <a href="http://www.open-mpi.org/community/lists/users/2015/09/27669.php">http://www.open-mpi.org/community/lists/users/2015/09/27669.php</a> &lt;<a href="http://www.open-mpi.org/community/lists/users/2015/09/27669.php">http://www.open-mpi.org/community/lists/users/2015/09/27669.php</a>&gt;
</span><br>
<span class="quotelev2">&gt;&gt; 
</span><br>
<span class="quotelev2">&gt;&gt; _______________________________________________
</span><br>
<span class="quotelev2">&gt;&gt; users mailing list
</span><br>
<span class="quotelev2">&gt;&gt; users_at_[hidden] &lt;mailto:users_at_[hidden]&gt;
</span><br>
<span class="quotelev2">&gt;&gt; Subscription: <a href="http://www.open-mpi.org/mailman/listinfo.cgi/users">http://www.open-mpi.org/mailman/listinfo.cgi/users</a> &lt;<a href="http://www.open-mpi.org/mailman/listinfo.cgi/users">http://www.open-mpi.org/mailman/listinfo.cgi/users</a>&gt;
</span><br>
<span class="quotelev2">&gt;&gt; Link to this post: <a href="http://www.open-mpi.org/community/lists/users/2015/09/27670.php">http://www.open-mpi.org/community/lists/users/2015/09/27670.php</a> &lt;<a href="http://www.open-mpi.org/community/lists/users/2015/09/27670.php">http://www.open-mpi.org/community/lists/users/2015/09/27670.php</a>&gt;
</span><br>
<span class="quotelev2">&gt;&gt; 
</span><br>
<span class="quotelev2">&gt;&gt; 
</span><br>
<span class="quotelev2">&gt;&gt; 
</span><br>
<span class="quotelev2">&gt;&gt; -- 
</span><br>
<span class="quotelev2">&gt;&gt; Matt Thompson
</span><br>
<span class="quotelev2">&gt;&gt; Man Among Men
</span><br>
<span class="quotelev2">&gt;&gt; Fulcrum of History
</span><br>
<span class="quotelev2">&gt;&gt; _______________________________________________
</span><br>
<span class="quotelev2">&gt;&gt; users mailing list
</span><br>
<span class="quotelev2">&gt;&gt; users_at_[hidden] &lt;mailto:users_at_[hidden]&gt;
</span><br>
<span class="quotelev2">&gt;&gt; Subscription: <a href="http://www.open-mpi.org/mailman/listinfo.cgi/users">http://www.open-mpi.org/mailman/listinfo.cgi/users</a> &lt;<a href="http://www.open-mpi.org/mailman/listinfo.cgi/users">http://www.open-mpi.org/mailman/listinfo.cgi/users</a>&gt;
</span><br>
<span class="quotelev2">&gt;&gt; Link to this post: <a href="http://www.open-mpi.org/community/lists/users/2015/09/27671.php">http://www.open-mpi.org/community/lists/users/2015/09/27671.php</a> &lt;<a href="http://www.open-mpi.org/community/lists/users/2015/09/27671.php">http://www.open-mpi.org/community/lists/users/2015/09/27671.php</a>&gt;
</span><br>
<span class="quotelev1">&gt; 
</span><br>
<span class="quotelev1">&gt; _______________________________________________
</span><br>
<span class="quotelev1">&gt; users mailing list
</span><br>
<span class="quotelev1">&gt; users_at_[hidden] &lt;mailto:users_at_[hidden]&gt;
</span><br>
<span class="quotelev1">&gt; Subscription: <a href="http://www.open-mpi.org/mailman/listinfo.cgi/users">http://www.open-mpi.org/mailman/listinfo.cgi/users</a> &lt;<a href="http://www.open-mpi.org/mailman/listinfo.cgi/users">http://www.open-mpi.org/mailman/listinfo.cgi/users</a>&gt;
</span><br>
<span class="quotelev1">&gt; Link to this post: <a href="http://www.open-mpi.org/community/lists/users/2015/09/27672.php">http://www.open-mpi.org/community/lists/users/2015/09/27672.php</a> &lt;<a href="http://www.open-mpi.org/community/lists/users/2015/09/27672.php">http://www.open-mpi.org/community/lists/users/2015/09/27672.php</a>&gt;
</span><br>
<span class="quotelev1">&gt; 
</span><br>
<span class="quotelev1">&gt; 
</span><br>
<span class="quotelev1">&gt; 
</span><br>
<span class="quotelev1">&gt; -- 
</span><br>
<span class="quotelev1">&gt; Matt Thompson
</span><br>
<span class="quotelev1">&gt; Man Among Men
</span><br>
<span class="quotelev1">&gt; Fulcrum of History
</span><br>
<span class="quotelev1">&gt; _______________________________________________
</span><br>
<span class="quotelev1">&gt; users mailing list
</span><br>
<span class="quotelev1">&gt; users_at_[hidden] &lt;mailto:users_at_[hidden]&gt;
</span><br>
<span class="quotelev1">&gt; Subscription: <a href="http://www.open-mpi.org/mailman/listinfo.cgi/users">http://www.open-mpi.org/mailman/listinfo.cgi/users</a> &lt;<a href="http://www.open-mpi.org/mailman/listinfo.cgi/users">http://www.open-mpi.org/mailman/listinfo.cgi/users</a>&gt;
</span><br>
<span class="quotelev1">&gt; Link to this post: <a href="http://www.open-mpi.org/community/lists/users/2015/09/27673.php">http://www.open-mpi.org/community/lists/users/2015/09/27673.php</a> &lt;<a href="http://www.open-mpi.org/community/lists/users/2015/09/27673.php">http://www.open-mpi.org/community/lists/users/2015/09/27673.php</a>&gt;
</span><br>
<p><hr>
<ul>
<li>text/html attachment: <a href="http://www.open-mpi.org/community/lists/users/att-27674/attachment">attachment</a>
</ul>
<!-- attachment="attachment" -->
<!-- body="end" -->
<hr>
<ul class="links">
<!-- next="start" -->
<li><strong>Next message:</strong> <a href="27675.php">Yang Zhang: "[OMPI users] How does MPI_Allreduce work?"</a>
<li><strong>Previous message:</strong> <a href="27673.php">Matt Thompson: "Re: [OMPI users] Open MPI 1.10.0: Works on one Sandybridge Node, not on another: tcp_peer_send_blocking"</a>
<li><strong>In reply to:</strong> <a href="27673.php">Matt Thompson: "Re: [OMPI users] Open MPI 1.10.0: Works on one Sandybridge Node, not on another: tcp_peer_send_blocking"</a>
<!-- nextthread="start" -->
<!-- reply="end" -->
</ul>
<div class="center">
<table border="2" width="100%" class="links">
<tr>
<th><a href="date.php">Date view</a></th>
<th><a href="index.php">Thread view</a></th>
<th><a href="subject.php">Subject view</a></th>
<th><a href="author.php">Author view</a></th>
</tr>
</table>
</div>
<!-- trailer="footer" -->
<? include("../../include/msg-footer.inc") ?>
