DPDK patches and discussions
 help / color / mirror / Atom feed
* [dpdk-dev] [PATCH 0/3] fix error path of multi-process probe
@ 2019-03-02  2:42 Thomas Monjalon
  2019-03-02  2:42 ` [dpdk-dev] [PATCH 1/3] eal: remove useless checks for already probed device Thomas Monjalon
                   ` (3 more replies)
  0 siblings, 4 replies; 9+ messages in thread
From: Thomas Monjalon @ 2019-03-02  2:42 UTC (permalink / raw)
  To: dev

While working on multi-process support of failsafe with Raslan,
some issues have been discovered on probing failures.

Thomas Monjalon (3):
  eal: remove useless checks for already probed device
  eal: remove error logs for already probed device
  eal: fix multi-process probe failure handling

 lib/librte_eal/common/eal_common_dev.c | 31 ++++++++++----------------
 lib/librte_eal/common/eal_private.h    |  2 +-
 lib/librte_eal/common/hotplug_mp.c     | 14 +++++++-----
 3 files changed, 21 insertions(+), 26 deletions(-)

-- 
2.20.1

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [dpdk-dev] [PATCH 1/3] eal: remove useless checks for already probed device
  2019-03-02  2:42 [dpdk-dev] [PATCH 0/3] fix error path of multi-process probe Thomas Monjalon
@ 2019-03-02  2:42 ` Thomas Monjalon
  2019-03-13 13:46   ` Zhang, Qi Z
  2019-03-02  2:42 ` [dpdk-dev] [PATCH 2/3] eal: remove error logs " Thomas Monjalon
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 9+ messages in thread
From: Thomas Monjalon @ 2019-03-02  2:42 UTC (permalink / raw)
  To: dev; +Cc: dariusz.stojaczyk, qi.z.zhang, stable

The function eal_dev_hotplug_request_to_secondary() never returns
-EEXIST result. The case of already probed device is filtered out.

The test in __handle_secondary_request() was always true.
The test in rte_dev_probe() was never true, and that's fine not
returning -EEXIST if device is already attached in secondary processes.

Fixes: 494db286f37d ("eal: fix multi-process hotplug if attached in secondary")
Cc: dariusz.stojaczyk@intel.com
Cc: qi.z.zhang@intel.com
Cc: stable@dpdk.org

Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
---
 lib/librte_eal/common/eal_common_dev.c | 9 +--------
 lib/librte_eal/common/hotplug_mp.c     | 3 +--
 2 files changed, 2 insertions(+), 10 deletions(-)

diff --git a/lib/librte_eal/common/eal_common_dev.c b/lib/librte_eal/common/eal_common_dev.c
index fd7f5ca7d5..048c0b025f 100644
--- a/lib/librte_eal/common/eal_common_dev.c
+++ b/lib/librte_eal/common/eal_common_dev.c
@@ -247,18 +247,11 @@ rte_dev_probe(const char *devargs)
 		goto rollback;
 	}
 
-	/**
-	 * if any secondary failed to attach, we need to consider if rollback
-	 * is necessary.
-	 */
+	/* if any secondary failed to attach, need to rollback. */
 	if (req.result != 0) {
 		RTE_LOG(ERR, EAL,
 			"Failed to attach device on secondary process\n");
 		ret = req.result;
-
-		/* for -EEXIST, we don't need to rollback. */
-		if (ret == -EEXIST)
-			return ret;
 		goto rollback;
 	}
 
diff --git a/lib/librte_eal/common/hotplug_mp.c b/lib/librte_eal/common/hotplug_mp.c
index 4052a5c7fb..94bd1d896e 100644
--- a/lib/librte_eal/common/hotplug_mp.c
+++ b/lib/librte_eal/common/hotplug_mp.c
@@ -110,8 +110,7 @@ __handle_secondary_request(void *param)
 		if (tmp_req.result != 0) {
 			ret = tmp_req.result;
 			RTE_LOG(ERR, EAL, "Failed to hotplug add device on secondary\n");
-			if (ret != -EEXIST)
-				goto rollback;
+			goto rollback;
 		}
 	} else if (req->t == EAL_DEV_REQ_TYPE_DETACH) {
 		ret = rte_devargs_parse(&da, req->devargs);
-- 
2.20.1

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [dpdk-dev] [PATCH 2/3] eal: remove error logs for already probed device
  2019-03-02  2:42 [dpdk-dev] [PATCH 0/3] fix error path of multi-process probe Thomas Monjalon
  2019-03-02  2:42 ` [dpdk-dev] [PATCH 1/3] eal: remove useless checks for already probed device Thomas Monjalon
@ 2019-03-02  2:42 ` Thomas Monjalon
  2019-03-02  2:42 ` [dpdk-dev] [PATCH 3/3] eal: fix multi-process probe failure handling Thomas Monjalon
  2023-06-14 19:39 ` [dpdk-dev] [PATCH 0/3] fix error path of multi-process probe Stephen Hemminger
  3 siblings, 0 replies; 9+ messages in thread
From: Thomas Monjalon @ 2019-03-02  2:42 UTC (permalink / raw)
  To: dev

In functions rte_dev_probe() and __handle_secondary_request(),
an error message was logged as a failure if the probe returns -EEXIST.
Anyway this error code is not returned by rte_dev_probe()
as it is not considered as an error.
For instance, in multi-process case, some synchronizations may request
to probe a device which was already probed before by the process.

Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
---
 lib/librte_eal/common/eal_common_dev.c | 12 ++----------
 lib/librte_eal/common/hotplug_mp.c     |  5 ++---
 2 files changed, 4 insertions(+), 13 deletions(-)

diff --git a/lib/librte_eal/common/eal_common_dev.c b/lib/librte_eal/common/eal_common_dev.c
index 048c0b025f..deaaea9345 100644
--- a/lib/librte_eal/common/eal_common_dev.c
+++ b/lib/librte_eal/common/eal_common_dev.c
@@ -222,18 +222,10 @@ rte_dev_probe(const char *devargs)
 	/* primary attach the new device itself. */
 	ret = local_dev_probe(devargs, &dev);
 
-	if (ret != 0) {
+	if (ret != 0 && ret != -EEXIST) {
 		RTE_LOG(ERR, EAL,
 			"Failed to attach device on primary process\n");
-
-		/**
-		 * it is possible that secondary process failed to attached a
-		 * device that primary process have during initialization,
-		 * so for -EEXIST case, we still need to sync with secondary
-		 * process.
-		 */
-		if (ret != -EEXIST)
-			return ret;
+		return ret;
 	}
 
 	/* primary send attach sync request to secondary. */
diff --git a/lib/librte_eal/common/hotplug_mp.c b/lib/librte_eal/common/hotplug_mp.c
index 94bd1d896e..69e9a16d6a 100644
--- a/lib/librte_eal/common/hotplug_mp.c
+++ b/lib/librte_eal/common/hotplug_mp.c
@@ -96,10 +96,9 @@ __handle_secondary_request(void *param)
 
 	if (req->t == EAL_DEV_REQ_TYPE_ATTACH) {
 		ret = local_dev_probe(req->devargs, &dev);
-		if (ret != 0) {
+		if (ret != 0 && ret != -EEXIST) {
 			RTE_LOG(ERR, EAL, "Failed to hotplug add device on primary\n");
-			if (ret != -EEXIST)
-				goto finish;
+			goto finish;
 		}
 		ret = eal_dev_hotplug_request_to_secondary(&tmp_req);
 		if (ret != 0) {
-- 
2.20.1

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [dpdk-dev] [PATCH 3/3] eal: fix multi-process probe failure handling
  2019-03-02  2:42 [dpdk-dev] [PATCH 0/3] fix error path of multi-process probe Thomas Monjalon
  2019-03-02  2:42 ` [dpdk-dev] [PATCH 1/3] eal: remove useless checks for already probed device Thomas Monjalon
  2019-03-02  2:42 ` [dpdk-dev] [PATCH 2/3] eal: remove error logs " Thomas Monjalon
@ 2019-03-02  2:42 ` Thomas Monjalon
  2019-03-14  7:15   ` Zhang, Qi Z
  2023-06-14 19:39 ` [dpdk-dev] [PATCH 0/3] fix error path of multi-process probe Stephen Hemminger
  3 siblings, 1 reply; 9+ messages in thread
From: Thomas Monjalon @ 2019-03-02  2:42 UTC (permalink / raw)
  To: dev; +Cc: qi.z.zhang, stable

If probe fails in multi-process context, the device must removed
in other processes for consistency. This is a rollback mechanism.
However the rollback should not happen for devices which were
already probed before the current probe transaction.

When probing an already probed device, the driver may reject
with -EEXIST or update and succeed with code 0.
In order to distinguish successful new probe from re-probe,
in the function local_dev_probe(), the positive EEXIST code
is returned for the latter case.

The functions rte_dev_probe() and __handle_secondary_request()
can test for -EEXIST and +EEXIST, and skip rollback in such case.

Fixes: 244d5130719c ("eal: enable hotplug on multi-process")
Fixes: ac9e4a17370f ("eal: support attach/detach shared device from secondary")
Cc: qi.z.zhang@intel.com
Cc: stable@dpdk.org

Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
---
 lib/librte_eal/common/eal_common_dev.c | 12 ++++++++++--
 lib/librte_eal/common/eal_private.h    |  2 +-
 lib/librte_eal/common/hotplug_mp.c     |  8 ++++++--
 3 files changed, 17 insertions(+), 5 deletions(-)

diff --git a/lib/librte_eal/common/eal_common_dev.c b/lib/librte_eal/common/eal_common_dev.c
index deaaea9345..2c7b1ab071 100644
--- a/lib/librte_eal/common/eal_common_dev.c
+++ b/lib/librte_eal/common/eal_common_dev.c
@@ -132,6 +132,7 @@ local_dev_probe(const char *devargs, struct rte_device **new_dev)
 {
 	struct rte_device *dev;
 	struct rte_devargs *da;
+	bool already_probed;
 	int ret;
 
 	*new_dev = NULL;
@@ -171,12 +172,15 @@ local_dev_probe(const char *devargs, struct rte_device **new_dev)
 	 * those devargs shouldn't be removed manually anymore.
 	 */
 
+	already_probed = rte_dev_is_probed(dev);
 	ret = dev->bus->plug(dev);
 	if (ret && !rte_dev_is_probed(dev)) { /* if hasn't ever succeeded */
 		RTE_LOG(ERR, EAL, "Driver cannot attach the device (%s)\n",
 			dev->name);
 		return ret;
 	}
+	if (ret == 0 && already_probed)
+		ret = EEXIST; /* hint to avoid any rollback */
 
 	*new_dev = dev;
 	return ret;
@@ -194,6 +198,7 @@ rte_dev_probe(const char *devargs)
 {
 	struct eal_dev_mp_req req;
 	struct rte_device *dev;
+	bool already_probed;
 	int ret;
 
 	memset(&req, 0, sizeof(req));
@@ -221,8 +226,8 @@ rte_dev_probe(const char *devargs)
 
 	/* primary attach the new device itself. */
 	ret = local_dev_probe(devargs, &dev);
-
-	if (ret != 0 && ret != -EEXIST) {
+	already_probed = (ret == -EEXIST || ret == EEXIST);
+	if (ret < 0 && !already_probed) {
 		RTE_LOG(ERR, EAL,
 			"Failed to attach device on primary process\n");
 		return ret;
@@ -250,6 +255,9 @@ rte_dev_probe(const char *devargs)
 	return 0;
 
 rollback:
+	if (already_probed)
+		return ret; /* skip rollback */
+
 	req.t = EAL_DEV_REQ_TYPE_ATTACH_ROLLBACK;
 
 	/* primary send rollback request to secondary. */
diff --git a/lib/librte_eal/common/eal_private.h b/lib/librte_eal/common/eal_private.h
index 798ede553b..a01d252930 100644
--- a/lib/librte_eal/common/eal_private.h
+++ b/lib/librte_eal/common/eal_private.h
@@ -304,7 +304,7 @@ rte_devargs_layers_parse(struct rte_devargs *devargs,
  * @param new_dev
  *   new device be probed as output.
  * @return
- *   0 on success, negative on error.
+ *   >=0 on success (+EEXIST if already probed), negative on error.
  */
 int local_dev_probe(const char *devargs, struct rte_device **new_dev);
 
diff --git a/lib/librte_eal/common/hotplug_mp.c b/lib/librte_eal/common/hotplug_mp.c
index 69e9a16d6a..9f8ef28a3b 100644
--- a/lib/librte_eal/common/hotplug_mp.c
+++ b/lib/librte_eal/common/hotplug_mp.c
@@ -90,13 +90,15 @@ __handle_secondary_request(void *param)
 	struct rte_devargs da;
 	struct rte_device *dev;
 	struct rte_bus *bus;
+	bool already_probed = false;
 	int ret = 0;
 
 	tmp_req = *req;
 
 	if (req->t == EAL_DEV_REQ_TYPE_ATTACH) {
 		ret = local_dev_probe(req->devargs, &dev);
-		if (ret != 0 && ret != -EEXIST) {
+		already_probed = (ret == -EEXIST || ret == EEXIST);
+		if (ret < 0 && !already_probed) {
 			RTE_LOG(ERR, EAL, "Failed to hotplug add device on primary\n");
 			goto finish;
 		}
@@ -159,7 +161,7 @@ __handle_secondary_request(void *param)
 	goto finish;
 
 rollback:
-	if (req->t == EAL_DEV_REQ_TYPE_ATTACH) {
+	if (req->t == EAL_DEV_REQ_TYPE_ATTACH && !already_probed) {
 		tmp_req.t = EAL_DEV_REQ_TYPE_ATTACH_ROLLBACK;
 		eal_dev_hotplug_request_to_secondary(&tmp_req);
 		local_dev_remove(dev);
@@ -238,6 +240,8 @@ static void __handle_primary_request(void *param)
 	case EAL_DEV_REQ_TYPE_ATTACH:
 	case EAL_DEV_REQ_TYPE_DETACH_ROLLBACK:
 		ret = local_dev_probe(req->devargs, &dev);
+		if (ret > 0)
+			ret = 0; /* return only errors */
 		break;
 	case EAL_DEV_REQ_TYPE_DETACH:
 	case EAL_DEV_REQ_TYPE_ATTACH_ROLLBACK:
-- 
2.20.1

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [dpdk-dev] [PATCH 1/3] eal: remove useless checks for already probed device
  2019-03-02  2:42 ` [dpdk-dev] [PATCH 1/3] eal: remove useless checks for already probed device Thomas Monjalon
@ 2019-03-13 13:46   ` Zhang, Qi Z
  2023-06-11  2:52     ` Stephen Hemminger
  0 siblings, 1 reply; 9+ messages in thread
From: Zhang, Qi Z @ 2019-03-13 13:46 UTC (permalink / raw)
  To: Thomas Monjalon, dev; +Cc: Stojaczyk, Dariusz, stable

Hi:

> -----Original Message-----
> From: Thomas Monjalon [mailto:thomas@monjalon.net]
> Sent: Saturday, March 2, 2019 10:43 AM
> To: dev@dpdk.org
> Cc: Stojaczyk, Dariusz <dariusz.stojaczyk@intel.com>; Zhang, Qi Z
> <qi.z.zhang@intel.com>; stable@dpdk.org
> Subject: [PATCH 1/3] eal: remove useless checks for already probed device
> 
> The function eal_dev_hotplug_request_to_secondary() never returns -EEXIST
> result. The case of already probed device is filtered out.
> 
> The test in __handle_secondary_request() was always true.
> The test in rte_dev_probe() was never true, and that's fine not returning -EEXIST
> if device is already attached in secondary processes.

I didn't get this.
eal_dev_hotplug_request_to_secondary() never return -EEXIST, but req->result could be -EEXIST.

This happens when secondary try to attach an already attached device 
(__handle_primary_request --> local_dev_probe --> dev->bus->plug )


> 
> Fixes: 494db286f37d ("eal: fix multi-process hotplug if attached in secondary")
> Cc: dariusz.stojaczyk@intel.com
> Cc: qi.z.zhang@intel.com
> Cc: stable@dpdk.org
> 
> Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
> ---
>  lib/librte_eal/common/eal_common_dev.c | 9 +--------
>  lib/librte_eal/common/hotplug_mp.c     | 3 +--
>  2 files changed, 2 insertions(+), 10 deletions(-)
> 
> diff --git a/lib/librte_eal/common/eal_common_dev.c
> b/lib/librte_eal/common/eal_common_dev.c
> index fd7f5ca7d5..048c0b025f 100644
> --- a/lib/librte_eal/common/eal_common_dev.c
> +++ b/lib/librte_eal/common/eal_common_dev.c
> @@ -247,18 +247,11 @@ rte_dev_probe(const char *devargs)
>  		goto rollback;
>  	}
> 
> -	/**
> -	 * if any secondary failed to attach, we need to consider if rollback
> -	 * is necessary.
> -	 */
> +	/* if any secondary failed to attach, need to rollback. */
>  	if (req.result != 0) {
>  		RTE_LOG(ERR, EAL,
>  			"Failed to attach device on secondary process\n");
>  		ret = req.result;
> -
> -		/* for -EEXIST, we don't need to rollback. */
> -		if (ret == -EEXIST)
> -			return ret;
>  		goto rollback;
>  	}
> 
> diff --git a/lib/librte_eal/common/hotplug_mp.c
> b/lib/librte_eal/common/hotplug_mp.c
> index 4052a5c7fb..94bd1d896e 100644
> --- a/lib/librte_eal/common/hotplug_mp.c
> +++ b/lib/librte_eal/common/hotplug_mp.c
> @@ -110,8 +110,7 @@ __handle_secondary_request(void *param)
>  		if (tmp_req.result != 0) {
>  			ret = tmp_req.result;
>  			RTE_LOG(ERR, EAL, "Failed to hotplug add device on
> secondary\n");
> -			if (ret != -EEXIST)
> -				goto rollback;
> +			goto rollback;
>  		}
>  	} else if (req->t == EAL_DEV_REQ_TYPE_DETACH) {
>  		ret = rte_devargs_parse(&da, req->devargs);
> --
> 2.20.1

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [dpdk-dev] [PATCH 3/3] eal: fix multi-process probe failure handling
  2019-03-02  2:42 ` [dpdk-dev] [PATCH 3/3] eal: fix multi-process probe failure handling Thomas Monjalon
@ 2019-03-14  7:15   ` Zhang, Qi Z
  2019-03-14  7:15     ` Zhang, Qi Z
  0 siblings, 1 reply; 9+ messages in thread
From: Zhang, Qi Z @ 2019-03-14  7:15 UTC (permalink / raw)
  To: Thomas Monjalon, dev; +Cc: stable



> -----Original Message-----
> From: Thomas Monjalon [mailto:thomas@monjalon.net]
> Sent: Saturday, March 2, 2019 10:43 AM
> To: dev@dpdk.org
> Cc: Zhang, Qi Z <qi.z.zhang@intel.com>; stable@dpdk.org
> Subject: [PATCH 3/3] eal: fix multi-process probe failure handling
> 
> If probe fails in multi-process context, the device must removed in other
> processes for consistency. This is a rollback mechanism.
> However the rollback should not happen for devices which were already probed
> before the current probe transaction.
> 
> When probing an already probed device, the driver may reject with -EEXIST or
> update and succeed with code 0.
> In order to distinguish successful new probe from re-probe, in the function
> local_dev_probe(), the positive EEXIST code is returned for the latter case.
> 
> The functions rte_dev_probe() and __handle_secondary_request() can test for
> -EEXIST and +EEXIST, and skip rollback in such case.
> 
> Fixes: 244d5130719c ("eal: enable hotplug on multi-process")
> Fixes: ac9e4a17370f ("eal: support attach/detach shared device from secondary")
> Cc: qi.z.zhang@intel.com
> Cc: stable@dpdk.org
> 
> Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
> ---
>  lib/librte_eal/common/eal_common_dev.c | 12 ++++++++++--
>  lib/librte_eal/common/eal_private.h    |  2 +-
>  lib/librte_eal/common/hotplug_mp.c     |  8 ++++++--
>  3 files changed, 17 insertions(+), 5 deletions(-)
> 
> diff --git a/lib/librte_eal/common/eal_common_dev.c
> b/lib/librte_eal/common/eal_common_dev.c
> index deaaea9345..2c7b1ab071 100644
> --- a/lib/librte_eal/common/eal_common_dev.c
> +++ b/lib/librte_eal/common/eal_common_dev.c
> @@ -132,6 +132,7 @@ local_dev_probe(const char *devargs, struct rte_device
> **new_dev)  {
>  	struct rte_device *dev;
>  	struct rte_devargs *da;
> +	bool already_probed;
>  	int ret;
> 
>  	*new_dev = NULL;
> @@ -171,12 +172,15 @@ local_dev_probe(const char *devargs, struct rte_device
> **new_dev)
>  	 * those devargs shouldn't be removed manually anymore.
>  	 */
> 
> +	already_probed = rte_dev_is_probed(dev);
>  	ret = dev->bus->plug(dev);
>  	if (ret && !rte_dev_is_probed(dev)) { /* if hasn't ever succeeded */
>  		RTE_LOG(ERR, EAL, "Driver cannot attach the device (%s)\n",
>  			dev->name);
>  		return ret;
>  	}
> +	if (ret == 0 && already_probed)
> +		ret = EEXIST; /* hint to avoid any rollback */

What if bus->plug return -EEXIST and rte_dev_is_probed return true? (See rte_pci_probe_one_driver)
You will not give hint here, but is this expected?

> 
>  	*new_dev = dev;
>  	return ret;
> @@ -194,6 +198,7 @@ rte_dev_probe(const char *devargs)  {
>  	struct eal_dev_mp_req req;
>  	struct rte_device *dev;
> +	bool already_probed;
>  	int ret;
> 
>  	memset(&req, 0, sizeof(req));
> @@ -221,8 +226,8 @@ rte_dev_probe(const char *devargs)
> 
>  	/* primary attach the new device itself. */
>  	ret = local_dev_probe(devargs, &dev);
> -
> -	if (ret != 0 && ret != -EEXIST) {
> +	already_probed = (ret == -EEXIST || ret == EEXIST);
> +	if (ret < 0 && !already_probed) {
>  		RTE_LOG(ERR, EAL,
>  			"Failed to attach device on primary process\n");
>  		return ret;
> @@ -250,6 +255,9 @@ rte_dev_probe(const char *devargs)
>  	return 0;
> 
>  rollback:
> +	if (already_probed)
> +		return ret; /* skip rollback */
> +
>  	req.t = EAL_DEV_REQ_TYPE_ATTACH_ROLLBACK;
> 
>  	/* primary send rollback request to secondary. */ diff --git
> a/lib/librte_eal/common/eal_private.h b/lib/librte_eal/common/eal_private.h
> index 798ede553b..a01d252930 100644
> --- a/lib/librte_eal/common/eal_private.h
> +++ b/lib/librte_eal/common/eal_private.h
> @@ -304,7 +304,7 @@ rte_devargs_layers_parse(struct rte_devargs *devargs,
>   * @param new_dev
>   *   new device be probed as output.
>   * @return
> - *   0 on success, negative on error.
> + *   >=0 on success (+EEXIST if already probed), negative on error.
>   */
>  int local_dev_probe(const char *devargs, struct rte_device **new_dev);
> 
> diff --git a/lib/librte_eal/common/hotplug_mp.c
> b/lib/librte_eal/common/hotplug_mp.c
> index 69e9a16d6a..9f8ef28a3b 100644
> --- a/lib/librte_eal/common/hotplug_mp.c
> +++ b/lib/librte_eal/common/hotplug_mp.c
> @@ -90,13 +90,15 @@ __handle_secondary_request(void *param)
>  	struct rte_devargs da;
>  	struct rte_device *dev;
>  	struct rte_bus *bus;
> +	bool already_probed = false;
>  	int ret = 0;
> 
>  	tmp_req = *req;
> 
>  	if (req->t == EAL_DEV_REQ_TYPE_ATTACH) {
>  		ret = local_dev_probe(req->devargs, &dev);
> -		if (ret != 0 && ret != -EEXIST) {
> +		already_probed = (ret == -EEXIST || ret == EEXIST);
> +		if (ret < 0 && !already_probed) {
>  			RTE_LOG(ERR, EAL, "Failed to hotplug add device on primary\n");
>  			goto finish;
>  		}
> @@ -159,7 +161,7 @@ __handle_secondary_request(void *param)
>  	goto finish;
> 
>  rollback:
> -	if (req->t == EAL_DEV_REQ_TYPE_ATTACH) {
> +	if (req->t == EAL_DEV_REQ_TYPE_ATTACH && !already_probed) {
>  		tmp_req.t = EAL_DEV_REQ_TYPE_ATTACH_ROLLBACK;
>  		eal_dev_hotplug_request_to_secondary(&tmp_req);
>  		local_dev_remove(dev);
> @@ -238,6 +240,8 @@ static void __handle_primary_request(void *param)
>  	case EAL_DEV_REQ_TYPE_ATTACH:
>  	case EAL_DEV_REQ_TYPE_DETACH_ROLLBACK:
>  		ret = local_dev_probe(req->devargs, &dev);
> +		if (ret > 0)
> +			ret = 0; /* return only errors */
>  		break;
>  	case EAL_DEV_REQ_TYPE_DETACH:
>  	case EAL_DEV_REQ_TYPE_ATTACH_ROLLBACK:
> --
> 2.20.1

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [dpdk-dev] [PATCH 3/3] eal: fix multi-process probe failure handling
  2019-03-14  7:15   ` Zhang, Qi Z
@ 2019-03-14  7:15     ` Zhang, Qi Z
  0 siblings, 0 replies; 9+ messages in thread
From: Zhang, Qi Z @ 2019-03-14  7:15 UTC (permalink / raw)
  To: Thomas Monjalon, dev; +Cc: stable



> -----Original Message-----
> From: Thomas Monjalon [mailto:thomas@monjalon.net]
> Sent: Saturday, March 2, 2019 10:43 AM
> To: dev@dpdk.org
> Cc: Zhang, Qi Z <qi.z.zhang@intel.com>; stable@dpdk.org
> Subject: [PATCH 3/3] eal: fix multi-process probe failure handling
> 
> If probe fails in multi-process context, the device must removed in other
> processes for consistency. This is a rollback mechanism.
> However the rollback should not happen for devices which were already probed
> before the current probe transaction.
> 
> When probing an already probed device, the driver may reject with -EEXIST or
> update and succeed with code 0.
> In order to distinguish successful new probe from re-probe, in the function
> local_dev_probe(), the positive EEXIST code is returned for the latter case.
> 
> The functions rte_dev_probe() and __handle_secondary_request() can test for
> -EEXIST and +EEXIST, and skip rollback in such case.
> 
> Fixes: 244d5130719c ("eal: enable hotplug on multi-process")
> Fixes: ac9e4a17370f ("eal: support attach/detach shared device from secondary")
> Cc: qi.z.zhang@intel.com
> Cc: stable@dpdk.org
> 
> Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
> ---
>  lib/librte_eal/common/eal_common_dev.c | 12 ++++++++++--
>  lib/librte_eal/common/eal_private.h    |  2 +-
>  lib/librte_eal/common/hotplug_mp.c     |  8 ++++++--
>  3 files changed, 17 insertions(+), 5 deletions(-)
> 
> diff --git a/lib/librte_eal/common/eal_common_dev.c
> b/lib/librte_eal/common/eal_common_dev.c
> index deaaea9345..2c7b1ab071 100644
> --- a/lib/librte_eal/common/eal_common_dev.c
> +++ b/lib/librte_eal/common/eal_common_dev.c
> @@ -132,6 +132,7 @@ local_dev_probe(const char *devargs, struct rte_device
> **new_dev)  {
>  	struct rte_device *dev;
>  	struct rte_devargs *da;
> +	bool already_probed;
>  	int ret;
> 
>  	*new_dev = NULL;
> @@ -171,12 +172,15 @@ local_dev_probe(const char *devargs, struct rte_device
> **new_dev)
>  	 * those devargs shouldn't be removed manually anymore.
>  	 */
> 
> +	already_probed = rte_dev_is_probed(dev);
>  	ret = dev->bus->plug(dev);
>  	if (ret && !rte_dev_is_probed(dev)) { /* if hasn't ever succeeded */
>  		RTE_LOG(ERR, EAL, "Driver cannot attach the device (%s)\n",
>  			dev->name);
>  		return ret;
>  	}
> +	if (ret == 0 && already_probed)
> +		ret = EEXIST; /* hint to avoid any rollback */

What if bus->plug return -EEXIST and rte_dev_is_probed return true? (See rte_pci_probe_one_driver)
You will not give hint here, but is this expected?

> 
>  	*new_dev = dev;
>  	return ret;
> @@ -194,6 +198,7 @@ rte_dev_probe(const char *devargs)  {
>  	struct eal_dev_mp_req req;
>  	struct rte_device *dev;
> +	bool already_probed;
>  	int ret;
> 
>  	memset(&req, 0, sizeof(req));
> @@ -221,8 +226,8 @@ rte_dev_probe(const char *devargs)
> 
>  	/* primary attach the new device itself. */
>  	ret = local_dev_probe(devargs, &dev);
> -
> -	if (ret != 0 && ret != -EEXIST) {
> +	already_probed = (ret == -EEXIST || ret == EEXIST);
> +	if (ret < 0 && !already_probed) {
>  		RTE_LOG(ERR, EAL,
>  			"Failed to attach device on primary process\n");
>  		return ret;
> @@ -250,6 +255,9 @@ rte_dev_probe(const char *devargs)
>  	return 0;
> 
>  rollback:
> +	if (already_probed)
> +		return ret; /* skip rollback */
> +
>  	req.t = EAL_DEV_REQ_TYPE_ATTACH_ROLLBACK;
> 
>  	/* primary send rollback request to secondary. */ diff --git
> a/lib/librte_eal/common/eal_private.h b/lib/librte_eal/common/eal_private.h
> index 798ede553b..a01d252930 100644
> --- a/lib/librte_eal/common/eal_private.h
> +++ b/lib/librte_eal/common/eal_private.h
> @@ -304,7 +304,7 @@ rte_devargs_layers_parse(struct rte_devargs *devargs,
>   * @param new_dev
>   *   new device be probed as output.
>   * @return
> - *   0 on success, negative on error.
> + *   >=0 on success (+EEXIST if already probed), negative on error.
>   */
>  int local_dev_probe(const char *devargs, struct rte_device **new_dev);
> 
> diff --git a/lib/librte_eal/common/hotplug_mp.c
> b/lib/librte_eal/common/hotplug_mp.c
> index 69e9a16d6a..9f8ef28a3b 100644
> --- a/lib/librte_eal/common/hotplug_mp.c
> +++ b/lib/librte_eal/common/hotplug_mp.c
> @@ -90,13 +90,15 @@ __handle_secondary_request(void *param)
>  	struct rte_devargs da;
>  	struct rte_device *dev;
>  	struct rte_bus *bus;
> +	bool already_probed = false;
>  	int ret = 0;
> 
>  	tmp_req = *req;
> 
>  	if (req->t == EAL_DEV_REQ_TYPE_ATTACH) {
>  		ret = local_dev_probe(req->devargs, &dev);
> -		if (ret != 0 && ret != -EEXIST) {
> +		already_probed = (ret == -EEXIST || ret == EEXIST);
> +		if (ret < 0 && !already_probed) {
>  			RTE_LOG(ERR, EAL, "Failed to hotplug add device on primary\n");
>  			goto finish;
>  		}
> @@ -159,7 +161,7 @@ __handle_secondary_request(void *param)
>  	goto finish;
> 
>  rollback:
> -	if (req->t == EAL_DEV_REQ_TYPE_ATTACH) {
> +	if (req->t == EAL_DEV_REQ_TYPE_ATTACH && !already_probed) {
>  		tmp_req.t = EAL_DEV_REQ_TYPE_ATTACH_ROLLBACK;
>  		eal_dev_hotplug_request_to_secondary(&tmp_req);
>  		local_dev_remove(dev);
> @@ -238,6 +240,8 @@ static void __handle_primary_request(void *param)
>  	case EAL_DEV_REQ_TYPE_ATTACH:
>  	case EAL_DEV_REQ_TYPE_DETACH_ROLLBACK:
>  		ret = local_dev_probe(req->devargs, &dev);
> +		if (ret > 0)
> +			ret = 0; /* return only errors */
>  		break;
>  	case EAL_DEV_REQ_TYPE_DETACH:
>  	case EAL_DEV_REQ_TYPE_ATTACH_ROLLBACK:
> --
> 2.20.1


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [dpdk-dev] [PATCH 1/3] eal: remove useless checks for already probed device
  2019-03-13 13:46   ` Zhang, Qi Z
@ 2023-06-11  2:52     ` Stephen Hemminger
  0 siblings, 0 replies; 9+ messages in thread
From: Stephen Hemminger @ 2023-06-11  2:52 UTC (permalink / raw)
  To: Zhang, Qi Z; +Cc: Thomas Monjalon, dev, Stojaczyk, Dariusz, stable

On Wed, 13 Mar 2019 13:46:01 +0000
"Zhang, Qi Z" <qi.z.zhang@intel.com> wrote:

> > -----Original Message-----
> > From: Thomas Monjalon [mailto:thomas@monjalon.net]
> > Sent: Saturday, March 2, 2019 10:43 AM
> > To: dev@dpdk.org
> > Cc: Stojaczyk, Dariusz <dariusz.stojaczyk@intel.com>; Zhang, Qi Z
> > <qi.z.zhang@intel.com>; stable@dpdk.org
> > Subject: [PATCH 1/3] eal: remove useless checks for already probed device
> > 
> > The function eal_dev_hotplug_request_to_secondary() never returns -EEXIST
> > result. The case of already probed device is filtered out.
> > 
> > The test in __handle_secondary_request() was always true.
> > The test in rte_dev_probe() was never true, and that's fine not returning -EEXIST
> > if device is already attached in secondary processes.  
> 
> I didn't get this.
> eal_dev_hotplug_request_to_secondary() never return -EEXIST, but req->result could be -EEXIST.
> 
> This happens when secondary try to attach an already attached device 
> (__handle_primary_request --> local_dev_probe --> dev->bus->plug )

This seems to be the outstanding question on this old patch.
Is this possible? If so then the original code is ok, and patch is not required.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [dpdk-dev] [PATCH 0/3] fix error path of multi-process probe
  2019-03-02  2:42 [dpdk-dev] [PATCH 0/3] fix error path of multi-process probe Thomas Monjalon
                   ` (2 preceding siblings ...)
  2019-03-02  2:42 ` [dpdk-dev] [PATCH 3/3] eal: fix multi-process probe failure handling Thomas Monjalon
@ 2023-06-14 19:39 ` Stephen Hemminger
  3 siblings, 0 replies; 9+ messages in thread
From: Stephen Hemminger @ 2023-06-14 19:39 UTC (permalink / raw)
  To: dev

On Sat,  2 Mar 2019 03:42:50 +0100
Thomas Monjalon <thomas@monjalon.net> wrote:

> While working on multi-process support of failsafe with Raslan,
> some issues have been discovered on probing failures.
> 
> Thomas Monjalon (3):
>   eal: remove useless checks for already probed device
>   eal: remove error logs for already probed device
>   eal: fix multi-process probe failure handling
> 
>  lib/librte_eal/common/eal_common_dev.c | 31 ++++++++++----------------
>  lib/librte_eal/common/eal_private.h    |  2 +-
>  lib/librte_eal/common/hotplug_mp.c     | 14 +++++++-----
>  3 files changed, 21 insertions(+), 26 deletions(-)
> 

Is this old patch set relevant. Seems like it addressing a
"this should never happen but if it does don't be stupid" kind of scenario.

Marking it as "Changes Requested" since rebase and retest would be needed.

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2023-06-14 19:39 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-03-02  2:42 [dpdk-dev] [PATCH 0/3] fix error path of multi-process probe Thomas Monjalon
2019-03-02  2:42 ` [dpdk-dev] [PATCH 1/3] eal: remove useless checks for already probed device Thomas Monjalon
2019-03-13 13:46   ` Zhang, Qi Z
2023-06-11  2:52     ` Stephen Hemminger
2019-03-02  2:42 ` [dpdk-dev] [PATCH 2/3] eal: remove error logs " Thomas Monjalon
2019-03-02  2:42 ` [dpdk-dev] [PATCH 3/3] eal: fix multi-process probe failure handling Thomas Monjalon
2019-03-14  7:15   ` Zhang, Qi Z
2019-03-14  7:15     ` Zhang, Qi Z
2023-06-14 19:39 ` [dpdk-dev] [PATCH 0/3] fix error path of multi-process probe Stephen Hemminger

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).