[PATCH v2 2/5] eal: fix multiprocess hotplug race

patches for DPDK stable branches
 help / color / mirror / Atom feed

* [PATCH v2 2/5] eal: fix multiprocess hotplug race
       [not found] ` <20240307070113.29580-1-artemyko@nvidia.com>
@ 2024-03-07  7:01   ` Artemy Kovalyov
  2024-03-13 16:05     ` Burakov, Anatoly
  2024-03-07  7:01   ` [PATCH v2 3/5] ipc: fix mp channel closure to prevent message loss Artemy Kovalyov
                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 8+ messages in thread
From: Artemy Kovalyov @ 2024-03-07  7:01 UTC (permalink / raw)
  To: dev; +Cc: Thomas Monjalon, stable, Qi Zhang, Anatoly Burakov

There exists a time gap between the creation of the multiprocess channel
and the registration of request action handlers. Within this window, a
secondary process that receives an eal_dev_mp_request broadcast
notification might respond with ENOTSUP. This, in turn, causes the
rte_dev_probe() operation to fail in another secondary process.
To avoid this, disregarding ENOTSUP responses to attach notifications.

Fixes: 244d5130719c ("eal: enable hotplug on multi-process")
Cc: stable@dpdk.org

Signed-off-by: Artemy Kovalyov <artemyko@nvidia.com>
---
 lib/eal/common/hotplug_mp.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/lib/eal/common/hotplug_mp.c b/lib/eal/common/hotplug_mp.c
index 6027819..e6a3f6b 100644
--- a/lib/eal/common/hotplug_mp.c
+++ b/lib/eal/common/hotplug_mp.c
@@ -428,6 +428,9 @@ int eal_dev_hotplug_request_to_secondary(struct eal_dev_mp_req *req)
 			if (req->t == EAL_DEV_REQ_TYPE_ATTACH &&
 				resp->result == -EEXIST)
 				continue;
+			if (req->t == EAL_DEV_REQ_TYPE_ATTACH &&
+				resp->result == -ENOTSUP)
+				continue;
 			if (req->t == EAL_DEV_REQ_TYPE_DETACH &&
 				resp->result == -ENOENT)
 				continue;
-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH v2 3/5] ipc: fix mp channel closure to prevent message loss
       [not found] ` <20240307070113.29580-1-artemyko@nvidia.com>
  2024-03-07  7:01   ` [PATCH v2 2/5] eal: fix multiprocess hotplug race Artemy Kovalyov
@ 2024-03-07  7:01   ` Artemy Kovalyov
  2024-03-13 16:06     ` Burakov, Anatoly
  2024-03-07  7:01   ` [PATCH v2 4/5] eal: fix first time primary autodetect Artemy Kovalyov
  2024-03-07  7:01   ` [PATCH v2 5/5] eal: fix memzone fbarray cleanup Artemy Kovalyov
  3 siblings, 1 reply; 8+ messages in thread
From: Artemy Kovalyov @ 2024-03-07  7:01 UTC (permalink / raw)
  To: dev
  Cc: Thomas Monjalon, stable, Anatoly Burakov, David Marchand,
	Maxime Coquelin

This commit addresses an issue related to the cleanup of the
multiprocess channel. Previously, when closing the channel, there was a
risk of losing trailing messages. This issue was particularly noticeable
when broadcast message from primary to secondary processes was sent
while a secondary process was closing it's mp channel. In this fix, we
delete mp socket file before stopping mp receive thread.

Fixes: e7885281ded1 ("ipc: stop mp control thread on cleanup")
Cc: stable@dpdk.org

Signed-off-by: Artemy Kovalyov <artemyko@nvidia.com>
---
 lib/eal/common/eal_common_proc.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/lib/eal/common/eal_common_proc.c b/lib/eal/common/eal_common_proc.c
index 728815c..d34fdda 100644
--- a/lib/eal/common/eal_common_proc.c
+++ b/lib/eal/common/eal_common_proc.c
@@ -593,7 +593,7 @@ enum async_action {
 }
 
 static void
-close_socket_fd(int fd)
+remove_socket_fd(int fd)
 {
 	char path[PATH_MAX];
 
@@ -672,9 +672,9 @@ enum async_action {
 	if (fd < 0)
 		return;
 
+	remove_socket_fd(fd);
 	pthread_cancel((pthread_t)mp_handle_tid.opaque_id);
 	rte_thread_join(mp_handle_tid, NULL);
-	close_socket_fd(fd);
 }
 
 /**
-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH v2 4/5] eal: fix first time primary autodetect
       [not found] ` <20240307070113.29580-1-artemyko@nvidia.com>
  2024-03-07  7:01   ` [PATCH v2 2/5] eal: fix multiprocess hotplug race Artemy Kovalyov
  2024-03-07  7:01   ` [PATCH v2 3/5] ipc: fix mp channel closure to prevent message loss Artemy Kovalyov
@ 2024-03-07  7:01   ` Artemy Kovalyov
  2024-03-13 16:06     ` Burakov, Anatoly
  2024-03-07  7:01   ` [PATCH v2 5/5] eal: fix memzone fbarray cleanup Artemy Kovalyov
  3 siblings, 1 reply; 8+ messages in thread
From: Artemy Kovalyov @ 2024-03-07  7:01 UTC (permalink / raw)
  To: dev; +Cc: Thomas Monjalon, stable

If the configuration file is absent, the autodetection function should
generate and secure it. Otherwise, multiple simultaneous openings could
erroneously identify themselves as primary instances.

Fixes: af75078fece3 ("first public release")
Cc: stable@dpdk.org

Signed-off-by: Artemy Kovalyov <artemyko@nvidia.com>
---
 lib/eal/linux/eal.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/lib/eal/linux/eal.c b/lib/eal/linux/eal.c
index 57da058..9b59cec 100644
--- a/lib/eal/linux/eal.c
+++ b/lib/eal/linux/eal.c
@@ -360,7 +360,7 @@ enum rte_proc_type_t
 		 * keep that open and don't close it to prevent a race condition
 		 * between multiple opens.
 		 */
-		if (((mem_cfg_fd = open(pathname, O_RDWR)) >= 0) &&
+		if (((mem_cfg_fd = open(pathname, O_RDWR | O_CREAT, 0600)) >= 0) &&
 				(fcntl(mem_cfg_fd, F_SETLK, &wr_lock) < 0))
 			ptype = RTE_PROC_SECONDARY;
 	}
-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH v2 5/5] eal: fix memzone fbarray cleanup
       [not found] ` <20240307070113.29580-1-artemyko@nvidia.com>
                     ` (2 preceding siblings ...)
  2024-03-07  7:01   ` [PATCH v2 4/5] eal: fix first time primary autodetect Artemy Kovalyov
@ 2024-03-07  7:01   ` Artemy Kovalyov
  2024-03-13 16:17     ` Burakov, Anatoly
  3 siblings, 1 reply; 8+ messages in thread
From: Artemy Kovalyov @ 2024-03-07  7:01 UTC (permalink / raw)
  To: dev; +Cc: Thomas Monjalon, stable, Anatoly Burakov

The initialization of the Memzone file-backed array ensures its
uniqueness by employing an exclusive lock. This is crucial because only
one primary process can exist per specific shm_id, which is further
protected by the exclusive EAL runtime configuration lock.

However, during the process closure, the exclusive lock on both the
fbarray and the configuration is not explicitly released. The
responsibility of releasing these locks is left to the generic quit
procedure. This can lead to a potential race condition when the
configuration is released before the fbarray.

To address this, we propose explicitly closing the memzone fbarray. This
ensures proper order of operations during process closure and prevents
any potential race conditions arising from the mismatched lock release
timings.

Fixes: af75078fece3 ("first public release")
Cc: stable@dpdk.org

Signed-off-by: Artemy Kovalyov <artemyko@nvidia.com>
---
 lib/eal/common/eal_common_memzone.c | 12 ++++++++++++
 lib/eal/common/eal_private.h        |  5 +++++
 lib/eal/linux/eal.c                 |  1 +
 3 files changed, 18 insertions(+)

diff --git a/lib/eal/common/eal_common_memzone.c b/lib/eal/common/eal_common_memzone.c
index 1f3e701..7db8029 100644
--- a/lib/eal/common/eal_common_memzone.c
+++ b/lib/eal/common/eal_common_memzone.c
@@ -447,6 +447,18 @@
 	return ret;
 }
 
+void
+rte_eal_memzone_cleanup(void)
+{
+	struct rte_mem_config *mcfg;
+
+	mcfg = rte_eal_get_configuration()->mem_config;
+
+	if (rte_eal_process_type() == RTE_PROC_PRIMARY) {
+		rte_fbarray_destroy(&mcfg->memzones);
+	}
+}
+
 /* Walk all reserved memory zones */
 void rte_memzone_walk(void (*func)(const struct rte_memzone *, void *),
 		      void *arg)
diff --git a/lib/eal/common/eal_private.h b/lib/eal/common/eal_private.h
index 4d2e806..944c365 100644
--- a/lib/eal/common/eal_private.h
+++ b/lib/eal/common/eal_private.h
@@ -81,6 +81,11 @@ struct rte_config {
 int rte_eal_memzone_init(void);
 
 /**
+ * Cleanup the memzone subsystem (private to eal).
+ */
+void rte_eal_memzone_cleanup(void);
+
+/**
  * Fill configuration with number of physical and logical processors
  *
  * This function is private to EAL.
diff --git a/lib/eal/linux/eal.c b/lib/eal/linux/eal.c
index 9b59cec..dfcbe64 100644
--- a/lib/eal/linux/eal.c
+++ b/lib/eal/linux/eal.c
@@ -1375,6 +1375,7 @@ static void rte_eal_init_alert(const char *msg)
 	eal_trace_fini();
 	eal_mp_dev_hotplug_cleanup();
 	rte_eal_alarm_cleanup();
+	rte_eal_memzone_cleanup();
 	/* after this point, any DPDK pointers will become dangling */
 	rte_eal_memory_detach();
 	rte_eal_malloc_heap_cleanup();
-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v2 2/5] eal: fix multiprocess hotplug race
  2024-03-07  7:01   ` [PATCH v2 2/5] eal: fix multiprocess hotplug race Artemy Kovalyov
@ 2024-03-13 16:05     ` Burakov, Anatoly
  0 siblings, 0 replies; 8+ messages in thread
From: Burakov, Anatoly @ 2024-03-13 16:05 UTC (permalink / raw)
  To: Artemy Kovalyov, dev; +Cc: Thomas Monjalon, stable, Qi Zhang

On 3/7/2024 8:01 AM, Artemy Kovalyov wrote:
> There exists a time gap between the creation of the multiprocess channel
> and the registration of request action handlers. Within this window, a
> secondary process that receives an eal_dev_mp_request broadcast
> notification might respond with ENOTSUP. This, in turn, causes the
> rte_dev_probe() operation to fail in another secondary process.
> To avoid this, disregarding ENOTSUP responses to attach notifications.
> 
> Fixes: 244d5130719c ("eal: enable hotplug on multi-process")
> Cc: stable@dpdk.org
> 
> Signed-off-by: Artemy Kovalyov <artemyko@nvidia.com>
> ---
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>

-- 
Thanks,
Anatoly


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v2 3/5] ipc: fix mp channel closure to prevent message loss
  2024-03-07  7:01   ` [PATCH v2 3/5] ipc: fix mp channel closure to prevent message loss Artemy Kovalyov
@ 2024-03-13 16:06     ` Burakov, Anatoly
  0 siblings, 0 replies; 8+ messages in thread
From: Burakov, Anatoly @ 2024-03-13 16:06 UTC (permalink / raw)
  To: Artemy Kovalyov, dev
  Cc: Thomas Monjalon, stable, David Marchand, Maxime Coquelin

On 3/7/2024 8:01 AM, Artemy Kovalyov wrote:
> This commit addresses an issue related to the cleanup of the
> multiprocess channel. Previously, when closing the channel, there was a
> risk of losing trailing messages. This issue was particularly noticeable
> when broadcast message from primary to secondary processes was sent
> while a secondary process was closing it's mp channel. In this fix, we
> delete mp socket file before stopping mp receive thread.
> 
> Fixes: e7885281ded1 ("ipc: stop mp control thread on cleanup")
> Cc: stable@dpdk.org
> 
> Signed-off-by: Artemy Kovalyov <artemyko@nvidia.com>
> ---
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>

-- 
Thanks,
Anatoly


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v2 4/5] eal: fix first time primary autodetect
  2024-03-07  7:01   ` [PATCH v2 4/5] eal: fix first time primary autodetect Artemy Kovalyov
@ 2024-03-13 16:06     ` Burakov, Anatoly
  0 siblings, 0 replies; 8+ messages in thread
From: Burakov, Anatoly @ 2024-03-13 16:06 UTC (permalink / raw)
  To: Artemy Kovalyov, dev; +Cc: Thomas Monjalon, stable

On 3/7/2024 8:01 AM, Artemy Kovalyov wrote:
> If the configuration file is absent, the autodetection function should
> generate and secure it. Otherwise, multiple simultaneous openings could
> erroneously identify themselves as primary instances.
> 
> Fixes: af75078fece3 ("first public release")
> Cc: stable@dpdk.org
> 
> Signed-off-by: Artemy Kovalyov <artemyko@nvidia.com>
> ---

Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>

-- 
Thanks,
Anatoly


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v2 5/5] eal: fix memzone fbarray cleanup
  2024-03-07  7:01   ` [PATCH v2 5/5] eal: fix memzone fbarray cleanup Artemy Kovalyov
@ 2024-03-13 16:17     ` Burakov, Anatoly
  0 siblings, 0 replies; 8+ messages in thread
From: Burakov, Anatoly @ 2024-03-13 16:17 UTC (permalink / raw)
  To: Artemy Kovalyov, dev; +Cc: Thomas Monjalon, stable

On 3/7/2024 8:01 AM, Artemy Kovalyov wrote:
> The initialization of the Memzone file-backed array ensures its
> uniqueness by employing an exclusive lock. This is crucial because only
> one primary process can exist per specific shm_id, which is further
> protected by the exclusive EAL runtime configuration lock.
> 

I think you meant to say "prefix", not "shm_id".

> However, during the process closure, the exclusive lock on both the
> fbarray and the configuration is not explicitly released. The
> responsibility of releasing these locks is left to the generic quit
> procedure. This can lead to a potential race condition when the
> configuration is released before the fbarray.
> 
> To address this, we propose explicitly closing the memzone fbarray. This
> ensures proper order of operations during process closure and prevents
> any potential race conditions arising from the mismatched lock release
> timings.
> 
> Fixes: af75078fece3 ("first public release")
> Cc: stable@dpdk.org
> 
> Signed-off-by: Artemy Kovalyov <artemyko@nvidia.com>
> ---

I would suggest having a different Fixes: ID, because fbarrays were only 
added in 18.05 when we added dynamic memory support. I propose using 
this commit ID instead:

49df3db84883 ("memzone: replace memzone array with fbarray")

This is the first commit where memzones used fbarrays.

> +void
> +rte_eal_memzone_cleanup(void)
> +{
> +	struct rte_mem_config *mcfg;
> +
> +	mcfg = rte_eal_get_configuration()->mem_config;
> +
> +	if (rte_eal_process_type() == RTE_PROC_PRIMARY) {
> +		rte_fbarray_destroy(&mcfg->memzones);
> +	}

Nitpick: extraneous brackets, this is a one liner so they're not needed.

With changes above,

Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
-- 
Thanks,
Anatoly


^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2024-03-13 16:17 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <20231212042517.164353-2-artemyko@nvidia.com>
     [not found] ` <20240307070113.29580-1-artemyko@nvidia.com>
2024-03-07  7:01   ` [PATCH v2 2/5] eal: fix multiprocess hotplug race Artemy Kovalyov
2024-03-13 16:05     ` Burakov, Anatoly
2024-03-07  7:01   ` [PATCH v2 3/5] ipc: fix mp channel closure to prevent message loss Artemy Kovalyov
2024-03-13 16:06     ` Burakov, Anatoly
2024-03-07  7:01   ` [PATCH v2 4/5] eal: fix first time primary autodetect Artemy Kovalyov
2024-03-13 16:06     ` Burakov, Anatoly
2024-03-07  7:01   ` [PATCH v2 5/5] eal: fix memzone fbarray cleanup Artemy Kovalyov
2024-03-13 16:17     ` Burakov, Anatoly

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).