DPDK patches and discussions
* [PATCH] eal/alarm_cancel: Fix thread starvation
@ 2024-09-25 19:42 Wojciech Panfil
  2024-09-28 16:40 ` Stephen Hemminger
  2024-10-04 12:00 ` David Marchand
  0 siblings, 2 replies; 6+ messages in thread
From: Wojciech Panfil @ 2024-09-25 19:42 UTC
  To: bruce.richardson
  Cc: pallavi.kadam, dev, jacek.kalwas, konrad.sztyber, dmitry.kozliuk,
	roretzla, wojciech.panfil

Issue:
Two threads:

- A, executing rte_eal_alarm_cancel,
- B, executing eal_alarm_callback.

This combination can starve thread B. Thread A releases the lock at the
end of each loop iteration and re-acquires it almost immediately, so
thread B can obtain the lock only if it is scheduled within that very
small time window.

The solution is to call sched_yield() right after the unlock, which
moves the current thread (A) to the end of the run queue for its
priority and allows thread B to execute. The Windows implementation
calls SwitchToThread() for the same purpose.

The issue can be observed, for example, on the hot-pluggable device
detach path. On that path, rte_alarm can be used to check whether DPDK
has completed the detachment. While waiting for completion,
rte_eal_alarm_cancel is called while another thread periodically runs
eal_alarm_callback, which triggers the issue.

Signed-off-by: Wojciech Panfil <wojciech.panfil@intel.com>
---
 lib/eal/freebsd/eal_alarm.c | 6 ++++++
 lib/eal/linux/eal_alarm.c   | 6 ++++++
 lib/eal/windows/eal_alarm.c | 5 +++++
 3 files changed, 17 insertions(+)

diff --git a/lib/eal/freebsd/eal_alarm.c b/lib/eal/freebsd/eal_alarm.c
index 94cae5f4b6..3680f5caba 100644
--- a/lib/eal/freebsd/eal_alarm.c
+++ b/lib/eal/freebsd/eal_alarm.c
@@ -318,7 +318,13 @@ rte_eal_alarm_cancel(rte_eal_alarm_callback cb_fn, void *cb_arg)
 			}
 			ap_prev = ap;
 		}
+
 		rte_spinlock_unlock(&alarm_list_lk);
+
+		/* Yield control to a second thread executing eal_alarm_callback to avoid
+		 * its starvation, as it is waiting for the lock we have just released.
+		 */
+		sched_yield();
 	} while (executing != 0);
 
 	if (count == 0 && err == 0)
diff --git a/lib/eal/linux/eal_alarm.c b/lib/eal/linux/eal_alarm.c
index eeb096213b..9fe14ade63 100644
--- a/lib/eal/linux/eal_alarm.c
+++ b/lib/eal/linux/eal_alarm.c
@@ -248,7 +248,13 @@ rte_eal_alarm_cancel(rte_eal_alarm_callback cb_fn, void *cb_arg)
 			}
 			ap_prev = ap;
 		}
+
 		rte_spinlock_unlock(&alarm_list_lk);
+
+		/* Yield control to a second thread executing eal_alarm_callback to avoid
+		 * its starvation, as it is waiting for the lock we have just released.
+		 */
+		sched_yield();
 	} while (executing != 0);
 
 	if (count == 0 && err == 0)
diff --git a/lib/eal/windows/eal_alarm.c b/lib/eal/windows/eal_alarm.c
index 052af4b21b..9ad530dd31 100644
--- a/lib/eal/windows/eal_alarm.c
+++ b/lib/eal/windows/eal_alarm.c
@@ -211,6 +211,11 @@ rte_eal_alarm_cancel(rte_eal_alarm_callback cb_fn, void *cb_arg)
 		}
 
 		rte_spinlock_unlock(&alarm_lock);
+
+		/* Yield control to a second thread executing eal_alarm_callback to avoid
+		 * its starvation, as it is waiting for the lock we have just released.
+		 */
+		SwitchToThread();
 	} while (executing);
 
 	rte_eal_trace_alarm_cancel(cb_fn, cb_arg, removed);
-- 
2.46.0



* Re: [PATCH] eal/alarm_cancel: Fix thread starvation
  2024-09-25 19:42 [PATCH] eal/alarm_cancel: Fix thread starvation Wojciech Panfil
@ 2024-09-28 16:40 ` Stephen Hemminger
  2024-10-04 12:00 ` David Marchand
  1 sibling, 0 replies; 6+ messages in thread
From: Stephen Hemminger @ 2024-09-28 16:40 UTC
  To: Wojciech Panfil
  Cc: bruce.richardson, pallavi.kadam, dev, jacek.kalwas,
	konrad.sztyber, dmitry.kozliuk, roretzla

On Wed, 25 Sep 2024 21:42:06 +0200
Wojciech Panfil <wojciech.panfil@intel.com> wrote:

> Issue:
> Two threads:
> 
> - A, executing rte_eal_alarm_cancel,
> - B, executing eal_alarm_callback.
> 
> This combination can starve thread B. Thread A releases the lock at the
> end of each loop iteration and re-acquires it almost immediately, so
> thread B can obtain the lock only if it is scheduled within that very
> small time window.
> 
> The solution is to call sched_yield() right after the unlock, which
> moves the current thread (A) to the end of the run queue for its
> priority and allows thread B to execute. The Windows implementation
> calls SwitchToThread() for the same purpose.
> 
> The issue can be observed, for example, on the hot-pluggable device
> detach path. On that path, rte_alarm can be used to check whether DPDK
> has completed the detachment. While waiting for completion,
> rte_eal_alarm_cancel is called while another thread periodically runs
> eal_alarm_callback, which triggers the issue.
> 
> Signed-off-by: Wojciech Panfil <wojciech.panfil@intel.com>

It would be good to get a test for this into the DPDK functional tests.
See: https://patchwork.dpdk.org/project/dpdk/patch/20240809152540.9568-4-stephen@networkplumber.org/

Reviewed-by: Stephen Hemminger <stephen@networkplumber.org>


* Re: [PATCH] eal/alarm_cancel: Fix thread starvation
  2024-09-25 19:42 [PATCH] eal/alarm_cancel: Fix thread starvation Wojciech Panfil
  2024-09-28 16:40 ` Stephen Hemminger
@ 2024-10-04 12:00 ` David Marchand
  1 sibling, 0 replies; 6+ messages in thread
From: David Marchand @ 2024-10-04 12:00 UTC
  To: Wojciech Panfil
  Cc: bruce.richardson, pallavi.kadam, dev, jacek.kalwas,
	konrad.sztyber, dmitry.kozliuk, roretzla, Stephen Hemminger

On Wed, Sep 25, 2024 at 9:34 PM Wojciech Panfil
<wojciech.panfil@intel.com> wrote:
>
> Issue:
> Two threads:
>
> - A, executing rte_eal_alarm_cancel,
> - B, executing eal_alarm_callback.
>
> This combination can starve thread B. Thread A releases the lock at the
> end of each loop iteration and re-acquires it almost immediately, so
> thread B can obtain the lock only if it is scheduled within that very
> small time window.
>
> The solution is to call sched_yield() right after the unlock, which
> moves the current thread (A) to the end of the run queue for its
> priority and allows thread B to execute. The Windows implementation
> calls SwitchToThread() for the same purpose.
>
> The issue can be observed, for example, on the hot-pluggable device
> detach path. On that path, rte_alarm can be used to check whether DPDK
> has completed the detachment. While waiting for completion,
> rte_eal_alarm_cancel is called while another thread periodically runs
> eal_alarm_callback, which triggers the issue.
>
> Signed-off-by: Wojciech Panfil <wojciech.panfil@intel.com>

Reviewed-by: Stephen Hemminger <stephen@networkplumber.org>

Applied, thanks (and welcome) Wojciech.


-- 
David Marchand



* Re: [PATCH] eal/alarm_cancel: Fix thread starvation
  2024-09-18 11:39 Wojciech Panfil
@ 2024-09-18 16:06 ` Stephen Hemminger
  0 siblings, 0 replies; 6+ messages in thread
From: Stephen Hemminger @ 2024-09-18 16:06 UTC
  To: Wojciech Panfil
  Cc: bruce.richardson, pallavi.kadam, dev, jacek.kalwas,
	konrad.sztyber, dmitry.kozliuk, roretzla

On Wed, 18 Sep 2024 13:39:06 +0200
Wojciech Panfil <wojciech.panfil@intel.com> wrote:

> Issue:
> Two threads:
> 
> - A, executing rte_eal_alarm_cancel,
> - B, executing eal_alarm_callback.
> 
> This combination can starve thread B. Thread A releases the lock at the
> end of each loop iteration and re-acquires it almost immediately, so
> thread B can obtain the lock only if it is scheduled within that very
> small time window.
> 
> The solution is to call sched_yield() right after the unlock, which
> moves the current thread (A) to the end of the run queue for its
> priority and allows thread B to execute. The Windows implementation
> calls SwitchToThread() for the same purpose.
> 
> The issue can be observed, for example, on the hot-pluggable device
> detach path. On that path, rte_alarm can be used to check whether DPDK
> has completed the detachment. While waiting for completion,
> rte_eal_alarm_cancel is called while another thread periodically runs
> eal_alarm_callback, which triggers the issue.
> 
> Signed-off-by: Wojciech Panfil <wojciech.panfil@intel.com>

Makes sense. Alarm is a non-EAL thread, and so is hotplug.

Acked-by: Stephen Hemminger <stephen@networkplumber.org>

Does the timer_stop code have similar issues?
Probably only if users do unexpected things, like
mapping multiple logical lcores to the same CPU.


* [PATCH] eal/alarm_cancel: Fix thread starvation
@ 2024-09-18 11:39 Wojciech Panfil
  2024-09-18 16:06 ` Stephen Hemminger
  0 siblings, 1 reply; 6+ messages in thread
From: Wojciech Panfil @ 2024-09-18 11:39 UTC
  To: bruce.richardson
  Cc: pallavi.kadam, dev, jacek.kalwas, konrad.sztyber, dmitry.kozliuk,
	roretzla, wojciech.panfil

Issue:
Two threads:

- A, executing rte_eal_alarm_cancel,
- B, executing eal_alarm_callback.

This combination can starve thread B. Thread A releases the lock at the
end of each loop iteration and re-acquires it almost immediately, so
thread B can obtain the lock only if it is scheduled within that very
small time window.

The solution is to call sched_yield() right after the unlock, which
moves the current thread (A) to the end of the run queue for its
priority and allows thread B to execute. The Windows implementation
calls SwitchToThread() for the same purpose.

The issue can be observed, for example, on the hot-pluggable device
detach path. On that path, rte_alarm can be used to check whether DPDK
has completed the detachment. While waiting for completion,
rte_eal_alarm_cancel is called while another thread periodically runs
eal_alarm_callback, which triggers the issue.

Signed-off-by: Wojciech Panfil <wojciech.panfil@intel.com>
---
 lib/eal/freebsd/eal_alarm.c | 6 ++++++
 lib/eal/linux/eal_alarm.c   | 6 ++++++
 lib/eal/windows/eal_alarm.c | 5 +++++
 3 files changed, 17 insertions(+)

diff --git a/lib/eal/freebsd/eal_alarm.c b/lib/eal/freebsd/eal_alarm.c
index 94cae5f4b6..3680f5caba 100644
--- a/lib/eal/freebsd/eal_alarm.c
+++ b/lib/eal/freebsd/eal_alarm.c
@@ -318,7 +318,13 @@ rte_eal_alarm_cancel(rte_eal_alarm_callback cb_fn, void *cb_arg)
 			}
 			ap_prev = ap;
 		}
+
 		rte_spinlock_unlock(&alarm_list_lk);
+
+		/* Yield control to a second thread executing eal_alarm_callback to avoid
+		 * its starvation, as it is waiting for the lock we have just released.
+		 */
+		sched_yield();
 	} while (executing != 0);
 
 	if (count == 0 && err == 0)
diff --git a/lib/eal/linux/eal_alarm.c b/lib/eal/linux/eal_alarm.c
index eeb096213b..9fe14ade63 100644
--- a/lib/eal/linux/eal_alarm.c
+++ b/lib/eal/linux/eal_alarm.c
@@ -248,7 +248,13 @@ rte_eal_alarm_cancel(rte_eal_alarm_callback cb_fn, void *cb_arg)
 			}
 			ap_prev = ap;
 		}
+
 		rte_spinlock_unlock(&alarm_list_lk);
+
+		/* Yield control to a second thread executing eal_alarm_callback to avoid
+		 * its starvation, as it is waiting for the lock we have just released.
+		 */
+		sched_yield();
 	} while (executing != 0);
 
 	if (count == 0 && err == 0)
diff --git a/lib/eal/windows/eal_alarm.c b/lib/eal/windows/eal_alarm.c
index 052af4b21b..9ad530dd31 100644
--- a/lib/eal/windows/eal_alarm.c
+++ b/lib/eal/windows/eal_alarm.c
@@ -211,6 +211,11 @@ rte_eal_alarm_cancel(rte_eal_alarm_callback cb_fn, void *cb_arg)
 		}
 
 		rte_spinlock_unlock(&alarm_lock);
+
+		/* Yield control to a second thread executing eal_alarm_callback to avoid
+		 * its starvation, as it is waiting for the lock we have just released.
+		 */
+		SwitchToThread();
 	} while (executing);
 
 	rte_eal_trace_alarm_cancel(cb_fn, cb_arg, removed);
-- 
2.46.0



* [PATCH] eal/alarm_cancel: Fix thread starvation
@ 2024-09-18  7:02 Wojciech Panfil
  0 siblings, 0 replies; 6+ messages in thread
From: Wojciech Panfil @ 2024-09-18  7:02 UTC
  To: bruce.richardson
  Cc: pallavi.kadam, dev, jacek.kalwas, konrad.sztyber, dmitry.kozliuk,
	roretzla, wojciech.panfil

Issue:
Two threads:

- A, executing rte_eal_alarm_cancel,
- B, executing eal_alarm_callback.

This combination can starve thread B. Thread A releases the lock at the
end of each loop iteration and re-acquires it almost immediately, so
thread B can obtain the lock only if it is scheduled within that very
small time window.

The solution is to call sched_yield() right after the unlock, which
moves the current thread (A) to the end of the run queue for its
priority and allows thread B to execute. The Windows implementation
calls SwitchToThread() for the same purpose.

The issue can be observed, for example, on the hot-pluggable device
detach path. On that path, rte_alarm can be used to check whether DPDK
has completed the detachment. While waiting for completion,
rte_eal_alarm_cancel is called while another thread periodically runs
eal_alarm_callback, which triggers the issue.

Signed-off-by: Wojciech Panfil <wojciech.panfil@intel.com>
---
 lib/eal/freebsd/eal_alarm.c | 5 +++++
 lib/eal/linux/eal_alarm.c   | 5 +++++
 lib/eal/windows/eal_alarm.c | 4 ++++
 3 files changed, 14 insertions(+)

diff --git a/lib/eal/freebsd/eal_alarm.c b/lib/eal/freebsd/eal_alarm.c
index 94cae5f4b6..8425b4f5a2 100644
--- a/lib/eal/freebsd/eal_alarm.c
+++ b/lib/eal/freebsd/eal_alarm.c
@@ -318,7 +318,12 @@ rte_eal_alarm_cancel(rte_eal_alarm_callback cb_fn, void *cb_arg)
 			}
 			ap_prev = ap;
 		}
+
 		rte_spinlock_unlock(&alarm_list_lk);
+
+		/* Yield control to a second thread executing eal_alarm_callback to avoid its starvation,
+		 * as it is waiting for the lock we have just released. */
+		sched_yield();
 	} while (executing != 0);
 
 	if (count == 0 && err == 0)
diff --git a/lib/eal/linux/eal_alarm.c b/lib/eal/linux/eal_alarm.c
index eeb096213b..5326b1895f 100644
--- a/lib/eal/linux/eal_alarm.c
+++ b/lib/eal/linux/eal_alarm.c
@@ -248,7 +248,12 @@ rte_eal_alarm_cancel(rte_eal_alarm_callback cb_fn, void *cb_arg)
 			}
 			ap_prev = ap;
 		}
+
 		rte_spinlock_unlock(&alarm_list_lk);
+
+		/* Yield control to a second thread executing eal_alarm_callback to avoid its starvation,
+		 * as it is waiting for the lock we have just released. */
+		sched_yield();
 	} while (executing != 0);
 
 	if (count == 0 && err == 0)
diff --git a/lib/eal/windows/eal_alarm.c b/lib/eal/windows/eal_alarm.c
index 052af4b21b..43e8d7881f 100644
--- a/lib/eal/windows/eal_alarm.c
+++ b/lib/eal/windows/eal_alarm.c
@@ -211,6 +211,10 @@ rte_eal_alarm_cancel(rte_eal_alarm_callback cb_fn, void *cb_arg)
 		}
 
 		rte_spinlock_unlock(&alarm_lock);
+
+		/* Yield control to a second thread executing eal_alarm_callback to avoid its starvation,
+		 * as it is waiting for the lock we have just released. */
+		SwitchToThread();
 	} while (executing);
 
 	rte_eal_trace_alarm_cancel(cb_fn, cb_arg, removed);
-- 
2.46.0




Thread overview: 6+ messages
2024-09-25 19:42 [PATCH] eal/alarm_cancel: Fix thread starvation Wojciech Panfil
2024-09-28 16:40 ` Stephen Hemminger
2024-10-04 12:00 ` David Marchand
  -- strict thread matches above, loose matches on Subject: below --
2024-09-18 11:39 Wojciech Panfil
2024-09-18 16:06 ` Stephen Hemminger
2024-09-18  7:02 Wojciech Panfil
