patches for DPDK stable branches
 help / color / mirror / Atom feed
From: Kevin Traynor <ktraynor@redhat.com>
To: Wathsala Vithanage <wathsala.vithanage@arm.com>
Cc: Ola Liljedahl <ola.liljedahl@arm.com>, dpdk stable <stable@dpdk.org>
Subject: patch 'mcslock: fix memory ordering' has been queued to stable release 24.11.4
Date: Fri, 21 Nov 2025 11:20:48 +0000	[thread overview]
Message-ID: <20251121112128.485623-64-ktraynor@redhat.com> (raw)
In-Reply-To: <20251121112128.485623-1-ktraynor@redhat.com>

Hi,

FYI, your patch has been queued to stable release 24.11.4

Note it hasn't been pushed to http://dpdk.org/browse/dpdk-stable yet.
It will be pushed if I get no objections before 11/26/25. So please
shout if anyone has objections.

Also note that after the patch there's a diff of the upstream commit vs the
patch applied to the branch. This will indicate if there was any rebasing
needed to apply to the stable branch. If there were code changes for rebasing
(ie: not only metadata diffs), please double check that the rebase was
correctly done.

Queued patches are on a temporary branch at:
https://github.com/kevintraynor/dpdk-stable

This queued commit can be viewed at:
https://github.com/kevintraynor/dpdk-stable/commit/c9e3afaa776e94a0d8671bc547806fdc1e657269

Thanks.

Kevin

---
From c9e3afaa776e94a0d8671bc547806fdc1e657269 Mon Sep 17 00:00:00 2001
From: Wathsala Vithanage <wathsala.vithanage@arm.com>
Date: Tue, 11 Nov 2025 17:38:50 +0000
Subject: [PATCH] mcslock: fix memory ordering
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

[ upstream commit 8357af1cb3a359810bd56eab78ed104495c8094f ]

Fix incorrect memory ordering in the MCS lock implementation by
adding proper synchronizing edges to establish clear happens-before
relationships between threads invoking lock() and unlock(). These
synchronizing edges prevent potential deadlocks caused by improper
ordering and are documented in detail through in-code comments.

The previously relaxed load of the successor’s lock object pointer
in unlock() has been upgraded to a load-acquire operation. This
change ensures that the successor’s initialization does not
overwrite the current lock holder’s update to the locked field,
which could otherwise lead to deadlocks.

Remove two unnecessary fences:

The acquire fence in unlock() had no matching release fence, making
it ineffective for enforcing memory order. The associated comment
suggested it prevented speculative reordering, but such fences (data
memory barriers) only establish memory ordering and do not control
instruction speculation.

The release-acquire fence pair in lock() was previously justified as
preventing reordering between the load-acquire loop of me->locked
and the store-release of prev->next. This is no longer needed, as the
new synchronizing edges ensure a chain of happens-before
relationships between memory operations of threads calling lock() and
unlock().

Fixes: 2173f3333b61d ("mcslock: add MCS queued lock implementation")

Signed-off-by: Wathsala Vithanage <wathsala.vithanage@arm.com>
Reviewed-by: Ola Liljedahl <ola.liljedahl@arm.com>
---
 lib/eal/include/rte_mcslock.h | 100 ++++++++++++++++++++++------------
 1 file changed, 64 insertions(+), 36 deletions(-)

diff --git a/lib/eal/include/rte_mcslock.h b/lib/eal/include/rte_mcslock.h
index bb218d2e50..0af7a94a06 100644
--- a/lib/eal/include/rte_mcslock.h
+++ b/lib/eal/include/rte_mcslock.h
@@ -58,9 +58,19 @@ rte_mcslock_lock(RTE_ATOMIC(rte_mcslock_t *) *msl, rte_mcslock_t *me)
 	rte_atomic_store_explicit(&me->next, NULL, rte_memory_order_relaxed);
 
-	/* If the queue is empty, the exchange operation is enough to acquire
-	 * the lock. Hence, the exchange operation requires acquire semantics.
-	 * The store to me->next above should complete before the node is
-	 * visible to other CPUs/threads. Hence, the exchange operation requires
-	 * release semantics as well.
+	/*
+	 * A0/R0: Queue might be empty, perform the exchange (RMW) with both acquire and
+	 * release semantics:
+	 * A0: Acquire — synchronizes with both R0 and R2.
+	 *     Must synchronize with R0 to ensure that this thread observes predecessor's
+	 *     initialization of its lock object or risk them overwriting this thread's
+	 *     update to the next of the same object via store to prev->next.
+	 *
+	 *     Must synchronize with R2 the releasing CAS in unlock(), this will ensure
+	 *     that all prior critical-section writes become visible to this thread.
+	 *
+	 * R0: Release — ensures the successor observes our initialization of me->next;
+	 *     without it, me->next could be overwritten to NULL after the successor
+	 *     sets its own address, causing deadlock. This release synchronizes with
+	 *     A0 above.
 	 */
 	prev = rte_atomic_exchange_explicit(msl, me, rte_memory_order_acq_rel);
@@ -71,22 +81,24 @@ rte_mcslock_lock(RTE_ATOMIC(rte_mcslock_t *) *msl, rte_mcslock_t *me)
 		return;
 	}
-	/* The store to me->next above should also complete before the node is
-	 * visible to predecessor thread releasing the lock. Hence, the store
-	 * prev->next also requires release semantics. Note that, for example,
-	 * on ARM, the release semantics in the exchange operation is not
-	 * strong as a release fence and is not sufficient to enforce the
-	 * desired order here.
+
+	/*
+	 * R1: With the relaxed memory model of C/C++, it's essential that after
+	 * we link ourselves by storing prev->next = me, the owner of prev must
+	 * observe our prior initialization of me->locked. Otherwise it could
+	 * clear me->locked before we set it to 1, which may deadlock.
+	 * Perform a releasing store so the predecessor's acquire loads A2 and A3
+	 * observes our initialization, establishing a happens-before from those
+	 * writes.
 	 */
 	rte_atomic_store_explicit(&prev->next, me, rte_memory_order_release);
 
-	/* The while-load of me->locked should not move above the previous
-	 * store to prev->next. Otherwise it will cause a deadlock. Need a
-	 * store-load barrier.
-	 */
-	rte_atomic_thread_fence(rte_memory_order_acq_rel);
-	/* If the lock has already been acquired, it first atomically
+	/*
+	 * A1: If the lock has already been acquired, it first atomically
 	 * places the node at the end of the queue and then proceeds
 	 * to spin on me->locked until the previous lock holder resets
-	 * the me->locked using mcslock_unlock().
+	 * the me->locked in rte_mcslock_unlock().
+	 * This load must synchronize with store-release R3 to ensure that
+	 * all updates to critical section by previous lock holder is visible
+	 * to this thread after acquiring the lock.
 	 */
 	rte_wait_until_equal_32((uint32_t *)(uintptr_t)&me->locked, 0, rte_memory_order_acquire);
@@ -104,28 +116,44 @@ static inline void
 rte_mcslock_unlock(RTE_ATOMIC(rte_mcslock_t *) *msl, RTE_ATOMIC(rte_mcslock_t *) me)
 {
-	/* Check if there are more nodes in the queue. */
-	if (likely(rte_atomic_load_explicit(&me->next, rte_memory_order_relaxed) == NULL)) {
+	/*
+	 * A2: Check whether a successor is already queued.
+	 * Load me->next with acquire semantics so it can synchronize with the
+	 * successor’s release store R1. This guarantees that the successor’s
+	 * initialization of its lock object (me) is completed before we observe
+	 * it here, preventing a race between this thread’s store-release to
+	 * me->next->locked and the successor’s store to me->locked.
+	 */
+	if (likely(rte_atomic_load_explicit(&me->next, rte_memory_order_acquire) == NULL)) {
 		/* No, last member in the queue. */
-		rte_mcslock_t *save_me = rte_atomic_load_explicit(&me, rte_memory_order_relaxed);
+		rte_mcslock_t *save_me = me;
 
-		/* Release the lock by setting it to NULL */
+		/*
+		 * R2: Try to release the lock by swinging *msl from save_me to NULL.
+		 * Use release semantics so all critical section writes become
+		 * visible to the next lock acquirer.
+		 */
 		if (likely(rte_atomic_compare_exchange_strong_explicit(msl, &save_me, NULL,
 				rte_memory_order_release, rte_memory_order_relaxed)))
 			return;
 
-		/* Speculative execution would be allowed to read in the
-		 * while-loop first. This has the potential to cause a
-		 * deadlock. Need a load barrier.
-		 */
-		rte_atomic_thread_fence(rte_memory_order_acquire);
-		/* More nodes added to the queue by other CPUs.
-		 * Wait until the next pointer is set.
+		/*
+		 * A3: Another thread was enqueued concurrently, so the CAS and the lock
+		 * release failed. Wait until the successor sets our 'next' pointer.
+		 * This load must synchronize with the successor’s release store (R1) to
+		 * ensure that the successor’s initialization completes before we observe
+		 * it here. This ordering prevents a race between this thread’s later
+		 * store-release to me->next->locked and the successor’s store to me->locked.
 		 */
 		RTE_ATOMIC(uintptr_t) *next;
 		next = (__rte_atomic uintptr_t *)&me->next;
-		RTE_WAIT_UNTIL_MASKED(next, UINTPTR_MAX, !=, 0, rte_memory_order_relaxed);
+		RTE_WAIT_UNTIL_MASKED(next, UINTPTR_MAX, !=, 0, rte_memory_order_acquire);
 	}
 
-	/* Pass lock to next waiter. */
+	/*
+	 * R3: Pass the lock to the successor.
+	 * Use a release store to synchronize with A1 when clearing me->next->locked
+	 * so the successor observes our critical section writes after it sees locked
+	 * become 0.
+	 */
 	rte_atomic_store_explicit(&me->next->locked, 0, rte_memory_order_release);
 }
@@ -150,9 +178,9 @@ rte_mcslock_trylock(RTE_ATOMIC(rte_mcslock_t *) *msl, rte_mcslock_t *me)
 	rte_mcslock_t *expected = NULL;
 
-	/* The lock can be taken only when the queue is empty. Hence,
-	 * the compare-exchange operation requires acquire semantics.
-	 * The store to me->next above should complete before the node
-	 * is visible to other CPUs/threads. Hence, the compare-exchange
-	 * operation requires release semantics as well.
+	/*
+	 * A4/R4: The lock can be acquired only when the queue is empty.
+	 * The compare-and-exchange operation must use acquire and release
+	 * semantics for the same reasons described in the rte_mcslock_lock()
+	 * function’s empty-queue case (see A0/R0 for details).
 	 */
 	return rte_atomic_compare_exchange_strong_explicit(msl, &expected, me,
-- 
2.51.0

---
  Diff of the applied patch vs upstream commit (please double-check if non-empty:
---
--- -	2025-11-21 11:05:11.588297999 +0000
+++ 0064-mcslock-fix-memory-ordering.patch	2025-11-21 11:05:09.540201498 +0000
@@ -1 +1 @@
-From 8357af1cb3a359810bd56eab78ed104495c8094f Mon Sep 17 00:00:00 2001
+From c9e3afaa776e94a0d8671bc547806fdc1e657269 Mon Sep 17 00:00:00 2001
@@ -8,0 +9,2 @@
+[ upstream commit 8357af1cb3a359810bd56eab78ed104495c8094f ]
+
@@ -37 +38,0 @@
-Cc: stable@dpdk.org


  parent reply	other threads:[~2025-11-21 11:23 UTC|newest]

Thread overview: 100+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-11-21 11:19 patch 'test/hash: check memory allocation' " Kevin Traynor
2025-11-21 11:19 ` patch 'dmadev: fix debug build with tracepoints' " Kevin Traynor
2025-11-21 11:19 ` patch 'bus/cdx: fix device name in probing error message' " Kevin Traynor
2025-11-21 11:19 ` patch 'bus/cdx: fix release in probing for secondary process' " Kevin Traynor
2025-11-21 11:19 ` patch 'buildtools/pmdinfogen: fix warning with python 3.14' " Kevin Traynor
2025-11-21 11:19 ` patch 'net/iavf: fix build with clang 21' " Kevin Traynor
2025-11-21 11:19 ` patch 'test: " Kevin Traynor
2025-11-21 11:19 ` patch 'app/eventdev: " Kevin Traynor
2025-11-21 11:19 ` patch 'eventdev/crypto: " Kevin Traynor
2025-11-21 11:19 ` patch 'rawdev: " Kevin Traynor
2025-11-21 11:19 ` patch 'vdpa/mlx5: remove unused constant' " Kevin Traynor
2025-11-21 11:19 ` patch 'crypto/mlx5: remove unused constants' " Kevin Traynor
2025-11-21 11:19 ` patch 'regex/mlx5: remove useless " Kevin Traynor
2025-11-21 11:19 ` patch 'common/mlx5: " Kevin Traynor
2025-11-21 11:19 ` patch 'net/mlx5: " Kevin Traynor
2025-11-21 11:20 ` patch 'net/mlx5: remove unused macros' " Kevin Traynor
2025-11-21 11:20 ` patch 'doc: fix NVIDIA bifurcated driver presentation link' " Kevin Traynor
2025-11-21 11:20 ` patch 'app/dma-perf: fix use after free' " Kevin Traynor
2025-11-21 11:20 ` patch 'app/dma-perf: check buffer size' " Kevin Traynor
2025-11-21 11:20 ` patch 'net/vmxnet3: disable RSS for single queue for ESX8.0+' " Kevin Traynor
2025-11-21 11:20 ` patch 'net/dpaa: fix resource leak' " Kevin Traynor
2025-11-21 11:20 ` patch 'net/txgbe: fix checksum error counter' " Kevin Traynor
2025-11-21 11:20 ` patch 'net/ngbe: " Kevin Traynor
2025-11-21 11:20 ` patch 'net/txgbe: reduce memory size of ring descriptors' " Kevin Traynor
2025-11-21 11:20 ` patch 'net/ngbe: " Kevin Traynor
2025-11-21 11:20 ` patch 'net/txgbe: fix VF Rx buffer size in config register' " Kevin Traynor
2025-11-21 11:20 ` patch 'net/txgbe: remove duplicate Tx queue assignment' " Kevin Traynor
2025-11-21 11:20 ` patch 'net/txgbe: add device arguments for FDIR' " Kevin Traynor
2025-11-21 11:20 ` patch 'net/txgbe: fix maximum number of FDIR filters' " Kevin Traynor
2025-11-21 11:20 ` patch 'net/txgbe: fix FDIR mode clearing' " Kevin Traynor
2025-11-21 11:20 ` patch 'net/txgbe: fix FDIR drop action for L4 match packets' " Kevin Traynor
2025-11-21 11:20 ` patch 'net/txgbe: fix FDIR filter for SCTP tunnel' " Kevin Traynor
2025-11-21 11:20 ` patch 'net/txgbe: filter FDIR match flex bytes for " Kevin Traynor
2025-11-21 11:20 ` patch 'net/txgbe: fix FDIR rule raw relative for L3 packets' " Kevin Traynor
2025-11-21 11:20 ` patch 'net/txgbe: fix FDIR input mask' " Kevin Traynor
2025-11-21 11:20 ` patch 'net/txgbe: switch to FDIR when ntuple filter is full' " Kevin Traynor
2025-11-21 11:20 ` patch 'net/txgbe: remove unsupported flow action mark' " Kevin Traynor
2025-11-21 11:20 ` patch 'net/nfp: fix metering cleanup' " Kevin Traynor
2025-11-21 11:20 ` patch 'net/bonding: fix MAC address propagation in 802.3ad mode' " Kevin Traynor
2025-11-21 11:20 ` patch 'app/testpmd: fix DCB Tx port' " Kevin Traynor
2025-11-21 11:20 ` patch 'app/testpmd: fix DCB Rx queues' " Kevin Traynor
2025-11-21 11:20 ` patch 'net/e1000/base: fix crash on init with GCC 13' " Kevin Traynor
2025-11-21 11:20 ` patch 'net/fm10k: fix build with GCC 16' " Kevin Traynor
2025-11-21 11:20 ` patch 'net/mlx4: fix unnecessary comma' " Kevin Traynor
2025-11-21 11:20 ` patch 'net/mlx5: fix unnecessary commas' " Kevin Traynor
2025-11-21 11:20 ` patch 'net/mlx5: store MTU at Rx queue allocation time' " Kevin Traynor
2025-11-21 11:20 ` patch 'net/mlx5: fix indirect RSS action hash' " Kevin Traynor
2025-11-21 11:20 ` patch 'net/mlx5: remove counter alignment' " Kevin Traynor
2025-11-21 11:20 ` patch 'net/mlx5: fix external queues access' " Kevin Traynor
2025-11-21 11:20 ` patch 'net/mlx5: fix modify field action restriction' " Kevin Traynor
2025-11-21 11:20 ` patch 'net/mlx5: fix meter mark allocation' " Kevin Traynor
2025-11-21 11:20 ` patch 'net/mlx5: fix indirect meter index leak' " Kevin Traynor
2025-11-21 11:20 ` patch 'net/mlx5: fix error reporting on masked indirect actions' " Kevin Traynor
2025-11-21 11:20 ` patch 'vhost: fix external buffer in VDUSE' " Kevin Traynor
2025-11-21 11:20 ` patch 'net: fix L2 length for GRE packets' " Kevin Traynor
2025-11-21 11:20 ` patch 'graph: fix xstats description allocation' " Kevin Traynor
2025-11-21 11:20 ` patch 'graph: fix updating edge with active graph' " Kevin Traynor
2025-11-21 11:20 ` patch 'app/pdump: remove hard-coded memory channels' " Kevin Traynor
2025-11-21 11:20 ` patch 'pdump: handle primary process exit' " Kevin Traynor
2025-11-21 11:20 ` patch 'telemetry: make socket handler typedef private' " Kevin Traynor
2025-11-21 11:20 ` patch 'usertools/telemetry: fix exporter default IP binding' " Kevin Traynor
2025-11-21 11:20 ` patch 'examples/l3fwd-power: fix telemetry command registration' " Kevin Traynor
2025-11-21 11:20 ` patch 'lib: fix backticks matching in Doxygen comments' " Kevin Traynor
2025-11-21 11:20 ` Kevin Traynor [this message]
2025-11-21 15:22   ` patch 'mcslock: fix memory ordering' " Wathsala Vithanage
2025-11-21 11:20 ` patch 'net/axgbe: fix build with GCC 16' " Kevin Traynor
2025-11-21 11:20 ` patch 'net/dpaa2: fix duplicate call of close' " Kevin Traynor
2025-11-21 11:20 ` patch 'net/dpaa2: clear active VDQ state when freeing Rx queues' " Kevin Traynor
2025-11-21 11:20 ` patch 'app/testpmd: fix flex item link parsing' " Kevin Traynor
2025-11-21 11:20 ` patch 'net/ixgbe/base: fix PF link state request size' " Kevin Traynor
2025-11-21 11:20 ` patch 'net/ice: fix path selection for QinQ Tx offload' " Kevin Traynor
2025-11-21 11:20 ` patch 'net/ice: fix statistics' " Kevin Traynor
2025-11-21 11:20 ` patch 'net/idpf: fix queue setup with TSO offload' " Kevin Traynor
2025-11-21 11:20 ` patch 'net/iavf: fix check for PF Rx timestamp support' " Kevin Traynor
2025-11-21 11:20 ` patch 'net/iavf: fix Rx timestamp validity check' " Kevin Traynor
2025-11-21 23:41   ` Jacob Keller
2025-11-21 11:20 ` patch 'common/cnxk: fix max number of SQB buffers in clean up' " Kevin Traynor
2025-11-21 11:21 ` patch 'common/cnxk: fix null SQ access' " Kevin Traynor
2025-11-21 11:21 ` patch 'common/cnxk: fix format specifier for bandwidth profile ID' " Kevin Traynor
2025-11-21 11:21 ` patch 'common/cnxk: fix NIX Rx inject enabling' " Kevin Traynor
2025-11-21 11:21 ` patch 'net/cnxk: fix Rx inject LF' " Kevin Traynor
2025-11-21 11:21 ` patch 'net/cnxk: fix default meter pre-color' " Kevin Traynor
2025-11-21 11:21 ` patch 'crypto/qat: fix CCM request descriptor hash state size' " Kevin Traynor
2025-11-21 11:21 ` patch 'net/dpaa2: remove ethdev pointer from bus device' " Kevin Traynor
2025-11-21 11:21 ` patch 'app/flow-perf: fix rules array length' " Kevin Traynor
2025-11-21 11:21 ` patch 'net/mlx5: fix spurious CPU wakeups' " Kevin Traynor
2025-11-21 11:21 ` patch 'net/mlx5: fix IPv6 DSCP offset in HWS sync API' " Kevin Traynor
2025-11-21 11:21 ` patch 'net/mlx5: fix send to kernel action resources release' " Kevin Traynor
2025-11-21 11:21 ` patch 'net/mlx5: release representor interrupt handler' " Kevin Traynor
2025-11-21 11:21 ` patch 'common/mlx5: release unused mempool entries' " Kevin Traynor
2025-11-21 11:21 ` patch 'net/mlx5/hws: fix buddy memory allocation' " Kevin Traynor
2025-11-21 11:21 ` patch 'net/mlx5: fix uninitialized variable' " Kevin Traynor
2025-11-21 11:21 ` patch 'net/mlx5: fix flow tag indexes support on root table' " Kevin Traynor
2025-11-21 11:21 ` patch 'net/mlx5/hws: fix flow rule hash capability' " Kevin Traynor
2025-11-21 11:21 ` patch 'net/mlx5: fix null dereference in modify header' " Kevin Traynor
2025-11-21 11:21 ` patch 'net/mlx5: skip Rx control flow tables in isolated mode' " Kevin Traynor
2025-11-21 11:21 ` patch 'net/mlx5: fix crash on flow rule destruction' " Kevin Traynor
2025-11-21 11:21 ` patch 'net/mlx5: move auxiliary data inline' " Kevin Traynor
2025-11-21 11:21 ` patch 'net/mlx5/windows: fix match criteria in flow creation' " Kevin Traynor
2025-11-21 11:21 ` patch 'net: fix IPv6 link local compliance with RFC 4291' " Kevin Traynor

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20251121112128.485623-64-ktraynor@redhat.com \
    --to=ktraynor@redhat.com \
    --cc=ola.liljedahl@arm.com \
    --cc=stable@dpdk.org \
    --cc=wathsala.vithanage@arm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).