DPDK patches and discussions
From: Wathsala Vithanage <wathsala.vithanage@arm.com>
To: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>,
	Konstantin Ananyev <konstantin.ananyev@huawei.com>,
	Ola Liljedahl <ola.liljedahl@arm.com>,
	Steve Capper <steve.capper@arm.com>, Gavin Hu <gahu@nvidia.com>
Cc: dev@dpdk.org, Wathsala Vithanage <wathsala.vithanage@arm.com>,
	stable@dpdk.org, Dhruv Tripathi <dhruv.tripathi@arm.com>
Subject: [PATCH v5 1/3] ring: safe partial ordering for head/tail update
Date: Tue, 11 Nov 2025 18:37:17 +0000
Message-ID: <20251111183720.833295-1-wathsala.vithanage@arm.com>

The function __rte_ring_headtail_move_head() assumes that the barrier
(fence) between the load of the head and the load-acquire of the
opposing tail guarantees the following: if a first thread reads tail
and then writes head, and a second thread reads the new value of head
and then reads tail, then the second thread observes the same (or a
later) value of tail.

This assumption is incorrect under the C11 memory model. If the barrier
(fence) is intended to establish a total ordering of ring operations,
it fails to do so. Instead, the current implementation only enforces a
partial ordering, which can lead to unsafe interleavings. In particular,
some partial orders can cause underflows in free slot or available
element computations, potentially resulting in data corruption.
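To make the underflow concrete, the following is a minimal sketch of the
entry computation on free-running unsigned 32-bit counters. It assumes
the ring computes available entries as capacity + stail - old_head, with
capacity equal to the ring capacity for a producer and 0 for a consumer.
The helpers ring_entries() and clamp_burst() are hypothetical names, not
DPDK API.

```c
#include <stdint.h>

/*
 * Hypothetical helper modeled on the ring's entry computation.
 * Producer: capacity = ring capacity, stail = consumer tail -> free slots.
 * Consumer: capacity = 0, stail = producer tail -> available elements.
 * Unsigned arithmetic: a result "below zero" silently wraps around.
 */
static uint32_t ring_entries(uint32_t capacity, uint32_t stail,
			     uint32_t old_head)
{
	return capacity + stail - old_head;
}

/* Hypothetical clamp of a requested burst n to the apparent entries. */
static uint32_t clamp_burst(uint32_t n, uint32_t entries)
{
	return n > entries ? entries : n;
}
```

With a consistent view (producer tail 10, consumer head 6),
ring_entries(0, 10, 6) is 4. If another consumer has already advanced
the consumer head to 12 while this thread still reads a stale producer
tail of 10, ring_entries(0, 10, 12) wraps to 4294967294, so
clamp_burst(32, ...) no longer limits the burst and the consumer may
reserve slots that no producer has published to it.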

The issue manifests when a CPU first acts as a producer and later as a
consumer. In this scenario, the barrier assumption may fail when another
core takes the consumer role. A Herd7 litmus test in C11 can demonstrate
this violation. The problem has not been widely observed so far because:
  (a) on strong memory models (e.g., x86-64) the assumption holds, and
  (b) on relaxed models with RCsc semantics the ordering is still strong
      enough to prevent hazards.
The problem becomes visible only on weaker models, when load-acquire is
implemented with RCpc semantics (e.g., some AArch64 CPUs that support
the LDAPR and LDAPUR instructions).

Three possible solutions exist:
  1. Strengthen ordering by upgrading release/acquire semantics to
     sequential consistency. This requires using seq-cst for stores,
     loads, and CAS operations. However, this approach introduces a
     significant performance penalty on relaxed-memory architectures.

  2. Establish a safe partial order by enforcing a pair-wise
     happens-before relationship between threads of the same role,
     converting the CAS and the preceding load of the head to release
     and acquire, respectively. This approach makes the original
     barrier assumption unnecessary and allows its removal.

  3. Retain partial ordering but ensure only safe partial orders are
     committed. This can be done by detecting underflow conditions
     (producer < consumer) and quashing the update in such cases.
     This approach makes the original barrier assumption unnecessary
     and allows its removal.
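One possible shape of the underflow check behind solution (3) is
sketched below. It is not what this patch does (the patch implements
solution (2)). It assumes free-running 32-bit counters, a ring capacity
below 2^31, and an entry count computed as capacity + stail - old_head
(capacity is the ring capacity for a producer and 0 for a consumer).
view_is_unsafe() and reserve_or_quash() are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical check for solution (3). A legitimate entry count never
 * reaches 2^31 (assuming ring capacity < 2^31), so a result with the top
 * bit set means the subtraction wrapped: this thread's view of the
 * opposing tail is stale relative to the head it read (producer < consumer).
 */
static bool view_is_unsafe(uint32_t capacity, uint32_t stail,
			   uint32_t old_head)
{
	uint32_t entries = capacity + stail - old_head;

	return entries >= UINT32_C(0x80000000);
}

/*
 * Hypothetical head-move step for solution (3): rather than committing a
 * head update computed from an unsafe view, quash it and report 0.
 */
static uint32_t reserve_or_quash(uint32_t capacity, uint32_t stail,
				 uint32_t old_head, uint32_t n)
{
	if (view_is_unsafe(capacity, stail, old_head))
		return 0;	/* quashed: no head update is committed */

	uint32_t entries = capacity + stail - old_head;
	return n > entries ? entries : n;
}
```

A quashed attempt reports 0 even when the ring actually has room, so a
producer using this scheme could see a spurious enqueue failure.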

This patch implements solution (2) to preserve the “enqueue always
succeeds” contract expected by dependent libraries (e.g., mempool).
While solution (3) offers higher performance, adopting it now would
break that assumption.
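For readers less familiar with DPDK's rte_atomic_* wrappers, the
orderings that solution (2) establishes can be sketched with plain C11
<stdatomic.h>. This is a simplified, hypothetical head move, not the
DPDK code: struct headtail, move_head() and update_tail() are
illustrative names, and the sketch omits the single-thread path, the
fixed/variable burst behaviors, and the wait for earlier threads before
the tail store. It assumes the entry count is capacity + stail -
old_head (capacity is the ring capacity for a producer and 0 for a
consumer). The A0/A1/R0/R1/A2 labels follow the comments in the patch.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical simplified head/tail pair (not struct rte_ring_headtail). */
struct headtail {
	_Atomic uint32_t head;
	_Atomic uint32_t tail;
};

/* R0: release store of the tail, pairs with A1 in the opposing role. */
static void update_tail(struct headtail *ht, uint32_t new_val)
{
	atomic_store_explicit(&ht->tail, new_val, memory_order_release);
}

/* Reserve up to n entries by advancing d->head; returns the count. */
static uint32_t move_head(struct headtail *d, struct headtail *s,
			  uint32_t capacity, uint32_t n,
			  uint32_t *old_head, uint32_t *new_head)
{
	const uint32_t max = n;

	/* A0: acquire, pairs with the release CAS (R1) of a same-role thread. */
	*old_head = atomic_load_explicit(&d->head, memory_order_acquire);
	do {
		n = max;
		/* A1: acquire, pairs with the opposing role's tail store (R0). */
		uint32_t stail = atomic_load_explicit(&s->tail,
						      memory_order_acquire);
		uint32_t entries = capacity + stail - *old_head;

		if (n > entries)
			n = entries;
		if (n == 0)
			return 0;
		*new_head = *old_head + n;
		/* R1 on success; A2 on failure, which refreshes *old_head. */
	} while (!atomic_compare_exchange_strong_explicit(&d->head, old_head,
			*new_head, memory_order_release, memory_order_acquire));
	return n;
}
```

The release/acquire pair on the head (R1 with A0/A2) is what lets a
thread that reads a head value also see a tail value at least as new as
the one seen by the thread that wrote that head, which is the property
the removed fence was assumed to provide.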

Fixes: 49594a63147a9 ("ring/c11: relax ordering for load and store of the head")
Cc: stable@dpdk.org

Signed-off-by: Wathsala Vithanage <wathsala.vithanage@arm.com>
Signed-off-by: Ola Liljedahl <ola.liljedahl@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Dhruv Tripathi <dhruv.tripathi@arm.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@huawei.com>
Tested-by: Konstantin Ananyev <konstantin.ananyev@huawei.com>
---
 lib/ring/rte_ring_c11_pvt.h | 37 +++++++++++++++++++++++++++++--------
 1 file changed, 29 insertions(+), 8 deletions(-)

diff --git a/lib/ring/rte_ring_c11_pvt.h b/lib/ring/rte_ring_c11_pvt.h
index b9388af0da..07b6efc416 100644
--- a/lib/ring/rte_ring_c11_pvt.h
+++ b/lib/ring/rte_ring_c11_pvt.h
@@ -36,6 +36,11 @@ __rte_ring_update_tail(struct rte_ring_headtail *ht, uint32_t old_val,
 		rte_wait_until_equal_32((uint32_t *)(uintptr_t)&ht->tail, old_val,
 			rte_memory_order_relaxed);
 
+	/*
+	 * R0: Establishes a synchronizing edge with load-acquire of tail at A1.
+	 * Ensures that memory effects by this thread on the ring elements
+	 * array are observed by a different thread of the other type.
+	 */
 	rte_atomic_store_explicit(&ht->tail, new_val, rte_memory_order_release);
 }
 
@@ -77,17 +82,24 @@ __rte_ring_headtail_move_head(struct rte_ring_headtail *d,
 	int success;
 	unsigned int max = n;
 
+	/*
+	 * A0: Establishes a synchronizing edge with R1.
+	 * Ensures that this thread observes the same value
+	 * of stail as the thread that updated
+	 * d->head.
+	 * If not, an unsafe partial order may ensue.
+	 */
 	*old_head = rte_atomic_load_explicit(&d->head,
-			rte_memory_order_relaxed);
+			rte_memory_order_acquire);
 	do {
 		/* Reset n to the initial burst count */
 		n = max;
 
-		/* Ensure the head is read before tail */
-		rte_atomic_thread_fence(rte_memory_order_acquire);
-
-		/* load-acquire synchronize with store-release of ht->tail
-		 * in update_tail.
+		/*
+		 * A1: Establishes a synchronizing edge with R0.
+		 * Ensures that the other thread's memory effects on
+		 * the ring elements array are observed by the time
+		 * this thread observes its tail update.
 		 */
 		stail = rte_atomic_load_explicit(&s->tail,
 					rte_memory_order_acquire);
@@ -113,10 +125,19 @@ __rte_ring_headtail_move_head(struct rte_ring_headtail *d,
 			success = 1;
 		} else
 			/* on failure, *old_head is updated */
+			/*
+			 * R1/A2.
+			 * R1: Establishes a synchronizing edge with A0 of a
+			 * different thread.
+			 * A2: Establishes a synchronizing edge with R1 of a
+			 * different thread to observe the same value of stail
+			 * observed by that thread on CAS failure (to retry
+			 * with an updated *old_head).
+			 */
 			success = rte_atomic_compare_exchange_strong_explicit(
 					&d->head, old_head, *new_head,
-					rte_memory_order_relaxed,
-					rte_memory_order_relaxed);
+					rte_memory_order_release,
+					rte_memory_order_acquire);
 	} while (unlikely(success == 0));
 	return n;
 }
-- 
2.43.0


Thread overview: 4+ messages
2025-11-11 18:37 Wathsala Vithanage [this message]
2025-11-11 18:37 ` [PATCH v5 2/3] ring: establish a safe partial order in hts-ring Wathsala Vithanage
2025-11-11 18:37 ` [PATCH v5 3/3] ring: establish a safe partial order in rts-ring Wathsala Vithanage
2025-11-11 20:45 ` [PATCH v5 1/3] ring: safe partial ordering for head/tail update Thomas Monjalon