* [PATCH v4 1/3] ring: safe partial ordering for head/tail update
@ 2025-11-11 18:16 Wathsala Vithanage
2025-11-11 18:16 ` [PATCH v4 2/3] ring: establish a safe partial order in hts-ring Wathsala Vithanage
2025-11-11 18:16 ` [PATCH v4 3/3] ring: establish a safe partial order in rts-ring Wathsala Vithanage
From: Wathsala Vithanage @ 2025-11-11 18:16 UTC
To: Honnappa Nagarahalli, Konstantin Ananyev, Ola Liljedahl,
Steve Capper, Gavin Hu
Cc: dev, Wathsala Vithanage, stable, Dhruv Tripathi
The function __rte_ring_headtail_move_head() assumes that the barrier
(fence) between the load of the head and the load-acquire of the
opposing tail guarantees the following: if a first thread reads tail
and then writes head and a second thread reads the new value of head
and then reads tail, then it should observe the same (or a later)
value of tail.
This assumption is incorrect under the C11 memory model. If the barrier
(fence) is intended to establish a total ordering of ring operations,
it fails to do so. Instead, the current implementation only enforces a
partial ordering, which can lead to unsafe interleavings. In particular,
some partial orders can cause underflows in free slot or available
element computations, potentially resulting in data corruption.
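As a rough illustration of the underflow (the values and variable names here are
illustrative only, not taken from a real run), consider the unsigned free-slot
computation evaluated with a stale consumer tail, using plain C:

#include <stdint.h>
#include <stdio.h>

int main(void)
{
        uint32_t capacity  = 16;
        uint32_t prod_head = 20;  /* new head, published by another producer */
        uint32_t cons_tail = 2;   /* stale tail; the up-to-date value would be 6 */

        /* same shape as the ring's free-entries computation */
        uint32_t free_entries = capacity + cons_tail - prod_head;

        /* 16 + 2 - 20 wraps to 4294967294: the producer now believes billions
         * of slots are free and may overwrite entries not yet consumed. */
        printf("%u\n", free_entries);
        return 0;
}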
The issue manifests when a CPU first acts as a producer and later as a
consumer. In this scenario, the barrier assumption may fail when another
core takes the consumer role. A Herd7 litmus test in C11 can demonstrate
this violation. The problem has not been widely observed so far because:
(a) on strong memory models (e.g., x86-64) the assumption holds, and
(b) on relaxed models with RCsc semantics the ordering is still strong
enough to prevent hazards.
The problem becomes visible only on weaker models, when load-acquire is
implemented with RCpc semantics (e.g. some AArch64 CPUs which support
the LDAPR and LDAPUR instructions).
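For illustration, a minimal herd7-style litmus sketch of the pattern described
above could look like the following. This is a reconstruction under assumed
names (P0/P1/P2, head, tail), not the exact test used during review:

C ring-head-tail-rcpc

{}

P0(atomic_int *tail) {
        /* consumer finishing: publish new tail */
        atomic_store_explicit(tail, 1, memory_order_release);
}

P1(atomic_int *head, atomic_int *tail) {
        /* producer: read tail, then publish new head (relaxed, as in the old code) */
        int r0 = atomic_load_explicit(tail, memory_order_acquire);
        atomic_store_explicit(head, 1, memory_order_relaxed);
}

P2(atomic_int *head, atomic_int *tail) {
        /* second producer: read the new head, fence, then read tail */
        int r1 = atomic_load_explicit(head, memory_order_relaxed);
        atomic_thread_fence(memory_order_acquire);
        int r2 = atomic_load_explicit(tail, memory_order_acquire);
}

exists (1:r0=1 /\ 2:r1=1 /\ 2:r2=0)

Under the C11 model this outcome should be reachable: the relaxed store to head
carries no release semantics, so P2's acquire fence has nothing to synchronize
with, and P2 may observe the new head together with the old tail. This is
exactly the ordering the barrier was assumed to forbid. Making the head store a
release and the head load an acquire (solution 2 below) rules it out.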
Three possible solutions exist:
1. Strengthen ordering by upgrading release/acquire semantics to
sequential consistency. This requires using seq-cst for stores,
loads, and CAS operations. However, this approach introduces a
significant performance penalty on relaxed-memory architectures.
2. Establish a safe partial order by enforcing a pair-wise
happens-before relationship between threads of the same role,
converting the CAS and the preceding load of the head to
release and acquire semantics, respectively. This approach makes
the original barrier assumption unnecessary and allows its
removal (a simplified sketch follows this list).
3. Retain partial ordering but ensure only safe partial orders are
committed. This can be done by detecting underflow conditions
(producer < consumer) and quashing the update in such cases.
This approach makes the original barrier assumption unnecessary
and allows its removal.
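A simplified sketch of the resulting pattern for solution (2), paraphrasing the
change made to __rte_ring_headtail_move_head() below and using plain C11
atomics instead of the rte_ wrappers for brevity:

old_head = atomic_load_explicit(&d->head, memory_order_acquire);          /* A0 */
do {
        stail = atomic_load_explicit(&s->tail, memory_order_acquire);     /* A1 */
        /* ... compute entries and new_head from old_head and stail ... */
} while (!atomic_compare_exchange_strong_explicit(&d->head,
                &old_head, new_head,
                memory_order_release,    /* R1: pairs with A0 of the next thread */
                memory_order_acquire));  /* A2: re-read stail consistently on failure */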
This patch implements solution (2) to preserve the “enqueue always
succeeds” contract expected by dependent libraries (e.g., mempool).
While solution (3) offers higher performance, adopting it now would
break that assumption.
Fixes: 49594a63147a9 ("ring/c11: relax ordering for load and store of the head")
Cc: stable@dpdk.org
Signed-off-by: Wathsala Vithanage <wathsala.vithanage@arm.com>
Signed-off-by: Ola Liljedahl <ola.liljedahl@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Dhruv Tripathi <dhruv.tripathi@arm.com>
---
lib/ring/rte_ring_c11_pvt.h | 37 +++++++++++++++++++++++++++++--------
1 file changed, 29 insertions(+), 8 deletions(-)
diff --git a/lib/ring/rte_ring_c11_pvt.h b/lib/ring/rte_ring_c11_pvt.h
index b9388af0da..07b6efc416 100644
--- a/lib/ring/rte_ring_c11_pvt.h
+++ b/lib/ring/rte_ring_c11_pvt.h
@@ -36,6 +36,11 @@ __rte_ring_update_tail(struct rte_ring_headtail *ht, uint32_t old_val,
rte_wait_until_equal_32((uint32_t *)(uintptr_t)&ht->tail, old_val,
rte_memory_order_relaxed);
+ /*
+ * R0: Establishes a synchronizing edge with load-acquire of tail at A1.
+ * Ensures that memory effects by this thread on the ring elements
+ * array are observed by a different thread of the other type.
+ */
rte_atomic_store_explicit(&ht->tail, new_val, rte_memory_order_release);
}
@@ -77,17 +82,24 @@ __rte_ring_headtail_move_head(struct rte_ring_headtail *d,
int success;
unsigned int max = n;
+ /*
+ * A0: Establishes a synchronizing edge with R1.
+ * Ensures that this thread observes the same (or a later)
+ * value of stail as observed by the thread that updated
+ * d->head.
+ * If not, an unsafe partial order may ensue.
+ */
*old_head = rte_atomic_load_explicit(&d->head,
- rte_memory_order_relaxed);
+ rte_memory_order_acquire);
do {
/* Reset n to the initial burst count */
n = max;
- /* Ensure the head is read before tail */
- rte_atomic_thread_fence(rte_memory_order_acquire);
-
- /* load-acquire synchronize with store-release of ht->tail
- * in update_tail.
+ /*
+ * A1: Establishes a synchronizing edge with R0.
+ * Ensures that the other thread's memory effects on the
+ * ring elements array are observed by the time this thread
+ * observes its tail update.
*/
stail = rte_atomic_load_explicit(&s->tail,
rte_memory_order_acquire);
@@ -113,10 +125,19 @@ __rte_ring_headtail_move_head(struct rte_ring_headtail *d,
success = 1;
} else
/* on failure, *old_head is updated */
+ /*
+ * R1/A2.
+ * R1: Establishes a synchronizing edge with A0 of a
+ * different thread.
+ * A2: Establishes a synchronizing edge with R1 of a
+ * different thread, to observe the same value of stail as
+ * observed by that thread, on CAS failure (to retry
+ * with an updated *old_head).
+ */
success = rte_atomic_compare_exchange_strong_explicit(
&d->head, old_head, *new_head,
- rte_memory_order_relaxed,
- rte_memory_order_relaxed);
+ rte_memory_order_release,
+ rte_memory_order_acquire);
} while (unlikely(success == 0));
return n;
}
--
2.43.0
* [PATCH v4 2/3] ring: establish a safe partial order in hts-ring
2025-11-11 18:16 [PATCH v4 1/3] ring: safe partial ordering for head/tail update Wathsala Vithanage
@ 2025-11-11 18:16 ` Wathsala Vithanage
2025-11-11 18:16 ` [PATCH v4 3/3] ring: establish a safe partial order in rts-ring Wathsala Vithanage
From: Wathsala Vithanage @ 2025-11-11 18:16 UTC
To: Honnappa Nagarahalli, Konstantin Ananyev
Cc: dev, Wathsala Vithanage, stable, Ola Liljedahl
Enforce a safe partial order by making the CAS and the preceding head
load use release and acquire semantics. This creates a pairwise
happens-before relationship between threads of the same role.
Combine the two load-acquire operations of ht.raw, which were previously
split across the two paths of a conditional branch, into
__rte_ring_hts_head_wait. This simplifies the branching logic and makes
the synchronization behavior easier to understand.
Add comments to explain the synchronizes-with edges in detail.
Fixes: 1cc363b8ce06e ("ring: introduce HTS ring mode")
Cc: stable@dpdk.org
Signed-off-by: Wathsala Vithanage <wathsala.vithanage@arm.com>
Signed-off-by: Ola Liljedahl <ola.liljedahl@arm.com>
---
lib/ring/rte_ring_hts_elem_pvt.h | 66 ++++++++++++++++++++++++--------
1 file changed, 49 insertions(+), 17 deletions(-)
diff --git a/lib/ring/rte_ring_hts_elem_pvt.h b/lib/ring/rte_ring_hts_elem_pvt.h
index e2b82dd1e6..a01089d15d 100644
--- a/lib/ring/rte_ring_hts_elem_pvt.h
+++ b/lib/ring/rte_ring_hts_elem_pvt.h
@@ -32,22 +32,40 @@ __rte_ring_hts_update_tail(struct rte_ring_hts_headtail *ht, uint32_t old_tail,
RTE_SET_USED(enqueue);
tail = old_tail + num;
+
+ /*
+ * R0: Release the tail update. Establishes a synchronizes-with edge with
+ * the load-acquire at A1. This release ensures that all updates to *ht
+ * and the ring array made by this thread become visible to the opposing
+ * thread once the tail value written here is observed.
+ */
rte_atomic_store_explicit(&ht->ht.pos.tail, tail, rte_memory_order_release);
}
/**
- * @internal waits till tail will become equal to head.
- * Means no writer/reader is active for that ring.
- * Suppose to work as serialization point.
+ * @internal
+ * Waits until the tail becomes equal to the head.
+ * This indicates that another thread has finished its transaction, and there
+ * is a chance that we could be the next writer or reader in line.
+ *
+ * Returns ht.raw at this point. The value may be imprecise, since another
+ * thread might change the state before we observe ht.raw, but that does not
+ * matter. The function __rte_ring_hts_move_head() can detect this at the
+ * linearization point (the CAS) and call this function again.
*/
-static __rte_always_inline void
+static __rte_always_inline union __rte_ring_hts_pos
__rte_ring_hts_head_wait(const struct rte_ring_hts_headtail *ht,
- union __rte_ring_hts_pos *p)
+ rte_memory_order memorder)
{
- while (p->pos.head != p->pos.tail) {
+ union __rte_ring_hts_pos p;
+ p.raw = rte_atomic_load_explicit(&ht->ht.raw, memorder);
+
+ while (p.pos.head != p.pos.tail) {
rte_pause();
- p->raw = rte_atomic_load_explicit(&ht->ht.raw, rte_memory_order_acquire);
+ p.raw = rte_atomic_load_explicit(&ht->ht.raw, memorder);
}
+
+ return p;
}
/**
@@ -80,11 +98,9 @@ __rte_ring_hts_move_head(struct rte_ring_hts_headtail *d,
enum rte_ring_queue_behavior behavior, uint32_t *old_head,
uint32_t *entries)
{
- uint32_t n;
+ uint32_t n, stail;
union __rte_ring_hts_pos np, op;
- op.raw = rte_atomic_load_explicit(&d->ht.raw, rte_memory_order_acquire);
-
do {
/* Reset n to the initial burst count */
n = num;
@@ -94,7 +110,20 @@ __rte_ring_hts_move_head(struct rte_ring_hts_headtail *d,
* make sure that we read prod head/tail *before*
* reading cons tail.
*/
- __rte_ring_hts_head_wait(d, &op);
+ /*
+ * A0: Synchronizes with the CAS at R1.
+ * Establishes a happens-before relationship with a thread of the same
+ * type that released the ht.raw, ensuring this thread observes all of
+ * its memory effects needed to maintain a safe partial order.
+ */
+ op = __rte_ring_hts_head_wait(d, rte_memory_order_acquire);
+
+ /*
+ * A1: Establishes a synchronizes-with edge with the store-release at R0.
+ * This ensures that all memory effects from the preceding opposing
+ * thread are observed.
+ */
+ stail = rte_atomic_load_explicit(&s->tail, rte_memory_order_acquire);
/*
* The subtraction is done between two unsigned 32bits value
@@ -102,7 +131,7 @@ __rte_ring_hts_move_head(struct rte_ring_hts_headtail *d,
* *old_head > cons_tail). So 'entries' is always between 0
* and capacity (which is < size).
*/
- *entries = capacity + s->tail - op.pos.head;
+ *entries = capacity + stail - op.pos.head;
/* check that we have enough room in ring */
if (unlikely(n > *entries))
@@ -116,14 +145,17 @@ __rte_ring_hts_move_head(struct rte_ring_hts_headtail *d,
np.pos.head = op.pos.head + n;
/*
- * this CAS(ACQUIRE, ACQUIRE) serves as a hoist barrier to prevent:
- * - OOO reads of cons tail value
- * - OOO copy of elems from the ring
+ * R1: Establishes a synchronizes-with edge with the load-acquire
+ * of ht.raw at A0. This makes sure that any store-release to the
+ * tail performed earlier by this thread, while it acted as a thread
+ * of the opposite type, becomes visible to another thread of the
+ * current type. That thread will then observe the updates in the
+ * same order, keeping a safe partial order.
*/
} while (rte_atomic_compare_exchange_strong_explicit(&d->ht.raw,
(uint64_t *)(uintptr_t)&op.raw, np.raw,
- rte_memory_order_acquire,
- rte_memory_order_acquire) == 0);
+ rte_memory_order_release,
+ rte_memory_order_relaxed) == 0);
*old_head = op.pos.head;
return n;
--
2.43.0
* [PATCH v4 3/3] ring: establish a safe partial order in rts-ring
2025-11-11 18:16 [PATCH v4 1/3] ring: safe partial ordering for head/tail update Wathsala Vithanage
2025-11-11 18:16 ` [PATCH v4 2/3] ring: establish a safe partial order in hts-ring Wathsala Vithanage
@ 2025-11-11 18:16 ` Wathsala Vithanage
From: Wathsala Vithanage @ 2025-11-11 18:16 UTC
To: Honnappa Nagarahalli, Konstantin Ananyev
Cc: dev, Wathsala Vithanage, stable, Ola Liljedahl
Enforce a safe partial order by making the CAS and the preceding head
load use release and acquire semantics. This creates a pairwise
happens-before relationship between threads of the same kind
(i.e. between consumers or between producers).
Combine the two load-acquire operations of ht->head.raw, which were
previously split across the two paths of a conditional branch, into
__rte_ring_rts_head_wait. This simplifies the branching logic and makes
the synchronization behavior easier to understand.
Add comments to explain the synchronizes-with edges in detail.
Fixes: e6ba4731c0f3a ("ring: introduce RTS ring mode")
Cc: stable@dpdk.org
Signed-off-by: Wathsala Vithanage <wathsala.vithanage@arm.com>
Signed-off-by: Ola Liljedahl <ola.liljedahl@arm.com>
---
lib/ring/rte_ring_rts_elem_pvt.h | 67 +++++++++++++++++++++++---------
1 file changed, 49 insertions(+), 18 deletions(-)
diff --git a/lib/ring/rte_ring_rts_elem_pvt.h b/lib/ring/rte_ring_rts_elem_pvt.h
index 96825931f8..0280369b21 100644
--- a/lib/ring/rte_ring_rts_elem_pvt.h
+++ b/lib/ring/rte_ring_rts_elem_pvt.h
@@ -31,6 +31,17 @@ __rte_ring_rts_update_tail(struct rte_ring_rts_headtail *ht)
* might preceded us, then don't update tail with new value.
*/
+ /*
+ * A0 = {A0.a, A0.b}: Synchronizes with the CAS at R0.
+ * The CAS at R0 in a thread of the same type establishes a happens-before
+ * relationship with this load-acquire. Ensures that this thread
+ * observes the same (or later) values of h.raw/h.val.cnt as
+ * observed by the other thread when it updated ht->tail.raw.
+ * If not, ht->tail.raw may get updated out of sync (e.g. getting
+ * updated to the same value twice). A0.a makes sure this condition
+ * holds when CAS succeeds and A0.b when it fails.
+ */
+ /* A0.a */
ot.raw = rte_atomic_load_explicit(&ht->tail.raw, rte_memory_order_acquire);
do {
@@ -40,7 +51,11 @@ __rte_ring_rts_update_tail(struct rte_ring_rts_headtail *ht)
nt.raw = ot.raw;
if (++nt.val.cnt == h.val.cnt)
nt.val.pos = h.val.pos;
-
+ /*
+ * R0: Synchronizes with A2 of a different thread of the opposite type and A0.b
+ * of a different thread of the same type.
+ */
+ /* A0.b */
} while (rte_atomic_compare_exchange_strong_explicit(&ht->tail.raw,
(uint64_t *)(uintptr_t)&ot.raw, nt.raw,
rte_memory_order_release, rte_memory_order_acquire) == 0);
@@ -50,18 +65,21 @@ __rte_ring_rts_update_tail(struct rte_ring_rts_headtail *ht)
* @internal This function waits till head/tail distance wouldn't
* exceed pre-defined max value.
*/
-static __rte_always_inline void
+static __rte_always_inline union __rte_ring_rts_poscnt
__rte_ring_rts_head_wait(const struct rte_ring_rts_headtail *ht,
- union __rte_ring_rts_poscnt *h)
+ rte_memory_order memorder)
{
- uint32_t max;
+ union __rte_ring_rts_poscnt h;
+ uint32_t max = ht->htd_max;
- max = ht->htd_max;
+ h.raw = rte_atomic_load_explicit(&ht->head.raw, memorder);
- while (h->val.pos - ht->tail.val.pos > max) {
+ while (h.val.pos - ht->tail.val.pos > max) {
rte_pause();
- h->raw = rte_atomic_load_explicit(&ht->head.raw, rte_memory_order_acquire);
+ h.raw = rte_atomic_load_explicit(&ht->head.raw, memorder);
}
+
+ return h;
}
/**
@@ -94,12 +112,9 @@ __rte_ring_rts_move_head(struct rte_ring_rts_headtail *d,
enum rte_ring_queue_behavior behavior, uint32_t *old_head,
uint32_t *entries)
{
- uint32_t n;
+ uint32_t n, stail;
union __rte_ring_rts_poscnt nh, oh;
- oh.raw = rte_atomic_load_explicit(&d->head.raw,
- rte_memory_order_acquire);
-
do {
/* Reset n to the initial burst count */
n = num;
@@ -109,7 +124,20 @@ __rte_ring_rts_move_head(struct rte_ring_rts_headtail *d,
* make sure that we read prod head *before*
* reading cons tail.
*/
- __rte_ring_rts_head_wait(d, &oh);
+ /*
+ * A1: Synchronizes with the CAS at R1.
+ * Establishes a happens-before relationship with a thread of the same
+ * type that released head.raw, ensuring that this thread observes all of
+ * its memory effects needed to maintain a safe partial order.
+ */
+ oh = __rte_ring_rts_head_wait(d, rte_memory_order_acquire);
+
+ /*
+ * A2: Establishes a synchronizes-with edge with the store-release at R0.
+ * This ensures that all memory effects from the preceding opposing
+ * thread are observed.
+ */
+ stail = rte_atomic_load_explicit(&s->tail, rte_memory_order_acquire);
/*
* The subtraction is done between two unsigned 32bits value
@@ -117,7 +145,7 @@ __rte_ring_rts_move_head(struct rte_ring_rts_headtail *d,
* *old_head > cons_tail). So 'entries' is always between 0
* and capacity (which is < size).
*/
- *entries = capacity + s->tail - oh.val.pos;
+ *entries = capacity + stail - oh.val.pos;
/* check that we have enough room in ring */
if (unlikely(n > *entries))
@@ -131,14 +159,17 @@ __rte_ring_rts_move_head(struct rte_ring_rts_headtail *d,
nh.val.cnt = oh.val.cnt + 1;
/*
- * this CAS(ACQUIRE, ACQUIRE) serves as a hoist barrier to prevent:
- * - OOO reads of cons tail value
- * - OOO copy of elems to the ring
+ * R1: Establishes a synchronizes-with edge with the load-acquire
+ * of head.raw at A1. Ensures that any store-release to the tail
+ * performed earlier by this thread, while it acted as a thread of
+ * the opposite type, becomes visible to another thread of the
+ * current type. That thread will then observe the updates in the
+ * same order, keeping a safe partial order.
*/
} while (rte_atomic_compare_exchange_strong_explicit(&d->head.raw,
(uint64_t *)(uintptr_t)&oh.raw, nh.raw,
- rte_memory_order_acquire,
- rte_memory_order_acquire) == 0);
+ rte_memory_order_release,
+ rte_memory_order_relaxed) == 0);
*old_head = oh.val.pos;
return n;
--
2.43.0