* [PATCH v5 2/3] ring: establish a safe partial order in hts-ring
2025-11-11 18:37 [PATCH v5 1/3] ring: safe partial ordering for head/tail update Wathsala Vithanage
@ 2025-11-11 18:37 ` Wathsala Vithanage
2025-11-11 18:37 ` [PATCH v5 3/3] ring: establish a safe partial order in rts-ring Wathsala Vithanage
2025-11-11 20:45 ` [PATCH v5 1/3] ring: safe partial ordering for head/tail update Thomas Monjalon
2 siblings, 0 replies; 4+ messages in thread
From: Wathsala Vithanage @ 2025-11-11 18:37 UTC (permalink / raw)
To: Honnappa Nagarahalli, Konstantin Ananyev
Cc: dev, Wathsala Vithanage, stable, Ola Liljedahl
Enforce a safe partial order by making the CAS and the preceding head
load use release and acquire semantics. This creates a pairwise
happens-before relationship between threads of the same role.
Combine the two load-acquire operations of ht.raw, which were previously
split across the two paths of a conditional branch, into
__rte_ring_hts_head_wait. This simplifies the branching logic and makes
the synchronization behavior easier to understand.
Add comments explaining the synchronizes-with edges in detail.
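For illustration only, a minimal C11 sketch of the resulting pattern; it uses
plain stdatomic with hypothetical names (head_tail, opposing_tail, move_head),
not the DPDK ring types, and omits the head/tail wait loop. The A0/A1/R1
labels mirror the comments added in the diff below.

#include <stdatomic.h>
#include <stdint.h>

/* Packed head/tail word, loosely analogous to ht.raw; the layout here is
 * arbitrary and only for illustration. */
static _Atomic uint64_t head_tail;
/* Tail of the opposing head/tail pair (e.g. the cons tail seen by a producer). */
static _Atomic uint32_t opposing_tail;

static inline uint32_t head_of(uint64_t raw) { return (uint32_t)(raw >> 32); }
static inline uint32_t tail_of(uint64_t raw) { return (uint32_t)raw; }

static uint32_t
move_head(uint32_t num, uint32_t capacity)
{
    uint64_t old_raw, new_raw;
    uint32_t stail, entries, n;

    do {
        n = num;
        /* A0: acquire pairs with the release CAS (R1) of the previous thread
         * of the same role; the real code also spins here until head == tail. */
        old_raw = atomic_load_explicit(&head_tail, memory_order_acquire);
        /* A1: acquire pairs with the opposing role's store-release of its
         * tail (R0). */
        stail = atomic_load_explicit(&opposing_tail, memory_order_acquire);
        entries = capacity + stail - head_of(old_raw);
        if (n > entries)
            n = entries;
        if (n == 0)
            return 0;
        new_raw = ((uint64_t)(head_of(old_raw) + n) << 32) | tail_of(old_raw);
        /* R1: release on success publishes everything above to the next
         * same-role thread; relaxed on failure is enough because the loop
         * re-reads with acquire at A0 before retrying. */
    } while (!atomic_compare_exchange_strong_explicit(&head_tail, &old_raw,
             new_raw, memory_order_release, memory_order_relaxed));
    return n;
}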
Fixes: 1cc363b8ce06e ("ring: introduce HTS ring mode")
Cc: stable@dpdk.org
Signed-off-by: Wathsala Vithanage <wathsala.vithanage@arm.com>
Signed-off-by: Ola Liljedahl <ola.liljedahl@arm.com>
---
lib/ring/rte_ring_hts_elem_pvt.h | 66 ++++++++++++++++++++++++--------
1 file changed, 49 insertions(+), 17 deletions(-)
diff --git a/lib/ring/rte_ring_hts_elem_pvt.h b/lib/ring/rte_ring_hts_elem_pvt.h
index e2b82dd1e6..a01089d15d 100644
--- a/lib/ring/rte_ring_hts_elem_pvt.h
+++ b/lib/ring/rte_ring_hts_elem_pvt.h
@@ -32,22 +32,40 @@ __rte_ring_hts_update_tail(struct rte_ring_hts_headtail *ht, uint32_t old_tail,
RTE_SET_USED(enqueue);
tail = old_tail + num;
+
+ /*
+ * R0: Release the tail update. Establishes a synchronization edge with
+ * the load-acquire at A1. This release ensures that all updates to *ht
+ * and the ring array made by this thread become visible to the opposing
+ * thread once the tail value written here is observed.
+ */
rte_atomic_store_explicit(&ht->ht.pos.tail, tail, rte_memory_order_release);
}
/**
- * @internal waits till tail will become equal to head.
- * Means no writer/reader is active for that ring.
- * Suppose to work as serialization point.
+ * @internal
+ * Waits until the tail becomes equal to the head.
+ * This indicates that another thread has finished its transaction, and there
+ * is a chance that we could be the next writer or reader in line.
+ *
+ * Returns ht.raw at this point. The value may be imprecise, since another
+ * thread might change the state before we observe ht.raw, but that does not
+ * matter. The function __rte_ring_hts_move_head() can detect this and call
+ * this function again when it reaches the linearization point (CAS).
*/
-static __rte_always_inline void
+static __rte_always_inline union __rte_ring_hts_pos
__rte_ring_hts_head_wait(const struct rte_ring_hts_headtail *ht,
- union __rte_ring_hts_pos *p)
+ rte_memory_order memorder)
{
- while (p->pos.head != p->pos.tail) {
+ union __rte_ring_hts_pos p;
+ p.raw = rte_atomic_load_explicit(&ht->ht.raw, memorder);
+
+ while (p.pos.head != p.pos.tail) {
rte_pause();
- p->raw = rte_atomic_load_explicit(&ht->ht.raw, rte_memory_order_acquire);
+ p.raw = rte_atomic_load_explicit(&ht->ht.raw, memorder);
}
+
+ return p;
}
/**
@@ -80,11 +98,9 @@ __rte_ring_hts_move_head(struct rte_ring_hts_headtail *d,
enum rte_ring_queue_behavior behavior, uint32_t *old_head,
uint32_t *entries)
{
- uint32_t n;
+ uint32_t n, stail;
union __rte_ring_hts_pos np, op;
- op.raw = rte_atomic_load_explicit(&d->ht.raw, rte_memory_order_acquire);
-
do {
/* Reset n to the initial burst count */
n = num;
@@ -94,7 +110,20 @@ __rte_ring_hts_move_head(struct rte_ring_hts_headtail *d,
* make sure that we read prod head/tail *before*
* reading cons tail.
*/
- __rte_ring_hts_head_wait(d, &op);
+ /*
+ * A0: Synchronizes with the CAS at R1.
+ * Establishes a happens-before relationship with a thread of the same
+ * type that released the ht.raw, ensuring this thread observes all of
+ * its memory effects needed to maintain a safe partial order.
+ */
+ op = __rte_ring_hts_head_wait(d, rte_memory_order_acquire);
+
+ /*
+ * A1: Establishes a synchronizes-with edge with the store-release at R0.
+ * This ensures that all memory effects from the preceding opposing
+ * thread are observed.
+ */
+ stail = rte_atomic_load_explicit(&s->tail, rte_memory_order_acquire);
/*
* The subtraction is done between two unsigned 32bits value
@@ -102,7 +131,7 @@ __rte_ring_hts_move_head(struct rte_ring_hts_headtail *d,
* *old_head > cons_tail). So 'entries' is always between 0
* and capacity (which is < size).
*/
- *entries = capacity + s->tail - op.pos.head;
+ *entries = capacity + stail - op.pos.head;
/* check that we have enough room in ring */
if (unlikely(n > *entries))
@@ -116,14 +145,17 @@ __rte_ring_hts_move_head(struct rte_ring_hts_headtail *d,
np.pos.head = op.pos.head + n;
/*
- * this CAS(ACQUIRE, ACQUIRE) serves as a hoist barrier to prevent:
- * - OOO reads of cons tail value
- * - OOO copy of elems from the ring
+ * R1: Establishes a synchronizes-with edge with the load-acquire
+ * of ht.raw at A0. This makes sure that a store-release to the
+ * tail performed by this thread while acting in the opposite role
+ * becomes visible to another thread of the current type. That thread will
+ * then observe the updates in the same order, keeping a safe
+ * partial order.
*/
} while (rte_atomic_compare_exchange_strong_explicit(&d->ht.raw,
(uint64_t *)(uintptr_t)&op.raw, np.raw,
- rte_memory_order_acquire,
- rte_memory_order_acquire) == 0);
+ rte_memory_order_release,
+ rte_memory_order_relaxed) == 0);
*old_head = op.pos.head;
return n;
--
2.43.0
* [PATCH v5 3/3] ring: establish a safe partial order in rts-ring
2025-11-11 18:37 [PATCH v5 1/3] ring: safe partial ordering for head/tail update Wathsala Vithanage
2025-11-11 18:37 ` [PATCH v5 2/3] ring: establish a safe partial order in hts-ring Wathsala Vithanage
@ 2025-11-11 18:37 ` Wathsala Vithanage
2025-11-11 20:45 ` [PATCH v5 1/3] ring: safe partial ordering for head/tail update Thomas Monjalon
2 siblings, 0 replies; 4+ messages in thread
From: Wathsala Vithanage @ 2025-11-11 18:37 UTC (permalink / raw)
To: Honnappa Nagarahalli, Konstantin Ananyev
Cc: dev, Wathsala Vithanage, stable, Ola Liljedahl
Enforce a safe partial order by making the CAS and the preceding head
load use release and acquire semantics. This creates a pairwise
happens-before relationship between threads of the same kind
(i.e. between consumers or between producers).
Combine the two load-acquire operations of ht->head.raw, which were
previously split across the two paths of a conditional branch, into
__rte_ring_rts_head_wait. This simplifies the branching logic and makes
the synchronization behavior easier to understand.
Add comments explaining the synchronizes-with edges in detail.
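For orientation, a minimal C11 sketch of the counter-based tail update that
the R0/A0.a/A0.b comments in the diff below annotate; poscnt, head_raw,
tail_raw and rts_update_tail() are hypothetical simplifications, not the DPDK
types.

#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for the RTS position/counter pair. */
union poscnt {
    uint64_t raw;
    struct {
        uint32_t cnt;   /* number of transactions of this role */
        uint32_t pos;   /* head or tail position */
    } val;
};

static _Atomic uint64_t head_raw;   /* advanced by the move-head CAS (R1) */
static _Atomic uint64_t tail_raw;

static void
rts_update_tail(void)
{
    union poscnt h, ot, nt;

    /* A0.a: acquire, so the tail value read here reflects what the same-role
     * thread we synchronized with published when it updated tail_raw. */
    ot.raw = atomic_load_explicit(&tail_raw, memory_order_acquire);
    do {
        /* One 64-bit atomic read of the head position and counter. */
        h.raw = atomic_load_explicit(&head_raw, memory_order_relaxed);
        nt.raw = ot.raw;
        /* Move the tail position only when this completion makes the
         * completed count equal to the head (started) count, i.e. no
         * same-role transaction is still in flight. */
        if (++nt.val.cnt == h.val.cnt)
            nt.val.pos = h.val.pos;
        /* R0: release on success publishes the new tail; A0.b: acquire on
         * failure so the retry still observes a coherent counter. */
    } while (!atomic_compare_exchange_strong_explicit(&tail_raw, &ot.raw,
             nt.raw, memory_order_release, memory_order_acquire));
}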
Fixes: e6ba4731c0f3a ("ring: introduce RTS ring mode")
Cc: stable@dpdk.org
Signed-off-by: Wathsala Vithanage <wathsala.vithanage@arm.com>
Signed-off-by: Ola Liljedahl <ola.liljedahl@arm.com>
---
lib/ring/rte_ring_rts_elem_pvt.h | 67 +++++++++++++++++++++++---------
1 file changed, 49 insertions(+), 18 deletions(-)
diff --git a/lib/ring/rte_ring_rts_elem_pvt.h b/lib/ring/rte_ring_rts_elem_pvt.h
index 96825931f8..0280369b21 100644
--- a/lib/ring/rte_ring_rts_elem_pvt.h
+++ b/lib/ring/rte_ring_rts_elem_pvt.h
@@ -31,6 +31,17 @@ __rte_ring_rts_update_tail(struct rte_ring_rts_headtail *ht)
* might preceded us, then don't update tail with new value.
*/
+ /*
+ * A0 = {A0.a, A0.b}: Synchronizes with the CAS at R0.
+ * The CAS at R0 in a thread of the same type establishes a happens-before
+ * relationship with this load-acquire, ensuring that this thread
+ * observes the same or later values of h.raw/h.val.cnt as those
+ * observed by the other thread when it updated ht->tail.raw.
+ * If not, ht->tail.raw may get updated out of sync (e.g. getting
+ * updated to the same value twice). A0.a makes sure this condition
+ * holds when CAS succeeds and A0.b when it fails.
+ */
+ /* A0.a */
ot.raw = rte_atomic_load_explicit(&ht->tail.raw, rte_memory_order_acquire);
do {
@@ -40,7 +51,11 @@ __rte_ring_rts_update_tail(struct rte_ring_rts_headtail *ht)
nt.raw = ot.raw;
if (++nt.val.cnt == h.val.cnt)
nt.val.pos = h.val.pos;
-
+ /*
+ * R0: Synchronizes with A2 of a different thread of the opposite type and A0.b
+ * of a different thread of the same type.
+ */
+ /* A0.b */
} while (rte_atomic_compare_exchange_strong_explicit(&ht->tail.raw,
(uint64_t *)(uintptr_t)&ot.raw, nt.raw,
rte_memory_order_release, rte_memory_order_acquire) == 0);
@@ -50,18 +65,21 @@ __rte_ring_rts_update_tail(struct rte_ring_rts_headtail *ht)
* @internal This function waits till head/tail distance wouldn't
* exceed pre-defined max value.
*/
-static __rte_always_inline void
+static __rte_always_inline union __rte_ring_rts_poscnt
__rte_ring_rts_head_wait(const struct rte_ring_rts_headtail *ht,
- union __rte_ring_rts_poscnt *h)
+ rte_memory_order memorder)
{
- uint32_t max;
+ union __rte_ring_rts_poscnt h;
+ uint32_t max = ht->htd_max;
- max = ht->htd_max;
+ h.raw = rte_atomic_load_explicit(&ht->head.raw, memorder);
- while (h->val.pos - ht->tail.val.pos > max) {
+ while (h.val.pos - ht->tail.val.pos > max) {
rte_pause();
- h->raw = rte_atomic_load_explicit(&ht->head.raw, rte_memory_order_acquire);
+ h.raw = rte_atomic_load_explicit(&ht->head.raw, memorder);
}
+
+ return h;
}
/**
@@ -94,12 +112,9 @@ __rte_ring_rts_move_head(struct rte_ring_rts_headtail *d,
enum rte_ring_queue_behavior behavior, uint32_t *old_head,
uint32_t *entries)
{
- uint32_t n;
+ uint32_t n, stail;
union __rte_ring_rts_poscnt nh, oh;
- oh.raw = rte_atomic_load_explicit(&d->head.raw,
- rte_memory_order_acquire);
-
do {
/* Reset n to the initial burst count */
n = num;
@@ -109,7 +124,20 @@ __rte_ring_rts_move_head(struct rte_ring_rts_headtail *d,
* make sure that we read prod head *before*
* reading cons tail.
*/
- __rte_ring_rts_head_wait(d, &oh);
+ /*
+ * A1: Synchronizes with the CAS at R1.
+ * Establishes a happens-before relationship with a thread of the same
+ * type that released head.raw, ensuring this thread observes all of
+ * its memory effects needed to maintain a safe partial order.
+ */
+ oh = __rte_ring_rts_head_wait(d, rte_memory_order_acquire);
+
+ /*
+ * A2: Establishes a synchronizes-with edge with the store-release at R0.
+ * This ensures that all memory effects from the preceding opposing
+ * thread are observed.
+ */
+ stail = rte_atomic_load_explicit(&s->tail, rte_memory_order_acquire);
/*
* The subtraction is done between two unsigned 32bits value
@@ -117,7 +145,7 @@ __rte_ring_rts_move_head(struct rte_ring_rts_headtail *d,
* *old_head > cons_tail). So 'entries' is always between 0
* and capacity (which is < size).
*/
- *entries = capacity + s->tail - oh.val.pos;
+ *entries = capacity + stail - oh.val.pos;
/* check that we have enough room in ring */
if (unlikely(n > *entries))
@@ -131,14 +159,17 @@ __rte_ring_rts_move_head(struct rte_ring_rts_headtail *d,
nh.val.cnt = oh.val.cnt + 1;
/*
- * this CAS(ACQUIRE, ACQUIRE) serves as a hoist barrier to prevent:
- * - OOO reads of cons tail value
- * - OOO copy of elems to the ring
+ * R1: Establishes a synchronizes-with edge with the load-acquire
+ * of head.raw at A1. Ensures that a store-release to the tail
+ * performed by this thread while acting in the opposite role becomes
+ * visible to another thread of the current type. That thread will
+ * then observe the updates in the same order, keeping a safe
+ * partial order.
*/
} while (rte_atomic_compare_exchange_strong_explicit(&d->head.raw,
(uint64_t *)(uintptr_t)&oh.raw, nh.raw,
- rte_memory_order_acquire,
- rte_memory_order_acquire) == 0);
+ rte_memory_order_release,
+ rte_memory_order_relaxed) == 0);
*old_head = oh.val.pos;
return n;
--
2.43.0
* Re: [PATCH v5 1/3] ring: safe partial ordering for head/tail update
2025-11-11 18:37 [PATCH v5 1/3] ring: safe partial ordering for head/tail update Wathsala Vithanage
2025-11-11 18:37 ` [PATCH v5 2/3] ring: establish a safe partial order in hts-ring Wathsala Vithanage
2025-11-11 18:37 ` [PATCH v5 3/3] ring: establish a safe partial order in rts-ring Wathsala Vithanage
@ 2025-11-11 20:45 ` Thomas Monjalon
2 siblings, 0 replies; 4+ messages in thread
From: Thomas Monjalon @ 2025-11-11 20:45 UTC (permalink / raw)
To: Wathsala Vithanage
Cc: Honnappa Nagarahalli, Konstantin Ananyev, Ola Liljedahl,
Steve Capper, Gavin Hu, stable, dev, Dhruv Tripathi
11/11/2025 19:37, Wathsala Vithanage:
> The function __rte_ring_headtail_move_head() assumes that the barrier
> (fence) between the load of the head and the load-acquire of the
> opposing tail guarantees the following: if a first thread reads tail
> and then writes head and a second thread reads the new value of head
> and then reads tail, then it should observe the same (or a later)
> value of tail.
>
> This assumption is incorrect under the C11 memory model. If the barrier
> (fence) is intended to establish a total ordering of ring operations,
> it fails to do so. Instead, the current implementation only enforces a
> partial ordering, which can lead to unsafe interleavings. In particular,
> some partial orders can cause underflows in free slot or available
> element computations, potentially resulting in data corruption.
>
> The issue manifests when a CPU first acts as a producer and later as a
> consumer. In this scenario, the barrier assumption may fail when another
> core takes the consumer role. A Herd7 litmus test in C11 can demonstrate
> this violation. The problem has not been widely observed so far because:
> (a) on strong memory models (e.g., x86-64) the assumption holds, and
> (b) on relaxed models with RCsc semantics the ordering is still strong
> enough to prevent hazards.
> The problem becomes visible only on weaker models, when load-acquire is
> implemented with RCpc semantics (e.g. some AArch64 CPUs which support
> the LDAPR and LDAPUR instructions).
>
> Three possible solutions exist:
> 1. Strengthen ordering by upgrading release/acquire semantics to
> sequential consistency. This requires using seq-cst for stores,
> loads, and CAS operations. However, this approach introduces a
> significant performance penalty on relaxed-memory architectures.
>
> 2. Establish a safe partial order by enforcing a pair-wise
> happens-before relationship between threads of the same role by
> converting the CAS and the preceding load of the head to release and
> acquire respectively. This approach makes the original
> barrier assumption unnecessary and allows its removal.
>
> 3. Retain partial ordering but ensure only safe partial orders are
> committed. This can be done by detecting underflow conditions
> (producer < consumer) and quashing the update in such cases.
> This approach makes the original barrier assumption unnecessary
> and allows its removal.
>
> This patch implements solution (2) to preserve the “enqueue always
> succeeds” contract expected by dependent libraries (e.g., mempool).
> While solution (3) offers higher performance, adopting it now would
> break that assumption.
>
> Fixes: 49594a63147a9 ("ring/c11: relax ordering for load and store of the head")
> Cc: stable@dpdk.org
>
> Signed-off-by: Wathsala Vithanage <wathsala.vithanage@arm.com>
> Signed-off-by: Ola Liljedahl <ola.liljedahl@arm.com>
> Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
> Reviewed-by: Dhruv Tripathi <dhruv.tripathi@arm.com>
> Acked-by: Konstantin Ananyev <konstantin.ananyev@huawei.com>
> Tested-by: Konstantin Ananyev <konstantin.ananyev@huawei.com>
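To make the quoted hazard concrete, here is a minimal C11 sketch of the
two-thread interleaving described above; prod_head, cons_tail and the thread
bodies are illustrative only, not the ring code.

#include <stdatomic.h>
#include <stdint.h>

static _Atomic uint32_t prod_head;
static _Atomic uint32_t cons_tail;

/* First thread: reads the opposing tail, then publishes a new head. */
static void
first_thread(void)
{
    uint32_t t = atomic_load_explicit(&cons_tail, memory_order_acquire);
    (void)t;
    /* The barrier assumption expects any thread that later observes this
     * new head to also observe the tail value read above (or a later one).
     * Under C11 that is guaranteed when this store is a release paired with
     * an acquire load of the head, which is what the series establishes. */
    atomic_store_explicit(&prod_head, 1, memory_order_release);
}

/* Second thread (same role): reads the new head, then the tail. */
static void
second_thread(uint32_t capacity, uint32_t *entries)
{
    uint32_t h = atomic_load_explicit(&prod_head, memory_order_acquire);
    uint32_t t = atomic_load_explicit(&cons_tail, memory_order_acquire);

    /* If h is the value released above, read-read coherence along the
     * happens-before chain means t is the value the first thread read, or a
     * later one. Without that pairing, C11 permits an older t, and this
     * unsigned free-slot computation can underflow (wrap). */
    *entries = capacity + t - h;
}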
Series applied, thanks.