* [PATCH v2 0/1] ring: correct ordering issue in head/tail update
@ 2025-10-02 17:41 Wathsala Vithanage
2025-10-02 17:41 ` [PATCH v2 1/1] ring: safe partial ordering for " Wathsala Vithanage
2025-11-10 10:17 ` [PATCH v3 0/1] ring: correct ordering issue in " Konstantin Ananyev
0 siblings, 2 replies; 10+ messages in thread
From: Wathsala Vithanage @ 2025-10-02 17:41 UTC (permalink / raw)
Cc: dev, Wathsala Vithanage
Hi all,
This patch fixes a subtle ordering issue in the ring code that can lead
to incorrect behavior under certain conditions. The change adopts a
solution that balances performance with compatibility for dependent
libraries.
For background, motivation, and validation (including Herd7 litmus
tests), see the accompanying write-up:
https://community.arm.com/arm-community-blogs/b/architectures-and-processors-blog/posts/when-a-barrier-does-not-block-the-pitfalls-of-partial-order
V1->V2:
* Switched from squashing unsafe partial orders (solution #3) to
  establishing a pairwise happens-before relationship between the
  producer and consumer heads (solution #2).
Wathsala Vithanage (1):
ring: safe partial ordering for head/tail update
lib/ring/rte_ring_c11_pvt.h | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
--
2.43.0
* [PATCH v2 1/1] ring: safe partial ordering for head/tail update
2025-10-02 17:41 [PATCH v2 0/1] ring: correct ordering issue in head/tail update Wathsala Vithanage
@ 2025-10-02 17:41 ` Wathsala Vithanage
2025-10-08 8:00 ` Konstantin Ananyev
2025-11-10 10:17 ` [PATCH v3 0/1] ring: correct ordering issue in " Konstantin Ananyev
1 sibling, 1 reply; 10+ messages in thread
From: Wathsala Vithanage @ 2025-10-02 17:41 UTC (permalink / raw)
To: Honnappa Nagarahalli, Konstantin Ananyev
Cc: dev, Wathsala Vithanage, Ola Liljedahl, Dhruv Tripathi
The function __rte_ring_headtail_move_head() assumes that the barrier
(fence) between the load of the head and the load-acquire of the
opposing tail guarantees the following: if a first thread reads tail
and then writes head and a second thread reads the new value of head
and then reads tail, then it should observe the same (or a later)
value of tail.
This assumption is incorrect under the C11 memory model. If the barrier
(fence) is intended to establish a total ordering of ring operations,
it fails to do so. Instead, the current implementation only enforces a
partial ordering, which can lead to unsafe interleavings. In particular,
some partial orders can cause underflows in free slot or available
element computations, potentially resulting in data corruption.
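To make the underflow concrete, here is a minimal sketch in C of the free-slot arithmetic. The helper name sketch_free_entries and the formula capacity + cons_tail - prod_head are assumptions made for illustration (modelled on how a producer could derive free slots from 32-bit indices); they are not quoted from the ring code.

```c
#include <stdint.h>

/*
 * Hypothetical helper: free slots as a producer might compute them from
 * the ring capacity, the consumer tail it observed, and the producer head.
 * The result is reduced modulo 2^32, like the ring's 32-bit indices.
 */
static uint32_t
sketch_free_entries(uint32_t capacity, uint32_t cons_tail, uint32_t prod_head)
{
	return capacity + cons_tail - prod_head;
}
```

With capacity 4 and producer head 6, a consumer tail of 2 yields 0 free slots. If the same head is paired with a stale consumer tail of 0, the result wraps around to 0xFFFFFFFE, which a producer would misread as ample free space.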
The issue manifests when a CPU first acts as a producer and later as a
consumer. In this scenario, the barrier assumption may fail when another
core takes the consumer role. A Herd7 litmus test in C11 can demonstrate
this violation. The problem has not been widely observed so far because:
(a) on strong memory models (e.g., x86-64) the assumption holds, and
(b) on relaxed models with RCsc semantics the ordering is still strong
enough to prevent hazards.
The problem becomes visible only on weaker models, when load-acquire is
implemented with RCpc semantics (e.g. some AArch64 CPUs which support
the LDAPR and LDAPUR instructions).
Three possible solutions exist:
1. Strengthen ordering by upgrading release/acquire semantics to
sequential consistency. This requires using seq-cst for stores,
loads, and CAS operations. However, this approach introduces a
significant performance penalty on relaxed-memory architectures.
2. Establish a safe partial order by enforcing a pair-wise
happens-before relationship between threads of the same role:
convert the CAS on the head and the preceding load of the head to
release and acquire, respectively. This approach makes the original
barrier assumption unnecessary and allows its removal.
3. Retain partial ordering but ensure only safe partial orders are
committed. This can be done by detecting underflow conditions
(producer < consumer) and quashing the update in such cases.
This approach makes the original barrier assumption unnecessary
and allows its removal.
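One possible reading of solution (3), sketched in C under stated assumptions: the helper sketch_checked_free_entries and its "greater than capacity" test are hypothetical and are not taken from any version of the patch. The idea is that a consistent head/tail pair can never yield more free slots than the ring holds, so a larger value is treated as an underflow and the update is quashed by reporting zero free slots.

```c
#include <stdint.h>

/*
 * Hypothetical underflow check for a producer.
 * A consistent (head, tail) pair gives a result in [0, capacity];
 * anything larger can only come from 32-bit wrap-around, i.e. from
 * pairing a new head with a stale tail.
 */
static uint32_t
sketch_checked_free_entries(uint32_t capacity, uint32_t cons_tail,
		uint32_t prod_head)
{
	uint32_t entries = capacity + cons_tail - prod_head;

	/* Quash the update: report no free slots on underflow. */
	return entries > capacity ? 0 : entries;
}
```

In this sketch an enqueue can report no space even though slots are actually free, which is the kind of spurious failure that conflicts with callers expecting an enqueue to succeed.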
This patch implements solution (2) to preserve the “enqueue always
succeeds” contract expected by dependent libraries (e.g., mempool).
While solution (3) offers higher performance, adopting it now would
break that assumption.
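As a rough illustration of solution (2), the sketch below restates the head-move loop with standard C11 <stdatomic.h> instead of the rte_atomic wrappers. The struct sketch_headtail, the function sketch_move_head, and the simplified free-entry computation are hypothetical stand-ins, not the actual ring code; only the memory orders are meant to mirror the patch.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for struct rte_ring_headtail. */
struct sketch_headtail {
	_Atomic uint32_t head;
	_Atomic uint32_t tail;
};

/*
 * Reserve up to n slots by advancing d->head.
 * d is the pair being moved, s is the opposing pair, and capacity is
 * the ring capacity for a producer (a consumer would pass 0).
 * Returns the number of slots reserved, or 0 if none are available.
 */
static unsigned int
sketch_move_head(struct sketch_headtail *d, struct sketch_headtail *s,
		uint32_t capacity, unsigned int n,
		uint32_t *old_head, uint32_t *new_head)
{
	const unsigned int max = n;
	uint32_t stail, entries;

	/* Acquire pairs with the release half of another thread's CAS. */
	*old_head = atomic_load_explicit(&d->head, memory_order_acquire);
	do {
		/* Reset n to the initial burst count. */
		n = max;
		/* Acquire pairs with the store-release of the opposing tail. */
		stail = atomic_load_explicit(&s->tail, memory_order_acquire);
		entries = capacity + stail - *old_head;
		if (n > entries)
			n = entries;
		if (n == 0)
			return 0;
		*new_head = *old_head + n;
		/* On failure, *old_head is refreshed with acquire semantics. */
	} while (!atomic_compare_exchange_strong_explicit(&d->head,
			old_head, *new_head,
			memory_order_acq_rel, memory_order_acquire));
	return n;
}
```

The release half of a successful CAS orders the thread's earlier tail load before the new head value becomes visible. A thread of the same role that load-acquires that head therefore reads the same or a later tail, which is the guarantee the removed fence was assumed to provide.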
Signed-off-by: Wathsala Vithanage <wathsala.vithanage@arm.com>
Signed-off-by: Ola Liljedahl <ola.liljedahl@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Dhruv Tripathi <dhruv.tripathi@arm.com>
---
lib/ring/rte_ring_c11_pvt.h | 9 +++------
1 file changed, 3 insertions(+), 6 deletions(-)
diff --git a/lib/ring/rte_ring_c11_pvt.h b/lib/ring/rte_ring_c11_pvt.h
index b9388af0da..98c6584edb 100644
--- a/lib/ring/rte_ring_c11_pvt.h
+++ b/lib/ring/rte_ring_c11_pvt.h
@@ -78,14 +78,11 @@ __rte_ring_headtail_move_head(struct rte_ring_headtail *d,
unsigned int max = n;
*old_head = rte_atomic_load_explicit(&d->head,
- rte_memory_order_relaxed);
+ rte_memory_order_acquire);
do {
/* Reset n to the initial burst count */
n = max;
- /* Ensure the head is read before tail */
- rte_atomic_thread_fence(rte_memory_order_acquire);
-
/* load-acquire synchronize with store-release of ht->tail
* in update_tail.
*/
@@ -115,8 +112,8 @@ __rte_ring_headtail_move_head(struct rte_ring_headtail *d,
/* on failure, *old_head is updated */
success = rte_atomic_compare_exchange_strong_explicit(
&d->head, old_head, *new_head,
- rte_memory_order_relaxed,
- rte_memory_order_relaxed);
+ rte_memory_order_acq_rel,
+ rte_memory_order_acquire);
} while (unlikely(success == 0));
return n;
}
--
2.43.0
* RE: [PATCH v2 1/1] ring: safe partial ordering for head/tail update
2025-10-02 17:41 ` [PATCH v2 1/1] ring: safe partial ordering for " Wathsala Vithanage
@ 2025-10-08 8:00 ` Konstantin Ananyev
2025-10-13 18:09 ` Wathsala Vithanage
0 siblings, 1 reply; 10+ messages in thread
From: Konstantin Ananyev @ 2025-10-08 8:00 UTC (permalink / raw)
To: Wathsala Vithanage, Honnappa Nagarahalli
Cc: dev, Ola Liljedahl, Dhruv Tripathi
> The function __rte_ring_headtail_move_head() assumes that the barrier
> (fence) between the load of the head and the load-acquire of the
> opposing tail guarantees the following: if a first thread reads tail
> and then writes head and a second thread reads the new value of head
> and then reads tail, then it should observe the same (or a later)
> value of tail.
>
> This assumption is incorrect under the C11 memory model. If the barrier
> (fence) is intended to establish a total ordering of ring operations,
> it fails to do so. Instead, the current implementation only enforces a
> partial ordering, which can lead to unsafe interleavings. In particular,
> some partial orders can cause underflows in free slot or available
> element computations, potentially resulting in data corruption.
>
> The issue manifests when a CPU first acts as a producer and later as a
> consumer. In this scenario, the barrier assumption may fail when another
> core takes the consumer role. A Herd7 litmus test in C11 can demonstrate
> this violation. The problem has not been widely observed so far because:
> (a) on strong memory models (e.g., x86-64) the assumption holds, and
> (b) on relaxed models with RCsc semantics the ordering is still strong
> enough to prevent hazards.
> The problem becomes visible only on weaker models, when load-acquire is
> implemented with RCpc semantics (e.g. some AArch64 CPUs which support
> the LDAPR and LDAPUR instructions).
>
> Three possible solutions exist:
> 1. Strengthen ordering by upgrading release/acquire semantics to
> sequential consistency. This requires using seq-cst for stores,
> loads, and CAS operations. However, this approach introduces a
> significant performance penalty on relaxed-memory architectures.
>
> 2. Establish a safe partial order by enforcing a pair-wise
> happens-before relationship between thread of same role by changing
> the CAS and the preceding load of the head by converting them to
> release and acquire respectively. This approach makes the original
> barrier assumption unnecessary and allows its removal.
>
> 3. Retain partial ordering but ensure only safe partial orders are
> committed. This can be done by detecting underflow conditions
> (producer < consumer) and quashing the update in such cases.
> This approach makes the original barrier assumption unnecessary
> and allows its removal.
>
> This patch implements solution (2) to preserve the “enqueue always
> succeeds” contract expected by dependent libraries (e.g., mempool).
> While solution (3) offers higher performance, adopting it now would
> break that assumption.
>
> Signed-off-by: Wathsala Vithanage <wathsala.vithanage@arm.com>
> Signed-off-by: Ola Liljedahl <ola.liljedahl@arm.com>
> Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
> Reviewed-by: Dhruv Tripathi <dhruv.tripathi@arm.com>
> ---
> lib/ring/rte_ring_c11_pvt.h | 9 +++------
> 1 file changed, 3 insertions(+), 6 deletions(-)
>
> diff --git a/lib/ring/rte_ring_c11_pvt.h b/lib/ring/rte_ring_c11_pvt.h
> index b9388af0da..98c6584edb 100644
> --- a/lib/ring/rte_ring_c11_pvt.h
> +++ b/lib/ring/rte_ring_c11_pvt.h
> @@ -78,14 +78,11 @@ __rte_ring_headtail_move_head(struct rte_ring_headtail
> *d,
> unsigned int max = n;
>
> *old_head = rte_atomic_load_explicit(&d->head,
> - rte_memory_order_relaxed);
> + rte_memory_order_acquire);
> do {
> /* Reset n to the initial burst count */
> n = max;
>
> - /* Ensure the head is read before tail */
> - rte_atomic_thread_fence(rte_memory_order_acquire);
> -
> /* load-acquire synchronize with store-release of ht->tail
> * in update_tail.
> */
> @@ -115,8 +112,8 @@ __rte_ring_headtail_move_head(struct rte_ring_headtail
> *d,
> /* on failure, *old_head is updated */
> success =
> rte_atomic_compare_exchange_strong_explicit(
> &d->head, old_head, *new_head,
> - rte_memory_order_relaxed,
> - rte_memory_order_relaxed);
> + rte_memory_order_acq_rel,
> + rte_memory_order_acquire);
> } while (unlikely(success == 0));
> return n;
> }
> --
LGTM, though I think that we also need to make similar changes in
rte_ring_hts_elem_pvt.h and rte_ring_rts_elem_pvt.h:
for the CAS, use 'acq_rel' order instead of plain 'acquire'.
Let me know whether you have the bandwidth to do that.
Acked-by: Konstantin Ananyev <konstantin.ananyev@huawei.com>
Tested-by: Konstantin Ananyev <konstantin.ananyev@huawei.com>
> 2.43.0
>
* Re: [PATCH v2 1/1] ring: safe partial ordering for head/tail update
2025-10-08 8:00 ` Konstantin Ananyev
@ 2025-10-13 18:09 ` Wathsala Vithanage
2025-11-06 8:15 ` David Marchand
0 siblings, 1 reply; 10+ messages in thread
From: Wathsala Vithanage @ 2025-10-13 18:09 UTC (permalink / raw)
To: Konstantin Ananyev, Honnappa Nagarahalli
Cc: dev, Ola Liljedahl, Dhruv Tripathi
On 10/8/25 03:00, Konstantin Ananyev wrote:
>
>> The function __rte_ring_headtail_move_head() assumes that the barrier
>> (fence) between the load of the head and the load-acquire of the
>> opposing tail guarantees the following: if a first thread reads tail
>> and then writes head and a second thread reads the new value of head
>> and then reads tail, then it should observe the same (or a later)
>> value of tail.
>>
>> This assumption is incorrect under the C11 memory model. If the barrier
>> (fence) is intended to establish a total ordering of ring operations,
>> it fails to do so. Instead, the current implementation only enforces a
>> partial ordering, which can lead to unsafe interleavings. In particular,
>> some partial orders can cause underflows in free slot or available
>> element computations, potentially resulting in data corruption.
>>
>> The issue manifests when a CPU first acts as a producer and later as a
>> consumer. In this scenario, the barrier assumption may fail when another
>> core takes the consumer role. A Herd7 litmus test in C11 can demonstrate
>> this violation. The problem has not been widely observed so far because:
>> (a) on strong memory models (e.g., x86-64) the assumption holds, and
>> (b) on relaxed models with RCsc semantics the ordering is still strong
>> enough to prevent hazards.
>> The problem becomes visible only on weaker models, when load-acquire is
>> implemented with RCpc semantics (e.g. some AArch64 CPUs which support
>> the LDAPR and LDAPUR instructions).
>>
>> Three possible solutions exist:
>> 1. Strengthen ordering by upgrading release/acquire semantics to
>> sequential consistency. This requires using seq-cst for stores,
>> loads, and CAS operations. However, this approach introduces a
>> significant performance penalty on relaxed-memory architectures.
>>
>> 2. Establish a safe partial order by enforcing a pair-wise
>> happens-before relationship between thread of same role by changing
>> the CAS and the preceding load of the head by converting them to
>> release and acquire respectively. This approach makes the original
>> barrier assumption unnecessary and allows its removal.
>>
>> 3. Retain partial ordering but ensure only safe partial orders are
>> committed. This can be done by detecting underflow conditions
>> (producer < consumer) and quashing the update in such cases.
>> This approach makes the original barrier assumption unnecessary
>> and allows its removal.
>>
>> This patch implements solution (2) to preserve the “enqueue always
>> succeeds” contract expected by dependent libraries (e.g., mempool).
>> While solution (3) offers higher performance, adopting it now would
>> break that assumption.
>>
>> Signed-off-by: Wathsala Vithanage <wathsala.vithanage@arm.com>
>> Signed-off-by: Ola Liljedahl <ola.liljedahl@arm.com>
>> Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
>> Reviewed-by: Dhruv Tripathi <dhruv.tripathi@arm.com>
>> ---
>> lib/ring/rte_ring_c11_pvt.h | 9 +++------
>> 1 file changed, 3 insertions(+), 6 deletions(-)
>>
>> diff --git a/lib/ring/rte_ring_c11_pvt.h b/lib/ring/rte_ring_c11_pvt.h
>> index b9388af0da..98c6584edb 100644
>> --- a/lib/ring/rte_ring_c11_pvt.h
>> +++ b/lib/ring/rte_ring_c11_pvt.h
>> @@ -78,14 +78,11 @@ __rte_ring_headtail_move_head(struct rte_ring_headtail
>> *d,
>> unsigned int max = n;
>>
>> *old_head = rte_atomic_load_explicit(&d->head,
>> - rte_memory_order_relaxed);
>> + rte_memory_order_acquire);
>> do {
>> /* Reset n to the initial burst count */
>> n = max;
>>
>> - /* Ensure the head is read before tail */
>> - rte_atomic_thread_fence(rte_memory_order_acquire);
>> -
>> /* load-acquire synchronize with store-release of ht->tail
>> * in update_tail.
>> */
>> @@ -115,8 +112,8 @@ __rte_ring_headtail_move_head(struct rte_ring_headtail
>> *d,
>> /* on failure, *old_head is updated */
>> success =
>> rte_atomic_compare_exchange_strong_explicit(
>> &d->head, old_head, *new_head,
>> - rte_memory_order_relaxed,
>> - rte_memory_order_relaxed);
>> + rte_memory_order_acq_rel,
>> + rte_memory_order_acquire);
>> } while (unlikely(success == 0));
>> return n;
>> }
>> --
> LGTM, though. I think that we also need to make similar changes in
> rte_ring_hts_elem_pvt.h and rte_ring_rts_elem_pvt.h:
> for CAS use 'acq_rel' order instead of simple 'acquire'.
> Let me know would you have a bandwidth to do that.
My bad, I forgot those two cases. I will send a v3.
-- wathsala
>
> Acked-by: Konstantin Ananyev <konstantin.ananyev@huawei.com>
> Tested-by: Konstantin Ananyev <konstantin.ananyev@huawei.com>
>
>> 2.43.0
>>
* Re: [PATCH v2 1/1] ring: safe partial ordering for head/tail update
2025-10-13 18:09 ` Wathsala Vithanage
@ 2025-11-06 8:15 ` David Marchand
2025-11-10 10:25 ` Konstantin Ananyev
0 siblings, 1 reply; 10+ messages in thread
From: David Marchand @ 2025-11-06 8:15 UTC (permalink / raw)
To: Wathsala Vithanage
Cc: Konstantin Ananyev, Honnappa Nagarahalli, dev, Ola Liljedahl,
Dhruv Tripathi
Hello Wathsala,
On Mon, 13 Oct 2025 at 20:10, Wathsala Vithanage
<wathsala.vithanage@arm.com> wrote:
> > LGTM, though. I think that we also need to make similar changes in
> > rte_ring_hts_elem_pvt.h and rte_ring_rts_elem_pvt.h:
> > for CAS use 'acq_rel' order instead of simple 'acquire'.
> > Let me know would you have a bandwidth to do that.
>
> My bad, I forgot those two cases. I will send a v3.
Should we wait for this v3?
We want to close rc2 this week.
--
David Marchand
* [PATCH v3 0/1] ring: correct ordering issue in head/tail update
2025-10-02 17:41 [PATCH v2 0/1] ring: correct ordering issue in head/tail update Wathsala Vithanage
2025-10-02 17:41 ` [PATCH v2 1/1] ring: safe partial ordering for " Wathsala Vithanage
@ 2025-11-10 10:17 ` Konstantin Ananyev
2025-11-10 10:17 ` [PATCH v3 1/1] ring: fix unsafe ordering for " Konstantin Ananyev
1 sibling, 1 reply; 10+ messages in thread
From: Konstantin Ananyev @ 2025-11-10 10:17 UTC (permalink / raw)
To: dev
Cc: wathsala.vithanage, honnappa.nagarahalli, ola.liljedahl,
dhruv.tripathi, david.marchand
This patch fixes a subtle ordering issue in the ring code that can lead
to incorrect behavior under certain conditions. The change adopts a
solution that balances performance with compatibility for dependent
libraries.
For background, motivation, and validation (including Herd7 litmus
tests), see the accompanying write-up:
https://community.arm.com/arm-community-blogs/b/architectures-and-processors-blog/posts/when-a-barrier-does-not-block-the-pitfalls-of-partial-order
V1->V2:
* Switched from squashing unsafe partial orders (solution #3) to
  establishing a pairwise happens-before relationship between the
  producer and consumer heads (solution #2).
V2->V3 (Konstantin):
* Added Fixes and CC to stable
* Extend patch to cover HTS and RTS mode
Wathsala Vithanage (1):
ring: fix unsafe ordering for head/tail update
lib/ring/rte_ring_c11_pvt.h | 9 +++------
lib/ring/rte_ring_hts_elem_pvt.h | 6 ++++--
lib/ring/rte_ring_rts_elem_pvt.h | 6 ++++--
3 files changed, 11 insertions(+), 10 deletions(-)
--
2.51.0
* [PATCH v3 1/1] ring: fix unsafe ordering for head/tail update
2025-11-10 10:17 ` [PATCH v3 0/1] ring: correct ordering issue in " Konstantin Ananyev
@ 2025-11-10 10:17 ` Konstantin Ananyev
0 siblings, 0 replies; 10+ messages in thread
From: Konstantin Ananyev @ 2025-11-10 10:17 UTC (permalink / raw)
To: dev
Cc: wathsala.vithanage, honnappa.nagarahalli, ola.liljedahl,
dhruv.tripathi, david.marchand, stable
From: Wathsala Vithanage <wathsala.vithanage@arm.com>
The function __rte_ring_headtail_move_head() assumes that the barrier
(fence) between the load of the head and the load-acquire of the
opposing tail guarantees the following: if a first thread reads tail
and then writes head and a second thread reads the new value of head
and then reads tail, then it should observe the same (or a later)
value of tail.
This assumption is incorrect under the C11 memory model. If the barrier
(fence) is intended to establish a total ordering of ring operations,
it fails to do so. Instead, the current implementation only enforces a
partial ordering, which can lead to unsafe interleavings. In particular,
some partial orders can cause underflows in free slot or available
element computations, potentially resulting in data corruption.
The issue manifests when a CPU first acts as a producer and later as a
consumer. In this scenario, the barrier assumption may fail when another
core takes the consumer role. A Herd7 litmus test in C11 can demonstrate
this violation. The problem has not been widely observed so far because:
(a) on strong memory models (e.g., x86-64) the assumption holds, and
(b) on relaxed models with RCsc semantics the ordering is still strong
enough to prevent hazards.
The problem becomes visible only on weaker models, when load-acquire is
implemented with RCpc semantics (e.g. some AArch64 CPUs which support
the LDAPR and LDAPUR instructions).
Three possible solutions exist:
1. Strengthen ordering by upgrading release/acquire semantics to
sequential consistency. This requires using seq-cst for stores,
loads, and CAS operations. However, this approach introduces a
significant performance penalty on relaxed-memory architectures.
2. Establish a safe partial order by enforcing a pair-wise
happens-before relationship between threads of the same role:
convert the CAS on the head and the preceding load of the head to
release and acquire, respectively. This approach makes the original
barrier assumption unnecessary and allows its removal.
3. Retain partial ordering but ensure only safe partial orders are
committed. This can be done by detecting underflow conditions
(producer < consumer) and quashing the update in such cases.
This approach makes the original barrier assumption unnecessary
and allows its removal.
This patch implements solution (2) to preserve the “enqueue always
succeeds” contract expected by dependent libraries (e.g., mempool).
While solution (3) offers higher performance, adopting it now would
break that assumption.
Fixes: b5458e2cc483 ("ring: introduce staged ordered ring")
Fixes: 1cc363b8ce06 ("ring: introduce HTS ring mode")
Fixes: e6ba4731c0f3 ("ring: introduce RTS ring mode")
Fixes: 49594a63147a ("ring/c11: relax ordering for load and store of the head")
Cc: stable@dpdk.org
Signed-off-by: Wathsala Vithanage <wathsala.vithanage@arm.com>
Signed-off-by: Ola Liljedahl <ola.liljedahl@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Dhruv Tripathi <dhruv.tripathi@arm.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@huawei.com>
Tested-by: Konstantin Ananyev <konstantin.ananyev@huawei.com>
---
lib/ring/rte_ring_c11_pvt.h | 9 +++------
lib/ring/rte_ring_hts_elem_pvt.h | 6 ++++--
lib/ring/rte_ring_rts_elem_pvt.h | 6 ++++--
3 files changed, 11 insertions(+), 10 deletions(-)
diff --git a/lib/ring/rte_ring_c11_pvt.h b/lib/ring/rte_ring_c11_pvt.h
index b9388af0da..98c6584edb 100644
--- a/lib/ring/rte_ring_c11_pvt.h
+++ b/lib/ring/rte_ring_c11_pvt.h
@@ -78,14 +78,11 @@ __rte_ring_headtail_move_head(struct rte_ring_headtail *d,
unsigned int max = n;
*old_head = rte_atomic_load_explicit(&d->head,
- rte_memory_order_relaxed);
+ rte_memory_order_acquire);
do {
/* Reset n to the initial burst count */
n = max;
- /* Ensure the head is read before tail */
- rte_atomic_thread_fence(rte_memory_order_acquire);
-
/* load-acquire synchronize with store-release of ht->tail
* in update_tail.
*/
@@ -115,8 +112,8 @@ __rte_ring_headtail_move_head(struct rte_ring_headtail *d,
/* on failure, *old_head is updated */
success = rte_atomic_compare_exchange_strong_explicit(
&d->head, old_head, *new_head,
- rte_memory_order_relaxed,
- rte_memory_order_relaxed);
+ rte_memory_order_acq_rel,
+ rte_memory_order_acquire);
} while (unlikely(success == 0));
return n;
}
diff --git a/lib/ring/rte_ring_hts_elem_pvt.h b/lib/ring/rte_ring_hts_elem_pvt.h
index e2b82dd1e6..1c1569e7e2 100644
--- a/lib/ring/rte_ring_hts_elem_pvt.h
+++ b/lib/ring/rte_ring_hts_elem_pvt.h
@@ -116,13 +116,15 @@ __rte_ring_hts_move_head(struct rte_ring_hts_headtail *d,
np.pos.head = op.pos.head + n;
/*
- * this CAS(ACQUIRE, ACQUIRE) serves as a hoist barrier to prevent:
+ * this CAS(ACQ_REL, ACQUIRE) serves as a hoist barrier to prevent:
* - OOO reads of cons tail value
* - OOO copy of elems from the ring
+ * Also RELEASE guarantees that latest tail value
+ * will become visible before the new head value.
*/
} while (rte_atomic_compare_exchange_strong_explicit(&d->ht.raw,
(uint64_t *)(uintptr_t)&op.raw, np.raw,
- rte_memory_order_acquire,
+ rte_memory_order_acq_rel,
rte_memory_order_acquire) == 0);
*old_head = op.pos.head;
diff --git a/lib/ring/rte_ring_rts_elem_pvt.h b/lib/ring/rte_ring_rts_elem_pvt.h
index 96825931f8..b270998683 100644
--- a/lib/ring/rte_ring_rts_elem_pvt.h
+++ b/lib/ring/rte_ring_rts_elem_pvt.h
@@ -131,13 +131,15 @@ __rte_ring_rts_move_head(struct rte_ring_rts_headtail *d,
nh.val.cnt = oh.val.cnt + 1;
/*
- * this CAS(ACQUIRE, ACQUIRE) serves as a hoist barrier to prevent:
+ * this CAS(ACQ_REL, ACQUIRE) serves as a hoist barrier to prevent:
* - OOO reads of cons tail value
* - OOO copy of elems to the ring
+ * Also RELEASE guarantees that latest tail value
+ * will become visible before the new head value.
*/
} while (rte_atomic_compare_exchange_strong_explicit(&d->head.raw,
(uint64_t *)(uintptr_t)&oh.raw, nh.raw,
- rte_memory_order_acquire,
+ rte_memory_order_acq_rel,
rte_memory_order_acquire) == 0);
*old_head = oh.val.pos;
--
2.51.0
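For readers less familiar with the HTS layout, the following is a minimal C11 sketch of a CAS with acq_rel success order on a packed 64-bit head/tail word. The names sketch_hts_headtail, sketch_pack and sketch_hts_move_head, the bit layout, and the acquire load of the opposing tail are all assumptions made for illustration; this is not the code in rte_ring_hts_elem_pvt.h.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical packed position: head in the low 32 bits, tail in the high. */
struct sketch_hts_headtail {
	_Atomic uint64_t raw;
};

static uint64_t
sketch_pack(uint32_t head, uint32_t tail)
{
	return ((uint64_t)tail << 32) | head;
}

/* Reserve up to n slots; returns the number reserved (possibly 0). */
static unsigned int
sketch_hts_move_head(struct sketch_hts_headtail *d,
		_Atomic uint32_t *opposing_tail, uint32_t capacity,
		unsigned int n, uint32_t *old_head)
{
	const unsigned int max = n;
	uint64_t op, np;
	uint32_t head, entries;

	op = atomic_load_explicit(&d->raw, memory_order_acquire);
	do {
		n = max;
		/* Wait until the previous operation completed (head == tail). */
		while ((uint32_t)op != (uint32_t)(op >> 32))
			op = atomic_load_explicit(&d->raw, memory_order_acquire);
		head = (uint32_t)op;
		entries = capacity + atomic_load_explicit(opposing_tail,
				memory_order_acquire) - head;
		if (n > entries)
			n = entries;
		if (n == 0)
			break;
		np = sketch_pack(head + n, (uint32_t)(op >> 32));
		/* acq_rel on success, acquire on failure (op is refreshed). */
	} while (!atomic_compare_exchange_strong_explicit(&d->raw, &op, np,
			memory_order_acq_rel, memory_order_acquire));

	*old_head = (uint32_t)op;
	return n;
}
```

The acquire half keeps later reads from being hoisted above the CAS. The added release half orders the preceding tail read before the new head becomes visible to other threads of the same role.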
* RE: [PATCH v2 1/1] ring: safe partial ordering for head/tail update
2025-11-06 8:15 ` David Marchand
@ 2025-11-10 10:25 ` Konstantin Ananyev
2025-11-10 19:13 ` Wathsala Vithanage
0 siblings, 1 reply; 10+ messages in thread
From: Konstantin Ananyev @ 2025-11-10 10:25 UTC (permalink / raw)
To: David Marchand, Wathsala Vithanage
Cc: Honnappa Nagarahalli, dev, Ola Liljedahl, Dhruv Tripathi
Hi David,
>
> Hello Wathsala,
>
> On Mon, 13 Oct 2025 at 20:10, Wathsala Vithanage
> <wathsala.vithanage@arm.com> wrote:
> > > LGTM, though. I think that we also need to make similar changes in
> > > rte_ring_hts_elem_pvt.h and rte_ring_rts_elem_pvt.h:
> > > for CAS use 'acq_rel' order instead of simple 'acquire'.
> > > Let me know would you have a bandwidth to do that.
> >
> > My bad, I forgot those two cases. I will send a v3.
>
> Should we wait for this v3?
> We want to close rc2 this week.
>
Just submitted v3 with the extra changes discussed.
Also added Fixes tags, Cc to stable, etc.
Hope it can still make 25.11.
Wathsala, please shout or submit a v4 if you feel something is missing.
Thanks
Konstantin
* Re: [PATCH v2 1/1] ring: safe partial ordering for head/tail update
2025-11-10 10:25 ` Konstantin Ananyev
@ 2025-11-10 19:13 ` Wathsala Vithanage
2025-11-11 7:48 ` Konstantin Ananyev
0 siblings, 1 reply; 10+ messages in thread
From: Wathsala Vithanage @ 2025-11-10 19:13 UTC (permalink / raw)
To: Konstantin Ananyev, David Marchand
Cc: Honnappa Nagarahalli, dev, Ola Liljedahl, Dhruv Tripathi
On 11/10/25 04:25, Konstantin Ananyev wrote:
> Hi David,
>
>> Hello Wathsala,
>>
>> On Mon, 13 Oct 2025 at 20:10, Wathsala Vithanage
>> <wathsala.vithanage@arm.com> wrote:
>>>> LGTM, though. I think that we also need to make similar changes in
>>>> rte_ring_hts_elem_pvt.h and rte_ring_rts_elem_pvt.h:
>>>> for CAS use 'acq_rel' order instead of simple 'acquire'.
>>>> Let me know would you have a bandwidth to do that.
>>> My bad, I forgot those two cases. I will send a v3.
>> Should we wait for this v3?
>> We want to close rc2 this week.
>>
> Just submitted v3 with discussed extra changes.
> Also added fixes, cc to stable, etc.
> Hope it still can make 25.11
> Wathsala, pls shout or submit v4 if you feel something is missing.
> Thanks
> Konstantin
Thanks Konstantin, you beat me to the punch. I have a slightly different
version with comments explaining the synchronizes-with relationships. If
you don't mind, I can send it as V4.
--wathsala
* RE: [PATCH v2 1/1] ring: safe partial ordering for head/tail update
2025-11-10 19:13 ` Wathsala Vithanage
@ 2025-11-11 7:48 ` Konstantin Ananyev
0 siblings, 0 replies; 10+ messages in thread
From: Konstantin Ananyev @ 2025-11-11 7:48 UTC (permalink / raw)
To: Wathsala Vithanage, David Marchand
Cc: Honnappa Nagarahalli, dev, Ola Liljedahl, Dhruv Tripathi
> >> On Mon, 13 Oct 2025 at 20:10, Wathsala Vithanage
> >> <wathsala.vithanage@arm.com> wrote:
> >>>> LGTM, though. I think that we also need to make similar changes in
> >>>> rte_ring_hts_elem_pvt.h and rte_ring_rts_elem_pvt.h:
> >>>> for CAS use 'acq_rel' order instead of simple 'acquire'.
> >>>> Let me know would you have a bandwidth to do that.
> >>> My bad, I forgot those two cases. I will send a v3.
> >> Should we wait for this v3?
> >> We want to close rc2 this week.
> >>
> > Just submitted v3 with discussed extra changes.
> > Also added fixes, cc to stable, etc.
> > Hope it still can make 25.11
> > Wathsala, pls shout or submit v4 if you feel something is missing.
> > Thanks
> > Konstantin
>
> Thanks Konstantin, you beat me to the punch. I have a slightly different
> version with comments explaining the synchronizes with relationships. If
> you don't mind I can send it as V4.
Yep, sure, feel free, but please try to send it ASAP.
Otherwise we risk missing the 25.11 deadline.
Konstantin
end of thread, other threads:[~2025-11-11 7:48 UTC | newest]
Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
2025-10-02 17:41 [PATCH v2 0/1] ring: correct ordering issue in head/tail update Wathsala Vithanage
2025-10-02 17:41 ` [PATCH v2 1/1] ring: safe partial ordering for " Wathsala Vithanage
2025-10-08 8:00 ` Konstantin Ananyev
2025-10-13 18:09 ` Wathsala Vithanage
2025-11-06 8:15 ` David Marchand
2025-11-10 10:25 ` Konstantin Ananyev
2025-11-10 19:13 ` Wathsala Vithanage
2025-11-11 7:48 ` Konstantin Ananyev
2025-11-10 10:17 ` [PATCH v3 0/1] ring: correct ordering issue in " Konstantin Ananyev
2025-11-10 10:17 ` [PATCH v3 1/1] ring: fix unsafe ordering for " Konstantin Ananyev