RE: [PATCH v1 3/4] ring: fix potential sync issue between head and tail values

DPDK patches and discussions
 help / color / mirror / Atom feed

From: Konstantin Ananyev <konstantin.ananyev@huawei.com>
To: Wathsala Wathawana Vithanage <wathsala.vithanage@arm.com>,
	"dev@dpdk.org" <dev@dpdk.org>
Cc: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>,
	"jerinj@marvell.com" <jerinj@marvell.com>,
	"hemant.agrawal@nxp.com" <hemant.agrawal@nxp.com>,
	"drc@linux.ibm.com" <drc@linux.ibm.com>, nd <nd@arm.com>
Subject: RE: [PATCH v1 3/4] ring: fix potential sync issue between head and tail values
Date: Mon, 26 May 2025 11:54:27 +0000	[thread overview]
Message-ID: <62ba814dc50a4ffd9bcfe8f4d4073cd9@huawei.com> (raw)
In-Reply-To: <PAWPR08MB8909895DF398ACB52C8D66719F99A@PAWPR08MB8909.eurprd08.prod.outlook.com>

Hi Wathsala,

> 
> Hi Konstanin,
> 
> In rte_ring the store-release on tail update guarantees that CAS
> won't get reordered with the store-released of the tail update.
> 
> So, the sequence of events would look like this (combined view
> of head and tail update)
> 
> Releaxed-load(new_head,  N)                              ----------------> (A)
> Relaxed-CAS(d->head, new_head, old_head)   ----------------> (B)
> Store-release-store(d->tail, new_head)             ----------------> (C)
> 
> If we look at address dependencies, then...
> 
> (B) depends on (A) due to new_head address dependency.
> (C) depends on (A) due to new_head address dependency.
> 
> So, dependency graph looks like this
>        (A)
>     /       \
>    v        v
>  (B)     (C)
> 
> There is no implicit dependence between (B) and (C), I think
> this is the issue you are brining up.
> Even though there is no dependence between the two,
> the store-release of (C) ensures that (B) won't drop below it.
> Therefore, the above graph can be turned into an ordered
> sequence as shown below..
> 
> (A) -> (B) -> (C)

I do agree that with current implementation of __rte_ring_headtail_move_head()
in lib/ring/rte_ring_c11_pvt.h the order of these 3 operations (A->B->C) should be sequential.
The problem I am talking about is a different one:
thread can see 'latest' 'cons.head' value, with 'previous' value for 'prod.tail' or visa-versa.
In other words: 'cons.head' value depends on 'prod.tail', so before making latest 'cons.head'
value visible to other threads, we need to ensure that latest 'prod.tail' is also visible. 
Let me try to explain it on the example:

Suppose at some moment we have:
prod={.head=2,.tail=1};
cons={.head=0,.tail=0};
I.e. thead #1 is in process to enqueue one more element into the ring.

       Thread #1                                                                             Thread #2
T0:
  store(&prod.tail, 2, release);
  /*AFAIU: this is somewhat equivalent to: wmb(); prod.tail=2;
  * I.E. - it guarantees that all stores initiated before that operation will be visible
  * by other threads at the same moment or before new value of prod.tail wlll become
  * visible, but it doesn't guarantee that new prod.tail  value will be visible  to other
  * threads immediately.
  */ 
...
move_cons_head(...,n=2)                                                    move_cons_head(...,n=1)
...                                                                                              ...
T1:
  *old_head = load(&cons.head, relaxed);                          
  fence(acquire);
  /*old_head == 0, no surprises */
  stail = load(&prod.tail, acquire);
  /*stail == 2, no surprises */
 *entries = (capacity + stail - *old_head);
*new_head = *old_head + n;
 /* *entries == (2 - 0) == 2; *new_head==2; all good */ 
...
T2:
                                                                                                *old_head = load(&cons.head, relaxed);
                                                                                                fence(acquire);
                                                                                                /*old_head == 0, no surprises */
                                                                                                stail = load(&prod.tail, acquire);
/* !!!!! stail == 1  !!!!! for Thread 2
 * Even though we do use acquire here - there was no *release* after store(&prod.tail).
 * So, still no guarantee that Thread 2 will see latest prod.tail value.
 */
                                                                                               *entries = (capacity + stail - *old_head);
                                                                                               /* *entries == (1 - 0) == 1, still ok */
                                                                                               *new_head = *old_head + n;
                                                                                               /* *new_head == 1 */
T3:
  success = CAS(&cons.head,
    old_head /*==0/, *new_head /*==2*/,
    relaxed, relaxed);
  /*success==1, cons.head==2*/
 ...   
T4:   
                                                                                           success = CAS(&cons.head,
                                                                                           old_head /*==0/, *new_head /*==1*/,
                                                                                          relaxed, relaxed);
                                                                                          /*success==0, *old_head==2*/
/* CAS() failed and provided Thread 2 with latest valued for cons.head(==2)
 *  Thread 2 repeats attempt, starts second iteration
 */
                                                                                          fence(acquire);
                                                                                          stail = load(&prod.tail, acquire);
/* !!!!! stail == 1  !!!!! for Thread 2
 * Still no *release* had happened after store(&prod.tail) at T0.
 * So, still no guarantee that Thread 2 will see latest prod.tail value.
 */
                                                                                          *entries = (capacity + stail - *old_head);
                                                                                          *new_head = *old_head + n;
                                                                                          
/* !!!!! *entries == (1 - 2) == -1(UINT32_MAX); *new_head==(2+1)==3; !!!!!
 *  we are about to corrupt our ring !!!!!
 */                                                                                        
                                                                          
> 
> I haven't looked at the so-ring yet. Could it be possible that the
> issue is due to something else introduced in that code?

Well, as I said, so far I wasn't able to re-produce this problem with conventional
ring (ring_stress_autotest), only soring_stress_autotest is failing and
for now - only on one specific ARM platform.
Regarding soring specific fix (without touching common code) - 
sure it is also possible, pls see patch #2.
There I just added 'fence(release);' straight after 'store(&tail);'   
That's seems enough to fix that problem within the soring only.
Though, from my understanding rte_ring might also be affected,
that's why I went ahead and prepared that patch. 
If you feel, that I a missing something here - pls shout.
Konstantin 


> Thanks,
> 
> --wathsala
> 
> > This patch aims several purposes:
> > - provide an alternative (and I think a better) way to fix the
> >   issue discussed in previous patch:
> >   "ring/soring: fix synchronization issue between head and tail values"
> > - make sure that such problem wouldn’t happen within other usages of
> >   __rte_ring_headtail_move_head() – both current rte_ring
> >   implementation and possible future use-cases.
> > - step towards unification of move_head() implementations and
> >   removing rte_ring_generic_pvt.h
> > It uses Acquire-Release memory ordering for CAS operation in move_head().
> > That guarantees that corresponding ‘tail’ updates will be visible before current
> > ‘head’ is updated.
> > As I said before: I think that in theory the problem described in previous patch
> > might happen with our conventional rte_ring too (when
> > RTE_USE_C11_MEM_MODEL enabled).
> > But, so far I didn’t manage to reproduce it in reality.
> > For that reason and also because it touches a critical rte_ring code-path, I put
> > these changes into a separate patch. Expect all interested stakeholders to come-
> > up with their comments and observations.
> > Regarding performance impact – on my boxes both ring_perf_autotest and
> > ring_stress_autotest – show a mixed set of results: some of them become few
> > cycles faster, another few cycles slower.
> > But so far, I didn’t notice any real degradations with that patch.
> >
> > Fixes: b5458e2cc483 ("ring: introduce staged ordered ring")
> > Fixes: 1cc363b8ce06 ("ring: introduce HTS ring mode")
> > Fixes: e6ba4731c0f3 ("ring: introduce RTS ring mode")
> > Fixes: 49594a63147a ("ring/c11: relax ordering for load and store of the head")
> >
> > Signed-off-by: Konstantin Ananyev <konstantin.ananyev@huawei.com>
> > ---
> >  lib/ring/rte_ring_c11_pvt.h      | 27 +++++++++++++++++----------
> >  lib/ring/rte_ring_hts_elem_pvt.h |  6 ++++--  lib/ring/rte_ring_rts_elem_pvt.h
> > |  6 ++++--
> >  lib/ring/soring.c                |  5 -----
> >  4 files changed, 25 insertions(+), 19 deletions(-)
> >
> > diff --git a/lib/ring/rte_ring_c11_pvt.h b/lib/ring/rte_ring_c11_pvt.h index
> > 0845cd6dcf..6d1c46df9a 100644
> > --- a/lib/ring/rte_ring_c11_pvt.h
> > +++ b/lib/ring/rte_ring_c11_pvt.h
> > @@ -77,20 +77,19 @@ __rte_ring_headtail_move_head(struct
> > rte_ring_headtail *d,
> >  	int success;
> >  	unsigned int max = n;
> >
> > +	/* Ensure the head is read before tail */
> >  	*old_head = rte_atomic_load_explicit(&d->head,
> > -			rte_memory_order_relaxed);
> > +			rte_memory_order_acquire);
> >  	do {
> >  		/* Reset n to the initial burst count */
> >  		n = max;
> >
> > -		/* Ensure the head is read before tail */
> > -		rte_atomic_thread_fence(rte_memory_order_acquire);
> > -
> > -		/* load-acquire synchronize with store-release of ht->tail
> > -		 * in update_tail.
> > +		/*
> > +		 * Read s->tail value. Note that it will be loaded after
> > +		 * d->head load, but before CAS operation for the d->head.
> >  		 */
> >  		stail = rte_atomic_load_explicit(&s->tail,
> > -					rte_memory_order_acquire);
> > +					rte_memory_order_relaxed);
> >
> >  		/* The subtraction is done between two unsigned 32bits value
> >  		 * (the result is always modulo 32 bits even if we have @@ -
> > 112,11 +111,19 @@ __rte_ring_headtail_move_head(struct rte_ring_headtail
> > *d,
> >  			d->head = *new_head;
> >  			success = 1;
> >  		} else
> > -			/* on failure, *old_head is updated */
> > +			/*
> > +			 * on failure, *old_head is updated.
> > +			 * this CAS(ACQ_REL, ACQUIRE) serves as a hoist
> > +			 * barrier to prevent:
> > +			 *  - OOO reads of cons tail value
> > +			 *  - OOO copy of elems from the ring
> > +			 *  Also RELEASE guarantees that latest tail value
> > +			 *  will become visible before the new head value.
> > +			 */
> >  			success =
> > rte_atomic_compare_exchange_strong_explicit(
> >  					&d->head, old_head, *new_head,
> > -					rte_memory_order_relaxed,
> > -					rte_memory_order_relaxed);
> > +					rte_memory_order_acq_rel,
> > +					rte_memory_order_acquire);
> >  	} while (unlikely(success == 0));
> >  	return n;
> >  }
> > diff --git a/lib/ring/rte_ring_hts_elem_pvt.h b/lib/ring/rte_ring_hts_elem_pvt.h
> > index c59e5f6420..cc593433b9 100644
> > --- a/lib/ring/rte_ring_hts_elem_pvt.h
> > +++ b/lib/ring/rte_ring_hts_elem_pvt.h
> > @@ -116,13 +116,15 @@ __rte_ring_hts_move_head(struct
> > rte_ring_hts_headtail *d,
> >  		np.pos.head = op.pos.head + n;
> >
> >  	/*
> > -	 * this CAS(ACQUIRE, ACQUIRE) serves as a hoist barrier to prevent:
> > +	 * this CAS(ACQ_REL, ACQUIRE) serves as a hoist barrier to prevent:
> >  	 *  - OOO reads of cons tail value
> >  	 *  - OOO copy of elems from the ring
> > +	 *   Also RELEASE guarantees that latest tail value
> > +	 *   will become visible before the new head value.
> >  	 */
> >  	} while (rte_atomic_compare_exchange_strong_explicit(&d->ht.raw,
> >  			(uint64_t *)(uintptr_t)&op.raw, np.raw,
> > -			rte_memory_order_acquire,
> > +			rte_memory_order_acq_rel,
> >  			rte_memory_order_acquire) == 0);
> >
> >  	*old_head = op.pos.head;
> > diff --git a/lib/ring/rte_ring_rts_elem_pvt.h b/lib/ring/rte_ring_rts_elem_pvt.h
> > index 509fa674fb..860b13cc61 100644
> > --- a/lib/ring/rte_ring_rts_elem_pvt.h
> > +++ b/lib/ring/rte_ring_rts_elem_pvt.h
> > @@ -131,13 +131,15 @@ __rte_ring_rts_move_head(struct
> > rte_ring_rts_headtail *d,
> >  		nh.val.cnt = oh.val.cnt + 1;
> >
> >  	/*
> > -	 * this CAS(ACQUIRE, ACQUIRE) serves as a hoist barrier to prevent:
> > +	 * this CAS(ACQ_REL, ACQUIRE) serves as a hoist barrier to prevent:
> >  	 *  - OOO reads of cons tail value
> >  	 *  - OOO copy of elems to the ring
> > +	 *  Also RELEASE guarantees that latest tail value
> > +	 *  will become visible before the new head value.
> >  	 */
> >  	} while (rte_atomic_compare_exchange_strong_explicit(&d->head.raw,
> >  			(uint64_t *)(uintptr_t)&oh.raw, nh.raw,
> > -			rte_memory_order_acquire,
> > +			rte_memory_order_acq_rel,
> >  			rte_memory_order_acquire) == 0);
> >
> >  	*old_head = oh.val.pos;
> > diff --git a/lib/ring/soring.c b/lib/ring/soring.c index 7bcbf35516..21a1a27e24
> > 100644
> > --- a/lib/ring/soring.c
> > +++ b/lib/ring/soring.c
> > @@ -123,8 +123,6 @@ __rte_soring_stage_finalize(struct
> > soring_stage_headtail *sht, uint32_t stage,
> >  	rte_atomic_store_explicit(&sht->tail.raw, ot.raw,
> >  			rte_memory_order_release);
> >
> > -	/* make sure that new tail value is visible */
> > -	rte_atomic_thread_fence(rte_memory_order_release);
> >  	return i;
> >  }
> >
> > @@ -219,9 +217,6 @@ __rte_soring_update_tail(struct __rte_ring_headtail
> > *rht,
> >  		/* unsupported mode, shouldn't be here */
> >  		RTE_ASSERT(0);
> >  	}
> > -
> > -	/* make sure that new tail value is visible */
> > -	rte_atomic_thread_fence(rte_memory_order_release);
> >  }
> >
> >  static __rte_always_inline uint32_t
> > --
> > 2.43.0

next prev parent reply	other threads:[~2025-05-26 11:54 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-05-21 11:14 [PATCH v1 0/4] ring: some fixes and improvements Konstantin Ananyev
2025-05-21 11:14 ` [PATCH v1 1/4] ring: introduce extra run-time checks Konstantin Ananyev
2025-05-21 12:14   ` Morten Brørup
2025-05-21 12:34     ` Konstantin Ananyev
2025-05-21 18:36       ` Morten Brørup
2025-05-21 19:38         ` Konstantin Ananyev
2025-05-21 22:02           ` Morten Brørup
2025-05-26  8:39             ` Konstantin Ananyev
2025-05-26  9:57               ` Morten Brørup
2025-05-21 11:14 ` [PATCH v1 2/4] ring/soring: fix head-tail synchronization issue Konstantin Ananyev
2025-05-21 11:14 ` [PATCH v1 3/4] ring: fix potential sync issue between head and tail values Konstantin Ananyev
2025-05-21 20:26   ` Morten Brørup
2025-05-22 22:03   ` Wathsala Wathawana Vithanage
2025-05-26 11:54     ` Konstantin Ananyev [this message]
2025-05-27 21:33       ` Wathsala Wathawana Vithanage
2025-05-27 22:47         ` Wathsala Wathawana Vithanage
2025-05-21 11:14 ` [PATCH v1 4/4] config/x86: enable RTE_USE_C11_MEM_MODEL by default Konstantin Ananyev
2025-05-21 19:47   ` Morten Brørup

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=62ba814dc50a4ffd9bcfe8f4d4073cd9@huawei.com \
    --to=konstantin.ananyev@huawei.com \
    --cc=Honnappa.Nagarahalli@arm.com \
    --cc=dev@dpdk.org \
    --cc=drc@linux.ibm.com \
    --cc=hemant.agrawal@nxp.com \
    --cc=jerinj@marvell.com \
    --cc=nd@arm.com \
    --cc=wathsala.vithanage@arm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).