From: Wathsala Wathawana Vithanage <wathsala.vithanage@arm.com>
To: Konstantin Ananyev <konstantin.ananyev@huawei.com>,
"dev@dpdk.org" <dev@dpdk.org>
Cc: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>,
"jerinj@marvell.com" <jerinj@marvell.com>,
"hemant.agrawal@nxp.com" <hemant.agrawal@nxp.com>,
"drc@linux.ibm.com" <drc@linux.ibm.com>, nd <nd@arm.com>,
nd <nd@arm.com>
Subject: RE: [PATCH v1 3/4] ring: fix potential sync issue between head and tail values
Date: Wed, 4 Jun 2025 03:47:39 +0000
Message-ID: <PAWPR08MB8909A1EAE3BF1F467B144B219F6CA@PAWPR08MB8909.eurprd08.prod.outlook.com>
In-Reply-To: <0832db1027614c918301e48010d92e4a@huawei.com>
Hi Konstantin,
> > > That cannot happen in Arm v8/v9 because tail update is a
> > > store-release and a load-acquire that program order follows it can
> > > only be issued after all the cores have observed the store-release
> > > (there is a synchronization with relationship to store-release and
> > > load-acquire pairs).
> > >
> > > In the example you have provided Thread #1's store(&prod.tail, 2,
> > > release) is observed by all the cores in the system by the time same
> > > thread performs load(&prod.tail, acquire) at T2. As per Arm v8/v9
> > > memory model Thread #2 should observe
> > > prod.tail=2 (not prod.tail=1).
> > >
> > > Arm Architecture Reference Manual section B2.10.11 states…
> > >
> > > "Where a Load-Acquire appears in program order after a
> > > Store-Release, the memory access generated by the Store-Release
> > > instruction is observed by each PE to the extent that PE is required
> > > to observe the access coherently, before the memory access generated
> > > by the Load-Acquire instruction is observed by that PE, to the
> > > extent that the PE is required to observe the access coherently."
>
> Interesting, thanks for pointing to that.
> Indeed, according to the docs, acquire/release operations use a
> sequentially consistent model.
> But we are not always getting a pure LDAR instruction here; it depends
> on the compiler march/mcpu flags.
> For some HW archs gcc (and clang) generate an LDAR instruction, for
> others LDAPR.
> And according to the doc:
> https://developer.arm.com/documentation/102336/0100/Load-Acquire-and-Store-Release-instructions
> for Load-AcquirePC the requirement that Load-Acquires are observed in
> order with preceding Store-Releases is dropped.
> To be more specific here, on my box:
> 1) meson with -Dmachine=default (gcc options: "-march=armv8-a+crc")
> generates code with LDAR instruction https://godbolt.org/z/5cEscdxrb
> 2) meson with -Dmachine=native (gcc options: "-mcpu=neoverse-n1")
> generates code with LDAPR instruction https://godbolt.org/z/8KjcvYxch
>
> soring_stress_test for 1) (LDAR) passes
> soring_stress_test for 2) (LDAPR) fails with symptoms I described earlier
>
This makes sense. Load-AcquirePC paired with Store-Release gives you
Release Consistency processor consistent (RCpc), whereas Load-Acquire
paired with Store-Release gives you Release Consistency sequentially
consistent (RCsc), a stronger consistency model than RCpc.
As you have pointed out, this is what causes the issue you have observed.
An LDAPR that follows an STLR in program order on the same core, even to
the same address, does not require other cores to have observed the
value written by the STLR.
However, contrary to this behavior, C11 expects __ATOMIC_RELEASE and
__ATOMIC_ACQUIRE pairs to be sequentially consistent, which is what the
ring relies on. Thus, __ATOMIC_ACQUIRE cannot emit an LDAPR if it is
paired with an __ATOMIC_RELEASE.
To correct this behavior, I suggest changing both the tail update, from
store(&prod.tail, 2, release) to store(&prod.tail, 2, seq_cst), and the
tail read, from load(&prod.tail, acquire) to load(&prod.tail, seq_cst).
This would result in an STLR and an LDAR (instead of an LDAPR), so
essentially we are going back to the desired old behavior.
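For illustration only, here is a minimal sketch of that suggestion using
plain C11 <stdatomic.h> instead of the DPDK rte_atomic wrappers. The
struct and function names (sc_headtail, sc_update_tail, sc_load_tail) are
made up for this example and are not the rte_ring code. The expectation
that GCC or Clang on AArch64 emit STLR for the seq_cst store and LDAR for
the seq_cst load is an assumption that should be confirmed in the
generated assembly.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a ring head/tail pair; not the DPDK struct. */
struct sc_headtail {
	_Atomic uint32_t head;
	_Atomic uint32_t tail;
};

/* Tail update: seq_cst where the current code uses release. */
static inline void
sc_update_tail(struct sc_headtail *ht, uint32_t new_tail)
{
	assert(ht != NULL);
	atomic_store_explicit(&ht->tail, new_tail, memory_order_seq_cst);
}

/* Tail read: seq_cst where the current code uses acquire. */
static inline uint32_t
sc_load_tail(struct sc_headtail *ht)
{
	assert(ht != NULL);
	return atomic_load_explicit(&ht->tail, memory_order_seq_cst);
}
```

Only the memory-order arguments differ from an acquire/release version,
so the change would be local to the tail store and the tail load.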
I have raised this issue with the compiler folks to get some clarity on
the apparent conflict with the C11 spec.
Additionally, when you specify "-mcpu=neoverse-n1", GCC knows that
Neoverse N1 CPUs have FEAT_LRCPC, which means they implement the LDAPR
instruction. But on the same platform, when you only specify
"-march=armv8-a+crc", GCC doesn't know that LDAPR is supported unless
you also append +rcpc to that string, so you end up with LDAR.
(This is why we made the recent build changes.)
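As a rough way to check which case a given build falls into, the sketch
below tests the ACLE feature macro __ARM_FEATURE_RCPC at compile time.
This assumes the toolchain defines that macro when the target is known
to implement FEAT_LRCPC (recent GCC and Clang are expected to, older
ones may not). The function name is hypothetical.

```c
/*
 * Returns 1 when the compiler was told the target has FEAT_LRCPC, in
 * which case an acquire load may be emitted as LDAPR, and 0 otherwise
 * (on AArch64 that would mean LDAR).
 * Assumption: the toolchain defines __ARM_FEATURE_RCPC as per ACLE.
 */
static inline int
sketch_target_has_rcpc(void)
{
#if defined(__ARM_FEATURE_RCPC)
	return 1;
#else
	return 0;
#endif
}
```

Building the same file with "-march=armv8-a+crc" and then with
"-march=armv8-a+crc+rcpc" or "-mcpu=neoverse-n1" should flip the result
if the macro is supported by the compiler in use.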
Thanks
--wathsala
> >
> > In soring is this the pair that update the tail and move head?
> >
> > __rte_soring_update_tail:
> > __rte_ring_update_tail(&rht->ht, head, next, st, enq);
>
> That function is used to update prod.tail and cons.tail
>
> > __rte_soring_move_cons_head:
> > __rte_ring_headtail_move_head(&r->cons.ht, &r->stage[stage].ht, 0,
> > ...);
> >
>
> __rte_ring_headtail_move_head is invoked to update prod.head (by
> __rte_soring_move_prod_head) and cons.head (by
> __rte_soring_move_cons_head).
>
> > If so, &rht->ht and &r->stage[stage].ht are the same address?
>
> No.
> In conventional ring we have just two stages: 'prod' and 'cons'.
> So moving prod.head depends on cons.tail and visa-versa: moving cons.head
> depends on prod.tail.
> With soring we have multiple (N) extra stages in between.
> So the dependency chain is a bit longer:
> prod.head depends on cons.tail
> stage[0].head depends on prod.tail,
> stage[1].head depends on stage[0].tail
> ...
> stage[N-1].head depends on stage[N-2].tail,
> cons.head depends on stage[N-1].tail
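The dependency chain quoted above can be written down as a small lookup.
This is only a sketch: the enum, struct and function names are
hypothetical and do not exist in soring.c. nb_stage plays the role of N.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical labels for the positions in a soring with N stages. */
enum sketch_pos_kind { SKETCH_PROD, SKETCH_STAGE, SKETCH_CONS };

struct sketch_pos {
	enum sketch_pos_kind kind;
	uint32_t stage; /* meaningful only when kind == SKETCH_STAGE */
};

/* Return the position whose tail bounds the head of 'p'. */
static inline struct sketch_pos
sketch_head_depends_on(struct sketch_pos p, uint32_t nb_stage)
{
	struct sketch_pos dep = { SKETCH_PROD, 0 };

	assert(nb_stage > 0);
	switch (p.kind) {
	case SKETCH_PROD:
		dep.kind = SKETCH_CONS;      /* prod.head <- cons.tail */
		break;
	case SKETCH_STAGE:
		assert(p.stage < nb_stage);
		if (p.stage > 0) {           /* stage[i].head <- stage[i-1].tail */
			dep.kind = SKETCH_STAGE;
			dep.stage = p.stage - 1;
		}                            /* stage[0].head <- prod.tail */
		break;
	case SKETCH_CONS:
		dep.kind = SKETCH_STAGE;     /* cons.head <- stage[N-1].tail */
		dep.stage = nb_stage - 1;
		break;
	}
	return dep;
}
```

Each head move therefore reads a tail that a different set of threads
writes, which is why visibility of the latest tail matters at every link
of the chain.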
>
> > If they are
> > not, then you will run into the issue you have seen (a.k.a
> > "Other-multi-copy atomic" which is legit in Arm v8 and above).
>
> Can you probably elaborate a bit more for me here?
> Thanks
> Konstantin
>
>
> > Thanks.
> >
> > --wathsala
> >
> >
> > > > -----Original Message-----
> > > > From: Konstantin Ananyev <konstantin.ananyev@huawei.com>
> > > > Sent: Monday, May 26, 2025 6:54 AM
> > > > To: Wathsala Wathawana Vithanage <wathsala.vithanage@arm.com>;
> > > > dev@dpdk.org
> > > > Cc: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>;
> > > > jerinj@marvell.com; hemant.agrawal@nxp.com; drc@linux.ibm.com; nd
> > > > <nd@arm.com>
> > > > Subject: RE: [PATCH v1 3/4] ring: fix potential sync issue between
> > > > head and
> > > tail
> > > > values
> > > >
> > > > Hi Wathsala,
> > > >
> > > > >
> > > > > Hi Konstanin,
> > > > >
> > > > > In rte_ring the store-release on tail update guarantees that CAS
> > > > > won't get reordered with the store-released of the tail update.
> > > > >
> > > > > So, the sequence of events would look like this (combined view
> > > > > of head and tail update)
> > > > >
> > > > > Releaxed-load(new_head, N) ----------------> (A)
> > > > > Relaxed-CAS(d->head, new_head, old_head) ----------------> (B)
> > > > > Store-release-store(d->tail, new_head) ----------------> (C)
> > > > >
> > > > > If we look at address dependencies, then...
> > > > >
> > > > > (B) depends on (A) due to new_head address dependency.
> > > > > (C) depends on (A) due to new_head address dependency.
> > > > >
> > > > > So, dependency graph looks like this
> > > > > (A)
> > > > > / \
> > > > > v v
> > > > > (B) (C)
> > > > >
> > > > > There is no implicit dependence between (B) and (C), I think
> > > > > this is the issue you are brining up.
> > > > > Even though there is no dependence between the two, the
> > > > > store-release of (C) ensures that (B) won't drop below it.
> > > > > Therefore, the above graph can be turned into an ordered
> > > > > sequence as shown below..
> > > > >
> > > > > (A) -> (B) -> (C)
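A minimal C11 sketch of the (A) -> (B) -> (C) sequence quoted above, for
readers who want to see the three operations side by side. It uses plain
<stdatomic.h>; the names abc_headtail, abc_move_head and abc_update_tail
are hypothetical, and the sketch leaves out the read of the opposite
side's tail, the free-space check and the element copy.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical head/tail pair; not the DPDK rte_ring_headtail layout. */
struct abc_headtail {
	_Atomic uint32_t head;
	_Atomic uint32_t tail;
};

/* (A) relaxed load of head, then (B) relaxed CAS to claim n slots. */
static inline uint32_t
abc_move_head(struct abc_headtail *d, uint32_t n, uint32_t *new_head)
{
	uint32_t old_head;

	assert(d != NULL && new_head != NULL);
	old_head = atomic_load_explicit(&d->head, memory_order_relaxed); /* (A) */
	do {
		*new_head = old_head + n;
		/* on CAS failure old_head is refreshed with the current head */
	} while (!atomic_compare_exchange_strong_explicit(&d->head,
			&old_head, *new_head,
			memory_order_relaxed, memory_order_relaxed)); /* (B) */
	return old_head;
}

/* (C) store-release of tail: earlier accesses, including (B), stay before it. */
static inline void
abc_update_tail(struct abc_headtail *d, uint32_t new_head)
{
	assert(d != NULL);
	atomic_store_explicit(&d->tail, new_head, memory_order_release); /* (C) */
}
```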
> > > >
> > > > I do agree that with current implementation of
> > > > __rte_ring_headtail_move_head()
> > > > in lib/ring/rte_ring_c11_pvt.h the order of these 3 operations
> > > > (A->B->C) should be sequential.
> > > > The problem I am talking about is a different one:
> > > > thread can see 'latest' 'cons.head' value, with 'previous' value
> > > > for 'prod.tail' or visa-versa.
> > > > In other words: 'cons.head' value depends on 'prod.tail', so
> > > > before making latest 'cons.head'
> > > > value visible to other threads, we need to ensure that latest
> > > > 'prod.tail' is also visible.
> > > > Let me try to explain it on the example:
> > > >
> > > > Suppose at some moment we have:
> > > > prod={.head=2,.tail=1};
> > > > cons={.head=0,.tail=0};
> > > > I.e. thead #1 is in process to enqueue one more element into the ring.
> > > >
> > > > T0 (Thread #1):
> > > >   store(&prod.tail, 2, release);
> > > >   /* AFAIU: this is somewhat equivalent to: wmb(); prod.tail=2;
> > > >    * I.E. - it guarantees that all stores initiated before that
> > > >    * operation will be visible by other threads at the same moment
> > > >    * or before new value of prod.tail will become visible, but it
> > > >    * doesn't guarantee that new prod.tail value will be visible to
> > > >    * other threads immediately.
> > > >    */
> > > > ...
> > > > Thread #1: move_cons_head(...,n=2)
> > > > Thread #2: move_cons_head(...,n=1)
> > > > ...
> > > > T1 (Thread #1):
> > > >   *old_head = load(&cons.head, relaxed);
> > > >   fence(acquire);
> > > >   /* old_head == 0, no surprises */
> > > >   stail = load(&prod.tail, acquire);
> > > >   /* stail == 2, no surprises */
> > > >   *entries = (capacity + stail - *old_head);
> > > >   *new_head = *old_head + n;
> > > >   /* *entries == (2 - 0) == 2; *new_head==2; all good */
> > > > ...
> > > > T2 (Thread #2):
> > > >   *old_head = load(&cons.head, relaxed);
> > > >   fence(acquire);
> > > >   /* old_head == 0, no surprises */
> > > >   stail = load(&prod.tail, acquire);
> > > >   /* !!!!! stail == 1 !!!!! for Thread 2
> > > >    * Even though we do use acquire here - there was no *release*
> > > >    * after store(&prod.tail).
> > > >    * So, still no guarantee that Thread 2 will see latest prod.tail
> > > >    * value.
> > > >    */
> > > >   *entries = (capacity + stail - *old_head);
> > > >   /* *entries == (1 - 0) == 1, still ok */
> > > >   *new_head = *old_head + n;
> > > >   /* *new_head == 1 */
> > > > T3 (Thread #1):
> > > >   success = CAS(&cons.head,
> > > >       old_head /*==0*/, *new_head /*==2*/,
> > > >       relaxed, relaxed);
> > > >   /* success==1, cons.head==2 */
> > > > ...
> > > > T4 (Thread #2):
> > > >   success = CAS(&cons.head,
> > > >       old_head /*==0*/, *new_head /*==1*/,
> > > >       relaxed, relaxed);
> > > >   /* success==0, *old_head==2 */
> > > >   /* CAS() failed and provided Thread 2 with latest value for
> > > >    * cons.head(==2).
> > > >    * Thread 2 repeats attempt, starts second iteration */
> > > >   fence(acquire);
> > > >   stail = load(&prod.tail, acquire);
> > > >   /* !!!!! stail == 1 !!!!! for Thread 2
> > > >    * Still no *release* had happened after store(&prod.tail) at T0.
> > > >    * So, still no guarantee that Thread 2 will see latest prod.tail
> > > >    * value.
> > > >    */
> > > >   *entries = (capacity + stail - *old_head);
> > > >   *new_head = *old_head + n;
> > > >   /* !!!!! *entries == (1 - 2) == -1(UINT32_MAX);
> > > >    * *new_head==(2+1)==3; !!!!!
> > > >    * we are about to corrupt our ring !!!!!
> > > >    */
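The arithmetic in the quoted example can be reproduced in isolation. The
sketch below assumes capacity is 0 on the consumer side, which matches
the numbers in the example; the function names and the sanity check are
hypothetical and are not taken from the ring code.

```c
#include <assert.h>
#include <stdint.h>

/* Same expression as in the example; all operands are unsigned 32-bit. */
static inline uint32_t
sketch_entries(uint32_t capacity, uint32_t stail, uint32_t old_head)
{
	return capacity + stail - old_head;
}

/* Hypothetical check: a valid count can never exceed the ring size. */
static inline int
sketch_entries_valid(uint32_t entries, uint32_t ring_size)
{
	assert(ring_size > 0);
	return entries <= ring_size;
}
```

With a stale stail of 1 and a fresh old_head of 2, sketch_entries(0, 1, 2)
wraps around to UINT32_MAX, so any "n <= entries" test passes and the
consumer would claim slots that were never produced.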
> > > >
> > > > >
> > > > > I haven't looked at the so-ring yet. Could it be possible that
> > > > > the issue is due to something else introduced in that code?
> > > >
> > > > Well, as I said, so far I wasn't able to re-produce this problem
> > > > with conventional ring (ring_stress_autotest), only
> > > > soring_stress_autotest is failing and for now - only on one
> > > > specific ARM platform.
> > > > Regarding soring specific fix (without touching common code) -
> > > > sure it is also possible, pls see patch #2.
> > > > There I just added 'fence(release);' straight after 'store(&tail);'
> > > > That's seems enough to fix that problem within the soring only.
> > > > Though, from my understanding rte_ring might also be affected,
> > > > that's why I went ahead and prepared that patch.
> > > > If you feel, that I a missing something here - pls shout.
> > > > Konstantin
> > > >
> > > >
> > > > > Thanks,
> > > > >
> > > > > --wathsala
> > > > >
> > > > > > This patch aims several purposes:
> > > > > > - provide an alternative (and I think a better) way to fix the
> > > > > > issue discussed in previous patch:
> > > > > > "ring/soring: fix synchronization issue between head and tail values"
> > > > > > - make sure that such problem wouldn’t happen within other usages of
> > > > > > __rte_ring_headtail_move_head() – both current rte_ring
> > > > > > implementation and possible future use-cases.
> > > > > > > - step towards unification of move_head() implementations and
> > > > > > > removing rte_ring_generic_pvt.h
> > > > > > > It uses Acquire-Release memory ordering for CAS operation in
> > > > > > > move_head().
> > > > > > > That guarantees that corresponding ‘tail’ updates will be
> > > > > > > visible before current ‘head’ is updated.
> > > > > > > As I said before: I think that in theory the problem described
> > > > > > > in previous patch might happen with our conventional rte_ring
> > > > > > > too (when RTE_USE_C11_MEM_MODEL enabled).
> > > > > > > But, so far I didn’t manage to reproduce it in reality.
> > > > > > > For that reason and also because it touches a critical
> > > > > > > rte_ring code-path, I put these changes into a separate patch.
> > > > > > > Expect all interested stakeholders to come-up with their
> > > > > > > comments and observations.
> > > > > > > Regarding performance impact – on my boxes both
> > > > > > > ring_perf_autotest and ring_stress_autotest – show a mixed set
> > > > > > > of results: some of them become few cycles faster, another few
> > > > > > > cycles slower.
> > > > > > > But so far, I didn’t notice any real degradations with that patch.
> > > > > >
> > > > > > Fixes: b5458e2cc483 ("ring: introduce staged ordered ring")
> > > > > > Fixes: 1cc363b8ce06 ("ring: introduce HTS ring mode")
> > > > > > Fixes: e6ba4731c0f3 ("ring: introduce RTS ring mode")
> > > > > > Fixes: 49594a63147a ("ring/c11: relax ordering for load and
> > > > > > store of the
> > > > head")
> > > > > >
> > > > > > Signed-off-by: Konstantin Ananyev
> > > > > > <konstantin.ananyev@huawei.com>
> > > > > > ---
> > > > > > lib/ring/rte_ring_c11_pvt.h | 27 +++++++++++++++++----------
> > > > > > > lib/ring/rte_ring_hts_elem_pvt.h | 6 ++++--
> > > > > > > lib/ring/rte_ring_rts_elem_pvt.h | 6 ++++--
> > > > > > lib/ring/soring.c | 5 -----
> > > > > > 4 files changed, 25 insertions(+), 19 deletions(-)
> > > > > >
> > > > > > > diff --git a/lib/ring/rte_ring_c11_pvt.h b/lib/ring/rte_ring_c11_pvt.h
> > > > > > > index 0845cd6dcf..6d1c46df9a 100644
> > > > > > > --- a/lib/ring/rte_ring_c11_pvt.h
> > > > > > > +++ b/lib/ring/rte_ring_c11_pvt.h
> > > > > > > @@ -77,20 +77,19 @@ __rte_ring_headtail_move_head(struct rte_ring_headtail *d,
> > > > > > int success;
> > > > > > unsigned int max = n;
> > > > > >
> > > > > > + /* Ensure the head is read before tail */
> > > > > > *old_head = rte_atomic_load_explicit(&d->head,
> > > > > > - rte_memory_order_relaxed);
> > > > > > + rte_memory_order_acquire);
> > > > > > do {
> > > > > > /* Reset n to the initial burst count */
> > > > > > n = max;
> > > > > >
> > > > > > - /* Ensure the head is read before tail */
> > > > > > - rte_atomic_thread_fence(rte_memory_order_acquire);
> > > > > > -
> > > > > > > - /* load-acquire synchronize with store-release of ht->tail
> > > > > > > - * in update_tail.
> > > > > > > + /*
> > > > > > > + * Read s->tail value. Note that it will be loaded after
> > > > > > > + * d->head load, but before CAS operation for the d->head.
> > > > > > > */
> > > > > > stail = rte_atomic_load_explicit(&s->tail,
> > > > > > - rte_memory_order_acquire);
> > > > > > + rte_memory_order_relaxed);
> > > > > >
> > > > > > > /* The subtraction is done between two unsigned 32bits value
> > > > > > > * (the result is always modulo 32 bits even if we have
> > > > > > > @@ -112,11 +111,19 @@ __rte_ring_headtail_move_head(struct rte_ring_headtail *d,
> > > > > > d->head = *new_head;
> > > > > > success = 1;
> > > > > > } else
> > > > > > - /* on failure, *old_head is updated */
> > > > > > + /*
> > > > > > + * on failure, *old_head is updated.
> > > > > > > + * this CAS(ACQ_REL, ACQUIRE) serves as a hoist
> > > > > > > + * barrier to prevent:
> > > > > > > + * - OOO reads of cons tail value
> > > > > > > + * - OOO copy of elems from the ring
> > > > > > > + * Also RELEASE guarantees that latest tail value
> > > > > > > + * will become visible before the new head value.
> > > > > > > + */
> > > > > > > success =
> > > > > > > rte_atomic_compare_exchange_strong_explicit(
> > > > > > > &d->head, old_head, *new_head,
> > > > > > - rte_memory_order_relaxed,
> > > > > > - rte_memory_order_relaxed);
> > > > > > + rte_memory_order_acq_rel,
> > > > > > + rte_memory_order_acquire);
> > > > > > } while (unlikely(success == 0));
> > > > > > return n;
> > > > > > }
> > > > > > > diff --git a/lib/ring/rte_ring_hts_elem_pvt.h b/lib/ring/rte_ring_hts_elem_pvt.h
> > > > > > > index c59e5f6420..cc593433b9 100644
> > > > > > > --- a/lib/ring/rte_ring_hts_elem_pvt.h
> > > > > > > +++ b/lib/ring/rte_ring_hts_elem_pvt.h
> > > > > > > @@ -116,13 +116,15 @@ __rte_ring_hts_move_head(struct rte_ring_hts_headtail *d,
> > > > > > np.pos.head = op.pos.head + n;
> > > > > >
> > > > > > /*
> > > > > > > - * this CAS(ACQUIRE, ACQUIRE) serves as a hoist barrier to prevent:
> > > > > > > + * this CAS(ACQ_REL, ACQUIRE) serves as a hoist barrier to prevent:
> > > > > > > * - OOO reads of cons tail value
> > > > > > > * - OOO copy of elems from the ring
> > > > > > > + * Also RELEASE guarantees that latest tail value
> > > > > > > + * will become visible before the new head value.
> > > > > > > */
> > > > > > > } while (rte_atomic_compare_exchange_strong_explicit(&d->ht.raw,
> > > > > > (uint64_t *)(uintptr_t)&op.raw, np.raw,
> > > > > > - rte_memory_order_acquire,
> > > > > > + rte_memory_order_acq_rel,
> > > > > > rte_memory_order_acquire) == 0);
> > > > > >
> > > > > > *old_head = op.pos.head;
> > > > > > > diff --git a/lib/ring/rte_ring_rts_elem_pvt.h b/lib/ring/rte_ring_rts_elem_pvt.h
> > > > > > > index 509fa674fb..860b13cc61 100644
> > > > > > > --- a/lib/ring/rte_ring_rts_elem_pvt.h
> > > > > > > +++ b/lib/ring/rte_ring_rts_elem_pvt.h
> > > > > > > @@ -131,13 +131,15 @@ __rte_ring_rts_move_head(struct rte_ring_rts_headtail *d,
> > > > > > nh.val.cnt = oh.val.cnt + 1;
> > > > > >
> > > > > > /*
> > > > > > > - * this CAS(ACQUIRE, ACQUIRE) serves as a hoist barrier to prevent:
> > > > > > > + * this CAS(ACQ_REL, ACQUIRE) serves as a hoist barrier to prevent:
> > > > > > > * - OOO reads of cons tail value
> > > > > > > * - OOO copy of elems to the ring
> > > > > > > + * Also RELEASE guarantees that latest tail value
> > > > > > > + * will become visible before the new head value.
> > > > > > > */
> > > > > > > } while (rte_atomic_compare_exchange_strong_explicit(&d->head.raw,
> > > > > > (uint64_t *)(uintptr_t)&oh.raw, nh.raw,
> > > > > > - rte_memory_order_acquire,
> > > > > > + rte_memory_order_acq_rel,
> > > > > > rte_memory_order_acquire) == 0);
> > > > > >
> > > > > > *old_head = oh.val.pos;
> > > > > > > diff --git a/lib/ring/soring.c b/lib/ring/soring.c
> > > > > > > index 7bcbf35516..21a1a27e24 100644
> > > > > > > --- a/lib/ring/soring.c
> > > > > > > +++ b/lib/ring/soring.c
> > > > > > > @@ -123,8 +123,6 @@ __rte_soring_stage_finalize(struct soring_stage_headtail *sht, uint32_t stage,
> > > > > > rte_atomic_store_explicit(&sht->tail.raw, ot.raw,
> > > > > > rte_memory_order_release);
> > > > > >
> > > > > > - /* make sure that new tail value is visible */
> > > > > > - rte_atomic_thread_fence(rte_memory_order_release);
> > > > > > return i;
> > > > > > }
> > > > > >
> > > > > > @@ -219,9 +217,6 @@ __rte_soring_update_tail(struct
> > > > __rte_ring_headtail
> > > > > > *rht,
> > > > > > /* unsupported mode, shouldn't be here */
> > > > > > RTE_ASSERT(0);
> > > > > > }
> > > > > > -
> > > > > > - /* make sure that new tail value is visible */
> > > > > > - rte_atomic_thread_fence(rte_memory_order_release);
> > > > > > }
> > > > > >
> > > > > > static __rte_always_inline uint32_t
> > > > > > --
> > > > > > 2.43.0