DPDK patches and discussions
From: "Phil Yang (Arm Technology China)" <Phil.Yang@arm.com>
To: David Marchand <david.marchand@redhat.com>
Cc: "thomas@monjalon.net" <thomas@monjalon.net>,
	"jerinj@marvell.com" <jerinj@marvell.com>,
	Gage Eads <gage.eads@intel.com>, dev <dev@dpdk.org>,
	"hemant.agrawal@nxp.com" <hemant.agrawal@nxp.com>,
	Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>,
	"Gavin Hu (Arm Technology China)" <Gavin.Hu@arm.com>,
	nd <nd@arm.com>, nd <nd@arm.com>
Subject: Re: [dpdk-dev] [PATCH v9 1/3] eal/arm64: add 128-bit atomic compare exchange
Date: Tue, 15 Oct 2019 11:32:30 +0000	[thread overview]
Message-ID: <VE1PR08MB4640CF1F1C82F03C75AF16FEE9930@VE1PR08MB4640.eurprd08.prod.outlook.com> (raw)
In-Reply-To: <CAJFAV8wvXy_Vrp6JtSEy56fFt2dUiOq2igw77+EnG6Z-q+7sjQ@mail.gmail.com>

Hi David,

Thanks for your comments. I have addressed most of them in v10.  Please review it.
Some comments inline.
 
> -----Original Message-----
> From: David Marchand <david.marchand@redhat.com>
> Sent: Monday, October 14, 2019 11:44 PM
> To: Phil Yang (Arm Technology China) <Phil.Yang@arm.com>
> Cc: thomas@monjalon.net; jerinj@marvell.com; Gage Eads
> <gage.eads@intel.com>; dev <dev@dpdk.org>; hemant.agrawal@nxp.com;
> Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>; Gavin Hu (Arm
> Technology China) <Gavin.Hu@arm.com>; nd <nd@arm.com>
> Subject: Re: [dpdk-dev] [PATCH v9 1/3] eal/arm64: add 128-bit atomic
> compare exchange
> 
> On Wed, Aug 14, 2019 at 10:29 AM Phil Yang <phil.yang@arm.com> wrote:
> >
> > Add 128-bit atomic compare exchange on aarch64.
> 
> A bit short, seeing the complexity of the code and the additional
> RTE_ARM_FEATURE_ATOMICS config flag.
Updated in v10. 

<snip>

> >
> > +/*------------------------ 128 bit atomic operations -------------------------*/
> > +
> > +#define __HAS_ACQ(mo) ((mo) != __ATOMIC_RELAXED && (mo) !=
> __ATOMIC_RELEASE)
> > +#define __HAS_RLS(mo) ((mo) == __ATOMIC_RELEASE || (mo) ==
> __ATOMIC_ACQ_REL || \
> > +                                         (mo) == __ATOMIC_SEQ_CST)
> > +
> > +#define __MO_LOAD(mo)  (__HAS_ACQ((mo)) ? __ATOMIC_ACQUIRE :
> __ATOMIC_RELAXED)
> > +#define __MO_STORE(mo) (__HAS_RLS((mo)) ? __ATOMIC_RELEASE :
> __ATOMIC_RELAXED)
> 
> Those 4 first macros only make sense when LSE is not available (see below
> [1]).
> Besides, they are used only once, why not directly use those
> conditions where needed?

Agree. In v10 I removed __MO_LOAD and __MO_STORE, and kept __HAS_ACQ and __HAS_RLS under the non-LSE branch.
I think they make the code easier to read.
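
For reference, the split those macros encode can be sketched in plain C (hypothetical helper names, not the v10 code; the __ATOMIC_* constants are predefined by GCC/Clang):

```c
/* Map the user-supplied 'success' memory order of a 128-bit CAS onto the
 * separate orders needed by the load-exclusive and store-exclusive halves.
 */
static int mo_load(int mo)
{
	/* Any order with acquire semantics needs an acquiring load. */
	return (mo != __ATOMIC_RELAXED && mo != __ATOMIC_RELEASE) ?
			__ATOMIC_ACQUIRE : __ATOMIC_RELAXED;
}

static int mo_store(int mo)
{
	/* Any order with release semantics needs a releasing store. */
	return (mo == __ATOMIC_RELEASE || mo == __ATOMIC_ACQ_REL ||
			mo == __ATOMIC_SEQ_CST) ?
			__ATOMIC_RELEASE : __ATOMIC_RELAXED;
}
```

E.g. __ATOMIC_ACQ_REL maps to an acquiring load plus a releasing store, while __ATOMIC_RELEASE only strengthens the store side.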

> 
> 
> > +
> > +#if defined(__ARM_FEATURE_ATOMICS) ||
> defined(RTE_ARM_FEATURE_ATOMICS)
> > +#define __ATOMIC128_CAS_OP(cas_op_name, op_string)                          \
> > +static __rte_noinline rte_int128_t                                          \
> > +cas_op_name(rte_int128_t *dst, rte_int128_t old,                            \
> > +               rte_int128_t updated)                                       \
> > +{                                                                           \
> > +       /* caspX instructions register pair must start from even-numbered
> > +        * register at operand 1.
> > +        * So, specify registers for local variables here.
> > +        */                                                                 \
> > +       register uint64_t x0 __asm("x0") = (uint64_t)old.val[0];            \
> > +       register uint64_t x1 __asm("x1") = (uint64_t)old.val[1];            \
> > +       register uint64_t x2 __asm("x2") = (uint64_t)updated.val[0];        \
> > +       register uint64_t x3 __asm("x3") = (uint64_t)updated.val[1];        \
> > +       asm volatile(                                                       \
> > +               op_string " %[old0], %[old1], %[upd0], %[upd1], [%[dst]]"   \
> > +               : [old0] "+r" (x0),                                         \
> > +               [old1] "+r" (x1)                                            \
> > +               : [upd0] "r" (x2),                                          \
> > +               [upd1] "r" (x3),                                            \
> > +               [dst] "r" (dst)                                             \
> > +               : "memory");                                                \
> > +       old.val[0] = x0;                                                    \
> > +       old.val[1] = x1;                                                    \
> > +       return old;                                                         \
> > +}
> > +
> > +__ATOMIC128_CAS_OP(__rte_cas_relaxed, "casp")
> > +__ATOMIC128_CAS_OP(__rte_cas_acquire, "caspa")
> > +__ATOMIC128_CAS_OP(__rte_cas_release, "caspl")
> > +__ATOMIC128_CAS_OP(__rte_cas_acq_rel, "caspal")
> 
> If LSE is available, we expose __rte_cas_XX (explicitely) *non*
> inlined functions, while without LSE, we expose inlined __rte_ldr_XX
> and __rte_stx_XX functions.
> So we have a first disparity with non-inlined vs inlined functions
> depending on a #ifdef.
> Then, we have a second disparity with two sets of "apis" depending on
> this #ifdef.
> 
> And we expose those sets with a rte_ prefix, meaning people will try
> to use them, but those are not part of a public api.
> 
> Can't we do without them ? (see below [2] for a proposal with ldr/stx,
> cas should be the same)

No, that doesn't work, because we need to verify the store-exclusive return value at the end of the loop for these macros.
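
To illustrate the point: the loop has to keep retrying until the store-exclusive reports success, even on the compare-failure path. A single-threaded C model of that structure (all names hypothetical; store_exclusive() fakes one monitor-clear failure the way a contending writer would on real hardware):

```c
#include <stdint.h>

struct i128 { uint64_t val[2]; };

static int fail_once = 1;	/* simulate one lost exclusive monitor */

static struct i128 load_exclusive(const struct i128 *src)
{
	return *src;
}

static uint32_t store_exclusive(struct i128 *dst, struct i128 src)
{
	if (fail_once) {
		fail_once = 0;
		return 1;	/* 1 = store failed, must retry */
	}
	*dst = src;
	return 0;		/* 0 = store succeeded */
}

static int cas128_model(struct i128 *dst, struct i128 *exp,
			const struct i128 *src)
{
	struct i128 expected = *exp, desired = *src, old;
	uint32_t ret;

	do {
		old = load_exclusive(dst);
		if (old.val[0] == expected.val[0] &&
		    old.val[1] == expected.val[1])
			ret = store_exclusive(dst, desired);
		else
			/* Compare failed: still store 'old' back so we
			 * know the 128-bit load was atomic. */
			ret = store_exclusive(dst, old);
	} while (ret);	/* the return value gates the loop exit */

	*exp = old;
	return old.val[0] == expected.val[0] &&
	       old.val[1] == expected.val[1];
}
```

The first store-exclusive "fails" and the loop transparently retries; a macro that expands only the asm statement would leave that ret check with nothing to consume.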

> 
> 
> > +#else
> > +#define __ATOMIC128_LDX_OP(ldx_op_name, op_string)                          \
> > +static inline rte_int128_t                                                  \
> > +ldx_op_name(const rte_int128_t *src)                                        \
> > +{                                                                           \
> > +       rte_int128_t ret;                                                   \
> > +       asm volatile(                                                       \
> > +                       op_string " %0, %1, %2"                             \
> > +                       : "=&r" (ret.val[0]),                               \
> > +                         "=&r" (ret.val[1])                                \
> > +                       : "Q" (src->val[0])                                 \
> > +                       : "memory");                                        \
> > +       return ret;                                                         \
> > +}
> > +
> > +__ATOMIC128_LDX_OP(__rte_ldx_relaxed, "ldxp")
> > +__ATOMIC128_LDX_OP(__rte_ldx_acquire, "ldaxp")
> > +
> > +#define __ATOMIC128_STX_OP(stx_op_name, op_string)                          \
> > +static inline uint32_t                                                      \
> > +stx_op_name(rte_int128_t *dst, const rte_int128_t src)                      \
> > +{                                                                           \
> > +       uint32_t ret;                                                       \
> > +       asm volatile(                                                       \
> > +                       op_string " %w0, %1, %2, %3"                        \
> > +                       : "=&r" (ret)                                       \
> > +                       : "r" (src.val[0]),                                 \
> > +                         "r" (src.val[1]),                                 \
> > +                         "Q" (dst->val[0])                                 \
> > +                       : "memory");                                        \
> > +       /* Return 0 on success, 1 on failure */                             \
> > +       return ret;                                                         \
> > +}
> > +
> > +__ATOMIC128_STX_OP(__rte_stx_relaxed, "stxp")
> > +__ATOMIC128_STX_OP(__rte_stx_release, "stlxp")
> > +#endif
> > +
> > +static inline int __rte_experimental
> 
> The __rte_experimental tag comes first.

Updated in v10.

> 
> 
> > +rte_atomic128_cmp_exchange(rte_int128_t *dst,
> > +                               rte_int128_t *exp,
> > +                               const rte_int128_t *src,
> > +                               unsigned int weak,
> > +                               int success,
> > +                               int failure)
> > +{
> > +       /* Always do strong CAS */
> > +       RTE_SET_USED(weak);
> > +       /* Ignore memory ordering for failure, memory order for
> > +        * success must be stronger or equal
> > +        */
> > +       RTE_SET_USED(failure);
> > +       /* Find invalid memory order */
> > +       RTE_ASSERT(success == __ATOMIC_RELAXED
> > +                       || success == __ATOMIC_ACQUIRE
> > +                       || success == __ATOMIC_RELEASE
> > +                       || success == __ATOMIC_ACQ_REL
> > +                       || success == __ATOMIC_SEQ_CST);
> > +
> > +#if defined(__ARM_FEATURE_ATOMICS) ||
> defined(RTE_ARM_FEATURE_ATOMICS)
> > +       rte_int128_t expected = *exp;
> > +       rte_int128_t desired = *src;
> > +       rte_int128_t old;
> > +
> > +       if (success == __ATOMIC_RELAXED)
> > +               old = __rte_cas_relaxed(dst, expected, desired);
> > +       else if (success == __ATOMIC_ACQUIRE)
> > +               old = __rte_cas_acquire(dst, expected, desired);
> > +       else if (success == __ATOMIC_RELEASE)
> > +               old = __rte_cas_release(dst, expected, desired);
> > +       else
> > +               old = __rte_cas_acq_rel(dst, expected, desired);
> > +#else
> 
> 1: the four first macros (on the memory ordering constraints) can be
> moved here then undef'd once unused.
> Or you can just do without them.

Updated in v10.

> 
> 
> > +       int ldx_mo = __MO_LOAD(success);
> > +       int stx_mo = __MO_STORE(success);
> > +       uint32_t ret = 1;
> > +       register rte_int128_t expected = *exp;
> > +       register rte_int128_t desired = *src;
> > +       register rte_int128_t old;
> > +
> > +       /* ldx128 can not guarantee atomic,
> > +        * Must write back src or old to verify atomicity of ldx128;
> > +        */
> > +       do {
> > +               if (ldx_mo == __ATOMIC_RELAXED)
> > +                       old = __rte_ldx_relaxed(dst);
> > +               else
> > +                       old = __rte_ldx_acquire(dst);
> 
> 2: how about using a simple macro that gets passed the op string?
> 
> Something like (untested):
> 
> #define __READ_128(op_string, src, dst) \
>     asm volatile(                      \
>         op_string " %0, %1, %2"    \
>         : "=&r" (dst.val[0]),      \
>           "=&r" (dst.val[1])       \
>         : "Q" (src->val[0])        \
>         : "memory")
> 
> Then used like this:
> 
>         if (ldx_mo == __ATOMIC_RELAXED)
>             __READ_128("ldxp", dst, old);
>         else
>             __READ_128("ldaxp", dst, old);
> 
> #undef __READ_128
> 
> > +
> > +               if (likely(old.int128 == expected.int128)) {
> > +                       if (stx_mo == __ATOMIC_RELAXED)
> > +                               ret = __rte_stx_relaxed(dst, desired);
> > +                       else
> > +                               ret = __rte_stx_release(dst, desired);
> > +               } else {
> > +                       /* In the failure case (since 'weak' is ignored and only
> > +                        * weak == 0 is implemented), expected should contain
> > +                        * the atomically read value of dst. This means, 'old'
> > +                        * needs to be stored back to ensure it was read
> > +                        * atomically.
> > +                        */
> > +                       if (stx_mo == __ATOMIC_RELAXED)
> > +                               ret = __rte_stx_relaxed(dst, old);
> > +                       else
> > +                               ret = __rte_stx_release(dst, old);
> 
> And:
> 
> #define __STORE_128(op_string, dst, val, ret) \
>     asm volatile(                        \
>         op_string " %w0, %1, %2, %3"     \
>         : "=&r" (ret)                    \
>         : "r" (val.val[0]),              \
>           "r" (val.val[1]),              \
>           "Q" (dst->val[0])              \
>         : "memory")
> 
> Used like this:
> 
>         if (likely(old.int128 == expected.int128)) {
>             if (stx_mo == __ATOMIC_RELAXED)
>                 __STORE_128("stxp", dst, desired, ret);
>             else
>                 __STORE_128("stlxp", dst, desired, ret);
>         } else {
>             /* In the failure case (since 'weak' is ignored and only
>              * weak == 0 is implemented), expected should contain
>              * the atomically read value of dst. This means, 'old'
>              * needs to be stored back to ensure it was read
>              * atomically.
>              */
>             if (stx_mo == __ATOMIC_RELAXED)
>                 __STORE_128("stxp", dst, old, ret);
>             else
>                 __STORE_128("stlxp", dst, old, ret);
>         }
> 
> #undef __STORE_128
> 
> 
> > +               }
> > +       } while (unlikely(ret));
> > +#endif
> > +
> > +       /* Unconditionally updating expected removes
> > +        * an 'if' statement.
> > +        * expected should already be in register if
> > +        * not in the cache.
> > +        */
> > +       *exp = old;
> > +
> > +       return (old.int128 == expected.int128);
> > +}
> > +
> >  #ifdef __cplusplus
> >  }
> >  #endif
> > diff --git a/lib/librte_eal/common/include/arch/x86/rte_atomic_64.h
> b/lib/librte_eal/common/include/arch/x86/rte_atomic_64.h
> > index 1335d92..cfe7067 100644
> > --- a/lib/librte_eal/common/include/arch/x86/rte_atomic_64.h
> > +++ b/lib/librte_eal/common/include/arch/x86/rte_atomic_64.h
> > @@ -183,18 +183,6 @@ static inline void
> rte_atomic64_clear(rte_atomic64_t *v)
> >
> >  /*------------------------ 128 bit atomic operations -------------------------*/
> >
> > -/**
> > - * 128-bit integer structure.
> > - */
> > -RTE_STD_C11
> > -typedef struct {
> > -       RTE_STD_C11
> > -       union {
> > -               uint64_t val[2];
> > -               __extension__ __int128 int128;
> > -       };
> > -} __rte_aligned(16) rte_int128_t;
> > -
> >  __rte_experimental
> >  static inline int
> >  rte_atomic128_cmp_exchange(rte_int128_t *dst,
> > diff --git a/lib/librte_eal/common/include/generic/rte_atomic.h
> b/lib/librte_eal/common/include/generic/rte_atomic.h
> > index 24ff7dc..e6ab15a 100644
> > --- a/lib/librte_eal/common/include/generic/rte_atomic.h
> > +++ b/lib/librte_eal/common/include/generic/rte_atomic.h
> > @@ -1081,6 +1081,20 @@ static inline void
> rte_atomic64_clear(rte_atomic64_t *v)
> >
> >  /*------------------------ 128 bit atomic operations -------------------------*/
> >
> > +/**
> > + * 128-bit integer structure.
> > + */
> > +RTE_STD_C11
> > +typedef struct {
> > +       RTE_STD_C11
> > +       union {
> > +               uint64_t val[2];
> > +#ifdef RTE_ARCH_64
> > +               __extension__ __int128 int128;
> > +#endif
> 
> You hid this field for x86.
> What is the reason?
No, it is not hidden for x86. The RTE_ARCH_64 flag covers x86_64 as well.
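
For what it's worth, a compilable sketch of what the moved definition boils down to (ARCH_64_MODEL stands in for DPDK's RTE_ARCH_64, which both x86_64 and aarch64 builds define; __int128 only exists on 64-bit targets):

```c
#include <stdint.h>

/* Stand-in for RTE_ARCH_64: guard __int128 so 32-bit builds still compile. */
#if defined(__x86_64__) || defined(__aarch64__)
#define ARCH_64_MODEL 1
#endif

typedef struct {
	union {
		uint64_t val[2];
#ifdef ARCH_64_MODEL
		__extension__ __int128 int128;
#endif
	};
} __attribute__((aligned(16))) int128_model_t;
```

Either way the type stays 16 bytes and 16-byte aligned, which the caspX/ldxp/stxp instructions require.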

Thanks,
Phil

