DPDK patches and discussions
 help / color / Atom feed
* Re: [dpdk-dev] [PATCH v9 1/3] eal/arm64: add 128-bit atomic compare exchange
@ 2019-08-14  8:45 Jerin Jacob Kollanukkaran
  2019-08-14 10:24 ` Phil Yang (Arm Technology China)
  0 siblings, 1 reply; 4+ messages in thread
From: Jerin Jacob Kollanukkaran @ 2019-08-14  8:45 UTC (permalink / raw)
  To: Phil Yang, thomas, gage.eads, dev
  Cc: hemant.agrawal, Honnappa.Nagarahalli, gavin.hu, nd

> -----Original Message-----
> From: Phil Yang <phil.yang@arm.com>
> Sent: Wednesday, August 14, 2019 1:58 PM
> To: thomas@monjalon.net; Jerin Jacob Kollanukkaran <jerinj@marvell.com>;
> gage.eads@intel.com; dev@dpdk.org
> Cc: hemant.agrawal@nxp.com; Honnappa.Nagarahalli@arm.com;
> gavin.hu@arm.com; nd@arm.com
> Subject: [EXT] [PATCH v9 1/3] eal/arm64: add 128-bit atomic compare
> exchange
> +#define __HAS_ACQ(mo) ((mo) != __ATOMIC_RELAXED && (mo) !=
> +__ATOMIC_RELEASE) #define __HAS_RLS(mo) ((mo) ==
> __ATOMIC_RELEASE || (mo) == __ATOMIC_ACQ_REL || \
> +					  (mo) == __ATOMIC_SEQ_CST)
> +
> +#define __MO_LOAD(mo)  (__HAS_ACQ((mo)) ? __ATOMIC_ACQUIRE :
> +__ATOMIC_RELAXED) #define __MO_STORE(mo) (__HAS_RLS((mo)) ?
> +__ATOMIC_RELEASE : __ATOMIC_RELAXED)
> +
> +#if defined(__ARM_FEATURE_ATOMICS) ||
> defined(RTE_ARM_FEATURE_ATOMICS)
> +#define __ATOMIC128_CAS_OP(cas_op_name, op_string)                          \
> +static __rte_noinline rte_int128_t                                          \


Could you check the cost of making it as __rte_noinline?
If it is costly, How about having two versions, one with __rte_noinline
to make compliance with arm64 procedure call standard for
old gcc and clang.
Other one without explicit register hardcoding + inline for latest
gcc


> +cas_op_name(rte_int128_t *dst, rte_int128_t old,                            \
> +		rte_int128_t updated)                                       \
> +{                                                                           \
> +	/* caspX instructions register pair must start from even-numbered
> +	 * register at operand 1.
> +	 * So, specify registers for local variables here.
> +	 */                                                                 \
> +	register uint64_t x0 __asm("x0") = (uint64_t)old.val[0];            \
> +	register uint64_t x1 __asm("x1") = (uint64_t)old.val[1];            \
> +	register uint64_t x2 __asm("x2") = (uint64_t)updated.val[0];        \
> +	register uint64_t x3 __asm("x3") = (uint64_t)updated.val[1];        \
> +	asm volatile(                                                       \
> +		op_string " %[old0], %[old1], %[upd0], %[upd1], [%[dst]]"   \
> +		: [old0] "+r" (x0),                                         \
> +		[old1] "+r" (x1)                                            \
> +		: [upd0] "r" (x2),                                          \
> +		[upd1] "r" (x3),                                            \
> +		[dst] "r" (dst)                                             \
> +		: "memory");                                                \
> +	old.val[0] = x0;                                                    \
> +	old.val[1] = x1;                                                    \
> +	return old;                                                         \
> +}
> +

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [dpdk-dev] [PATCH v9 1/3] eal/arm64: add 128-bit atomic compare exchange
  2019-08-14  8:45 [dpdk-dev] [PATCH v9 1/3] eal/arm64: add 128-bit atomic compare exchange Jerin Jacob Kollanukkaran
@ 2019-08-14 10:24 ` Phil Yang (Arm Technology China)
  2019-08-14 12:40   ` Jerin Jacob Kollanukkaran
  0 siblings, 1 reply; 4+ messages in thread
From: Phil Yang (Arm Technology China) @ 2019-08-14 10:24 UTC (permalink / raw)
  To: jerinj, thomas, gage.eads, dev
  Cc: hemant.agrawal, Honnappa Nagarahalli,
	Gavin Hu (Arm Technology China),
	nd, nd

> -----Original Message-----
> From: Jerin Jacob Kollanukkaran <jerinj@marvell.com>
> Sent: Wednesday, August 14, 2019 4:46 PM
> To: Phil Yang (Arm Technology China) <Phil.Yang@arm.com>;
> thomas@monjalon.net; gage.eads@intel.com; dev@dpdk.org
> Cc: hemant.agrawal@nxp.com; Honnappa Nagarahalli
> <Honnappa.Nagarahalli@arm.com>; Gavin Hu (Arm Technology China)
> <Gavin.Hu@arm.com>; nd <nd@arm.com>
> Subject: RE: [PATCH v9 1/3] eal/arm64: add 128-bit atomic compare exchange
> 
> > -----Original Message-----
> > From: Phil Yang <phil.yang@arm.com>
> > Sent: Wednesday, August 14, 2019 1:58 PM
> > To: thomas@monjalon.net; Jerin Jacob Kollanukkaran
> <jerinj@marvell.com>;
> > gage.eads@intel.com; dev@dpdk.org
> > Cc: hemant.agrawal@nxp.com; Honnappa.Nagarahalli@arm.com;
> > gavin.hu@arm.com; nd@arm.com
> > Subject: [EXT] [PATCH v9 1/3] eal/arm64: add 128-bit atomic compare
> > exchange
> > +#define __HAS_ACQ(mo) ((mo) != __ATOMIC_RELAXED && (mo) !=
> > +__ATOMIC_RELEASE) #define __HAS_RLS(mo) ((mo) ==
> > __ATOMIC_RELEASE || (mo) == __ATOMIC_ACQ_REL || \
> > +					  (mo) == __ATOMIC_SEQ_CST)
> > +
> > +#define __MO_LOAD(mo)  (__HAS_ACQ((mo)) ? __ATOMIC_ACQUIRE :
> > +__ATOMIC_RELAXED) #define __MO_STORE(mo) (__HAS_RLS((mo)) ?
> > +__ATOMIC_RELEASE : __ATOMIC_RELAXED)
> > +
> > +#if defined(__ARM_FEATURE_ATOMICS) ||
> > defined(RTE_ARM_FEATURE_ATOMICS)
> > +#define __ATOMIC128_CAS_OP(cas_op_name, op_string)                          \
> > +static __rte_noinline rte_int128_t                                          \
> 
> 
> Could you check the cost of making it as __rte_noinline?
> If it is costly, How about having two versions, one with __rte_noinline
> to make compliance with arm64 procedure call standard for
> old gcc and clang.
> Other one without explicit register hardcoding + inline for latest
> gcc

Hi Jerin,

According to the stack_lf_perf_autotest, making it as __rte_noinline has no overhead on ThunderX2 with GCC 8.3.
The 'Average cycles per object push/pop' numbers for __rte_noinline and __rte_always_inline versions are nearly the same.

Test results :
###### Two NUMA Node ######
#### __rte_noinline ####

RTE>>stack_lf_perf_autotest
<snip>
### Testing using two NUMA nodes ###
Average cycles per object push/pop (bulk size: 8): 24.10
Average cycles per object push/pop (bulk size: 32): 6.85

### Testing on all 18 lcores ###
Average cycles per object push/pop (bulk size: 8): 680.39
Average cycles per object push/pop (bulk size: 32): 146.38
Test OK

#### __rte_always-inline ####
RTE>>stack_lf_perf_autotest
<snip>
### Testing using two NUMA nodes ###
Average cycles per object push/pop (bulk size: 8): 24.29
Average cycles per object push/pop (bulk size: 32): 6.92

### Testing on all 18 lcores ###
Average cycles per object push/pop (bulk size: 8): 683.92
Average cycles per object push/pop (bulk size: 32): 145.11
Test OK

###### Single NUMA ######
#### __rte_always-inline ####

RTE>>stack_lf_perf_autotest
<snip>
### Testing on all 18 lcores ###
Average cycles per object push/pop (bulk size: 8): 582.92
Average cycles per object push/pop (bulk size: 32): 125.57
Test OK
#### __rte_noinline ####

RTE>>stack_lf_perf_autotest
<snip>
### Testing on all 18 lcores ###
Average cycles per object push/pop (bulk size: 8): 537.56
Average cycles per object push/pop (bulk size: 32): 122.98
Test OK

Thanks,
Phil Yang

> 
> 
> > +cas_op_name(rte_int128_t *dst, rte_int128_t old,                            \
> > +		rte_int128_t updated)                                       \
> > +{                                                                           \
> > +	/* caspX instructions register pair must start from even-numbered
> > +	 * register at operand 1.
> > +	 * So, specify registers for local variables here.
> > +	 */                                                                 \
> > +	register uint64_t x0 __asm("x0") = (uint64_t)old.val[0];            \
> > +	register uint64_t x1 __asm("x1") = (uint64_t)old.val[1];            \
> > +	register uint64_t x2 __asm("x2") = (uint64_t)updated.val[0];        \
> > +	register uint64_t x3 __asm("x3") = (uint64_t)updated.val[1];        \
> > +	asm volatile(                                                       \
> > +		op_string " %[old0], %[old1], %[upd0], %[upd1], [%[dst]]"   \
> > +		: [old0] "+r" (x0),                                         \
> > +		[old1] "+r" (x1)                                            \
> > +		: [upd0] "r" (x2),                                          \
> > +		[upd1] "r" (x3),                                            \
> > +		[dst] "r" (dst)                                             \
> > +		: "memory");                                                \
> > +	old.val[0] = x0;                                                    \
> > +	old.val[1] = x1;                                                    \
> > +	return old;                                                         \
> > +}
> > +

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [dpdk-dev] [PATCH v9 1/3] eal/arm64: add 128-bit atomic compare exchange
  2019-08-14 10:24 ` Phil Yang (Arm Technology China)
@ 2019-08-14 12:40   ` Jerin Jacob Kollanukkaran
  0 siblings, 0 replies; 4+ messages in thread
From: Jerin Jacob Kollanukkaran @ 2019-08-14 12:40 UTC (permalink / raw)
  To: Phil Yang (Arm Technology China), thomas, gage.eads, dev
  Cc: hemant.agrawal, Honnappa Nagarahalli,
	Gavin Hu (Arm Technology China),
	nd, nd

> -----Original Message-----
> From: Phil Yang (Arm Technology China) <Phil.Yang@arm.com>
> Sent: Wednesday, August 14, 2019 3:55 PM
> To: Jerin Jacob Kollanukkaran <jerinj@marvell.com>; thomas@monjalon.net;
> gage.eads@intel.com; dev@dpdk.org
> Cc: hemant.agrawal@nxp.com; Honnappa Nagarahalli
> <Honnappa.Nagarahalli@arm.com>; Gavin Hu (Arm Technology China)
> <Gavin.Hu@arm.com>; nd <nd@arm.com>; nd <nd@arm.com>
> Subject: [EXT] RE: [PATCH v9 1/3] eal/arm64: add 128-bit atomic compare
> exchange
> 
> External Email
> 
> ----------------------------------------------------------------------
> > -----Original Message-----
> > From: Jerin Jacob Kollanukkaran <jerinj@marvell.com>
> > Sent: Wednesday, August 14, 2019 4:46 PM
> > To: Phil Yang (Arm Technology China) <Phil.Yang@arm.com>;
> > thomas@monjalon.net; gage.eads@intel.com; dev@dpdk.org
> > Cc: hemant.agrawal@nxp.com; Honnappa Nagarahalli
> > <Honnappa.Nagarahalli@arm.com>; Gavin Hu (Arm Technology China)
> > <Gavin.Hu@arm.com>; nd <nd@arm.com>
> > Subject: RE: [PATCH v9 1/3] eal/arm64: add 128-bit atomic compare
> > exchange
> >
> > > -----Original Message-----
> > > From: Phil Yang <phil.yang@arm.com>
> > > Sent: Wednesday, August 14, 2019 1:58 PM
> > > To: thomas@monjalon.net; Jerin Jacob Kollanukkaran
> > <jerinj@marvell.com>;
> > > gage.eads@intel.com; dev@dpdk.org
> > > Cc: hemant.agrawal@nxp.com; Honnappa.Nagarahalli@arm.com;
> > > gavin.hu@arm.com; nd@arm.com
> > > Subject: [EXT] [PATCH v9 1/3] eal/arm64: add 128-bit atomic compare
> > > exchange
> > > +#define __HAS_ACQ(mo) ((mo) != __ATOMIC_RELAXED && (mo) !=
> > > +__ATOMIC_RELEASE) #define __HAS_RLS(mo) ((mo) ==
> > > __ATOMIC_RELEASE || (mo) == __ATOMIC_ACQ_REL || \
> > > +					  (mo) == __ATOMIC_SEQ_CST)
> > > +
> > > +#define __MO_LOAD(mo)  (__HAS_ACQ((mo)) ? __ATOMIC_ACQUIRE :
> > > +__ATOMIC_RELAXED) #define __MO_STORE(mo) (__HAS_RLS((mo)) ?
> > > +__ATOMIC_RELEASE : __ATOMIC_RELAXED)
> > > +
> > > +#if defined(__ARM_FEATURE_ATOMICS) ||
> > > defined(RTE_ARM_FEATURE_ATOMICS)
> > > +#define __ATOMIC128_CAS_OP(cas_op_name, op_string)
> \
> > > +static __rte_noinline rte_int128_t                                          \
> >
> >
> > Could you check the cost of making it as __rte_noinline?
> > If it is costly, How about having two versions, one with
> > __rte_noinline to make compliance with arm64 procedure call standard
> > for old gcc and clang.
> > Other one without explicit register hardcoding + inline for latest gcc
> 
> Hi Jerin,

Hi Phil Yang,

> According to the stack_lf_perf_autotest, making it as __rte_noinline has no
> overhead on ThunderX2 with GCC 8.3.
> The 'Average cycles per object push/pop' numbers for __rte_noinline and
> __rte_always_inline versions are nearly the same.

I tested with octeontx2 as well. It is yielding similar result. 
No change is expected in this patch then.


^ permalink raw reply	[flat|nested] 4+ messages in thread

* [dpdk-dev] [PATCH v9 1/3] eal/arm64: add 128-bit atomic compare exchange
  2019-07-23  7:05 [dpdk-dev] [PATCH v8 " jerinj
@ 2019-08-14  8:27 ` " Phil Yang
  0 siblings, 0 replies; 4+ messages in thread
From: Phil Yang @ 2019-08-14  8:27 UTC (permalink / raw)
  To: thomas, jerinj, gage.eads, dev
  Cc: hemant.agrawal, Honnappa.Nagarahalli, gavin.hu, nd

Add 128-bit atomic compare exchange on aarch64.

Suggested-by: Jerin Jacob <jerinj@marvell.com>
Signed-off-by: Phil Yang <phil.yang@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Tested-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Acked-by: Jerin Jacob <jerinj@marvell.com>
---

v9:
Updated 19.11 release note.

v8:
Fixed "WARNING:LONG_LINE: line over 80 characters" warnings with latest kernel
checkpatch.pl

v7:
1. Adjust code comment.

v6:
1. Put the RTE_ARM_FEATURE_ATOMICS flag into EAL group. (Jerin Jocob)
2. Keep rte_stack_lf_stubs.h doing nothing. (Gage Eads)
3. Fixed 32 bit build issue.

v5:
1. Enable RTE_ARM_FEATURE_ATOMICS on octeontx2 in default. (Jerin Jocob)
2. Record the reason of introducing "rte_stack_lf_stubs.h" in git
commit.
(Jerin, Jocob)
3. Fixed a conditional MACRO error in rte_atomic128_cmp_exchange. (Jerin
Jocob)

v4:
1. Add RTE_ARM_FEATURE_ATOMICS flag to support LSE CASP instructions.
(Jerin Jocob)
2. Fix possible arm64 ABI break by making casp_op_name noinline. (Jerin
Jocob)
3. Add rte_stack_lf_stubs.h to reduce the ifdef clutter. (Gage
Eads/Jerin Jocob)

v3:
1. Avoid duplication code with macro. (Jerin Jocob)
2. Make invalid memory order to strongest barrier. (Jerin Jocob)
3. Update doc/guides/prog_guide/env_abstraction_layer.rst. (Gage Eads)
4. Fix 32-bit x86 builds issue. (Gage Eads)
5. Correct documentation issues in UT. (Gage Eads)

v2:
Initial version.

 config/arm/meson.build                             |   2 +
 config/common_base                                 |   3 +
 config/defconfig_arm64-octeontx2-linuxapp-gcc      |   1 +
 config/defconfig_arm64-thunderx2-linuxapp-gcc      |   1 +
 .../common/include/arch/arm/rte_atomic_64.h        | 163 +++++++++++++++++++++
 .../common/include/arch/x86/rte_atomic_64.h        |  12 --
 lib/librte_eal/common/include/generic/rte_atomic.h |  17 ++-
 7 files changed, 186 insertions(+), 13 deletions(-)

diff --git a/config/arm/meson.build b/config/arm/meson.build
index 979018e..9f28271 100644
--- a/config/arm/meson.build
+++ b/config/arm/meson.build
@@ -71,11 +71,13 @@ flags_thunderx2_extra = [
 	['RTE_CACHE_LINE_SIZE', 64],
 	['RTE_MAX_NUMA_NODES', 2],
 	['RTE_MAX_LCORE', 256],
+	['RTE_ARM_FEATURE_ATOMICS', true],
 	['RTE_USE_C11_MEM_MODEL', true]]
 flags_octeontx2_extra = [
 	['RTE_MACHINE', '"octeontx2"'],
 	['RTE_MAX_NUMA_NODES', 1],
 	['RTE_MAX_LCORE', 24],
+	['RTE_ARM_FEATURE_ATOMICS', true],
 	['RTE_EAL_IGB_UIO', false],
 	['RTE_USE_C11_MEM_MODEL', true]]
 
diff --git a/config/common_base b/config/common_base
index 8ef75c2..2054480 100644
--- a/config/common_base
+++ b/config/common_base
@@ -82,6 +82,9 @@ CONFIG_RTE_MAX_LCORE=128
 CONFIG_RTE_MAX_NUMA_NODES=8
 CONFIG_RTE_MAX_HEAPS=32
 CONFIG_RTE_MAX_MEMSEG_LISTS=64
+
+# Use ARM LSE ATOMIC instructions
+CONFIG_RTE_ARM_FEATURE_ATOMICS=n
 # each memseg list will be limited to either RTE_MAX_MEMSEG_PER_LIST pages
 # or RTE_MAX_MEM_MB_PER_LIST megabytes worth of memory, whichever is smaller
 CONFIG_RTE_MAX_MEMSEG_PER_LIST=8192
diff --git a/config/defconfig_arm64-octeontx2-linuxapp-gcc b/config/defconfig_arm64-octeontx2-linuxapp-gcc
index f20da24..7687dbe 100644
--- a/config/defconfig_arm64-octeontx2-linuxapp-gcc
+++ b/config/defconfig_arm64-octeontx2-linuxapp-gcc
@@ -9,6 +9,7 @@ CONFIG_RTE_MACHINE="octeontx2"
 CONFIG_RTE_CACHE_LINE_SIZE=128
 CONFIG_RTE_MAX_NUMA_NODES=1
 CONFIG_RTE_MAX_LCORE=24
+CONFIG_RTE_ARM_FEATURE_ATOMICS=y
 
 # Doesn't support NUMA
 CONFIG_RTE_EAL_NUMA_AWARE_HUGEPAGES=n
diff --git a/config/defconfig_arm64-thunderx2-linuxapp-gcc b/config/defconfig_arm64-thunderx2-linuxapp-gcc
index cc5c64b..af4a89c 100644
--- a/config/defconfig_arm64-thunderx2-linuxapp-gcc
+++ b/config/defconfig_arm64-thunderx2-linuxapp-gcc
@@ -9,3 +9,4 @@ CONFIG_RTE_MACHINE="thunderx2"
 CONFIG_RTE_CACHE_LINE_SIZE=64
 CONFIG_RTE_MAX_NUMA_NODES=2
 CONFIG_RTE_MAX_LCORE=256
+CONFIG_RTE_ARM_FEATURE_ATOMICS=y
diff --git a/lib/librte_eal/common/include/arch/arm/rte_atomic_64.h b/lib/librte_eal/common/include/arch/arm/rte_atomic_64.h
index 97060e4..14d869b 100644
--- a/lib/librte_eal/common/include/arch/arm/rte_atomic_64.h
+++ b/lib/librte_eal/common/include/arch/arm/rte_atomic_64.h
@@ -1,5 +1,6 @@
 /* SPDX-License-Identifier: BSD-3-Clause
  * Copyright(c) 2015 Cavium, Inc
+ * Copyright(c) 2019 Arm Limited
  */
 
 #ifndef _RTE_ATOMIC_ARM64_H_
@@ -14,6 +15,9 @@ extern "C" {
 #endif
 
 #include "generic/rte_atomic.h"
+#include <rte_branch_prediction.h>
+#include <rte_compat.h>
+#include <rte_debug.h>
 
 #define dsb(opt) asm volatile("dsb " #opt : : : "memory")
 #define dmb(opt) asm volatile("dmb " #opt : : : "memory")
@@ -40,6 +44,165 @@ extern "C" {
 
 #define rte_cio_rmb() dmb(oshld)
 
+/*------------------------ 128 bit atomic operations -------------------------*/
+
+#define __HAS_ACQ(mo) ((mo) != __ATOMIC_RELAXED && (mo) != __ATOMIC_RELEASE)
+#define __HAS_RLS(mo) ((mo) == __ATOMIC_RELEASE || (mo) == __ATOMIC_ACQ_REL || \
+					  (mo) == __ATOMIC_SEQ_CST)
+
+#define __MO_LOAD(mo)  (__HAS_ACQ((mo)) ? __ATOMIC_ACQUIRE : __ATOMIC_RELAXED)
+#define __MO_STORE(mo) (__HAS_RLS((mo)) ? __ATOMIC_RELEASE : __ATOMIC_RELAXED)
+
+#if defined(__ARM_FEATURE_ATOMICS) || defined(RTE_ARM_FEATURE_ATOMICS)
+#define __ATOMIC128_CAS_OP(cas_op_name, op_string)                          \
+static __rte_noinline rte_int128_t                                          \
+cas_op_name(rte_int128_t *dst, rte_int128_t old,                            \
+		rte_int128_t updated)                                       \
+{                                                                           \
+	/* caspX instructions register pair must start from even-numbered
+	 * register at operand 1.
+	 * So, specify registers for local variables here.
+	 */                                                                 \
+	register uint64_t x0 __asm("x0") = (uint64_t)old.val[0];            \
+	register uint64_t x1 __asm("x1") = (uint64_t)old.val[1];            \
+	register uint64_t x2 __asm("x2") = (uint64_t)updated.val[0];        \
+	register uint64_t x3 __asm("x3") = (uint64_t)updated.val[1];        \
+	asm volatile(                                                       \
+		op_string " %[old0], %[old1], %[upd0], %[upd1], [%[dst]]"   \
+		: [old0] "+r" (x0),                                         \
+		[old1] "+r" (x1)                                            \
+		: [upd0] "r" (x2),                                          \
+		[upd1] "r" (x3),                                            \
+		[dst] "r" (dst)                                             \
+		: "memory");                                                \
+	old.val[0] = x0;                                                    \
+	old.val[1] = x1;                                                    \
+	return old;                                                         \
+}
+
+__ATOMIC128_CAS_OP(__rte_cas_relaxed, "casp")
+__ATOMIC128_CAS_OP(__rte_cas_acquire, "caspa")
+__ATOMIC128_CAS_OP(__rte_cas_release, "caspl")
+__ATOMIC128_CAS_OP(__rte_cas_acq_rel, "caspal")
+#else
+#define __ATOMIC128_LDX_OP(ldx_op_name, op_string)                          \
+static inline rte_int128_t                                                  \
+ldx_op_name(const rte_int128_t *src)                                        \
+{                                                                           \
+	rte_int128_t ret;                                                   \
+	asm volatile(                                                       \
+			op_string " %0, %1, %2"                             \
+			: "=&r" (ret.val[0]),                               \
+			  "=&r" (ret.val[1])                                \
+			: "Q" (src->val[0])                                 \
+			: "memory");                                        \
+	return ret;                                                         \
+}
+
+__ATOMIC128_LDX_OP(__rte_ldx_relaxed, "ldxp")
+__ATOMIC128_LDX_OP(__rte_ldx_acquire, "ldaxp")
+
+#define __ATOMIC128_STX_OP(stx_op_name, op_string)                          \
+static inline uint32_t                                                      \
+stx_op_name(rte_int128_t *dst, const rte_int128_t src)                      \
+{                                                                           \
+	uint32_t ret;                                                       \
+	asm volatile(                                                       \
+			op_string " %w0, %1, %2, %3"                        \
+			: "=&r" (ret)                                       \
+			: "r" (src.val[0]),                                 \
+			  "r" (src.val[1]),                                 \
+			  "Q" (dst->val[0])                                 \
+			: "memory");                                        \
+	/* Return 0 on success, 1 on failure */                             \
+	return ret;                                                         \
+}
+
+__ATOMIC128_STX_OP(__rte_stx_relaxed, "stxp")
+__ATOMIC128_STX_OP(__rte_stx_release, "stlxp")
+#endif
+
+static inline int __rte_experimental
+rte_atomic128_cmp_exchange(rte_int128_t *dst,
+				rte_int128_t *exp,
+				const rte_int128_t *src,
+				unsigned int weak,
+				int success,
+				int failure)
+{
+	/* Always do strong CAS */
+	RTE_SET_USED(weak);
+	/* Ignore memory ordering for failure, memory order for
+	 * success must be stronger or equal
+	 */
+	RTE_SET_USED(failure);
+	/* Find invalid memory order */
+	RTE_ASSERT(success == __ATOMIC_RELAXED
+			|| success == __ATOMIC_ACQUIRE
+			|| success == __ATOMIC_RELEASE
+			|| success == __ATOMIC_ACQ_REL
+			|| success == __ATOMIC_SEQ_CST);
+
+#if defined(__ARM_FEATURE_ATOMICS) || defined(RTE_ARM_FEATURE_ATOMICS)
+	rte_int128_t expected = *exp;
+	rte_int128_t desired = *src;
+	rte_int128_t old;
+
+	if (success == __ATOMIC_RELAXED)
+		old = __rte_cas_relaxed(dst, expected, desired);
+	else if (success == __ATOMIC_ACQUIRE)
+		old = __rte_cas_acquire(dst, expected, desired);
+	else if (success == __ATOMIC_RELEASE)
+		old = __rte_cas_release(dst, expected, desired);
+	else
+		old = __rte_cas_acq_rel(dst, expected, desired);
+#else
+	int ldx_mo = __MO_LOAD(success);
+	int stx_mo = __MO_STORE(success);
+	uint32_t ret = 1;
+	register rte_int128_t expected = *exp;
+	register rte_int128_t desired = *src;
+	register rte_int128_t old;
+
+	/* ldx128 can not guarantee atomic,
+	 * Must write back src or old to verify atomicity of ldx128;
+	 */
+	do {
+		if (ldx_mo == __ATOMIC_RELAXED)
+			old = __rte_ldx_relaxed(dst);
+		else
+			old = __rte_ldx_acquire(dst);
+
+		if (likely(old.int128 == expected.int128)) {
+			if (stx_mo == __ATOMIC_RELAXED)
+				ret = __rte_stx_relaxed(dst, desired);
+			else
+				ret = __rte_stx_release(dst, desired);
+		} else {
+			/* In the failure case (since 'weak' is ignored and only
+			 * weak == 0 is implemented), expected should contain
+			 * the atomically read value of dst. This means, 'old'
+			 * needs to be stored back to ensure it was read
+			 * atomically.
+			 */
+			if (stx_mo == __ATOMIC_RELAXED)
+				ret = __rte_stx_relaxed(dst, old);
+			else
+				ret = __rte_stx_release(dst, old);
+		}
+	} while (unlikely(ret));
+#endif
+
+	/* Unconditionally updating expected removes
+	 * an 'if' statement.
+	 * expected should already be in register if
+	 * not in the cache.
+	 */
+	*exp = old;
+
+	return (old.int128 == expected.int128);
+}
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/librte_eal/common/include/arch/x86/rte_atomic_64.h b/lib/librte_eal/common/include/arch/x86/rte_atomic_64.h
index 1335d92..cfe7067 100644
--- a/lib/librte_eal/common/include/arch/x86/rte_atomic_64.h
+++ b/lib/librte_eal/common/include/arch/x86/rte_atomic_64.h
@@ -183,18 +183,6 @@ static inline void rte_atomic64_clear(rte_atomic64_t *v)
 
 /*------------------------ 128 bit atomic operations -------------------------*/
 
-/**
- * 128-bit integer structure.
- */
-RTE_STD_C11
-typedef struct {
-	RTE_STD_C11
-	union {
-		uint64_t val[2];
-		__extension__ __int128 int128;
-	};
-} __rte_aligned(16) rte_int128_t;
-
 __rte_experimental
 static inline int
 rte_atomic128_cmp_exchange(rte_int128_t *dst,
diff --git a/lib/librte_eal/common/include/generic/rte_atomic.h b/lib/librte_eal/common/include/generic/rte_atomic.h
index 24ff7dc..e6ab15a 100644
--- a/lib/librte_eal/common/include/generic/rte_atomic.h
+++ b/lib/librte_eal/common/include/generic/rte_atomic.h
@@ -1081,6 +1081,20 @@ static inline void rte_atomic64_clear(rte_atomic64_t *v)
 
 /*------------------------ 128 bit atomic operations -------------------------*/
 
+/**
+ * 128-bit integer structure.
+ */
+RTE_STD_C11
+typedef struct {
+	RTE_STD_C11
+	union {
+		uint64_t val[2];
+#ifdef RTE_ARCH_64
+		__extension__ __int128 int128;
+#endif
+	};
+} __rte_aligned(16) rte_int128_t;
+
 #ifdef __DOXYGEN__
 
 /**
@@ -1093,7 +1107,8 @@ static inline void rte_atomic64_clear(rte_atomic64_t *v)
  *     *exp = *dst
  * @endcode
  *
- * @note This function is currently only available for the x86-64 platform.
+ * @note This function is currently available for the x86-64 and aarch64
+ * platforms.
  *
  * @note The success and failure arguments must be one of the __ATOMIC_* values
  * defined in the C++11 standard. For details on their behavior, refer to the
-- 
2.7.4


^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, back to index

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-08-14  8:45 [dpdk-dev] [PATCH v9 1/3] eal/arm64: add 128-bit atomic compare exchange Jerin Jacob Kollanukkaran
2019-08-14 10:24 ` Phil Yang (Arm Technology China)
2019-08-14 12:40   ` Jerin Jacob Kollanukkaran
  -- strict thread matches above, loose matches on Subject: below --
2019-07-23  7:05 [dpdk-dev] [PATCH v8 " jerinj
2019-08-14  8:27 ` [dpdk-dev] [PATCH v9 " Phil Yang

DPDK patches and discussions

Archives are clonable:
	git clone --mirror http://inbox.dpdk.org/dev/0 dev/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 dev dev/ http://inbox.dpdk.org/dev \
		dev@dpdk.org
	public-inbox-index dev


Newsgroup available over NNTP:
	nntp://inbox.dpdk.org/inbox.dpdk.dev


AGPL code for this site: git clone https://public-inbox.org/ public-inbox