From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from dpdk.org (dpdk.org [92.243.14.124]) by inbox.dpdk.org (Postfix) with ESMTP id 14F95A046B for ; Tue, 23 Jul 2019 09:06:06 +0200 (CEST) Received: from [92.243.14.124] (localhost [127.0.0.1]) by dpdk.org (Postfix) with ESMTP id 036691BF97; Tue, 23 Jul 2019 09:06:05 +0200 (CEST) Received: from mx0b-0016f401.pphosted.com (mx0a-0016f401.pphosted.com [67.231.148.174]) by dpdk.org (Postfix) with ESMTP id 93F901BF6B for ; Tue, 23 Jul 2019 09:06:03 +0200 (CEST) Received: from pps.filterd (m0045849.ppops.net [127.0.0.1]) by mx0a-0016f401.pphosted.com (8.16.0.42/8.16.0.42) with SMTP id x6N752sE013428; Tue, 23 Jul 2019 00:05:59 -0700 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=marvell.com; h=from : to : cc : subject : date : message-id : in-reply-to : references : mime-version : content-transfer-encoding : content-type; s=pfpt0818; bh=rjT9IRqtmgeZN/X9RxRmFj0E8lGhRe7rcEmkDehjnNc=; b=Y3DJBQA9VJY44gOfPPzs/b/dWW/X7lBDXNEMoxgNu50Z030HB1h+lG381j3qJr5o+iPK Op27ZJoXU7roY0nDlo2hrUOrD/Zi/MB+/i0iki2V0WK0H2eosCnxUQu/Zvan/JYFRuKY fkS2cD57umz2ngNHFFfuBuhpKStu1kuT5P9eyNkkKRoahA47lI0hnS6KTLZ+gUhyMQMB ATX9Hm3zk4m5BVmctFyQRnoWz6kQWjzGfvhP60S+3L4g75X+pVMVBaAI35YQbjc4vT2v lnnHmgKLA+x9FuBeaYno6CxORuzO4bkE5pRIYSXJc2je+UFLj6WgWYRrkdG2D9Qw+x/p nw== Received: from sc-exch02.marvell.com ([199.233.58.182]) by mx0a-0016f401.pphosted.com with ESMTP id 2tv0ap2kut-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-SHA384 bits=256 verify=NOT); Tue, 23 Jul 2019 00:05:59 -0700 Received: from SC-EXCH01.marvell.com (10.93.176.81) by SC-EXCH02.marvell.com (10.93.176.82) with Microsoft SMTP Server (TLS) id 15.0.1367.3; Tue, 23 Jul 2019 00:05:58 -0700 Received: from maili.marvell.com (10.93.176.43) by SC-EXCH01.marvell.com (10.93.176.81) with Microsoft SMTP Server id 15.0.1367.3 via Frontend Transport; Tue, 23 Jul 2019 00:05:58 -0700 Received: from jerin-lab.marvell.com (jerin-lab.marvell.com [10.28.34.14]) by maili.marvell.com (Postfix) with ESMTP id B24663F7040; Tue, 23 Jul 2019 00:05:55 -0700 (PDT) From: To: , Thomas Monjalon , Jerin Jacob , Gavin Hu , Jan Viktorin , Bruce Richardson , Konstantin Ananyev CC: Phil Yang , Honnappa Nagarahalli Date: Tue, 23 Jul 2019 12:35:34 +0530 Message-ID: <20190723070536.30342-1-jerinj@marvell.com> X-Mailer: git-send-email 2.22.0 In-Reply-To: <1563861443-14491-1-git-send-email-phil.yang@arm.com> References: <1563861443-14491-1-git-send-email-phil.yang@arm.com> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Content-Type: text/plain X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10434:5.22.84,1.0.8 definitions=2019-07-23_03:2019-07-23,2019-07-23 signatures=0 Subject: [dpdk-dev] [PATCH v8 1/3] eal/arm64: add 128-bit atomic compare exchange X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" From: Phil Yang Add 128-bit atomic compare exchange on aarch64. Suggested-by: Jerin Jacob Signed-off-by: Phil Yang Reviewed-by: Honnappa Nagarahalli Tested-by: Honnappa Nagarahalli Acked-by: Jerin Jacob --- v8: Fixed "WARNING:LONG_LINE: line over 80 characters" warnings with latest kernel checkpatch.pl v7: 1. Adjust code comment. v6: 1. Put the RTE_ARM_FEATURE_ATOMICS flag into EAL group. (Jerin Jocob) 2. Keep rte_stack_lf_stubs.h doing nothing. (Gage Eads) 3. Fixed 32 bit build issue. v5: 1. Enable RTE_ARM_FEATURE_ATOMICS on octeontx2 in default. (Jerin Jocob) 2. Record the reason of introducing "rte_stack_lf_stubs.h" in git commit. (Jerin, Jocob) 3. Fixed a conditional MACRO error in rte_atomic128_cmp_exchange. (Jerin Jocob) v4: 1. Add RTE_ARM_FEATURE_ATOMICS flag to support LSE CASP instructions. (Jerin Jocob) 2. Fix possible arm64 ABI break by making casp_op_name noinline. (Jerin Jocob) 3. Add rte_stack_lf_stubs.h to reduce the ifdef clutter. (Gage Eads/Jerin Jocob) v3: 1. Avoid duplication code with macro. (Jerin Jocob) 2. Make invalid memory order to strongest barrier. (Jerin Jocob) 3. Update doc/guides/prog_guide/env_abstraction_layer.rst. (Gage Eads) 4. Fix 32-bit x86 builds issue. (Gage Eads) 5. Correct documentation issues in UT. (Gage Eads) v2: Initial version. config/arm/meson.build | 2 + config/common_base | 3 + config/defconfig_arm64-octeontx2-linuxapp-gcc | 1 + config/defconfig_arm64-thunderx2-linuxapp-gcc | 1 + .../common/include/arch/arm/rte_atomic_64.h | 163 ++++++++++++++++++ .../common/include/arch/x86/rte_atomic_64.h | 12 -- .../common/include/generic/rte_atomic.h | 17 +- 7 files changed, 186 insertions(+), 13 deletions(-) diff --git a/config/arm/meson.build b/config/arm/meson.build index 979018e16..9f2827140 100644 --- a/config/arm/meson.build +++ b/config/arm/meson.build @@ -71,11 +71,13 @@ flags_thunderx2_extra = [ ['RTE_CACHE_LINE_SIZE', 64], ['RTE_MAX_NUMA_NODES', 2], ['RTE_MAX_LCORE', 256], + ['RTE_ARM_FEATURE_ATOMICS', true], ['RTE_USE_C11_MEM_MODEL', true]] flags_octeontx2_extra = [ ['RTE_MACHINE', '"octeontx2"'], ['RTE_MAX_NUMA_NODES', 1], ['RTE_MAX_LCORE', 24], + ['RTE_ARM_FEATURE_ATOMICS', true], ['RTE_EAL_IGB_UIO', false], ['RTE_USE_C11_MEM_MODEL', true]] diff --git a/config/common_base b/config/common_base index 8ef75c203..205448013 100644 --- a/config/common_base +++ b/config/common_base @@ -82,6 +82,9 @@ CONFIG_RTE_MAX_LCORE=128 CONFIG_RTE_MAX_NUMA_NODES=8 CONFIG_RTE_MAX_HEAPS=32 CONFIG_RTE_MAX_MEMSEG_LISTS=64 + +# Use ARM LSE ATOMIC instructions +CONFIG_RTE_ARM_FEATURE_ATOMICS=n # each memseg list will be limited to either RTE_MAX_MEMSEG_PER_LIST pages # or RTE_MAX_MEM_MB_PER_LIST megabytes worth of memory, whichever is smaller CONFIG_RTE_MAX_MEMSEG_PER_LIST=8192 diff --git a/config/defconfig_arm64-octeontx2-linuxapp-gcc b/config/defconfig_arm64-octeontx2-linuxapp-gcc index f20da2442..7687dbec8 100644 --- a/config/defconfig_arm64-octeontx2-linuxapp-gcc +++ b/config/defconfig_arm64-octeontx2-linuxapp-gcc @@ -9,6 +9,7 @@ CONFIG_RTE_MACHINE="octeontx2" CONFIG_RTE_CACHE_LINE_SIZE=128 CONFIG_RTE_MAX_NUMA_NODES=1 CONFIG_RTE_MAX_LCORE=24 +CONFIG_RTE_ARM_FEATURE_ATOMICS=y # Doesn't support NUMA CONFIG_RTE_EAL_NUMA_AWARE_HUGEPAGES=n diff --git a/config/defconfig_arm64-thunderx2-linuxapp-gcc b/config/defconfig_arm64-thunderx2-linuxapp-gcc index cc5c64ba0..af4a89c48 100644 --- a/config/defconfig_arm64-thunderx2-linuxapp-gcc +++ b/config/defconfig_arm64-thunderx2-linuxapp-gcc @@ -9,3 +9,4 @@ CONFIG_RTE_MACHINE="thunderx2" CONFIG_RTE_CACHE_LINE_SIZE=64 CONFIG_RTE_MAX_NUMA_NODES=2 CONFIG_RTE_MAX_LCORE=256 +CONFIG_RTE_ARM_FEATURE_ATOMICS=y diff --git a/lib/librte_eal/common/include/arch/arm/rte_atomic_64.h b/lib/librte_eal/common/include/arch/arm/rte_atomic_64.h index 97060e444..14d869bc9 100644 --- a/lib/librte_eal/common/include/arch/arm/rte_atomic_64.h +++ b/lib/librte_eal/common/include/arch/arm/rte_atomic_64.h @@ -1,5 +1,6 @@ /* SPDX-License-Identifier: BSD-3-Clause * Copyright(c) 2015 Cavium, Inc + * Copyright(c) 2019 Arm Limited */ #ifndef _RTE_ATOMIC_ARM64_H_ @@ -14,6 +15,9 @@ extern "C" { #endif #include "generic/rte_atomic.h" +#include +#include +#include #define dsb(opt) asm volatile("dsb " #opt : : : "memory") #define dmb(opt) asm volatile("dmb " #opt : : : "memory") @@ -40,6 +44,165 @@ extern "C" { #define rte_cio_rmb() dmb(oshld) +/*------------------------ 128 bit atomic operations -------------------------*/ + +#define __HAS_ACQ(mo) ((mo) != __ATOMIC_RELAXED && (mo) != __ATOMIC_RELEASE) +#define __HAS_RLS(mo) ((mo) == __ATOMIC_RELEASE || (mo) == __ATOMIC_ACQ_REL || \ + (mo) == __ATOMIC_SEQ_CST) + +#define __MO_LOAD(mo) (__HAS_ACQ((mo)) ? __ATOMIC_ACQUIRE : __ATOMIC_RELAXED) +#define __MO_STORE(mo) (__HAS_RLS((mo)) ? __ATOMIC_RELEASE : __ATOMIC_RELAXED) + +#if defined(__ARM_FEATURE_ATOMICS) || defined(RTE_ARM_FEATURE_ATOMICS) +#define __ATOMIC128_CAS_OP(cas_op_name, op_string) \ +static __rte_noinline rte_int128_t \ +cas_op_name(rte_int128_t *dst, rte_int128_t old, \ + rte_int128_t updated) \ +{ \ + /* caspX instructions register pair must start from even-numbered + * register at operand 1. + * So, specify registers for local variables here. + */ \ + register uint64_t x0 __asm("x0") = (uint64_t)old.val[0]; \ + register uint64_t x1 __asm("x1") = (uint64_t)old.val[1]; \ + register uint64_t x2 __asm("x2") = (uint64_t)updated.val[0]; \ + register uint64_t x3 __asm("x3") = (uint64_t)updated.val[1]; \ + asm volatile( \ + op_string " %[old0], %[old1], %[upd0], %[upd1], [%[dst]]" \ + : [old0] "+r" (x0), \ + [old1] "+r" (x1) \ + : [upd0] "r" (x2), \ + [upd1] "r" (x3), \ + [dst] "r" (dst) \ + : "memory"); \ + old.val[0] = x0; \ + old.val[1] = x1; \ + return old; \ +} + +__ATOMIC128_CAS_OP(__rte_cas_relaxed, "casp") +__ATOMIC128_CAS_OP(__rte_cas_acquire, "caspa") +__ATOMIC128_CAS_OP(__rte_cas_release, "caspl") +__ATOMIC128_CAS_OP(__rte_cas_acq_rel, "caspal") +#else +#define __ATOMIC128_LDX_OP(ldx_op_name, op_string) \ +static inline rte_int128_t \ +ldx_op_name(const rte_int128_t *src) \ +{ \ + rte_int128_t ret; \ + asm volatile( \ + op_string " %0, %1, %2" \ + : "=&r" (ret.val[0]), \ + "=&r" (ret.val[1]) \ + : "Q" (src->val[0]) \ + : "memory"); \ + return ret; \ +} + +__ATOMIC128_LDX_OP(__rte_ldx_relaxed, "ldxp") +__ATOMIC128_LDX_OP(__rte_ldx_acquire, "ldaxp") + +#define __ATOMIC128_STX_OP(stx_op_name, op_string) \ +static inline uint32_t \ +stx_op_name(rte_int128_t *dst, const rte_int128_t src) \ +{ \ + uint32_t ret; \ + asm volatile( \ + op_string " %w0, %1, %2, %3" \ + : "=&r" (ret) \ + : "r" (src.val[0]), \ + "r" (src.val[1]), \ + "Q" (dst->val[0]) \ + : "memory"); \ + /* Return 0 on success, 1 on failure */ \ + return ret; \ +} + +__ATOMIC128_STX_OP(__rte_stx_relaxed, "stxp") +__ATOMIC128_STX_OP(__rte_stx_release, "stlxp") +#endif + +static inline int __rte_experimental +rte_atomic128_cmp_exchange(rte_int128_t *dst, + rte_int128_t *exp, + const rte_int128_t *src, + unsigned int weak, + int success, + int failure) +{ + /* Always do strong CAS */ + RTE_SET_USED(weak); + /* Ignore memory ordering for failure, memory order for + * success must be stronger or equal + */ + RTE_SET_USED(failure); + /* Find invalid memory order */ + RTE_ASSERT(success == __ATOMIC_RELAXED + || success == __ATOMIC_ACQUIRE + || success == __ATOMIC_RELEASE + || success == __ATOMIC_ACQ_REL + || success == __ATOMIC_SEQ_CST); + +#if defined(__ARM_FEATURE_ATOMICS) || defined(RTE_ARM_FEATURE_ATOMICS) + rte_int128_t expected = *exp; + rte_int128_t desired = *src; + rte_int128_t old; + + if (success == __ATOMIC_RELAXED) + old = __rte_cas_relaxed(dst, expected, desired); + else if (success == __ATOMIC_ACQUIRE) + old = __rte_cas_acquire(dst, expected, desired); + else if (success == __ATOMIC_RELEASE) + old = __rte_cas_release(dst, expected, desired); + else + old = __rte_cas_acq_rel(dst, expected, desired); +#else + int ldx_mo = __MO_LOAD(success); + int stx_mo = __MO_STORE(success); + uint32_t ret = 1; + register rte_int128_t expected = *exp; + register rte_int128_t desired = *src; + register rte_int128_t old; + + /* ldx128 can not guarantee atomic, + * Must write back src or old to verify atomicity of ldx128; + */ + do { + if (ldx_mo == __ATOMIC_RELAXED) + old = __rte_ldx_relaxed(dst); + else + old = __rte_ldx_acquire(dst); + + if (likely(old.int128 == expected.int128)) { + if (stx_mo == __ATOMIC_RELAXED) + ret = __rte_stx_relaxed(dst, desired); + else + ret = __rte_stx_release(dst, desired); + } else { + /* In the failure case (since 'weak' is ignored and only + * weak == 0 is implemented), expected should contain + * the atomically read value of dst. This means, 'old' + * needs to be stored back to ensure it was read + * atomically. + */ + if (stx_mo == __ATOMIC_RELAXED) + ret = __rte_stx_relaxed(dst, old); + else + ret = __rte_stx_release(dst, old); + } + } while (unlikely(ret)); +#endif + + /* Unconditionally updating expected removes + * an 'if' statement. + * expected should already be in register if + * not in the cache. + */ + *exp = old; + + return (old.int128 == expected.int128); +} + #ifdef __cplusplus } #endif diff --git a/lib/librte_eal/common/include/arch/x86/rte_atomic_64.h b/lib/librte_eal/common/include/arch/x86/rte_atomic_64.h index e087c6c32..12171296b 100644 --- a/lib/librte_eal/common/include/arch/x86/rte_atomic_64.h +++ b/lib/librte_eal/common/include/arch/x86/rte_atomic_64.h @@ -212,18 +212,6 @@ static inline void rte_atomic64_clear(rte_atomic64_t *v) /*------------------------ 128 bit atomic operations -------------------------*/ -/** - * 128-bit integer structure. - */ -RTE_STD_C11 -typedef struct { - RTE_STD_C11 - union { - uint64_t val[2]; - __extension__ __int128 int128; - }; -} __rte_aligned(16) rte_int128_t; - __rte_experimental static inline int rte_atomic128_cmp_exchange(rte_int128_t *dst, diff --git a/lib/librte_eal/common/include/generic/rte_atomic.h b/lib/librte_eal/common/include/generic/rte_atomic.h index 24ff7dcae..e6ab15a97 100644 --- a/lib/librte_eal/common/include/generic/rte_atomic.h +++ b/lib/librte_eal/common/include/generic/rte_atomic.h @@ -1081,6 +1081,20 @@ static inline void rte_atomic64_clear(rte_atomic64_t *v) /*------------------------ 128 bit atomic operations -------------------------*/ +/** + * 128-bit integer structure. + */ +RTE_STD_C11 +typedef struct { + RTE_STD_C11 + union { + uint64_t val[2]; +#ifdef RTE_ARCH_64 + __extension__ __int128 int128; +#endif + }; +} __rte_aligned(16) rte_int128_t; + #ifdef __DOXYGEN__ /** @@ -1093,7 +1107,8 @@ static inline void rte_atomic64_clear(rte_atomic64_t *v) * *exp = *dst * @endcode * - * @note This function is currently only available for the x86-64 platform. + * @note This function is currently available for the x86-64 and aarch64 + * platforms. * * @note The success and failure arguments must be one of the __ATOMIC_* values * defined in the C++11 standard. For details on their behavior, refer to the -- 2.22.0