From mboxrd@z Thu Jan 1 00:00:00 1970
From: "Ananyev, Konstantin"
To: "Gavin Hu (Arm Technology China)", "dev@dpdk.org"
CC: nd, "thomas@monjalon.net", "stephen@networkplumber.org", "hemant.agrawal@nxp.com", "jerinj@marvell.com", "pbhagavatula@marvell.com", Honnappa Nagarahalli, "Ruifeng Wang (Arm Technology China)", "Phil Yang (Arm Technology China)", Steve Capper, nd, nd
Date: Thu, 24 Oct 2019 10:21:29 +0000
Message-ID: <2601191342CEEE43887BDE71AB97725801A8C6F66F@IRSMSX104.ger.corp.intel.com>
References: <1561911676-37718-1-git-send-email-gavin.hu@arm.com>
 <1569562904-43950-3-git-send-email-gavin.hu@arm.com>
 <2601191342CEEE43887BDE71AB97725801A8C6AB30@IRSMSX104.ger.corp.intel.com>
Subject: Re: [dpdk-dev] [PATCH v7 2/7] eal: add the APIs to wait until equal
List-Id: DPDK patches and discussions

Hi Gavin,

> > > > > The rte_wait_until_equal_xx APIs abstract the functionality of
> > > > > 'polling for a memory location to become equal to a given value'.
> > > > >
> > > > > Add the RTE_ARM_USE_WFE configuration entry for aarch64, disabled
> > > > > by default. When it is enabled, the above APIs will call WFE instruction
> > > > > to save CPU cycles and power.
> > > > >
> > > > > Signed-off-by: Gavin Hu
> > > > > Reviewed-by: Ruifeng Wang
> > > > > Reviewed-by: Steve Capper
> > > > > Reviewed-by: Ola Liljedahl
> > > > > Reviewed-by: Honnappa Nagarahalli
> > > > > Reviewed-by: Phil Yang
> > > > > Acked-by: Pavan Nikhilesh
> > > > > ---
> > > > >  config/arm/meson.build                            |   1 +
> > > > >  config/common_base                                |   5 +
> > > > >  .../common/include/arch/arm/rte_pause_64.h        |  30 ++++++
> > > > >  lib/librte_eal/common/include/generic/rte_pause.h | 106 +++++++++++++++++++++
> > > > >  4 files changed, 142 insertions(+)
> > > > >
> > > > > diff --git a/config/arm/meson.build b/config/arm/meson.build
> > > > > index 979018e..b4b4cac 100644
> > > > > --- a/config/arm/meson.build
> > > > > +++ b/config/arm/meson.build
> > > > > @@ -26,6 +26,7 @@ flags_common_default = [
> > > > >      ['RTE_LIBRTE_AVP_PMD', false],
> > > > >
> > > > >      ['RTE_SCHED_VECTOR', false],
> > > > > +    ['RTE_ARM_USE_WFE', false],
> > > > >  ]
> > > > >
> > > > >  flags_generic = [
> > > > > diff --git a/config/common_base b/config/common_base
> > > > > index 8ef75c2..8861713 100644
> > > > > --- a/config/common_base
> > > > > +++ b/config/common_base
> > > > > @@ -111,6 +111,11 @@ CONFIG_RTE_MAX_VFIO_CONTAINERS=64
> > > > >  CONFIG_RTE_MALLOC_DEBUG=n
> > > > >  CONFIG_RTE_EAL_NUMA_AWARE_HUGEPAGES=n
> > > > >  CONFIG_RTE_USE_LIBBSD=n
> > > > > +# Use WFE instructions to implement the rte_wait_for_equal_xxx APIs,
> > > > > +# calling these APIs put the cores in low power state while waiting
> > > > > +# for the memory address to become equal to the expected value.
> > > > > +# This is supported only by aarch64.
> > > > > +CONFIG_RTE_ARM_USE_WFE=n
> > > > >
> > > > >  #
> > > > >  # Recognize/ignore the AVX/AVX512 CPU flags for performance/power testing.
> > > > > diff --git a/lib/librte_eal/common/include/arch/arm/rte_pause_64.h b/lib/librte_eal/common/include/arch/arm/rte_pause_64.h
> > > > > index 93895d3..dabde17 100644
> > > > > --- a/lib/librte_eal/common/include/arch/arm/rte_pause_64.h
> > > > > +++ b/lib/librte_eal/common/include/arch/arm/rte_pause_64.h
> > > > > @@ -1,5 +1,6 @@
> > > > >  /* SPDX-License-Identifier: BSD-3-Clause
> > > > >   * Copyright(c) 2017 Cavium, Inc
> > > > > + * Copyright(c) 2019 Arm Limited
> > > > >   */
> > > > >
> > > > >  #ifndef _RTE_PAUSE_ARM64_H_
> > > > > @@ -17,6 +18,35 @@ static inline void rte_pause(void)
> > > > >      asm volatile("yield" ::: "memory");
> > > > >  }
> > > > >
> > > > > +#ifdef RTE_ARM_USE_WFE
> > > > > +#define __WAIT_UNTIL_EQUAL(name, asm_op, wide, type) \
> > > > > +static __rte_always_inline void \
> > > > > +rte_wait_until_equal_##name(volatile type * addr, type expected) \
> > > > > +{ \
> > > > > +    type tmp; \
> > > > > +    asm volatile( \
> > > > > +        #asm_op " %" #wide "[tmp], %[addr]\n" \
> > > > > +        "cmp %" #wide "[tmp], %" #wide "[expected]\n" \
> > > > > +        "b.eq 2f\n" \
> > > > > +        "sevl\n" \
> > > > > +        "1: wfe\n" \
> > > > > +        #asm_op " %" #wide "[tmp], %[addr]\n" \
> > > > > +        "cmp %" #wide "[tmp], %" #wide "[expected]\n" \
> > > > > +        "bne 1b\n" \
> > > > > +        "2:\n" \
> > > > > +        : [tmp] "=&r" (tmp) \
> > > > > +        : [addr] "Q"(*addr), [expected] "r"(expected) \
> > > > > +        : "cc", "memory"); \
> > > > > +}
> > >
> > > One more thought:
> > > Why do you need to write asm code for the whole procedure?
> > > Why not to do like linux kernel:
> > > define wfe() and sev() macros and use them inside normal C code?
> > >
> > > #define sev() asm volatile("sev" : : : "memory")
> > > #define wfe() asm volatile("wfe" : : : "memory")
> > >
> > > Then:
> > > rte_wait_until_equal_32(volatile uint32_t *addr, uint32_t expected, int memorder)
> > > {
> > >     if (__atomic_load_n(addr, memorder) != expected) {
> > >         sev();
> > >         do {
> > >             wfe();
> > >         } while (__atomic_load_n(addr, memorder) != expected);
> > >     }
> > > }
> > >
> > > ?
> > A really good suggestion. I made the corresponding changes in v8 already, but
> > after internal discussion it turned out to miss an armv8-specific feature.
> > We call wfe to wait/sleep on the 'monitored' address; the core is woken up
> > when someone writes to the monitored address, so before wfe we have to issue
> > a load-exclusive instruction to arm the 'monitor'.
> > __atomic_load_n, which disassembles to "ldr", does not do so. We have to use
> > "ldxrh" for relaxed memory ordering and "ldaxrh" for acquire ordering, taking
> > 16-bit as an example. Didn't realize that, sorry for the confusion caused...
> >
> > Let me re-think coming back to the full assembly procedure or implementing
> > a 'load-exclusive' function. What do you think?

After some thought, I am leaning towards a 'load-exclusive' function.
Hopefully it would help you avoid raw asm here and in other places.
What do you think?
Konstantin

> > /Gavin
> Forgot to mention, the kernel uses wfe() without preceding load-exclusive instructions because:
> 1) it relies on the timer to wake up, i.e. __delay()
> 2) sev is called explicitly to send wake events, for all kinds of locks
> 3) IPI instructions.
>
> Our patches can't count on these events, due to the lack of these events or their performance impact.
> /Gavin
> >
> > > > > +/* Wait for *addr to be updated with expected value */
> > > > > +__WAIT_UNTIL_EQUAL(relaxed_16, ldxrh, w, uint16_t)
> > > > > +__WAIT_UNTIL_EQUAL(acquire_16, ldaxrh, w, uint16_t)
> > > > > +__WAIT_UNTIL_EQUAL(relaxed_32, ldxr, w, uint32_t)
> > > > > +__WAIT_UNTIL_EQUAL(acquire_32, ldaxr, w, uint32_t)
> > > > > +__WAIT_UNTIL_EQUAL(relaxed_64, ldxr, x, uint64_t)
> > > > > +__WAIT_UNTIL_EQUAL(acquire_64, ldaxr, x, uint64_t)
> > > > > +#endif
> > > > > +
> > > > >  #ifdef __cplusplus
> > > > >  }
> > > > >  #endif
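
To make the 'load-exclusive function' option discussed above more concrete, here is a rough sketch of how the 32-bit variant might look with only the exclusive load and the event instructions left in asm and the loop written in C. This is an illustration under assumptions, not code from the patch: the names load_exclusive_32, send_event_local, wait_event and wait_until_equal_32 are hypothetical, and the non-aarch64 branch is just a busy-polling fallback so the sketch builds anywhere.

```c
#include <assert.h>
#include <stdint.h>

/* All helper names below are hypothetical; none come from the patch. */
#if defined(__aarch64__)
/*
 * Exclusive load of a 32-bit word. Besides returning the value, it arms
 * the exclusive monitor for *addr, so a later write to that location
 * generates the event that wakes a core sleeping in wfe.
 */
static inline uint32_t
load_exclusive_32(volatile uint32_t *addr, int memorder)
{
	uint32_t val;

	if (memorder == __ATOMIC_RELAXED)
		__asm__ __volatile__("ldxr %w[val], %[addr]"
				     : [val] "=&r"(val)
				     : [addr] "Q"(*addr)
				     : "memory");
	else
		__asm__ __volatile__("ldaxr %w[val], %[addr]"
				     : [val] "=&r"(val)
				     : [addr] "Q"(*addr)
				     : "memory");
	return val;
}
/* sevl sets the local event register; wfe sleeps until an event arrives. */
#define send_event_local() __asm__ __volatile__("sevl" ::: "memory")
#define wait_event()       __asm__ __volatile__("wfe" ::: "memory")
#else
/* Fallback for other targets: plain atomic load, no monitor, busy polling. */
static inline uint32_t
load_exclusive_32(volatile uint32_t *addr, int memorder)
{
	return __atomic_load_n(addr, memorder);
}
#define send_event_local() do { } while (0)
#define wait_event()       do { } while (0)
#endif

/* Block until *addr == expected, using relaxed or acquire ordering. */
static inline void
wait_until_equal_32(volatile uint32_t *addr, uint32_t expected, int memorder)
{
	assert(memorder == __ATOMIC_ACQUIRE || memorder == __ATOMIC_RELAXED);

	if (load_exclusive_32(addr, memorder) != expected) {
		/* Make the first wfe fall through so the loop re-arms the monitor. */
		send_event_local();
		do {
			wait_event();
		} while (load_exclusive_32(addr, memorder) != expected);
	}
}
```

The control flow mirrors the asm sequence in the quoted macro (exclusive load, compare, sevl, then a wfe/exclusive-load loop). The difference is that the compiler now schedules the C parts, which is the trade-off being weighed in this thread.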