From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mga04.intel.com (mga04.intel.com [192.55.52.120]) by dpdk.org (Postfix) with ESMTP id 705201B5E5 for ; Thu, 2 Nov 2017 17:16:48 +0100 (CET) Received: from fmsmga006.fm.intel.com ([10.253.24.20]) by fmsmga104.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 02 Nov 2017 09:16:45 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.44,335,1505804400"; d="scan'208";a="171115184" Received: from irsmsx102.ger.corp.intel.com ([163.33.3.155]) by fmsmga006.fm.intel.com with ESMTP; 02 Nov 2017 09:16:42 -0700 Received: from irsmsx112.ger.corp.intel.com (10.108.20.5) by IRSMSX102.ger.corp.intel.com (163.33.3.155) with Microsoft SMTP Server (TLS) id 14.3.319.2; Thu, 2 Nov 2017 16:16:35 +0000 Received: from irsmsx105.ger.corp.intel.com ([169.254.7.67]) by irsmsx112.ger.corp.intel.com ([169.254.1.12]) with mapi id 14.03.0319.002; Thu, 2 Nov 2017 16:16:34 +0000 From: "Ananyev, Konstantin" To: Jia He , "jerin.jacob@caviumnetworks.com" , "dev@dpdk.org" , "olivier.matz@6wind.com" CC: "Richardson, Bruce" , "jianbo.liu@arm.com" , "hemant.agrawal@nxp.com" , "jie2.liu@hxt-semitech.com" , "bing.zhao@hxt-semitech.com" , "jia.he@hxt-semitech.com" Thread-Topic: [PATCH v2] ring: guarantee ordering of cons/prod loading when doing Thread-Index: AQHTU7bdS9BT0ZrvvEiT0+3nvC0c0aMBEfUggAApH4CAAAWVYA== Date: Thu, 2 Nov 2017 16:16:33 +0000 Message-ID: <2601191342CEEE43887BDE71AB9772585FAB8891@irsmsx105.ger.corp.intel.com> References: <1509612210-5499-1-git-send-email-hejianet@gmail.com> <2601191342CEEE43887BDE71AB9772585FAB8703@irsmsx105.ger.corp.intel.com> In-Reply-To: Accept-Language: en-IE, en-US Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: x-titus-metadata-40: eyJDYXRlZ29yeUxhYmVscyI6IiIsIk1ldGFkYXRhIjp7Im5zIjoiaHR0cDpcL1wvd3d3LnRpdHVzLmNvbVwvbnNcL0ludGVsMyIsImlkIjoiMDUyOTA0ODEtYWQ2OC00NzliLThkOGEtNGU5OWQ4MWRhYzRmIiwicHJvcHMiOlt7Im4iOiJDVFBDbGFzc2lmaWNhdGlvbiIsInZhbHMiOlt7InZhbHVlIjoiQ1RQX0lDIn1dfV19LCJTdWJqZWN0TGFiZWxzIjpbXSwiVE1DVmVyc2lvbiI6IjE2LjUuOS4zIiwiVHJ1c3RlZExhYmVsSGFzaCI6IkJxdEppZWhOOVRhbVAwRXRJYjdjU2N0aCtZQ3RuZ3RPYjYyU0o2SEs0emM9In0= x-ctpclassification: CTP_IC dlp-product: dlpe-windows dlp-version: 11.0.0.116 dlp-reaction: no-action x-originating-ip: [163.33.239.182] Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 Subject: Re: [dpdk-dev] [PATCH v2] ring: guarantee ordering of cons/prod loading when doing X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 02 Nov 2017 16:16:49 -0000 > -----Original Message----- > From: Jia He [mailto:hejianet@gmail.com] > Sent: Thursday, November 2, 2017 3:43 PM > To: Ananyev, Konstantin ; jerin.jacob@caviu= mnetworks.com; dev@dpdk.org; olivier.matz@6wind.com > Cc: Richardson, Bruce ; jianbo.liu@arm.com; h= emant.agrawal@nxp.com; jie2.liu@hxt-semitech.com; > bing.zhao@hxt-semitech.com; jia.he@hxt-semitech.com > Subject: Re: [PATCH v2] ring: guarantee ordering of cons/prod loading whe= n doing >=20 > Hi Ananyev >=20 >=20 > On 11/2/2017 9:26 PM, Ananyev, Konstantin Wrote: > > Hi Jia, > > > >> -----Original Message----- > >> From: Jia He [mailto:hejianet@gmail.com] > >> Sent: Thursday, November 2, 2017 8:44 AM > >> To: jerin.jacob@caviumnetworks.com; dev@dpdk.org; olivier.matz@6wind.c= om > >> Cc: Ananyev, Konstantin ; Richardson, Br= uce ; jianbo.liu@arm.com; > >> hemant.agrawal@nxp.com; Jia He ; jie2.liu@hxt-semi= tech.com; bing.zhao@hxt-semitech.com; jia.he@hxt- > >> semitech.com > >> Subject: [PATCH v2] ring: guarantee ordering of cons/prod loading when= doing > >> > >> We watched a rte panic of mbuf_autotest in our qualcomm arm64 server. > >> As for the possible race condition, please refer to [1]. > >> > >> Furthermore, there are 2 options as suggested by Jerin: > >> 1. use rte_smp_rmb > >> 2. use load_acquire/store_release(refer to [2]). > >> CONFIG_RTE_ATOMIC_ACQUIRE_RELEASE_BARRIER_PREFER is provided, and by > >> default it is n; > >> > >> The reason why providing 2 options is due to the performance benchmark > >> difference in different arm machines, please refer to [3]. > >> > >> Already fuctionally tested on the machines as follows: > >> on X86(passed the compilation) > >> on arm64 with CONFIG_RTE_ATOMIC_ACQUIRE_RELEASE_BARRIER_PREFER=3Dy > >> on arm64 with CONFIG_RTE_ATOMIC_ACQUIRE_RELEASE_BARRIER_PREFER=3Dn > >> > >> [1] http://dpdk.org/ml/archives/dev/2017-October/078366.html > >> [2] https://github.com/freebsd/freebsd/blob/master/sys/sys/buf_ring.h#= L170 > >> [3] http://dpdk.org/ml/archives/dev/2017-October/080861.html > >> > >> --- > >> Changelog: > >> V2: let users choose whether using load_acquire/store_release > >> V1: rte_smp_rmb() between 2 loads > >> > >> Signed-off-by: Jia He > >> Signed-off-by: jie2.liu@hxt-semitech.com > >> Signed-off-by: bing.zhao@hxt-semitech.com > >> Signed-off-by: jia.he@hxt-semitech.com > >> Suggested-by: jerin.jacob@caviumnetworks.com > >> --- > >> lib/librte_ring/Makefile | 4 +++- > >> lib/librte_ring/rte_ring.h | 38 ++++++++++++++++++++++++----= -- > >> lib/librte_ring/rte_ring_arm64.h | 48 ++++++++++++++++++++++++++++= ++++++++++ > >> lib/librte_ring/rte_ring_generic.h | 45 ++++++++++++++++++++++++++++= +++++++ > >> 4 files changed, 127 insertions(+), 8 deletions(-) > >> create mode 100644 lib/librte_ring/rte_ring_arm64.h > >> create mode 100644 lib/librte_ring/rte_ring_generic.h > >> > >> diff --git a/lib/librte_ring/Makefile b/lib/librte_ring/Makefile > >> index e34d9d9..fa57a86 100644 > >> --- a/lib/librte_ring/Makefile > >> +++ b/lib/librte_ring/Makefile > >> @@ -45,6 +45,8 @@ LIBABIVER :=3D 1 > >> SRCS-$(CONFIG_RTE_LIBRTE_RING) :=3D rte_ring.c > >> > >> # install includes > >> -SYMLINK-$(CONFIG_RTE_LIBRTE_RING)-include :=3D rte_ring.h > >> +SYMLINK-$(CONFIG_RTE_LIBRTE_RING)-include :=3D rte_ring.h \ > >> + rte_ring_arm64.h \ > >> + rte_ring_generic.h > >> > >> include $(RTE_SDK)/mk/rte.lib.mk > >> diff --git a/lib/librte_ring/rte_ring.h b/lib/librte_ring/rte_ring.h > >> index 5e9b3b7..943b1f9 100644 > >> --- a/lib/librte_ring/rte_ring.h > >> +++ b/lib/librte_ring/rte_ring.h > >> @@ -103,6 +103,18 @@ extern "C" { > >> #include > >> #include > >> > >> +/* In those strong memory models (e.g. x86), there is no need to add = rmb() > >> + * between load and load. > >> + * In those weak models(powerpc/arm), there are 2 choices for the use= rs > >> + * 1.use rmb() memory barrier > >> + * 2.use one-direcion load_acquire/store_release barrier > >> + * It depends on performance test results. */ > >> +#ifdef RTE_ARCH_ARM64 > >> +#include "rte_ring_arm64.h" > >> +#else > >> +#include "rte_ring_generic.h" > >> +#endif > >> + > >> #define RTE_TAILQ_RING_NAME "RTE_RING" > >> > >> enum rte_ring_queue_behavior { > >> @@ -368,7 +380,7 @@ update_tail(struct rte_ring_headtail *ht, uint32_t= old_val, uint32_t new_val, > >> while (unlikely(ht->tail !=3D old_val)) > >> rte_pause(); > >> > >> - ht->tail =3D new_val; > >> + arch_rte_atomic_store(&ht->tail, new_val, __ATOMIC_RELEASE); > >> } > >> > >> /** > >> @@ -408,7 +420,8 @@ __rte_ring_move_prod_head(struct rte_ring *r, int = is_sp, > >> /* Reset n to the initial burst count */ > >> n =3D max; > >> > >> - *old_head =3D r->prod.head; > >> + *old_head =3D arch_rte_atomic_load(&r->prod.head, > >> + __ATOMIC_ACQUIRE); > >> const uint32_t cons_tail =3D r->cons.tail; > > The code starts to look a bit messy with all these arch specific macros= ... > > So I wonder wouldn't it be more cleaner to: > > > > 1. move existing __rte_ring_move_prod_head/__rte_ring_move_cons_head/up= date_tail > > into rte_ring_generic.h > > 2. Add rte_smp_rmb into generic __rte_ring_move_prod_head/__rte_ring_mo= ve_cons_head > > (as was in v1 of your patch). > > 3. Introduce ARM specific versions of __rte_ring_move_prod_head/__rte_r= ing_move_cons_head/update_tail > > in the rte_ring_arm64.h > > > > That way we will keep ogneric code simple and clean, while still allowi= ng arch specific optimizations. > Thanks for your review. > But as per your suggestion, there will be at least 2 copies of > __rte_ring_move_prod_head/__rte_ring_move_cons_head/update_tail. > Thus, if there are any bugs in the future, both 2 copies have to be > changed, right? Yes, there would be some code duplication. Though generic code would be much cleaner and simpler to read in that case. Which is worth some dups I think. > > > >> /* > >> * The subtraction is done between two unsigned 32bits value > >> @@ -430,8 +443,10 @@ __rte_ring_move_prod_head(struct rte_ring *r, int= is_sp, > >> if (is_sp) > >> r->prod.head =3D *new_head, success =3D 1; > >> else > >> - success =3D rte_atomic32_cmpset(&r->prod.head, > >> - *old_head, *new_head); > >> + success =3D arch_rte_atomic32_cmpset(&r->prod.head, > >> + old_head, *new_head, > >> + 0, __ATOMIC_ACQUIRE, > >> + __ATOMIC_RELAXED); > >> } while (unlikely(success =3D=3D 0)); > >> return n; > >> } > >> @@ -470,7 +485,10 @@ __rte_ring_do_enqueue(struct rte_ring *r, void * = const *obj_table, > >> goto end; > >> > >> ENQUEUE_PTRS(r, &r[1], prod_head, obj_table, n, void *); > >> + > >> +#ifndef RTE_ATOMIC_ACQUIRE_RELEASE_BARRIER_PREFER > > I wonder why do we need that macro? > > Would be there situations when smp_wmb() are not needed here? > If the dpdk user chooses the config acquire/release, the store_release > barrier in update_tail together with > the load_acquire barrier pair in __rte_ring_move_{prod,cons}_head > guarantee the order. So smp_wmb() is not required here. Please refer to > the freebsd ring implementation and Jerin's debug patch. > https://github.com/freebsd/freebsd/blob/master/sys/sys/buf_ring.h > https://github.com/jerinjacobk/mytests/blob/master/ring/0001-ring-using-c= 11-memory-model.patch Ok, so because you are doing arch_rte_atomic_store(&ht->tail, new_val, __ATOMIC_RELEASE); inside update_tail() you don't need mb() anymore here... In that case, can we probably hide all that logic inside update_tail()? I.E. move sp_rmb/smp_wmb into update_tail() and then for arm64 you'll Have your special one with atomic_store() Something like that for generic version : static __rte_always_inline void update_tail(struct rte_ring_headtail *ht, uint32_t old_val, uint32_t new_va= l, - uint32_t single) + uint32_t single, uint32_t enqueue) { + if (enqueue) + rte_smp_wmb(); + else + rte_smp_rmb(); /* * If there are other enqueues/dequeues in progress that preceded u= s, * we need to wait for them to complete */ if (!single) while (unlikely(ht->tail !=3D old_val)) rte_pause(); ht->tail =3D new_val; } .... static __rte_always_inline unsigned int __rte_ring_do_enqueue(struct rte_ring *r, void * const *obj_table, unsigned int n, enum rte_ring_queue_behavior behavior, int is_sp, unsigned int *free_space) { ..... ENQUEUE_PTRS(r, &r[1], prod_head, obj_table, n, void *); - rte_smp_wmb(); - update_tail(&r->prod, prod_head, prod_next, is_sp); + update_tail(&r->prod, prod_head, prod_next, is_sp, 1); .... static __rte_always_inline unsigned int __rte_ring_do_dequeue(struct rte_ring *r, void **obj_table, unsigned int n, enum rte_ring_queue_behavior behavior, int is_sc, unsigned int *available) { .... DEQUEUE_PTRS(r, &r[1], cons_head, obj_table, n, void *); - rte_smp_rmb(); - update_tail(&r->cons, cons_head, cons_next, is_sc); + update_tail(&r->cons, cons_head, cons_next, is_sc, 0); Konstantin >=20 > --- > Cheers, > Jia > > Konstantin > > > > >=20 > -- > Cheers, > Jia