From: "Ananyev, Konstantin"
To: Jia He, "jerin.jacob@caviumnetworks.com", "dev@dpdk.org", "olivier.matz@6wind.com"
CC: "Richardson, Bruce", "jianbo.liu@arm.com", "hemant.agrawal@nxp.com", "jie2.liu@hxt-semitech.com", "bing.zhao@hxt-semitech.com", "jia.he@hxt-semitech.com"
Date: Thu, 2 Nov 2017 13:26:19 +0000
Message-ID: <2601191342CEEE43887BDE71AB9772585FAB8703@irsmsx105.ger.corp.intel.com>
References: <1509612210-5499-1-git-send-email-hejianet@gmail.com>
In-Reply-To: <1509612210-5499-1-git-send-email-hejianet@gmail.com>
Subject: Re: [dpdk-dev] [PATCH v2] ring: guarantee ordering of cons/prod loading when doing

Hi Jia,

> -----Original Message-----
> From: Jia He [mailto:hejianet@gmail.com]
> Sent: Thursday, November 2, 2017 8:44 AM
> To: jerin.jacob@caviumnetworks.com; dev@dpdk.org; olivier.matz@6wind.com
> Cc: Ananyev, Konstantin; Richardson, Bruce; jianbo.liu@arm.com;
> hemant.agrawal@nxp.com; Jia He; jie2.liu@hxt-semitech.com;
> bing.zhao@hxt-semitech.com; jia.he@hxt-semitech.com
> Subject: [PATCH v2] ring: guarantee ordering of cons/prod loading when doing
>
> We observed an rte panic from mbuf_autotest on our Qualcomm arm64 server.
> For the possible race condition, please refer to [1].
>
> Furthermore, there are 2 options, as suggested by Jerin:
> 1. use rte_smp_rmb
> 2. use load_acquire/store_release (refer to [2]).
> CONFIG_RTE_ATOMIC_ACQUIRE_RELEASE_BARRIER_PREFER is provided, and by
> default it is n.
>
> The reason for providing 2 options is the difference in performance
> benchmark results on different arm machines; please refer to [3].
>
> Already functionally tested on the following machines:
> on X86 (passed the compilation)
> on arm64 with CONFIG_RTE_ATOMIC_ACQUIRE_RELEASE_BARRIER_PREFER=y
> on arm64 with CONFIG_RTE_ATOMIC_ACQUIRE_RELEASE_BARRIER_PREFER=n
>
> [1] http://dpdk.org/ml/archives/dev/2017-October/078366.html
> [2] https://github.com/freebsd/freebsd/blob/master/sys/sys/buf_ring.h#L170
> [3] http://dpdk.org/ml/archives/dev/2017-October/080861.html
>
> ---
> Changelog:
> V2: let users choose whether to use load_acquire/store_release
> V1: rte_smp_rmb() between the 2 loads
>
> Signed-off-by: Jia He
> Signed-off-by: jie2.liu@hxt-semitech.com
> Signed-off-by: bing.zhao@hxt-semitech.com
> Signed-off-by: jia.he@hxt-semitech.com
> Suggested-by: jerin.jacob@caviumnetworks.com
> ---
>  lib/librte_ring/Makefile           |  4 +++-
>  lib/librte_ring/rte_ring.h         | 38 ++++++++++++++++++++++++------
>  lib/librte_ring/rte_ring_arm64.h   | 48 ++++++++++++++++++++++++++++++++++
>  lib/librte_ring/rte_ring_generic.h | 45 +++++++++++++++++++++++++++++++++
>  4 files changed, 127 insertions(+), 8 deletions(-)
>  create mode 100644 lib/librte_ring/rte_ring_arm64.h
>  create mode 100644 lib/librte_ring/rte_ring_generic.h
>
> diff --git a/lib/librte_ring/Makefile b/lib/librte_ring/Makefile
> index e34d9d9..fa57a86 100644
> --- a/lib/librte_ring/Makefile
> +++ b/lib/librte_ring/Makefile
> @@ -45,6 +45,8 @@ LIBABIVER := 1
>  SRCS-$(CONFIG_RTE_LIBRTE_RING) := rte_ring.c
>
>  # install includes
> -SYMLINK-$(CONFIG_RTE_LIBRTE_RING)-include := rte_ring.h
> +SYMLINK-$(CONFIG_RTE_LIBRTE_RING)-include := rte_ring.h \
> +					rte_ring_arm64.h \
> +					rte_ring_generic.h
>
>  include $(RTE_SDK)/mk/rte.lib.mk
> diff --git a/lib/librte_ring/rte_ring.h b/lib/librte_ring/rte_ring.h
> index 5e9b3b7..943b1f9 100644
> --- a/lib/librte_ring/rte_ring.h
> +++ b/lib/librte_ring/rte_ring.h
> @@ -103,6 +103,18 @@ extern "C" {
>  #include
>  #include
>
> +/* On strong memory models (e.g. x86), there is no need to add rmb()
> + * between load and load.
> + * On weak models (powerpc/arm), there are 2 choices for the users:
> + * 1. use rmb() memory barrier
> + * 2. use one-direction load_acquire/store_release barrier
> + * It depends on performance test results. */
> +#ifdef RTE_ARCH_ARM64
> +#include "rte_ring_arm64.h"
> +#else
> +#include "rte_ring_generic.h"
> +#endif
> +
>  #define RTE_TAILQ_RING_NAME "RTE_RING"
>
>  enum rte_ring_queue_behavior {
> @@ -368,7 +380,7 @@ update_tail(struct rte_ring_headtail *ht, uint32_t old_val, uint32_t new_val,
>  	while (unlikely(ht->tail != old_val))
>  		rte_pause();
>
> -	ht->tail = new_val;
> +	arch_rte_atomic_store(&ht->tail, new_val, __ATOMIC_RELEASE);
>  }
>
>  /**
> @@ -408,7 +420,8 @@ __rte_ring_move_prod_head(struct rte_ring *r, int is_sp,
>  	/* Reset n to the initial burst count */
>  	n = max;
>
> -	*old_head = r->prod.head;
> +	*old_head = arch_rte_atomic_load(&r->prod.head,
> +					__ATOMIC_ACQUIRE);
>  	const uint32_t cons_tail = r->cons.tail;

The code starts to look a bit messy with all these arch-specific macros...
So I wonder, wouldn't it be cleaner to:
1. Move the existing __rte_ring_move_prod_head/__rte_ring_move_cons_head/update_tail
   into rte_ring_generic.h.
2. Add rte_smp_rmb into the generic __rte_ring_move_prod_head/__rte_ring_move_cons_head
   (as was in v1 of your patch).
3. Introduce ARM-specific versions of __rte_ring_move_prod_head/
   __rte_ring_move_cons_head/update_tail in rte_ring_arm64.h.
That way we keep the generic code simple and clean, while still allowing
arch-specific optimizations.
> 	/*
> 	 * The subtraction is done between two unsigned 32bits value
> @@ -430,8 +443,10 @@ __rte_ring_move_prod_head(struct rte_ring *r, int is_sp,
>  		if (is_sp)
>  			r->prod.head = *new_head, success = 1;
>  		else
> -			success = rte_atomic32_cmpset(&r->prod.head,
> -					*old_head, *new_head);
> +			success = arch_rte_atomic32_cmpset(&r->prod.head,
> +					old_head, *new_head,
> +					0, __ATOMIC_ACQUIRE,
> +					__ATOMIC_RELAXED);
>  	} while (unlikely(success == 0));
>  	return n;
>  }
> @@ -470,7 +485,10 @@ __rte_ring_do_enqueue(struct rte_ring *r, void * const *obj_table,
>  		goto end;
>
>  	ENQUEUE_PTRS(r, &r[1], prod_head, obj_table, n, void *);
> +
> +#ifndef RTE_ATOMIC_ACQUIRE_RELEASE_BARRIER_PREFER

I wonder why we need that macro?
Would there be situations when smp_wmb() is not needed here?

Konstantin