From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mga01.intel.com (mga01.intel.com [192.55.52.88]) by dpdk.org (Postfix) with ESMTP id D25343977; Fri, 5 Oct 2018 10:22:10 +0200 (CEST) X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga008.fm.intel.com ([10.253.24.58]) by fmsmga101.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 05 Oct 2018 01:22:09 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.54,343,1534834800"; d="scan'208";a="76399260" Received: from irsmsx103.ger.corp.intel.com ([163.33.3.157]) by fmsmga008.fm.intel.com with ESMTP; 05 Oct 2018 01:22:00 -0700 Received: from irsmsx106.ger.corp.intel.com ([169.254.8.45]) by IRSMSX103.ger.corp.intel.com ([169.254.3.248]) with mapi id 14.03.0319.002; Fri, 5 Oct 2018 09:21:59 +0100 From: "Ananyev, Konstantin" To: "Gavin Hu (Arm Technology China)" , Jerin Jacob CC: "dev@dpdk.org" , Honnappa Nagarahalli , Steve Capper , "Ola Liljedahl" , nd , "stable@dpdk.org" Thread-Topic: [PATCH v3 1/3] ring: read tail using atomic load Thread-Index: AQHUXEUzcUjYt7AXu0OsxPH+FH4ynqUQTyGg Date: Fri, 5 Oct 2018 08:21:59 +0000 Message-ID: <2601191342CEEE43887BDE71AB9772580102FE261A@IRSMSX106.ger.corp.intel.com> References: <20180807031943.5331-1-gavin.hu@arm.com> <1537172244-64874-1-git-send-email-gavin.hu@arm.com> <20180929104857.GA30457@jerin> In-Reply-To: Accept-Language: en-IE, en-US Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: x-titus-metadata-40: eyJDYXRlZ29yeUxhYmVscyI6IiIsIk1ldGFkYXRhIjp7Im5zIjoiaHR0cDpcL1wvd3d3LnRpdHVzLmNvbVwvbnNcL0ludGVsMyIsImlkIjoiZjdjM2JmN2ItZjhjZS00NDU5LWJlN2UtOWExMjcwODhhY2Y2IiwicHJvcHMiOlt7Im4iOiJDVFBDbGFzc2lmaWNhdGlvbiIsInZhbHMiOlt7InZhbHVlIjoiQ1RQX05UIn1dfV19LCJTdWJqZWN0TGFiZWxzIjpbXSwiVE1DVmVyc2lvbiI6IjE3LjEwLjE4MDQuNDkiLCJUcnVzdGVkTGFiZWxIYXNoIjoiWTJxaDgrZWlabDdXQjNqUFF2VElCVWtKU3hEZndvNlNnYWcxc2E4dU9zN1dvMzFBTHFxQTRsb3hXZ1RQOXpEZCJ9 x-ctpclassification: CTP_NT dlp-product: dlpe-windows dlp-version: 11.0.400.15 dlp-reaction: no-action x-originating-ip: [163.33.239.181] Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 Subject: Re: [dpdk-dev] [PATCH v3 1/3] ring: read tail using atomic load X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 05 Oct 2018 08:22:11 -0000 > -----Original Message----- > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Gavin Hu (Arm Techno= logy China) > Sent: Friday, October 5, 2018 1:47 AM > To: Jerin Jacob > Cc: dev@dpdk.org; Honnappa Nagarahalli ; St= eve Capper ; Ola Liljedahl > ; nd ; stable@dpdk.org > Subject: Re: [dpdk-dev] [PATCH v3 1/3] ring: read tail using atomic load >=20 > Hi Jerin, >=20 > Thanks for your review, inline comments from our internal discussions. >=20 > BR. Gavin >=20 > > -----Original Message----- > > From: Jerin Jacob > > Sent: Saturday, September 29, 2018 6:49 PM > > To: Gavin Hu (Arm Technology China) > > Cc: dev@dpdk.org; Honnappa Nagarahalli > > ; Steve Capper > > ; Ola Liljedahl ; nd > > ; stable@dpdk.org > > Subject: Re: [PATCH v3 1/3] ring: read tail using atomic load > > > > -----Original Message----- > > > Date: Mon, 17 Sep 2018 16:17:22 +0800 > > > From: Gavin Hu > > > To: dev@dpdk.org > > > CC: gavin.hu@arm.com, Honnappa.Nagarahalli@arm.com, > > > steve.capper@arm.com, Ola.Liljedahl@arm.com, > > > jerin.jacob@caviumnetworks.com, nd@arm.com, stable@dpdk.org > > > Subject: [PATCH v3 1/3] ring: read tail using atomic load > > > X-Mailer: git-send-email 2.7.4 > > > > > > External Email > > > > > > In update_tail, read ht->tail using __atomic_load.Although the > > > compiler currently seems to be doing the right thing even without > > > _atomic_load, we don't want to give the compiler freedom to optimise > > > what should be an atomic load, it should not be arbitarily moved > > > around. > > > > > > Fixes: 39368ebfc6 ("ring: introduce C11 memory model barrier option") > > > Cc: stable@dpdk.org > > > > > > Signed-off-by: Gavin Hu > > > Reviewed-by: Honnappa Nagarahalli > > > Reviewed-by: Steve Capper > > > Reviewed-by: Ola Liljedahl > > > --- > > > lib/librte_ring/rte_ring_c11_mem.h | 3 ++- > > > 1 file changed, 2 insertions(+), 1 deletion(-) > > > > > > diff --git a/lib/librte_ring/rte_ring_c11_mem.h > > > b/lib/librte_ring/rte_ring_c11_mem.h > > > index 94df3c4..234fea0 100644 > > > --- a/lib/librte_ring/rte_ring_c11_mem.h > > > +++ b/lib/librte_ring/rte_ring_c11_mem.h > > > @@ -21,7 +21,8 @@ update_tail(struct rte_ring_headtail *ht, uint32_t > > old_val, uint32_t new_val, > > > * we need to wait for them to complete > > > */ > > > if (!single) > > > - while (unlikely(ht->tail !=3D old_val)) > > > + while (unlikely(old_val !=3D __atomic_load_n(&ht->tai= l, > > > + __ATOMIC_RELAXED))) > > > rte_pause(); > > > > Since it is a while loop with rte_pause(), IMO, There is no scope of fa= lse > > compiler optimization. > > IMO, this change may not required though I don't see any performance > > difference with two core ring_perf_autotest test. May be more core case= it > > may have effect. IMO, If it not absolutely required, we can avoid this = change. > > >=20 > Using __atomic_load_n() has two purposes: > 1) the old code only works because ht->tail is declared volatile which is= not a requirement for C11 or for the use of __atomic builtins. If ht- > >tail was not declared volatile and __atomic_load_n() not used, the compi= ler would likely hoist the load above the loop. > 2) I think all memory locations used for synchronization should use __ato= mic operations for access in order to clearly indicate that these > locations (and these accesses) are used for synchronization. >=20 > The read of ht->tail needs to be atomic, a non-atomic read would not be c= orrect. That's a 32bit value load. AFAIK on all CPUs that we support it is an atomic operation. > But there are no memory ordering requirements (with > regards to other loads and/or stores by this thread) so relaxed memory or= der is sufficient. > Another aspect of using __atomic_load_n() is that the compiler cannot "op= timise" this load (e.g. combine, hoist etc), it has to be done as > specified in the source code which is also what we need here. I think Jerin points that rte_pause() acts here as compiler barrier too, so no need to worry that compiler would optimize out the loop. Konstantin >=20 > One point worth mentioning though is that this change is for the rte_ring= _c11_mem.h file, not the legacy ring. It may be worth persisting > with getting the C11 code right when people are less excited about sendin= g a release out? >=20 > We can explain that for C11 we would prefer to do loads and stores as per= the C11 memory model. In the case of rte_ring, the code is > separated cleanly into C11 specific files anyway. >=20 > I think reading ht->tail using __atomic_load_n() is the most appropriate = way. We show that ht->tail is used for synchronization, we > acknowledge that ht->tail may be written by other threads without any oth= er kind of synchronization (e.g. no lock involved) and we require > an atomic load (any write to ht->tail must also be atomic). >=20 > Using volatile and explicit compiler (or processor) memory barriers (fenc= es) is the legacy pre-C11 way of accomplishing these things. There's > a reason why C11/C++11 moved away from the old ways. > > > > > > __atomic_store_n(&ht->tail, new_val, __ATOMIC_RELEASE); > > > -- > > > 2.7.4 > > >