From: "Eads, Gage"
To: Honnappa Nagarahalli, "'dev@dpdk.org'"
CC: "'olivier.matz@6wind.com'", "'arybchenko@solarflare.com'",
    "Richardson, Bruce", "Ananyev, Konstantin",
    "Gavin Hu (Arm Technology China)", nd, "thomas@monjalon.net"
Date: Mon, 1 Apr 2019 20:21:42 +0000
Subject: Re: [dpdk-dev] [PATCH v3 6/8] stack: add C11 atomic implementation
Message-ID: <9184057F7FC11744A2107296B6B8EB1E5420E6E0@FMSMSX108.amr.corp.intel.com>

> -----Original Message-----
> From: Honnappa Nagarahalli [mailto:Honnappa.Nagarahalli@arm.com]
> Sent: Monday, April 1, 2019 2:07 PM
> To: Eads, Gage; 'dev@dpdk.org'
> Cc: 'olivier.matz@6wind.com'; 'arybchenko@solarflare.com'; Richardson,
> Bruce; Ananyev, Konstantin; Gavin Hu (Arm Technology China); nd;
> thomas@monjalon.net; nd
> Subject: RE: [PATCH v3 6/8] stack: add C11 atomic implementation
>
> > > Subject: RE: [PATCH v3 6/8] stack: add C11 atomic implementation
> > >
> > > [snip]
> > >
> > > > > +static __rte_always_inline void
> > > > > +__rte_stack_lf_push(struct rte_stack_lf_list *list,
> > > > > +		    struct rte_stack_lf_elem *first,
> > > > > +		    struct rte_stack_lf_elem *last,
> > > > > +		    unsigned int num)
> > > > > +{
> > > > > +#ifndef RTE_ARCH_X86_64
> > > > > +	RTE_SET_USED(first);
> > > > > +	RTE_SET_USED(last);
> > > > > +	RTE_SET_USED(list);
> > > > > +	RTE_SET_USED(num);
> > > > > +#else
> > > > > +	struct rte_stack_lf_head old_head;
> > > > > +	int success;
> > > > > +
> > > > > +	old_head = list->head;
> > > > This can be a torn read (same as you have mentioned in
> > > > __rte_stack_lf_pop). I suggest we use acquire thread fence here as
> > > > well (please see the comments in __rte_stack_lf_pop).
> > >
> > > Agreed. I'll add the acquire fence.
> > >
> >
> > On second thought, an acquire fence isn't necessary. The acquire fence
> > in __rte_stack_lf_pop() ensures the list->head is ordered before the
> > list element reads. That isn't necessary here; we need to ensure that
> > the last->next write occurs (and is observed) before the list->head
> > write, which the CAS's RELEASE success memorder accomplishes.
> >
> > If a torn read occurs, the CAS will fail and will atomically re-load
> > &old_head.
>
> Following is my understanding:
> The general guideline is there should be a load-acquire for every
> store-release. In both xxx_lf_pop and xxx_lf_push, the head is
> store-released, hence the load of the head should be load-acquire.
> From the code (for ex: in function _xxx_lf_push), you can notice that
> there is dependency from 'old_head to new_head to list->head(in
> compare_exchange)'. When such a dependency exists, if the memory
> orderings have to be avoided, one needs to use __ATOMIC_CONSUME.
> Currently, the compilers will use a stronger memory order (which is
> __ATOMIC_ACQUIRE) as __ATOMIC_CONSUME is not well defined. Please
> refer to [1] and [2] for more info.
>
> IMO, since, for 128b, we do not have a pure load-acquire, I suggest we
> use thread_fence with acquire semantics. It is a heavier barrier, but I
> think it is a safer code which will adhere to C11 memory model.
>
> [1] https://preshing.com/20140709/the-purpose-of-memory_order_consume-in-cpp11/
> [2] http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p0750r1.html

Thanks for those two links; they're good resources.

I agree with your understanding. I admit I'm not fully convinced that a
synchronizes-with relationship is needed between pop's list->head store
and push's list->head load (or between push's list->head store and its
list->head load), but it's better to err on the side of caution to
ensure it's functionally correct... at least until I can manage to
convince you :).

I'll send out a V6 with the acquire thread fence.

Thanks,
Gage
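
For illustration, the ordering discussed in this thread (a load of the
head, an acquire thread fence, then a CAS with RELEASE on success) can be
sketched roughly as below. This is a minimal sketch under stated
assumptions, not the patch itself: struct lf_elem, struct lf_list,
lf_push() and lf_push_self_check() are hypothetical names, and the head is
reduced to a single pointer so the example builds without 128-bit CAS
support. With a one-word head the load cannot tear, so only the
fence/CAS pairing is shown. It assumes the GCC/Clang __atomic builtins.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the rte_stack_lf types; not the DPDK API. */
struct lf_elem {
	void *data;
	struct lf_elem *next;
};

struct lf_list {
	struct lf_elem *head;	/* one word here; the patch uses pointer + count */
	uint64_t len;
};

/* Push the chain first..last (num elements) onto the list. */
static inline void
lf_push(struct lf_list *list, struct lf_elem *first, struct lf_elem *last,
	unsigned int num)
{
	struct lf_elem *old_head;
	int success;

	/* Relaxed load, standing in for the plain read of the 128-bit head. */
	old_head = __atomic_load_n(&list->head, __ATOMIC_RELAXED);

	do {
		/* Acquire fence: intended to pair with the release CAS of
		 * whichever push or pop last stored list->head.
		 */
		__atomic_thread_fence(__ATOMIC_ACQUIRE);

		/* Link the new chain in front of the current head. */
		last->next = old_head;

		/* RELEASE on success orders the last->next store before the
		 * head store. On failure, old_head is reloaded atomically.
		 */
		success = __atomic_compare_exchange_n(&list->head, &old_head,
				first, 1, __ATOMIC_RELEASE, __ATOMIC_RELAXED);
	} while (success == 0);

	__atomic_add_fetch(&list->len, num, __ATOMIC_RELAXED);
}

/* Single-threaded self-check: returns 1 if two pushes leave a LIFO chain. */
static int
lf_push_self_check(void)
{
	struct lf_list list = { NULL, 0 };
	struct lf_elem a = { NULL, NULL };
	struct lf_elem b = { NULL, NULL };
	struct lf_elem c = { NULL, NULL };

	lf_push(&list, &a, &a, 1);		/* list: a */
	assert(list.head == &a && a.next == NULL);

	b.next = &c;				/* chain: b -> c */
	lf_push(&list, &b, &c, 2);		/* list: b -> c -> a */

	return list.head == &b && c.next == &a && a.next == NULL &&
	       list.len == 3;
}
```

Calling lf_push_self_check() from a test only exercises the
single-threaded linking logic; it says nothing about the memory-ordering
question itself, which would need a multi-threaded stress test or a
memory-model checker.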