From: "Ananyev, Konstantin"
To: Jerin Jacob
CC: "Kuusisaari, Juhamatti", "'dev@dpdk.org'", "Jan Viktorin (viktorin@rehivetech.com)", "Chao Zhu (bjzhuc@cn.ibm.com)"
Subject: Re: [dpdk-dev] [PATCH] lib: move rte_ring read barrier to correct location
Date: Fri, 15 Jul 2016 10:34:40 +0000
Message-ID: <2601191342CEEE43887BDE71AB97725836B7E280@irsmsx105.ger.corp.intel.com>
In-Reply-To: <20160715062905.GA6473@localhost.localdomain>

Hi Jerin,

> > > The CPU also knows already the value that will be written to cons.tail and
> > > that value does not depend on the previous read either. The CPU does not
> > > know we are planning to do a spinlock there, so it might do things
> > > out-of-order without proper dependencies.
> > >
> > > > For __rte_ring_sc_do_dequeue(), I think you are right, we might need
> > > > something stronger.
> > > > I don't want to put rte_smp_mb() here as it would cause a full HW
> > > > barrier even on machines with strong memory order (IA).
> > > > I think that rte_smp_wmb() might be enough here:
> > > > it would force the CPU to wait till the writes in DEQUEUE_PTRS()
> > > > become visible, which means the reads have to be completed too.
> > >
> > > In practice I think that rte_smp_wmb() would work fine, even though
> > > it is not strictly according to the book. The solution below would be my
> > > proposal as a fix to the issue of sc dequeueing (and also to mc dequeueing,
> > > if we have the problem of the CPU completely ignoring the spinlock in
> > > reality there):
> > >
> > > DEQUEUE_PTRS();
> > > ..
> > > rte_smp_wmb();
> > > r->cons.tail = cons_next;
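
Just to make sure we are all looking at the same thing, below is a simplified
sketch of the SC dequeue path with the barrier placement being discussed. It is
an illustration only, not the actual lib/librte_ring code: the helper name is
made up, the real function uses the DEQUEUE_PTRS() macro and also handles the
fixed/variable behaviours, which I have dropped here.

#include <rte_atomic.h>
#include <rte_ring.h>

/* Simplified sketch only: shows where the barriers under discussion sit
 * in a single-consumer dequeue; not the real __rte_ring_sc_do_dequeue(). */
static inline unsigned
sc_dequeue_sketch(struct rte_ring *r, void **obj_table, unsigned n)
{
        uint32_t cons_head, cons_next, prod_tail, entries, mask;
        unsigned i;

        cons_head = r->cons.head;
        prod_tail = r->prod.tail;
        entries = prod_tail - cons_head;   /* modulo-32-bit arithmetic */
        mask = r->prod.mask;

        if (n > entries)
                n = entries;
        if (n == 0)
                return 0;

        cons_next = cons_head + n;
        r->cons.head = cons_next;

        /* patch 14846: the read barrier sits here, so the slot reads below
         * cannot be speculated ahead of the prod.tail load above */
        rte_smp_rmb();

        for (i = 0; i < n; i++)
                obj_table[i] = r->ring[(cons_head + i) & mask];

        /* the proposal quoted above: make sure the slot reads have completed
         * before producers can observe the new cons.tail and reuse the slots */
        rte_smp_wmb();
        r->cons.tail = cons_next;

        return n;
}

On IA both rte_smp_rmb() and rte_smp_wmb() should compile down to compiler
barriers, so the change is about correctness on weakly ordered CPUs rather
than extra cost here.
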
> >
> > As I said in previous email - it looks good to me for
> > _rte_ring_sc_do_dequeue(), but I am interested to hear what the ARM and PPC
> > maintainers think about it.
> > Jan, Jerin, do you have any comments on it?
>
> Actually it is NOT performance effective, and it is difficult to capture the
> ORDER dependency with plain store and load barriers on WEAKLY ordered machines.
> Beyond plain store and load barriers, we need to express the #LoadLoad,
> #LoadStore and #StoreStore barrier dependencies with acquire and release
> semantics in arch-neutral code (this looks like a compiler barrier on IA):
> http://preshing.com/20120913/acquire-and-release-semantics/
>
> For instance, a full-barrier CAS (__sync_bool_compare_and_swap) will not be
> required on a weakly ordered machine in the MP case.
> I can send out an RFC version of the ring implementation changes required
> with acquire-and-release semantics.
> If it causes a performance degradation on IA then we can separate it out
> through a conditional compilation flag.
>
> GCC Built-in Functions for Memory Model Aware Atomic Operations:
> https://gcc.gnu.org/onlinedocs/gcc/_005f_005fatomic-Builtins.html

I am not sure exactly what changes you are planning, but I suppose I'd just
wait for your RFC here (for concreteness, I have appended a rough sketch of an
acquire/release formulation at the end of this mail).
Though my question was: what do you think about the current
_rte_ring_sc_do_dequeue()?
Do you agree that rmb() is not sufficient here, and does Juhamatti's patch
http://dpdk.org/dev/patchwork/patch/14846/
look good to you?
It looks good to me, and I am going to ACK it, but I thought you'd better have
a look too.

Thanks
Konstantin

>
> Thoughts ?
>
> Jerin
>
> > Chao, sorry but I am still not sure why PPC is considered an architecture
> > with strong memory ordering?
> > Maybe I am missing something obvious here.
> > Thanks
> > Konstantin
> >
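
P.S. For concreteness, here is a rough sketch of what an acquire/release
formulation of the same SC dequeue path could look like with the GCC __atomic
builtins. This is only my illustration of the idea, not the RFC Jerin mentions,
and the function name is again made up.

#include <rte_ring.h>

/* Illustration only: the same SC dequeue sketch as above, but with the
 * ordering expressed through the GCC memory-model-aware builtins instead
 * of explicit rte_smp_*mb() calls. On IA the acquire load and release
 * store should reduce to compiler barriers; on weakly ordered CPUs
 * (ARM, POWER) they emit the required ordering instructions. */
static inline unsigned
sc_dequeue_sketch_acqrel(struct rte_ring *r, void **obj_table, unsigned n)
{
        uint32_t cons_head, cons_next, prod_tail, entries, mask;
        unsigned i;

        cons_head = r->cons.head;

        /* acquire: the slot reads below cannot be reordered before this load */
        prod_tail = __atomic_load_n(&r->prod.tail, __ATOMIC_ACQUIRE);
        entries = prod_tail - cons_head;
        mask = r->prod.mask;

        if (n > entries)
                n = entries;
        if (n == 0)
                return 0;

        cons_next = cons_head + n;
        r->cons.head = cons_next;

        for (i = 0; i < n; i++)
                obj_table[i] = r->ring[(cons_head + i) & mask];

        /* release: the slot reads above must complete before the new tail
         * becomes visible to producers */
        __atomic_store_n(&r->cons.tail, cons_next, __ATOMIC_RELEASE);

        return n;
}

If something like this generates the same code as the explicit barriers on IA,
the conditional compilation flag mentioned above might not even be needed - but
that is exactly what the RFC and the numbers would have to show.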