From: "Ananyev, Konstantin"
To: "Kuusisaari, Juhamatti", "'dev@dpdk.org'"
CC: "Jerin Jacob (jerin.jacob@caviumnetworks.com)", "Jan Viktorin (viktorin@rehivetech.com)", "Chao Zhu (bjzhuc@cn.ibm.com)"
Date: Thu, 14 Jul 2016 12:56:11 +0000
Message-ID: <2601191342CEEE43887BDE71AB97725836B7DD85@irsmsx105.ger.corp.intel.com>
Subject: Re: [dpdk-dev] [PATCH] lib: move rte_ring read barrier to correct location
Hi Juhamatti,

>
> Hi Konstantin,
>
> > > > > > It is quite safe to move the barrier before DEQUEUE because
> > > > > > after the DEQUEUE there is nothing really that we would want
> > > > > > to protect with a read barrier.
> > > > >
> > > > > I don't think so.
> > > > > If you remove the barrier after DEQUEUE(), that means on systems
> > > > > with relaxed memory ordering cons.tail could be updated before
> > > > > DEQUEUE() has finished, and the producer can overwrite queue
> > > > > entries that were not yet dequeued.
> > > > > So if the cpu can really do such speculative out-of-order loads,
> > > > > then we do need for __rte_ring_sc_do_dequeue() something like:
> > > > >
> > > > > rte_smp_rmb();
> > > > > DEQUEUE_PTRS();
> > > > > rte_smp_rmb();
> > >
> > > You have a valid point here: there needs to be a guarantee that
> > > cons_tail cannot be updated before DEQUEUE is completed. Nevertheless,
> > > my point was that it is not guaranteed with a read barrier anyway.
> > > The implementation has the following sequence:
> > >
> > > DEQUEUE_PTRS(); (i.e. READ/LOAD)
> > > rte_smp_rmb();
> > > ..
> > > r->cons.tail = cons_next; (i.e. WRITE/STORE)
> > >
> > > The read barrier above does not guarantee any ordering for the
> > > following writes/stores. As a guarantee is needed, I think we in fact
> > > need to change the read barrier on the dequeue to a full barrier,
> > > which guarantees the read+write order, as follows:
> > >
> > > DEQUEUE_PTRS();
> > > rte_smp_mb();
> > > ..
> > > r->cons.tail = cons_next;
> > >
> > > If you agree, I can for sure prepare another patch for this issue.
> >
> > Hmm, I think for __rte_ring_mc_do_dequeue() we are ok with smp_rmb(),
> > as we have to read cons.tail anyway.
>
> Are you certain that this read creates a strong enough dependency between
> the read of cons.tail and the write of it in mc_do_dequeue()?

Yes, I believe so.

> I think it does not really create any control dependency there, as the
> next write is not dependent on the result of the read.

I think it is dependent: cons.tail can be updated only if its current value
is equal to the precomputed cons_head. So the cpu has to read the cons.tail
value first.

> The CPU also knows already the value that will be written to cons.tail,
> and that value does not depend on the previous read either. The CPU does
> not know we are planning to do a spinlock there, so it might do things
> out-of-order without proper dependencies.
>
> > For __rte_ring_sc_do_dequeue(), I think you are right, we might need
> > something stronger.
> > I don't want to put rte_smp_mb() here as it would cause a full HW
> > barrier even on machines with strong memory order (IA).
> > I think that rte_smp_wmb() might be enough here:
> > it would force the cpu to wait till the writes in DEQUEUE_PTRS() become
> > visible, which means the reads have to be completed too.
>
> In practice I think that rte_smp_wmb() would work fine, even though it is
> not strictly according to the book. The solution below would be my
> proposal as a fix to the issue of sc dequeueing (and also to mc
> dequeueing, if we really have the problem there of the CPU completely
> ignoring the spinlock):
>
> DEQUEUE_PTRS();
> ..
> rte_smp_wmb();
> r->cons.tail = cons_next;

As I said in the previous email, it looks good to me for
__rte_ring_sc_do_dequeue(), but I am interested to hear what the ARM and
PPC maintainers think about it.
Jan, Jerin, do you have any comments on it?

Chao, sorry, but I am still not sure why PPC is considered an architecture
with strong memory ordering. I might be missing something obvious here.
Thanks,
Konstantin

>
> --
> Juhamatti
>
> > Another option would be to define a new macro: rte_weak_mb() or so,
> > that would be expanded into a CB on boxes with a strong memory model,
> > and into a full MB on machines with relaxed ones.
> > Interested to hear what the ARM and PPC guys think.
> > Konstantin
> >
> > P.S. Another thing, a bit off-topic, for the PPC guys:
> > As I can see, smp_rmb/smp_wmb are just compiler barriers:
> >
> > find lib/librte_eal/common/include/arch/ppc_64/ -type f | xargs grep smp_
> > lib/librte_eal/common/include/arch/ppc_64/rte_atomic.h:#define rte_smp_mb() rte_mb()
> > lib/librte_eal/common/include/arch/ppc_64/rte_atomic.h:#define rte_smp_wmb() rte_compiler_barrier()
> > lib/librte_eal/common/include/arch/ppc_64/rte_atomic.h:#define rte_smp_rmb() rte_compiler_barrier()
> >
> > My knowledge of the PPC architecture is rudimentary, but is that really enough?
> >
> > > Thanks,
> > > --
> > > Juhamatti
> > >
> > > > Konstantin
> > > >
> > > > > > The read barrier is mapped to a compiler barrier on strong
> > > > > > memory model systems, and this works fine too, as the order of
> > > > > > the head/tail updates is still guaranteed in the new location.
> > > > > > Even if the problem is only theoretical on most systems, it is
> > > > > > worth fixing, as the risk of problems is very low.
> > > > > >
> > > > > > --
> > > > > > Juhamatti
> > > > > >
> > > > > > > Konstantin
> > > > > > >
> > > > > > > > > > Signed-off-by: Juhamatti Kuusisaari
> > > > > > > > > >
> > > > > > > > > > ---
> > > > > > > > > >  lib/librte_ring/rte_ring.h | 6 ++++--
> > > > > > > > > >  1 file changed, 4 insertions(+), 2 deletions(-)
> > > > > > > > > >
> > > > > > > > > > diff --git a/lib/librte_ring/rte_ring.h b/lib/librte_ring/rte_ring.h
> > > > > > > > > > index eb45e41..a923e49 100644
> > > > > > > > > > --- a/lib/librte_ring/rte_ring.h
> > > > > > > > > > +++ b/lib/librte_ring/rte_ring.h
> > > > > > > > > > @@ -662,9 +662,10 @@ __rte_ring_mc_do_dequeue(struct rte_ring *r, void **obj_table,
> > > > > > > > > >  				cons_next);
> > > > > > > > > >  		} while (unlikely(success == 0));
> > > > > > > > > >
> > > > > > > > > > +	rte_smp_rmb();
> > > > > > > > > > +
> > > > > > > > > >  	/* copy in table */
> > > > > > > > > >  	DEQUEUE_PTRS();
> > > > > > > > > > -	rte_smp_rmb();
> > > > > > > > > >
> > > > > > > > > >  	/*
> > > > > > > > > >  	 * If there are other dequeues in progress that preceded us,
> > > > > > > > > > @@ -746,9 +747,10 @@ __rte_ring_sc_do_dequeue(struct rte_ring *r, void **obj_table,
> > > > > > > > > >  	cons_next = cons_head + n;
> > > > > > > > > >  	r->cons.head = cons_next;
> > > > > > > > > >
> > > > > > > > > > +	rte_smp_rmb();
> > > > > > > > > > +
> > > > > > > > > >  	/* copy in table */
> > > > > > > > > >  	DEQUEUE_PTRS();
> > > > > > > > > > -	rte_smp_rmb();
> > > > > > > > > >
> > > > > > > > > >  	__RING_STAT_ADD(r, deq_success, n);
> > > > > > > > > >  	r->cons.tail = cons_next;
> > > > > > > > > > --
> > > > > > > > > > 2.9.0