From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <konstantin.ananyev@intel.com>
Received: from mga02.intel.com (mga02.intel.com [134.134.136.20])
 by dpdk.org (Postfix) with ESMTP id 3CEAA37A4
 for <dev@dpdk.org>; Thu, 14 Jul 2016 14:56:15 +0200 (CEST)
Received: from fmsmga004.fm.intel.com ([10.253.24.48])
 by orsmga101.jf.intel.com with ESMTP; 14 Jul 2016 05:56:14 -0700
X-ExtLoop1: 1
X-IronPort-AV: E=Sophos;i="5.28,362,1464678000"; d="scan'208";a="139120869"
Received: from irsmsx153.ger.corp.intel.com ([163.33.192.75])
 by fmsmga004.fm.intel.com with ESMTP; 14 Jul 2016 05:56:13 -0700
Received: from irsmsx105.ger.corp.intel.com ([169.254.7.51]) by
 IRSMSX153.ger.corp.intel.com ([169.254.9.105]) with mapi id 14.03.0248.002;
 Thu, 14 Jul 2016 13:56:11 +0100
From: "Ananyev, Konstantin" <konstantin.ananyev@intel.com>
To: "Kuusisaari, Juhamatti" <Juhamatti.Kuusisaari@coriant.com>,
 "'dev@dpdk.org'" <dev@dpdk.org>
CC: "Jerin Jacob (jerin.jacob@caviumnetworks.com)"
 <jerin.jacob@caviumnetworks.com>, "Jan Viktorin (viktorin@rehivetech.com)"
 <viktorin@rehivetech.com>, "Chao Zhu (bjzhuc@cn.ibm.com)" <bjzhuc@cn.ibm.com>
Thread-Topic: [dpdk-dev] [PATCH] lib: move rte_ring read barrier to correct
 location
Thread-Index: AQHR214YM0lag5RGeUO4brWmDxiYS6ATCkbA///7T4CAACNU4IABC8IAgABhnkCAAH6EEIAAsicAgACKNzCAAPSCgIAAnknA
Date: Thu, 14 Jul 2016 12:56:11 +0000
Message-ID: <2601191342CEEE43887BDE71AB97725836B7DD85@irsmsx105.ger.corp.intel.com>
References: <20160711102055.14855-1-juhamatti.kuusisaari@coriant.com>
 <2601191342CEEE43887BDE71AB97725836B7C858@irsmsx105.ger.corp.intel.com>
 <HE1PR04MB1337E26DB33E9E9F8196FF0B9D3F0@HE1PR04MB1337.eurprd04.prod.outlook.com>
 <2601191342CEEE43887BDE71AB97725836B7C916@irsmsx105.ger.corp.intel.com>
 <HE1PR04MB1337546AF3AF7176E7B5FCF79D300@HE1PR04MB1337.eurprd04.prod.outlook.com>
 <2601191342CEEE43887BDE71AB97725836B7D20B@irsmsx105.ger.corp.intel.com>
 <2601191342CEEE43887BDE71AB97725836B7D628@irsmsx105.ger.corp.intel.com>
 <HE1PR04MB1337C6F5F76738289CD95F5B9D310@HE1PR04MB1337.eurprd04.prod.outlook.com>
 <2601191342CEEE43887BDE71AB97725836B7D850@irsmsx105.ger.corp.intel.com>
 <HE1PR04MB1337FF2B8D5258B9606303039D320@HE1PR04MB1337.eurprd04.prod.outlook.com>
In-Reply-To: <HE1PR04MB1337FF2B8D5258B9606303039D320@HE1PR04MB1337.eurprd04.prod.outlook.com>
Accept-Language: en-IE, en-US
Content-Language: en-US
X-MS-Has-Attach: 
X-MS-TNEF-Correlator: 
x-originating-ip: [163.33.239.180]
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: quoted-printable
MIME-Version: 1.0
Subject: Re: [dpdk-dev] [PATCH] lib: move rte_ring read barrier to
	correct	location
X-BeenThere: dev@dpdk.org
X-Mailman-Version: 2.1.15
Precedence: list
List-Id: patches and discussions about DPDK <dev.dpdk.org>
List-Unsubscribe: <http://dpdk.org/ml/options/dev>,
 <mailto:dev-request@dpdk.org?subject=unsubscribe>
List-Archive: <http://dpdk.org/ml/archives/dev/>
List-Post: <mailto:dev@dpdk.org>
List-Help: <mailto:dev-request@dpdk.org?subject=help>
List-Subscribe: <http://dpdk.org/ml/listinfo/dev>,
 <mailto:dev-request@dpdk.org?subject=subscribe>
X-List-Received-Date: Thu, 14 Jul 2016 12:56:17 -0000


Hi Juhamatti,

>
> Hi Konstantin,
>
> > > > > > It is quite safe to move the barrier before DEQUEUE because
> > > > > > after the DEQUEUE there is nothing really that we would want
> > > > > > to protect with a read barrier.
> > > > >
> > > > > I don't think so.
> > > > > If you remove the barrier after DEQUEUE(), that means on systems
> > > > > with relaxed memory ordering cons.tail could be updated before
> > > > > DEQUEUE() has finished, and the producer can overwrite queue
> > > > > entries that were not yet dequeued.
> > > > > So if the cpu can really do such speculative out-of-order loads,
> > > > > then for __rte_ring_sc_do_dequeue() we do need something like:
> > > > >
> > > > > rte_smp_rmb();
> > > > > DEQUEUE_PTRS();
> > > > > rte_smp_rmb();
> > >
> > > You have a valid point here: there needs to be a guarantee that
> > > cons.tail cannot be updated before DEQUEUE is completed. Nevertheless,
> > > my point was that it is not guaranteed with a read barrier anyway.
> > > The implementation has the following sequence:
> > >
> > > DEQUEUE_PTRS(); (i.e. READ/LOAD)
> > > rte_smp_rmb();
> > > ..
> > > r->cons.tail = cons_next; (i.e. WRITE/STORE)
> > >
> > > The above read barrier does not guarantee any ordering for the
> > > following writes/stores. As a guarantee is needed, I think we in fact
> > > need to change the read barrier on the dequeue to a full barrier,
> > > which guarantees the read+write order, as follows:
> > >
> > > DEQUEUE_PTRS();
> > > rte_smp_mb();
> > > ..
> > > r->cons.tail = cons_next;
> > >
> > > If you agree, I can for sure prepare another patch for this issue.
> >
> > Hmm, I think for __rte_ring_mc_do_dequeue() we are ok with smp_rmb(),
> > as we have to read cons.tail anyway.
>
> Are you certain that this read creates a strong enough dependency between the read of cons.tail and the write of it in mc_do_dequeue()?

Yes, I believe so.

> I think it does not really create any control dependency there, as the next write is not dependent on the result of the read.

I think it is dependent: cons.tail can be updated only if its current value
is equal to the previously computed cons_head.
So the cpu has to read the cons.tail value first.

> The CPU also knows already the value that will be written to cons.tail, and
> that value does not depend on the previous read either. The CPU does not
> know we are planning to do a spinlock there, so it might do things
> out-of-order without proper dependencies.
>
> > For __rte_ring_sc_do_dequeue(), I think you are right, we might need
> > something stronger.
> > I don't want to put rte_smp_mb() here as it would cause a full HW
> > barrier even on machines with strong memory order (IA).
> > I think that rte_smp_wmb() might be enough here:
> > it would force the cpu to wait till the writes in DEQUEUE_PTRS() become
> > visible, which means the reads have to be completed too.
>
> In practice I think that rte_smp_wmb() would work fine, even though it is
> not strictly by the book. The solution below would be my proposal as a fix
> to the issue of sc dequeueing (and also to mc dequeueing, if in reality we
> have the problem of the CPU completely ignoring the spinlock there):
>
> DEQUEUE_PTRS();
> ..
> rte_smp_wmb();
> r->cons.tail = cons_next;

As I said in the previous email - it looks good to me for __rte_ring_sc_do_dequeue(),
but I am interested to hear what the ARM and PPC maintainers think about it.
Jan, Jerin, do you have any comments on it?
Chao, sorry, but I am still not sure why PPC is considered an architecture
with strong memory ordering?
Maybe I am missing something obvious here.
Thanks
Konstantin

>
> --
>  Juhamatti
>
> > Another option would be to define a new macro: rte_weak_mb() or so,
> > that would be expanded into a compiler barrier on boxes with a strong
> > memory model, and into a full MB on machines with relaxed ones.
> > Interested to hear what the ARM and PPC guys think.
> > Konstantin
> >
> > P.S. Another thing, a bit off-topic - for the PPC guys:
> > As far as I can see, smp_rmb/smp_wmb are just compiler barriers:
> >
> > find lib/librte_eal/common/include/arch/ppc_64/ -type f | xargs grep smp_
> > lib/librte_eal/common/include/arch/ppc_64/rte_atomic.h:#define rte_smp_mb() rte_mb()
> > lib/librte_eal/common/include/arch/ppc_64/rte_atomic.h:#define rte_smp_wmb() rte_compiler_barrier()
> > lib/librte_eal/common/include/arch/ppc_64/rte_atomic.h:#define rte_smp_rmb() rte_compiler_barrier()
> >
> > My knowledge of the PPC architecture is rudimentary, but is that really enough?
> >
> > >
> > > Thanks,
> > > --
> > >  Juhamatti
> > >
> > > > > Konstantin
> > > > >
> > > > > > The read
> > > > > > barrier is mapped to a compiler barrier on strong memory model
> > > > > > systems and this works fine too, as the order of the head/tail
> > > > > > updates is still guaranteed in the new location. Even if the
> > > > > > problem would be theoretical on most systems, it is worth
> > > > > > fixing as the risk for problems is very low.
> > > > > >
> > > > > > --
> > > > > >  Juhamatti
> > > > > >
> > > > > > > Konstantin
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > > > > >
> > > > > > > > > > Signed-off-by: Juhamatti Kuusisaari
> > > > > > > > > > <juhamatti.kuusisaari@coriant.com>
> > > > > > > > > > ---
> > > > > > > > > >  lib/librte_ring/rte_ring.h | 6 ++++--
> > > > > > > > > >  1 file changed, 4 insertions(+), 2 deletions(-)
> > > > > > > > > >
> > > > > > > > > > diff --git a/lib/librte_ring/rte_ring.h
> > > > > > > > > > b/lib/librte_ring/rte_ring.h index eb45e41..a923e49
> > > > > > > > > > 100644
> > > > > > > > > > --- a/lib/librte_ring/rte_ring.h
> > > > > > > > > > +++ b/lib/librte_ring/rte_ring.h
> > > > > > > > > > @@ -662,9 +662,10 @@ __rte_ring_mc_do_dequeue(struct
> > > > > > > > > > rte_ring *r,
> > > > > > > > > void **obj_table,
> > > > > > > > > >                                               cons_next);
> > > > > > > > > >         } while (unlikely(success == 0));
> > > > > > > > > >
> > > > > > > > > > +       rte_smp_rmb();
> > > > > > > > > > +
> > > > > > > > > >         /* copy in table */
> > > > > > > > > >         DEQUEUE_PTRS();
> > > > > > > > > > -       rte_smp_rmb();
> > > > > > > > > >
> > > > > > > > > >         /*
> > > > > > > > > >          * If there are other dequeues in progress
> > > > > > > > > > that preceded us, @@ -746,9 +747,10 @@
> > > > > > > > > > __rte_ring_sc_do_dequeue(struct rte_ring *r,
> > > > > > > > > void **obj_table,
> > > > > > > > > >         cons_next = cons_head + n;
> > > > > > > > > >         r->cons.head = cons_next;
> > > > > > > > > >
> > > > > > > > > > +       rte_smp_rmb();
> > > > > > > > > > +
> > > > > > > > > >         /* copy in table */
> > > > > > > > > >         DEQUEUE_PTRS();
> > > > > > > > > > -       rte_smp_rmb();
> > > > > > > > > >
> > > > > > > > > >         __RING_STAT_ADD(r, deq_success, n);
> > > > > > > > > >         r->cons.tail = cons_next;
> > > > > > > > > > --
> > > > > > > > > > 2.9.0
> > > > > > > > > >
> > > > > > > > > >