DPDK patches and discussions
From: "Kuusisaari, Juhamatti" <Juhamatti.Kuusisaari@coriant.com>
To: "Ananyev, Konstantin" <konstantin.ananyev@intel.com>,
	"'dev@dpdk.org'" <dev@dpdk.org>
Subject: Re: [dpdk-dev] [PATCH] lib: move rte_ring read barrier to correct location
Date: Fri, 15 Jul 2016 05:40:56 +0000	[thread overview]
Message-ID: <HE1PR04MB133704D0F24EDBDB3AB9D95A9D330@HE1PR04MB1337.eurprd04.prod.outlook.com> (raw)
In-Reply-To: <2601191342CEEE43887BDE71AB97725836B7DD85@irsmsx105.ger.corp.intel.com>


Hi Konstantin,

> Hi Juhamatti,
> 
> >
> > Hi Konstantin,
> >
> > > > > > > It is quite safe to move the barrier before DEQUEUE because
> > > > > > > after the DEQUEUE there is nothing really that we would want
> > > > > > > to protect with a read barrier.
> > > > > >
> > > > > > I don't think so.
> > > > > > If you remove the barrier after DEQUEUE(), that means on systems
> > > > > > with relaxed memory ordering cons.tail could be updated before
> > > > > > DEQUEUE() is finished and the producer can overwrite queue
> > > > > > entries that were not yet dequeued.
> > > > > > So if the CPU can really do such speculative out-of-order loads,
> > > > > > then we do need something like the following for __rte_ring_sc_do_dequeue():
> > > > > >
> > > > > > rte_smp_rmb();
> > > > > > DEQUEUE_PTRS();
> > > > > > rte_smp_rmb();
> > > >
> > > > You have a valid point here: there needs to be a guarantee that
> > > > cons_tail cannot be updated before DEQUEUE is completed.
> > > > Nevertheless, my point was that it is not guaranteed with a read
> > > > barrier anyway. The implementation has the following sequence:
> > > >
> > > > DEQUEUE_PTRS(); (i.e. READ/LOAD)
> > > > rte_smp_rmb();
> > > > ..
> > > > r->cons.tail = cons_next; (i.e. WRITE/STORE)
> > > >
> > > > The read barrier above does not guarantee any ordering for the
> > > > following writes/stores. As a guarantee is needed, I think we in
> > > > fact need to change the read barrier on the dequeue to a full
> > > > barrier, which guarantees the read+write order, as follows:
> > > >
> > > > DEQUEUE_PTRS();
> > > > rte_smp_mb();
> > > > ..
> > > > r->cons.tail = cons_next;
> > > >
> > > > If you agree, I can for sure prepare another patch for this issue.
> > >
> > > Hmm, I think for __rte_ring_mc_do_dequeue() we are ok with
> > > smp_rmb(), as we have to read cons.tail anyway.
> >
> > Are you certain that this read creates a strong enough dependency
> > between the read of cons.tail and the write of it in mc_do_dequeue()?
> 
> Yes, I believe so.
> 
> > I think it does not really create any control dependency there, as the
> > next write does not depend on the result of the read.
> 
> I think it is dependent: cons.tail can be updated only if its current
> value is equal to the previously computed cons_head.
> So the CPU has to read the cons.tail value first.

I was thinking that it might match this processing pattern:

S1.         if (a == b)
S2.             a = a + b
S3.         b = a + b

Above, S3 has no control dependency on S1 and can in theory be executed
in parallel.

In the (simplified) implementation we have:

X1.         while (a != b)
X2.             { }
X3.         a = c

If we consider S3 and X3 equal, there might be a coherence problem
without a write barrier after the while(). It may of course be purely
theoretical, depending on the HW. This is anyway the reason for my
suggestion to have a write barrier also in mc_do_dequeue(), just before
the tail update.

--
 Juhamatti 

> > The CPU also already knows the value that will be written to
> > cons.tail, and that value does not depend on the previous read either.
> > The CPU does not know we are planning to do a spinlock there, so it
> > might do things out-of-order without proper dependencies.
> >
> > > For __rte_ring_sc_do_dequeue(), I think you are right, we might need
> > > something stronger.
> > > I don't want to put rte_smp_mb() here, as it would cause a full HW
> > > barrier even on machines with strong memory order (IA).
> > > I think that rte_smp_wmb() might be enough here:
> > > it would force the CPU to wait until the writes in DEQUEUE_PTRS()
> > > become visible, which means the reads have to be completed too.
> >
> > In practice I think that rte_smp_wmb() would work fine, even though it
> > is not strictly by the book. The solution below would be my proposal
> > as a fix for the sc dequeue issue (and also for the mc dequeue, if in
> > reality we have the problem of the CPU completely ignoring the
> > spinlock there):
> >
> > DEQUEUE_PTRS();
> > ..
> > rte_smp_wmb();
> > r->cons.tail = cons_next;
> 
> As I said in my previous email - it looks good to me for
> _rte_ring_sc_do_dequeue(), but I am interested to hear what ARM and PPC
> maintainers think about it.
> Jan, Jerin, do you have any comments on it?
> Chao, sorry, but I am still not sure why PPC is considered an
> architecture with strong memory ordering?
> Might be I am missing something obvious here.
> Thanks
> Konstantin
> 
> >
> > --
> >  Juhamatti
> >
> > > Another option would be to define a new macro, rte_weak_mb() or so,
> > > that would be expanded into a compiler barrier on boxes with a strong
> > > memory model, and into a full MB on machines with relaxed ones.
> > > Interested to hear what ARM and PPC guys think.
> > > Konstantin
> > >
> > > P.S. Another thing, a bit off-topic, for the PPC guys:
> > > As I can see, smp_rmb/smp_wmb are just compiler barriers:
> > > find lib/librte_eal/common/include/arch/ppc_64/ -type f | xargs grep smp_
> > > lib/librte_eal/common/include/arch/ppc_64/rte_atomic.h:#define rte_smp_mb() rte_mb()
> > > lib/librte_eal/common/include/arch/ppc_64/rte_atomic.h:#define rte_smp_wmb() rte_compiler_barrier()
> > > lib/librte_eal/common/include/arch/ppc_64/rte_atomic.h:#define rte_smp_rmb() rte_compiler_barrier()
> > > My knowledge of the PPC architecture is rudimentary, but is that really enough?
> > >
> > > >
> > > > Thanks,
> > > > --
> > > >  Juhamatti
> > > >
> > > > > > Konstantin
> > > > > >
> > > > > > > The read barrier is mapped to a compiler barrier on systems
> > > > > > > with a strong memory model, and this works fine too, as the
> > > > > > > order of the head/tail updates is still guaranteed in the new
> > > > > > > location. Even if the problem is theoretical on most systems,
> > > > > > > it is worth fixing, as the risk of the change causing
> > > > > > > problems is very low.
> > > > > > >
> > > > > > > --
> > > > > > >  Juhamatti
> > > > > > >
> > > > > > > > Konstantin
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > Signed-off-by: Juhamatti Kuusisaari
> > > > > > > > > > > <juhamatti.kuusisaari@coriant.com>
> > > > > > > > > > > ---
> > > > > > > > > > >  lib/librte_ring/rte_ring.h | 6 ++++--
> > > > > > > > > > >  1 file changed, 4 insertions(+), 2 deletions(-)
> > > > > > > > > > >
> > > > > > > > > > > diff --git a/lib/librte_ring/rte_ring.h
> > > > > > > > > > > b/lib/librte_ring/rte_ring.h index eb45e41..a923e49
> > > > > > > > > > > 100644
> > > > > > > > > > > --- a/lib/librte_ring/rte_ring.h
> > > > > > > > > > > +++ b/lib/librte_ring/rte_ring.h
> > > > > > > > > > > @@ -662,9 +662,10 @@
> __rte_ring_mc_do_dequeue(struct
> > > > > > > > > > > rte_ring *r,
> > > > > > > > > > void **obj_table,
> > > > > > > > > > >                                               cons_next);
> > > > > > > > > > >         } while (unlikely(success == 0));
> > > > > > > > > > >
> > > > > > > > > > > +       rte_smp_rmb();
> > > > > > > > > > > +
> > > > > > > > > > >         /* copy in table */
> > > > > > > > > > >         DEQUEUE_PTRS();
> > > > > > > > > > > -       rte_smp_rmb();
> > > > > > > > > > >
> > > > > > > > > > >         /*
> > > > > > > > > > >          * If there are other dequeues in progress
> > > > > > > > > > > that preceded us, @@ -746,9 +747,10 @@
> > > > > > > > > > > __rte_ring_sc_do_dequeue(struct rte_ring *r,
> > > > > > > > > > void **obj_table,
> > > > > > > > > > >         cons_next = cons_head + n;
> > > > > > > > > > >         r->cons.head = cons_next;
> > > > > > > > > > >
> > > > > > > > > > > +       rte_smp_rmb();
> > > > > > > > > > > +
> > > > > > > > > > >         /* copy in table */
> > > > > > > > > > >         DEQUEUE_PTRS();
> > > > > > > > > > > -       rte_smp_rmb();
> > > > > > > > > > >
> > > > > > > > > > >         __RING_STAT_ADD(r, deq_success, n);
> > > > > > > > > > >         r->cons.tail = cons_next;
> > > > > > > > > > > --
> > > > > > > > > > > 2.9.0
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >

Thread overview: 17+ messages
2016-07-11 10:20 Juhamatti Kuusisaari
2016-07-11 10:41 ` Ananyev, Konstantin
2016-07-11 11:22   ` Kuusisaari, Juhamatti
2016-07-11 11:40     ` Olivier Matz
2016-07-12  4:10       ` Kuusisaari, Juhamatti
2016-07-11 12:34     ` Ananyev, Konstantin
2016-07-12  5:27       ` Kuusisaari, Juhamatti
2016-07-12 11:01         ` Ananyev, Konstantin
2016-07-12 17:58           ` Ananyev, Konstantin
2016-07-13  5:27             ` Kuusisaari, Juhamatti
2016-07-13 13:00               ` Ananyev, Konstantin
2016-07-14  4:17                 ` Kuusisaari, Juhamatti
2016-07-14 12:56                   ` Ananyev, Konstantin
2016-07-15  5:40                     ` Kuusisaari, Juhamatti [this message]
2016-07-15  6:29                     ` Jerin Jacob
2016-07-15 10:34                       ` Ananyev, Konstantin
2016-07-18  2:47                         ` Jerin Jacob
