DPDK patches and discussions
From: "Kuusisaari, Juhamatti" <Juhamatti.Kuusisaari@coriant.com>
To: "Ananyev, Konstantin" <konstantin.ananyev@intel.com>,
	"dev@dpdk.org" <dev@dpdk.org>
Subject: Re: [dpdk-dev] [PATCH] lib: move rte_ring read barrier to correct location
Date: Tue, 12 Jul 2016 05:27:30 +0000	[thread overview]
Message-ID: <HE1PR04MB1337546AF3AF7176E7B5FCF79D300@HE1PR04MB1337.eurprd04.prod.outlook.com> (raw)
In-Reply-To: <2601191342CEEE43887BDE71AB97725836B7C916@irsmsx105.ger.corp.intel.com>


Hello,

> > > > -----Original Message-----
> > > > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Juhamatti
> > > > Kuusisaari
> > > > Sent: Monday, July 11, 2016 11:21 AM
> > > > To: dev@dpdk.org
> > > > Subject: [dpdk-dev] [PATCH] lib: move rte_ring read barrier to
> > > > correct location
> > > >
> > > > Fix the location of the rte_ring data dependency read barrier.
> > > > It needs to be called before accessing indexed data to ensure that
> > > > the data itself is guaranteed to be correctly updated.
> > > >
> > > > See more details at kernel/Documentation/memory-barriers.txt
> > > > section 'Data dependency barriers'.
> > >
> > >
> > > Any explanation why?
> > > From my point of view, the smp_rmb()s are in the proper places here :)
> > > Konstantin
> >
> > The problem here is that on a weak memory model system the CPU is
> > allowed to load the address data out-of-order in advance.
> > If the read barrier is after the DEQUEUE, you might end up having the
> > old data there in a race situation when the buffer is continuously full.
> > Having it before the DEQUEUE guarantees that the load is not done in
> > advance.
> 
> Sorry, still didn't see any race condition in the current code.
> Can you provide any particular example?
> From the other side, moving smp_rmb() before dequeuing the objects could
> introduce a race condition on CPUs where later writes can be reordered with
> earlier reads.

Here is a simplified example sequence, from a timing perspective:
1. Consumer CPU (CCPU) loads value y from r->ring[x] out-of-order
(this is the key of the problem)
2. Producer CPU (PCPU) updates r->ring[x] to value z
3. PCPU updates prod_tail to be x
4. CCPU updates cons_head to be x
5. CCPU "loads" r->ring[x] by reusing the out-of-order loaded value y [which is z in reality]
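
To make the ordering concrete, below is a minimal C sketch of the consumer side.
It is illustrative only, not the actual rte_ring code; the struct layout and the
name deq_no_barrier() are hypothetical, and the wrap-around handling of a real
ring is omitted. It shows where the speculative load from step 1 can bite when
no read barrier is present:

#include <stdint.h>

#define RING_SIZE 1024

/* Simplified ring; layout and names are illustrative, not DPDK's exact ones. */
struct ring {
        volatile uint32_t prod_tail;      /* published by PCPU after the data */
        volatile uint32_t cons_head;
        void *slots[RING_SIZE];
};

/* Broken consumer: no read barrier between observing prod_tail / updating
 * cons_head and loading the slot. On a weakly ordered CPU the load of
 * r->slots[x] may already have been performed speculatively (step 1) and
 * returned the stale value y, even though prod_tail says the slot is valid. */
static void *
deq_no_barrier(struct ring *r, uint32_t x)
{
        while (r->prod_tail <= x)         /* wait for PCPU: steps 2-3 */
                ;
        r->cons_head = x;                 /* step 4 */
        return r->slots[x];               /* step 5: may reuse the stale load */
}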

The problem here is that on a weak memory model, the CCPU is allowed to load the
r->ring[x] value in advance if it decides to do so (the CCPU needs to be able to
see in advance that x will be an interesting index worth loading). The index
value x is updated atomically, but that does not matter here. Also, the write
barrier on the PCPU side guarantees that the CCPU cannot see the update of x
before the PCPU has really updated r->ring[x] to z and moved the tail, but it
still allows the CCPU to perform the out-of-order load without a proper read
barrier.
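
For completeness, here is a matching producer sketch for the same hypothetical
struct (again only an illustration; __atomic_thread_fence(__ATOMIC_RELEASE)
stands in for the rte_smp_wmb() used by rte_ring). The write barrier orders the
slot write before the tail update, but it cannot prevent the consumer from
having loaded the slot early:

/* Producer side of the same sketch: the write barrier guarantees that the
 * slot data (z) is visible before the new prod_tail (steps 2-3). It does
 * not, however, stop the CCPU from having loaded slots[x] in advance. */
static void
enq_one(struct ring *r, uint32_t x, void *obj)
{
        r->slots[x] = obj;                           /* step 2: write z */
        __atomic_thread_fence(__ATOMIC_RELEASE);     /* stands in for rte_smp_wmb() */
        r->prod_tail = x + 1;                        /* step 3: publish the index */
}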

When the read barrier is moved between steps 4 and 5, it invalidates any
out-of-order loads done so far, forcing the CCPU to drop the stale value y of
r->ring[x] and load the current value z.
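
In sketch form, the proposed placement looks like this (same hypothetical ring
as above; __atomic_thread_fence(__ATOMIC_ACQUIRE) stands in for rte_smp_rmb()):

/* Consumer with the read barrier between updating cons_head (step 4) and
 * loading the slot (step 5). The barrier forbids the CPU from satisfying
 * the slot load with a value fetched before prod_tail became visible, so
 * the current value z is read instead of the stale y. */
static void *
deq_with_barrier(struct ring *r, uint32_t x)
{
        while (r->prod_tail <= x)
                ;
        r->cons_head = x;
        __atomic_thread_fence(__ATOMIC_ACQUIRE);     /* stands in for rte_smp_rmb() */
        return r->slots[x];
}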

The ring queue appears to work well because this is a rare corner case. Due to
the head/tail structure, the problem requires the queue to be full, and the CCPU
also needs to see the r->ring[x] update later than it performs the out-of-order
load. In addition, the hardware needs to be able to predict and choose to load
from the future index (which should be quite possible, considering modern CPUs).
If you have seen problems in the past and noticed that a larger ring queue works
better as a workaround, you may already have encountered this problem.

It is quite safe to move the barrier before the DEQUEUE because after the
DEQUEUE there is nothing we really need to protect with a read barrier. The read
barrier maps to a compiler barrier on strong memory model systems, and this
works fine too, as the order of the head/tail updates is still guaranteed at the
new location. Even if the problem is only theoretical on most systems, it is
worth fixing, as the risk of the change itself causing problems is very low.
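
As a rough illustration of how such a barrier macro typically maps per
architecture (the real rte_smp_rmb() definitions live in DPDK's per-arch headers
and vary by version; my_smp_rmb() below is purely hypothetical): a strongly
ordered x86 CPU only needs the compiler told not to reorder, while a weakly
ordered CPU such as ARMv8 needs a real hardware barrier instruction:

#if defined(__x86_64__) || defined(__i386__)
#define my_smp_rmb() asm volatile ("" : : : "memory")           /* compiler barrier only */
#elif defined(__aarch64__)
#define my_smp_rmb() asm volatile ("dmb ishld" : : : "memory")  /* load-load/load-store barrier */
#else
#define my_smp_rmb() __atomic_thread_fence(__ATOMIC_ACQUIRE)    /* portable fallback */
#endif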

--
 Juhamatti

> Konstantin




> > > >
> > > > Signed-off-by: Juhamatti Kuusisaari
> > > > <juhamatti.kuusisaari@coriant.com>
> > > > ---
> > > >  lib/librte_ring/rte_ring.h | 6 ++++--
> > > >  1 file changed, 4 insertions(+), 2 deletions(-)
> > > >
> > > > diff --git a/lib/librte_ring/rte_ring.h
> > > > b/lib/librte_ring/rte_ring.h index eb45e41..a923e49 100644
> > > > --- a/lib/librte_ring/rte_ring.h
> > > > +++ b/lib/librte_ring/rte_ring.h
> > > > @@ -662,9 +662,10 @@ __rte_ring_mc_do_dequeue(struct rte_ring *r,
> > > void **obj_table,
> > > >                                               cons_next);
> > > >         } while (unlikely(success == 0));
> > > >
> > > > +       rte_smp_rmb();
> > > > +
> > > >         /* copy in table */
> > > >         DEQUEUE_PTRS();
> > > > -       rte_smp_rmb();
> > > >
> > > >         /*
> > > >          * If there are other dequeues in progress that preceded
> > > > us, @@ -746,9 +747,10 @@ __rte_ring_sc_do_dequeue(struct rte_ring
> > > > *r,
> > > void **obj_table,
> > > >         cons_next = cons_head + n;
> > > >         r->cons.head = cons_next;
> > > >
> > > > +       rte_smp_rmb();
> > > > +
> > > >         /* copy in table */
> > > >         DEQUEUE_PTRS();
> > > > -       rte_smp_rmb();
> > > >
> > > >         __RING_STAT_ADD(r, deq_success, n);
> > > >         r->cons.tail = cons_next;
> > > > --
> > > > 2.9.0
> > > >
> > > >
> > > >
> > >
