From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mga14.intel.com (mga14.intel.com [192.55.52.115]) by dpdk.org (Postfix) with ESMTP id 0C89B952 for ; Tue, 12 Jul 2016 19:58:03 +0200 (CEST) Received: from fmsmga002.fm.intel.com ([10.253.24.26]) by fmsmga103.fm.intel.com with ESMTP; 12 Jul 2016 10:58:03 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.28,352,1464678000"; d="scan'208";a="1020318854" Received: from irsmsx154.ger.corp.intel.com ([163.33.192.96]) by fmsmga002.fm.intel.com with ESMTP; 12 Jul 2016 10:58:02 -0700 Received: from irsmsx105.ger.corp.intel.com ([169.254.7.51]) by IRSMSX154.ger.corp.intel.com ([169.254.12.28]) with mapi id 14.03.0248.002; Tue, 12 Jul 2016 18:58:01 +0100 From: "Ananyev, Konstantin" To: "'Kuusisaari, Juhamatti'" , "'dev@dpdk.org'" Thread-Topic: [dpdk-dev] [PATCH] lib: move rte_ring read barrier to correct location Thread-Index: AQHR214YM0lag5RGeUO4brWmDxiYS6ATCkbA///7T4CAACNU4IABC8IAgABhnkCAAH6EEA== Date: Tue, 12 Jul 2016 17:58:00 +0000 Message-ID: <2601191342CEEE43887BDE71AB97725836B7D628@irsmsx105.ger.corp.intel.com> References: <20160711102055.14855-1-juhamatti.kuusisaari@coriant.com> <2601191342CEEE43887BDE71AB97725836B7C858@irsmsx105.ger.corp.intel.com> <2601191342CEEE43887BDE71AB97725836B7C916@irsmsx105.ger.corp.intel.com> <2601191342CEEE43887BDE71AB97725836B7D20B@irsmsx105.ger.corp.intel.com> In-Reply-To: <2601191342CEEE43887BDE71AB97725836B7D20B@irsmsx105.ger.corp.intel.com> Accept-Language: en-IE, en-US Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: x-originating-ip: [163.33.239.182] Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 Subject: Re: [dpdk-dev] [PATCH] lib: move rte_ring read barrier to correct location X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: patches and discussions about DPDK List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 12 Jul 2016 17:58:04 -0000 >=20 >=20 > Hi Juhamatti, >=20 > > > > Hello, > > > > > > > > -----Original Message----- > > > > > > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Juhamatti > > > > > > Kuusisaari > > > > > > Sent: Monday, July 11, 2016 11:21 AM > > > > > > To: dev@dpdk.org > > > > > > Subject: [dpdk-dev] [PATCH] lib: move rte_ring read barrier to > > > > > > correct location > > > > > > > > > > > > Fix the location of the rte_ring data dependency read barrier. > > > > > > It needs to be called before accessing indexed data to ensure t= hat > > > > > > the data itself is guaranteed to be correctly updated. > > > > > > > > > > > > See more details at kernel/Documentation/memory-barriers.txt > > > > > > section 'Data dependency barriers'. > > > > > > > > > > > > > > > Any explanation why? > > > > > From my point smp_rmb()s are on the proper places here :) Konstan= tin > > > > > > > > The problem here is that on a weak memory model system the CPU is > > > > allowed to load the address data out-of-order in advance. > > > > If the read barrier is after the DEQUEUE, you might end up having t= he > > > > old data there on a race situation when the buffer is continuously = full. > > > > Having it before the DEQUEUE guarantees that the load is not done i= n > > > > advance. > > > > > > Sorry, still didn't see any race condition in the current code. > > > Can you provide any particular example? > > > From other side, moving smp_rmb() before dequeueing the objects, coul= d > > > introduce a race condition, on cpus where later writes can be reorder= ed with > > > earlier reads. > > > > Here is a simplified example sequence from time perspective: > > 1. Consumer CPU (CCPU) loads value y from r->ring[x] out-of-order > > (the key of the problem) >=20 > To read the value of ring[x] cpu has to calculate x first. > And to calculate x it needs to read cons.head and prod.tail first. > Are you saying that some modern cpu can: > -'speculate' value of cons.head and prod.tail > (based on what?) > -calculate x based on these speculated values. > - read ring[x] > - read cons.head and prod.tail > - if read values are not equal to speculated ones , then > re-caluclate x and re-read ring[x] > - else use speculatively read ring[x] > ? > If such thing is possible (is it really? and if yes on which cpu?), As I can see, neither ARM or PPC support such things. Both of them do obey address dependency. (ARM & PPC guys feel free to correct me here, if I am wrong here). So what cpu we are talking about? Konstantin > then yes, we might need an extra smp_rmb() before DEQUEUE_PTRS() > for __rte_ring_sc_do_dequeue(). > For __rte_ring_mc_do_dequeue(), I think we are ok, as > there is CAS just before DEQUEUE_PTRS(). >=20 > > 2. Producer CPU (PCPU) updates r->ring[x] to value be z > > 3. PCPU updates prod_tail to be x > > 4. CCPU updates cons_head to be x > > 5. CCPU loads r->ring[x] by using out-of-order loaded value y [is z in = reality] > > > > The problem here is that on weak memory model, the CCPU is allowed to l= oad > > r->ring[x] value in advance, if it decides to do so (CCPU needs to be a= ble to see > > in advance that x will be an interesting index worth loading). The inde= x value x > > is updated atomically, but it does not matter here. Also, the write ba= rrier on PCPU > > side guarantees that CCPU cannot see update of x before PCPU has really= updated > > the r->ring[x] to z and moved the tail, but still allows to do the out-= of-order loads > > without proper read barrier. > > > > When the read barrier is moved between steps 4 and 5, it disallows to u= se > > any out-of-order loads so far and forces to drop r->ring[x] y value and > > load current value z. > > > > The ring queue appears to work well as this is a rare corner case. Due = to the > > head,tail-structure the problem needs queue to be full and also CCPU ne= eds > > to see r->ring[x] update later than it does the out-of-order load. In a= ddition, > > the HW needs to be able to predict and choose the load to the future in= dex > > (which should be quite possible, considering modern CPUs). If you have = seen > > in the past problems and noticed that a larger ring queue works better = as a > > workaround, you may have encountered the problem already. >=20 > I don't understand what means 'larger rings works better' here. > What we are talking about is race condition, that if hit, would > cause data corruption and most likely a crash. >=20 > > > > It is quite safe to move the barrier before DEQUEUE because after the D= EQUEUE > > there is nothing really that we would want to protect with a read barri= er. >=20 > I don't think so. > If you remove barrier after DEQUEUE(), that means on systems with relaxed= memory ordering > cons.tail could be updated before DEQUEUE() will be finished and producer= can overwrite > queue entries that were not yet dequeued. > So if cpu can really do such speculative out of order loads, > then we do need for __rte_ring_sc_do_dequeue() something like: >=20 > rte_smp_rmb(); > DEQUEUE_PTRS(); > rte_smp_rmb(); >=20 > Konstantin >=20 > > The read > > barrier is mapped to a compiler barrier on strong memory model systems = and this > > works fine too as the order of the head,tail updates is still guarantee= d on the new > > location. Even if the problem would be theoretical on most systems, it = is worth fixing > > as the risk for problems is very low. > > > > -- > > Juhamatti > > > > > Konstantin > > > > > > > > > > > > > > > > > > > > Signed-off-by: Juhamatti Kuusisaari > > > > > > > > > > > > --- > > > > > > lib/librte_ring/rte_ring.h | 6 ++++-- > > > > > > 1 file changed, 4 insertions(+), 2 deletions(-) > > > > > > > > > > > > diff --git a/lib/librte_ring/rte_ring.h > > > > > > b/lib/librte_ring/rte_ring.h index eb45e41..a923e49 100644 > > > > > > --- a/lib/librte_ring/rte_ring.h > > > > > > +++ b/lib/librte_ring/rte_ring.h > > > > > > @@ -662,9 +662,10 @@ __rte_ring_mc_do_dequeue(struct rte_ring *= r, > > > > > void **obj_table, > > > > > > cons_next); > > > > > > } while (unlikely(success =3D=3D 0)); > > > > > > > > > > > > + rte_smp_rmb(); > > > > > > + > > > > > > /* copy in table */ > > > > > > DEQUEUE_PTRS(); > > > > > > - rte_smp_rmb(); > > > > > > > > > > > > /* > > > > > > * If there are other dequeues in progress that precede= d > > > > > > us, @@ -746,9 +747,10 @@ __rte_ring_sc_do_dequeue(struct rte_ri= ng > > > > > > *r, > > > > > void **obj_table, > > > > > > cons_next =3D cons_head + n; > > > > > > r->cons.head =3D cons_next; > > > > > > > > > > > > + rte_smp_rmb(); > > > > > > + > > > > > > /* copy in table */ > > > > > > DEQUEUE_PTRS(); > > > > > > - rte_smp_rmb(); > > > > > > > > > > > > __RING_STAT_ADD(r, deq_success, n); > > > > > > r->cons.tail =3D cons_next; > > > > > > -- > > > > > > 2.9.0 > > > > > > > > > > > > > > > > > > > > > > > > > > =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D > > > > > =3D=3D > > > > > > The information contained in this message may be privileged and > > > > > > confidential and protected from disclosure. If the reader of th= is > > > > > > message is not the intended recipient, or an employee or agent > > > > > > responsible for delivering this message to the intended recipie= nt, > > > > > > you are hereby notified that any reproduction, dissemination or > > > > > > distribution of this communication is strictly prohibited. If y= ou > > > > > > have received this communication in error, please notify us > > > > > > immediately by replying to the message and deleting it from you= r > > > computer. Thank you. > > > > > > Coriant-Tellabs > > > > > > > > > > > > > > =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D > > > > > =3D=3D