From mboxrd@z Thu Jan 1 00:00:00 1970
From: "Kuusisaari, Juhamatti"
To: "Ananyev, Konstantin", "'dev@dpdk.org'"
Subject: Re: [dpdk-dev] [PATCH] lib: move rte_ring read barrier to correct location
Date: Wed, 13 Jul 2016 05:27:20 +0000
In-Reply-To: <2601191342CEEE43887BDE71AB97725836B7D628@irsmsx105.ger.corp.intel.com>
References: <20160711102055.14855-1-juhamatti.kuusisaari@coriant.com>
 <2601191342CEEE43887BDE71AB97725836B7C858@irsmsx105.ger.corp.intel.com>
 <2601191342CEEE43887BDE71AB97725836B7C916@irsmsx105.ger.corp.intel.com>
 <2601191342CEEE43887BDE71AB97725836B7D20B@irsmsx105.ger.corp.intel.com>
 <2601191342CEEE43887BDE71AB97725836B7D628@irsmsx105.ger.corp.intel.com>

Hello,

> > Hi Juhamatti,
> >
> > > Hello,
> > >
> > > > > > -----Original Message-----
> > > > > > > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of
> > > > > > > Juhamatti Kuusisaari
> > > > > > > Sent: Monday, July 11, 2016 11:21 AM
> > > > > > > To: dev@dpdk.org
> > > > > > > Subject: [dpdk-dev] [PATCH] lib: move rte_ring read barrier
> > > > > > > to correct location
> > > > > > >
> > > > > > > Fix the location of the rte_ring data dependency read barrier.
> > > > > > > It needs to be called before accessing indexed data to ensure
> > > > > > > that the data itself is guaranteed to be correctly updated.
> > > > > > >
> > > > > > > See more details at kernel/Documentation/memory-barriers.txt,
> > > > > > > section 'Data dependency barriers'.
> > > > > >
> > > > > > Any explanation why?
> > > > > > From my point smp_rmb()s are on the proper places here :)
> > > > > > Konstantin
> > > > >
> > > > > The problem here is that on a weak memory model system the CPU
> > > > > is allowed to load the address data out-of-order in advance.
> > > > > If the read barrier is after the DEQUEUE, you might end up
> > > > > having the old data there in a race situation when the buffer is
> > > > > continuously full. Having it before the DEQUEUE guarantees that
> > > > > the load is not done in advance.
> > > >
> > > > Sorry, still didn't see any race condition in the current code.
> > > > Can you provide any particular example?
> > > > From the other side, moving smp_rmb() before dequeueing the objects
> > > > could introduce a race condition on cpus where later writes can
> > > > be reordered with earlier reads.
> > >
> > > Here is a simplified example sequence from a time perspective:
> > > 1. Consumer CPU (CCPU) loads value y from r->ring[x] out-of-order
> > >    (the key of the problem)
> >
> > To read the value of ring[x] the cpu has to calculate x first.
> > And to calculate x it needs to read cons.head and prod.tail first.
> > Are you saying that some modern cpu can:
> > - 'speculate' the value of cons.head and prod.tail
> >   (based on what?)
> > - calculate x based on these speculated values
> > - read ring[x]
> > - read cons.head and prod.tail
> > - if the read values are not equal to the speculated ones, then
> >   re-calculate x and re-read ring[x]
> > - else use the speculatively read ring[x]
> > ?
> > If such a thing is possible (is it really? and if yes, on which cpu?),
>
> As I can see, neither ARM nor PPC support such things.
> Both of them do obey address dependency.
> (ARM & PPC guys feel free to correct me here, if I am wrong.)
> So what cpu are we talking about?

I checked that too; indeed, the problem I described seems to be more academic
than even theoretical and does not apply to current CPUs. So I agree here, and
this makes this patch unneeded, I'll withdraw it. However, the implementation
may still have another issue, see below.

> > then yes, we might need an extra smp_rmb() before DEQUEUE_PTRS() for
> > __rte_ring_sc_do_dequeue().
> > For __rte_ring_mc_do_dequeue(), I think we are ok, as there is a CAS
> > just before DEQUEUE_PTRS().
> >
> > > 2. Producer CPU (PCPU) updates r->ring[x] to value z
> > > 3. PCPU updates prod_tail to be x
> > > 4. CCPU updates cons_head to be x
> > > 5. CCPU loads r->ring[x] by using the out-of-order loaded value y
> > >    [the real value is z]
> > >
> > > The problem here is that on a weak memory model, the CCPU is allowed
> > > to load the r->ring[x] value in advance, if it decides to do so (the
> > > CCPU needs to be able to see in advance that x will be an interesting
> > > index worth loading). The index value x is updated atomically, but it
> > > does not matter here. Also, the write barrier on the PCPU side
> > > guarantees that the CCPU cannot see the update of x before the PCPU
> > > has really updated r->ring[x] to z and moved the tail, but it still
> > > allows the out-of-order loads without a proper read barrier.
> > >
> > > When the read barrier is moved between steps 4 and 5, it disallows
> > > any out-of-order loads done so far and forces the CCPU to drop the
> > > r->ring[x] value y and load the current value z.
> > >
> > > The ring queue appears to work well as this is a rare corner case.
> > > Due to the head,tail structure the problem needs the queue to be full
> > > and also the CCPU needs to see the r->ring[x] update later than it
> > > does the out-of-order load. In addition, the HW needs to be able to
> > > predict and choose the load to the future index (which should be
> > > quite possible, considering modern CPUs). If you have seen problems
> > > in the past and noticed that a larger ring queue works better as a
> > > workaround, you may have encountered the problem already.
> >
> > I don't understand what 'larger rings work better' means here.
> > What we are talking about is a race condition that, if hit, would cause
> > data corruption and most likely a crash.

The larger ring queue length makes the problem more infrequent, as the queue
has more available free space and the problem does not occur without the
queue being full. The symptoms apply to the new problem I describe below too.

> > >
> > > It is quite safe to move the barrier before DEQUEUE because after
> > > the DEQUEUE there is nothing really that we would want to protect
> > > with a read barrier.
> >
> > I don't think so.
> > If you remove the barrier after DEQUEUE(), that means that on systems
> > with relaxed memory ordering cons.tail could be updated before DEQUEUE()
> > is finished and the producer can overwrite queue entries that were not
> > yet dequeued.
> > So if the cpu can really do such speculative out-of-order loads, then we
> > do need for __rte_ring_sc_do_dequeue() something like:
> >
> > rte_smp_rmb();
> > DEQUEUE_PTRS();
> > rte_smp_rmb();

You have a valid point here: there needs to be a guarantee that cons_tail
cannot be updated before DEQUEUE is completed. Nevertheless, my point was that
this is not guaranteed with a read barrier anyway. The implementation has the
following sequence:

DEQUEUE_PTRS();            (i.e. READ/LOAD)
rte_smp_rmb();
..
r->cons.tail = cons_next;  (i.e. WRITE/STORE)

The read barrier above does not guarantee any ordering for the following
writes/stores. As a guarantee is needed, I think we in fact need to change the
read barrier on the dequeue side to a full barrier, which guarantees the
read+write order, as follows:

DEQUEUE_PTRS();
rte_smp_mb();
..
r->cons.tail = cons_next;

If you agree, I can for sure prepare another patch for this issue.

Thanks,
--
Juhamatti

> > Konstantin
> >
> > > The read barrier is mapped to a compiler barrier on strong memory
> > > model systems and this works fine too, as the order of the head,tail
> > > updates is still guaranteed on the new location. Even if the problem
> > > would be theoretical on most systems, it is worth fixing as the risk
> > > for problems is very low.
> > >
> > > --
> > > Juhamatti
> > >
> > > > Konstantin
> > > >
> > > > > > > Signed-off-by: Juhamatti Kuusisaari
> > > > > > >
> > > > > > > ---
> > > > > > >  lib/librte_ring/rte_ring.h | 6 ++++--
> > > > > > >  1 file changed, 4 insertions(+), 2 deletions(-)
> > > > > > >
> > > > > > > diff --git a/lib/librte_ring/rte_ring.h b/lib/librte_ring/rte_ring.h
> > > > > > > index eb45e41..a923e49 100644
> > > > > > > --- a/lib/librte_ring/rte_ring.h
> > > > > > > +++ b/lib/librte_ring/rte_ring.h
> > > > > > > @@ -662,9 +662,10 @@ __rte_ring_mc_do_dequeue(struct rte_ring *r, void **obj_table,
> > > > > > >                                cons_next);
> > > > > > >          } while (unlikely(success == 0));
> > > > > > >
> > > > > > > +        rte_smp_rmb();
> > > > > > > +
> > > > > > >          /* copy in table */
> > > > > > >          DEQUEUE_PTRS();
> > > > > > > -        rte_smp_rmb();
> > > > > > >
> > > > > > >          /*
> > > > > > >           * If there are other dequeues in progress that preceded us,
> > > > > > > @@ -746,9 +747,10 @@ __rte_ring_sc_do_dequeue(struct rte_ring *r, void **obj_table,
> > > > > > >          cons_next = cons_head + n;
> > > > > > >          r->cons.head = cons_next;
> > > > > > >
> > > > > > > +        rte_smp_rmb();
> > > > > > > +
> > > > > > >          /* copy in table */
> > > > > > >          DEQUEUE_PTRS();
> > > > > > > -        rte_smp_rmb();
> > > > > > >
> > > > > > >          __RING_STAT_ADD(r, deq_success, n);
> > > > > > >          r->cons.tail = cons_next;
> > > > > > > --
> > > > > > > 2.9.0
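
To make the ordering requirement discussed above concrete, here is a minimal
standalone C11 sketch of a single-consumer dequeue. The toy_ring structure,
TOY_RING_SIZE and toy_sc_dequeue() are made-up names for illustration only and
do not reflect the real rte_ring layout; the sketch merely models the point
argued above, namely that the tail publication has to be ordered after the
slot loads, which a release store (or the proposed rte_smp_mb()) provides and
which a load-only barrier such as rte_smp_rmb() is not guaranteed to provide.

#include <stdatomic.h>

#define TOY_RING_SIZE 1024u          /* power of two so the index can be masked */

struct toy_ring {
        void *slots[TOY_RING_SIZE];
        atomic_uint cons_head;       /* next slot this consumer will read */
        atomic_uint cons_tail;       /* producer polls this to find reusable slots */
};

/*
 * Dequeue one object in the single-consumer case. The check against the
 * producer tail (empty-ring test) is left out to keep the sketch short.
 */
void *toy_sc_dequeue(struct toy_ring *r)
{
        unsigned int head = atomic_load_explicit(&r->cons_head,
                                                 memory_order_relaxed);
        /* This is the DEQUEUE_PTRS() load from the discussion above. */
        void *obj = r->slots[head & (TOY_RING_SIZE - 1u)];

        atomic_store_explicit(&r->cons_head, head + 1u, memory_order_relaxed);

        /*
         * Publishing the new tail tells the producer it may overwrite the
         * slot. The release ordering prevents the slot load above from being
         * reordered past this store; a read barrier only promises load/load
         * ordering, so it could not give that guarantee portably.
         */
        atomic_store_explicit(&r->cons_tail, head + 1u, memory_order_release);
        return obj;
}

A release store paired with an acquire load of cons_tail on the producer side
is the minimal ordering for this hand-off; the full rte_smp_mb() proposed in
the mail is a stronger way to obtain the same guarantee on the consumer side.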