From: "Chen, Bo D"
To: Olivier MATZ
Cc: dev
Date: Tue, 20 Aug 2013 09:13:32 +0000
Subject: Re: [dpdk-dev] A question of DPDK ring buffer
In-Reply-To: <5213272C.4060101@6wind.com>
List-Id: patches and discussions about DPDK

Hi Olivier,

Please see my comments.

do {
    prod_head = r->prod.head;
    cons_tail = r->cons.tail;
    prod_next = prod_head + n;
    success = rte_atomic32_cmpset(&r->prod.head, prod_head, prod_next);

    /*
     * Why not enqueue data here?
     * It would be just a couple of pointer assignments, not taking too
     * much time.
     * Then the entire CAS loop contains both the pointer adjustment and
     * the data enqueue, and the dequeue operation would not have a chance
     * to interfere with data producing.
     * The next wait loop can be removed accordingly.
     */

} while (unlikely(success == 0));

/*
while (unlikely(r->prod.tail != prod_head))
    rte_pause();

r->prod.tail = prod_next;
*/

Regards,
Bob

-----Original Message-----
From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Olivier MATZ
Sent: Tuesday, August 20, 2013 4:22 PM
To: Bob Chen
Cc: dev
Subject: Re: [dpdk-dev] A question of DPDK ring buffer

Hello Ben,

> OK, here is the question: Why does DPDK have to maintain that public
> prod_tail structure? Is it really necessary to endure a while loop here?

If you remove this wait loop, you can trigger an issue. Imagine a case
where core 0 wants to add an object in the ring: it does the CAS,
modifying prod_head. At this time it is interrupted for some reason
(maybe by the kernel) before writing the object pointer in the ring, and
thus before the modification of prod_tail.

During this time, core 1 wants to enqueue another object: it does the
CAS, then writes the object pointer, then modifies prod_tail (without
waiting for core 0, as we removed the wait loop).

Now the ring state is wrong: it shows 2 objects, but one object pointer
is invalid. If you try to dequeue the objects, it will return a bad
pointer.

Of course, interruption by the kernel should be avoided as much as
possible, but even without being interrupted, a similar scenario can
occur if a core is slower than another to enqueue its data (due to a
cache miss for instance, or because the first core enqueues more objects
than the other).

To convince you, I think you can remove the wait loop and run the ring
test in app/test/test_ring.c; I suppose it won't work.

Regards,
Olivier
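
The interleaving described above can be replayed deterministically in a
single thread. What follows is only a minimal sketch under stated
assumptions, not DPDK code: toy_ring and every toy_* function are
hypothetical names, C11 atomics stand in for rte_atomic32_cmpset, the
free-space check against cons.tail is left out, and the wait loop is
modelled as a "try" call that reports whether the caller would still be
spinning.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define RING_SIZE 8u                 /* hypothetical size, power of two */
#define RING_MASK (RING_SIZE - 1u)

struct toy_ring {                    /* hypothetical, not struct rte_ring */
    _Atomic uint32_t prod_head;      /* next slot a producer may reserve */
    _Atomic uint32_t prod_tail;      /* slots below this are visible to consumers */
    void *slots[RING_SIZE];
};

static void toy_init(struct toy_ring *r)
{
    atomic_init(&r->prod_head, 0);
    atomic_init(&r->prod_tail, 0);
    for (uint32_t i = 0; i < RING_SIZE; i++)
        r->slots[i] = NULL;
}

/* Step 1: reserve n slots with a CAS loop; returns the old prod_head. */
static uint32_t toy_reserve(struct toy_ring *r, uint32_t n)
{
    assert(n > 0 && n <= RING_SIZE);
    uint32_t head = atomic_load(&r->prod_head);
    while (!atomic_compare_exchange_weak(&r->prod_head, &head, head + n))
        ;                            /* head is reloaded on failure */
    return head;
}

/* Step 2: write the object pointer into the reserved slot. */
static void toy_write(struct toy_ring *r, uint32_t head, void *obj)
{
    r->slots[head & RING_MASK] = obj;
}

/* Step 3 with the wait condition: publish only once every earlier
 * producer has published. false means "would still be in rte_pause()". */
static bool toy_try_publish(struct toy_ring *r, uint32_t head, uint32_t n)
{
    if (atomic_load(&r->prod_tail) != head)
        return false;
    atomic_store(&r->prod_tail, head + n);
    return true;
}

/* Step 3 with the wait loop removed: publish unconditionally. */
static void toy_publish_unordered(struct toy_ring *r, uint32_t head, uint32_t n)
{
    atomic_store(&r->prod_tail, head + n);
}

/* Number of consumer-visible slots that still hold no valid pointer. */
static unsigned toy_count_bad(struct toy_ring *r)
{
    unsigned bad = 0;
    uint32_t tail = atomic_load(&r->prod_tail);
    for (uint32_t i = 0; i != tail; i++)
        if (r->slots[i & RING_MASK] == NULL)
            bad++;
    return bad;
}

/* Replays the scenario: core 0 reserves and stalls, core 1 runs to the end.
 * Returns how many bad pointers a consumer would see at that moment. */
static unsigned toy_replay(bool use_wait)
{
    static int obj0, obj1;
    struct toy_ring r;
    toy_init(&r);

    uint32_t h0 = toy_reserve(&r, 1);      /* core 0: CAS done, then it stalls */
    uint32_t h1 = toy_reserve(&r, 1);      /* core 1: CAS */
    toy_write(&r, h1, &obj1);              /* core 1: writes its pointer */
    if (use_wait)
        (void)toy_try_publish(&r, h1, 1);  /* refused: core 0 has not published */
    else
        toy_publish_unordered(&r, h1, 1);  /* prod_tail jumps past core 0's slot */

    unsigned bad = toy_count_bad(&r);      /* a consumer looks at the ring now */

    toy_write(&r, h0, &obj0);              /* core 0 finally resumes */
    if (use_wait) {
        bool ok0 = toy_try_publish(&r, h0, 1);  /* core 0 publishes first */
        bool ok1 = toy_try_publish(&r, h1, 1);  /* then core 1 may publish */
        assert(ok0 && ok1);
    }
    return bad;
}
```

Calling toy_replay(false) models the ring with the wait loop removed and
reports one visible slot without a valid pointer; toy_replay(true) keeps
the ordering condition and reports none, because prod_tail cannot move
past core 0's reserved slot until core 0 has written it.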