Date: Tue, 29 Mar 2016 09:54:43 +0100
From: Bruce Richardson
Organization: Intel Shannon Ltd.
To: Lazaros Koromilas
Cc: Olivier Matz, dev@dpdk.org, Thomas Monjalon
Message-ID: <20160329085443.GA17800@bricha3-MOBL3>
Subject: Re: [dpdk-dev] [PATCH v2] ring: check for zero objects mc dequeue / mp enqueue

On Mon, Mar 28, 2016 at 06:48:07PM +0300, Lazaros Koromilas wrote:
> Hi Olivier,
>
> We could have two threads (running on different cores in the general
> case) that both succeed in the cmpset operation. In the dequeue path,
> when n == 0, then cons_next == cons_head, and cmpset will always
> succeed. Now, if they both see an old r->cons.tail value from a
> previous dequeue, they can get stuck in the while

Hi,

I don't see how a thread could ever read an "old" r->cons.tail value.
The head and tail pointers on the ring are marked as volatile in the code,
so all reads and writes of those values go to memory and are never cached
in registers. No deadlock should be possible in that while loop unless a
process crashes in the middle of a ring operation. Each thread that moves
the head pointer from x to y is responsible for moving the tail pointer
from x to y as well, and the loop ensures that the tail updates happen in
the same order as the head updates.

If you believe a deadlock is possible, can you outline the sequence of
operations that would lead to such a state? I cannot see how it could
occur without a crash inside one of the threads.

/Bruce

> (unlikely(r->cons.tail != cons_head)) loop. I tried, however, to
> reproduce (without the patch) and it seems that there is still a
> window for deadlock.
>
> I'm pasting some debug output below that shows the state of two processes.
> It's two receivers doing interleaved mc_dequeue(32)/mc_dequeue(0), and
> one sender doing mp_enqueue(32) on the same ring.
>
> gdb --args ./ring-test -l0 --proc-type=primary
> gdb --args ./ring-test -l1 --proc-type=secondary
> gdb --args ./ring-test -l2 --proc-type=secondary -- tx
>
> This is what I would usually see, with processes 0 and 1 both stuck in
> the same state:
>
> 663 while (unlikely(r->cons.tail != cons_head)) {
> (gdb) p n
> $1 = 0
> (gdb) p r->cons.tail
> $2 = 576416
> (gdb) p cons_head
> $3 = 576448
> (gdb) p cons_next
> $4 = 576448
>
> But now I managed to get the two processes stuck in this state too.
>
> process 0:
> 663 while (unlikely(r->cons.tail != cons_head)) {
> (gdb) p n
> $1 = 32
> (gdb) p r->cons.tail
> $2 = 254348832
> (gdb) p cons_head
> $3 = 254348864
> (gdb) p cons_next
> $4 = 254348896
>
> process 1:
> 663 while (unlikely(r->cons.tail != cons_head)) {
> (gdb) p n
> $1 = 32
> (gdb) p r->cons.tail
> $2 = 254348832
> (gdb) p cons_head
> $3 = 254348896
> (gdb) p cons_next
> $4 = 254348928
>

Where is the thread that updated the head pointer from 832 to 864? That
thread is the one that must update the tail pointer to 864 before your
process 0 can continue.

/Bruce

> I haven't been able to trigger this with the patch so far, but it
> should be possible.
>
> Lazaros.
>
> On Fri, Mar 25, 2016 at 1:15 PM, Olivier Matz wrote:
> > Hi Lazaros,
> >
> > On 03/17/2016 04:49 PM, Lazaros Koromilas wrote:
> >> Issuing a zero-object dequeue with a single consumer has no effect.
> >> Doing so with multiple consumers can get more than one thread to
> >> succeed in the compare-and-set operation and observe starvation or even
> >> deadlock in the while loop that checks for preceding dequeues. The
> >> problematic piece of code when n = 0:
> >>
> >> cons_next = cons_head + n;
> >> success = rte_atomic32_cmpset(&r->cons.head, cons_head, cons_next);
> >>
> >> The same is possible on the enqueue path.
> >
> > Just a question about this patch (which has been applied). Thomas
> > retitled the commit from your log message:
> >
> > ring: fix deadlock in zero object multi enqueue or dequeue
> > http://dpdk.org/browse/dpdk/commit/?id=d0979646166e
> >
> > I think this patch does not fix a deadlock, or did I miss something?
> >
> > As explained in the following links, the ring may not perform well
> > if several threads running on the same CPU use it:
> >
> > http://dpdk.org/ml/archives/dev/2013-November/000714.html
> > http://www.dpdk.org/ml/archives/dev/2014-January/001070.html
> > http://www.dpdk.org/ml/archives/dev/2014-January/001162.html
> > http://dpdk.org/ml/archives/dev/2015-July/020659.html
> >
> > A deadlock could occur if the threads running on the same core
> > have different priorities.
> >
> > Regards,
> > Olivier