From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-io0-f172.google.com (mail-io0-f172.google.com [209.85.223.172]) by dpdk.org (Postfix) with ESMTP id E0810376E for ; Tue, 29 Mar 2016 19:35:29 +0200 (CEST) Received: by mail-io0-f172.google.com with SMTP id g185so31965876ioa.2 for ; Tue, 29 Mar 2016 10:35:29 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=nofutznetworks-com.20150623.gappssmtp.com; s=20150623; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc; bh=MGD03TiGjYm/uS4NlUquxi9MgSBpI5sCCnyp5foKju8=; b=tOij7TMSXFivP8j7ipRjiDnE18p/E8b77boVvatUby0rWBrMHD3xHK8VxxssOkm8IV hX9tEUsf+wyY+Ef0IwO1phtLXRjEiUU18uF/ZqG7+fTTB79NzfQ+oIwNkNX2cLS2iOn6 wPr7ZPHHwd9W9CvzE32s73NqKPrnOJlGmj7740T7V866GOMiNtD/ESDqwCs1GQBiESh+ aXZVbyLL04I547to51NDG1RT8w2n6neEhRpKB9Ecc5LS4w6GB7uGyF4OpPLiFIbp7m/R TOqYb/feoqe5KtHNJoBinD+XopI55BRglB0BAyJdHFbp4aP+KhwCaIPVIBn1bZ4sID2b 7dhw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:mime-version:in-reply-to:references:date :message-id:subject:from:to:cc; bh=MGD03TiGjYm/uS4NlUquxi9MgSBpI5sCCnyp5foKju8=; b=f+Foap9NJqkKTPslk4xTfiMf/otj9asTStqHlRGtbCLB0HvivNZj+HrMmcTF6AUQGb 0lcr3MRgszy4ofkdojrr/QVmjfSOl1jOhDRSOWq4uv4eheGhyNi38BJU2M4HPb8YQFK6 bXWklnN8luLXLUC2aEkQ74y8YwOyJO1VACF+afcgITEyOHXXsyvCnMAtNctiFsXVoCav KumwWUV46VUX6YBDIumKtwEN3RpZ7vT50NAr/GVQt+ZuFXbnYP7mDE6VlBpnmemeiysy yiCbq+/gR+QBjmXgCDun+OwIhN/h9CQPtJOvrL0qkskgod/aoGydRhS2bSYjw4nTIwui dOzQ== X-Gm-Message-State: AD7BkJKJW4Ity47FnzmE1uEwsOx827nfJjKMFeTmPS9pzg75GXj7sFEulDiql2M9VpLOCJYv5UTasrZg8ULy9A== MIME-Version: 1.0 X-Received: by 10.157.56.78 with SMTP id r14mr2158221otd.122.1459272929347; Tue, 29 Mar 2016 10:35:29 -0700 (PDT) Received: by 10.157.43.10 with HTTP; Tue, 29 Mar 2016 10:35:29 -0700 (PDT) In-Reply-To: <20160329160442.GB15912@bricha3-MOBL3> References: <1458229783-15547-1-git-send-email-l@nofutznetworks.com> <56F51DCB.5020502@6wind.com> <20160329085443.GA17800@bricha3-MOBL3> <56FA9F48.3030509@6wind.com> <20160329160442.GB15912@bricha3-MOBL3> Date: Tue, 29 Mar 2016 20:35:29 +0300 Message-ID: From: Lazaros Koromilas To: Bruce Richardson Cc: Olivier MATZ , dev@dpdk.org, Thomas Monjalon Content-Type: text/plain; charset=UTF-8 Subject: Re: [dpdk-dev] [PATCH v2] ring: check for zero objects mc dequeue / mp enqueue X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: patches and discussions about DPDK List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 29 Mar 2016 17:35:30 -0000 On Tue, Mar 29, 2016 at 7:04 PM, Bruce Richardson wrote: > > On Tue, Mar 29, 2016 at 05:29:12PM +0200, Olivier MATZ wrote: > > Hi, > > > > > > On 03/29/2016 10:54 AM, Bruce Richardson wrote: > > >On Mon, Mar 28, 2016 at 06:48:07PM +0300, Lazaros Koromilas wrote: > > >>Hi Olivier, > > >> > > >>We could have two threads (running on different cores in the general > > >>case) that both succeed the cmpset operation. In the dequeue path, > > >>when n == 0, then cons_next == cons_head, and cmpset will always > > >>succeed. Now, if they both see an old r->cons.tail value from a > > >>previous dequeue, they can get stuck in the while > > > > > >Hi, > > > > > >I don't see how threads reading an "old r->cons.tail" value is even possible. > > >The head and tail pointers on the ring are marked in the code as volatile, so > > >all reads and writes to those values are always done from memory and not cached > > >in registers. No deadlock should be possible on that while loop, unless a > > >process crashes in the middle of a ring operation. Each thread which updates > > >the head pointer from x to y, is responsible for updating the tail pointer in > > >a similar manner. The loop ensures the tail updates are in the same order as the > > >head updates. > > > > > >If you believe deadlock is possible, can you outline the sequence of operations > > >which would lead to such a state, because I cannot see how it could occur without > > >a crash inside one of the threads. > > > > I think the deadlock Lazaros describes could occur in the following > > condition: > > > > current ring state > > r->prod.head = 0 > > r->prod.tail = 0 > > > > core 0 core 1 > > ==================================================================== > > enqueue 0 object > > cmpset(&r->prod.head, 0, 0) > > core 0 is interrupted here > > enqueue 1 object > > cmpset(&r->prod.head, 0, 1) > > copy the objects in box 0 > > while (r->prod.tail != prod_head)) > > r->prod.tail = prod_next > > copy 0 object (-> nothing to do) > > while (r->prod.tail != prod_head)) > > > > > > > > I think this issue is indeed fixed by Lazaros' patch (I missed it > > in previous review). However, I don't think this deadlock could > > happen once we avoided the (n == 0) case. > > > Yes, I agree with your scenario. However, I thought the issue was occuring with > non-zero updates, but I may well be mistaken. > If it's fixed now, all good... :-) > > /Bruce Hi all, Bruce, I thought it could be still possible because of my threads being stuck inside two dequeue(32) calls. But haven't been able to reproduce it with non-zero dequeues. And by trying to find a scenario of my own, it seems that at least one dequeue(0) is needed, similarly to Olivier's example on the enqueue path. Thanks, Lazaros.