From: bugzilla@dpdk.org
To: dev@dpdk.org
Date: Wed, 10 Jul 2019 14:43:35 +0000
Subject: [dpdk-dev] [Bug 316] livelock causes librte_distributor unit test to hang

https://bugs.dpdk.org/show_bug.cgi?id=316

            Bug ID: 316
           Summary: livelock causes librte_distributor unit test to hang
           Product: DPDK
           Version: 19.08
          Hardware: x86
                OS: Linux
            Status: UNCONFIRMED
          Severity: normal
          Priority: Normal
         Component: other
          Assignee: dev@dpdk.org
          Reporter: msantana@redhat.com
  Target Milestone: ---

Issue first encountered in Travis CI.

The meson unit test distributor_autotest randomly times out. Normally this test
finishes in less than half a second, so running for 10 seconds and timing out
is a big jump in run time.

I was able to reproduce it by running:

`while sudo sh -c "echo 'distributor_autotest' | ./build/app/test/dpdk-test"; do :; done`

It runs fine a couple of times, showing output and progress, but then at some
point after a couple of seconds it just stops and produces no further output.
I let it sit for a whole minute and nothing happened, so I attached gdb to
figure out what was happening.

One thread seems to be stuck in a while loop, see
lib/librte_distributor/rte_distributor.c:310.

I looked at the assembly code (layout asm, ni) and saw the four lines below
(which correspond to the while loop) being executed repeatedly and
indefinitely. It looks like this thread is waiting for the variable
bufptr64[0] to change state.

0xa064d0 pause
0xa064d2 mov 0x3840(%rdx),%rax
0xa064d9 test $0x1,%al
0xa064db je 0xa064d0

While the first thread is waiting on bufptr64[0] to change state, another
thread is also stuck in a while loop, at
lib/librte_distributor/rte_distributor.c:53. It seems that this thread is
stuck waiting for retptr64 to change state.
Corresponding assembly being executed indefinitely:

0xa06de0 mov 0x38c0(%r8),%rax
0xa06de7 test $0x1,%al
0xa06de9 je 0xa06bbd
0xa06def nop
0xa06df0 pause
0xa06df2 rdtsc
0xa06df4 mov %rdx,%r10
0xa06df7 shl $0x20,%r10
0xa06dfb mov %eax,%eax
0xa06dfd or %r10,%rax
0xa06e00 lea 0x64(%rax),%r10
0xa06e04 jmp 0xa06e12
0xa06e06 nopw %cs:0x0(%rax,%rax,1)
0xa06e10 pause
0xa06e12 rdtsc
0xa06e14 shl $0x20,%rdx
0xa06e18 mov %eax,%eax
0xa06e1a or %rdx,%rax
0xa06e1d cmp %rax,%r10
0xa06e20 ja 0xa06e10
0xa06e22 jmp 0xa06de0

My guess is that these threads are interdependent, so one thread is waiting
for the other thread to change the state of the control variable. I can't say
for sure whether this is what is happening, or why these variables don't
change state, so I would like to ask someone who is more familiar with this
particular code to take a look.
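
As a rough illustration of what the two disassembly listings appear to do, here is a minimal C sketch. It is a reading of the assembly only, not the actual code in rte_distributor.c: the first listing seems to loop while bit 0 of the loaded word is clear, and the second seems to loop while bit 0 is set, with a short rdtsc-based delay (about 0x64 cycles) between polls. The names HANDSHAKE_BIT, wait_until_bit_set and wait_until_bit_clear are hypothetical, and the max_spins bound is an addition so that a stuck handshake returns a failure instead of hanging; the delay is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Bit 0 is what both listings check with `test $0x1,%al`.
 * The macro name is hypothetical, not taken from the DPDK source. */
#define HANDSHAKE_BIT UINT64_C(0x1)

/* Shape of the first listing: keep polling while bit 0 is clear.
 * max_spins is an added bound (an assumption); the reported loop has none. */
static bool
wait_until_bit_set(const volatile uint64_t *word, uint64_t max_spins)
{
	assert(word != NULL);
	for (uint64_t i = 0; i < max_spins; i++) {
		if (*word & HANDSHAKE_BIT)
			return true;	/* another thread set the bit */
	}
	return false;			/* gave up: nobody set the bit */
}

/* Shape of the second listing: keep polling while bit 0 is set.
 * The rdtsc-based delay between polls is omitted in this sketch. */
static bool
wait_until_bit_clear(const volatile uint64_t *word, uint64_t max_spins)
{
	assert(word != NULL);
	for (uint64_t i = 0; i < max_spins; i++) {
		if (!(*word & HANDSHAKE_BIT))
			return true;	/* another thread cleared the bit */
	}
	return false;			/* gave up: nobody cleared the bit */
}
```

With a bound like this, a word whose bit never changes makes the call return false rather than spin forever, which a unit test could report as a failed handshake instead of timing out silently.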