patches for DPDK stable branches
From: Harman Kalra <hkalra@marvell.com>
To: "Ruifeng Wang (Arm Technology China)" <Ruifeng.Wang@arm.com>
Cc: David Marchand <david.marchand@redhat.com>,
	Aaron Conole <aconole@redhat.com>,
	David Hunt <david.hunt@intel.com>, dev <dev@dpdk.org>,
	"Gavin Hu (Arm Technology China)" <Gavin.Hu@arm.com>,
	Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>,
	nd <nd@arm.com>, dpdk stable <stable@dpdk.org>
Subject: Re: [dpdk-stable] [EXT] RE: [dpdk-dev] [PATCH] lib/distributor: fix deadlock issue for aarch64
Date: Thu, 17 Oct 2019 11:42:17 +0000
Message-ID: <20191017114203.GA137626@outlook.office365.com>
In-Reply-To: <AM0PR08MB3986AC7C021B31BA4C2C53759E950@AM0PR08MB3986.eurprd08.prod.outlook.com>

Hi,

I tested this patch. My observations:
1. With this patch, the distributor_autotest hang on the arm64 platform
is resolved. However, running the test continuously still results in a
test failure, as reported by Aaron.
2. On the x86 platform, I can still observe distributor_autotest
getting suspended (stuck) when the test is run continuously (it took about
7-8 iterations to reproduce the hang).

Thanks

On Wed, Oct 09, 2019 at 05:52:03AM +0000, Ruifeng Wang (Arm Technology China) wrote:
> > -----Original Message-----
> > From: David Marchand <david.marchand@redhat.com>
> > Sent: Wednesday, October 9, 2019 03:47
> > To: Aaron Conole <aconole@redhat.com>
> > Cc: Ruifeng Wang (Arm Technology China) <Ruifeng.Wang@arm.com>; David
> > Hunt <david.hunt@intel.com>; dev <dev@dpdk.org>; hkalra@marvell.com;
> > Gavin Hu (Arm Technology China) <Gavin.Hu@arm.com>; Honnappa
> > Nagarahalli <Honnappa.Nagarahalli@arm.com>; nd <nd@arm.com>; dpdk
> > stable <stable@dpdk.org>
> > Subject: Re: [dpdk-stable] [dpdk-dev] [PATCH] lib/distributor: fix deadlock
> > issue for aarch64
> > 
> > On Tue, Oct 8, 2019 at 7:06 PM Aaron Conole <aconole@redhat.com> wrote:
> > >
> > > Ruifeng Wang <ruifeng.wang@arm.com> writes:
> > >
> > > > Distributor and worker threads rely on data structs in cache line
> > > > for synchronization. The shared data structs were not protected.
> > > > This caused deadlock issue on weaker memory ordering platforms as
> > > > aarch64.
> > > > Fix this issue by adding memory barriers to ensure synchronization
> > > > among cores.
> > > >
> > > > Bugzilla ID: 342
> > > > Fixes: 775003ad2f96 ("distributor: add new burst-capable library")
> > > > Cc: stable@dpdk.org
> > > >
> > > > Signed-off-by: Ruifeng Wang <ruifeng.wang@arm.com>
> > > > Reviewed-by: Gavin Hu <gavin.hu@arm.com>
> > > > ---
> > >
> > > I see a failure in the distributor_autotest (on one of the builds):
> > >
> > > 64/82 DPDK:fast-tests / distributor_autotest  FAIL     0.37 s (exit status 255 or signal 127 SIGinvalid)
> > >
> > > --- command ---
> > >
> > > DPDK_TEST='distributor_autotest'
> > > /home/travis/build/ovsrobot/dpdk/build/app/test/dpdk-test -l 0-1
> > > --file-prefix=distributor_autotest
> > >
> > > --- stdout ---
> > >
> > > EAL: Probing VFIO support...
> > >
> > > APP: HPET is not enabled, using TSC as default timer
> > >
> > > RTE>>distributor_autotest
> > >
> > > === Basic distributor sanity tests ===
> > >
> > > Worker 0 handled 32 packets
> > >
> > > Sanity test with all zero hashes done.
> > >
> > > Worker 0 handled 32 packets
> > >
> > > Sanity test with non-zero hashes done
> > >
> > > === testing big burst (single) ===
> > >
> > > Sanity test of returned packets done
> > >
> > > === Sanity test with mbuf alloc/free (single) ===
> > >
> > > Sanity test with mbuf alloc/free passed
> > >
> > > Too few cores to run worker shutdown test
> > >
> > > === Basic distributor sanity tests ===
> > >
> > > Worker 0 handled 32 packets
> > >
> > > Sanity test with all zero hashes done.
> > >
> > > Worker 0 handled 32 packets
> > >
> > > Sanity test with non-zero hashes done
> > >
> > > === testing big burst (burst) ===
> > >
> > > Sanity test of returned packets done
> > >
> > > === Sanity test with mbuf alloc/free (burst) ===
> > >
> > > Line 326: Packet count is incorrect, 1048568, expected 1048576
> > >
> > > Test Failed
> > >
> > > RTE>>
> > >
> > > --- stderr ---
> > >
> > > EAL: Detected 2 lcore(s)
> > >
> > > EAL: Detected 1 NUMA nodes
> > >
> > > EAL: Multi-process socket /var/run/dpdk/distributor_autotest/mp_socket
> > >
> > > EAL: Selected IOVA mode 'PA'
> > >
> > > EAL: No available hugepages reported in hugepages-1048576kB
> > >
> > > -------
> > >
> > > Not sure how to help debug further.  I'll re-start the job to see if
> > > it 'clears' up - but I guess there may be a delicate synchronization
> > > somewhere that needs to be accounted.
> > 
> > Idem, and with the same loop I used before, it can be caught quickly.
> > 
> > # time (log=/tmp/$$.log; while true; do echo distributor_autotest |
> > taskset -c 0-1 ./build-gcc-static/app/test/dpdk-test --log-level *:8 \
> > -l 0-1 >$log 2>&1; grep -q 'Test OK' $log || break; done; cat $log; rm -f $log)
> > 
> Thanks Aaron and David for your report. I can reproduce this issue with the script.
> Will fix it in next version.
> 
> > [snip]
> > 
> > RTE>>distributor_autotest
> > EAL: Trying to obtain current memory policy.
> > EAL: Setting policy MPOL_PREFERRED for socket 0
> > EAL: Restoring previous memory policy: 0
> > EAL: request: mp_malloc_sync
> > EAL: Heap on socket 0 was expanded by 2MB
> > EAL: Trying to obtain current memory policy.
> > EAL: Setting policy MPOL_PREFERRED for socket 0
> > EAL: Restoring previous memory policy: 0
> > EAL: alloc_pages_on_heap(): couldn't allocate physically contiguous space
> > EAL: Trying to obtain current memory policy.
> > EAL: Setting policy MPOL_PREFERRED for socket 0
> > EAL: Restoring previous memory policy: 0
> > EAL: request: mp_malloc_sync
> > EAL: Heap on socket 0 was expanded by 8MB
> > === Basic distributor sanity tests ===
> > Worker 0 handled 32 packets
> > Sanity test with all zero hashes done.
> > Worker 0 handled 32 packets
> > Sanity test with non-zero hashes done
> > === testing big burst (single) ===
> > Sanity test of returned packets done
> > 
> > === Sanity test with mbuf alloc/free (single) ===
> > Sanity test with mbuf alloc/free passed
> > 
> > Too few cores to run worker shutdown test
> > === Basic distributor sanity tests ===
> > Worker 0 handled 32 packets
> > Sanity test with all zero hashes done.
> > Worker 0 handled 32 packets
> > Sanity test with non-zero hashes done
> > === testing big burst (burst) ===
> > Sanity test of returned packets done
> > 
> > === Sanity test with mbuf alloc/free (burst) ===
> > Line 326: Packet count is incorrect, 1048568, expected 1048576
> > Test Failed
> > RTE>>
> > real    0m36.668s
> > user    1m7.293s
> > sys    0m1.560s
> > 
> > Could be worth running this loop on all tests? (not talking about the CI, it
> > would be a manual effort to catch lurking issues).
> > 
> > 
> > --
> > David Marchand
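
Editorial note: the patch description quoted above attributes the deadlock
to shared cache-line data being read and written by distributor and worker
cores without ordering guarantees, which matters on weakly ordered platforms
such as aarch64. The sketch below illustrates the general release/acquire
handoff pattern that addresses this class of problem. It is not the actual
patch and does not use the DPDK API: `struct slot`, `worker` and
`run_handoff` are hypothetical names, and C11 atomics with pthreads stand in
for whatever barriers the library uses.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for a slot shared between two cores. */
struct slot {
    uint64_t payload;   /* plain data, written before the flag is set */
    atomic_int ready;   /* flag polled by the worker */
};

static void *worker(void *arg)
{
    struct slot *s = arg;

    /* Acquire: once ready != 0 is observed, the payload write is visible. */
    while (atomic_load_explicit(&s->ready, memory_order_acquire) == 0)
        ;
    return (void *)(uintptr_t)s->payload;
}

/* Hand one value to a worker thread and return what the worker read. */
static uint64_t run_handoff(uint64_t value)
{
    struct slot s = { .payload = 0 };
    pthread_t tid;
    void *seen = NULL;
    int rc;

    atomic_init(&s.ready, 0);
    rc = pthread_create(&tid, NULL, worker, &s);
    assert(rc == 0);

    s.payload = value;
    /* Release: the payload write is ordered before the flag store. */
    atomic_store_explicit(&s.ready, 1, memory_order_release);

    rc = pthread_join(tid, &seen);
    assert(rc == 0);
    (void)rc;
    return (uint64_t)(uintptr_t)seen;
}
```

Calling `run_handoff(42)` should return 42 on any platform. If both accesses
to `ready` were relaxed (or plain), a weakly ordered CPU would be free to
make the flag visible before the payload, which is the kind of reordering
the patch description refers to.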

Thread overview: 16+ messages
2019-10-08  9:55 [dpdk-stable] " Ruifeng Wang
2019-10-08 12:53 ` Hunt, David
2019-10-08 17:05 ` [dpdk-stable] [dpdk-dev] " Aaron Conole
2019-10-08 19:46   ` David Marchand
2019-10-08 20:08     ` Aaron Conole
2019-10-09  5:52     ` Ruifeng Wang (Arm Technology China)
2019-10-17 11:42       ` Harman Kalra [this message]
2019-10-17 13:48         ` [dpdk-stable] [EXT] " Ruifeng Wang (Arm Technology China)
     [not found] ` <20191012024352.23545-1-ruifeng.wang@arm.com>
2019-10-12  2:43   ` [dpdk-stable] [PATCH v2 1/2] " Ruifeng Wang
2019-10-13  2:31     ` Honnappa Nagarahalli
2019-10-14 10:00       ` Ruifeng Wang (Arm Technology China)
2019-10-12  2:43   ` [dpdk-stable] [PATCH v2 2/2] test/distributor: fix false unit test failure Ruifeng Wang
     [not found] ` <20191015092826.13002-1-ruifeng.wang@arm.com>
2019-10-15  9:28   ` [dpdk-stable] [PATCH v3 1/2] lib/distributor: fix deadlock issue for aarch64 Ruifeng Wang
2019-10-25  8:13     ` Hunt, David
2019-10-15  9:28   ` [dpdk-stable] [PATCH v3 2/2] test/distributor: fix false unit test failure Ruifeng Wang
2019-10-25  8:13     ` Hunt, David
