From: David Marchand <david.marchand@redhat.com>
To: Aaron Conole <aconole@redhat.com>
Cc: Ruifeng Wang <ruifeng.wang@arm.com>,
David Hunt <david.hunt@intel.com>, dev <dev@dpdk.org>,
hkalra@marvell.com, Gavin Hu <gavin.hu@arm.com>,
Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>,
nd <nd@arm.com>, dpdk stable <stable@dpdk.org>
Subject: Re: [dpdk-stable] [dpdk-dev] [PATCH] lib/distributor: fix deadlock issue for aarch64
Date: Tue, 8 Oct 2019 21:46:37 +0200 [thread overview]
Message-ID: <CAJFAV8yyt7EyPV_SKLmdFrXgqwJh=J-cLZ0GoRnoEBzh_Nnj7A@mail.gmail.com> (raw)
In-Reply-To: <f7tk19f182h.fsf@dhcp-25.97.bos.redhat.com>
On Tue, Oct 8, 2019 at 7:06 PM Aaron Conole <aconole@redhat.com> wrote:
>
> Ruifeng Wang <ruifeng.wang@arm.com> writes:
>
> > Distributor and worker threads rely on data structs in a cache line
> > for synchronization. The shared data structs were not protected.
> > This caused a deadlock on platforms with weaker memory ordering,
> > such as aarch64.
> > Fix this issue by adding memory barriers to ensure synchronization
> > among cores.
> >
> > Bugzilla ID: 342
> > Fixes: 775003ad2f96 ("distributor: add new burst-capable library")
> > Cc: stable@dpdk.org
> >
> > Signed-off-by: Ruifeng Wang <ruifeng.wang@arm.com>
> > Reviewed-by: Gavin Hu <gavin.hu@arm.com>
> > ---
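On a weakly ordered CPU such as aarch64, a plain store to a payload and a plain store to a flag in the same shared cache line can become visible to another core in either order. A polling core may therefore see the flag before the data. One common way to express the barriers the commit message asks for is to pair a release store with an acquire load. The sketch below is not the distributor code: the slot layout, REQ_FLAG and handoff_run() are hypothetical names. It assumes GCC/Clang __atomic builtins and POSIX threads. Real DPDK code would normally spin with rte_pause() rather than sched_yield().

```c
/* Hypothetical sketch, not DPDK code: one worker hands items to one
 * distributor through a single cache-line-sized slot. */
#include <pthread.h>
#include <sched.h>
#include <stdint.h>

#define REQ_FLAG 1ULL /* set: slot holds an item for the distributor */

struct handoff_slot {
	uint64_t data;  /* payload, written before REQ_FLAG is published */
	uint64_t flags; /* accessed only through __atomic builtins */
} __attribute__((aligned(64)));

static struct handoff_slot slot;
static uint64_t n_items;
static uint64_t received_sum;

static void *worker(void *arg)
{
	(void)arg;
	for (uint64_t i = 1; i <= n_items; i++) {
		/* Wait for the distributor to take the previous item; the
		 * acquire pairs with its release store of 0. */
		while (__atomic_load_n(&slot.flags, __ATOMIC_ACQUIRE) & REQ_FLAG)
			sched_yield();
		slot.data = i;
		/* Release: the store to data is visible before the flag is. */
		__atomic_store_n(&slot.flags, REQ_FLAG, __ATOMIC_RELEASE);
	}
	return NULL;
}

static void *distributor(void *arg)
{
	(void)arg;
	for (uint64_t n = 0; n < n_items; n++) {
		/* Acquire: pairs with the worker's release store of REQ_FLAG,
		 * so the payload read below is the one just published. */
		while (!(__atomic_load_n(&slot.flags, __ATOMIC_ACQUIRE) & REQ_FLAG))
			sched_yield();
		received_sum += slot.data;
		/* Release: our read of data completes before the worker
		 * is allowed to overwrite it. */
		__atomic_store_n(&slot.flags, 0, __ATOMIC_RELEASE);
	}
	return NULL;
}

/* Hands items 1..count over the slot; returns the sum the distributor saw. */
uint64_t handoff_run(uint64_t count)
{
	pthread_t w, d;

	n_items = count;
	received_sum = 0;
	slot.data = 0;
	__atomic_store_n(&slot.flags, 0, __ATOMIC_RELAXED);
	pthread_create(&d, NULL, distributor, NULL);
	pthread_create(&w, NULL, worker, NULL);
	pthread_join(w, NULL);
	pthread_join(d, NULL);
	return received_sum;
}
```

Calling handoff_run(10000) should return 50005000, the sum of 1..10000. Replacing the release/acquire pair with plain accesses turns this into a data race. On aarch64 the symptoms can include stale payloads, or a spinning thread that never observes the flag.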
>
> I see a failure in the distributor_autotest (on one of the builds):
>
> 64/82 DPDK:fast-tests / distributor_autotest FAIL 0.37 s (exit status 255 or signal 127 SIGinvalid)
>
> --- command ---
> DPDK_TEST='distributor_autotest' /home/travis/build/ovsrobot/dpdk/build/app/test/dpdk-test -l 0-1 --file-prefix=distributor_autotest
>
> --- stdout ---
> EAL: Probing VFIO support...
> APP: HPET is not enabled, using TSC as default timer
> RTE>>distributor_autotest
> === Basic distributor sanity tests ===
> Worker 0 handled 32 packets
> Sanity test with all zero hashes done.
> Worker 0 handled 32 packets
> Sanity test with non-zero hashes done
> === testing big burst (single) ===
> Sanity test of returned packets done
> === Sanity test with mbuf alloc/free (single) ===
> Sanity test with mbuf alloc/free passed
> Too few cores to run worker shutdown test
> === Basic distributor sanity tests ===
> Worker 0 handled 32 packets
> Sanity test with all zero hashes done.
> Worker 0 handled 32 packets
> Sanity test with non-zero hashes done
> === testing big burst (burst) ===
> Sanity test of returned packets done
> === Sanity test with mbuf alloc/free (burst) ===
> Line 326: Packet count is incorrect, 1048568, expected 1048576
> Test Failed
> RTE>>
>
> --- stderr ---
> EAL: Detected 2 lcore(s)
> EAL: Detected 1 NUMA nodes
> EAL: Multi-process socket /var/run/dpdk/distributor_autotest/mp_socket
> EAL: Selected IOVA mode 'PA'
> EAL: No available hugepages reported in hugepages-1048576kB
> -------
>
> Not sure how to help debug further. I'll restart the job to see if
> it 'clears' up, but I guess there may be a delicate synchronization
> somewhere that needs to be accounted for.
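As an illustration of the kind of synchronization Aaron suspects (an assumption, not a diagnosis of the 1048568 vs 1048576 mismatch): if per-worker statistics are written by worker lcores while the main lcore reads them, those accesses need to be atomic. Otherwise the reader may act on a stale value, and the compiler is free to cache a plain read across a polling loop. The sketch uses hypothetical names (struct worker_stat, count_run()) and GCC/Clang __atomic builtins. Relaxed ordering is enough here because only the counter values themselves are consumed.

```c
/* Hypothetical sketch, not the DPDK test code: worker threads count
 * handled packets in per-worker counters while the main thread polls. */
#include <pthread.h>
#include <sched.h>
#include <stdint.h>

#define NUM_WORKERS 2

struct worker_stat {
	uint64_t handled_packets; /* written by one worker, read by main */
} __attribute__((aligned(64))); /* one cache line per worker */

static struct worker_stat stats[NUM_WORKERS];
static uint64_t per_worker_target;

static void *count_worker(void *arg)
{
	struct worker_stat *st = arg;

	for (uint64_t i = 0; i < per_worker_target; i++)
		/* Atomic add: main may read this counter concurrently. */
		__atomic_fetch_add(&st->handled_packets, 1, __ATOMIC_RELAXED);
	return NULL;
}

/* Sums the counters with atomic loads; plain reads would be a data race. */
static uint64_t total_handled(void)
{
	uint64_t sum = 0;

	for (int i = 0; i < NUM_WORKERS; i++)
		sum += __atomic_load_n(&stats[i].handled_packets, __ATOMIC_RELAXED);
	return sum;
}

/* Polls until every packet is accounted for, then joins the workers. */
uint64_t count_run(uint64_t per_worker)
{
	pthread_t tid[NUM_WORKERS];

	per_worker_target = per_worker;
	for (int i = 0; i < NUM_WORKERS; i++) {
		__atomic_store_n(&stats[i].handled_packets, 0, __ATOMIC_RELAXED);
		pthread_create(&tid[i], NULL, count_worker, &stats[i]);
	}
	while (total_handled() < per_worker * NUM_WORKERS)
		sched_yield();
	for (int i = 0; i < NUM_WORKERS; i++)
		pthread_join(tid[i], NULL);
	return total_handled();
}
```

count_run(100000) spins until both workers' counters add up to 200000, then joins the threads and returns that total.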
Same here. With the same loop I used before, it can be caught quickly:
# time (log=/tmp/$$.log; while true; do echo distributor_autotest |
    taskset -c 0-1 ./build-gcc-static/app/test/dpdk-test --log-level '*:8' -l 0-1 >$log 2>&1;
    grep -q 'Test OK' $log || break; done; cat $log; rm -f $log)
[snip]
RTE>>distributor_autotest
EAL: Trying to obtain current memory policy.
EAL: Setting policy MPOL_PREFERRED for socket 0
EAL: Restoring previous memory policy: 0
EAL: request: mp_malloc_sync
EAL: Heap on socket 0 was expanded by 2MB
EAL: Trying to obtain current memory policy.
EAL: Setting policy MPOL_PREFERRED for socket 0
EAL: Restoring previous memory policy: 0
EAL: alloc_pages_on_heap(): couldn't allocate physically contiguous space
EAL: Trying to obtain current memory policy.
EAL: Setting policy MPOL_PREFERRED for socket 0
EAL: Restoring previous memory policy: 0
EAL: request: mp_malloc_sync
EAL: Heap on socket 0 was expanded by 8MB
=== Basic distributor sanity tests ===
Worker 0 handled 32 packets
Sanity test with all zero hashes done.
Worker 0 handled 32 packets
Sanity test with non-zero hashes done
=== testing big burst (single) ===
Sanity test of returned packets done
=== Sanity test with mbuf alloc/free (single) ===
Sanity test with mbuf alloc/free passed
Too few cores to run worker shutdown test
=== Basic distributor sanity tests ===
Worker 0 handled 32 packets
Sanity test with all zero hashes done.
Worker 0 handled 32 packets
Sanity test with non-zero hashes done
=== testing big burst (burst) ===
Sanity test of returned packets done
=== Sanity test with mbuf alloc/free (burst) ===
Line 326: Packet count is incorrect, 1048568, expected 1048576
Test Failed
RTE>>
real 0m36.668s
user 1m7.293s
sys 0m1.560s
It could be worth running this loop on all tests (not in the CI; it
would be a manual effort to catch lurking issues).
--
David Marchand
Thread overview: 16+ messages
2019-10-08 9:55 [dpdk-stable] " Ruifeng Wang
2019-10-08 12:53 ` Hunt, David
2019-10-08 17:05 ` [dpdk-stable] [dpdk-dev] " Aaron Conole
2019-10-08 19:46 ` David Marchand [this message]
2019-10-08 20:08 ` Aaron Conole
2019-10-09 5:52 ` Ruifeng Wang (Arm Technology China)
2019-10-17 11:42 ` [dpdk-stable] [EXT] " Harman Kalra
2019-10-17 13:48 ` Ruifeng Wang (Arm Technology China)
[not found] ` <20191012024352.23545-1-ruifeng.wang@arm.com>
2019-10-12 2:43 ` [dpdk-stable] [PATCH v2 1/2] " Ruifeng Wang
2019-10-13 2:31 ` Honnappa Nagarahalli
2019-10-14 10:00 ` Ruifeng Wang (Arm Technology China)
2019-10-12 2:43 ` [dpdk-stable] [PATCH v2 2/2] test/distributor: fix false unit test failure Ruifeng Wang
[not found] ` <20191015092826.13002-1-ruifeng.wang@arm.com>
2019-10-15 9:28 ` [dpdk-stable] [PATCH v3 1/2] lib/distributor: fix deadlock issue for aarch64 Ruifeng Wang
2019-10-25 8:13 ` Hunt, David
2019-10-15 9:28 ` [dpdk-stable] [PATCH v3 2/2] test/distributor: fix false unit test failure Ruifeng Wang
2019-10-25 8:13 ` Hunt, David