From: David Marchand
Date: Tue, 8 Oct 2019 21:46:37 +0200
To: Aaron Conole
Cc: Ruifeng Wang, David Hunt, dev, hkalra@marvell.com, Gavin Hu, Honnappa Nagarahalli, nd, dpdk stable
Subject: Re: [dpdk-stable] [dpdk-dev] [PATCH] lib/distributor: fix deadlock issue for aarch64
References: <20191008095524.1585-1-ruifeng.wang@arm.com>
List-Id: patches for DPDK stable branches

On Tue, Oct 8, 2019 at 7:06 PM Aaron Conole wrote:
>
> Ruifeng Wang writes:
>
> > Distributor and worker threads rely on data structs in cache line
> > for synchronization. The shared data structs were not protected.
> > This caused deadlock issue on weaker memory ordering platforms as
> > aarch64.
> > Fix this issue by adding memory barriers to ensure synchronization
> > among cores.
> >
> > Bugzilla ID: 342
> > Fixes: 775003ad2f96 ("distributor: add new burst-capable library")
> > Cc: stable@dpdk.org
> >
> > Signed-off-by: Ruifeng Wang
> > Reviewed-by: Gavin Hu
> > ---
>
> I see a failure in the distributor_autotest (on one of the builds):
>
> 64/82 DPDK:fast-tests / distributor_autotest FAIL 0.37 s (exit status 255 or signal 127 SIGinvalid)
>
> --- command ---
> DPDK_TEST='distributor_autotest' /home/travis/build/ovsrobot/dpdk/build/app/test/dpdk-test -l 0-1 --file-prefix=distributor_autotest
> --- stdout ---
> EAL: Probing VFIO support...
> APP: HPET is not enabled, using TSC as default timer
> RTE>>distributor_autotest
> === Basic distributor sanity tests ===
> Worker 0 handled 32 packets
> Sanity test with all zero hashes done.
> Worker 0 handled 32 packets
> Sanity test with non-zero hashes done
> === testing big burst (single) ===
> Sanity test of returned packets done
> === Sanity test with mbuf alloc/free (single) ===
> Sanity test with mbuf alloc/free passed
> Too few cores to run worker shutdown test
> === Basic distributor sanity tests ===
> Worker 0 handled 32 packets
> Sanity test with all zero hashes done.
> Worker 0 handled 32 packets
> Sanity test with non-zero hashes done
> === testing big burst (burst) ===
> Sanity test of returned packets done
> === Sanity test with mbuf alloc/free (burst) ===
> Line 326: Packet count is incorrect, 1048568, expected 1048576
> Test Failed
> RTE>>
> --- stderr ---
> EAL: Detected 2 lcore(s)
> EAL: Detected 1 NUMA nodes
> EAL: Multi-process socket /var/run/dpdk/distributor_autotest/mp_socket
> EAL: Selected IOVA mode 'PA'
> EAL: No available hugepages reported in hugepages-1048576kB
> -------
>
> Not sure how to help debug further. I'll re-start the job to see if
> it 'clears' up - but I guess there may be a delicate synchronization
> somewhere that needs to be accounted.

Idem, and with the same loop I used before, it can be caught quickly.

# time (log=/tmp/$$.log; while true; do echo distributor_autotest |taskset -c 0-1 ./build-gcc-static/app/test/dpdk-test --log-level *:8 -l 0-1 >$log 2>&1; grep -q 'Test OK' $log || break; done; cat $log; rm -f $log)
[snip]

RTE>>distributor_autotest
EAL: Trying to obtain current memory policy.
EAL: Setting policy MPOL_PREFERRED for socket 0
EAL: Restoring previous memory policy: 0
EAL: request: mp_malloc_sync
EAL: Heap on socket 0 was expanded by 2MB
EAL: Trying to obtain current memory policy.
EAL: Setting policy MPOL_PREFERRED for socket 0
EAL: Restoring previous memory policy: 0
EAL: alloc_pages_on_heap(): couldn't allocate physically contiguous space
EAL: Trying to obtain current memory policy.
EAL: Setting policy MPOL_PREFERRED for socket 0
EAL: Restoring previous memory policy: 0
EAL: request: mp_malloc_sync
EAL: Heap on socket 0 was expanded by 8MB
=== Basic distributor sanity tests ===
Worker 0 handled 32 packets
Sanity test with all zero hashes done.
Worker 0 handled 32 packets
Sanity test with non-zero hashes done
=== testing big burst (single) ===
Sanity test of returned packets done
=== Sanity test with mbuf alloc/free (single) ===
Sanity test with mbuf alloc/free passed
Too few cores to run worker shutdown test
=== Basic distributor sanity tests ===
Worker 0 handled 32 packets
Sanity test with all zero hashes done.
Worker 0 handled 32 packets
Sanity test with non-zero hashes done
=== testing big burst (burst) ===
Sanity test of returned packets done
=== Sanity test with mbuf alloc/free (burst) ===
Line 326: Packet count is incorrect, 1048568, expected 1048576
Test Failed
RTE>>

real    0m36.668s
user    1m7.293s
sys     0m1.560s

Could be worth running this loop on all tests?
(not talking about the CI, it would be a manual effort to catch
lurking issues).

--
David Marchand
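
For anyone who wants to try the same loop on other tests, here is a rough sketch of it as a reusable shell function. It is not part of DPDK: the function name run_until_fail, the rf_ variable names and the run cap are made up for illustration (the one-liner in this message loops forever until a run fails). It reruns a command, looks for a success marker in the combined output, and prints the log of the first run where the marker is missing.

```shell
#!/bin/sh
# Hypothetical helper, not part of DPDK.
# Usage: run_until_fail MARKER MAX_RUNS CMD [ARGS...]
# Reruns CMD until its combined stdout/stderr no longer contains MARKER.
# Returns 1 and prints the failing log on the first failure,
# returns 0 if all MAX_RUNS runs contained MARKER.
run_until_fail() {
    rf_marker=$1
    rf_max=$2
    shift 2
    rf_log=$(mktemp) || return 2
    rf_i=0
    while [ "$rf_i" -lt "$rf_max" ]; do
        rf_i=$((rf_i + 1))
        # Capture both streams, as the original loop does with >$log 2>&1.
        "$@" >"$rf_log" 2>&1
        if ! grep -q -- "$rf_marker" "$rf_log"; then
            echo "run $rf_i: marker '$rf_marker' not found, log follows"
            cat "$rf_log"
            rm -f "$rf_log"
            return 1
        fi
    done
    rm -f "$rf_log"
    return 0
}

# Example with the command from this thread (the cap of 1000 runs is an
# arbitrary assumption; run from the DPDK tree that holds the build):
#   run_until_fail 'Test OK' 1000 sh -c \
#     'echo distributor_autotest | taskset -c 0-1 ./build-gcc-static/app/test/dpdk-test --log-level "*:8" -l 0-1'
```

To cover several tests, the call could be wrapped in a for loop over test names. Only distributor_autotest is named in this thread, so the list would have to be filled in by whoever runs it.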