From: "Ruifeng Wang (Arm Technology China)"
To: Harman Kalra
CC: David Marchand, Aaron Conole, David Hunt, dev, "Gavin Hu (Arm Technology China)", Honnappa Nagarahalli, nd, dpdk stable, nd
Date: Thu, 17 Oct 2019 13:48:52 +0000
References: <20191008095524.1585-1-ruifeng.wang@arm.com> <20191017114203.GA137626@outlook.office365.com>
In-Reply-To: <20191017114203.GA137626@outlook.office365.com>
Subject: Re: [dpdk-stable] [EXT] RE: [dpdk-dev] [PATCH] lib/distributor: fix deadlock issue for aarch64
List-Id: patches for DPDK stable branches

Hi Harman,

Thank you for testing this.

> -----Original Message-----
> From: Harman Kalra
> Sent: Thursday, October 17, 2019 19:42
> To: Ruifeng Wang (Arm Technology China)
> Cc: David Marchand; Aaron Conole; David Hunt; dev;
> Gavin Hu (Arm Technology China); Honnappa Nagarahalli; nd; dpdk stable
> Subject: Re: [EXT] RE: [dpdk-stable] [dpdk-dev] [PATCH] lib/distributor: fix
> deadlock issue for aarch64
>
> Hi
>
> I tested this patch, following are my observations:
> 1. With this patch distributor_autotest getting suspended on arm64 platform
> is resolved. But continuous execution of this test results in test failure, as
> reported by Aaron.
> 2. While testing on x86 platform, still I can observe distributor_autotest
> getting suspended (stuck) on continuous execution of the test (it took almost
> 7-8 iterations to reproduce the suspension).

Yes, the v1 patch does not completely solve the issue.
I have posted v3: http://patches.dpdk.org/project/dpdk/list/?series=6856
With the new patch set, I didn't observe any test failure in my testing. Could you try that?

Thanks.
/Ruifeng

>
> Thanks
>
> On Wed, Oct 09, 2019 at 05:52:03AM +0000, Ruifeng Wang (Arm Technology
> China) wrote:
> > External Email
> >
> > ----------------------------------------------------------------------
> >
> > > -----Original Message-----
> > > From: David Marchand
> > > Sent: Wednesday, October 9, 2019 03:47
> > > To: Aaron Conole
> > > Cc: Ruifeng Wang (Arm Technology China); David Hunt; dev;
> > > hkalra@marvell.com; Gavin Hu (Arm Technology China);
> > > Honnappa Nagarahalli; nd; dpdk stable
> > > Subject: Re: [dpdk-stable] [dpdk-dev] [PATCH] lib/distributor: fix
> > > deadlock issue for aarch64
> > >
> > > On Tue, Oct 8, 2019 at 7:06 PM Aaron Conole wrote:
> > > >
> > > > Ruifeng Wang writes:
> > > >
> > > > > Distributor and worker threads rely on data structs in cache
> > > > > line for synchronization. The shared data structs were not protected.
> > > > > This caused deadlock issue on weaker memory ordering platforms
> > > > > as aarch64.
> > > > > Fix this issue by adding memory barriers to ensure
> > > > > synchronization among cores.
> > > > >
> > > > > Bugzilla ID: 342
> > > > > Fixes: 775003ad2f96 ("distributor: add new burst-capable
> > > > > library")
> > > > > Cc: stable@dpdk.org
> > > > >
> > > > > Signed-off-by: Ruifeng Wang
> > > > > Reviewed-by: Gavin Hu
> > > > > ---
> > > >
> > > > I see a failure in the distributor_autotest (on one of the builds):
> > > >
> > > > 64/82 DPDK:fast-tests / distributor_autotest FAIL 0.37 s (exit status 255 or signal 127 SIGinvalid)
> > > >
> > > > --- command ---
> > > >
> > > > DPDK_TEST='distributor_autotest'
> > > > /home/travis/build/ovsrobot/dpdk/build/app/test/dpdk-test -l 0-1
> > > > --file-prefix=distributor_autotest
> > > >
> > > > --- stdout ---
> > > >
> > > > EAL: Probing VFIO support...
> > > > APP: HPET is not enabled, using TSC as default timer
> > > > RTE>>distributor_autotest
> > > > === Basic distributor sanity tests ===
> > > > Worker 0 handled 32 packets
> > > > Sanity test with all zero hashes done.
> > > > Worker 0 handled 32 packets
> > > > Sanity test with non-zero hashes done
> > > > === testing big burst (single) ===
> > > > Sanity test of returned packets done
> > > > === Sanity test with mbuf alloc/free (single) ===
> > > > Sanity test with mbuf alloc/free passed
> > > > Too few cores to run worker shutdown test
> > > > === Basic distributor sanity tests ===
> > > > Worker 0 handled 32 packets
> > > > Sanity test with all zero hashes done.
> > > > Worker 0 handled 32 packets
> > > > Sanity test with non-zero hashes done
> > > > === testing big burst (burst) ===
> > > > Sanity test of returned packets done
> > > > === Sanity test with mbuf alloc/free (burst) ===
> > > > Line 326: Packet count is incorrect, 1048568, expected 1048576
> > > > Test Failed
> > > > RTE>>
> > > >
> > > > --- stderr ---
> > > >
> > > > EAL: Detected 2 lcore(s)
> > > > EAL: Detected 1 NUMA nodes
> > > > EAL: Multi-process socket /var/run/dpdk/distributor_autotest/mp_socket
> > > > EAL: Selected IOVA mode 'PA'
> > > > EAL: No available hugepages reported in hugepages-1048576kB
> > > >
> > > > -------
> > > >
> > > > Not sure how to help debug further. I'll re-start the job to see
> > > > if it 'clears' up - but I guess there may be a delicate
> > > > synchronization somewhere that needs to be accounted.
> > >
> > > Idem, and with the same loop I used before, it can be caught quickly.
> > >
> > > # time (log=/tmp/$$.log; while true; do echo distributor_autotest | taskset -c 0-1 ./build-gcc-static/app/test/dpdk-test --log-level *:8 -l 0-1 >$log 2>&1; grep -q 'Test OK' $log || break; done; cat $log; rm -f $log)
> > >
> > Thanks Aaron and David for your report. I can reproduce this issue with the
> > script.
> > Will fix it in next version.
> >
> > > [snip]
> > >
> > > RTE>>distributor_autotest
> > > EAL: Trying to obtain current memory policy.
> > > EAL: Setting policy MPOL_PREFERRED for socket 0
> > > EAL: Restoring previous memory policy: 0
> > > EAL: request: mp_malloc_sync
> > > EAL: Heap on socket 0 was expanded by 2MB
> > > EAL: Trying to obtain current memory policy.
> > > EAL: Setting policy MPOL_PREFERRED for socket 0
> > > EAL: Restoring previous memory policy: 0
> > > EAL: alloc_pages_on_heap(): couldn't allocate physically contiguous space
> > > EAL: Trying to obtain current memory policy.
> > > EAL: Setting policy MPOL_PREFERRED for socket 0
> > > EAL: Restoring previous memory policy: 0
> > > EAL: request: mp_malloc_sync
> > > EAL: Heap on socket 0 was expanded by 8MB
> > > === Basic distributor sanity tests ===
> > > Worker 0 handled 32 packets
> > > Sanity test with all zero hashes done.
> > > Worker 0 handled 32 packets
> > > Sanity test with non-zero hashes done
> > > === testing big burst (single) ===
> > > Sanity test of returned packets done
> > >
> > > === Sanity test with mbuf alloc/free (single) ===
> > > Sanity test with mbuf alloc/free passed
> > >
> > > Too few cores to run worker shutdown test
> > > === Basic distributor sanity tests ===
> > > Worker 0 handled 32 packets
> > > Sanity test with all zero hashes done.
> > > Worker 0 handled 32 packets
> > > Sanity test with non-zero hashes done
> > > === testing big burst (burst) ===
> > > Sanity test of returned packets done
> > >
> > > === Sanity test with mbuf alloc/free (burst) ===
> > > Line 326: Packet count is incorrect, 1048568, expected 1048576
> > > Test Failed
> > > RTE>>
> > > real 0m36.668s
> > > user 1m7.293s
> > > sys 0m1.560s
> > >
> > > Could be worth running this loop on all tests? (not talking about
> > > the CI, it would be a manual effort to catch lurking issues).
> > >
> > >
> > > --
> > > David Marchand
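
The reproduction loop quoted above could be generalized into a small helper that reruns any test command until its output no longer contains a success marker, along the lines of David's suggestion. What follows is only a minimal sketch, assuming a POSIX shell; the function name run_until_fail and its arguments are hypothetical and are not part of DPDK. Unlike the original loop, it stops after a fixed number of runs instead of looping forever.

```shell
#!/bin/sh
# Sketch only: rerun a command until its output stops containing a marker.
# run_until_fail is a hypothetical helper, not a DPDK tool.
#
# usage: run_until_fail MAX_RUNS MARKER COMMAND [ARGS...]
# Returns 0 if every run printed MARKER, 1 on the first run that did not
# (the captured log of that run is printed), 2 if no temp file was created.
run_until_fail() {
    max_runs=$1
    marker=$2
    shift 2
    log=$(mktemp) || return 2
    i=1
    while [ "$i" -le "$max_runs" ]; do
        # Capture stdout and stderr of this iteration only.
        "$@" >"$log" 2>&1
        if ! grep -q -- "$marker" "$log"; then
            echo "run $i: marker '$marker' not found"
            cat "$log"
            rm -f "$log"
            return 1
        fi
        i=$((i + 1))
    done
    rm -f "$log"
    echo "all $max_runs runs printed '$marker'"
    return 0
}
```

With the binary path and core list taken from the thread, and an arbitrary cap of 50 runs, an invocation might look like:
run_until_fail 50 'Test OK' sh -c 'echo distributor_autotest | taskset -c 0-1 ./build-gcc-static/app/test/dpdk-test -l 0-1'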