From: Harman Kalra
To: "Ruifeng Wang (Arm Technology China)"
CC: David Marchand, Aaron Conole, David Hunt, dev, "Gavin Hu (Arm Technology China)", Honnappa Nagarahalli, nd, dpdk stable
Date: Thu, 17 Oct 2019 11:42:17 +0000
Message-ID: <20191017114203.GA137626@outlook.office365.com>
References: <20191008095524.1585-1-ruifeng.wang@arm.com>
Subject: Re: [dpdk-dev] [EXT] RE: [dpdk-stable] [PATCH] lib/distributor: fix deadlock issue for aarch64
List-Id: DPDK patches and discussions

Hi,

I tested this patch. My observations:

1. With this patch, the issue of distributor_autotest getting suspended on the arm64 platform is resolved. However, continuous execution of this test still results in a test failure, as reported by Aaron.
2. On the x86 platform, I can still observe distributor_autotest getting suspended (stuck) on continuous execution of the test; it took about 7-8 iterations to reproduce the suspension.

Thanks

On Wed, Oct 09, 2019 at 05:52:03AM +0000, Ruifeng Wang (Arm Technology China) wrote:
> External Email
>
> ----------------------------------------------------------------------
>
> > -----Original Message-----
> > From: David Marchand
> > Sent: Wednesday, October 9, 2019 03:47
> > To: Aaron Conole
> > Cc: Ruifeng Wang (Arm Technology China); David
> > Hunt; dev; hkalra@marvell.com;
> > Gavin Hu (Arm Technology China); Honnappa
> > Nagarahalli; nd; dpdk
> > stable
> > Subject: Re: [dpdk-stable] [dpdk-dev] [PATCH] lib/distributor: fix deadlock
> > issue for aarch64
> >
> > On Tue, Oct 8, 2019 at 7:06 PM Aaron Conole wrote:
> > >
> > > Ruifeng Wang writes:
> > >
> > > > Distributor and worker threads rely on data structs in cache line
> > > > for synchronization. The shared data structs were not protected.
> > > > This caused deadlock issue on weaker memory ordering platforms as
> > > > aarch64.
> > > > Fix this issue by adding memory barriers to ensure synchronization
> > > > among cores.
> > > >
> > > > Bugzilla ID: 342
> > > > Fixes: 775003ad2f96 ("distributor: add new burst-capable library")
> > > > Cc: stable@dpdk.org
> > > >
> > > > Signed-off-by: Ruifeng Wang
> > > > Reviewed-by: Gavin Hu
> > > > ---
> > >
> > > I see a failure in the distributor_autotest (on one of the builds):
> > >
> > > 64/82 DPDK:fast-tests / distributor_autotest FAIL 0.37 s (exit status 255 or signal 127 SIGinvalid)
> > >
> > > --- command ---
> > >
> > > DPDK_TEST='distributor_autotest'
> > > /home/travis/build/ovsrobot/dpdk/build/app/test/dpdk-test -l 0-1
> > > --file-prefix=distributor_autotest
> > >
> > > --- stdout ---
> > >
> > > EAL: Probing VFIO support...
> > >
> > > APP: HPET is not enabled, using TSC as default timer
> > >
> > > RTE>>distributor_autotest
> > >
> > > === Basic distributor sanity tests ===
> > >
> > > Worker 0 handled 32 packets
> > >
> > > Sanity test with all zero hashes done.
> > >
> > > Worker 0 handled 32 packets
> > >
> > > Sanity test with non-zero hashes done
> > >
> > > === testing big burst (single) ===
> > >
> > > Sanity test of returned packets done
> > >
> > > === Sanity test with mbuf alloc/free (single) ===
> > >
> > > Sanity test with mbuf alloc/free passed
> > >
> > > Too few cores to run worker shutdown test
> > >
> > > === Basic distributor sanity tests ===
> > >
> > > Worker 0 handled 32 packets
> > >
> > > Sanity test with all zero hashes done.
> > >
> > > Worker 0 handled 32 packets
> > >
> > > Sanity test with non-zero hashes done
> > >
> > > === testing big burst (burst) ===
> > >
> > > Sanity test of returned packets done
> > >
> > > === Sanity test with mbuf alloc/free (burst) ===
> > >
> > > Line 326: Packet count is incorrect, 1048568, expected 1048576
> > >
> > > Test Failed
> > >
> > > RTE>>
> > >
> > > --- stderr ---
> > >
> > > EAL: Detected 2 lcore(s)
> > >
> > > EAL: Detected 1 NUMA nodes
> > >
> > > EAL: Multi-process socket /var/run/dpdk/distributor_autotest/mp_socket
> > >
> > > EAL: Selected IOVA mode 'PA'
> > >
> > > EAL: No available hugepages reported in hugepages-1048576kB
> > >
> > > -------
> > >
> > > Not sure how to help debug further. I'll re-start the job to see if
> > > it 'clears' up - but I guess there may be a delicate synchronization
> > > somewhere that needs to be accounted.
> >
> > Idem, and with the same loop I used before, it can be caught quickly.
> >
> > # time (log=/tmp/$$.log; while true; do echo distributor_autotest
> > |taskset -c 0-1 ./build-gcc-static/app/test/dpdk-test --log-level *:8
> > -l 0-1 >$log 2>&1; grep -q 'Test OK' $log || break; done; cat $log; rm -f $log)
> >
> Thanks Aaron and David for your report. I can reproduce this issue with the script.
> Will fix it in next version.
>
> > [snip]
> >
> > RTE>>distributor_autotest
> > EAL: Trying to obtain current memory policy.
> > EAL: Setting policy MPOL_PREFERRED for socket 0
> > EAL: Restoring previous memory policy: 0
> > EAL: request: mp_malloc_sync
> > EAL: Heap on socket 0 was expanded by 2MB
> > EAL: Trying to obtain current memory policy.
> > EAL: Setting policy MPOL_PREFERRED for socket 0
> > EAL: Restoring previous memory policy: 0
> > EAL: alloc_pages_on_heap(): couldn't allocate physically contiguous space
> > EAL: Trying to obtain current memory policy.
> > EAL: Setting policy MPOL_PREFERRED for socket 0
> > EAL: Restoring previous memory policy: 0
> > EAL: request: mp_malloc_sync
> > EAL: Heap on socket 0 was expanded by 8MB === Basic distributor sanity
> > tests === Worker 0 handled 32 packets Sanity test with all zero hashes done.
> > Worker 0 handled 32 packets
> > Sanity test with non-zero hashes done
> > === testing big burst (single) ===
> > Sanity test of returned packets done
> >
> > === Sanity test with mbuf alloc/free (single) === Sanity test with mbuf
> > alloc/free passed
> >
> > Too few cores to run worker shutdown test === Basic distributor sanity tests
> > === Worker 0 handled 32 packets Sanity test with all zero hashes done.
> > Worker 0 handled 32 packets
> > Sanity test with non-zero hashes done
> > === testing big burst (burst) ===
> > Sanity test of returned packets done
> >
> > === Sanity test with mbuf alloc/free (burst) === Line 326: Packet count is
> > incorrect, 1048568, expected 1048576 Test Failed
> > RTE>>
> > real 0m36.668s
> > user 1m7.293s
> > sys 0m1.560s
> >
> > Could be worth running this loop on all tests? (not talking about the CI, it
> > would be a manual effort to catch lurking issues).
> >
> >
> > --
> > David Marchand
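
For anyone who wants to repeat the rerun loop quoted above with a bounded number of iterations, here is a minimal sketch of the same idea as a shell function. The name run_until_fail and its arguments are hypothetical and not part of DPDK; like the quoted loop, it judges a run only by whether a marker string (for example 'Test OK') appears in the captured output, and it assumes mktemp is available.

```shell
#!/bin/sh
# Hypothetical helper (not part of DPDK): rerun a command until its output
# no longer contains a success marker, or until MAX_RUNS runs have passed.
# Usage: run_until_fail MAX_RUNS MARKER COMMAND [ARGS...]
# Returns 0 if every run contained MARKER; otherwise prints the number of
# the first failing run followed by its log, and returns 1.
run_until_fail() {
    max_runs=$1
    marker=$2
    shift 2
    log=$(mktemp) || return 2
    i=1
    while [ "$i" -le "$max_runs" ]; do
        # Capture stdout and stderr together, as the quoted loop does.
        # The exit status is ignored on purpose: only the marker decides.
        "$@" >"$log" 2>&1 || true
        if ! grep -q "$marker" "$log"; then
            echo "run $i failed"
            cat "$log"
            rm -f "$log"
            return 1
        fi
        i=$((i + 1))
    done
    rm -f "$log"
    return 0
}
```

Since the quoted loop feeds the test name to dpdk-test through a pipe, the command passed in would probably need to be a small wrapper function or an `sh -c '...'` string containing that pipeline. The bound on iterations is a design choice of this sketch; the quoted loop runs until the first failure.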