From: "Van Haaren, Harry"
To: Thomas Monjalon, David Marchand
CC: "dev@dpdk.org", "dpdklab@iol.unh.edu", "ci@dpdk.org", "Honnappa.Nagarahalli@arm.com", mattias.ronnblom, Morten Brørup, Tyler Retzlaff, Aaron Conole
Subject: RE: [PATCH v3] test/service: fix spurious failures by extending timeout
Date: Fri, 3 Feb 2023 16:09:58 +0000
References: <20221006081729.578475-1-harry.van.haaren@intel.com> <21760850.EfDdHjke4D@thomas>
In-Reply-To: <21760850.EfDdHjke4D@thomas>
List-Id: DPDK CI discussions

> -----Original Message-----
> From: Thomas Monjalon
> Sent: Friday, February 3, 2023 3:16 PM
> To: David Marchand; Van Haaren, Harry
> Cc: dev@dpdk.org; dpdklab@iol.unh.edu; ci@dpdk.org;
> Honnappa.Nagarahalli@arm.com; mattias.ronnblom; Morten Brørup;
> Tyler Retzlaff; Aaron Conole
> Subject: Re: [PATCH v3] test/service: fix spurious failures by extending timeout
>
> 03/02/2023 16:03, Van Haaren, Harry:
> > From: Van Haaren, Harry
> > > > The timeout approach just does not have its place in a functional test.
> > > > Either this test is rewritten, or it must go to the performance tests
> > > > list so that we stop getting false positives.
> > > > Can you work on this?
> > >
> > > I'll investigate various approaches on Thursday and reply here with suggested
> > > next steps.
> >
> > I've identified 3 checks that fail in CI (from the above log outputs); all 3 cases
> > have different delays: 100 ms, 200 ms and 1000 ms.
> > In the CI, the service-core just hasn't been scheduled (yet), and that causes the
> > "failure".
> >
> > Option 1)
> > One option is to while(1) loop, waiting for the service-thread to be scheduled.
> > This can be seen as "increasing the timeout", however in this case the test-case would
> > fail not in the test-code, but in the meson-test runner as a timeout (with a 10sec
> > default?)
> > The benefit here is that massively increasing the wait (~1sec or less to 10 sec) will
> > cover all/many of the CI timeouts.
> >
> > Option 2)
> > Move to perf-tests, and not run these in a noisy-CI environment where the
> > results are not consistent enough to have value. This would mean that the tests are
> > not run in CI. The 3 checks in question are below; they all *require* the service
> > core to be scheduled:
> > service_attr_get() -> requires service core to run for service stats to increment
> > service_lcore_attr_get() -> requires service core to run for lcore stats to increment
> > service_lcore_start_stop() -> requires service to run to ensure service-func itself executes.
> >
> > I don't see how we can "improve" option 2 to not require the service-thread to
> > be scheduled by the OS..
> > And the only way to make the OS schedule it in the CI more consistently is to
> > give it more time?
>
> We are talking about seconds.
> There are setups where scheduling a thread is taking seconds?

Apparently so - otherwise these tests would always pass.
They *only* fail at random runs in CI, and reliably pass everywhere else. I've not had
them fail locally, and that includes running in a loop for hours on a busy system,
but not on a low-priority CI VM in a busy datacenter.

[Bruce wrote in separate mail]
>>> For me, the question is - why hasn't the service-core been scheduled? Can
>>> we use sched-yield or some other mechanism to force a wakeup of it?

I'm not aware of a way to make *a specific other pthread* wake up. We could sacrifice
the current lcore that's waiting for the service-lcore, with a sched_yield() as you suggest.
It would potentially "churn" the scheduler enough to give the service core some CPU?
It's a guess/gamble in the end, kind of like the timeouts we have today.

> > Thoughts and input welcomed, I'm happy to make the code changes
> > themselves, it's a small effort for both option 1 & 2.
>
> For time-sensitive tests, yes they should be in perf tests category.
> As David said earlier, no timeout approach in functional tests.

Ok, as before, option 1) is to while(1) and wait for "success". Then there's no
timeout in the test code, but our meson test runner will time out/fail after ~10sec IIRC.

Or we move the tests to perf-tests, as per Option 2), and these simply won't run in CI.

I'm OK with all 3 (including testing with sched_yield() for a month or two to see if that helps?)
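
For illustration only, here is a rough sketch of option 1 combined with the sched_yield() idea. It assumes plain pthreads standing in for the service lcore and does not use the DPDK service API; service_calls, service_thread_main(), wait_for_service_run() and run_service_wait_demo() are hypothetical names, not code from the test suite.

```c
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for the call counter that a service core would
 * increment each time the service function executes. */
static atomic_uint_fast64_t service_calls;

/* Hypothetical stand-in for the service lcore: runs the "service" once. */
static void *
service_thread_main(void *arg)
{
	(void)arg;
	atomic_fetch_add_explicit(&service_calls, 1, memory_order_release);
	return NULL;
}

/* Option 1 plus sched_yield(): loop until the service has run at least
 * once. There is deliberately no timeout in the test code; a hang is left
 * for the external test runner to detect. Returns the observed count. */
static uint64_t
wait_for_service_run(void)
{
	uint64_t calls;

	while ((calls = atomic_load_explicit(&service_calls,
			memory_order_acquire)) == 0)
		sched_yield(); /* give up the CPU so another thread may run */

	return calls;
}

/* Start the stand-in service thread, wait for it to run, then clean up.
 * Returns 0 on success, -1 if the thread could not be created. */
static int
run_service_wait_demo(void)
{
	pthread_t tid;
	uint64_t calls;

	atomic_store(&service_calls, 0);
	if (pthread_create(&tid, NULL, service_thread_main, NULL) != 0)
		return -1;

	calls = wait_for_service_run();
	pthread_join(tid, NULL);
	return calls >= 1 ? 0 : -1;
}
```

Calling run_service_wait_demo() from a test body returns 0 once the stand-in thread has been scheduled. As with the real tests, sched_yield() is only a hint to the scheduler, so this removes the in-test timeout but does not guarantee when the other thread runs.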