From: "Van Haaren, Harry"
To: Aaron Conole, David Marchand
CC: "dev@dpdk.org"
Date: Mon, 7 Oct 2019 09:50:46 +0000
Subject: Re: [dpdk-dev] [BUG] service_lcore_en_dis_able from service_autotest failing

> -----Original Message-----
> From: Aaron Conole [mailto:aconole@redhat.com]
> Sent: Wednesday, September 4, 2019 8:56 PM
> To: David Marchand
> Cc: Van Haaren, Harry; dev@dpdk.org
> Subject: Re: [dpdk-dev] [BUG] service_lcore_en_dis_able from service_autotest failing
>
> David Marchand writes:
>
> > On Wed, Sep 4, 2019 at 12:04 PM David Marchand wrote:
> >>
> >> On Wed, Sep 4, 2019 at 11:42 AM Van Haaren, Harry wrote:
> >> >
> >> > > -----Original Message-----
> >> > > From: Aaron Conole [mailto:aconole@redhat.com]
> >> > > Sent: Tuesday, September 3, 2019 3:46 PM
> >> > > To: Van Haaren, Harry
> >> > > Cc: dev@dpdk.org
> >> > > Subject: [BUG] service_lcore_en_dis_able from service_autotest failing
> >> > >
> >> > > Hi Harry,
> >> >
> >> > Hey Aaron,
> >> >
> >> > > I noticed as part of series_6218
> >> > > (http://patches.dpdk.org/project/dpdk/list/?series=6218) that the travis
> >> > > build had a single failure, in service_autotest but it doesn't seem
> >> > > related to the series at all.
> >> > >
> >> > > https://travis-ci.com/ovsrobot/dpdk/jobs/230358460
> >> > >
> >> > > Not sure if there's some kind of debugging we can add or look at to help
> >> > > diagnose failures when they occur. Do you have time to have a look?
> >> >
> >> > Thanks for flagging this.
> >> >
> >> > I've just re-run the unit tests here multiple times to see if I can
> >> > reproduce something strange, no luck on reproducing the issue.
> >> >
> >> > Attempted with clang-6 and clang-7 (travis error on clang-7),
> >> > still no issues found.
> >> >
> >> > Building with Clang-7 and Shared libs (instead of default static)
> >> > still no issues found.
> >> >
> >> > If somebody can reproduce please send an update to here and I'll
> >> > attempt to replicate that setup. Right now I can't reproduce the issue.
> >>
> >> You have to be patient, but I caught it on my laptop:
> >>
> >
> > Ok, and now with the logs:
> >
> > # time (log=/tmp/$$.log; while true; do echo service_autotest |taskset -c 0-1 ./build-gcc-static/app/test/dpdk-test --log-level *:8 -l 0-1 >>$log 2>&1; grep -q 'Test OK' $log || break; done; cat $log; rm -f $log)
> > EAL: Detected lcore 0 as core 0 on socket 0
> > EAL: Detected lcore 1 as core 1 on socket 0
> > EAL: Detected lcore 2 as core 2 on socket 0
> > EAL: probe driver: 8086:15d7 net_e1000_em
> > EAL: Not managed by a supported kernel driver, skipped
> > EAL: Module /sys/module/vfio not found! error 2 (No such file or directory)
> > APP: HPET is not enabled, using TSC as default timer
> > RTE>>service_autotest
> > + ------------------------------------------------------- +
> > + Test Suite : service core test suite
> > + ------------------------------------------------------- +
> > + TestCase [ 0] : unregister_all succeeded
> > + TestCase [ 1] : service_name succeeded
> > + TestCase [ 2] : service_get_by_name succeeded
> > Service dummy_service Summary
> > dummy_service: stats 1 calls 0 cycles 0 avg: 0
> > Service dummy_service Summary
> > dummy_service: stats 0 calls 0 cycles 0 avg: 0
> > + TestCase [ 3] : service_dump succeeded
> > + TestCase [ 4] : service_attr_get succeeded
> > + TestCase [ 5] : service_lcore_attr_get succeeded
> > + TestCase [ 6] : service_probe_capability succeeded
> > + TestCase [ 7] : service_start_stop succeeded
> > + TestCase [ 8] : service_lcore_add_del skipped
> > + TestCase [ 9] : service_lcore_start_stop succeeded
> > EAL: Test assert service_lcore_en_dis_able line 488 failed: Ex-service core function call had no effect.
> > + TestCase [10] : service_lcore_en_dis_able failed
> > + TestCase [11] : service_mt_unsafe_poll skipped
> > + TestCase [12] : service_mt_safe_poll skipped
> > + TestCase [13] : service_app_lcore_mt_safe succeeded
> > + TestCase [14] : service_app_lcore_mt_unsafe succeeded
> > + TestCase [15] : service_may_be_active succeeded
> > + ------------------------------------------------------- +
> > + Test Suite Summary
> > + Tests Total : 16
> > + Tests Skipped : 3
> > + Tests Executed : 16
> > + Tests Unsupported: 0
> > + Tests Passed : 12
> > + Tests Failed : 1
> > + ------------------------------------------------------- +
> > Test Failed
> > RTE>>EAL: request: mp_malloc_sync
> > EAL: Heap on socket 0 was shrunk by 2MB
> >
> > real 2m42.884s
> > user 5m1.902s
> > sys 0m2.208s
>
> I can confirm - takes about 1m to fail.

Hi Aaron and David,

I've been attempting to reproduce this, but still see no errors here.

Given the nature of service cores, and how difficult this is to reproduce here, this feels like a race condition, possibly one that does not show up in every binary. Can you describe your compiler and command setup? (gcc 7.4.0 here.)

I'm using Meson to build, so I'm reproducing with it instead of the command provided above. This should make no difference to reproducing the issue:

$ meson test service_autotest --repeat 50
1/1 DPDK:fast-tests / service_autotest OK 3.86 s
1/1 DPDK:fast-tests / service_autotest OK 3.87 s
...
1/1 DPDK:fast-tests / service_autotest OK 3.84 s

OK: 50
FAIL: 0
SKIP: 0
TIMEOUT: 0

I'll keep it running for a few hours, but I have little faith given that it only takes about a minute to fail on your machines...

Regards,
-Harry
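
The race-condition hypothesis can be made concrete with a small, generic sketch. This is plain C with POSIX threads, not DPDK code: `fake_service`, `worker`, `check_after_fixed_delay` and `wait_for_call` are hypothetical names, and the sketch assumes (this thread does not confirm it) that the failing assertion samples a call counter once after a fixed delay. Under that assumption, a service thread that gets scheduled late, for example when the whole process is pinned to two CPUs with `taskset -c 0-1`, makes a single fixed-delay check fail intermittently, while polling against a generous timeout tolerates the delay.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical stand-in for a service: the check only observes how many
 * times its callback has run (this is NOT DPDK's rte_service API). */
struct fake_service {
	atomic_int calls;
};

/* Arguments for a hypothetical worker thread playing the role of a
 * service lcore that is scheduled late. */
struct worker_arg {
	struct fake_service *svc;
	long delay_ms;
};

static void sleep_ms(long ms)
{
	assert(ms >= 0);
	struct timespec ts = { .tv_sec = ms / 1000,
			       .tv_nsec = (ms % 1000) * 1000000L };
	nanosleep(&ts, NULL);
}

/* Worker: waits delay_ms (simulating a late schedule on a shared CPU),
 * then "runs" the service callback exactly once. */
static void *worker(void *p)
{
	struct worker_arg *a = p;
	sleep_ms(a->delay_ms);
	atomic_fetch_add(&a->svc->calls, 1);
	return NULL;
}

/* Fragile pattern: sleep a fixed amount, then sample the side effect once.
 * Returns false if the worker has not run yet, even if it runs later. */
static bool check_after_fixed_delay(struct fake_service *svc, long wait_ms)
{
	sleep_ms(wait_ms);
	return atomic_load(&svc->calls) > 0;
}

/* More tolerant pattern: poll in 1 ms steps until the side effect shows up
 * or a generous timeout expires. */
static bool wait_for_call(struct fake_service *svc, long timeout_ms)
{
	for (long waited = 0; waited < timeout_ms; waited++) {
		if (atomic_load(&svc->calls) > 0)
			return true;
		sleep_ms(1);
	}
	return atomic_load(&svc->calls) > 0;
}
```

With the worker delayed by 500 ms, `check_after_fixed_delay(&svc, 1)` reports "no effect" even though the callback does run later, whereas `wait_for_call(&svc, 5000)` succeeds. Whether service_autotest really uses a fixed delay at that assertion, and what timeout would suit slow CI machines, would need checking against the test source.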