From: Aaron Conole
To: Lukasz Wojciechowski
Cc: David Marchand, Van Haaren Harry, Igor Romanov, Honnappa Nagarahalli, Phil Yang, dev, Ferruh Yigit
References:
 <6d0bf076-c9bd-564c-20f6-ba8a49bc2389@intel.com> <88ee4323-1336-4bc7-414d-3dd7f19d684f@partner.samsung.com>
Date: Fri, 17 Jul 2020 18:38:43 -0400
In-Reply-To: <88ee4323-1336-4bc7-414d-3dd7f19d684f@partner.samsung.com> (Lukasz Wojciechowski's message of "Fri, 17 Jul 2020 22:31:03 +0200")
Subject: Re: [dpdk-dev] Random failure in service_autotest

Lukasz Wojciechowski writes:

> On 17.07.2020 at 17:19, David Marchand wrote:
>> On Fri, Jul 17, 2020 at 10:56 AM David Marchand wrote:
>>> On Wed, Jul 15, 2020 at 12:41 PM Ferruh Yigit wrote:
>>>> On 7/15/2020 11:14 AM, David Marchand wrote:
>>>>> Hello Harry and guys who touched the service code recently :-)
>>>>>
>>>>> I spotted a failure for the service UT in Travis:
>>>>> https://travis-ci.com/github/ovsrobot/dpdk/jobs/361097992#L18697
>>>>>
>>>>> I found only a single instance of this failure and tried to reproduce
>>>>> it with my usual "brute" active loop with no success so far.
>>>> +1, I didn't able to reproduce it in my environment but observed it in the
>>>> Travis CI.
>>>>
>>>>> Any chance it could be due to recent changes?
>>>>> https://git.dpdk.org/dpdk/commit/?id=f3c256b621262e581d3edcca383df83875ab7ebe
>>>>> https://git.dpdk.org/dpdk/commit/?id=048db4b6dcccaee9277ce5b4fbb2fe684b212e22
>>> I can see more occurrences of the issue in the CI.
>>> I just applied the patch changing the log level for test assert, in
>>> the hope it will help.
>> And... we just got one with logs:
>> https://travis-ci.com/github/ovsrobot/dpdk/jobs/362109882#L18948
>>
>> EAL: Test assert service_lcore_attr_get line 396 failed:
>> lcore_attr_get() didn't get correct loop count (zero)
>>
>> It looks like a race between the service core still running and the
>> core resetting the loops attr.
>>
> Yes, it seems to be just lack of patience of the test. It should wait a
> bit for lcore to stop before resetting attrs.
> Something like this should help:
> @@ -384,6 +384,9 @@ service_lcore_attr_get(void)
>
>         rte_service_lcore_stop(slcore_id);
>
> +       /* wait for the service lcore to stop */
> +       rte_delay_ms(200);
> +
>         TEST_ASSERT_EQUAL(0, rte_service_lcore_attr_reset_all(slcore_id),
>                         "Valid lcore_attr_reset_all() didn't return success");

Would an rte_eal_wait_lcore make sense?  Overall, I really dislike sleeps
because they can hide racy synchronization points.