From mboxrd@z Thu Jan 1 00:00:00 1970
From: Aaron Conole
To: "Van Haaren, Harry"
Cc: Thomas Monjalon, "Amber, Kumar", "dev@dpdk.org", "Wang, Yipeng1", "Yigit, Ferruh", "Thakur, Sham Singh", David Marchand
References: <20191122182100.15631-1-kumar.amber@intel.com> <2900799.QLPOietlla@xps>
Date: Tue, 26 Nov 2019 08:57:16 -0500
In-Reply-To: (Van Haaren's message of "Tue, 26 Nov 2019 13:29:37 +0000")
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/26.2 (gnu/linux)
MIME-Version: 1.0
Content-Type: text/plain
Subject: Re: [dpdk-dev] [PATCH v3] hash: added a new API to hash to query key id
List-Id: DPDK patches and discussions

"Van Haaren, Harry" writes:

>> -----Original Message-----
>> From: Van Haaren, Harry
>> Sent: Tuesday, November 26, 2019 1:19 PM
>> To: Aaron Conole ; Thomas Monjalon
>
>
>
>> > EAL: Test assert service_lcore_en_dis_able line 487 failed: Ex-service core
>> > function call had no effect.
>> >
>> > So I'll spend some time in this area, it seems.
>>
>>
>> The below diff makes it 100% reproducible here, failing every time.
>>
>> It seems like the main thread is returning, before the service thread has
>> returned.
>>
>> The rte_eal_mp_wait_lcore() call seems to not wait on the service-core,
>> which allows
>> the main thread to read the "service_remote_launch_flag" value as 0 (before
>> the service-thread writes it to 1).
>>
>> Adding the delay between the service launch and service write being
>> performed makes this issue much much more likely to occur - so the above
>> description I have confidence in.
>>
>> What I'm not clear on (yet) is why the eal_mp_wait_lcore() isn't waiting...
>>
>> -H
>>
>>
>> diff --git a/app/test/test_service_cores.c b/app/test/test_service_cores.c
>> index 9fe38f5e0..846ad00d1 100644
>> --- a/app/test/test_service_cores.c
>> +++ b/app/test/test_service_cores.c
>> @@ -445,6 +445,7 @@ static int
>>  service_remote_launch_func(void *arg)
>>  {
>>         RTE_SET_USED(arg);
>> +       rte_delay_ms(100);
>>         service_remote_launch_flag = 1;
>>         return 0;
>>  }
>
> Diff below seems to fix the problem here; Aaron would you test the below fix in your setup for a while too?
> I have a loop running here attempting to reproduce - but before 100% failures and so far 100% passes with the added wait_lcore() call.
>
>
> diff --git a/app/test/test_service_cores.c b/app/test/test_service_cores.c
> index 9fe38f5e0..62ffedb19 100644
> --- a/app/test/test_service_cores.c
> +++ b/app/test/test_service_cores.c
> @@ -445,6 +445,7 @@ static int
>  service_remote_launch_func(void *arg)
>  {
>         RTE_SET_USED(arg);
> +       rte_delay_ms(100);
>         service_remote_launch_flag = 1;
>         return 0;
>  }
> @@ -483,6 +484,7 @@ service_lcore_en_dis_able(void)
>         int ret = rte_eal_remote_launch(service_remote_launch_func, NULL,
>                                         slcore_id);
>         TEST_ASSERT_EQUAL(0, ret, "Ex-service core remote launch failed.");
> +       rte_eal_wait_lcore(slcore_id);
>         rte_eal_mp_wait_lcore();

Ahh, I see.

Actually, this brings up a question - is the intent for mp_wait_lcore to
cycle through the service cores as well?  Because IIUC, the issue is that
an lcore is normally set to ROLE_RTE, but service cores are set to
ROLE_SERVICE, and then the wait cannot work.

If the idea is that mp_wait_lcore should work (and looking at the test,
it seems like that is the intent?) then it will need to cycle through
service cores, too.

If the intent is that it shouldn't, then we should remove those calls
from the test application to prevent developers from misunderstanding.

Either way, the documentation for `rte_service_lcore_start` is too
ambiguous and needs to state whether mp_wait_lcore should work on
service cores.  I think either it should (which means updating
rte_get_next_lcore to include ROLE_SERVICE), or none of the lcore
functions should work, and we should have an rte_service...()
equivalent to use instead.

>         TEST_ASSERT_EQUAL(1, service_remote_launch_flag,
>                         "Ex-service core function call had no effect.");
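
To make the role question above concrete, here is a rough standalone sketch of the two behaviours being discussed. It is only a model and does not use any EAL headers: the `model_*` names, the role enum, and the `include_service` flag are all invented for illustration, and the real iteration in rte_get_next_lcore may differ in detail.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins for the lcore roles; NOT the real DPDK definitions. */
enum model_role { MODEL_ROLE_OFF, MODEL_ROLE_RTE, MODEL_ROLE_SERVICE };

struct model_lcore {
	enum model_role role;
	bool running;	/* true while a launched function has not returned */
};

/*
 * Hypothetical per-lcore wait: it does not look at the role, so it works
 * on a service lcore too. "Waiting" is modelled by marking the lcore as
 * no longer running.
 */
static void
model_wait_lcore(struct model_lcore *lc)
{
	lc->running = false;
}

/*
 * Hypothetical wait-all loop. With include_service == false it only
 * visits ROLE_RTE lcores (the behaviour suspected in this thread);
 * with include_service == true it also visits ROLE_SERVICE lcores.
 * Returns the number of lcores that were waited on.
 */
static size_t
model_mp_wait(struct model_lcore *lcores, size_t n, bool include_service)
{
	size_t waited = 0;

	assert(lcores != NULL);
	for (size_t i = 0; i < n; i++) {
		enum model_role r = lcores[i].role;

		if (r == MODEL_ROLE_RTE ||
		    (include_service && r == MODEL_ROLE_SERVICE)) {
			model_wait_lcore(&lcores[i]);
			waited++;
		}
	}
	return waited;
}
```

With `include_service` false, a ROLE_SERVICE entry is skipped and is still marked running when the loop returns, which matches the symptom in the test. With it true, the service entry is waited on like any other lcore.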