From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from mails.dpdk.org (mails.dpdk.org [217.70.189.124])
	by inbox.dpdk.org (Postfix) with ESMTP id 70E6BA034F;
	Mon, 11 Oct 2021 17:06:39 +0200 (CEST)
Received: from [217.70.189.124] (localhost [127.0.0.1])
	by mails.dpdk.org (Postfix) with ESMTP id 0840F410DA;
	Mon, 11 Oct 2021 17:06:39 +0200 (CEST)
Received: from us-smtp-delivery-124.mimecast.com
	(us-smtp-delivery-124.mimecast.com [170.10.133.124])
	by mails.dpdk.org (Postfix) with ESMTP id 38794410D7;
	Mon, 11 Oct 2021 17:06:37 +0200 (CEST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com;
	s=mimecast20190719; t=1633964796;
	h=from:from:reply-to:subject:subject:date:date:message-id:message-id:
	 to:to:cc:cc:mime-version:mime-version:content-type:content-type:
	 in-reply-to:in-reply-to:references:references;
	bh=/dFunb03kzhsjGpBFOvi1dMnoEVMYZqhno3iqcO4+2w=;
	b=LsMfRh/SL7ggP7f+ZipGDYUSXMoyHQEDtDlFiTCo4r3IGmNcA5D4CLwgalEbfpZ914rCoJ
	 1CnbN5jviZhqjmQHHoRIjBarPmbOYJyDP8UYAGuAvR2/Sm78urDQ7SpXDTFx+UxU9ZpHZo
	 OC1y9B8jtYeU3JMnCG2RaffRHXDNdBQ=
Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com
	[209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id
	us-mta-558-OwxusRBOPZqdtINCf0KJRg-1; Mon, 11 Oct 2021 11:06:33 -0400
X-MC-Unique: OwxusRBOPZqdtINCf0KJRg-1
Received: from smtp.corp.redhat.com
	(int-mx03.intmail.prod.int.phx2.redhat.com [10.5.11.13])
	(using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))
	(No client certificate requested)
	by mimecast-mx01.redhat.com (Postfix) with ESMTPS id 89EEA100CCC0;
	Mon, 11 Oct 2021 15:06:32 +0000 (UTC)
Received: from RHTPC1VM0NT (unknown [10.22.32.186])
	by smtp.corp.redhat.com (Postfix) with ESMTPS id 25C4560938;
	Mon, 11 Oct 2021 15:06:28 +0000 (UTC)
From: Aaron Conole
To: David Marchand
Cc: dev@dpdk.org, stable@dpdk.org, Harry van Haaren, Kevin Laatz
References: <20211011145430.6587-1-david.marchand@redhat.com>
Date: Mon, 11 Oct 2021 11:06:26 -0400
In-Reply-To: <20211011145430.6587-1-david.marchand@redhat.com> (David
	Marchand's message of "Mon, 11 Oct 2021 16:54:30 +0200")
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/27.2 (gnu/linux)
MIME-Version: 1.0
X-Scanned-By: MIMEDefang 2.79 on 10.5.11.13
Authentication-Results: relay.mimecast.com;
	auth=pass smtp.auth=CUSA124A263 smtp.mailfrom=aconole@redhat.com
X-Mimecast-Spam-Score: 0
X-Mimecast-Originator: redhat.com
Content-Type: text/plain
Subject: Re: [dpdk-dev] [PATCH] test/service: fix race in attr check
X-BeenThere: dev@dpdk.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: DPDK patches and discussions
Errors-To: dev-bounces@dpdk.org
Sender: "dev"

David Marchand writes:

> The CI reported rare (and cryptic) failures like:
>
> RTE>>service_autotest
>  + ------------------------------------------------------- +
>  + Test Suite : service core test suite
>  + ------------------------------------------------------- +
>  + TestCase [ 0] : unregister_all succeeded
>  + TestCase [ 1] : service_name succeeded
>  + TestCase [ 2] : service_get_by_name succeeded
> Service dummy_service Summary
>   dummy_service: stats 1 calls 0 cycles 0 avg: 0
> Service dummy_service Summary
>   dummy_service: stats 0 calls 0 cycles 0 avg: 0
>  + TestCase [ 3] : service_dump succeeded
>  + TestCase [ 4] : service_attr_get failed
>  + TestCase [ 5] : service_lcore_attr_get succeeded
>  + TestCase [ 6] : service_probe_capability succeeded
>  + TestCase [ 7] : service_start_stop succeeded
>  + TestCase [ 8] : service_lcore_add_del succeeded
>  + TestCase [ 9] : service_lcore_start_stop succeeded
>  + TestCase [10] : service_lcore_en_dis_able succeeded
>  + TestCase [11] : service_mt_unsafe_poll succeeded
>  + TestCase [12] : service_mt_safe_poll succeeded
> perf test for MT Safe: 42.7 cycles per call
>  + TestCase [13] : service_app_lcore_mt_safe succeeded
> perf test for MT Unsafe: 73.3 cycles per call
>  + TestCase [14] : service_app_lcore_mt_unsafe succeeded
>  + TestCase [15] : service_may_be_active succeeded
>  + TestCase [16] : service_active_two_cores succeeded
>  + ------------------------------------------------------- +
>  + Test Suite Summary : service core test suite
>  + ------------------------------------------------------- +
>  + Tests Total : 17
>  + Tests Skipped : 0
>  + Tests Executed : 17
>  + Tests Unsupported: 0
>  + Tests Passed : 16
>  + Tests Failed : 1
>  + ------------------------------------------------------- +
> Test Failed
> RTE>>
> stderr:
> EAL: Detected CPU lcores: 16
> EAL: Detected NUMA nodes: 2
> EAL: Detected static linkage of DPDK
> EAL: Multi-process socket /var/run/dpdk/service_autotest/mp_socket
> EAL: Selected IOVA mode 'PA'
> EAL: No available 1048576 kB hugepages reported
> EAL: VFIO support initialized
> EAL: Device 0000:03:00.0 is not NUMA-aware, defaulting socket to 0
> APP: HPET is not enabled, using TSC as default timer
> EAL: Test assert service_attr_get line 340 failed: attr_get() call didn't
> get call count (zero)
>
> According to API, trying to stop a service lcore is not possible if this
> lcore is the only one associated to a service.
> Doing this will result in a -EBUSY return code from
> rte_service_lcore_stop() which the service_attr_get subtest was not
> checking.
> This left the service lcore running, and a race existed with the main
> lcore on checking the service attributes which triggered this CI
> failure.
>
> To fix this, dissociate the service lcore with current service.
>
> Once fixed this first issue, a race still exists, because the
> wait_slcore_inactive helper added in a previous fix was not
> paired with a check that the service lcore _did_ stop.
>
> Add missing check on rte_service_lcore_may_be_active.
>
> Fixes: 4d55194d76a4 ("service: add attribute get function")
> Fixes: 52bb6be259ff ("test/service: fix race condition on stopping lcore")
> Cc: stable@dpdk.org
>
> Signed-off-by: David Marchand
> ---

Excellent catch.

Acked-by: Aaron Conole
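
For readers following the description quoted above, here is a rough sketch of the stop sequence it argues for: unmap the service, check the return code of the stop call, then confirm the lcore really went idle. This is a toy model, not the patch itself and not DPDK code. Every sim_* name and the stop_and_confirm helper are hypothetical stand-ins for the rte_service_* calls and the wait_slcore_inactive helper named in the commit message, and the polling loop is an assumption about how such a wait might look.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Toy model only: the sim_* names are hypothetical, not DPDK API. */
struct sim_lcore {
    int running;        /* run state requested by the control thread */
    int active;         /* nonzero while the lcore loop may still execute */
    int mapped_running; /* running services served only by this lcore */
};

/* Models the rule from the commit message: stopping is refused with
 * -EBUSY while this lcore is the only one serving a running service. */
static int sim_lcore_stop(struct sim_lcore *lc)
{
    if (lc->mapped_running > 0)
        return -EBUSY;
    lc->running = 0;
    return 0;
}

/* Models dissociating one service from the lcore. */
static void sim_unmap_service(struct sim_lcore *lc)
{
    if (lc->mapped_running > 0)
        lc->mapped_running--;
}

/* One iteration of the lcore loop: it notices the stop request
 * and only then reports itself as inactive. */
static void sim_lcore_poll(struct sim_lcore *lc)
{
    if (!lc->running)
        lc->active = 0;
}

static int sim_may_be_active(const struct sim_lcore *lc)
{
    return lc->active;
}

/* Unmap, stop (checking the return code), then wait until the lcore
 * is confirmed inactive. Returns 0 on success, a negative errno value
 * if the stop was refused or the lcore never went idle. */
static int stop_and_confirm(struct sim_lcore *lc, int max_polls)
{
    assert(lc != NULL);

    sim_unmap_service(lc);

    int rc = sim_lcore_stop(lc);
    if (rc != 0)
        return rc; /* do not read attributes while the lcore still runs */

    for (int i = 0; i < max_polls; i++) {
        sim_lcore_poll(lc); /* stands in for the real lcore making progress */
        if (!sim_may_be_active(lc))
            return 0;
    }
    return -ETIMEDOUT;
}
```

In this model, calling sim_lcore_stop() on an lcore that still has a running service mapped returns -EBUSY and leaves it active, which is the unchecked case the commit message describes. Only after stop_and_confirm() returns 0 would it be safe to read the service attributes from the main lcore.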