From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from dpdk.org (dpdk.org [92.243.14.124]) by inbox.dpdk.org (Postfix) with ESMTP id 68C22A0540; Mon, 20 Jul 2020 14:08:49 +0200 (CEST) Received: from [92.243.14.124] (localhost [127.0.0.1]) by dpdk.org (Postfix) with ESMTP id 9F48229CB; Mon, 20 Jul 2020 14:08:48 +0200 (CEST) Received: from mga05.intel.com (mga05.intel.com [192.55.52.43]) by dpdk.org (Postfix) with ESMTP id 4117325B3 for ; Mon, 20 Jul 2020 14:08:46 +0200 (CEST) IronPort-SDR: DNvr1Yz2Iv5O9a+SBCXrLjP+B/ffDAL+94g7h0tGKTJnV6thRaoNKt44INISkdeLt7SA0j4oYe tiJKxM6/XPuA== X-IronPort-AV: E=McAfee;i="6000,8403,9687"; a="234747612" X-IronPort-AV: E=Sophos;i="5.75,374,1589266800"; d="scan'208";a="234747612" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga005.fm.intel.com ([10.253.24.32]) by fmsmga105.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 20 Jul 2020 05:08:45 -0700 IronPort-SDR: ybks7pDfn5c57GDM30InwQYQaL5VPFZG8NJRag2fpy0biwt+4iAs1Bec5Ikbs8ZzccYxZRpiNy 5llmuRN8unzg== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.75,374,1589266800"; d="scan'208";a="487703130" Received: from silpixa00399779.ir.intel.com (HELO silpixa00399779.ger.corp.intel.com) ([10.237.222.209]) by fmsmga005.fm.intel.com with ESMTP; 20 Jul 2020 05:08:43 -0700 From: Harry van Haaren To: dev@dpdk.org Cc: david.marchand@redhat.com, igor.romanov@oktetlabs.ru, honnappa.nagarahalli@arm.com, ferruh.yigit@intel.com, nd@arm.com, aconole@redhat.com, l.wojciechow@partner.samsung.com, Harry van Haaren Date: Mon, 20 Jul 2020 13:09:38 +0100 Message-Id: <20200720120938.34660-1-harry.van.haaren@intel.com> X-Mailer: git-send-email 2.17.1 Subject: [dpdk-dev] [PATCH] service: fix stop API to wait for service thread X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" This commit improves the service_lcore_stop() implementation, waiting for the service core in question to return. The service thread itself now has a variable to indicate if its thread is active. When zero the service thread has completed its service, and has returned from the service_runner_func() function. This fixes a race condition observed in the DPDK CI, where the statistics of the service were not consistent with the expectation due to the service thread still running, and incrementing a stat after stop was called. Signed-off-by: Harry van Haaren --- This is one possible solution, that avoids a class of race-conditions based on stop() api and following behaviours. Without a change in implementation of the service core thread, we could not detect when the thread was actually finished. This is now possible, and the stop api makes use of it to wait for 1000x one millisecond, or log a warning that a service core didn't return quickly. Thanks for the discussion/debug on list - I'm not sure how to add reported-by/suggested-by etc tags: but I'll resend a V2 (or can add on apply). --- lib/librte_eal/common/rte_service.c | 24 ++++++++++++++++++++++++ 1 file changed, 24 insertions(+) diff --git a/lib/librte_eal/common/rte_service.c b/lib/librte_eal/common/rte_service.c index 6a0e0ff65..d2255587d 100644 --- a/lib/librte_eal/common/rte_service.c +++ b/lib/librte_eal/common/rte_service.c @@ -65,6 +65,7 @@ struct core_state { /* map of services IDs are run on this core */ uint64_t service_mask; uint8_t runstate; /* running or stopped */ + uint8_t thread_active; /* indicates when the thread is in service_run() */ uint8_t is_service_core; /* set if core is currently a service core */ uint8_t service_active_on_lcore[RTE_SERVICE_NUM_MAX]; uint64_t loops; @@ -457,6 +458,8 @@ service_runner_func(void *arg) const int lcore = rte_lcore_id(); struct core_state *cs = &lcore_states[lcore]; + __atomic_store_n(&cs->thread_active, 1, __ATOMIC_RELAXED); + /* runstate act as the guard variable. Use load-acquire * memory order here to synchronize with store-release * in runstate update functions. @@ -475,6 +478,7 @@ service_runner_func(void *arg) cs->loops++; } + __atomic_store_n(&cs->thread_active, 0, __ATOMIC_RELAXED); return 0; } @@ -765,6 +769,26 @@ rte_service_lcore_stop(uint32_t lcore) __atomic_store_n(&lcore_states[lcore].runstate, RUNSTATE_STOPPED, __ATOMIC_RELEASE); + /* wait for service lcore to return */ + i = 0; + uint8_t active; + uint64_t start = rte_rdtsc(); + do { + active = __atomic_load_n(&lcore_states[lcore].thread_active, + __ATOMIC_RELAXED); + if (active == 0) + break; + rte_delay_ms(1); + i++; + } while (i < 1000); + + if (active != 0) { + uint64_t end = rte_rdtsc(); + RTE_LOG(WARNING, EAL, + "service lcore stop() failed, waited for %ld cycles\n", + end - start); + } + return 0; } -- 2.17.1