From: Lukasz Wojciechowski <l.wojciechow@partner.samsung.com>
To: Harry van Haaren <harry.van.haaren@intel.com>, dev@dpdk.org
Cc: david.marchand@redhat.com, igor.romanov@oktetlabs.ru, honnappa.nagarahalli@arm.com, ferruh.yigit@intel.com, nd@arm.com, aconole@redhat.com
Date: Mon, 20 Jul 2020 14:51:56 +0200
In-Reply-To: <20200720120938.34660-1-harry.van.haaren@intel.com>
Subject: Re: [dpdk-dev] [PATCH] service: fix stop API to wait for service thread
On 20.07.2020 at 14:09, Harry van Haaren wrote:
> This commit improves the service_lcore_stop() implementation,
> waiting for the service core in question to return. The service
> thread now has a variable to indicate whether it is active: when
> zero, the service thread has completed its service and has
> returned from the service_runner_func() function.
>
> This fixes a race condition observed in the DPDK CI, where the
> statistics of the service were not consistent with the expectation,
> as the service thread was still running and incremented a stat
> after stop was called.
>
> Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com>
>
> ---
>
> This is one possible solution that avoids a class of race conditions
> based on the stop() API and subsequent behaviour. Without a change in
> the implementation of the service core thread, we could not detect
> when the thread had actually finished. This is now possible, and the
> stop API makes use of it to wait up to 1000 times one millisecond, or
> log a warning that a service core didn't return quickly.
>
> Thanks for the discussion/debug on list - I'm not sure how to add
> reported-by/suggested-by etc. tags, but I'll resend a V2 (or can add
> them on apply).
>
> ---
>  lib/librte_eal/common/rte_service.c | 24 ++++++++++++++++++++++++
>  1 file changed, 24 insertions(+)
>
> diff --git a/lib/librte_eal/common/rte_service.c b/lib/librte_eal/common/rte_service.c
> index 6a0e0ff65..d2255587d 100644
> --- a/lib/librte_eal/common/rte_service.c
> +++ b/lib/librte_eal/common/rte_service.c
> @@ -65,6 +65,7 @@ struct core_state {
>  	/* map of services IDs are run on this core */
>  	uint64_t service_mask;
>  	uint8_t runstate; /* running or stopped */
> +	uint8_t thread_active; /* indicates when the thread is in service_run() */
>  	uint8_t is_service_core; /* set if core is currently a service core */
>  	uint8_t service_active_on_lcore[RTE_SERVICE_NUM_MAX];
>  	uint64_t loops;
> @@ -457,6 +458,8 @@ service_runner_func(void *arg)
>  	const int lcore = rte_lcore_id();
>  	struct core_state *cs = &lcore_states[lcore];
>
> +	__atomic_store_n(&cs->thread_active, 1, __ATOMIC_RELAXED);
> +
>  	/* runstate act as the guard variable. Use load-acquire
>  	 * memory order here to synchronize with store-release
>  	 * in runstate update functions.
> @@ -475,6 +478,7 @@ service_runner_func(void *arg)
>  		cs->loops++;
>  	}
>
> +	__atomic_store_n(&cs->thread_active, 0, __ATOMIC_RELAXED);
>  	return 0;
>  }
>
> @@ -765,6 +769,26 @@ rte_service_lcore_stop(uint32_t lcore)
>  	__atomic_store_n(&lcore_states[lcore].runstate, RUNSTATE_STOPPED,
>  		__ATOMIC_RELEASE);
>
> +	/* wait for service lcore to return */
> +	i = 0;
> +	uint8_t active;
> +	uint64_t start = rte_rdtsc();
> +	do {
> +		active = __atomic_load_n(&lcore_states[lcore].thread_active,
> +			__ATOMIC_RELAXED);
> +		if (active == 0)
> +			break;
> +		rte_delay_ms(1);
> +		i++;
> +	} while (i < 1000);
> +
> +	if (active != 0) {
> +		uint64_t end = rte_rdtsc();
> +		RTE_LOG(WARNING, EAL,
> +			"service lcore stop() failed, waited for %ld cycles\n",
> +			end - start);
> +	}
> +
>  	return 0;
>  }

I don't like the idea of inserting this polling loop inside the API
call, and I don't like the hard-coded limit of 1000 iterations either.

How about keeping the thread_active flag, but moving the check of the
flag's state into a separate function? That way a user of the API can
write their own loop. A custom loop may be preferable because the user:
* may be waiting for more than one core,
* may want to wait longer,
* may want to check whether the service has finished less often...

I've put a rough sketch of what I mean at the bottom of this mail.

-- 
Lukasz Wojciechowski
Principal Software Engineer
Samsung R&D Institute Poland
Samsung Electronics
Office +48 22 377 88 25
l.wojciechow@partner.samsung.com
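
PS. As promised, a rough and untested sketch of the idea. The function
name rte_service_lcore_may_be_active() and its exact return convention
are only my proposal - nothing like it exists in the tree today:

/* Sketch only: would live in rte_service.c, next to the internal
 * lcore_states[] array. Returns 1 while the service thread of the
 * given lcore is still inside service_runner_func(), 0 once it has
 * returned, and -EINVAL for an invalid lcore id.
 */
int32_t
rte_service_lcore_may_be_active(uint32_t lcore)
{
	if (lcore >= RTE_MAX_LCORE || !lcore_states[lcore].is_service_core)
		return -EINVAL;

	/* A relaxed load matches the relaxed stores in your patch; the
	 * flag only signals progress, it does not publish any data.
	 */
	return __atomic_load_n(&lcore_states[lcore].thread_active,
			__ATOMIC_RELAXED);
}

The application can then implement whatever waiting policy it needs,
for example:

	rte_service_lcore_stop(lcore);
	while (rte_service_lcore_may_be_active(lcore) == 1)
		rte_delay_ms(1);	/* or longer, or poll many cores... */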