From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from dpdk.org (dpdk.org [92.243.14.124]) by inbox.dpdk.org (Postfix) with ESMTP id 5261DA04DC for ; Sat, 17 Oct 2020 05:34:51 +0200 (CEST) Received: from [92.243.14.124] (localhost [127.0.0.1]) by dpdk.org (Postfix) with ESMTP id 2E4C2E2EE; Sat, 17 Oct 2020 05:34:50 +0200 (CEST) Received: from mailout1.w1.samsung.com (mailout1.w1.samsung.com [210.118.77.11]) by dpdk.org (Postfix) with ESMTP id EC02AE2D8 for ; Sat, 17 Oct 2020 05:34:46 +0200 (CEST) Received: from eucas1p2.samsung.com (unknown [182.198.249.207]) by mailout1.w1.samsung.com (KnoxPortal) with ESMTP id 20201017033435euoutp018820394e4afec7125508c4e894b151f5~_qjglRHja3000330003euoutp01c for ; Sat, 17 Oct 2020 03:34:35 +0000 (GMT) DKIM-Filter: OpenDKIM Filter v2.11.0 mailout1.w1.samsung.com 20201017033435euoutp018820394e4afec7125508c4e894b151f5~_qjglRHja3000330003euoutp01c DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=samsung.com; s=mail20170921; t=1602905675; bh=aKULN5yoddeJK4yElkDEk332ChFSRLbgkye9vU3EH/Y=; h=Subject:To:Cc:From:Date:In-Reply-To:References:From; b=nZpg4A7KTHTIdSwf5Lg0wPi3JXaS7tLTbjmVyQzAxNE/IRgxsddNUH5xAOkTaQyv5 X1YHFRGHy8aGH9Jf38ZGvxt+P5ftiRRUsNQiJ/CQJR8m77clFh9BzYeG0kp++Lw2BS JNyjklapI5TVCpixOVAM22f8wHN04oJd+MWMLTZE= Received: from eusmges1new.samsung.com (unknown [203.254.199.242]) by eucas1p1.samsung.com (KnoxPortal) with ESMTP id 20201017033429eucas1p111c850bc4cfedf3e05c674e073ecb329~_qja1J-f90379903799eucas1p12; Sat, 17 Oct 2020 03:34:29 +0000 (GMT) Received: from eucas1p1.samsung.com ( [182.198.249.206]) by eusmges1new.samsung.com (EUCPMTA) with SMTP id 44.19.06456.5466A8F5; Sat, 17 Oct 2020 04:34:29 +0100 (BST) Received: from eusmtrp1.samsung.com (unknown [182.198.249.138]) by eucas1p1.samsung.com (KnoxPortal) with ESMTPA id 20201017033428eucas1p1dcff40c52451984c8f41c52c7008ba93~_qjZrEA4B0381903819eucas1p1x; Sat, 17 Oct 2020 03:34:28 +0000 (GMT) Received: from eusmgms1.samsung.com (unknown [182.198.249.179]) by eusmtrp1.samsung.com (KnoxPortal) with ESMTP id 20201017033428eusmtrp19c12307aa7090f03b10ba704bf2ce0ba~_qjZqdCkc1383213832eusmtrp1K; Sat, 17 Oct 2020 03:34:28 +0000 (GMT) X-AuditID: cbfec7f2-7efff70000001938-b6-5f8a66456fa9 Received: from eusmtip1.samsung.com ( [203.254.199.221]) by eusmgms1.samsung.com (EUCPMTA) with SMTP id CC.A9.06314.3466A8F5; Sat, 17 Oct 2020 04:34:28 +0100 (BST) Received: from [106.210.88.70] (unknown [106.210.88.70]) by eusmtip1.samsung.com (KnoxPortal) with ESMTPA id 20201017033427eusmtip1000f75751e184e87465258c27b859b7b~_qjY1_rTI0424404244eusmtip1E; Sat, 17 Oct 2020 03:34:27 +0000 (GMT) To: Honnappa Nagarahalli , David Hunt , Bruce Richardson Cc: "dev@dpdk.org" , "stable@dpdk.org" , nd , David Marchand , "\"'Lukasz Wojciechowski'\"," From: Lukasz Wojciechowski Message-ID: <65361650-0f31-c128-11a4-da2170e2fcf3@partner.samsung.com> Date: Sat, 17 Oct 2020 05:34:26 +0200 User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:68.0) Gecko/20100101 Thunderbird/68.12.1 MIME-Version: 1.0 In-Reply-To: Content-Transfer-Encoding: 8bit Content-Language: en-US X-Brightmail-Tracker: H4sIAAAAAAAAA01Sa0iTYRjl3fdtfq4mr9PwUYtgWGGYJkoMrEztMoIoQSLE23SfF3RTNrUM DSEpW7N0JraJZYoo5gUt51pmuC7eR4Y3AiNJ8xIqaRZOm7V9k/x3znnO857nwEsR/Fa2B5Ui y6TlMnGagMMlde/XTUfOJCpjj+qe8oQTDcHCe+ofLGFHvZIjXFrpYAk1MwUOwm+qZiQcrFMR QkvhpsMpStT4qBGJzFW1bFFN5zxL1L3UyRItd41yLrEjuccldFpKNi33OxnHTf7Q30NkfKWv mbfKUD6qCFciRwpwIFQPqFlKxKX4uB7BSEMXyZCfCH7f19vJKoIhw6rD9krxZDfbivm4DsHL kmTGtIigpKsAKRFFueArUN57yKq74rsI1EODyEoI3IdAYxpG1m0OPgFvNWu2l3j4LJhNlRwr JvEBmOr8ZUvbg2OguLWAZDzO0KeZtmFHHA3m2i0bJvB+uNleQTDYDT5NP7YVAmx0gIdFlYg5 +zQ8KClhM9gFFnqe2+vshYFSFcks6BCMbqwjhrxGMF5Ub3cFwRvLBsfajcDe0GLwY+QQMG7O EVYZsBNMLDozRziBWldul3lQeIvPuH1hRlWGtmM3m6bJYiTQ7qim3VFHu6OO9n9uFSIbkBud pZAm0Qp/GX3VVyGWKrJkSb4J6dI29O8jDVh6VvRo7WO8EWEKCXbzqNo7sXy2OFuRIzUioAiB Ky90aCCGz5OIc67T8vRYeVYarTAiT4oUuPECquej+ThJnEmn0nQGLd+esihHj3x022P5XOdY nrRo17v1Yf2fiNrpqFlLWNuCJC3gglfzmv7x+Pe47MBjee7ZUQs+7Q25c26eETVcyYha0rLG Dn+1oAtt8g6NjAj2MGz0R/XtczaElIXFJ0xOzT1JvmFKPSjzUuV+zjWFjbmrNKqLL7jnBZd7 y57leyd+cQ4q5c36JApIRbLY/zAhV4j/AjiN8HNEAwAA X-Brightmail-Tracker: H4sIAAAAAAAAA+NgFjrIIsWRmVeSWpSXmKPExsVy+t/xu7ouaV3xBqfVLG6ssrfom/SRyWL7 ii42i3eftjNZzHzawm7xrGcdo8WZ5T3MFv86/rA7cHismbeG0ePXgqWsHov3vGTyOPhuD5PH +31X2QJYo/RsivJLS1IVMvKLS2yVog0tjPQMLS30jEws9QyNzWOtjEyV9O1sUlJzMstSi/Tt EvQyLpw6zlzwOLXi1/+pjA2MswO7GDk5JARMJCbcPcjaxcjFISSwlFHi/ezDbF2MHEAJGYkP lwQgaoQl/lzrYgOxhQReM0p0NTmDlAgLREhMP6EO0ioi0Msosb1rLhOIwyxwklFi27s2Joih B1kl5u9ewQjSzSZgK3Fk5ldWEJtXwE3i17m5YFNZBFQlHu75xg5iiwrESfyY2MsGUSMocXLm ExYQm1MgVuLX0v9gNrOAmcS8zQ+ZIWx5ieats6FscYlbT+YzTWAUmoWkfRaSlllIWmYhaVnA yLKKUSS1tDg3PbfYUK84Mbe4NC9dLzk/dxMjMPq2Hfu5eQfjpY3BhxgFOBiVeHg3LOqMF2JN LCuuzD3EKMHBrCTC63T2dJwQb0piZVVqUX58UWlOavEhRlOg5yYyS4km5wMTQ15JvKGpobmF paG5sbmxmYWSOG+HwMEYIYH0xJLU7NTUgtQimD4mDk6pBkb/PJUz347zLPj5quYUd3Dlr9qt OorHsjTmRPwt5+iOjDgxZ3/YLKaEiNJn6+ws7jSetn4pvvijzu0Vftc2rC59I8QwY/EH4Wv6 /zKceqfkxWVziwe1f3qx4GXsD1PXpnef89JXaC65/ZLDr4J74dKg0K+LPcr22vA+nn7+xWL7 mjlsfwrlam2UWIozEg21mIuKEwH6zbTq1AIAAA== X-CMS-MailID: 20201017033428eucas1p1dcff40c52451984c8f41c52c7008ba93 X-Msg-Generator: CA Content-Type: text/plain; charset="utf-8" X-RootMTR: 20200925224217eucas1p1bb5f73109b4aeed8f2badf311fa8dfb5 X-EPHeader: CA CMS-TYPE: 201P X-CMS-RootMailID: 20200925224217eucas1p1bb5f73109b4aeed8f2badf311fa8dfb5 References: <20200923132541.21417-1-l.wojciechow@partner.samsung.com> <20200925224209.12173-1-l.wojciechow@partner.samsung.com> <20200925224209.12173-3-l.wojciechow@partner.samsung.com> <572a9ba1-3141-ff11-b283-74a81c7ddc89@partner.samsung.com> Subject: Re: [dpdk-stable] [dpdk-dev] [PATCH v4 2/8] test/distributor: synchronize lcores statistics X-BeenThere: stable@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: patches for DPDK stable branches List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: stable-bounces@dpdk.org Sender: "stable" W dniu 16.10.2020 o 17:42, Honnappa Nagarahalli pisze: > > >> W dniu 16.10.2020 o 14:43, Lukasz Wojciechowski pisze: >>> Hi Honnappa, >>> >>> Thank you for your answer. >>> In the current v7 version I followed your advise and used RELAXED memory >> model. >>> And it works without any issues. I guess after fixing other issues found >> since v4 the distributor works more stable. >>> I didn't have time to rearrange all tests in the way I proposed, but I guess if >> they work like this it's not a top priority. > Agree, not a top priority. > >>> Can you give an ack on the series? I believe David Marchand is waiting for >> your opinion to process it. >> I'm sorry I didn't see your other comments. I'll try to fix them them today. > No problem, I can review the next series quickly. So it's already there, the v8. but you have one patch more to look at :) I was ready to submit new version, but I run some tests ... and one of executions hanged, so I attached myself with gdb and found one issue more. > >>> Best regards >>> Lukasz >>> >>> W dniu 16.10.2020 o 07:43, Honnappa Nagarahalli pisze: >>>> >>>> >>>>> Hi Honnappa, >>>>> >>>>> Many thanks for the review! >>>>> >>>>> I'll write my answers here not inline as it would be easier to read >>>>> them in one place, I think. >>>>> So first of all I agree with you in 2 things: >>>>> 1) all uses of statistics must be atomic and lack of that caused >>>>> most of the problems >>>>> 2) it would be better to replace barrier and memset in >>>>> clear_packet_count() with atomic stores as you suggested >>>>> >>>>> So I will apply both of above. >>>>> >>>>> However I wasn't not fully convinced on changing acquire/release to >> relaxed. >>>>> It wood be perfectly ok if it would look like in this Herb Sutter's example: >>>>> https://youtu.be/KeLBd2[] EJLOU?t=4170 But in his case the counters >>>>> are cleared before worker threads start and are printout after they >>>>> are completed. >>>>> >>>>> In case of the dpdk distributor tests both worker and main cores are >>>>> running at the same time. In the sanity_test, the statistics are >>>>> cleared and verified few times for different hashes of packages. The >>>>> worker cores are not stopped at this time and they continue their loops >> in handle procedure. >>>>> Verification made in main core is an exchange of data as the current >>>>> statistics indicate how the test will result. >>>> Agree. The key point we have to note is that the data that is exchanged >> between the two threads is already atomic (handled_packets is atomic). >>>>> So as I wasn't convinced, I run some tests with both both relaxed >>>>> and acquire/release modes and they both fail :( The failures caused >>>>> by statistics errors to number of tests ratio for >>>>> 200000 tests was: >>>>> for relaxed: 0,000790562 >>>>> for acq/rel: 0,000091321 >>>>> >>>>> >>>>> That's why I'm going to modify tests in such way, that they would: >>>>> 1) clear statistics >>>>> 2) launch worker threads >>>>> 3) run test >>>>> 4) wait for workers procedures to complete >>>>> 5) check stats, verify results and print them out >>>>> >>>>> This way worker main core will use (clear or verify) stats only when >>>>> there are no worker threads. This would make things simpler and >>>>> allowing to focus on testing the distributor not tests. And of >>>>> course relaxed mode would be enough! >>>> Agree, this would be the only way to ensure that the main thread sees >>>> the correct statistics (just like in the video) >>>> >>>>> Best regards >>>>> Lukasz >>>>> >>>>> >>>>> W dniu 29.09.2020 o 07:49, Honnappa Nagarahalli pisze: >>>>>> >>>>>> >>>>>>> Statistics of handled packets are cleared and read on main lcore, >>>>>>> while they are increased in workers handlers on different lcores. >>>>>>> >>>>>>> Without synchronization occasionally showed invalid values. >>>>>>> This patch uses atomic acquire/release mechanisms to synchronize. >>>>>> In general, load-acquire and store-release memory orderings are >>>>>> required >>>>> while synchronizing data (that cannot be updated atomically) between >>>>> threads. In the situation, making counters atomic is enough. >>>>>>> Fixes: c3eabff124e6 ("distributor: add unit tests") >>>>>>> Cc: bruce.richardson@intel.com >>>>>>> Cc: stable@dpdk.org >>>>>>> >>>>>>> Signed-off-by: Lukasz Wojciechowski >>>>>>> >>>>>>> Acked-by: David Hunt >>>>>>> --- >>>>>>> app/test/test_distributor.c | 39 >>>>>>> ++++++++++++++++++++++++----------- >>>>> -- >>>>>>> 1 file changed, 26 insertions(+), 13 deletions(-) >>>>>>> >>>>>>> diff --git a/app/test/test_distributor.c >>>>>>> b/app/test/test_distributor.c index >>>>>>> 35b25463a..0e49e3714 100644 >>>>>>> --- a/app/test/test_distributor.c >>>>>>> +++ b/app/test/test_distributor.c >>>>>>> @@ -43,7 +43,8 @@ total_packet_count(void) { >>>>>>> unsigned i, count = 0; >>>>>>> for (i = 0; i < worker_idx; i++) >>>>>>> - count += worker_stats[i].handled_packets; >>>>>>> + count += >>>>>>> __atomic_load_n(&worker_stats[i].handled_packets, >>>>>>> + __ATOMIC_ACQUIRE); >>>>>> RELAXED memory order is sufficient. For ex: the worker threads are >>>>>> not >>>>> 'releasing' any data that is not atomically updated to the main thread. >>>>>>> return count; >>>>>>> } >>>>>>> >>>>>>> @@ -52,6 +53,7 @@ static inline void >>>>>>> clear_packet_count(void) >>>>>>> { >>>>>>> memset(&worker_stats, 0, sizeof(worker_stats)); >>>>>>> + rte_atomic_thread_fence(__ATOMIC_RELEASE); >>>>>> Ideally, the counters should be set to 0 atomically rather than >>>>>> using a >>>>> memset. >>>>>>> } >>>>>>> >>>>>>> /* this is the basic worker function for sanity test @@ -72,13 >>>>>>> +74,13 @@ handle_work(void *arg) >>>>>>> num = rte_distributor_get_pkt(db, id, buf, buf, num); >>>>>>> while (!quit) { >>>>>>> >> __atomic_fetch_add(&worker_stats[id].handled_packets, >>>>>>> num, >>>>>>> - __ATOMIC_RELAXED); >>>>>>> + __ATOMIC_ACQ_REL); >>>>>> Using the __ATOMIC_ACQ_REL order does not mean anything to the >> main >>>>> thread. The main thread might still see the updates from different >>>>> threads in different order. >>>>>>> count += num; >>>>>>> num = rte_distributor_get_pkt(db, id, >>>>>>> buf, buf, num); >>>>>>> } >>>>>>> __atomic_fetch_add(&worker_stats[id].handled_packets, >> num, >>>>>>> - __ATOMIC_RELAXED); >>>>>>> + __ATOMIC_ACQ_REL); >>>>>> Same here, do not see why this change is required. >>>>>> >>>>>>> count += num; >>>>>>> rte_distributor_return_pkt(db, id, buf, num); >>>>>>> return 0; >>>>>>> @@ -134,7 +136,8 @@ sanity_test(struct worker_params *wp, struct >>>>>>> rte_mempool *p) >>>>>>> >>>>>>> for (i = 0; i < rte_lcore_count() - 1; i++) >>>>>>> printf("Worker %u handled %u packets\n", i, >>>>>>> - worker_stats[i].handled_packets); >>>>>>> + >> __atomic_load_n(&worker_stats[i].handled_packets, >>>>>>> + __ATOMIC_ACQUIRE)); >>>>>> __ATOMIC_RELAXED is enough. >>>>>> >>>>>>> printf("Sanity test with all zero hashes done.\n"); >>>>>>> >>>>>>> /* pick two flows and check they go correctly */ @@ -159,7 >>>>>>> +162,9 @@ sanity_test(struct worker_params *wp, struct >> rte_mempool >>>>>>> *p) >>>>>>> >>>>>>> for (i = 0; i < rte_lcore_count() - 1; i++) >>>>>>> printf("Worker %u handled %u packets\n", i, >>>>>>> - >> worker_stats[i].handled_packets); >>>>>>> + __atomic_load_n( >>>>>>> + >> &worker_stats[i].handled_packets, >>>>>>> + __ATOMIC_ACQUIRE)); >>>>>> __ATOMIC_RELAXED is enough >>>>>> >>>>>>> printf("Sanity test with two hash values done\n"); >>>>>>> } >>>>>>> >>>>>>> @@ -185,7 +190,8 @@ sanity_test(struct worker_params *wp, struct >>>>>>> rte_mempool *p) >>>>>>> >>>>>>> for (i = 0; i < rte_lcore_count() - 1; i++) >>>>>>> printf("Worker %u handled %u packets\n", i, >>>>>>> - worker_stats[i].handled_packets); >>>>>>> + >> __atomic_load_n(&worker_stats[i].handled_packets, >>>>>>> + __ATOMIC_ACQUIRE)); >>>>>> __ATOMIC_RELAXED is enough >>>>>> >>>>>>> printf("Sanity test with non-zero hashes done\n"); >>>>>>> >>>>>>> rte_mempool_put_bulk(p, (void *)bufs, BURST); @@ -280,15 >>>>>>> +286,17 @@ handle_work_with_free_mbufs(void *arg) >>>>>>> buf[i] = NULL; >>>>>>> num = rte_distributor_get_pkt(d, id, buf, buf, num); >>>>>>> while (!quit) { >>>>>>> - worker_stats[id].handled_packets += num; >>>>>>> count += num; >>>>>>> + >> __atomic_fetch_add(&worker_stats[id].handled_packets, >>>>>>> num, >>>>>>> + __ATOMIC_ACQ_REL); >>>>>> IMO, the problem would be the non-atomic update of the statistics. >>>>>> So, __ATOMIC_RELAXED is enough >>>>>> >>>>>>> for (i = 0; i < num; i++) >>>>>>> rte_pktmbuf_free(buf[i]); >>>>>>> num = rte_distributor_get_pkt(d, >>>>>>> id, buf, buf, num); >>>>>>> } >>>>>>> - worker_stats[id].handled_packets += num; >>>>>>> count += num; >>>>>>> + __atomic_fetch_add(&worker_stats[id].handled_packets, >> num, >>>>>>> + __ATOMIC_ACQ_REL); >>>>>> Same here, the problem is non-atomic update of the statistics, >>>>> __ATOMIC_RELAXED is enough. >>>>>> Similarly, for changes below, __ATOMIC_RELAXED is enough. >>>>>> >>>>>>> rte_distributor_return_pkt(d, id, buf, num); >>>>>>> return 0; >>>>>>> } >>>>>>> @@ -363,8 +371,9 @@ handle_work_for_shutdown_test(void *arg) >>>>>>> /* wait for quit single globally, or for worker zero, wait >>>>>>> * for zero_quit */ >>>>>>> while (!quit && !(id == zero_id && zero_quit)) { >>>>>>> - worker_stats[id].handled_packets += num; >>>>>>> count += num; >>>>>>> + >> __atomic_fetch_add(&worker_stats[id].handled_packets, >>>>>>> num, >>>>>>> + __ATOMIC_ACQ_REL); >>>>>>> for (i = 0; i < num; i++) >>>>>>> rte_pktmbuf_free(buf[i]); >>>>>>> num = rte_distributor_get_pkt(d, @@ -379,10 >> +388,11 @@ >>>>>>> handle_work_for_shutdown_test(void *arg) >>>>>>> >>>>>>> total += num; >>>>>>> } >>>>>>> - worker_stats[id].handled_packets += num; >>>>>>> count += num; >>>>>>> returned = rte_distributor_return_pkt(d, id, buf, num); >>>>>>> >>>>>>> + __atomic_fetch_add(&worker_stats[id].handled_packets, >> num, >>>>>>> + __ATOMIC_ACQ_REL); >>>>>>> if (id == zero_id) { >>>>>>> /* for worker zero, allow it to restart to pick up last >> packet >>>>>>> * when all workers are shutting down. >>>>>>> @@ -394,10 +404,11 @@ handle_work_for_shutdown_test(void *arg) >>>>>>> id, buf, buf, num); >>>>>>> >>>>>>> while (!quit) { >>>>>>> - worker_stats[id].handled_packets += num; >>>>>>> count += num; >>>>>>> rte_pktmbuf_free(pkt); >>>>>>> num = rte_distributor_get_pkt(d, id, buf, buf, >> num); >>>>>>> + >>>>>>> __atomic_fetch_add(&worker_stats[id].handled_packets, >>>>>>> + num, __ATOMIC_ACQ_REL); >>>>>>> } >>>>>>> returned = rte_distributor_return_pkt(d, >>>>>>> id, buf, num); >>>>>>> @@ -461,7 +472,8 @@ sanity_test_with_worker_shutdown(struct >>>>>>> worker_params *wp, >>>>>>> >>>>>>> for (i = 0; i < rte_lcore_count() - 1; i++) >>>>>>> printf("Worker %u handled %u packets\n", i, >>>>>>> - worker_stats[i].handled_packets); >>>>>>> + >> __atomic_load_n(&worker_stats[i].handled_packets, >>>>>>> + __ATOMIC_ACQUIRE)); >>>>>>> >>>>>>> if (total_packet_count() != BURST * 2) { >>>>>>> printf("Line %d: Error, not all packets flushed. " >>>>>>> @@ -514,7 +526,8 @@ test_flush_with_worker_shutdown(struct >>>>>>> worker_params *wp, >>>>>>> zero_quit = 0; >>>>>>> for (i = 0; i < rte_lcore_count() - 1; i++) >>>>>>> printf("Worker %u handled %u packets\n", i, >>>>>>> - worker_stats[i].handled_packets); >>>>>>> + >> __atomic_load_n(&worker_stats[i].handled_packets, >>>>>>> + __ATOMIC_ACQUIRE)); >>>>>>> >>>>>>> if (total_packet_count() != BURST) { >>>>>>> printf("Line %d: Error, not all packets flushed. " >>>>>>> -- >>>>>>> 2.17.1 >>>>> -- >>>>> Lukasz Wojciechowski >>>>> Principal Software Engineer >>>>> >>>>> Samsung R&D Institute Poland >>>>> Samsung Electronics >>>>> Office +48 22 377 88 25 >>>>> l.wojciechow@partner.samsung.com >> -- >> Lukasz Wojciechowski >> Principal Software Engineer >> >> Samsung R&D Institute Poland >> Samsung Electronics >> Office +48 22 377 88 25 >> l.wojciechow@partner.samsung.com -- Lukasz Wojciechowski Principal Software Engineer Samsung R&D Institute Poland Samsung Electronics Office +48 22 377 88 25 l.wojciechow@partner.samsung.com