From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from mx1.redhat.com (mx1.redhat.com [209.132.183.28])
	by dpdk.org (Postfix) with ESMTP id 1EB431B51D
	for ; Fri, 23 Nov 2018 11:30:39 +0100 (CET)
Received: from smtp.corp.redhat.com (int-mx04.intmail.prod.int.phx2.redhat.com [10.5.11.14])
	(using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))
	(No client certificate requested)
	by mx1.redhat.com (Postfix) with ESMTPS id 71159AC2C6;
	Fri, 23 Nov 2018 10:30:38 +0000 (UTC)
Received: from ktraynor.remote.csb (unknown [10.36.118.7])
	by smtp.corp.redhat.com (Postfix) with ESMTP id 40A2917C2D;
	Fri, 23 Nov 2018 10:30:36 +0000 (UTC)
From: Kevin Traynor
To: Gavin Hu
Cc: Honnappa Nagarahalli , Steve Capper , Ola Liljedahl , dpdk stable
Date: Fri, 23 Nov 2018 10:27:02 +0000
Message-Id: <20181123102713.17309-58-ktraynor@redhat.com>
In-Reply-To: <20181123102713.17309-1-ktraynor@redhat.com>
References: <20181123102713.17309-1-ktraynor@redhat.com>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Scanned-By: MIMEDefang 2.79 on 10.5.11.14
X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16
	(mx1.redhat.com [10.5.110.38]); Fri, 23 Nov 2018 10:30:38 +0000 (UTC)
Subject: [dpdk-stable] patch 'ring/c11: keep deterministic order allowing
	retry to work' has been queued to stable release 18.08.1
X-BeenThere: stable@dpdk.org
X-Mailman-Version: 2.1.15
Precedence: list
List-Id: patches for DPDK stable branches
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Fri, 23 Nov 2018 10:30:39 -0000

Hi,

FYI, your patch has been queued to stable release 18.08.1

Note it hasn't been pushed to http://dpdk.org/browse/dpdk-stable yet.
It will be pushed if I get no objections before 11/29/18. So please
shout if anyone has objections.

Also note that after the patch there's a diff of the upstream commit vs
the patch applied to the branch.
If the code is different (i.e. not only metadata diffs), due for example
to a change in context or macro names, please double check it.

Thanks.

Kevin Traynor

---
>>From 20a505bd4cc15804bfafd8e401c19c14cf2fe17c Mon Sep 17 00:00:00 2001
From: Gavin Hu
Date: Fri, 9 Nov 2018 19:42:46 +0800
Subject: [PATCH] ring/c11: keep deterministic order allowing retry to work

[ upstream commit 86757c2c3ed5006940f93725d39131dfb0d09b60 ]

Use case scenario:
1) Thread 1 is enqueuing. It reads prod.head and gets stalled for some
   reason (running out of CPU time, preempted, ...).
2) Thread 2 is enqueuing. It succeeds in enqueuing and moves prod.head
   forward.
3) Thread 3 is dequeuing. It succeeds in dequeuing and moves
   cons.tail beyond the prod.head read by thread 1.
4) Thread 1 is re-scheduled. It reads cons.tail.

    cpu1(producer)          cpu2(producer)          cpu3(consumer)
    load r->prod.head
     ^                      load r->prod.head
     |                      load r->cons.tail
     |                      store r->prod.head(+n)
  stalled                   <-- enqueue ----->
     |                      store r->prod.tail(+n)
     |                                              load r->cons.head
     |                                              load r->prod.tail
     |                                              store r->cons.head(+n)
     |                                              <...dequeue.....>
     v                                              store r->cons.tail(+n)
    load r->cons.tail

For thread 1, the __atomic_compare_exchange_n detects the outdated
prod.head and retries the flow with the new one. This retry flow works
on strongly ordered platforms (e.g. x86). But on weakly ordered
platforms (arm, ppc), the loads of cons.tail and prod.head might be
re-ordered: prod.head is new but cons.tail is too old. The retry flow,
which relies on detecting an outdated head, does not trigger as
expected, so the outdated cons.tail causes wrong free_entries.
Similarly, for dequeuing, an outdated prod.tail leads to wrong
avail_entries.

The fix is to keep a deterministic order for the two loads, which allows
the retry to work.
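To make the scenario above concrete, here is a minimal sketch in C of a producer-side head reservation that keeps the head load ordered before the tail load. It is not the DPDK implementation: struct sketch_ring, sketch_free_entries() and sketch_move_prod_head() are hypothetical names introduced for illustration, and the layout assumes free-running unsigned 32-bit indices. Only the __atomic builtins are real GCC/Clang builtins, used the same way the patch uses them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified ring indices; NOT the real struct rte_ring. */
struct sketch_ring {
	uint32_t capacity;   /* number of usable slots */
	uint32_t prod_head;  /* free-running producer head index */
	uint32_t cons_tail;  /* free-running consumer tail index */
};

/* Free slots seen by a producer, computed modulo 2^32. */
static inline uint32_t
sketch_free_entries(uint32_t capacity, uint32_t prod_head, uint32_t cons_tail)
{
	return capacity + cons_tail - prod_head;
}

/*
 * Reserve up to n slots by advancing prod_head.
 * Returns the number of slots reserved and stores the head value the
 * reservation started from in *old_head.
 */
static inline uint32_t
sketch_move_prod_head(struct sketch_ring *r, uint32_t n, uint32_t *old_head)
{
	const uint32_t want = n;
	uint32_t head, tail, free_entries;

	assert(r != NULL && old_head != NULL);

	head = __atomic_load_n(&r->prod_head, __ATOMIC_RELAXED);
	for (;;) {
		n = want; /* reset the request on every retry */

		/* Ensure the head is read before the tail. */
		__atomic_thread_fence(__ATOMIC_ACQUIRE);

		/* Pairs with the consumer's store-release of the tail. */
		tail = __atomic_load_n(&r->cons_tail, __ATOMIC_ACQUIRE);

		free_entries = sketch_free_entries(r->capacity, head, tail);
		if (n > free_entries)
			n = free_entries;
		if (n == 0) {
			*old_head = head;
			return 0;
		}

		/* On failure the builtin reloads head with the current value,
		 * and the loop then re-reads the tail after that newer head. */
		if (__atomic_compare_exchange_n(&r->prod_head, &head, head + n,
				0, __ATOMIC_RELAXED, __ATOMIC_RELAXED)) {
			*old_head = head;
			return n;
		}
	}
}
```

Under these assumptions, one way a stale tail can go wrong is unsigned wrap-around: with capacity 8, a new head of 12 and the current tail of 8, sketch_free_entries() gives 4, but with the same head and a stale tail of 0 it gives 0xFFFFFFFC, so the clamp against free_entries no longer limits the request.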
Run the ring perf test on the following testbed:
HW: ThunderX2 B0 CPU CN9975 v2.0, 2 sockets, 28core, 4 threads/core, 2.5GHz
OS: Ubuntu 16.04.5 LTS, Kernel: 4.15.0-36-generic
DPDK: 18.08, Configuration: arm64-armv8a-linuxapp-gcc
gcc: 8.1.0
$ sudo ./test/test/test -l 16-19,44-47,72-75,100-103 -n 4 \
--socket-mem=1024 -- -i

Without the patch:
*** Testing using two physical cores ***
SP/SC bulk enq/dequeue (size: 8): 5.64
MP/MC bulk enq/dequeue (size: 8): 9.58
SP/SC bulk enq/dequeue (size: 32): 1.98
MP/MC bulk enq/dequeue (size: 32): 2.30

With the patch:
*** Testing using two physical cores ***
SP/SC bulk enq/dequeue (size: 8): 5.75
MP/MC bulk enq/dequeue (size: 8): 10.18
SP/SC bulk enq/dequeue (size: 32): 1.80
MP/MC bulk enq/dequeue (size: 32): 2.34

The results show that the thread fence degrades performance slightly,
but it is required for correctness.

Fixes: 39368ebfc6 ("ring: introduce C11 memory model barrier option")

Signed-off-by: Gavin Hu
Reviewed-by: Honnappa Nagarahalli
Reviewed-by: Steve Capper
Reviewed-by: Ola Liljedahl
---
 lib/librte_ring/rte_ring_c11_mem.h | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/lib/librte_ring/rte_ring_c11_mem.h b/lib/librte_ring/rte_ring_c11_mem.h
index 7bc74a4cb..dc49a998f 100644
--- a/lib/librte_ring/rte_ring_c11_mem.h
+++ b/lib/librte_ring/rte_ring_c11_mem.h
@@ -67,4 +67,7 @@ __rte_ring_move_prod_head(struct rte_ring *r, unsigned int is_sp,
 			n = max;
 
+		/* Ensure the head is read before tail */
+		__atomic_thread_fence(__ATOMIC_ACQUIRE);
+
 		/* load-acquire synchronize with store-release of ht->tail
 		 * in update_tail.
@@ -140,4 +143,7 @@ __rte_ring_move_cons_head(struct rte_ring *r, int is_sc,
 		n = max;
 
+		/* Ensure the head is read before tail */
+		__atomic_thread_fence(__ATOMIC_ACQUIRE);
+
 		/* this load-acquire synchronize with store-release of ht->tail
 		 * in update_tail.
-- 
2.19.0

---
  Diff of the applied patch vs upstream commit (please double-check if non-empty:
---
--- -	2018-11-23 10:22:55.753727339 +0000
+++ 0058-ring-c11-keep-deterministic-order-allowing-retry-to-.patch	2018-11-23 10:22:54.000000000 +0000
@@ -1,8 +1,10 @@
-From 86757c2c3ed5006940f93725d39131dfb0d09b60 Mon Sep 17 00:00:00 2001
+From 20a505bd4cc15804bfafd8e401c19c14cf2fe17c Mon Sep 17 00:00:00 2001
 From: Gavin Hu
 Date: Fri, 9 Nov 2018 19:42:46 +0800
 Subject: [PATCH] ring/c11: keep deterministic order allowing retry to work
 
+[ upstream commit 86757c2c3ed5006940f93725d39131dfb0d09b60 ]
+
 Use case scenario:
 1) Thread 1 is enqueuing. It reads prod.head and gets stalled for some
    reasons (running out of cpu time, preempted,...)
@@ -65,7 +67,6 @@
 it is required for correctness.
 
 Fixes: 39368ebfc6 ("ring: introduce C11 memory model barrier option")
-Cc: stable@dpdk.org
 
 Signed-off-by: Gavin Hu
 Reviewed-by: Honnappa Nagarahalli