From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from foss.arm.com (usa-sjc-mx-foss1.foss.arm.com [217.140.101.70]) by dpdk.org (Postfix) with ESMTP id CC2254C9C; Fri, 9 Nov 2018 12:43:04 +0100 (CET) Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.72.51.249]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 213F615AB; Fri, 9 Nov 2018 03:43:04 -0800 (PST) Received: from net-arm-c2400.shanghai.arm.com (net-arm-c2400.shanghai.arm.com [10.169.40.108]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPA id 5AA553F718; Fri, 9 Nov 2018 03:43:02 -0800 (PST) From: Gavin Hu To: dev@dpdk.org Cc: thomas@monjalon.net, stephen@networkplumber.org, olivier.matz@6wind.com, chaozhu@linux.vnet.ibm.com, bruce.richardson@intel.com, konstantin.ananyev@intel.com, jerin.jacob@caviumnetworks.com, Honnappa.Nagarahalli@arm.com, gavin.hu@arm.com, stable@dpdk.org Date: Fri, 9 Nov 2018 19:42:46 +0800 Message-Id: <1541763767-7399-2-git-send-email-gavin.hu@arm.com> X-Mailer: git-send-email 2.7.4 In-Reply-To: <1541763767-7399-1-git-send-email-gavin.hu@arm.com> References: <1541763767-7399-1-git-send-email-gavin.hu@arm.com> Subject: [dpdk-dev] [PATCH v1 1/2] ring: keep the deterministic order allowing retry to work X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 09 Nov 2018 11:43:05 -0000 Use case scenario: 1) Thread 1 is enqueuing. It reads prod.head and gets stalled for some reasons (running out of cpu time, preempted,...) 2) Thread 2 is enqueuing. It succeeds in enqueuing and moves prod.head forward. 3) Thread 3 is dequeuing. It succeeds in dequeuing and moves the cons.tail beyond the prod.head read by thread 1. 4) Thread 1 is re-scheduled. It reads cons.tail. cpu1(producer) cpu2(producer) cpu3(consumer) load r->prod.head ^ load r->prod.head | load r->cons.tail | store r->prod.head(+n) stalled <-- enqueue -----> | store r->prod.tail(+n) | load r->cons.head | load r->prod.tail | store r->cons.head(+n) | <...dequeue.....> v store r->cons.tail(+n) load r->cons.tail For thread 1, the __atomic_compare_exchange_n detects the outdated prod.head and retry the flow with the new one. This retry flow works ok on strong ordering platform(eg:x86). But for weak ordering platforms(arm, ppc), loading cons.tail and prod.head might be re-ordered, prod.head is new but cons.tail becomes too old, the retry flow, based on the detection of outdated head, does not trigger as expected, thus the outdate cons.tail causes wrong free_entries. Similarly, for dequeuing, outdated prod.tail leads to wrong avail_entries. The fix is to keep the deterministic order of two loads allowing the retry to work. Run the ring perf test on the following testbed: HW: ThunderX2 B0 CPU CN9975 v2.0, 2 sockets, 28core, 4 threads/core, 2.5GHz OS: Ubuntu 16.04.5 LTS, Kernel: 4.15.0-36-generic DPDK: 18.08, Configuration: arm64-armv8a-linuxapp-gcc gcc: 8.1.0 $sudo ./test/test/test -l 16-19,44-47,72-75,100-103 -n 4 \ --socket-mem=1024 -- -i Without the patch: *** Testing using two physical cores *** SP/SC bulk enq/dequeue (size: 8): 5.64 MP/MC bulk enq/dequeue (size: 8): 9.58 SP/SC bulk enq/dequeue (size: 32): 1.98 MP/MC bulk enq/dequeue (size: 32): 2.30 With the patch: *** Testing using two physical cores *** SP/SC bulk enq/dequeue (size: 8): 5.75 MP/MC bulk enq/dequeue (size: 8): 10.18 SP/SC bulk enq/dequeue (size: 32): 1.80 MP/MC bulk enq/dequeue (size: 32): 2.34 The results showed the thread fence degrade the performance slightly, but it is required for correctness. Fixes: 39368ebfc6 ("ring: introduce C11 memory model barrier option") Cc: stable@dpdk.org Signed-off-by: Gavin Hu Reviewed-by: Honnappa Nagarahalli Reviewed-by: Steve Capper Reviewed-by: Ola Liljedahl --- lib/librte_ring/rte_ring_c11_mem.h | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/lib/librte_ring/rte_ring_c11_mem.h b/lib/librte_ring/rte_ring_c11_mem.h index 7bc74a4..dc49a99 100644 --- a/lib/librte_ring/rte_ring_c11_mem.h +++ b/lib/librte_ring/rte_ring_c11_mem.h @@ -66,6 +66,9 @@ __rte_ring_move_prod_head(struct rte_ring *r, unsigned int is_sp, /* Reset n to the initial burst count */ n = max; + /* Ensure the head is read before tail */ + __atomic_thread_fence(__ATOMIC_ACQUIRE); + /* load-acquire synchronize with store-release of ht->tail * in update_tail. */ @@ -139,6 +142,9 @@ __rte_ring_move_cons_head(struct rte_ring *r, int is_sc, /* Restore n as it may change every loop */ n = max; + /* Ensure the head is read before tail */ + __atomic_thread_fence(__ATOMIC_ACQUIRE); + /* this load-acquire synchronize with store-release of ht->tail * in update_tail. */ -- 2.7.4