From: Gavin Hu <gavin.hu@arm.com>
To: dev@dpdk.org
Cc: thomas@monjalon.net, stephen@networkplumber.org,
olivier.matz@6wind.com, chaozhu@linux.vnet.ibm.com,
bruce.richardson@intel.com, konstantin.ananyev@intel.com,
jerin.jacob@caviumnetworks.com, Honnappa.Nagarahalli@arm.com,
gavin.hu@arm.com, stable@dpdk.org
Subject: [dpdk-dev] [PATCH v1 1/2] ring: keep the deterministic order allowing retry to work
Date: Fri, 9 Nov 2018 19:42:46 +0800 [thread overview]
Message-ID: <1541763767-7399-2-git-send-email-gavin.hu@arm.com> (raw)
In-Reply-To: <1541763767-7399-1-git-send-email-gavin.hu@arm.com>
Use case scenario:
1) Thread 1 is enqueuing. It reads prod.head and gets stalled for some
reasons (running out of cpu time, preempted,...)
2) Thread 2 is enqueuing. It succeeds in enqueuing and moves prod.head
forward.
3) Thread 3 is dequeuing. It succeeds in dequeuing and moves the cons.tail
beyond the prod.head read by thread 1.
4) Thread 1 is re-scheduled. It reads cons.tail.
cpu1(producer) cpu2(producer) cpu3(consumer)
load r->prod.head
^ load r->prod.head
| load r->cons.tail
| store r->prod.head(+n)
stalled <-- enqueue ----->
| store r->prod.tail(+n)
| load r->cons.head
| load r->prod.tail
| store r->cons.head(+n)
| <...dequeue.....>
v store r->cons.tail(+n)
load r->cons.tail
For thread 1, the __atomic_compare_exchange_n detects the outdated
prod.head and retry the flow with the new one. This retry flow works ok on
strong ordering platform(eg:x86). But for weak ordering platforms(arm,
ppc), loading cons.tail and prod.head might be re-ordered, prod.head is new
but cons.tail becomes too old, the retry flow, based on the detection of
outdated head, does not trigger as expected, thus the outdate cons.tail
causes wrong free_entries.
Similarly, for dequeuing, outdated prod.tail leads to wrong avail_entries.
The fix is to keep the deterministic order of two loads allowing the retry
to work.
Run the ring perf test on the following testbed:
HW: ThunderX2 B0 CPU CN9975 v2.0, 2 sockets, 28core, 4 threads/core, 2.5GHz
OS: Ubuntu 16.04.5 LTS, Kernel: 4.15.0-36-generic
DPDK: 18.08, Configuration: arm64-armv8a-linuxapp-gcc
gcc: 8.1.0
$sudo ./test/test/test -l 16-19,44-47,72-75,100-103 -n 4 \
--socket-mem=1024 -- -i
Without the patch:
*** Testing using two physical cores ***
SP/SC bulk enq/dequeue (size: 8): 5.64
MP/MC bulk enq/dequeue (size: 8): 9.58
SP/SC bulk enq/dequeue (size: 32): 1.98
MP/MC bulk enq/dequeue (size: 32): 2.30
With the patch:
*** Testing using two physical cores ***
SP/SC bulk enq/dequeue (size: 8): 5.75
MP/MC bulk enq/dequeue (size: 8): 10.18
SP/SC bulk enq/dequeue (size: 32): 1.80
MP/MC bulk enq/dequeue (size: 32): 2.34
The results showed the thread fence degrade the performance slightly, but
it is required for correctness.
Fixes: 39368ebfc6 ("ring: introduce C11 memory model barrier option")
Cc: stable@dpdk.org
Signed-off-by: Gavin Hu <gavin.hu@arm.com>
Reviewed-by: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>
Reviewed-by: Steve Capper <steve.capper@arm.com>
Reviewed-by: Ola Liljedahl <Ola.Liljedahl@arm.com>
---
lib/librte_ring/rte_ring_c11_mem.h | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/lib/librte_ring/rte_ring_c11_mem.h b/lib/librte_ring/rte_ring_c11_mem.h
index 7bc74a4..dc49a99 100644
--- a/lib/librte_ring/rte_ring_c11_mem.h
+++ b/lib/librte_ring/rte_ring_c11_mem.h
@@ -66,6 +66,9 @@ __rte_ring_move_prod_head(struct rte_ring *r, unsigned int is_sp,
/* Reset n to the initial burst count */
n = max;
+ /* Ensure the head is read before tail */
+ __atomic_thread_fence(__ATOMIC_ACQUIRE);
+
/* load-acquire synchronize with store-release of ht->tail
* in update_tail.
*/
@@ -139,6 +142,9 @@ __rte_ring_move_cons_head(struct rte_ring *r, int is_sc,
/* Restore n as it may change every loop */
n = max;
+ /* Ensure the head is read before tail */
+ __atomic_thread_fence(__ATOMIC_ACQUIRE);
+
/* this load-acquire synchronize with store-release of ht->tail
* in update_tail.
*/
--
2.7.4
next prev parent reply other threads:[~2018-11-09 11:43 UTC|newest]
Thread overview: 4+ messages / expand[flat|nested] mbox.gz Atom feed top
2018-11-09 11:42 [dpdk-dev] [PATCH v1 0/2] ring C11 library fix and optimization Gavin Hu
2018-11-09 11:42 ` Gavin Hu [this message]
2018-11-09 11:42 ` [dpdk-dev] [PATCH v1 2/2] ring: relaxed ordering for load and store the head Gavin Hu
2018-11-13 15:57 ` [dpdk-dev] [PATCH v1 0/2] ring C11 library fix and optimization Thomas Monjalon
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=1541763767-7399-2-git-send-email-gavin.hu@arm.com \
--to=gavin.hu@arm.com \
--cc=Honnappa.Nagarahalli@arm.com \
--cc=bruce.richardson@intel.com \
--cc=chaozhu@linux.vnet.ibm.com \
--cc=dev@dpdk.org \
--cc=jerin.jacob@caviumnetworks.com \
--cc=konstantin.ananyev@intel.com \
--cc=olivier.matz@6wind.com \
--cc=stable@dpdk.org \
--cc=stephen@networkplumber.org \
--cc=thomas@monjalon.net \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).