patches for DPDK stable branches
 help / color / mirror / Atom feed
* [dpdk-stable] [PATCH v1 1/2] ring: keep the deterministic order allowing retry to work
       [not found] <1541763767-7399-1-git-send-email-gavin.hu@arm.com>
@ 2018-11-09 11:42 ` Gavin Hu
  2018-11-09 11:42 ` [dpdk-stable] [PATCH v1 2/2] ring: relaxed ordering for load and store the head Gavin Hu
  1 sibling, 0 replies; 2+ messages in thread
From: Gavin Hu @ 2018-11-09 11:42 UTC (permalink / raw)
  To: dev
  Cc: thomas, stephen, olivier.matz, chaozhu, bruce.richardson,
	konstantin.ananyev, jerin.jacob, Honnappa.Nagarahalli, gavin.hu,
	stable

Use case scenario:
1) Thread 1 is enqueuing. It reads prod.head and gets stalled for some
   reasons (running out of cpu time, preempted,...)
2) Thread 2 is enqueuing. It succeeds in enqueuing and moves prod.head
   forward.
3) Thread 3 is dequeuing. It succeeds in dequeuing and moves the cons.tail
   beyond the prod.head read by thread 1.
4) Thread 1 is re-scheduled. It reads cons.tail.

cpu1(producer)      cpu2(producer)          cpu3(consumer)
load r->prod.head
    ^               load r->prod.head
    |               load r->cons.tail
    |               store r->prod.head(+n)
  stalled           <-- enqueue ----->
    |               store r->prod.tail(+n)
    |                                        load r->cons.head
    |                                        load r->prod.tail
    |                                        store r->cons.head(+n)
    |                                        <...dequeue.....>
    v                                        store r->cons.tail(+n)
load r->cons.tail

For thread 1, the __atomic_compare_exchange_n detects the outdated
prod.head and retry the flow with the new one. This retry flow works ok on
strong ordering platform(eg:x86). But for weak ordering platforms(arm,
ppc), loading cons.tail and prod.head might be re-ordered, prod.head is new
but cons.tail becomes too old, the retry flow, based on the detection of
outdated head, does not trigger as expected, thus the outdate cons.tail
causes wrong free_entries.

Similarly, for dequeuing, outdated prod.tail leads to wrong avail_entries.

The fix is to keep the deterministic order of two loads allowing the retry
to work.

Run the ring perf test on the following testbed:
HW: ThunderX2 B0 CPU CN9975 v2.0, 2 sockets, 28core, 4 threads/core, 2.5GHz
OS: Ubuntu 16.04.5 LTS, Kernel: 4.15.0-36-generic
DPDK: 18.08, Configuration: arm64-armv8a-linuxapp-gcc
gcc: 8.1.0
$sudo ./test/test/test -l 16-19,44-47,72-75,100-103 -n 4 \
--socket-mem=1024 -- -i

Without the patch:
*** Testing using two physical cores ***
SP/SC bulk enq/dequeue (size: 8): 5.64
MP/MC bulk enq/dequeue (size: 8): 9.58
SP/SC bulk enq/dequeue (size: 32): 1.98
MP/MC bulk enq/dequeue (size: 32): 2.30

With the patch:
*** Testing using two physical cores ***
SP/SC bulk enq/dequeue (size: 8): 5.75
MP/MC bulk enq/dequeue (size: 8): 10.18
SP/SC bulk enq/dequeue (size: 32): 1.80
MP/MC bulk enq/dequeue (size: 32): 2.34

The results showed the thread fence degrade the performance slightly, but
it is required for correctness.

Fixes: 39368ebfc6 ("ring: introduce C11 memory model barrier option")
Cc: stable@dpdk.org

Signed-off-by: Gavin Hu <gavin.hu@arm.com>
Reviewed-by: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>
Reviewed-by: Steve Capper <steve.capper@arm.com>
Reviewed-by: Ola Liljedahl <Ola.Liljedahl@arm.com>
---
 lib/librte_ring/rte_ring_c11_mem.h | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/lib/librte_ring/rte_ring_c11_mem.h b/lib/librte_ring/rte_ring_c11_mem.h
index 7bc74a4..dc49a99 100644
--- a/lib/librte_ring/rte_ring_c11_mem.h
+++ b/lib/librte_ring/rte_ring_c11_mem.h
@@ -66,6 +66,9 @@ __rte_ring_move_prod_head(struct rte_ring *r, unsigned int is_sp,
 		/* Reset n to the initial burst count */
 		n = max;
 
+		/* Ensure the head is read before tail */
+		__atomic_thread_fence(__ATOMIC_ACQUIRE);
+
 		/* load-acquire synchronize with store-release of ht->tail
 		 * in update_tail.
 		 */
@@ -139,6 +142,9 @@ __rte_ring_move_cons_head(struct rte_ring *r, int is_sc,
 		/* Restore n as it may change every loop */
 		n = max;
 
+		/* Ensure the head is read before tail */
+		__atomic_thread_fence(__ATOMIC_ACQUIRE);
+
 		/* this load-acquire synchronize with store-release of ht->tail
 		 * in update_tail.
 		 */
-- 
2.7.4

^ permalink raw reply	[flat|nested] 2+ messages in thread

* [dpdk-stable] [PATCH v1 2/2] ring: relaxed ordering for load and store the head
       [not found] <1541763767-7399-1-git-send-email-gavin.hu@arm.com>
  2018-11-09 11:42 ` [dpdk-stable] [PATCH v1 1/2] ring: keep the deterministic order allowing retry to work Gavin Hu
@ 2018-11-09 11:42 ` Gavin Hu
  1 sibling, 0 replies; 2+ messages in thread
From: Gavin Hu @ 2018-11-09 11:42 UTC (permalink / raw)
  To: dev
  Cc: thomas, stephen, olivier.matz, chaozhu, bruce.richardson,
	konstantin.ananyev, jerin.jacob, Honnappa.Nagarahalli, gavin.hu,
	stable

When calling __atomic_compare_exchange_n, use relaxed ordering for the
success case, as multiple producers/consumers do not release updates to
each other so no need for acquire or release ordering.

Because the thread fence in place, ordering for the first iteration can be
relaxed.

Run the ring perf test on the following testbed:
HW: ThunderX2 B0 CPU CN9975 v2.0, 2 sockets, 28core,4 threads/core,2.5GHz
OS: Ubuntu 16.04.5 LTS, Kernel: 4.15.0-36-generic
DPDK: 18.08, Configuration: arm64-armv8a-linuxapp-gcc
gcc: 8.1.0
$sudo ./test/test/test -l 16-19,44-47,72-75,100-103 -n 4 \
--socket-mem=1024 -- -i

Without the patch:
*** Testing using two physical cores ***
SP/SC bulk enq/dequeue (size: 8): 5.75
MP/MC bulk enq/dequeue (size: 8): 10.18
SP/SC bulk enq/dequeue (size: 32): 1.80
MP/MC bulk enq/dequeue (size: 32): 2.34

With the patch:
*** Testing using two physical cores ***
SP/SC bulk enq/dequeue (size: 8): 5.59
MP/MC bulk enq/dequeue (size: 8): 10.54
SP/SC bulk enq/dequeue (size: 32): 1.73
MP/MC bulk enq/dequeue (size: 32): 2.38

No significant improvement, nor regression was seen,as the optimisation is
not at the critical path.

Fixes: 39368ebfc6 ("ring: introduce C11 memory model barrier option")
Cc: stable@dpdk.org

Signed-off-by: Gavin Hu <gavin.hu@arm.com>
Reviewed-by: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>
Reviewed-by: Steve Capper <steve.capper@arm.com>
Reviewed-by: Ola Liljedahl <Ola.Liljedahl@arm.com>
---
 lib/librte_ring/rte_ring_c11_mem.h | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/lib/librte_ring/rte_ring_c11_mem.h b/lib/librte_ring/rte_ring_c11_mem.h
index dc49a99..0fb73a3 100644
--- a/lib/librte_ring/rte_ring_c11_mem.h
+++ b/lib/librte_ring/rte_ring_c11_mem.h
@@ -61,7 +61,7 @@ __rte_ring_move_prod_head(struct rte_ring *r, unsigned int is_sp,
 	unsigned int max = n;
 	int success;
 
-	*old_head = __atomic_load_n(&r->prod.head, __ATOMIC_ACQUIRE);
+	*old_head = __atomic_load_n(&r->prod.head, __ATOMIC_RELAXED);
 	do {
 		/* Reset n to the initial burst count */
 		n = max;
@@ -97,7 +97,7 @@ __rte_ring_move_prod_head(struct rte_ring *r, unsigned int is_sp,
 			/* on failure, *old_head is updated */
 			success = __atomic_compare_exchange_n(&r->prod.head,
 					old_head, *new_head,
-					0, __ATOMIC_ACQUIRE,
+					0, __ATOMIC_RELAXED,
 					__ATOMIC_RELAXED);
 	} while (unlikely(success == 0));
 	return n;
@@ -137,7 +137,7 @@ __rte_ring_move_cons_head(struct rte_ring *r, int is_sc,
 	int success;
 
 	/* move cons.head atomically */
-	*old_head = __atomic_load_n(&r->cons.head, __ATOMIC_ACQUIRE);
+	*old_head = __atomic_load_n(&r->cons.head, __ATOMIC_RELAXED);
 	do {
 		/* Restore n as it may change every loop */
 		n = max;
@@ -172,7 +172,7 @@ __rte_ring_move_cons_head(struct rte_ring *r, int is_sc,
 			/* on failure, *old_head will be updated */
 			success = __atomic_compare_exchange_n(&r->cons.head,
 							old_head, *new_head,
-							0, __ATOMIC_ACQUIRE,
+							0, __ATOMIC_RELAXED,
 							__ATOMIC_RELAXED);
 	} while (unlikely(success == 0));
 	return n;
-- 
2.7.4

^ permalink raw reply	[flat|nested] 2+ messages in thread

end of thread, other threads:[~2018-11-09 11:43 UTC | newest]

Thread overview: 2+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <1541763767-7399-1-git-send-email-gavin.hu@arm.com>
2018-11-09 11:42 ` [dpdk-stable] [PATCH v1 1/2] ring: keep the deterministic order allowing retry to work Gavin Hu
2018-11-09 11:42 ` [dpdk-stable] [PATCH v1 2/2] ring: relaxed ordering for load and store the head Gavin Hu

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).