From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from mx1.redhat.com (mx1.redhat.com [209.132.183.28])
	by dpdk.org (Postfix) with ESMTP id 0F9981B4FC
	for ; Fri, 23 Nov 2018 11:29:30 +0100 (CET)
Received: from smtp.corp.redhat.com (int-mx04.intmail.prod.int.phx2.redhat.com [10.5.11.14])
	(using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))
	(No client certificate requested)
	by mx1.redhat.com (Postfix) with ESMTPS id 613303086249;
	Fri, 23 Nov 2018 10:29:29 +0000 (UTC)
Received: from ktraynor.remote.csb (unknown [10.36.118.7])
	by smtp.corp.redhat.com (Postfix) with ESMTP id 3D9A817DA1;
	Fri, 23 Nov 2018 10:29:26 +0000 (UTC)
From: Kevin Traynor
To: Gavin Hu
Cc: Honnappa Nagarahalli, Steve Capper, Ola Liljedahl, Jia He,
	Jerin Jacob, Olivier Matz, dpdk stable
Date: Fri, 23 Nov 2018 10:26:31 +0000
Message-Id: <20181123102713.17309-27-ktraynor@redhat.com>
In-Reply-To: <20181123102713.17309-1-ktraynor@redhat.com>
References: <20181123102713.17309-1-ktraynor@redhat.com>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Scanned-By: MIMEDefang 2.79 on 10.5.11.14
X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16
	(mx1.redhat.com [10.5.110.49]); Fri, 23 Nov 2018 10:29:29 +0000 (UTC)
Subject: [dpdk-stable] patch 'ring/c11: move atomic load of head above the
	loop' has been queued to stable release 18.08.1
X-BeenThere: stable@dpdk.org
X-Mailman-Version: 2.1.15
Precedence: list
List-Id: patches for DPDK stable branches
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Fri, 23 Nov 2018 10:29:30 -0000

Hi,

FYI, your patch has been queued to stable release 18.08.1

Note it hasn't been pushed to http://dpdk.org/browse/dpdk-stable yet.
It will be pushed if I get no objections before 11/29/18. So please
shout if anyone has objections.

Also note that after the patch there's a diff of the upstream commit vs
the patch applied to the branch.
If the code is different (i.e. not only metadata diffs), due for example
to a change in context or macro names, please double check it.

Thanks.

Kevin Traynor

---
>>From b687a72eb402f07dc3fcdf1beb88a60e2cac422f Mon Sep 17 00:00:00 2001
From: Gavin Hu
Date: Fri, 2 Nov 2018 19:21:28 +0800
Subject: [PATCH] ring/c11: move atomic load of head above the loop

[ upstream commit 047adc17245892198be31c54cf6658080df3dc6d ]

In __rte_ring_move_prod_head, move the __atomic_load_n up and out of
the do {} while loop as upon failure the old_head will be updated,
another load is costly and not necessary.

This helps a little on the latency, about 1~5%.

Test result with the patch (two cores):
SP/SC bulk enq/dequeue (size: 8): 5.64
MP/MC bulk enq/dequeue (size: 8): 9.58
SP/SC bulk enq/dequeue (size: 32): 1.98
MP/MC bulk enq/dequeue (size: 32): 2.30

Fixes: 39368ebfc606 ("ring: introduce C11 memory model barrier option")

Signed-off-by: Gavin Hu
Reviewed-by: Honnappa Nagarahalli
Reviewed-by: Steve Capper
Reviewed-by: Ola Liljedahl
Reviewed-by: Jia He
Acked-by: Jerin Jacob
Tested-by: Jerin Jacob
Acked-by: Olivier Matz
---
 lib/librte_ring/rte_ring_c11_mem.h | 10 ++++------
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/lib/librte_ring/rte_ring_c11_mem.h b/lib/librte_ring/rte_ring_c11_mem.h
index 52da95a21..7bc74a4cb 100644
--- a/lib/librte_ring/rte_ring_c11_mem.h
+++ b/lib/librte_ring/rte_ring_c11_mem.h
@@ -62,11 +62,9 @@ __rte_ring_move_prod_head(struct rte_ring *r, unsigned int is_sp,
 	int success;

+	*old_head = __atomic_load_n(&r->prod.head, __ATOMIC_ACQUIRE);
 	do {
 		/* Reset n to the initial burst count */
 		n = max;

-		*old_head = __atomic_load_n(&r->prod.head,
-					__ATOMIC_ACQUIRE);
-
 		/* load-acquire synchronize with store-release of ht->tail
 		 * in update_tail.
@@ -94,4 +92,5 @@ __rte_ring_move_prod_head(struct rte_ring *r, unsigned int is_sp,
 			r->prod.head = *new_head, success = 1;
 		else
+			/* on failure, *old_head is updated */
 			success = __atomic_compare_exchange_n(&r->prod.head,
 					old_head, *new_head,
@@ -136,11 +135,9 @@ __rte_ring_move_cons_head(struct rte_ring *r, int is_sc,

 	/* move cons.head atomically */
+	*old_head = __atomic_load_n(&r->cons.head, __ATOMIC_ACQUIRE);
 	do {
 		/* Restore n as it may change every loop */
 		n = max;

-		*old_head = __atomic_load_n(&r->cons.head,
-					__ATOMIC_ACQUIRE);
-
 		/* this load-acquire synchronize with store-release of ht->tail
 		 * in update_tail.
@@ -167,4 +164,5 @@ __rte_ring_move_cons_head(struct rte_ring *r, int is_sc,
 			r->cons.head = *new_head, success = 1;
 		else
+			/* on failure, *old_head will be updated */
 			success = __atomic_compare_exchange_n(&r->cons.head,
 					old_head, *new_head,
-- 
2.19.0

---
  Diff of the applied patch vs upstream commit (please double-check if non-empty):
---
--- -	2018-11-23 10:22:54.960223892 +0000
+++ 0027-ring-c11-move-atomic-load-of-head-above-the-loop.patch	2018-11-23 10:22:54.000000000 +0000
@@ -1,8 +1,10 @@
-From 047adc17245892198be31c54cf6658080df3dc6d Mon Sep 17 00:00:00 2001
+From b687a72eb402f07dc3fcdf1beb88a60e2cac422f Mon Sep 17 00:00:00 2001
 From: Gavin Hu
 Date: Fri, 2 Nov 2018 19:21:28 +0800
 Subject: [PATCH] ring/c11: move atomic load of head above the loop

+[ upstream commit 047adc17245892198be31c54cf6658080df3dc6d ]
+
 In __rte_ring_move_prod_head, move the __atomic_load_n up and out of
 the do {} while loop as upon failure the old_head will be updated,
 another load is costly and not necessary.
@@ -16,7 +18,6 @@
 MP/MC bulk enq/dequeue (size: 32): 2.30

 Fixes: 39368ebfc606 ("ring: introduce C11 memory model barrier option")
-Cc: stable@dpdk.org

 Signed-off-by: Gavin Hu
 Reviewed-by: Honnappa Nagarahalli
@@ -27,29 +28,9 @@
 Tested-by: Jerin Jacob
 Acked-by: Olivier Matz
 ---
- doc/guides/rel_notes/release_18_11.rst | 10 ++++++++++
- lib/librte_ring/rte_ring_c11_mem.h | 10 ++++------
- 2 files changed, 14 insertions(+), 6 deletions(-)
-
-diff --git a/doc/guides/rel_notes/release_18_11.rst b/doc/guides/rel_notes/release_18_11.rst
-index c60879c69..cfa92b8c0 100644
---- a/doc/guides/rel_notes/release_18_11.rst
-+++ b/doc/guides/rel_notes/release_18_11.rst
-@@ -70,4 +70,14 @@ New Features
-   one device has addressing limitations, the dma mask is the more restricted one.
- 
-+* **Updated the C11 memory model version of ring library.**
-+
-+  The latency is decreased for architectures using the C11 memory model
-+  version of the ring library.
-+
-+  On Cavium ThunderX2 platform, the changes decreased latency by 27~29%
-+  and 3~15% for MPMC and SPSC cases respectively (with 2 lcores). The
-+  real improvements may vary with the number of contending lcores and
-+  the size of ring.
-+
- * **Added hot-unplug handle mechanism.**
- 
+ lib/librte_ring/rte_ring_c11_mem.h | 10 ++++------
+ 1 file changed, 4 insertions(+), 6 deletions(-)
+
 diff --git a/lib/librte_ring/rte_ring_c11_mem.h b/lib/librte_ring/rte_ring_c11_mem.h
 index 52da95a21..7bc74a4cb 100644
 --- a/lib/librte_ring/rte_ring_c11_mem.h