From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from dpdk.org (dpdk.org [92.243.14.124]) by dpdk.space (Postfix) with ESMTP id 8806EA00E6 for ; Tue, 16 Apr 2019 16:39:19 +0200 (CEST) Received: from [92.243.14.124] (localhost [127.0.0.1]) by dpdk.org (Postfix) with ESMTP id 6B64B1B4DA; Tue, 16 Apr 2019 16:39:19 +0200 (CEST) Received: from mx1.redhat.com (mx1.redhat.com [209.132.183.28]) by dpdk.org (Postfix) with ESMTP id 311AC1B4DF for ; Tue, 16 Apr 2019 16:39:17 +0200 (CEST) Received: from smtp.corp.redhat.com (int-mx07.intmail.prod.int.phx2.redhat.com [10.5.11.22]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 7D5A63005159; Tue, 16 Apr 2019 14:39:16 +0000 (UTC) Received: from rh.redhat.com (ovpn-117-214.ams2.redhat.com [10.36.117.214]) by smtp.corp.redhat.com (Postfix) with ESMTP id 4E7811001E92; Tue, 16 Apr 2019 14:39:14 +0000 (UTC) From: Kevin Traynor To: Gavin Hu Cc: Phil Yang , Honnappa Nagarahalli , Ola Liljedahl , Steve Capper , Jerin Jacob , Nipun Gupta , Konstantin Ananyev , dpdk stable Date: Tue, 16 Apr 2019 15:37:16 +0100 Message-Id: <20190416143719.21601-58-ktraynor@redhat.com> In-Reply-To: <20190416143719.21601-1-ktraynor@redhat.com> References: <20190416143719.21601-1-ktraynor@redhat.com> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Scanned-By: MIMEDefang 2.84 on 10.5.11.22 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.40]); Tue, 16 Apr 2019 14:39:16 +0000 (UTC) Subject: [dpdk-stable] patch 'spinlock: reimplement with atomic one-way barrier' has been queued to LTS release 18.11.2 X-BeenThere: stable@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: patches for DPDK stable branches List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: stable-bounces@dpdk.org Sender: "stable" Hi, FYI, your patch has been queued to LTS release 18.11.2 Note it hasn't been pushed to http://dpdk.org/browse/dpdk-stable yet. It will be pushed if I get no objections before 04/24/19. So please shout if anyone has objections. Also note that after the patch there's a diff of the upstream commit vs the patch applied to the branch. This will indicate if there was any rebasing needed to apply to the stable branch. If there were code changes for rebasing (ie: not only metadata diffs), please double check that the rebase was correctly done. Thanks. Kevin Traynor --- >From 548edf47ab0869243541b1c855d1e9330b4718a7 Mon Sep 17 00:00:00 2001 From: Gavin Hu Date: Fri, 8 Mar 2019 15:56:37 +0800 Subject: [PATCH] spinlock: reimplement with atomic one-way barrier [ upstream commit 453d8f736676255696bce3c5e01b43b543ff896c ] The __sync builtin based implementation generates full memory barriers ('dmb ish') on Arm platforms. Using C11 atomic builtins to generate one way barriers. Here is the assembly code of __sync_compare_and_swap builtin. __sync_bool_compare_and_swap(dst, exp, src); 0x000000000090f1b0 <+16>: e0 07 40 f9 ldr x0, [sp, #8] 0x000000000090f1b4 <+20>: e1 0f 40 79 ldrh w1, [sp, #6] 0x000000000090f1b8 <+24>: e2 0b 40 79 ldrh w2, [sp, #4] 0x000000000090f1bc <+28>: 21 3c 00 12 and w1, w1, #0xffff 0x000000000090f1c0 <+32>: 03 7c 5f 48 ldxrh w3, [x0] 0x000000000090f1c4 <+36>: 7f 00 01 6b cmp w3, w1 0x000000000090f1c8 <+40>: 61 00 00 54 b.ne 0x90f1d4 // b.any 0x000000000090f1cc <+44>: 02 fc 04 48 stlxrh w4, w2, [x0] 0x000000000090f1d0 <+48>: 84 ff ff 35 cbnz w4, 0x90f1c0 0x000000000090f1d4 <+52>: bf 3b 03 d5 dmb ish 0x000000000090f1d8 <+56>: e0 17 9f 1a cset w0, eq // eq = none The benchmarking results showed constant improvements on all available platforms: 1. Cavium ThunderX2: 126% performance; 2. Hisilicon 1616: 30%; 3. Qualcomm Falkor: 13%; 4. Marvell ARMADA 8040 with A72 cores on macchiatobin: 3.7% Here is the example test result on TX2: $sudo ./build/app/test -l 16-27 -- i RTE>>spinlock_autotest *** spinlock_autotest without this patch *** Test with lock on 12 cores... Core [16] Cost Time = 53886 us Core [17] Cost Time = 53605 us Core [18] Cost Time = 53163 us Core [19] Cost Time = 49419 us Core [20] Cost Time = 34317 us Core [21] Cost Time = 53408 us Core [22] Cost Time = 53970 us Core [23] Cost Time = 53930 us Core [24] Cost Time = 53283 us Core [25] Cost Time = 51504 us Core [26] Cost Time = 50718 us Core [27] Cost Time = 51730 us Total Cost Time = 612933 us *** spinlock_autotest with this patch *** Test with lock on 12 cores... Core [16] Cost Time = 18808 us Core [17] Cost Time = 29497 us Core [18] Cost Time = 29132 us Core [19] Cost Time = 26150 us Core [20] Cost Time = 21892 us Core [21] Cost Time = 24377 us Core [22] Cost Time = 27211 us Core [23] Cost Time = 11070 us Core [24] Cost Time = 29802 us Core [25] Cost Time = 15793 us Core [26] Cost Time = 7474 us Core [27] Cost Time = 29550 us Total Cost Time = 270756 us In the tests on ThunderX2, with more cores contending, the performance gain was even higher, indicating the __atomic implementation scales up better than __sync. Fixes: af75078fece3 ("first public release") Signed-off-by: Gavin Hu Reviewed-by: Phil Yang Reviewed-by: Honnappa Nagarahalli Reviewed-by: Ola Liljedahl Reviewed-by: Steve Capper Reviewed-by: Jerin Jacob Acked-by: Nipun Gupta Acked-by: Konstantin Ananyev --- .../common/include/generic/rte_spinlock.h | 18 +++++++++++++----- 1 file changed, 13 insertions(+), 5 deletions(-) diff --git a/lib/librte_eal/common/include/generic/rte_spinlock.h b/lib/librte_eal/common/include/generic/rte_spinlock.h index c4c3fc31e..87ae7a4f1 100644 --- a/lib/librte_eal/common/include/generic/rte_spinlock.h +++ b/lib/librte_eal/common/include/generic/rte_spinlock.h @@ -62,7 +62,12 @@ static inline void rte_spinlock_lock(rte_spinlock_t *sl) { - while (__sync_lock_test_and_set(&sl->locked, 1)) - while(sl->locked) + int exp = 0; + + while (!__atomic_compare_exchange_n(&sl->locked, &exp, 1, 0, + __ATOMIC_ACQUIRE, __ATOMIC_RELAXED)) { + while (__atomic_load_n(&sl->locked, __ATOMIC_RELAXED)) rte_pause(); + exp = 0; + } } #endif @@ -81,5 +86,5 @@ static inline void rte_spinlock_unlock (rte_spinlock_t *sl) { - __sync_lock_release(&sl->locked); + __atomic_store_n(&sl->locked, 0, __ATOMIC_RELEASE); } #endif @@ -100,5 +105,8 @@ static inline int rte_spinlock_trylock (rte_spinlock_t *sl) { - return __sync_lock_test_and_set(&sl->locked,1) == 0; + int exp = 0; + return __atomic_compare_exchange_n(&sl->locked, &exp, 1, + 0, /* disallow spurious failure */ + __ATOMIC_ACQUIRE, __ATOMIC_RELAXED); } #endif @@ -114,5 +122,5 @@ rte_spinlock_trylock (rte_spinlock_t *sl) static inline int rte_spinlock_is_locked (rte_spinlock_t *sl) { - return sl->locked; + return __atomic_load_n(&sl->locked, __ATOMIC_ACQUIRE); } -- 2.20.1 --- Diff of the applied patch vs upstream commit (please double-check if non-empty: --- --- - 2019-04-16 15:34:27.777024964 +0100 +++ 0058-spinlock-reimplement-with-atomic-one-way-barrier.patch 2019-04-16 15:34:25.236178707 +0100 @@ -1,8 +1,10 @@ -From 453d8f736676255696bce3c5e01b43b543ff896c Mon Sep 17 00:00:00 2001 +From 548edf47ab0869243541b1c855d1e9330b4718a7 Mon Sep 17 00:00:00 2001 From: Gavin Hu Date: Fri, 8 Mar 2019 15:56:37 +0800 Subject: [PATCH] spinlock: reimplement with atomic one-way barrier +[ upstream commit 453d8f736676255696bce3c5e01b43b543ff896c ] + The __sync builtin based implementation generates full memory barriers ('dmb ish') on Arm platforms. Using C11 atomic builtins to generate one way barriers. @@ -71,7 +73,6 @@ than __sync. Fixes: af75078fece3 ("first public release") -Cc: stable@dpdk.org Signed-off-by: Gavin Hu Reviewed-by: Phil Yang