From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path:
Received: from dpdk.org (dpdk.org [92.243.14.124])
	by inbox.dpdk.org (Postfix) with ESMTP id 17A76A046B;
	Tue, 23 Jul 2019 03:02:51 +0200 (CEST)
Received: from [92.243.14.124] (localhost [127.0.0.1])
	by dpdk.org (Postfix) with ESMTP id 095C31BEF8;
	Tue, 23 Jul 2019 03:02:51 +0200 (CEST)
Received: from mellanox.co.il (mail-il-dmz.mellanox.com [193.47.165.129])
	by dpdk.org (Postfix) with ESMTP id 3AB7C1BEF8;
	Tue, 23 Jul 2019 03:02:50 +0200 (CEST)
Received: from Internal Mail-Server by MTLPINE2
	(envelope-from yskoh@mellanox.com)
	with ESMTPS (AES256-SHA encrypted); 23 Jul 2019 04:02:47 +0300
Received: from scfae-sc-2.mti.labs.mlnx (scfae-sc-2.mti.labs.mlnx [10.101.0.96])
	by labmailer.mlnx (8.13.8/8.13.8) with ESMTP id x6N11HfY026580;
	Tue, 23 Jul 2019 04:02:44 +0300
From: Yongseok Koh
To: Gavin Hu
Cc: Phil Yang, Honnappa Nagarahalli, Ola Liljedahl, Steve Capper,
	Jerin Jacob, Nipun Gupta, Konstantin Ananyev, dpdk stable
Date: Mon, 22 Jul 2019 18:00:15 -0700
Message-Id: <20190723010115.6446-48-yskoh@mellanox.com>
X-Mailer: git-send-email 2.21.0
In-Reply-To: <20190723010115.6446-1-yskoh@mellanox.com>
References: <20190723010115.6446-1-yskoh@mellanox.com>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Subject: [dpdk-stable] patch 'spinlock: reimplement with atomic one-way
	barrier' has been queued to LTS release 17.11.7
X-BeenThere: stable@dpdk.org
X-Mailman-Version: 2.1.15
Precedence: list
List-Id: patches for DPDK stable branches
Errors-To: stable-bounces@dpdk.org
Sender: "stable"

Hi,

FYI, your patch has been queued to LTS release 17.11.7

Note it hasn't been pushed to http://dpdk.org/browse/dpdk-stable yet.
It will be pushed if I get no objection by 07/27/19. So please shout if
anyone has objection.

Also note that after the patch there's a diff of the upstream commit vs
the patch applied to the branch.
This will indicate if there was any rebasing needed to apply to the
stable branch. If there were code changes for rebasing (ie: not only
metadata diffs), please double check that the rebase was correctly done.

Thanks.

Yongseok

---
>From 2f66f485b44da1c7bad27a92fe847a8b7888a831 Mon Sep 17 00:00:00 2001
From: Gavin Hu
Date: Fri, 8 Mar 2019 15:56:37 +0800
Subject: [PATCH] spinlock: reimplement with atomic one-way barrier

[ upstream commit 453d8f736676255696bce3c5e01b43b543ff896c ]

The __sync builtin based implementation generates full memory barriers
('dmb ish') on Arm platforms. Using C11 atomic builtins to generate
one way barriers.

Here is the assembly code of __sync_compare_and_swap builtin.
__sync_bool_compare_and_swap(dst, exp, src);

   0x000000000090f1b0 <+16>:	e0 07 40 f9	ldr	x0, [sp, #8]
   0x000000000090f1b4 <+20>:	e1 0f 40 79	ldrh	w1, [sp, #6]
   0x000000000090f1b8 <+24>:	e2 0b 40 79	ldrh	w2, [sp, #4]
   0x000000000090f1bc <+28>:	21 3c 00 12	and	w1, w1, #0xffff
   0x000000000090f1c0 <+32>:	03 7c 5f 48	ldxrh	w3, [x0]
   0x000000000090f1c4 <+36>:	7f 00 01 6b	cmp	w3, w1
   0x000000000090f1c8 <+40>:	61 00 00 54	b.ne	0x90f1d4  // b.any
   0x000000000090f1cc <+44>:	02 fc 04 48	stlxrh	w4, w2, [x0]
   0x000000000090f1d0 <+48>:	84 ff ff 35	cbnz	w4, 0x90f1c0
   0x000000000090f1d4 <+52>:	bf 3b 03 d5	dmb	ish
   0x000000000090f1d8 <+56>:	e0 17 9f 1a	cset	w0, eq  // eq = none

The benchmarking results showed constant improvements on all available
platforms:
1. Cavium ThunderX2: 126% performance;
2. Hisilicon 1616: 30%;
3. Qualcomm Falkor: 13%;
4. Marvell ARMADA 8040 with A72 cores on macchiatobin: 3.7%

Here is the example test result on TX2:
$sudo ./build/app/test -l 16-27 -- i
RTE>>spinlock_autotest

*** spinlock_autotest without this patch ***
Test with lock on 12 cores...
Core [16] Cost Time = 53886 us
Core [17] Cost Time = 53605 us
Core [18] Cost Time = 53163 us
Core [19] Cost Time = 49419 us
Core [20] Cost Time = 34317 us
Core [21] Cost Time = 53408 us
Core [22] Cost Time = 53970 us
Core [23] Cost Time = 53930 us
Core [24] Cost Time = 53283 us
Core [25] Cost Time = 51504 us
Core [26] Cost Time = 50718 us
Core [27] Cost Time = 51730 us
Total Cost Time = 612933 us

*** spinlock_autotest with this patch ***
Test with lock on 12 cores...
Core [16] Cost Time = 18808 us
Core [17] Cost Time = 29497 us
Core [18] Cost Time = 29132 us
Core [19] Cost Time = 26150 us
Core [20] Cost Time = 21892 us
Core [21] Cost Time = 24377 us
Core [22] Cost Time = 27211 us
Core [23] Cost Time = 11070 us
Core [24] Cost Time = 29802 us
Core [25] Cost Time = 15793 us
Core [26] Cost Time = 7474 us
Core [27] Cost Time = 29550 us
Total Cost Time = 270756 us

In the tests on ThunderX2, with more cores contending, the performance
gain was even higher, indicating the __atomic implementation scales up
better than __sync.
Fixes: af75078fece3 ("first public release")

Signed-off-by: Gavin Hu
Reviewed-by: Phil Yang
Reviewed-by: Honnappa Nagarahalli
Reviewed-by: Ola Liljedahl
Reviewed-by: Steve Capper
Reviewed-by: Jerin Jacob
Acked-by: Nipun Gupta
Acked-by: Konstantin Ananyev
---
 .../common/include/generic/rte_spinlock.h     | 18 +++++++++++++-----
 1 file changed, 13 insertions(+), 5 deletions(-)

diff --git a/lib/librte_eal/common/include/generic/rte_spinlock.h b/lib/librte_eal/common/include/generic/rte_spinlock.h
index 54f83a4c56..f260136f28 100644
--- a/lib/librte_eal/common/include/generic/rte_spinlock.h
+++ b/lib/librte_eal/common/include/generic/rte_spinlock.h
@@ -90,9 +90,14 @@ rte_spinlock_lock(rte_spinlock_t *sl);
 static inline void
 rte_spinlock_lock(rte_spinlock_t *sl)
 {
-	while (__sync_lock_test_and_set(&sl->locked, 1))
-		while(sl->locked)
+	int exp = 0;
+
+	while (!__atomic_compare_exchange_n(&sl->locked, &exp, 1, 0,
+				__ATOMIC_ACQUIRE, __ATOMIC_RELAXED)) {
+		while (__atomic_load_n(&sl->locked, __ATOMIC_RELAXED))
 			rte_pause();
+		exp = 0;
+	}
 }
 #endif
 
@@ -109,7 +114,7 @@ rte_spinlock_unlock (rte_spinlock_t *sl);
 static inline void
 rte_spinlock_unlock (rte_spinlock_t *sl)
 {
-	__sync_lock_release(&sl->locked);
+	__atomic_store_n(&sl->locked, 0, __ATOMIC_RELEASE);
 }
 #endif
 
@@ -128,7 +133,10 @@ rte_spinlock_trylock (rte_spinlock_t *sl);
 static inline int
 rte_spinlock_trylock (rte_spinlock_t *sl)
 {
-	return __sync_lock_test_and_set(&sl->locked,1) == 0;
+	int exp = 0;
+	return __atomic_compare_exchange_n(&sl->locked, &exp, 1,
+				0, /* disallow spurious failure */
+				__ATOMIC_ACQUIRE, __ATOMIC_RELAXED);
 }
 #endif
 
@@ -142,7 +150,7 @@ rte_spinlock_trylock (rte_spinlock_t *sl)
  */
 static inline int rte_spinlock_is_locked (rte_spinlock_t *sl)
 {
-	return sl->locked;
+	return __atomic_load_n(&sl->locked, __ATOMIC_ACQUIRE);
 }
 
 /**
-- 
2.21.0

---
  Diff of the applied patch vs upstream commit (please double-check if non-empty):
---
--- -	2019-07-22 17:55:09.104148109 -0700
+++
0048-spinlock-reimplement-with-atomic-one-way-barrier.patch	2019-07-22 17:55:06.100479000 -0700
@@ -1,8 +1,10 @@
-From 453d8f736676255696bce3c5e01b43b543ff896c Mon Sep 17 00:00:00 2001
+From 2f66f485b44da1c7bad27a92fe847a8b7888a831 Mon Sep 17 00:00:00 2001
 From: Gavin Hu
 Date: Fri, 8 Mar 2019 15:56:37 +0800
 Subject: [PATCH] spinlock: reimplement with atomic one-way barrier
 
+[ upstream commit 453d8f736676255696bce3c5e01b43b543ff896c ]
+
 The __sync builtin based implementation generates full memory barriers
 ('dmb ish') on Arm platforms. Using C11 atomic builtins to generate
 one way barriers.
@@ -71,7 +73,6 @@
 than __sync.
 
 Fixes: af75078fece3 ("first public release")
-Cc: stable@dpdk.org
 
 Signed-off-by: Gavin Hu
 Reviewed-by: Phil Yang
@@ -86,10 +87,10 @@
  1 file changed, 13 insertions(+), 5 deletions(-)
 
 diff --git a/lib/librte_eal/common/include/generic/rte_spinlock.h b/lib/librte_eal/common/include/generic/rte_spinlock.h
-index c4c3fc31ec..87ae7a4f18 100644
+index 54f83a4c56..f260136f28 100644
 --- a/lib/librte_eal/common/include/generic/rte_spinlock.h
 +++ b/lib/librte_eal/common/include/generic/rte_spinlock.h
-@@ -61,9 +61,14 @@ rte_spinlock_lock(rte_spinlock_t *sl);
+@@ -90,9 +90,14 @@ rte_spinlock_lock(rte_spinlock_t *sl);
  static inline void
  rte_spinlock_lock(rte_spinlock_t *sl)
  {
@@ -106,7 +107,7 @@
  }
  #endif
 
-@@ -80,7 +85,7 @@ rte_spinlock_unlock (rte_spinlock_t *sl);
+@@ -109,7 +114,7 @@ rte_spinlock_unlock (rte_spinlock_t *sl);
  static inline void
  rte_spinlock_unlock (rte_spinlock_t *sl)
  {
@@ -115,7 +116,7 @@
  }
  #endif
 
-@@ -99,7 +104,10 @@ rte_spinlock_trylock (rte_spinlock_t *sl);
+@@ -128,7 +133,10 @@ rte_spinlock_trylock (rte_spinlock_t *sl);
  static inline int
  rte_spinlock_trylock (rte_spinlock_t *sl)
  {
@@ -127,7 +128,7 @@
  }
  #endif
 
-@@ -113,7 +121,7 @@ rte_spinlock_trylock (rte_spinlock_t *sl)
+@@ -142,7 +150,7 @@ rte_spinlock_trylock (rte_spinlock_t *sl)
  */
  static inline int
  rte_spinlock_is_locked (rte_spinlock_t *sl)
 {