DPDK patches and discussions
* [dpdk-dev] [PATCH v1 0/3] MCS queued lock implementation
@ 2019-06-05 15:58 Phil Yang
  2019-06-05 15:58 ` [dpdk-dev] [PATCH v1 1/3] eal/mcslock: add mcs " Phil Yang
                   ` (6 more replies)
  0 siblings, 7 replies; 24+ messages in thread
From: Phil Yang @ 2019-06-05 15:58 UTC (permalink / raw)
  To: dev
  Cc: thomas, jerinj, hemant.agrawal, Honnappa.Nagarahalli, gavin.hu,
	phil.yang, nd

This patch set adds the MCS lock library and its unit test.

The MCS lock (proposed by JOHN M. MELLOR-CRUMMEY and MICHAEL L. SCOTT) provides
scalability by spinning on a CPU/thread-local variable, which avoids expensive
cache bouncing. It provides fairness by maintaining a list of acquirers and
passing the lock to each CPU/thread in the order they requested the lock.

References:
1. http://web.mit.edu/6.173/www/currentsemester/readings/R06-scalable-synchronization-1991.pdf
2. https://lwn.net/Articles/590243/
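
For illustration, here is a minimal usage sketch of the API added in
patch 1/3 (do_critical_work() is just a placeholder; the lock variable is
the queue tail and must start out NULL, and each thread passes its own
queue node):

    #include <rte_mcslock.h>

    static rte_mcslock_t *lock; /* MCS queue tail, NULL when unlocked */

    static void do_critical_work(void)
    {
        rte_mcslock_t me; /* this thread's queue node */

        rte_mcslock_lock(&lock, &me);
        /* ... critical section ... */
        rte_mcslock_unlock(&lock, &me);
    }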

Micro-benchmarking result:
------------------------------------------------------------------------------------------------
MCS lock                      | spinlock                       | ticket lock
------------------------------+--------------------------------+--------------------------------
Test with lock on 13 cores... |  Test with lock on 14 cores... |  Test with lock on 14 cores...
Core [15] Cost Time = 22426 us|  Core [14] Cost Time = 47974 us|  Core [14] cost time = 66761 us
Core [16] Cost Time = 22382 us|  Core [15] Cost Time = 46979 us|  Core [15] cost time = 66766 us
Core [17] Cost Time = 22294 us|  Core [16] Cost Time = 46044 us|  Core [16] cost time = 66761 us
Core [18] Cost Time = 22412 us|  Core [17] Cost Time = 28793 us|  Core [17] cost time = 66767 us
Core [19] Cost Time = 22407 us|  Core [18] Cost Time = 48349 us|  Core [18] cost time = 66758 us
Core [20] Cost Time = 22436 us|  Core [19] Cost Time = 19381 us|  Core [19] cost time = 66766 us
Core [21] Cost Time = 22414 us|  Core [20] Cost Time = 47914 us|  Core [20] cost time = 66763 us
Core [22] Cost Time = 22405 us|  Core [21] Cost Time = 48333 us|  Core [21] cost time = 66766 us
Core [23] Cost Time = 22435 us|  Core [22] Cost Time = 38900 us|  Core [22] cost time = 66749 us
Core [24] Cost Time = 22401 us|  Core [23] Cost Time = 45374 us|  Core [23] cost time = 66765 us
Core [25] Cost Time = 22408 us|  Core [24] Cost Time = 16121 us|  Core [24] cost time = 66762 us
Core [26] Cost Time = 22380 us|  Core [25] Cost Time = 42731 us|  Core [25] cost time = 66768 us
Core [27] Cost Time = 22395 us|  Core [26] Cost Time = 29439 us|  Core [26] cost time = 66768 us
                              |  Core [27] Cost Time = 38071 us|  Core [27] cost time = 66767 us
------------------------------+--------------------------------+--------------------------------
Total Cost Time = 291195 us   |  Total Cost Time = 544403 us   |  Total cost time = 934687 us
------------------------------------------------------------------------------------------------
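
The numbers above come from the perf test added in patch 3/3; assuming the
usual interactive prompt of the test application, they can be reproduced
with:

    RTE>> mcslock_autotest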


Phil Yang (3):
  eal/mcslock: add mcs queued lock implementation
  eal/mcslock: use generic mcs queued lock on all arch
  test/mcslock: add mcs queued lock unit test

 MAINTAINERS                                        |   5 +
 app/test/Makefile                                  |   1 +
 app/test/autotest_data.py                          |   6 +
 app/test/autotest_test_funcs.py                    |  32 +++
 app/test/meson.build                               |   2 +
 app/test/test_mcslock.c                            | 248 +++++++++++++++++++++
 doc/api/doxy-api-index.md                          |   1 +
 doc/guides/rel_notes/release_19_08.rst             |   6 +
 lib/librte_eal/common/Makefile                     |   2 +-
 .../common/include/arch/arm/rte_mcslock.h          |  23 ++
 .../common/include/arch/ppc_64/rte_mcslock.h       |  19 ++
 .../common/include/arch/x86/rte_mcslock.h          |  19 ++
 .../common/include/generic/rte_mcslock.h           | 169 ++++++++++++++
 lib/librte_eal/common/meson.build                  |   1 +
 14 files changed, 533 insertions(+), 1 deletion(-)
 create mode 100644 app/test/test_mcslock.c
 create mode 100644 lib/librte_eal/common/include/arch/arm/rte_mcslock.h
 create mode 100644 lib/librte_eal/common/include/arch/ppc_64/rte_mcslock.h
 create mode 100644 lib/librte_eal/common/include/arch/x86/rte_mcslock.h
 create mode 100644 lib/librte_eal/common/include/generic/rte_mcslock.h

-- 
2.7.4



* [dpdk-dev] [PATCH v1 1/3] eal/mcslock: add mcs queued lock implementation
  2019-06-05 15:58 [dpdk-dev] [PATCH v1 0/3] MCS queued lock implementation Phil Yang
@ 2019-06-05 15:58 ` Phil Yang
  2019-07-05  9:56   ` [dpdk-dev] [PATCH v2 0/3] MCS " Phil Yang
  2019-07-05 10:27   ` [dpdk-dev] [PATCH v3 0/3] MCS queued lock implementation Phil Yang
  2019-06-05 15:58 ` [dpdk-dev] [PATCH v1 2/3] eal/mcslock: use generic mcs queued lock on all arch Phil Yang
                   ` (5 subsequent siblings)
  6 siblings, 2 replies; 24+ messages in thread
From: Phil Yang @ 2019-06-05 15:58 UTC (permalink / raw)
  To: dev
  Cc: thomas, jerinj, hemant.agrawal, Honnappa.Nagarahalli, gavin.hu,
	phil.yang, nd

If there are multiple threads contending, they all attempt to take the
spinlock at the same time once it is released. This results in a huge
amount of processor bus traffic, which is a big performance killer. Thus,
if we somehow order the lock-takers so that they know who is next in line
for the resource, we can vastly reduce the amount of bus traffic.

This patch adds the MCS lock library. It provides scalability by spinning
on a CPU/thread-local variable, which avoids expensive cache bouncing.
It provides fairness by maintaining a list of acquirers and passing the
lock to each CPU/thread in the order they requested the lock.
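
For illustration, the queue transitions for two contending threads (T1,
then T2), where msl is the queue tail pointer (a sketch of the code below,
not additional API):

    initial:     *msl = NULL
    T1 lock():   *msl = &me1, me1.next = NULL      /* T1 owns the lock */
    T2 lock():   *msl = &me2 (exchange, prev = &me1),
                 me2.locked = 1, me1.next = &me2   /* T2 spins on me2.locked */
    T1 unlock(): me1.next->locked = 0              /* lock handed to T2 */
    T2 unlock(): CAS(*msl, &me2, NULL)             /* queue empty again */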

Signed-off-by: Phil Yang <phil.yang@arm.com>
Reviewed-by: Steve Capper <steve.capper@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>

---
 MAINTAINERS                                        |   4 +
 doc/api/doxy-api-index.md                          |   1 +
 doc/guides/rel_notes/release_19_08.rst             |   6 +
 lib/librte_eal/common/Makefile                     |   2 +-
 .../common/include/generic/rte_mcslock.h           | 169 +++++++++++++++++++++
 lib/librte_eal/common/meson.build                  |   1 +
 6 files changed, 182 insertions(+), 1 deletion(-)
 create mode 100644 lib/librte_eal/common/include/generic/rte_mcslock.h

diff --git a/MAINTAINERS b/MAINTAINERS
index 15d0829..1390238 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -222,6 +222,10 @@ M: Joyce Kong <joyce.kong@arm.com>
 F: lib/librte_eal/common/include/generic/rte_ticketlock.h
 F: app/test/test_ticketlock.c
 
+MCSlock - EXPERIMENTAL
+M: Phil Yang <phil.yang@arm.com>
+F: lib/librte_eal/common/include/generic/rte_mcslock.h
+
 ARM v7
 M: Jan Viktorin <viktorin@rehivetech.com>
 M: Gavin Hu <gavin.hu@arm.com>
diff --git a/doc/api/doxy-api-index.md b/doc/api/doxy-api-index.md
index 715248d..d0e32b1 100644
--- a/doc/api/doxy-api-index.md
+++ b/doc/api/doxy-api-index.md
@@ -63,6 +63,7 @@ The public API headers are grouped by topics:
 
 - **locks**:
   [atomic]             (@ref rte_atomic.h),
+  [mcslock]            (@ref rte_mcslock.h),
   [rwlock]             (@ref rte_rwlock.h),
   [spinlock]           (@ref rte_spinlock.h),
   [ticketlock]         (@ref rte_ticketlock.h),
diff --git a/doc/guides/rel_notes/release_19_08.rst b/doc/guides/rel_notes/release_19_08.rst
index a17e7de..ebd5105 100644
--- a/doc/guides/rel_notes/release_19_08.rst
+++ b/doc/guides/rel_notes/release_19_08.rst
@@ -54,6 +54,12 @@ New Features
      Also, make sure to start the actual text at the margin.
      =========================================================
 
+* **Added MCS lock library.**
+
+  Added the MCS lock library. It provides scalability by spinning on a
+  CPU/thread-local variable which avoids expensive cache bouncing.
+  It provides fairness by maintaining a list of acquirers and passing
+  the lock to each CPU/thread in the order they requested the lock.
 
 Removed Items
 -------------
diff --git a/lib/librte_eal/common/Makefile b/lib/librte_eal/common/Makefile
index 1647af7..a00d4fc 100644
--- a/lib/librte_eal/common/Makefile
+++ b/lib/librte_eal/common/Makefile
@@ -21,7 +21,7 @@ INC += rte_reciprocal.h rte_fbarray.h rte_uuid.h
 
 GENERIC_INC := rte_atomic.h rte_byteorder.h rte_cycles.h rte_prefetch.h
 GENERIC_INC += rte_memcpy.h rte_cpuflags.h
-GENERIC_INC += rte_spinlock.h rte_rwlock.h rte_ticketlock.h
+GENERIC_INC += rte_mcslock.h rte_spinlock.h rte_rwlock.h rte_ticketlock.h
 GENERIC_INC += rte_vect.h rte_pause.h rte_io.h
 
 # defined in mk/arch/$(RTE_ARCH)/rte.vars.mk
diff --git a/lib/librte_eal/common/include/generic/rte_mcslock.h b/lib/librte_eal/common/include/generic/rte_mcslock.h
new file mode 100644
index 0000000..20e9bb8
--- /dev/null
+++ b/lib/librte_eal/common/include/generic/rte_mcslock.h
@@ -0,0 +1,169 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2019 Arm Limited
+ */
+
+#ifndef _RTE_MCSLOCK_H_
+#define _RTE_MCSLOCK_H_
+
+/**
+ * @file
+ *
+ * RTE MCS lock
+ *
+ * This file defines the main data structure and APIs for MCS queued lock.
+ *
+ * The MCS lock (proposed by JOHN M. MELLOR-CRUMMEY and MICHAEL L. SCOTT)
+ * provides scalability by spinning on a CPU/thread-local variable which
+ * avoids expensive cache bouncing. It provides fairness by maintaining
+ * a list of acquirers and passing the lock to each CPU/thread in the order
+ * they requested the lock.
+ */
+
+#include <rte_lcore.h>
+#include <rte_common.h>
+#include <rte_pause.h>
+
+/**
+ * The rte_mcslock_t type.
+ */
+typedef struct rte_mcslock {
+	struct rte_mcslock *next;
	int locked; /* 1 if the queue is locked, 0 otherwise */
+} rte_mcslock_t;
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: This API may change without prior notice
+ *
+ * Take the MCS lock.
+ *
+ * @param msl
+ *   A pointer to the pointer of an MCS lock.
+ *   When the lock is initialized or declared, the msl pointer should be
+ *   set to NULL.
+ * @param me
+ *   A pointer to a new node of an MCS lock. Each CPU/thread acquiring the
+ *   lock should use its own node.
+ */
+static inline __rte_experimental void
+rte_mcslock_lock(rte_mcslock_t **msl, rte_mcslock_t *me)
+{
+	rte_mcslock_t *prev;
+
+	/* Init me node */
+	__atomic_store_n(&me->next, NULL, __ATOMIC_RELAXED);
+
+	/* If the queue is empty, the exchange operation is enough to acquire
+	 * the lock. Hence, the exchange operation requires acquire semantics.
+	 * The store to me->next above should complete before the node is
+	 * visible to other CPUs/threads. Hence, the exchange operation requires
+	 * release semantics as well.
+	 */
+	prev = __atomic_exchange_n(msl, me, __ATOMIC_ACQ_REL);
+	if (likely(prev == NULL)) {
+		/* Queue was empty, no further action required,
+		 * proceed with lock taken.
+		 */
+		return;
+	}
+	__atomic_store_n(&me->locked, 1, __ATOMIC_RELAXED);
+	__atomic_store_n(&prev->next, me, __ATOMIC_RELAXED);
+
+	/* The while-load of me->locked should not move above the previous
+	 * store to prev->next. Otherwise it will cause a deadlock. Need a
+	 * store-load barrier.
+	 */
+	__atomic_thread_fence(__ATOMIC_ACQ_REL);
+	/* The lock has already been acquired: the exchange operation
+	 * above placed this node at the end of the queue, so proceed
+	 * to spin on me->locked until the previous lock holder resets
+	 * it using rte_mcslock_unlock().
+	 */
+	while (__atomic_load_n(&me->locked, __ATOMIC_ACQUIRE))
+		rte_pause();
+}
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: This API may change without prior notice
+ *
+ * Release the MCS lock.
+ *
+ * @param msl
+ *   A pointer to the pointer of an MCS lock.
+ * @param me
+ *   A pointer to the node of the MCS lock passed in rte_mcslock_lock().
+ */
+static inline __rte_experimental void
+rte_mcslock_unlock(rte_mcslock_t **msl, rte_mcslock_t *me)
+{
+	/* Check if there are more nodes in the queue. */
+	if (likely(__atomic_load_n(&me->next, __ATOMIC_RELAXED) == NULL)) {
+		/* No, last member in the queue. */
+		rte_mcslock_t *save_me = __atomic_load_n(&me, __ATOMIC_RELAXED);
+
+		/* Release the lock by setting it to NULL */
+		if (likely(__atomic_compare_exchange_n(msl, &save_me, NULL, 0,
+				__ATOMIC_RELEASE, __ATOMIC_RELAXED)))
+			return;
+		/* More nodes added to the queue by other CPUs.
+		 * Wait until the next pointer is set.
+		 */
+		while (__atomic_load_n(&me->next, __ATOMIC_RELAXED) == NULL)
+			rte_pause();
+	}
+
+	/* Pass lock to next waiter. */
+	__atomic_store_n(&me->next->locked, 0, __ATOMIC_RELEASE);
+}
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: This API may change without prior notice
+ *
+ * Try to take the lock.
+ *
+ * @param msl
+ *   A pointer to the pointer of an MCS lock.
+ * @param me
+ *   A pointer to a new node of an MCS lock.
+ * @return
+ *   1 if the lock is successfully taken; 0 otherwise.
+ */
+static inline __rte_experimental int
+rte_mcslock_trylock(rte_mcslock_t **msl, rte_mcslock_t *me)
+{
+	/* Init me node */
+	__atomic_store_n(&me->next, NULL, __ATOMIC_RELAXED);
+
+	/* Try to lock */
+	rte_mcslock_t *expected = NULL;
+
+	/* The lock can be taken only when the queue is empty. Hence,
+	 * the compare-exchange operation requires acquire semantics.
+	 * The store to me->next above should complete before the node
+	 * is visible to other CPUs/threads. Hence, the compare-exchange
+	 * operation requires release semantics as well.
+	 */
+	return __atomic_compare_exchange_n(msl, &expected, me, 0,
+			__ATOMIC_ACQ_REL, __ATOMIC_RELAXED);
+}
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: This API may change without prior notice
+ *
+ * Test if the lock is taken.
+ *
+ * @param msl
+ *   A pointer to an MCS lock node.
+ * @return
+ *   1 if the lock is currently taken; 0 otherwise.
+ */
+static inline __rte_experimental int
+rte_mcslock_is_locked(rte_mcslock_t *msl)
+{
+	return (__atomic_load_n(&msl, __ATOMIC_RELAXED) != NULL);
+}
+
+#endif /* _RTE_MCSLOCK_H_ */
diff --git a/lib/librte_eal/common/meson.build b/lib/librte_eal/common/meson.build
index 0670e41..f74db29 100644
--- a/lib/librte_eal/common/meson.build
+++ b/lib/librte_eal/common/meson.build
@@ -94,6 +94,7 @@ generic_headers = files(
 	'include/generic/rte_cpuflags.h',
 	'include/generic/rte_cycles.h',
 	'include/generic/rte_io.h',
+	'include/generic/rte_mcslock.h',
 	'include/generic/rte_memcpy.h',
 	'include/generic/rte_pause.h',
 	'include/generic/rte_prefetch.h',
-- 
2.7.4



* [dpdk-dev] [PATCH v1 2/3] eal/mcslock: use generic mcs queued lock on all arch
  2019-06-05 15:58 [dpdk-dev] [PATCH v1 0/3] MCS queued lock implementation Phil Yang
  2019-06-05 15:58 ` [dpdk-dev] [PATCH v1 1/3] eal/mcslock: add mcs " Phil Yang
@ 2019-06-05 15:58 ` Phil Yang
  2019-06-05 15:58 ` [dpdk-dev] [PATCH v1 3/3] test/mcslock: add mcs queued lock unit test Phil Yang
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 24+ messages in thread
From: Phil Yang @ 2019-06-05 15:58 UTC (permalink / raw)
  To: dev
  Cc: thomas, jerinj, hemant.agrawal, Honnappa.Nagarahalli, gavin.hu,
	phil.yang, nd

Let all architectures use the generic MCS queued lock implementation.

Signed-off-by: Phil Yang <phil.yang@arm.com>
Reviewed-by: Gavin Hu <gavin.hu@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>

---
 .../common/include/arch/arm/rte_mcslock.h          | 23 ++++++++++++++++++++++
 .../common/include/arch/ppc_64/rte_mcslock.h       | 19 ++++++++++++++++++
 .../common/include/arch/x86/rte_mcslock.h          | 19 ++++++++++++++++++
 3 files changed, 61 insertions(+)
 create mode 100644 lib/librte_eal/common/include/arch/arm/rte_mcslock.h
 create mode 100644 lib/librte_eal/common/include/arch/ppc_64/rte_mcslock.h
 create mode 100644 lib/librte_eal/common/include/arch/x86/rte_mcslock.h

diff --git a/lib/librte_eal/common/include/arch/arm/rte_mcslock.h b/lib/librte_eal/common/include/arch/arm/rte_mcslock.h
new file mode 100644
index 0000000..5e41e32
--- /dev/null
+++ b/lib/librte_eal/common/include/arch/arm/rte_mcslock.h
@@ -0,0 +1,23 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2019 Arm Limited
+ */
+
+#ifndef _RTE_MCSLOCK_ARM_H_
+#define _RTE_MCSLOCK_ARM_H_
+
+#ifndef RTE_FORCE_INTRINSICS
+#  error Platform must be built with CONFIG_RTE_FORCE_INTRINSICS
+#endif
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#include "generic/rte_mcslock.h"
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* _RTE_MCSLOCK_ARM_H_ */
+
diff --git a/lib/librte_eal/common/include/arch/ppc_64/rte_mcslock.h b/lib/librte_eal/common/include/arch/ppc_64/rte_mcslock.h
new file mode 100644
index 0000000..951b6dd
--- /dev/null
+++ b/lib/librte_eal/common/include/arch/ppc_64/rte_mcslock.h
@@ -0,0 +1,19 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2019 Arm Limited
+ */
+
+#ifndef _RTE_MCSLOCK_PPC_64_H_
+#define _RTE_MCSLOCK_PPC_64_H_
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#include "generic/rte_mcslock.h"
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* _RTE_MCSLOCK_PPC_64_H_ */
+
diff --git a/lib/librte_eal/common/include/arch/x86/rte_mcslock.h b/lib/librte_eal/common/include/arch/x86/rte_mcslock.h
new file mode 100644
index 0000000..573b700
--- /dev/null
+++ b/lib/librte_eal/common/include/arch/x86/rte_mcslock.h
@@ -0,0 +1,19 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2019 Arm Limited
+ */
+
+#ifndef _RTE_MCSLOCK_X86_64_H_
+#define _RTE_MCSLOCK_X86_64_H_
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#include "generic/rte_mcslock.h"
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* _RTE_MCSLOCK_X86_64_H_ */
+
-- 
2.7.4



* [dpdk-dev] [PATCH v1 3/3] test/mcslock: add mcs queued lock unit test
  2019-06-05 15:58 [dpdk-dev] [PATCH v1 0/3] MCS queued lock implementation Phil Yang
  2019-06-05 15:58 ` [dpdk-dev] [PATCH v1 1/3] eal/mcslock: add mcs " Phil Yang
  2019-06-05 15:58 ` [dpdk-dev] [PATCH v1 2/3] eal/mcslock: use generic mcs queued lock on all arch Phil Yang
@ 2019-06-05 15:58 ` Phil Yang
  2019-06-06 13:42   ` Ananyev, Konstantin
  2019-06-05 16:29 ` [dpdk-dev] [PATCH v1 0/3] MCS queued lock implementation David Marchand
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 24+ messages in thread
From: Phil Yang @ 2019-06-05 15:58 UTC (permalink / raw)
  To: dev
  Cc: thomas, jerinj, hemant.agrawal, Honnappa.Nagarahalli, gavin.hu,
	phil.yang, nd

Unit test and perf test for MCS queued lock.

Signed-off-by: Phil Yang <phil.yang@arm.com>
Reviewed-by: Gavin Hu <gavin.hu@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>

---
 MAINTAINERS                     |   1 +
 app/test/Makefile               |   1 +
 app/test/autotest_data.py       |   6 +
 app/test/autotest_test_funcs.py |  32 ++++++
 app/test/meson.build            |   2 +
 app/test/test_mcslock.c         | 248 ++++++++++++++++++++++++++++++++++++++++
 6 files changed, 290 insertions(+)
 create mode 100644 app/test/test_mcslock.c

diff --git a/MAINTAINERS b/MAINTAINERS
index 1390238..33fdc8f 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -225,6 +225,7 @@ F: app/test/test_ticketlock.c
 MCSlock - EXPERIMENTAL
 M: Phil Yang <phil.yang@arm.com>
 F: lib/librte_eal/common/include/generic/rte_mcslock.h
+F: app/test/test_mcslock.c
 
 ARM v7
 M: Jan Viktorin <viktorin@rehivetech.com>
diff --git a/app/test/Makefile b/app/test/Makefile
index 68d6b4f..be405cd 100644
--- a/app/test/Makefile
+++ b/app/test/Makefile
@@ -64,6 +64,7 @@ SRCS-y += test_atomic.c
 SRCS-y += test_barrier.c
 SRCS-y += test_malloc.c
 SRCS-y += test_cycles.c
+SRCS-y += test_mcslock.c
 SRCS-y += test_spinlock.c
 SRCS-y += test_ticketlock.c
 SRCS-y += test_memory.c
diff --git a/app/test/autotest_data.py b/app/test/autotest_data.py
index 0f2c9a7..68ca23d 100644
--- a/app/test/autotest_data.py
+++ b/app/test/autotest_data.py
@@ -177,6 +177,12 @@
         "Report":  None,
     },
     {
+        "Name":    "MCSlock autotest",
+        "Command": "mcslock_autotest",
+        "Func":    mcslock_autotest,
+        "Report":  None,
+    },
+    {
         "Name":    "Byte order autotest",
         "Command": "byteorder_autotest",
         "Func":    default_autotest,
diff --git a/app/test/autotest_test_funcs.py b/app/test/autotest_test_funcs.py
index 31cc0f5..26688b7 100644
--- a/app/test/autotest_test_funcs.py
+++ b/app/test/autotest_test_funcs.py
@@ -164,6 +164,38 @@ def ticketlock_autotest(child, test_name):
 
     return 0, "Success"
 
+def mcslock_autotest(child, test_name):
+    i = 0
+    ir = 0
+    child.sendline(test_name)
+    while True:
+        index = child.expect(["Test OK",
+                              "Test Failed",
+                              "lcore ([0-9]*) state: ([0-1])",
+                              "MCS lock taken on core ([0-9]*)",
+                              "MCS lock released on core ([0-9]*)",
+                              pexpect.TIMEOUT], timeout=5)
+        # ok
+        if index == 0:
+            break
+
+        # "taken"/"released" messages, check ordering
+        elif index == 3:
+            if int(child.match.groups()[0]) < i:
+                return -1, "Fail [Bad order]"
+            i = int(child.match.groups()[0])
+        elif index == 4:
+            if int(child.match.groups()[0]) < ir:
+                return -1, "Fail [Bad order]"
+            ir = int(child.match.groups()[0])
+
+        # fail
+        elif index == 5:
+            return -1, "Fail [Timeout]"
+        elif index == 1:
+            return -1, "Fail"
+
+    return 0, "Success"
 
 def logs_autotest(child, test_name):
     child.sendline(test_name)
diff --git a/app/test/meson.build b/app/test/meson.build
index 83391ce..3f5f17a 100644
--- a/app/test/meson.build
+++ b/app/test/meson.build
@@ -75,6 +75,7 @@ test_sources = files('commands.c',
 	'test_memzone.c',
 	'test_meter.c',
 	'test_metrics.c',
+	'test_mcslock.c',
 	'test_mp_secondary.c',
 	'test_pdump.c',
 	'test_per_lcore.c',
@@ -167,6 +168,7 @@ fast_parallel_test_names = [
         'lpm6_autotest',
         'malloc_autotest',
         'mbuf_autotest',
+        'mcslock_autotest',
         'memcpy_autotest',
         'memory_autotest',
         'mempool_autotest',
diff --git a/app/test/test_mcslock.c b/app/test/test_mcslock.c
new file mode 100644
index 0000000..a2274e5
--- /dev/null
+++ b/app/test/test_mcslock.c
@@ -0,0 +1,248 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2019 Arm Limited
+ */
+
+#include <stdio.h>
+#include <stdint.h>
+#include <inttypes.h>
+#include <string.h>
+#include <unistd.h>
+#include <sys/queue.h>
+
+#include <rte_common.h>
+#include <rte_memory.h>
+#include <rte_per_lcore.h>
+#include <rte_launch.h>
+#include <rte_eal.h>
+#include <rte_lcore.h>
+#include <rte_cycles.h>
+#include <rte_mcslock.h>
+#include <rte_atomic.h>
+
+#include "test.h"
+
+/*
+ * RTE MCS lock test
+ * =================
+ *
+ * These tests are derived from spin lock test cases.
+ *
+ * - The functional test takes all of these locks and launches the
+ *   ''test_mcslock_per_core()'' function on each core (except the master).
+ *
+ *   - The function takes the global lock, displays something, then releases
+ *     the global lock on each core.
+ *
+ * - A load test is carried out, with all cores attempting to lock a single
+ *   lock multiple times.
+ */
+#include <rte_per_lcore.h>
+
+RTE_DEFINE_PER_LCORE(rte_mcslock_t, _ml_me);
+RTE_DEFINE_PER_LCORE(rte_mcslock_t, _ml_try_me);
+RTE_DEFINE_PER_LCORE(rte_mcslock_t, _ml_perf_me);
+
+rte_mcslock_t *p_ml;
+rte_mcslock_t *p_ml_try;
+rte_mcslock_t *p_ml_perf;
+
+static unsigned int count;
+
+static rte_atomic32_t synchro;
+
+static int
+test_mcslock_per_core(__attribute__((unused)) void *arg)
+{
+	/* Per core me node. */
+	rte_mcslock_t ml_me = RTE_PER_LCORE(_ml_me);
+
+	rte_mcslock_lock(&p_ml, &ml_me);
+	printf("MCS lock taken on core %u\n", rte_lcore_id());
+	rte_mcslock_unlock(&p_ml, &ml_me);
+	printf("MCS lock released on core %u\n", rte_lcore_id());
+
+	return 0;
+}
+
+static uint64_t time_count[RTE_MAX_LCORE] = {0};
+
+#define MAX_LOOP 10000
+
+static int
+load_loop_fn(void *func_param)
+{
+	uint64_t time_diff = 0, begin;
+	uint64_t hz = rte_get_timer_hz();
+	volatile uint64_t lcount = 0;
+	const int use_lock = *(int *)func_param;
+	const unsigned int lcore = rte_lcore_id();
+
+	/* Per-core me node. */
+	rte_mcslock_t ml_perf_me = RTE_PER_LCORE(_ml_perf_me);
+
+	/* wait synchro */
+	while (rte_atomic32_read(&synchro) == 0)
+		;
+
+	begin = rte_get_timer_cycles();
+	while (lcount < MAX_LOOP) {
+		if (use_lock)
+			rte_mcslock_lock(&p_ml_perf, &ml_perf_me);
+
+		lcount++;
+		if (use_lock)
+			rte_mcslock_unlock(&p_ml_perf, &ml_perf_me);
+	}
+	time_diff = rte_get_timer_cycles() - begin;
+	time_count[lcore] = time_diff * 1000000 / hz;
+	return 0;
+}
+
+static int
+test_mcslock_perf(void)
+{
+	unsigned int i;
+	uint64_t total = 0;
+	int lock = 0;
+	const unsigned int lcore = rte_lcore_id();
+
+	printf("\nTest with no lock on single core...\n");
+	rte_atomic32_set(&synchro, 1);
+	load_loop_fn(&lock);
+	printf("Core [%u] Cost Time = %"PRIu64" us\n",
+			lcore, time_count[lcore]);
+	memset(time_count, 0, sizeof(time_count));
+
+	printf("\nTest with lock on single core...\n");
+	lock = 1;
+	rte_atomic32_set(&synchro, 1);
+	load_loop_fn(&lock);
+	printf("Core [%u] Cost Time = %"PRIu64" us\n",
+			lcore, time_count[lcore]);
+	memset(time_count, 0, sizeof(time_count));
+
+	printf("\nTest with lock on %u cores...\n", (rte_lcore_count()-1));
+
+	rte_atomic32_set(&synchro, 0);
+	rte_eal_mp_remote_launch(load_loop_fn, &lock, SKIP_MASTER);
+	rte_atomic32_set(&synchro, 1);
+
+	rte_eal_mp_wait_lcore();
+
+	RTE_LCORE_FOREACH_SLAVE(i) {
+		printf("Core [%u] Cost Time = %"PRIu64" us\n",
+				i, time_count[i]);
+		total += time_count[i];
+	}
+
+	printf("Total Cost Time = %"PRIu64" us\n", total);
+
+	return 0;
+}
+
+/*
+ * Use rte_mcslock_trylock() to try to take an MCS lock object.
+ * If the object cannot be locked, the call returns immediately
+ * without blocking.
+ */
+static int
+test_mcslock_try(__attribute__((unused)) void *arg)
+{
+	/* Per-core me node. */
+	rte_mcslock_t ml_me     = RTE_PER_LCORE(_ml_me);
+	rte_mcslock_t ml_try_me = RTE_PER_LCORE(_ml_try_me);
+
+	/* The lock was taken on the master lcore, so trying to take
+	 * it on a slave lcore should fail.
+	 */
+	if (rte_mcslock_trylock(&p_ml_try, &ml_try_me) == 0) {
+		rte_mcslock_lock(&p_ml, &ml_me);
+		count++;
+		rte_mcslock_unlock(&p_ml, &ml_me);
+	}
+
+	return 0;
+}
+
+
+/*
+ * Test rte_eal_get_lcore_state() in addition to mcs locks
+ * as we have "waiting" then "running" lcores.
+ */
+static int
+test_mcslock(void)
+{
+	int ret = 0;
+	int i;
+
+	/* Define per core me node. */
+	rte_mcslock_t ml_me     = RTE_PER_LCORE(_ml_me);
+	rte_mcslock_t ml_try_me = RTE_PER_LCORE(_ml_try_me);
+
+	/*
+	 * Test mcs lock & unlock on each core
+	 */
+
+	/* slave cores should be waiting: print it */
+	RTE_LCORE_FOREACH_SLAVE(i) {
+		printf("lcore %d state: %d\n", i,
+				(int) rte_eal_get_lcore_state(i));
+	}
+
+	rte_mcslock_lock(&p_ml, &ml_me);
+
+	RTE_LCORE_FOREACH_SLAVE(i) {
+		rte_eal_remote_launch(test_mcslock_per_core, NULL, i);
+	}
+
+	/* slave cores should be busy: print it */
+	RTE_LCORE_FOREACH_SLAVE(i) {
+		printf("lcore %d state: %d\n", i,
+				(int) rte_eal_get_lcore_state(i));
+	}
+
+	rte_mcslock_unlock(&p_ml, &ml_me);
+
+	rte_eal_mp_wait_lcore();
+
+	/*
+	 * Test that try-locking a locked object returns immediately.
+	 * Lock the MCS lock object first, then launch all the slave
+	 * lcores to trylock the same MCS lock object.
+	 * All the slave lcores should give up try-locking the locked
+	 * object, return immediately, and then each increment the
+	 * "count" variable (initialized to zero) by one.
+	 * Checking that "count" finally equals the number of slave
+	 * lcores verifies that the behavior of try-locking a locked
+	 * MCS lock object is correct.
+	 */
+	if (rte_mcslock_trylock(&p_ml_try, &ml_try_me) == 0)
+		return -1;
+
+	count = 0;
+	RTE_LCORE_FOREACH_SLAVE(i) {
+		rte_eal_remote_launch(test_mcslock_try, NULL, i);
+	}
+	rte_mcslock_unlock(&p_ml_try, &ml_try_me);
+	rte_eal_mp_wait_lcore();
+
+	/* Test is_locked API */
+	if (rte_mcslock_is_locked(p_ml)) {
+		printf("mcslock is locked but it should not be\n");
+		return -1;
+	}
+
+	/* Verify that each slave lcore incremented "count" once */
+	rte_mcslock_lock(&p_ml, &ml_me);
+	if (count != (rte_lcore_count() - 1))
+		ret = -1;
+	rte_mcslock_unlock(&p_ml, &ml_me);
+
+	/* mcs lock perf test */
+	if (test_mcslock_perf() < 0)
+		return -1;
+
+	return ret;
+}
+
+REGISTER_TEST_COMMAND(mcslock_autotest, test_mcslock);
-- 
2.7.4



* Re: [dpdk-dev] [PATCH v1 0/3] MCS queued lock implementation
  2019-06-05 15:58 [dpdk-dev] [PATCH v1 0/3] MCS queued lock implementation Phil Yang
                   ` (2 preceding siblings ...)
  2019-06-05 15:58 ` [dpdk-dev] [PATCH v1 3/3] test/mcslock: add mcs queued lock unit test Phil Yang
@ 2019-06-05 16:29 ` David Marchand
  2019-06-05 19:59   ` Honnappa Nagarahalli
  2019-06-06 10:17   ` Phil Yang (Arm Technology China)
  2019-06-05 16:47 ` Stephen Hemminger
                   ` (2 subsequent siblings)
  6 siblings, 2 replies; 24+ messages in thread
From: David Marchand @ 2019-06-05 16:29 UTC (permalink / raw)
  To: Phil Yang
  Cc: dev, Thomas Monjalon, Jerin Jacob Kollanukkaran, hemant.agrawal,
	Honnappa Nagarahalli, Gavin Hu, nd

On Wed, Jun 5, 2019 at 6:00 PM Phil Yang <phil.yang@arm.com> wrote:

> This patch set adds the MCS lock library and its unit test.
>
> The MCS lock (proposed by JOHN M. MELLOR-CRUMMEY and MICHAEL L. SCOTT)
> provides scalability by spinning on a CPU/thread-local variable, which
> avoids expensive cache bouncing. It provides fairness by maintaining a
> list of acquirers and passing the lock to each CPU/thread in the order
> they requested the lock.
>
> References:
> 1. http://web.mit.edu/6.173/www/currentsemester/readings/R06-scalable-synchronization-1991.pdf
> 2. https://lwn.net/Articles/590243/
>
> Micro-benchmarking result:
> ------------------------------------------------------------------------------------------------
> MCS lock                      | spinlock                       | ticket lock
> ------------------------------+--------------------------------+--------------------------------
> Test with lock on 13 cores... |  Test with lock on 14 cores... |  Test with lock on 14 cores...
> Core [15] Cost Time = 22426 us|  Core [14] Cost Time = 47974 us|  Core [14] cost time = 66761 us
> Core [16] Cost Time = 22382 us|  Core [15] Cost Time = 46979 us|  Core [15] cost time = 66766 us
> Core [17] Cost Time = 22294 us|  Core [16] Cost Time = 46044 us|  Core [16] cost time = 66761 us
> Core [18] Cost Time = 22412 us|  Core [17] Cost Time = 28793 us|  Core [17] cost time = 66767 us
> Core [19] Cost Time = 22407 us|  Core [18] Cost Time = 48349 us|  Core [18] cost time = 66758 us
> Core [20] Cost Time = 22436 us|  Core [19] Cost Time = 19381 us|  Core [19] cost time = 66766 us
> Core [21] Cost Time = 22414 us|  Core [20] Cost Time = 47914 us|  Core [20] cost time = 66763 us
> Core [22] Cost Time = 22405 us|  Core [21] Cost Time = 48333 us|  Core [21] cost time = 66766 us
> Core [23] Cost Time = 22435 us|  Core [22] Cost Time = 38900 us|  Core [22] cost time = 66749 us
> Core [24] Cost Time = 22401 us|  Core [23] Cost Time = 45374 us|  Core [23] cost time = 66765 us
> Core [25] Cost Time = 22408 us|  Core [24] Cost Time = 16121 us|  Core [24] cost time = 66762 us
> Core [26] Cost Time = 22380 us|  Core [25] Cost Time = 42731 us|  Core [25] cost time = 66768 us
> Core [27] Cost Time = 22395 us|  Core [26] Cost Time = 29439 us|  Core [26] cost time = 66768 us
>                               |  Core [27] Cost Time = 38071 us|  Core [27] cost time = 66767 us
> ------------------------------+--------------------------------+--------------------------------
> Total Cost Time = 291195 us   |  Total Cost Time = 544403 us   |  Total cost time = 934687 us
> ------------------------------------------------------------------------------------------------
>

Had a quick look, interesting.

Quick comments:
- your numbers are for 13 cores, while the others are for 14, what is the
reason?
- do we need per-architecture headers? all I can see is generic code, we
might as well directly put rte_mcslock.h in the common/include directory.
- could we replace the current spinlock with this approach? is this more
expensive than spinlock on lightly contended locks? is there a reason we want
to keep all these approaches? we would now have 3 lock implementations.
- do we need to write the authors' names in full capitalized version? it
seems like you are shouting :-)


-- 
David Marchand


* Re: [dpdk-dev] [PATCH v1 0/3] MCS queued lock implementation
  2019-06-05 15:58 [dpdk-dev] [PATCH v1 0/3] MCS queued lock implementation Phil Yang
                   ` (3 preceding siblings ...)
  2019-06-05 16:29 ` [dpdk-dev] [PATCH v1 0/3] MCS queued lock implementation David Marchand
@ 2019-06-05 16:47 ` Stephen Hemminger
  2019-06-05 20:48   ` Honnappa Nagarahalli
  2019-06-05 17:35 ` Thomas Monjalon
  2019-07-04 20:12 ` Thomas Monjalon
  6 siblings, 1 reply; 24+ messages in thread
From: Stephen Hemminger @ 2019-06-05 16:47 UTC (permalink / raw)
  To: Phil Yang
  Cc: dev, thomas, jerinj, hemant.agrawal, Honnappa.Nagarahalli, gavin.hu, nd

On Wed,  5 Jun 2019 23:58:45 +0800
Phil Yang <phil.yang@arm.com> wrote:

> This patch set adds the MCS lock library and its unit test.
>
> The MCS lock (proposed by JOHN M. MELLOR-CRUMMEY and MICHAEL L. SCOTT) provides
> scalability by spinning on a CPU/thread-local variable, which avoids expensive
> cache bouncing. It provides fairness by maintaining a list of acquirers and
> passing the lock to each CPU/thread in the order they requested the lock.
>
> References:
> 1. http://web.mit.edu/6.173/www/currentsemester/readings/R06-scalable-synchronization-1991.pdf
> 2. https://lwn.net/Articles/590243/
>
> Micro-benchmarking result:
> ------------------------------------------------------------------------------------------------
> MCS lock                      | spinlock                       | ticket lock
> ------------------------------+--------------------------------+--------------------------------
> Test with lock on 13 cores... |  Test with lock on 14 cores... |  Test with lock on 14 cores...
> Core [15] Cost Time = 22426 us|  Core [14] Cost Time = 47974 us|  Core [14] cost time = 66761 us
> Core [16] Cost Time = 22382 us|  Core [15] Cost Time = 46979 us|  Core [15] cost time = 66766 us
> Core [17] Cost Time = 22294 us|  Core [16] Cost Time = 46044 us|  Core [16] cost time = 66761 us
> Core [18] Cost Time = 22412 us|  Core [17] Cost Time = 28793 us|  Core [17] cost time = 66767 us
> Core [19] Cost Time = 22407 us|  Core [18] Cost Time = 48349 us|  Core [18] cost time = 66758 us
> Core [20] Cost Time = 22436 us|  Core [19] Cost Time = 19381 us|  Core [19] cost time = 66766 us
> Core [21] Cost Time = 22414 us|  Core [20] Cost Time = 47914 us|  Core [20] cost time = 66763 us
> Core [22] Cost Time = 22405 us|  Core [21] Cost Time = 48333 us|  Core [21] cost time = 66766 us
> Core [23] Cost Time = 22435 us|  Core [22] Cost Time = 38900 us|  Core [22] cost time = 66749 us
> Core [24] Cost Time = 22401 us|  Core [23] Cost Time = 45374 us|  Core [23] cost time = 66765 us
> Core [25] Cost Time = 22408 us|  Core [24] Cost Time = 16121 us|  Core [24] cost time = 66762 us
> Core [26] Cost Time = 22380 us|  Core [25] Cost Time = 42731 us|  Core [25] cost time = 66768 us
> Core [27] Cost Time = 22395 us|  Core [26] Cost Time = 29439 us|  Core [26] cost time = 66768 us
>                               |  Core [27] Cost Time = 38071 us|  Core [27] cost time = 66767 us
> ------------------------------+--------------------------------+--------------------------------
> Total Cost Time = 291195 us   |  Total Cost Time = 544403 us   |  Total cost time = 934687 us
> ------------------------------------------------------------------------------------------------

From the user point of view there need to be clear recommendations
on which lock type to use.

And if one of the lock types is always slower, it should be deprecated long term.


* Re: [dpdk-dev] [PATCH v1 0/3] MCS queued lock implementation
  2019-06-05 15:58 [dpdk-dev] [PATCH v1 0/3] MCS queued lock implementation Phil Yang
                   ` (4 preceding siblings ...)
  2019-06-05 16:47 ` Stephen Hemminger
@ 2019-06-05 17:35 ` Thomas Monjalon
  2019-07-04 20:12 ` Thomas Monjalon
  6 siblings, 0 replies; 24+ messages in thread
From: Thomas Monjalon @ 2019-06-05 17:35 UTC (permalink / raw)
  To: Phil Yang
  Cc: dev, jerinj, hemant.agrawal, Honnappa.Nagarahalli, gavin.hu, nd,
	techboard

05/06/2019 17:58, Phil Yang:
> This patch set added MCS lock library and its unit test.

It seems this patch is targeting 19.08,
however it missed the proposal deadline by 2 days.

+Cc techboard for a decision.




* Re: [dpdk-dev] [PATCH v1 0/3] MCS queued lock implementation
  2019-06-05 16:29 ` [dpdk-dev] [PATCH v1 0/3] MCS queued lock implementation David Marchand
@ 2019-06-05 19:59   ` Honnappa Nagarahalli
  2019-06-06 10:17   ` Phil Yang (Arm Technology China)
  1 sibling, 0 replies; 24+ messages in thread
From: Honnappa Nagarahalli @ 2019-06-05 19:59 UTC (permalink / raw)
  To: David Marchand, Phil Yang (Arm Technology China)
  Cc: dev, thomas, jerinj, hemant.agrawal,
	Gavin Hu (Arm Technology China),
	nd, nd




On Wed, Jun 5, 2019 at 6:00 PM Phil Yang <phil.yang@arm.com> wrote:
This patch set adds the MCS lock library and its unit test.

The MCS lock (proposed by JOHN M. MELLOR-CRUMMEY and MICHAEL L. SCOTT) provides
scalability by spinning on a CPU/thread-local variable, which avoids expensive
cache bouncing. It provides fairness by maintaining a list of acquirers and
passing the lock to each CPU/thread in the order they requested the lock.

References:
1. http://web.mit.edu/6.173/www/currentsemester/readings/R06-scalable-synchronization-1991.pdf
2. https://lwn.net/Articles/590243/

Micro-benchmarking result:
------------------------------------------------------------------------------------------------
MCS lock                      | spinlock                       | ticket lock
------------------------------+--------------------------------+--------------------------------
Test with lock on 13 cores... |  Test with lock on 14 cores... |  Test with lock on 14 cores...
Core [15] Cost Time = 22426 us|  Core [14] Cost Time = 47974 us|  Core [14] cost time = 66761 us
Core [16] Cost Time = 22382 us|  Core [15] Cost Time = 46979 us|  Core [15] cost time = 66766 us
Core [17] Cost Time = 22294 us|  Core [16] Cost Time = 46044 us|  Core [16] cost time = 66761 us
Core [18] Cost Time = 22412 us|  Core [17] Cost Time = 28793 us|  Core [17] cost time = 66767 us
Core [19] Cost Time = 22407 us|  Core [18] Cost Time = 48349 us|  Core [18] cost time = 66758 us
Core [20] Cost Time = 22436 us|  Core [19] Cost Time = 19381 us|  Core [19] cost time = 66766 us
Core [21] Cost Time = 22414 us|  Core [20] Cost Time = 47914 us|  Core [20] cost time = 66763 us
Core [22] Cost Time = 22405 us|  Core [21] Cost Time = 48333 us|  Core [21] cost time = 66766 us
Core [23] Cost Time = 22435 us|  Core [22] Cost Time = 38900 us|  Core [22] cost time = 66749 us
Core [24] Cost Time = 22401 us|  Core [23] Cost Time = 45374 us|  Core [23] cost time = 66765 us
Core [25] Cost Time = 22408 us|  Core [24] Cost Time = 16121 us|  Core [24] cost time = 66762 us
Core [26] Cost Time = 22380 us|  Core [25] Cost Time = 42731 us|  Core [25] cost time = 66768 us
Core [27] Cost Time = 22395 us|  Core [26] Cost Time = 29439 us|  Core [26] cost time = 66768 us
                              |  Core [27] Cost Time = 38071 us|  Core [27] cost time = 66767 us
------------------------------+--------------------------------+--------------------------------
Total Cost Time = 291195 us   |  Total Cost Time = 544403 us   |  Total cost time = 934687 us
------------------------------------------------------------------------------------------------

Had a quick look, interesting.

Quick comments:
- your numbers are for 13 cores, while the others are for 14, what is the reason?
- do we need per-architecture headers? all I can see is generic code, we might as well directly put rte_mcslock.h in the common/include directory.
- could we replace the current spinlock with this approach? is this more expensive than spinlock on lightly contended locks? is there a reason we want to keep all these approaches? we would now have 3 lock implementations.
- do we need to write the authors' names in full capitalized version? it seems like you are shouting :-)
[Honnappa] IMO, writing full names is a cultural thing? I do not see any harm. But I do think we do not need to capitalize everything.


--
David Marchand
[Honnappa] This mail is appearing as HTML; it would be good to change to text mode, which makes it much easier to add inline comments.


* Re: [dpdk-dev] [PATCH v1 0/3] MCS queued lock implementation
  2019-06-05 16:47 ` Stephen Hemminger
@ 2019-06-05 20:48   ` Honnappa Nagarahalli
  0 siblings, 0 replies; 24+ messages in thread
From: Honnappa Nagarahalli @ 2019-06-05 20:48 UTC (permalink / raw)
  To: Stephen Hemminger, Phil Yang (Arm Technology China), David Marchand
  Cc: dev, thomas, jerinj, hemant.agrawal,
	Gavin Hu (Arm Technology China),
	Honnappa Nagarahalli, nd, nd

+David
(had similar questions)

> -----Original Message-----
> From: Stephen Hemminger <stephen@networkplumber.org>
> Sent: Wednesday, June 5, 2019 11:48 AM
> To: Phil Yang (Arm Technology China) <Phil.Yang@arm.com>
> Cc: dev@dpdk.org; thomas@monjalon.net; jerinj@marvell.com;
> hemant.agrawal@nxp.com; Honnappa Nagarahalli
> <Honnappa.Nagarahalli@arm.com>; Gavin Hu (Arm Technology China)
> <Gavin.Hu@arm.com>; nd <nd@arm.com>
> Subject: Re: [dpdk-dev] [PATCH v1 0/3] MCS queued lock implementation
> 
> On Wed,  5 Jun 2019 23:58:45 +0800
> Phil Yang <phil.yang@arm.com> wrote:
> 
> > This patch set adds the MCS lock library and its unit test.
> >
> > The MCS lock (proposed by JOHN M. MELLOR-CRUMMEY and MICHAEL L. SCOTT)
> > provides scalability by spinning on a CPU/thread-local variable, which
> > avoids expensive cache bouncing. It provides fairness by maintaining
> > a list of acquirers and passing the lock to each CPU/thread in the order
> > they requested the lock.
> >
> > References:
> > 1. http://web.mit.edu/6.173/www/currentsemester/readings/R06-scalable-synchronization-1991.pdf
> > 2. https://lwn.net/Articles/590243/
> >
> > Micro-benchmarking result:
> > ------------------------------------------------------------------------------------------------
> > MCS lock                      | spinlock                       | ticket lock
> > ------------------------------+--------------------------------+--------------------------------
> > Test with lock on 13 cores... |  Test with lock on 14 cores... |  Test with lock on 14 cores...
> > Core [15] Cost Time = 22426 us|  Core [14] Cost Time = 47974 us|  Core [14] cost time = 66761 us
> > Core [16] Cost Time = 22382 us|  Core [15] Cost Time = 46979 us|  Core [15] cost time = 66766 us
> > Core [17] Cost Time = 22294 us|  Core [16] Cost Time = 46044 us|  Core [16] cost time = 66761 us
> > Core [18] Cost Time = 22412 us|  Core [17] Cost Time = 28793 us|  Core [17] cost time = 66767 us
> > Core [19] Cost Time = 22407 us|  Core [18] Cost Time = 48349 us|  Core [18] cost time = 66758 us
> > Core [20] Cost Time = 22436 us|  Core [19] Cost Time = 19381 us|  Core [19] cost time = 66766 us
> > Core [21] Cost Time = 22414 us|  Core [20] Cost Time = 47914 us|  Core [20] cost time = 66763 us
> > Core [22] Cost Time = 22405 us|  Core [21] Cost Time = 48333 us|  Core [21] cost time = 66766 us
> > Core [23] Cost Time = 22435 us|  Core [22] Cost Time = 38900 us|  Core [22] cost time = 66749 us
> > Core [24] Cost Time = 22401 us|  Core [23] Cost Time = 45374 us|  Core [23] cost time = 66765 us
> > Core [25] Cost Time = 22408 us|  Core [24] Cost Time = 16121 us|  Core [24] cost time = 66762 us
> > Core [26] Cost Time = 22380 us|  Core [25] Cost Time = 42731 us|  Core [25] cost time = 66768 us
> > Core [27] Cost Time = 22395 us|  Core [26] Cost Time = 29439 us|  Core [26] cost time = 66768 us
> >                               |  Core [27] Cost Time = 38071 us|  Core [27] cost time = 66767 us
> > ------------------------------+--------------------------------+--------------------------------
> > Total Cost Time = 291195 us   |  Total Cost Time = 544403 us   |  Total cost time = 934687 us
> > ------------------------------------------------------------------------------------------------
> 
> From the user point of view there need to be clear recommendations on
> which lock type to use.
I think the data about fairness needs to be added to this - especially for ticket lock. IMO, we should consider fairness and space (cache lines) along with cycles.
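
A sketch of how that fairness data could be derived from the existing
time_count[] array in test_mcslock.c (report_fairness() is a hypothetical
helper, not part of this patch):

static void
report_fairness(void)
{
	unsigned int i;
	uint64_t min = UINT64_MAX, max = 0;

	/* time_count[] is filled by load_loop_fn(); a small max/min
	 * spread across the worker cores indicates a fair lock.
	 */
	RTE_LCORE_FOREACH_SLAVE(i) {
		if (time_count[i] < min)
			min = time_count[i];
		if (time_count[i] > max)
			max = time_count[i];
	}
	printf("Cost Time spread: min = %"PRIu64" us, max = %"PRIu64" us\n",
			min, max);
}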

> 
> And if one of the lock types is always slower, it should be deprecated long term.
The ticket lock can be a drop-in replacement for spinlock. Gavin is working on a patch which will make ticket lock the backend for spinlock through a configuration option. But the performance needs to be considered.


* Re: [dpdk-dev] [PATCH v1 0/3] MCS queued lock implementation
  2019-06-05 16:29 ` [dpdk-dev] [PATCH v1 0/3] MCS queued lock implementation David Marchand
  2019-06-05 19:59   ` Honnappa Nagarahalli
@ 2019-06-06 10:17   ` Phil Yang (Arm Technology China)
  1 sibling, 0 replies; 24+ messages in thread
From: Phil Yang (Arm Technology China) @ 2019-06-06 10:17 UTC (permalink / raw)
  To: David Marchand
  Cc: dev, thomas, jerinj, hemant.agrawal, Honnappa Nagarahalli,
	Gavin Hu (Arm Technology China),
	nd, nd


From: David Marchand <david.marchand@redhat.com>
Sent: Thursday, June 6, 2019 12:30 AM
To: Phil Yang (Arm Technology China) <Phil.Yang@arm.com>
Cc: dev <dev@dpdk.org>; thomas@monjalon.net; jerinj@marvell.com; hemant.agrawal@nxp.com; Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>; Gavin Hu (Arm Technology China) <Gavin.Hu@arm.com>; nd <nd@arm.com>
Subject: Re: [dpdk-dev] [PATCH v1 0/3] MCS queued lock implementation



On Wed, Jun 5, 2019 at 6:00 PM Phil Yang <phil.yang@arm.com> wrote:
This patch set adds the MCS lock library and its unit test.

The MCS lock (proposed by JOHN M. MELLOR-CRUMMEY and MICHAEL L. SCOTT) provides
scalability by spinning on a CPU/thread-local variable, which avoids expensive
cache bouncing. It provides fairness by maintaining a list of acquirers and
passing the lock to each CPU/thread in the order they requested the lock.

References:
1. http://web.mit.edu/6.173/www/currentsemester/readings/R06-scalable-synchronization-1991.pdf
2. https://lwn.net/Articles/590243/

Micro-benchmarking result:
------------------------------------------------------------------------------------------------
MCS lock                      | spinlock                       | ticket lock
------------------------------+--------------------------------+--------------------------------
Test with lock on 13 cores... |  Test with lock on 14 cores... |  Test with lock on 14 cores...
Core [15] Cost Time = 22426 us|  Core [14] Cost Time = 47974 us|  Core [14] cost time = 66761 us
Core [16] Cost Time = 22382 us|  Core [15] Cost Time = 46979 us|  Core [15] cost time = 66766 us
Core [17] Cost Time = 22294 us|  Core [16] Cost Time = 46044 us|  Core [16] cost time = 66761 us
Core [18] Cost Time = 22412 us|  Core [17] Cost Time = 28793 us|  Core [17] cost time = 66767 us
Core [19] Cost Time = 22407 us|  Core [18] Cost Time = 48349 us|  Core [18] cost time = 66758 us
Core [20] Cost Time = 22436 us|  Core [19] Cost Time = 19381 us|  Core [19] cost time = 66766 us
Core [21] Cost Time = 22414 us|  Core [20] Cost Time = 47914 us|  Core [20] cost time = 66763 us
Core [22] Cost Time = 22405 us|  Core [21] Cost Time = 48333 us|  Core [21] cost time = 66766 us
Core [23] Cost Time = 22435 us|  Core [22] Cost Time = 38900 us|  Core [22] cost time = 66749 us
Core [24] Cost Time = 22401 us|  Core [23] Cost Time = 45374 us|  Core [23] cost time = 66765 us
Core [25] Cost Time = 22408 us|  Core [24] Cost Time = 16121 us|  Core [24] cost time = 66762 us
Core [26] Cost Time = 22380 us|  Core [25] Cost Time = 42731 us|  Core [25] cost time = 66768 us
Core [27] Cost Time = 22395 us|  Core [26] Cost Time = 29439 us|  Core [26] cost time = 66768 us
                              |  Core [27] Cost Time = 38071 us|  Core [27] cost time = 66767 us
------------------------------+--------------------------------+--------------------------------
Total Cost Time = 291195 us   |  Total Cost Time = 544403 us   |  Total cost time = 934687 us
------------------------------------------------------------------------------------------------

Had a quick look, interesting.

Hi David,

Thanks for your comments.

Quick comments:
- your numbers are for 13 cores, while the others are for 14, what is the reason?
[Phil] The test case skips the master thread while doing the load test. The master thread just controls the trigger, so all the other threads acquire the lock and run the same workload at the same time.
Actually, there is no difference in per-core performance when the master thread is involved in the load test.
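
For reference, the relevant calls from patch 3/3 (the master lcore only
opens the start gate and then waits; it does not run the loop itself):

	rte_eal_mp_remote_launch(load_loop_fn, &lock, SKIP_MASTER);
	rte_atomic32_set(&synchro, 1);
	rte_eal_mp_wait_lcore();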

- do we need per-architecture headers? all I can see is generic code, we might as well directly put rte_mcslock.h in the common/include directory.
[Phil] I am just trying to leave room for architecture-specific optimizations.

- could we replace the current spinlock with this approach? is this more expensive than spinlock on lightly contended locks? is there a reason we want to keep all these approaches? we would now have 3 lock implementations.
[Phil] Under high lock contention, MCS is much better than spinlock. However, the MCS lock is more complicated than spinlock and more expensive in the single-thread scenario. E.g.:
Test with lock on single core...
MCS lock:
Core [14] Cost Time = 327 us

Spinlock:
Core [14] Cost Time = 258 us

ticket lock:
Core [14] cost time = 195 us
In low-contention scenarios where you still need mutual exclusion, you can use spinlock; it is lighter. It all depends on the application.

- do we need to write the authors' names in full capitalized version? it seems like you are shouting :-)
[Phil] :-)  I will modify it in the next version. Thanks.


--
David Marchand


Thanks,
Phil Yang


* Re: [dpdk-dev] [PATCH v1 3/3] test/mcslock: add mcs queued lock unit test
  2019-06-05 15:58 ` [dpdk-dev] [PATCH v1 3/3] test/mcslock: add mcs queued lock unit test Phil Yang
@ 2019-06-06 13:42   ` Ananyev, Konstantin
  2019-06-07  5:27     ` Honnappa Nagarahalli
  0 siblings, 1 reply; 24+ messages in thread
From: Ananyev, Konstantin @ 2019-06-06 13:42 UTC (permalink / raw)
  To: Phil Yang, dev
  Cc: thomas, jerinj, hemant.agrawal, Honnappa.Nagarahalli, gavin.hu, nd

Hi,

> 
> Unit test and perf test for MCS queued lock.

Perf test is important of course, but first I think we need a
more robust functional test to make sure that the lock does work properly.
At least something like the ticketlock test, but probably with a bigger
number of iterations; 10K seems quite small here.
In fact, with this one we'll have 3 lock methods, so I think it makes sense
to have one unified UT framework for them, where only the actual lock/unlock differ.
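
As a rough sketch of what such a unified framework could look like (the
names here are illustrative only, not an actual proposal; the per-lock
adapters would be the only lock-specific code):

#include <stdint.h>
#include <rte_mcslock.h>

struct lock_ops {
	void (*lock)(void *ctx);
	void (*unlock)(void *ctx);
};

struct mcs_ctx {
	rte_mcslock_t **msl; /* address of the queue tail pointer */
	rte_mcslock_t *me;   /* per-thread queue node */
};

static void
mcs_lock_adapter(void *ctx)
{
	struct mcs_ctx *c = ctx;
	rte_mcslock_lock(c->msl, c->me);
}

static void
mcs_unlock_adapter(void *ctx)
{
	struct mcs_ctx *c = ctx;
	rte_mcslock_unlock(c->msl, c->me);
}

/* Shared functional test body, reused for spinlock, ticketlock and
 * mcslock; only the ops/ctx pair differs, and the iteration count can
 * be much larger than the current 10K.
 */
static int
lock_functional_test(const struct lock_ops *ops, void *ctx,
		uint64_t iterations)
{
	uint64_t i;

	for (i = 0; i < iterations; i++) {
		ops->lock(ctx);
		/* shared critical-section invariant checks go here */
		ops->unlock(ctx);
	}
	return 0;
}
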
Konstantin

> 
> Signed-off-by: Phil Yang <phil.yang@arm.com>
> Reviewed-by: Gavin Hu <gavin.hu@arm.com>
> Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
> 
> ---
>  MAINTAINERS                     |   1 +
>  app/test/Makefile               |   1 +
>  app/test/autotest_data.py       |   6 +
>  app/test/autotest_test_funcs.py |  32 ++++++
>  app/test/meson.build            |   2 +
>  app/test/test_mcslock.c         | 248 ++++++++++++++++++++++++++++++++++++++++
>  6 files changed, 290 insertions(+)
>  create mode 100644 app/test/test_mcslock.c
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 1390238..33fdc8f 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -225,6 +225,7 @@ F: app/test/test_ticketlock.c
>  MCSlock - EXPERIMENTAL
>  M: Phil Yang <phil.yang@arm.com>
>  F: lib/librte_eal/common/include/generic/rte_mcslock.h
> +F: app/test/test_mcslock.c
> 
>  ARM v7
>  M: Jan Viktorin <viktorin@rehivetech.com>
> diff --git a/app/test/Makefile b/app/test/Makefile
> index 68d6b4f..be405cd 100644
> --- a/app/test/Makefile
> +++ b/app/test/Makefile
> @@ -64,6 +64,7 @@ SRCS-y += test_atomic.c
>  SRCS-y += test_barrier.c
>  SRCS-y += test_malloc.c
>  SRCS-y += test_cycles.c
> +SRCS-y += test_mcslock.c
>  SRCS-y += test_spinlock.c
>  SRCS-y += test_ticketlock.c
>  SRCS-y += test_memory.c
> diff --git a/app/test/autotest_data.py b/app/test/autotest_data.py
> index 0f2c9a7..68ca23d 100644
> --- a/app/test/autotest_data.py
> +++ b/app/test/autotest_data.py
> @@ -177,6 +177,12 @@
>          "Report":  None,
>      },
>      {
> +        "Name":    "MCSlock autotest",
> +        "Command": "mcslock_autotest",
> +        "Func":    mcslock_autotest,
> +        "Report":  None,
> +    },
> +    {
>          "Name":    "Byte order autotest",
>          "Command": "byteorder_autotest",
>          "Func":    default_autotest,
> diff --git a/app/test/autotest_test_funcs.py b/app/test/autotest_test_funcs.py
> index 31cc0f5..26688b7 100644
> --- a/app/test/autotest_test_funcs.py
> +++ b/app/test/autotest_test_funcs.py
> @@ -164,6 +164,38 @@ def ticketlock_autotest(child, test_name):
> 
>      return 0, "Success"
> 
> +def mcslock_autotest(child, test_name):
> +    i = 0
> +    ir = 0
> +    child.sendline(test_name)
> +    while True:
> +        index = child.expect(["Test OK",
> +                              "Test Failed",
> +                              "lcore ([0-9]*) state: ([0-1])",
> +                              "MCS lock taken on core ([0-9]*)",
> +                              "MCS lock released on core ([0-9]*)",
> +                              pexpect.TIMEOUT], timeout=5)
> +        # ok
> +        if index == 0:
> +            break
> +
> +        # "taken"/"released" messages, check ordering
> +        elif index == 3:
> +            if int(child.match.groups()[0]) < i:
> +                return -1, "Fail [Bad order]"
> +            i = int(child.match.groups()[0])
> +        elif index == 4:
> +            if int(child.match.groups()[0]) < ir:
> +                return -1, "Fail [Bad order]"
> +            ir = int(child.match.groups()[0])
> +
> +        # fail
> +        elif index == 5:
> +            return -1, "Fail [Timeout]"
> +        elif index == 1:
> +            return -1, "Fail"
> +
> +    return 0, "Success"
> 
>  def logs_autotest(child, test_name):
>      child.sendline(test_name)
> diff --git a/app/test/meson.build b/app/test/meson.build
> index 83391ce..3f5f17a 100644
> --- a/app/test/meson.build
> +++ b/app/test/meson.build
> @@ -75,6 +75,7 @@ test_sources = files('commands.c',
>  	'test_memzone.c',
>  	'test_meter.c',
>  	'test_metrics.c',
> +	'test_mcslock.c',
>  	'test_mp_secondary.c',
>  	'test_pdump.c',
>  	'test_per_lcore.c',
> @@ -167,6 +168,7 @@ fast_parallel_test_names = [
>          'lpm6_autotest',
>          'malloc_autotest',
>          'mbuf_autotest',
> +        'mcslock_autotest',
>          'memcpy_autotest',
>          'memory_autotest',
>          'mempool_autotest',
> diff --git a/app/test/test_mcslock.c b/app/test/test_mcslock.c
> new file mode 100644
> index 0000000..a2274e5
> --- /dev/null
> +++ b/app/test/test_mcslock.c
> @@ -0,0 +1,248 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + * Copyright(c) 2019 Arm Limited
> + */
> +
> +#include <stdio.h>
> +#include <stdint.h>
> +#include <inttypes.h>
> +#include <string.h>
> +#include <unistd.h>
> +#include <sys/queue.h>
> +
> +#include <rte_common.h>
> +#include <rte_memory.h>
> +#include <rte_per_lcore.h>
> +#include <rte_launch.h>
> +#include <rte_eal.h>
> +#include <rte_lcore.h>
> +#include <rte_cycles.h>
> +#include <rte_mcslock.h>
> +#include <rte_atomic.h>
> +
> +#include "test.h"
> +
> +/*
> + * RTE MCS lock test
> + * =================
> + *
> + * These tests are derived from spin lock test cases.
> + *
> + * - The functional test takes all of these locks and launches the
> + *   ''test_mcslock_per_core()'' function on each core (except the master).
> + *
> + *   - The function takes the global lock, displays something, then releases
> + *     the global lock on each core.
> + *
> + * - A load test is carried out, with all cores attempting to lock a single
> + *   lock multiple times.
> + */
> +#include <rte_per_lcore.h>
> +
> +RTE_DEFINE_PER_LCORE(rte_mcslock_t, _ml_me);
> +RTE_DEFINE_PER_LCORE(rte_mcslock_t, _ml_try_me);
> +RTE_DEFINE_PER_LCORE(rte_mcslock_t, _ml_perf_me);
> +
> +rte_mcslock_t *p_ml;
> +rte_mcslock_t *p_ml_try;
> +rte_mcslock_t *p_ml_perf;
> +
> +static unsigned int count;
> +
> +static rte_atomic32_t synchro;
> +
> +static int
> +test_mcslock_per_core(__attribute__((unused)) void *arg)
> +{
> +	/* Per core me node. */
> +	rte_mcslock_t ml_me = RTE_PER_LCORE(_ml_me);
> +
> +	rte_mcslock_lock(&p_ml, &ml_me);
> +	printf("MCS lock taken on core %u\n", rte_lcore_id());
> +	rte_mcslock_unlock(&p_ml, &ml_me);
> +	printf("MCS lock released on core %u\n", rte_lcore_id());
> +
> +	return 0;
> +}
> +
> +static uint64_t time_count[RTE_MAX_LCORE] = {0};
> +
> +#define MAX_LOOP 10000
> +
> +static int
> +load_loop_fn(void *func_param)
> +{
> +	uint64_t time_diff = 0, begin;
> +	uint64_t hz = rte_get_timer_hz();
> +	volatile uint64_t lcount = 0;
> +	const int use_lock = *(int *)func_param;
> +	const unsigned int lcore = rte_lcore_id();
> +
> +	/* Per core me node. */
> +	rte_mcslock_t ml_perf_me = RTE_PER_LCORE(_ml_perf_me);
> +
> +	/* wait synchro */
> +	while (rte_atomic32_read(&synchro) == 0)
> +		;
> +
> +	begin = rte_get_timer_cycles();
> +	while (lcount < MAX_LOOP) {
> +		if (use_lock)
> +			rte_mcslock_lock(&p_ml_perf, &ml_perf_me);
> +
> +		lcount++;
> +		if (use_lock)
> +			rte_mcslock_unlock(&p_ml_perf, &ml_perf_me);
> +	}
> +	time_diff = rte_get_timer_cycles() - begin;
> +	time_count[lcore] = time_diff * 1000000 / hz;
> +	return 0;
> +}
> +
> +static int
> +test_mcslock_perf(void)
> +{
> +	unsigned int i;
> +	uint64_t total = 0;
> +	int lock = 0;
> +	const unsigned int lcore = rte_lcore_id();
> +
> +	printf("\nTest with no lock on single core...\n");
> +	rte_atomic32_set(&synchro, 1);
> +	load_loop_fn(&lock);
> +	printf("Core [%u] Cost Time = %"PRIu64" us\n",
> +			lcore, time_count[lcore]);
> +	memset(time_count, 0, sizeof(time_count));
> +
> +	printf("\nTest with lock on single core...\n");
> +	lock = 1;
> +	rte_atomic32_set(&synchro, 1);
> +	load_loop_fn(&lock);
> +	printf("Core [%u] Cost Time = %"PRIu64" us\n",
> +			lcore, time_count[lcore]);
> +	memset(time_count, 0, sizeof(time_count));
> +
> +	printf("\nTest with lock on %u cores...\n", (rte_lcore_count()-1));
> +
> +	rte_atomic32_set(&synchro, 0);
> +	rte_eal_mp_remote_launch(load_loop_fn, &lock, SKIP_MASTER);
> +	rte_atomic32_set(&synchro, 1);
> +
> +	rte_eal_mp_wait_lcore();
> +
> +	RTE_LCORE_FOREACH_SLAVE(i) {
> +		printf("Core [%u] Cost Time = %"PRIu64" us\n",
> +				i, time_count[i]);
> +		total += time_count[i];
> +	}
> +
> +	printf("Total Cost Time = %"PRIu64" us\n", total);
> +
> +	return 0;
> +}
> +
> +/*
> + * Use rte_mcslock_trylock() to trylock a mcs lock object,
> + * If it could not lock the object successfully, it would
> + * return immediately.
> + */
> +static int
> +test_mcslock_try(__attribute__((unused)) void *arg)
> +{
> +	/* Per core me node. */
> +	rte_mcslock_t ml_me     = RTE_PER_LCORE(_ml_me);
> +	rte_mcslock_t ml_try_me = RTE_PER_LCORE(_ml_try_me);
> +
> +	/* The master lcore has locked ml_try, so trylock should
> +	 * fail on the slave lcores.
> +	 */
> +	if (rte_mcslock_trylock(&p_ml_try, &ml_try_me) == 0) {
> +		rte_mcslock_lock(&p_ml, &ml_me);
> +		count++;
> +		rte_mcslock_unlock(&p_ml, &ml_me);
> +	}
> +
> +	return 0;
> +}
> +
> +
> +/*
> + * Test rte_eal_get_lcore_state() in addition to mcs locks
> + * as we have "waiting" then "running" lcores.
> + */
> +static int
> +test_mcslock(void)
> +{
> +	int ret = 0;
> +	int i;
> +
> +	/* Define per core me node. */
> +	rte_mcslock_t ml_me     = RTE_PER_LCORE(_ml_me);
> +	rte_mcslock_t ml_try_me = RTE_PER_LCORE(_ml_try_me);
> +
> +	/*
> +	 * Test mcs lock & unlock on each core
> +	 */
> +
> +	/* slave cores should be waiting: print it */
> +	RTE_LCORE_FOREACH_SLAVE(i) {
> +		printf("lcore %d state: %d\n", i,
> +				(int) rte_eal_get_lcore_state(i));
> +	}
> +
> +	rte_mcslock_lock(&p_ml, &ml_me);
> +
> +	RTE_LCORE_FOREACH_SLAVE(i) {
> +		rte_eal_remote_launch(test_mcslock_per_core, NULL, i);
> +	}
> +
> +	/* slave cores should be busy: print it */
> +	RTE_LCORE_FOREACH_SLAVE(i) {
> +		printf("lcore %d state: %d\n", i,
> +				(int) rte_eal_get_lcore_state(i));
> +	}
> +
> +	rte_mcslock_unlock(&p_ml, &ml_me);
> +
> +	rte_eal_mp_wait_lcore();
> +
> +	/*
> +	 * Test that try-locking a locked object returns immediately.
> +	 * Lock the mcs lock object first, then launch all the
> +	 * slave lcores to trylock the same mcs lock object.
> +	 * Each slave lcore should fail to take the locked object,
> +	 * return immediately, and increment the "count" variable
> +	 * (initialized to zero) by one.
> +	 * If "count" finally equals the number of slave lcores,
> +	 * the behavior of try-locking a locked mcslock object
> +	 * is correct.
> +	 */
> +	if (rte_mcslock_trylock(&p_ml_try, &ml_try_me) == 0)
> +		return -1;
> +
> +	count = 0;
> +	RTE_LCORE_FOREACH_SLAVE(i) {
> +		rte_eal_remote_launch(test_mcslock_try, NULL, i);
> +	}
> +	rte_mcslock_unlock(&p_ml_try, &ml_try_me);
> +	rte_eal_mp_wait_lcore();
> +
> +	/* Test is_locked API */
> +	if (rte_mcslock_is_locked(p_ml)) {
> +		printf("mcslock is locked but it should not be\n");
> +		return -1;
> +	}
> +
> +	/* Counting the locked times in each core */
> +	rte_mcslock_lock(&p_ml, &ml_me);
> +	if (count != (rte_lcore_count() - 1))
> +		ret = -1;
> +	rte_mcslock_unlock(&p_ml, &ml_me);
> +
> +	/* mcs lock perf test */
> +	if (test_mcslock_perf() < 0)
> +		return -1;
> +
> +	return ret;
> +}
> +
> +REGISTER_TEST_COMMAND(mcslock_autotest, test_mcslock);
> --
> 2.7.4


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [dpdk-dev] [PATCH v1 3/3] test/mcslock: add mcs queued lock unit test
  2019-06-06 13:42   ` Ananyev, Konstantin
@ 2019-06-07  5:27     ` Honnappa Nagarahalli
  2019-06-10 16:36       ` Phil Yang (Arm Technology China)
  0 siblings, 1 reply; 24+ messages in thread
From: Honnappa Nagarahalli @ 2019-06-07  5:27 UTC (permalink / raw)
  To: Ananyev, Konstantin, Phil Yang (Arm Technology China), dev
  Cc: thomas, jerinj, hemant.agrawal, Gavin Hu (Arm Technology China), nd, nd

> Hi,
> 
> >
> > Unit test and perf test for MCS queued lock.
> 
> Perf test is important of course, but first I think we need a more robust
> functional test to make sure that the lock does work properly.
> At least something like the ticketlock test, but probably with a bigger
> number of iterations.
> 10K seems quite small here.
> In fact, with this one we'll have 3 lock methods; I think it makes sense to
> have one unified UT framework for them, so that only the actual lock/unlock
> would be different.
+1

> Konstantin
> 
> >

<snip>

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [dpdk-dev] [PATCH v1 3/3] test/mcslock: add mcs queued lock unit test
  2019-06-07  5:27     ` Honnappa Nagarahalli
@ 2019-06-10 16:36       ` Phil Yang (Arm Technology China)
  0 siblings, 0 replies; 24+ messages in thread
From: Phil Yang (Arm Technology China) @ 2019-06-10 16:36 UTC (permalink / raw)
  To: Honnappa Nagarahalli, Ananyev, Konstantin, dev
  Cc: thomas, jerinj, hemant.agrawal, Gavin Hu (Arm Technology China),
	nd, nd, nd

Hi,

> -----Original Message-----
> From: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>
> Sent: Friday, June 7, 2019 1:27 PM
> To: Ananyev, Konstantin <konstantin.ananyev@intel.com>; Phil Yang (Arm
> Technology China) <Phil.Yang@arm.com>; dev@dpdk.org
> Cc: thomas@monjalon.net; jerinj@marvell.com; hemant.agrawal@nxp.com;
> Gavin Hu (Arm Technology China) <Gavin.Hu@arm.com>; nd
> <nd@arm.com>; nd <nd@arm.com>
> Subject: RE: [dpdk-dev] [PATCH v1 3/3] test/mcslock: add mcs queued lock
> unit test
> 
> > Hi,
> >
> > >
> > > Unit test and perf test for MCS queued lock.
> >
> > Perf test is important of course, but first I think we need a more
> > robust functional test to make sure that the lock does work properly.
> > At least something like the ticketlock test, but probably with a
> > bigger number of iterations.
> > 10K seems quite small here.
Yes, I agree. I think we should also consider the total running time of the test. As more cores are involved in the lock contention, more time is required to complete the test.

> > In fact, with this one we'll have 3 lock methods; I think it makes
> > sense to have one unified UT framework for them, so that only the
> > actual lock/unlock would be different.
+1.
Since the MCS lock APIs are different from those of spinlock and ticket lock, I am wondering what this unified UT framework would look like.
Will it be a single test case that integrates the 3 sets of lock tests, run with different command lines? A rough sketch of one possible shape is below.
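
For discussion only -- one hypothetical shape for such a framework, where a
single test loop dispatches through a table of callbacks so that only the
lock/unlock hooks differ per lock type. All names here (lock_ops,
generic_lock_loop) are invented for this sketch and are not from any posted
patch:

#include <stdint.h>

struct lock_ops {
	void (*take)(void *ctx); /* acquire the lock under test */
	void (*give)(void *ctx); /* release the lock under test */
};

static void
generic_lock_loop(const struct lock_ops *ops, void *ctx, uint64_t iters)
{
	uint64_t i;

	for (i = 0; i < iters; i++) {
		ops->take(ctx);
		/* shared-state update under the lock would go here */
		ops->give(ctx);
	}
}

For the MCS lock, ctx would have to bundle the lock pointer and the
per-thread queue node, since its API takes both -- which is exactly the
API difference raised above.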

> +1
> 
> > Konstantin
> >
> > >
> 
> <snip>

Thanks,
Phil Yang

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [dpdk-dev] [PATCH v1 0/3] MCS queued lock implementation
  2019-06-05 15:58 [dpdk-dev] [PATCH v1 0/3] MCS queued lock implementation Phil Yang
                   ` (5 preceding siblings ...)
  2019-06-05 17:35 ` Thomas Monjalon
@ 2019-07-04 20:12 ` Thomas Monjalon
  2019-07-05 10:33   ` Phil Yang (Arm Technology China)
  6 siblings, 1 reply; 24+ messages in thread
From: Thomas Monjalon @ 2019-07-04 20:12 UTC (permalink / raw)
  To: Phil Yang, Honnappa.Nagarahalli; +Cc: dev, jerinj, hemant.agrawal, gavin.hu, nd

05/06/2019 17:58, Phil Yang:
> This patch set added MCS lock library and its unit test.
> 
> The MCS lock (proposed by JOHN M. MELLOR-CRUMMEY and MICHAEL L. SCOTT) provides
> scalability by spinning on a CPU/thread local variable which avoids expensive
> cache bouncings. It provides fairness by maintaining a list of acquirers and
> passing the lock to each CPU/thread in the order they acquired the lock.

What is the plan for this patch?
Do we try to get more reviews for a merge in 19.08-rc2?
Or do we target 19.11?



^ permalink raw reply	[flat|nested] 24+ messages in thread

* [dpdk-dev] [PATCH v2 0/3] MCS queued lock implementation
  2019-06-05 15:58 ` [dpdk-dev] [PATCH v1 1/3] eal/mcslock: add mcs " Phil Yang
@ 2019-07-05  9:56   ` Phil Yang
  2019-07-05  9:56     ` [dpdk-dev] [PATCH v2 1/3] eal/mcslock: add mcs " Phil Yang
                       ` (2 more replies)
  2019-07-05 10:27   ` [dpdk-dev] [PATCH v3 0/3] MCS queued lock implementation Phil Yang
  1 sibling, 3 replies; 24+ messages in thread
From: Phil Yang @ 2019-07-05  9:56 UTC (permalink / raw)
  To: dev
  Cc: thomas, david.marchand, konstantin.ananyev, jerinj,
	hemant.agrawal, Honnappa.Nagarahalli, gavin.hu, nd, phil.yang

This patch set adds the MCS lock library and its unit test.

The MCS lock (proposed by John M. Mellor-Crummey and Michael L. Scott) provides
scalability by spinning on a CPU/thread local variable which avoids expensive
cache bouncings. It provides fairness by maintaining a list of acquirers and
passing the lock to each CPU/thread in the order they acquired the lock.

References:
1. http://web.mit.edu/6.173/www/currentsemester/readings/R06-scalable-synchronization-1991.pdf
2. https://lwn.net/Articles/590243/

Micro benchmark (1M iterations):
---------------------------------------------------------------------------------------------------------
MCS lock                          | spinlock                          | ticket lock                      
---------------------------------------------------------------------------------------------------------
Test with lock on 20 cores...     | Test with lock on 20 cores...     | Test with lock on 20 cores...    
Core [0]  Cost Time = 4221256 us  | Core [0]  Cost Time = 1948341 us  | Core [0]  cost time = 25388253 us
Core [14] Cost Time = 4221260 us  | Core [14] Cost Time = 5136122 us  | Core [14] cost time = 25389593 us
Core [28] Cost Time = 4221239 us  | Core [28] Cost Time = 4849188 us  | Core [28] cost time = 25387857 us
Core [29] Cost Time = 4221190 us  | Core [29] Cost Time = 3424137 us  | Core [29] cost time = 25387625 us
Core [30] Cost Time = 4221249 us  | Core [30] Cost Time = 3455813 us  | Core [30] cost time = 25387662 us
Core [31] Cost Time = 4221090 us  | Core [31] Cost Time = 4742221 us  | Core [31] cost time = 25387968 us
Core [32] Cost Time = 4221169 us  | Core [32] Cost Time = 4955011 us  | Core [32] cost time = 25387991 us
Core [33] Cost Time = 4221192 us  | Core [33] Cost Time = 3807345 us  | Core [33] cost time = 25387679 us
Core [34] Cost Time = 4221209 us  | Core [34] Cost Time = 5011178 us  | Core [34] cost time = 25389370 us
Core [35] Cost Time = 4221232 us  | Core [35] Cost Time = 4983119 us  | Core [35] cost time = 25387899 us
Core [36] Cost Time = 4221260 us  | Core [36] Cost Time = 5178121 us  | Core [36] cost time = 25389593 us
Core [37] Cost Time = 4221203 us  | Core [37] Cost Time = 5148525 us  | Core [37] cost time = 25389347 us
Core [38] Cost Time = 4221229 us  | Core [38] Cost Time = 5186183 us  | Core [38] cost time = 25389363 us
Core [39] Cost Time = 4221253 us  | Core [39] Cost Time = 4650058 us  | Core [39] cost time = 25387948 us
Core [40] Cost Time = 4221121 us  | Core [40] Cost Time = 4682572 us  | Core [40] cost time = 25387857 us
Core [41] Cost Time = 4221238 us  | Core [41] Cost Time = 4327049 us  | Core [41] cost time = 25389261 us
Core [42] Cost Time = 4221234 us  | Core [42] Cost Time = 5141807 us  | Core [42] cost time = 25389284 us
Core [43] Cost Time = 4221218 us  | Core [43] Cost Time = 3346939 us  | Core [43] cost time = 25387967 us
Core [44] Cost Time = 4221220 us  | Core [44] Cost Time = 2768786 us  | Core [44] cost time = 25387771 us
Core [45] Cost Time = 4221221 us  | Core [45] Cost Time = 3525078 us  | Core [45] cost time = 25389044 us
---------------------------------------------------------------------------------------------------------
Total Cost Time = 84424283 us     | Total Cost Time = 86267593 us     | Total cost time = 507769332 us   
---------------------------------------------------------------------------------------------------------

Summary:
1. In the lock contention scenario, MCS lock and ticket lock can guarantee
fairness for all lock acquirers. MCS lock has better performance
than ticket lock.
2. Spinlock is fast; however, spinlock has an unfairness issue in the
lock contention case, which can leave some lock acquirers starved.
MCS lock is fast and fair compared with spinlock.


v2
1. Lowercase the algorithm author's name; (David Marchand)
2. Add the load test on the master core to align with the other lock tests; (David Marchand)
3. Enlarge the test iterations from 10K to 1M; (Ananyev Konstantin)
4. Fix potential deadlock issues.

v1
Initial version.

Phil Yang (3):
  eal/mcslock: add mcs queued lock implementation
  eal/mcslock: use generic mcs queued lock on all arch
  test/mcslock: add mcs queued lock unit test

 MAINTAINERS                                        |   5 +
 app/test/Makefile                                  |   1 +
 app/test/autotest_data.py                          |   6 +
 app/test/autotest_test_funcs.py                    |  32 +++
 app/test/meson.build                               |   2 +
 app/test/test_mcslock.c                            | 251 +++++++++++++++++++++
 doc/api/doxy-api-index.md                          |   1 +
 doc/guides/rel_notes/release_19_08.rst             |   6 +
 lib/librte_eal/common/Makefile                     |   2 +-
 .../common/include/arch/arm/rte_mcslock.h          |  23 ++
 .../common/include/arch/ppc_64/rte_mcslock.h       |  19 ++
 .../common/include/arch/x86/rte_mcslock.h          |  19 ++
 .../common/include/generic/rte_mcslock.h           | 175 ++++++++++++++
 lib/librte_eal/common/meson.build                  |   1 +
 14 files changed, 542 insertions(+), 1 deletion(-)
 create mode 100644 app/test/test_mcslock.c
 create mode 100644 lib/librte_eal/common/include/arch/arm/rte_mcslock.h
 create mode 100644 lib/librte_eal/common/include/arch/ppc_64/rte_mcslock.h
 create mode 100644 lib/librte_eal/common/include/arch/x86/rte_mcslock.h
 create mode 100644 lib/librte_eal/common/include/generic/rte_mcslock.h

-- 
2.7.4


^ permalink raw reply	[flat|nested] 24+ messages in thread

* [dpdk-dev] [PATCH v2 1/3] eal/mcslock: add mcs queued lock implementation
  2019-07-05  9:56   ` [dpdk-dev] [PATCH v2 0/3] MCS " Phil Yang
@ 2019-07-05  9:56     ` Phil Yang
  2019-07-05  9:56     ` [dpdk-dev] [PATCH v2 2/3] eal/mcslock: use generic mcs queued lock on all arch Phil Yang
  2019-07-05  9:56     ` [dpdk-dev] [PATCH v2 3/3] test/mcslock: add mcs queued lock unit test Phil Yang
  2 siblings, 0 replies; 24+ messages in thread
From: Phil Yang @ 2019-07-05  9:56 UTC (permalink / raw)
  To: dev
  Cc: thomas, david.marchand, konstantin.ananyev, jerinj,
	hemant.agrawal, Honnappa.Nagarahalli, gavin.hu, nd, phil.yang

If there are multiple threads contending, they all attempt to take the
spinlock at the same time once it is released. This results in a
huge amount of processor bus traffic, which is a major performance
killer. Thus, if we somehow order the lock-takers so that they know who
is next in line for the resource, we can vastly reduce the amount of bus
traffic.

This patch adds the MCS lock library. It provides scalability by spinning
on a CPU/thread local variable which avoids expensive cache bouncings.
It provides fairness by maintaining a list of acquirers and passing the
lock to each CPU/thread in the order they acquired the lock.
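
For orientation, a minimal usage sketch of the API added by this patch
(the sketch itself is not part of the patch; worker() is an invented
name). Unlike rte_spinlock, the caller passes both the lock pointer and
a per-thread queue node:

#include <rte_mcslock.h>

/* Shared lock: a pointer to the queue tail, NULL when unlocked. */
static rte_mcslock_t *p_lock;

static void
worker(void)
{
	/* Per-thread queue node; must stay valid while queued. */
	rte_mcslock_t me;

	rte_mcslock_lock(&p_lock, &me);
	/* critical section */
	rte_mcslock_unlock(&p_lock, &me);
}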

Signed-off-by: Phil Yang <phil.yang@arm.com>
Reviewed-by: Steve Capper <steve.capper@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Gavin Hu <gavin.hu@arm.com>

---
 MAINTAINERS                                        |   4 +
 doc/api/doxy-api-index.md                          |   1 +
 doc/guides/rel_notes/release_19_08.rst             |   6 +
 lib/librte_eal/common/Makefile                     |   2 +-
 .../common/include/generic/rte_mcslock.h           | 175 +++++++++++++++++++++
 lib/librte_eal/common/meson.build                  |   1 +
 6 files changed, 188 insertions(+), 1 deletion(-)
 create mode 100644 lib/librte_eal/common/include/generic/rte_mcslock.h

diff --git a/MAINTAINERS b/MAINTAINERS
index 6054220..c6f81f4 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -234,6 +234,10 @@ F: lib/librte_eal/common/include/rte_random.h
 F: lib/librte_eal/common/rte_random.c
 F: app/test/test_rand_perf.c
 
+MCSlock - EXPERIMENTAL
+M: Phil Yang <phil.yang@arm.com>
+F: lib/librte_eal/common/include/generic/rte_mcslock.h
+
 ARM v7
 M: Jan Viktorin <viktorin@rehivetech.com>
 M: Gavin Hu <gavin.hu@arm.com>
diff --git a/doc/api/doxy-api-index.md b/doc/api/doxy-api-index.md
index 715248d..d0e32b1 100644
--- a/doc/api/doxy-api-index.md
+++ b/doc/api/doxy-api-index.md
@@ -63,6 +63,7 @@ The public API headers are grouped by topics:
 
 - **locks**:
   [atomic]             (@ref rte_atomic.h),
+  [mcslock]            (@ref rte_mcslock.h),
   [rwlock]             (@ref rte_rwlock.h),
   [spinlock]           (@ref rte_spinlock.h),
   [ticketlock]         (@ref rte_ticketlock.h),
diff --git a/doc/guides/rel_notes/release_19_08.rst b/doc/guides/rel_notes/release_19_08.rst
index 21934bf..5afaeab 100644
--- a/doc/guides/rel_notes/release_19_08.rst
+++ b/doc/guides/rel_notes/release_19_08.rst
@@ -139,6 +139,12 @@ New Features
   Added telemetry mode to l3fwd-power application to report
   application level busyness, empty and full polls of rte_eth_rx_burst().
 
+* **Added MCS lock library.**
+
+  Added MCS lock library. It provides scalability by spinning on a
+  CPU/thread local variable which avoids expensive cache bouncings.
+  It provides fairness by maintaining a list of acquirers and passing
+  the lock to each CPU/thread in the order they acquired the lock.
 
 Removed Items
 -------------
diff --git a/lib/librte_eal/common/Makefile b/lib/librte_eal/common/Makefile
index 1647af7..a00d4fc 100644
--- a/lib/librte_eal/common/Makefile
+++ b/lib/librte_eal/common/Makefile
@@ -21,7 +21,7 @@ INC += rte_reciprocal.h rte_fbarray.h rte_uuid.h
 
 GENERIC_INC := rte_atomic.h rte_byteorder.h rte_cycles.h rte_prefetch.h
 GENERIC_INC += rte_memcpy.h rte_cpuflags.h
-GENERIC_INC += rte_spinlock.h rte_rwlock.h rte_ticketlock.h
+GENERIC_INC += rte_mcslock.h rte_spinlock.h rte_rwlock.h rte_ticketlock.h
 GENERIC_INC += rte_vect.h rte_pause.h rte_io.h
 
 # defined in mk/arch/$(RTE_ARCH)/rte.vars.mk
diff --git a/lib/librte_eal/common/include/generic/rte_mcslock.h b/lib/librte_eal/common/include/generic/rte_mcslock.h
new file mode 100644
index 0000000..118f4c9
--- /dev/null
+++ b/lib/librte_eal/common/include/generic/rte_mcslock.h
@@ -0,0 +1,175 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2019 Arm Limited
+ */
+
+#ifndef _RTE_MCSLOCK_H_
+#define _RTE_MCSLOCK_H_
+
+/**
+ * @file
+ *
+ * RTE MCS lock
+ *
+ * This file defines the main data structure and APIs for MCS queued lock.
+ *
+ * The MCS lock (proposed by John M. Mellor-Crummey and Michael L. Scott)
+ * provides scalability by spinning on a CPU/thread local variable which
+ * avoids expensive cache bouncings. It provides fairness by maintaining
+ * a list of acquirers and passing the lock to each CPU/thread in the order
+ * they acquired the lock.
+ */
+
+#include <rte_lcore.h>
+#include <rte_common.h>
+#include <rte_pause.h>
+
+/**
+ * The rte_mcslock_t type.
+ */
+typedef struct rte_mcslock {
+	struct rte_mcslock *next;
+	int locked; /* 1 if the queue is locked, 0 otherwise */
+} rte_mcslock_t;
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: This API may change without prior notice
+ *
+ * Take the MCS lock.
+ *
+ * @param msl
+ *   A pointer to the pointer of a MCS lock.
+ *   When the lock is initialized or declared, the msl pointer should be
+ *   set to NULL.
+ * @param me
+ *   A pointer to a new node of MCS lock. Each CPU/thread acquiring the
+ *   lock should use its 'own node'.
+ */
+static inline __rte_experimental void
+rte_mcslock_lock(rte_mcslock_t **msl, rte_mcslock_t *me)
+{
+	rte_mcslock_t *prev;
+
+	/* Init me node */
+	__atomic_store_n(&me->locked, 1, __ATOMIC_RELAXED);
+	__atomic_store_n(&me->next, NULL, __ATOMIC_RELAXED);
+
+	/* If the queue is empty, the exchange operation is enough to acquire
+	 * the lock. Hence, the exchange operation requires acquire semantics.
+	 * The store to me->next above should complete before the node is
+	 * visible to other CPUs/threads. Hence, the exchange operation requires
+	 * release semantics as well.
+	 */
+	prev = __atomic_exchange_n(msl, me, __ATOMIC_ACQ_REL);
+	if (likely(prev == NULL)) {
+		/* Queue was empty, no further action required,
+		 * proceed with lock taken.
+		 */
+		return;
+	}
+	__atomic_store_n(&prev->next, me, __ATOMIC_RELAXED);
+
+	/* The while-load of me->locked should not move above the previous
+	 * store to prev->next. Otherwise it will cause a deadlock. Need a
+	 * store-load barrier.
+	 */
+	__atomic_thread_fence(__ATOMIC_ACQ_REL);
+	/* If the lock has already been acquired, it first atomically
+	 * places the node at the end of the queue and then proceeds
+	 * to spin on me->locked until the previous lock holder resets
+	 * the me->locked using mcslock_unlock().
+	 */
+	while (__atomic_load_n(&me->locked, __ATOMIC_ACQUIRE))
+		rte_pause();
+}
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: This API may change without prior notice
+ *
+ * Release the MCS lock.
+ *
+ * @param msl
+ *   A pointer to the pointer of a MCS lock.
+ * @param me
+ *   A pointer to the node of MCS lock passed in rte_mcslock_lock.
+ */
+static inline __rte_experimental void
+rte_mcslock_unlock(rte_mcslock_t **msl, rte_mcslock_t *me)
+{
+	/* Check if there are more nodes in the queue. */
+	if (likely(__atomic_load_n(&me->next, __ATOMIC_RELAXED) == NULL)) {
+		/* No, last member in the queue. */
+		rte_mcslock_t *save_me = __atomic_load_n(&me, __ATOMIC_RELAXED);
+
+		/* Release the lock by setting it to NULL */
+		if (likely(__atomic_compare_exchange_n(msl, &save_me, NULL, 0,
+				__ATOMIC_RELEASE, __ATOMIC_RELAXED)))
+			return;
+
+		/* Speculative execution would be allowed to read in the
+		 * while-loop first. This has the potential to cause a
+		 * deadlock. Need a load barrier.
+		 */
+		__atomic_thread_fence(__ATOMIC_ACQUIRE);
+		/* More nodes added to the queue by other CPUs.
+		 * Wait until the next pointer is set.
+		 */
+		while (__atomic_load_n(&me->next, __ATOMIC_RELAXED) == NULL)
+			rte_pause();
+	}
+
+	/* Pass lock to next waiter. */
+	__atomic_store_n(&me->next->locked, 0, __ATOMIC_RELEASE);
+}
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: This API may change without prior notice
+ *
+ * Try to take the lock.
+ *
+ * @param msl
+ *   A pointer to the pointer of a MCS lock.
+ * @param me
+ *   A pointer to a new node of MCS lock.
+ * @return
+ *   1 if the lock is successfully taken; 0 otherwise.
+ */
+static inline __rte_experimental int
+rte_mcslock_trylock(rte_mcslock_t **msl, rte_mcslock_t *me)
+{
+	/* Init me node */
+	__atomic_store_n(&me->next, NULL, __ATOMIC_RELAXED);
+
+	/* Try to lock */
+	rte_mcslock_t *expected = NULL;
+
+	/* The lock can be taken only when the queue is empty. Hence,
+	 * the compare-exchange operation requires acquire semantics.
+	 * The store to me->next above should complete before the node
+	 * is visible to other CPUs/threads. Hence, the compare-exchange
+	 * operation requires release semantics as well.
+	 */
+	return __atomic_compare_exchange_n(msl, &expected, me, 0,
+			__ATOMIC_ACQ_REL, __ATOMIC_RELAXED);
+}
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: This API may change without prior notice
+ *
+ * Test if the lock is taken.
+ *
+ * @param msl
+ *   A pointer to a MCS lock node.
+ * @return
+ *   1 if the lock is currently taken; 0 otherwise.
+ */
+static inline __rte_experimental int
+rte_mcslock_is_locked(rte_mcslock_t *msl)
+{
+	return (__atomic_load_n(&msl, __ATOMIC_RELAXED) != NULL);
+}
+
+#endif /* _RTE_MCSLOCK_H_ */
diff --git a/lib/librte_eal/common/meson.build b/lib/librte_eal/common/meson.build
index bafd232..a54ece8 100644
--- a/lib/librte_eal/common/meson.build
+++ b/lib/librte_eal/common/meson.build
@@ -95,6 +95,7 @@ generic_headers = files(
 	'include/generic/rte_cpuflags.h',
 	'include/generic/rte_cycles.h',
 	'include/generic/rte_io.h',
+	'include/generic/rte_mcslock.h',
 	'include/generic/rte_memcpy.h',
 	'include/generic/rte_pause.h',
 	'include/generic/rte_prefetch.h',
-- 
2.7.4


^ permalink raw reply	[flat|nested] 24+ messages in thread

* [dpdk-dev] [PATCH v2 2/3] eal/mcslock: use generic mcs queued lock on all arch
  2019-07-05  9:56   ` [dpdk-dev] [PATCH v2 0/3] MCS " Phil Yang
  2019-07-05  9:56     ` [dpdk-dev] [PATCH v2 1/3] eal/mcslock: add mcs " Phil Yang
@ 2019-07-05  9:56     ` Phil Yang
  2019-07-05  9:56     ` [dpdk-dev] [PATCH v2 3/3] test/mcslock: add mcs queued lock unit test Phil Yang
  2 siblings, 0 replies; 24+ messages in thread
From: Phil Yang @ 2019-07-05  9:56 UTC (permalink / raw)
  To: dev
  Cc: thomas, david.marchand, konstantin.ananyev, jerinj,
	hemant.agrawal, Honnappa.Nagarahalli, gavin.hu, nd, phil.yang

Let all architectures use the generic MCS queued lock implementation.

Signed-off-by: Phil Yang <phil.yang@arm.com>
Reviewed-by: Gavin Hu <gavin.hu@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>

---
 .../common/include/arch/arm/rte_mcslock.h          | 23 ++++++++++++++++++++++
 .../common/include/arch/ppc_64/rte_mcslock.h       | 19 ++++++++++++++++++
 .../common/include/arch/x86/rte_mcslock.h          | 19 ++++++++++++++++++
 3 files changed, 61 insertions(+)
 create mode 100644 lib/librte_eal/common/include/arch/arm/rte_mcslock.h
 create mode 100644 lib/librte_eal/common/include/arch/ppc_64/rte_mcslock.h
 create mode 100644 lib/librte_eal/common/include/arch/x86/rte_mcslock.h

diff --git a/lib/librte_eal/common/include/arch/arm/rte_mcslock.h b/lib/librte_eal/common/include/arch/arm/rte_mcslock.h
new file mode 100644
index 0000000..5e41e32
--- /dev/null
+++ b/lib/librte_eal/common/include/arch/arm/rte_mcslock.h
@@ -0,0 +1,23 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2019 Arm Limited
+ */
+
+#ifndef _RTE_MCSLOCK_ARM_H_
+#define _RTE_MCSLOCK_ARM_H_
+
+#ifndef RTE_FORCE_INTRINSICS
+#  error Platform must be built with CONFIG_RTE_FORCE_INTRINSICS
+#endif
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#include "generic/rte_mcslock.h"
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* _RTE_MCSLOCK_ARM_H_ */
+
diff --git a/lib/librte_eal/common/include/arch/ppc_64/rte_mcslock.h b/lib/librte_eal/common/include/arch/ppc_64/rte_mcslock.h
new file mode 100644
index 0000000..951b6dd
--- /dev/null
+++ b/lib/librte_eal/common/include/arch/ppc_64/rte_mcslock.h
@@ -0,0 +1,19 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2019 Arm Limited
+ */
+
+#ifndef _RTE_MCSLOCK_PPC_64_H_
+#define _RTE_MCSLOCK_PPC_64_H_
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#include "generic/rte_mcslock.h"
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* _RTE_MCSLOCK_PPC_64_H_ */
+
diff --git a/lib/librte_eal/common/include/arch/x86/rte_mcslock.h b/lib/librte_eal/common/include/arch/x86/rte_mcslock.h
new file mode 100644
index 0000000..573b700
--- /dev/null
+++ b/lib/librte_eal/common/include/arch/x86/rte_mcslock.h
@@ -0,0 +1,19 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2019 Arm Limited
+ */
+
+#ifndef _RTE_MCSLOCK_X86_64_H_
+#define _RTE_MCSLOCK_X86_64_H_
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#include "generic/rte_mcslock.h"
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* _RTE_MCSLOCK_X86_64_H_ */
+
-- 
2.7.4


^ permalink raw reply	[flat|nested] 24+ messages in thread

* [dpdk-dev] [PATCH v2 3/3] test/mcslock: add mcs queued lock unit test
  2019-07-05  9:56   ` [dpdk-dev] [PATCH v2 0/3] MCS " Phil Yang
  2019-07-05  9:56     ` [dpdk-dev] [PATCH v2 1/3] eal/mcslock: add mcs " Phil Yang
  2019-07-05  9:56     ` [dpdk-dev] [PATCH v2 2/3] eal/mcslock: use generic mcs queued lock on all arch Phil Yang
@ 2019-07-05  9:56     ` Phil Yang
  2 siblings, 0 replies; 24+ messages in thread
From: Phil Yang @ 2019-07-05  9:56 UTC (permalink / raw)
  To: dev
  Cc: thomas, david.marchand, konstantin.ananyev, jerinj,
	hemant.agrawal, Honnappa.Nagarahalli, gavin.hu, nd, phil.yang

Unit test and perf test for MCS queued lock.

Signed-off-by: Phil Yang <phil.yang@arm.com>
Reviewed-by: Gavin Hu <gavin.hu@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>

---
 MAINTAINERS                     |   1 +
 app/test/Makefile               |   1 +
 app/test/autotest_data.py       |   6 +
 app/test/autotest_test_funcs.py |  32 +++++
 app/test/meson.build            |   2 +
 app/test/test_mcslock.c         | 251 ++++++++++++++++++++++++++++++++++++++++
 6 files changed, 293 insertions(+)
 create mode 100644 app/test/test_mcslock.c

diff --git a/MAINTAINERS b/MAINTAINERS
index c6f81f4..f2760f9 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -237,6 +237,7 @@ F: app/test/test_rand_perf.c
 MCSlock - EXPERIMENTAL
 M: Phil Yang <phil.yang@arm.com>
 F: lib/librte_eal/common/include/generic/rte_mcslock.h
+F: app/test/test_mcslock.c
 
 ARM v7
 M: Jan Viktorin <viktorin@rehivetech.com>
diff --git a/app/test/Makefile b/app/test/Makefile
index be0f392..1cea026 100644
--- a/app/test/Makefile
+++ b/app/test/Makefile
@@ -64,6 +64,7 @@ SRCS-y += test_atomic.c
 SRCS-y += test_barrier.c
 SRCS-y += test_malloc.c
 SRCS-y += test_cycles.c
+SRCS-y += test_mcslock.c
 SRCS-y += test_spinlock.c
 SRCS-y += test_ticketlock.c
 SRCS-y += test_memory.c
diff --git a/app/test/autotest_data.py b/app/test/autotest_data.py
index 6cf7eca..e10a60d 100644
--- a/app/test/autotest_data.py
+++ b/app/test/autotest_data.py
@@ -177,6 +177,12 @@
         "Report":  None,
     },
     {
+        "Name":    "MCSlock autotest",
+        "Command": "mcslock_autotest",
+        "Func":    mcslock_autotest,
+        "Report":  None,
+    },
+    {
         "Name":    "Byte order autotest",
         "Command": "byteorder_autotest",
         "Func":    default_autotest,
diff --git a/app/test/autotest_test_funcs.py b/app/test/autotest_test_funcs.py
index 31cc0f5..26688b7 100644
--- a/app/test/autotest_test_funcs.py
+++ b/app/test/autotest_test_funcs.py
@@ -164,6 +164,38 @@ def ticketlock_autotest(child, test_name):
 
     return 0, "Success"
 
+def mcslock_autotest(child, test_name):
+    i = 0
+    ir = 0
+    child.sendline(test_name)
+    while True:
+        index = child.expect(["Test OK",
+                              "Test Failed",
+                              "lcore ([0-9]*) state: ([0-1])",
+                              "MCS lock taken on core ([0-9]*)",
+                              "MCS lock released on core ([0-9]*)",
+                              pexpect.TIMEOUT], timeout=5)
+        # ok
+        if index == 0:
+            break
+
+        # message, check ordering
+        elif index == 3:
+            if int(child.match.groups()[0]) < i:
+                return -1, "Fail [Bad order]"
+            i = int(child.match.groups()[0])
+        elif index == 4:
+            if int(child.match.groups()[0]) < ir:
+                return -1, "Fail [Bad order]"
+            ir = int(child.match.groups()[0])
+
+        # fail
+        elif index == 5:
+            return -1, "Fail [Timeout]"
+        elif index == 1:
+            return -1, "Fail"
+
+    return 0, "Success"
 
 def logs_autotest(child, test_name):
     child.sendline(test_name)
diff --git a/app/test/meson.build b/app/test/meson.build
index 466fd56..fe095cf 100644
--- a/app/test/meson.build
+++ b/app/test/meson.build
@@ -80,6 +80,7 @@ test_sources = files('commands.c',
 	'test_memzone.c',
 	'test_meter.c',
 	'test_metrics.c',
+	'test_mcslock.c',
 	'test_mp_secondary.c',
 	'test_pdump.c',
 	'test_per_lcore.c',
@@ -185,6 +186,7 @@ fast_test_names = [
         'lpm6_autotest',
         'malloc_autotest',
         'mbuf_autotest',
+        'mcslock_autotest',
         'memcpy_autotest',
         'memory_autotest',
         'mempool_autotest',
diff --git a/app/test/test_mcslock.c b/app/test/test_mcslock.c
new file mode 100644
index 0000000..1004563
--- /dev/null
+++ b/app/test/test_mcslock.c
@@ -0,0 +1,251 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2019 Arm Limited
+ */
+
+#include <stdio.h>
+#include <stdint.h>
+#include <inttypes.h>
+#include <string.h>
+#include <unistd.h>
+#include <sys/queue.h>
+
+#include <rte_common.h>
+#include <rte_memory.h>
+#include <rte_per_lcore.h>
+#include <rte_launch.h>
+#include <rte_eal.h>
+#include <rte_lcore.h>
+#include <rte_cycles.h>
+#include <rte_mcslock.h>
+#include <rte_atomic.h>
+
+#include "test.h"
+
+/*
+ * RTE MCS lock test
+ * =================
+ *
+ * These tests are derived from spin lock test cases.
+ *
+ * - The functional test takes all of these locks and launches the
+ *   ''test_mcslock_per_core()'' function on each core (except the master).
+ *
+ *   - The function takes the global lock, displays something, then releases
+ *     the global lock on each core.
+ *
+ * - A load test is carried out, with all cores attempting to lock a single
+ *   lock multiple times.
+ */
+#include <rte_per_lcore.h>
+
+RTE_DEFINE_PER_LCORE(rte_mcslock_t, _ml_me);
+RTE_DEFINE_PER_LCORE(rte_mcslock_t, _ml_try_me);
+RTE_DEFINE_PER_LCORE(rte_mcslock_t, _ml_perf_me);
+
+rte_mcslock_t *p_ml;
+rte_mcslock_t *p_ml_try;
+rte_mcslock_t *p_ml_perf;
+
+static unsigned int count;
+
+static rte_atomic32_t synchro;
+
+static int
+test_mcslock_per_core(__attribute__((unused)) void *arg)
+{
+	/* Per core me node. */
+	rte_mcslock_t ml_me = RTE_PER_LCORE(_ml_me);
+
+	rte_mcslock_lock(&p_ml, &ml_me);
+	printf("MCS lock taken on core %u\n", rte_lcore_id());
+	rte_mcslock_unlock(&p_ml, &ml_me);
+	printf("MCS lock released on core %u\n", rte_lcore_id());
+
+	return 0;
+}
+
+static uint64_t time_count[RTE_MAX_LCORE] = {0};
+
+#define MAX_LOOP 1000000
+
+static int
+load_loop_fn(void *func_param)
+{
+	uint64_t time_diff = 0, begin;
+	uint64_t hz = rte_get_timer_hz();
+	volatile uint64_t lcount = 0;
+	const int use_lock = *(int *)func_param;
+	const unsigned int lcore = rte_lcore_id();
+
+	/* Per core me node. */
+	rte_mcslock_t ml_perf_me = RTE_PER_LCORE(_ml_perf_me);
+
+	/* wait synchro */
+	while (rte_atomic32_read(&synchro) == 0)
+		;
+
+	begin = rte_get_timer_cycles();
+	while (lcount < MAX_LOOP) {
+		if (use_lock)
+			rte_mcslock_lock(&p_ml_perf, &ml_perf_me);
+
+		lcount++;
+		if (use_lock)
+			rte_mcslock_unlock(&p_ml_perf, &ml_perf_me);
+	}
+	time_diff = rte_get_timer_cycles() - begin;
+	time_count[lcore] = time_diff * 1000000 / hz;
+	return 0;
+}
+
+static int
+test_mcslock_perf(void)
+{
+	unsigned int i;
+	uint64_t total = 0;
+	int lock = 0;
+	const unsigned int lcore = rte_lcore_id();
+
+	printf("\nTest with no lock on single core...\n");
+	rte_atomic32_set(&synchro, 1);
+	load_loop_fn(&lock);
+	printf("Core [%u] Cost Time = %"PRIu64" us\n",
+			lcore, time_count[lcore]);
+	memset(time_count, 0, sizeof(time_count));
+
+	printf("\nTest with lock on single core...\n");
+	lock = 1;
+	rte_atomic32_set(&synchro, 1);
+	load_loop_fn(&lock);
+	printf("Core [%u] Cost Time = %"PRIu64" us\n",
+			lcore, time_count[lcore]);
+	memset(time_count, 0, sizeof(time_count));
+
+	printf("\nTest with lock on %u cores...\n", (rte_lcore_count()));
+
+	rte_atomic32_set(&synchro, 0);
+	rte_eal_mp_remote_launch(load_loop_fn, &lock, SKIP_MASTER);
+
+	/* start synchro and launch test on master */
+	rte_atomic32_set(&synchro, 1);
+	load_loop_fn(&lock);
+
+	rte_eal_mp_wait_lcore();
+
+	RTE_LCORE_FOREACH(i) {
+		printf("Core [%u] Cost Time = %"PRIu64" us\n",
+				i, time_count[i]);
+		total += time_count[i];
+	}
+
+	printf("Total Cost Time = %"PRIu64" us\n", total);
+
+	return 0;
+}
+
+/*
+ * Use rte_mcslock_trylock() to trylock a mcs lock object,
+ * If it could not lock the object successfully, it would
+ * return immediately.
+ */
+static int
+test_mcslock_try(__attribute__((unused)) void *arg)
+{
+	/* Per core me node. */
+	rte_mcslock_t ml_me     = RTE_PER_LCORE(_ml_me);
+	rte_mcslock_t ml_try_me = RTE_PER_LCORE(_ml_try_me);
+
+	/* The master lcore has locked ml_try, so trylock should
+	 * fail on the slave lcores.
+	 */
+	if (rte_mcslock_trylock(&p_ml_try, &ml_try_me) == 0) {
+		rte_mcslock_lock(&p_ml, &ml_me);
+		count++;
+		rte_mcslock_unlock(&p_ml, &ml_me);
+	}
+
+	return 0;
+}
+
+
+/*
+ * Test rte_eal_get_lcore_state() in addition to mcs locks
+ * as we have "waiting" then "running" lcores.
+ */
+static int
+test_mcslock(void)
+{
+	int ret = 0;
+	int i;
+
+	/* Define per core me node. */
+	rte_mcslock_t ml_me     = RTE_PER_LCORE(_ml_me);
+	rte_mcslock_t ml_try_me = RTE_PER_LCORE(_ml_try_me);
+
+	/*
+	 * Test mcs lock & unlock on each core
+	 */
+
+	/* slave cores should be waiting: print it */
+	RTE_LCORE_FOREACH_SLAVE(i) {
+		printf("lcore %d state: %d\n", i,
+				(int) rte_eal_get_lcore_state(i));
+	}
+
+	rte_mcslock_lock(&p_ml, &ml_me);
+
+	RTE_LCORE_FOREACH_SLAVE(i) {
+		rte_eal_remote_launch(test_mcslock_per_core, NULL, i);
+	}
+
+	/* slave cores should be busy: print it */
+	RTE_LCORE_FOREACH_SLAVE(i) {
+		printf("lcore %d state: %d\n", i,
+				(int) rte_eal_get_lcore_state(i));
+	}
+
+	rte_mcslock_unlock(&p_ml, &ml_me);
+
+	rte_eal_mp_wait_lcore();
+
+	/*
+	 * Test that try-locking a locked object returns immediately.
+	 * Lock the mcs lock object first, then launch all the
+	 * slave lcores to trylock the same mcs lock object.
+	 * Each slave lcore should fail to take the locked object,
+	 * return immediately, and increment the "count" variable
+	 * (initialized to zero) by one.
+	 * If "count" finally equals the number of slave lcores,
+	 * the behavior of try-locking a locked mcslock object
+	 * is correct.
+	 */
+	if (rte_mcslock_trylock(&p_ml_try, &ml_try_me) == 0)
+		return -1;
+
+	count = 0;
+	RTE_LCORE_FOREACH_SLAVE(i) {
+		rte_eal_remote_launch(test_mcslock_try, NULL, i);
+	}
+	rte_mcslock_unlock(&p_ml_try, &ml_try_me);
+	rte_eal_mp_wait_lcore();
+
+	/* Test is_locked API */
+	if (rte_mcslock_is_locked(p_ml)) {
+		printf("mcslock is locked but it should not be\n");
+		return -1;
+	}
+
+	/* Counting the locked times in each core */
+	rte_mcslock_lock(&p_ml, &ml_me);
+	if (count != (rte_lcore_count() - 1))
+		ret = -1;
+	rte_mcslock_unlock(&p_ml, &ml_me);
+
+	/* mcs lock perf test */
+	if (test_mcslock_perf() < 0)
+		return -1;
+
+	return ret;
+}
+
+REGISTER_TEST_COMMAND(mcslock_autotest, test_mcslock);
-- 
2.7.4


^ permalink raw reply	[flat|nested] 24+ messages in thread

* [dpdk-dev] [PATCH v3 0/3] MCS queued lock implementation
  2019-06-05 15:58 ` [dpdk-dev] [PATCH v1 1/3] eal/mcslock: add mcs " Phil Yang
  2019-07-05  9:56   ` [dpdk-dev] [PATCH v2 0/3] MCS " Phil Yang
@ 2019-07-05 10:27   ` Phil Yang
  2019-07-05 10:27     ` [dpdk-dev] [PATCH v3 1/3] eal/mcslock: add mcs " Phil Yang
                       ` (3 more replies)
  1 sibling, 4 replies; 24+ messages in thread
From: Phil Yang @ 2019-07-05 10:27 UTC (permalink / raw)
  To: dev
  Cc: thomas, david.marchand, konstantin.ananyev, jerinj,
	hemant.agrawal, Honnappa.Nagarahalli, gavin.hu, nd, phil.yang

This patch set adds the MCS lock library and its unit test.

The MCS lock (proposed by John M. Mellor-Crummey and Michael L. Scott) provides
scalability by spinning on a CPU/thread local variable which avoids expensive
cache bouncings. It provides fairness by maintaining a list of acquirers and
passing the lock to each CPU/thread in the order they acquired the lock.

References:
1. http://web.mit.edu/6.173/www/currentsemester/readings/R06-scalable-synchronization-1991.pdf
2. https://lwn.net/Articles/590243/

Micro benchmark (1M iterations):
---------------------------------------------------------------------------------------------------------
MCS lock                          | spinlock                          | ticket lock
---------------------------------------------------------------------------------------------------------
Test with lock on 20 cores...     | Test with lock on 20 cores...     | Test with lock on 20 cores...
Core [0]  Cost Time = 4221256 us  | Core [0]  Cost Time = 1948341 us  | Core [0]  cost time = 25388253 us
Core [14] Cost Time = 4221260 us  | Core [14] Cost Time = 5136122 us  | Core [14] cost time = 25389593 us
Core [28] Cost Time = 4221239 us  | Core [28] Cost Time = 4849188 us  | Core [28] cost time = 25387857 us
Core [29] Cost Time = 4221190 us  | Core [29] Cost Time = 3424137 us  | Core [29] cost time = 25387625 us
Core [30] Cost Time = 4221249 us  | Core [30] Cost Time = 3455813 us  | Core [30] cost time = 25387662 us
Core [31] Cost Time = 4221090 us  | Core [31] Cost Time = 4742221 us  | Core [31] cost time = 25387968 us
Core [32] Cost Time = 4221169 us  | Core [32] Cost Time = 4955011 us  | Core [32] cost time = 25387991 us
Core [33] Cost Time = 4221192 us  | Core [33] Cost Time = 3807345 us  | Core [33] cost time = 25387679 us
Core [34] Cost Time = 4221209 us  | Core [34] Cost Time = 5011178 us  | Core [34] cost time = 25389370 us
Core [35] Cost Time = 4221232 us  | Core [35] Cost Time = 4983119 us  | Core [35] cost time = 25387899 us
Core [36] Cost Time = 4221260 us  | Core [36] Cost Time = 5178121 us  | Core [36] cost time = 25389593 us
Core [37] Cost Time = 4221203 us  | Core [37] Cost Time = 5148525 us  | Core [37] cost time = 25389347 us
Core [38] Cost Time = 4221229 us  | Core [38] Cost Time = 5186183 us  | Core [38] cost time = 25389363 us
Core [39] Cost Time = 4221253 us  | Core [39] Cost Time = 4650058 us  | Core [39] cost time = 25387948 us
Core [40] Cost Time = 4221121 us  | Core [40] Cost Time = 4682572 us  | Core [40] cost time = 25387857 us
Core [41] Cost Time = 4221238 us  | Core [41] Cost Time = 4327049 us  | Core [41] cost time = 25389261 us
Core [42] Cost Time = 4221234 us  | Core [42] Cost Time = 5141807 us  | Core [42] cost time = 25389284 us
Core [43] Cost Time = 4221218 us  | Core [43] Cost Time = 3346939 us  | Core [43] cost time = 25387967 us
Core [44] Cost Time = 4221220 us  | Core [44] Cost Time = 2768786 us  | Core [44] cost time = 25387771 us
Core [45] Cost Time = 4221221 us  | Core [45] Cost Time = 3525078 us  | Core [45] cost time = 25389044 us
---------------------------------------------------------------------------------------------------------
Total Cost Time = 84424283 us     | Total Cost Time = 86267593 us     | Total cost time = 507769332 us
---------------------------------------------------------------------------------------------------------

Summary:
1. In the lock contention scenario, MCS lock and ticket lock can guarantee
fairness for all lock acquirers. MCS lock has better performance
than ticket lock.
2. Spinlock is fast; however, spinlock has an unfairness issue in the
lock contention case, which can leave some lock acquirers starved.
MCS lock is fast and fair compared with spinlock.

v3
Fix __rte_experimental coding style warning.

v2
1. Lowercase the algorithm author's name; (David Marchand)
2. Add the load test on the master core to align with the other lock tests; (David Marchand)
3. Enlarge the test iterations from 10K to 1M; (Ananyev Konstantin)
4. Fix potential deadlock issues.

v1
Initial version.


Phil Yang (3):
  eal/mcslock: add mcs queued lock implementation
  eal/mcslock: use generic mcs queued lock on all arch
  test/mcslock: add mcs queued lock unit test

 MAINTAINERS                                        |   5 +
 app/test/Makefile                                  |   1 +
 app/test/autotest_data.py                          |   6 +
 app/test/autotest_test_funcs.py                    |  32 +++
 app/test/meson.build                               |   2 +
 app/test/test_mcslock.c                            | 251 +++++++++++++++++++++
 doc/api/doxy-api-index.md                          |   1 +
 doc/guides/rel_notes/release_19_08.rst             |   6 +
 lib/librte_eal/common/Makefile                     |   2 +-
 .../common/include/arch/arm/rte_mcslock.h          |  23 ++
 .../common/include/arch/ppc_64/rte_mcslock.h       |  19 ++
 .../common/include/arch/x86/rte_mcslock.h          |  19 ++
 .../common/include/generic/rte_mcslock.h           | 179 +++++++++++++++
 lib/librte_eal/common/meson.build                  |   1 +
 14 files changed, 546 insertions(+), 1 deletion(-)
 create mode 100644 app/test/test_mcslock.c
 create mode 100644 lib/librte_eal/common/include/arch/arm/rte_mcslock.h
 create mode 100644 lib/librte_eal/common/include/arch/ppc_64/rte_mcslock.h
 create mode 100644 lib/librte_eal/common/include/arch/x86/rte_mcslock.h
 create mode 100644 lib/librte_eal/common/include/generic/rte_mcslock.h

-- 
2.7.4


^ permalink raw reply	[flat|nested] 24+ messages in thread

* [dpdk-dev] [PATCH v3 1/3] eal/mcslock: add mcs queued lock implementation
  2019-07-05 10:27   ` [dpdk-dev] [PATCH v3 0/3] MCS queued lock implementation Phil Yang
@ 2019-07-05 10:27     ` Phil Yang
  2019-07-05 10:27     ` [dpdk-dev] [PATCH v3 2/3] eal/mcslock: use generic mcs queued lock on all arch Phil Yang
                       ` (2 subsequent siblings)
  3 siblings, 0 replies; 24+ messages in thread
From: Phil Yang @ 2019-07-05 10:27 UTC (permalink / raw)
  To: dev
  Cc: thomas, david.marchand, konstantin.ananyev, jerinj,
	hemant.agrawal, Honnappa.Nagarahalli, gavin.hu, nd, phil.yang

If there are multiple threads contending, they all attempt to take the
spinlock at the same time once it is released. This results in a
huge amount of processor bus traffic, which is a major performance
killer. Thus, if we somehow order the lock-takers so that they know who
is next in line for the resource, we can vastly reduce the amount of bus
traffic.

This patch adds the MCS lock library. It provides scalability by spinning
on a CPU/thread local variable which avoids expensive cache bouncings.
It provides fairness by maintaining a list of acquirers and passing the
lock to each CPU/thread in the order they acquired the lock.
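
As a complement to the usage sketch in the v2 thread (again not part of
the patch; try_do_work() is an invented name), the non-blocking path of
the same API looks like this:

#include <rte_mcslock.h>

static rte_mcslock_t *p_lock;

static int
try_do_work(void)
{
	rte_mcslock_t me; /* per-thread queue node */

	/* rte_mcslock_trylock() returns 1 on success, 0 if the
	 * queue is not empty.
	 */
	if (rte_mcslock_trylock(&p_lock, &me) == 0)
		return -1; /* lock busy; caller may back off and retry */

	/* critical section */
	rte_mcslock_unlock(&p_lock, &me);
	return 0;
}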

Signed-off-by: Phil Yang <phil.yang@arm.com>
Reviewed-by: Steve Capper <steve.capper@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Gavin Hu <gavin.hu@arm.com>

---
 MAINTAINERS                                        |   4 +
 doc/api/doxy-api-index.md                          |   1 +
 doc/guides/rel_notes/release_19_08.rst             |   6 +
 lib/librte_eal/common/Makefile                     |   2 +-
 .../common/include/generic/rte_mcslock.h           | 179 +++++++++++++++++++++
 lib/librte_eal/common/meson.build                  |   1 +
 6 files changed, 192 insertions(+), 1 deletion(-)
 create mode 100644 lib/librte_eal/common/include/generic/rte_mcslock.h

diff --git a/MAINTAINERS b/MAINTAINERS
index 6054220..c6f81f4 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -234,6 +234,10 @@ F: lib/librte_eal/common/include/rte_random.h
 F: lib/librte_eal/common/rte_random.c
 F: app/test/test_rand_perf.c
 
+MCSlock - EXPERIMENTAL
+M: Phil Yang <phil.yang@arm.com>
+F: lib/librte_eal/common/include/generic/rte_mcslock.h
+
 ARM v7
 M: Jan Viktorin <viktorin@rehivetech.com>
 M: Gavin Hu <gavin.hu@arm.com>
diff --git a/doc/api/doxy-api-index.md b/doc/api/doxy-api-index.md
index 715248d..d0e32b1 100644
--- a/doc/api/doxy-api-index.md
+++ b/doc/api/doxy-api-index.md
@@ -63,6 +63,7 @@ The public API headers are grouped by topics:
 
 - **locks**:
   [atomic]             (@ref rte_atomic.h),
+  [mcslock]            (@ref rte_mcslock.h),
   [rwlock]             (@ref rte_rwlock.h),
   [spinlock]           (@ref rte_spinlock.h),
   [ticketlock]         (@ref rte_ticketlock.h),
diff --git a/doc/guides/rel_notes/release_19_08.rst b/doc/guides/rel_notes/release_19_08.rst
index 21934bf..5afaeab 100644
--- a/doc/guides/rel_notes/release_19_08.rst
+++ b/doc/guides/rel_notes/release_19_08.rst
@@ -139,6 +139,12 @@ New Features
   Added telemetry mode to l3fwd-power application to report
   application level busyness, empty and full polls of rte_eth_rx_burst().
 
+* **Added MCS lock library.**
+
+  Added MCS lock library. It provides scalability by spinning on a
+  CPU/thread local variable which avoids expensive cache bouncings.
+  It provides fairness by maintaining a list of acquirers and passing
+  the lock to each CPU/thread in the order they acquired the lock.
 
 Removed Items
 -------------
diff --git a/lib/librte_eal/common/Makefile b/lib/librte_eal/common/Makefile
index 1647af7..a00d4fc 100644
--- a/lib/librte_eal/common/Makefile
+++ b/lib/librte_eal/common/Makefile
@@ -21,7 +21,7 @@ INC += rte_reciprocal.h rte_fbarray.h rte_uuid.h
 
 GENERIC_INC := rte_atomic.h rte_byteorder.h rte_cycles.h rte_prefetch.h
 GENERIC_INC += rte_memcpy.h rte_cpuflags.h
-GENERIC_INC += rte_spinlock.h rte_rwlock.h rte_ticketlock.h
+GENERIC_INC += rte_mcslock.h rte_spinlock.h rte_rwlock.h rte_ticketlock.h
 GENERIC_INC += rte_vect.h rte_pause.h rte_io.h
 
 # defined in mk/arch/$(RTE_ARCH)/rte.vars.mk
diff --git a/lib/librte_eal/common/include/generic/rte_mcslock.h b/lib/librte_eal/common/include/generic/rte_mcslock.h
new file mode 100644
index 0000000..2bef283
--- /dev/null
+++ b/lib/librte_eal/common/include/generic/rte_mcslock.h
@@ -0,0 +1,179 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2019 Arm Limited
+ */
+
+#ifndef _RTE_MCSLOCK_H_
+#define _RTE_MCSLOCK_H_
+
+/**
+ * @file
+ *
+ * RTE MCS lock
+ *
+ * This file defines the main data structure and APIs for MCS queued lock.
+ *
+ * The MCS lock (proposed by John M. Mellor-Crummey and Michael L. Scott)
+ * provides scalability by spinning on a CPU/thread local variable which
+ * avoids expensive cache bouncings. It provides fairness by maintaining
+ * a list of acquirers and passing the lock to each CPU/thread in the order
+ * they acquired the lock.
+ */
+
+#include <rte_lcore.h>
+#include <rte_common.h>
+#include <rte_pause.h>
+
+/**
+ * The rte_mcslock_t type.
+ */
+typedef struct rte_mcslock {
+	struct rte_mcslock *next;
+	int locked; /* 1 if the queue is locked, 0 otherwise */
+} rte_mcslock_t;
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: This API may change without prior notice
+ *
+ * Take the MCS lock.
+ *
+ * @param msl
+ *   A pointer to the pointer of a MCS lock.
+ *   When the lock is initialized or declared, the msl pointer should be
+ *   set to NULL.
+ * @param me
+ *   A pointer to a new node of MCS lock. Each CPU/thread acquiring the
+ *   lock should use its 'own node'.
+ */
+__rte_experimental
+static inline void
+rte_mcslock_lock(rte_mcslock_t **msl, rte_mcslock_t *me)
+{
+	rte_mcslock_t *prev;
+
+	/* Init me node */
+	__atomic_store_n(&me->locked, 1, __ATOMIC_RELAXED);
+	__atomic_store_n(&me->next, NULL, __ATOMIC_RELAXED);
+
+	/* If the queue is empty, the exchange operation is enough to acquire
+	 * the lock. Hence, the exchange operation requires acquire semantics.
+	 * The store to me->next above should complete before the node is
+	 * visible to other CPUs/threads. Hence, the exchange operation requires
+	 * release semantics as well.
+	 */
+	prev = __atomic_exchange_n(msl, me, __ATOMIC_ACQ_REL);
+	if (likely(prev == NULL)) {
+		/* Queue was empty, no further action required,
+		 * proceed with lock taken.
+		 */
+		return;
+	}
+	__atomic_store_n(&prev->next, me, __ATOMIC_RELAXED);
+
+	/* The while-load of me->locked should not move above the previous
+	 * store to prev->next. Otherwise it will cause a deadlock. Need a
+	 * store-load barrier.
+	 */
+	__atomic_thread_fence(__ATOMIC_ACQ_REL);
+	/* If the lock has already been acquired, it first atomically
+	 * places the node at the end of the queue and then proceeds
+	 * to spin on me->locked until the previous lock holder resets
+	 * the me->locked using mcslock_unlock().
+	 */
+	while (__atomic_load_n(&me->locked, __ATOMIC_ACQUIRE))
+		rte_pause();
+}
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: This API may change without prior notice
+ *
+ * Release the MCS lock.
+ *
+ * @param msl
+ *   A pointer to the pointer of a MCS lock.
+ * @param me
+ *   A pointer to the node of MCS lock passed in rte_mcslock_lock.
+ */
+__rte_experimental
+static inline void
+rte_mcslock_unlock(rte_mcslock_t **msl, rte_mcslock_t *me)
+{
+	/* Check if there are more nodes in the queue. */
+	if (likely(__atomic_load_n(&me->next, __ATOMIC_RELAXED) == NULL)) {
+		/* No, last member in the queue. */
+		rte_mcslock_t *save_me = __atomic_load_n(&me, __ATOMIC_RELAXED);
+
+		/* Release the lock by setting it to NULL */
+		if (likely(__atomic_compare_exchange_n(msl, &save_me, NULL, 0,
+				__ATOMIC_RELEASE, __ATOMIC_RELAXED)))
+			return;
+
+		/* Speculative execution would be allowed to read in the
+		 * while-loop first. This has the potential to cause a
+		 * deadlock. Need a load barrier.
+		 */
+		__atomic_thread_fence(__ATOMIC_ACQUIRE);
+		/* More nodes added to the queue by other CPUs.
+		 * Wait until the next pointer is set.
+		 */
+		while (__atomic_load_n(&me->next, __ATOMIC_RELAXED) == NULL)
+			rte_pause();
+	}
+
+	/* Pass lock to next waiter. */
+	__atomic_store_n(&me->next->locked, 0, __ATOMIC_RELEASE);
+}
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: This API may change without prior notice
+ *
+ * Try to take the lock.
+ *
+ * @param msl
+ *   A pointer to the pointer of a MCS lock.
+ * @param me
+ *   A pointer to a new node of MCS lock.
+ * @return
+ *   1 if the lock is successfully taken; 0 otherwise.
+ */
+__rte_experimental
+static inline int
+rte_mcslock_trylock(rte_mcslock_t **msl, rte_mcslock_t *me)
+{
+	/* Init me node */
+	__atomic_store_n(&me->next, NULL, __ATOMIC_RELAXED);
+
+	/* Try to lock */
+	rte_mcslock_t *expected = NULL;
+
+	/* The lock can be taken only when the queue is empty. Hence,
+	 * the compare-exchange operation requires acquire semantics.
+	 * The store to me->next above should complete before the node
+	 * is visible to other CPUs/threads. Hence, the compare-exchange
+	 * operation requires release semantics as well.
+	 */
+	return __atomic_compare_exchange_n(msl, &expected, me, 0,
+			__ATOMIC_ACQ_REL, __ATOMIC_RELAXED);
+}
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: This API may change without prior notice
+ *
+ * Test if the lock is taken.
+ *
+ * @param msl
+ *   The MCS lock (the current value of the queue tail pointer).
+ * @return
+ *   1 if the lock is currently taken; 0 otherwise.
+ */
+__rte_experimental
+static inline int
+rte_mcslock_is_locked(rte_mcslock_t *msl)
+{
+	return (msl != NULL);
+}
+
+#endif /* _RTE_MCSLOCK_H_ */
diff --git a/lib/librte_eal/common/meson.build b/lib/librte_eal/common/meson.build
index bafd232..a54ece8 100644
--- a/lib/librte_eal/common/meson.build
+++ b/lib/librte_eal/common/meson.build
@@ -95,6 +95,7 @@ generic_headers = files(
 	'include/generic/rte_cpuflags.h',
 	'include/generic/rte_cycles.h',
 	'include/generic/rte_io.h',
+	'include/generic/rte_mcslock.h',
 	'include/generic/rte_memcpy.h',
 	'include/generic/rte_pause.h',
 	'include/generic/rte_prefetch.h',
-- 
2.7.4
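
For reference, the pieces above combine into a small usage pattern: one
queue-tail pointer per lock, one node per acquiring CPU/thread. The sketch
below is illustrative only (app_lock and do_exclusive_work are hypothetical
names, not part of the patch); keeping the node on the acquirer's stack is
safe because rte_mcslock_unlock() does not return until any successor has
been linked and handed the lock, after which the node is no longer
referenced.

	#include <stdio.h>
	#include <rte_mcslock.h>

	static rte_mcslock_t *app_lock;	/* queue tail; NULL means unlocked */

	static void
	do_exclusive_work(void)
	{
		rte_mcslock_t me;	/* this CPU's/thread's queue node */

		rte_mcslock_lock(&app_lock, &me);
		/* ... critical section ... */
		rte_mcslock_unlock(&app_lock, &me);

		/* Non-blocking attempt: succeeds only when the queue is
		 * empty, i.e. the lock is neither held nor waited on.
		 */
		if (rte_mcslock_trylock(&app_lock, &me)) {
			/* ... critical section ... */
			rte_mcslock_unlock(&app_lock, &me);
		}

		if (rte_mcslock_is_locked(app_lock))
			printf("lock is held or contended\n");
	}

Note that unlock must receive the same node that was passed to lock; the
unit test in patch 3/3 keeps its nodes in per-lcore storage instead of on
the stack.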


^ permalink raw reply	[flat|nested] 24+ messages in thread

* [dpdk-dev] [PATCH v3 2/3] eal/mcslock: use generic mcs queued lock on all arch
  2019-07-05 10:27   ` [dpdk-dev] [PATCH v3 0/3] MCS queued lock implementation Phil Yang
  2019-07-05 10:27     ` [dpdk-dev] [PATCH v3 1/3] eal/mcslock: add mcs " Phil Yang
@ 2019-07-05 10:27     ` Phil Yang
  2019-07-05 10:27     ` [dpdk-dev] [PATCH v3 3/3] test/mcslock: add mcs queued lock unit test Phil Yang
  2019-07-07 21:49     ` [dpdk-dev] [PATCH v3 0/3] MCS queued lock implementation Thomas Monjalon
  3 siblings, 0 replies; 24+ messages in thread
From: Phil Yang @ 2019-07-05 10:27 UTC (permalink / raw)
  To: dev
  Cc: thomas, david.marchand, konstantin.ananyev, jerinj,
	hemant.agrawal, Honnappa.Nagarahalli, gavin.hu, nd, phil.yang

Let all architectures use the generic MCS queued lock implementation.

Signed-off-by: Phil Yang <phil.yang@arm.com>
Reviewed-by: Gavin Hu <gavin.hu@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>

---
 .../common/include/arch/arm/rte_mcslock.h          | 23 ++++++++++++++++++++++
 .../common/include/arch/ppc_64/rte_mcslock.h       | 19 ++++++++++++++++++
 .../common/include/arch/x86/rte_mcslock.h          | 19 ++++++++++++++++++
 3 files changed, 61 insertions(+)
 create mode 100644 lib/librte_eal/common/include/arch/arm/rte_mcslock.h
 create mode 100644 lib/librte_eal/common/include/arch/ppc_64/rte_mcslock.h
 create mode 100644 lib/librte_eal/common/include/arch/x86/rte_mcslock.h

diff --git a/lib/librte_eal/common/include/arch/arm/rte_mcslock.h b/lib/librte_eal/common/include/arch/arm/rte_mcslock.h
new file mode 100644
index 0000000..5e41e32
--- /dev/null
+++ b/lib/librte_eal/common/include/arch/arm/rte_mcslock.h
@@ -0,0 +1,23 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2019 Arm Limited
+ */
+
+#ifndef _RTE_MCSLOCK_ARM_H_
+#define _RTE_MCSLOCK_ARM_H_
+
+#ifndef RTE_FORCE_INTRINSICS
+#  error Platform must be built with CONFIG_RTE_FORCE_INTRINSICS
+#endif
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#include "generic/rte_mcslock.h"
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* _RTE_MCSLOCK_ARM_H_ */
+
diff --git a/lib/librte_eal/common/include/arch/ppc_64/rte_mcslock.h b/lib/librte_eal/common/include/arch/ppc_64/rte_mcslock.h
new file mode 100644
index 0000000..951b6dd
--- /dev/null
+++ b/lib/librte_eal/common/include/arch/ppc_64/rte_mcslock.h
@@ -0,0 +1,19 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2019 Arm Limited
+ */
+
+#ifndef _RTE_MCSLOCK_PPC_64_H_
+#define _RTE_MCSLOCK_PPC_64_H_
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#include "generic/rte_mcslock.h"
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* _RTE_MCSLOCK_PPC_64_H_ */
+
diff --git a/lib/librte_eal/common/include/arch/x86/rte_mcslock.h b/lib/librte_eal/common/include/arch/x86/rte_mcslock.h
new file mode 100644
index 0000000..573b700
--- /dev/null
+++ b/lib/librte_eal/common/include/arch/x86/rte_mcslock.h
@@ -0,0 +1,19 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2019 Arm Limited
+ */
+
+#ifndef _RTE_MCSLOCK_X86_64_H_
+#define _RTE_MCSLOCK_X86_64_H_
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#include "generic/rte_mcslock.h"
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* _RTE_MCSLOCK_X86_64_H_ */
+
-- 
2.7.4
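
The wrappers make porting mechanical: a new architecture would add one more
file of exactly the same shape. The riscv example below is hypothetical (no
such port is part of this series) and only mirrors the three wrappers above:

	/* SPDX-License-Identifier: BSD-3-Clause
	 * Copyright(c) of the hypothetical new port's author
	 */

	#ifndef _RTE_MCSLOCK_RISCV_H_
	#define _RTE_MCSLOCK_RISCV_H_

	#ifdef __cplusplus
	extern "C" {
	#endif

	#include "generic/rte_mcslock.h"

	#ifdef __cplusplus
	}
	#endif

	#endif /* _RTE_MCSLOCK_RISCV_H_ */

Only the arm wrapper carries the extra guard rejecting builds without
CONFIG_RTE_FORCE_INTRINSICS, which the generic __atomic-based implementation
relies on there.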


^ permalink raw reply	[flat|nested] 24+ messages in thread

* [dpdk-dev] [PATCH v3 3/3] test/mcslock: add mcs queued lock unit test
  2019-07-05 10:27   ` [dpdk-dev] [PATCH v3 0/3] MCS queued lock implementation Phil Yang
  2019-07-05 10:27     ` [dpdk-dev] [PATCH v3 1/3] eal/mcslock: add mcs " Phil Yang
  2019-07-05 10:27     ` [dpdk-dev] [PATCH v3 2/3] eal/mcslock: use generic mcs queued lock on all arch Phil Yang
@ 2019-07-05 10:27     ` Phil Yang
  2019-07-07 21:49     ` [dpdk-dev] [PATCH v3 0/3] MCS queued lock implementation Thomas Monjalon
  3 siblings, 0 replies; 24+ messages in thread
From: Phil Yang @ 2019-07-05 10:27 UTC (permalink / raw)
  To: dev
  Cc: thomas, david.marchand, konstantin.ananyev, jerinj,
	hemant.agrawal, Honnappa.Nagarahalli, gavin.hu, nd, phil.yang

Unit test and perf test for MCS queued lock.

Signed-off-by: Phil Yang <phil.yang@arm.com>
Reviewed-by: Gavin Hu <gavin.hu@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>

---
 MAINTAINERS                     |   1 +
 app/test/Makefile               |   1 +
 app/test/autotest_data.py       |   6 +
 app/test/autotest_test_funcs.py |  32 +++++
 app/test/meson.build            |   2 +
 app/test/test_mcslock.c         | 251 ++++++++++++++++++++++++++++++++++++++++
 6 files changed, 293 insertions(+)
 create mode 100644 app/test/test_mcslock.c

diff --git a/MAINTAINERS b/MAINTAINERS
index c6f81f4..f2760f9 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -237,6 +237,7 @@ F: app/test/test_rand_perf.c
 MCSlock - EXPERIMENTAL
 M: Phil Yang <phil.yang@arm.com>
 F: lib/librte_eal/common/include/generic/rte_mcslock.h
+F: app/test/test_mcslock.c
 
 ARM v7
 M: Jan Viktorin <viktorin@rehivetech.com>
diff --git a/app/test/Makefile b/app/test/Makefile
index be0f392..1cea026 100644
--- a/app/test/Makefile
+++ b/app/test/Makefile
@@ -64,6 +64,7 @@ SRCS-y += test_atomic.c
 SRCS-y += test_barrier.c
 SRCS-y += test_malloc.c
 SRCS-y += test_cycles.c
+SRCS-y += test_mcslock.c
 SRCS-y += test_spinlock.c
 SRCS-y += test_ticketlock.c
 SRCS-y += test_memory.c
diff --git a/app/test/autotest_data.py b/app/test/autotest_data.py
index 6cf7eca..e10a60d 100644
--- a/app/test/autotest_data.py
+++ b/app/test/autotest_data.py
@@ -177,6 +177,12 @@
         "Report":  None,
     },
     {
+        "Name":    "MCSlock autotest",
+        "Command": "mcslock_autotest",
+        "Func":    mcslock_autotest,
+        "Report":  None,
+    },
+    {
         "Name":    "Byte order autotest",
         "Command": "byteorder_autotest",
         "Func":    default_autotest,
diff --git a/app/test/autotest_test_funcs.py b/app/test/autotest_test_funcs.py
index 31cc0f5..26688b7 100644
--- a/app/test/autotest_test_funcs.py
+++ b/app/test/autotest_test_funcs.py
@@ -164,6 +164,38 @@ def ticketlock_autotest(child, test_name):
 
     return 0, "Success"
 
+def mcslock_autotest(child, test_name):
+    i = 0
+    ir = 0
+    child.sendline(test_name)
+    while True:
+        index = child.expect(["Test OK",
+                              "Test Failed",
+                              "lcore ([0-9]*) state: ([0-1])",
+                              "MCS lock taken on core ([0-9]*)",
+                              "MCS lock released on core ([0-9]*)",
+                              pexpect.TIMEOUT], timeout=5)
+        # ok
+        if index == 0:
+            break
+
+        # message, check ordering
+        elif index == 2:
+            if int(child.match.groups()[0]) < i:
+                return -1, "Fail [Bad order]"
+            i = int(child.match.groups()[0])
+        elif index == 3:
+            if int(child.match.groups()[0]) < ir:
+                return -1, "Fail [Bad order]"
+            ir = int(child.match.groups()[0])
+
+        # fail
+        elif index == 5:
+            return -1, "Fail [Timeout]"
+        elif index == 1:
+            return -1, "Fail"
+
+    return 0, "Success"
 
 def logs_autotest(child, test_name):
     child.sendline(test_name)
diff --git a/app/test/meson.build b/app/test/meson.build
index 466fd56..fe095cf 100644
--- a/app/test/meson.build
+++ b/app/test/meson.build
@@ -80,6 +80,7 @@ test_sources = files('commands.c',
 	'test_memzone.c',
 	'test_meter.c',
 	'test_metrics.c',
+	'test_mcslock.c',
 	'test_mp_secondary.c',
 	'test_pdump.c',
 	'test_per_lcore.c',
@@ -185,6 +186,7 @@ fast_test_names = [
         'lpm6_autotest',
         'malloc_autotest',
         'mbuf_autotest',
+        'mcslock_autotest',
         'memcpy_autotest',
         'memory_autotest',
         'mempool_autotest',
diff --git a/app/test/test_mcslock.c b/app/test/test_mcslock.c
new file mode 100644
index 0000000..1004563
--- /dev/null
+++ b/app/test/test_mcslock.c
@@ -0,0 +1,251 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2019 Arm Limited
+ */
+
+#include <stdio.h>
+#include <stdint.h>
+#include <inttypes.h>
+#include <string.h>
+#include <unistd.h>
+#include <sys/queue.h>
+
+#include <rte_common.h>
+#include <rte_memory.h>
+#include <rte_per_lcore.h>
+#include <rte_launch.h>
+#include <rte_eal.h>
+#include <rte_lcore.h>
+#include <rte_cycles.h>
+#include <rte_mcslock.h>
+#include <rte_atomic.h>
+
+#include "test.h"
+
+/*
+ * RTE MCS lock test
+ * =================
+ *
+ * These tests are derived from the spin lock test cases.
+ *
+ * - The functional test takes the global lock and launches the
+ *   test_mcslock_per_core() function on each core (except the master).
+ *
+ *   - On each core, the function takes the global lock, prints a message,
+ *     then releases the global lock.
+ *
+ * - A load test is carried out, with all cores attempting to lock a single
+ *   lock multiple times.
+ */
+
+RTE_DEFINE_PER_LCORE(rte_mcslock_t, _ml_me);
+RTE_DEFINE_PER_LCORE(rte_mcslock_t, _ml_try_me);
+RTE_DEFINE_PER_LCORE(rte_mcslock_t, _ml_perf_me);
+
+rte_mcslock_t *p_ml;
+rte_mcslock_t *p_ml_try;
+rte_mcslock_t *p_ml_perf;
+
+static unsigned int count;
+
+static rte_atomic32_t synchro;
+
+static int
+test_mcslock_per_core(__attribute__((unused)) void *arg)
+{
+	/* Per core me node. */
+	rte_mcslock_t ml_me = RTE_PER_LCORE(_ml_me);
+
+	rte_mcslock_lock(&p_ml, &ml_me);
+	printf("MCS lock taken on core %u\n", rte_lcore_id());
+	rte_mcslock_unlock(&p_ml, &ml_me);
+	printf("MCS lock released on core %u\n", rte_lcore_id());
+
+	return 0;
+}
+
+static uint64_t time_count[RTE_MAX_LCORE] = {0};
+
+#define MAX_LOOP 1000000
+
+static int
+load_loop_fn(void *func_param)
+{
+	uint64_t time_diff = 0, begin;
+	uint64_t hz = rte_get_timer_hz();
+	volatile uint64_t lcount = 0;
+	const int use_lock = *(int *)func_param;
+	const unsigned int lcore = rte_lcore_id();
+
+	/* Per-core me node. */
+	rte_mcslock_t ml_perf_me = RTE_PER_LCORE(_ml_perf_me);
+
+	/* wait synchro */
+	while (rte_atomic32_read(&synchro) == 0)
+		;
+
+	begin = rte_get_timer_cycles();
+	while (lcount < MAX_LOOP) {
+		if (use_lock)
+			rte_mcslock_lock(&p_ml_perf, &ml_perf_me);
+
+		lcount++;
+		if (use_lock)
+			rte_mcslock_unlock(&p_ml_perf, &ml_perf_me);
+	}
+	time_diff = rte_get_timer_cycles() - begin;
+	time_count[lcore] = time_diff * 1000000 / hz;
+	return 0;
+}
+
+static int
+test_mcslock_perf(void)
+{
+	unsigned int i;
+	uint64_t total = 0;
+	int lock = 0;
+	const unsigned int lcore = rte_lcore_id();
+
+	printf("\nTest with no lock on single core...\n");
+	rte_atomic32_set(&synchro, 1);
+	load_loop_fn(&lock);
+	printf("Core [%u] Cost Time = %"PRIu64" us\n",
+			lcore, time_count[lcore]);
+	memset(time_count, 0, sizeof(time_count));
+
+	printf("\nTest with lock on single core...\n");
+	lock = 1;
+	rte_atomic32_set(&synchro, 1);
+	load_loop_fn(&lock);
+	printf("Core [%u] Cost Time = %"PRIu64" us\n",
+			lcore, time_count[lcore]);
+	memset(time_count, 0, sizeof(time_count));
+
+	printf("\nTest with lock on %u cores...\n", (rte_lcore_count()));
+
+	rte_atomic32_set(&synchro, 0);
+	rte_eal_mp_remote_launch(load_loop_fn, &lock, SKIP_MASTER);
+
+	/* start synchro and launch test on master */
+	rte_atomic32_set(&synchro, 1);
+	load_loop_fn(&lock);
+
+	rte_eal_mp_wait_lcore();
+
+	RTE_LCORE_FOREACH(i) {
+		printf("Core [%u] Cost Time = %"PRIu64" us\n",
+				i, time_count[i]);
+		total += time_count[i];
+	}
+
+	printf("Total Cost Time = %"PRIu64" us\n", total);
+
+	return 0;
+}
+
+/*
+ * Use rte_mcslock_trylock() to try to take an MCS lock object.
+ * If the lock cannot be taken, the call returns 0 immediately
+ * instead of blocking.
+ */
+static int
+test_mcslock_try(__attribute__((unused)) void *arg)
+{
+	/* Per-core me nodes. */
+	rte_mcslock_t ml_me     = RTE_PER_LCORE(_ml_me);
+	rte_mcslock_t ml_try_me = RTE_PER_LCORE(_ml_try_me);
+
+	/* The master lcore holds p_ml_try, so try-locking it from a
+	 * slave lcore must fail.
+	 */
+	if (rte_mcslock_trylock(&p_ml_try, &ml_try_me) == 0) {
+		rte_mcslock_lock(&p_ml, &ml_me);
+		count++;
+		rte_mcslock_unlock(&p_ml, &ml_me);
+	}
+
+	return 0;
+}
+
+
+/*
+ * Test rte_eal_get_lcore_state() in addition to mcs locks
+ * as we have "waiting" then "running" lcores.
+ */
+static int
+test_mcslock(void)
+{
+	int ret = 0;
+	int i;
+
+	/* Define per core me node. */
+	rte_mcslock_t ml_me     = RTE_PER_LCORE(_ml_me);
+	rte_mcslock_t ml_try_me = RTE_PER_LCORE(_ml_try_me);
+
+	/*
+	 * Test mcs lock & unlock on each core
+	 */
+
+	/* slave cores should be waiting: print it */
+	RTE_LCORE_FOREACH_SLAVE(i) {
+		printf("lcore %d state: %d\n", i,
+				(int) rte_eal_get_lcore_state(i));
+	}
+
+	rte_mcslock_lock(&p_ml, &ml_me);
+
+	RTE_LCORE_FOREACH_SLAVE(i) {
+		rte_eal_remote_launch(test_mcslock_per_core, NULL, i);
+	}
+
+	/* slave cores should be busy: print it */
+	RTE_LCORE_FOREACH_SLAVE(i) {
+		printf("lcore %d state: %d\n", i,
+				(int) rte_eal_get_lcore_state(i));
+	}
+
+	rte_mcslock_unlock(&p_ml, &ml_me);
+
+	rte_eal_mp_wait_lcore();
+
+	/*
+	 * Test that try-locking a locked object returns immediately.
+	 * The master lcore locks the mcs lock object first, then launches
+	 * all the slave lcores to trylock the same object.
+	 * Every slave lcore should fail the trylock, give up immediately,
+	 * and increment the zero-initialized "count" once.
+	 * If "count" finally equals the number of slave lcores, try-locking
+	 * a locked mcslock object behaves correctly.
+	 */
+	if (rte_mcslock_trylock(&p_ml_try, &ml_try_me) == 0)
+		return -1;
+
+	count = 0;
+	RTE_LCORE_FOREACH_SLAVE(i) {
+		rte_eal_remote_launch(test_mcslock_try, NULL, i);
+	}
+	rte_eal_mp_wait_lcore();
+	rte_mcslock_unlock(&p_ml_try, &ml_try_me);
+
+	/* Test is_locked API */
+	if (rte_mcslock_is_locked(p_ml)) {
+		printf("mcslock is locked but it should not be\n");
+		return -1;
+	}
+
+	/* Check that every slave lcore failed its trylock exactly once. */
+	rte_mcslock_lock(&p_ml, &ml_me);
+	if (count != (rte_lcore_count() - 1))
+		ret = -1;
+	rte_mcslock_unlock(&p_ml, &ml_me);
+
+	/* mcs lock perf test */
+	if (test_mcslock_perf() < 0)
+		return -1;
+
+	return ret;
+}
+
+REGISTER_TEST_COMMAND(mcslock_autotest, test_mcslock);
-- 
2.7.4
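
To reproduce these tests by hand: the test registers itself as
"mcslock_autotest" via REGISTER_TEST_COMMAND() above, so it can be run from
the interactive test application's RTE>> prompt (the binary path below
assumes a meson build directory named "build" and may differ per setup):

	$ ./build/app/test/dpdk-test
	RTE>> mcslock_autotest

The pexpect harness added to autotest_test_funcs.py drives the same command
and additionally checks that the per-lcore state and lock messages arrive
in increasing lcore order.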


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [dpdk-dev] [PATCH v1 0/3] MCS queued lock implementation
  2019-07-04 20:12 ` Thomas Monjalon
@ 2019-07-05 10:33   ` Phil Yang (Arm Technology China)
  0 siblings, 0 replies; 24+ messages in thread
From: Phil Yang (Arm Technology China) @ 2019-07-05 10:33 UTC (permalink / raw)
  To: thomas, Honnappa Nagarahalli
  Cc: dev, jerinj, hemant.agrawal, Gavin Hu (Arm Technology China), nd, nd

> -----Original Message-----
> From: Thomas Monjalon <thomas@monjalon.net>
> Sent: Friday, July 5, 2019 4:12 AM
> To: Phil Yang (Arm Technology China) <Phil.Yang@arm.com>; Honnappa
> Nagarahalli <Honnappa.Nagarahalli@arm.com>
> Cc: dev@dpdk.org; jerinj@marvell.com; hemant.agrawal@nxp.com; Gavin
> Hu (Arm Technology China) <Gavin.Hu@arm.com>; nd <nd@arm.com>
> Subject: Re: [dpdk-dev] [PATCH v1 0/3] MCS queued lock implementation
> 
> 05/06/2019 17:58, Phil Yang:
> > This patch set added MCS lock library and its unit test.
> >
> > The MCS lock (proposed by JOHN M. MELLOR-CRUMMEY and MICHAEL L.
> SCOTT) provides
> > scalability by spinning on a CPU/thread local variable which avoids
> expensive
> > cache bouncings. It provides fairness by maintaining a list of acquirers and
> > passing the lock to each CPU/thread in the order they acquired the lock.
> 
> What is the plan for this patch?
I have reworked this patch and addressed all the comments in the previous version. Please review the v3. 

> Do we try to get more reviews for a merge in 19.08-rc2?
> Or we target 19.11?
> 
Thanks,
Phil

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [dpdk-dev] [PATCH v3 0/3] MCS queued lock implementation
  2019-07-05 10:27   ` [dpdk-dev] [PATCH v3 0/3] MCS queued lock implementation Phil Yang
                       ` (2 preceding siblings ...)
  2019-07-05 10:27     ` [dpdk-dev] [PATCH v3 3/3] test/mcslock: add mcs queued lock unit test Phil Yang
@ 2019-07-07 21:49     ` Thomas Monjalon
  3 siblings, 0 replies; 24+ messages in thread
From: Thomas Monjalon @ 2019-07-07 21:49 UTC (permalink / raw)
  To: Phil Yang
  Cc: dev, david.marchand, konstantin.ananyev, jerinj, hemant.agrawal,
	Honnappa.Nagarahalli, gavin.hu, nd

05/07/2019 12:27, Phil Yang:
> Phil Yang (3):
>   eal/mcslock: add mcs queued lock implementation
>   eal/mcslock: use generic mcs queued lock on all arch
>   test/mcslock: add mcs queued lock unit test

Applied, thanks




^ permalink raw reply	[flat|nested] 24+ messages in thread

end of thread, other threads:[~2019-07-07 21:49 UTC | newest]

Thread overview: 24+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-06-05 15:58 [dpdk-dev] [PATCH v1 0/3] MCS queued lock implementation Phil Yang
2019-06-05 15:58 ` [dpdk-dev] [PATCH v1 1/3] eal/mcslock: add mcs " Phil Yang
2019-07-05  9:56   ` [dpdk-dev] [PATCH v2 0/3] MCS " Phil Yang
2019-07-05  9:56     ` [dpdk-dev] [PATCH v2 1/3] eal/mcslock: add mcs " Phil Yang
2019-07-05  9:56     ` [dpdk-dev] [PATCH v2 2/3] eal/mcslock: use generic mcs queued lock on all arch Phil Yang
2019-07-05  9:56     ` [dpdk-dev] [PATCH v2 3/3] test/mcslock: add mcs queued lock unit test Phil Yang
2019-07-05 10:27   ` [dpdk-dev] [PATCH v3 0/3] MCS queued lock implementation Phil Yang
2019-07-05 10:27     ` [dpdk-dev] [PATCH v3 1/3] eal/mcslock: add mcs " Phil Yang
2019-07-05 10:27     ` [dpdk-dev] [PATCH v3 2/3] eal/mcslock: use generic mcs queued lock on all arch Phil Yang
2019-07-05 10:27     ` [dpdk-dev] [PATCH v3 3/3] test/mcslock: add mcs queued lock unit test Phil Yang
2019-07-07 21:49     ` [dpdk-dev] [PATCH v3 0/3] MCS queued lock implementation Thomas Monjalon
2019-06-05 15:58 ` [dpdk-dev] [PATCH v1 2/3] eal/mcslock: use generic mcs queued lock on all arch Phil Yang
2019-06-05 15:58 ` [dpdk-dev] [PATCH v1 3/3] test/mcslock: add mcs queued lock unit test Phil Yang
2019-06-06 13:42   ` Ananyev, Konstantin
2019-06-07  5:27     ` Honnappa Nagarahalli
2019-06-10 16:36       ` Phil Yang (Arm Technology China)
2019-06-05 16:29 ` [dpdk-dev] [PATCH v1 0/3] MCS queued lock implementation David Marchand
2019-06-05 19:59   ` Honnappa Nagarahalli
2019-06-06 10:17   ` Phil Yang (Arm Technology China)
2019-06-05 16:47 ` Stephen Hemminger
2019-06-05 20:48   ` Honnappa Nagarahalli
2019-06-05 17:35 ` Thomas Monjalon
2019-07-04 20:12 ` Thomas Monjalon
2019-07-05 10:33   ` Phil Yang (Arm Technology China)

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).