DPDK patches and discussions
From: Phil Yang <phil.yang@arm.com>
To: dev@dpdk.org
Cc: thomas@monjalon.net, david.marchand@redhat.com,
	konstantin.ananyev@intel.com, jerinj@marvell.com,
	hemant.agrawal@nxp.com, Honnappa.Nagarahalli@arm.com,
	gavin.hu@arm.com, nd@arm.com, phil.yang@arm.com
Subject: [dpdk-dev] [PATCH v3 1/3] eal/mcslock: add mcs queued lock implementation
Date: Fri,  5 Jul 2019 18:27:06 +0800
Message-ID: <1562322429-18635-2-git-send-email-phil.yang@arm.com>
In-Reply-To: <1562322429-18635-1-git-send-email-phil.yang@arm.com>

If there are multiple threads contending, they all attempt to take the
spinlock at the same time once it is released. This results in a
large amount of processor bus traffic, which is a major performance
killer. Thus, if we order the lock-takers so that they know who
is next in line for the resource, we can vastly reduce the amount of
bus traffic.

This patch adds the MCS lock library. It provides scalability by
spinning on a CPU/thread-local variable, which avoids expensive cache
bouncing. It provides fairness by maintaining a list of acquirers and
passing the lock to each CPU/thread in the order in which it requested
the lock.

Signed-off-by: Phil Yang <phil.yang@arm.com>
Reviewed-by: Steve Capper <steve.capper@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Gavin Hu <gavin.hu@arm.com>

---
 MAINTAINERS                                        |   4 +
 doc/api/doxy-api-index.md                          |   1 +
 doc/guides/rel_notes/release_19_08.rst             |   6 +
 lib/librte_eal/common/Makefile                     |   2 +-
 .../common/include/generic/rte_mcslock.h           | 179 +++++++++++++++++++++
 lib/librte_eal/common/meson.build                  |   1 +
 6 files changed, 192 insertions(+), 1 deletion(-)
 create mode 100644 lib/librte_eal/common/include/generic/rte_mcslock.h

diff --git a/MAINTAINERS b/MAINTAINERS
index 6054220..c6f81f4 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -234,6 +234,10 @@ F: lib/librte_eal/common/include/rte_random.h
 F: lib/librte_eal/common/rte_random.c
 F: app/test/test_rand_perf.c
 
+MCSlock - EXPERIMENTAL
+M: Phil Yang <phil.yang@arm.com>
+F: lib/librte_eal/common/include/generic/rte_mcslock.h
+
 ARM v7
 M: Jan Viktorin <viktorin@rehivetech.com>
 M: Gavin Hu <gavin.hu@arm.com>
diff --git a/doc/api/doxy-api-index.md b/doc/api/doxy-api-index.md
index 715248d..d0e32b1 100644
--- a/doc/api/doxy-api-index.md
+++ b/doc/api/doxy-api-index.md
@@ -63,6 +63,7 @@ The public API headers are grouped by topics:
 
 - **locks**:
   [atomic]             (@ref rte_atomic.h),
+  [mcslock]            (@ref rte_mcslock.h),
   [rwlock]             (@ref rte_rwlock.h),
   [spinlock]           (@ref rte_spinlock.h),
   [ticketlock]         (@ref rte_ticketlock.h),
diff --git a/doc/guides/rel_notes/release_19_08.rst b/doc/guides/rel_notes/release_19_08.rst
index 21934bf..5afaeab 100644
--- a/doc/guides/rel_notes/release_19_08.rst
+++ b/doc/guides/rel_notes/release_19_08.rst
@@ -139,6 +139,12 @@ New Features
   Added telemetry mode to l3fwd-power application to report
   application level busyness, empty and full polls of rte_eth_rx_burst().
 
+* **Added MCS lock library.**
+
+  Added the MCS lock library. It provides scalability by spinning on a
+  CPU/thread-local variable, which avoids expensive cache bouncing.
+  It provides fairness by maintaining a list of acquirers and passing
+  the lock to each CPU/thread in the order in which it requested the lock.
 
 Removed Items
 -------------
diff --git a/lib/librte_eal/common/Makefile b/lib/librte_eal/common/Makefile
index 1647af7..a00d4fc 100644
--- a/lib/librte_eal/common/Makefile
+++ b/lib/librte_eal/common/Makefile
@@ -21,7 +21,7 @@ INC += rte_reciprocal.h rte_fbarray.h rte_uuid.h
 
 GENERIC_INC := rte_atomic.h rte_byteorder.h rte_cycles.h rte_prefetch.h
 GENERIC_INC += rte_memcpy.h rte_cpuflags.h
-GENERIC_INC += rte_spinlock.h rte_rwlock.h rte_ticketlock.h
+GENERIC_INC += rte_mcslock.h rte_spinlock.h rte_rwlock.h rte_ticketlock.h
 GENERIC_INC += rte_vect.h rte_pause.h rte_io.h
 
 # defined in mk/arch/$(RTE_ARCH)/rte.vars.mk
diff --git a/lib/librte_eal/common/include/generic/rte_mcslock.h b/lib/librte_eal/common/include/generic/rte_mcslock.h
new file mode 100644
index 0000000..2bef283
--- /dev/null
+++ b/lib/librte_eal/common/include/generic/rte_mcslock.h
@@ -0,0 +1,179 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2019 Arm Limited
+ */
+
+#ifndef _RTE_MCSLOCK_H_
+#define _RTE_MCSLOCK_H_
+
+/**
+ * @file
+ *
+ * RTE MCS lock
+ *
+ * This file defines the main data structure and APIs for the MCS queued lock.
+ *
+ * The MCS lock (proposed by John M. Mellor-Crummey and Michael L. Scott)
+ * provides scalability by spinning on a CPU/thread-local variable, which
+ * avoids expensive cache bouncing. It provides fairness by maintaining
+ * a list of acquirers and passing the lock to each CPU/thread in the
+ * order in which it requested the lock.
+ */
+
+#include <rte_lcore.h>
+#include <rte_common.h>
+#include <rte_pause.h>
+
+/**
+ * The rte_mcslock_t type.
+ */
+typedef struct rte_mcslock {
+	struct rte_mcslock *next;
+	int locked; /* 1 while this node waits for the lock, 0 when granted */
+} rte_mcslock_t;
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: This API may change without prior notice
+ *
+ * Take the MCS lock.
+ *
+ * @param msl
+ *   A pointer to the pointer of an MCS lock.
+ *   When the lock is initialized or declared, the msl pointer should be
+ *   set to NULL.
+ * @param me
+ *   A pointer to a new node of the MCS lock. Each CPU/thread acquiring
+ *   the lock should use its own node.
+ */
+__rte_experimental
+static inline void
+rte_mcslock_lock(rte_mcslock_t **msl, rte_mcslock_t *me)
+{
+	rte_mcslock_t *prev;
+
+	/* Init me node */
+	__atomic_store_n(&me->locked, 1, __ATOMIC_RELAXED);
+	__atomic_store_n(&me->next, NULL, __ATOMIC_RELAXED);
+
+	/* If the queue is empty, the exchange operation is enough to acquire
+	 * the lock. Hence, the exchange operation requires acquire semantics.
+	 * The store to me->next above should complete before the node is
+	 * visible to other CPUs/threads. Hence, the exchange operation requires
+	 * release semantics as well.
+	 */
+	prev = __atomic_exchange_n(msl, me, __ATOMIC_ACQ_REL);
+	if (likely(prev == NULL)) {
+		/* Queue was empty, no further action required,
+		 * proceed with lock taken.
+		 */
+		return;
+	}
+	__atomic_store_n(&prev->next, me, __ATOMIC_RELAXED);
+
+	/* The while-load of me->locked should not move above the previous
+	 * store to prev->next. Otherwise it will cause a deadlock. Need a
+	 * store-load barrier.
+	 */
+	__atomic_thread_fence(__ATOMIC_ACQ_REL);
+	/* The lock has already been acquired; the exchange above atomically
+	 * placed this node at the end of the queue. Proceed to spin on
+	 * me->locked until the previous lock holder resets it in
+	 * rte_mcslock_unlock().
+	 */
+	while (__atomic_load_n(&me->locked, __ATOMIC_ACQUIRE))
+		rte_pause();
+}
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: This API may change without prior notice
+ *
+ * Release the MCS lock.
+ *
+ * @param msl
+ *   A pointer to the pointer of an MCS lock.
+ * @param me
+ *   A pointer to the node of the MCS lock passed in rte_mcslock_lock().
+ */
+__rte_experimental
+static inline void
+rte_mcslock_unlock(rte_mcslock_t **msl, rte_mcslock_t *me)
+{
+	/* Check if there are more nodes in the queue. */
+	if (likely(__atomic_load_n(&me->next, __ATOMIC_RELAXED) == NULL)) {
+		/* No, last member in the queue. */
+		rte_mcslock_t *save_me = me;
+
+		/* Release the lock by setting it to NULL */
+		if (likely(__atomic_compare_exchange_n(msl, &save_me, NULL, 0,
+				__ATOMIC_RELEASE, __ATOMIC_RELAXED)))
+			return;
+
+		/* Speculative execution could perform the load in the
+		 * while-loop below early, which has the potential to
+		 * cause a deadlock. Need a load barrier.
+		 */
+		__atomic_thread_fence(__ATOMIC_ACQUIRE);
+		/* More nodes added to the queue by other CPUs.
+		 * Wait until the next pointer is set.
+		 */
+		while (__atomic_load_n(&me->next, __ATOMIC_RELAXED) == NULL)
+			rte_pause();
+	}
+
+	/* Pass lock to next waiter. */
+	__atomic_store_n(&me->next->locked, 0, __ATOMIC_RELEASE);
+}
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: This API may change without prior notice
+ *
+ * Try to take the lock.
+ *
+ * @param msl
+ *   A pointer to the pointer of an MCS lock.
+ * @param me
+ *   A pointer to a new node of the MCS lock.
+ * @return
+ *   1 if the lock is successfully taken; 0 otherwise.
+ */
+__rte_experimental
+static inline int
+rte_mcslock_trylock(rte_mcslock_t **msl, rte_mcslock_t *me)
+{
+	/* Init me node */
+	__atomic_store_n(&me->next, NULL, __ATOMIC_RELAXED);
+
+	/* Try to lock */
+	rte_mcslock_t *expected = NULL;
+
+	/* The lock can be taken only when the queue is empty. Hence,
+	 * the compare-exchange operation requires acquire semantics.
+	 * The store to me->next above should complete before the node
+	 * is visible to other CPUs/threads. Hence, the compare-exchange
+	 * operation requires release semantics as well.
+	 */
+	return __atomic_compare_exchange_n(msl, &expected, me, 0,
+			__ATOMIC_ACQ_REL, __ATOMIC_RELAXED);
+}
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: This API may change without prior notice
+ *
+ * Test if the lock is taken.
+ *
+ * @param msl
+ *   A pointer to an MCS lock node.
+ * @return
+ *   1 if the lock is currently taken; 0 otherwise.
+ */
+__rte_experimental
+static inline int
+rte_mcslock_is_locked(rte_mcslock_t *msl)
+{
+	return (msl != NULL);
+}
+
+#endif /* _RTE_MCSLOCK_H_ */
diff --git a/lib/librte_eal/common/meson.build b/lib/librte_eal/common/meson.build
index bafd232..a54ece8 100644
--- a/lib/librte_eal/common/meson.build
+++ b/lib/librte_eal/common/meson.build
@@ -95,6 +95,7 @@ generic_headers = files(
 	'include/generic/rte_cpuflags.h',
 	'include/generic/rte_cycles.h',
 	'include/generic/rte_io.h',
+	'include/generic/rte_mcslock.h',
 	'include/generic/rte_memcpy.h',
 	'include/generic/rte_pause.h',
 	'include/generic/rte_prefetch.h',
-- 
2.7.4


Thread overview: 24+ messages
2019-06-05 15:58 [dpdk-dev] [PATCH v1 0/3] MCS " Phil Yang
2019-06-05 15:58 ` [dpdk-dev] [PATCH v1 1/3] eal/mcslock: add mcs " Phil Yang
2019-07-05  9:56   ` [dpdk-dev] [PATCH v2 0/3] MCS " Phil Yang
2019-07-05  9:56     ` [dpdk-dev] [PATCH v2 1/3] eal/mcslock: add mcs " Phil Yang
2019-07-05  9:56     ` [dpdk-dev] [PATCH v2 2/3] eal/mcslock: use generic msc queued lock on all arch Phil Yang
2019-07-05  9:56     ` [dpdk-dev] [PATCH v2 3/3] test/mcslock: add mcs queued lock unit test Phil Yang
2019-07-05 10:27   ` [dpdk-dev] [PATCH v3 0/3] MCS queued lock implementation Phil Yang
2019-07-05 10:27     ` Phil Yang [this message]
2019-07-05 10:27     ` [dpdk-dev] [PATCH v3 2/3] eal/mcslock: use generic msc queued lock on all arch Phil Yang
2019-07-05 10:27     ` [dpdk-dev] [PATCH v3 3/3] test/mcslock: add mcs queued lock unit test Phil Yang
2019-07-07 21:49     ` [dpdk-dev] [PATCH v3 0/3] MCS queued lock implementation Thomas Monjalon
2019-06-05 15:58 ` [dpdk-dev] [PATCH v1 2/3] eal/mcslock: use generic msc queued lock on all arch Phil Yang
2019-06-05 15:58 ` [dpdk-dev] [PATCH v1 3/3] test/mcslock: add mcs queued lock unit test Phil Yang
2019-06-06 13:42   ` Ananyev, Konstantin
2019-06-07  5:27     ` Honnappa Nagarahalli
2019-06-10 16:36       ` Phil Yang (Arm Technology China)
2019-06-05 16:29 ` [dpdk-dev] [PATCH v1 0/3] MCS queued lock implementation David Marchand
2019-06-05 19:59   ` Honnappa Nagarahalli
2019-06-06 10:17   ` Phil Yang (Arm Technology China)
2019-06-05 16:47 ` Stephen Hemminger
2019-06-05 20:48   ` Honnappa Nagarahalli
2019-06-05 17:35 ` Thomas Monjalon
2019-07-04 20:12 ` Thomas Monjalon
2019-07-05 10:33   ` Phil Yang (Arm Technology China)
