From: Gage Eads <gage.eads@intel.com>
To: dev@dpdk.org
Cc: olivier.matz@6wind.com, arybchenko@solarflare.com,
bruce.richardson@intel.com, konstantin.ananyev@intel.com
Subject: [dpdk-dev] [PATCH v2 2/2] mempool/nb_stack: add non-blocking stack mempool
Date: Tue, 15 Jan 2019 16:32:32 -0600 [thread overview]
Message-ID: <20190115223232.31866-3-gage.eads@intel.com> (raw)
In-Reply-To: <20190115223232.31866-1-gage.eads@intel.com>
This commit adds support for a non-blocking (linked-list-based) stack mempool
handler. The stack uses a 128-bit compare-and-swap instruction and is therefore
limited to x86_64. The 128-bit CAS atomically updates both the stack top
pointer and a modification counter, which protects against the ABA problem.
In mempool_perf_autotest the lock-based stack outperforms the non-blocking
handler*; however:
- For applications with preemptible pthreads, a lock-based stack's
worst-case performance (i.e. one thread being preempted while
holding the spinlock) is much worse than the non-blocking stack's.
- Using per-thread mempool caches will largely mitigate the performance
difference.
*Test setup: x86_64 build with default config, dual-socket Xeon E5-2699 v4,
running on isolcpus cores with a tickless scheduler. The lock-based stack's
rate_persec was 1x-3.5x the non-blocking stack's.
Signed-off-by: Gage Eads <gage.eads@intel.com>
---
MAINTAINERS | 4 +
config/common_base | 1 +
doc/guides/prog_guide/env_abstraction_layer.rst | 5 +
drivers/mempool/Makefile | 3 +
drivers/mempool/meson.build | 5 +
drivers/mempool/nb_stack/Makefile | 23 ++++
drivers/mempool/nb_stack/meson.build | 4 +
drivers/mempool/nb_stack/nb_lifo.h | 147 +++++++++++++++++++++
drivers/mempool/nb_stack/rte_mempool_nb_stack.c | 125 ++++++++++++++++++
.../nb_stack/rte_mempool_nb_stack_version.map | 4 +
mk/rte.app.mk | 7 +-
11 files changed, 326 insertions(+), 2 deletions(-)
create mode 100644 drivers/mempool/nb_stack/Makefile
create mode 100644 drivers/mempool/nb_stack/meson.build
create mode 100644 drivers/mempool/nb_stack/nb_lifo.h
create mode 100644 drivers/mempool/nb_stack/rte_mempool_nb_stack.c
create mode 100644 drivers/mempool/nb_stack/rte_mempool_nb_stack_version.map
diff --git a/MAINTAINERS b/MAINTAINERS
index 470f36b9c..5519d3323 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -416,6 +416,10 @@ M: Artem V. Andreev <artem.andreev@oktetlabs.ru>
M: Andrew Rybchenko <arybchenko@solarflare.com>
F: drivers/mempool/bucket/
+Non-blocking stack memory pool
+M: Gage Eads <gage.eads@intel.com>
+F: drivers/mempool/nb_stack/
+
Bus Drivers
-----------
diff --git a/config/common_base b/config/common_base
index 964a6956e..8a51f36b1 100644
--- a/config/common_base
+++ b/config/common_base
@@ -726,6 +726,7 @@ CONFIG_RTE_LIBRTE_MEMPOOL_DEBUG=n
#
CONFIG_RTE_DRIVER_MEMPOOL_BUCKET=y
CONFIG_RTE_DRIVER_MEMPOOL_BUCKET_SIZE_KB=64
+CONFIG_RTE_DRIVER_MEMPOOL_NB_STACK=y
CONFIG_RTE_DRIVER_MEMPOOL_RING=y
CONFIG_RTE_DRIVER_MEMPOOL_STACK=y
diff --git a/doc/guides/prog_guide/env_abstraction_layer.rst b/doc/guides/prog_guide/env_abstraction_layer.rst
index 929d76dba..9497b879c 100644
--- a/doc/guides/prog_guide/env_abstraction_layer.rst
+++ b/doc/guides/prog_guide/env_abstraction_layer.rst
@@ -541,6 +541,11 @@ Known Issues
5. It MUST not be used by multi-producer/consumer pthreads, whose scheduling policies are SCHED_FIFO or SCHED_RR.
+ Alternatively, x86_64 applications can use the non-blocking stack mempool handler. When considering this handler, note that:
+
+ - it is limited to the x86_64 platform, because it uses an instruction (16-byte compare-and-swap) that is not available on other platforms.
+ - it has worse average-case performance than the non-preemptive rte_ring, but software caching (e.g. the mempool cache) can mitigate this by reducing the number of handler operations.
+
+ rte_timer
Running ``rte_timer_manage()`` on a non-EAL pthread is not allowed. However, resetting/stopping the timer from a non-EAL pthread is allowed.
diff --git a/drivers/mempool/Makefile b/drivers/mempool/Makefile
index 28c2e8360..895cf8a34 100644
--- a/drivers/mempool/Makefile
+++ b/drivers/mempool/Makefile
@@ -10,6 +10,9 @@ endif
ifeq ($(CONFIG_RTE_EAL_VFIO)$(CONFIG_RTE_LIBRTE_FSLMC_BUS),yy)
DIRS-$(CONFIG_RTE_LIBRTE_DPAA2_MEMPOOL) += dpaa2
endif
+ifeq ($(CONFIG_RTE_ARCH_X86_64),y)
+DIRS-$(CONFIG_RTE_DRIVER_MEMPOOL_NB_STACK) += nb_stack
+endif
DIRS-$(CONFIG_RTE_DRIVER_MEMPOOL_RING) += ring
DIRS-$(CONFIG_RTE_DRIVER_MEMPOOL_STACK) += stack
DIRS-$(CONFIG_RTE_LIBRTE_OCTEONTX_MEMPOOL) += octeontx
diff --git a/drivers/mempool/meson.build b/drivers/mempool/meson.build
index 4527d9806..01ee30fee 100644
--- a/drivers/mempool/meson.build
+++ b/drivers/mempool/meson.build
@@ -2,6 +2,11 @@
# Copyright(c) 2017 Intel Corporation
drivers = ['bucket', 'dpaa', 'dpaa2', 'octeontx', 'ring', 'stack']
+
+if dpdk_conf.has('RTE_ARCH_X86_64')
+ drivers += 'nb_stack'
+endif
+
std_deps = ['mempool']
config_flag_fmt = 'RTE_LIBRTE_@0@_MEMPOOL'
driver_name_fmt = 'rte_mempool_@0@'
diff --git a/drivers/mempool/nb_stack/Makefile b/drivers/mempool/nb_stack/Makefile
new file mode 100644
index 000000000..318b18283
--- /dev/null
+++ b/drivers/mempool/nb_stack/Makefile
@@ -0,0 +1,23 @@
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright(c) 2019 Intel Corporation
+
+include $(RTE_SDK)/mk/rte.vars.mk
+
+#
+# library name
+#
+LIB = librte_mempool_nb_stack.a
+
+CFLAGS += -O3
+CFLAGS += $(WERROR_FLAGS)
+
+# Headers
+LDLIBS += -lrte_eal -lrte_mempool
+
+EXPORT_MAP := rte_mempool_nb_stack_version.map
+
+LIBABIVER := 1
+
+SRCS-$(CONFIG_RTE_DRIVER_MEMPOOL_NB_STACK) += rte_mempool_nb_stack.c
+
+include $(RTE_SDK)/mk/rte.lib.mk
diff --git a/drivers/mempool/nb_stack/meson.build b/drivers/mempool/nb_stack/meson.build
new file mode 100644
index 000000000..66d64a9ba
--- /dev/null
+++ b/drivers/mempool/nb_stack/meson.build
@@ -0,0 +1,4 @@
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright(c) 2019 Intel Corporation
+
+sources = files('rte_mempool_nb_stack.c')
diff --git a/drivers/mempool/nb_stack/nb_lifo.h b/drivers/mempool/nb_stack/nb_lifo.h
new file mode 100644
index 000000000..2edae1c0f
--- /dev/null
+++ b/drivers/mempool/nb_stack/nb_lifo.h
@@ -0,0 +1,147 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2019 Intel Corporation
+ */
+
+#ifndef _NB_LIFO_H_
+#define _NB_LIFO_H_
+
+struct nb_lifo_elem {
+ void *data;
+ struct nb_lifo_elem *next;
+};
+
+struct nb_lifo_head {
+ struct nb_lifo_elem *top; /**< Stack top */
+ uint64_t cnt; /**< Modification counter */
+};
+
+struct nb_lifo {
+ volatile struct nb_lifo_head head __rte_aligned(16);
+ rte_atomic64_t len;
+} __rte_cache_aligned;
+
+static __rte_always_inline void
+nb_lifo_init(struct nb_lifo *lifo)
+{
+ memset(lifo, 0, sizeof(*lifo));
+ rte_atomic64_set(&lifo->len, 0);
+}
+
+static __rte_always_inline unsigned int
+nb_lifo_len(struct nb_lifo *lifo)
+{
+ /* nb_lifo_push() and nb_lifo_pop() do not update the list's contents
+ * and lifo->len atomically, which can cause the list to appear shorter
+ * than it actually is if this function is called while other threads
+ * are modifying the list.
+ *
+ * However, given the inherently approximate nature of the get_count
+ * callback -- even if the list and its size were updated atomically,
+ * the size could change between when get_count executes and when the
+ * value is returned to the caller -- this is acceptable.
+ *
+ * The lifo->len updates are placed such that the list may appear to
+ * have fewer elements than it does, but will never appear to have more
+ * elements. If the mempool is near-empty to the point that this is a
+ * concern, the user should consider increasing the mempool size.
+ */
+ return (unsigned int)rte_atomic64_read(&lifo->len);
+}
+
+static __rte_always_inline void
+nb_lifo_push(struct nb_lifo *lifo,
+ struct nb_lifo_elem *first,
+ struct nb_lifo_elem *last,
+ unsigned int num)
+{
+ while (1) {
+ struct nb_lifo_head old_head, new_head;
+
+ old_head = lifo->head;
+
+ /* Swing the top pointer to the first element in the list and
+ * make the last element point to the old top.
+ */
+ new_head.top = first;
+ new_head.cnt = old_head.cnt + 1;
+
+ last->next = old_head.top;
+
+ if (rte_atomic128_cmpset((volatile uint64_t *) &lifo->head,
+ (uint64_t *)&old_head,
+ (uint64_t *)&new_head))
+ break;
+ }
+
+ rte_atomic64_add(&lifo->len, num);
+}
+
+static __rte_always_inline void
+nb_lifo_push_single(struct nb_lifo *lifo, struct nb_lifo_elem *elem)
+{
+ nb_lifo_push(lifo, elem, elem, 1);
+}
+
+static __rte_always_inline struct nb_lifo_elem *
+nb_lifo_pop(struct nb_lifo *lifo,
+ unsigned int num,
+ void **obj_table,
+ struct nb_lifo_elem **last)
+{
+ struct nb_lifo_head old_head;
+
+ /* Reserve num elements, if available */
+ while (1) {
+ uint64_t len = rte_atomic64_read(&lifo->len);
+
+ /* Does the list contain enough elements? */
+ if (len < num)
+ return NULL;
+
+ if (rte_atomic64_cmpset((volatile uint64_t *)&lifo->len,
+ len, len - num))
+ break;
+ }
+
+ /* Pop num elements */
+ while (1) {
+ struct nb_lifo_head new_head;
+ struct nb_lifo_elem *tmp;
+ unsigned int i;
+
+ old_head = lifo->head;
+
+ tmp = old_head.top;
+
+ /* Traverse the list to find the new head. A next pointer will
+ * either point to another element or NULL; if a thread
+ * encounters a pointer that has already been popped, the CAS
+ * will fail.
+ */
+ for (i = 0; i < num && tmp != NULL; i++) {
+ if (obj_table)
+ obj_table[i] = tmp->data;
+ if (last)
+ *last = tmp;
+ tmp = tmp->next;
+ }
+
+ /* If NULL was encountered, the list was modified while
+ * traversing it. Retry.
+ */
+ if (i != num)
+ continue;
+
+ new_head.top = tmp;
+ new_head.cnt = old_head.cnt + 1;
+
+ if (rte_atomic128_cmpset((volatile uint64_t *) &lifo->head,
+ (uint64_t *)&old_head,
+ (uint64_t *)&new_head))
+ break;
+ }
+
+ return old_head.top;
+}
+
+#endif /* _NB_LIFO_H_ */
diff --git a/drivers/mempool/nb_stack/rte_mempool_nb_stack.c b/drivers/mempool/nb_stack/rte_mempool_nb_stack.c
new file mode 100644
index 000000000..1818a2cfa
--- /dev/null
+++ b/drivers/mempool/nb_stack/rte_mempool_nb_stack.c
@@ -0,0 +1,125 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2019 Intel Corporation
+ */
+
+#include <stdio.h>
+#include <rte_mempool.h>
+#include <rte_malloc.h>
+
+#include "nb_lifo.h"
+
+struct rte_mempool_nb_stack {
+ uint64_t size;
+ struct nb_lifo used_lifo; /**< LIFO containing mempool pointers */
+ struct nb_lifo free_lifo; /**< LIFO containing unused LIFO elements */
+};
+
+static int
+nb_stack_alloc(struct rte_mempool *mp)
+{
+ struct rte_mempool_nb_stack *s;
+ struct nb_lifo_elem *elems;
+ unsigned int n = mp->size;
+ unsigned int size, i;
+
+ size = sizeof(*s) + n * sizeof(struct nb_lifo_elem);
+
+ /* Allocate our local memory structure */
+ s = rte_zmalloc_socket("mempool-nb_stack",
+ size,
+ RTE_CACHE_LINE_SIZE,
+ mp->socket_id);
+ if (s == NULL) {
+ RTE_LOG(ERR, MEMPOOL, "Cannot allocate nb_stack!\n");
+ return -ENOMEM;
+ }
+
+ s->size = n;
+
+ nb_lifo_init(&s->used_lifo);
+ nb_lifo_init(&s->free_lifo);
+
+ elems = (struct nb_lifo_elem *)&s[1];
+ for (i = 0; i < n; i++)
+ nb_lifo_push_single(&s->free_lifo, &elems[i]);
+
+ mp->pool_data = s;
+
+ return 0;
+}
+
+static int
+nb_stack_enqueue(struct rte_mempool *mp, void * const *obj_table,
+ unsigned int n)
+{
+ struct rte_mempool_nb_stack *s = mp->pool_data;
+ struct nb_lifo_elem *first, *last, *tmp;
+ unsigned int i;
+
+ if (unlikely(n == 0))
+ return 0;
+
+ /* Pop n free elements */
+ first = nb_lifo_pop(&s->free_lifo, n, NULL, NULL);
+ if (unlikely(first == NULL))
+ return -ENOBUFS;
+
+ /* Prepare the list elements */
+ tmp = first;
+ for (i = 0; i < n; i++) {
+ tmp->data = obj_table[i];
+ last = tmp;
+ tmp = tmp->next;
+ }
+
+ /* Enqueue them to the used list */
+ nb_lifo_push(&s->used_lifo, first, last, n);
+
+ return 0;
+}
+
+static int
+nb_stack_dequeue(struct rte_mempool *mp, void **obj_table,
+ unsigned int n)
+{
+ struct rte_mempool_nb_stack *s = mp->pool_data;
+ struct nb_lifo_elem *first, *last;
+
+ if (unlikely(n == 0))
+ return 0;
+
+ /* Pop n used elements */
+ first = nb_lifo_pop(&s->used_lifo, n, obj_table, &last);
+ if (unlikely(first == NULL))
+ return -ENOENT;
+
+ /* Enqueue the list elements to the free list */
+ nb_lifo_push(&s->free_lifo, first, last, n);
+
+ return 0;
+}
+
+static unsigned
+nb_stack_get_count(const struct rte_mempool *mp)
+{
+ struct rte_mempool_nb_stack *s = mp->pool_data;
+
+ return nb_lifo_len(&s->used_lifo);
+}
+
+static void
+nb_stack_free(struct rte_mempool *mp)
+{
+ rte_free(mp->pool_data);
+}
+
+static struct rte_mempool_ops ops_nb_stack = {
+ .name = "nb_stack",
+ .alloc = nb_stack_alloc,
+ .free = nb_stack_free,
+ .enqueue = nb_stack_enqueue,
+ .dequeue = nb_stack_dequeue,
+ .get_count = nb_stack_get_count
+};
+
+MEMPOOL_REGISTER_OPS(ops_nb_stack);
diff --git a/drivers/mempool/nb_stack/rte_mempool_nb_stack_version.map b/drivers/mempool/nb_stack/rte_mempool_nb_stack_version.map
new file mode 100644
index 000000000..fc8c95e91
--- /dev/null
+++ b/drivers/mempool/nb_stack/rte_mempool_nb_stack_version.map
@@ -0,0 +1,4 @@
+DPDK_19.05 {
+
+ local: *;
+};
diff --git a/mk/rte.app.mk b/mk/rte.app.mk
index 02e8b6f05..d4b4aaaf6 100644
--- a/mk/rte.app.mk
+++ b/mk/rte.app.mk
@@ -131,8 +131,11 @@ endif
ifeq ($(CONFIG_RTE_BUILD_SHARED_LIB),n)
# plugins (link only if static libraries)
-_LDLIBS-$(CONFIG_RTE_DRIVER_MEMPOOL_BUCKET) += -lrte_mempool_bucket
-_LDLIBS-$(CONFIG_RTE_DRIVER_MEMPOOL_STACK) += -lrte_mempool_stack
+_LDLIBS-$(CONFIG_RTE_DRIVER_MEMPOOL_BUCKET) += -lrte_mempool_bucket
+ifeq ($(CONFIG_RTE_ARCH_X86_64),y)
+_LDLIBS-$(CONFIG_RTE_DRIVER_MEMPOOL_NB_STACK) += -lrte_mempool_nb_stack
+endif
+_LDLIBS-$(CONFIG_RTE_DRIVER_MEMPOOL_STACK) += -lrte_mempool_stack
ifeq ($(CONFIG_RTE_LIBRTE_DPAA_BUS),y)
_LDLIBS-$(CONFIG_RTE_LIBRTE_DPAA_MEMPOOL) += -lrte_mempool_dpaa
endif
--
2.13.6
2019-01-10 20:55 [dpdk-dev] [PATCH 0/3] Add non-blocking stack mempool handler Gage Eads
2019-01-10 20:55 ` [dpdk-dev] [PATCH 1/3] eal: add 128-bit cmpset (x86-64 only) Gage Eads
2019-01-13 12:18 ` Andrew Rybchenko
2019-01-14 4:29 ` Varghese, Vipin
2019-01-14 15:46 ` Eads, Gage
2019-01-16 4:34 ` Varghese, Vipin
2019-01-14 15:43 ` Eads, Gage
2019-01-10 20:55 ` [dpdk-dev] [PATCH 2/3] mempool/nb_stack: add non-blocking stack mempool Gage Eads
2019-01-13 13:31 ` Andrew Rybchenko
2019-01-14 16:22 ` Eads, Gage
2019-01-10 20:55 ` [dpdk-dev] [PATCH 3/3] doc: add NB stack comment to EAL "known issues" Gage Eads
2019-01-15 22:32 ` [dpdk-dev] [PATCH v2 0/2] Add non-blocking stack mempool handler Gage Eads
2019-01-15 22:32 ` [dpdk-dev] [PATCH v2 1/2] eal: add 128-bit cmpset (x86-64 only) Gage Eads
2019-01-17 8:49 ` Gavin Hu (Arm Technology China)
2019-01-17 15:14 ` Eads, Gage
2019-01-17 15:57 ` Gavin Hu (Arm Technology China)
2019-01-15 22:32 ` Gage Eads [this message]
2019-01-16 7:13 ` [dpdk-dev] [PATCH v2 2/2] mempool/nb_stack: add non-blocking stack mempool Andrew Rybchenko
2019-01-17 8:06 ` Gavin Hu (Arm Technology China)
2019-01-17 14:11 ` Eads, Gage
2019-01-17 14:20 ` Bruce Richardson
2019-01-17 15:16 ` Eads, Gage
2019-01-17 15:42 ` Gavin Hu (Arm Technology China)
2019-01-17 20:41 ` Eads, Gage
2019-01-16 15:18 ` [dpdk-dev] [PATCH v3 0/2] Add non-blocking stack mempool handler Gage Eads
2019-01-16 15:18 ` [dpdk-dev] [PATCH v3 1/2] eal: add 128-bit cmpset (x86-64 only) Gage Eads
2019-01-17 15:45 ` Honnappa Nagarahalli
2019-01-17 23:03 ` Eads, Gage
2019-01-18 5:27 ` Honnappa Nagarahalli
2019-01-18 22:01 ` Eads, Gage
2019-01-22 20:30 ` Honnappa Nagarahalli
2019-01-22 22:25 ` Eads, Gage
2019-01-24 5:21 ` Honnappa Nagarahalli
2019-01-25 17:19 ` Eads, Gage
2019-01-16 15:18 ` [dpdk-dev] [PATCH v3 2/2] mempool/nb_stack: add non-blocking stack mempool Gage Eads
2019-01-17 15:36 ` [dpdk-dev] [PATCH v4 0/2] Add non-blocking stack mempool handler Gage Eads
2019-01-17 15:36 ` [dpdk-dev] [PATCH v4 1/2] eal: add 128-bit cmpset (x86-64 only) Gage Eads
2019-01-17 15:36 ` [dpdk-dev] [PATCH v4 2/2] mempool/nb_stack: add non-blocking stack mempool Gage Eads
2019-01-18 5:05 ` Honnappa Nagarahalli
2019-01-18 20:09 ` Eads, Gage
2019-01-19 0:00 ` Eads, Gage
2019-01-19 0:15 ` Thomas Monjalon
2019-01-22 18:24 ` Eads, Gage