DPDK patches and discussions
From: Andrew Rybchenko <arybchenko@solarflare.com>
To: Gage Eads <gage.eads@intel.com>, <dev@dpdk.org>
Cc: <olivier.matz@6wind.com>, <bruce.richardson@intel.com>,
	<konstantin.ananyev@intel.com>
Subject: Re: [dpdk-dev] [PATCH 2/3] mempool/nb_stack: add non-blocking stack mempool
Date: Sun, 13 Jan 2019 16:31:01 +0300	[thread overview]
Message-ID: <74affd62-a694-b73a-69b9-28a884f80f61@solarflare.com> (raw)
In-Reply-To: <20190110205538.24435-3-gage.eads@intel.com>

Hi Gage,

In general this looks very good.

Have you considered making nb_lifo.h a library, so that it could be
reused outside of the mempool driver?

There are a few notes below.

Thanks,
Andrew.

On 1/10/19 11:55 PM, Gage Eads wrote:
> This commit adds support for a non-blocking (linked-list based) stack mempool
> handler. The stack uses a 128-bit compare-and-swap instruction, and thus is
> limited to x86_64. The 128-bit CAS atomically updates the stack top pointer
> and a modification counter, which protects against the ABA problem.
>
> In mempool_perf_autotest the lock-based stack outperforms the non-blocking
> handler*, however:
> - For applications with preemptible pthreads, a lock-based stack's
>    worst-case performance (i.e. one thread being preempted while
>    holding the spinlock) is much worse than the non-blocking stack's.
> - Using per-thread mempool caches will largely mitigate the performance
>    difference.
>
> *Test setup: x86_64 build with default config, dual-socket Xeon E5-2699 v4,
> running on isolcpus cores with a tickless scheduler. The lock-based stack's
> rate_persec was 1x-3.5x the non-blocking stack's.
>
> Signed-off-by: Gage Eads <gage.eads@intel.com>
> ---
>   MAINTAINERS                                        |   4 +
>   config/common_base                                 |   1 +
>   drivers/mempool/Makefile                           |   1 +
>   drivers/mempool/nb_stack/Makefile                  |  30 +++++
>   drivers/mempool/nb_stack/meson.build               |   4 +
>   drivers/mempool/nb_stack/nb_lifo.h                 | 132 +++++++++++++++++++++
>   drivers/mempool/nb_stack/rte_mempool_nb_stack.c    | 125 +++++++++++++++++++
>   .../nb_stack/rte_mempool_nb_stack_version.map      |   4 +
>   mk/rte.app.mk                                      |   1 +
>   9 files changed, 302 insertions(+)
>   create mode 100644 drivers/mempool/nb_stack/Makefile
>   create mode 100644 drivers/mempool/nb_stack/meson.build
>   create mode 100644 drivers/mempool/nb_stack/nb_lifo.h
>   create mode 100644 drivers/mempool/nb_stack/rte_mempool_nb_stack.c
>   create mode 100644 drivers/mempool/nb_stack/rte_mempool_nb_stack_version.map
>
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 470f36b9c..5519d3323 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -416,6 +416,10 @@ M: Artem V. Andreev <artem.andreev@oktetlabs.ru>
>   M: Andrew Rybchenko <arybchenko@solarflare.com>
>   F: drivers/mempool/bucket/
>   
> +Non-blocking stack memory pool
> +M: Gage Eads <gage.eads@intel.com>
> +F: drivers/mempool/nb_stack/
> +
>   
>   Bus Drivers
>   -----------
> diff --git a/config/common_base b/config/common_base
> index 964a6956e..40ce47312 100644
> --- a/config/common_base
> +++ b/config/common_base
> @@ -728,6 +728,7 @@ CONFIG_RTE_DRIVER_MEMPOOL_BUCKET=y
>   CONFIG_RTE_DRIVER_MEMPOOL_BUCKET_SIZE_KB=64
>   CONFIG_RTE_DRIVER_MEMPOOL_RING=y
>   CONFIG_RTE_DRIVER_MEMPOOL_STACK=y
> +CONFIG_RTE_DRIVER_MEMPOOL_NB_STACK=y

Typically it is alphabetically sorted.

>   #
>   # Compile PMD for octeontx fpa mempool device
> diff --git a/drivers/mempool/Makefile b/drivers/mempool/Makefile
> index 28c2e8360..aeae3ac12 100644
> --- a/drivers/mempool/Makefile
> +++ b/drivers/mempool/Makefile
> @@ -13,5 +13,6 @@ endif
>   DIRS-$(CONFIG_RTE_DRIVER_MEMPOOL_RING) += ring
>   DIRS-$(CONFIG_RTE_DRIVER_MEMPOOL_STACK) += stack
>   DIRS-$(CONFIG_RTE_LIBRTE_OCTEONTX_MEMPOOL) += octeontx
> +DIRS-$(CONFIG_RTE_DRIVER_MEMPOOL_NB_STACK) += nb_stack

Typically it is sorted alphabetically. Yes, the ordering is already
broken, but please put it before ring.
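
I.e. the hunk would then read:

	DIRS-$(CONFIG_RTE_DRIVER_MEMPOOL_NB_STACK) += nb_stack
	DIRS-$(CONFIG_RTE_DRIVER_MEMPOOL_RING) += ring
	DIRS-$(CONFIG_RTE_DRIVER_MEMPOOL_STACK) += stack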

>   
>   include $(RTE_SDK)/mk/rte.subdir.mk
> diff --git a/drivers/mempool/nb_stack/Makefile b/drivers/mempool/nb_stack/Makefile
> new file mode 100644
> index 000000000..38b45f4f5
> --- /dev/null
> +++ b/drivers/mempool/nb_stack/Makefile
> @@ -0,0 +1,30 @@
> +# SPDX-License-Identifier: BSD-3-Clause
> +# Copyright(c) 2019 Intel Corporation
> +
> +include $(RTE_SDK)/mk/rte.vars.mk
> +
> +# The non-blocking stack uses a 128-bit compare-and-swap instruction, and thus
> +# is limited to x86_64.
> +ifeq ($(CONFIG_RTE_ARCH_X86_64),y)
> +
> +#
> +# library name
> +#
> +LIB = librte_mempool_nb_stack.a
> +
> +CFLAGS += -O3
> +CFLAGS += $(WERROR_FLAGS)
> +
> +# Headers
> +CFLAGS += -I$(RTE_SDK)/lib/librte_mempool

I guess this is derived from the stack driver. Is it really required?
There is no such line in ring, bucket, octeontx, or dpaa2.

> +LDLIBS += -lrte_eal -lrte_mempool
> +
> +EXPORT_MAP := rte_mempool_nb_stack_version.map
> +
> +LIBABIVER := 1
> +
> +SRCS-$(CONFIG_RTE_DRIVER_MEMPOOL_NB_STACK) += rte_mempool_nb_stack.c
> +
> +endif
> +
> +include $(RTE_SDK)/mk/rte.lib.mk
> diff --git a/drivers/mempool/nb_stack/meson.build b/drivers/mempool/nb_stack/meson.build
> new file mode 100644
> index 000000000..66d64a9ba
> --- /dev/null
> +++ b/drivers/mempool/nb_stack/meson.build
> @@ -0,0 +1,4 @@
> +# SPDX-License-Identifier: BSD-3-Clause
> +# Copyright(c) 2019 Intel Corporation
> +
> +sources = files('rte_mempool_nb_stack.c')

Have you tested the meson build for a non-x86_64 target? I guess this
file should be fixed to skip the driver on non-x86_64 builds.
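
Something like this untested sketch is what I mean (assuming dpdk_conf
carries the RTE_ARCH_X86_64 flag at this point, as it does for the
generated config headers, and that the standard per-driver 'build'
variable is honoured here as in other drivers):

	if not dpdk_conf.has('RTE_ARCH_X86_64')
		build = false
	endif

	sources = files('rte_mempool_nb_stack.c')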

> diff --git a/drivers/mempool/nb_stack/nb_lifo.h b/drivers/mempool/nb_stack/nb_lifo.h
> new file mode 100644
> index 000000000..701d75e37
> --- /dev/null
> +++ b/drivers/mempool/nb_stack/nb_lifo.h
> @@ -0,0 +1,132 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + * Copyright(c) 2019 Intel Corporation
> + */
> +
> +#ifndef _NB_LIFO_H_
> +#define _NB_LIFO_H_
> +
> +struct nb_lifo_elem {
> +	void *data;
> +	struct nb_lifo_elem *next;
> +};
> +
> +struct nb_lifo_head {
> +	struct nb_lifo_elem *top; /**< Stack top */
> +	uint64_t cnt; /**< Modification counter */
> +};
> +
> +struct nb_lifo {
> +	volatile struct nb_lifo_head head __rte_aligned(16);
> +	rte_atomic64_t len;
> +} __rte_cache_aligned;
> +
> +static __rte_always_inline void
> +nb_lifo_init(struct nb_lifo *lifo)
> +{
> +	memset(lifo, 0, sizeof(*lifo));
> +	rte_atomic64_set(&lifo->len, 0);
> +}
> +
> +static __rte_always_inline unsigned int
> +nb_lifo_len(struct nb_lifo *lifo)
> +{
> +	return (unsigned int) rte_atomic64_read(&lifo->len);
> +}
> +
> +static __rte_always_inline void
> +nb_lifo_push(struct nb_lifo *lifo,
> +	     struct nb_lifo_elem *first,
> +	     struct nb_lifo_elem *last,
> +	     unsigned int num)
> +{
> +	while (1) {
> +		struct nb_lifo_head old_head, new_head;
> +
> +		old_head = lifo->head;
> +
> +		/* Swing the top pointer to the first element in the list and
> +		 * make the last element point to the old top.
> +		 */
> +		new_head.top = first;
> +		new_head.cnt = old_head.cnt + 1;
> +
> +		last->next = old_head.top;
> +
> +		if (rte_atomic128_cmpset((volatile uint64_t *) &lifo->head,
> +					 (uint64_t *)&old_head,
> +					 (uint64_t *)&new_head))
> +			break;
> +	}
> +
> +	rte_atomic64_add(&lifo->len, num);

I'd like to understand why it is not a problem that the list update and
the length update are not atomic: a reader can observe a stale length
while a push or pop is in flight. It would be good to explain this in a
comment.
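
E.g. the following interleaving is possible (hypothetical thread names,
just to illustrate what I mean):

	/* Thread A                          Thread B
	 *
	 * nb_lifo_push(lifo, e, e, 1):
	 *   128-bit CAS succeeds, e is
	 *   now reachable via lifo->head
	 *                                   nb_lifo_len(lifo)
	 *                                     -> still the old value
	 *   rte_atomic64_add(&lifo->len, 1)
	 */

I guess the intent is that len is a conservative lower bound (push adds
to the list before incrementing len, pop reserves via len before
touching the list), so the pop reservation can never oversubscribe, but
that reasoning deserves to be spelled out in a comment.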

> +}
> +
> +static __rte_always_inline void
> +nb_lifo_push_single(struct nb_lifo *lifo, struct nb_lifo_elem *elem)
> +{
> +	nb_lifo_push(lifo, elem, elem, 1);
> +}
> +
> +static __rte_always_inline struct nb_lifo_elem *
> +nb_lifo_pop(struct nb_lifo *lifo,
> +	    unsigned int num,
> +	    void **obj_table,
> +	    struct nb_lifo_elem **last)
> +{
> +	struct nb_lifo_head old_head;
> +
> +	/* Reserve num elements, if available */
> +	while (1) {
> +		uint64_t len = rte_atomic64_read(&lifo->len);
> +
> +		/* Does the list contain enough elements? */
> +		if (len < num)
> +			return NULL;
> +
> +		if (rte_atomic64_cmpset((volatile uint64_t *)&lifo->len,
> +					len, len - num))
> +			break;
> +	}
> +
> +	/* Pop num elements */
> +	while (1) {
> +		struct nb_lifo_head new_head;
> +		struct nb_lifo_elem *tmp;
> +		unsigned int i;
> +
> +		old_head = lifo->head;
> +
> +		tmp = old_head.top;
> +
> +		/* Traverse the list to find the new head. A next pointer will
> +		 * either point to another element or NULL; if a thread
> +		 * encounters a pointer that has already been popped, the CAS
> +		 * will fail.
> +		 */
> +		for (i = 0; i < num && tmp != NULL; i++) {
> +			if (obj_table)
> +				obj_table[i] = tmp->data;
> +			if (last)
> +				*last = tmp;

Isn't it better to do the obj_table and last assignments later, once
the elements have been successfully reserved? If there are no retries,
the current solution is optimal, but I guess a solution with a second
traversal to fill in obj_table would show more stable performance
results under high load, when many retries occur.
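
Something along these lines is what I have in mind (untested sketch;
the reservation loop on lifo->len stays exactly as in the patch):

	/* Pop num elements; the retry loop only locates the new head */
	while (1) {
		struct nb_lifo_head new_head;

		old_head = lifo->head;

		tmp = old_head.top;
		for (i = 0; i < num && tmp != NULL; i++)
			tmp = tmp->next;

		/* The list changed underneath us; retry */
		if (i != num)
			continue;

		new_head.top = tmp;
		new_head.cnt = old_head.cnt + 1;

		if (rte_atomic128_cmpset((volatile uint64_t *)&lifo->head,
					 (uint64_t *)&old_head,
					 (uint64_t *)&new_head))
			break;
	}

	/* The num elements starting at old_head.top are now owned
	 * exclusively by this thread, so the stores are done once,
	 * outside the retry loop.
	 */
	tmp = old_head.top;
	for (i = 0; i < num; i++) {
		if (obj_table)
			obj_table[i] = tmp->data;
		if (last)
			*last = tmp;
		tmp = tmp->next;
	}

If there are no retries this costs a second pass over the elements, so
it is a trade-off; performance numbers would tell.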

> +			tmp = tmp->next;
> +		}
> +
> +		/* If NULL was encountered, the list was modified while
> +		 * traversing it. Retry.
> +		 */
> +		if (i != num)
> +			continue;
> +
> +		new_head.top = tmp;
> +		new_head.cnt = old_head.cnt + 1;
> +
> +		if (rte_atomic128_cmpset((volatile uint64_t *) &lifo->head,
> +					 (uint64_t *)&old_head,
> +					 (uint64_t *)&new_head))
> +			break;
> +	}
> +
> +	return old_head.top;
> +}
> +
> +#endif /* _NB_LIFO_H_ */
> diff --git a/drivers/mempool/nb_stack/rte_mempool_nb_stack.c b/drivers/mempool/nb_stack/rte_mempool_nb_stack.c
> new file mode 100644
> index 000000000..1b30775f7
> --- /dev/null
> +++ b/drivers/mempool/nb_stack/rte_mempool_nb_stack.c
> @@ -0,0 +1,125 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + * Copyright(c) 2019 Intel Corporation
> + */
> +
> +#include <stdio.h>
> +#include <rte_mempool.h>
> +#include <rte_malloc.h>
> +
> +#include "nb_lifo.h"
> +
> +struct rte_mempool_nb_stack {
> +	uint64_t size;
> +	struct nb_lifo used_lifo; /**< LIFO containing mempool pointers  */
> +	struct nb_lifo free_lifo; /**< LIFO containing unused LIFO elements */
> +};
> +
> +static int
> +nb_stack_alloc(struct rte_mempool *mp)
> +{
> +	struct rte_mempool_nb_stack *s;
> +	struct nb_lifo_elem *elems;
> +	unsigned int n = mp->size;
> +	unsigned int size, i;
> +
> +	size = sizeof(*s) + n * sizeof(struct nb_lifo_elem);
> +
> +	/* Allocate our local memory structure */
> +	s = rte_zmalloc_socket("mempool-nb_stack",
> +			       size,
> +			       RTE_CACHE_LINE_SIZE,
> +			       mp->socket_id);
> +	if (s == NULL) {
> +		RTE_LOG(ERR, MEMPOOL, "Cannot allocate nb_stack!\n");
> +		return -ENOMEM;
> +	}
> +
> +	s->size = n;
> +
> +	nb_lifo_init(&s->used_lifo);
> +	nb_lifo_init(&s->free_lifo);
> +
> +	elems = (struct nb_lifo_elem *) &s[1];

Does checkpatch.sh generate a warning here because of the space after
the type cast? There are a few similar cases in the patch.
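
I.e.:

	elems = (struct nb_lifo_elem *)&s[1];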

> +	for (i = 0; i < n; i++)
> +		nb_lifo_push_single(&s->free_lifo, &elems[i]);
> +
> +	mp->pool_data = s;
> +
> +	return 0;
> +}
> +
> +static int
> +nb_stack_enqueue(struct rte_mempool *mp, void * const *obj_table,
> +		 unsigned int n)
> +{
> +	struct rte_mempool_nb_stack *s = mp->pool_data;
> +	struct nb_lifo_elem *first, *last, *tmp;
> +	unsigned int i;
> +
> +	if (unlikely(n == 0))
> +		return 0;
> +
> +	/* Pop n free elements */
> +	first = nb_lifo_pop(&s->free_lifo, n, NULL, NULL);
> +	if (unlikely(!first))

Just a nit, but as far as I know an explicit comparison with NULL is
typically used in DPDK. (There are a few such cases in the patch.)
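
I.e. here:

	first = nb_lifo_pop(&s->free_lifo, n, NULL, NULL);
	if (unlikely(first == NULL))
		return -ENOBUFS;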

> +		return -ENOBUFS;
> +
> +	/* Prepare the list elements */
> +	tmp = first;
> +	for (i = 0; i < n; i++) {
> +		tmp->data = obj_table[i];
> +		last = tmp;
> +		tmp = tmp->next;
> +	}
> +
> +	/* Enqueue them to the used list */
> +	nb_lifo_push(&s->used_lifo, first, last, n);
> +
> +	return 0;
> +}
> +
> +static int
> +nb_stack_dequeue(struct rte_mempool *mp, void **obj_table,
> +		 unsigned int n)
> +{
> +	struct rte_mempool_nb_stack *s = mp->pool_data;
> +	struct nb_lifo_elem *first, *last;
> +
> +	if (unlikely(n == 0))
> +		return 0;
> +
> +	/* Pop n used elements */
> +	first = nb_lifo_pop(&s->used_lifo, n, obj_table, &last);
> +	if (unlikely(!first))
> +		return -ENOENT;
> +
> +	/* Enqueue the list elements to the free list */
> +	nb_lifo_push(&s->free_lifo, first, last, n);
> +
> +	return 0;
> +}
> +
> +static unsigned
> +nb_stack_get_count(const struct rte_mempool *mp)
> +{
> +	struct rte_mempool_nb_stack *s = mp->pool_data;
> +
> +	return nb_lifo_len(&s->used_lifo);
> +}
> +
> +static void
> +nb_stack_free(struct rte_mempool *mp)
> +{
> +	rte_free((void *)(mp->pool_data));

I think the type cast is not required.
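
I.e. simply:

	rte_free(mp->pool_data);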

> +}
> +
> +static struct rte_mempool_ops ops_nb_stack = {
> +	.name = "nb_stack",
> +	.alloc = nb_stack_alloc,
> +	.free = nb_stack_free,
> +	.enqueue = nb_stack_enqueue,
> +	.dequeue = nb_stack_dequeue,
> +	.get_count = nb_stack_get_count
> +};
> +
> +MEMPOOL_REGISTER_OPS(ops_nb_stack);
> diff --git a/drivers/mempool/nb_stack/rte_mempool_nb_stack_version.map b/drivers/mempool/nb_stack/rte_mempool_nb_stack_version.map
> new file mode 100644
> index 000000000..fc8c95e91
> --- /dev/null
> +++ b/drivers/mempool/nb_stack/rte_mempool_nb_stack_version.map
> @@ -0,0 +1,4 @@
> +DPDK_19.05 {
> +
> +	local: *;
> +};
> diff --git a/mk/rte.app.mk b/mk/rte.app.mk
> index 02e8b6f05..0b11d9417 100644
> --- a/mk/rte.app.mk
> +++ b/mk/rte.app.mk
> @@ -133,6 +133,7 @@ ifeq ($(CONFIG_RTE_BUILD_SHARED_LIB),n)
>   
>   _LDLIBS-$(CONFIG_RTE_DRIVER_MEMPOOL_BUCKET) += -lrte_mempool_bucket
>   _LDLIBS-$(CONFIG_RTE_DRIVER_MEMPOOL_STACK)  += -lrte_mempool_stack
> +_LDLIBS-$(CONFIG_RTE_DRIVER_MEMPOOL_NB_STACK)  += -lrte_mempool_nb_stack

It is better to sort it alphabetically.
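
I.e.:

	_LDLIBS-$(CONFIG_RTE_DRIVER_MEMPOOL_BUCKET) += -lrte_mempool_bucket
	_LDLIBS-$(CONFIG_RTE_DRIVER_MEMPOOL_NB_STACK) += -lrte_mempool_nb_stack
	_LDLIBS-$(CONFIG_RTE_DRIVER_MEMPOOL_STACK)  += -lrte_mempool_stack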

>   ifeq ($(CONFIG_RTE_LIBRTE_DPAA_BUS),y)
>   _LDLIBS-$(CONFIG_RTE_LIBRTE_DPAA_MEMPOOL)   += -lrte_mempool_dpaa
>   endif


Thread overview: 43+ messages
2019-01-10 20:55 [dpdk-dev] [PATCH 0/3] Add non-blocking stack mempool handler Gage Eads
2019-01-10 20:55 ` [dpdk-dev] [PATCH 1/3] eal: add 128-bit cmpset (x86-64 only) Gage Eads
2019-01-13 12:18   ` Andrew Rybchenko
2019-01-14  4:29     ` Varghese, Vipin
2019-01-14 15:46       ` Eads, Gage
2019-01-16  4:34         ` Varghese, Vipin
2019-01-14 15:43     ` Eads, Gage
2019-01-10 20:55 ` [dpdk-dev] [PATCH 2/3] mempool/nb_stack: add non-blocking stack mempool Gage Eads
2019-01-13 13:31   ` Andrew Rybchenko [this message]
2019-01-14 16:22     ` Eads, Gage
2019-01-10 20:55 ` [dpdk-dev] [PATCH 3/3] doc: add NB stack comment to EAL "known issues" Gage Eads
2019-01-15 22:32 ` [dpdk-dev] [PATCH v2 0/2] Add non-blocking stack mempool handler Gage Eads
2019-01-15 22:32   ` [dpdk-dev] [PATCH v2 1/2] eal: add 128-bit cmpset (x86-64 only) Gage Eads
2019-01-17  8:49     ` Gavin Hu (Arm Technology China)
2019-01-17 15:14       ` Eads, Gage
2019-01-17 15:57         ` Gavin Hu (Arm Technology China)
2019-01-15 22:32   ` [dpdk-dev] [PATCH v2 2/2] mempool/nb_stack: add non-blocking stack mempool Gage Eads
2019-01-16  7:13     ` Andrew Rybchenko
2019-01-17  8:06     ` Gavin Hu (Arm Technology China)
2019-01-17 14:11       ` Eads, Gage
2019-01-17 14:20         ` Bruce Richardson
2019-01-17 15:16           ` Eads, Gage
2019-01-17 15:42             ` Gavin Hu (Arm Technology China)
2019-01-17 20:41               ` Eads, Gage
2019-01-16 15:18   ` [dpdk-dev] [PATCH v3 0/2] Add non-blocking stack mempool handler Gage Eads
2019-01-16 15:18     ` [dpdk-dev] [PATCH v3 1/2] eal: add 128-bit cmpset (x86-64 only) Gage Eads
2019-01-17 15:45       ` Honnappa Nagarahalli
2019-01-17 23:03         ` Eads, Gage
2019-01-18  5:27           ` Honnappa Nagarahalli
2019-01-18 22:01             ` Eads, Gage
2019-01-22 20:30               ` Honnappa Nagarahalli
2019-01-22 22:25                 ` Eads, Gage
2019-01-24  5:21                   ` Honnappa Nagarahalli
2019-01-25 17:19                     ` Eads, Gage
2019-01-16 15:18     ` [dpdk-dev] [PATCH v3 2/2] mempool/nb_stack: add non-blocking stack mempool Gage Eads
2019-01-17 15:36     ` [dpdk-dev] [PATCH v4 0/2] Add non-blocking stack mempool handler Gage Eads
2019-01-17 15:36       ` [dpdk-dev] [PATCH v4 1/2] eal: add 128-bit cmpset (x86-64 only) Gage Eads
2019-01-17 15:36       ` [dpdk-dev] [PATCH v4 2/2] mempool/nb_stack: add non-blocking stack mempool Gage Eads
2019-01-18  5:05         ` Honnappa Nagarahalli
2019-01-18 20:09           ` Eads, Gage
2019-01-19  0:00           ` Eads, Gage
2019-01-19  0:15             ` Thomas Monjalon
2019-01-22 18:24               ` Eads, Gage
