From: Andrew Rybchenko
To: Gage Eads
Subject: Re: [dpdk-dev] [PATCH 2/3] mempool/nb_stack: add non-blocking stack mempool
Date: Sun, 13 Jan 2019 16:31:01 +0300
Message-ID: <74affd62-a694-b73a-69b9-28a884f80f61@solarflare.com>
In-Reply-To: <20190110205538.24435-3-gage.eads@intel.com>
References: <20190110205538.24435-1-gage.eads@intel.com>
 <20190110205538.24435-3-gage.eads@intel.com>
List-Id: DPDK patches and discussions

Hi Gage,

In general this looks very good. Have you considered making nb_lifo.h a
library, so that it is reusable outside of the mempool driver?

There are a few notes below.

Thanks,
Andrew.

On 1/10/19 11:55 PM, Gage Eads wrote:
> This commit adds support for non-blocking (linked list based) stack mempool
> handler. The stack uses a 128-bit compare-and-swap instruction, and thus is
> limited to x86_64. The 128-bit CAS atomically updates the stack top pointer
> and a modification counter, which protects against the ABA problem.
>
> In mempool_perf_autotest the lock-based stack outperforms the non-blocking
> handler*, however:
> - For applications with preemptible pthreads, a lock-based stack's
>   worst-case performance (i.e. one thread being preempted while
>   holding the spinlock) is much worse than the non-blocking stack's.
> - Using per-thread mempool caches will largely mitigate the performance
>   difference.
>
> *Test setup: x86_64 build with default config, dual-socket Xeon E5-2699 v4,
> running on isolcpus cores with a tickless scheduler. The lock-based stack's
> rate_persec was 1x-3.5x the non-blocking stack's.
>
> Signed-off-by: Gage Eads
> ---
>  MAINTAINERS                                     |   4 +
>  config/common_base                              |   1 +
>  drivers/mempool/Makefile                        |   1 +
>  drivers/mempool/nb_stack/Makefile               |  30 +++++
>  drivers/mempool/nb_stack/meson.build            |   4 +
>  drivers/mempool/nb_stack/nb_lifo.h              | 132 +++++++++++++++++++
>  drivers/mempool/nb_stack/rte_mempool_nb_stack.c | 125 +++++++++++++++++++
>  .../nb_stack/rte_mempool_nb_stack_version.map   |   4 +
>  mk/rte.app.mk                                   |   1 +
>  9 files changed, 302 insertions(+)
>  create mode 100644 drivers/mempool/nb_stack/Makefile
>  create mode 100644 drivers/mempool/nb_stack/meson.build
>  create mode 100644 drivers/mempool/nb_stack/nb_lifo.h
>  create mode 100644 drivers/mempool/nb_stack/rte_mempool_nb_stack.c
>  create mode 100644 drivers/mempool/nb_stack/rte_mempool_nb_stack_version.map
>
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 470f36b9c..5519d3323 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -416,6 +416,10 @@ M: Artem V. Andreev
>  M: Andrew Rybchenko
>  F: drivers/mempool/bucket/
>
> +Non-blocking stack memory pool
> +M: Gage Eads
> +F: drivers/mempool/nb_stack/
> +
>
>  Bus Drivers
>  -----------
> diff --git a/config/common_base b/config/common_base
> index 964a6956e..40ce47312 100644
> --- a/config/common_base
> +++ b/config/common_base
> @@ -728,6 +728,7 @@ CONFIG_RTE_DRIVER_MEMPOOL_BUCKET=y
>  CONFIG_RTE_DRIVER_MEMPOOL_BUCKET_SIZE_KB=64
>  CONFIG_RTE_DRIVER_MEMPOOL_RING=y
>  CONFIG_RTE_DRIVER_MEMPOOL_STACK=y
> +CONFIG_RTE_DRIVER_MEMPOOL_NB_STACK=y

Typically it is alphabetically sorted.
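i.e. something like this (ignoring any unrelated options that may sit
between these lines):

```
CONFIG_RTE_DRIVER_MEMPOOL_BUCKET=y
CONFIG_RTE_DRIVER_MEMPOOL_BUCKET_SIZE_KB=64
CONFIG_RTE_DRIVER_MEMPOOL_NB_STACK=y
CONFIG_RTE_DRIVER_MEMPOOL_RING=y
CONFIG_RTE_DRIVER_MEMPOOL_STACK=y
```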
> #
> # Compile PMD for octeontx fpa mempool device
> diff --git a/drivers/mempool/Makefile b/drivers/mempool/Makefile
> index 28c2e8360..aeae3ac12 100644
> --- a/drivers/mempool/Makefile
> +++ b/drivers/mempool/Makefile
> @@ -13,5 +13,6 @@ endif
>  DIRS-$(CONFIG_RTE_DRIVER_MEMPOOL_RING) += ring
>  DIRS-$(CONFIG_RTE_DRIVER_MEMPOOL_STACK) += stack
>  DIRS-$(CONFIG_RTE_LIBRTE_OCTEONTX_MEMPOOL) += octeontx
> +DIRS-$(CONFIG_RTE_DRIVER_MEMPOOL_NB_STACK) += nb_stack

Typically this list is kept alphabetically sorted. Yes, the ordering is
already broken, but please put nb_stack before ring.

>
>  include $(RTE_SDK)/mk/rte.subdir.mk
> diff --git a/drivers/mempool/nb_stack/Makefile b/drivers/mempool/nb_stack/Makefile
> new file mode 100644
> index 000000000..38b45f4f5
> --- /dev/null
> +++ b/drivers/mempool/nb_stack/Makefile
> @@ -0,0 +1,30 @@
> +# SPDX-License-Identifier: BSD-3-Clause
> +# Copyright(c) 2019 Intel Corporation
> +
> +include $(RTE_SDK)/mk/rte.vars.mk
> +
> +# The non-blocking stack uses a 128-bit compare-and-swap instruction, and thus
> +# is limited to x86_64.
> +ifeq ($(CONFIG_RTE_ARCH_X86_64),y)
> +
> +#
> +# library name
> +#
> +LIB = librte_mempool_nb_stack.a
> +
> +CFLAGS += -O3
> +CFLAGS += $(WERROR_FLAGS)
> +
> +# Headers
> +CFLAGS += -I$(RTE_SDK)/lib/librte_mempool

I guess this line is derived from the stack driver. Is it really required?
There is no such line in the ring, bucket, octeontx and dpaa2 drivers.
> +LDLIBS += -lrte_eal -lrte_mempool
> +
> +EXPORT_MAP := rte_mempool_nb_stack_version.map
> +
> +LIBABIVER := 1
> +
> +SRCS-$(CONFIG_RTE_DRIVER_MEMPOOL_NB_STACK) += rte_mempool_nb_stack.c
> +
> +endif
> +
> +include $(RTE_SDK)/mk/rte.lib.mk
> diff --git a/drivers/mempool/nb_stack/meson.build b/drivers/mempool/nb_stack/meson.build
> new file mode 100644
> index 000000000..66d64a9ba
> --- /dev/null
> +++ b/drivers/mempool/nb_stack/meson.build
> @@ -0,0 +1,4 @@
> +# SPDX-License-Identifier: BSD-3-Clause
> +# Copyright(c) 2019 Intel Corporation
> +
> +sources = files('rte_mempool_nb_stack.c')

Have you tested the meson build for a non-x86_64 target? I guess it should
be fixed to skip this driver on non-x86_64 builds.

> diff --git a/drivers/mempool/nb_stack/nb_lifo.h b/drivers/mempool/nb_stack/nb_lifo.h
> new file mode 100644
> index 000000000..701d75e37
> --- /dev/null
> +++ b/drivers/mempool/nb_stack/nb_lifo.h
> @@ -0,0 +1,132 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + * Copyright(c) 2019 Intel Corporation
> + */
> +
> +#ifndef _NB_LIFO_H_
> +#define _NB_LIFO_H_
> +
> +struct nb_lifo_elem {
> +	void *data;
> +	struct nb_lifo_elem *next;
> +};
> +
> +struct nb_lifo_head {
> +	struct nb_lifo_elem *top; /**< Stack top */
> +	uint64_t cnt; /**< Modification counter */
> +};
> +
> +struct nb_lifo {
> +	volatile struct nb_lifo_head head __rte_aligned(16);
> +	rte_atomic64_t len;
> +} __rte_cache_aligned;
> +
> +static __rte_always_inline void
> +nb_lifo_init(struct nb_lifo *lifo)
> +{
> +	memset(lifo, 0, sizeof(*lifo));
> +	rte_atomic64_set(&lifo->len, 0);
> +}
> +
> +static __rte_always_inline unsigned int
> +nb_lifo_len(struct nb_lifo *lifo)
> +{
> +	return (unsigned int) rte_atomic64_read(&lifo->len);
> +}
> +
> +static __rte_always_inline void
> +nb_lifo_push(struct nb_lifo *lifo,
> +	     struct nb_lifo_elem *first,
> +	     struct nb_lifo_elem *last,
> +	     unsigned int num)
> +{
> +	while (1) {
> +		struct nb_lifo_head old_head, new_head;
> +
> +		old_head = lifo->head;
> +
> +		/* Swing
the top pointer to the first element in the list and
> +		 * make the last element point to the old top.
> +		 */
> +		new_head.top = first;
> +		new_head.cnt = old_head.cnt + 1;
> +
> +		last->next = old_head.top;
> +
> +		if (rte_atomic128_cmpset((volatile uint64_t *) &lifo->head,
> +					 (uint64_t *)&old_head,
> +					 (uint64_t *)&new_head))
> +			break;
> +	}
> +
> +	rte_atomic64_add(&lifo->len, num);

I'd like to understand why it is not a problem that the change of the list
and the increase of its length are not atomic: we can observe a wrong stack
length in between. It would be good to explain this in a comment.

> +}
> +
> +static __rte_always_inline void
> +nb_lifo_push_single(struct nb_lifo *lifo, struct nb_lifo_elem *elem)
> +{
> +	nb_lifo_push(lifo, elem, elem, 1);
> +}
> +
> +static __rte_always_inline struct nb_lifo_elem *
> +nb_lifo_pop(struct nb_lifo *lifo,
> +	    unsigned int num,
> +	    void **obj_table,
> +	    struct nb_lifo_elem **last)
> +{
> +	struct nb_lifo_head old_head;
> +
> +	/* Reserve num elements, if available */
> +	while (1) {
> +		uint64_t len = rte_atomic64_read(&lifo->len);
> +
> +		/* Does the list contain enough elements? */
> +		if (len < num)
> +			return NULL;
> +
> +		if (rte_atomic64_cmpset((volatile uint64_t *)&lifo->len,
> +					len, len - num))
> +			break;
> +	}
> +
> +	/* Pop num elements */
> +	while (1) {
> +		struct nb_lifo_head new_head;
> +		struct nb_lifo_elem *tmp;
> +		unsigned int i;
> +
> +		old_head = lifo->head;
> +
> +		tmp = old_head.top;
> +
> +		/* Traverse the list to find the new head. A next pointer will
> +		 * either point to another element or NULL; if a thread
> +		 * encounters a pointer that has already been popped, the CAS
> +		 * will fail.
> +		 */
> +		for (i = 0; i < num && tmp != NULL; i++) {
> +			if (obj_table)
> +				obj_table[i] = tmp->data;
> +			if (last)
> +				*last = tmp;

Isn't it better to do the obj_table and last assignments later, once the
elements have been successfully reserved?
If there are no retries, the current solution is optimal, but I guess a
solution with a second traversal to fill in obj_table would show more
stable performance under high load, when many retries are done.

> +			tmp = tmp->next;
> +		}
> +
> +		/* If NULL was encountered, the list was modified while
> +		 * traversing it. Retry.
> +		 */
> +		if (i != num)
> +			continue;
> +
> +		new_head.top = tmp;
> +		new_head.cnt = old_head.cnt + 1;
> +
> +		if (rte_atomic128_cmpset((volatile uint64_t *) &lifo->head,
> +					 (uint64_t *)&old_head,
> +					 (uint64_t *)&new_head))
> +			break;
> +	}
> +
> +	return old_head.top;
> +}
> +
> +#endif /* _NB_LIFO_H_ */
> diff --git a/drivers/mempool/nb_stack/rte_mempool_nb_stack.c b/drivers/mempool/nb_stack/rte_mempool_nb_stack.c
> new file mode 100644
> index 000000000..1b30775f7
> --- /dev/null
> +++ b/drivers/mempool/nb_stack/rte_mempool_nb_stack.c
> @@ -0,0 +1,125 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + * Copyright(c) 2019 Intel Corporation
> + */
> +
> +#include
> +#include
> +#include
> +
> +#include "nb_lifo.h"
> +
> +struct rte_mempool_nb_stack {
> +	uint64_t size;
> +	struct nb_lifo used_lifo; /**< LIFO containing mempool pointers */
> +	struct nb_lifo free_lifo; /**< LIFO containing unused LIFO elements */
> +};
> +
> +static int
> +nb_stack_alloc(struct rte_mempool *mp)
> +{
> +	struct rte_mempool_nb_stack *s;
> +	struct nb_lifo_elem *elems;
> +	unsigned int n = mp->size;
> +	unsigned int size, i;
> +
> +	size = sizeof(*s) + n * sizeof(struct nb_lifo_elem);
> +
> +	/* Allocate our local memory structure */
> +	s = rte_zmalloc_socket("mempool-nb_stack",
> +			       size,
> +			       RTE_CACHE_LINE_SIZE,
> +			       mp->socket_id);
> +	if (s == NULL) {
> +		RTE_LOG(ERR, MEMPOOL, "Cannot allocate nb_stack!\n");
> +		return -ENOMEM;
> +	}
> +
> +	s->size = n;
> +
> +	nb_lifo_init(&s->used_lifo);
> +	nb_lifo_init(&s->free_lifo);
> +
> +	elems = (struct nb_lifo_elem *) &s[1];

Does checkpatch.sh generate a warning here because of the space after the
type cast?
There are a few similar cases in the patch.

> +	for (i = 0; i < n; i++)
> +		nb_lifo_push_single(&s->free_lifo, &elems[i]);
> +
> +	mp->pool_data = s;
> +
> +	return 0;
> +}
> +
> +static int
> +nb_stack_enqueue(struct rte_mempool *mp, void * const *obj_table,
> +		 unsigned int n)
> +{
> +	struct rte_mempool_nb_stack *s = mp->pool_data;
> +	struct nb_lifo_elem *first, *last, *tmp;
> +	unsigned int i;
> +
> +	if (unlikely(n == 0))
> +		return 0;
> +
> +	/* Pop n free elements */
> +	first = nb_lifo_pop(&s->free_lifo, n, NULL, NULL);
> +	if (unlikely(!first))

Just a nit, but as far as I know comparison with NULL is typically used in
DPDK. (A few cases in the patch.)

> +		return -ENOBUFS;
> +
> +	/* Prepare the list elements */
> +	tmp = first;
> +	for (i = 0; i < n; i++) {
> +		tmp->data = obj_table[i];
> +		last = tmp;
> +		tmp = tmp->next;
> +	}
> +
> +	/* Enqueue them to the used list */
> +	nb_lifo_push(&s->used_lifo, first, last, n);
> +
> +	return 0;
> +}
> +
> +static int
> +nb_stack_dequeue(struct rte_mempool *mp, void **obj_table,
> +		 unsigned int n)
> +{
> +	struct rte_mempool_nb_stack *s = mp->pool_data;
> +	struct nb_lifo_elem *first, *last;
> +
> +	if (unlikely(n == 0))
> +		return 0;
> +
> +	/* Pop n used elements */
> +	first = nb_lifo_pop(&s->used_lifo, n, obj_table, &last);
> +	if (unlikely(!first))
> +		return -ENOENT;
> +
> +	/* Enqueue the list elements to the free list */
> +	nb_lifo_push(&s->free_lifo, first, last, n);
> +
> +	return 0;
> +}
> +
> +static unsigned
> +nb_stack_get_count(const struct rte_mempool *mp)
> +{
> +	struct rte_mempool_nb_stack *s = mp->pool_data;
> +
> +	return nb_lifo_len(&s->used_lifo);
> +}
> +
> +static void
> +nb_stack_free(struct rte_mempool *mp)
> +{
> +	rte_free((void *)(mp->pool_data));

I think the type cast is not required.
> +}
> +
> +static struct rte_mempool_ops ops_nb_stack = {
> +	.name = "nb_stack",
> +	.alloc = nb_stack_alloc,
> +	.free = nb_stack_free,
> +	.enqueue = nb_stack_enqueue,
> +	.dequeue = nb_stack_dequeue,
> +	.get_count = nb_stack_get_count
> +};
> +
> +MEMPOOL_REGISTER_OPS(ops_nb_stack);
> diff --git a/drivers/mempool/nb_stack/rte_mempool_nb_stack_version.map b/drivers/mempool/nb_stack/rte_mempool_nb_stack_version.map
> new file mode 100644
> index 000000000..fc8c95e91
> --- /dev/null
> +++ b/drivers/mempool/nb_stack/rte_mempool_nb_stack_version.map
> @@ -0,0 +1,4 @@
> +DPDK_19.05 {
> +
> +	local: *;
> +};
> diff --git a/mk/rte.app.mk b/mk/rte.app.mk
> index 02e8b6f05..0b11d9417 100644
> --- a/mk/rte.app.mk
> +++ b/mk/rte.app.mk
> @@ -133,6 +133,7 @@ ifeq ($(CONFIG_RTE_BUILD_SHARED_LIB),n)
>
>  _LDLIBS-$(CONFIG_RTE_DRIVER_MEMPOOL_BUCKET) += -lrte_mempool_bucket
>  _LDLIBS-$(CONFIG_RTE_DRIVER_MEMPOOL_STACK) += -lrte_mempool_stack
> +_LDLIBS-$(CONFIG_RTE_DRIVER_MEMPOOL_NB_STACK) += -lrte_mempool_nb_stack

It is better to sort it alphabetically.

>  ifeq ($(CONFIG_RTE_LIBRTE_DPAA_BUS),y)
>  _LDLIBS-$(CONFIG_RTE_LIBRTE_DPAA_MEMPOOL) += -lrte_mempool_dpaa
>  endif
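P.S. Regarding the meson note above: a sketch of the kind of guard I mean
(untested, assuming the usual convention where setting build to false
makes meson skip the driver):

```
# drivers/mempool/nb_stack/meson.build
if not dpdk_conf.has('RTE_ARCH_X86_64')
	build = false
endif
sources = files('rte_mempool_nb_stack.c')
```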