From: Ola Liljedahl <Ola.Liljedahl@arm.com>
To: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>,
"bruce.richardson@intel.com" <bruce.richardson@intel.com>,
"gage.eads@intel.com" <gage.eads@intel.com>,
"dev@dpdk.org" <dev@dpdk.org>
Cc: nd <nd@arm.com>
Subject: Re: [dpdk-dev] [RFC,v2] lfring: lock-free ring buffer
Date: Sat, 15 Jun 2019 21:00:12 +0000 [thread overview]
Message-ID: <f5223d64b70733ffd56869b8c8e140776bcfea81.camel@arm.com> (raw)
In-Reply-To: <9184057F7FC11744A2107296B6B8EB1E68CD76ED@FMSMSX108.amr.corp.intel.com>
On Wed, 2019-06-05 at 18:21 +0000, Eads, Gage wrote:
> Hi Ola,
>
> Is it possible to add burst enqueue and dequeue functions as well, so we can
> plug this into a mempool handler?
Proper burst enqueue is difficult or at least not very efficient.
> Besides the mempool handler, I think the all-or-nothing semantics would be
> useful for applications. Besides that, this RFC looks good at a high level.
>
> For a complete submission, a few more changes are needed:
> - Builds: Need to add 'lfring' to lib/meson.build and mk/rte.app.mk
> - Documentation: need to update release notes, add a new section in the
> programmer's guide, and update the doxygen configuration files
> - Tests: This patchset should add a set of lfring tests as well
>
> Code comments follow.
Thanks for the review comments, I only had time to look at a few of them. I will
return with more complete answers and a new version of the patch.
-- Ola
>
> <snip>
>
>
> diff --git a/lib/Makefile b/lib/Makefile index d6239d2..cd46339 100644
> --- a/lib/Makefile
> +++ b/lib/Makefile
> @@ -12,6 +12,8 @@ DIRS-$(CONFIG_RTE_LIBRTE_PCI) += librte_pci
> DEPDIRS-librte_pci := librte_eal
> DIRS-$(CONFIG_RTE_LIBRTE_RING) += librte_ring DEPDIRS-librte_ring :=
> librte_eal
> +DIRS-$(CONFIG_RTE_LIBRTE_RING) += librte_lfring DEPDIRS-librte_lfring
> +:= librte_eal
>
> LIBRTE_RING -> LIBRTE_LFRING
>
>
> DIRS-$(CONFIG_RTE_LIBRTE_MEMPOOL) += librte_mempool DEPDIRS-
> librte_mempool := librte_eal librte_ring
> DIRS-$(CONFIG_RTE_LIBRTE_MBUF) += librte_mbuf diff --git
> a/lib/librte_lfring/Makefile b/lib/librte_lfring/Makefile new file mode 100644
> index 0000000..c04d2e9
> --- /dev/null
> +++ b/lib/librte_lfring/Makefile
> @@ -0,0 +1,22 @@
> +# SPDX-License-Identifier: BSD-3-Clause # Copyright(c) 2010-2014 Intel
> +Corporation
>
> Need to correct the copyright
>
>
> +
> +include $(RTE_SDK)/mk/rte.vars.mk
> +
> +# library name
> +LIB = librte_lfring.a
> +
> +CFLAGS += $(WERROR_FLAGS) -I$(SRCDIR) -O3 LDLIBS += -lrte_eal
> +
> +EXPORT_MAP := rte_lfring_version.map
> +
> +LIBABIVER := 1
> +
> +# all source are stored in SRCS-y
> +SRCS-$(CONFIG_RTE_LIBRTE_RING) := rte_lfring.c
>
> LIBRTE_RING -> LIBRTE_LFRING
>
>
> +
> +# install includes
> +SYMLINK-$(CONFIG_RTE_LIBRTE_RING)-include := rte_lfring.h lockfree16.h
>
> LIBRTE_RING -> LIBRTE_LFRING
>
>
> +
> +include $(RTE_SDK)/mk/rte.lib.mk
> diff --git a/lib/librte_lfring/lockfree16.h b/lib/librte_lfring/lockfree16.h
> new
> file mode 100644 index 0000000..199add9
> --- /dev/null
> +++ b/lib/librte_lfring/lockfree16.h
>
> This needs to be refactored to use the rte_atomic128_cmp_exchange interface.
>
> <snip>
>
>
> diff --git a/lib/librte_lfring/meson.build b/lib/librte_lfring/meson.build new
> file mode 100644 index 0000000..c8b90cb
> --- /dev/null
> +++ b/lib/librte_lfring/meson.build
> @@ -0,0 +1,6 @@
> +# SPDX-License-Identifier: BSD-3-Clause # Copyright(c) 2017 Intel
> +Corporation
> +
>
> Correct copyright
>
>
> +version = 1
> +sources = files('rte_lfring.c')
> +headers = files('rte_lfring.h','lockfree16.h')
> diff --git a/lib/librte_lfring/rte_lfring.c b/lib/librte_lfring/rte_lfring.c
> new file
> mode 100644 index 0000000..e130f71
> --- /dev/null
> +++ b/lib/librte_lfring/rte_lfring.c
> @@ -0,0 +1,264 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + *
> + * Copyright (c) 2010-2015 Intel Corporation
> + * Copyright (c) 2019 Arm Ltd
> + * All rights reserved.
> + */
> +
> +#include <stdio.h>
> +#include <stdarg.h>
> +#include <string.h>
> +#include <stdint.h>
> +#include <inttypes.h>
> +#include <errno.h>
> +#include <sys/queue.h>
> +
> +#include <rte_common.h>
> +#include <rte_log.h>
> +#include <rte_memory.h>
> +#include <rte_memzone.h>
> +#include <rte_malloc.h>
> +#include <rte_launch.h>
> +#include <rte_eal.h>
> +#include <rte_eal_memconfig.h>
> +#include <rte_atomic.h>
> +#include <rte_per_lcore.h>
> +#include <rte_lcore.h>
> +#include <rte_branch_prediction.h>
> +#include <rte_errno.h>
> +#include <rte_string_fns.h>
> +#include <rte_spinlock.h>
> +
>
> You can reduce the system and DPDK includes down to:
>
> #include <inttypes.h>
> #include <string.h>
>
> #include <rte_atomic.h>
> #include <rte_eal.h>
> #include <rte_eal_memconfig.h>
> #include <rte_errno.h>
> #include <rte_malloc.h>
> #include <rte_memzone.h>
> #include <rte_rwlock.h>
> #include <rte_tailq.h>
>
>
> +#include "rte_lfring.h"
> +
> +TAILQ_HEAD(rte_lfring_list, rte_tailq_entry);
> +
> +static struct rte_tailq_elem rte_lfring_tailq = {
> + .name = RTE_TAILQ_LFRING_NAME,
> +};
> +EAL_REGISTER_TAILQ(rte_lfring_tailq)
> +
> +/* true if x is a power of 2 */
> +#define POWEROF2(x) ((((x)-1) & (x)) == 0)
>
> DPDK provides this functionality with RTE_IS_POWER_OF_2(), in rte_common.h
>
>
> +
> +/* return the size of memory occupied by a ring */ ssize_t
> +rte_lfring_get_memsize(unsigned int count) {
> + ssize_t sz;
> +
> + /* count must be a power of 2 */
> + if ((!POWEROF2(count)) || (count > 0x80000000U)) {
> + RTE_LOG(ERR, RING,
>
> This library needs to replace the RING logging with its own LFRING logging.
>
> <snip>
>
>
> +/* create the ring */
>
> The comments before function definitions (e.g. "create the ring", "free the
> ring") probably aren't necessary, the names are self-explanatory.
>
>
> +struct rte_lfring *
> +rte_lfring_create(const char *name, unsigned int count, int socket_id,
> + unsigned int flags)
> +{
> + char mz_name[RTE_MEMZONE_NAMESIZE];
> + struct rte_lfring *r;
> + struct rte_tailq_entry *te;
> + const struct rte_memzone *mz;
> + ssize_t ring_size;
> + int mz_flags = 0;
> + struct rte_lfring_list *ring_list = NULL;
> + const unsigned int requested_count = count;
> + int ret;
> +
> + ring_list = RTE_TAILQ_CAST(rte_lfring_tailq.head, rte_lfring_list);
> +
> + count = rte_align32pow2(count);
> +
> + ring_size = rte_lfring_get_memsize(count);
> + if (ring_size < 0) {
> + rte_errno = ring_size;
> + return NULL;
> + }
> +
> + ret = snprintf(mz_name, sizeof(mz_name), "%s%s",
> + RTE_LFRING_MZ_PREFIX, name);
> + if (ret < 0 || ret >= (int)sizeof(mz_name)) {
> + rte_errno = ENAMETOOLONG;
> + return NULL;
> + }
> +
> + te = rte_zmalloc("LFRING_TAILQ_ENTRY", sizeof(*te), 0);
> + if (te == NULL) {
> + RTE_LOG(ERR, RING, "Cannot reserve memory for tailq\n");
> + rte_errno = ENOMEM;
> + return NULL;
> + }
> +
> + rte_rwlock_write_lock(RTE_EAL_TAILQ_RWLOCK);
> +
> + /* reserve a memory zone for this ring. If we can't get rte_config or
> + * we are secondary process, the memzone_reserve function will set
> + * rte_errno for us appropriately - hence no check in this this
> + * function
> + */
>
> The "or we are secondary process" comment is out-of-date; the memzone reserve
> functions can be called by a secondary process. (For that matter, the
> documentation in rte_memzone.h is out-of-date as well.) This restriction was
> removed in commit 916e4f4f4e45.
>
> <snip>
>
>
> +/* free the ring */
> +void
> +rte_lfring_free(struct rte_lfring *r)
> +{
> + struct rte_lfring_list *ring_list = NULL;
> + struct rte_tailq_entry *te;
> +
> + if (r == NULL)
> + return;
> +
> + /*
> + * Ring was not created with rte_lfring_create,
> + * therefore, there is no memzone to free.
> + */
> + if (r->memzone == NULL) {
> + RTE_LOG(ERR, RING, "Cannot free ring (not created with
> rte_lfring_create()");
> + return;
> + }
> +
> + if (rte_memzone_free(r->memzone) != 0) {
> + RTE_LOG(ERR, RING, "Cannot free memory\n");
> + return;
> + }
>
> Better to free the ring's memzone after taking it off the list, in the
> unlikely event another thread looks it up at the same time and attempts to use
> it.
>
>
> +
> + ring_list = RTE_TAILQ_CAST(rte_lfring_tailq.head, rte_lfring_list);
> + rte_rwlock_write_lock(RTE_EAL_TAILQ_RWLOCK);
> +
> + /* find out tailq entry */
> + TAILQ_FOREACH(te, ring_list, next) {
> + if (te->data == (void *) r)
> + break;
> + }
> +
> + if (te == NULL) {
> + rte_rwlock_write_unlock(RTE_EAL_TAILQ_RWLOCK);
> + return;
> + }
> +
> + TAILQ_REMOVE(ring_list, te, next);
> +
> + rte_rwlock_write_unlock(RTE_EAL_TAILQ_RWLOCK);
> +
> + rte_free(te);
> +}
>
> <snip>
>
>
> +/* search a ring from its name */
> +struct rte_lfring *
> +rte_lfring_lookup(const char *name)
> +{
> + struct rte_tailq_entry *te;
> + struct rte_lfring *r = NULL;
> + struct rte_lfring_list *ring_list;
> +
> + ring_list = RTE_TAILQ_CAST(rte_lfring_tailq.head, rte_lfring_list);
> +
> + rte_rwlock_read_lock(RTE_EAL_TAILQ_RWLOCK);
> +
> + TAILQ_FOREACH(te, ring_list, next) {
> + r = (struct rte_lfring *) te->data;
> + if (strncmp(name, r->name, RTE_LFRING_NAMESIZE) == 0)
>
> Missing a NULL pointer check before dereferencing 'name'
Why shouldn't the program crash if someone passes a NULL pointer parameter?
Callers will be internal, external users should be able to control whether NULL
is passed instead of a valid pointer.
A crash and a core dump is the best way to detect and debug errors.
>
>
> + break;
> + }
> +
> + rte_rwlock_read_unlock(RTE_EAL_TAILQ_RWLOCK);
> +
> + if (te == NULL) {
> + rte_errno = ENOENT;
> + return NULL;
> + }
> +
> + return r;
> +}
> diff --git a/lib/librte_lfring/rte_lfring.h b/lib/librte_lfring/rte_lfring.h
> new file
> mode 100644 index 0000000..22d3e1c
> --- /dev/null
> +++ b/lib/librte_lfring/rte_lfring.h
> @@ -0,0 +1,621 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + *
> + * Copyright (c) 2010-2017 Intel Corporation
> + * Copyright (c) 2019 Arm Ltd
> + * All rights reserved.
> + */
> +
> +#ifndef _RTE_LFRING_H_
> +#define _RTE_LFRING_H_
> +
> +/**
> + * @file
> + * RTE Lfring
> + *
> + * The Lfring Manager is a fixed-size queue, implemented as a table of
> + * (pointers + counters).
> + *
> + * - FIFO (First In First Out)
> + * - Maximum size is fixed; the pointers are stored in a table.
> + * - Lock-free implementation.
> + * - Multi- or single-consumer dequeue.
> + * - Multi- or single-producer enqueue.
> + *
> + */
> +
> +#ifdef __cplusplus
> +extern "C" {
> +#endif
> +
> +#include <stdio.h>
> +#include <stdint.h>
> +#include <sys/queue.h>
> +#include <errno.h>
> +#include <rte_common.h>
> +#include <rte_config.h>
> +#include <rte_memory.h>
> +#include <rte_lcore.h>
> +#include <rte_atomic.h>
> +#include <rte_branch_prediction.h>
> +#include <rte_memzone.h>
> +#include <rte_pause.h>
> +
>
> You should be able to clean up the includes here too, e.g. rte_pause.h doesn't
> appear to be used.
>
>
> +#include "lockfree16.h"
> +
> +#define RTE_TAILQ_LFRING_NAME "RTE_LFRING"
> +
> +#define RTE_LFRING_MZ_PREFIX "RG_"
> +/**< The maximum length of an lfring name. */ #define
>
> Change "/**<" to "/**", otherwise doxygen applies the comment to
> RTE_LFRING_MZ_PREFIX
>
>
> +RTE_LFRING_NAMESIZE (RTE_MEMZONE_NAMESIZE - \
> + sizeof(RTE_LFRING_MZ_PREFIX) + 1)
> +
> +struct rte_memzone; /* forward declaration, so as not to require
> +memzone.h */
>
> This forward declaration is no longer necessary
>
> <snip>
>
>
> +/**
> + * Calculate the memory size needed for a lfring
> + *
> + * This function returns the number of bytes needed for a lfring, given
> + * the number of elements in it. This value is the sum of the size of
> + * the structure rte_lfring and the size of the memory needed by the
> + * objects pointers. The value is aligned to a cache line size.
> + *
> + * @param count
> + * The number of elements in the lfring (must be a power of 2).
> + * @return
> + * - The memory size needed for the lfring on success.
> + * - -EINVAL if count is not a power of 2.
>
> Missing error case: -EINVAL if size exceeds size limit
>
>
> + */
> +ssize_t rte_lfring_get_memsize(unsigned int count);
> +
> +/**
> + * Initialize a lfring structure.
> + *
> + * Initialize a lfring structure in memory pointed by "r". The size of
> +the
> + * memory area must be large enough to store the lfring structure and
> +the
> + * object table. It is advised to use rte_lfring_get_memsize() to get
> +the
> + * appropriate size.
> + *
> + * The lfring size is set to *count*, which must be a power of two.
> +Water
> + * marking is disabled by default. The real usable lfring size is
> + * *count-1* instead of *count* to differentiate a free lfring from an
> + * empty lfring.
>
> "free lfring" -> should say "full lfring"? Same applies to
> rte_lfring_create().
>
>
> + *
> + * The lfring is not added in RTE_TAILQ_LFRING global list. Indeed, the
> + * memory given by the caller may not be shareable among dpdk
> + * processes.
>
> Here and in rte_lfring_create(), the documentation mentions whether or not the
> lfring is added to the RTE_TAILQ_LFRING list. I don't think this makes sense
> in the documentation, as it pertains to the implementation and not the
> interface.
>
>
> + *
> + * @param r
> + * The pointer to the lfring structure followed by the objects table.
> + * @param name
> + * The name of the lfring.
> + * @param count
> + * The number of elements in the lfring (must be a power of 2).
> + * @param flags
> + * @return
> + * 0 on success, or a negative value on error.
> + */
> +int rte_lfring_init(struct rte_lfring *r, const char *name, unsigned int
> count,
> + unsigned int flags);
> +
> +/**
> + * Create a new lfring named *name* in memory.
> + *
> + * This function uses ``memzone_reserve()`` to allocate memory. Then it
> + * calls rte_lfring_init() to initialize an empty lfring.
> + *
> + * The new lfring size is set to *count*, which must be a power of
> + * two. Water marking is disabled by default. The real usable lfring
> +size
> + * is *count-1* instead of *count* to differentiate a free lfring from
> +an
> + * empty lfring.
> + *
> + * The lfring is added in RTE_TAILQ_LFRING list.
> + *
> + * @param name
> + * The name of the lfring.
> + * @param count
> + * The size of the lfring (must be a power of 2).
> + * @param socket_id
> + * The *socket_id* argument is the socket identifier in case of
> + * NUMA. The value can be *SOCKET_ID_ANY* if there is no NUMA
> + * constraint for the reserved zone.
> + * @param flags
> + * An OR of the following:
> + * - LFRING_F_SP_ENQ: If this flag is set, the default behavior when
> + * using ``rte_lfring_enqueue()`` or ``rte_lfring_enqueue_bulk()``
> + * is "single-producer". Otherwise, it is "multi-producers".
> + * - LFRING_F_SC_DEQ: If this flag is set, the default behavior when
> + * using ``rte_lfring_dequeue()`` or ``rte_lfring_dequeue_bulk()``
> + * is "single-consumer". Otherwise, it is "multi-consumers".
> + * @return
> + * On success, the pointer to the new allocated lfring. NULL on error with
> + * rte_errno set appropriately. Possible errno values include:
> + * - E_RTE_NO_CONFIG - function could not get pointer to rte_config
> structure
> + * - E_RTE_SECONDARY - function was called from a secondary process
> instance
>
> E_RTE_SECONDARY and E_RTE_NO_CONFIG are not possible error cases
>
> <snip>
>
>
> +#define ENQ_RETRY_LIMIT 32
>
> Per the coding guidelines (
> https://doc.dpdk.org/guides/contributing/coding_style.html#macros), macros
> should be prefixed with RTE_.
>
> <snip>
>
>
> +/**
> + * Return the number of elements which can be stored in the lfring.
> + *
> + * @param r
> + * A pointer to the lfring structure.
> + * @return
> + * The usable size of the lfring.
> + */
> +static inline unsigned int
> +rte_lfring_get_capacity(const struct rte_lfring *r) {
> + return r->size;
>
> I believe this should return r->mask, to account for the one unusable ring
> entry.
I think this is a mistake, all ring entries should be usable.
>
> <snip>
>
>
> diff --git a/lib/librte_lfring/rte_lfring_version.map
> b/lib/librte_lfring/rte_lfring_version.map
> new file mode 100644
> index 0000000..d935efd
> --- /dev/null
> +++ b/lib/librte_lfring/rte_lfring_version.map
> @@ -0,0 +1,19 @@
> +DPDK_2.0 {
> + global:
> +
> + rte_ring_create;
> + rte_ring_dump;
> + rte_ring_get_memsize;
> + rte_ring_init;
> + rte_ring_list_dump;
> + rte_ring_lookup;
>
> Need to fix function names and DPDK version number
>
> Thanks,
> Gage
>
>
--
Ola Liljedahl, System Architect, Arm
next prev parent reply other threads:[~2019-06-15 21:00 UTC|newest]
Thread overview: 5+ messages / expand[flat|nested] mbox.gz Atom feed top
2019-01-30 16:36 Ola Liljedahl
2019-06-05 18:21 ` Eads, Gage
2019-06-15 21:00 ` Ola Liljedahl [this message]
2019-06-18 17:06 ` Eads, Gage
2023-06-09 17:10 ` Stephen Hemminger
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=f5223d64b70733ffd56869b8c8e140776bcfea81.camel@arm.com \
--to=ola.liljedahl@arm.com \
--cc=Honnappa.Nagarahalli@arm.com \
--cc=bruce.richardson@intel.com \
--cc=dev@dpdk.org \
--cc=gage.eads@intel.com \
--cc=nd@arm.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).