[dpdk-dev] [PATCH 0/5] lib/ring: templates to support custom element size

DPDK patches and discussions

* [dpdk-dev] [PATCH 0/5] lib/ring: templates to support custom element size
@ 2019-08-28 14:46 Honnappa Nagarahalli
  2019-08-28 14:46 ` [dpdk-dev] [PATCH 1/5] lib/ring: apis to support configurable " Honnappa Nagarahalli
                   ` (7 more replies)
  0 siblings, 8 replies; 173+ messages in thread
From: Honnappa Nagarahalli @ 2019-08-28 14:46 UTC (permalink / raw)
  To: olivier.matz, yipeng1.wang, sameh.gobriel, bruce.richardson,
	pablo.de.lara.guarch, honnappa.nagarahalli
  Cc: dev, dharmik.thakkar, gavin.hu, ruifeng.wang, nd

The current rte_ring hard-codes the type of the ring element to 'void *',
hence the size of the element is hard-coded to 32b/64b. Since the ring
element type is not an input to rte_ring APIs, it results in a couple
of issues:

1) If an application needs to store an element which is not 64b, it
   needs to write its own ring APIs, similar to the rte_event_ring APIs.
   This creates an additional burden on programmers, who simply end up
   writing work-arounds and often waste memory.
2) If there are multiple libraries that store elements of the same
   type, each of them currently has to write its own rte_ring APIs. This
   results in code duplication.

This patch consists of 4 parts:
1) New APIs to support configurable ring element size
   These will help reduce code duplication in the templates. I think these
   can be made internal (do not expose to DPDK applications, but expose to
   DPDK libraries), feedback needed.

2) rte_ring templates
   The templates provide an easy way to add new APIs for different ring
   element types/sizes which can be used by multiple libraries. These
   also allow for creating APIs to store elements of custom types
   (for ex: a structure)

   The template needs 4 parameters:
   a) RTE_RING_TMPLT_API_SUFFIX - This is used as a suffix to the
      rte_ring APIs.
      For ex: if RTE_RING_TMPLT_API_SUFFIX is '32b', the API name will be
      rte_ring_create_32b
   b) RTE_RING_TMPLT_ELEM_SIZE - Size of the ring element in bytes.
      For ex: sizeof(uint32_t)
   c) RTE_RING_TMPLT_ELEM_TYPE - Type of the ring element.
      For ex: uint32_t. If a common ring library does not use a standard
      data type, it should create its own type by defining a structure
      built from standard data types. For ex: for an element size of 96b,
      one could define a structure

      struct s_96b {
          uint32_t a[3];
      };
      The common library can use this structure to define
      RTE_RING_TMPLT_ELEM_TYPE.

      The application using this common ring library should define its
      element type as a union with the above structure.

      union app_element_type {
          struct s_96b v;
          struct app_element {
              uint16_t a;
              uint16_t b;
              uint32_t c;
              uint32_t d;
          } e;
      };
   d) RTE_RING_TMPLT_EXPERIMENTAL - Indicates if the new APIs being defined
      are experimental. Should be set to empty to remove the experimental
      tag.
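To make the union idea in (c) concrete, here is a minimal standalone
sketch. It is not part of the patch: the member name 'e' and the helper
roundtrip_d() are made up for illustration, and the struct copy only
stands in for an enqueue followed by a dequeue.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Element type the common ring library would use (96 bits). */
struct s_96b {
	uint32_t a[3];
};

/* Hypothetical application view of the same 96 bits. */
struct app_element {
	uint16_t a;
	uint16_t b;
	uint32_t c;
	uint32_t d;
};

union app_element_type {
	struct s_96b v;       /* what the ring would enqueue/dequeue */
	struct app_element e; /* what the application reads/writes */
};

/* Both views must have the same size for the union to be usable this way. */
static_assert(sizeof(struct s_96b) == sizeof(struct app_element),
	      "ring view and application view differ in size");

/* Fill in the application view, pass the ring view around, read it back. */
static uint32_t roundtrip_d(uint32_t d)
{
	union app_element_type in, out;
	struct s_96b slot;

	memset(&in, 0, sizeof(in));
	in.e.a = 1;
	in.e.b = 2;
	in.e.c = 3;
	in.e.d = d;

	slot = in.v;   /* stands in for enqueue + dequeue of one element */
	out.v = slot;
	return out.e.d;
}
```

The static assertion is the important part: if the application structure
grows beyond 96b, the build fails instead of silently truncating elements.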

   The ring library consists of some APIs that are defined as inline
   functions and some APIs that are non-inline functions. The non-inline
   functions are in rte_ring_template.c. However, this file needs to be
   included in other .c files. Any feedback on how to handle this is
   appreciated.

   Note that the templates help create the APIs that are dependent on the
   element size (for ex: rte_ring_create, enqueue/dequeue etc). Other APIs
   that do NOT depend on the element size do not need to be part of the
   template (for ex: rte_ring_dump, rte_ring_count, rte_ring_free_count
   etc).
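The suffix mechanism can be tried outside of DPDK with a small
standalone sketch. All DEMO_* and demo_* names below are hypothetical and
only mimic the RTE_RING_TMPLT_* parameters. A macro body stands in for
the template header so that the example fits in one file; the patch
itself uses a header that is included after the parameters are defined.

```c
#include <stddef.h>
#include <stdint.h>

/* Two-level expansion so the suffix macro is expanded before pasting. */
#define DEMO_FUSE_(a, b) a##_##b
#define DEMO_FUSE(a, b) DEMO_FUSE_(a, b)
#define DEMO_CONCAT(a) DEMO_FUSE(a, DEMO_TMPLT_API_SUFFIX)

/* Stand-in for the template body: functions that depend on element size. */
#define DEMO_TEMPLATE_BODY                                              \
	static inline size_t                                            \
	DEMO_CONCAT(demo_slots_size)(unsigned int count)                \
	{                                                               \
		return (size_t)count * DEMO_TMPLT_ELEM_SIZE;            \
	}                                                               \
	static inline DEMO_TMPLT_ELEM_TYPE                              \
	DEMO_CONCAT(demo_first)(const DEMO_TMPLT_ELEM_TYPE *tbl)        \
	{                                                               \
		return tbl[0];                                          \
	}

/* First instantiation: generates demo_slots_size_32 and demo_first_32. */
#define DEMO_TMPLT_API_SUFFIX 32
#define DEMO_TMPLT_ELEM_SIZE sizeof(uint32_t)
#define DEMO_TMPLT_ELEM_TYPE uint32_t
DEMO_TEMPLATE_BODY
#undef DEMO_TMPLT_API_SUFFIX
#undef DEMO_TMPLT_ELEM_SIZE
#undef DEMO_TMPLT_ELEM_TYPE

/* Second instantiation: generates demo_slots_size_64 and demo_first_64. */
#define DEMO_TMPLT_API_SUFFIX 64
#define DEMO_TMPLT_ELEM_SIZE sizeof(uint64_t)
#define DEMO_TMPLT_ELEM_TYPE uint64_t
DEMO_TEMPLATE_BODY
#undef DEMO_TMPLT_API_SUFFIX
#undef DEMO_TMPLT_ELEM_SIZE
#undef DEMO_TMPLT_ELEM_TYPE
```

The extra DEMO_FUSE level matters: pasting directly with ## would glue the
literal token DEMO_TMPLT_API_SUFFIX onto the name instead of its value.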

3) APIs for 32b ring element size
   This uses the templates to create APIs to enqueue/dequeue elements of
   size 32b.

4) rte_hash library is changed to use the 32b ring APIs
   The 32b APIs are used in rte_hash library to store the free slot index
   and free bucket index.
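As a rough illustration of a ring of 32b indices, here is a
single-threaded sketch. It is not the rte_ring implementation (no
multi-producer/consumer handling, no memzone allocation), and every
demo_* name is hypothetical. It only shows the power-of-2 masking and
why the usable size is count-1.

```c
#include <stdint.h>

#define DEMO_COUNT 8u                 /* number of slots, a power of 2 */
#define DEMO_MASK  (DEMO_COUNT - 1u)

struct demo_ring32 {
	uint32_t head;                /* next slot to write */
	uint32_t tail;                /* next slot to read */
	uint32_t slots[DEMO_COUNT];   /* 4 bytes per entry, not sizeof(void *) */
};

/* Returns 0 on success, -1 if the ring is full. */
static int demo_enqueue(struct demo_ring32 *r, uint32_t idx)
{
	/* One slot is kept unused so that full and empty are distinguishable. */
	if (r->head - r->tail == DEMO_MASK)
		return -1;
	r->slots[r->head & DEMO_MASK] = idx;
	r->head++;
	return 0;
}

/* Returns 0 on success, -1 if the ring is empty. */
static int demo_dequeue(struct demo_ring32 *r, uint32_t *idx)
{
	if (r->head == r->tail)
		return -1;
	*idx = r->slots[r->tail & DEMO_MASK];
	r->tail++;
	return 0;
}

/* Reset the ring and try to store indices 1..n; returns how many fit. */
static unsigned int demo_fill_free_list(struct demo_ring32 *r, uint32_t n)
{
	unsigned int stored = 0;
	uint32_t i;

	r->head = 0;
	r->tail = 0;
	for (i = 1; i <= n; i++)
		if (demo_enqueue(r, i) == 0)
			stored++;
	return stored;
}
```

On a 64-bit target, the slot array of such a ring takes half the memory
of one holding the same indices as 'void *' elements.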

This patch results in the following checkpatch issue:
WARNING:UNSPECIFIED_INT: Prefer 'unsigned int' to bare use of 'unsigned'

The patch follows the conventions of the existing code. Please let me know
if this needs to be fixed.

Honnappa Nagarahalli (5):
  lib/ring: apis to support configurable element size
  lib/ring: add template to support different element sizes
  tools/checkpatch: relax constraints on __rte_experimental
  lib/ring: add ring APIs to support 32b ring elements
  lib/hash: use ring with 32b element size to save memory

 devtools/checkpatches.sh             |  11 +-
 lib/librte_hash/rte_cuckoo_hash.c    |  55 ++---
 lib/librte_hash/rte_cuckoo_hash.h    |   2 +-
 lib/librte_ring/Makefile             |   9 +-
 lib/librte_ring/meson.build          |  11 +-
 lib/librte_ring/rte_ring.c           |  34 ++-
 lib/librte_ring/rte_ring.h           |  72 ++++++
 lib/librte_ring/rte_ring_32.c        |  19 ++
 lib/librte_ring/rte_ring_32.h        |  36 +++
 lib/librte_ring/rte_ring_template.c  |  46 ++++
 lib/librte_ring/rte_ring_template.h  | 330 +++++++++++++++++++++++++++
 lib/librte_ring/rte_ring_version.map |   4 +
 12 files changed, 582 insertions(+), 47 deletions(-)
 create mode 100644 lib/librte_ring/rte_ring_32.c
 create mode 100644 lib/librte_ring/rte_ring_32.h
 create mode 100644 lib/librte_ring/rte_ring_template.c
 create mode 100644 lib/librte_ring/rte_ring_template.h

-- 
2.17.1


* [dpdk-dev] [PATCH 1/5] lib/ring: apis to support configurable element size
  2019-08-28 14:46 [dpdk-dev] [PATCH 0/5] lib/ring: templates to support custom element size Honnappa Nagarahalli
@ 2019-08-28 14:46 ` Honnappa Nagarahalli
  2019-08-28 14:46 ` [dpdk-dev] [PATCH 2/5] lib/ring: add template to support different element sizes Honnappa Nagarahalli
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 173+ messages in thread
From: Honnappa Nagarahalli @ 2019-08-28 14:46 UTC (permalink / raw)
  To: olivier.matz, yipeng1.wang, sameh.gobriel, bruce.richardson,
	pablo.de.lara.guarch, honnappa.nagarahalli
  Cc: dev, dharmik.thakkar, gavin.hu, ruifeng.wang, nd

Current APIs assume ring elements to be pointers. However, in many
use cases, the size can be different. The new APIs
rte_ring_get_memsize_elem and rte_ring_create_elem help reduce code
duplication while creating rte_ring templates.
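
The size calculation can be sketched in isolation as below. This is a
standalone approximation, not the code in this patch: DEMO_HDR_SIZE
stands in for sizeof(struct rte_ring), and the cache line size and the
size mask are assumed values. Unlike the POWEROF2 check in the patch,
this sketch also rejects a count of 0.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>   /* ssize_t */

#define DEMO_CACHE_LINE 64u         /* assumed cache line size */
#define DEMO_HDR_SIZE   128u        /* assumed stand-in for the ring header */
#define DEMO_SZ_MASK    0x7fffffffu /* assumed upper limit on count */

static int demo_is_pow2(unsigned int x)
{
	return x != 0 && (x & (x - 1)) == 0;
}

/* Bytes needed for a ring of 'count' elements of 'esize' bytes each. */
static ssize_t demo_get_memsize_elem(unsigned int count, size_t esize)
{
	size_t sz;

	if (!demo_is_pow2(count) || count > DEMO_SZ_MASK)
		return -EINVAL;

	sz = DEMO_HDR_SIZE + (size_t)count * esize;
	/* Round up to a multiple of the cache line size. */
	sz = (sz + DEMO_CACHE_LINE - 1) & ~(size_t)(DEMO_CACHE_LINE - 1);
	return (ssize_t)sz;
}

/* The pointer-sized variant becomes a thin wrapper, as in the patch. */
static ssize_t demo_get_memsize(unsigned int count)
{
	return demo_get_memsize_elem(count, sizeof(void *));
}
```

With these assumed constants, 1024 elements of 4 bytes need 128 + 4096 =
4224 bytes, which is already a multiple of 64.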

Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Dharmik Thakkar <dharmik.thakkar@arm.com>
Reviewed-by: Gavin Hu <gavin.hu@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
---
 lib/librte_ring/Makefile             |  2 +-
 lib/librte_ring/meson.build          |  3 ++
 lib/librte_ring/rte_ring.c           | 34 +++++++++----
 lib/librte_ring/rte_ring.h           | 72 ++++++++++++++++++++++++++++
 lib/librte_ring/rte_ring_version.map |  2 +
 5 files changed, 104 insertions(+), 9 deletions(-)

diff --git a/lib/librte_ring/Makefile b/lib/librte_ring/Makefile
index 21a36770d..4c8410229 100644
--- a/lib/librte_ring/Makefile
+++ b/lib/librte_ring/Makefile
@@ -6,7 +6,7 @@ include $(RTE_SDK)/mk/rte.vars.mk
 # library name
 LIB = librte_ring.a
 
-CFLAGS += $(WERROR_FLAGS) -I$(SRCDIR) -O3
+CFLAGS += $(WERROR_FLAGS) -I$(SRCDIR) -O3 -DALLOW_EXPERIMENTAL_API
 LDLIBS += -lrte_eal
 
 EXPORT_MAP := rte_ring_version.map
diff --git a/lib/librte_ring/meson.build b/lib/librte_ring/meson.build
index ab8b0b469..74219840a 100644
--- a/lib/librte_ring/meson.build
+++ b/lib/librte_ring/meson.build
@@ -6,3 +6,6 @@ sources = files('rte_ring.c')
 headers = files('rte_ring.h',
 		'rte_ring_c11_mem.h',
 		'rte_ring_generic.h')
+
+# rte_ring_create_elem and rte_ring_get_memsize_elem are experimental
+allow_experimental_apis = true
diff --git a/lib/librte_ring/rte_ring.c b/lib/librte_ring/rte_ring.c
index d9b308036..879feb9f6 100644
--- a/lib/librte_ring/rte_ring.c
+++ b/lib/librte_ring/rte_ring.c
@@ -46,23 +46,32 @@ EAL_REGISTER_TAILQ(rte_ring_tailq)
 
 /* return the size of memory occupied by a ring */
 ssize_t
-rte_ring_get_memsize(unsigned count)
+rte_ring_get_memsize_elem(unsigned count, size_t esize)
 {
 	ssize_t sz;
 
 	/* count must be a power of 2 */
 	if ((!POWEROF2(count)) || (count > RTE_RING_SZ_MASK )) {
 		RTE_LOG(ERR, RING,
-			"Requested size is invalid, must be power of 2, and "
-			"do not exceed the size limit %u\n", RTE_RING_SZ_MASK);
+			"Requested number of elements is invalid, must be "
+			"power of 2, and do not exceed the limit %u\n",
+			RTE_RING_SZ_MASK);
+
 		return -EINVAL;
 	}
 
-	sz = sizeof(struct rte_ring) + count * sizeof(void *);
+	sz = sizeof(struct rte_ring) + count * esize;
 	sz = RTE_ALIGN(sz, RTE_CACHE_LINE_SIZE);
 	return sz;
 }
 
+/* return the size of memory occupied by a ring */
+ssize_t
+rte_ring_get_memsize(unsigned count)
+{
+	return rte_ring_get_memsize_elem(count, sizeof(void *));
+}
+
 void
 rte_ring_reset(struct rte_ring *r)
 {
@@ -114,10 +123,10 @@ rte_ring_init(struct rte_ring *r, const char *name, unsigned count,
 	return 0;
 }
 
-/* create the ring */
+/* create the ring for a given element size */
 struct rte_ring *
-rte_ring_create(const char *name, unsigned count, int socket_id,
-		unsigned flags)
+rte_ring_create_elem(const char *name, unsigned count, size_t esize,
+		int socket_id, unsigned flags)
 {
 	char mz_name[RTE_MEMZONE_NAMESIZE];
 	struct rte_ring *r;
@@ -135,7 +144,7 @@ rte_ring_create(const char *name, unsigned count, int socket_id,
 	if (flags & RING_F_EXACT_SZ)
 		count = rte_align32pow2(count + 1);
 
-	ring_size = rte_ring_get_memsize(count);
+	ring_size = rte_ring_get_memsize_elem(count, esize);
 	if (ring_size < 0) {
 		rte_errno = ring_size;
 		return NULL;
@@ -182,6 +191,15 @@ rte_ring_create(const char *name, unsigned count, int socket_id,
 	return r;
 }
 
+/* create the ring */
+struct rte_ring *
+rte_ring_create(const char *name, unsigned count, int socket_id,
+		unsigned flags)
+{
+	return rte_ring_create_elem(name, count, sizeof(void *), socket_id,
+		flags);
+}
+
 /* free the ring */
 void
 rte_ring_free(struct rte_ring *r)
diff --git a/lib/librte_ring/rte_ring.h b/lib/librte_ring/rte_ring.h
index 2a9f768a1..bbc1202d3 100644
--- a/lib/librte_ring/rte_ring.h
+++ b/lib/librte_ring/rte_ring.h
@@ -122,6 +122,29 @@ struct rte_ring {
 #define __IS_SC 1
 #define __IS_MC 0
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * Calculate the memory size needed for a ring with given element size
+ *
+ * This function returns the number of bytes needed for a ring, given
+ * the number of elements in it and the size of the element. This value
+ * is the sum of the size of the structure rte_ring and the size of the
+ * memory needed for storing the elements. The value is aligned to a cache
+ * line size.
+ *
+ * @param count
+ *   The number of elements in the ring (must be a power of 2).
+ * @param esize
+ *   The size of elements in the ring (recommended to be a power of 2).
+ * @return
+ *   - The memory size needed for the ring on success.
+ *   - -EINVAL if count is not a power of 2.
+ */
+__rte_experimental
+ssize_t rte_ring_get_memsize_elem(unsigned count, size_t esize);
+
 /**
  * Calculate the memory size needed for a ring
  *
@@ -175,6 +198,54 @@ ssize_t rte_ring_get_memsize(unsigned count);
 int rte_ring_init(struct rte_ring *r, const char *name, unsigned count,
 	unsigned flags);
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * Create a new ring named *name* that stores elements with given size.
+ *
+ * This function uses ``memzone_reserve()`` to allocate memory. Then it
+ * calls rte_ring_init() to initialize an empty ring.
+ *
+ * The new ring size is set to *count*, which must be a power of
+ * two. Water marking is disabled by default. The real usable ring size
+ * is *count-1* instead of *count* to differentiate a full ring from an
+ * empty ring.
+ *
+ * The ring is added in RTE_TAILQ_RING list.
+ *
+ * @param name
+ *   The name of the ring.
+ * @param count
+ *   The number of elements in the ring (must be a power of 2).
+ * @param esize
+ *   The size of elements in the ring (recommended to be a power of 2).
+ * @param socket_id
+ *   The *socket_id* argument is the socket identifier in case of
+ *   NUMA. The value can be *SOCKET_ID_ANY* if there is no NUMA
+ *   constraint for the reserved zone.
+ * @param flags
+ *   An OR of the following:
+ *    - RING_F_SP_ENQ: If this flag is set, the default behavior when
+ *      using ``rte_ring_enqueue()`` or ``rte_ring_enqueue_bulk()``
+ *      is "single-producer". Otherwise, it is "multi-producers".
+ *    - RING_F_SC_DEQ: If this flag is set, the default behavior when
+ *      using ``rte_ring_dequeue()`` or ``rte_ring_dequeue_bulk()``
+ *      is "single-consumer". Otherwise, it is "multi-consumers".
+ * @return
+ *   On success, the pointer to the new allocated ring. NULL on error with
+ *    rte_errno set appropriately. Possible errno values include:
+ *    - E_RTE_NO_CONFIG - function could not get pointer to rte_config structure
+ *    - E_RTE_SECONDARY - function was called from a secondary process instance
+ *    - EINVAL - count provided is not a power of 2
+ *    - ENOSPC - the maximum number of memzones has already been allocated
+ *    - EEXIST - a memzone with the same name already exists
+ *    - ENOMEM - no appropriate memory area found in which to create memzone
+ */
+__rte_experimental
+struct rte_ring *rte_ring_create_elem(const char *name, unsigned count,
+				size_t esize, int socket_id, unsigned flags);
+
 /**
  * Create a new ring named *name* in memory.
  *
@@ -216,6 +287,7 @@ int rte_ring_init(struct rte_ring *r, const char *name, unsigned count,
  */
 struct rte_ring *rte_ring_create(const char *name, unsigned count,
 				 int socket_id, unsigned flags);
+
 /**
  * De-allocate all memory used by the ring.
  *
diff --git a/lib/librte_ring/rte_ring_version.map b/lib/librte_ring/rte_ring_version.map
index 510c1386e..e410a7503 100644
--- a/lib/librte_ring/rte_ring_version.map
+++ b/lib/librte_ring/rte_ring_version.map
@@ -21,6 +21,8 @@ DPDK_2.2 {
 EXPERIMENTAL {
 	global:
 
+	rte_ring_create_elem;
+	rte_ring_get_memsize_elem;
 	rte_ring_reset;
 
 };
-- 
2.17.1



* [dpdk-dev] [PATCH 2/5] lib/ring: add template to support different element sizes
  2019-08-28 14:46 [dpdk-dev] [PATCH 0/5] lib/ring: templates to support custom element size Honnappa Nagarahalli
  2019-08-28 14:46 ` [dpdk-dev] [PATCH 1/5] lib/ring: apis to support configurable " Honnappa Nagarahalli
@ 2019-08-28 14:46 ` Honnappa Nagarahalli
  2019-10-01 11:47   ` Ananyev, Konstantin
  2019-08-28 14:46 ` [dpdk-dev] [PATCH 3/5] tools/checkpatch: relax constraints on __rte_experimental Honnappa Nagarahalli
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 173+ messages in thread
From: Honnappa Nagarahalli @ 2019-08-28 14:46 UTC (permalink / raw)
  To: olivier.matz, yipeng1.wang, sameh.gobriel, bruce.richardson,
	pablo.de.lara.guarch, honnappa.nagarahalli
  Cc: dev, dharmik.thakkar, gavin.hu, ruifeng.wang, nd

Add templates to support creating ring APIs with different
ring element sizes.

Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Dharmik Thakkar <dharmik.thakkar@arm.com>
Reviewed-by: Gavin Hu <gavin.hu@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
---
 lib/librte_ring/Makefile            |   4 +-
 lib/librte_ring/meson.build         |   4 +-
 lib/librte_ring/rte_ring_template.c |  46 ++++
 lib/librte_ring/rte_ring_template.h | 330 ++++++++++++++++++++++++++++
 4 files changed, 382 insertions(+), 2 deletions(-)
 create mode 100644 lib/librte_ring/rte_ring_template.c
 create mode 100644 lib/librte_ring/rte_ring_template.h

diff --git a/lib/librte_ring/Makefile b/lib/librte_ring/Makefile
index 4c8410229..818898110 100644
--- a/lib/librte_ring/Makefile
+++ b/lib/librte_ring/Makefile
@@ -19,6 +19,8 @@ SRCS-$(CONFIG_RTE_LIBRTE_RING) := rte_ring.c
 # install includes
 SYMLINK-$(CONFIG_RTE_LIBRTE_RING)-include := rte_ring.h \
 					rte_ring_generic.h \
-					rte_ring_c11_mem.h
+					rte_ring_c11_mem.h \
+					rte_ring_template.h \
+					rte_ring_template.c
 
 include $(RTE_SDK)/mk/rte.lib.mk
diff --git a/lib/librte_ring/meson.build b/lib/librte_ring/meson.build
index 74219840a..e4e208a7c 100644
--- a/lib/librte_ring/meson.build
+++ b/lib/librte_ring/meson.build
@@ -5,7 +5,9 @@ version = 2
 sources = files('rte_ring.c')
 headers = files('rte_ring.h',
 		'rte_ring_c11_mem.h',
-		'rte_ring_generic.h')
+		'rte_ring_generic.h',
+		'rte_ring_template.h',
+		'rte_ring_template.c')
 
 # rte_ring_create_elem and rte_ring_get_memsize_elem are experimental
 allow_experimental_apis = true
diff --git a/lib/librte_ring/rte_ring_template.c b/lib/librte_ring/rte_ring_template.c
new file mode 100644
index 000000000..1ca593f95
--- /dev/null
+++ b/lib/librte_ring/rte_ring_template.c
@@ -0,0 +1,46 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2019 Arm Limited
+ */
+
+#include <stdio.h>
+#include <stdarg.h>
+#include <string.h>
+#include <stdint.h>
+#include <inttypes.h>
+#include <errno.h>
+#include <sys/queue.h>
+
+#include <rte_common.h>
+#include <rte_log.h>
+#include <rte_memory.h>
+#include <rte_memzone.h>
+#include <rte_malloc.h>
+#include <rte_launch.h>
+#include <rte_eal.h>
+#include <rte_eal_memconfig.h>
+#include <rte_atomic.h>
+#include <rte_per_lcore.h>
+#include <rte_lcore.h>
+#include <rte_branch_prediction.h>
+#include <rte_errno.h>
+#include <rte_string_fns.h>
+#include <rte_spinlock.h>
+#include <rte_tailq.h>
+
+#include "rte_ring.h"
+
+/* return the size of memory occupied by a ring */
+ssize_t
+__RTE_RING_CONCAT(rte_ring_get_memsize)(unsigned count)
+{
+	return rte_ring_get_memsize_elem(count, RTE_RING_TMPLT_ELEM_SIZE);
+}
+
+/* create the ring */
+struct rte_ring *
+__RTE_RING_CONCAT(rte_ring_create)(const char *name, unsigned count,
+		int socket_id, unsigned flags)
+{
+	return rte_ring_create_elem(name, count, RTE_RING_TMPLT_ELEM_SIZE,
+		socket_id, flags);
+}
diff --git a/lib/librte_ring/rte_ring_template.h b/lib/librte_ring/rte_ring_template.h
new file mode 100644
index 000000000..b9b14dfbb
--- /dev/null
+++ b/lib/librte_ring/rte_ring_template.h
@@ -0,0 +1,330 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2019 Arm Limited
+ */
+
+#ifndef _RTE_RING_TEMPLATE_H_
+#define _RTE_RING_TEMPLATE_H_
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#include <stdio.h>
+#include <stdint.h>
+#include <sys/queue.h>
+#include <errno.h>
+#include <rte_common.h>
+#include <rte_config.h>
+#include <rte_memory.h>
+#include <rte_lcore.h>
+#include <rte_atomic.h>
+#include <rte_branch_prediction.h>
+#include <rte_memzone.h>
+#include <rte_pause.h>
+#include <rte_ring.h>
+
+/* Ring API suffix name - used to append to API names */
+#ifndef RTE_RING_TMPLT_API_SUFFIX
+#error RTE_RING_TMPLT_API_SUFFIX not defined
+#endif
+
+/* Ring's element size in bytes, should be a power of 2 */
+#ifndef RTE_RING_TMPLT_ELEM_SIZE
+#error RTE_RING_TMPLT_ELEM_SIZE not defined
+#endif
+
+/* Type of ring elements */
+#ifndef RTE_RING_TMPLT_ELEM_TYPE
+#error RTE_RING_TMPLT_ELEM_TYPE not defined
+#endif
+
+#define _rte_fuse(a, b) a##_##b
+#define __rte_fuse(a, b) _rte_fuse(a, b)
+#define __RTE_RING_CONCAT(a) __rte_fuse(a, RTE_RING_TMPLT_API_SUFFIX)
+
+/* Calculate the memory size needed for a ring */
+RTE_RING_TMPLT_EXPERIMENTAL
+ssize_t __RTE_RING_CONCAT(rte_ring_get_memsize)(unsigned count);
+
+/* Create a new ring named *name* in memory. */
+RTE_RING_TMPLT_EXPERIMENTAL
+struct rte_ring *
+__RTE_RING_CONCAT(rte_ring_create)(const char *name, unsigned count,
+					int socket_id, unsigned flags);
+
+/**
+ * @internal Enqueue several objects on the ring
+ */
+static __rte_always_inline unsigned int
+__RTE_RING_CONCAT(__rte_ring_do_enqueue)(struct rte_ring *r,
+		RTE_RING_TMPLT_ELEM_TYPE const *obj_table, unsigned int n,
+		enum rte_ring_queue_behavior behavior, unsigned int is_sp,
+		unsigned int *free_space)
+{
+	uint32_t prod_head, prod_next;
+	uint32_t free_entries;
+
+	n = __rte_ring_move_prod_head(r, is_sp, n, behavior,
+			&prod_head, &prod_next, &free_entries);
+	if (n == 0)
+		goto end;
+
+	ENQUEUE_PTRS(r, &r[1], prod_head, obj_table, n,
+		RTE_RING_TMPLT_ELEM_TYPE);
+
+	update_tail(&r->prod, prod_head, prod_next, is_sp, 1);
+end:
+	if (free_space != NULL)
+		*free_space = free_entries - n;
+	return n;
+}
+
+/**
+ * @internal Dequeue several objects from the ring
+ */
+static __rte_always_inline unsigned int
+__RTE_RING_CONCAT(__rte_ring_do_dequeue)(struct rte_ring *r,
+	RTE_RING_TMPLT_ELEM_TYPE *obj_table, unsigned int n,
+	enum rte_ring_queue_behavior behavior, unsigned int is_sc,
+	unsigned int *available)
+{
+	uint32_t cons_head, cons_next;
+	uint32_t entries;
+
+	n = __rte_ring_move_cons_head(r, (int)is_sc, n, behavior,
+			&cons_head, &cons_next, &entries);
+	if (n == 0)
+		goto end;
+
+	DEQUEUE_PTRS(r, &r[1], cons_head, obj_table, n,
+		RTE_RING_TMPLT_ELEM_TYPE);
+
+	update_tail(&r->cons, cons_head, cons_next, is_sc, 0);
+
+end:
+	if (available != NULL)
+		*available = entries - n;
+	return n;
+}
+
+
+/**
+ * Enqueue several objects on the ring (multi-producers safe).
+ */
+static __rte_always_inline unsigned int
+__RTE_RING_CONCAT(rte_ring_mp_enqueue_bulk)(struct rte_ring *r,
+	RTE_RING_TMPLT_ELEM_TYPE const *obj_table, unsigned int n,
+	unsigned int *free_space)
+{
+	return __RTE_RING_CONCAT(__rte_ring_do_enqueue)(r, obj_table, n,
+			RTE_RING_QUEUE_FIXED, __IS_MP, free_space);
+}
+
+/**
+ * Enqueue several objects on a ring (NOT multi-producers safe).
+ */
+static __rte_always_inline unsigned int
+__RTE_RING_CONCAT(rte_ring_sp_enqueue_bulk)(struct rte_ring *r,
+	RTE_RING_TMPLT_ELEM_TYPE const *obj_table, unsigned int n,
+	unsigned int *free_space)
+{
+	return __RTE_RING_CONCAT(__rte_ring_do_enqueue)(r, obj_table, n,
+			RTE_RING_QUEUE_FIXED, __IS_SP, free_space);
+}
+
+/**
+ * Enqueue several objects on a ring.
+ */
+static __rte_always_inline unsigned int
+__RTE_RING_CONCAT(rte_ring_enqueue_bulk)(struct rte_ring *r,
+	RTE_RING_TMPLT_ELEM_TYPE const *obj_table, unsigned int n,
+	unsigned int *free_space)
+{
+	return __RTE_RING_CONCAT(__rte_ring_do_enqueue)(r, obj_table, n,
+			RTE_RING_QUEUE_FIXED, r->prod.single, free_space);
+}
+
+/**
+ * Enqueue one object on a ring (multi-producers safe).
+ */
+static __rte_always_inline int
+__RTE_RING_CONCAT(rte_ring_mp_enqueue)(struct rte_ring *r,
+	RTE_RING_TMPLT_ELEM_TYPE obj)
+{
+	return __RTE_RING_CONCAT(rte_ring_mp_enqueue_bulk)(r, &obj, 1, NULL) ?
+			0 : -ENOBUFS;
+}
+
+/**
+ * Enqueue one object on a ring (NOT multi-producers safe).
+ */
+static __rte_always_inline int
+__RTE_RING_CONCAT(rte_ring_sp_enqueue)(struct rte_ring *r,
+	RTE_RING_TMPLT_ELEM_TYPE obj)
+{
+	return __RTE_RING_CONCAT(rte_ring_sp_enqueue_bulk)(r, &obj, 1, NULL) ?
+			0 : -ENOBUFS;
+}
+
+/**
+ * Enqueue one object on a ring.
+ */
+static __rte_always_inline int
+__RTE_RING_CONCAT(rte_ring_enqueue)(struct rte_ring *r,
+	RTE_RING_TMPLT_ELEM_TYPE *obj)
+{
+	return __RTE_RING_CONCAT(rte_ring_enqueue_bulk)(r, obj, 1, NULL) ?
+			0 : -ENOBUFS;
+}
+
+/**
+ * Dequeue several objects from a ring (multi-consumers safe).
+ */
+static __rte_always_inline unsigned int
+__RTE_RING_CONCAT(rte_ring_mc_dequeue_bulk)(struct rte_ring *r,
+	RTE_RING_TMPLT_ELEM_TYPE *obj_table, unsigned int n,
+	unsigned int *available)
+{
+	return __RTE_RING_CONCAT(__rte_ring_do_dequeue)(r, obj_table, n,
+			RTE_RING_QUEUE_FIXED, __IS_MC, available);
+}
+
+/**
+ * Dequeue several objects from a ring (NOT multi-consumers safe).
+ */
+static __rte_always_inline unsigned int
+__RTE_RING_CONCAT(rte_ring_sc_dequeue_bulk)(struct rte_ring *r,
+	RTE_RING_TMPLT_ELEM_TYPE *obj_table, unsigned int n,
+	unsigned int *available)
+{
+	return __RTE_RING_CONCAT(__rte_ring_do_dequeue)(r, obj_table, n,
+			RTE_RING_QUEUE_FIXED, __IS_SC, available);
+}
+
+/**
+ * Dequeue several objects from a ring.
+ */
+static __rte_always_inline unsigned int
+__RTE_RING_CONCAT(rte_ring_dequeue_bulk)(struct rte_ring *r,
+	RTE_RING_TMPLT_ELEM_TYPE *obj_table, unsigned int n,
+	unsigned int *available)
+{
+	return __RTE_RING_CONCAT(__rte_ring_do_dequeue)(r, obj_table, n,
+			RTE_RING_QUEUE_FIXED, r->cons.single, available);
+}
+
+/**
+ * Dequeue one object from a ring (multi-consumers safe).
+ */
+static __rte_always_inline int
+__RTE_RING_CONCAT(rte_ring_mc_dequeue)(struct rte_ring *r,
+	RTE_RING_TMPLT_ELEM_TYPE *obj_p)
+{
+	return __RTE_RING_CONCAT(rte_ring_mc_dequeue_bulk)(r, obj_p, 1, NULL) ?
+			0 : -ENOENT;
+}
+
+/**
+ * Dequeue one object from a ring (NOT multi-consumers safe).
+ */
+static __rte_always_inline int
+__RTE_RING_CONCAT(rte_ring_sc_dequeue)(struct rte_ring *r,
+	RTE_RING_TMPLT_ELEM_TYPE *obj_p)
+{
+	return __RTE_RING_CONCAT(rte_ring_sc_dequeue_bulk)(r, obj_p, 1, NULL) ?
+			0 : -ENOENT;
+}
+
+/**
+ * Dequeue one object from a ring.
+ */
+static __rte_always_inline int
+__RTE_RING_CONCAT(rte_ring_dequeue)(struct rte_ring *r,
+	RTE_RING_TMPLT_ELEM_TYPE *obj_p)
+{
+	return __RTE_RING_CONCAT(rte_ring_dequeue_bulk)(r, obj_p, 1, NULL) ?
+			0 : -ENOENT;
+}
+
+/**
+ * Enqueue several objects on the ring (multi-producers safe).
+ */
+static __rte_always_inline unsigned
+__RTE_RING_CONCAT(rte_ring_mp_enqueue_burst)(struct rte_ring *r,
+	RTE_RING_TMPLT_ELEM_TYPE *obj_table,
+			 unsigned int n, unsigned int *free_space)
+{
+	return __RTE_RING_CONCAT(__rte_ring_do_enqueue)(r, obj_table, n,
+			RTE_RING_QUEUE_VARIABLE, __IS_MP, free_space);
+}
+
+/**
+ * Enqueue several objects on a ring (NOT multi-producers safe).
+ */
+static __rte_always_inline unsigned
+__RTE_RING_CONCAT(rte_ring_sp_enqueue_burst)(struct rte_ring *r,
+	RTE_RING_TMPLT_ELEM_TYPE *obj_table,
+			 unsigned int n, unsigned int *free_space)
+{
+	return __RTE_RING_CONCAT(__rte_ring_do_enqueue)(r, obj_table, n,
+			RTE_RING_QUEUE_VARIABLE, __IS_SP, free_space);
+}
+
+/**
+ * Enqueue several objects on a ring.
+ */
+static __rte_always_inline unsigned
+__RTE_RING_CONCAT(rte_ring_enqueue_burst)(struct rte_ring *r,
+	RTE_RING_TMPLT_ELEM_TYPE *obj_table, unsigned int n,
+	unsigned int *free_space)
+{
+	return __RTE_RING_CONCAT(__rte_ring_do_enqueue)(r, obj_table, n,
+			RTE_RING_QUEUE_VARIABLE, r->prod.single, free_space);
+}
+
+/**
+ * Dequeue several objects from a ring (multi-consumers safe). When the request
+ * objects are more than the available objects, only dequeue the actual number
+ * of objects
+ */
+static __rte_always_inline unsigned
+__RTE_RING_CONCAT(rte_ring_mc_dequeue_burst)(struct rte_ring *r,
+	RTE_RING_TMPLT_ELEM_TYPE *obj_table, unsigned int n,
+	unsigned int *available)
+{
+	return __RTE_RING_CONCAT(__rte_ring_do_dequeue)(r, obj_table, n,
+			RTE_RING_QUEUE_VARIABLE, __IS_MC, available);
+}
+
+/**
+ * Dequeue several objects from a ring (NOT multi-consumers safe). When the
+ * request objects are more than the available objects, only dequeue the
+ * actual number of objects
+ */
+static __rte_always_inline unsigned
+__RTE_RING_CONCAT(rte_ring_sc_dequeue_burst)(struct rte_ring *r,
+	RTE_RING_TMPLT_ELEM_TYPE *obj_table, unsigned int n,
+	unsigned int *available)
+{
+	return __RTE_RING_CONCAT(__rte_ring_do_dequeue)(r, obj_table, n,
+			RTE_RING_QUEUE_VARIABLE, __IS_SC, available);
+}
+
+/**
+ * Dequeue multiple objects from a ring up to a maximum number.
+ */
+static __rte_always_inline unsigned
+__RTE_RING_CONCAT(rte_ring_dequeue_burst)(struct rte_ring *r,
+	RTE_RING_TMPLT_ELEM_TYPE *obj_table, unsigned int n,
+	unsigned int *available)
+{
+	return __RTE_RING_CONCAT(__rte_ring_do_dequeue)(r, obj_table, n,
+				RTE_RING_QUEUE_VARIABLE,
+				r->cons.single, available);
+}
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* _RTE_RING_TEMPLATE_H_ */
-- 
2.17.1



* [dpdk-dev] [PATCH 3/5] tools/checkpatch: relax constraints on __rte_experimental
  2019-08-28 14:46 [dpdk-dev] [PATCH 0/5] lib/ring: templates to support custom element size Honnappa Nagarahalli
  2019-08-28 14:46 ` [dpdk-dev] [PATCH 1/5] lib/ring: apis to support configurable " Honnappa Nagarahalli
  2019-08-28 14:46 ` [dpdk-dev] [PATCH 2/5] lib/ring: add template to support different element sizes Honnappa Nagarahalli
@ 2019-08-28 14:46 ` Honnappa Nagarahalli
  2019-08-28 14:46 ` [dpdk-dev] [PATCH 4/5] lib/ring: add ring APIs to support 32b ring elements Honnappa Nagarahalli
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 173+ messages in thread
From: Honnappa Nagarahalli @ 2019-08-28 14:46 UTC (permalink / raw)
  To: olivier.matz, yipeng1.wang, sameh.gobriel, bruce.richardson,
	pablo.de.lara.guarch, honnappa.nagarahalli
  Cc: dev, dharmik.thakkar, gavin.hu, ruifeng.wang, nd

Relax the constraints on __rte_experimental usage so that it can also
appear in macro definitions (for ex: '#define XYZ __rte_experimental').

Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Gavin Hu <gavin.hu@arm.com>
---
 devtools/checkpatches.sh | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/devtools/checkpatches.sh b/devtools/checkpatches.sh
index 560e6ce93..090c9b08a 100755
--- a/devtools/checkpatches.sh
+++ b/devtools/checkpatches.sh
@@ -99,9 +99,14 @@ check_experimental_tags() { # <patch>
 			ret = 1;
 		}
 		if ($1 != "+__rte_experimental" || $2 != "") {
-			print "__rte_experimental must appear alone on the line" \
-				" immediately preceding the return type of a function."
-			ret = 1;
+			# code such as "#define XYZ __rte_experimental" is
+			# allowed
+			if ($1 != "+#define") {
+				print "__rte_experimental must appear alone " \
+				      "on the line immediately preceding the " \
+				      "return type of a function."
+				ret = 1;
+			}
 		}
 	}
 	END {
-- 
2.17.1



* [dpdk-dev] [PATCH 4/5] lib/ring: add ring APIs to support 32b ring elements
  2019-08-28 14:46 [dpdk-dev] [PATCH 0/5] lib/ring: templates to support custom element size Honnappa Nagarahalli
                   ` (2 preceding siblings ...)
  2019-08-28 14:46 ` [dpdk-dev] [PATCH 3/5] tools/checkpatch: relax constraints on __rte_experimental Honnappa Nagarahalli
@ 2019-08-28 14:46 ` Honnappa Nagarahalli
  2019-08-28 14:46 ` [dpdk-dev] [PATCH 5/5] lib/hash: use ring with 32b element size to save memory Honnappa Nagarahalli
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 173+ messages in thread
From: Honnappa Nagarahalli @ 2019-08-28 14:46 UTC (permalink / raw)
  To: olivier.matz, yipeng1.wang, sameh.gobriel, bruce.richardson,
	pablo.de.lara.guarch, honnappa.nagarahalli
  Cc: dev, dharmik.thakkar, gavin.hu, ruifeng.wang, nd

Add ring APIs to support 32b ring elements using templates.

Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Dharmik Thakkar <dharmik.thakkar@arm.com>
Reviewed-by: Gavin Hu <gavin.hu@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
---
 lib/librte_ring/Makefile             |  3 ++-
 lib/librte_ring/meson.build          |  4 +++-
 lib/librte_ring/rte_ring_32.c        | 19 +++++++++++++++
 lib/librte_ring/rte_ring_32.h        | 36 ++++++++++++++++++++++++++++
 lib/librte_ring/rte_ring_version.map |  2 ++
 5 files changed, 62 insertions(+), 2 deletions(-)
 create mode 100644 lib/librte_ring/rte_ring_32.c
 create mode 100644 lib/librte_ring/rte_ring_32.h

diff --git a/lib/librte_ring/Makefile b/lib/librte_ring/Makefile
index 818898110..3102bb64d 100644
--- a/lib/librte_ring/Makefile
+++ b/lib/librte_ring/Makefile
@@ -14,10 +14,11 @@ EXPORT_MAP := rte_ring_version.map
 LIBABIVER := 2
 
 # all source are stored in SRCS-y
-SRCS-$(CONFIG_RTE_LIBRTE_RING) := rte_ring.c
+SRCS-$(CONFIG_RTE_LIBRTE_RING) := rte_ring.c rte_ring_32.c
 
 # install includes
 SYMLINK-$(CONFIG_RTE_LIBRTE_RING)-include := rte_ring.h \
+					rte_ring_32.h \
 					rte_ring_generic.h \
 					rte_ring_c11_mem.h \
 					rte_ring_template.h \
diff --git a/lib/librte_ring/meson.build b/lib/librte_ring/meson.build
index e4e208a7c..81ea53ed7 100644
--- a/lib/librte_ring/meson.build
+++ b/lib/librte_ring/meson.build
@@ -2,8 +2,10 @@
 # Copyright(c) 2017 Intel Corporation
 
 version = 2
-sources = files('rte_ring.c')
+sources = files('rte_ring.c',
+		'rte_ring_32.c')
 headers = files('rte_ring.h',
+		'rte_ring_32.h',
 		'rte_ring_c11_mem.h',
 		'rte_ring_generic.h',
 		'rte_ring_template.h',
diff --git a/lib/librte_ring/rte_ring_32.c b/lib/librte_ring/rte_ring_32.c
new file mode 100644
index 000000000..09e90cec1
--- /dev/null
+++ b/lib/librte_ring/rte_ring_32.c
@@ -0,0 +1,19 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2019 Arm Limited
+ */
+
+#include <stdio.h>
+#include <stdint.h>
+#include <sys/queue.h>
+#include <errno.h>
+#include <rte_common.h>
+#include <rte_config.h>
+#include <rte_memory.h>
+#include <rte_lcore.h>
+#include <rte_atomic.h>
+#include <rte_branch_prediction.h>
+#include <rte_memzone.h>
+#include <rte_pause.h>
+
+#include <rte_ring_32.h>
+#include <rte_ring_template.c>
diff --git a/lib/librte_ring/rte_ring_32.h b/lib/librte_ring/rte_ring_32.h
new file mode 100644
index 000000000..5270a9bc7
--- /dev/null
+++ b/lib/librte_ring/rte_ring_32.h
@@ -0,0 +1,36 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2019 Arm Limited
+ */
+
+#ifndef _RTE_RING_32_H_
+#define _RTE_RING_32_H_
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#include <stdio.h>
+#include <stdint.h>
+#include <sys/queue.h>
+#include <errno.h>
+#include <rte_common.h>
+#include <rte_config.h>
+#include <rte_memory.h>
+#include <rte_lcore.h>
+#include <rte_atomic.h>
+#include <rte_branch_prediction.h>
+#include <rte_memzone.h>
+#include <rte_pause.h>
+
+#define RTE_RING_TMPLT_API_SUFFIX 32
+#define RTE_RING_TMPLT_ELEM_SIZE sizeof(uint32_t)
+#define RTE_RING_TMPLT_ELEM_TYPE uint32_t
+#define RTE_RING_TMPLT_EXPERIMENTAL __rte_experimental
+
+#include <rte_ring_template.h>
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* _RTE_RING_32_H_ */
diff --git a/lib/librte_ring/rte_ring_version.map b/lib/librte_ring/rte_ring_version.map
index e410a7503..9efba91bb 100644
--- a/lib/librte_ring/rte_ring_version.map
+++ b/lib/librte_ring/rte_ring_version.map
@@ -21,7 +21,9 @@ DPDK_2.2 {
 EXPERIMENTAL {
 	global:
 
+	rte_ring_create_32;
 	rte_ring_create_elem;
+	rte_ring_get_memsize_32;
 	rte_ring_get_memsize_elem;
 	rte_ring_reset;
 
-- 
2.17.1



* [dpdk-dev] [PATCH 5/5] lib/hash: use ring with 32b element size to save memory
  2019-08-28 14:46 [dpdk-dev] [PATCH 0/5] lib/ring: templates to support custom element size Honnappa Nagarahalli
                   ` (3 preceding siblings ...)
  2019-08-28 14:46 ` [dpdk-dev] [PATCH 4/5] lib/ring: add ring APIs to support 32b ring elements Honnappa Nagarahalli
@ 2019-08-28 14:46 ` Honnappa Nagarahalli
  2019-08-28 15:12 ` [dpdk-dev] [PATCH 0/5] lib/ring: templates to support custom element size Jerin Jacob Kollanukkaran
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 173+ messages in thread
From: Honnappa Nagarahalli @ 2019-08-28 14:46 UTC (permalink / raw)
  To: olivier.matz, yipeng1.wang, sameh.gobriel, bruce.richardson,
	pablo.de.lara.guarch, honnappa.nagarahalli
  Cc: dev, dharmik.thakkar, gavin.hu, ruifeng.wang, nd

The freelist and external bucket indices are 32b. Using rings
that use 32b element sizes will save memory.

Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Dharmik Thakkar <dharmik.thakkar@arm.com>
Reviewed-by: Gavin Hu <gavin.hu@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
---
 lib/librte_hash/rte_cuckoo_hash.c | 55 ++++++++++++++-----------------
 lib/librte_hash/rte_cuckoo_hash.h |  2 +-
 2 files changed, 26 insertions(+), 31 deletions(-)

diff --git a/lib/librte_hash/rte_cuckoo_hash.c b/lib/librte_hash/rte_cuckoo_hash.c
index 87a4c01f2..a0cd3360a 100644
--- a/lib/librte_hash/rte_cuckoo_hash.c
+++ b/lib/librte_hash/rte_cuckoo_hash.c
@@ -24,7 +24,7 @@
 #include <rte_cpuflags.h>
 #include <rte_rwlock.h>
 #include <rte_spinlock.h>
-#include <rte_ring.h>
+#include <rte_ring_32.h>
 #include <rte_compat.h>
 #include <rte_vect.h>
 #include <rte_tailq.h>
@@ -213,7 +213,7 @@ rte_hash_create(const struct rte_hash_parameters *params)
 
 	snprintf(ring_name, sizeof(ring_name), "HT_%s", params->name);
 	/* Create ring (Dummy slot index is not enqueued) */
-	r = rte_ring_create(ring_name, rte_align32pow2(num_key_slots),
+	r = rte_ring_create_32(ring_name, rte_align32pow2(num_key_slots),
 			params->socket_id, 0);
 	if (r == NULL) {
 		RTE_LOG(ERR, HASH, "memory allocation failed\n");
@@ -227,7 +227,7 @@ rte_hash_create(const struct rte_hash_parameters *params)
 	if (ext_table_support) {
 		snprintf(ext_ring_name, sizeof(ext_ring_name), "HT_EXT_%s",
 								params->name);
-		r_ext = rte_ring_create(ext_ring_name,
+		r_ext = rte_ring_create_32(ext_ring_name,
 				rte_align32pow2(num_buckets + 1),
 				params->socket_id, 0);
 
@@ -295,7 +295,7 @@ rte_hash_create(const struct rte_hash_parameters *params)
 		 * for next bucket
 		 */
 		for (i = 1; i <= num_buckets; i++)
-			rte_ring_sp_enqueue(r_ext, (void *)((uintptr_t) i));
+			rte_ring_sp_enqueue_32(r_ext, i);
 
 		if (readwrite_concur_lf_support) {
 			ext_bkt_to_free = rte_zmalloc(NULL, sizeof(uint32_t) *
@@ -434,7 +434,7 @@ rte_hash_create(const struct rte_hash_parameters *params)
 
 	/* Populate free slots ring. Entry zero is reserved for key misses. */
 	for (i = 1; i < num_key_slots; i++)
-		rte_ring_sp_enqueue(r, (void *)((uintptr_t) i));
+		rte_ring_sp_enqueue_32(r, i);
 
 	te->data = (void *) h;
 	TAILQ_INSERT_TAIL(hash_list, te, next);
@@ -598,13 +598,12 @@ rte_hash_reset(struct rte_hash *h)
 		tot_ring_cnt = h->entries;
 
 	for (i = 1; i < tot_ring_cnt + 1; i++)
-		rte_ring_sp_enqueue(h->free_slots, (void *)((uintptr_t) i));
+		rte_ring_sp_enqueue_32(h->free_slots, i);
 
 	/* Repopulate the free ext bkt ring. */
 	if (h->ext_table_support) {
 		for (i = 1; i <= h->num_buckets; i++)
-			rte_ring_sp_enqueue(h->free_ext_bkts,
-						(void *)((uintptr_t) i));
+			rte_ring_sp_enqueue_32(h->free_ext_bkts, i);
 	}
 
 	if (h->use_local_cache) {
@@ -623,13 +622,13 @@ rte_hash_reset(struct rte_hash *h)
 static inline void
 enqueue_slot_back(const struct rte_hash *h,
 		struct lcore_cache *cached_free_slots,
-		void *slot_id)
+		uint32_t slot_id)
 {
 	if (h->use_local_cache) {
 		cached_free_slots->objs[cached_free_slots->len] = slot_id;
 		cached_free_slots->len++;
 	} else
-		rte_ring_sp_enqueue(h->free_slots, slot_id);
+		rte_ring_sp_enqueue_32(h->free_slots, slot_id);
 }
 
 /* Search a key from bucket and update its data.
@@ -923,8 +922,8 @@ __rte_hash_add_key_with_hash(const struct rte_hash *h, const void *key,
 	uint32_t prim_bucket_idx, sec_bucket_idx;
 	struct rte_hash_bucket *prim_bkt, *sec_bkt, *cur_bkt;
 	struct rte_hash_key *new_k, *keys = h->key_store;
-	void *slot_id = NULL;
-	void *ext_bkt_id = NULL;
+	uint32_t slot_id = 0;
+	uint32_t ext_bkt_id = 0;
 	uint32_t new_idx, bkt_id;
 	int ret;
 	unsigned n_slots;
@@ -968,7 +967,7 @@ __rte_hash_add_key_with_hash(const struct rte_hash *h, const void *key,
 		/* Try to get a free slot from the local cache */
 		if (cached_free_slots->len == 0) {
 			/* Need to get another burst of free slots from global ring */
-			n_slots = rte_ring_mc_dequeue_burst(h->free_slots,
+			n_slots = rte_ring_mc_dequeue_burst_32(h->free_slots,
 					cached_free_slots->objs,
 					LCORE_CACHE_SIZE, NULL);
 			if (n_slots == 0) {
@@ -982,13 +981,12 @@ __rte_hash_add_key_with_hash(const struct rte_hash *h, const void *key,
 		cached_free_slots->len--;
 		slot_id = cached_free_slots->objs[cached_free_slots->len];
 	} else {
-		if (rte_ring_sc_dequeue(h->free_slots, &slot_id) != 0) {
+		if (rte_ring_sc_dequeue_32(h->free_slots, &slot_id) != 0)
 			return -ENOSPC;
-		}
 	}
 
-	new_k = RTE_PTR_ADD(keys, (uintptr_t)slot_id * h->key_entry_size);
-	new_idx = (uint32_t)((uintptr_t) slot_id);
+	new_k = RTE_PTR_ADD(keys, slot_id * h->key_entry_size);
+	new_idx = slot_id;
 	/* The store to application data (by the application) at *data should
 	 * not leak after the store of pdata in the key store. i.e. pdata is
 	 * the guard variable. Release the application data to the readers.
@@ -1078,12 +1076,12 @@ __rte_hash_add_key_with_hash(const struct rte_hash *h, const void *key,
 	/* Failed to get an empty entry from extendable buckets. Link a new
 	 * extendable bucket. We first get a free bucket from ring.
 	 */
-	if (rte_ring_sc_dequeue(h->free_ext_bkts, &ext_bkt_id) != 0) {
+	if (rte_ring_sc_dequeue_32(h->free_ext_bkts, &ext_bkt_id) != 0) {
 		ret = -ENOSPC;
 		goto failure;
 	}
 
-	bkt_id = (uint32_t)((uintptr_t)ext_bkt_id) - 1;
+	bkt_id = ext_bkt_id - 1;
 	/* Use the first location of the new bucket */
 	(h->buckets_ext[bkt_id]).sig_current[0] = short_sig;
 	/* Store to signature and key should not leak after
@@ -1373,7 +1371,7 @@ remove_entry(const struct rte_hash *h, struct rte_hash_bucket *bkt, unsigned i)
 		/* Cache full, need to free it. */
 		if (cached_free_slots->len == LCORE_CACHE_SIZE) {
 			/* Need to enqueue the free slots in global ring. */
-			n_slots = rte_ring_mp_enqueue_burst(h->free_slots,
+			n_slots = rte_ring_mp_enqueue_burst_32(h->free_slots,
 						cached_free_slots->objs,
 						LCORE_CACHE_SIZE, NULL);
 			ERR_IF_TRUE((n_slots == 0),
@@ -1383,11 +1381,10 @@ remove_entry(const struct rte_hash *h, struct rte_hash_bucket *bkt, unsigned i)
 		}
 		/* Put index of new free slot in cache. */
 		cached_free_slots->objs[cached_free_slots->len] =
-				(void *)((uintptr_t)bkt->key_idx[i]);
+				bkt->key_idx[i];
 		cached_free_slots->len++;
 	} else {
-		rte_ring_sp_enqueue(h->free_slots,
-				(void *)((uintptr_t)bkt->key_idx[i]));
+		rte_ring_sp_enqueue_32(h->free_slots, bkt->key_idx[i]);
 	}
 }
 
@@ -1551,7 +1548,7 @@ __rte_hash_del_key_with_hash(const struct rte_hash *h, const void *key,
 			 */
 			h->ext_bkt_to_free[ret] = index;
 		else
-			rte_ring_sp_enqueue(h->free_ext_bkts, (void *)(uintptr_t)index);
+			rte_ring_sp_enqueue_32(h->free_ext_bkts, index);
 	}
 	__hash_rw_writer_unlock(h);
 	return ret;
@@ -1614,7 +1611,7 @@ rte_hash_free_key_with_position(const struct rte_hash *h,
 		uint32_t index = h->ext_bkt_to_free[position];
 		if (index) {
 			/* Recycle empty ext bkt to free list. */
-			rte_ring_sp_enqueue(h->free_ext_bkts, (void *)(uintptr_t)index);
+			rte_ring_sp_enqueue_32(h->free_ext_bkts, index);
 			h->ext_bkt_to_free[position] = 0;
 		}
 	}
@@ -1625,19 +1622,17 @@ rte_hash_free_key_with_position(const struct rte_hash *h,
 		/* Cache full, need to free it. */
 		if (cached_free_slots->len == LCORE_CACHE_SIZE) {
 			/* Need to enqueue the free slots in global ring. */
-			n_slots = rte_ring_mp_enqueue_burst(h->free_slots,
+			n_slots = rte_ring_mp_enqueue_burst_32(h->free_slots,
 						cached_free_slots->objs,
 						LCORE_CACHE_SIZE, NULL);
 			RETURN_IF_TRUE((n_slots == 0), -EFAULT);
 			cached_free_slots->len -= n_slots;
 		}
 		/* Put index of new free slot in cache. */
-		cached_free_slots->objs[cached_free_slots->len] =
-					(void *)((uintptr_t)key_idx);
+		cached_free_slots->objs[cached_free_slots->len] = key_idx;
 		cached_free_slots->len++;
 	} else {
-		rte_ring_sp_enqueue(h->free_slots,
-				(void *)((uintptr_t)key_idx));
+		rte_ring_sp_enqueue_32(h->free_slots, key_idx);
 	}
 
 	return 0;
diff --git a/lib/librte_hash/rte_cuckoo_hash.h b/lib/librte_hash/rte_cuckoo_hash.h
index fb19bb27d..345de6bf9 100644
--- a/lib/librte_hash/rte_cuckoo_hash.h
+++ b/lib/librte_hash/rte_cuckoo_hash.h
@@ -124,7 +124,7 @@ const rte_hash_cmp_eq_t cmp_jump_table[NUM_KEY_CMP_CASES] = {
 
 struct lcore_cache {
 	unsigned len; /**< Cache len */
-	void *objs[LCORE_CACHE_SIZE]; /**< Cache objects */
+	uint32_t objs[LCORE_CACHE_SIZE]; /**< Cache objects */
 } __rte_cache_aligned;
 
 /* Structure that stores key-value pair */
-- 
2.17.1



* Re: [dpdk-dev] [PATCH 0/5] lib/ring: templates to support custom element size
  2019-08-28 14:46 [dpdk-dev] [PATCH 0/5] lib/ring: templates to support custom element size Honnappa Nagarahalli
                   ` (4 preceding siblings ...)
  2019-08-28 14:46 ` [dpdk-dev] [PATCH 5/5] lib/hash: use ring with 32b element size to save memory Honnappa Nagarahalli
@ 2019-08-28 15:12 ` Jerin Jacob Kollanukkaran
  2019-08-28 15:16 ` Pavan Nikhilesh Bhagavatula
  2019-09-06 19:05 ` [dpdk-dev] [PATCH v2 0/6] " Honnappa Nagarahalli
  7 siblings, 0 replies; 173+ messages in thread
From: Jerin Jacob Kollanukkaran @ 2019-08-28 15:12 UTC (permalink / raw)
  To: Honnappa Nagarahalli, olivier.matz, yipeng1.wang, sameh.gobriel,
	bruce.richardson, pablo.de.lara.guarch
  Cc: dev, dharmik.thakkar, gavin.hu, ruifeng.wang, nd

> -----Original Message-----
> From: dev <dev-bounces@dpdk.org> On Behalf Of Honnappa Nagarahalli
> Sent: Wednesday, August 28, 2019 8:16 PM
> To: olivier.matz@6wind.com; yipeng1.wang@intel.com;
> sameh.gobriel@intel.com; bruce.richardson@intel.com;
> pablo.de.lara.guarch@intel.com; honnappa.nagarahalli@arm.com
> Cc: dev@dpdk.org; dharmik.thakkar@arm.com; gavin.hu@arm.com;
> ruifeng.wang@arm.com; nd@arm.com
> Subject: [dpdk-dev] [PATCH 0/5] lib/ring: templates to support custom element
> size
> 
> The current rte_ring hard-codes the type of the ring element to 'void *', hence
> the size of the element is hard-coded to 32b/64b. Since the ring element type is
> not an input to rte_ring APIs, it results in couple of issues:
> 
> 1) If an application requires to store an element which is not 64b, it
>    needs to writes its own ring APIs similar to rte_event_ring APIs. This
>    creates additional burden on the programmers, who simply end up making
>    work-arounds and often waste memory.

If we are taking this path, could you change the rte_event_ring implementation
to be based on the new framework?



> 2) If there are multiple libraries that store elements of the same
>    type, currently they would have to write their own rte_ring APIs. This
>    results in code duplication.
> 
> This patch consists of 4 parts:
> 1) New APIs to support configurable ring element size
>    These will help reduce code duplication in the templates. I think these
>    can be made internal (do not expose to DPDK applications, but expose to
>    DPDK libraries), feedback needed.
> 
> 2) rte_ring templates
>    The templates provide an easy way to add new APIs for different ring
>    element types/sizes which can be used by multiple libraries. These
>    also allow for creating APIs to store elements of custom types
>    (for ex: a structure)
> 
>    The template needs 4 parameters:
>    a) RTE_RING_TMPLT_API_SUFFIX - This is used as a suffix to the
>       rte_ring APIs.
>       For ex: if RTE_RING_TMPLT_API_SUFFIX is '32b', the API name will be
>       rte_ring_create_32b
>    b) RTE_RING_TMPLT_ELEM_SIZE - Size of the ring element in bytes.
>       For ex: sizeof(uint32_t)
>    c) RTE_RING_TMPLT_ELEM_TYPE - Type of the ring element.
>       For ex: uint32_t. If a common ring library does not use a standard
>       data type, it should create its own type by defining a structure
>       with standard data type. For ex: for an elment size of 96b, one
>       could define a structure
> 
>       struct s_96b {
>           uint32_t a[3];
>       }
>       The common library can use this structure to define
>       RTE_RING_TMPLT_ELEM_TYPE.
> 
>       The application using this common ring library should define its
>       element type as a union with the above structure.
> 
>       union app_element_type {
>           struct s_96b v;
>           struct app_element {
>               uint16_t a;
>               uint16_t b;
>               uint32_t c;
>               uint32_t d;
>           }
>       }
>    d) RTE_RING_TMPLT_EXPERIMENTAL - Indicates if the new APIs being defined
>       are experimental. Should be set to empty to remove the experimental
>       tag.
> 
>    The ring library consists of some APIs that are defined as inline
>    functions and some APIs that are non-inline functions. The non-inline
>    functions are in rte_ring_template.c. However, this file needs to be
>    included in other .c files. Any feedback on how to handle this is
>    appreciated.
> 
>    Note that the templates help create the APIs that are dependent on the
>    element size (for ex: rte_ring_create, enqueue/dequeue etc). Other APIs
>    that do NOT depend on the element size do not need to be part of the
>    template (for ex: rte_ring_dump, rte_ring_count, rte_ring_free_count
>    etc).
> 
> 3) APIs for 32b ring element size
>    This uses the templates to create APIs to enqueue/dequeue elements of
>    size 32b.
> 
> 4) rte_hash libray is changed to use 32b ring APIs
>    The 32b APIs are used in rte_hash library to store the free slot index
>    and free bucket index.
> 
> This patch results in following checkpatch issue:
> WARNING:UNSPECIFIED_INT: Prefer 'unsigned int' to bare use of 'unsigned'
> 
> The patch is following the rules in the existing code. Please let me know if this
> needs to be fixed.
> 
> Honnappa Nagarahalli (5):
>   lib/ring: apis to support configurable element size
>   lib/ring: add template to support different element sizes
>   tools/checkpatch: relax constraints on __rte_experimental
>   lib/ring: add ring APIs to support 32b ring elements
>   lib/hash: use ring with 32b element size to save memory
> 
>  devtools/checkpatches.sh             |  11 +-
>  lib/librte_hash/rte_cuckoo_hash.c    |  55 ++---
>  lib/librte_hash/rte_cuckoo_hash.h    |   2 +-
>  lib/librte_ring/Makefile             |   9 +-
>  lib/librte_ring/meson.build          |  11 +-
>  lib/librte_ring/rte_ring.c           |  34 ++-
>  lib/librte_ring/rte_ring.h           |  72 ++++++
>  lib/librte_ring/rte_ring_32.c        |  19 ++
>  lib/librte_ring/rte_ring_32.h        |  36 +++
>  lib/librte_ring/rte_ring_template.c  |  46 ++++
>  lib/librte_ring/rte_ring_template.h  | 330 +++++++++++++++++++++++++++
>  lib/librte_ring/rte_ring_version.map |   4 +
>  12 files changed, 582 insertions(+), 47 deletions(-)
>  create mode 100644 lib/librte_ring/rte_ring_32.c
>  create mode 100644 lib/librte_ring/rte_ring_32.h
>  create mode 100644 lib/librte_ring/rte_ring_template.c
>  create mode 100644 lib/librte_ring/rte_ring_template.h
> 
> --
> 2.17.1



* Re: [dpdk-dev] [PATCH 0/5] lib/ring: templates to support custom element size
  2019-08-28 14:46 [dpdk-dev] [PATCH 0/5] lib/ring: templates to support custom element size Honnappa Nagarahalli
                   ` (5 preceding siblings ...)
  2019-08-28 15:12 ` [dpdk-dev] [PATCH 0/5] lib/ring: templates to support custom element size Jerin Jacob Kollanukkaran
@ 2019-08-28 15:16 ` Pavan Nikhilesh Bhagavatula
  2019-08-28 22:59   ` Honnappa Nagarahalli
  2019-09-06 19:05 ` [dpdk-dev] [PATCH v2 0/6] " Honnappa Nagarahalli
  7 siblings, 1 reply; 173+ messages in thread
From: Pavan Nikhilesh Bhagavatula @ 2019-08-28 15:16 UTC (permalink / raw)
  To: Honnappa Nagarahalli, olivier.matz, yipeng1.wang, sameh.gobriel,
	bruce.richardson, pablo.de.lara.guarch
  Cc: dev, dharmik.thakkar, gavin.hu, ruifeng.wang, nd,
	Jerin Jacob Kollanukkaran

Hi Honnappa, 

Great idea. I think we can replace the duplicated implementation in
lib/librte_eventdev/rte_event_ring.h, which uses an element size of 16B.
There are already a couple of SW event device drivers using event_ring.

Pavan.

>-----Original Message-----
>From: dev <dev-bounces@dpdk.org> On Behalf Of Honnappa
>Nagarahalli
>Sent: Wednesday, August 28, 2019 8:16 PM
>To: olivier.matz@6wind.com; yipeng1.wang@intel.com;
>sameh.gobriel@intel.com; bruce.richardson@intel.com;
>pablo.de.lara.guarch@intel.com; honnappa.nagarahalli@arm.com
>Cc: dev@dpdk.org; dharmik.thakkar@arm.com; gavin.hu@arm.com;
>ruifeng.wang@arm.com; nd@arm.com
>Subject: [dpdk-dev] [PATCH 0/5] lib/ring: templates to support custom
>element size
>



* Re: [dpdk-dev] [PATCH 0/5] lib/ring: templates to support custom element size
  2019-08-28 15:16 ` Pavan Nikhilesh Bhagavatula
@ 2019-08-28 22:59   ` Honnappa Nagarahalli
  0 siblings, 0 replies; 173+ messages in thread
From: Honnappa Nagarahalli @ 2019-08-28 22:59 UTC (permalink / raw)
  To: Pavan Nikhilesh Bhagavatula, olivier.matz, yipeng1.wang,
	sameh.gobriel, bruce.richardson, pablo.de.lara.guarch
  Cc: dev, Dharmik Thakkar, Gavin Hu (Arm Technology China),
	Ruifeng Wang (Arm Technology China),
	nd, jerinj, Honnappa Nagarahalli, nd

<snip>

> Subject: RE: [dpdk-dev] [PATCH 0/5] lib/ring: templates to support custom
> element size
> 
> Hi Honnappa,
> 
> Great idea I think we can replace duplicated implementation
> lib/librte_eventdev/rte_event_ring.h which uses element sizeof 16B.
>  There are already a couple of SW eventdevice drivers using event_ring.
Thank you Pavan. I will take a look and get back.

> 
> Pavan.
> 
> >-----Original Message-----
> >From: dev <dev-bounces@dpdk.org> On Behalf Of Honnappa Nagarahalli
> >Sent: Wednesday, August 28, 2019 8:16 PM
> >To: olivier.matz@6wind.com; yipeng1.wang@intel.com;
> >sameh.gobriel@intel.com; bruce.richardson@intel.com;
> >pablo.de.lara.guarch@intel.com; honnappa.nagarahalli@arm.com
> >Cc: dev@dpdk.org; dharmik.thakkar@arm.com; gavin.hu@arm.com;
> >ruifeng.wang@arm.com; nd@arm.com
> >Subject: [dpdk-dev] [PATCH 0/5] lib/ring: templates to support custom
> >element size
> >



* [dpdk-dev] [PATCH v2 0/6] lib/ring: templates to support custom element size
  2019-08-28 14:46 [dpdk-dev] [PATCH 0/5] lib/ring: templates to support custom element size Honnappa Nagarahalli
                   ` (6 preceding siblings ...)
  2019-08-28 15:16 ` Pavan Nikhilesh Bhagavatula
@ 2019-09-06 19:05 ` Honnappa Nagarahalli
  2019-09-06 19:05   ` [dpdk-dev] [PATCH v2 1/6] lib/ring: apis to support configurable " Honnappa Nagarahalli
                     ` (15 more replies)
  7 siblings, 16 replies; 173+ messages in thread
From: Honnappa Nagarahalli @ 2019-09-06 19:05 UTC (permalink / raw)
  To: olivier.matz, yipeng1.wang, sameh.gobriel, bruce.richardson,
	pablo.de.lara.guarch
  Cc: dev, pbhagavatula, jerinj, Honnappa Nagarahalli

The current rte_ring hard-codes the type of the ring element to 'void *',
hence the size of the element is hard-coded to 32b/64b. Since the ring
element type is not an input to the rte_ring APIs, this results in a couple
of issues:

1) If an application needs to store an element that is not 64b, it
   must write its own ring APIs, similar to the rte_event_ring APIs. This
   creates an additional burden on programmers, who end up making
   work-arounds and often waste memory.
2) If multiple libraries store elements of the same type, each currently
   has to write its own rte_ring APIs. This results in code duplication.

This patch consists of several parts:
1) New APIs to support configurable ring element size
   These will help reduce code duplication in the templates. I think these
   can be made internal (do not expose to DPDK applications, but expose to
   DPDK libraries), feedback needed.

2) rte_ring templates
   The templates provide an easy way to add new APIs for different ring
   element types/sizes which can be used by multiple libraries. These
   also allow for creating APIs to store elements of custom types
   (for ex: a structure)

   The template needs 4 parameters:
   a) RTE_RING_TMPLT_API_SUFFIX - This is used as a suffix to the
      rte_ring APIs.
      For ex: if RTE_RING_TMPLT_API_SUFFIX is '32b', the API name will be
      rte_ring_create_32b
   b) RTE_RING_TMPLT_ELEM_SIZE - Size of the ring element in bytes.
      For ex: sizeof(uint32_t)
   c) RTE_RING_TMPLT_ELEM_TYPE - Type of the ring element.
      For ex: uint32_t. If a common ring library does not use a standard
      data type, it should create its own type by defining a structure
      with standard data types. For ex: for an element size of 96b, one
      could define a structure

      struct s_96b {
          uint32_t a[3];
      };
      The common library can use this structure to define
      RTE_RING_TMPLT_ELEM_TYPE.

      The application using this common ring library should define its
      element type as a union with the above structure.

      union app_element_type {
          struct s_96b v;
          struct app_element {
              uint16_t a;
              uint16_t b;
              uint32_t c;
              uint32_t d;
          } e;
      };
   d) RTE_RING_TMPLT_EXPERIMENTAL - Indicates if the new APIs being defined
      are experimental. Should be set to empty to remove the experimental
      tag.

   The ring library consists of some APIs that are defined as inline
   functions and some APIs that are non-inline functions. The non-inline
   functions are in rte_ring_template.c. However, this file needs to be
   included in other .c files. Any feedback on how to handle this is
   appreciated.

   Note that the templates help create the APIs that are dependent on the
   element size (for ex: rte_ring_create, enqueue/dequeue etc). Other APIs
   that do NOT depend on the element size do not need to be part of the
   template (for ex: rte_ring_dump, rte_ring_count, rte_ring_free_count
   etc).

3) APIs for 32b ring element size
   This uses the templates to create APIs to enqueue/dequeue elements of
   size 32b.

4) rte_hash library is changed to use 32b ring APIs
   The 32b APIs are used in rte_hash library to store the free slot index
   and free bucket index.

5) Event Dev changed to use ring templates
   Event Dev defines its own 128b ring APIs using the templates. This helps
   in keeping 'struct rte_event' as is. If Event Dev had to use generic
   128b ring APIs, 'struct rte_event' would have to change to
   'union rte_event' to include a generic data type such as '__int128_t'.
   This would break API compatibility and result in a large number of
   changes.
   With this change, the event rings are stored on rte_ring's tailq.
   An Event Dev specific ring list is NOT available. IMO, this does not have
   any impact on the user.
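
The template idea in part 2 can be illustrated with a small stand-alone
sketch. This is NOT the code in rte_ring_template.h: every name starting with
toy_ or TOY_ is hypothetical, the ring is single-threaded with a fixed
capacity, and a function-like macro stands in for the include-based template.
It only shows how a suffix and an element type might be pasted into per-type
API names, and how the union described in 2c) lets an application view a
96b element through its own structure.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define TOY_RING_SLOTS 8u	/* number of slots, must be a power of 2 */
#define TOY_MAX_ESIZE 16u	/* largest element size this toy ring holds */

/* Stand-in ring; unrelated to the layout of struct rte_ring. */
struct toy_ring {
	uint32_t head;	/* count of elements enqueued so far */
	uint32_t tail;	/* count of elements dequeued so far */
	unsigned char slots[TOY_RING_SLOTS * TOY_MAX_ESIZE];
};

/* Two-level paste so that macro arguments are expanded before ##. */
#define TOY_PASTE_(a, b) a##_##b
#define TOY_PASTE(a, b) TOY_PASTE_(a, b)

/* Hypothetical template: one expansion per (suffix, element type) pair. */
#define TOY_RING_TEMPLATE(suffix, type)					\
static inline void							\
TOY_PASTE(toy_ring_init, suffix)(struct toy_ring *r)			\
{									\
	assert(sizeof(type) <= TOY_MAX_ESIZE);				\
	memset(r, 0, sizeof(*r));					\
}									\
static inline int							\
TOY_PASTE(toy_ring_enqueue, suffix)(struct toy_ring *r, type obj)	\
{									\
	size_t slot = r->head & (TOY_RING_SLOTS - 1);			\
	if (r->head - r->tail == TOY_RING_SLOTS)			\
		return -1; /* ring is full */				\
	memcpy(&r->slots[slot * sizeof(type)], &obj, sizeof(type));	\
	r->head++;							\
	return 0;							\
}									\
static inline int							\
TOY_PASTE(toy_ring_dequeue, suffix)(struct toy_ring *r, type *obj)	\
{									\
	size_t slot = r->tail & (TOY_RING_SLOTS - 1);			\
	if (r->head == r->tail)						\
		return -1; /* ring is empty */				\
	memcpy(obj, &r->slots[slot * sizeof(type)], sizeof(type));	\
	r->tail++;							\
	return 0;							\
}

/* Generic 96b element type owned by the common library. */
struct s_96b {
	uint32_t a[3];
};

/* Application view of the same 96 bits. */
union app_element_type {
	struct s_96b v;
	struct app_element {
		uint16_t a;
		uint16_t b;
		uint32_t c;
		uint32_t d;
	} e;
};

TOY_RING_TEMPLATE(32b, uint32_t)	/* generates toy_ring_*_32b() */
TOY_RING_TEMPLATE(96b, struct s_96b)	/* generates toy_ring_*_96b() */

/* Round-trips one element through each generated API; 0 on success. */
int
toy_ring_demo(void)
{
	struct toy_ring r;
	uint32_t idx = 0;
	union app_element_type in = { .e = { 1, 2, 3, 4 } }, out;

	toy_ring_init_32b(&r);
	if (toy_ring_enqueue_32b(&r, 42) != 0 ||
	    toy_ring_dequeue_32b(&r, &idx) != 0)
		return -1;

	toy_ring_init_96b(&r);
	if (toy_ring_enqueue_96b(&r, in.v) != 0 ||
	    toy_ring_dequeue_96b(&r, &out.v) != 0)
		return -1;

	return (idx == 42 && out.e.d == 4) ? 0 : -1;
}
```

In this sketch the application passes the 'v' member to the generic 96b
functions and reads the result back through the 'e' member, so the common
library never needs to know the application's field layout.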

This patch results in the following checkpatch issue:
WARNING:UNSPECIFIED_INT: Prefer 'unsigned int' to bare use of 'unsigned'

However, this patch is following the rules in the existing code. Please
let me know if this needs to be fixed.

v2
 - Change Event Ring implementation to use ring templates
   (Jerin, Pavan)

Honnappa Nagarahalli (6):
  lib/ring: apis to support configurable element size
  lib/ring: add template to support different element sizes
  tools/checkpatch: relax constraints on __rte_experimental
  lib/ring: add ring APIs to support 32b ring elements
  lib/hash: use ring with 32b element size to save memory
  lib/eventdev: use ring templates for event rings

 devtools/checkpatches.sh                  |  11 +-
 lib/librte_eventdev/Makefile              |   2 +
 lib/librte_eventdev/meson.build           |   2 +
 lib/librte_eventdev/rte_event_ring.c      | 146 +---------
 lib/librte_eventdev/rte_event_ring.h      |  41 +--
 lib/librte_eventdev/rte_event_ring_128b.c |  19 ++
 lib/librte_eventdev/rte_event_ring_128b.h |  44 +++
 lib/librte_hash/rte_cuckoo_hash.c         |  55 ++--
 lib/librte_hash/rte_cuckoo_hash.h         |   2 +-
 lib/librte_ring/Makefile                  |   9 +-
 lib/librte_ring/meson.build               |  11 +-
 lib/librte_ring/rte_ring.c                |  34 ++-
 lib/librte_ring/rte_ring.h                |  72 +++++
 lib/librte_ring/rte_ring_32.c             |  19 ++
 lib/librte_ring/rte_ring_32.h             |  36 +++
 lib/librte_ring/rte_ring_template.c       |  46 +++
 lib/librte_ring/rte_ring_template.h       | 330 ++++++++++++++++++++++
 lib/librte_ring/rte_ring_version.map      |   4 +
 18 files changed, 660 insertions(+), 223 deletions(-)
 create mode 100644 lib/librte_eventdev/rte_event_ring_128b.c
 create mode 100644 lib/librte_eventdev/rte_event_ring_128b.h
 create mode 100644 lib/librte_ring/rte_ring_32.c
 create mode 100644 lib/librte_ring/rte_ring_32.h
 create mode 100644 lib/librte_ring/rte_ring_template.c
 create mode 100644 lib/librte_ring/rte_ring_template.h

-- 
2.17.1


* [dpdk-dev] [PATCH v2 1/6] lib/ring: apis to support configurable element size
  2019-09-06 19:05 ` [dpdk-dev] [PATCH v2 0/6] " Honnappa Nagarahalli
@ 2019-09-06 19:05   ` Honnappa Nagarahalli
  2019-09-06 19:05   ` [dpdk-dev] [PATCH v2 2/6] lib/ring: add template to support different element sizes Honnappa Nagarahalli
                     ` (14 subsequent siblings)
  15 siblings, 0 replies; 173+ messages in thread
From: Honnappa Nagarahalli @ 2019-09-06 19:05 UTC (permalink / raw)
  To: olivier.matz, yipeng1.wang, sameh.gobriel, bruce.richardson,
	pablo.de.lara.guarch
  Cc: dev, pbhagavatula, jerinj, Honnappa Nagarahalli

Current APIs assume ring elements to be pointers. However, in many
use cases, the size can be different. The new APIs
rte_ring_get_memsize_elem and rte_ring_create_elem help reduce code
duplication while creating rte_ring templates.
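
As a rough illustration of the size calculation these APIs parameterize, the
stand-alone sketch below computes "header + count * esize, rounded up to a
cache line". The names toy_align_up and toy_ring_memsize_elem are
hypothetical, and the 384-byte header and 64-byte cache line are assumed
values chosen for the example, not the real sizeof(struct rte_ring) or
RTE_CACHE_LINE_SIZE. Unlike the patch, which returns -EINVAL, this sketch
returns 0 for an invalid count and also rejects a count of 0.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_CACHE_LINE 64u	/* assumed cache line size in bytes */
#define TOY_RING_HDR 384u	/* assumed ring header size in bytes */

/* Round v up to the next multiple of align (align must be a power of 2). */
static inline size_t
toy_align_up(size_t v, size_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return (v + align - 1) & ~(align - 1);
}

/*
 * Bytes needed for a ring of 'count' elements of 'esize' bytes each.
 * Returns 0 if count is zero or not a power of 2.
 */
size_t
toy_ring_memsize_elem(unsigned int count, size_t esize)
{
	if (count == 0 || (count & (count - 1)) != 0)
		return 0;

	return toy_align_up(TOY_RING_HDR + (size_t)count * esize,
			TOY_CACHE_LINE);
}
```

Under these assumptions, a 1024-entry ring needs 8576 bytes with 8-byte
pointer elements and 4480 bytes with 4-byte elements, so the element storage
is halved while the header cost stays the same.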

Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Dharmik Thakkar <dharmik.thakkar@arm.com>
Reviewed-by: Gavin Hu <gavin.hu@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
---
 lib/librte_ring/Makefile             |  2 +-
 lib/librte_ring/meson.build          |  3 ++
 lib/librte_ring/rte_ring.c           | 34 +++++++++----
 lib/librte_ring/rte_ring.h           | 72 ++++++++++++++++++++++++++++
 lib/librte_ring/rte_ring_version.map |  2 +
 5 files changed, 104 insertions(+), 9 deletions(-)

diff --git a/lib/librte_ring/Makefile b/lib/librte_ring/Makefile
index 21a36770d..4c8410229 100644
--- a/lib/librte_ring/Makefile
+++ b/lib/librte_ring/Makefile
@@ -6,7 +6,7 @@ include $(RTE_SDK)/mk/rte.vars.mk
 # library name
 LIB = librte_ring.a
 
-CFLAGS += $(WERROR_FLAGS) -I$(SRCDIR) -O3
+CFLAGS += $(WERROR_FLAGS) -I$(SRCDIR) -O3 -DALLOW_EXPERIMENTAL_API
 LDLIBS += -lrte_eal
 
 EXPORT_MAP := rte_ring_version.map
diff --git a/lib/librte_ring/meson.build b/lib/librte_ring/meson.build
index ab8b0b469..74219840a 100644
--- a/lib/librte_ring/meson.build
+++ b/lib/librte_ring/meson.build
@@ -6,3 +6,6 @@ sources = files('rte_ring.c')
 headers = files('rte_ring.h',
 		'rte_ring_c11_mem.h',
 		'rte_ring_generic.h')
+
+# rte_ring_create_elem and rte_ring_get_memsize_elem are experimental
+allow_experimental_apis = true
diff --git a/lib/librte_ring/rte_ring.c b/lib/librte_ring/rte_ring.c
index d9b308036..879feb9f6 100644
--- a/lib/librte_ring/rte_ring.c
+++ b/lib/librte_ring/rte_ring.c
@@ -46,23 +46,32 @@ EAL_REGISTER_TAILQ(rte_ring_tailq)
 
 /* return the size of memory occupied by a ring */
 ssize_t
-rte_ring_get_memsize(unsigned count)
+rte_ring_get_memsize_elem(unsigned count, size_t esize)
 {
 	ssize_t sz;
 
 	/* count must be a power of 2 */
 	if ((!POWEROF2(count)) || (count > RTE_RING_SZ_MASK )) {
 		RTE_LOG(ERR, RING,
-			"Requested size is invalid, must be power of 2, and "
-			"do not exceed the size limit %u\n", RTE_RING_SZ_MASK);
+			"Requested number of elements is invalid, must be "
+			"power of 2, and do not exceed the limit %u\n",
+			RTE_RING_SZ_MASK);
+
 		return -EINVAL;
 	}
 
-	sz = sizeof(struct rte_ring) + count * sizeof(void *);
+	sz = sizeof(struct rte_ring) + count * esize;
 	sz = RTE_ALIGN(sz, RTE_CACHE_LINE_SIZE);
 	return sz;
 }
 
+/* return the size of memory occupied by a ring */
+ssize_t
+rte_ring_get_memsize(unsigned count)
+{
+	return rte_ring_get_memsize_elem(count, sizeof(void *));
+}
+
 void
 rte_ring_reset(struct rte_ring *r)
 {
@@ -114,10 +123,10 @@ rte_ring_init(struct rte_ring *r, const char *name, unsigned count,
 	return 0;
 }
 
-/* create the ring */
+/* create the ring for a given element size */
 struct rte_ring *
-rte_ring_create(const char *name, unsigned count, int socket_id,
-		unsigned flags)
+rte_ring_create_elem(const char *name, unsigned count, size_t esize,
+		int socket_id, unsigned flags)
 {
 	char mz_name[RTE_MEMZONE_NAMESIZE];
 	struct rte_ring *r;
@@ -135,7 +144,7 @@ rte_ring_create(const char *name, unsigned count, int socket_id,
 	if (flags & RING_F_EXACT_SZ)
 		count = rte_align32pow2(count + 1);
 
-	ring_size = rte_ring_get_memsize(count);
+	ring_size = rte_ring_get_memsize_elem(count, esize);
 	if (ring_size < 0) {
 		rte_errno = ring_size;
 		return NULL;
@@ -182,6 +191,15 @@ rte_ring_create(const char *name, unsigned count, int socket_id,
 	return r;
 }
 
+/* create the ring */
+struct rte_ring *
+rte_ring_create(const char *name, unsigned count, int socket_id,
+		unsigned flags)
+{
+	return rte_ring_create_elem(name, count, sizeof(void *), socket_id,
+		flags);
+}
+
 /* free the ring */
 void
 rte_ring_free(struct rte_ring *r)
diff --git a/lib/librte_ring/rte_ring.h b/lib/librte_ring/rte_ring.h
index 2a9f768a1..bbc1202d3 100644
--- a/lib/librte_ring/rte_ring.h
+++ b/lib/librte_ring/rte_ring.h
@@ -122,6 +122,29 @@ struct rte_ring {
 #define __IS_SC 1
 #define __IS_MC 0
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * Calculate the memory size needed for a ring with given element size
+ *
+ * This function returns the number of bytes needed for a ring, given
+ * the number of elements in it and the size of the element. This value
+ * is the sum of the size of the structure rte_ring and the size of the
+ * memory needed for storing the elements. The value is aligned to a cache
+ * line size.
+ *
+ * @param count
+ *   The number of elements in the ring (must be a power of 2).
+ * @param esize
+ *   The size of elements in the ring (recommended to be a power of 2).
+ * @return
+ *   - The memory size needed for the ring on success.
+ *   - -EINVAL if count is not a power of 2 or exceeds the size limit.
+ */
+__rte_experimental
+ssize_t rte_ring_get_memsize_elem(unsigned count, size_t esize);
+
 /**
  * Calculate the memory size needed for a ring
  *
@@ -175,6 +198,54 @@ ssize_t rte_ring_get_memsize(unsigned count);
 int rte_ring_init(struct rte_ring *r, const char *name, unsigned count,
 	unsigned flags);
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * Create a new ring named *name* that stores elements with given size.
+ *
+ * This function uses ``memzone_reserve()`` to allocate memory. Then it
+ * calls rte_ring_init() to initialize an empty ring.
+ *
+ * The new ring size is set to *count*, which must be a power of
+ * two. Water marking is disabled by default. The real usable ring size
+ * is *count-1* instead of *count* to differentiate a full ring from an
+ * empty ring.
+ *
+ * The ring is added in RTE_TAILQ_RING list.
+ *
+ * @param name
+ *   The name of the ring.
+ * @param count
+ *   The number of elements in the ring (must be a power of 2).
+ * @param esize
+ *   The size of elements in the ring (recommended to be a power of 2).
+ * @param socket_id
+ *   The *socket_id* argument is the socket identifier in case of
+ *   NUMA. The value can be *SOCKET_ID_ANY* if there is no NUMA
+ *   constraint for the reserved zone.
+ * @param flags
+ *   An OR of the following:
+ *    - RING_F_SP_ENQ: If this flag is set, the default behavior when
+ *      using ``rte_ring_enqueue()`` or ``rte_ring_enqueue_bulk()``
+ *      is "single-producer". Otherwise, it is "multi-producers".
+ *    - RING_F_SC_DEQ: If this flag is set, the default behavior when
+ *      using ``rte_ring_dequeue()`` or ``rte_ring_dequeue_bulk()``
+ *      is "single-consumer". Otherwise, it is "multi-consumers".
+ * @return
+ *   On success, the pointer to the new allocated ring. NULL on error with
+ *    rte_errno set appropriately. Possible errno values include:
+ *    - E_RTE_NO_CONFIG - function could not get pointer to rte_config structure
+ *    - E_RTE_SECONDARY - function was called from a secondary process instance
+ *    - EINVAL - count provided is not a power of 2
+ *    - ENOSPC - the maximum number of memzones has already been allocated
+ *    - EEXIST - a memzone with the same name already exists
+ *    - ENOMEM - no appropriate memory area found in which to create memzone
+ */
+__rte_experimental
+struct rte_ring *rte_ring_create_elem(const char *name, unsigned count,
+				size_t esize, int socket_id, unsigned flags);
+
 /**
  * Create a new ring named *name* in memory.
  *
@@ -216,6 +287,7 @@ int rte_ring_init(struct rte_ring *r, const char *name, unsigned count,
  */
 struct rte_ring *rte_ring_create(const char *name, unsigned count,
 				 int socket_id, unsigned flags);
+
 /**
  * De-allocate all memory used by the ring.
  *
diff --git a/lib/librte_ring/rte_ring_version.map b/lib/librte_ring/rte_ring_version.map
index 510c1386e..e410a7503 100644
--- a/lib/librte_ring/rte_ring_version.map
+++ b/lib/librte_ring/rte_ring_version.map
@@ -21,6 +21,8 @@ DPDK_2.2 {
 EXPERIMENTAL {
 	global:
 
+	rte_ring_create_elem;
+	rte_ring_get_memsize_elem;
 	rte_ring_reset;
 
 };
-- 
2.17.1


^ permalink raw reply	[flat|nested] 173+ messages in thread
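
The size calculation in rte_ring_get_memsize_elem() above can be sketched
outside DPDK as follows. This is only an illustration under stated
assumptions: SKETCH_RING_HDR is a hypothetical stand-in for
sizeof(struct rte_ring), and SKETCH_CACHE_LINE and SKETCH_SZ_MASK stand in
for RTE_CACHE_LINE_SIZE and RTE_RING_SZ_MASK; none of the sketch_* names
exist in DPDK.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>	/* ssize_t */

/* Assumed values for illustration only; the real ones come from DPDK. */
#define SKETCH_CACHE_LINE 64u		/* stand-in for RTE_CACHE_LINE_SIZE */
#define SKETCH_RING_HDR   128u		/* hypothetical sizeof(struct rte_ring) */
#define SKETCH_SZ_MASK    0x7fffffffu	/* stand-in for RTE_RING_SZ_MASK */

/* Same test as the POWEROF2() check in the patch: clears the lowest set bit. */
static int
sketch_is_pow2(unsigned int x)
{
	return ((x - 1) & x) == 0;
}

/* Header plus count * esize bytes, rounded up to a cache line multiple. */
static ssize_t
sketch_memsize_elem(unsigned int count, size_t esize)
{
	size_t sz;

	if (!sketch_is_pow2(count) || count > SKETCH_SZ_MASK)
		return -EINVAL;

	sz = SKETCH_RING_HDR + (size_t)count * esize;
	sz = (sz + SKETCH_CACHE_LINE - 1) & ~(size_t)(SKETCH_CACHE_LINE - 1);
	return (ssize_t)sz;
}
```

With these assumed constants, a 1024-entry ring needs 4224 bytes for 4-byte
elements and 8320 bytes for 8-byte elements, which is the kind of saving the
series is after for 32b indices.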

* [dpdk-dev] [PATCH v2 2/6] lib/ring: add template to support different element sizes
  2019-09-06 19:05 ` [dpdk-dev] [PATCH v2 0/6] " Honnappa Nagarahalli
  2019-09-06 19:05   ` [dpdk-dev] [PATCH v2 1/6] lib/ring: apis to support configurable " Honnappa Nagarahalli
@ 2019-09-06 19:05   ` Honnappa Nagarahalli
  2019-09-08 19:44     ` Stephen Hemminger
  2019-09-06 19:05   ` [dpdk-dev] [PATCH v2 3/6] tools/checkpatch: relax constraints on __rte_experimental Honnappa Nagarahalli
                     ` (13 subsequent siblings)
  15 siblings, 1 reply; 173+ messages in thread
From: Honnappa Nagarahalli @ 2019-09-06 19:05 UTC (permalink / raw)
  To: olivier.matz, yipeng1.wang, sameh.gobriel, bruce.richardson,
	pablo.de.lara.guarch
  Cc: dev, pbhagavatula, jerinj, Honnappa Nagarahalli

Add templates to support creating ring APIs with different
ring element sizes.

Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Dharmik Thakkar <dharmik.thakkar@arm.com>
Reviewed-by: Gavin Hu <gavin.hu@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
---
 lib/librte_ring/Makefile            |   4 +-
 lib/librte_ring/meson.build         |   4 +-
 lib/librte_ring/rte_ring_template.c |  46 ++++
 lib/librte_ring/rte_ring_template.h | 330 ++++++++++++++++++++++++++++
 4 files changed, 382 insertions(+), 2 deletions(-)
 create mode 100644 lib/librte_ring/rte_ring_template.c
 create mode 100644 lib/librte_ring/rte_ring_template.h

diff --git a/lib/librte_ring/Makefile b/lib/librte_ring/Makefile
index 4c8410229..818898110 100644
--- a/lib/librte_ring/Makefile
+++ b/lib/librte_ring/Makefile
@@ -19,6 +19,8 @@ SRCS-$(CONFIG_RTE_LIBRTE_RING) := rte_ring.c
 # install includes
 SYMLINK-$(CONFIG_RTE_LIBRTE_RING)-include := rte_ring.h \
 					rte_ring_generic.h \
-					rte_ring_c11_mem.h
+					rte_ring_c11_mem.h \
+					rte_ring_template.h \
+					rte_ring_template.c
 
 include $(RTE_SDK)/mk/rte.lib.mk
diff --git a/lib/librte_ring/meson.build b/lib/librte_ring/meson.build
index 74219840a..e4e208a7c 100644
--- a/lib/librte_ring/meson.build
+++ b/lib/librte_ring/meson.build
@@ -5,7 +5,9 @@ version = 2
 sources = files('rte_ring.c')
 headers = files('rte_ring.h',
 		'rte_ring_c11_mem.h',
-		'rte_ring_generic.h')
+		'rte_ring_generic.h',
+		'rte_ring_template.h',
+		'rte_ring_template.c')
 
 # rte_ring_create_elem and rte_ring_get_memsize_elem are experimental
 allow_experimental_apis = true
diff --git a/lib/librte_ring/rte_ring_template.c b/lib/librte_ring/rte_ring_template.c
new file mode 100644
index 000000000..1ca593f95
--- /dev/null
+++ b/lib/librte_ring/rte_ring_template.c
@@ -0,0 +1,46 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2019 Arm Limited
+ */
+
+#include <stdio.h>
+#include <stdarg.h>
+#include <string.h>
+#include <stdint.h>
+#include <inttypes.h>
+#include <errno.h>
+#include <sys/queue.h>
+
+#include <rte_common.h>
+#include <rte_log.h>
+#include <rte_memory.h>
+#include <rte_memzone.h>
+#include <rte_malloc.h>
+#include <rte_launch.h>
+#include <rte_eal.h>
+#include <rte_eal_memconfig.h>
+#include <rte_atomic.h>
+#include <rte_per_lcore.h>
+#include <rte_lcore.h>
+#include <rte_branch_prediction.h>
+#include <rte_errno.h>
+#include <rte_string_fns.h>
+#include <rte_spinlock.h>
+#include <rte_tailq.h>
+
+#include "rte_ring.h"
+
+/* return the size of memory occupied by a ring */
+ssize_t
+__RTE_RING_CONCAT(rte_ring_get_memsize)(unsigned count)
+{
+	return rte_ring_get_memsize_elem(count, RTE_RING_TMPLT_ELEM_SIZE);
+}
+
+/* create the ring */
+struct rte_ring *
+__RTE_RING_CONCAT(rte_ring_create)(const char *name, unsigned count,
+		int socket_id, unsigned flags)
+{
+	return rte_ring_create_elem(name, count, RTE_RING_TMPLT_ELEM_SIZE,
+		socket_id, flags);
+}
diff --git a/lib/librte_ring/rte_ring_template.h b/lib/librte_ring/rte_ring_template.h
new file mode 100644
index 000000000..5002a7485
--- /dev/null
+++ b/lib/librte_ring/rte_ring_template.h
@@ -0,0 +1,330 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2019 Arm Limited
+ */
+
+#ifndef _RTE_RING_TEMPLATE_H_
+#define _RTE_RING_TEMPLATE_H_
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#include <stdio.h>
+#include <stdint.h>
+#include <sys/queue.h>
+#include <errno.h>
+#include <rte_common.h>
+#include <rte_config.h>
+#include <rte_memory.h>
+#include <rte_lcore.h>
+#include <rte_atomic.h>
+#include <rte_branch_prediction.h>
+#include <rte_memzone.h>
+#include <rte_pause.h>
+#include <rte_ring.h>
+
+/* Ring API suffix name - used to append to API names */
+#ifndef RTE_RING_TMPLT_API_SUFFIX
+#error RTE_RING_TMPLT_API_SUFFIX not defined
+#endif
+
+/* Ring's element size in bytes, should be a power of 2 */
+#ifndef RTE_RING_TMPLT_ELEM_SIZE
+#error RTE_RING_TMPLT_ELEM_SIZE not defined
+#endif
+
+/* Type of ring elements */
+#ifndef RTE_RING_TMPLT_ELEM_TYPE
+#error RTE_RING_TMPLT_ELEM_TYPE not defined
+#endif
+
+#define _rte_fuse(a, b) a##_##b
+#define __rte_fuse(a, b) _rte_fuse(a, b)
+#define __RTE_RING_CONCAT(a) __rte_fuse(a, RTE_RING_TMPLT_API_SUFFIX)
+
+/* Calculate the memory size needed for a ring */
+RTE_RING_TMPLT_EXPERIMENTAL
+ssize_t __RTE_RING_CONCAT(rte_ring_get_memsize)(unsigned count);
+
+/* Create a new ring named *name* in memory. */
+RTE_RING_TMPLT_EXPERIMENTAL
+struct rte_ring *
+__RTE_RING_CONCAT(rte_ring_create)(const char *name, unsigned count,
+					int socket_id, unsigned flags);
+
+/**
+ * @internal Enqueue several objects on the ring
+ */
+static __rte_always_inline unsigned int
+__RTE_RING_CONCAT(__rte_ring_do_enqueue)(struct rte_ring *r,
+		RTE_RING_TMPLT_ELEM_TYPE const *obj_table, unsigned int n,
+		enum rte_ring_queue_behavior behavior, unsigned int is_sp,
+		unsigned int *free_space)
+{
+	uint32_t prod_head, prod_next;
+	uint32_t free_entries;
+
+	n = __rte_ring_move_prod_head(r, is_sp, n, behavior,
+			&prod_head, &prod_next, &free_entries);
+	if (n == 0)
+		goto end;
+
+	ENQUEUE_PTRS(r, &r[1], prod_head, obj_table, n,
+		RTE_RING_TMPLT_ELEM_TYPE);
+
+	update_tail(&r->prod, prod_head, prod_next, is_sp, 1);
+end:
+	if (free_space != NULL)
+		*free_space = free_entries - n;
+	return n;
+}
+
+/**
+ * @internal Dequeue several objects from the ring
+ */
+static __rte_always_inline unsigned int
+__RTE_RING_CONCAT(__rte_ring_do_dequeue)(struct rte_ring *r,
+	RTE_RING_TMPLT_ELEM_TYPE *obj_table, unsigned int n,
+	enum rte_ring_queue_behavior behavior, unsigned int is_sc,
+	unsigned int *available)
+{
+	uint32_t cons_head, cons_next;
+	uint32_t entries;
+
+	n = __rte_ring_move_cons_head(r, (int)is_sc, n, behavior,
+			&cons_head, &cons_next, &entries);
+	if (n == 0)
+		goto end;
+
+	DEQUEUE_PTRS(r, &r[1], cons_head, obj_table, n,
+		RTE_RING_TMPLT_ELEM_TYPE);
+
+	update_tail(&r->cons, cons_head, cons_next, is_sc, 0);
+
+end:
+	if (available != NULL)
+		*available = entries - n;
+	return n;
+}
+
+
+/**
+ * Enqueue several objects on the ring (multi-producers safe).
+ */
+static __rte_always_inline unsigned int
+__RTE_RING_CONCAT(rte_ring_mp_enqueue_bulk)(struct rte_ring *r,
+	RTE_RING_TMPLT_ELEM_TYPE const *obj_table, unsigned int n,
+	unsigned int *free_space)
+{
+	return __RTE_RING_CONCAT(__rte_ring_do_enqueue)(r, obj_table, n,
+			RTE_RING_QUEUE_FIXED, __IS_MP, free_space);
+}
+
+/**
+ * Enqueue several objects on a ring (NOT multi-producers safe).
+ */
+static __rte_always_inline unsigned int
+__RTE_RING_CONCAT(rte_ring_sp_enqueue_bulk)(struct rte_ring *r,
+	RTE_RING_TMPLT_ELEM_TYPE const *obj_table, unsigned int n,
+	unsigned int *free_space)
+{
+	return __RTE_RING_CONCAT(__rte_ring_do_enqueue)(r, obj_table, n,
+			RTE_RING_QUEUE_FIXED, __IS_SP, free_space);
+}
+
+/**
+ * Enqueue several objects on a ring.
+ */
+static __rte_always_inline unsigned int
+__RTE_RING_CONCAT(rte_ring_enqueue_bulk)(struct rte_ring *r,
+	RTE_RING_TMPLT_ELEM_TYPE const *obj_table, unsigned int n,
+	unsigned int *free_space)
+{
+	return __RTE_RING_CONCAT(__rte_ring_do_enqueue)(r, obj_table, n,
+			RTE_RING_QUEUE_FIXED, r->prod.single, free_space);
+}
+
+/**
+ * Enqueue one object on a ring (multi-producers safe).
+ */
+static __rte_always_inline int
+__RTE_RING_CONCAT(rte_ring_mp_enqueue)(struct rte_ring *r,
+	RTE_RING_TMPLT_ELEM_TYPE const obj)
+{
+	return __RTE_RING_CONCAT(rte_ring_mp_enqueue_bulk)(r, &obj, 1, NULL) ?
+			0 : -ENOBUFS;
+}
+
+/**
+ * Enqueue one object on a ring (NOT multi-producers safe).
+ */
+static __rte_always_inline int
+__RTE_RING_CONCAT(rte_ring_sp_enqueue)(struct rte_ring *r,
+	RTE_RING_TMPLT_ELEM_TYPE const obj)
+{
+	return __RTE_RING_CONCAT(rte_ring_sp_enqueue_bulk)(r, &obj, 1, NULL) ?
+			0 : -ENOBUFS;
+}
+
+/**
+ * Enqueue one object on a ring.
+ */
+static __rte_always_inline int
+__RTE_RING_CONCAT(rte_ring_enqueue)(struct rte_ring *r,
+	RTE_RING_TMPLT_ELEM_TYPE const obj)
+{
+	return __RTE_RING_CONCAT(rte_ring_enqueue_bulk)(r, &obj, 1, NULL) ?
+			0 : -ENOBUFS;
+}
+
+/**
+ * Dequeue several objects from a ring (multi-consumers safe).
+ */
+static __rte_always_inline unsigned int
+__RTE_RING_CONCAT(rte_ring_mc_dequeue_bulk)(struct rte_ring *r,
+	RTE_RING_TMPLT_ELEM_TYPE *obj_table, unsigned int n,
+	unsigned int *available)
+{
+	return __RTE_RING_CONCAT(__rte_ring_do_dequeue)(r, obj_table, n,
+			RTE_RING_QUEUE_FIXED, __IS_MC, available);
+}
+
+/**
+ * Dequeue several objects from a ring (NOT multi-consumers safe).
+ */
+static __rte_always_inline unsigned int
+__RTE_RING_CONCAT(rte_ring_sc_dequeue_bulk)(struct rte_ring *r,
+	RTE_RING_TMPLT_ELEM_TYPE *obj_table, unsigned int n,
+	unsigned int *available)
+{
+	return __RTE_RING_CONCAT(__rte_ring_do_dequeue)(r, obj_table, n,
+			RTE_RING_QUEUE_FIXED, __IS_SC, available);
+}
+
+/**
+ * Dequeue several objects from a ring.
+ */
+static __rte_always_inline unsigned int
+__RTE_RING_CONCAT(rte_ring_dequeue_bulk)(struct rte_ring *r,
+	RTE_RING_TMPLT_ELEM_TYPE *obj_table, unsigned int n,
+	unsigned int *available)
+{
+	return __RTE_RING_CONCAT(__rte_ring_do_dequeue)(r, obj_table, n,
+			RTE_RING_QUEUE_FIXED, r->cons.single, available);
+}
+
+/**
+ * Dequeue one object from a ring (multi-consumers safe).
+ */
+static __rte_always_inline int
+__RTE_RING_CONCAT(rte_ring_mc_dequeue)(struct rte_ring *r,
+	RTE_RING_TMPLT_ELEM_TYPE *obj_p)
+{
+	return __RTE_RING_CONCAT(rte_ring_mc_dequeue_bulk)(r, obj_p, 1, NULL) ?
+			0 : -ENOENT;
+}
+
+/**
+ * Dequeue one object from a ring (NOT multi-consumers safe).
+ */
+static __rte_always_inline int
+__RTE_RING_CONCAT(rte_ring_sc_dequeue)(struct rte_ring *r,
+	RTE_RING_TMPLT_ELEM_TYPE *obj_p)
+{
+	return __RTE_RING_CONCAT(rte_ring_sc_dequeue_bulk)(r, obj_p, 1, NULL) ?
+			0 : -ENOENT;
+}
+
+/**
+ * Dequeue one object from a ring.
+ */
+static __rte_always_inline int
+__RTE_RING_CONCAT(rte_ring_dequeue)(struct rte_ring *r,
+	RTE_RING_TMPLT_ELEM_TYPE *obj_p)
+{
+	return __RTE_RING_CONCAT(rte_ring_dequeue_bulk)(r, obj_p, 1, NULL) ?
+			0 : -ENOENT;
+}
+
+/**
+ * Enqueue several objects on the ring (multi-producers safe).
+ */
+static __rte_always_inline unsigned
+__RTE_RING_CONCAT(rte_ring_mp_enqueue_burst)(struct rte_ring *r,
+	RTE_RING_TMPLT_ELEM_TYPE const *obj_table,
+			 unsigned int n, unsigned int *free_space)
+{
+	return __RTE_RING_CONCAT(__rte_ring_do_enqueue)(r, obj_table, n,
+			RTE_RING_QUEUE_VARIABLE, __IS_MP, free_space);
+}
+
+/**
+ * Enqueue several objects on a ring (NOT multi-producers safe).
+ */
+static __rte_always_inline unsigned
+__RTE_RING_CONCAT(rte_ring_sp_enqueue_burst)(struct rte_ring *r,
+	RTE_RING_TMPLT_ELEM_TYPE const *obj_table,
+			 unsigned int n, unsigned int *free_space)
+{
+	return __RTE_RING_CONCAT(__rte_ring_do_enqueue)(r, obj_table, n,
+			RTE_RING_QUEUE_VARIABLE, __IS_SP, free_space);
+}
+
+/**
+ * Enqueue several objects on a ring.
+ */
+static __rte_always_inline unsigned
+__RTE_RING_CONCAT(rte_ring_enqueue_burst)(struct rte_ring *r,
+	RTE_RING_TMPLT_ELEM_TYPE const *obj_table, unsigned int n,
+	unsigned int *free_space)
+{
+	return __RTE_RING_CONCAT(__rte_ring_do_enqueue)(r, obj_table, n,
+			RTE_RING_QUEUE_VARIABLE, r->prod.single, free_space);
+}
+
+/**
+ * Dequeue several objects from a ring (multi-consumers safe). When more
+ * objects are requested than are available, dequeue only the available
+ * objects.
+ */
+static __rte_always_inline unsigned
+__RTE_RING_CONCAT(rte_ring_mc_dequeue_burst)(struct rte_ring *r,
+	RTE_RING_TMPLT_ELEM_TYPE *obj_table, unsigned int n,
+	unsigned int *available)
+{
+	return __RTE_RING_CONCAT(__rte_ring_do_dequeue)(r, obj_table, n,
+			RTE_RING_QUEUE_VARIABLE, __IS_MC, available);
+}
+
+/**
+ * Dequeue several objects from a ring (NOT multi-consumers safe). When
+ * more objects are requested than are available, dequeue only the
+ * available objects.
+ */
+static __rte_always_inline unsigned
+__RTE_RING_CONCAT(rte_ring_sc_dequeue_burst)(struct rte_ring *r,
+	RTE_RING_TMPLT_ELEM_TYPE *obj_table, unsigned int n,
+	unsigned int *available)
+{
+	return __RTE_RING_CONCAT(__rte_ring_do_dequeue)(r, obj_table, n,
+			RTE_RING_QUEUE_VARIABLE, __IS_SC, available);
+}
+
+/**
+ * Dequeue multiple objects from a ring up to a maximum number.
+ */
+static __rte_always_inline unsigned
+__RTE_RING_CONCAT(rte_ring_dequeue_burst)(struct rte_ring *r,
+	RTE_RING_TMPLT_ELEM_TYPE *obj_table, unsigned int n,
+	unsigned int *available)
+{
+	return __RTE_RING_CONCAT(__rte_ring_do_dequeue)(r, obj_table, n,
+				RTE_RING_QUEUE_VARIABLE,
+				r->cons.single, available);
+}
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* _RTE_RING_TEMPLATE_H_ */
-- 
2.17.1


^ permalink raw reply	[flat|nested] 173+ messages in thread
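
The name generation used by rte_ring_template.h (_rte_fuse, __rte_fuse and
__RTE_RING_CONCAT) relies on two-level token pasting. A minimal standalone
sketch of the same technique, assuming hypothetical SKETCH_* macro names and
sketch_* functions that are not part of the patch:

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Two levels are needed: operands of ## are not macro-expanded, so the
 * outer macro expands SKETCH_SUFFIX to 32 before the inner one pastes it.
 */
#define SKETCH_FUSE_(a, b) a##_##b
#define SKETCH_FUSE(a, b)  SKETCH_FUSE_(a, b)

/* "Template parameters"; in the patch the including header defines these. */
#define SKETCH_SUFFIX    32
#define SKETCH_ELEM_TYPE uint32_t

#define SKETCH_CONCAT(name) SKETCH_FUSE(name, SKETCH_SUFFIX)

/* Expands to: static size_t sketch_elem_size_32(void) */
static size_t
SKETCH_CONCAT(sketch_elem_size)(void)
{
	return sizeof(SKETCH_ELEM_TYPE);
}

/* Expands to: static uint32_t sketch_sum_32(const uint32_t *tbl, size_t n) */
static SKETCH_ELEM_TYPE
SKETCH_CONCAT(sketch_sum)(const SKETCH_ELEM_TYPE *tbl, size_t n)
{
	SKETCH_ELEM_TYPE total = 0;
	size_t i;

	for (i = 0; i < n; i++)
		total += tbl[i];
	return total;
}
```

Callers then use the generated names directly, for example
sketch_sum_32(tbl, n), in the same way the hash library calls
rte_ring_sp_enqueue_32() later in this series.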

* [dpdk-dev] [PATCH v2 3/6] tools/checkpatch: relax constraints on __rte_experimental
  2019-09-06 19:05 ` [dpdk-dev] [PATCH v2 0/6] " Honnappa Nagarahalli
  2019-09-06 19:05   ` [dpdk-dev] [PATCH v2 1/6] lib/ring: apis to support configurable " Honnappa Nagarahalli
  2019-09-06 19:05   ` [dpdk-dev] [PATCH v2 2/6] lib/ring: add template to support different element sizes Honnappa Nagarahalli
@ 2019-09-06 19:05   ` Honnappa Nagarahalli
  2019-09-06 19:05   ` [dpdk-dev] [PATCH v2 4/6] lib/ring: add ring APIs to support 32b ring elements Honnappa Nagarahalli
                     ` (12 subsequent siblings)
  15 siblings, 0 replies; 173+ messages in thread
From: Honnappa Nagarahalli @ 2019-09-06 19:05 UTC (permalink / raw)
  To: olivier.matz, yipeng1.wang, sameh.gobriel, bruce.richardson,
	pablo.de.lara.guarch
  Cc: dev, pbhagavatula, jerinj, Honnappa Nagarahalli

Relax the constraints on __rte_experimental usage: allow a macro to be
defined as __rte_experimental (for example "#define XYZ __rte_experimental").

Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Gavin Hu <gavin.hu@arm.com>
---
 devtools/checkpatches.sh | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/devtools/checkpatches.sh b/devtools/checkpatches.sh
index 560e6ce93..090c9b08a 100755
--- a/devtools/checkpatches.sh
+++ b/devtools/checkpatches.sh
@@ -99,9 +99,14 @@ check_experimental_tags() { # <patch>
 			ret = 1;
 		}
 		if ($1 != "+__rte_experimental" || $2 != "") {
-			print "__rte_experimental must appear alone on the line" \
-				" immediately preceding the return type of a function."
-			ret = 1;
+			# code such as "#define XYZ __rte_experimental" is
+			# allowed
+			if ($1 != "+#define") {
+				print "__rte_experimental must appear alone " \
+				      "on the line immediately preceding the " \
+				      "return type of a function."
+				ret = 1;
+			}
 		}
 	}
 	END {
-- 
2.17.1


^ permalink raw reply	[flat|nested] 173+ messages in thread

* [dpdk-dev] [PATCH v2 4/6] lib/ring: add ring APIs to support 32b ring elements
  2019-09-06 19:05 ` [dpdk-dev] [PATCH v2 0/6] " Honnappa Nagarahalli
                     ` (2 preceding siblings ...)
  2019-09-06 19:05   ` [dpdk-dev] [PATCH v2 3/6] tools/checkpatch: relax constraints on __rte_experimental Honnappa Nagarahalli
@ 2019-09-06 19:05   ` Honnappa Nagarahalli
  2019-09-06 19:05   ` [dpdk-dev] [PATCH v2 5/6] lib/hash: use ring with 32b element size to save memory Honnappa Nagarahalli
                     ` (11 subsequent siblings)
  15 siblings, 0 replies; 173+ messages in thread
From: Honnappa Nagarahalli @ 2019-09-06 19:05 UTC (permalink / raw)
  To: olivier.matz, yipeng1.wang, sameh.gobriel, bruce.richardson,
	pablo.de.lara.guarch
  Cc: dev, pbhagavatula, jerinj, Honnappa Nagarahalli

Add ring APIs to support 32b ring elements using templates.

Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Dharmik Thakkar <dharmik.thakkar@arm.com>
Reviewed-by: Gavin Hu <gavin.hu@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
---
 lib/librte_ring/Makefile             |  3 ++-
 lib/librte_ring/meson.build          |  4 +++-
 lib/librte_ring/rte_ring_32.c        | 19 +++++++++++++++
 lib/librte_ring/rte_ring_32.h        | 36 ++++++++++++++++++++++++++++
 lib/librte_ring/rte_ring_version.map |  2 ++
 5 files changed, 62 insertions(+), 2 deletions(-)
 create mode 100644 lib/librte_ring/rte_ring_32.c
 create mode 100644 lib/librte_ring/rte_ring_32.h

diff --git a/lib/librte_ring/Makefile b/lib/librte_ring/Makefile
index 818898110..3102bb64d 100644
--- a/lib/librte_ring/Makefile
+++ b/lib/librte_ring/Makefile
@@ -14,10 +14,11 @@ EXPORT_MAP := rte_ring_version.map
 LIBABIVER := 2
 
 # all source are stored in SRCS-y
-SRCS-$(CONFIG_RTE_LIBRTE_RING) := rte_ring.c
+SRCS-$(CONFIG_RTE_LIBRTE_RING) := rte_ring.c rte_ring_32.c
 
 # install includes
 SYMLINK-$(CONFIG_RTE_LIBRTE_RING)-include := rte_ring.h \
+					rte_ring_32.h \
 					rte_ring_generic.h \
 					rte_ring_c11_mem.h \
 					rte_ring_template.h \
diff --git a/lib/librte_ring/meson.build b/lib/librte_ring/meson.build
index e4e208a7c..81ea53ed7 100644
--- a/lib/librte_ring/meson.build
+++ b/lib/librte_ring/meson.build
@@ -2,8 +2,10 @@
 # Copyright(c) 2017 Intel Corporation
 
 version = 2
-sources = files('rte_ring.c')
+sources = files('rte_ring.c',
+		'rte_ring_32.c')
 headers = files('rte_ring.h',
+		'rte_ring_32.h',
 		'rte_ring_c11_mem.h',
 		'rte_ring_generic.h',
 		'rte_ring_template.h',
diff --git a/lib/librte_ring/rte_ring_32.c b/lib/librte_ring/rte_ring_32.c
new file mode 100644
index 000000000..09e90cec1
--- /dev/null
+++ b/lib/librte_ring/rte_ring_32.c
@@ -0,0 +1,19 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2019 Arm Limited
+ */
+
+#include <stdio.h>
+#include <stdint.h>
+#include <sys/queue.h>
+#include <errno.h>
+#include <rte_common.h>
+#include <rte_config.h>
+#include <rte_memory.h>
+#include <rte_lcore.h>
+#include <rte_atomic.h>
+#include <rte_branch_prediction.h>
+#include <rte_memzone.h>
+#include <rte_pause.h>
+
+#include <rte_ring_32.h>
+#include <rte_ring_template.c>
diff --git a/lib/librte_ring/rte_ring_32.h b/lib/librte_ring/rte_ring_32.h
new file mode 100644
index 000000000..5270a9bc7
--- /dev/null
+++ b/lib/librte_ring/rte_ring_32.h
@@ -0,0 +1,36 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2019 Arm Limited
+ */
+
+#ifndef _RTE_RING_32_H_
+#define _RTE_RING_32_H_
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#include <stdio.h>
+#include <stdint.h>
+#include <sys/queue.h>
+#include <errno.h>
+#include <rte_common.h>
+#include <rte_config.h>
+#include <rte_memory.h>
+#include <rte_lcore.h>
+#include <rte_atomic.h>
+#include <rte_branch_prediction.h>
+#include <rte_memzone.h>
+#include <rte_pause.h>
+
+#define RTE_RING_TMPLT_API_SUFFIX 32
+#define RTE_RING_TMPLT_ELEM_SIZE sizeof(uint32_t)
+#define RTE_RING_TMPLT_ELEM_TYPE uint32_t
+#define RTE_RING_TMPLT_EXPERIMENTAL __rte_experimental
+
+#include <rte_ring_template.h>
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* _RTE_RING_32_H_ */
diff --git a/lib/librte_ring/rte_ring_version.map b/lib/librte_ring/rte_ring_version.map
index e410a7503..9efba91bb 100644
--- a/lib/librte_ring/rte_ring_version.map
+++ b/lib/librte_ring/rte_ring_version.map
@@ -21,7 +21,9 @@ DPDK_2.2 {
 EXPERIMENTAL {
 	global:
 
+	rte_ring_create_32;
 	rte_ring_create_elem;
+	rte_ring_get_memsize_32;
 	rte_ring_get_memsize_elem;
 	rte_ring_reset;
 
-- 
2.17.1


^ permalink raw reply	[flat|nested] 173+ messages in thread
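
To make the idea of a ring of 32b elements concrete, here is a minimal
single-threaded sketch that stores uint32_t values directly rather than
casting them to 'void *'. It is not the DPDK implementation: it has no
multi-producer/consumer synchronization, and struct sketch_ring32,
sketch_enqueue32() and sketch_dequeue32() are hypothetical names. It only
shows the power-of-2 index masking and the count-1 usable capacity
described in the ring documentation.

```c
#include <errno.h>
#include <stdint.h>

#define SKETCH_RING_COUNT 8u	/* number of slots, must be a power of 2 */
#define SKETCH_RING_MASK  (SKETCH_RING_COUNT - 1)

struct sketch_ring32 {
	uint32_t head;	/* free-running producer index */
	uint32_t tail;	/* free-running consumer index */
	uint32_t slots[SKETCH_RING_COUNT];
};

/* Store one 32-bit element; the usable capacity is count - 1. */
static int
sketch_enqueue32(struct sketch_ring32 *r, uint32_t v)
{
	if ((uint32_t)(r->head - r->tail) >= SKETCH_RING_MASK)
		return -ENOBUFS;
	r->slots[r->head & SKETCH_RING_MASK] = v;
	r->head++;
	return 0;
}

/* Fetch the oldest element into *v. */
static int
sketch_dequeue32(struct sketch_ring32 *r, uint32_t *v)
{
	if (r->head == r->tail)
		return -ENOENT;
	*v = r->slots[r->tail & SKETCH_RING_MASK];
	r->tail++;
	return 0;
}
```

A zero-initialized struct sketch_ring32 is an empty ring. With 8 slots, seven
enqueues succeed and the eighth returns -ENOBUFS, and elements come back in
FIFO order.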

* [dpdk-dev] [PATCH v2 5/6] lib/hash: use ring with 32b element size to save memory
  2019-09-06 19:05 ` [dpdk-dev] [PATCH v2 0/6] " Honnappa Nagarahalli
                     ` (3 preceding siblings ...)
  2019-09-06 19:05   ` [dpdk-dev] [PATCH v2 4/6] lib/ring: add ring APIs to support 32b ring elements Honnappa Nagarahalli
@ 2019-09-06 19:05   ` Honnappa Nagarahalli
  2019-09-06 19:05   ` [dpdk-dev] [PATCH v2 6/6] lib/eventdev: use ring templates for event rings Honnappa Nagarahalli
                     ` (10 subsequent siblings)
  15 siblings, 0 replies; 173+ messages in thread
From: Honnappa Nagarahalli @ 2019-09-06 19:05 UTC (permalink / raw)
  To: olivier.matz, yipeng1.wang, sameh.gobriel, bruce.richardson,
	pablo.de.lara.guarch
  Cc: dev, pbhagavatula, jerinj, Honnappa Nagarahalli

The freelist and external bucket indices are 32b. Storing them in rings
with 32b elements, instead of pointer-sized elements, saves memory.

Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Dharmik Thakkar <dharmik.thakkar@arm.com>
Reviewed-by: Gavin Hu <gavin.hu@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
---
 lib/librte_hash/rte_cuckoo_hash.c | 55 ++++++++++++++-----------------
 lib/librte_hash/rte_cuckoo_hash.h |  2 +-
 2 files changed, 26 insertions(+), 31 deletions(-)

diff --git a/lib/librte_hash/rte_cuckoo_hash.c b/lib/librte_hash/rte_cuckoo_hash.c
index 87a4c01f2..a0cd3360a 100644
--- a/lib/librte_hash/rte_cuckoo_hash.c
+++ b/lib/librte_hash/rte_cuckoo_hash.c
@@ -24,7 +24,7 @@
 #include <rte_cpuflags.h>
 #include <rte_rwlock.h>
 #include <rte_spinlock.h>
-#include <rte_ring.h>
+#include <rte_ring_32.h>
 #include <rte_compat.h>
 #include <rte_vect.h>
 #include <rte_tailq.h>
@@ -213,7 +213,7 @@ rte_hash_create(const struct rte_hash_parameters *params)
 
 	snprintf(ring_name, sizeof(ring_name), "HT_%s", params->name);
 	/* Create ring (Dummy slot index is not enqueued) */
-	r = rte_ring_create(ring_name, rte_align32pow2(num_key_slots),
+	r = rte_ring_create_32(ring_name, rte_align32pow2(num_key_slots),
 			params->socket_id, 0);
 	if (r == NULL) {
 		RTE_LOG(ERR, HASH, "memory allocation failed\n");
@@ -227,7 +227,7 @@ rte_hash_create(const struct rte_hash_parameters *params)
 	if (ext_table_support) {
 		snprintf(ext_ring_name, sizeof(ext_ring_name), "HT_EXT_%s",
 								params->name);
-		r_ext = rte_ring_create(ext_ring_name,
+		r_ext = rte_ring_create_32(ext_ring_name,
 				rte_align32pow2(num_buckets + 1),
 				params->socket_id, 0);
 
@@ -295,7 +295,7 @@ rte_hash_create(const struct rte_hash_parameters *params)
 		 * for next bucket
 		 */
 		for (i = 1; i <= num_buckets; i++)
-			rte_ring_sp_enqueue(r_ext, (void *)((uintptr_t) i));
+			rte_ring_sp_enqueue_32(r_ext, i);
 
 		if (readwrite_concur_lf_support) {
 			ext_bkt_to_free = rte_zmalloc(NULL, sizeof(uint32_t) *
@@ -434,7 +434,7 @@ rte_hash_create(const struct rte_hash_parameters *params)
 
 	/* Populate free slots ring. Entry zero is reserved for key misses. */
 	for (i = 1; i < num_key_slots; i++)
-		rte_ring_sp_enqueue(r, (void *)((uintptr_t) i));
+		rte_ring_sp_enqueue_32(r, i);
 
 	te->data = (void *) h;
 	TAILQ_INSERT_TAIL(hash_list, te, next);
@@ -598,13 +598,12 @@ rte_hash_reset(struct rte_hash *h)
 		tot_ring_cnt = h->entries;
 
 	for (i = 1; i < tot_ring_cnt + 1; i++)
-		rte_ring_sp_enqueue(h->free_slots, (void *)((uintptr_t) i));
+		rte_ring_sp_enqueue_32(h->free_slots, i);
 
 	/* Repopulate the free ext bkt ring. */
 	if (h->ext_table_support) {
 		for (i = 1; i <= h->num_buckets; i++)
-			rte_ring_sp_enqueue(h->free_ext_bkts,
-						(void *)((uintptr_t) i));
+			rte_ring_sp_enqueue_32(h->free_ext_bkts, i);
 	}
 
 	if (h->use_local_cache) {
@@ -623,13 +622,13 @@ rte_hash_reset(struct rte_hash *h)
 static inline void
 enqueue_slot_back(const struct rte_hash *h,
 		struct lcore_cache *cached_free_slots,
-		void *slot_id)
+		uint32_t slot_id)
 {
 	if (h->use_local_cache) {
 		cached_free_slots->objs[cached_free_slots->len] = slot_id;
 		cached_free_slots->len++;
 	} else
-		rte_ring_sp_enqueue(h->free_slots, slot_id);
+		rte_ring_sp_enqueue_32(h->free_slots, slot_id);
 }
 
 /* Search a key from bucket and update its data.
@@ -923,8 +922,8 @@ __rte_hash_add_key_with_hash(const struct rte_hash *h, const void *key,
 	uint32_t prim_bucket_idx, sec_bucket_idx;
 	struct rte_hash_bucket *prim_bkt, *sec_bkt, *cur_bkt;
 	struct rte_hash_key *new_k, *keys = h->key_store;
-	void *slot_id = NULL;
-	void *ext_bkt_id = NULL;
+	uint32_t slot_id = 0;
+	uint32_t ext_bkt_id = 0;
 	uint32_t new_idx, bkt_id;
 	int ret;
 	unsigned n_slots;
@@ -968,7 +967,7 @@ __rte_hash_add_key_with_hash(const struct rte_hash *h, const void *key,
 		/* Try to get a free slot from the local cache */
 		if (cached_free_slots->len == 0) {
 			/* Need to get another burst of free slots from global ring */
-			n_slots = rte_ring_mc_dequeue_burst(h->free_slots,
+			n_slots = rte_ring_mc_dequeue_burst_32(h->free_slots,
 					cached_free_slots->objs,
 					LCORE_CACHE_SIZE, NULL);
 			if (n_slots == 0) {
@@ -982,13 +981,12 @@ __rte_hash_add_key_with_hash(const struct rte_hash *h, const void *key,
 		cached_free_slots->len--;
 		slot_id = cached_free_slots->objs[cached_free_slots->len];
 	} else {
-		if (rte_ring_sc_dequeue(h->free_slots, &slot_id) != 0) {
+		if (rte_ring_sc_dequeue_32(h->free_slots, &slot_id) != 0)
 			return -ENOSPC;
-		}
 	}
 
-	new_k = RTE_PTR_ADD(keys, (uintptr_t)slot_id * h->key_entry_size);
-	new_idx = (uint32_t)((uintptr_t) slot_id);
+	new_k = RTE_PTR_ADD(keys, (size_t)slot_id * h->key_entry_size);
+	new_idx = slot_id;
 	/* The store to application data (by the application) at *data should
 	 * not leak after the store of pdata in the key store. i.e. pdata is
 	 * the guard variable. Release the application data to the readers.
@@ -1078,12 +1076,12 @@ __rte_hash_add_key_with_hash(const struct rte_hash *h, const void *key,
 	/* Failed to get an empty entry from extendable buckets. Link a new
 	 * extendable bucket. We first get a free bucket from ring.
 	 */
-	if (rte_ring_sc_dequeue(h->free_ext_bkts, &ext_bkt_id) != 0) {
+	if (rte_ring_sc_dequeue_32(h->free_ext_bkts, &ext_bkt_id) != 0) {
 		ret = -ENOSPC;
 		goto failure;
 	}
 
-	bkt_id = (uint32_t)((uintptr_t)ext_bkt_id) - 1;
+	bkt_id = ext_bkt_id - 1;
 	/* Use the first location of the new bucket */
 	(h->buckets_ext[bkt_id]).sig_current[0] = short_sig;
 	/* Store to signature and key should not leak after
@@ -1373,7 +1371,7 @@ remove_entry(const struct rte_hash *h, struct rte_hash_bucket *bkt, unsigned i)
 		/* Cache full, need to free it. */
 		if (cached_free_slots->len == LCORE_CACHE_SIZE) {
 			/* Need to enqueue the free slots in global ring. */
-			n_slots = rte_ring_mp_enqueue_burst(h->free_slots,
+			n_slots = rte_ring_mp_enqueue_burst_32(h->free_slots,
 						cached_free_slots->objs,
 						LCORE_CACHE_SIZE, NULL);
 			ERR_IF_TRUE((n_slots == 0),
@@ -1383,11 +1381,10 @@ remove_entry(const struct rte_hash *h, struct rte_hash_bucket *bkt, unsigned i)
 		}
 		/* Put index of new free slot in cache. */
 		cached_free_slots->objs[cached_free_slots->len] =
-				(void *)((uintptr_t)bkt->key_idx[i]);
+				bkt->key_idx[i];
 		cached_free_slots->len++;
 	} else {
-		rte_ring_sp_enqueue(h->free_slots,
-				(void *)((uintptr_t)bkt->key_idx[i]));
+		rte_ring_sp_enqueue_32(h->free_slots, bkt->key_idx[i]);
 	}
 }
 
@@ -1551,7 +1548,7 @@ __rte_hash_del_key_with_hash(const struct rte_hash *h, const void *key,
 			 */
 			h->ext_bkt_to_free[ret] = index;
 		else
-			rte_ring_sp_enqueue(h->free_ext_bkts, (void *)(uintptr_t)index);
+			rte_ring_sp_enqueue_32(h->free_ext_bkts, index);
 	}
 	__hash_rw_writer_unlock(h);
 	return ret;
@@ -1614,7 +1611,7 @@ rte_hash_free_key_with_position(const struct rte_hash *h,
 		uint32_t index = h->ext_bkt_to_free[position];
 		if (index) {
 			/* Recycle empty ext bkt to free list. */
-			rte_ring_sp_enqueue(h->free_ext_bkts, (void *)(uintptr_t)index);
+			rte_ring_sp_enqueue_32(h->free_ext_bkts, index);
 			h->ext_bkt_to_free[position] = 0;
 		}
 	}
@@ -1625,19 +1622,17 @@ rte_hash_free_key_with_position(const struct rte_hash *h,
 		/* Cache full, need to free it. */
 		if (cached_free_slots->len == LCORE_CACHE_SIZE) {
 			/* Need to enqueue the free slots in global ring. */
-			n_slots = rte_ring_mp_enqueue_burst(h->free_slots,
+			n_slots = rte_ring_mp_enqueue_burst_32(h->free_slots,
 						cached_free_slots->objs,
 						LCORE_CACHE_SIZE, NULL);
 			RETURN_IF_TRUE((n_slots == 0), -EFAULT);
 			cached_free_slots->len -= n_slots;
 		}
 		/* Put index of new free slot in cache. */
-		cached_free_slots->objs[cached_free_slots->len] =
-					(void *)((uintptr_t)key_idx);
+		cached_free_slots->objs[cached_free_slots->len] = key_idx;
 		cached_free_slots->len++;
 	} else {
-		rte_ring_sp_enqueue(h->free_slots,
-				(void *)((uintptr_t)key_idx));
+		rte_ring_sp_enqueue_32(h->free_slots, key_idx);
 	}
 
 	return 0;
diff --git a/lib/librte_hash/rte_cuckoo_hash.h b/lib/librte_hash/rte_cuckoo_hash.h
index fb19bb27d..345de6bf9 100644
--- a/lib/librte_hash/rte_cuckoo_hash.h
+++ b/lib/librte_hash/rte_cuckoo_hash.h
@@ -124,7 +124,7 @@ const rte_hash_cmp_eq_t cmp_jump_table[NUM_KEY_CMP_CASES] = {
 
 struct lcore_cache {
 	unsigned len; /**< Cache len */
-	void *objs[LCORE_CACHE_SIZE]; /**< Cache objects */
+	uint32_t objs[LCORE_CACHE_SIZE]; /**< Cache objects */
 } __rte_cache_aligned;
 
 /* Structure that stores key-value pair */
-- 
2.17.1



* [dpdk-dev] [PATCH v2 6/6] lib/eventdev: use ring templates for event rings
  2019-09-06 19:05 ` [dpdk-dev] [PATCH v2 0/6] " Honnappa Nagarahalli
                     ` (4 preceding siblings ...)
  2019-09-06 19:05   ` [dpdk-dev] [PATCH v2 5/6] lib/hash: use ring with 32b element size to save memory Honnappa Nagarahalli
@ 2019-09-06 19:05   ` Honnappa Nagarahalli
  2019-09-09 13:04   ` [dpdk-dev] [PATCH v2 0/6] lib/ring: templates to support custom element size Aaron Conole
                     ` (9 subsequent siblings)
  15 siblings, 0 replies; 173+ messages in thread
From: Honnappa Nagarahalli @ 2019-09-06 19:05 UTC (permalink / raw)
  To: olivier.matz, yipeng1.wang, sameh.gobriel, bruce.richardson,
	pablo.de.lara.guarch
  Cc: dev, pbhagavatula, jerinj, Honnappa Nagarahalli

Use rte_ring templates to define ring APIs for the 128b ring element
type. However, the generic 128b ring APIs are not defined: doing so
would require changes to 'struct rte_event', which would result in
API changes.

Suggested-by: Jerin Jacob Kollanukkaran <jerinj@marvell.com>
Suggested-by: Pavan Nikhilesh Bhagavatula <pbhagavatula@marvell.com>
Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Gavin Hu <gavin.hu@arm.com>
---
 lib/librte_eventdev/Makefile              |   2 +
 lib/librte_eventdev/meson.build           |   2 +
 lib/librte_eventdev/rte_event_ring.c      | 146 +---------------------
 lib/librte_eventdev/rte_event_ring.h      |  41 +-----
 lib/librte_eventdev/rte_event_ring_128b.c |  19 +++
 lib/librte_eventdev/rte_event_ring_128b.h |  44 +++++++
 6 files changed, 78 insertions(+), 176 deletions(-)
 create mode 100644 lib/librte_eventdev/rte_event_ring_128b.c
 create mode 100644 lib/librte_eventdev/rte_event_ring_128b.h

diff --git a/lib/librte_eventdev/Makefile b/lib/librte_eventdev/Makefile
index cd3ff8040..4c76bbdf3 100644
--- a/lib/librte_eventdev/Makefile
+++ b/lib/librte_eventdev/Makefile
@@ -24,6 +24,7 @@ LDLIBS += -lrte_mbuf -lrte_cryptodev -lpthread
 
 # library source files
 SRCS-y += rte_eventdev.c
+SRCS-y += rte_event_ring_128b.c
 SRCS-y += rte_event_ring.c
 SRCS-y += rte_event_eth_rx_adapter.c
 SRCS-y += rte_event_timer_adapter.c
@@ -35,6 +36,7 @@ SYMLINK-y-include += rte_eventdev.h
 SYMLINK-y-include += rte_eventdev_pmd.h
 SYMLINK-y-include += rte_eventdev_pmd_pci.h
 SYMLINK-y-include += rte_eventdev_pmd_vdev.h
+SYMLINK-y-include += rte_event_ring_128b.h
 SYMLINK-y-include += rte_event_ring.h
 SYMLINK-y-include += rte_event_eth_rx_adapter.h
 SYMLINK-y-include += rte_event_timer_adapter.h
diff --git a/lib/librte_eventdev/meson.build b/lib/librte_eventdev/meson.build
index 19541f23f..8a0fd7332 100644
--- a/lib/librte_eventdev/meson.build
+++ b/lib/librte_eventdev/meson.build
@@ -11,6 +11,7 @@ else
 endif
 
 sources = files('rte_eventdev.c',
+		'rte_event_ring_128b.c',
 		'rte_event_ring.c',
 		'rte_event_eth_rx_adapter.c',
 		'rte_event_timer_adapter.c',
@@ -20,6 +21,7 @@ headers = files('rte_eventdev.h',
 		'rte_eventdev_pmd.h',
 		'rte_eventdev_pmd_pci.h',
 		'rte_eventdev_pmd_vdev.h',
+		'rte_event_ring_128b.h',
 		'rte_event_ring.h',
 		'rte_event_eth_rx_adapter.h',
 		'rte_event_timer_adapter.h',
diff --git a/lib/librte_eventdev/rte_event_ring.c b/lib/librte_eventdev/rte_event_ring.c
index 50190de01..479db53ea 100644
--- a/lib/librte_eventdev/rte_event_ring.c
+++ b/lib/librte_eventdev/rte_event_ring.c
@@ -1,5 +1,6 @@
 /* SPDX-License-Identifier: BSD-3-Clause
  * Copyright(c) 2017 Intel Corporation
+ * Copyright(c) 2019 Arm Limited
  */
 
 #include <sys/queue.h>
@@ -11,13 +12,6 @@
 #include <rte_eal_memconfig.h>
 #include "rte_event_ring.h"
 
-TAILQ_HEAD(rte_event_ring_list, rte_tailq_entry);
-
-static struct rte_tailq_elem rte_event_ring_tailq = {
-	.name = RTE_TAILQ_EVENT_RING_NAME,
-};
-EAL_REGISTER_TAILQ(rte_event_ring_tailq)
-
 int
 rte_event_ring_init(struct rte_event_ring *r, const char *name,
 	unsigned int count, unsigned int flags)
@@ -35,150 +29,20 @@ struct rte_event_ring *
 rte_event_ring_create(const char *name, unsigned int count, int socket_id,
 		unsigned int flags)
 {
-	char mz_name[RTE_MEMZONE_NAMESIZE];
-	struct rte_event_ring *r;
-	struct rte_tailq_entry *te;
-	const struct rte_memzone *mz;
-	ssize_t ring_size;
-	int mz_flags = 0;
-	struct rte_event_ring_list *ring_list = NULL;
-	const unsigned int requested_count = count;
-	int ret;
-
-	ring_list = RTE_TAILQ_CAST(rte_event_ring_tailq.head,
-		rte_event_ring_list);
-
-	/* for an exact size ring, round up from count to a power of two */
-	if (flags & RING_F_EXACT_SZ)
-		count = rte_align32pow2(count + 1);
-	else if (!rte_is_power_of_2(count)) {
-		rte_errno = EINVAL;
-		return NULL;
-	}
-
-	ring_size = sizeof(*r) + (count * sizeof(struct rte_event));
-
-	ret = snprintf(mz_name, sizeof(mz_name), "%s%s",
-		RTE_RING_MZ_PREFIX, name);
-	if (ret < 0 || ret >= (int)sizeof(mz_name)) {
-		rte_errno = ENAMETOOLONG;
-		return NULL;
-	}
-
-	te = rte_zmalloc("RING_TAILQ_ENTRY", sizeof(*te), 0);
-	if (te == NULL) {
-		RTE_LOG(ERR, RING, "Cannot reserve memory for tailq\n");
-		rte_errno = ENOMEM;
-		return NULL;
-	}
-
-	rte_mcfg_tailq_write_lock();
-
-	/*
-	 * reserve a memory zone for this ring. If we can't get rte_config or
-	 * we are secondary process, the memzone_reserve function will set
-	 * rte_errno for us appropriately - hence no check in this this function
-	 */
-	mz = rte_memzone_reserve(mz_name, ring_size, socket_id, mz_flags);
-	if (mz != NULL) {
-		r = mz->addr;
-		/* Check return value in case rte_ring_init() fails on size */
-		int err = rte_event_ring_init(r, name, requested_count, flags);
-		if (err) {
-			RTE_LOG(ERR, RING, "Ring init failed\n");
-			if (rte_memzone_free(mz) != 0)
-				RTE_LOG(ERR, RING, "Cannot free memzone\n");
-			rte_free(te);
-			rte_mcfg_tailq_write_unlock();
-			return NULL;
-		}
-
-		te->data = (void *) r;
-		r->r.memzone = mz;
-
-		TAILQ_INSERT_TAIL(ring_list, te, next);
-	} else {
-		r = NULL;
-		RTE_LOG(ERR, RING, "Cannot reserve memory\n");
-		rte_free(te);
-	}
-	rte_mcfg_tailq_write_unlock();
-
-	return r;
+	return (struct rte_event_ring *)rte_ring_create_event_128b(name, count,
+						socket_id, flags);
 }
 
 
 struct rte_event_ring *
 rte_event_ring_lookup(const char *name)
 {
-	struct rte_tailq_entry *te;
-	struct rte_event_ring *r = NULL;
-	struct rte_event_ring_list *ring_list;
-
-	ring_list = RTE_TAILQ_CAST(rte_event_ring_tailq.head,
-			rte_event_ring_list);
-
-	rte_mcfg_tailq_read_lock();
-
-	TAILQ_FOREACH(te, ring_list, next) {
-		r = (struct rte_event_ring *) te->data;
-		if (strncmp(name, r->r.name, RTE_RING_NAMESIZE) == 0)
-			break;
-	}
-
-	rte_mcfg_tailq_read_unlock();
-
-	if (te == NULL) {
-		rte_errno = ENOENT;
-		return NULL;
-	}
-
-	return r;
+	return (struct rte_event_ring *)rte_ring_lookup(name);
 }
 
 /* free the ring */
 void
 rte_event_ring_free(struct rte_event_ring *r)
 {
-	struct rte_event_ring_list *ring_list = NULL;
-	struct rte_tailq_entry *te;
-
-	if (r == NULL)
-		return;
-
-	/*
-	 * Ring was not created with rte_event_ring_create,
-	 * therefore, there is no memzone to free.
-	 */
-	if (r->r.memzone == NULL) {
-		RTE_LOG(ERR, RING,
-			"Cannot free ring (not created with rte_event_ring_create()");
-		return;
-	}
-
-	if (rte_memzone_free(r->r.memzone) != 0) {
-		RTE_LOG(ERR, RING, "Cannot free memory\n");
-		return;
-	}
-
-	ring_list = RTE_TAILQ_CAST(rte_event_ring_tailq.head,
-			rte_event_ring_list);
-	rte_mcfg_tailq_write_lock();
-
-	/* find out tailq entry */
-	TAILQ_FOREACH(te, ring_list, next) {
-		if (te->data == (void *) r)
-			break;
-	}
-
-	if (te == NULL) {
-		rte_mcfg_tailq_write_unlock();
-		return;
-	}
-
-	TAILQ_REMOVE(ring_list, te, next);
-
-	rte_mcfg_tailq_write_unlock();
-
-	rte_free(te);
+	rte_ring_free(&r->r);
 }
diff --git a/lib/librte_eventdev/rte_event_ring.h b/lib/librte_eventdev/rte_event_ring.h
index 827a3209e..4553c0076 100644
--- a/lib/librte_eventdev/rte_event_ring.h
+++ b/lib/librte_eventdev/rte_event_ring.h
@@ -1,5 +1,6 @@
 /* SPDX-License-Identifier: BSD-3-Clause
  * Copyright(c) 2016-2017 Intel Corporation
+ * Copyright(c) 2019 Arm Limited
  */
 
 /**
@@ -20,8 +21,7 @@
 #include <rte_malloc.h>
 #include <rte_ring.h>
 #include "rte_eventdev.h"
-
-#define RTE_TAILQ_EVENT_RING_NAME "RTE_EVENT_RING"
+#include "rte_event_ring_128b.h"
 
 /**
  * Generic ring structure for passing rte_event objects from core to core.
@@ -88,22 +88,8 @@ rte_event_ring_enqueue_burst(struct rte_event_ring *r,
 		const struct rte_event *events,
 		unsigned int n, uint16_t *free_space)
 {
-	uint32_t prod_head, prod_next;
-	uint32_t free_entries;
-
-	n = __rte_ring_move_prod_head(&r->r, r->r.prod.single, n,
-			RTE_RING_QUEUE_VARIABLE,
-			&prod_head, &prod_next, &free_entries);
-	if (n == 0)
-		goto end;
-
-	ENQUEUE_PTRS(&r->r, &r[1], prod_head, events, n, struct rte_event);
-
-	update_tail(&r->r.prod, prod_head, prod_next, r->r.prod.single, 1);
-end:
-	if (free_space != NULL)
-		*free_space = free_entries - n;
-	return n;
+	return rte_ring_enqueue_burst_event_128b(&r->r, events, n,
+							(uint32_t *)free_space);
 }
 
 /**
@@ -129,23 +115,8 @@ rte_event_ring_dequeue_burst(struct rte_event_ring *r,
 		struct rte_event *events,
 		unsigned int n, uint16_t *available)
 {
-	uint32_t cons_head, cons_next;
-	uint32_t entries;
-
-	n = __rte_ring_move_cons_head(&r->r, r->r.cons.single, n,
-			RTE_RING_QUEUE_VARIABLE,
-			&cons_head, &cons_next, &entries);
-	if (n == 0)
-		goto end;
-
-	DEQUEUE_PTRS(&r->r, &r[1], cons_head, events, n, struct rte_event);
-
-	update_tail(&r->r.cons, cons_head, cons_next, r->r.cons.single, 0);
-
-end:
-	if (available != NULL)
-		*available = entries - n;
-	return n;
+	return rte_ring_dequeue_burst_event_128b(&r->r, events, n,
+							(uint32_t *)available);
 }
 
 /*
diff --git a/lib/librte_eventdev/rte_event_ring_128b.c b/lib/librte_eventdev/rte_event_ring_128b.c
new file mode 100644
index 000000000..5e4105a2f
--- /dev/null
+++ b/lib/librte_eventdev/rte_event_ring_128b.c
@@ -0,0 +1,19 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2019 Arm Limited
+ */
+
+#include <stdio.h>
+#include <stdint.h>
+#include <sys/queue.h>
+#include <errno.h>
+#include <rte_common.h>
+#include <rte_config.h>
+#include <rte_memory.h>
+#include <rte_lcore.h>
+#include <rte_atomic.h>
+#include <rte_branch_prediction.h>
+#include <rte_memzone.h>
+#include <rte_pause.h>
+
+#include <rte_event_ring_128b.h>
+#include <rte_ring_template.c>
diff --git a/lib/librte_eventdev/rte_event_ring_128b.h b/lib/librte_eventdev/rte_event_ring_128b.h
new file mode 100644
index 000000000..3079d7b49
--- /dev/null
+++ b/lib/librte_eventdev/rte_event_ring_128b.h
@@ -0,0 +1,44 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2019 Arm Limited
+ */
+
+#ifndef _RTE_EVENT_RING_128_H_
+#define _RTE_EVENT_RING_128_H_
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#include <stdio.h>
+#include <stdint.h>
+#include <sys/queue.h>
+#include <errno.h>
+#include <rte_common.h>
+#include <rte_config.h>
+#include <rte_memory.h>
+#include <rte_lcore.h>
+#include <rte_atomic.h>
+#include <rte_branch_prediction.h>
+#include <rte_memzone.h>
+#include <rte_pause.h>
+#include "rte_eventdev.h"
+
+/* Event ring will use its own template. Otherwise, the 'struct rte_event'
+ * needs to change to 'union rte_event' to include a standard 128b data type
+ * such as __int128_t which results in API changes.
+ *
+ * The RTE_RING_TMPLT_API_SUFFIX cannot be just '128b' as that will be
+ * used for standard 128b element type APIs defined by the rte_ring library.
+ */
+#define RTE_RING_TMPLT_API_SUFFIX event_128b
+#define RTE_RING_TMPLT_ELEM_SIZE sizeof(struct rte_event)
+#define RTE_RING_TMPLT_ELEM_TYPE struct rte_event
+#define RTE_RING_TMPLT_EXPERIMENTAL
+
+#include <rte_ring_template.h>
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* _RTE_EVENT_RING_128_H_ */
-- 
2.17.1



* Re: [dpdk-dev] [PATCH v2 2/6] lib/ring: add template to support different element sizes
  2019-09-06 19:05   ` [dpdk-dev] [PATCH v2 2/6] lib/ring: add template to support different element sizes Honnappa Nagarahalli
@ 2019-09-08 19:44     ` Stephen Hemminger
  2019-09-09  9:01       ` Bruce Richardson
  0 siblings, 1 reply; 173+ messages in thread
From: Stephen Hemminger @ 2019-09-08 19:44 UTC (permalink / raw)
  To: Honnappa Nagarahalli
  Cc: olivier.matz, yipeng1.wang, sameh.gobriel, bruce.richardson,
	pablo.de.lara.guarch, dev, pbhagavatula, jerinj

On Fri,  6 Sep 2019 14:05:06 -0500
Honnappa Nagarahalli <honnappa.nagarahalli@arm.com> wrote:

> Add templates to support creating ring APIs with different
> ring element sizes.
> 
> Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
> Reviewed-by: Dharmik Thakkar <dharmik.thakkar@arm.com>
> Reviewed-by: Gavin Hu <gavin.hu@arm.com>
> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>

I understand the desire for generic code, but macros are much harder to maintain
and debug. Would it be possible to use inline code taking a size argument
and let compiler optimizations with constant folding do the same thing?

Sorry, I vote NO for large-scale use of macros.
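
To make the suggestion concrete, here is a minimal sketch of an inline
helper that takes the element size as an argument. It is only an
illustration under stated assumptions: ring_copy_in() and
ring_copy_in_u32() are hypothetical names, not DPDK APIs, and the helper
covers just the copy-with-wrap step (no head/tail updates, no
multi-producer synchronization). Because the typed wrapper passes
sizeof(uint32_t) as a constant, a compiler that inlines the call can fold
the size into the copy.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper, not a DPDK API: copy n elements of esize bytes
 * each into a ring buffer of 'size' slots (power of two), starting at
 * slot 'head' and wrapping around at the end of the buffer.
 */
static inline void
ring_copy_in(void *ring, uint32_t size, uint32_t head,
	     const void *objs, uint32_t n, uint32_t esize)
{
	uint32_t mask = size - 1;
	uint32_t idx = head & mask;
	/* Number of elements that fit before the wrap point */
	uint32_t first = (n < size - idx) ? n : size - idx;

	assert(size != 0 && (size & mask) == 0 && n <= size);

	memcpy((char *)ring + (size_t)idx * esize, objs,
	       (size_t)first * esize);
	/* Remaining elements (possibly zero) go to the start of the ring */
	memcpy(ring, (const char *)objs + (size_t)first * esize,
	       (size_t)(n - first) * esize);
}

/* Typed wrapper: esize is a compile-time constant at this call site */
static inline void
ring_copy_in_u32(uint32_t *ring, uint32_t size, uint32_t head,
		 const uint32_t *objs, uint32_t n)
{
	ring_copy_in(ring, size, head, objs, n, sizeof(uint32_t));
}
```

A wrapper for another element type would differ only in the pointer type
and the constant passed as esize, so no token-pasting macros are needed.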


* Re: [dpdk-dev] [PATCH v2 2/6] lib/ring: add template to support different element sizes
  2019-09-08 19:44     ` Stephen Hemminger
@ 2019-09-09  9:01       ` Bruce Richardson
  2019-09-09 22:33         ` Honnappa Nagarahalli
  0 siblings, 1 reply; 173+ messages in thread
From: Bruce Richardson @ 2019-09-09  9:01 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: Honnappa Nagarahalli, olivier.matz, yipeng1.wang, sameh.gobriel,
	pablo.de.lara.guarch, dev, pbhagavatula, jerinj

On Sun, Sep 08, 2019 at 08:44:36PM +0100, Stephen Hemminger wrote:
> On Fri,  6 Sep 2019 14:05:06 -0500
> Honnappa Nagarahalli <honnappa.nagarahalli@arm.com> wrote:
> 
> > Add templates to support creating ring APIs with different
> > ring element sizes.
> > 
> > Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
> > Reviewed-by: Dharmik Thakkar <dharmik.thakkar@arm.com>
> > Reviewed-by: Gavin Hu <gavin.hu@arm.com>
> > Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
> 
> I understand the desire for generic code, but macros are much harder to maintain
> and debug. Would it be possible to use inline code taking a size argument
> and let compiler optimizations with constant folding do the same thing?
> 
> Sorry, I vote NO for large-scale use of macros.

I would tend to agree. This use of macros makes the code very awkward to
read and understand.


* Re: [dpdk-dev] [PATCH v2 0/6] lib/ring: templates to support custom element size
  2019-09-06 19:05 ` [dpdk-dev] [PATCH v2 0/6] " Honnappa Nagarahalli
                     ` (5 preceding siblings ...)
  2019-09-06 19:05   ` [dpdk-dev] [PATCH v2 6/6] lib/eventdev: use ring templates for event rings Honnappa Nagarahalli
@ 2019-09-09 13:04   ` Aaron Conole
  2019-10-07 13:49   ` David Marchand
                     ` (8 subsequent siblings)
  15 siblings, 0 replies; 173+ messages in thread
From: Aaron Conole @ 2019-09-09 13:04 UTC (permalink / raw)
  To: Honnappa Nagarahalli
  Cc: olivier.matz, yipeng1.wang, sameh.gobriel, bruce.richardson,
	pablo.de.lara.guarch, dev, pbhagavatula, jerinj

Honnappa Nagarahalli <honnappa.nagarahalli@arm.com> writes:

> The current rte_ring hard-codes the type of the ring element to 'void *',
> hence the size of the element is hard-coded to 32b/64b. Since the ring
> element type is not an input to rte_ring APIs, it results in a couple
> of issues:
>
> 1) If an application needs to store an element which is not 64b, it
>    must write its own ring APIs similar to rte_event_ring APIs. This
>    creates additional burden on the programmers, who end up making
>    work-arounds and often waste memory.
> 2) If there are multiple libraries that store elements of the same
>    type, currently they would have to write their own rte_ring APIs. This
>    results in code duplication.
>
> This patch consists of several parts:
> 1) New APIs to support configurable ring element size
>    These will help reduce code duplication in the templates. I think these
>    can be made internal (do not expose to DPDK applications, but expose to
>    DPDK libraries), feedback needed.
>
> 2) rte_ring templates
>    The templates provide an easy way to add new APIs for different ring
>    element types/sizes which can be used by multiple libraries. These
>    also allow for creating APIs to store elements of custom types
>    (for ex: a structure)
>
>    The template needs 4 parameters:
>    a) RTE_RING_TMPLT_API_SUFFIX - This is used as a suffix to the
>       rte_ring APIs.
>       For ex: if RTE_RING_TMPLT_API_SUFFIX is '32b', the API name will be
>       rte_ring_create_32b
>    b) RTE_RING_TMPLT_ELEM_SIZE - Size of the ring element in bytes.
>       For ex: sizeof(uint32_t)
>    c) RTE_RING_TMPLT_ELEM_TYPE - Type of the ring element.
>       For ex: uint32_t. If a common ring library does not use a standard
>       data type, it should create its own type by defining a structure
>       with standard data type. For ex: for an element size of 96b, one
>       could define a structure
>
>       struct s_96b {
>           uint32_t a[3];
>       };
>       The common library can use this structure to define
>       RTE_RING_TMPLT_ELEM_TYPE.
>
>       The application using this common ring library should define its
>       element type as a union with the above structure.
>
>       union app_element_type {
>           struct s_96b v;
>           struct app_element {
>               uint16_t a;
>               uint16_t b;
>               uint32_t c;
>               uint32_t d;
>           };
>       };
>    d) RTE_RING_TMPLT_EXPERIMENTAL - Indicates if the new APIs being defined
>       are experimental. Should be set to empty to remove the experimental
>       tag.
>
>    The ring library consists of some APIs that are defined as inline
>    functions and some APIs that are non-inline functions. The non-inline
>    functions are in rte_ring_template.c. However, this file needs to be
>    included in other .c files. Any feedback on how to handle this is
>    appreciated.
>
>    Note that the templates help create the APIs that are dependent on the
>    element size (for ex: rte_ring_create, enqueue/dequeue etc). Other APIs
>    that do NOT depend on the element size do not need to be part of the
>    template (for ex: rte_ring_dump, rte_ring_count, rte_ring_free_count
>    etc).
>
> 3) APIs for 32b ring element size
>    This uses the templates to create APIs to enqueue/dequeue elements of
>    size 32b.
>
> 4) rte_hash library is changed to use 32b ring APIs
>    The 32b APIs are used in rte_hash library to store the free slot index
>    and free bucket index.
>
> 5) Event Dev changed to use ring templates
>    Event Dev defines its own 128b ring APIs using the templates. This helps
>    in keeping the 'struct rte_event' as is. If Event Dev has to use generic
>    128b ring APIs, it requires 'struct rte_event' to change to
>    'union rte_event' to include a generic data type such as '__int128_t'.
>    This breaks the API compatibility and results in large number of
>    changes.
>    With this change, the event rings are stored on rte_ring's tailq.
>    Event Dev specific ring list is NOT available. IMO, this does not have
>    any impact to the user.
>
> This patch results in following checkpatch issue:
> WARNING:UNSPECIFIED_INT: Prefer 'unsigned int' to bare use of 'unsigned'
>
> However, this patch is following the rules in the existing code. Please
> let me know if this needs to be fixed.
>
> v2
>  - Change Event Ring implementation to use ring templates
>    (Jerin, Pavan)

Since you'll likely be spinning a v3 (to switch off the macroization),
this series seems to have some unit test failures:

   24/82 DPDK:fast-tests / event_ring_autotest   FAIL     0.12 s (exit status 255 or signal 127 SIGinvalid)
   --- command ---
   DPDK_TEST='event_ring_autotest' /home/travis/build/ovsrobot/dpdk/build/app/test/dpdk-test -l 0-1 --file-prefix=event_ring_autotest
   --- stdout ---
   EAL: Probing VFIO support...
   APP: HPET is not enabled, using TSC as default timer
   RTE>>event_ring_autotest
   RING: Requested number of elements is invalid, must be power of 2, and do not exceed the limit 2147483647
   Test detected odd count
   Test detected NULL ring lookup
   RING: Requested number of elements is invalid, must be power of 2, and do not exceed the limit 2147483647
   RING: Requested number of elements is invalid, must be power of 2, and do not exceed the limit 2147483647
   Error, status after enqueue is unexpected
   Test Failed
   RTE>>
   --- stderr ---
   EAL: Detected 2 lcore(s)
   EAL: Detected 1 NUMA nodes
   EAL: Multi-process socket /var/run/dpdk/event_ring_autotest/mp_socket
   EAL: Selected IOVA mode 'PA'
   EAL: No available hugepages reported in hugepages-1048576kB
   -------

Please double check.  Seems to only happen with clang/llvm.

> Honnappa Nagarahalli (6):
>   lib/ring: apis to support configurable element size
>   lib/ring: add template to support different element sizes
>   tools/checkpatch: relax constraints on __rte_experimental
>   lib/ring: add ring APIs to support 32b ring elements
>   lib/hash: use ring with 32b element size to save memory
>   lib/eventdev: use ring templates for event rings
>
>  devtools/checkpatches.sh                  |  11 +-
>  lib/librte_eventdev/Makefile              |   2 +
>  lib/librte_eventdev/meson.build           |   2 +
>  lib/librte_eventdev/rte_event_ring.c      | 146 +---------
>  lib/librte_eventdev/rte_event_ring.h      |  41 +--
>  lib/librte_eventdev/rte_event_ring_128b.c |  19 ++
>  lib/librte_eventdev/rte_event_ring_128b.h |  44 +++
>  lib/librte_hash/rte_cuckoo_hash.c         |  55 ++--
>  lib/librte_hash/rte_cuckoo_hash.h         |   2 +-
>  lib/librte_ring/Makefile                  |   9 +-
>  lib/librte_ring/meson.build               |  11 +-
>  lib/librte_ring/rte_ring.c                |  34 ++-
>  lib/librte_ring/rte_ring.h                |  72 +++++
>  lib/librte_ring/rte_ring_32.c             |  19 ++
>  lib/librte_ring/rte_ring_32.h             |  36 +++
>  lib/librte_ring/rte_ring_template.c       |  46 +++
>  lib/librte_ring/rte_ring_template.h       | 330 ++++++++++++++++++++++
>  lib/librte_ring/rte_ring_version.map      |   4 +
>  18 files changed, 660 insertions(+), 223 deletions(-)
>  create mode 100644 lib/librte_eventdev/rte_event_ring_128b.c
>  create mode 100644 lib/librte_eventdev/rte_event_ring_128b.h
>  create mode 100644 lib/librte_ring/rte_ring_32.c
>  create mode 100644 lib/librte_ring/rte_ring_32.h
>  create mode 100644 lib/librte_ring/rte_ring_template.c
>  create mode 100644 lib/librte_ring/rte_ring_template.h


* Re: [dpdk-dev] [PATCH v2 2/6] lib/ring: add template to support different element sizes
  2019-09-09  9:01       ` Bruce Richardson
@ 2019-09-09 22:33         ` Honnappa Nagarahalli
  0 siblings, 0 replies; 173+ messages in thread
From: Honnappa Nagarahalli @ 2019-09-09 22:33 UTC (permalink / raw)
  To: Bruce Richardson, Stephen Hemminger
  Cc: olivier.matz, yipeng1.wang, sameh.gobriel, pablo.de.lara.guarch,
	dev, pbhagavatula, jerinj, Honnappa Nagarahalli, nd, nd


> -----Original Message-----
> From: Bruce Richardson <bruce.richardson@intel.com>
> Sent: Monday, September 9, 2019 4:01 AM
> To: Stephen Hemminger <stephen@networkplumber.org>
> Cc: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>;
> olivier.matz@6wind.com; yipeng1.wang@intel.com;
> sameh.gobriel@intel.com; pablo.de.lara.guarch@intel.com; dev@dpdk.org;
> pbhagavatula@marvell.com; jerinj@marvell.com
> Subject: Re: [dpdk-dev] [PATCH v2 2/6] lib/ring: add template to support
> different element sizes
> 
> On Sun, Sep 08, 2019 at 08:44:36PM +0100, Stephen Hemminger wrote:
> > On Fri,  6 Sep 2019 14:05:06 -0500
> > Honnappa Nagarahalli <honnappa.nagarahalli@arm.com> wrote:
> >
> > > Add templates to support creating ring APIs with different ring
> > > element sizes.
> > >
> > > Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
> > > Reviewed-by: Dharmik Thakkar <dharmik.thakkar@arm.com>
> > > Reviewed-by: Gavin Hu <gavin.hu@arm.com>
> > > Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
> >
> > I understand the desire for generic code, but macros are much harder to
> > maintain and debug. Would it be possible to use inline code taking a
> > size argument and let compiler optimizations with constant folding do the
> > same thing?
> >
> > Sorry, I vote NO for large-scale use of macros.
> 
> I would tend to agree. This use of macros makes the code very awkward to
> read and understand.
Stephen, Bruce, thank you for your feedback. It looks like we at least agree on the problem definition; hopefully we can find a solution. I will try to rework this and get back with solutions/problems.


* Re: [dpdk-dev] [PATCH 2/5] lib/ring: add template to support different element sizes
  2019-08-28 14:46 ` [dpdk-dev] [PATCH 2/5] lib/ring: add template to support different element sizes Honnappa Nagarahalli
@ 2019-10-01 11:47   ` Ananyev, Konstantin
  2019-10-02  4:21     ` Honnappa Nagarahalli
  0 siblings, 1 reply; 173+ messages in thread
From: Ananyev, Konstantin @ 2019-10-01 11:47 UTC (permalink / raw)
  To: Honnappa Nagarahalli, olivier.matz, Wang, Yipeng1, Gobriel,
	Sameh, Richardson, Bruce, De Lara Guarch, Pablo
  Cc: dev, dharmik.thakkar, gavin.hu, ruifeng.wang, nd



> 
> 
> Add templates to support creating ring APIs with different
> ring element sizes.
> 
> Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
> Reviewed-by: Dharmik Thakkar <dharmik.thakkar@arm.com>
> Reviewed-by: Gavin Hu <gavin.hu@arm.com>
> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
> ---
>  lib/librte_ring/Makefile            |   4 +-
>  lib/librte_ring/meson.build         |   4 +-
>  lib/librte_ring/rte_ring_template.c |  46 ++++
>  lib/librte_ring/rte_ring_template.h | 330 ++++++++++++++++++++++++++++
>  4 files changed, 382 insertions(+), 2 deletions(-)
>  create mode 100644 lib/librte_ring/rte_ring_template.c
>  create mode 100644 lib/librte_ring/rte_ring_template.h
> 
> diff --git a/lib/librte_ring/Makefile b/lib/librte_ring/Makefile
> index 4c8410229..818898110 100644
> --- a/lib/librte_ring/Makefile
> +++ b/lib/librte_ring/Makefile
> @@ -19,6 +19,8 @@ SRCS-$(CONFIG_RTE_LIBRTE_RING) := rte_ring.c
>  # install includes
>  SYMLINK-$(CONFIG_RTE_LIBRTE_RING)-include := rte_ring.h \
>  					rte_ring_generic.h \
> -					rte_ring_c11_mem.h
> +					rte_ring_c11_mem.h \
> +					rte_ring_template.h \
> +					rte_ring_template.c
> 
>  include $(RTE_SDK)/mk/rte.lib.mk
> diff --git a/lib/librte_ring/meson.build b/lib/librte_ring/meson.build
> index 74219840a..e4e208a7c 100644
> --- a/lib/librte_ring/meson.build
> +++ b/lib/librte_ring/meson.build
> @@ -5,7 +5,9 @@ version = 2
>  sources = files('rte_ring.c')
>  headers = files('rte_ring.h',
>  		'rte_ring_c11_mem.h',
> -		'rte_ring_generic.h')
> +		'rte_ring_generic.h',
> +		'rte_ring_template.h',
> +		'rte_ring_template.c')
> 
>  # rte_ring_create_elem and rte_ring_get_memsize_elem are experimental
>  allow_experimental_apis = true
> diff --git a/lib/librte_ring/rte_ring_template.c b/lib/librte_ring/rte_ring_template.c
> new file mode 100644
> index 000000000..1ca593f95
> --- /dev/null
> +++ b/lib/librte_ring/rte_ring_template.c
> @@ -0,0 +1,46 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + * Copyright (c) 2019 Arm Limited
> + */
> +
> +#include <stdio.h>
> +#include <stdarg.h>
> +#include <string.h>
> +#include <stdint.h>
> +#include <inttypes.h>
> +#include <errno.h>
> +#include <sys/queue.h>
> +
> +#include <rte_common.h>
> +#include <rte_log.h>
> +#include <rte_memory.h>
> +#include <rte_memzone.h>
> +#include <rte_malloc.h>
> +#include <rte_launch.h>
> +#include <rte_eal.h>
> +#include <rte_eal_memconfig.h>
> +#include <rte_atomic.h>
> +#include <rte_per_lcore.h>
> +#include <rte_lcore.h>
> +#include <rte_branch_prediction.h>
> +#include <rte_errno.h>
> +#include <rte_string_fns.h>
> +#include <rte_spinlock.h>
> +#include <rte_tailq.h>
> +
> +#include "rte_ring.h"
> +
> +/* return the size of memory occupied by a ring */
> +ssize_t
> +__RTE_RING_CONCAT(rte_ring_get_memsize)(unsigned count)
> +{
> +	return rte_ring_get_memsize_elem(count, RTE_RING_TMPLT_ELEM_SIZE);
> +}
> +
> +/* create the ring */
> +struct rte_ring *
> +__RTE_RING_CONCAT(rte_ring_create)(const char *name, unsigned count,
> +		int socket_id, unsigned flags)
> +{
> +	return rte_ring_create_elem(name, count, RTE_RING_TMPLT_ELEM_SIZE,
> +		socket_id, flags);
> +}
> diff --git a/lib/librte_ring/rte_ring_template.h b/lib/librte_ring/rte_ring_template.h
> new file mode 100644
> index 000000000..b9b14dfbb
> --- /dev/null
> +++ b/lib/librte_ring/rte_ring_template.h
> @@ -0,0 +1,330 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + * Copyright (c) 2019 Arm Limited
> + */
> +
> +#ifndef _RTE_RING_TEMPLATE_H_
> +#define _RTE_RING_TEMPLATE_H_
> +
> +#ifdef __cplusplus
> +extern "C" {
> +#endif
> +
> +#include <stdio.h>
> +#include <stdint.h>
> +#include <sys/queue.h>
> +#include <errno.h>
> +#include <rte_common.h>
> +#include <rte_config.h>
> +#include <rte_memory.h>
> +#include <rte_lcore.h>
> +#include <rte_atomic.h>
> +#include <rte_branch_prediction.h>
> +#include <rte_memzone.h>
> +#include <rte_pause.h>
> +#include <rte_ring.h>
> +
> +/* Ring API suffix name - used to append to API names */
> +#ifndef RTE_RING_TMPLT_API_SUFFIX
> +#error RTE_RING_TMPLT_API_SUFFIX not defined
> +#endif
> +
> +/* Ring's element size in bits, should be a power of 2 */
> +#ifndef RTE_RING_TMPLT_ELEM_SIZE
> +#error RTE_RING_TMPLT_ELEM_SIZE not defined
> +#endif
> +
> +/* Type of ring elements */
> +#ifndef RTE_RING_TMPLT_ELEM_TYPE
> +#error RTE_RING_TMPLT_ELEM_TYPE not defined
> +#endif
> +
> +#define _rte_fuse(a, b) a##_##b
> +#define __rte_fuse(a, b) _rte_fuse(a, b)
> +#define __RTE_RING_CONCAT(a) __rte_fuse(a, RTE_RING_TMPLT_API_SUFFIX)
> +
> +/* Calculate the memory size needed for a ring */
> +RTE_RING_TMPLT_EXPERIMENTAL
> +ssize_t __RTE_RING_CONCAT(rte_ring_get_memsize)(unsigned count);
> +
> +/* Create a new ring named *name* in memory. */
> +RTE_RING_TMPLT_EXPERIMENTAL
> +struct rte_ring *
> +__RTE_RING_CONCAT(rte_ring_create)(const char *name, unsigned count,
> +					int socket_id, unsigned flags);


Just an idea - probably the same thing can be achieved in a different way.
Instead of all these defines, replace the ENQUEUE_PTRS/DEQUEUE_PTRS macros
with static inline functions, and then make all internal functions, e.g. __rte_ring_do_dequeue(),
accept an enqueue/dequeue function pointer as a parameter.
Then, let's say, the default rte_ring_mc_dequeue_bulk will do:

rte_ring_mc_dequeue_bulk(struct rte_ring *r, void **obj_table,
                unsigned int n, unsigned int *available)
{
        return __rte_ring_do_dequeue(r, obj_table, n, RTE_RING_QUEUE_FIXED,
                        __IS_MC, available, dequeue_ptr_default);
}

Then if someone wants to define ring functions for elt_size==X, all they would need to do is:
1. define their own enqueue/dequeue functions.
2. do something like:
rte_ring_mc_dequeue_bulk(struct rte_ring *r, void **obj_table,
                unsigned int n, unsigned int *available)
{
        return __rte_ring_do_dequeue(r, obj_table, n, RTE_RING_QUEUE_FIXED,
                        __IS_MC, available, dequeue_X);
}
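
To make the idea concrete, here is a minimal standalone sketch. It is not DPDK code: the toy_* names, the single-threaded toy ring and the callback signatures are all assumptions made for illustration. The index handling lives in one generic function and only the element copy is passed in as a function pointer:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define TOY_SLOTS   8u               /* power of two, so masking wraps the index */
#define TOY_MASK    (TOY_SLOTS - 1u)
#define TOY_MAX_ELT 16u              /* largest element size in bytes */

/* Toy single-threaded ring, a stand-in for struct rte_ring (hypothetical). */
struct toy_ring {
	uint32_t head;                         /* free-running producer index */
	uint32_t tail;                         /* free-running consumer index */
	uint8_t mem[TOY_SLOTS * TOY_MAX_ELT];  /* slot storage */
};

/* Element copy callbacks, playing the role of ENQUEUE_PTRS/DEQUEUE_PTRS. */
typedef void (*toy_enq_fn)(struct toy_ring *r, uint32_t idx,
		const void *objs, unsigned int n);
typedef void (*toy_deq_fn)(const struct toy_ring *r, uint32_t idx,
		void *objs, unsigned int n);

/* Copy routines for 4-byte elements. */
static inline void
toy_enq_u32(struct toy_ring *r, uint32_t idx, const void *objs, unsigned int n)
{
	const uint8_t *src = objs;
	unsigned int i;

	for (i = 0; i < n; i++)
		memcpy(&r->mem[((idx + i) & TOY_MASK) * 4u], src + i * 4u, 4u);
}

static inline void
toy_deq_u32(const struct toy_ring *r, uint32_t idx, void *objs, unsigned int n)
{
	uint8_t *dst = objs;
	unsigned int i;

	for (i = 0; i < n; i++)
		memcpy(dst + i * 4u, &r->mem[((idx + i) & TOY_MASK) * 4u], 4u);
}

/* Generic internals: index handling is shared, the copy is delegated. */
static inline unsigned int
toy_do_enqueue(struct toy_ring *r, const void *objs, unsigned int n,
		toy_enq_fn enq)
{
	uint32_t free_entries = TOY_SLOTS - (r->head - r->tail);

	assert(enq != NULL);
	if (n > free_entries)
		return 0;	/* fixed behaviour: all or nothing */
	enq(r, r->head, objs, n);
	r->head += n;
	return n;
}

static inline unsigned int
toy_do_dequeue(struct toy_ring *r, void *objs, unsigned int n, toy_deq_fn deq)
{
	uint32_t entries = r->head - r->tail;

	assert(deq != NULL);
	if (n > entries)
		return 0;	/* fixed behaviour: all or nothing */
	deq(r, r->tail, objs, n);
	r->tail += n;
	return n;
}

/* Public wrappers for elt_size == 4; another size only needs its own
 * pair of copy routines plus wrappers like these. */
static inline unsigned int
toy_enqueue_bulk_u32(struct toy_ring *r, const uint32_t *objs, unsigned int n)
{
	return toy_do_enqueue(r, objs, n, toy_enq_u32);
}

static inline unsigned int
toy_dequeue_bulk_u32(struct toy_ring *r, uint32_t *objs, unsigned int n)
{
	return toy_do_dequeue(r, objs, n, toy_deq_u32);
}
```

When the wrappers and the internal functions are all inlined, the compiler can usually resolve the function pointer at compile time, so the indirection need not cost anything at run time.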

Konstantin


> +
> +/**
> + * @internal Enqueue several objects on the ring
> + */
> +static __rte_always_inline unsigned int
> +__RTE_RING_CONCAT(__rte_ring_do_enqueue)(struct rte_ring *r,
> +		RTE_RING_TMPLT_ELEM_TYPE const *obj_table, unsigned int n,
> +		enum rte_ring_queue_behavior behavior, unsigned int is_sp,
> +		unsigned int *free_space)
> +{
> +	uint32_t prod_head, prod_next;
> +	uint32_t free_entries;
> +
> +	n = __rte_ring_move_prod_head(r, is_sp, n, behavior,
> +			&prod_head, &prod_next, &free_entries);
> +	if (n == 0)
> +		goto end;
> +
> +	ENQUEUE_PTRS(r, &r[1], prod_head, obj_table, n,
> +		RTE_RING_TMPLT_ELEM_TYPE);
> +
> +	update_tail(&r->prod, prod_head, prod_next, is_sp, 1);
> +end:
> +	if (free_space != NULL)
> +		*free_space = free_entries - n;
> +	return n;
> +}
> +
> +/**
> + * @internal Dequeue several objects from the ring
> + */
> +static __rte_always_inline unsigned int
> +__RTE_RING_CONCAT(__rte_ring_do_dequeue)(struct rte_ring *r,
> +	RTE_RING_TMPLT_ELEM_TYPE *obj_table, unsigned int n,
> +	enum rte_ring_queue_behavior behavior, unsigned int is_sc,
> +	unsigned int *available)
> +{
> +	uint32_t cons_head, cons_next;
> +	uint32_t entries;
> +
> +	n = __rte_ring_move_cons_head(r, (int)is_sc, n, behavior,
> +			&cons_head, &cons_next, &entries);
> +	if (n == 0)
> +		goto end;
> +
> +	DEQUEUE_PTRS(r, &r[1], cons_head, obj_table, n,
> +		RTE_RING_TMPLT_ELEM_TYPE);
> +
> +	update_tail(&r->cons, cons_head, cons_next, is_sc, 0);
> +
> +end:
> +	if (available != NULL)
> +		*available = entries - n;
> +	return n;
> +}
> +
> +
> +/**
> + * Enqueue several objects on the ring (multi-producers safe).
> + */
> +static __rte_always_inline unsigned int
> +__RTE_RING_CONCAT(rte_ring_mp_enqueue_bulk)(struct rte_ring *r,
> +	RTE_RING_TMPLT_ELEM_TYPE const *obj_table, unsigned int n,
> +	unsigned int *free_space)
> +{
> +	return __RTE_RING_CONCAT(__rte_ring_do_enqueue)(r, obj_table, n,
> +			RTE_RING_QUEUE_FIXED, __IS_MP, free_space);
> +}
> +
> +/**
> + * Enqueue several objects on a ring (NOT multi-producers safe).
> + */
> +static __rte_always_inline unsigned int
> +__RTE_RING_CONCAT(rte_ring_sp_enqueue_bulk)(struct rte_ring *r,
> +	RTE_RING_TMPLT_ELEM_TYPE const *obj_table, unsigned int n,
> +	unsigned int *free_space)
> +{
> +	return __RTE_RING_CONCAT(__rte_ring_do_enqueue)(r, obj_table, n,
> +			RTE_RING_QUEUE_FIXED, __IS_SP, free_space);
> +}
> +
> +/**
> + * Enqueue several objects on a ring.
> + */
> +static __rte_always_inline unsigned int
> +__RTE_RING_CONCAT(rte_ring_enqueue_bulk)(struct rte_ring *r,
> +	RTE_RING_TMPLT_ELEM_TYPE const *obj_table, unsigned int n,
> +	unsigned int *free_space)
> +{
> +	return __RTE_RING_CONCAT(__rte_ring_do_enqueue)(r, obj_table, n,
> +			RTE_RING_QUEUE_FIXED, r->prod.single, free_space);
> +}
> +
> +/**
> + * Enqueue one object on a ring (multi-producers safe).
> + */
> +static __rte_always_inline int
> +__RTE_RING_CONCAT(rte_ring_mp_enqueue)(struct rte_ring *r,
> +	RTE_RING_TMPLT_ELEM_TYPE obj)
> +{
> +	return __RTE_RING_CONCAT(rte_ring_mp_enqueue_bulk)(r, &obj, 1, NULL) ?
> +			0 : -ENOBUFS;
> +}
> +
> +/**
> + * Enqueue one object on a ring (NOT multi-producers safe).
> + */
> +static __rte_always_inline int
> +__RTE_RING_CONCAT(rte_ring_sp_enqueue)(struct rte_ring *r,
> +	RTE_RING_TMPLT_ELEM_TYPE obj)
> +{
> +	return __RTE_RING_CONCAT(rte_ring_sp_enqueue_bulk)(r, &obj, 1, NULL) ?
> +			0 : -ENOBUFS;
> +}
> +
> +/**
> + * Enqueue one object on a ring.
> + */
> +static __rte_always_inline int
> +__RTE_RING_CONCAT(rte_ring_enqueue)(struct rte_ring *r,
> +	RTE_RING_TMPLT_ELEM_TYPE *obj)
> +{
> +	return __RTE_RING_CONCAT(rte_ring_enqueue_bulk)(r, obj, 1, NULL) ?
> +			0 : -ENOBUFS;
> +}
> +
> +/**
> + * Dequeue several objects from a ring (multi-consumers safe).
> + */
> +static __rte_always_inline unsigned int
> +__RTE_RING_CONCAT(rte_ring_mc_dequeue_bulk)(struct rte_ring *r,
> +	RTE_RING_TMPLT_ELEM_TYPE *obj_table, unsigned int n,
> +	unsigned int *available)
> +{
> +	return __RTE_RING_CONCAT(__rte_ring_do_dequeue)(r, obj_table, n,
> +			RTE_RING_QUEUE_FIXED, __IS_MC, available);
> +}
> +
> +/**
> + * Dequeue several objects from a ring (NOT multi-consumers safe).
> + */
> +static __rte_always_inline unsigned int
> +__RTE_RING_CONCAT(rte_ring_sc_dequeue_bulk)(struct rte_ring *r,
> +	RTE_RING_TMPLT_ELEM_TYPE *obj_table, unsigned int n,
> +	unsigned int *available)
> +{
> +	return __RTE_RING_CONCAT(__rte_ring_do_dequeue)(r, obj_table, n,
> +			RTE_RING_QUEUE_FIXED, __IS_SC, available);
> +}
> +
> +/**
> + * Dequeue several objects from a ring.
> + */
> +static __rte_always_inline unsigned int
> +__RTE_RING_CONCAT(rte_ring_dequeue_bulk)(struct rte_ring *r,
> +	RTE_RING_TMPLT_ELEM_TYPE *obj_table, unsigned int n,
> +	unsigned int *available)
> +{
> +	return __RTE_RING_CONCAT(__rte_ring_do_dequeue)(r, obj_table, n,
> +			RTE_RING_QUEUE_FIXED, r->cons.single, available);
> +}
> +
> +/**
> + * Dequeue one object from a ring (multi-consumers safe).
> + */
> +static __rte_always_inline int
> +__RTE_RING_CONCAT(rte_ring_mc_dequeue)(struct rte_ring *r,
> +	RTE_RING_TMPLT_ELEM_TYPE *obj_p)
> +{
> +	return __RTE_RING_CONCAT(rte_ring_mc_dequeue_bulk)(r, obj_p, 1, NULL) ?
> +			0 : -ENOENT;
> +}
> +
> +/**
> + * Dequeue one object from a ring (NOT multi-consumers safe).
> + */
> +static __rte_always_inline int
> +__RTE_RING_CONCAT(rte_ring_sc_dequeue)(struct rte_ring *r,
> +	RTE_RING_TMPLT_ELEM_TYPE *obj_p)
> +{
> +	return __RTE_RING_CONCAT(rte_ring_sc_dequeue_bulk)(r, obj_p, 1, NULL) ?
> +			0 : -ENOENT;
> +}
> +
> +/**
> + * Dequeue one object from a ring.
> + */
> +static __rte_always_inline int
> +__RTE_RING_CONCAT(rte_ring_dequeue)(struct rte_ring *r,
> +	RTE_RING_TMPLT_ELEM_TYPE *obj_p)
> +{
> +	return __RTE_RING_CONCAT(rte_ring_dequeue_bulk)(r, obj_p, 1, NULL) ?
> +			0 : -ENOENT;
> +}
> +
> +/**
> + * Enqueue several objects on the ring (multi-producers safe).
> + */
> +static __rte_always_inline unsigned
> +__RTE_RING_CONCAT(rte_ring_mp_enqueue_burst)(struct rte_ring *r,
> +	RTE_RING_TMPLT_ELEM_TYPE *obj_table,
> +			 unsigned int n, unsigned int *free_space)
> +{
> +	return __RTE_RING_CONCAT(__rte_ring_do_enqueue)(r, obj_table, n,
> +			RTE_RING_QUEUE_VARIABLE, __IS_MP, free_space);
> +}
> +
> +/**
> + * Enqueue several objects on a ring (NOT multi-producers safe).
> + */
> +static __rte_always_inline unsigned
> +__RTE_RING_CONCAT(rte_ring_sp_enqueue_burst)(struct rte_ring *r,
> +	RTE_RING_TMPLT_ELEM_TYPE *obj_table,
> +			 unsigned int n, unsigned int *free_space)
> +{
> +	return __RTE_RING_CONCAT(__rte_ring_do_enqueue)(r, obj_table, n,
> +			RTE_RING_QUEUE_VARIABLE, __IS_SP, free_space);
> +}
> +
> +/**
> + * Enqueue several objects on a ring.
> + */
> +static __rte_always_inline unsigned
> +__RTE_RING_CONCAT(rte_ring_enqueue_burst)(struct rte_ring *r,
> +	RTE_RING_TMPLT_ELEM_TYPE *obj_table, unsigned int n,
> +	unsigned int *free_space)
> +{
> +	return __RTE_RING_CONCAT(__rte_ring_do_enqueue)(r, obj_table, n,
> +			RTE_RING_QUEUE_VARIABLE, r->prod.single, free_space);
> +}
> +
> +/**
> + * Dequeue several objects from a ring (multi-consumers safe). When the request
> + * objects are more than the available objects, only dequeue the actual number
> + * of objects
> + */
> +static __rte_always_inline unsigned
> +__RTE_RING_CONCAT(rte_ring_mc_dequeue_burst)(struct rte_ring *r,
> +	RTE_RING_TMPLT_ELEM_TYPE *obj_table, unsigned int n,
> +	unsigned int *available)
> +{
> +	return __RTE_RING_CONCAT(__rte_ring_do_dequeue)(r, obj_table, n,
> +			RTE_RING_QUEUE_VARIABLE, __IS_MC, available);
> +}
> +
> +/**
> + * Dequeue several objects from a ring (NOT multi-consumers safe).When the
> + * request objects are more than the available objects, only dequeue the
> + * actual number of objects
> + */
> +static __rte_always_inline unsigned
> +__RTE_RING_CONCAT(rte_ring_sc_dequeue_burst)(struct rte_ring *r,
> +	RTE_RING_TMPLT_ELEM_TYPE *obj_table, unsigned int n,
> +	unsigned int *available)
> +{
> +	return __RTE_RING_CONCAT(__rte_ring_do_dequeue)(r, obj_table, n,
> +			RTE_RING_QUEUE_VARIABLE, __IS_SC, available);
> +}
> +
> +/**
> + * Dequeue multiple objects from a ring up to a maximum number.
> + */
> +static __rte_always_inline unsigned
> +__RTE_RING_CONCAT(rte_ring_dequeue_burst)(struct rte_ring *r,
> +	RTE_RING_TMPLT_ELEM_TYPE *obj_table, unsigned int n,
> +	unsigned int *available)
> +{
> +	return __RTE_RING_CONCAT(__rte_ring_do_dequeue)(r, obj_table, n,
> +				RTE_RING_QUEUE_VARIABLE,
> +				r->cons.single, available);
> +}
> +
> +#ifdef __cplusplus
> +}
> +#endif
> +
> +#endif /* _RTE_RING_TEMPLATE_H_ */
> --
> 2.17.1



* Re: [dpdk-dev] [PATCH 2/5] lib/ring: add template to support different element sizes
  2019-10-01 11:47   ` Ananyev, Konstantin
@ 2019-10-02  4:21     ` Honnappa Nagarahalli
  2019-10-02  8:39       ` Ananyev, Konstantin
  0 siblings, 1 reply; 173+ messages in thread
From: Honnappa Nagarahalli @ 2019-10-02  4:21 UTC (permalink / raw)
  To: Ananyev, Konstantin, olivier.matz, Wang, Yipeng1, Gobriel, Sameh,
	Richardson, Bruce, De Lara Guarch, Pablo
  Cc: dev, Dharmik Thakkar, Gavin Hu (Arm Technology China),
	Ruifeng Wang (Arm Technology China),
	Honnappa Nagarahalli, nd, nd

> > Add templates to support creating ring APIs with different ring
> > element sizes.
> >
> > Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
> > Reviewed-by: Dharmik Thakkar <dharmik.thakkar@arm.com>
> > Reviewed-by: Gavin Hu <gavin.hu@arm.com>
> > Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
> > ---
> >  lib/librte_ring/Makefile            |   4 +-
> >  lib/librte_ring/meson.build         |   4 +-
> >  lib/librte_ring/rte_ring_template.c |  46 ++++
> >  lib/librte_ring/rte_ring_template.h | 330 ++++++++++++++++++++++++++++
> >  4 files changed, 382 insertions(+), 2 deletions(-)
> >  create mode 100644 lib/librte_ring/rte_ring_template.c
> >  create mode 100644 lib/librte_ring/rte_ring_template.h
> >
> > diff --git a/lib/librte_ring/Makefile b/lib/librte_ring/Makefile
> > index 4c8410229..818898110 100644
> > --- a/lib/librte_ring/Makefile
> > +++ b/lib/librte_ring/Makefile
> > @@ -19,6 +19,8 @@ SRCS-$(CONFIG_RTE_LIBRTE_RING) := rte_ring.c
> >  # install includes
> >  SYMLINK-$(CONFIG_RTE_LIBRTE_RING)-include := rte_ring.h \
> >  					rte_ring_generic.h \
> > -					rte_ring_c11_mem.h
> > +					rte_ring_c11_mem.h \
> > +					rte_ring_template.h \
> > +					rte_ring_template.c
> >
> >  include $(RTE_SDK)/mk/rte.lib.mk
> > diff --git a/lib/librte_ring/meson.build b/lib/librte_ring/meson.build
> > index 74219840a..e4e208a7c 100644
> > --- a/lib/librte_ring/meson.build
> > +++ b/lib/librte_ring/meson.build
> > @@ -5,7 +5,9 @@ version = 2
> >  sources = files('rte_ring.c')
> >  headers = files('rte_ring.h',
> >  		'rte_ring_c11_mem.h',
> > -		'rte_ring_generic.h')
> > +		'rte_ring_generic.h',
> > +		'rte_ring_template.h',
> > +		'rte_ring_template.c')
> >
> >  # rte_ring_create_elem and rte_ring_get_memsize_elem are experimental
> >  allow_experimental_apis = true
> > diff --git a/lib/librte_ring/rte_ring_template.c b/lib/librte_ring/rte_ring_template.c
> > new file mode 100644
> > index 000000000..1ca593f95
> > --- /dev/null
> > +++ b/lib/librte_ring/rte_ring_template.c
> > @@ -0,0 +1,46 @@
> > +/* SPDX-License-Identifier: BSD-3-Clause
> > + * Copyright (c) 2019 Arm Limited
> > + */
> > +
> > +#include <stdio.h>
> > +#include <stdarg.h>
> > +#include <string.h>
> > +#include <stdint.h>
> > +#include <inttypes.h>
> > +#include <errno.h>
> > +#include <sys/queue.h>
> > +
> > +#include <rte_common.h>
> > +#include <rte_log.h>
> > +#include <rte_memory.h>
> > +#include <rte_memzone.h>
> > +#include <rte_malloc.h>
> > +#include <rte_launch.h>
> > +#include <rte_eal.h>
> > +#include <rte_eal_memconfig.h>
> > +#include <rte_atomic.h>
> > +#include <rte_per_lcore.h>
> > +#include <rte_lcore.h>
> > +#include <rte_branch_prediction.h>
> > +#include <rte_errno.h>
> > +#include <rte_string_fns.h>
> > +#include <rte_spinlock.h>
> > +#include <rte_tailq.h>
> > +
> > +#include "rte_ring.h"
> > +
> > +/* return the size of memory occupied by a ring */
> > +ssize_t
> > +__RTE_RING_CONCAT(rte_ring_get_memsize)(unsigned count)
> > +{
> > +	return rte_ring_get_memsize_elem(count, RTE_RING_TMPLT_ELEM_SIZE);
> > +}
> > +
> > +/* create the ring */
> > +struct rte_ring *
> > +__RTE_RING_CONCAT(rte_ring_create)(const char *name, unsigned count,
> > +		int socket_id, unsigned flags)
> > +{
> > +	return rte_ring_create_elem(name, count, RTE_RING_TMPLT_ELEM_SIZE,
> > +		socket_id, flags);
> > +}
> > diff --git a/lib/librte_ring/rte_ring_template.h
> > b/lib/librte_ring/rte_ring_template.h
> > new file mode 100644
> > index 000000000..b9b14dfbb
> > --- /dev/null
> > +++ b/lib/librte_ring/rte_ring_template.h
> > @@ -0,0 +1,330 @@
> > +/* SPDX-License-Identifier: BSD-3-Clause
> > + * Copyright (c) 2019 Arm Limited
> > + */
> > +
> > +#ifndef _RTE_RING_TEMPLATE_H_
> > +#define _RTE_RING_TEMPLATE_H_
> > +
> > +#ifdef __cplusplus
> > +extern "C" {
> > +#endif
> > +
> > +#include <stdio.h>
> > +#include <stdint.h>
> > +#include <sys/queue.h>
> > +#include <errno.h>
> > +#include <rte_common.h>
> > +#include <rte_config.h>
> > +#include <rte_memory.h>
> > +#include <rte_lcore.h>
> > +#include <rte_atomic.h>
> > +#include <rte_branch_prediction.h>
> > +#include <rte_memzone.h>
> > +#include <rte_pause.h>
> > +#include <rte_ring.h>
> > +
> > +/* Ring API suffix name - used to append to API names */
> > +#ifndef RTE_RING_TMPLT_API_SUFFIX
> > +#error RTE_RING_TMPLT_API_SUFFIX not defined
> > +#endif
> > +
> > +/* Ring's element size in bits, should be a power of 2 */
> > +#ifndef RTE_RING_TMPLT_ELEM_SIZE
> > +#error RTE_RING_TMPLT_ELEM_SIZE not defined
> > +#endif
> > +
> > +/* Type of ring elements */
> > +#ifndef RTE_RING_TMPLT_ELEM_TYPE
> > +#error RTE_RING_TMPLT_ELEM_TYPE not defined
> > +#endif
> > +
> > +#define _rte_fuse(a, b) a##_##b
> > +#define __rte_fuse(a, b) _rte_fuse(a, b)
> > +#define __RTE_RING_CONCAT(a) __rte_fuse(a, RTE_RING_TMPLT_API_SUFFIX)
> > +
> > +/* Calculate the memory size needed for a ring */
> > +RTE_RING_TMPLT_EXPERIMENTAL
> > +ssize_t __RTE_RING_CONCAT(rte_ring_get_memsize)(unsigned count);
> > +
> > +/* Create a new ring named *name* in memory. */
> > +RTE_RING_TMPLT_EXPERIMENTAL
> > +struct rte_ring *
> > +__RTE_RING_CONCAT(rte_ring_create)(const char *name, unsigned count,
> > +					int socket_id, unsigned flags);
> 
> 
> Just an idea - probably the same thing can be achieved in a different way.
> Instead of all these defines, replace the ENQUEUE_PTRS/DEQUEUE_PTRS macros
> with static inline functions, and then make all internal functions, e.g.
> __rte_ring_do_dequeue(),
> accept an enqueue/dequeue function pointer as a parameter.
> Then, let's say, the default rte_ring_mc_dequeue_bulk will do:
> 
> rte_ring_mc_dequeue_bulk(struct rte_ring *r, void **obj_table,
>                 unsigned int n, unsigned int *available)
> {
>         return __rte_ring_do_dequeue(r, obj_table, n, RTE_RING_QUEUE_FIXED,
>                         __IS_MC, available, dequeue_ptr_default);
> }
> 
> Then if someone wants to define ring functions for elt_size==X, all they would
> need to do is:
> 1. define their own enqueue/dequeue functions.
> 2. do something like:
> rte_ring_mc_dequeue_bulk(struct rte_ring *r, void **obj_table,
>                 unsigned int n, unsigned int *available)
> {
>         return __rte_ring_do_dequeue(r, obj_table, n, RTE_RING_QUEUE_FIXED,
>                         __IS_MC, available, dequeue_X);
> }
> 
> Konstantin
Thanks for the feedback/idea. The goal of this patch was to make it simple to define APIs that store elements of any size, without code duplication. With this patch, the user has to write ~4 lines of code to get APIs for any element size. I would like to keep that goal.
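
For readers unfamiliar with the mechanism, the name generation behind those few lines can be sketched standalone. The DEMO_* macros and demo_ring_peek below are made up for illustration and are not part of the patch; only the two-level paste mirrors what the template header does:

```c
#include <assert.h>
#include <stdint.h>

/* What a user of the template would define (hypothetical names). */
#define DEMO_API_SUFFIX 32b
#define DEMO_ELEM_TYPE  uint32_t

/* Two-level paste: the extra level lets DEMO_API_SUFFIX expand to 32b
 * before ## glues the tokens together. */
#define DEMO_FUSE_(a, b) a##_##b
#define DEMO_FUSE(a, b)  DEMO_FUSE_(a, b)
#define DEMO_CONCAT(a)   DEMO_FUSE(a, DEMO_API_SUFFIX)

/* Expands to: static inline uint32_t demo_ring_peek_32b(...) */
static inline DEMO_ELEM_TYPE
DEMO_CONCAT(demo_ring_peek)(const DEMO_ELEM_TYPE *slots, unsigned int idx)
{
	return slots[idx];
}
```

Without the intermediate DEMO_FUSE level, the paste would act on the macro name itself and produce demo_ring_peek_DEMO_API_SUFFIX.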

If we have to avoid the macro-fest, the main problem to address is how to represent different element sizes in a generic way. IMO, we can do this by defining the element type to be a multiple of uint32_t (I do not think we need to go to uint16_t).

For ex:
rte_ring_mp_enqueue_bulk_objs(struct rte_ring *r,
                uint32_t *obj_table, unsigned int num_objs,
                unsigned int n,
                enum rte_ring_queue_behavior behavior, unsigned int is_sp,
                unsigned int *free_space)
{
}

This approach would ensure that we have generic enough APIs and they can be used for elements of any size. But the element itself needs to be a multiple of 32b - I think this should not be a concern.
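
As a rough illustration of the "multiple of uint32_t" idea, here is a toy, single-threaded sketch. The sk_* names are hypothetical, and keeping the element size inside the ring structure is an assumption of this sketch rather than something the proposal specifies. One copy loop then serves every element size:

```c
#include <assert.h>
#include <stdint.h>

#define SK_SLOTS     8u              /* power of two, so masking wraps the index */
#define SK_MASK      (SK_SLOTS - 1u)
#define SK_MAX_WORDS 4u              /* largest element: 4 x 32 bits */

/* Toy ring whose elements are words_per_elem 32-bit words each. */
struct sk_ring {
	uint32_t head;                         /* free-running producer index */
	uint32_t tail;                         /* free-running consumer index */
	uint32_t words_per_elem;               /* element size in 32-bit words */
	uint32_t mem[SK_SLOTS * SK_MAX_WORDS]; /* slot storage */
};

/* Enqueue exactly n elements or none; obj_table holds
 * n * words_per_elem words. */
static inline unsigned int
sk_enqueue_bulk(struct sk_ring *r, const uint32_t *obj_table, unsigned int n)
{
	uint32_t free_entries = SK_SLOTS - (r->head - r->tail);
	uint32_t wpe = r->words_per_elem;
	unsigned int i, w;

	assert(wpe >= 1 && wpe <= SK_MAX_WORDS);
	if (n > free_entries)
		return 0;
	for (i = 0; i < n; i++) {
		uint32_t *slot = &r->mem[((r->head + i) & SK_MASK) * wpe];

		for (w = 0; w < wpe; w++)
			slot[w] = obj_table[i * wpe + w];
	}
	r->head += n;
	return n;
}

/* Dequeue exactly n elements or none. */
static inline unsigned int
sk_dequeue_bulk(struct sk_ring *r, uint32_t *obj_table, unsigned int n)
{
	uint32_t entries = r->head - r->tail;
	uint32_t wpe = r->words_per_elem;
	unsigned int i, w;

	assert(wpe >= 1 && wpe <= SK_MAX_WORDS);
	if (n > entries)
		return 0;
	for (i = 0; i < n; i++) {
		const uint32_t *slot = &r->mem[((r->tail + i) & SK_MASK) * wpe];

		for (w = 0; w < wpe; w++)
			obj_table[i * wpe + w] = slot[w];
	}
	r->tail += n;
	return n;
}
```

A 64-bit element would use words_per_elem = 2 and a 128-bit one words_per_elem = 4, with the caller passing its table as an array of uint32_t.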

The API suffix definitely needs to be better, any suggestions?

> 
> 
> > +
> > +/**
> > + * @internal Enqueue several objects on the ring
> > + */
> > +static __rte_always_inline unsigned int
> > +__RTE_RING_CONCAT(__rte_ring_do_enqueue)(struct rte_ring *r,
> > +		RTE_RING_TMPLT_ELEM_TYPE const *obj_table, unsigned int n,
> > +		enum rte_ring_queue_behavior behavior, unsigned int is_sp,
> > +		unsigned int *free_space)
> > +{
> > +	uint32_t prod_head, prod_next;
> > +	uint32_t free_entries;
> > +
> > +	n = __rte_ring_move_prod_head(r, is_sp, n, behavior,
> > +			&prod_head, &prod_next, &free_entries);
> > +	if (n == 0)
> > +		goto end;
> > +
> > +	ENQUEUE_PTRS(r, &r[1], prod_head, obj_table, n,
> > +		RTE_RING_TMPLT_ELEM_TYPE);
> > +
> > +	update_tail(&r->prod, prod_head, prod_next, is_sp, 1);
> > +end:
> > +	if (free_space != NULL)
> > +		*free_space = free_entries - n;
> > +	return n;
> > +}
> > +
> > +/**
> > + * @internal Dequeue several objects from the ring
> > + */
> > +static __rte_always_inline unsigned int
> > +__RTE_RING_CONCAT(__rte_ring_do_dequeue)(struct rte_ring *r,
> > +	RTE_RING_TMPLT_ELEM_TYPE *obj_table, unsigned int n,
> > +	enum rte_ring_queue_behavior behavior, unsigned int is_sc,
> > +	unsigned int *available)
> > +{
> > +	uint32_t cons_head, cons_next;
> > +	uint32_t entries;
> > +
> > +	n = __rte_ring_move_cons_head(r, (int)is_sc, n, behavior,
> > +			&cons_head, &cons_next, &entries);
> > +	if (n == 0)
> > +		goto end;
> > +
> > +	DEQUEUE_PTRS(r, &r[1], cons_head, obj_table, n,
> > +		RTE_RING_TMPLT_ELEM_TYPE);
> > +
> > +	update_tail(&r->cons, cons_head, cons_next, is_sc, 0);
> > +
> > +end:
> > +	if (available != NULL)
> > +		*available = entries - n;
> > +	return n;
> > +}
> > +
> > +
> > +/**
> > + * Enqueue several objects on the ring (multi-producers safe).
> > + */
> > +static __rte_always_inline unsigned int
> > +__RTE_RING_CONCAT(rte_ring_mp_enqueue_bulk)(struct rte_ring *r,
> > +	RTE_RING_TMPLT_ELEM_TYPE const *obj_table, unsigned int n,
> > +	unsigned int *free_space)
> > +{
> > +	return __RTE_RING_CONCAT(__rte_ring_do_enqueue)(r, obj_table, n,
> > +			RTE_RING_QUEUE_FIXED, __IS_MP, free_space);
> > +}
> > +
> > +/**
> > + * Enqueue several objects on a ring (NOT multi-producers safe).
> > + */
> > +static __rte_always_inline unsigned int
> > +__RTE_RING_CONCAT(rte_ring_sp_enqueue_bulk)(struct rte_ring *r,
> > +	RTE_RING_TMPLT_ELEM_TYPE const *obj_table, unsigned int n,
> > +	unsigned int *free_space)
> > +{
> > +	return __RTE_RING_CONCAT(__rte_ring_do_enqueue)(r, obj_table, n,
> > +			RTE_RING_QUEUE_FIXED, __IS_SP, free_space);
> > +}
> > +
> > +/**
> > + * Enqueue several objects on a ring.
> > + */
> > +static __rte_always_inline unsigned int
> > +__RTE_RING_CONCAT(rte_ring_enqueue_bulk)(struct rte_ring *r,
> > +	RTE_RING_TMPLT_ELEM_TYPE const *obj_table, unsigned int n,
> > +	unsigned int *free_space)
> > +{
> > +	return __RTE_RING_CONCAT(__rte_ring_do_enqueue)(r, obj_table, n,
> > +			RTE_RING_QUEUE_FIXED, r->prod.single, free_space);
> > +}
> > +
> > +/**
> > + * Enqueue one object on a ring (multi-producers safe).
> > + */
> > +static __rte_always_inline int
> > +__RTE_RING_CONCAT(rte_ring_mp_enqueue)(struct rte_ring *r,
> > +	RTE_RING_TMPLT_ELEM_TYPE obj)
> > +{
> > +	return __RTE_RING_CONCAT(rte_ring_mp_enqueue_bulk)(r, &obj, 1, NULL) ?
> > +			0 : -ENOBUFS;
> > +}
> > +
> > +/**
> > + * Enqueue one object on a ring (NOT multi-producers safe).
> > + */
> > +static __rte_always_inline int
> > +__RTE_RING_CONCAT(rte_ring_sp_enqueue)(struct rte_ring *r,
> > +	RTE_RING_TMPLT_ELEM_TYPE obj)
> > +{
> > +	return __RTE_RING_CONCAT(rte_ring_sp_enqueue_bulk)(r, &obj, 1, NULL) ?
> > +			0 : -ENOBUFS;
> > +}
> > +
> > +/**
> > + * Enqueue one object on a ring.
> > + */
> > +static __rte_always_inline int
> > +__RTE_RING_CONCAT(rte_ring_enqueue)(struct rte_ring *r,
> > +	RTE_RING_TMPLT_ELEM_TYPE *obj)
> > +{
> > +	return __RTE_RING_CONCAT(rte_ring_enqueue_bulk)(r, obj, 1, NULL) ?
> > +			0 : -ENOBUFS;
> > +}
> > +
> > +/**
> > + * Dequeue several objects from a ring (multi-consumers safe).
> > + */
> > +static __rte_always_inline unsigned int
> > +__RTE_RING_CONCAT(rte_ring_mc_dequeue_bulk)(struct rte_ring *r,
> > +	RTE_RING_TMPLT_ELEM_TYPE *obj_table, unsigned int n,
> > +	unsigned int *available)
> > +{
> > +	return __RTE_RING_CONCAT(__rte_ring_do_dequeue)(r, obj_table, n,
> > +			RTE_RING_QUEUE_FIXED, __IS_MC, available);
> > +}
> > +
> > +/**
> > + * Dequeue several objects from a ring (NOT multi-consumers safe).
> > + */
> > +static __rte_always_inline unsigned int
> > +__RTE_RING_CONCAT(rte_ring_sc_dequeue_bulk)(struct rte_ring *r,
> > +	RTE_RING_TMPLT_ELEM_TYPE *obj_table, unsigned int n,
> > +	unsigned int *available)
> > +{
> > +	return __RTE_RING_CONCAT(__rte_ring_do_dequeue)(r, obj_table, n,
> > +			RTE_RING_QUEUE_FIXED, __IS_SC, available);
> > +}
> > +
> > +/**
> > + * Dequeue several objects from a ring.
> > + */
> > +static __rte_always_inline unsigned int
> > +__RTE_RING_CONCAT(rte_ring_dequeue_bulk)(struct rte_ring *r,
> > +	RTE_RING_TMPLT_ELEM_TYPE *obj_table, unsigned int n,
> > +	unsigned int *available)
> > +{
> > +	return __RTE_RING_CONCAT(__rte_ring_do_dequeue)(r, obj_table, n,
> > +			RTE_RING_QUEUE_FIXED, r->cons.single, available);
> > +}
> > +
> > +/**
> > + * Dequeue one object from a ring (multi-consumers safe).
> > + */
> > +static __rte_always_inline int
> > +__RTE_RING_CONCAT(rte_ring_mc_dequeue)(struct rte_ring *r,
> > +	RTE_RING_TMPLT_ELEM_TYPE *obj_p)
> > +{
> > +	return __RTE_RING_CONCAT(rte_ring_mc_dequeue_bulk)(r, obj_p, 1, NULL) ?
> > +			0 : -ENOENT;
> > +}
> > +
> > +/**
> > + * Dequeue one object from a ring (NOT multi-consumers safe).
> > + */
> > +static __rte_always_inline int
> > +__RTE_RING_CONCAT(rte_ring_sc_dequeue)(struct rte_ring *r,
> > +	RTE_RING_TMPLT_ELEM_TYPE *obj_p)
> > +{
> > +	return __RTE_RING_CONCAT(rte_ring_sc_dequeue_bulk)(r, obj_p, 1, NULL) ?
> > +			0 : -ENOENT;
> > +}
> > +
> > +/**
> > + * Dequeue one object from a ring.
> > + */
> > +static __rte_always_inline int
> > +__RTE_RING_CONCAT(rte_ring_dequeue)(struct rte_ring *r,
> > +	RTE_RING_TMPLT_ELEM_TYPE *obj_p)
> > +{
> > +	return __RTE_RING_CONCAT(rte_ring_dequeue_bulk)(r, obj_p, 1, NULL) ?
> > +			0 : -ENOENT;
> > +}
> > +
> > +/**
> > + * Enqueue several objects on the ring (multi-producers safe).
> > + */
> > +static __rte_always_inline unsigned
> > +__RTE_RING_CONCAT(rte_ring_mp_enqueue_burst)(struct rte_ring *r,
> > +	RTE_RING_TMPLT_ELEM_TYPE *obj_table,
> > +			 unsigned int n, unsigned int *free_space)
> > +{
> > +	return __RTE_RING_CONCAT(__rte_ring_do_enqueue)(r, obj_table, n,
> > +			RTE_RING_QUEUE_VARIABLE, __IS_MP, free_space);
> > +}
> > +
> > +/**
> > + * Enqueue several objects on a ring (NOT multi-producers safe).
> > + */
> > +static __rte_always_inline unsigned
> > +__RTE_RING_CONCAT(rte_ring_sp_enqueue_burst)(struct rte_ring *r,
> > +	RTE_RING_TMPLT_ELEM_TYPE *obj_table,
> > +			 unsigned int n, unsigned int *free_space)
> > +{
> > +	return __RTE_RING_CONCAT(__rte_ring_do_enqueue)(r, obj_table, n,
> > +			RTE_RING_QUEUE_VARIABLE, __IS_SP, free_space);
> > +}
> > +
> > +/**
> > + * Enqueue several objects on a ring.
> > + */
> > +static __rte_always_inline unsigned
> > +__RTE_RING_CONCAT(rte_ring_enqueue_burst)(struct rte_ring *r,
> > +	RTE_RING_TMPLT_ELEM_TYPE *obj_table, unsigned int n,
> > +	unsigned int *free_space)
> > +{
> > +	return __RTE_RING_CONCAT(__rte_ring_do_enqueue)(r, obj_table, n,
> > +			RTE_RING_QUEUE_VARIABLE, r->prod.single, free_space);
> > +}
> > +
> > +/**
> > + * Dequeue several objects from a ring (multi-consumers safe). When the request
> > + * objects are more than the available objects, only dequeue the actual number
> > + * of objects
> > + */
> > +static __rte_always_inline unsigned
> > +__RTE_RING_CONCAT(rte_ring_mc_dequeue_burst)(struct rte_ring *r,
> > +	RTE_RING_TMPLT_ELEM_TYPE *obj_table, unsigned int n,
> > +	unsigned int *available)
> > +{
> > +	return __RTE_RING_CONCAT(__rte_ring_do_dequeue)(r, obj_table, n,
> > +			RTE_RING_QUEUE_VARIABLE, __IS_MC, available);
> > +}
> > +
> > +/**
> > + * Dequeue several objects from a ring (NOT multi-consumers safe).When the
> > + * request objects are more than the available objects, only dequeue the
> > + * actual number of objects
> > + */
> > +static __rte_always_inline unsigned
> > +__RTE_RING_CONCAT(rte_ring_sc_dequeue_burst)(struct rte_ring *r,
> > +	RTE_RING_TMPLT_ELEM_TYPE *obj_table, unsigned int n,
> > +	unsigned int *available)
> > +{
> > +	return __RTE_RING_CONCAT(__rte_ring_do_dequeue)(r, obj_table, n,
> > +			RTE_RING_QUEUE_VARIABLE, __IS_SC, available);
> > +}
> > +
> > +/**
> > + * Dequeue multiple objects from a ring up to a maximum number.
> > + */
> > +static __rte_always_inline unsigned
> > +__RTE_RING_CONCAT(rte_ring_dequeue_burst)(struct rte_ring *r,
> > +	RTE_RING_TMPLT_ELEM_TYPE *obj_table, unsigned int n,
> > +	unsigned int *available)
> > +{
> > +	return __RTE_RING_CONCAT(__rte_ring_do_dequeue)(r, obj_table, n,
> > +				RTE_RING_QUEUE_VARIABLE,
> > +				r->cons.single, available);
> > +}
> > +
> > +#ifdef __cplusplus
> > +}
> > +#endif
> > +
> > +#endif /* _RTE_RING_TEMPLATE_H_ */
> > --
> > 2.17.1



* Re: [dpdk-dev] [PATCH 2/5] lib/ring: add template to support different element sizes
  2019-10-02  4:21     ` Honnappa Nagarahalli
@ 2019-10-02  8:39       ` Ananyev, Konstantin
  2019-10-03  3:33         ` Honnappa Nagarahalli
  0 siblings, 1 reply; 173+ messages in thread
From: Ananyev, Konstantin @ 2019-10-02  8:39 UTC (permalink / raw)
  To: Honnappa Nagarahalli, olivier.matz, Wang, Yipeng1, Gobriel,
	Sameh, Richardson, Bruce, De Lara Guarch, Pablo
  Cc: dev, Dharmik Thakkar, Gavin Hu (Arm Technology China),
	Ruifeng Wang (Arm Technology China),
	nd, nd



> -----Original Message-----
> From: Honnappa Nagarahalli [mailto:Honnappa.Nagarahalli@arm.com]
> Sent: Wednesday, October 2, 2019 5:22 AM
> To: Ananyev, Konstantin <konstantin.ananyev@intel.com>; olivier.matz@6wind.com; Wang, Yipeng1 <yipeng1.wang@intel.com>; Gobriel,
> Sameh <sameh.gobriel@intel.com>; Richardson, Bruce <bruce.richardson@intel.com>; De Lara Guarch, Pablo
> <pablo.de.lara.guarch@intel.com>
> Cc: dev@dpdk.org; Dharmik Thakkar <Dharmik.Thakkar@arm.com>; Gavin Hu (Arm Technology China) <Gavin.Hu@arm.com>; Ruifeng
> Wang (Arm Technology China) <Ruifeng.Wang@arm.com>; Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>; nd
> <nd@arm.com>; nd <nd@arm.com>
> Subject: RE: [dpdk-dev] [PATCH 2/5] lib/ring: add template to support different element sizes
> 
> > > Add templates to support creating ring APIs with different ring
> > > element sizes.
> > >
> > > Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
> > > Reviewed-by: Dharmik Thakkar <dharmik.thakkar@arm.com>
> > > Reviewed-by: Gavin Hu <gavin.hu@arm.com>
> > > Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
> > > ---
> > >  lib/librte_ring/Makefile            |   4 +-
> > >  lib/librte_ring/meson.build         |   4 +-
> > >  lib/librte_ring/rte_ring_template.c |  46 ++++
> > >  lib/librte_ring/rte_ring_template.h | 330 ++++++++++++++++++++++++++++
> > >  4 files changed, 382 insertions(+), 2 deletions(-)
> > >  create mode 100644 lib/librte_ring/rte_ring_template.c
> > >  create mode 100644 lib/librte_ring/rte_ring_template.h
> > >
> > > diff --git a/lib/librte_ring/Makefile b/lib/librte_ring/Makefile
> > > index 4c8410229..818898110 100644
> > > --- a/lib/librte_ring/Makefile
> > > +++ b/lib/librte_ring/Makefile
> > > @@ -19,6 +19,8 @@ SRCS-$(CONFIG_RTE_LIBRTE_RING) := rte_ring.c
> > >  # install includes
> > >  SYMLINK-$(CONFIG_RTE_LIBRTE_RING)-include := rte_ring.h \
> > >  					rte_ring_generic.h \
> > > -					rte_ring_c11_mem.h
> > > +					rte_ring_c11_mem.h \
> > > +					rte_ring_template.h \
> > > +					rte_ring_template.c
> > >
> > >  include $(RTE_SDK)/mk/rte.lib.mk
> > > diff --git a/lib/librte_ring/meson.build b/lib/librte_ring/meson.build
> > > index 74219840a..e4e208a7c 100644
> > > --- a/lib/librte_ring/meson.build
> > > +++ b/lib/librte_ring/meson.build
> > > @@ -5,7 +5,9 @@ version = 2
> > >  sources = files('rte_ring.c')
> > >  headers = files('rte_ring.h',
> > >  		'rte_ring_c11_mem.h',
> > > -		'rte_ring_generic.h')
> > > +		'rte_ring_generic.h',
> > > +		'rte_ring_template.h',
> > > +		'rte_ring_template.c')
> > >
> > >  # rte_ring_create_elem and rte_ring_get_memsize_elem are experimental
> > >  allow_experimental_apis = true
> > > diff --git a/lib/librte_ring/rte_ring_template.c b/lib/librte_ring/rte_ring_template.c
> > > new file mode 100644
> > > index 000000000..1ca593f95
> > > --- /dev/null
> > > +++ b/lib/librte_ring/rte_ring_template.c
> > > @@ -0,0 +1,46 @@
> > > +/* SPDX-License-Identifier: BSD-3-Clause
> > > + * Copyright (c) 2019 Arm Limited
> > > + */
> > > +
> > > +#include <stdio.h>
> > > +#include <stdarg.h>
> > > +#include <string.h>
> > > +#include <stdint.h>
> > > +#include <inttypes.h>
> > > +#include <errno.h>
> > > +#include <sys/queue.h>
> > > +
> > > +#include <rte_common.h>
> > > +#include <rte_log.h>
> > > +#include <rte_memory.h>
> > > +#include <rte_memzone.h>
> > > +#include <rte_malloc.h>
> > > +#include <rte_launch.h>
> > > +#include <rte_eal.h>
> > > +#include <rte_eal_memconfig.h>
> > > +#include <rte_atomic.h>
> > > +#include <rte_per_lcore.h>
> > > +#include <rte_lcore.h>
> > > +#include <rte_branch_prediction.h>
> > > +#include <rte_errno.h>
> > > +#include <rte_string_fns.h>
> > > +#include <rte_spinlock.h>
> > > +#include <rte_tailq.h>
> > > +
> > > +#include "rte_ring.h"
> > > +
> > > +/* return the size of memory occupied by a ring */
> > > +ssize_t
> > > +__RTE_RING_CONCAT(rte_ring_get_memsize)(unsigned count)
> > > +{
> > > +	return rte_ring_get_memsize_elem(count, RTE_RING_TMPLT_ELEM_SIZE);
> > > +}
> > > +
> > > +/* create the ring */
> > > +struct rte_ring *
> > > +__RTE_RING_CONCAT(rte_ring_create)(const char *name, unsigned count,
> > > +		int socket_id, unsigned flags)
> > > +{
> > > +	return rte_ring_create_elem(name, count,
> > RTE_RING_TMPLT_ELEM_SIZE,
> > > +		socket_id, flags);
> > > +}
> > > diff --git a/lib/librte_ring/rte_ring_template.h
> > > b/lib/librte_ring/rte_ring_template.h
> > > new file mode 100644
> > > index 000000000..b9b14dfbb
> > > --- /dev/null
> > > +++ b/lib/librte_ring/rte_ring_template.h
> > > @@ -0,0 +1,330 @@
> > > +/* SPDX-License-Identifier: BSD-3-Clause
> > > + * Copyright (c) 2019 Arm Limited
> > > + */
> > > +
> > > +#ifndef _RTE_RING_TEMPLATE_H_
> > > +#define _RTE_RING_TEMPLATE_H_
> > > +
> > > +#ifdef __cplusplus
> > > +extern "C" {
> > > +#endif
> > > +
> > > +#include <stdio.h>
> > > +#include <stdint.h>
> > > +#include <sys/queue.h>
> > > +#include <errno.h>
> > > +#include <rte_common.h>
> > > +#include <rte_config.h>
> > > +#include <rte_memory.h>
> > > +#include <rte_lcore.h>
> > > +#include <rte_atomic.h>
> > > +#include <rte_branch_prediction.h>
> > > +#include <rte_memzone.h>
> > > +#include <rte_pause.h>
> > > +#include <rte_ring.h>
> > > +
> > > +/* Ring API suffix name - used to append to API names */
> > > +#ifndef RTE_RING_TMPLT_API_SUFFIX
> > > +#error RTE_RING_TMPLT_API_SUFFIX not defined
> > > +#endif
> > > +
> > > +/* Ring's element size in bits, should be a power of 2 */
> > > +#ifndef RTE_RING_TMPLT_ELEM_SIZE
> > > +#error RTE_RING_TMPLT_ELEM_SIZE not defined
> > > +#endif
> > > +
> > > +/* Type of ring elements */
> > > +#ifndef RTE_RING_TMPLT_ELEM_TYPE
> > > +#error RTE_RING_TMPLT_ELEM_TYPE not defined
> > > +#endif
> > > +
> > > +#define _rte_fuse(a, b) a##_##b
> > > +#define __rte_fuse(a, b) _rte_fuse(a, b)
> > > +#define __RTE_RING_CONCAT(a) __rte_fuse(a, RTE_RING_TMPLT_API_SUFFIX)
> > > +
> > > +/* Calculate the memory size needed for a ring */
> > > +RTE_RING_TMPLT_EXPERIMENTAL ssize_t
> > > +__RTE_RING_CONCAT(rte_ring_get_memsize)(unsigned count);
> > > +
> > > +/* Create a new ring named *name* in memory. */
> > > +RTE_RING_TMPLT_EXPERIMENTAL struct rte_ring *
> > > +__RTE_RING_CONCAT(rte_ring_create)(const char *name, unsigned count,
> > > +					int socket_id, unsigned flags);
> >
> >
> > Just an idea - probably same thing can be achieved in a different way.
> > Instead of all these defines - replace ENQUEUE_PTRS/DEQUEUE_PTRS macros
> > with static inline functions and then make all internal functions, i.e.
> > __rte_ring_do_dequeue()
> > to accept enqueue/dequeue function pointer as a parameter.
> > Then let say default rte_ring_mc_dequeue_bulk will do:
> >
> > rte_ring_mc_dequeue_bulk(struct rte_ring *r, void **obj_table,
> >                 unsigned int n, unsigned int *available)
> > {
> >         return __rte_ring_do_dequeue(r, obj_table, n, RTE_RING_QUEUE_FIXED,
> >                         __IS_MC, available, dequeue_ptr_default);
> > }
> >
> > > Then if someone would like to define ring functions for elt_size==X, all he would
> > > need to do:
> > > 1. define his own enqueue/dequeue functions.
> > 2. do something like:
> > rte_ring_mc_dequeue_bulk(struct rte_ring *r, void **obj_table,
> >                 unsigned int n, unsigned int *available)
> > {
> >         return __rte_ring_do_dequeue(r, obj_table, n, RTE_RING_QUEUE_FIXED,
> >                         __IS_MC, available, dequeue_X);
> > }
> >
> > Konstantin
> Thanks for the feedback/idea. The goal of this patch was to make it simple enough to define APIs to store any element size without code
> duplication. 

Well, then if we store elt_size inside the ring, it should be easy enough
to add generic functions to the API that use memcpy (or rte_memcpy) for enqueue/dequeue.
Yes, that might be slower than the existing code (8B per elem), but it might still be acceptable.
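
To make that concrete, here is a minimal sketch of what such generic functions
could look like. This is not rte_ring code: struct toy_ring and the toy_ring_*
names are made up for illustration, it is single-threaded, and it leaves out the
head/tail synchronization that the real ring does. It only shows elt_size being
read from the ring and used for the memcpy() on enqueue/dequeue.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical toy ring (NOT struct rte_ring): single-threaded, no MP/MC sync. */
struct toy_ring {
	uint32_t size;     /* number of slots, must be a power of 2 */
	uint32_t mask;     /* size - 1, used to wrap indices */
	uint32_t elt_size; /* bytes per element, stored at init time */
	uint32_t head;     /* free-running producer index */
	uint32_t tail;     /* free-running consumer index */
	uint8_t *slots;    /* caller-provided storage, size * elt_size bytes */
};

static void
toy_ring_init(struct toy_ring *r, void *mem, uint32_t size, uint32_t elt_size)
{
	assert(size != 0 && (size & (size - 1)) == 0);
	assert(elt_size != 0);
	r->size = size;
	r->mask = size - 1;
	r->elt_size = elt_size;
	r->head = 0;
	r->tail = 0;
	r->slots = mem;
}

/* Enqueue up to n elements; returns how many were actually stored. */
static unsigned int
toy_ring_enqueue(struct toy_ring *r, const void *objs, unsigned int n)
{
	uint32_t free_entries = r->size - (r->head - r->tail);
	const uint8_t *src = objs;
	unsigned int i;

	if (n > free_entries)
		n = free_entries;
	/* elt_size is loaded from the ring, so the copy size is a runtime value */
	for (i = 0; i < n; i++)
		memcpy(r->slots + (size_t)((r->head + i) & r->mask) * r->elt_size,
		       src + (size_t)i * r->elt_size, r->elt_size);
	r->head += n;
	return n;
}

/* Dequeue up to n elements; returns how many were actually copied out. */
static unsigned int
toy_ring_dequeue(struct toy_ring *r, void *objs, unsigned int n)
{
	uint32_t entries = r->head - r->tail;
	uint8_t *dst = objs;
	unsigned int i;

	if (n > entries)
		n = entries;
	for (i = 0; i < n; i++)
		memcpy(dst + (size_t)i * r->elt_size,
		       r->slots + (size_t)((r->tail + i) & r->mask) * r->elt_size,
		       r->elt_size);
	r->tail += n;
	return n;
}
```

The per-element memcpy() with a size read from the ring is where I would expect
the slowdown relative to the fixed 8B copy to come from.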

> With this patch, the user has to write ~4 lines of code to get APIs for any element size. I would like to keep the goal
> the same.
> 
> If we have to avoid the macro-fest, the main problem that needs to be addressed is - how to represent different sizes of element types in a
> generic way? IMO, we can do this by defining the element type to be a multiple of uint32_t (I do not think we need to go to uint16_t).
> 
> For ex:
> rte_ring_mp_enqueue_bulk_objs(struct rte_ring *r,
>                 uint32_t *obj_table, unsigned int num_objs,
>                 unsigned int n,
>                 enum rte_ring_queue_behavior behavior, unsigned int is_sp,
>                 unsigned int *free_space)
> {
> }
> 
> This approach would ensure that we have generic enough APIs and they can be used for elements of any size. But the element itself needs
> to be a multiple of 32b - I think this should not be a concern.
> 
> The API suffix definitely needs to be better, any suggestions?

> 
> >
> >
> > > +
> > > +/**
> > > + * @internal Enqueue several objects on the ring
> > > + */
> > > +static __rte_always_inline unsigned int
> > > +__RTE_RING_CONCAT(__rte_ring_do_enqueue)(struct rte_ring *r,
> > > +		RTE_RING_TMPLT_ELEM_TYPE const *obj_table, unsigned int
> > n,
> > > +		enum rte_ring_queue_behavior behavior, unsigned int is_sp,
> > > +		unsigned int *free_space)
> > > +{
> > > +	uint32_t prod_head, prod_next;
> > > +	uint32_t free_entries;
> > > +
> > > +	n = __rte_ring_move_prod_head(r, is_sp, n, behavior,
> > > +			&prod_head, &prod_next, &free_entries);
> > > +	if (n == 0)
> > > +		goto end;
> > > +
> > > +	ENQUEUE_PTRS(r, &r[1], prod_head, obj_table, n,
> > > +		RTE_RING_TMPLT_ELEM_TYPE);
> > > +
> > > +	update_tail(&r->prod, prod_head, prod_next, is_sp, 1);
> > > +end:
> > > +	if (free_space != NULL)
> > > +		*free_space = free_entries - n;
> > > +	return n;
> > > +}
> > > +
> > > +/**
> > > + * @internal Dequeue several objects from the ring
> > > + */
> > > +static __rte_always_inline unsigned int
> > > +__RTE_RING_CONCAT(__rte_ring_do_dequeue)(struct rte_ring *r,
> > > +	RTE_RING_TMPLT_ELEM_TYPE *obj_table, unsigned int n,
> > > +	enum rte_ring_queue_behavior behavior, unsigned int is_sc,
> > > +	unsigned int *available)
> > > +{
> > > +	uint32_t cons_head, cons_next;
> > > +	uint32_t entries;
> > > +
> > > +	n = __rte_ring_move_cons_head(r, (int)is_sc, n, behavior,
> > > +			&cons_head, &cons_next, &entries);
> > > +	if (n == 0)
> > > +		goto end;
> > > +
> > > +	DEQUEUE_PTRS(r, &r[1], cons_head, obj_table, n,
> > > +		RTE_RING_TMPLT_ELEM_TYPE);
> > > +
> > > +	update_tail(&r->cons, cons_head, cons_next, is_sc, 0);
> > > +
> > > +end:
> > > +	if (available != NULL)
> > > +		*available = entries - n;
> > > +	return n;
> > > +}
> > > +
> > > +
> > > +/**
> > > + * Enqueue several objects on the ring (multi-producers safe).
> > > + */
> > > +static __rte_always_inline unsigned int
> > > +__RTE_RING_CONCAT(rte_ring_mp_enqueue_bulk)(struct rte_ring *r,
> > > +	RTE_RING_TMPLT_ELEM_TYPE const *obj_table, unsigned int n,
> > > +	unsigned int *free_space)
> > > +{
> > > +	return __RTE_RING_CONCAT(__rte_ring_do_enqueue)(r, obj_table, n,
> > > +			RTE_RING_QUEUE_FIXED, __IS_MP, free_space);
> > > +}
> > > +
> > > +/**
> > > + * Enqueue several objects on a ring (NOT multi-producers safe).
> > > + */
> > > +static __rte_always_inline unsigned int
> > > +__RTE_RING_CONCAT(rte_ring_sp_enqueue_bulk)(struct rte_ring *r,
> > > +	RTE_RING_TMPLT_ELEM_TYPE const *obj_table, unsigned int n,
> > > +	unsigned int *free_space)
> > > +{
> > > +	return __RTE_RING_CONCAT(__rte_ring_do_enqueue)(r, obj_table, n,
> > > +			RTE_RING_QUEUE_FIXED, __IS_SP, free_space);
> > > +}
> > > +
> > > +/**
> > > + * Enqueue several objects on a ring.
> > > + */
> > > +static __rte_always_inline unsigned int
> > > +__RTE_RING_CONCAT(rte_ring_enqueue_bulk)(struct rte_ring *r,
> > > +	RTE_RING_TMPLT_ELEM_TYPE const *obj_table, unsigned int n,
> > > +	unsigned int *free_space)
> > > +{
> > > +	return __RTE_RING_CONCAT(__rte_ring_do_enqueue)(r, obj_table, n,
> > > +			RTE_RING_QUEUE_FIXED, r->prod.single, free_space);
> > > +}
> > > +
> > > +/**
> > > + * Enqueue one object on a ring (multi-producers safe).
> > > + */
> > > +static __rte_always_inline int
> > > +__RTE_RING_CONCAT(rte_ring_mp_enqueue)(struct rte_ring *r,
> > > +	RTE_RING_TMPLT_ELEM_TYPE obj)
> > > +{
> > > +	return __RTE_RING_CONCAT(rte_ring_mp_enqueue_bulk)(r, &obj, 1,
> > NULL) ?
> > > +			0 : -ENOBUFS;
> > > +}
> > > +
> > > +/**
> > > + * Enqueue one object on a ring (NOT multi-producers safe).
> > > + */
> > > +static __rte_always_inline int
> > > +__RTE_RING_CONCAT(rte_ring_sp_enqueue)(struct rte_ring *r,
> > > +	RTE_RING_TMPLT_ELEM_TYPE obj)
> > > +{
> > > +	return __RTE_RING_CONCAT(rte_ring_sp_enqueue_bulk)(r, &obj, 1,
> > NULL) ?
> > > +			0 : -ENOBUFS;
> > > +}
> > > +
> > > +/**
> > > + * Enqueue one object on a ring.
> > > + */
> > > +static __rte_always_inline int
> > > +__RTE_RING_CONCAT(rte_ring_enqueue)(struct rte_ring *r,
> > > +	RTE_RING_TMPLT_ELEM_TYPE *obj)
> > > +{
> > > +	return __RTE_RING_CONCAT(rte_ring_enqueue_bulk)(r, obj, 1,
> > NULL) ?
> > > +			0 : -ENOBUFS;
> > > +}
> > > +
> > > +/**
> > > + * Dequeue several objects from a ring (multi-consumers safe).
> > > + */
> > > +static __rte_always_inline unsigned int
> > > +__RTE_RING_CONCAT(rte_ring_mc_dequeue_bulk)(struct rte_ring *r,
> > > +	RTE_RING_TMPLT_ELEM_TYPE *obj_table, unsigned int n,
> > > +	unsigned int *available)
> > > +{
> > > +	return __RTE_RING_CONCAT(__rte_ring_do_dequeue)(r, obj_table, n,
> > > +			RTE_RING_QUEUE_FIXED, __IS_MC, available);
> > > +}
> > > +
> > > +/**
> > > + * Dequeue several objects from a ring (NOT multi-consumers safe).
> > > + */
> > > +static __rte_always_inline unsigned int
> > > +__RTE_RING_CONCAT(rte_ring_sc_dequeue_bulk)(struct rte_ring *r,
> > > +	RTE_RING_TMPLT_ELEM_TYPE *obj_table, unsigned int n,
> > > +	unsigned int *available)
> > > +{
> > > +	return __RTE_RING_CONCAT(__rte_ring_do_dequeue)(r, obj_table, n,
> > > +			RTE_RING_QUEUE_FIXED, __IS_SC, available);
> > > +}
> > > +
> > > +/**
> > > + * Dequeue several objects from a ring.
> > > + */
> > > +static __rte_always_inline unsigned int
> > > +__RTE_RING_CONCAT(rte_ring_dequeue_bulk)(struct rte_ring *r,
> > > +	RTE_RING_TMPLT_ELEM_TYPE *obj_table, unsigned int n,
> > > +	unsigned int *available)
> > > +{
> > > +	return __RTE_RING_CONCAT(__rte_ring_do_dequeue)(r, obj_table, n,
> > > +			RTE_RING_QUEUE_FIXED, r->cons.single, available);
> > > +}
> > > +
> > > +/**
> > > + * Dequeue one object from a ring (multi-consumers safe).
> > > + */
> > > +static __rte_always_inline int
> > > +__RTE_RING_CONCAT(rte_ring_mc_dequeue)(struct rte_ring *r,
> > > +	RTE_RING_TMPLT_ELEM_TYPE *obj_p)
> > > +{
> > > +	return __RTE_RING_CONCAT(rte_ring_mc_dequeue_bulk)(r, obj_p, 1,
> > NULL) ?
> > > +			0 : -ENOENT;
> > > +}
> > > +
> > > +/**
> > > + * Dequeue one object from a ring (NOT multi-consumers safe).
> > > + */
> > > +static __rte_always_inline int
> > > +__RTE_RING_CONCAT(rte_ring_sc_dequeue)(struct rte_ring *r,
> > > +	RTE_RING_TMPLT_ELEM_TYPE *obj_p)
> > > +{
> > > +	return __RTE_RING_CONCAT(rte_ring_sc_dequeue_bulk)(r, obj_p, 1,
> > NULL) ?
> > > +			0 : -ENOENT;
> > > +}
> > > +
> > > +/**
> > > + * Dequeue one object from a ring.
> > > + */
> > > +static __rte_always_inline int
> > > +__RTE_RING_CONCAT(rte_ring_dequeue)(struct rte_ring *r,
> > > +	RTE_RING_TMPLT_ELEM_TYPE *obj_p)
> > > +{
> > > +	return __RTE_RING_CONCAT(rte_ring_dequeue_bulk)(r, obj_p, 1,
> > NULL) ?
> > > +			0 : -ENOENT;
> > > +}
> > > +
> > > +/**
> > > + * Enqueue several objects on the ring (multi-producers safe).
> > > + */
> > > +static __rte_always_inline unsigned
> > > +__RTE_RING_CONCAT(rte_ring_mp_enqueue_burst)(struct rte_ring *r,
> > > +	RTE_RING_TMPLT_ELEM_TYPE *obj_table,
> > > +			 unsigned int n, unsigned int *free_space)
> > > +{
> > > +	return __RTE_RING_CONCAT(__rte_ring_do_enqueue)(r, obj_table, n,
> > > +			RTE_RING_QUEUE_VARIABLE, __IS_MP, free_space);
> > > +}
> > > +
> > > +/**
> > > + * Enqueue several objects on a ring (NOT multi-producers safe).
> > > + */
> > > +static __rte_always_inline unsigned
> > > +__RTE_RING_CONCAT(rte_ring_sp_enqueue_burst)(struct rte_ring *r,
> > > +	RTE_RING_TMPLT_ELEM_TYPE *obj_table,
> > > +			 unsigned int n, unsigned int *free_space)
> > > +{
> > > +	return __RTE_RING_CONCAT(__rte_ring_do_enqueue)(r, obj_table, n,
> > > +			RTE_RING_QUEUE_VARIABLE, __IS_SP, free_space);
> > > +}
> > > +
> > > +/**
> > > + * Enqueue several objects on a ring.
> > > + */
> > > +static __rte_always_inline unsigned
> > > +__RTE_RING_CONCAT(rte_ring_enqueue_burst)(struct rte_ring *r,
> > > +	RTE_RING_TMPLT_ELEM_TYPE *obj_table, unsigned int n,
> > > +	unsigned int *free_space)
> > > +{
> > > +	return __RTE_RING_CONCAT(__rte_ring_do_enqueue)(r, obj_table, n,
> > > +			RTE_RING_QUEUE_VARIABLE, r->prod.single,
> > free_space);
> > > +}
> > > +
> > > +/**
> > > + * Dequeue several objects from a ring (multi-consumers safe). When the
> > request
> > > + * objects are more than the available objects, only dequeue the actual
> > number
> > > + * of objects
> > > + */
> > > +static __rte_always_inline unsigned
> > > +__RTE_RING_CONCAT(rte_ring_mc_dequeue_burst)(struct rte_ring *r,
> > > +	RTE_RING_TMPLT_ELEM_TYPE *obj_table, unsigned int n,
> > > +	unsigned int *available)
> > > +{
> > > +	return __RTE_RING_CONCAT(__rte_ring_do_dequeue)(r, obj_table, n,
> > > +			RTE_RING_QUEUE_VARIABLE, __IS_MC, available);
> > > +}
> > > +
> > > +/**
> > > + * Dequeue several objects from a ring (NOT multi-consumers safe).When
> > the
> > > + * request objects are more than the available objects, only dequeue the
> > > + * actual number of objects
> > > + */
> > > +static __rte_always_inline unsigned
> > > +__RTE_RING_CONCAT(rte_ring_sc_dequeue_burst)(struct rte_ring *r,
> > > +	RTE_RING_TMPLT_ELEM_TYPE *obj_table, unsigned int n,
> > > +	unsigned int *available)
> > > +{
> > > +	return __RTE_RING_CONCAT(__rte_ring_do_dequeue)(r, obj_table, n,
> > > +			RTE_RING_QUEUE_VARIABLE, __IS_SC, available);
> > > +}
> > > +
> > > +/**
> > > + * Dequeue multiple objects from a ring up to a maximum number.
> > > + */
> > > +static __rte_always_inline unsigned
> > > +__RTE_RING_CONCAT(rte_ring_dequeue_burst)(struct rte_ring *r,
> > > +	RTE_RING_TMPLT_ELEM_TYPE *obj_table, unsigned int n,
> > > +	unsigned int *available)
> > > +{
> > > +	return __RTE_RING_CONCAT(__rte_ring_do_dequeue)(r, obj_table, n,
> > > +				RTE_RING_QUEUE_VARIABLE,
> > > +				r->cons.single, available);
> > > +}
> > > +
> > > +#ifdef __cplusplus
> > > +}
> > > +#endif
> > > +
> > > +#endif /* _RTE_RING_TEMPLATE_H_ */
> > > --
> > > 2.17.1



* Re: [dpdk-dev] [PATCH 2/5] lib/ring: add template to support different element sizes
  2019-10-02  8:39       ` Ananyev, Konstantin
@ 2019-10-03  3:33         ` Honnappa Nagarahalli
  2019-10-03 11:51           ` Ananyev, Konstantin
  0 siblings, 1 reply; 173+ messages in thread
From: Honnappa Nagarahalli @ 2019-10-03  3:33 UTC (permalink / raw)
  To: Ananyev, Konstantin, olivier.matz, Wang, Yipeng1, Gobriel, Sameh,
	Richardson, Bruce, De Lara Guarch, Pablo
  Cc: dev, Dharmik Thakkar, Gavin Hu (Arm Technology China),
	Ruifeng Wang (Arm Technology China),
	nd, nd, nd

<snip>

> >
> > > > Add templates to support creating ring APIs with different ring
> > > > element sizes.
> > > >
> > > > Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
> > > > Reviewed-by: Dharmik Thakkar <dharmik.thakkar@arm.com>
> > > > Reviewed-by: Gavin Hu <gavin.hu@arm.com>
> > > > Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
> > > > ---
> > > >  lib/librte_ring/Makefile            |   4 +-
> > > >  lib/librte_ring/meson.build         |   4 +-
> > > >  lib/librte_ring/rte_ring_template.c |  46 ++++
> > > > lib/librte_ring/rte_ring_template.h | 330
> > > > ++++++++++++++++++++++++++++
> > > >  4 files changed, 382 insertions(+), 2 deletions(-)  create mode
> > > > 100644 lib/librte_ring/rte_ring_template.c
> > > >  create mode 100644 lib/librte_ring/rte_ring_template.h
> > > >
> > > > diff --git a/lib/librte_ring/Makefile b/lib/librte_ring/Makefile
> > > > index
> > > > 4c8410229..818898110 100644
> > > > --- a/lib/librte_ring/Makefile
> > > > +++ b/lib/librte_ring/Makefile
> > > > @@ -19,6 +19,8 @@ SRCS-$(CONFIG_RTE_LIBRTE_RING) := rte_ring.c  #
> > > > install includes  SYMLINK-$(CONFIG_RTE_LIBRTE_RING)-include :=
> > > > rte_ring.h \
> > > >  					rte_ring_generic.h \
> > > > -					rte_ring_c11_mem.h
> > > > +					rte_ring_c11_mem.h \
> > > > +					rte_ring_template.h \
> > > > +					rte_ring_template.c
> > > >
> > > >  include $(RTE_SDK)/mk/rte.lib.mk
> > > > diff --git a/lib/librte_ring/meson.build
> > > > b/lib/librte_ring/meson.build index 74219840a..e4e208a7c 100644
> > > > --- a/lib/librte_ring/meson.build
> > > > +++ b/lib/librte_ring/meson.build
> > > > @@ -5,7 +5,9 @@ version = 2
> > > >  sources = files('rte_ring.c')
> > > >  headers = files('rte_ring.h',
> > > >  		'rte_ring_c11_mem.h',
> > > > -		'rte_ring_generic.h')
> > > > +		'rte_ring_generic.h',
> > > > +		'rte_ring_template.h',
> > > > +		'rte_ring_template.c')
> > > >
> > > >  # rte_ring_create_elem and rte_ring_get_memsize_elem are
> > > > experimental allow_experimental_apis = true diff --git
> > > > a/lib/librte_ring/rte_ring_template.c
> > > > b/lib/librte_ring/rte_ring_template.c
> > > > new file mode 100644
> > > > index 000000000..1ca593f95
> > > > --- /dev/null
> > > > +++ b/lib/librte_ring/rte_ring_template.c
> > > > @@ -0,0 +1,46 @@
> > > > +/* SPDX-License-Identifier: BSD-3-Clause
> > > > + * Copyright (c) 2019 Arm Limited  */
> > > > +
> > > > +#include <stdio.h>
> > > > +#include <stdarg.h>
> > > > +#include <string.h>
> > > > +#include <stdint.h>
> > > > +#include <inttypes.h>
> > > > +#include <errno.h>
> > > > +#include <sys/queue.h>
> > > > +
> > > > +#include <rte_common.h>
> > > > +#include <rte_log.h>
> > > > +#include <rte_memory.h>
> > > > +#include <rte_memzone.h>
> > > > +#include <rte_malloc.h>
> > > > +#include <rte_launch.h>
> > > > +#include <rte_eal.h>
> > > > +#include <rte_eal_memconfig.h>
> > > > +#include <rte_atomic.h>
> > > > +#include <rte_per_lcore.h>
> > > > +#include <rte_lcore.h>
> > > > +#include <rte_branch_prediction.h>
> > > > +#include <rte_errno.h>
> > > > +#include <rte_string_fns.h>
> > > > +#include <rte_spinlock.h>
> > > > +#include <rte_tailq.h>
> > > > +
> > > > +#include "rte_ring.h"
> > > > +
> > > > +/* return the size of memory occupied by a ring */
> > > > +ssize_t
> > > > +__RTE_RING_CONCAT(rte_ring_get_memsize)(unsigned count)
> > > > +{
> > > > +	return rte_ring_get_memsize_elem(count, RTE_RING_TMPLT_ELEM_SIZE);
> > > > +}
> > > > +
> > > > +/* create the ring */
> > > > +struct rte_ring *
> > > > +__RTE_RING_CONCAT(rte_ring_create)(const char *name, unsigned
> count,
> > > > +		int socket_id, unsigned flags)
> > > > +{
> > > > +	return rte_ring_create_elem(name, count,
> > > RTE_RING_TMPLT_ELEM_SIZE,
> > > > +		socket_id, flags);
> > > > +}
> > > > diff --git a/lib/librte_ring/rte_ring_template.h
> > > > b/lib/librte_ring/rte_ring_template.h
> > > > new file mode 100644
> > > > index 000000000..b9b14dfbb
> > > > --- /dev/null
> > > > +++ b/lib/librte_ring/rte_ring_template.h
> > > > @@ -0,0 +1,330 @@
> > > > +/* SPDX-License-Identifier: BSD-3-Clause
> > > > + * Copyright (c) 2019 Arm Limited  */
> > > > +
> > > > +#ifndef _RTE_RING_TEMPLATE_H_
> > > > +#define _RTE_RING_TEMPLATE_H_
> > > > +
> > > > +#ifdef __cplusplus
> > > > +extern "C" {
> > > > +#endif
> > > > +
> > > > +#include <stdio.h>
> > > > +#include <stdint.h>
> > > > +#include <sys/queue.h>
> > > > +#include <errno.h>
> > > > +#include <rte_common.h>
> > > > +#include <rte_config.h>
> > > > +#include <rte_memory.h>
> > > > +#include <rte_lcore.h>
> > > > +#include <rte_atomic.h>
> > > > +#include <rte_branch_prediction.h>
> > > > +#include <rte_memzone.h>
> > > > +#include <rte_pause.h>
> > > > +#include <rte_ring.h>
> > > > +
> > > > +/* Ring API suffix name - used to append to API names */
> > > > +#ifndef RTE_RING_TMPLT_API_SUFFIX
> > > > +#error RTE_RING_TMPLT_API_SUFFIX not defined
> > > > +#endif
> > > > +
> > > > +/* Ring's element size in bits, should be a power of 2 */
> > > > +#ifndef RTE_RING_TMPLT_ELEM_SIZE
> > > > +#error RTE_RING_TMPLT_ELEM_SIZE not defined
> > > > +#endif
> > > > +
> > > > +/* Type of ring elements */
> > > > +#ifndef RTE_RING_TMPLT_ELEM_TYPE
> > > > +#error RTE_RING_TMPLT_ELEM_TYPE not defined
> > > > +#endif
> > > > +
> > > > +#define _rte_fuse(a, b) a##_##b
> > > > +#define __rte_fuse(a, b) _rte_fuse(a, b)
> > > > +#define __RTE_RING_CONCAT(a) __rte_fuse(a, RTE_RING_TMPLT_API_SUFFIX)
> > > > +
> > > > +/* Calculate the memory size needed for a ring */
> > > > +RTE_RING_TMPLT_EXPERIMENTAL ssize_t
> > > > +__RTE_RING_CONCAT(rte_ring_get_memsize)(unsigned count);
> > > > +
> > > > +/* Create a new ring named *name* in memory. */
> > > > +RTE_RING_TMPLT_EXPERIMENTAL struct rte_ring *
> > > > +__RTE_RING_CONCAT(rte_ring_create)(const char *name, unsigned
> count,
> > > > +					int socket_id, unsigned flags);
> > >
> > >
> > > Just an idea - probably same thing can be achieved in a different way.
> > > Instead of all these defines - replace ENQUEUE_PTRS/DEQUEUE_PTRS
> > > macros with static inline functions and then make all internal functions,
> i.e.
> > > __rte_ring_do_dequeue()
> > > to accept enqueue/dequeue function pointer as a parameter.
> > > Then let say default rte_ring_mc_dequeue_bulk will do:
> > >
> > > rte_ring_mc_dequeue_bulk(struct rte_ring *r, void **obj_table,
> > >                 unsigned int n, unsigned int *available) {
> > >         return __rte_ring_do_dequeue(r, obj_table, n,
> RTE_RING_QUEUE_FIXED,
> > >                         __IS_MC, available, dequeue_ptr_default); }
> > >
> > > > Then if someone would like to define ring functions for elt_size==X,
> > > > all he would need to do:
> > > > 1. define his own enqueue/dequeue functions.
> > > 2. do something like:
> > > rte_ring_mc_dequeue_bulk(struct rte_ring *r, void **obj_table,
> > >                 unsigned int n, unsigned int *available) {
> > >         return __rte_ring_do_dequeue(r, obj_table, n,
> RTE_RING_QUEUE_FIXED,
> > >                         __IS_MC, available, dequeue_X); }
> > >
> > > Konstantin
> > > Thanks for the feedback/idea. The goal of this patch was to make it
> > > simple enough to define APIs to store any element size without code
> > > duplication.
> 
> Well, then if we store elt_size inside the ring, it should be easy enough to add
> to the API generic functions that would use memcpy(or rte_memcpy) for
> enqueue/dequeue.
> Yes, it might be slower than existing (8B per elem), but might be still
> acceptable.
The element size will be a constant in most use cases. If we keep the element size as a parameter, the compiler can apply loop unrolling and auto-vectorization to the copy.
Storing the element size in the ring will result in an additional memory access.
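
A rough sketch of what I mean, outside of the ring code. The names copy_elems
and copy_elems_16B are hypothetical and exist only for this example, and the
always_inline attribute assumes GCC or Clang. The element size reaches the copy
loop as a function parameter; when a thin wrapper passes a literal, the size is
known at compile time after inlining, which gives the compiler the opportunity
to unroll or vectorize the copy. Whether it actually does so depends on the
compiler and flags.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: esize is passed in by the caller and is never
 * loaded from a ring structure.
 */
static inline __attribute__((always_inline)) void
copy_elems(void *dst, const void *src, unsigned int n, unsigned int esize)
{
	unsigned int i;

	/* debug-only sanity check; compiled out when NDEBUG is defined */
	assert(dst != NULL && src != NULL);

	for (i = 0; i < n; i++)
		memcpy((uint8_t *)dst + (size_t)i * esize,
		       (const uint8_t *)src + (size_t)i * esize, esize);
}

/*
 * Thin wrapper for 16B elements. After inlining, esize is the constant 16
 * inside the loop above, so each memcpy() has a fixed, known size.
 */
static inline void
copy_elems_16B(void *dst, const void *src, unsigned int n)
{
	copy_elems(dst, src, n, 16);
}
```

With the size stored in the ring instead, the loop would have to read it from
memory first and could not be specialized for a fixed size.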

> 
> > > With this patch, the user has to write ~4 lines of code to get APIs for
> > > any element size. I would like to keep the goal the same.
> > >
> > > If we have to avoid the macro-fest, the main problem that needs to be
> > > addressed is - how to represent different sizes of element types in a generic
> > > way? IMO, we can do this by defining the element type to be a multiple of
> > > uint32_t (I do not think we need to go to uint16_t).
> >
> > For ex:
> > rte_ring_mp_enqueue_bulk_objs(struct rte_ring *r,
> >                 uint32_t *obj_table, unsigned int num_objs,
> >                 unsigned int n,
> >                 enum rte_ring_queue_behavior behavior, unsigned int is_sp,
> >                 unsigned int *free_space) { }
> >
> > > This approach would ensure that we have generic enough APIs and they
> > > can be used for elements of any size. But the element itself needs to be a
> > > multiple of 32b - I think this should not be a concern.
> >
> > The API suffix definitely needs to be better, any suggestions?
> 
> >
> > >
> > >
> > > > +
> > > > +/**
> > > > + * @internal Enqueue several objects on the ring  */ static
> > > > +__rte_always_inline unsigned int
> > > > +__RTE_RING_CONCAT(__rte_ring_do_enqueue)(struct rte_ring *r,
> > > > +		RTE_RING_TMPLT_ELEM_TYPE const *obj_table, unsigned int
> > > n,
> > > > +		enum rte_ring_queue_behavior behavior, unsigned int is_sp,
> > > > +		unsigned int *free_space)
> > > > +{
> > > > +	uint32_t prod_head, prod_next;
> > > > +	uint32_t free_entries;
> > > > +
> > > > +	n = __rte_ring_move_prod_head(r, is_sp, n, behavior,
> > > > +			&prod_head, &prod_next, &free_entries);
> > > > +	if (n == 0)
> > > > +		goto end;
> > > > +
> > > > +	ENQUEUE_PTRS(r, &r[1], prod_head, obj_table, n,
> > > > +		RTE_RING_TMPLT_ELEM_TYPE);
> > > > +
> > > > +	update_tail(&r->prod, prod_head, prod_next, is_sp, 1);
> > > > +end:
> > > > +	if (free_space != NULL)
> > > > +		*free_space = free_entries - n;
> > > > +	return n;
> > > > +}
> > > > +
> > > > +/**
> > > > + * @internal Dequeue several objects from the ring  */ static
> > > > +__rte_always_inline unsigned int
> > > > +__RTE_RING_CONCAT(__rte_ring_do_dequeue)(struct rte_ring *r,
> > > > +	RTE_RING_TMPLT_ELEM_TYPE *obj_table, unsigned int n,
> > > > +	enum rte_ring_queue_behavior behavior, unsigned int is_sc,
> > > > +	unsigned int *available)
> > > > +{
> > > > +	uint32_t cons_head, cons_next;
> > > > +	uint32_t entries;
> > > > +
> > > > +	n = __rte_ring_move_cons_head(r, (int)is_sc, n, behavior,
> > > > +			&cons_head, &cons_next, &entries);
> > > > +	if (n == 0)
> > > > +		goto end;
> > > > +
> > > > +	DEQUEUE_PTRS(r, &r[1], cons_head, obj_table, n,
> > > > +		RTE_RING_TMPLT_ELEM_TYPE);
> > > > +
> > > > +	update_tail(&r->cons, cons_head, cons_next, is_sc, 0);
> > > > +
> > > > +end:
> > > > +	if (available != NULL)
> > > > +		*available = entries - n;
> > > > +	return n;
> > > > +}
> > > > +
> > > > +
> > > > +/**
> > > > + * Enqueue several objects on the ring (multi-producers safe).
> > > > + */
> > > > +static __rte_always_inline unsigned int
> > > > +__RTE_RING_CONCAT(rte_ring_mp_enqueue_bulk)(struct rte_ring *r,
> > > > +	RTE_RING_TMPLT_ELEM_TYPE const *obj_table, unsigned int n,
> > > > +	unsigned int *free_space)
> > > > +{
> > > > +	return __RTE_RING_CONCAT(__rte_ring_do_enqueue)(r, obj_table, n,
> > > > +			RTE_RING_QUEUE_FIXED, __IS_MP, free_space); }
> > > > +
> > > > +/**
> > > > + * Enqueue several objects on a ring (NOT multi-producers safe).
> > > > + */
> > > > +static __rte_always_inline unsigned int
> > > > +__RTE_RING_CONCAT(rte_ring_sp_enqueue_bulk)(struct rte_ring *r,
> > > > +	RTE_RING_TMPLT_ELEM_TYPE const *obj_table, unsigned int n,
> > > > +	unsigned int *free_space)
> > > > +{
> > > > +	return __RTE_RING_CONCAT(__rte_ring_do_enqueue)(r, obj_table, n,
> > > > +			RTE_RING_QUEUE_FIXED, __IS_SP, free_space); }
> > > > +
> > > > +/**
> > > > + * Enqueue several objects on a ring.
> > > > + */
> > > > +static __rte_always_inline unsigned int
> > > > +__RTE_RING_CONCAT(rte_ring_enqueue_bulk)(struct rte_ring *r,
> > > > +	RTE_RING_TMPLT_ELEM_TYPE const *obj_table, unsigned int n,
> > > > +	unsigned int *free_space)
> > > > +{
> > > > +	return __RTE_RING_CONCAT(__rte_ring_do_enqueue)(r, obj_table, n,
> > > > +			RTE_RING_QUEUE_FIXED, r->prod.single,
> free_space); }
> > > > +
> > > > +/**
> > > > + * Enqueue one object on a ring (multi-producers safe).
> > > > + */
> > > > +static __rte_always_inline int
> > > > +__RTE_RING_CONCAT(rte_ring_mp_enqueue)(struct rte_ring *r,
> > > > +	RTE_RING_TMPLT_ELEM_TYPE obj)
> > > > +{
> > > > +	return __RTE_RING_CONCAT(rte_ring_mp_enqueue_bulk)(r, &obj, 1,
> > > NULL) ?
> > > > +			0 : -ENOBUFS;
> > > > +}
> > > > +
> > > > +/**
> > > > + * Enqueue one object on a ring (NOT multi-producers safe).
> > > > + */
> > > > +static __rte_always_inline int
> > > > +__RTE_RING_CONCAT(rte_ring_sp_enqueue)(struct rte_ring *r,
> > > > +	RTE_RING_TMPLT_ELEM_TYPE obj)
> > > > +{
> > > > +	return __RTE_RING_CONCAT(rte_ring_sp_enqueue_bulk)(r, &obj, 1,
> > > NULL) ?
> > > > +			0 : -ENOBUFS;
> > > > +}
> > > > +
> > > > +/**
> > > > + * Enqueue one object on a ring.
> > > > + */
> > > > +static __rte_always_inline int
> > > > +__RTE_RING_CONCAT(rte_ring_enqueue)(struct rte_ring *r,
> > > > +	RTE_RING_TMPLT_ELEM_TYPE *obj)
> > > > +{
> > > > +	return __RTE_RING_CONCAT(rte_ring_enqueue_bulk)(r, obj, 1,
> > > NULL) ?
> > > > +			0 : -ENOBUFS;
> > > > +}
> > > > +
> > > > +/**
> > > > + * Dequeue several objects from a ring (multi-consumers safe).
> > > > + */
> > > > +static __rte_always_inline unsigned int
> > > > +__RTE_RING_CONCAT(rte_ring_mc_dequeue_bulk)(struct rte_ring *r,
> > > > +	RTE_RING_TMPLT_ELEM_TYPE *obj_table, unsigned int n,
> > > > +	unsigned int *available)
> > > > +{
> > > > +	return __RTE_RING_CONCAT(__rte_ring_do_dequeue)(r, obj_table, n,
> > > > +			RTE_RING_QUEUE_FIXED, __IS_MC, available); }
> > > > +
> > > > +/**
> > > > + * Dequeue several objects from a ring (NOT multi-consumers safe).
> > > > + */
> > > > +static __rte_always_inline unsigned int
> > > > +__RTE_RING_CONCAT(rte_ring_sc_dequeue_bulk)(struct rte_ring *r,
> > > > +	RTE_RING_TMPLT_ELEM_TYPE *obj_table, unsigned int n,
> > > > +	unsigned int *available)
> > > > +{
> > > > +	return __RTE_RING_CONCAT(__rte_ring_do_dequeue)(r, obj_table, n,
> > > > +			RTE_RING_QUEUE_FIXED, __IS_SC, available); }
> > > > +
> > > > +/**
> > > > + * Dequeue several objects from a ring.
> > > > + */
> > > > +static __rte_always_inline unsigned int
> > > > +__RTE_RING_CONCAT(rte_ring_dequeue_bulk)(struct rte_ring *r,
> > > > +	RTE_RING_TMPLT_ELEM_TYPE *obj_table, unsigned int n,
> > > > +	unsigned int *available)
> > > > +{
> > > > +	return __RTE_RING_CONCAT(__rte_ring_do_dequeue)(r, obj_table, n,
> > > > +			RTE_RING_QUEUE_FIXED, r->cons.single, available); }
> > > > +
> > > > +/**
> > > > + * Dequeue one object from a ring (multi-consumers safe).
> > > > + */
> > > > +static __rte_always_inline int
> > > > +__RTE_RING_CONCAT(rte_ring_mc_dequeue)(struct rte_ring *r,
> > > > +	RTE_RING_TMPLT_ELEM_TYPE *obj_p) {
> > > > +	return __RTE_RING_CONCAT(rte_ring_mc_dequeue_bulk)(r, obj_p, 1,
> > > NULL) ?
> > > > +			0 : -ENOENT;
> > > > +}
> > > > +
> > > > +/**
> > > > + * Dequeue one object from a ring (NOT multi-consumers safe).
> > > > + */
> > > > +static __rte_always_inline int
> > > > +__RTE_RING_CONCAT(rte_ring_sc_dequeue)(struct rte_ring *r,
> > > > +	RTE_RING_TMPLT_ELEM_TYPE *obj_p) {
> > > > +	return __RTE_RING_CONCAT(rte_ring_sc_dequeue_bulk)(r, obj_p, 1,
> > > NULL) ?
> > > > +			0 : -ENOENT;
> > > > +}
> > > > +
> > > > +/**
> > > > + * Dequeue one object from a ring.
> > > > + */
> > > > +static __rte_always_inline int
> > > > +__RTE_RING_CONCAT(rte_ring_dequeue)(struct rte_ring *r,
> > > > +	RTE_RING_TMPLT_ELEM_TYPE *obj_p) {
> > > > +	return __RTE_RING_CONCAT(rte_ring_dequeue_bulk)(r, obj_p, 1,
> > > NULL) ?
> > > > +			0 : -ENOENT;
> > > > +}
> > > > +
> > > > +/**
> > > > + * Enqueue several objects on the ring (multi-producers safe).
> > > > + */
> > > > +static __rte_always_inline unsigned
> > > > +__RTE_RING_CONCAT(rte_ring_mp_enqueue_burst)(struct rte_ring *r,
> > > > +	RTE_RING_TMPLT_ELEM_TYPE *obj_table,
> > > > +			 unsigned int n, unsigned int *free_space)
> > > > +{
> > > > +	return __RTE_RING_CONCAT(__rte_ring_do_enqueue)(r, obj_table, n,
> > > > +			RTE_RING_QUEUE_VARIABLE, __IS_MP, free_space);
> > > > +}
> > > > +
> > > > +/**
> > > > + * Enqueue several objects on a ring (NOT multi-producers safe).
> > > > + */
> > > > +static __rte_always_inline unsigned
> > > > +__RTE_RING_CONCAT(rte_ring_sp_enqueue_burst)(struct rte_ring *r,
> > > > +	RTE_RING_TMPLT_ELEM_TYPE *obj_table,
> > > > +			 unsigned int n, unsigned int *free_space)
> > > > +{
> > > > +	return __RTE_RING_CONCAT(__rte_ring_do_enqueue)(r, obj_table, n,
> > > > +			RTE_RING_QUEUE_VARIABLE, __IS_SP, free_space);
> > > > +}
> > > > +
> > > > +/**
> > > > + * Enqueue several objects on a ring.
> > > > + */
> > > > +static __rte_always_inline unsigned
> > > > +__RTE_RING_CONCAT(rte_ring_enqueue_burst)(struct rte_ring *r,
> > > > +	RTE_RING_TMPLT_ELEM_TYPE *obj_table, unsigned int n,
> > > > +	unsigned int *free_space)
> > > > +{
> > > > +	return __RTE_RING_CONCAT(__rte_ring_do_enqueue)(r, obj_table, n,
> > > > +			RTE_RING_QUEUE_VARIABLE, r->prod.single, free_space);
> > > > +}
> > > > +
> > > > +/**
> > > > + * Dequeue several objects from a ring (multi-consumers safe). When the
> > > > + * request objects are more than the available objects, only dequeue the
> > > > + * actual number of objects
> > > > + */
> > > > +static __rte_always_inline unsigned
> > > > +__RTE_RING_CONCAT(rte_ring_mc_dequeue_burst)(struct rte_ring *r,
> > > > +	RTE_RING_TMPLT_ELEM_TYPE *obj_table, unsigned int n,
> > > > +	unsigned int *available)
> > > > +{
> > > > +	return __RTE_RING_CONCAT(__rte_ring_do_dequeue)(r, obj_table, n,
> > > > +			RTE_RING_QUEUE_VARIABLE, __IS_MC, available);
> > > > +}
> > > > +
> > > > +/**
> > > > + * Dequeue several objects from a ring (NOT multi-consumers safe). When the
> > > > + * request objects are more than the available objects, only dequeue the
> > > > + * actual number of objects
> > > > + */
> > > > +static __rte_always_inline unsigned
> > > > +__RTE_RING_CONCAT(rte_ring_sc_dequeue_burst)(struct rte_ring *r,
> > > > +	RTE_RING_TMPLT_ELEM_TYPE *obj_table, unsigned int n,
> > > > +	unsigned int *available)
> > > > +{
> > > > +	return __RTE_RING_CONCAT(__rte_ring_do_dequeue)(r, obj_table, n,
> > > > +			RTE_RING_QUEUE_VARIABLE, __IS_SC, available);
> > > > +}
> > > > +
> > > > +/**
> > > > + * Dequeue multiple objects from a ring up to a maximum number.
> > > > + */
> > > > +static __rte_always_inline unsigned
> > > > +__RTE_RING_CONCAT(rte_ring_dequeue_burst)(struct rte_ring *r,
> > > > +	RTE_RING_TMPLT_ELEM_TYPE *obj_table, unsigned int n,
> > > > +	unsigned int *available)
> > > > +{
> > > > +	return __RTE_RING_CONCAT(__rte_ring_do_dequeue)(r, obj_table, n,
> > > > +				RTE_RING_QUEUE_VARIABLE,
> > > > +				r->cons.single, available);
> > > > +}
> > > > +
> > > > +#ifdef __cplusplus
> > > > +}
> > > > +#endif
> > > > +
> > > > +#endif /* _RTE_RING_TEMPLATE_H_ */
> > > > --
> > > > 2.17.1


^ permalink raw reply	[flat|nested] 173+ messages in thread

* Re: [dpdk-dev] [PATCH 2/5] lib/ring: add template to support different element sizes
  2019-10-03  3:33         ` Honnappa Nagarahalli
@ 2019-10-03 11:51           ` Ananyev, Konstantin
  2019-10-03 12:27             ` Ananyev, Konstantin
  0 siblings, 1 reply; 173+ messages in thread
From: Ananyev, Konstantin @ 2019-10-03 11:51 UTC (permalink / raw)
  To: Honnappa Nagarahalli, olivier.matz, Wang, Yipeng1, Gobriel,
	Sameh, Richardson, Bruce, De Lara Guarch, Pablo
  Cc: dev, Dharmik Thakkar, Gavin Hu (Arm Technology China),
	Ruifeng Wang (Arm Technology China),
	nd, nd, nd



> > > > > +++ b/lib/librte_ring/rte_ring_template.h
> > > > > @@ -0,0 +1,330 @@
> > > > > +/* SPDX-License-Identifier: BSD-3-Clause
> > > > > + * Copyright (c) 2019 Arm Limited
> > > > > + */
> > > > > +
> > > > > +#ifndef _RTE_RING_TEMPLATE_H_
> > > > > +#define _RTE_RING_TEMPLATE_H_
> > > > > +
> > > > > +#ifdef __cplusplus
> > > > > +extern "C" {
> > > > > +#endif
> > > > > +
> > > > > +#include <stdio.h>
> > > > > +#include <stdint.h>
> > > > > +#include <sys/queue.h>
> > > > > +#include <errno.h>
> > > > > +#include <rte_common.h>
> > > > > +#include <rte_config.h>
> > > > > +#include <rte_memory.h>
> > > > > +#include <rte_lcore.h>
> > > > > +#include <rte_atomic.h>
> > > > > +#include <rte_branch_prediction.h>
> > > > > +#include <rte_memzone.h>
> > > > > +#include <rte_pause.h>
> > > > > +#include <rte_ring.h>
> > > > > +
> > > > > +/* Ring API suffix name - used to append to API names */
> > > > > +#ifndef RTE_RING_TMPLT_API_SUFFIX
> > > > > +#error RTE_RING_TMPLT_API_SUFFIX not defined
> > > > > +#endif
> > > > > +
> > > > > +/* Ring's element size in bits, should be a power of 2 */
> > > > > +#ifndef RTE_RING_TMPLT_ELEM_SIZE
> > > > > +#error RTE_RING_TMPLT_ELEM_SIZE not defined
> > > > > +#endif
> > > > > +
> > > > > +/* Type of ring elements */
> > > > > +#ifndef RTE_RING_TMPLT_ELEM_TYPE
> > > > > +#error RTE_RING_TMPLT_ELEM_TYPE not defined
> > > > > +#endif
> > > > > +
> > > > > +#define _rte_fuse(a, b) a##_##b
> > > > > +#define __rte_fuse(a, b) _rte_fuse(a, b)
> > > > > +#define __RTE_RING_CONCAT(a) __rte_fuse(a, RTE_RING_TMPLT_API_SUFFIX)
> > > > > +
> > > > > +/* Calculate the memory size needed for a ring */
> > > > > +RTE_RING_TMPLT_EXPERIMENTAL ssize_t
> > > > > +__RTE_RING_CONCAT(rte_ring_get_memsize)(unsigned count);
> > > > > +
> > > > > +/* Create a new ring named *name* in memory. */
> > > > > +RTE_RING_TMPLT_EXPERIMENTAL struct rte_ring *
> > > > > +__RTE_RING_CONCAT(rte_ring_create)(const char *name, unsigned count,
> > > > > +					int socket_id, unsigned flags);
> > > >
> > > >
> > > > Just an idea - probably the same thing can be achieved in a different way.
> > > > Instead of all these defines - replace ENQUEUE_PTRS/DEQUEUE_PTRS
> > > > macros with static inline functions and then make all internal functions,
> > > > i.e. __rte_ring_do_dequeue(), accept an enqueue/dequeue function pointer
> > > > as a parameter.
> > > > Then, let's say, the default rte_ring_mc_dequeue_bulk will do:
> > > >
> > > > rte_ring_mc_dequeue_bulk(struct rte_ring *r, void **obj_table,
> > > >                 unsigned int n, unsigned int *available)
> > > > {
> > > >         return __rte_ring_do_dequeue(r, obj_table, n, RTE_RING_QUEUE_FIXED,
> > > >                         __IS_MC, available, dequeue_ptr_default);
> > > > }
> > > >
> > > > Then if someone would like to define ring functions for elt_size==X,
> > > > all he would need to do is:
> > > > 1. define his own enqueue/dequeue functions.
> > > > 2. do something like:
> > > > rte_ring_mc_dequeue_bulk(struct rte_ring *r, void **obj_table,
> > > >                 unsigned int n, unsigned int *available)
> > > > {
> > > >         return __rte_ring_do_dequeue(r, obj_table, n, RTE_RING_QUEUE_FIXED,
> > > >                         __IS_MC, available, dequeue_X);
> > > > }
> > > >
> > > > Konstantin
> > > Thanks for the feedback/idea. The goal of this patch was to make it
> > > simple enough to define APIs to store any element size without code
> > > duplication.
> >
> > Well, then if we store elt_size inside the ring, it should be easy enough to add
> > to the API generic functions that would use memcpy(or rte_memcpy) for
> > enqueue/dequeue.
> > Yes, it might be slower than existing (8B per elem), but might be still
> > acceptable.
> The element size will be a constant in most use cases. If we keep the element size as a parameter, it allows the compiler to do any loop
> unrolling and auto-vectorization optimizations on copying.
> Storing the element size will result in additional memory access.

I understand that, but for your case (rcu defer queue) you probably need the highest possible performance, right?
I am sure there will be other cases where such a slight perf degradation is acceptable.
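
To make the trade-off concrete, here is a minimal sketch that is not taken from the patch. The toy_ring struct and the toy_* helpers are hypothetical and ignore head/tail synchronization entirely; they only contrast reading the element size from the ring at run time with passing it as a literal, which the compiler can treat as a constant once the helper is inlined.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define TOY_SLOTS 8u		/* hypothetical slot count, power of 2 */
#define TOY_MAX_ESIZE 16u	/* hypothetical upper bound on element size */

/* Hypothetical toy ring; this is NOT the struct rte_ring layout. */
struct toy_ring {
	uint32_t esize;	/* element size in bytes, stored at create time */
	uint8_t slots[TOY_SLOTS * TOY_MAX_ESIZE];
};

/* Copy n elements starting at slot 'head'; esize is a plain argument. */
static inline void
toy_copy_out(const struct toy_ring *r, uint32_t head, void *obj_table,
		uint32_t n, uint32_t esize)
{
	uint8_t *dst = obj_table;
	uint32_t i;

	assert(esize <= TOY_MAX_ESIZE);
	for (i = 0; i < n; i++) {
		/* wrap around the end of the slot array */
		uint32_t idx = (head + i) & (TOY_SLOTS - 1);

		memcpy(dst + (size_t)i * esize,
			r->slots + (size_t)idx * esize, esize);
	}
}

/* Variant 1: size is read from the ring (an extra load per call). */
static inline void
toy_dequeue_stored(const struct toy_ring *r, uint32_t head,
		void *obj_table, uint32_t n)
{
	toy_copy_out(r, head, obj_table, n, r->esize);
}

/* Variant 2: size is a literal, so it is a constant after inlining. */
static inline void
toy_dequeue_4B(const struct toy_ring *r, uint32_t head,
		uint32_t *obj_table, uint32_t n)
{
	toy_copy_out(r, head, obj_table, n, sizeof(uint32_t));
}
```

Both variants return the same data; whether the constant-size variant is measurably faster depends on the compiler and platform, so it would need to be benchmarked rather than assumed.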

> 
> >
> > > With this patch, the user has to write ~4 lines of code to get APIs for
> > > any element size. I would like to keep that goal.
> > >
> > > If we have to avoid the macro-fest, the main problem that needs to be
> > > addressed is - how to represent different sizes of element types in a generic
> > > way? IMO, we can do this by defining the element type to be a multiple of
> > > uint32_t (I do not think we need to go to uint16_t).
> > >
> > > For ex:
> > > rte_ring_mp_enqueue_bulk_objs(struct rte_ring *r,
> > >                 uint32_t *obj_table, unsigned int num_objs,
> > >                 unsigned int n,
> > >                 enum rte_ring_queue_behavior behavior, unsigned int is_sp,
> > >                 unsigned int *free_space) { }
> > >
> > > This approach would ensure that we have generic enough APIs and they
> > > can be used for elements of any size. But the element itself needs to be a
> > > multiple of 32b - I think this should not be a concern.
> > >
> > > The API suffix definitely needs to be better, any suggestions?
> >

^ permalink raw reply	[flat|nested] 173+ messages in thread

* Re: [dpdk-dev] [PATCH 2/5] lib/ring: add template to support different element sizes
  2019-10-03 11:51           ` Ananyev, Konstantin
@ 2019-10-03 12:27             ` Ananyev, Konstantin
  2019-10-03 22:49               ` Honnappa Nagarahalli
  0 siblings, 1 reply; 173+ messages in thread
From: Ananyev, Konstantin @ 2019-10-03 12:27 UTC (permalink / raw)
  To: Ananyev, Konstantin, Honnappa Nagarahalli, olivier.matz, Wang,
	Yipeng1, Gobriel, Sameh, Richardson, Bruce, De Lara Guarch,
	Pablo
  Cc: dev, Dharmik Thakkar, Gavin Hu (Arm Technology China),
	Ruifeng Wang (Arm Technology China),
	nd, nd, nd


> I understand that, but for your case (rcu defer queue) you probably need the highest possible performance, right?

Meant 'don't need' of course :)

> I am sure there will be other cases where such a slight perf degradation is acceptable.

^ permalink raw reply	[flat|nested] 173+ messages in thread

* Re: [dpdk-dev] [PATCH 2/5] lib/ring: add template to support different element sizes
  2019-10-03 12:27             ` Ananyev, Konstantin
@ 2019-10-03 22:49               ` Honnappa Nagarahalli
  0 siblings, 0 replies; 173+ messages in thread
From: Honnappa Nagarahalli @ 2019-10-03 22:49 UTC (permalink / raw)
  To: Ananyev, Konstantin, olivier.matz, Wang, Yipeng1, Gobriel, Sameh,
	Richardson, Bruce, De Lara Guarch, Pablo
  Cc: dev, Dharmik Thakkar, Gavin Hu (Arm Technology China),
	Ruifeng Wang (Arm Technology China),
	Honnappa Nagarahalli, nd, nd

<snip>

> > I understand that, but for your case (rcu defer queue) you probably need
> > the highest possible performance, right?
> 
> Meant 'don't need' of course :)
😊 Understood. That is just one use case. It actually started as an option to reduce memory usage in different places. You can look at the rte_hash changes in this patch. I also have plans for further changes.
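
As a rough illustration of the memory argument, here is a sketch with a hypothetical helper. It counts only the slot array and ignores the ring header and cache-line padding, and it assumes a 64-bit platform where 'void *' is 8 bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bytes needed for the slot array alone (hypothetical helper). */
static inline size_t
toy_slot_bytes(unsigned int count, unsigned int esize)
{
	assert(count != 0 && esize != 0);
	return (size_t)count * esize;
}
```

With 1024 slots this gives 8192 bytes for pointer-sized (8B) elements and 4096 bytes for 32b indices, i.e. half the slot memory when the stored values fit in 32 bits.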


^ permalink raw reply	[flat|nested] 173+ messages in thread

* Re: [dpdk-dev] [PATCH v2 0/6] lib/ring: templates to support custom element size
  2019-09-06 19:05 ` [dpdk-dev] [PATCH v2 0/6] " Honnappa Nagarahalli
                     ` (6 preceding siblings ...)
  2019-09-09 13:04   ` [dpdk-dev] [PATCH v2 0/6] lib/ring: templates to support custom element size Aaron Conole
@ 2019-10-07 13:49   ` David Marchand
  2019-10-08 19:19   ` [dpdk-dev] [PATCH v3 0/2] lib/ring: APIs " Honnappa Nagarahalli
                     ` (7 subsequent siblings)
  15 siblings, 0 replies; 173+ messages in thread
From: David Marchand @ 2019-10-07 13:49 UTC (permalink / raw)
  To: Honnappa Nagarahalli
  Cc: Olivier Matz, Wang, Yipeng1, Gobriel, Sameh, Bruce Richardson,
	Pablo de Lara, dev, pbhagavatula, Jerin Jacob Kollanukkaran

On Fri, Sep 6, 2019 at 9:05 PM Honnappa Nagarahalli
<honnappa.nagarahalli@arm.com> wrote:
>
> The current rte_ring hard-codes the type of the ring element to 'void *',
> hence the size of the element is hard-coded to 32b/64b. Since the ring
> element type is not an input to rte_ring APIs, it results in couple
> of issues:
>
> 1) If an application requires to store an element which is not 64b, it
>    needs to write its own ring APIs similar to rte_event_ring APIs. This
>    creates additional burden on the programmers, who end up making
>    work-arounds and often waste memory.
> 2) If there are multiple libraries that store elements of the same
>    type, currently they would have to write their own rte_ring APIs. This
>    results in code duplication.
>
> This patch consists of several parts:
> 1) New APIs to support configurable ring element size
>    These will help reduce code duplication in the templates. I think these
>    can be made internal (do not expose to DPDK applications, but expose to
>    DPDK libraries), feedback needed.
>
> 2) rte_ring templates
>    The templates provide an easy way to add new APIs for different ring
>    element types/sizes which can be used by multiple libraries. These
>    also allow for creating APIs to store elements of custom types
>    (for ex: a structure)
>
>    The template needs 4 parameters:
>    a) RTE_RING_TMPLT_API_SUFFIX - This is used as a suffix to the
>       rte_ring APIs.
>       For ex: if RTE_RING_TMPLT_API_SUFFIX is '32b', the API name will be
>       rte_ring_create_32b
>    b) RTE_RING_TMPLT_ELEM_SIZE - Size of the ring element in bytes.
>       For ex: sizeof(uint32_t)
>    c) RTE_RING_TMPLT_ELEM_TYPE - Type of the ring element.
>       For ex: uint32_t. If a common ring library does not use a standard
>       data type, it should create its own type by defining a structure
>       with a standard data type. For ex: for an element size of 96b, one
>       could define a structure
>
>       struct s_96b {
>           uint32_t a[3];
>       };
>       The common library can use this structure to define
>       RTE_RING_TMPLT_ELEM_TYPE.
>
>       The application using this common ring library should define its
>       element type as a union with the above structure.
>
>       union app_element_type {
>           struct s_96b v;
>           struct app_element {
>               uint16_t a;
>               uint16_t b;
>               uint32_t c;
>               uint32_t d;
>           };
>       };
>    d) RTE_RING_TMPLT_EXPERIMENTAL - Indicates if the new APIs being defined
>       are experimental. Should be set to empty to remove the experimental
>       tag.
>
>    The ring library consists of some APIs that are defined as inline
>    functions and some APIs that are non-inline functions. The non-inline
>    functions are in rte_ring_template.c. However, this file needs to be
>    included in other .c files. Any feedback on how to handle this is
>    appreciated.
>
>    Note that the templates help create the APIs that are dependent on the
>    element size (for ex: rte_ring_create, enqueue/dequeue etc). Other APIs
>    that do NOT depend on the element size do not need to be part of the
>    template (for ex: rte_ring_dump, rte_ring_count, rte_ring_free_count
>    etc).
>
> 3) APIs for 32b ring element size
>    This uses the templates to create APIs to enqueue/dequeue elements of
>    size 32b.
>
> 4) rte_hash library is changed to use 32b ring APIs
>    The 32b APIs are used in rte_hash library to store the free slot index
>    and free bucket index.
>
> 5) Event Dev changed to use ring templates
>    Event Dev defines its own 128b ring APIs using the templates. This helps
>    in keeping the 'struct rte_event' as is. If Event Dev has to use generic
>    128b ring APIs, it requires 'struct rte_event' to change to
>    'union rte_event' to include a generic data type such as '__int128_t'.
>    This breaks the API compatibility and results in large number of
>    changes.
>    With this change, the event rings are stored on rte_ring's tailq.
>    Event Dev specific ring list is NOT available. IMO, this does not have
>    any impact to the user.
>
> This patch results in following checkpatch issue:
> WARNING:UNSPECIFIED_INT: Prefer 'unsigned int' to bare use of 'unsigned'
>
> However, this patch is following the rules in the existing code. Please
> let me know if this needs to be fixed.
>
> v2
>  - Change Event Ring implementation to use ring templates
>    (Jerin, Pavan)

I expect a v3 on this series:
- Bruce/Stephen were not happy with using macros,
- Aaron caught test issues,
- from my side, if patch 3 still applies after your changes, I prefer
we drop this patch on the check script, we can live with these
warnings,


Thanks.

-- 
David Marchand


^ permalink raw reply	[flat|nested] 173+ messages in thread

* [dpdk-dev] [PATCH v3 0/2] lib/ring: APIs to support custom element size
  2019-09-06 19:05 ` [dpdk-dev] [PATCH v2 0/6] " Honnappa Nagarahalli
                     ` (7 preceding siblings ...)
  2019-10-07 13:49   ` David Marchand
@ 2019-10-08 19:19   ` Honnappa Nagarahalli
  2019-10-08 19:19     ` [dpdk-dev] [PATCH v3 1/2] lib/ring: apis to support configurable " Honnappa Nagarahalli
  2019-10-08 19:19     ` [dpdk-dev] [PATCH v3 2/2] test/ring: add test cases for configurable element size ring Honnappa Nagarahalli
  2019-10-09  2:47   ` [dpdk-dev] [PATCH v3 0/2] lib/ring: APIs to support custom element size Honnappa Nagarahalli
                     ` (6 subsequent siblings)
  15 siblings, 2 replies; 173+ messages in thread
From: Honnappa Nagarahalli @ 2019-10-08 19:19 UTC (permalink / raw)
  To: olivier.matz, sthemmin, jerinj, bruce.richardson, david.marchand,
	pbhagavatula, konstantin.ananyev, honnappa.nagarahalli
  Cc: dev, dharmik.thakkar, ruifeng.wang, gavin.hu

The current rte_ring hard-codes the type of the ring element to 'void *',
hence the size of the element is hard-coded to 32b/64b. Since the ring
element type is not an input to rte_ring APIs, it results in a couple
of issues:

1) If an application needs to store an element which is not 64b, it
   needs to write its own ring APIs similar to rte_event_ring APIs. This
   creates additional burden on the programmers, who end up making
   work-arounds and often waste memory.
2) If there are multiple libraries that store elements of the same
   type, currently they would have to write their own rte_ring APIs. This
   results in code duplication.

This patch adds new APIs to support configurable ring element size.
The APIs support custom element sizes by allowing the ring element
to be defined as a multiple of 32b.
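
A minimal sketch of what "a multiple of 32b" means for the copy path is shown below. It is illustrative only: toy_copy_elems and struct elem_96b are hypothetical names, the 96-bit size is chosen just as an example, and the real enqueue/dequeue code also has to handle ring wrap-around and head/tail updates.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 96-bit element built from three 32-bit words. */
struct elem_96b {
	uint32_t a[3];
};

/*
 * Copy n elements of esize bytes each, one 32-bit word at a time.
 * esize must be a non-zero multiple of 4 (hypothetical helper).
 */
static inline void
toy_copy_elems(uint32_t *dst, const uint32_t *src, unsigned int n,
		unsigned int esize)
{
	unsigned int words;
	unsigned int i;

	assert(esize != 0 && esize % sizeof(uint32_t) == 0);
	words = n * (esize / (unsigned int)sizeof(uint32_t));
	for (i = 0; i < words; i++)
		dst[i] = src[i];
}
```

A caller would pass its element array cast to 'uint32_t *' together with sizeof() of the element type, so the same helper serves 4B, 8B, 12B or 16B elements.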

The aim is to achieve the same performance as the existing ring
implementation. The patch adds the same performance tests that are run
for the existing APIs. This allows for performance comparison.

I also tested with memcpy. x86 shows significant improvements on bulk
and burst tests. On the Arm platform I used, there is a drop of
4% to 6% in a few tests. Maybe this is something that we can explore
later.

Note that this version skips changes to other libraries as I would
like to get an agreement on the implementation from the community.
They will be added once there is agreement on the rte_ring changes.

v3
 - Removed macro-fest and used inline functions
   (Stephen, Bruce)

v2
 - Change Event Ring implementation to use ring templates
   (Jerin, Pavan)

Honnappa Nagarahalli (2):
  lib/ring: apis to support configurable element size
  test/ring: add test cases for configurable element size ring

 app/test/Makefile                    |   1 +
 app/test/meson.build                 |   1 +
 app/test/test_ring_perf_elem.c       | 419 ++++++++++++
 lib/librte_ring/Makefile             |   3 +-
 lib/librte_ring/meson.build          |   3 +
 lib/librte_ring/rte_ring.c           |  45 +-
 lib/librte_ring/rte_ring.h           |   1 +
 lib/librte_ring/rte_ring_elem.h      | 946 +++++++++++++++++++++++++++
 lib/librte_ring/rte_ring_version.map |   2 +
 9 files changed, 1412 insertions(+), 9 deletions(-)
 create mode 100644 app/test/test_ring_perf_elem.c
 create mode 100644 lib/librte_ring/rte_ring_elem.h

-- 
2.17.1


^ permalink raw reply	[flat|nested] 173+ messages in thread

* [dpdk-dev] [PATCH v3 1/2] lib/ring: apis to support configurable element size
  2019-10-08 19:19   ` [dpdk-dev] [PATCH v3 0/2] lib/ring: APIs " Honnappa Nagarahalli
@ 2019-10-08 19:19     ` Honnappa Nagarahalli
  2019-10-08 19:19     ` [dpdk-dev] [PATCH v3 2/2] test/ring: add test cases for configurable element size ring Honnappa Nagarahalli
  1 sibling, 0 replies; 173+ messages in thread
From: Honnappa Nagarahalli @ 2019-10-08 19:19 UTC (permalink / raw)
  To: olivier.matz, sthemmin, jerinj, bruce.richardson, david.marchand,
	pbhagavatula, konstantin.ananyev, honnappa.nagarahalli
  Cc: dev, dharmik.thakkar, ruifeng.wang, gavin.hu

Current APIs assume ring elements to be pointers. However, in many
use cases, the size can be different. Add new APIs to support
configurable ring element sizes.

Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Dharmik Thakkar <dharmik.thakkar@arm.com>
Reviewed-by: Gavin Hu <gavin.hu@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
---
 lib/librte_ring/Makefile             |   3 +-
 lib/librte_ring/meson.build          |   3 +
 lib/librte_ring/rte_ring.c           |  45 +-
 lib/librte_ring/rte_ring.h           |   1 +
 lib/librte_ring/rte_ring_elem.h      | 946 +++++++++++++++++++++++++++
 lib/librte_ring/rte_ring_version.map |   2 +
 6 files changed, 991 insertions(+), 9 deletions(-)
 create mode 100644 lib/librte_ring/rte_ring_elem.h

diff --git a/lib/librte_ring/Makefile b/lib/librte_ring/Makefile
index 21a36770d..515a967bb 100644
--- a/lib/librte_ring/Makefile
+++ b/lib/librte_ring/Makefile
@@ -6,7 +6,7 @@ include $(RTE_SDK)/mk/rte.vars.mk
 # library name
 LIB = librte_ring.a
 
-CFLAGS += $(WERROR_FLAGS) -I$(SRCDIR) -O3
+CFLAGS += $(WERROR_FLAGS) -I$(SRCDIR) -O3 -DALLOW_EXPERIMENTAL_API
 LDLIBS += -lrte_eal
 
 EXPORT_MAP := rte_ring_version.map
@@ -18,6 +18,7 @@ SRCS-$(CONFIG_RTE_LIBRTE_RING) := rte_ring.c
 
 # install includes
 SYMLINK-$(CONFIG_RTE_LIBRTE_RING)-include := rte_ring.h \
+					rte_ring_elem.h \
 					rte_ring_generic.h \
 					rte_ring_c11_mem.h
 
diff --git a/lib/librte_ring/meson.build b/lib/librte_ring/meson.build
index ab8b0b469..74219840a 100644
--- a/lib/librte_ring/meson.build
+++ b/lib/librte_ring/meson.build
@@ -6,3 +6,6 @@ sources = files('rte_ring.c')
 headers = files('rte_ring.h',
 		'rte_ring_c11_mem.h',
 		'rte_ring_generic.h')
+
+# rte_ring_create_elem and rte_ring_get_memsize_elem are experimental
+allow_experimental_apis = true
diff --git a/lib/librte_ring/rte_ring.c b/lib/librte_ring/rte_ring.c
index d9b308036..6fed3648b 100644
--- a/lib/librte_ring/rte_ring.c
+++ b/lib/librte_ring/rte_ring.c
@@ -33,6 +33,7 @@
 #include <rte_tailq.h>
 
 #include "rte_ring.h"
+#include "rte_ring_elem.h"
 
 TAILQ_HEAD(rte_ring_list, rte_tailq_entry);
 
@@ -46,23 +47,42 @@ EAL_REGISTER_TAILQ(rte_ring_tailq)
 
 /* return the size of memory occupied by a ring */
 ssize_t
-rte_ring_get_memsize(unsigned count)
+rte_ring_get_memsize_elem(unsigned count, unsigned esize)
 {
 	ssize_t sz;
 
+	/* Supported esize values are 4/8/16.
+	 * Others can be added as needed.
+	 */
+	if ((esize != 4) && (esize != 8) && (esize != 16)) {
+		RTE_LOG(ERR, RING,
+			"Unsupported esize value. Supported values are 4, 8 and 16\n");
+
+		return -EINVAL;
+	}
+
 	/* count must be a power of 2 */
 	if ((!POWEROF2(count)) || (count > RTE_RING_SZ_MASK )) {
 		RTE_LOG(ERR, RING,
-			"Requested size is invalid, must be power of 2, and "
-			"do not exceed the size limit %u\n", RTE_RING_SZ_MASK);
+			"Requested number of elements is invalid, must be a "
+			"power of 2, and must not exceed the limit %u\n",
+			RTE_RING_SZ_MASK);
+
 		return -EINVAL;
 	}
 
-	sz = sizeof(struct rte_ring) + count * sizeof(void *);
+	sz = sizeof(struct rte_ring) + count * esize;
 	sz = RTE_ALIGN(sz, RTE_CACHE_LINE_SIZE);
 	return sz;
 }
 
+/* return the size of memory occupied by a ring */
+ssize_t
+rte_ring_get_memsize(unsigned count)
+{
+	return rte_ring_get_memsize_elem(count, sizeof(void *));
+}
+
 void
 rte_ring_reset(struct rte_ring *r)
 {
@@ -114,10 +134,10 @@ rte_ring_init(struct rte_ring *r, const char *name, unsigned count,
 	return 0;
 }
 
-/* create the ring */
+/* create the ring for a given element size */
 struct rte_ring *
-rte_ring_create(const char *name, unsigned count, int socket_id,
-		unsigned flags)
+rte_ring_create_elem(const char *name, unsigned count, unsigned esize,
+		int socket_id, unsigned flags)
 {
 	char mz_name[RTE_MEMZONE_NAMESIZE];
 	struct rte_ring *r;
@@ -135,7 +155,7 @@ rte_ring_create(const char *name, unsigned count, int socket_id,
 	if (flags & RING_F_EXACT_SZ)
 		count = rte_align32pow2(count + 1);
 
-	ring_size = rte_ring_get_memsize(count);
+	ring_size = rte_ring_get_memsize_elem(count, esize);
 	if (ring_size < 0) {
 		rte_errno = ring_size;
 		return NULL;
@@ -182,6 +202,15 @@ rte_ring_create(const char *name, unsigned count, int socket_id,
 	return r;
 }
 
+/* create the ring */
+struct rte_ring *
+rte_ring_create(const char *name, unsigned count, int socket_id,
+		unsigned flags)
+{
+	return rte_ring_create_elem(name, count, sizeof(void *), socket_id,
+		flags);
+}
+
 /* free the ring */
 void
 rte_ring_free(struct rte_ring *r)
diff --git a/lib/librte_ring/rte_ring.h b/lib/librte_ring/rte_ring.h
index 2a9f768a1..18fc5d845 100644
--- a/lib/librte_ring/rte_ring.h
+++ b/lib/librte_ring/rte_ring.h
@@ -216,6 +216,7 @@ int rte_ring_init(struct rte_ring *r, const char *name, unsigned count,
  */
 struct rte_ring *rte_ring_create(const char *name, unsigned count,
 				 int socket_id, unsigned flags);
+
 /**
  * De-allocate all memory used by the ring.
  *
diff --git a/lib/librte_ring/rte_ring_elem.h b/lib/librte_ring/rte_ring_elem.h
new file mode 100644
index 000000000..d395229f1
--- /dev/null
+++ b/lib/librte_ring/rte_ring_elem.h
@@ -0,0 +1,946 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ *
+ * Copyright (c) 2019 Arm Limited
+ * Copyright (c) 2010-2017 Intel Corporation
+ * Copyright (c) 2007-2009 Kip Macy kmacy@freebsd.org
+ * All rights reserved.
+ * Derived from FreeBSD's bufring.h
+ * Used as BSD-3 Licensed with permission from Kip Macy.
+ */
+
+#ifndef _RTE_RING_ELEM_H_
+#define _RTE_RING_ELEM_H_
+
+/**
+ * @file
+ * RTE Ring with flexible element size
+ */
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#include <stdio.h>
+#include <stdint.h>
+#include <sys/queue.h>
+#include <errno.h>
+#include <rte_common.h>
+#include <rte_config.h>
+#include <rte_memory.h>
+#include <rte_lcore.h>
+#include <rte_atomic.h>
+#include <rte_branch_prediction.h>
+#include <rte_memzone.h>
+#include <rte_pause.h>
+
+#include "rte_ring.h"
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * Calculate the memory size needed for a ring with given element size
+ *
+ * This function returns the number of bytes needed for a ring, given
+ * the number of elements in it and the size of the element. This value
+ * is the sum of the size of the structure rte_ring and the size of the
+ * memory needed for storing the elements. The value is aligned to a cache
+ * line size.
+ *
+ * @param count
+ *   The number of elements in the ring (must be a power of 2).
+ * @param esize
+ *   The size of ring element, in bytes. It must be a multiple of 4.
+ *   Currently, sizes 4, 8 and 16 are supported.
+ * @return
+ *   - The memory size needed for the ring on success.
+ *   - -EINVAL if count is not a power of 2 or esize is not supported.
+ */
+__rte_experimental
+ssize_t rte_ring_get_memsize_elem(unsigned count, unsigned esize);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * Create a new ring named *name* that stores elements with given size.
+ *
+ * This function uses ``memzone_reserve()`` to allocate memory. Then it
+ * calls rte_ring_init() to initialize an empty ring.
+ *
+ * The new ring size is set to *count*, which must be a power of
+ * two. Water marking is disabled by default. The real usable ring size
+ * is *count-1* instead of *count* to differentiate a full ring from an
+ * empty ring.
+ *
+ * The ring is added in RTE_TAILQ_RING list.
+ *
+ * @param name
+ *   The name of the ring.
+ * @param count
+ *   The number of elements in the ring (must be a power of 2).
+ * @param esize
+ *   The size of ring element, in bytes. It must be a multiple of 4.
+ *   Currently, sizes 4, 8 and 16 are supported.
+ * @param socket_id
+ *   The *socket_id* argument is the socket identifier in case of
+ *   NUMA. The value can be *SOCKET_ID_ANY* if there is no NUMA
+ *   constraint for the reserved zone.
+ * @param flags
+ *   An OR of the following:
+ *    - RING_F_SP_ENQ: If this flag is set, the default behavior when
+ *      using ``rte_ring_enqueue()`` or ``rte_ring_enqueue_bulk()``
+ *      is "single-producer". Otherwise, it is "multi-producers".
+ *    - RING_F_SC_DEQ: If this flag is set, the default behavior when
+ *      using ``rte_ring_dequeue()`` or ``rte_ring_dequeue_bulk()``
+ *      is "single-consumer". Otherwise, it is "multi-consumers".
+ * @return
+ *   On success, the pointer to the new allocated ring. NULL on error with
+ *    rte_errno set appropriately. Possible errno values include:
+ *    - E_RTE_NO_CONFIG - function could not get pointer to rte_config structure
+ *    - E_RTE_SECONDARY - function was called from a secondary process instance
+ *    - EINVAL - count is not a power of 2 or esize is not supported
+ *    - ENOSPC - the maximum number of memzones has already been allocated
+ *    - EEXIST - a memzone with the same name already exists
+ *    - ENOMEM - no appropriate memory area found in which to create memzone
+ */
+__rte_experimental
+struct rte_ring *rte_ring_create_elem(const char *name, unsigned count,
+				unsigned esize, int socket_id, unsigned flags);
+
+/* the actual enqueue of elements on the ring.
+ * Placed here since identical code needed in both
+ * single and multi producer enqueue functions.
+ */
+#define ENQUEUE_PTRS_ELEM(r, ring_start, prod_head, obj_table, esize, n) do { \
+	if (esize == 4) \
+		ENQUEUE_PTRS_32(r, ring_start, prod_head, obj_table, n); \
+	else if (esize == 8) \
+		ENQUEUE_PTRS_64(r, ring_start, prod_head, obj_table, n); \
+	else if (esize == 16) \
+		ENQUEUE_PTRS_128(r, ring_start, prod_head, obj_table, n); \
+} while (0)
+
+#define ENQUEUE_PTRS_32(r, ring_start, prod_head, obj_table, n) do { \
+	unsigned int i; \
+	const uint32_t size = (r)->size; \
+	uint32_t idx = prod_head & (r)->mask; \
+	uint32_t *ring = (uint32_t *)ring_start; \
+	uint32_t *obj = (uint32_t *)obj_table; \
+	if (likely(idx + n < size)) { \
+		for (i = 0; i < (n & ((~(unsigned)0x7))); i += 8, idx += 8) { \
+			ring[idx] = obj[i]; \
+			ring[idx + 1] = obj[i + 1]; \
+			ring[idx + 2] = obj[i + 2]; \
+			ring[idx + 3] = obj[i + 3]; \
+			ring[idx + 4] = obj[i + 4]; \
+			ring[idx + 5] = obj[i + 5]; \
+			ring[idx + 6] = obj[i + 6]; \
+			ring[idx + 7] = obj[i + 7]; \
+		} \
+		switch (n & 0x7) { \
+		case 7: \
+			ring[idx++] = obj[i++]; /* fallthrough */ \
+		case 6: \
+			ring[idx++] = obj[i++]; /* fallthrough */ \
+		case 5: \
+			ring[idx++] = obj[i++]; /* fallthrough */ \
+		case 4: \
+			ring[idx++] = obj[i++]; /* fallthrough */ \
+		case 3: \
+			ring[idx++] = obj[i++]; /* fallthrough */ \
+		case 2: \
+			ring[idx++] = obj[i++]; /* fallthrough */ \
+		case 1: \
+			ring[idx++] = obj[i++]; /* fallthrough */ \
+		} \
+	} else { \
+		for (i = 0; idx < size; i++, idx++)\
+			ring[idx] = obj[i]; \
+		for (idx = 0; i < n; i++, idx++) \
+			ring[idx] = obj[i]; \
+	} \
+} while (0)
+
+#define ENQUEUE_PTRS_64(r, ring_start, prod_head, obj_table, n) do { \
+	unsigned int i; \
+	const uint32_t size = (r)->size; \
+	uint32_t idx = prod_head & (r)->mask; \
+	uint64_t *ring = (uint64_t *)ring_start; \
+	uint64_t *obj = (uint64_t *)obj_table; \
+	if (likely(idx + n < size)) { \
+		for (i = 0; i < (n & ((~(unsigned)0x3))); i += 4, idx += 4) { \
+			ring[idx] = obj[i]; \
+			ring[idx + 1] = obj[i + 1]; \
+			ring[idx + 2] = obj[i + 2]; \
+			ring[idx + 3] = obj[i + 3]; \
+		} \
+		switch (n & 0x3) { \
+		case 3: \
+			ring[idx++] = obj[i++]; /* fallthrough */ \
+		case 2: \
+			ring[idx++] = obj[i++]; /* fallthrough */ \
+		case 1: \
+			ring[idx++] = obj[i++]; \
+		} \
+	} else { \
+		for (i = 0; idx < size; i++, idx++)\
+			ring[idx] = obj[i]; \
+		for (idx = 0; i < n; i++, idx++) \
+			ring[idx] = obj[i]; \
+	} \
+} while (0)
+
+#define ENQUEUE_PTRS_128(r, ring_start, prod_head, obj_table, n) do { \
+	unsigned int i; \
+	const uint32_t size = (r)->size; \
+	uint32_t idx = prod_head & (r)->mask; \
+	__uint128_t *ring = (__uint128_t *)ring_start; \
+	__uint128_t *obj = (__uint128_t *)obj_table; \
+	if (likely(idx + n < size)) { \
+		for (i = 0; i < (n & (~(unsigned)0x1)); i += 2, idx += 2) { \
+			ring[idx] = obj[i]; \
+			ring[idx + 1] = obj[i + 1]; \
+		} \
+		switch (n & 0x1) { \
+		case 1: \
+			ring[idx++] = obj[i++]; \
+		} \
+	} else { \
+		for (i = 0; idx < size; i++, idx++)\
+			ring[idx] = obj[i]; \
+		for (idx = 0; i < n; i++, idx++) \
+			ring[idx] = obj[i]; \
+	} \
+} while (0)
+
+/* the actual copy of elements on the ring to obj_table.
+ * Placed here since identical code needed in both
+ * single and multi consumer dequeue functions.
+ */
+#define DEQUEUE_PTRS_ELEM(r, ring_start, cons_head, obj_table, esize, n) do { \
+	if (esize == 4) \
+		DEQUEUE_PTRS_32(r, ring_start, cons_head, obj_table, n); \
+	else if (esize == 8) \
+		DEQUEUE_PTRS_64(r, ring_start, cons_head, obj_table, n); \
+	else if (esize == 16) \
+		DEQUEUE_PTRS_128(r, ring_start, cons_head, obj_table, n); \
+} while (0)
+
+#define DEQUEUE_PTRS_32(r, ring_start, cons_head, obj_table, n) do { \
+	unsigned int i; \
+	uint32_t idx = cons_head & (r)->mask; \
+	const uint32_t size = (r)->size; \
+	uint32_t *ring = (uint32_t *)ring_start; \
+	uint32_t *obj = (uint32_t *)obj_table; \
+	if (likely(idx + n < size)) { \
+		for (i = 0; i < (n & (~(unsigned)0x7)); i += 8, idx += 8) {\
+			obj[i] = ring[idx]; \
+			obj[i + 1] = ring[idx + 1]; \
+			obj[i + 2] = ring[idx + 2]; \
+			obj[i + 3] = ring[idx + 3]; \
+			obj[i + 4] = ring[idx + 4]; \
+			obj[i + 5] = ring[idx + 5]; \
+			obj[i + 6] = ring[idx + 6]; \
+			obj[i + 7] = ring[idx + 7]; \
+		} \
+		switch (n & 0x7) { \
+		case 7: \
+			obj[i++] = ring[idx++]; /* fallthrough */ \
+		case 6: \
+			obj[i++] = ring[idx++]; /* fallthrough */ \
+		case 5: \
+			obj[i++] = ring[idx++]; /* fallthrough */ \
+		case 4: \
+			obj[i++] = ring[idx++]; /* fallthrough */ \
+		case 3: \
+			obj[i++] = ring[idx++]; /* fallthrough */ \
+		case 2: \
+			obj[i++] = ring[idx++]; /* fallthrough */ \
+		case 1: \
+			obj[i++] = ring[idx++]; /* fallthrough */ \
+		} \
+	} else { \
+		for (i = 0; idx < size; i++, idx++) \
+			obj[i] = ring[idx]; \
+		for (idx = 0; i < n; i++, idx++) \
+			obj[i] = ring[idx]; \
+	} \
+} while (0)
+
+#define DEQUEUE_PTRS_64(r, ring_start, cons_head, obj_table, n) do { \
+	unsigned int i; \
+	uint32_t idx = cons_head & (r)->mask; \
+	const uint32_t size = (r)->size; \
+	uint64_t *ring = (uint64_t *)ring_start; \
+	uint64_t *obj = (uint64_t *)obj_table; \
+	if (likely(idx + n < size)) { \
+		for (i = 0; i < (n & (~(unsigned)0x3)); i += 4, idx += 4) {\
+			obj[i] = ring[idx]; \
+			obj[i + 1] = ring[idx + 1]; \
+			obj[i + 2] = ring[idx + 2]; \
+			obj[i + 3] = ring[idx + 3]; \
+		} \
+		switch (n & 0x3) { \
+		case 3: \
+			obj[i++] = ring[idx++]; /* fallthrough */ \
+		case 2: \
+			obj[i++] = ring[idx++]; /* fallthrough */ \
+		case 1: \
+			obj[i++] = ring[idx++]; \
+		} \
+	} else { \
+		for (i = 0; idx < size; i++, idx++) \
+			obj[i] = ring[idx]; \
+		for (idx = 0; i < n; i++, idx++) \
+			obj[i] = ring[idx]; \
+	} \
+} while (0)
+
+#define DEQUEUE_PTRS_128(r, ring_start, cons_head, obj_table, n) do { \
+	unsigned int i; \
+	uint32_t idx = cons_head & (r)->mask; \
+	const uint32_t size = (r)->size; \
+	__uint128_t *ring = (__uint128_t *)ring_start; \
+	__uint128_t *obj = (__uint128_t *)obj_table; \
+	if (likely(idx + n < size)) { \
+		for (i = 0; i < (n & (~(unsigned)0x1)); i += 2, idx += 2) { \
+			obj[i] = ring[idx]; \
+			obj[i + 1] = ring[idx + 1]; \
+		} \
+		switch (n & 0x1) { \
+		case 1: \
+			obj[i++] = ring[idx++]; /* fallthrough */ \
+		} \
+	} else { \
+		for (i = 0; idx < size; i++, idx++) \
+			obj[i] = ring[idx]; \
+		for (idx = 0; i < n; i++, idx++) \
+			obj[i] = ring[idx]; \
+	} \
+} while (0)
+
+/* Between two loads, the CPU may reorder accesses in a weak memory model
+ * (powerpc/arm).
+ * There are 2 choices for the users
+ * 1.use rmb() memory barrier
+ * 2.use one-direction load_acquire/store_release barrier,defined by
+ * CONFIG_RTE_USE_C11_MEM_MODEL=y
+ * It depends on performance test results.
+ * By default, move common functions to rte_ring_generic.h
+ */
+#ifdef RTE_USE_C11_MEM_MODEL
+#include "rte_ring_c11_mem.h"
+#else
+#include "rte_ring_generic.h"
+#endif
+
+/**
+ * @internal Enqueue several objects on the ring
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of objects.
+ * @param esize
+ *   The size of ring element, in bytes. It must be a multiple of 4.
+ *   Currently, sizes 4, 8 and 16 are supported. This should be the same
+ *   as passed while creating the ring, otherwise the results are undefined.
+ * @param n
+ *   The number of objects to add in the ring from the obj_table.
+ * @param behavior
+ *   RTE_RING_QUEUE_FIXED:    Enqueue a fixed number of items to the ring
+ *   RTE_RING_QUEUE_VARIABLE: Enqueue as many items as possible to the ring
+ * @param is_sp
+ *   Indicates whether to use single producer or multi-producer head update
+ * @param free_space
+ *   returns the amount of space after the enqueue operation has finished
+ * @return
+ *   Actual number of objects enqueued.
+ *   If behavior == RTE_RING_QUEUE_FIXED, this will be 0 or n only.
+ */
+static __rte_always_inline unsigned int
+__rte_ring_do_enqueue_elem(struct rte_ring *r, void * const obj_table,
+		unsigned int esize, unsigned int n,
+		enum rte_ring_queue_behavior behavior, unsigned int is_sp,
+		unsigned int *free_space)
+{
+	uint32_t prod_head, prod_next;
+	uint32_t free_entries;
+
+	n = __rte_ring_move_prod_head(r, is_sp, n, behavior,
+			&prod_head, &prod_next, &free_entries);
+	if (n == 0)
+		goto end;
+
+	ENQUEUE_PTRS_ELEM(r, &r[1], prod_head, obj_table, esize, n);
+
+	update_tail(&r->prod, prod_head, prod_next, is_sp, 1);
+end:
+	if (free_space != NULL)
+		*free_space = free_entries - n;
+	return n;
+}
+
+/**
+ * @internal Dequeue several objects from the ring
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of objects that will be filled.
+ * @param esize
+ *   The size of ring element, in bytes. It must be a multiple of 4.
+ *   Currently, sizes 4, 8 and 16 are supported. This should be the same
+ *   as passed while creating the ring, otherwise the results are undefined.
+ * @param n
+ *   The number of objects to pull from the ring.
+ * @param behavior
+ *   RTE_RING_QUEUE_FIXED:    Dequeue a fixed number of items from a ring
+ *   RTE_RING_QUEUE_VARIABLE: Dequeue as many items as possible from ring
+ * @param is_sc
+ *   Indicates whether to use single consumer or multi-consumer head update
+ * @param available
+ *   returns the number of remaining ring entries after the dequeue has finished
+ * @return
+ *   - Actual number of objects dequeued.
+ *     If behavior == RTE_RING_QUEUE_FIXED, this will be 0 or n only.
+ */
+static __rte_always_inline unsigned int
+__rte_ring_do_dequeue_elem(struct rte_ring *r, void *obj_table,
+		unsigned int esize, unsigned int n,
+		enum rte_ring_queue_behavior behavior, unsigned int is_sc,
+		unsigned int *available)
+{
+	uint32_t cons_head, cons_next;
+	uint32_t entries;
+
+	n = __rte_ring_move_cons_head(r, (int)is_sc, n, behavior,
+			&cons_head, &cons_next, &entries);
+	if (n == 0)
+		goto end;
+
+	DEQUEUE_PTRS_ELEM(r, &r[1], cons_head, obj_table, esize, n);
+
+	update_tail(&r->cons, cons_head, cons_next, is_sc, 0);
+
+end:
+	if (available != NULL)
+		*available = entries - n;
+	return n;
+}
+
+/**
+ * Enqueue several objects on the ring (multi-producers safe).
+ *
+ * This function uses a "compare and set" instruction to move the
+ * producer index atomically.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of objects.
+ * @param esize
+ *   The size of ring element, in bytes. It must be a multiple of 4.
+ *   Currently, sizes 4, 8 and 16 are supported. This should be the same
+ *   as passed while creating the ring, otherwise the results are undefined.
+ * @param n
+ *   The number of objects to add in the ring from the obj_table.
+ * @param free_space
+ *   if non-NULL, returns the amount of space in the ring after the
+ *   enqueue operation has finished.
+ * @return
+ *   The number of objects enqueued, either 0 or n
+ */
+static __rte_always_inline unsigned int
+rte_ring_mp_enqueue_bulk_elem(struct rte_ring *r, void * const obj_table,
+		unsigned int esize, unsigned int n, unsigned int *free_space)
+{
+	return __rte_ring_do_enqueue_elem(r, obj_table, esize, n,
+			RTE_RING_QUEUE_FIXED, __IS_MP, free_space);
+}
+
+/**
+ * Enqueue several objects on a ring (NOT multi-producers safe).
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of objects.
+ * @param esize
+ *   The size of ring element, in bytes. It must be a multiple of 4.
+ *   Currently, sizes 4, 8 and 16 are supported. This should be the same
+ *   as passed while creating the ring, otherwise the results are undefined.
+ * @param n
+ *   The number of objects to add in the ring from the obj_table.
+ * @param free_space
+ *   if non-NULL, returns the amount of space in the ring after the
+ *   enqueue operation has finished.
+ * @return
+ *   The number of objects enqueued, either 0 or n
+ */
+static __rte_always_inline unsigned int
+rte_ring_sp_enqueue_bulk_elem(struct rte_ring *r, void * const obj_table,
+		unsigned int esize, unsigned int n, unsigned int *free_space)
+{
+	return __rte_ring_do_enqueue_elem(r, obj_table, esize, n,
+			RTE_RING_QUEUE_FIXED, __IS_SP, free_space);
+}
+
+/**
+ * Enqueue several objects on a ring.
+ *
+ * This function calls the multi-producer or the single-producer
+ * version depending on the default behavior that was specified at
+ * ring creation time (see flags).
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of objects.
+ * @param esize
+ *   The size of ring element, in bytes. It must be a multiple of 4.
+ *   Currently, sizes 4, 8 and 16 are supported. This should be the same
+ *   as passed while creating the ring, otherwise the results are undefined.
+ * @param n
+ *   The number of objects to add in the ring from the obj_table.
+ * @param free_space
+ *   if non-NULL, returns the amount of space in the ring after the
+ *   enqueue operation has finished.
+ * @return
+ *   The number of objects enqueued, either 0 or n
+ */
+static __rte_always_inline unsigned int
+rte_ring_enqueue_bulk_elem(struct rte_ring *r, void * const obj_table,
+		unsigned int esize, unsigned int n, unsigned int *free_space)
+{
+	return __rte_ring_do_enqueue_elem(r, obj_table, esize, n,
+			RTE_RING_QUEUE_FIXED, r->prod.single, free_space);
+}
+
+/**
+ * Enqueue one object on a ring (multi-producers safe).
+ *
+ * This function uses a "compare and set" instruction to move the
+ * producer index atomically.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj
+ *   A pointer to the object to be added.
+ * @param esize
+ *   The size of ring element, in bytes. It must be a multiple of 4.
+ *   Currently, sizes 4, 8 and 16 are supported. This should be the same
+ *   as passed while creating the ring, otherwise the results are undefined.
+ * @return
+ *   - 0: Success; objects enqueued.
+ *   - -ENOBUFS: Not enough room in the ring to enqueue; no object is enqueued.
+ */
+static __rte_always_inline int
+rte_ring_mp_enqueue_elem(struct rte_ring *r, void *obj, unsigned int esize)
+{
+	return rte_ring_mp_enqueue_bulk_elem(r, obj, esize, 1, NULL) ? 0 :
+								-ENOBUFS;
+}
+
+/**
+ * Enqueue one object on a ring (NOT multi-producers safe).
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj
+ *   A pointer to the object to be added.
+ * @param esize
+ *   The size of ring element, in bytes. It must be a multiple of 4.
+ *   Currently, sizes 4, 8 and 16 are supported. This should be the same
+ *   as passed while creating the ring, otherwise the results are undefined.
+ * @return
+ *   - 0: Success; objects enqueued.
+ *   - -ENOBUFS: Not enough room in the ring to enqueue; no object is enqueued.
+ */
+static __rte_always_inline int
+rte_ring_sp_enqueue_elem(struct rte_ring *r, void *obj, unsigned int esize)
+{
+	return rte_ring_sp_enqueue_bulk_elem(r, obj, esize, 1, NULL) ? 0 :
+								-ENOBUFS;
+}
+
+/**
+ * Enqueue one object on a ring.
+ *
+ * This function calls the multi-producer or the single-producer
+ * version, depending on the default behaviour that was specified at
+ * ring creation time (see flags).
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj
+ *   A pointer to the object to be added.
+ * @param esize
+ *   The size of ring element, in bytes. It must be a multiple of 4.
+ *   Currently, sizes 4, 8 and 16 are supported. This should be the same
+ *   as passed while creating the ring, otherwise the results are undefined.
+ * @return
+ *   - 0: Success; objects enqueued.
+ *   - -ENOBUFS: Not enough room in the ring to enqueue; no object is enqueued.
+ */
+static __rte_always_inline int
+rte_ring_enqueue_elem(struct rte_ring *r, void *obj, unsigned int esize)
+{
+	return rte_ring_enqueue_bulk_elem(r, obj, esize, 1, NULL) ? 0 :
+								-ENOBUFS;
+}
+
+/**
+ * Dequeue several objects from a ring (multi-consumers safe).
+ *
+ * This function uses a "compare and set" instruction to move the
+ * consumer index atomically.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of objects that will be filled.
+ * @param esize
+ *   The size of ring element, in bytes. It must be a multiple of 4.
+ *   Currently, sizes 4, 8 and 16 are supported. This should be the same
+ *   as passed while creating the ring, otherwise the results are undefined.
+ * @param n
+ *   The number of objects to dequeue from the ring to the obj_table.
+ * @param available
+ *   If non-NULL, returns the number of remaining ring entries after the
+ *   dequeue has finished.
+ * @return
+ *   The number of objects dequeued, either 0 or n
+ */
+static __rte_always_inline unsigned int
+rte_ring_mc_dequeue_bulk_elem(struct rte_ring *r, void *obj_table,
+		unsigned int esize, unsigned int n, unsigned int *available)
+{
+	return __rte_ring_do_dequeue_elem(r, obj_table, esize, n,
+				RTE_RING_QUEUE_FIXED, __IS_MC, available);
+}
+
+/**
+ * Dequeue several objects from a ring (NOT multi-consumers safe).
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of objects that will be filled.
+ * @param esize
+ *   The size of ring element, in bytes. It must be a multiple of 4.
+ *   Currently, sizes 4, 8 and 16 are supported. This should be the same
+ *   as passed while creating the ring, otherwise the results are undefined.
+ * @param n
+ *   The number of objects to dequeue from the ring to the obj_table,
+ *   must be strictly positive.
+ * @param available
+ *   If non-NULL, returns the number of remaining ring entries after the
+ *   dequeue has finished.
+ * @return
+ *   The number of objects dequeued, either 0 or n
+ */
+static __rte_always_inline unsigned int
+rte_ring_sc_dequeue_bulk_elem(struct rte_ring *r, void *obj_table,
+		unsigned int esize, unsigned int n, unsigned int *available)
+{
+	return __rte_ring_do_dequeue_elem(r, obj_table, esize, n,
+			RTE_RING_QUEUE_FIXED, __IS_SC, available);
+}
+
+/**
+ * Dequeue several objects from a ring.
+ *
+ * This function calls the multi-consumers or the single-consumer
+ * version, depending on the default behaviour that was specified at
+ * ring creation time (see flags).
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of objects that will be filled.
+ * @param esize
+ *   The size of ring element, in bytes. It must be a multiple of 4.
+ *   Currently, sizes 4, 8 and 16 are supported. This should be the same
+ *   as passed while creating the ring, otherwise the results are undefined.
+ * @param n
+ *   The number of objects to dequeue from the ring to the obj_table.
+ * @param available
+ *   If non-NULL, returns the number of remaining ring entries after the
+ *   dequeue has finished.
+ * @return
+ *   The number of objects dequeued, either 0 or n
+ */
+static __rte_always_inline unsigned int
+rte_ring_dequeue_bulk_elem(struct rte_ring *r, void *obj_table,
+		unsigned int esize, unsigned int n, unsigned int *available)
+{
+	return __rte_ring_do_dequeue_elem(r, obj_table, esize, n,
+			RTE_RING_QUEUE_FIXED, r->cons.single, available);
+}
+
+/**
+ * Dequeue one object from a ring (multi-consumers safe).
+ *
+ * This function uses a "compare and set" instruction to move the
+ * consumer index atomically.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_p
+ *   A pointer to the object that will be filled.
+ * @param esize
+ *   The size of ring element, in bytes. It must be a multiple of 4.
+ *   Currently, sizes 4, 8 and 16 are supported. This should be the same
+ *   as passed while creating the ring, otherwise the results are undefined.
+ * @return
+ *   - 0: Success; objects dequeued.
+ *   - -ENOENT: Not enough entries in the ring to dequeue; no object is
+ *     dequeued.
+ */
+static __rte_always_inline int
+rte_ring_mc_dequeue_elem(struct rte_ring *r, void *obj_p,
+				unsigned int esize)
+{
+	return rte_ring_mc_dequeue_bulk_elem(r, obj_p, esize, 1, NULL)  ? 0 :
+								-ENOENT;
+}
+
+/**
+ * Dequeue one object from a ring (NOT multi-consumers safe).
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_p
+ *   A pointer to the object that will be filled.
+ * @param esize
+ *   The size of ring element, in bytes. It must be a multiple of 4.
+ *   Currently, sizes 4, 8 and 16 are supported. This should be the same
+ *   as passed while creating the ring, otherwise the results are undefined.
+ * @return
+ *   - 0: Success; objects dequeued.
+ *   - -ENOENT: Not enough entries in the ring to dequeue, no object is
+ *     dequeued.
+ */
+static __rte_always_inline int
+rte_ring_sc_dequeue_elem(struct rte_ring *r, void *obj_p,
+				unsigned int esize)
+{
+	return rte_ring_sc_dequeue_bulk_elem(r, obj_p, esize, 1, NULL) ? 0 :
+								-ENOENT;
+}
+
+/**
+ * Dequeue one object from a ring.
+ *
+ * This function calls the multi-consumers or the single-consumer
+ * version depending on the default behaviour that was specified at
+ * ring creation time (see flags).
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_p
+ *   A pointer to the object that will be filled.
+ * @param esize
+ *   The size of ring element, in bytes. It must be a multiple of 4.
+ *   Currently, sizes 4, 8 and 16 are supported. This should be the same
+ *   as passed while creating the ring, otherwise the results are undefined.
+ * @return
+ *   - 0: Success, objects dequeued.
+ *   - -ENOENT: Not enough entries in the ring to dequeue, no object is
+ *     dequeued.
+ */
+static __rte_always_inline int
+rte_ring_dequeue_elem(struct rte_ring *r, void *obj_p, unsigned int esize)
+{
+	return rte_ring_dequeue_bulk_elem(r, obj_p, esize, 1, NULL) ? 0 :
+								-ENOENT;
+}
+
+/**
+ * Enqueue several objects on the ring (multi-producers safe).
+ *
+ * This function uses a "compare and set" instruction to move the
+ * producer index atomically.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of objects.
+ * @param esize
+ *   The size of ring element, in bytes. It must be a multiple of 4.
+ *   Currently, sizes 4, 8 and 16 are supported. This should be the same
+ *   as passed while creating the ring, otherwise the results are undefined.
+ * @param n
+ *   The number of objects to add in the ring from the obj_table.
+ * @param free_space
+ *   if non-NULL, returns the amount of space in the ring after the
+ *   enqueue operation has finished.
+ * @return
+ *   - n: Actual number of objects enqueued.
+ */
+static __rte_always_inline unsigned
+rte_ring_mp_enqueue_burst_elem(struct rte_ring *r, void * const obj_table,
+		unsigned int esize, unsigned int n, unsigned int *free_space)
+{
+	return __rte_ring_do_enqueue_elem(r, obj_table, esize, n,
+			RTE_RING_QUEUE_VARIABLE, __IS_MP, free_space);
+}
+
+/**
+ * Enqueue several objects on a ring (NOT multi-producers safe).
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of objects.
+ * @param esize
+ *   The size of ring element, in bytes. It must be a multiple of 4.
+ *   Currently, sizes 4, 8 and 16 are supported. This should be the same
+ *   as passed while creating the ring, otherwise the results are undefined.
+ * @param n
+ *   The number of objects to add in the ring from the obj_table.
+ * @param free_space
+ *   if non-NULL, returns the amount of space in the ring after the
+ *   enqueue operation has finished.
+ * @return
+ *   - n: Actual number of objects enqueued.
+ */
+static __rte_always_inline unsigned
+rte_ring_sp_enqueue_burst_elem(struct rte_ring *r, void * const obj_table,
+		unsigned int esize, unsigned int n, unsigned int *free_space)
+{
+	return __rte_ring_do_enqueue_elem(r, obj_table, esize, n,
+			RTE_RING_QUEUE_VARIABLE, __IS_SP, free_space);
+}
+
+/**
+ * Enqueue several objects on a ring.
+ *
+ * This function calls the multi-producer or the single-producer
+ * version depending on the default behavior that was specified at
+ * ring creation time (see flags).
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of void * pointers (objects).
+ * @param esize
+ *   The size of ring element, in bytes. It must be a multiple of 4.
+ *   Currently, sizes 4, 8 and 16 are supported. This should be the same
+ *   as passed while creating the ring, otherwise the results are undefined.
+ * @param n
+ *   The number of objects to add in the ring from the obj_table.
+ * @param free_space
+ *   if non-NULL, returns the amount of space in the ring after the
+ *   enqueue operation has finished.
+ * @return
+ *   - n: Actual number of objects enqueued.
+ */
+static __rte_always_inline unsigned
+rte_ring_enqueue_burst_elem(struct rte_ring *r, void * const obj_table,
+		unsigned int esize, unsigned int n, unsigned int *free_space)
+{
+	return __rte_ring_do_enqueue_elem(r, obj_table, esize, n,
+			RTE_RING_QUEUE_VARIABLE, r->prod.single, free_space);
+}
+
+/**
+ * Dequeue several objects from a ring (multi-consumers safe). When more
+ * objects are requested than are available, only the available objects are
+ * dequeued.
+ *
+ * This function uses a "compare and set" instruction to move the
+ * consumer index atomically.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of void * pointers (objects) that will be filled.
+ * @param esize
+ *   The size of ring element, in bytes. It must be a multiple of 4.
+ *   Currently, sizes 4, 8 and 16 are supported. This should be the same
+ *   as passed while creating the ring, otherwise the results are undefined.
+ * @param n
+ *   The number of objects to dequeue from the ring to the obj_table.
+ * @param available
+ *   If non-NULL, returns the number of remaining ring entries after the
+ *   dequeue has finished.
+ * @return
+ *   - n: Actual number of objects dequeued, 0 if ring is empty
+ */
+static __rte_always_inline unsigned
+rte_ring_mc_dequeue_burst_elem(struct rte_ring *r, void *obj_table,
+		unsigned int esize, unsigned int n, unsigned int *available)
+{
+	return __rte_ring_do_dequeue_elem(r, obj_table, esize, n,
+			RTE_RING_QUEUE_VARIABLE, __IS_MC, available);
+}
+
+/**
+ * Dequeue several objects from a ring (NOT multi-consumers safe). When more
+ * objects are requested than are available, only the available objects are
+ * dequeued.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of void * pointers (objects) that will be filled.
+ * @param esize
+ *   The size of ring element, in bytes. It must be a multiple of 4.
+ *   Currently, sizes 4, 8 and 16 are supported. This should be the same
+ *   as passed while creating the ring, otherwise the results are undefined.
+ * @param n
+ *   The number of objects to dequeue from the ring to the obj_table.
+ * @param available
+ *   If non-NULL, returns the number of remaining ring entries after the
+ *   dequeue has finished.
+ * @return
+ *   - n: Actual number of objects dequeued, 0 if ring is empty
+ */
+static __rte_always_inline unsigned
+rte_ring_sc_dequeue_burst_elem(struct rte_ring *r, void *obj_table,
+		unsigned int esize, unsigned int n, unsigned int *available)
+{
+	return __rte_ring_do_dequeue_elem(r, obj_table, esize, n,
+			RTE_RING_QUEUE_VARIABLE, __IS_SC, available);
+}
+
+/**
+ * Dequeue multiple objects from a ring up to a maximum number.
+ *
+ * This function calls the multi-consumers or the single-consumer
+ * version, depending on the default behaviour that was specified at
+ * ring creation time (see flags).
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of void * pointers (objects) that will be filled.
+ * @param esize
+ *   The size of ring element, in bytes. It must be a multiple of 4.
+ *   Currently, sizes 4, 8 and 16 are supported. This should be the same
+ *   as passed while creating the ring, otherwise the results are undefined.
+ * @param n
+ *   The number of objects to dequeue from the ring to the obj_table.
+ * @param available
+ *   If non-NULL, returns the number of remaining ring entries after the
+ *   dequeue has finished.
+ * @return
+ *   - Number of objects dequeued
+ */
+static __rte_always_inline unsigned
+rte_ring_dequeue_burst_elem(struct rte_ring *r, void *obj_table,
+		unsigned int esize, unsigned int n, unsigned int *available)
+{
+	return __rte_ring_do_dequeue_elem(r, obj_table, esize, n,
+				RTE_RING_QUEUE_VARIABLE,
+				r->cons.single, available);
+}
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* _RTE_RING_ELEM_H_ */
diff --git a/lib/librte_ring/rte_ring_version.map b/lib/librte_ring/rte_ring_version.map
index 510c1386e..e410a7503 100644
--- a/lib/librte_ring/rte_ring_version.map
+++ b/lib/librte_ring/rte_ring_version.map
@@ -21,6 +21,8 @@ DPDK_2.2 {
 EXPERIMENTAL {
 	global:
 
+	rte_ring_create_elem;
+	rte_ring_get_memsize_elem;
 	rte_ring_reset;
 
 };
-- 
2.17.1



* [dpdk-dev] [PATCH v3 2/2] test/ring: add test cases for configurable element size ring
  2019-10-08 19:19   ` [dpdk-dev] [PATCH v3 0/2] lib/ring: APIs " Honnappa Nagarahalli
  2019-10-08 19:19     ` [dpdk-dev] [PATCH v3 1/2] lib/ring: apis to support configurable " Honnappa Nagarahalli
@ 2019-10-08 19:19     ` Honnappa Nagarahalli
  1 sibling, 0 replies; 173+ messages in thread
From: Honnappa Nagarahalli @ 2019-10-08 19:19 UTC (permalink / raw)
  To: olivier.matz, sthemmin, jerinj, bruce.richardson, david.marchand,
	pbhagavatula, konstantin.ananyev, honnappa.nagarahalli
  Cc: dev, dharmik.thakkar, ruifeng.wang, gavin.hu

Add test cases to test APIs for configurable element size ring.

Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
---
 app/test/Makefile              |   1 +
 app/test/meson.build           |   1 +
 app/test/test_ring_perf_elem.c | 419 +++++++++++++++++++++++++++++++++
 3 files changed, 421 insertions(+)
 create mode 100644 app/test/test_ring_perf_elem.c

diff --git a/app/test/Makefile b/app/test/Makefile
index 26ba6fe2b..e5cb27b75 100644
--- a/app/test/Makefile
+++ b/app/test/Makefile
@@ -78,6 +78,7 @@ SRCS-y += test_rand_perf.c
 
 SRCS-y += test_ring.c
 SRCS-y += test_ring_perf.c
+SRCS-y += test_ring_perf_elem.c
 SRCS-y += test_pmd_perf.c
 
 ifeq ($(CONFIG_RTE_LIBRTE_TABLE),y)
diff --git a/app/test/meson.build b/app/test/meson.build
index ec40943bd..995ee9bc7 100644
--- a/app/test/meson.build
+++ b/app/test/meson.build
@@ -101,6 +101,7 @@ test_sources = files('commands.c',
 	'test_reorder.c',
 	'test_ring.c',
 	'test_ring_perf.c',
+	'test_ring_perf_elem.c',
 	'test_rwlock.c',
 	'test_sched.c',
 	'test_service_cores.c',
diff --git a/app/test/test_ring_perf_elem.c b/app/test/test_ring_perf_elem.c
new file mode 100644
index 000000000..fc5b82d71
--- /dev/null
+++ b/app/test/test_ring_perf_elem.c
@@ -0,0 +1,419 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2010-2014 Intel Corporation
+ */
+
+
+#include <stdio.h>
+#include <inttypes.h>
+#include <rte_ring.h>
+#include <rte_ring_elem.h>
+#include <rte_cycles.h>
+#include <rte_launch.h>
+#include <rte_pause.h>
+
+#include "test.h"
+
+/*
+ * Ring
+ * ====
+ *
+ * Measures performance of various operations using rdtsc
+ *  * Empty ring dequeue
+ *  * Enqueue/dequeue of bursts in 1 thread
+ *  * Enqueue/dequeue of bursts in 2 threads
+ */
+
+#define RING_NAME "RING_PERF"
+#define RING_SIZE 4096
+#define MAX_BURST 64
+
+/*
+ * the sizes to enqueue and dequeue in testing
+ * (marked volatile so they won't be seen as compile-time constants)
+ */
+static const volatile unsigned bulk_sizes[] = { 8, 32 };
+
+struct lcore_pair {
+	unsigned c1, c2;
+};
+
+static volatile unsigned lcore_count;
+
+/**** Functions to analyse our core mask to get cores for different tests ***/
+
+static int
+get_two_hyperthreads(struct lcore_pair *lcp)
+{
+	unsigned id1, id2;
+	unsigned c1, c2, s1, s2;
+	RTE_LCORE_FOREACH(id1) {
+		/* inner loop just re-reads all id's. We could skip the
+		 * first few elements, but since number of cores is small
+		 * there is little point
+		 */
+		RTE_LCORE_FOREACH(id2) {
+			if (id1 == id2)
+				continue;
+
+			c1 = rte_lcore_to_cpu_id(id1);
+			c2 = rte_lcore_to_cpu_id(id2);
+			s1 = rte_lcore_to_socket_id(id1);
+			s2 = rte_lcore_to_socket_id(id2);
+			if ((c1 == c2) && (s1 == s2)) {
+				lcp->c1 = id1;
+				lcp->c2 = id2;
+				return 0;
+			}
+		}
+	}
+	return 1;
+}
+
+static int
+get_two_cores(struct lcore_pair *lcp)
+{
+	unsigned id1, id2;
+	unsigned c1, c2, s1, s2;
+	RTE_LCORE_FOREACH(id1) {
+		RTE_LCORE_FOREACH(id2) {
+			if (id1 == id2)
+				continue;
+
+			c1 = rte_lcore_to_cpu_id(id1);
+			c2 = rte_lcore_to_cpu_id(id2);
+			s1 = rte_lcore_to_socket_id(id1);
+			s2 = rte_lcore_to_socket_id(id2);
+			if ((c1 != c2) && (s1 == s2)) {
+				lcp->c1 = id1;
+				lcp->c2 = id2;
+				return 0;
+			}
+		}
+	}
+	return 1;
+}
+
+static int
+get_two_sockets(struct lcore_pair *lcp)
+{
+	unsigned id1, id2;
+	unsigned s1, s2;
+	RTE_LCORE_FOREACH(id1) {
+		RTE_LCORE_FOREACH(id2) {
+			if (id1 == id2)
+				continue;
+			s1 = rte_lcore_to_socket_id(id1);
+			s2 = rte_lcore_to_socket_id(id2);
+			if (s1 != s2) {
+				lcp->c1 = id1;
+				lcp->c2 = id2;
+				return 0;
+			}
+		}
+	}
+	return 1;
+}
+
+/* Get cycle counts for dequeuing from an empty ring. Should be 2 or 3 cycles */
+static void
+test_empty_dequeue(struct rte_ring *r)
+{
+	const unsigned iter_shift = 26;
+	const unsigned iterations = 1<<iter_shift;
+	unsigned i = 0;
+	uint32_t burst[MAX_BURST];
+
+	const uint64_t sc_start = rte_rdtsc();
+	for (i = 0; i < iterations; i++)
+		rte_ring_sc_dequeue_bulk_elem(r, burst, 8, bulk_sizes[0], NULL);
+	const uint64_t sc_end = rte_rdtsc();
+
+	const uint64_t mc_start = rte_rdtsc();
+	for (i = 0; i < iterations; i++)
+		rte_ring_mc_dequeue_bulk_elem(r, burst, 8, bulk_sizes[0], NULL);
+	const uint64_t mc_end = rte_rdtsc();
+
+	printf("SC empty dequeue: %.2F\n",
+			(double)(sc_end-sc_start) / iterations);
+	printf("MC empty dequeue: %.2F\n",
+			(double)(mc_end-mc_start) / iterations);
+}
+
+/*
+ * for the separate enqueue and dequeue threads they take in one param
+ * and return two. Input = burst size, output = cycle average for sp/sc & mp/mc
+ */
+struct thread_params {
+	struct rte_ring *r;
+	unsigned size;        /* input value, the burst size */
+	double spsc, mpmc;    /* output value, the single or multi timings */
+};
+
+/*
+ * Function that uses rdtsc to measure timing for ring enqueue. Needs pair
+ * thread running dequeue_bulk function
+ */
+static int
+enqueue_bulk(void *p)
+{
+	const unsigned iter_shift = 23;
+	const unsigned iterations = 1<<iter_shift;
+	struct thread_params *params = p;
+	struct rte_ring *r = params->r;
+	const unsigned size = params->size;
+	unsigned i;
+	uint32_t burst[MAX_BURST] = {0};
+
+#ifdef RTE_USE_C11_MEM_MODEL
+	if (__atomic_add_fetch(&lcore_count, 1, __ATOMIC_RELAXED) != 2)
+#else
+	if (__sync_add_and_fetch(&lcore_count, 1) != 2)
+#endif
+		while (lcore_count != 2)
+			rte_pause();
+
+	const uint64_t sp_start = rte_rdtsc();
+	for (i = 0; i < iterations; i++)
+		while (rte_ring_sp_enqueue_bulk_elem(r, burst, 8, size, NULL)
+				== 0)
+			rte_pause();
+	const uint64_t sp_end = rte_rdtsc();
+
+	const uint64_t mp_start = rte_rdtsc();
+	for (i = 0; i < iterations; i++)
+		while (rte_ring_mp_enqueue_bulk_elem(r, burst, 8, size, NULL)
+				== 0)
+			rte_pause();
+	const uint64_t mp_end = rte_rdtsc();
+
+	params->spsc = ((double)(sp_end - sp_start))/(iterations*size);
+	params->mpmc = ((double)(mp_end - mp_start))/(iterations*size);
+	return 0;
+}
+
+/*
+ * Function that uses rdtsc to measure timing for ring dequeue. Needs pair
+ * thread running enqueue_bulk function
+ */
+static int
+dequeue_bulk(void *p)
+{
+	const unsigned iter_shift = 23;
+	const unsigned iterations = 1<<iter_shift;
+	struct thread_params *params = p;
+	struct rte_ring *r = params->r;
+	const unsigned size = params->size;
+	unsigned i;
+	uint32_t burst[MAX_BURST] = {0};
+
+#ifdef RTE_USE_C11_MEM_MODEL
+	if (__atomic_add_fetch(&lcore_count, 1, __ATOMIC_RELAXED) != 2)
+#else
+	if (__sync_add_and_fetch(&lcore_count, 1) != 2)
+#endif
+		while (lcore_count != 2)
+			rte_pause();
+
+	const uint64_t sc_start = rte_rdtsc();
+	for (i = 0; i < iterations; i++)
+		while (rte_ring_sc_dequeue_bulk_elem(r, burst, 8, size, NULL)
+				== 0)
+			rte_pause();
+	const uint64_t sc_end = rte_rdtsc();
+
+	const uint64_t mc_start = rte_rdtsc();
+	for (i = 0; i < iterations; i++)
+		while (rte_ring_mc_dequeue_bulk_elem(r, burst, 8, size, NULL)
+				== 0)
+			rte_pause();
+	const uint64_t mc_end = rte_rdtsc();
+
+	params->spsc = ((double)(sc_end - sc_start))/(iterations*size);
+	params->mpmc = ((double)(mc_end - mc_start))/(iterations*size);
+	return 0;
+}
+
+/*
+ * Function that calls the enqueue and dequeue bulk functions on pairs of cores.
+ * used to measure ring perf between hyperthreads, cores and sockets.
+ */
+static void
+run_on_core_pair(struct lcore_pair *cores, struct rte_ring *r,
+		lcore_function_t f1, lcore_function_t f2)
+{
+	struct thread_params param1 = {0}, param2 = {0};
+	unsigned i;
+	for (i = 0; i < sizeof(bulk_sizes)/sizeof(bulk_sizes[0]); i++) {
+		lcore_count = 0;
+		param1.size = param2.size = bulk_sizes[i];
+		param1.r = param2.r = r;
+		if (cores->c1 == rte_get_master_lcore()) {
+			rte_eal_remote_launch(f2, &param2, cores->c2);
+			f1(&param1);
+			rte_eal_wait_lcore(cores->c2);
+		} else {
+			rte_eal_remote_launch(f1, &param1, cores->c1);
+			rte_eal_remote_launch(f2, &param2, cores->c2);
+			rte_eal_wait_lcore(cores->c1);
+			rte_eal_wait_lcore(cores->c2);
+		}
+		printf("SP/SC bulk enq/dequeue (size: %u): %.2F\n",
+				bulk_sizes[i], param1.spsc + param2.spsc);
+		printf("MP/MC bulk enq/dequeue (size: %u): %.2F\n",
+				bulk_sizes[i], param1.mpmc + param2.mpmc);
+	}
+}
+
+/*
+ * Test function that determines how long an enqueue + dequeue of a single item
+ * takes on a single lcore. Result is for comparison with the bulk enq+deq.
+ */
+static void
+test_single_enqueue_dequeue(struct rte_ring *r)
+{
+	const unsigned iter_shift = 24;
+	const unsigned iterations = 1<<iter_shift;
+	unsigned i = 0;
+	uint32_t burst[2];
+
+	const uint64_t sc_start = rte_rdtsc();
+	for (i = 0; i < iterations; i++) {
+		rte_ring_sp_enqueue_elem(r, burst, 8);
+		rte_ring_sc_dequeue_elem(r, burst, 8);
+	}
+	const uint64_t sc_end = rte_rdtsc();
+
+	const uint64_t mc_start = rte_rdtsc();
+	for (i = 0; i < iterations; i++) {
+		rte_ring_mp_enqueue_elem(r, burst, 8);
+		rte_ring_mc_dequeue_elem(r, burst, 8);
+	}
+	const uint64_t mc_end = rte_rdtsc();
+
+	printf("SP/SC single enq/dequeue: %"PRIu64"\n",
+			(sc_end-sc_start) >> iter_shift);
+	printf("MP/MC single enq/dequeue: %"PRIu64"\n",
+			(mc_end-mc_start) >> iter_shift);
+}
+
+/*
+ * Test that does both enqueue and dequeue on a core using the burst() API calls
+ * instead of the bulk() calls used in other tests. Results should be the same
+ * as for the bulk function called on a single lcore.
+ */
+static void
+test_burst_enqueue_dequeue(struct rte_ring *r)
+{
+	const unsigned iter_shift = 23;
+	const unsigned iterations = 1<<iter_shift;
+	unsigned sz, i = 0;
+	uint32_t burst[MAX_BURST] = {0};
+
+	for (sz = 0; sz < sizeof(bulk_sizes)/sizeof(bulk_sizes[0]); sz++) {
+		const uint64_t sc_start = rte_rdtsc();
+		for (i = 0; i < iterations; i++) {
+			rte_ring_sp_enqueue_burst_elem(r, burst, 8,
+					bulk_sizes[sz], NULL);
+			rte_ring_sc_dequeue_burst_elem(r, burst, 8,
+					bulk_sizes[sz], NULL);
+		}
+		const uint64_t sc_end = rte_rdtsc();
+
+		const uint64_t mc_start = rte_rdtsc();
+		for (i = 0; i < iterations; i++) {
+			rte_ring_mp_enqueue_burst_elem(r, burst, 8,
+					bulk_sizes[sz], NULL);
+			rte_ring_mc_dequeue_burst_elem(r, burst, 8,
+					bulk_sizes[sz], NULL);
+		}
+		const uint64_t mc_end = rte_rdtsc();
+
+		uint64_t mc_avg = ((mc_end-mc_start) >> iter_shift) /
+					bulk_sizes[sz];
+		uint64_t sc_avg = ((sc_end-sc_start) >> iter_shift) /
+					bulk_sizes[sz];
+
+		printf("SP/SC burst enq/dequeue (size: %u): %"PRIu64"\n",
+				bulk_sizes[sz], sc_avg);
+		printf("MP/MC burst enq/dequeue (size: %u): %"PRIu64"\n",
+				bulk_sizes[sz], mc_avg);
+	}
+}
+
+/* Times enqueue and dequeue on a single lcore */
+static void
+test_bulk_enqueue_dequeue(struct rte_ring *r)
+{
+	const unsigned iter_shift = 23;
+	const unsigned iterations = 1<<iter_shift;
+	unsigned sz, i = 0;
+	uint32_t burst[MAX_BURST] = {0};
+
+	for (sz = 0; sz < sizeof(bulk_sizes)/sizeof(bulk_sizes[0]); sz++) {
+		const uint64_t sc_start = rte_rdtsc();
+		for (i = 0; i < iterations; i++) {
+			rte_ring_sp_enqueue_bulk_elem(r, burst, 8,
+					bulk_sizes[sz], NULL);
+			rte_ring_sc_dequeue_bulk_elem(r, burst, 8,
+					bulk_sizes[sz], NULL);
+		}
+		const uint64_t sc_end = rte_rdtsc();
+
+		const uint64_t mc_start = rte_rdtsc();
+		for (i = 0; i < iterations; i++) {
+			rte_ring_mp_enqueue_bulk_elem(r, burst, 8,
+					bulk_sizes[sz], NULL);
+			rte_ring_mc_dequeue_bulk_elem(r, burst, 8,
+					bulk_sizes[sz], NULL);
+		}
+		const uint64_t mc_end = rte_rdtsc();
+
+		double sc_avg = ((double)(sc_end-sc_start) /
+				(iterations * bulk_sizes[sz]));
+		double mc_avg = ((double)(mc_end-mc_start) /
+				(iterations * bulk_sizes[sz]));
+
+		printf("SP/SC bulk enq/dequeue (size: %u): %.2F\n",
+				bulk_sizes[sz], sc_avg);
+		printf("MP/MC bulk enq/dequeue (size: %u): %.2F\n",
+				bulk_sizes[sz], mc_avg);
+	}
+}
+
+static int
+test_ring_perf_elem(void)
+{
+	struct lcore_pair cores;
+	struct rte_ring *r = NULL;
+
+	r = rte_ring_create_elem(RING_NAME, RING_SIZE, 8, rte_socket_id(), 0);
+	if (r == NULL)
+		return -1;
+
+	printf("### Testing single element and burst enq/deq ###\n");
+	test_single_enqueue_dequeue(r);
+	test_burst_enqueue_dequeue(r);
+
+	printf("\n### Testing empty dequeue ###\n");
+	test_empty_dequeue(r);
+
+	printf("\n### Testing using a single lcore ###\n");
+	test_bulk_enqueue_dequeue(r);
+
+	if (get_two_hyperthreads(&cores) == 0) {
+		printf("\n### Testing using two hyperthreads ###\n");
+		run_on_core_pair(&cores, r, enqueue_bulk, dequeue_bulk);
+	}
+	if (get_two_cores(&cores) == 0) {
+		printf("\n### Testing using two physical cores ###\n");
+		run_on_core_pair(&cores, r, enqueue_bulk, dequeue_bulk);
+	}
+	if (get_two_sockets(&cores) == 0) {
+		printf("\n### Testing using two NUMA nodes ###\n");
+		run_on_core_pair(&cores, r, enqueue_bulk, dequeue_bulk);
+	}
+	rte_ring_free(r);
+	return 0;
+}
+
+REGISTER_TEST_COMMAND(ring_perf_elem_autotest, test_ring_perf_elem);
-- 
2.17.1



* [dpdk-dev] [PATCH v3 0/2] lib/ring: APIs to support custom element size
  2019-09-06 19:05 ` [dpdk-dev] [PATCH v2 0/6] " Honnappa Nagarahalli
                     ` (8 preceding siblings ...)
  2019-10-08 19:19   ` [dpdk-dev] [PATCH v3 0/2] lib/ring: APIs " Honnappa Nagarahalli
@ 2019-10-09  2:47   ` Honnappa Nagarahalli
  2019-10-09  2:47     ` [dpdk-dev] [PATCH v4 1/2] lib/ring: apis to support configurable " Honnappa Nagarahalli
  2019-10-09  2:47     ` [dpdk-dev] [PATCH v4 2/2] test/ring: add test cases for configurable element size ring Honnappa Nagarahalli
  2019-10-17 20:08   ` [dpdk-dev] [PATCH v5 0/3] lib/ring: APIs to support custom element size Honnappa Nagarahalli
                     ` (5 subsequent siblings)
  15 siblings, 2 replies; 173+ messages in thread
From: Honnappa Nagarahalli @ 2019-10-09  2:47 UTC (permalink / raw)
  To: olivier.matz, sthemmin, jerinj, bruce.richardson, david.marchand,
	pbhagavatula, konstantin.ananyev, honnappa.nagarahalli
  Cc: dev, dharmik.thakkar, ruifeng.wang, gavin.hu

The current rte_ring hard-codes the type of the ring element to 'void *',
hence the size of the element is hard-coded to 32b/64b. Since the ring
element type is not an input to rte_ring APIs, this results in a couple
of issues:

1) If an application needs to store an element whose size is not 64b, it
   has to write its own ring APIs, similar to the rte_event_ring APIs. This
   puts an additional burden on programmers, who end up creating
   workarounds that often waste memory.
2) If there are multiple libraries that store elements of the same
   type, currently they would have to write their own rte_ring APIs. This
   results in code duplication.

This patch adds new APIs to support configurable ring element size.
The APIs support custom element sizes by allowing the ring element size
to be defined as a multiple of 32b.
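
As a rough illustration of how the element size feeds into the ring's
memory footprint, the size calculation in rte_ring_get_memsize_elem()
can be sketched as below. This is only a standalone sketch, not the
patch code: the SKETCH_* constants are assumed stand-ins for
sizeof(struct rte_ring), RTE_CACHE_LINE_SIZE and RTE_RING_SZ_MASK, and
sketch_ring_memsize() is a hypothetical name. Unlike the POWEROF2()
check in the patch, the sketch also rejects a count of 0.

```c
#include <stddef.h>
#include <sys/types.h>	/* ssize_t */
#include <errno.h>

/* Assumed values for illustration; the real ones come from DPDK headers. */
#define SKETCH_HDR_SIZE   384u		/* stand-in for sizeof(struct rte_ring) */
#define SKETCH_CACHE_LINE 64u		/* stand-in for RTE_CACHE_LINE_SIZE */
#define SKETCH_SZ_MASK    0x7fffffffu	/* stand-in for RTE_RING_SZ_MASK */

/* Bytes needed for a ring of 'count' elements of 'esize' bytes each. */
static ssize_t
sketch_ring_memsize(unsigned int count, unsigned int esize)
{
	size_t sz;

	/* Only 4, 8 and 16 byte elements are accepted, as in the patch. */
	if (esize != 4 && esize != 8 && esize != 16)
		return -EINVAL;

	/* count must be a non-zero power of 2 within the size limit. */
	if (count == 0 || (count & (count - 1)) != 0 ||
			count > SKETCH_SZ_MASK)
		return -EINVAL;

	/* Header plus element storage, rounded up to a cache line. */
	sz = SKETCH_HDR_SIZE + (size_t)count * esize;
	sz = (sz + SKETCH_CACHE_LINE - 1) & ~(size_t)(SKETCH_CACHE_LINE - 1);
	return (ssize_t)sz;
}
```

With these assumed constants, halving the element size from 8 to 4
bytes roughly halves the storage part of the ring, which is the memory
saving the new APIs are meant to enable.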

The aim is to achieve the same performance as the existing ring
implementation. The patch adds the same performance tests that are run
for the existing APIs, which allows for a performance comparison.
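
For reference when comparing numbers, the per-element figures printed
by the bulk tests are cycle deltas divided by (iterations * burst
size). A minimal sketch of that arithmetic follows; the function name
is hypothetical and the timestamps are plain integers rather than
rte_rdtsc() readings.

```c
#include <stdint.h>

/*
 * Average cycles per element for a timed loop that ran
 * (1 << iter_shift) iterations, each moving 'burst' elements.
 */
static double
sketch_cycles_per_elem(uint64_t start, uint64_t end,
		unsigned int iter_shift, unsigned int burst)
{
	const uint64_t iterations = UINT64_C(1) << iter_shift;

	return (double)(end - start) / ((double)iterations * burst);
}
```

Note that the single and burst tests in the patch instead shift the
cycle delta right by iter_shift using integer arithmetic, so their
output is truncated to whole cycles and is coarser than the bulk
figures.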

I also tested with memcpy. x86 shows significant improvements on bulk
and burst tests. On the Arm platform I used, there is a drop of
4% to 6% in a few tests. Maybe this is something that we can explore
later.

Note that this version skips changes to other libraries as I would
like to get an agreement on the implementation from the community.
They will be added once there is agreement on the rte_ring changes.

v4
 - Few fixes after more performance testing

v3
 - Removed macro-fest and used inline functions
   (Stephen, Bruce)

v2
 - Change Event Ring implementation to use ring templates
   (Jerin, Pavan)

Honnappa Nagarahalli (2):
  lib/ring: apis to support configurable element size
  test/ring: add test cases for configurable element size ring

 app/test/Makefile                    |   1 +
 app/test/meson.build                 |   1 +
 app/test/test_ring_perf_elem.c       | 419 ++++++++++++
 lib/librte_ring/Makefile             |   3 +-
 lib/librte_ring/meson.build          |   3 +
 lib/librte_ring/rte_ring.c           |  45 +-
 lib/librte_ring/rte_ring.h           |   1 +
 lib/librte_ring/rte_ring_elem.h      | 946 +++++++++++++++++++++++++++
 lib/librte_ring/rte_ring_version.map |   2 +
 9 files changed, 1412 insertions(+), 9 deletions(-)
 create mode 100644 app/test/test_ring_perf_elem.c
 create mode 100644 lib/librte_ring/rte_ring_elem.h

-- 
2.17.1



* [dpdk-dev] [PATCH v4 1/2] lib/ring: apis to support configurable element size
  2019-10-09  2:47   ` [dpdk-dev] [PATCH v3 0/2] lib/ring: APIs to support custom element size Honnappa Nagarahalli
@ 2019-10-09  2:47     ` Honnappa Nagarahalli
  2019-10-11 19:21       ` Honnappa Nagarahalli
  2019-10-09  2:47     ` [dpdk-dev] [PATCH v4 2/2] test/ring: add test cases for configurable element size ring Honnappa Nagarahalli
  1 sibling, 1 reply; 173+ messages in thread
From: Honnappa Nagarahalli @ 2019-10-09  2:47 UTC (permalink / raw)
  To: olivier.matz, sthemmin, jerinj, bruce.richardson, david.marchand,
	pbhagavatula, konstantin.ananyev, honnappa.nagarahalli
  Cc: dev, dharmik.thakkar, ruifeng.wang, gavin.hu

Current APIs assume ring elements to be pointers. However, in many
use cases, the size can be different. Add new APIs to support
configurable ring element sizes.

Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Dharmik Thakkar <dharmik.thakkar@arm.com>
Reviewed-by: Gavin Hu <gavin.hu@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
---
 lib/librte_ring/Makefile             |   3 +-
 lib/librte_ring/meson.build          |   3 +
 lib/librte_ring/rte_ring.c           |  45 +-
 lib/librte_ring/rte_ring.h           |   1 +
 lib/librte_ring/rte_ring_elem.h      | 946 +++++++++++++++++++++++++++
 lib/librte_ring/rte_ring_version.map |   2 +
 6 files changed, 991 insertions(+), 9 deletions(-)
 create mode 100644 lib/librte_ring/rte_ring_elem.h

diff --git a/lib/librte_ring/Makefile b/lib/librte_ring/Makefile
index 21a36770d..515a967bb 100644
--- a/lib/librte_ring/Makefile
+++ b/lib/librte_ring/Makefile
@@ -6,7 +6,7 @@ include $(RTE_SDK)/mk/rte.vars.mk
 # library name
 LIB = librte_ring.a
 
-CFLAGS += $(WERROR_FLAGS) -I$(SRCDIR) -O3
+CFLAGS += $(WERROR_FLAGS) -I$(SRCDIR) -O3 -DALLOW_EXPERIMENTAL_API
 LDLIBS += -lrte_eal
 
 EXPORT_MAP := rte_ring_version.map
@@ -18,6 +18,7 @@ SRCS-$(CONFIG_RTE_LIBRTE_RING) := rte_ring.c
 
 # install includes
 SYMLINK-$(CONFIG_RTE_LIBRTE_RING)-include := rte_ring.h \
+					rte_ring_elem.h \
 					rte_ring_generic.h \
 					rte_ring_c11_mem.h
 
diff --git a/lib/librte_ring/meson.build b/lib/librte_ring/meson.build
index ab8b0b469..74219840a 100644
--- a/lib/librte_ring/meson.build
+++ b/lib/librte_ring/meson.build
@@ -6,3 +6,6 @@ sources = files('rte_ring.c')
 headers = files('rte_ring.h',
 		'rte_ring_c11_mem.h',
 		'rte_ring_generic.h')
+
+# rte_ring_create_elem and rte_ring_get_memsize_elem are experimental
+allow_experimental_apis = true
diff --git a/lib/librte_ring/rte_ring.c b/lib/librte_ring/rte_ring.c
index d9b308036..6fed3648b 100644
--- a/lib/librte_ring/rte_ring.c
+++ b/lib/librte_ring/rte_ring.c
@@ -33,6 +33,7 @@
 #include <rte_tailq.h>
 
 #include "rte_ring.h"
+#include "rte_ring_elem.h"
 
 TAILQ_HEAD(rte_ring_list, rte_tailq_entry);
 
@@ -46,23 +47,42 @@ EAL_REGISTER_TAILQ(rte_ring_tailq)
 
 /* return the size of memory occupied by a ring */
 ssize_t
-rte_ring_get_memsize(unsigned count)
+rte_ring_get_memsize_elem(unsigned count, unsigned esize)
 {
 	ssize_t sz;
 
+	/* Supported esize values are 4/8/16.
+	 * Others can be added on a need basis.
+	 */
+	if ((esize != 4) && (esize != 8) && (esize != 16)) {
+		RTE_LOG(ERR, RING,
+			"Unsupported esize value. Supported values are 4, 8 and 16\n");
+
+		return -EINVAL;
+	}
+
 	/* count must be a power of 2 */
 	if ((!POWEROF2(count)) || (count > RTE_RING_SZ_MASK )) {
 		RTE_LOG(ERR, RING,
-			"Requested size is invalid, must be power of 2, and "
-			"do not exceed the size limit %u\n", RTE_RING_SZ_MASK);
+			"Requested number of elements is invalid, must be "
+			"power of 2, and do not exceed the limit %u\n",
+			RTE_RING_SZ_MASK);
+
 		return -EINVAL;
 	}
 
-	sz = sizeof(struct rte_ring) + count * sizeof(void *);
+	sz = sizeof(struct rte_ring) + count * esize;
 	sz = RTE_ALIGN(sz, RTE_CACHE_LINE_SIZE);
 	return sz;
 }
 
+/* return the size of memory occupied by a ring */
+ssize_t
+rte_ring_get_memsize(unsigned count)
+{
+	return rte_ring_get_memsize_elem(count, sizeof(void *));
+}
+
 void
 rte_ring_reset(struct rte_ring *r)
 {
@@ -114,10 +134,10 @@ rte_ring_init(struct rte_ring *r, const char *name, unsigned count,
 	return 0;
 }
 
-/* create the ring */
+/* create the ring for a given element size */
 struct rte_ring *
-rte_ring_create(const char *name, unsigned count, int socket_id,
-		unsigned flags)
+rte_ring_create_elem(const char *name, unsigned count, unsigned esize,
+		int socket_id, unsigned flags)
 {
 	char mz_name[RTE_MEMZONE_NAMESIZE];
 	struct rte_ring *r;
@@ -135,7 +155,7 @@ rte_ring_create(const char *name, unsigned count, int socket_id,
 	if (flags & RING_F_EXACT_SZ)
 		count = rte_align32pow2(count + 1);
 
-	ring_size = rte_ring_get_memsize(count);
+	ring_size = rte_ring_get_memsize_elem(count, esize);
 	if (ring_size < 0) {
 		rte_errno = ring_size;
 		return NULL;
@@ -182,6 +202,15 @@ rte_ring_create(const char *name, unsigned count, int socket_id,
 	return r;
 }
 
+/* create the ring */
+struct rte_ring *
+rte_ring_create(const char *name, unsigned count, int socket_id,
+		unsigned flags)
+{
+	return rte_ring_create_elem(name, count, sizeof(void *), socket_id,
+		flags);
+}
+
 /* free the ring */
 void
 rte_ring_free(struct rte_ring *r)
diff --git a/lib/librte_ring/rte_ring.h b/lib/librte_ring/rte_ring.h
index 2a9f768a1..18fc5d845 100644
--- a/lib/librte_ring/rte_ring.h
+++ b/lib/librte_ring/rte_ring.h
@@ -216,6 +216,7 @@ int rte_ring_init(struct rte_ring *r, const char *name, unsigned count,
  */
 struct rte_ring *rte_ring_create(const char *name, unsigned count,
 				 int socket_id, unsigned flags);
+
 /**
  * De-allocate all memory used by the ring.
  *
diff --git a/lib/librte_ring/rte_ring_elem.h b/lib/librte_ring/rte_ring_elem.h
new file mode 100644
index 000000000..860f059ad
--- /dev/null
+++ b/lib/librte_ring/rte_ring_elem.h
@@ -0,0 +1,946 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ *
+ * Copyright (c) 2019 Arm Limited
+ * Copyright (c) 2010-2017 Intel Corporation
+ * Copyright (c) 2007-2009 Kip Macy kmacy@freebsd.org
+ * All rights reserved.
+ * Derived from FreeBSD's bufring.h
+ * Used as BSD-3 Licensed with permission from Kip Macy.
+ */
+
+#ifndef _RTE_RING_ELEM_H_
+#define _RTE_RING_ELEM_H_
+
+/**
+ * @file
+ * RTE Ring with flexible element size
+ */
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#include <stdio.h>
+#include <stdint.h>
+#include <sys/queue.h>
+#include <errno.h>
+#include <rte_common.h>
+#include <rte_config.h>
+#include <rte_memory.h>
+#include <rte_lcore.h>
+#include <rte_atomic.h>
+#include <rte_branch_prediction.h>
+#include <rte_memzone.h>
+#include <rte_pause.h>
+
+#include "rte_ring.h"
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * Calculate the memory size needed for a ring with given element size
+ *
+ * This function returns the number of bytes needed for a ring, given
+ * the number of elements in it and the size of the element. This value
+ * is the sum of the size of the structure rte_ring and the size of the
+ * memory needed for storing the elements. The value is aligned to a cache
+ * line size.
+ *
+ * @param count
+ *   The number of elements in the ring (must be a power of 2).
+ * @param esize
+ *   The size of ring element, in bytes. It must be a multiple of 4.
+ *   Currently, sizes 4, 8 and 16 are supported.
+ * @return
+ *   - The memory size needed for the ring on success.
+ *   - -EINVAL if count is not a power of 2 or esize is not 4, 8 or 16.
+ */
+__rte_experimental
+ssize_t rte_ring_get_memsize_elem(unsigned count, unsigned esize);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * Create a new ring named *name* that stores elements with given size.
+ *
+ * This function uses ``memzone_reserve()`` to allocate memory. Then it
+ * calls rte_ring_init() to initialize an empty ring.
+ *
+ * The new ring size is set to *count*, which must be a power of
+ * two. Water marking is disabled by default. The real usable ring size
+ * is *count-1* instead of *count* to differentiate a full ring from an
+ * empty ring.
+ *
+ * The ring is added in RTE_TAILQ_RING list.
+ *
+ * @param name
+ *   The name of the ring.
+ * @param count
+ *   The number of elements in the ring (must be a power of 2).
+ * @param esize
+ *   The size of ring element, in bytes. It must be a multiple of 4.
+ *   Currently, sizes 4, 8 and 16 are supported.
+ * @param socket_id
+ *   The *socket_id* argument is the socket identifier in case of
+ *   NUMA. The value can be *SOCKET_ID_ANY* if there is no NUMA
+ *   constraint for the reserved zone.
+ * @param flags
+ *   An OR of the following:
+ *    - RING_F_SP_ENQ: If this flag is set, the default behavior when
+ *      using ``rte_ring_enqueue()`` or ``rte_ring_enqueue_bulk()``
+ *      is "single-producer". Otherwise, it is "multi-producers".
+ *    - RING_F_SC_DEQ: If this flag is set, the default behavior when
+ *      using ``rte_ring_dequeue()`` or ``rte_ring_dequeue_bulk()``
+ *      is "single-consumer". Otherwise, it is "multi-consumers".
+ * @return
+ *   On success, the pointer to the new allocated ring. NULL on error with
+ *    rte_errno set appropriately. Possible errno values include:
+ *    - E_RTE_NO_CONFIG - function could not get pointer to rte_config structure
+ *    - E_RTE_SECONDARY - function was called from a secondary process instance
+ *    - EINVAL - count provided is not a power of 2
+ *    - ENOSPC - the maximum number of memzones has already been allocated
+ *    - EEXIST - a memzone with the same name already exists
+ *    - ENOMEM - no appropriate memory area found in which to create memzone
+ */
+__rte_experimental
+struct rte_ring *rte_ring_create_elem(const char *name, unsigned count,
+				unsigned esize, int socket_id, unsigned flags);
+
+/* the actual enqueue of pointers on the ring.
+ * Placed here since identical code needed in both
+ * single and multi producer enqueue functions.
+ */
+#define ENQUEUE_PTRS_ELEM(r, ring_start, prod_head, obj_table, esize, n) do { \
+	if (esize == 4) \
+		ENQUEUE_PTRS_32(r, ring_start, prod_head, obj_table, n); \
+	else if (esize == 8) \
+		ENQUEUE_PTRS_64(r, ring_start, prod_head, obj_table, n); \
+	else if (esize == 16) \
+		ENQUEUE_PTRS_128(r, ring_start, prod_head, obj_table, n); \
+} while (0)
+
+#define ENQUEUE_PTRS_32(r, ring_start, prod_head, obj_table, n) do { \
+	unsigned int i; \
+	const uint32_t size = (r)->size; \
+	uint32_t idx = prod_head & (r)->mask; \
+	uint32_t *ring = (uint32_t *)ring_start; \
+	uint32_t *obj = (uint32_t *)obj_table; \
+	if (likely(idx + n < size)) { \
+		for (i = 0; i < (n & ((~(unsigned)0x7))); i += 8, idx += 8) { \
+			ring[idx] = obj[i]; \
+			ring[idx + 1] = obj[i + 1]; \
+			ring[idx + 2] = obj[i + 2]; \
+			ring[idx + 3] = obj[i + 3]; \
+			ring[idx + 4] = obj[i + 4]; \
+			ring[idx + 5] = obj[i + 5]; \
+			ring[idx + 6] = obj[i + 6]; \
+			ring[idx + 7] = obj[i + 7]; \
+		} \
+		switch (n & 0x7) { \
+		case 7: \
+			ring[idx++] = obj[i++]; /* fallthrough */ \
+		case 6: \
+			ring[idx++] = obj[i++]; /* fallthrough */ \
+		case 5: \
+			ring[idx++] = obj[i++]; /* fallthrough */ \
+		case 4: \
+			ring[idx++] = obj[i++]; /* fallthrough */ \
+		case 3: \
+			ring[idx++] = obj[i++]; /* fallthrough */ \
+		case 2: \
+			ring[idx++] = obj[i++]; /* fallthrough */ \
+		case 1: \
+			ring[idx++] = obj[i++]; /* fallthrough */ \
+		} \
+	} else { \
+		for (i = 0; idx < size; i++, idx++)\
+			ring[idx] = obj[i]; \
+		for (idx = 0; i < n; i++, idx++) \
+			ring[idx] = obj[i]; \
+	} \
+} while (0)
+
+#define ENQUEUE_PTRS_64(r, ring_start, prod_head, obj_table, n) do { \
+	unsigned int i; \
+	const uint32_t size = (r)->size; \
+	uint32_t idx = prod_head & (r)->mask; \
+	uint64_t *ring = (uint64_t *)ring_start; \
+	uint64_t *obj = (uint64_t *)obj_table; \
+	if (likely(idx + n < size)) { \
+		for (i = 0; i < (n & ((~(unsigned)0x3))); i += 4, idx += 4) { \
+			ring[idx] = obj[i]; \
+			ring[idx + 1] = obj[i + 1]; \
+			ring[idx + 2] = obj[i + 2]; \
+			ring[idx + 3] = obj[i + 3]; \
+		} \
+		switch (n & 0x3) { \
+		case 3: \
+			ring[idx++] = obj[i++]; /* fallthrough */ \
+		case 2: \
+			ring[idx++] = obj[i++]; /* fallthrough */ \
+		case 1: \
+			ring[idx++] = obj[i++]; \
+		} \
+	} else { \
+		for (i = 0; idx < size; i++, idx++)\
+			ring[idx] = obj[i]; \
+		for (idx = 0; i < n; i++, idx++) \
+			ring[idx] = obj[i]; \
+	} \
+} while (0)
+
+#define ENQUEUE_PTRS_128(r, ring_start, prod_head, obj_table, n) do { \
+	unsigned int i; \
+	const uint32_t size = (r)->size; \
+	uint32_t idx = prod_head & (r)->mask; \
+	__uint128_t *ring = (__uint128_t *)ring_start; \
+	__uint128_t *obj = (__uint128_t *)obj_table; \
+	if (likely(idx + n < size)) { \
+		for (i = 0; i < (n & (~(unsigned)0x1)); i += 2, idx += 2) { \
+			ring[idx] = obj[i]; \
+			ring[idx + 1] = obj[i + 1]; \
+		} \
+		switch (n & 0x1) { \
+		case 1: \
+			ring[idx++] = obj[i++]; \
+		} \
+	} else { \
+		for (i = 0; idx < size; i++, idx++)\
+			ring[idx] = obj[i]; \
+		for (idx = 0; i < n; i++, idx++) \
+			ring[idx] = obj[i]; \
+	} \
+} while (0)
+
+/* the actual copy of pointers on the ring to obj_table.
+ * Placed here since identical code needed in both
+ * single and multi consumer dequeue functions.
+ */
+#define DEQUEUE_PTRS_ELEM(r, ring_start, cons_head, obj_table, esize, n) do { \
+	if (esize == 4) \
+		DEQUEUE_PTRS_32(r, ring_start, cons_head, obj_table, n); \
+	else if (esize == 8) \
+		DEQUEUE_PTRS_64(r, ring_start, cons_head, obj_table, n); \
+	else if (esize == 16) \
+		DEQUEUE_PTRS_128(r, ring_start, cons_head, obj_table, n); \
+} while (0)
+
+#define DEQUEUE_PTRS_32(r, ring_start, cons_head, obj_table, n) do { \
+	unsigned int i; \
+	uint32_t idx = cons_head & (r)->mask; \
+	const uint32_t size = (r)->size; \
+	uint32_t *ring = (uint32_t *)ring_start; \
+	uint32_t *obj = (uint32_t *)obj_table; \
+	if (likely(idx + n < size)) { \
+		for (i = 0; i < (n & (~(unsigned)0x7)); i += 8, idx += 8) {\
+			obj[i] = ring[idx]; \
+			obj[i + 1] = ring[idx + 1]; \
+			obj[i + 2] = ring[idx + 2]; \
+			obj[i + 3] = ring[idx + 3]; \
+			obj[i + 4] = ring[idx + 4]; \
+			obj[i + 5] = ring[idx + 5]; \
+			obj[i + 6] = ring[idx + 6]; \
+			obj[i + 7] = ring[idx + 7]; \
+		} \
+		switch (n & 0x7) { \
+		case 7: \
+			obj[i++] = ring[idx++]; /* fallthrough */ \
+		case 6: \
+			obj[i++] = ring[idx++]; /* fallthrough */ \
+		case 5: \
+			obj[i++] = ring[idx++]; /* fallthrough */ \
+		case 4: \
+			obj[i++] = ring[idx++]; /* fallthrough */ \
+		case 3: \
+			obj[i++] = ring[idx++]; /* fallthrough */ \
+		case 2: \
+			obj[i++] = ring[idx++]; /* fallthrough */ \
+		case 1: \
+			obj[i++] = ring[idx++]; /* fallthrough */ \
+		} \
+	} else { \
+		for (i = 0; idx < size; i++, idx++) \
+			obj[i] = ring[idx]; \
+		for (idx = 0; i < n; i++, idx++) \
+			obj[i] = ring[idx]; \
+	} \
+} while (0)
+
+#define DEQUEUE_PTRS_64(r, ring_start, cons_head, obj_table, n) do { \
+	unsigned int i; \
+	uint32_t idx = cons_head & (r)->mask; \
+	const uint32_t size = (r)->size; \
+	uint64_t *ring = (uint64_t *)ring_start; \
+	uint64_t *obj = (uint64_t *)obj_table; \
+	if (likely(idx + n < size)) { \
+		for (i = 0; i < (n & (~(unsigned)0x3)); i += 4, idx += 4) {\
+			obj[i] = ring[idx]; \
+			obj[i + 1] = ring[idx + 1]; \
+			obj[i + 2] = ring[idx + 2]; \
+			obj[i + 3] = ring[idx + 3]; \
+		} \
+		switch (n & 0x3) { \
+		case 3: \
+			obj[i++] = ring[idx++]; /* fallthrough */ \
+		case 2: \
+			obj[i++] = ring[idx++]; /* fallthrough */ \
+		case 1: \
+			obj[i++] = ring[idx++]; \
+		} \
+	} else { \
+		for (i = 0; idx < size; i++, idx++) \
+			obj[i] = ring[idx]; \
+		for (idx = 0; i < n; i++, idx++) \
+			obj[i] = ring[idx]; \
+	} \
+} while (0)
+
+#define DEQUEUE_PTRS_128(r, ring_start, cons_head, obj_table, n) do { \
+	unsigned int i; \
+	uint32_t idx = cons_head & (r)->mask; \
+	const uint32_t size = (r)->size; \
+	__uint128_t *ring = (__uint128_t *)ring_start; \
+	__uint128_t *obj = (__uint128_t *)obj_table; \
+	if (likely(idx + n < size)) { \
+		for (i = 0; i < (n & (~(unsigned)0x1)); i += 2, idx += 2) { \
+			obj[i] = ring[idx]; \
+			obj[i + 1] = ring[idx + 1]; \
+		} \
+		switch (n & 0x1) { \
+		case 1: \
+			obj[i++] = ring[idx++]; /* fallthrough */ \
+		} \
+	} else { \
+		for (i = 0; idx < size; i++, idx++) \
+			obj[i] = ring[idx]; \
+		for (idx = 0; i < n; i++, idx++) \
+			obj[i] = ring[idx]; \
+	} \
+} while (0)
+
+/* Between load and load, there might be CPU reordering in weak memory
+ * models (powerpc/arm).
+ * There are 2 choices for the users:
+ * 1. use rmb() memory barrier
+ * 2. use one-direction load_acquire/store_release barrier, defined by
+ *    CONFIG_RTE_USE_C11_MEM_MODEL=y
+ * It depends on performance test results.
+ * By default, move common functions to rte_ring_generic.h
+ */
+#ifdef RTE_USE_C11_MEM_MODEL
+#include "rte_ring_c11_mem.h"
+#else
+#include "rte_ring_generic.h"
+#endif
+
+/**
+ * @internal Enqueue several objects on the ring
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of objects, each of size esize bytes.
+ * @param esize
+ *   The size of ring element, in bytes. It must be a multiple of 4.
+ *   Currently, sizes 4, 8 and 16 are supported. This should be the same
+ *   as passed while creating the ring, otherwise the results are undefined.
+ * @param n
+ *   The number of objects to add in the ring from the obj_table.
+ * @param behavior
+ *   RTE_RING_QUEUE_FIXED:    Enqueue a fixed number of items to a ring
+ *   RTE_RING_QUEUE_VARIABLE: Enqueue as many items as possible to the ring
+ * @param is_sp
+ *   Indicates whether to use single producer or multi-producer head update
+ * @param free_space
+ *   returns the amount of space after the enqueue operation has finished
+ * @return
+ *   Actual number of objects enqueued.
+ *   If behavior == RTE_RING_QUEUE_FIXED, this will be 0 or n only.
+ */
+static __rte_always_inline unsigned int
+__rte_ring_do_enqueue_elem(struct rte_ring *r, void * const obj_table,
+		unsigned int esize, unsigned int n,
+		enum rte_ring_queue_behavior behavior, unsigned int is_sp,
+		unsigned int *free_space)
+{
+	uint32_t prod_head, prod_next;
+	uint32_t free_entries;
+
+	n = __rte_ring_move_prod_head(r, is_sp, n, behavior,
+			&prod_head, &prod_next, &free_entries);
+	if (n == 0)
+		goto end;
+
+	ENQUEUE_PTRS_ELEM(r, &r[1], prod_head, obj_table, esize, n);
+
+	update_tail(&r->prod, prod_head, prod_next, is_sp, 1);
+end:
+	if (free_space != NULL)
+		*free_space = free_entries - n;
+	return n;
+}
+
+/**
+ * @internal Dequeue several objects from the ring
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of objects, each of size esize bytes.
+ * @param esize
+ *   The size of ring element, in bytes. It must be a multiple of 4.
+ *   Currently, sizes 4, 8 and 16 are supported. This should be the same
+ *   as passed while creating the ring, otherwise the results are undefined.
+ * @param n
+ *   The number of objects to pull from the ring.
+ * @param behavior
+ *   RTE_RING_QUEUE_FIXED:    Dequeue a fixed number of items from a ring
+ *   RTE_RING_QUEUE_VARIABLE: Dequeue as many items as possible from ring
+ * @param is_sc
+ *   Indicates whether to use single consumer or multi-consumer head update
+ * @param available
+ *   returns the number of remaining ring entries after the dequeue has finished
+ * @return
+ *   - Actual number of objects dequeued.
+ *     If behavior == RTE_RING_QUEUE_FIXED, this will be 0 or n only.
+ */
+static __rte_always_inline unsigned int
+__rte_ring_do_dequeue_elem(struct rte_ring *r, void *obj_table,
+		unsigned int esize, unsigned int n,
+		enum rte_ring_queue_behavior behavior, unsigned int is_sc,
+		unsigned int *available)
+{
+	uint32_t cons_head, cons_next;
+	uint32_t entries;
+
+	n = __rte_ring_move_cons_head(r, (int)is_sc, n, behavior,
+			&cons_head, &cons_next, &entries);
+	if (n == 0)
+		goto end;
+
+	DEQUEUE_PTRS_ELEM(r, &r[1], cons_head, obj_table, esize, n);
+
+	update_tail(&r->cons, cons_head, cons_next, is_sc, 0);
+
+end:
+	if (available != NULL)
+		*available = entries - n;
+	return n;
+}
+
+/**
+ * Enqueue several objects on the ring (multi-producers safe).
+ *
+ * This function uses a "compare and set" instruction to move the
+ * producer index atomically.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of objects, each of size esize bytes.
+ * @param esize
+ *   The size of ring element, in bytes. It must be a multiple of 4.
+ *   Currently, sizes 4, 8 and 16 are supported. This should be the same
+ *   as passed while creating the ring, otherwise the results are undefined.
+ * @param n
+ *   The number of objects to add in the ring from the obj_table.
+ * @param free_space
+ *   if non-NULL, returns the amount of space in the ring after the
+ *   enqueue operation has finished.
+ * @return
+ *   The number of objects enqueued, either 0 or n
+ */
+static __rte_always_inline unsigned int
+rte_ring_mp_enqueue_bulk_elem(struct rte_ring *r, void * const obj_table,
+		unsigned int esize, unsigned int n, unsigned int *free_space)
+{
+	return __rte_ring_do_enqueue_elem(r, obj_table, esize, n,
+			RTE_RING_QUEUE_FIXED, __IS_MP, free_space);
+}
+
+/**
+ * Enqueue several objects on a ring (NOT multi-producers safe).
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of objects, each of size esize bytes.
+ * @param esize
+ *   The size of ring element, in bytes. It must be a multiple of 4.
+ *   Currently, sizes 4, 8 and 16 are supported. This should be the same
+ *   as passed while creating the ring, otherwise the results are undefined.
+ * @param n
+ *   The number of objects to add in the ring from the obj_table.
+ * @param free_space
+ *   if non-NULL, returns the amount of space in the ring after the
+ *   enqueue operation has finished.
+ * @return
+ *   The number of objects enqueued, either 0 or n
+ */
+static __rte_always_inline unsigned int
+rte_ring_sp_enqueue_bulk_elem(struct rte_ring *r, void * const obj_table,
+		unsigned int esize, unsigned int n, unsigned int *free_space)
+{
+	return __rte_ring_do_enqueue_elem(r, obj_table, esize, n,
+			RTE_RING_QUEUE_FIXED, __IS_SP, free_space);
+}
+
+/**
+ * Enqueue several objects on a ring.
+ *
+ * This function calls the multi-producer or the single-producer
+ * version depending on the default behavior that was specified at
+ * ring creation time (see flags).
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of objects, each of size esize bytes.
+ * @param esize
+ *   The size of ring element, in bytes. It must be a multiple of 4.
+ *   Currently, sizes 4, 8 and 16 are supported. This should be the same
+ *   as passed while creating the ring, otherwise the results are undefined.
+ * @param n
+ *   The number of objects to add in the ring from the obj_table.
+ * @param free_space
+ *   if non-NULL, returns the amount of space in the ring after the
+ *   enqueue operation has finished.
+ * @return
+ *   The number of objects enqueued, either 0 or n
+ */
+static __rte_always_inline unsigned int
+rte_ring_enqueue_bulk_elem(struct rte_ring *r, void * const obj_table,
+		unsigned int esize, unsigned int n, unsigned int *free_space)
+{
+	return __rte_ring_do_enqueue_elem(r, obj_table, esize, n,
+			RTE_RING_QUEUE_FIXED, r->prod.single, free_space);
+}
+
+/**
+ * Enqueue one object on a ring (multi-producers safe).
+ *
+ * This function uses a "compare and set" instruction to move the
+ * producer index atomically.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj
+ *   A pointer to the object to be added.
+ * @param esize
+ *   The size of ring element, in bytes. It must be a multiple of 4.
+ *   Currently, sizes 4, 8 and 16 are supported. This should be the same
+ *   as passed while creating the ring, otherwise the results are undefined.
+ * @return
+ *   - 0: Success; objects enqueued.
+ *   - -ENOBUFS: Not enough room in the ring to enqueue; no object is enqueued.
+ */
+static __rte_always_inline int
+rte_ring_mp_enqueue_elem(struct rte_ring *r, void *obj, unsigned int esize)
+{
+	return rte_ring_mp_enqueue_bulk_elem(r, obj, esize, 1, NULL) ? 0 :
+								-ENOBUFS;
+}
+
+/**
+ * Enqueue one object on a ring (NOT multi-producers safe).
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj
+ *   A pointer to the object to be added.
+ * @param esize
+ *   The size of ring element, in bytes. It must be a multiple of 4.
+ *   Currently, sizes 4, 8 and 16 are supported. This should be the same
+ *   as passed while creating the ring, otherwise the results are undefined.
+ * @return
+ *   - 0: Success; objects enqueued.
+ *   - -ENOBUFS: Not enough room in the ring to enqueue; no object is enqueued.
+ */
+static __rte_always_inline int
+rte_ring_sp_enqueue_elem(struct rte_ring *r, void *obj, unsigned int esize)
+{
+	return rte_ring_sp_enqueue_bulk_elem(r, obj, esize, 1, NULL) ? 0 :
+								-ENOBUFS;
+}
+
+/**
+ * Enqueue one object on a ring.
+ *
+ * This function calls the multi-producer or the single-producer
+ * version, depending on the default behaviour that was specified at
+ * ring creation time (see flags).
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj
+ *   A pointer to the object to be added.
+ * @param esize
+ *   The size of ring element, in bytes. It must be a multiple of 4.
+ *   Currently, sizes 4, 8 and 16 are supported. This should be the same
+ *   as passed while creating the ring, otherwise the results are undefined.
+ * @return
+ *   - 0: Success; objects enqueued.
+ *   - -ENOBUFS: Not enough room in the ring to enqueue; no object is enqueued.
+ */
+static __rte_always_inline int
+rte_ring_enqueue_elem(struct rte_ring *r, void *obj, unsigned int esize)
+{
+	return rte_ring_enqueue_bulk_elem(r, obj, esize, 1, NULL) ? 0 :
+								-ENOBUFS;
+}
+
+/**
+ * Dequeue several objects from a ring (multi-consumers safe).
+ *
+ * This function uses a "compare and set" instruction to move the
+ * consumer index atomically.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of objects (esize bytes each) that will be filled.
+ * @param esize
+ *   The size of ring element, in bytes. It must be a multiple of 4.
+ *   Currently, sizes 4, 8 and 16 are supported. This should be the same
+ *   as passed while creating the ring, otherwise the results are undefined.
+ * @param n
+ *   The number of objects to dequeue from the ring to the obj_table.
+ * @param available
+ *   If non-NULL, returns the number of remaining ring entries after the
+ *   dequeue has finished.
+ * @return
+ *   The number of objects dequeued, either 0 or n
+ */
+static __rte_always_inline unsigned int
+rte_ring_mc_dequeue_bulk_elem(struct rte_ring *r, void *obj_table,
+		unsigned int esize, unsigned int n, unsigned int *available)
+{
+	return __rte_ring_do_dequeue_elem(r, obj_table, esize, n,
+				RTE_RING_QUEUE_FIXED, __IS_MC, available);
+}
+
+/**
+ * Dequeue several objects from a ring (NOT multi-consumers safe).
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of objects (esize bytes each) that will be filled.
+ * @param esize
+ *   The size of ring element, in bytes. It must be a multiple of 4.
+ *   Currently, sizes 4, 8 and 16 are supported. This should be the same
+ *   as passed while creating the ring, otherwise the results are undefined.
+ * @param n
+ *   The number of objects to dequeue from the ring to the obj_table,
+ *   must be strictly positive.
+ * @param available
+ *   If non-NULL, returns the number of remaining ring entries after the
+ *   dequeue has finished.
+ * @return
+ *   The number of objects dequeued, either 0 or n
+ */
+static __rte_always_inline unsigned int
+rte_ring_sc_dequeue_bulk_elem(struct rte_ring *r, void *obj_table,
+		unsigned int esize, unsigned int n, unsigned int *available)
+{
+	return __rte_ring_do_dequeue_elem(r, obj_table, esize, n,
+			RTE_RING_QUEUE_FIXED, __IS_SC, available);
+}
+
+/**
+ * Dequeue several objects from a ring.
+ *
+ * This function calls the multi-consumers or the single-consumer
+ * version, depending on the default behaviour that was specified at
+ * ring creation time (see flags).
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of objects (esize bytes each) that will be filled.
+ * @param esize
+ *   The size of ring element, in bytes. It must be a multiple of 4.
+ *   Currently, sizes 4, 8 and 16 are supported. This should be the same
+ *   as passed while creating the ring, otherwise the results are undefined.
+ * @param n
+ *   The number of objects to dequeue from the ring to the obj_table.
+ * @param available
+ *   If non-NULL, returns the number of remaining ring entries after the
+ *   dequeue has finished.
+ * @return
+ *   The number of objects dequeued, either 0 or n
+ */
+static __rte_always_inline unsigned int
+rte_ring_dequeue_bulk_elem(struct rte_ring *r, void *obj_table,
+		unsigned int esize, unsigned int n, unsigned int *available)
+{
+	return __rte_ring_do_dequeue_elem(r, obj_table, esize, n,
+			RTE_RING_QUEUE_FIXED, r->cons.single, available);
+}
+
+/**
+ * Dequeue one object from a ring (multi-consumers safe).
+ *
+ * This function uses a "compare and set" instruction to move the
+ * consumer index atomically.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_p
+ *   A pointer to the object (esize bytes) that will be filled.
+ * @param esize
+ *   The size of ring element, in bytes. It must be a multiple of 4.
+ *   Currently, sizes 4, 8 and 16 are supported. This should be the same
+ *   as passed while creating the ring, otherwise the results are undefined.
+ * @return
+ *   - 0: Success; objects dequeued.
+ *   - -ENOENT: Not enough entries in the ring to dequeue; no object is
+ *     dequeued.
+ */
+static __rte_always_inline int
+rte_ring_mc_dequeue_elem(struct rte_ring *r, void *obj_p,
+				unsigned int esize)
+{
+	return rte_ring_mc_dequeue_bulk_elem(r, obj_p, esize, 1, NULL)  ? 0 :
+								-ENOENT;
+}
+
+/**
+ * Dequeue one object from a ring (NOT multi-consumers safe).
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_p
+ *   A pointer to the object (esize bytes) that will be filled.
+ * @param esize
+ *   The size of ring element, in bytes. It must be a multiple of 4.
+ *   Currently, sizes 4, 8 and 16 are supported. This should be the same
+ *   as passed while creating the ring, otherwise the results are undefined.
+ * @return
+ *   - 0: Success; objects dequeued.
+ *   - -ENOENT: Not enough entries in the ring to dequeue, no object is
+ *     dequeued.
+ */
+static __rte_always_inline int
+rte_ring_sc_dequeue_elem(struct rte_ring *r, void *obj_p,
+				unsigned int esize)
+{
+	return rte_ring_sc_dequeue_bulk_elem(r, obj_p, esize, 1, NULL) ? 0 :
+								-ENOENT;
+}
+
+/**
+ * Dequeue one object from a ring.
+ *
+ * This function calls the multi-consumers or the single-consumer
+ * version depending on the default behaviour that was specified at
+ * ring creation time (see flags).
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_p
+ *   A pointer to the object (esize bytes) that will be filled.
+ * @param esize
+ *   The size of ring element, in bytes. It must be a multiple of 4.
+ *   Currently, sizes 4, 8 and 16 are supported. This should be the same
+ *   as passed while creating the ring, otherwise the results are undefined.
+ * @return
+ *   - 0: Success, objects dequeued.
+ *   - -ENOENT: Not enough entries in the ring to dequeue, no object is
+ *     dequeued.
+ */
+static __rte_always_inline int
+rte_ring_dequeue_elem(struct rte_ring *r, void *obj_p, unsigned int esize)
+{
+	return rte_ring_dequeue_bulk_elem(r, obj_p, esize, 1, NULL) ? 0 :
+								-ENOENT;
+}
+
+/**
+ * Enqueue several objects on the ring (multi-producers safe).
+ *
+ * This function uses a "compare and set" instruction to move the
+ * producer index atomically.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of objects, each of size esize bytes.
+ * @param esize
+ *   The size of ring element, in bytes. It must be a multiple of 4.
+ *   Currently, sizes 4, 8 and 16 are supported. This should be the same
+ *   as passed while creating the ring, otherwise the results are undefined.
+ * @param n
+ *   The number of objects to add in the ring from the obj_table.
+ * @param free_space
+ *   if non-NULL, returns the amount of space in the ring after the
+ *   enqueue operation has finished.
+ * @return
+ *   - n: Actual number of objects enqueued.
+ */
+static __rte_always_inline unsigned
+rte_ring_mp_enqueue_burst_elem(struct rte_ring *r, void * const obj_table,
+		unsigned int esize, unsigned int n, unsigned int *free_space)
+{
+	return __rte_ring_do_enqueue_elem(r, obj_table, esize, n,
+			RTE_RING_QUEUE_VARIABLE, __IS_MP, free_space);
+}
+
+/**
+ * Enqueue several objects on a ring (NOT multi-producers safe).
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of objects, each of size esize bytes.
+ * @param esize
+ *   The size of ring element, in bytes. It must be a multiple of 4.
+ *   Currently, sizes 4, 8 and 16 are supported. This should be the same
+ *   as passed while creating the ring, otherwise the results are undefined.
+ * @param n
+ *   The number of objects to add in the ring from the obj_table.
+ * @param free_space
+ *   if non-NULL, returns the amount of space in the ring after the
+ *   enqueue operation has finished.
+ * @return
+ *   - n: Actual number of objects enqueued.
+ */
+static __rte_always_inline unsigned
+rte_ring_sp_enqueue_burst_elem(struct rte_ring *r, void * const obj_table,
+		unsigned int esize, unsigned int n, unsigned int *free_space)
+{
+	return __rte_ring_do_enqueue_elem(r, obj_table, esize, n,
+			RTE_RING_QUEUE_VARIABLE, __IS_SP, free_space);
+}
+
+/**
+ * Enqueue several objects on a ring.
+ *
+ * This function calls the multi-producer or the single-producer
+ * version depending on the default behavior that was specified at
+ * ring creation time (see flags).
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of objects, each of size esize bytes.
+ * @param esize
+ *   The size of ring element, in bytes. It must be a multiple of 4.
+ *   Currently, sizes 4, 8 and 16 are supported. This should be the same
+ *   as passed while creating the ring, otherwise the results are undefined.
+ * @param n
+ *   The number of objects to add in the ring from the obj_table.
+ * @param free_space
+ *   if non-NULL, returns the amount of space in the ring after the
+ *   enqueue operation has finished.
+ * @return
+ *   - n: Actual number of objects enqueued.
+ */
+static __rte_always_inline unsigned
+rte_ring_enqueue_burst_elem(struct rte_ring *r, void * const obj_table,
+		unsigned int esize, unsigned int n, unsigned int *free_space)
+{
+	return __rte_ring_do_enqueue_elem(r, obj_table, esize, n,
+			RTE_RING_QUEUE_VARIABLE, r->prod.single, free_space);
+}
+
+/**
+ * Dequeue several objects from a ring (multi-consumers safe). When more
+ * objects are requested than are available, only the available objects
+ * are dequeued.
+ *
+ * This function uses a "compare and set" instruction to move the
+ * consumer index atomically.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of objects (esize bytes each) that will be filled.
+ * @param esize
+ *   The size of ring element, in bytes. It must be a multiple of 4.
+ *   Currently, sizes 4, 8 and 16 are supported. This should be the same
+ *   as passed while creating the ring, otherwise the results are undefined.
+ * @param n
+ *   The number of objects to dequeue from the ring to the obj_table.
+ * @param available
+ *   If non-NULL, returns the number of remaining ring entries after the
+ *   dequeue has finished.
+ * @return
+ *   - n: Actual number of objects dequeued, 0 if ring is empty
+ */
+static __rte_always_inline unsigned
+rte_ring_mc_dequeue_burst_elem(struct rte_ring *r, void *obj_table,
+		unsigned int esize, unsigned int n, unsigned int *available)
+{
+	return __rte_ring_do_dequeue_elem(r, obj_table, esize, n,
+			RTE_RING_QUEUE_VARIABLE, __IS_MC, available);
+}
+
+/**
+ * Dequeue several objects from a ring (NOT multi-consumers safe). When
+ * more objects are requested than are available, only the available
+ * objects are dequeued.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of objects (esize bytes each) that will be filled.
+ * @param esize
+ *   The size of ring element, in bytes. It must be a multiple of 4.
+ *   Currently, sizes 4, 8 and 16 are supported. This should be the same
+ *   as passed while creating the ring, otherwise the results are undefined.
+ * @param n
+ *   The number of objects to dequeue from the ring to the obj_table.
+ * @param available
+ *   If non-NULL, returns the number of remaining ring entries after the
+ *   dequeue has finished.
+ * @return
+ *   - n: Actual number of objects dequeued, 0 if ring is empty
+ */
+static __rte_always_inline unsigned
+rte_ring_sc_dequeue_burst_elem(struct rte_ring *r, void *obj_table,
+		unsigned int esize, unsigned int n, unsigned int *available)
+{
+	return __rte_ring_do_dequeue_elem(r, obj_table, esize, n,
+			RTE_RING_QUEUE_VARIABLE, __IS_SC, available);
+}
+
+/**
+ * Dequeue multiple objects from a ring up to a maximum number.
+ *
+ * This function calls the multi-consumers or the single-consumer
+ * version, depending on the default behaviour that was specified at
+ * ring creation time (see flags).
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of objects (esize bytes each) that will be filled.
+ * @param esize
+ *   The size of ring element, in bytes. It must be a multiple of 4.
+ *   Currently, sizes 4, 8 and 16 are supported. This should be the same
+ *   as passed while creating the ring, otherwise the results are undefined.
+ * @param n
+ *   The number of objects to dequeue from the ring to the obj_table.
+ * @param available
+ *   If non-NULL, returns the number of remaining ring entries after the
+ *   dequeue has finished.
+ * @return
+ *   - Number of objects dequeued
+ */
+static __rte_always_inline unsigned
+rte_ring_dequeue_burst_elem(struct rte_ring *r, void *obj_table,
+		unsigned int esize, unsigned int n, unsigned int *available)
+{
+	return __rte_ring_do_dequeue_elem(r, obj_table, esize, n,
+				RTE_RING_QUEUE_VARIABLE,
+				r->cons.single, available);
+}
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* _RTE_RING_ELEM_H_ */
diff --git a/lib/librte_ring/rte_ring_version.map b/lib/librte_ring/rte_ring_version.map
index 510c1386e..e410a7503 100644
--- a/lib/librte_ring/rte_ring_version.map
+++ b/lib/librte_ring/rte_ring_version.map
@@ -21,6 +21,8 @@ DPDK_2.2 {
 EXPERIMENTAL {
 	global:
 
+	rte_ring_create_elem;
+	rte_ring_get_memsize_elem;
 	rte_ring_reset;
 
 };
-- 
2.17.1

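For reference, the unrolled copy for 16-byte elements can be modelled
outside DPDK. The following is a minimal standalone sketch, not the code
from the patch: struct elem16 and copy_to_ring() are hypothetical names,
and the struct stands in for __uint128_t. It uses (n & ~1) as the bound
of the two-at-a-time loop so that every full pair is copied before the
odd remainder is handled; with a stride of 2, a bound of (n >> 1) would
stop after about half of the elements.

```c
#include <stdint.h>

/* Hypothetical 16-byte element; stands in for __uint128_t. */
struct elem16 {
	uint64_t lo;
	uint64_t hi;
};

/*
 * Copy n elements from obj[] into ring[], starting at slot
 * (head & (size - 1)) and wrapping at size. size must be a power of two
 * and n must not exceed size. This mirrors the shape of the enqueue
 * macros: an unrolled fast path when no wrap is needed, and two plain
 * loops when the copy crosses the end of the ring.
 */
static void
copy_to_ring(struct elem16 *ring, uint32_t size, uint32_t head,
	     const struct elem16 *obj, uint32_t n)
{
	uint32_t i;
	uint32_t idx = head & (size - 1);

	if (idx + n < size) {
		/* Fast path: copy full pairs, then at most one leftover. */
		for (i = 0; i < (n & ~(uint32_t)0x1); i += 2, idx += 2) {
			ring[idx] = obj[i];
			ring[idx + 1] = obj[i + 1];
		}
		if (n & 0x1)
			ring[idx] = obj[i];
	} else {
		/* Wrap: fill up to the end of the ring, then restart at 0. */
		for (i = 0; idx < size; i++, idx++)
			ring[idx] = obj[i];
		for (idx = 0; i < n; i++, idx++)
			ring[idx] = obj[i];
	}
}
```

The dequeue direction is the same loop with the two sides of each
assignment swapped.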

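The memory size documented for rte_ring_get_memsize_elem() (ring header
plus element storage, rounded up to a cache line) can be illustrated
with a small standalone model. This is only a sketch under stated
assumptions: MODEL_RING_HDR is a made-up stand-in for
sizeof(struct rte_ring), the cache line is assumed to be 64 bytes, and
rejecting an esize that is not a multiple of 4 is an assumption based on
the parameter description, not on a documented return value.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>

#define MODEL_CACHE_LINE 64  /* assumed cache line size, in bytes */
#define MODEL_RING_HDR   384 /* hypothetical sizeof(struct rte_ring) */

/*
 * Model of the size computation: header + count * esize, rounded up to
 * a multiple of the cache line. Returns -EINVAL for a count that is not
 * a power of 2 or (by assumption) an esize that is not a multiple of 4.
 */
static ssize_t
model_ring_memsize_elem(unsigned int count, unsigned int esize)
{
	size_t sz;

	if (esize == 0 || esize % 4 != 0)
		return -EINVAL;
	if (count == 0 || (count & (count - 1)) != 0)
		return -EINVAL;

	sz = MODEL_RING_HDR + (size_t)count * esize;
	sz = (sz + MODEL_CACHE_LINE - 1) & ~(size_t)(MODEL_CACHE_LINE - 1);
	return (ssize_t)sz;
}
```

With these assumed constants, a ring of 4 elements of 4 bytes needs
384 + 16 = 400 bytes, which rounds up to 448.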

* [dpdk-dev] [PATCH v4 2/2] test/ring: add test cases for configurable element size ring
  2019-10-09  2:47   ` [dpdk-dev] [PATCH v3 0/2] lib/ring: APIs to support custom element size Honnappa Nagarahalli
  2019-10-09  2:47     ` [dpdk-dev] [PATCH v4 1/2] lib/ring: apis to support configurable " Honnappa Nagarahalli
@ 2019-10-09  2:47     ` Honnappa Nagarahalli
  1 sibling, 0 replies; 173+ messages in thread
From: Honnappa Nagarahalli @ 2019-10-09  2:47 UTC (permalink / raw)
  To: olivier.matz, sthemmin, jerinj, bruce.richardson, david.marchand,
	pbhagavatula, konstantin.ananyev, honnappa.nagarahalli
  Cc: dev, dharmik.thakkar, ruifeng.wang, gavin.hu

Add test cases to test APIs for configurable element size ring.

Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
---
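The tests exercise both the bulk and the burst variants. The difference
between them (bulk moves either all n objects or none, burst moves as
many as fit) can be sketched with a standalone model of the producer
side. The names below are hypothetical and the model is
single-threaded; it assumes the free space is computed as
capacity + cons_tail - prod_head in unsigned 32-bit arithmetic, with
capacity equal to count - 1.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for RTE_RING_QUEUE_FIXED / _VARIABLE. */
enum model_behavior { MODEL_FIXED, MODEL_VARIABLE };

/*
 * Return how many of the n requested objects an enqueue would accept.
 * FIXED (bulk) accepts all n or nothing; VARIABLE (burst) accepts as
 * many as currently fit. If free_after is non-NULL, it receives the
 * free space left after the operation.
 */
static uint32_t
model_enqueue_count(uint32_t capacity, uint32_t prod_head,
		    uint32_t cons_tail, uint32_t n,
		    enum model_behavior behavior, uint32_t *free_after)
{
	/* Unsigned wrap-around keeps this correct after index overflow. */
	uint32_t free_entries = capacity + cons_tail - prod_head;

	if (n > free_entries)
		n = (behavior == MODEL_FIXED) ? 0 : free_entries;
	if (free_after != NULL)
		*free_after = free_entries - n;
	return n;
}
```

For a ring created with count = 8 (capacity 7) that already holds 5
objects, a bulk request for 4 accepts nothing and reports 2 free slots,
while a burst request for 4 accepts 2 and reports 0.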
 app/test/Makefile              |   1 +
 app/test/meson.build           |   1 +
 app/test/test_ring_perf_elem.c | 419 +++++++++++++++++++++++++++++++++
 3 files changed, 421 insertions(+)
 create mode 100644 app/test/test_ring_perf_elem.c

diff --git a/app/test/Makefile b/app/test/Makefile
index 26ba6fe2b..e5cb27b75 100644
--- a/app/test/Makefile
+++ b/app/test/Makefile
@@ -78,6 +78,7 @@ SRCS-y += test_rand_perf.c
 
 SRCS-y += test_ring.c
 SRCS-y += test_ring_perf.c
+SRCS-y += test_ring_perf_elem.c
 SRCS-y += test_pmd_perf.c
 
 ifeq ($(CONFIG_RTE_LIBRTE_TABLE),y)
diff --git a/app/test/meson.build b/app/test/meson.build
index ec40943bd..995ee9bc7 100644
--- a/app/test/meson.build
+++ b/app/test/meson.build
@@ -101,6 +101,7 @@ test_sources = files('commands.c',
 	'test_reorder.c',
 	'test_ring.c',
 	'test_ring_perf.c',
+	'test_ring_perf_elem.c',
 	'test_rwlock.c',
 	'test_sched.c',
 	'test_service_cores.c',
diff --git a/app/test/test_ring_perf_elem.c b/app/test/test_ring_perf_elem.c
new file mode 100644
index 000000000..fc5b82d71
--- /dev/null
+++ b/app/test/test_ring_perf_elem.c
@@ -0,0 +1,419 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2010-2014 Intel Corporation
+ */
+
+
+#include <stdio.h>
+#include <inttypes.h>
+#include <rte_ring.h>
+#include <rte_ring_elem.h>
+#include <rte_cycles.h>
+#include <rte_launch.h>
+#include <rte_pause.h>
+
+#include "test.h"
+
+/*
+ * Ring
+ * ====
+ *
+ * Measures performance of various operations using rdtsc
+ *  * Empty ring dequeue
+ *  * Enqueue/dequeue of bursts in 1 thread
+ *  * Enqueue/dequeue of bursts in 2 threads
+ */
+
+#define RING_NAME "RING_PERF"
+#define RING_SIZE 4096
+#define MAX_BURST 64
+
+/*
+ * the sizes to enqueue and dequeue in testing
+ * (marked volatile so they won't be seen as compile-time constants)
+ */
+static const volatile unsigned bulk_sizes[] = { 8, 32 };
+
+struct lcore_pair {
+	unsigned c1, c2;
+};
+
+static volatile unsigned lcore_count;
+
+/**** Functions to analyse our core mask to get cores for different tests ***/
+
+static int
+get_two_hyperthreads(struct lcore_pair *lcp)
+{
+	unsigned id1, id2;
+	unsigned c1, c2, s1, s2;
+	RTE_LCORE_FOREACH(id1) {
+		/* inner loop just re-reads all id's. We could skip the
+		 * first few elements, but since number of cores is small
+		 * there is little point
+		 */
+		RTE_LCORE_FOREACH(id2) {
+			if (id1 == id2)
+				continue;
+
+			c1 = rte_lcore_to_cpu_id(id1);
+			c2 = rte_lcore_to_cpu_id(id2);
+			s1 = rte_lcore_to_socket_id(id1);
+			s2 = rte_lcore_to_socket_id(id2);
+			if ((c1 == c2) && (s1 == s2)) {
+				lcp->c1 = id1;
+				lcp->c2 = id2;
+				return 0;
+			}
+		}
+	}
+	return 1;
+}
+
+static int
+get_two_cores(struct lcore_pair *lcp)
+{
+	unsigned id1, id2;
+	unsigned c1, c2, s1, s2;
+	RTE_LCORE_FOREACH(id1) {
+		RTE_LCORE_FOREACH(id2) {
+			if (id1 == id2)
+				continue;
+
+			c1 = rte_lcore_to_cpu_id(id1);
+			c2 = rte_lcore_to_cpu_id(id2);
+			s1 = rte_lcore_to_socket_id(id1);
+			s2 = rte_lcore_to_socket_id(id2);
+			if ((c1 != c2) && (s1 == s2)) {
+				lcp->c1 = id1;
+				lcp->c2 = id2;
+				return 0;
+			}
+		}
+	}
+	return 1;
+}
+
+static int
+get_two_sockets(struct lcore_pair *lcp)
+{
+	unsigned id1, id2;
+	unsigned s1, s2;
+	RTE_LCORE_FOREACH(id1) {
+		RTE_LCORE_FOREACH(id2) {
+			if (id1 == id2)
+				continue;
+			s1 = rte_lcore_to_socket_id(id1);
+			s2 = rte_lcore_to_socket_id(id2);
+			if (s1 != s2) {
+				lcp->c1 = id1;
+				lcp->c2 = id2;
+				return 0;
+			}
+		}
+	}
+	return 1;
+}
+
+/* Get cycle counts for dequeuing from an empty ring. Should be 2 or 3 cycles */
+static void
+test_empty_dequeue(struct rte_ring *r)
+{
+	const unsigned iter_shift = 26;
+	const unsigned iterations = 1<<iter_shift;
+	unsigned i = 0;
+	uint32_t burst[MAX_BURST];
+
+	const uint64_t sc_start = rte_rdtsc();
+	for (i = 0; i < iterations; i++)
+		rte_ring_sc_dequeue_bulk_elem(r, burst, 8, bulk_sizes[0], NULL);
+	const uint64_t sc_end = rte_rdtsc();
+
+	const uint64_t mc_start = rte_rdtsc();
+	for (i = 0; i < iterations; i++)
+		rte_ring_mc_dequeue_bulk_elem(r, burst, 8, bulk_sizes[0], NULL);
+	const uint64_t mc_end = rte_rdtsc();
+
+	printf("SC empty dequeue: %.2F\n",
+			(double)(sc_end-sc_start) / iterations);
+	printf("MC empty dequeue: %.2F\n",
+			(double)(mc_end-mc_start) / iterations);
+}
+
+/*
+ * The separate enqueue and dequeue threads each take one input parameter
+ * and return two. Input = burst size, output = cycle average for sp/sc & mp/mc
+ */
+struct thread_params {
+	struct rte_ring *r;
+	unsigned size;        /* input value, the burst size */
+	double spsc, mpmc;    /* output value, the single or multi timings */
+};
+
+/*
+ * Function that uses rdtsc to measure timing for ring enqueue. Needs a paired
+ * thread running the dequeue_bulk function.
+ */
+static int
+enqueue_bulk(void *p)
+{
+	const unsigned iter_shift = 23;
+	const unsigned iterations = 1<<iter_shift;
+	struct thread_params *params = p;
+	struct rte_ring *r = params->r;
+	const unsigned size = params->size;
+	unsigned i;
+	uint32_t burst[MAX_BURST] = {0};
+
+#ifdef RTE_USE_C11_MEM_MODEL
+	if (__atomic_add_fetch(&lcore_count, 1, __ATOMIC_RELAXED) != 2)
+#else
+	if (__sync_add_and_fetch(&lcore_count, 1) != 2)
+#endif
+		while (lcore_count != 2)
+			rte_pause();
+
+	const uint64_t sp_start = rte_rdtsc();
+	for (i = 0; i < iterations; i++)
+		while (rte_ring_sp_enqueue_bulk_elem(r, burst, 8, size, NULL)
+				== 0)
+			rte_pause();
+	const uint64_t sp_end = rte_rdtsc();
+
+	const uint64_t mp_start = rte_rdtsc();
+	for (i = 0; i < iterations; i++)
+		while (rte_ring_mp_enqueue_bulk_elem(r, burst, 8, size, NULL)
+				== 0)
+			rte_pause();
+	const uint64_t mp_end = rte_rdtsc();
+
+	params->spsc = ((double)(sp_end - sp_start))/(iterations*size);
+	params->mpmc = ((double)(mp_end - mp_start))/(iterations*size);
+	return 0;
+}
+
+/*
+ * Function that uses rdtsc to measure timing for ring dequeue. Needs a paired
+ * thread running the enqueue_bulk function.
+ */
+static int
+dequeue_bulk(void *p)
+{
+	const unsigned iter_shift = 23;
+	const unsigned iterations = 1<<iter_shift;
+	struct thread_params *params = p;
+	struct rte_ring *r = params->r;
+	const unsigned size = params->size;
+	unsigned i;
+	uint32_t burst[MAX_BURST] = {0};
+
+#ifdef RTE_USE_C11_MEM_MODEL
+	if (__atomic_add_fetch(&lcore_count, 1, __ATOMIC_RELAXED) != 2)
+#else
+	if (__sync_add_and_fetch(&lcore_count, 1) != 2)
+#endif
+		while (lcore_count != 2)
+			rte_pause();
+
+	const uint64_t sc_start = rte_rdtsc();
+	for (i = 0; i < iterations; i++)
+		while (rte_ring_sc_dequeue_bulk_elem(r, burst, 8, size, NULL)
+				== 0)
+			rte_pause();
+	const uint64_t sc_end = rte_rdtsc();
+
+	const uint64_t mc_start = rte_rdtsc();
+	for (i = 0; i < iterations; i++)
+		while (rte_ring_mc_dequeue_bulk_elem(r, burst, 8, size, NULL)
+				== 0)
+			rte_pause();
+	const uint64_t mc_end = rte_rdtsc();
+
+	params->spsc = ((double)(sc_end - sc_start))/(iterations*size);
+	params->mpmc = ((double)(mc_end - mc_start))/(iterations*size);
+	return 0;
+}
+
+/*
+ * Function that calls the enqueue and dequeue bulk functions on pairs of cores.
+ * Used to measure ring performance between hyperthreads, cores and sockets.
+ */
+static void
+run_on_core_pair(struct lcore_pair *cores, struct rte_ring *r,
+		lcore_function_t f1, lcore_function_t f2)
+{
+	struct thread_params param1 = {0}, param2 = {0};
+	unsigned i;
+	for (i = 0; i < sizeof(bulk_sizes)/sizeof(bulk_sizes[0]); i++) {
+		lcore_count = 0;
+		param1.size = param2.size = bulk_sizes[i];
+		param1.r = param2.r = r;
+		if (cores->c1 == rte_get_master_lcore()) {
+			rte_eal_remote_launch(f2, &param2, cores->c2);
+			f1(&param1);
+			rte_eal_wait_lcore(cores->c2);
+		} else {
+			rte_eal_remote_launch(f1, &param1, cores->c1);
+			rte_eal_remote_launch(f2, &param2, cores->c2);
+			rte_eal_wait_lcore(cores->c1);
+			rte_eal_wait_lcore(cores->c2);
+		}
+		printf("SP/SC bulk enq/dequeue (size: %u): %.2F\n",
+				bulk_sizes[i], param1.spsc + param2.spsc);
+		printf("MP/MC bulk enq/dequeue (size: %u): %.2F\n",
+				bulk_sizes[i], param1.mpmc + param2.mpmc);
+	}
+}
+
+/*
+ * Test function that determines how long an enqueue + dequeue of a single item
+ * takes on a single lcore. Result is for comparison with the bulk enq+deq.
+ */
+static void
+test_single_enqueue_dequeue(struct rte_ring *r)
+{
+	const unsigned iter_shift = 24;
+	const unsigned iterations = 1<<iter_shift;
+	unsigned i = 0;
+	uint32_t burst[2];
+
+	const uint64_t sc_start = rte_rdtsc();
+	for (i = 0; i < iterations; i++) {
+		rte_ring_sp_enqueue_elem(r, burst, 8);
+		rte_ring_sc_dequeue_elem(r, burst, 8);
+	}
+	const uint64_t sc_end = rte_rdtsc();
+
+	const uint64_t mc_start = rte_rdtsc();
+	for (i = 0; i < iterations; i++) {
+		rte_ring_mp_enqueue_elem(r, burst, 8);
+		rte_ring_mc_dequeue_elem(r, burst, 8);
+	}
+	const uint64_t mc_end = rte_rdtsc();
+
+	printf("SP/SC single enq/dequeue: %"PRIu64"\n",
+			(sc_end-sc_start) >> iter_shift);
+	printf("MP/MC single enq/dequeue: %"PRIu64"\n",
+			(mc_end-mc_start) >> iter_shift);
+}
+
+/*
+ * Test that does both enqueue and dequeue on a core using the burst() API calls
+ * instead of the bulk() calls used in other tests. Results should be the same
+ * as for the bulk function called on a single lcore.
+ */
+static void
+test_burst_enqueue_dequeue(struct rte_ring *r)
+{
+	const unsigned iter_shift = 23;
+	const unsigned iterations = 1<<iter_shift;
+	unsigned sz, i = 0;
+	uint32_t burst[MAX_BURST] = {0};
+
+	for (sz = 0; sz < sizeof(bulk_sizes)/sizeof(bulk_sizes[0]); sz++) {
+		const uint64_t sc_start = rte_rdtsc();
+		for (i = 0; i < iterations; i++) {
+			rte_ring_sp_enqueue_burst_elem(r, burst, 8,
+					bulk_sizes[sz], NULL);
+			rte_ring_sc_dequeue_burst_elem(r, burst, 8,
+					bulk_sizes[sz], NULL);
+		}
+		const uint64_t sc_end = rte_rdtsc();
+
+		const uint64_t mc_start = rte_rdtsc();
+		for (i = 0; i < iterations; i++) {
+			rte_ring_mp_enqueue_burst_elem(r, burst, 8,
+					bulk_sizes[sz], NULL);
+			rte_ring_mc_dequeue_burst_elem(r, burst, 8,
+					bulk_sizes[sz], NULL);
+		}
+		const uint64_t mc_end = rte_rdtsc();
+
+		uint64_t mc_avg = ((mc_end-mc_start) >> iter_shift) /
+					bulk_sizes[sz];
+		uint64_t sc_avg = ((sc_end-sc_start) >> iter_shift) /
+					bulk_sizes[sz];
+
+		printf("SP/SC burst enq/dequeue (size: %u): %"PRIu64"\n",
+				bulk_sizes[sz], sc_avg);
+		printf("MP/MC burst enq/dequeue (size: %u): %"PRIu64"\n",
+				bulk_sizes[sz], mc_avg);
+	}
+}
+
+/* Times enqueue and dequeue on a single lcore */
+static void
+test_bulk_enqueue_dequeue(struct rte_ring *r)
+{
+	const unsigned iter_shift = 23;
+	const unsigned iterations = 1<<iter_shift;
+	unsigned sz, i = 0;
+	uint32_t burst[MAX_BURST] = {0};
+
+	for (sz = 0; sz < sizeof(bulk_sizes)/sizeof(bulk_sizes[0]); sz++) {
+		const uint64_t sc_start = rte_rdtsc();
+		for (i = 0; i < iterations; i++) {
+			rte_ring_sp_enqueue_bulk_elem(r, burst, 8,
+					bulk_sizes[sz], NULL);
+			rte_ring_sc_dequeue_bulk_elem(r, burst, 8,
+					bulk_sizes[sz], NULL);
+		}
+		const uint64_t sc_end = rte_rdtsc();
+
+		const uint64_t mc_start = rte_rdtsc();
+		for (i = 0; i < iterations; i++) {
+			rte_ring_mp_enqueue_bulk_elem(r, burst, 8,
+					bulk_sizes[sz], NULL);
+			rte_ring_mc_dequeue_bulk_elem(r, burst, 8,
+					bulk_sizes[sz], NULL);
+		}
+		const uint64_t mc_end = rte_rdtsc();
+
+		double sc_avg = ((double)(sc_end-sc_start) /
+				(iterations * bulk_sizes[sz]));
+		double mc_avg = ((double)(mc_end-mc_start) /
+				(iterations * bulk_sizes[sz]));
+
+		printf("SP/SC bulk enq/dequeue (size: %u): %.2F\n",
+				bulk_sizes[sz], sc_avg);
+		printf("MP/MC bulk enq/dequeue (size: %u): %.2F\n",
+				bulk_sizes[sz], mc_avg);
+	}
+}
+
+static int
+test_ring_perf_elem(void)
+{
+	struct lcore_pair cores;
+	struct rte_ring *r = NULL;
+
+	r = rte_ring_create_elem(RING_NAME, RING_SIZE, 8, rte_socket_id(), 0);
+	if (r == NULL)
+		return -1;
+
+	printf("### Testing single element and burst enq/deq ###\n");
+	test_single_enqueue_dequeue(r);
+	test_burst_enqueue_dequeue(r);
+
+	printf("\n### Testing empty dequeue ###\n");
+	test_empty_dequeue(r);
+
+	printf("\n### Testing using a single lcore ###\n");
+	test_bulk_enqueue_dequeue(r);
+
+	if (get_two_hyperthreads(&cores) == 0) {
+		printf("\n### Testing using two hyperthreads ###\n");
+		run_on_core_pair(&cores, r, enqueue_bulk, dequeue_bulk);
+	}
+	if (get_two_cores(&cores) == 0) {
+		printf("\n### Testing using two physical cores ###\n");
+		run_on_core_pair(&cores, r, enqueue_bulk, dequeue_bulk);
+	}
+	if (get_two_sockets(&cores) == 0) {
+		printf("\n### Testing using two NUMA nodes ###\n");
+		run_on_core_pair(&cores, r, enqueue_bulk, dequeue_bulk);
+	}
+	rte_ring_free(r);
+	return 0;
+}
+
+REGISTER_TEST_COMMAND(ring_perf_elem_autotest, test_ring_perf_elem);
-- 
2.17.1



* Re: [dpdk-dev] [PATCH v4 1/2] lib/ring: apis to support configurable element size
  2019-10-09  2:47     ` [dpdk-dev] [PATCH v4 1/2] lib/ring: apis to support configurable " Honnappa Nagarahalli
@ 2019-10-11 19:21       ` Honnappa Nagarahalli
  2019-10-14 19:41         ` Ananyev, Konstantin
  0 siblings, 1 reply; 173+ messages in thread
From: Honnappa Nagarahalli @ 2019-10-11 19:21 UTC (permalink / raw)
  To: Honnappa Nagarahalli, olivier.matz, sthemmin, jerinj,
	bruce.richardson, david.marchand, pbhagavatula,
	konstantin.ananyev
  Cc: dev, Dharmik Thakkar, Ruifeng Wang (Arm Technology China),
	Gavin Hu (Arm Technology China),
	stephen, Honnappa Nagarahalli, nd, nd

Hi Bruce, Konstantin, Stephen,
	I would appreciate it if you could provide feedback on this.

Thanks,
Honnappa

> -----Original Message-----
> From: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
> Sent: Tuesday, October 8, 2019 9:47 PM
> To: olivier.matz@6wind.com; sthemmin@microsoft.com; jerinj@marvell.com;
> bruce.richardson@intel.com; david.marchand@redhat.com;
> pbhagavatula@marvell.com; konstantin.ananyev@intel.com; Honnappa
> Nagarahalli <Honnappa.Nagarahalli@arm.com>
> Cc: dev@dpdk.org; Dharmik Thakkar <Dharmik.Thakkar@arm.com>; Ruifeng
> Wang (Arm Technology China) <Ruifeng.Wang@arm.com>; Gavin Hu (Arm
> Technology China) <Gavin.Hu@arm.com>
> Subject: [PATCH v4 1/2] lib/ring: apis to support configurable element size
> 
> Current APIs assume ring elements to be pointers. However, in many use cases,
> the size can be different. Add new APIs to support configurable ring element
> sizes.
> 
> Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
> Reviewed-by: Dharmik Thakkar <dharmik.thakkar@arm.com>
> Reviewed-by: Gavin Hu <gavin.hu@arm.com>
> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
> ---
>  lib/librte_ring/Makefile             |   3 +-
>  lib/librte_ring/meson.build          |   3 +
>  lib/librte_ring/rte_ring.c           |  45 +-
>  lib/librte_ring/rte_ring.h           |   1 +
>  lib/librte_ring/rte_ring_elem.h      | 946 +++++++++++++++++++++++++++
>  lib/librte_ring/rte_ring_version.map |   2 +
>  6 files changed, 991 insertions(+), 9 deletions(-)
>  create mode 100644 lib/librte_ring/rte_ring_elem.h
> 
> diff --git a/lib/librte_ring/Makefile b/lib/librte_ring/Makefile
> index 21a36770d..515a967bb 100644
> --- a/lib/librte_ring/Makefile
> +++ b/lib/librte_ring/Makefile
> @@ -6,7 +6,7 @@ include $(RTE_SDK)/mk/rte.vars.mk
>  # library name
>  LIB = librte_ring.a
> 
> -CFLAGS += $(WERROR_FLAGS) -I$(SRCDIR) -O3
> +CFLAGS += $(WERROR_FLAGS) -I$(SRCDIR) -O3 -DALLOW_EXPERIMENTAL_API
>  LDLIBS += -lrte_eal
> 
>  EXPORT_MAP := rte_ring_version.map
> @@ -18,6 +18,7 @@ SRCS-$(CONFIG_RTE_LIBRTE_RING) := rte_ring.c
> 
>  # install includes
>  SYMLINK-$(CONFIG_RTE_LIBRTE_RING)-include := rte_ring.h \
> +					rte_ring_elem.h \
>  					rte_ring_generic.h \
>  					rte_ring_c11_mem.h
> 
> diff --git a/lib/librte_ring/meson.build b/lib/librte_ring/meson.build
> index ab8b0b469..74219840a 100644
> --- a/lib/librte_ring/meson.build
> +++ b/lib/librte_ring/meson.build
> @@ -6,3 +6,6 @@ sources = files('rte_ring.c')
>  headers = files('rte_ring.h',
>  		'rte_ring_c11_mem.h',
>  		'rte_ring_generic.h')
> +
> +# rte_ring_create_elem and rte_ring_get_memsize_elem are experimental
> +allow_experimental_apis = true
> diff --git a/lib/librte_ring/rte_ring.c b/lib/librte_ring/rte_ring.c
> index d9b308036..6fed3648b 100644
> --- a/lib/librte_ring/rte_ring.c
> +++ b/lib/librte_ring/rte_ring.c
> @@ -33,6 +33,7 @@
>  #include <rte_tailq.h>
> 
>  #include "rte_ring.h"
> +#include "rte_ring_elem.h"
> 
>  TAILQ_HEAD(rte_ring_list, rte_tailq_entry);
> 
> @@ -46,23 +47,42 @@ EAL_REGISTER_TAILQ(rte_ring_tailq)
> 
>  /* return the size of memory occupied by a ring */
>  ssize_t
> -rte_ring_get_memsize(unsigned count)
> +rte_ring_get_memsize_elem(unsigned count, unsigned esize)
>  {
>  	ssize_t sz;
> 
> +	/* Supported esize values are 4/8/16.
> +	 * Others can be added on need basis.
> +	 */
> +	if ((esize != 4) && (esize != 8) && (esize != 16)) {
> +		RTE_LOG(ERR, RING,
> +			"Unsupported esize value. Supported values are 4, 8 and 16\n");
> +
> +		return -EINVAL;
> +	}
> +
>  	/* count must be a power of 2 */
>  	if ((!POWEROF2(count)) || (count > RTE_RING_SZ_MASK )) {
>  		RTE_LOG(ERR, RING,
> -			"Requested size is invalid, must be power of 2, and "
> -			"do not exceed the size limit %u\n", RTE_RING_SZ_MASK);
> +			"Requested number of elements is invalid, must be "
> +			"power of 2, and do not exceed the limit %u\n",
> +			RTE_RING_SZ_MASK);
> +
>  		return -EINVAL;
>  	}
> 
> -	sz = sizeof(struct rte_ring) + count * sizeof(void *);
> +	sz = sizeof(struct rte_ring) + count * esize;
>  	sz = RTE_ALIGN(sz, RTE_CACHE_LINE_SIZE);
>  	return sz;
>  }
> 
> +/* return the size of memory occupied by a ring */
> +ssize_t
> +rte_ring_get_memsize(unsigned count)
> +{
> +	return rte_ring_get_memsize_elem(count, sizeof(void *));
> +}
> +
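
The size calculation in rte_ring_get_memsize_elem() can be modelled outside
DPDK as below. This is a minimal sketch under stated assumptions: hdr_size
stands in for sizeof(struct rte_ring), which depends on the build; the 64-byte
cache line and the 0x7fffffff limit are assumed values; the model also rejects
a count of 0, which is a choice made for this sketch. The names
ring_memsize_model, MODEL_CACHE_LINE and MODEL_SZ_MASK are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>

#define MODEL_CACHE_LINE 64u          /* assumed cache line size */
#define MODEL_SZ_MASK    0x7fffffffu  /* assumed upper bound on count */

/* Model of the size computation: header plus count * esize bytes of
 * element storage, rounded up to a whole number of cache lines.
 */
static ssize_t
ring_memsize_model(unsigned int count, unsigned int esize, size_t hdr_size)
{
	size_t sz;

	/* only 4, 8 and 16 byte elements are accepted */
	if (esize != 4 && esize != 8 && esize != 16)
		return -EINVAL;

	/* count must be a non-zero power of 2 within the limit */
	if (count == 0 || (count & (count - 1)) != 0 || count > MODEL_SZ_MASK)
		return -EINVAL;

	assert(hdr_size > 0);
	sz = hdr_size + (size_t)count * esize;
	/* round up to a multiple of the cache line size */
	sz = (sz + MODEL_CACHE_LINE - 1) & ~(size_t)(MODEL_CACHE_LINE - 1);
	return (ssize_t)sz;
}
```

With a hypothetical 384-byte header, 1024 elements of 4 bytes need 4480 bytes,
half the element storage that the pointer-sized ring would use on a 64-bit
target.
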
>  void
>  rte_ring_reset(struct rte_ring *r)
>  {
> @@ -114,10 +134,10 @@ rte_ring_init(struct rte_ring *r, const char *name, unsigned count,
>  	return 0;
>  }
> 
> -/* create the ring */
> +/* create the ring for a given element size */
>  struct rte_ring *
> -rte_ring_create(const char *name, unsigned count, int socket_id,
> -		unsigned flags)
> +rte_ring_create_elem(const char *name, unsigned count, unsigned esize,
> +		int socket_id, unsigned flags)
>  {
>  	char mz_name[RTE_MEMZONE_NAMESIZE];
>  	struct rte_ring *r;
> @@ -135,7 +155,7 @@ rte_ring_create(const char *name, unsigned count, int socket_id,
>  	if (flags & RING_F_EXACT_SZ)
>  		count = rte_align32pow2(count + 1);
> 
> -	ring_size = rte_ring_get_memsize(count);
> +	ring_size = rte_ring_get_memsize_elem(count, esize);
>  	if (ring_size < 0) {
>  		rte_errno = ring_size;
>  		return NULL;
> @@ -182,6 +202,15 @@ rte_ring_create(const char *name, unsigned count, int socket_id,
>  	return r;
>  }
> 
> +/* create the ring */
> +struct rte_ring *
> +rte_ring_create(const char *name, unsigned count, int socket_id,
> +		unsigned flags)
> +{
> +	return rte_ring_create_elem(name, count, sizeof(void *), socket_id,
> +		flags);
> +}
> +
>  /* free the ring */
>  void
>  rte_ring_free(struct rte_ring *r)
> diff --git a/lib/librte_ring/rte_ring.h b/lib/librte_ring/rte_ring.h
> index 2a9f768a1..18fc5d845 100644
> --- a/lib/librte_ring/rte_ring.h
> +++ b/lib/librte_ring/rte_ring.h
> @@ -216,6 +216,7 @@ int rte_ring_init(struct rte_ring *r, const char *name, unsigned count,
>   */
>  struct rte_ring *rte_ring_create(const char *name, unsigned count,
>  				 int socket_id, unsigned flags);
> +
>  /**
>   * De-allocate all memory used by the ring.
>   *
> diff --git a/lib/librte_ring/rte_ring_elem.h b/lib/librte_ring/rte_ring_elem.h
> new file mode 100644
> index 000000000..860f059ad
> --- /dev/null
> +++ b/lib/librte_ring/rte_ring_elem.h
> @@ -0,0 +1,946 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + *
> + * Copyright (c) 2019 Arm Limited
> + * Copyright (c) 2010-2017 Intel Corporation
> + * Copyright (c) 2007-2009 Kip Macy kmacy@freebsd.org
> + * All rights reserved.
> + * Derived from FreeBSD's bufring.h
> + * Used as BSD-3 Licensed with permission from Kip Macy.
> + */
> +
> +#ifndef _RTE_RING_ELEM_H_
> +#define _RTE_RING_ELEM_H_
> +
> +/**
> + * @file
> + * RTE Ring with flexible element size
> + */
> +
> +#ifdef __cplusplus
> +extern "C" {
> +#endif
> +
> +#include <stdio.h>
> +#include <stdint.h>
> +#include <sys/queue.h>
> +#include <errno.h>
> +#include <rte_common.h>
> +#include <rte_config.h>
> +#include <rte_memory.h>
> +#include <rte_lcore.h>
> +#include <rte_atomic.h>
> +#include <rte_branch_prediction.h>
> +#include <rte_memzone.h>
> +#include <rte_pause.h>
> +
> +#include "rte_ring.h"
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice
> + *
> + * Calculate the memory size needed for a ring with given element size
> + *
> + * This function returns the number of bytes needed for a ring, given
> + * the number of elements in it and the size of the element. This value
> + * is the sum of the size of the structure rte_ring and the size of the
> + * memory needed for storing the elements. The value is aligned to a cache
> + * line size.
> + *
> + * @param count
> + *   The number of elements in the ring (must be a power of 2).
> + * @param esize
> + *   The size of ring element, in bytes. It must be a multiple of 4.
> + *   Currently, sizes 4, 8 and 16 are supported.
> + * @return
> + *   - The memory size needed for the ring on success.
> + *   - -EINVAL if count is not a power of 2 or esize is not supported.
> + */
> +__rte_experimental
> +ssize_t rte_ring_get_memsize_elem(unsigned count, unsigned esize);
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice
> + *
> + * Create a new ring named *name* that stores elements with given size.
> + *
> + * This function uses ``memzone_reserve()`` to allocate memory. Then it
> + * calls rte_ring_init() to initialize an empty ring.
> + *
> + * The new ring size is set to *count*, which must be a power of
> + * two. Water marking is disabled by default. The real usable ring size
> + * is *count-1* instead of *count* to differentiate a full ring from an
> + * empty ring.
> + *
> + * The ring is added in RTE_TAILQ_RING list.
> + *
> + * @param name
> + *   The name of the ring.
> + * @param count
> + *   The number of elements in the ring (must be a power of 2).
> + * @param esize
> + *   The size of ring element, in bytes. It must be a multiple of 4.
> + *   Currently, sizes 4, 8 and 16 are supported.
> + * @param socket_id
> + *   The *socket_id* argument is the socket identifier in case of
> + *   NUMA. The value can be *SOCKET_ID_ANY* if there is no NUMA
> + *   constraint for the reserved zone.
> + * @param flags
> + *   An OR of the following:
> + *    - RING_F_SP_ENQ: If this flag is set, the default behavior when
> + *      using ``rte_ring_enqueue()`` or ``rte_ring_enqueue_bulk()``
> + *      is "single-producer". Otherwise, it is "multi-producers".
> + *    - RING_F_SC_DEQ: If this flag is set, the default behavior when
> + *      using ``rte_ring_dequeue()`` or ``rte_ring_dequeue_bulk()``
> + *      is "single-consumer". Otherwise, it is "multi-consumers".
> + * @return
> + *   On success, the pointer to the new allocated ring. NULL on error with
> + *    rte_errno set appropriately. Possible errno values include:
> + *    - E_RTE_NO_CONFIG - function could not get pointer to rte_config structure
> + *    - E_RTE_SECONDARY - function was called from a secondary process instance
> + *    - EINVAL - count provided is not a power of 2
> + *    - ENOSPC - the maximum number of memzones has already been allocated
> + *    - EEXIST - a memzone with the same name already exists
> + *    - ENOMEM - no appropriate memory area found in which to create memzone
> + */
> +__rte_experimental
> +struct rte_ring *rte_ring_create_elem(const char *name, unsigned count,
> +				unsigned esize, int socket_id, unsigned flags);
> +
> +/* the actual enqueue of pointers on the ring.
> + * Placed here since identical code needed in both
> + * single and multi producer enqueue functions.
> + */
> +#define ENQUEUE_PTRS_ELEM(r, ring_start, prod_head, obj_table, esize, n) do { \
> +	if (esize == 4) \
> +		ENQUEUE_PTRS_32(r, ring_start, prod_head, obj_table, n); \
> +	else if (esize == 8) \
> +		ENQUEUE_PTRS_64(r, ring_start, prod_head, obj_table, n); \
> +	else if (esize == 16) \
> +		ENQUEUE_PTRS_128(r, ring_start, prod_head, obj_table, n); \
> +} while (0)
> +
> +#define ENQUEUE_PTRS_32(r, ring_start, prod_head, obj_table, n) do { \
> +	unsigned int i; \
> +	const uint32_t size = (r)->size; \
> +	uint32_t idx = prod_head & (r)->mask; \
> +	uint32_t *ring = (uint32_t *)ring_start; \
> +	uint32_t *obj = (uint32_t *)obj_table; \
> +	if (likely(idx + n < size)) { \
> +		for (i = 0; i < (n & ((~(unsigned)0x7))); i += 8, idx += 8) { \
> +			ring[idx] = obj[i]; \
> +			ring[idx + 1] = obj[i + 1]; \
> +			ring[idx + 2] = obj[i + 2]; \
> +			ring[idx + 3] = obj[i + 3]; \
> +			ring[idx + 4] = obj[i + 4]; \
> +			ring[idx + 5] = obj[i + 5]; \
> +			ring[idx + 6] = obj[i + 6]; \
> +			ring[idx + 7] = obj[i + 7]; \
> +		} \
> +		switch (n & 0x7) { \
> +		case 7: \
> +			ring[idx++] = obj[i++]; /* fallthrough */ \
> +		case 6: \
> +			ring[idx++] = obj[i++]; /* fallthrough */ \
> +		case 5: \
> +			ring[idx++] = obj[i++]; /* fallthrough */ \
> +		case 4: \
> +			ring[idx++] = obj[i++]; /* fallthrough */ \
> +		case 3: \
> +			ring[idx++] = obj[i++]; /* fallthrough */ \
> +		case 2: \
> +			ring[idx++] = obj[i++]; /* fallthrough */ \
> +		case 1: \
> +			ring[idx++] = obj[i++]; /* fallthrough */ \
> +		} \
> +	} else { \
> +		for (i = 0; idx < size; i++, idx++)\
> +			ring[idx] = obj[i]; \
> +		for (idx = 0; i < n; i++, idx++) \
> +			ring[idx] = obj[i]; \
> +	} \
> +} while (0)
> +
> +#define ENQUEUE_PTRS_64(r, ring_start, prod_head, obj_table, n) do { \
> +	unsigned int i; \
> +	const uint32_t size = (r)->size; \
> +	uint32_t idx = prod_head & (r)->mask; \
> +	uint64_t *ring = (uint64_t *)ring_start; \
> +	uint64_t *obj = (uint64_t *)obj_table; \
> +	if (likely(idx + n < size)) { \
> +		for (i = 0; i < (n & ((~(unsigned)0x3))); i += 4, idx += 4) { \
> +			ring[idx] = obj[i]; \
> +			ring[idx + 1] = obj[i + 1]; \
> +			ring[idx + 2] = obj[i + 2]; \
> +			ring[idx + 3] = obj[i + 3]; \
> +		} \
> +		switch (n & 0x3) { \
> +		case 3: \
> +			ring[idx++] = obj[i++]; /* fallthrough */ \
> +		case 2: \
> +			ring[idx++] = obj[i++]; /* fallthrough */ \
> +		case 1: \
> +			ring[idx++] = obj[i++]; \
> +		} \
> +	} else { \
> +		for (i = 0; idx < size; i++, idx++)\
> +			ring[idx] = obj[i]; \
> +		for (idx = 0; i < n; i++, idx++) \
> +			ring[idx] = obj[i]; \
> +	} \
> +} while (0)
> +
> +#define ENQUEUE_PTRS_128(r, ring_start, prod_head, obj_table, n) do { \
> +	unsigned int i; \
> +	const uint32_t size = (r)->size; \
> +	uint32_t idx = prod_head & (r)->mask; \
> +	__uint128_t *ring = (__uint128_t *)ring_start; \
> +	__uint128_t *obj = (__uint128_t *)obj_table; \
> +	if (likely(idx + n < size)) { \
> +		for (i = 0; i < (n & (~(unsigned)0x1)); i += 2, idx += 2) { \
> +			ring[idx] = obj[i]; \
> +			ring[idx + 1] = obj[i + 1]; \
> +		} \
> +		switch (n & 0x1) { \
> +		case 1: \
> +			ring[idx++] = obj[i++]; \
> +		} \
> +	} else { \
> +		for (i = 0; idx < size; i++, idx++)\
> +			ring[idx] = obj[i]; \
> +		for (idx = 0; i < n; i++, idx++) \
> +			ring[idx] = obj[i]; \
> +	} \
> +} while (0)
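
The enqueue copy for 16-byte elements has two paths: an unrolled copy when
the burst fits before the end of the array, and a two-part copy when it wraps.
A minimal standalone sketch of that shape follows. It assumes a hypothetical
struct elem16 in place of __uint128_t and a hypothetical function name
enqueue_elem16_model. The unrolled loop bound is written as n & ~1 so that
every pair of elements is visited before the odd one is handled.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 16-byte element; the patch uses __uint128_t instead. */
struct elem16 {
	uint64_t lo, hi;
};

/* Copy n elements into a ring of 'size' slots (a power of 2), starting at
 * slot (head & mask) and wrapping to slot 0 when the end is reached.
 */
static void
enqueue_elem16_model(struct elem16 *ring, uint32_t size, uint32_t head,
		const struct elem16 *obj, uint32_t n)
{
	const uint32_t mask = size - 1;
	uint32_t idx = head & mask;
	uint32_t i;

	assert(n <= size);
	if (idx + n < size) {
		/* no wrap: copy pairs, then the odd element if any */
		for (i = 0; i < (n & ~(uint32_t)1); i += 2, idx += 2) {
			ring[idx] = obj[i];
			ring[idx + 1] = obj[i + 1];
		}
		if (n & 1)
			ring[idx] = obj[i];
	} else {
		/* wrap: fill up to the end, then continue from slot 0 */
		for (i = 0; idx < size; i++, idx++)
			ring[idx] = obj[i];
		for (idx = 0; i < n; i++, idx++)
			ring[idx] = obj[i];
	}
}
```

Enqueuing 4 elements at head 6 of an 8-slot ring lands them in slots 6, 7, 0
and 1; enqueuing 4 at head 2 takes the unrolled path and fills slots 2 to 5.
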
> +
> +/* the actual copy of pointers on the ring to obj_table.
> + * Placed here since identical code needed in both
> + * single and multi consumer dequeue functions.
> + */
> +#define DEQUEUE_PTRS_ELEM(r, ring_start, cons_head, obj_table, esize, n) do { \
> +	if (esize == 4) \
> +		DEQUEUE_PTRS_32(r, ring_start, cons_head, obj_table, n); \
> +	else if (esize == 8) \
> +		DEQUEUE_PTRS_64(r, ring_start, cons_head, obj_table, n); \
> +	else if (esize == 16) \
> +		DEQUEUE_PTRS_128(r, ring_start, cons_head, obj_table, n); \
> +} while (0)
> +
> +#define DEQUEUE_PTRS_32(r, ring_start, cons_head, obj_table, n) do { \
> +	unsigned int i; \
> +	uint32_t idx = cons_head & (r)->mask; \
> +	const uint32_t size = (r)->size; \
> +	uint32_t *ring = (uint32_t *)ring_start; \
> +	uint32_t *obj = (uint32_t *)obj_table; \
> +	if (likely(idx + n < size)) { \
> +		for (i = 0; i < (n & (~(unsigned)0x7)); i += 8, idx += 8) {\
> +			obj[i] = ring[idx]; \
> +			obj[i + 1] = ring[idx + 1]; \
> +			obj[i + 2] = ring[idx + 2]; \
> +			obj[i + 3] = ring[idx + 3]; \
> +			obj[i + 4] = ring[idx + 4]; \
> +			obj[i + 5] = ring[idx + 5]; \
> +			obj[i + 6] = ring[idx + 6]; \
> +			obj[i + 7] = ring[idx + 7]; \
> +		} \
> +		switch (n & 0x7) { \
> +		case 7: \
> +			obj[i++] = ring[idx++]; /* fallthrough */ \
> +		case 6: \
> +			obj[i++] = ring[idx++]; /* fallthrough */ \
> +		case 5: \
> +			obj[i++] = ring[idx++]; /* fallthrough */ \
> +		case 4: \
> +			obj[i++] = ring[idx++]; /* fallthrough */ \
> +		case 3: \
> +			obj[i++] = ring[idx++]; /* fallthrough */ \
> +		case 2: \
> +			obj[i++] = ring[idx++]; /* fallthrough */ \
> +		case 1: \
> +			obj[i++] = ring[idx++]; /* fallthrough */ \
> +		} \
> +	} else { \
> +		for (i = 0; idx < size; i++, idx++) \
> +			obj[i] = ring[idx]; \
> +		for (idx = 0; i < n; i++, idx++) \
> +			obj[i] = ring[idx]; \
> +	} \
> +} while (0)
> +
> +#define DEQUEUE_PTRS_64(r, ring_start, cons_head, obj_table, n) do { \
> +	unsigned int i; \
> +	uint32_t idx = cons_head & (r)->mask; \
> +	const uint32_t size = (r)->size; \
> +	uint64_t *ring = (uint64_t *)ring_start; \
> +	uint64_t *obj = (uint64_t *)obj_table; \
> +	if (likely(idx + n < size)) { \
> +		for (i = 0; i < (n & (~(unsigned)0x3)); i += 4, idx += 4) {\
> +			obj[i] = ring[idx]; \
> +			obj[i + 1] = ring[idx + 1]; \
> +			obj[i + 2] = ring[idx + 2]; \
> +			obj[i + 3] = ring[idx + 3]; \
> +		} \
> +		switch (n & 0x3) { \
> +		case 3: \
> +			obj[i++] = ring[idx++]; /* fallthrough */ \
> +		case 2: \
> +			obj[i++] = ring[idx++]; /* fallthrough */ \
> +		case 1: \
> +			obj[i++] = ring[idx++]; \
> +		} \
> +	} else { \
> +		for (i = 0; idx < size; i++, idx++) \
> +			obj[i] = ring[idx]; \
> +		for (idx = 0; i < n; i++, idx++) \
> +			obj[i] = ring[idx]; \
> +	} \
> +} while (0)
> +
> +#define DEQUEUE_PTRS_128(r, ring_start, cons_head, obj_table, n) do { \
> +	unsigned int i; \
> +	uint32_t idx = cons_head & (r)->mask; \
> +	const uint32_t size = (r)->size; \
> +	__uint128_t *ring = (__uint128_t *)ring_start; \
> +	__uint128_t *obj = (__uint128_t *)obj_table; \
> +	if (likely(idx + n < size)) { \
> +		for (i = 0; i < (n & (~(unsigned)0x1)); i += 2, idx += 2) { \
> +			obj[i] = ring[idx]; \
> +			obj[i + 1] = ring[idx + 1]; \
> +		} \
> +		switch (n & 0x1) { \
> +		case 1: \
> +			obj[i++] = ring[idx++]; /* fallthrough */ \
> +		} \
> +	} else { \
> +		for (i = 0; idx < size; i++, idx++) \
> +			obj[i] = ring[idx]; \
> +		for (idx = 0; i < n; i++, idx++) \
> +			obj[i] = ring[idx]; \
> +	} \
> +} while (0)
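
The three size-specific dequeue macros share one shape: copy up to the end of
the array, then copy the remainder from slot 0. A minimal sketch of that shape
as a single function for any element size is shown below. It uses memcpy
instead of typed loads, which is a swapped-in technique for illustration and
not what the patch does; the name dequeue_model is hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Copy n elements of esize bytes out of a ring of 'size' slots (a power
 * of 2) into obj_table, starting at slot (head & (size - 1)) and wrapping
 * to slot 0 at the end of the array.
 */
static void
dequeue_model(const void *ring_start, uint32_t size, uint32_t head,
		void *obj_table, uint32_t esize, uint32_t n)
{
	const uint8_t *ring = ring_start;
	uint8_t *obj = obj_table;
	const uint32_t idx = head & (size - 1);
	uint32_t first;

	assert(n <= size);
	/* number of elements that can be read before the end of the array */
	first = (n < size - idx) ? n : size - idx;

	memcpy(obj, ring + (size_t)idx * esize, (size_t)first * esize);
	/* the remainder, if any, comes from the start of the array */
	memcpy(obj + (size_t)first * esize, ring, (size_t)(n - first) * esize);
}
```

The same function serves 4, 8 and 16 byte elements because only the byte
offsets change; whether it matches the unrolled macros in performance would
need to be measured.
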
> +
> +/* Between load and load. there might be cpu reorder in weak model
> + * (powerpc/arm).
> + * There are 2 choices for the users
> + * 1.use rmb() memory barrier
> + * 2.use one-direction load_acquire/store_release barrier,defined by
> + * CONFIG_RTE_USE_C11_MEM_MODEL=y
> + * It depends on performance test results.
> + * By default, move common functions to rte_ring_generic.h
> + */
> +#ifdef RTE_USE_C11_MEM_MODEL
> +#include "rte_ring_c11_mem.h"
> +#else
> +#include "rte_ring_generic.h"
> +#endif
> +
> +/**
> + * @internal Enqueue several objects on the ring
> + *
> + * @param r
> + *   A pointer to the ring structure.
> + * @param obj_table
> + *   A pointer to a table of void * pointers (objects).
> + * @param esize
> + *   The size of ring element, in bytes. It must be a multiple of 4.
> + *   Currently, sizes 4, 8 and 16 are supported. This should be the same
> + *   as passed while creating the ring, otherwise the results are undefined.
> + * @param n
> + *   The number of objects to add in the ring from the obj_table.
> + * @param behavior
> + *   RTE_RING_QUEUE_FIXED:    Enqueue a fixed number of items to a ring
> + *   RTE_RING_QUEUE_VARIABLE: Enqueue as many items as possible to ring
> + * @param is_sp
> + *   Indicates whether to use single producer or multi-producer head update
> + * @param free_space
> + *   returns the amount of space after the enqueue operation has finished
> + * @return
> + *   Actual number of objects enqueued.
> + *   If behavior == RTE_RING_QUEUE_FIXED, this will be 0 or n only.
> + */
> +static __rte_always_inline unsigned int
> +__rte_ring_do_enqueue_elem(struct rte_ring *r, void * const obj_table,
> +		unsigned int esize, unsigned int n,
> +		enum rte_ring_queue_behavior behavior, unsigned int is_sp,
> +		unsigned int *free_space)
> +{
> +	uint32_t prod_head, prod_next;
> +	uint32_t free_entries;
> +
> +	n = __rte_ring_move_prod_head(r, is_sp, n, behavior,
> +			&prod_head, &prod_next, &free_entries);
> +	if (n == 0)
> +		goto end;
> +
> +	ENQUEUE_PTRS_ELEM(r, &r[1], prod_head, obj_table, esize, n);
> +
> +	update_tail(&r->prod, prod_head, prod_next, is_sp, 1);
> +end:
> +	if (free_space != NULL)
> +		*free_space = free_entries - n;
> +	return n;
> +}
> +
> +/**
> + * @internal Dequeue several objects from the ring
> + *
> + * @param r
> + *   A pointer to the ring structure.
> + * @param obj_table
> + *   A pointer to a table of void * pointers (objects).
> + * @param esize
> + *   The size of ring element, in bytes. It must be a multiple of 4.
> + *   Currently, sizes 4, 8 and 16 are supported. This should be the same
> + *   as passed while creating the ring, otherwise the results are undefined.
> + * @param n
> + *   The number of objects to pull from the ring.
> + * @param behavior
> + *   RTE_RING_QUEUE_FIXED:    Dequeue a fixed number of items from a ring
> + *   RTE_RING_QUEUE_VARIABLE: Dequeue as many items as possible from ring
> + * @param is_sc
> + *   Indicates whether to use single consumer or multi-consumer head update
> + * @param available
> + *   returns the number of remaining ring entries after the dequeue has
> + *   finished
> + * @return
> + *   - Actual number of objects dequeued.
> + *     If behavior == RTE_RING_QUEUE_FIXED, this will be 0 or n only.
> + */
> +static __rte_always_inline unsigned int
> +__rte_ring_do_dequeue_elem(struct rte_ring *r, void *obj_table,
> +		unsigned int esize, unsigned int n,
> +		enum rte_ring_queue_behavior behavior, unsigned int is_sc,
> +		unsigned int *available)
> +{
> +	uint32_t cons_head, cons_next;
> +	uint32_t entries;
> +
> +	n = __rte_ring_move_cons_head(r, (int)is_sc, n, behavior,
> +			&cons_head, &cons_next, &entries);
> +	if (n == 0)
> +		goto end;
> +
> +	DEQUEUE_PTRS_ELEM(r, &r[1], cons_head, obj_table, esize, n);
> +
> +	update_tail(&r->cons, cons_head, cons_next, is_sc, 0);
> +
> +end:
> +	if (available != NULL)
> +		*available = entries - n;
> +	return n;
> +}
> +
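
The difference between the bulk (RTE_RING_QUEUE_FIXED) and burst
(RTE_RING_QUEUE_VARIABLE) variants comes down to how the requested count is
clamped against what the ring can currently accept or supply. The head-move
helpers that do this are not shown in this hunk, so the following is only a
minimal sketch of that rule under that assumption; the names queue_behavior,
clamp_request and enqueue_one_status are hypothetical and are not part of
the patch.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for enum rte_ring_queue_behavior. */
enum queue_behavior { QUEUE_FIXED, QUEUE_VARIABLE };

/*
 * Given a request for n elements and 'avail' free slots (enqueue) or
 * stored entries (dequeue), return how many elements are actually moved.
 * FIXED is all-or-nothing, VARIABLE moves as many as possible.
 */
static unsigned int
clamp_request(unsigned int n, unsigned int avail, enum queue_behavior b)
{
	if (n <= avail)
		return n;
	return (b == QUEUE_FIXED) ? 0 : avail;
}

/*
 * Single-element wrappers in the patch turn the bulk count into a status:
 * 0 on success, -ENOBUFS when nothing could be enqueued.
 */
static int
enqueue_one_status(unsigned int free_slots)
{
	return clamp_request(1, free_slots, QUEUE_FIXED) ? 0 : -ENOBUFS;
}
```

With 5 free slots and a request for 8, the FIXED rule moves nothing while
the VARIABLE rule moves 5, which matches the "0 or n only" note in the
doxygen comments above.
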
> +/**
> + * Enqueue several objects on the ring (multi-producers safe).
> + *
> + * This function uses a "compare and set" instruction to move the
> + * producer index atomically.
> + *
> + * @param r
> + *   A pointer to the ring structure.
> + * @param obj_table
> + *   A pointer to a table of void * pointers (objects).
> + * @param esize
> + *   The size of ring element, in bytes. It must be a multiple of 4.
> + *   Currently, sizes 4, 8 and 16 are supported. This should be the same
> + *   as passed while creating the ring, otherwise the results are undefined.
> + * @param n
> + *   The number of objects to add in the ring from the obj_table.
> + * @param free_space
> + *   if non-NULL, returns the amount of space in the ring after the
> + *   enqueue operation has finished.
> + * @return
> + *   The number of objects enqueued, either 0 or n
> + */
> +static __rte_always_inline unsigned int
> +rte_ring_mp_enqueue_bulk_elem(struct rte_ring *r, void * const obj_table,
> +		unsigned int esize, unsigned int n, unsigned int *free_space)
> +{
> +	return __rte_ring_do_enqueue_elem(r, obj_table, esize, n,
> +			RTE_RING_QUEUE_FIXED, __IS_MP, free_space);
> +}
> +
> +/**
> + * Enqueue several objects on a ring (NOT multi-producers safe).
> + *
> + * @param r
> + *   A pointer to the ring structure.
> + * @param obj_table
> + *   A pointer to a table of void * pointers (objects).
> + * @param esize
> + *   The size of ring element, in bytes. It must be a multiple of 4.
> + *   Currently, sizes 4, 8 and 16 are supported. This should be the same
> + *   as passed while creating the ring, otherwise the results are undefined.
> + * @param n
> + *   The number of objects to add in the ring from the obj_table.
> + * @param free_space
> + *   if non-NULL, returns the amount of space in the ring after the
> + *   enqueue operation has finished.
> + * @return
> + *   The number of objects enqueued, either 0 or n
> + */
> +static __rte_always_inline unsigned int
> +rte_ring_sp_enqueue_bulk_elem(struct rte_ring *r, void * const obj_table,
> +		unsigned int esize, unsigned int n, unsigned int *free_space)
> +{
> +	return __rte_ring_do_enqueue_elem(r, obj_table, esize, n,
> +			RTE_RING_QUEUE_FIXED, __IS_SP, free_space);
> +}
> +
> +/**
> + * Enqueue several objects on a ring.
> + *
> + * This function calls the multi-producer or the single-producer
> + * version depending on the default behavior that was specified at
> + * ring creation time (see flags).
> + *
> + * @param r
> + *   A pointer to the ring structure.
> + * @param obj_table
> + *   A pointer to a table of void * pointers (objects).
> + * @param esize
> + *   The size of ring element, in bytes. It must be a multiple of 4.
> + *   Currently, sizes 4, 8 and 16 are supported. This should be the same
> + *   as passed while creating the ring, otherwise the results are undefined.
> + * @param n
> + *   The number of objects to add in the ring from the obj_table.
> + * @param free_space
> + *   if non-NULL, returns the amount of space in the ring after the
> + *   enqueue operation has finished.
> + * @return
> + *   The number of objects enqueued, either 0 or n
> + */
> +static __rte_always_inline unsigned int
> +rte_ring_enqueue_bulk_elem(struct rte_ring *r, void * const obj_table,
> +		unsigned int esize, unsigned int n, unsigned int *free_space)
> +{
> +	return __rte_ring_do_enqueue_elem(r, obj_table, esize, n,
> +			RTE_RING_QUEUE_FIXED, r->prod.single, free_space);
> +}
> +
> +/**
> + * Enqueue one object on a ring (multi-producers safe).
> + *
> + * This function uses a "compare and set" instruction to move the
> + * producer index atomically.
> + *
> + * @param r
> + *   A pointer to the ring structure.
> + * @param obj
> + *   A pointer to the object to be added.
> + * @param esize
> + *   The size of ring element, in bytes. It must be a multiple of 4.
> + *   Currently, sizes 4, 8 and 16 are supported. This should be the same
> + *   as passed while creating the ring, otherwise the results are undefined.
> + * @return
> + *   - 0: Success; objects enqueued.
> + *   - -ENOBUFS: Not enough room in the ring to enqueue; no object is
> + *     enqueued.
> + */
> +static __rte_always_inline int
> +rte_ring_mp_enqueue_elem(struct rte_ring *r, void *obj, unsigned int esize)
> +{
> +	return rte_ring_mp_enqueue_bulk_elem(r, obj, esize, 1, NULL) ? 0 :
> +								-ENOBUFS;
> +}
> +
> +/**
> + * Enqueue one object on a ring (NOT multi-producers safe).
> + *
> + * @param r
> + *   A pointer to the ring structure.
> + * @param obj
> + *   A pointer to the object to be added.
> + * @param esize
> + *   The size of ring element, in bytes. It must be a multiple of 4.
> + *   Currently, sizes 4, 8 and 16 are supported. This should be the same
> + *   as passed while creating the ring, otherwise the results are undefined.
> + * @return
> + *   - 0: Success; objects enqueued.
> + *   - -ENOBUFS: Not enough room in the ring to enqueue; no object is
> + *     enqueued.
> + */
> +static __rte_always_inline int
> +rte_ring_sp_enqueue_elem(struct rte_ring *r, void *obj, unsigned int esize)
> +{
> +	return rte_ring_sp_enqueue_bulk_elem(r, obj, esize, 1, NULL) ? 0 :
> +								-ENOBUFS;
> +}
> +
> +/**
> + * Enqueue one object on a ring.
> + *
> + * This function calls the multi-producer or the single-producer
> + * version, depending on the default behaviour that was specified at
> + * ring creation time (see flags).
> + *
> + * @param r
> + *   A pointer to the ring structure.
> + * @param obj
> + *   A pointer to the object to be added.
> + * @param esize
> + *   The size of ring element, in bytes. It must be a multiple of 4.
> + *   Currently, sizes 4, 8 and 16 are supported. This should be the same
> + *   as passed while creating the ring, otherwise the results are undefined.
> + * @return
> + *   - 0: Success; objects enqueued.
> + *   - -ENOBUFS: Not enough room in the ring to enqueue; no object is
> + *     enqueued.
> + */
> +static __rte_always_inline int
> +rte_ring_enqueue_elem(struct rte_ring *r, void *obj, unsigned int esize)
> +{
> +	return rte_ring_enqueue_bulk_elem(r, obj, esize, 1, NULL) ? 0 :
> +								-ENOBUFS;
> +}
> +
> +/**
> + * Dequeue several objects from a ring (multi-consumers safe).
> + *
> + * This function uses a "compare and set" instruction to move the
> + * consumer index atomically.
> + *
> + * @param r
> + *   A pointer to the ring structure.
> + * @param obj_table
> + *   A pointer to a table of void * pointers (objects) that will be filled.
> + * @param esize
> + *   The size of ring element, in bytes. It must be a multiple of 4.
> + *   Currently, sizes 4, 8 and 16 are supported. This should be the same
> + *   as passed while creating the ring, otherwise the results are undefined.
> + * @param n
> + *   The number of objects to dequeue from the ring to the obj_table.
> + * @param available
> + *   If non-NULL, returns the number of remaining ring entries after the
> + *   dequeue has finished.
> + * @return
> + *   The number of objects dequeued, either 0 or n
> + */
> +static __rte_always_inline unsigned int
> +rte_ring_mc_dequeue_bulk_elem(struct rte_ring *r, void *obj_table,
> +		unsigned int esize, unsigned int n, unsigned int *available)
> +{
> +	return __rte_ring_do_dequeue_elem(r, obj_table, esize, n,
> +				RTE_RING_QUEUE_FIXED, __IS_MC, available);
> +}
> +
> +/**
> + * Dequeue several objects from a ring (NOT multi-consumers safe).
> + *
> + * @param r
> + *   A pointer to the ring structure.
> + * @param obj_table
> + *   A pointer to a table of void * pointers (objects) that will be filled.
> + * @param esize
> + *   The size of ring element, in bytes. It must be a multiple of 4.
> + *   Currently, sizes 4, 8 and 16 are supported. This should be the same
> + *   as passed while creating the ring, otherwise the results are undefined.
> + * @param n
> + *   The number of objects to dequeue from the ring to the obj_table,
> + *   must be strictly positive.
> + * @param available
> + *   If non-NULL, returns the number of remaining ring entries after the
> + *   dequeue has finished.
> + * @return
> + *   The number of objects dequeued, either 0 or n
> + */
> +static __rte_always_inline unsigned int
> +rte_ring_sc_dequeue_bulk_elem(struct rte_ring *r, void *obj_table,
> +		unsigned int esize, unsigned int n, unsigned int *available)
> +{
> +	return __rte_ring_do_dequeue_elem(r, obj_table, esize, n,
> +			RTE_RING_QUEUE_FIXED, __IS_SC, available);
> +}
> +
> +/**
> + * Dequeue several objects from a ring.
> + *
> + * This function calls the multi-consumers or the single-consumer
> + * version, depending on the default behaviour that was specified at
> + * ring creation time (see flags).
> + *
> + * @param r
> + *   A pointer to the ring structure.
> + * @param obj_table
> + *   A pointer to a table of void * pointers (objects) that will be filled.
> + * @param esize
> + *   The size of ring element, in bytes. It must be a multiple of 4.
> + *   Currently, sizes 4, 8 and 16 are supported. This should be the same
> + *   as passed while creating the ring, otherwise the results are undefined.
> + * @param n
> + *   The number of objects to dequeue from the ring to the obj_table.
> + * @param available
> + *   If non-NULL, returns the number of remaining ring entries after the
> + *   dequeue has finished.
> + * @return
> + *   The number of objects dequeued, either 0 or n
> + */
> +static __rte_always_inline unsigned int
> +rte_ring_dequeue_bulk_elem(struct rte_ring *r, void *obj_table,
> +		unsigned int esize, unsigned int n, unsigned int *available)
> +{
> +	return __rte_ring_do_dequeue_elem(r, obj_table, esize, n,
> +			RTE_RING_QUEUE_FIXED, r->cons.single, available);
> +}
> +
> +/**
> + * Dequeue one object from a ring (multi-consumers safe).
> + *
> + * This function uses a "compare and set" instruction to move the
> + * consumer index atomically.
> + *
> + * @param r
> + *   A pointer to the ring structure.
> + * @param obj_p
> + *   A pointer to a void * pointer (object) that will be filled.
> + * @param esize
> + *   The size of ring element, in bytes. It must be a multiple of 4.
> + *   Currently, sizes 4, 8 and 16 are supported. This should be the same
> + *   as passed while creating the ring, otherwise the results are undefined.
> + * @return
> + *   - 0: Success; objects dequeued.
> + *   - -ENOENT: Not enough entries in the ring to dequeue; no object is
> + *     dequeued.
> + */
> +static __rte_always_inline int
> +rte_ring_mc_dequeue_elem(struct rte_ring *r, void *obj_p,
> +				unsigned int esize)
> +{
> +	return rte_ring_mc_dequeue_bulk_elem(r, obj_p, esize, 1, NULL)  ? 0 :
> +								-ENOENT;
> +}
> +
> +/**
> + * Dequeue one object from a ring (NOT multi-consumers safe).
> + *
> + * @param r
> + *   A pointer to the ring structure.
> + * @param obj_p
> + *   A pointer to a void * pointer (object) that will be filled.
> + * @param esize
> + *   The size of ring element, in bytes. It must be a multiple of 4.
> + *   Currently, sizes 4, 8 and 16 are supported. This should be the same
> + *   as passed while creating the ring, otherwise the results are undefined.
> + * @return
> + *   - 0: Success; objects dequeued.
> + *   - -ENOENT: Not enough entries in the ring to dequeue, no object is
> + *     dequeued.
> + */
> +static __rte_always_inline int
> +rte_ring_sc_dequeue_elem(struct rte_ring *r, void *obj_p,
> +				unsigned int esize)
> +{
> +	return rte_ring_sc_dequeue_bulk_elem(r, obj_p, esize, 1, NULL) ? 0 :
> +								-ENOENT;
> +}
> +
> +/**
> + * Dequeue one object from a ring.
> + *
> + * This function calls the multi-consumers or the single-consumer
> + * version depending on the default behaviour that was specified at
> + * ring creation time (see flags).
> + *
> + * @param r
> + *   A pointer to the ring structure.
> + * @param obj_p
> + *   A pointer to a void * pointer (object) that will be filled.
> + * @param esize
> + *   The size of ring element, in bytes. It must be a multiple of 4.
> + *   Currently, sizes 4, 8 and 16 are supported. This should be the same
> + *   as passed while creating the ring, otherwise the results are undefined.
> + * @return
> + *   - 0: Success, objects dequeued.
> + *   - -ENOENT: Not enough entries in the ring to dequeue, no object is
> + *     dequeued.
> + */
> +static __rte_always_inline int
> +rte_ring_dequeue_elem(struct rte_ring *r, void *obj_p, unsigned int esize)
> +{
> +	return rte_ring_dequeue_bulk_elem(r, obj_p, esize, 1, NULL) ? 0 :
> +								-ENOENT;
> +}
> +
> +/**
> + * Enqueue several objects on the ring (multi-producers safe).
> + *
> + * This function uses a "compare and set" instruction to move the
> + * producer index atomically.
> + *
> + * @param r
> + *   A pointer to the ring structure.
> + * @param obj_table
> + *   A pointer to a table of void * pointers (objects).
> + * @param esize
> + *   The size of ring element, in bytes. It must be a multiple of 4.
> + *   Currently, sizes 4, 8 and 16 are supported. This should be the same
> + *   as passed while creating the ring, otherwise the results are undefined.
> + * @param n
> + *   The number of objects to add in the ring from the obj_table.
> + * @param free_space
> + *   if non-NULL, returns the amount of space in the ring after the
> + *   enqueue operation has finished.
> + * @return
> + *   - n: Actual number of objects enqueued.
> + */
> +static __rte_always_inline unsigned
> +rte_ring_mp_enqueue_burst_elem(struct rte_ring *r, void * const obj_table,
> +		unsigned int esize, unsigned int n, unsigned int *free_space)
> +{
> +	return __rte_ring_do_enqueue_elem(r, obj_table, esize, n,
> +			RTE_RING_QUEUE_VARIABLE, __IS_MP, free_space);
> +}
> +
> +/**
> + * Enqueue several objects on a ring (NOT multi-producers safe).
> + *
> + * @param r
> + *   A pointer to the ring structure.
> + * @param obj_table
> + *   A pointer to a table of void * pointers (objects).
> + * @param esize
> + *   The size of ring element, in bytes. It must be a multiple of 4.
> + *   Currently, sizes 4, 8 and 16 are supported. This should be the same
> + *   as passed while creating the ring, otherwise the results are undefined.
> + * @param n
> + *   The number of objects to add in the ring from the obj_table.
> + * @param free_space
> + *   if non-NULL, returns the amount of space in the ring after the
> + *   enqueue operation has finished.
> + * @return
> + *   - n: Actual number of objects enqueued.
> + */
> +static __rte_always_inline unsigned
> +rte_ring_sp_enqueue_burst_elem(struct rte_ring *r, void * const obj_table,
> +		unsigned int esize, unsigned int n, unsigned int *free_space)
> +{
> +	return __rte_ring_do_enqueue_elem(r, obj_table, esize, n,
> +			RTE_RING_QUEUE_VARIABLE, __IS_SP, free_space);
> +}
> +
> +/**
> + * Enqueue several objects on a ring.
> + *
> + * This function calls the multi-producer or the single-producer
> + * version depending on the default behavior that was specified at
> + * ring creation time (see flags).
> + *
> + * @param r
> + *   A pointer to the ring structure.
> + * @param obj_table
> + *   A pointer to a table of void * pointers (objects).
> + * @param esize
> + *   The size of ring element, in bytes. It must be a multiple of 4.
> + *   Currently, sizes 4, 8 and 16 are supported. This should be the same
> + *   as passed while creating the ring, otherwise the results are undefined.
> + * @param n
> + *   The number of objects to add in the ring from the obj_table.
> + * @param free_space
> + *   if non-NULL, returns the amount of space in the ring after the
> + *   enqueue operation has finished.
> + * @return
> + *   - n: Actual number of objects enqueued.
> + */
> +static __rte_always_inline unsigned
> +rte_ring_enqueue_burst_elem(struct rte_ring *r, void * const obj_table,
> +		unsigned int esize, unsigned int n, unsigned int *free_space)
> +{
> +	return __rte_ring_do_enqueue_elem(r, obj_table, esize, n,
> +			RTE_RING_QUEUE_VARIABLE, r->prod.single, free_space);
> +}
> +
> +/**
> + * Dequeue several objects from a ring (multi-consumers safe). When the
> + * requested objects are more than the available objects, only dequeue the
> + * actual number of objects.
> + *
> + * This function uses a "compare and set" instruction to move the
> + * consumer index atomically.
> + *
> + * @param r
> + *   A pointer to the ring structure.
> + * @param obj_table
> + *   A pointer to a table of void * pointers (objects) that will be filled.
> + * @param esize
> + *   The size of ring element, in bytes. It must be a multiple of 4.
> + *   Currently, sizes 4, 8 and 16 are supported. This should be the same
> + *   as passed while creating the ring, otherwise the results are undefined.
> + * @param n
> + *   The number of objects to dequeue from the ring to the obj_table.
> + * @param available
> + *   If non-NULL, returns the number of remaining ring entries after the
> + *   dequeue has finished.
> + * @return
> + *   - n: Actual number of objects dequeued, 0 if ring is empty
> + */
> +static __rte_always_inline unsigned
> +rte_ring_mc_dequeue_burst_elem(struct rte_ring *r, void *obj_table,
> +		unsigned int esize, unsigned int n, unsigned int *available)
> +{
> +	return __rte_ring_do_dequeue_elem(r, obj_table, esize, n,
> +			RTE_RING_QUEUE_VARIABLE, __IS_MC, available);
> +}
> +
> +/**
> + * Dequeue several objects from a ring (NOT multi-consumers safe). When the
> + * requested objects are more than the available objects, only dequeue the
> + * actual number of objects.
> + *
> + * @param r
> + *   A pointer to the ring structure.
> + * @param obj_table
> + *   A pointer to a table of void * pointers (objects) that will be filled.
> + * @param esize
> + *   The size of ring element, in bytes. It must be a multiple of 4.
> + *   Currently, sizes 4, 8 and 16 are supported. This should be the same
> + *   as passed while creating the ring, otherwise the results are undefined.
> + * @param n
> + *   The number of objects to dequeue from the ring to the obj_table.
> + * @param available
> + *   If non-NULL, returns the number of remaining ring entries after the
> + *   dequeue has finished.
> + * @return
> + *   - n: Actual number of objects dequeued, 0 if ring is empty
> + */
> +static __rte_always_inline unsigned
> +rte_ring_sc_dequeue_burst_elem(struct rte_ring *r, void *obj_table,
> +		unsigned int esize, unsigned int n, unsigned int *available)
> +{
> +	return __rte_ring_do_dequeue_elem(r, obj_table, esize, n,
> +			RTE_RING_QUEUE_VARIABLE, __IS_SC, available);
> +}
> +
> +/**
> + * Dequeue multiple objects from a ring up to a maximum number.
> + *
> + * This function calls the multi-consumers or the single-consumer
> + * version, depending on the default behaviour that was specified at
> + * ring creation time (see flags).
> + *
> + * @param r
> + *   A pointer to the ring structure.
> + * @param obj_table
> + *   A pointer to a table of void * pointers (objects) that will be filled.
> + * @param esize
> + *   The size of ring element, in bytes. It must be a multiple of 4.
> + *   Currently, sizes 4, 8 and 16 are supported. This should be the same
> + *   as passed while creating the ring, otherwise the results are undefined.
> + * @param n
> + *   The number of objects to dequeue from the ring to the obj_table.
> + * @param available
> + *   If non-NULL, returns the number of remaining ring entries after the
> + *   dequeue has finished.
> + * @return
> + *   - Number of objects dequeued
> + */
> +static __rte_always_inline unsigned
> +rte_ring_dequeue_burst_elem(struct rte_ring *r, void *obj_table,
> +		unsigned int esize, unsigned int n, unsigned int *available)
> +{
> +	return __rte_ring_do_dequeue_elem(r, obj_table, esize, n,
> +				RTE_RING_QUEUE_VARIABLE,
> +				r->cons.single, available);
> +}
> +
> +#ifdef __cplusplus
> +}
> +#endif
> +
> +#endif /* _RTE_RING_ELEM_H_ */
> diff --git a/lib/librte_ring/rte_ring_version.map b/lib/librte_ring/rte_ring_version.map
> index 510c1386e..e410a7503 100644
> --- a/lib/librte_ring/rte_ring_version.map
> +++ b/lib/librte_ring/rte_ring_version.map
> @@ -21,6 +21,8 @@ DPDK_2.2 {
>  EXPERIMENTAL {
>  	global:
> 
> +	rte_ring_create_elem;
> +	rte_ring_get_memsize_elem;
>  	rte_ring_reset;
> 
>  };
> --
> 2.17.1



* Re: [dpdk-dev] [PATCH v4 1/2] lib/ring: apis to support configurable element size
  2019-10-11 19:21       ` Honnappa Nagarahalli
@ 2019-10-14 19:41         ` Ananyev, Konstantin
  2019-10-14 23:56           ` Honnappa Nagarahalli
  0 siblings, 1 reply; 173+ messages in thread
From: Ananyev, Konstantin @ 2019-10-14 19:41 UTC (permalink / raw)
  To: Honnappa Nagarahalli, olivier.matz, sthemmin, jerinj, Richardson,
	Bruce, david.marchand, pbhagavatula
  Cc: dev, Dharmik Thakkar, Ruifeng Wang (Arm Technology China),
	Gavin Hu (Arm Technology China),
	stephen, nd, nd


> >
> > Current APIs assume ring elements to be pointers. However, in many use cases,
> > the size can be different. Add new APIs to support configurable ring element
> > sizes.
> >
> > Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
> > Reviewed-by: Dharmik Thakkar <dharmik.thakkar@arm.com>
> > Reviewed-by: Gavin Hu <gavin.hu@arm.com>
> > Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
> > ---
> >  lib/librte_ring/Makefile             |   3 +-
> >  lib/librte_ring/meson.build          |   3 +
> >  lib/librte_ring/rte_ring.c           |  45 +-
> >  lib/librte_ring/rte_ring.h           |   1 +
> >  lib/librte_ring/rte_ring_elem.h      | 946 +++++++++++++++++++++++++++
> >  lib/librte_ring/rte_ring_version.map |   2 +
> >  6 files changed, 991 insertions(+), 9 deletions(-)
> >  create mode 100644 lib/librte_ring/rte_ring_elem.h
> >
> > diff --git a/lib/librte_ring/Makefile b/lib/librte_ring/Makefile
> > index 21a36770d..515a967bb 100644
> > --- a/lib/librte_ring/Makefile
> > +++ b/lib/librte_ring/Makefile
> > @@ -6,7 +6,7 @@ include $(RTE_SDK)/mk/rte.vars.mk
> >  # library name
> >  LIB = librte_ring.a
> >
> > -CFLAGS += $(WERROR_FLAGS) -I$(SRCDIR) -O3
> > +CFLAGS += $(WERROR_FLAGS) -I$(SRCDIR) -O3 -DALLOW_EXPERIMENTAL_API
> >  LDLIBS += -lrte_eal
> >
> >  EXPORT_MAP := rte_ring_version.map
> > @@ -18,6 +18,7 @@ SRCS-$(CONFIG_RTE_LIBRTE_RING) := rte_ring.c
> >
> >  # install includes
> >  SYMLINK-$(CONFIG_RTE_LIBRTE_RING)-include := rte_ring.h \
> > +					rte_ring_elem.h \
> >  					rte_ring_generic.h \
> >  					rte_ring_c11_mem.h
> >
> > diff --git a/lib/librte_ring/meson.build b/lib/librte_ring/meson.build
> > index ab8b0b469..74219840a 100644
> > --- a/lib/librte_ring/meson.build
> > +++ b/lib/librte_ring/meson.build
> > @@ -6,3 +6,6 @@ sources = files('rte_ring.c')
> >  headers = files('rte_ring.h',
> >  		'rte_ring_c11_mem.h',
> >  		'rte_ring_generic.h')
> > +
> > +# rte_ring_create_elem and rte_ring_get_memsize_elem are experimental
> > +allow_experimental_apis = true
> > diff --git a/lib/librte_ring/rte_ring.c b/lib/librte_ring/rte_ring.c
> > index d9b308036..6fed3648b 100644
> > --- a/lib/librte_ring/rte_ring.c
> > +++ b/lib/librte_ring/rte_ring.c
> > @@ -33,6 +33,7 @@
> >  #include <rte_tailq.h>
> >
> >  #include "rte_ring.h"
> > +#include "rte_ring_elem.h"
> >
> >  TAILQ_HEAD(rte_ring_list, rte_tailq_entry);
> >
> > @@ -46,23 +47,42 @@ EAL_REGISTER_TAILQ(rte_ring_tailq)
> >
> >  /* return the size of memory occupied by a ring */
> >  ssize_t
> > -rte_ring_get_memsize(unsigned count)
> > +rte_ring_get_memsize_elem(unsigned count, unsigned esize)
> >  {
> >  	ssize_t sz;
> >
> > +	/* Supported esize values are 4/8/16.
> > +	 * Others can be added on need basis.
> > +	 */
> > +	if ((esize != 4) && (esize != 8) && (esize != 16)) {
> > +		RTE_LOG(ERR, RING,
> > +			"Unsupported esize value. Supported values are 4, 8 and 16\n");
> > +
> > +		return -EINVAL;
> > +	}
> > +
> >  	/* count must be a power of 2 */
> >  	if ((!POWEROF2(count)) || (count > RTE_RING_SZ_MASK )) {
> >  		RTE_LOG(ERR, RING,
> > -			"Requested size is invalid, must be power of 2, and "
> > -			"do not exceed the size limit %u\n", RTE_RING_SZ_MASK);
> > +			"Requested number of elements is invalid, must be "
> > +			"power of 2, and do not exceed the limit %u\n",
> > +			RTE_RING_SZ_MASK);
> > +
> >  		return -EINVAL;
> >  	}
> >
> > -	sz = sizeof(struct rte_ring) + count * sizeof(void *);
> > +	sz = sizeof(struct rte_ring) + count * esize;
> >  	sz = RTE_ALIGN(sz, RTE_CACHE_LINE_SIZE);
> >  	return sz;
> >  }
> >
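
The size computation in rte_ring_get_memsize_elem() is the ring header plus
count * esize, rounded up to a cache line. The sketch below reproduces that
arithmetic in isolation so the numbers can be checked by hand. It is only an
illustration: SKETCH_HDR_SIZE, SKETCH_CACHE_LINE and SKETCH_SZ_MASK are
assumed values standing in for sizeof(struct rte_ring), RTE_CACHE_LINE_SIZE
and RTE_RING_SZ_MASK, and unlike the POWEROF2() check in the patch this
version also rejects count == 0.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>	/* ssize_t */

#define SKETCH_CACHE_LINE 64u		/* assumed cache line size */
#define SKETCH_HDR_SIZE   384u		/* hypothetical header size */
#define SKETCH_SZ_MASK    0x7fffffffu	/* assumed element count limit */

static ssize_t
sketch_memsize_elem(unsigned int count, unsigned int esize)
{
	size_t sz;

	/* Only element sizes 4, 8 and 16 are accepted, as in the patch. */
	if (esize != 4 && esize != 8 && esize != 16)
		return -EINVAL;

	/* count must be a non-zero power of 2 within the limit. */
	if (count == 0 || (count & (count - 1)) != 0 ||
			count > SKETCH_SZ_MASK)
		return -EINVAL;

	sz = SKETCH_HDR_SIZE + (size_t)count * esize;
	/* Round up to the next multiple of the cache line size. */
	sz = (sz + SKETCH_CACHE_LINE - 1) & ~(size_t)(SKETCH_CACHE_LINE - 1);
	return (ssize_t)sz;
}
```

Under these assumed constants, a ring of 1024 4-byte elements needs
384 + 4096 = 4480 bytes, roughly half the element storage of the same ring
holding 8-byte pointers.
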
> > +/* return the size of memory occupied by a ring */
> > +ssize_t
> > +rte_ring_get_memsize(unsigned count)
> > +{
> > +	return rte_ring_get_memsize_elem(count, sizeof(void *));
> > +}
> > +
> >  void
> >  rte_ring_reset(struct rte_ring *r)
> >  {
> > @@ -114,10 +134,10 @@ rte_ring_init(struct rte_ring *r, const char *name, unsigned count,
> >  	return 0;
> >  }
> >
> > -/* create the ring */
> > +/* create the ring for a given element size */
> >  struct rte_ring *
> > -rte_ring_create(const char *name, unsigned count, int socket_id,
> > -		unsigned flags)
> > +rte_ring_create_elem(const char *name, unsigned count, unsigned esize,
> > +		int socket_id, unsigned flags)
> >  {
> >  	char mz_name[RTE_MEMZONE_NAMESIZE];
> >  	struct rte_ring *r;
> > @@ -135,7 +155,7 @@ rte_ring_create(const char *name, unsigned count, int socket_id,
> >  	if (flags & RING_F_EXACT_SZ)
> >  		count = rte_align32pow2(count + 1);
> >
> > -	ring_size = rte_ring_get_memsize(count);
> > +	ring_size = rte_ring_get_memsize_elem(count, esize);
> >  	if (ring_size < 0) {
> >  		rte_errno = ring_size;
> >  		return NULL;
> > @@ -182,6 +202,15 @@ rte_ring_create(const char *name, unsigned count, int socket_id,
> >  	return r;
> >  }
> >
> > +/* create the ring */
> > +struct rte_ring *
> > +rte_ring_create(const char *name, unsigned count, int socket_id,
> > +		unsigned flags)
> > +{
> > +	return rte_ring_create_elem(name, count, sizeof(void *), socket_id,
> > +		flags);
> > +}
> > +
> >  /* free the ring */
> >  void
> >  rte_ring_free(struct rte_ring *r)
> > diff --git a/lib/librte_ring/rte_ring.h b/lib/librte_ring/rte_ring.h
> > index 2a9f768a1..18fc5d845 100644
> > --- a/lib/librte_ring/rte_ring.h
> > +++ b/lib/librte_ring/rte_ring.h
> > @@ -216,6 +216,7 @@ int rte_ring_init(struct rte_ring *r, const char *name, unsigned count,
> >   */
> >  struct rte_ring *rte_ring_create(const char *name, unsigned count,
> >  				 int socket_id, unsigned flags);
> > +
> >  /**
> >   * De-allocate all memory used by the ring.
> >   *
> > diff --git a/lib/librte_ring/rte_ring_elem.h b/lib/librte_ring/rte_ring_elem.h
> > new file mode 100644
> > index 000000000..860f059ad
> > --- /dev/null
> > +++ b/lib/librte_ring/rte_ring_elem.h
> > @@ -0,0 +1,946 @@
> > +/* SPDX-License-Identifier: BSD-3-Clause
> > + *
> > + * Copyright (c) 2019 Arm Limited
> > + * Copyright (c) 2010-2017 Intel Corporation
> > + * Copyright (c) 2007-2009 Kip Macy kmacy@freebsd.org
> > + * All rights reserved.
> > + * Derived from FreeBSD's bufring.h
> > + * Used as BSD-3 Licensed with permission from Kip Macy.
> > + */
> > +
> > +#ifndef _RTE_RING_ELEM_H_
> > +#define _RTE_RING_ELEM_H_
> > +
> > +/**
> > + * @file
> > + * RTE Ring with flexible element size
> > + */
> > +
> > +#ifdef __cplusplus
> > +extern "C" {
> > +#endif
> > +
> > +#include <stdio.h>
> > +#include <stdint.h>
> > +#include <sys/queue.h>
> > +#include <errno.h>
> > +#include <rte_common.h>
> > +#include <rte_config.h>
> > +#include <rte_memory.h>
> > +#include <rte_lcore.h>
> > +#include <rte_atomic.h>
> > +#include <rte_branch_prediction.h>
> > +#include <rte_memzone.h>
> > +#include <rte_pause.h>
> > +
> > +#include "rte_ring.h"
> > +
> > +/**
> > + * @warning
> > + * @b EXPERIMENTAL: this API may change without prior notice
> > + *
> > + * Calculate the memory size needed for a ring with given element size
> > + *
> > + * This function returns the number of bytes needed for a ring, given
> > + * the number of elements in it and the size of the element. This value
> > + * is the sum of the size of the structure rte_ring and the size of the
> > + * memory needed for storing the elements. The value is aligned to a cache
> > + * line size.
> > + *
> > + * @param count
> > + *   The number of elements in the ring (must be a power of 2).
> > + * @param esize
> > + *   The size of ring element, in bytes. It must be a multiple of 4.
> > + *   Currently, sizes 4, 8 and 16 are supported.
> > + * @return
> > + *   - The memory size needed for the ring on success.
> > + *   - -EINVAL if count is not a power of 2 or exceeds the size limit,
> > + *     or if esize is not a supported value.
> > + */
> > +__rte_experimental
> > +ssize_t rte_ring_get_memsize_elem(unsigned count, unsigned esize);
> > +
> > +/**
> > + * @warning
> > + * @b EXPERIMENTAL: this API may change without prior notice
> > + *
> > + * Create a new ring named *name* that stores elements with given size.
> > + *
> > + * This function uses ``memzone_reserve()`` to allocate memory. Then it
> > + * calls rte_ring_init() to initialize an empty ring.
> > + *
> > + * The new ring size is set to *count*, which must be a power of
> > + * two. Water marking is disabled by default. The real usable ring size
> > + * is *count-1* instead of *count* to differentiate a full ring from an
> > + * empty ring.
> > + *
> > + * The ring is added in RTE_TAILQ_RING list.
> > + *
> > + * @param name
> > + *   The name of the ring.
> > + * @param count
> > + *   The number of elements in the ring (must be a power of 2).
> > + * @param esize
> > + *   The size of ring element, in bytes. It must be a multiple of 4.
> > + *   Currently, sizes 4, 8 and 16 are supported.
> > + * @param socket_id
> > + *   The *socket_id* argument is the socket identifier in case of
> > + *   NUMA. The value can be *SOCKET_ID_ANY* if there is no NUMA
> > + *   constraint for the reserved zone.
> > + * @param flags
> > + *   An OR of the following:
> > + *    - RING_F_SP_ENQ: If this flag is set, the default behavior when
> > + *      using ``rte_ring_enqueue()`` or ``rte_ring_enqueue_bulk()``
> > + *      is "single-producer". Otherwise, it is "multi-producers".
> > + *    - RING_F_SC_DEQ: If this flag is set, the default behavior when
> > + *      using ``rte_ring_dequeue()`` or ``rte_ring_dequeue_bulk()``
> > + *      is "single-consumer". Otherwise, it is "multi-consumers".
> > + * @return
> > + *   On success, the pointer to the new allocated ring. NULL on error with
> > + *    rte_errno set appropriately. Possible errno values include:
> > + *    - E_RTE_NO_CONFIG - function could not get pointer to rte_config structure
> > + *    - E_RTE_SECONDARY - function was called from a secondary process instance
> > + *    - EINVAL - count provided is not a power of 2
> > + *    - ENOSPC - the maximum number of memzones has already been allocated
> > + *    - EEXIST - a memzone with the same name already exists
> > + *    - ENOMEM - no appropriate memory area found in which to create memzone
> > + */
> > +__rte_experimental
> > +struct rte_ring *rte_ring_create_elem(const char *name, unsigned count,
> > +				unsigned esize, int socket_id, unsigned flags);
> > +
> > +/* the actual enqueue of pointers on the ring.
> > + * Placed here since identical code needed in both
> > + * single and multi producer enqueue functions.
> > + */
> > +#define ENQUEUE_PTRS_ELEM(r, ring_start, prod_head, obj_table, esize, n) do { \
> > +	if (esize == 4) \
> > +		ENQUEUE_PTRS_32(r, ring_start, prod_head, obj_table, n); \
> > +	else if (esize == 8) \
> > +		ENQUEUE_PTRS_64(r, ring_start, prod_head, obj_table, n); \
> > +	else if (esize == 16) \
> > +		ENQUEUE_PTRS_128(r, ring_start, prod_head, obj_table, n); \
> > +} while (0)
> > +
> > +#define ENQUEUE_PTRS_32(r, ring_start, prod_head, obj_table, n) do { \
> > +	unsigned int i; \
> > +	const uint32_t size = (r)->size; \
> > +	uint32_t idx = prod_head & (r)->mask; \
> > +	uint32_t *ring = (uint32_t *)ring_start; \
> > +	uint32_t *obj = (uint32_t *)obj_table; \
> > +	if (likely(idx + n < size)) { \
> > +		for (i = 0; i < (n & ((~(unsigned)0x7))); i += 8, idx += 8) { \
> > +			ring[idx] = obj[i]; \
> > +			ring[idx + 1] = obj[i + 1]; \
> > +			ring[idx + 2] = obj[i + 2]; \
> > +			ring[idx + 3] = obj[i + 3]; \
> > +			ring[idx + 4] = obj[i + 4]; \
> > +			ring[idx + 5] = obj[i + 5]; \
> > +			ring[idx + 6] = obj[i + 6]; \
> > +			ring[idx + 7] = obj[i + 7]; \
> > +		} \
> > +		switch (n & 0x7) { \
> > +		case 7: \
> > +			ring[idx++] = obj[i++]; /* fallthrough */ \
> > +		case 6: \
> > +			ring[idx++] = obj[i++]; /* fallthrough */ \
> > +		case 5: \
> > +			ring[idx++] = obj[i++]; /* fallthrough */ \
> > +		case 4: \
> > +			ring[idx++] = obj[i++]; /* fallthrough */ \
> > +		case 3: \
> > +			ring[idx++] = obj[i++]; /* fallthrough */ \
> > +		case 2: \
> > +			ring[idx++] = obj[i++]; /* fallthrough */ \
> > +		case 1: \
> > +			ring[idx++] = obj[i++]; /* fallthrough */ \
> > +		} \
> > +	} else { \
> > +		for (i = 0; idx < size; i++, idx++)\
> > +			ring[idx] = obj[i]; \
> > +		for (idx = 0; i < n; i++, idx++) \
> > +			ring[idx] = obj[i]; \
> > +	} \
> > +} while (0)
> > +
> > +#define ENQUEUE_PTRS_64(r, ring_start, prod_head, obj_table, n) do { \
> > +	unsigned int i; \
> > +	const uint32_t size = (r)->size; \
> > +	uint32_t idx = prod_head & (r)->mask; \
> > +	uint64_t *ring = (uint64_t *)ring_start; \
> > +	uint64_t *obj = (uint64_t *)obj_table; \
> > +	if (likely(idx + n < size)) { \
> > +		for (i = 0; i < (n & ((~(unsigned)0x3))); i += 4, idx += 4) { \
> > +			ring[idx] = obj[i]; \
> > +			ring[idx + 1] = obj[i + 1]; \
> > +			ring[idx + 2] = obj[i + 2]; \
> > +			ring[idx + 3] = obj[i + 3]; \
> > +		} \
> > +		switch (n & 0x3) { \
> > +		case 3: \
> > +			ring[idx++] = obj[i++]; /* fallthrough */ \
> > +		case 2: \
> > +			ring[idx++] = obj[i++]; /* fallthrough */ \
> > +		case 1: \
> > +			ring[idx++] = obj[i++]; \
> > +		} \
> > +	} else { \
> > +		for (i = 0; idx < size; i++, idx++)\
> > +			ring[idx] = obj[i]; \
> > +		for (idx = 0; i < n; i++, idx++) \
> > +			ring[idx] = obj[i]; \
> > +	} \
> > +} while (0)
> > +
> > +#define ENQUEUE_PTRS_128(r, ring_start, prod_head, obj_table, n) do { \
> > +	unsigned int i; \
> > +	const uint32_t size = (r)->size; \
> > +	uint32_t idx = prod_head & (r)->mask; \
> > +	__uint128_t *ring = (__uint128_t *)ring_start; \
> > +	__uint128_t *obj = (__uint128_t *)obj_table; \
> > +	if (likely(idx + n < size)) { \
> > +		for (i = 0; i < (n >> 1); i += 2, idx += 2) { \
> > +			ring[idx] = obj[i]; \
> > +			ring[idx + 1] = obj[i + 1]; \
> > +		} \
> > +		switch (n & 0x1) { \
> > +		case 1: \
> > +			ring[idx++] = obj[i++]; \
> > +		} \
> > +	} else { \
> > +		for (i = 0; idx < size; i++, idx++)\
> > +			ring[idx] = obj[i]; \
> > +		for (idx = 0; i < n; i++, idx++) \
> > +			ring[idx] = obj[i]; \
> > +	} \
> > +} while (0)
> > +
> > +/* the actual copy of pointers on the ring to obj_table.
> > + * Placed here since identical code needed in both
> > + * single and multi consumer dequeue functions.
> > + */
> > +#define DEQUEUE_PTRS_ELEM(r, ring_start, cons_head, obj_table, esize, n) do { \
> > +	if (esize == 4) \
> > +		DEQUEUE_PTRS_32(r, ring_start, cons_head, obj_table, n); \
> > +	else if (esize == 8) \
> > +		DEQUEUE_PTRS_64(r, ring_start, cons_head, obj_table, n); \
> > +	else if (esize == 16) \
> > +		DEQUEUE_PTRS_128(r, ring_start, cons_head, obj_table, n); \
> > +} while (0)
> > +
> > +#define DEQUEUE_PTRS_32(r, ring_start, cons_head, obj_table, n) do { \
> > +	unsigned int i; \
> > +	uint32_t idx = cons_head & (r)->mask; \
> > +	const uint32_t size = (r)->size; \
> > +	uint32_t *ring = (uint32_t *)ring_start; \
> > +	uint32_t *obj = (uint32_t *)obj_table; \
> > +	if (likely(idx + n < size)) { \
> > +		for (i = 0; i < (n & (~(unsigned)0x7)); i += 8, idx += 8) {\
> > +			obj[i] = ring[idx]; \
> > +			obj[i + 1] = ring[idx + 1]; \
> > +			obj[i + 2] = ring[idx + 2]; \
> > +			obj[i + 3] = ring[idx + 3]; \
> > +			obj[i + 4] = ring[idx + 4]; \
> > +			obj[i + 5] = ring[idx + 5]; \
> > +			obj[i + 6] = ring[idx + 6]; \
> > +			obj[i + 7] = ring[idx + 7]; \
> > +		} \
> > +		switch (n & 0x7) { \
> > +		case 7: \
> > +			obj[i++] = ring[idx++]; /* fallthrough */ \
> > +		case 6: \
> > +			obj[i++] = ring[idx++]; /* fallthrough */ \
> > +		case 5: \
> > +			obj[i++] = ring[idx++]; /* fallthrough */ \
> > +		case 4: \
> > +			obj[i++] = ring[idx++]; /* fallthrough */ \
> > +		case 3: \
> > +			obj[i++] = ring[idx++]; /* fallthrough */ \
> > +		case 2: \
> > +			obj[i++] = ring[idx++]; /* fallthrough */ \
> > +		case 1: \
> > +			obj[i++] = ring[idx++]; /* fallthrough */ \
> > +		} \
> > +	} else { \
> > +		for (i = 0; idx < size; i++, idx++) \
> > +			obj[i] = ring[idx]; \
> > +		for (idx = 0; i < n; i++, idx++) \
> > +			obj[i] = ring[idx]; \
> > +	} \
> > +} while (0)
> > +
> > +#define DEQUEUE_PTRS_64(r, ring_start, cons_head, obj_table, n) do { \
> > +	unsigned int i; \
> > +	uint32_t idx = cons_head & (r)->mask; \
> > +	const uint32_t size = (r)->size; \
> > +	uint64_t *ring = (uint64_t *)ring_start; \
> > +	uint64_t *obj = (uint64_t *)obj_table; \
> > +	if (likely(idx + n < size)) { \
> > +		for (i = 0; i < (n & (~(unsigned)0x3)); i += 4, idx += 4) {\
> > +			obj[i] = ring[idx]; \
> > +			obj[i + 1] = ring[idx + 1]; \
> > +			obj[i + 2] = ring[idx + 2]; \
> > +			obj[i + 3] = ring[idx + 3]; \
> > +		} \
> > +		switch (n & 0x3) { \
> > +		case 3: \
> > +			obj[i++] = ring[idx++]; /* fallthrough */ \
> > +		case 2: \
> > +			obj[i++] = ring[idx++]; /* fallthrough */ \
> > +		case 1: \
> > +			obj[i++] = ring[idx++]; \
> > +		} \
> > +	} else { \
> > +		for (i = 0; idx < size; i++, idx++) \
> > +			obj[i] = ring[idx]; \
> > +		for (idx = 0; i < n; i++, idx++) \
> > +			obj[i] = ring[idx]; \
> > +	} \
> > +} while (0)
> > +
> > +#define DEQUEUE_PTRS_128(r, ring_start, cons_head, obj_table, n) do { \
> > +	unsigned int i; \
> > +	uint32_t idx = cons_head & (r)->mask; \
> > +	const uint32_t size = (r)->size; \
> > +	__uint128_t *ring = (__uint128_t *)ring_start; \
> > +	__uint128_t *obj = (__uint128_t *)obj_table; \
> > +	if (likely(idx + n < size)) { \
> > +		for (i = 0; i < (n >> 1); i += 2, idx += 2) { \
> > +			obj[i] = ring[idx]; \
> > +			obj[i + 1] = ring[idx + 1]; \
> > +		} \
> > +		switch (n & 0x1) { \
> > +		case 1: \
> > +			obj[i++] = ring[idx++]; /* fallthrough */ \
> > +		} \
> > +	} else { \
> > +		for (i = 0; idx < size; i++, idx++) \
> > +			obj[i] = ring[idx]; \
> > +		for (idx = 0; i < n; i++, idx++) \
> > +			obj[i] = ring[idx]; \
> > +	} \
> > +} while (0)
> > +
> > +/* Between load and load. there might be cpu reorder in weak model
> > + * (powerpc/arm).
> > + * There are 2 choices for the users
> > + * 1.use rmb() memory barrier
> > + * 2.use one-direction load_acquire/store_release barrier,defined by
> > + * CONFIG_RTE_USE_C11_MEM_MODEL=y
> > + * It depends on performance test results.
> > + * By default, move common functions to rte_ring_generic.h
> > + */
> > +#ifdef RTE_USE_C11_MEM_MODEL
> > +#include "rte_ring_c11_mem.h"
> > +#else
> > +#include "rte_ring_generic.h"
> > +#endif
> > +
> > +/**
> > + * @internal Enqueue several objects on the ring
> > + *
> > + * @param r
> > + *   A pointer to the ring structure.
> > + * @param obj_table
> > + *   A pointer to a table of void * pointers (objects).
> > + * @param esize
> > + *   The size of ring element, in bytes. It must be a multiple of 4.
> > + *   Currently, sizes 4, 8 and 16 are supported. This should be the same
> > + *   as passed while creating the ring, otherwise the results are undefined.
> > + * @param n
> > + *   The number of objects to add in the ring from the obj_table.
> > + * @param behavior
> > + *   RTE_RING_QUEUE_FIXED:    Enqueue a fixed number of items from a ring
> > + *   RTE_RING_QUEUE_VARIABLE: Enqueue as many items as possible from ring
> > + * @param is_sp
> > + *   Indicates whether to use single producer or multi-producer head update
> > + * @param free_space
> > + *   returns the amount of space after the enqueue operation has finished
> > + * @return
> > + *   Actual number of objects enqueued.
> > + *   If behavior == RTE_RING_QUEUE_FIXED, this will be 0 or n only.
> > + */
> > +static __rte_always_inline unsigned int
> > +__rte_ring_do_enqueue_elem(struct rte_ring *r, void * const obj_table,
> > +		unsigned int esize, unsigned int n,
> > +		enum rte_ring_queue_behavior behavior, unsigned int is_sp,
> > +		unsigned int *free_space)


I like the idea of adding esize as an argument to the public API,
so the compiler can do its job optimizing calls with a constant esize.
However, I am not very happy with the rest of the implementation:
1. It doesn't really provide a configurable elem size - only 4/8/16B elems are supported.
2. There is a lot of code duplication across these 3 copies of the ENQUEUE/DEQUEUE macros.

Looking at the ENQUEUE/DEQUEUE macros, I can see that the main loop always
does a 32B copy per iteration.
So I wonder whether we can make a generic function that does a 32B copy per
iteration in the main loop and copies the tail in 4B chunks?
That would avoid the code duplication and would allow the user to have any elem
size (multiple of 4B) they want.
Something like this (note I didn't test it, just a rough idea):

static inline void
copy_elems(uint32_t du32[], const uint32_t su32[], uint32_t num, uint32_t esize)
{
        uint32_t i, sz;

        /* total number of 32-bit words to copy */
        sz = (num * esize) / sizeof(uint32_t);

        /* main loop: 32B per iteration */
        for (i = 0; i < (sz & ~7); i += 8)
                memcpy(du32 + i, su32 + i, 8 * sizeof(uint32_t));

        /* tail: the remaining (sz & 7) words, 4B each */
        switch (sz & 7) {
        case 7: du32[sz - 7] = su32[sz - 7]; /* fallthrough */
        case 6: du32[sz - 6] = su32[sz - 6]; /* fallthrough */
        case 5: du32[sz - 5] = su32[sz - 5]; /* fallthrough */
        case 4: du32[sz - 4] = su32[sz - 4]; /* fallthrough */
        case 3: du32[sz - 3] = su32[sz - 3]; /* fallthrough */
        case 2: du32[sz - 2] = su32[sz - 2]; /* fallthrough */
        case 1: du32[sz - 1] = su32[sz - 1];
        }
}

static inline void
enqueue_elems(struct rte_ring *r, void *ring_start, uint32_t prod_head,
                void *obj_table, uint32_t num, uint32_t esize)
{
        uint32_t idx, n;
        uint32_t *du32;
        const uint32_t *su32 = obj_table;
        /* element size in 32-bit words */
        const uint32_t nw = esize / sizeof(uint32_t);

        const uint32_t size = r->size;

        idx = prod_head & (r)->mask;

        /* slot idx starts idx * esize bytes into the ring */
        du32 = (uint32_t *)ring_start + idx * nw;

        if (idx + num < size)
                copy_elems(du32, su32, num, esize);
        else {
                /* wrap around: fill up to the end, then restart at slot 0 */
                n = size - idx;
                copy_elems(du32, su32, n, esize);
                copy_elems(ring_start, su32 + n * nw, num - n, esize);
        }
}

And then, in __rte_ring_do_enqueue_elem(), instead of ENQUEUE_PTRS_ELEM(), just:

enqueue_elems(r, &r[1], prod_head, obj_table, n, esize);
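
To make the wrap-around arithmetic concrete, here is a minimal standalone sketch of the same idea that builds without DPDK. Everything in it is an assumption for illustration: struct toy_ring, toy_enqueue(), toy_dequeue(), struct elem12 and toy_ring_demo() are hypothetical names and not part of rte_ring. The tail is copied with a plain loop rather than the switch above, and slot offsets are computed in units of esize (expressed as 32-bit words).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical toy ring (NOT struct rte_ring): slots live in a fixed array. */
struct toy_ring {
	uint32_t size;      /* number of slots, must be a power of 2 */
	uint32_t mask;      /* size - 1 */
	uint32_t esize;     /* element size in bytes, multiple of 4 */
	uint32_t slots[64]; /* backing storage: 256 bytes */
};

/* Copy num elements of esize bytes: 32B per main-loop pass, 4B tail. */
static inline void
copy_elems(uint32_t *dst, const uint32_t *src, uint32_t num, uint32_t esize)
{
	uint32_t i;
	const uint32_t sz = (num * esize) / (uint32_t)sizeof(uint32_t);

	for (i = 0; i < (sz & ~7u); i += 8)
		memcpy(dst + i, src + i, 8 * sizeof(uint32_t));
	for (; i < sz; i++)
		dst[i] = src[i];
}

/* Write num elements starting at slot (head & mask), wrapping if needed. */
static inline void
toy_enqueue(struct toy_ring *r, uint32_t head, const void *obj_table,
		uint32_t num)
{
	const uint32_t words = r->esize / (uint32_t)sizeof(uint32_t);
	const uint32_t idx = head & r->mask;
	const uint32_t *src = obj_table;

	assert(r->esize % sizeof(uint32_t) == 0);
	if (idx + num <= r->size) {
		copy_elems(r->slots + idx * words, src, num, r->esize);
	} else {
		const uint32_t n = r->size - idx; /* slots left before the end */

		copy_elems(r->slots + idx * words, src, n, r->esize);
		copy_elems(r->slots, src + n * words, num - n, r->esize);
	}
}

/* Mirror of toy_enqueue(): read num elements starting at (head & mask). */
static inline void
toy_dequeue(const struct toy_ring *r, uint32_t head, void *obj_table,
		uint32_t num)
{
	const uint32_t words = r->esize / (uint32_t)sizeof(uint32_t);
	const uint32_t idx = head & r->mask;
	uint32_t *dst = obj_table;

	assert(r->esize % sizeof(uint32_t) == 0);
	if (idx + num <= r->size) {
		copy_elems(dst, r->slots + idx * words, num, r->esize);
	} else {
		const uint32_t n = r->size - idx;

		copy_elems(dst, r->slots + idx * words, n, r->esize);
		copy_elems(dst + n * words, r->slots, num - n, r->esize);
	}
}

/* A 12-byte element: a size the 4/8/16B macros cannot handle. */
struct elem12 { uint32_t a, b, c; };

/* Round-trip 5 elements across the wrap point; returns 0 if they match. */
static int
toy_ring_demo(void)
{
	struct toy_ring r = { .size = 8, .mask = 7,
			.esize = sizeof(struct elem12) };
	struct elem12 in[5], out[5];
	uint32_t i;

	for (i = 0; i < 5; i++)
		in[i] = (struct elem12){ i, i + 100, i + 200 };
	memset(out, 0, sizeof(out));

	toy_enqueue(&r, 6, in, 5); /* slots 6, 7, then wraps to 0, 1, 2 */
	toy_dequeue(&r, 6, out, 5);
	return memcmp(in, out, sizeof(in));
}
```

Calling toy_ring_demo() exercises both the straight and the wrapped copy with a 12B element. The sketch deliberately leaves out head/tail updates and memory ordering, which the real ring handles in __rte_ring_move_prod_head() and update_tail().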

 
> > +{
> > +	uint32_t prod_head, prod_next;
> > +	uint32_t free_entries;
> > +
> > +	n = __rte_ring_move_prod_head(r, is_sp, n, behavior,
> > +			&prod_head, &prod_next, &free_entries);
> > +	if (n == 0)
> > +		goto end;
> > +
> > +	ENQUEUE_PTRS_ELEM(r, &r[1], prod_head, obj_table, esize, n);
> > +
> > +	update_tail(&r->prod, prod_head, prod_next, is_sp, 1);
> > +end:
> > +	if (free_space != NULL)
> > +		*free_space = free_entries - n;
> > +	return n;
> > +}
> > +


* Re: [dpdk-dev] [PATCH v4 1/2] lib/ring: apis to support configurable element size
  2019-10-14 19:41         ` Ananyev, Konstantin
@ 2019-10-14 23:56           ` Honnappa Nagarahalli
  2019-10-15  9:34             ` Ananyev, Konstantin
  0 siblings, 1 reply; 173+ messages in thread
From: Honnappa Nagarahalli @ 2019-10-14 23:56 UTC (permalink / raw)
  To: Ananyev, Konstantin, olivier.matz, sthemmin, jerinj, Richardson,
	Bruce, david.marchand, pbhagavatula
  Cc: dev, Dharmik Thakkar, Ruifeng Wang (Arm Technology China),
	Gavin Hu (Arm Technology China),
	stephen, Honnappa Nagarahalli, nd, nd

Hi Konstantin,
	Thank you for the feedback.

<snip>

> 
> > >
> > > Current APIs assume ring elements to be pointers. However, in many
> > > use cases, the size can be different. Add new APIs to support
> > > configurable ring element sizes.
> > >
> > > Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
> > > Reviewed-by: Dharmik Thakkar <dharmik.thakkar@arm.com>
> > > Reviewed-by: Gavin Hu <gavin.hu@arm.com>
> > > Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
> > > ---
> > >  lib/librte_ring/Makefile             |   3 +-
> > >  lib/librte_ring/meson.build          |   3 +
> > >  lib/librte_ring/rte_ring.c           |  45 +-
> > >  lib/librte_ring/rte_ring.h           |   1 +
> > >  lib/librte_ring/rte_ring_elem.h      | 946 +++++++++++++++++++++++++++
> > >  lib/librte_ring/rte_ring_version.map |   2 +
> > >  6 files changed, 991 insertions(+), 9 deletions(-)
> > >  create mode 100644 lib/librte_ring/rte_ring_elem.h
> > >
> > > diff --git a/lib/librte_ring/Makefile b/lib/librte_ring/Makefile
> > > index 21a36770d..515a967bb 100644
> > > --- a/lib/librte_ring/Makefile
> > > +++ b/lib/librte_ring/Makefile
> > > @@ -6,7 +6,7 @@ include $(RTE_SDK)/mk/rte.vars.mk
> > >  # library name
> > >  LIB = librte_ring.a
> > >
> > > -CFLAGS += $(WERROR_FLAGS) -I$(SRCDIR) -O3
> > > +CFLAGS += $(WERROR_FLAGS) -I$(SRCDIR) -O3 -DALLOW_EXPERIMENTAL_API
> > >  LDLIBS += -lrte_eal
> > >
> > >  EXPORT_MAP := rte_ring_version.map
> > > @@ -18,6 +18,7 @@ SRCS-$(CONFIG_RTE_LIBRTE_RING) := rte_ring.c
> > >
> > >  # install includes
> > >  SYMLINK-$(CONFIG_RTE_LIBRTE_RING)-include := rte_ring.h \
> > > +					rte_ring_elem.h \
> > >  					rte_ring_generic.h \
> > >  					rte_ring_c11_mem.h
> > >
> > > diff --git a/lib/librte_ring/meson.build b/lib/librte_ring/meson.build
> > > index ab8b0b469..74219840a 100644
> > > --- a/lib/librte_ring/meson.build
> > > +++ b/lib/librte_ring/meson.build
> > > @@ -6,3 +6,6 @@ sources = files('rte_ring.c')
> > >  headers = files('rte_ring.h',
> > >  		'rte_ring_c11_mem.h',
> > >  		'rte_ring_generic.h')
> > > +
> > > +# rte_ring_create_elem and rte_ring_get_memsize_elem are experimental
> > > +allow_experimental_apis = true
> > > diff --git a/lib/librte_ring/rte_ring.c b/lib/librte_ring/rte_ring.c
> > > index d9b308036..6fed3648b 100644
> > > --- a/lib/librte_ring/rte_ring.c
> > > +++ b/lib/librte_ring/rte_ring.c
> > > @@ -33,6 +33,7 @@
> > >  #include <rte_tailq.h>
> > >
> > >  #include "rte_ring.h"
> > > +#include "rte_ring_elem.h"
> > >
> > >  TAILQ_HEAD(rte_ring_list, rte_tailq_entry);
> > >
> > > @@ -46,23 +47,42 @@ EAL_REGISTER_TAILQ(rte_ring_tailq)
> > >
> > >  /* return the size of memory occupied by a ring */
> > >  ssize_t
> > > -rte_ring_get_memsize(unsigned count)
> > > +rte_ring_get_memsize_elem(unsigned count, unsigned esize)
> > >  {
> > >  	ssize_t sz;
> > >
> > > +	/* Supported esize values are 4/8/16.
> > > +	 * Others can be added on need basis.
> > > +	 */
> > > +	if ((esize != 4) && (esize != 8) && (esize != 16)) {
> > > +		RTE_LOG(ERR, RING,
> > > +			"Unsupported esize value. Supported values are 4, 8 and 16\n");
> > > +
> > > +		return -EINVAL;
> > > +	}
> > > +
> > >  	/* count must be a power of 2 */
> > >  	if ((!POWEROF2(count)) || (count > RTE_RING_SZ_MASK )) {
> > >  		RTE_LOG(ERR, RING,
> > > -			"Requested size is invalid, must be power of 2, and "
> > > -			"do not exceed the size limit %u\n", RTE_RING_SZ_MASK);
> > > +			"Requested number of elements is invalid, must be "
> > > +			"power of 2, and do not exceed the limit %u\n",
> > > +			RTE_RING_SZ_MASK);
> > > +
> > >  		return -EINVAL;
> > >  	}
> > >
> > > -	sz = sizeof(struct rte_ring) + count * sizeof(void *);
> > > +	sz = sizeof(struct rte_ring) + count * esize;
> > >  	sz = RTE_ALIGN(sz, RTE_CACHE_LINE_SIZE);
> > >  	return sz;
> > >  }
> > >
> > > +/* return the size of memory occupied by a ring */
> > > +ssize_t
> > > +rte_ring_get_memsize(unsigned count)
> > > +{
> > > +	return rte_ring_get_memsize_elem(count, sizeof(void *));
> > > +}
> > > +
> > >  void
> > >  rte_ring_reset(struct rte_ring *r)
> > >  {
> > > @@ -114,10 +134,10 @@ rte_ring_init(struct rte_ring *r, const char *name, unsigned count,
> > >  	return 0;
> > >  }
> > >
> > > -/* create the ring */
> > > +/* create the ring for a given element size */
> > >  struct rte_ring *
> > > -rte_ring_create(const char *name, unsigned count, int socket_id,
> > > -		unsigned flags)
> > > +rte_ring_create_elem(const char *name, unsigned count, unsigned esize,
> > > +		int socket_id, unsigned flags)
> > >  {
> > >  	char mz_name[RTE_MEMZONE_NAMESIZE];
> > >  	struct rte_ring *r;
> > > @@ -135,7 +155,7 @@ rte_ring_create(const char *name, unsigned count, int socket_id,
> > >  	if (flags & RING_F_EXACT_SZ)
> > >  		count = rte_align32pow2(count + 1);
> > >
> > > -	ring_size = rte_ring_get_memsize(count);
> > > +	ring_size = rte_ring_get_memsize_elem(count, esize);
> > >  	if (ring_size < 0) {
> > >  		rte_errno = ring_size;
> > >  		return NULL;
> > > @@ -182,6 +202,15 @@ rte_ring_create(const char *name, unsigned count, int socket_id,
> > >  	return r;
> > >  }
> > >
> > > +/* create the ring */
> > > +struct rte_ring *
> > > +rte_ring_create(const char *name, unsigned count, int socket_id,
> > > +		unsigned flags)
> > > +{
> > > +	return rte_ring_create_elem(name, count, sizeof(void *), socket_id,
> > > +		flags);
> > > +}
> > > +
> > >  /* free the ring */
> > >  void
> > >  rte_ring_free(struct rte_ring *r)
> > > diff --git a/lib/librte_ring/rte_ring.h b/lib/librte_ring/rte_ring.h
> > > index 2a9f768a1..18fc5d845 100644
> > > --- a/lib/librte_ring/rte_ring.h
> > > +++ b/lib/librte_ring/rte_ring.h
> > > @@ -216,6 +216,7 @@ int rte_ring_init(struct rte_ring *r, const char *name, unsigned count,
> > >   */
> > >  struct rte_ring *rte_ring_create(const char *name, unsigned count,
> > >  				 int socket_id, unsigned flags);
> > > +
> > >  /**
> > >   * De-allocate all memory used by the ring.
> > >   *
> > > diff --git a/lib/librte_ring/rte_ring_elem.h b/lib/librte_ring/rte_ring_elem.h
> > > new file mode 100644
> > > index 000000000..860f059ad
> > > --- /dev/null
> > > +++ b/lib/librte_ring/rte_ring_elem.h
> > > @@ -0,0 +1,946 @@
> > > +/* SPDX-License-Identifier: BSD-3-Clause
> > > + *
> > > + * Copyright (c) 2019 Arm Limited
> > > + * Copyright (c) 2010-2017 Intel Corporation
> > > + * Copyright (c) 2007-2009 Kip Macy kmacy@freebsd.org
> > > + * All rights reserved.
> > > + * Derived from FreeBSD's bufring.h
> > > + * Used as BSD-3 Licensed with permission from Kip Macy.
> > > + */
> > > +
> > > +#ifndef _RTE_RING_ELEM_H_
> > > +#define _RTE_RING_ELEM_H_
> > > +
> > > +/**
> > > + * @file
> > > + * RTE Ring with flexible element size
> > > + */
> > > +
> > > +#ifdef __cplusplus
> > > +extern "C" {
> > > +#endif
> > > +
> > > +#include <stdio.h>
> > > +#include <stdint.h>
> > > +#include <sys/queue.h>
> > > +#include <errno.h>
> > > +#include <rte_common.h>
> > > +#include <rte_config.h>
> > > +#include <rte_memory.h>
> > > +#include <rte_lcore.h>
> > > +#include <rte_atomic.h>
> > > +#include <rte_branch_prediction.h>
> > > +#include <rte_memzone.h>
> > > +#include <rte_pause.h>
> > > +
> > > +#include "rte_ring.h"
> > > +
> > > +/**
> > > + * @warning
> > > + * @b EXPERIMENTAL: this API may change without prior notice
> > > + *
> > > + * Calculate the memory size needed for a ring with given element size
> > > + *
> > > + * This function returns the number of bytes needed for a ring, given
> > > + * the number of elements in it and the size of the element. This value
> > > + * is the sum of the size of the structure rte_ring and the size of the
> > > + * memory needed for storing the elements. The value is aligned to a cache
> > > + * line size.
> > > + *
> > > + * @param count
> > > + *   The number of elements in the ring (must be a power of 2).
> > > + * @param esize
> > > + *   The size of ring element, in bytes. It must be a multiple of 4.
> > > + *   Currently, sizes 4, 8 and 16 are supported.
> > > + * @return
> > > + *   - The memory size needed for the ring on success.
> > > + *   - -EINVAL if count is not a power of 2.
> > > + */
> > > +__rte_experimental
> > > +ssize_t rte_ring_get_memsize_elem(unsigned count, unsigned esize);
> > > +
> > > +/**
> > > + * @warning
> > > + * @b EXPERIMENTAL: this API may change without prior notice
> > > + *
> > > + * Create a new ring named *name* that stores elements with given size.
> > > + *
> > > + * This function uses ``memzone_reserve()`` to allocate memory. Then it
> > > + * calls rte_ring_init() to initialize an empty ring.
> > > + *
> > > + * The new ring size is set to *count*, which must be a power of
> > > + * two. Water marking is disabled by default. The real usable ring size
> > > + * is *count-1* instead of *count* to differentiate a free ring from an
> > > + * empty ring.
> > > + *
> > > + * The ring is added in RTE_TAILQ_RING list.
> > > + *
> > > + * @param name
> > > + *   The name of the ring.
> > > + * @param count
> > > + *   The number of elements in the ring (must be a power of 2).
> > > + * @param esize
> > > + *   The size of ring element, in bytes. It must be a multiple of 4.
> > > + *   Currently, sizes 4, 8 and 16 are supported.
> > > + * @param socket_id
> > > + *   The *socket_id* argument is the socket identifier in case of
> > > + *   NUMA. The value can be *SOCKET_ID_ANY* if there is no NUMA
> > > + *   constraint for the reserved zone.
> > > + * @param flags
> > > + *   An OR of the following:
> > > + *    - RING_F_SP_ENQ: If this flag is set, the default behavior when
> > > + *      using ``rte_ring_enqueue()`` or ``rte_ring_enqueue_bulk()``
> > > + *      is "single-producer". Otherwise, it is "multi-producers".
> > > + *    - RING_F_SC_DEQ: If this flag is set, the default behavior when
> > > + *      using ``rte_ring_dequeue()`` or ``rte_ring_dequeue_bulk()``
> > > + *      is "single-consumer". Otherwise, it is "multi-consumers".
> > > + * @return
> > > + *   On success, the pointer to the new allocated ring. NULL on error with
> > > + *    rte_errno set appropriately. Possible errno values include:
> > > + *    - E_RTE_NO_CONFIG - function could not get pointer to rte_config structure
> > > + *    - E_RTE_SECONDARY - function was called from a secondary process instance
> > > + *    - EINVAL - count provided is not a power of 2
> > > + *    - ENOSPC - the maximum number of memzones has already been allocated
> > > + *    - EEXIST - a memzone with the same name already exists
> > > + *    - ENOMEM - no appropriate memory area found in which to create memzone
> > > + */
> > > +__rte_experimental
> > > +struct rte_ring *rte_ring_create_elem(const char *name, unsigned count,
> > > +				unsigned esize, int socket_id, unsigned flags);
> > > +
> > > +/* the actual enqueue of pointers on the ring.
> > > + * Placed here since identical code needed in both
> > > + * single and multi producer enqueue functions.
> > > + */
> > > +#define ENQUEUE_PTRS_ELEM(r, ring_start, prod_head, obj_table, esize, n) do { \
> > > +	if (esize == 4) \
> > > +		ENQUEUE_PTRS_32(r, ring_start, prod_head, obj_table, n); \
> > > +	else if (esize == 8) \
> > > +		ENQUEUE_PTRS_64(r, ring_start, prod_head, obj_table, n); \
> > > +	else if (esize == 16) \
> > > +		ENQUEUE_PTRS_128(r, ring_start, prod_head, obj_table, n); \
> > > +} while (0)
> > > +
> > > +#define ENQUEUE_PTRS_32(r, ring_start, prod_head, obj_table, n) do { \
> > > +	unsigned int i; \
> > > +	const uint32_t size = (r)->size; \
> > > +	uint32_t idx = prod_head & (r)->mask; \
> > > +	uint32_t *ring = (uint32_t *)ring_start; \
> > > +	uint32_t *obj = (uint32_t *)obj_table; \
> > > +	if (likely(idx + n < size)) { \
> > > +		for (i = 0; i < (n & ((~(unsigned)0x7))); i += 8, idx += 8) { \
> > > +			ring[idx] = obj[i]; \
> > > +			ring[idx + 1] = obj[i + 1]; \
> > > +			ring[idx + 2] = obj[i + 2]; \
> > > +			ring[idx + 3] = obj[i + 3]; \
> > > +			ring[idx + 4] = obj[i + 4]; \
> > > +			ring[idx + 5] = obj[i + 5]; \
> > > +			ring[idx + 6] = obj[i + 6]; \
> > > +			ring[idx + 7] = obj[i + 7]; \
> > > +		} \
> > > +		switch (n & 0x7) { \
> > > +		case 7: \
> > > +			ring[idx++] = obj[i++]; /* fallthrough */ \
> > > +		case 6: \
> > > +			ring[idx++] = obj[i++]; /* fallthrough */ \
> > > +		case 5: \
> > > +			ring[idx++] = obj[i++]; /* fallthrough */ \
> > > +		case 4: \
> > > +			ring[idx++] = obj[i++]; /* fallthrough */ \
> > > +		case 3: \
> > > +			ring[idx++] = obj[i++]; /* fallthrough */ \
> > > +		case 2: \
> > > +			ring[idx++] = obj[i++]; /* fallthrough */ \
> > > +		case 1: \
> > > +			ring[idx++] = obj[i++]; /* fallthrough */ \
> > > +		} \
> > > +	} else { \
> > > +		for (i = 0; idx < size; i++, idx++)\
> > > +			ring[idx] = obj[i]; \
> > > +		for (idx = 0; i < n; i++, idx++) \
> > > +			ring[idx] = obj[i]; \
> > > +	} \
> > > +} while (0)
> > > +
> > > +#define ENQUEUE_PTRS_64(r, ring_start, prod_head, obj_table, n) do { \
> > > +	unsigned int i; \
> > > +	const uint32_t size = (r)->size; \
> > > +	uint32_t idx = prod_head & (r)->mask; \
> > > +	uint64_t *ring = (uint64_t *)ring_start; \
> > > +	uint64_t *obj = (uint64_t *)obj_table; \
> > > +	if (likely(idx + n < size)) { \
> > > +		for (i = 0; i < (n & ((~(unsigned)0x3))); i += 4, idx += 4) { \
> > > +			ring[idx] = obj[i]; \
> > > +			ring[idx + 1] = obj[i + 1]; \
> > > +			ring[idx + 2] = obj[i + 2]; \
> > > +			ring[idx + 3] = obj[i + 3]; \
> > > +		} \
> > > +		switch (n & 0x3) { \
> > > +		case 3: \
> > > +			ring[idx++] = obj[i++]; /* fallthrough */ \
> > > +		case 2: \
> > > +			ring[idx++] = obj[i++]; /* fallthrough */ \
> > > +		case 1: \
> > > +			ring[idx++] = obj[i++]; \
> > > +		} \
> > > +	} else { \
> > > +		for (i = 0; idx < size; i++, idx++)\
> > > +			ring[idx] = obj[i]; \
> > > +		for (idx = 0; i < n; i++, idx++) \
> > > +			ring[idx] = obj[i]; \
> > > +	} \
> > > +} while (0)
> > > +
> > > +#define ENQUEUE_PTRS_128(r, ring_start, prod_head, obj_table, n) do { \
> > > +	unsigned int i; \
> > > +	const uint32_t size = (r)->size; \
> > > +	uint32_t idx = prod_head & (r)->mask; \
> > > +	__uint128_t *ring = (__uint128_t *)ring_start; \
> > > +	__uint128_t *obj = (__uint128_t *)obj_table; \
> > > +	if (likely(idx + n < size)) { \
> > > +		for (i = 0; i < (n >> 1); i += 2, idx += 2) { \
> > > +			ring[idx] = obj[i]; \
> > > +			ring[idx + 1] = obj[i + 1]; \
> > > +		} \
> > > +		switch (n & 0x1) { \
> > > +		case 1: \
> > > +			ring[idx++] = obj[i++]; \
> > > +		} \
> > > +	} else { \
> > > +		for (i = 0; idx < size; i++, idx++)\
> > > +			ring[idx] = obj[i]; \
> > > +		for (idx = 0; i < n; i++, idx++) \
> > > +			ring[idx] = obj[i]; \
> > > +	} \
> > > +} while (0)
> > > +
> > > +/* the actual copy of pointers on the ring to obj_table.
> > > + * Placed here since identical code needed in both
> > > + * single and multi consumer dequeue functions.
> > > + */
> > > +#define DEQUEUE_PTRS_ELEM(r, ring_start, cons_head, obj_table, esize, n) do { \
> > > +	if (esize == 4) \
> > > +		DEQUEUE_PTRS_32(r, ring_start, cons_head, obj_table, n); \
> > > +	else if (esize == 8) \
> > > +		DEQUEUE_PTRS_64(r, ring_start, cons_head, obj_table, n); \
> > > +	else if (esize == 16) \
> > > +		DEQUEUE_PTRS_128(r, ring_start, cons_head, obj_table, n); \
> > > +} while (0)
> > > +
> > > +#define DEQUEUE_PTRS_32(r, ring_start, cons_head, obj_table, n) do { \
> > > +	unsigned int i; \
> > > +	uint32_t idx = cons_head & (r)->mask; \
> > > +	const uint32_t size = (r)->size; \
> > > +	uint32_t *ring = (uint32_t *)ring_start; \
> > > +	uint32_t *obj = (uint32_t *)obj_table; \
> > > +	if (likely(idx + n < size)) { \
> > > +		for (i = 0; i < (n & (~(unsigned)0x7)); i += 8, idx += 8) {\
> > > +			obj[i] = ring[idx]; \
> > > +			obj[i + 1] = ring[idx + 1]; \
> > > +			obj[i + 2] = ring[idx + 2]; \
> > > +			obj[i + 3] = ring[idx + 3]; \
> > > +			obj[i + 4] = ring[idx + 4]; \
> > > +			obj[i + 5] = ring[idx + 5]; \
> > > +			obj[i + 6] = ring[idx + 6]; \
> > > +			obj[i + 7] = ring[idx + 7]; \
> > > +		} \
> > > +		switch (n & 0x7) { \
> > > +		case 7: \
> > > +			obj[i++] = ring[idx++]; /* fallthrough */ \
> > > +		case 6: \
> > > +			obj[i++] = ring[idx++]; /* fallthrough */ \
> > > +		case 5: \
> > > +			obj[i++] = ring[idx++]; /* fallthrough */ \
> > > +		case 4: \
> > > +			obj[i++] = ring[idx++]; /* fallthrough */ \
> > > +		case 3: \
> > > +			obj[i++] = ring[idx++]; /* fallthrough */ \
> > > +		case 2: \
> > > +			obj[i++] = ring[idx++]; /* fallthrough */ \
> > > +		case 1: \
> > > +			obj[i++] = ring[idx++]; /* fallthrough */ \
> > > +		} \
> > > +	} else { \
> > > +		for (i = 0; idx < size; i++, idx++) \
> > > +			obj[i] = ring[idx]; \
> > > +		for (idx = 0; i < n; i++, idx++) \
> > > +			obj[i] = ring[idx]; \
> > > +	} \
> > > +} while (0)
> > > +
> > > +#define DEQUEUE_PTRS_64(r, ring_start, cons_head, obj_table, n) do { \
> > > +	unsigned int i; \
> > > +	uint32_t idx = cons_head & (r)->mask; \
> > > +	const uint32_t size = (r)->size; \
> > > +	uint64_t *ring = (uint64_t *)ring_start; \
> > > +	uint64_t *obj = (uint64_t *)obj_table; \
> > > +	if (likely(idx + n < size)) { \
> > > +		for (i = 0; i < (n & (~(unsigned)0x3)); i += 4, idx += 4) {\
> > > +			obj[i] = ring[idx]; \
> > > +			obj[i + 1] = ring[idx + 1]; \
> > > +			obj[i + 2] = ring[idx + 2]; \
> > > +			obj[i + 3] = ring[idx + 3]; \
> > > +		} \
> > > +		switch (n & 0x3) { \
> > > +		case 3: \
> > > +			obj[i++] = ring[idx++]; /* fallthrough */ \
> > > +		case 2: \
> > > +			obj[i++] = ring[idx++]; /* fallthrough */ \
> > > +		case 1: \
> > > +			obj[i++] = ring[idx++]; \
> > > +		} \
> > > +	} else { \
> > > +		for (i = 0; idx < size; i++, idx++) \
> > > +			obj[i] = ring[idx]; \
> > > +		for (idx = 0; i < n; i++, idx++) \
> > > +			obj[i] = ring[idx]; \
> > > +	} \
> > > +} while (0)
> > > +
> > > +#define DEQUEUE_PTRS_128(r, ring_start, cons_head, obj_table, n) do { \
> > > +	unsigned int i; \
> > > +	uint32_t idx = cons_head & (r)->mask; \
> > > +	const uint32_t size = (r)->size; \
> > > +	__uint128_t *ring = (__uint128_t *)ring_start; \
> > > +	__uint128_t *obj = (__uint128_t *)obj_table; \
> > > +	if (likely(idx + n < size)) { \
> > > +		for (i = 0; i < (n >> 1); i += 2, idx += 2) { \
> > > +			obj[i] = ring[idx]; \
> > > +			obj[i + 1] = ring[idx + 1]; \
> > > +		} \
> > > +		switch (n & 0x1) { \
> > > +		case 1: \
> > > +			obj[i++] = ring[idx++]; /* fallthrough */ \
> > > +		} \
> > > +	} else { \
> > > +		for (i = 0; idx < size; i++, idx++) \
> > > +			obj[i] = ring[idx]; \
> > > +		for (idx = 0; i < n; i++, idx++) \
> > > +			obj[i] = ring[idx]; \
> > > +	} \
> > > +} while (0)
> > > +
> > > +/* Between load and load. there might be cpu reorder in weak model
> > > + * (powerpc/arm).
> > > + * There are 2 choices for the users
> > > + * 1.use rmb() memory barrier
> > > + * 2.use one-direction load_acquire/store_release barrier,defined by
> > > + * CONFIG_RTE_USE_C11_MEM_MODEL=y
> > > + * It depends on performance test results.
> > > + * By default, move common functions to rte_ring_generic.h
> > > + */
> > > +#ifdef RTE_USE_C11_MEM_MODEL
> > > +#include "rte_ring_c11_mem.h"
> > > +#else
> > > +#include "rte_ring_generic.h"
> > > +#endif
> > > +
> > > +/**
> > > + * @internal Enqueue several objects on the ring
> > > + *
> > > + * @param r
> > > + *   A pointer to the ring structure.
> > > + * @param obj_table
> > > + *   A pointer to a table of void * pointers (objects).
> > > + * @param esize
> > > + *   The size of ring element, in bytes. It must be a multiple of 4.
> > > + *   Currently, sizes 4, 8 and 16 are supported. This should be the same
> > > + *   as passed while creating the ring, otherwise the results are undefined.
> > > + * @param n
> > > + *   The number of objects to add in the ring from the obj_table.
> > > + * @param behavior
> > > + *   RTE_RING_QUEUE_FIXED:    Enqueue a fixed number of items from a ring
> > > + *   RTE_RING_QUEUE_VARIABLE: Enqueue as many items as possible from ring
> > > + * @param is_sp
> > > + *   Indicates whether to use single producer or multi-producer head update
> > > + * @param free_space
> > > + *   returns the amount of space after the enqueue operation has finished
> > > + * @return
> > > + *   Actual number of objects enqueued.
> > > + *   If behavior == RTE_RING_QUEUE_FIXED, this will be 0 or n only.
> > > + */
> > > +static __rte_always_inline unsigned int
> > > +__rte_ring_do_enqueue_elem(struct rte_ring *r, void * const obj_table,
> > > +		unsigned int esize, unsigned int n,
> > > +		enum rte_ring_queue_behavior behavior, unsigned int is_sp,
> > > +		unsigned int *free_space)
> 
> 
> I like the idea to add esize as an argument to the public API, so the compiler
> can do its job optimizing calls with constant esize.
> Though I am not very happy with the rest of implementation:
> 1. It doesn't really provide configurable elem size - only 4/8/16B elems are
> supported.
Agreed. I was thinking other sizes could be added as needed.
However, I am wondering whether we should provide only the 4B element size and let users build whatever they need with bulk operations. That would mean extra work for the users.

> 2. A lot of code duplication with these 3 copies of ENQUEUE/DEQUEUE
> macros.
> 
> Looking at ENQUEUE/DEQUEUE macros, I can see that main loop always does
> 32B copy per iteration.
Yes, I tried to keep it the same as the existing code (I guess the original intention was to allow 256b vector instructions to be generated).

> So wonder can we make a generic function that would do 32B copy per
> iteration in a main loop, and copy tail  by 4B chunks?
> That would avoid copy duplication and will allow user to have any elem size
> (multiple of 4B) he wants.
> Something like that (note didn't test it, just a rough idea):
> 
>  static inline void
> copy_elems(uint32_t du32[], const uint32_t su32[], uint32_t num, uint32_t
> esize) {
>         uint32_t i, sz;
> 
>         sz = (num * esize) / sizeof(uint32_t);
If 'num' is a compile-time constant, 'sz' will be a compile-time constant. Otherwise, this will result in a multiplication operation. I have tried to avoid multiplication and use shift and mask operations instead (just like the rest of the ring code does).

> 
>         for (i = 0; i < (sz & ~7); i += 8)
>                 memcpy(du32 + i, su32 + i, 8 * sizeof(uint32_t));
I used memcpy to start with (for the entire copy operation), but the performance for 64b elements is not the same as with the existing ring APIs (better in some cases, worse in others).

IMO, we have to keep the performance of the 64b and 128b the same as what we get with the existing ring and event-ring APIs. That would allow us to replace them with these new APIs. I suggest that we keep the macros in this patch for 64b and 128b.

For the rest of the sizes, we could put a for loop around the 32b macro (this would allow all sizes as well).

> 
>         switch (sz & 7) {
>         case 7: du32[sz - 7] = su32[sz - 7]; /* fallthrough */
>         case 6: du32[sz - 6] = su32[sz - 6]; /* fallthrough */
>         case 5: du32[sz - 5] = su32[sz - 5]; /* fallthrough */
>         case 4: du32[sz - 4] = su32[sz - 4]; /* fallthrough */
>         case 3: du32[sz - 3] = su32[sz - 3]; /* fallthrough */
>         case 2: du32[sz - 2] = su32[sz - 2]; /* fallthrough */
>         case 1: du32[sz - 1] = su32[sz - 1]; /* fallthrough */
>         }
> }
> 
> static inline void
> enqueue_elems(struct rte_ring *r, void *ring_start, uint32_t prod_head,
>                 void *obj_table, uint32_t num, uint32_t esize) {
>         uint32_t idx, n;
>         uint32_t *du32;
> 
>         const uint32_t size = r->size;
> 
>         idx = prod_head & (r)->mask;
> 
>         du32 = ring_start + idx * sizeof(uint32_t);
> 
>         if (idx + num < size)
>                 copy_elems(du32, obj_table, num, esize);
>         else {
>                 n = size - idx;
>                 copy_elems(du32, obj_table, n, esize);
>                 copy_elems(ring_start, obj_table + n * sizeof(uint32_t),
>                         num - n, esize);
>         }
> }
> 
> And then, in that function, instead of ENQUEUE_PTRS_ELEM(), just:
> 
> enqueue_elems(r, &r[1], prod_head, obj_table, n, esize);
> 
> 
> > > +{
> > > +	uint32_t prod_head, prod_next;
> > > +	uint32_t free_entries;
> > > +
> > > +	n = __rte_ring_move_prod_head(r, is_sp, n, behavior,
> > > +			&prod_head, &prod_next, &free_entries);
> > > +	if (n == 0)
> > > +		goto end;
> > > +
> > > +	ENQUEUE_PTRS_ELEM(r, &r[1], prod_head, obj_table, esize, n);
> > > +
> > > +	update_tail(&r->prod, prod_head, prod_next, is_sp, 1);
> > > +end:
> > > +	if (free_space != NULL)
> > > +		*free_space = free_entries - n;
> > > +	return n;
> > > +}
> > > +


* Re: [dpdk-dev] [PATCH v4 1/2] lib/ring: apis to support configurable element size
  2019-10-14 23:56           ` Honnappa Nagarahalli
@ 2019-10-15  9:34             ` Ananyev, Konstantin
  2019-10-17  4:46               ` Honnappa Nagarahalli
  0 siblings, 1 reply; 173+ messages in thread
From: Ananyev, Konstantin @ 2019-10-15  9:34 UTC (permalink / raw)
  To: Honnappa Nagarahalli, olivier.matz, sthemmin, jerinj, Richardson,
	Bruce, david.marchand, pbhagavatula
  Cc: dev, Dharmik Thakkar, Ruifeng Wang (Arm Technology China),
	Gavin Hu (Arm Technology China),
	stephen, nd, nd


Hi Honnappa,
 
> > > >
> > > > Current APIs assume ring elements to be pointers. However, in many
> > > > use cases, the size can be different. Add new APIs to support
> > > > configurable ring element sizes.
> > > >
> > > > Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
> > > > Reviewed-by: Dharmik Thakkar <dharmik.thakkar@arm.com>
> > > > Reviewed-by: Gavin Hu <gavin.hu@arm.com>
> > > > Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
> > > > ---
> > > >  lib/librte_ring/Makefile             |   3 +-
> > > >  lib/librte_ring/meson.build          |   3 +
> > > >  lib/librte_ring/rte_ring.c           |  45 +-
> > > >  lib/librte_ring/rte_ring.h           |   1 +
> > > >  lib/librte_ring/rte_ring_elem.h      | 946 +++++++++++++++++++++++++++
> > > >  lib/librte_ring/rte_ring_version.map |   2 +
> > > >  6 files changed, 991 insertions(+), 9 deletions(-)
> > > >  create mode 100644 lib/librte_ring/rte_ring_elem.h
> > > >
> > > > diff --git a/lib/librte_ring/Makefile b/lib/librte_ring/Makefile
> > > > index 21a36770d..515a967bb 100644
> > > > --- a/lib/librte_ring/Makefile
> > > > +++ b/lib/librte_ring/Makefile
> > > > @@ -6,7 +6,7 @@ include $(RTE_SDK)/mk/rte.vars.mk
> > > >  # library name
> > > >  LIB = librte_ring.a
> > > >
> > > > -CFLAGS += $(WERROR_FLAGS) -I$(SRCDIR) -O3
> > > > +CFLAGS += $(WERROR_FLAGS) -I$(SRCDIR) -O3 -DALLOW_EXPERIMENTAL_API
> > > >  LDLIBS += -lrte_eal
> > > >
> > > >  EXPORT_MAP := rte_ring_version.map
> > > > @@ -18,6 +18,7 @@ SRCS-$(CONFIG_RTE_LIBRTE_RING) := rte_ring.c
> > > >
> > > >  # install includes
> > > >  SYMLINK-$(CONFIG_RTE_LIBRTE_RING)-include := rte_ring.h \
> > > > +					rte_ring_elem.h \
> > > >  					rte_ring_generic.h \
> > > >  					rte_ring_c11_mem.h
> > > >
> > > > diff --git a/lib/librte_ring/meson.build b/lib/librte_ring/meson.build
> > > > index ab8b0b469..74219840a 100644
> > > > --- a/lib/librte_ring/meson.build
> > > > +++ b/lib/librte_ring/meson.build
> > > > @@ -6,3 +6,6 @@ sources = files('rte_ring.c')
> > > >  headers = files('rte_ring.h',
> > > >  		'rte_ring_c11_mem.h',
> > > >  		'rte_ring_generic.h')
> > > > +
> > > > +# rte_ring_create_elem and rte_ring_get_memsize_elem are experimental
> > > > +allow_experimental_apis = true
> > > > diff --git a/lib/librte_ring/rte_ring.c b/lib/librte_ring/rte_ring.c
> > > > index d9b308036..6fed3648b 100644
> > > > --- a/lib/librte_ring/rte_ring.c
> > > > +++ b/lib/librte_ring/rte_ring.c
> > > > @@ -33,6 +33,7 @@
> > > >  #include <rte_tailq.h>
> > > >
> > > >  #include "rte_ring.h"
> > > > +#include "rte_ring_elem.h"
> > > >
> > > >  TAILQ_HEAD(rte_ring_list, rte_tailq_entry);
> > > >
> > > > @@ -46,23 +47,42 @@ EAL_REGISTER_TAILQ(rte_ring_tailq)
> > > >
> > > >  /* return the size of memory occupied by a ring */
> > > >  ssize_t
> > > > -rte_ring_get_memsize(unsigned count)
> > > > +rte_ring_get_memsize_elem(unsigned count, unsigned esize)
> > > >  {
> > > >  	ssize_t sz;
> > > >
> > > > +	/* Supported esize values are 4/8/16.
> > > > +	 * Others can be added on need basis.
> > > > +	 */
> > > > +	if ((esize != 4) && (esize != 8) && (esize != 16)) {
> > > > +		RTE_LOG(ERR, RING,
> > > > +			"Unsupported esize value. Supported values are 4, 8 and 16\n");
> > > > +
> > > > +		return -EINVAL;
> > > > +	}
> > > > +
> > > >  	/* count must be a power of 2 */
> > > >  	if ((!POWEROF2(count)) || (count > RTE_RING_SZ_MASK )) {
> > > >  		RTE_LOG(ERR, RING,
> > > > -			"Requested size is invalid, must be power of 2, and "
> > > > -			"do not exceed the size limit %u\n", RTE_RING_SZ_MASK);
> > > > +			"Requested number of elements is invalid, must be "
> > > > +			"power of 2, and do not exceed the limit %u\n",
> > > > +			RTE_RING_SZ_MASK);
> > > > +
> > > >  		return -EINVAL;
> > > >  	}
> > > >
> > > > -	sz = sizeof(struct rte_ring) + count * sizeof(void *);
> > > > +	sz = sizeof(struct rte_ring) + count * esize;
> > > >  	sz = RTE_ALIGN(sz, RTE_CACHE_LINE_SIZE);
> > > >  	return sz;
> > > >  }
> > > >
> > > > +/* return the size of memory occupied by a ring */
> > > > +ssize_t
> > > > +rte_ring_get_memsize(unsigned count)
> > > > +{
> > > > +	return rte_ring_get_memsize_elem(count, sizeof(void *));
> > > > +}
> > > > +
> > > >  void
> > > >  rte_ring_reset(struct rte_ring *r)
> > > >  {
> > > > @@ -114,10 +134,10 @@ rte_ring_init(struct rte_ring *r, const char *name, unsigned count,
> > > >  	return 0;
> > > >  }
> > > >
> > > > -/* create the ring */
> > > > +/* create the ring for a given element size */
> > > >  struct rte_ring *
> > > > -rte_ring_create(const char *name, unsigned count, int socket_id,
> > > > -		unsigned flags)
> > > > +rte_ring_create_elem(const char *name, unsigned count, unsigned esize,
> > > > +		int socket_id, unsigned flags)
> > > >  {
> > > >  	char mz_name[RTE_MEMZONE_NAMESIZE];
> > > >  	struct rte_ring *r;
> > > > @@ -135,7 +155,7 @@ rte_ring_create(const char *name, unsigned count, int socket_id,
> > > >  	if (flags & RING_F_EXACT_SZ)
> > > >  		count = rte_align32pow2(count + 1);
> > > >
> > > > -	ring_size = rte_ring_get_memsize(count);
> > > > +	ring_size = rte_ring_get_memsize_elem(count, esize);
> > > >  	if (ring_size < 0) {
> > > >  		rte_errno = ring_size;
> > > >  		return NULL;
> > > > @@ -182,6 +202,15 @@ rte_ring_create(const char *name, unsigned count, int socket_id,
> > > >  	return r;
> > > >  }
> > > >
> > > > +/* create the ring */
> > > > +struct rte_ring *
> > > > +rte_ring_create(const char *name, unsigned count, int socket_id,
> > > > +		unsigned flags)
> > > > +{
> > > > +	return rte_ring_create_elem(name, count, sizeof(void *), socket_id,
> > > > +		flags);
> > > > +}
> > > > +
> > > >  /* free the ring */
> > > >  void
> > > >  rte_ring_free(struct rte_ring *r)
> > > > diff --git a/lib/librte_ring/rte_ring.h b/lib/librte_ring/rte_ring.h
> > > > index 2a9f768a1..18fc5d845 100644
> > > > --- a/lib/librte_ring/rte_ring.h
> > > > +++ b/lib/librte_ring/rte_ring.h
> > > > @@ -216,6 +216,7 @@ int rte_ring_init(struct rte_ring *r, const char *name, unsigned count,
> > > >   */
> > > >  struct rte_ring *rte_ring_create(const char *name, unsigned count,
> > > >  				 int socket_id, unsigned flags);
> > > > +
> > > >  /**
> > > >   * De-allocate all memory used by the ring.
> > > >   *
> > > > diff --git a/lib/librte_ring/rte_ring_elem.h b/lib/librte_ring/rte_ring_elem.h
> > > > new file mode 100644
> > > > index 000000000..860f059ad
> > > > --- /dev/null
> > > > +++ b/lib/librte_ring/rte_ring_elem.h
> > > > @@ -0,0 +1,946 @@
> > > > +/* SPDX-License-Identifier: BSD-3-Clause
> > > > + *
> > > > + * Copyright (c) 2019 Arm Limited
> > > > + * Copyright (c) 2010-2017 Intel Corporation
> > > > + * Copyright (c) 2007-2009 Kip Macy kmacy@freebsd.org
> > > > + * All rights reserved.
> > > > + * Derived from FreeBSD's bufring.h
> > > > + * Used as BSD-3 Licensed with permission from Kip Macy.
> > > > + */
> > > > +
> > > > +#ifndef _RTE_RING_ELEM_H_
> > > > +#define _RTE_RING_ELEM_H_
> > > > +
> > > > +/**
> > > > + * @file
> > > > + * RTE Ring with flexible element size
> > > > + */
> > > > +
> > > > +#ifdef __cplusplus
> > > > +extern "C" {
> > > > +#endif
> > > > +
> > > > +#include <stdio.h>
> > > > +#include <stdint.h>
> > > > +#include <sys/queue.h>
> > > > +#include <errno.h>
> > > > +#include <rte_common.h>
> > > > +#include <rte_config.h>
> > > > +#include <rte_memory.h>
> > > > +#include <rte_lcore.h>
> > > > +#include <rte_atomic.h>
> > > > +#include <rte_branch_prediction.h>
> > > > +#include <rte_memzone.h>
> > > > +#include <rte_pause.h>
> > > > +
> > > > +#include "rte_ring.h"
> > > > +
> > > > +/**
> > > > + * @warning
> > > > + * @b EXPERIMENTAL: this API may change without prior notice
> > > > + *
> > > > + * Calculate the memory size needed for a ring with given element size
> > > > + *
> > > > + * This function returns the number of bytes needed for a ring, given
> > > > + * the number of elements in it and the size of the element. This value
> > > > + * is the sum of the size of the structure rte_ring and the size of the
> > > > + * memory needed for storing the elements. The value is aligned to a cache
> > > > + * line size.
> > > > + *
> > > > + * @param count
> > > > + *   The number of elements in the ring (must be a power of 2).
> > > > + * @param esize
> > > > + *   The size of ring element, in bytes. It must be a multiple of 4.
> > > > + *   Currently, sizes 4, 8 and 16 are supported.
> > > > + * @return
> > > > + *   - The memory size needed for the ring on success.
> > > > + *   - -EINVAL if count is not a power of 2.
> > > > + */
> > > > +__rte_experimental
> > > > +ssize_t rte_ring_get_memsize_elem(unsigned count, unsigned esize);
> > > > +
> > > > +/**
> > > > + * @warning
> > > > + * @b EXPERIMENTAL: this API may change without prior notice
> > > > + *
> > > > + * Create a new ring named *name* that stores elements with given size.
> > > > + *
> > > > + * This function uses ``memzone_reserve()`` to allocate memory. Then it
> > > > + * calls rte_ring_init() to initialize an empty ring.
> > > > + *
> > > > + * The new ring size is set to *count*, which must be a power of
> > > > + * two. Water marking is disabled by default. The real usable ring size
> > > > + * is *count-1* instead of *count* to differentiate a free ring from an
> > > > + * empty ring.
> > > > + *
> > > > + * The ring is added in RTE_TAILQ_RING list.
> > > > + *
> > > > + * @param name
> > > > + *   The name of the ring.
> > > > + * @param count
> > > > + *   The number of elements in the ring (must be a power of 2).
> > > > + * @param esize
> > > > + *   The size of ring element, in bytes. It must be a multiple of 4.
> > > > + *   Currently, sizes 4, 8 and 16 are supported.
> > > > + * @param socket_id
> > > > + *   The *socket_id* argument is the socket identifier in case of
> > > > + *   NUMA. The value can be *SOCKET_ID_ANY* if there is no NUMA
> > > > + *   constraint for the reserved zone.
> > > > + * @param flags
> > > > + *   An OR of the following:
> > > > + *    - RING_F_SP_ENQ: If this flag is set, the default behavior when
> > > > + *      using ``rte_ring_enqueue()`` or ``rte_ring_enqueue_bulk()``
> > > > + *      is "single-producer". Otherwise, it is "multi-producers".
> > > > + *    - RING_F_SC_DEQ: If this flag is set, the default behavior when
> > > > + *      using ``rte_ring_dequeue()`` or ``rte_ring_dequeue_bulk()``
> > > > + *      is "single-consumer". Otherwise, it is "multi-consumers".
> > > > + * @return
> > > > + *   On success, the pointer to the new allocated ring. NULL on error with
> > > > + *    rte_errno set appropriately. Possible errno values include:
> > > > + *    - E_RTE_NO_CONFIG - function could not get pointer to rte_config structure
> > > > + *    - E_RTE_SECONDARY - function was called from a secondary process instance
> > > > + *    - EINVAL - count provided is not a power of 2
> > > > + *    - ENOSPC - the maximum number of memzones has already been allocated
> > > > + *    - EEXIST - a memzone with the same name already exists
> > > > + *    - ENOMEM - no appropriate memory area found in which to create memzone
> > > > + */
> > > > +__rte_experimental
> > > > +struct rte_ring *rte_ring_create_elem(const char *name, unsigned count,
> > > > +				unsigned esize, int socket_id, unsigned flags);
> > > > +
> > > > +/* the actual enqueue of pointers on the ring.
> > > > + * Placed here since identical code needed in both
> > > > + * single and multi producer enqueue functions.
> > > > + */
> > > > +#define ENQUEUE_PTRS_ELEM(r, ring_start, prod_head, obj_table, esize, n) do { \
> > > > +	if (esize == 4) \
> > > > +		ENQUEUE_PTRS_32(r, ring_start, prod_head, obj_table, n); \
> > > > +	else if (esize == 8) \
> > > > +		ENQUEUE_PTRS_64(r, ring_start, prod_head, obj_table, n); \
> > > > +	else if (esize == 16) \
> > > > +		ENQUEUE_PTRS_128(r, ring_start, prod_head, obj_table, n); \
> > > > +} while (0)
> > > > +
> > > > +#define ENQUEUE_PTRS_32(r, ring_start, prod_head, obj_table, n) do { \
> > > > +	unsigned int i; \
> > > > +	const uint32_t size = (r)->size; \
> > > > +	uint32_t idx = prod_head & (r)->mask; \
> > > > +	uint32_t *ring = (uint32_t *)ring_start; \
> > > > +	uint32_t *obj = (uint32_t *)obj_table; \
> > > > +	if (likely(idx + n < size)) { \
> > > > +		for (i = 0; i < (n & ((~(unsigned)0x7))); i += 8, idx += 8) { \
> > > > +			ring[idx] = obj[i]; \
> > > > +			ring[idx + 1] = obj[i + 1]; \
> > > > +			ring[idx + 2] = obj[i + 2]; \
> > > > +			ring[idx + 3] = obj[i + 3]; \
> > > > +			ring[idx + 4] = obj[i + 4]; \
> > > > +			ring[idx + 5] = obj[i + 5]; \
> > > > +			ring[idx + 6] = obj[i + 6]; \
> > > > +			ring[idx + 7] = obj[i + 7]; \
> > > > +		} \
> > > > +		switch (n & 0x7) { \
> > > > +		case 7: \
> > > > +			ring[idx++] = obj[i++]; /* fallthrough */ \
> > > > +		case 6: \
> > > > +			ring[idx++] = obj[i++]; /* fallthrough */ \
> > > > +		case 5: \
> > > > +			ring[idx++] = obj[i++]; /* fallthrough */ \
> > > > +		case 4: \
> > > > +			ring[idx++] = obj[i++]; /* fallthrough */ \
> > > > +		case 3: \
> > > > +			ring[idx++] = obj[i++]; /* fallthrough */ \
> > > > +		case 2: \
> > > > +			ring[idx++] = obj[i++]; /* fallthrough */ \
> > > > +		case 1: \
> > > > +			ring[idx++] = obj[i++]; /* fallthrough */ \
> > > > +		} \
> > > > +	} else { \
> > > > +		for (i = 0; idx < size; i++, idx++)\
> > > > +			ring[idx] = obj[i]; \
> > > > +		for (idx = 0; i < n; i++, idx++) \
> > > > +			ring[idx] = obj[i]; \
> > > > +	} \
> > > > +} while (0)
> > > > +
> > > > +#define ENQUEUE_PTRS_64(r, ring_start, prod_head, obj_table, n) do { \
> > > > +	unsigned int i; \
> > > > +	const uint32_t size = (r)->size; \
> > > > +	uint32_t idx = prod_head & (r)->mask; \
> > > > +	uint64_t *ring = (uint64_t *)ring_start; \
> > > > +	uint64_t *obj = (uint64_t *)obj_table; \
> > > > +	if (likely(idx + n < size)) { \
> > > > +		for (i = 0; i < (n & ((~(unsigned)0x3))); i += 4, idx += 4) { \
> > > > +			ring[idx] = obj[i]; \
> > > > +			ring[idx + 1] = obj[i + 1]; \
> > > > +			ring[idx + 2] = obj[i + 2]; \
> > > > +			ring[idx + 3] = obj[i + 3]; \
> > > > +		} \
> > > > +		switch (n & 0x3) { \
> > > > +		case 3: \
> > > > +			ring[idx++] = obj[i++]; /* fallthrough */ \
> > > > +		case 2: \
> > > > +			ring[idx++] = obj[i++]; /* fallthrough */ \
> > > > +		case 1: \
> > > > +			ring[idx++] = obj[i++]; \
> > > > +		} \
> > > > +	} else { \
> > > > +		for (i = 0; idx < size; i++, idx++)\
> > > > +			ring[idx] = obj[i]; \
> > > > +		for (idx = 0; i < n; i++, idx++) \
> > > > +			ring[idx] = obj[i]; \
> > > > +	} \
> > > > +} while (0)
> > > > +
> > > > +#define ENQUEUE_PTRS_128(r, ring_start, prod_head, obj_table, n) do { \
> > > > +	unsigned int i; \
> > > > +	const uint32_t size = (r)->size; \
> > > > +	uint32_t idx = prod_head & (r)->mask; \
> > > > +	__uint128_t *ring = (__uint128_t *)ring_start; \
> > > > +	__uint128_t *obj = (__uint128_t *)obj_table; \
> > > > +	if (likely(idx + n < size)) { \
> > > > +		for (i = 0; i < (n >> 1); i += 2, idx += 2) { \
> > > > +			ring[idx] = obj[i]; \
> > > > +			ring[idx + 1] = obj[i + 1]; \
> > > > +		} \
> > > > +		switch (n & 0x1) { \
> > > > +		case 1: \
> > > > +			ring[idx++] = obj[i++]; \
> > > > +		} \
> > > > +	} else { \
> > > > +		for (i = 0; idx < size; i++, idx++)\
> > > > +			ring[idx] = obj[i]; \
> > > > +		for (idx = 0; i < n; i++, idx++) \
> > > > +			ring[idx] = obj[i]; \
> > > > +	} \
> > > > +} while (0)
> > > > +
> > > > +/* the actual copy of pointers on the ring to obj_table.
> > > > + * Placed here since identical code needed in both
> > > > + * single and multi consumer dequeue functions.
> > > > + */
> > > > +#define DEQUEUE_PTRS_ELEM(r, ring_start, cons_head, obj_table, esize, n) do { \
> > > > +	if (esize == 4) \
> > > > +		DEQUEUE_PTRS_32(r, ring_start, cons_head, obj_table, n); \
> > > > +	else if (esize == 8) \
> > > > +		DEQUEUE_PTRS_64(r, ring_start, cons_head, obj_table, n); \
> > > > +	else if (esize == 16) \
> > > > +		DEQUEUE_PTRS_128(r, ring_start, cons_head, obj_table, n); \
> > > > +} while (0)
> > > > +
> > > > +#define DEQUEUE_PTRS_32(r, ring_start, cons_head, obj_table, n) do { \
> > > > +	unsigned int i; \
> > > > +	uint32_t idx = cons_head & (r)->mask; \
> > > > +	const uint32_t size = (r)->size; \
> > > > +	uint32_t *ring = (uint32_t *)ring_start; \
> > > > +	uint32_t *obj = (uint32_t *)obj_table; \
> > > > +	if (likely(idx + n < size)) { \
> > > > +		for (i = 0; i < (n & (~(unsigned)0x7)); i += 8, idx += 8) {\
> > > > +			obj[i] = ring[idx]; \
> > > > +			obj[i + 1] = ring[idx + 1]; \
> > > > +			obj[i + 2] = ring[idx + 2]; \
> > > > +			obj[i + 3] = ring[idx + 3]; \
> > > > +			obj[i + 4] = ring[idx + 4]; \
> > > > +			obj[i + 5] = ring[idx + 5]; \
> > > > +			obj[i + 6] = ring[idx + 6]; \
> > > > +			obj[i + 7] = ring[idx + 7]; \
> > > > +		} \
> > > > +		switch (n & 0x7) { \
> > > > +		case 7: \
> > > > +			obj[i++] = ring[idx++]; /* fallthrough */ \
> > > > +		case 6: \
> > > > +			obj[i++] = ring[idx++]; /* fallthrough */ \
> > > > +		case 5: \
> > > > +			obj[i++] = ring[idx++]; /* fallthrough */ \
> > > > +		case 4: \
> > > > +			obj[i++] = ring[idx++]; /* fallthrough */ \
> > > > +		case 3: \
> > > > +			obj[i++] = ring[idx++]; /* fallthrough */ \
> > > > +		case 2: \
> > > > +			obj[i++] = ring[idx++]; /* fallthrough */ \
> > > > +		case 1: \
> > > > +			obj[i++] = ring[idx++]; /* fallthrough */ \
> > > > +		} \
> > > > +	} else { \
> > > > +		for (i = 0; idx < size; i++, idx++) \
> > > > +			obj[i] = ring[idx]; \
> > > > +		for (idx = 0; i < n; i++, idx++) \
> > > > +			obj[i] = ring[idx]; \
> > > > +	} \
> > > > +} while (0)
> > > > +
> > > > +#define DEQUEUE_PTRS_64(r, ring_start, cons_head, obj_table, n) do { \
> > > > +	unsigned int i; \
> > > > +	uint32_t idx = cons_head & (r)->mask; \
> > > > +	const uint32_t size = (r)->size; \
> > > > +	uint64_t *ring = (uint64_t *)ring_start; \
> > > > +	uint64_t *obj = (uint64_t *)obj_table; \
> > > > +	if (likely(idx + n < size)) { \
> > > > +		for (i = 0; i < (n & (~(unsigned)0x3)); i += 4, idx += 4) {\
> > > > +			obj[i] = ring[idx]; \
> > > > +			obj[i + 1] = ring[idx + 1]; \
> > > > +			obj[i + 2] = ring[idx + 2]; \
> > > > +			obj[i + 3] = ring[idx + 3]; \
> > > > +		} \
> > > > +		switch (n & 0x3) { \
> > > > +		case 3: \
> > > > +			obj[i++] = ring[idx++]; /* fallthrough */ \
> > > > +		case 2: \
> > > > +			obj[i++] = ring[idx++]; /* fallthrough */ \
> > > > +		case 1: \
> > > > +			obj[i++] = ring[idx++]; \
> > > > +		} \
> > > > +	} else { \
> > > > +		for (i = 0; idx < size; i++, idx++) \
> > > > +			obj[i] = ring[idx]; \
> > > > +		for (idx = 0; i < n; i++, idx++) \
> > > > +			obj[i] = ring[idx]; \
> > > > +	} \
> > > > +} while (0)
> > > > +
> > > > +#define DEQUEUE_PTRS_128(r, ring_start, cons_head, obj_table, n) do { \
> > > > +	unsigned int i; \
> > > > +	uint32_t idx = cons_head & (r)->mask; \
> > > > +	const uint32_t size = (r)->size; \
> > > > +	__uint128_t *ring = (__uint128_t *)ring_start; \
> > > > +	__uint128_t *obj = (__uint128_t *)obj_table; \
> > > > +	if (likely(idx + n < size)) { \
> > > > +		for (i = 0; i < (n >> 1); i += 2, idx += 2) { \
> > > > +			obj[i] = ring[idx]; \
> > > > +			obj[i + 1] = ring[idx + 1]; \
> > > > +		} \
> > > > +		switch (n & 0x1) { \
> > > > +		case 1: \
> > > > +			obj[i++] = ring[idx++]; /* fallthrough */ \
> > > > +		} \
> > > > +	} else { \
> > > > +		for (i = 0; idx < size; i++, idx++) \
> > > > +			obj[i] = ring[idx]; \
> > > > +		for (idx = 0; i < n; i++, idx++) \
> > > > +			obj[i] = ring[idx]; \
> > > > +	} \
> > > > +} while (0)
> > > > +
> > > > +/* Between load and load. there might be cpu reorder in weak model
> > > > + * (powerpc/arm).
> > > > + * There are 2 choices for the users
> > > > + * 1.use rmb() memory barrier
> > > > + * 2.use one-direction load_acquire/store_release barrier,defined by
> > > > + * CONFIG_RTE_USE_C11_MEM_MODEL=y
> > > > + * It depends on performance test results.
> > > > + * By default, move common functions to rte_ring_generic.h
> > > > + */
> > > > +#ifdef RTE_USE_C11_MEM_MODEL
> > > > +#include "rte_ring_c11_mem.h"
> > > > +#else
> > > > +#include "rte_ring_generic.h"
> > > > +#endif
> > > > +
> > > > +/**
> > > > + * @internal Enqueue several objects on the ring
> > > > + *
> > > > + * @param r
> > > > + *   A pointer to the ring structure.
> > > > + * @param obj_table
> > > > + *   A pointer to a table of void * pointers (objects).
> > > > + * @param esize
> > > > + *   The size of ring element, in bytes. It must be a multiple of 4.
> > > > + *   Currently, sizes 4, 8 and 16 are supported. This should be the same
> > > > + *   as passed while creating the ring, otherwise the results are undefined.
> > > > + * @param n
> > > > + *   The number of objects to add in the ring from the obj_table.
> > > > + * @param behavior
> > > > + *   RTE_RING_QUEUE_FIXED:    Enqueue a fixed number of items from a ring
> > > > + *   RTE_RING_QUEUE_VARIABLE: Enqueue as many items as possible from ring
> > > > + * @param is_sp
> > > > + *   Indicates whether to use single producer or multi-producer head update
> > > > + * @param free_space
> > > > + *   returns the amount of space after the enqueue operation has finished
> > > > + * @return
> > > > + *   Actual number of objects enqueued.
> > > > + *   If behavior == RTE_RING_QUEUE_FIXED, this will be 0 or n only.
> > > > + */
> > > > +static __rte_always_inline unsigned int
> > > > +__rte_ring_do_enqueue_elem(struct rte_ring *r, void * const obj_table,
> > > > +		unsigned int esize, unsigned int n,
> > > > +		enum rte_ring_queue_behavior behavior, unsigned int is_sp,
> > > > +		unsigned int *free_space)
> >
> >
> > I like the idea to add esize as an argument to the public API, so the compiler
> > can do its job optimizing calls with constant esize.
> > Though I am not very happy with the rest of implementation:
> > 1. It doesn't really provide configurable elem size - only 4/8/16B elems are
> > supported.
> Agree. I was thinking other sizes can be added on need basis.
> However, I am wondering if we should just provide for 4B and then the users can use bulk operations to construct whatever they need?

I suppose it could be plan B, if there is no agreement on the generic case.
And for 4B elems, I guess you do have a particular use-case?

> It
> would mean extra work for the users.
> 
> > 2. A lot of code duplication with these 3 copies of ENQUEUE/DEQUEUE
> > macros.
> >
> > Looking at ENQUEUE/DEQUEUE macros, I can see that main loop always does
> > 32B copy per iteration.
> Yes, I tried to keep it the same as the existing one (originally, I guess the intention was to allow for 256b vector instructions to be
> generated)
> 
> > So wonder can we make a generic function that would do 32B copy per
> > iteration in a main loop, and copy tail  by 4B chunks?
> > That would avoid copy duplication and will allow user to have any elem size
> > (multiple of 4B) he wants.
> > Something like that (note didn't test it, just a rough idea):
> >
> >  static inline void
> > copy_elems(uint32_t du32[], const uint32_t su32[], uint32_t num, uint32_t
> > esize) {
> >         uint32_t i, sz;
> >
> >         sz = (num * esize) / sizeof(uint32_t);
> If 'num' is a compile time constant, 'sz' will be a compile time constant. Otherwise, this will result in a multiplication operation. 

Not always.
If esize is a compile-time constant, then for a power-of-2 esize (4, 8, 16, ...) it would be just one shift.
For other constant values it could be a 'mul', or in many cases just 2 shifts plus an 'add' (if the compiler is smart enough).
I.e. for a 24B elem it would be either num * 6 or (num << 2) + (num << 1).
I suppose for non-power-of-2 elems it might be OK to take such a small perf hit.
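
To make the arithmetic concrete, here is a small sketch (not from the patch; the helper names words_mul, words_shift_24 and words_shift_16 are made up for illustration). It shows that the 32-bit word count for a 24B element is num * 6, which can also be written as two shifts and an add, and that a 16B element needs a single shift. Whether a given compiler actually emits the shift form is up to the compiler.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: number of 32-bit words occupied by 'num' elements
 * of 'esize' bytes each, computed the straightforward way. */
static inline uint32_t
words_mul(uint32_t num, uint32_t esize)
{
	assert(esize % sizeof(uint32_t) == 0);
	return (num * esize) / sizeof(uint32_t);
}

/* Same computation specialised by hand for esize == 24:
 * num * 6 == (num << 2) + (num << 1). */
static inline uint32_t
words_shift_24(uint32_t num)
{
	return (num << 2) + (num << 1);
}

/* For a power-of-2 element size it is a single shift, e.g. for
 * esize == 16: num * 4 == num << 2. */
static inline uint32_t
words_shift_16(uint32_t num)
{
	return num << 2;
}
```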

>I have tried 
> to avoid the multiplication operation and try to use shift and mask operations (just like how the rest of the ring code does).
> 
> >
> >         for (i = 0; i < (sz & ~7); i += 8)
> >                 memcpy(du32 + i, su32 + i, 8 * sizeof(uint32_t));
> I had used memcpy to start with (for the entire copy operation), performance is not the same for 64b elements when compared with the
> existing ring APIs (some cases more and some cases less).

I remember that from one of your previous mails; that's why I suggest here to use memcpy() with a fixed size in a loop.
That way, for each iteration the compiler will replace memcpy() with instructions that copy 32B in whatever way it thinks is optimal
(same as for the original macro, I think).
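
A minimal, self-contained sketch of that idea, assuming esize is a multiple of 4. It follows the copy_elems() outline quoted in this thread, except that the tail is copied with a plain loop instead of the fall-through switch, purely to keep the example short. It is untested against the ring performance tests.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Sketch only: copy 'num' elements of 'esize' bytes each as 32-bit words.
 * 'esize' is assumed to be a multiple of 4. */
static inline void
copy_elems(uint32_t du32[], const uint32_t su32[], uint32_t num,
		uint32_t esize)
{
	uint32_t i, sz;

	assert(esize % sizeof(uint32_t) == 0);
	sz = (num * esize) / sizeof(uint32_t);

	/* Main loop: the memcpy() size is a compile-time constant (32B),
	 * so the compiler is free to expand it inline. */
	for (i = 0; i < (sz & ~7u); i += 8)
		memcpy(du32 + i, su32 + i, 8 * sizeof(uint32_t));

	/* Tail: the remaining 0..7 words, one word at a time. */
	for (; i < sz; i++)
		du32[i] = su32[i];
}
```

With a constant esize at the call site, sz reduces to a shift (or shifts plus an add), so only the loop bounds depend on run-time values.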

> 
> IMO, we have to keep the performance of the 64b and 128b the same as what we get with the existing ring and event-ring APIs. That would
> allow us to replace them with these new APIs. I suggest that we keep the macros in this patch for 64b and 128b.

I still think we can probably achieve that without duplicating macros, while still supporting arbitrary elem size.
See above.
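
To illustrate the arbitrary-elem-size part, here is a rough sketch of the wrap-around handling on a toy ring. Everything in it is an assumption made for the example: struct toy_ring and toy_enqueue_elems are invented names and not struct rte_ring or any DPDK function, and the copy step uses plain variable-size memcpy() purely to keep the indexing readable, so it says nothing about performance. Note that in this sketch both the slot offset and the source offset are scaled by esize.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Toy stand-in for a ring; NOT struct rte_ring. */
struct toy_ring {
	uint32_t size;   /* number of slots, a power of 2 */
	uint32_t mask;   /* size - 1 */
	uint32_t esize;  /* element size in bytes, a multiple of 4 */
	uint8_t *slots;  /* size * esize bytes of storage */
};

/* Copy 'num' elements from obj_table into the ring starting at the slot
 * selected by prod_head, wrapping to slot 0 if the end is reached. */
static inline void
toy_enqueue_elems(struct toy_ring *r, uint32_t prod_head,
		const void *obj_table, uint32_t num)
{
	const uint8_t *src = obj_table;
	uint32_t idx = prod_head & r->mask;
	uint32_t n;

	assert(r->esize % sizeof(uint32_t) == 0);
	assert(num <= r->size);

	if (idx + num <= r->size) {
		/* No wrap: one contiguous copy. */
		memcpy(r->slots + (size_t)idx * r->esize, src,
			(size_t)num * r->esize);
	} else {
		/* Wrap: fill up to the end, then continue from slot 0. */
		n = r->size - idx;
		memcpy(r->slots + (size_t)idx * r->esize, src,
			(size_t)n * r->esize);
		memcpy(r->slots, src + (size_t)n * r->esize,
			(size_t)(num - n) * r->esize);
	}
}
```

For a 4-slot ring with 12B elements and prod_head = 3, enqueuing 3 elements places the first in slot 3 and the other two in slots 0 and 1.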

> For the rest of the sizes, we could put a for loop around 32b macro (this would allow for all sizes as well).
> 
> >
> >         switch (sz & 7) {
> >         case 7: du32[sz - 7] = su32[sz - 7]; /* fallthrough */
> >         case 6: du32[sz - 6] = su32[sz - 6]; /* fallthrough */
> >         case 5: du32[sz - 5] = su32[sz - 5]; /* fallthrough */
> >         case 4: du32[sz - 4] = su32[sz - 4]; /* fallthrough */
> >         case 3: du32[sz - 3] = su32[sz - 3]; /* fallthrough */
> >         case 2: du32[sz - 2] = su32[sz - 2]; /* fallthrough */
> >         case 1: du32[sz - 1] = su32[sz - 1]; /* fallthrough */
> >         }
> > }
> >
> > static inline void
> > enqueue_elems(struct rte_ring *r, void *ring_start, uint32_t prod_head,
> >                 void *obj_table, uint32_t num, uint32_t esize) {
> >         uint32_t idx, n;
> >         uint32_t *du32;
> >
> >         const uint32_t size = r->size;
> >
> >         idx = prod_head & (r)->mask;
> >
> >         du32 = ring_start + idx * sizeof(uint32_t);
> >
> >         if (idx + num < size)
> >                 copy_elems(du32, obj_table, num, esize);
> >         else {
> >                 n = size - idx;
> >                 copy_elems(du32, obj_table, n, esize);
> >                 copy_elems(ring_start, obj_table + n * sizeof(uint32_t),
> >                         num - n, esize);
> >         }
> > }
> >
> > And then, in that function, instead of ENQUEUE_PTRS_ELEM(), just:
> >
> > enqueue_elems(r, &r[1], prod_head, obj_table, n, esize);
> >
> >
> > > > +{
> > > > +	uint32_t prod_head, prod_next;
> > > > +	uint32_t free_entries;
> > > > +
> > > > +	n = __rte_ring_move_prod_head(r, is_sp, n, behavior,
> > > > +			&prod_head, &prod_next, &free_entries);
> > > > +	if (n == 0)
> > > > +		goto end;
> > > > +
> > > > +	ENQUEUE_PTRS_ELEM(r, &r[1], prod_head, obj_table, esize, n);
> > > > +
> > > > +	update_tail(&r->prod, prod_head, prod_next, is_sp, 1);
> > > > +end:
> > > > +	if (free_space != NULL)
> > > > +		*free_space = free_entries - n;
> > > > +	return n;
> > > > +}
> > > > +


* Re: [dpdk-dev] [PATCH v4 1/2] lib/ring: apis to support configurable element size
  2019-10-15  9:34             ` Ananyev, Konstantin
@ 2019-10-17  4:46               ` Honnappa Nagarahalli
  2019-10-17 11:51                 ` Ananyev, Konstantin
  0 siblings, 1 reply; 173+ messages in thread
From: Honnappa Nagarahalli @ 2019-10-17  4:46 UTC (permalink / raw)
  To: Ananyev, Konstantin, olivier.matz, sthemmin, jerinj, Richardson,
	Bruce, david.marchand, pbhagavatula
  Cc: dev, Dharmik Thakkar, Ruifeng Wang (Arm Technology China),
	Gavin Hu (Arm Technology China),
	stephen, Honnappa Nagarahalli, nd, nd

<snip>

> Hi Honnappa,
> 
> > > > >
> > > > > Current APIs assume ring elements to be pointers. However, in
> > > > > many use cases, the size can be different. Add new APIs to
> > > > > support configurable ring element sizes.
> > > > >
> > > > > Signed-off-by: Honnappa Nagarahalli
> > > > > <honnappa.nagarahalli@arm.com>
> > > > > Reviewed-by: Dharmik Thakkar <dharmik.thakkar@arm.com>
> > > > > Reviewed-by: Gavin Hu <gavin.hu@arm.com>
> > > > > Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
> > > > > ---
> > > > >  lib/librte_ring/Makefile             |   3 +-
> > > > >  lib/librte_ring/meson.build          |   3 +
> > > > >  lib/librte_ring/rte_ring.c           |  45 +-
> > > > >  lib/librte_ring/rte_ring.h           |   1 +
> > > > >  lib/librte_ring/rte_ring_elem.h      | 946
> +++++++++++++++++++++++++++
> > > > >  lib/librte_ring/rte_ring_version.map |   2 +
> > > > >  6 files changed, 991 insertions(+), 9 deletions(-)  create mode
> > > > > 100644 lib/librte_ring/rte_ring_elem.h
> > > > >
> > > > > diff --git a/lib/librte_ring/Makefile b/lib/librte_ring/Makefile
> > > > > index 21a36770d..515a967bb 100644
> > > > > --- a/lib/librte_ring/Makefile
> > > > > +++ b/lib/librte_ring/Makefile

<snip>

> > > > > +
> > > > > +# rte_ring_create_elem and rte_ring_get_memsize_elem are
> > > > > +experimental allow_experimental_apis = true
> > > > > diff --git a/lib/librte_ring/rte_ring.c
> > > > > b/lib/librte_ring/rte_ring.c index d9b308036..6fed3648b 100644
> > > > > --- a/lib/librte_ring/rte_ring.c
> > > > > +++ b/lib/librte_ring/rte_ring.c
> > > > > @@ -33,6 +33,7 @@
> > > > >  #include <rte_tailq.h>
> > > > >
> > > > >  #include "rte_ring.h"
> > > > > +#include "rte_ring_elem.h"
> > > > >

<snip>

> > > > > diff --git a/lib/librte_ring/rte_ring_elem.h
> > > > > b/lib/librte_ring/rte_ring_elem.h new file mode 100644 index
> > > > > 000000000..860f059ad
> > > > > --- /dev/null
> > > > > +++ b/lib/librte_ring/rte_ring_elem.h
> > > > > @@ -0,0 +1,946 @@
> > > > > +/* SPDX-License-Identifier: BSD-3-Clause
> > > > > + *
> > > > > + * Copyright (c) 2019 Arm Limited
> > > > > + * Copyright (c) 2010-2017 Intel Corporation
> > > > > + * Copyright (c) 2007-2009 Kip Macy kmacy@freebsd.org
> > > > > + * All rights reserved.
> > > > > + * Derived from FreeBSD's bufring.h
> > > > > + * Used as BSD-3 Licensed with permission from Kip Macy.
> > > > > + */
> > > > > +
> > > > > +#ifndef _RTE_RING_ELEM_H_
> > > > > +#define _RTE_RING_ELEM_H_
> > > > > +

<snip>

> > > > > +
> > > > > +/* the actual enqueue of pointers on the ring.
> > > > > + * Placed here since identical code needed in both
> > > > > + * single and multi producer enqueue functions.
> > > > > + */
> > > > > +#define ENQUEUE_PTRS_ELEM(r, ring_start, prod_head, obj_table,
> > > > > +esize, n)
> > > > > do { \
> > > > > +	if (esize == 4) \
> > > > > +		ENQUEUE_PTRS_32(r, ring_start, prod_head,
> obj_table, n); \
> > > > > +	else if (esize == 8) \
> > > > > +		ENQUEUE_PTRS_64(r, ring_start, prod_head,
> obj_table, n); \
> > > > > +	else if (esize == 16) \
> > > > > +		ENQUEUE_PTRS_128(r, ring_start, prod_head,
> obj_table, n);
> > > \ }
> > > > > while
> > > > > +(0)
> > > > > +
> > > > > +#define ENQUEUE_PTRS_32(r, ring_start, prod_head, obj_table, n)
> do { \
> > > > > +	unsigned int i; \
> > > > > +	const uint32_t size = (r)->size; \
> > > > > +	uint32_t idx = prod_head & (r)->mask; \
> > > > > +	uint32_t *ring = (uint32_t *)ring_start; \
> > > > > +	uint32_t *obj = (uint32_t *)obj_table; \
> > > > > +	if (likely(idx + n < size)) { \
> > > > > +		for (i = 0; i < (n & ((~(unsigned)0x7))); i += 8, idx += 8)
> { \
> > > > > +			ring[idx] = obj[i]; \
> > > > > +			ring[idx + 1] = obj[i + 1]; \
> > > > > +			ring[idx + 2] = obj[i + 2]; \
> > > > > +			ring[idx + 3] = obj[i + 3]; \
> > > > > +			ring[idx + 4] = obj[i + 4]; \
> > > > > +			ring[idx + 5] = obj[i + 5]; \
> > > > > +			ring[idx + 6] = obj[i + 6]; \
> > > > > +			ring[idx + 7] = obj[i + 7]; \
> > > > > +		} \
> > > > > +		switch (n & 0x7) { \
> > > > > +		case 7: \
> > > > > +			ring[idx++] = obj[i++]; /* fallthrough */ \
> > > > > +		case 6: \
> > > > > +			ring[idx++] = obj[i++]; /* fallthrough */ \
> > > > > +		case 5: \
> > > > > +			ring[idx++] = obj[i++]; /* fallthrough */ \
> > > > > +		case 4: \
> > > > > +			ring[idx++] = obj[i++]; /* fallthrough */ \
> > > > > +		case 3: \
> > > > > +			ring[idx++] = obj[i++]; /* fallthrough */ \
> > > > > +		case 2: \
> > > > > +			ring[idx++] = obj[i++]; /* fallthrough */ \
> > > > > +		case 1: \
> > > > > +			ring[idx++] = obj[i++]; /* fallthrough */ \
> > > > > +		} \
> > > > > +	} else { \
> > > > > +		for (i = 0; idx < size; i++, idx++)\
> > > > > +			ring[idx] = obj[i]; \
> > > > > +		for (idx = 0; i < n; i++, idx++) \
> > > > > +			ring[idx] = obj[i]; \
> > > > > +	} \
> > > > > +} while (0)
> > > > > +
> > > > > +#define ENQUEUE_PTRS_64(r, ring_start, prod_head, obj_table, n)
> do { \
> > > > > +	unsigned int i; \
> > > > > +	const uint32_t size = (r)->size; \
> > > > > +	uint32_t idx = prod_head & (r)->mask; \
> > > > > +	uint64_t *ring = (uint64_t *)ring_start; \
> > > > > +	uint64_t *obj = (uint64_t *)obj_table; \
> > > > > +	if (likely(idx + n < size)) { \
> > > > > +		for (i = 0; i < (n & ((~(unsigned)0x3))); i += 4, idx += 4)
> { \
> > > > > +			ring[idx] = obj[i]; \
> > > > > +			ring[idx + 1] = obj[i + 1]; \
> > > > > +			ring[idx + 2] = obj[i + 2]; \
> > > > > +			ring[idx + 3] = obj[i + 3]; \
> > > > > +		} \
> > > > > +		switch (n & 0x3) { \
> > > > > +		case 3: \
> > > > > +			ring[idx++] = obj[i++]; /* fallthrough */ \
> > > > > +		case 2: \
> > > > > +			ring[idx++] = obj[i++]; /* fallthrough */ \
> > > > > +		case 1: \
> > > > > +			ring[idx++] = obj[i++]; \
> > > > > +		} \
> > > > > +	} else { \
> > > > > +		for (i = 0; idx < size; i++, idx++)\
> > > > > +			ring[idx] = obj[i]; \
> > > > > +		for (idx = 0; i < n; i++, idx++) \
> > > > > +			ring[idx] = obj[i]; \
> > > > > +	} \
> > > > > +} while (0)
> > > > > +
> > > > > +#define ENQUEUE_PTRS_128(r, ring_start, prod_head, obj_table,
> > > > > +n) do
> > > { \
> > > > > +	unsigned int i; \
> > > > > +	const uint32_t size = (r)->size; \
> > > > > +	uint32_t idx = prod_head & (r)->mask; \
> > > > > +	__uint128_t *ring = (__uint128_t *)ring_start; \
> > > > > +	__uint128_t *obj = (__uint128_t *)obj_table; \
> > > > > +	if (likely(idx + n < size)) { \
> > > > > +		for (i = 0; i < (n >> 1); i += 2, idx += 2) { \
> > > > > +			ring[idx] = obj[i]; \
> > > > > +			ring[idx + 1] = obj[i + 1]; \
> > > > > +		} \
> > > > > +		switch (n & 0x1) { \
> > > > > +		case 1: \
> > > > > +			ring[idx++] = obj[i++]; \
> > > > > +		} \
> > > > > +	} else { \
> > > > > +		for (i = 0; idx < size; i++, idx++)\
> > > > > +			ring[idx] = obj[i]; \
> > > > > +		for (idx = 0; i < n; i++, idx++) \
> > > > > +			ring[idx] = obj[i]; \
> > > > > +	} \
> > > > > +} while (0)
> > > > > +
> > > > > +/* the actual copy of pointers on the ring to obj_table.
> > > > > + * Placed here since identical code needed in both
> > > > > + * single and multi consumer dequeue functions.
> > > > > + */
> > > > > +#define DEQUEUE_PTRS_ELEM(r, ring_start, cons_head, obj_table,
> > > > > +esize, n)
> > > > > do { \
> > > > > +	if (esize == 4) \
> > > > > +		DEQUEUE_PTRS_32(r, ring_start, cons_head,
> obj_table, n); \
> > > > > +	else if (esize == 8) \
> > > > > +		DEQUEUE_PTRS_64(r, ring_start, cons_head,
> obj_table, n); \
> > > > > +	else if (esize == 16) \
> > > > > +		DEQUEUE_PTRS_128(r, ring_start, cons_head,
> obj_table, n);
> > > \ }
> > > > > while
> > > > > +(0)
> > > > > +
> > > > > +#define DEQUEUE_PTRS_32(r, ring_start, cons_head, obj_table, n) do
> { \
> > > > > +	unsigned int i; \
> > > > > +	uint32_t idx = cons_head & (r)->mask; \
> > > > > +	const uint32_t size = (r)->size; \
> > > > > +	uint32_t *ring = (uint32_t *)ring_start; \
> > > > > +	uint32_t *obj = (uint32_t *)obj_table; \
> > > > > +	if (likely(idx + n < size)) { \
> > > > > +		for (i = 0; i < (n & (~(unsigned)0x7)); i += 8, idx += 8)
> {\
> > > > > +			obj[i] = ring[idx]; \
> > > > > +			obj[i + 1] = ring[idx + 1]; \
> > > > > +			obj[i + 2] = ring[idx + 2]; \
> > > > > +			obj[i + 3] = ring[idx + 3]; \
> > > > > +			obj[i + 4] = ring[idx + 4]; \
> > > > > +			obj[i + 5] = ring[idx + 5]; \
> > > > > +			obj[i + 6] = ring[idx + 6]; \
> > > > > +			obj[i + 7] = ring[idx + 7]; \
> > > > > +		} \
> > > > > +		switch (n & 0x7) { \
> > > > > +		case 7: \
> > > > > +			obj[i++] = ring[idx++]; /* fallthrough */ \
> > > > > +		case 6: \
> > > > > +			obj[i++] = ring[idx++]; /* fallthrough */ \
> > > > > +		case 5: \
> > > > > +			obj[i++] = ring[idx++]; /* fallthrough */ \
> > > > > +		case 4: \
> > > > > +			obj[i++] = ring[idx++]; /* fallthrough */ \
> > > > > +		case 3: \
> > > > > +			obj[i++] = ring[idx++]; /* fallthrough */ \
> > > > > +		case 2: \
> > > > > +			obj[i++] = ring[idx++]; /* fallthrough */ \
> > > > > +		case 1: \
> > > > > +			obj[i++] = ring[idx++]; /* fallthrough */ \
> > > > > +		} \
> > > > > +	} else { \
> > > > > +		for (i = 0; idx < size; i++, idx++) \
> > > > > +			obj[i] = ring[idx]; \
> > > > > +		for (idx = 0; i < n; i++, idx++) \
> > > > > +			obj[i] = ring[idx]; \
> > > > > +	} \
> > > > > +} while (0)
> > > > > +
> > > > > +#define DEQUEUE_PTRS_64(r, ring_start, cons_head, obj_table, n) do
> { \
> > > > > +	unsigned int i; \
> > > > > +	uint32_t idx = cons_head & (r)->mask; \
> > > > > +	const uint32_t size = (r)->size; \
> > > > > +	uint64_t *ring = (uint64_t *)ring_start; \
> > > > > +	uint64_t *obj = (uint64_t *)obj_table; \
> > > > > +	if (likely(idx + n < size)) { \
> > > > > +		for (i = 0; i < (n & (~(unsigned)0x3)); i += 4, idx += 4)
> {\
> > > > > +			obj[i] = ring[idx]; \
> > > > > +			obj[i + 1] = ring[idx + 1]; \
> > > > > +			obj[i + 2] = ring[idx + 2]; \
> > > > > +			obj[i + 3] = ring[idx + 3]; \
> > > > > +		} \
> > > > > +		switch (n & 0x3) { \
> > > > > +		case 3: \
> > > > > +			obj[i++] = ring[idx++]; /* fallthrough */ \
> > > > > +		case 2: \
> > > > > +			obj[i++] = ring[idx++]; /* fallthrough */ \
> > > > > +		case 1: \
> > > > > +			obj[i++] = ring[idx++]; \
> > > > > +		} \
> > > > > +	} else { \
> > > > > +		for (i = 0; idx < size; i++, idx++) \
> > > > > +			obj[i] = ring[idx]; \
> > > > > +		for (idx = 0; i < n; i++, idx++) \
> > > > > +			obj[i] = ring[idx]; \
> > > > > +	} \
> > > > > +} while (0)
> > > > > +
> > > > > +#define DEQUEUE_PTRS_128(r, ring_start, cons_head, obj_table,
> > > > > +n) do
> > > { \
> > > > > +	unsigned int i; \
> > > > > +	uint32_t idx = cons_head & (r)->mask; \
> > > > > +	const uint32_t size = (r)->size; \
> > > > > +	__uint128_t *ring = (__uint128_t *)ring_start; \
> > > > > +	__uint128_t *obj = (__uint128_t *)obj_table; \
> > > > > +	if (likely(idx + n < size)) { \
> > > > > +		for (i = 0; i < (n >> 1); i += 2, idx += 2) { \
> > > > > +			obj[i] = ring[idx]; \
> > > > > +			obj[i + 1] = ring[idx + 1]; \
> > > > > +		} \
> > > > > +		switch (n & 0x1) { \
> > > > > +		case 1: \
> > > > > +			obj[i++] = ring[idx++]; /* fallthrough */ \
> > > > > +		} \
> > > > > +	} else { \
> > > > > +		for (i = 0; idx < size; i++, idx++) \
> > > > > +			obj[i] = ring[idx]; \
> > > > > +		for (idx = 0; i < n; i++, idx++) \
> > > > > +			obj[i] = ring[idx]; \
> > > > > +	} \
> > > > > +} while (0)
> > > > > +
> > > > > +/* Between load and load. there might be cpu reorder in weak
> > > > > +model
> > > > > + * (powerpc/arm).
> > > > > + * There are 2 choices for the users
> > > > > + * 1.use rmb() memory barrier
> > > > > + * 2.use one-direction load_acquire/store_release
> > > > > +barrier,defined by
> > > > > + * CONFIG_RTE_USE_C11_MEM_MODEL=y
> > > > > + * It depends on performance test results.
> > > > > + * By default, move common functions to rte_ring_generic.h  */
> > > > > +#ifdef RTE_USE_C11_MEM_MODEL #include "rte_ring_c11_mem.h"
> > > > > +#else
> > > > > +#include "rte_ring_generic.h"
> > > > > +#endif
> > > > > +
> > > > > +/**
> > > > > + * @internal Enqueue several objects on the ring
> > > > > + *
> > > > > + * @param r
> > > > > + *   A pointer to the ring structure.
> > > > > + * @param obj_table
> > > > > + *   A pointer to a table of void * pointers (objects).
> > > > > + * @param esize
> > > > > + *   The size of ring element, in bytes. It must be a multiple of 4.
> > > > > + *   Currently, sizes 4, 8 and 16 are supported. This should be the
> same
> > > > > + *   as passed while creating the ring, otherwise the results are
> undefined.
> > > > > + * @param n
> > > > > + *   The number of objects to add in the ring from the obj_table.
> > > > > + * @param behavior
> > > > > + *   RTE_RING_QUEUE_FIXED:    Enqueue a fixed number of items
> from a
> > > ring
> > > > > + *   RTE_RING_QUEUE_VARIABLE: Enqueue as many items as possible
> > > from
> > > > > ring
> > > > > + * @param is_sp
> > > > > + *   Indicates whether to use single producer or multi-producer head
> > > update
> > > > > + * @param free_space
> > > > > + *   returns the amount of space after the enqueue operation has
> > > finished
> > > > > + * @return
> > > > > + *   Actual number of objects enqueued.
> > > > > + *   If behavior == RTE_RING_QUEUE_FIXED, this will be 0 or n only.
> > > > > + */
> > > > > +static __rte_always_inline unsigned int
> > > > > +__rte_ring_do_enqueue_elem(struct rte_ring *r, void * const
> obj_table,
> > > > > +		unsigned int esize, unsigned int n,
> > > > > +		enum rte_ring_queue_behavior behavior, unsigned
> int is_sp,
> > > > > +		unsigned int *free_space)
> > >
> > >
> > > I like the idea to add esize as an argument to the public API, so
> > > the compiler can do its job optimizing calls with constant esize.
> > > Though I am not very happy with the rest of implementation:
> > > 1. It doesn't really provide configurable elem size - only 4/8/16B
> > > elems are supported.
> > Agree. I was thinking other sizes can be added on need basis.
> > However, I am wondering if we should just provide for 4B and then the
> > users can use bulk operations to construct whatever they need?
> 
> I suppose it could be plan B... if there would be no agreement on generic case.
> And for 4B elems, I guess you do have a particular use-case?
Yes

> 
> > It
> > would mean extra work for the users.
> >
> > > 2. A lot of code duplication with these 3 copies of ENQUEUE/DEQUEUE
> > > macros.
> > >
> > > Looking at ENQUEUE/DEQUEUE macros, I can see that main loop always
> > > does 32B copy per iteration.
> > Yes, I tried to keep it the same as the existing one (originally, I
> > guess the intention was to allow for 256b vector instructions to be
> > generated)
> >
> > > So wonder can we make a generic function that would do 32B copy per
> > > iteration in a main loop, and copy tail  by 4B chunks?
> > > That would avoid copy duplication and will allow user to have any
> > > elem size (multiple of 4B) he wants.
> > > Something like that (note didn't test it, just a rough idea):
> > >
> > >  static inline void
> > > copy_elems(uint32_t du32[], const uint32_t su32[], uint32_t num,
> > > uint32_t
> > > esize) {
> > >         uint32_t i, sz;
> > >
> > >         sz = (num * esize) / sizeof(uint32_t);
> > If 'num' is a compile time constant, 'sz' will be a compile time constant.
> > Otherwise, this will result in a multiplication operation.
> 
> Not always.
> If esize is compile time constant, then for esize as power of 2 (4,8,16,...), it
> would be just one shift.
> For other constant values it could be a 'mul' or in many cases just 2 shifts plus
> 'add' (if compiler is smart enough).
> I.E. let say for 24B elem is would be either num * 6 or (num << 2) + (num <<
> 1).
With num * 15 it has to be (num << 3) + (num << 2) + (num << 1) + num
Not sure if the compiler will do this.
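
For reference, a small sketch comparing the forms of num * 15. The function names are made up for this example. Besides the four-term shift/add form above, a single shift and a subtract, (num << 4) - num, gives the same value; which of these (or a plain multiply) a compiler emits is target and version dependent, so nothing here should be read as a statement about generated code.

```c
#include <assert.h>
#include <stdint.h>

/* Three ways to express num * 15 (hypothetical helpers). */
static inline uint32_t
mul15_plain(uint32_t num)
{
	return num * 15;
}

static inline uint32_t
mul15_shift_add(uint32_t num)
{
	return (num << 3) + (num << 2) + (num << 1) + num; /* 8+4+2+1 */
}

static inline uint32_t
mul15_shift_sub(uint32_t num)
{
	return (num << 4) - num;                           /* 16-1 */
}

/* Self-check: all three forms agree, including on unsigned wrap-around. */
static inline void
check_mul15(uint32_t num)
{
	assert(mul15_shift_add(num) == mul15_plain(num));
	assert(mul15_shift_sub(num) == mul15_plain(num));
}
```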

> I suppose for non-power of 2 elems it might be ok to get such small perf hit.
Agree, it should be OK not to focus on this right now.

> 
> > I have tried to avoid the multiplication operation and try to use shift and mask
> > operations (just like how the rest of the ring code does).
> >
> > >
> > >         for (i = 0; i < (sz & ~7); i += 8)
> > >                 memcpy(du32 + i, su32 + i, 8 * sizeof(uint32_t));
> > I had used memcpy to start with (for the entire copy operation),
> > performance is not the same for 64b elements when compared with the
> > existing ring APIs (some cases more and some cases less).
> 
> I remember that from one of your previous mails, that's why here I suggest to
> use in a loop memcpy() with fixed size.
> That way for each iteration complier will replace memcpy() with instructions
> to copy 32B in a way he thinks is optimal (same as for original macro, I think).
I tried this. On x86 (Xeon(R) Gold 6132 CPU @ 2.60GHz), the results are as follows. The numbers in brackets are with the code on master.
gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0

RTE>>ring_perf_elem_autotest
### Testing single element and burst enq/deq ###
SP/SC single enq/dequeue: 5
MP/MC single enq/dequeue: 40 (35)
SP/SC burst enq/dequeue (size: 8): 2
MP/MC burst enq/dequeue (size: 8): 6
SP/SC burst enq/dequeue (size: 32): 1 (2)
MP/MC burst enq/dequeue (size: 32): 2

### Testing empty dequeue ###
SC empty dequeue: 2.11
MC empty dequeue: 1.41 (2.11)

### Testing using a single lcore ###
SP/SC bulk enq/dequeue (size: 8): 2.15 (2.86)
MP/MC bulk enq/dequeue (size: 8): 6.35 (6.91)
SP/SC bulk enq/dequeue (size: 32): 1.35 (2.06)
MP/MC bulk enq/dequeue (size: 32): 2.38 (2.95)

### Testing using two physical cores ###
SP/SC bulk enq/dequeue (size: 8): 73.81 (15.33)
MP/MC bulk enq/dequeue (size: 8): 75.10 (71.27)
SP/SC bulk enq/dequeue (size: 32): 21.14 (9.58)
MP/MC bulk enq/dequeue (size: 32): 25.74 (20.91)

### Testing using two NUMA nodes ###
SP/SC bulk enq/dequeue (size: 8): 164.32 (50.66)
MP/MC bulk enq/dequeue (size: 8): 176.02 (173.43)
SP/SC bulk enq/dequeue (size: 32): 50.78 (23)
MP/MC bulk enq/dequeue (size: 32): 63.17 (46.74)

On one of the Arm platforms:
MP/MC bulk enq/dequeue (size: 32): 0.37 (0.33) (~12% hit, the rest are ok)

On another Arm platform, all numbers are the same or slightly better.

I can post the patch with this change if you want to run some benchmarks on your platform.
I have not used the exact code you suggested; instead, I have used the same logic in a single macro with memcpy.

> 
> >
> > IMO, we have to keep the performance of the 64b and 128b the same as
> > what we get with the existing ring and event-ring APIs. That would allow us
> > to replace them with these new APIs. I suggest that we keep the macros in
> > this patch for 64b and 128b.
> 
> I still think we probably can achieve that without duplicating macros, while
> still supporting arbitrary elem size.
> See above.
> 
> > For the rest of the sizes, we could put a for loop around 32b macro (this
> > would allow for all sizes as well).
> >
> > >
> > >         switch (sz & 7) {
> > >         case 7: du32[sz - 7] = su32[sz - 7]; /* fallthrough */
> > >         case 6: du32[sz - 6] = su32[sz - 6]; /* fallthrough */
> > >         case 5: du32[sz - 5] = su32[sz - 5]; /* fallthrough */
> > >         case 4: du32[sz - 4] = su32[sz - 4]; /* fallthrough */
> > >         case 3: du32[sz - 3] = su32[sz - 3]; /* fallthrough */
> > >         case 2: du32[sz - 2] = su32[sz - 2]; /* fallthrough */
> > >         case 1: du32[sz - 1] = su32[sz - 1]; /* fallthrough */
> > >         }
> > > }
> > >
> > > static inline void
> > > enqueue_elems(struct rte_ring *r, void *ring_start, uint32_t prod_head,
> > >                 void *obj_table, uint32_t num, uint32_t esize) {
> > >         uint32_t idx, n;
> > >         uint32_t *du32;
> > >
> > >         const uint32_t size = r->size;
> > >
> > >         idx = prod_head & (r)->mask;
> > >
> > >         du32 = ring_start + idx * sizeof(uint32_t);
> > >
> > >         if (idx + num < size)
> > >                 copy_elems(du32, obj_table, num, esize);
> > >         else {
> > >                 n = size - idx;
> > >                 copy_elems(du32, obj_table, n, esize);
> > >                 copy_elems(ring_start, obj_table + n * sizeof(uint32_t),
> > >                         num - n, esize);
> > >         }
> > > }
> > >
> > > And then, in that function, instead of ENQUEUE_PTRS_ELEM(), just:
> > >
> > > enqueue_elems(r, &r[1], prod_head, obj_table, n, esize);
> > >
> > >
> > > > > +{
> > > > > +	uint32_t prod_head, prod_next;
> > > > > +	uint32_t free_entries;
> > > > > +
> > > > > +	n = __rte_ring_move_prod_head(r, is_sp, n, behavior,
> > > > > +			&prod_head, &prod_next, &free_entries);
> > > > > +	if (n == 0)
> > > > > +		goto end;
> > > > > +
> > > > > +	ENQUEUE_PTRS_ELEM(r, &r[1], prod_head, obj_table, esize,
> n);
> > > > > +
> > > > > +	update_tail(&r->prod, prod_head, prod_next, is_sp, 1);
> > > > > +end:
> > > > > +	if (free_space != NULL)
> > > > > +		*free_space = free_entries - n;
> > > > > +	return n;
> > > > > +}
> > > > > +


* Re: [dpdk-dev] [PATCH v4 1/2] lib/ring: apis to support configurable element size
  2019-10-17  4:46               ` Honnappa Nagarahalli
@ 2019-10-17 11:51                 ` Ananyev, Konstantin
  2019-10-17 20:16                   ` Honnappa Nagarahalli
  0 siblings, 1 reply; 173+ messages in thread
From: Ananyev, Konstantin @ 2019-10-17 11:51 UTC (permalink / raw)
  To: Honnappa Nagarahalli, olivier.matz, sthemmin, jerinj, Richardson,
	Bruce, david.marchand, pbhagavatula
  Cc: dev, Dharmik Thakkar, Ruifeng Wang (Arm Technology China),
	Gavin Hu (Arm Technology China),
	stephen, nd, nd


> > > > > > Current APIs assume ring elements to be pointers. However, in
> > > > > > many use cases, the size can be different. Add new APIs to
> > > > > > support configurable ring element sizes.
> > > > > >
> > > > > > Signed-off-by: Honnappa Nagarahalli
> > > > > > <honnappa.nagarahalli@arm.com>
> > > > > > Reviewed-by: Dharmik Thakkar <dharmik.thakkar@arm.com>
> > > > > > Reviewed-by: Gavin Hu <gavin.hu@arm.com>
> > > > > > Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
> > > > > > ---
> > > > > >  lib/librte_ring/Makefile             |   3 +-
> > > > > >  lib/librte_ring/meson.build          |   3 +
> > > > > >  lib/librte_ring/rte_ring.c           |  45 +-
> > > > > >  lib/librte_ring/rte_ring.h           |   1 +
> > > > > >  lib/librte_ring/rte_ring_elem.h      | 946
> > +++++++++++++++++++++++++++
> > > > > >  lib/librte_ring/rte_ring_version.map |   2 +
> > > > > >  6 files changed, 991 insertions(+), 9 deletions(-)  create mode
> > > > > > 100644 lib/librte_ring/rte_ring_elem.h
> > > > > >
> > > > > > diff --git a/lib/librte_ring/Makefile b/lib/librte_ring/Makefile
> > > > > > index 21a36770d..515a967bb 100644
> > > > > > --- a/lib/librte_ring/Makefile
> > > > > > +++ b/lib/librte_ring/Makefile
> 
> <snip>
> 
> > > > > > +
> > > > > > +# rte_ring_create_elem and rte_ring_get_memsize_elem are
> > > > > > +experimental allow_experimental_apis = true
> > > > > > diff --git a/lib/librte_ring/rte_ring.c
> > > > > > b/lib/librte_ring/rte_ring.c index d9b308036..6fed3648b 100644
> > > > > > --- a/lib/librte_ring/rte_ring.c
> > > > > > +++ b/lib/librte_ring/rte_ring.c
> > > > > > @@ -33,6 +33,7 @@
> > > > > >  #include <rte_tailq.h>
> > > > > >
> > > > > >  #include "rte_ring.h"
> > > > > > +#include "rte_ring_elem.h"
> > > > > >
> 
> <snip>
> 
> > > > > > diff --git a/lib/librte_ring/rte_ring_elem.h
> > > > > > b/lib/librte_ring/rte_ring_elem.h new file mode 100644 index
> > > > > > 000000000..860f059ad
> > > > > > --- /dev/null
> > > > > > +++ b/lib/librte_ring/rte_ring_elem.h
> > > > > > @@ -0,0 +1,946 @@
> > > > > > +/* SPDX-License-Identifier: BSD-3-Clause
> > > > > > + *
> > > > > > + * Copyright (c) 2019 Arm Limited
> > > > > > + * Copyright (c) 2010-2017 Intel Corporation
> > > > > > + * Copyright (c) 2007-2009 Kip Macy kmacy@freebsd.org
> > > > > > + * All rights reserved.
> > > > > > + * Derived from FreeBSD's bufring.h
> > > > > > + * Used as BSD-3 Licensed with permission from Kip Macy.
> > > > > > + */
> > > > > > +
> > > > > > +#ifndef _RTE_RING_ELEM_H_
> > > > > > +#define _RTE_RING_ELEM_H_
> > > > > > +
> 
> <snip>
> 
> > > > > > +
> > > > > > +/* the actual enqueue of pointers on the ring.
> > > > > > + * Placed here since identical code needed in both
> > > > > > + * single and multi producer enqueue functions.
> > > > > > + */
> > > > > > +#define ENQUEUE_PTRS_ELEM(r, ring_start, prod_head, obj_table,
> > > > > > +esize, n)
> > > > > > do { \
> > > > > > +	if (esize == 4) \
> > > > > > +		ENQUEUE_PTRS_32(r, ring_start, prod_head,
> > obj_table, n); \
> > > > > > +	else if (esize == 8) \
> > > > > > +		ENQUEUE_PTRS_64(r, ring_start, prod_head,
> > obj_table, n); \
> > > > > > +	else if (esize == 16) \
> > > > > > +		ENQUEUE_PTRS_128(r, ring_start, prod_head,
> > obj_table, n);
> > > > \ }
> > > > > > while
> > > > > > +(0)
> > > > > > +
> > > > > > +#define ENQUEUE_PTRS_32(r, ring_start, prod_head, obj_table, n)
> > do { \
> > > > > > +	unsigned int i; \
> > > > > > +	const uint32_t size = (r)->size; \
> > > > > > +	uint32_t idx = prod_head & (r)->mask; \
> > > > > > +	uint32_t *ring = (uint32_t *)ring_start; \
> > > > > > +	uint32_t *obj = (uint32_t *)obj_table; \
> > > > > > +	if (likely(idx + n < size)) { \
> > > > > > +		for (i = 0; i < (n & ((~(unsigned)0x7))); i += 8, idx += 8)
> > { \
> > > > > > +			ring[idx] = obj[i]; \
> > > > > > +			ring[idx + 1] = obj[i + 1]; \
> > > > > > +			ring[idx + 2] = obj[i + 2]; \
> > > > > > +			ring[idx + 3] = obj[i + 3]; \
> > > > > > +			ring[idx + 4] = obj[i + 4]; \
> > > > > > +			ring[idx + 5] = obj[i + 5]; \
> > > > > > +			ring[idx + 6] = obj[i + 6]; \
> > > > > > +			ring[idx + 7] = obj[i + 7]; \
> > > > > > +		} \
> > > > > > +		switch (n & 0x7) { \
> > > > > > +		case 7: \
> > > > > > +			ring[idx++] = obj[i++]; /* fallthrough */ \
> > > > > > +		case 6: \
> > > > > > +			ring[idx++] = obj[i++]; /* fallthrough */ \
> > > > > > +		case 5: \
> > > > > > +			ring[idx++] = obj[i++]; /* fallthrough */ \
> > > > > > +		case 4: \
> > > > > > +			ring[idx++] = obj[i++]; /* fallthrough */ \
> > > > > > +		case 3: \
> > > > > > +			ring[idx++] = obj[i++]; /* fallthrough */ \
> > > > > > +		case 2: \
> > > > > > +			ring[idx++] = obj[i++]; /* fallthrough */ \
> > > > > > +		case 1: \
> > > > > > +			ring[idx++] = obj[i++]; /* fallthrough */ \
> > > > > > +		} \
> > > > > > +	} else { \
> > > > > > +		for (i = 0; idx < size; i++, idx++)\
> > > > > > +			ring[idx] = obj[i]; \
> > > > > > +		for (idx = 0; i < n; i++, idx++) \
> > > > > > +			ring[idx] = obj[i]; \
> > > > > > +	} \
> > > > > > +} while (0)
> > > > > > +
> > > > > > +#define ENQUEUE_PTRS_64(r, ring_start, prod_head, obj_table, n)
> > do { \
> > > > > > +	unsigned int i; \
> > > > > > +	const uint32_t size = (r)->size; \
> > > > > > +	uint32_t idx = prod_head & (r)->mask; \
> > > > > > +	uint64_t *ring = (uint64_t *)ring_start; \
> > > > > > +	uint64_t *obj = (uint64_t *)obj_table; \
> > > > > > +	if (likely(idx + n < size)) { \
> > > > > > +		for (i = 0; i < (n & ((~(unsigned)0x3))); i += 4, idx += 4)
> > { \
> > > > > > +			ring[idx] = obj[i]; \
> > > > > > +			ring[idx + 1] = obj[i + 1]; \
> > > > > > +			ring[idx + 2] = obj[i + 2]; \
> > > > > > +			ring[idx + 3] = obj[i + 3]; \
> > > > > > +		} \
> > > > > > +		switch (n & 0x3) { \
> > > > > > +		case 3: \
> > > > > > +			ring[idx++] = obj[i++]; /* fallthrough */ \
> > > > > > +		case 2: \
> > > > > > +			ring[idx++] = obj[i++]; /* fallthrough */ \
> > > > > > +		case 1: \
> > > > > > +			ring[idx++] = obj[i++]; \
> > > > > > +		} \
> > > > > > +	} else { \
> > > > > > +		for (i = 0; idx < size; i++, idx++)\
> > > > > > +			ring[idx] = obj[i]; \
> > > > > > +		for (idx = 0; i < n; i++, idx++) \
> > > > > > +			ring[idx] = obj[i]; \
> > > > > > +	} \
> > > > > > +} while (0)
> > > > > > +
> > > > > > +#define ENQUEUE_PTRS_128(r, ring_start, prod_head, obj_table,
> > > > > > +n) do
> > > > { \
> > > > > > +	unsigned int i; \
> > > > > > +	const uint32_t size = (r)->size; \
> > > > > > +	uint32_t idx = prod_head & (r)->mask; \
> > > > > > +	__uint128_t *ring = (__uint128_t *)ring_start; \
> > > > > > +	__uint128_t *obj = (__uint128_t *)obj_table; \
> > > > > > +	if (likely(idx + n < size)) { \
> > > > > > +		for (i = 0; i < (n >> 1); i += 2, idx += 2) { \
> > > > > > +			ring[idx] = obj[i]; \
> > > > > > +			ring[idx + 1] = obj[i + 1]; \
> > > > > > +		} \
> > > > > > +		switch (n & 0x1) { \
> > > > > > +		case 1: \
> > > > > > +			ring[idx++] = obj[i++]; \
> > > > > > +		} \
> > > > > > +	} else { \
> > > > > > +		for (i = 0; idx < size; i++, idx++)\
> > > > > > +			ring[idx] = obj[i]; \
> > > > > > +		for (idx = 0; i < n; i++, idx++) \
> > > > > > +			ring[idx] = obj[i]; \
> > > > > > +	} \
> > > > > > +} while (0)
> > > > > > +
> > > > > > +/* the actual copy of pointers on the ring to obj_table.
> > > > > > + * Placed here since identical code needed in both
> > > > > > + * single and multi consumer dequeue functions.
> > > > > > + */
> > > > > > > +#define DEQUEUE_PTRS_ELEM(r, ring_start, cons_head, obj_table, esize, n) do { \
> > > > > > > +	if (esize == 4) \
> > > > > > > +		DEQUEUE_PTRS_32(r, ring_start, cons_head, obj_table, n); \
> > > > > > > +	else if (esize == 8) \
> > > > > > > +		DEQUEUE_PTRS_64(r, ring_start, cons_head, obj_table, n); \
> > > > > > > +	else if (esize == 16) \
> > > > > > > +		DEQUEUE_PTRS_128(r, ring_start, cons_head, obj_table, n); \
> > > > > > > +} while (0)
> > > > > > +
> > > > > > > +#define DEQUEUE_PTRS_32(r, ring_start, cons_head, obj_table, n) do { \
> > > > > > +	unsigned int i; \
> > > > > > +	uint32_t idx = cons_head & (r)->mask; \
> > > > > > +	const uint32_t size = (r)->size; \
> > > > > > +	uint32_t *ring = (uint32_t *)ring_start; \
> > > > > > +	uint32_t *obj = (uint32_t *)obj_table; \
> > > > > > +	if (likely(idx + n < size)) { \
> > > > > > +		for (i = 0; i < (n & (~(unsigned)0x7)); i += 8, idx += 8)
> > {\
> > > > > > +			obj[i] = ring[idx]; \
> > > > > > +			obj[i + 1] = ring[idx + 1]; \
> > > > > > +			obj[i + 2] = ring[idx + 2]; \
> > > > > > +			obj[i + 3] = ring[idx + 3]; \
> > > > > > +			obj[i + 4] = ring[idx + 4]; \
> > > > > > +			obj[i + 5] = ring[idx + 5]; \
> > > > > > +			obj[i + 6] = ring[idx + 6]; \
> > > > > > +			obj[i + 7] = ring[idx + 7]; \
> > > > > > +		} \
> > > > > > +		switch (n & 0x7) { \
> > > > > > +		case 7: \
> > > > > > +			obj[i++] = ring[idx++]; /* fallthrough */ \
> > > > > > +		case 6: \
> > > > > > +			obj[i++] = ring[idx++]; /* fallthrough */ \
> > > > > > +		case 5: \
> > > > > > +			obj[i++] = ring[idx++]; /* fallthrough */ \
> > > > > > +		case 4: \
> > > > > > +			obj[i++] = ring[idx++]; /* fallthrough */ \
> > > > > > +		case 3: \
> > > > > > +			obj[i++] = ring[idx++]; /* fallthrough */ \
> > > > > > +		case 2: \
> > > > > > +			obj[i++] = ring[idx++]; /* fallthrough */ \
> > > > > > +		case 1: \
> > > > > > +			obj[i++] = ring[idx++]; /* fallthrough */ \
> > > > > > +		} \
> > > > > > +	} else { \
> > > > > > +		for (i = 0; idx < size; i++, idx++) \
> > > > > > +			obj[i] = ring[idx]; \
> > > > > > +		for (idx = 0; i < n; i++, idx++) \
> > > > > > +			obj[i] = ring[idx]; \
> > > > > > +	} \
> > > > > > +} while (0)
> > > > > > +
> > > > > > > +#define DEQUEUE_PTRS_64(r, ring_start, cons_head, obj_table, n) do { \
> > > > > > +	unsigned int i; \
> > > > > > +	uint32_t idx = cons_head & (r)->mask; \
> > > > > > +	const uint32_t size = (r)->size; \
> > > > > > +	uint64_t *ring = (uint64_t *)ring_start; \
> > > > > > +	uint64_t *obj = (uint64_t *)obj_table; \
> > > > > > +	if (likely(idx + n < size)) { \
> > > > > > +		for (i = 0; i < (n & (~(unsigned)0x3)); i += 4, idx += 4)
> > {\
> > > > > > +			obj[i] = ring[idx]; \
> > > > > > +			obj[i + 1] = ring[idx + 1]; \
> > > > > > +			obj[i + 2] = ring[idx + 2]; \
> > > > > > +			obj[i + 3] = ring[idx + 3]; \
> > > > > > +		} \
> > > > > > +		switch (n & 0x3) { \
> > > > > > +		case 3: \
> > > > > > +			obj[i++] = ring[idx++]; /* fallthrough */ \
> > > > > > +		case 2: \
> > > > > > +			obj[i++] = ring[idx++]; /* fallthrough */ \
> > > > > > +		case 1: \
> > > > > > +			obj[i++] = ring[idx++]; \
> > > > > > +		} \
> > > > > > +	} else { \
> > > > > > +		for (i = 0; idx < size; i++, idx++) \
> > > > > > +			obj[i] = ring[idx]; \
> > > > > > +		for (idx = 0; i < n; i++, idx++) \
> > > > > > +			obj[i] = ring[idx]; \
> > > > > > +	} \
> > > > > > +} while (0)
> > > > > > +
> > > > > > > +#define DEQUEUE_PTRS_128(r, ring_start, cons_head, obj_table, n) do { \
> > > > > > +	unsigned int i; \
> > > > > > +	uint32_t idx = cons_head & (r)->mask; \
> > > > > > +	const uint32_t size = (r)->size; \
> > > > > > +	__uint128_t *ring = (__uint128_t *)ring_start; \
> > > > > > +	__uint128_t *obj = (__uint128_t *)obj_table; \
> > > > > > +	if (likely(idx + n < size)) { \
> > > > > > +		for (i = 0; i < (n >> 1); i += 2, idx += 2) { \
> > > > > > +			obj[i] = ring[idx]; \
> > > > > > +			obj[i + 1] = ring[idx + 1]; \
> > > > > > +		} \
> > > > > > +		switch (n & 0x1) { \
> > > > > > +		case 1: \
> > > > > > +			obj[i++] = ring[idx++]; /* fallthrough */ \
> > > > > > +		} \
> > > > > > +	} else { \
> > > > > > +		for (i = 0; idx < size; i++, idx++) \
> > > > > > +			obj[i] = ring[idx]; \
> > > > > > +		for (idx = 0; i < n; i++, idx++) \
> > > > > > +			obj[i] = ring[idx]; \
> > > > > > +	} \
> > > > > > +} while (0)
> > > > > > +
> > > > > > > +/* Between load and load. there might be cpu reorder in weak model
> > > > > > > + * (powerpc/arm).
> > > > > > > + * There are 2 choices for the users
> > > > > > > + * 1.use rmb() memory barrier
> > > > > > > + * 2.use one-direction load_acquire/store_release barrier,defined by
> > > > > > > + * CONFIG_RTE_USE_C11_MEM_MODEL=y
> > > > > > > + * It depends on performance test results.
> > > > > > > + * By default, move common functions to rte_ring_generic.h
> > > > > > > + */
> > > > > > > +#ifdef RTE_USE_C11_MEM_MODEL
> > > > > > > +#include "rte_ring_c11_mem.h"
> > > > > > +#else
> > > > > > +#include "rte_ring_generic.h"
> > > > > > +#endif
> > > > > > +
> > > > > > +/**
> > > > > > + * @internal Enqueue several objects on the ring
> > > > > > + *
> > > > > > + * @param r
> > > > > > + *   A pointer to the ring structure.
> > > > > > + * @param obj_table
> > > > > > + *   A pointer to a table of void * pointers (objects).
> > > > > > + * @param esize
> > > > > > + *   The size of ring element, in bytes. It must be a multiple of 4.
> > > > > > > + *   Currently, sizes 4, 8 and 16 are supported. This should be the same
> > > > > > > + *   as passed while creating the ring, otherwise the results are undefined.
> > > > > > > + * @param n
> > > > > > > + *   The number of objects to add in the ring from the obj_table.
> > > > > > > + * @param behavior
> > > > > > > + *   RTE_RING_QUEUE_FIXED:    Enqueue a fixed number of items from a ring
> > > > > > > + *   RTE_RING_QUEUE_VARIABLE: Enqueue as many items as possible from ring
> > > > > > > + * @param is_sp
> > > > > > > + *   Indicates whether to use single producer or multi-producer head update
> > > > > > > + * @param free_space
> > > > > > > + *   returns the amount of space after the enqueue operation has finished
> > > > > > + * @return
> > > > > > + *   Actual number of objects enqueued.
> > > > > > + *   If behavior == RTE_RING_QUEUE_FIXED, this will be 0 or n only.
> > > > > > + */
> > > > > > +static __rte_always_inline unsigned int
> > > > > > > +__rte_ring_do_enqueue_elem(struct rte_ring *r, void * const obj_table,
> > > > > > > +		unsigned int esize, unsigned int n,
> > > > > > > +		enum rte_ring_queue_behavior behavior, unsigned int is_sp,
> > > > > > > +		unsigned int *free_space)
> > > >
> > > >
> > > > I like the idea to add esize as an argument to the public API, so
> > > > the compiler can do its job optimizing calls with constant esize.
> > > > Though I am not very happy with the rest of the implementation:
> > > > 1. It doesn't really provide configurable elem size - only 4/8/16B
> > > > elems are supported.
> > > Agree. I was thinking other sizes can be added on a need basis.
> > > However, I am wondering if we should just provide for 4B and then the
> > > users can use bulk operations to construct whatever they need?
> >
> > I suppose it could be plan B... if there would be no agreement on generic case.
> > And for 4B elems, I guess you do have a particular use-case?
> Yes
> 
> >
> > > It
> > > would mean extra work for the users.
> > >
> > > > 2. A lot of code duplication with these 3 copies of ENQUEUE/DEQUEUE
> > > > macros.
> > > >
> > > > Looking at ENQUEUE/DEQUEUE macros, I can see that main loop always
> > > > does 32B copy per iteration.
> > > Yes, I tried to keep it the same as the existing one (originally, I
> > > guess the intention was to allow for 256b vector instructions to be
> > > generated)
> > >
> > > > So I wonder whether we can make a generic function that would do a
> > > > 32B copy per iteration in a main loop, and copy the tail in 4B chunks?
> > > > That would avoid duplicating the copy code and would allow the user to
> > > > have any elem size (multiple of 4B) they want.
> > > > Something like this (note: I didn't test it, it is just a rough idea):
> > > >
> > > >  static inline void
> > > > copy_elems(uint32_t du32[], const uint32_t su32[], uint32_t num,
> > > > uint32_t
> > > > esize) {
> > > >         uint32_t i, sz;
> > > >
> > > >         sz = (num * esize) / sizeof(uint32_t);
> > > If 'num' is a compile time constant, 'sz' will be a compile time constant.
> > > Otherwise, this will result in a multiplication operation.
> >
> > Not always.
> > If esize is compile time constant, then for esize as power of 2 (4,8,16,...), it
> > would be just one shift.
> > For other constant values it could be a 'mul' or in many cases just 2 shifts plus
> > 'add' (if compiler is smart enough).
> > I.e. let's say for a 24B elem it would be either num * 6 or
> > (num << 2) + (num << 1).
> With num * 15 it has to be (num << 3) + (num << 2) + (num << 1) + num
> Not sure if the compiler will do this.

For 15, it can be just (num << 4) - num
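
As a quick sanity check of the arithmetic in this exchange, the small C sketch
below verifies the shift-based decompositions against a plain multiply. The
helper names are made up for illustration and are not DPDK APIs; the snippet
says nothing about which instruction sequence a given compiler actually emits.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers (not part of DPDK): multiply by a constant using
 * only shifts, adds and subtracts.
 */
static inline uint32_t mul6_shift(uint32_t num)
{
	return (num << 2) + (num << 1);		/* 4*num + 2*num */
}

static inline uint32_t mul15_add(uint32_t num)
{
	/* 8*num + 4*num + 2*num + num */
	return (num << 3) + (num << 2) + (num << 1) + num;
}

static inline uint32_t mul15_sub(uint32_t num)
{
	return (num << 4) - num;		/* 16*num - num */
}

/* Compare each decomposition with the ordinary multiplication. */
static inline void check_shift_identities(void)
{
	uint32_t num;

	for (num = 0; num < 1024; num++) {
		assert(mul6_shift(num) == num * 6);
		assert(mul15_add(num) == num * 15);
		assert(mul15_sub(num) == num * 15);
	}
}
```

Calling check_shift_identities() once (for example from a unit test) is enough
to confirm that the three forms agree for the tested range.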

> 
> > I suppose for non-power of 2 elems it might be ok to get such small perf hit.
> Agree, should be ok not to focus on right now.
> 
> >
> > > I have tried
> > > to avoid the multiplication operation and use shift and mask
> > > operations (just like how the rest of the ring code does).
> > >
> > > >
> > > >         for (i = 0; i < (sz & ~7); i += 8)
> > > >                 memcpy(du32 + i, su32 + i, 8 * sizeof(uint32_t));
> > > I had used memcpy to start with (for the entire copy operation), but
> > > performance is not the same for 64b elements when compared with the
> > > existing ring APIs (in some cases more and in some cases less).
> >
> > I remember that from one of your previous mails, that's why here I suggest
> > using memcpy() with a fixed size in a loop.
> > That way, for each iteration the compiler will replace memcpy() with instructions
> > to copy 32B in a way it thinks is optimal (same as for the original macro, I think).
> I tried this. On x86 (Xeon(R) Gold 6132 CPU @ 2.60GHz), the results are as follows. The numbers in brackets are with the code on master.
> gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0
> 
> RTE>>ring_perf_elem_autotest
> ### Testing single element and burst enq/deq ###
> SP/SC single enq/dequeue: 5
> MP/MC single enq/dequeue: 40 (35)
> SP/SC burst enq/dequeue (size: 8): 2
> MP/MC burst enq/dequeue (size: 8): 6
> SP/SC burst enq/dequeue (size: 32): 1 (2)
> MP/MC burst enq/dequeue (size: 32): 2
> 
> ### Testing empty dequeue ###
> SC empty dequeue: 2.11
> MC empty dequeue: 1.41 (2.11)
> 
> ### Testing using a single lcore ###
> SP/SC bulk enq/dequeue (size: 8): 2.15 (2.86)
> MP/MC bulk enq/dequeue (size: 8): 6.35 (6.91)
> SP/SC bulk enq/dequeue (size: 32): 1.35 (2.06)
> MP/MC bulk enq/dequeue (size: 32): 2.38 (2.95)
> 
> ### Testing using two physical cores ###
> SP/SC bulk enq/dequeue (size: 8): 73.81 (15.33)
> MP/MC bulk enq/dequeue (size: 8): 75.10 (71.27)
> SP/SC bulk enq/dequeue (size: 32): 21.14 (9.58)
> MP/MC bulk enq/dequeue (size: 32): 25.74 (20.91)
> 
> ### Testing using two NUMA nodes ###
> SP/SC bulk enq/dequeue (size: 8): 164.32 (50.66)
> MP/MC bulk enq/dequeue (size: 8): 176.02 (173.43)
> SP/SC bulk enq/dequeue (size: 32): 50.78 (23)
> MP/MC bulk enq/dequeue (size: 32): 63.17 (46.74)
> 
> On one of the Arm platforms
> MP/MC bulk enq/dequeue (size: 32): 0.37 (0.33) (~12% hit, the rest are ok)

So it shows better numbers for one core, but worse on 2, right?

 
> On another Arm platform, all numbers are same or slightly better.
> 
> I can post the patch with this change if you want to run some benchmarks on your platform.

Sure, please do.
I'll try to run on my boxes.

> I have not used the exact code you suggested; instead, I have used the same logic in a single macro with memcpy.
> 
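
For readers following the thread, here is a rough, untested-on-DPDK sketch of
the idea discussed above: treat the elements as a stream of 32-bit words, copy
32B per iteration with a fixed-size memcpy(), and finish the tail in 4B words.
copy_elems() follows the fragment quoted in this mail; the plain tail loop and
the ring_copy_in() wrapper (which splits the copy when it wraps around the end
of the ring) are assumptions added for illustration and are not taken from any
version of the patch.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Sketch only: copy 'num' elements of 'esize' bytes each, where esize is
 * assumed to be a multiple of 4, as a sequence of 32-bit words.
 */
static inline void
copy_elems(uint32_t du32[], const uint32_t su32[], uint32_t num,
	uint32_t esize)
{
	uint32_t i, sz;

	assert(esize % sizeof(uint32_t) == 0);
	sz = (num * esize) / sizeof(uint32_t);	/* length in 32-bit words */

	/* main loop: a fixed 32B memcpy() per iteration */
	for (i = 0; i < (sz & ~7u); i += 8)
		memcpy(du32 + i, su32 + i, 8 * sizeof(uint32_t));

	/* tail: remaining words, 4B at a time */
	for (; i < sz; i++)
		du32[i] = su32[i];
}

/* Hypothetical wrapper: copy 'n' elements into a ring of 'size' slots
 * starting at slot 'idx', splitting the copy in two if it wraps.
 */
static inline void
ring_copy_in(uint32_t *ring, uint32_t size, uint32_t idx,
	const uint32_t *obj, uint32_t n, uint32_t esize)
{
	const uint32_t w = esize / sizeof(uint32_t);	/* words per element */

	if (idx + n <= size) {
		copy_elems(ring + idx * w, obj, n, esize);
	} else {
		uint32_t first = size - idx;	/* elements before the wrap */

		copy_elems(ring + idx * w, obj, first, esize);
		copy_elems(ring, obj + first * w, n - first, esize);
	}
}
```

Because the memcpy() length is a compile-time constant, the compiler is free to
pick whatever 32B copy sequence it prefers on the target, which is the point
made in the discussion; the benchmark numbers quoted in this mail show that the
result can still differ per platform.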


^ permalink raw reply	[flat|nested] 173+ messages in thread

* [dpdk-dev] [PATCH v5 0/3] lib/ring: APIs to support custom element size
  2019-09-06 19:05 ` [dpdk-dev] [PATCH v2 0/6] " Honnappa Nagarahalli
                     ` (9 preceding siblings ...)
  2019-10-09  2:47   ` [dpdk-dev] [PATCH v3 0/2] lib/ring: APIs to support custom element size Honnappa Nagarahalli
@ 2019-10-17 20:08   ` Honnappa Nagarahalli
  2019-10-17 20:08     ` [dpdk-dev] [PATCH v5 1/3] lib/ring: apis to support configurable " Honnappa Nagarahalli
                       ` (2 more replies)
  2019-10-21  0:22   ` [dpdk-dev] [RFC v6 0/6] lib/ring: APIs to support custom element size Honnappa Nagarahalli
                     ` (4 subsequent siblings)
  15 siblings, 3 replies; 173+ messages in thread
From: Honnappa Nagarahalli @ 2019-10-17 20:08 UTC (permalink / raw)
  To: olivier.matz, sthemmin, jerinj, bruce.richardson, david.marchand,
	pbhagavatula, konstantin.ananyev, drc, hemant.agrawal,
	honnappa.nagarahalli
  Cc: dev, dharmik.thakkar, ruifeng.wang, gavin.hu

The current rte_ring hard-codes the type of the ring element to 'void *',
hence the size of the element is hard-coded to 32b/64b. Since the ring
element type is not an input to the rte_ring APIs, this results in a couple
of issues:

1) If an application needs to store an element which is not 64b, it
   needs to write its own ring APIs similar to rte_event_ring APIs. This
   creates an additional burden on the programmers, who end up making
   work-arounds and often waste memory.
2) If there are multiple libraries that store elements of the same
   type, currently they would have to write their own rte_ring APIs. This
   results in code duplication.

This patch adds new APIs to support configurable ring element size.
The APIs support custom element sizes by allowing the ring element
size to be defined as a multiple of 32b.
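
To make the sizing concrete, here is a rough standalone sketch of how the
memory footprint of such a ring could be computed: ring header plus
count * esize, rounded up to a cache line. It is modelled on
rte_ring_get_memsize_elem() in patch 1/3 but is not that function. The
SKETCH_* constants are placeholders (the real code uses
sizeof(struct rte_ring), RTE_CACHE_LINE_SIZE and RTE_RING_SZ_MASK), and the
sketch accepts any non-zero multiple of 4 for esize, whereas patch 1/3 as
posted only accepts 4, 8 and 16.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>

#define SKETCH_CACHE_LINE 64u		/* assumed cache line size */
#define SKETCH_HDR_SIZE   384u		/* assumed size of the ring header */
#define SKETCH_SZ_MASK    0x7fffffffu	/* assumed limit on the ring size */

/* Hypothetical helper: bytes needed for a ring of 'count' elements of
 * 'esize' bytes each, or -EINVAL on invalid arguments.
 */
static inline ssize_t
sketch_ring_memsize_elem(unsigned int count, unsigned int esize)
{
	size_t sz;

	/* element size: non-zero multiple of 4 bytes (32b) */
	if (esize == 0 || esize % 4 != 0)
		return -EINVAL;

	/* count: non-zero power of 2 within the size limit */
	if (count == 0 || (count & (count - 1)) != 0 ||
			count > SKETCH_SZ_MASK)
		return -EINVAL;

	sz = SKETCH_HDR_SIZE + (size_t)count * esize;
	/* round up to a multiple of the cache line size */
	sz = (sz + SKETCH_CACHE_LINE - 1) & ~(size_t)(SKETCH_CACHE_LINE - 1);
	return (ssize_t)sz;
}
```

With these placeholder values, a ring of 1024 4-byte elements needs
384 + 4096 = 4480 bytes, which is already cache-line aligned.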

The aim is to achieve the same performance as the existing ring
implementation. The patch adds the same performance tests that are run
for the existing APIs. This allows for performance comparison.

I also tested with memcpy. x86 shows significant improvements on bulk
and burst tests. On the Arm platform I used, there is a drop of
4% to 6% in a few tests. Maybe this is something that we can explore
later.

Note that this version skips changes to other libraries as I would
like to get an agreement on the implementation from the community.
They will be added once there is agreement on the rte_ring changes.

v5
 - Use memcpy for chunks of 32B (Konstantin).
 - Both 'ring_perf_autotest' and 'ring_perf_elem_autotest' are available
   to compare the results easily.
 - Copying without memcpy is also available in 1/3, if anyone wants to
   experiment on their platform.
 - Added other platform owners to test on their respective platforms.

v4
 - Few fixes after more performance testing

v3
 - Removed macro-fest and used inline functions
   (Stephen, Bruce)

v2
 - Change Event Ring implementation to use ring templates
   (Jerin, Pavan)

Honnappa Nagarahalli (3):
  lib/ring: apis to support configurable element size
  test/ring: add test cases for configurable element size ring
  lib/ring: copy ring elements using memcpy partially

 app/test/Makefile                    |   1 +
 app/test/meson.build                 |   1 +
 app/test/test_ring_perf_elem.c       | 419 ++++++++++++++
 lib/librte_ring/Makefile             |   3 +-
 lib/librte_ring/meson.build          |   3 +
 lib/librte_ring/rte_ring.c           |  45 +-
 lib/librte_ring/rte_ring.h           |   1 +
 lib/librte_ring/rte_ring_elem.h      | 805 +++++++++++++++++++++++++++
 lib/librte_ring/rte_ring_version.map |   2 +
 9 files changed, 1271 insertions(+), 9 deletions(-)
 create mode 100644 app/test/test_ring_perf_elem.c
 create mode 100644 lib/librte_ring/rte_ring_elem.h

-- 
2.17.1


^ permalink raw reply	[flat|nested] 173+ messages in thread

* [dpdk-dev] [PATCH v5 1/3] lib/ring: apis to support configurable element size
  2019-10-17 20:08   ` [dpdk-dev] [PATCH v5 0/3] lib/ring: APIs to support custom element size Honnappa Nagarahalli
@ 2019-10-17 20:08     ` Honnappa Nagarahalli
  2019-10-17 20:39       ` Stephen Hemminger
  2019-10-17 20:40       ` Stephen Hemminger
  2019-10-17 20:08     ` [dpdk-dev] [PATCH v5 2/3] test/ring: add test cases for configurable element size ring Honnappa Nagarahalli
  2019-10-17 20:08     ` [dpdk-dev] [PATCH v5 3/3] lib/ring: copy ring elements using memcpy partially Honnappa Nagarahalli
  2 siblings, 2 replies; 173+ messages in thread
From: Honnappa Nagarahalli @ 2019-10-17 20:08 UTC (permalink / raw)
  To: olivier.matz, sthemmin, jerinj, bruce.richardson, david.marchand,
	pbhagavatula, konstantin.ananyev, drc, hemant.agrawal,
	honnappa.nagarahalli
  Cc: dev, dharmik.thakkar, ruifeng.wang, gavin.hu

Current APIs assume ring elements to be pointers. However, in many
use cases, the size can be different. Add new APIs to support
configurable ring element sizes.

Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Dharmik Thakkar <dharmik.thakkar@arm.com>
Reviewed-by: Gavin Hu <gavin.hu@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
---
 lib/librte_ring/Makefile             |   3 +-
 lib/librte_ring/meson.build          |   3 +
 lib/librte_ring/rte_ring.c           |  45 +-
 lib/librte_ring/rte_ring.h           |   1 +
 lib/librte_ring/rte_ring_elem.h      | 946 +++++++++++++++++++++++++++
 lib/librte_ring/rte_ring_version.map |   2 +
 6 files changed, 991 insertions(+), 9 deletions(-)
 create mode 100644 lib/librte_ring/rte_ring_elem.h

diff --git a/lib/librte_ring/Makefile b/lib/librte_ring/Makefile
index 21a36770d..515a967bb 100644
--- a/lib/librte_ring/Makefile
+++ b/lib/librte_ring/Makefile
@@ -6,7 +6,7 @@ include $(RTE_SDK)/mk/rte.vars.mk
 # library name
 LIB = librte_ring.a
 
-CFLAGS += $(WERROR_FLAGS) -I$(SRCDIR) -O3
+CFLAGS += $(WERROR_FLAGS) -I$(SRCDIR) -O3 -DALLOW_EXPERIMENTAL_API
 LDLIBS += -lrte_eal
 
 EXPORT_MAP := rte_ring_version.map
@@ -18,6 +18,7 @@ SRCS-$(CONFIG_RTE_LIBRTE_RING) := rte_ring.c
 
 # install includes
 SYMLINK-$(CONFIG_RTE_LIBRTE_RING)-include := rte_ring.h \
+					rte_ring_elem.h \
 					rte_ring_generic.h \
 					rte_ring_c11_mem.h
 
diff --git a/lib/librte_ring/meson.build b/lib/librte_ring/meson.build
index ab8b0b469..74219840a 100644
--- a/lib/librte_ring/meson.build
+++ b/lib/librte_ring/meson.build
@@ -6,3 +6,6 @@ sources = files('rte_ring.c')
 headers = files('rte_ring.h',
 		'rte_ring_c11_mem.h',
 		'rte_ring_generic.h')
+
+# rte_ring_create_elem and rte_ring_get_memsize_elem are experimental
+allow_experimental_apis = true
diff --git a/lib/librte_ring/rte_ring.c b/lib/librte_ring/rte_ring.c
index d9b308036..6fed3648b 100644
--- a/lib/librte_ring/rte_ring.c
+++ b/lib/librte_ring/rte_ring.c
@@ -33,6 +33,7 @@
 #include <rte_tailq.h>
 
 #include "rte_ring.h"
+#include "rte_ring_elem.h"
 
 TAILQ_HEAD(rte_ring_list, rte_tailq_entry);
 
@@ -46,23 +47,42 @@ EAL_REGISTER_TAILQ(rte_ring_tailq)
 
 /* return the size of memory occupied by a ring */
 ssize_t
-rte_ring_get_memsize(unsigned count)
+rte_ring_get_memsize_elem(unsigned count, unsigned esize)
 {
 	ssize_t sz;
 
+	/* Supported esize values are 4/8/16.
+	 * Others can be added on need basis.
+	 */
+	if ((esize != 4) && (esize != 8) && (esize != 16)) {
+		RTE_LOG(ERR, RING,
+			"Unsupported esize value. Supported values are 4, 8 and 16\n");
+
+		return -EINVAL;
+	}
+
 	/* count must be a power of 2 */
 	if ((!POWEROF2(count)) || (count > RTE_RING_SZ_MASK )) {
 		RTE_LOG(ERR, RING,
-			"Requested size is invalid, must be power of 2, and "
-			"do not exceed the size limit %u\n", RTE_RING_SZ_MASK);
+			"Requested number of elements is invalid, must be "
+			"power of 2, and do not exceed the limit %u\n",
+			RTE_RING_SZ_MASK);
+
 		return -EINVAL;
 	}
 
-	sz = sizeof(struct rte_ring) + count * sizeof(void *);
+	sz = sizeof(struct rte_ring) + count * esize;
 	sz = RTE_ALIGN(sz, RTE_CACHE_LINE_SIZE);
 	return sz;
 }
 
+/* return the size of memory occupied by a ring */
+ssize_t
+rte_ring_get_memsize(unsigned count)
+{
+	return rte_ring_get_memsize_elem(count, sizeof(void *));
+}
+
 void
 rte_ring_reset(struct rte_ring *r)
 {
@@ -114,10 +134,10 @@ rte_ring_init(struct rte_ring *r, const char *name, unsigned count,
 	return 0;
 }
 
-/* create the ring */
+/* create the ring for a given element size */
 struct rte_ring *
-rte_ring_create(const char *name, unsigned count, int socket_id,
-		unsigned flags)
+rte_ring_create_elem(const char *name, unsigned count, unsigned esize,
+		int socket_id, unsigned flags)
 {
 	char mz_name[RTE_MEMZONE_NAMESIZE];
 	struct rte_ring *r;
@@ -135,7 +155,7 @@ rte_ring_create(const char *name, unsigned count, int socket_id,
 	if (flags & RING_F_EXACT_SZ)
 		count = rte_align32pow2(count + 1);
 
-	ring_size = rte_ring_get_memsize(count);
+	ring_size = rte_ring_get_memsize_elem(count, esize);
 	if (ring_size < 0) {
 		rte_errno = ring_size;
 		return NULL;
@@ -182,6 +202,15 @@ rte_ring_create(const char *name, unsigned count, int socket_id,
 	return r;
 }
 
+/* create the ring */
+struct rte_ring *
+rte_ring_create(const char *name, unsigned count, int socket_id,
+		unsigned flags)
+{
+	return rte_ring_create_elem(name, count, sizeof(void *), socket_id,
+		flags);
+}
+
 /* free the ring */
 void
 rte_ring_free(struct rte_ring *r)
diff --git a/lib/librte_ring/rte_ring.h b/lib/librte_ring/rte_ring.h
index 2a9f768a1..18fc5d845 100644
--- a/lib/librte_ring/rte_ring.h
+++ b/lib/librte_ring/rte_ring.h
@@ -216,6 +216,7 @@ int rte_ring_init(struct rte_ring *r, const char *name, unsigned count,
  */
 struct rte_ring *rte_ring_create(const char *name, unsigned count,
 				 int socket_id, unsigned flags);
+
 /**
  * De-allocate all memory used by the ring.
  *
diff --git a/lib/librte_ring/rte_ring_elem.h b/lib/librte_ring/rte_ring_elem.h
new file mode 100644
index 000000000..860f059ad
--- /dev/null
+++ b/lib/librte_ring/rte_ring_elem.h
@@ -0,0 +1,946 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ *
+ * Copyright (c) 2019 Arm Limited
+ * Copyright (c) 2010-2017 Intel Corporation
+ * Copyright (c) 2007-2009 Kip Macy kmacy@freebsd.org
+ * All rights reserved.
+ * Derived from FreeBSD's bufring.h
+ * Used as BSD-3 Licensed with permission from Kip Macy.
+ */
+
+#ifndef _RTE_RING_ELEM_H_
+#define _RTE_RING_ELEM_H_
+
+/**
+ * @file
+ * RTE Ring with flexible element size
+ */
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#include <stdio.h>
+#include <stdint.h>
+#include <sys/queue.h>
+#include <errno.h>
+#include <rte_common.h>
+#include <rte_config.h>
+#include <rte_memory.h>
+#include <rte_lcore.h>
+#include <rte_atomic.h>
+#include <rte_branch_prediction.h>
+#include <rte_memzone.h>
+#include <rte_pause.h>
+
+#include "rte_ring.h"
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * Calculate the memory size needed for a ring with given element size
+ *
+ * This function returns the number of bytes needed for a ring, given
+ * the number of elements in it and the size of the element. This value
+ * is the sum of the size of the structure rte_ring and the size of the
+ * memory needed for storing the elements. The value is aligned to a cache
+ * line size.
+ *
+ * @param count
+ *   The number of elements in the ring (must be a power of 2).
+ * @param esize
+ *   The size of ring element, in bytes. It must be a multiple of 4.
+ *   Currently, sizes 4, 8 and 16 are supported.
+ * @return
+ *   - The memory size needed for the ring on success.
+ *   - -EINVAL if count is not a power of 2 or esize is not supported.
+ */
+__rte_experimental
+ssize_t rte_ring_get_memsize_elem(unsigned count, unsigned esize);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * Create a new ring named *name* that stores elements with given size.
+ *
+ * This function uses ``memzone_reserve()`` to allocate memory. Then it
+ * calls rte_ring_init() to initialize an empty ring.
+ *
+ * The new ring size is set to *count*, which must be a power of
+ * two. Water marking is disabled by default. The real usable ring size
+ * is *count-1* instead of *count* to differentiate a full ring from an
+ * empty ring.
+ *
+ * The ring is added in RTE_TAILQ_RING list.
+ *
+ * @param name
+ *   The name of the ring.
+ * @param count
+ *   The number of elements in the ring (must be a power of 2).
+ * @param esize
+ *   The size of ring element, in bytes. It must be a multiple of 4.
+ *   Currently, sizes 4, 8 and 16 are supported.
+ * @param socket_id
+ *   The *socket_id* argument is the socket identifier in case of
+ *   NUMA. The value can be *SOCKET_ID_ANY* if there is no NUMA
+ *   constraint for the reserved zone.
+ * @param flags
+ *   An OR of the following:
+ *    - RING_F_SP_ENQ: If this flag is set, the default behavior when
+ *      using ``rte_ring_enqueue()`` or ``rte_ring_enqueue_bulk()``
+ *      is "single-producer". Otherwise, it is "multi-producers".
+ *    - RING_F_SC_DEQ: If this flag is set, the default behavior when
+ *      using ``rte_ring_dequeue()`` or ``rte_ring_dequeue_bulk()``
+ *      is "single-consumer". Otherwise, it is "multi-consumers".
+ * @return
+ *   On success, the pointer to the new allocated ring. NULL on error with
+ *    rte_errno set appropriately. Possible errno values include:
+ *    - E_RTE_NO_CONFIG - function could not get pointer to rte_config structure
+ *    - E_RTE_SECONDARY - function was called from a secondary process instance
+ *    - EINVAL - count provided is not a power of 2
+ *    - ENOSPC - the maximum number of memzones has already been allocated
+ *    - EEXIST - a memzone with the same name already exists
+ *    - ENOMEM - no appropriate memory area found in which to create memzone
+ */
+__rte_experimental
+struct rte_ring *rte_ring_create_elem(const char *name, unsigned count,
+				unsigned esize, int socket_id, unsigned flags);
+
+/* the actual enqueue of pointers on the ring.
+ * Placed here since identical code needed in both
+ * single and multi producer enqueue functions.
+ */
+#define ENQUEUE_PTRS_ELEM(r, ring_start, prod_head, obj_table, esize, n) do { \
+	if (esize == 4) \
+		ENQUEUE_PTRS_32(r, ring_start, prod_head, obj_table, n); \
+	else if (esize == 8) \
+		ENQUEUE_PTRS_64(r, ring_start, prod_head, obj_table, n); \
+	else if (esize == 16) \
+		ENQUEUE_PTRS_128(r, ring_start, prod_head, obj_table, n); \
+} while (0)
+
+#define ENQUEUE_PTRS_32(r, ring_start, prod_head, obj_table, n) do { \
+	unsigned int i; \
+	const uint32_t size = (r)->size; \
+	uint32_t idx = prod_head & (r)->mask; \
+	uint32_t *ring = (uint32_t *)ring_start; \
+	uint32_t *obj = (uint32_t *)obj_table; \
+	if (likely(idx + n < size)) { \
+		for (i = 0; i < (n & ((~(unsigned)0x7))); i += 8, idx += 8) { \
+			ring[idx] = obj[i]; \
+			ring[idx + 1] = obj[i + 1]; \
+			ring[idx + 2] = obj[i + 2]; \
+			ring[idx + 3] = obj[i + 3]; \
+			ring[idx + 4] = obj[i + 4]; \
+			ring[idx + 5] = obj[i + 5]; \
+			ring[idx + 6] = obj[i + 6]; \
+			ring[idx + 7] = obj[i + 7]; \
+		} \
+		switch (n & 0x7) { \
+		case 7: \
+			ring[idx++] = obj[i++]; /* fallthrough */ \
+		case 6: \
+			ring[idx++] = obj[i++]; /* fallthrough */ \
+		case 5: \
+			ring[idx++] = obj[i++]; /* fallthrough */ \
+		case 4: \
+			ring[idx++] = obj[i++]; /* fallthrough */ \
+		case 3: \
+			ring[idx++] = obj[i++]; /* fallthrough */ \
+		case 2: \
+			ring[idx++] = obj[i++]; /* fallthrough */ \
+		case 1: \
+			ring[idx++] = obj[i++]; /* fallthrough */ \
+		} \
+	} else { \
+		for (i = 0; idx < size; i++, idx++)\
+			ring[idx] = obj[i]; \
+		for (idx = 0; i < n; i++, idx++) \
+			ring[idx] = obj[i]; \
+	} \
+} while (0)
+
+#define ENQUEUE_PTRS_64(r, ring_start, prod_head, obj_table, n) do { \
+	unsigned int i; \
+	const uint32_t size = (r)->size; \
+	uint32_t idx = prod_head & (r)->mask; \
+	uint64_t *ring = (uint64_t *)ring_start; \
+	uint64_t *obj = (uint64_t *)obj_table; \
+	if (likely(idx + n < size)) { \
+		for (i = 0; i < (n & ((~(unsigned)0x3))); i += 4, idx += 4) { \
+			ring[idx] = obj[i]; \
+			ring[idx + 1] = obj[i + 1]; \
+			ring[idx + 2] = obj[i + 2]; \
+			ring[idx + 3] = obj[i + 3]; \
+		} \
+		switch (n & 0x3) { \
+		case 3: \
+			ring[idx++] = obj[i++]; /* fallthrough */ \
+		case 2: \
+			ring[idx++] = obj[i++]; /* fallthrough */ \
+		case 1: \
+			ring[idx++] = obj[i++]; \
+		} \
+	} else { \
+		for (i = 0; idx < size; i++, idx++)\
+			ring[idx] = obj[i]; \
+		for (idx = 0; i < n; i++, idx++) \
+			ring[idx] = obj[i]; \
+	} \
+} while (0)
+
+#define ENQUEUE_PTRS_128(r, ring_start, prod_head, obj_table, n) do { \
+	unsigned int i; \
+	const uint32_t size = (r)->size; \
+	uint32_t idx = prod_head & (r)->mask; \
+	__uint128_t *ring = (__uint128_t *)ring_start; \
+	__uint128_t *obj = (__uint128_t *)obj_table; \
+	if (likely(idx + n < size)) { \
+		for (i = 0; i < (n & (~(unsigned)0x1)); i += 2, idx += 2) { \
+			ring[idx] = obj[i]; \
+			ring[idx + 1] = obj[i + 1]; \
+		} \
+		switch (n & 0x1) { \
+		case 1: \
+			ring[idx++] = obj[i++]; \
+		} \
+	} else { \
+		for (i = 0; idx < size; i++, idx++)\
+			ring[idx] = obj[i]; \
+		for (idx = 0; i < n; i++, idx++) \
+			ring[idx] = obj[i]; \
+	} \
+} while (0)
+
+/* the actual copy of pointers on the ring to obj_table.
+ * Placed here since identical code needed in both
+ * single and multi consumer dequeue functions.
+ */
+#define DEQUEUE_PTRS_ELEM(r, ring_start, cons_head, obj_table, esize, n) do { \
+	if (esize == 4) \
+		DEQUEUE_PTRS_32(r, ring_start, cons_head, obj_table, n); \
+	else if (esize == 8) \
+		DEQUEUE_PTRS_64(r, ring_start, cons_head, obj_table, n); \
+	else if (esize == 16) \
+		DEQUEUE_PTRS_128(r, ring_start, cons_head, obj_table, n); \
+} while (0)
+
+#define DEQUEUE_PTRS_32(r, ring_start, cons_head, obj_table, n) do { \
+	unsigned int i; \
+	uint32_t idx = cons_head & (r)->mask; \
+	const uint32_t size = (r)->size; \
+	uint32_t *ring = (uint32_t *)ring_start; \
+	uint32_t *obj = (uint32_t *)obj_table; \
+	if (likely(idx + n < size)) { \
+		for (i = 0; i < (n & (~(unsigned)0x7)); i += 8, idx += 8) {\
+			obj[i] = ring[idx]; \
+			obj[i + 1] = ring[idx + 1]; \
+			obj[i + 2] = ring[idx + 2]; \
+			obj[i + 3] = ring[idx + 3]; \
+			obj[i + 4] = ring[idx + 4]; \
+			obj[i + 5] = ring[idx + 5]; \
+			obj[i + 6] = ring[idx + 6]; \
+			obj[i + 7] = ring[idx + 7]; \
+		} \
+		switch (n & 0x7) { \
+		case 7: \
+			obj[i++] = ring[idx++]; /* fallthrough */ \
+		case 6: \
+			obj[i++] = ring[idx++]; /* fallthrough */ \
+		case 5: \
+			obj[i++] = ring[idx++]; /* fallthrough */ \
+		case 4: \
+			obj[i++] = ring[idx++]; /* fallthrough */ \
+		case 3: \
+			obj[i++] = ring[idx++]; /* fallthrough */ \
+		case 2: \
+			obj[i++] = ring[idx++]; /* fallthrough */ \
+		case 1: \
+			obj[i++] = ring[idx++]; /* fallthrough */ \
+		} \
+	} else { \
+		for (i = 0; idx < size; i++, idx++) \
+			obj[i] = ring[idx]; \
+		for (idx = 0; i < n; i++, idx++) \
+			obj[i] = ring[idx]; \
+	} \
+} while (0)
+
+#define DEQUEUE_PTRS_64(r, ring_start, cons_head, obj_table, n) do { \
+	unsigned int i; \
+	uint32_t idx = cons_head & (r)->mask; \
+	const uint32_t size = (r)->size; \
+	uint64_t *ring = (uint64_t *)ring_start; \
+	uint64_t *obj = (uint64_t *)obj_table; \
+	if (likely(idx + n < size)) { \
+		for (i = 0; i < (n & (~(unsigned)0x3)); i += 4, idx += 4) {\
+			obj[i] = ring[idx]; \
+			obj[i + 1] = ring[idx + 1]; \
+			obj[i + 2] = ring[idx + 2]; \
+			obj[i + 3] = ring[idx + 3]; \
+		} \
+		switch (n & 0x3) { \
+		case 3: \
+			obj[i++] = ring[idx++]; /* fallthrough */ \
+		case 2: \
+			obj[i++] = ring[idx++]; /* fallthrough */ \
+		case 1: \
+			obj[i++] = ring[idx++]; \
+		} \
+	} else { \
+		for (i = 0; idx < size; i++, idx++) \
+			obj[i] = ring[idx]; \
+		for (idx = 0; i < n; i++, idx++) \
+			obj[i] = ring[idx]; \
+	} \
+} while (0)
+
+#define DEQUEUE_PTRS_128(r, ring_start, cons_head, obj_table, n) do { \
+	unsigned int i; \
+	uint32_t idx = cons_head & (r)->mask; \
+	const uint32_t size = (r)->size; \
+	__uint128_t *ring = (__uint128_t *)ring_start; \
+	__uint128_t *obj = (__uint128_t *)obj_table; \
+	if (likely(idx + n < size)) { \
+		for (i = 0; i < (n & (~(unsigned)0x1)); i += 2, idx += 2) { \
+			obj[i] = ring[idx]; \
+			obj[i + 1] = ring[idx + 1]; \
+		} \
+		switch (n & 0x1) { \
+		case 1: \
+			obj[i++] = ring[idx++]; /* fallthrough */ \
+		} \
+	} else { \
+		for (i = 0; idx < size; i++, idx++) \
+			obj[i] = ring[idx]; \
+		for (idx = 0; i < n; i++, idx++) \
+			obj[i] = ring[idx]; \
+	} \
+} while (0)
+
+/* Between two loads, the CPU may reorder accesses on weakly ordered
+ * architectures (powerpc/arm).
+ * There are two choices for the user:
+ * 1. use the rmb() memory barrier
+ * 2. use one-direction load_acquire/store_release barriers, enabled by
+ *    CONFIG_RTE_USE_C11_MEM_MODEL=y
+ * Which one is better depends on performance test results.
+ * By default, the common functions come from rte_ring_generic.h.
+ */
+#ifdef RTE_USE_C11_MEM_MODEL
+#include "rte_ring_c11_mem.h"
+#else
+#include "rte_ring_generic.h"
+#endif
+
+/**
+ * @internal Enqueue several objects on the ring
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of esize-byte objects.
+ * @param esize
+ *   The size of ring element, in bytes. It must be a multiple of 4.
+ *   Currently, sizes 4, 8 and 16 are supported. This should be the same
+ *   as passed while creating the ring, otherwise the results are undefined.
+ * @param n
+ *   The number of objects to add in the ring from the obj_table.
+ * @param behavior
+ *   RTE_RING_QUEUE_FIXED:    Enqueue a fixed number of items on the ring
+ *   RTE_RING_QUEUE_VARIABLE: Enqueue as many items as possible on the ring
+ * @param is_sp
+ *   Indicates whether to use single producer or multi-producer head update
+ * @param free_space
+ *   returns the amount of space after the enqueue operation has finished
+ * @return
+ *   Actual number of objects enqueued.
+ *   If behavior == RTE_RING_QUEUE_FIXED, this will be 0 or n only.
+ */
+static __rte_always_inline unsigned int
+__rte_ring_do_enqueue_elem(struct rte_ring *r, void * const obj_table,
+		unsigned int esize, unsigned int n,
+		enum rte_ring_queue_behavior behavior, unsigned int is_sp,
+		unsigned int *free_space)
+{
+	uint32_t prod_head, prod_next;
+	uint32_t free_entries;
+
+	n = __rte_ring_move_prod_head(r, is_sp, n, behavior,
+			&prod_head, &prod_next, &free_entries);
+	if (n == 0)
+		goto end;
+
+	ENQUEUE_PTRS_ELEM(r, &r[1], prod_head, obj_table, esize, n);
+
+	update_tail(&r->prod, prod_head, prod_next, is_sp, 1);
+end:
+	if (free_space != NULL)
+		*free_space = free_entries - n;
+	return n;
+}
+
+/**
+ * @internal Dequeue several objects from the ring
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of esize-byte objects that will be filled.
+ * @param esize
+ *   The size of ring element, in bytes. It must be a multiple of 4.
+ *   Currently, sizes 4, 8 and 16 are supported. This should be the same
+ *   as passed while creating the ring, otherwise the results are undefined.
+ * @param n
+ *   The number of objects to pull from the ring.
+ * @param behavior
+ *   RTE_RING_QUEUE_FIXED:    Dequeue a fixed number of items from a ring
+ *   RTE_RING_QUEUE_VARIABLE: Dequeue as many items as possible from ring
+ * @param is_sc
+ *   Indicates whether to use single consumer or multi-consumer head update
+ * @param available
+ *   returns the number of remaining ring entries after the dequeue has finished
+ * @return
+ *   - Actual number of objects dequeued.
+ *     If behavior == RTE_RING_QUEUE_FIXED, this will be 0 or n only.
+ */
+static __rte_always_inline unsigned int
+__rte_ring_do_dequeue_elem(struct rte_ring *r, void *obj_table,
+		unsigned int esize, unsigned int n,
+		enum rte_ring_queue_behavior behavior, unsigned int is_sc,
+		unsigned int *available)
+{
+	uint32_t cons_head, cons_next;
+	uint32_t entries;
+
+	n = __rte_ring_move_cons_head(r, (int)is_sc, n, behavior,
+			&cons_head, &cons_next, &entries);
+	if (n == 0)
+		goto end;
+
+	DEQUEUE_PTRS_ELEM(r, &r[1], cons_head, obj_table, esize, n);
+
+	update_tail(&r->cons, cons_head, cons_next, is_sc, 0);
+
+end:
+	if (available != NULL)
+		*available = entries - n;
+	return n;
+}
+
+/**
+ * Enqueue several objects on the ring (multi-producers safe).
+ *
+ * This function uses a "compare and set" instruction to move the
+ * producer index atomically.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of esize-byte objects.
+ * @param esize
+ *   The size of ring element, in bytes. It must be a multiple of 4.
+ *   Currently, sizes 4, 8 and 16 are supported. This should be the same
+ *   as passed while creating the ring, otherwise the results are undefined.
+ * @param n
+ *   The number of objects to add in the ring from the obj_table.
+ * @param free_space
+ *   if non-NULL, returns the amount of space in the ring after the
+ *   enqueue operation has finished.
+ * @return
+ *   The number of objects enqueued, either 0 or n
+ */
+static __rte_always_inline unsigned int
+rte_ring_mp_enqueue_bulk_elem(struct rte_ring *r, void * const obj_table,
+		unsigned int esize, unsigned int n, unsigned int *free_space)
+{
+	return __rte_ring_do_enqueue_elem(r, obj_table, esize, n,
+			RTE_RING_QUEUE_FIXED, __IS_MP, free_space);
+}
+
+/**
+ * Enqueue several objects on a ring (NOT multi-producers safe).
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of esize-byte objects.
+ * @param esize
+ *   The size of ring element, in bytes. It must be a multiple of 4.
+ *   Currently, sizes 4, 8 and 16 are supported. This should be the same
+ *   as passed while creating the ring, otherwise the results are undefined.
+ * @param n
+ *   The number of objects to add in the ring from the obj_table.
+ * @param free_space
+ *   if non-NULL, returns the amount of space in the ring after the
+ *   enqueue operation has finished.
+ * @return
+ *   The number of objects enqueued, either 0 or n
+ */
+static __rte_always_inline unsigned int
+rte_ring_sp_enqueue_bulk_elem(struct rte_ring *r, void * const obj_table,
+		unsigned int esize, unsigned int n, unsigned int *free_space)
+{
+	return __rte_ring_do_enqueue_elem(r, obj_table, esize, n,
+			RTE_RING_QUEUE_FIXED, __IS_SP, free_space);
+}
+
+/**
+ * Enqueue several objects on a ring.
+ *
+ * This function calls the multi-producer or the single-producer
+ * version depending on the default behavior that was specified at
+ * ring creation time (see flags).
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of esize-byte objects.
+ * @param esize
+ *   The size of ring element, in bytes. It must be a multiple of 4.
+ *   Currently, sizes 4, 8 and 16 are supported. This should be the same
+ *   as passed while creating the ring, otherwise the results are undefined.
+ * @param n
+ *   The number of objects to add in the ring from the obj_table.
+ * @param free_space
+ *   if non-NULL, returns the amount of space in the ring after the
+ *   enqueue operation has finished.
+ * @return
+ *   The number of objects enqueued, either 0 or n
+ */
+static __rte_always_inline unsigned int
+rte_ring_enqueue_bulk_elem(struct rte_ring *r, void * const obj_table,
+		unsigned int esize, unsigned int n, unsigned int *free_space)
+{
+	return __rte_ring_do_enqueue_elem(r, obj_table, esize, n,
+			RTE_RING_QUEUE_FIXED, r->prod.single, free_space);
+}
+
+/**
+ * Enqueue one object on a ring (multi-producers safe).
+ *
+ * This function uses a "compare and set" instruction to move the
+ * producer index atomically.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj
+ *   A pointer to the object to be added.
+ * @param esize
+ *   The size of ring element, in bytes. It must be a multiple of 4.
+ *   Currently, sizes 4, 8 and 16 are supported. This should be the same
+ *   as passed while creating the ring, otherwise the results are undefined.
+ * @return
+ *   - 0: Success; objects enqueued.
+ *   - -ENOBUFS: Not enough room in the ring to enqueue; no object is enqueued.
+ */
+static __rte_always_inline int
+rte_ring_mp_enqueue_elem(struct rte_ring *r, void *obj, unsigned int esize)
+{
+	return rte_ring_mp_enqueue_bulk_elem(r, obj, esize, 1, NULL) ? 0 :
+								-ENOBUFS;
+}
+
+/**
+ * Enqueue one object on a ring (NOT multi-producers safe).
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj
+ *   A pointer to the object to be added.
+ * @param esize
+ *   The size of ring element, in bytes. It must be a multiple of 4.
+ *   Currently, sizes 4, 8 and 16 are supported. This should be the same
+ *   as passed while creating the ring, otherwise the results are undefined.
+ * @return
+ *   - 0: Success; objects enqueued.
+ *   - -ENOBUFS: Not enough room in the ring to enqueue; no object is enqueued.
+ */
+static __rte_always_inline int
+rte_ring_sp_enqueue_elem(struct rte_ring *r, void *obj, unsigned int esize)
+{
+	return rte_ring_sp_enqueue_bulk_elem(r, obj, esize, 1, NULL) ? 0 :
+								-ENOBUFS;
+}
+
+/**
+ * Enqueue one object on a ring.
+ *
+ * This function calls the multi-producer or the single-producer
+ * version, depending on the default behaviour that was specified at
+ * ring creation time (see flags).
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj
+ *   A pointer to the object to be added.
+ * @param esize
+ *   The size of ring element, in bytes. It must be a multiple of 4.
+ *   Currently, sizes 4, 8 and 16 are supported. This should be the same
+ *   as passed while creating the ring, otherwise the results are undefined.
+ * @return
+ *   - 0: Success; objects enqueued.
+ *   - -ENOBUFS: Not enough room in the ring to enqueue; no object is enqueued.
+ */
+static __rte_always_inline int
+rte_ring_enqueue_elem(struct rte_ring *r, void *obj, unsigned int esize)
+{
+	return rte_ring_enqueue_bulk_elem(r, obj, esize, 1, NULL) ? 0 :
+								-ENOBUFS;
+}
+
+/**
+ * Dequeue several objects from a ring (multi-consumers safe).
+ *
+ * This function uses a "compare and set" instruction to move the
+ * consumer index atomically.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of esize-byte objects that will be filled.
+ * @param esize
+ *   The size of ring element, in bytes. It must be a multiple of 4.
+ *   Currently, sizes 4, 8 and 16 are supported. This should be the same
+ *   as passed while creating the ring, otherwise the results are undefined.
+ * @param n
+ *   The number of objects to dequeue from the ring to the obj_table.
+ * @param available
+ *   If non-NULL, returns the number of remaining ring entries after the
+ *   dequeue has finished.
+ * @return
+ *   The number of objects dequeued, either 0 or n
+ */
+static __rte_always_inline unsigned int
+rte_ring_mc_dequeue_bulk_elem(struct rte_ring *r, void *obj_table,
+		unsigned int esize, unsigned int n, unsigned int *available)
+{
+	return __rte_ring_do_dequeue_elem(r, obj_table, esize, n,
+				RTE_RING_QUEUE_FIXED, __IS_MC, available);
+}
+
+/**
+ * Dequeue several objects from a ring (NOT multi-consumers safe).
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of esize-byte objects that will be filled.
+ * @param esize
+ *   The size of ring element, in bytes. It must be a multiple of 4.
+ *   Currently, sizes 4, 8 and 16 are supported. This should be the same
+ *   as passed while creating the ring, otherwise the results are undefined.
+ * @param n
+ *   The number of objects to dequeue from the ring to the obj_table,
+ *   must be strictly positive.
+ * @param available
+ *   If non-NULL, returns the number of remaining ring entries after the
+ *   dequeue has finished.
+ * @return
+ *   The number of objects dequeued, either 0 or n
+ */
+static __rte_always_inline unsigned int
+rte_ring_sc_dequeue_bulk_elem(struct rte_ring *r, void *obj_table,
+		unsigned int esize, unsigned int n, unsigned int *available)
+{
+	return __rte_ring_do_dequeue_elem(r, obj_table, esize, n,
+			RTE_RING_QUEUE_FIXED, __IS_SC, available);
+}
+
+/**
+ * Dequeue several objects from a ring.
+ *
+ * This function calls the multi-consumers or the single-consumer
+ * version, depending on the default behaviour that was specified at
+ * ring creation time (see flags).
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of esize-byte objects that will be filled.
+ * @param esize
+ *   The size of ring element, in bytes. It must be a multiple of 4.
+ *   Currently, sizes 4, 8 and 16 are supported. This should be the same
+ *   as passed while creating the ring, otherwise the results are undefined.
+ * @param n
+ *   The number of objects to dequeue from the ring to the obj_table.
+ * @param available
+ *   If non-NULL, returns the number of remaining ring entries after the
+ *   dequeue has finished.
+ * @return
+ *   The number of objects dequeued, either 0 or n
+ */
+static __rte_always_inline unsigned int
+rte_ring_dequeue_bulk_elem(struct rte_ring *r, void *obj_table,
+		unsigned int esize, unsigned int n, unsigned int *available)
+{
+	return __rte_ring_do_dequeue_elem(r, obj_table, esize, n,
+			RTE_RING_QUEUE_FIXED, r->cons.single, available);
+}
+
+/**
+ * Dequeue one object from a ring (multi-consumers safe).
+ *
+ * This function uses a "compare and set" instruction to move the
+ * consumer index atomically.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_p
+ *   A pointer to an esize-byte object that will be filled.
+ * @param esize
+ *   The size of ring element, in bytes. It must be a multiple of 4.
+ *   Currently, sizes 4, 8 and 16 are supported. This should be the same
+ *   as passed while creating the ring, otherwise the results are undefined.
+ * @return
+ *   - 0: Success; objects dequeued.
+ *   - -ENOENT: Not enough entries in the ring to dequeue; no object is
+ *     dequeued.
+ */
+static __rte_always_inline int
+rte_ring_mc_dequeue_elem(struct rte_ring *r, void *obj_p,
+				unsigned int esize)
+{
+	return rte_ring_mc_dequeue_bulk_elem(r, obj_p, esize, 1, NULL)  ? 0 :
+								-ENOENT;
+}
+
+/**
+ * Dequeue one object from a ring (NOT multi-consumers safe).
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_p
+ *   A pointer to an esize-byte object that will be filled.
+ * @param esize
+ *   The size of ring element, in bytes. It must be a multiple of 4.
+ *   Currently, sizes 4, 8 and 16 are supported. This should be the same
+ *   as passed while creating the ring, otherwise the results are undefined.
+ * @return
+ *   - 0: Success; objects dequeued.
+ *   - -ENOENT: Not enough entries in the ring to dequeue, no object is
+ *     dequeued.
+ */
+static __rte_always_inline int
+rte_ring_sc_dequeue_elem(struct rte_ring *r, void *obj_p,
+				unsigned int esize)
+{
+	return rte_ring_sc_dequeue_bulk_elem(r, obj_p, esize, 1, NULL) ? 0 :
+								-ENOENT;
+}
+
+/**
+ * Dequeue one object from a ring.
+ *
+ * This function calls the multi-consumers or the single-consumer
+ * version depending on the default behaviour that was specified at
+ * ring creation time (see flags).
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_p
+ *   A pointer to an esize-byte object that will be filled.
+ * @param esize
+ *   The size of ring element, in bytes. It must be a multiple of 4.
+ *   Currently, sizes 4, 8 and 16 are supported. This should be the same
+ *   as passed while creating the ring, otherwise the results are undefined.
+ * @return
+ *   - 0: Success, objects dequeued.
+ *   - -ENOENT: Not enough entries in the ring to dequeue, no object is
+ *     dequeued.
+ */
+static __rte_always_inline int
+rte_ring_dequeue_elem(struct rte_ring *r, void *obj_p, unsigned int esize)
+{
+	return rte_ring_dequeue_bulk_elem(r, obj_p, esize, 1, NULL) ? 0 :
+								-ENOENT;
+}
+
+/**
+ * Enqueue several objects on the ring (multi-producers safe).
+ *
+ * This function uses a "compare and set" instruction to move the
+ * producer index atomically.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of esize-byte objects.
+ * @param esize
+ *   The size of ring element, in bytes. It must be a multiple of 4.
+ *   Currently, sizes 4, 8 and 16 are supported. This should be the same
+ *   as passed while creating the ring, otherwise the results are undefined.
+ * @param n
+ *   The number of objects to add in the ring from the obj_table.
+ * @param free_space
+ *   if non-NULL, returns the amount of space in the ring after the
+ *   enqueue operation has finished.
+ * @return
+ *   - n: Actual number of objects enqueued.
+ */
+static __rte_always_inline unsigned
+rte_ring_mp_enqueue_burst_elem(struct rte_ring *r, void * const obj_table,
+		unsigned int esize, unsigned int n, unsigned int *free_space)
+{
+	return __rte_ring_do_enqueue_elem(r, obj_table, esize, n,
+			RTE_RING_QUEUE_VARIABLE, __IS_MP, free_space);
+}
+
+/**
+ * Enqueue several objects on a ring (NOT multi-producers safe).
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of esize-byte objects.
+ * @param esize
+ *   The size of ring element, in bytes. It must be a multiple of 4.
+ *   Currently, sizes 4, 8 and 16 are supported. This should be the same
+ *   as passed while creating the ring, otherwise the results are undefined.
+ * @param n
+ *   The number of objects to add in the ring from the obj_table.
+ * @param free_space
+ *   if non-NULL, returns the amount of space in the ring after the
+ *   enqueue operation has finished.
+ * @return
+ *   - n: Actual number of objects enqueued.
+ */
+static __rte_always_inline unsigned
+rte_ring_sp_enqueue_burst_elem(struct rte_ring *r, void * const obj_table,
+		unsigned int esize, unsigned int n, unsigned int *free_space)
+{
+	return __rte_ring_do_enqueue_elem(r, obj_table, esize, n,
+			RTE_RING_QUEUE_VARIABLE, __IS_SP, free_space);
+}
+
+/**
+ * Enqueue several objects on a ring.
+ *
+ * This function calls the multi-producer or the single-producer
+ * version depending on the default behavior that was specified at
+ * ring creation time (see flags).
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of esize-byte objects.
+ * @param esize
+ *   The size of ring element, in bytes. It must be a multiple of 4.
+ *   Currently, sizes 4, 8 and 16 are supported. This should be the same
+ *   as passed while creating the ring, otherwise the results are undefined.
+ * @param n
+ *   The number of objects to add in the ring from the obj_table.
+ * @param free_space
+ *   if non-NULL, returns the amount of space in the ring after the
+ *   enqueue operation has finished.
+ * @return
+ *   - n: Actual number of objects enqueued.
+ */
+static __rte_always_inline unsigned
+rte_ring_enqueue_burst_elem(struct rte_ring *r, void * const obj_table,
+		unsigned int esize, unsigned int n, unsigned int *free_space)
+{
+	return __rte_ring_do_enqueue_elem(r, obj_table, esize, n,
+			RTE_RING_QUEUE_VARIABLE, r->prod.single, free_space);
+}
+
+/**
+ * Dequeue several objects from a ring (multi-consumers safe). When more
+ * objects are requested than are available, only the available objects
+ * are dequeued.
+ *
+ * This function uses a "compare and set" instruction to move the
+ * consumer index atomically.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of esize-byte objects that will be filled.
+ * @param esize
+ *   The size of ring element, in bytes. It must be a multiple of 4.
+ *   Currently, sizes 4, 8 and 16 are supported. This should be the same
+ *   as passed while creating the ring, otherwise the results are undefined.
+ * @param n
+ *   The number of objects to dequeue from the ring to the obj_table.
+ * @param available
+ *   If non-NULL, returns the number of remaining ring entries after the
+ *   dequeue has finished.
+ * @return
+ *   - n: Actual number of objects dequeued, 0 if ring is empty
+ */
+static __rte_always_inline unsigned
+rte_ring_mc_dequeue_burst_elem(struct rte_ring *r, void *obj_table,
+		unsigned int esize, unsigned int n, unsigned int *available)
+{
+	return __rte_ring_do_dequeue_elem(r, obj_table, esize, n,
+			RTE_RING_QUEUE_VARIABLE, __IS_MC, available);
+}
+
+/**
+ * Dequeue several objects from a ring (NOT multi-consumers safe). When more
+ * objects are requested than are available, only the available objects
+ * are dequeued.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of esize-byte objects that will be filled.
+ * @param esize
+ *   The size of ring element, in bytes. It must be a multiple of 4.
+ *   Currently, sizes 4, 8 and 16 are supported. This should be the same
+ *   as passed while creating the ring, otherwise the results are undefined.
+ * @param n
+ *   The number of objects to dequeue from the ring to the obj_table.
+ * @param available
+ *   If non-NULL, returns the number of remaining ring entries after the
+ *   dequeue has finished.
+ * @return
+ *   - n: Actual number of objects dequeued, 0 if ring is empty
+ */
+static __rte_always_inline unsigned
+rte_ring_sc_dequeue_burst_elem(struct rte_ring *r, void *obj_table,
+		unsigned int esize, unsigned int n, unsigned int *available)
+{
+	return __rte_ring_do_dequeue_elem(r, obj_table, esize, n,
+			RTE_RING_QUEUE_VARIABLE, __IS_SC, available);
+}
+
+/**
+ * Dequeue multiple objects from a ring up to a maximum number.
+ *
+ * This function calls the multi-consumers or the single-consumer
+ * version, depending on the default behaviour that was specified at
+ * ring creation time (see flags).
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of esize-byte objects that will be filled.
+ * @param esize
+ *   The size of ring element, in bytes. It must be a multiple of 4.
+ *   Currently, sizes 4, 8 and 16 are supported. This should be the same
+ *   as passed while creating the ring, otherwise the results are undefined.
+ * @param n
+ *   The number of objects to dequeue from the ring to the obj_table.
+ * @param available
+ *   If non-NULL, returns the number of remaining ring entries after the
+ *   dequeue has finished.
+ * @return
+ *   - Number of objects dequeued
+ */
+static __rte_always_inline unsigned
+rte_ring_dequeue_burst_elem(struct rte_ring *r, void *obj_table,
+		unsigned int esize, unsigned int n, unsigned int *available)
+{
+	return __rte_ring_do_dequeue_elem(r, obj_table, esize, n,
+				RTE_RING_QUEUE_VARIABLE,
+				r->cons.single, available);
+}
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* _RTE_RING_ELEM_H_ */
diff --git a/lib/librte_ring/rte_ring_version.map b/lib/librte_ring/rte_ring_version.map
index 510c1386e..e410a7503 100644
--- a/lib/librte_ring/rte_ring_version.map
+++ b/lib/librte_ring/rte_ring_version.map
@@ -21,6 +21,8 @@ DPDK_2.2 {
 EXPERIMENTAL {
 	global:
 
+	rte_ring_create_elem;
+	rte_ring_get_memsize_elem;
 	rte_ring_reset;
 
 };
-- 
2.17.1
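
For readers following the copy macros in rte_ring_elem.h above: they share
one scheme, an unrolled copy when the request does not wrap past the end of
the ring and a two-part copy when it does. The sketch below models that
scheme for 64-bit elements outside DPDK. It is only an illustration under
stated assumptions: ring_copy_out64 and its parameters are hypothetical
names, not part of the patch, and the ring size is assumed to be a power of
two.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Standalone model (not DPDK code): copy n 64-bit elements out of a
 * power-of-two sized ring, starting at position head.
 * ring_copy_out64 is a hypothetical name used only for this sketch.
 */
static void
ring_copy_out64(const uint64_t *ring, uint32_t size, uint32_t head,
		uint64_t *obj, unsigned int n)
{
	unsigned int i;
	uint32_t idx = head & (size - 1);	/* assumes size is a power of two */

	assert(n <= size);
	if (idx + n < size) {
		/* No wrap: copy four elements per iteration. */
		for (i = 0; i < (n & ~0x3u); i += 4, idx += 4) {
			obj[i] = ring[idx];
			obj[i + 1] = ring[idx + 1];
			obj[i + 2] = ring[idx + 2];
			obj[i + 3] = ring[idx + 3];
		}
		/* Then copy the 0 to 3 leftover elements. */
		switch (n & 0x3) {
		case 3:
			obj[i++] = ring[idx++]; /* fallthrough */
		case 2:
			obj[i++] = ring[idx++]; /* fallthrough */
		case 1:
			obj[i++] = ring[idx++];
		}
	} else {
		/* Wrap: copy up to the end of the ring, then from slot 0. */
		for (i = 0; idx < size; i++, idx++)
			obj[i] = ring[idx];
		for (idx = 0; i < n; i++, idx++)
			obj[i] = ring[idx];
	}
}
```

The loop bound masks off the low bits of n so that the unrolled loop covers
a whole number of groups and the switch handles the remainder. The same
reasoning applies to the 32-bit and 128-bit variants with unroll factors of
8 and 2.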



* [dpdk-dev] [PATCH v5 2/3] test/ring: add test cases for configurable element size ring
  2019-10-17 20:08   ` [dpdk-dev] [PATCH v5 0/3] lib/ring: APIs to support custom element size Honnappa Nagarahalli
  2019-10-17 20:08     ` [dpdk-dev] [PATCH v5 1/3] lib/ring: apis to support configurable " Honnappa Nagarahalli
@ 2019-10-17 20:08     ` Honnappa Nagarahalli
  2019-10-17 20:08     ` [dpdk-dev] [PATCH v5 3/3] lib/ring: copy ring elements using memcpy partially Honnappa Nagarahalli
  2 siblings, 0 replies; 173+ messages in thread
From: Honnappa Nagarahalli @ 2019-10-17 20:08 UTC (permalink / raw)
  To: olivier.matz, sthemmin, jerinj, bruce.richardson, david.marchand,
	pbhagavatula, konstantin.ananyev, drc, hemant.agrawal,
	honnappa.nagarahalli
  Cc: dev, dharmik.thakkar, ruifeng.wang, gavin.hu

Add test cases to test APIs for configurable element size ring.
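
The tests call both the bulk and the burst variants of the new APIs. As a
reminder of how the two differ, here is a minimal standalone sketch of the
count each variant may take. The names queue_behavior, dequeue_count and
dequeue_one_status are hypothetical and exist only for this illustration;
they mirror, but are not, RTE_RING_QUEUE_FIXED/RTE_RING_QUEUE_VARIABLE and
the ring functions themselves.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-ins for the two ring queue behaviors. */
enum queue_behavior { QUEUE_FIXED, QUEUE_VARIABLE };

/*
 * Decide how many of the n requested elements a dequeue may take when
 * `entries` elements are currently in the ring.
 * Bulk (fixed) is all or nothing; burst (variable) takes what is there.
 */
static unsigned int
dequeue_count(unsigned int entries, unsigned int n,
		enum queue_behavior behavior)
{
	if (n > entries)
		n = (behavior == QUEUE_FIXED) ? 0 : entries;
	return n;
}

/* Single-element wrapper: 0 on success, -ENOENT if the ring is empty. */
static int
dequeue_one_status(unsigned int entries)
{
	return dequeue_count(entries, 1, QUEUE_FIXED) ? 0 : -ENOENT;
}
```

This is why the two-thread tests spin on the bulk calls until they return
non-zero, while the burst test accepts whatever count comes back.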

Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
---
 app/test/Makefile              |   1 +
 app/test/meson.build           |   1 +
 app/test/test_ring_perf_elem.c | 419 +++++++++++++++++++++++++++++++++
 3 files changed, 421 insertions(+)
 create mode 100644 app/test/test_ring_perf_elem.c

diff --git a/app/test/Makefile b/app/test/Makefile
index 26ba6fe2b..e5cb27b75 100644
--- a/app/test/Makefile
+++ b/app/test/Makefile
@@ -78,6 +78,7 @@ SRCS-y += test_rand_perf.c
 
 SRCS-y += test_ring.c
 SRCS-y += test_ring_perf.c
+SRCS-y += test_ring_perf_elem.c
 SRCS-y += test_pmd_perf.c
 
 ifeq ($(CONFIG_RTE_LIBRTE_TABLE),y)
diff --git a/app/test/meson.build b/app/test/meson.build
index ec40943bd..995ee9bc7 100644
--- a/app/test/meson.build
+++ b/app/test/meson.build
@@ -101,6 +101,7 @@ test_sources = files('commands.c',
 	'test_reorder.c',
 	'test_ring.c',
 	'test_ring_perf.c',
+	'test_ring_perf_elem.c',
 	'test_rwlock.c',
 	'test_sched.c',
 	'test_service_cores.c',
diff --git a/app/test/test_ring_perf_elem.c b/app/test/test_ring_perf_elem.c
new file mode 100644
index 000000000..fc5b82d71
--- /dev/null
+++ b/app/test/test_ring_perf_elem.c
@@ -0,0 +1,419 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2010-2014 Intel Corporation
+ */
+
+
+#include <stdio.h>
+#include <inttypes.h>
+#include <rte_ring.h>
+#include <rte_ring_elem.h>
+#include <rte_cycles.h>
+#include <rte_launch.h>
+#include <rte_pause.h>
+
+#include "test.h"
+
+/*
+ * Ring
+ * ====
+ *
+ * Measures performance of various operations using rdtsc
+ *  * Empty ring dequeue
+ *  * Enqueue/dequeue of bursts in 1 thread
+ *  * Enqueue/dequeue of bursts in 2 threads
+ */
+
+#define RING_NAME "RING_PERF"
+#define RING_SIZE 4096
+#define MAX_BURST 64
+
+/*
+ * the sizes to enqueue and dequeue in testing
+ * (marked volatile so they won't be seen as compile-time constants)
+ */
+static const volatile unsigned bulk_sizes[] = { 8, 32 };
+
+struct lcore_pair {
+	unsigned c1, c2;
+};
+
+static volatile unsigned lcore_count;
+
+/**** Functions to analyse our core mask to get cores for different tests ***/
+
+static int
+get_two_hyperthreads(struct lcore_pair *lcp)
+{
+	unsigned id1, id2;
+	unsigned c1, c2, s1, s2;
+	RTE_LCORE_FOREACH(id1) {
+		/* inner loop just re-reads all id's. We could skip the
+		 * first few elements, but since number of cores is small
+		 * there is little point
+		 */
+		RTE_LCORE_FOREACH(id2) {
+			if (id1 == id2)
+				continue;
+
+			c1 = rte_lcore_to_cpu_id(id1);
+			c2 = rte_lcore_to_cpu_id(id2);
+			s1 = rte_lcore_to_socket_id(id1);
+			s2 = rte_lcore_to_socket_id(id2);
+			if ((c1 == c2) && (s1 == s2)) {
+				lcp->c1 = id1;
+				lcp->c2 = id2;
+				return 0;
+			}
+		}
+	}
+	return 1;
+}
+
+static int
+get_two_cores(struct lcore_pair *lcp)
+{
+	unsigned id1, id2;
+	unsigned c1, c2, s1, s2;
+	RTE_LCORE_FOREACH(id1) {
+		RTE_LCORE_FOREACH(id2) {
+			if (id1 == id2)
+				continue;
+
+			c1 = rte_lcore_to_cpu_id(id1);
+			c2 = rte_lcore_to_cpu_id(id2);
+			s1 = rte_lcore_to_socket_id(id1);
+			s2 = rte_lcore_to_socket_id(id2);
+			if ((c1 != c2) && (s1 == s2)) {
+				lcp->c1 = id1;
+				lcp->c2 = id2;
+				return 0;
+			}
+		}
+	}
+	return 1;
+}
+
+static int
+get_two_sockets(struct lcore_pair *lcp)
+{
+	unsigned id1, id2;
+	unsigned s1, s2;
+	RTE_LCORE_FOREACH(id1) {
+		RTE_LCORE_FOREACH(id2) {
+			if (id1 == id2)
+				continue;
+			s1 = rte_lcore_to_socket_id(id1);
+			s2 = rte_lcore_to_socket_id(id2);
+			if (s1 != s2) {
+				lcp->c1 = id1;
+				lcp->c2 = id2;
+				return 0;
+			}
+		}
+	}
+	return 1;
+}
+
+/* Get cycle counts for dequeuing from an empty ring. Should be 2 or 3 cycles */
+static void
+test_empty_dequeue(struct rte_ring *r)
+{
+	const unsigned iter_shift = 26;
+	const unsigned iterations = 1<<iter_shift;
+	unsigned i = 0;
+	uint32_t burst[MAX_BURST];
+
+	const uint64_t sc_start = rte_rdtsc();
+	for (i = 0; i < iterations; i++)
+		rte_ring_sc_dequeue_bulk_elem(r, burst, 8, bulk_sizes[0], NULL);
+	const uint64_t sc_end = rte_rdtsc();
+
+	const uint64_t mc_start = rte_rdtsc();
+	for (i = 0; i < iterations; i++)
+		rte_ring_mc_dequeue_bulk_elem(r, burst, 8, bulk_sizes[0], NULL);
+	const uint64_t mc_end = rte_rdtsc();
+
+	printf("SC empty dequeue: %.2F\n",
+			(double)(sc_end-sc_start) / iterations);
+	printf("MC empty dequeue: %.2F\n",
+			(double)(mc_end-mc_start) / iterations);
+}
+
+/*
+ * Each enqueue/dequeue thread takes one input (the burst size) and returns
+ * two outputs (the average cycles per element for sp/sc and for mp/mc).
+ */
+struct thread_params {
+	struct rte_ring *r;
+	unsigned size;        /* input value, the burst size */
+	double spsc, mpmc;    /* output value, the single or multi timings */
+};
+
+/*
+ * Function that uses rdtsc to measure timing for ring enqueue. Needs a
+ * paired thread running the dequeue_bulk function.
+ */
+static int
+enqueue_bulk(void *p)
+{
+	const unsigned iter_shift = 23;
+	const unsigned iterations = 1<<iter_shift;
+	struct thread_params *params = p;
+	struct rte_ring *r = params->r;
+	const unsigned size = params->size;
+	unsigned i;
+	uint32_t burst[MAX_BURST] = {0};
+
+#ifdef RTE_USE_C11_MEM_MODEL
+	if (__atomic_add_fetch(&lcore_count, 1, __ATOMIC_RELAXED) != 2)
+#else
+	if (__sync_add_and_fetch(&lcore_count, 1) != 2)
+#endif
+		while (lcore_count != 2)
+			rte_pause();
+
+	const uint64_t sp_start = rte_rdtsc();
+	for (i = 0; i < iterations; i++)
+		while (rte_ring_sp_enqueue_bulk_elem(r, burst, 8, size, NULL)
+				== 0)
+			rte_pause();
+	const uint64_t sp_end = rte_rdtsc();
+
+	const uint64_t mp_start = rte_rdtsc();
+	for (i = 0; i < iterations; i++)
+		while (rte_ring_mp_enqueue_bulk_elem(r, burst, 8, size, NULL)
+				== 0)
+			rte_pause();
+	const uint64_t mp_end = rte_rdtsc();
+
+	params->spsc = ((double)(sp_end - sp_start))/(iterations*size);
+	params->mpmc = ((double)(mp_end - mp_start))/(iterations*size);
+	return 0;
+}
+
+/*
+ * Function that uses rdtsc to measure timing for ring dequeue. Needs a
+ * paired thread running the enqueue_bulk function.
+ */
+static int
+dequeue_bulk(void *p)
+{
+	const unsigned iter_shift = 23;
+	const unsigned iterations = 1<<iter_shift;
+	struct thread_params *params = p;
+	struct rte_ring *r = params->r;
+	const unsigned size = params->size;
+	unsigned i;
+	uint32_t burst[MAX_BURST] = {0};
+
+#ifdef RTE_USE_C11_MEM_MODEL
+	if (__atomic_add_fetch(&lcore_count, 1, __ATOMIC_RELAXED) != 2)
+#else
+	if (__sync_add_and_fetch(&lcore_count, 1) != 2)
+#endif
+		while (lcore_count != 2)
+			rte_pause();
+
+	const uint64_t sc_start = rte_rdtsc();
+	for (i = 0; i < iterations; i++)
+		while (rte_ring_sc_dequeue_bulk_elem(r, burst, 8, size, NULL)
+				== 0)
+			rte_pause();
+	const uint64_t sc_end = rte_rdtsc();
+
+	const uint64_t mc_start = rte_rdtsc();
+	for (i = 0; i < iterations; i++)
+		while (rte_ring_mc_dequeue_bulk_elem(r, burst, 8, size, NULL)
+				== 0)
+			rte_pause();
+	const uint64_t mc_end = rte_rdtsc();
+
+	params->spsc = ((double)(sc_end - sc_start))/(iterations*size);
+	params->mpmc = ((double)(mc_end - mc_start))/(iterations*size);
+	return 0;
+}
+
+/*
+ * Function that calls the enqueue and dequeue bulk functions on pairs of cores.
+ * Used to measure ring performance between hyperthreads, cores and sockets.
+ */
+static void
+run_on_core_pair(struct lcore_pair *cores, struct rte_ring *r,
+		lcore_function_t f1, lcore_function_t f2)
+{
+	struct thread_params param1 = {0}, param2 = {0};
+	unsigned i;
+	for (i = 0; i < sizeof(bulk_sizes)/sizeof(bulk_sizes[0]); i++) {
+		lcore_count = 0;
+		param1.size = param2.size = bulk_sizes[i];
+		param1.r = param2.r = r;
+		if (cores->c1 == rte_get_master_lcore()) {
+			rte_eal_remote_launch(f2, &param2, cores->c2);
+			f1(&param1);
+			rte_eal_wait_lcore(cores->c2);
+		} else {
+			rte_eal_remote_launch(f1, &param1, cores->c1);
+			rte_eal_remote_launch(f2, &param2, cores->c2);
+			rte_eal_wait_lcore(cores->c1);
+			rte_eal_wait_lcore(cores->c2);
+		}
+		printf("SP/SC bulk enq/dequeue (size: %u): %.2F\n",
+				bulk_sizes[i], param1.spsc + param2.spsc);
+		printf("MP/MC bulk enq/dequeue (size: %u): %.2F\n",
+				bulk_sizes[i], param1.mpmc + param2.mpmc);
+	}
+}
+
+/*
+ * Test function that determines how long an enqueue + dequeue of a single item
+ * takes on a single lcore. Result is for comparison with the bulk enq+deq.
+ */
+static void
+test_single_enqueue_dequeue(struct rte_ring *r)
+{
+	const unsigned iter_shift = 24;
+	const unsigned iterations = 1<<iter_shift;
+	unsigned i = 0;
+	uint32_t burst[2];
+
+	const uint64_t sc_start = rte_rdtsc();
+	for (i = 0; i < iterations; i++) {
+		rte_ring_sp_enqueue_elem(r, burst, 8);
+		rte_ring_sc_dequeue_elem(r, burst, 8);
+	}
+	const uint64_t sc_end = rte_rdtsc();
+
+	const uint64_t mc_start = rte_rdtsc();
+	for (i = 0; i < iterations; i++) {
+		rte_ring_mp_enqueue_elem(r, burst, 8);
+		rte_ring_mc_dequeue_elem(r, burst, 8);
+	}
+	const uint64_t mc_end = rte_rdtsc();
+
+	printf("SP/SC single enq/dequeue: %"PRIu64"\n",
+			(sc_end-sc_start) >> iter_shift);
+	printf("MP/MC single enq/dequeue: %"PRIu64"\n",
+			(mc_end-mc_start) >> iter_shift);
+}
+
+/*
+ * Test that does both enqueue and dequeue on a core using the burst() API calls
+ * instead of the bulk() calls used in other tests. Results should be the same
+ * as for the bulk function called on a single lcore.
+ */
+static void
+test_burst_enqueue_dequeue(struct rte_ring *r)
+{
+	const unsigned iter_shift = 23;
+	const unsigned iterations = 1<<iter_shift;
+	unsigned sz, i = 0;
+	uint32_t burst[MAX_BURST] = {0};
+
+	for (sz = 0; sz < sizeof(bulk_sizes)/sizeof(bulk_sizes[0]); sz++) {
+		const uint64_t sc_start = rte_rdtsc();
+		for (i = 0; i < iterations; i++) {
+			rte_ring_sp_enqueue_burst_elem(r, burst, 8,
+					bulk_sizes[sz], NULL);
+			rte_ring_sc_dequeue_burst_elem(r, burst, 8,
+					bulk_sizes[sz], NULL);
+		}
+		const uint64_t sc_end = rte_rdtsc();
+
+		const uint64_t mc_start = rte_rdtsc();
+		for (i = 0; i < iterations; i++) {
+			rte_ring_mp_enqueue_burst_elem(r, burst, 8,
+					bulk_sizes[sz], NULL);
+			rte_ring_mc_dequeue_burst_elem(r, burst, 8,
+					bulk_sizes[sz], NULL);
+		}
+		const uint64_t mc_end = rte_rdtsc();
+
+		uint64_t mc_avg = ((mc_end-mc_start) >> iter_shift) /
+					bulk_sizes[sz];
+		uint64_t sc_avg = ((sc_end-sc_start) >> iter_shift) /
+					bulk_sizes[sz];
+
+		printf("SP/SC burst enq/dequeue (size: %u): %"PRIu64"\n",
+				bulk_sizes[sz], sc_avg);
+		printf("MP/MC burst enq/dequeue (size: %u): %"PRIu64"\n",
+				bulk_sizes[sz], mc_avg);
+	}
+}
+
+/* Times enqueue and dequeue on a single lcore */
+static void
+test_bulk_enqueue_dequeue(struct rte_ring *r)
+{
+	const unsigned iter_shift = 23;
+	const unsigned iterations = 1<<iter_shift;
+	unsigned sz, i = 0;
+	uint32_t burst[MAX_BURST] = {0};
+
+	for (sz = 0; sz < sizeof(bulk_sizes)/sizeof(bulk_sizes[0]); sz++) {
+		const uint64_t sc_start = rte_rdtsc();
+		for (i = 0; i < iterations; i++) {
+			rte_ring_sp_enqueue_bulk_elem(r, burst, 8,
+					bulk_sizes[sz], NULL);
+			rte_ring_sc_dequeue_bulk_elem(r, burst, 8,
+					bulk_sizes[sz], NULL);
+		}
+		const uint64_t sc_end = rte_rdtsc();
+
+		const uint64_t mc_start = rte_rdtsc();
+		for (i = 0; i < iterations; i++) {
+			rte_ring_mp_enqueue_bulk_elem(r, burst, 8,
+					bulk_sizes[sz], NULL);
+			rte_ring_mc_dequeue_bulk_elem(r, burst, 8,
+					bulk_sizes[sz], NULL);
+		}
+		const uint64_t mc_end = rte_rdtsc();
+
+		double sc_avg = ((double)(sc_end-sc_start) /
+				(iterations * bulk_sizes[sz]));
+		double mc_avg = ((double)(mc_end-mc_start) /
+				(iterations * bulk_sizes[sz]));
+
+		printf("SP/SC bulk enq/dequeue (size: %u): %.2F\n",
+				bulk_sizes[sz], sc_avg);
+		printf("MP/MC bulk enq/dequeue (size: %u): %.2F\n",
+				bulk_sizes[sz], mc_avg);
+	}
+}
+
+static int
+test_ring_perf_elem(void)
+{
+	struct lcore_pair cores;
+	struct rte_ring *r = NULL;
+
+	r = rte_ring_create_elem(RING_NAME, RING_SIZE, 8, rte_socket_id(), 0);
+	if (r == NULL)
+		return -1;
+
+	printf("### Testing single element and burst enq/deq ###\n");
+	test_single_enqueue_dequeue(r);
+	test_burst_enqueue_dequeue(r);
+
+	printf("\n### Testing empty dequeue ###\n");
+	test_empty_dequeue(r);
+
+	printf("\n### Testing using a single lcore ###\n");
+	test_bulk_enqueue_dequeue(r);
+
+	if (get_two_hyperthreads(&cores) == 0) {
+		printf("\n### Testing using two hyperthreads ###\n");
+		run_on_core_pair(&cores, r, enqueue_bulk, dequeue_bulk);
+	}
+	if (get_two_cores(&cores) == 0) {
+		printf("\n### Testing using two physical cores ###\n");
+		run_on_core_pair(&cores, r, enqueue_bulk, dequeue_bulk);
+	}
+	if (get_two_sockets(&cores) == 0) {
+		printf("\n### Testing using two NUMA nodes ###\n");
+		run_on_core_pair(&cores, r, enqueue_bulk, dequeue_bulk);
+	}
+	rte_ring_free(r);
+	return 0;
+}
+
+REGISTER_TEST_COMMAND(ring_perf_elem_autotest, test_ring_perf_elem);
-- 
2.17.1



* [dpdk-dev] [PATCH v5 3/3] lib/ring: copy ring elements using memcpy partially
  2019-10-17 20:08   ` [dpdk-dev] [PATCH v5 0/3] lib/ring: APIs to support custom element size Honnappa Nagarahalli
  2019-10-17 20:08     ` [dpdk-dev] [PATCH v5 1/3] lib/ring: apis to support configurable " Honnappa Nagarahalli
  2019-10-17 20:08     ` [dpdk-dev] [PATCH v5 2/3] test/ring: add test cases for configurable element size ring Honnappa Nagarahalli
@ 2019-10-17 20:08     ` Honnappa Nagarahalli
  2 siblings, 0 replies; 173+ messages in thread
From: Honnappa Nagarahalli @ 2019-10-17 20:08 UTC (permalink / raw)
  To: olivier.matz, sthemmin, jerinj, bruce.richardson, david.marchand,
	pbhagavatula, konstantin.ananyev, drc, hemant.agrawal,
	honnappa.nagarahalli
  Cc: dev, dharmik.thakkar, ruifeng.wang, gavin.hu

Copy ring elements using memcpy in 32B chunks. The remaining
bytes are copied using assignments.
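
As a rough illustration of the approach described above (not the code in
this patch), a standalone helper might copy eight 32-bit words (32B) per
iteration with a fixed-size memcpy and then finish the tail one word at a
time. The name copy_elems_sketch and its parameters are hypothetical, and
the sketch assumes esize is a multiple of 4 and ignores ring wrap-around:

```c
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper, not part of rte_ring: copy 'num' elements of
 * 'esize' bytes each. Assumes esize is a multiple of 4 and that the
 * source and destination neither overlap nor wrap around.
 */
static inline void
copy_elems_sketch(uint32_t *dst, const uint32_t *src, uint32_t num,
		uint32_t esize)
{
	/* Total length expressed in 32-bit words. */
	uint32_t sz = num * (esize / sizeof(uint32_t));
	uint32_t i;

	/* Main loop: a fixed 32B memcpy the compiler can expand inline. */
	for (i = 0; i < (sz & ~(uint32_t)0x7); i += 8)
		memcpy(dst + i, src + i, 8 * sizeof(uint32_t));

	/* Tail: the remaining 0..7 words, copied by assignment. */
	for (; i < sz; i++)
		dst[i] = src[i];
}
```

With 5 elements of 12B each, for example, sz is 15 words: one 32B memcpy
covers words 0..7 and the tail loop copies words 8..14.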

Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
---
 lib/librte_ring/rte_ring_elem.h | 163 +++-----------------------------
 1 file changed, 11 insertions(+), 152 deletions(-)

diff --git a/lib/librte_ring/rte_ring_elem.h b/lib/librte_ring/rte_ring_elem.h
index 860f059ad..92e92f150 100644
--- a/lib/librte_ring/rte_ring_elem.h
+++ b/lib/librte_ring/rte_ring_elem.h
@@ -24,6 +24,7 @@ extern "C" {
 #include <stdint.h>
 #include <sys/queue.h>
 #include <errno.h>
+#include <string.h>
 #include <rte_common.h>
 #include <rte_config.h>
 #include <rte_memory.h>
@@ -108,35 +109,16 @@ __rte_experimental
 struct rte_ring *rte_ring_create_elem(const char *name, unsigned count,
 				unsigned esize, int socket_id, unsigned flags);
 
-/* the actual enqueue of pointers on the ring.
- * Placed here since identical code needed in both
- * single and multi producer enqueue functions.
- */
-#define ENQUEUE_PTRS_ELEM(r, ring_start, prod_head, obj_table, esize, n) do { \
-	if (esize == 4) \
-		ENQUEUE_PTRS_32(r, ring_start, prod_head, obj_table, n); \
-	else if (esize == 8) \
-		ENQUEUE_PTRS_64(r, ring_start, prod_head, obj_table, n); \
-	else if (esize == 16) \
-		ENQUEUE_PTRS_128(r, ring_start, prod_head, obj_table, n); \
-} while (0)
-
-#define ENQUEUE_PTRS_32(r, ring_start, prod_head, obj_table, n) do { \
+#define ENQUEUE_PTRS_GEN(r, ring_start, prod_head, obj_table, esize, n) do { \
 	unsigned int i; \
 	const uint32_t size = (r)->size; \
 	uint32_t idx = prod_head & (r)->mask; \
 	uint32_t *ring = (uint32_t *)ring_start; \
 	uint32_t *obj = (uint32_t *)obj_table; \
+	uint32_t sz = n * (esize / sizeof(uint32_t)); \
 	if (likely(idx + n < size)) { \
-		for (i = 0; i < (n & ((~(unsigned)0x7))); i += 8, idx += 8) { \
-			ring[idx] = obj[i]; \
-			ring[idx + 1] = obj[i + 1]; \
-			ring[idx + 2] = obj[i + 2]; \
-			ring[idx + 3] = obj[i + 3]; \
-			ring[idx + 4] = obj[i + 4]; \
-			ring[idx + 5] = obj[i + 5]; \
-			ring[idx + 6] = obj[i + 6]; \
-			ring[idx + 7] = obj[i + 7]; \
+		for (i = 0; i < (sz & ((~(unsigned)0x7))); i += 8, idx += 8) { \
+			memcpy (ring + i, obj + i, 8 * sizeof (uint32_t)); \
 		} \
 		switch (n & 0x7) { \
 		case 7: \
@@ -162,87 +144,16 @@ struct rte_ring *rte_ring_create_elem(const char *name, unsigned count,
 	} \
 } while (0)
 
-#define ENQUEUE_PTRS_64(r, ring_start, prod_head, obj_table, n) do { \
-	unsigned int i; \
-	const uint32_t size = (r)->size; \
-	uint32_t idx = prod_head & (r)->mask; \
-	uint64_t *ring = (uint64_t *)ring_start; \
-	uint64_t *obj = (uint64_t *)obj_table; \
-	if (likely(idx + n < size)) { \
-		for (i = 0; i < (n & ((~(unsigned)0x3))); i += 4, idx += 4) { \
-			ring[idx] = obj[i]; \
-			ring[idx + 1] = obj[i + 1]; \
-			ring[idx + 2] = obj[i + 2]; \
-			ring[idx + 3] = obj[i + 3]; \
-		} \
-		switch (n & 0x3) { \
-		case 3: \
-			ring[idx++] = obj[i++]; /* fallthrough */ \
-		case 2: \
-			ring[idx++] = obj[i++]; /* fallthrough */ \
-		case 1: \
-			ring[idx++] = obj[i++]; \
-		} \
-	} else { \
-		for (i = 0; idx < size; i++, idx++)\
-			ring[idx] = obj[i]; \
-		for (idx = 0; i < n; i++, idx++) \
-			ring[idx] = obj[i]; \
-	} \
-} while (0)
-
-#define ENQUEUE_PTRS_128(r, ring_start, prod_head, obj_table, n) do { \
-	unsigned int i; \
-	const uint32_t size = (r)->size; \
-	uint32_t idx = prod_head & (r)->mask; \
-	__uint128_t *ring = (__uint128_t *)ring_start; \
-	__uint128_t *obj = (__uint128_t *)obj_table; \
-	if (likely(idx + n < size)) { \
-		for (i = 0; i < (n >> 1); i += 2, idx += 2) { \
-			ring[idx] = obj[i]; \
-			ring[idx + 1] = obj[i + 1]; \
-		} \
-		switch (n & 0x1) { \
-		case 1: \
-			ring[idx++] = obj[i++]; \
-		} \
-	} else { \
-		for (i = 0; idx < size; i++, idx++)\
-			ring[idx] = obj[i]; \
-		for (idx = 0; i < n; i++, idx++) \
-			ring[idx] = obj[i]; \
-	} \
-} while (0)
-
-/* the actual copy of pointers on the ring to obj_table.
- * Placed here since identical code needed in both
- * single and multi consumer dequeue functions.
- */
-#define DEQUEUE_PTRS_ELEM(r, ring_start, cons_head, obj_table, esize, n) do { \
-	if (esize == 4) \
-		DEQUEUE_PTRS_32(r, ring_start, cons_head, obj_table, n); \
-	else if (esize == 8) \
-		DEQUEUE_PTRS_64(r, ring_start, cons_head, obj_table, n); \
-	else if (esize == 16) \
-		DEQUEUE_PTRS_128(r, ring_start, cons_head, obj_table, n); \
-} while (0)
-
-#define DEQUEUE_PTRS_32(r, ring_start, cons_head, obj_table, n) do { \
+#define DEQUEUE_PTRS_GEN(r, ring_start, cons_head, obj_table, esize, n) do { \
 	unsigned int i; \
 	uint32_t idx = cons_head & (r)->mask; \
 	const uint32_t size = (r)->size; \
 	uint32_t *ring = (uint32_t *)ring_start; \
 	uint32_t *obj = (uint32_t *)obj_table; \
+	uint32_t sz = n * (esize / sizeof(uint32_t)); \
 	if (likely(idx + n < size)) { \
-		for (i = 0; i < (n & (~(unsigned)0x7)); i += 8, idx += 8) {\
-			obj[i] = ring[idx]; \
-			obj[i + 1] = ring[idx + 1]; \
-			obj[i + 2] = ring[idx + 2]; \
-			obj[i + 3] = ring[idx + 3]; \
-			obj[i + 4] = ring[idx + 4]; \
-			obj[i + 5] = ring[idx + 5]; \
-			obj[i + 6] = ring[idx + 6]; \
-			obj[i + 7] = ring[idx + 7]; \
+		for (i = 0; i < (sz & ((~(unsigned)0x7))); i += 8, idx += 8) { \
+			memcpy (obj + i, ring + i, 8 * sizeof (uint32_t)); \
 		} \
 		switch (n & 0x7) { \
 		case 7: \
@@ -268,58 +179,6 @@ struct rte_ring *rte_ring_create_elem(const char *name, unsigned count,
 	} \
 } while (0)
 
-#define DEQUEUE_PTRS_64(r, ring_start, cons_head, obj_table, n) do { \
-	unsigned int i; \
-	uint32_t idx = cons_head & (r)->mask; \
-	const uint32_t size = (r)->size; \
-	uint64_t *ring = (uint64_t *)ring_start; \
-	uint64_t *obj = (uint64_t *)obj_table; \
-	if (likely(idx + n < size)) { \
-		for (i = 0; i < (n & (~(unsigned)0x3)); i += 4, idx += 4) {\
-			obj[i] = ring[idx]; \
-			obj[i + 1] = ring[idx + 1]; \
-			obj[i + 2] = ring[idx + 2]; \
-			obj[i + 3] = ring[idx + 3]; \
-		} \
-		switch (n & 0x3) { \
-		case 3: \
-			obj[i++] = ring[idx++]; /* fallthrough */ \
-		case 2: \
-			obj[i++] = ring[idx++]; /* fallthrough */ \
-		case 1: \
-			obj[i++] = ring[idx++]; \
-		} \
-	} else { \
-		for (i = 0; idx < size; i++, idx++) \
-			obj[i] = ring[idx]; \
-		for (idx = 0; i < n; i++, idx++) \
-			obj[i] = ring[idx]; \
-	} \
-} while (0)
-
-#define DEQUEUE_PTRS_128(r, ring_start, cons_head, obj_table, n) do { \
-	unsigned int i; \
-	uint32_t idx = cons_head & (r)->mask; \
-	const uint32_t size = (r)->size; \
-	__uint128_t *ring = (__uint128_t *)ring_start; \
-	__uint128_t *obj = (__uint128_t *)obj_table; \
-	if (likely(idx + n < size)) { \
-		for (i = 0; i < (n >> 1); i += 2, idx += 2) { \
-			obj[i] = ring[idx]; \
-			obj[i + 1] = ring[idx + 1]; \
-		} \
-		switch (n & 0x1) { \
-		case 1: \
-			obj[i++] = ring[idx++]; /* fallthrough */ \
-		} \
-	} else { \
-		for (i = 0; idx < size; i++, idx++) \
-			obj[i] = ring[idx]; \
-		for (idx = 0; i < n; i++, idx++) \
-			obj[i] = ring[idx]; \
-	} \
-} while (0)
-
 /* Between load and load. there might be cpu reorder in weak model
  * (powerpc/arm).
  * There are 2 choices for the users
@@ -373,7 +232,7 @@ __rte_ring_do_enqueue_elem(struct rte_ring *r, void * const obj_table,
 	if (n == 0)
 		goto end;
 
-	ENQUEUE_PTRS_ELEM(r, &r[1], prod_head, obj_table, esize, n);
+	ENQUEUE_PTRS_GEN(r, &r[1], prod_head, obj_table, esize, n);
 
 	update_tail(&r->prod, prod_head, prod_next, is_sp, 1);
 end:
@@ -420,7 +279,7 @@ __rte_ring_do_dequeue_elem(struct rte_ring *r, void *obj_table,
 	if (n == 0)
 		goto end;
 
-	DEQUEUE_PTRS_ELEM(r, &r[1], cons_head, obj_table, esize, n);
+	DEQUEUE_PTRS_GEN(r, &r[1], cons_head, obj_table, esize, n);
 
 	update_tail(&r->cons, cons_head, cons_next, is_sc, 0);
 
-- 
2.17.1



* Re: [dpdk-dev] [PATCH v4 1/2] lib/ring: apis to support configurable element size
  2019-10-17 11:51                 ` Ananyev, Konstantin
@ 2019-10-17 20:16                   ` Honnappa Nagarahalli
  2019-10-17 23:17                     ` David Christensen
  0 siblings, 1 reply; 173+ messages in thread
From: Honnappa Nagarahalli @ 2019-10-17 20:16 UTC (permalink / raw)
  To: Ananyev, Konstantin, olivier.matz, sthemmin, jerinj, Richardson,
	Bruce, david.marchand, pbhagavatula, David Christensen
  Cc: dev, Dharmik Thakkar, Ruifeng Wang (Arm Technology China),
	Gavin Hu (Arm Technology China),
	stephen, nd, nd

<snip>

+ David Christensen for Power architecture

> > >
> > > > It
> > > > would mean extra work for the users.
> > > >
> > > > > 2. A lot of code duplication with these 3 copies of
> > > > > ENQUEUE/DEQUEUE macros.
> > > > >
> > > > > Looking at ENQUEUE/DEQUEUE macros, I can see that main loop
> > > > > always does 32B copy per iteration.
> > > > Yes, I tried to keep it the same as the existing one (originally,
> > > > I guess the intention was to allow for 256b vector instructions to
> > > > be
> > > > generated)
> > > >
> > > > > So wonder can we make a generic function that would do 32B copy
> > > > > per iteration in a main loop, and copy tail  by 4B chunks?
> > > > > That would avoid copy duplication and will allow user to have
> > > > > any elem size (multiple of 4B) he wants.
> > > > > Something like that (note didn't test it, just a rough idea):
> > > > >
> > > > >  static inline void
> > > > > copy_elems(uint32_t du32[], const uint32_t su32[], uint32_t num,
> > > > > uint32_t
> > > > > esize) {
> > > > >         uint32_t i, sz;
> > > > >
> > > > >         sz = (num * esize) / sizeof(uint32_t);
> > > > If 'num' is a compile time constant, 'sz' will be a compile time constant.
> > > Otherwise, this will result in a multiplication operation.
> > >
> > > Not always.
> > > If esize is compile time constant, then for esize as power of 2
> > > (4,8,16,...), it would be just one shift.
> > > For other constant values it could be a 'mul' or in many cases just
> > > 2 shifts plus 'add' (if compiler is smart enough).
> > > I.e., let's say for a 24B elem it would be either num * 6 or (num << 2) +
> > > (num << 1).
> > With num * 15 it has to be (num << 3) + (num << 2) + (num << 1) + num
> > Not sure if the compiler will do this.
> 
> For 15, it can be just (num << 4) - num
> 
> >
> > > I suppose for non-power of 2 elems it might be ok to get such small perf hit.
> > Agree, should be ok not to focus on right now.
> >
> > >
> > > >I have tried
> > > > to avoid the multiplication operation and try to use shift and
> > > >mask
> > > operations (just like how the rest of the ring code does).
> > > >
> > > > >
> > > > >         for (i = 0; i < (sz & ~7); i += 8)
> > > > >                 memcpy(du32 + i, su32 + i, 8 *
> > > > > sizeof(uint32_t));
> > > > I had used memcpy to start with (for the entire copy operation),
> > > > performance is not the same for 64b elements when compared with
> > > > the
> > > existing ring APIs (some cases more and some cases less).
> > >
> > > I remember that from one of your previous mails, that's why here I
> > > suggest to use in a loop memcpy() with fixed size.
> > > That way for each iteration compiler will replace memcpy() with
> > > instructions to copy 32B in a way he thinks is optimal (same as for
> > > original macro, I think).
> > I tried this. On x86 (Xeon(R) Gold 6132 CPU @ 2.60GHz), the results are as
> > follows. The numbers in brackets are with the code on master.
> > gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0
> >
> > RTE>>ring_perf_elem_autotest
> > ### Testing single element and burst enq/deq ###
> > SP/SC single enq/dequeue: 5
> > MP/MC single enq/dequeue: 40 (35)
> > SP/SC burst enq/dequeue (size: 8): 2
> > MP/MC burst enq/dequeue (size: 8): 6
> > SP/SC burst enq/dequeue (size: 32): 1 (2)
> > MP/MC burst enq/dequeue (size: 32): 2
> >
> > ### Testing empty dequeue ###
> > SC empty dequeue: 2.11
> > MC empty dequeue: 1.41 (2.11)
> >
> > ### Testing using a single lcore ###
> > SP/SC bulk enq/dequeue (size: 8): 2.15 (2.86)
> > MP/MC bulk enq/dequeue (size: 8): 6.35 (6.91)
> > SP/SC bulk enq/dequeue (size: 32): 1.35 (2.06)
> > MP/MC bulk enq/dequeue (size: 32): 2.38 (2.95)
> >
> > ### Testing using two physical cores ###
> > SP/SC bulk enq/dequeue (size: 8): 73.81 (15.33)
> > MP/MC bulk enq/dequeue (size: 8): 75.10 (71.27)
> > SP/SC bulk enq/dequeue (size: 32): 21.14 (9.58)
> > MP/MC bulk enq/dequeue (size: 32): 25.74 (20.91)
> >
> > ### Testing using two NUMA nodes ###
> > SP/SC bulk enq/dequeue (size: 8): 164.32 (50.66)
> > MP/MC bulk enq/dequeue (size: 8): 176.02 (173.43)
> > SP/SC bulk enq/dequeue (size: 32): 50.78 (23)
> > MP/MC bulk enq/dequeue (size: 32): 63.17 (46.74)
> >
> > On one of the Arm platform
> > MP/MC bulk enq/dequeue (size: 32): 0.37 (0.33) (~12% hit, the rest are
> > ok)
> 
> So it shows better numbers for one core, but worse on 2, right?
> 
> 
> > On another Arm platform, all numbers are same or slightly better.
> >
> > I can post the patch with this change if you want to run some benchmarks on
> your platform.
> 
> Sure, please do.
> I'll try to run on my boxes.
Sent v5, please check. Other platform owners should run this as well.

> 
> > I have not used the same code you have suggested, instead I have used the
> same logic in a single macro with memcpy.
> >



* Re: [dpdk-dev] [PATCH v5 1/3] lib/ring: apis to support configurable element size
  2019-10-17 20:08     ` [dpdk-dev] [PATCH v5 1/3] lib/ring: apis to support configurable " Honnappa Nagarahalli
@ 2019-10-17 20:39       ` Stephen Hemminger
  2019-10-17 20:40       ` Stephen Hemminger
  1 sibling, 0 replies; 173+ messages in thread
From: Stephen Hemminger @ 2019-10-17 20:39 UTC (permalink / raw)
  To: Honnappa Nagarahalli
  Cc: olivier.matz, sthemmin, jerinj, bruce.richardson, david.marchand,
	pbhagavatula, konstantin.ananyev, drc, hemant.agrawal, dev,
	dharmik.thakkar, ruifeng.wang, gavin.hu

On Thu, 17 Oct 2019 15:08:05 -0500
Honnappa Nagarahalli <honnappa.nagarahalli@arm.com> wrote:

> +	if ((esize != 4) && (esize != 8) && (esize != 16)) {
> +		RTE_LOG(ERR, RING,
> +			"Unsupported esize value. Supported values are 4, 8 and 16\n");
> +
> +		return -EINVAL;
> +	}
> +
>  	/* count must be a power of 2 */
>  	if ((!POWEROF2(count)) || (count > RTE_RING_SZ_MASK )) {

Minor nit, you don't need as many parens in conditionals.

	if (esize != 4 && esize != 8 && esize != 16) {

and
	if (!POWEROF2(count) || count > RTE_RING_SZ_MASK) {


* Re: [dpdk-dev] [PATCH v5 1/3] lib/ring: apis to support configurable element size
  2019-10-17 20:08     ` [dpdk-dev] [PATCH v5 1/3] lib/ring: apis to support configurable " Honnappa Nagarahalli
  2019-10-17 20:39       ` Stephen Hemminger
@ 2019-10-17 20:40       ` Stephen Hemminger
  1 sibling, 0 replies; 173+ messages in thread
From: Stephen Hemminger @ 2019-10-17 20:40 UTC (permalink / raw)
  To: Honnappa Nagarahalli
  Cc: olivier.matz, sthemmin, jerinj, bruce.richardson, david.marchand,
	pbhagavatula, konstantin.ananyev, drc, hemant.agrawal, dev,
	dharmik.thakkar, ruifeng.wang, gavin.hu

On Thu, 17 Oct 2019 15:08:05 -0500
Honnappa Nagarahalli <honnappa.nagarahalli@arm.com> wrote:

>  	/* count must be a power of 2 */
>  	if ((!POWEROF2(count)) || (count > RTE_RING_SZ_MASK )) {
>  		RTE_LOG(ERR, RING,
> -			"Requested size is invalid, must be power of 2, and "
> -			"do not exceed the size limit %u\n", RTE_RING_SZ_MASK);
> +			"Requested number of elements is invalid, must be "
> +			"power of 2, and do not exceed the limit %u\n",

Error messages often go to syslog. Please don't use multi-line messages; syslog doesn't handle them.
Better to be less wordy.
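
One way to act on this, sketched with plain snprintf rather than the real
RTE_LOG macro so that it builds standalone: keep the format string short
enough to sit in one literal, ending in a single newline. The function
name, the wording, and the limit passed in are hypothetical.

```c
#include <stdio.h>

/*
 * Hypothetical stand-in for the log call: format a short, single-line
 * error message into 'buf'. The wording is illustrative only.
 */
static int
format_count_error(char *buf, size_t len, unsigned int limit)
{
	return snprintf(buf, len,
		"Invalid count: must be a power of 2 and <= %u\n", limit);
}
```

The same short format string could then be handed to the logging macro in
place of the two concatenated literals.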


* Re: [dpdk-dev] [PATCH v4 1/2] lib/ring: apis to support configurable element size
  2019-10-17 20:16                   ` Honnappa Nagarahalli
@ 2019-10-17 23:17                     ` David Christensen
  2019-10-18  3:18                       ` Honnappa Nagarahalli
  0 siblings, 1 reply; 173+ messages in thread
From: David Christensen @ 2019-10-17 23:17 UTC (permalink / raw)
  To: Honnappa Nagarahalli, Ananyev, Konstantin, olivier.matz,
	sthemmin, jerinj, Richardson, Bruce, david.marchand,
	pbhagavatula
  Cc: dev, Dharmik Thakkar, Ruifeng Wang (Arm Technology China),
	Gavin Hu (Arm Technology China),
	stephen, nd

>>> I tried this. On x86 (Xeon(R) Gold 6132 CPU @ 2.60GHz), the results are as
>>> follows. The numbers in brackets are with the code on master.
>>> gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0
>>>
>>> RTE>>ring_perf_elem_autotest
>>> ### Testing single element and burst enq/deq ###
>>> SP/SC single enq/dequeue: 5
>>> MP/MC single enq/dequeue: 40 (35)
>>> SP/SC burst enq/dequeue (size: 8): 2
>>> MP/MC burst enq/dequeue (size: 8): 6
>>> SP/SC burst enq/dequeue (size: 32): 1 (2)
>>> MP/MC burst enq/dequeue (size: 32): 2
>>>
>>> ### Testing empty dequeue ###
>>> SC empty dequeue: 2.11
>>> MC empty dequeue: 1.41 (2.11)
>>>
>>> ### Testing using a single lcore ###
>>> SP/SC bulk enq/dequeue (size: 8): 2.15 (2.86)
>>> MP/MC bulk enq/dequeue (size: 8): 6.35 (6.91)
>>> SP/SC bulk enq/dequeue (size: 32): 1.35 (2.06)
>>> MP/MC bulk enq/dequeue (size: 32): 2.38 (2.95)
>>>
>>> ### Testing using two physical cores ###
>>> SP/SC bulk enq/dequeue (size: 8): 73.81 (15.33)
>>> MP/MC bulk enq/dequeue (size: 8): 75.10 (71.27)
>>> SP/SC bulk enq/dequeue (size: 32): 21.14 (9.58)
>>> MP/MC bulk enq/dequeue (size: 32): 25.74 (20.91)
>>>
>>> ### Testing using two NUMA nodes ###
>>> SP/SC bulk enq/dequeue (size: 8): 164.32 (50.66)
>>> MP/MC bulk enq/dequeue (size: 8): 176.02 (173.43)
>>> SP/SC bulk enq/dequeue (size: 32): 50.78 (23)
>>> MP/MC bulk enq/dequeue (size: 32): 63.17 (46.74)
>>>
>>> On one of the Arm platform
>>> MP/MC bulk enq/dequeue (size: 32): 0.37 (0.33) (~12% hit, the rest are
>>> ok)

Tried this on a Power9 platform (3.6GHz), with two numa nodes and 16 
cores/node (SMT=4).  Applied all 3 patches in v5, test results are as 
follows:

RTE>>ring_perf_elem_autotest
### Testing single element and burst enq/deq ###
SP/SC single enq/dequeue: 42
MP/MC single enq/dequeue: 59
SP/SC burst enq/dequeue (size: 8): 5
MP/MC burst enq/dequeue (size: 8): 7
SP/SC burst enq/dequeue (size: 32): 2
MP/MC burst enq/dequeue (size: 32): 2

### Testing empty dequeue ###
SC empty dequeue: 7.81
MC empty dequeue: 7.81

### Testing using a single lcore ###
SP/SC bulk enq/dequeue (size: 8): 5.76
MP/MC bulk enq/dequeue (size: 8): 7.66
SP/SC bulk enq/dequeue (size: 32): 2.10
MP/MC bulk enq/dequeue (size: 32): 2.57

### Testing using two hyperthreads ###
SP/SC bulk enq/dequeue (size: 8): 13.13
MP/MC bulk enq/dequeue (size: 8): 13.98
SP/SC bulk enq/dequeue (size: 32): 3.41
MP/MC bulk enq/dequeue (size: 32): 4.45

### Testing using two physical cores ###
SP/SC bulk enq/dequeue (size: 8): 11.00
MP/MC bulk enq/dequeue (size: 8): 10.95
SP/SC bulk enq/dequeue (size: 32): 3.08
MP/MC bulk enq/dequeue (size: 32): 3.40

### Testing using two NUMA nodes ###
SP/SC bulk enq/dequeue (size: 8): 63.41
MP/MC bulk enq/dequeue (size: 8): 62.70
SP/SC bulk enq/dequeue (size: 32): 15.39
MP/MC bulk enq/dequeue (size: 32): 22.96

Dave


* Re: [dpdk-dev] [PATCH v4 1/2] lib/ring: apis to support configurable element size
  2019-10-17 23:17                     ` David Christensen
@ 2019-10-18  3:18                       ` Honnappa Nagarahalli
  2019-10-18  8:04                         ` Jerin Jacob
  2019-10-18 17:23                         ` David Christensen
  0 siblings, 2 replies; 173+ messages in thread
From: Honnappa Nagarahalli @ 2019-10-18  3:18 UTC (permalink / raw)
  To: David Christensen, Ananyev, Konstantin, olivier.matz, sthemmin,
	jerinj, Richardson, Bruce, david.marchand, pbhagavatula
  Cc: dev, Dharmik Thakkar, Ruifeng Wang (Arm Technology China),
	Gavin Hu (Arm Technology China),
	stephen, Honnappa Nagarahalli, nd, nd

<snip>

> Subject: Re: [PATCH v4 1/2] lib/ring: apis to support configurable element
> size
> 
> >>> I tried this. On x86 (Xeon(R) Gold 6132 CPU @ 2.60GHz), the results
> >>> are as follows. The numbers in brackets are with the code on master.
> >>> gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0
> >>>
> >>> RTE>>ring_perf_elem_autotest
> >>> ### Testing single element and burst enq/deq ###
> >>> SP/SC single enq/dequeue: 5
> >>> MP/MC single enq/dequeue: 40 (35)
> >>> SP/SC burst enq/dequeue (size: 8): 2
> >>> MP/MC burst enq/dequeue (size: 8): 6
> >>> SP/SC burst enq/dequeue (size: 32): 1 (2)
> >>> MP/MC burst enq/dequeue (size: 32): 2
> >>>
> >>> ### Testing empty dequeue ###
> >>> SC empty dequeue: 2.11
> >>> MC empty dequeue: 1.41 (2.11)
> >>>
> >>> ### Testing using a single lcore ###
> >>> SP/SC bulk enq/dequeue (size: 8): 2.15 (2.86)
> >>> MP/MC bulk enq/dequeue (size: 8): 6.35 (6.91)
> >>> SP/SC bulk enq/dequeue (size: 32): 1.35 (2.06)
> >>> MP/MC bulk enq/dequeue (size: 32): 2.38 (2.95)
> >>>
> >>> ### Testing using two physical cores ###
> >>> SP/SC bulk enq/dequeue (size: 8): 73.81 (15.33)
> >>> MP/MC bulk enq/dequeue (size: 8): 75.10 (71.27)
> >>> SP/SC bulk enq/dequeue (size: 32): 21.14 (9.58)
> >>> MP/MC bulk enq/dequeue (size: 32): 25.74 (20.91)
> >>>
> >>> ### Testing using two NUMA nodes ###
> >>> SP/SC bulk enq/dequeue (size: 8): 164.32 (50.66)
> >>> MP/MC bulk enq/dequeue (size: 8): 176.02 (173.43)
> >>> SP/SC bulk enq/dequeue (size: 32): 50.78 (23)
> >>> MP/MC bulk enq/dequeue (size: 32): 63.17 (46.74)
> >>>
> >>> On one of the Arm platform
> >>> MP/MC bulk enq/dequeue (size: 32): 0.37 (0.33) (~12% hit, the rest are
> >>> ok)
> 
> Tried this on a Power9 platform (3.6GHz), with two numa nodes and 16
> cores/node (SMT=4).  Applied all 3 patches in v5, test results are as
> follows:
> 
> RTE>>ring_perf_elem_autotest
> ### Testing single element and burst enq/deq ###
> SP/SC single enq/dequeue: 42
> MP/MC single enq/dequeue: 59
> SP/SC burst enq/dequeue (size: 8): 5
> MP/MC burst enq/dequeue (size: 8): 7
> SP/SC burst enq/dequeue (size: 32): 2
> MP/MC burst enq/dequeue (size: 32): 2
>
> ### Testing empty dequeue ###
> SC empty dequeue: 7.81
> MC empty dequeue: 7.81
>
> ### Testing using a single lcore ###
> SP/SC bulk enq/dequeue (size: 8): 5.76
> MP/MC bulk enq/dequeue (size: 8): 7.66
> SP/SC bulk enq/dequeue (size: 32): 2.10
> MP/MC bulk enq/dequeue (size: 32): 2.57
>
> ### Testing using two hyperthreads ###
> SP/SC bulk enq/dequeue (size: 8): 13.13
> MP/MC bulk enq/dequeue (size: 8): 13.98
> SP/SC bulk enq/dequeue (size: 32): 3.41
> MP/MC bulk enq/dequeue (size: 32): 4.45
>
> ### Testing using two physical cores ###
> SP/SC bulk enq/dequeue (size: 8): 11.00
> MP/MC bulk enq/dequeue (size: 8): 10.95
> SP/SC bulk enq/dequeue (size: 32): 3.08
> MP/MC bulk enq/dequeue (size: 32): 3.40
>
> ### Testing using two NUMA nodes ###
> SP/SC bulk enq/dequeue (size: 8): 63.41
> MP/MC bulk enq/dequeue (size: 8): 62.70
> SP/SC bulk enq/dequeue (size: 32): 15.39
> MP/MC bulk enq/dequeue (size: 32): 22.96
> 
Thanks for running this. There is another test 'ring_perf_autotest' which provides the numbers with the original implementation. The goal is to make sure the numbers with the original implementation are the same as these. Can you please run that as well?

> Dave


* Re: [dpdk-dev] [PATCH v4 1/2] lib/ring: apis to support configurable element size
  2019-10-18  3:18                       ` Honnappa Nagarahalli
@ 2019-10-18  8:04                         ` Jerin Jacob
  2019-10-18 16:11                           ` Jerin Jacob
  2019-10-18 16:44                           ` Ananyev, Konstantin
  2019-10-18 17:23                         ` David Christensen
  1 sibling, 2 replies; 173+ messages in thread
From: Jerin Jacob @ 2019-10-18  8:04 UTC (permalink / raw)
  To: Honnappa Nagarahalli
  Cc: David Christensen, Ananyev, Konstantin, olivier.matz, sthemmin,
	jerinj, Richardson, Bruce, david.marchand, pbhagavatula, dev,
	Dharmik Thakkar, Ruifeng Wang (Arm Technology China),
	Gavin Hu (Arm Technology China),
	stephen, nd

On Fri, Oct 18, 2019 at 8:48 AM Honnappa Nagarahalli
<Honnappa.Nagarahalli@arm.com> wrote:
>
> <snip>
>
> > Subject: Re: [PATCH v4 1/2] lib/ring: apis to support configurable element
> > size
> >
> > >>> I tried this. On x86 (Xeon(R) Gold 6132 CPU @ 2.60GHz), the results
> > >>> are as follows. The numbers in brackets are with the code on master.
> > >>> gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0
> > >>>
> > >>> RTE>>ring_perf_elem_autotest
> > >>> ### Testing single element and burst enq/deq ###
> > >>> SP/SC single enq/dequeue: 5
> > >>> MP/MC single enq/dequeue: 40 (35)
> > >>> SP/SC burst enq/dequeue (size: 8): 2
> > >>> MP/MC burst enq/dequeue (size: 8): 6
> > >>> SP/SC burst enq/dequeue (size: 32): 1 (2)
> > >>> MP/MC burst enq/dequeue (size: 32): 2
> > >>>
> > >>> ### Testing empty dequeue ###
> > >>> SC empty dequeue: 2.11
> > >>> MC empty dequeue: 1.41 (2.11)
> > >>>
> > >>> ### Testing using a single lcore ###
> > >>> SP/SC bulk enq/dequeue (size: 8): 2.15 (2.86)
> > >>> MP/MC bulk enq/dequeue (size: 8): 6.35 (6.91)
> > >>> SP/SC bulk enq/dequeue (size: 32): 1.35 (2.06)
> > >>> MP/MC bulk enq/dequeue (size: 32): 2.38 (2.95)
> > >>>
> > >>> ### Testing using two physical cores ###
> > >>> SP/SC bulk enq/dequeue (size: 8): 73.81 (15.33)
> > >>> MP/MC bulk enq/dequeue (size: 8): 75.10 (71.27)
> > >>> SP/SC bulk enq/dequeue (size: 32): 21.14 (9.58)
> > >>> MP/MC bulk enq/dequeue (size: 32): 25.74 (20.91)
> > >>>
> > >>> ### Testing using two NUMA nodes ###
> > >>> SP/SC bulk enq/dequeue (size: 8): 164.32 (50.66)
> > >>> MP/MC bulk enq/dequeue (size: 8): 176.02 (173.43)
> > >>> SP/SC bulk enq/dequeue (size: 32): 50.78 (23)
> > >>> MP/MC bulk enq/dequeue (size: 32): 63.17 (46.74)
> > >>>
> > >>> On one of the Arm platform
> > >>> MP/MC bulk enq/dequeue (size: 32): 0.37 (0.33) (~12% hit, the rest
> > >>> are ok)
> >
> > Tried this on a Power9 platform (3.6GHz), with two numa nodes and 16
> > cores/node (SMT=4).  Applied all 3 patches in v5, test results are as
> > follows:
> >
> > RTE>>ring_perf_elem_autotest
> > ### Testing single element and burst enq/deq ###
> > SP/SC single enq/dequeue: 42
> > MP/MC single enq/dequeue: 59
> > SP/SC burst enq/dequeue (size: 8): 5
> > MP/MC burst enq/dequeue (size: 8): 7
> > SP/SC burst enq/dequeue (size: 32): 2
> > MP/MC burst enq/dequeue (size: 32): 2
> >
> > ### Testing empty dequeue ###
> > SC empty dequeue: 7.81
> > MC empty dequeue: 7.81
> >
> > ### Testing using a single lcore ###
> > SP/SC bulk enq/dequeue (size: 8): 5.76
> > MP/MC bulk enq/dequeue (size: 8): 7.66
> > SP/SC bulk enq/dequeue (size: 32): 2.10
> > MP/MC bulk enq/dequeue (size: 32): 2.57
> >
> > ### Testing using two hyperthreads ###
> > SP/SC bulk enq/dequeue (size: 8): 13.13
> > MP/MC bulk enq/dequeue (size: 8): 13.98
> > SP/SC bulk enq/dequeue (size: 32): 3.41
> > MP/MC bulk enq/dequeue (size: 32): 4.45
> >
> > ### Testing using two physical cores ###
> > SP/SC bulk enq/dequeue (size: 8): 11.00
> > MP/MC bulk enq/dequeue (size: 8): 10.95
> > SP/SC bulk enq/dequeue (size: 32): 3.08
> > MP/MC bulk enq/dequeue (size: 32): 3.40
> >
> > ### Testing using two NUMA nodes ###
> > SP/SC bulk enq/dequeue (size: 8): 63.41
> > MP/MC bulk enq/dequeue (size: 8): 62.70
> > SP/SC bulk enq/dequeue (size: 32): 15.39
> > MP/MC bulk enq/dequeue (size: 32): 22.96
> >
> Thanks for running this. There is another test 'ring_perf_autotest' which provides the numbers with the original implementation. The goal is to make sure the numbers with the original implementation are the same as these. Can you please run that as well?

Honnappa,

Your earlier perf report shows cycle counts of less than 1. That is
because it is using the 50 or 100MHz clock in EL0.
Please check with the PMU counter. See "ARM64 profiling" in

http://doc.dpdk.org/guides/prog_guide/profile_app.html
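
To illustrate why a slow EL0 clock produces sub-1 numbers, here is a rough
sketch of the scaling involved. The helper name ticks_to_cpu_cycles and the
2GHz core frequency used in the note below are assumptions for illustration
only; this is not a DPDK API and not a measured value.

```c
#include <stdint.h>

/* Hypothetical helper (not a DPDK API): scale a per-operation cost that was
 * measured in ticks of a slow generic timer into approximate CPU core cycles.
 * timer_hz is the frequency of the timer that was read, cpu_hz the assumed
 * core clock frequency. */
static double
ticks_to_cpu_cycles(double ticks, uint64_t timer_hz, uint64_t cpu_hz)
{
	return ticks * ((double)cpu_hz / (double)timer_hz);
}
```

For example, 0.37 ticks of a 100MHz timer on an assumed 2GHz core would be
roughly 7.4 core cycles, which is why the PMU cycle counter gives more
readable numbers.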


Here are the octeontx2 values. There is a regression in the two-core cases,
as you reported earlier on x86.


RTE>>ring_perf_autotest
### Testing single element and burst enq/deq ###
SP/SC single enq/dequeue: 288
MP/MC single enq/dequeue: 452
SP/SC burst enq/dequeue (size: 8): 39
MP/MC burst enq/dequeue (size: 8): 61
SP/SC burst enq/dequeue (size: 32): 13
MP/MC burst enq/dequeue (size: 32): 21

### Testing empty dequeue ###
SC empty dequeue: 6.33
MC empty dequeue: 6.67

### Testing using a single lcore ###
SP/SC bulk enq/dequeue (size: 8): 38.35
MP/MC bulk enq/dequeue (size: 8): 67.36
SP/SC bulk enq/dequeue (size: 32): 13.10
MP/MC bulk enq/dequeue (size: 32): 21.64

### Testing using two physical cores ###
SP/SC bulk enq/dequeue (size: 8): 75.94
MP/MC bulk enq/dequeue (size: 8): 107.66
SP/SC bulk enq/dequeue (size: 32): 24.51
MP/MC bulk enq/dequeue (size: 32): 33.23
Test OK
RTE>>

---- after applying v5 of the patch ------

RTE>>ring_perf_autotest
### Testing single element and burst enq/deq ###
SP/SC single enq/dequeue: 289
MP/MC single enq/dequeue: 452
SP/SC burst enq/dequeue (size: 8): 40
MP/MC burst enq/dequeue (size: 8): 64
SP/SC burst enq/dequeue (size: 32): 13
MP/MC burst enq/dequeue (size: 32): 22

### Testing empty dequeue ###
SC empty dequeue: 6.33
MC empty dequeue: 6.67

### Testing using a single lcore ###
SP/SC bulk enq/dequeue (size: 8): 39.73
MP/MC bulk enq/dequeue (size: 8): 69.13
SP/SC bulk enq/dequeue (size: 32): 13.44
MP/MC bulk enq/dequeue (size: 32): 22.00

### Testing using two physical cores ###
SP/SC bulk enq/dequeue (size: 8): 76.02
MP/MC bulk enq/dequeue (size: 8): 112.50
SP/SC bulk enq/dequeue (size: 32): 24.71
MP/MC bulk enq/dequeue (size: 32): 33.34
Test OK
RTE>>

RTE>>ring_perf_elem_autotest
### Testing single element and burst enq/deq ###
SP/SC single enq/dequeue: 290
MP/MC single enq/dequeue: 503
SP/SC burst enq/dequeue (size: 8): 39
MP/MC burst enq/dequeue (size: 8): 63
SP/SC burst enq/dequeue (size: 32): 11
MP/MC burst enq/dequeue (size: 32): 19

### Testing empty dequeue ###
SC empty dequeue: 6.33
MC empty dequeue: 6.67

### Testing using a single lcore ###
SP/SC bulk enq/dequeue (size: 8): 38.92
MP/MC bulk enq/dequeue (size: 8): 62.54
SP/SC bulk enq/dequeue (size: 32): 11.46
MP/MC bulk enq/dequeue (size: 32): 19.89

### Testing using two physical cores ###
SP/SC bulk enq/dequeue (size: 8): 87.55
MP/MC bulk enq/dequeue (size: 8): 99.10
SP/SC bulk enq/dequeue (size: 32): 26.63
MP/MC bulk enq/dequeue (size: 32): 29.91
Test OK
RTE>>



> > Dave

^ permalink raw reply	[flat|nested] 173+ messages in thread

* Re: [dpdk-dev] [PATCH v4 1/2] lib/ring: apis to support configurable element size
  2019-10-18  8:04                         ` Jerin Jacob
@ 2019-10-18 16:11                           ` Jerin Jacob
  2019-10-21  0:27                             ` Honnappa Nagarahalli
  2019-10-18 16:44                           ` Ananyev, Konstantin
  1 sibling, 1 reply; 173+ messages in thread
From: Jerin Jacob @ 2019-10-18 16:11 UTC (permalink / raw)
  To: Honnappa Nagarahalli
  Cc: David Christensen, Ananyev, Konstantin, olivier.matz, sthemmin,
	jerinj, Richardson, Bruce, david.marchand, pbhagavatula, dev,
	Dharmik Thakkar, Ruifeng Wang (Arm Technology China),
	Gavin Hu (Arm Technology China),
	stephen, nd

On Fri, Oct 18, 2019 at 1:34 PM Jerin Jacob <jerinjacobk@gmail.com> wrote:
>
> On Fri, Oct 18, 2019 at 8:48 AM Honnappa Nagarahalli
> <Honnappa.Nagarahalli@arm.com> wrote:
> >
> > <snip>
> >
> > > Subject: Re: [PATCH v4 1/2] lib/ring: apis to support configurable element
> > > size
> > >
> > > >>> I tried this. On x86 (Xeon(R) Gold 6132 CPU @ 2.60GHz), the results
> > > >>> are as
> > > >> follows. The numbers in brackets are with the code on master.
> > > >>> gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0
> > > >>>
> > > >>> RTE>>ring_perf_elem_autotest
> > > >>> ### Testing single element and burst enq/deq ### SP/SC single
> > > >>> enq/dequeue: 5 MP/MC single enq/dequeue: 40 (35) SP/SC burst
> > > >>> enq/dequeue (size: 8): 2 MP/MC burst enq/dequeue (size: 8): 6 SP/SC
> > > >>> burst enq/dequeue (size: 32): 1 (2) MP/MC burst enq/dequeue (size:
> > > >>> 32): 2
> > > >>>
> > > >>> ### Testing empty dequeue ###
> > > >>> SC empty dequeue: 2.11
> > > >>> MC empty dequeue: 1.41 (2.11)
> > > >>>
> > > >>> ### Testing using a single lcore ### SP/SC bulk enq/dequeue (size:
> > > >>> 8): 2.15 (2.86) MP/MC bulk enq/dequeue
> > > >>> (size: 8): 6.35 (6.91) SP/SC bulk enq/dequeue (size: 32): 1.35
> > > >>> (2.06) MP/MC bulk enq/dequeue (size: 32): 2.38 (2.95)
> > > >>>
> > > >>> ### Testing using two physical cores ### SP/SC bulk enq/dequeue (size:
> > > >>> 8): 73.81 (15.33) MP/MC bulk enq/dequeue (size: 8): 75.10 (71.27)
> > > >>> SP/SC bulk enq/dequeue (size: 32): 21.14 (9.58) MP/MC bulk
> > > >>> enq/dequeue
> > > >>> (size: 32): 25.74 (20.91)
> > > >>>
> > > >>> ### Testing using two NUMA nodes ### SP/SC bulk enq/dequeue (size:
> > > >>> 8): 164.32 (50.66) MP/MC bulk enq/dequeue (size: 8): 176.02 (173.43)
> > > >>> SP/SC bulk enq/dequeue (size:
> > > >>> 32): 50.78 (23) MP/MC bulk enq/dequeue (size: 32): 63.17 (46.74)
> > > >>>
> > > >>> On one of the Arm platform
> > > >>> MP/MC bulk enq/dequeue (size: 32): 0.37 (0.33) (~12% hit, the rest
> > > >>> are
> > > >>> ok)
> > >
> > > Tried this on a Power9 platform (3.6GHz), with two numa nodes and 16
> > > cores/node (SMT=4).  Applied all 3 patches in v5, test results are as
> > > follows:
> > >
> > > RTE>>ring_perf_elem_autotest
> > > ### Testing single element and burst enq/deq ### SP/SC single enq/dequeue:
> > > 42 MP/MC single enq/dequeue: 59 SP/SC burst enq/dequeue (size: 8): 5
> > > MP/MC burst enq/dequeue (size: 8): 7 SP/SC burst enq/dequeue (size: 32): 2
> > > MP/MC burst enq/dequeue (size: 32): 2
> > >
> > > ### Testing empty dequeue ###
> > > SC empty dequeue: 7.81
> > > MC empty dequeue: 7.81
> > >
> > > ### Testing using a single lcore ###
> > > SP/SC bulk enq/dequeue (size: 8): 5.76
> > > MP/MC bulk enq/dequeue (size: 8): 7.66
> > > SP/SC bulk enq/dequeue (size: 32): 2.10
> > > MP/MC bulk enq/dequeue (size: 32): 2.57
> > >
> > > ### Testing using two hyperthreads ###
> > > SP/SC bulk enq/dequeue (size: 8): 13.13
> > > MP/MC bulk enq/dequeue (size: 8): 13.98
> > > SP/SC bulk enq/dequeue (size: 32): 3.41
> > > MP/MC bulk enq/dequeue (size: 32): 4.45
> > >
> > > ### Testing using two physical cores ### SP/SC bulk enq/dequeue (size: 8):
> > > 11.00 MP/MC bulk enq/dequeue (size: 8): 10.95 SP/SC bulk enq/dequeue
> > > (size: 32): 3.08 MP/MC bulk enq/dequeue (size: 32): 3.40
> > >
> > > ### Testing using two NUMA nodes ###
> > > SP/SC bulk enq/dequeue (size: 8): 63.41
> > > MP/MC bulk enq/dequeue (size: 8): 62.70
> > > SP/SC bulk enq/dequeue (size: 32): 15.39 MP/MC bulk enq/dequeue (size:
> > > 32): 22.96
> > >
> > Thanks for running this. There is another test 'ring_perf_autotest' which provides the numbers with the original implementation. The goal is to make sure the numbers with the original implementation are the same as these. Can you please run that as well?
>
> Honnappa,
>
> Your earlier perf report shows the cycles are in less than 1. That's
> is due to it is using 50 or 100MHz clock in EL0.
> Please check with PMU counter. See "ARM64 profiling" in
>
> http://doc.dpdk.org/guides/prog_guide/profile_app.html
>
>
> Here is the octeontx2 values. There is a regression in two core cases
> as you reported earlier in x86.
>
>
> RTE>>ring_perf_autotest
> ### Testing single element and burst enq/deq ###
> SP/SC single enq/dequeue: 288
> MP/MC single enq/dequeue: 452
> SP/SC burst enq/dequeue (size: 8): 39
> MP/MC burst enq/dequeue (size: 8): 61
> SP/SC burst enq/dequeue (size: 32): 13
> MP/MC burst enq/dequeue (size: 32): 21
>
> ### Testing empty dequeue ###
> SC empty dequeue: 6.33
> MC empty dequeue: 6.67
>
> ### Testing using a single lcore ###
> SP/SC bulk enq/dequeue (size: 8): 38.35
> MP/MC bulk enq/dequeue (size: 8): 67.36
> SP/SC bulk enq/dequeue (size: 32): 13.10
> MP/MC bulk enq/dequeue (size: 32): 21.64
>
> ### Testing using two physical cores ###
> SP/SC bulk enq/dequeue (size: 8): 75.94
> MP/MC bulk enq/dequeue (size: 8): 107.66
> SP/SC bulk enq/dequeue (size: 32): 24.51
> MP/MC bulk enq/dequeue (size: 32): 33.23
> Test OK
> RTE>>
>
> ---- after applying v5 of the patch ------
>
> RTE>>ring_perf_autotest
> ### Testing single element and burst enq/deq ###
> SP/SC single enq/dequeue: 289
> MP/MC single enq/dequeue: 452
> SP/SC burst enq/dequeue (size: 8): 40
> MP/MC burst enq/dequeue (size: 8): 64
> SP/SC burst enq/dequeue (size: 32): 13
> MP/MC burst enq/dequeue (size: 32): 22
>
> ### Testing empty dequeue ###
> SC empty dequeue: 6.33
> MC empty dequeue: 6.67
>
> ### Testing using a single lcore ###
> SP/SC bulk enq/dequeue (size: 8): 39.73
> MP/MC bulk enq/dequeue (size: 8): 69.13
> SP/SC bulk enq/dequeue (size: 32): 13.44
> MP/MC bulk enq/dequeue (size: 32): 22.00
>
> ### Testing using two physical cores ###
> SP/SC bulk enq/dequeue (size: 8): 76.02
> MP/MC bulk enq/dequeue (size: 8): 112.50
> SP/SC bulk enq/dequeue (size: 32): 24.71
> MP/MC bulk enq/dequeue (size: 32): 33.34
> Test OK
> RTE>>
>
> RTE>>ring_perf_elem_autotest
> ### Testing single element and burst enq/deq ###
> SP/SC single enq/dequeue: 290
> MP/MC single enq/dequeue: 503
> SP/SC burst enq/dequeue (size: 8): 39
> MP/MC burst enq/dequeue (size: 8): 63
> SP/SC burst enq/dequeue (size: 32): 11
> MP/MC burst enq/dequeue (size: 32): 19
>
> ### Testing empty dequeue ###
> SC empty dequeue: 6.33
> MC empty dequeue: 6.67
>
> ### Testing using a single lcore ###
> SP/SC bulk enq/dequeue (size: 8): 38.92
> MP/MC bulk enq/dequeue (size: 8): 62.54
> SP/SC bulk enq/dequeue (size: 32): 11.46
> MP/MC bulk enq/dequeue (size: 32): 19.89
>
> ### Testing using two physical cores ###
> SP/SC bulk enq/dequeue (size: 8): 87.55
> MP/MC bulk enq/dequeue (size: 8): 99.10
> SP/SC bulk enq/dequeue (size: 32): 26.63
> MP/MC bulk enq/dequeue (size: 32): 29.91
> Test OK
> RTE>>

It looks like removing 3/3 and keeping only 1/3 and 2/3 shows better
results in some cases.


RTE>>ring_perf_autotest
### Testing single element and burst enq/deq ###
SP/SC single enq/dequeue: 288
MP/MC single enq/dequeue: 439
SP/SC burst enq/dequeue (size: 8): 39
MP/MC burst enq/dequeue (size: 8): 61
SP/SC burst enq/dequeue (size: 32): 13
MP/MC burst enq/dequeue (size: 32): 22

### Testing empty dequeue ###
SC empty dequeue: 6.33
MC empty dequeue: 6.67

### Testing using a single lcore ###
SP/SC bulk enq/dequeue (size: 8): 38.35
MP/MC bulk enq/dequeue (size: 8): 67.48
SP/SC bulk enq/dequeue (size: 32): 13.40
MP/MC bulk enq/dequeue (size: 32): 22.03

### Testing using two physical cores ###
SP/SC bulk enq/dequeue (size: 8): 75.94
MP/MC bulk enq/dequeue (size: 8): 105.84
SP/SC bulk enq/dequeue (size: 32): 25.11
MP/MC bulk enq/dequeue (size: 32): 33.48
Test OK
RTE>>


RTE>>ring_perf_elem_autotest
### Testing single element and burst enq/deq ###
SP/SC single enq/dequeue: 288
MP/MC single enq/dequeue: 452
SP/SC burst enq/dequeue (size: 8): 39
MP/MC burst enq/dequeue (size: 8): 61
SP/SC burst enq/dequeue (size: 32): 13
MP/MC burst enq/dequeue (size: 32): 22

### Testing empty dequeue ###
SC empty dequeue: 6.33
MC empty dequeue: 6.00

### Testing using a single lcore ###
SP/SC bulk enq/dequeue (size: 8): 38.35
MP/MC bulk enq/dequeue (size: 8): 67.46
SP/SC bulk enq/dequeue (size: 32): 13.42
MP/MC bulk enq/dequeue (size: 32): 22.01

### Testing using two physical cores ###
SP/SC bulk enq/dequeue (size: 8): 76.04
MP/MC bulk enq/dequeue (size: 8): 104.88
SP/SC bulk enq/dequeue (size: 32): 24.75
MP/MC bulk enq/dequeue (size: 32): 34.66
Test OK
RTE>>


>
>
>
> > > Dave


* Re: [dpdk-dev] [PATCH v4 1/2] lib/ring: apis to support configurable element size
  2019-10-18  8:04                         ` Jerin Jacob
  2019-10-18 16:11                           ` Jerin Jacob
@ 2019-10-18 16:44                           ` Ananyev, Konstantin
  2019-10-18 19:03                             ` Honnappa Nagarahalli
  2019-10-21  0:36                             ` Honnappa Nagarahalli
  1 sibling, 2 replies; 173+ messages in thread
From: Ananyev, Konstantin @ 2019-10-18 16:44 UTC (permalink / raw)
  To: Jerin Jacob, Honnappa Nagarahalli
  Cc: David Christensen, olivier.matz, sthemmin, jerinj, Richardson,
	Bruce, david.marchand, pbhagavatula, dev, Dharmik Thakkar,
	Ruifeng Wang (Arm Technology China),
	Gavin Hu (Arm Technology China),
	stephen, nd


Hi everyone,


> > > >>> I tried this. On x86 (Xeon(R) Gold 6132 CPU @ 2.60GHz), the results
> > > >>> are as
> > > >> follows. The numbers in brackets are with the code on master.
> > > >>> gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0
> > > >>>
> > > >>> RTE>>ring_perf_elem_autotest
> > > >>> ### Testing single element and burst enq/deq ### SP/SC single
> > > >>> enq/dequeue: 5 MP/MC single enq/dequeue: 40 (35) SP/SC burst
> > > >>> enq/dequeue (size: 8): 2 MP/MC burst enq/dequeue (size: 8): 6 SP/SC
> > > >>> burst enq/dequeue (size: 32): 1 (2) MP/MC burst enq/dequeue (size:
> > > >>> 32): 2
> > > >>>
> > > >>> ### Testing empty dequeue ###
> > > >>> SC empty dequeue: 2.11
> > > >>> MC empty dequeue: 1.41 (2.11)
> > > >>>
> > > >>> ### Testing using a single lcore ### SP/SC bulk enq/dequeue (size:
> > > >>> 8): 2.15 (2.86) MP/MC bulk enq/dequeue
> > > >>> (size: 8): 6.35 (6.91) SP/SC bulk enq/dequeue (size: 32): 1.35
> > > >>> (2.06) MP/MC bulk enq/dequeue (size: 32): 2.38 (2.95)
> > > >>>
> > > >>> ### Testing using two physical cores ### SP/SC bulk enq/dequeue (size:
> > > >>> 8): 73.81 (15.33) MP/MC bulk enq/dequeue (size: 8): 75.10 (71.27)
> > > >>> SP/SC bulk enq/dequeue (size: 32): 21.14 (9.58) MP/MC bulk
> > > >>> enq/dequeue
> > > >>> (size: 32): 25.74 (20.91)
> > > >>>
> > > >>> ### Testing using two NUMA nodes ### SP/SC bulk enq/dequeue (size:
> > > >>> 8): 164.32 (50.66) MP/MC bulk enq/dequeue (size: 8): 176.02 (173.43)
> > > >>> SP/SC bulk enq/dequeue (size:
> > > >>> 32): 50.78 (23) MP/MC bulk enq/dequeue (size: 32): 63.17 (46.74)
> > > >>>
> > > >>> On one of the Arm platform
> > > >>> MP/MC bulk enq/dequeue (size: 32): 0.37 (0.33) (~12% hit, the rest
> > > >>> are
> > > >>> ok)
> > >
> > > Tried this on a Power9 platform (3.6GHz), with two numa nodes and 16
> > > cores/node (SMT=4).  Applied all 3 patches in v5, test results are as
> > > follows:
> > >
> > > RTE>>ring_perf_elem_autotest
> > > ### Testing single element and burst enq/deq ### SP/SC single enq/dequeue:
> > > 42 MP/MC single enq/dequeue: 59 SP/SC burst enq/dequeue (size: 8): 5
> > > MP/MC burst enq/dequeue (size: 8): 7 SP/SC burst enq/dequeue (size: 32): 2
> > > MP/MC burst enq/dequeue (size: 32): 2
> > >
> > > ### Testing empty dequeue ###
> > > SC empty dequeue: 7.81
> > > MC empty dequeue: 7.81
> > >
> > > ### Testing using a single lcore ###
> > > SP/SC bulk enq/dequeue (size: 8): 5.76
> > > MP/MC bulk enq/dequeue (size: 8): 7.66
> > > SP/SC bulk enq/dequeue (size: 32): 2.10
> > > MP/MC bulk enq/dequeue (size: 32): 2.57
> > >
> > > ### Testing using two hyperthreads ###
> > > SP/SC bulk enq/dequeue (size: 8): 13.13
> > > MP/MC bulk enq/dequeue (size: 8): 13.98
> > > SP/SC bulk enq/dequeue (size: 32): 3.41
> > > MP/MC bulk enq/dequeue (size: 32): 4.45
> > >
> > > ### Testing using two physical cores ### SP/SC bulk enq/dequeue (size: 8):
> > > 11.00 MP/MC bulk enq/dequeue (size: 8): 10.95 SP/SC bulk enq/dequeue
> > > (size: 32): 3.08 MP/MC bulk enq/dequeue (size: 32): 3.40
> > >
> > > ### Testing using two NUMA nodes ###
> > > SP/SC bulk enq/dequeue (size: 8): 63.41
> > > MP/MC bulk enq/dequeue (size: 8): 62.70
> > > SP/SC bulk enq/dequeue (size: 32): 15.39 MP/MC bulk enq/dequeue (size:
> > > 32): 22.96
> > >
> > Thanks for running this. There is another test 'ring_perf_autotest' which provides the numbers with the original implementation. The goal
> is to make sure the numbers with the original implementation are the same as these. Can you please run that as well?
> 
> Honnappa,
> 
> Your earlier perf report shows the cycles are in less than 1. That's
> is due to it is using 50 or 100MHz clock in EL0.
> Please check with PMU counter. See "ARM64 profiling" in
> 
> http://doc.dpdk.org/guides/prog_guide/profile_app.html
> 
> 
> Here is the octeontx2 values. There is a regression in two core cases
> as you reported earlier in x86.
> 
> 
> RTE>>ring_perf_autotest
> ### Testing single element and burst enq/deq ###
> SP/SC single enq/dequeue: 288
> MP/MC single enq/dequeue: 452
> SP/SC burst enq/dequeue (size: 8): 39
> MP/MC burst enq/dequeue (size: 8): 61
> SP/SC burst enq/dequeue (size: 32): 13
> MP/MC burst enq/dequeue (size: 32): 21
> 
> ### Testing empty dequeue ###
> SC empty dequeue: 6.33
> MC empty dequeue: 6.67
> 
> ### Testing using a single lcore ###
> SP/SC bulk enq/dequeue (size: 8): 38.35
> MP/MC bulk enq/dequeue (size: 8): 67.36
> SP/SC bulk enq/dequeue (size: 32): 13.10
> MP/MC bulk enq/dequeue (size: 32): 21.64
> 
> ### Testing using two physical cores ###
> SP/SC bulk enq/dequeue (size: 8): 75.94
> MP/MC bulk enq/dequeue (size: 8): 107.66
> SP/SC bulk enq/dequeue (size: 32): 24.51
> MP/MC bulk enq/dequeue (size: 32): 33.23
> Test OK
> RTE>>
> 
> ---- after applying v5 of the patch ------
> 
> RTE>>ring_perf_autotest
> ### Testing single element and burst enq/deq ###
> SP/SC single enq/dequeue: 289
> MP/MC single enq/dequeue: 452
> SP/SC burst enq/dequeue (size: 8): 40
> MP/MC burst enq/dequeue (size: 8): 64
> SP/SC burst enq/dequeue (size: 32): 13
> MP/MC burst enq/dequeue (size: 32): 22
> 
> ### Testing empty dequeue ###
> SC empty dequeue: 6.33
> MC empty dequeue: 6.67
> 
> ### Testing using a single lcore ###
> SP/SC bulk enq/dequeue (size: 8): 39.73
> MP/MC bulk enq/dequeue (size: 8): 69.13
> SP/SC bulk enq/dequeue (size: 32): 13.44
> MP/MC bulk enq/dequeue (size: 32): 22.00
> 
> ### Testing using two physical cores ###
> SP/SC bulk enq/dequeue (size: 8): 76.02
> MP/MC bulk enq/dequeue (size: 8): 112.50
> SP/SC bulk enq/dequeue (size: 32): 24.71
> MP/MC bulk enq/dequeue (size: 32): 33.34
> Test OK
> RTE>>
> 
> RTE>>ring_perf_elem_autotest
> ### Testing single element and burst enq/deq ###
> SP/SC single enq/dequeue: 290
> MP/MC single enq/dequeue: 503
> SP/SC burst enq/dequeue (size: 8): 39
> MP/MC burst enq/dequeue (size: 8): 63
> SP/SC burst enq/dequeue (size: 32): 11
> MP/MC burst enq/dequeue (size: 32): 19
> 
> ### Testing empty dequeue ###
> SC empty dequeue: 6.33
> MC empty dequeue: 6.67
> 
> ### Testing using a single lcore ###
> SP/SC bulk enq/dequeue (size: 8): 38.92
> MP/MC bulk enq/dequeue (size: 8): 62.54
> SP/SC bulk enq/dequeue (size: 32): 11.46
> MP/MC bulk enq/dequeue (size: 32): 19.89
> 
> ### Testing using two physical cores ###
> SP/SC bulk enq/dequeue (size: 8): 87.55
> MP/MC bulk enq/dequeue (size: 8): 99.10
> SP/SC bulk enq/dequeue (size: 32): 26.63
> MP/MC bulk enq/dequeue (size: 32): 29.91
> Test OK
> RTE>>
> 

As far as I can see, there is a copy&paste bug in patch #3
(that's probably why it produced some weird numbers for me at first).
With the fix applied (see patch below), things look pretty good on my box.
As far as I can see, only 3 results are noticeably lower:
   SP/SC (size=8) over 2 physical cores on the same NUMA socket
   MP/MC (size=8) over 2 physical cores on different NUMA sockets.
All others seem about the same or better.
Anyway, I went ahead and reworked the code a bit (as I suggested before)
to get rid of these huge ENQUEUE/DEQUEUE macros.
Results are very close to the fixed patch #3 version (patch is also attached).
Though I suggest people hold off on re-running perf tests until we make the ring
functional test run for the _elem_ functions too.
I started to work on that, but I'm not sure I'll finish today (most likely Monday).
Perf results from my box, plus patches, are below.
Konstantin

perf results
==========

Intel(R) Xeon(R) Platinum 8160 CPU @ 2.10GHz
  
A - ring_perf_autotest
B - ring_perf_elem_autotest + patch #3 + fix
C - B + update

### Testing using a single lcore ###	A	B	C
SP/SC bulk enq/dequeue (size: 8): 	4.06	3.06	3.22
MP/MC bulk enq/dequeue (size: 8): 	10.05	9.04	9.38
SP/SC bulk enq/dequeue (size: 32): 	2.93	1.91	1.84
MP/MC bulk enq/dequeue (size: 32): 	4.12	3.39	3.35

### Testing using two hyperthreads ###
SP/SC bulk enq/dequeue (size: 8): 	9.24	8.92	8.89
MP/MC bulk enq/dequeue (size: 8): 	15.47	15.39	16.02
SP/SC bulk enq/dequeue (size: 32): 	5.78	3.87	3.86
MP/MC bulk enq/dequeue (size: 32): 	6.41	4.57	4.45

### Testing using two physical cores ###
SP/SC bulk enq/dequeue (size: 8): 	24.14	29.89	27.05
MP/MC bulk enq/dequeue (size: 8): 	68.61	70.55	69.85
SP/SC bulk enq/dequeue (size: 32): 	12.11	12.99	13.04
MP/MC bulk enq/dequeue (size: 32): 	22.14	17.86	18.25

### Testing using two NUMA nodes ###
SP/SC bulk enq/dequeue (size: 8): 	48.78	31.98	33.57
MP/MC bulk enq/dequeue (size: 8): 	167.53	197.29	192.13
SP/SC bulk enq/dequeue (size: 32): 	31.28	21.68	21.61
MP/MC bulk enq/dequeue (size: 32): 	53.45	49.94	48.81
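
For reading the table, the relative change between two columns can be
computed with a small helper. This is only a sketch: pct_change is a
hypothetical name, and it assumes the values are per-element costs where
lower is better, so a positive result means a slowdown relative to the
baseline column.

```c
/* Hypothetical helper for reading the table above: percentage change of
 * `val` relative to `base`. Positive means `val` is larger than `base`. */
static double
pct_change(double base, double val)
{
	return (val - base) / base * 100.0;
}
```

With the numbers above, pct_change(24.14, 29.89) for SP/SC (size: 8) on two
physical cores is a bit under 24, and pct_change(48.78, 31.98) for SP/SC
(size: 8) on two NUMA nodes is negative, i.e. B is lower than A there.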
 
fix patch
=======
 
From a2be5a9b136333a56d466ef042c655e522ca7012 Mon Sep 17 00:00:00 2001
From: Konstantin Ananyev <konstantin.ananyev@intel.com>
Date: Fri, 18 Oct 2019 15:50:43 +0100
Subject: [PATCH] fix1

Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
---
 lib/librte_ring/rte_ring_elem.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/lib/librte_ring/rte_ring_elem.h b/lib/librte_ring/rte_ring_elem.h
index 92e92f150..5e1819069 100644
--- a/lib/librte_ring/rte_ring_elem.h
+++ b/lib/librte_ring/rte_ring_elem.h
@@ -118,7 +118,7 @@ struct rte_ring *rte_ring_create_elem(const char *name, unsigned count,
        uint32_t sz = n * (esize / sizeof(uint32_t)); \
        if (likely(idx + n < size)) { \
                for (i = 0; i < (sz & ((~(unsigned)0x7))); i += 8, idx += 8) { \
-                       memcpy (ring + i, obj + i, 8 * sizeof (uint32_t)); \
+                       memcpy (ring + idx, obj + i, 8 * sizeof (uint32_t)); \
                } \
                switch (n & 0x7) { \
                case 7: \
@@ -153,7 +153,7 @@ struct rte_ring *rte_ring_create_elem(const char *name, unsigned count,
        uint32_t sz = n * (esize / sizeof(uint32_t)); \
        if (likely(idx + n < size)) { \
                for (i = 0; i < (sz & ((~(unsigned)0x7))); i += 8, idx += 8) { \
-                       memcpy (obj + i, ring + i, 8 * sizeof (uint32_t)); \
+                       memcpy (obj + i, ring + idx, 8 * sizeof (uint32_t)); \
                } \
                switch (n & 0x7) { \
                case 7: \
--
2.17.1
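
The indexing issue addressed by the fix can be reproduced outside DPDK with a
minimal sketch. copy_blocks is a hypothetical stand-in for the bulk-copy loop
of the enqueue macro, using flat arrays instead of the ring structure and
ignoring wrap-around; it is an illustration, not the code from the patch.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the bulk-copy loop of the enqueue macro:
 * copy `sz` 32-bit words from obj[] into ring[] starting at ring offset
 * `idx`, eight words at a time. Only the part of `sz` that is a multiple
 * of 8 is copied here, and there is no wrap-around handling. */
static void
copy_blocks(uint32_t *ring, uint32_t idx, const uint32_t *obj, uint32_t sz)
{
	uint32_t i;

	/* i indexes the source table, idx indexes the ring. The destination
	 * must use idx; using i instead would write the data from the start
	 * of the ring storage regardless of where the producer head is. */
	for (i = 0; i < (sz & ~(uint32_t)0x7); i += 8, idx += 8)
		memcpy(ring + idx, obj + i, 8 * sizeof(uint32_t));
}
```

With a ring offset of 8 and 16 source words, the data ends up in ring[8] to
ring[23] and the first eight ring words stay untouched.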

update patch (remove macros)
=========================

From 18b388e877b97e243f807f27a323e876b30869dd Mon Sep 17 00:00:00 2001
From: Konstantin Ananyev <konstantin.ananyev@intel.com>
Date: Fri, 18 Oct 2019 17:35:43 +0100
Subject: [PATCH] update1

Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
---
 lib/librte_ring/rte_ring_elem.h | 141 ++++++++++++++++----------------
 1 file changed, 70 insertions(+), 71 deletions(-)

diff --git a/lib/librte_ring/rte_ring_elem.h b/lib/librte_ring/rte_ring_elem.h
index 5e1819069..eb706b12f 100644
--- a/lib/librte_ring/rte_ring_elem.h
+++ b/lib/librte_ring/rte_ring_elem.h
@@ -109,75 +109,74 @@ __rte_experimental
 struct rte_ring *rte_ring_create_elem(const char *name, unsigned count,
                                unsigned esize, int socket_id, unsigned flags);

-#define ENQUEUE_PTRS_GEN(r, ring_start, prod_head, obj_table, esize, n) do { \
-       unsigned int i; \
-       const uint32_t size = (r)->size; \
-       uint32_t idx = prod_head & (r)->mask; \
-       uint32_t *ring = (uint32_t *)ring_start; \
-       uint32_t *obj = (uint32_t *)obj_table; \
-       uint32_t sz = n * (esize / sizeof(uint32_t)); \
-       if (likely(idx + n < size)) { \
-               for (i = 0; i < (sz & ((~(unsigned)0x7))); i += 8, idx += 8) { \
-                       memcpy (ring + idx, obj + i, 8 * sizeof (uint32_t)); \
-               } \
-               switch (n & 0x7) { \
-               case 7: \
-                       ring[idx++] = obj[i++]; /* fallthrough */ \
-               case 6: \
-                       ring[idx++] = obj[i++]; /* fallthrough */ \
-               case 5: \
-                       ring[idx++] = obj[i++]; /* fallthrough */ \
-               case 4: \
-                       ring[idx++] = obj[i++]; /* fallthrough */ \
-               case 3: \
-                       ring[idx++] = obj[i++]; /* fallthrough */ \
-               case 2: \
-                       ring[idx++] = obj[i++]; /* fallthrough */ \
-               case 1: \
-                       ring[idx++] = obj[i++]; /* fallthrough */ \
-               } \
-       } else { \
-               for (i = 0; idx < size; i++, idx++)\
-                       ring[idx] = obj[i]; \
-               for (idx = 0; i < n; i++, idx++) \
-                       ring[idx] = obj[i]; \
-       } \
-} while (0)
-
-#define DEQUEUE_PTRS_GEN(r, ring_start, cons_head, obj_table, esize, n) do { \
-       unsigned int i; \
-       uint32_t idx = cons_head & (r)->mask; \
-       const uint32_t size = (r)->size; \
-       uint32_t *ring = (uint32_t *)ring_start; \
-       uint32_t *obj = (uint32_t *)obj_table; \
-       uint32_t sz = n * (esize / sizeof(uint32_t)); \
-       if (likely(idx + n < size)) { \
-               for (i = 0; i < (sz & ((~(unsigned)0x7))); i += 8, idx += 8) { \
-                       memcpy (obj + i, ring + idx, 8 * sizeof (uint32_t)); \
-               } \
-               switch (n & 0x7) { \
-               case 7: \
-                       obj[i++] = ring[idx++]; /* fallthrough */ \
-               case 6: \
-                       obj[i++] = ring[idx++]; /* fallthrough */ \
-               case 5: \
-                       obj[i++] = ring[idx++]; /* fallthrough */ \
-               case 4: \
-                       obj[i++] = ring[idx++]; /* fallthrough */ \
-               case 3: \
-                       obj[i++] = ring[idx++]; /* fallthrough */ \
-               case 2: \
-                       obj[i++] = ring[idx++]; /* fallthrough */ \
-               case 1: \
-                       obj[i++] = ring[idx++]; /* fallthrough */ \
-               } \
-       } else { \
-               for (i = 0; idx < size; i++, idx++) \
-                       obj[i] = ring[idx]; \
-               for (idx = 0; i < n; i++, idx++) \
-                       obj[i] = ring[idx]; \
-       } \
-} while (0)
+static __rte_always_inline void
+copy_elems(uint32_t du32[], const uint32_t su32[], uint32_t num, uint32_t esize)
+{
+       uint32_t i, sz;
+
+       sz = (num * esize) / sizeof(uint32_t);
+
+       for (i = 0; i < (sz & ~7); i += 8)
+               memcpy(du32 + i, su32 + i, 8 * sizeof(uint32_t));
+
+       switch (sz & 7) {
+       case 7: du32[sz - 7] = su32[sz - 7]; /* fallthrough */
+       case 6: du32[sz - 6] = su32[sz - 6]; /* fallthrough */
+       case 5: du32[sz - 5] = su32[sz - 5]; /* fallthrough */
+       case 4: du32[sz - 4] = su32[sz - 4]; /* fallthrough */
+       case 3: du32[sz - 3] = su32[sz - 3]; /* fallthrough */
+       case 2: du32[sz - 2] = su32[sz - 2]; /* fallthrough */
+       case 1: du32[sz - 1] = su32[sz - 1]; /* fallthrough */
+       }
+}
+
+static __rte_always_inline void
+enqueue_elems(struct rte_ring *r, void *ring_start, uint32_t prod_head,
+               void *obj_table, uint32_t num, uint32_t esize)
+{
+       uint32_t idx, n;
+       uint32_t *du32;
+       const uint32_t *su32;
+
+       const uint32_t size = r->size;
+
+       idx = prod_head & (r)->mask;
+
+       du32 = (uint32_t *)ring_start + idx;
+       su32 = obj_table;
+
+       if (idx + num < size)
+               copy_elems(du32, su32, num, esize);
+       else {
+               n = size - idx;
+               copy_elems(du32, su32, n, esize);
+               copy_elems(ring_start, su32 + n, num - n, esize);
+       }
+}
+
+static __rte_always_inline void
+dequeue_elems(struct rte_ring *r, void *ring_start, uint32_t cons_head,
+               void *obj_table, uint32_t num, uint32_t esize)
+{
+       uint32_t idx, n;
+       uint32_t *du32;
+       const uint32_t *su32;
+
+       const uint32_t size = r->size;
+
+       idx = cons_head & (r)->mask;
+
+       su32 = (uint32_t *)ring_start + idx;
+       du32 = obj_table;
+
+       if (idx + num < size)
+               copy_elems(du32, su32, num, esize);
+       else {
+               n = size - idx;
+               copy_elems(du32, su32, n, esize);
+               copy_elems(du32 + n, ring_start, num - n, esize);
+       }
+}

 /* Between load and load. there might be cpu reorder in weak model
  * (powerpc/arm).
@@ -232,7 +231,7 @@ __rte_ring_do_enqueue_elem(struct rte_ring *r, void * const obj_table,
        if (n == 0)
                goto end;

-       ENQUEUE_PTRS_GEN(r, &r[1], prod_head, obj_table, esize, n);
+       enqueue_elems(r, &r[1], prod_head, obj_table, n, esize);

        update_tail(&r->prod, prod_head, prod_next, is_sp, 1);
 end:
@@ -279,7 +278,7 @@ __rte_ring_do_dequeue_elem(struct rte_ring *r, void *obj_table,
        if (n == 0)
                goto end;

-       DEQUEUE_PTRS_GEN(r, &r[1], cons_head, obj_table, esize, n);
+       dequeue_elems(r, &r[1], cons_head, obj_table, n, esize);

        update_tail(&r->cons, cons_head, cons_next, is_sc, 0);

--
2.17.1
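
The wrap-around split done by the helpers above can be sketched standalone
against a flat array rather than struct rte_ring. This is an illustration
under assumptions, not the patch itself: ring_copy_in is a hypothetical name,
plain memcpy replaces the unrolled copy, and element indices are scaled to
32-bit word offsets explicitly, which matters once esize is larger than 4
bytes.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: copy `num` elements of `esize` bytes each (esize is
 * assumed to be a multiple of 4) into a ring holding `size` elements,
 * starting at element index `idx` and wrapping to the start of the ring
 * storage when the end is reached. Assumes idx < size and num <= size. */
static void
ring_copy_in(uint32_t *ring, uint32_t size, uint32_t idx,
		const uint32_t *obj, uint32_t num, uint32_t esize)
{
	const uint32_t w = esize / sizeof(uint32_t); /* words per element */

	if (idx + num <= size) {
		/* Everything fits before the end of the ring. */
		memcpy(ring + idx * w, obj, (size_t)num * esize);
	} else {
		/* n elements fit before the end, the rest go to the start. */
		uint32_t n = size - idx;

		memcpy(ring + idx * w, obj, (size_t)n * esize);
		memcpy(ring, obj + n * w, (size_t)(num - n) * esize);
	}
}
```

For example, with size = 8, esize = 8, idx = 6 and num = 4, the first two
elements land in the last two ring slots and the remaining two wrap to slots
0 and 1.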




* Re: [dpdk-dev] [PATCH v4 1/2] lib/ring: apis to support configurable element size
  2019-10-18  3:18                       ` Honnappa Nagarahalli
  2019-10-18  8:04                         ` Jerin Jacob
@ 2019-10-18 17:23                         ` David Christensen
  1 sibling, 0 replies; 173+ messages in thread
From: David Christensen @ 2019-10-18 17:23 UTC (permalink / raw)
  To: Honnappa Nagarahalli, Ananyev, Konstantin, olivier.matz,
	sthemmin, jerinj, Richardson, Bruce, david.marchand,
	pbhagavatula
  Cc: dev, Dharmik Thakkar, Ruifeng Wang (Arm Technology China),
	Gavin Hu (Arm Technology China),
	stephen, nd


>> Tried this on a Power9 platform (3.6GHz), with two numa nodes and 16
>> cores/node (SMT=4).  Applied all 3 patches in v5, test results are as
>> follows:
>>
>> RTE>>ring_perf_elem_autotest
>> ### Testing single element and burst enq/deq ###
>> SP/SC single enq/dequeue: 42
>> MP/MC single enq/dequeue: 59
>> SP/SC burst enq/dequeue (size: 8): 5
>> MP/MC burst enq/dequeue (size: 8): 7
>> SP/SC burst enq/dequeue (size: 32): 2
>> MP/MC burst enq/dequeue (size: 32): 2
>>
>> ### Testing empty dequeue ###
>> SC empty dequeue: 7.81
>> MC empty dequeue: 7.81
>>
>> ### Testing using a single lcore ###
>> SP/SC bulk enq/dequeue (size: 8): 5.76
>> MP/MC bulk enq/dequeue (size: 8): 7.66
>> SP/SC bulk enq/dequeue (size: 32): 2.10
>> MP/MC bulk enq/dequeue (size: 32): 2.57
>>
>> ### Testing using two hyperthreads ###
>> SP/SC bulk enq/dequeue (size: 8): 13.13
>> MP/MC bulk enq/dequeue (size: 8): 13.98
>> SP/SC bulk enq/dequeue (size: 32): 3.41
>> MP/MC bulk enq/dequeue (size: 32): 4.45
>>
>> ### Testing using two physical cores ###
>> SP/SC bulk enq/dequeue (size: 8): 11.00
>> MP/MC bulk enq/dequeue (size: 8): 10.95
>> SP/SC bulk enq/dequeue (size: 32): 3.08
>> MP/MC bulk enq/dequeue (size: 32): 3.40
>>
>> ### Testing using two NUMA nodes ###
>> SP/SC bulk enq/dequeue (size: 8): 63.41
>> MP/MC bulk enq/dequeue (size: 8): 62.70
>> SP/SC bulk enq/dequeue (size: 32): 15.39
>> MP/MC bulk enq/dequeue (size: 32): 22.96
>>
> Thanks for running this. There is another test 'ring_perf_autotest' which provides the numbers with the original implementation. The goal is to make sure the numbers with the original implementation are the same as these. Can you please run that as well?
> 
RTE>>ring_perf_autotest
### Testing single element and burst enq/deq ###
SP/SC single enq/dequeue: 42
MP/MC single enq/dequeue: 59
SP/SC burst enq/dequeue (size: 8): 6
MP/MC burst enq/dequeue (size: 8): 8
SP/SC burst enq/dequeue (size: 32): 2
MP/MC burst enq/dequeue (size: 32): 3

### Testing empty dequeue ###
SC empty dequeue: 7.81
MC empty dequeue: 7.81

### Testing using a single lcore ###
SP/SC bulk enq/dequeue (size: 8): 6.91
MP/MC bulk enq/dequeue (size: 8): 8.87
SP/SC bulk enq/dequeue (size: 32): 2.55
MP/MC bulk enq/dequeue (size: 32): 3.04

### Testing using two hyperthreads ###
SP/SC bulk enq/dequeue (size: 8): 11.70
MP/MC bulk enq/dequeue (size: 8): 13.56
SP/SC bulk enq/dequeue (size: 32): 3.48
MP/MC bulk enq/dequeue (size: 32): 3.95

### Testing using two physical cores ###
SP/SC bulk enq/dequeue (size: 8): 10.86
MP/MC bulk enq/dequeue (size: 8): 11.11
SP/SC bulk enq/dequeue (size: 32): 2.97
MP/MC bulk enq/dequeue (size: 32): 3.43

### Testing using two NUMA nodes ###
SP/SC bulk enq/dequeue (size: 8): 48.07
MP/MC bulk enq/dequeue (size: 8): 67.38
SP/SC bulk enq/dequeue (size: 32): 13.04
MP/MC bulk enq/dequeue (size: 32): 27.10
Test OK

Dave
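
A small helper can make the comparison between the two Power9 runs above
easier to read. This is only a sketch: the function name pct_change is
hypothetical and not part of the DPDK test suite. It computes the relative
change in cycle count of one run against a baseline run.

```c
#include <assert.h>

/* Relative change of `candidate` versus `baseline`, in percent.
 * A positive result means the candidate needs more cycles (slower),
 * a negative result means it needs fewer cycles (faster).
 */
static double
pct_change(double baseline, double candidate)
{
	assert(baseline > 0.0);
	return (candidate - baseline) / baseline * 100.0;
}
```

For the two-NUMA-node case, pct_change(48.07, 63.41) gives roughly +31.9%
for SP/SC bulk (size: 8), while pct_change(67.38, 62.70) gives roughly
-6.9% for MP/MC bulk (size: 8), taking ring_perf_autotest as the baseline
and ring_perf_elem_autotest as the candidate.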

^ permalink raw reply	[flat|nested] 173+ messages in thread

* Re: [dpdk-dev] [PATCH v4 1/2] lib/ring: apis to support configurable element size
  2019-10-18 16:44                           ` Ananyev, Konstantin
@ 2019-10-18 19:03                             ` Honnappa Nagarahalli
  2019-10-21  0:36                             ` Honnappa Nagarahalli
  1 sibling, 0 replies; 173+ messages in thread
From: Honnappa Nagarahalli @ 2019-10-18 19:03 UTC (permalink / raw)
  To: Ananyev, Konstantin, Jerin Jacob
  Cc: David Christensen, olivier.matz, sthemmin, jerinj, Richardson,
	Bruce, david.marchand, pbhagavatula, dev, Dharmik Thakkar,
	Ruifeng Wang (Arm Technology China),
	Gavin Hu (Arm Technology China),
	stephen, nd, Honnappa Nagarahalli, nd

<snip>

> Subject: RE: [dpdk-dev] [PATCH v4 1/2] lib/ring: apis to support configurable
> element size
> 
> 
> Hi everyone,
> 
> 
> > > > >>> I tried this. On x86 (Xeon(R) Gold 6132 CPU @ 2.60GHz), the
> > > > >>> results are as follows. The numbers in brackets are with the code
> > > > >>> on master.
> > > > >>> gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0
> > > > >>>
> > > > >>> RTE>>ring_perf_elem_autotest
> > > > >>> ### Testing single element and burst enq/deq ###
> > > > >>> SP/SC single enq/dequeue: 5
> > > > >>> MP/MC single enq/dequeue: 40 (35)
> > > > >>> SP/SC burst enq/dequeue (size: 8): 2
> > > > >>> MP/MC burst enq/dequeue (size: 8): 6
> > > > >>> SP/SC burst enq/dequeue (size: 32): 1 (2)
> > > > >>> MP/MC burst enq/dequeue (size: 32): 2
> > > > >>>
> > > > >>> ### Testing empty dequeue ###
> > > > >>> SC empty dequeue: 2.11
> > > > >>> MC empty dequeue: 1.41 (2.11)
> > > > >>>
> > > > >>> ### Testing using a single lcore ###
> > > > >>> SP/SC bulk enq/dequeue (size: 8): 2.15 (2.86)
> > > > >>> MP/MC bulk enq/dequeue (size: 8): 6.35 (6.91)
> > > > >>> SP/SC bulk enq/dequeue (size: 32): 1.35 (2.06)
> > > > >>> MP/MC bulk enq/dequeue (size: 32): 2.38 (2.95)
> > > > >>>
> > > > >>> ### Testing using two physical cores ###
> > > > >>> SP/SC bulk enq/dequeue (size: 8): 73.81 (15.33)
> > > > >>> MP/MC bulk enq/dequeue (size: 8): 75.10 (71.27)
> > > > >>> SP/SC bulk enq/dequeue (size: 32): 21.14 (9.58)
> > > > >>> MP/MC bulk enq/dequeue (size: 32): 25.74 (20.91)
> > > > >>>
> > > > >>> ### Testing using two NUMA nodes ###
> > > > >>> SP/SC bulk enq/dequeue (size: 8): 164.32 (50.66)
> > > > >>> MP/MC bulk enq/dequeue (size: 8): 176.02 (173.43)
> > > > >>> SP/SC bulk enq/dequeue (size: 32): 50.78 (23)
> > > > >>> MP/MC bulk enq/dequeue (size: 32): 63.17 (46.74)
> > > > >>>
> > > > >>> On one of the Arm platforms:
> > > > >>> MP/MC bulk enq/dequeue (size: 32): 0.37 (0.33) (~12% hit, the
> > > > >>> rest are ok)
> > > >
> > > > Tried this on a Power9 platform (3.6GHz), with two numa nodes and
> > > > 16 cores/node (SMT=4).  Applied all 3 patches in v5, test results
> > > > are as follows:
> > > >
> > > > RTE>>ring_perf_elem_autotest
> > > > ### Testing single element and burst enq/deq ###
> > > > SP/SC single enq/dequeue: 42
> > > > MP/MC single enq/dequeue: 59
> > > > SP/SC burst enq/dequeue (size: 8): 5
> > > > MP/MC burst enq/dequeue (size: 8): 7
> > > > SP/SC burst enq/dequeue (size: 32): 2
> > > > MP/MC burst enq/dequeue (size: 32): 2
> > > >
> > > > ### Testing empty dequeue ###
> > > > SC empty dequeue: 7.81
> > > > MC empty dequeue: 7.81
> > > >
> > > > ### Testing using a single lcore ###
> > > > SP/SC bulk enq/dequeue (size: 8): 5.76
> > > > MP/MC bulk enq/dequeue (size: 8): 7.66
> > > > SP/SC bulk enq/dequeue (size: 32): 2.10
> > > > MP/MC bulk enq/dequeue (size: 32): 2.57
> > > >
> > > > ### Testing using two hyperthreads ###
> > > > SP/SC bulk enq/dequeue (size: 8): 13.13
> > > > MP/MC bulk enq/dequeue (size: 8): 13.98
> > > > SP/SC bulk enq/dequeue (size: 32): 3.41
> > > > MP/MC bulk enq/dequeue (size: 32): 4.45
> > > >
> > > > ### Testing using two physical cores ###
> > > > SP/SC bulk enq/dequeue (size: 8): 11.00
> > > > MP/MC bulk enq/dequeue (size: 8): 10.95
> > > > SP/SC bulk enq/dequeue (size: 32): 3.08
> > > > MP/MC bulk enq/dequeue (size: 32): 3.40
> > > >
> > > > ### Testing using two NUMA nodes ###
> > > > SP/SC bulk enq/dequeue (size: 8): 63.41
> > > > MP/MC bulk enq/dequeue (size: 8): 62.70
> > > > SP/SC bulk enq/dequeue (size: 32): 15.39
> > > > MP/MC bulk enq/dequeue (size: 32): 22.96
> > > >
> > > Thanks for running this. There is another test 'ring_perf_autotest'
> > > which provides the numbers with the original implementation. The
> > > goal is to make sure the numbers with the original implementation
> > > are the same as these. Can you please run that as well?
> >
> > Honnappa,
> >
> > Your earlier perf report shows cycle counts of less than 1. That
> > is because it is using a 50 or 100MHz clock in EL0.
> > Please check with the PMU counter. See "ARM64 profiling" in
> >
> > http://doc.dpdk.org/guides/prog_guide/profile_app.html
> >
> >
> > Here are the octeontx2 values. There is a regression in the two-core
> > cases, as you reported earlier on x86.
> >
> >
> > RTE>>ring_perf_autotest
> > ### Testing single element and burst enq/deq ###
> > SP/SC single enq/dequeue: 288
> > MP/MC single enq/dequeue: 452
> > SP/SC burst enq/dequeue (size: 8): 39
> > MP/MC burst enq/dequeue (size: 8): 61
> > SP/SC burst enq/dequeue (size: 32): 13
> > MP/MC burst enq/dequeue (size: 32): 21
> >
> > ### Testing empty dequeue ###
> > SC empty dequeue: 6.33
> > MC empty dequeue: 6.67
> >
> > ### Testing using a single lcore ###
> > SP/SC bulk enq/dequeue (size: 8): 38.35
> > MP/MC bulk enq/dequeue (size: 8): 67.36
> > SP/SC bulk enq/dequeue (size: 32): 13.10
> > MP/MC bulk enq/dequeue (size: 32): 21.64
> >
> > ### Testing using two physical cores ###
> > SP/SC bulk enq/dequeue (size: 8): 75.94
> > MP/MC bulk enq/dequeue (size: 8): 107.66
> > SP/SC bulk enq/dequeue (size: 32): 24.51
> > MP/MC bulk enq/dequeue (size: 32): 33.23
> > Test OK
> > RTE>>
> >
> > ---- after applying v5 of the patch ------
> >
> > RTE>>ring_perf_autotest
> > ### Testing single element and burst enq/deq ###
> > SP/SC single enq/dequeue: 289
> > MP/MC single enq/dequeue: 452
> > SP/SC burst enq/dequeue (size: 8): 40
> > MP/MC burst enq/dequeue (size: 8): 64
> > SP/SC burst enq/dequeue (size: 32): 13
> > MP/MC burst enq/dequeue (size: 32): 22
> >
> > ### Testing empty dequeue ###
> > SC empty dequeue: 6.33
> > MC empty dequeue: 6.67
> >
> > ### Testing using a single lcore ###
> > SP/SC bulk enq/dequeue (size: 8): 39.73
> > MP/MC bulk enq/dequeue (size: 8): 69.13
> > SP/SC bulk enq/dequeue (size: 32): 13.44
> > MP/MC bulk enq/dequeue (size: 32): 22.00
> >
> > ### Testing using two physical cores ###
> > SP/SC bulk enq/dequeue (size: 8): 76.02
> > MP/MC bulk enq/dequeue (size: 8): 112.50
> > SP/SC bulk enq/dequeue (size: 32): 24.71
> > MP/MC bulk enq/dequeue (size: 32): 33.34
> > Test OK
> > RTE>>
> >
> > RTE>>ring_perf_elem_autotest
> > ### Testing single element and burst enq/deq ###
> > SP/SC single enq/dequeue: 290
> > MP/MC single enq/dequeue: 503
> > SP/SC burst enq/dequeue (size: 8): 39
> > MP/MC burst enq/dequeue (size: 8): 63
> > SP/SC burst enq/dequeue (size: 32): 11
> > MP/MC burst enq/dequeue (size: 32): 19
> >
> > ### Testing empty dequeue ###
> > SC empty dequeue: 6.33
> > MC empty dequeue: 6.67
> >
> > ### Testing using a single lcore ###
> > SP/SC bulk enq/dequeue (size: 8): 38.92
> > MP/MC bulk enq/dequeue (size: 8): 62.54
> > SP/SC bulk enq/dequeue (size: 32): 11.46
> > MP/MC bulk enq/dequeue (size: 32): 19.89
> >
> > ### Testing using two physical cores ###
> > SP/SC bulk enq/dequeue (size: 8): 87.55
> > MP/MC bulk enq/dequeue (size: 8): 99.10
> > SP/SC bulk enq/dequeue (size: 32): 26.63
> > MP/MC bulk enq/dequeue (size: 32): 29.91
> > Test OK
> > RTE>>
> >
> 
> As far as I can see, there is a copy&paste bug in patch #3 (that is probably
> why it produced some weird numbers for me at first).
Apologies for this. In hindsight, I should have added the unit tests.

> With the fix applied (see patch below), things look pretty good on my box.
> As far as I can see, there are only 3 results noticeably lower:
>    SP/SC (size=8) over 2 physical cores same numa socket
>    MP/MC (size=8) over 2 physical cores on different numa sockets.
Is this ok for you?

> All others seem about the same or better.
> Anyway, I went ahead and reworked the code a bit (as I suggested before) to
> get rid of these huge ENQUEUE/DEQUEUE macros.
> Results are very close to the fixed patch #3 version (patch is also attached).
> Though I suggest people hold off on re-running perf tests until we make the
> ring functional test run for the _elem_ functions too.
> I started to work on that, but I'm not sure I'll finish today (most likely Monday).
> Perf results from my box, plus patches below.
> Konstantin
> 
> perf results
> ==========
> 
> Intel(R) Xeon(R) Platinum 8160 CPU @ 2.10GHz
> 
> A - ring_perf_autotest
> B - ring_perf_elem_autotest + patch #3 + fix
> C - B + update
> 
> ### Testing using a single lcore ###	A	B	C
> SP/SC bulk enq/dequeue (size: 8): 	4.06	3.06	3.22
> MP/MC bulk enq/dequeue (size: 8): 	10.05	9.04	9.38
> SP/SC bulk enq/dequeue (size: 32): 	2.93	1.91	1.84
> MP/MC bulk enq/dequeue (size: 32): 	4.12	3.39	3.35
> 
> ### Testing using two hyperthreads ###
> SP/SC bulk enq/dequeue (size: 8): 	9.24	8.92	8.89
> MP/MC bulk enq/dequeue (size: 8): 	15.47	15.39	16.02
> SP/SC bulk enq/dequeue (size: 32): 	5.78	3.87	3.86
> MP/MC bulk enq/dequeue (size: 32): 	6.41	4.57	4.45
> 
> ### Testing using two physical cores ###
> SP/SC bulk enq/dequeue (size: 8): 	24.14	29.89	27.05
> MP/MC bulk enq/dequeue (size: 8): 	68.61	70.55	69.85
> SP/SC bulk enq/dequeue (size: 32): 	12.11	12.99	13.04
> MP/MC bulk enq/dequeue (size: 32): 	22.14	17.86	18.25
> 
> ### Testing using two NUMA nodes ###
> SP/SC bulk enq/dequeue (size: 8): 	48.78	31.98	33.57
> MP/MC bulk enq/dequeue (size: 8): 	167.53	197.29	192.13
> SP/SC bulk enq/dequeue (size: 32): 	31.28	21.68	21.61
> MP/MC bulk enq/dequeue (size: 32): 	53.45	49.94	48.81
> 
> fix patch
> =======
> 
> From a2be5a9b136333a56d466ef042c655e522ca7012 Mon Sep 17 00:00:00
> 2001
> From: Konstantin Ananyev <konstantin.ananyev@intel.com>
> Date: Fri, 18 Oct 2019 15:50:43 +0100
> Subject: [PATCH] fix1
> 
> Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
> ---
>  lib/librte_ring/rte_ring_elem.h | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/lib/librte_ring/rte_ring_elem.h b/lib/librte_ring/rte_ring_elem.h
> index 92e92f150..5e1819069 100644
> --- a/lib/librte_ring/rte_ring_elem.h
> +++ b/lib/librte_ring/rte_ring_elem.h
> @@ -118,7 +118,7 @@ struct rte_ring *rte_ring_create_elem(const char
> *name, unsigned count,
>         uint32_t sz = n * (esize / sizeof(uint32_t)); \
>         if (likely(idx + n < size)) { \
>                 for (i = 0; i < (sz & ((~(unsigned)0x7))); i += 8, idx += 8) { \
> -                       memcpy (ring + i, obj + i, 8 * sizeof (uint32_t)); \
> +                       memcpy (ring + idx, obj + i, 8 * sizeof
> + (uint32_t)); \
>                 } \
>                 switch (n & 0x7) { \
>                 case 7: \
> @@ -153,7 +153,7 @@ struct rte_ring *rte_ring_create_elem(const char
> *name, unsigned count,
>         uint32_t sz = n * (esize / sizeof(uint32_t)); \
>         if (likely(idx + n < size)) { \
>                 for (i = 0; i < (sz & ((~(unsigned)0x7))); i += 8, idx += 8) { \
> -                       memcpy (obj + i, ring + i, 8 * sizeof (uint32_t)); \
> +                       memcpy (obj + i, ring + idx, 8 * sizeof
> + (uint32_t)); \
>                 } \
>                 switch (n & 0x7) { \
>                 case 7: \
> --
> 2.17.1
> 
> update patch (remove macros)
> =========================
> 
> From 18b388e877b97e243f807f27a323e876b30869dd Mon Sep 17 00:00:00
> 2001
> From: Konstantin Ananyev <konstantin.ananyev@intel.com>
> Date: Fri, 18 Oct 2019 17:35:43 +0100
> Subject: [PATCH] update1
> 
> Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
> ---
>  lib/librte_ring/rte_ring_elem.h | 141 ++++++++++++++++----------------
>  1 file changed, 70 insertions(+), 71 deletions(-)
> 
> diff --git a/lib/librte_ring/rte_ring_elem.h b/lib/librte_ring/rte_ring_elem.h
> index 5e1819069..eb706b12f 100644
> --- a/lib/librte_ring/rte_ring_elem.h
> +++ b/lib/librte_ring/rte_ring_elem.h
> @@ -109,75 +109,74 @@ __rte_experimental  struct rte_ring
> *rte_ring_create_elem(const char *name, unsigned count,
>                                 unsigned esize, int socket_id, unsigned flags);
> 
> -#define ENQUEUE_PTRS_GEN(r, ring_start, prod_head, obj_table, esize, n)
> do { \
> -       unsigned int i; \
> -       const uint32_t size = (r)->size; \
> -       uint32_t idx = prod_head & (r)->mask; \
> -       uint32_t *ring = (uint32_t *)ring_start; \
> -       uint32_t *obj = (uint32_t *)obj_table; \
> -       uint32_t sz = n * (esize / sizeof(uint32_t)); \
> -       if (likely(idx + n < size)) { \
> -               for (i = 0; i < (sz & ((~(unsigned)0x7))); i += 8, idx += 8) { \
> -                       memcpy (ring + idx, obj + i, 8 * sizeof (uint32_t)); \
> -               } \
> -               switch (n & 0x7) { \
> -               case 7: \
> -                       ring[idx++] = obj[i++]; /* fallthrough */ \
> -               case 6: \
> -                       ring[idx++] = obj[i++]; /* fallthrough */ \
> -               case 5: \
> -                       ring[idx++] = obj[i++]; /* fallthrough */ \
> -               case 4: \
> -                       ring[idx++] = obj[i++]; /* fallthrough */ \
> -               case 3: \
> -                       ring[idx++] = obj[i++]; /* fallthrough */ \
> -               case 2: \
> -                       ring[idx++] = obj[i++]; /* fallthrough */ \
> -               case 1: \
> -                       ring[idx++] = obj[i++]; /* fallthrough */ \
> -               } \
> -       } else { \
> -               for (i = 0; idx < size; i++, idx++)\
> -                       ring[idx] = obj[i]; \
> -               for (idx = 0; i < n; i++, idx++) \
> -                       ring[idx] = obj[i]; \
> -       } \
> -} while (0)
> -
> -#define DEQUEUE_PTRS_GEN(r, ring_start, cons_head, obj_table, esize, n)
> do { \
> -       unsigned int i; \
> -       uint32_t idx = cons_head & (r)->mask; \
> -       const uint32_t size = (r)->size; \
> -       uint32_t *ring = (uint32_t *)ring_start; \
> -       uint32_t *obj = (uint32_t *)obj_table; \
> -       uint32_t sz = n * (esize / sizeof(uint32_t)); \
> -       if (likely(idx + n < size)) { \
> -               for (i = 0; i < (sz & ((~(unsigned)0x7))); i += 8, idx += 8) { \
> -                       memcpy (obj + i, ring + idx, 8 * sizeof (uint32_t)); \
> -               } \
> -               switch (n & 0x7) { \
> -               case 7: \
> -                       obj[i++] = ring[idx++]; /* fallthrough */ \
> -               case 6: \
> -                       obj[i++] = ring[idx++]; /* fallthrough */ \
> -               case 5: \
> -                       obj[i++] = ring[idx++]; /* fallthrough */ \
> -               case 4: \
> -                       obj[i++] = ring[idx++]; /* fallthrough */ \
> -               case 3: \
> -                       obj[i++] = ring[idx++]; /* fallthrough */ \
> -               case 2: \
> -                       obj[i++] = ring[idx++]; /* fallthrough */ \
> -               case 1: \
> -                       obj[i++] = ring[idx++]; /* fallthrough */ \
> -               } \
> -       } else { \
> -               for (i = 0; idx < size; i++, idx++) \
> -                       obj[i] = ring[idx]; \
> -               for (idx = 0; i < n; i++, idx++) \
> -                       obj[i] = ring[idx]; \
> -       } \
> -} while (0)
> +static __rte_always_inline void
> +copy_elems(uint32_t du32[], const uint32_t su32[], uint32_t num,
> +uint32_t esize) {
> +       uint32_t i, sz;
> +
> +       sz = (num * esize) / sizeof(uint32_t);
> +
> +       for (i = 0; i < (sz & ~7); i += 8)
> +               memcpy(du32 + i, su32 + i, 8 * sizeof(uint32_t));
> +
> +       switch (sz & 7) {
> +       case 7: du32[sz - 7] = su32[sz - 7]; /* fallthrough */
> +       case 6: du32[sz - 6] = su32[sz - 6]; /* fallthrough */
> +       case 5: du32[sz - 5] = su32[sz - 5]; /* fallthrough */
> +       case 4: du32[sz - 4] = su32[sz - 4]; /* fallthrough */
> +       case 3: du32[sz - 3] = su32[sz - 3]; /* fallthrough */
> +       case 2: du32[sz - 2] = su32[sz - 2]; /* fallthrough */
> +       case 1: du32[sz - 1] = su32[sz - 1]; /* fallthrough */
> +       }
> +}
> +
> +static __rte_always_inline void
> +enqueue_elems(struct rte_ring *r, void *ring_start, uint32_t prod_head,
> +               void *obj_table, uint32_t num, uint32_t esize) {
> +       uint32_t idx, n;
> +       uint32_t *du32;
> +       const uint32_t *su32;
> +
> +       const uint32_t size = r->size;
> +
> +       idx = prod_head & (r)->mask;
> +
> +       du32 = (uint32_t *)ring_start + idx;
> +       su32 = obj_table;
> +
> +       if (idx + num < size)
> +               copy_elems(du32, su32, num, esize);
> +       else {
> +               n = size - idx;
> +               copy_elems(du32, su32, n, esize);
> +               copy_elems(ring_start, su32 + n, num - n, esize);
> +       }
> +}
> +
> +static __rte_always_inline void
> +dequeue_elems(struct rte_ring *r, void *ring_start, uint32_t cons_head,
> +               void *obj_table, uint32_t num, uint32_t esize) {
> +       uint32_t idx, n;
> +       uint32_t *du32;
> +       const uint32_t *su32;
> +
> +       const uint32_t size = r->size;
> +
> +       idx = cons_head & (r)->mask;
> +
> +       su32 = (uint32_t *)ring_start + idx;
> +       du32 = obj_table;
> +
> +       if (idx + num < size)
> +               copy_elems(du32, su32, num, esize);
> +       else {
> +               n = size - idx;
> +               copy_elems(du32, su32, n, esize);
> +               copy_elems(du32 + n, ring_start, num - n, esize);
> +       }
> +}
> 
>  /* Between load and load. there might be cpu reorder in weak model
>   * (powerpc/arm).
> @@ -232,7 +231,7 @@ __rte_ring_do_enqueue_elem(struct rte_ring *r, void
> * const obj_table,
>         if (n == 0)
>                 goto end;
> 
> -       ENQUEUE_PTRS_GEN(r, &r[1], prod_head, obj_table, esize, n);
> +       enqueue_elems(r, &r[1], prod_head, obj_table, n, esize);
> 
>         update_tail(&r->prod, prod_head, prod_next, is_sp, 1);
>  end:
> @@ -279,7 +278,7 @@ __rte_ring_do_dequeue_elem(struct rte_ring *r, void
> *obj_table,
>         if (n == 0)
>                 goto end;
> 
> -       DEQUEUE_PTRS_GEN(r, &r[1], cons_head, obj_table, esize, n);
> +       dequeue_elems(r, &r[1], cons_head, obj_table, n, esize);
> 
>         update_tail(&r->cons, cons_head, cons_next, is_sc, 0);
> 
> --
> 2.17.1
> 
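
The wrap-around copy that the quoted update patch factors out of the macros
can be illustrated with a standalone sketch. This is not the patch's code:
the name sketch_enqueue is hypothetical, it takes the ring size directly
instead of a struct rte_ring, it uses plain memcpy rather than the unrolled
32-byte chunk copy, and it computes offsets in bytes (index times element
size) so that it works for any element size that is a multiple of 4 bytes.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Copy `num` elements of `esize` bytes each from `obj_table` into a ring
 * of `size` slots, starting at slot (head & (size - 1)) and wrapping to
 * slot 0 when the end of the ring storage is reached.
 */
static void
sketch_enqueue(void *ring_start, uint32_t size, uint32_t head,
		const void *obj_table, uint32_t num, uint32_t esize)
{
	uint8_t *ring = ring_start;
	const uint8_t *src = obj_table;
	const uint32_t idx = head & (size - 1);

	/* size must be a power of two, esize a multiple of 4 bytes */
	assert(size != 0 && (size & (size - 1)) == 0);
	assert(esize != 0 && esize % 4 == 0 && num <= size);

	if (idx + num <= size) {
		/* no wrap: one contiguous copy */
		memcpy(ring + (size_t)idx * esize, src, (size_t)num * esize);
	} else {
		/* wrap: fill up to the end, then continue from slot 0 */
		const uint32_t n = size - idx;

		memcpy(ring + (size_t)idx * esize, src, (size_t)n * esize);
		memcpy(ring, src + (size_t)n * esize,
			(size_t)(num - n) * esize);
	}
}
```

The dequeue direction would be the mirror image, with the ring storage as
the source and the object table as the destination.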


^ permalink raw reply	[flat|nested] 173+ messages in thread

* [dpdk-dev] [RFC v6 0/6] lib/ring: APIs to support custom element size
  2019-09-06 19:05 ` [dpdk-dev] [PATCH v2 0/6] " Honnappa Nagarahalli
                     ` (10 preceding siblings ...)
  2019-10-17 20:08   ` [dpdk-dev] [PATCH v5 0/3] lib/ring: APIs to support custom element size Honnappa Nagarahalli
@ 2019-10-21  0:22   ` Honnappa Nagarahalli
  2019-10-21  0:22     ` [dpdk-dev] [RFC v6 1/6] test/ring: use division for cycle count calculation Honnappa Nagarahalli
                       ` (6 more replies)
  2019-12-20  4:45   ` [dpdk-dev] [PATCH v7 00/17] " Honnappa Nagarahalli
                     ` (3 subsequent siblings)
  15 siblings, 7 replies; 173+ messages in thread
From: Honnappa Nagarahalli @ 2019-10-21  0:22 UTC (permalink / raw)
  To: olivier.matz, sthemmin, jerinj, bruce.richardson, david.marchand,
	pbhagavatula, konstantin.ananyev, drc, hemant.agrawal,
	honnappa.nagarahalli
  Cc: dev, dharmik.thakkar, ruifeng.wang, gavin.hu

The current rte_ring hard-codes the type of the ring element to 'void *',
hence the size of the element is hard-coded to 32b/64b. Since the ring
element type is not an input to rte_ring APIs, it results in a couple
of issues:

1) If an application needs to store an element that is not 64b, it
   needs to write its own ring APIs similar to the rte_event_ring APIs.
   This creates an additional burden on programmers, who end up making
   work-arounds and often waste memory.
2) If there are multiple libraries that store elements of the same
   type, currently they would have to write their own rte_ring APIs. This
   results in code duplication.

This patch adds new APIs to support configurable ring element size.
The APIs support custom element sizes by allowing the ring element
size to be defined as a multiple of 32b.
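
As a rough illustration of how the element size feeds into the ring's
memory footprint, here is a standalone sketch. It is not the library code:
sketch_ring_memsize, SKETCH_HDR_SIZE and SKETCH_CACHE_LINE are hypothetical
names, and the header and cache line sizes are assumed values. It accepts
any non-zero multiple of 4 bytes, whereas patch 2/6 in this version only
accepts esize values of 4, 8 and 16.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_CACHE_LINE 64u  /* assumed cache line size in bytes */
#define SKETCH_HDR_SIZE   384u /* assumed ring header size, hypothetical */

/* Return the number of bytes needed for a ring of `count` elements of
 * `esize` bytes each, or -1 if the arguments are invalid.
 */
static int64_t
sketch_ring_memsize(uint32_t count, uint32_t esize)
{
	uint64_t sz;

	/* element size must be a non-zero multiple of 4 bytes (32 bits) */
	if (esize == 0 || esize % 4 != 0)
		return -1;

	/* count must be a power of two */
	if (count == 0 || (count & (count - 1)) != 0)
		return -1;

	/* header plus element storage, rounded up to a cache line */
	sz = SKETCH_HDR_SIZE + (uint64_t)count * esize;
	sz = (sz + SKETCH_CACHE_LINE - 1) & ~(uint64_t)(SKETCH_CACHE_LINE - 1);
	assert(sz % SKETCH_CACHE_LINE == 0);

	return (int64_t)sz;
}
```

With these assumed constants, a ring of 1024 16-byte elements needs
384 + 16384 = 16768 bytes, which is already a multiple of 64.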

The aim is to achieve the same performance as the existing ring
implementation. The patch adds the same performance tests that are run
for the existing APIs. This allows for performance comparison.

I also tested with memcpy. x86 shows significant improvements on bulk
and burst tests. On the Arm platform I used, there is a drop of
4% to 6% in a few tests. Maybe this is something that we can explore
later.

Note that this version skips changes to other libraries as I would
like to get an agreement on the implementation from the community.
They will be added once there is agreement on the rte_ring changes.

v6
 - Labelled as RFC to better indicate the status
 - Added unit tests to test the rte_ring_xxx_elem APIs
 - Corrected 'macro based partial memcpy' (5/6) patch
 - Added Konstantin's method after correction (6/6)
 - Checkpatch shows a significant number of warnings and errors, mainly
   due to copying code from existing test cases. None of them are
   harmful. I will fix them once we have an agreement.

v5
 - Use memcpy for chunks of 32B (Konstantin).
 - Both 'ring_perf_autotest' and 'ring_perf_elem_autotest' are available
   to compare the results easily.
 - Copying without memcpy is also available in 1/3, if anyone wants to
   experiment on their platform.
 - Added other platform owners to test on their respective platforms.

v4
 - A few fixes after more performance testing

v3
 - Removed macro-fest and used inline functions
   (Stephen, Bruce)

v2
 - Change Event Ring implementation to use ring templates
   (Jerin, Pavan)

Honnappa Nagarahalli (6):
  test/ring: use division for cycle count calculation
  lib/ring: apis to support configurable element size
  test/ring: add functional tests for configurable element size ring
  test/ring: add perf tests for configurable element size ring
  lib/ring: copy ring elements using memcpy partially
  lib/ring: improved copy function to copy ring elements

 app/test/Makefile                    |   2 +
 app/test/meson.build                 |   2 +
 app/test/test_ring_elem.c            | 859 +++++++++++++++++++++++++++
 app/test/test_ring_perf.c            |  22 +-
 app/test/test_ring_perf_elem.c       | 419 +++++++++++++
 lib/librte_ring/Makefile             |   3 +-
 lib/librte_ring/meson.build          |   4 +
 lib/librte_ring/rte_ring.c           |  34 +-
 lib/librte_ring/rte_ring.h           |   1 +
 lib/librte_ring/rte_ring_elem.h      | 818 +++++++++++++++++++++++++
 lib/librte_ring/rte_ring_version.map |   2 +
 11 files changed, 2147 insertions(+), 19 deletions(-)
 create mode 100644 app/test/test_ring_elem.c
 create mode 100644 app/test/test_ring_perf_elem.c
 create mode 100644 lib/librte_ring/rte_ring_elem.h

-- 
2.17.1


^ permalink raw reply	[flat|nested] 173+ messages in thread

* [dpdk-dev] [RFC v6 1/6] test/ring: use division for cycle count calculation
  2019-10-21  0:22   ` [dpdk-dev] [RFC v6 0/6] lib/ring: APIs to support custom element size Honnappa Nagarahalli
@ 2019-10-21  0:22     ` Honnappa Nagarahalli
  2019-10-23  9:49       ` Olivier Matz
  2019-10-21  0:22     ` [dpdk-dev] [RFC v6 2/6] lib/ring: apis to support configurable element size Honnappa Nagarahalli
                       ` (5 subsequent siblings)
  6 siblings, 1 reply; 173+ messages in thread
From: Honnappa Nagarahalli @ 2019-10-21  0:22 UTC (permalink / raw)
  To: olivier.matz, sthemmin, jerinj, bruce.richardson, david.marchand,
	pbhagavatula, konstantin.ananyev, drc, hemant.agrawal,
	honnappa.nagarahalli
  Cc: dev, dharmik.thakkar, ruifeng.wang, gavin.hu

Use floating-point division instead of a right shift to calculate
a more accurate cycle count.

Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
---
 app/test/test_ring_perf.c | 22 ++++++++++++----------
 1 file changed, 12 insertions(+), 10 deletions(-)

diff --git a/app/test/test_ring_perf.c b/app/test/test_ring_perf.c
index b6ad703bb..e3e17f251 100644
--- a/app/test/test_ring_perf.c
+++ b/app/test/test_ring_perf.c
@@ -284,10 +284,10 @@ test_single_enqueue_dequeue(struct rte_ring *r)
 	}
 	const uint64_t mc_end = rte_rdtsc();
 
-	printf("SP/SC single enq/dequeue: %"PRIu64"\n",
-			(sc_end-sc_start) >> iter_shift);
-	printf("MP/MC single enq/dequeue: %"PRIu64"\n",
-			(mc_end-mc_start) >> iter_shift);
+	printf("SP/SC single enq/dequeue: %.2F\n",
+			((double)(sc_end-sc_start)) / iterations);
+	printf("MP/MC single enq/dequeue: %.2F\n",
+			((double)(mc_end-mc_start)) / iterations);
 }
 
 /*
@@ -322,13 +322,15 @@ test_burst_enqueue_dequeue(struct rte_ring *r)
 		}
 		const uint64_t mc_end = rte_rdtsc();
 
-		uint64_t mc_avg = ((mc_end-mc_start) >> iter_shift) / bulk_sizes[sz];
-		uint64_t sc_avg = ((sc_end-sc_start) >> iter_shift) / bulk_sizes[sz];
+		double mc_avg = ((double)(mc_end-mc_start) / iterations) /
+					bulk_sizes[sz];
+		double sc_avg = ((double)(sc_end-sc_start) / iterations) /
+					bulk_sizes[sz];
 
-		printf("SP/SC burst enq/dequeue (size: %u): %"PRIu64"\n", bulk_sizes[sz],
-				sc_avg);
-		printf("MP/MC burst enq/dequeue (size: %u): %"PRIu64"\n", bulk_sizes[sz],
-				mc_avg);
+		printf("SP/SC burst enq/dequeue (size: %u): %.2F\n",
+				bulk_sizes[sz], sc_avg);
+		printf("MP/MC burst enq/dequeue (size: %u): %.2F\n",
+				bulk_sizes[sz], mc_avg);
 	}
 }
 
-- 
2.17.1


^ permalink raw reply	[flat|nested] 173+ messages in thread

* [dpdk-dev] [RFC v6 2/6] lib/ring: apis to support configurable element size
  2019-10-21  0:22   ` [dpdk-dev] [RFC v6 0/6] lib/ring: APIs to support custom element size Honnappa Nagarahalli
  2019-10-21  0:22     ` [dpdk-dev] [RFC v6 1/6] test/ring: use division for cycle count calculation Honnappa Nagarahalli
@ 2019-10-21  0:22     ` Honnappa Nagarahalli
  2019-10-23  9:59       ` Olivier Matz
  2019-10-21  0:22     ` [dpdk-dev] [RFC v6 3/6] test/ring: add functional tests for configurable element size ring Honnappa Nagarahalli
                       ` (4 subsequent siblings)
  6 siblings, 1 reply; 173+ messages in thread
From: Honnappa Nagarahalli @ 2019-10-21  0:22 UTC (permalink / raw)
  To: olivier.matz, sthemmin, jerinj, bruce.richardson, david.marchand,
	pbhagavatula, konstantin.ananyev, drc, hemant.agrawal,
	honnappa.nagarahalli
  Cc: dev, dharmik.thakkar, ruifeng.wang, gavin.hu

Current APIs assume ring elements to be pointers. However, in many
use cases, the size can be different. Add new APIs to support
configurable ring element sizes.

Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Dharmik Thakkar <dharmik.thakkar@arm.com>
Reviewed-by: Gavin Hu <gavin.hu@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
---
 lib/librte_ring/Makefile             |   3 +-
 lib/librte_ring/meson.build          |   4 +
 lib/librte_ring/rte_ring.c           |  44 +-
 lib/librte_ring/rte_ring.h           |   1 +
 lib/librte_ring/rte_ring_elem.h      | 946 +++++++++++++++++++++++++++
 lib/librte_ring/rte_ring_version.map |   2 +
 6 files changed, 991 insertions(+), 9 deletions(-)
 create mode 100644 lib/librte_ring/rte_ring_elem.h

diff --git a/lib/librte_ring/Makefile b/lib/librte_ring/Makefile
index 21a36770d..515a967bb 100644
--- a/lib/librte_ring/Makefile
+++ b/lib/librte_ring/Makefile
@@ -6,7 +6,7 @@ include $(RTE_SDK)/mk/rte.vars.mk
 # library name
 LIB = librte_ring.a
 
-CFLAGS += $(WERROR_FLAGS) -I$(SRCDIR) -O3
+CFLAGS += $(WERROR_FLAGS) -I$(SRCDIR) -O3 -DALLOW_EXPERIMENTAL_API
 LDLIBS += -lrte_eal
 
 EXPORT_MAP := rte_ring_version.map
@@ -18,6 +18,7 @@ SRCS-$(CONFIG_RTE_LIBRTE_RING) := rte_ring.c
 
 # install includes
 SYMLINK-$(CONFIG_RTE_LIBRTE_RING)-include := rte_ring.h \
+					rte_ring_elem.h \
 					rte_ring_generic.h \
 					rte_ring_c11_mem.h
 
diff --git a/lib/librte_ring/meson.build b/lib/librte_ring/meson.build
index ab8b0b469..7ebaba919 100644
--- a/lib/librte_ring/meson.build
+++ b/lib/librte_ring/meson.build
@@ -4,5 +4,9 @@
 version = 2
 sources = files('rte_ring.c')
 headers = files('rte_ring.h',
+		'rte_ring_elem.h',
 		'rte_ring_c11_mem.h',
 		'rte_ring_generic.h')
+
+# rte_ring_create_elem and rte_ring_get_memsize_elem are experimental
+allow_experimental_apis = true
diff --git a/lib/librte_ring/rte_ring.c b/lib/librte_ring/rte_ring.c
index d9b308036..e95285259 100644
--- a/lib/librte_ring/rte_ring.c
+++ b/lib/librte_ring/rte_ring.c
@@ -33,6 +33,7 @@
 #include <rte_tailq.h>
 
 #include "rte_ring.h"
+#include "rte_ring_elem.h"
 
 TAILQ_HEAD(rte_ring_list, rte_tailq_entry);
 
@@ -46,23 +47,41 @@ EAL_REGISTER_TAILQ(rte_ring_tailq)
 
 /* return the size of memory occupied by a ring */
 ssize_t
-rte_ring_get_memsize(unsigned count)
+rte_ring_get_memsize_elem(unsigned count, unsigned esize)
 {
 	ssize_t sz;
 
+	/* Supported esize values are 4/8/16.
+	 * Others can be added on need basis.
+	 */
+	if (esize != 4 && esize != 8 && esize != 16) {
+		RTE_LOG(ERR, RING,
+			"Unsupported esize value. Supported values are 4, 8 and 16\n");
+
+		return -EINVAL;
+	}
+
 	/* count must be a power of 2 */
 	if ((!POWEROF2(count)) || (count > RTE_RING_SZ_MASK )) {
 		RTE_LOG(ERR, RING,
-			"Requested size is invalid, must be power of 2, and "
-			"do not exceed the size limit %u\n", RTE_RING_SZ_MASK);
+			"Requested number of elements is invalid, must be power of 2, and not exceed %u\n",
+			RTE_RING_SZ_MASK);
+
 		return -EINVAL;
 	}
 
-	sz = sizeof(struct rte_ring) + count * sizeof(void *);
+	sz = sizeof(struct rte_ring) + count * esize;
 	sz = RTE_ALIGN(sz, RTE_CACHE_LINE_SIZE);
 	return sz;
 }
 
+/* return the size of memory occupied by a ring */
+ssize_t
+rte_ring_get_memsize(unsigned count)
+{
+	return rte_ring_get_memsize_elem(count, sizeof(void *));
+}
+
 void
 rte_ring_reset(struct rte_ring *r)
 {
@@ -114,10 +133,10 @@ rte_ring_init(struct rte_ring *r, const char *name, unsigned count,
 	return 0;
 }
 
-/* create the ring */
+/* create the ring for a given element size */
 struct rte_ring *
-rte_ring_create(const char *name, unsigned count, int socket_id,
-		unsigned flags)
+rte_ring_create_elem(const char *name, unsigned count, unsigned esize,
+		int socket_id, unsigned flags)
 {
 	char mz_name[RTE_MEMZONE_NAMESIZE];
 	struct rte_ring *r;
@@ -135,7 +154,7 @@ rte_ring_create(const char *name, unsigned count, int socket_id,
 	if (flags & RING_F_EXACT_SZ)
 		count = rte_align32pow2(count + 1);
 
-	ring_size = rte_ring_get_memsize(count);
+	ring_size = rte_ring_get_memsize_elem(count, esize);
 	if (ring_size < 0) {
 		rte_errno = ring_size;
 		return NULL;
@@ -182,6 +201,15 @@ rte_ring_create(const char *name, unsigned count, int socket_id,
 	return r;
 }
 
+/* create the ring */
+struct rte_ring *
+rte_ring_create(const char *name, unsigned count, int socket_id,
+		unsigned flags)
+{
+	return rte_ring_create_elem(name, count, sizeof(void *), socket_id,
+		flags);
+}
+
 /* free the ring */
 void
 rte_ring_free(struct rte_ring *r)
diff --git a/lib/librte_ring/rte_ring.h b/lib/librte_ring/rte_ring.h
index 2a9f768a1..18fc5d845 100644
--- a/lib/librte_ring/rte_ring.h
+++ b/lib/librte_ring/rte_ring.h
@@ -216,6 +216,7 @@ int rte_ring_init(struct rte_ring *r, const char *name, unsigned count,
  */
 struct rte_ring *rte_ring_create(const char *name, unsigned count,
 				 int socket_id, unsigned flags);
+
 /**
  * De-allocate all memory used by the ring.
  *
diff --git a/lib/librte_ring/rte_ring_elem.h b/lib/librte_ring/rte_ring_elem.h
new file mode 100644
index 000000000..7e9914567
--- /dev/null
+++ b/lib/librte_ring/rte_ring_elem.h
@@ -0,0 +1,946 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ *
+ * Copyright (c) 2019 Arm Limited
+ * Copyright (c) 2010-2017 Intel Corporation
+ * Copyright (c) 2007-2009 Kip Macy kmacy@freebsd.org
+ * All rights reserved.
+ * Derived from FreeBSD's bufring.h
+ * Used as BSD-3 Licensed with permission from Kip Macy.
+ */
+
+#ifndef _RTE_RING_ELEM_H_
+#define _RTE_RING_ELEM_H_
+
+/**
+ * @file
+ * RTE Ring with flexible element size
+ */
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#include <stdio.h>
+#include <stdint.h>
+#include <sys/queue.h>
+#include <errno.h>
+#include <rte_common.h>
+#include <rte_config.h>
+#include <rte_memory.h>
+#include <rte_lcore.h>
+#include <rte_atomic.h>
+#include <rte_branch_prediction.h>
+#include <rte_memzone.h>
+#include <rte_pause.h>
+
+#include "rte_ring.h"
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * Calculate the memory size needed for a ring with given element size
+ *
+ * This function returns the number of bytes needed for a ring, given
+ * the number of elements in it and the size of the element. This value
+ * is the sum of the size of the structure rte_ring and the size of the
+ * memory needed for storing the elements. The value is aligned to a cache
+ * line size.
+ *
+ * @param count
+ *   The number of elements in the ring (must be a power of 2).
+ * @param esize
+ *   The size of ring element, in bytes. It must be a multiple of 4.
+ *   Currently, sizes 4, 8 and 16 are supported.
+ * @return
+ *   - The memory size needed for the ring on success.
+ *   - -EINVAL if count is not a power of 2 or esize is not supported.
+ */
+__rte_experimental
+ssize_t rte_ring_get_memsize_elem(unsigned int count, unsigned int esize);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * Create a new ring named *name* that stores elements with given size.
+ *
+ * This function uses ``memzone_reserve()`` to allocate memory. Then it
+ * calls rte_ring_init() to initialize an empty ring.
+ *
+ * The new ring size is set to *count*, which must be a power of
+ * two. Water marking is disabled by default. The real usable ring size
+ * is *count-1* instead of *count* to differentiate a full ring from an
+ * empty ring.
+ *
+ * The ring is added in RTE_TAILQ_RING list.
+ *
+ * @param name
+ *   The name of the ring.
+ * @param count
+ *   The number of elements in the ring (must be a power of 2).
+ * @param esize
+ *   The size of ring element, in bytes. It must be a multiple of 4.
+ *   Currently, sizes 4, 8 and 16 are supported.
+ * @param socket_id
+ *   The *socket_id* argument is the socket identifier in case of
+ *   NUMA. The value can be *SOCKET_ID_ANY* if there is no NUMA
+ *   constraint for the reserved zone.
+ * @param flags
+ *   An OR of the following:
+ *    - RING_F_SP_ENQ: If this flag is set, the default behavior when
+ *      using ``rte_ring_enqueue()`` or ``rte_ring_enqueue_bulk()``
+ *      is "single-producer". Otherwise, it is "multi-producers".
+ *    - RING_F_SC_DEQ: If this flag is set, the default behavior when
+ *      using ``rte_ring_dequeue()`` or ``rte_ring_dequeue_bulk()``
+ *      is "single-consumer". Otherwise, it is "multi-consumers".
+ * @return
+ *   On success, the pointer to the newly allocated ring. NULL on error with
+ *    rte_errno set appropriately. Possible errno values include:
+ *    - E_RTE_NO_CONFIG - function could not get pointer to rte_config structure
+ *    - E_RTE_SECONDARY - function was called from a secondary process instance
+ *    - EINVAL - count is not a power of 2 or esize is not supported
+ *    - ENOSPC - the maximum number of memzones has already been allocated
+ *    - EEXIST - a memzone with the same name already exists
+ *    - ENOMEM - no appropriate memory area found in which to create memzone
+ */
+__rte_experimental
+struct rte_ring *rte_ring_create_elem(const char *name, unsigned int count,
+			unsigned int esize, int socket_id, unsigned int flags);
+
+/* the actual enqueue of elements on the ring.
+ * Placed here since identical code needed in both
+ * single and multi producer enqueue functions.
+ */
+#define ENQUEUE_PTRS_ELEM(r, ring_start, prod_head, obj_table, esize, n) do { \
+	if (esize == 4) \
+		ENQUEUE_PTRS_32(r, ring_start, prod_head, obj_table, n); \
+	else if (esize == 8) \
+		ENQUEUE_PTRS_64(r, ring_start, prod_head, obj_table, n); \
+	else if (esize == 16) \
+		ENQUEUE_PTRS_128(r, ring_start, prod_head, obj_table, n); \
+} while (0)
+
+#define ENQUEUE_PTRS_32(r, ring_start, prod_head, obj_table, n) do { \
+	unsigned int i; \
+	const uint32_t size = (r)->size; \
+	uint32_t idx = prod_head & (r)->mask; \
+	uint32_t *ring = (uint32_t *)ring_start; \
+	uint32_t *obj = (uint32_t *)obj_table; \
+	if (likely(idx + n < size)) { \
+		for (i = 0; i < (n & ((~(uint32_t)0x7))); i += 8, idx += 8) { \
+			ring[idx] = obj[i]; \
+			ring[idx + 1] = obj[i + 1]; \
+			ring[idx + 2] = obj[i + 2]; \
+			ring[idx + 3] = obj[i + 3]; \
+			ring[idx + 4] = obj[i + 4]; \
+			ring[idx + 5] = obj[i + 5]; \
+			ring[idx + 6] = obj[i + 6]; \
+			ring[idx + 7] = obj[i + 7]; \
+		} \
+		switch (n & 0x7) { \
+		case 7: \
+			ring[idx++] = obj[i++]; /* fallthrough */ \
+		case 6: \
+			ring[idx++] = obj[i++]; /* fallthrough */ \
+		case 5: \
+			ring[idx++] = obj[i++]; /* fallthrough */ \
+		case 4: \
+			ring[idx++] = obj[i++]; /* fallthrough */ \
+		case 3: \
+			ring[idx++] = obj[i++]; /* fallthrough */ \
+		case 2: \
+			ring[idx++] = obj[i++]; /* fallthrough */ \
+		case 1: \
+			ring[idx++] = obj[i++]; \
+		} \
+	} else { \
+		for (i = 0; idx < size; i++, idx++)\
+			ring[idx] = obj[i]; \
+		for (idx = 0; i < n; i++, idx++) \
+			ring[idx] = obj[i]; \
+	} \
+} while (0)
+
+#define ENQUEUE_PTRS_64(r, ring_start, prod_head, obj_table, n) do { \
+	unsigned int i; \
+	const uint32_t size = (r)->size; \
+	uint32_t idx = prod_head & (r)->mask; \
+	uint64_t *ring = (uint64_t *)ring_start; \
+	uint64_t *obj = (uint64_t *)obj_table; \
+	if (likely(idx + n < size)) { \
+		for (i = 0; i < (n & ((~(uint32_t)0x3))); i += 4, idx += 4) { \
+			ring[idx] = obj[i]; \
+			ring[idx + 1] = obj[i + 1]; \
+			ring[idx + 2] = obj[i + 2]; \
+			ring[idx + 3] = obj[i + 3]; \
+		} \
+		switch (n & 0x3) { \
+		case 3: \
+			ring[idx++] = obj[i++]; /* fallthrough */ \
+		case 2: \
+			ring[idx++] = obj[i++]; /* fallthrough */ \
+		case 1: \
+			ring[idx++] = obj[i++]; \
+		} \
+	} else { \
+		for (i = 0; idx < size; i++, idx++)\
+			ring[idx] = obj[i]; \
+		for (idx = 0; i < n; i++, idx++) \
+			ring[idx] = obj[i]; \
+	} \
+} while (0)
+
+#define ENQUEUE_PTRS_128(r, ring_start, prod_head, obj_table, n) do { \
+	unsigned int i; \
+	const uint32_t size = (r)->size; \
+	uint32_t idx = prod_head & (r)->mask; \
+	__uint128_t *ring = (__uint128_t *)ring_start; \
+	__uint128_t *obj = (__uint128_t *)obj_table; \
+	if (likely(idx + n < size)) { \
+		for (i = 0; i < (n & (~(uint32_t)0x1)); i += 2, idx += 2) { \
+			ring[idx] = obj[i]; \
+			ring[idx + 1] = obj[i + 1]; \
+		} \
+		switch (n & 0x1) { \
+		case 1: \
+			ring[idx++] = obj[i++]; \
+		} \
+	} else { \
+		for (i = 0; idx < size; i++, idx++)\
+			ring[idx] = obj[i]; \
+		for (idx = 0; i < n; i++, idx++) \
+			ring[idx] = obj[i]; \
+	} \
+} while (0)
+
+/* the actual copy of elements from the ring to obj_table.
+ * Placed here since identical code needed in both
+ * single and multi consumer dequeue functions.
+ */
+#define DEQUEUE_PTRS_ELEM(r, ring_start, cons_head, obj_table, esize, n) do { \
+	if (esize == 4) \
+		DEQUEUE_PTRS_32(r, ring_start, cons_head, obj_table, n); \
+	else if (esize == 8) \
+		DEQUEUE_PTRS_64(r, ring_start, cons_head, obj_table, n); \
+	else if (esize == 16) \
+		DEQUEUE_PTRS_128(r, ring_start, cons_head, obj_table, n); \
+} while (0)
+
+#define DEQUEUE_PTRS_32(r, ring_start, cons_head, obj_table, n) do { \
+	unsigned int i; \
+	uint32_t idx = cons_head & (r)->mask; \
+	const uint32_t size = (r)->size; \
+	uint32_t *ring = (uint32_t *)ring_start; \
+	uint32_t *obj = (uint32_t *)obj_table; \
+	if (likely(idx + n < size)) { \
+		for (i = 0; i < (n & (~(uint32_t)0x7)); i += 8, idx += 8) {\
+			obj[i] = ring[idx]; \
+			obj[i + 1] = ring[idx + 1]; \
+			obj[i + 2] = ring[idx + 2]; \
+			obj[i + 3] = ring[idx + 3]; \
+			obj[i + 4] = ring[idx + 4]; \
+			obj[i + 5] = ring[idx + 5]; \
+			obj[i + 6] = ring[idx + 6]; \
+			obj[i + 7] = ring[idx + 7]; \
+		} \
+		switch (n & 0x7) { \
+		case 7: \
+			obj[i++] = ring[idx++]; /* fallthrough */ \
+		case 6: \
+			obj[i++] = ring[idx++]; /* fallthrough */ \
+		case 5: \
+			obj[i++] = ring[idx++]; /* fallthrough */ \
+		case 4: \
+			obj[i++] = ring[idx++]; /* fallthrough */ \
+		case 3: \
+			obj[i++] = ring[idx++]; /* fallthrough */ \
+		case 2: \
+			obj[i++] = ring[idx++]; /* fallthrough */ \
+		case 1: \
+			obj[i++] = ring[idx++]; \
+		} \
+	} else { \
+		for (i = 0; idx < size; i++, idx++) \
+			obj[i] = ring[idx]; \
+		for (idx = 0; i < n; i++, idx++) \
+			obj[i] = ring[idx]; \
+	} \
+} while (0)
+
+#define DEQUEUE_PTRS_64(r, ring_start, cons_head, obj_table, n) do { \
+	unsigned int i; \
+	uint32_t idx = cons_head & (r)->mask; \
+	const uint32_t size = (r)->size; \
+	uint64_t *ring = (uint64_t *)ring_start; \
+	uint64_t *obj = (uint64_t *)obj_table; \
+	if (likely(idx + n < size)) { \
+		for (i = 0; i < (n & (~(uint32_t)0x3)); i += 4, idx += 4) {\
+			obj[i] = ring[idx]; \
+			obj[i + 1] = ring[idx + 1]; \
+			obj[i + 2] = ring[idx + 2]; \
+			obj[i + 3] = ring[idx + 3]; \
+		} \
+		switch (n & 0x3) { \
+		case 3: \
+			obj[i++] = ring[idx++]; /* fallthrough */ \
+		case 2: \
+			obj[i++] = ring[idx++]; /* fallthrough */ \
+		case 1: \
+			obj[i++] = ring[idx++]; \
+		} \
+	} else { \
+		for (i = 0; idx < size; i++, idx++) \
+			obj[i] = ring[idx]; \
+		for (idx = 0; i < n; i++, idx++) \
+			obj[i] = ring[idx]; \
+	} \
+} while (0)
+
+#define DEQUEUE_PTRS_128(r, ring_start, cons_head, obj_table, n) do { \
+	unsigned int i; \
+	uint32_t idx = cons_head & (r)->mask; \
+	const uint32_t size = (r)->size; \
+	__uint128_t *ring = (__uint128_t *)ring_start; \
+	__uint128_t *obj = (__uint128_t *)obj_table; \
+	if (likely(idx + n < size)) { \
+		for (i = 0; i < (n & (~(uint32_t)0x1)); i += 2, idx += 2) { \
+			obj[i] = ring[idx]; \
+			obj[i + 1] = ring[idx + 1]; \
+		} \
+		switch (n & 0x1) { \
+		case 1: \
+			obj[i++] = ring[idx++]; \
+		} \
+	} else { \
+		for (i = 0; idx < size; i++, idx++) \
+			obj[i] = ring[idx]; \
+		for (idx = 0; i < n; i++, idx++) \
+			obj[i] = ring[idx]; \
+	} \
+} while (0)
+
+/* On weakly ordered CPUs (powerpc/arm), two loads may be reordered
+ * by the CPU.
+ * There are 2 choices for the users:
+ * 1. use the rmb() memory barrier
+ * 2. use one-direction load_acquire/store_release barriers, enabled by
+ * CONFIG_RTE_USE_C11_MEM_MODEL=y
+ * The choice depends on performance test results.
+ * By default, the common functions come from rte_ring_generic.h
+ */
+#ifdef RTE_USE_C11_MEM_MODEL
+#include "rte_ring_c11_mem.h"
+#else
+#include "rte_ring_generic.h"
+#endif
+
+/**
+ * @internal Enqueue several objects on the ring
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of objects, each esize bytes long.
+ * @param esize
+ *   The size of ring element, in bytes. It must be a multiple of 4.
+ *   Currently, sizes 4, 8 and 16 are supported. This should be the same
+ *   as passed while creating the ring, otherwise the results are undefined.
+ * @param n
+ *   The number of objects to add in the ring from the obj_table.
+ * @param behavior
+ *   RTE_RING_QUEUE_FIXED:    Enqueue a fixed number of items to a ring
+ *   RTE_RING_QUEUE_VARIABLE: Enqueue as many items as possible to a ring
+ * @param is_sp
+ *   Indicates whether to use single producer or multi-producer head update
+ * @param free_space
+ *   returns the amount of space after the enqueue operation has finished
+ * @return
+ *   Actual number of objects enqueued.
+ *   If behavior == RTE_RING_QUEUE_FIXED, this will be 0 or n only.
+ */
+static __rte_always_inline unsigned int
+__rte_ring_do_enqueue_elem(struct rte_ring *r, void * const obj_table,
+		unsigned int esize, unsigned int n,
+		enum rte_ring_queue_behavior behavior, unsigned int is_sp,
+		unsigned int *free_space)
+{
+	uint32_t prod_head, prod_next;
+	uint32_t free_entries;
+
+	n = __rte_ring_move_prod_head(r, is_sp, n, behavior,
+			&prod_head, &prod_next, &free_entries);
+	if (n == 0)
+		goto end;
+
+	ENQUEUE_PTRS_ELEM(r, &r[1], prod_head, obj_table, esize, n);
+
+	update_tail(&r->prod, prod_head, prod_next, is_sp, 1);
+end:
+	if (free_space != NULL)
+		*free_space = free_entries - n;
+	return n;
+}
+
+/**
+ * @internal Dequeue several objects from the ring
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of objects that will be filled.
+ * @param esize
+ *   The size of ring element, in bytes. It must be a multiple of 4.
+ *   Currently, sizes 4, 8 and 16 are supported. This should be the same
+ *   as passed while creating the ring, otherwise the results are undefined.
+ * @param n
+ *   The number of objects to pull from the ring.
+ * @param behavior
+ *   RTE_RING_QUEUE_FIXED:    Dequeue a fixed number of items from a ring
+ *   RTE_RING_QUEUE_VARIABLE: Dequeue as many items as possible from ring
+ * @param is_sc
+ *   Indicates whether to use single consumer or multi-consumer head update
+ * @param available
+ *   returns the number of remaining ring entries after the dequeue has finished
+ * @return
+ *   - Actual number of objects dequeued.
+ *     If behavior == RTE_RING_QUEUE_FIXED, this will be 0 or n only.
+ */
+static __rte_always_inline unsigned int
+__rte_ring_do_dequeue_elem(struct rte_ring *r, void *obj_table,
+		unsigned int esize, unsigned int n,
+		enum rte_ring_queue_behavior behavior, unsigned int is_sc,
+		unsigned int *available)
+{
+	uint32_t cons_head, cons_next;
+	uint32_t entries;
+
+	n = __rte_ring_move_cons_head(r, (int)is_sc, n, behavior,
+			&cons_head, &cons_next, &entries);
+	if (n == 0)
+		goto end;
+
+	DEQUEUE_PTRS_ELEM(r, &r[1], cons_head, obj_table, esize, n);
+
+	update_tail(&r->cons, cons_head, cons_next, is_sc, 0);
+
+end:
+	if (available != NULL)
+		*available = entries - n;
+	return n;
+}
+
+/**
+ * Enqueue several objects on the ring (multi-producers safe).
+ *
+ * This function uses a "compare and set" instruction to move the
+ * producer index atomically.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of objects, each esize bytes long.
+ * @param esize
+ *   The size of ring element, in bytes. It must be a multiple of 4.
+ *   Currently, sizes 4, 8 and 16 are supported. This should be the same
+ *   as passed while creating the ring, otherwise the results are undefined.
+ * @param n
+ *   The number of objects to add in the ring from the obj_table.
+ * @param free_space
+ *   if non-NULL, returns the amount of space in the ring after the
+ *   enqueue operation has finished.
+ * @return
+ *   The number of objects enqueued, either 0 or n
+ */
+static __rte_always_inline unsigned int
+rte_ring_mp_enqueue_bulk_elem(struct rte_ring *r, void * const obj_table,
+		unsigned int esize, unsigned int n, unsigned int *free_space)
+{
+	return __rte_ring_do_enqueue_elem(r, obj_table, esize, n,
+			RTE_RING_QUEUE_FIXED, __IS_MP, free_space);
+}
+
+/**
+ * Enqueue several objects on a ring (NOT multi-producers safe).
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of objects, each esize bytes long.
+ * @param esize
+ *   The size of ring element, in bytes. It must be a multiple of 4.
+ *   Currently, sizes 4, 8 and 16 are supported. This should be the same
+ *   as passed while creating the ring, otherwise the results are undefined.
+ * @param n
+ *   The number of objects to add in the ring from the obj_table.
+ * @param free_space
+ *   if non-NULL, returns the amount of space in the ring after the
+ *   enqueue operation has finished.
+ * @return
+ *   The number of objects enqueued, either 0 or n
+ */
+static __rte_always_inline unsigned int
+rte_ring_sp_enqueue_bulk_elem(struct rte_ring *r, void * const obj_table,
+		unsigned int esize, unsigned int n, unsigned int *free_space)
+{
+	return __rte_ring_do_enqueue_elem(r, obj_table, esize, n,
+			RTE_RING_QUEUE_FIXED, __IS_SP, free_space);
+}
+
+/**
+ * Enqueue several objects on a ring.
+ *
+ * This function calls the multi-producer or the single-producer
+ * version depending on the default behavior that was specified at
+ * ring creation time (see flags).
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of objects, each esize bytes long.
+ * @param esize
+ *   The size of ring element, in bytes. It must be a multiple of 4.
+ *   Currently, sizes 4, 8 and 16 are supported. This should be the same
+ *   as passed while creating the ring, otherwise the results are undefined.
+ * @param n
+ *   The number of objects to add in the ring from the obj_table.
+ * @param free_space
+ *   if non-NULL, returns the amount of space in the ring after the
+ *   enqueue operation has finished.
+ * @return
+ *   The number of objects enqueued, either 0 or n
+ */
+static __rte_always_inline unsigned int
+rte_ring_enqueue_bulk_elem(struct rte_ring *r, void * const obj_table,
+		unsigned int esize, unsigned int n, unsigned int *free_space)
+{
+	return __rte_ring_do_enqueue_elem(r, obj_table, esize, n,
+			RTE_RING_QUEUE_FIXED, r->prod.single, free_space);
+}
+
+/**
+ * Enqueue one object on a ring (multi-producers safe).
+ *
+ * This function uses a "compare and set" instruction to move the
+ * producer index atomically.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj
+ *   A pointer to the object to be added.
+ * @param esize
+ *   The size of ring element, in bytes. It must be a multiple of 4.
+ *   Currently, sizes 4, 8 and 16 are supported. This should be the same
+ *   as passed while creating the ring, otherwise the results are undefined.
+ * @return
+ *   - 0: Success; objects enqueued.
+ *   - -ENOBUFS: Not enough room in the ring to enqueue; no object is enqueued.
+ */
+static __rte_always_inline int
+rte_ring_mp_enqueue_elem(struct rte_ring *r, void *obj, unsigned int esize)
+{
+	return rte_ring_mp_enqueue_bulk_elem(r, obj, esize, 1, NULL) ? 0 :
+								-ENOBUFS;
+}
+
+/**
+ * Enqueue one object on a ring (NOT multi-producers safe).
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj
+ *   A pointer to the object to be added.
+ * @param esize
+ *   The size of ring element, in bytes. It must be a multiple of 4.
+ *   Currently, sizes 4, 8 and 16 are supported. This should be the same
+ *   as passed while creating the ring, otherwise the results are undefined.
+ * @return
+ *   - 0: Success; objects enqueued.
+ *   - -ENOBUFS: Not enough room in the ring to enqueue; no object is enqueued.
+ */
+static __rte_always_inline int
+rte_ring_sp_enqueue_elem(struct rte_ring *r, void *obj, unsigned int esize)
+{
+	return rte_ring_sp_enqueue_bulk_elem(r, obj, esize, 1, NULL) ? 0 :
+								-ENOBUFS;
+}
+
+/**
+ * Enqueue one object on a ring.
+ *
+ * This function calls the multi-producer or the single-producer
+ * version, depending on the default behaviour that was specified at
+ * ring creation time (see flags).
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj
+ *   A pointer to the object to be added.
+ * @param esize
+ *   The size of ring element, in bytes. It must be a multiple of 4.
+ *   Currently, sizes 4, 8 and 16 are supported. This should be the same
+ *   as passed while creating the ring, otherwise the results are undefined.
+ * @return
+ *   - 0: Success; objects enqueued.
+ *   - -ENOBUFS: Not enough room in the ring to enqueue; no object is enqueued.
+ */
+static __rte_always_inline int
+rte_ring_enqueue_elem(struct rte_ring *r, void *obj, unsigned int esize)
+{
+	return rte_ring_enqueue_bulk_elem(r, obj, esize, 1, NULL) ? 0 :
+								-ENOBUFS;
+}
+
+/**
+ * Dequeue several objects from a ring (multi-consumers safe).
+ *
+ * This function uses a "compare and set" instruction to move the
+ * consumer index atomically.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of objects that will be filled.
+ * @param esize
+ *   The size of ring element, in bytes. It must be a multiple of 4.
+ *   Currently, sizes 4, 8 and 16 are supported. This should be the same
+ *   as passed while creating the ring, otherwise the results are undefined.
+ * @param n
+ *   The number of objects to dequeue from the ring to the obj_table.
+ * @param available
+ *   If non-NULL, returns the number of remaining ring entries after the
+ *   dequeue has finished.
+ * @return
+ *   The number of objects dequeued, either 0 or n
+ */
+static __rte_always_inline unsigned int
+rte_ring_mc_dequeue_bulk_elem(struct rte_ring *r, void *obj_table,
+		unsigned int esize, unsigned int n, unsigned int *available)
+{
+	return __rte_ring_do_dequeue_elem(r, obj_table, esize, n,
+				RTE_RING_QUEUE_FIXED, __IS_MC, available);
+}
+
+/**
+ * Dequeue several objects from a ring (NOT multi-consumers safe).
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of objects that will be filled.
+ * @param esize
+ *   The size of ring element, in bytes. It must be a multiple of 4.
+ *   Currently, sizes 4, 8 and 16 are supported. This should be the same
+ *   as passed while creating the ring, otherwise the results are undefined.
+ * @param n
+ *   The number of objects to dequeue from the ring to the obj_table,
+ *   must be strictly positive.
+ * @param available
+ *   If non-NULL, returns the number of remaining ring entries after the
+ *   dequeue has finished.
+ * @return
+ *   The number of objects dequeued, either 0 or n
+ */
+static __rte_always_inline unsigned int
+rte_ring_sc_dequeue_bulk_elem(struct rte_ring *r, void *obj_table,
+		unsigned int esize, unsigned int n, unsigned int *available)
+{
+	return __rte_ring_do_dequeue_elem(r, obj_table, esize, n,
+			RTE_RING_QUEUE_FIXED, __IS_SC, available);
+}
+
+/**
+ * Dequeue several objects from a ring.
+ *
+ * This function calls the multi-consumers or the single-consumer
+ * version, depending on the default behaviour that was specified at
+ * ring creation time (see flags).
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of objects that will be filled.
+ * @param esize
+ *   The size of ring element, in bytes. It must be a multiple of 4.
+ *   Currently, sizes 4, 8 and 16 are supported. This should be the same
+ *   as passed while creating the ring, otherwise the results are undefined.
+ * @param n
+ *   The number of objects to dequeue from the ring to the obj_table.
+ * @param available
+ *   If non-NULL, returns the number of remaining ring entries after the
+ *   dequeue has finished.
+ * @return
+ *   The number of objects dequeued, either 0 or n
+ */
+static __rte_always_inline unsigned int
+rte_ring_dequeue_bulk_elem(struct rte_ring *r, void *obj_table,
+		unsigned int esize, unsigned int n, unsigned int *available)
+{
+	return __rte_ring_do_dequeue_elem(r, obj_table, esize, n,
+			RTE_RING_QUEUE_FIXED, r->cons.single, available);
+}
+
+/**
+ * Dequeue one object from a ring (multi-consumers safe).
+ *
+ * This function uses a "compare and set" instruction to move the
+ * consumer index atomically.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_p
+ *   A pointer to an object of esize bytes that will be filled.
+ * @param esize
+ *   The size of ring element, in bytes. It must be a multiple of 4.
+ *   Currently, sizes 4, 8 and 16 are supported. This should be the same
+ *   as passed while creating the ring, otherwise the results are undefined.
+ * @return
+ *   - 0: Success; objects dequeued.
+ *   - -ENOENT: Not enough entries in the ring to dequeue; no object is
+ *     dequeued.
+ */
+static __rte_always_inline int
+rte_ring_mc_dequeue_elem(struct rte_ring *r, void *obj_p,
+				unsigned int esize)
+{
+	return rte_ring_mc_dequeue_bulk_elem(r, obj_p, esize, 1, NULL) ? 0 :
+								-ENOENT;
+}
+
+/**
+ * Dequeue one object from a ring (NOT multi-consumers safe).
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_p
+ *   A pointer to an object of esize bytes that will be filled.
+ * @param esize
+ *   The size of ring element, in bytes. It must be a multiple of 4.
+ *   Currently, sizes 4, 8 and 16 are supported. This should be the same
+ *   as passed while creating the ring, otherwise the results are undefined.
+ * @return
+ *   - 0: Success; objects dequeued.
+ *   - -ENOENT: Not enough entries in the ring to dequeue, no object is
+ *     dequeued.
+ */
+static __rte_always_inline int
+rte_ring_sc_dequeue_elem(struct rte_ring *r, void *obj_p,
+				unsigned int esize)
+{
+	return rte_ring_sc_dequeue_bulk_elem(r, obj_p, esize, 1, NULL) ? 0 :
+								-ENOENT;
+}
+
+/**
+ * Dequeue one object from a ring.
+ *
+ * This function calls the multi-consumers or the single-consumer
+ * version depending on the default behaviour that was specified at
+ * ring creation time (see flags).
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_p
+ *   A pointer to an object of esize bytes that will be filled.
+ * @param esize
+ *   The size of ring element, in bytes. It must be a multiple of 4.
+ *   Currently, sizes 4, 8 and 16 are supported. This should be the same
+ *   as passed while creating the ring, otherwise the results are undefined.
+ * @return
+ *   - 0: Success, objects dequeued.
+ *   - -ENOENT: Not enough entries in the ring to dequeue, no object is
+ *     dequeued.
+ */
+static __rte_always_inline int
+rte_ring_dequeue_elem(struct rte_ring *r, void *obj_p, unsigned int esize)
+{
+	return rte_ring_dequeue_bulk_elem(r, obj_p, esize, 1, NULL) ? 0 :
+								-ENOENT;
+}
+
+/**
+ * Enqueue several objects on the ring (multi-producers safe).
+ *
+ * This function uses a "compare and set" instruction to move the
+ * producer index atomically.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of objects, each esize bytes long.
+ * @param esize
+ *   The size of ring element, in bytes. It must be a multiple of 4.
+ *   Currently, sizes 4, 8 and 16 are supported. This should be the same
+ *   as passed while creating the ring, otherwise the results are undefined.
+ * @param n
+ *   The number of objects to add in the ring from the obj_table.
+ * @param free_space
+ *   if non-NULL, returns the amount of space in the ring after the
+ *   enqueue operation has finished.
+ * @return
+ *   - n: Actual number of objects enqueued.
+ */
+static __rte_always_inline unsigned
+rte_ring_mp_enqueue_burst_elem(struct rte_ring *r, void * const obj_table,
+		unsigned int esize, unsigned int n, unsigned int *free_space)
+{
+	return __rte_ring_do_enqueue_elem(r, obj_table, esize, n,
+			RTE_RING_QUEUE_VARIABLE, __IS_MP, free_space);
+}
+
+/**
+ * Enqueue several objects on a ring (NOT multi-producers safe).
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of objects, each esize bytes long.
+ * @param esize
+ *   The size of ring element, in bytes. It must be a multiple of 4.
+ *   Currently, sizes 4, 8 and 16 are supported. This should be the same
+ *   as passed while creating the ring, otherwise the results are undefined.
+ * @param n
+ *   The number of objects to add in the ring from the obj_table.
+ * @param free_space
+ *   if non-NULL, returns the amount of space in the ring after the
+ *   enqueue operation has finished.
+ * @return
+ *   - n: Actual number of objects enqueued.
+ */
+static __rte_always_inline unsigned
+rte_ring_sp_enqueue_burst_elem(struct rte_ring *r, void * const obj_table,
+		unsigned int esize, unsigned int n, unsigned int *free_space)
+{
+	return __rte_ring_do_enqueue_elem(r, obj_table, esize, n,
+			RTE_RING_QUEUE_VARIABLE, __IS_SP, free_space);
+}
+
+/**
+ * Enqueue several objects on a ring.
+ *
+ * This function calls the multi-producer or the single-producer
+ * version depending on the default behavior that was specified at
+ * ring creation time (see flags).
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of objects, each esize bytes long.
+ * @param esize
+ *   The size of ring element, in bytes. It must be a multiple of 4.
+ *   Currently, sizes 4, 8 and 16 are supported. This should be the same
+ *   as passed while creating the ring, otherwise the results are undefined.
+ * @param n
+ *   The number of objects to add in the ring from the obj_table.
+ * @param free_space
+ *   if non-NULL, returns the amount of space in the ring after the
+ *   enqueue operation has finished.
+ * @return
+ *   - n: Actual number of objects enqueued.
+ */
+static __rte_always_inline unsigned
+rte_ring_enqueue_burst_elem(struct rte_ring *r, void * const obj_table,
+		unsigned int esize, unsigned int n, unsigned int *free_space)
+{
+	return __rte_ring_do_enqueue_elem(r, obj_table, esize, n,
+			RTE_RING_QUEUE_VARIABLE, r->prod.single, free_space);
+}
+
+/**
+ * Dequeue several objects from a ring (multi-consumers safe). When more
+ * objects are requested than are available, only the available objects
+ * are dequeued.
+ *
+ * This function uses a "compare and set" instruction to move the
+ * consumer index atomically.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of objects that will be filled.
+ * @param esize
+ *   The size of ring element, in bytes. It must be a multiple of 4.
+ *   Currently, sizes 4, 8 and 16 are supported. This should be the same
+ *   as passed while creating the ring, otherwise the results are undefined.
+ * @param n
+ *   The number of objects to dequeue from the ring to the obj_table.
+ * @param available
+ *   If non-NULL, returns the number of remaining ring entries after the
+ *   dequeue has finished.
+ * @return
+ *   - n: Actual number of objects dequeued, 0 if ring is empty
+ */
+static __rte_always_inline unsigned
+rte_ring_mc_dequeue_burst_elem(struct rte_ring *r, void *obj_table,
+		unsigned int esize, unsigned int n, unsigned int *available)
+{
+	return __rte_ring_do_dequeue_elem(r, obj_table, esize, n,
+			RTE_RING_QUEUE_VARIABLE, __IS_MC, available);
+}
+
+/**
+ * Dequeue several objects from a ring (NOT multi-consumers safe). When the
+ * requested number of objects exceeds the number available, only the
+ * available objects are dequeued.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of objects, each esize bytes long, to be filled.
+ * @param esize
+ *   The size of ring element, in bytes. It must be a multiple of 4.
+ *   Currently, sizes 4, 8 and 16 are supported. This should be the same
+ *   as passed while creating the ring, otherwise the results are undefined.
+ * @param n
+ *   The number of objects to dequeue from the ring to the obj_table.
+ * @param available
+ *   If non-NULL, returns the number of remaining ring entries after the
+ *   dequeue has finished.
+ * @return
+ *   - n: Actual number of objects dequeued, 0 if ring is empty
+ */
+static __rte_always_inline unsigned
+rte_ring_sc_dequeue_burst_elem(struct rte_ring *r, void *obj_table,
+		unsigned int esize, unsigned int n, unsigned int *available)
+{
+	return __rte_ring_do_dequeue_elem(r, obj_table, esize, n,
+			RTE_RING_QUEUE_VARIABLE, __IS_SC, available);
+}
+
+/**
+ * Dequeue multiple objects from a ring up to a maximum number.
+ *
+ * This function calls the multi-consumers or the single-consumer
+ * version, depending on the default behaviour that was specified at
+ * ring creation time (see flags).
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of objects, each esize bytes long, to be filled.
+ * @param esize
+ *   The size of ring element, in bytes. It must be a multiple of 4.
+ *   Currently, sizes 4, 8 and 16 are supported. This should be the same
+ *   as passed while creating the ring, otherwise the results are undefined.
+ * @param n
+ *   The number of objects to dequeue from the ring to the obj_table.
+ * @param available
+ *   If non-NULL, returns the number of remaining ring entries after the
+ *   dequeue has finished.
+ * @return
+ *   - Number of objects dequeued
+ */
+static __rte_always_inline unsigned
+rte_ring_dequeue_burst_elem(struct rte_ring *r, void *obj_table,
+		unsigned int esize, unsigned int n, unsigned int *available)
+{
+	return __rte_ring_do_dequeue_elem(r, obj_table, esize, n,
+				RTE_RING_QUEUE_VARIABLE,
+				r->cons.single, available);
+}
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* _RTE_RING_ELEM_H_ */
diff --git a/lib/librte_ring/rte_ring_version.map b/lib/librte_ring/rte_ring_version.map
index 510c1386e..e410a7503 100644
--- a/lib/librte_ring/rte_ring_version.map
+++ b/lib/librte_ring/rte_ring_version.map
@@ -21,6 +21,8 @@ DPDK_2.2 {
 EXPERIMENTAL {
 	global:
 
+	rte_ring_create_elem;
+	rte_ring_get_memsize_elem;
 	rte_ring_reset;
 
 };
-- 
2.17.1



* [dpdk-dev] [RFC v6 3/6] test/ring: add functional tests for configurable element size ring
  2019-10-21  0:22   ` [dpdk-dev] [RFC v6 0/6] lib/ring: APIs to support custom element size Honnappa Nagarahalli
  2019-10-21  0:22     ` [dpdk-dev] [RFC v6 1/6] test/ring: use division for cycle count calculation Honnappa Nagarahalli
  2019-10-21  0:22     ` [dpdk-dev] [RFC v6 2/6] lib/ring: apis to support configurable element size Honnappa Nagarahalli
@ 2019-10-21  0:22     ` Honnappa Nagarahalli
  2019-10-23 10:01       ` Olivier Matz
  2019-10-21  0:22     ` [dpdk-dev] [RFC v6 4/6] test/ring: add perf " Honnappa Nagarahalli
                       ` (3 subsequent siblings)
  6 siblings, 1 reply; 173+ messages in thread
From: Honnappa Nagarahalli @ 2019-10-21  0:22 UTC (permalink / raw)
  To: olivier.matz, sthemmin, jerinj, bruce.richardson, david.marchand,
	pbhagavatula, konstantin.ananyev, drc, hemant.agrawal,
	honnappa.nagarahalli
  Cc: dev, dharmik.thakkar, ruifeng.wang, gavin.hu

Add functional tests for rte_ring_xxx_elem APIs. At this point these
are derived mainly from existing rte_ring_xxx test cases.

Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
---
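Note for readers: the bulk and burst variants exercised by these tests
differ only in what happens when a request does not fit. Bulk is
all-or-nothing, burst transfers as many elements as possible. Below is a
minimal single-threaded sketch of that behaviour with the element size
fixed at init time. It is not the rte_ring implementation: toy_ring and
all toy_* names are hypothetical, and the one-unused-slot capacity rule is
an assumption chosen to mirror what the tests expect (RING_SIZE - 1 usable
entries).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical toy ring, NOT rte_ring: single-threaded, power-of-two
 * slot count, one slot always left unused (capacity = TOY_SLOTS - 1). */
#define TOY_SLOTS 8u
#define TOY_MAX_ESIZE 16u

struct toy_ring {
	uint32_t head;  /* free-running producer index */
	uint32_t tail;  /* free-running consumer index */
	uint32_t esize; /* element size in bytes, fixed at init */
	uint8_t data[TOY_SLOTS * TOY_MAX_ESIZE];
};

static inline void
toy_ring_init(struct toy_ring *r, uint32_t esize)
{
	assert(esize != 0 && esize % 4 == 0 && esize <= TOY_MAX_ESIZE);
	memset(r, 0, sizeof(*r));
	r->esize = esize;
}

static inline uint32_t
toy_ring_count(const struct toy_ring *r)
{
	return r->head - r->tail;
}

static inline uint32_t
toy_ring_free_count(const struct toy_ring *r)
{
	return TOY_SLOTS - 1 - toy_ring_count(r);
}

/* fixed != 0: bulk (all or nothing); fixed == 0: burst (as many as fit).
 * Returns the number of elements actually enqueued. */
static inline uint32_t
toy_enqueue(struct toy_ring *r, const void *objs, uint32_t n, int fixed)
{
	uint32_t i, free_entries = toy_ring_free_count(r);

	if (n > free_entries)
		n = fixed ? 0 : free_entries;
	for (i = 0; i < n; i++) {
		uint32_t slot = (r->head + i) & (TOY_SLOTS - 1);

		memcpy(&r->data[slot * r->esize],
			(const uint8_t *)objs + i * r->esize, r->esize);
	}
	r->head += n;
	return n;
}

/* Same bulk/burst convention as toy_enqueue, for the consumer side. */
static inline uint32_t
toy_dequeue(struct toy_ring *r, void *objs, uint32_t n, int fixed)
{
	uint32_t i, entries = toy_ring_count(r);

	if (n > entries)
		n = fixed ? 0 : entries;
	for (i = 0; i < n; i++) {
		uint32_t slot = (r->tail + i) & (TOY_SLOTS - 1);

		memcpy((uint8_t *)objs + i * r->esize,
			&r->data[slot * r->esize], r->esize);
	}
	r->tail += n;
	return n;
}
```

With TOY_SLOTS = 8 the toy ring holds at most 7 elements, so after
enqueuing 5, a bulk request for 3 enqueues nothing while a burst request
for 3 enqueues 2. Any data check on the dequeued table has to cover
n * esize bytes, not n bytes.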
 app/test/Makefile         |   1 +
 app/test/meson.build      |   1 +
 app/test/test_ring_elem.c | 859 ++++++++++++++++++++++++++++++++++++++
 3 files changed, 861 insertions(+)
 create mode 100644 app/test/test_ring_elem.c

diff --git a/app/test/Makefile b/app/test/Makefile
index 26ba6fe2b..483865b4a 100644
--- a/app/test/Makefile
+++ b/app/test/Makefile
@@ -77,6 +77,7 @@ SRCS-y += test_external_mem.c
 SRCS-y += test_rand_perf.c
 
 SRCS-y += test_ring.c
+SRCS-y += test_ring_elem.c
 SRCS-y += test_ring_perf.c
 SRCS-y += test_pmd_perf.c
 
diff --git a/app/test/meson.build b/app/test/meson.build
index ec40943bd..1ca25c00a 100644
--- a/app/test/meson.build
+++ b/app/test/meson.build
@@ -100,6 +100,7 @@ test_sources = files('commands.c',
 	'test_red.c',
 	'test_reorder.c',
 	'test_ring.c',
+	'test_ring_elem.c',
 	'test_ring_perf.c',
 	'test_rwlock.c',
 	'test_sched.c',
diff --git a/app/test/test_ring_elem.c b/app/test/test_ring_elem.c
new file mode 100644
index 000000000..54ae35a71
--- /dev/null
+++ b/app/test/test_ring_elem.c
@@ -0,0 +1,859 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2010-2014 Intel Corporation
+ */
+
+#include <string.h>
+#include <stdarg.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <stdint.h>
+#include <inttypes.h>
+#include <errno.h>
+#include <sys/queue.h>
+
+#include <rte_common.h>
+#include <rte_log.h>
+#include <rte_memory.h>
+#include <rte_launch.h>
+#include <rte_cycles.h>
+#include <rte_eal.h>
+#include <rte_per_lcore.h>
+#include <rte_lcore.h>
+#include <rte_atomic.h>
+#include <rte_branch_prediction.h>
+#include <rte_malloc.h>
+#include <rte_ring.h>
+#include <rte_ring_elem.h>
+#include <rte_random.h>
+#include <rte_errno.h>
+#include <rte_hexdump.h>
+
+#include "test.h"
+
+/*
+ * Ring
+ * ====
+ *
+ * #. Basic tests: done on one core:
+ *
+ *    - Using single producer/single consumer functions:
+ *
+ *      - Enqueue one object, two objects, MAX_BULK objects
+ *      - Dequeue one object, two objects, MAX_BULK objects
+ *      - Check that dequeued pointers are correct
+ *
+ *    - Using multi producers/multi consumers functions:
+ *
+ *      - Enqueue one object, two objects, MAX_BULK objects
+ *      - Dequeue one object, two objects, MAX_BULK objects
+ *      - Check that dequeued pointers are correct
+ *
+ * #. Performance tests.
+ *
+ * Tests done in test_ring_perf.c
+ */
+
+#define RING_SIZE 4096
+#define MAX_BULK 32
+
+static rte_atomic32_t synchro;
+
+#define	TEST_RING_VERIFY(exp)						\
+	if (!(exp)) {							\
+		printf("error at %s:%d\tcondition " #exp " failed\n",	\
+		    __func__, __LINE__);				\
+		rte_ring_dump(stdout, r);				\
+		return -1;						\
+	}
+
+#define	TEST_RING_FULL_EMPTY_ITER	8
+
+/*
+ * helper routine for test_ring_basic
+ */
+static int
+test_ring_basic_full_empty(struct rte_ring *r, void * const src, void *dst)
+{
+	unsigned i, rand;
+	const unsigned rsz = RING_SIZE - 1;
+
+	printf("Basic full/empty test\n");
+
+	for (i = 0; TEST_RING_FULL_EMPTY_ITER != i; i++) {
+
+		/* random shift in the ring */
+		rand = RTE_MAX(rte_rand() % RING_SIZE, 1UL);
+		printf("%s: iteration %u, random shift: %u;\n",
+		    __func__, i, rand);
+		TEST_RING_VERIFY(rte_ring_enqueue_bulk_elem(r, src, 8, rand,
+				NULL) != 0);
+		TEST_RING_VERIFY(rte_ring_dequeue_bulk_elem(r, dst, 8, rand,
+				NULL) == rand);
+
+		/* fill the ring */
+		TEST_RING_VERIFY(rte_ring_enqueue_bulk_elem(r, src, 8, rsz, NULL) != 0);
+		TEST_RING_VERIFY(0 == rte_ring_free_count(r));
+		TEST_RING_VERIFY(rsz == rte_ring_count(r));
+		TEST_RING_VERIFY(rte_ring_full(r));
+		TEST_RING_VERIFY(0 == rte_ring_empty(r));
+
+		/* empty the ring */
+		TEST_RING_VERIFY(rte_ring_dequeue_bulk_elem(r, dst, 8, rsz,
+				NULL) == rsz);
+		TEST_RING_VERIFY(rsz == rte_ring_free_count(r));
+		TEST_RING_VERIFY(0 == rte_ring_count(r));
+		TEST_RING_VERIFY(0 == rte_ring_full(r));
+		TEST_RING_VERIFY(rte_ring_empty(r));
+
+		/* check data */
+		TEST_RING_VERIFY(0 == memcmp(src, dst, rsz * sizeof(void *)));
+		rte_ring_dump(stdout, r);
+	}
+	return 0;
+}
+
+static int
+test_ring_basic(struct rte_ring *r)
+{
+	void **src = NULL, **cur_src = NULL, **dst = NULL, **cur_dst = NULL;
+	int ret;
+	unsigned i, num_elems;
+
+	/* alloc dummy object pointers */
+	src = malloc(RING_SIZE*2*sizeof(void *));
+	if (src == NULL)
+		goto fail;
+
+	for (i = 0; i < RING_SIZE*2 ; i++) {
+		src[i] = (void *)(unsigned long)i;
+	}
+	cur_src = src;
+
+	/* alloc some room for copied objects */
+	dst = malloc(RING_SIZE*2*sizeof(void *));
+	if (dst == NULL)
+		goto fail;
+
+	memset(dst, 0, RING_SIZE*2*sizeof(void *));
+	cur_dst = dst;
+
+	printf("enqueue 1 obj\n");
+	ret = rte_ring_sp_enqueue_bulk_elem(r, cur_src, 8, 1, NULL);
+	cur_src += 1;
+	if (ret == 0)
+		goto fail;
+
+	printf("enqueue 2 objs\n");
+	ret = rte_ring_sp_enqueue_bulk_elem(r, cur_src, 8, 2, NULL);
+	cur_src += 2;
+	if (ret == 0)
+		goto fail;
+
+	printf("enqueue MAX_BULK objs\n");
+	ret = rte_ring_sp_enqueue_bulk_elem(r, cur_src, 8, MAX_BULK, NULL);
+	cur_src += MAX_BULK;
+	if (ret == 0)
+		goto fail;
+
+	printf("dequeue 1 obj\n");
+	ret = rte_ring_sc_dequeue_bulk_elem(r, cur_dst, 8, 1, NULL);
+	cur_dst += 1;
+	if (ret == 0)
+		goto fail;
+
+	printf("dequeue 2 objs\n");
+	ret = rte_ring_sc_dequeue_bulk_elem(r, cur_dst, 8, 2, NULL);
+	cur_dst += 2;
+	if (ret == 0)
+		goto fail;
+
+	printf("dequeue MAX_BULK objs\n");
+	ret = rte_ring_sc_dequeue_bulk_elem(r, cur_dst, 8, MAX_BULK, NULL);
+	cur_dst += MAX_BULK;
+	if (ret == 0)
+		goto fail;
+
+	/* check data */
+	if (memcmp(src, dst, (cur_dst - dst) * sizeof(void *))) {
+		rte_hexdump(stdout, "src", src, cur_src - src);
+		rte_hexdump(stdout, "dst", dst, cur_dst - dst);
+		printf("data after dequeue is not the same\n");
+		goto fail;
+	}
+	cur_src = src;
+	cur_dst = dst;
+
+	printf("enqueue 1 obj\n");
+	ret = rte_ring_mp_enqueue_bulk_elem(r, cur_src, 8, 1, NULL);
+	cur_src += 1;
+	if (ret == 0)
+		goto fail;
+
+	printf("enqueue 2 objs\n");
+	ret = rte_ring_mp_enqueue_bulk_elem(r, cur_src, 8, 2, NULL);
+	cur_src += 2;
+	if (ret == 0)
+		goto fail;
+
+	printf("enqueue MAX_BULK objs\n");
+	ret = rte_ring_mp_enqueue_bulk_elem(r, cur_src, 8, MAX_BULK, NULL);
+	cur_src += MAX_BULK;
+	if (ret == 0)
+		goto fail;
+
+	printf("dequeue 1 obj\n");
+	ret = rte_ring_mc_dequeue_bulk_elem(r, cur_dst, 8, 1, NULL);
+	cur_dst += 1;
+	if (ret == 0)
+		goto fail;
+
+	printf("dequeue 2 objs\n");
+	ret = rte_ring_mc_dequeue_bulk_elem(r, cur_dst, 8, 2, NULL);
+	cur_dst += 2;
+	if (ret == 0)
+		goto fail;
+
+	printf("dequeue MAX_BULK objs\n");
+	ret = rte_ring_mc_dequeue_bulk_elem(r, cur_dst, 8, MAX_BULK, NULL);
+	cur_dst += MAX_BULK;
+	if (ret == 0)
+		goto fail;
+
+	/* check data */
+	if (memcmp(src, dst, (cur_dst - dst) * sizeof(void *))) {
+		rte_hexdump(stdout, "src", src, cur_src - src);
+		rte_hexdump(stdout, "dst", dst, cur_dst - dst);
+		printf("data after dequeue is not the same\n");
+		goto fail;
+	}
+	cur_src = src;
+	cur_dst = dst;
+
+	printf("fill and empty the ring\n");
+	for (i = 0; i<RING_SIZE/MAX_BULK; i++) {
+		ret = rte_ring_mp_enqueue_bulk_elem(r, cur_src, 8, MAX_BULK, NULL);
+		cur_src += MAX_BULK;
+		if (ret == 0)
+			goto fail;
+		ret = rte_ring_mc_dequeue_bulk_elem(r, cur_dst, 8, MAX_BULK, NULL);
+		cur_dst += MAX_BULK;
+		if (ret == 0)
+			goto fail;
+	}
+
+	/* check data */
+	if (memcmp(src, dst, (cur_dst - dst) * sizeof(void *))) {
+		rte_hexdump(stdout, "src", src, cur_src - src);
+		rte_hexdump(stdout, "dst", dst, cur_dst - dst);
+		printf("data after dequeue is not the same\n");
+		goto fail;
+	}
+
+	if (test_ring_basic_full_empty(r, src, dst) != 0)
+		goto fail;
+
+	cur_src = src;
+	cur_dst = dst;
+
+	printf("test default bulk enqueue / dequeue\n");
+	num_elems = 16;
+
+	cur_src = src;
+	cur_dst = dst;
+
+	ret = rte_ring_enqueue_bulk_elem(r, cur_src, 8, num_elems, NULL);
+	cur_src += num_elems;
+	if (ret == 0) {
+		printf("Cannot enqueue\n");
+		goto fail;
+	}
+	ret = rte_ring_enqueue_bulk_elem(r, cur_src, 8, num_elems, NULL);
+	cur_src += num_elems;
+	if (ret == 0) {
+		printf("Cannot enqueue\n");
+		goto fail;
+	}
+	ret = rte_ring_dequeue_bulk_elem(r, cur_dst, 8, num_elems, NULL);
+	cur_dst += num_elems;
+	if (ret == 0) {
+		printf("Cannot dequeue\n");
+		goto fail;
+	}
+	ret = rte_ring_dequeue_bulk_elem(r, cur_dst, 8, num_elems, NULL);
+	cur_dst += num_elems;
+	if (ret == 0) {
+		printf("Cannot dequeue2\n");
+		goto fail;
+	}
+
+	/* check data */
+	if (memcmp(src, dst, (cur_dst - dst) * sizeof(void *))) {
+		rte_hexdump(stdout, "src", src, cur_src - src);
+		rte_hexdump(stdout, "dst", dst, cur_dst - dst);
+		printf("data after dequeue is not the same\n");
+		goto fail;
+	}
+
+	cur_src = src;
+	cur_dst = dst;
+
+	ret = rte_ring_mp_enqueue_elem(r, cur_src, 8);
+	if (ret != 0)
+		goto fail;
+
+	ret = rte_ring_mc_dequeue_elem(r, cur_dst, 8);
+	if (ret != 0)
+		goto fail;
+
+	free(src);
+	free(dst);
+	return 0;
+
+ fail:
+	free(src);
+	free(dst);
+	return -1;
+}
+
+static int
+test_ring_burst_basic(struct rte_ring *r)
+{
+	void **src = NULL, **cur_src = NULL, **dst = NULL, **cur_dst = NULL;
+	int ret;
+	unsigned i;
+
+	/* alloc dummy object pointers */
+	src = malloc(RING_SIZE*2*sizeof(void *));
+	if (src == NULL)
+		goto fail;
+
+	for (i = 0; i < RING_SIZE*2 ; i++) {
+		src[i] = (void *)(unsigned long)i;
+	}
+	cur_src = src;
+
+	/* alloc some room for copied objects */
+	dst = malloc(RING_SIZE*2*sizeof(void *));
+	if (dst == NULL)
+		goto fail;
+
+	memset(dst, 0, RING_SIZE*2*sizeof(void *));
+	cur_dst = dst;
+
+	printf("Test SP & SC basic functions \n");
+	printf("enqueue 1 obj\n");
+	ret = rte_ring_sp_enqueue_burst_elem(r, cur_src, 8, 1, NULL);
+	cur_src += 1;
+	if (ret != 1)
+		goto fail;
+
+	printf("enqueue 2 objs\n");
+	ret = rte_ring_sp_enqueue_burst_elem(r, cur_src, 8, 2, NULL);
+	cur_src += 2;
+	if (ret != 2)
+		goto fail;
+
+	printf("enqueue MAX_BULK objs\n");
+	ret = rte_ring_sp_enqueue_burst_elem(r, cur_src, 8, MAX_BULK, NULL);
+	cur_src += MAX_BULK;
+	if (ret != MAX_BULK)
+		goto fail;
+
+	printf("dequeue 1 obj\n");
+	ret = rte_ring_sc_dequeue_burst_elem(r, cur_dst, 8, 1, NULL);
+	cur_dst += 1;
+	if (ret != 1)
+		goto fail;
+
+	printf("dequeue 2 objs\n");
+	ret = rte_ring_sc_dequeue_burst_elem(r, cur_dst, 8, 2, NULL);
+	cur_dst += 2;
+	if (ret != 2)
+		goto fail;
+
+	printf("dequeue MAX_BULK objs\n");
+	ret = rte_ring_sc_dequeue_burst_elem(r, cur_dst, 8, MAX_BULK, NULL);
+	cur_dst += MAX_BULK;
+	if (ret != MAX_BULK)
+		goto fail;
+
+	/* check data */
+	if (memcmp(src, dst, (cur_dst - dst) * sizeof(void *))) {
+		rte_hexdump(stdout, "src", src, cur_src - src);
+		rte_hexdump(stdout, "dst", dst, cur_dst - dst);
+		printf("data after dequeue is not the same\n");
+		goto fail;
+	}
+
+	cur_src = src;
+	cur_dst = dst;
+
+	printf("Test enqueue without enough memory space \n");
+	for (i = 0; i < (RING_SIZE/MAX_BULK - 1); i++) {
+		ret = rte_ring_sp_enqueue_burst_elem(r, cur_src, 8, MAX_BULK, NULL);
+		cur_src += MAX_BULK;
+		if (ret != MAX_BULK)
+			goto fail;
+	}
+
+	printf("Enqueue 2 objects, free entries = MAX_BULK - 1\n");
+	ret = rte_ring_sp_enqueue_burst_elem(r, cur_src, 8, 2, NULL);
+	cur_src += 2;
+	if (ret != 2)
+		goto fail;
+
+	printf("Enqueue the remaining entries = MAX_BULK - 3\n");
+	/* Always one free entry left */
+	ret = rte_ring_sp_enqueue_burst_elem(r, cur_src, 8, MAX_BULK, NULL);
+	cur_src += MAX_BULK - 3;
+	if (ret != MAX_BULK - 3)
+		goto fail;
+
+	printf("Test if ring is full  \n");
+	if (rte_ring_full(r) != 1)
+		goto fail;
+
+	printf("Test enqueue on a full ring\n");
+	ret = rte_ring_sp_enqueue_burst_elem(r, cur_src, 8, MAX_BULK, NULL);
+	if (ret != 0)
+		goto fail;
+
+	printf("Test dequeue without enough objects \n");
+	for (i = 0; i<RING_SIZE/MAX_BULK - 1; i++) {
+		ret = rte_ring_sc_dequeue_burst_elem(r, cur_dst, 8, MAX_BULK, NULL);
+		cur_dst += MAX_BULK;
+		if (ret != MAX_BULK)
+			goto fail;
+	}
+
+	/* MAX_BULK - 1 objects remain in the ring */
+	ret = rte_ring_sc_dequeue_burst_elem(r, cur_dst, 8, 2, NULL);
+	cur_dst += 2;
+	if (ret != 2)
+		goto fail;
+
+	ret = rte_ring_sc_dequeue_burst_elem(r, cur_dst, 8, MAX_BULK, NULL);
+	cur_dst += MAX_BULK - 3;
+	if (ret != MAX_BULK - 3)
+		goto fail;
+
+	printf("Test if ring is empty \n");
+	/* Check if ring is empty */
+	if (1 != rte_ring_empty(r))
+		goto fail;
+
+	/* check data */
+	if (memcmp(src, dst, (cur_dst - dst) * sizeof(void *))) {
+		rte_hexdump(stdout, "src", src, cur_src - src);
+		rte_hexdump(stdout, "dst", dst, cur_dst - dst);
+		printf("data after dequeue is not the same\n");
+		goto fail;
+	}
+
+	cur_src = src;
+	cur_dst = dst;
+
+	printf("Test MP & MC basic functions \n");
+
+	printf("enqueue 1 obj\n");
+	ret = rte_ring_mp_enqueue_burst_elem(r, cur_src, 8, 1, NULL);
+	cur_src += 1;
+	if (ret != 1)
+		goto fail;
+
+	printf("enqueue 2 objs\n");
+	ret = rte_ring_mp_enqueue_burst_elem(r, cur_src, 8, 2, NULL);
+	cur_src += 2;
+	if (ret != 2)
+		goto fail;
+
+	printf("enqueue MAX_BULK objs\n");
+	ret = rte_ring_mp_enqueue_burst_elem(r, cur_src, 8, MAX_BULK, NULL);
+	cur_src += MAX_BULK;
+	if (ret != MAX_BULK)
+		goto fail;
+
+	printf("dequeue 1 obj\n");
+	ret = rte_ring_mc_dequeue_burst_elem(r, cur_dst, 8, 1, NULL);
+	cur_dst += 1;
+	if (ret != 1)
+		goto fail;
+
+	printf("dequeue 2 objs\n");
+	ret = rte_ring_mc_dequeue_burst_elem(r, cur_dst, 8, 2, NULL);
+	cur_dst += 2;
+	if (ret != 2)
+		goto fail;
+
+	printf("dequeue MAX_BULK objs\n");
+	ret = rte_ring_mc_dequeue_burst_elem(r, cur_dst, 8, MAX_BULK, NULL);
+	cur_dst += MAX_BULK;
+	if (ret != MAX_BULK)
+		goto fail;
+
+	/* check data */
+	if (memcmp(src, dst, (cur_dst - dst) * sizeof(void *))) {
+		rte_hexdump(stdout, "src", src, cur_src - src);
+		rte_hexdump(stdout, "dst", dst, cur_dst - dst);
+		printf("data after dequeue is not the same\n");
+		goto fail;
+	}
+
+	cur_src = src;
+	cur_dst = dst;
+
+	printf("fill and empty the ring\n");
+	for (i = 0; i < RING_SIZE/MAX_BULK; i++) {
+		ret = rte_ring_mp_enqueue_burst_elem(r, cur_src, 8, MAX_BULK, NULL);
+		cur_src += MAX_BULK;
+		if (ret != MAX_BULK)
+			goto fail;
+		ret = rte_ring_mc_dequeue_burst_elem(r, cur_dst, 8, MAX_BULK, NULL);
+		cur_dst += MAX_BULK;
+		if (ret != MAX_BULK)
+			goto fail;
+	}
+
+	/* check data */
+	if (memcmp(src, dst, (cur_dst - dst) * sizeof(void *))) {
+		rte_hexdump(stdout, "src", src, cur_src - src);
+		rte_hexdump(stdout, "dst", dst, cur_dst - dst);
+		printf("data after dequeue is not the same\n");
+		goto fail;
+	}
+
+	cur_src = src;
+	cur_dst = dst;
+
+	printf("Test enqueue without enough memory space \n");
+	for (i = 0; i < RING_SIZE/MAX_BULK - 1; i++) {
+		ret = rte_ring_mp_enqueue_burst_elem(r, cur_src, 8, MAX_BULK, NULL);
+		cur_src += MAX_BULK;
+		if (ret != MAX_BULK)
+			goto fail;
+	}
+
+	/* MAX_BULK - 1 free entries remain in the ring */
+	ret = rte_ring_mp_enqueue_burst_elem(r, cur_src, 8, 2, NULL);
+	cur_src += 2;
+	if (ret != 2)
+		goto fail;
+
+	ret = rte_ring_mp_enqueue_burst_elem(r, cur_src, 8, MAX_BULK, NULL);
+	cur_src += MAX_BULK - 3;
+	if (ret != MAX_BULK - 3)
+		goto fail;
+
+
+	printf("Test dequeue without enough objects \n");
+	for (i = 0; i < RING_SIZE/MAX_BULK - 1; i++) {
+		ret = rte_ring_mc_dequeue_burst_elem(r, cur_dst, 8, MAX_BULK, NULL);
+		cur_dst += MAX_BULK;
+		if (ret != MAX_BULK)
+			goto fail;
+	}
+
+	/* MAX_BULK - 1 objects remain in the ring */
+	ret = rte_ring_mc_dequeue_burst_elem(r, cur_dst, 8, 2, NULL);
+	cur_dst += 2;
+	if (ret != 2)
+		goto fail;
+
+	ret = rte_ring_mc_dequeue_burst_elem(r, cur_dst, 8, MAX_BULK, NULL);
+	cur_dst += MAX_BULK - 3;
+	if (ret != MAX_BULK - 3)
+		goto fail;
+
+	/* check data */
+	if (memcmp(src, dst, (cur_dst - dst) * sizeof(void *))) {
+		rte_hexdump(stdout, "src", src, cur_src - src);
+		rte_hexdump(stdout, "dst", dst, cur_dst - dst);
+		printf("data after dequeue is not the same\n");
+		goto fail;
+	}
+
+	cur_src = src;
+	cur_dst = dst;
+
+	printf("Covering rte_ring_enqueue_burst functions \n");
+
+	ret = rte_ring_enqueue_burst_elem(r, cur_src, 8, 2, NULL);
+	cur_src += 2;
+	if (ret != 2)
+		goto fail;
+
+	ret = rte_ring_dequeue_burst_elem(r, cur_dst, 8, 2, NULL);
+	cur_dst += 2;
+	if (ret != 2)
+		goto fail;
+
+	/* Free memory before test completed */
+	free(src);
+	free(dst);
+	return 0;
+
+ fail:
+	free(src);
+	free(dst);
+	return -1;
+}
+
+/*
+ * Ring creation must always fail when given an invalid ring size.
+ */
+static int
+test_ring_creation_with_wrong_size(void)
+{
+	struct rte_ring * rp = NULL;
+
+	/* Test if ring size is not power of 2 */
+	rp = rte_ring_create_elem("test_bad_ring_size", RING_SIZE + 1, 8, SOCKET_ID_ANY, 0);
+	if (NULL != rp) {
+		return -1;
+	}
+
+	/* Test if ring size is exceeding the limit */
+	rp = rte_ring_create_elem("test_bad_ring_size", (RTE_RING_SZ_MASK + 1), 8, SOCKET_ID_ANY, 0);
+	if (NULL != rp) {
+		return -1;
+	}
+	return 0;
+}
+
+/*
+ * Ring creation must always fail when the ring name is already in use.
+ */
+static int
+test_ring_creation_with_an_used_name(void)
+{
+	struct rte_ring * rp;
+
+	rp = rte_ring_create_elem("test", RING_SIZE, 8, SOCKET_ID_ANY, 0);
+	if (NULL != rp)
+		return -1;
+
+	return 0;
+}
+
+/*
+ * Test that a non-power-of-2 count causes the create
+ * function to fail correctly
+ */
+static int
+test_create_count_odd(void)
+{
+	struct rte_ring *r = rte_ring_create_elem("test_ring_count",
+			4097, 8, SOCKET_ID_ANY, 0 );
+	if(r != NULL){
+		return -1;
+	}
+	return 0;
+}
+
+/*
+ * it tests some more basic ring operations
+ */
+static int
+test_ring_basic_ex(void)
+{
+	int ret = -1;
+	unsigned i;
+	struct rte_ring *rp = NULL;
+	void **obj = NULL;
+
+	obj = rte_calloc("test_ring_basic_ex_malloc", RING_SIZE, sizeof(void *), 0);
+	if (obj == NULL) {
+		printf("test_ring_basic_ex fail to rte_calloc\n");
+		goto fail_test;
+	}
+
+	rp = rte_ring_create_elem("test_ring_basic_ex", RING_SIZE, 8, SOCKET_ID_ANY,
+			RING_F_SP_ENQ | RING_F_SC_DEQ);
+	if (rp == NULL) {
+		printf("test_ring_basic_ex fail to create ring\n");
+		goto fail_test;
+	}
+
+	if (rte_ring_lookup("test_ring_basic_ex") != rp) {
+		goto fail_test;
+	}
+
+	if (rte_ring_empty(rp) != 1) {
+		printf("test_ring_basic_ex ring is not empty but it should be\n");
+		goto fail_test;
+	}
+
+	printf("%u ring entries are now free\n", rte_ring_free_count(rp));
+
+	for (i = 0; i < RING_SIZE; i ++) {
+		rte_ring_enqueue_elem(rp, &obj[i], 8);
+	}
+
+	if (rte_ring_full(rp) != 1) {
+		printf("test_ring_basic_ex ring is not full but it should be\n");
+		goto fail_test;
+	}
+
+	for (i = 0; i < RING_SIZE; i ++) {
+		rte_ring_dequeue_elem(rp, &obj[i], 8);
+	}
+
+	if (rte_ring_empty(rp) != 1) {
+		printf("test_ring_basic_ex ring is not empty but it should be\n");
+		goto fail_test;
+	}
+
+	/* Covering the ring burst operation */
+	ret = rte_ring_enqueue_burst_elem(rp, obj, 8, 2, NULL);
+	if (ret != 2) {
+		printf("test_ring_basic_ex: rte_ring_enqueue_burst fails \n");
+		goto fail_test;
+	}
+
+	ret = rte_ring_dequeue_burst_elem(rp, obj, 8, 2, NULL);
+	if (ret != 2) {
+		printf("test_ring_basic_ex: rte_ring_dequeue_burst fails \n");
+		goto fail_test;
+	}
+
+	ret = 0;
+fail_test:
+	rte_ring_free(rp);
+	if (obj != NULL)
+		rte_free(obj);
+
+	return ret;
+}
+
+static int
+test_ring_with_exact_size(void)
+{
+	struct rte_ring *std_ring = NULL, *exact_sz_ring = NULL;
+	void *ptr_array[16];
+	static const unsigned int ring_sz = RTE_DIM(ptr_array);
+	unsigned int i;
+	int ret = -1;
+
+	std_ring = rte_ring_create_elem("std", ring_sz, 8, rte_socket_id(),
+			RING_F_SP_ENQ | RING_F_SC_DEQ);
+	if (std_ring == NULL) {
+		printf("%s: error, can't create std ring\n", __func__);
+		goto end;
+	}
+	exact_sz_ring = rte_ring_create_elem("exact sz", ring_sz, 8, rte_socket_id(),
+			RING_F_SP_ENQ | RING_F_SC_DEQ | RING_F_EXACT_SZ);
+	if (exact_sz_ring == NULL) {
+		printf("%s: error, can't create exact size ring\n", __func__);
+		goto end;
+	}
+
+	/*
+	 * Check that the exact size ring is bigger than the standard ring
+	 */
+	if (rte_ring_get_size(std_ring) >= rte_ring_get_size(exact_sz_ring)) {
+		printf("%s: error, std ring (size: %u) is not smaller than exact size one (size %u)\n",
+				__func__,
+				rte_ring_get_size(std_ring),
+				rte_ring_get_size(exact_sz_ring));
+		goto end;
+	}
+	/*
+	 * check that the exact_sz_ring can hold one more element than the
+	 * standard ring. (16 vs 15 elements)
+	 */
+	for (i = 0; i < ring_sz - 1; i++) {
+		rte_ring_enqueue_elem(std_ring, ptr_array, 8);
+		rte_ring_enqueue_elem(exact_sz_ring, ptr_array, 8);
+	}
+	if (rte_ring_enqueue_elem(std_ring, ptr_array, 8) != -ENOBUFS) {
+		printf("%s: error, unexpected successful enqueue\n", __func__);
+		goto end;
+	}
+	if (rte_ring_enqueue_elem(exact_sz_ring, ptr_array, 8) == -ENOBUFS) {
+		printf("%s: error, enqueue failed\n", __func__);
+		goto end;
+	}
+
+	/* check that dequeue returns the expected number of elements */
+	if (rte_ring_dequeue_burst_elem(exact_sz_ring, ptr_array, 8,
+			RTE_DIM(ptr_array), NULL) != ring_sz) {
+		printf("%s: error, failed to dequeue expected nb of elements\n",
+				__func__);
+		goto end;
+	}
+
+	/* check that the capacity function returns expected value */
+	if (rte_ring_get_capacity(exact_sz_ring) != ring_sz) {
+		printf("%s: error, incorrect ring capacity reported\n",
+				__func__);
+		goto end;
+	}
+
+	ret = 0; /* all ok if we get here */
+end:
+	rte_ring_free(std_ring);
+	rte_ring_free(exact_sz_ring);
+	return ret;
+}
+
+static int
+test_ring(void)
+{
+	struct rte_ring *r = NULL;
+
+	/* some more basic operations */
+	if (test_ring_basic_ex() < 0)
+		goto test_fail;
+
+	rte_atomic32_init(&synchro);
+
+	r = rte_ring_create_elem("test", RING_SIZE, 8, SOCKET_ID_ANY, 0);
+	if (r == NULL)
+		goto test_fail;
+
+	/* retrieve the ring from its name */
+	if (rte_ring_lookup("test") != r) {
+		printf("Cannot lookup ring from its name\n");
+		goto test_fail;
+	}
+
+	/* burst operations */
+	if (test_ring_burst_basic(r) < 0)
+		goto test_fail;
+
+	/* basic operations */
+	if (test_ring_basic(r) < 0)
+		goto test_fail;
+
+	/* creation with a non-power-of-2 count must fail */
+	if ( test_create_count_odd() < 0){
+		printf("Test failed to detect odd count\n");
+		goto test_fail;
+	} else
+		printf("Test detected odd count\n");
+
+	/* test of creating ring with wrong size */
+	if (test_ring_creation_with_wrong_size() < 0)
+		goto test_fail;
+
+	/* test of creating a ring with a name already in use */
+	if (test_ring_creation_with_an_used_name() < 0)
+		goto test_fail;
+
+	if (test_ring_with_exact_size() < 0)
+		goto test_fail;
+
+	/* dump the ring status */
+	rte_ring_list_dump(stdout);
+
+	rte_ring_free(r);
+
+	return 0;
+
+test_fail:
+	rte_ring_free(r);
+
+	return -1;
+}
+
+REGISTER_TEST_COMMAND(ring_elem_autotest, test_ring);
-- 
2.17.1



* [dpdk-dev] [RFC v6 4/6] test/ring: add perf tests for configurable element size ring
  2019-10-21  0:22   ` [dpdk-dev] [RFC v6 0/6] lib/ring: APIs to support custom element size Honnappa Nagarahalli
                       ` (2 preceding siblings ...)
  2019-10-21  0:22     ` [dpdk-dev] [RFC v6 3/6] test/ring: add functional tests for configurable element size ring Honnappa Nagarahalli
@ 2019-10-21  0:22     ` Honnappa Nagarahalli
  2019-10-23 10:02       ` Olivier Matz
  2019-10-21  0:22     ` [dpdk-dev] [RFC v6 5/6] lib/ring: copy ring elements using memcpy partially Honnappa Nagarahalli
                       ` (2 subsequent siblings)
  6 siblings, 1 reply; 173+ messages in thread
From: Honnappa Nagarahalli @ 2019-10-21  0:22 UTC (permalink / raw)
  To: olivier.matz, sthemmin, jerinj, bruce.richardson, david.marchand,
	pbhagavatula, konstantin.ananyev, drc, hemant.agrawal,
	honnappa.nagarahalli
  Cc: dev, dharmik.thakkar, ruifeng.wang, gavin.hu

Add performance tests for rte_ring_xxx_elem APIs. At this point these
are derived mainly from existing rte_ring_xxx test cases.

Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
---
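Note for readers: the perf tests report an average cost per operation by
timing a fixed number of iterations and dividing the elapsed count by the
iteration count in floating point. A rough standalone sketch of that
pattern follows. It assumes a POSIX clock_gettime() in place of
rte_rdtsc(), so it reports nanoseconds rather than TSC cycles. now_ns,
avg_cost, measure and count_calls are hypothetical names, not part of the
patch.

```c
#define _POSIX_C_SOURCE 199309L

#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical stand-in for rte_rdtsc(): monotonic time in nanoseconds. */
static inline uint64_t
now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/* Average cost per iteration, computed with a floating-point division
 * so that fractional costs are not truncated. */
static inline double
avg_cost(uint64_t start, uint64_t end, uint64_t iterations)
{
	return (double)(end - start) / (double)iterations;
}

/* Trivial operation used only to exercise measure(): counts its calls. */
static inline void
count_calls(void *arg)
{
	(*(uint64_t *)arg)++;
}

/* Time `iterations` back-to-back calls of op(arg) and return the
 * average cost of one call in nanoseconds. */
static inline double
measure(void (*op)(void *), void *arg, uint64_t iterations)
{
	uint64_t i;
	const uint64_t start = now_ns();

	for (i = 0; i < iterations; i++)
		op(arg);
	return avg_cost(start, now_ns(), iterations);
}
```

In the tests the timed operation is a ring enqueue or dequeue, and the
iteration count is a power of two (1 << iter_shift), large enough that
the cost of reading the counter twice is negligible.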
 app/test/Makefile              |   1 +
 app/test/meson.build           |   1 +
 app/test/test_ring_perf_elem.c | 419 +++++++++++++++++++++++++++++++++
 3 files changed, 421 insertions(+)
 create mode 100644 app/test/test_ring_perf_elem.c

diff --git a/app/test/Makefile b/app/test/Makefile
index 483865b4a..6f168881c 100644
--- a/app/test/Makefile
+++ b/app/test/Makefile
@@ -79,6 +79,7 @@ SRCS-y += test_rand_perf.c
 SRCS-y += test_ring.c
 SRCS-y += test_ring_elem.c
 SRCS-y += test_ring_perf.c
+SRCS-y += test_ring_perf_elem.c
 SRCS-y += test_pmd_perf.c
 
 ifeq ($(CONFIG_RTE_LIBRTE_TABLE),y)
diff --git a/app/test/meson.build b/app/test/meson.build
index 1ca25c00a..634cbbf26 100644
--- a/app/test/meson.build
+++ b/app/test/meson.build
@@ -102,6 +102,7 @@ test_sources = files('commands.c',
 	'test_ring.c',
 	'test_ring_elem.c',
 	'test_ring_perf.c',
+	'test_ring_perf_elem.c',
 	'test_rwlock.c',
 	'test_sched.c',
 	'test_service_cores.c',
diff --git a/app/test/test_ring_perf_elem.c b/app/test/test_ring_perf_elem.c
new file mode 100644
index 000000000..402b7877a
--- /dev/null
+++ b/app/test/test_ring_perf_elem.c
@@ -0,0 +1,419 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2010-2014 Intel Corporation
+ */
+
+
+#include <stdio.h>
+#include <inttypes.h>
+#include <rte_ring.h>
+#include <rte_ring_elem.h>
+#include <rte_cycles.h>
+#include <rte_launch.h>
+#include <rte_pause.h>
+
+#include "test.h"
+
+/*
+ * Ring
+ * ====
+ *
+ * Measures performance of various operations using rdtsc
+ *  * Empty ring dequeue
+ *  * Enqueue/dequeue of bursts in 1 thread
+ *  * Enqueue/dequeue of bursts in 2 threads
+ */
+
+#define RING_NAME "RING_PERF"
+#define RING_SIZE 4096
+#define MAX_BURST 64
+
+/*
+ * the sizes to enqueue and dequeue in testing
+ * (marked volatile so they won't be seen as compile-time constants)
+ */
+static const volatile unsigned bulk_sizes[] = { 8, 32 };
+
+struct lcore_pair {
+	unsigned c1, c2;
+};
+
+static volatile unsigned lcore_count;
+
+/**** Functions to analyse our core mask to get cores for different tests ***/
+
+static int
+get_two_hyperthreads(struct lcore_pair *lcp)
+{
+	unsigned id1, id2;
+	unsigned c1, c2, s1, s2;
+	RTE_LCORE_FOREACH(id1) {
+		/* inner loop just re-reads all id's. We could skip the
+		 * first few elements, but since number of cores is small
+		 * there is little point
+		 */
+		RTE_LCORE_FOREACH(id2) {
+			if (id1 == id2)
+				continue;
+
+			c1 = rte_lcore_to_cpu_id(id1);
+			c2 = rte_lcore_to_cpu_id(id2);
+			s1 = rte_lcore_to_socket_id(id1);
+			s2 = rte_lcore_to_socket_id(id2);
+			if ((c1 == c2) && (s1 == s2)) {
+				lcp->c1 = id1;
+				lcp->c2 = id2;
+				return 0;
+			}
+		}
+	}
+	return 1;
+}
+
+static int
+get_two_cores(struct lcore_pair *lcp)
+{
+	unsigned id1, id2;
+	unsigned c1, c2, s1, s2;
+	RTE_LCORE_FOREACH(id1) {
+		RTE_LCORE_FOREACH(id2) {
+			if (id1 == id2)
+				continue;
+
+			c1 = rte_lcore_to_cpu_id(id1);
+			c2 = rte_lcore_to_cpu_id(id2);
+			s1 = rte_lcore_to_socket_id(id1);
+			s2 = rte_lcore_to_socket_id(id2);
+			if ((c1 != c2) && (s1 == s2)) {
+				lcp->c1 = id1;
+				lcp->c2 = id2;
+				return 0;
+			}
+		}
+	}
+	return 1;
+}
+
+static int
+get_two_sockets(struct lcore_pair *lcp)
+{
+	unsigned id1, id2;
+	unsigned s1, s2;
+	RTE_LCORE_FOREACH(id1) {
+		RTE_LCORE_FOREACH(id2) {
+			if (id1 == id2)
+				continue;
+			s1 = rte_lcore_to_socket_id(id1);
+			s2 = rte_lcore_to_socket_id(id2);
+			if (s1 != s2) {
+				lcp->c1 = id1;
+				lcp->c2 = id2;
+				return 0;
+			}
+		}
+	}
+	return 1;
+}
+
+/* Get cycle counts for dequeuing from an empty ring. Should be 2 or 3 cycles */
+static void
+test_empty_dequeue(struct rte_ring *r)
+{
+	const unsigned iter_shift = 26;
+	const unsigned iterations = 1<<iter_shift;
+	unsigned i = 0;
+	uint32_t burst[MAX_BURST];
+
+	const uint64_t sc_start = rte_rdtsc();
+	for (i = 0; i < iterations; i++)
+		rte_ring_sc_dequeue_bulk_elem(r, burst, 8, bulk_sizes[0], NULL);
+	const uint64_t sc_end = rte_rdtsc();
+
+	const uint64_t mc_start = rte_rdtsc();
+	for (i = 0; i < iterations; i++)
+		rte_ring_mc_dequeue_bulk_elem(r, burst, 8, bulk_sizes[0], NULL);
+	const uint64_t mc_end = rte_rdtsc();
+
+	printf("SC empty dequeue: %.2F\n",
+			(double)(sc_end-sc_start) / iterations);
+	printf("MC empty dequeue: %.2F\n",
+			(double)(mc_end-mc_start) / iterations);
+}
+
+/*
+ * Parameters for the separate enqueue and dequeue threads. Each thread takes
+ * one input (the burst size) and returns two outputs (the average cycle
+ * counts for the sp/sc and mp/mc variants).
+ */
+struct thread_params {
+	struct rte_ring *r;
+	unsigned size;        /* input value, the burst size */
+	double spsc, mpmc;    /* output value, the single or multi timings */
+};
+
+/*
+ * Function that uses rdtsc to measure timing for ring enqueue. Needs pair
+ * thread running dequeue_bulk function
+ */
+static int
+enqueue_bulk(void *p)
+{
+	const unsigned iter_shift = 23;
+	const unsigned iterations = 1<<iter_shift;
+	struct thread_params *params = p;
+	struct rte_ring *r = params->r;
+	const unsigned size = params->size;
+	unsigned i;
+	uint32_t burst[MAX_BURST] = {0};
+
+#ifdef RTE_USE_C11_MEM_MODEL
+	if (__atomic_add_fetch(&lcore_count, 1, __ATOMIC_RELAXED) != 2)
+#else
+	if (__sync_add_and_fetch(&lcore_count, 1) != 2)
+#endif
+		while (lcore_count != 2)
+			rte_pause();
+
+	const uint64_t sp_start = rte_rdtsc();
+	for (i = 0; i < iterations; i++)
+		while (rte_ring_sp_enqueue_bulk_elem(r, burst, 8, size, NULL)
+				== 0)
+			rte_pause();
+	const uint64_t sp_end = rte_rdtsc();
+
+	const uint64_t mp_start = rte_rdtsc();
+	for (i = 0; i < iterations; i++)
+		while (rte_ring_mp_enqueue_bulk_elem(r, burst, 8, size, NULL)
+				== 0)
+			rte_pause();
+	const uint64_t mp_end = rte_rdtsc();
+
+	params->spsc = ((double)(sp_end - sp_start))/(iterations*size);
+	params->mpmc = ((double)(mp_end - mp_start))/(iterations*size);
+	return 0;
+}
+
+/*
+ * Function that uses rdtsc to measure timing for ring dequeue. Needs pair
+ * thread running enqueue_bulk function
+ */
+static int
+dequeue_bulk(void *p)
+{
+	const unsigned iter_shift = 23;
+	const unsigned iterations = 1<<iter_shift;
+	struct thread_params *params = p;
+	struct rte_ring *r = params->r;
+	const unsigned size = params->size;
+	unsigned i;
+	uint32_t burst[MAX_BURST] = {0};
+
+#ifdef RTE_USE_C11_MEM_MODEL
+	if (__atomic_add_fetch(&lcore_count, 1, __ATOMIC_RELAXED) != 2)
+#else
+	if (__sync_add_and_fetch(&lcore_count, 1) != 2)
+#endif
+		while (lcore_count != 2)
+			rte_pause();
+
+	const uint64_t sc_start = rte_rdtsc();
+	for (i = 0; i < iterations; i++)
+		while (rte_ring_sc_dequeue_bulk_elem(r, burst, 8, size, NULL)
+				== 0)
+			rte_pause();
+	const uint64_t sc_end = rte_rdtsc();
+
+	const uint64_t mc_start = rte_rdtsc();
+	for (i = 0; i < iterations; i++)
+		while (rte_ring_mc_dequeue_bulk_elem(r, burst, 8, size, NULL)
+				== 0)
+			rte_pause();
+	const uint64_t mc_end = rte_rdtsc();
+
+	params->spsc = ((double)(sc_end - sc_start))/(iterations*size);
+	params->mpmc = ((double)(mc_end - mc_start))/(iterations*size);
+	return 0;
+}
+
+/*
+ * Function that calls the enqueue and dequeue bulk functions on pairs of cores.
+ * Used to measure ring perf between hyperthreads, cores and sockets.
+ */
+static void
+run_on_core_pair(struct lcore_pair *cores, struct rte_ring *r,
+		lcore_function_t f1, lcore_function_t f2)
+{
+	struct thread_params param1 = {0}, param2 = {0};
+	unsigned i;
+	for (i = 0; i < sizeof(bulk_sizes)/sizeof(bulk_sizes[0]); i++) {
+		lcore_count = 0;
+		param1.size = param2.size = bulk_sizes[i];
+		param1.r = param2.r = r;
+		if (cores->c1 == rte_get_master_lcore()) {
+			rte_eal_remote_launch(f2, &param2, cores->c2);
+			f1(&param1);
+			rte_eal_wait_lcore(cores->c2);
+		} else {
+			rte_eal_remote_launch(f1, &param1, cores->c1);
+			rte_eal_remote_launch(f2, &param2, cores->c2);
+			rte_eal_wait_lcore(cores->c1);
+			rte_eal_wait_lcore(cores->c2);
+		}
+		printf("SP/SC bulk enq/dequeue (size: %u): %.2F\n",
+				bulk_sizes[i], param1.spsc + param2.spsc);
+		printf("MP/MC bulk enq/dequeue (size: %u): %.2F\n",
+				bulk_sizes[i], param1.mpmc + param2.mpmc);
+	}
+}
+
+/*
+ * Test function that determines how long an enqueue + dequeue of a single item
+ * takes on a single lcore. Result is for comparison with the bulk enq+deq.
+ */
+static void
+test_single_enqueue_dequeue(struct rte_ring *r)
+{
+	const unsigned iter_shift = 24;
+	const unsigned iterations = 1<<iter_shift;
+	unsigned i = 0;
+	uint32_t burst[2] = {0};
+
+	const uint64_t sc_start = rte_rdtsc();
+	for (i = 0; i < iterations; i++) {
+		rte_ring_sp_enqueue_elem(r, burst, 8);
+		rte_ring_sc_dequeue_elem(r, burst, 8);
+	}
+	const uint64_t sc_end = rte_rdtsc();
+
+	const uint64_t mc_start = rte_rdtsc();
+	for (i = 0; i < iterations; i++) {
+		rte_ring_mp_enqueue_elem(r, burst, 8);
+		rte_ring_mc_dequeue_elem(r, burst, 8);
+	}
+	const uint64_t mc_end = rte_rdtsc();
+
+	printf("SP/SC single enq/dequeue: %.2F\n",
+			((double)(sc_end-sc_start)) / iterations);
+	printf("MP/MC single enq/dequeue: %.2F\n",
+			((double)(mc_end-mc_start)) / iterations);
+}
+
+/*
+ * Test that does both enqueue and dequeue on a core using the burst() API calls
+ * instead of the bulk() calls used in other tests. Results should be the same
+ * as for the bulk function called on a single lcore.
+ */
+static void
+test_burst_enqueue_dequeue(struct rte_ring *r)
+{
+	const unsigned iter_shift = 23;
+	const unsigned iterations = 1<<iter_shift;
+	unsigned sz, i = 0;
+	uint32_t burst[MAX_BURST] = {0};
+
+	for (sz = 0; sz < sizeof(bulk_sizes)/sizeof(bulk_sizes[0]); sz++) {
+		const uint64_t sc_start = rte_rdtsc();
+		for (i = 0; i < iterations; i++) {
+			rte_ring_sp_enqueue_burst_elem(r, burst, 8,
+					bulk_sizes[sz], NULL);
+			rte_ring_sc_dequeue_burst_elem(r, burst, 8,
+					bulk_sizes[sz], NULL);
+		}
+		const uint64_t sc_end = rte_rdtsc();
+
+		const uint64_t mc_start = rte_rdtsc();
+		for (i = 0; i < iterations; i++) {
+			rte_ring_mp_enqueue_burst_elem(r, burst, 8,
+					bulk_sizes[sz], NULL);
+			rte_ring_mc_dequeue_burst_elem(r, burst, 8,
+					bulk_sizes[sz], NULL);
+		}
+		const uint64_t mc_end = rte_rdtsc();
+
+		double mc_avg = ((double)(mc_end-mc_start) / iterations) /
+					bulk_sizes[sz];
+		double sc_avg = ((double)(sc_end-sc_start) / iterations) /
+					bulk_sizes[sz];
+
+		printf("SP/SC burst enq/dequeue (size: %u): %.2F\n",
+				bulk_sizes[sz], sc_avg);
+		printf("MP/MC burst enq/dequeue (size: %u): %.2F\n",
+				bulk_sizes[sz], mc_avg);
+	}
+}
+
+/* Times enqueue and dequeue on a single lcore */
+static void
+test_bulk_enqueue_dequeue(struct rte_ring *r)
+{
+	const unsigned iter_shift = 23;
+	const unsigned iterations = 1<<iter_shift;
+	unsigned sz, i = 0;
+	uint32_t burst[MAX_BURST] = {0};
+
+	for (sz = 0; sz < sizeof(bulk_sizes)/sizeof(bulk_sizes[0]); sz++) {
+		const uint64_t sc_start = rte_rdtsc();
+		for (i = 0; i < iterations; i++) {
+			rte_ring_sp_enqueue_bulk_elem(r, burst, 8,
+					bulk_sizes[sz], NULL);
+			rte_ring_sc_dequeue_bulk_elem(r, burst, 8,
+					bulk_sizes[sz], NULL);
+		}
+		const uint64_t sc_end = rte_rdtsc();
+
+		const uint64_t mc_start = rte_rdtsc();
+		for (i = 0; i < iterations; i++) {
+			rte_ring_mp_enqueue_bulk_elem(r, burst, 8,
+					bulk_sizes[sz], NULL);
+			rte_ring_mc_dequeue_bulk_elem(r, burst, 8,
+					bulk_sizes[sz], NULL);
+		}
+		const uint64_t mc_end = rte_rdtsc();
+
+		double sc_avg = ((double)(sc_end-sc_start) /
+				(iterations * bulk_sizes[sz]));
+		double mc_avg = ((double)(mc_end-mc_start) /
+				(iterations * bulk_sizes[sz]));
+
+		printf("SP/SC bulk enq/dequeue (size: %u): %.2F\n",
+				bulk_sizes[sz], sc_avg);
+		printf("MP/MC bulk enq/dequeue (size: %u): %.2F\n",
+				bulk_sizes[sz], mc_avg);
+	}
+}
+
+static int
+test_ring_perf_elem(void)
+{
+	struct lcore_pair cores;
+	struct rte_ring *r = NULL;
+
+	r = rte_ring_create_elem(RING_NAME, RING_SIZE, 8, rte_socket_id(), 0);
+	if (r == NULL)
+		return -1;
+
+	printf("### Testing single element and burst enq/deq ###\n");
+	test_single_enqueue_dequeue(r);
+	test_burst_enqueue_dequeue(r);
+
+	printf("\n### Testing empty dequeue ###\n");
+	test_empty_dequeue(r);
+
+	printf("\n### Testing using a single lcore ###\n");
+	test_bulk_enqueue_dequeue(r);
+
+	if (get_two_hyperthreads(&cores) == 0) {
+		printf("\n### Testing using two hyperthreads ###\n");
+		run_on_core_pair(&cores, r, enqueue_bulk, dequeue_bulk);
+	}
+	if (get_two_cores(&cores) == 0) {
+		printf("\n### Testing using two physical cores ###\n");
+		run_on_core_pair(&cores, r, enqueue_bulk, dequeue_bulk);
+	}
+	if (get_two_sockets(&cores) == 0) {
+		printf("\n### Testing using two NUMA nodes ###\n");
+		run_on_core_pair(&cores, r, enqueue_bulk, dequeue_bulk);
+	}
+	rte_ring_free(r);
+	return 0;
+}
+
+REGISTER_TEST_COMMAND(ring_perf_elem_autotest, test_ring_perf_elem);
-- 
2.17.1
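
Note on the start synchronisation used by enqueue_bulk() and dequeue_bulk() in the test above: each thread increments lcore_count and then spins until the count reaches 2, so that both timed loops begin at roughly the same moment. The following is a minimal sketch of that pattern written with C11 <stdatomic.h> instead of the GCC builtins used in the patch. The name start_barrier_sketch and its "expected" parameter are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stdatomic.h>

/*
 * Hypothetical C11 version of the start barrier: every participant bumps
 * the shared counter, then spins until all "expected" participants have
 * arrived. The last arrival sees the full count and does not spin.
 */
static inline void
start_barrier_sketch(atomic_uint *count, unsigned int expected)
{
	assert(expected > 0);

	if (atomic_fetch_add_explicit(count, 1, memory_order_relaxed) + 1
			!= expected)
		while (atomic_load_explicit(count, memory_order_relaxed)
				!= expected)
			; /* busy-wait; the test calls rte_pause() here */
}
```

In the test the counter is reset to 0 by run_on_core_pair() before each burst size, and "expected" corresponds to the constant 2.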



* [dpdk-dev] [RFC v6 5/6] lib/ring: copy ring elements using memcpy partially
  2019-10-21  0:22   ` [dpdk-dev] [RFC v6 0/6] lib/ring: APIs to support custom element size Honnappa Nagarahalli
                       ` (3 preceding siblings ...)
  2019-10-21  0:22     ` [dpdk-dev] [RFC v6 4/6] test/ring: add perf " Honnappa Nagarahalli
@ 2019-10-21  0:22     ` Honnappa Nagarahalli
  2019-10-21  0:23     ` [dpdk-dev] [RFC v6 6/6] lib/ring: improved copy function to copy ring elements Honnappa Nagarahalli
  2019-10-23  9:48     ` [dpdk-dev] [RFC v6 0/6] lib/ring: APIs to support custom element size Olivier Matz
  6 siblings, 0 replies; 173+ messages in thread
From: Honnappa Nagarahalli @ 2019-10-21  0:22 UTC (permalink / raw)
  To: olivier.matz, sthemmin, jerinj, bruce.richardson, david.marchand,
	pbhagavatula, konstantin.ananyev, drc, hemant.agrawal,
	honnappa.nagarahalli
  Cc: dev, dharmik.thakkar, ruifeng.wang, gavin.hu

Copy ring elements using memcpy for each 32B chunk. The remaining
bytes are copied using assignments.

Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
---
 lib/librte_ring/rte_ring.c      |  10 --
 lib/librte_ring/rte_ring_elem.h | 229 +++++++-------------------------
 2 files changed, 49 insertions(+), 190 deletions(-)
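
As an illustration of the copy scheme described in the commit message, here is a minimal stand-alone sketch: the element count is first converted to a count of 32-bit words, whole 32B chunks are copied with a fixed-size memcpy, and the leftover words are copied by assignment. The helper names copy_words_sketch and elems_to_words_sketch are hypothetical and not part of the patch, and the sketch assumes esize is a multiple of 4 bytes, as the esize / sizeof(uint32_t) arithmetic in the macros implies.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: number of 32-bit words occupied by n elements. */
static inline uint32_t
elems_to_words_sketch(uint32_t n, uint32_t esize)
{
	assert(esize % sizeof(uint32_t) == 0);
	return n * (esize / (uint32_t)sizeof(uint32_t));
}

/* Hypothetical helper: copy nr_num 32-bit words from src to dst. */
static inline void
copy_words_sketch(uint32_t *dst, const uint32_t *src, uint32_t nr_num)
{
	uint32_t i;

	/* Whole 32-byte chunks: fixed-size memcpy of 8 words each. */
	for (i = 0; i < (nr_num & ~(uint32_t)0x7); i += 8)
		memcpy(dst + i, src + i, 8 * sizeof(uint32_t));

	/* Remaining 0..7 words: plain assignments. */
	for (; i < nr_num; i++)
		dst[i] = src[i];
}
```

The fixed 32-byte memcpy size is a compile-time constant, which is what gives the compiler the chance to expand it inline instead of emitting a library call.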

diff --git a/lib/librte_ring/rte_ring.c b/lib/librte_ring/rte_ring.c
index e95285259..0f7f4b598 100644
--- a/lib/librte_ring/rte_ring.c
+++ b/lib/librte_ring/rte_ring.c
@@ -51,16 +51,6 @@ rte_ring_get_memsize_elem(unsigned count, unsigned esize)
 {
 	ssize_t sz;
 
-	/* Supported esize values are 4/8/16.
-	 * Others can be added on need basis.
-	 */
-	if (esize != 4 && esize != 8 && esize != 16) {
-		RTE_LOG(ERR, RING,
-			"Unsupported esize value. Supported values are 4, 8 and 16\n");
-
-		return -EINVAL;
-	}
-
 	/* count must be a power of 2 */
 	if ((!POWEROF2(count)) || (count > RTE_RING_SZ_MASK )) {
 		RTE_LOG(ERR, RING,
diff --git a/lib/librte_ring/rte_ring_elem.h b/lib/librte_ring/rte_ring_elem.h
index 7e9914567..0ce5f2be7 100644
--- a/lib/librte_ring/rte_ring_elem.h
+++ b/lib/librte_ring/rte_ring_elem.h
@@ -24,6 +24,7 @@ extern "C" {
 #include <stdint.h>
 #include <sys/queue.h>
 #include <errno.h>
+#include <string.h>
 #include <rte_common.h>
 #include <rte_config.h>
 #include <rte_memory.h>
@@ -108,215 +109,83 @@ __rte_experimental
 struct rte_ring *rte_ring_create_elem(const char *name, unsigned int count,
 			unsigned int esize, int socket_id, unsigned int flags);
 
-/* the actual enqueue of pointers on the ring.
- * Placed here since identical code needed in both
- * single and multi producer enqueue functions.
- */
-#define ENQUEUE_PTRS_ELEM(r, ring_start, prod_head, obj_table, esize, n) do { \
-	if (esize == 4) \
-		ENQUEUE_PTRS_32(r, ring_start, prod_head, obj_table, n); \
-	else if (esize == 8) \
-		ENQUEUE_PTRS_64(r, ring_start, prod_head, obj_table, n); \
-	else if (esize == 16) \
-		ENQUEUE_PTRS_128(r, ring_start, prod_head, obj_table, n); \
-} while (0)
-
-#define ENQUEUE_PTRS_32(r, ring_start, prod_head, obj_table, n) do { \
-	unsigned int i; \
+#define ENQUEUE_PTRS_GEN(r, ring_start, prod_head, obj_table, esize, n) do { \
+	unsigned int i, j; \
 	const uint32_t size = (r)->size; \
 	uint32_t idx = prod_head & (r)->mask; \
 	uint32_t *ring = (uint32_t *)ring_start; \
 	uint32_t *obj = (uint32_t *)obj_table; \
-	if (likely(idx + n < size)) { \
-		for (i = 0; i < (n & ((~(uint32_t)0x7))); i += 8, idx += 8) { \
-			ring[idx] = obj[i]; \
-			ring[idx + 1] = obj[i + 1]; \
-			ring[idx + 2] = obj[i + 2]; \
-			ring[idx + 3] = obj[i + 3]; \
-			ring[idx + 4] = obj[i + 4]; \
-			ring[idx + 5] = obj[i + 5]; \
-			ring[idx + 6] = obj[i + 6]; \
-			ring[idx + 7] = obj[i + 7]; \
+	uint32_t nr_n = n * (esize / sizeof(uint32_t)); \
+	uint32_t nr_idx = idx * (esize / sizeof(uint32_t)); \
+	uint32_t seg0 = size - idx; \
+	if (likely(n < seg0)) { \
+		for (i = 0; i < (nr_n & ((~(unsigned)0x7))); \
+						i += 8, nr_idx += 8) { \
+			memcpy(ring + nr_idx, obj + i, 8 * sizeof (uint32_t)); \
 		} \
-		switch (n & 0x7) { \
+		switch (nr_n & 0x7) { \
 		case 7: \
-			ring[idx++] = obj[i++]; /* fallthrough */ \
+			ring[nr_idx++] = obj[i++]; /* fallthrough */ \
 		case 6: \
-			ring[idx++] = obj[i++]; /* fallthrough */ \
+			ring[nr_idx++] = obj[i++]; /* fallthrough */ \
 		case 5: \
-			ring[idx++] = obj[i++]; /* fallthrough */ \
+			ring[nr_idx++] = obj[i++]; /* fallthrough */ \
 		case 4: \
-			ring[idx++] = obj[i++]; /* fallthrough */ \
+			ring[nr_idx++] = obj[i++]; /* fallthrough */ \
 		case 3: \
-			ring[idx++] = obj[i++]; /* fallthrough */ \
+			ring[nr_idx++] = obj[i++]; /* fallthrough */ \
 		case 2: \
-			ring[idx++] = obj[i++]; /* fallthrough */ \
-		case 1: \
-			ring[idx++] = obj[i++]; /* fallthrough */ \
-		} \
-	} else { \
-		for (i = 0; idx < size; i++, idx++)\
-			ring[idx] = obj[i]; \
-		for (idx = 0; i < n; i++, idx++) \
-			ring[idx] = obj[i]; \
-	} \
-} while (0)
-
-#define ENQUEUE_PTRS_64(r, ring_start, prod_head, obj_table, n) do { \
-	unsigned int i; \
-	const uint32_t size = (r)->size; \
-	uint32_t idx = prod_head & (r)->mask; \
-	uint64_t *ring = (uint64_t *)ring_start; \
-	uint64_t *obj = (uint64_t *)obj_table; \
-	if (likely(idx + n < size)) { \
-		for (i = 0; i < (n & ((~(uint32_t)0x3))); i += 4, idx += 4) { \
-			ring[idx] = obj[i]; \
-			ring[idx + 1] = obj[i + 1]; \
-			ring[idx + 2] = obj[i + 2]; \
-			ring[idx + 3] = obj[i + 3]; \
-		} \
-		switch (n & 0x3) { \
-		case 3: \
-			ring[idx++] = obj[i++]; /* fallthrough */ \
-		case 2: \
-			ring[idx++] = obj[i++]; /* fallthrough */ \
-		case 1: \
-			ring[idx++] = obj[i++]; \
-		} \
-	} else { \
-		for (i = 0; idx < size; i++, idx++)\
-			ring[idx] = obj[i]; \
-		for (idx = 0; i < n; i++, idx++) \
-			ring[idx] = obj[i]; \
-	} \
-} while (0)
-
-#define ENQUEUE_PTRS_128(r, ring_start, prod_head, obj_table, n) do { \
-	unsigned int i; \
-	const uint32_t size = (r)->size; \
-	uint32_t idx = prod_head & (r)->mask; \
-	__uint128_t *ring = (__uint128_t *)ring_start; \
-	__uint128_t *obj = (__uint128_t *)obj_table; \
-	if (likely(idx + n < size)) { \
-		for (i = 0; i < (n >> 1); i += 2, idx += 2) { \
-			ring[idx] = obj[i]; \
-			ring[idx + 1] = obj[i + 1]; \
-		} \
-		switch (n & 0x1) { \
+			ring[nr_idx++] = obj[i++]; /* fallthrough */ \
 		case 1: \
-			ring[idx++] = obj[i++]; \
+			ring[nr_idx++] = obj[i++]; /* fallthrough */ \
 		} \
 	} else { \
-		for (i = 0; idx < size; i++, idx++)\
-			ring[idx] = obj[i]; \
-		for (idx = 0; i < n; i++, idx++) \
-			ring[idx] = obj[i]; \
+		uint32_t nr_seg0 = seg0 * (esize / sizeof(uint32_t)); \
+		uint32_t nr_seg1 = nr_n - nr_seg0; \
+		for (i = 0; i < nr_seg0; i++, nr_idx++)\
+			ring[nr_idx] = obj[i]; \
+		for (j = 0; j < nr_seg1; i++, j++) \
+			ring[j] = obj[i]; \
 	} \
 } while (0)
 
-/* the actual copy of pointers on the ring to obj_table.
- * Placed here since identical code needed in both
- * single and multi consumer dequeue functions.
- */
-#define DEQUEUE_PTRS_ELEM(r, ring_start, cons_head, obj_table, esize, n) do { \
-	if (esize == 4) \
-		DEQUEUE_PTRS_32(r, ring_start, cons_head, obj_table, n); \
-	else if (esize == 8) \
-		DEQUEUE_PTRS_64(r, ring_start, cons_head, obj_table, n); \
-	else if (esize == 16) \
-		DEQUEUE_PTRS_128(r, ring_start, cons_head, obj_table, n); \
-} while (0)
-
-#define DEQUEUE_PTRS_32(r, ring_start, cons_head, obj_table, n) do { \
-	unsigned int i; \
+#define DEQUEUE_PTRS_GEN(r, ring_start, cons_head, obj_table, esize, n) do { \
+	unsigned int i, j; \
 	uint32_t idx = cons_head & (r)->mask; \
 	const uint32_t size = (r)->size; \
 	uint32_t *ring = (uint32_t *)ring_start; \
 	uint32_t *obj = (uint32_t *)obj_table; \
-	if (likely(idx + n < size)) { \
-		for (i = 0; i < (n & (~(uint32_t)0x7)); i += 8, idx += 8) {\
-			obj[i] = ring[idx]; \
-			obj[i + 1] = ring[idx + 1]; \
-			obj[i + 2] = ring[idx + 2]; \
-			obj[i + 3] = ring[idx + 3]; \
-			obj[i + 4] = ring[idx + 4]; \
-			obj[i + 5] = ring[idx + 5]; \
-			obj[i + 6] = ring[idx + 6]; \
-			obj[i + 7] = ring[idx + 7]; \
+	uint32_t nr_n = n * (esize / sizeof(uint32_t)); \
+	uint32_t nr_idx = idx * (esize / sizeof(uint32_t)); \
+	uint32_t seg0 = size - idx; \
+	if (likely(n < seg0)) { \
+		for (i = 0; i < (nr_n & ((~(unsigned)0x7))); \
+						i += 8, nr_idx += 8) { \
+			memcpy(obj + i, ring + nr_idx, 8 * sizeof (uint32_t)); \
 		} \
-		switch (n & 0x7) { \
+		switch (nr_n & 0x7) { \
 		case 7: \
-			obj[i++] = ring[idx++]; /* fallthrough */ \
+			obj[i++] = ring[nr_idx++]; /* fallthrough */ \
 		case 6: \
-			obj[i++] = ring[idx++]; /* fallthrough */ \
+			obj[i++] = ring[nr_idx++]; /* fallthrough */ \
 		case 5: \
-			obj[i++] = ring[idx++]; /* fallthrough */ \
+			obj[i++] = ring[nr_idx++]; /* fallthrough */ \
 		case 4: \
-			obj[i++] = ring[idx++]; /* fallthrough */ \
+			obj[i++] = ring[nr_idx++]; /* fallthrough */ \
 		case 3: \
-			obj[i++] = ring[idx++]; /* fallthrough */ \
+			obj[i++] = ring[nr_idx++]; /* fallthrough */ \
 		case 2: \
-			obj[i++] = ring[idx++]; /* fallthrough */ \
-		case 1: \
-			obj[i++] = ring[idx++]; /* fallthrough */ \
-		} \
-	} else { \
-		for (i = 0; idx < size; i++, idx++) \
-			obj[i] = ring[idx]; \
-		for (idx = 0; i < n; i++, idx++) \
-			obj[i] = ring[idx]; \
-	} \
-} while (0)
-
-#define DEQUEUE_PTRS_64(r, ring_start, cons_head, obj_table, n) do { \
-	unsigned int i; \
-	uint32_t idx = cons_head & (r)->mask; \
-	const uint32_t size = (r)->size; \
-	uint64_t *ring = (uint64_t *)ring_start; \
-	uint64_t *obj = (uint64_t *)obj_table; \
-	if (likely(idx + n < size)) { \
-		for (i = 0; i < (n & (~(uint32_t)0x3)); i += 4, idx += 4) {\
-			obj[i] = ring[idx]; \
-			obj[i + 1] = ring[idx + 1]; \
-			obj[i + 2] = ring[idx + 2]; \
-			obj[i + 3] = ring[idx + 3]; \
-		} \
-		switch (n & 0x3) { \
-		case 3: \
-			obj[i++] = ring[idx++]; /* fallthrough */ \
-		case 2: \
-			obj[i++] = ring[idx++]; /* fallthrough */ \
-		case 1: \
-			obj[i++] = ring[idx++]; \
-		} \
-	} else { \
-		for (i = 0; idx < size; i++, idx++) \
-			obj[i] = ring[idx]; \
-		for (idx = 0; i < n; i++, idx++) \
-			obj[i] = ring[idx]; \
-	} \
-} while (0)
-
-#define DEQUEUE_PTRS_128(r, ring_start, cons_head, obj_table, n) do { \
-	unsigned int i; \
-	uint32_t idx = cons_head & (r)->mask; \
-	const uint32_t size = (r)->size; \
-	__uint128_t *ring = (__uint128_t *)ring_start; \
-	__uint128_t *obj = (__uint128_t *)obj_table; \
-	if (likely(idx + n < size)) { \
-		for (i = 0; i < (n >> 1); i += 2, idx += 2) { \
-			obj[i] = ring[idx]; \
-			obj[i + 1] = ring[idx + 1]; \
-		} \
-		switch (n & 0x1) { \
+			obj[i++] = ring[nr_idx++]; /* fallthrough */ \
 		case 1: \
-			obj[i++] = ring[idx++]; /* fallthrough */ \
+			obj[i++] = ring[nr_idx++]; /* fallthrough */ \
 		} \
 	} else { \
-		for (i = 0; idx < size; i++, idx++) \
-			obj[i] = ring[idx]; \
-		for (idx = 0; i < n; i++, idx++) \
-			obj[i] = ring[idx]; \
+		uint32_t nr_seg0 = seg0 * (esize / sizeof(uint32_t)); \
+		uint32_t nr_seg1 = nr_n - nr_seg0; \
+		for (i = 0; i < nr_seg0; i++, nr_idx++)\
+			obj[i] = ring[nr_idx];\
+		for (j = 0; j < nr_seg1; i++, j++) \
+			obj[i] = ring[j]; \
 	} \
 } while (0)
 
@@ -373,7 +242,7 @@ __rte_ring_do_enqueue_elem(struct rte_ring *r, void * const obj_table,
 	if (n == 0)
 		goto end;
 
-	ENQUEUE_PTRS_ELEM(r, &r[1], prod_head, obj_table, esize, n);
+	ENQUEUE_PTRS_GEN(r, &r[1], prod_head, obj_table, esize, n);
 
 	update_tail(&r->prod, prod_head, prod_next, is_sp, 1);
 end:
@@ -420,7 +289,7 @@ __rte_ring_do_dequeue_elem(struct rte_ring *r, void *obj_table,
 	if (n == 0)
 		goto end;
 
-	DEQUEUE_PTRS_ELEM(r, &r[1], cons_head, obj_table, esize, n);
+	DEQUEUE_PTRS_GEN(r, &r[1], cons_head, obj_table, esize, n);
 
 	update_tail(&r->cons, cons_head, cons_next, is_sc, 0);
 
-- 
2.17.1



* [dpdk-dev] [RFC v6 6/6] lib/ring: improved copy function to copy ring elements
  2019-10-21  0:22   ` [dpdk-dev] [RFC v6 0/6] lib/ring: APIs to support custom element size Honnappa Nagarahalli
                       ` (4 preceding siblings ...)
  2019-10-21  0:22     ` [dpdk-dev] [RFC v6 5/6] lib/ring: copy ring elements using memcpy partially Honnappa Nagarahalli
@ 2019-10-21  0:23     ` Honnappa Nagarahalli
  2019-10-23 10:05       ` Olivier Matz
  2019-10-23  9:48     ` [dpdk-dev] [RFC v6 0/6] lib/ring: APIs to support custom element size Olivier Matz
  6 siblings, 1 reply; 173+ messages in thread
From: Honnappa Nagarahalli @ 2019-10-21  0:23 UTC (permalink / raw)
  To: olivier.matz, sthemmin, jerinj, bruce.richardson, david.marchand,
	pbhagavatula, konstantin.ananyev, drc, hemant.agrawal,
	honnappa.nagarahalli
  Cc: dev, dharmik.thakkar, ruifeng.wang, gavin.hu

Replace the copy macros with an improved inline copy function to copy
to/from ring elements.

Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
---
 lib/librte_ring/rte_ring_elem.h | 165 ++++++++++++++++----------------
 1 file changed, 84 insertions(+), 81 deletions(-)
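
The wrap-around handling in enqueue_elems() can be modelled outside DPDK as follows. This is only a sketch under stated assumptions: enqueue_copy_sketch is a hypothetical name, the ring is passed as a bare uint32_t array instead of a struct rte_ring, and a variable-size memcpy stands in for copy_elems(). It shows the two-segment split: the slots up to the end of the array are filled first, and any remainder continues from slot 0.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical stand-alone model of the enqueue-side copy. "ring" is the
 * element storage, "size" the number of slots (a power of two), "head" the
 * producer head, "esize" the element size in bytes (a multiple of 4).
 */
static inline void
enqueue_copy_sketch(uint32_t *ring, uint32_t size, uint32_t head,
		const void *obj_table, uint32_t num, uint32_t esize)
{
	const uint32_t words = esize / (uint32_t)sizeof(uint32_t);
	const uint32_t idx = head & (size - 1);
	const uint32_t s0 = size - idx;	/* slots left before the wrap */
	const uint32_t *src = obj_table;

	assert(esize % sizeof(uint32_t) == 0 && num <= size);

	if (num <= s0) {
		/* Everything fits before the end of the array. */
		memcpy(ring + idx * words, src, (size_t)num * esize);
	} else {
		/* Segment 0 up to the end, segment 1 from slot 0. */
		memcpy(ring + idx * words, src, (size_t)s0 * esize);
		memcpy(ring, src + s0 * words, (size_t)(num - s0) * esize);
	}
}
```

For example, with 8 slots of 8 bytes each and a head of 6, enqueueing 3 elements writes two of them into slots 6 and 7 and the third into slot 0. The dequeue side is the mirror image with source and destination swapped.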

diff --git a/lib/librte_ring/rte_ring_elem.h b/lib/librte_ring/rte_ring_elem.h
index 0ce5f2be7..80ec3c562 100644
--- a/lib/librte_ring/rte_ring_elem.h
+++ b/lib/librte_ring/rte_ring_elem.h
@@ -109,85 +109,88 @@ __rte_experimental
 struct rte_ring *rte_ring_create_elem(const char *name, unsigned int count,
 			unsigned int esize, int socket_id, unsigned int flags);
 
-#define ENQUEUE_PTRS_GEN(r, ring_start, prod_head, obj_table, esize, n) do { \
-	unsigned int i, j; \
-	const uint32_t size = (r)->size; \
-	uint32_t idx = prod_head & (r)->mask; \
-	uint32_t *ring = (uint32_t *)ring_start; \
-	uint32_t *obj = (uint32_t *)obj_table; \
-	uint32_t nr_n = n * (esize / sizeof(uint32_t)); \
-	uint32_t nr_idx = idx * (esize / sizeof(uint32_t)); \
-	uint32_t seg0 = size - idx; \
-	if (likely(n < seg0)) { \
-		for (i = 0; i < (nr_n & ((~(unsigned)0x7))); \
-						i += 8, nr_idx += 8) { \
-			memcpy(ring + nr_idx, obj + i, 8 * sizeof (uint32_t)); \
-		} \
-		switch (nr_n & 0x7) { \
-		case 7: \
-			ring[nr_idx++] = obj[i++]; /* fallthrough */ \
-		case 6: \
-			ring[nr_idx++] = obj[i++]; /* fallthrough */ \
-		case 5: \
-			ring[nr_idx++] = obj[i++]; /* fallthrough */ \
-		case 4: \
-			ring[nr_idx++] = obj[i++]; /* fallthrough */ \
-		case 3: \
-			ring[nr_idx++] = obj[i++]; /* fallthrough */ \
-		case 2: \
-			ring[nr_idx++] = obj[i++]; /* fallthrough */ \
-		case 1: \
-			ring[nr_idx++] = obj[i++]; /* fallthrough */ \
-		} \
-	} else { \
-		uint32_t nr_seg0 = seg0 * (esize / sizeof(uint32_t)); \
-		uint32_t nr_seg1 = nr_n - nr_seg0; \
-		for (i = 0; i < nr_seg0; i++, nr_idx++)\
-			ring[nr_idx] = obj[i]; \
-		for (j = 0; j < nr_seg1; i++, j++) \
-			ring[j] = obj[i]; \
-	} \
-} while (0)
-
-#define DEQUEUE_PTRS_GEN(r, ring_start, cons_head, obj_table, esize, n) do { \
-	unsigned int i, j; \
-	uint32_t idx = cons_head & (r)->mask; \
-	const uint32_t size = (r)->size; \
-	uint32_t *ring = (uint32_t *)ring_start; \
-	uint32_t *obj = (uint32_t *)obj_table; \
-	uint32_t nr_n = n * (esize / sizeof(uint32_t)); \
-	uint32_t nr_idx = idx * (esize / sizeof(uint32_t)); \
-	uint32_t seg0 = size - idx; \
-	if (likely(n < seg0)) { \
-		for (i = 0; i < (nr_n & ((~(unsigned)0x7))); \
-						i += 8, nr_idx += 8) { \
-			memcpy(obj + i, ring + nr_idx, 8 * sizeof (uint32_t)); \
-		} \
-		switch (nr_n & 0x7) { \
-		case 7: \
-			obj[i++] = ring[nr_idx++]; /* fallthrough */ \
-		case 6: \
-			obj[i++] = ring[nr_idx++]; /* fallthrough */ \
-		case 5: \
-			obj[i++] = ring[nr_idx++]; /* fallthrough */ \
-		case 4: \
-			obj[i++] = ring[nr_idx++]; /* fallthrough */ \
-		case 3: \
-			obj[i++] = ring[nr_idx++]; /* fallthrough */ \
-		case 2: \
-			obj[i++] = ring[nr_idx++]; /* fallthrough */ \
-		case 1: \
-			obj[i++] = ring[nr_idx++]; /* fallthrough */ \
-		} \
-	} else { \
-		uint32_t nr_seg0 = seg0 * (esize / sizeof(uint32_t)); \
-		uint32_t nr_seg1 = nr_n - nr_seg0; \
-		for (i = 0; i < nr_seg0; i++, nr_idx++)\
-			obj[i] = ring[nr_idx];\
-		for (j = 0; j < nr_seg1; i++, j++) \
-			obj[i] = ring[j]; \
-	} \
-} while (0)
+static __rte_always_inline void
+copy_elems(uint32_t du32[], const uint32_t su32[], uint32_t nr_num)
+{
+	uint32_t i;
+
+	for (i = 0; i < (nr_num & ~7); i += 8)
+		memcpy(du32 + i, su32 + i, 8 * sizeof(uint32_t));
+
+	switch (nr_num & 7) {
+	case 7: du32[nr_num - 7] = su32[nr_num - 7]; /* fallthrough */
+	case 6: du32[nr_num - 6] = su32[nr_num - 6]; /* fallthrough */
+	case 5: du32[nr_num - 5] = su32[nr_num - 5]; /* fallthrough */
+	case 4: du32[nr_num - 4] = su32[nr_num - 4]; /* fallthrough */
+	case 3: du32[nr_num - 3] = su32[nr_num - 3]; /* fallthrough */
+	case 2: du32[nr_num - 2] = su32[nr_num - 2]; /* fallthrough */
+	case 1: du32[nr_num - 1] = su32[nr_num - 1]; /* fallthrough */
+	}
+}
+
+static __rte_always_inline void
+enqueue_elems(struct rte_ring *r, void *ring_start, uint32_t prod_head,
+		void *obj_table, uint32_t num, uint32_t esize)
+{
+	uint32_t idx, nr_idx, nr_num;
+	uint32_t *du32;
+	const uint32_t *su32;
+
+	const uint32_t size = r->size;
+	uint32_t s0, nr_s0, nr_s1;
+
+	idx = prod_head & (r)->mask;
+	/* Normalize the idx to uint32_t */
+	nr_idx = (idx * esize) / sizeof(uint32_t);
+
+	du32 = (uint32_t *)ring_start + nr_idx;
+	su32 = obj_table;
+
+	/* Normalize the number of elements to uint32_t */
+	nr_num = (num * esize) / sizeof(uint32_t);
+
+	s0 = size - idx;
+	if (num < s0)
+		copy_elems(du32, su32, nr_num);
+	else {
+		nr_s0 = (s0 * esize) / sizeof(uint32_t);
+		nr_s1 = nr_num - nr_s0;
+		copy_elems(du32, su32, nr_s0);
+		copy_elems(ring_start, su32 + nr_s0, nr_s1);
+	}
+}
+
+static __rte_always_inline void
+dequeue_elems(struct rte_ring *r, void *ring_start, uint32_t cons_head,
+		void *obj_table, uint32_t num, uint32_t esize)
+{
+	uint32_t idx, nr_idx, nr_num;
+	uint32_t *du32;
+	const uint32_t *su32;
+
+	const uint32_t size = r->size;
+	uint32_t s0, nr_s0, nr_s1;
+
+	idx = cons_head & (r)->mask;
+	/* Normalize the idx to uint32_t */
+	nr_idx = (idx * esize) / sizeof(uint32_t);
+
+	su32 = (uint32_t *)ring_start + nr_idx;
+	du32 = obj_table;
+
+	/* Normalize the number of elements to uint32_t */
+	nr_num = (num * esize) / sizeof(uint32_t);
+
+	s0 = size - idx;
+	if (num < s0)
+		copy_elems(du32, su32, nr_num);
+	else {
+		nr_s0 = (s0 * esize) / sizeof(uint32_t);
+		nr_s1 = nr_num - nr_s0;
+		copy_elems(du32, su32, nr_s0);
+		copy_elems(du32 + nr_s0, ring_start, nr_s1);
+	}
+}
 
 /* Between load and load. there might be cpu reorder in weak model
  * (powerpc/arm).
@@ -242,7 +245,7 @@ __rte_ring_do_enqueue_elem(struct rte_ring *r, void * const obj_table,
 	if (n == 0)
 		goto end;
 
-	ENQUEUE_PTRS_GEN(r, &r[1], prod_head, obj_table, esize, n);
+	enqueue_elems(r, &r[1], prod_head, obj_table, n, esize);
 
 	update_tail(&r->prod, prod_head, prod_next, is_sp, 1);
 end:
@@ -289,7 +292,7 @@ __rte_ring_do_dequeue_elem(struct rte_ring *r, void *obj_table,
 	if (n == 0)
 		goto end;
 
-	DEQUEUE_PTRS_GEN(r, &r[1], cons_head, obj_table, esize, n);
+	dequeue_elems(r, &r[1], cons_head, obj_table, n, esize);
 
 	update_tail(&r->cons, cons_head, cons_next, is_sc, 0);
 
-- 
2.17.1



* Re: [dpdk-dev] [PATCH v4 1/2] lib/ring: apis to support configurable element size
  2019-10-18 16:11                           ` Jerin Jacob
@ 2019-10-21  0:27                             ` Honnappa Nagarahalli
  0 siblings, 0 replies; 173+ messages in thread
From: Honnappa Nagarahalli @ 2019-10-21  0:27 UTC (permalink / raw)
  To: Jerin Jacob
  Cc: David Christensen, Ananyev, Konstantin, olivier.matz, sthemmin,
	jerinj, Richardson, Bruce, david.marchand, pbhagavatula, dev,
	Dharmik Thakkar, Ruifeng Wang (Arm Technology China),
	Gavin Hu (Arm Technology China),
	stephen, nd, Honnappa Nagarahalli, nd

> > >
> > > > Subject: Re: [PATCH v4 1/2] lib/ring: apis to support configurable
> > > > element size
> > > >
> > > > >>> I tried this. On x86 (Xeon(R) Gold 6132 CPU @ 2.60GHz), the
> > > > >>> results are as follows. The numbers in brackets are with the
> > > > >>> code on master.
> > > > >>> gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0
> > > > >>>
> > > > >>> RTE>>ring_perf_elem_autotest
> > > > >>> ### Testing single element and burst enq/deq ###
> > > > >>> SP/SC single enq/dequeue: 5
> > > > >>> MP/MC single enq/dequeue: 40 (35)
> > > > >>> SP/SC burst enq/dequeue (size: 8): 2
> > > > >>> MP/MC burst enq/dequeue (size: 8): 6
> > > > >>> SP/SC burst enq/dequeue (size: 32): 1 (2)
> > > > >>> MP/MC burst enq/dequeue (size: 32): 2
> > > > >>>
> > > > >>> ### Testing empty dequeue ###
> > > > >>> SC empty dequeue: 2.11
> > > > >>> MC empty dequeue: 1.41 (2.11)
> > > > >>>
> > > > >>> ### Testing using a single lcore ###
> > > > >>> SP/SC bulk enq/dequeue (size: 8): 2.15 (2.86)
> > > > >>> MP/MC bulk enq/dequeue (size: 8): 6.35 (6.91)
> > > > >>> SP/SC bulk enq/dequeue (size: 32): 1.35 (2.06)
> > > > >>> MP/MC bulk enq/dequeue (size: 32): 2.38 (2.95)
> > > > >>>
> > > > >>> ### Testing using two physical cores ###
> > > > >>> SP/SC bulk enq/dequeue (size: 8): 73.81 (15.33)
> > > > >>> MP/MC bulk enq/dequeue (size: 8): 75.10 (71.27)
> > > > >>> SP/SC bulk enq/dequeue (size: 32): 21.14 (9.58)
> > > > >>> MP/MC bulk enq/dequeue (size: 32): 25.74 (20.91)
> > > > >>>
> > > > >>> ### Testing using two NUMA nodes ###
> > > > >>> SP/SC bulk enq/dequeue (size: 8): 164.32 (50.66)
> > > > >>> MP/MC bulk enq/dequeue (size: 8): 176.02 (173.43)
> > > > >>> SP/SC bulk enq/dequeue (size: 32): 50.78 (23)
> > > > >>> MP/MC bulk enq/dequeue (size: 32): 63.17 (46.74)
> > > > >>>
> > > > >>> On one of the Arm platforms
> > > > >>> MP/MC bulk enq/dequeue (size: 32): 0.37 (0.33) (~12% hit, the
> > > > >>> rest are ok)
> > > >
> > > > Tried this on a Power9 platform (3.6GHz), with two numa nodes and
> > > > 16 cores/node (SMT=4).  Applied all 3 patches in v5, test results
> > > > are as
> > > > follows:
> > > >
> > > > RTE>>ring_perf_elem_autotest
> > > > ### Testing single element and burst enq/deq ###
> > > > SP/SC single enq/dequeue: 42
> > > > MP/MC single enq/dequeue: 59
> > > > SP/SC burst enq/dequeue (size: 8): 5
> > > > MP/MC burst enq/dequeue (size: 8): 7
> > > > SP/SC burst enq/dequeue (size: 32): 2
> > > > MP/MC burst enq/dequeue (size: 32): 2
> > > >
> > > > ### Testing empty dequeue ###
> > > > SC empty dequeue: 7.81
> > > > MC empty dequeue: 7.81
> > > >
> > > > ### Testing using a single lcore ###
> > > > SP/SC bulk enq/dequeue (size: 8): 5.76
> > > > MP/MC bulk enq/dequeue (size: 8): 7.66
> > > > SP/SC bulk enq/dequeue (size: 32): 2.10
> > > > MP/MC bulk enq/dequeue (size: 32): 2.57
> > > >
> > > > ### Testing using two hyperthreads ###
> > > > SP/SC bulk enq/dequeue (size: 8): 13.13
> > > > MP/MC bulk enq/dequeue (size: 8): 13.98
> > > > SP/SC bulk enq/dequeue (size: 32): 3.41
> > > > MP/MC bulk enq/dequeue (size: 32): 4.45
> > > >
> > > > ### Testing using two physical cores ###
> > > > SP/SC bulk enq/dequeue (size: 8): 11.00
> > > > MP/MC bulk enq/dequeue (size: 8): 10.95
> > > > SP/SC bulk enq/dequeue (size: 32): 3.08
> > > > MP/MC bulk enq/dequeue (size: 32): 3.40
> > > >
> > > > ### Testing using two NUMA nodes ###
> > > > SP/SC bulk enq/dequeue (size: 8): 63.41
> > > > MP/MC bulk enq/dequeue (size: 8): 62.70
> > > > SP/SC bulk enq/dequeue (size: 32): 15.39
> > > > MP/MC bulk enq/dequeue (size: 32): 22.96
> > > >
> > > Thanks for running this. There is another test 'ring_perf_autotest' which
> > > provides the numbers with the original implementation. The goal is to make
> > > sure the numbers with the original implementation are the same as these.
> > > Can you please run that as well?
> >
> > Honnappa,
> >
> > Your earlier perf report shows cycle counts of less than 1. That
> > is because it is using a 50 or 100MHz clock in EL0.
> > Please check with PMU counter. See "ARM64 profiling" in
> >
> > http://doc.dpdk.org/guides/prog_guide/profile_app.html
I am aware of this. Unfortunately, it does not work on all platforms. The kernel team discourages using the cycle counter for this purpose.
I have replaced the modulo operation with division (in v6), which adds a couple of decimal places to the results.
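
As a rough illustration of why division helps here (a sketch only, not the code in v6; the helper name and its parameters are hypothetical): when the counter runs at a low frequency, the average cost per element can be below one tick, and any integer arithmetic truncates it to zero. Keeping the quotient as a double preserves the fractional part:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper, not part of the patch set: average timer ticks
 * per element over a timed loop. The quotient is computed as a double,
 * so averages below one tick are not truncated to zero.
 */
static double
sketch_ticks_per_elem(uint64_t start, uint64_t end, uint64_t iterations,
		uint32_t bulk)
{
	assert(iterations != 0 && bulk != 0);
	assert(end >= start);

	return (double)(end - start) / ((double)iterations * bulk);
}
```

With 37 ticks spent over 100 single-element iterations, this returns 0.37, whereas the integer expression 37 / 100 evaluates to 0.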

> >
> >
> > Here are the octeontx2 values. There is a regression in the two-core cases,
> > as you reported earlier on x86.
> >
> >
> > RTE>>ring_perf_autotest
> > ### Testing single element and burst enq/deq ###
> > SP/SC single enq/dequeue: 288
> > MP/MC single enq/dequeue: 452
> > SP/SC burst enq/dequeue (size: 8): 39
> > MP/MC burst enq/dequeue (size: 8): 61
> > SP/SC burst enq/dequeue (size: 32): 13
> > MP/MC burst enq/dequeue (size: 32): 21
> >
> > ### Testing empty dequeue ###
> > SC empty dequeue: 6.33
> > MC empty dequeue: 6.67
> >
> > ### Testing using a single lcore ###
> > SP/SC bulk enq/dequeue (size: 8): 38.35
> > MP/MC bulk enq/dequeue (size: 8): 67.36
> > SP/SC bulk enq/dequeue (size: 32): 13.10
> > MP/MC bulk enq/dequeue (size: 32): 21.64
> >
> > ### Testing using two physical cores ###
> > SP/SC bulk enq/dequeue (size: 8): 75.94
> > MP/MC bulk enq/dequeue (size: 8): 107.66
> > SP/SC bulk enq/dequeue (size: 32): 24.51
> > MP/MC bulk enq/dequeue (size: 32): 33.23
> > Test OK
> > RTE>>
> >
> > ---- after applying v5 of the patch ------
> >
> > RTE>>ring_perf_autotest
> > ### Testing single element and burst enq/deq ###
> > SP/SC single enq/dequeue: 289
> > MP/MC single enq/dequeue: 452
> > SP/SC burst enq/dequeue (size: 8): 40
> > MP/MC burst enq/dequeue (size: 8): 64
> > SP/SC burst enq/dequeue (size: 32): 13
> > MP/MC burst enq/dequeue (size: 32): 22
> >
> > ### Testing empty dequeue ###
> > SC empty dequeue: 6.33
> > MC empty dequeue: 6.67
> >
> > ### Testing using a single lcore ###
> > SP/SC bulk enq/dequeue (size: 8): 39.73
> > MP/MC bulk enq/dequeue (size: 8): 69.13
> > SP/SC bulk enq/dequeue (size: 32): 13.44
> > MP/MC bulk enq/dequeue (size: 32): 22.00
> >
> > ### Testing using two physical cores ###
> > SP/SC bulk enq/dequeue (size: 8): 76.02
> > MP/MC bulk enq/dequeue (size: 8): 112.50
> > SP/SC bulk enq/dequeue (size: 32): 24.71
> > MP/MC bulk enq/dequeue (size: 32): 33.34
> > Test OK
> > RTE>>
> >
> > RTE>>ring_perf_elem_autotest
> > ### Testing single element and burst enq/deq ###
> > SP/SC single enq/dequeue: 290
> > MP/MC single enq/dequeue: 503
> > SP/SC burst enq/dequeue (size: 8): 39
> > MP/MC burst enq/dequeue (size: 8): 63
> > SP/SC burst enq/dequeue (size: 32): 11
> > MP/MC burst enq/dequeue (size: 32): 19
> >
> > ### Testing empty dequeue ###
> > SC empty dequeue: 6.33
> > MC empty dequeue: 6.67
> >
> > ### Testing using a single lcore ###
> > SP/SC bulk enq/dequeue (size: 8): 38.92
> > MP/MC bulk enq/dequeue (size: 8): 62.54
> > SP/SC bulk enq/dequeue (size: 32): 11.46
> > MP/MC bulk enq/dequeue (size: 32): 19.89
> >
> > ### Testing using two physical cores ###
> > SP/SC bulk enq/dequeue (size: 8): 87.55
> > MP/MC bulk enq/dequeue (size: 8): 99.10
> > SP/SC bulk enq/dequeue (size: 32): 26.63
> > MP/MC bulk enq/dequeue (size: 32): 29.91
> > Test OK
> > RTE>>
> 
> It looks like removing 3/3 and keeping only 1/3 and 2/3 shows better
> results in some cases.
> 
> 
> RTE>>ring_perf_autotest
> ### Testing single element and burst enq/deq ###
> SP/SC single enq/dequeue: 288
> MP/MC single enq/dequeue: 439
> SP/SC burst enq/dequeue (size: 8): 39
> MP/MC burst enq/dequeue (size: 8): 61
> SP/SC burst enq/dequeue (size: 32): 13
> MP/MC burst enq/dequeue (size: 32): 22
> 
> ### Testing empty dequeue ###
> SC empty dequeue: 6.33
> MC empty dequeue: 6.67
> 
> ### Testing using a single lcore ###
> SP/SC bulk enq/dequeue (size: 8): 38.35
> MP/MC bulk enq/dequeue (size: 8): 67.48
> SP/SC bulk enq/dequeue (size: 32): 13.40
> MP/MC bulk enq/dequeue (size: 32): 22.03
> 
> ### Testing using two physical cores ###
> SP/SC bulk enq/dequeue (size: 8): 75.94
> MP/MC bulk enq/dequeue (size: 8): 105.84
> SP/SC bulk enq/dequeue (size: 32): 25.11
> MP/MC bulk enq/dequeue (size: 32): 33.48
> Test OK
> RTE>>
> 
> 
> RTE>>ring_perf_elem_autotest
> ### Testing single element and burst enq/deq ###
> SP/SC single enq/dequeue: 288
> MP/MC single enq/dequeue: 452
> SP/SC burst enq/dequeue (size: 8): 39
> MP/MC burst enq/dequeue (size: 8): 61
> SP/SC burst enq/dequeue (size: 32): 13
> MP/MC burst enq/dequeue (size: 32): 22
> 
> ### Testing empty dequeue ###
> SC empty dequeue: 6.33
> MC empty dequeue: 6.00
> 
> ### Testing using a single lcore ###
> SP/SC bulk enq/dequeue (size: 8): 38.35
> MP/MC bulk enq/dequeue (size: 8): 67.46
> SP/SC bulk enq/dequeue (size: 32): 13.42
> MP/MC bulk enq/dequeue (size: 32): 22.01
> 
> ### Testing using two physical cores ###
> SP/SC bulk enq/dequeue (size: 8): 76.04
> MP/MC bulk enq/dequeue (size: 8): 104.88
> SP/SC bulk enq/dequeue (size: 32): 24.75
> MP/MC bulk enq/dequeue (size: 32): 34.66
> Test OK
> RTE>>
> 
> 
> >
> >
> >
> > > > Dave


* Re: [dpdk-dev] [PATCH v4 1/2] lib/ring: apis to support configurable element size
  2019-10-18 16:44                           ` Ananyev, Konstantin
  2019-10-18 19:03                             ` Honnappa Nagarahalli
@ 2019-10-21  0:36                             ` Honnappa Nagarahalli
  2019-10-21  9:04                               ` Ananyev, Konstantin
  1 sibling, 1 reply; 173+ messages in thread
From: Honnappa Nagarahalli @ 2019-10-21  0:36 UTC (permalink / raw)
  To: Ananyev, Konstantin, Jerin Jacob
  Cc: David Christensen, olivier.matz, sthemmin, jerinj, Richardson,
	Bruce, david.marchand, pbhagavatula, dev, Dharmik Thakkar,
	Ruifeng Wang (Arm Technology China),
	Gavin Hu (Arm Technology China),
	stephen, nd, Honnappa Nagarahalli, nd

> 
> Hi everyone,
> 
> 
> > > > >>> I tried this. On x86 (Xeon(R) Gold 6132 CPU @ 2.60GHz), the
> > > > >>> results are as
> > > > >> follows. The numbers in brackets are with the code on master.
> > > > >>> gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0
> > > > >>>
> > > > >>> RTE>>ring_perf_elem_autotest
> > > > >>> ### Testing single element and burst enq/deq ###
> > > > >>> SP/SC single enq/dequeue: 5
> > > > >>> MP/MC single enq/dequeue: 40 (35)
> > > > >>> SP/SC burst enq/dequeue (size: 8): 2
> > > > >>> MP/MC burst enq/dequeue (size: 8): 6
> > > > >>> SP/SC burst enq/dequeue (size: 32): 1 (2)
> > > > >>> MP/MC burst enq/dequeue (size: 32): 2
> > > > >>>
> > > > >>> ### Testing empty dequeue ###
> > > > >>> SC empty dequeue: 2.11
> > > > >>> MC empty dequeue: 1.41 (2.11)
> > > > >>>
> > > > >>> ### Testing using a single lcore ###
> > > > >>> SP/SC bulk enq/dequeue (size: 8): 2.15 (2.86)
> > > > >>> MP/MC bulk enq/dequeue (size: 8): 6.35 (6.91)
> > > > >>> SP/SC bulk enq/dequeue (size: 32): 1.35 (2.06)
> > > > >>> MP/MC bulk enq/dequeue (size: 32): 2.38 (2.95)
> > > > >>>
> > > > >>> ### Testing using two physical cores ###
> > > > >>> SP/SC bulk enq/dequeue (size: 8): 73.81 (15.33)
> > > > >>> MP/MC bulk enq/dequeue (size: 8): 75.10 (71.27)
> > > > >>> SP/SC bulk enq/dequeue (size: 32): 21.14 (9.58)
> > > > >>> MP/MC bulk enq/dequeue (size: 32): 25.74 (20.91)
> > > > >>>
> > > > >>> ### Testing using two NUMA nodes ###
> > > > >>> SP/SC bulk enq/dequeue (size: 8): 164.32 (50.66)
> > > > >>> MP/MC bulk enq/dequeue (size: 8): 176.02 (173.43)
> > > > >>> SP/SC bulk enq/dequeue (size: 32): 50.78 (23)
> > > > >>> MP/MC bulk enq/dequeue (size: 32): 63.17 (46.74)
> > > > >>>
> > > > >>> On one of the Arm platforms:
> > > > >>> MP/MC bulk enq/dequeue (size: 32): 0.37 (0.33) (~12% hit, the
> > > > >>> rest are ok)
> > > >
> > > > Tried this on a Power9 platform (3.6GHz), with two numa nodes and
> > > > 16 cores/node (SMT=4).  Applied all 3 patches in v5, test results
> > > > are as
> > > > follows:
> > > >
> > > > RTE>>ring_perf_elem_autotest
> > > > ### Testing single element and burst enq/deq ###
> > > > SP/SC single enq/dequeue: 42
> > > > MP/MC single enq/dequeue: 59
> > > > SP/SC burst enq/dequeue (size: 8): 5
> > > > MP/MC burst enq/dequeue (size: 8): 7
> > > > SP/SC burst enq/dequeue (size: 32): 2
> > > > MP/MC burst enq/dequeue (size: 32): 2
> > > >
> > > > ### Testing empty dequeue ###
> > > > SC empty dequeue: 7.81
> > > > MC empty dequeue: 7.81
> > > >
> > > > ### Testing using a single lcore ###
> > > > SP/SC bulk enq/dequeue (size: 8): 5.76
> > > > MP/MC bulk enq/dequeue (size: 8): 7.66
> > > > SP/SC bulk enq/dequeue (size: 32): 2.10
> > > > MP/MC bulk enq/dequeue (size: 32): 2.57
> > > >
> > > > ### Testing using two hyperthreads ###
> > > > SP/SC bulk enq/dequeue (size: 8): 13.13
> > > > MP/MC bulk enq/dequeue (size: 8): 13.98
> > > > SP/SC bulk enq/dequeue (size: 32): 3.41
> > > > MP/MC bulk enq/dequeue (size: 32): 4.45
> > > >
> > > > ### Testing using two physical cores ###
> > > > SP/SC bulk enq/dequeue (size: 8): 11.00
> > > > MP/MC bulk enq/dequeue (size: 8): 10.95
> > > > SP/SC bulk enq/dequeue (size: 32): 3.08
> > > > MP/MC bulk enq/dequeue (size: 32): 3.40
> > > >
> > > > ### Testing using two NUMA nodes ###
> > > > SP/SC bulk enq/dequeue (size: 8): 63.41
> > > > MP/MC bulk enq/dequeue (size: 8): 62.70
> > > > SP/SC bulk enq/dequeue (size: 32): 15.39
> > > > MP/MC bulk enq/dequeue (size: 32): 22.96
> > > >
> > > Thanks for running this. There is another test 'ring_perf_autotest' which
> > > provides the numbers with the original implementation. The goal is to make
> > > sure the numbers with the original implementation are the same as these.
> > > Can you please run that as well?
> >
> > Honnappa,
> >
> > Your earlier perf report shows cycle counts of less than 1. That
> > is because it is using a 50 or 100MHz clock in EL0.
> > Please check with PMU counter. See "ARM64 profiling" in
> >
> > http://doc.dpdk.org/guides/prog_guide/profile_app.html
> >
> >
> > Here are the octeontx2 values. There is a regression in the two-core cases,
> > as you reported earlier on x86.
> >
> >
> > RTE>>ring_perf_autotest
> > ### Testing single element and burst enq/deq ###
> > SP/SC single enq/dequeue: 288
> > MP/MC single enq/dequeue: 452
> > SP/SC burst enq/dequeue (size: 8): 39
> > MP/MC burst enq/dequeue (size: 8): 61
> > SP/SC burst enq/dequeue (size: 32): 13
> > MP/MC burst enq/dequeue (size: 32): 21
> >
> > ### Testing empty dequeue ###
> > SC empty dequeue: 6.33
> > MC empty dequeue: 6.67
> >
> > ### Testing using a single lcore ###
> > SP/SC bulk enq/dequeue (size: 8): 38.35
> > MP/MC bulk enq/dequeue (size: 8): 67.36
> > SP/SC bulk enq/dequeue (size: 32): 13.10
> > MP/MC bulk enq/dequeue (size: 32): 21.64
> >
> > ### Testing using two physical cores ###
> > SP/SC bulk enq/dequeue (size: 8): 75.94
> > MP/MC bulk enq/dequeue (size: 8): 107.66
> > SP/SC bulk enq/dequeue (size: 32): 24.51
> > MP/MC bulk enq/dequeue (size: 32): 33.23
> > Test OK
> > RTE>>
> >
> > ---- after applying v5 of the patch ------
> >
> > RTE>>ring_perf_autotest
> > ### Testing single element and burst enq/deq ###
> > SP/SC single enq/dequeue: 289
> > MP/MC single enq/dequeue: 452
> > SP/SC burst enq/dequeue (size: 8): 40
> > MP/MC burst enq/dequeue (size: 8): 64
> > SP/SC burst enq/dequeue (size: 32): 13
> > MP/MC burst enq/dequeue (size: 32): 22
> >
> > ### Testing empty dequeue ###
> > SC empty dequeue: 6.33
> > MC empty dequeue: 6.67
> >
> > ### Testing using a single lcore ###
> > SP/SC bulk enq/dequeue (size: 8): 39.73
> > MP/MC bulk enq/dequeue (size: 8): 69.13
> > SP/SC bulk enq/dequeue (size: 32): 13.44
> > MP/MC bulk enq/dequeue (size: 32): 22.00
> >
> > ### Testing using two physical cores ###
> > SP/SC bulk enq/dequeue (size: 8): 76.02
> > MP/MC bulk enq/dequeue (size: 8): 112.50
> > SP/SC bulk enq/dequeue (size: 32): 24.71
> > MP/MC bulk enq/dequeue (size: 32): 33.34
> > Test OK
> > RTE>>
> >
> > RTE>>ring_perf_elem_autotest
> > ### Testing single element and burst enq/deq ###
> > SP/SC single enq/dequeue: 290
> > MP/MC single enq/dequeue: 503
> > SP/SC burst enq/dequeue (size: 8): 39
> > MP/MC burst enq/dequeue (size: 8): 63
> > SP/SC burst enq/dequeue (size: 32): 11
> > MP/MC burst enq/dequeue (size: 32): 19
> >
> > ### Testing empty dequeue ###
> > SC empty dequeue: 6.33
> > MC empty dequeue: 6.67
> >
> > ### Testing using a single lcore ###
> > SP/SC bulk enq/dequeue (size: 8): 38.92
> > MP/MC bulk enq/dequeue (size: 8): 62.54
> > SP/SC bulk enq/dequeue (size: 32): 11.46
> > MP/MC bulk enq/dequeue (size: 32): 19.89
> >
> > ### Testing using two physical cores ###
> > SP/SC bulk enq/dequeue (size: 8): 87.55
> > MP/MC bulk enq/dequeue (size: 8): 99.10
> > SP/SC bulk enq/dequeue (size: 32): 26.63
> > MP/MC bulk enq/dequeue (size: 32): 29.91
> > Test OK
> > RTE>>
> >
> 
> As far as I can see, there is a copy&paste bug in patch #3 (that's probably
> why it produced some weird numbers for me at first).
> After applying the fix (see patch below), things look pretty good on my box.
> As far as I can see, there are only 3 results noticeably lower:
>    SP/SC (size=8) over 2 physical cores on the same NUMA socket
>    MP/MC (size=8) over 2 physical cores on different NUMA sockets.
> All others seem about the same or better.
> Anyway, I went ahead and reworked the code a bit (as I suggested before) to get
> rid of these huge ENQUEUE/DEQUEUE macros.
> Results are very close to the fixed patch #3 version (patch is also attached).
> Though I suggest people hold off on re-running perf tests until we make the ring
> functional test run for the _elem_ functions too.
> I started to work on that, but I'm not sure I'll finish today (most likely Monday).
I have sent v6. It adds test cases for the 'rte_ring_xxx_elem' APIs. All issues are fixed in both copy methods; more info below. I will post the performance numbers soon.

> Perf results from my box, plus patches below.
> Konstantin
> 
> perf results
> ==========
> 
> Intel(R) Xeon(R) Platinum 8160 CPU @ 2.10GHz
> 
> A - ring_perf_autotest
> B - ring_perf_elem_autotest + patch #3 + fix
> C - B + update
> 
> ### Testing using a single lcore ###	A	B	C
> SP/SC bulk enq/dequeue (size: 8): 	4.06	3.06	3.22
> MP/MC bulk enq/dequeue (size: 8): 	10.05	9.04	9.38
> SP/SC bulk enq/dequeue (size: 32): 	2.93	1.91	1.84
> MP/MC bulk enq/dequeue (size: 32): 	4.12	3.39	3.35
> 
> ### Testing using two hyperthreads ###
> SP/SC bulk enq/dequeue (size: 8): 	9.24	8.92	8.89
> MP/MC bulk enq/dequeue (size: 8): 	15.47	15.39	16.02
> SP/SC bulk enq/dequeue (size: 32): 	5.78	3.87	3.86
> MP/MC bulk enq/dequeue (size: 32): 	6.41	4.57	4.45
> 
> ### Testing using two physical cores ###
> SP/SC bulk enq/dequeue (size: 8): 	24.14	29.89	27.05
> MP/MC bulk enq/dequeue (size: 8): 	68.61	70.55	69.85
> SP/SC bulk enq/dequeue (size: 32): 	12.11	12.99	13.04
> MP/MC bulk enq/dequeue (size: 32): 	22.14	17.86	18.25
> 
> ### Testing using two NUMA nodes ###
> SP/SC bulk enq/dequeue (size: 8): 	48.78	31.98	33.57
> MP/MC bulk enq/dequeue (size: 8): 	167.53	197.29	192.13
> SP/SC bulk enq/dequeue (size: 32): 	31.28	21.68	21.61
> MP/MC bulk enq/dequeue (size: 32): 	53.45	49.94	48.81
> 
> fix patch
> =======
> 
> From a2be5a9b136333a56d466ef042c655e522ca7012 Mon Sep 17 00:00:00
> 2001
> From: Konstantin Ananyev <konstantin.ananyev@intel.com>
> Date: Fri, 18 Oct 2019 15:50:43 +0100
> Subject: [PATCH] fix1
> 
> Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
> ---
>  lib/librte_ring/rte_ring_elem.h | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/lib/librte_ring/rte_ring_elem.h b/lib/librte_ring/rte_ring_elem.h
> index 92e92f150..5e1819069 100644
> --- a/lib/librte_ring/rte_ring_elem.h
> +++ b/lib/librte_ring/rte_ring_elem.h
> @@ -118,7 +118,7 @@ struct rte_ring *rte_ring_create_elem(const char
> *name, unsigned count,
>         uint32_t sz = n * (esize / sizeof(uint32_t)); \
>         if (likely(idx + n < size)) { \
>                 for (i = 0; i < (sz & ((~(unsigned)0x7))); i += 8, idx += 8) { \
> -                       memcpy (ring + i, obj + i, 8 * sizeof (uint32_t)); \
> +                       memcpy (ring + idx, obj + i, 8 * sizeof (uint32_t)); \
>                 } \
>                 switch (n & 0x7) { \
>                 case 7: \
> @@ -153,7 +153,7 @@ struct rte_ring *rte_ring_create_elem(const char
> *name, unsigned count,
>         uint32_t sz = n * (esize / sizeof(uint32_t)); \
>         if (likely(idx + n < size)) { \
>                 for (i = 0; i < (sz & ((~(unsigned)0x7))); i += 8, idx += 8) { \
> -                       memcpy (obj + i, ring + i, 8 * sizeof (uint32_t)); \
> +                       memcpy (obj + i, ring + idx, 8 * sizeof
Actually, this fix alone is not enough. 'idx' counts ring elements, so it needs to be normalized to elements of type 'uint32_t' (scaled by esize / sizeof(uint32_t)).

> + (uint32_t)); \
>                 } \
>                 switch (n & 0x7) { \
>                 case 7: \
> --
> 2.17.1
> 
> update patch (remove macros)
> =========================
> 
> From 18b388e877b97e243f807f27a323e876b30869dd Mon Sep 17 00:00:00
> 2001
> From: Konstantin Ananyev <konstantin.ananyev@intel.com>
> Date: Fri, 18 Oct 2019 17:35:43 +0100
> Subject: [PATCH] update1
> 
> Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
> ---
>  lib/librte_ring/rte_ring_elem.h | 141 ++++++++++++++++----------------
>  1 file changed, 70 insertions(+), 71 deletions(-)
> 
> diff --git a/lib/librte_ring/rte_ring_elem.h b/lib/librte_ring/rte_ring_elem.h
> index 5e1819069..eb706b12f 100644
> --- a/lib/librte_ring/rte_ring_elem.h
> +++ b/lib/librte_ring/rte_ring_elem.h
> @@ -109,75 +109,74 @@ __rte_experimental  struct rte_ring
> *rte_ring_create_elem(const char *name, unsigned count,
>                                 unsigned esize, int socket_id, unsigned flags);
> 
> -#define ENQUEUE_PTRS_GEN(r, ring_start, prod_head, obj_table, esize, n)
> do { \
> -       unsigned int i; \
> -       const uint32_t size = (r)->size; \
> -       uint32_t idx = prod_head & (r)->mask; \
> -       uint32_t *ring = (uint32_t *)ring_start; \
> -       uint32_t *obj = (uint32_t *)obj_table; \
> -       uint32_t sz = n * (esize / sizeof(uint32_t)); \
> -       if (likely(idx + n < size)) { \
> -               for (i = 0; i < (sz & ((~(unsigned)0x7))); i += 8, idx += 8) { \
> -                       memcpy (ring + idx, obj + i, 8 * sizeof (uint32_t)); \
> -               } \
> -               switch (n & 0x7) { \
> -               case 7: \
> -                       ring[idx++] = obj[i++]; /* fallthrough */ \
> -               case 6: \
> -                       ring[idx++] = obj[i++]; /* fallthrough */ \
> -               case 5: \
> -                       ring[idx++] = obj[i++]; /* fallthrough */ \
> -               case 4: \
> -                       ring[idx++] = obj[i++]; /* fallthrough */ \
> -               case 3: \
> -                       ring[idx++] = obj[i++]; /* fallthrough */ \
> -               case 2: \
> -                       ring[idx++] = obj[i++]; /* fallthrough */ \
> -               case 1: \
> -                       ring[idx++] = obj[i++]; /* fallthrough */ \
> -               } \
> -       } else { \
> -               for (i = 0; idx < size; i++, idx++)\
> -                       ring[idx] = obj[i]; \
> -               for (idx = 0; i < n; i++, idx++) \
> -                       ring[idx] = obj[i]; \
> -       } \
> -} while (0)
> -
> -#define DEQUEUE_PTRS_GEN(r, ring_start, cons_head, obj_table, esize, n)
> do { \
> -       unsigned int i; \
> -       uint32_t idx = cons_head & (r)->mask; \
> -       const uint32_t size = (r)->size; \
> -       uint32_t *ring = (uint32_t *)ring_start; \
> -       uint32_t *obj = (uint32_t *)obj_table; \
> -       uint32_t sz = n * (esize / sizeof(uint32_t)); \
> -       if (likely(idx + n < size)) { \
> -               for (i = 0; i < (sz & ((~(unsigned)0x7))); i += 8, idx += 8) { \
> -                       memcpy (obj + i, ring + idx, 8 * sizeof (uint32_t)); \
> -               } \
> -               switch (n & 0x7) { \
> -               case 7: \
> -                       obj[i++] = ring[idx++]; /* fallthrough */ \
> -               case 6: \
> -                       obj[i++] = ring[idx++]; /* fallthrough */ \
> -               case 5: \
> -                       obj[i++] = ring[idx++]; /* fallthrough */ \
> -               case 4: \
> -                       obj[i++] = ring[idx++]; /* fallthrough */ \
> -               case 3: \
> -                       obj[i++] = ring[idx++]; /* fallthrough */ \
> -               case 2: \
> -                       obj[i++] = ring[idx++]; /* fallthrough */ \
> -               case 1: \
> -                       obj[i++] = ring[idx++]; /* fallthrough */ \
> -               } \
> -       } else { \
> -               for (i = 0; idx < size; i++, idx++) \
> -                       obj[i] = ring[idx]; \
> -               for (idx = 0; i < n; i++, idx++) \
> -                       obj[i] = ring[idx]; \
> -       } \
> -} while (0)
> +static __rte_always_inline void
> +copy_elems(uint32_t du32[], const uint32_t su32[], uint32_t num,
> +uint32_t esize) {
> +       uint32_t i, sz;
> +
> +       sz = (num * esize) / sizeof(uint32_t);
> +
> +       for (i = 0; i < (sz & ~7); i += 8)
> +               memcpy(du32 + i, su32 + i, 8 * sizeof(uint32_t));
> +
> +       switch (sz & 7) {
> +       case 7: du32[sz - 7] = su32[sz - 7]; /* fallthrough */
> +       case 6: du32[sz - 6] = su32[sz - 6]; /* fallthrough */
> +       case 5: du32[sz - 5] = su32[sz - 5]; /* fallthrough */
> +       case 4: du32[sz - 4] = su32[sz - 4]; /* fallthrough */
> +       case 3: du32[sz - 3] = su32[sz - 3]; /* fallthrough */
> +       case 2: du32[sz - 2] = su32[sz - 2]; /* fallthrough */
> +       case 1: du32[sz - 1] = su32[sz - 1]; /* fallthrough */
> +       }
> +}
> +
> +static __rte_always_inline void
> +enqueue_elems(struct rte_ring *r, void *ring_start, uint32_t prod_head,
> +               void *obj_table, uint32_t num, uint32_t esize) {
> +       uint32_t idx, n;
> +       uint32_t *du32;
> +       const uint32_t *su32;
> +
> +       const uint32_t size = r->size;
> +
> +       idx = prod_head & (r)->mask;
Same here: 'idx' needs to be normalized to elements of type 'uint32_t', with similar fixes to other variables. I have applied your suggestion in 6/6 of v6 along with my corrections. The rte_ring_elem test cases are added in 3/6. I have verified that they run fine (they cover 64b only for now; I will add more). Hopefully, there are no more errors.
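
To make the normalization concrete, here is a minimal sketch (not the code in v6; the function name and the flat parameter list are hypothetical, chosen so the example stands alone without 'struct rte_ring'). It assumes 'esize' is a non-zero multiple of 4 bytes and the ring storage is addressed as 'uint32_t' words. The slot index and the wrap-around count are both in units of elements, so each is multiplied by the number of words per element before being added to a 'uint32_t' pointer:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical sketch: copy 'num' elements of 'esize' bytes each into a
 * ring of 'size' slots, starting at slot (head & mask). 'ring' points to
 * the ring storage viewed as 32-bit words.
 */
static void
sketch_enqueue_elems(uint32_t *ring, uint32_t size, uint32_t mask,
		uint32_t head, const void *obj_table, uint32_t num,
		uint32_t esize)
{
	const uint32_t *src = obj_table;
	uint32_t scale, idx, first;

	assert(esize != 0 && esize % sizeof(uint32_t) == 0);
	assert(num <= size);

	scale = esize / sizeof(uint32_t);	/* 32-bit words per element */
	idx = head & mask;			/* slot index, in elements */

	if (idx + num < size) {
		/* No wrap: one contiguous copy at word offset idx * scale. */
		memcpy(ring + (size_t)idx * scale, src, (size_t)num * esize);
	} else {
		/* Wrap: fill up to the end of the ring, then restart at 0. */
		first = size - idx;
		memcpy(ring + (size_t)idx * scale, src,
				(size_t)first * esize);
		memcpy(ring, src + (size_t)first * scale,
				(size_t)(num - first) * esize);
	}
}
```

For example, with 8 slots of 8-byte elements and head = 6, enqueueing 3 elements places the first two in slots 6 and 7 (word offsets 12 and 14) and the third in slot 0. Without the scaling, the copy would start at word offset 6, which is the middle of slot 3.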

> +
> +       du32 = (uint32_t *)ring_start + idx;
> +       su32 = obj_table;
> +
> +       if (idx + num < size)
> +               copy_elems(du32, su32, num, esize);
> +       else {
> +               n = size - idx;
> +               copy_elems(du32, su32, n, esize);
> +               copy_elems(ring_start, su32 + n, num - n, esize);
> +       }
> +}
> +
> +static __rte_always_inline void
> +dequeue_elems(struct rte_ring *r, void *ring_start, uint32_t cons_head,
> +               void *obj_table, uint32_t num, uint32_t esize) {
> +       uint32_t idx, n;
> +       uint32_t *du32;
> +       const uint32_t *su32;
> +
> +       const uint32_t size = r->size;
> +
> +       idx = cons_head & (r)->mask;
> +
> +       su32 = (uint32_t *)ring_start + idx;
> +       du32 = obj_table;
> +
> +       if (idx + num < size)
> +               copy_elems(du32, su32, num, esize);
> +       else {
> +               n = size - idx;
> +               copy_elems(du32, su32, n, esize);
> +               copy_elems(du32 + n, ring_start, num - n, esize);
> +       }
> +}
> 
>  /* Between load and load. there might be cpu reorder in weak model
>   * (powerpc/arm).
> @@ -232,7 +231,7 @@ __rte_ring_do_enqueue_elem(struct rte_ring *r, void
> * const obj_table,
>         if (n == 0)
>                 goto end;
> 
> -       ENQUEUE_PTRS_GEN(r, &r[1], prod_head, obj_table, esize, n);
> +       enqueue_elems(r, &r[1], prod_head, obj_table, n, esize);
> 
>         update_tail(&r->prod, prod_head, prod_next, is_sp, 1);
>  end:
> @@ -279,7 +278,7 @@ __rte_ring_do_dequeue_elem(struct rte_ring *r, void
> *obj_table,
>         if (n == 0)
>                 goto end;
> 
> -       DEQUEUE_PTRS_GEN(r, &r[1], cons_head, obj_table, esize, n);
> +       dequeue_elems(r, &r[1], cons_head, obj_table, n, esize);
> 
>         update_tail(&r->cons, cons_head, cons_next, is_sc, 0);
> 
> --
> 2.17.1
> 



* Re: [dpdk-dev] [PATCH v4 1/2] lib/ring: apis to support configurable element size
  2019-10-21  0:36                             ` Honnappa Nagarahalli
@ 2019-10-21  9:04                               ` Ananyev, Konstantin
  2019-10-22 15:59                                 ` Ananyev, Konstantin
  0 siblings, 1 reply; 173+ messages in thread
From: Ananyev, Konstantin @ 2019-10-21  9:04 UTC (permalink / raw)
  To: Honnappa Nagarahalli, Jerin Jacob
  Cc: David Christensen, olivier.matz, sthemmin, jerinj, Richardson,
	Bruce, david.marchand, pbhagavatula, dev, Dharmik Thakkar,
	Ruifeng Wang (Arm Technology China),
	Gavin Hu (Arm Technology China),
	stephen, nd, nd



> >
> > fix patch
> > =======
> >
> > From a2be5a9b136333a56d466ef042c655e522ca7012 Mon Sep 17 00:00:00
> > 2001
> > From: Konstantin Ananyev <konstantin.ananyev@intel.com>
> > Date: Fri, 18 Oct 2019 15:50:43 +0100
> > Subject: [PATCH] fix1
> >
> > Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
> > ---
> >  lib/librte_ring/rte_ring_elem.h | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/lib/librte_ring/rte_ring_elem.h b/lib/librte_ring/rte_ring_elem.h
> > index 92e92f150..5e1819069 100644
> > --- a/lib/librte_ring/rte_ring_elem.h
> > +++ b/lib/librte_ring/rte_ring_elem.h
> > @@ -118,7 +118,7 @@ struct rte_ring *rte_ring_create_elem(const char
> > *name, unsigned count,
> >         uint32_t sz = n * (esize / sizeof(uint32_t)); \
> >         if (likely(idx + n < size)) { \
> >                 for (i = 0; i < (sz & ((~(unsigned)0x7))); i += 8, idx += 8) { \
> > -                       memcpy (ring + i, obj + i, 8 * sizeof (uint32_t)); \
> > +                       memcpy (ring + idx, obj + i, 8 * sizeof (uint32_t)); \
> >                 } \
> >                 switch (n & 0x7) { \
> >                 case 7: \
> > @@ -153,7 +153,7 @@ struct rte_ring *rte_ring_create_elem(const char
> > *name, unsigned count,
> >         uint32_t sz = n * (esize / sizeof(uint32_t)); \
> >         if (likely(idx + n < size)) { \
> >                 for (i = 0; i < (sz & ((~(unsigned)0x7))); i += 8, idx += 8) { \
> > -                       memcpy (obj + i, ring + i, 8 * sizeof (uint32_t)); \
> > +                       memcpy (obj + i, ring + idx, 8 * sizeof
> Actually, this fix alone is not enough. 'idx' counts ring elements, so it needs to be normalized to elements of type 'uint32_t' (scaled by esize / sizeof(uint32_t)).
> 
> > + (uint32_t)); \
> >                 } \
> >                 switch (n & 0x7) { \
> >                 case 7: \
> > --
> > 2.17.1
> >
> > update patch (remove macros)
> > =========================
> >
> > From 18b388e877b97e243f807f27a323e876b30869dd Mon Sep 17 00:00:00
> > 2001
> > From: Konstantin Ananyev <konstantin.ananyev@intel.com>
> > Date: Fri, 18 Oct 2019 17:35:43 +0100
> > Subject: [PATCH] update1
> >
> > Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
> > ---
> >  lib/librte_ring/rte_ring_elem.h | 141 ++++++++++++++++----------------
> >  1 file changed, 70 insertions(+), 71 deletions(-)
> >
> > diff --git a/lib/librte_ring/rte_ring_elem.h b/lib/librte_ring/rte_ring_elem.h
> > index 5e1819069..eb706b12f 100644
> > --- a/lib/librte_ring/rte_ring_elem.h
> > +++ b/lib/librte_ring/rte_ring_elem.h
> > @@ -109,75 +109,74 @@ __rte_experimental  struct rte_ring
> > *rte_ring_create_elem(const char *name, unsigned count,
> >                                 unsigned esize, int socket_id, unsigned flags);
> >
> > -#define ENQUEUE_PTRS_GEN(r, ring_start, prod_head, obj_table, esize, n)
> > do { \
> > -       unsigned int i; \
> > -       const uint32_t size = (r)->size; \
> > -       uint32_t idx = prod_head & (r)->mask; \
> > -       uint32_t *ring = (uint32_t *)ring_start; \
> > -       uint32_t *obj = (uint32_t *)obj_table; \
> > -       uint32_t sz = n * (esize / sizeof(uint32_t)); \
> > -       if (likely(idx + n < size)) { \
> > -               for (i = 0; i < (sz & ((~(unsigned)0x7))); i += 8, idx += 8) { \
> > -                       memcpy (ring + idx, obj + i, 8 * sizeof (uint32_t)); \
> > -               } \
> > -               switch (n & 0x7) { \
> > -               case 7: \
> > -                       ring[idx++] = obj[i++]; /* fallthrough */ \
> > -               case 6: \
> > -                       ring[idx++] = obj[i++]; /* fallthrough */ \
> > -               case 5: \
> > -                       ring[idx++] = obj[i++]; /* fallthrough */ \
> > -               case 4: \
> > -                       ring[idx++] = obj[i++]; /* fallthrough */ \
> > -               case 3: \
> > -                       ring[idx++] = obj[i++]; /* fallthrough */ \
> > -               case 2: \
> > -                       ring[idx++] = obj[i++]; /* fallthrough */ \
> > -               case 1: \
> > -                       ring[idx++] = obj[i++]; /* fallthrough */ \
> > -               } \
> > -       } else { \
> > -               for (i = 0; idx < size; i++, idx++)\
> > -                       ring[idx] = obj[i]; \
> > -               for (idx = 0; i < n; i++, idx++) \
> > -                       ring[idx] = obj[i]; \
> > -       } \
> > -} while (0)
> > -
> > -#define DEQUEUE_PTRS_GEN(r, ring_start, cons_head, obj_table, esize, n)
> > do { \
> > -       unsigned int i; \
> > -       uint32_t idx = cons_head & (r)->mask; \
> > -       const uint32_t size = (r)->size; \
> > -       uint32_t *ring = (uint32_t *)ring_start; \
> > -       uint32_t *obj = (uint32_t *)obj_table; \
> > -       uint32_t sz = n * (esize / sizeof(uint32_t)); \
> > -       if (likely(idx + n < size)) { \
> > -               for (i = 0; i < (sz & ((~(unsigned)0x7))); i += 8, idx += 8) { \
> > -                       memcpy (obj + i, ring + idx, 8 * sizeof (uint32_t)); \
> > -               } \
> > -               switch (n & 0x7) { \
> > -               case 7: \
> > -                       obj[i++] = ring[idx++]; /* fallthrough */ \
> > -               case 6: \
> > -                       obj[i++] = ring[idx++]; /* fallthrough */ \
> > -               case 5: \
> > -                       obj[i++] = ring[idx++]; /* fallthrough */ \
> > -               case 4: \
> > -                       obj[i++] = ring[idx++]; /* fallthrough */ \
> > -               case 3: \
> > -                       obj[i++] = ring[idx++]; /* fallthrough */ \
> > -               case 2: \
> > -                       obj[i++] = ring[idx++]; /* fallthrough */ \
> > -               case 1: \
> > -                       obj[i++] = ring[idx++]; /* fallthrough */ \
> > -               } \
> > -       } else { \
> > -               for (i = 0; idx < size; i++, idx++) \
> > -                       obj[i] = ring[idx]; \
> > -               for (idx = 0; i < n; i++, idx++) \
> > -                       obj[i] = ring[idx]; \
> > -       } \
> > -} while (0)
> > +static __rte_always_inline void
> > +copy_elems(uint32_t du32[], const uint32_t su32[], uint32_t num,
> > +uint32_t esize) {
> > +       uint32_t i, sz;
> > +
> > +       sz = (num * esize) / sizeof(uint32_t);
> > +
> > +       for (i = 0; i < (sz & ~7); i += 8)
> > +               memcpy(du32 + i, su32 + i, 8 * sizeof(uint32_t));
> > +
> > +       switch (sz & 7) {
> > +       case 7: du32[sz - 7] = su32[sz - 7]; /* fallthrough */
> > +       case 6: du32[sz - 6] = su32[sz - 6]; /* fallthrough */
> > +       case 5: du32[sz - 5] = su32[sz - 5]; /* fallthrough */
> > +       case 4: du32[sz - 4] = su32[sz - 4]; /* fallthrough */
> > +       case 3: du32[sz - 3] = su32[sz - 3]; /* fallthrough */
> > +       case 2: du32[sz - 2] = su32[sz - 2]; /* fallthrough */
> > +       case 1: du32[sz - 1] = su32[sz - 1]; /* fallthrough */
> > +       }
> > +}
> > +
> > +static __rte_always_inline void
> > +enqueue_elems(struct rte_ring *r, void *ring_start, uint32_t prod_head,
> > +               void *obj_table, uint32_t num, uint32_t esize) {
> > +       uint32_t idx, n;
> > +       uint32_t *du32;
> > +       const uint32_t *su32;
> > +
> > +       const uint32_t size = r->size;
> > +
> > +       idx = prod_head & (r)->mask;
> Same here, 'idx' needs to be normalized to elements of type 'uint32_t' and similar fixes on other variables.

Oops, true, my bad.

> I have applied your
> suggestion in 6/6 in v6 along with my corrections. The rte_ring_elem test cases are added in 3/6. I have verified that they are running
> fine (they are done for 64b alone, will add more). Hopefully, there are no more errors.

Cool, we'll re-run the perf test on my box.
Thanks
Konstantin

> 
> > +
> > +       du32 = (uint32_t *)ring_start + idx;
> > +       su32 = obj_table;
> > +
> > +       if (idx + num < size)
> > +               copy_elems(du32, su32, num, esize);
> > +       else {
> > +               n = size - idx;
> > +               copy_elems(du32, su32, n, esize);
> > +               copy_elems(ring_start, su32 + n, num - n, esize);
> > +       }
> > +}
> > +
> > +static __rte_always_inline void
> > +dequeue_elems(struct rte_ring *r, void *ring_start, uint32_t cons_head,
> > +               void *obj_table, uint32_t num, uint32_t esize) {
> > +       uint32_t idx, n;
> > +       uint32_t *du32;
> > +       const uint32_t *su32;
> > +
> > +       const uint32_t size = r->size;
> > +
> > +       idx = cons_head & (r)->mask;
> > +
> > +       su32 = (uint32_t *)ring_start + idx;
> > +       du32 = obj_table;
> > +
> > +       if (idx + num < size)
> > +               copy_elems(du32, su32, num, esize);
> > +       else {
> > +               n = size - idx;
> > +               copy_elems(du32, su32, n, esize);
> > +               copy_elems(du32 + n, ring_start, num - n, esize);
> > +       }
> > +}
> >
> >  /* Between load and load. there might be cpu reorder in weak model
> >   * (powerpc/arm).
> > @@ -232,7 +231,7 @@ __rte_ring_do_enqueue_elem(struct rte_ring *r, void
> > * const obj_table,
> >         if (n == 0)
> >                 goto end;
> >
> > -       ENQUEUE_PTRS_GEN(r, &r[1], prod_head, obj_table, esize, n);
> > +       enqueue_elems(r, &r[1], prod_head, obj_table, n, esize);
> >
> >         update_tail(&r->prod, prod_head, prod_next, is_sp, 1);
> >  end:
> > @@ -279,7 +278,7 @@ __rte_ring_do_dequeue_elem(struct rte_ring *r, void
> > *obj_table,
> >         if (n == 0)
> >                 goto end;
> >
> > -       DEQUEUE_PTRS_GEN(r, &r[1], cons_head, obj_table, esize, n);
> > +       dequeue_elems(r, &r[1], cons_head, obj_table, n, esize);
> >
> >         update_tail(&r->cons, cons_head, cons_next, is_sc, 0);
> >
> > --
> > 2.17.1
> >
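
The 'idx' normalization discussed above can be sketched as follows. This is a
minimal stand-alone illustration, not the code from v6: the function name
enqueue_elems_sketch and its flattened parameter list (ring storage, slot
count) are hypothetical, and it uses plain memcpy instead of copy_elems. The
point is that the slot index and the wrap-around offset into obj_table are
both scaled by the number of 32-bit words per element.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical stand-alone enqueue copy. 'ring' is the element storage
 * viewed as 32-bit words, 'size' is the number of slots (power of two),
 * 'esize' is the element size in bytes (a multiple of 4).
 */
static inline void
enqueue_elems_sketch(uint32_t *ring, uint32_t size, uint32_t prod_head,
		const void *obj_table, uint32_t num, uint32_t esize)
{
	const uint32_t scale = esize / sizeof(uint32_t); /* words per element */
	const uint32_t idx = prod_head & (size - 1);     /* index in slots */
	const uint32_t nr_idx = idx * scale;             /* index in words */
	const uint32_t *src32 = obj_table;
	uint32_t n;

	assert(esize % sizeof(uint32_t) == 0 && num <= size);

	if (idx + num < size) {
		/* No wrap-around: one contiguous copy. */
		memcpy(ring + nr_idx, src32, (size_t)num * esize);
	} else {
		/* Fill up to the end of the ring, then restart at slot 0. */
		n = size - idx;
		memcpy(ring + nr_idx, src32, (size_t)n * esize);
		memcpy(ring, src32 + (size_t)n * scale,
			(size_t)(num - n) * esize);
	}
}
```

The comparison 'idx + num < size' stays in units of slots; only the pointer
arithmetic is converted to 32-bit words.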


^ permalink raw reply	[flat|nested] 173+ messages in thread

* Re: [dpdk-dev] [PATCH v4 1/2] lib/ring: apis to support configurable element size
  2019-10-21  9:04                               ` Ananyev, Konstantin
@ 2019-10-22 15:59                                 ` Ananyev, Konstantin
  2019-10-22 17:57                                   ` Ananyev, Konstantin
  0 siblings, 1 reply; 173+ messages in thread
From: Ananyev, Konstantin @ 2019-10-22 15:59 UTC (permalink / raw)
  To: 'Honnappa Nagarahalli', 'Jerin Jacob'
  Cc: 'David Christensen', 'olivier.matz@6wind.com',
	'sthemmin@microsoft.com', 'jerinj@marvell.com',
	Richardson, Bruce, 'david.marchand@redhat.com',
	'pbhagavatula@marvell.com', 'dev@dpdk.org',
	'Dharmik Thakkar',
	'Ruifeng Wang (Arm Technology China)',
	'Gavin Hu (Arm Technology China)',
	'stephen@networkplumber.org', 'nd', 'nd'



> > I have applied your
> > suggestion in 6/6 in v6 along with my corrections. The rte_ring_elem test cases are added in 3/6. I have verified that they are running
> > fine (they are done for 64b alone, will add more). Hopefully, there are no more errors.

Applied v6 and re-ran the tests.
Functional tests pass OK on my boxes.
Perf-test numbers are below.
As far as I can see, pretty much the same pattern as in v5 remains:
MP/MC on 2 different cores and SP/SC single enq/deq
show lower numbers for _elem_.
For the others, _elem_ numbers are about the same or higher.
Personally, I am OK to go ahead with these changes.
Konstantin

A - ring_perf_autotest
B - ring_perf_elem_autotest

 ### Testing single element and burst enq/deq ###	A	B
SP/SC single enq/dequeue: 				8.27	10.94	
MP/MC single enq/dequeue: 				56.11	47.43
SP/SC burst enq/dequeue (size: 8): 			4.20	3.50
MP/MC burst enq/dequeue (size: 8): 			9.93	9.29
SP/SC burst enq/dequeue (size: 32): 			2.93	1.94
MP/MC burst enq/dequeue (size: 32): 			4.10	3.35

### Testing empty dequeue ###
SC empty dequeue: 					2.00	3.00
MC empty dequeue: 					3.00	2.00

### Testing using a single lcore ###
SP/SC bulk enq/dequeue (size: 8): 			4.06	3.30	
MP/MC bulk enq/dequeue (size: 8): 			9.84	9.28
SP/SC bulk enq/dequeue (size: 32): 			2.93	1.88
MP/MC bulk enq/dequeue (size: 32): 			4.10	3.32

### Testing using two hyperthreads ###
SP/SC bulk enq/dequeue (size: 8): 			9.22	8.83
MP/MC bulk enq/dequeue (size: 8): 			15.73	15.86
SP/SC bulk enq/dequeue (size: 32): 			5.78	3.83
MP/MC bulk enq/dequeue (size: 32): 			6.33	4.53

### Testing using two physical cores ###
SP/SC bulk enq/dequeue (size: 8): 			23.78	19.32
MP/MC bulk enq/dequeue (size: 8): 			68.54	71.97
SP/SC bulk enq/dequeue (size: 32): 			11.99	10.77
MP/MC bulk enq/dequeue (size: 32): 			21.96	18.66

### Testing using two NUMA nodes ###
SP/SC bulk enq/dequeue (size: 8): 			50.13	33.92
MP/MC bulk enq/dequeue (size: 8): 			177.98	195.87
SP/SC bulk enq/dequeue (size: 32): 			32.98	23.12
MP/MC bulk enq/dequeue (size: 32): 			55.86	48.76


^ permalink raw reply	[flat|nested] 173+ messages in thread

* Re: [dpdk-dev] [PATCH v4 1/2] lib/ring: apis to support configurable element size
  2019-10-22 15:59                                 ` Ananyev, Konstantin
@ 2019-10-22 17:57                                   ` Ananyev, Konstantin
  2019-10-23 18:58                                     ` Honnappa Nagarahalli
  0 siblings, 1 reply; 173+ messages in thread
From: Ananyev, Konstantin @ 2019-10-22 17:57 UTC (permalink / raw)
  To: Ananyev, Konstantin, 'Honnappa Nagarahalli',
	'Jerin Jacob'
  Cc: 'David Christensen', 'olivier.matz@6wind.com',
	'sthemmin@microsoft.com', 'jerinj@marvell.com',
	Richardson, Bruce, 'david.marchand@redhat.com',
	'pbhagavatula@marvell.com', 'dev@dpdk.org',
	'Dharmik Thakkar',
	'Ruifeng Wang (Arm Technology China)',
	'Gavin Hu (Arm Technology China)',
	'stephen@networkplumber.org', 'nd', 'nd'



> -----Original Message-----
> From: dev <dev-bounces@dpdk.org> On Behalf Of Ananyev, Konstantin
> Sent: Tuesday, October 22, 2019 5:00 PM
> To: 'Honnappa Nagarahalli' <Honnappa.Nagarahalli@arm.com>; 'Jerin Jacob' <jerinjacobk@gmail.com>
> Cc: 'David Christensen' <drc@linux.vnet.ibm.com>; 'olivier.matz@6wind.com' <olivier.matz@6wind.com>; 'sthemmin@microsoft.com'
> <sthemmin@microsoft.com>; 'jerinj@marvell.com' <jerinj@marvell.com>; Richardson, Bruce <bruce.richardson@intel.com>;
> 'david.marchand@redhat.com' <david.marchand@redhat.com>; 'pbhagavatula@marvell.com' <pbhagavatula@marvell.com>;
> 'dev@dpdk.org' <dev@dpdk.org>; 'Dharmik Thakkar' <Dharmik.Thakkar@arm.com>; 'Ruifeng Wang (Arm Technology China)'
> <Ruifeng.Wang@arm.com>; 'Gavin Hu (Arm Technology China)' <Gavin.Hu@arm.com>; 'stephen@networkplumber.org'
> <stephen@networkplumber.org>; 'nd' <nd@arm.com>; 'nd' <nd@arm.com>
> Subject: Re: [dpdk-dev] [PATCH v4 1/2] lib/ring: apis to support configurable element size
> 
> 
> 
> > > I have applied your
> > > suggestion in 6/6 in v6 along with my corrections. The rte_ring_elem test cases are added in 3/6. I have verified that they are running
> > > fine (they are done for 64b alone, will add more). Hopefully, there are no more errors.
> 
> Applied v6 and re-run the tests.
> Functional test passes ok on my boxes.
> Perf-test numbers below.
> As I can see pretty much same pattern as in v5 remains:
> MP/MC on 2 different cores

Forgot to add: this holds for 8 elems only; for 32, the new ones are always better.

> and SP/SC single enq/deq
> show lower numbers for _elem_.
> For others _elem_ numbers are about the same or higher.
> Personally, I am ok to go ahead with these changes.
> Konstantin
> 
> A - ring_perf_autotest
> B - ring_perf_elem_autotest
> 
>  ### Testing single element and burst enq/deq ###	A	B
> SP/SC single enq/dequeue: 				8.27	10.94
> MP/MC single enq/dequeue: 				56.11	47.43
> SP/SC burst enq/dequeue (size: 8): 			4.20	3.50
> MP/MC burst enq/dequeue (size: 8): 			9.93	9.29
> SP/SC burst enq/dequeue (size: 32): 			2.93	1.94
> MP/MC burst enq/dequeue (size: 32): 			4.10	3.35
> 
> ### Testing empty dequeue ###
> SC empty dequeue: 					2.00	3.00
> MC empty dequeue: 					3.00	2.00
> 
> ### Testing using a single lcore ###
> SP/SC bulk enq/dequeue (size: 8): 			4.06	3.30
> MP/MC bulk enq/dequeue (size: 8): 			9.84	9.28
> SP/SC bulk enq/dequeue (size: 32): 			2.93	1.88
> MP/MC bulk enq/dequeue (size: 32): 			4.10	3.32
> 
> ### Testing using two hyperthreads ###
> SP/SC bulk enq/dequeue (size: 8): 			9.22	8.83
> MP/MC bulk enq/dequeue (size: 8): 			15.73	15.86
> SP/SC bulk enq/dequeue (size: 32): 			5.78	3.83
> MP/MC bulk enq/dequeue (size: 32): 			6.33	4.53
> 
> ### Testing using two physical cores ###
> SP/SC bulk enq/dequeue (size: 8): 			23.78	19.32
> MP/MC bulk enq/dequeue (size: 8): 			68.54	71.97
> SP/SC bulk enq/dequeue (size: 32): 			11.99	10.77
> MP/MC bulk enq/dequeue (size: 32): 			21.96	18.66
> 
> ### Testing using two NUMA nodes ###
> SP/SC bulk enq/dequeue (size: 8): 			50.13	33.92
> MP/MC bulk enq/dequeue (size: 8): 			177.98	195.87
> SP/SC bulk enq/dequeue (size: 32): 			32.98	23.12
> MP/MC bulk enq/dequeue (size: 32): 			55.86	48.76


^ permalink raw reply	[flat|nested] 173+ messages in thread

* Re: [dpdk-dev] [RFC v6 0/6] lib/ring: APIs to support custom element size
  2019-10-21  0:22   ` [dpdk-dev] [RFC v6 0/6] lib/ring: APIs to support custom element size Honnappa Nagarahalli
                       ` (5 preceding siblings ...)
  2019-10-21  0:23     ` [dpdk-dev] [RFC v6 6/6] lib/ring: improved copy function to copy ring elements Honnappa Nagarahalli
@ 2019-10-23  9:48     ` Olivier Matz
  6 siblings, 0 replies; 173+ messages in thread
From: Olivier Matz @ 2019-10-23  9:48 UTC (permalink / raw)
  To: Honnappa Nagarahalli
  Cc: sthemmin, jerinj, bruce.richardson, david.marchand, pbhagavatula,
	konstantin.ananyev, drc, hemant.agrawal, dev, dharmik.thakkar,
	ruifeng.wang, gavin.hu

Hi Honnappa,

On Sun, Oct 20, 2019 at 07:22:54PM -0500, Honnappa Nagarahalli wrote:
> The current rte_ring hard-codes the type of the ring element to 'void *',
> hence the size of the element is hard-coded to 32b/64b. Since the ring
> element type is not an input to rte_ring APIs, it results in couple
> of issues:
> 
> 1) If an application requires to store an element which is not 64b, it
>    needs to write its own ring APIs similar to rte_event_ring APIs. This
>    creates additional burden on the programmers, who end up making
>    work-arounds and often waste memory.
> 2) If there are multiple libraries that store elements of the same
>    type, currently they would have to write their own rte_ring APIs. This
>    results in code duplication.
> 
> This patch adds new APIs to support configurable ring element size.
> The APIs support custom element sizes by allowing to define the ring
> element to be a multiple of 32b.
> 
> The aim is to achieve same performance as the existing ring
> implementation. The patch adds same performance tests that are run
> for existing APIs. This allows for performance comparison.
> 
> I also tested with memcpy. x86 shows significant improvements on bulk
> and burst tests. On the Arm platform, I used, there is a drop of
> 4% to 6% in few tests. May be this is something that we can explore
> later.
> 
> Note that this version skips changes to other libraries as I would
> like to get an agreement on the implementation from the community.
> They will be added once there is agreement on the rte_ring changes.
> 
> v6
>  - Labelled as RFC to indicate the better status
>  - Added unit tests to test the rte_ring_xxx_elem APIs
>  - Corrected 'macro based partial memcpy' (5/6) patch
>  - Added Konstantin's method after correction (6/6)
>  - Check Patch shows significant warnings and errors mainly due
>    copying code from existing test cases. None of them are harmful.
>    I will fix them once we have an agreement.
> 
> v5
>  - Use memcpy for chunks of 32B (Konstantin).
>  - Both 'ring_perf_autotest' and 'ring_perf_elem_autotest' are available
>    to compare the results easily.
>  - Copying without memcpy is also available in 1/3, if anyone wants to
>    experiment on their platform.
>  - Added other platform owners to test on their respective platforms.
> 
> v4
>  - Few fixes after more performance testing
> 
> v3
>  - Removed macro-fest and used inline functions
>    (Stephen, Bruce)
> 
> v2
>  - Change Event Ring implementation to use ring templates
>    (Jerin, Pavan)
> 
> Honnappa Nagarahalli (6):
>   test/ring: use division for cycle count calculation
>   lib/ring: apis to support configurable element size
>   test/ring: add functional tests for configurable element size ring
>   test/ring: add perf tests for configurable element size ring
>   lib/ring: copy ring elements using memcpy partially
>   lib/ring: improved copy function to copy ring elements
> 
>  app/test/Makefile                    |   2 +
>  app/test/meson.build                 |   2 +
>  app/test/test_ring_elem.c            | 859 +++++++++++++++++++++++++++
>  app/test/test_ring_perf.c            |  22 +-
>  app/test/test_ring_perf_elem.c       | 419 +++++++++++++
>  lib/librte_ring/Makefile             |   3 +-
>  lib/librte_ring/meson.build          |   4 +
>  lib/librte_ring/rte_ring.c           |  34 +-
>  lib/librte_ring/rte_ring.h           |   1 +
>  lib/librte_ring/rte_ring_elem.h      | 818 +++++++++++++++++++++++++
>  lib/librte_ring/rte_ring_version.map |   2 +
>  11 files changed, 2147 insertions(+), 19 deletions(-)
>  create mode 100644 app/test/test_ring_elem.c
>  create mode 100644 app/test/test_ring_perf_elem.c
>  create mode 100644 lib/librte_ring/rte_ring_elem.h

Sorry, I come a day after the fair.

I have only a few comments on the shape (I'll reply to individual
patches). On the substance, it looks good to me. I also feel this
version is much better than the template-based versions.

Thanks
Olivier

^ permalink raw reply	[flat|nested] 173+ messages in thread

* Re: [dpdk-dev] [RFC v6 1/6] test/ring: use division for cycle count calculation
  2019-10-21  0:22     ` [dpdk-dev] [RFC v6 1/6] test/ring: use division for cycle count calculation Honnappa Nagarahalli
@ 2019-10-23  9:49       ` Olivier Matz
  0 siblings, 0 replies; 173+ messages in thread
From: Olivier Matz @ 2019-10-23  9:49 UTC (permalink / raw)
  To: Honnappa Nagarahalli
  Cc: sthemmin, jerinj, bruce.richardson, david.marchand, pbhagavatula,
	konstantin.ananyev, drc, hemant.agrawal, dev, dharmik.thakkar,
	ruifeng.wang, gavin.hu

On Sun, Oct 20, 2019 at 07:22:55PM -0500, Honnappa Nagarahalli wrote:
> Use division instead of modulo operation to calculate more
> accurate cycle count.
> 
> Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>

Acked-by: Olivier Matz <olivier.matz@6wind.com>

^ permalink raw reply	[flat|nested] 173+ messages in thread

* Re: [dpdk-dev] [RFC v6 2/6] lib/ring: apis to support configurable element size
  2019-10-21  0:22     ` [dpdk-dev] [RFC v6 2/6] lib/ring: apis to support configurable element size Honnappa Nagarahalli
@ 2019-10-23  9:59       ` Olivier Matz
  2019-10-23 19:12         ` Honnappa Nagarahalli
  0 siblings, 1 reply; 173+ messages in thread
From: Olivier Matz @ 2019-10-23  9:59 UTC (permalink / raw)
  To: Honnappa Nagarahalli
  Cc: sthemmin, jerinj, bruce.richardson, david.marchand, pbhagavatula,
	konstantin.ananyev, drc, hemant.agrawal, dev, dharmik.thakkar,
	ruifeng.wang, gavin.hu

On Sun, Oct 20, 2019 at 07:22:56PM -0500, Honnappa Nagarahalli wrote:
> Current APIs assume ring elements to be pointers. However, in many
> use cases, the size can be different. Add new APIs to support
> configurable ring element sizes.
> 
> Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
> Reviewed-by: Dharmik Thakkar <dharmik.thakkar@arm.com>
> Reviewed-by: Gavin Hu <gavin.hu@arm.com>
> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
> ---
>  lib/librte_ring/Makefile             |   3 +-
>  lib/librte_ring/meson.build          |   4 +
>  lib/librte_ring/rte_ring.c           |  44 +-
>  lib/librte_ring/rte_ring.h           |   1 +
>  lib/librte_ring/rte_ring_elem.h      | 946 +++++++++++++++++++++++++++
>  lib/librte_ring/rte_ring_version.map |   2 +
>  6 files changed, 991 insertions(+), 9 deletions(-)
>  create mode 100644 lib/librte_ring/rte_ring_elem.h

(...)

> +/* the actual enqueue of pointers on the ring.
> + * Placed here since identical code needed in both
> + * single and multi producer enqueue functions.
> + */
> +#define ENQUEUE_PTRS_ELEM(r, ring_start, prod_head, obj_table, esize, n) do { \
> +	if (esize == 4) \
> +		ENQUEUE_PTRS_32(r, ring_start, prod_head, obj_table, n); \
> +	else if (esize == 8) \
> +		ENQUEUE_PTRS_64(r, ring_start, prod_head, obj_table, n); \
> +	else if (esize == 16) \
> +		ENQUEUE_PTRS_128(r, ring_start, prod_head, obj_table, n); \
> +} while (0)

My initial thinking was that these could be static inline functions
instead of macros. I see that patches 5 and 6 are changing it. I wonder,
however, whether patches 5 and 6 should be merged and moved before this
one: that would avoid introducing new macros that are removed afterwards.

(...)

> +/**
> + * @internal Enqueue several objects on the ring
> + *
> + * @param r
> + *   A pointer to the ring structure.
> + * @param obj_table
> + *   A pointer to a table of void * pointers (objects).
> + * @param esize
> + *   The size of ring element, in bytes. It must be a multiple of 4.
> + *   Currently, sizes 4, 8 and 16 are supported. This should be the same
> + *   as passed while creating the ring, otherwise the results are undefined.

The comments "It must be a multiple of 4" and "Currently, sizes 4, 8 and 16 are
supported" are redundant (they appear several times in the file). The second one
should be removed by patch 5 (I think it is missing?).

But if patch 5 and 6 are moved before this one, only "It must be a multiple of
4" would be needed I think, and there would be no transition with only 3
supported sizes.

^ permalink raw reply	[flat|nested] 173+ messages in thread

* Re: [dpdk-dev] [RFC v6 3/6] test/ring: add functional tests for configurable element size ring
  2019-10-21  0:22     ` [dpdk-dev] [RFC v6 3/6] test/ring: add functional tests for configurable element size ring Honnappa Nagarahalli
@ 2019-10-23 10:01       ` Olivier Matz
  2019-10-23 11:12         ` Ananyev, Konstantin
  0 siblings, 1 reply; 173+ messages in thread
From: Olivier Matz @ 2019-10-23 10:01 UTC (permalink / raw)
  To: Honnappa Nagarahalli
  Cc: sthemmin, jerinj, bruce.richardson, david.marchand, pbhagavatula,
	konstantin.ananyev, drc, hemant.agrawal, dev, dharmik.thakkar,
	ruifeng.wang, gavin.hu

On Sun, Oct 20, 2019 at 07:22:57PM -0500, Honnappa Nagarahalli wrote:
> Add functional tests for rte_ring_xxx_elem APIs. At this point these
> are derived mainly from existing rte_ring_xxx test cases.
> 
> Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
> ---
>  app/test/Makefile         |   1 +
>  app/test/meson.build      |   1 +
>  app/test/test_ring_elem.c | 859 ++++++++++++++++++++++++++++++++++++++
>  3 files changed, 861 insertions(+)
>  create mode 100644 app/test/test_ring_elem.c

Given the few differences between test_ring_elem.c and test_ring.c, wouldn't
it be possible to have both tests in the same file?

^ permalink raw reply	[flat|nested] 173+ messages in thread

* Re: [dpdk-dev] [RFC v6 4/6] test/ring: add perf tests for configurable element size ring
  2019-10-21  0:22     ` [dpdk-dev] [RFC v6 4/6] test/ring: add perf " Honnappa Nagarahalli
@ 2019-10-23 10:02       ` Olivier Matz
  0 siblings, 0 replies; 173+ messages in thread
From: Olivier Matz @ 2019-10-23 10:02 UTC (permalink / raw)
  To: Honnappa Nagarahalli
  Cc: sthemmin, jerinj, bruce.richardson, david.marchand, pbhagavatula,
	konstantin.ananyev, drc, hemant.agrawal, dev, dharmik.thakkar,
	ruifeng.wang, gavin.hu

On Sun, Oct 20, 2019 at 07:22:58PM -0500, Honnappa Nagarahalli wrote:
> Add performance tests for rte_ring_xxx_elem APIs. At this point these
> are derived mainly from existing rte_ring_xxx test cases.
> 
> Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
> ---
>  app/test/Makefile              |   1 +
>  app/test/meson.build           |   1 +
>  app/test/test_ring_perf_elem.c | 419 +++++++++++++++++++++++++++++++++
>  3 files changed, 421 insertions(+)
>  create mode 100644 app/test/test_ring_perf_elem.c

Same question as for the previous patch: can it be merged with test_ring_perf.c?

^ permalink raw reply	[flat|nested] 173+ messages in thread

* Re: [dpdk-dev] [RFC v6 6/6] lib/ring: improved copy function to copy ring elements
  2019-10-21  0:23     ` [dpdk-dev] [RFC v6 6/6] lib/ring: improved copy function to copy ring elements Honnappa Nagarahalli
@ 2019-10-23 10:05       ` Olivier Matz
  0 siblings, 0 replies; 173+ messages in thread
From: Olivier Matz @ 2019-10-23 10:05 UTC (permalink / raw)
  To: Honnappa Nagarahalli
  Cc: sthemmin, jerinj, bruce.richardson, david.marchand, pbhagavatula,
	konstantin.ananyev, drc, hemant.agrawal, dev, dharmik.thakkar,
	ruifeng.wang, gavin.hu

On Sun, Oct 20, 2019 at 07:23:00PM -0500, Honnappa Nagarahalli wrote:
> Improved copy function to copy to/from ring elements.
> 
> Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
> Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
> ---
>  lib/librte_ring/rte_ring_elem.h | 165 ++++++++++++++++----------------
>  1 file changed, 84 insertions(+), 81 deletions(-)

(...)

> +static __rte_always_inline void
> +copy_elems(uint32_t du32[], const uint32_t su32[], uint32_t nr_num)
> +{
> +	uint32_t i;
> +
> +	for (i = 0; i < (nr_num & ~7); i += 8)
> +		memcpy(du32 + i, su32 + i, 8 * sizeof(uint32_t));
> +
> +	switch (nr_num & 7) {
> +	case 7: du32[nr_num - 7] = su32[nr_num - 7]; /* fallthrough */
> +	case 6: du32[nr_num - 6] = su32[nr_num - 6]; /* fallthrough */
> +	case 5: du32[nr_num - 5] = su32[nr_num - 5]; /* fallthrough */
> +	case 4: du32[nr_num - 4] = su32[nr_num - 4]; /* fallthrough */
> +	case 3: du32[nr_num - 3] = su32[nr_num - 3]; /* fallthrough */
> +	case 2: du32[nr_num - 2] = su32[nr_num - 2]; /* fallthrough */
> +	case 1: du32[nr_num - 1] = su32[nr_num - 1]; /* fallthrough */
> +	}
> +}

minor comment: I suggest src32 and dst32 instead of su32 and du32.
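
With that renaming applied, the helper would read roughly as follows. This is
only a sketch: the name copy_elems_sketch and the assert are additions for
illustration, and DPDK's __rte_always_inline is replaced by plain
'static inline' so the block compiles outside the DPDK tree.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Copy nr_num 32-bit words from src32 to dst32. */
static inline void
copy_elems_sketch(uint32_t dst32[], const uint32_t src32[], uint32_t nr_num)
{
	uint32_t i;

	assert(nr_num == 0 || (dst32 != NULL && src32 != NULL));

	/* Bulk part: fixed-size 32-byte chunks the compiler can unroll. */
	for (i = 0; i < (nr_num & ~7u); i += 8)
		memcpy(dst32 + i, src32 + i, 8 * sizeof(uint32_t));

	/* Tail: the remaining 0..7 words, addressed from the end. */
	switch (nr_num & 7) {
	case 7: dst32[nr_num - 7] = src32[nr_num - 7]; /* fallthrough */
	case 6: dst32[nr_num - 6] = src32[nr_num - 6]; /* fallthrough */
	case 5: dst32[nr_num - 5] = src32[nr_num - 5]; /* fallthrough */
	case 4: dst32[nr_num - 4] = src32[nr_num - 4]; /* fallthrough */
	case 3: dst32[nr_num - 3] = src32[nr_num - 3]; /* fallthrough */
	case 2: dst32[nr_num - 2] = src32[nr_num - 2]; /* fallthrough */
	case 1: dst32[nr_num - 1] = src32[nr_num - 1];
	}
}
```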

^ permalink raw reply	[flat|nested] 173+ messages in thread

* Re: [dpdk-dev] [RFC v6 3/6] test/ring: add functional tests for configurable element size ring
  2019-10-23 10:01       ` Olivier Matz
@ 2019-10-23 11:12         ` Ananyev, Konstantin
  0 siblings, 0 replies; 173+ messages in thread
From: Ananyev, Konstantin @ 2019-10-23 11:12 UTC (permalink / raw)
  To: Olivier Matz, Honnappa Nagarahalli
  Cc: sthemmin, jerinj, Richardson, Bruce, david.marchand,
	pbhagavatula, drc, hemant.agrawal, dev, dharmik.thakkar,
	ruifeng.wang, gavin.hu


> 
> On Sun, Oct 20, 2019 at 07:22:57PM -0500, Honnappa Nagarahalli wrote:
> > Add functional tests for rte_ring_xxx_elem APIs. At this point these
> > are derived mainly from existing rte_ring_xxx test cases.
> >
> > Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
> > ---
> >  app/test/Makefile         |   1 +
> >  app/test/meson.build      |   1 +
> >  app/test/test_ring_elem.c | 859 ++++++++++++++++++++++++++++++++++++++
> >  3 files changed, 861 insertions(+)
> >  create mode 100644 app/test/test_ring_elem.c
> 
> Given the few differences between test_ring_elem.c and test_ring.c, wouldn't
> it be possible to have both tests in the same file?

+1 to reduce duplication...
Maybe move the common code into a .h file and have the actual
enqueue/dequeue calls as defines.

^ permalink raw reply	[flat|nested] 173+ messages in thread

* Re: [dpdk-dev] [PATCH v4 1/2] lib/ring: apis to support configurable element size
  2019-10-22 17:57                                   ` Ananyev, Konstantin
@ 2019-10-23 18:58                                     ` Honnappa Nagarahalli
  0 siblings, 0 replies; 173+ messages in thread
From: Honnappa Nagarahalli @ 2019-10-23 18:58 UTC (permalink / raw)
  To: Ananyev, Konstantin, 'Jerin Jacob'
  Cc: 'David Christensen', 'olivier.matz@6wind.com',
	'sthemmin@microsoft.com',
	jerinj, Richardson, Bruce, 'david.marchand@redhat.com',
	'pbhagavatula@marvell.com', 'dev@dpdk.org',
	Dharmik Thakkar, Ruifeng Wang (Arm Technology China),
	Gavin Hu (Arm Technology China),
	'stephen@networkplumber.org',
	nd, Honnappa Nagarahalli, nd

<snip>
> >
> > > > I have applied your
> > > > suggestion in 6/6 in v6 along with my corrections. The
> > > > rte_ring_elem test cases are added in 3/6. I have verified that they are
> running fine (they are done for 64b alone, will add more). Hopefully, there are
> no more errors.
> >
> > Applied v6 and re-run the tests.
> > Functional test passes ok on my boxes.
> > Perf-test numbers below.
> > As I can see pretty much same pattern as in v5 remains:
> > MP/MC on 2 different cores
> 
> Forgot to add: for 8 elems, for 32 - new ones always better.
> 
> > and SP/SC single enq/deq
> > show lower numbers for _elem_.
> > For others _elem_ numbers are about the same or higher.
> > Personally, I am ok to go ahead with these changes.
> > Konstantin
> >
> > A - ring_perf_autotest
> > B - ring_perf_elem_autotest
> >
> >  ### Testing single element and burst enq/deq ###	A	B
> > SP/SC single enq/dequeue: 				8.27	10.94
> > MP/MC single enq/dequeue: 				56.11	47.43
> > SP/SC burst enq/dequeue (size: 8): 			4.20	3.50
> > MP/MC burst enq/dequeue (size: 8): 			9.93	9.29
> > SP/SC burst enq/dequeue (size: 32): 			2.93	1.94
> > MP/MC burst enq/dequeue (size: 32): 			4.10	3.35
> >
> > ### Testing empty dequeue ###
> > SC empty dequeue: 					2.00	3.00
> > MC empty dequeue: 					3.00	2.00
> >
> > ### Testing using a single lcore ###
> > SP/SC bulk enq/dequeue (size: 8): 			4.06	3.30
> > MP/MC bulk enq/dequeue (size: 8): 			9.84	9.28
> > SP/SC bulk enq/dequeue (size: 32): 			2.93	1.88
> > MP/MC bulk enq/dequeue (size: 32): 			4.10	3.32
> >
> > ### Testing using two hyperthreads ###
> > SP/SC bulk enq/dequeue (size: 8): 			9.22	8.83
> > MP/MC bulk enq/dequeue (size: 8): 			15.73	15.86
> > SP/SC bulk enq/dequeue (size: 32): 			5.78	3.83
> > MP/MC bulk enq/dequeue (size: 32): 			6.33	4.53
> >
> > ### Testing using two physical cores ###
> > SP/SC bulk enq/dequeue (size: 8): 			23.78	19.32
> > MP/MC bulk enq/dequeue (size: 8): 			68.54	71.97
> > SP/SC bulk enq/dequeue (size: 32): 			11.99	10.77
> > MP/MC bulk enq/dequeue (size: 32): 			21.96	18.66
> >
> > ### Testing using two NUMA nodes ###
> > SP/SC bulk enq/dequeue (size: 8): 			50.13	33.92
> > MP/MC bulk enq/dequeue (size: 8): 			177.98	195.87
> > SP/SC bulk enq/dequeue (size: 32): 			32.98	23.12
> > MP/MC bulk enq/dequeue (size: 32): 			55.86	48.76

Thanks Konstantin. The performance of 5/6 is mostly worse than that of 6/6. So, we should not consider 5/6 (it will not be included in future versions).
A - ring_perf_autotest (existing code)
B - ring_perf_elem_autotest (6/6)

Numbers from my side:
On one Arm platform:
### Testing single element and burst enq/deq ###	A	B
SP/SC single enq/dequeue:				1.04	1.06 (1.92)
MP/MC single enq/dequeue: 				1.46	1.51 (3.42)
SP/SC burst enq/dequeue (size: 8): 			0.18	0.17 (-5.55)
MP/MC burst enq/dequeue (size: 8): 			0.23	0.22 (-4.34)
SP/SC burst enq/dequeue (size: 32): 			0.05	0.05 (0)
MP/MC burst enq/dequeue (size: 32): 			0.07	0.06 (-14.28)
	
### Testing empty dequeue ###	
SC empty dequeue: 					0.27	0.27 (0)
MC empty dequeue: 					0.27	0.27 (0)
	
### Testing using a single lcore ###	
SP/SC bulk enq/dequeue (size: 8): 			0.18	0.17 (-5.55)
MP/MC bulk enq/dequeue (size: 8): 			0.23	0.23 (0)
SP/SC bulk enq/dequeue (size: 32): 			0.05	0.05 (0)
MP/MC bulk enq/dequeue (size: 32): 			0.07	0.06 (0)
	
### Testing using two physical cores ###	
SP/SC bulk enq/dequeue (size: 8): 			0.79	0.79 (0)
MP/MC bulk enq/dequeue (size: 8): 			1.42	1.37 (-3.52)
SP/SC bulk enq/dequeue (size: 32): 			0.20	0.20 (0)
MP/MC bulk enq/dequeue (size: 32): 			0.33	0.35 (6.06)

On another Arm platform:

### Testing single element and burst enq/deq ###	A	B	
SP/SC single enq/dequeue:				11.54	11.79 (2.16)
MP/MC single enq/dequeue: 				11.84	12.54 (5.91)
SP/SC burst enq/dequeue (size: 8): 			1.51	1.33   (-11.92)
MP/MC burst enq/dequeue (size: 8): 			1.91	1.73   (-9.42)
SP/SC burst enq/dequeue (size: 32): 			0.62	0.42   (-32.25)
MP/MC burst enq/dequeue (size: 32): 			0.72	0.52   (-27.77)
	
### Testing empty dequeue ###	
SC empty dequeue: 					2.48	2.48 (0)
MC empty dequeue: 					2.48	2.48 (0)
	
### Testing using a single lcore ###	
SP/SC bulk enq/dequeue (size: 8): 			1.52	1.33 (-12.5)
MP/MC bulk enq/dequeue (size: 8): 			1.92	1.73 (-9.89)
SP/SC bulk enq/dequeue (size: 32): 			0.62	0.42 (-32.25)
MP/MC bulk enq/dequeue (size: 32): 			0.72	0.52 (-27.77)
	
### Testing using two physical cores ###	
SP/SC bulk enq/dequeue (size: 8): 			6.30	6.57   (4.28)
MP/MC bulk enq/dequeue (size: 8): 			10.59	10.45 (-1.32)
SP/SC bulk enq/dequeue (size: 32): 			1.92	1.58   (-17.70)
MP/MC bulk enq/dequeue (size: 32): 			2.51	2.47   (-1.59)
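
The figures in parentheses in the tables above appear to be the relative
change of B over A, in percent (negative means B needs fewer cycles). A small
sketch of that arithmetic, assuming this reading is right; the helper name
pct_change is hypothetical, and the listed values look truncated rather than
rounded, so the last digit may differ slightly:

```c
#include <assert.h>

/* Relative change of b with respect to a, in percent. */
static inline double
pct_change(double a, double b)
{
	assert(a != 0.0);
	return (b - a) / a * 100.0;
}
```

For example, pct_change(1.04, 1.06) is about 1.92, matching the first row of
the first table.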

From my side, I would say let us just go with patch 2/6.

Jerin/David, any opinion on your side?

^ permalink raw reply	[flat|nested] 173+ messages in thread

* Re: [dpdk-dev] [RFC v6 2/6] lib/ring: apis to support configurable element size
  2019-10-23  9:59       ` Olivier Matz
@ 2019-10-23 19:12         ` Honnappa Nagarahalli
  0 siblings, 0 replies; 173+ messages in thread
From: Honnappa Nagarahalli @ 2019-10-23 19:12 UTC (permalink / raw)
  To: Olivier Matz
  Cc: sthemmin, jerinj, bruce.richardson, david.marchand, pbhagavatula,
	konstantin.ananyev, drc, hemant.agrawal, dev, Dharmik Thakkar,
	Ruifeng Wang (Arm Technology China),
	Gavin Hu (Arm Technology China),
	Honnappa Nagarahalli, nd, nd

> 
> On Sun, Oct 20, 2019 at 07:22:56PM -0500, Honnappa Nagarahalli wrote:
> > Current APIs assume ring elements to be pointers. However, in many use
> > cases, the size can be different. Add new APIs to support configurable
> > ring element sizes.
> >
> > Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
> > Reviewed-by: Dharmik Thakkar <dharmik.thakkar@arm.com>
> > Reviewed-by: Gavin Hu <gavin.hu@arm.com>
> > Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
> > ---
> >  lib/librte_ring/Makefile             |   3 +-
> >  lib/librte_ring/meson.build          |   4 +
> >  lib/librte_ring/rte_ring.c           |  44 +-
> >  lib/librte_ring/rte_ring.h           |   1 +
> >  lib/librte_ring/rte_ring_elem.h      | 946 +++++++++++++++++++++++++++
> >  lib/librte_ring/rte_ring_version.map |   2 +
> >  6 files changed, 991 insertions(+), 9 deletions(-)
> >  create mode 100644 lib/librte_ring/rte_ring_elem.h
> 
> (...)
> 
> > +/* the actual enqueue of pointers on the ring.
> > + * Placed here since identical code needed in both
> > + * single and multi producer enqueue functions.
> > + */
> > +#define ENQUEUE_PTRS_ELEM(r, ring_start, prod_head, obj_table, esize, n) do { \
> > +	if (esize == 4) \
> > +		ENQUEUE_PTRS_32(r, ring_start, prod_head, obj_table, n); \
> > +	else if (esize == 8) \
> > +		ENQUEUE_PTRS_64(r, ring_start, prod_head, obj_table, n); \
> > +	else if (esize == 16) \
> > +		ENQUEUE_PTRS_128(r, ring_start, prod_head, obj_table, n); \
> > +} while (0)
> 
> My initial thinking was that these could be static inline functions instead of
> macros. I see that patches 5 and 6 are changing it. I wonder, however, if patches
> 5 and 6 shouldn't be merged and moved before this
> one: it would avoid introducing new macros that will be removed afterwards.
Patches 2, 5 and 6 implement different methods of copying the elements. We can drop 5, as 6 proves to be better than 5 in my tests. The question of choosing between 2 and 6 is still open. If we go with 2, I will convert the macros into inline functions.
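
As an illustration of the macro-to-inline conversion mentioned here, a minimal sketch follows. It is not the code from any of the patches: the names copy_elems, copy_elems_32 and copy_elems_64 are hypothetical, the copy works on flat buffers with no ring wrap-around, and it assumes the buffers are suitably aligned for the element type. When esize is a compile-time constant at the call site, a compiler can typically drop the branch that is not taken, which is the property the macros were relying on.

```c
#include <stdint.h>

/* Hypothetical helper: copy n 32-bit words. */
static inline void
copy_elems_32(uint32_t *dst, const uint32_t *src, uint32_t n)
{
	uint32_t i;

	for (i = 0; i < n; i++)
		dst[i] = src[i];
}

/* Hypothetical helper: copy n 64-bit words. */
static inline void
copy_elems_64(uint64_t *dst, const uint64_t *src, uint32_t n)
{
	uint32_t i;

	for (i = 0; i < n; i++)
		dst[i] = src[i];
}

/*
 * Dispatch on the element size, as the macro did, but as a function.
 * esize is assumed to be a non-zero multiple of 4; 8B elements take
 * the dedicated 64-bit path, everything else is copied as 32-bit words.
 */
static inline void
copy_elems(void *dst, const void *src, uint32_t esize, uint32_t n)
{
	if (esize == 8)
		copy_elems_64(dst, src, n);
	else
		copy_elems_32(dst, src, n * (esize / 4));
}
```

Unlike the macro, the function form type-checks its arguments and evaluates each of them exactly once.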

> 
> (...)
> 
> > +/**
> > + * @internal Enqueue several objects on the ring
> > + *
> > + * @param r
> > + *   A pointer to the ring structure.
> > + * @param obj_table
> > + *   A pointer to a table of void * pointers (objects).
> > + * @param esize
> > + *   The size of ring element, in bytes. It must be a multiple of 4.
> > + *   Currently, sizes 4, 8 and 16 are supported. This should be the same
> > + *   as passed while creating the ring, otherwise the results are undefined.
> 
> The comments "It must be a multiple of 4" and "Currently, sizes 4, 8 and 16 are
> supported" are redundant (they appear several times in the file). The second one
> should be removed by patch 5 (I think it is missing?).
> 
> But if patch 5 and 6 are moved before this one, only "It must be a multiple of
> 4" would be needed I think, and there would be no transition with only 3
> supported sizes.
(Refer to the comment above.) If 2 is chosen, I would like to remove the restriction to a limited set of sizes by adding a for loop around the 32b copy. The 64b and 128b copies will remain the same to match the existing performance.
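
A rough sketch of what a loop around the 32b copy could look like for any element size that is a multiple of 4 is given below. It is an illustration under stated assumptions, not the patch code: enqueue_words is a hypothetical name, the ring is modelled as a plain uint32_t array, slots is assumed to be a power of 2, and both the start index and the ring size are scaled from elements to 32-bit words before copying.

```c
#include <stdint.h>

/*
 * Hypothetical helper: copy n elements of esize bytes each into a ring
 * that holds `slots` elements, starting at slot (head & (slots - 1)).
 * Assumes esize is a non-zero multiple of 4 and slots is a power of 2.
 */
static void
enqueue_words(uint32_t *ring, uint32_t slots, uint32_t head,
		const void *obj_table, uint32_t esize, uint32_t n)
{
	const uint32_t *obj = obj_table;
	/* Number of 32-bit words per element */
	const uint32_t scale = esize / sizeof(uint32_t);
	/* Ring size, start index and copy length, all in 32-bit words */
	const uint32_t size = slots * scale;
	const uint32_t nr = n * scale;
	uint32_t idx = (head & (slots - 1)) * scale;
	uint32_t i;

	for (i = 0; i < nr; i++) {
		ring[idx] = obj[i];
		/* Wrap around to the start of the ring */
		if (++idx == size)
			idx = 0;
	}
}
```

With 12-byte elements in a 4-slot ring, enqueuing 2 elements at head 3 fills the last slot and then wraps to slot 0.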

^ permalink raw reply	[flat|nested] 173+ messages in thread

* [dpdk-dev] [PATCH v7 00/17] lib/ring: APIs to support custom element size
  2019-09-06 19:05 ` [dpdk-dev] [PATCH v2 0/6] " Honnappa Nagarahalli
                     ` (11 preceding siblings ...)
  2019-10-21  0:22   ` [dpdk-dev] [RFC v6 0/6] lib/ring: APIs to support custom element size Honnappa Nagarahalli
@ 2019-12-20  4:45   ` Honnappa Nagarahalli
  2019-12-20  4:45     ` [dpdk-dev] [PATCH v7 01/17] test/ring: use division for cycle count calculation Honnappa Nagarahalli
                       ` (16 more replies)
  2020-01-13 17:25   ` [dpdk-dev] [PATCH v8 0/6] lib/ring: APIs to support custom element size Honnappa Nagarahalli
                     ` (2 subsequent siblings)
  15 siblings, 17 replies; 173+ messages in thread
From: Honnappa Nagarahalli @ 2019-12-20  4:45 UTC (permalink / raw)
  To: olivier.matz, sthemmin, jerinj, bruce.richardson, david.marchand,
	pbhagavatula, konstantin.ananyev, honnappa.nagarahalli
  Cc: dev, dharmik.thakkar, ruifeng.wang, gavin.hu, nd

The current rte_ring hard-codes the type of the ring element to 'void *',
hence the size of the element is hard-coded to 32b/64b. Since the ring
element type is not an input to rte_ring APIs, it results in a couple
of issues:

1) If an application needs to store an element which is not 64b, it
   needs to write its own ring APIs similar to rte_event_ring APIs. This
   creates additional burden on the programmers, who end up making
   work-arounds and often waste memory.
2) If there are multiple libraries that store elements of the same
   type, currently they would have to write their own rte_ring APIs. This
   results in code duplication.
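
To put a number on the memory cost in issue 1, here is a small sketch. The helper names are hypothetical and not part of rte_ring; it only compares the element storage needed when 32-bit values sit in pointer-sized slots with the storage needed for 4-byte slots.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: bytes of storage for count slots of slot_size bytes. */
static size_t
slot_storage_bytes(size_t count, size_t slot_size)
{
	return count * slot_size;
}

/*
 * Hypothetical helper: bytes wasted when 32-bit values are stored in
 * pointer-sized slots instead of 4-byte slots.
 */
static size_t
wasted_bytes_for_u32(size_t count)
{
	return slot_storage_bytes(count, sizeof(void *)) -
		slot_storage_bytes(count, sizeof(uint32_t));
}
```

On a platform with 8-byte pointers, half of the element storage is unused in this case.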

This patch adds new APIs to support configurable ring element size.
The APIs support custom element sizes by allowing the ring element
size to be defined as a multiple of 32b.
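
As a sketch of what "a multiple of 32b" means for a custom element type, consider the hypothetical 12-byte structure below. The helper names are illustrative only, and rejecting a size of 0 is an assumption made for this example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical element: three 32-bit fields, 12 bytes, no padding. */
struct flow_entry {
	uint32_t flow_id;
	uint32_t port;
	uint32_t queue;
};

/*
 * Hypothetical check: true when the element size is a whole number of
 * 32-bit words. Rejecting 0 is an assumption of this sketch.
 */
static bool
elem_size_is_valid(unsigned int esize)
{
	return esize != 0 && esize % 4 == 0;
}

/* Hypothetical helper: number of 32-bit words occupied by one element. */
static unsigned int
elem_size_in_words(unsigned int esize)
{
	return esize / (unsigned int)sizeof(uint32_t);
}
```

A 12-byte element passes the check and occupies 3 words; a 6-byte element does not qualify.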

The aim is to achieve the same performance as the existing ring
implementation.

The changes to test cases are significant. The patches 3/17 to 15/17
are created to help with the review. Otherwise, they can be squashed
into a single commit.

v7
 - Merged the test cases to test both legacy APIs and rte_ring_xxx_elem APIs
   without code duplication (Konstantin, Olivier)
 - Performance test cases are merged as well (Konstantin, Olivier)
 - Macros to copy elements are converted into inline functions (Olivier)
 - Added back the changes to hash and event libraries

v6
 - Labelled as RFC to indicate the better status
 - Added unit tests to test the rte_ring_xxx_elem APIs
 - Corrected 'macro based partial memcpy' (5/6) patch
 - Added Konstantin's method after correction (6/6)
 - Check Patch shows significant warnings and errors, mainly due to
   copying code from existing test cases. None of them are harmful.
   I will fix them once we have an agreement.

v5
 - Use memcpy for chunks of 32B (Konstantin).
 - Both 'ring_perf_autotest' and 'ring_perf_elem_autotest' are available
   to compare the results easily.
 - Copying without memcpy is also available in 1/3, if anyone wants to
   experiment on their platform.
 - Added other platform owners to test on their respective platforms.

v4
 - Few fixes after more performance testing

v3
 - Removed macro-fest and used inline functions
   (Stephen, Bruce)

v2
 - Change Event Ring implementation to use ring templates
   (Jerin, Pavan)

Honnappa Nagarahalli (17):
  test/ring: use division for cycle count calculation
  lib/ring: apis to support configurable element size
  test/ring: add functional tests for rte_ring_xxx_elem APIs
  test/ring: test burst APIs with random empty-full test case
  test/ring: add default, single element test cases
  test/ring: rte_ring_xxx_elem test cases for exact size ring
  test/ring: negative test cases for rte_ring_xxx_elem APIs
  test/ring: remove duplicate test cases
  test/ring: removed unused variable synchro
  test/ring: modify single element enq/deq perf test cases
  test/ring: modify burst enq/deq perf test cases
  test/ring: modify bulk enq/deq perf test cases
  test/ring: modify bulk empty deq perf test cases
  test/ring: modify multi-lcore perf test cases
  test/ring: adjust run-on-all-cores perf test cases
  lib/hash: use ring with 32b element size to save memory
  lib/eventdev: use custom element size ring for event rings

 app/test/test_ring.c                 | 1227 +++++++++++---------------
 app/test/test_ring.h                 |  203 +++++
 app/test/test_ring_perf.c            |  434 +++++----
 lib/librte_eventdev/rte_event_ring.c |  147 +--
 lib/librte_eventdev/rte_event_ring.h |   45 +-
 lib/librte_hash/rte_cuckoo_hash.c    |   97 +-
 lib/librte_hash/rte_cuckoo_hash.h    |    2 +-
 lib/librte_ring/Makefile             |    3 +-
 lib/librte_ring/meson.build          |    4 +
 lib/librte_ring/rte_ring.c           |   41 +-
 lib/librte_ring/rte_ring.h           |    1 +
 lib/librte_ring/rte_ring_elem.h      | 1002 +++++++++++++++++++++
 lib/librte_ring/rte_ring_version.map |    2 +
 13 files changed, 2102 insertions(+), 1106 deletions(-)
 create mode 100644 app/test/test_ring.h
 create mode 100644 lib/librte_ring/rte_ring_elem.h

-- 
2.17.1


^ permalink raw reply	[flat|nested] 173+ messages in thread

* [dpdk-dev] [PATCH v7 01/17] test/ring: use division for cycle count calculation
  2019-12-20  4:45   ` [dpdk-dev] [PATCH v7 00/17] " Honnappa Nagarahalli
@ 2019-12-20  4:45     ` Honnappa Nagarahalli
  2019-12-20  4:45     ` [dpdk-dev] [PATCH v7 02/17] lib/ring: apis to support configurable element size Honnappa Nagarahalli
                       ` (15 subsequent siblings)
  16 siblings, 0 replies; 173+ messages in thread
From: Honnappa Nagarahalli @ 2019-12-20  4:45 UTC (permalink / raw)
  To: olivier.matz, sthemmin, jerinj, bruce.richardson, david.marchand,
	pbhagavatula, konstantin.ananyev, honnappa.nagarahalli
  Cc: dev, dharmik.thakkar, ruifeng.wang, gavin.hu, nd

Use floating-point division instead of a right shift to calculate
a more accurate cycle count.
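
The diff below replaces '>> iter_shift' with a division on double. The following sketch shows why the results differ. The function names are hypothetical and are not taken from test_ring_perf.c: the shift truncates the average to a whole number of cycles, while the division keeps the fractional part.

```c
#include <stdint.h>

/* Hypothetical helper: average via right shift, truncates to an integer. */
static uint64_t
avg_cycles_shift(uint64_t total_cycles, unsigned int iter_shift)
{
	return total_cycles >> iter_shift;
}

/* Hypothetical helper: average via floating-point division. */
static double
avg_cycles_div(uint64_t total_cycles, uint64_t iterations)
{
	return (double)total_cycles / (double)iterations;
}
```

For 25 cycles over 16 iterations, the shift reports 1 cycle while the division reports 1.5625 cycles.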

Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
---
 app/test/test_ring_perf.c | 22 ++++++++++++----------
 1 file changed, 12 insertions(+), 10 deletions(-)

diff --git a/app/test/test_ring_perf.c b/app/test/test_ring_perf.c
index 70ee46ffe..6c2aca483 100644
--- a/app/test/test_ring_perf.c
+++ b/app/test/test_ring_perf.c
@@ -357,10 +357,10 @@ test_single_enqueue_dequeue(struct rte_ring *r)
 	}
 	const uint64_t mc_end = rte_rdtsc();
 
-	printf("SP/SC single enq/dequeue: %"PRIu64"\n",
-			(sc_end-sc_start) >> iter_shift);
-	printf("MP/MC single enq/dequeue: %"PRIu64"\n",
-			(mc_end-mc_start) >> iter_shift);
+	printf("SP/SC single enq/dequeue: %.2F\n",
+			((double)(sc_end-sc_start)) / iterations);
+	printf("MP/MC single enq/dequeue: %.2F\n",
+			((double)(mc_end-mc_start)) / iterations);
 }
 
 /*
@@ -395,13 +395,15 @@ test_burst_enqueue_dequeue(struct rte_ring *r)
 		}
 		const uint64_t mc_end = rte_rdtsc();
 
-		uint64_t mc_avg = ((mc_end-mc_start) >> iter_shift) / bulk_sizes[sz];
-		uint64_t sc_avg = ((sc_end-sc_start) >> iter_shift) / bulk_sizes[sz];
+		double mc_avg = ((double)(mc_end-mc_start) / iterations) /
+					bulk_sizes[sz];
+		double sc_avg = ((double)(sc_end-sc_start) / iterations) /
+					bulk_sizes[sz];
 
-		printf("SP/SC burst enq/dequeue (size: %u): %"PRIu64"\n", bulk_sizes[sz],
-				sc_avg);
-		printf("MP/MC burst enq/dequeue (size: %u): %"PRIu64"\n", bulk_sizes[sz],
-				mc_avg);
+		printf("SP/SC burst enq/dequeue (size: %u): %.2F\n",
+				bulk_sizes[sz], sc_avg);
+		printf("MP/MC burst enq/dequeue (size: %u): %.2F\n",
+				bulk_sizes[sz], mc_avg);
 	}
 }
 
-- 
2.17.1


^ permalink raw reply	[flat|nested] 173+ messages in thread

* [dpdk-dev] [PATCH v7 02/17] lib/ring: apis to support configurable element size
  2019-12-20  4:45   ` [dpdk-dev] [PATCH v7 00/17] " Honnappa Nagarahalli
  2019-12-20  4:45     ` [dpdk-dev] [PATCH v7 01/17] test/ring: use division for cycle count calculation Honnappa Nagarahalli
@ 2019-12-20  4:45     ` Honnappa Nagarahalli
  2020-01-02 16:42       ` Ananyev, Konstantin
  2019-12-20  4:45     ` [dpdk-dev] [PATCH v7 03/17] test/ring: add functional tests for rte_ring_xxx_elem APIs Honnappa Nagarahalli
                       ` (14 subsequent siblings)
  16 siblings, 1 reply; 173+ messages in thread
From: Honnappa Nagarahalli @ 2019-12-20  4:45 UTC (permalink / raw)
  To: olivier.matz, sthemmin, jerinj, bruce.richardson, david.marchand,
	pbhagavatula, konstantin.ananyev, honnappa.nagarahalli
  Cc: dev, dharmik.thakkar, ruifeng.wang, gavin.hu, nd

Current APIs assume ring elements to be pointers. However, in many
use cases, the size can be different. Add new APIs to support
configurable ring element sizes.
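
A simplified model of the size calculation done by rte_ring_get_memsize_elem() in this patch is sketched below, for illustration only. The name ring_memsize, the hdr_size parameter (standing in for sizeof(struct rte_ring)) and the 64-byte cache line are assumptions of the sketch; it also rejects a count of 0 and omits the RTE_RING_SZ_MASK upper limit.

```c
#include <errno.h>
#include <stddef.h>

/* Assumed cache line size for this sketch. */
#define SKETCH_CACHE_LINE 64

/*
 * Hypothetical helper: header size plus count * esize, rounded up to a
 * cache line. Returns -EINVAL if esize is not a multiple of 4 or count
 * is not a power of 2.
 */
static long
ring_memsize(size_t hdr_size, unsigned int esize, unsigned int count)
{
	size_t sz;

	if (esize % 4 != 0)
		return -EINVAL;

	if (count == 0 || (count & (count - 1)) != 0)
		return -EINVAL;

	sz = hdr_size + (size_t)count * esize;
	/* Round up to the next multiple of the cache line size */
	sz = (sz + SKETCH_CACHE_LINE - 1) & ~(size_t)(SKETCH_CACHE_LINE - 1);
	return (long)sz;
}
```

With a 128-byte header, 8 elements of 12 bytes need 224 bytes, which rounds up to 256.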

Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Dharmik Thakkar <dharmik.thakkar@arm.com>
Reviewed-by: Gavin Hu <gavin.hu@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
---
 lib/librte_ring/Makefile             |    3 +-
 lib/librte_ring/meson.build          |    4 +
 lib/librte_ring/rte_ring.c           |   41 +-
 lib/librte_ring/rte_ring.h           |    1 +
 lib/librte_ring/rte_ring_elem.h      | 1002 ++++++++++++++++++++++++++
 lib/librte_ring/rte_ring_version.map |    2 +
 6 files changed, 1044 insertions(+), 9 deletions(-)
 create mode 100644 lib/librte_ring/rte_ring_elem.h

diff --git a/lib/librte_ring/Makefile b/lib/librte_ring/Makefile
index 22454b084..917c560ad 100644
--- a/lib/librte_ring/Makefile
+++ b/lib/librte_ring/Makefile
@@ -6,7 +6,7 @@ include $(RTE_SDK)/mk/rte.vars.mk
 # library name
 LIB = librte_ring.a
 
-CFLAGS += $(WERROR_FLAGS) -I$(SRCDIR) -O3
+CFLAGS += $(WERROR_FLAGS) -I$(SRCDIR) -O3 -DALLOW_EXPERIMENTAL_API
 LDLIBS += -lrte_eal
 
 EXPORT_MAP := rte_ring_version.map
@@ -16,6 +16,7 @@ SRCS-$(CONFIG_RTE_LIBRTE_RING) := rte_ring.c
 
 # install includes
 SYMLINK-$(CONFIG_RTE_LIBRTE_RING)-include := rte_ring.h \
+					rte_ring_elem.h \
 					rte_ring_generic.h \
 					rte_ring_c11_mem.h
 
diff --git a/lib/librte_ring/meson.build b/lib/librte_ring/meson.build
index ca8a435e9..f2f3ccc88 100644
--- a/lib/librte_ring/meson.build
+++ b/lib/librte_ring/meson.build
@@ -3,5 +3,9 @@
 
 sources = files('rte_ring.c')
 headers = files('rte_ring.h',
+		'rte_ring_elem.h',
 		'rte_ring_c11_mem.h',
 		'rte_ring_generic.h')
+
+# rte_ring_create_elem and rte_ring_get_memsize_elem are experimental
+allow_experimental_apis = true
diff --git a/lib/librte_ring/rte_ring.c b/lib/librte_ring/rte_ring.c
index d9b308036..3e15dc398 100644
--- a/lib/librte_ring/rte_ring.c
+++ b/lib/librte_ring/rte_ring.c
@@ -33,6 +33,7 @@
 #include <rte_tailq.h>
 
 #include "rte_ring.h"
+#include "rte_ring_elem.h"
 
 TAILQ_HEAD(rte_ring_list, rte_tailq_entry);
 
@@ -46,23 +47,38 @@ EAL_REGISTER_TAILQ(rte_ring_tailq)
 
 /* return the size of memory occupied by a ring */
 ssize_t
-rte_ring_get_memsize(unsigned count)
+rte_ring_get_memsize_elem(unsigned int esize, unsigned int count)
 {
 	ssize_t sz;
 
+	/* Check if element size is a multiple of 4B */
+	if (esize % 4 != 0) {
+		RTE_LOG(ERR, RING, "element size is not a multiple of 4\n");
+
+		return -EINVAL;
+	}
+
 	/* count must be a power of 2 */
 	if ((!POWEROF2(count)) || (count > RTE_RING_SZ_MASK )) {
 		RTE_LOG(ERR, RING,
-			"Requested size is invalid, must be power of 2, and "
-			"do not exceed the size limit %u\n", RTE_RING_SZ_MASK);
+			"Requested number of elements is invalid, must be power of 2, and not exceed %u\n",
+			RTE_RING_SZ_MASK);
+
 		return -EINVAL;
 	}
 
-	sz = sizeof(struct rte_ring) + count * sizeof(void *);
+	sz = sizeof(struct rte_ring) + count * esize;
 	sz = RTE_ALIGN(sz, RTE_CACHE_LINE_SIZE);
 	return sz;
 }
 
+/* return the size of memory occupied by a ring */
+ssize_t
+rte_ring_get_memsize(unsigned count)
+{
+	return rte_ring_get_memsize_elem(sizeof(void *), count);
+}
+
 void
 rte_ring_reset(struct rte_ring *r)
 {
@@ -114,10 +130,10 @@ rte_ring_init(struct rte_ring *r, const char *name, unsigned count,
 	return 0;
 }
 
-/* create the ring */
+/* create the ring for a given element size */
 struct rte_ring *
-rte_ring_create(const char *name, unsigned count, int socket_id,
-		unsigned flags)
+rte_ring_create_elem(const char *name, unsigned int esize, unsigned int count,
+		int socket_id, unsigned int flags)
 {
 	char mz_name[RTE_MEMZONE_NAMESIZE];
 	struct rte_ring *r;
@@ -135,7 +151,7 @@ rte_ring_create(const char *name, unsigned count, int socket_id,
 	if (flags & RING_F_EXACT_SZ)
 		count = rte_align32pow2(count + 1);
 
-	ring_size = rte_ring_get_memsize(count);
+	ring_size = rte_ring_get_memsize_elem(esize, count);
 	if (ring_size < 0) {
 		rte_errno = ring_size;
 		return NULL;
@@ -182,6 +198,15 @@ rte_ring_create(const char *name, unsigned count, int socket_id,
 	return r;
 }
 
+/* create the ring */
+struct rte_ring *
+rte_ring_create(const char *name, unsigned count, int socket_id,
+		unsigned flags)
+{
+	return rte_ring_create_elem(name, sizeof(void *), count, socket_id,
+		flags);
+}
+
 /* free the ring */
 void
 rte_ring_free(struct rte_ring *r)
diff --git a/lib/librte_ring/rte_ring.h b/lib/librte_ring/rte_ring.h
index 2a9f768a1..18fc5d845 100644
--- a/lib/librte_ring/rte_ring.h
+++ b/lib/librte_ring/rte_ring.h
@@ -216,6 +216,7 @@ int rte_ring_init(struct rte_ring *r, const char *name, unsigned count,
  */
 struct rte_ring *rte_ring_create(const char *name, unsigned count,
 				 int socket_id, unsigned flags);
+
 /**
  * De-allocate all memory used by the ring.
  *
diff --git a/lib/librte_ring/rte_ring_elem.h b/lib/librte_ring/rte_ring_elem.h
new file mode 100644
index 000000000..fc7fe127c
--- /dev/null
+++ b/lib/librte_ring/rte_ring_elem.h
@@ -0,0 +1,1002 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ *
+ * Copyright (c) 2019 Arm Limited
+ * Copyright (c) 2010-2017 Intel Corporation
+ * Copyright (c) 2007-2009 Kip Macy kmacy@freebsd.org
+ * All rights reserved.
+ * Derived from FreeBSD's bufring.h
+ * Used as BSD-3 Licensed with permission from Kip Macy.
+ */
+
+#ifndef _RTE_RING_ELEM_H_
+#define _RTE_RING_ELEM_H_
+
+/**
+ * @file
+ * RTE Ring with user defined element size
+ */
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#include <stdio.h>
+#include <stdint.h>
+#include <sys/queue.h>
+#include <errno.h>
+#include <rte_common.h>
+#include <rte_config.h>
+#include <rte_memory.h>
+#include <rte_lcore.h>
+#include <rte_atomic.h>
+#include <rte_branch_prediction.h>
+#include <rte_memzone.h>
+#include <rte_pause.h>
+
+#include "rte_ring.h"
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * Calculate the memory size needed for a ring with given element size
+ *
+ * This function returns the number of bytes needed for a ring, given
+ * the number of elements in it and the size of the element. This value
+ * is the sum of the size of the structure rte_ring and the size of the
+ * memory needed for storing the elements. The value is aligned to a cache
+ * line size.
+ *
+ * @param esize
+ *   The size of ring element, in bytes. It must be a multiple of 4.
+ * @param count
+ *   The number of elements in the ring (must be a power of 2).
+ * @return
+ *   - The memory size needed for the ring on success.
+ *   - -EINVAL - esize is not a multiple of 4 or count provided is not a
+ *		 power of 2.
+ */
+__rte_experimental
+ssize_t rte_ring_get_memsize_elem(unsigned int esize, unsigned int count);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * Create a new ring named *name* that stores elements with given size.
+ *
+ * This function uses ``memzone_reserve()`` to allocate memory. Then it
+ * calls rte_ring_init() to initialize an empty ring.
+ *
+ * The new ring size is set to *count*, which must be a power of
+ * two. Water marking is disabled by default. The real usable ring size
+ * is *count-1* instead of *count* to differentiate a full ring from an
+ * empty ring.
+ *
+ * The ring is added in RTE_TAILQ_RING list.
+ *
+ * @param name
+ *   The name of the ring.
+ * @param esize
+ *   The size of ring element, in bytes. It must be a multiple of 4.
+ * @param count
+ *   The number of elements in the ring (must be a power of 2).
+ * @param socket_id
+ *   The *socket_id* argument is the socket identifier in case of
+ *   NUMA. The value can be *SOCKET_ID_ANY* if there is no NUMA
+ *   constraint for the reserved zone.
+ * @param flags
+ *   An OR of the following:
+ *    - RING_F_SP_ENQ: If this flag is set, the default behavior when
+ *      using ``rte_ring_enqueue()`` or ``rte_ring_enqueue_bulk()``
+ *      is "single-producer". Otherwise, it is "multi-producers".
+ *    - RING_F_SC_DEQ: If this flag is set, the default behavior when
+ *      using ``rte_ring_dequeue()`` or ``rte_ring_dequeue_bulk()``
+ *      is "single-consumer". Otherwise, it is "multi-consumers".
+ * @return
+ *   On success, the pointer to the new allocated ring. NULL on error with
+ *    rte_errno set appropriately. Possible errno values include:
+ *    - E_RTE_NO_CONFIG - function could not get pointer to rte_config structure
+ *    - E_RTE_SECONDARY - function was called from a secondary process instance
+ *    - EINVAL - esize is not a multiple of 4 or count provided is not a
+ *		 power of 2.
+ *    - ENOSPC - the maximum number of memzones has already been allocated
+ *    - EEXIST - a memzone with the same name already exists
+ *    - ENOMEM - no appropriate memory area found in which to create memzone
+ */
+__rte_experimental
+struct rte_ring *rte_ring_create_elem(const char *name, unsigned int esize,
+			unsigned int count, int socket_id, unsigned int flags);
+
+static __rte_always_inline void
+enqueue_elems_32(struct rte_ring *r, uint32_t idx,
+		const void *obj_table, uint32_t n)
+{
+	unsigned int i;
+	const uint32_t size = r->size;
+	uint32_t *ring = (uint32_t *)&r[1];
+	const uint32_t *obj = (const uint32_t *)obj_table;
+	if (likely(idx + n < size)) {
+		for (i = 0; i < (n & ~0x7); i += 8, idx += 8) {
+			ring[idx] = obj[i];
+			ring[idx + 1] = obj[i + 1];
+			ring[idx + 2] = obj[i + 2];
+			ring[idx + 3] = obj[i + 3];
+			ring[idx + 4] = obj[i + 4];
+			ring[idx + 5] = obj[i + 5];
+			ring[idx + 6] = obj[i + 6];
+			ring[idx + 7] = obj[i + 7];
+		}
+		switch (n & 0x7) {
+		case 7:
+			ring[idx++] = obj[i++]; /* fallthrough */
+		case 6:
+			ring[idx++] = obj[i++]; /* fallthrough */
+		case 5:
+			ring[idx++] = obj[i++]; /* fallthrough */
+		case 4:
+			ring[idx++] = obj[i++]; /* fallthrough */
+		case 3:
+			ring[idx++] = obj[i++]; /* fallthrough */
+		case 2:
+			ring[idx++] = obj[i++]; /* fallthrough */
+		case 1:
+			ring[idx++] = obj[i++]; /* fallthrough */
+		}
+	} else {
+		for (i = 0; idx < size; i++, idx++)
+			ring[idx] = obj[i];
+		/* Start at the beginning */
+		for (idx = 0; i < n; i++, idx++)
+			ring[idx] = obj[i];
+	}
+}
+
+static __rte_always_inline void
+enqueue_elems_64(struct rte_ring *r, uint32_t prod_head,
+		const void *obj_table, uint32_t n)
+{
+	unsigned int i;
+	const uint32_t size = r->size;
+	uint32_t idx = prod_head & r->mask;
+	uint64_t *ring = (uint64_t *)&r[1];
+	const uint64_t *obj = (const uint64_t *)obj_table;
+	if (likely(idx + n < size)) {
+		for (i = 0; i < (n & ~0x3); i += 4, idx += 4) {
+			ring[idx] = obj[i];
+			ring[idx + 1] = obj[i + 1];
+			ring[idx + 2] = obj[i + 2];
+			ring[idx + 3] = obj[i + 3];
+		}
+		switch (n & 0x3) {
+		case 3:
+			ring[idx++] = obj[i++]; /* fallthrough */
+		case 2:
+			ring[idx++] = obj[i++]; /* fallthrough */
+		case 1:
+			ring[idx++] = obj[i++];
+		}
+	} else {
+		for (i = 0; idx < size; i++, idx++)
+			ring[idx] = obj[i];
+		/* Start at the beginning */
+		for (idx = 0; i < n; i++, idx++)
+			ring[idx] = obj[i];
+	}
+}
+
+static __rte_always_inline void
+enqueue_elems_128(struct rte_ring *r, uint32_t prod_head,
+		const void *obj_table, uint32_t n)
+{
+	unsigned int i;
+	const uint32_t size = r->size;
+	uint32_t idx = prod_head & r->mask;
+	__uint128_t *ring = (__uint128_t *)&r[1];
+	const __uint128_t *obj = (const __uint128_t *)obj_table;
+	if (likely(idx + n < size)) {
+		for (i = 0; i < (n & ~0x1); i += 2, idx += 2) {
+			ring[idx] = obj[i];
+			ring[idx + 1] = obj[i + 1];
+		}
+		switch (n & 0x1) {
+		case 1:
+			ring[idx++] = obj[i++];
+		}
+	} else {
+		for (i = 0; idx < size; i++, idx++)
+			ring[idx] = obj[i];
+		/* Start at the beginning */
+		for (idx = 0; i < n; i++, idx++)
+			ring[idx] = obj[i];
+	}
+}
+
+/* the actual enqueue of elements on the ring.
+ * Placed here since identical code needed in both
+ * single and multi producer enqueue functions.
+ */
+static __rte_always_inline void
+enqueue_elems(struct rte_ring *r, uint32_t prod_head, const void *obj_table,
+		uint32_t esize, uint32_t num)
+{
+	uint32_t idx, nr_idx, nr_num;
+
+	/* 8B and 16B copies implemented individually to retain
+	 * the current performance.
+	 */
+	if (esize == 8)
+		enqueue_elems_64(r, prod_head, obj_table, num);
+	else if (esize == 16)
+		enqueue_elems_128(r, prod_head, obj_table, num);
+	else {
+		/* Normalize to uint32_t */
+		uint32_t scale = esize / sizeof(uint32_t);
+		nr_num = num * scale;
+		idx = prod_head & r->mask;
+		nr_idx = idx * scale;
+		enqueue_elems_32(r, nr_idx, obj_table, nr_num);
+	}
+}
+
+static __rte_always_inline void
+dequeue_elems_32(struct rte_ring *r, uint32_t idx,
+		void *obj_table, uint32_t n)
+{
+	unsigned int i;
+	const uint32_t size = r->size;
+	uint32_t *ring = (uint32_t *)&r[1];
+	uint32_t *obj = (uint32_t *)obj_table;
+	if (likely(idx + n < size)) {
+		for (i = 0; i < (n & ~0x7); i += 8, idx += 8) {
+			obj[i] = ring[idx];
+			obj[i + 1] = ring[idx + 1];
+			obj[i + 2] = ring[idx + 2];
+			obj[i + 3] = ring[idx + 3];
+			obj[i + 4] = ring[idx + 4];
+			obj[i + 5] = ring[idx + 5];
+			obj[i + 6] = ring[idx + 6];
+			obj[i + 7] = ring[idx + 7];
+		}
+		switch (n & 0x7) {
+		case 7:
+			obj[i++] = ring[idx++]; /* fallthrough */
+		case 6:
+			obj[i++] = ring[idx++]; /* fallthrough */
+		case 5:
+			obj[i++] = ring[idx++]; /* fallthrough */
+		case 4:
+			obj[i++] = ring[idx++]; /* fallthrough */
+		case 3:
+			obj[i++] = ring[idx++]; /* fallthrough */
+		case 2:
+			obj[i++] = ring[idx++]; /* fallthrough */
+		case 1:
+			obj[i++] = ring[idx++]; /* fallthrough */
+		}
+	} else {
+		for (i = 0; idx < size; i++, idx++)
+			obj[i] = ring[idx];
+		/* Start at the beginning */
+		for (idx = 0; i < n; i++, idx++)
+			obj[i] = ring[idx];
+	}
+}
+
+static __rte_always_inline void
+dequeue_elems_64(struct rte_ring *r, uint32_t cons_head,
+		void *obj_table, uint32_t n)
+{
+	unsigned int i;
+	const uint32_t size = r->size;
+	uint32_t idx = cons_head & r->mask;
+	uint64_t *ring = (uint64_t *)&r[1];
+	uint64_t *obj = (uint64_t *)obj_table;
+	if (likely(idx + n < size)) {
+		for (i = 0; i < (n & ~0x3); i += 4, idx += 4) {
+			obj[i] = ring[idx];
+			obj[i + 1] = ring[idx + 1];
+			obj[i + 2] = ring[idx + 2];
+			obj[i + 3] = ring[idx + 3];
+		}
+		switch (n & 0x3) {
+		case 3:
+			obj[i++] = ring[idx++]; /* fallthrough */
+		case 2:
+			obj[i++] = ring[idx++]; /* fallthrough */
+		case 1:
+			obj[i++] = ring[idx++]; /* fallthrough */
+		}
+	} else {
+		for (i = 0; idx < size; i++, idx++)
+			obj[i] = ring[idx];
+		/* Start at the beginning */
+		for (idx = 0; i < n; i++, idx++)
+			obj[i] = ring[idx];
+	}
+}
+
+static __rte_always_inline void
+dequeue_elems_128(struct rte_ring *r, uint32_t cons_head,
+		void *obj_table, uint32_t n)
+{
+	unsigned int i;
+	const uint32_t size = r->size;
+	uint32_t idx = cons_head & r->mask;
+	__uint128_t *ring = (__uint128_t *)&r[1];
+	__uint128_t *obj = (__uint128_t *)obj_table;
+	if (likely(idx + n < size)) {
+		for (i = 0; i < (n & ~0x1); i += 2, idx += 2) {
+			obj[i] = ring[idx];
+			obj[i + 1] = ring[idx + 1];
+		}
+		switch (n & 0x1) {
+		case 1:
+			obj[i++] = ring[idx++]; /* fallthrough */
+		}
+	} else {
+		for (i = 0; idx < size; i++, idx++)
+			obj[i] = ring[idx];
+		/* Start at the beginning */
+		for (idx = 0; i < n; i++, idx++)
+			obj[i] = ring[idx];
+	}
+}
+
+/* the actual dequeue of elements from the ring.
+ * Placed here since identical code needed in both
+ * single and multi consumer dequeue functions.
+ */
+static __rte_always_inline void
+dequeue_elems(struct rte_ring *r, uint32_t cons_head, void *obj_table,
+		uint32_t esize, uint32_t num)
+{
+	uint32_t idx, nr_idx, nr_num;
+
+	/* 8B and 16B copies implemented individually to retain
+	 * the current performance.
+	 */
+	if (esize == 8)
+		dequeue_elems_64(r, cons_head, obj_table, num);
+	else if (esize == 16)
+		dequeue_elems_128(r, cons_head, obj_table, num);
+	else {
+		/* Normalize to uint32_t */
+		uint32_t scale = esize / sizeof(uint32_t);
+		nr_num = num * scale;
+		idx = cons_head & r->mask;
+		nr_idx = idx * scale;
+		dequeue_elems_32(r, nr_idx, obj_table, nr_num);
+	}
+}
+
+/* Between two loads, the CPU may reorder accesses on weakly ordered
+ * architectures (powerpc/arm).
+ * There are 2 choices for the users:
+ * 1. use the rmb() memory barrier
+ * 2. use one-direction load_acquire/store_release barriers, enabled by
+ * CONFIG_RTE_USE_C11_MEM_MODEL=y
+ * The choice depends on performance test results.
+ * By default, the common functions come from rte_ring_generic.h
+ */
+#ifdef RTE_USE_C11_MEM_MODEL
+#include "rte_ring_c11_mem.h"
+#else
+#include "rte_ring_generic.h"
+#endif
+
+/**
+ * @internal Enqueue several objects on the ring
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of objects.
+ * @param esize
+ *   The size of ring element, in bytes. It must be a multiple of 4.
+ *   This must be the same value used while creating the ring. Otherwise
+ *   the results are undefined.
+ * @param n
+ *   The number of objects to add in the ring from the obj_table.
+ * @param behavior
+ *   RTE_RING_QUEUE_FIXED:    Enqueue a fixed number of items on a ring
+ *   RTE_RING_QUEUE_VARIABLE: Enqueue as many items as possible on the ring
+ * @param is_sp
+ *   Indicates whether to use single producer or multi-producer head update
+ * @param free_space
+ *   returns the amount of space after the enqueue operation has finished
+ * @return
+ *   Actual number of objects enqueued.
+ *   If behavior == RTE_RING_QUEUE_FIXED, this will be 0 or n only.
+ */
+static __rte_always_inline unsigned int
+__rte_ring_do_enqueue_elem(struct rte_ring *r, const void *obj_table,
+		unsigned int esize, unsigned int n,
+		enum rte_ring_queue_behavior behavior, unsigned int is_sp,
+		unsigned int *free_space)
+{
+	uint32_t prod_head, prod_next;
+	uint32_t free_entries;
+
+	n = __rte_ring_move_prod_head(r, is_sp, n, behavior,
+			&prod_head, &prod_next, &free_entries);
+	if (n == 0)
+		goto end;
+
+	enqueue_elems(r, prod_head, obj_table, esize, n);
+
+	update_tail(&r->prod, prod_head, prod_next, is_sp, 1);
+end:
+	if (free_space != NULL)
+		*free_space = free_entries - n;
+	return n;
+}
+
+/**
+ * @internal Dequeue several objects from the ring
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of objects.
+ * @param esize
+ *   The size of ring element, in bytes. It must be a multiple of 4.
+ *   This must be the same value used while creating the ring. Otherwise
+ *   the results are undefined.
+ * @param n
+ *   The number of objects to pull from the ring.
+ * @param behavior
+ *   RTE_RING_QUEUE_FIXED:    Dequeue a fixed number of items from a ring
+ *   RTE_RING_QUEUE_VARIABLE: Dequeue as many items as possible from ring
+ * @param is_sc
+ *   Indicates whether to use single consumer or multi-consumer head update
+ * @param available
+ *   returns the number of remaining ring entries after the dequeue has finished
+ * @return
+ *   - Actual number of objects dequeued.
+ *     If behavior == RTE_RING_QUEUE_FIXED, this will be 0 or n only.
+ */
+static __rte_always_inline unsigned int
+__rte_ring_do_dequeue_elem(struct rte_ring *r, void *obj_table,
+		unsigned int esize, unsigned int n,
+		enum rte_ring_queue_behavior behavior, unsigned int is_sc,
+		unsigned int *available)
+{
+	uint32_t cons_head, cons_next;
+	uint32_t entries;
+
+	n = __rte_ring_move_cons_head(r, (int)is_sc, n, behavior,
+			&cons_head, &cons_next, &entries);
+	if (n == 0)
+		goto end;
+
+	dequeue_elems(r, cons_head, obj_table, esize, n);
+
+	update_tail(&r->cons, cons_head, cons_next, is_sc, 0);
+
+end:
+	if (available != NULL)
+		*available = entries - n;
+	return n;
+}
+
+/**
+ * Enqueue several objects on the ring (multi-producers safe).
+ *
+ * This function uses a "compare and set" instruction to move the
+ * producer index atomically.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of objects.
+ * @param esize
+ *   The size of ring element, in bytes. It must be a multiple of 4.
+ *   This must be the same value used while creating the ring. Otherwise
+ *   the results are undefined.
+ * @param n
+ *   The number of objects to add in the ring from the obj_table.
+ * @param free_space
+ *   if non-NULL, returns the amount of space in the ring after the
+ *   enqueue operation has finished.
+ * @return
+ *   The number of objects enqueued, either 0 or n
+ */
+static __rte_always_inline unsigned int
+rte_ring_mp_enqueue_bulk_elem(struct rte_ring *r, const void *obj_table,
+		unsigned int esize, unsigned int n, unsigned int *free_space)
+{
+	return __rte_ring_do_enqueue_elem(r, obj_table, esize, n,
+			RTE_RING_QUEUE_FIXED, __IS_MP, free_space);
+}
+
+/**
+ * Enqueue several objects on a ring
+ *
+ * @warning This API is NOT multi-producers safe
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of objects.
+ * @param esize
+ *   The size of ring element, in bytes. It must be a multiple of 4.
+ *   This must be the same value used while creating the ring. Otherwise
+ *   the results are undefined.
+ * @param n
+ *   The number of objects to add in the ring from the obj_table.
+ * @param free_space
+ *   if non-NULL, returns the amount of space in the ring after the
+ *   enqueue operation has finished.
+ * @return
+ *   The number of objects enqueued, either 0 or n
+ */
+static __rte_always_inline unsigned int
+rte_ring_sp_enqueue_bulk_elem(struct rte_ring *r, const void *obj_table,
+		unsigned int esize, unsigned int n, unsigned int *free_space)
+{
+	return __rte_ring_do_enqueue_elem(r, obj_table, esize, n,
+			RTE_RING_QUEUE_FIXED, __IS_SP, free_space);
+}
+
+/**
+ * Enqueue several objects on a ring.
+ *
+ * This function calls the multi-producer or the single-producer
+ * version depending on the default behavior that was specified at
+ * ring creation time (see flags).
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of objects.
+ * @param esize
+ *   The size of ring element, in bytes. It must be a multiple of 4.
+ *   This must be the same value used while creating the ring. Otherwise
+ *   the results are undefined.
+ * @param n
+ *   The number of objects to add in the ring from the obj_table.
+ * @param free_space
+ *   if non-NULL, returns the amount of space in the ring after the
+ *   enqueue operation has finished.
+ * @return
+ *   The number of objects enqueued, either 0 or n
+ */
+static __rte_always_inline unsigned int
+rte_ring_enqueue_bulk_elem(struct rte_ring *r, const void *obj_table,
+		unsigned int esize, unsigned int n, unsigned int *free_space)
+{
+	return __rte_ring_do_enqueue_elem(r, obj_table, esize, n,
+			RTE_RING_QUEUE_FIXED, r->prod.single, free_space);
+}
+
+/**
+ * Enqueue one object on a ring (multi-producers safe).
+ *
+ * This function uses a "compare and set" instruction to move the
+ * producer index atomically.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj
+ *   A pointer to the object to be added.
+ * @param esize
+ *   The size of ring element, in bytes. It must be a multiple of 4.
+ *   This must be the same value used while creating the ring. Otherwise
+ *   the results are undefined.
+ * @return
+ *   - 0: Success; object enqueued.
+ *   - -ENOBUFS: Not enough room in the ring to enqueue; no object is enqueued.
+ */
+static __rte_always_inline int
+rte_ring_mp_enqueue_elem(struct rte_ring *r, void *obj, unsigned int esize)
+{
+	return rte_ring_mp_enqueue_bulk_elem(r, obj, esize, 1, NULL) ? 0 :
+								-ENOBUFS;
+}
+
+/**
+ * Enqueue one object on a ring
+ *
+ * @warning This API is NOT multi-producers safe
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj
+ *   A pointer to the object to be added.
+ * @param esize
+ *   The size of ring element, in bytes. It must be a multiple of 4.
+ *   This must be the same value used while creating the ring. Otherwise
+ *   the results are undefined.
+ * @return
+ *   - 0: Success; object enqueued.
+ *   - -ENOBUFS: Not enough room in the ring to enqueue; no object is enqueued.
+ */
+static __rte_always_inline int
+rte_ring_sp_enqueue_elem(struct rte_ring *r, void *obj, unsigned int esize)
+{
+	return rte_ring_sp_enqueue_bulk_elem(r, obj, esize, 1, NULL) ? 0 :
+								-ENOBUFS;
+}
+
+/**
+ * Enqueue one object on a ring.
+ *
+ * This function calls the multi-producer or the single-producer
+ * version, depending on the default behaviour that was specified at
+ * ring creation time (see flags).
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj
+ *   A pointer to the object to be added.
+ * @param esize
+ *   The size of ring element, in bytes. It must be a multiple of 4.
+ *   This must be the same value used while creating the ring. Otherwise
+ *   the results are undefined.
+ * @return
+ *   - 0: Success; object enqueued.
+ *   - -ENOBUFS: Not enough room in the ring to enqueue; no object is enqueued.
+ */
+static __rte_always_inline int
+rte_ring_enqueue_elem(struct rte_ring *r, void *obj, unsigned int esize)
+{
+	return rte_ring_enqueue_bulk_elem(r, obj, esize, 1, NULL) ? 0 :
+								-ENOBUFS;
+}
+
+/**
+ * Dequeue several objects from a ring (multi-consumers safe).
+ *
+ * This function uses a "compare and set" instruction to move the
+ * consumer index atomically.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of objects that will be filled.
+ * @param esize
+ *   The size of ring element, in bytes. It must be a multiple of 4.
+ *   This must be the same value used while creating the ring. Otherwise
+ *   the results are undefined.
+ * @param n
+ *   The number of objects to dequeue from the ring to the obj_table.
+ * @param available
+ *   If non-NULL, returns the number of remaining ring entries after the
+ *   dequeue has finished.
+ * @return
+ *   The number of objects dequeued, either 0 or n
+ */
+static __rte_always_inline unsigned int
+rte_ring_mc_dequeue_bulk_elem(struct rte_ring *r, void *obj_table,
+		unsigned int esize, unsigned int n, unsigned int *available)
+{
+	return __rte_ring_do_dequeue_elem(r, obj_table, esize, n,
+				RTE_RING_QUEUE_FIXED, __IS_MC, available);
+}
+
+/**
+ * Dequeue several objects from a ring (NOT multi-consumers safe).
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of objects that will be filled.
+ * @param esize
+ *   The size of ring element, in bytes. It must be a multiple of 4.
+ *   This must be the same value used while creating the ring. Otherwise
+ *   the results are undefined.
+ * @param n
+ *   The number of objects to dequeue from the ring to the obj_table,
+ *   must be strictly positive.
+ * @param available
+ *   If non-NULL, returns the number of remaining ring entries after the
+ *   dequeue has finished.
+ * @return
+ *   The number of objects dequeued, either 0 or n
+ */
+static __rte_always_inline unsigned int
+rte_ring_sc_dequeue_bulk_elem(struct rte_ring *r, void *obj_table,
+		unsigned int esize, unsigned int n, unsigned int *available)
+{
+	return __rte_ring_do_dequeue_elem(r, obj_table, esize, n,
+			RTE_RING_QUEUE_FIXED, __IS_SC, available);
+}
+
+/**
+ * Dequeue several objects from a ring.
+ *
+ * This function calls the multi-consumers or the single-consumer
+ * version, depending on the default behaviour that was specified at
+ * ring creation time (see flags).
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of objects that will be filled.
+ * @param esize
+ *   The size of ring element, in bytes. It must be a multiple of 4.
+ *   This must be the same value used while creating the ring. Otherwise
+ *   the results are undefined.
+ * @param n
+ *   The number of objects to dequeue from the ring to the obj_table.
+ * @param available
+ *   If non-NULL, returns the number of remaining ring entries after the
+ *   dequeue has finished.
+ * @return
+ *   The number of objects dequeued, either 0 or n
+ */
+static __rte_always_inline unsigned int
+rte_ring_dequeue_bulk_elem(struct rte_ring *r, void *obj_table,
+		unsigned int esize, unsigned int n, unsigned int *available)
+{
+	return __rte_ring_do_dequeue_elem(r, obj_table, esize, n,
+			RTE_RING_QUEUE_FIXED, r->cons.single, available);
+}
+
+/**
+ * Dequeue one object from a ring (multi-consumers safe).
+ *
+ * This function uses a "compare and set" instruction to move the
+ * consumer index atomically.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_p
+ *   A pointer to the object that will be filled.
+ * @param esize
+ *   The size of ring element, in bytes. It must be a multiple of 4.
+ *   This must be the same value used while creating the ring. Otherwise
+ *   the results are undefined.
+ * @return
+ *   - 0: Success; object dequeued.
+ *   - -ENOENT: Not enough entries in the ring to dequeue; no object is
+ *     dequeued.
+ */
+static __rte_always_inline int
+rte_ring_mc_dequeue_elem(struct rte_ring *r, void *obj_p,
+				unsigned int esize)
+{
+	return rte_ring_mc_dequeue_bulk_elem(r, obj_p, esize, 1, NULL)  ? 0 :
+								-ENOENT;
+}
+
+/**
+ * Dequeue one object from a ring (NOT multi-consumers safe).
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_p
+ *   A pointer to the object that will be filled.
+ * @param esize
+ *   The size of ring element, in bytes. It must be a multiple of 4.
+ *   This must be the same value used while creating the ring. Otherwise
+ *   the results are undefined.
+ * @return
+ *   - 0: Success; object dequeued.
+ *   - -ENOENT: Not enough entries in the ring to dequeue, no object is
+ *     dequeued.
+ */
+static __rte_always_inline int
+rte_ring_sc_dequeue_elem(struct rte_ring *r, void *obj_p,
+				unsigned int esize)
+{
+	return rte_ring_sc_dequeue_bulk_elem(r, obj_p, esize, 1, NULL) ? 0 :
+								-ENOENT;
+}
+
+/**
+ * Dequeue one object from a ring.
+ *
+ * This function calls the multi-consumers or the single-consumer
+ * version depending on the default behaviour that was specified at
+ * ring creation time (see flags).
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_p
+ *   A pointer to the object that will be filled.
+ * @param esize
+ *   The size of ring element, in bytes. It must be a multiple of 4.
+ *   This must be the same value used while creating the ring. Otherwise
+ *   the results are undefined.
+ * @return
+ *   - 0: Success; object dequeued.
+ *   - -ENOENT: Not enough entries in the ring to dequeue, no object is
+ *     dequeued.
+ */
+static __rte_always_inline int
+rte_ring_dequeue_elem(struct rte_ring *r, void *obj_p, unsigned int esize)
+{
+	return rte_ring_dequeue_bulk_elem(r, obj_p, esize, 1, NULL) ? 0 :
+								-ENOENT;
+}
+
+/**
+ * Enqueue several objects on the ring (multi-producers safe).
+ *
+ * This function uses a "compare and set" instruction to move the
+ * producer index atomically.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of objects.
+ * @param esize
+ *   The size of ring element, in bytes. It must be a multiple of 4.
+ *   This must be the same value used while creating the ring. Otherwise
+ *   the results are undefined.
+ * @param n
+ *   The number of objects to add in the ring from the obj_table.
+ * @param free_space
+ *   if non-NULL, returns the amount of space in the ring after the
+ *   enqueue operation has finished.
+ * @return
+ *   - n: Actual number of objects enqueued.
+ */
+static __rte_always_inline unsigned
+rte_ring_mp_enqueue_burst_elem(struct rte_ring *r, const void *obj_table,
+		unsigned int esize, unsigned int n, unsigned int *free_space)
+{
+	return __rte_ring_do_enqueue_elem(r, obj_table, esize, n,
+			RTE_RING_QUEUE_VARIABLE, __IS_MP, free_space);
+}
+
+/**
+ * Enqueue several objects on a ring
+ *
+ * @warning This API is NOT multi-producers safe
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of objects.
+ * @param esize
+ *   The size of ring element, in bytes. It must be a multiple of 4.
+ *   This must be the same value used while creating the ring. Otherwise
+ *   the results are undefined.
+ * @param n
+ *   The number of objects to add in the ring from the obj_table.
+ * @param free_space
+ *   if non-NULL, returns the amount of space in the ring after the
+ *   enqueue operation has finished.
+ * @return
+ *   - n: Actual number of objects enqueued.
+ */
+static __rte_always_inline unsigned
+rte_ring_sp_enqueue_burst_elem(struct rte_ring *r, const void *obj_table,
+		unsigned int esize, unsigned int n, unsigned int *free_space)
+{
+	return __rte_ring_do_enqueue_elem(r, obj_table, esize, n,
+			RTE_RING_QUEUE_VARIABLE, __IS_SP, free_space);
+}
+
+/**
+ * Enqueue several objects on a ring.
+ *
+ * This function calls the multi-producer or the single-producer
+ * version depending on the default behavior that was specified at
+ * ring creation time (see flags).
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of objects.
+ * @param esize
+ *   The size of ring element, in bytes. It must be a multiple of 4.
+ *   This must be the same value used while creating the ring. Otherwise
+ *   the results are undefined.
+ * @param n
+ *   The number of objects to add in the ring from the obj_table.
+ * @param free_space
+ *   if non-NULL, returns the amount of space in the ring after the
+ *   enqueue operation has finished.
+ * @return
+ *   - n: Actual number of objects enqueued.
+ */
+static __rte_always_inline unsigned
+rte_ring_enqueue_burst_elem(struct rte_ring *r, const void *obj_table,
+		unsigned int esize, unsigned int n, unsigned int *free_space)
+{
+	return __rte_ring_do_enqueue_elem(r, obj_table, esize, n,
+			RTE_RING_QUEUE_VARIABLE, r->prod.single, free_space);
+}
+
+/**
+ * Dequeue several objects from a ring (multi-consumers safe). When more
+ * objects are requested than are available, only the available objects
+ * are dequeued.
+ *
+ * This function uses a "compare and set" instruction to move the
+ * consumer index atomically.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of objects that will be filled.
+ * @param esize
+ *   The size of ring element, in bytes. It must be a multiple of 4.
+ *   This must be the same value used while creating the ring. Otherwise
+ *   the results are undefined.
+ * @param n
+ *   The number of objects to dequeue from the ring to the obj_table.
+ * @param available
+ *   If non-NULL, returns the number of remaining ring entries after the
+ *   dequeue has finished.
+ * @return
+ *   - n: Actual number of objects dequeued, 0 if ring is empty
+ */
+static __rte_always_inline unsigned
+rte_ring_mc_dequeue_burst_elem(struct rte_ring *r, void *obj_table,
+		unsigned int esize, unsigned int n, unsigned int *available)
+{
+	return __rte_ring_do_dequeue_elem(r, obj_table, esize, n,
+			RTE_RING_QUEUE_VARIABLE, __IS_MC, available);
+}
+
+/**
+ * Dequeue several objects from a ring (NOT multi-consumers safe). When more
+ * objects are requested than are available, only the available objects
+ * are dequeued.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of objects that will be filled.
+ * @param esize
+ *   The size of ring element, in bytes. It must be a multiple of 4.
+ *   This must be the same value used while creating the ring. Otherwise
+ *   the results are undefined.
+ * @param n
+ *   The number of objects to dequeue from the ring to the obj_table.
+ * @param available
+ *   If non-NULL, returns the number of remaining ring entries after the
+ *   dequeue has finished.
+ * @return
+ *   - n: Actual number of objects dequeued, 0 if ring is empty
+ */
+static __rte_always_inline unsigned
+rte_ring_sc_dequeue_burst_elem(struct rte_ring *r, void *obj_table,
+		unsigned int esize, unsigned int n, unsigned int *available)
+{
+	return __rte_ring_do_dequeue_elem(r, obj_table, esize, n,
+			RTE_RING_QUEUE_VARIABLE, __IS_SC, available);
+}
+
+/**
+ * Dequeue multiple objects from a ring up to a maximum number.
+ *
+ * This function calls the multi-consumers or the single-consumer
+ * version, depending on the default behaviour that was specified at
+ * ring creation time (see flags).
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of objects that will be filled.
+ * @param esize
+ *   The size of ring element, in bytes. It must be a multiple of 4.
+ *   This must be the same value used while creating the ring. Otherwise
+ *   the results are undefined.
+ * @param n
+ *   The number of objects to dequeue from the ring to the obj_table.
+ * @param available
+ *   If non-NULL, returns the number of remaining ring entries after the
+ *   dequeue has finished.
+ * @return
+ *   - Number of objects dequeued
+ */
+static __rte_always_inline unsigned int
+rte_ring_dequeue_burst_elem(struct rte_ring *r, void *obj_table,
+		unsigned int esize, unsigned int n, unsigned int *available)
+{
+	return __rte_ring_do_dequeue_elem(r, obj_table, esize, n,
+				RTE_RING_QUEUE_VARIABLE,
+				r->cons.single, available);
+}
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* _RTE_RING_ELEM_H_ */
diff --git a/lib/librte_ring/rte_ring_version.map b/lib/librte_ring/rte_ring_version.map
index 89d84bcf4..7a5328dd5 100644
--- a/lib/librte_ring/rte_ring_version.map
+++ b/lib/librte_ring/rte_ring_version.map
@@ -15,6 +15,8 @@ DPDK_20.0 {
 EXPERIMENTAL {
 	global:
 
+	rte_ring_create_elem;
+	rte_ring_get_memsize_elem;
 	rte_ring_reset;
 
 };
-- 
2.17.1
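
Note for readers of the header above: the _bulk_elem and _burst_elem
wrappers differ only in the behavior flag they pass down
(RTE_RING_QUEUE_FIXED vs RTE_RING_QUEUE_VARIABLE). The sketch below is
a single-threaded toy model of that difference, not DPDK code. The
toy_* names, TOY_SLOTS and TOY_MAX_ESIZE are hypothetical, and unlike
rte_ring the toy uses every slot and has no atomics. The element size
is given once at init time here rather than on every call.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for enum rte_ring_queue_behavior. */
enum toy_behavior { TOY_FIXED, TOY_VARIABLE };

#define TOY_SLOTS 8u		/* capacity, in elements */
#define TOY_MAX_ESIZE 16u	/* largest supported element size, in bytes */

struct toy_ring {
	unsigned int esize;	/* element size in bytes, multiple of 4 */
	unsigned int count;	/* elements currently stored */
	unsigned int head;	/* slot index of the oldest element */
	uint32_t slots[TOY_SLOTS * TOY_MAX_ESIZE / 4];
};

static void
toy_init(struct toy_ring *r, unsigned int esize)
{
	/* Same constraint as the patch documents: multiple of 4 bytes. */
	assert(esize >= 4 && esize <= TOY_MAX_ESIZE && esize % 4 == 0);
	memset(r, 0, sizeof(*r));
	r->esize = esize;
}

/* FIXED: enqueue all n or nothing. VARIABLE: enqueue as many as fit. */
static unsigned int
toy_enqueue(struct toy_ring *r, const void *objs, unsigned int n,
		enum toy_behavior b, unsigned int *free_space)
{
	unsigned int free_entries = TOY_SLOTS - r->count;
	unsigned int i;

	if (n > free_entries)
		n = (b == TOY_FIXED) ? 0 : free_entries;

	for (i = 0; i < n; i++) {
		unsigned int slot = (r->head + r->count + i) % TOY_SLOTS;

		memcpy(&r->slots[slot * r->esize / 4],
			(const uint8_t *)objs + i * r->esize, r->esize);
	}
	r->count += n;
	if (free_space != NULL)
		*free_space = free_entries - n;
	return n;
}

/* FIXED: dequeue all n or nothing. VARIABLE: dequeue what is there. */
static unsigned int
toy_dequeue(struct toy_ring *r, void *objs, unsigned int n,
		enum toy_behavior b, unsigned int *available)
{
	unsigned int entries = r->count;
	unsigned int i;

	if (n > entries)
		n = (b == TOY_FIXED) ? 0 : entries;

	for (i = 0; i < n; i++) {
		unsigned int slot = (r->head + i) % TOY_SLOTS;

		memcpy((uint8_t *)objs + i * r->esize,
			&r->slots[slot * r->esize / 4], r->esize);
	}
	r->head = (r->head + n) % TOY_SLOTS;
	r->count -= n;
	if (available != NULL)
		*available = entries - n;
	return n;
}
```

With a request larger than the free space, the TOY_FIXED call returns 0
and leaves the ring untouched, while the TOY_VARIABLE call returns the
number of elements that actually fit. This matches the "either 0 or n"
and "Actual number of objects" return descriptions in the header.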



* [dpdk-dev] [PATCH v7 03/17] test/ring: add functional tests for rte_ring_xxx_elem APIs
  2019-12-20  4:45   ` [dpdk-dev] [PATCH v7 00/17] " Honnappa Nagarahalli
  2019-12-20  4:45     ` [dpdk-dev] [PATCH v7 01/17] test/ring: use division for cycle count calculation Honnappa Nagarahalli
  2019-12-20  4:45     ` [dpdk-dev] [PATCH v7 02/17] lib/ring: apis to support configurable element size Honnappa Nagarahalli
@ 2019-12-20  4:45     ` Honnappa Nagarahalli
  2020-01-02 16:31       ` Ananyev, Konstantin
  2019-12-20  4:45     ` [dpdk-dev] [PATCH v7 04/17] test/ring: test burst APIs with random empty-full test case Honnappa Nagarahalli
                       ` (13 subsequent siblings)
  16 siblings, 1 reply; 173+ messages in thread
From: Honnappa Nagarahalli @ 2019-12-20  4:45 UTC (permalink / raw)
  To: olivier.matz, sthemmin, jerinj, bruce.richardson, david.marchand,
	pbhagavatula, konstantin.ananyev, honnappa.nagarahalli
  Cc: dev, dharmik.thakkar, ruifeng.wang, gavin.hu, nd

Add basic infrastructure to test the rte_ring_xxx_elem APIs. Add
test cases covering the burst and bulk APIs.

Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Gavin Hu <gavin.hu@arm.com>
---
 app/test/test_ring.c | 466 ++++++++++++++++++++-----------------------
 app/test/test_ring.h | 203 +++++++++++++++++++
 2 files changed, 419 insertions(+), 250 deletions(-)
 create mode 100644 app/test/test_ring.h
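
Note on how the test matrix is driven: test_ring() walks the api_type
bit flags from test_ring.h with two shift loops, so each call to
test_ring_burst_bulk_tests() gets one sync mode ORed with one
operation mode. The stand-alone sketch below reproduces only that
enumeration. The flag values are copied from this patch;
collect_burst_bulk_types(), sync_name() and op_name() are hypothetical
helpers written for illustration and are not part of the patch.

```c
#include <stdio.h>

/* Flag values as defined in app/test/test_ring.h in this patch. */
#define TEST_RING_N 1	/* default APIs */
#define TEST_RING_S 2	/* SP or SC APIs */
#define TEST_RING_M 4	/* MP or MC APIs */
#define TEST_RING_SL 8	/* single element APIs */
#define TEST_RING_BL 16	/* bulk APIs */
#define TEST_RING_BR 32	/* burst APIs */

/* Hypothetical helper: name of the sync mode encoded in api_type. */
static const char *
sync_name(unsigned int api_type)
{
	if (api_type & TEST_RING_N)
		return "default enqueue/dequeue";
	if (api_type & TEST_RING_S)
		return "SP/SC";
	if (api_type & TEST_RING_M)
		return "MP/MC";
	return "unknown";
}

/* Hypothetical helper: name of the operation mode encoded in api_type. */
static const char *
op_name(unsigned int api_type)
{
	if (api_type & TEST_RING_SL)
		return "single";
	if (api_type & TEST_RING_BL)
		return "bulk";
	if (api_type & TEST_RING_BR)
		return "burst";
	return "unknown";
}

/*
 * Same loop shape as in test_ring(): the outer loop shifts through the
 * bulk and burst flags, the inner loop through the three sync flags.
 * Stores each combination in out[] (room for 6 entries is needed) and
 * returns how many were stored.
 */
static unsigned int
collect_burst_bulk_types(unsigned int *out)
{
	unsigned int i, j, n = 0;

	for (j = TEST_RING_BL; j <= TEST_RING_BR; j <<= 1)
		for (i = TEST_RING_N; i <= TEST_RING_M; i <<= 1)
			out[n++] = i | j;
	return n;
}

/* Print one line per combination, e.g. for a quick sanity check. */
static void
print_burst_bulk_types(void)
{
	unsigned int types[6];
	unsigned int k, n = collect_burst_bulk_types(types);

	for (k = 0; k < n; k++)
		printf("api_type %u: %s, %s\n", types[k],
			sync_name(types[k]), op_name(types[k]));
}
```

The loops yield six combinations, bulk first and then burst, each with
the default, SP/SC and MP/MC variants. TEST_RING_SL is not reached by
this loop because it starts at TEST_RING_BL.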

diff --git a/app/test/test_ring.c b/app/test/test_ring.c
index aaf1e70ad..e7a8b468b 100644
--- a/app/test/test_ring.c
+++ b/app/test/test_ring.c
@@ -23,11 +23,13 @@
 #include <rte_branch_prediction.h>
 #include <rte_malloc.h>
 #include <rte_ring.h>
+#include <rte_ring_elem.h>
 #include <rte_random.h>
 #include <rte_errno.h>
 #include <rte_hexdump.h>
 
 #include "test.h"
+#include "test_ring.h"
 
 /*
  * Ring
@@ -67,6 +69,50 @@ static rte_atomic32_t synchro;
 
 #define	TEST_RING_FULL_EMTPY_ITER	8
 
+static int esize[] = {-1, 4, 8, 16};
+
+static void
+test_ring_mem_init(void *obj, unsigned int count, int esize)
+{
+	unsigned int i;
+
+	/* Legacy queue APIs? */
+	if (esize == -1)
+		for (i = 0; i < count; i++)
+			((void **)obj)[i] = (void *)(unsigned long)i;
+	else
+		for (i = 0; i < (count * esize / sizeof(uint32_t)); i++)
+			((uint32_t *)obj)[i] = i;
+}
+
+static void
+test_ring_print_test_string(const char *istr, unsigned int api_type, int esize)
+{
+	printf("\n%s: ", istr);
+
+	if (esize == -1)
+		printf("legacy APIs: ");
+	else
+		printf("elem APIs: element size %dB ", esize);
+
+	if (api_type == TEST_RING_IGNORE_API_TYPE)
+		return;
+
+	if ((api_type & TEST_RING_N) == TEST_RING_N)
+		printf(": default enqueue/dequeue: ");
+	else if ((api_type & TEST_RING_S) == TEST_RING_S)
+		printf(": SP/SC: ");
+	else if ((api_type & TEST_RING_M) == TEST_RING_M)
+		printf(": MP/MC: ");
+
+	if ((api_type & TEST_RING_SL) == TEST_RING_SL)
+		printf("single\n");
+	else if ((api_type & TEST_RING_BL) == TEST_RING_BL)
+		printf("bulk\n");
+	else if ((api_type & TEST_RING_BR) == TEST_RING_BR)
+		printf("burst\n");
+}
+
 /*
  * helper routine for test_ring_basic
  */
@@ -314,286 +360,203 @@ test_ring_basic(struct rte_ring *r)
 	return -1;
 }
 
+/*
+ * Burst and bulk operations with sp/sc, mp/mc and default (during creation)
+ */
 static int
-test_ring_burst_basic(struct rte_ring *r)
+test_ring_burst_bulk_tests(unsigned int api_type)
 {
+	struct rte_ring *r;
 	void **src = NULL, **cur_src = NULL, **dst = NULL, **cur_dst = NULL;
 	int ret;
-	unsigned i;
+	unsigned int i, j;
+	unsigned int num_elems;
 
-	/* alloc dummy object pointers */
-	src = malloc(RING_SIZE*2*sizeof(void *));
-	if (src == NULL)
-		goto fail;
-
-	for (i = 0; i < RING_SIZE*2 ; i++) {
-		src[i] = (void *)(unsigned long)i;
-	}
-	cur_src = src;
+	for (i = 0; i < RTE_DIM(esize); i++) {
+		test_ring_print_test_string("Test standard ring", api_type,
+						esize[i]);
 
-	/* alloc some room for copied objects */
-	dst = malloc(RING_SIZE*2*sizeof(void *));
-	if (dst == NULL)
-		goto fail;
+		/* Create the ring */
+		TEST_RING_CREATE("test_ring_burst_bulk_tests", esize[i],
+					RING_SIZE, SOCKET_ID_ANY, 0, r);
 
-	memset(dst, 0, RING_SIZE*2*sizeof(void *));
-	cur_dst = dst;
-
-	printf("Test SP & SC basic functions \n");
-	printf("enqueue 1 obj\n");
-	ret = rte_ring_sp_enqueue_burst(r, cur_src, 1, NULL);
-	cur_src += 1;
-	if (ret != 1)
-		goto fail;
-
-	printf("enqueue 2 objs\n");
-	ret = rte_ring_sp_enqueue_burst(r, cur_src, 2, NULL);
-	cur_src += 2;
-	if (ret != 2)
-		goto fail;
-
-	printf("enqueue MAX_BULK objs\n");
-	ret = rte_ring_sp_enqueue_burst(r, cur_src, MAX_BULK, NULL);
-	cur_src += MAX_BULK;
-	if (ret != MAX_BULK)
-		goto fail;
-
-	printf("dequeue 1 obj\n");
-	ret = rte_ring_sc_dequeue_burst(r, cur_dst, 1, NULL);
-	cur_dst += 1;
-	if (ret != 1)
-		goto fail;
-
-	printf("dequeue 2 objs\n");
-	ret = rte_ring_sc_dequeue_burst(r, cur_dst, 2, NULL);
-	cur_dst += 2;
-	if (ret != 2)
-		goto fail;
+		/* alloc dummy object pointers */
+		src = test_ring_calloc(RING_SIZE * 2, esize[i]);
+		if (src == NULL)
+			goto fail;
+		test_ring_mem_init(src, RING_SIZE * 2, esize[i]);
+		cur_src = src;
 
-	printf("dequeue MAX_BULK objs\n");
-	ret = rte_ring_sc_dequeue_burst(r, cur_dst, MAX_BULK, NULL);
-	cur_dst += MAX_BULK;
-	if (ret != MAX_BULK)
-		goto fail;
+		/* alloc some room for copied objects */
+		dst = test_ring_calloc(RING_SIZE * 2, esize[i]);
+		if (dst == NULL)
+			goto fail;
+		cur_dst = dst;
 
-	/* check data */
-	if (memcmp(src, dst, cur_dst - dst)) {
-		rte_hexdump(stdout, "src", src, cur_src - src);
-		rte_hexdump(stdout, "dst", dst, cur_dst - dst);
-		printf("data after dequeue is not the same\n");
-		goto fail;
-	}
+		printf("enqueue 1 obj\n");
+		TEST_RING_ENQUEUE(r, cur_src, esize[i], 1, ret, api_type);
+		if (ret != 1)
+			goto fail;
+		TEST_RING_INCP(cur_src, esize[i], 1);
 
-	cur_src = src;
-	cur_dst = dst;
+		printf("enqueue 2 objs\n");
+		TEST_RING_ENQUEUE(r, cur_src, esize[i], 2, ret, api_type);
+		if (ret != 2)
+			goto fail;
+		TEST_RING_INCP(cur_src, esize[i], 2);
 
-	printf("Test enqueue without enough memory space \n");
-	for (i = 0; i< (RING_SIZE/MAX_BULK - 1); i++) {
-		ret = rte_ring_sp_enqueue_burst(r, cur_src, MAX_BULK, NULL);
-		cur_src += MAX_BULK;
+		printf("enqueue MAX_BULK objs\n");
+		TEST_RING_ENQUEUE(r, cur_src, esize[i], MAX_BULK, ret,
+						api_type);
 		if (ret != MAX_BULK)
 			goto fail;
-	}
-
-	printf("Enqueue 2 objects, free entries = MAX_BULK - 2  \n");
-	ret = rte_ring_sp_enqueue_burst(r, cur_src, 2, NULL);
-	cur_src += 2;
-	if (ret != 2)
-		goto fail;
+		TEST_RING_INCP(cur_src, esize[i], MAX_BULK);
 
-	printf("Enqueue the remaining entries = MAX_BULK - 2  \n");
-	/* Always one free entry left */
-	ret = rte_ring_sp_enqueue_burst(r, cur_src, MAX_BULK, NULL);
-	cur_src += MAX_BULK - 3;
-	if (ret != MAX_BULK - 3)
-		goto fail;
-
-	printf("Test if ring is full  \n");
-	if (rte_ring_full(r) != 1)
-		goto fail;
+		printf("dequeue 1 obj\n");
+		TEST_RING_DEQUEUE(r, cur_dst, esize[i], 1, ret, api_type);
+		if (ret != 1)
+			goto fail;
+		TEST_RING_INCP(cur_dst, esize[i], 1);
 
-	printf("Test enqueue for a full entry  \n");
-	ret = rte_ring_sp_enqueue_burst(r, cur_src, MAX_BULK, NULL);
-	if (ret != 0)
-		goto fail;
+		printf("dequeue 2 objs\n");
+		TEST_RING_DEQUEUE(r, cur_dst, esize[i], 2, ret, api_type);
+		if (ret != 2)
+			goto fail;
+		TEST_RING_INCP(cur_dst, esize[i], 2);
 
-	printf("Test dequeue without enough objects \n");
-	for (i = 0; i<RING_SIZE/MAX_BULK - 1; i++) {
-		ret = rte_ring_sc_dequeue_burst(r, cur_dst, MAX_BULK, NULL);
-		cur_dst += MAX_BULK;
+		printf("dequeue MAX_BULK objs\n");
+		TEST_RING_DEQUEUE(r, cur_dst, esize[i], MAX_BULK, ret,
+						api_type);
 		if (ret != MAX_BULK)
 			goto fail;
-	}
-
-	/* Available memory space for the exact MAX_BULK entries */
-	ret = rte_ring_sc_dequeue_burst(r, cur_dst, 2, NULL);
-	cur_dst += 2;
-	if (ret != 2)
-		goto fail;
-
-	ret = rte_ring_sc_dequeue_burst(r, cur_dst, MAX_BULK, NULL);
-	cur_dst += MAX_BULK - 3;
-	if (ret != MAX_BULK - 3)
-		goto fail;
-
-	printf("Test if ring is empty \n");
-	/* Check if ring is empty */
-	if (1 != rte_ring_empty(r))
-		goto fail;
-
-	/* check data */
-	if (memcmp(src, dst, cur_dst - dst)) {
-		rte_hexdump(stdout, "src", src, cur_src - src);
-		rte_hexdump(stdout, "dst", dst, cur_dst - dst);
-		printf("data after dequeue is not the same\n");
-		goto fail;
-	}
+		TEST_RING_INCP(cur_dst, esize[i], MAX_BULK);
 
-	cur_src = src;
-	cur_dst = dst;
-
-	printf("Test MP & MC basic functions \n");
-
-	printf("enqueue 1 obj\n");
-	ret = rte_ring_mp_enqueue_burst(r, cur_src, 1, NULL);
-	cur_src += 1;
-	if (ret != 1)
-		goto fail;
-
-	printf("enqueue 2 objs\n");
-	ret = rte_ring_mp_enqueue_burst(r, cur_src, 2, NULL);
-	cur_src += 2;
-	if (ret != 2)
-		goto fail;
-
-	printf("enqueue MAX_BULK objs\n");
-	ret = rte_ring_mp_enqueue_burst(r, cur_src, MAX_BULK, NULL);
-	cur_src += MAX_BULK;
-	if (ret != MAX_BULK)
-		goto fail;
-
-	printf("dequeue 1 obj\n");
-	ret = rte_ring_mc_dequeue_burst(r, cur_dst, 1, NULL);
-	cur_dst += 1;
-	if (ret != 1)
-		goto fail;
-
-	printf("dequeue 2 objs\n");
-	ret = rte_ring_mc_dequeue_burst(r, cur_dst, 2, NULL);
-	cur_dst += 2;
-	if (ret != 2)
-		goto fail;
-
-	printf("dequeue MAX_BULK objs\n");
-	ret = rte_ring_mc_dequeue_burst(r, cur_dst, MAX_BULK, NULL);
-	cur_dst += MAX_BULK;
-	if (ret != MAX_BULK)
-		goto fail;
-
-	/* check data */
-	if (memcmp(src, dst, cur_dst - dst)) {
-		rte_hexdump(stdout, "src", src, cur_src - src);
-		rte_hexdump(stdout, "dst", dst, cur_dst - dst);
-		printf("data after dequeue is not the same\n");
-		goto fail;
-	}
-
-	cur_src = src;
-	cur_dst = dst;
+		/* check data */
+		if (memcmp(src, dst, cur_dst - dst)) {
+			rte_hexdump(stdout, "src", src, cur_src - src);
+			rte_hexdump(stdout, "dst", dst, cur_dst - dst);
+			printf("data after dequeue is not the same\n");
+			goto fail;
+		}
+
+		cur_src = src;
+		cur_dst = dst;
+
+		printf("fill and empty the ring\n");
+		for (j = 0; j < RING_SIZE / MAX_BULK; j++) {
+			TEST_RING_ENQUEUE(r, cur_src, esize[i], MAX_BULK,
+							ret, api_type);
+			if (ret != MAX_BULK)
+				goto fail;
+			TEST_RING_INCP(cur_src, esize[i], MAX_BULK);
+
+			TEST_RING_DEQUEUE(r, cur_dst, esize[i], MAX_BULK,
+							ret, api_type);
+			if (ret != MAX_BULK)
+				goto fail;
+			TEST_RING_INCP(cur_dst, esize[i], MAX_BULK);
+		}
 
-	printf("fill and empty the ring\n");
-	for (i = 0; i<RING_SIZE/MAX_BULK; i++) {
-		ret = rte_ring_mp_enqueue_burst(r, cur_src, MAX_BULK, NULL);
-		cur_src += MAX_BULK;
-		if (ret != MAX_BULK)
+		/* check data */
+		if (memcmp(src, dst, cur_dst - dst)) {
+			rte_hexdump(stdout, "src", src, cur_src - src);
+			rte_hexdump(stdout, "dst", dst, cur_dst - dst);
+			printf("data after dequeue is not the same\n");
 			goto fail;
-		ret = rte_ring_mc_dequeue_burst(r, cur_dst, MAX_BULK, NULL);
-		cur_dst += MAX_BULK;
-		if (ret != MAX_BULK)
+		}
+
+		cur_src = src;
+		cur_dst = dst;
+
+		printf("Test enqueue without enough memory space\n");
+		for (j = 0; j < (RING_SIZE/MAX_BULK - 1); j++) {
+			TEST_RING_ENQUEUE(r, cur_src, esize[i], MAX_BULK,
+							ret, api_type);
+			if (ret != MAX_BULK)
+				goto fail;
+			TEST_RING_INCP(cur_src, esize[i], MAX_BULK);
+		}
+
+		printf("Enqueue 2 objects, free entries = MAX_BULK - 2\n");
+		TEST_RING_ENQUEUE(r, cur_src, esize[i], 2, ret, api_type);
+		if (ret != 2)
 			goto fail;
-	}
-
-	/* check data */
-	if (memcmp(src, dst, cur_dst - dst)) {
-		rte_hexdump(stdout, "src", src, cur_src - src);
-		rte_hexdump(stdout, "dst", dst, cur_dst - dst);
-		printf("data after dequeue is not the same\n");
-		goto fail;
-	}
-
-	cur_src = src;
-	cur_dst = dst;
-
-	printf("Test enqueue without enough memory space \n");
-	for (i = 0; i<RING_SIZE/MAX_BULK - 1; i++) {
-		ret = rte_ring_mp_enqueue_burst(r, cur_src, MAX_BULK, NULL);
-		cur_src += MAX_BULK;
-		if (ret != MAX_BULK)
+		TEST_RING_INCP(cur_src, esize[i], 2);
+
+
+		printf("Enqueue the remaining entries = MAX_BULK - 3\n");
+		/* Bulk APIs enqueue exact number of elements */
+		if ((api_type & TEST_RING_BL) == TEST_RING_BL)
+			num_elems = MAX_BULK - 3;
+		else
+			num_elems = MAX_BULK;
+		/* Always one free entry left */
+		TEST_RING_ENQUEUE(r, cur_src, esize[i], num_elems,
+						ret, api_type);
+		if (ret != MAX_BULK - 3)
 			goto fail;
-	}
-
-	/* Available memory space for the exact MAX_BULK objects */
-	ret = rte_ring_mp_enqueue_burst(r, cur_src, 2, NULL);
-	cur_src += 2;
-	if (ret != 2)
-		goto fail;
+		TEST_RING_INCP(cur_src, esize[i], MAX_BULK - 3);
 
-	ret = rte_ring_mp_enqueue_burst(r, cur_src, MAX_BULK, NULL);
-	cur_src += MAX_BULK - 3;
-	if (ret != MAX_BULK - 3)
-		goto fail;
+		printf("Test if ring is full\n");
+		if (rte_ring_full(r) != 1)
+			goto fail;
 
+		printf("Test enqueue for a full entry\n");
+		TEST_RING_ENQUEUE(r, cur_src, esize[i], MAX_BULK,
+						ret, api_type);
+		if (ret != 0)
+			goto fail;
 
-	printf("Test dequeue without enough objects \n");
-	for (i = 0; i<RING_SIZE/MAX_BULK - 1; i++) {
-		ret = rte_ring_mc_dequeue_burst(r, cur_dst, MAX_BULK, NULL);
-		cur_dst += MAX_BULK;
-		if (ret != MAX_BULK)
+		printf("Test dequeue without enough objects\n");
+		for (j = 0; j < RING_SIZE / MAX_BULK - 1; j++) {
+			TEST_RING_DEQUEUE(r, cur_dst, esize[i], MAX_BULK,
+							ret, api_type);
+			if (ret != MAX_BULK)
+				goto fail;
+			TEST_RING_INCP(cur_dst, esize[i], MAX_BULK);
+		}
+
+		/* Available memory space for the exact MAX_BULK entries */
+		TEST_RING_DEQUEUE(r, cur_dst, esize[i], 2, ret, api_type);
+		if (ret != 2)
 			goto fail;
-	}
+		TEST_RING_INCP(cur_dst, esize[i], 2);
+
+		/* Bulk APIs dequeue exact number of elements */
+		if ((api_type & TEST_RING_BL) == TEST_RING_BL)
+			num_elems = MAX_BULK - 3;
+		else
+			num_elems = MAX_BULK;
+		TEST_RING_DEQUEUE(r, cur_dst, esize[i], num_elems,
+						ret, api_type);
+		if (ret != MAX_BULK - 3)
+			goto fail;
+		TEST_RING_INCP(cur_dst, esize[i], MAX_BULK - 3);
 
-	/* Available objects - the exact MAX_BULK */
-	ret = rte_ring_mc_dequeue_burst(r, cur_dst, 2, NULL);
-	cur_dst += 2;
-	if (ret != 2)
-		goto fail;
+		printf("Test if ring is empty\n");
+		/* Check if ring is empty */
+		if (rte_ring_empty(r) != 1)
+			goto fail;
 
-	ret = rte_ring_mc_dequeue_burst(r, cur_dst, MAX_BULK, NULL);
-	cur_dst += MAX_BULK - 3;
-	if (ret != MAX_BULK - 3)
-		goto fail;
+		/* check data */
+		if (memcmp(src, dst, cur_dst - dst)) {
+			rte_hexdump(stdout, "src", src, cur_src - src);
+			rte_hexdump(stdout, "dst", dst, cur_dst - dst);
+			printf("data after dequeue is not the same\n");
+			goto fail;
+		}
 
-	/* check data */
-	if (memcmp(src, dst, cur_dst - dst)) {
-		rte_hexdump(stdout, "src", src, cur_src - src);
-		rte_hexdump(stdout, "dst", dst, cur_dst - dst);
-		printf("data after dequeue is not the same\n");
-		goto fail;
+		/* Free memory before test completed */
+		rte_ring_free(r);
+		rte_free(src);
+		rte_free(dst);
 	}
 
-	cur_src = src;
-	cur_dst = dst;
-
-	printf("Covering rte_ring_enqueue_burst functions \n");
-
-	ret = rte_ring_enqueue_burst(r, cur_src, 2, NULL);
-	cur_src += 2;
-	if (ret != 2)
-		goto fail;
-
-	ret = rte_ring_dequeue_burst(r, cur_dst, 2, NULL);
-	cur_dst += 2;
-	if (ret != 2)
-		goto fail;
-
-	/* Free memory before test completed */
-	free(src);
-	free(dst);
 	return 0;
-
- fail:
-	free(src);
-	free(dst);
+fail:
+	rte_ring_free(r);
+	rte_free(src);
+	rte_free(dst);
 	return -1;
 }
 
@@ -810,6 +773,7 @@ test_ring_with_exact_size(void)
 static int
 test_ring(void)
 {
+	unsigned int i, j;
 	struct rte_ring *r = NULL;
 
 	/* some more basic operations */
@@ -828,9 +792,11 @@ test_ring(void)
 		goto test_fail;
 	}
 
-	/* burst operations */
-	if (test_ring_burst_basic(r) < 0)
-		goto test_fail;
+	/* Burst and bulk operations with sp/sc, mp/mc and default */
+	for (j = TEST_RING_BL; j <= TEST_RING_BR; j <<= 1)
+		for (i = TEST_RING_N; i <= TEST_RING_M; i <<= 1)
+			if (test_ring_burst_bulk_tests(i | j) < 0)
+				goto test_fail;
 
 	/* basic operations */
 	if (test_ring_basic(r) < 0)
diff --git a/app/test/test_ring.h b/app/test/test_ring.h
new file mode 100644
index 000000000..19ef1b399
--- /dev/null
+++ b/app/test/test_ring.h
@@ -0,0 +1,203 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2019 Arm Limited
+ */
+
+#include <rte_malloc.h>
+#include <rte_ring.h>
+#include <rte_ring_elem.h>
+
+/* API type to call
+ * N - Calls default APIs
+ * S - Calls SP or SC API
+ * M - Calls MP or MC API
+ */
+#define TEST_RING_N 1
+#define TEST_RING_S 2
+#define TEST_RING_M 4
+
+/* API type to call
+ * SL - Calls single element APIs
+ * BL - Calls bulk APIs
+ * BR - Calls burst APIs
+ */
+#define TEST_RING_SL 8
+#define TEST_RING_BL 16
+#define TEST_RING_BR 32
+
+#define TEST_RING_IGNORE_API_TYPE ~0U
+
+#define TEST_RING_INCP(obj, esize, n) do { \
+	/* Legacy queue APIs? */ \
+	if ((esize) == -1) \
+		obj = ((void **)(obj)) + (n); \
+	else \
+		obj = (void **)(((uint32_t *)(obj)) + \
+					((n) * (esize) / sizeof(uint32_t))); \
+} while (0)
+
+#define TEST_RING_CREATE(name, esize, count, socket_id, flags, r) do { \
+	/* Legacy queue APIs? */ \
+	if ((esize) == -1) \
+		r = rte_ring_create((name), (count), (socket_id), (flags)); \
+	else \
+		r = rte_ring_create_elem((name), (esize), (count), \
+						(socket_id), (flags)); \
+} while (0)
+
+#define TEST_RING_ENQUEUE(r, obj, esize, n, ret, api_type) do { \
+	/* Legacy queue APIs? */ \
+	if ((esize) == -1) \
+		switch (api_type) { \
+		case (TEST_RING_N | TEST_RING_SL): \
+			ret = rte_ring_enqueue(r, obj); \
+			break; \
+		case (TEST_RING_S | TEST_RING_SL): \
+			ret = rte_ring_sp_enqueue(r, obj); \
+			break; \
+		case (TEST_RING_M | TEST_RING_SL): \
+			ret = rte_ring_mp_enqueue(r, obj); \
+			break; \
+		case (TEST_RING_N | TEST_RING_BL): \
+			ret = rte_ring_enqueue_bulk(r, obj, n, NULL); \
+			break; \
+		case (TEST_RING_S | TEST_RING_BL): \
+			ret = rte_ring_sp_enqueue_bulk(r, obj, n, NULL); \
+			break; \
+		case (TEST_RING_M | TEST_RING_BL): \
+			ret = rte_ring_mp_enqueue_bulk(r, obj, n, NULL); \
+			break; \
+		case (TEST_RING_N | TEST_RING_BR): \
+			ret = rte_ring_enqueue_burst(r, obj, n, NULL); \
+			break; \
+		case (TEST_RING_S | TEST_RING_BR): \
+			ret = rte_ring_sp_enqueue_burst(r, obj, n, NULL); \
+			break; \
+		case (TEST_RING_M | TEST_RING_BR): \
+			ret = rte_ring_mp_enqueue_burst(r, obj, n, NULL); \
+		} \
+	else \
+		switch (api_type) { \
+		case (TEST_RING_N | TEST_RING_SL): \
+			ret = rte_ring_enqueue_elem(r, obj, esize); \
+			break; \
+		case (TEST_RING_S | TEST_RING_SL): \
+			ret = rte_ring_sp_enqueue_elem(r, obj, esize); \
+			break; \
+		case (TEST_RING_M | TEST_RING_SL): \
+			ret = rte_ring_mp_enqueue_elem(r, obj, esize); \
+			break; \
+		case (TEST_RING_N | TEST_RING_BL): \
+			ret = rte_ring_enqueue_bulk_elem(r, obj, esize, n, \
+								NULL); \
+			break; \
+		case (TEST_RING_S | TEST_RING_BL): \
+			ret = rte_ring_sp_enqueue_bulk_elem(r, obj, esize, n, \
+								NULL); \
+			break; \
+		case (TEST_RING_M | TEST_RING_BL): \
+			ret = rte_ring_mp_enqueue_bulk_elem(r, obj, esize, n, \
+								NULL); \
+			break; \
+		case (TEST_RING_N | TEST_RING_BR): \
+			ret = rte_ring_enqueue_burst_elem(r, obj, esize, n, \
+								NULL); \
+			break; \
+		case (TEST_RING_S | TEST_RING_BR): \
+			ret = rte_ring_sp_enqueue_burst_elem(r, obj, esize, n, \
+								NULL); \
+			break; \
+		case (TEST_RING_M | TEST_RING_BR): \
+			ret = rte_ring_mp_enqueue_burst_elem(r, obj, esize, n, \
+								NULL); \
+		} \
+} while (0)
+
+#define TEST_RING_DEQUEUE(r, obj, esize, n, ret, api_type) do { \
+	/* Legacy queue APIs? */ \
+	if ((esize) == -1) \
+		switch (api_type) { \
+		case (TEST_RING_N | TEST_RING_SL): \
+			ret = rte_ring_dequeue(r, obj); \
+			break; \
+		case (TEST_RING_S | TEST_RING_SL): \
+			ret = rte_ring_sc_dequeue(r, obj); \
+			break; \
+		case (TEST_RING_M | TEST_RING_SL): \
+			ret = rte_ring_mc_dequeue(r, obj); \
+			break; \
+		case (TEST_RING_N | TEST_RING_BL): \
+			ret = rte_ring_dequeue_bulk(r, obj, n, NULL); \
+			break; \
+		case (TEST_RING_S | TEST_RING_BL): \
+			ret = rte_ring_sc_dequeue_bulk(r, obj, n, NULL); \
+			break; \
+		case (TEST_RING_M | TEST_RING_BL): \
+			ret = rte_ring_mc_dequeue_bulk(r, obj, n, NULL); \
+			break; \
+		case (TEST_RING_N | TEST_RING_BR): \
+			ret = rte_ring_dequeue_burst(r, obj, n, NULL); \
+			break; \
+		case (TEST_RING_S | TEST_RING_BR): \
+			ret = rte_ring_sc_dequeue_burst(r, obj, n, NULL); \
+			break; \
+		case (TEST_RING_M | TEST_RING_BR): \
+			ret = rte_ring_mc_dequeue_burst(r, obj, n, NULL); \
+		} \
+	else \
+		switch (api_type) { \
+		case (TEST_RING_N | TEST_RING_SL): \
+			ret = rte_ring_dequeue_elem(r, obj, esize); \
+			break; \
+		case (TEST_RING_S | TEST_RING_SL): \
+			ret = rte_ring_sc_dequeue_elem(r, obj, esize); \
+			break; \
+		case (TEST_RING_M | TEST_RING_SL): \
+			ret = rte_ring_mc_dequeue_elem(r, obj, esize); \
+			break; \
+		case (TEST_RING_N | TEST_RING_BL): \
+			ret = rte_ring_dequeue_bulk_elem(r, obj, esize, n, \
+								NULL); \
+			break; \
+		case (TEST_RING_S | TEST_RING_BL): \
+			ret = rte_ring_sc_dequeue_bulk_elem(r, obj, esize, n, \
+								NULL); \
+			break; \
+		case (TEST_RING_M | TEST_RING_BL): \
+			ret = rte_ring_mc_dequeue_bulk_elem(r, obj, esize, n, \
+								NULL); \
+			break; \
+		case (TEST_RING_N | TEST_RING_BR): \
+			ret = rte_ring_dequeue_burst_elem(r, obj, esize, n, \
+								NULL); \
+			break; \
+		case (TEST_RING_S | TEST_RING_BR): \
+			ret = rte_ring_sc_dequeue_burst_elem(r, obj, esize, n, \
+								NULL); \
+			break; \
+		case (TEST_RING_M | TEST_RING_BR): \
+			ret = rte_ring_mc_dequeue_burst_elem(r, obj, esize, n, \
+								NULL); \
+		} \
+} while (0)
+
+/* This function is placed here as it is required for both
+ * performance and functional tests.
+ */
+static __rte_always_inline void *
+test_ring_calloc(unsigned int rsize, int esize)
+{
+	unsigned int sz;
+	void *p;
+
+	/* Legacy queue APIs? */
+	if (esize == -1)
+		sz = sizeof(void *);
+	else
+		sz = esize;
+
+	p = rte_zmalloc(NULL, rsize * sz, RTE_CACHE_LINE_SIZE);
+	if (p == NULL)
+		printf("Failed to allocate memory\n");
+
+	return p;
+}
-- 
2.17.1
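
For reviewers, a minimal sketch of how the api_type values passed to
test_ring_burst_bulk_tests() are formed by the nested loop in test_ring().
The flag values are copied from test_ring.h; count_api_types() is a
hypothetical helper written only for this illustration and is not part of
the patch.

```c
#include <stdio.h>

/* Flag values mirror the TEST_RING_* definitions in test_ring.h. */
#define TEST_RING_N  1   /* default APIs */
#define TEST_RING_S  2   /* SP or SC APIs */
#define TEST_RING_M  4   /* MP or MC APIs */
#define TEST_RING_SL 8   /* single element APIs */
#define TEST_RING_BL 16  /* bulk APIs */
#define TEST_RING_BR 32  /* burst APIs */

/*
 * Hypothetical helper: walk the same two loops as test_ring() and
 * return how many (sync mode | transfer mode) combinations they visit.
 */
static unsigned int
count_api_types(int verbose)
{
	unsigned int i, j, n = 0;

	/* Outer loop: bulk, then burst. Single element is not visited. */
	for (j = TEST_RING_BL; j <= TEST_RING_BR; j <<= 1)
		/* Inner loop: default, SP/SC, MP/MC. */
		for (i = TEST_RING_N; i <= TEST_RING_M; i <<= 1) {
			if (verbose)
				printf("api_type = 0x%02x\n", i | j);
			n++;
		}

	return n;
}
```

Each value has exactly one bit from each group set, which is why the
TEST_RING_ENQUEUE and TEST_RING_DEQUEUE macros can switch on the OR of the
two flags.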



* [dpdk-dev] [PATCH v7 04/17] test/ring: test burst APIs with random empty-full test case
  2019-12-20  4:45   ` [dpdk-dev] [PATCH v7 00/17] " Honnappa Nagarahalli
                       ` (2 preceding siblings ...)
  2019-12-20  4:45     ` [dpdk-dev] [PATCH v7 03/17] test/ring: add functional tests for rte_ring_xxx_elem APIs Honnappa Nagarahalli
@ 2019-12-20  4:45     ` Honnappa Nagarahalli
  2019-12-20  4:45     ` [dpdk-dev] [PATCH v7 05/17] test/ring: add default, single element test cases Honnappa Nagarahalli
                       ` (12 subsequent siblings)
  16 siblings, 0 replies; 173+ messages in thread
From: Honnappa Nagarahalli @ 2019-12-20  4:45 UTC (permalink / raw)
  To: olivier.matz, sthemmin, jerinj, bruce.richardson, david.marchand,
	pbhagavatula, konstantin.ananyev, honnappa.nagarahalli
  Cc: dev, dharmik.thakkar, ruifeng.wang, gavin.hu, nd

The random empty-full test case should also be run with the burst
APIs. Hence the test case is moved into the
test_ring_burst_bulk_tests function.

Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Gavin Hu <gavin.hu@arm.com>
---
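The invariants checked by the random full/empty case can be sketched on a
toy ring. This is not rte_ring: struct toy_ring and the toy_* functions
are hypothetical, single-threaded stand-ins, assuming a power-of-two slot
count with one slot left unused (usable capacity = size - 1) and bulk
semantics (all or nothing).

```c
#include <stdint.h>
#include <string.h>

#define TOY_RING_SIZE 16u                  /* must be a power of two */
#define TOY_RING_MASK (TOY_RING_SIZE - 1u) /* also the usable capacity */

struct toy_ring {
	uint32_t head;                  /* next slot to write */
	uint32_t tail;                  /* next slot to read */
	uint32_t slots[TOY_RING_SIZE];
};

static unsigned int
toy_count(const struct toy_ring *r)
{
	return (r->head - r->tail) & TOY_RING_MASK;
}

static unsigned int
toy_free_count(const struct toy_ring *r)
{
	return TOY_RING_MASK - toy_count(r);
}

/* Bulk semantics: enqueue all n elements or none. Returns n or 0. */
static unsigned int
toy_enqueue_bulk(struct toy_ring *r, const uint32_t *src, unsigned int n)
{
	unsigned int k;

	if (n > toy_free_count(r))
		return 0;
	for (k = 0; k < n; k++)
		r->slots[(r->head + k) & TOY_RING_MASK] = src[k];
	r->head += n;
	return n;
}

/* Bulk semantics: dequeue all n elements or none. Returns n or 0. */
static unsigned int
toy_dequeue_bulk(struct toy_ring *r, uint32_t *dst, unsigned int n)
{
	unsigned int k;

	if (n > toy_count(r))
		return 0;
	for (k = 0; k < n; k++)
		dst[k] = r->slots[(r->tail + k) & TOY_RING_MASK];
	r->tail += n;
	return n;
}

/* One iteration of the full/empty case. Returns 0 if all checks hold. */
static int
toy_full_empty_cycle(struct toy_ring *r, unsigned int shift)
{
	uint32_t src[TOY_RING_MASK], dst[TOY_RING_MASK];
	unsigned int k;

	for (k = 0; k < TOY_RING_MASK; k++)
		src[k] = k + 1;
	memset(dst, 0, sizeof(dst));

	/* Shift head and tail so the fill below wraps around the array. */
	if (toy_enqueue_bulk(r, src, shift) != shift ||
	    toy_dequeue_bulk(r, dst, shift) != shift)
		return -1;

	/* Fill the ring: no free entries may remain. */
	if (toy_enqueue_bulk(r, src, TOY_RING_MASK) != TOY_RING_MASK ||
	    toy_free_count(r) != 0 || toy_count(r) != TOY_RING_MASK)
		return -1;

	/* Empty the ring: every entry must be free again. */
	if (toy_dequeue_bulk(r, dst, TOY_RING_MASK) != TOY_RING_MASK ||
	    toy_count(r) != 0 || toy_free_count(r) != TOY_RING_MASK)
		return -1;

	/* Compare the whole buffers, with the length given in bytes. */
	return memcmp(src, dst, sizeof(src)) == 0 ? 0 : -1;
}
```

Running the cycle for every shift from 1 to TOY_RING_SIZE - 1 exercises
each wrap-around position once, which is what the random shift in the
test aims to cover over many iterations.
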
 app/test/test_ring.c | 91 +++++++++++++++++++++-----------------------
 1 file changed, 43 insertions(+), 48 deletions(-)

diff --git a/app/test/test_ring.c b/app/test/test_ring.c
index e7a8b468b..d4f40ad20 100644
--- a/app/test/test_ring.c
+++ b/app/test/test_ring.c
@@ -113,50 +113,6 @@ test_ring_print_test_string(const char *istr, unsigned int api_type, int esize)
 		printf("burst\n");
 }
 
-/*
- * helper routine for test_ring_basic
- */
-static int
-test_ring_basic_full_empty(struct rte_ring *r, void * const src[], void *dst[])
-{
-	unsigned i, rand;
-	const unsigned rsz = RING_SIZE - 1;
-
-	printf("Basic full/empty test\n");
-
-	for (i = 0; TEST_RING_FULL_EMTPY_ITER != i; i++) {
-
-		/* random shift in the ring */
-		rand = RTE_MAX(rte_rand() % RING_SIZE, 1UL);
-		printf("%s: iteration %u, random shift: %u;\n",
-		    __func__, i, rand);
-		TEST_RING_VERIFY(rte_ring_enqueue_bulk(r, src, rand,
-				NULL) != 0);
-		TEST_RING_VERIFY(rte_ring_dequeue_bulk(r, dst, rand,
-				NULL) == rand);
-
-		/* fill the ring */
-		TEST_RING_VERIFY(rte_ring_enqueue_bulk(r, src, rsz, NULL) != 0);
-		TEST_RING_VERIFY(0 == rte_ring_free_count(r));
-		TEST_RING_VERIFY(rsz == rte_ring_count(r));
-		TEST_RING_VERIFY(rte_ring_full(r));
-		TEST_RING_VERIFY(0 == rte_ring_empty(r));
-
-		/* empty the ring */
-		TEST_RING_VERIFY(rte_ring_dequeue_bulk(r, dst, rsz,
-				NULL) == rsz);
-		TEST_RING_VERIFY(rsz == rte_ring_free_count(r));
-		TEST_RING_VERIFY(0 == rte_ring_count(r));
-		TEST_RING_VERIFY(0 == rte_ring_full(r));
-		TEST_RING_VERIFY(rte_ring_empty(r));
-
-		/* check data */
-		TEST_RING_VERIFY(0 == memcmp(src, dst, rsz));
-		rte_ring_dump(stdout, r);
-	}
-	return 0;
-}
-
 static int
 test_ring_basic(struct rte_ring *r)
 {
@@ -294,9 +250,6 @@ test_ring_basic(struct rte_ring *r)
 		goto fail;
 	}
 
-	if (test_ring_basic_full_empty(r, src, dst) != 0)
-		goto fail;
-
 	cur_src = src;
 	cur_dst = dst;
 
@@ -371,6 +324,8 @@ test_ring_burst_bulk_tests(unsigned int api_type)
 	int ret;
 	unsigned int i, j;
 	unsigned int num_elems;
+	int rand;
+	const unsigned int rsz = RING_SIZE - 1;
 
 	for (i = 0; i < RTE_DIM(esize); i++) {
 		test_ring_print_test_string("Test standard ring", api_type,
@@ -483,7 +438,6 @@ test_ring_burst_bulk_tests(unsigned int api_type)
 			goto fail;
 		TEST_RING_INCP(cur_src, esize[i], 2);
 
-
 		printf("Enqueue the remaining entries = MAX_BULK - 3\n");
 		/* Bulk APIs enqueue exact number of elements */
 		if ((api_type & TEST_RING_BL) == TEST_RING_BL)
@@ -546,6 +500,47 @@ test_ring_burst_bulk_tests(unsigned int api_type)
 			goto fail;
 		}
 
+		printf("Random full/empty test\n");
+		cur_src = src;
+		cur_dst = dst;
+
+		for (j = 0; j != TEST_RING_FULL_EMTPY_ITER; j++) {
+			/* random shift in the ring */
+			rand = RTE_MAX(rte_rand() % RING_SIZE, 1UL);
+			printf("%s: iteration %u, random shift: %d;\n",
+			    __func__, j, rand);
+			TEST_RING_ENQUEUE(r, cur_src, esize[i], rand,
+							ret, api_type);
+			TEST_RING_VERIFY(ret != 0);
+
+			TEST_RING_DEQUEUE(r, cur_dst, esize[i], rand,
+							ret, api_type);
+			TEST_RING_VERIFY(ret == rand);
+
+			/* fill the ring */
+			TEST_RING_ENQUEUE(r, cur_src, esize[i], rsz,
+							ret, api_type);
+			TEST_RING_VERIFY(ret != 0);
+
+			TEST_RING_VERIFY(rte_ring_free_count(r) == 0);
+			TEST_RING_VERIFY(rsz == rte_ring_count(r));
+			TEST_RING_VERIFY(rte_ring_full(r));
+			TEST_RING_VERIFY(rte_ring_empty(r) == 0);
+
+			/* empty the ring */
+			TEST_RING_DEQUEUE(r, cur_dst, esize[i], rsz,
+							ret, api_type);
+			TEST_RING_VERIFY(ret == (int)rsz);
+			TEST_RING_VERIFY(rsz == rte_ring_free_count(r));
+			TEST_RING_VERIFY(rte_ring_count(r) == 0);
+			TEST_RING_VERIFY(rte_ring_full(r) == 0);
+			TEST_RING_VERIFY(rte_ring_empty(r));
+
+			/* check data */
+			TEST_RING_VERIFY(memcmp(src, dst, rsz) == 0);
+			rte_ring_dump(stdout, r);
+		}
+
 		/* Free memory before test completed */
 		rte_ring_free(r);
 		rte_free(src);
-- 
2.17.1



* [dpdk-dev] [PATCH v7 05/17] test/ring: add default, single element test cases
  2019-12-20  4:45   ` [dpdk-dev] [PATCH v7 00/17] " Honnappa Nagarahalli
                       ` (3 preceding siblings ...)
  2019-12-20  4:45     ` [dpdk-dev] [PATCH v7 04/17] test/ring: test burst APIs with random empty-full test case Honnappa Nagarahalli
@ 2019-12-20  4:45     ` Honnappa Nagarahalli
  2019-12-20  4:45     ` [dpdk-dev] [PATCH v7 06/17] test/ring: rte_ring_xxx_elem test cases for exact size ring Honnappa Nagarahalli
                       ` (11 subsequent siblings)
  16 siblings, 0 replies; 173+ messages in thread
From: Honnappa Nagarahalli @ 2019-12-20  4:45 UTC (permalink / raw)
  To: olivier.matz, sthemmin, jerinj, bruce.richardson, david.marchand,
	pbhagavatula, konstantin.ananyev, honnappa.nagarahalli
  Cc: dev, dharmik.thakkar, ruifeng.wang, gavin.hu, nd

Add default, single element test cases for rte_ring_xxx_elem
APIs. The burst API tests are kept as is since they exercise a
ring created with SP/SC flags. Matching bulk API tests are added
alongside them.

Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Gavin Hu <gavin.hu@arm.com>
---
 app/test/test_ring.c | 129 +++++++++++++++++++++++++++----------------
 1 file changed, 81 insertions(+), 48 deletions(-)

diff --git a/app/test/test_ring.c b/app/test/test_ring.c
index d4f40ad20..1025097c8 100644
--- a/app/test/test_ring.c
+++ b/app/test/test_ring.c
@@ -620,78 +620,111 @@ test_lookup_null(void)
 }
 
 /*
- * it tests some more basic ring operations
+ * Test default, single element, bulk and burst APIs
  */
 static int
 test_ring_basic_ex(void)
 {
 	int ret = -1;
-	unsigned i;
+	unsigned int i, j;
 	struct rte_ring *rp = NULL;
-	void **obj = NULL;
+	void *obj = NULL;
 
-	obj = rte_calloc("test_ring_basic_ex_malloc", RING_SIZE, sizeof(void *), 0);
-	if (obj == NULL) {
-		printf("test_ring_basic_ex fail to rte_malloc\n");
-		goto fail_test;
-	}
+	for (i = 0; i < RTE_DIM(esize); i++) {
+		obj = test_ring_calloc(RING_SIZE, esize[i]);
+		if (obj == NULL) {
+			printf("test_ring_basic_ex fail to rte_malloc\n");
+			goto fail_test;
+		}
 
-	rp = rte_ring_create("test_ring_basic_ex", RING_SIZE, SOCKET_ID_ANY,
-			RING_F_SP_ENQ | RING_F_SC_DEQ);
-	if (rp == NULL) {
-		printf("test_ring_basic_ex fail to create ring\n");
-		goto fail_test;
-	}
+		TEST_RING_CREATE("test_ring_basic_ex", esize[i], RING_SIZE,
+					SOCKET_ID_ANY,
+					RING_F_SP_ENQ | RING_F_SC_DEQ, rp);
+		if (rp == NULL) {
+			printf("test_ring_basic_ex fail to create ring\n");
+			goto fail_test;
+		}
 
-	if (rte_ring_lookup("test_ring_basic_ex") != rp) {
-		goto fail_test;
-	}
+		if (rte_ring_lookup("test_ring_basic_ex") != rp) {
+			printf("test_ring_basic_ex ring is not found\n");
+			goto fail_test;
+		}
 
-	if (rte_ring_empty(rp) != 1) {
-		printf("test_ring_basic_ex ring is not empty but it should be\n");
-		goto fail_test;
-	}
+		if (rte_ring_empty(rp) != 1) {
+			printf("test_ring_basic_ex ring is not empty but it should be\n");
+			goto fail_test;
+		}
 
-	printf("%u ring entries are now free\n", rte_ring_free_count(rp));
+		printf("%u ring entries are now free\n",
+			rte_ring_free_count(rp));
 
-	for (i = 0; i < RING_SIZE; i ++) {
-		rte_ring_enqueue(rp, obj[i]);
-	}
+		for (j = 0; j < RING_SIZE; j++) {
+			TEST_RING_ENQUEUE(rp, obj, esize[i], 1, ret,
+						TEST_RING_N | TEST_RING_SL);
+		}
 
-	if (rte_ring_full(rp) != 1) {
-		printf("test_ring_basic_ex ring is not full but it should be\n");
-		goto fail_test;
-	}
+		if (rte_ring_full(rp) != 1) {
+			printf("test_ring_basic_ex ring is not full but it should be\n");
+			goto fail_test;
+		}
 
-	for (i = 0; i < RING_SIZE; i ++) {
-		rte_ring_dequeue(rp, &obj[i]);
-	}
+		for (j = 0; j < RING_SIZE; j++) {
+			TEST_RING_DEQUEUE(rp, obj, esize[i], 1, ret,
+						TEST_RING_N | TEST_RING_SL);
+		}
 
-	if (rte_ring_empty(rp) != 1) {
-		printf("test_ring_basic_ex ring is not empty but it should be\n");
-		goto fail_test;
-	}
+		if (rte_ring_empty(rp) != 1) {
+			printf("test_ring_basic_ex ring is not empty but it should be\n");
+			goto fail_test;
+		}
 
-	/* Covering the ring burst operation */
-	ret = rte_ring_enqueue_burst(rp, obj, 2, NULL);
-	if (ret != 2) {
-		printf("test_ring_basic_ex: rte_ring_enqueue_burst fails \n");
-		goto fail_test;
-	}
+		/* Following tests use the configured flags to decide
+		 * SP/SC or MP/MC.
+		 */
+		/* Covering the ring burst operation */
+		TEST_RING_ENQUEUE(rp, obj, esize[i], 2, ret,
+					TEST_RING_N | TEST_RING_BR);
+		if (ret != 2) {
+			printf("test_ring_basic_ex: rte_ring_enqueue_burst fails\n");
+			goto fail_test;
+		}
+
+		TEST_RING_DEQUEUE(rp, obj, esize[i], 2, ret,
+					TEST_RING_N | TEST_RING_BR);
+		if (ret != 2) {
+			printf("test_ring_basic_ex: rte_ring_dequeue_burst fails\n");
+			goto fail_test;
+		}
+
+		/* Covering the ring bulk operation */
+		TEST_RING_ENQUEUE(rp, obj, esize[i], 2, ret,
+					TEST_RING_N | TEST_RING_BL);
+		if (ret != 2) {
+			printf("test_ring_basic_ex: rte_ring_enqueue_bulk fails\n");
+			goto fail_test;
+		}
+
+		TEST_RING_DEQUEUE(rp, obj, esize[i], 2, ret,
+					TEST_RING_N | TEST_RING_BL);
+		if (ret != 2) {
+			printf("test_ring_basic_ex: rte_ring_dequeue_bulk fails\n");
+			goto fail_test;
+		}
 
-	ret = rte_ring_dequeue_burst(rp, obj, 2, NULL);
-	if (ret != 2) {
-		printf("test_ring_basic_ex: rte_ring_dequeue_burst fails \n");
-		goto fail_test;
+		rte_ring_free(rp);
+		rte_free(obj);
+		rp = NULL;
+		obj = NULL;
 	}
 
-	ret = 0;
+	return 0;
+
 fail_test:
 	rte_ring_free(rp);
 	if (obj != NULL)
 		rte_free(obj);
 
-	return ret;
+	return -1;
 }
 
 static int
-- 
2.17.1



* [dpdk-dev] [PATCH v7 06/17] test/ring: rte_ring_xxx_elem test cases for exact size ring
  2019-12-20  4:45   ` [dpdk-dev] [PATCH v7 00/17] " Honnappa Nagarahalli
                       ` (4 preceding siblings ...)
  2019-12-20  4:45     ` [dpdk-dev] [PATCH v7 05/17] test/ring: add default, single element test cases Honnappa Nagarahalli
@ 2019-12-20  4:45     ` Honnappa Nagarahalli
  2019-12-20  4:45     ` [dpdk-dev] [PATCH v7 07/17] test/ring: negative test cases for rte_ring_xxx_elem APIs Honnappa Nagarahalli
                       ` (10 subsequent siblings)
  16 siblings, 0 replies; 173+ messages in thread
From: Honnappa Nagarahalli @ 2019-12-20  4:45 UTC (permalink / raw)
  To: olivier.matz, sthemmin, jerinj, bruce.richardson, david.marchand,
	pbhagavatula, konstantin.ananyev, honnappa.nagarahalli
  Cc: dev, dharmik.thakkar, ruifeng.wang, gavin.hu, nd

Test cases for the exact size ring are changed to test
rte_ring_xxx_elem APIs.

Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Gavin Hu <gavin.hu@arm.com>
---
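A sketch of the size/capacity relationship this test checks (16 vs 15
elements). It assumes that a standard ring uses the requested power-of-two
count as its size and keeps one slot unused, and that an exact size ring
rounds count + 1 up to the next power of two while reporting the requested
count as capacity. struct ring_geometry, next_pow2() and
compute_geometry() are hypothetical names used only for this illustration.

```c
#include <stdint.h>

struct ring_geometry {
	uint32_t size;     /* number of slots allocated */
	uint32_t capacity; /* number of elements that can be stored */
};

/* Round v up to the next power of two. Intended for small v >= 1. */
static uint32_t
next_pow2(uint32_t v)
{
	uint32_t p = 1;

	while (p < v)
		p <<= 1;
	return p;
}

/*
 * Assumed model, not taken from the library source:
 *  - standard ring: size = count (a power of two), capacity = count - 1
 *  - exact size ring: size = next power of two >= count + 1,
 *    capacity = count
 */
static struct ring_geometry
compute_geometry(uint32_t count, int exact_sz)
{
	struct ring_geometry g;

	if (exact_sz) {
		g.size = next_pow2(count + 1);
		g.capacity = count;
	} else {
		g.size = count;
		g.capacity = count - 1;
	}
	return g;
}
```

With count = 16 this gives size 16 and capacity 15 for the standard ring,
and size 32 and capacity 16 for the exact size ring, which is consistent
with the size comparison and the final capacity check in the test.
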
 app/test/test_ring.c | 147 ++++++++++++++++++++++++++-----------------
 1 file changed, 89 insertions(+), 58 deletions(-)

diff --git a/app/test/test_ring.c b/app/test/test_ring.c
index 1025097c8..294e3ee10 100644
--- a/app/test/test_ring.c
+++ b/app/test/test_ring.c
@@ -727,75 +727,106 @@ test_ring_basic_ex(void)
 	return -1;
 }
 
+/*
+ * Basic test cases with exact size ring.
+ */
 static int
 test_ring_with_exact_size(void)
 {
-	struct rte_ring *std_ring = NULL, *exact_sz_ring = NULL;
-	void *ptr_array[16];
-	static const unsigned int ring_sz = RTE_DIM(ptr_array);
-	unsigned int i;
+	struct rte_ring *std_r = NULL, *exact_sz_r = NULL;
+	void *obj;
+	const unsigned int ring_sz = 16;
+	unsigned int i, j;
 	int ret = -1;
 
-	std_ring = rte_ring_create("std", ring_sz, rte_socket_id(),
-			RING_F_SP_ENQ | RING_F_SC_DEQ);
-	if (std_ring == NULL) {
-		printf("%s: error, can't create std ring\n", __func__);
-		goto end;
-	}
-	exact_sz_ring = rte_ring_create("exact sz", ring_sz, rte_socket_id(),
-			RING_F_SP_ENQ | RING_F_SC_DEQ | RING_F_EXACT_SZ);
-	if (exact_sz_ring == NULL) {
-		printf("%s: error, can't create exact size ring\n", __func__);
-		goto end;
-	}
-
-	/*
-	 * Check that the exact size ring is bigger than the standard ring
-	 */
-	if (rte_ring_get_size(std_ring) >= rte_ring_get_size(exact_sz_ring)) {
-		printf("%s: error, std ring (size: %u) is not smaller than exact size one (size %u)\n",
-				__func__,
-				rte_ring_get_size(std_ring),
-				rte_ring_get_size(exact_sz_ring));
-		goto end;
-	}
-	/*
-	 * check that the exact_sz_ring can hold one more element than the
-	 * standard ring. (16 vs 15 elements)
-	 */
-	for (i = 0; i < ring_sz - 1; i++) {
-		rte_ring_enqueue(std_ring, NULL);
-		rte_ring_enqueue(exact_sz_ring, NULL);
-	}
-	if (rte_ring_enqueue(std_ring, NULL) != -ENOBUFS) {
-		printf("%s: error, unexpected successful enqueue\n", __func__);
-		goto end;
-	}
-	if (rte_ring_enqueue(exact_sz_ring, NULL) == -ENOBUFS) {
-		printf("%s: error, enqueue failed\n", __func__);
-		goto end;
-	}
+	for (i = 0; i < RTE_DIM(esize); i++) {
+		test_ring_print_test_string("Test exact size ring",
+				TEST_RING_IGNORE_API_TYPE,
+				esize[i]);
+
+		/* alloc object pointers */
+		obj = test_ring_calloc(16, esize[i]);
+		if (obj == NULL)
+			goto test_fail;
+
+		TEST_RING_CREATE("std", esize[i], ring_sz, rte_socket_id(),
+					RING_F_SP_ENQ | RING_F_SC_DEQ, std_r);
+		if (std_r == NULL) {
+			printf("%s: error, can't create std ring\n", __func__);
+			goto test_fail;
+		}
+		TEST_RING_CREATE("exact sz", esize[i], ring_sz, rte_socket_id(),
+				RING_F_SP_ENQ | RING_F_SC_DEQ | RING_F_EXACT_SZ,
+				exact_sz_r);
+		if (exact_sz_r == NULL) {
+			printf("%s: error, can't create exact size ring\n",
+					__func__);
+			goto test_fail;
+		}
 
-	/* check that dequeue returns the expected number of elements */
-	if (rte_ring_dequeue_burst(exact_sz_ring, ptr_array,
-			RTE_DIM(ptr_array), NULL) != ring_sz) {
-		printf("%s: error, failed to dequeue expected nb of elements\n",
+		/*
+		 * Check that the exact size ring is bigger than the
+		 * standard ring
+		 */
+		if (rte_ring_get_size(std_r) >= rte_ring_get_size(exact_sz_r)) {
+			printf("%s: error, std ring (size: %u) is not smaller than exact size one (size %u)\n",
+					__func__,
+					rte_ring_get_size(std_r),
+					rte_ring_get_size(exact_sz_r));
+			goto test_fail;
+		}
+		/*
+		 * check that the exact_sz_ring can hold one more element
+		 * than the standard ring. (16 vs 15 elements)
+		 */
+		for (j = 0; j < ring_sz - 1; j++) {
+			TEST_RING_ENQUEUE(std_r, obj, esize[i], 1, ret,
+						TEST_RING_N | TEST_RING_SL);
+			TEST_RING_ENQUEUE(exact_sz_r, obj, esize[i], 1,
+					ret, TEST_RING_N | TEST_RING_SL);
+		}
+		TEST_RING_ENQUEUE(std_r, obj, esize[i], 1, ret,
+						TEST_RING_N | TEST_RING_SL);
+		if (ret != -ENOBUFS) {
+			printf("%s: error, unexpected successful enqueue\n",
 				__func__);
-		goto end;
-	}
+			goto test_fail;
+		}
+		TEST_RING_ENQUEUE(exact_sz_r, obj, esize[i], 1, ret,
+						TEST_RING_N | TEST_RING_SL);
+		if (ret == -ENOBUFS) {
+			printf("%s: error, enqueue failed\n", __func__);
+			goto test_fail;
+		}
 
-	/* check that the capacity function returns expected value */
-	if (rte_ring_get_capacity(exact_sz_ring) != ring_sz) {
-		printf("%s: error, incorrect ring capacity reported\n",
+		/* check that dequeue returns the expected number of elements */
+		TEST_RING_DEQUEUE(exact_sz_r, obj, esize[i], ring_sz,
+					ret, TEST_RING_N | TEST_RING_BR);
+		if (ret != (int)ring_sz) {
+			printf("%s: error, failed to dequeue expected nb of elements\n",
 				__func__);
-		goto end;
+			goto test_fail;
+		}
+
+		/* check that the capacity function returns expected value */
+		if (rte_ring_get_capacity(exact_sz_r) != ring_sz) {
+			printf("%s: error, incorrect ring capacity reported\n",
+					__func__);
+			goto test_fail;
+		}
+
+		rte_free(obj);
+		rte_ring_free(std_r);
+		rte_ring_free(exact_sz_r);
 	}
 
-	ret = 0; /* all ok if we get here */
-end:
-	rte_ring_free(std_ring);
-	rte_ring_free(exact_sz_ring);
-	return ret;
+	return 0;
+
+test_fail:
+	rte_free(obj);
+	rte_ring_free(std_r);
+	rte_ring_free(exact_sz_r);
+	return -1;
 }
 
 static int
-- 
2.17.1



* [dpdk-dev] [PATCH v7 07/17] test/ring: negative test cases for rte_ring_xxx_elem APIs
  2019-12-20  4:45   ` [dpdk-dev] [PATCH v7 00/17] " Honnappa Nagarahalli
                       ` (5 preceding siblings ...)
  2019-12-20  4:45     ` [dpdk-dev] [PATCH v7 06/17] test/ring: rte_ring_xxx_elem test cases for exact size ring Honnappa Nagarahalli
@ 2019-12-20  4:45     ` Honnappa Nagarahalli
  2019-12-20  4:45     ` [dpdk-dev] [PATCH v7 08/17] test/ring: remove duplicate test cases Honnappa Nagarahalli
                       ` (9 subsequent siblings)
  16 siblings, 0 replies; 173+ messages in thread
From: Honnappa Nagarahalli @ 2019-12-20  4:45 UTC (permalink / raw)
  To: olivier.matz, sthemmin, jerinj, bruce.richardson, david.marchand,
	pbhagavatula, konstantin.ananyev, honnappa.nagarahalli
  Cc: dev, dharmik.thakkar, ruifeng.wang, gavin.hu, nd

All the negative test cases are consolidated into a single
function. This provides the ability to add test cases for
rte_ring_xxx_elem APIs easily.

Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Gavin Hu <gavin.hu@arm.com>
---
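The argument checks that the negative tests expect the create call to
enforce can be sketched as a standalone validator. This is an illustration
only: sketch_check_ring_args() is a hypothetical function, and
SKETCH_RING_SZ_MASK is an assumed stand-in for RTE_RING_SZ_MASK. The
exact-size flag is not modelled here.

```c
#include <errno.h>

/* Assumed value of RTE_RING_SZ_MASK, used only for this sketch. */
#define SKETCH_RING_SZ_MASK 0x7fffffffU

/*
 * Hypothetical validator mirroring the three rejected cases:
 *  - element size that is not a multiple of 4
 *  - ring size that is not a power of two
 *  - ring size that exceeds the size mask
 * Returns 0 when the arguments are acceptable, -EINVAL otherwise.
 */
static int
sketch_check_ring_args(unsigned int esize, unsigned int count)
{
	if (esize == 0 || esize % 4 != 0)
		return -EINVAL;

	if (count == 0 || (count & (count - 1)) != 0)
		return -EINVAL;

	if (count > SKETCH_RING_SZ_MASK)
		return -EINVAL;

	return 0;
}
```

For example, an element size of 23 or a count of 4097 is rejected, while
an element size of 4 with a count of 4096 passes.
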
 app/test/test_ring.c | 176 ++++++++++++++++++++++---------------------
 1 file changed, 91 insertions(+), 85 deletions(-)

diff --git a/app/test/test_ring.c b/app/test/test_ring.c
index 294e3ee10..552e8b53a 100644
--- a/app/test/test_ring.c
+++ b/app/test/test_ring.c
@@ -113,6 +113,93 @@ test_ring_print_test_string(const char *istr, unsigned int api_type, int esize)
 		printf("burst\n");
 }
 
+/*
+ * Various negative test cases.
+ */
+static int
+test_ring_negative_tests(void)
+{
+	struct rte_ring *rp = NULL;
+	struct rte_ring *rt = NULL;
+	unsigned int i;
+
+	/* Test with esize not a multiple of 4 */
+	TEST_RING_CREATE("test_bad_element_size", 23,
+				RING_SIZE + 1, SOCKET_ID_ANY, 0, rp);
+	if (rp != NULL) {
+		printf("Test failed to detect invalid element size\n");
+		goto test_fail;
+	}
+
+
+	for (i = 0; i < RTE_DIM(esize); i++) {
+		/* Test if ring size is not power of 2 */
+		TEST_RING_CREATE("test_bad_ring_size", esize[i],
+					RING_SIZE + 1, SOCKET_ID_ANY, 0, rp);
+		if (rp != NULL) {
+			printf("Test failed to detect odd count\n");
+			goto test_fail;
+		}
+
+		/* Test if ring size is exceeding the limit */
+		TEST_RING_CREATE("test_bad_ring_size", esize[i],
+					RTE_RING_SZ_MASK + 1, SOCKET_ID_ANY,
+					0, rp);
+		if (rp != NULL) {
+			printf("Test failed to detect limits\n");
+			goto test_fail;
+		}
+
+		/* Tests if lookup returns NULL on non-existing ring */
+		rp = rte_ring_lookup("ring_not_found");
+		if (rp != NULL || rte_errno != ENOENT) {
+			printf("Test failed to detect NULL ring lookup\n");
+			goto test_fail;
+		}
+
+		/* Test if a non-power-of-2 count causes the create
+		 * function to fail correctly
+		 */
+		TEST_RING_CREATE("test_ring_count", esize[i], 4097,
+					SOCKET_ID_ANY, 0, rp);
+		if (rp != NULL)
+			goto test_fail;
+
+		TEST_RING_CREATE("test_ring_negative", esize[i], RING_SIZE,
+					SOCKET_ID_ANY,
+					RING_F_SP_ENQ | RING_F_SC_DEQ, rp);
+		if (rp == NULL) {
+			printf("test_ring_negative fail to create ring\n");
+			goto test_fail;
+		}
+
+		if (rte_ring_lookup("test_ring_negative") != rp)
+			goto test_fail;
+
+		if (rte_ring_empty(rp) != 1) {
+			printf("test_ring_negative ring is not empty but it should be\n");
+			goto test_fail;
+		}
+
+		/* Test that creating a ring with an already used
+		 * ring name always fails.
+		 */
+		TEST_RING_CREATE("test_ring_negative", esize[i], RING_SIZE,
+					SOCKET_ID_ANY, 0, rt);
+		if (rt != NULL)
+			goto test_fail;
+
+		rte_ring_free(rp);
+	}
+
+	return 0;
+
+test_fail:
+
+	rte_ring_free(rp);
+	return -1;
+}
+
 static int
 test_ring_basic(struct rte_ring *r)
 {
@@ -555,70 +642,6 @@ test_ring_burst_bulk_tests(unsigned int api_type)
 	return -1;
 }
 
-/*
- * it will always fail to create ring with a wrong ring size number in this function
- */
-static int
-test_ring_creation_with_wrong_size(void)
-{
-	struct rte_ring * rp = NULL;
-
-	/* Test if ring size is not power of 2 */
-	rp = rte_ring_create("test_bad_ring_size", RING_SIZE + 1, SOCKET_ID_ANY, 0);
-	if (NULL != rp) {
-		return -1;
-	}
-
-	/* Test if ring size is exceeding the limit */
-	rp = rte_ring_create("test_bad_ring_size", (RTE_RING_SZ_MASK + 1), SOCKET_ID_ANY, 0);
-	if (NULL != rp) {
-		return -1;
-	}
-	return 0;
-}
-
-/*
- * it tests if it would always fail to create ring with an used ring name
- */
-static int
-test_ring_creation_with_an_used_name(void)
-{
-	struct rte_ring * rp;
-
-	rp = rte_ring_create("test", RING_SIZE, SOCKET_ID_ANY, 0);
-	if (NULL != rp)
-		return -1;
-
-	return 0;
-}
-
-/*
- * Test to if a non-power of 2 count causes the create
- * function to fail correctly
- */
-static int
-test_create_count_odd(void)
-{
-	struct rte_ring *r = rte_ring_create("test_ring_count",
-			4097, SOCKET_ID_ANY, 0 );
-	if(r != NULL){
-		return -1;
-	}
-	return 0;
-}
-
-static int
-test_lookup_null(void)
-{
-	struct rte_ring *rlp = rte_ring_lookup("ring_not_found");
-	if (rlp ==NULL)
-	if (rte_errno != ENOENT){
-		printf( "test failed to returnn error on null pointer\n");
-		return -1;
-	}
-	return 0;
-}
-
 /*
  * Test default, single element, bulk and burst APIs
  */
@@ -835,6 +858,10 @@ test_ring(void)
 	unsigned int i, j;
 	struct rte_ring *r = NULL;
 
+	/* Negative test cases */
+	if (test_ring_negative_tests() < 0)
+		goto test_fail;
+
 	/* some more basic operations */
 	if (test_ring_basic_ex() < 0)
 		goto test_fail;
@@ -861,27 +888,6 @@ test_ring(void)
 	if (test_ring_basic(r) < 0)
 		goto test_fail;
 
-	/* basic operations */
-	if ( test_create_count_odd() < 0){
-		printf("Test failed to detect odd count\n");
-		goto test_fail;
-	} else
-		printf("Test detected odd count\n");
-
-	if ( test_lookup_null() < 0){
-		printf("Test failed to detect NULL ring lookup\n");
-		goto test_fail;
-	} else
-		printf("Test detected NULL ring lookup\n");
-
-	/* test of creating ring with wrong size */
-	if (test_ring_creation_with_wrong_size() < 0)
-		goto test_fail;
-
-	/* test of creation ring with an used name */
-	if (test_ring_creation_with_an_used_name() < 0)
-		goto test_fail;
-
 	if (test_ring_with_exact_size() < 0)
 		goto test_fail;
 
-- 
2.17.1



* [dpdk-dev] [PATCH v7 08/17] test/ring: remove duplicate test cases
  2019-12-20  4:45   ` [dpdk-dev] [PATCH v7 00/17] " Honnappa Nagarahalli
                       ` (6 preceding siblings ...)
  2019-12-20  4:45     ` [dpdk-dev] [PATCH v7 07/17] test/ring: negative test cases for rte_ring_xxx_elem APIs Honnappa Nagarahalli
@ 2019-12-20  4:45     ` Honnappa Nagarahalli
  2019-12-20  4:45     ` [dpdk-dev] [PATCH v7 09/17] test/ring: removed unused variable synchro Honnappa Nagarahalli
                       ` (8 subsequent siblings)
  16 siblings, 0 replies; 173+ messages in thread
From: Honnappa Nagarahalli @ 2019-12-20  4:45 UTC (permalink / raw)
  To: olivier.matz, sthemmin, jerinj, bruce.richardson, david.marchand,
	pbhagavatula, konstantin.ananyev, honnappa.nagarahalli
  Cc: dev, dharmik.thakkar, ruifeng.wang, gavin.hu, nd

The test cases in the function test_ring_basic are already covered
by the function test_ring_burst_bulk_tests and others.

Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Gavin Hu <gavin.hu@arm.com>
---
 app/test/test_ring.c | 218 -------------------------------------------
 1 file changed, 218 deletions(-)

diff --git a/app/test/test_ring.c b/app/test/test_ring.c
index 552e8b53a..a082f0137 100644
--- a/app/test/test_ring.c
+++ b/app/test/test_ring.c
@@ -200,206 +200,6 @@ test_ring_negative_tests(void)
 	return -1;
 }
 
-static int
-test_ring_basic(struct rte_ring *r)
-{
-	void **src = NULL, **cur_src = NULL, **dst = NULL, **cur_dst = NULL;
-	int ret;
-	unsigned i, num_elems;
-
-	/* alloc dummy object pointers */
-	src = malloc(RING_SIZE*2*sizeof(void *));
-	if (src == NULL)
-		goto fail;
-
-	for (i = 0; i < RING_SIZE*2 ; i++) {
-		src[i] = (void *)(unsigned long)i;
-	}
-	cur_src = src;
-
-	/* alloc some room for copied objects */
-	dst = malloc(RING_SIZE*2*sizeof(void *));
-	if (dst == NULL)
-		goto fail;
-
-	memset(dst, 0, RING_SIZE*2*sizeof(void *));
-	cur_dst = dst;
-
-	printf("enqueue 1 obj\n");
-	ret = rte_ring_sp_enqueue_bulk(r, cur_src, 1, NULL);
-	cur_src += 1;
-	if (ret == 0)
-		goto fail;
-
-	printf("enqueue 2 objs\n");
-	ret = rte_ring_sp_enqueue_bulk(r, cur_src, 2, NULL);
-	cur_src += 2;
-	if (ret == 0)
-		goto fail;
-
-	printf("enqueue MAX_BULK objs\n");
-	ret = rte_ring_sp_enqueue_bulk(r, cur_src, MAX_BULK, NULL);
-	cur_src += MAX_BULK;
-	if (ret == 0)
-		goto fail;
-
-	printf("dequeue 1 obj\n");
-	ret = rte_ring_sc_dequeue_bulk(r, cur_dst, 1, NULL);
-	cur_dst += 1;
-	if (ret == 0)
-		goto fail;
-
-	printf("dequeue 2 objs\n");
-	ret = rte_ring_sc_dequeue_bulk(r, cur_dst, 2, NULL);
-	cur_dst += 2;
-	if (ret == 0)
-		goto fail;
-
-	printf("dequeue MAX_BULK objs\n");
-	ret = rte_ring_sc_dequeue_bulk(r, cur_dst, MAX_BULK, NULL);
-	cur_dst += MAX_BULK;
-	if (ret == 0)
-		goto fail;
-
-	/* check data */
-	if (memcmp(src, dst, cur_dst - dst)) {
-		rte_hexdump(stdout, "src", src, cur_src - src);
-		rte_hexdump(stdout, "dst", dst, cur_dst - dst);
-		printf("data after dequeue is not the same\n");
-		goto fail;
-	}
-	cur_src = src;
-	cur_dst = dst;
-
-	printf("enqueue 1 obj\n");
-	ret = rte_ring_mp_enqueue_bulk(r, cur_src, 1, NULL);
-	cur_src += 1;
-	if (ret == 0)
-		goto fail;
-
-	printf("enqueue 2 objs\n");
-	ret = rte_ring_mp_enqueue_bulk(r, cur_src, 2, NULL);
-	cur_src += 2;
-	if (ret == 0)
-		goto fail;
-
-	printf("enqueue MAX_BULK objs\n");
-	ret = rte_ring_mp_enqueue_bulk(r, cur_src, MAX_BULK, NULL);
-	cur_src += MAX_BULK;
-	if (ret == 0)
-		goto fail;
-
-	printf("dequeue 1 obj\n");
-	ret = rte_ring_mc_dequeue_bulk(r, cur_dst, 1, NULL);
-	cur_dst += 1;
-	if (ret == 0)
-		goto fail;
-
-	printf("dequeue 2 objs\n");
-	ret = rte_ring_mc_dequeue_bulk(r, cur_dst, 2, NULL);
-	cur_dst += 2;
-	if (ret == 0)
-		goto fail;
-
-	printf("dequeue MAX_BULK objs\n");
-	ret = rte_ring_mc_dequeue_bulk(r, cur_dst, MAX_BULK, NULL);
-	cur_dst += MAX_BULK;
-	if (ret == 0)
-		goto fail;
-
-	/* check data */
-	if (memcmp(src, dst, cur_dst - dst)) {
-		rte_hexdump(stdout, "src", src, cur_src - src);
-		rte_hexdump(stdout, "dst", dst, cur_dst - dst);
-		printf("data after dequeue is not the same\n");
-		goto fail;
-	}
-	cur_src = src;
-	cur_dst = dst;
-
-	printf("fill and empty the ring\n");
-	for (i = 0; i<RING_SIZE/MAX_BULK; i++) {
-		ret = rte_ring_mp_enqueue_bulk(r, cur_src, MAX_BULK, NULL);
-		cur_src += MAX_BULK;
-		if (ret == 0)
-			goto fail;
-		ret = rte_ring_mc_dequeue_bulk(r, cur_dst, MAX_BULK, NULL);
-		cur_dst += MAX_BULK;
-		if (ret == 0)
-			goto fail;
-	}
-
-	/* check data */
-	if (memcmp(src, dst, cur_dst - dst)) {
-		rte_hexdump(stdout, "src", src, cur_src - src);
-		rte_hexdump(stdout, "dst", dst, cur_dst - dst);
-		printf("data after dequeue is not the same\n");
-		goto fail;
-	}
-
-	cur_src = src;
-	cur_dst = dst;
-
-	printf("test default bulk enqueue / dequeue\n");
-	num_elems = 16;
-
-	cur_src = src;
-	cur_dst = dst;
-
-	ret = rte_ring_enqueue_bulk(r, cur_src, num_elems, NULL);
-	cur_src += num_elems;
-	if (ret == 0) {
-		printf("Cannot enqueue\n");
-		goto fail;
-	}
-	ret = rte_ring_enqueue_bulk(r, cur_src, num_elems, NULL);
-	cur_src += num_elems;
-	if (ret == 0) {
-		printf("Cannot enqueue\n");
-		goto fail;
-	}
-	ret = rte_ring_dequeue_bulk(r, cur_dst, num_elems, NULL);
-	cur_dst += num_elems;
-	if (ret == 0) {
-		printf("Cannot dequeue\n");
-		goto fail;
-	}
-	ret = rte_ring_dequeue_bulk(r, cur_dst, num_elems, NULL);
-	cur_dst += num_elems;
-	if (ret == 0) {
-		printf("Cannot dequeue2\n");
-		goto fail;
-	}
-
-	/* check data */
-	if (memcmp(src, dst, cur_dst - dst)) {
-		rte_hexdump(stdout, "src", src, cur_src - src);
-		rte_hexdump(stdout, "dst", dst, cur_dst - dst);
-		printf("data after dequeue is not the same\n");
-		goto fail;
-	}
-
-	cur_src = src;
-	cur_dst = dst;
-
-	ret = rte_ring_mp_enqueue(r, cur_src);
-	if (ret != 0)
-		goto fail;
-
-	ret = rte_ring_mc_dequeue(r, cur_dst);
-	if (ret != 0)
-		goto fail;
-
-	free(src);
-	free(dst);
-	return 0;
-
- fail:
-	free(src);
-	free(dst);
-	return -1;
-}
-
 /*
  * Burst and bulk operations with sp/sc, mp/mc and default (during creation)
  */
@@ -856,7 +656,6 @@ static int
 test_ring(void)
 {
 	unsigned int i, j;
-	struct rte_ring *r = NULL;
 
 	/* Negative test cases */
 	if (test_ring_negative_tests() < 0)
@@ -868,38 +667,21 @@ test_ring(void)
 
 	rte_atomic32_init(&synchro);
 
-	r = rte_ring_create("test", RING_SIZE, SOCKET_ID_ANY, 0);
-	if (r == NULL)
-		goto test_fail;
-
-	/* retrieve the ring from its name */
-	if (rte_ring_lookup("test") != r) {
-		printf("Cannot lookup ring from its name\n");
-		goto test_fail;
-	}
-
 	/* Burst and bulk operations with sp/sc, mp/mc and default */
 	for (j = TEST_RING_BL; j <= TEST_RING_BR; j <<= 1)
 		for (i = TEST_RING_N; i <= TEST_RING_M; i <<= 1)
 			if (test_ring_burst_bulk_tests(i | j) < 0)
 				goto test_fail;
 
-	/* basic operations */
-	if (test_ring_basic(r) < 0)
-		goto test_fail;
-
 	if (test_ring_with_exact_size() < 0)
 		goto test_fail;
 
 	/* dump the ring status */
 	rte_ring_list_dump(stdout);
 
-	rte_ring_free(r);
-
 	return 0;
 
 test_fail:
-	rte_ring_free(r);
 
 	return -1;
 }
-- 
2.17.1



* [dpdk-dev] [PATCH v7 09/17] test/ring: removed unused variable synchro
  2019-12-20  4:45   ` [dpdk-dev] [PATCH v7 00/17] " Honnappa Nagarahalli
                       ` (7 preceding siblings ...)
  2019-12-20  4:45     ` [dpdk-dev] [PATCH v7 08/17] test/ring: remove duplicate test cases Honnappa Nagarahalli
@ 2019-12-20  4:45     ` Honnappa Nagarahalli
  2019-12-20  4:45     ` [dpdk-dev] [PATCH v7 10/17] test/ring: modify single element enq/deq perf test cases Honnappa Nagarahalli
                       ` (7 subsequent siblings)
  16 siblings, 0 replies; 173+ messages in thread
From: Honnappa Nagarahalli @ 2019-12-20  4:45 UTC (permalink / raw)
  To: olivier.matz, sthemmin, jerinj, bruce.richardson, david.marchand,
	pbhagavatula, konstantin.ananyev, honnappa.nagarahalli
  Cc: dev, dharmik.thakkar, ruifeng.wang, gavin.hu, nd

Remove the unused variable synchro.

Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Gavin Hu <gavin.hu@arm.com>
---
 app/test/test_ring.c | 4 ----
 1 file changed, 4 deletions(-)

diff --git a/app/test/test_ring.c b/app/test/test_ring.c
index a082f0137..57fbd897c 100644
--- a/app/test/test_ring.c
+++ b/app/test/test_ring.c
@@ -57,8 +57,6 @@
 #define RING_SIZE 4096
 #define MAX_BULK 32
 
-static rte_atomic32_t synchro;
-
 #define	TEST_RING_VERIFY(exp)						\
 	if (!(exp)) {							\
 		printf("error at %s:%d\tcondition " #exp " failed\n",	\
@@ -665,8 +663,6 @@ test_ring(void)
 	if (test_ring_basic_ex() < 0)
 		goto test_fail;
 
-	rte_atomic32_init(&synchro);
-
 	/* Burst and bulk operations with sp/sc, mp/mc and default */
 	for (j = TEST_RING_BL; j <= TEST_RING_BR; j <<= 1)
 		for (i = TEST_RING_N; i <= TEST_RING_M; i <<= 1)
-- 
2.17.1



* [dpdk-dev] [PATCH v7 10/17] test/ring: modify single element enq/deq perf test cases
  2019-12-20  4:45   ` [dpdk-dev] [PATCH v7 00/17] " Honnappa Nagarahalli
                       ` (8 preceding siblings ...)
  2019-12-20  4:45     ` [dpdk-dev] [PATCH v7 09/17] test/ring: removed unused variable synchro Honnappa Nagarahalli
@ 2019-12-20  4:45     ` Honnappa Nagarahalli
  2020-01-02 17:03       ` Ananyev, Konstantin
  2019-12-20  4:45     ` [dpdk-dev] [PATCH v7 11/17] test/ring: modify burst " Honnappa Nagarahalli
                       ` (6 subsequent siblings)
  16 siblings, 1 reply; 173+ messages in thread
From: Honnappa Nagarahalli @ 2019-12-20  4:45 UTC (permalink / raw)
  To: olivier.matz, sthemmin, jerinj, bruce.richardson, david.marchand,
	pbhagavatula, konstantin.ananyev, honnappa.nagarahalli
  Cc: dev, dharmik.thakkar, ruifeng.wang, gavin.hu, nd

Modify the single element enqueue/dequeue perf test cases to cover
both the legacy and rte_ring_xxx_elem APIs.

Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Gavin Hu <gavin.hu@arm.com>
---
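As a reading aid, the label printed by test_ring_print_test_string()
can be sketched as a pure string-building function. This is only an
illustration: the flag values are assumptions (the real ones come from
test_ring.h), format_test_label() is a hypothetical name, and the
sketch omits the TEST_RING_IGNORE_API_TYPE early return and the
trailing cycle count.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Assumed single-bit values; the real ones are defined in test_ring.h. */
#define TEST_RING_N  1u   /* default enqueue/dequeue */
#define TEST_RING_S  2u   /* SP/SC */
#define TEST_RING_M  4u   /* MP/MC */
#define TEST_RING_SL 8u   /* single */
#define TEST_RING_BL 16u  /* bulk */
#define TEST_RING_BR 32u  /* burst */

/*
 * Hypothetical helper: build the same label the perf test prints, in
 * the same order: API family, synchronization mode, transfer mode.
 * For single-bit flags, (x & F) is equivalent to (x & F) == F.
 */
static int
format_test_label(char *buf, size_t len, unsigned int api_type,
	int esize, unsigned int bsz)
{
	char api[48], mode[48] = "";
	const char *sync = "";

	assert(buf != NULL && len > 0);

	if (esize == -1)
		snprintf(api, sizeof(api), "legacy APIs");
	else
		snprintf(api, sizeof(api), "elem APIs: element size %dB",
			esize);

	if (api_type & TEST_RING_N)
		sync = ": default enqueue/dequeue: ";
	else if (api_type & TEST_RING_S)
		sync = ": SP/SC: ";
	else if (api_type & TEST_RING_M)
		sync = ": MP/MC: ";

	if (api_type & TEST_RING_SL)
		snprintf(mode, sizeof(mode), "single: ");
	else if (api_type & TEST_RING_BL)
		snprintf(mode, sizeof(mode), "bulk (size: %u): ", bsz);
	else if (api_type & TEST_RING_BR)
		snprintf(mode, sizeof(mode), "burst (size: %u): ", bsz);

	return snprintf(buf, len, "%s%s%s", api, sync, mode);
}
```

An esize of -1 selects the "legacy APIs" label, matching how the perf
test passes -1 for rings created through the legacy path.
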
 app/test/test_ring_perf.c | 100 ++++++++++++++++++++++++++++++--------
 1 file changed, 80 insertions(+), 20 deletions(-)

diff --git a/app/test/test_ring_perf.c b/app/test/test_ring_perf.c
index 6c2aca483..5829718c1 100644
--- a/app/test/test_ring_perf.c
+++ b/app/test/test_ring_perf.c
@@ -13,6 +13,7 @@
 #include <string.h>
 
 #include "test.h"
+#include "test_ring.h"
 
 /*
  * Ring
@@ -41,6 +42,35 @@ struct lcore_pair {
 
 static volatile unsigned lcore_count = 0;
 
+static void
+test_ring_print_test_string(unsigned int api_type, int esize,
+	unsigned int bsz, double value)
+{
+	if (esize == -1)
+		printf("legacy APIs");
+	else
+		printf("elem APIs: element size %dB", esize);
+
+	if (api_type == TEST_RING_IGNORE_API_TYPE)
+		return;
+
+	if ((api_type & TEST_RING_N) == TEST_RING_N)
+		printf(": default enqueue/dequeue: ");
+	else if ((api_type & TEST_RING_S) == TEST_RING_S)
+		printf(": SP/SC: ");
+	else if ((api_type & TEST_RING_M) == TEST_RING_M)
+		printf(": MP/MC: ");
+
+	if ((api_type & TEST_RING_SL) == TEST_RING_SL)
+		printf("single: ");
+	else if ((api_type & TEST_RING_BL) == TEST_RING_BL)
+		printf("bulk (size: %u): ", bsz);
+	else if ((api_type & TEST_RING_BR) == TEST_RING_BR)
+		printf("burst (size: %u): ", bsz);
+
+	printf("%.2F\n", value);
+}
+
 /**** Functions to analyse our core mask to get cores for different tests ***/
 
 static int
@@ -335,32 +365,35 @@ run_on_all_cores(struct rte_ring *r)
  * Test function that determines how long an enqueue + dequeue of a single item
  * takes on a single lcore. Result is for comparison with the bulk enq+deq.
  */
-static void
-test_single_enqueue_dequeue(struct rte_ring *r)
+static int
+test_single_enqueue_dequeue(struct rte_ring *r, const int esize,
+	const unsigned int api_type)
 {
-	const unsigned iter_shift = 24;
-	const unsigned iterations = 1<<iter_shift;
-	unsigned i = 0;
+	int ret;
+	const unsigned int iter_shift = 24;
+	const unsigned int iterations = 1 << iter_shift;
+	unsigned int i = 0;
 	void *burst = NULL;
 
-	const uint64_t sc_start = rte_rdtsc();
-	for (i = 0; i < iterations; i++) {
-		rte_ring_sp_enqueue(r, burst);
-		rte_ring_sc_dequeue(r, &burst);
-	}
-	const uint64_t sc_end = rte_rdtsc();
+	(void)ret;
+	/* alloc dummy object pointers */
+	burst = test_ring_calloc(1, esize);
+	if (burst == NULL)
+		return -1;
 
-	const uint64_t mc_start = rte_rdtsc();
+	const uint64_t start = rte_rdtsc();
 	for (i = 0; i < iterations; i++) {
-		rte_ring_mp_enqueue(r, burst);
-		rte_ring_mc_dequeue(r, &burst);
+		TEST_RING_ENQUEUE(r, burst, esize, 1, ret, api_type);
+		TEST_RING_DEQUEUE(r, burst, esize, 1, ret, api_type);
 	}
-	const uint64_t mc_end = rte_rdtsc();
+	const uint64_t end = rte_rdtsc();
+
+	test_ring_print_test_string(api_type, esize, 1,
+					((double)(end - start)) / iterations);
+
+	rte_free(burst);
 
-	printf("SP/SC single enq/dequeue: %.2F\n",
-			((double)(sc_end-sc_start)) / iterations);
-	printf("MP/MC single enq/dequeue: %.2F\n",
-			((double)(mc_end-mc_start)) / iterations);
+	return 0;
 }
 
 /*
@@ -453,12 +486,39 @@ test_ring_perf(void)
 	struct lcore_pair cores;
 	struct rte_ring *r = NULL;
 
+	/*
+	 * Performance test for legacy/_elem APIs
+	 * SP-SC/MP-MC, single
+	 */
+	TEST_RING_CREATE(RING_NAME, -1, RING_SIZE, rte_socket_id(), 0, r);
+	if (r == NULL)
+		return -1;
+
+	printf("\n### Testing single element enq/deq ###\n");
+	if (test_single_enqueue_dequeue(r, -1, TEST_RING_S | TEST_RING_SL) < 0)
+		return -1;
+	if (test_single_enqueue_dequeue(r, -1, TEST_RING_M | TEST_RING_SL) < 0)
+		return -1;
+
+	rte_ring_free(r);
+
+	TEST_RING_CREATE(RING_NAME, 16, RING_SIZE, rte_socket_id(), 0, r);
+	if (r == NULL)
+		return -1;
+
+	printf("\n### Testing single element enq/deq ###\n");
+	if (test_single_enqueue_dequeue(r, 16, TEST_RING_S | TEST_RING_SL) < 0)
+		return -1;
+	if (test_single_enqueue_dequeue(r, 16, TEST_RING_M | TEST_RING_SL) < 0)
+		return -1;
+
+	rte_ring_free(r);
+
 	r = rte_ring_create(RING_NAME, RING_SIZE, rte_socket_id(), 0);
 	if (r == NULL)
 		return -1;
 
 	printf("### Testing single element and burst enq/deq ###\n");
-	test_single_enqueue_dequeue(r);
 	test_burst_enqueue_dequeue(r);
 
 	printf("\n### Testing empty dequeue ###\n");
-- 
2.17.1



* [dpdk-dev] [PATCH v7 11/17] test/ring: modify burst enq/deq perf test cases
  2019-12-20  4:45   ` [dpdk-dev] [PATCH v7 00/17] " Honnappa Nagarahalli
                       ` (9 preceding siblings ...)
  2019-12-20  4:45     ` [dpdk-dev] [PATCH v7 10/17] test/ring: modify single element enq/deq perf test cases Honnappa Nagarahalli
@ 2019-12-20  4:45     ` Honnappa Nagarahalli
  2020-01-02 16:57       ` Ananyev, Konstantin
  2019-12-20  4:45     ` [dpdk-dev] [PATCH v7 12/17] test/ring: modify bulk " Honnappa Nagarahalli
                       ` (5 subsequent siblings)
  16 siblings, 1 reply; 173+ messages in thread
From: Honnappa Nagarahalli @ 2019-12-20  4:45 UTC (permalink / raw)
  To: olivier.matz, sthemmin, jerinj, bruce.richardson, david.marchand,
	pbhagavatula, konstantin.ananyev, honnappa.nagarahalli
  Cc: dev, dharmik.thakkar, ruifeng.wang, gavin.hu, nd

Modify the burst enqueue/dequeue perf test cases to cover both the
legacy and rte_ring_xxx_elem APIs.

Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Gavin Hu <gavin.hu@arm.com>
---
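The shared helper in this patch is driven by TEST_RING_BR (burst) here
and can equally be driven by TEST_RING_BL (bulk). The two modes differ
in how they treat a ring that cannot take the whole request: bulk is
all-or-nothing, burst transfers as many objects as fit. The toy model
below illustrates only that difference. It is not rte_ring: it stores
no data, is not thread-safe, and every name in it is hypothetical.

```c
#include <assert.h>

/* Toy model: tracks occupancy only. */
struct toy_ring {
	unsigned int capacity;
	unsigned int used;
};

/* Bulk semantics: enqueue all n objects or none. Returns n or 0. */
static unsigned int
toy_enqueue_bulk(struct toy_ring *r, unsigned int n)
{
	assert(r->used <= r->capacity);
	if (n > r->capacity - r->used)
		return 0;
	r->used += n;
	return n;
}

/* Burst semantics: enqueue as many as fit. Returns the count enqueued. */
static unsigned int
toy_enqueue_burst(struct toy_ring *r, unsigned int n)
{
	unsigned int free_slots;

	assert(r->used <= r->capacity);
	free_slots = r->capacity - r->used;
	if (n > free_slots)
		n = free_slots;
	r->used += n;
	return n;
}
```

With two free slots, a bulk request for four objects enqueues nothing,
while a burst request for four enqueues two.
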
 app/test/test_ring_perf.c | 78 ++++++++++++++++++++-------------------
 1 file changed, 40 insertions(+), 38 deletions(-)

diff --git a/app/test/test_ring_perf.c b/app/test/test_ring_perf.c
index 5829718c1..508c688dc 100644
--- a/app/test/test_ring_perf.c
+++ b/app/test/test_ring_perf.c
@@ -397,47 +397,40 @@ test_single_enqueue_dequeue(struct rte_ring *r, const int esize,
 }
 
 /*
- * Test that does both enqueue and dequeue on a core using the burst() API calls
- * instead of the bulk() calls used in other tests. Results should be the same
- * as for the bulk function called on a single lcore.
+ * Test that does both enqueue and dequeue on a core using the burst/bulk API
+ * calls. Results should be the same as for the bulk function called on a
+ * single lcore.
  */
-static void
-test_burst_enqueue_dequeue(struct rte_ring *r)
+static int
+test_burst_bulk_enqueue_dequeue(struct rte_ring *r, const int esize,
+	const unsigned int api_type)
 {
-	const unsigned iter_shift = 23;
-	const unsigned iterations = 1<<iter_shift;
-	unsigned sz, i = 0;
-	void *burst[MAX_BURST] = {0};
+	int ret;
+	const unsigned int iter_shift = 23;
+	const unsigned int iterations = 1 << iter_shift;
+	unsigned int sz, i = 0;
+	void **burst = NULL;
 
-	for (sz = 0; sz < sizeof(bulk_sizes)/sizeof(bulk_sizes[0]); sz++) {
-		const uint64_t sc_start = rte_rdtsc();
-		for (i = 0; i < iterations; i++) {
-			rte_ring_sp_enqueue_burst(r, burst,
-					bulk_sizes[sz], NULL);
-			rte_ring_sc_dequeue_burst(r, burst,
-					bulk_sizes[sz], NULL);
-		}
-		const uint64_t sc_end = rte_rdtsc();
+	(void)ret;
+	burst = test_ring_calloc(MAX_BURST, esize);
+	if (burst == NULL)
+		return -1;
 
-		const uint64_t mc_start = rte_rdtsc();
+	for (sz = 0; sz < RTE_DIM(bulk_sizes); sz++) {
+		const uint64_t start = rte_rdtsc();
 		for (i = 0; i < iterations; i++) {
-			rte_ring_mp_enqueue_burst(r, burst,
-					bulk_sizes[sz], NULL);
-			rte_ring_mc_dequeue_burst(r, burst,
-					bulk_sizes[sz], NULL);
+			TEST_RING_ENQUEUE(r, burst, esize, bulk_sizes[sz],
+						ret, api_type);
+			TEST_RING_DEQUEUE(r, burst, esize, bulk_sizes[sz],
+						ret, api_type);
 		}
-		const uint64_t mc_end = rte_rdtsc();
-
-		double mc_avg = ((double)(mc_end-mc_start) / iterations) /
-					bulk_sizes[sz];
-		double sc_avg = ((double)(sc_end-sc_start) / iterations) /
-					bulk_sizes[sz];
+		const uint64_t end = rte_rdtsc();
 
-		printf("SP/SC burst enq/dequeue (size: %u): %.2F\n",
-				bulk_sizes[sz], sc_avg);
-		printf("MP/MC burst enq/dequeue (size: %u): %.2F\n",
-				bulk_sizes[sz], mc_avg);
+		test_ring_print_test_string(api_type, esize, bulk_sizes[sz],
+					((double)(end - start)) / iterations);
 	}
+
+	return 0;
 }
 
 /* Times enqueue and dequeue on a single lcore */
@@ -499,7 +492,13 @@ test_ring_perf(void)
 		return -1;
 	if (test_single_enqueue_dequeue(r, -1, TEST_RING_M | TEST_RING_SL) < 0)
 		return -1;
-
+	printf("\n### Testing burst enq/deq ###\n");
+	if (test_burst_bulk_enqueue_dequeue(r, -1,
+			TEST_RING_S | TEST_RING_BR) < 0)
+		return -1;
+	if (test_burst_bulk_enqueue_dequeue(r, -1,
+			TEST_RING_M | TEST_RING_BR) < 0)
+		return -1;
 	rte_ring_free(r);
 
 	TEST_RING_CREATE(RING_NAME, 16, RING_SIZE, rte_socket_id(), 0, r);
@@ -511,16 +510,19 @@ test_ring_perf(void)
 		return -1;
 	if (test_single_enqueue_dequeue(r, 16, TEST_RING_M | TEST_RING_SL) < 0)
 		return -1;
-
+	printf("\n### Testing burst enq/deq ###\n");
+	if (test_burst_bulk_enqueue_dequeue(r, 16,
+			TEST_RING_S | TEST_RING_BR) < 0)
+		return -1;
+	if (test_burst_bulk_enqueue_dequeue(r, 16,
+			TEST_RING_M | TEST_RING_BR) < 0)
+		return -1;
 	rte_ring_free(r);
 
 	r = rte_ring_create(RING_NAME, RING_SIZE, rte_socket_id(), 0);
 	if (r == NULL)
 		return -1;
 
-	printf("### Testing single element and burst enq/deq ###\n");
-	test_burst_enqueue_dequeue(r);
-
 	printf("\n### Testing empty dequeue ###\n");
 	test_empty_dequeue(r);
 
-- 
2.17.1



* [dpdk-dev] [PATCH v7 12/17] test/ring: modify bulk enq/deq perf test cases
  2019-12-20  4:45   ` [dpdk-dev] [PATCH v7 00/17] " Honnappa Nagarahalli
                       ` (10 preceding siblings ...)
  2019-12-20  4:45     ` [dpdk-dev] [PATCH v7 11/17] test/ring: modify burst " Honnappa Nagarahalli
@ 2019-12-20  4:45     ` Honnappa Nagarahalli
  2019-12-20  4:45     ` [dpdk-dev] [PATCH v7 13/17] test/ring: modify bulk empty deq " Honnappa Nagarahalli
                       ` (4 subsequent siblings)
  16 siblings, 0 replies; 173+ messages in thread
From: Honnappa Nagarahalli @ 2019-12-20  4:45 UTC (permalink / raw)
  To: olivier.matz, sthemmin, jerinj, bruce.richardson, david.marchand,
	pbhagavatula, konstantin.ananyev, honnappa.nagarahalli
  Cc: dev, dharmik.thakkar, ruifeng.wang, gavin.hu, nd

Modify the bulk enqueue/dequeue perf test cases to cover both the
legacy and rte_ring_xxx_elem APIs.

Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Gavin Hu <gavin.hu@arm.com>
---
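A note on reading the numbers: the removed test_bulk_enqueue_dequeue()
divided the elapsed cycles by (iterations * bulk_sizes[sz]), that is,
cycles per element, whereas test_burst_bulk_enqueue_dequeue() as
posted in this series divides by iterations only, that is, cycles per
enqueue/dequeue call pair. The two normalizations can be sketched as
follows; both function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Average cycles for one enqueue + dequeue call pair. */
static double
cycles_per_iteration(uint64_t start, uint64_t end, unsigned int iterations)
{
	assert(iterations > 0 && end >= start);
	return (double)(end - start) / iterations;
}

/* Average cycles per element moved by one call pair. */
static double
cycles_per_element(uint64_t start, uint64_t end, unsigned int iterations,
	unsigned int bulk_size)
{
	assert(bulk_size > 0);
	/* Convert before multiplying so the product cannot overflow. */
	return (double)(end - start) / ((double)iterations * bulk_size);
}
```

For the same raw measurement the two figures differ by a factor of the
bulk size, so results printed before and after this change should be
compared with that in mind.
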
 app/test/test_ring_perf.c | 57 ++++++++++-----------------------------
 1 file changed, 14 insertions(+), 43 deletions(-)

diff --git a/app/test/test_ring_perf.c b/app/test/test_ring_perf.c
index 508c688dc..8a543b6f0 100644
--- a/app/test/test_ring_perf.c
+++ b/app/test/test_ring_perf.c
@@ -433,46 +433,6 @@ test_burst_bulk_enqueue_dequeue(struct rte_ring *r, const int esize,
 	return 0;
 }
 
-/* Times enqueue and dequeue on a single lcore */
-static void
-test_bulk_enqueue_dequeue(struct rte_ring *r)
-{
-	const unsigned iter_shift = 23;
-	const unsigned iterations = 1<<iter_shift;
-	unsigned sz, i = 0;
-	void *burst[MAX_BURST] = {0};
-
-	for (sz = 0; sz < sizeof(bulk_sizes)/sizeof(bulk_sizes[0]); sz++) {
-		const uint64_t sc_start = rte_rdtsc();
-		for (i = 0; i < iterations; i++) {
-			rte_ring_sp_enqueue_bulk(r, burst,
-					bulk_sizes[sz], NULL);
-			rte_ring_sc_dequeue_bulk(r, burst,
-					bulk_sizes[sz], NULL);
-		}
-		const uint64_t sc_end = rte_rdtsc();
-
-		const uint64_t mc_start = rte_rdtsc();
-		for (i = 0; i < iterations; i++) {
-			rte_ring_mp_enqueue_bulk(r, burst,
-					bulk_sizes[sz], NULL);
-			rte_ring_mc_dequeue_bulk(r, burst,
-					bulk_sizes[sz], NULL);
-		}
-		const uint64_t mc_end = rte_rdtsc();
-
-		double sc_avg = ((double)(sc_end-sc_start) /
-				(iterations * bulk_sizes[sz]));
-		double mc_avg = ((double)(mc_end-mc_start) /
-				(iterations * bulk_sizes[sz]));
-
-		printf("SP/SC bulk enq/dequeue (size: %u): %.2F\n", bulk_sizes[sz],
-				sc_avg);
-		printf("MP/MC bulk enq/dequeue (size: %u): %.2F\n", bulk_sizes[sz],
-				mc_avg);
-	}
-}
-
 static int
 test_ring_perf(void)
 {
@@ -499,6 +459,13 @@ test_ring_perf(void)
 	if (test_burst_bulk_enqueue_dequeue(r, -1,
 			TEST_RING_M | TEST_RING_BR) < 0)
 		return -1;
+	printf("\n### Testing bulk enq/deq ###\n");
+	if (test_burst_bulk_enqueue_dequeue(r, -1,
+			TEST_RING_S | TEST_RING_BL) < 0)
+		return -1;
+	if (test_burst_bulk_enqueue_dequeue(r, -1,
+			TEST_RING_M | TEST_RING_BL) < 0)
+		return -1;
 	rte_ring_free(r);
 
 	TEST_RING_CREATE(RING_NAME, 16, RING_SIZE, rte_socket_id(), 0, r);
@@ -517,6 +484,13 @@ test_ring_perf(void)
 	if (test_burst_bulk_enqueue_dequeue(r, 16,
 			TEST_RING_M | TEST_RING_BR) < 0)
 		return -1;
+	printf("\n### Testing bulk enq/deq ###\n");
+	if (test_burst_bulk_enqueue_dequeue(r, 16,
+			TEST_RING_S | TEST_RING_BL) < 0)
+		return -1;
+	if (test_burst_bulk_enqueue_dequeue(r, 16,
+			TEST_RING_M | TEST_RING_BL) < 0)
+		return -1;
 	rte_ring_free(r);
 
 	r = rte_ring_create(RING_NAME, RING_SIZE, rte_socket_id(), 0);
@@ -526,9 +500,6 @@ test_ring_perf(void)
 	printf("\n### Testing empty dequeue ###\n");
 	test_empty_dequeue(r);
 
-	printf("\n### Testing using a single lcore ###\n");
-	test_bulk_enqueue_dequeue(r);
-
 	if (get_two_hyperthreads(&cores) == 0) {
 		printf("\n### Testing using two hyperthreads ###\n");
 		run_on_core_pair(&cores, r, enqueue_bulk, dequeue_bulk);
-- 
2.17.1



* [dpdk-dev] [PATCH v7 13/17] test/ring: modify bulk empty deq perf test cases
  2019-12-20  4:45   ` [dpdk-dev] [PATCH v7 00/17] " Honnappa Nagarahalli
                       ` (11 preceding siblings ...)
  2019-12-20  4:45     ` [dpdk-dev] [PATCH v7 12/17] test/ring: modify bulk " Honnappa Nagarahalli
@ 2019-12-20  4:45     ` Honnappa Nagarahalli
  2019-12-20  4:45     ` [dpdk-dev] [PATCH v7 14/17] test/ring: modify multi-lcore " Honnappa Nagarahalli
                       ` (3 subsequent siblings)
  16 siblings, 0 replies; 173+ messages in thread
From: Honnappa Nagarahalli @ 2019-12-20  4:45 UTC (permalink / raw)
  To: olivier.matz, sthemmin, jerinj, bruce.richardson, david.marchand,
	pbhagavatula, konstantin.ananyev, honnappa.nagarahalli
  Cc: dev, dharmik.thakkar, ruifeng.wang, gavin.hu, nd

Modify the empty bulk dequeue perf test cases to cover both the
legacy and rte_ring_xxx_elem APIs.

Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Gavin Hu <gavin.hu@arm.com>
---
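The measurement pattern used by test_empty_dequeue() is: read the
cycle counter, repeat the operation 2^iter_shift times, read the
counter again, and divide the difference by the iteration count. The
sketch below shows that pattern with deterministic stand-ins so it can
run without DPDK. fake_rdtsc() and fake_empty_dequeue() are
hypothetical substitutes for rte_rdtsc() and a dequeue attempt on an
empty ring, and the cost of 3 "cycles" is an arbitrary assumption.

```c
#include <assert.h>
#include <stdint.h>

/* Deterministic stand-in for the TSC: advanced only by the fake operation. */
static uint64_t fake_tsc;

static uint64_t
fake_rdtsc(void)
{
	return fake_tsc;
}

/* Stand-in for a dequeue on an empty ring: costs 3 ticks, returns 0 objects. */
static unsigned int
fake_empty_dequeue(void)
{
	fake_tsc += 3;
	return 0;
}

/* Average ticks per call over 2^iter_shift iterations. */
static double
measure_empty_dequeue(unsigned int iter_shift)
{
	unsigned int i;

	assert(iter_shift < 32);
	const unsigned int iterations = 1u << iter_shift;

	const uint64_t start = fake_rdtsc();
	for (i = 0; i < iterations; i++)
		(void)fake_empty_dequeue();
	const uint64_t end = fake_rdtsc();

	return (double)(end - start) / iterations;
}
```

Because the ring is empty, each dequeue attempt returns without
copying anything, so the figure reflects only the failure path.
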
 app/test/test_ring_perf.c | 38 +++++++++++++++++++-------------------
 1 file changed, 19 insertions(+), 19 deletions(-)

diff --git a/app/test/test_ring_perf.c b/app/test/test_ring_perf.c
index 8a543b6f0..0f578c9ae 100644
--- a/app/test/test_ring_perf.c
+++ b/app/test/test_ring_perf.c
@@ -147,27 +147,24 @@ get_two_sockets(struct lcore_pair *lcp)
 
 /* Get cycle counts for dequeuing from an empty ring. Should be 2 or 3 cycles */
 static void
-test_empty_dequeue(struct rte_ring *r)
+test_empty_dequeue(struct rte_ring *r, const int esize,
+			const unsigned int api_type)
 {
-	const unsigned iter_shift = 26;
-	const unsigned iterations = 1<<iter_shift;
-	unsigned i = 0;
+	int ret;
+	const unsigned int iter_shift = 26;
+	const unsigned int iterations = 1 << iter_shift;
+	unsigned int i = 0;
 	void *burst[MAX_BURST];
 
-	const uint64_t sc_start = rte_rdtsc();
-	for (i = 0; i < iterations; i++)
-		rte_ring_sc_dequeue_bulk(r, burst, bulk_sizes[0], NULL);
-	const uint64_t sc_end = rte_rdtsc();
-
-	const uint64_t mc_start = rte_rdtsc();
+	(void)ret;
+	const uint64_t start = rte_rdtsc();
 	for (i = 0; i < iterations; i++)
-		rte_ring_mc_dequeue_bulk(r, burst, bulk_sizes[0], NULL);
-	const uint64_t mc_end = rte_rdtsc();
+		TEST_RING_DEQUEUE(r, burst, esize, bulk_sizes[0],
+					ret, api_type);
+	const uint64_t end = rte_rdtsc();
 
-	printf("SC empty dequeue: %.2F\n",
-			(double)(sc_end-sc_start) / iterations);
-	printf("MC empty dequeue: %.2F\n",
-			(double)(mc_end-mc_start) / iterations);
+	test_ring_print_test_string(api_type, esize, bulk_sizes[0],
+					((double)(end - start)) / iterations);
 }
 
 /*
@@ -466,6 +463,9 @@ test_ring_perf(void)
 	if (test_burst_bulk_enqueue_dequeue(r, -1,
 			TEST_RING_M | TEST_RING_BL) < 0)
 		return -1;
+	printf("\n### Testing empty bulk deq ###\n");
+	test_empty_dequeue(r, -1, TEST_RING_S | TEST_RING_BL);
+	test_empty_dequeue(r, -1, TEST_RING_M | TEST_RING_BL);
 	rte_ring_free(r);
 
 	TEST_RING_CREATE(RING_NAME, 16, RING_SIZE, rte_socket_id(), 0, r);
@@ -491,15 +491,15 @@ test_ring_perf(void)
 	if (test_burst_bulk_enqueue_dequeue(r, 16,
 			TEST_RING_M | TEST_RING_BL) < 0)
 		return -1;
+	printf("\n### Testing empty bulk deq ###\n");
+	test_empty_dequeue(r, 16, TEST_RING_S | TEST_RING_BL);
+	test_empty_dequeue(r, 16, TEST_RING_M | TEST_RING_BL);
 	rte_ring_free(r);
 
 	r = rte_ring_create(RING_NAME, RING_SIZE, rte_socket_id(), 0);
 	if (r == NULL)
 		return -1;
 
-	printf("\n### Testing empty dequeue ###\n");
-	test_empty_dequeue(r);
-
 	if (get_two_hyperthreads(&cores) == 0) {
 		printf("\n### Testing using two hyperthreads ###\n");
 		run_on_core_pair(&cores, r, enqueue_bulk, dequeue_bulk);
-- 
2.17.1



* [dpdk-dev] [PATCH v7 14/17] test/ring: modify multi-lcore perf test cases
  2019-12-20  4:45   ` [dpdk-dev] [PATCH v7 00/17] " Honnappa Nagarahalli
                       ` (12 preceding siblings ...)
  2019-12-20  4:45     ` [dpdk-dev] [PATCH v7 13/17] test/ring: modify bulk empty deq " Honnappa Nagarahalli
@ 2019-12-20  4:45     ` Honnappa Nagarahalli
  2019-12-20  4:45     ` [dpdk-dev] [PATCH v7 15/17] test/ring: adjust run-on-all-cores " Honnappa Nagarahalli
                       ` (2 subsequent siblings)
  16 siblings, 0 replies; 173+ messages in thread
From: Honnappa Nagarahalli @ 2019-12-20  4:45 UTC (permalink / raw)
  To: olivier.matz, sthemmin, jerinj, bruce.richardson, david.marchand,
	pbhagavatula, konstantin.ananyev, honnappa.nagarahalli
  Cc: dev, dharmik.thakkar, ruifeng.wang, gavin.hu, nd

Modify the multi-lcore perf test cases to measure both the legacy
and rte_ring_xxx_elem APIs.

Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Gavin Hu <gavin.hu@arm.com>
---
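The functions launched on worker lcores receive a single void *
argument, so the element size cannot be passed as an extra parameter.
This patch works around that with thin wrappers (enqueue_bulk,
enqueue_bulk_16B, and so on) that bind the flag and esize before
calling one shared helper. The sketch below shows the same pattern
without DPDK. All names are hypothetical, and demo_launch() merely
calls the function directly, whereas rte_eal_remote_launch() runs it
on another lcore.

```c
#include <assert.h>
#include <stddef.h>

/* Same shape as DPDK's lcore_function_t: takes void *, returns int. */
typedef int (demo_lcore_fn_t)(void *);

/* Hypothetical stand-in for struct thread_params. */
struct demo_params {
	unsigned int flag; /* recorded by the helper: 0 = enqueue, 1 = dequeue */
	int esize;         /* recorded by the helper: -1 = legacy, else bytes */
};

/* Shared helper: the real one runs the timed enqueue/dequeue loops. */
static inline int
demo_helper(unsigned int flag, int esize, struct demo_params *p)
{
	assert(p != NULL);
	p->flag = flag;
	p->esize = esize;
	return 0;
}

/* Thin wrappers: each fixes (flag, esize) behind the void * signature. */
static int
demo_enqueue_bulk(void *p)
{
	return demo_helper(0, -1, p);
}

static int
demo_dequeue_bulk_16B(void *p)
{
	return demo_helper(1, 16, p);
}

/* Stand-in launcher: calls f in place instead of on a remote lcore. */
static int
demo_launch(demo_lcore_fn_t *f, void *arg)
{
	return f(arg);
}
```

Supporting another element size then needs only one more pair of
wrappers; the measurement loop itself is left untouched.
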
 app/test/test_ring_perf.c | 175 +++++++++++++++++++++++++-------------
 1 file changed, 115 insertions(+), 60 deletions(-)

diff --git a/app/test/test_ring_perf.c b/app/test/test_ring_perf.c
index 0f578c9ae..b893b5779 100644
--- a/app/test/test_ring_perf.c
+++ b/app/test/test_ring_perf.c
@@ -178,19 +178,21 @@ struct thread_params {
 };
 
 /*
- * Function that uses rdtsc to measure timing for ring enqueue. Needs pair
- * thread running dequeue_bulk function
+ * Helper function to call bulk SP/MP enqueue or SC/MC dequeue functions.
+ * flag == 0 -> enqueue
+ * flag == 1 -> dequeue
  */
-static int
-enqueue_bulk(void *p)
+static __rte_always_inline int
+enqueue_dequeue_bulk_helper(const unsigned int flag, const int esize,
+	struct thread_params *p)
 {
-	const unsigned iter_shift = 23;
-	const unsigned iterations = 1<<iter_shift;
-	struct thread_params *params = p;
-	struct rte_ring *r = params->r;
-	const unsigned size = params->size;
-	unsigned i;
-	void *burst[MAX_BURST] = {0};
+	int ret;
+	const unsigned int iter_shift = 23;
+	const unsigned int iterations = 1 << iter_shift;
+	struct rte_ring *r = p->r;
+	unsigned int bsize = p->size;
+	unsigned int i;
+	void *burst = NULL;
 
 #ifdef RTE_USE_C11_MEM_MODEL
 	if (__atomic_add_fetch(&lcore_count, 1, __ATOMIC_RELAXED) != 2)
@@ -200,23 +202,55 @@ enqueue_bulk(void *p)
 		while(lcore_count != 2)
 			rte_pause();
 
+	burst = test_ring_calloc(MAX_BURST, esize);
+	if (burst == NULL)
+		return -1;
+
 	const uint64_t sp_start = rte_rdtsc();
 	for (i = 0; i < iterations; i++)
-		while (rte_ring_sp_enqueue_bulk(r, burst, size, NULL) == 0)
-			rte_pause();
+		do {
+			if (flag == 0)
+				TEST_RING_ENQUEUE(r, burst, esize, bsize, ret,
+						TEST_RING_S | TEST_RING_BL);
+			else if (flag == 1)
+				TEST_RING_DEQUEUE(r, burst, esize, bsize, ret,
+						TEST_RING_S | TEST_RING_BL);
+			if (ret == 0)
+				rte_pause();
+		} while (!ret);
 	const uint64_t sp_end = rte_rdtsc();
 
 	const uint64_t mp_start = rte_rdtsc();
 	for (i = 0; i < iterations; i++)
-		while (rte_ring_mp_enqueue_bulk(r, burst, size, NULL) == 0)
-			rte_pause();
+		do {
+			if (flag == 0)
+				TEST_RING_ENQUEUE(r, burst, esize, bsize, ret,
+						TEST_RING_M | TEST_RING_BL);
+			else if (flag == 1)
+				TEST_RING_DEQUEUE(r, burst, esize, bsize, ret,
+						TEST_RING_M | TEST_RING_BL);
+			if (ret == 0)
+				rte_pause();
+		} while (!ret);
 	const uint64_t mp_end = rte_rdtsc();
 
-	params->spsc = ((double)(sp_end - sp_start))/(iterations*size);
-	params->mpmc = ((double)(mp_end - mp_start))/(iterations*size);
+	p->spsc = ((double)(sp_end - sp_start))/(iterations * bsize);
+	p->mpmc = ((double)(mp_end - mp_start))/(iterations * bsize);
 	return 0;
 }
 
+/*
+ * Function that uses rdtsc to measure timing for ring enqueue. Needs pair
+ * thread running dequeue_bulk function
+ */
+static int
+enqueue_bulk(void *p)
+{
+	struct thread_params *params = p;
+
+	return enqueue_dequeue_bulk_helper(0, -1, params);
+}
+
 /*
  * Function that uses rdtsc to measure timing for ring dequeue. Needs pair
  * thread running enqueue_bulk function
@@ -224,45 +258,41 @@ enqueue_bulk(void *p)
 static int
 dequeue_bulk(void *p)
 {
-	const unsigned iter_shift = 23;
-	const unsigned iterations = 1<<iter_shift;
 	struct thread_params *params = p;
-	struct rte_ring *r = params->r;
-	const unsigned size = params->size;
-	unsigned i;
-	void *burst[MAX_BURST] = {0};
 
-#ifdef RTE_USE_C11_MEM_MODEL
-	if (__atomic_add_fetch(&lcore_count, 1, __ATOMIC_RELAXED) != 2)
-#else
-	if (__sync_add_and_fetch(&lcore_count, 1) != 2)
-#endif
-		while(lcore_count != 2)
-			rte_pause();
+	return enqueue_dequeue_bulk_helper(1, -1, params);
+}
 
-	const uint64_t sc_start = rte_rdtsc();
-	for (i = 0; i < iterations; i++)
-		while (rte_ring_sc_dequeue_bulk(r, burst, size, NULL) == 0)
-			rte_pause();
-	const uint64_t sc_end = rte_rdtsc();
+/*
+ * Function that uses rdtsc to measure timing for ring enqueue. Needs pair
+ * thread running dequeue_bulk_16B function
+ */
+static int
+enqueue_bulk_16B(void *p)
+{
+	struct thread_params *params = p;
 
-	const uint64_t mc_start = rte_rdtsc();
-	for (i = 0; i < iterations; i++)
-		while (rte_ring_mc_dequeue_bulk(r, burst, size, NULL) == 0)
-			rte_pause();
-	const uint64_t mc_end = rte_rdtsc();
+	return enqueue_dequeue_bulk_helper(0, 16, params);
+}
 
-	params->spsc = ((double)(sc_end - sc_start))/(iterations*size);
-	params->mpmc = ((double)(mc_end - mc_start))/(iterations*size);
-	return 0;
+/*
+ * Function that uses rdtsc to measure timing for ring dequeue. Needs pair
+ * thread running enqueue_bulk_16B function
+ */
+static int
+dequeue_bulk_16B(void *p)
+{
+	struct thread_params *params = p;
+
+	return enqueue_dequeue_bulk_helper(1, 16, params);
 }
 
 /*
  * Function that calls the enqueue and dequeue bulk functions on pairs of cores.
  * used to measure ring perf between hyperthreads, cores and sockets.
  */
-static void
-run_on_core_pair(struct lcore_pair *cores, struct rte_ring *r,
+static int
+run_on_core_pair(struct lcore_pair *cores, struct rte_ring *r, int esize,
 		lcore_function_t f1, lcore_function_t f2)
 {
 	struct thread_params param1 = {0}, param2 = {0};
@@ -278,14 +308,20 @@ run_on_core_pair(struct lcore_pair *cores, struct rte_ring *r,
 		} else {
 			rte_eal_remote_launch(f1, &param1, cores->c1);
 			rte_eal_remote_launch(f2, &param2, cores->c2);
-			rte_eal_wait_lcore(cores->c1);
-			rte_eal_wait_lcore(cores->c2);
+			if (rte_eal_wait_lcore(cores->c1) < 0)
+				return -1;
+			if (rte_eal_wait_lcore(cores->c2) < 0)
+				return -1;
 		}
-		printf("SP/SC bulk enq/dequeue (size: %u): %.2F\n", bulk_sizes[i],
-				param1.spsc + param2.spsc);
-		printf("MP/MC bulk enq/dequeue (size: %u): %.2F\n", bulk_sizes[i],
-				param1.mpmc + param2.mpmc);
+		test_ring_print_test_string(TEST_RING_S | TEST_RING_BL, esize,
+						bulk_sizes[i],
+						param1.spsc + param2.spsc);
+		test_ring_print_test_string(TEST_RING_M | TEST_RING_BL, esize,
+						bulk_sizes[i],
+						param1.mpmc + param2.mpmc);
 	}
+
+	return 0;
 }
 
 static rte_atomic32_t synchro;
@@ -466,6 +502,24 @@ test_ring_perf(void)
 	printf("\n### Testing empty bulk deq ###\n");
 	test_empty_dequeue(r, -1, TEST_RING_S | TEST_RING_BL);
 	test_empty_dequeue(r, -1, TEST_RING_M | TEST_RING_BL);
+	if (get_two_hyperthreads(&cores) == 0) {
+		printf("\n### Testing using two hyperthreads ###\n");
+		if (run_on_core_pair(&cores, r, -1, enqueue_bulk,
+					dequeue_bulk) < 0)
+			return -1;
+	}
+	if (get_two_cores(&cores) == 0) {
+		printf("\n### Testing using two physical cores ###\n");
+		if (run_on_core_pair(&cores, r, -1, enqueue_bulk,
+					dequeue_bulk) < 0)
+			return -1;
+	}
+	if (get_two_sockets(&cores) == 0) {
+		printf("\n### Testing using two NUMA nodes ###\n");
+		if (run_on_core_pair(&cores, r, -1, enqueue_bulk,
+					dequeue_bulk) < 0)
+			return -1;
+	}
 	rte_ring_free(r);
 
 	TEST_RING_CREATE(RING_NAME, 16, RING_SIZE, rte_socket_id(), 0, r);
@@ -494,29 +548,30 @@ test_ring_perf(void)
 	printf("\n### Testing empty bulk deq ###\n");
 	test_empty_dequeue(r, 16, TEST_RING_S | TEST_RING_BL);
 	test_empty_dequeue(r, 16, TEST_RING_M | TEST_RING_BL);
-	rte_ring_free(r);
-
-	r = rte_ring_create(RING_NAME, RING_SIZE, rte_socket_id(), 0);
-	if (r == NULL)
-		return -1;
-
 	if (get_two_hyperthreads(&cores) == 0) {
 		printf("\n### Testing using two hyperthreads ###\n");
-		run_on_core_pair(&cores, r, enqueue_bulk, dequeue_bulk);
+		if (run_on_core_pair(&cores, r, 16, enqueue_bulk_16B,
+					dequeue_bulk_16B) < 0)
+			return -1;
 	}
 	if (get_two_cores(&cores) == 0) {
 		printf("\n### Testing using two physical cores ###\n");
-		run_on_core_pair(&cores, r, enqueue_bulk, dequeue_bulk);
+		if (run_on_core_pair(&cores, r, 16, enqueue_bulk_16B,
+					dequeue_bulk_16B) < 0)
+			return -1;
 	}
 	if (get_two_sockets(&cores) == 0) {
 		printf("\n### Testing using two NUMA nodes ###\n");
-		run_on_core_pair(&cores, r, enqueue_bulk, dequeue_bulk);
+		if (run_on_core_pair(&cores, r, 16, enqueue_bulk_16B,
+					dequeue_bulk_16B) < 0)
+			return -1;
 	}
 
 	printf("\n### Testing using all slave nodes ###\n");
 	run_on_all_cores(r);
 
 	rte_ring_free(r);
+
 	return 0;
 }
 
-- 
2.17.1
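
A note for readers of the perf test change above: a launch callback only
receives a single void * argument, so the patch adds one thin wrapper per
(operation, element size) combination and forwards to a shared helper. Below
is a minimal standalone sketch of that pattern with no DPDK dependency. The
names bulk_helper, struct params, launch_fn and run_pair are hypothetical
stand-ins chosen for illustration; they are not the test's actual code, and
the pair is run sequentially here rather than on two lcores.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the test's per-thread parameters. */
struct params {
	int is_dequeue; /* 0 = enqueue side, 1 = dequeue side */
	int esize;      /* -1 = legacy pointer API, else element size in bytes */
};

/* Callback type with the same shape as a launch function: int f(void *). */
typedef int (*launch_fn)(void *);

/* Shared helper. Its extra arguments cannot travel through launch_fn. */
static int
bulk_helper(int is_dequeue, int esize, struct params *p)
{
	assert(p != NULL);
	assert(esize == -1 || esize > 0);
	p->is_dequeue = is_dequeue;
	p->esize = esize;
	return 0;
}

/* One thin wrapper per combination binds the constant arguments. */
static int enqueue_legacy(void *p) { return bulk_helper(0, -1, p); }
static int dequeue_legacy(void *p) { return bulk_helper(1, -1, p); }
static int enqueue_16B(void *p)    { return bulk_helper(0, 16, p); }
static int dequeue_16B(void *p)    { return bulk_helper(1, 16, p); }

/* Run both callbacks in turn and propagate the first failure. */
static int
run_pair(launch_fn f1, launch_fn f2, struct params *p1, struct params *p2)
{
	if (f1(p1) < 0)
		return -1;
	if (f2(p2) < 0)
		return -1;
	return 0;
}
```

Calling run_pair(enqueue_16B, dequeue_16B, &a, &b) would leave a.esize and
b.esize at 16, with only b marked as the dequeue side. The error propagation
mirrors the change of run_on_core_pair from void to int.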


* [dpdk-dev] [PATCH v7 15/17] test/ring: adjust run-on-all-cores perf test cases
  2019-12-20  4:45   ` [dpdk-dev] [PATCH v7 00/17] " Honnappa Nagarahalli
                       ` (13 preceding siblings ...)
  2019-12-20  4:45     ` [dpdk-dev] [PATCH v7 14/17] test/ring: modify multi-lcore " Honnappa Nagarahalli
@ 2019-12-20  4:45     ` Honnappa Nagarahalli
  2020-01-02 17:00       ` Ananyev, Konstantin
  2019-12-20  4:45     ` [dpdk-dev] [PATCH v7 16/17] lib/hash: use ring with 32b element size to save memory Honnappa Nagarahalli
  2019-12-20  4:45     ` [dpdk-dev] [PATCH v7 17/17] lib/eventdev: use custom element size ring for event rings Honnappa Nagarahalli
  16 siblings, 1 reply; 173+ messages in thread
From: Honnappa Nagarahalli @ 2019-12-20  4:45 UTC (permalink / raw)
  To: olivier.matz, sthemmin, jerinj, bruce.richardson, david.marchand,
	pbhagavatula, konstantin.ananyev, honnappa.nagarahalli
  Cc: dev, dharmik.thakkar, ruifeng.wang, gavin.hu, nd

Adjust run-on-all-cores test case to use legacy APIs.

Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Gavin Hu <gavin.hu@arm.com>
---
 app/test/test_ring_perf.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/app/test/test_ring_perf.c b/app/test/test_ring_perf.c
index b893b5779..fb95e4f2c 100644
--- a/app/test/test_ring_perf.c
+++ b/app/test/test_ring_perf.c
@@ -520,6 +520,9 @@ test_ring_perf(void)
 					dequeue_bulk) < 0)
 			return -1;
 	}
+	printf("\n### Testing using all slave nodes ###\n");
+	if (run_on_all_cores(r) < 0)
+		return -1;
 	rte_ring_free(r);
 
 	TEST_RING_CREATE(RING_NAME, 16, RING_SIZE, rte_socket_id(), 0, r);
@@ -567,9 +570,6 @@ test_ring_perf(void)
 			return -1;
 	}
 
-	printf("\n### Testing using all slave nodes ###\n");
-	run_on_all_cores(r);
-
 	rte_ring_free(r);
 
 	return 0;
-- 
2.17.1


* [dpdk-dev] [PATCH v7 16/17] lib/hash: use ring with 32b element size to save memory
  2019-12-20  4:45   ` [dpdk-dev] [PATCH v7 00/17] " Honnappa Nagarahalli
                       ` (14 preceding siblings ...)
  2019-12-20  4:45     ` [dpdk-dev] [PATCH v7 15/17] test/ring: adjust run-on-all-cores " Honnappa Nagarahalli
@ 2019-12-20  4:45     ` Honnappa Nagarahalli
  2019-12-20  4:45     ` [dpdk-dev] [PATCH v7 17/17] lib/eventdev: use custom element size ring for event rings Honnappa Nagarahalli
  16 siblings, 0 replies; 173+ messages in thread
From: Honnappa Nagarahalli @ 2019-12-20  4:45 UTC (permalink / raw)
  To: olivier.matz, sthemmin, jerinj, bruce.richardson, david.marchand,
	pbhagavatula, konstantin.ananyev, honnappa.nagarahalli
  Cc: dev, dharmik.thakkar, ruifeng.wang, gavin.hu, nd

The free slot list and external bucket indices are 32b wide. Storing them
in rings with 32b elements instead of pointer-sized elements saves memory:
on 64-bit targets each ring slot shrinks from 8B to 4B.

Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Gavin Hu <gavin.hu@arm.com>
Reviewed-by: Ola Liljedahl <ola.liljedahl@arm.com>
---
 lib/librte_hash/rte_cuckoo_hash.c | 97 ++++++++++++++++---------------
 lib/librte_hash/rte_cuckoo_hash.h |  2 +-
 2 files changed, 51 insertions(+), 48 deletions(-)

diff --git a/lib/librte_hash/rte_cuckoo_hash.c b/lib/librte_hash/rte_cuckoo_hash.c
index 87a4c01f2..734bec2ac 100644
--- a/lib/librte_hash/rte_cuckoo_hash.c
+++ b/lib/librte_hash/rte_cuckoo_hash.c
@@ -24,7 +24,7 @@
 #include <rte_cpuflags.h>
 #include <rte_rwlock.h>
 #include <rte_spinlock.h>
-#include <rte_ring.h>
+#include <rte_ring_elem.h>
 #include <rte_compat.h>
 #include <rte_vect.h>
 #include <rte_tailq.h>
@@ -136,7 +136,6 @@ rte_hash_create(const struct rte_hash_parameters *params)
 	char ring_name[RTE_RING_NAMESIZE];
 	char ext_ring_name[RTE_RING_NAMESIZE];
 	unsigned num_key_slots;
-	unsigned i;
 	unsigned int hw_trans_mem_support = 0, use_local_cache = 0;
 	unsigned int ext_table_support = 0;
 	unsigned int readwrite_concur_support = 0;
@@ -213,8 +212,8 @@ rte_hash_create(const struct rte_hash_parameters *params)
 
 	snprintf(ring_name, sizeof(ring_name), "HT_%s", params->name);
 	/* Create ring (Dummy slot index is not enqueued) */
-	r = rte_ring_create(ring_name, rte_align32pow2(num_key_slots),
-			params->socket_id, 0);
+	r = rte_ring_create_elem(ring_name, sizeof(uint32_t),
+			rte_align32pow2(num_key_slots), params->socket_id, 0);
 	if (r == NULL) {
 		RTE_LOG(ERR, HASH, "memory allocation failed\n");
 		goto err;
@@ -227,7 +226,7 @@ rte_hash_create(const struct rte_hash_parameters *params)
 	if (ext_table_support) {
 		snprintf(ext_ring_name, sizeof(ext_ring_name), "HT_EXT_%s",
 								params->name);
-		r_ext = rte_ring_create(ext_ring_name,
+		r_ext = rte_ring_create_elem(ext_ring_name, sizeof(uint32_t),
 				rte_align32pow2(num_buckets + 1),
 				params->socket_id, 0);
 
@@ -294,8 +293,8 @@ rte_hash_create(const struct rte_hash_parameters *params)
 		 * use bucket index for the linked list and 0 means NULL
 		 * for next bucket
 		 */
-		for (i = 1; i <= num_buckets; i++)
-			rte_ring_sp_enqueue(r_ext, (void *)((uintptr_t) i));
+		for (uint32_t i = 1; i <= num_buckets; i++)
+			rte_ring_sp_enqueue_elem(r_ext, &i, sizeof(uint32_t));
 
 		if (readwrite_concur_lf_support) {
 			ext_bkt_to_free = rte_zmalloc(NULL, sizeof(uint32_t) *
@@ -433,8 +432,8 @@ rte_hash_create(const struct rte_hash_parameters *params)
 	}
 
 	/* Populate free slots ring. Entry zero is reserved for key misses. */
-	for (i = 1; i < num_key_slots; i++)
-		rte_ring_sp_enqueue(r, (void *)((uintptr_t) i));
+	for (uint32_t i = 1; i < num_key_slots; i++)
+		rte_ring_sp_enqueue_elem(r, &i, sizeof(uint32_t));
 
 	te->data = (void *) h;
 	TAILQ_INSERT_TAIL(hash_list, te, next);
@@ -598,13 +597,13 @@ rte_hash_reset(struct rte_hash *h)
 		tot_ring_cnt = h->entries;
 
 	for (i = 1; i < tot_ring_cnt + 1; i++)
-		rte_ring_sp_enqueue(h->free_slots, (void *)((uintptr_t) i));
+		rte_ring_sp_enqueue_elem(h->free_slots, &i, sizeof(uint32_t));
 
 	/* Repopulate the free ext bkt ring. */
 	if (h->ext_table_support) {
 		for (i = 1; i <= h->num_buckets; i++)
-			rte_ring_sp_enqueue(h->free_ext_bkts,
-						(void *)((uintptr_t) i));
+			rte_ring_sp_enqueue_elem(h->free_ext_bkts, &i,
+							sizeof(uint32_t));
 	}
 
 	if (h->use_local_cache) {
@@ -623,13 +622,14 @@ rte_hash_reset(struct rte_hash *h)
 static inline void
 enqueue_slot_back(const struct rte_hash *h,
 		struct lcore_cache *cached_free_slots,
-		void *slot_id)
+		uint32_t slot_id)
 {
 	if (h->use_local_cache) {
 		cached_free_slots->objs[cached_free_slots->len] = slot_id;
 		cached_free_slots->len++;
 	} else
-		rte_ring_sp_enqueue(h->free_slots, slot_id);
+		rte_ring_sp_enqueue_elem(h->free_slots, &slot_id,
+						sizeof(uint32_t));
 }
 
 /* Search a key from bucket and update its data.
@@ -923,9 +923,8 @@ __rte_hash_add_key_with_hash(const struct rte_hash *h, const void *key,
 	uint32_t prim_bucket_idx, sec_bucket_idx;
 	struct rte_hash_bucket *prim_bkt, *sec_bkt, *cur_bkt;
 	struct rte_hash_key *new_k, *keys = h->key_store;
-	void *slot_id = NULL;
-	void *ext_bkt_id = NULL;
-	uint32_t new_idx, bkt_id;
+	uint32_t slot_id;
+	uint32_t ext_bkt_id;
 	int ret;
 	unsigned n_slots;
 	unsigned lcore_id;
@@ -968,8 +967,9 @@ __rte_hash_add_key_with_hash(const struct rte_hash *h, const void *key,
 		/* Try to get a free slot from the local cache */
 		if (cached_free_slots->len == 0) {
 			/* Need to get another burst of free slots from global ring */
-			n_slots = rte_ring_mc_dequeue_burst(h->free_slots,
+			n_slots = rte_ring_mc_dequeue_burst_elem(h->free_slots,
 					cached_free_slots->objs,
+					sizeof(uint32_t),
 					LCORE_CACHE_SIZE, NULL);
 			if (n_slots == 0) {
 				return -ENOSPC;
@@ -982,13 +982,13 @@ __rte_hash_add_key_with_hash(const struct rte_hash *h, const void *key,
 		cached_free_slots->len--;
 		slot_id = cached_free_slots->objs[cached_free_slots->len];
 	} else {
-		if (rte_ring_sc_dequeue(h->free_slots, &slot_id) != 0) {
+		if (rte_ring_sc_dequeue_elem(h->free_slots, &slot_id,
+						sizeof(uint32_t)) != 0) {
 			return -ENOSPC;
 		}
 	}
 
-	new_k = RTE_PTR_ADD(keys, (uintptr_t)slot_id * h->key_entry_size);
-	new_idx = (uint32_t)((uintptr_t) slot_id);
+	new_k = RTE_PTR_ADD(keys, slot_id * h->key_entry_size);
 	/* The store to application data (by the application) at *data should
 	 * not leak after the store of pdata in the key store. i.e. pdata is
 	 * the guard variable. Release the application data to the readers.
@@ -1001,9 +1001,9 @@ __rte_hash_add_key_with_hash(const struct rte_hash *h, const void *key,
 
 	/* Find an empty slot and insert */
 	ret = rte_hash_cuckoo_insert_mw(h, prim_bkt, sec_bkt, key, data,
-					short_sig, new_idx, &ret_val);
+					short_sig, slot_id, &ret_val);
 	if (ret == 0)
-		return new_idx - 1;
+		return slot_id - 1;
 	else if (ret == 1) {
 		enqueue_slot_back(h, cached_free_slots, slot_id);
 		return ret_val;
@@ -1011,9 +1011,9 @@ __rte_hash_add_key_with_hash(const struct rte_hash *h, const void *key,
 
 	/* Primary bucket full, need to make space for new entry */
 	ret = rte_hash_cuckoo_make_space_mw(h, prim_bkt, sec_bkt, key, data,
-				short_sig, prim_bucket_idx, new_idx, &ret_val);
+				short_sig, prim_bucket_idx, slot_id, &ret_val);
 	if (ret == 0)
-		return new_idx - 1;
+		return slot_id - 1;
 	else if (ret == 1) {
 		enqueue_slot_back(h, cached_free_slots, slot_id);
 		return ret_val;
@@ -1021,10 +1021,10 @@ __rte_hash_add_key_with_hash(const struct rte_hash *h, const void *key,
 
 	/* Also search secondary bucket to get better occupancy */
 	ret = rte_hash_cuckoo_make_space_mw(h, sec_bkt, prim_bkt, key, data,
-				short_sig, sec_bucket_idx, new_idx, &ret_val);
+				short_sig, sec_bucket_idx, slot_id, &ret_val);
 
 	if (ret == 0)
-		return new_idx - 1;
+		return slot_id - 1;
 	else if (ret == 1) {
 		enqueue_slot_back(h, cached_free_slots, slot_id);
 		return ret_val;
@@ -1067,10 +1067,10 @@ __rte_hash_add_key_with_hash(const struct rte_hash *h, const void *key,
 				 * and key.
 				 */
 				__atomic_store_n(&cur_bkt->key_idx[i],
-						 new_idx,
+						 slot_id,
 						 __ATOMIC_RELEASE);
 				__hash_rw_writer_unlock(h);
-				return new_idx - 1;
+				return slot_id - 1;
 			}
 		}
 	}
@@ -1078,26 +1078,26 @@ __rte_hash_add_key_with_hash(const struct rte_hash *h, const void *key,
 	/* Failed to get an empty entry from extendable buckets. Link a new
 	 * extendable bucket. We first get a free bucket from ring.
 	 */
-	if (rte_ring_sc_dequeue(h->free_ext_bkts, &ext_bkt_id) != 0) {
+	if (rte_ring_sc_dequeue_elem(h->free_ext_bkts, &ext_bkt_id,
+						sizeof(uint32_t)) != 0) {
 		ret = -ENOSPC;
 		goto failure;
 	}
 
-	bkt_id = (uint32_t)((uintptr_t)ext_bkt_id) - 1;
 	/* Use the first location of the new bucket */
-	(h->buckets_ext[bkt_id]).sig_current[0] = short_sig;
+	(h->buckets_ext[ext_bkt_id - 1]).sig_current[0] = short_sig;
 	/* Store to signature and key should not leak after
 	 * the store to key_idx. i.e. key_idx is the guard variable
 	 * for signature and key.
 	 */
-	__atomic_store_n(&(h->buckets_ext[bkt_id]).key_idx[0],
-			 new_idx,
+	__atomic_store_n(&(h->buckets_ext[ext_bkt_id - 1]).key_idx[0],
+			 slot_id,
 			 __ATOMIC_RELEASE);
 	/* Link the new bucket to sec bucket linked list */
 	last = rte_hash_get_last_bkt(sec_bkt);
-	last->next = &h->buckets_ext[bkt_id];
+	last->next = &h->buckets_ext[ext_bkt_id - 1];
 	__hash_rw_writer_unlock(h);
-	return new_idx - 1;
+	return slot_id - 1;
 
 failure:
 	__hash_rw_writer_unlock(h);
@@ -1373,8 +1373,9 @@ remove_entry(const struct rte_hash *h, struct rte_hash_bucket *bkt, unsigned i)
 		/* Cache full, need to free it. */
 		if (cached_free_slots->len == LCORE_CACHE_SIZE) {
 			/* Need to enqueue the free slots in global ring. */
-			n_slots = rte_ring_mp_enqueue_burst(h->free_slots,
+			n_slots = rte_ring_mp_enqueue_burst_elem(h->free_slots,
 						cached_free_slots->objs,
+						sizeof(uint32_t),
 						LCORE_CACHE_SIZE, NULL);
 			ERR_IF_TRUE((n_slots == 0),
 				"%s: could not enqueue free slots in global ring\n",
@@ -1383,11 +1384,11 @@ remove_entry(const struct rte_hash *h, struct rte_hash_bucket *bkt, unsigned i)
 		}
 		/* Put index of new free slot in cache. */
 		cached_free_slots->objs[cached_free_slots->len] =
-				(void *)((uintptr_t)bkt->key_idx[i]);
+							bkt->key_idx[i];
 		cached_free_slots->len++;
 	} else {
-		rte_ring_sp_enqueue(h->free_slots,
-				(void *)((uintptr_t)bkt->key_idx[i]));
+		rte_ring_sp_enqueue_elem(h->free_slots,
+				&bkt->key_idx[i], sizeof(uint32_t));
 	}
 }
 
@@ -1551,7 +1552,8 @@ __rte_hash_del_key_with_hash(const struct rte_hash *h, const void *key,
 			 */
 			h->ext_bkt_to_free[ret] = index;
 		else
-			rte_ring_sp_enqueue(h->free_ext_bkts, (void *)(uintptr_t)index);
+			rte_ring_sp_enqueue_elem(h->free_ext_bkts, &index,
+							sizeof(uint32_t));
 	}
 	__hash_rw_writer_unlock(h);
 	return ret;
@@ -1614,7 +1616,8 @@ rte_hash_free_key_with_position(const struct rte_hash *h,
 		uint32_t index = h->ext_bkt_to_free[position];
 		if (index) {
 			/* Recycle empty ext bkt to free list. */
-			rte_ring_sp_enqueue(h->free_ext_bkts, (void *)(uintptr_t)index);
+			rte_ring_sp_enqueue_elem(h->free_ext_bkts, &index,
+							sizeof(uint32_t));
 			h->ext_bkt_to_free[position] = 0;
 		}
 	}
@@ -1625,19 +1628,19 @@ rte_hash_free_key_with_position(const struct rte_hash *h,
 		/* Cache full, need to free it. */
 		if (cached_free_slots->len == LCORE_CACHE_SIZE) {
 			/* Need to enqueue the free slots in global ring. */
-			n_slots = rte_ring_mp_enqueue_burst(h->free_slots,
+			n_slots = rte_ring_mp_enqueue_burst_elem(h->free_slots,
 						cached_free_slots->objs,
+						sizeof(uint32_t),
 						LCORE_CACHE_SIZE, NULL);
 			RETURN_IF_TRUE((n_slots == 0), -EFAULT);
 			cached_free_slots->len -= n_slots;
 		}
 		/* Put index of new free slot in cache. */
-		cached_free_slots->objs[cached_free_slots->len] =
-					(void *)((uintptr_t)key_idx);
+		cached_free_slots->objs[cached_free_slots->len] = key_idx;
 		cached_free_slots->len++;
 	} else {
-		rte_ring_sp_enqueue(h->free_slots,
-				(void *)((uintptr_t)key_idx));
+		rte_ring_sp_enqueue_elem(h->free_slots, &key_idx,
+						sizeof(uint32_t));
 	}
 
 	return 0;
diff --git a/lib/librte_hash/rte_cuckoo_hash.h b/lib/librte_hash/rte_cuckoo_hash.h
index fb19bb27d..345de6bf9 100644
--- a/lib/librte_hash/rte_cuckoo_hash.h
+++ b/lib/librte_hash/rte_cuckoo_hash.h
@@ -124,7 +124,7 @@ const rte_hash_cmp_eq_t cmp_jump_table[NUM_KEY_CMP_CASES] = {
 
 struct lcore_cache {
 	unsigned len; /**< Cache len */
-	void *objs[LCORE_CACHE_SIZE]; /**< Cache objects */
+	uint32_t objs[LCORE_CACHE_SIZE]; /**< Cache objects */
 } __rte_cache_aligned;
 
 /* Structure that stores key-value pair */
-- 
2.17.1
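
A note for readers of the hash change above: the calls move from storing an
index disguised as a pointer, (void *)((uintptr_t) i), to passing the address
of a uint32_t whose value is copied into the ring. The sketch below is a toy
single-threaded ring, not DPDK code, that shows the copy-by-value behaviour
and the slot storage arithmetic. All toy_ring_* names are hypothetical, and
unlike rte_ring this toy uses every slot and has no synchronization.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Toy ring with a per-ring element size (illustration only). */
struct toy_ring {
	uint32_t esize;  /* bytes per element */
	uint32_t count;  /* number of slots, must be a power of two */
	uint32_t head;   /* free-running write index */
	uint32_t tail;   /* free-running read index */
	uint8_t *slots;  /* count * esize bytes of storage */
};

/* Bytes of slot storage needed for count elements of esize bytes. */
static size_t
toy_ring_slot_bytes(uint32_t count, uint32_t esize)
{
	return (size_t)count * esize;
}

static struct toy_ring *
toy_ring_create(uint32_t count, uint32_t esize)
{
	struct toy_ring *r;

	assert(esize != 0);
	assert(count != 0 && (count & (count - 1)) == 0);
	r = calloc(1, sizeof(*r));
	if (r == NULL)
		return NULL;
	r->slots = calloc(count, esize);
	if (r->slots == NULL) {
		free(r);
		return NULL;
	}
	r->esize = esize;
	r->count = count;
	return r;
}

static void
toy_ring_free(struct toy_ring *r)
{
	if (r == NULL)
		return;
	free(r->slots);
	free(r);
}

/* Copy one element into the ring. Returns 0 on success, -1 if full. */
static int
toy_ring_enqueue(struct toy_ring *r, const void *obj)
{
	size_t off;

	if (r->head - r->tail == r->count)
		return -1;
	off = (size_t)(r->head & (r->count - 1)) * r->esize;
	memcpy(r->slots + off, obj, r->esize);
	r->head++;
	return 0;
}

/* Copy one element out of the ring. Returns 0 on success, -1 if empty. */
static int
toy_ring_dequeue(struct toy_ring *r, void *obj)
{
	size_t off;

	if (r->head == r->tail)
		return -1;
	off = (size_t)(r->tail & (r->count - 1)) * r->esize;
	memcpy(obj, r->slots + off, r->esize);
	r->tail++;
	return 0;
}
```

Because the element is copied at enqueue time, passing the address of a loop
counter (as the patch does with &i) is safe in this model: later changes to
the counter do not affect what is already stored. For 1024 entries the toy
needs 1024 * sizeof(uint32_t) = 4096 bytes of slot storage, against
1024 * sizeof(void *) for pointer-sized slots.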


* [dpdk-dev] [PATCH v7 17/17] lib/eventdev: use custom element size ring for event rings
  2019-12-20  4:45   ` [dpdk-dev] [PATCH v7 00/17] " Honnappa Nagarahalli
                       ` (15 preceding siblings ...)
  2019-12-20  4:45     ` [dpdk-dev] [PATCH v7 16/17] lib/hash: use ring with 32b element size to save memory Honnappa Nagarahalli
@ 2019-12-20  4:45     ` Honnappa Nagarahalli
  16 siblings, 0 replies; 173+ messages in thread
From: Honnappa Nagarahalli @ 2019-12-20  4:45 UTC (permalink / raw)
  To: olivier.matz, sthemmin, jerinj, bruce.richardson, david.marchand,
	pbhagavatula, konstantin.ananyev, honnappa.nagarahalli
  Cc: dev, dharmik.thakkar, ruifeng.wang, gavin.hu, nd

Replace the event ring implementation with the custom element size ring
APIs. This avoids code duplication.

Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Gavin Hu <gavin.hu@arm.com>
Reviewed-by: Ola Liljedahl <ola.liljedahl@arm.com>
---
 lib/librte_eventdev/rte_event_ring.c | 147 ++-------------------------
 lib/librte_eventdev/rte_event_ring.h |  45 ++++----
 2 files changed, 24 insertions(+), 168 deletions(-)

diff --git a/lib/librte_eventdev/rte_event_ring.c b/lib/librte_eventdev/rte_event_ring.c
index 50190de01..d27e23901 100644
--- a/lib/librte_eventdev/rte_event_ring.c
+++ b/lib/librte_eventdev/rte_event_ring.c
@@ -1,5 +1,6 @@
 /* SPDX-License-Identifier: BSD-3-Clause
  * Copyright(c) 2017 Intel Corporation
+ * Copyright(c) 2019 Arm Limited
  */
 
 #include <sys/queue.h>
@@ -11,13 +12,6 @@
 #include <rte_eal_memconfig.h>
 #include "rte_event_ring.h"
 
-TAILQ_HEAD(rte_event_ring_list, rte_tailq_entry);
-
-static struct rte_tailq_elem rte_event_ring_tailq = {
-	.name = RTE_TAILQ_EVENT_RING_NAME,
-};
-EAL_REGISTER_TAILQ(rte_event_ring_tailq)
-
 int
 rte_event_ring_init(struct rte_event_ring *r, const char *name,
 	unsigned int count, unsigned int flags)
@@ -35,150 +29,21 @@ struct rte_event_ring *
 rte_event_ring_create(const char *name, unsigned int count, int socket_id,
 		unsigned int flags)
 {
-	char mz_name[RTE_MEMZONE_NAMESIZE];
-	struct rte_event_ring *r;
-	struct rte_tailq_entry *te;
-	const struct rte_memzone *mz;
-	ssize_t ring_size;
-	int mz_flags = 0;
-	struct rte_event_ring_list *ring_list = NULL;
-	const unsigned int requested_count = count;
-	int ret;
-
-	ring_list = RTE_TAILQ_CAST(rte_event_ring_tailq.head,
-		rte_event_ring_list);
-
-	/* for an exact size ring, round up from count to a power of two */
-	if (flags & RING_F_EXACT_SZ)
-		count = rte_align32pow2(count + 1);
-	else if (!rte_is_power_of_2(count)) {
-		rte_errno = EINVAL;
-		return NULL;
-	}
-
-	ring_size = sizeof(*r) + (count * sizeof(struct rte_event));
-
-	ret = snprintf(mz_name, sizeof(mz_name), "%s%s",
-		RTE_RING_MZ_PREFIX, name);
-	if (ret < 0 || ret >= (int)sizeof(mz_name)) {
-		rte_errno = ENAMETOOLONG;
-		return NULL;
-	}
-
-	te = rte_zmalloc("RING_TAILQ_ENTRY", sizeof(*te), 0);
-	if (te == NULL) {
-		RTE_LOG(ERR, RING, "Cannot reserve memory for tailq\n");
-		rte_errno = ENOMEM;
-		return NULL;
-	}
-
-	rte_mcfg_tailq_write_lock();
-
-	/*
-	 * reserve a memory zone for this ring. If we can't get rte_config or
-	 * we are secondary process, the memzone_reserve function will set
-	 * rte_errno for us appropriately - hence no check in this this function
-	 */
-	mz = rte_memzone_reserve(mz_name, ring_size, socket_id, mz_flags);
-	if (mz != NULL) {
-		r = mz->addr;
-		/* Check return value in case rte_ring_init() fails on size */
-		int err = rte_event_ring_init(r, name, requested_count, flags);
-		if (err) {
-			RTE_LOG(ERR, RING, "Ring init failed\n");
-			if (rte_memzone_free(mz) != 0)
-				RTE_LOG(ERR, RING, "Cannot free memzone\n");
-			rte_free(te);
-			rte_mcfg_tailq_write_unlock();
-			return NULL;
-		}
-
-		te->data = (void *) r;
-		r->r.memzone = mz;
-
-		TAILQ_INSERT_TAIL(ring_list, te, next);
-	} else {
-		r = NULL;
-		RTE_LOG(ERR, RING, "Cannot reserve memory\n");
-		rte_free(te);
-	}
-	rte_mcfg_tailq_write_unlock();
-
-	return r;
+	return (struct rte_event_ring *)rte_ring_create_elem(name,
+						sizeof(struct rte_event),
+						count, socket_id, flags);
 }
 
 
 struct rte_event_ring *
 rte_event_ring_lookup(const char *name)
 {
-	struct rte_tailq_entry *te;
-	struct rte_event_ring *r = NULL;
-	struct rte_event_ring_list *ring_list;
-
-	ring_list = RTE_TAILQ_CAST(rte_event_ring_tailq.head,
-			rte_event_ring_list);
-
-	rte_mcfg_tailq_read_lock();
-
-	TAILQ_FOREACH(te, ring_list, next) {
-		r = (struct rte_event_ring *) te->data;
-		if (strncmp(name, r->r.name, RTE_RING_NAMESIZE) == 0)
-			break;
-	}
-
-	rte_mcfg_tailq_read_unlock();
-
-	if (te == NULL) {
-		rte_errno = ENOENT;
-		return NULL;
-	}
-
-	return r;
+	return (struct rte_event_ring *)rte_ring_lookup(name);
 }
 
 /* free the ring */
 void
 rte_event_ring_free(struct rte_event_ring *r)
 {
-	struct rte_event_ring_list *ring_list = NULL;
-	struct rte_tailq_entry *te;
-
-	if (r == NULL)
-		return;
-
-	/*
-	 * Ring was not created with rte_event_ring_create,
-	 * therefore, there is no memzone to free.
-	 */
-	if (r->r.memzone == NULL) {
-		RTE_LOG(ERR, RING,
-			"Cannot free ring (not created with rte_event_ring_create()");
-		return;
-	}
-
-	if (rte_memzone_free(r->r.memzone) != 0) {
-		RTE_LOG(ERR, RING, "Cannot free memory\n");
-		return;
-	}
-
-	ring_list = RTE_TAILQ_CAST(rte_event_ring_tailq.head,
-			rte_event_ring_list);
-	rte_mcfg_tailq_write_lock();
-
-	/* find out tailq entry */
-	TAILQ_FOREACH(te, ring_list, next) {
-		if (te->data == (void *) r)
-			break;
-	}
-
-	if (te == NULL) {
-		rte_mcfg_tailq_write_unlock();
-		return;
-	}
-
-	TAILQ_REMOVE(ring_list, te, next);
-
-	rte_mcfg_tailq_write_unlock();
-
-	rte_free(te);
+	rte_ring_free((struct rte_ring *)r);
 }
diff --git a/lib/librte_eventdev/rte_event_ring.h b/lib/librte_eventdev/rte_event_ring.h
index 827a3209e..c0861b0ec 100644
--- a/lib/librte_eventdev/rte_event_ring.h
+++ b/lib/librte_eventdev/rte_event_ring.h
@@ -1,5 +1,6 @@
 /* SPDX-License-Identifier: BSD-3-Clause
  * Copyright(c) 2016-2017 Intel Corporation
+ * Copyright(c) 2019 Arm Limited
  */
 
 /**
@@ -19,6 +20,7 @@
 #include <rte_memory.h>
 #include <rte_malloc.h>
 #include <rte_ring.h>
+#include <rte_ring_elem.h>
 #include "rte_eventdev.h"
 
 #define RTE_TAILQ_EVENT_RING_NAME "RTE_EVENT_RING"
@@ -88,22 +90,17 @@ rte_event_ring_enqueue_burst(struct rte_event_ring *r,
 		const struct rte_event *events,
 		unsigned int n, uint16_t *free_space)
 {
-	uint32_t prod_head, prod_next;
-	uint32_t free_entries;
+	unsigned int num;
+	uint32_t space;
 
-	n = __rte_ring_move_prod_head(&r->r, r->r.prod.single, n,
-			RTE_RING_QUEUE_VARIABLE,
-			&prod_head, &prod_next, &free_entries);
-	if (n == 0)
-		goto end;
+	num = rte_ring_enqueue_burst_elem(&r->r, events,
+				sizeof(struct rte_event), n,
+				&space);
 
-	ENQUEUE_PTRS(&r->r, &r[1], prod_head, events, n, struct rte_event);
-
-	update_tail(&r->r.prod, prod_head, prod_next, r->r.prod.single, 1);
-end:
 	if (free_space != NULL)
-		*free_space = free_entries - n;
-	return n;
+		*free_space = space;
+
+	return num;
 }
 
 /**
@@ -129,23 +126,17 @@ rte_event_ring_dequeue_burst(struct rte_event_ring *r,
 		struct rte_event *events,
 		unsigned int n, uint16_t *available)
 {
-	uint32_t cons_head, cons_next;
-	uint32_t entries;
-
-	n = __rte_ring_move_cons_head(&r->r, r->r.cons.single, n,
-			RTE_RING_QUEUE_VARIABLE,
-			&cons_head, &cons_next, &entries);
-	if (n == 0)
-		goto end;
+	unsigned int num;
+	uint32_t remaining;
 
-	DEQUEUE_PTRS(&r->r, &r[1], cons_head, events, n, struct rte_event);
+	num = rte_ring_dequeue_burst_elem(&r->r, events,
+				sizeof(struct rte_event), n,
+				&remaining);
 
-	update_tail(&r->r.cons, cons_head, cons_next, r->r.cons.single, 0);
-
-end:
 	if (available != NULL)
-		*available = entries - n;
-	return n;
+		*available = remaining;
+
+	return num;
 }
 
 /*
-- 
2.17.1


* Re: [dpdk-dev] [PATCH v7 03/17] test/ring: add functional tests for rte_ring_xxx_elem APIs
  2019-12-20  4:45     ` [dpdk-dev] [PATCH v7 03/17] test/ring: add functional tests for rte_ring_xxx_elem APIs Honnappa Nagarahalli
@ 2020-01-02 16:31       ` Ananyev, Konstantin
  2020-01-07  5:13         ` Honnappa Nagarahalli
  0 siblings, 1 reply; 173+ messages in thread
From: Ananyev, Konstantin @ 2020-01-02 16:31 UTC (permalink / raw)
  To: Honnappa Nagarahalli, olivier.matz, sthemmin, jerinj, Richardson,
	Bruce, david.marchand, pbhagavatula
  Cc: dev, dharmik.thakkar, ruifeng.wang, gavin.hu, nd


Hi Honnappa,

> Add basic infrastructure to test rte_ring_xxx_elem APIs. Add
> test cases for testing burst and bulk tests.
> 
> Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
> Reviewed-by: Gavin Hu <gavin.hu@arm.com>
> ---
>  app/test/test_ring.c | 466 ++++++++++++++++++++-----------------------
>  app/test/test_ring.h | 203 +++++++++++++++++++
>  2 files changed, 419 insertions(+), 250 deletions(-)
>  create mode 100644 app/test/test_ring.h
> 
> diff --git a/app/test/test_ring.c b/app/test/test_ring.c
> index aaf1e70ad..e7a8b468b 100644
> --- a/app/test/test_ring.c
> +++ b/app/test/test_ring.c
> @@ -23,11 +23,13 @@
>  #include <rte_branch_prediction.h>
>  #include <rte_malloc.h>
>  #include <rte_ring.h>
> +#include <rte_ring_elem.h>
>  #include <rte_random.h>
>  #include <rte_errno.h>
>  #include <rte_hexdump.h>
> 
>  #include "test.h"
> +#include "test_ring.h"
> 
>  /*
>   * Ring
> @@ -67,6 +69,50 @@ static rte_atomic32_t synchro;
> 
>  #define	TEST_RING_FULL_EMTPY_ITER	8
> 
> +static int esize[] = {-1, 4, 8, 16};
> +
> +static void
> +test_ring_mem_init(void *obj, unsigned int count, int esize)
> +{
> +	unsigned int i;
> +
> +	/* Legacy queue APIs? */
> +	if (esize == -1)
> +		for (i = 0; i < count; i++)
> +			((void **)obj)[i] = (void *)(unsigned long)i;
> +	else
> +		for (i = 0; i < (count * esize / sizeof(uint32_t)); i++)
> +			((uint32_t *)obj)[i] = i;
> +}
> +
> +static void
> +test_ring_print_test_string(const char *istr, unsigned int api_type, int esize)
> +{
> +	printf("\n%s: ", istr);
> +
> +	if (esize == -1)
> +		printf("legacy APIs: ");
> +	else
> +		printf("elem APIs: element size %dB ", esize);
> +
> +	if (api_type == TEST_RING_IGNORE_API_TYPE)
> +		return;
> +
> +	if ((api_type & TEST_RING_N) == TEST_RING_N)
> +		printf(": default enqueue/dequeue: ");
> +	else if ((api_type & TEST_RING_S) == TEST_RING_S)
> +		printf(": SP/SC: ");
> +	else if ((api_type & TEST_RING_M) == TEST_RING_M)
> +		printf(": MP/MC: ");
> +
> +	if ((api_type & TEST_RING_SL) == TEST_RING_SL)
> +		printf("single\n");
> +	else if ((api_type & TEST_RING_BL) == TEST_RING_BL)
> +		printf("bulk\n");
> +	else if ((api_type & TEST_RING_BR) == TEST_RING_BR)
> +		printf("burst\n");
> +}
> +
>  /*
>   * helper routine for test_ring_basic
>   */
> @@ -314,286 +360,203 @@ test_ring_basic(struct rte_ring *r)
>  	return -1;
>  }
> 
> +/*
> + * Burst and bulk operations with sp/sc, mp/mc and default (during creation)
> + */
>  static int
> -test_ring_burst_basic(struct rte_ring *r)
> +test_ring_burst_bulk_tests(unsigned int api_type)
>  {
> +	struct rte_ring *r;
>  	void **src = NULL, **cur_src = NULL, **dst = NULL, **cur_dst = NULL;
>  	int ret;
> -	unsigned i;
> +	unsigned int i, j;
> +	unsigned int num_elems;
> 
> -	/* alloc dummy object pointers */
> -	src = malloc(RING_SIZE*2*sizeof(void *));
> -	if (src == NULL)
> -		goto fail;
> -
> -	for (i = 0; i < RING_SIZE*2 ; i++) {
> -		src[i] = (void *)(unsigned long)i;
> -	}
> -	cur_src = src;
> +	for (i = 0; i < RTE_DIM(esize); i++) {
> +		test_ring_print_test_string("Test standard ring", api_type,
> +						esize[i]);
> 
> -	/* alloc some room for copied objects */
> -	dst = malloc(RING_SIZE*2*sizeof(void *));
> -	if (dst == NULL)
> -		goto fail;
> +		/* Create the ring */
> +		TEST_RING_CREATE("test_ring_burst_bulk_tests", esize[i],
> +					RING_SIZE, SOCKET_ID_ANY, 0, r);
> 
> -	memset(dst, 0, RING_SIZE*2*sizeof(void *));
> -	cur_dst = dst;
> -
> -	printf("Test SP & SC basic functions \n");
> -	printf("enqueue 1 obj\n");
> -	ret = rte_ring_sp_enqueue_burst(r, cur_src, 1, NULL);
> -	cur_src += 1;
> -	if (ret != 1)
> -		goto fail;
> -
> -	printf("enqueue 2 objs\n");
> -	ret = rte_ring_sp_enqueue_burst(r, cur_src, 2, NULL);
> -	cur_src += 2;
> -	if (ret != 2)
> -		goto fail;
> -
> -	printf("enqueue MAX_BULK objs\n");
> -	ret = rte_ring_sp_enqueue_burst(r, cur_src, MAX_BULK, NULL);
> -	cur_src += MAX_BULK;
> -	if (ret != MAX_BULK)
> -		goto fail;
> -
> -	printf("dequeue 1 obj\n");
> -	ret = rte_ring_sc_dequeue_burst(r, cur_dst, 1, NULL);
> -	cur_dst += 1;
> -	if (ret != 1)
> -		goto fail;
> -
> -	printf("dequeue 2 objs\n");
> -	ret = rte_ring_sc_dequeue_burst(r, cur_dst, 2, NULL);
> -	cur_dst += 2;
> -	if (ret != 2)
> -		goto fail;
> +		/* alloc dummy object pointers */
> +		src = test_ring_calloc(RING_SIZE * 2, esize[i]);
> +		if (src == NULL)
> +			goto fail;
> +		test_ring_mem_init(src, RING_SIZE * 2, esize[i]);
> +		cur_src = src;
> 
> -	printf("dequeue MAX_BULK objs\n");
> -	ret = rte_ring_sc_dequeue_burst(r, cur_dst, MAX_BULK, NULL);
> -	cur_dst += MAX_BULK;
> -	if (ret != MAX_BULK)
> -		goto fail;
> +		/* alloc some room for copied objects */
> +		dst = test_ring_calloc(RING_SIZE * 2, esize[i]);
> +		if (dst == NULL)
> +			goto fail;
> +		cur_dst = dst;
> 
> -	/* check data */
> -	if (memcmp(src, dst, cur_dst - dst)) {
> -		rte_hexdump(stdout, "src", src, cur_src - src);
> -		rte_hexdump(stdout, "dst", dst, cur_dst - dst);
> -		printf("data after dequeue is not the same\n");
> -		goto fail;
> -	}
> +		printf("enqueue 1 obj\n");
> +		TEST_RING_ENQUEUE(r, cur_src, esize[i], 1, ret, api_type);
> +		if (ret != 1)
> +			goto fail;
> +		TEST_RING_INCP(cur_src, esize[i], 1);
> 
> -	cur_src = src;
> -	cur_dst = dst;
> +		printf("enqueue 2 objs\n");
> +		TEST_RING_ENQUEUE(r, cur_src, esize[i], 2, ret, api_type);
> +		if (ret != 2)
> +			goto fail;
> +		TEST_RING_INCP(cur_src, esize[i], 2);
> 
> -	printf("Test enqueue without enough memory space \n");
> -	for (i = 0; i< (RING_SIZE/MAX_BULK - 1); i++) {
> -		ret = rte_ring_sp_enqueue_burst(r, cur_src, MAX_BULK, NULL);
> -		cur_src += MAX_BULK;
> +		printf("enqueue MAX_BULK objs\n");
> +		TEST_RING_ENQUEUE(r, cur_src, esize[i], MAX_BULK, ret,
> +						api_type);
>  		if (ret != MAX_BULK)
>  			goto fail;
> -	}
> -
> -	printf("Enqueue 2 objects, free entries = MAX_BULK - 2  \n");
> -	ret = rte_ring_sp_enqueue_burst(r, cur_src, 2, NULL);
> -	cur_src += 2;
> -	if (ret != 2)
> -		goto fail;
> +		TEST_RING_INCP(cur_src, esize[i], MAX_BULK);
> 
> -	printf("Enqueue the remaining entries = MAX_BULK - 2  \n");
> -	/* Always one free entry left */
> -	ret = rte_ring_sp_enqueue_burst(r, cur_src, MAX_BULK, NULL);
> -	cur_src += MAX_BULK - 3;
> -	if (ret != MAX_BULK - 3)
> -		goto fail;
> -
> -	printf("Test if ring is full  \n");
> -	if (rte_ring_full(r) != 1)
> -		goto fail;
> +		printf("dequeue 1 obj\n");
> +		TEST_RING_DEQUEUE(r, cur_dst, esize[i], 1, ret, api_type);
> +		if (ret != 1)
> +			goto fail;
> +		TEST_RING_INCP(cur_dst, esize[i], 1);
> 
> -	printf("Test enqueue for a full entry  \n");
> -	ret = rte_ring_sp_enqueue_burst(r, cur_src, MAX_BULK, NULL);
> -	if (ret != 0)
> -		goto fail;
> +		printf("dequeue 2 objs\n");
> +		TEST_RING_DEQUEUE(r, cur_dst, esize[i], 2, ret, api_type);
> +		if (ret != 2)
> +			goto fail;
> +		TEST_RING_INCP(cur_dst, esize[i], 2);
> 
> -	printf("Test dequeue without enough objects \n");
> -	for (i = 0; i<RING_SIZE/MAX_BULK - 1; i++) {
> -		ret = rte_ring_sc_dequeue_burst(r, cur_dst, MAX_BULK, NULL);
> -		cur_dst += MAX_BULK;
> +		printf("dequeue MAX_BULK objs\n");
> +		TEST_RING_DEQUEUE(r, cur_dst, esize[i], MAX_BULK, ret,
> +						api_type);
>  		if (ret != MAX_BULK)
>  			goto fail;
> -	}
> -
> -	/* Available memory space for the exact MAX_BULK entries */
> -	ret = rte_ring_sc_dequeue_burst(r, cur_dst, 2, NULL);
> -	cur_dst += 2;
> -	if (ret != 2)
> -		goto fail;
> -
> -	ret = rte_ring_sc_dequeue_burst(r, cur_dst, MAX_BULK, NULL);
> -	cur_dst += MAX_BULK - 3;
> -	if (ret != MAX_BULK - 3)
> -		goto fail;
> -
> -	printf("Test if ring is empty \n");
> -	/* Check if ring is empty */
> -	if (1 != rte_ring_empty(r))
> -		goto fail;
> -
> -	/* check data */
> -	if (memcmp(src, dst, cur_dst - dst)) {
> -		rte_hexdump(stdout, "src", src, cur_src - src);
> -		rte_hexdump(stdout, "dst", dst, cur_dst - dst);
> -		printf("data after dequeue is not the same\n");
> -		goto fail;
> -	}
> +		TEST_RING_INCP(cur_dst, esize[i], MAX_BULK);
> 
> -	cur_src = src;
> -	cur_dst = dst;
> -
> -	printf("Test MP & MC basic functions \n");
> -
> -	printf("enqueue 1 obj\n");
> -	ret = rte_ring_mp_enqueue_burst(r, cur_src, 1, NULL);
> -	cur_src += 1;
> -	if (ret != 1)
> -		goto fail;
> -
> -	printf("enqueue 2 objs\n");
> -	ret = rte_ring_mp_enqueue_burst(r, cur_src, 2, NULL);
> -	cur_src += 2;
> -	if (ret != 2)
> -		goto fail;
> -
> -	printf("enqueue MAX_BULK objs\n");
> -	ret = rte_ring_mp_enqueue_burst(r, cur_src, MAX_BULK, NULL);
> -	cur_src += MAX_BULK;
> -	if (ret != MAX_BULK)
> -		goto fail;
> -
> -	printf("dequeue 1 obj\n");
> -	ret = rte_ring_mc_dequeue_burst(r, cur_dst, 1, NULL);
> -	cur_dst += 1;
> -	if (ret != 1)
> -		goto fail;
> -
> -	printf("dequeue 2 objs\n");
> -	ret = rte_ring_mc_dequeue_burst(r, cur_dst, 2, NULL);
> -	cur_dst += 2;
> -	if (ret != 2)
> -		goto fail;
> -
> -	printf("dequeue MAX_BULK objs\n");
> -	ret = rte_ring_mc_dequeue_burst(r, cur_dst, MAX_BULK, NULL);
> -	cur_dst += MAX_BULK;
> -	if (ret != MAX_BULK)
> -		goto fail;
> -
> -	/* check data */
> -	if (memcmp(src, dst, cur_dst - dst)) {
> -		rte_hexdump(stdout, "src", src, cur_src - src);
> -		rte_hexdump(stdout, "dst", dst, cur_dst - dst);
> -		printf("data after dequeue is not the same\n");
> -		goto fail;
> -	}
> -
> -	cur_src = src;
> -	cur_dst = dst;
> +		/* check data */
> +		if (memcmp(src, dst, cur_dst - dst)) {
> +			rte_hexdump(stdout, "src", src, cur_src - src);
> +			rte_hexdump(stdout, "dst", dst, cur_dst - dst);
> +			printf("data after dequeue is not the same\n");
> +			goto fail;
> +		}
> +
> +		cur_src = src;
> +		cur_dst = dst;
> +
> +		printf("fill and empty the ring\n");
> +		for (j = 0; j < RING_SIZE / MAX_BULK; j++) {
> +			TEST_RING_ENQUEUE(r, cur_src, esize[i], MAX_BULK,
> +							ret, api_type);
> +			if (ret != MAX_BULK)
> +				goto fail;
> +			TEST_RING_INCP(cur_src, esize[i], MAX_BULK);
> +
> +			TEST_RING_DEQUEUE(r, cur_dst, esize[i], MAX_BULK,
> +							ret, api_type);
> +			if (ret != MAX_BULK)
> +				goto fail;
> +			TEST_RING_INCP(cur_dst, esize[i], MAX_BULK);
> +		}
> 
> -	printf("fill and empty the ring\n");
> -	for (i = 0; i<RING_SIZE/MAX_BULK; i++) {
> -		ret = rte_ring_mp_enqueue_burst(r, cur_src, MAX_BULK, NULL);
> -		cur_src += MAX_BULK;
> -		if (ret != MAX_BULK)
> +		/* check data */
> +		if (memcmp(src, dst, cur_dst - dst)) {
> +			rte_hexdump(stdout, "src", src, cur_src - src);
> +			rte_hexdump(stdout, "dst", dst, cur_dst - dst);
> +			printf("data after dequeue is not the same\n");
>  			goto fail;
> -		ret = rte_ring_mc_dequeue_burst(r, cur_dst, MAX_BULK, NULL);
> -		cur_dst += MAX_BULK;
> -		if (ret != MAX_BULK)
> +		}
> +
> +		cur_src = src;
> +		cur_dst = dst;
> +
> +		printf("Test enqueue without enough memory space\n");
> +		for (j = 0; j < (RING_SIZE/MAX_BULK - 1); j++) {
> +			TEST_RING_ENQUEUE(r, cur_src, esize[i], MAX_BULK,
> +							ret, api_type);
> +			if (ret != MAX_BULK)
> +				goto fail;
> +			TEST_RING_INCP(cur_src, esize[i], MAX_BULK);
> +		}
> +
> +		printf("Enqueue 2 objects, free entries = MAX_BULK - 2\n");
> +		TEST_RING_ENQUEUE(r, cur_src, esize[i], 2, ret, api_type);
> +		if (ret != 2)
>  			goto fail;
> -	}
> -
> -	/* check data */
> -	if (memcmp(src, dst, cur_dst - dst)) {
> -		rte_hexdump(stdout, "src", src, cur_src - src);
> -		rte_hexdump(stdout, "dst", dst, cur_dst - dst);
> -		printf("data after dequeue is not the same\n");
> -		goto fail;
> -	}
> -
> -	cur_src = src;
> -	cur_dst = dst;
> -
> -	printf("Test enqueue without enough memory space \n");
> -	for (i = 0; i<RING_SIZE/MAX_BULK - 1; i++) {
> -		ret = rte_ring_mp_enqueue_burst(r, cur_src, MAX_BULK, NULL);
> -		cur_src += MAX_BULK;
> -		if (ret != MAX_BULK)
> +		TEST_RING_INCP(cur_src, esize[i], 2);
> +
> +
> +		printf("Enqueue the remaining entries = MAX_BULK - 3\n");
> +		/* Bulk APIs enqueue exact number of elements */
> +		if ((api_type & TEST_RING_BL) == TEST_RING_BL)
> +			num_elems = MAX_BULK - 3;
> +		else
> +			num_elems = MAX_BULK;
> +		/* Always one free entry left */
> +		TEST_RING_ENQUEUE(r, cur_src, esize[i], num_elems,
> +						ret, api_type);
> +		if (ret != MAX_BULK - 3)
>  			goto fail;
> -	}
> -
> -	/* Available memory space for the exact MAX_BULK objects */
> -	ret = rte_ring_mp_enqueue_burst(r, cur_src, 2, NULL);
> -	cur_src += 2;
> -	if (ret != 2)
> -		goto fail;
> +		TEST_RING_INCP(cur_src, esize[i], MAX_BULK - 3);
> 
> -	ret = rte_ring_mp_enqueue_burst(r, cur_src, MAX_BULK, NULL);
> -	cur_src += MAX_BULK - 3;
> -	if (ret != MAX_BULK - 3)
> -		goto fail;
> +		printf("Test if ring is full\n");
> +		if (rte_ring_full(r) != 1)
> +			goto fail;
> 
> +		printf("Test enqueue for a full entry\n");
> +		TEST_RING_ENQUEUE(r, cur_src, esize[i], MAX_BULK,
> +						ret, api_type);
> +		if (ret != 0)
> +			goto fail;
> 
> -	printf("Test dequeue without enough objects \n");
> -	for (i = 0; i<RING_SIZE/MAX_BULK - 1; i++) {
> -		ret = rte_ring_mc_dequeue_burst(r, cur_dst, MAX_BULK, NULL);
> -		cur_dst += MAX_BULK;
> -		if (ret != MAX_BULK)
> +		printf("Test dequeue without enough objects\n");
> +		for (j = 0; j < RING_SIZE / MAX_BULK - 1; j++) {
> +			TEST_RING_DEQUEUE(r, cur_dst, esize[i], MAX_BULK,
> +							ret, api_type);
> +			if (ret != MAX_BULK)
> +				goto fail;
> +			TEST_RING_INCP(cur_dst, esize[i], MAX_BULK);
> +		}
> +
> +		/* Available memory space for the exact MAX_BULK entries */
> +		TEST_RING_DEQUEUE(r, cur_dst, esize[i], 2, ret, api_type);
> +		if (ret != 2)
>  			goto fail;
> -	}
> +		TEST_RING_INCP(cur_dst, esize[i], 2);
> +
> +		/* Bulk APIs dequeue exact number of elements */
> +		if ((api_type & TEST_RING_BL) == TEST_RING_BL)
> +			num_elems = MAX_BULK - 3;
> +		else
> +			num_elems = MAX_BULK;
> +		TEST_RING_DEQUEUE(r, cur_dst, esize[i], num_elems,
> +						ret, api_type);
> +		if (ret != MAX_BULK - 3)
> +			goto fail;
> +		TEST_RING_INCP(cur_dst, esize[i], MAX_BULK - 3);
> 
> -	/* Available objects - the exact MAX_BULK */
> -	ret = rte_ring_mc_dequeue_burst(r, cur_dst, 2, NULL);
> -	cur_dst += 2;
> -	if (ret != 2)
> -		goto fail;
> +		printf("Test if ring is empty\n");
> +		/* Check if ring is empty */
> +		if (rte_ring_empty(r) != 1)
> +			goto fail;
> 
> -	ret = rte_ring_mc_dequeue_burst(r, cur_dst, MAX_BULK, NULL);
> -	cur_dst += MAX_BULK - 3;
> -	if (ret != MAX_BULK - 3)
> -		goto fail;
> +		/* check data */
> +		if (memcmp(src, dst, cur_dst - dst)) {
> +			rte_hexdump(stdout, "src", src, cur_src - src);
> +			rte_hexdump(stdout, "dst", dst, cur_dst - dst);
> +			printf("data after dequeue is not the same\n");
> +			goto fail;
> +		}
> 
> -	/* check data */
> -	if (memcmp(src, dst, cur_dst - dst)) {
> -		rte_hexdump(stdout, "src", src, cur_src - src);
> -		rte_hexdump(stdout, "dst", dst, cur_dst - dst);
> -		printf("data after dequeue is not the same\n");
> -		goto fail;
> +		/* Free memory before test completed */
> +		rte_ring_free(r);
> +		rte_free(src);
> +		rte_free(dst);
>  	}
> 
> -	cur_src = src;
> -	cur_dst = dst;
> -
> -	printf("Covering rte_ring_enqueue_burst functions \n");
> -
> -	ret = rte_ring_enqueue_burst(r, cur_src, 2, NULL);
> -	cur_src += 2;
> -	if (ret != 2)
> -		goto fail;
> -
> -	ret = rte_ring_dequeue_burst(r, cur_dst, 2, NULL);
> -	cur_dst += 2;
> -	if (ret != 2)
> -		goto fail;
> -
> -	/* Free memory before test completed */
> -	free(src);
> -	free(dst);
>  	return 0;
> -
> - fail:
> -	free(src);
> -	free(dst);
> +fail:
> +	rte_ring_free(r);
> +	rte_free(src);
> +	rte_free(dst);
>  	return -1;
>  }
> 
> @@ -810,6 +773,7 @@ test_ring_with_exact_size(void)
>  static int
>  test_ring(void)
>  {
> +	unsigned int i, j;
>  	struct rte_ring *r = NULL;
> 
>  	/* some more basic operations */
> @@ -828,9 +792,11 @@ test_ring(void)
>  		goto test_fail;
>  	}
> 
> -	/* burst operations */
> -	if (test_ring_burst_basic(r) < 0)
> -		goto test_fail;
> +	/* Burst and bulk operations with sp/sc, mp/mc and default */
> +	for (j = TEST_RING_BL; j <= TEST_RING_BR; j <<= 1)
> +		for (i = TEST_RING_N; i <= TEST_RING_M; i <<= 1)
> +			if (test_ring_burst_bulk_tests(i | j) < 0)
> +				goto test_fail;
> 
>  	/* basic operations */
>  	if (test_ring_basic(r) < 0)
> diff --git a/app/test/test_ring.h b/app/test/test_ring.h
> new file mode 100644
> index 000000000..19ef1b399
> --- /dev/null
> +++ b/app/test/test_ring.h
> @@ -0,0 +1,203 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + * Copyright(c) 2019 Arm Limited
> + */
> +
> +#include <rte_malloc.h>
> +#include <rte_ring.h>
> +#include <rte_ring_elem.h>
> +
> +/* API type to call
> + * N - Calls default APIs
> + * S - Calls SP or SC API
> + * M - Calls MP or MC API
> + */
> +#define TEST_RING_N 1
> +#define TEST_RING_S 2
> +#define TEST_RING_M 4
> +
> +/* API type to call
> + * SL - Calls single element APIs
> + * BL - Calls bulk APIs
> + * BR - Calls burst APIs
> + */
> +#define TEST_RING_SL 8
> +#define TEST_RING_BL 16
> +#define TEST_RING_BR 32
> +
> +#define TEST_RING_IGNORE_API_TYPE ~0U
> +
> +#define TEST_RING_INCP(obj, esize, n) do { \
> +	/* Legacy queue APIs? */ \
> +	if ((esize) == -1) \
> +		obj = ((void **)obj) + n; \
> +	else \
> +		obj = (void **)(((uint32_t *)obj) + \
> +					(n * esize / sizeof(uint32_t))); \
> +} while (0)
> +
> +#define TEST_RING_CREATE(name, esize, count, socket_id, flags, r) do { \
> +	/* Legacy queue APIs? */ \
> +	if ((esize) == -1) \
> +		r = rte_ring_create((name), (count), (socket_id), (flags)); \
> +	else \
> +		r = rte_ring_create_elem((name), (esize), (count), \
> +						(socket_id), (flags)); \
> +} while (0)
> +
> +#define TEST_RING_ENQUEUE(r, obj, esize, n, ret, api_type) do { \
> +	/* Legacy queue APIs? */ \
> +	if ((esize) == -1) \
> +		switch (api_type) { \
> +		case (TEST_RING_N | TEST_RING_SL): \
> +			ret = rte_ring_enqueue(r, obj); \
> +			break; \
> +		case (TEST_RING_S | TEST_RING_SL): \
> +			ret = rte_ring_sp_enqueue(r, obj); \
> +			break; \
> +		case (TEST_RING_M | TEST_RING_SL): \
> +			ret = rte_ring_mp_enqueue(r, obj); \
> +			break; \
> +		case (TEST_RING_N | TEST_RING_BL): \
> +			ret = rte_ring_enqueue_bulk(r, obj, n, NULL); \
> +			break; \
> +		case (TEST_RING_S | TEST_RING_BL): \
> +			ret = rte_ring_sp_enqueue_bulk(r, obj, n, NULL); \
> +			break; \
> +		case (TEST_RING_M | TEST_RING_BL): \
> +			ret = rte_ring_mp_enqueue_bulk(r, obj, n, NULL); \
> +			break; \
> +		case (TEST_RING_N | TEST_RING_BR): \
> +			ret = rte_ring_enqueue_burst(r, obj, n, NULL); \
> +			break; \
> +		case (TEST_RING_S | TEST_RING_BR): \
> +			ret = rte_ring_sp_enqueue_burst(r, obj, n, NULL); \
> +			break; \
> +		case (TEST_RING_M | TEST_RING_BR): \
> +			ret = rte_ring_mp_enqueue_burst(r, obj, n, NULL); \
> +		} \
> +	else \
> +		switch (api_type) { \
> +		case (TEST_RING_N | TEST_RING_SL): \
> +			ret = rte_ring_enqueue_elem(r, obj, esize); \
> +			break; \
> +		case (TEST_RING_S | TEST_RING_SL): \
> +			ret = rte_ring_sp_enqueue_elem(r, obj, esize); \
> +			break; \
> +		case (TEST_RING_M | TEST_RING_SL): \
> +			ret = rte_ring_mp_enqueue_elem(r, obj, esize); \
> +			break; \
> +		case (TEST_RING_N | TEST_RING_BL): \
> +			ret = rte_ring_enqueue_bulk_elem(r, obj, esize, n, \
> +								NULL); \
> +			break; \
> +		case (TEST_RING_S | TEST_RING_BL): \
> +			ret = rte_ring_sp_enqueue_bulk_elem(r, obj, esize, n, \
> +								NULL); \
> +			break; \
> +		case (TEST_RING_M | TEST_RING_BL): \
> +			ret = rte_ring_mp_enqueue_bulk_elem(r, obj, esize, n, \
> +								NULL); \
> +			break; \
> +		case (TEST_RING_N | TEST_RING_BR): \
> +			ret = rte_ring_enqueue_burst_elem(r, obj, esize, n, \
> +								NULL); \
> +			break; \
> +		case (TEST_RING_S | TEST_RING_BR): \
> +			ret = rte_ring_sp_enqueue_burst_elem(r, obj, esize, n, \
> +								NULL); \
> +			break; \
> +		case (TEST_RING_M | TEST_RING_BR): \
> +			ret = rte_ring_mp_enqueue_burst_elem(r, obj, esize, n, \
> +								NULL); \
> +		} \
> +} while (0)
> +
> +#define TEST_RING_DEQUEUE(r, obj, esize, n, ret, api_type) do { \
> +	/* Legacy queue APIs? */ \
> +	if ((esize) == -1) \
> +		switch (api_type) { \
> +		case (TEST_RING_N | TEST_RING_SL): \
> +			ret = rte_ring_dequeue(r, obj); \
> +			break; \
> +		case (TEST_RING_S | TEST_RING_SL): \
> +			ret = rte_ring_sc_dequeue(r, obj); \
> +			break; \
> +		case (TEST_RING_M | TEST_RING_SL): \
> +			ret = rte_ring_mc_dequeue(r, obj); \
> +			break; \
> +		case (TEST_RING_N | TEST_RING_BL): \
> +			ret = rte_ring_dequeue_bulk(r, obj, n, NULL); \
> +			break; \
> +		case (TEST_RING_S | TEST_RING_BL): \
> +			ret = rte_ring_sc_dequeue_bulk(r, obj, n, NULL); \
> +			break; \
> +		case (TEST_RING_M | TEST_RING_BL): \
> +			ret = rte_ring_mc_dequeue_bulk(r, obj, n, NULL); \
> +			break; \
> +		case (TEST_RING_N | TEST_RING_BR): \
> +			ret = rte_ring_dequeue_burst(r, obj, n, NULL); \
> +			break; \
> +		case (TEST_RING_S | TEST_RING_BR): \
> +			ret = rte_ring_sc_dequeue_burst(r, obj, n, NULL); \
> +			break; \
> +		case (TEST_RING_M | TEST_RING_BR): \
> +			ret = rte_ring_mc_dequeue_burst(r, obj, n, NULL); \
> +		} \
> +	else \
> +		switch (api_type) { \
> +		case (TEST_RING_N | TEST_RING_SL): \
> +			ret = rte_ring_dequeue_elem(r, obj, esize); \
> +			break; \
> +		case (TEST_RING_S | TEST_RING_SL): \
> +			ret = rte_ring_sc_dequeue_elem(r, obj, esize); \
> +			break; \
> +		case (TEST_RING_M | TEST_RING_SL): \
> +			ret = rte_ring_mc_dequeue_elem(r, obj, esize); \
> +			break; \
> +		case (TEST_RING_N | TEST_RING_BL): \
> +			ret = rte_ring_dequeue_bulk_elem(r, obj, esize, n, \
> +								NULL); \
> +			break; \
> +		case (TEST_RING_S | TEST_RING_BL): \
> +			ret = rte_ring_sc_dequeue_bulk_elem(r, obj, esize, n, \
> +								NULL); \
> +			break; \
> +		case (TEST_RING_M | TEST_RING_BL): \
> +			ret = rte_ring_mc_dequeue_bulk_elem(r, obj, esize, n, \
> +								NULL); \
> +			break; \
> +		case (TEST_RING_N | TEST_RING_BR): \
> +			ret = rte_ring_dequeue_burst_elem(r, obj, esize, n, \
> +								NULL); \
> +			break; \
> +		case (TEST_RING_S | TEST_RING_BR): \
> +			ret = rte_ring_sc_dequeue_burst_elem(r, obj, esize, n, \
> +								NULL); \
> +			break; \
> +		case (TEST_RING_M | TEST_RING_BR): \
> +			ret = rte_ring_mc_dequeue_burst_elem(r, obj, esize, n, \
> +								NULL); \
> +		} \
> +} while (0)


My idea for avoiding test-code duplication was a bit different.
Instead of adding extra enums/parameters and then switching on them,
I had something like this in mind:

1. mv test_ring_perf.c test_ring_perf.h
2. Inside test_ring_perf.h, replace the rte_ring create/enqueue/dequeue
   function calls with invocations of macros/functions that the header
   itself does not define, keeping similar names, the same number of
   parameters, and the same semantics.
   Also change 'void *burst[..]' to 'RING_ELEM burst[...]'.
3. For each test configuration we want to have (default, 4B, 8B, 16B),
   create a new .c file in which we:
   - define the macros/functions used in test_ring_perf.h
   - include test_ring_perf.h
   - REGISTER_TEST_COMMAND(<test_name>, test_ring_perf);

As an example:
test_ring_perf.h:
...
static int
enqueue_bulk(void *p)
{
        ...
        RING_ELEM burst[MAX_BURST];

        memset(burst, 0, sizeof(burst));
        ....
        const uint64_t sp_start = rte_rdtsc();
        for (i = 0; i < iterations; i++)
                while (RING_SP_ENQUEUE_BULK(r, burst, size, NULL) == 0)
                        rte_pause();
        const uint64_t sp_end = rte_rdtsc();

        const uint64_t mp_start = rte_rdtsc();
        for (i = 0; i < iterations; i++)
                while (RING_MP_ENQUEUE_BULK(r, burst, size, NULL) == 0)
                        rte_pause();
        const uint64_t mp_end = rte_rdtsc();
        ....

Then in test_ring_perf.c:

....
#define RING_ELEM	void *
...
#define RING_SP_ENQUEUE_BULK(ring, buf, size, spc)  \
       rte_ring_sp_enqueue_bulk(ring, buf, size, spc)
....

#include "test_ring_perf.h"
REGISTER_TEST_COMMAND(ring_perf_autotest, test_ring_perf);


In test_ring_elem16B_perf.c:
....
#define RING_ELEM	__uint128_t
#define RING_SP_ENQUEUE_BULK(ring, buf, size, spc)  \
	rte_ring_sp_enqueue_bulk_elem(ring, buf, sizeof(RING_ELEM), size, spc)
....
#include "test_ring_perf.h"
REGISTER_TEST_COMMAND(ring_perf_elem16B_autotest, test_ring_perf);

In test_ring_elem4B_perf.c:

....
#define RING_ELEM	uint32_t
#define RING_SP_ENQUEUE_BULK(ring, buf, size, spc)  \
	rte_ring_sp_enqueue_bulk_elem(ring, buf, sizeof(RING_ELEM), size, spc)
....
#include "test_ring_perf.h"
REGISTER_TEST_COMMAND(ring_perf_elem4B_autotest, test_ring_perf);

And so on.
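
The same idea can be tried out in a single file, without DPDK, to check
that one shared test body really works for several element types. This
is only a rough sketch under assumptions: the macro body below stands in
for what would live in test_ring_perf.h, and every toy_* name, the toy
ring itself and TOY_RING_SLOTS are made up for illustration. The toy ring
is not rte_ring (it is single-threaded and uses all of its slots).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define TOY_RING_SLOTS 8u /* hypothetical capacity */

/*
 * Stand-in for the shared test body. Everything type-specific arrives
 * through RING_ELEM, the way the including .c file would define it.
 */
#define DEFINE_TOY_RING_TEST(suffix, RING_ELEM)                         \
struct toy_ring_##suffix {                                              \
	RING_ELEM slot[TOY_RING_SLOTS];                                 \
	unsigned int head; /* total elements enqueued so far */         \
	unsigned int tail; /* total elements dequeued so far */         \
};                                                                      \
static unsigned int                                                     \
toy_enqueue_bulk_##suffix(struct toy_ring_##suffix *r,                  \
		const RING_ELEM *buf, unsigned int n)                   \
{                                                                       \
	unsigned int i;                                                 \
	/* bulk semantics: all n elements or nothing */                 \
	if (TOY_RING_SLOTS - (r->head - r->tail) < n)                   \
		return 0;                                               \
	for (i = 0; i < n; i++)                                         \
		r->slot[(r->head + i) % TOY_RING_SLOTS] = buf[i];       \
	r->head += n;                                                   \
	return n;                                                       \
}                                                                       \
static unsigned int                                                     \
toy_dequeue_bulk_##suffix(struct toy_ring_##suffix *r,                  \
		RING_ELEM *buf, unsigned int n)                         \
{                                                                       \
	unsigned int i;                                                 \
	if (r->head - r->tail < n)                                      \
		return 0;                                               \
	for (i = 0; i < n; i++)                                         \
		buf[i] = r->slot[(r->tail + i) % TOY_RING_SLOTS];       \
	r->tail += n;                                                   \
	return n;                                                       \
}                                                                       \
int                                                                     \
toy_ring_test_##suffix(void)                                            \
{                                                                       \
	struct toy_ring_##suffix r;                                     \
	RING_ELEM src[4], dst[4];                                       \
	/* same constraint the patch puts on esize */                   \
	assert(sizeof(RING_ELEM) % 4 == 0);                             \
	memset(&r, 0, sizeof(r));                                       \
	memset(src, 0x5a, sizeof(src));                                 \
	memset(dst, 0, sizeof(dst));                                    \
	if (toy_enqueue_bulk_##suffix(&r, src, 4) != 4)                 \
		return -1;                                              \
	if (toy_dequeue_bulk_##suffix(&r, dst, 4) != 4)                 \
		return -1;                                              \
	return memcmp(src, dst, sizeof(src)) == 0 ? 0 : -1;             \
}

/* hypothetical 16B element type, used instead of __uint128_t */
typedef struct { uint64_t lo, hi; } toy_elem16;

/* One "configuration" per line, like one small .c file per element size. */
DEFINE_TOY_RING_TEST(elem4B, uint32_t)
DEFINE_TOY_RING_TEST(elem8B, uint64_t)
DEFINE_TOY_RING_TEST(elem16B, toy_elem16)
```

Each DEFINE_TOY_RING_TEST() line generates toy_ring_test_<suffix>(),
which returns 0 when the data read back matches what was written. With
real files, the #include of the shared header would take the place of
the macro expansion.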

> +
> +/* This function is placed here as it is required for both
> + * performance and functional tests.
> + */
> +static __rte_always_inline void *
> +test_ring_calloc(unsigned int rsize, int esize)
> +{
> +	unsigned int sz;
> +	void *p;
> +
> +	/* Legacy queue APIs? */
> +	if (esize == -1)
> +		sz = sizeof(void *);
> +	else
> +		sz = esize;
> +
> +	p = rte_zmalloc(NULL, rsize * sz, RTE_CACHE_LINE_SIZE);
> +	if (p == NULL)
> +		printf("Failed to allocate memory\n");
> +
> +	return p;
> +}
> --
> 2.17.1



* Re: [dpdk-dev] [PATCH v7 02/17] lib/ring: apis to support configurable element size
  2019-12-20  4:45     ` [dpdk-dev] [PATCH v7 02/17] lib/ring: apis to support configurable element size Honnappa Nagarahalli
@ 2020-01-02 16:42       ` Ananyev, Konstantin
  2020-01-07  5:35         ` Honnappa Nagarahalli
  0 siblings, 1 reply; 173+ messages in thread
From: Ananyev, Konstantin @ 2020-01-02 16:42 UTC (permalink / raw)
  To: Honnappa Nagarahalli, olivier.matz, sthemmin, jerinj, Richardson,
	Bruce, david.marchand, pbhagavatula
  Cc: dev, dharmik.thakkar, ruifeng.wang, gavin.hu, nd


> diff --git a/lib/librte_ring/rte_ring_elem.h b/lib/librte_ring/rte_ring_elem.h
> new file mode 100644
> index 000000000..fc7fe127c
> --- /dev/null
> +++ b/lib/librte_ring/rte_ring_elem.h
> @@ -0,0 +1,1002 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + *
> + * Copyright (c) 2019 Arm Limited
> + * Copyright (c) 2010-2017 Intel Corporation
> + * Copyright (c) 2007-2009 Kip Macy kmacy@freebsd.org
> + * All rights reserved.
> + * Derived from FreeBSD's bufring.h
> + * Used as BSD-3 Licensed with permission from Kip Macy.
> + */
> +
> +#ifndef _RTE_RING_ELEM_H_
> +#define _RTE_RING_ELEM_H_
> +
> +/**
> + * @file
> + * RTE Ring with user defined element size
> + */
> +
> +#ifdef __cplusplus
> +extern "C" {
> +#endif
> +
> +#include <stdio.h>
> +#include <stdint.h>
> +#include <sys/queue.h>
> +#include <errno.h>
> +#include <rte_common.h>
> +#include <rte_config.h>
> +#include <rte_memory.h>
> +#include <rte_lcore.h>
> +#include <rte_atomic.h>
> +#include <rte_branch_prediction.h>
> +#include <rte_memzone.h>
> +#include <rte_pause.h>
> +
> +#include "rte_ring.h"
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice
> + *
> + * Calculate the memory size needed for a ring with given element size
> + *
> + * This function returns the number of bytes needed for a ring, given
> + * the number of elements in it and the size of the element. This value
> + * is the sum of the size of the structure rte_ring and the size of the
> + * memory needed for storing the elements. The value is aligned to a cache
> + * line size.
> + *
> + * @param esize
> + *   The size of ring element, in bytes. It must be a multiple of 4.
> + * @param count
> + *   The number of elements in the ring (must be a power of 2).
> + * @return
> + *   - The memory size needed for the ring on success.
> + *   - -EINVAL - esize is not a multiple of 4 or count provided is not a
> + *		 power of 2.
> + */
> +__rte_experimental
> +ssize_t rte_ring_get_memsize_elem(unsigned int esize, unsigned int count);
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice
> + *
> + * Create a new ring named *name* that stores elements with given size.
> + *
> + * This function uses ``memzone_reserve()`` to allocate memory. Then it
> + * calls rte_ring_init() to initialize an empty ring.
> + *
> + * The new ring size is set to *count*, which must be a power of
> + * two. Water marking is disabled by default. The real usable ring size
> + * is *count-1* instead of *count* to differentiate a full ring from an
> + * empty ring.
> + *
> + * The ring is added in RTE_TAILQ_RING list.
> + *
> + * @param name
> + *   The name of the ring.
> + * @param esize
> + *   The size of ring element, in bytes. It must be a multiple of 4.
> + * @param count
> + *   The number of elements in the ring (must be a power of 2).
> + * @param socket_id
> + *   The *socket_id* argument is the socket identifier in case of
> + *   NUMA. The value can be *SOCKET_ID_ANY* if there is no NUMA
> + *   constraint for the reserved zone.
> + * @param flags
> + *   An OR of the following:
> + *    - RING_F_SP_ENQ: If this flag is set, the default behavior when
> + *      using ``rte_ring_enqueue()`` or ``rte_ring_enqueue_bulk()``
> + *      is "single-producer". Otherwise, it is "multi-producers".
> + *    - RING_F_SC_DEQ: If this flag is set, the default behavior when
> + *      using ``rte_ring_dequeue()`` or ``rte_ring_dequeue_bulk()``
> + *      is "single-consumer". Otherwise, it is "multi-consumers".
> + * @return
> + *   On success, the pointer to the new allocated ring. NULL on error with
> + *    rte_errno set appropriately. Possible errno values include:
> + *    - E_RTE_NO_CONFIG - function could not get pointer to rte_config structure
> + *    - E_RTE_SECONDARY - function was called from a secondary process instance
> + *    - EINVAL - esize is not a multiple of 4 or count provided is not a
> + *		 power of 2.
> + *    - ENOSPC - the maximum number of memzones has already been allocated
> + *    - EEXIST - a memzone with the same name already exists
> + *    - ENOMEM - no appropriate memory area found in which to create memzone
> + */
> +__rte_experimental
> +struct rte_ring *rte_ring_create_elem(const char *name, unsigned int esize,
> +			unsigned int count, int socket_id, unsigned int flags);
> +
> +static __rte_always_inline void
> +enqueue_elems_32(struct rte_ring *r, uint32_t idx,
> +		const void *obj_table, uint32_t n)
> +{
> +	unsigned int i;
> +	const uint32_t size = r->size;
> +	uint32_t *ring = (uint32_t *)&r[1];
> +	const uint32_t *obj = (const uint32_t *)obj_table;
> +	if (likely(idx + n < size)) {
> +		for (i = 0; i < (n & ~0x7); i += 8, idx += 8) {
> +			ring[idx] = obj[i];
> +			ring[idx + 1] = obj[i + 1];
> +			ring[idx + 2] = obj[i + 2];
> +			ring[idx + 3] = obj[i + 3];
> +			ring[idx + 4] = obj[i + 4];
> +			ring[idx + 5] = obj[i + 5];
> +			ring[idx + 6] = obj[i + 6];
> +			ring[idx + 7] = obj[i + 7];
> +		}
> +		switch (n & 0x7) {
> +		case 7:
> +			ring[idx++] = obj[i++]; /* fallthrough */
> +		case 6:
> +			ring[idx++] = obj[i++]; /* fallthrough */
> +		case 5:
> +			ring[idx++] = obj[i++]; /* fallthrough */
> +		case 4:
> +			ring[idx++] = obj[i++]; /* fallthrough */
> +		case 3:
> +			ring[idx++] = obj[i++]; /* fallthrough */
> +		case 2:
> +			ring[idx++] = obj[i++]; /* fallthrough */
> +		case 1:
> +			ring[idx++] = obj[i++]; /* fallthrough */
> +		}
> +	} else {
> +		for (i = 0; idx < size; i++, idx++)
> +			ring[idx] = obj[i];
> +		/* Start at the beginning */
> +		for (idx = 0; i < n; i++, idx++)
> +			ring[idx] = obj[i];
> +	}
> +}
> +
> +static __rte_always_inline void
> +enqueue_elems_64(struct rte_ring *r, uint32_t prod_head,
> +		const void *obj_table, uint32_t n)
> +{
> +	unsigned int i;
> +	const uint32_t size = r->size;
> +	uint32_t idx = prod_head & r->mask;
> +	uint64_t *ring = (uint64_t *)&r[1];
> +	const uint64_t *obj = (const uint64_t *)obj_table;
> +	if (likely(idx + n < size)) {
> +		for (i = 0; i < (n & ~0x3); i += 4, idx += 4) {
> +			ring[idx] = obj[i];
> +			ring[idx + 1] = obj[i + 1];
> +			ring[idx + 2] = obj[i + 2];
> +			ring[idx + 3] = obj[i + 3];
> +		}
> +		switch (n & 0x3) {
> +		case 3:
> +			ring[idx++] = obj[i++]; /* fallthrough */
> +		case 2:
> +			ring[idx++] = obj[i++]; /* fallthrough */
> +		case 1:
> +			ring[idx++] = obj[i++];
> +		}
> +	} else {
> +		for (i = 0; idx < size; i++, idx++)
> +			ring[idx] = obj[i];
> +		/* Start at the beginning */
> +		for (idx = 0; i < n; i++, idx++)
> +			ring[idx] = obj[i];
> +	}
> +}
> +
> +static __rte_always_inline void
> +enqueue_elems_128(struct rte_ring *r, uint32_t prod_head,
> +		const void *obj_table, uint32_t n)
> +{
> +	unsigned int i;
> +	const uint32_t size = r->size;
> +	uint32_t idx = prod_head & r->mask;
> +	__uint128_t *ring = (__uint128_t *)&r[1];
> +	const __uint128_t *obj = (const __uint128_t *)obj_table;
> +	if (likely(idx + n < size)) {
> +		for (i = 0; i < (n & ~0x1); i += 2, idx += 2) {
> +			ring[idx] = obj[i];
> +			ring[idx + 1] = obj[i + 1];


AFAIK, that implies a 16B-aligned obj_table...
Will that always be the case?

> +		}
> +		switch (n & 0x1) {
> +		case 1:
> +			ring[idx++] = obj[i++];
> +		}
> +	} else {
> +		for (i = 0; idx < size; i++, idx++)
> +			ring[idx] = obj[i];
> +		/* Start at the beginning */
> +		for (idx = 0; i < n; i++, idx++)
> +			ring[idx] = obj[i];
> +	}
> +}
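
On the 16B alignment question for enqueue_elems_128(): one way to avoid
relying on the alignment of obj_table would be to copy bytes instead of
dereferencing __uint128_t pointers. The sketch below is not what the
patch does, and it says nothing about performance. The function name,
its parameters and ELEM16_SIZE are assumptions made for illustration; it
takes the element storage base, the ring size in elements and the
already masked index.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define ELEM16_SIZE 16u /* bytes per element in this sketch */

/*
 * Copy n 16-byte elements from obj_table into the ring storage starting
 * at slot idx, wrapping to slot 0 when the end is reached. memcpy() has
 * no alignment requirement on either pointer.
 */
void
toy_enqueue_elems_16B(void *ring_base, uint32_t size, uint32_t idx,
		const void *obj_table, uint32_t n)
{
	uint8_t *ring = ring_base;
	const uint8_t *obj = obj_table;
	uint32_t first = n; /* elements that fit before the wrap point */

	assert(idx < size && n <= size);

	if (idx + n > size)
		first = size - idx;

	/* part up to the end of the storage */
	memcpy(ring + (size_t)idx * ELEM16_SIZE, obj,
		(size_t)first * ELEM16_SIZE);
	/* remainder (possibly zero bytes), starting at the beginning */
	memcpy(ring, obj + (size_t)first * ELEM16_SIZE,
		(size_t)(n - first) * ELEM16_SIZE);
}
```

With size = 4, idx = 3 and n = 3, the first element lands in slot 3 and
the remaining two in slots 0 and 1, even if obj_table starts at an odd
address.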
> +
> +/* the actual enqueue of elements on the ring.
> + * Placed here since identical code needed in both
> + * single and multi producer enqueue functions.
> + */
> +static __rte_always_inline void
> +enqueue_elems(struct rte_ring *r, uint32_t prod_head, const void *obj_table,
> +		uint32_t esize, uint32_t num)
> +{
> +	uint32_t idx, nr_idx, nr_num;
> +
> +	/* 8B and 16B copies implemented individually to retain
> +	 * the current performance.
> +	 */
> +	if (esize == 8)
> +		enqueue_elems_64(r, prod_head, obj_table, num);
> +	else if (esize == 16)
> +		enqueue_elems_128(r, prod_head, obj_table, num);
> +	else {
> +		/* Normalize to uint32_t */
> +		uint32_t scale = esize / sizeof(uint32_t);
> +		nr_num = num * scale;
> +		idx = prod_head & r->mask;
> +		nr_idx = idx * scale;
> +		enqueue_elems_32(r, nr_idx, obj_table, nr_num);
> +	}
> +}
> +


* Re: [dpdk-dev] [PATCH v7 11/17] test/ring: modify burst enq/deq perf test cases
  2019-12-20  4:45     ` [dpdk-dev] [PATCH v7 11/17] test/ring: modify burst " Honnappa Nagarahalli
@ 2020-01-02 16:57       ` Ananyev, Konstantin
  2020-01-07  5:42         ` Honnappa Nagarahalli
  0 siblings, 1 reply; 173+ messages in thread
From: Ananyev, Konstantin @ 2020-01-02 16:57 UTC (permalink / raw)
  To: Honnappa Nagarahalli, olivier.matz, sthemmin, jerinj, Richardson,
	Bruce, david.marchand, pbhagavatula
  Cc: dev, dharmik.thakkar, ruifeng.wang, gavin.hu, nd



> 
> Add test cases to test legacy and rte_ring_xxx_elem APIs for
> burst enqueue/dequeue test cases.
> 
> Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
> Reviewed-by: Gavin Hu <gavin.hu@arm.com>
> ---
>  app/test/test_ring_perf.c | 78 ++++++++++++++++++++-------------------
>  1 file changed, 40 insertions(+), 38 deletions(-)
> 
> diff --git a/app/test/test_ring_perf.c b/app/test/test_ring_perf.c
> index 5829718c1..508c688dc 100644
> --- a/app/test/test_ring_perf.c
> +++ b/app/test/test_ring_perf.c
> @@ -397,47 +397,40 @@ test_single_enqueue_dequeue(struct rte_ring *r, const int esize,
>  }
> 
>  /*
> - * Test that does both enqueue and dequeue on a core using the burst() API calls
> - * instead of the bulk() calls used in other tests. Results should be the same
> - * as for the bulk function called on a single lcore.
> + * Test that does both enqueue and dequeue on a core using the burst/bulk API
> + * calls. Results should be the same as for the bulk function called on a
> + * single lcore.
>   */
> -static void
> -test_burst_enqueue_dequeue(struct rte_ring *r)
> +static int
> +test_burst_bulk_enqueue_dequeue(struct rte_ring *r, const int esize,
> +	const unsigned int api_type)
>  {
> -	const unsigned iter_shift = 23;
> -	const unsigned iterations = 1<<iter_shift;
> -	unsigned sz, i = 0;
> -	void *burst[MAX_BURST] = {0};
> +	int ret;
> +	const unsigned int iter_shift = 23;
> +	const unsigned int iterations = 1 << iter_shift;
> +	unsigned int sz, i = 0;
> +	void **burst = NULL;
> 
> -	for (sz = 0; sz < sizeof(bulk_sizes)/sizeof(bulk_sizes[0]); sz++) {
> -		const uint64_t sc_start = rte_rdtsc();
> -		for (i = 0; i < iterations; i++) {
> -			rte_ring_sp_enqueue_burst(r, burst,
> -					bulk_sizes[sz], NULL);
> -			rte_ring_sc_dequeue_burst(r, burst,
> -					bulk_sizes[sz], NULL);
> -		}
> -		const uint64_t sc_end = rte_rdtsc();
> +	(void)ret;
> +	burst = test_ring_calloc(MAX_BURST, esize);
> +	if (burst == NULL)
> +		return -1;
> 
> -		const uint64_t mc_start = rte_rdtsc();
> +	for (sz = 0; sz < RTE_DIM(bulk_sizes); sz++) {
> +		const uint64_t start = rte_rdtsc();
>  		for (i = 0; i < iterations; i++) {
> -			rte_ring_mp_enqueue_burst(r, burst,
> -					bulk_sizes[sz], NULL);
> -			rte_ring_mc_dequeue_burst(r, burst,
> -					bulk_sizes[sz], NULL);
> +			TEST_RING_ENQUEUE(r, burst, esize, bulk_sizes[sz],
> +						ret, api_type);
> +			TEST_RING_DEQUEUE(r, burst, esize, bulk_sizes[sz],
> +						ret, api_type);
>  		}
> -		const uint64_t mc_end = rte_rdtsc();
> -
> -		double mc_avg = ((double)(mc_end-mc_start) / iterations) /
> -					bulk_sizes[sz];
> -		double sc_avg = ((double)(sc_end-sc_start) / iterations) /
> -					bulk_sizes[sz];
> +		const uint64_t end = rte_rdtsc();
> 
> -		printf("SP/SC burst enq/dequeue (size: %u): %.2F\n",
> -				bulk_sizes[sz], sc_avg);
> -		printf("MP/MC burst enq/dequeue (size: %u): %.2F\n",
> -				bulk_sizes[sz], mc_avg);
> +		test_ring_print_test_string(api_type, esize, bulk_sizes[sz],
> +					((double)(end - start)) / iterations);
>  	}
> +

Missing rte_free(burst) here?
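
To make the point above concrete, here is a minimal sketch of the allocate/use/free shape that the burst test would need. It is not the patch code: sketch_calloc(), sketch_burst_test() and sketch_live_allocs are hypothetical names, and plain calloc()/free() stand in for rte_zmalloc()/rte_free() so the snippet builds without DPDK.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for test_ring_calloc(): esize == -1 selects the
 * legacy pointer-sized element, otherwise esize is the element size in
 * bytes. calloc() replaces rte_zmalloc() (assumption for this sketch). */
static void *
sketch_calloc(unsigned int count, int esize)
{
	size_t sz = (esize == -1) ? sizeof(void *) : (size_t)esize;

	return calloc(count, sz);
}

/* Buffers currently allocated; exists only so the sketch can be checked. */
static int sketch_live_allocs;

static int
sketch_burst_test(int esize, unsigned int max_burst)
{
	void *burst;

	assert(max_burst > 0);

	burst = sketch_calloc(max_burst, esize);
	if (burst == NULL)
		return -1;
	sketch_live_allocs++;

	/* ... the timed enqueue/dequeue loop would run here ... */

	free(burst);	/* the release the review comment asks for */
	sketch_live_allocs--;

	return 0;
}
```

After any successful call the counter should be back at zero, which is the property the real function loses without the rte_free().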

> +	return 0;
>  }
> 


* Re: [dpdk-dev] [PATCH v7 15/17] test/ring: adjust run-on-all-cores perf test cases
  2019-12-20  4:45     ` [dpdk-dev] [PATCH v7 15/17] test/ring: adjust run-on-all-cores " Honnappa Nagarahalli
@ 2020-01-02 17:00       ` Ananyev, Konstantin
  2020-01-07  5:42         ` Honnappa Nagarahalli
  0 siblings, 1 reply; 173+ messages in thread
From: Ananyev, Konstantin @ 2020-01-02 17:00 UTC (permalink / raw)
  To: Honnappa Nagarahalli, olivier.matz, sthemmin, jerinj, Richardson,
	Bruce, david.marchand, pbhagavatula
  Cc: dev, dharmik.thakkar, ruifeng.wang, gavin.hu, nd

> 
> Adjust run-on-all-cores test case to use legacy APIs.
> 
> Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
> Reviewed-by: Gavin Hu <gavin.hu@arm.com>
> ---
>  app/test/test_ring_perf.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/app/test/test_ring_perf.c b/app/test/test_ring_perf.c
> index b893b5779..fb95e4f2c 100644
> --- a/app/test/test_ring_perf.c
> +++ b/app/test/test_ring_perf.c
> @@ -520,6 +520,9 @@ test_ring_perf(void)
>  					dequeue_bulk) < 0)
>  			return -1;
>  	}
> +	printf("\n### Testing using all slave nodes ###\n");
> +	if (run_on_all_cores(r) < 0)
> +		return -1;
>  	rte_ring_free(r);
> 
>  	TEST_RING_CREATE(RING_NAME, 16, RING_SIZE, rte_socket_id(), 0, r);
> @@ -567,9 +570,6 @@ test_ring_perf(void)
>  			return -1;
>  	}
> 
> -	printf("\n### Testing using all slave nodes ###\n");
> -	run_on_all_cores(r);
> -

No run_on_all_cores() for 16B elems case?

>  	rte_ring_free(r);
> 
>  	return 0;
>


* Re: [dpdk-dev] [PATCH v7 10/17] test/ring: modify single element enq/deq perf test cases
  2019-12-20  4:45     ` [dpdk-dev] [PATCH v7 10/17] test/ring: modify single element enq/deq perf test cases Honnappa Nagarahalli
@ 2020-01-02 17:03       ` Ananyev, Konstantin
  2020-01-07  5:54         ` Honnappa Nagarahalli
  0 siblings, 1 reply; 173+ messages in thread
From: Ananyev, Konstantin @ 2020-01-02 17:03 UTC (permalink / raw)
  To: Honnappa Nagarahalli, olivier.matz, sthemmin, jerinj, Richardson,
	Bruce, david.marchand, pbhagavatula
  Cc: dev, dharmik.thakkar, ruifeng.wang, gavin.hu, nd

> 
> Add test cases to test rte_ring_xxx_elem APIs for single
> element enqueue/dequeue test cases.
> 
> Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
> Reviewed-by: Gavin Hu <gavin.hu@arm.com>
> ---
>  app/test/test_ring_perf.c | 100 ++++++++++++++++++++++++++++++--------
>  1 file changed, 80 insertions(+), 20 deletions(-)
> 
> diff --git a/app/test/test_ring_perf.c b/app/test/test_ring_perf.c
> index 6c2aca483..5829718c1 100644
> --- a/app/test/test_ring_perf.c
> +++ b/app/test/test_ring_perf.c
> @@ -13,6 +13,7 @@
>  #include <string.h>
> 
>  #include "test.h"
> +#include "test_ring.h"
> 
>  /*
>   * Ring
> @@ -41,6 +42,35 @@ struct lcore_pair {
> 
>  static volatile unsigned lcore_count = 0;
> 
> +static void
> +test_ring_print_test_string(unsigned int api_type, int esize,
> +	unsigned int bsz, double value)
> +{
> +	if (esize == -1)
> +		printf("legacy APIs");
> +	else
> +		printf("elem APIs: element size %dB", esize);
> +
> +	if (api_type == TEST_RING_IGNORE_API_TYPE)
> +		return;
> +
> +	if ((api_type & TEST_RING_N) == TEST_RING_N)
> +		printf(": default enqueue/dequeue: ");
> +	else if ((api_type & TEST_RING_S) == TEST_RING_S)
> +		printf(": SP/SC: ");
> +	else if ((api_type & TEST_RING_M) == TEST_RING_M)
> +		printf(": MP/MC: ");
> +
> +	if ((api_type & TEST_RING_SL) == TEST_RING_SL)
> +		printf("single: ");
> +	else if ((api_type & TEST_RING_BL) == TEST_RING_BL)
> +		printf("bulk (size: %u): ", bsz);
> +	else if ((api_type & TEST_RING_BR) == TEST_RING_BR)
> +		printf("burst (size: %u): ", bsz);
> +
> +	printf("%.2F\n", value);
> +}
> +
>  /**** Functions to analyse our core mask to get cores for different tests ***/
> 
>  static int
> @@ -335,32 +365,35 @@ run_on_all_cores(struct rte_ring *r)
>   * Test function that determines how long an enqueue + dequeue of a single item
>   * takes on a single lcore. Result is for comparison with the bulk enq+deq.
>   */
> -static void
> -test_single_enqueue_dequeue(struct rte_ring *r)
> +static int
> +test_single_enqueue_dequeue(struct rte_ring *r, const int esize,
> +	const unsigned int api_type)
>  {
> -	const unsigned iter_shift = 24;
> -	const unsigned iterations = 1<<iter_shift;
> -	unsigned i = 0;
> +	int ret;
> +	const unsigned int iter_shift = 24;
> +	const unsigned int iterations = 1 << iter_shift;
> +	unsigned int i = 0;
>  	void *burst = NULL;
> 
> -	const uint64_t sc_start = rte_rdtsc();
> -	for (i = 0; i < iterations; i++) {
> -		rte_ring_sp_enqueue(r, burst);
> -		rte_ring_sc_dequeue(r, &burst);
> -	}
> -	const uint64_t sc_end = rte_rdtsc();
> +	(void)ret;

The '(void)ret;' here, and in a few other places, looks redundant.

> +	/* alloc dummy object pointers */
> +	burst = test_ring_calloc(1, esize);
> +	if (burst == NULL)
> +		return -1;
> 
> -	const uint64_t mc_start = rte_rdtsc();
> +	const uint64_t start = rte_rdtsc();
>  	for (i = 0; i < iterations; i++) {
> -		rte_ring_mp_enqueue(r, burst);
> -		rte_ring_mc_dequeue(r, &burst);
> +		TEST_RING_ENQUEUE(r, burst, esize, 1, ret, api_type);
> +		TEST_RING_DEQUEUE(r, burst, esize, 1, ret, api_type);
>  	}
> -	const uint64_t mc_end = rte_rdtsc();
> +	const uint64_t end = rte_rdtsc();
> +
> +	test_ring_print_test_string(api_type, esize, 1,
> +					((double)(end - start)) / iterations);
> +
> +	rte_free(burst);
> 
> -	printf("SP/SC single enq/dequeue: %.2F\n",
> -			((double)(sc_end-sc_start)) / iterations);
> -	printf("MP/MC single enq/dequeue: %.2F\n",
> -			((double)(mc_end-mc_start)) / iterations);
> +	return 0;
>  }
> 
>  /*


* Re: [dpdk-dev] [PATCH v7 03/17] test/ring: add functional tests for rte_ring_xxx_elem APIs
  2020-01-02 16:31       ` Ananyev, Konstantin
@ 2020-01-07  5:13         ` Honnappa Nagarahalli
  2020-01-07 16:03           ` Ananyev, Konstantin
  0 siblings, 1 reply; 173+ messages in thread
From: Honnappa Nagarahalli @ 2020-01-07  5:13 UTC (permalink / raw)
  To: Ananyev, Konstantin, olivier.matz, sthemmin, jerinj, Richardson,
	Bruce, david.marchand, pbhagavatula
  Cc: dev, Dharmik Thakkar, Ruifeng Wang, Gavin Hu, nd,
	Honnappa Nagarahalli, nd

> Hi Honnappa,
Thanks Konstantin for your feedback.

> 
> > Add basic infrastructure to test rte_ring_xxx_elem APIs. Add test
> > cases for testing burst and bulk tests.
> >
> > Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
> > Reviewed-by: Gavin Hu <gavin.hu@arm.com>
> > ---
> >  app/test/test_ring.c | 466 ++++++++++++++++++++-----------------------
> >  app/test/test_ring.h | 203 +++++++++++++++++++
> >  2 files changed, 419 insertions(+), 250 deletions(-)
> >  create mode 100644 app/test/test_ring.h
> >
> > diff --git a/app/test/test_ring.c b/app/test/test_ring.c
> > index aaf1e70ad..e7a8b468b 100644
> > --- a/app/test/test_ring.c
> > +++ b/app/test/test_ring.c
> > @@ -23,11 +23,13 @@
> >  #include <rte_branch_prediction.h>
> >  #include <rte_malloc.h>
> >  #include <rte_ring.h>
> > +#include <rte_ring_elem.h>
> >  #include <rte_random.h>
> >  #include <rte_errno.h>
> >  #include <rte_hexdump.h>
> >
> >  #include "test.h"
> > +#include "test_ring.h"
> >
> >  /*
> >   * Ring
> > @@ -67,6 +69,50 @@ static rte_atomic32_t synchro;
> >
> >  #define	TEST_RING_FULL_EMTPY_ITER	8
> >
> > +static int esize[] = {-1, 4, 8, 16};
> > +
> > +static void
> > +test_ring_mem_init(void *obj, unsigned int count, int esize)
> > +{
> > +	unsigned int i;
> > +
> > +	/* Legacy queue APIs? */
> > +	if (esize == -1)
> > +		for (i = 0; i < count; i++)
> > +			((void **)obj)[i] = (void *)(unsigned long)i;
> > +	else
> > +		for (i = 0; i < (count * esize / sizeof(uint32_t)); i++)
> > +			((uint32_t *)obj)[i] = i;
> > +}
> > +
> > +static void
> > +test_ring_print_test_string(const char *istr, unsigned int api_type,
> > +	int esize)
> > +{
> > +	printf("\n%s: ", istr);
> > +
> > +	if (esize == -1)
> > +		printf("legacy APIs: ");
> > +	else
> > +		printf("elem APIs: element size %dB ", esize);
> > +
> > +	if (api_type == TEST_RING_IGNORE_API_TYPE)
> > +		return;
> > +
> > +	if ((api_type & TEST_RING_N) == TEST_RING_N)
> > +		printf(": default enqueue/dequeue: ");
> > +	else if ((api_type & TEST_RING_S) == TEST_RING_S)
> > +		printf(": SP/SC: ");
> > +	else if ((api_type & TEST_RING_M) == TEST_RING_M)
> > +		printf(": MP/MC: ");
> > +
> > +	if ((api_type & TEST_RING_SL) == TEST_RING_SL)
> > +		printf("single\n");
> > +	else if ((api_type & TEST_RING_BL) == TEST_RING_BL)
> > +		printf("bulk\n");
> > +	else if ((api_type & TEST_RING_BR) == TEST_RING_BR)
> > +		printf("burst\n");
> > +}
> > +
> >  /*
> >   * helper routine for test_ring_basic
> >   */
> > @@ -314,286 +360,203 @@ test_ring_basic(struct rte_ring *r)
> >  	return -1;
> >  }
> >
> > +/*
> > + * Burst and bulk operations with sp/sc, mp/mc and default (during creation)
> > + */
> >  static int
> > -test_ring_burst_basic(struct rte_ring *r)
> > +test_ring_burst_bulk_tests(unsigned int api_type)
> >  {
> > +	struct rte_ring *r;
> >  	void **src = NULL, **cur_src = NULL, **dst = NULL, **cur_dst = NULL;
> >  	int ret;
> > -	unsigned i;
> > +	unsigned int i, j;
> > +	unsigned int num_elems;
> >
> > -	/* alloc dummy object pointers */
> > -	src = malloc(RING_SIZE*2*sizeof(void *));
> > -	if (src == NULL)
> > -		goto fail;
> > -
> > -	for (i = 0; i < RING_SIZE*2 ; i++) {
> > -		src[i] = (void *)(unsigned long)i;
> > -	}
> > -	cur_src = src;
> > +	for (i = 0; i < RTE_DIM(esize); i++) {
> > +		test_ring_print_test_string("Test standard ring", api_type,
> > +						esize[i]);
> >
> > -	/* alloc some room for copied objects */
> > -	dst = malloc(RING_SIZE*2*sizeof(void *));
> > -	if (dst == NULL)
> > -		goto fail;
> > +		/* Create the ring */
> > +		TEST_RING_CREATE("test_ring_burst_bulk_tests", esize[i],
> > +					RING_SIZE, SOCKET_ID_ANY, 0, r);
> >
> > -	memset(dst, 0, RING_SIZE*2*sizeof(void *));
> > -	cur_dst = dst;
> > -
> > -	printf("Test SP & SC basic functions \n");
> > -	printf("enqueue 1 obj\n");
> > -	ret = rte_ring_sp_enqueue_burst(r, cur_src, 1, NULL);
> > -	cur_src += 1;
> > -	if (ret != 1)
> > -		goto fail;
> > -
> > -	printf("enqueue 2 objs\n");
> > -	ret = rte_ring_sp_enqueue_burst(r, cur_src, 2, NULL);
> > -	cur_src += 2;
> > -	if (ret != 2)
> > -		goto fail;
> > -
> > -	printf("enqueue MAX_BULK objs\n");
> > -	ret = rte_ring_sp_enqueue_burst(r, cur_src, MAX_BULK, NULL);
> > -	cur_src += MAX_BULK;
> > -	if (ret != MAX_BULK)
> > -		goto fail;
> > -
> > -	printf("dequeue 1 obj\n");
> > -	ret = rte_ring_sc_dequeue_burst(r, cur_dst, 1, NULL);
> > -	cur_dst += 1;
> > -	if (ret != 1)
> > -		goto fail;
> > -
> > -	printf("dequeue 2 objs\n");
> > -	ret = rte_ring_sc_dequeue_burst(r, cur_dst, 2, NULL);
> > -	cur_dst += 2;
> > -	if (ret != 2)
> > -		goto fail;
> > +		/* alloc dummy object pointers */
> > +		src = test_ring_calloc(RING_SIZE * 2, esize[i]);
> > +		if (src == NULL)
> > +			goto fail;
> > +		test_ring_mem_init(src, RING_SIZE * 2, esize[i]);
> > +		cur_src = src;
> >
> > -	printf("dequeue MAX_BULK objs\n");
> > -	ret = rte_ring_sc_dequeue_burst(r, cur_dst, MAX_BULK, NULL);
> > -	cur_dst += MAX_BULK;
> > -	if (ret != MAX_BULK)
> > -		goto fail;
> > +		/* alloc some room for copied objects */
> > +		dst = test_ring_calloc(RING_SIZE * 2, esize[i]);
> > +		if (dst == NULL)
> > +			goto fail;
> > +		cur_dst = dst;
> >
> > -	/* check data */
> > -	if (memcmp(src, dst, cur_dst - dst)) {
> > -		rte_hexdump(stdout, "src", src, cur_src - src);
> > -		rte_hexdump(stdout, "dst", dst, cur_dst - dst);
> > -		printf("data after dequeue is not the same\n");
> > -		goto fail;
> > -	}
> > +		printf("enqueue 1 obj\n");
> > +		TEST_RING_ENQUEUE(r, cur_src, esize[i], 1, ret, api_type);
> > +		if (ret != 1)
> > +			goto fail;
> > +		TEST_RING_INCP(cur_src, esize[i], 1);
> >
> > -	cur_src = src;
> > -	cur_dst = dst;
> > +		printf("enqueue 2 objs\n");
> > +		TEST_RING_ENQUEUE(r, cur_src, esize[i], 2, ret, api_type);
> > +		if (ret != 2)
> > +			goto fail;
> > +		TEST_RING_INCP(cur_src, esize[i], 2);
> >
> > -	printf("Test enqueue without enough memory space \n");
> > -	for (i = 0; i< (RING_SIZE/MAX_BULK - 1); i++) {
> > -		ret = rte_ring_sp_enqueue_burst(r, cur_src, MAX_BULK, NULL);
> > -		cur_src += MAX_BULK;
> > +		printf("enqueue MAX_BULK objs\n");
> > +		TEST_RING_ENQUEUE(r, cur_src, esize[i], MAX_BULK, ret,
> > +						api_type);
> >  		if (ret != MAX_BULK)
> >  			goto fail;
> > -	}
> > -
> > -	printf("Enqueue 2 objects, free entries = MAX_BULK - 2  \n");
> > -	ret = rte_ring_sp_enqueue_burst(r, cur_src, 2, NULL);
> > -	cur_src += 2;
> > -	if (ret != 2)
> > -		goto fail;
> > +		TEST_RING_INCP(cur_src, esize[i], MAX_BULK);
> >
> > -	printf("Enqueue the remaining entries = MAX_BULK - 2  \n");
> > -	/* Always one free entry left */
> > -	ret = rte_ring_sp_enqueue_burst(r, cur_src, MAX_BULK, NULL);
> > -	cur_src += MAX_BULK - 3;
> > -	if (ret != MAX_BULK - 3)
> > -		goto fail;
> > -
> > -	printf("Test if ring is full  \n");
> > -	if (rte_ring_full(r) != 1)
> > -		goto fail;
> > +		printf("dequeue 1 obj\n");
> > +		TEST_RING_DEQUEUE(r, cur_dst, esize[i], 1, ret, api_type);
> > +		if (ret != 1)
> > +			goto fail;
> > +		TEST_RING_INCP(cur_dst, esize[i], 1);
> >
> > -	printf("Test enqueue for a full entry  \n");
> > -	ret = rte_ring_sp_enqueue_burst(r, cur_src, MAX_BULK, NULL);
> > -	if (ret != 0)
> > -		goto fail;
> > +		printf("dequeue 2 objs\n");
> > +		TEST_RING_DEQUEUE(r, cur_dst, esize[i], 2, ret, api_type);
> > +		if (ret != 2)
> > +			goto fail;
> > +		TEST_RING_INCP(cur_dst, esize[i], 2);
> >
> > -	printf("Test dequeue without enough objects \n");
> > -	for (i = 0; i<RING_SIZE/MAX_BULK - 1; i++) {
> > -		ret = rte_ring_sc_dequeue_burst(r, cur_dst, MAX_BULK, NULL);
> > -		cur_dst += MAX_BULK;
> > +		printf("dequeue MAX_BULK objs\n");
> > +		TEST_RING_DEQUEUE(r, cur_dst, esize[i], MAX_BULK, ret,
> > +						api_type);
> >  		if (ret != MAX_BULK)
> >  			goto fail;
> > -	}
> > -
> > -	/* Available memory space for the exact MAX_BULK entries */
> > -	ret = rte_ring_sc_dequeue_burst(r, cur_dst, 2, NULL);
> > -	cur_dst += 2;
> > -	if (ret != 2)
> > -		goto fail;
> > -
> > -	ret = rte_ring_sc_dequeue_burst(r, cur_dst, MAX_BULK, NULL);
> > -	cur_dst += MAX_BULK - 3;
> > -	if (ret != MAX_BULK - 3)
> > -		goto fail;
> > -
> > -	printf("Test if ring is empty \n");
> > -	/* Check if ring is empty */
> > -	if (1 != rte_ring_empty(r))
> > -		goto fail;
> > -
> > -	/* check data */
> > -	if (memcmp(src, dst, cur_dst - dst)) {
> > -		rte_hexdump(stdout, "src", src, cur_src - src);
> > -		rte_hexdump(stdout, "dst", dst, cur_dst - dst);
> > -		printf("data after dequeue is not the same\n");
> > -		goto fail;
> > -	}
> > +		TEST_RING_INCP(cur_dst, esize[i], MAX_BULK);
> >
> > -	cur_src = src;
> > -	cur_dst = dst;
> > -
> > -	printf("Test MP & MC basic functions \n");
> > -
> > -	printf("enqueue 1 obj\n");
> > -	ret = rte_ring_mp_enqueue_burst(r, cur_src, 1, NULL);
> > -	cur_src += 1;
> > -	if (ret != 1)
> > -		goto fail;
> > -
> > -	printf("enqueue 2 objs\n");
> > -	ret = rte_ring_mp_enqueue_burst(r, cur_src, 2, NULL);
> > -	cur_src += 2;
> > -	if (ret != 2)
> > -		goto fail;
> > -
> > -	printf("enqueue MAX_BULK objs\n");
> > -	ret = rte_ring_mp_enqueue_burst(r, cur_src, MAX_BULK, NULL);
> > -	cur_src += MAX_BULK;
> > -	if (ret != MAX_BULK)
> > -		goto fail;
> > -
> > -	printf("dequeue 1 obj\n");
> > -	ret = rte_ring_mc_dequeue_burst(r, cur_dst, 1, NULL);
> > -	cur_dst += 1;
> > -	if (ret != 1)
> > -		goto fail;
> > -
> > -	printf("dequeue 2 objs\n");
> > -	ret = rte_ring_mc_dequeue_burst(r, cur_dst, 2, NULL);
> > -	cur_dst += 2;
> > -	if (ret != 2)
> > -		goto fail;
> > -
> > -	printf("dequeue MAX_BULK objs\n");
> > -	ret = rte_ring_mc_dequeue_burst(r, cur_dst, MAX_BULK, NULL);
> > -	cur_dst += MAX_BULK;
> > -	if (ret != MAX_BULK)
> > -		goto fail;
> > -
> > -	/* check data */
> > -	if (memcmp(src, dst, cur_dst - dst)) {
> > -		rte_hexdump(stdout, "src", src, cur_src - src);
> > -		rte_hexdump(stdout, "dst", dst, cur_dst - dst);
> > -		printf("data after dequeue is not the same\n");
> > -		goto fail;
> > -	}
> > -
> > -	cur_src = src;
> > -	cur_dst = dst;
> > +		/* check data */
> > +		if (memcmp(src, dst, cur_dst - dst)) {
> > +			rte_hexdump(stdout, "src", src, cur_src - src);
> > +			rte_hexdump(stdout, "dst", dst, cur_dst - dst);
> > +			printf("data after dequeue is not the same\n");
> > +			goto fail;
> > +		}
> > +
> > +		cur_src = src;
> > +		cur_dst = dst;
> > +
> > +		printf("fill and empty the ring\n");
> > +		for (j = 0; j < RING_SIZE / MAX_BULK; j++) {
> > +			TEST_RING_ENQUEUE(r, cur_src, esize[i], MAX_BULK,
> > +							ret, api_type);
> > +			if (ret != MAX_BULK)
> > +				goto fail;
> > +			TEST_RING_INCP(cur_src, esize[i], MAX_BULK);
> > +
> > +			TEST_RING_DEQUEUE(r, cur_dst, esize[i], MAX_BULK,
> > +							ret, api_type);
> > +			if (ret != MAX_BULK)
> > +				goto fail;
> > +			TEST_RING_INCP(cur_dst, esize[i], MAX_BULK);
> > +		}
> >
> > -	printf("fill and empty the ring\n");
> > -	for (i = 0; i<RING_SIZE/MAX_BULK; i++) {
> > -		ret = rte_ring_mp_enqueue_burst(r, cur_src, MAX_BULK, NULL);
> > -		cur_src += MAX_BULK;
> > -		if (ret != MAX_BULK)
> > +		/* check data */
> > +		if (memcmp(src, dst, cur_dst - dst)) {
> > +			rte_hexdump(stdout, "src", src, cur_src - src);
> > +			rte_hexdump(stdout, "dst", dst, cur_dst - dst);
> > +			printf("data after dequeue is not the same\n");
> >  			goto fail;
> > -		ret = rte_ring_mc_dequeue_burst(r, cur_dst, MAX_BULK, NULL);
> > -		cur_dst += MAX_BULK;
> > -		if (ret != MAX_BULK)
> > +		}
> > +
> > +		cur_src = src;
> > +		cur_dst = dst;
> > +
> > +		printf("Test enqueue without enough memory space\n");
> > +		for (j = 0; j < (RING_SIZE/MAX_BULK - 1); j++) {
> > +			TEST_RING_ENQUEUE(r, cur_src, esize[i], MAX_BULK,
> > +							ret, api_type);
> > +			if (ret != MAX_BULK)
> > +				goto fail;
> > +			TEST_RING_INCP(cur_src, esize[i], MAX_BULK);
> > +		}
> > +
> > +		printf("Enqueue 2 objects, free entries = MAX_BULK - 2\n");
> > +		TEST_RING_ENQUEUE(r, cur_src, esize[i], 2, ret, api_type);
> > +		if (ret != 2)
> >  			goto fail;
> > -	}
> > -
> > -	/* check data */
> > -	if (memcmp(src, dst, cur_dst - dst)) {
> > -		rte_hexdump(stdout, "src", src, cur_src - src);
> > -		rte_hexdump(stdout, "dst", dst, cur_dst - dst);
> > -		printf("data after dequeue is not the same\n");
> > -		goto fail;
> > -	}
> > -
> > -	cur_src = src;
> > -	cur_dst = dst;
> > -
> > -	printf("Test enqueue without enough memory space \n");
> > -	for (i = 0; i<RING_SIZE/MAX_BULK - 1; i++) {
> > -		ret = rte_ring_mp_enqueue_burst(r, cur_src, MAX_BULK, NULL);
> > -		cur_src += MAX_BULK;
> > -		if (ret != MAX_BULK)
> > +		TEST_RING_INCP(cur_src, esize[i], 2);
> > +
> > +
> > +		printf("Enqueue the remaining entries = MAX_BULK - 3\n");
> > +		/* Bulk APIs enqueue exact number of elements */
> > +		if ((api_type & TEST_RING_BL) == TEST_RING_BL)
> > +			num_elems = MAX_BULK - 3;
> > +		else
> > +			num_elems = MAX_BULK;
> > +		/* Always one free entry left */
> > +		TEST_RING_ENQUEUE(r, cur_src, esize[i], num_elems,
> > +						ret, api_type);
> > +		if (ret != MAX_BULK - 3)
> >  			goto fail;
> > -	}
> > -
> > -	/* Available memory space for the exact MAX_BULK objects */
> > -	ret = rte_ring_mp_enqueue_burst(r, cur_src, 2, NULL);
> > -	cur_src += 2;
> > -	if (ret != 2)
> > -		goto fail;
> > +		TEST_RING_INCP(cur_src, esize[i], MAX_BULK - 3);
> >
> > -	ret = rte_ring_mp_enqueue_burst(r, cur_src, MAX_BULK, NULL);
> > -	cur_src += MAX_BULK - 3;
> > -	if (ret != MAX_BULK - 3)
> > -		goto fail;
> > +		printf("Test if ring is full\n");
> > +		if (rte_ring_full(r) != 1)
> > +			goto fail;
> >
> > +		printf("Test enqueue for a full entry\n");
> > +		TEST_RING_ENQUEUE(r, cur_src, esize[i], MAX_BULK,
> > +						ret, api_type);
> > +		if (ret != 0)
> > +			goto fail;
> >
> > -	printf("Test dequeue without enough objects \n");
> > -	for (i = 0; i<RING_SIZE/MAX_BULK - 1; i++) {
> > -		ret = rte_ring_mc_dequeue_burst(r, cur_dst, MAX_BULK, NULL);
> > -		cur_dst += MAX_BULK;
> > -		if (ret != MAX_BULK)
> > +		printf("Test dequeue without enough objects\n");
> > +		for (j = 0; j < RING_SIZE / MAX_BULK - 1; j++) {
> > +			TEST_RING_DEQUEUE(r, cur_dst, esize[i], MAX_BULK,
> > +							ret, api_type);
> > +			if (ret != MAX_BULK)
> > +				goto fail;
> > +			TEST_RING_INCP(cur_dst, esize[i], MAX_BULK);
> > +		}
> > +
> > +		/* Available memory space for the exact MAX_BULK entries */
> > +		TEST_RING_DEQUEUE(r, cur_dst, esize[i], 2, ret, api_type);
> > +		if (ret != 2)
> >  			goto fail;
> > -	}
> > +		TEST_RING_INCP(cur_dst, esize[i], 2);
> > +
> > +		/* Bulk APIs enqueue exact number of elements */
> > +		if ((api_type & TEST_RING_BL) == TEST_RING_BL)
> > +			num_elems = MAX_BULK - 3;
> > +		else
> > +			num_elems = MAX_BULK;
> > +		TEST_RING_DEQUEUE(r, cur_dst, esize[i], num_elems,
> > +						ret, api_type);
> > +		if (ret != MAX_BULK - 3)
> > +			goto fail;
> > +		TEST_RING_INCP(cur_dst, esize[i], MAX_BULK - 3);
> >
> > -	/* Available objects - the exact MAX_BULK */
> > -	ret = rte_ring_mc_dequeue_burst(r, cur_dst, 2, NULL);
> > -	cur_dst += 2;
> > -	if (ret != 2)
> > -		goto fail;
> > +		printf("Test if ring is empty\n");
> > +		/* Check if ring is empty */
> > +		if (rte_ring_empty(r) != 1)
> > +			goto fail;
> >
> > -	ret = rte_ring_mc_dequeue_burst(r, cur_dst, MAX_BULK, NULL);
> > -	cur_dst += MAX_BULK - 3;
> > -	if (ret != MAX_BULK - 3)
> > -		goto fail;
> > +		/* check data */
> > +		if (memcmp(src, dst, cur_dst - dst)) {
> > +			rte_hexdump(stdout, "src", src, cur_src - src);
> > +			rte_hexdump(stdout, "dst", dst, cur_dst - dst);
> > +			printf("data after dequeue is not the same\n");
> > +			goto fail;
> > +		}
> >
> > -	/* check data */
> > -	if (memcmp(src, dst, cur_dst - dst)) {
> > -		rte_hexdump(stdout, "src", src, cur_src - src);
> > -		rte_hexdump(stdout, "dst", dst, cur_dst - dst);
> > -		printf("data after dequeue is not the same\n");
> > -		goto fail;
> > +		/* Free memory before test completed */
> > +		rte_ring_free(r);
> > +		rte_free(src);
> > +		rte_free(dst);
> >  	}
> >
> > -	cur_src = src;
> > -	cur_dst = dst;
> > -
> > -	printf("Covering rte_ring_enqueue_burst functions \n");
> > -
> > -	ret = rte_ring_enqueue_burst(r, cur_src, 2, NULL);
> > -	cur_src += 2;
> > -	if (ret != 2)
> > -		goto fail;
> > -
> > -	ret = rte_ring_dequeue_burst(r, cur_dst, 2, NULL);
> > -	cur_dst += 2;
> > -	if (ret != 2)
> > -		goto fail;
> > -
> > -	/* Free memory before test completed */
> > -	free(src);
> > -	free(dst);
> >  	return 0;
> > -
> > - fail:
> > -	free(src);
> > -	free(dst);
> > +fail:
> > +	rte_ring_free(r);
> > +	rte_free(src);
> > +	rte_free(dst);
> >  	return -1;
> >  }
> >
> > @@ -810,6 +773,7 @@ test_ring_with_exact_size(void)
> >  static int
> >  test_ring(void)
> >  {
> > +	unsigned int i, j;
> >  	struct rte_ring *r = NULL;
> >
> >  	/* some more basic operations */
> > @@ -828,9 +792,11 @@ test_ring(void)
> >  		goto test_fail;
> >  	}
> >
> > -	/* burst operations */
> > -	if (test_ring_burst_basic(r) < 0)
> > -		goto test_fail;
> > +	/* Burst and bulk operations with sp/sc, mp/mc and default */
> > +	for (j = TEST_RING_BL; j <= TEST_RING_BR; j <<= 1)
> > +		for (i = TEST_RING_N; i <= TEST_RING_M; i <<= 1)
> > +			if (test_ring_burst_bulk_tests(i | j) < 0)
> > +				goto test_fail;
> >
> >  	/* basic operations */
> >  	if (test_ring_basic(r) < 0)
> > diff --git a/app/test/test_ring.h b/app/test/test_ring.h
> > new file mode 100644
> > index 000000000..19ef1b399
> > --- /dev/null
> > +++ b/app/test/test_ring.h
> > @@ -0,0 +1,203 @@
> > +/* SPDX-License-Identifier: BSD-3-Clause
> > + * Copyright(c) 2019 Arm Limited
> > + */
> > +
> > +#include <rte_malloc.h>
> > +#include <rte_ring.h>
> > +#include <rte_ring_elem.h>
> > +
> > +/* API type to call
> > + * N - Calls default APIs
> > + * S - Calls SP or SC API
> > + * M - Calls MP or MC API
> > + */
> > +#define TEST_RING_N 1
> > +#define TEST_RING_S 2
> > +#define TEST_RING_M 4
> > +
> > +/* API type to call
> > + * SL - Calls single element APIs
> > + * BL - Calls bulk APIs
> > + * BR - Calls burst APIs
> > + */
> > +#define TEST_RING_SL 8
> > +#define TEST_RING_BL 16
> > +#define TEST_RING_BR 32
> > +
> > +#define TEST_RING_IGNORE_API_TYPE ~0U
> > +
> > +#define TEST_RING_INCP(obj, esize, n) do { \
> > +	/* Legacy queue APIs? */ \
> > +	if ((esize) == -1) \
> > +		obj = ((void **)obj) + n; \
> > +	else \
> > +		obj = (void **)(((uint32_t *)obj) + \
> > +					(n * esize / sizeof(uint32_t))); \
> > +} while (0)
> > +
> > +#define TEST_RING_CREATE(name, esize, count, socket_id, flags, r) do { \
> > +	/* Legacy queue APIs? */ \
> > +	if ((esize) == -1) \
> > +		r = rte_ring_create((name), (count), (socket_id), (flags)); \
> > +	else \
> > +		r = rte_ring_create_elem((name), (esize), (count), \
> > +						(socket_id), (flags)); \
> > +} while (0)
> > +
> > +#define TEST_RING_ENQUEUE(r, obj, esize, n, ret, api_type) do { \
> > +	/* Legacy queue APIs? */ \
> > +	if ((esize) == -1) \
> > +		switch (api_type) { \
> > +		case (TEST_RING_N | TEST_RING_SL): \
> > +			ret = rte_ring_enqueue(r, obj); \
> > +			break; \
> > +		case (TEST_RING_S | TEST_RING_SL): \
> > +			ret = rte_ring_sp_enqueue(r, obj); \
> > +			break; \
> > +		case (TEST_RING_M | TEST_RING_SL): \
> > +			ret = rte_ring_mp_enqueue(r, obj); \
> > +			break; \
> > +		case (TEST_RING_N | TEST_RING_BL): \
> > +			ret = rte_ring_enqueue_bulk(r, obj, n, NULL); \
> > +			break; \
> > +		case (TEST_RING_S | TEST_RING_BL): \
> > +			ret = rte_ring_sp_enqueue_bulk(r, obj, n, NULL); \
> > +			break; \
> > +		case (TEST_RING_M | TEST_RING_BL): \
> > +			ret = rte_ring_mp_enqueue_bulk(r, obj, n, NULL); \
> > +			break; \
> > +		case (TEST_RING_N | TEST_RING_BR): \
> > +			ret = rte_ring_enqueue_burst(r, obj, n, NULL); \
> > +			break; \
> > +		case (TEST_RING_S | TEST_RING_BR): \
> > +			ret = rte_ring_sp_enqueue_burst(r, obj, n, NULL); \
> > +			break; \
> > +		case (TEST_RING_M | TEST_RING_BR): \
> > +			ret = rte_ring_mp_enqueue_burst(r, obj, n, NULL); \
> > +		} \
> > +	else \
> > +		switch (api_type) { \
> > +		case (TEST_RING_N | TEST_RING_SL): \
> > +			ret = rte_ring_enqueue_elem(r, obj, esize); \
> > +			break; \
> > +		case (TEST_RING_S | TEST_RING_SL): \
> > +			ret = rte_ring_sp_enqueue_elem(r, obj, esize); \
> > +			break; \
> > +		case (TEST_RING_M | TEST_RING_SL): \
> > +			ret = rte_ring_mp_enqueue_elem(r, obj, esize); \
> > +			break; \
> > +		case (TEST_RING_N | TEST_RING_BL): \
> > +			ret = rte_ring_enqueue_bulk_elem(r, obj, esize, n, \
> > +								NULL); \
> > +			break; \
> > +		case (TEST_RING_S | TEST_RING_BL): \
> > +			ret = rte_ring_sp_enqueue_bulk_elem(r, obj, esize, n, \
> > +								NULL); \
> > +			break; \
> > +		case (TEST_RING_M | TEST_RING_BL): \
> > +			ret = rte_ring_mp_enqueue_bulk_elem(r, obj, esize, n, \
> > +								NULL); \
> > +			break; \
> > +		case (TEST_RING_N | TEST_RING_BR): \
> > +			ret = rte_ring_enqueue_burst_elem(r, obj, esize, n, \
> > +								NULL); \
> > +			break; \
> > +		case (TEST_RING_S | TEST_RING_BR): \
> > +			ret = rte_ring_sp_enqueue_burst_elem(r, obj, esize, n, \
> > +								NULL); \
> > +			break; \
> > +		case (TEST_RING_M | TEST_RING_BR): \
> > +			ret = rte_ring_mp_enqueue_burst_elem(r, obj, esize, n, \
> > +								NULL); \
> > +		} \
> > +} while (0)
> > +
> > +#define TEST_RING_DEQUEUE(r, obj, esize, n, ret, api_type) do { \
> > +	/* Legacy queue APIs? */ \
> > +	if ((esize) == -1) \
> > +		switch (api_type) { \
> > +		case (TEST_RING_N | TEST_RING_SL): \
> > +			ret = rte_ring_dequeue(r, obj); \
> > +			break; \
> > +		case (TEST_RING_S | TEST_RING_SL): \
> > +			ret = rte_ring_sc_dequeue(r, obj); \
> > +			break; \
> > +		case (TEST_RING_M | TEST_RING_SL): \
> > +			ret = rte_ring_mc_dequeue(r, obj); \
> > +			break; \
> > +		case (TEST_RING_N | TEST_RING_BL): \
> > +			ret = rte_ring_dequeue_bulk(r, obj, n, NULL); \
> > +			break; \
> > +		case (TEST_RING_S | TEST_RING_BL): \
> > +			ret = rte_ring_sc_dequeue_bulk(r, obj, n, NULL); \
> > +			break; \
> > +		case (TEST_RING_M | TEST_RING_BL): \
> > +			ret = rte_ring_mc_dequeue_bulk(r, obj, n, NULL); \
> > +			break; \
> > +		case (TEST_RING_N | TEST_RING_BR): \
> > +			ret = rte_ring_dequeue_burst(r, obj, n, NULL); \
> > +			break; \
> > +		case (TEST_RING_S | TEST_RING_BR): \
> > +			ret = rte_ring_sc_dequeue_burst(r, obj, n, NULL); \
> > +			break; \
> > +		case (TEST_RING_M | TEST_RING_BR): \
> > +			ret = rte_ring_mc_dequeue_burst(r, obj, n, NULL); \
> > +		} \
> > +	else \
> > +		switch (api_type) { \
> > +		case (TEST_RING_N | TEST_RING_SL): \
> > +			ret = rte_ring_dequeue_elem(r, obj, esize); \
> > +			break; \
> > +		case (TEST_RING_S | TEST_RING_SL): \
> > +			ret = rte_ring_sc_dequeue_elem(r, obj, esize); \
> > +			break; \
> > +		case (TEST_RING_M | TEST_RING_SL): \
> > +			ret = rte_ring_mc_dequeue_elem(r, obj, esize); \
> > +			break; \
> > +		case (TEST_RING_N | TEST_RING_BL): \
> > +			ret = rte_ring_dequeue_bulk_elem(r, obj, esize, n, \
> > +								NULL); \
> > +			break; \
> > +		case (TEST_RING_S | TEST_RING_BL): \
> > +			ret = rte_ring_sc_dequeue_bulk_elem(r, obj, esize, n, \
> > +								NULL); \
> > +			break; \
> > +		case (TEST_RING_M | TEST_RING_BL): \
> > +			ret = rte_ring_mc_dequeue_bulk_elem(r, obj, esize, n, \
> > +								NULL); \
> > +			break; \
> > +		case (TEST_RING_N | TEST_RING_BR): \
> > +			ret = rte_ring_dequeue_burst_elem(r, obj, esize, n, \
> > +								NULL); \
> > +			break; \
> > +		case (TEST_RING_S | TEST_RING_BR): \
> > +			ret = rte_ring_sc_dequeue_burst_elem(r, obj, esize, n, \
> > +								NULL); \
> > +			break; \
> > +		case (TEST_RING_M | TEST_RING_BR): \
> > +			ret = rte_ring_mc_dequeue_burst_elem(r, obj, esize, n, \
> > +								NULL); \
> > +		} \
> > +} while (0)
> 
> 
> My thought to avoid test-code duplication was a bit different.
Yes, this can be done multiple ways. My implementation is not complicated either.

> Instead of adding extra enums/parameters and then do switch on them, my
The switch statement should be removed by the compiler for the performance tests.
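
As a rough illustration of that expectation (a sketch under assumptions, not the macros from the patch): when the dispatch is force-inlined and api_type is a compile-time constant at the call site, an optimizing compiler can typically fold the switch down to the one selected call. All sketch_* and SKETCH_* names are hypothetical, the always_inline attribute is GCC/Clang specific, and whether the fold actually happens depends on the compiler and optimization level.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical API selectors, loosely mirroring TEST_RING_S / TEST_RING_M. */
#define SKETCH_API_SP 2u
#define SKETCH_API_MP 4u

/* Toy "ring" that only counts which path was taken. */
struct sketch_ring {
	unsigned int sp_calls;
	unsigned int mp_calls;
};

static inline unsigned int
sketch_sp_enqueue(struct sketch_ring *r, unsigned int n)
{
	r->sp_calls++;
	return n;
}

static inline unsigned int
sketch_mp_enqueue(struct sketch_ring *r, unsigned int n)
{
	r->mp_calls++;
	return n;
}

/* Dispatch on api_type. With a constant argument and forced inlining the
 * compiler has everything it needs to drop the untaken branches. */
static inline __attribute__((always_inline)) unsigned int
sketch_enqueue(struct sketch_ring *r, unsigned int n,
	const unsigned int api_type)
{
	switch (api_type) {
	case SKETCH_API_SP:
		return sketch_sp_enqueue(r, n);
	case SKETCH_API_MP:
		return sketch_mp_enqueue(r, n);
	default:
		return 0;
	}
}

/* Hot loop with a constant selector, as a perf test would use it. */
static unsigned int
sketch_run_sp(struct sketch_ring *r, unsigned int iterations)
{
	unsigned int i, total = 0;

	assert(r != NULL);
	for (i = 0; i < iterations; i++)
		total += sketch_enqueue(r, 8, SKETCH_API_SP);

	return total;
}
```

If api_type only becomes known at run time, the switch stays in the loop, so the cost argument holds only for the constant case.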

> intention was something like that:
> 
> 1. mv  test_ring_perf.c test_ring_perf.h
> 2. Inside test_ring_perf.h change rte_ring_ create/enqueue/dequeue function
>    calls to some not-defined function/macros invocations.
>    With similar name, same number of parameters, and same semantics.
>    Also change 'void *burst[..]' to 'RING_ELEM[...]';
> 3. For each test configuration we want to have (default, 4B, 8B, 16B)
>    create a new .c file where we:
>    - define used in test_ring_perf.h macros(/function)
>    - include test_ring_perf.h
>    - REGISTER_TEST_COMMAND(<test_name>, test_ring_perf);
> 
> As an example:
> test_ring_perf.h:
> ...
> static int
> enqueue_bulk(void *p)
> {
>         ...
>         RING_ELEM burst[MAX_BURST];
> 
>         memset(burst, 0, sizeof(burst));
>         ....
>         const uint64_t sp_start = rte_rdtsc();
>         for (i = 0; i < iterations; i++)
>                 while (RING_SP_ENQUEUE_BULK(r, burst, size, NULL) == 0)
>                         rte_pause();
>         const uint64_t sp_end = rte_rdtsc();
> 
>         const uint64_t mp_start = rte_rdtsc();
>         for (i = 0; i < iterations; i++)
>                 while (RING_MP_ENQUEUE_BULK(r, burst, size, NULL) == 0)
>                         rte_pause();
>         const uint64_t mp_end = rte_rdtsc();
>         ....
> 
> Then in test_ring_perf.c:
> 
> ....
> #define RING_ELEM	void *
> ...
> #define RING_SP_ENQUEUE_BULK(ring, buf, size, spc)  \
>        rte_ring_sp_enqueue_bulk(ring, buf, size, spc)
> ....
> 
> #include "test_ring_perf.h"
> REGISTER_TEST_COMMAND(ring_perf_autotest, test_ring_perf);
> 
> 
> In test_ring_elem16B_perf.c:
> ....
> #define RING_ELEM	__uint128_t
> #define RING_SP_ENQUEUE_BULK(ring, buf, size, spc)  \
> 	rte_ring_sp_enqueue_bulk_elem(ring, buf, sizeof(RING_ELEM), size, spc)
> ....
> #include "test_ring_perf.h"
> REGISTER_TEST_COMMAND(ring_perf_elem16B_autotest, test_ring_perf);
> 
> In test_ring_elem4B_perf.c:
> 
> ....
> #define RING_ELEM	uint32_t
> #define RING_SP_ENQUEUE_BULK(ring, buf, size, spc)  \
> 	rte_ring_sp_enqueue_bulk_elem(ring, buf, sizeof(RING_ELEM), size, spc)
> ....
> #include "test_ring_perf.h"
> REGISTER_TEST_COMMAND(ring_perf_elem4B_autotest, test_ring_perf);
> 
> And so on.
> 
> > +
> > +/* This function is placed here as it is required for both
> > + * performance and functional tests.
> > + */
> > +static __rte_always_inline void *
> > +test_ring_calloc(unsigned int rsize, int esize) {
> > +	unsigned int sz;
> > +	void *p;
> > +
> > +	/* Legacy queue APIs? */
> > +	if (esize == -1)
> > +		sz = sizeof(void *);
> > +	else
> > +		sz = esize;
> > +
> > +	p = rte_zmalloc(NULL, rsize * sz, RTE_CACHE_LINE_SIZE);
> > +	if (p == NULL)
> > +		printf("Failed to allocate memory\n");
> > +
> > +	return p;
> > +}
> > --
> > 2.17.1


^ permalink raw reply	[flat|nested] 173+ messages in thread

* Re: [dpdk-dev] [PATCH v7 02/17] lib/ring: apis to support configurable element size
  2020-01-02 16:42       ` Ananyev, Konstantin
@ 2020-01-07  5:35         ` Honnappa Nagarahalli
  2020-01-07  6:00           ` Honnappa Nagarahalli
  0 siblings, 1 reply; 173+ messages in thread
From: Honnappa Nagarahalli @ 2020-01-07  5:35 UTC (permalink / raw)
  To: Ananyev, Konstantin, olivier.matz, sthemmin, jerinj, Richardson,
	Bruce, david.marchand, pbhagavatula
  Cc: dev, Dharmik Thakkar, Ruifeng Wang, Gavin Hu, nd,
	Honnappa Nagarahalli, nd

<snip>

> > diff --git a/lib/librte_ring/rte_ring_elem.h
> > b/lib/librte_ring/rte_ring_elem.h new file mode 100644 index
> > 000000000..fc7fe127c
> > --- /dev/null
> > +++ b/lib/librte_ring/rte_ring_elem.h
> > @@ -0,0 +1,1002 @@

<snip>

> > +
> > +static __rte_always_inline void
> > +enqueue_elems_128(struct rte_ring *r, uint32_t prod_head,
> > +		const void *obj_table, uint32_t n)
> > +{
> > +	unsigned int i;
> > +	const uint32_t size = r->size;
> > +	uint32_t idx = prod_head & r->mask;
> > +	__uint128_t *ring = (__uint128_t *)&r[1];
> > +	const __uint128_t *obj = (const __uint128_t *)obj_table;
> > +	if (likely(idx + n < size)) {
> > +		for (i = 0; i < (n & ~0x1); i += 2, idx += 2) {
> > +			ring[idx] = obj[i];
> > +			ring[idx + 1] = obj[i + 1];
> 
> 
> AFAIK, that implies 16B aligned obj_table...
> Would it always be the case?
I am not sure from the compiler perspective.
At least on Arm architecture, unaligned access (address that is accessed is not aligned to the size of the data element being accessed) will result in faults or require additional cycles. So, aligning on 16B should be fine.

> 
> > +		}
> > +		switch (n & 0x1) {
> > +		case 1:
> > +			ring[idx++] = obj[i++];
> > +		}
> > +	} else {
> > +		for (i = 0; idx < size; i++, idx++)
> > +			ring[idx] = obj[i];
> > +		/* Start at the beginning */
> > +		for (idx = 0; i < n; i++, idx++)
> > +			ring[idx] = obj[i];
> > +	}
> > +}
> > +
> > +/* the actual enqueue of elements on the ring.
> > + * Placed here since identical code needed in both
> > + * single and multi producer enqueue functions.
> > + */
> > +static __rte_always_inline void
> > +enqueue_elems(struct rte_ring *r, uint32_t prod_head, const void *obj_table,
> > +		uint32_t esize, uint32_t num)
> > +{
> > +	uint32_t idx, nr_idx, nr_num;
> > +
> > +	/* 8B and 16B copies implemented individually to retain
> > +	 * the current performance.
> > +	 */
> > +	if (esize == 8)
> > +		enqueue_elems_64(r, prod_head, obj_table, num);
> > +	else if (esize == 16)
> > +		enqueue_elems_128(r, prod_head, obj_table, num);
> > +	else {
> > +		/* Normalize to uint32_t */
> > +		uint32_t scale = esize / sizeof(uint32_t);
> > +		nr_num = num * scale;
> > +		idx = prod_head & r->mask;
> > +		nr_idx = idx * scale;
> > +		enqueue_elems_32(r, nr_idx, obj_table, nr_num);
> > +	}
> > +}
> > +

^ permalink raw reply	[flat|nested] 173+ messages in thread

* Re: [dpdk-dev] [PATCH v7 11/17] test/ring: modify burst enq/deq perf test cases
  2020-01-02 16:57       ` Ananyev, Konstantin
@ 2020-01-07  5:42         ` Honnappa Nagarahalli
  0 siblings, 0 replies; 173+ messages in thread
From: Honnappa Nagarahalli @ 2020-01-07  5:42 UTC (permalink / raw)
  To: Ananyev, Konstantin, olivier.matz, sthemmin, jerinj, Richardson,
	Bruce, david.marchand, pbhagavatula
  Cc: dev, Dharmik Thakkar, Ruifeng Wang, Gavin Hu, nd, nd

<snip>

> >
> > Add test cases to test legacy and rte_ring_xxx_elem APIs for burst
> > enqueue/dequeue test cases.
> >
> > Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
> > Reviewed-by: Gavin Hu <gavin.hu@arm.com>
> > ---
> >  app/test/test_ring_perf.c | 78
> > ++++++++++++++++++++-------------------
> >  1 file changed, 40 insertions(+), 38 deletions(-)
> >
> > diff --git a/app/test/test_ring_perf.c b/app/test/test_ring_perf.c
> > index 5829718c1..508c688dc 100644
> > --- a/app/test/test_ring_perf.c
> > +++ b/app/test/test_ring_perf.c
> > @@ -397,47 +397,40 @@ test_single_enqueue_dequeue(struct rte_ring *r,
> > const int esize,  }
> >
> >  /*
> > - * Test that does both enqueue and dequeue on a core using the burst() API calls
> > - * instead of the bulk() calls used in other tests. Results should be the same
> > - * as for the bulk function called on a single lcore.
> > + * Test that does both enqueue and dequeue on a core using the burst/bulk API
> > + * calls Results should be the same as for the bulk function called on a
> > + * single lcore.
> >   */
> > -static void
> > -test_burst_enqueue_dequeue(struct rte_ring *r)
> > +static int
> > +test_burst_bulk_enqueue_dequeue(struct rte_ring *r, const int esize,
> > +	const unsigned int api_type)
> >  {
> > -	const unsigned iter_shift = 23;
> > -	const unsigned iterations = 1<<iter_shift;
> > -	unsigned sz, i = 0;
> > -	void *burst[MAX_BURST] = {0};
> > +	int ret;
> > +	const unsigned int iter_shift = 23;
> > +	const unsigned int iterations = 1 << iter_shift;
> > +	unsigned int sz, i = 0;
> > +	void **burst = NULL;
> >
> > -	for (sz = 0; sz < sizeof(bulk_sizes)/sizeof(bulk_sizes[0]); sz++) {
> > -		const uint64_t sc_start = rte_rdtsc();
> > -		for (i = 0; i < iterations; i++) {
> > -			rte_ring_sp_enqueue_burst(r, burst,
> > -					bulk_sizes[sz], NULL);
> > -			rte_ring_sc_dequeue_burst(r, burst,
> > -					bulk_sizes[sz], NULL);
> > -		}
> > -		const uint64_t sc_end = rte_rdtsc();
> > +	(void)ret;
> > +	burst = test_ring_calloc(MAX_BURST, esize);
> > +	if (burst == NULL)
> > +		return -1;
> >
> > -		const uint64_t mc_start = rte_rdtsc();
> > +	for (sz = 0; sz < RTE_DIM(bulk_sizes); sz++) {
> > +		const uint64_t start = rte_rdtsc();
> >  		for (i = 0; i < iterations; i++) {
> > -			rte_ring_mp_enqueue_burst(r, burst,
> > -					bulk_sizes[sz], NULL);
> > -			rte_ring_mc_dequeue_burst(r, burst,
> > -					bulk_sizes[sz], NULL);
> > +			TEST_RING_ENQUEUE(r, burst, esize, bulk_sizes[sz],
> > +						ret, api_type);
> > +			TEST_RING_DEQUEUE(r, burst, esize, bulk_sizes[sz],
> > +						ret, api_type);
> >  		}
> > -		const uint64_t mc_end = rte_rdtsc();
> > -
> > -		double mc_avg = ((double)(mc_end-mc_start) / iterations) /
> > -					bulk_sizes[sz];
> > -		double sc_avg = ((double)(sc_end-sc_start) / iterations) /
> > -					bulk_sizes[sz];
> > +		const uint64_t end = rte_rdtsc();
> >
> > -		printf("SP/SC burst enq/dequeue (size: %u): %.2F\n",
> > -				bulk_sizes[sz], sc_avg);
> > -		printf("MP/MC burst enq/dequeue (size: %u): %.2F\n",
> > -				bulk_sizes[sz], mc_avg);
> > +		test_ring_print_test_string(api_type, esize, bulk_sizes[sz],
> > +					((double)(end - start)) / iterations);
> >  	}
> > +
> 
> missing rte_free(burst);
> ?
Yes, will fix.
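
A minimal sketch of the shape of that fix, assuming nothing beyond what is quoted above. It uses calloc()/free() as stand-ins for rte_zmalloc()/rte_free() so that it builds without DPDK, and run_burst_test is a hypothetical name, not the function in the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Allocate the burst buffer, run the test body, and free the buffer
 * before returning. esize == -1 selects the legacy 'void *' element.
 */
int
run_burst_test(unsigned int max_burst, int esize)
{
	size_t sz = (esize == -1) ? sizeof(void *) : (size_t)esize;
	void *burst;

	burst = calloc(max_burst, sz);	/* rte_zmalloc() in the real test */
	if (burst == NULL)
		return -1;

	/* ... enqueue/dequeue timing loops would go here ... */

	free(burst);			/* rte_free() in the real test */
	return 0;
}
```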

> 
> > +	return 0;
> >  }
> >

^ permalink raw reply	[flat|nested] 173+ messages in thread

* Re: [dpdk-dev] [PATCH v7 15/17] test/ring: adjust run-on-all-cores perf test cases
  2020-01-02 17:00       ` Ananyev, Konstantin
@ 2020-01-07  5:42         ` Honnappa Nagarahalli
  0 siblings, 0 replies; 173+ messages in thread
From: Honnappa Nagarahalli @ 2020-01-07  5:42 UTC (permalink / raw)
  To: Ananyev, Konstantin, olivier.matz, sthemmin, jerinj, Richardson,
	Bruce, david.marchand, pbhagavatula
  Cc: dev, Dharmik Thakkar, Ruifeng Wang, Gavin Hu, nd, nd

<snip>

> 
> >
> > Adjust run-on-all-cores test case to use legacy APIs.
> >
> > Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
> > Reviewed-by: Gavin Hu <gavin.hu@arm.com>
> > ---
> >  app/test/test_ring_perf.c | 6 +++---
> >  1 file changed, 3 insertions(+), 3 deletions(-)
> >
> > diff --git a/app/test/test_ring_perf.c b/app/test/test_ring_perf.c
> > index b893b5779..fb95e4f2c 100644
> > --- a/app/test/test_ring_perf.c
> > +++ b/app/test/test_ring_perf.c
> > @@ -520,6 +520,9 @@ test_ring_perf(void)
> >  					dequeue_bulk) < 0)
> >  			return -1;
> >  	}
> > +	printf("\n### Testing using all slave nodes ###\n");
> > +	if (run_on_all_cores(r) < 0)
> > +		return -1;
> >  	rte_ring_free(r);
> >
> >  	TEST_RING_CREATE(RING_NAME, 16, RING_SIZE, rte_socket_id(), 0, r);
> > @@ -567,9 +570,6 @@ test_ring_perf(void)
> >  			return -1;
> >  	}
> >
> > -	printf("\n### Testing using all slave nodes ###\n");
> > -	run_on_all_cores(r);
> > -
> 
> No run_on_all_cores() for 16B elems case?
Ok, I can add.

> 
> >  	rte_ring_free(r);
> >
> >  	return 0;
> >

^ permalink raw reply	[flat|nested] 173+ messages in thread

* Re: [dpdk-dev] [PATCH v7 10/17] test/ring: modify single element enq/deq perf test cases
  2020-01-02 17:03       ` Ananyev, Konstantin
@ 2020-01-07  5:54         ` Honnappa Nagarahalli
  2020-01-07 16:13           ` Ananyev, Konstantin
  0 siblings, 1 reply; 173+ messages in thread
From: Honnappa Nagarahalli @ 2020-01-07  5:54 UTC (permalink / raw)
  To: Ananyev, Konstantin, olivier.matz, sthemmin, jerinj, Richardson,
	Bruce, david.marchand, pbhagavatula
  Cc: dev, Dharmik Thakkar, Ruifeng Wang, Gavin Hu, nd,
	Honnappa Nagarahalli, nd

<snip>

> 
> >
> > Add test cases to test rte_ring_xxx_elem APIs for single element
> > enqueue/dequeue test cases.
> >
> > Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
> > Reviewed-by: Gavin Hu <gavin.hu@arm.com>
> > ---
> >  app/test/test_ring_perf.c | 100
> > ++++++++++++++++++++++++++++++--------
> >  1 file changed, 80 insertions(+), 20 deletions(-)
> >
> > diff --git a/app/test/test_ring_perf.c b/app/test/test_ring_perf.c
> > index 6c2aca483..5829718c1 100644
> > --- a/app/test/test_ring_perf.c
> > +++ b/app/test/test_ring_perf.c
> > @@ -13,6 +13,7 @@
> >  #include <string.h>
> >
> >  #include "test.h"
> > +#include "test_ring.h"
> >
> >  /*
> >   * Ring
> > @@ -41,6 +42,35 @@ struct lcore_pair {
> >
> >  static volatile unsigned lcore_count = 0;
> >
> > +static void
> > +test_ring_print_test_string(unsigned int api_type, int esize,
> > +	unsigned int bsz, double value)
> > +{
> > +	if (esize == -1)
> > +		printf("legacy APIs");
> > +	else
> > +		printf("elem APIs: element size %dB", esize);
> > +
> > +	if (api_type == TEST_RING_IGNORE_API_TYPE)
> > +		return;
> > +
> > +	if ((api_type & TEST_RING_N) == TEST_RING_N)
> > +		printf(": default enqueue/dequeue: ");
> > +	else if ((api_type & TEST_RING_S) == TEST_RING_S)
> > +		printf(": SP/SC: ");
> > +	else if ((api_type & TEST_RING_M) == TEST_RING_M)
> > +		printf(": MP/MC: ");
> > +
> > +	if ((api_type & TEST_RING_SL) == TEST_RING_SL)
> > +		printf("single: ");
> > +	else if ((api_type & TEST_RING_BL) == TEST_RING_BL)
> > +		printf("bulk (size: %u): ", bsz);
> > +	else if ((api_type & TEST_RING_BR) == TEST_RING_BR)
> > +		printf("burst (size: %u): ", bsz);
> > +
> > +	printf("%.2F\n", value);
> > +}
> > +
> >  /**** Functions to analyse our core mask to get cores for different
> > tests ***/
> >
> >  static int
> > @@ -335,32 +365,35 @@ run_on_all_cores(struct rte_ring *r)
> >   * Test function that determines how long an enqueue + dequeue of a single item
> >   * takes on a single lcore. Result is for comparison with the bulk enq+deq.
> >   */
> > -static void
> > -test_single_enqueue_dequeue(struct rte_ring *r)
> > +static int
> > +test_single_enqueue_dequeue(struct rte_ring *r, const int esize,
> > +	const unsigned int api_type)
> >  {
> > -	const unsigned iter_shift = 24;
> > -	const unsigned iterations = 1<<iter_shift;
> > -	unsigned i = 0;
> > +	int ret;
> > +	const unsigned int iter_shift = 24;
> > +	const unsigned int iterations = 1 << iter_shift;
> > +	unsigned int i = 0;
> >  	void *burst = NULL;
> >
> > -	const uint64_t sc_start = rte_rdtsc();
> > -	for (i = 0; i < iterations; i++) {
> > -		rte_ring_sp_enqueue(r, burst);
> > -		rte_ring_sc_dequeue(r, &burst);
> > -	}
> > -	const uint64_t sc_end = rte_rdtsc();
> > +	(void)ret;
> 
> Here, and in few other places, looks redundant.
The compiler throws an error since 'ret' is assigned a value, but it is not used.
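
For context, a stand-alone sketch of the situation (not the test code itself; DO_OP and measure are hypothetical names). The macro writes its result into 'ret', but nothing ever reads it. With GCC this is reported by -Wunused-but-set-variable, which -Wall enables, and it becomes an error under -Werror. The cast to void marks the variable as intentionally unused.

```c
#include <assert.h>

/* Hypothetical stand-in for TEST_RING_ENQUEUE: stores a result in 'res'. */
#define DO_OP(x, res) do { (res) = (x) + 1; } while (0)

unsigned int
measure(unsigned int iterations)
{
	unsigned int i, done = 0;
	int ret = 0;

	for (i = 0; i < iterations; i++) {
		DO_OP((int)i, ret);
		done++;
	}

	/* 'ret' is written above but never read; without this cast the
	 * compiler may warn that the variable is set but not used.
	 */
	(void)ret;

	return done;
}
```

DPDK's RTE_SET_USED() macro from rte_common.h expands to the same cast and could be used instead.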

> 
> > +	/* alloc dummy object pointers */
> > +	burst = test_ring_calloc(1, esize);
> > +	if (burst == NULL)
> > +		return -1;
> >
> > -	const uint64_t mc_start = rte_rdtsc();
> > +	const uint64_t start = rte_rdtsc();
> >  	for (i = 0; i < iterations; i++) {
> > -		rte_ring_mp_enqueue(r, burst);
> > -		rte_ring_mc_dequeue(r, &burst);
> > +		TEST_RING_ENQUEUE(r, burst, esize, 1, ret, api_type);
> > +		TEST_RING_DEQUEUE(r, burst, esize, 1, ret, api_type);
> >  	}
> > -	const uint64_t mc_end = rte_rdtsc();
> > +	const uint64_t end = rte_rdtsc();
> > +
> > +	test_ring_print_test_string(api_type, esize, 1,
> > +					((double)(end - start)) / iterations);
> > +
> > +	rte_free(burst);
> >
> > -	printf("SP/SC single enq/dequeue: %.2F\n",
> > -			((double)(sc_end-sc_start)) / iterations);
> > -	printf("MP/MC single enq/dequeue: %.2F\n",
> > -			((double)(mc_end-mc_start)) / iterations);
> > +	return 0;
> >  }
> >
> >  /*

^ permalink raw reply	[flat|nested] 173+ messages in thread

* Re: [dpdk-dev] [PATCH v7 02/17] lib/ring: apis to support configurable element size
  2020-01-07  5:35         ` Honnappa Nagarahalli
@ 2020-01-07  6:00           ` Honnappa Nagarahalli
  2020-01-07 10:21             ` Ananyev, Konstantin
  0 siblings, 1 reply; 173+ messages in thread
From: Honnappa Nagarahalli @ 2020-01-07  6:00 UTC (permalink / raw)
  To: Ananyev, Konstantin, olivier.matz, sthemmin, jerinj, Richardson,
	Bruce, david.marchand, pbhagavatula
  Cc: dev, Dharmik Thakkar, Ruifeng Wang, Gavin Hu, nd,
	Honnappa Nagarahalli, nd

<snip>
> > > +
> > > +static __rte_always_inline void
> > > +enqueue_elems_128(struct rte_ring *r, uint32_t prod_head,
> > > +		const void *obj_table, uint32_t n)
> > > +{
> > > +	unsigned int i;
> > > +	const uint32_t size = r->size;
> > > +	uint32_t idx = prod_head & r->mask;
> > > +	__uint128_t *ring = (__uint128_t *)&r[1];
> > > +	const __uint128_t *obj = (const __uint128_t *)obj_table;
> > > +	if (likely(idx + n < size)) {
> > > +		for (i = 0; i < (n & ~0x1); i += 2, idx += 2) {
> > > +			ring[idx] = obj[i];
> > > +			ring[idx + 1] = obj[i + 1];
> >
> >
> > AFAIK, that implies 16B aligned obj_table...
> > Would it always be the case?
> I am not sure from the compiler perspective.
> At least on Arm architecture, unaligned access (address that is accessed is not
> aligned to the size of the data element being accessed) will result in faults or
> require additional cycles. So, aligning on 16B should be fine.
Further, I would be changing this to use 'rte_int128_t' as '__uint128_t' is not defined on 32b systems.

> 
> >
> > > +		}
> > > +		switch (n & 0x1) {
> > > +		case 1:
> > > +			ring[idx++] = obj[i++];
> > > +		}
> > > +	} else {
> > > +		for (i = 0; idx < size; i++, idx++)
> > > +			ring[idx] = obj[i];
> > > +		/* Start at the beginning */
> > > +		for (idx = 0; i < n; i++, idx++)
> > > +			ring[idx] = obj[i];
> > > +	}
> > > +}
> > > +
> > > +/* the actual enqueue of elements on the ring.
> > > + * Placed here since identical code needed in both
> > > + * single and multi producer enqueue functions.
> > > + */
> > > +static __rte_always_inline void
> > > +enqueue_elems(struct rte_ring *r, uint32_t prod_head, const void *obj_table,
> > > +		uint32_t esize, uint32_t num)
> > > +{
> > > +	uint32_t idx, nr_idx, nr_num;
> > > +
> > > +	/* 8B and 16B copies implemented individually to retain
> > > +	 * the current performance.
> > > +	 */
> > > +	if (esize == 8)
> > > +		enqueue_elems_64(r, prod_head, obj_table, num);
> > > +	else if (esize == 16)
> > > +		enqueue_elems_128(r, prod_head, obj_table, num);
> > > +	else {
> > > +		/* Normalize to uint32_t */
> > > +		uint32_t scale = esize / sizeof(uint32_t);
> > > +		nr_num = num * scale;
> > > +		idx = prod_head & r->mask;
> > > +		nr_idx = idx * scale;
> > > +		enqueue_elems_32(r, nr_idx, obj_table, nr_num);
> > > +	}
> > > +}
> > > +


^ permalink raw reply	[flat|nested] 173+ messages in thread

* Re: [dpdk-dev] [PATCH v7 02/17] lib/ring: apis to support configurable element size
  2020-01-07  6:00           ` Honnappa Nagarahalli
@ 2020-01-07 10:21             ` Ananyev, Konstantin
  2020-01-07 15:21               ` Honnappa Nagarahalli
  0 siblings, 1 reply; 173+ messages in thread
From: Ananyev, Konstantin @ 2020-01-07 10:21 UTC (permalink / raw)
  To: Honnappa Nagarahalli, olivier.matz, sthemmin, jerinj, Richardson,
	Bruce, david.marchand, pbhagavatula
  Cc: dev, Dharmik Thakkar, Ruifeng Wang, Gavin Hu, nd, nd


> <snip>
> > > > +
> > > > +static __rte_always_inline void
> > > > +enqueue_elems_128(struct rte_ring *r, uint32_t prod_head,
> > > > +		const void *obj_table, uint32_t n)
> > > > +{
> > > > +	unsigned int i;
> > > > +	const uint32_t size = r->size;
> > > > +	uint32_t idx = prod_head & r->mask;
> > > > +	__uint128_t *ring = (__uint128_t *)&r[1];
> > > > +	const __uint128_t *obj = (const __uint128_t *)obj_table;
> > > > +	if (likely(idx + n < size)) {
> > > > +		for (i = 0; i < (n & ~0x1); i += 2, idx += 2) {
> > > > +			ring[idx] = obj[i];
> > > > +			ring[idx + 1] = obj[i + 1];
> > >
> > >
> > > AFAIK, that implies 16B aligned obj_table...
> > > Would it always be the case?
> > I am not sure from the compiler perspective.
> > At least on Arm architecture, unaligned access (address that is accessed is not
> > aligned to the size of the data element being accessed) will result in faults or
> > require additional cycles. So, aligning on 16B should be fine.
> Further, I would be changing this to use 'rte_int128_t' as '__uint128_t' is not defined on 32b systems.

What I am trying to say: with this code we imply a new requirement for
elems in the ring: when sizeof(elem)==16 its alignment also has to be at least 16.
Which from my perspective is not ideal.
Note that for elem sizes > 16 (24, 32), there is no such constraint.

> 
> >
> > >
> > > > +		}
> > > > +		switch (n & 0x1) {
> > > > +		case 1:
> > > > +			ring[idx++] = obj[i++];
> > > > +		}
> > > > +	} else {
> > > > +		for (i = 0; idx < size; i++, idx++)
> > > > +			ring[idx] = obj[i];
> > > > +		/* Start at the beginning */
> > > > +		for (idx = 0; i < n; i++, idx++)
> > > > +			ring[idx] = obj[i];
> > > > +	}
> > > > +}
> > > > +
> > > > +/* the actual enqueue of elements on the ring.
> > > > + * Placed here since identical code needed in both
> > > > + * single and multi producer enqueue functions.
> > > > + */
> > > > +static __rte_always_inline void
> > > > +enqueue_elems(struct rte_ring *r, uint32_t prod_head, const void *obj_table,
> > > > +		uint32_t esize, uint32_t num)
> > > > +{
> > > > +	uint32_t idx, nr_idx, nr_num;
> > > > +
> > > > +	/* 8B and 16B copies implemented individually to retain
> > > > +	 * the current performance.
> > > > +	 */
> > > > +	if (esize == 8)
> > > > +		enqueue_elems_64(r, prod_head, obj_table, num);
> > > > +	else if (esize == 16)
> > > > +		enqueue_elems_128(r, prod_head, obj_table, num);
> > > > +	else {
> > > > +		/* Normalize to uint32_t */
> > > > +		uint32_t scale = esize / sizeof(uint32_t);
> > > > +		nr_num = num * scale;
> > > > +		idx = prod_head & r->mask;
> > > > +		nr_idx = idx * scale;
> > > > +		enqueue_elems_32(r, nr_idx, obj_table, nr_num);
> > > > +	}
> > > > +}
> > > > +


^ permalink raw reply	[flat|nested] 173+ messages in thread

* Re: [dpdk-dev] [PATCH v7 02/17] lib/ring: apis to support configurable element size
  2020-01-07 10:21             ` Ananyev, Konstantin
@ 2020-01-07 15:21               ` Honnappa Nagarahalli
  2020-01-07 15:41                 ` Ananyev, Konstantin
  0 siblings, 1 reply; 173+ messages in thread
From: Honnappa Nagarahalli @ 2020-01-07 15:21 UTC (permalink / raw)
  To: Ananyev, Konstantin, olivier.matz, sthemmin, jerinj, Richardson,
	Bruce, david.marchand, pbhagavatula
  Cc: dev, Dharmik Thakkar, Ruifeng Wang, Gavin Hu, nd,
	Honnappa Nagarahalli, nd

<snip>
> > > > > +
> > > > > +static __rte_always_inline void
> > > > > +enqueue_elems_128(struct rte_ring *r, uint32_t prod_head,
> > > > > +		const void *obj_table, uint32_t n)
> > > > > +{
> > > > > +	unsigned int i;
> > > > > +	const uint32_t size = r->size;
> > > > > +	uint32_t idx = prod_head & r->mask;
> > > > > +	__uint128_t *ring = (__uint128_t *)&r[1];
> > > > > +	const __uint128_t *obj = (const __uint128_t *)obj_table;
> > > > > +	if (likely(idx + n < size)) {
> > > > > +		for (i = 0; i < (n & ~0x1); i += 2, idx += 2) {
> > > > > +			ring[idx] = obj[i];
> > > > > +			ring[idx + 1] = obj[i + 1];
> > > >
> > > >
> > > > AFAIK, that implies 16B aligned obj_table...
> > > > Would it always be the case?
> > > I am not sure from the compiler perspective.
> > > At least on Arm architecture, unaligned access (address that is
> > > accessed is not aligned to the size of the data element being
> > > accessed) will result in faults or require additional cycles. So,
> > > aligning on 16B should be fine.
> > Further, I would be changing this to use 'rte_int128_t' as '__uint128_t'
> > is not defined on 32b systems.
> 
> What I am trying to say: with this code we imply a new requirement for elems
The only existing use case in DPDK for 16B is the event ring. The event ring already does a similar kind of copy (using 'struct rte_event'). So, there is no change in expectations for event ring.
For future code, I think this expectation should be fine since it allows for optimal code.

> in the ring: when sizeof(elem)==16 its alignment also has to be at least 16.
> Which from my perspective is not ideal.
Any reasoning?

> Note that for elem sizes > 16 (24, 32), there is no such constraint.
The rest of them need to be aligned on 4B boundary. However, this should not affect the existing code.
The code for 8B and 16B is kept as is to ensure the performance is not affected for the existing code.
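
To make the 4B normalization concrete, here is a small sketch of the index arithmetic in the generic branch of enqueue_elems(). The struct and function names are made up for illustration; only the arithmetic follows the quoted patch, and it assumes esize is a multiple of 4.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical holder for the two normalized values. */
struct u32_span {
	uint32_t nr_idx;	/* start index, in uint32_t units */
	uint32_t nr_num;	/* number of uint32_t words to copy */
};

/* Convert an element index and count into uint32_t units. */
struct u32_span
normalize_to_u32(uint32_t prod_head, uint32_t mask, uint32_t esize,
		uint32_t num)
{
	struct u32_span out;
	uint32_t scale;

	/* The generic path only handles sizes that are a multiple of 4B. */
	assert(esize % sizeof(uint32_t) == 0);

	scale = esize / sizeof(uint32_t);
	out.nr_idx = (prod_head & mask) * scale;
	out.nr_num = num * scale;
	return out;
}
```

For example, with esize = 24, mask = 7, prod_head = 13 and num = 3, the slot index is 13 & 7 = 5 and the scale is 6, so the copy starts at uint32_t index 30 and covers 18 words.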

> 
> >
> > >
> > > >
> > > > > +		}
> > > > > +		switch (n & 0x1) {
> > > > > +		case 1:
> > > > > +			ring[idx++] = obj[i++];
> > > > > +		}
> > > > > +	} else {
> > > > > +		for (i = 0; idx < size; i++, idx++)
> > > > > +			ring[idx] = obj[i];
> > > > > +		/* Start at the beginning */
> > > > > +		for (idx = 0; i < n; i++, idx++)
> > > > > +			ring[idx] = obj[i];
> > > > > +	}
> > > > > +}
> > > > > +
> > > > > +/* the actual enqueue of elements on the ring.
> > > > > + * Placed here since identical code needed in both
> > > > > + * single and multi producer enqueue functions.
> > > > > + */
> > > > > +static __rte_always_inline void
> > > > > +enqueue_elems(struct rte_ring *r, uint32_t prod_head, const void *obj_table,
> > > > > +		uint32_t esize, uint32_t num)
> > > > > +{
> > > > > +	uint32_t idx, nr_idx, nr_num;
> > > > > +
> > > > > +	/* 8B and 16B copies implemented individually to retain
> > > > > +	 * the current performance.
> > > > > +	 */
> > > > > +	if (esize == 8)
> > > > > +		enqueue_elems_64(r, prod_head, obj_table, num);
> > > > > +	else if (esize == 16)
> > > > > +		enqueue_elems_128(r, prod_head, obj_table, num);
> > > > > +	else {
> > > > > +		/* Normalize to uint32_t */
> > > > > +		uint32_t scale = esize / sizeof(uint32_t);
> > > > > +		nr_num = num * scale;
> > > > > +		idx = prod_head & r->mask;
> > > > > +		nr_idx = idx * scale;
> > > > > +		enqueue_elems_32(r, nr_idx, obj_table, nr_num);
> > > > > +	}
> > > > > +}
> > > > > +


^ permalink raw reply	[flat|nested] 173+ messages in thread

* Re: [dpdk-dev] [PATCH v7 02/17] lib/ring: apis to support configurable element size
  2020-01-07 15:21               ` Honnappa Nagarahalli
@ 2020-01-07 15:41                 ` Ananyev, Konstantin
  2020-01-08  6:17                   ` Honnappa Nagarahalli
  0 siblings, 1 reply; 173+ messages in thread
From: Ananyev, Konstantin @ 2020-01-07 15:41 UTC (permalink / raw)
  To: Honnappa Nagarahalli, olivier.matz, sthemmin, jerinj, Richardson,
	Bruce, david.marchand, pbhagavatula
  Cc: dev, Dharmik Thakkar, Ruifeng Wang, Gavin Hu, nd, nd


> <snip>
> > > > > > +
> > > > > > +static __rte_always_inline void
> > > > > > +enqueue_elems_128(struct rte_ring *r, uint32_t prod_head,
> > > > > > +		const void *obj_table, uint32_t n)
> > > > > > +{
> > > > > > +	unsigned int i;
> > > > > > +	const uint32_t size = r->size;
> > > > > > +	uint32_t idx = prod_head & r->mask;
> > > > > > +	__uint128_t *ring = (__uint128_t *)&r[1];
> > > > > > +	const __uint128_t *obj = (const __uint128_t *)obj_table;
> > > > > > +	if (likely(idx + n < size)) {
> > > > > > +		for (i = 0; i < (n & ~0x1); i += 2, idx += 2) {
> > > > > > +			ring[idx] = obj[i];
> > > > > > +			ring[idx + 1] = obj[i + 1];
> > > > >
> > > > >
> > > > > AFAIK, that implies 16B aligned obj_table...
> > > > > Would it always be the case?
> > > > I am not sure from the compiler perspective.
> > > > At least on Arm architecture, unaligned access (address that is
> > > > accessed is not aligned to the size of the data element being
> > > > accessed) will result in faults or require additional cycles. So,
> > > > aligning on 16B should be fine.
> > > Further, I would be changing this to use 'rte_int128_t' as '__uint128_t'
> > > is not defined on 32b systems.
> >
> > What I am trying to say: with this code we imply a new requirement for elems
> The only existing use case in DPDK for 16B is the event ring. The event ring already does a similar kind of copy (using 'struct rte_event').
> So, there is no change in expectations for event ring.
> For future code, I think this expectation should be fine since it allows for optimal code.
> 
> > in the ring: when sizeof(elem)==16 its alignment also has to be at least 16.
> > Which from my perspective is not ideal.
> Any reasoning?

New implicit requirement and inconsistency.
Code like that:

struct ring_elem {uint64_t a, b;};
....
struct ring_elem elem; 
rte_ring_dequeue_elem(ring, &elem, sizeof(elem));
 
might cause a crash.
While exactly the same code with:

struct ring_elem {uint64_t a, b, c;}; OR struct ring_elem {uint64_t a, b, c, d;};

will work ok.
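
A small stand-alone sketch of the concern, under stated assumptions: it is not code from the patch, and copy_elems_16B is a hypothetical helper. The struct's alignment requirement comes from its widest member (uint64_t), not from its 16B size, so a pointer to it need not be 16B aligned. One way to avoid the implicit requirement would be to copy with memcpy(), which accepts any alignment. Whether that keeps the performance of the typed copy would need to be measured.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct ring_elem {
	uint64_t a, b;
};

/* The size is 16B, but the alignment follows the uint64_t members. */
static_assert(sizeof(struct ring_elem) == 16, "two uint64_t members");
static_assert(alignof(struct ring_elem) == alignof(uint64_t),
	"alignment comes from the widest member");

/* Copy n 16B elements without assuming more than byte alignment on
 * either side. Compilers commonly lower a fixed-size memcpy() to a few
 * load/store instructions.
 */
void
copy_elems_16B(void *dst, const void *src, unsigned int n)
{
	unsigned char *d = dst;
	const unsigned char *s = src;
	unsigned int i;

	for (i = 0; i < n; i++)
		memcpy(d + 16 * (size_t)i, s + 16 * (size_t)i, 16);
}
```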

> 
> > Note that for elem sizes > 16 (24, 32), there is no such constraint.
> The rest of them need to be aligned on 4B boundary. However, this should not affect the existing code.
> The code for 8B and 16B is kept as is to ensure the performance is not affected for the existing code.
> 
> >
> > >
> > > >
> > > > >
> > > > > > +		}
> > > > > > +		switch (n & 0x1) {
> > > > > > +		case 1:
> > > > > > +			ring[idx++] = obj[i++];
> > > > > > +		}
> > > > > > +	} else {
> > > > > > +		for (i = 0; idx < size; i++, idx++)
> > > > > > +			ring[idx] = obj[i];
> > > > > > +		/* Start at the beginning */
> > > > > > +		for (idx = 0; i < n; i++, idx++)
> > > > > > +			ring[idx] = obj[i];
> > > > > > +	}
> > > > > > +}
> > > > > > +
> > > > > > +/* the actual enqueue of elements on the ring.
> > > > > > + * Placed here since identical code needed in both
> > > > > > + * single and multi producer enqueue functions.
> > > > > > + */
> > > > > > +static __rte_always_inline void
> > > > > > +enqueue_elems(struct rte_ring *r, uint32_t prod_head, const void *obj_table,
> > > > > > +		uint32_t esize, uint32_t num)
> > > > > > +{
> > > > > > +	uint32_t idx, nr_idx, nr_num;
> > > > > > +
> > > > > > +	/* 8B and 16B copies implemented individually to retain
> > > > > > +	 * the current performance.
> > > > > > +	 */
> > > > > > +	if (esize == 8)
> > > > > > +		enqueue_elems_64(r, prod_head, obj_table, num);
> > > > > > +	else if (esize == 16)
> > > > > > +		enqueue_elems_128(r, prod_head, obj_table, num);
> > > > > > +	else {
> > > > > > +		/* Normalize to uint32_t */
> > > > > > +		uint32_t scale = esize / sizeof(uint32_t);
> > > > > > +		nr_num = num * scale;
> > > > > > +		idx = prod_head & r->mask;
> > > > > > +		nr_idx = idx * scale;
> > > > > > +		enqueue_elems_32(r, nr_idx, obj_table, nr_num);
> > > > > > +	}
> > > > > > +}
> > > > > > +


^ permalink raw reply	[flat|nested] 173+ messages in thread

* Re: [dpdk-dev] [PATCH v7 03/17] test/ring: add functional tests for rte_ring_xxx_elem APIs
  2020-01-07  5:13         ` Honnappa Nagarahalli
@ 2020-01-07 16:03           ` Ananyev, Konstantin
  2020-01-09  5:15             ` Honnappa Nagarahalli
  0 siblings, 1 reply; 173+ messages in thread
From: Ananyev, Konstantin @ 2020-01-07 16:03 UTC (permalink / raw)
  To: Honnappa Nagarahalli, olivier.matz, sthemmin, jerinj, Richardson,
	Bruce, david.marchand, pbhagavatula
  Cc: dev, Dharmik Thakkar, Ruifeng Wang, Gavin Hu, nd, nd

> > > Add basic infrastructure to test rte_ring_xxx_elem APIs. Add test
> > > cases for testing burst and bulk tests.
> > >
> > > Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
> > > Reviewed-by: Gavin Hu <gavin.hu@arm.com>
> > > ---
> > >  app/test/test_ring.c | 466
> > > ++++++++++++++++++++-----------------------
> > >  app/test/test_ring.h | 203 +++++++++++++++++++
> > >  2 files changed, 419 insertions(+), 250 deletions(-)  create mode
> > > 100644 app/test/test_ring.h
> > >
> > > diff --git a/app/test/test_ring.c b/app/test/test_ring.c index
> > > aaf1e70ad..e7a8b468b 100644
> > > --- a/app/test/test_ring.c
> > > +++ b/app/test/test_ring.c
> > > @@ -23,11 +23,13 @@
> > >  #include <rte_branch_prediction.h>
> > >  #include <rte_malloc.h>
> > >  #include <rte_ring.h>
> > > +#include <rte_ring_elem.h>
> > >  #include <rte_random.h>
> > >  #include <rte_errno.h>
> > >  #include <rte_hexdump.h>
> > >
> > >  #include "test.h"
> > > +#include "test_ring.h"
> > >
> > >  /*
> > >   * Ring
> > > @@ -67,6 +69,50 @@ static rte_atomic32_t synchro;
> > >
> > >  #define	TEST_RING_FULL_EMTPY_ITER	8
> > >
> > > +static int esize[] = {-1, 4, 8, 16};
> > > +
> > > +static void
> > > +test_ring_mem_init(void *obj, unsigned int count, int esize) {
> > > +	unsigned int i;
> > > +
> > > +	/* Legacy queue APIs? */
> > > +	if (esize == -1)
> > > +		for (i = 0; i < count; i++)
> > > +			((void **)obj)[i] = (void *)(unsigned long)i;
> > > +	else
> > > +		for (i = 0; i < (count * esize / sizeof(uint32_t)); i++)
> > > +			((uint32_t *)obj)[i] = i;
> > > +}
> > > +
> > > +static void
> > > +test_ring_print_test_string(const char *istr, unsigned int api_type,
> > > +int esize) {
> > > +	printf("\n%s: ", istr);
> > > +
> > > +	if (esize == -1)
> > > +		printf("legacy APIs: ");
> > > +	else
> > > +		printf("elem APIs: element size %dB ", esize);
> > > +
> > > +	if (api_type == TEST_RING_IGNORE_API_TYPE)
> > > +		return;
> > > +
> > > +	if ((api_type & TEST_RING_N) == TEST_RING_N)
> > > +		printf(": default enqueue/dequeue: ");
> > > +	else if ((api_type & TEST_RING_S) == TEST_RING_S)
> > > +		printf(": SP/SC: ");
> > > +	else if ((api_type & TEST_RING_M) == TEST_RING_M)
> > > +		printf(": MP/MC: ");
> > > +
> > > +	if ((api_type & TEST_RING_SL) == TEST_RING_SL)
> > > +		printf("single\n");
> > > +	else if ((api_type & TEST_RING_BL) == TEST_RING_BL)
> > > +		printf("bulk\n");
> > > +	else if ((api_type & TEST_RING_BR) == TEST_RING_BR)
> > > +		printf("burst\n");
> > > +}
> > > +
> > >  /*
> > >   * helper routine for test_ring_basic
> > >   */
> > > @@ -314,286 +360,203 @@ test_ring_basic(struct rte_ring *r)
> > >  	return -1;
> > >  }
> > >
> > > +/*
> > > + * Burst and bulk operations with sp/sc, mp/mc and default (during
> > > +creation)  */
> > >  static int
> > > -test_ring_burst_basic(struct rte_ring *r)
> > > +test_ring_burst_bulk_tests(unsigned int api_type)
> > >  {
> > > +	struct rte_ring *r;
> > >  	void **src = NULL, **cur_src = NULL, **dst = NULL, **cur_dst = NULL;
> > >  	int ret;
> > > -	unsigned i;
> > > +	unsigned int i, j;
> > > +	unsigned int num_elems;
> > >
> > > -	/* alloc dummy object pointers */
> > > -	src = malloc(RING_SIZE*2*sizeof(void *));
> > > -	if (src == NULL)
> > > -		goto fail;
> > > -
> > > -	for (i = 0; i < RING_SIZE*2 ; i++) {
> > > -		src[i] = (void *)(unsigned long)i;
> > > -	}
> > > -	cur_src = src;
> > > +	for (i = 0; i < RTE_DIM(esize); i++) {
> > > +		test_ring_print_test_string("Test standard ring", api_type,
> > > +						esize[i]);
> > >
> > > -	/* alloc some room for copied objects */
> > > -	dst = malloc(RING_SIZE*2*sizeof(void *));
> > > -	if (dst == NULL)
> > > -		goto fail;
> > > +		/* Create the ring */
> > > +		TEST_RING_CREATE("test_ring_burst_bulk_tests", esize[i],
> > > +					RING_SIZE, SOCKET_ID_ANY, 0, r);
> > >
> > > -	memset(dst, 0, RING_SIZE*2*sizeof(void *));
> > > -	cur_dst = dst;
> > > -
> > > -	printf("Test SP & SC basic functions \n");
> > > -	printf("enqueue 1 obj\n");
> > > -	ret = rte_ring_sp_enqueue_burst(r, cur_src, 1, NULL);
> > > -	cur_src += 1;
> > > -	if (ret != 1)
> > > -		goto fail;
> > > -
> > > -	printf("enqueue 2 objs\n");
> > > -	ret = rte_ring_sp_enqueue_burst(r, cur_src, 2, NULL);
> > > -	cur_src += 2;
> > > -	if (ret != 2)
> > > -		goto fail;
> > > -
> > > -	printf("enqueue MAX_BULK objs\n");
> > > -	ret = rte_ring_sp_enqueue_burst(r, cur_src, MAX_BULK, NULL);
> > > -	cur_src += MAX_BULK;
> > > -	if (ret != MAX_BULK)
> > > -		goto fail;
> > > -
> > > -	printf("dequeue 1 obj\n");
> > > -	ret = rte_ring_sc_dequeue_burst(r, cur_dst, 1, NULL);
> > > -	cur_dst += 1;
> > > -	if (ret != 1)
> > > -		goto fail;
> > > -
> > > -	printf("dequeue 2 objs\n");
> > > -	ret = rte_ring_sc_dequeue_burst(r, cur_dst, 2, NULL);
> > > -	cur_dst += 2;
> > > -	if (ret != 2)
> > > -		goto fail;
> > > +		/* alloc dummy object pointers */
> > > +		src = test_ring_calloc(RING_SIZE * 2, esize[i]);
> > > +		if (src == NULL)
> > > +			goto fail;
> > > +		test_ring_mem_init(src, RING_SIZE * 2, esize[i]);
> > > +		cur_src = src;
> > >
> > > -	printf("dequeue MAX_BULK objs\n");
> > > -	ret = rte_ring_sc_dequeue_burst(r, cur_dst, MAX_BULK, NULL);
> > > -	cur_dst += MAX_BULK;
> > > -	if (ret != MAX_BULK)
> > > -		goto fail;
> > > +		/* alloc some room for copied objects */
> > > +		dst = test_ring_calloc(RING_SIZE * 2, esize[i]);
> > > +		if (dst == NULL)
> > > +			goto fail;
> > > +		cur_dst = dst;
> > >
> > > -	/* check data */
> > > -	if (memcmp(src, dst, cur_dst - dst)) {
> > > -		rte_hexdump(stdout, "src", src, cur_src - src);
> > > -		rte_hexdump(stdout, "dst", dst, cur_dst - dst);
> > > -		printf("data after dequeue is not the same\n");
> > > -		goto fail;
> > > -	}
> > > +		printf("enqueue 1 obj\n");
> > > +		TEST_RING_ENQUEUE(r, cur_src, esize[i], 1, ret, api_type);
> > > +		if (ret != 1)
> > > +			goto fail;
> > > +		TEST_RING_INCP(cur_src, esize[i], 1);
> > >
> > > -	cur_src = src;
> > > -	cur_dst = dst;
> > > +		printf("enqueue 2 objs\n");
> > > +		TEST_RING_ENQUEUE(r, cur_src, esize[i], 2, ret, api_type);
> > > +		if (ret != 2)
> > > +			goto fail;
> > > +		TEST_RING_INCP(cur_src, esize[i], 2);
> > >
> > > -	printf("Test enqueue without enough memory space \n");
> > > -	for (i = 0; i< (RING_SIZE/MAX_BULK - 1); i++) {
> > > -		ret = rte_ring_sp_enqueue_burst(r, cur_src, MAX_BULK,
> > NULL);
> > > -		cur_src += MAX_BULK;
> > > +		printf("enqueue MAX_BULK objs\n");
> > > +		TEST_RING_ENQUEUE(r, cur_src, esize[i], MAX_BULK, ret,
> > > +						api_type);
> > >  		if (ret != MAX_BULK)
> > >  			goto fail;
> > > -	}
> > > -
> > > -	printf("Enqueue 2 objects, free entries = MAX_BULK - 2  \n");
> > > -	ret = rte_ring_sp_enqueue_burst(r, cur_src, 2, NULL);
> > > -	cur_src += 2;
> > > -	if (ret != 2)
> > > -		goto fail;
> > > +		TEST_RING_INCP(cur_src, esize[i], MAX_BULK);
> > >
> > > -	printf("Enqueue the remaining entries = MAX_BULK - 2  \n");
> > > -	/* Always one free entry left */
> > > -	ret = rte_ring_sp_enqueue_burst(r, cur_src, MAX_BULK, NULL);
> > > -	cur_src += MAX_BULK - 3;
> > > -	if (ret != MAX_BULK - 3)
> > > -		goto fail;
> > > -
> > > -	printf("Test if ring is full  \n");
> > > -	if (rte_ring_full(r) != 1)
> > > -		goto fail;
> > > +		printf("dequeue 1 obj\n");
> > > +		TEST_RING_DEQUEUE(r, cur_dst, esize[i], 1, ret, api_type);
> > > +		if (ret != 1)
> > > +			goto fail;
> > > +		TEST_RING_INCP(cur_dst, esize[i], 1);
> > >
> > > -	printf("Test enqueue for a full entry  \n");
> > > -	ret = rte_ring_sp_enqueue_burst(r, cur_src, MAX_BULK, NULL);
> > > -	if (ret != 0)
> > > -		goto fail;
> > > +		printf("dequeue 2 objs\n");
> > > +		TEST_RING_DEQUEUE(r, cur_dst, esize[i], 2, ret, api_type);
> > > +		if (ret != 2)
> > > +			goto fail;
> > > +		TEST_RING_INCP(cur_dst, esize[i], 2);
> > >
> > > -	printf("Test dequeue without enough objects \n");
> > > -	for (i = 0; i<RING_SIZE/MAX_BULK - 1; i++) {
> > > -		ret = rte_ring_sc_dequeue_burst(r, cur_dst, MAX_BULK,
> > NULL);
> > > -		cur_dst += MAX_BULK;
> > > +		printf("dequeue MAX_BULK objs\n");
> > > +		TEST_RING_DEQUEUE(r, cur_dst, esize[i], MAX_BULK, ret,
> > > +						api_type);
> > >  		if (ret != MAX_BULK)
> > >  			goto fail;
> > > -	}
> > > -
> > > -	/* Available memory space for the exact MAX_BULK entries */
> > > -	ret = rte_ring_sc_dequeue_burst(r, cur_dst, 2, NULL);
> > > -	cur_dst += 2;
> > > -	if (ret != 2)
> > > -		goto fail;
> > > -
> > > -	ret = rte_ring_sc_dequeue_burst(r, cur_dst, MAX_BULK, NULL);
> > > -	cur_dst += MAX_BULK - 3;
> > > -	if (ret != MAX_BULK - 3)
> > > -		goto fail;
> > > -
> > > -	printf("Test if ring is empty \n");
> > > -	/* Check if ring is empty */
> > > -	if (1 != rte_ring_empty(r))
> > > -		goto fail;
> > > -
> > > -	/* check data */
> > > -	if (memcmp(src, dst, cur_dst - dst)) {
> > > -		rte_hexdump(stdout, "src", src, cur_src - src);
> > > -		rte_hexdump(stdout, "dst", dst, cur_dst - dst);
> > > -		printf("data after dequeue is not the same\n");
> > > -		goto fail;
> > > -	}
> > > +		TEST_RING_INCP(cur_dst, esize[i], MAX_BULK);
> > >
> > > -	cur_src = src;
> > > -	cur_dst = dst;
> > > -
> > > -	printf("Test MP & MC basic functions \n");
> > > -
> > > -	printf("enqueue 1 obj\n");
> > > -	ret = rte_ring_mp_enqueue_burst(r, cur_src, 1, NULL);
> > > -	cur_src += 1;
> > > -	if (ret != 1)
> > > -		goto fail;
> > > -
> > > -	printf("enqueue 2 objs\n");
> > > -	ret = rte_ring_mp_enqueue_burst(r, cur_src, 2, NULL);
> > > -	cur_src += 2;
> > > -	if (ret != 2)
> > > -		goto fail;
> > > -
> > > -	printf("enqueue MAX_BULK objs\n");
> > > -	ret = rte_ring_mp_enqueue_burst(r, cur_src, MAX_BULK, NULL);
> > > -	cur_src += MAX_BULK;
> > > -	if (ret != MAX_BULK)
> > > -		goto fail;
> > > -
> > > -	printf("dequeue 1 obj\n");
> > > -	ret = rte_ring_mc_dequeue_burst(r, cur_dst, 1, NULL);
> > > -	cur_dst += 1;
> > > -	if (ret != 1)
> > > -		goto fail;
> > > -
> > > -	printf("dequeue 2 objs\n");
> > > -	ret = rte_ring_mc_dequeue_burst(r, cur_dst, 2, NULL);
> > > -	cur_dst += 2;
> > > -	if (ret != 2)
> > > -		goto fail;
> > > -
> > > -	printf("dequeue MAX_BULK objs\n");
> > > -	ret = rte_ring_mc_dequeue_burst(r, cur_dst, MAX_BULK, NULL);
> > > -	cur_dst += MAX_BULK;
> > > -	if (ret != MAX_BULK)
> > > -		goto fail;
> > > -
> > > -	/* check data */
> > > -	if (memcmp(src, dst, cur_dst - dst)) {
> > > -		rte_hexdump(stdout, "src", src, cur_src - src);
> > > -		rte_hexdump(stdout, "dst", dst, cur_dst - dst);
> > > -		printf("data after dequeue is not the same\n");
> > > -		goto fail;
> > > -	}
> > > -
> > > -	cur_src = src;
> > > -	cur_dst = dst;
> > > +		/* check data */
> > > +		if (memcmp(src, dst, cur_dst - dst)) {
> > > +			rte_hexdump(stdout, "src", src, cur_src - src);
> > > +			rte_hexdump(stdout, "dst", dst, cur_dst - dst);
> > > +			printf("data after dequeue is not the same\n");
> > > +			goto fail;
> > > +		}
> > > +
> > > +		cur_src = src;
> > > +		cur_dst = dst;
> > > +
> > > +		printf("fill and empty the ring\n");
> > > +		for (j = 0; j < RING_SIZE / MAX_BULK; j++) {
> > > +			TEST_RING_ENQUEUE(r, cur_src, esize[i], MAX_BULK,
> > > +							ret, api_type);
> > > +			if (ret != MAX_BULK)
> > > +				goto fail;
> > > +			TEST_RING_INCP(cur_src, esize[i], MAX_BULK);
> > > +
> > > +			TEST_RING_DEQUEUE(r, cur_dst, esize[i], MAX_BULK,
> > > +							ret, api_type);
> > > +			if (ret != MAX_BULK)
> > > +				goto fail;
> > > +			TEST_RING_INCP(cur_dst, esize[i], MAX_BULK);
> > > +		}
> > >
> > > -	printf("fill and empty the ring\n");
> > > -	for (i = 0; i<RING_SIZE/MAX_BULK; i++) {
> > > -		ret = rte_ring_mp_enqueue_burst(r, cur_src, MAX_BULK,
> > NULL);
> > > -		cur_src += MAX_BULK;
> > > -		if (ret != MAX_BULK)
> > > +		/* check data */
> > > +		if (memcmp(src, dst, cur_dst - dst)) {
> > > +			rte_hexdump(stdout, "src", src, cur_src - src);
> > > +			rte_hexdump(stdout, "dst", dst, cur_dst - dst);
> > > +			printf("data after dequeue is not the same\n");
> > >  			goto fail;
> > > -		ret = rte_ring_mc_dequeue_burst(r, cur_dst, MAX_BULK,
> > NULL);
> > > -		cur_dst += MAX_BULK;
> > > -		if (ret != MAX_BULK)
> > > +		}
> > > +
> > > +		cur_src = src;
> > > +		cur_dst = dst;
> > > +
> > > +		printf("Test enqueue without enough memory space\n");
> > > +		for (j = 0; j < (RING_SIZE/MAX_BULK - 1); j++) {
> > > +			TEST_RING_ENQUEUE(r, cur_src, esize[i], MAX_BULK,
> > > +							ret, api_type);
> > > +			if (ret != MAX_BULK)
> > > +				goto fail;
> > > +			TEST_RING_INCP(cur_src, esize[i], MAX_BULK);
> > > +		}
> > > +
> > > +		printf("Enqueue 2 objects, free entries = MAX_BULK - 2\n");
> > > +		TEST_RING_ENQUEUE(r, cur_src, esize[i], 2, ret, api_type);
> > > +		if (ret != 2)
> > >  			goto fail;
> > > -	}
> > > -
> > > -	/* check data */
> > > -	if (memcmp(src, dst, cur_dst - dst)) {
> > > -		rte_hexdump(stdout, "src", src, cur_src - src);
> > > -		rte_hexdump(stdout, "dst", dst, cur_dst - dst);
> > > -		printf("data after dequeue is not the same\n");
> > > -		goto fail;
> > > -	}
> > > -
> > > -	cur_src = src;
> > > -	cur_dst = dst;
> > > -
> > > -	printf("Test enqueue without enough memory space \n");
> > > -	for (i = 0; i<RING_SIZE/MAX_BULK - 1; i++) {
> > > -		ret = rte_ring_mp_enqueue_burst(r, cur_src, MAX_BULK,
> > NULL);
> > > -		cur_src += MAX_BULK;
> > > -		if (ret != MAX_BULK)
> > > +		TEST_RING_INCP(cur_src, esize[i], 2);
> > > +
> > > +
> > > +		printf("Enqueue the remaining entries = MAX_BULK - 3\n");
> > > +		/* Bulk APIs enqueue exact number of elements */
> > > +		if ((api_type & TEST_RING_BL) == TEST_RING_BL)
> > > +			num_elems = MAX_BULK - 3;
> > > +		else
> > > +			num_elems = MAX_BULK;
> > > +		/* Always one free entry left */
> > > +		TEST_RING_ENQUEUE(r, cur_src, esize[i], num_elems,
> > > +						ret, api_type);
> > > +		if (ret != MAX_BULK - 3)
> > >  			goto fail;
> > > -	}
> > > -
> > > -	/* Available memory space for the exact MAX_BULK objects */
> > > -	ret = rte_ring_mp_enqueue_burst(r, cur_src, 2, NULL);
> > > -	cur_src += 2;
> > > -	if (ret != 2)
> > > -		goto fail;
> > > +		TEST_RING_INCP(cur_src, esize[i], MAX_BULK - 3);
> > >
> > > -	ret = rte_ring_mp_enqueue_burst(r, cur_src, MAX_BULK, NULL);
> > > -	cur_src += MAX_BULK - 3;
> > > -	if (ret != MAX_BULK - 3)
> > > -		goto fail;
> > > +		printf("Test if ring is full\n");
> > > +		if (rte_ring_full(r) != 1)
> > > +			goto fail;
> > >
> > > +		printf("Test enqueue for a full entry\n");
> > > +		TEST_RING_ENQUEUE(r, cur_src, esize[i], MAX_BULK,
> > > +						ret, api_type);
> > > +		if (ret != 0)
> > > +			goto fail;
> > >
> > > -	printf("Test dequeue without enough objects \n");
> > > -	for (i = 0; i<RING_SIZE/MAX_BULK - 1; i++) {
> > > -		ret = rte_ring_mc_dequeue_burst(r, cur_dst, MAX_BULK,
> > NULL);
> > > -		cur_dst += MAX_BULK;
> > > -		if (ret != MAX_BULK)
> > > +		printf("Test dequeue without enough objects\n");
> > > +		for (j = 0; j < RING_SIZE / MAX_BULK - 1; j++) {
> > > +			TEST_RING_DEQUEUE(r, cur_dst, esize[i], MAX_BULK,
> > > +							ret, api_type);
> > > +			if (ret != MAX_BULK)
> > > +				goto fail;
> > > +			TEST_RING_INCP(cur_dst, esize[i], MAX_BULK);
> > > +		}
> > > +
> > > +		/* Available memory space for the exact MAX_BULK entries
> > */
> > > +		TEST_RING_DEQUEUE(r, cur_dst, esize[i], 2, ret, api_type);
> > > +		if (ret != 2)
> > >  			goto fail;
> > > -	}
> > > +		TEST_RING_INCP(cur_dst, esize[i], 2);
> > > +
> > > +		/* Bulk APIs enqueue exact number of elements */
> > > +		if ((api_type & TEST_RING_BL) == TEST_RING_BL)
> > > +			num_elems = MAX_BULK - 3;
> > > +		else
> > > +			num_elems = MAX_BULK;
> > > +		TEST_RING_DEQUEUE(r, cur_dst, esize[i], num_elems,
> > > +						ret, api_type);
> > > +		if (ret != MAX_BULK - 3)
> > > +			goto fail;
> > > +		TEST_RING_INCP(cur_dst, esize[i], MAX_BULK - 3);
> > >
> > > -	/* Available objects - the exact MAX_BULK */
> > > -	ret = rte_ring_mc_dequeue_burst(r, cur_dst, 2, NULL);
> > > -	cur_dst += 2;
> > > -	if (ret != 2)
> > > -		goto fail;
> > > +		printf("Test if ring is empty\n");
> > > +		/* Check if ring is empty */
> > > +		if (rte_ring_empty(r) != 1)
> > > +			goto fail;
> > >
> > > -	ret = rte_ring_mc_dequeue_burst(r, cur_dst, MAX_BULK, NULL);
> > > -	cur_dst += MAX_BULK - 3;
> > > -	if (ret != MAX_BULK - 3)
> > > -		goto fail;
> > > +		/* check data */
> > > +		if (memcmp(src, dst, cur_dst - dst)) {
> > > +			rte_hexdump(stdout, "src", src, cur_src - src);
> > > +			rte_hexdump(stdout, "dst", dst, cur_dst - dst);
> > > +			printf("data after dequeue is not the same\n");
> > > +			goto fail;
> > > +		}
> > >
> > > -	/* check data */
> > > -	if (memcmp(src, dst, cur_dst - dst)) {
> > > -		rte_hexdump(stdout, "src", src, cur_src - src);
> > > -		rte_hexdump(stdout, "dst", dst, cur_dst - dst);
> > > -		printf("data after dequeue is not the same\n");
> > > -		goto fail;
> > > +		/* Free memory before test completed */
> > > +		rte_ring_free(r);
> > > +		rte_free(src);
> > > +		rte_free(dst);
> > >  	}
> > >
> > > -	cur_src = src;
> > > -	cur_dst = dst;
> > > -
> > > -	printf("Covering rte_ring_enqueue_burst functions \n");
> > > -
> > > -	ret = rte_ring_enqueue_burst(r, cur_src, 2, NULL);
> > > -	cur_src += 2;
> > > -	if (ret != 2)
> > > -		goto fail;
> > > -
> > > -	ret = rte_ring_dequeue_burst(r, cur_dst, 2, NULL);
> > > -	cur_dst += 2;
> > > -	if (ret != 2)
> > > -		goto fail;
> > > -
> > > -	/* Free memory before test completed */
> > > -	free(src);
> > > -	free(dst);
> > >  	return 0;
> > > -
> > > - fail:
> > > -	free(src);
> > > -	free(dst);
> > > +fail:
> > > +	rte_ring_free(r);
> > > +	rte_free(src);
> > > +	rte_free(dst);
> > >  	return -1;
> > >  }
> > >
> > > @@ -810,6 +773,7 @@ test_ring_with_exact_size(void)  static int
> > >  test_ring(void)
> > >  {
> > > +	unsigned int i, j;
> > >  	struct rte_ring *r = NULL;
> > >
> > >  	/* some more basic operations */
> > > @@ -828,9 +792,11 @@ test_ring(void)
> > >  		goto test_fail;
> > >  	}
> > >
> > > -	/* burst operations */
> > > -	if (test_ring_burst_basic(r) < 0)
> > > -		goto test_fail;
> > > +	/* Burst and bulk operations with sp/sc, mp/mc and default */
> > > +	for (j = TEST_RING_BL; j <= TEST_RING_BR; j <<= 1)
> > > +		for (i = TEST_RING_N; i <= TEST_RING_M; i <<= 1)
> > > +			if (test_ring_burst_bulk_tests(i | j) < 0)
> > > +				goto test_fail;
> > >
> > >  	/* basic operations */
> > >  	if (test_ring_basic(r) < 0)
> > > diff --git a/app/test/test_ring.h b/app/test/test_ring.h new file mode
> > > 100644 index 000000000..19ef1b399
> > > --- /dev/null
> > > +++ b/app/test/test_ring.h
> > > @@ -0,0 +1,203 @@
> > > +/* SPDX-License-Identifier: BSD-3-Clause
> > > + * Copyright(c) 2019 Arm Limited
> > > + */
> > > +
> > > +#include <rte_malloc.h>
> > > +#include <rte_ring.h>
> > > +#include <rte_ring_elem.h>
> > > +
> > > +/* API type to call
> > > + * N - Calls default APIs
> > > + * S - Calls SP or SC API
> > > + * M - Calls MP or MC API
> > > + */
> > > +#define TEST_RING_N 1
> > > +#define TEST_RING_S 2
> > > +#define TEST_RING_M 4
> > > +
> > > +/* API type to call
> > > + * SL - Calls single element APIs
> > > + * BL - Calls bulk APIs
> > > + * BR - Calls burst APIs
> > > + */
> > > +#define TEST_RING_SL 8
> > > +#define TEST_RING_BL 16
> > > +#define TEST_RING_BR 32
> > > +
> > > +#define TEST_RING_IGNORE_API_TYPE ~0U
> > > +
> > > +#define TEST_RING_INCP(obj, esize, n) do { \
> > > +	/* Legacy queue APIs? */ \
> > > +	if ((esize) == -1) \
> > > +		obj = ((void **)obj) + n; \
> > > +	else \
> > > +		obj = (void **)(((uint32_t *)obj) + \
> > > +					(n * esize / sizeof(uint32_t))); \
> > > +} while (0)
> > > +
> > > +#define TEST_RING_CREATE(name, esize, count, socket_id, flags, r) do { \
> > > +	/* Legacy queue APIs? */ \
> > > +	if ((esize) == -1) \
> > > +		r = rte_ring_create((name), (count), (socket_id), (flags)); \
> > > +	else \
> > > +		r = rte_ring_create_elem((name), (esize), (count), \
> > > +						(socket_id), (flags)); \
> > > +} while (0)
> > > +
> > > +#define TEST_RING_ENQUEUE(r, obj, esize, n, ret, api_type) do { \
> > > +	/* Legacy queue APIs? */ \
> > > +	if ((esize) == -1) \
> > > +		switch (api_type) { \
> > > +		case (TEST_RING_N | TEST_RING_SL): \
> > > +			ret = rte_ring_enqueue(r, obj); \
> > > +			break; \
> > > +		case (TEST_RING_S | TEST_RING_SL): \
> > > +			ret = rte_ring_sp_enqueue(r, obj); \
> > > +			break; \
> > > +		case (TEST_RING_M | TEST_RING_SL): \
> > > +			ret = rte_ring_mp_enqueue(r, obj); \
> > > +			break; \
> > > +		case (TEST_RING_N | TEST_RING_BL): \
> > > +			ret = rte_ring_enqueue_bulk(r, obj, n, NULL); \
> > > +			break; \
> > > +		case (TEST_RING_S | TEST_RING_BL): \
> > > +			ret = rte_ring_sp_enqueue_bulk(r, obj, n, NULL); \
> > > +			break; \
> > > +		case (TEST_RING_M | TEST_RING_BL): \
> > > +			ret = rte_ring_mp_enqueue_bulk(r, obj, n, NULL); \
> > > +			break; \
> > > +		case (TEST_RING_N | TEST_RING_BR): \
> > > +			ret = rte_ring_enqueue_burst(r, obj, n, NULL); \
> > > +			break; \
> > > +		case (TEST_RING_S | TEST_RING_BR): \
> > > +			ret = rte_ring_sp_enqueue_burst(r, obj, n, NULL); \
> > > +			break; \
> > > +		case (TEST_RING_M | TEST_RING_BR): \
> > > +			ret = rte_ring_mp_enqueue_burst(r, obj, n, NULL); \
> > > +		} \
> > > +	else \
> > > +		switch (api_type) { \
> > > +		case (TEST_RING_N | TEST_RING_SL): \
> > > +			ret = rte_ring_enqueue_elem(r, obj, esize); \
> > > +			break; \
> > > +		case (TEST_RING_S | TEST_RING_SL): \
> > > +			ret = rte_ring_sp_enqueue_elem(r, obj, esize); \
> > > +			break; \
> > > +		case (TEST_RING_M | TEST_RING_SL): \
> > > +			ret = rte_ring_mp_enqueue_elem(r, obj, esize); \
> > > +			break; \
> > > +		case (TEST_RING_N | TEST_RING_BL): \
> > > +			ret = rte_ring_enqueue_bulk_elem(r, obj, esize, n, \
> > > +								NULL); \
> > > +			break; \
> > > +		case (TEST_RING_S | TEST_RING_BL): \
> > > +			ret = rte_ring_sp_enqueue_bulk_elem(r, obj, esize, n,
> > \
> > > +								NULL); \
> > > +			break; \
> > > +		case (TEST_RING_M | TEST_RING_BL): \
> > > +			ret = rte_ring_mp_enqueue_bulk_elem(r, obj, esize, n,
> > \
> > > +								NULL); \
> > > +			break; \
> > > +		case (TEST_RING_N | TEST_RING_BR): \
> > > +			ret = rte_ring_enqueue_burst_elem(r, obj, esize, n, \
> > > +								NULL); \
> > > +			break; \
> > > +		case (TEST_RING_S | TEST_RING_BR): \
> > > +			ret = rte_ring_sp_enqueue_burst_elem(r, obj, esize, n,
> > \
> > > +								NULL); \
> > > +			break; \
> > > +		case (TEST_RING_M | TEST_RING_BR): \
> > > +			ret = rte_ring_mp_enqueue_burst_elem(r, obj, esize,
> > n, \
> > > +								NULL); \
> > > +		} \
> > > +} while (0)
> > > +
> > > +#define TEST_RING_DEQUEUE(r, obj, esize, n, ret, api_type) do { \
> > > +	/* Legacy queue APIs? */ \
> > > +	if ((esize) == -1) \
> > > +		switch (api_type) { \
> > > +		case (TEST_RING_N | TEST_RING_SL): \
> > > +			ret = rte_ring_dequeue(r, obj); \
> > > +			break; \
> > > +		case (TEST_RING_S | TEST_RING_SL): \
> > > +			ret = rte_ring_sc_dequeue(r, obj); \
> > > +			break; \
> > > +		case (TEST_RING_M | TEST_RING_SL): \
> > > +			ret = rte_ring_mc_dequeue(r, obj); \
> > > +			break; \
> > > +		case (TEST_RING_N | TEST_RING_BL): \
> > > +			ret = rte_ring_dequeue_bulk(r, obj, n, NULL); \
> > > +			break; \
> > > +		case (TEST_RING_S | TEST_RING_BL): \
> > > +			ret = rte_ring_sc_dequeue_bulk(r, obj, n, NULL); \
> > > +			break; \
> > > +		case (TEST_RING_M | TEST_RING_BL): \
> > > +			ret = rte_ring_mc_dequeue_bulk(r, obj, n, NULL); \
> > > +			break; \
> > > +		case (TEST_RING_N | TEST_RING_BR): \
> > > +			ret = rte_ring_dequeue_burst(r, obj, n, NULL); \
> > > +			break; \
> > > +		case (TEST_RING_S | TEST_RING_BR): \
> > > +			ret = rte_ring_sc_dequeue_burst(r, obj, n, NULL); \
> > > +			break; \
> > > +		case (TEST_RING_M | TEST_RING_BR): \
> > > +			ret = rte_ring_mc_dequeue_burst(r, obj, n, NULL); \
> > > +		} \
> > > +	else \
> > > +		switch (api_type) { \
> > > +		case (TEST_RING_N | TEST_RING_SL): \
> > > +			ret = rte_ring_dequeue_elem(r, obj, esize); \
> > > +			break; \
> > > +		case (TEST_RING_S | TEST_RING_SL): \
> > > +			ret = rte_ring_sc_dequeue_elem(r, obj, esize); \
> > > +			break; \
> > > +		case (TEST_RING_M | TEST_RING_SL): \
> > > +			ret = rte_ring_mc_dequeue_elem(r, obj, esize); \
> > > +			break; \
> > > +		case (TEST_RING_N | TEST_RING_BL): \
> > > +			ret = rte_ring_dequeue_bulk_elem(r, obj, esize, n, \
> > > +								NULL); \
> > > +			break; \
> > > +		case (TEST_RING_S | TEST_RING_BL): \
> > > +			ret = rte_ring_sc_dequeue_bulk_elem(r, obj, esize, n,
> > \
> > > +								NULL); \
> > > +			break; \
> > > +		case (TEST_RING_M | TEST_RING_BL): \
> > > +			ret = rte_ring_mc_dequeue_bulk_elem(r, obj, esize, n,
> > \
> > > +								NULL); \
> > > +			break; \
> > > +		case (TEST_RING_N | TEST_RING_BR): \
> > > +			ret = rte_ring_dequeue_burst_elem(r, obj, esize, n, \
> > > +								NULL); \
> > > +			break; \
> > > +		case (TEST_RING_S | TEST_RING_BR): \
> > > +			ret = rte_ring_sc_dequeue_burst_elem(r, obj, esize, n,
> > \
> > > +								NULL); \
> > > +			break; \
> > > +		case (TEST_RING_M | TEST_RING_BR): \
> > > +			ret = rte_ring_mc_dequeue_burst_elem(r, obj, esize,
> > n, \
> > > +								NULL); \
> > > +		} \
> > > +} while (0)
> >
> >
> > My thought to avoid test-code duplication was a bit different.
> Yes, this can be done multiple ways. My implementation is not complicated either.
> 
> > Instead of adding extra enums/parameters and then do switch on them, my
> The switch statement should be removed by the compiler for the performance tests.

I am sure the compiler will do its job properly.
My concern is that with all these extra flags, it is really hard to
read and understand exactly which function we are calling and what we are trying to test.
It might be just me, but, say, in the original version of enqueue_bulk() we have:

        const uint64_t sp_start = rte_rdtsc();
        for (i = 0; i < iterations; i++)
                while (rte_ring_sp_enqueue_bulk(r, burst, size, NULL) == 0)
                        rte_pause();
        const uint64_t sp_end = rte_rdtsc();

        const uint64_t mp_start = rte_rdtsc();
        for (i = 0; i < iterations; i++)
                while (rte_ring_mp_enqueue_bulk(r, burst, size, NULL) == 0)
                        rte_pause();
        const uint64_t mp_end = rte_rdtsc();

Simple and easy to understand.
The same code after the patch isn't that straightforward anymore:

        const uint64_t sp_start = rte_rdtsc();
        for (i = 0; i < iterations; i++)
                do {
                        if (flag == 0)
                                TEST_RING_ENQUEUE(r, burst, esize, bsize, ret,
                                                TEST_RING_S | TEST_RING_BL);
                        else if (flag == 1)
                                TEST_RING_DEQUEUE(r, burst, esize, bsize, ret,
                                                TEST_RING_S | TEST_RING_BL);
                        if (ret == 0)
                                rte_pause();
                } while (!ret);
        const uint64_t sp_end = rte_rdtsc();

Another thing: if tomorrow we want to add perf tests
for elem_size == 4/8, etc., we'll need to copy/paste
all the test-case invocations, as you did for 16B
(or do some code reorganization).

> 
> > intention was something like that:
> >
> > 1. mv test_ring_perf.c test_ring_perf.h
> > 2. Inside test_ring_perf.h change rte_ring_ create/enqueue/dequeue function
> >    calls to some not-defined function/macros invocations,
> >    with similar name, same number of parameters, and same semantics.
> >    Also change 'void *burst[..]' to 'RING_ELEM[...]';
> > 3. For each test configuration we want to have (default, 4B, 8B, 16B)
> >    create a new .c file where we:
> >    - define the macros (/functions) used in test_ring_perf.h
> >    - include test_ring_perf.h
> >    - REGISTER_TEST_COMMAND(<test_name>, test_ring_perf);
> >
> > As an example:
> > test_ring_perf.h:
> > ...
> > static int
> > enqueue_bulk(void *p)
> > {
> >         ...
> >         RING_ELEM burst[MAX_BURST];
> >
> >         memset(burst, 0, sizeof(burst));
> >         ....
> >         const uint64_t sp_start = rte_rdtsc();
> >         for (i = 0; i < iterations; i++)
> >                 while (RING_SP_ENQUEUE_BULK(r, burst, size, NULL) == 0)
> >                         rte_pause();
> >         const uint64_t sp_end = rte_rdtsc();
> >
> >         const uint64_t mp_start = rte_rdtsc();
> >         for (i = 0; i < iterations; i++)
> >                 while (RING_MP_ENQUEUE_BULK(r, burst, size, NULL) == 0)
> >                         rte_pause();
> >         const uint64_t mp_end = rte_rdtsc();
> >         ....
> >
> > Then in test_ring_perf.c:
> >
> > ....
> > #define RING_ELEM	void *
> > ...
> > #define RING_SP_ENQUEUE_BULK(ring, buf, size, spc)  \
> >        rte_ring_sp_enqueue_bulk(ring, buf, size, spc)
> > ....
> >
> > #include "test_ring_perf.h"
> > REGISTER_TEST_COMMAND(ring_perf_autotest, test_ring_perf);
> >
> >
> > In test_ring_elem16B_perf.c:
> > ....
> > #define RING_ELEM	__uint128_t
> > #define RING_SP_ENQUEUE_BULK(ring, buf, size, spc)  \
> > 	rte_ring_sp_enqueue_bulk_elem(ring, buf, sizeof(RING_ELEM), size, spc)
> > ....
> > #include "test_ring_perf.h"
> > REGISTER_TEST_COMMAND(ring_perf_elem16B_autotest, test_ring_perf);
> >
> > In test_ring_elem4B_perf.c:
> >
> > ....
> > #define RING_ELEM	uint32_t
> > #define RING_SP_ENQUEUE_BULK(ring, buf, size, spc)  \
> > 	rte_ring_sp_enqueue_bulk_elem(ring, buf, sizeof(RING_ELEM), size, spc)
> > ....
> > #include "test_ring_perf.h"
> > REGISTER_TEST_COMMAND(ring_perf_elem4B_autotest, test_ring_perf);
> >
> > And so on.
> >
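
The layout proposed above can be approximated in a single file. This is
only a sketch under stated assumptions: instead of a shared
test_ring_perf.h included from several .c files, it uses one
function-generating macro, which is a different mechanism with the same
intent (one test body, instantiated once per element type). The names
toy_ring, toy_enqueue_bulk_elem and DEFINE_ENQUEUE_TEST are hypothetical
stand-ins so that it builds without DPDK; none of them are rte_ring APIs.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_RING_SLOTS 64	/* hypothetical ring capacity, in elements */
#define MAX_BURST 8

/* Hypothetical stand-in for struct rte_ring: a flat byte buffer. */
struct toy_ring {
	unsigned char buf[TOY_RING_SLOTS * 16];	/* room for 16B elements */
	unsigned int used;			/* elements stored so far */
};

/*
 * Hypothetical stand-in for a bulk "elem" enqueue: all-or-nothing,
 * returns the number of elements stored (n or 0). esize must be <= 16.
 */
static unsigned int
toy_enqueue_bulk_elem(struct toy_ring *r, const void *objs,
		unsigned int esize, unsigned int n)
{
	if (r->used + n > TOY_RING_SLOTS)
		return 0;
	memcpy(r->buf + (size_t)r->used * esize, objs, (size_t)n * esize);
	r->used += n;
	return n;
}

/*
 * One test body, parameterised on the element type. Each expansion
 * plays the role of one small per-configuration .c file in the
 * proposal: it fixes RING_ELEM and reuses the same loop.
 */
#define DEFINE_ENQUEUE_TEST(suffix, RING_ELEM) \
static unsigned int \
enqueue_bulk_##suffix(struct toy_ring *r, unsigned int iterations) \
{ \
	RING_ELEM burst[MAX_BURST]; \
	unsigned int i, total = 0; \
	\
	memset(burst, 0, sizeof(burst)); \
	for (i = 0; i < iterations; i++) \
		total += toy_enqueue_bulk_elem(r, burst, \
				sizeof(RING_ELEM), MAX_BURST); \
	return total; \
}

DEFINE_ENQUEUE_TEST(ptr, void *)	/* default: pointer-sized elements */
DEFINE_ENQUEUE_TEST(elem4B, uint32_t)
DEFINE_ENQUEUE_TEST(elem8B, uint64_t)
```

Adding another element size is then one more DEFINE_ENQUEUE_TEST() line,
and the loop body names the call it makes directly instead of selecting
it through flags.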
> > > +
> > > +/* This function is placed here as it is required for both
> > > + * performance and functional tests.
> > > + */
> > > +static __rte_always_inline void *
> > > +test_ring_calloc(unsigned int rsize, int esize)
> > > +{
> > > +	unsigned int sz;
> > > +	void *p;
> > > +
> > > +	/* Legacy queue APIs? */
> > > +	if (esize == -1)
> > > +		sz = sizeof(void *);
> > > +	else
> > > +		sz = esize;
> > > +
> > > +	p = rte_zmalloc(NULL, rsize * sz, RTE_CACHE_LINE_SIZE);
> > > +	if (p == NULL)
> > > +		printf("Failed to allocate memory\n");
> > > +
> > > +	return p;
> > > +}
> > > --
> > > 2.17.1


^ permalink raw reply	[flat|nested] 173+ messages in thread

* Re: [dpdk-dev] [PATCH v7 10/17] test/ring: modify single element enq/deq perf test cases
  2020-01-07  5:54         ` Honnappa Nagarahalli
@ 2020-01-07 16:13           ` Ananyev, Konstantin
  2020-01-07 22:33             ` Honnappa Nagarahalli
  0 siblings, 1 reply; 173+ messages in thread
From: Ananyev, Konstantin @ 2020-01-07 16:13 UTC (permalink / raw)
  To: Honnappa Nagarahalli, olivier.matz, sthemmin, jerinj, Richardson,
	Bruce, david.marchand, pbhagavatula
  Cc: dev, Dharmik Thakkar, Ruifeng Wang, Gavin Hu, nd, nd


> > > Add test cases to test rte_ring_xxx_elem APIs for single element
> > > enqueue/dequeue test cases.
> > >
> > > Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
> > > Reviewed-by: Gavin Hu <gavin.hu@arm.com>
> > > ---
> > >  app/test/test_ring_perf.c | 100
> > > ++++++++++++++++++++++++++++++--------
> > >  1 file changed, 80 insertions(+), 20 deletions(-)
> > >
> > > diff --git a/app/test/test_ring_perf.c b/app/test/test_ring_perf.c
> > > index 6c2aca483..5829718c1 100644
> > > --- a/app/test/test_ring_perf.c
> > > +++ b/app/test/test_ring_perf.c
> > > @@ -13,6 +13,7 @@
> > >  #include <string.h>
> > >
> > >  #include "test.h"
> > > +#include "test_ring.h"
> > >
> > >  /*
> > >   * Ring
> > > @@ -41,6 +42,35 @@ struct lcore_pair {
> > >
> > >  static volatile unsigned lcore_count = 0;
> > >
> > > +static void
> > > +test_ring_print_test_string(unsigned int api_type, int esize,
> > > +	unsigned int bsz, double value)
> > > +{
> > > +	if (esize == -1)
> > > +		printf("legacy APIs");
> > > +	else
> > > +		printf("elem APIs: element size %dB", esize);
> > > +
> > > +	if (api_type == TEST_RING_IGNORE_API_TYPE)
> > > +		return;
> > > +
> > > +	if ((api_type & TEST_RING_N) == TEST_RING_N)
> > > +		printf(": default enqueue/dequeue: ");
> > > +	else if ((api_type & TEST_RING_S) == TEST_RING_S)
> > > +		printf(": SP/SC: ");
> > > +	else if ((api_type & TEST_RING_M) == TEST_RING_M)
> > > +		printf(": MP/MC: ");
> > > +
> > > +	if ((api_type & TEST_RING_SL) == TEST_RING_SL)
> > > +		printf("single: ");
> > > +	else if ((api_type & TEST_RING_BL) == TEST_RING_BL)
> > > +		printf("bulk (size: %u): ", bsz);
> > > +	else if ((api_type & TEST_RING_BR) == TEST_RING_BR)
> > > +		printf("burst (size: %u): ", bsz);
> > > +
> > > +	printf("%.2F\n", value);
> > > +}
> > > +
> > >  /**** Functions to analyse our core mask to get cores for different
> > > tests ***/
> > >
> > >  static int
> > > @@ -335,32 +365,35 @@ run_on_all_cores(struct rte_ring *r)
> > >   * Test function that determines how long an enqueue + dequeue of a
> > single item
> > >   * takes on a single lcore. Result is for comparison with the bulk enq+deq.
> > >   */
> > > -static void
> > > -test_single_enqueue_dequeue(struct rte_ring *r)
> > > +static int
> > > +test_single_enqueue_dequeue(struct rte_ring *r, const int esize,
> > > +	const unsigned int api_type)
> > >  {
> > > -	const unsigned iter_shift = 24;
> > > -	const unsigned iterations = 1<<iter_shift;
> > > -	unsigned i = 0;
> > > +	int ret;
> > > +	const unsigned int iter_shift = 24;
> > > +	const unsigned int iterations = 1 << iter_shift;
> > > +	unsigned int i = 0;
> > >  	void *burst = NULL;
> > >
> > > -	const uint64_t sc_start = rte_rdtsc();
> > > -	for (i = 0; i < iterations; i++) {
> > > -		rte_ring_sp_enqueue(r, burst);
> > > -		rte_ring_sc_dequeue(r, &burst);
> > > -	}
> > > -	const uint64_t sc_end = rte_rdtsc();
> > > +	(void)ret;
> >
> > Here, and in few other places, looks redundant.
> The compiler throws an error since 'ret' is assigned a value, but it is not used.

Probably one way is to change TEST_RING_ENQUEUE() from a macro
to an inline function returning ret.
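
A minimal sketch of that idea, under stated assumptions: the
TEST_RING_* flag values are the ones from the quoted test_ring.h, but
toy_ring and the toy_enqueue_* helpers are hypothetical stand-ins for
struct rte_ring and the rte_ring enqueue calls, so that the example
builds without DPDK. The legacy/elem split on esize is left out for
brevity.

```c
/* Flag values as defined in the quoted test_ring.h. */
#define TEST_RING_N 1
#define TEST_RING_S 2
#define TEST_RING_M 4
#define TEST_RING_BL 16
#define TEST_RING_BR 32

/* Hypothetical stand-in for struct rte_ring. */
struct toy_ring {
	unsigned int count;	/* elements currently stored */
	unsigned int capacity;	/* maximum number of elements */
};

/* Hypothetical bulk enqueue: stores all n elements or none. */
static unsigned int
toy_enqueue_bulk(struct toy_ring *r, unsigned int n)
{
	if (r->capacity - r->count < n)
		return 0;
	r->count += n;
	return n;
}

/* Hypothetical burst enqueue: stores as many as fit, up to n. */
static unsigned int
toy_enqueue_burst(struct toy_ring *r, unsigned int n)
{
	unsigned int free_slots = r->capacity - r->count;

	if (n > free_slots)
		n = free_slots;
	r->count += n;
	return n;
}

/*
 * Inline function in place of the TEST_RING_ENQUEUE() macro. The
 * result is the return value rather than an output argument, so a
 * caller that does not need it can simply ignore it and no '(void)ret;'
 * is required. When api_type is a compile-time constant, the compiler
 * can typically reduce the switch to the one selected call.
 */
static inline unsigned int
test_ring_enqueue(struct toy_ring *r, unsigned int n, unsigned int api_type)
{
	switch (api_type) {
	case (TEST_RING_N | TEST_RING_BL):
	case (TEST_RING_S | TEST_RING_BL):
	case (TEST_RING_M | TEST_RING_BL):
		return toy_enqueue_bulk(r, n);
	case (TEST_RING_N | TEST_RING_BR):
	case (TEST_RING_S | TEST_RING_BR):
	case (TEST_RING_M | TEST_RING_BR):
		return toy_enqueue_burst(r, n);
	default:
		return 0;	/* unknown api_type: nothing enqueued */
	}
}
```

In the real test code the N/S/M cases would each call a different
rte_ring function; they are collapsed here only because the stand-ins
do not model producer/consumer synchronisation.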

> 
> >
> > > +	/* alloc dummy object pointers */
> > > +	burst = test_ring_calloc(1, esize);
> > > +	if (burst == NULL)
> > > +		return -1;
> > >
> > > -	const uint64_t mc_start = rte_rdtsc();
> > > +	const uint64_t start = rte_rdtsc();
> > >  	for (i = 0; i < iterations; i++) {
> > > -		rte_ring_mp_enqueue(r, burst);
> > > -		rte_ring_mc_dequeue(r, &burst);
> > > +		TEST_RING_ENQUEUE(r, burst, esize, 1, ret, api_type);
> > > +		TEST_RING_DEQUEUE(r, burst, esize, 1, ret, api_type);
> > >  	}
> > > -	const uint64_t mc_end = rte_rdtsc();
> > > +	const uint64_t end = rte_rdtsc();
> > > +
> > > +	test_ring_print_test_string(api_type, esize, 1,
> > > +					((double)(end - start)) / iterations);
> > > +
> > > +	rte_free(burst);
> > >
> > > -	printf("SP/SC single enq/dequeue: %.2F\n",
> > > -			((double)(sc_end-sc_start)) / iterations);
> > > -	printf("MP/MC single enq/dequeue: %.2F\n",
> > > -			((double)(mc_end-mc_start)) / iterations);
> > > +	return 0;
> > >  }
> > >
> > >  /*

^ permalink raw reply	[flat|nested] 173+ messages in thread

* Re: [dpdk-dev] [PATCH v7 10/17] test/ring: modify single element enq/deq perf test cases
  2020-01-07 16:13           ` Ananyev, Konstantin
@ 2020-01-07 22:33             ` Honnappa Nagarahalli
  0 siblings, 0 replies; 173+ messages in thread
From: Honnappa Nagarahalli @ 2020-01-07 22:33 UTC (permalink / raw)
  To: Ananyev, Konstantin, olivier.matz, sthemmin, jerinj, Richardson,
	Bruce, david.marchand, pbhagavatula
  Cc: dev, Dharmik Thakkar, Ruifeng Wang, Gavin Hu, nd,
	Honnappa Nagarahalli, nd

<snip>

> 
> > > > Add test cases to test rte_ring_xxx_elem APIs for single element
> > > > enqueue/dequeue test cases.
> > > >
> > > > Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
> > > > Reviewed-by: Gavin Hu <gavin.hu@arm.com>
> > > > ---
> > > >  app/test/test_ring_perf.c | 100
> > > > ++++++++++++++++++++++++++++++--------
> > > >  1 file changed, 80 insertions(+), 20 deletions(-)
> > > >
> > > > diff --git a/app/test/test_ring_perf.c b/app/test/test_ring_perf.c
> > > > index 6c2aca483..5829718c1 100644
> > > > --- a/app/test/test_ring_perf.c
> > > > +++ b/app/test/test_ring_perf.c
> > > > @@ -13,6 +13,7 @@
> > > >  #include <string.h>
> > > >
> > > >  #include "test.h"
> > > > +#include "test_ring.h"
> > > >
> > > >  /*
> > > >   * Ring
> > > > @@ -41,6 +42,35 @@ struct lcore_pair {
> > > >
> > > >  static volatile unsigned lcore_count = 0;
> > > >
> > > > +static void
> > > > +test_ring_print_test_string(unsigned int api_type, int esize,
> > > > +	unsigned int bsz, double value)
> > > > +{
> > > > +	if (esize == -1)
> > > > +		printf("legacy APIs");
> > > > +	else
> > > > +		printf("elem APIs: element size %dB", esize);
> > > > +
> > > > +	if (api_type == TEST_RING_IGNORE_API_TYPE)
> > > > +		return;
> > > > +
> > > > +	if ((api_type & TEST_RING_N) == TEST_RING_N)
> > > > +		printf(": default enqueue/dequeue: ");
> > > > +	else if ((api_type & TEST_RING_S) == TEST_RING_S)
> > > > +		printf(": SP/SC: ");
> > > > +	else if ((api_type & TEST_RING_M) == TEST_RING_M)
> > > > +		printf(": MP/MC: ");
> > > > +
> > > > +	if ((api_type & TEST_RING_SL) == TEST_RING_SL)
> > > > +		printf("single: ");
> > > > +	else if ((api_type & TEST_RING_BL) == TEST_RING_BL)
> > > > +		printf("bulk (size: %u): ", bsz);
> > > > +	else if ((api_type & TEST_RING_BR) == TEST_RING_BR)
> > > > +		printf("burst (size: %u): ", bsz);
> > > > +
> > > > +	printf("%.2F\n", value);
> > > > +}
> > > > +
> > > >  /**** Functions to analyse our core mask to get cores for
> > > > different tests ***/
> > > >
> > > >  static int
> > > > @@ -335,32 +365,35 @@ run_on_all_cores(struct rte_ring *r)
> > > >   * Test function that determines how long an enqueue + dequeue of
> > > > a
> > > single item
> > > >   * takes on a single lcore. Result is for comparison with the bulk
> enq+deq.
> > > >   */
> > > > -static void
> > > > -test_single_enqueue_dequeue(struct rte_ring *r)
> > > > +static int
> > > > +test_single_enqueue_dequeue(struct rte_ring *r, const int esize,
> > > > +	const unsigned int api_type)
> > > >  {
> > > > -	const unsigned iter_shift = 24;
> > > > -	const unsigned iterations = 1<<iter_shift;
> > > > -	unsigned i = 0;
> > > > +	int ret;
> > > > +	const unsigned int iter_shift = 24;
> > > > +	const unsigned int iterations = 1 << iter_shift;
> > > > +	unsigned int i = 0;
> > > >  	void *burst = NULL;
> > > >
> > > > -	const uint64_t sc_start = rte_rdtsc();
> > > > -	for (i = 0; i < iterations; i++) {
> > > > -		rte_ring_sp_enqueue(r, burst);
> > > > -		rte_ring_sc_dequeue(r, &burst);
> > > > -	}
> > > > -	const uint64_t sc_end = rte_rdtsc();
> > > > +	(void)ret;
> > >
> > > Here, and in few other places, looks redundant.
> > The compiler throws an error since 'ret' is assigned a value, but it is not
> used.
> 
> Probably one way to change  TEST_RING_ENQUEUE() from macro to inline-
> function returning ret.
> 
Yes, that is possible, will do.

> >
> > >
<snip>

^ permalink raw reply	[flat|nested] 173+ messages in thread

* Re: [dpdk-dev] [PATCH v7 02/17] lib/ring: apis to support configurable element size
  2020-01-07 15:41                 ` Ananyev, Konstantin
@ 2020-01-08  6:17                   ` Honnappa Nagarahalli
  2020-01-08 10:05                     ` Ananyev, Konstantin
  0 siblings, 1 reply; 173+ messages in thread
From: Honnappa Nagarahalli @ 2020-01-08  6:17 UTC (permalink / raw)
  To: Ananyev, Konstantin, olivier.matz, sthemmin, jerinj, Richardson,
	Bruce, david.marchand, pbhagavatula
  Cc: dev, Dharmik Thakkar, Ruifeng Wang, Gavin Hu, nd,
	Honnappa Nagarahalli, nd

<snip>
> > > > > > > +
> > > > > > > +static __rte_always_inline void
> > > > > > > +enqueue_elems_128(struct rte_ring *r, uint32_t prod_head,
> > > > > > > +		const void *obj_table, uint32_t n)
> > > > > > > +{
> > > > > > > +	unsigned int i;
> > > > > > > +	const uint32_t size = r->size;
> > > > > > > +	uint32_t idx = prod_head & r->mask;
> > > > > > > +	__uint128_t *ring = (__uint128_t *)&r[1];
> > > > > > > +	const __uint128_t *obj = (const __uint128_t *)obj_table;
> > > > > > > +	if (likely(idx + n < size)) {
> > > > > > > +		for (i = 0; i < (n & ~0x1); i += 2, idx += 2) {
> > > > > > > +			ring[idx] = obj[i];
> > > > > > > +			ring[idx + 1] = obj[i + 1];
> > > > > >
> > > > > >
> > > > > > AFAIK, that implies 16B aligned obj_table...
> > > > > > Would it always be the case?
> > > > > I am not sure from the compiler perspective.
> > > > > At least on Arm architecture, unaligned access (address that is
> > > > > accessed is not aligned to the size of the data element being
> > > > > accessed) will result in faults or require additional cycles.
> > > > > So, aligning on
> > > 16B should be fine.
> > > > Further, I would be changing this to use 'rte_int128_t' as
> > > > '__uint128_t' is
> > > not defined on 32b systems.
> > >
> > > What I am trying to say: with this code we imply new requirement for
> > > elems
> > The only existing use case in DPDK for 16B is the event ring. The event ring
> already does similar kind of copy (using 'struct rte_event').
> > So, there is no change in expectations for event ring.
> > For future code, I think this expectation should be fine since it allows for
> optimal code.
> >
> > > in the ring: when sizeof(elem)==16 it's alignment also has to be at least
> 16.
> > > Which from my perspective is not ideal.
> > Any reasoning?
> 
> New implicit requirement and inconsistency.
> Code like that:
> 
> struct ring_elem {uint64_t a, b;};
> ....
> struct ring_elem elem;
> rte_ring_dequeue_elem(ring, &elem, sizeof(elem));
> 
> might cause a crash.
The alignment here is 8B. If the generated instructions require 16B alignment, this will result in a crash when the platform is configured to raise an exception on unaligned access.
But these are not atomic instructions. At least on aarch64, unaligned access does not result in an exception for non-atomic loads/stores. I believe x86 behaves the same way.

> While exactly the same code with:
> 
> struct ring_elem {uint64_t a, b, c;}; OR struct ring_elem {uint64_t a, b, c, d;};
> 
> will work ok.
The alignment for these structures is still 8B. Are you saying this will work because they will be copied using a pointer to uint32_t (whose alignment is 4B)?
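
To make the alignment point concrete, here is a small sketch (plain C11, no
DPDK; the helper names are hypothetical) that reports what the compiler
guarantees for the 16B structure and checks whether a given address happens
to sit on a 16B boundary:

```c
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

/* The 16B element from the example being discussed. */
struct ring_elem { uint64_t a, b; };

/* Returns 1 if p is on a 16B boundary, 0 otherwise. */
static inline int
is_aligned_16(const void *p)
{
	return ((uintptr_t)p & 15) == 0;
}

/*
 * Alignment the compiler guarantees for the element type. It follows the
 * widest member (uint64_t), not the 16B total size.
 */
static inline size_t
ring_elem_alignment(void)
{
	return alignof(struct ring_elem);
}
```

A 'struct ring_elem' on the stack may therefore land at an address for which
is_aligned_16() returns 0, which is the case where a copy through a
'__uint128_t *' becomes questionable.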

> 
> >
> > > Note that for elem sizes > 16 (24, 32), there is no such constraint.
> > The rest of them need to be aligned on 4B boundary. However, this should
> not affect the existing code.
> > The code for 8B and 16B is kept as is to ensure the performance is not
> affected for the existing code.
<snip>


^ permalink raw reply	[flat|nested] 173+ messages in thread

* Re: [dpdk-dev] [PATCH v7 02/17] lib/ring: apis to support configurable element size
  2020-01-08  6:17                   ` Honnappa Nagarahalli
@ 2020-01-08 10:05                     ` Ananyev, Konstantin
  2020-01-08 23:40                       ` Honnappa Nagarahalli
  0 siblings, 1 reply; 173+ messages in thread
From: Ananyev, Konstantin @ 2020-01-08 10:05 UTC (permalink / raw)
  To: Honnappa Nagarahalli, olivier.matz, sthemmin, jerinj, Richardson,
	Bruce, david.marchand, pbhagavatula
  Cc: dev, Dharmik Thakkar, Ruifeng Wang, Gavin Hu, nd, nd



> <snip>
> > > > > > > > +
> > > > > > > > +static __rte_always_inline void
> > > > > > > > +enqueue_elems_128(struct rte_ring *r, uint32_t prod_head,
> > > > > > > > +		const void *obj_table, uint32_t n)
> > > > > > > > +{
> > > > > > > > +	unsigned int i;
> > > > > > > > +	const uint32_t size = r->size;
> > > > > > > > +	uint32_t idx = prod_head & r->mask;
> > > > > > > > +	__uint128_t *ring = (__uint128_t *)&r[1];
> > > > > > > > +	const __uint128_t *obj = (const __uint128_t *)obj_table;
> > > > > > > > +	if (likely(idx + n < size)) {
> > > > > > > > +		for (i = 0; i < (n & ~0x1); i += 2, idx += 2) {
> > > > > > > > +			ring[idx] = obj[i];
> > > > > > > > +			ring[idx + 1] = obj[i + 1];
> > > > > > >
> > > > > > >
> > > > > > > AFAIK, that implies 16B aligned obj_table...
> > > > > > > Would it always be the case?
> > > > > > I am not sure from the compiler perspective.
> > > > > > At least on Arm architecture, unaligned access (address that is
> > > > > > accessed is not aligned to the size of the data element being
> > > > > > accessed) will result in faults or require additional cycles.
> > > > > > So, aligning on
> > > > 16B should be fine.
> > > > > Further, I would be changing this to use 'rte_int128_t' as
> > > > > '__uint128_t' is
> > > > not defined on 32b systems.
> > > >
> > > > What I am trying to say: with this code we imply new requirement for
> > > > elems
> > > The only existing use case in DPDK for 16B is the event ring. The event ring
> > already does similar kind of copy (using 'struct rte_event').
> > > So, there is no change in expectations for event ring.
> > > For future code, I think this expectation should be fine since it allows for
> > optimal code.
> > >
> > > > in the ring: when sizeof(elem)==16 it's alignment also has to be at least
> > 16.
> > > > Which from my perspective is not ideal.
> > > Any reasoning?
> >
> > New implicit requirement and inconsistency.
> > Code like that:
> >
> > struct ring_elem {uint64_t a, b;};
> > ....
> > struct ring_elem elem;
> > rte_ring_dequeue_elem(ring, &elem, sizeof(elem));
> >
> > might cause a crash.
> The alignment here is 8B. Assuming that instructions generated will require 16B alignment, it will result in a crash, if configured to generate
> exception.
> But, these instructions are not atomic instructions. At least on aarch64, unaligned access will not result in an exception for non-atomic
> loads/stores. I believe it is the same behavior for x86 as well.

On IA, there are 2 types of 16B load/store instructions: aligned and unaligned.
The aligned ones are a bit faster, but will cause an exception if used on a non-16B-aligned address.
As you are using __uint128_t *, the compiler will assume that both src and dst are 16B aligned
and might generate code with aligned instructions.

> 
> > While exactly the same code with:
> >
> > struct ring_elem {uint64_t a, b, c;}; OR struct ring_elem {uint64_t a, b, c, d;};
> >
> > will work ok.
> The alignment for these structures is still 8B. Are you saying this will work because these will be copied using pointer to uint32_t (whose
> alignment is 4B)?

Yes, as we are doing uint32_t copies, the compiler can't assume the data will be 16B aligned
and will use unaligned instructions.
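
For illustration only, another way to express a copy that gives the compiler
no alignment to rely on is memcpy(). Note that this is a different technique
from the uint32_t loop discussed above, and copy_elem_16() is a hypothetical
name:

```c
#include <string.h>

/*
 * Copy one 16B slot out of ring storage. memcpy() takes void pointers, so
 * the compiler gets no alignment information about dst or slot from the
 * types and cannot justify alignment-requiring instructions from them.
 */
static inline void
copy_elem_16(void *dst, const void *slot)
{
	memcpy(dst, slot, 16);
}
```

With a constant size, compilers typically expand such a call inline into
unaligned load/store instructions rather than calling the library routine.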

> 
> >
> > >
> > > > Note that for elem sizes > 16 (24, 32), there is no such constraint.
> > > The rest of them need to be aligned on 4B boundary. However, this should
> > not affect the existing code.
> > > The code for 8B and 16B is kept as is to ensure the performance is not
> > affected for the existing code.
> <snip>


^ permalink raw reply	[flat|nested] 173+ messages in thread

* Re: [dpdk-dev] [PATCH v7 02/17] lib/ring: apis to support configurable element size
  2020-01-08 10:05                     ` Ananyev, Konstantin
@ 2020-01-08 23:40                       ` Honnappa Nagarahalli
  2020-01-09  0:48                         ` Ananyev, Konstantin
  0 siblings, 1 reply; 173+ messages in thread
From: Honnappa Nagarahalli @ 2020-01-08 23:40 UTC (permalink / raw)
  To: Ananyev, Konstantin, olivier.matz, sthemmin, jerinj, Richardson,
	Bruce, david.marchand, pbhagavatula
  Cc: dev, Dharmik Thakkar, Ruifeng Wang, Gavin Hu, nd,
	Honnappa Nagarahalli, nd

<snip>
> > > > > > > > > +
> > > > > > > > > +static __rte_always_inline void
> > > > > > > > > +enqueue_elems_128(struct rte_ring *r, uint32_t prod_head,
> > > > > > > > > +		const void *obj_table, uint32_t n)
> > > > > > > > > +{
> > > > > > > > > +	unsigned int i;
> > > > > > > > > +	const uint32_t size = r->size;
> > > > > > > > > +	uint32_t idx = prod_head & r->mask;
> > > > > > > > > +	__uint128_t *ring = (__uint128_t *)&r[1];
> > > > > > > > > +	const __uint128_t *obj = (const __uint128_t *)obj_table;
> > > > > > > > > +	if (likely(idx + n < size)) {
> > > > > > > > > +		for (i = 0; i < (n & ~0x1); i += 2, idx += 2) {
> > > > > > > > > +			ring[idx] = obj[i];
> > > > > > > > > +			ring[idx + 1] = obj[i + 1];
> > > > > > > >
> > > > > > > >
> > > > > > > > AFAIK, that implies 16B aligned obj_table...
> > > > > > > > Would it always be the case?
> > > > > > > I am not sure from the compiler perspective.
> > > > > > > At least on Arm architecture, unaligned access (address that
> > > > > > > is accessed is not aligned to the size of the data element
> > > > > > > being
> > > > > > > accessed) will result in faults or require additional cycles.
> > > > > > > So, aligning on
> > > > > 16B should be fine.
> > > > > > Further, I would be changing this to use 'rte_int128_t' as
> > > > > > '__uint128_t' is
> > > > > not defined on 32b systems.
> > > > >
> > > > > What I am trying to say: with this code we imply new requirement
> > > > > for elems
> > > > The only existing use case in DPDK for 16B is the event ring. The
> > > > event ring
> > > already does similar kind of copy (using 'struct rte_event').
> > > > So, there is no change in expectations for event ring.
> > > > For future code, I think this expectation should be fine since it
> > > > allows for
> > > optimal code.
> > > >
> > > > > in the ring: when sizeof(elem)==16 it's alignment also has to be
> > > > > at least
> > > 16.
> > > > > Which from my perspective is not ideal.
> > > > Any reasoning?
> > >
> > > New implicit requirement and inconsistency.
> > > Code like that:
> > >
> > > struct ring_elem {uint64_t a, b;};
> > > ....
> > > struct ring_elem elem;
> > > rte_ring_dequeue_elem(ring, &elem, sizeof(elem));
> > >
> > > might cause a crash.
> > The alignment here is 8B. Assuming that instructions generated will
> > require 16B alignment, it will result in a crash, if configured to generate
> exception.
> > But, these instructions are not atomic instructions. At least on
> > aarch64, unaligned access will not result in an exception for non-atomic
> loads/stores. I believe it is the same behavior for x86 as well.
> 
> On IA, there are 2 types of 16B load/store instructions: aligned and unaligned.
> Aligned are a bit faster, but will cause an exception if used on non 16B aligned
> address.
> As you using uint128_t * compiler will assume that both src and dst are 16B
> aligned and might generate code with aligned instructions.
Ok, looking at a few articles, I read that if the address is aligned, the unaligned instructions do not incur a penalty. Is this understanding correct?

I see 2 solutions here:
1) We can switch this copy to use a uint32_t pointer. It would still allow the compiler to generate (unaligned) instructions for up to 256b load/store. The 2 multiplications (to normalize the index and the size of the copy) can use shifts. This should make it safer. If one wants performance, they can align the obj table to 16B (the ring itself is already aligned on the cache line boundary).

2) Considering that performance is paramount, we could document that the obj table needs to be aligned on a 16B boundary. This would affect event dev (if we go ahead with replacing the event ring implementation) significantly.

Note that we have to do the same thing for 64b elements as well.
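
A minimal sketch of what option 1 could look like, assuming the element size
is a power-of-two number of 32-bit words so that shifts can replace the
multiplications. The function name, the flat 'ring' array and 'esize_shift'
are hypothetical and not taken from the patch:

```c
#include <stdint.h>

/*
 * Copy n elements of (1 << esize_shift) 32-bit words each into a ring of
 * 'size' slots, starting at slot 'idx' (already masked to be < size).
 * For 16B elements esize_shift is 2, for 8B elements it is 1.
 */
static inline void
enqueue_elems_u32(uint32_t *ring, uint32_t size, uint32_t idx,
	const uint32_t *obj, uint32_t n, uint32_t esize_shift)
{
	uint32_t i;
	/* Normalize slot counts to counts of 32-bit words using shifts. */
	const uint32_t nr_size = size << esize_shift;
	uint32_t nr_idx = idx << esize_shift;
	const uint32_t nr_num = n << esize_shift;

	if (nr_idx + nr_num <= nr_size) {
		/* No wrap-around: one linear copy. */
		for (i = 0; i < nr_num; i++)
			ring[nr_idx + i] = obj[i];
	} else {
		/* Wrap: fill to the end of the ring, then continue at slot 0. */
		for (i = 0; nr_idx < nr_size; i++, nr_idx++)
			ring[nr_idx] = obj[i];
		for (nr_idx = 0; i < nr_num; i++, nr_idx++)
			ring[nr_idx] = obj[i];
	}
}
```

Because every access goes through a uint32_t pointer, the only alignment
requirement on the obj table is 4B.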

> 
> >
> > > While exactly the same code with:
> > >
> > > struct ring_elem {uint64_t a, b, c;}; OR struct ring_elem {uint64_t
> > > a, b, c, d;};
> > >
> > > will work ok.
> > The alignment for these structures is still 8B. Are you saying this
> > will work because these will be copied using pointer to uint32_t (whose
> alignment is 4B)?
> 
> Yes, as we doing uint32_t copies, compiler can't assume the data will be 16B
> aligned and will use unaligned instructions.
> 
> >
> > >
> > > >
> > > > > Note that for elem sizes > 16 (24, 32), there is no such constraint.
> > > > The rest of them need to be aligned on 4B boundary. However, this
> > > > should
> > > not affect the existing code.
> > > > The code for 8B and 16B is kept as is to ensure the performance is
> > > > not
> > > affected for the existing code.
> > <snip>


^ permalink raw reply	[flat|nested] 173+ messages in thread

* Re: [dpdk-dev] [PATCH v7 02/17] lib/ring: apis to support configurable element size
  2020-01-08 23:40                       ` Honnappa Nagarahalli
@ 2020-01-09  0:48                         ` Ananyev, Konstantin
  2020-01-09 16:06                           ` Honnappa Nagarahalli
  0 siblings, 1 reply; 173+ messages in thread
From: Ananyev, Konstantin @ 2020-01-09  0:48 UTC (permalink / raw)
  To: Honnappa Nagarahalli, olivier.matz, sthemmin, jerinj, Richardson,
	Bruce, david.marchand, pbhagavatula
  Cc: dev, Dharmik Thakkar, Ruifeng Wang, Gavin Hu, nd, nd



> <snip>
> > > > > > > > > > +
> > > > > > > > > > +static __rte_always_inline void
> > > > > > > > > > +enqueue_elems_128(struct rte_ring *r, uint32_t prod_head,
> > > > > > > > > > +		const void *obj_table, uint32_t n)
> > > > > > > > > > +{
> > > > > > > > > > +	unsigned int i;
> > > > > > > > > > +	const uint32_t size = r->size;
> > > > > > > > > > +	uint32_t idx = prod_head & r->mask;
> > > > > > > > > > +	__uint128_t *ring = (__uint128_t *)&r[1];
> > > > > > > > > > +	const __uint128_t *obj = (const __uint128_t *)obj_table;
> > > > > > > > > > +	if (likely(idx + n < size)) {
> > > > > > > > > > +		for (i = 0; i < (n & ~0x1); i += 2, idx += 2) {
> > > > > > > > > > +			ring[idx] = obj[i];
> > > > > > > > > > +			ring[idx + 1] = obj[i + 1];
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > AFAIK, that implies 16B aligned obj_table...
> > > > > > > > > Would it always be the case?
> > > > > > > > I am not sure from the compiler perspective.
> > > > > > > > At least on Arm architecture, unaligned access (address that
> > > > > > > > is accessed is not aligned to the size of the data element
> > > > > > > > being
> > > > > > > > accessed) will result in faults or require additional cycles.
> > > > > > > > So, aligning on
> > > > > > 16B should be fine.
> > > > > > > Further, I would be changing this to use 'rte_int128_t' as
> > > > > > > '__uint128_t' is
> > > > > > not defined on 32b systems.
> > > > > >
> > > > > > What I am trying to say: with this code we imply new requirement
> > > > > > for elems
> > > > > The only existing use case in DPDK for 16B is the event ring. The
> > > > > event ring
> > > > already does similar kind of copy (using 'struct rte_event').
> > > > > So, there is no change in expectations for event ring.
> > > > > For future code, I think this expectation should be fine since it
> > > > > allows for
> > > > optimal code.
> > > > >
> > > > > > in the ring: when sizeof(elem)==16 it's alignment also has to be
> > > > > > at least
> > > > 16.
> > > > > > Which from my perspective is not ideal.
> > > > > Any reasoning?
> > > >
> > > > New implicit requirement and inconsistency.
> > > > Code like that:
> > > >
> > > > struct ring_elem {uint64_t a, b;};
> > > > ....
> > > > struct ring_elem elem;
> > > > rte_ring_dequeue_elem(ring, &elem, sizeof(elem));
> > > >
> > > > might cause a crash.
> > > The alignment here is 8B. Assuming that instructions generated will
> > > require 16B alignment, it will result in a crash, if configured to generate
> > exception.
> > > But, these instructions are not atomic instructions. At least on
> > > aarch64, unaligned access will not result in an exception for non-atomic
> > loads/stores. I believe it is the same behavior for x86 as well.
> >
> > On IA, there are 2 types of 16B load/store instructions: aligned and unaligned.
> > Aligned are a bit faster, but will cause an exception if used on non 16B aligned
> > address.
> > As you using uint128_t * compiler will assume that both src and dst are 16B
> > aligned and might generate code with aligned instructions.
> Ok, looking at few articles, I read that if the address is aligned, the unaligned instructions do not incur the penalty. Is this understanding
> correct?

Yes, from my experience the difference is negligible.

> 
> I see 2 solutions here:
> 1) We can switch this copy to use uint32_t pointer. It would still allow the compiler to generate (unaligned) instructions for up to 256b
> load/store. The 2 multiplications (to normalize the index and the size of copy) can use shifts. This should make it safer. If one wants
> performance, they can align the obj table to 16B (the ring itself is already aligned on the cache line boundary).

Sounds good to me.

> 
> 2) Considering that performance is paramount, we could document that the obj table needs to be aligned on 16B boundary. This would
> affect event dev (if we go ahead with replacing the event ring implementation) significantly.

I don't think the perf difference would be significant enough to justify such a constraint.
I am in favor of #1.
 
> Note that we have to do the same thing for 64b elements as well.

I don't mind having one unified copy procedure, which would always use 32-bit elems,
but AFAIK, on IA there is no such limitation for 64-bit loads/stores.


> 
> >
> > >
> > > > While exactly the same code with:
> > > >
> > > > struct ring_elem {uint64_t a, b, c;}; OR struct ring_elem {uint64_t
> > > > a, b, c, d;};
> > > >
> > > > will work ok.
> > > The alignment for these structures is still 8B. Are you saying this
> > > will work because these will be copied using pointer to uint32_t (whose
> > alignment is 4B)?
> >
> > Yes, as we doing uint32_t copies, compiler can't assume the data will be 16B
> > aligned and will use unaligned instructions.
> >
> > >
> > > >
> > > > >
> > > > > > Note that for elem sizes > 16 (24, 32), there is no such constraint.
> > > > > The rest of them need to be aligned on 4B boundary. However, this
> > > > > should
> > > > not affect the existing code.
> > > > > The code for 8B and 16B is kept as is to ensure the performance is
> > > > > not
> > > > affected for the existing code.
> > > <snip>


^ permalink raw reply	[flat|nested] 173+ messages in thread

* Re: [dpdk-dev] [PATCH v7 03/17] test/ring: add functional tests for rte_ring_xxx_elem APIs
  2020-01-07 16:03           ` Ananyev, Konstantin
@ 2020-01-09  5:15             ` Honnappa Nagarahalli
  0 siblings, 0 replies; 173+ messages in thread
From: Honnappa Nagarahalli @ 2020-01-09  5:15 UTC (permalink / raw)
  To: Ananyev, Konstantin, olivier.matz, sthemmin, jerinj, Richardson,
	Bruce, david.marchand, pbhagavatula
  Cc: dev, Dharmik Thakkar, Ruifeng Wang, Gavin Hu, nd,
	Honnappa Nagarahalli, nd

<snip>

> Subject: RE: [PATCH v7 03/17] test/ring: add functional tests for
> rte_ring_xxx_elem APIs
> 
> > > > Add basic infrastructure to test rte_ring_xxx_elem APIs. Add test
> > > > cases for testing burst and bulk tests.
> > > >
> > > > Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
> > > > Reviewed-by: Gavin Hu <gavin.hu@arm.com>
> > > > ---
<snip>

> > > > diff --git a/app/test/test_ring.h b/app/test/test_ring.h new file
> > > > mode
> > > > 100644 index 000000000..19ef1b399
> > > > --- /dev/null
> > > > +++ b/app/test/test_ring.h
> > > > @@ -0,0 +1,203 @@
> > > > +/* SPDX-License-Identifier: BSD-3-Clause
> > > > + * Copyright(c) 2019 Arm Limited
> > > > + */
> > > > +
> > > > +#include <rte_malloc.h>
> > > > +#include <rte_ring.h>
> > > > +#include <rte_ring_elem.h>
> > > > +
> > > > +/* API type to call
> > > > + * N - Calls default APIs
> > > > + * S - Calls SP or SC API
> > > > + * M - Calls MP or MC API
> > > > + */
> > > > +#define TEST_RING_N 1
> > > > +#define TEST_RING_S 2
> > > > +#define TEST_RING_M 4
> > > > +
> > > > +/* API type to call
> > > > + * SL - Calls single element APIs
> > > > + * BL - Calls bulk APIs
> > > > + * BR - Calls burst APIs
> > > > + */
> > > > +#define TEST_RING_SL 8
> > > > +#define TEST_RING_BL 16
> > > > +#define TEST_RING_BR 32
> > > > +
> > > > +#define TEST_RING_IGNORE_API_TYPE ~0U
> > > > +
> > > > +#define TEST_RING_INCP(obj, esize, n) do { \
> > > > +	/* Legacy queue APIs? */ \
> > > > +	if ((esize) == -1) \
> > > > +		obj = ((void **)obj) + n; \
> > > > +	else \
> > > > +		obj = (void **)(((uint32_t *)obj) + \
> > > > +					(n * esize / sizeof(uint32_t))); \ }
> > > while (0)
> > > > +
> > > > > +#define TEST_RING_CREATE(name, esize, count, socket_id, flags, r) do { \
> > > > +	/* Legacy queue APIs? */ \
> > > > +	if ((esize) == -1) \
> > > > +		r = rte_ring_create((name), (count), (socket_id), (flags)); \
> > > > +	else \
> > > > +		r = rte_ring_create_elem((name), (esize), (count), \
> > > > +						(socket_id), (flags)); \
> > > > +} while (0)
> > > > +
> > > > +#define TEST_RING_ENQUEUE(r, obj, esize, n, ret, api_type) do { \
> > > > +	/* Legacy queue APIs? */ \
> > > > +	if ((esize) == -1) \
> > > > +		switch (api_type) { \
> > > > +		case (TEST_RING_N | TEST_RING_SL): \
> > > > +			ret = rte_ring_enqueue(r, obj); \
> > > > +			break; \
> > > > +		case (TEST_RING_S | TEST_RING_SL): \
> > > > +			ret = rte_ring_sp_enqueue(r, obj); \
> > > > +			break; \
> > > > +		case (TEST_RING_M | TEST_RING_SL): \
> > > > +			ret = rte_ring_mp_enqueue(r, obj); \
> > > > +			break; \
> > > > +		case (TEST_RING_N | TEST_RING_BL): \
> > > > +			ret = rte_ring_enqueue_bulk(r, obj, n, NULL); \
> > > > +			break; \
> > > > +		case (TEST_RING_S | TEST_RING_BL): \
> > > > +			ret = rte_ring_sp_enqueue_bulk(r, obj, n, NULL); \
> > > > +			break; \
> > > > +		case (TEST_RING_M | TEST_RING_BL): \
> > > > +			ret = rte_ring_mp_enqueue_bulk(r, obj, n, NULL); \
> > > > +			break; \
> > > > +		case (TEST_RING_N | TEST_RING_BR): \
> > > > +			ret = rte_ring_enqueue_burst(r, obj, n, NULL); \
> > > > +			break; \
> > > > +		case (TEST_RING_S | TEST_RING_BR): \
> > > > +			ret = rte_ring_sp_enqueue_burst(r, obj, n, NULL); \
> > > > +			break; \
> > > > +		case (TEST_RING_M | TEST_RING_BR): \
> > > > +			ret = rte_ring_mp_enqueue_burst(r, obj, n, NULL); \
> > > > +		} \
> > > > +	else \
> > > > +		switch (api_type) { \
> > > > +		case (TEST_RING_N | TEST_RING_SL): \
> > > > +			ret = rte_ring_enqueue_elem(r, obj, esize); \
> > > > +			break; \
> > > > +		case (TEST_RING_S | TEST_RING_SL): \
> > > > +			ret = rte_ring_sp_enqueue_elem(r, obj, esize); \
> > > > +			break; \
> > > > +		case (TEST_RING_M | TEST_RING_SL): \
> > > > +			ret = rte_ring_mp_enqueue_elem(r, obj, esize); \
> > > > +			break; \
> > > > +		case (TEST_RING_N | TEST_RING_BL): \
> > > > +			ret = rte_ring_enqueue_bulk_elem(r, obj, esize, n, \
> > > > +								NULL); \
> > > > +			break; \
> > > > +		case (TEST_RING_S | TEST_RING_BL): \
> > > > +			ret = rte_ring_sp_enqueue_bulk_elem(r, obj, esize, n,
> > > \
> > > > +								NULL); \
> > > > +			break; \
> > > > +		case (TEST_RING_M | TEST_RING_BL): \
> > > > +			ret = rte_ring_mp_enqueue_bulk_elem(r, obj, esize, n,
> > > \
> > > > +								NULL); \
> > > > +			break; \
> > > > +		case (TEST_RING_N | TEST_RING_BR): \
> > > > +			ret = rte_ring_enqueue_burst_elem(r, obj, esize, n, \
> > > > +								NULL); \
> > > > +			break; \
> > > > +		case (TEST_RING_S | TEST_RING_BR): \
> > > > +			ret = rte_ring_sp_enqueue_burst_elem(r, obj, esize, n,
> > > \
> > > > +								NULL); \
> > > > +			break; \
> > > > +		case (TEST_RING_M | TEST_RING_BR): \
> > > > +			ret = rte_ring_mp_enqueue_burst_elem(r, obj, esize,
> > > n, \
> > > > +								NULL); \
> > > > +		} \
> > > > +} while (0)
> > > > +
> > > > +#define TEST_RING_DEQUEUE(r, obj, esize, n, ret, api_type) do { \
> > > > +	/* Legacy queue APIs? */ \
> > > > +	if ((esize) == -1) \
> > > > +		switch (api_type) { \
> > > > +		case (TEST_RING_N | TEST_RING_SL): \
> > > > +			ret = rte_ring_dequeue(r, obj); \
> > > > +			break; \
> > > > +		case (TEST_RING_S | TEST_RING_SL): \
> > > > +			ret = rte_ring_sc_dequeue(r, obj); \
> > > > +			break; \
> > > > +		case (TEST_RING_M | TEST_RING_SL): \
> > > > +			ret = rte_ring_mc_dequeue(r, obj); \
> > > > +			break; \
> > > > +		case (TEST_RING_N | TEST_RING_BL): \
> > > > +			ret = rte_ring_dequeue_bulk(r, obj, n, NULL); \
> > > > +			break; \
> > > > +		case (TEST_RING_S | TEST_RING_BL): \
> > > > +			ret = rte_ring_sc_dequeue_bulk(r, obj, n, NULL); \
> > > > +			break; \
> > > > +		case (TEST_RING_M | TEST_RING_BL): \
> > > > +			ret = rte_ring_mc_dequeue_bulk(r, obj, n, NULL); \
> > > > +			break; \
> > > > +		case (TEST_RING_N | TEST_RING_BR): \
> > > > +			ret = rte_ring_dequeue_burst(r, obj, n, NULL); \
> > > > +			break; \
> > > > +		case (TEST_RING_S | TEST_RING_BR): \
> > > > +			ret = rte_ring_sc_dequeue_burst(r, obj, n, NULL); \
> > > > +			break; \
> > > > +		case (TEST_RING_M | TEST_RING_BR): \
> > > > +			ret = rte_ring_mc_dequeue_burst(r, obj, n, NULL); \
> > > > +		} \
> > > > +	else \
> > > > +		switch (api_type) { \
> > > > +		case (TEST_RING_N | TEST_RING_SL): \
> > > > +			ret = rte_ring_dequeue_elem(r, obj, esize); \
> > > > +			break; \
> > > > +		case (TEST_RING_S | TEST_RING_SL): \
> > > > +			ret = rte_ring_sc_dequeue_elem(r, obj, esize); \
> > > > +			break; \
> > > > +		case (TEST_RING_M | TEST_RING_SL): \
> > > > +			ret = rte_ring_mc_dequeue_elem(r, obj, esize); \
> > > > +			break; \
> > > > +		case (TEST_RING_N | TEST_RING_BL): \
> > > > +			ret = rte_ring_dequeue_bulk_elem(r, obj, esize, n, \
> > > > +								NULL); \
> > > > +			break; \
> > > > +		case (TEST_RING_S | TEST_RING_BL): \
> > > > +			ret = rte_ring_sc_dequeue_bulk_elem(r, obj, esize, n,
> > > \
> > > > +								NULL); \
> > > > +			break; \
> > > > +		case (TEST_RING_M | TEST_RING_BL): \
> > > > +			ret = rte_ring_mc_dequeue_bulk_elem(r, obj, esize, n,
> > > \
> > > > +								NULL); \
> > > > +			break; \
> > > > +		case (TEST_RING_N | TEST_RING_BR): \
> > > > +			ret = rte_ring_dequeue_burst_elem(r, obj, esize, n, \
> > > > +								NULL); \
> > > > +			break; \
> > > > +		case (TEST_RING_S | TEST_RING_BR): \
> > > > +			ret = rte_ring_sc_dequeue_burst_elem(r, obj, esize, n,
> > > \
> > > > +								NULL); \
> > > > +			break; \
> > > > +		case (TEST_RING_M | TEST_RING_BR): \
> > > > +			ret = rte_ring_mc_dequeue_burst_elem(r, obj, esize,
> > > n, \
> > > > +								NULL); \
> > > > +		} \
> > > > +} while (0)
> > >
> > >
> > > My thought to avoid test-code duplication was a bit different.
> > Yes, this can be done multiple ways. My implementation is not complicated
> either.
> >
> > > Instead of adding extra enums/parameters and then do switch on them,
> > > my
> > The switch statement should be removed by the compiler for the
> performance tests.
> 
> I am sure the compiler will do its job properly.
> My concern is that with all these extra flags, it is really hard to read and
> understand what exactly function we are calling and what we are trying to
> test.
There are just 2 flags - 1) representing single/bulk/burst 2) representing default/single/multiple threads. This is the way the rte_ring APIs are also organized (rte_ring_<sp/mp or sc/mc>_enqueue_<bulk/burst>).
If we want to keep the code flexible, we have to keep these 2 flags that can be varied.
Your proposal considers only the element size as a variable. It does not consider the above mentioned variables. This results in code duplication. This is visible in patch 10/17.
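
To show how the two flags line up with the API naming, here is a small
standalone sketch. test_ring_api_name() is a hypothetical helper, not part of
the patch; the flag values are the ones from test_ring.h:

```c
#include <stddef.h>
#include <stdio.h>

/* Thread-mode flag: default, SP/SC, MP/MC. */
#define TEST_RING_N 1
#define TEST_RING_S 2
#define TEST_RING_M 4
/* Element-mode flag: single, bulk, burst. */
#define TEST_RING_SL 8
#define TEST_RING_BL 16
#define TEST_RING_BR 32

/*
 * Build the name of the legacy enqueue API selected by api_type, e.g.
 * (TEST_RING_S | TEST_RING_BL) -> "rte_ring_sp_enqueue_bulk".
 * Returns the snprintf() result, or -1 if a flag is missing.
 */
static inline int
test_ring_api_name(unsigned int api_type, char *buf, size_t len)
{
	const char *thr, *elem;

	if (api_type & TEST_RING_N)
		thr = "";
	else if (api_type & TEST_RING_S)
		thr = "sp_";
	else if (api_type & TEST_RING_M)
		thr = "mp_";
	else
		return -1;

	if (api_type & TEST_RING_SL)
		elem = "";
	else if (api_type & TEST_RING_BL)
		elem = "_bulk";
	else if (api_type & TEST_RING_BR)
		elem = "_burst";
	else
		return -1;

	return snprintf(buf, len, "rte_ring_%senqueue%s", thr, elem);
}
```

Each of the 3 x 3 combinations maps to exactly one existing API, which is why
a single api_type argument is enough for the test macros.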

> Might be just me, but let say in original version for enqueue_bulk() we have:
> 
>         const uint64_t sp_start = rte_rdtsc();
>         for (i = 0; i < iterations; i++)
>                 while (rte_ring_sp_enqueue_bulk(r, burst, size, NULL) == 0)
>                         rte_pause();
>         const uint64_t sp_end = rte_rdtsc();
> 
>         const uint64_t mp_start = rte_rdtsc();
>         for (i = 0; i < iterations; i++)
>                 while (rte_ring_mp_enqueue_bulk(r, burst, size, NULL) == 0)
>                         rte_pause();
>         const uint64_t mp_end = rte_rdtsc();
> 
> Simple and easy to understand.
> Same code after the patch doesn't that straightforward anymore:
> 
>  const uint64_t sp_start = rte_rdtsc();
>         for (i = 0; i < iterations; i++)
>                 do {
>                         if (flag == 0)
>                                 TEST_RING_ENQUEUE(r, burst, esize, bsize, ret,
>                                                 TEST_RING_S | TEST_RING_BL);
>                         else if (flag == 1)
>                                 TEST_RING_DEQUEUE(r, burst, esize, bsize, ret,
>                                                 TEST_RING_S | TEST_RING_BL);
Would it help if the #define names were clearer?
Maybe convert

TEST_RING_SL to TEST_ELEM_SINGLE
TEST_RING_BL to TEST_ELEM_BULK
TEST_RING_BR to TEST_ELEM_BURST

and

TEST_RING_N to TEST_THREAD_DEFAULT
TEST_RING_S to TEST_THREAD_SPSC
TEST_RING_M to TEST_THREAD_MPMC
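
As a rough illustration of how the two flags combine, here is a minimal standalone sketch. The flag names follow the renaming proposed above; the bit values and the helper 'api_name' are hypothetical and exist only for this example (the helper returns the name of the rte_ring enqueue API that a flag pair would select, rather than calling it).

```c
#include <stddef.h>

/* Hypothetical bit values; only the names come from the proposal above. */
enum {
	TEST_ELEM_SINGLE    = 1 << 0,
	TEST_ELEM_BULK      = 1 << 1,
	TEST_ELEM_BURST     = 1 << 2,
	TEST_THREAD_DEFAULT = 1 << 3,
	TEST_THREAD_SPSC    = 1 << 4,
	TEST_THREAD_MPMC    = 1 << 5,
};

/*
 * Hypothetical helper: map a (thread mode | element mode) pair to the
 * name of the enqueue API it selects. With a compile-time constant
 * argument the switch collapses to a single case.
 */
static const char *
api_name(unsigned int api_type)
{
	switch (api_type) {
	case TEST_THREAD_DEFAULT | TEST_ELEM_SINGLE:
		return "rte_ring_enqueue";
	case TEST_THREAD_SPSC | TEST_ELEM_SINGLE:
		return "rte_ring_sp_enqueue";
	case TEST_THREAD_MPMC | TEST_ELEM_SINGLE:
		return "rte_ring_mp_enqueue";
	case TEST_THREAD_DEFAULT | TEST_ELEM_BULK:
		return "rte_ring_enqueue_bulk";
	case TEST_THREAD_SPSC | TEST_ELEM_BULK:
		return "rte_ring_sp_enqueue_bulk";
	case TEST_THREAD_MPMC | TEST_ELEM_BULK:
		return "rte_ring_mp_enqueue_bulk";
	case TEST_THREAD_DEFAULT | TEST_ELEM_BURST:
		return "rte_ring_enqueue_burst";
	case TEST_THREAD_SPSC | TEST_ELEM_BURST:
		return "rte_ring_sp_enqueue_burst";
	case TEST_THREAD_MPMC | TEST_ELEM_BURST:
		return "rte_ring_mp_enqueue_burst";
	default:
		/* Unknown or incomplete flag combination. */
		return NULL;
	}
}
```

The point of the sketch is only that three element modes times three thread modes give nine variants from two orthogonal flags, which mirrors the rte_ring API naming.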

>                         if (ret == 0)
>                                 rte_pause();
>                 } while (!ret);
>  const uint64_t sp_end = rte_rdtsc();
> 
> Another thing - if tomorrow we'll want to add perf tests for elem_size==4/8,
> etc. - we'll need to do copy/paste for all test-case invocations, as you did for
> 16B (or some code reorg).
This is a mistake on my side. Looking at the code, 'test_ring_perf' can be simplified to avoid the copy/paste. 'test_ring_perf' can be changed to call another function (that contains the test cases) with different element sizes. I will make this change.
The only issue would be the wrappers 'dequeue_bulk', 'dequeue_bulk_16B' etc. However, the wrappers are simple enough to maintain.

> 
> >
> > > intention was something like that:
> > >
> > > 1. mv test_ring_perf.c test_ring_perf.h
> > > 2. Inside test_ring_perf.h change rte_ring_ create/enqueue/dequeue
> > >    function calls to some not-defined function/macros invocations,
> > >    with similar name, same number of parameters, and same semantics.
> > >    Also change 'void *burst[..]' to 'RING_ELEM[...]';
> > > 3. For each test configuration we want to have (default, 4B, 8B, 16B)
> > >    create a new .c file where we:
> > >    - define used in test_ring_perf.h macros(/function)
> > >    - include test_ring_perf.h
> > >    - REGISTER_TEST_COMMAND(<test_name>, test_ring_perf);
> > >
> > > As an example:
> > > test_ring_perf.h:
> > > ...
> > > static int
> > > enqueue_bulk(void *p)
> > > {
> > >         ...
> > >         RING_ELEM burst[MAX_BURST];
> > >
> > >         memset(burst, 0, sizeof(burst));
> > >         ....
> > >         const uint64_t sp_start = rte_rdtsc();
> > >         for (i = 0; i < iterations; i++)
> > >                 while (RING_SP_ENQUEUE_BULK(r, burst, size, NULL) == 0)
> > >                         rte_pause();
> > >         const uint64_t sp_end = rte_rdtsc();
> > >
> > >         const uint64_t mp_start = rte_rdtsc();
> > >         for (i = 0; i < iterations; i++)
> > >                 while (RING_MP_ENQUEUE_BULK(r, burst, size, NULL) == 0)
> > >                         rte_pause();
> > >         const uint64_t mp_end = rte_rdtsc();
> > >         ....
> > >
> > > Then in test_ring_perf.c:
> > >
> > > ....
> > > #define RING_ELEM	void *
> > > ...
> > > #define RING_SP_ENQUEUE_BULK(ring, buf, size, spc)  \
> > >        rte_ring_sp_enqueue_bulk(ring, buf, size, spc) ....
> > >
> > > #include "test_ring_perf.h"
> > > REGISTER_TEST_COMMAND(ring_perf_autotest, test_ring_perf);
> > >
> > >
> > > In test_ring_elem16B_perf.c:
> > > ....
> > > #define RING_ELEM	__uint128_t
> > > #define RING_SP_ENQUEUE_BULK(ring, buf, size, spc)  \
> > > 	rte_ring_sp_enqueue_bulk_elem(ring, buf, sizeof(RING_ELEM), size,
> > > spc) ....
> > > #include "test_ring_perf.h"
> > > REGISTER_TEST_COMMAND(ring_perf_elem16B_autotest, test_ring_perf);
> > >
> > > In test_ring_elem4B_perf.c:
> > >
> > > ....
> > > #define RING_ELEM	uint32_t
> > > #define RING_SP_ENQUEUE_BULK(ring, buf, size, spc)  \
> > > 	rte_ring_sp_enqueue_bulk_elem(ring, buf, sizeof(RING_ELEM), size,
> > > spc) ....
> > > #include "test_ring_perf.h"
> > > REGISTER_TEST_COMMAND(ring_perf_elem4B_autotest, test_ring_perf);
> > >
> > > And so on.
This will result in additional test files.

> > >
> > > > +
> > > > +/* This function is placed here as it is required for both
> > > > + * performance and functional tests.
> > > > + */
> > > > +static __rte_always_inline void * test_ring_calloc(unsigned int
> > > > +rsize, int esize) {
> > > > +	unsigned int sz;
> > > > +	void *p;
> > > > +
> > > > +	/* Legacy queue APIs? */
> > > > +	if (esize == -1)
> > > > +		sz = sizeof(void *);
> > > > +	else
> > > > +		sz = esize;
> > > > +
> > > > +	p = rte_zmalloc(NULL, rsize * sz, RTE_CACHE_LINE_SIZE);
> > > > +	if (p == NULL)
> > > > +		printf("Failed to allocate memory\n");
> > > > +
> > > > +	return p;
> > > > +}
> > > > --
> > > > 2.17.1



* Re: [dpdk-dev] [PATCH v7 02/17] lib/ring: apis to support configurable element size
  2020-01-09  0:48                         ` Ananyev, Konstantin
@ 2020-01-09 16:06                           ` Honnappa Nagarahalli
  2020-01-13 11:53                             ` Ananyev, Konstantin
  0 siblings, 1 reply; 173+ messages in thread
From: Honnappa Nagarahalli @ 2020-01-09 16:06 UTC (permalink / raw)
  To: Ananyev, Konstantin, olivier.matz, sthemmin, jerinj, Richardson,
	Bruce, david.marchand, pbhagavatula
  Cc: dev, Dharmik Thakkar, Ruifeng Wang, Gavin Hu, nd,
	Honnappa Nagarahalli, nd

<snip>
> > > > > > > > > > > +
> > > > > > > > > > > +static __rte_always_inline void
> > > > > > > > > > > +enqueue_elems_128(struct rte_ring *r, uint32_t
> > > > > > > > > > > +prod_head, const void *obj_table, uint32_t n) {
> > > > > > > > > > > +unsigned int i; const uint32_t size =
> > > > > > > > > > > +r->size; uint32_t idx = prod_head & r->mask;
> > > > > > > > > > > +r->__uint128_t
> > > > > > > > > > > +*ring = (__uint128_t *)&r[1]; const __uint128_t
> > > > > > > > > > > +*obj = (const __uint128_t *)obj_table; if
> > > > > > > > > > > +(likely(idx + n <
> > > > > > > > > > > +size)) { for (i = 0; i < (n & ~0x1); i += 2, idx +=
> > > > > > > > > > > +2) { ring[idx] = obj[i]; ring[idx + 1] = obj[i +
> > > > > > > > > > > +1];
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > AFAIK, that implies 16B aligned obj_table...
> > > > > > > > > > Would it always be the case?
> > > > > > > > > I am not sure from the compiler perspective.
> > > > > > > > > At least on Arm architecture, unaligned access (address
> > > > > > > > > that is accessed is not aligned to the size of the data
> > > > > > > > > element being
> > > > > > > > > accessed) will result in faults or require additional cycles.
> > > > > > > > > So, aligning on
> > > > > > > 16B should be fine.
> > > > > > > > Further, I would be changing this to use 'rte_int128_t' as
> > > > > > > > '__uint128_t' is
> > > > > > > not defined on 32b systems.
> > > > > > >
> > > > > > > What I am trying to say: with this code we imply new
> > > > > > > requirement for elems
> > > > > > The only existing use case in DPDK for 16B is the event ring.
> > > > > > The event ring
> > > > > already does similar kind of copy (using 'struct rte_event').
> > > > > > So, there is no change in expectations for event ring.
> > > > > > For future code, I think this expectation should be fine since
> > > > > > it allows for
> > > > > optimal code.
> > > > > >
> > > > > > > in the ring: when sizeof(elem)==16 it's alignment also has
> > > > > > > to be at least
> > > > > 16.
> > > > > > > Which from my perspective is not ideal.
> > > > > > Any reasoning?
> > > > >
> > > > > New implicit requirement and inconsistency.
> > > > > Code like that:
> > > > >
> > > > > struct ring_elem {uint64_t a, b;}; ....
> > > > > struct ring_elem elem;
> > > > > rte_ring_dequeue_elem(ring, &elem, sizeof(elem));
> > > > >
> > > > > might cause a crash.
> > > > The alignment here is 8B. Assuming that instructions generated
> > > > will require 16B alignment, it will result in a crash, if
> > > > configured to generate
> > > exception.
> > > > But, these instructions are not atomic instructions. At least on
> > > > aarch64, unaligned access will not result in an exception for
> > > > non-atomic
> > > loads/stores. I believe it is the same behavior for x86 as well.
> > >
> > > On IA, there are 2 types of 16B load/store instructions: aligned and
> unaligned.
> > > Aligned are a bit faster, but will cause an exception if used on non
> > > 16B aligned address.
> > > As you using uint128_t * compiler will assume that both src and dst
> > > are 16B aligned and might generate code with aligned instructions.
> > Ok, looking at few articles, I read that if the address is aligned,
> > the unaligned instructions do not incur the penalty. Is this understanding
> correct?
> 
> Yes, from my experience the difference is negligible.
> 
> >
> > I see 2 solutions here:
> > 1) We can switch this copy to use uint32_t pointer. It would still
> > allow the compiler to generate (unaligned) instructions for up to 256b
> > load/store. The 2 multiplications (to normalize the index and the size of copy)
> can use shifts. This should make it safer. If one wants performance, they can
> align the obj table to 16B (the ring itself is already aligned on the cache line
> boundary).
> 
> Sounds good to me.
> 
> >
> > 2) Considering that performance is paramount, we could document that
> > the obj table needs to be aligned on 16B boundary. This would affect event
> dev (if we go ahead with replacing the event ring implementation) significantly.
> 
> I don't think perf difference would be that significant to justify such constraint.
> I am in favor of #1.
Ok, will go with this.
Is it ok if I squash the intermediate commits for test cases? I can keep one commit for functional tests and another for performance tests.
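
To make option 1 concrete, here is a minimal standalone sketch of the index normalization it describes. This is not the patch's code: 'enqueue_words' is a hypothetical helper, the ring storage is modelled as a plain uint32_t array, and memcpy is used for the actual copy so that the sketch makes no assumption about the alignment of the obj table. It assumes esize is a multiple of 4, idx < size and n <= size.

```c
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: copy n elements of esize bytes each from
 * obj_table into a ring of 'size' slots, starting at slot 'idx'.
 * Index, ring size and count are normalized to 32-bit words, so one
 * routine serves every element size that is a multiple of 4 bytes.
 */
static void
enqueue_words(uint32_t *ring, uint32_t size, uint32_t idx,
	      const void *obj_table, uint32_t esize, uint32_t n)
{
	const uint32_t scale = esize / sizeof(uint32_t);
	const uint32_t nr_size = size * scale;	/* ring size in words */
	const uint32_t nr_idx = idx * scale;	/* start offset in words */
	const uint32_t nr_num = n * scale;	/* words to copy */
	const uint8_t *src = obj_table;
	uint32_t first = nr_num;

	/* Split the copy in two if it wraps past the end of the ring. */
	if (nr_idx + nr_num > nr_size)
		first = nr_size - nr_idx;

	memcpy(&ring[nr_idx], src, first * sizeof(uint32_t));
	memcpy(&ring[0], src + first * sizeof(uint32_t),
	       (nr_num - first) * sizeof(uint32_t));
}
```

With 'struct ring_elem {uint64_t a, b;}' from the example earlier in the thread, esize is 16 and scale is 4, and nothing in the sketch depends on the element being 16B aligned.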

> 
> > Note that we have to do the same thing for 64b elements as well.
> 
> I don't mind to have one unified copy procedure, which would always use 32bit
> elems, but AFAIK, on IA there is no such limitation for 64bit load/stores.
> 
> 
> >
> > >
> > > >
> > > > > While exactly the same code with:
> > > > >
> > > > > struct ring_elem {uint64_t a, b, c;}; OR struct ring_elem
> > > > > {uint64_t a, b, c, d;};
> > > > >
> > > > > will work ok.
> > > > The alignment for these structures is still 8B. Are you saying
> > > > this will work because these will be copied using pointer to
> > > > uint32_t (whose
> > > alignment is 4B)?
> > >
> > > Yes, as we doing uint32_t copies, compiler can't assume the data
> > > will be 16B aligned and will use unaligned instructions.
> > >
> > > >
> > > > >
> > > > > >
> > > > > > > Note that for elem sizes > 16 (24, 32), there is no such constraint.
> > > > > > The rest of them need to be aligned on 4B boundary. However,
> > > > > > this should
> > > > > not affect the existing code.
> > > > > > The code for 8B and 16B is kept as is to ensure the
> > > > > > performance is not
> > > > > affected for the existing code.
> > > > <snip>



* Re: [dpdk-dev] [PATCH v7 02/17] lib/ring: apis to support configurable element size
  2020-01-09 16:06                           ` Honnappa Nagarahalli
@ 2020-01-13 11:53                             ` Ananyev, Konstantin
  0 siblings, 0 replies; 173+ messages in thread
From: Ananyev, Konstantin @ 2020-01-13 11:53 UTC (permalink / raw)
  To: Honnappa Nagarahalli, olivier.matz, sthemmin, jerinj, Richardson,
	Bruce, david.marchand, pbhagavatula
  Cc: dev, Dharmik Thakkar, Ruifeng Wang, Gavin Hu, nd, nd



> <snip>
> > > > > > > > > > > > +
> > > > > > > > > > > > +static __rte_always_inline void
> > > > > > > > > > > > +enqueue_elems_128(struct rte_ring *r, uint32_t
> > > > > > > > > > > > +prod_head, const void *obj_table, uint32_t n) {
> > > > > > > > > > > > +unsigned int i; const uint32_t size =
> > > > > > > > > > > > +r->size; uint32_t idx = prod_head & r->mask;
> > > > > > > > > > > > +r->__uint128_t
> > > > > > > > > > > > +*ring = (__uint128_t *)&r[1]; const __uint128_t
> > > > > > > > > > > > +*obj = (const __uint128_t *)obj_table; if
> > > > > > > > > > > > +(likely(idx + n <
> > > > > > > > > > > > +size)) { for (i = 0; i < (n & ~0x1); i += 2, idx +=
> > > > > > > > > > > > +2) { ring[idx] = obj[i]; ring[idx + 1] = obj[i +
> > > > > > > > > > > > +1];
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > AFAIK, that implies 16B aligned obj_table...
> > > > > > > > > > > Would it always be the case?
> > > > > > > > > > I am not sure from the compiler perspective.
> > > > > > > > > > At least on Arm architecture, unaligned access (address
> > > > > > > > > > that is accessed is not aligned to the size of the data
> > > > > > > > > > element being
> > > > > > > > > > accessed) will result in faults or require additional cycles.
> > > > > > > > > > So, aligning on
> > > > > > > > 16B should be fine.
> > > > > > > > > Further, I would be changing this to use 'rte_int128_t' as
> > > > > > > > > '__uint128_t' is
> > > > > > > > not defined on 32b systems.
> > > > > > > >
> > > > > > > > What I am trying to say: with this code we imply new
> > > > > > > > requirement for elems
> > > > > > > The only existing use case in DPDK for 16B is the event ring.
> > > > > > > The event ring
> > > > > > already does similar kind of copy (using 'struct rte_event').
> > > > > > > So, there is no change in expectations for event ring.
> > > > > > > For future code, I think this expectation should be fine since
> > > > > > > it allows for
> > > > > > optimal code.
> > > > > > >
> > > > > > > > in the ring: when sizeof(elem)==16 it's alignment also has
> > > > > > > > to be at least
> > > > > > 16.
> > > > > > > > Which from my perspective is not ideal.
> > > > > > > Any reasoning?
> > > > > >
> > > > > > New implicit requirement and inconsistency.
> > > > > > Code like that:
> > > > > >
> > > > > > struct ring_elem {uint64_t a, b;}; ....
> > > > > > struct ring_elem elem;
> > > > > > rte_ring_dequeue_elem(ring, &elem, sizeof(elem));
> > > > > >
> > > > > > might cause a crash.
> > > > > The alignment here is 8B. Assuming that instructions generated
> > > > > will require 16B alignment, it will result in a crash, if
> > > > > configured to generate
> > > > exception.
> > > > > But, these instructions are not atomic instructions. At least on
> > > > > aarch64, unaligned access will not result in an exception for
> > > > > non-atomic
> > > > loads/stores. I believe it is the same behavior for x86 as well.
> > > >
> > > > On IA, there are 2 types of 16B load/store instructions: aligned and
> > unaligned.
> > > > Aligned are a bit faster, but will cause an exception if used on non
> > > > 16B aligned address.
> > > > As you using uint128_t * compiler will assume that both src and dst
> > > > are 16B aligned and might generate code with aligned instructions.
> > > Ok, looking at few articles, I read that if the address is aligned,
> > > the unaligned instructions do not incur the penalty. Is this understanding
> > correct?
> >
> > Yes, from my experience the difference is negligible.
> >
> > >
> > > I see 2 solutions here:
> > > 1) We can switch this copy to use uint32_t pointer. It would still
> > > allow the compiler to generate (unaligned) instructions for up to 256b
> > > load/store. The 2 multiplications (to normalize the index and the size of copy)
> > can use shifts. This should make it safer. If one wants performance, they can
> > align the obj table to 16B (the ring itself is already aligned on the cache line
> > boundary).
> >
> > Sounds good to me.
> >
> > >
> > > 2) Considering that performance is paramount, we could document that
> > > the obj table needs to be aligned on 16B boundary. This would affect event
> > dev (if we go ahead with replacing the event ring implementation) significantly.
> >
> > I don't think perf difference would be that significant to justify such constraint.
> > I am in favor of #1.
> Ok, will go with this.
> Is it ok if I squash the intermediate commits for test cases? I can keep one commit for functional tests and another for performance tests.

Yes, sounds like a good thing for me.
Konstantin 


> 
> >
> > > Note that we have to do the same thing for 64b elements as well.
> >
> > I don't mind to have one unified copy procedure, which would always use 32bit
> > elems, but AFAIK, on IA there is no such limitation for 64bit load/stores.
> >
> >
> > >
> > > >
> > > > >
> > > > > > While exactly the same code with:
> > > > > >
> > > > > > struct ring_elem {uint64_t a, b, c;}; OR struct ring_elem
> > > > > > {uint64_t a, b, c, d;};
> > > > > >
> > > > > > will work ok.
> > > > > The alignment for these structures is still 8B. Are you saying
> > > > > this will work because these will be copied using pointer to
> > > > > uint32_t (whose
> > > > alignment is 4B)?
> > > >
> > > > Yes, as we doing uint32_t copies, compiler can't assume the data
> > > > will be 16B aligned and will use unaligned instructions.
> > > >
> > > > >
> > > > > >
> > > > > > >
> > > > > > > > Note that for elem sizes > 16 (24, 32), there is no such constraint.
> > > > > > > The rest of them need to be aligned on 4B boundary. However,
> > > > > > > this should
> > > > > > not affect the existing code.
> > > > > > > The code for 8B and 16B is kept as is to ensure the
> > > > > > > performance is not
> > > > > > affected for the existing code.
> > > > > <snip>



* [dpdk-dev] [PATCH v8 0/6] lib/ring: APIs to support custom element size
  2019-09-06 19:05 ` [dpdk-dev] [PATCH v2 0/6] " Honnappa Nagarahalli
                     ` (12 preceding siblings ...)
  2019-12-20  4:45   ` [dpdk-dev] [PATCH v7 00/17] " Honnappa Nagarahalli
@ 2020-01-13 17:25   ` Honnappa Nagarahalli
  2020-01-13 17:25     ` [dpdk-dev] [PATCH v8 1/6] test/ring: use division for cycle count calculation Honnappa Nagarahalli
                       ` (5 more replies)
  2020-01-16  5:25   ` [dpdk-dev] [PATCH v9 0/6] lib/ring: APIs to support custom element size Honnappa Nagarahalli
  2020-01-18 19:32   ` [dpdk-dev] [PATCH v10 " Honnappa Nagarahalli
  15 siblings, 6 replies; 173+ messages in thread
From: Honnappa Nagarahalli @ 2020-01-13 17:25 UTC (permalink / raw)
  To: olivier.matz, sthemmin, jerinj, bruce.richardson, david.marchand,
	pbhagavatula, konstantin.ananyev, honnappa.nagarahalli
  Cc: dev, dharmik.thakkar, ruifeng.wang, gavin.hu, nd

The current rte_ring hard-codes the type of the ring element to 'void *',
hence the size of the element is hard-coded to 32b/64b. Since the ring
element type is not an input to rte_ring APIs, it results in a couple
of issues:

1) If an application needs to store an element which is not 64b, it
   needs to write its own ring APIs similar to rte_event_ring APIs. This
   creates additional burden on the programmers, who end up making
   work-arounds and often waste memory.
2) If there are multiple libraries that store elements of the same
   type, currently they would have to write their own rte_ring APIs. This
   results in code duplication.

This patch adds new APIs to support configurable ring element size.
The APIs support custom element sizes by allowing the ring element
size to be any multiple of 32b (4 bytes).
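
The effect of the element size on memory use can be sketched with a small standalone calculation modelled on rte_ring_get_memsize_elem() from patch 2/6. The helper name and the SKETCH_* constants are assumptions for illustration only; in particular SKETCH_HDR_SIZE is a stand-in for sizeof(struct rte_ring), and unlike the real function this sketch also rejects a count of 0.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>	/* ssize_t */

/* Illustrative stand-ins, not DPDK definitions. */
#define SKETCH_CACHE_LINE 64u
#define SKETCH_HDR_SIZE   128u	/* placeholder for sizeof(struct rte_ring) */
#define SKETCH_SZ_MASK    0x7fffffffu

/*
 * Hypothetical helper: bytes needed for a ring of 'count' elements of
 * 'esize' bytes each, rounded up to a cache line.
 */
static ssize_t
ring_memsize_sketch(unsigned int esize, unsigned int count)
{
	size_t sz;

	/* Element size must be a multiple of 4 bytes. */
	if (esize % 4 != 0)
		return -EINVAL;

	/* Count must be a power of 2 within the size limit. */
	if (count == 0 || (count & (count - 1)) != 0 ||
			count > SKETCH_SZ_MASK)
		return -EINVAL;

	sz = SKETCH_HDR_SIZE + (size_t)count * esize;
	sz = (sz + SKETCH_CACHE_LINE - 1) & ~(size_t)(SKETCH_CACHE_LINE - 1);
	return (ssize_t)sz;
}
```

Under these assumptions a 1024-entry ring of 4B elements needs 4224 bytes against 8320 bytes for 8B elements, which is the kind of saving the hash library change in this series is after.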

The aim is to achieve the same performance as the existing ring
implementation.

v8
 - Changed the 128b copy elements inline function to use 'memcpy'
   to generate unaligned load/store instructions for x86. Generic
   copy function results in performance drop. (Konstantin)
 - Changed the API type #defines to be more clear (Konstantin)
 - Removed the code duplication in performance tests (Konstantin)
 - Fixed memory leak, changed test macros to inline functions (Konstantin)
 - Changed functional tests to test for 20B ring element. Fixed
   a bug in 32b element copy code for enqueue/dequeue (ring size
   needs to be normalized for 32b).
 - Squashed the functional and performance tests in their
   respective single commits.

v7
 - Merged the test cases to test both legacy APIs and
   rte_ring_xxx_elem APIs without code duplication (Konstantin, Olivier)
 - Performance test cases are merged as well (Konstantin, Olivier)
 - Macros to copy elements are converted into inline functions (Olivier)
 - Added back the changes to hash and event libraries

v6
 - Labelled as RFC to better indicate the status
 - Added unit tests to test the rte_ring_xxx_elem APIs
 - Corrected 'macro based partial memcpy' (5/6) patch
 - Added Konstantin's method after correction (6/6)
 - Check Patch shows significant warnings and errors mainly due to
   copying code from existing test cases. None of them are harmful.
   I will fix them once we have an agreement.

v5
 - Use memcpy for chunks of 32B (Konstantin).
 - Both 'ring_perf_autotest' and 'ring_perf_elem_autotest' are available
   to compare the results easily.
 - Copying without memcpy is also available in 1/3, if anyone wants to
   experiment on their platform.
 - Added other platform owners to test on their respective platforms.

v4
 - Few fixes after more performance testing

v3
 - Removed macro-fest and used inline functions
   (Stephen, Bruce)

v2
 - Change Event Ring implementation to use ring templates
   (Jerin, Pavan)

Honnappa Nagarahalli (6):
  test/ring: use division for cycle count calculation
  lib/ring: apis to support configurable element size
  test/ring: add functional tests for rte_ring_xxx_elem APIs
  test/ring: modify perf test cases to use rte_ring_xxx_elem APIs
  lib/hash: use ring with 32b element size to save memory
  lib/eventdev: use custom element size ring for event rings

 app/test/test_ring.c                 | 1244 +++++++++++---------------
 app/test/test_ring.h                 |  187 ++++
 app/test/test_ring_perf.c            |  452 ++++++----
 lib/librte_eventdev/rte_event_ring.c |  147 +--
 lib/librte_eventdev/rte_event_ring.h |   45 +-
 lib/librte_hash/rte_cuckoo_hash.c    |   97 +-
 lib/librte_hash/rte_cuckoo_hash.h    |    2 +-
 lib/librte_ring/Makefile             |    3 +-
 lib/librte_ring/meson.build          |    4 +
 lib/librte_ring/rte_ring.c           |   41 +-
 lib/librte_ring/rte_ring.h           |    1 +
 lib/librte_ring/rte_ring_elem.h      | 1003 +++++++++++++++++++++
 lib/librte_ring/rte_ring_version.map |    2 +
 13 files changed, 2115 insertions(+), 1113 deletions(-)
 create mode 100644 app/test/test_ring.h
 create mode 100644 lib/librte_ring/rte_ring_elem.h

-- 
2.17.1



* [dpdk-dev] [PATCH v8 1/6] test/ring: use division for cycle count calculation
  2020-01-13 17:25   ` [dpdk-dev] [PATCH v8 0/6] lib/ring: APIs to support custom element size Honnappa Nagarahalli
@ 2020-01-13 17:25     ` Honnappa Nagarahalli
  2020-01-13 17:25     ` [dpdk-dev] [PATCH v8 2/6] lib/ring: apis to support configurable element size Honnappa Nagarahalli
                       ` (4 subsequent siblings)
  5 siblings, 0 replies; 173+ messages in thread
From: Honnappa Nagarahalli @ 2020-01-13 17:25 UTC (permalink / raw)
  To: olivier.matz, sthemmin, jerinj, bruce.richardson, david.marchand,
	pbhagavatula, konstantin.ananyev, honnappa.nagarahalli
  Cc: dev, dharmik.thakkar, ruifeng.wang, gavin.hu, nd

Use floating-point division instead of a right shift by iter_shift to
calculate a more accurate cycle count.
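
A small standalone sketch of the difference, assuming (as the diff below suggests) that the old code shifted the cycle delta right by iter_shift and the new code divides it by iterations as a double. The two helper names are hypothetical.

```c
#include <stdint.h>

/* Old style: integer shift, drops the fractional cycles. */
static uint64_t
avg_cycles_shift(uint64_t cycles, unsigned int iter_shift)
{
	return cycles >> iter_shift;
}

/* New style: floating-point division keeps the fraction. */
static double
avg_cycles_div(uint64_t cycles, unsigned int iterations)
{
	return (double)cycles / iterations;
}
```

For 1000 cycles over 256 iterations (iter_shift of 8) the shift reports 3 cycles per iteration while the division reports 3.90625, so sub-cycle differences between ring variants stay visible.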

Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
---
 app/test/test_ring_perf.c | 22 ++++++++++++----------
 1 file changed, 12 insertions(+), 10 deletions(-)

diff --git a/app/test/test_ring_perf.c b/app/test/test_ring_perf.c
index 70ee46ffe..6c2aca483 100644
--- a/app/test/test_ring_perf.c
+++ b/app/test/test_ring_perf.c
@@ -357,10 +357,10 @@ test_single_enqueue_dequeue(struct rte_ring *r)
 	}
 	const uint64_t mc_end = rte_rdtsc();
 
-	printf("SP/SC single enq/dequeue: %"PRIu64"\n",
-			(sc_end-sc_start) >> iter_shift);
-	printf("MP/MC single enq/dequeue: %"PRIu64"\n",
-			(mc_end-mc_start) >> iter_shift);
+	printf("SP/SC single enq/dequeue: %.2F\n",
+			((double)(sc_end-sc_start)) / iterations);
+	printf("MP/MC single enq/dequeue: %.2F\n",
+			((double)(mc_end-mc_start)) / iterations);
 }
 
 /*
@@ -395,13 +395,15 @@ test_burst_enqueue_dequeue(struct rte_ring *r)
 		}
 		const uint64_t mc_end = rte_rdtsc();
 
-		uint64_t mc_avg = ((mc_end-mc_start) >> iter_shift) / bulk_sizes[sz];
-		uint64_t sc_avg = ((sc_end-sc_start) >> iter_shift) / bulk_sizes[sz];
+		double mc_avg = ((double)(mc_end-mc_start) / iterations) /
+					bulk_sizes[sz];
+		double sc_avg = ((double)(sc_end-sc_start) / iterations) /
+					bulk_sizes[sz];
 
-		printf("SP/SC burst enq/dequeue (size: %u): %"PRIu64"\n", bulk_sizes[sz],
-				sc_avg);
-		printf("MP/MC burst enq/dequeue (size: %u): %"PRIu64"\n", bulk_sizes[sz],
-				mc_avg);
+		printf("SP/SC burst enq/dequeue (size: %u): %.2F\n",
+				bulk_sizes[sz], sc_avg);
+		printf("MP/MC burst enq/dequeue (size: %u): %.2F\n",
+				bulk_sizes[sz], mc_avg);
 	}
 }
 
-- 
2.17.1



* [dpdk-dev] [PATCH v8 2/6] lib/ring: apis to support configurable element size
  2020-01-13 17:25   ` [dpdk-dev] [PATCH v8 0/6] lib/ring: APIs to support custom element size Honnappa Nagarahalli
  2020-01-13 17:25     ` [dpdk-dev] [PATCH v8 1/6] test/ring: use division for cycle count calculation Honnappa Nagarahalli
@ 2020-01-13 17:25     ` Honnappa Nagarahalli
  2020-01-13 17:25     ` [dpdk-dev] [PATCH v8 3/6] test/ring: add functional tests for rte_ring_xxx_elem APIs Honnappa Nagarahalli
                       ` (3 subsequent siblings)
  5 siblings, 0 replies; 173+ messages in thread
From: Honnappa Nagarahalli @ 2020-01-13 17:25 UTC (permalink / raw)
  To: olivier.matz, sthemmin, jerinj, bruce.richardson, david.marchand,
	pbhagavatula, konstantin.ananyev, honnappa.nagarahalli
  Cc: dev, dharmik.thakkar, ruifeng.wang, gavin.hu, nd

Current APIs assume ring elements to be pointers. However, in many
use cases, the size can be different. Add new APIs to support
configurable ring element sizes.

Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Dharmik Thakkar <dharmik.thakkar@arm.com>
Reviewed-by: Gavin Hu <gavin.hu@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
---
 lib/librte_ring/Makefile             |    3 +-
 lib/librte_ring/meson.build          |    4 +
 lib/librte_ring/rte_ring.c           |   41 +-
 lib/librte_ring/rte_ring.h           |    1 +
 lib/librte_ring/rte_ring_elem.h      | 1003 ++++++++++++++++++++++++++
 lib/librte_ring/rte_ring_version.map |    2 +
 6 files changed, 1045 insertions(+), 9 deletions(-)
 create mode 100644 lib/librte_ring/rte_ring_elem.h

diff --git a/lib/librte_ring/Makefile b/lib/librte_ring/Makefile
index 22454b084..917c560ad 100644
--- a/lib/librte_ring/Makefile
+++ b/lib/librte_ring/Makefile
@@ -6,7 +6,7 @@ include $(RTE_SDK)/mk/rte.vars.mk
 # library name
 LIB = librte_ring.a
 
-CFLAGS += $(WERROR_FLAGS) -I$(SRCDIR) -O3
+CFLAGS += $(WERROR_FLAGS) -I$(SRCDIR) -O3 -DALLOW_EXPERIMENTAL_API
 LDLIBS += -lrte_eal
 
 EXPORT_MAP := rte_ring_version.map
@@ -16,6 +16,7 @@ SRCS-$(CONFIG_RTE_LIBRTE_RING) := rte_ring.c
 
 # install includes
 SYMLINK-$(CONFIG_RTE_LIBRTE_RING)-include := rte_ring.h \
+					rte_ring_elem.h \
 					rte_ring_generic.h \
 					rte_ring_c11_mem.h
 
diff --git a/lib/librte_ring/meson.build b/lib/librte_ring/meson.build
index ca8a435e9..f2f3ccc88 100644
--- a/lib/librte_ring/meson.build
+++ b/lib/librte_ring/meson.build
@@ -3,5 +3,9 @@
 
 sources = files('rte_ring.c')
 headers = files('rte_ring.h',
+		'rte_ring_elem.h',
 		'rte_ring_c11_mem.h',
 		'rte_ring_generic.h')
+
+# rte_ring_create_elem and rte_ring_get_memsize_elem are experimental
+allow_experimental_apis = true
diff --git a/lib/librte_ring/rte_ring.c b/lib/librte_ring/rte_ring.c
index d9b308036..3e15dc398 100644
--- a/lib/librte_ring/rte_ring.c
+++ b/lib/librte_ring/rte_ring.c
@@ -33,6 +33,7 @@
 #include <rte_tailq.h>
 
 #include "rte_ring.h"
+#include "rte_ring_elem.h"
 
 TAILQ_HEAD(rte_ring_list, rte_tailq_entry);
 
@@ -46,23 +47,38 @@ EAL_REGISTER_TAILQ(rte_ring_tailq)
 
 /* return the size of memory occupied by a ring */
 ssize_t
-rte_ring_get_memsize(unsigned count)
+rte_ring_get_memsize_elem(unsigned int esize, unsigned int count)
 {
 	ssize_t sz;
 
+	/* Check if element size is a multiple of 4B */
+	if (esize % 4 != 0) {
+		RTE_LOG(ERR, RING, "element size is not a multiple of 4\n");
+
+		return -EINVAL;
+	}
+
 	/* count must be a power of 2 */
 	if ((!POWEROF2(count)) || (count > RTE_RING_SZ_MASK )) {
 		RTE_LOG(ERR, RING,
-			"Requested size is invalid, must be power of 2, and "
-			"do not exceed the size limit %u\n", RTE_RING_SZ_MASK);
+			"Requested number of elements is invalid, must be power of 2, and not exceed %u\n",
+			RTE_RING_SZ_MASK);
+
 		return -EINVAL;
 	}
 
-	sz = sizeof(struct rte_ring) + count * sizeof(void *);
+	sz = sizeof(struct rte_ring) + count * esize;
 	sz = RTE_ALIGN(sz, RTE_CACHE_LINE_SIZE);
 	return sz;
 }
 
+/* return the size of memory occupied by a ring */
+ssize_t
+rte_ring_get_memsize(unsigned count)
+{
+	return rte_ring_get_memsize_elem(sizeof(void *), count);
+}
+
 void
 rte_ring_reset(struct rte_ring *r)
 {
@@ -114,10 +130,10 @@ rte_ring_init(struct rte_ring *r, const char *name, unsigned count,
 	return 0;
 }
 
-/* create the ring */
+/* create the ring for a given element size */
 struct rte_ring *
-rte_ring_create(const char *name, unsigned count, int socket_id,
-		unsigned flags)
+rte_ring_create_elem(const char *name, unsigned int esize, unsigned int count,
+		int socket_id, unsigned int flags)
 {
 	char mz_name[RTE_MEMZONE_NAMESIZE];
 	struct rte_ring *r;
@@ -135,7 +151,7 @@ rte_ring_create(const char *name, unsigned count, int socket_id,
 	if (flags & RING_F_EXACT_SZ)
 		count = rte_align32pow2(count + 1);
 
-	ring_size = rte_ring_get_memsize(count);
+	ring_size = rte_ring_get_memsize_elem(esize, count);
 	if (ring_size < 0) {
 		rte_errno = ring_size;
 		return NULL;
@@ -182,6 +198,15 @@ rte_ring_create(const char *name, unsigned count, int socket_id,
 	return r;
 }
 
+/* create the ring */
+struct rte_ring *
+rte_ring_create(const char *name, unsigned count, int socket_id,
+		unsigned flags)
+{
+	return rte_ring_create_elem(name, sizeof(void *), count, socket_id,
+		flags);
+}
+
 /* free the ring */
 void
 rte_ring_free(struct rte_ring *r)
diff --git a/lib/librte_ring/rte_ring.h b/lib/librte_ring/rte_ring.h
index 2a9f768a1..18fc5d845 100644
--- a/lib/librte_ring/rte_ring.h
+++ b/lib/librte_ring/rte_ring.h
@@ -216,6 +216,7 @@ int rte_ring_init(struct rte_ring *r, const char *name, unsigned count,
  */
 struct rte_ring *rte_ring_create(const char *name, unsigned count,
 				 int socket_id, unsigned flags);
+
 /**
  * De-allocate all memory used by the ring.
  *
diff --git a/lib/librte_ring/rte_ring_elem.h b/lib/librte_ring/rte_ring_elem.h
new file mode 100644
index 000000000..15d79bf2a
--- /dev/null
+++ b/lib/librte_ring/rte_ring_elem.h
@@ -0,0 +1,1003 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ *
+ * Copyright (c) 2019 Arm Limited
+ * Copyright (c) 2010-2017 Intel Corporation
+ * Copyright (c) 2007-2009 Kip Macy kmacy@freebsd.org
+ * All rights reserved.
+ * Derived from FreeBSD's bufring.h
+ * Used as BSD-3 Licensed with permission from Kip Macy.
+ */
+
+#ifndef _RTE_RING_ELEM_H_
+#define _RTE_RING_ELEM_H_
+
+/**
+ * @file
+ * RTE Ring with user defined element size
+ */
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#include <stdio.h>
+#include <stdint.h>
+#include <string.h>
+#include <sys/queue.h>
+#include <errno.h>
+#include <rte_common.h>
+#include <rte_config.h>
+#include <rte_memory.h>
+#include <rte_lcore.h>
+#include <rte_atomic.h>
+#include <rte_branch_prediction.h>
+#include <rte_memzone.h>
+#include <rte_pause.h>
+
+#include "rte_ring.h"
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * Calculate the memory size needed for a ring with given element size
+ *
+ * This function returns the number of bytes needed for a ring, given
+ * the number of elements in it and the size of the element. This value
+ * is the sum of the size of the structure rte_ring and the size of the
+ * memory needed for storing the elements. The value is aligned to a cache
+ * line size.
+ *
+ * @param esize
+ *   The size of ring element, in bytes. It must be a multiple of 4.
+ * @param count
+ *   The number of elements in the ring (must be a power of 2).
+ * @return
+ *   - The memory size needed for the ring on success.
+ *   - -EINVAL - esize is not a multiple of 4 or count provided is not a
+ *		 power of 2.
+ */
+__rte_experimental
+ssize_t rte_ring_get_memsize_elem(unsigned int esize, unsigned int count);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * Create a new ring named *name* that stores elements with given size.
+ *
+ * This function uses ``memzone_reserve()`` to allocate memory. Then it
+ * calls rte_ring_init() to initialize an empty ring.
+ *
+ * The new ring size is set to *count*, which must be a power of
+ * two. Water marking is disabled by default. The real usable ring size
+ * is *count-1* instead of *count* to differentiate a full ring from an
+ * empty ring.
+ *
+ * The ring is added in RTE_TAILQ_RING list.
+ *
+ * @param name
+ *   The name of the ring.
+ * @param esize
+ *   The size of ring element, in bytes. It must be a multiple of 4.
+ * @param count
+ *   The number of elements in the ring (must be a power of 2).
+ * @param socket_id
+ *   The *socket_id* argument is the socket identifier in case of
+ *   NUMA. The value can be *SOCKET_ID_ANY* if there is no NUMA
+ *   constraint for the reserved zone.
+ * @param flags
+ *   An OR of the following:
+ *    - RING_F_SP_ENQ: If this flag is set, the default behavior when
+ *      using ``rte_ring_enqueue()`` or ``rte_ring_enqueue_bulk()``
+ *      is "single-producer". Otherwise, it is "multi-producers".
+ *    - RING_F_SC_DEQ: If this flag is set, the default behavior when
+ *      using ``rte_ring_dequeue()`` or ``rte_ring_dequeue_bulk()``
+ *      is "single-consumer". Otherwise, it is "multi-consumers".
+ * @return
+ *   On success, the pointer to the new allocated ring. NULL on error with
+ *    rte_errno set appropriately. Possible errno values include:
+ *    - E_RTE_NO_CONFIG - function could not get pointer to rte_config structure
+ *    - E_RTE_SECONDARY - function was called from a secondary process instance
+ *    - EINVAL - esize is not a multiple of 4 or count provided is not a
+ *		 power of 2.
+ *    - ENOSPC - the maximum number of memzones has already been allocated
+ *    - EEXIST - a memzone with the same name already exists
+ *    - ENOMEM - no appropriate memory area found in which to create memzone
+ */
+__rte_experimental
+struct rte_ring *rte_ring_create_elem(const char *name, unsigned int esize,
+			unsigned int count, int socket_id, unsigned int flags);
+
+static __rte_always_inline void
+enqueue_elems_32(struct rte_ring *r, const uint32_t size, uint32_t idx,
+		const void *obj_table, uint32_t n)
+{
+	unsigned int i;
+	uint32_t *ring = (uint32_t *)&r[1];
+	const uint32_t *obj = (const uint32_t *)obj_table;
+	if (likely(idx + n < size)) {
+		for (i = 0; i < (n & ~0x7); i += 8, idx += 8) {
+			ring[idx] = obj[i];
+			ring[idx + 1] = obj[i + 1];
+			ring[idx + 2] = obj[i + 2];
+			ring[idx + 3] = obj[i + 3];
+			ring[idx + 4] = obj[i + 4];
+			ring[idx + 5] = obj[i + 5];
+			ring[idx + 6] = obj[i + 6];
+			ring[idx + 7] = obj[i + 7];
+		}
+		switch (n & 0x7) {
+		case 7:
+			ring[idx++] = obj[i++]; /* fallthrough */
+		case 6:
+			ring[idx++] = obj[i++]; /* fallthrough */
+		case 5:
+			ring[idx++] = obj[i++]; /* fallthrough */
+		case 4:
+			ring[idx++] = obj[i++]; /* fallthrough */
+		case 3:
+			ring[idx++] = obj[i++]; /* fallthrough */
+		case 2:
+			ring[idx++] = obj[i++]; /* fallthrough */
+		case 1:
+			ring[idx++] = obj[i++]; /* fallthrough */
+		}
+	} else {
+		for (i = 0; idx < size; i++, idx++)
+			ring[idx] = obj[i];
+		/* Start at the beginning */
+		for (idx = 0; i < n; i++, idx++)
+			ring[idx] = obj[i];
+	}
+}
+
+static __rte_always_inline void
+enqueue_elems_64(struct rte_ring *r, uint32_t prod_head,
+		const void *obj_table, uint32_t n)
+{
+	unsigned int i;
+	const uint32_t size = r->size;
+	uint32_t idx = prod_head & r->mask;
+	uint64_t *ring = (uint64_t *)&r[1];
+	const uint64_t *obj = (const uint64_t *)obj_table;
+	if (likely(idx + n < size)) {
+		for (i = 0; i < (n & ~0x3); i += 4, idx += 4) {
+			ring[idx] = obj[i];
+			ring[idx + 1] = obj[i + 1];
+			ring[idx + 2] = obj[i + 2];
+			ring[idx + 3] = obj[i + 3];
+		}
+		switch (n & 0x3) {
+		case 3:
+			ring[idx++] = obj[i++]; /* fallthrough */
+		case 2:
+			ring[idx++] = obj[i++]; /* fallthrough */
+		case 1:
+			ring[idx++] = obj[i++];
+		}
+	} else {
+		for (i = 0; idx < size; i++, idx++)
+			ring[idx] = obj[i];
+		/* Start at the beginning */
+		for (idx = 0; i < n; i++, idx++)
+			ring[idx] = obj[i];
+	}
+}
+
+static __rte_always_inline void
+enqueue_elems_128(struct rte_ring *r, uint32_t prod_head,
+		const void *obj_table, uint32_t n)
+{
+	unsigned int i;
+	const uint32_t size = r->size;
+	uint32_t idx = prod_head & r->mask;
+	rte_int128_t *ring = (rte_int128_t *)&r[1];
+	const rte_int128_t *obj = (const rte_int128_t *)obj_table;
+	if (likely(idx + n < size)) {
+		for (i = 0; i < (n & ~0x1); i += 2, idx += 2)
+			memcpy((void *)(ring + idx),
+				(const void *)(obj + i), 32);
+		switch (n & 0x1) {
+		case 1:
+			memcpy((void *)(ring + idx),
+				(const void *)(obj + i), 16);
+		}
+	} else {
+		for (i = 0; idx < size; i++, idx++)
+			memcpy((void *)(ring + idx),
+				(const void *)(obj + i), 16);
+		/* Start at the beginning */
+		for (idx = 0; i < n; i++, idx++)
+			memcpy((void *)(ring + idx),
+				(const void *)(obj + i), 16);
+	}
+}
+
+/* the actual enqueue of elements on the ring.
+ * Placed here since identical code needed in both
+ * single and multi producer enqueue functions.
+ */
+static __rte_always_inline void
+enqueue_elems(struct rte_ring *r, uint32_t prod_head, const void *obj_table,
+		uint32_t esize, uint32_t num)
+{
+	/* 8B and 16B copies implemented individually to retain
+	 * the current performance.
+	 */
+	if (esize == 8)
+		enqueue_elems_64(r, prod_head, obj_table, num);
+	else if (esize == 16)
+		enqueue_elems_128(r, prod_head, obj_table, num);
+	else {
+		uint32_t idx, scale, nr_idx, nr_num, nr_size;
+
+		/* Normalize to uint32_t */
+		scale = esize / sizeof(uint32_t);
+		nr_num = num * scale;
+		idx = prod_head & r->mask;
+		nr_idx = idx * scale;
+		nr_size = r->size * scale;
+		enqueue_elems_32(r, nr_size, nr_idx, obj_table, nr_num);
+	}
+}
+
+static __rte_always_inline void
+dequeue_elems_32(struct rte_ring *r, const uint32_t size, uint32_t idx,
+		void *obj_table, uint32_t n)
+{
+	unsigned int i;
+	uint32_t *ring = (uint32_t *)&r[1];
+	uint32_t *obj = (uint32_t *)obj_table;
+	if (likely(idx + n < size)) {
+		for (i = 0; i < (n & ~0x7); i += 8, idx += 8) {
+			obj[i] = ring[idx];
+			obj[i + 1] = ring[idx + 1];
+			obj[i + 2] = ring[idx + 2];
+			obj[i + 3] = ring[idx + 3];
+			obj[i + 4] = ring[idx + 4];
+			obj[i + 5] = ring[idx + 5];
+			obj[i + 6] = ring[idx + 6];
+			obj[i + 7] = ring[idx + 7];
+		}
+		switch (n & 0x7) {
+		case 7:
+			obj[i++] = ring[idx++]; /* fallthrough */
+		case 6:
+			obj[i++] = ring[idx++]; /* fallthrough */
+		case 5:
+			obj[i++] = ring[idx++]; /* fallthrough */
+		case 4:
+			obj[i++] = ring[idx++]; /* fallthrough */
+		case 3:
+			obj[i++] = ring[idx++]; /* fallthrough */
+		case 2:
+			obj[i++] = ring[idx++]; /* fallthrough */
+		case 1:
+			obj[i++] = ring[idx++]; /* fallthrough */
+		}
+	} else {
+		for (i = 0; idx < size; i++, idx++)
+			obj[i] = ring[idx];
+		/* Start at the beginning */
+		for (idx = 0; i < n; i++, idx++)
+			obj[i] = ring[idx];
+	}
+}
+
+static __rte_always_inline void
+dequeue_elems_64(struct rte_ring *r, uint32_t cons_head,
+		void *obj_table, uint32_t n)
+{
+	unsigned int i;
+	const uint32_t size = r->size;
+	uint32_t idx = cons_head & r->mask;
+	uint64_t *ring = (uint64_t *)&r[1];
+	uint64_t *obj = (uint64_t *)obj_table;
+	if (likely(idx + n < size)) {
+		for (i = 0; i < (n & ~0x3); i += 4, idx += 4) {
+			obj[i] = ring[idx];
+			obj[i + 1] = ring[idx + 1];
+			obj[i + 2] = ring[idx + 2];
+			obj[i + 3] = ring[idx + 3];
+		}
+		switch (n & 0x3) {
+		case 3:
+			obj[i++] = ring[idx++]; /* fallthrough */
+		case 2:
+			obj[i++] = ring[idx++]; /* fallthrough */
+		case 1:
+			obj[i++] = ring[idx++]; /* fallthrough */
+		}
+	} else {
+		for (i = 0; idx < size; i++, idx++)
+			obj[i] = ring[idx];
+		/* Start at the beginning */
+		for (idx = 0; i < n; i++, idx++)
+			obj[i] = ring[idx];
+	}
+}
+
+static __rte_always_inline void
+dequeue_elems_128(struct rte_ring *r, uint32_t cons_head,
+		void *obj_table, uint32_t n)
+{
+	unsigned int i;
+	const uint32_t size = r->size;
+	uint32_t idx = cons_head & r->mask;
+	rte_int128_t *ring = (rte_int128_t *)&r[1];
+	rte_int128_t *obj = (rte_int128_t *)obj_table;
+	if (likely(idx + n < size)) {
+		for (i = 0; i < (n & ~0x1); i += 2, idx += 2)
+			memcpy((void *)(obj + i), (void *)(ring + idx), 32);
+		switch (n & 0x1) {
+		case 1:
+			memcpy((void *)(obj + i), (void *)(ring + idx), 16);
+		}
+	} else {
+		for (i = 0; idx < size; i++, idx++)
+			memcpy((void *)(obj + i), (void *)(ring + idx), 16);
+		/* Start at the beginning */
+		for (idx = 0; i < n; i++, idx++)
+			memcpy((void *)(obj + i), (void *)(ring + idx), 16);
+	}
+}
+
+/* the actual dequeue of elements from the ring.
+ * Placed here since identical code needed in both
+ * single and multi consumer dequeue functions.
+ */
+static __rte_always_inline void
+dequeue_elems(struct rte_ring *r, uint32_t cons_head, void *obj_table,
+		uint32_t esize, uint32_t num)
+{
+	/* 8B and 16B copies implemented individually to retain
+	 * the current performance.
+	 */
+	if (esize == 8)
+		dequeue_elems_64(r, cons_head, obj_table, num);
+	else if (esize == 16)
+		dequeue_elems_128(r, cons_head, obj_table, num);
+	else {
+		uint32_t idx, scale, nr_idx, nr_num, nr_size;
+
+		/* Normalize to uint32_t */
+		scale = esize / sizeof(uint32_t);
+		nr_num = num * scale;
+		idx = cons_head & r->mask;
+		nr_idx = idx * scale;
+		nr_size = r->size * scale;
+		dequeue_elems_32(r, nr_size, nr_idx, obj_table, nr_num);
+	}
+}
+
+/* Between two loads, the CPU may reorder accesses on weakly ordered
+ * architectures (powerpc/arm).
+ * There are 2 choices for the users:
+ * 1. use the rmb() memory barrier
+ * 2. use one-direction load_acquire/store_release barriers, selected by
+ *    CONFIG_RTE_USE_C11_MEM_MODEL=y
+ * The choice depends on performance test results.
+ * By default, the common functions come from rte_ring_generic.h.
+ */
+#ifdef RTE_USE_C11_MEM_MODEL
+#include "rte_ring_c11_mem.h"
+#else
+#include "rte_ring_generic.h"
+#endif
+
+/**
+ * @internal Enqueue several objects on the ring
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of objects.
+ * @param esize
+ *   The size of ring element, in bytes. It must be a multiple of 4.
+ *   This must be the same value used while creating the ring. Otherwise
+ *   the results are undefined.
+ * @param n
+ *   The number of objects to add in the ring from the obj_table.
+ * @param behavior
+ *   RTE_RING_QUEUE_FIXED:    Enqueue a fixed number of items to the ring
+ *   RTE_RING_QUEUE_VARIABLE: Enqueue as many items as possible to the ring
+ * @param is_sp
+ *   Indicates whether to use single producer or multi-producer head update
+ * @param free_space
+ *   returns the amount of space after the enqueue operation has finished
+ * @return
+ *   Actual number of objects enqueued.
+ *   If behavior == RTE_RING_QUEUE_FIXED, this will be 0 or n only.
+ */
+static __rte_always_inline unsigned int
+__rte_ring_do_enqueue_elem(struct rte_ring *r, const void *obj_table,
+		unsigned int esize, unsigned int n,
+		enum rte_ring_queue_behavior behavior, unsigned int is_sp,
+		unsigned int *free_space)
+{
+	uint32_t prod_head, prod_next;
+	uint32_t free_entries;
+
+	n = __rte_ring_move_prod_head(r, is_sp, n, behavior,
+			&prod_head, &prod_next, &free_entries);
+	if (n == 0)
+		goto end;
+
+	enqueue_elems(r, prod_head, obj_table, esize, n);
+
+	update_tail(&r->prod, prod_head, prod_next, is_sp, 1);
+end:
+	if (free_space != NULL)
+		*free_space = free_entries - n;
+	return n;
+}
+
+/**
+ * @internal Dequeue several objects from the ring
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of objects that will be filled.
+ * @param esize
+ *   The size of ring element, in bytes. It must be a multiple of 4.
+ *   This must be the same value used while creating the ring. Otherwise
+ *   the results are undefined.
+ * @param n
+ *   The number of objects to pull from the ring.
+ * @param behavior
+ *   RTE_RING_QUEUE_FIXED:    Dequeue a fixed number of items from a ring
+ *   RTE_RING_QUEUE_VARIABLE: Dequeue as many items as possible from ring
+ * @param is_sc
+ *   Indicates whether to use single consumer or multi-consumer head update
+ * @param available
+ *   returns the number of remaining ring entries after the dequeue has finished
+ * @return
+ *   - Actual number of objects dequeued.
+ *     If behavior == RTE_RING_QUEUE_FIXED, this will be 0 or n only.
+ */
+static __rte_always_inline unsigned int
+__rte_ring_do_dequeue_elem(struct rte_ring *r, void *obj_table,
+		unsigned int esize, unsigned int n,
+		enum rte_ring_queue_behavior behavior, unsigned int is_sc,
+		unsigned int *available)
+{
+	uint32_t cons_head, cons_next;
+	uint32_t entries;
+
+	n = __rte_ring_move_cons_head(r, (int)is_sc, n, behavior,
+			&cons_head, &cons_next, &entries);
+	if (n == 0)
+		goto end;
+
+	dequeue_elems(r, cons_head, obj_table, esize, n);
+
+	update_tail(&r->cons, cons_head, cons_next, is_sc, 0);
+
+end:
+	if (available != NULL)
+		*available = entries - n;
+	return n;
+}
+
+/**
+ * Enqueue several objects on the ring (multi-producers safe).
+ *
+ * This function uses a "compare and set" instruction to move the
+ * producer index atomically.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of objects.
+ * @param esize
+ *   The size of ring element, in bytes. It must be a multiple of 4.
+ *   This must be the same value used while creating the ring. Otherwise
+ *   the results are undefined.
+ * @param n
+ *   The number of objects to add in the ring from the obj_table.
+ * @param free_space
+ *   if non-NULL, returns the amount of space in the ring after the
+ *   enqueue operation has finished.
+ * @return
+ *   The number of objects enqueued, either 0 or n
+ */
+static __rte_always_inline unsigned int
+rte_ring_mp_enqueue_bulk_elem(struct rte_ring *r, const void *obj_table,
+		unsigned int esize, unsigned int n, unsigned int *free_space)
+{
+	return __rte_ring_do_enqueue_elem(r, obj_table, esize, n,
+			RTE_RING_QUEUE_FIXED, __IS_MP, free_space);
+}
+
+/**
+ * Enqueue several objects on a ring
+ *
+ * @warning This API is NOT multi-producers safe
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of objects.
+ * @param esize
+ *   The size of ring element, in bytes. It must be a multiple of 4.
+ *   This must be the same value used while creating the ring. Otherwise
+ *   the results are undefined.
+ * @param n
+ *   The number of objects to add in the ring from the obj_table.
+ * @param free_space
+ *   if non-NULL, returns the amount of space in the ring after the
+ *   enqueue operation has finished.
+ * @return
+ *   The number of objects enqueued, either 0 or n
+ */
+static __rte_always_inline unsigned int
+rte_ring_sp_enqueue_bulk_elem(struct rte_ring *r, const void *obj_table,
+		unsigned int esize, unsigned int n, unsigned int *free_space)
+{
+	return __rte_ring_do_enqueue_elem(r, obj_table, esize, n,
+			RTE_RING_QUEUE_FIXED, __IS_SP, free_space);
+}
+
+/**
+ * Enqueue several objects on a ring.
+ *
+ * This function calls the multi-producer or the single-producer
+ * version depending on the default behavior that was specified at
+ * ring creation time (see flags).
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of objects.
+ * @param esize
+ *   The size of ring element, in bytes. It must be a multiple of 4.
+ *   This must be the same value used while creating the ring. Otherwise
+ *   the results are undefined.
+ * @param n
+ *   The number of objects to add in the ring from the obj_table.
+ * @param free_space
+ *   if non-NULL, returns the amount of space in the ring after the
+ *   enqueue operation has finished.
+ * @return
+ *   The number of objects enqueued, either 0 or n
+ */
+static __rte_always_inline unsigned int
+rte_ring_enqueue_bulk_elem(struct rte_ring *r, const void *obj_table,
+		unsigned int esize, unsigned int n, unsigned int *free_space)
+{
+	return __rte_ring_do_enqueue_elem(r, obj_table, esize, n,
+			RTE_RING_QUEUE_FIXED, r->prod.single, free_space);
+}
+
+/**
+ * Enqueue one object on a ring (multi-producers safe).
+ *
+ * This function uses a "compare and set" instruction to move the
+ * producer index atomically.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj
+ *   A pointer to the object to be added.
+ * @param esize
+ *   The size of ring element, in bytes. It must be a multiple of 4.
+ *   This must be the same value used while creating the ring. Otherwise
+ *   the results are undefined.
+ * @return
+ *   - 0: Success; objects enqueued.
+ *   - -ENOBUFS: Not enough room in the ring to enqueue; no object is enqueued.
+ */
+static __rte_always_inline int
+rte_ring_mp_enqueue_elem(struct rte_ring *r, void *obj, unsigned int esize)
+{
+	return rte_ring_mp_enqueue_bulk_elem(r, obj, esize, 1, NULL) ? 0 :
+								-ENOBUFS;
+}
+
+/**
+ * Enqueue one object on a ring
+ *
+ * @warning This API is NOT multi-producers safe
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj
+ *   A pointer to the object to be added.
+ * @param esize
+ *   The size of ring element, in bytes. It must be a multiple of 4.
+ *   This must be the same value used while creating the ring. Otherwise
+ *   the results are undefined.
+ * @return
+ *   - 0: Success; objects enqueued.
+ *   - -ENOBUFS: Not enough room in the ring to enqueue; no object is enqueued.
+ */
+static __rte_always_inline int
+rte_ring_sp_enqueue_elem(struct rte_ring *r, void *obj, unsigned int esize)
+{
+	return rte_ring_sp_enqueue_bulk_elem(r, obj, esize, 1, NULL) ? 0 :
+								-ENOBUFS;
+}
+
+/**
+ * Enqueue one object on a ring.
+ *
+ * This function calls the multi-producer or the single-producer
+ * version, depending on the default behaviour that was specified at
+ * ring creation time (see flags).
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj
+ *   A pointer to the object to be added.
+ * @param esize
+ *   The size of ring element, in bytes. It must be a multiple of 4.
+ *   This must be the same value used while creating the ring. Otherwise
+ *   the results are undefined.
+ * @return
+ *   - 0: Success; objects enqueued.
+ *   - -ENOBUFS: Not enough room in the ring to enqueue; no object is enqueued.
+ */
+static __rte_always_inline int
+rte_ring_enqueue_elem(struct rte_ring *r, void *obj, unsigned int esize)
+{
+	return rte_ring_enqueue_bulk_elem(r, obj, esize, 1, NULL) ? 0 :
+								-ENOBUFS;
+}
+
+/**
+ * Dequeue several objects from a ring (multi-consumers safe).
+ *
+ * This function uses a "compare and set" instruction to move the
+ * consumer index atomically.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of objects that will be filled.
+ * @param esize
+ *   The size of ring element, in bytes. It must be a multiple of 4.
+ *   This must be the same value used while creating the ring. Otherwise
+ *   the results are undefined.
+ * @param n
+ *   The number of objects to dequeue from the ring to the obj_table.
+ * @param available
+ *   If non-NULL, returns the number of remaining ring entries after the
+ *   dequeue has finished.
+ * @return
+ *   The number of objects dequeued, either 0 or n
+ */
+static __rte_always_inline unsigned int
+rte_ring_mc_dequeue_bulk_elem(struct rte_ring *r, void *obj_table,
+		unsigned int esize, unsigned int n, unsigned int *available)
+{
+	return __rte_ring_do_dequeue_elem(r, obj_table, esize, n,
+				RTE_RING_QUEUE_FIXED, __IS_MC, available);
+}
+
+/**
+ * Dequeue several objects from a ring (NOT multi-consumers safe).
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of objects that will be filled.
+ * @param esize
+ *   The size of ring element, in bytes. It must be a multiple of 4.
+ *   This must be the same value used while creating the ring. Otherwise
+ *   the results are undefined.
+ * @param n
+ *   The number of objects to dequeue from the ring to the obj_table,
+ *   must be strictly positive.
+ * @param available
+ *   If non-NULL, returns the number of remaining ring entries after the
+ *   dequeue has finished.
+ * @return
+ *   The number of objects dequeued, either 0 or n
+ */
+static __rte_always_inline unsigned int
+rte_ring_sc_dequeue_bulk_elem(struct rte_ring *r, void *obj_table,
+		unsigned int esize, unsigned int n, unsigned int *available)
+{
+	return __rte_ring_do_dequeue_elem(r, obj_table, esize, n,
+			RTE_RING_QUEUE_FIXED, __IS_SC, available);
+}
+
+/**
+ * Dequeue several objects from a ring.
+ *
+ * This function calls the multi-consumers or the single-consumer
+ * version, depending on the default behaviour that was specified at
+ * ring creation time (see flags).
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of objects that will be filled.
+ * @param esize
+ *   The size of ring element, in bytes. It must be a multiple of 4.
+ *   This must be the same value used while creating the ring. Otherwise
+ *   the results are undefined.
+ * @param n
+ *   The number of objects to dequeue from the ring to the obj_table.
+ * @param available
+ *   If non-NULL, returns the number of remaining ring entries after the
+ *   dequeue has finished.
+ * @return
+ *   The number of objects dequeued, either 0 or n
+ */
+static __rte_always_inline unsigned int
+rte_ring_dequeue_bulk_elem(struct rte_ring *r, void *obj_table,
+		unsigned int esize, unsigned int n, unsigned int *available)
+{
+	return __rte_ring_do_dequeue_elem(r, obj_table, esize, n,
+			RTE_RING_QUEUE_FIXED, r->cons.single, available);
+}
+
+/**
+ * Dequeue one object from a ring (multi-consumers safe).
+ *
+ * This function uses a "compare and set" instruction to move the
+ * consumer index atomically.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_p
+ *   A pointer to the object that will be filled.
+ * @param esize
+ *   The size of ring element, in bytes. It must be a multiple of 4.
+ *   This must be the same value used while creating the ring. Otherwise
+ *   the results are undefined.
+ * @return
+ *   - 0: Success; objects dequeued.
+ *   - -ENOENT: Not enough entries in the ring to dequeue; no object is
+ *     dequeued.
+ */
+static __rte_always_inline int
+rte_ring_mc_dequeue_elem(struct rte_ring *r, void *obj_p,
+				unsigned int esize)
+{
+	return rte_ring_mc_dequeue_bulk_elem(r, obj_p, esize, 1, NULL)  ? 0 :
+								-ENOENT;
+}
+
+/**
+ * Dequeue one object from a ring (NOT multi-consumers safe).
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_p
+ *   A pointer to the object that will be filled.
+ * @param esize
+ *   The size of ring element, in bytes. It must be a multiple of 4.
+ *   This must be the same value used while creating the ring. Otherwise
+ *   the results are undefined.
+ * @return
+ *   - 0: Success; objects dequeued.
+ *   - -ENOENT: Not enough entries in the ring to dequeue, no object is
+ *     dequeued.
+ */
+static __rte_always_inline int
+rte_ring_sc_dequeue_elem(struct rte_ring *r, void *obj_p,
+				unsigned int esize)
+{
+	return rte_ring_sc_dequeue_bulk_elem(r, obj_p, esize, 1, NULL) ? 0 :
+								-ENOENT;
+}
+
+/**
+ * Dequeue one object from a ring.
+ *
+ * This function calls the multi-consumers or the single-consumer
+ * version depending on the default behaviour that was specified at
+ * ring creation time (see flags).
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_p
+ *   A pointer to the object that will be filled.
+ * @param esize
+ *   The size of ring element, in bytes. It must be a multiple of 4.
+ *   This must be the same value used while creating the ring. Otherwise
+ *   the results are undefined.
+ * @return
+ *   - 0: Success, objects dequeued.
+ *   - -ENOENT: Not enough entries in the ring to dequeue, no object is
+ *     dequeued.
+ */
+static __rte_always_inline int
+rte_ring_dequeue_elem(struct rte_ring *r, void *obj_p, unsigned int esize)
+{
+	return rte_ring_dequeue_bulk_elem(r, obj_p, esize, 1, NULL) ? 0 :
+								-ENOENT;
+}
+
+/**
+ * Enqueue several objects on the ring (multi-producers safe).
+ *
+ * This function uses a "compare and set" instruction to move the
+ * producer index atomically.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of objects.
+ * @param esize
+ *   The size of ring element, in bytes. It must be a multiple of 4.
+ *   This must be the same value used while creating the ring. Otherwise
+ *   the results are undefined.
+ * @param n
+ *   The number of objects to add in the ring from the obj_table.
+ * @param free_space
+ *   if non-NULL, returns the amount of space in the ring after the
+ *   enqueue operation has finished.
+ * @return
+ *   - n: Actual number of objects enqueued.
+ */
+static __rte_always_inline unsigned
+rte_ring_mp_enqueue_burst_elem(struct rte_ring *r, const void *obj_table,
+		unsigned int esize, unsigned int n, unsigned int *free_space)
+{
+	return __rte_ring_do_enqueue_elem(r, obj_table, esize, n,
+			RTE_RING_QUEUE_VARIABLE, __IS_MP, free_space);
+}
+
+/**
+ * Enqueue several objects on a ring
+ *
+ * @warning This API is NOT multi-producers safe
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of objects.
+ * @param esize
+ *   The size of ring element, in bytes. It must be a multiple of 4.
+ *   This must be the same value used while creating the ring. Otherwise
+ *   the results are undefined.
+ * @param n
+ *   The number of objects to add in the ring from the obj_table.
+ * @param free_space
+ *   if non-NULL, returns the amount of space in the ring after the
+ *   enqueue operation has finished.
+ * @return
+ *   - n: Actual number of objects enqueued.
+ */
+static __rte_always_inline unsigned
+rte_ring_sp_enqueue_burst_elem(struct rte_ring *r, const void *obj_table,
+		unsigned int esize, unsigned int n, unsigned int *free_space)
+{
+	return __rte_ring_do_enqueue_elem(r, obj_table, esize, n,
+			RTE_RING_QUEUE_VARIABLE, __IS_SP, free_space);
+}
+
+/**
+ * Enqueue several objects on a ring.
+ *
+ * This function calls the multi-producer or the single-producer
+ * version depending on the default behavior that was specified at
+ * ring creation time (see flags).
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of objects.
+ * @param esize
+ *   The size of ring element, in bytes. It must be a multiple of 4.
+ *   This must be the same value used while creating the ring. Otherwise
+ *   the results are undefined.
+ * @param n
+ *   The number of objects to add in the ring from the obj_table.
+ * @param free_space
+ *   if non-NULL, returns the amount of space in the ring after the
+ *   enqueue operation has finished.
+ * @return
+ *   - n: Actual number of objects enqueued.
+ */
+static __rte_always_inline unsigned
+rte_ring_enqueue_burst_elem(struct rte_ring *r, const void *obj_table,
+		unsigned int esize, unsigned int n, unsigned int *free_space)
+{
+	return __rte_ring_do_enqueue_elem(r, obj_table, esize, n,
+			RTE_RING_QUEUE_VARIABLE, r->prod.single, free_space);
+}
+
+/**
+ * Dequeue several objects from a ring (multi-consumers safe). When more
+ * objects are requested than are available, only the available objects
+ * are dequeued.
+ *
+ * This function uses a "compare and set" instruction to move the
+ * consumer index atomically.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of objects that will be filled.
+ * @param esize
+ *   The size of ring element, in bytes. It must be a multiple of 4.
+ *   This must be the same value used while creating the ring. Otherwise
+ *   the results are undefined.
+ * @param n
+ *   The number of objects to dequeue from the ring to the obj_table.
+ * @param available
+ *   If non-NULL, returns the number of remaining ring entries after the
+ *   dequeue has finished.
+ * @return
+ *   - n: Actual number of objects dequeued, 0 if ring is empty
+ */
+static __rte_always_inline unsigned
+rte_ring_mc_dequeue_burst_elem(struct rte_ring *r, void *obj_table,
+		unsigned int esize, unsigned int n, unsigned int *available)
+{
+	return __rte_ring_do_dequeue_elem(r, obj_table, esize, n,
+			RTE_RING_QUEUE_VARIABLE, __IS_MC, available);
+}
+
+/**
+ * Dequeue several objects from a ring (NOT multi-consumers safe). When more
+ * objects are requested than are available, only the available objects
+ * are dequeued.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of objects that will be filled.
+ * @param esize
+ *   The size of ring element, in bytes. It must be a multiple of 4.
+ *   This must be the same value used while creating the ring. Otherwise
+ *   the results are undefined.
+ * @param n
+ *   The number of objects to dequeue from the ring to the obj_table.
+ * @param available
+ *   If non-NULL, returns the number of remaining ring entries after the
+ *   dequeue has finished.
+ * @return
+ *   - n: Actual number of objects dequeued, 0 if ring is empty
+ */
+static __rte_always_inline unsigned
+rte_ring_sc_dequeue_burst_elem(struct rte_ring *r, void *obj_table,
+		unsigned int esize, unsigned int n, unsigned int *available)
+{
+	return __rte_ring_do_dequeue_elem(r, obj_table, esize, n,
+			RTE_RING_QUEUE_VARIABLE, __IS_SC, available);
+}
+
+/**
+ * Dequeue multiple objects from a ring up to a maximum number.
+ *
+ * This function calls the multi-consumers or the single-consumer
+ * version, depending on the default behaviour that was specified at
+ * ring creation time (see flags).
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of objects that will be filled.
+ * @param esize
+ *   The size of ring element, in bytes. It must be a multiple of 4.
+ *   This must be the same value used while creating the ring. Otherwise
+ *   the results are undefined.
+ * @param n
+ *   The number of objects to dequeue from the ring to the obj_table.
+ * @param available
+ *   If non-NULL, returns the number of remaining ring entries after the
+ *   dequeue has finished.
+ * @return
+ *   - Number of objects dequeued
+ */
+static __rte_always_inline unsigned int
+rte_ring_dequeue_burst_elem(struct rte_ring *r, void *obj_table,
+		unsigned int esize, unsigned int n, unsigned int *available)
+{
+	return __rte_ring_do_dequeue_elem(r, obj_table, esize, n,
+				RTE_RING_QUEUE_VARIABLE,
+				r->cons.single, available);
+}
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* _RTE_RING_ELEM_H_ */
diff --git a/lib/librte_ring/rte_ring_version.map b/lib/librte_ring/rte_ring_version.map
index 89d84bcf4..7a5328dd5 100644
--- a/lib/librte_ring/rte_ring_version.map
+++ b/lib/librte_ring/rte_ring_version.map
@@ -15,6 +15,8 @@ DPDK_20.0 {
 EXPERIMENTAL {
 	global:
 
+	rte_ring_create_elem;
+	rte_ring_get_memsize_elem;
 	rte_ring_reset;
 
 };
-- 
2.17.1



* [dpdk-dev] [PATCH v8 3/6] test/ring: add functional tests for rte_ring_xxx_elem APIs
  2020-01-13 17:25   ` [dpdk-dev] [PATCH v8 0/6] lib/ring: APIs to support custom element size Honnappa Nagarahalli
  2020-01-13 17:25     ` [dpdk-dev] [PATCH v8 1/6] test/ring: use division for cycle count calculation Honnappa Nagarahalli
  2020-01-13 17:25     ` [dpdk-dev] [PATCH v8 2/6] lib/ring: apis to support configurable element size Honnappa Nagarahalli
@ 2020-01-13 17:25     ` Honnappa Nagarahalli
  2020-01-13 17:25     ` [dpdk-dev] [PATCH v8 4/6] test/ring: modify perf test cases to use " Honnappa Nagarahalli
                       ` (2 subsequent siblings)
  5 siblings, 0 replies; 173+ messages in thread
From: Honnappa Nagarahalli @ 2020-01-13 17:25 UTC (permalink / raw)
  To: olivier.matz, sthemmin, jerinj, bruce.richardson, david.marchand,
	pbhagavatula, konstantin.ananyev, honnappa.nagarahalli
  Cc: dev, dharmik.thakkar, ruifeng.wang, gavin.hu, nd

Add basic infrastructure to test rte_ring_xxx_elem APIs.
Adjust the existing test cases to test for various ring
element sizes.
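
For illustration only (not part of this patch), the element-size handling
that the adjusted tests rely on can be sketched as below. The names
elem_offset() and elem_advance() are hypothetical. The sketch assumes that
an element size of -1 selects the legacy pointer-sized APIs and that any
other size is a positive multiple of 4 bytes, as the rte_ring_xxx_elem
documentation requires.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper, not taken from the patch: byte offset of the n-th
 * element in a flat object table. esize == -1 is assumed to mean the
 * legacy APIs, where each element is one pointer.
 */
static inline size_t
elem_offset(int esize, unsigned int n)
{
	/* Assumption: custom element sizes are positive multiples of 4. */
	assert(esize == -1 || (esize > 0 && esize % 4 == 0));

	if (esize == -1)
		return (size_t)n * sizeof(void *);

	return (size_t)n * (size_t)esize;
}

/* Advance a cursor over the object table by n elements. */
static inline void *
elem_advance(void *obj, int esize, unsigned int n)
{
	return (uint8_t *)obj + elem_offset(esize, n);
}
```

Keeping the stride computation in one place means a test can move its source
and destination cursors the same way for every element size, and can derive
byte lengths for comparisons from the same offsets.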

Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Gavin Hu <gavin.hu@arm.com>
---
 app/test/test_ring.c | 1244 ++++++++++++++++++------------------------
 app/test/test_ring.h |  187 +++++++
 2 files changed, 722 insertions(+), 709 deletions(-)
 create mode 100644 app/test/test_ring.h

diff --git a/app/test/test_ring.c b/app/test/test_ring.c
index aaf1e70ad..649f65d38 100644
--- a/app/test/test_ring.c
+++ b/app/test/test_ring.c
@@ -23,11 +23,13 @@
 #include <rte_branch_prediction.h>
 #include <rte_malloc.h>
 #include <rte_ring.h>
+#include <rte_ring_elem.h>
 #include <rte_random.h>
 #include <rte_errno.h>
 #include <rte_hexdump.h>
 
 #include "test.h"
+#include "test_ring.h"
 
 /*
  * Ring
@@ -55,8 +57,6 @@
 #define RING_SIZE 4096
 #define MAX_BULK 32
 
-static rte_atomic32_t synchro;
-
 #define	TEST_RING_VERIFY(exp)						\
 	if (!(exp)) {							\
 		printf("error at %s:%d\tcondition " #exp " failed\n",	\
@@ -67,795 +67,624 @@ static rte_atomic32_t synchro;
 
 #define	TEST_RING_FULL_EMTPY_ITER	8
 
-/*
- * helper routine for test_ring_basic
- */
-static int
-test_ring_basic_full_empty(struct rte_ring *r, void * const src[], void *dst[])
-{
-	unsigned i, rand;
-	const unsigned rsz = RING_SIZE - 1;
-
-	printf("Basic full/empty test\n");
-
-	for (i = 0; TEST_RING_FULL_EMTPY_ITER != i; i++) {
-
-		/* random shift in the ring */
-		rand = RTE_MAX(rte_rand() % RING_SIZE, 1UL);
-		printf("%s: iteration %u, random shift: %u;\n",
-		    __func__, i, rand);
-		TEST_RING_VERIFY(rte_ring_enqueue_bulk(r, src, rand,
-				NULL) != 0);
-		TEST_RING_VERIFY(rte_ring_dequeue_bulk(r, dst, rand,
-				NULL) == rand);
-
-		/* fill the ring */
-		TEST_RING_VERIFY(rte_ring_enqueue_bulk(r, src, rsz, NULL) != 0);
-		TEST_RING_VERIFY(0 == rte_ring_free_count(r));
-		TEST_RING_VERIFY(rsz == rte_ring_count(r));
-		TEST_RING_VERIFY(rte_ring_full(r));
-		TEST_RING_VERIFY(0 == rte_ring_empty(r));
-
-		/* empty the ring */
-		TEST_RING_VERIFY(rte_ring_dequeue_bulk(r, dst, rsz,
-				NULL) == rsz);
-		TEST_RING_VERIFY(rsz == rte_ring_free_count(r));
-		TEST_RING_VERIFY(0 == rte_ring_count(r));
-		TEST_RING_VERIFY(0 == rte_ring_full(r));
-		TEST_RING_VERIFY(rte_ring_empty(r));
+static int esize[] = {-1, 4, 8, 16, 20};
 
-		/* check data */
-		TEST_RING_VERIFY(0 == memcmp(src, dst, rsz));
-		rte_ring_dump(stdout, r);
-	}
-	return 0;
+static void**
+test_ring_inc_ptr(void **obj, int esize, unsigned int n)
+{
+	/* Legacy queue APIs? */
+	if ((esize) == -1)
+		return ((void **)obj) + n;
+	else
+		return (void **)(((uint32_t *)obj) +
+					(n * esize / sizeof(uint32_t)));
 }
 
-static int
-test_ring_basic(struct rte_ring *r)
+static void
+test_ring_mem_init(void *obj, unsigned int count, int esize)
 {
-	void **src = NULL, **cur_src = NULL, **dst = NULL, **cur_dst = NULL;
-	int ret;
-	unsigned i, num_elems;
-
-	/* alloc dummy object pointers */
-	src = malloc(RING_SIZE*2*sizeof(void *));
-	if (src == NULL)
-		goto fail;
-
-	for (i = 0; i < RING_SIZE*2 ; i++) {
-		src[i] = (void *)(unsigned long)i;
-	}
-	cur_src = src;
-
-	/* alloc some room for copied objects */
-	dst = malloc(RING_SIZE*2*sizeof(void *));
-	if (dst == NULL)
-		goto fail;
-
-	memset(dst, 0, RING_SIZE*2*sizeof(void *));
-	cur_dst = dst;
-
-	printf("enqueue 1 obj\n");
-	ret = rte_ring_sp_enqueue_bulk(r, cur_src, 1, NULL);
-	cur_src += 1;
-	if (ret == 0)
-		goto fail;
-
-	printf("enqueue 2 objs\n");
-	ret = rte_ring_sp_enqueue_bulk(r, cur_src, 2, NULL);
-	cur_src += 2;
-	if (ret == 0)
-		goto fail;
-
-	printf("enqueue MAX_BULK objs\n");
-	ret = rte_ring_sp_enqueue_bulk(r, cur_src, MAX_BULK, NULL);
-	cur_src += MAX_BULK;
-	if (ret == 0)
-		goto fail;
-
-	printf("dequeue 1 obj\n");
-	ret = rte_ring_sc_dequeue_bulk(r, cur_dst, 1, NULL);
-	cur_dst += 1;
-	if (ret == 0)
-		goto fail;
-
-	printf("dequeue 2 objs\n");
-	ret = rte_ring_sc_dequeue_bulk(r, cur_dst, 2, NULL);
-	cur_dst += 2;
-	if (ret == 0)
-		goto fail;
-
-	printf("dequeue MAX_BULK objs\n");
-	ret = rte_ring_sc_dequeue_bulk(r, cur_dst, MAX_BULK, NULL);
-	cur_dst += MAX_BULK;
-	if (ret == 0)
-		goto fail;
-
-	/* check data */
-	if (memcmp(src, dst, cur_dst - dst)) {
-		rte_hexdump(stdout, "src", src, cur_src - src);
-		rte_hexdump(stdout, "dst", dst, cur_dst - dst);
-		printf("data after dequeue is not the same\n");
-		goto fail;
-	}
-	cur_src = src;
-	cur_dst = dst;
-
-	printf("enqueue 1 obj\n");
-	ret = rte_ring_mp_enqueue_bulk(r, cur_src, 1, NULL);
-	cur_src += 1;
-	if (ret == 0)
-		goto fail;
-
-	printf("enqueue 2 objs\n");
-	ret = rte_ring_mp_enqueue_bulk(r, cur_src, 2, NULL);
-	cur_src += 2;
-	if (ret == 0)
-		goto fail;
-
-	printf("enqueue MAX_BULK objs\n");
-	ret = rte_ring_mp_enqueue_bulk(r, cur_src, MAX_BULK, NULL);
-	cur_src += MAX_BULK;
-	if (ret == 0)
-		goto fail;
-
-	printf("dequeue 1 obj\n");
-	ret = rte_ring_mc_dequeue_bulk(r, cur_dst, 1, NULL);
-	cur_dst += 1;
-	if (ret == 0)
-		goto fail;
-
-	printf("dequeue 2 objs\n");
-	ret = rte_ring_mc_dequeue_bulk(r, cur_dst, 2, NULL);
-	cur_dst += 2;
-	if (ret == 0)
-		goto fail;
-
-	printf("dequeue MAX_BULK objs\n");
-	ret = rte_ring_mc_dequeue_bulk(r, cur_dst, MAX_BULK, NULL);
-	cur_dst += MAX_BULK;
-	if (ret == 0)
-		goto fail;
-
-	/* check data */
-	if (memcmp(src, dst, cur_dst - dst)) {
-		rte_hexdump(stdout, "src", src, cur_src - src);
-		rte_hexdump(stdout, "dst", dst, cur_dst - dst);
-		printf("data after dequeue is not the same\n");
-		goto fail;
-	}
-	cur_src = src;
-	cur_dst = dst;
-
-	printf("fill and empty the ring\n");
-	for (i = 0; i<RING_SIZE/MAX_BULK; i++) {
-		ret = rte_ring_mp_enqueue_bulk(r, cur_src, MAX_BULK, NULL);
-		cur_src += MAX_BULK;
-		if (ret == 0)
-			goto fail;
-		ret = rte_ring_mc_dequeue_bulk(r, cur_dst, MAX_BULK, NULL);
-		cur_dst += MAX_BULK;
-		if (ret == 0)
-			goto fail;
-	}
-
-	/* check data */
-	if (memcmp(src, dst, cur_dst - dst)) {
-		rte_hexdump(stdout, "src", src, cur_src - src);
-		rte_hexdump(stdout, "dst", dst, cur_dst - dst);
-		printf("data after dequeue is not the same\n");
-		goto fail;
-	}
-
-	if (test_ring_basic_full_empty(r, src, dst) != 0)
-		goto fail;
-
-	cur_src = src;
-	cur_dst = dst;
+	unsigned int i;
 
-	printf("test default bulk enqueue / dequeue\n");
-	num_elems = 16;
+	/* Legacy queue APIs? */
+	if (esize == -1)
+		for (i = 0; i < count; i++)
+			((void **)obj)[i] = (void *)(unsigned long)i;
+	else
+		for (i = 0; i < (count * esize / sizeof(uint32_t)); i++)
+			((uint32_t *)obj)[i] = i;
+}
 
-	cur_src = src;
-	cur_dst = dst;
+static void
+test_ring_print_test_string(const char *istr, unsigned int api_type, int esize)
+{
+	printf("\n%s: ", istr);
+
+	if (esize == -1)
+		printf("legacy APIs: ");
+	else
+		printf("elem APIs: element size %dB ", esize);
+
+	if (api_type == TEST_RING_IGNORE_API_TYPE)
+		return;
+
+	if (api_type & TEST_RING_THREAD_DEF)
+		printf(": default enqueue/dequeue: ");
+	else if (api_type & TEST_RING_THREAD_SPSC)
+		printf(": SP/SC: ");
+	else if (api_type & TEST_RING_THREAD_MPMC)
+		printf(": MP/MC: ");
+
+	if (api_type & TEST_RING_ELEM_SINGLE)
+		printf("single\n");
+	else if (api_type & TEST_RING_ELEM_BULK)
+		printf("bulk\n");
+	else if (api_type & TEST_RING_ELEM_BURST)
+		printf("burst\n");
+}
 
-	ret = rte_ring_enqueue_bulk(r, cur_src, num_elems, NULL);
-	cur_src += num_elems;
-	if (ret == 0) {
-		printf("Cannot enqueue\n");
-		goto fail;
-	}
-	ret = rte_ring_enqueue_bulk(r, cur_src, num_elems, NULL);
-	cur_src += num_elems;
-	if (ret == 0) {
-		printf("Cannot enqueue\n");
-		goto fail;
-	}
-	ret = rte_ring_dequeue_bulk(r, cur_dst, num_elems, NULL);
-	cur_dst += num_elems;
-	if (ret == 0) {
-		printf("Cannot dequeue\n");
-		goto fail;
-	}
-	ret = rte_ring_dequeue_bulk(r, cur_dst, num_elems, NULL);
-	cur_dst += num_elems;
-	if (ret == 0) {
-		printf("Cannot dequeue2\n");
-		goto fail;
-	}
+/*
+ * Various negative test cases.
+ */
+static int
+test_ring_negative_tests(void)
+{
+	struct rte_ring *rp = NULL;
+	struct rte_ring *rt = NULL;
+	unsigned int i;
 
-	/* check data */
-	if (memcmp(src, dst, cur_dst - dst)) {
-		rte_hexdump(stdout, "src", src, cur_src - src);
-		rte_hexdump(stdout, "dst", dst, cur_dst - dst);
-		printf("data after dequeue is not the same\n");
-		goto fail;
+	/* Test with esize not a multiple of 4 */
+	rp = test_ring_create("test_bad_element_size", 23,
+				RING_SIZE, SOCKET_ID_ANY, 0);
+	if (rp != NULL) {
+		printf("Test failed to detect invalid element size\n");
+		goto test_fail;
 	}
 
-	cur_src = src;
-	cur_dst = dst;
-
-	ret = rte_ring_mp_enqueue(r, cur_src);
-	if (ret != 0)
-		goto fail;
 
-	ret = rte_ring_mc_dequeue(r, cur_dst);
-	if (ret != 0)
-		goto fail;
+	for (i = 0; i < RTE_DIM(esize); i++) {
+		/* Test if ring size is not power of 2 */
+		rp = test_ring_create("test_bad_ring_size", esize[i],
+					RING_SIZE + 1, SOCKET_ID_ANY, 0);
+		if (rp != NULL) {
+			printf("Test failed to detect odd count\n");
+			goto test_fail;
+		}
+
+		/* Test if ring size is exceeding the limit */
+		rp = test_ring_create("test_bad_ring_size", esize[i],
+					RTE_RING_SZ_MASK + 1, SOCKET_ID_ANY, 0);
+		if (rp != NULL) {
+			printf("Test failed to detect limits\n");
+			goto test_fail;
+		}
+
+		/* Tests if lookup returns NULL on non-existing ring */
+		rp = rte_ring_lookup("ring_not_found");
+		if (rp != NULL || rte_errno != ENOENT) {
+			printf("Test failed to detect NULL ring lookup\n");
+			goto test_fail;
+		}
+
+		/* Test if a non-power of 2 count causes the create
+		 * function to fail correctly
+		 */
+		rp = test_ring_create("test_ring_count", esize[i], 4097,
+					SOCKET_ID_ANY, 0);
+		if (rp != NULL)
+			goto test_fail;
+
+		rp = test_ring_create("test_ring_negative", esize[i], RING_SIZE,
+					SOCKET_ID_ANY,
+					RING_F_SP_ENQ | RING_F_SC_DEQ);
+		if (rp == NULL) {
+			printf("test_ring_negative failed to create ring\n");
+			goto test_fail;
+		}
+
+		if (rte_ring_lookup("test_ring_negative") != rp)
+			goto test_fail;
+
+		if (rte_ring_empty(rp) != 1) {
+			printf("test_ring_negative ring is not empty but it should be\n");
+			goto test_fail;
+		}
+
+		/* Test that creating a ring with an already used name
+		 * always fails.
+		 */
+		rt = test_ring_create("test_ring_negative", esize[i], RING_SIZE,
+					SOCKET_ID_ANY, 0);
+		if (rt != NULL)
+			goto test_fail;
+
+		rte_ring_free(rp);
+	}
 
-	free(src);
-	free(dst);
 	return 0;
 
- fail:
-	free(src);
-	free(dst);
+test_fail:
+
+	rte_ring_free(rp);
 	return -1;
 }
 
+/*
+ * Burst and bulk operations with sp/sc, mp/mc and default (during creation)
+ */
 static int
-test_ring_burst_basic(struct rte_ring *r)
+test_ring_burst_bulk_tests(unsigned int api_type)
 {
+	struct rte_ring *r;
 	void **src = NULL, **cur_src = NULL, **dst = NULL, **cur_dst = NULL;
 	int ret;
-	unsigned i;
-
-	/* alloc dummy object pointers */
-	src = malloc(RING_SIZE*2*sizeof(void *));
-	if (src == NULL)
-		goto fail;
+	unsigned int i, j;
+	unsigned int num_elems;
+	int rand;
+	const unsigned int rsz = RING_SIZE - 1;
+
+	for (i = 0; i < RTE_DIM(esize); i++) {
+		test_ring_print_test_string("Test standard ring", api_type,
+						esize[i]);
+
+		/* Create the ring */
+		r = test_ring_create("test_ring_burst_bulk_tests", esize[i],
+					RING_SIZE, SOCKET_ID_ANY, 0);
+
+		/* alloc dummy object pointers */
+		src = test_ring_calloc(RING_SIZE * 2, esize[i]);
+		if (src == NULL)
+			goto fail;
+		test_ring_mem_init(src, RING_SIZE * 2, esize[i]);
+		cur_src = src;
 
-	for (i = 0; i < RING_SIZE*2 ; i++) {
-		src[i] = (void *)(unsigned long)i;
-	}
-	cur_src = src;
-
-	/* alloc some room for copied objects */
-	dst = malloc(RING_SIZE*2*sizeof(void *));
-	if (dst == NULL)
-		goto fail;
-
-	memset(dst, 0, RING_SIZE*2*sizeof(void *));
-	cur_dst = dst;
-
-	printf("Test SP & SC basic functions \n");
-	printf("enqueue 1 obj\n");
-	ret = rte_ring_sp_enqueue_burst(r, cur_src, 1, NULL);
-	cur_src += 1;
-	if (ret != 1)
-		goto fail;
-
-	printf("enqueue 2 objs\n");
-	ret = rte_ring_sp_enqueue_burst(r, cur_src, 2, NULL);
-	cur_src += 2;
-	if (ret != 2)
-		goto fail;
-
-	printf("enqueue MAX_BULK objs\n");
-	ret = rte_ring_sp_enqueue_burst(r, cur_src, MAX_BULK, NULL);
-	cur_src += MAX_BULK;
-	if (ret != MAX_BULK)
-		goto fail;
-
-	printf("dequeue 1 obj\n");
-	ret = rte_ring_sc_dequeue_burst(r, cur_dst, 1, NULL);
-	cur_dst += 1;
-	if (ret != 1)
-		goto fail;
-
-	printf("dequeue 2 objs\n");
-	ret = rte_ring_sc_dequeue_burst(r, cur_dst, 2, NULL);
-	cur_dst += 2;
-	if (ret != 2)
-		goto fail;
-
-	printf("dequeue MAX_BULK objs\n");
-	ret = rte_ring_sc_dequeue_burst(r, cur_dst, MAX_BULK, NULL);
-	cur_dst += MAX_BULK;
-	if (ret != MAX_BULK)
-		goto fail;
-
-	/* check data */
-	if (memcmp(src, dst, cur_dst - dst)) {
-		rte_hexdump(stdout, "src", src, cur_src - src);
-		rte_hexdump(stdout, "dst", dst, cur_dst - dst);
-		printf("data after dequeue is not the same\n");
-		goto fail;
-	}
+		/* alloc some room for copied objects */
+		dst = test_ring_calloc(RING_SIZE * 2, esize[i]);
+		if (dst == NULL)
+			goto fail;
+		cur_dst = dst;
 
-	cur_src = src;
-	cur_dst = dst;
+		printf("enqueue 1 obj\n");
+		ret = test_ring_enqueue(r, cur_src, esize[i], 1, api_type);
+		if (ret != 1)
+			goto fail;
+		cur_src = test_ring_inc_ptr(cur_src, esize[i], 1);
 
-	printf("Test enqueue without enough memory space \n");
-	for (i = 0; i< (RING_SIZE/MAX_BULK - 1); i++) {
-		ret = rte_ring_sp_enqueue_burst(r, cur_src, MAX_BULK, NULL);
-		cur_src += MAX_BULK;
-		if (ret != MAX_BULK)
+		printf("enqueue 2 objs\n");
+		ret = test_ring_enqueue(r, cur_src, esize[i], 2, api_type);
+		if (ret != 2)
 			goto fail;
-	}
+		cur_src = test_ring_inc_ptr(cur_src, esize[i], 2);
 
-	printf("Enqueue 2 objects, free entries = MAX_BULK - 2  \n");
-	ret = rte_ring_sp_enqueue_burst(r, cur_src, 2, NULL);
-	cur_src += 2;
-	if (ret != 2)
-		goto fail;
-
-	printf("Enqueue the remaining entries = MAX_BULK - 2  \n");
-	/* Always one free entry left */
-	ret = rte_ring_sp_enqueue_burst(r, cur_src, MAX_BULK, NULL);
-	cur_src += MAX_BULK - 3;
-	if (ret != MAX_BULK - 3)
-		goto fail;
-
-	printf("Test if ring is full  \n");
-	if (rte_ring_full(r) != 1)
-		goto fail;
-
-	printf("Test enqueue for a full entry  \n");
-	ret = rte_ring_sp_enqueue_burst(r, cur_src, MAX_BULK, NULL);
-	if (ret != 0)
-		goto fail;
-
-	printf("Test dequeue without enough objects \n");
-	for (i = 0; i<RING_SIZE/MAX_BULK - 1; i++) {
-		ret = rte_ring_sc_dequeue_burst(r, cur_dst, MAX_BULK, NULL);
-		cur_dst += MAX_BULK;
+		printf("enqueue MAX_BULK objs\n");
+		ret = test_ring_enqueue(r, cur_src, esize[i], MAX_BULK,
+						api_type);
 		if (ret != MAX_BULK)
 			goto fail;
-	}
+		cur_src = test_ring_inc_ptr(cur_src, esize[i], MAX_BULK);
 
-	/* Available memory space for the exact MAX_BULK entries */
-	ret = rte_ring_sc_dequeue_burst(r, cur_dst, 2, NULL);
-	cur_dst += 2;
-	if (ret != 2)
-		goto fail;
-
-	ret = rte_ring_sc_dequeue_burst(r, cur_dst, MAX_BULK, NULL);
-	cur_dst += MAX_BULK - 3;
-	if (ret != MAX_BULK - 3)
-		goto fail;
-
-	printf("Test if ring is empty \n");
-	/* Check if ring is empty */
-	if (1 != rte_ring_empty(r))
-		goto fail;
-
-	/* check data */
-	if (memcmp(src, dst, cur_dst - dst)) {
-		rte_hexdump(stdout, "src", src, cur_src - src);
-		rte_hexdump(stdout, "dst", dst, cur_dst - dst);
-		printf("data after dequeue is not the same\n");
-		goto fail;
-	}
-
-	cur_src = src;
-	cur_dst = dst;
-
-	printf("Test MP & MC basic functions \n");
-
-	printf("enqueue 1 obj\n");
-	ret = rte_ring_mp_enqueue_burst(r, cur_src, 1, NULL);
-	cur_src += 1;
-	if (ret != 1)
-		goto fail;
-
-	printf("enqueue 2 objs\n");
-	ret = rte_ring_mp_enqueue_burst(r, cur_src, 2, NULL);
-	cur_src += 2;
-	if (ret != 2)
-		goto fail;
-
-	printf("enqueue MAX_BULK objs\n");
-	ret = rte_ring_mp_enqueue_burst(r, cur_src, MAX_BULK, NULL);
-	cur_src += MAX_BULK;
-	if (ret != MAX_BULK)
-		goto fail;
-
-	printf("dequeue 1 obj\n");
-	ret = rte_ring_mc_dequeue_burst(r, cur_dst, 1, NULL);
-	cur_dst += 1;
-	if (ret != 1)
-		goto fail;
-
-	printf("dequeue 2 objs\n");
-	ret = rte_ring_mc_dequeue_burst(r, cur_dst, 2, NULL);
-	cur_dst += 2;
-	if (ret != 2)
-		goto fail;
-
-	printf("dequeue MAX_BULK objs\n");
-	ret = rte_ring_mc_dequeue_burst(r, cur_dst, MAX_BULK, NULL);
-	cur_dst += MAX_BULK;
-	if (ret != MAX_BULK)
-		goto fail;
-
-	/* check data */
-	if (memcmp(src, dst, cur_dst - dst)) {
-		rte_hexdump(stdout, "src", src, cur_src - src);
-		rte_hexdump(stdout, "dst", dst, cur_dst - dst);
-		printf("data after dequeue is not the same\n");
-		goto fail;
-	}
-
-	cur_src = src;
-	cur_dst = dst;
+		printf("dequeue 1 obj\n");
+		ret = test_ring_dequeue(r, cur_dst, esize[i], 1, api_type);
+		if (ret != 1)
+			goto fail;
+		cur_dst = test_ring_inc_ptr(cur_dst, esize[i], 1);
 
-	printf("fill and empty the ring\n");
-	for (i = 0; i<RING_SIZE/MAX_BULK; i++) {
-		ret = rte_ring_mp_enqueue_burst(r, cur_src, MAX_BULK, NULL);
-		cur_src += MAX_BULK;
-		if (ret != MAX_BULK)
+		printf("dequeue 2 objs\n");
+		ret = test_ring_dequeue(r, cur_dst, esize[i], 2, api_type);
+		if (ret != 2)
 			goto fail;
-		ret = rte_ring_mc_dequeue_burst(r, cur_dst, MAX_BULK, NULL);
-		cur_dst += MAX_BULK;
+		cur_dst = test_ring_inc_ptr(cur_dst, esize[i], 2);
+
+		printf("dequeue MAX_BULK objs\n");
+		ret = test_ring_dequeue(r, cur_dst, esize[i], MAX_BULK,
+						api_type);
 		if (ret != MAX_BULK)
 			goto fail;
-	}
+		cur_dst = test_ring_inc_ptr(cur_dst, esize[i], MAX_BULK);
 
-	/* check data */
-	if (memcmp(src, dst, cur_dst - dst)) {
-		rte_hexdump(stdout, "src", src, cur_src - src);
-		rte_hexdump(stdout, "dst", dst, cur_dst - dst);
-		printf("data after dequeue is not the same\n");
-		goto fail;
-	}
-
-	cur_src = src;
-	cur_dst = dst;
+		/* check data */
+		if (memcmp(src, dst, RTE_PTR_DIFF(cur_dst, dst))) {
+			rte_hexdump(stdout, "src", src, cur_src - src);
+			rte_hexdump(stdout, "dst", dst, cur_dst - dst);
+			printf("data after dequeue is not the same\n");
+			goto fail;
+		}
+
+		cur_src = src;
+		cur_dst = dst;
+
+		printf("fill and empty the ring\n");
+		for (j = 0; j < RING_SIZE / MAX_BULK; j++) {
+			ret = test_ring_enqueue(r, cur_src, esize[i], MAX_BULK,
+							api_type);
+			if (ret != MAX_BULK)
+				goto fail;
+			cur_src = test_ring_inc_ptr(cur_src, esize[i],
+								MAX_BULK);
+
+			ret = test_ring_dequeue(r, cur_dst, esize[i], MAX_BULK,
+							api_type);
+			if (ret != MAX_BULK)
+				goto fail;
+			cur_dst = test_ring_inc_ptr(cur_dst, esize[i],
+								MAX_BULK);
+		}
 
-	printf("Test enqueue without enough memory space \n");
-	for (i = 0; i<RING_SIZE/MAX_BULK - 1; i++) {
-		ret = rte_ring_mp_enqueue_burst(r, cur_src, MAX_BULK, NULL);
-		cur_src += MAX_BULK;
-		if (ret != MAX_BULK)
+		/* check data */
+		if (memcmp(src, dst, RTE_PTR_DIFF(cur_dst, dst))) {
+			rte_hexdump(stdout, "src", src, cur_src - src);
+			rte_hexdump(stdout, "dst", dst, cur_dst - dst);
+			printf("data after dequeue is not the same\n");
 			goto fail;
-	}
+		}
+
+		cur_src = src;
+		cur_dst = dst;
+
+		printf("Test enqueue without enough memory space\n");
+		for (j = 0; j < (RING_SIZE/MAX_BULK - 1); j++) {
+			ret = test_ring_enqueue(r, cur_src, esize[i], MAX_BULK,
+							api_type);
+			if (ret != MAX_BULK)
+				goto fail;
+			cur_src = test_ring_inc_ptr(cur_src, esize[i],
+								MAX_BULK);
+		}
+
+		printf("Enqueue 2 objects, free entries = MAX_BULK - 2\n");
+		ret = test_ring_enqueue(r, cur_src, esize[i], 2, api_type);
+		if (ret != 2)
+			goto fail;
+		cur_src = test_ring_inc_ptr(cur_src, esize[i], 2);
+
+		printf("Enqueue the remaining entries = MAX_BULK - 3\n");
+		/* Bulk APIs enqueue exact number of elements */
+		if ((api_type & TEST_RING_ELEM_BULK) == TEST_RING_ELEM_BULK)
+			num_elems = MAX_BULK - 3;
+		else
+			num_elems = MAX_BULK;
+		/* Always one free entry left */
+		ret = test_ring_enqueue(r, cur_src, esize[i], num_elems,
+						api_type);
+		if (ret != MAX_BULK - 3)
+			goto fail;
+		cur_src = test_ring_inc_ptr(cur_src, esize[i], MAX_BULK - 3);
 
-	/* Available memory space for the exact MAX_BULK objects */
-	ret = rte_ring_mp_enqueue_burst(r, cur_src, 2, NULL);
-	cur_src += 2;
-	if (ret != 2)
-		goto fail;
+		printf("Test if ring is full\n");
+		if (rte_ring_full(r) != 1)
+			goto fail;
 
-	ret = rte_ring_mp_enqueue_burst(r, cur_src, MAX_BULK, NULL);
-	cur_src += MAX_BULK - 3;
-	if (ret != MAX_BULK - 3)
-		goto fail;
+		printf("Test enqueue on a full ring\n");
+		ret = test_ring_enqueue(r, cur_src, esize[i], MAX_BULK,
+						api_type);
+		if (ret != 0)
+			goto fail;
 
+		printf("Test dequeue without enough objects\n");
+		for (j = 0; j < RING_SIZE / MAX_BULK - 1; j++) {
+			ret = test_ring_dequeue(r, cur_dst, esize[i], MAX_BULK,
+							api_type);
+			if (ret != MAX_BULK)
+				goto fail;
+			cur_dst = test_ring_inc_ptr(cur_dst, esize[i],
+								MAX_BULK);
+		}
+
+		/* Available memory space for the exact MAX_BULK entries */
+		ret = test_ring_dequeue(r, cur_dst, esize[i], 2, api_type);
+		if (ret != 2)
+			goto fail;
+		cur_dst = test_ring_inc_ptr(cur_dst, esize[i], 2);
+
+		/* Bulk APIs dequeue exact number of elements */
+		if ((api_type & TEST_RING_ELEM_BULK) == TEST_RING_ELEM_BULK)
+			num_elems = MAX_BULK - 3;
+		else
+			num_elems = MAX_BULK;
+		ret = test_ring_dequeue(r, cur_dst, esize[i], num_elems,
+						api_type);
+		if (ret != MAX_BULK - 3)
+			goto fail;
+		cur_dst = test_ring_inc_ptr(cur_dst, esize[i], MAX_BULK - 3);
 
-	printf("Test dequeue without enough objects \n");
-	for (i = 0; i<RING_SIZE/MAX_BULK - 1; i++) {
-		ret = rte_ring_mc_dequeue_burst(r, cur_dst, MAX_BULK, NULL);
-		cur_dst += MAX_BULK;
-		if (ret != MAX_BULK)
+		printf("Test if ring is empty\n");
+		/* Check if ring is empty */
+		if (rte_ring_empty(r) != 1)
 			goto fail;
-	}
 
-	/* Available objects - the exact MAX_BULK */
-	ret = rte_ring_mc_dequeue_burst(r, cur_dst, 2, NULL);
-	cur_dst += 2;
-	if (ret != 2)
-		goto fail;
-
-	ret = rte_ring_mc_dequeue_burst(r, cur_dst, MAX_BULK, NULL);
-	cur_dst += MAX_BULK - 3;
-	if (ret != MAX_BULK - 3)
-		goto fail;
-
-	/* check data */
-	if (memcmp(src, dst, cur_dst - dst)) {
-		rte_hexdump(stdout, "src", src, cur_src - src);
-		rte_hexdump(stdout, "dst", dst, cur_dst - dst);
-		printf("data after dequeue is not the same\n");
-		goto fail;
+		/* check data */
+		if (memcmp(src, dst, RTE_PTR_DIFF(cur_dst, dst))) {
+			rte_hexdump(stdout, "src", src, cur_src - src);
+			rte_hexdump(stdout, "dst", dst, cur_dst - dst);
+			printf("data after dequeue is not the same\n");
+			goto fail;
+		}
+
+		printf("Random full/empty test\n");
+		cur_src = src;
+		cur_dst = dst;
+
+		for (j = 0; j != TEST_RING_FULL_EMTPY_ITER; j++) {
+			/* random shift in the ring */
+			rand = RTE_MAX(rte_rand() % RING_SIZE, 1UL);
+			printf("%s: iteration %u, random shift: %d;\n",
+			    __func__, j, rand);
+			ret = test_ring_enqueue(r, cur_src, esize[i], rand,
+							api_type);
+			TEST_RING_VERIFY(ret != 0);
+
+			ret = test_ring_dequeue(r, cur_dst, esize[i], rand,
+							api_type);
+			TEST_RING_VERIFY(ret == rand);
+
+			/* fill the ring */
+			ret = test_ring_enqueue(r, cur_src, esize[i], rsz,
+							api_type);
+			TEST_RING_VERIFY(ret != 0);
+
+			TEST_RING_VERIFY(rte_ring_free_count(r) == 0);
+			TEST_RING_VERIFY(rsz == rte_ring_count(r));
+			TEST_RING_VERIFY(rte_ring_full(r));
+			TEST_RING_VERIFY(rte_ring_empty(r) == 0);
+
+			/* empty the ring */
+			ret = test_ring_dequeue(r, cur_dst, esize[i], rsz,
+							api_type);
+			TEST_RING_VERIFY(ret == (int)rsz);
+			TEST_RING_VERIFY(rsz == rte_ring_free_count(r));
+			TEST_RING_VERIFY(rte_ring_count(r) == 0);
+			TEST_RING_VERIFY(rte_ring_full(r) == 0);
+			TEST_RING_VERIFY(rte_ring_empty(r));
+
+			/* check data */
+			TEST_RING_VERIFY(memcmp(src, dst, rsz) == 0);
+			rte_ring_dump(stdout, r);
+		}
+
+		/* Free memory before test completed */
+		rte_ring_free(r);
+		rte_free(src);
+		rte_free(dst);
 	}
 
-	cur_src = src;
-	cur_dst = dst;
-
-	printf("Covering rte_ring_enqueue_burst functions \n");
-
-	ret = rte_ring_enqueue_burst(r, cur_src, 2, NULL);
-	cur_src += 2;
-	if (ret != 2)
-		goto fail;
-
-	ret = rte_ring_dequeue_burst(r, cur_dst, 2, NULL);
-	cur_dst += 2;
-	if (ret != 2)
-		goto fail;
-
-	/* Free memory before test completed */
-	free(src);
-	free(dst);
 	return 0;
-
- fail:
-	free(src);
-	free(dst);
+fail:
+	rte_ring_free(r);
+	rte_free(src);
+	rte_free(dst);
 	return -1;
 }
 
 /*
- * it will always fail to create ring with a wrong ring size number in this function
- */
-static int
-test_ring_creation_with_wrong_size(void)
-{
-	struct rte_ring * rp = NULL;
-
-	/* Test if ring size is not power of 2 */
-	rp = rte_ring_create("test_bad_ring_size", RING_SIZE + 1, SOCKET_ID_ANY, 0);
-	if (NULL != rp) {
-		return -1;
-	}
-
-	/* Test if ring size is exceeding the limit */
-	rp = rte_ring_create("test_bad_ring_size", (RTE_RING_SZ_MASK + 1), SOCKET_ID_ANY, 0);
-	if (NULL != rp) {
-		return -1;
-	}
-	return 0;
-}
-
-/*
- * it tests if it would always fail to create ring with an used ring name
- */
-static int
-test_ring_creation_with_an_used_name(void)
-{
-	struct rte_ring * rp;
-
-	rp = rte_ring_create("test", RING_SIZE, SOCKET_ID_ANY, 0);
-	if (NULL != rp)
-		return -1;
-
-	return 0;
-}
-
-/*
- * Test to if a non-power of 2 count causes the create
- * function to fail correctly
- */
-static int
-test_create_count_odd(void)
-{
-	struct rte_ring *r = rte_ring_create("test_ring_count",
-			4097, SOCKET_ID_ANY, 0 );
-	if(r != NULL){
-		return -1;
-	}
-	return 0;
-}
-
-static int
-test_lookup_null(void)
-{
-	struct rte_ring *rlp = rte_ring_lookup("ring_not_found");
-	if (rlp ==NULL)
-	if (rte_errno != ENOENT){
-		printf( "test failed to returnn error on null pointer\n");
-		return -1;
-	}
-	return 0;
-}
-
-/*
- * it tests some more basic ring operations
+ * Test default, single element, bulk and burst APIs
  */
 static int
 test_ring_basic_ex(void)
 {
 	int ret = -1;
-	unsigned i;
+	unsigned int i, j;
 	struct rte_ring *rp = NULL;
-	void **obj = NULL;
-
-	obj = rte_calloc("test_ring_basic_ex_malloc", RING_SIZE, sizeof(void *), 0);
-	if (obj == NULL) {
-		printf("test_ring_basic_ex fail to rte_malloc\n");
-		goto fail_test;
-	}
-
-	rp = rte_ring_create("test_ring_basic_ex", RING_SIZE, SOCKET_ID_ANY,
-			RING_F_SP_ENQ | RING_F_SC_DEQ);
-	if (rp == NULL) {
-		printf("test_ring_basic_ex fail to create ring\n");
-		goto fail_test;
-	}
-
-	if (rte_ring_lookup("test_ring_basic_ex") != rp) {
-		goto fail_test;
-	}
-
-	if (rte_ring_empty(rp) != 1) {
-		printf("test_ring_basic_ex ring is not empty but it should be\n");
-		goto fail_test;
-	}
-
-	printf("%u ring entries are now free\n", rte_ring_free_count(rp));
-
-	for (i = 0; i < RING_SIZE; i ++) {
-		rte_ring_enqueue(rp, obj[i]);
-	}
-
-	if (rte_ring_full(rp) != 1) {
-		printf("test_ring_basic_ex ring is not full but it should be\n");
-		goto fail_test;
-	}
-
-	for (i = 0; i < RING_SIZE; i ++) {
-		rte_ring_dequeue(rp, &obj[i]);
-	}
-
-	if (rte_ring_empty(rp) != 1) {
-		printf("test_ring_basic_ex ring is not empty but it should be\n");
-		goto fail_test;
-	}
-
-	/* Covering the ring burst operation */
-	ret = rte_ring_enqueue_burst(rp, obj, 2, NULL);
-	if (ret != 2) {
-		printf("test_ring_basic_ex: rte_ring_enqueue_burst fails \n");
-		goto fail_test;
+	void *obj = NULL;
+
+	for (i = 0; i < RTE_DIM(esize); i++) {
+		obj = test_ring_calloc(RING_SIZE, esize[i]);
+		if (obj == NULL) {
+			printf("test_ring_basic_ex failed to allocate memory\n");
+			goto fail_test;
+		}
+
+		rp = test_ring_create("test_ring_basic_ex", esize[i], RING_SIZE,
+					SOCKET_ID_ANY,
+					RING_F_SP_ENQ | RING_F_SC_DEQ);
+		if (rp == NULL) {
+			printf("test_ring_basic_ex failed to create ring\n");
+			goto fail_test;
+		}
+
+		if (rte_ring_lookup("test_ring_basic_ex") != rp) {
+			printf("test_ring_basic_ex ring is not found\n");
+			goto fail_test;
+		}
+
+		if (rte_ring_empty(rp) != 1) {
+			printf("test_ring_basic_ex ring is not empty but it should be\n");
+			goto fail_test;
+		}
+
+		printf("%u ring entries are now free\n",
+			rte_ring_free_count(rp));
+
+		for (j = 0; j < RING_SIZE; j++) {
+			test_ring_enqueue(rp, obj, esize[i], 1,
+				TEST_RING_THREAD_DEF | TEST_RING_ELEM_SINGLE);
+		}
+
+		if (rte_ring_full(rp) != 1) {
+			printf("test_ring_basic_ex ring is not full but it should be\n");
+			goto fail_test;
+		}
+
+		for (j = 0; j < RING_SIZE; j++) {
+			test_ring_dequeue(rp, obj, esize[i], 1,
+				TEST_RING_THREAD_DEF | TEST_RING_ELEM_SINGLE);
+		}
+
+		if (rte_ring_empty(rp) != 1) {
+			printf("test_ring_basic_ex ring is not empty but it should be\n");
+			goto fail_test;
+		}
+
+		/* Following tests use the configured flags to decide
+		 * SP/SC or MP/MC.
+		 */
+		/* Covering the ring burst operation */
+		ret = test_ring_enqueue(rp, obj, esize[i], 2,
+				TEST_RING_THREAD_DEF | TEST_RING_ELEM_BURST);
+		if (ret != 2) {
+			printf("test_ring_basic_ex: rte_ring_enqueue_burst fails\n");
+			goto fail_test;
+		}
+
+		ret = test_ring_dequeue(rp, obj, esize[i], 2,
+				TEST_RING_THREAD_DEF | TEST_RING_ELEM_BURST);
+		if (ret != 2) {
+			printf("test_ring_basic_ex: rte_ring_dequeue_burst fails\n");
+			goto fail_test;
+		}
+
+		/* Covering the ring bulk operation */
+		ret = test_ring_enqueue(rp, obj, esize[i], 2,
+				TEST_RING_THREAD_DEF | TEST_RING_ELEM_BULK);
+		if (ret != 2) {
+			printf("test_ring_basic_ex: rte_ring_enqueue_bulk fails\n");
+			goto fail_test;
+		}
+
+		ret = test_ring_dequeue(rp, obj, esize[i], 2,
+				TEST_RING_THREAD_DEF | TEST_RING_ELEM_BULK);
+		if (ret != 2) {
+			printf("test_ring_basic_ex: rte_ring_dequeue_bulk fails\n");
+			goto fail_test;
+		}
+
+		rte_ring_free(rp);
+		rte_free(obj);
+		rp = NULL;
+		obj = NULL;
 	}
 
-	ret = rte_ring_dequeue_burst(rp, obj, 2, NULL);
-	if (ret != 2) {
-		printf("test_ring_basic_ex: rte_ring_dequeue_burst fails \n");
-		goto fail_test;
-	}
+	return 0;
 
-	ret = 0;
 fail_test:
 	rte_ring_free(rp);
 	if (obj != NULL)
 		rte_free(obj);
 
-	return ret;
+	return -1;
 }
 
+/*
+ * Basic test cases with exact size ring.
+ */
 static int
 test_ring_with_exact_size(void)
 {
-	struct rte_ring *std_ring = NULL, *exact_sz_ring = NULL;
-	void *ptr_array[16];
-	static const unsigned int ring_sz = RTE_DIM(ptr_array);
-	unsigned int i;
+	struct rte_ring *std_r = NULL, *exact_sz_r = NULL;
+	void *obj;
+	const unsigned int ring_sz = 16;
+	unsigned int i, j;
 	int ret = -1;
 
-	std_ring = rte_ring_create("std", ring_sz, rte_socket_id(),
-			RING_F_SP_ENQ | RING_F_SC_DEQ);
-	if (std_ring == NULL) {
-		printf("%s: error, can't create std ring\n", __func__);
-		goto end;
-	}
-	exact_sz_ring = rte_ring_create("exact sz", ring_sz, rte_socket_id(),
-			RING_F_SP_ENQ | RING_F_SC_DEQ | RING_F_EXACT_SZ);
-	if (exact_sz_ring == NULL) {
-		printf("%s: error, can't create exact size ring\n", __func__);
-		goto end;
-	}
+	for (i = 0; i < RTE_DIM(esize); i++) {
+		test_ring_print_test_string("Test exact size ring",
+				TEST_RING_IGNORE_API_TYPE,
+				esize[i]);
+
+		/* alloc object pointers */
+		obj = test_ring_calloc(16, esize[i]);
+		if (obj == NULL)
+			goto test_fail;
+
+		std_r = test_ring_create("std", esize[i], ring_sz,
+					rte_socket_id(),
+					RING_F_SP_ENQ | RING_F_SC_DEQ);
+		if (std_r == NULL) {
+			printf("%s: error, can't create std ring\n", __func__);
+			goto test_fail;
+		}
+		exact_sz_r = test_ring_create("exact sz", esize[i], ring_sz,
+				rte_socket_id(),
+				RING_F_SP_ENQ | RING_F_SC_DEQ |
+				RING_F_EXACT_SZ);
+		if (exact_sz_r == NULL) {
+			printf("%s: error, can't create exact size ring\n",
+					__func__);
+			goto test_fail;
+		}
+
+		/*
+		 * Check that the exact size ring is bigger than the
+		 * standard ring
+		 */
+		if (rte_ring_get_size(std_r) >= rte_ring_get_size(exact_sz_r)) {
+			printf("%s: error, std ring (size: %u) is not smaller than exact size one (size %u)\n",
+					__func__,
+					rte_ring_get_size(std_r),
+					rte_ring_get_size(exact_sz_r));
+			goto test_fail;
+		}
+		/*
+		 * Check that the exact size ring (exact_sz_r) can hold one
+		 * more element than the standard ring (16 vs 15 elements).
+		 */
+		for (j = 0; j < ring_sz - 1; j++) {
+			test_ring_enqueue(std_r, obj, esize[i], 1,
+				TEST_RING_THREAD_DEF | TEST_RING_ELEM_SINGLE);
+			test_ring_enqueue(exact_sz_r, obj, esize[i], 1,
+				TEST_RING_THREAD_DEF | TEST_RING_ELEM_SINGLE);
+		}
+		ret = test_ring_enqueue(std_r, obj, esize[i], 1,
+				TEST_RING_THREAD_DEF | TEST_RING_ELEM_SINGLE);
+		if (ret != -ENOBUFS) {
+			printf("%s: error, unexpected successful enqueue\n",
+				__func__);
+			goto test_fail;
+		}
+		ret = test_ring_enqueue(exact_sz_r, obj, esize[i], 1,
+				TEST_RING_THREAD_DEF | TEST_RING_ELEM_SINGLE);
+		if (ret == -ENOBUFS) {
+			printf("%s: error, enqueue failed\n", __func__);
+			goto test_fail;
+		}
+
+		/* check that dequeue returns the expected number of elements */
+		ret = test_ring_dequeue(exact_sz_r, obj, esize[i], ring_sz,
+				TEST_RING_THREAD_DEF | TEST_RING_ELEM_BURST);
+		if (ret != (int)ring_sz) {
+			printf("%s: error, failed to dequeue expected nb of elements\n",
+				__func__);
+			goto test_fail;
+		}
 
-	/*
-	 * Check that the exact size ring is bigger than the standard ring
-	 */
-	if (rte_ring_get_size(std_ring) >= rte_ring_get_size(exact_sz_ring)) {
-		printf("%s: error, std ring (size: %u) is not smaller than exact size one (size %u)\n",
-				__func__,
-				rte_ring_get_size(std_ring),
-				rte_ring_get_size(exact_sz_ring));
-		goto end;
-	}
-	/*
-	 * check that the exact_sz_ring can hold one more element than the
-	 * standard ring. (16 vs 15 elements)
-	 */
-	for (i = 0; i < ring_sz - 1; i++) {
-		rte_ring_enqueue(std_ring, NULL);
-		rte_ring_enqueue(exact_sz_ring, NULL);
-	}
-	if (rte_ring_enqueue(std_ring, NULL) != -ENOBUFS) {
-		printf("%s: error, unexpected successful enqueue\n", __func__);
-		goto end;
-	}
-	if (rte_ring_enqueue(exact_sz_ring, NULL) == -ENOBUFS) {
-		printf("%s: error, enqueue failed\n", __func__);
-		goto end;
-	}
+		/* check that the capacity function returns expected value */
+		if (rte_ring_get_capacity(exact_sz_r) != ring_sz) {
+			printf("%s: error, incorrect ring capacity reported\n",
+					__func__);
+			goto test_fail;
+		}
 
-	/* check that dequeue returns the expected number of elements */
-	if (rte_ring_dequeue_burst(exact_sz_ring, ptr_array,
-			RTE_DIM(ptr_array), NULL) != ring_sz) {
-		printf("%s: error, failed to dequeue expected nb of elements\n",
-				__func__);
-		goto end;
+		rte_free(obj);
+		rte_ring_free(std_r);
+		rte_ring_free(exact_sz_r);
 	}
 
-	/* check that the capacity function returns expected value */
-	if (rte_ring_get_capacity(exact_sz_ring) != ring_sz) {
-		printf("%s: error, incorrect ring capacity reported\n",
-				__func__);
-		goto end;
-	}
+	return 0;
 
-	ret = 0; /* all ok if we get here */
-end:
-	rte_ring_free(std_ring);
-	rte_ring_free(exact_sz_ring);
-	return ret;
+test_fail:
+	rte_free(obj);
+	rte_ring_free(std_r);
+	rte_ring_free(exact_sz_r);
+	return -1;
 }
 
 static int
 test_ring(void)
 {
-	struct rte_ring *r = NULL;
+	unsigned int i, j;
 
-	/* some more basic operations */
-	if (test_ring_basic_ex() < 0)
-		goto test_fail;
-
-	rte_atomic32_init(&synchro);
-
-	r = rte_ring_create("test", RING_SIZE, SOCKET_ID_ANY, 0);
-	if (r == NULL)
-		goto test_fail;
-
-	/* retrieve the ring from its name */
-	if (rte_ring_lookup("test") != r) {
-		printf("Cannot lookup ring from its name\n");
-		goto test_fail;
-	}
-
-	/* burst operations */
-	if (test_ring_burst_basic(r) < 0)
-		goto test_fail;
-
-	/* basic operations */
-	if (test_ring_basic(r) < 0)
-		goto test_fail;
-
-	/* basic operations */
-	if ( test_create_count_odd() < 0){
-		printf("Test failed to detect odd count\n");
+	/* Negative test cases */
+	if (test_ring_negative_tests() < 0)
 		goto test_fail;
-	} else
-		printf("Test detected odd count\n");
 
-	if ( test_lookup_null() < 0){
-		printf("Test failed to detect NULL ring lookup\n");
-		goto test_fail;
-	} else
-		printf("Test detected NULL ring lookup\n");
-
-	/* test of creating ring with wrong size */
-	if (test_ring_creation_with_wrong_size() < 0)
+	/* some more basic operations */
+	if (test_ring_basic_ex() < 0)
 		goto test_fail;
 
-	/* test of creation ring with an used name */
-	if (test_ring_creation_with_an_used_name() < 0)
-		goto test_fail;
+	/* Burst and bulk operations with sp/sc, mp/mc and default */
+	for (j = TEST_RING_ELEM_BULK; j <= TEST_RING_ELEM_BURST; j <<= 1)
+		for (i = TEST_RING_THREAD_DEF;
+					i <= TEST_RING_THREAD_MPMC; i <<= 1)
+			if (test_ring_burst_bulk_tests(i | j) < 0)
+				goto test_fail;
 
 	if (test_ring_with_exact_size() < 0)
 		goto test_fail;
@@ -863,12 +692,9 @@ test_ring(void)
 	/* dump the ring status */
 	rte_ring_list_dump(stdout);
 
-	rte_ring_free(r);
-
 	return 0;
 
 test_fail:
-	rte_ring_free(r);
 
 	return -1;
 }
diff --git a/app/test/test_ring.h b/app/test/test_ring.h
new file mode 100644
index 000000000..26716e4f8
--- /dev/null
+++ b/app/test/test_ring.h
@@ -0,0 +1,187 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2019 Arm Limited
+ */
+
+#include <rte_malloc.h>
+#include <rte_ring.h>
+#include <rte_ring_elem.h>
+
+/* API type to call
+ * rte_ring_<sp/mp or sc/mc>_enqueue_<bulk/burst>
+ * TEST_RING_THREAD_DEF - Uses configured SPSC/MPMC calls
+ * TEST_RING_THREAD_SPSC - Calls SP or SC API
+ * TEST_RING_THREAD_MPMC - Calls MP or MC API
+ */
+#define TEST_RING_THREAD_DEF 1
+#define TEST_RING_THREAD_SPSC 2
+#define TEST_RING_THREAD_MPMC 4
+
+/* API type to call
+ * TEST_RING_ELEM_SINGLE - Calls single element APIs
+ * TEST_RING_ELEM_BULK - Calls bulk APIs
+ * TEST_RING_ELEM_BURST - Calls burst APIs
+ */
+#define TEST_RING_ELEM_SINGLE 8
+#define TEST_RING_ELEM_BULK 16
+#define TEST_RING_ELEM_BURST 32
+
+#define TEST_RING_IGNORE_API_TYPE ~0U
+
+/* This function is placed here as it is required for both
+ * performance and functional tests.
+ */
+static inline struct rte_ring*
+test_ring_create(const char *name, int esize, unsigned int count,
+		int socket_id, unsigned int flags)
+{
+	/* Legacy queue APIs? */
+	if ((esize) == -1)
+		return rte_ring_create((name), (count), (socket_id), (flags));
+	else
+		return rte_ring_create_elem((name), (esize), (count),
+						(socket_id), (flags));
+}
+
+static __rte_always_inline unsigned int
+test_ring_enqueue(struct rte_ring *r, void **obj, int esize, unsigned int n,
+			unsigned int api_type)
+{
+	/* Legacy queue APIs? */
+	if ((esize) == -1)
+		switch (api_type) {
+		case (TEST_RING_THREAD_DEF | TEST_RING_ELEM_SINGLE):
+			return rte_ring_enqueue(r, obj);
+		case (TEST_RING_THREAD_SPSC | TEST_RING_ELEM_SINGLE):
+			return rte_ring_sp_enqueue(r, obj);
+		case (TEST_RING_THREAD_MPMC | TEST_RING_ELEM_SINGLE):
+			return rte_ring_mp_enqueue(r, obj);
+		case (TEST_RING_THREAD_DEF | TEST_RING_ELEM_BULK):
+			return rte_ring_enqueue_bulk(r, obj, n, NULL);
+		case (TEST_RING_THREAD_SPSC | TEST_RING_ELEM_BULK):
+			return rte_ring_sp_enqueue_bulk(r, obj, n, NULL);
+		case (TEST_RING_THREAD_MPMC | TEST_RING_ELEM_BULK):
+			return rte_ring_mp_enqueue_bulk(r, obj, n, NULL);
+		case (TEST_RING_THREAD_DEF | TEST_RING_ELEM_BURST):
+			return rte_ring_enqueue_burst(r, obj, n, NULL);
+		case (TEST_RING_THREAD_SPSC | TEST_RING_ELEM_BURST):
+			return rte_ring_sp_enqueue_burst(r, obj, n, NULL);
+		case (TEST_RING_THREAD_MPMC | TEST_RING_ELEM_BURST):
+			return rte_ring_mp_enqueue_burst(r, obj, n, NULL);
+		default:
+			printf("Invalid API type\n");
+			return 0;
+		}
+	else
+		switch (api_type) {
+		case (TEST_RING_THREAD_DEF | TEST_RING_ELEM_SINGLE):
+			return rte_ring_enqueue_elem(r, obj, esize);
+		case (TEST_RING_THREAD_SPSC | TEST_RING_ELEM_SINGLE):
+			return rte_ring_sp_enqueue_elem(r, obj, esize);
+		case (TEST_RING_THREAD_MPMC | TEST_RING_ELEM_SINGLE):
+			return rte_ring_mp_enqueue_elem(r, obj, esize);
+		case (TEST_RING_THREAD_DEF | TEST_RING_ELEM_BULK):
+			return rte_ring_enqueue_bulk_elem(r, obj, esize, n,
+								NULL);
+		case (TEST_RING_THREAD_SPSC | TEST_RING_ELEM_BULK):
+			return rte_ring_sp_enqueue_bulk_elem(r, obj, esize, n,
+								NULL);
+		case (TEST_RING_THREAD_MPMC | TEST_RING_ELEM_BULK):
+			return rte_ring_mp_enqueue_bulk_elem(r, obj, esize, n,
+								NULL);
+		case (TEST_RING_THREAD_DEF | TEST_RING_ELEM_BURST):
+			return rte_ring_enqueue_burst_elem(r, obj, esize, n,
+								NULL);
+		case (TEST_RING_THREAD_SPSC | TEST_RING_ELEM_BURST):
+			return rte_ring_sp_enqueue_burst_elem(r, obj, esize, n,
+								NULL);
+		case (TEST_RING_THREAD_MPMC | TEST_RING_ELEM_BURST):
+			return rte_ring_mp_enqueue_burst_elem(r, obj, esize, n,
+								NULL);
+		default:
+			printf("Invalid API type\n");
+			return 0;
+		}
+}
+
+static __rte_always_inline unsigned int
+test_ring_dequeue(struct rte_ring *r, void **obj, int esize, unsigned int n,
+			unsigned int api_type)
+{
+	/* Legacy queue APIs? */
+	if ((esize) == -1)
+		switch (api_type) {
+		case (TEST_RING_THREAD_DEF | TEST_RING_ELEM_SINGLE):
+			return rte_ring_dequeue(r, obj);
+		case (TEST_RING_THREAD_SPSC | TEST_RING_ELEM_SINGLE):
+			return rte_ring_sc_dequeue(r, obj);
+		case (TEST_RING_THREAD_MPMC | TEST_RING_ELEM_SINGLE):
+			return rte_ring_mc_dequeue(r, obj);
+		case (TEST_RING_THREAD_DEF | TEST_RING_ELEM_BULK):
+			return rte_ring_dequeue_bulk(r, obj, n, NULL);
+		case (TEST_RING_THREAD_SPSC | TEST_RING_ELEM_BULK):
+			return rte_ring_sc_dequeue_bulk(r, obj, n, NULL);
+		case (TEST_RING_THREAD_MPMC | TEST_RING_ELEM_BULK):
+			return rte_ring_mc_dequeue_bulk(r, obj, n, NULL);
+		case (TEST_RING_THREAD_DEF | TEST_RING_ELEM_BURST):
+			return rte_ring_dequeue_burst(r, obj, n, NULL);
+		case (TEST_RING_THREAD_SPSC | TEST_RING_ELEM_BURST):
+			return rte_ring_sc_dequeue_burst(r, obj, n, NULL);
+		case (TEST_RING_THREAD_MPMC | TEST_RING_ELEM_BURST):
+			return rte_ring_mc_dequeue_burst(r, obj, n, NULL);
+		default:
+			printf("Invalid API type\n");
+			return 0;
+		}
+	else
+		switch (api_type) {
+		case (TEST_RING_THREAD_DEF | TEST_RING_ELEM_SINGLE):
+			return rte_ring_dequeue_elem(r, obj, esize);
+		case (TEST_RING_THREAD_SPSC | TEST_RING_ELEM_SINGLE):
+			return rte_ring_sc_dequeue_elem(r, obj, esize);
+		case (TEST_RING_THREAD_MPMC | TEST_RING_ELEM_SINGLE):
+			return rte_ring_mc_dequeue_elem(r, obj, esize);
+		case (TEST_RING_THREAD_DEF | TEST_RING_ELEM_BULK):
+			return rte_ring_dequeue_bulk_elem(r, obj, esize,
+								n, NULL);
+		case (TEST_RING_THREAD_SPSC | TEST_RING_ELEM_BULK):
+			return rte_ring_sc_dequeue_bulk_elem(r, obj, esize,
+								n, NULL);
+		case (TEST_RING_THREAD_MPMC | TEST_RING_ELEM_BULK):
+			return rte_ring_mc_dequeue_bulk_elem(r, obj, esize,
+								n, NULL);
+		case (TEST_RING_THREAD_DEF | TEST_RING_ELEM_BURST):
+			return rte_ring_dequeue_burst_elem(r, obj, esize,
+								n, NULL);
+		case (TEST_RING_THREAD_SPSC | TEST_RING_ELEM_BURST):
+			return rte_ring_sc_dequeue_burst_elem(r, obj, esize,
+								n, NULL);
+		case (TEST_RING_THREAD_MPMC | TEST_RING_ELEM_BURST):
+			return rte_ring_mc_dequeue_burst_elem(r, obj, esize,
+								n, NULL);
+		default:
+			printf("Invalid API type\n");
+			return 0;
+		}
+}
+
+/* This function is placed here as it is required for both
+ * performance and functional tests.
+ */
+static __rte_always_inline void *
+test_ring_calloc(unsigned int rsize, int esize)
+{
+	unsigned int sz;
+	void *p;
+
+	/* Legacy queue APIs? */
+	if (esize == -1)
+		sz = sizeof(void *);
+	else
+		sz = esize;
+
+	p = rte_zmalloc(NULL, rsize * sz, RTE_CACHE_LINE_SIZE);
+	if (p == NULL)
+		printf("Failed to allocate memory\n");
+
+	return p;
+}
-- 
2.17.1



* [dpdk-dev] [PATCH v8 4/6] test/ring: modify perf test cases to use rte_ring_xxx_elem APIs
  2020-01-13 17:25   ` [dpdk-dev] [PATCH v8 0/6] lib/ring: APIs to support custom element size Honnappa Nagarahalli
                       ` (2 preceding siblings ...)
  2020-01-13 17:25     ` [dpdk-dev] [PATCH v8 3/6] test/ring: add functional tests for rte_ring_xxx_elem APIs Honnappa Nagarahalli
@ 2020-01-13 17:25     ` Honnappa Nagarahalli
  2020-01-13 17:25     ` [dpdk-dev] [PATCH v8 5/6] lib/hash: use ring with 32b element size to save memory Honnappa Nagarahalli
  2020-01-13 17:25     ` [dpdk-dev] [PATCH v8 6/6] lib/eventdev: use custom element size ring for event rings Honnappa Nagarahalli
  5 siblings, 0 replies; 173+ messages in thread
From: Honnappa Nagarahalli @ 2020-01-13 17:25 UTC (permalink / raw)
  To: olivier.matz, sthemmin, jerinj, bruce.richardson, david.marchand,
	pbhagavatula, konstantin.ananyev, honnappa.nagarahalli
  Cc: dev, dharmik.thakkar, ruifeng.wang, gavin.hu, nd

Adjust the performance test cases to test rte_ring_xxx_elem APIs.
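
For readers following the test changes, here is a rough sketch of how the
api_type argument used throughout these tests is composed and decoded. The
flag values are copied from app/test/test_ring.h in this series; the helper
api_type_label() is hypothetical and only mirrors the checks that
test_ring_print_test_string() performs. It is not part of the patch.

```c
#include <stdio.h>
#include <string.h>

/* Flag values as defined in app/test/test_ring.h in this series. */
#define TEST_RING_THREAD_DEF 1
#define TEST_RING_THREAD_SPSC 2
#define TEST_RING_THREAD_MPMC 4
#define TEST_RING_ELEM_SINGLE 8
#define TEST_RING_ELEM_BULK 16
#define TEST_RING_ELEM_BURST 32

/*
 * Hypothetical helper (not in the patch): build a label for one
 * thread-mode flag OR'ed with one element-mode flag.
 * esize == -1 stands for the legacy 'void *' APIs, as in the tests.
 */
static const char *
api_type_label(unsigned int api_type, int esize, char *buf, size_t len)
{
	const char *thr = "?";
	const char *elem = "?";

	if (api_type & TEST_RING_THREAD_DEF)
		thr = "default";
	else if (api_type & TEST_RING_THREAD_SPSC)
		thr = "SP/SC";
	else if (api_type & TEST_RING_THREAD_MPMC)
		thr = "MP/MC";

	if (api_type & TEST_RING_ELEM_SINGLE)
		elem = "single";
	else if (api_type & TEST_RING_ELEM_BULK)
		elem = "bulk";
	else if (api_type & TEST_RING_ELEM_BURST)
		elem = "burst";

	if (esize == -1)
		snprintf(buf, len, "legacy APIs: %s: %s", thr, elem);
	else
		snprintf(buf, len, "elem APIs: element size %dB: %s: %s",
				esize, thr, elem);

	return buf;
}
```

Each test passes exactly one thread-mode flag combined with one element-mode
flag, for example TEST_RING_THREAD_SPSC | TEST_RING_ELEM_BULK, which the
wrappers in test_ring.h map to a single rte_ring call.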

Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Gavin Hu <gavin.hu@arm.com>
---
 app/test/test_ring_perf.c | 454 +++++++++++++++++++++++---------------
 1 file changed, 273 insertions(+), 181 deletions(-)

diff --git a/app/test/test_ring_perf.c b/app/test/test_ring_perf.c
index 6c2aca483..8d1217951 100644
--- a/app/test/test_ring_perf.c
+++ b/app/test/test_ring_perf.c
@@ -13,6 +13,7 @@
 #include <string.h>
 
 #include "test.h"
+#include "test_ring.h"
 
 /*
  * Ring
@@ -41,6 +42,35 @@ struct lcore_pair {
 
 static volatile unsigned lcore_count = 0;
 
+static void
+test_ring_print_test_string(unsigned int api_type, int esize,
+	unsigned int bsz, double value)
+{
+	if (esize == -1)
+		printf("legacy APIs");
+	else
+		printf("elem APIs: element size %dB", esize);
+
+	if (api_type == TEST_RING_IGNORE_API_TYPE)
+		return;
+
+	if ((api_type & TEST_RING_THREAD_DEF) == TEST_RING_THREAD_DEF)
+		printf(": default enqueue/dequeue: ");
+	else if ((api_type & TEST_RING_THREAD_SPSC) == TEST_RING_THREAD_SPSC)
+		printf(": SP/SC: ");
+	else if ((api_type & TEST_RING_THREAD_MPMC) == TEST_RING_THREAD_MPMC)
+		printf(": MP/MC: ");
+
+	if ((api_type & TEST_RING_ELEM_SINGLE) == TEST_RING_ELEM_SINGLE)
+		printf("single: ");
+	else if ((api_type & TEST_RING_ELEM_BULK) == TEST_RING_ELEM_BULK)
+		printf("bulk (size: %u): ", bsz);
+	else if ((api_type & TEST_RING_ELEM_BURST) == TEST_RING_ELEM_BURST)
+		printf("burst (size: %u): ", bsz);
+
+	printf("%.2F\n", value);
+}
+
 /**** Functions to analyse our core mask to get cores for different tests ***/
 
 static int
@@ -117,27 +147,21 @@ get_two_sockets(struct lcore_pair *lcp)
 
 /* Get cycle counts for dequeuing from an empty ring. Should be 2 or 3 cycles */
 static void
-test_empty_dequeue(struct rte_ring *r)
+test_empty_dequeue(struct rte_ring *r, const int esize,
+			const unsigned int api_type)
 {
-	const unsigned iter_shift = 26;
-	const unsigned iterations = 1<<iter_shift;
-	unsigned i = 0;
+	const unsigned int iter_shift = 26;
+	const unsigned int iterations = 1 << iter_shift;
+	unsigned int i = 0;
 	void *burst[MAX_BURST];
 
-	const uint64_t sc_start = rte_rdtsc();
+	const uint64_t start = rte_rdtsc();
 	for (i = 0; i < iterations; i++)
-		rte_ring_sc_dequeue_bulk(r, burst, bulk_sizes[0], NULL);
-	const uint64_t sc_end = rte_rdtsc();
+		test_ring_dequeue(r, burst, esize, bulk_sizes[0], api_type);
+	const uint64_t end = rte_rdtsc();
 
-	const uint64_t mc_start = rte_rdtsc();
-	for (i = 0; i < iterations; i++)
-		rte_ring_mc_dequeue_bulk(r, burst, bulk_sizes[0], NULL);
-	const uint64_t mc_end = rte_rdtsc();
-
-	printf("SC empty dequeue: %.2F\n",
-			(double)(sc_end-sc_start) / iterations);
-	printf("MC empty dequeue: %.2F\n",
-			(double)(mc_end-mc_start) / iterations);
+	test_ring_print_test_string(api_type, esize, bulk_sizes[0],
+					((double)(end - start)) / iterations);
 }
 
 /*
@@ -151,19 +175,21 @@ struct thread_params {
 };
 
 /*
- * Function that uses rdtsc to measure timing for ring enqueue. Needs pair
- * thread running dequeue_bulk function
+ * Helper function to time bulk SP/SC and MP/MC enqueue or dequeue calls.
+ * flag == 0 -> enqueue
+ * flag == 1 -> dequeue
  */
-static int
-enqueue_bulk(void *p)
+static __rte_always_inline int
+enqueue_dequeue_bulk_helper(const unsigned int flag, const int esize,
+	struct thread_params *p)
 {
-	const unsigned iter_shift = 23;
-	const unsigned iterations = 1<<iter_shift;
-	struct thread_params *params = p;
-	struct rte_ring *r = params->r;
-	const unsigned size = params->size;
-	unsigned i;
-	void *burst[MAX_BURST] = {0};
+	int ret;
+	const unsigned int iter_shift = 23;
+	const unsigned int iterations = 1 << iter_shift;
+	struct rte_ring *r = p->r;
+	unsigned int bsize = p->size;
+	unsigned int i;
+	void *burst = NULL;
 
 #ifdef RTE_USE_C11_MEM_MODEL
 	if (__atomic_add_fetch(&lcore_count, 1, __ATOMIC_RELAXED) != 2)
@@ -173,23 +199,67 @@ enqueue_bulk(void *p)
 		while(lcore_count != 2)
 			rte_pause();
 
+	burst = test_ring_calloc(MAX_BURST, esize);
+	if (burst == NULL)
+		return -1;
+
 	const uint64_t sp_start = rte_rdtsc();
 	for (i = 0; i < iterations; i++)
-		while (rte_ring_sp_enqueue_bulk(r, burst, size, NULL) == 0)
-			rte_pause();
+		do {
+			if (flag == 0)
+				ret = test_ring_enqueue(r, burst, esize, bsize,
+						TEST_RING_THREAD_SPSC |
+						TEST_RING_ELEM_BULK);
+			else if (flag == 1)
+				ret = test_ring_dequeue(r, burst, esize, bsize,
+						TEST_RING_THREAD_SPSC |
+						TEST_RING_ELEM_BULK);
+			if (ret == 0)
+				rte_pause();
+		} while (!ret);
 	const uint64_t sp_end = rte_rdtsc();
 
 	const uint64_t mp_start = rte_rdtsc();
 	for (i = 0; i < iterations; i++)
-		while (rte_ring_mp_enqueue_bulk(r, burst, size, NULL) == 0)
-			rte_pause();
+		do {
+			if (flag == 0)
+				ret = test_ring_enqueue(r, burst, esize, bsize,
+						TEST_RING_THREAD_MPMC |
+						TEST_RING_ELEM_BULK);
+			else if (flag == 1)
+				ret = test_ring_dequeue(r, burst, esize, bsize,
+						TEST_RING_THREAD_MPMC |
+						TEST_RING_ELEM_BULK);
+			if (ret == 0)
+				rte_pause();
+		} while (!ret);
 	const uint64_t mp_end = rte_rdtsc();
 
-	params->spsc = ((double)(sp_end - sp_start))/(iterations*size);
-	params->mpmc = ((double)(mp_end - mp_start))/(iterations*size);
+	p->spsc = ((double)(sp_end - sp_start))/(iterations * bsize);
+	p->mpmc = ((double)(mp_end - mp_start))/(iterations * bsize);
 	return 0;
 }
 
+/*
+ * Function that uses rdtsc to measure timing for ring enqueue. Needs pair
+ * thread running dequeue_bulk function
+ */
+static int
+enqueue_bulk(void *p)
+{
+	struct thread_params *params = p;
+
+	return enqueue_dequeue_bulk_helper(0, -1, params);
+}
+
+static int
+enqueue_bulk_16B(void *p)
+{
+	struct thread_params *params = p;
+
+	return enqueue_dequeue_bulk_helper(0, 16, params);
+}
+
 /*
  * Function that uses rdtsc to measure timing for ring dequeue. Needs pair
  * thread running enqueue_bulk function
@@ -197,49 +267,38 @@ enqueue_bulk(void *p)
 static int
 dequeue_bulk(void *p)
 {
-	const unsigned iter_shift = 23;
-	const unsigned iterations = 1<<iter_shift;
 	struct thread_params *params = p;
-	struct rte_ring *r = params->r;
-	const unsigned size = params->size;
-	unsigned i;
-	void *burst[MAX_BURST] = {0};
-
-#ifdef RTE_USE_C11_MEM_MODEL
-	if (__atomic_add_fetch(&lcore_count, 1, __ATOMIC_RELAXED) != 2)
-#else
-	if (__sync_add_and_fetch(&lcore_count, 1) != 2)
-#endif
-		while(lcore_count != 2)
-			rte_pause();
 
-	const uint64_t sc_start = rte_rdtsc();
-	for (i = 0; i < iterations; i++)
-		while (rte_ring_sc_dequeue_bulk(r, burst, size, NULL) == 0)
-			rte_pause();
-	const uint64_t sc_end = rte_rdtsc();
+	return enqueue_dequeue_bulk_helper(1, -1, params);
+}
 
-	const uint64_t mc_start = rte_rdtsc();
-	for (i = 0; i < iterations; i++)
-		while (rte_ring_mc_dequeue_bulk(r, burst, size, NULL) == 0)
-			rte_pause();
-	const uint64_t mc_end = rte_rdtsc();
+static int
+dequeue_bulk_16B(void *p)
+{
+	struct thread_params *params = p;
 
-	params->spsc = ((double)(sc_end - sc_start))/(iterations*size);
-	params->mpmc = ((double)(mc_end - mc_start))/(iterations*size);
-	return 0;
+	return enqueue_dequeue_bulk_helper(1, 16, params);
 }
 
 /*
  * Function that calls the enqueue and dequeue bulk functions on pairs of cores.
  * used to measure ring perf between hyperthreads, cores and sockets.
  */
-static void
-run_on_core_pair(struct lcore_pair *cores, struct rte_ring *r,
-		lcore_function_t f1, lcore_function_t f2)
+static int
+run_on_core_pair(struct lcore_pair *cores, struct rte_ring *r, const int esize)
 {
+	lcore_function_t *f1, *f2;
 	struct thread_params param1 = {0}, param2 = {0};
 	unsigned i;
+
+	if (esize == -1) {
+		f1 = enqueue_bulk;
+		f2 = dequeue_bulk;
+	} else {
+		f1 = enqueue_bulk_16B;
+		f2 = dequeue_bulk_16B;
+	}
+
 	for (i = 0; i < sizeof(bulk_sizes)/sizeof(bulk_sizes[0]); i++) {
 		lcore_count = 0;
 		param1.size = param2.size = bulk_sizes[i];
@@ -251,14 +310,20 @@ run_on_core_pair(struct lcore_pair *cores, struct rte_ring *r,
 		} else {
 			rte_eal_remote_launch(f1, &param1, cores->c1);
 			rte_eal_remote_launch(f2, &param2, cores->c2);
-			rte_eal_wait_lcore(cores->c1);
-			rte_eal_wait_lcore(cores->c2);
+			if (rte_eal_wait_lcore(cores->c1) < 0)
+				return -1;
+			if (rte_eal_wait_lcore(cores->c2) < 0)
+				return -1;
 		}
-		printf("SP/SC bulk enq/dequeue (size: %u): %.2F\n", bulk_sizes[i],
-				param1.spsc + param2.spsc);
-		printf("MP/MC bulk enq/dequeue (size: %u): %.2F\n", bulk_sizes[i],
-				param1.mpmc + param2.mpmc);
+		test_ring_print_test_string(
+			TEST_RING_THREAD_SPSC | TEST_RING_ELEM_BULK,
+			esize, bulk_sizes[i], param1.spsc + param2.spsc);
+		test_ring_print_test_string(
+			TEST_RING_THREAD_MPMC | TEST_RING_ELEM_BULK,
+			esize, bulk_sizes[i], param1.mpmc + param2.mpmc);
 	}
+
+	return 0;
 }
 
 static rte_atomic32_t synchro;
@@ -267,7 +332,7 @@ static uint64_t queue_count[RTE_MAX_LCORE];
 #define TIME_MS 100
 
 static int
-load_loop_fn(void *p)
+load_loop_fn_helper(struct thread_params *p, const int esize)
 {
 	uint64_t time_diff = 0;
 	uint64_t begin = 0;
@@ -275,7 +340,11 @@ load_loop_fn(void *p)
 	uint64_t lcount = 0;
 	const unsigned int lcore = rte_lcore_id();
 	struct thread_params *params = p;
-	void *burst[MAX_BURST] = {0};
+	void *burst = NULL;
+
+	burst = test_ring_calloc(MAX_BURST, esize);
+	if (burst == NULL)
+		return -1;
 
 	/* wait synchro for slaves */
 	if (lcore != rte_get_master_lcore())
@@ -284,22 +353,49 @@ load_loop_fn(void *p)
 
 	begin = rte_get_timer_cycles();
 	while (time_diff < hz * TIME_MS / 1000) {
-		rte_ring_mp_enqueue_bulk(params->r, burst, params->size, NULL);
-		rte_ring_mc_dequeue_bulk(params->r, burst, params->size, NULL);
+		test_ring_enqueue(params->r, burst, esize, params->size,
+				TEST_RING_THREAD_MPMC | TEST_RING_ELEM_BULK);
+		test_ring_dequeue(params->r, burst, esize, params->size,
+				TEST_RING_THREAD_MPMC | TEST_RING_ELEM_BULK);
 		lcount++;
 		time_diff = rte_get_timer_cycles() - begin;
 	}
 	queue_count[lcore] = lcount;
+
+	rte_free(burst);
+
 	return 0;
 }
 
 static int
-run_on_all_cores(struct rte_ring *r)
+load_loop_fn(void *p)
+{
+	struct thread_params *params = p;
+
+	return load_loop_fn_helper(params, -1);
+}
+
+static int
+load_loop_fn_16B(void *p)
+{
+	struct thread_params *params = p;
+
+	return load_loop_fn_helper(params, 16);
+}
+
+static int
+run_on_all_cores(struct rte_ring *r, const int esize)
 {
 	uint64_t total = 0;
 	struct thread_params param;
+	lcore_function_t *lcore_f;
 	unsigned int i, c;
 
+	if (esize == -1)
+		lcore_f = load_loop_fn;
+	else
+		lcore_f = load_loop_fn_16B;
+
 	memset(&param, 0, sizeof(struct thread_params));
 	for (i = 0; i < RTE_DIM(bulk_sizes); i++) {
 		printf("\nBulk enq/dequeue count on size %u\n", bulk_sizes[i]);
@@ -308,13 +404,12 @@ run_on_all_cores(struct rte_ring *r)
 
 		/* clear synchro and start slaves */
 		rte_atomic32_set(&synchro, 0);
-		if (rte_eal_mp_remote_launch(load_loop_fn, &param,
-			SKIP_MASTER) < 0)
+		if (rte_eal_mp_remote_launch(lcore_f, &param, SKIP_MASTER) < 0)
 			return -1;
 
 		/* start synchro and launch test on master */
 		rte_atomic32_set(&synchro, 1);
-		load_loop_fn(&param);
+		lcore_f(&param);
 
 		rte_eal_mp_wait_lcore();
 
@@ -335,155 +430,152 @@ run_on_all_cores(struct rte_ring *r)
  * Test function that determines how long an enqueue + dequeue of a single item
  * takes on a single lcore. Result is for comparison with the bulk enq+deq.
  */
-static void
-test_single_enqueue_dequeue(struct rte_ring *r)
+static int
+test_single_enqueue_dequeue(struct rte_ring *r, const int esize,
+	const unsigned int api_type)
 {
-	const unsigned iter_shift = 24;
-	const unsigned iterations = 1<<iter_shift;
-	unsigned i = 0;
+	const unsigned int iter_shift = 24;
+	const unsigned int iterations = 1 << iter_shift;
+	unsigned int i = 0;
 	void *burst = NULL;
 
-	const uint64_t sc_start = rte_rdtsc();
-	for (i = 0; i < iterations; i++) {
-		rte_ring_sp_enqueue(r, burst);
-		rte_ring_sc_dequeue(r, &burst);
-	}
-	const uint64_t sc_end = rte_rdtsc();
+	/* alloc dummy object pointers */
+	burst = test_ring_calloc(1, esize);
+	if (burst == NULL)
+		return -1;
 
-	const uint64_t mc_start = rte_rdtsc();
+	const uint64_t start = rte_rdtsc();
 	for (i = 0; i < iterations; i++) {
-		rte_ring_mp_enqueue(r, burst);
-		rte_ring_mc_dequeue(r, &burst);
+		test_ring_enqueue(r, burst, esize, 1, api_type);
+		test_ring_dequeue(r, burst, esize, 1, api_type);
 	}
-	const uint64_t mc_end = rte_rdtsc();
+	const uint64_t end = rte_rdtsc();
+
+	test_ring_print_test_string(api_type, esize, 1,
+					((double)(end - start)) / iterations);
 
-	printf("SP/SC single enq/dequeue: %.2F\n",
-			((double)(sc_end-sc_start)) / iterations);
-	printf("MP/MC single enq/dequeue: %.2F\n",
-			((double)(mc_end-mc_start)) / iterations);
+	rte_free(burst);
+
+	return 0;
 }
 
 /*
- * Test that does both enqueue and dequeue on a core using the burst() API calls
- * instead of the bulk() calls used in other tests. Results should be the same
- * as for the bulk function called on a single lcore.
+ * Test that does both enqueue and dequeue on a core using the burst/bulk API
+ * calls. Results should be the same as for the bulk function called on a
+ * single lcore.
  */
-static void
-test_burst_enqueue_dequeue(struct rte_ring *r)
+static int
+test_burst_bulk_enqueue_dequeue(struct rte_ring *r, const int esize,
+	const unsigned int api_type)
 {
-	const unsigned iter_shift = 23;
-	const unsigned iterations = 1<<iter_shift;
-	unsigned sz, i = 0;
-	void *burst[MAX_BURST] = {0};
+	const unsigned int iter_shift = 23;
+	const unsigned int iterations = 1 << iter_shift;
+	unsigned int sz, i = 0;
+	void **burst = NULL;
 
-	for (sz = 0; sz < sizeof(bulk_sizes)/sizeof(bulk_sizes[0]); sz++) {
-		const uint64_t sc_start = rte_rdtsc();
-		for (i = 0; i < iterations; i++) {
-			rte_ring_sp_enqueue_burst(r, burst,
-					bulk_sizes[sz], NULL);
-			rte_ring_sc_dequeue_burst(r, burst,
-					bulk_sizes[sz], NULL);
-		}
-		const uint64_t sc_end = rte_rdtsc();
+	burst = test_ring_calloc(MAX_BURST, esize);
+	if (burst == NULL)
+		return -1;
 
-		const uint64_t mc_start = rte_rdtsc();
+	for (sz = 0; sz < RTE_DIM(bulk_sizes); sz++) {
+		const uint64_t start = rte_rdtsc();
 		for (i = 0; i < iterations; i++) {
-			rte_ring_mp_enqueue_burst(r, burst,
-					bulk_sizes[sz], NULL);
-			rte_ring_mc_dequeue_burst(r, burst,
-					bulk_sizes[sz], NULL);
+			test_ring_enqueue(r, burst, esize, bulk_sizes[sz],
+						api_type);
+			test_ring_dequeue(r, burst, esize, bulk_sizes[sz],
+						api_type);
 		}
-		const uint64_t mc_end = rte_rdtsc();
-
-		double mc_avg = ((double)(mc_end-mc_start) / iterations) /
-					bulk_sizes[sz];
-		double sc_avg = ((double)(sc_end-sc_start) / iterations) /
-					bulk_sizes[sz];
+		const uint64_t end = rte_rdtsc();
 
-		printf("SP/SC burst enq/dequeue (size: %u): %.2F\n",
-				bulk_sizes[sz], sc_avg);
-		printf("MP/MC burst enq/dequeue (size: %u): %.2F\n",
-				bulk_sizes[sz], mc_avg);
+		test_ring_print_test_string(api_type, esize, bulk_sizes[sz],
+					((double)(end - start)) / iterations);
 	}
-}
 
-/* Times enqueue and dequeue on a single lcore */
-static void
-test_bulk_enqueue_dequeue(struct rte_ring *r)
-{
-	const unsigned iter_shift = 23;
-	const unsigned iterations = 1<<iter_shift;
-	unsigned sz, i = 0;
-	void *burst[MAX_BURST] = {0};
+	rte_free(burst);
 
-	for (sz = 0; sz < sizeof(bulk_sizes)/sizeof(bulk_sizes[0]); sz++) {
-		const uint64_t sc_start = rte_rdtsc();
-		for (i = 0; i < iterations; i++) {
-			rte_ring_sp_enqueue_bulk(r, burst,
-					bulk_sizes[sz], NULL);
-			rte_ring_sc_dequeue_bulk(r, burst,
-					bulk_sizes[sz], NULL);
-		}
-		const uint64_t sc_end = rte_rdtsc();
-
-		const uint64_t mc_start = rte_rdtsc();
-		for (i = 0; i < iterations; i++) {
-			rte_ring_mp_enqueue_bulk(r, burst,
-					bulk_sizes[sz], NULL);
-			rte_ring_mc_dequeue_bulk(r, burst,
-					bulk_sizes[sz], NULL);
-		}
-		const uint64_t mc_end = rte_rdtsc();
-
-		double sc_avg = ((double)(sc_end-sc_start) /
-				(iterations * bulk_sizes[sz]));
-		double mc_avg = ((double)(mc_end-mc_start) /
-				(iterations * bulk_sizes[sz]));
-
-		printf("SP/SC bulk enq/dequeue (size: %u): %.2F\n", bulk_sizes[sz],
-				sc_avg);
-		printf("MP/MC bulk enq/dequeue (size: %u): %.2F\n", bulk_sizes[sz],
-				mc_avg);
-	}
+	return 0;
 }
 
-static int
-test_ring_perf(void)
+/* Run all tests for a given element size */
+static __rte_always_inline int
+test_ring_perf_esize(const int esize)
 {
 	struct lcore_pair cores;
 	struct rte_ring *r = NULL;
 
-	r = rte_ring_create(RING_NAME, RING_SIZE, rte_socket_id(), 0);
+	/*
+	 * Performance test for legacy/_elem APIs
+	 * SP-SC/MP-MC, single
+	 */
+	r = test_ring_create(RING_NAME, esize, RING_SIZE, rte_socket_id(), 0);
 	if (r == NULL)
 		return -1;
 
-	printf("### Testing single element and burst enq/deq ###\n");
-	test_single_enqueue_dequeue(r);
-	test_burst_enqueue_dequeue(r);
+	printf("\n### Testing single element enq/deq ###\n");
+	if (test_single_enqueue_dequeue(r, esize,
+			TEST_RING_THREAD_SPSC | TEST_RING_ELEM_SINGLE) < 0)
+		return -1;
+	if (test_single_enqueue_dequeue(r, esize,
+			TEST_RING_THREAD_MPMC | TEST_RING_ELEM_SINGLE) < 0)
+		return -1;
+
+	printf("\n### Testing burst enq/deq ###\n");
+	if (test_burst_bulk_enqueue_dequeue(r, esize,
+			TEST_RING_THREAD_SPSC | TEST_RING_ELEM_BURST) < 0)
+		return -1;
+	if (test_burst_bulk_enqueue_dequeue(r, esize,
+			TEST_RING_THREAD_MPMC | TEST_RING_ELEM_BURST) < 0)
+		return -1;
 
-	printf("\n### Testing empty dequeue ###\n");
-	test_empty_dequeue(r);
+	printf("\n### Testing bulk enq/deq ###\n");
+	if (test_burst_bulk_enqueue_dequeue(r, esize,
+			TEST_RING_THREAD_SPSC | TEST_RING_ELEM_BULK) < 0)
+		return -1;
+	if (test_burst_bulk_enqueue_dequeue(r, esize,
+			TEST_RING_THREAD_MPMC | TEST_RING_ELEM_BULK) < 0)
+		return -1;
 
-	printf("\n### Testing using a single lcore ###\n");
-	test_bulk_enqueue_dequeue(r);
+	printf("\n### Testing empty bulk deq ###\n");
+	test_empty_dequeue(r, esize,
+			TEST_RING_THREAD_SPSC | TEST_RING_ELEM_BULK);
+	test_empty_dequeue(r, esize,
+			TEST_RING_THREAD_MPMC | TEST_RING_ELEM_BULK);
 
 	if (get_two_hyperthreads(&cores) == 0) {
 		printf("\n### Testing using two hyperthreads ###\n");
-		run_on_core_pair(&cores, r, enqueue_bulk, dequeue_bulk);
+		if (run_on_core_pair(&cores, r, esize) < 0)
+			return -1;
 	}
 	if (get_two_cores(&cores) == 0) {
 		printf("\n### Testing using two physical cores ###\n");
-		run_on_core_pair(&cores, r, enqueue_bulk, dequeue_bulk);
+		if (run_on_core_pair(&cores, r, esize) < 0)
+			return -1;
 	}
 	if (get_two_sockets(&cores) == 0) {
 		printf("\n### Testing using two NUMA nodes ###\n");
-		run_on_core_pair(&cores, r, enqueue_bulk, dequeue_bulk);
+		if (run_on_core_pair(&cores, r, esize) < 0)
+			return -1;
 	}
 
 	printf("\n### Testing using all slave nodes ###\n");
-	run_on_all_cores(r);
+	if (run_on_all_cores(r, esize) < 0)
+		return -1;
 
 	rte_ring_free(r);
+
+	return 0;
+}
+
+static int
+test_ring_perf(void)
+{
+	/* Run all the tests for different element sizes */
+	if (test_ring_perf_esize(-1) == -1)
+		return -1;
+
+	if (test_ring_perf_esize(16) == -1)
+		return -1;
+
 	return 0;
 }
 
-- 
2.17.1
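
Note on the error paths in test_ring_perf_esize() above: each early
"return -1" taken after the ring has been created appears to skip
rte_ring_free(r). A common way to avoid that is a single cleanup label.
The sketch below is standalone C and does not use DPDK; fake_ring,
fake_step and run_steps are hypothetical stand-ins, not names from the
patch.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct rte_ring. */
struct fake_ring { int unused; };

static int live_rings; /* rings created but not yet freed */

static struct fake_ring *fake_ring_create(void)
{
	struct fake_ring *r = malloc(sizeof(*r));

	if (r != NULL)
		live_rings++;
	return r;
}

static void fake_ring_free(struct fake_ring *r)
{
	if (r == NULL)
		return;
	assert(live_rings > 0);
	free(r);
	live_rings--;
}

/* Hypothetical test step: fails when asked to. */
static int fake_step(int fail)
{
	return fail ? -1 : 0;
}

/* Every exit path after creation goes through the "out" label. */
static int run_steps(int fail_first, int fail_second)
{
	int ret = -1;
	struct fake_ring *r = fake_ring_create();

	if (r == NULL)
		return -1;

	if (fake_step(fail_first) < 0)
		goto out;
	if (fake_step(fail_second) < 0)
		goto out;

	ret = 0;
out:
	fake_ring_free(r);
	return ret;
}
```

With this shape, a failing step still releases the ring before the
function returns.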


^ permalink raw reply	[flat|nested] 173+ messages in thread

* [dpdk-dev] [PATCH v8 5/6] lib/hash: use ring with 32b element size to save memory
  2020-01-13 17:25   ` [dpdk-dev] [PATCH v8 0/6] lib/ring: APIs to support custom element size Honnappa Nagarahalli
                       ` (3 preceding siblings ...)
  2020-01-13 17:25     ` [dpdk-dev] [PATCH v8 4/6] test/ring: modify perf test cases to use " Honnappa Nagarahalli
@ 2020-01-13 17:25     ` Honnappa Nagarahalli
  2020-01-13 17:25     ` [dpdk-dev] [PATCH v8 6/6] lib/eventdev: use custom element size ring for event rings Honnappa Nagarahalli
  5 siblings, 0 replies; 173+ messages in thread
From: Honnappa Nagarahalli @ 2020-01-13 17:25 UTC (permalink / raw)
  To: olivier.matz, sthemmin, jerinj, bruce.richardson, david.marchand,
	pbhagavatula, konstantin.ananyev, honnappa.nagarahalli
  Cc: dev, dharmik.thakkar, ruifeng.wang, gavin.hu, nd

The freelist and external bucket indices are 32b wide. Storing them
in rings with 32b elements, rather than in pointer-sized elements,
saves memory.

Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Gavin Hu <gavin.hu@arm.com>
Reviewed-by: Ola Liljedahl <ola.liljedahl@arm.com>
---
 lib/librte_hash/rte_cuckoo_hash.c | 97 ++++++++++++++++---------------
 lib/librte_hash/rte_cuckoo_hash.h |  2 +-
 2 files changed, 51 insertions(+), 48 deletions(-)

diff --git a/lib/librte_hash/rte_cuckoo_hash.c b/lib/librte_hash/rte_cuckoo_hash.c
index 87a4c01f2..734bec2ac 100644
--- a/lib/librte_hash/rte_cuckoo_hash.c
+++ b/lib/librte_hash/rte_cuckoo_hash.c
@@ -24,7 +24,7 @@
 #include <rte_cpuflags.h>
 #include <rte_rwlock.h>
 #include <rte_spinlock.h>
-#include <rte_ring.h>
+#include <rte_ring_elem.h>
 #include <rte_compat.h>
 #include <rte_vect.h>
 #include <rte_tailq.h>
@@ -136,7 +136,6 @@ rte_hash_create(const struct rte_hash_parameters *params)
 	char ring_name[RTE_RING_NAMESIZE];
 	char ext_ring_name[RTE_RING_NAMESIZE];
 	unsigned num_key_slots;
-	unsigned i;
 	unsigned int hw_trans_mem_support = 0, use_local_cache = 0;
 	unsigned int ext_table_support = 0;
 	unsigned int readwrite_concur_support = 0;
@@ -213,8 +212,8 @@ rte_hash_create(const struct rte_hash_parameters *params)
 
 	snprintf(ring_name, sizeof(ring_name), "HT_%s", params->name);
 	/* Create ring (Dummy slot index is not enqueued) */
-	r = rte_ring_create(ring_name, rte_align32pow2(num_key_slots),
-			params->socket_id, 0);
+	r = rte_ring_create_elem(ring_name, sizeof(uint32_t),
+			rte_align32pow2(num_key_slots), params->socket_id, 0);
 	if (r == NULL) {
 		RTE_LOG(ERR, HASH, "memory allocation failed\n");
 		goto err;
@@ -227,7 +226,7 @@ rte_hash_create(const struct rte_hash_parameters *params)
 	if (ext_table_support) {
 		snprintf(ext_ring_name, sizeof(ext_ring_name), "HT_EXT_%s",
 								params->name);
-		r_ext = rte_ring_create(ext_ring_name,
+		r_ext = rte_ring_create_elem(ext_ring_name, sizeof(uint32_t),
 				rte_align32pow2(num_buckets + 1),
 				params->socket_id, 0);
 
@@ -294,8 +293,8 @@ rte_hash_create(const struct rte_hash_parameters *params)
 		 * use bucket index for the linked list and 0 means NULL
 		 * for next bucket
 		 */
-		for (i = 1; i <= num_buckets; i++)
-			rte_ring_sp_enqueue(r_ext, (void *)((uintptr_t) i));
+		for (uint32_t i = 1; i <= num_buckets; i++)
+			rte_ring_sp_enqueue_elem(r_ext, &i, sizeof(uint32_t));
 
 		if (readwrite_concur_lf_support) {
 			ext_bkt_to_free = rte_zmalloc(NULL, sizeof(uint32_t) *
@@ -433,8 +432,8 @@ rte_hash_create(const struct rte_hash_parameters *params)
 	}
 
 	/* Populate free slots ring. Entry zero is reserved for key misses. */
-	for (i = 1; i < num_key_slots; i++)
-		rte_ring_sp_enqueue(r, (void *)((uintptr_t) i));
+	for (uint32_t i = 1; i < num_key_slots; i++)
+		rte_ring_sp_enqueue_elem(r, &i, sizeof(uint32_t));
 
 	te->data = (void *) h;
 	TAILQ_INSERT_TAIL(hash_list, te, next);
@@ -598,13 +597,13 @@ rte_hash_reset(struct rte_hash *h)
 		tot_ring_cnt = h->entries;
 
 	for (i = 1; i < tot_ring_cnt + 1; i++)
-		rte_ring_sp_enqueue(h->free_slots, (void *)((uintptr_t) i));
+		rte_ring_sp_enqueue_elem(h->free_slots, &i, sizeof(uint32_t));
 
 	/* Repopulate the free ext bkt ring. */
 	if (h->ext_table_support) {
 		for (i = 1; i <= h->num_buckets; i++)
-			rte_ring_sp_enqueue(h->free_ext_bkts,
-						(void *)((uintptr_t) i));
+			rte_ring_sp_enqueue_elem(h->free_ext_bkts, &i,
+							sizeof(uint32_t));
 	}
 
 	if (h->use_local_cache) {
@@ -623,13 +622,14 @@ rte_hash_reset(struct rte_hash *h)
 static inline void
 enqueue_slot_back(const struct rte_hash *h,
 		struct lcore_cache *cached_free_slots,
-		void *slot_id)
+		uint32_t slot_id)
 {
 	if (h->use_local_cache) {
 		cached_free_slots->objs[cached_free_slots->len] = slot_id;
 		cached_free_slots->len++;
 	} else
-		rte_ring_sp_enqueue(h->free_slots, slot_id);
+		rte_ring_sp_enqueue_elem(h->free_slots, &slot_id,
+						sizeof(uint32_t));
 }
 
 /* Search a key from bucket and update its data.
@@ -923,9 +923,8 @@ __rte_hash_add_key_with_hash(const struct rte_hash *h, const void *key,
 	uint32_t prim_bucket_idx, sec_bucket_idx;
 	struct rte_hash_bucket *prim_bkt, *sec_bkt, *cur_bkt;
 	struct rte_hash_key *new_k, *keys = h->key_store;
-	void *slot_id = NULL;
-	void *ext_bkt_id = NULL;
-	uint32_t new_idx, bkt_id;
+	uint32_t slot_id;
+	uint32_t ext_bkt_id;
 	int ret;
 	unsigned n_slots;
 	unsigned lcore_id;
@@ -968,8 +967,9 @@ __rte_hash_add_key_with_hash(const struct rte_hash *h, const void *key,
 		/* Try to get a free slot from the local cache */
 		if (cached_free_slots->len == 0) {
 			/* Need to get another burst of free slots from global ring */
-			n_slots = rte_ring_mc_dequeue_burst(h->free_slots,
+			n_slots = rte_ring_mc_dequeue_burst_elem(h->free_slots,
 					cached_free_slots->objs,
+					sizeof(uint32_t),
 					LCORE_CACHE_SIZE, NULL);
 			if (n_slots == 0) {
 				return -ENOSPC;
@@ -982,13 +982,13 @@ __rte_hash_add_key_with_hash(const struct rte_hash *h, const void *key,
 		cached_free_slots->len--;
 		slot_id = cached_free_slots->objs[cached_free_slots->len];
 	} else {
-		if (rte_ring_sc_dequeue(h->free_slots, &slot_id) != 0) {
+		if (rte_ring_sc_dequeue_elem(h->free_slots, &slot_id,
+						sizeof(uint32_t)) != 0) {
 			return -ENOSPC;
 		}
 	}
 
-	new_k = RTE_PTR_ADD(keys, (uintptr_t)slot_id * h->key_entry_size);
-	new_idx = (uint32_t)((uintptr_t) slot_id);
+	new_k = RTE_PTR_ADD(keys, slot_id * h->key_entry_size);
 	/* The store to application data (by the application) at *data should
 	 * not leak after the store of pdata in the key store. i.e. pdata is
 	 * the guard variable. Release the application data to the readers.
@@ -1001,9 +1001,9 @@ __rte_hash_add_key_with_hash(const struct rte_hash *h, const void *key,
 
 	/* Find an empty slot and insert */
 	ret = rte_hash_cuckoo_insert_mw(h, prim_bkt, sec_bkt, key, data,
-					short_sig, new_idx, &ret_val);
+					short_sig, slot_id, &ret_val);
 	if (ret == 0)
-		return new_idx - 1;
+		return slot_id - 1;
 	else if (ret == 1) {
 		enqueue_slot_back(h, cached_free_slots, slot_id);
 		return ret_val;
@@ -1011,9 +1011,9 @@ __rte_hash_add_key_with_hash(const struct rte_hash *h, const void *key,
 
 	/* Primary bucket full, need to make space for new entry */
 	ret = rte_hash_cuckoo_make_space_mw(h, prim_bkt, sec_bkt, key, data,
-				short_sig, prim_bucket_idx, new_idx, &ret_val);
+				short_sig, prim_bucket_idx, slot_id, &ret_val);
 	if (ret == 0)
-		return new_idx - 1;
+		return slot_id - 1;
 	else if (ret == 1) {
 		enqueue_slot_back(h, cached_free_slots, slot_id);
 		return ret_val;
@@ -1021,10 +1021,10 @@ __rte_hash_add_key_with_hash(const struct rte_hash *h, const void *key,
 
 	/* Also search secondary bucket to get better occupancy */
 	ret = rte_hash_cuckoo_make_space_mw(h, sec_bkt, prim_bkt, key, data,
-				short_sig, sec_bucket_idx, new_idx, &ret_val);
+				short_sig, sec_bucket_idx, slot_id, &ret_val);
 
 	if (ret == 0)
-		return new_idx - 1;
+		return slot_id - 1;
 	else if (ret == 1) {
 		enqueue_slot_back(h, cached_free_slots, slot_id);
 		return ret_val;
@@ -1067,10 +1067,10 @@ __rte_hash_add_key_with_hash(const struct rte_hash *h, const void *key,
 				 * and key.
 				 */
 				__atomic_store_n(&cur_bkt->key_idx[i],
-						 new_idx,
+						 slot_id,
 						 __ATOMIC_RELEASE);
 				__hash_rw_writer_unlock(h);
-				return new_idx - 1;
+				return slot_id - 1;
 			}
 		}
 	}
@@ -1078,26 +1078,26 @@ __rte_hash_add_key_with_hash(const struct rte_hash *h, const void *key,
 	/* Failed to get an empty entry from extendable buckets. Link a new
 	 * extendable bucket. We first get a free bucket from ring.
 	 */
-	if (rte_ring_sc_dequeue(h->free_ext_bkts, &ext_bkt_id) != 0) {
+	if (rte_ring_sc_dequeue_elem(h->free_ext_bkts, &ext_bkt_id,
+						sizeof(uint32_t)) != 0) {
 		ret = -ENOSPC;
 		goto failure;
 	}
 
-	bkt_id = (uint32_t)((uintptr_t)ext_bkt_id) - 1;
 	/* Use the first location of the new bucket */
-	(h->buckets_ext[bkt_id]).sig_current[0] = short_sig;
+	(h->buckets_ext[ext_bkt_id - 1]).sig_current[0] = short_sig;
 	/* Store to signature and key should not leak after
 	 * the store to key_idx. i.e. key_idx is the guard variable
 	 * for signature and key.
 	 */
-	__atomic_store_n(&(h->buckets_ext[bkt_id]).key_idx[0],
-			 new_idx,
+	__atomic_store_n(&(h->buckets_ext[ext_bkt_id - 1]).key_idx[0],
+			 slot_id,
 			 __ATOMIC_RELEASE);
 	/* Link the new bucket to sec bucket linked list */
 	last = rte_hash_get_last_bkt(sec_bkt);
-	last->next = &h->buckets_ext[bkt_id];
+	last->next = &h->buckets_ext[ext_bkt_id - 1];
 	__hash_rw_writer_unlock(h);
-	return new_idx - 1;
+	return slot_id - 1;
 
 failure:
 	__hash_rw_writer_unlock(h);
@@ -1373,8 +1373,9 @@ remove_entry(const struct rte_hash *h, struct rte_hash_bucket *bkt, unsigned i)
 		/* Cache full, need to free it. */
 		if (cached_free_slots->len == LCORE_CACHE_SIZE) {
 			/* Need to enqueue the free slots in global ring. */
-			n_slots = rte_ring_mp_enqueue_burst(h->free_slots,
+			n_slots = rte_ring_mp_enqueue_burst_elem(h->free_slots,
 						cached_free_slots->objs,
+						sizeof(uint32_t),
 						LCORE_CACHE_SIZE, NULL);
 			ERR_IF_TRUE((n_slots == 0),
 				"%s: could not enqueue free slots in global ring\n",
@@ -1383,11 +1384,11 @@ remove_entry(const struct rte_hash *h, struct rte_hash_bucket *bkt, unsigned i)
 		}
 		/* Put index of new free slot in cache. */
 		cached_free_slots->objs[cached_free_slots->len] =
-				(void *)((uintptr_t)bkt->key_idx[i]);
+							bkt->key_idx[i];
 		cached_free_slots->len++;
 	} else {
-		rte_ring_sp_enqueue(h->free_slots,
-				(void *)((uintptr_t)bkt->key_idx[i]));
+		rte_ring_sp_enqueue_elem(h->free_slots,
+				&bkt->key_idx[i], sizeof(uint32_t));
 	}
 }
 
@@ -1551,7 +1552,8 @@ __rte_hash_del_key_with_hash(const struct rte_hash *h, const void *key,
 			 */
 			h->ext_bkt_to_free[ret] = index;
 		else
-			rte_ring_sp_enqueue(h->free_ext_bkts, (void *)(uintptr_t)index);
+			rte_ring_sp_enqueue_elem(h->free_ext_bkts, &index,
+							sizeof(uint32_t));
 	}
 	__hash_rw_writer_unlock(h);
 	return ret;
@@ -1614,7 +1616,8 @@ rte_hash_free_key_with_position(const struct rte_hash *h,
 		uint32_t index = h->ext_bkt_to_free[position];
 		if (index) {
 			/* Recycle empty ext bkt to free list. */
-			rte_ring_sp_enqueue(h->free_ext_bkts, (void *)(uintptr_t)index);
+			rte_ring_sp_enqueue_elem(h->free_ext_bkts, &index,
+							sizeof(uint32_t));
 			h->ext_bkt_to_free[position] = 0;
 		}
 	}
@@ -1625,19 +1628,19 @@ rte_hash_free_key_with_position(const struct rte_hash *h,
 		/* Cache full, need to free it. */
 		if (cached_free_slots->len == LCORE_CACHE_SIZE) {
 			/* Need to enqueue the free slots in global ring. */
-			n_slots = rte_ring_mp_enqueue_burst(h->free_slots,
+			n_slots = rte_ring_mp_enqueue_burst_elem(h->free_slots,
 						cached_free_slots->objs,
+						sizeof(uint32_t),
 						LCORE_CACHE_SIZE, NULL);
 			RETURN_IF_TRUE((n_slots == 0), -EFAULT);
 			cached_free_slots->len -= n_slots;
 		}
 		/* Put index of new free slot in cache. */
-		cached_free_slots->objs[cached_free_slots->len] =
-					(void *)((uintptr_t)key_idx);
+		cached_free_slots->objs[cached_free_slots->len] = key_idx;
 		cached_free_slots->len++;
 	} else {
-		rte_ring_sp_enqueue(h->free_slots,
-				(void *)((uintptr_t)key_idx));
+		rte_ring_sp_enqueue_elem(h->free_slots, &key_idx,
+						sizeof(uint32_t));
 	}
 
 	return 0;
diff --git a/lib/librte_hash/rte_cuckoo_hash.h b/lib/librte_hash/rte_cuckoo_hash.h
index fb19bb27d..345de6bf9 100644
--- a/lib/librte_hash/rte_cuckoo_hash.h
+++ b/lib/librte_hash/rte_cuckoo_hash.h
@@ -124,7 +124,7 @@ const rte_hash_cmp_eq_t cmp_jump_table[NUM_KEY_CMP_CASES] = {
 
 struct lcore_cache {
 	unsigned len; /**< Cache len */
-	void *objs[LCORE_CACHE_SIZE]; /**< Cache objects */
+	uint32_t objs[LCORE_CACHE_SIZE]; /**< Cache objects */
 } __rte_cache_aligned;
 
 /* Structure that stores key-value pair */
-- 
2.17.1
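
To put a rough number on the saving described in the commit message
above, the sketch below compares the size of the element array for
pointer-sized and 32b elements. It is standalone C and does not call
DPDK; slot_array_bytes and bytes_saved are hypothetical helper names,
and the ring header is deliberately left out of the calculation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bytes needed for the element array alone (ring header excluded). */
static size_t slot_array_bytes(size_t count, size_t esize)
{
	assert(esize != 0 && count <= SIZE_MAX / esize);
	return count * esize;
}

/* Bytes saved by storing 32b indices instead of pointer-sized slots. */
static size_t bytes_saved(size_t count)
{
	return slot_array_bytes(count, sizeof(void *)) -
		slot_array_bytes(count, sizeof(uint32_t));
}
```

On a target with 8-byte pointers the element array is halved; on a
target with 4-byte pointers the two sizes are the same and nothing is
saved. The same reasoning applies to the objs[] array in struct
lcore_cache, which the patch changes from void * to uint32_t.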


^ permalink raw reply	[flat|nested] 173+ messages in thread

* [dpdk-dev] [PATCH v8 6/6] lib/eventdev: use custom element size ring for event rings
  2020-01-13 17:25   ` [dpdk-dev] [PATCH v8 0/6] lib/ring: APIs to support custom element size Honnappa Nagarahalli
                       ` (4 preceding siblings ...)
  2020-01-13 17:25     ` [dpdk-dev] [PATCH v8 5/6] lib/hash: use ring with 32b element size to save memory Honnappa Nagarahalli
@ 2020-01-13 17:25     ` Honnappa Nagarahalli
       [not found]       ` <1578977880-13011-1-git-send-email-robot@bytheb.org>
  5 siblings, 1 reply; 173+ messages in thread
From: Honnappa Nagarahalli @ 2020-01-13 17:25 UTC (permalink / raw)
  To: olivier.matz, sthemmin, jerinj, bruce.richardson, david.marchand,
	pbhagavatula, konstantin.ananyev, honnappa.nagarahalli
  Cc: dev, dharmik.thakkar, ruifeng.wang, gavin.hu, nd

Use the custom element size ring APIs to replace the event ring
implementation. This avoids code duplication.

Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Gavin Hu <gavin.hu@arm.com>
Reviewed-by: Ola Liljedahl <ola.liljedahl@arm.com>
---
 lib/librte_eventdev/rte_event_ring.c | 147 ++-------------------------
 lib/librte_eventdev/rte_event_ring.h |  45 ++++----
 2 files changed, 24 insertions(+), 168 deletions(-)

diff --git a/lib/librte_eventdev/rte_event_ring.c b/lib/librte_eventdev/rte_event_ring.c
index 50190de01..d27e23901 100644
--- a/lib/librte_eventdev/rte_event_ring.c
+++ b/lib/librte_eventdev/rte_event_ring.c
@@ -1,5 +1,6 @@
 /* SPDX-License-Identifier: BSD-3-Clause
  * Copyright(c) 2017 Intel Corporation
+ * Copyright(c) 2019 Arm Limited
  */
 
 #include <sys/queue.h>
@@ -11,13 +12,6 @@
 #include <rte_eal_memconfig.h>
 #include "rte_event_ring.h"
 
-TAILQ_HEAD(rte_event_ring_list, rte_tailq_entry);
-
-static struct rte_tailq_elem rte_event_ring_tailq = {
-	.name = RTE_TAILQ_EVENT_RING_NAME,
-};
-EAL_REGISTER_TAILQ(rte_event_ring_tailq)
-
 int
 rte_event_ring_init(struct rte_event_ring *r, const char *name,
 	unsigned int count, unsigned int flags)
@@ -35,150 +29,21 @@ struct rte_event_ring *
 rte_event_ring_create(const char *name, unsigned int count, int socket_id,
 		unsigned int flags)
 {
-	char mz_name[RTE_MEMZONE_NAMESIZE];
-	struct rte_event_ring *r;
-	struct rte_tailq_entry *te;
-	const struct rte_memzone *mz;
-	ssize_t ring_size;
-	int mz_flags = 0;
-	struct rte_event_ring_list *ring_list = NULL;
-	const unsigned int requested_count = count;
-	int ret;
-
-	ring_list = RTE_TAILQ_CAST(rte_event_ring_tailq.head,
-		rte_event_ring_list);
-
-	/* for an exact size ring, round up from count to a power of two */
-	if (flags & RING_F_EXACT_SZ)
-		count = rte_align32pow2(count + 1);
-	else if (!rte_is_power_of_2(count)) {
-		rte_errno = EINVAL;
-		return NULL;
-	}
-
-	ring_size = sizeof(*r) + (count * sizeof(struct rte_event));
-
-	ret = snprintf(mz_name, sizeof(mz_name), "%s%s",
-		RTE_RING_MZ_PREFIX, name);
-	if (ret < 0 || ret >= (int)sizeof(mz_name)) {
-		rte_errno = ENAMETOOLONG;
-		return NULL;
-	}
-
-	te = rte_zmalloc("RING_TAILQ_ENTRY", sizeof(*te), 0);
-	if (te == NULL) {
-		RTE_LOG(ERR, RING, "Cannot reserve memory for tailq\n");
-		rte_errno = ENOMEM;
-		return NULL;
-	}
-
-	rte_mcfg_tailq_write_lock();
-
-	/*
-	 * reserve a memory zone for this ring. If we can't get rte_config or
-	 * we are secondary process, the memzone_reserve function will set
-	 * rte_errno for us appropriately - hence no check in this this function
-	 */
-	mz = rte_memzone_reserve(mz_name, ring_size, socket_id, mz_flags);
-	if (mz != NULL) {
-		r = mz->addr;
-		/* Check return value in case rte_ring_init() fails on size */
-		int err = rte_event_ring_init(r, name, requested_count, flags);
-		if (err) {
-			RTE_LOG(ERR, RING, "Ring init failed\n");
-			if (rte_memzone_free(mz) != 0)
-				RTE_LOG(ERR, RING, "Cannot free memzone\n");
-			rte_free(te);
-			rte_mcfg_tailq_write_unlock();
-			return NULL;
-		}
-
-		te->data = (void *) r;
-		r->r.memzone = mz;
-
-		TAILQ_INSERT_TAIL(ring_list, te, next);
-	} else {
-		r = NULL;
-		RTE_LOG(ERR, RING, "Cannot reserve memory\n");
-		rte_free(te);
-	}
-	rte_mcfg_tailq_write_unlock();
-
-	return r;
+	return (struct rte_event_ring *)rte_ring_create_elem(name,
+						sizeof(struct rte_event),
+						count, socket_id, flags);
 }
 
 
 struct rte_event_ring *
 rte_event_ring_lookup(const char *name)
 {
-	struct rte_tailq_entry *te;
-	struct rte_event_ring *r = NULL;
-	struct rte_event_ring_list *ring_list;
-
-	ring_list = RTE_TAILQ_CAST(rte_event_ring_tailq.head,
-			rte_event_ring_list);
-
-	rte_mcfg_tailq_read_lock();
-
-	TAILQ_FOREACH(te, ring_list, next) {
-		r = (struct rte_event_ring *) te->data;
-		if (strncmp(name, r->r.name, RTE_RING_NAMESIZE) == 0)
-			break;
-	}
-
-	rte_mcfg_tailq_read_unlock();
-
-	if (te == NULL) {
-		rte_errno = ENOENT;
-		return NULL;
-	}
-
-	return r;
+	return (struct rte_event_ring *)rte_ring_lookup(name);
 }
 
 /* free the ring */
 void
 rte_event_ring_free(struct rte_event_ring *r)
 {
-	struct rte_event_ring_list *ring_list = NULL;
-	struct rte_tailq_entry *te;
-
-	if (r == NULL)
-		return;
-
-	/*
-	 * Ring was not created with rte_event_ring_create,
-	 * therefore, there is no memzone to free.
-	 */
-	if (r->r.memzone == NULL) {
-		RTE_LOG(ERR, RING,
-			"Cannot free ring (not created with rte_event_ring_create()");
-		return;
-	}
-
-	if (rte_memzone_free(r->r.memzone) != 0) {
-		RTE_LOG(ERR, RING, "Cannot free memory\n");
-		return;
-	}
-
-	ring_list = RTE_TAILQ_CAST(rte_event_ring_tailq.head,
-			rte_event_ring_list);
-	rte_mcfg_tailq_write_lock();
-
-	/* find out tailq entry */
-	TAILQ_FOREACH(te, ring_list, next) {
-		if (te->data == (void *) r)
-			break;
-	}
-
-	if (te == NULL) {
-		rte_mcfg_tailq_write_unlock();
-		return;
-	}
-
-	TAILQ_REMOVE(ring_list, te, next);
-
-	rte_mcfg_tailq_write_unlock();
-
-	rte_free(te);
+	rte_ring_free((struct rte_ring *)r);
 }
diff --git a/lib/librte_eventdev/rte_event_ring.h b/lib/librte_eventdev/rte_event_ring.h
index 827a3209e..c0861b0ec 100644
--- a/lib/librte_eventdev/rte_event_ring.h
+++ b/lib/librte_eventdev/rte_event_ring.h
@@ -1,5 +1,6 @@
 /* SPDX-License-Identifier: BSD-3-Clause
  * Copyright(c) 2016-2017 Intel Corporation
+ * Copyright(c) 2019 Arm Limited
  */
 
 /**
@@ -19,6 +20,7 @@
 #include <rte_memory.h>
 #include <rte_malloc.h>
 #include <rte_ring.h>
+#include <rte_ring_elem.h>
 #include "rte_eventdev.h"
 
 #define RTE_TAILQ_EVENT_RING_NAME "RTE_EVENT_RING"
@@ -88,22 +90,17 @@ rte_event_ring_enqueue_burst(struct rte_event_ring *r,
 		const struct rte_event *events,
 		unsigned int n, uint16_t *free_space)
 {
-	uint32_t prod_head, prod_next;
-	uint32_t free_entries;
+	unsigned int num;
+	uint32_t space;
 
-	n = __rte_ring_move_prod_head(&r->r, r->r.prod.single, n,
-			RTE_RING_QUEUE_VARIABLE,
-			&prod_head, &prod_next, &free_entries);
-	if (n == 0)
-		goto end;
+	num = rte_ring_enqueue_burst_elem(&r->r, events,
+				sizeof(struct rte_event), n,
+				&space);
 
-	ENQUEUE_PTRS(&r->r, &r[1], prod_head, events, n, struct rte_event);
-
-	update_tail(&r->r.prod, prod_head, prod_next, r->r.prod.single, 1);
-end:
 	if (free_space != NULL)
-		*free_space = free_entries - n;
-	return n;
+		*free_space = space;
+
+	return num;
 }
 
 /**
@@ -129,23 +126,17 @@ rte_event_ring_dequeue_burst(struct rte_event_ring *r,
 		struct rte_event *events,
 		unsigned int n, uint16_t *available)
 {
-	uint32_t cons_head, cons_next;
-	uint32_t entries;
-
-	n = __rte_ring_move_cons_head(&r->r, r->r.cons.single, n,
-			RTE_RING_QUEUE_VARIABLE,
-			&cons_head, &cons_next, &entries);
-	if (n == 0)
-		goto end;
+	unsigned int num;
+	uint32_t remaining;
 
-	DEQUEUE_PTRS(&r->r, &r[1], cons_head, events, n, struct rte_event);
+	num = rte_ring_dequeue_burst_elem(&r->r, events,
+				sizeof(struct rte_event), n,
+				&remaining);
 
-	update_tail(&r->r.cons, cons_head, cons_next, r->r.cons.single, 0);
-
-end:
 	if (available != NULL)
-		*available = entries - n;
-	return n;
+		*available = remaining;
+
+	return num;
 }
 
 /*
-- 
2.17.1
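
The shape of the change above is a typed wrapper that forwards to a
generic ring which copies fixed-size elements by value. The sketch
below shows that pattern in standalone C. It is single-threaded, does
not use DPDK, and takes the element size at init time rather than per
call; every demo_* name is hypothetical, and demo_event is only a
16-byte stand-in for struct rte_event.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_RING_SLOTS 8u	/* power of two, so index masking works */
#define DEMO_MAX_ESIZE 16u

/* Generic single-threaded ring storing fixed-size elements by value. */
struct demo_ring {
	uint32_t head;	/* next slot to write */
	uint32_t tail;	/* next slot to read */
	uint32_t esize;	/* element size in bytes */
	unsigned char slots[DEMO_RING_SLOTS * DEMO_MAX_ESIZE];
};

static void demo_ring_init(struct demo_ring *r, uint32_t esize)
{
	assert(esize != 0 && esize <= DEMO_MAX_ESIZE);
	memset(r, 0, sizeof(*r));
	r->esize = esize;
}

/* Store up to n elements; report the free space left afterwards. */
static unsigned int
demo_ring_enqueue_burst(struct demo_ring *r, const void *objs,
		unsigned int n, uint32_t *free_space)
{
	uint32_t free_entries = DEMO_RING_SLOTS - (r->head - r->tail);
	unsigned int i;

	if (n > free_entries)
		n = free_entries;
	for (i = 0; i < n; i++) {
		uint32_t idx = (r->head + i) & (DEMO_RING_SLOTS - 1);

		memcpy(&r->slots[idx * r->esize],
		       (const unsigned char *)objs + (size_t)i * r->esize,
		       r->esize);
	}
	r->head += n;
	if (free_space != NULL)
		*free_space = free_entries - n;
	return n;
}

/* Fetch up to n elements; report the entries remaining afterwards. */
static unsigned int
demo_ring_dequeue_burst(struct demo_ring *r, void *objs,
		unsigned int n, uint32_t *available)
{
	uint32_t entries = r->head - r->tail;
	unsigned int i;

	if (n > entries)
		n = entries;
	for (i = 0; i < n; i++) {
		uint32_t idx = (r->tail + i) & (DEMO_RING_SLOTS - 1);

		memcpy((unsigned char *)objs + (size_t)i * r->esize,
		       &r->slots[idx * r->esize], r->esize);
	}
	r->tail += n;
	if (available != NULL)
		*available = entries - n;
	return n;
}

/* Typed wrapper: no ring logic of its own, only forwarding. */
struct demo_event { uint64_t word; uint64_t payload; };
struct demo_event_ring { struct demo_ring r; };

static unsigned int
demo_event_ring_enqueue_burst(struct demo_event_ring *er,
		const struct demo_event *ev, unsigned int n,
		uint16_t *free_space)
{
	uint32_t space;
	unsigned int num = demo_ring_enqueue_burst(&er->r, ev, n, &space);

	if (free_space != NULL)
		*free_space = (uint16_t)space;
	return num;
}

static unsigned int
demo_event_ring_dequeue_burst(struct demo_event_ring *er,
		struct demo_event *ev, unsigned int n, uint16_t *available)
{
	uint32_t remaining;
	unsigned int num = demo_ring_dequeue_burst(&er->r, ev, n, &remaining);

	if (available != NULL)
		*available = (uint16_t)remaining;
	return num;
}
```

Unlike rte_ring, this toy ring uses all of its slots and has no
multi-producer or multi-consumer handling; it is only meant to show
why the wrapper needs no head/tail code of its own.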


^ permalink raw reply	[flat|nested] 173+ messages in thread

* Re: [dpdk-dev] FW: || pw64572 lib/eventdev: use custom element size ring for event rings
       [not found]         ` <VE1PR08MB5149BE79083CD66A41CBD6D198340@VE1PR08MB5149.eurprd08.prod.outlook.com>
@ 2020-01-14 15:12           ` Aaron Conole
  2020-01-14 16:51             ` Aaron Conole
  0 siblings, 1 reply; 173+ messages in thread
From: Aaron Conole @ 2020-01-14 15:12 UTC (permalink / raw)
  To: Honnappa Nagarahalli; +Cc: test-report, ci, dev

Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com> writes:

> Hi Aaron,
> I am not able to understand the error, looks like there is no
> particular error. Can you please take a look?

Gladly.  A number of the systems that were running the build stopped
their output for an unknown reason (looks like this was a 1-time
thing).  See the error:

  [2164/2165] Compiling C object 'app/te...st@@dpdk-test@exe/test_ring_perf.c.o'.

  No output has been received in the last 10m0s, this potentially
  indicates a stalled build or something wrong with the build itself.

My guess is some kind of infrastructure change happened during the
build?  Maybe compiling the test_ring_perf.c object has been pushed out
too far.  I've restarted one of the jobs to see if it will successfully
execute.  If so, it's just a fluke.

As for the fluke part, the intent is to enhance the robot to detect
these "error" conditions and issue a restart for the travis build
(rather than not report a valid state).  That's work TBD (but patches
are welcome - see https://github.com/orgcandman/pw-ci for all the bot
code).
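
As a rough illustration of the detection step described above, the
check could be as simple as searching the captured build log for the
stall message quoted earlier in this mail. This is a standalone C
sketch, not code from pw-ci; build_log_stalled and stall_marker are
hypothetical names, and how the robot obtains the log or triggers a
restart is not covered.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Marker text taken from the Travis output quoted in this thread. */
static const char stall_marker[] = "No output has been received in the last";

/* Return true when the captured build log contains the stall message. */
static bool build_log_stalled(const char *log)
{
	assert(log != NULL);
	return strstr(log, stall_marker) != NULL;
}
```

A caller would treat a true result as "restart the job" rather than as
a real build failure.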

> Thank you,
> Honnappa
>
> -----Original Message-----
> From: 0-day Robot <robot@bytheb.org>
> Sent: Monday, January 13, 2020 10:58 PM
> To: test-report@dpdk.org
> Cc: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>; robot@bytheb.org
> Subject: || pw64572 lib/eventdev: use custom element size ring for event rings
>
> From: robot@bytheb.org
>
> Test-Label: travis-robot
> Test-Status:
> http://dpdk.org/patch/64572
>
> _Travis build: errored_
> Build URL: https://travis-ci.com/ovsrobot/dpdk/builds/144202958
> IMPORTANT NOTICE: The contents of this email and any attachments are
> confidential and may also be privileged. If you are not the intended
> recipient, please notify the sender immediately and do not disclose
> the contents to any other person, use it for any purpose, or store or
> copy the information in any medium. Thank you.

^ permalink raw reply	[flat|nested] 173+ messages in thread

* Re: [dpdk-dev] FW: || pw64572 lib/eventdev: use custom element size ring for event rings
  2020-01-14 15:12           ` [dpdk-dev] FW: || pw64572 " Aaron Conole
@ 2020-01-14 16:51             ` Aaron Conole
  2020-01-14 19:35               ` Honnappa Nagarahalli
  0 siblings, 1 reply; 173+ messages in thread
From: Aaron Conole @ 2020-01-14 16:51 UTC (permalink / raw)
  To: Aaron Conole; +Cc: Honnappa Nagarahalli, test-report, ci, dev

Aaron Conole <aconole@redhat.com> writes:

> Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com> writes:
>
>> Hi Aaron,
>> I am not able to understand the error, looks like there is no
>> particular error. Can you please take a look?
>
> Gladly.  A number of the systems that were running the build stopped
> their output for an unknown reason (looks like this was a 1-time
> thing).  See the error:
>
>   [2164/2165] Compiling C object 'app/te...st@@dpdk-test@exe/test_ring_perf.c.o'.
>
>   No output has been received in the last 10m0s, this potentially
>   indicates a stalled build or something wrong with the build itself.

I see this continually happening (I've kicked it off a number of times).

This patch might need more investigation, since it's always failing when
building 2164/2165 object.

I'll note that it seems to be clang related, rather than gcc related.

> My guess is some kind of infrastructure change happened during the
> build?  Maybe compiling the test_ring_perf.c object has been pushed out
> too far.  I've restarted one of the jobs to see if it will successfully
> execute.  If so, it's just a fluke.
>
> As for the fluke part, the intent is to enhance the robot to detect
> these "error" conditions and issue a restart for the travis build
> (rather than not report a valid state).  That's work TBD (but patches
> are welcome - see https://github.com/orgcandman/pw-ci for all the bot
> code).
>
>> Thank you,
>> Honnappa
>>
>> -----Original Message-----
>> From: 0-day Robot <robot@bytheb.org>
>> Sent: Monday, January 13, 2020 10:58 PM
>> To: test-report@dpdk.org
>> Cc: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>; robot@bytheb.org
>> Subject: || pw64572 lib/eventdev: use custom element size ring for event rings
>>
>> From: robot@bytheb.org
>>
>> Test-Label: travis-robot
>> Test-Status:
>> http://dpdk.org/patch/64572
>>
>> _Travis build: errored_
>> Build URL: https://travis-ci.com/ovsrobot/dpdk/builds/144202958


^ permalink raw reply	[flat|nested] 173+ messages in thread

* Re: [dpdk-dev] FW: || pw64572 lib/eventdev: use custom element size ring for event rings
  2020-01-14 16:51             ` Aaron Conole
@ 2020-01-14 19:35               ` Honnappa Nagarahalli
  2020-01-14 20:44                 ` Aaron Conole
  0 siblings, 1 reply; 173+ messages in thread
From: Honnappa Nagarahalli @ 2020-01-14 19:35 UTC (permalink / raw)
  To: Aaron Conole; +Cc: test-report, ci, dev, nd, nd

<snip>
> 
> Aaron Conole <aconole@redhat.com> writes:
> 
> > Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com> writes:
> >
> >> Hi Aaron,
> >> I am not able to understand the error, looks like there is no
> >> particular error. Can you please take a look?
> >
> > Gladly.  A number of the systems that were running the build stopped
> > their output for an unknown reason (looks like this was a 1-time
> > thing).  See the error:
> >
> >   [2164/2165] Compiling C object 'app/te...st@@dpdk-test@exe/test_ring_perf.c.o'.
> >
> >   No output has been received in the last 10m0s, this potentially
> >   indicates a stalled build or something wrong with the build itself.
> 
> I see this continually happening (I've kicked it off a number of times).
> 
> This patch might need more investigation, since it's always failing when
> building 2164/2165 object.
I compiled with clang-7. Compiler seems to hang while compiling test_ring.c

> 
> I'll note that it seems to be clang related, rather than gcc related.
> 
> > My guess is some kind of infrastructure change happened during the
> > build?  Maybe compiling the test_ring_perf.c object has been pushed
> > out too far.  I've restarted one of the jobs to see if it will
> > successfully execute.  If so, it's just a fluke.
> >
> > As for the fluke part, the intent is to enhance the robot to detect
> > these "error" conditions and issue a restart for the travis build
> > (rather than not report a valid state).  That's work TBD (but patches
> > are welcome - see https://github.com/orgcandman/pw-ci for all the bot
> > code).
> >
> >> Thank you,
> >> Honnappa
> >>
> >> -----Original Message-----
> >> From: 0-day Robot <robot@bytheb.org>
> >> Sent: Monday, January 13, 2020 10:58 PM
> >> To: test-report@dpdk.org
> >> Cc: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>;
> >> robot@bytheb.org
> >> Subject: || pw64572 lib/eventdev: use custom element size ring for
> >> event rings
> >>
> >> From: robot@bytheb.org
> >>
> >> Test-Label: travis-robot
> >> Test-Status:
> >> http://dpdk.org/patch/64572
> >>
> >> _Travis build: errored_
> >> Build URL: https://travis-ci.com/ovsrobot/dpdk/builds/144202958

^ permalink raw reply	[flat|nested] 173+ messages in thread

* Re: [dpdk-dev] FW: || pw64572 lib/eventdev: use custom element size ring for event rings
  2020-01-14 19:35               ` Honnappa Nagarahalli
@ 2020-01-14 20:44                 ` Aaron Conole
  2020-01-15  0:55                   ` Honnappa Nagarahalli
  2020-01-15  4:43                   ` Honnappa Nagarahalli
  0 siblings, 2 replies; 173+ messages in thread
From: Aaron Conole @ 2020-01-14 20:44 UTC (permalink / raw)
  To: Honnappa Nagarahalli; +Cc: test-report, ci, dev, nd

Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com> writes:

> <snip>
>> 
>> Aaron Conole <aconole@redhat.com> writes:
>> 
>> > Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com> writes:
>> >
>> >> Hi Aaron,
>> >> I am not able to understand the error, looks like there is no
>> >> particular error. Can you please take a look?
>> >
>> > Gladly.  A number of the systems that were running the build stopped
>> > their output for an unknown reason (looks like this was a 1-time
>> > thing).  See the error:
>> >
>> >   [2164/2165] Compiling C object 'app/te...st@@dpdk-test@exe/test_ring_perf.c.o'.
>> >
>> >   No output has been received in the last 10m0s, this potentially
>> >   indicates a stalled build or something wrong with the build itself.
>> 
>> I see this continually happening (I've kicked it off a number of times).
>> 
>> This patch might need more investigation, since it's always failing when
>> building 2164/2165 object.
> I compiled with clang-7. Compiler seems to hang while compiling test_ring.c

Cool.  Looks like a good catch, then :)

>> 
>> I'll note that it seems to be clang related, rather than gcc related.
>> 
>> > My guess is some kind of infrastructure change happened during the
>> > build?  Maybe compiling the test_ring_perf.c object has been pushed
>> > out too far.  I've restarted one of the jobs to see if it will
>> > successfully execute.  If so, it's just a fluke.
>> >
>> > As for the fluke part, the intent is to enhance the robot to detect
>> > these "error" conditions and issue a restart for the travis build
>> > (rather than not report a valid state).  That's work TBD (but patches
>> > are welcome - see https://github.com/orgcandman/pw-ci for all the bot
>> > code).
>> >
>> >> Thank you,
>> >> Honnappa
>> >>
>> >> -----Original Message-----
>> >> From: 0-day Robot <robot@bytheb.org>
>> >> Sent: Monday, January 13, 2020 10:58 PM
>> >> To: test-report@dpdk.org
>> >> Cc: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>;
>> >> robot@bytheb.org
>> >> Subject: || pw64572 lib/eventdev: use custom element size ring for
>> >> event rings
>> >>
>> >> From: robot@bytheb.org
>> >>
>> >> Test-Label: travis-robot
>> >> Test-Status:
>> >> http://dpdk.org/patch/64572
>> >>
>> >> _Travis build: errored_
>> >> Build URL: https://travis-ci.com/ovsrobot/dpdk/builds/144202958


^ permalink raw reply	[flat|nested] 173+ messages in thread

* Re: [dpdk-dev] FW: || pw64572 lib/eventdev: use custom element size ring for event rings
  2020-01-14 20:44                 ` Aaron Conole
@ 2020-01-15  0:55                   ` Honnappa Nagarahalli
  2020-01-15  4:43                   ` Honnappa Nagarahalli
  1 sibling, 0 replies; 173+ messages in thread
From: Honnappa Nagarahalli @ 2020-01-15  0:55 UTC (permalink / raw)
  To: Aaron Conole; +Cc: test-report, ci, dev, nd, nd

<snip>
> >>
> >> Aaron Conole <aconole@redhat.com> writes:
> >>
> >> > Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com> writes:
> >> >
> >> >> Hi Aaron,
> >> >> I am not able to understand the error, looks like there is no
> >> >> particular error. Can you please take a look?
> >> >
> >> > Gladly.  A number of the systems that were running the build
> >> > stopped their output for an unknown reason (looks like this was a
> >> > 1-time thing).  See the error:
> >> >
> >> >   [2164/2165] Compiling C object 'app/te...st@@dpdk-
> >> test@exe/test_ring_perf.c.o'.
> >> >
> >> >   No output has been received in the last 10m0s, this potentially
> >> >   indicates a stalled build or something wrong with the build itself.
> >>
> >> I see this continually happening (I've kicked it off a number of times).
> >>
> >> This patch might need more investigation, since it's always failing
> >> when building 2164/2165 object.
> > I compiled with clang-7. Compiler seems to hang while compiling
> > test_ring.c
> 
> Cool.  Looks like a good catch, then :)
Good test for Clang CI 😊

<snip>

^ permalink raw reply	[flat|nested] 173+ messages in thread

* Re: [dpdk-dev] FW: || pw64572 lib/eventdev: use custom element size ring for event rings
  2020-01-14 20:44                 ` Aaron Conole
  2020-01-15  0:55                   ` Honnappa Nagarahalli
@ 2020-01-15  4:43                   ` Honnappa Nagarahalli
  2020-01-15  5:05                     ` Honnappa Nagarahalli
  1 sibling, 1 reply; 173+ messages in thread
From: Honnappa Nagarahalli @ 2020-01-15  4:43 UTC (permalink / raw)
  To: Aaron Conole; +Cc: test-report, ci, dev, nd, Honnappa Nagarahalli, nd

<snip>
> >>
> >> Aaron Conole <aconole@redhat.com> writes:
> >>
> >> > Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com> writes:
> >> >
> >> >> Hi Aaron,
> >> >> I am not able to understand the error, looks like there is no
> >> >> particular error. Can you please take a look?
> >> >
> >> > Gladly.  A number of the systems that were running the build
> >> > stopped their output for an unknown reason (looks like this was a
> >> > 1-time thing).  See the error:
> >> >
> >> >   [2164/2165] Compiling C object 'app/te...st@@dpdk-
> >> test@exe/test_ring_perf.c.o'.
> >> >
> >> >   No output has been received in the last 10m0s, this potentially
> >> >   indicates a stalled build or something wrong with the build itself.
> >>
> >> I see this continually happening (I've kicked it off a number of times).
> >>
> >> This patch might need more investigation, since it's always failing
> >> when building 2164/2165 object.
> > I compiled with clang-7. Compiler seems to hang while compiling
> > test_ring.c
> 
> Cool.  Looks like a good catch, then :)
Update:
x86 - compilation succeeds, but takes a long time - ~1hr.
On 2 different Arm platforms - compilation succeeds in normal amount of time.
Does anyone have any experience dealing with this kind of issue?

<snip>

^ permalink raw reply	[flat|nested] 173+ messages in thread

* Re: [dpdk-dev] FW: || pw64572 lib/eventdev: use custom element size ring for event rings
  2020-01-15  4:43                   ` Honnappa Nagarahalli
@ 2020-01-15  5:05                     ` Honnappa Nagarahalli
  2020-01-15 18:22                       ` Aaron Conole
  0 siblings, 1 reply; 173+ messages in thread
From: Honnappa Nagarahalli @ 2020-01-15  5:05 UTC (permalink / raw)
  To: Aaron Conole; +Cc: test-report, ci, dev, nd, Honnappa Nagarahalli, nd

<snip>
> > >>
> > >> Aaron Conole <aconole@redhat.com> writes:
> > >>
> > >> > Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com> writes:
> > >> >
> > >> >> Hi Aaron,
> > >> >> I am not able to understand the error, looks like there is no
> > >> >> particular error. Can you please take a look?
> > >> >
> > >> > Gladly.  A number of the systems that were running the build
> > >> > stopped their output for an unknown reason (looks like this was a
> > >> > 1-time thing).  See the error:
> > >> >
> > >> >   [2164/2165] Compiling C object 'app/te...st@@dpdk-
> > >> test@exe/test_ring_perf.c.o'.
> > >> >
> > >> >   No output has been received in the last 10m0s, this potentially
> > >> >   indicates a stalled build or something wrong with the build itself.
> > >>
> > >> I see this continually happening (I've kicked it off a number of times).
> > >>
> > >> This patch might need more investigation, since it's always failing
> > >> when building 2164/2165 object.
> > > I compiled with clang-7. Compiler seems to hang while compiling
> > > test_ring.c
> >
> > Cool.  Looks like a good catch, then :)
> Update:
> x86 - compilation succeeds, but take a long time - ~1hr.
> On 2 different Arm platforms - compilation succeeds in normal amount of
> time.
> Does anyone have any experience dealing with this kind of issue?
> 
I ran this on another x86 server - with this patch the compilation takes ~8 minutes. The master (without this patch) takes ~1.02 minutes.

> <snip>


^ permalink raw reply	[flat|nested] 173+ messages in thread

* Re: [dpdk-dev] FW: || pw64572 lib/eventdev: use custom element size ring for event rings
  2020-01-15  5:05                     ` Honnappa Nagarahalli
@ 2020-01-15 18:22                       ` Aaron Conole
  2020-01-15 18:38                         ` Honnappa Nagarahalli
  0 siblings, 1 reply; 173+ messages in thread
From: Aaron Conole @ 2020-01-15 18:22 UTC (permalink / raw)
  To: Honnappa Nagarahalli; +Cc: Aaron Conole, test-report, ci, dev, nd

Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com> writes:

> <snip>
>> > >>
>> > >> Aaron Conole <aconole@redhat.com> writes:
>> > >>
>> > >> > Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com> writes:
>> > >> >
>> > >> >> Hi Aaron,
>> > >> >> I am not able to understand the error, looks like there is no
>> > >> >> particular error. Can you please take a look?
>> > >> >
>> > >> > Gladly.  A number of the systems that were running the build
>> > >> > stopped their output for an unknown reason (looks like this was a
>> > >> > 1-time thing).  See the error:
>> > >> >
>> > >> >   [2164/2165] Compiling C object 'app/te...st@@dpdk-
>> > >> test@exe/test_ring_perf.c.o'.
>> > >> >
>> > >> >   No output has been received in the last 10m0s, this potentially
>> > >> >   indicates a stalled build or something wrong with the build itself.
>> > >>
>> > >> I see this continually happening (I've kicked it off a number of times).
>> > >>
>> > >> This patch might need more investigation, since it's always failing
>> > >> when building 2164/2165 object.
>> > > I compiled with clang-7. Compiler seems to hang while compiling
>> > > test_ring.c
>> >
>> > Cool.  Looks like a good catch, then :)
>> Update:
>> x86 - compilation succeeds, but take a long time - ~1hr.
>> On 2 different Arm platforms - compilation succeeds in normal amount of
>> time.
>> Does anyone have any experience dealing with this kind of issue?
>> 
> I ran this on another x86 server - this patch takes ~8mns. The master
> (without this patch) takes ~1.02mns.

It doesn't reproduce with clang-8.

>> <snip>


^ permalink raw reply	[flat|nested] 173+ messages in thread

* Re: [dpdk-dev] FW: || pw64572 lib/eventdev: use custom element size ring for event rings
  2020-01-15 18:22                       ` Aaron Conole
@ 2020-01-15 18:38                         ` Honnappa Nagarahalli
  2020-01-16  5:27                           ` Honnappa Nagarahalli
  0 siblings, 1 reply; 173+ messages in thread
From: Honnappa Nagarahalli @ 2020-01-15 18:38 UTC (permalink / raw)
  To: Aaron Conole; +Cc: test-report, ci, dev, nd, Honnappa Nagarahalli, nd

<snip>
> >> > >>
> >> > >> Aaron Conole <aconole@redhat.com> writes:
> >> > >>
> >> > >> > Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com> writes:
> >> > >> >
> >> > >> >> Hi Aaron,
> >> > >> >> I am not able to understand the error, looks like there is no
> >> > >> >> particular error. Can you please take a look?
> >> > >> >
> >> > >> > Gladly.  A number of the systems that were running the build
> >> > >> > stopped their output for an unknown reason (looks like this
> >> > >> > was a 1-time thing).  See the error:
> >> > >> >
> >> > >> >   [2164/2165] Compiling C object 'app/te...st@@dpdk-
> >> > >> test@exe/test_ring_perf.c.o'.
> >> > >> >
> >> > >> >   No output has been received in the last 10m0s, this potentially
> >> > >> >   indicates a stalled build or something wrong with the build itself.
> >> > >>
> >> > >> I see this continually happening (I've kicked it off a number of times).
> >> > >>
> >> > >> This patch might need more investigation, since it's always
> >> > >> failing when building 2164/2165 object.
> >> > > I compiled with clang-7. Compiler seems to hang while compiling
> >> > > test_ring.c
> >> >
> >> > Cool.  Looks like a good catch, then :)
> >> Update:
> >> x86 - compilation succeeds, but take a long time - ~1hr.
> >> On 2 different Arm platforms - compilation succeeds in normal amount
> >> of time.
> >> Does anyone have any experience dealing with this kind of issue?
> >>
> > I ran this on another x86 server - this patch takes ~8mns. The master
> > (without this patch) takes ~1.02mns.
> 
> It doesn't reproduce with clang-8.
Ok, do you want to update the Travis CI and re-run?

> 
> >> <snip>

^ permalink raw reply	[flat|nested] 173+ messages in thread

* [dpdk-dev] [PATCH v9 0/6] lib/ring: APIs to support custom element size
  2019-09-06 19:05 ` [dpdk-dev] [PATCH v2 0/6] " Honnappa Nagarahalli
                     ` (13 preceding siblings ...)
  2020-01-13 17:25   ` [dpdk-dev] [PATCH v8 0/6] lib/ring: APIs to support custom element size Honnappa Nagarahalli
@ 2020-01-16  5:25   ` Honnappa Nagarahalli
  2020-01-16  5:25     ` [dpdk-dev] [PATCH v9 1/6] test/ring: use division for cycle count calculation Honnappa Nagarahalli
                       ` (7 more replies)
  2020-01-18 19:32   ` [dpdk-dev] [PATCH v10 " Honnappa Nagarahalli
  15 siblings, 8 replies; 173+ messages in thread
From: Honnappa Nagarahalli @ 2020-01-16  5:25 UTC (permalink / raw)
  To: olivier.matz, sthemmin, jerinj, bruce.richardson, david.marchand,
	pbhagavatula, konstantin.ananyev, yipeng1.wang,
	honnappa.nagarahalli
  Cc: dev, dharmik.thakkar, ruifeng.wang, gavin.hu, nd

The current rte_ring hard-codes the type of the ring element to 'void *',
hence the size of the element is hard-coded to 32b/64b. Since the ring
element type is not an input to rte_ring APIs, this results in a couple
of issues:

1) If an application needs to store an element which is not 64b, it
   needs to write its own ring APIs similar to rte_event_ring APIs. This
   creates an additional burden on the programmers, who end up making
   work-arounds and often waste memory.
2) If there are multiple libraries that store elements of the same
   type, currently they would have to write their own rte_ring APIs. This
   results in code duplication.

This patch set adds new APIs to support configurable ring element size.
The APIs support custom element sizes by allowing the ring element
size to be defined as a multiple of 32b.

The aim is to achieve the same performance as the existing ring
implementation.
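
As a rough illustration of the memory saving, here is a standalone
sketch of the size calculation done by rte_ring_get_memsize_elem() in
patch 2/6 (header size plus count * esize, rounded up to a cache line).
It does not use DPDK. The names sketch_ring_memsize, SKETCH_HDR_SIZE
and SKETCH_CACHE_LINE are hypothetical, the 384B header and 64B cache
line are assumed placeholder values (not the real sizeof(struct
rte_ring)), and the RTE_RING_SZ_MASK upper bound check is left out.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Assumed placeholders, not taken from DPDK headers. */
#define SKETCH_CACHE_LINE 64u
#define SKETCH_HDR_SIZE   384u

/* Bytes needed for a ring of 'count' elements of 'esize' bytes each,
 * or -EINVAL if esize is not a multiple of 4 or count is not a
 * power of 2 (this sketch also rejects count == 0).
 */
static int64_t
sketch_ring_memsize(unsigned int esize, unsigned int count)
{
	uint64_t sz;

	if (esize % 4 != 0)
		return -EINVAL;
	if (count == 0 || (count & (count - 1)) != 0)
		return -EINVAL;

	sz = SKETCH_HDR_SIZE + (uint64_t)count * esize;
	/* Round up to the next multiple of the cache line size. */
	sz = (sz + SKETCH_CACHE_LINE - 1) & ~(uint64_t)(SKETCH_CACHE_LINE - 1);
	return (int64_t)sz;
}
```

With these assumed values, a 1024-entry ring of 4B elements needs 4096
fewer bytes than the same ring with 8B ('void *') elements, which is
the kind of saving the hash library change in this series is after.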

v9
 - Split 'test_ring_burst_bulk_tests' test case into 4 smaller
   functions to address clang compilation time issue.
 - Addressed compilation failure in Intel CI in the hash changes.

v8
 - Changed the 128b copy elements inline function to use 'memcpy'
   to generate unaligned load/store instructions for x86. Generic
   copy function results in performance drop. (Konstantin)
 - Changed the API type #defines to be more clear (Konstantin)
 - Removed the code duplication in performance tests (Konstantin)
 - Fixed memory leak, changed test macros to inline functions (Konstantin)
 - Changed functional tests to test for 20B ring element. Fixed
   a bug in 32b element copy code for enqueue/dequeue(ring size
   needs to be normalized for 32b).
 - Squashed the functional and performance tests in their
   respective single commits.

v7
 - Merged the test cases to test both legacy APIs and
   rte_ring_xxx_elem APIs without code duplication (Konstantin, Olivier)
 - Performance test cases are merged as well (Konstantin, Olivier)
 - Macros to copy elements are converted into inline functions (Olivier)
 - Added back the changes to hash and event libraries

v6
 - Labelled as RFC to indicate the better status
 - Added unit tests to test the rte_ring_xxx_elem APIs
 - Corrected 'macro based partial memcpy' (5/6) patch
 - Added Konstantin's method after correction (6/6)
 - Check Patch shows significant warnings and errors, mainly due to
   copying code from existing test cases. None of them are harmful.
   I will fix them once we have an agreement.

v5
 - Use memcpy for chunks of 32B (Konstantin).
 - Both 'ring_perf_autotest' and 'ring_perf_elem_autotest' are available
   to compare the results easily.
 - Copying without memcpy is also available in 1/3, if anyone wants to
   experiment on their platform.
 - Added other platform owners to test on their respective platforms.

v4
 - A few fixes after more performance testing

v3
 - Removed macro-fest and used inline functions
   (Stephen, Bruce)

v2
 - Change Event Ring implementation to use ring templates
   (Jerin, Pavan)

Honnappa Nagarahalli (6):
  test/ring: use division for cycle count calculation
  lib/ring: apis to support configurable element size
  test/ring: add functional tests for rte_ring_xxx_elem APIs
  test/ring: modify perf test cases to use rte_ring_xxx_elem APIs
  lib/hash: use ring with 32b element size to save memory
  lib/eventdev: use custom element size ring for event rings

 app/test/test_ring.c                 | 1342 +++++++++++++-------------
 app/test/test_ring.h                 |  187 ++++
 app/test/test_ring_perf.c            |  452 +++++----
 lib/librte_eventdev/rte_event_ring.c |  147 +--
 lib/librte_eventdev/rte_event_ring.h |   45 +-
 lib/librte_hash/rte_cuckoo_hash.c    |   94 +-
 lib/librte_hash/rte_cuckoo_hash.h    |    2 +-
 lib/librte_ring/Makefile             |    3 +-
 lib/librte_ring/meson.build          |    4 +
 lib/librte_ring/rte_ring.c           |   41 +-
 lib/librte_ring/rte_ring.h           |    1 +
 lib/librte_ring/rte_ring_elem.h      | 1003 +++++++++++++++++++
 lib/librte_ring/rte_ring_version.map |    2 +
 13 files changed, 2242 insertions(+), 1081 deletions(-)
 create mode 100644 app/test/test_ring.h
 create mode 100644 lib/librte_ring/rte_ring_elem.h

-- 
2.17.1


^ permalink raw reply	[flat|nested] 173+ messages in thread

* [dpdk-dev] [PATCH v9 1/6] test/ring: use division for cycle count calculation
  2020-01-16  5:25   ` [dpdk-dev] [PATCH v9 0/6] lib/ring: APIs to support custom element size Honnappa Nagarahalli
@ 2020-01-16  5:25     ` Honnappa Nagarahalli
  2020-01-16  5:25     ` [dpdk-dev] [PATCH v9 2/6] lib/ring: apis to support configurable element size Honnappa Nagarahalli
                       ` (6 subsequent siblings)
  7 siblings, 0 replies; 173+ messages in thread
From: Honnappa Nagarahalli @ 2020-01-16  5:25 UTC (permalink / raw)
  To: olivier.matz, sthemmin, jerinj, bruce.richardson, david.marchand,
	pbhagavatula, konstantin.ananyev, yipeng1.wang,
	honnappa.nagarahalli
  Cc: dev, dharmik.thakkar, ruifeng.wang, gavin.hu, nd

Use floating-point division instead of a right shift by iter_shift
to calculate a more accurate cycle count.
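
The diff below replaces '>> iter_shift' with a division of the cycle
delta by 'iterations', done in double. A minimal standalone sketch of
why that matters (the helper names cycles_per_op_shift and
cycles_per_op_div are hypothetical and exist only for this
illustration):

```c
#include <assert.h>
#include <stdint.h>

/* Old style: integer shift, drops the fractional part. */
static uint64_t
cycles_per_op_shift(uint64_t total_cycles, unsigned int iter_shift)
{
	return total_cycles >> iter_shift;
}

/* New style: floating-point division, keeps the fractional part. */
static double
cycles_per_op_div(uint64_t total_cycles, uint64_t iterations)
{
	assert(iterations != 0);
	return (double)total_cycles / (double)iterations;
}
```

For example, 1003 cycles over 8 iterations is reported as 125 by the
shift version and as 125.375 by the division version. For very cheap
operations the shift can even report 0.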

Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
---
 app/test/test_ring_perf.c | 22 ++++++++++++----------
 1 file changed, 12 insertions(+), 10 deletions(-)

diff --git a/app/test/test_ring_perf.c b/app/test/test_ring_perf.c
index 70ee46ffe..6c2aca483 100644
--- a/app/test/test_ring_perf.c
+++ b/app/test/test_ring_perf.c
@@ -357,10 +357,10 @@ test_single_enqueue_dequeue(struct rte_ring *r)
 	}
 	const uint64_t mc_end = rte_rdtsc();
 
-	printf("SP/SC single enq/dequeue: %"PRIu64"\n",
-			(sc_end-sc_start) >> iter_shift);
-	printf("MP/MC single enq/dequeue: %"PRIu64"\n",
-			(mc_end-mc_start) >> iter_shift);
+	printf("SP/SC single enq/dequeue: %.2F\n",
+			((double)(sc_end-sc_start)) / iterations);
+	printf("MP/MC single enq/dequeue: %.2F\n",
+			((double)(mc_end-mc_start)) / iterations);
 }
 
 /*
@@ -395,13 +395,15 @@ test_burst_enqueue_dequeue(struct rte_ring *r)
 		}
 		const uint64_t mc_end = rte_rdtsc();
 
-		uint64_t mc_avg = ((mc_end-mc_start) >> iter_shift) / bulk_sizes[sz];
-		uint64_t sc_avg = ((sc_end-sc_start) >> iter_shift) / bulk_sizes[sz];
+		double mc_avg = ((double)(mc_end-mc_start) / iterations) /
+					bulk_sizes[sz];
+		double sc_avg = ((double)(sc_end-sc_start) / iterations) /
+					bulk_sizes[sz];
 
-		printf("SP/SC burst enq/dequeue (size: %u): %"PRIu64"\n", bulk_sizes[sz],
-				sc_avg);
-		printf("MP/MC burst enq/dequeue (size: %u): %"PRIu64"\n", bulk_sizes[sz],
-				mc_avg);
+		printf("SP/SC burst enq/dequeue (size: %u): %.2F\n",
+				bulk_sizes[sz], sc_avg);
+		printf("MP/MC burst enq/dequeue (size: %u): %.2F\n",
+				bulk_sizes[sz], mc_avg);
 	}
 }
 
-- 
2.17.1


^ permalink raw reply	[flat|nested] 173+ messages in thread

* [dpdk-dev] [PATCH v9 2/6] lib/ring: apis to support configurable element size
  2020-01-16  5:25   ` [dpdk-dev] [PATCH v9 0/6] lib/ring: APIs to support custom element size Honnappa Nagarahalli
  2020-01-16  5:25     ` [dpdk-dev] [PATCH v9 1/6] test/ring: use division for cycle count calculation Honnappa Nagarahalli
@ 2020-01-16  5:25     ` Honnappa Nagarahalli
  2020-01-17 16:34       ` Olivier Matz
  2020-01-16  5:25     ` [dpdk-dev] [PATCH v9 3/6] test/ring: add functional tests for rte_ring_xxx_elem APIs Honnappa Nagarahalli
                       ` (5 subsequent siblings)
  7 siblings, 1 reply; 173+ messages in thread
From: Honnappa Nagarahalli @ 2020-01-16  5:25 UTC (permalink / raw)
  To: olivier.matz, sthemmin, jerinj, bruce.richardson, david.marchand,
	pbhagavatula, konstantin.ananyev, yipeng1.wang,
	honnappa.nagarahalli
  Cc: dev, dharmik.thakkar, ruifeng.wang, gavin.hu, nd

Current APIs assume ring elements to be pointers. However, in many
use cases, the size can be different. Add new APIs to support
configurable ring element sizes.
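
For element sizes other than 8B and 16B, enqueue_elems() in this patch
normalizes everything to 32-bit words (scale = esize / 4) and then
copies words, wrapping at the end of the ring. Below is a simplified
standalone sketch of that idea. It is not the patch code: it works on
a plain uint32_t array instead of struct rte_ring, skips the loop
unrolling, and the names sketch_enqueue_words and sketch_enqueue_elems
are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Copy n 32-bit words into a ring of 'size' words starting at idx,
 * wrapping to the beginning when the end of the ring is reached.
 */
static void
sketch_enqueue_words(uint32_t *ring, uint32_t size, uint32_t idx,
		const uint32_t *obj, uint32_t n)
{
	uint32_t i;

	for (i = 0; i < n && idx < size; i++, idx++)
		ring[idx] = obj[i];
	/* Start at the beginning */
	for (idx = 0; i < n; i++, idx++)
		ring[idx] = obj[i];
}

/* Enqueue 'num' elements of 'esize' bytes at producer position 'head'.
 * ring_slots is the number of elements the ring holds (power of 2).
 * obj_table is assumed to be suitably aligned for uint32_t access.
 */
static void
sketch_enqueue_elems(uint32_t *ring, uint32_t ring_slots, uint32_t head,
		const void *obj_table, uint32_t esize, uint32_t num)
{
	uint32_t scale, idx;

	assert(esize % sizeof(uint32_t) == 0);
	assert(ring_slots != 0 && (ring_slots & (ring_slots - 1)) == 0);

	/* Normalize element units to uint32_t units */
	scale = esize / sizeof(uint32_t);
	idx = head & (ring_slots - 1);
	sketch_enqueue_words(ring, ring_slots * scale, idx * scale,
			(const uint32_t *)obj_table, num * scale);
}
```

With a 4-slot ring of 20B elements (scale = 5), enqueuing 2 elements
at head 3 writes the first element to words 15..19 and the second one
to words 0..4 after wrapping.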

Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Dharmik Thakkar <dharmik.thakkar@arm.com>
Reviewed-by: Gavin Hu <gavin.hu@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
---
 lib/librte_ring/Makefile             |    3 +-
 lib/librte_ring/meson.build          |    4 +
 lib/librte_ring/rte_ring.c           |   41 +-
 lib/librte_ring/rte_ring.h           |    1 +
 lib/librte_ring/rte_ring_elem.h      | 1003 ++++++++++++++++++++++++++
 lib/librte_ring/rte_ring_version.map |    2 +
 6 files changed, 1045 insertions(+), 9 deletions(-)
 create mode 100644 lib/librte_ring/rte_ring_elem.h

diff --git a/lib/librte_ring/Makefile b/lib/librte_ring/Makefile
index 22454b084..917c560ad 100644
--- a/lib/librte_ring/Makefile
+++ b/lib/librte_ring/Makefile
@@ -6,7 +6,7 @@ include $(RTE_SDK)/mk/rte.vars.mk
 # library name
 LIB = librte_ring.a
 
-CFLAGS += $(WERROR_FLAGS) -I$(SRCDIR) -O3
+CFLAGS += $(WERROR_FLAGS) -I$(SRCDIR) -O3 -DALLOW_EXPERIMENTAL_API
 LDLIBS += -lrte_eal
 
 EXPORT_MAP := rte_ring_version.map
@@ -16,6 +16,7 @@ SRCS-$(CONFIG_RTE_LIBRTE_RING) := rte_ring.c
 
 # install includes
 SYMLINK-$(CONFIG_RTE_LIBRTE_RING)-include := rte_ring.h \
+					rte_ring_elem.h \
 					rte_ring_generic.h \
 					rte_ring_c11_mem.h
 
diff --git a/lib/librte_ring/meson.build b/lib/librte_ring/meson.build
index ca8a435e9..f2f3ccc88 100644
--- a/lib/librte_ring/meson.build
+++ b/lib/librte_ring/meson.build
@@ -3,5 +3,9 @@
 
 sources = files('rte_ring.c')
 headers = files('rte_ring.h',
+		'rte_ring_elem.h',
 		'rte_ring_c11_mem.h',
 		'rte_ring_generic.h')
+
+# rte_ring_create_elem and rte_ring_get_memsize_elem are experimental
+allow_experimental_apis = true
diff --git a/lib/librte_ring/rte_ring.c b/lib/librte_ring/rte_ring.c
index d9b308036..3e15dc398 100644
--- a/lib/librte_ring/rte_ring.c
+++ b/lib/librte_ring/rte_ring.c
@@ -33,6 +33,7 @@
 #include <rte_tailq.h>
 
 #include "rte_ring.h"
+#include "rte_ring_elem.h"
 
 TAILQ_HEAD(rte_ring_list, rte_tailq_entry);
 
@@ -46,23 +47,38 @@ EAL_REGISTER_TAILQ(rte_ring_tailq)
 
 /* return the size of memory occupied by a ring */
 ssize_t
-rte_ring_get_memsize(unsigned count)
+rte_ring_get_memsize_elem(unsigned int esize, unsigned int count)
 {
 	ssize_t sz;
 
+	/* Check if element size is a multiple of 4B */
+	if (esize % 4 != 0) {
+		RTE_LOG(ERR, RING, "element size is not a multiple of 4\n");
+
+		return -EINVAL;
+	}
+
 	/* count must be a power of 2 */
 	if ((!POWEROF2(count)) || (count > RTE_RING_SZ_MASK )) {
 		RTE_LOG(ERR, RING,
-			"Requested size is invalid, must be power of 2, and "
-			"do not exceed the size limit %u\n", RTE_RING_SZ_MASK);
+			"Requested number of elements is invalid, must be power of 2, and not exceed %u\n",
+			RTE_RING_SZ_MASK);
+
 		return -EINVAL;
 	}
 
-	sz = sizeof(struct rte_ring) + count * sizeof(void *);
+	sz = sizeof(struct rte_ring) + count * esize;
 	sz = RTE_ALIGN(sz, RTE_CACHE_LINE_SIZE);
 	return sz;
 }
 
+/* return the size of memory occupied by a ring */
+ssize_t
+rte_ring_get_memsize(unsigned count)
+{
+	return rte_ring_get_memsize_elem(sizeof(void *), count);
+}
+
 void
 rte_ring_reset(struct rte_ring *r)
 {
@@ -114,10 +130,10 @@ rte_ring_init(struct rte_ring *r, const char *name, unsigned count,
 	return 0;
 }
 
-/* create the ring */
+/* create the ring for a given element size */
 struct rte_ring *
-rte_ring_create(const char *name, unsigned count, int socket_id,
-		unsigned flags)
+rte_ring_create_elem(const char *name, unsigned int esize, unsigned int count,
+		int socket_id, unsigned int flags)
 {
 	char mz_name[RTE_MEMZONE_NAMESIZE];
 	struct rte_ring *r;
@@ -135,7 +151,7 @@ rte_ring_create(const char *name, unsigned count, int socket_id,
 	if (flags & RING_F_EXACT_SZ)
 		count = rte_align32pow2(count + 1);
 
-	ring_size = rte_ring_get_memsize(count);
+	ring_size = rte_ring_get_memsize_elem(esize, count);
 	if (ring_size < 0) {
 		rte_errno = ring_size;
 		return NULL;
@@ -182,6 +198,15 @@ rte_ring_create(const char *name, unsigned count, int socket_id,
 	return r;
 }
 
+/* create the ring */
+struct rte_ring *
+rte_ring_create(const char *name, unsigned count, int socket_id,
+		unsigned flags)
+{
+	return rte_ring_create_elem(name, sizeof(void *), count, socket_id,
+		flags);
+}
+
 /* free the ring */
 void
 rte_ring_free(struct rte_ring *r)
diff --git a/lib/librte_ring/rte_ring.h b/lib/librte_ring/rte_ring.h
index 2a9f768a1..18fc5d845 100644
--- a/lib/librte_ring/rte_ring.h
+++ b/lib/librte_ring/rte_ring.h
@@ -216,6 +216,7 @@ int rte_ring_init(struct rte_ring *r, const char *name, unsigned count,
  */
 struct rte_ring *rte_ring_create(const char *name, unsigned count,
 				 int socket_id, unsigned flags);
+
 /**
  * De-allocate all memory used by the ring.
  *
diff --git a/lib/librte_ring/rte_ring_elem.h b/lib/librte_ring/rte_ring_elem.h
new file mode 100644
index 000000000..15d79bf2a
--- /dev/null
+++ b/lib/librte_ring/rte_ring_elem.h
@@ -0,0 +1,1003 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ *
+ * Copyright (c) 2019 Arm Limited
+ * Copyright (c) 2010-2017 Intel Corporation
+ * Copyright (c) 2007-2009 Kip Macy kmacy@freebsd.org
+ * All rights reserved.
+ * Derived from FreeBSD's bufring.h
+ * Used as BSD-3 Licensed with permission from Kip Macy.
+ */
+
+#ifndef _RTE_RING_ELEM_H_
+#define _RTE_RING_ELEM_H_
+
+/**
+ * @file
+ * RTE Ring with user defined element size
+ */
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#include <stdio.h>
+#include <stdint.h>
+#include <string.h>
+#include <sys/queue.h>
+#include <errno.h>
+#include <rte_common.h>
+#include <rte_config.h>
+#include <rte_memory.h>
+#include <rte_lcore.h>
+#include <rte_atomic.h>
+#include <rte_branch_prediction.h>
+#include <rte_memzone.h>
+#include <rte_pause.h>
+
+#include "rte_ring.h"
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * Calculate the memory size needed for a ring with given element size
+ *
+ * This function returns the number of bytes needed for a ring, given
+ * the number of elements in it and the size of the element. This value
+ * is the sum of the size of the structure rte_ring and the size of the
+ * memory needed for storing the elements. The value is aligned to a cache
+ * line size.
+ *
+ * @param esize
+ *   The size of ring element, in bytes. It must be a multiple of 4.
+ * @param count
+ *   The number of elements in the ring (must be a power of 2).
+ * @return
+ *   - The memory size needed for the ring on success.
+ *   - -EINVAL - esize is not a multiple of 4 or count provided is not a
+ *		 power of 2.
+ */
+__rte_experimental
+ssize_t rte_ring_get_memsize_elem(unsigned int esize, unsigned int count);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * Create a new ring named *name* that stores elements with given size.
+ *
+ * This function uses ``memzone_reserve()`` to allocate memory. Then it
+ * calls rte_ring_init() to initialize an empty ring.
+ *
+ * The new ring size is set to *count*, which must be a power of
+ * two. Water marking is disabled by default. The real usable ring size
+ * is *count-1* instead of *count* to differentiate a full ring from an
+ * empty ring.
+ *
+ * The ring is added in RTE_TAILQ_RING list.
+ *
+ * @param name
+ *   The name of the ring.
+ * @param esize
+ *   The size of ring element, in bytes. It must be a multiple of 4.
+ * @param count
+ *   The number of elements in the ring (must be a power of 2).
+ * @param socket_id
+ *   The *socket_id* argument is the socket identifier in case of
+ *   NUMA. The value can be *SOCKET_ID_ANY* if there is no NUMA
+ *   constraint for the reserved zone.
+ * @param flags
+ *   An OR of the following:
+ *    - RING_F_SP_ENQ: If this flag is set, the default behavior when
+ *      using ``rte_ring_enqueue()`` or ``rte_ring_enqueue_bulk()``
+ *      is "single-producer". Otherwise, it is "multi-producers".
+ *    - RING_F_SC_DEQ: If this flag is set, the default behavior when
+ *      using ``rte_ring_dequeue()`` or ``rte_ring_dequeue_bulk()``
+ *      is "single-consumer". Otherwise, it is "multi-consumers".
+ * @return
+ *   On success, the pointer to the new allocated ring. NULL on error with
+ *    rte_errno set appropriately. Possible errno values include:
+ *    - E_RTE_NO_CONFIG - function could not get pointer to rte_config structure
+ *    - E_RTE_SECONDARY - function was called from a secondary process instance
+ *    - EINVAL - esize is not a multiple of 4 or count provided is not a
+ *		 power of 2.
+ *    - ENOSPC - the maximum number of memzones has already been allocated
+ *    - EEXIST - a memzone with the same name already exists
+ *    - ENOMEM - no appropriate memory area found in which to create memzone
+ */
+__rte_experimental
+struct rte_ring *rte_ring_create_elem(const char *name, unsigned int esize,
+			unsigned int count, int socket_id, unsigned int flags);
+
+static __rte_always_inline void
+enqueue_elems_32(struct rte_ring *r, const uint32_t size, uint32_t idx,
+		const void *obj_table, uint32_t n)
+{
+	unsigned int i;
+	uint32_t *ring = (uint32_t *)&r[1];
+	const uint32_t *obj = (const uint32_t *)obj_table;
+	if (likely(idx + n < size)) {
+		for (i = 0; i < (n & ~0x7); i += 8, idx += 8) {
+			ring[idx] = obj[i];
+			ring[idx + 1] = obj[i + 1];
+			ring[idx + 2] = obj[i + 2];
+			ring[idx + 3] = obj[i + 3];
+			ring[idx + 4] = obj[i + 4];
+			ring[idx + 5] = obj[i + 5];
+			ring[idx + 6] = obj[i + 6];
+			ring[idx + 7] = obj[i + 7];
+		}
+		switch (n & 0x7) {
+		case 7:
+			ring[idx++] = obj[i++]; /* fallthrough */
+		case 6:
+			ring[idx++] = obj[i++]; /* fallthrough */
+		case 5:
+			ring[idx++] = obj[i++]; /* fallthrough */
+		case 4:
+			ring[idx++] = obj[i++]; /* fallthrough */
+		case 3:
+			ring[idx++] = obj[i++]; /* fallthrough */
+		case 2:
+			ring[idx++] = obj[i++]; /* fallthrough */
+		case 1:
+			ring[idx++] = obj[i++];
+		}
+	} else {
+		for (i = 0; idx < size; i++, idx++)
+			ring[idx] = obj[i];
+		/* Start at the beginning */
+		for (idx = 0; i < n; i++, idx++)
+			ring[idx] = obj[i];
+	}
+}
+
+static __rte_always_inline void
+enqueue_elems_64(struct rte_ring *r, uint32_t prod_head,
+		const void *obj_table, uint32_t n)
+{
+	unsigned int i;
+	const uint32_t size = r->size;
+	uint32_t idx = prod_head & r->mask;
+	uint64_t *ring = (uint64_t *)&r[1];
+	const uint64_t *obj = (const uint64_t *)obj_table;
+	if (likely(idx + n < size)) {
+		for (i = 0; i < (n & ~0x3); i += 4, idx += 4) {
+			ring[idx] = obj[i];
+			ring[idx + 1] = obj[i + 1];
+			ring[idx + 2] = obj[i + 2];
+			ring[idx + 3] = obj[i + 3];
+		}
+		switch (n & 0x3) {
+		case 3:
+			ring[idx++] = obj[i++]; /* fallthrough */
+		case 2:
+			ring[idx++] = obj[i++]; /* fallthrough */
+		case 1:
+			ring[idx++] = obj[i++];
+		}
+	} else {
+		for (i = 0; idx < size; i++, idx++)
+			ring[idx] = obj[i];
+		/* Start at the beginning */
+		for (idx = 0; i < n; i++, idx++)
+			ring[idx] = obj[i];
+	}
+}
+
+static __rte_always_inline void
+enqueue_elems_128(struct rte_ring *r, uint32_t prod_head,
+		const void *obj_table, uint32_t n)
+{
+	unsigned int i;
+	const uint32_t size = r->size;
+	uint32_t idx = prod_head & r->mask;
+	rte_int128_t *ring = (rte_int128_t *)&r[1];
+	const rte_int128_t *obj = (const rte_int128_t *)obj_table;
+	if (likely(idx + n < size)) {
+		for (i = 0; i < (n & ~0x1); i += 2, idx += 2)
+			memcpy((void *)(ring + idx),
+				(const void *)(obj + i), 32);
+		switch (n & 0x1) {
+		case 1:
+			memcpy((void *)(ring + idx),
+				(const void *)(obj + i), 16);
+		}
+	} else {
+		for (i = 0; idx < size; i++, idx++)
+			memcpy((void *)(ring + idx),
+				(const void *)(obj + i), 16);
+		/* Start at the beginning */
+		for (idx = 0; i < n; i++, idx++)
+			memcpy((void *)(ring + idx),
+				(const void *)(obj + i), 16);
+	}
+}
+
+/* the actual enqueue of elements on the ring.
+ * Placed here since identical code needed in both
+ * single and multi producer enqueue functions.
+ */
+static __rte_always_inline void
+enqueue_elems(struct rte_ring *r, uint32_t prod_head, const void *obj_table,
+		uint32_t esize, uint32_t num)
+{
+	/* 8B and 16B copies implemented individually to retain
+	 * the current performance.
+	 */
+	if (esize == 8)
+		enqueue_elems_64(r, prod_head, obj_table, num);
+	else if (esize == 16)
+		enqueue_elems_128(r, prod_head, obj_table, num);
+	else {
+		uint32_t idx, scale, nr_idx, nr_num, nr_size;
+
+		/* Normalize to uint32_t */
+		scale = esize / sizeof(uint32_t);
+		nr_num = num * scale;
+		idx = prod_head & r->mask;
+		nr_idx = idx * scale;
+		nr_size = r->size * scale;
+		enqueue_elems_32(r, nr_size, nr_idx, obj_table, nr_num);
+	}
+}
+
+static __rte_always_inline void
+dequeue_elems_32(struct rte_ring *r, const uint32_t size, uint32_t idx,
+		void *obj_table, uint32_t n)
+{
+	unsigned int i;
+	uint32_t *ring = (uint32_t *)&r[1];
+	uint32_t *obj = (uint32_t *)obj_table;
+	if (likely(idx + n < size)) {
+		for (i = 0; i < (n & ~0x7); i += 8, idx += 8) {
+			obj[i] = ring[idx];
+			obj[i + 1] = ring[idx + 1];
+			obj[i + 2] = ring[idx + 2];
+			obj[i + 3] = ring[idx + 3];
+			obj[i + 4] = ring[idx + 4];
+			obj[i + 5] = ring[idx + 5];
+			obj[i + 6] = ring[idx + 6];
+			obj[i + 7] = ring[idx + 7];
+		}
+		switch (n & 0x7) {
+		case 7:
+			obj[i++] = ring[idx++]; /* fallthrough */
+		case 6:
+			obj[i++] = ring[idx++]; /* fallthrough */
+		case 5:
+			obj[i++] = ring[idx++]; /* fallthrough */
+		case 4:
+			obj[i++] = ring[idx++]; /* fallthrough */
+		case 3:
+			obj[i++] = ring[idx++]; /* fallthrough */
+		case 2:
+			obj[i++] = ring[idx++]; /* fallthrough */
+		case 1:
+			obj[i++] = ring[idx++];
+		}
+	} else {
+		for (i = 0; idx < size; i++, idx++)
+			obj[i] = ring[idx];
+		/* Start at the beginning */
+		for (idx = 0; i < n; i++, idx++)
+			obj[i] = ring[idx];
+	}
+}
+
+static __rte_always_inline void
+dequeue_elems_64(struct rte_ring *r, uint32_t prod_head,
+		void *obj_table, uint32_t n)
+{
+	unsigned int i;
+	const uint32_t size = r->size;
+	uint32_t idx = prod_head & r->mask;
+	uint64_t *ring = (uint64_t *)&r[1];
+	uint64_t *obj = (uint64_t *)obj_table;
+	if (likely(idx + n < size)) {
+		for (i = 0; i < (n & ~0x3); i += 4, idx += 4) {
+			obj[i] = ring[idx];
+			obj[i + 1] = ring[idx + 1];
+			obj[i + 2] = ring[idx + 2];
+			obj[i + 3] = ring[idx + 3];
+		}
+		switch (n & 0x3) {
+		case 3:
+			obj[i++] = ring[idx++]; /* fallthrough */
+		case 2:
+			obj[i++] = ring[idx++]; /* fallthrough */
+		case 1:
+			obj[i++] = ring[idx++];
+		}
+	} else {
+		for (i = 0; idx < size; i++, idx++)
+			obj[i] = ring[idx];
+		/* Start at the beginning */
+		for (idx = 0; i < n; i++, idx++)
+			obj[i] = ring[idx];
+	}
+}
+
+static __rte_always_inline void
+dequeue_elems_128(struct rte_ring *r, uint32_t prod_head,
+		void *obj_table, uint32_t n)
+{
+	unsigned int i;
+	const uint32_t size = r->size;
+	uint32_t idx = prod_head & r->mask;
+	rte_int128_t *ring = (rte_int128_t *)&r[1];
+	rte_int128_t *obj = (rte_int128_t *)obj_table;
+	if (likely(idx + n < size)) {
+		for (i = 0; i < (n & ~0x1); i += 2, idx += 2)
+			memcpy((void *)(obj + i), (void *)(ring + idx), 32);
+		switch (n & 0x1) {
+		case 1:
+			memcpy((void *)(obj + i), (void *)(ring + idx), 16);
+		}
+	} else {
+		for (i = 0; idx < size; i++, idx++)
+			memcpy((void *)(obj + i), (void *)(ring + idx), 16);
+		/* Start at the beginning */
+		for (idx = 0; i < n; i++, idx++)
+			memcpy((void *)(obj + i), (void *)(ring + idx), 16);
+	}
+}
+
+/* The actual dequeue of elements from the ring.
+ * Placed here since identical code is needed in both
+ * single and multi consumer dequeue functions.
+ */
+static __rte_always_inline void
+dequeue_elems(struct rte_ring *r, uint32_t cons_head, void *obj_table,
+		uint32_t esize, uint32_t num)
+{
+	/* 8B and 16B copies implemented individually to retain
+	 * the current performance.
+	 */
+	if (esize == 8)
+		dequeue_elems_64(r, cons_head, obj_table, num);
+	else if (esize == 16)
+		dequeue_elems_128(r, cons_head, obj_table, num);
+	else {
+		uint32_t idx, scale, nr_idx, nr_num, nr_size;
+
+		/* Normalize to uint32_t */
+		scale = esize / sizeof(uint32_t);
+		nr_num = num * scale;
+		idx = cons_head & r->mask;
+		nr_idx = idx * scale;
+		nr_size = r->size * scale;
+		dequeue_elems_32(r, nr_size, nr_idx, obj_table, nr_num);
+	}
+}
+
+/* Between two loads, the CPU may reorder accesses on weakly ordered
+ * architectures (powerpc/arm).
+ * There are 2 choices for the users:
+ * 1. use the rmb() memory barrier
+ * 2. use one-direction load_acquire/store_release barriers, enabled by
+ * CONFIG_RTE_USE_C11_MEM_MODEL=y
+ * The choice depends on performance test results.
+ * By default, the common functions come from rte_ring_generic.h
+ */
+#ifdef RTE_USE_C11_MEM_MODEL
+#include "rte_ring_c11_mem.h"
+#else
+#include "rte_ring_generic.h"
+#endif
+
+/**
+ * @internal Enqueue several objects on the ring
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of void * pointers (objects).
+ * @param esize
+ *   The size of ring element, in bytes. It must be a multiple of 4.
+ *   This must be the same value used while creating the ring. Otherwise
+ *   the results are undefined.
+ * @param n
+ *   The number of objects to add in the ring from the obj_table.
+ * @param behavior
+ *   RTE_RING_QUEUE_FIXED:    Enqueue a fixed number of items to a ring
+ *   RTE_RING_QUEUE_VARIABLE: Enqueue as many items as possible to the ring
+ * @param is_sp
+ *   Indicates whether to use single producer or multi-producer head update
+ * @param free_space
+ *   returns the amount of space after the enqueue operation has finished
+ * @return
+ *   Actual number of objects enqueued.
+ *   If behavior == RTE_RING_QUEUE_FIXED, this will be 0 or n only.
+ */
+static __rte_always_inline unsigned int
+__rte_ring_do_enqueue_elem(struct rte_ring *r, const void *obj_table,
+		unsigned int esize, unsigned int n,
+		enum rte_ring_queue_behavior behavior, unsigned int is_sp,
+		unsigned int *free_space)
+{
+	uint32_t prod_head, prod_next;
+	uint32_t free_entries;
+
+	n = __rte_ring_move_prod_head(r, is_sp, n, behavior,
+			&prod_head, &prod_next, &free_entries);
+	if (n == 0)
+		goto end;
+
+	enqueue_elems(r, prod_head, obj_table, esize, n);
+
+	update_tail(&r->prod, prod_head, prod_next, is_sp, 1);
+end:
+	if (free_space != NULL)
+		*free_space = free_entries - n;
+	return n;
+}
+
+/**
+ * @internal Dequeue several objects from the ring
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of void * pointers (objects).
+ * @param esize
+ *   The size of ring element, in bytes. It must be a multiple of 4.
+ *   This must be the same value used while creating the ring. Otherwise
+ *   the results are undefined.
+ * @param n
+ *   The number of objects to pull from the ring.
+ * @param behavior
+ *   RTE_RING_QUEUE_FIXED:    Dequeue a fixed number of items from a ring
+ *   RTE_RING_QUEUE_VARIABLE: Dequeue as many items as possible from ring
+ * @param is_sc
+ *   Indicates whether to use single consumer or multi-consumer head update
+ * @param available
+ *   returns the number of remaining ring entries after the dequeue has finished
+ * @return
+ *   - Actual number of objects dequeued.
+ *     If behavior == RTE_RING_QUEUE_FIXED, this will be 0 or n only.
+ */
+static __rte_always_inline unsigned int
+__rte_ring_do_dequeue_elem(struct rte_ring *r, void *obj_table,
+		unsigned int esize, unsigned int n,
+		enum rte_ring_queue_behavior behavior, unsigned int is_sc,
+		unsigned int *available)
+{
+	uint32_t cons_head, cons_next;
+	uint32_t entries;
+
+	n = __rte_ring_move_cons_head(r, (int)is_sc, n, behavior,
+			&cons_head, &cons_next, &entries);
+	if (n == 0)
+		goto end;
+
+	dequeue_elems(r, cons_head, obj_table, esize, n);
+
+	update_tail(&r->cons, cons_head, cons_next, is_sc, 0);
+
+end:
+	if (available != NULL)
+		*available = entries - n;
+	return n;
+}
+
+/**
+ * Enqueue several objects on the ring (multi-producers safe).
+ *
+ * This function uses a "compare and set" instruction to move the
+ * producer index atomically.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of void * pointers (objects).
+ * @param esize
+ *   The size of ring element, in bytes. It must be a multiple of 4.
+ *   This must be the same value used while creating the ring. Otherwise
+ *   the results are undefined.
+ * @param n
+ *   The number of objects to add in the ring from the obj_table.
+ * @param free_space
+ *   if non-NULL, returns the amount of space in the ring after the
+ *   enqueue operation has finished.
+ * @return
+ *   The number of objects enqueued, either 0 or n
+ */
+static __rte_always_inline unsigned int
+rte_ring_mp_enqueue_bulk_elem(struct rte_ring *r, const void *obj_table,
+		unsigned int esize, unsigned int n, unsigned int *free_space)
+{
+	return __rte_ring_do_enqueue_elem(r, obj_table, esize, n,
+			RTE_RING_QUEUE_FIXED, __IS_MP, free_space);
+}
+
+/**
+ * Enqueue several objects on a ring
+ *
+ * @warning This API is NOT multi-producers safe
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of void * pointers (objects).
+ * @param esize
+ *   The size of ring element, in bytes. It must be a multiple of 4.
+ *   This must be the same value used while creating the ring. Otherwise
+ *   the results are undefined.
+ * @param n
+ *   The number of objects to add in the ring from the obj_table.
+ * @param free_space
+ *   if non-NULL, returns the amount of space in the ring after the
+ *   enqueue operation has finished.
+ * @return
+ *   The number of objects enqueued, either 0 or n
+ */
+static __rte_always_inline unsigned int
+rte_ring_sp_enqueue_bulk_elem(struct rte_ring *r, const void *obj_table,
+		unsigned int esize, unsigned int n, unsigned int *free_space)
+{
+	return __rte_ring_do_enqueue_elem(r, obj_table, esize, n,
+			RTE_RING_QUEUE_FIXED, __IS_SP, free_space);
+}
+
+/**
+ * Enqueue several objects on a ring.
+ *
+ * This function calls the multi-producer or the single-producer
+ * version depending on the default behavior that was specified at
+ * ring creation time (see flags).
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of void * pointers (objects).
+ * @param esize
+ *   The size of ring element, in bytes. It must be a multiple of 4.
+ *   This must be the same value used while creating the ring. Otherwise
+ *   the results are undefined.
+ * @param n
+ *   The number of objects to add in the ring from the obj_table.
+ * @param free_space
+ *   if non-NULL, returns the amount of space in the ring after the
+ *   enqueue operation has finished.
+ * @return
+ *   The number of objects enqueued, either 0 or n
+ */
+static __rte_always_inline unsigned int
+rte_ring_enqueue_bulk_elem(struct rte_ring *r, const void *obj_table,
+		unsigned int esize, unsigned int n, unsigned int *free_space)
+{
+	return __rte_ring_do_enqueue_elem(r, obj_table, esize, n,
+			RTE_RING_QUEUE_FIXED, r->prod.single, free_space);
+}
+
+/**
+ * Enqueue one object on a ring (multi-producers safe).
+ *
+ * This function uses a "compare and set" instruction to move the
+ * producer index atomically.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj
+ *   A pointer to the object to be added.
+ * @param esize
+ *   The size of ring element, in bytes. It must be a multiple of 4.
+ *   This must be the same value used while creating the ring. Otherwise
+ *   the results are undefined.
+ * @return
+ *   - 0: Success; objects enqueued.
+ *   - -ENOBUFS: Not enough room in the ring to enqueue; no object is enqueued.
+ */
+static __rte_always_inline int
+rte_ring_mp_enqueue_elem(struct rte_ring *r, void *obj, unsigned int esize)
+{
+	return rte_ring_mp_enqueue_bulk_elem(r, obj, esize, 1, NULL) ? 0 :
+								-ENOBUFS;
+}
+
+/**
+ * Enqueue one object on a ring
+ *
+ * @warning This API is NOT multi-producers safe
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj
+ *   A pointer to the object to be added.
+ * @param esize
+ *   The size of ring element, in bytes. It must be a multiple of 4.
+ *   This must be the same value used while creating the ring. Otherwise
+ *   the results are undefined.
+ * @return
+ *   - 0: Success; objects enqueued.
+ *   - -ENOBUFS: Not enough room in the ring to enqueue; no object is enqueued.
+ */
+static __rte_always_inline int
+rte_ring_sp_enqueue_elem(struct rte_ring *r, void *obj, unsigned int esize)
+{
+	return rte_ring_sp_enqueue_bulk_elem(r, obj, esize, 1, NULL) ? 0 :
+								-ENOBUFS;
+}
+
+/**
+ * Enqueue one object on a ring.
+ *
+ * This function calls the multi-producer or the single-producer
+ * version, depending on the default behaviour that was specified at
+ * ring creation time (see flags).
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj
+ *   A pointer to the object to be added.
+ * @param esize
+ *   The size of ring element, in bytes. It must be a multiple of 4.
+ *   This must be the same value used while creating the ring. Otherwise
+ *   the results are undefined.
+ * @return
+ *   - 0: Success; objects enqueued.
+ *   - -ENOBUFS: Not enough room in the ring to enqueue; no object is enqueued.
+ */
+static __rte_always_inline int
+rte_ring_enqueue_elem(struct rte_ring *r, void *obj, unsigned int esize)
+{
+	return rte_ring_enqueue_bulk_elem(r, obj, esize, 1, NULL) ? 0 :
+								-ENOBUFS;
+}
+
+/**
+ * Dequeue several objects from a ring (multi-consumers safe).
+ *
+ * This function uses a "compare and set" instruction to move the
+ * consumer index atomically.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of void * pointers (objects) that will be filled.
+ * @param esize
+ *   The size of ring element, in bytes. It must be a multiple of 4.
+ *   This must be the same value used while creating the ring. Otherwise
+ *   the results are undefined.
+ * @param n
+ *   The number of objects to dequeue from the ring to the obj_table.
+ * @param available
+ *   If non-NULL, returns the number of remaining ring entries after the
+ *   dequeue has finished.
+ * @return
+ *   The number of objects dequeued, either 0 or n
+ */
+static __rte_always_inline unsigned int
+rte_ring_mc_dequeue_bulk_elem(struct rte_ring *r, void *obj_table,
+		unsigned int esize, unsigned int n, unsigned int *available)
+{
+	return __rte_ring_do_dequeue_elem(r, obj_table, esize, n,
+				RTE_RING_QUEUE_FIXED, __IS_MC, available);
+}
+
+/**
+ * Dequeue several objects from a ring (NOT multi-consumers safe).
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of void * pointers (objects) that will be filled.
+ * @param esize
+ *   The size of ring element, in bytes. It must be a multiple of 4.
+ *   This must be the same value used while creating the ring. Otherwise
+ *   the results are undefined.
+ * @param n
+ *   The number of objects to dequeue from the ring to the obj_table,
+ *   must be strictly positive.
+ * @param available
+ *   If non-NULL, returns the number of remaining ring entries after the
+ *   dequeue has finished.
+ * @return
+ *   The number of objects dequeued, either 0 or n
+ */
+static __rte_always_inline unsigned int
+rte_ring_sc_dequeue_bulk_elem(struct rte_ring *r, void *obj_table,
+		unsigned int esize, unsigned int n, unsigned int *available)
+{
+	return __rte_ring_do_dequeue_elem(r, obj_table, esize, n,
+			RTE_RING_QUEUE_FIXED, __IS_SC, available);
+}
+
+/**
+ * Dequeue several objects from a ring.
+ *
+ * This function calls the multi-consumers or the single-consumer
+ * version, depending on the default behaviour that was specified at
+ * ring creation time (see flags).
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of void * pointers (objects) that will be filled.
+ * @param esize
+ *   The size of ring element, in bytes. It must be a multiple of 4.
+ *   This must be the same value used while creating the ring. Otherwise
+ *   the results are undefined.
+ * @param n
+ *   The number of objects to dequeue from the ring to the obj_table.
+ * @param available
+ *   If non-NULL, returns the number of remaining ring entries after the
+ *   dequeue has finished.
+ * @return
+ *   The number of objects dequeued, either 0 or n
+ */
+static __rte_always_inline unsigned int
+rte_ring_dequeue_bulk_elem(struct rte_ring *r, void *obj_table,
+		unsigned int esize, unsigned int n, unsigned int *available)
+{
+	return __rte_ring_do_dequeue_elem(r, obj_table, esize, n,
+			RTE_RING_QUEUE_FIXED, r->cons.single, available);
+}
+
+/**
+ * Dequeue one object from a ring (multi-consumers safe).
+ *
+ * This function uses a "compare and set" instruction to move the
+ * consumer index atomically.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_p
+ *   A pointer to a void * pointer (object) that will be filled.
+ * @param esize
+ *   The size of ring element, in bytes. It must be a multiple of 4.
+ *   This must be the same value used while creating the ring. Otherwise
+ *   the results are undefined.
+ * @return
+ *   - 0: Success; objects dequeued.
+ *   - -ENOENT: Not enough entries in the ring to dequeue; no object is
+ *     dequeued.
+ */
+static __rte_always_inline int
+rte_ring_mc_dequeue_elem(struct rte_ring *r, void *obj_p,
+				unsigned int esize)
+{
+	return rte_ring_mc_dequeue_bulk_elem(r, obj_p, esize, 1, NULL)  ? 0 :
+								-ENOENT;
+}
+
+/**
+ * Dequeue one object from a ring (NOT multi-consumers safe).
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_p
+ *   A pointer to a void * pointer (object) that will be filled.
+ * @param esize
+ *   The size of ring element, in bytes. It must be a multiple of 4.
+ *   This must be the same value used while creating the ring. Otherwise
+ *   the results are undefined.
+ * @return
+ *   - 0: Success; objects dequeued.
+ *   - -ENOENT: Not enough entries in the ring to dequeue, no object is
+ *     dequeued.
+ */
+static __rte_always_inline int
+rte_ring_sc_dequeue_elem(struct rte_ring *r, void *obj_p,
+				unsigned int esize)
+{
+	return rte_ring_sc_dequeue_bulk_elem(r, obj_p, esize, 1, NULL) ? 0 :
+								-ENOENT;
+}
+
+/**
+ * Dequeue one object from a ring.
+ *
+ * This function calls the multi-consumers or the single-consumer
+ * version depending on the default behaviour that was specified at
+ * ring creation time (see flags).
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_p
+ *   A pointer to a void * pointer (object) that will be filled.
+ * @param esize
+ *   The size of ring element, in bytes. It must be a multiple of 4.
+ *   This must be the same value used while creating the ring. Otherwise
+ *   the results are undefined.
+ * @return
+ *   - 0: Success, objects dequeued.
+ *   - -ENOENT: Not enough entries in the ring to dequeue, no object is
+ *     dequeued.
+ */
+static __rte_always_inline int
+rte_ring_dequeue_elem(struct rte_ring *r, void *obj_p, unsigned int esize)
+{
+	return rte_ring_dequeue_bulk_elem(r, obj_p, esize, 1, NULL) ? 0 :
+								-ENOENT;
+}
+
+/**
+ * Enqueue several objects on the ring (multi-producers safe).
+ *
+ * This function uses a "compare and set" instruction to move the
+ * producer index atomically.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of void * pointers (objects).
+ * @param esize
+ *   The size of ring element, in bytes. It must be a multiple of 4.
+ *   This must be the same value used while creating the ring. Otherwise
+ *   the results are undefined.
+ * @param n
+ *   The number of objects to add in the ring from the obj_table.
+ * @param free_space
+ *   if non-NULL, returns the amount of space in the ring after the
+ *   enqueue operation has finished.
+ * @return
+ *   - n: Actual number of objects enqueued.
+ */
+static __rte_always_inline unsigned
+rte_ring_mp_enqueue_burst_elem(struct rte_ring *r, const void *obj_table,
+		unsigned int esize, unsigned int n, unsigned int *free_space)
+{
+	return __rte_ring_do_enqueue_elem(r, obj_table, esize, n,
+			RTE_RING_QUEUE_VARIABLE, __IS_MP, free_space);
+}
+
+/**
+ * Enqueue several objects on a ring
+ *
+ * @warning This API is NOT multi-producers safe
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of void * pointers (objects).
+ * @param esize
+ *   The size of ring element, in bytes. It must be a multiple of 4.
+ *   This must be the same value used while creating the ring. Otherwise
+ *   the results are undefined.
+ * @param n
+ *   The number of objects to add in the ring from the obj_table.
+ * @param free_space
+ *   if non-NULL, returns the amount of space in the ring after the
+ *   enqueue operation has finished.
+ * @return
+ *   - n: Actual number of objects enqueued.
+ */
+static __rte_always_inline unsigned
+rte_ring_sp_enqueue_burst_elem(struct rte_ring *r, const void *obj_table,
+		unsigned int esize, unsigned int n, unsigned int *free_space)
+{
+	return __rte_ring_do_enqueue_elem(r, obj_table, esize, n,
+			RTE_RING_QUEUE_VARIABLE, __IS_SP, free_space);
+}
+
+/**
+ * Enqueue several objects on a ring.
+ *
+ * This function calls the multi-producer or the single-producer
+ * version depending on the default behavior that was specified at
+ * ring creation time (see flags).
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of void * pointers (objects).
+ * @param esize
+ *   The size of ring element, in bytes. It must be a multiple of 4.
+ *   This must be the same value used while creating the ring. Otherwise
+ *   the results are undefined.
+ * @param n
+ *   The number of objects to add in the ring from the obj_table.
+ * @param free_space
+ *   if non-NULL, returns the amount of space in the ring after the
+ *   enqueue operation has finished.
+ * @return
+ *   - n: Actual number of objects enqueued.
+ */
+static __rte_always_inline unsigned
+rte_ring_enqueue_burst_elem(struct rte_ring *r, const void *obj_table,
+		unsigned int esize, unsigned int n, unsigned int *free_space)
+{
+	return __rte_ring_do_enqueue_elem(r, obj_table, esize, n,
+			RTE_RING_QUEUE_VARIABLE, r->prod.single, free_space);
+}
+
+/**
+ * Dequeue several objects from a ring (multi-consumers safe). When the
+ * requested objects are more than the available objects, only dequeue the
+ * actual number of objects.
+ *
+ * This function uses a "compare and set" instruction to move the
+ * consumer index atomically.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of void * pointers (objects) that will be filled.
+ * @param esize
+ *   The size of ring element, in bytes. It must be a multiple of 4.
+ *   This must be the same value used while creating the ring. Otherwise
+ *   the results are undefined.
+ * @param n
+ *   The number of objects to dequeue from the ring to the obj_table.
+ * @param available
+ *   If non-NULL, returns the number of remaining ring entries after the
+ *   dequeue has finished.
+ * @return
+ *   - n: Actual number of objects dequeued, 0 if ring is empty
+ */
+static __rte_always_inline unsigned
+rte_ring_mc_dequeue_burst_elem(struct rte_ring *r, void *obj_table,
+		unsigned int esize, unsigned int n, unsigned int *available)
+{
+	return __rte_ring_do_dequeue_elem(r, obj_table, esize, n,
+			RTE_RING_QUEUE_VARIABLE, __IS_MC, available);
+}
+
+/**
+ * Dequeue several objects from a ring (NOT multi-consumers safe). When the
+ * requested objects are more than the available objects, only dequeue the
+ * actual number of objects.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of void * pointers (objects) that will be filled.
+ * @param esize
+ *   The size of ring element, in bytes. It must be a multiple of 4.
+ *   This must be the same value used while creating the ring. Otherwise
+ *   the results are undefined.
+ * @param n
+ *   The number of objects to dequeue from the ring to the obj_table.
+ * @param available
+ *   If non-NULL, returns the number of remaining ring entries after the
+ *   dequeue has finished.
+ * @return
+ *   - n: Actual number of objects dequeued, 0 if ring is empty
+ */
+static __rte_always_inline unsigned
+rte_ring_sc_dequeue_burst_elem(struct rte_ring *r, void *obj_table,
+		unsigned int esize, unsigned int n, unsigned int *available)
+{
+	return __rte_ring_do_dequeue_elem(r, obj_table, esize, n,
+			RTE_RING_QUEUE_VARIABLE, __IS_SC, available);
+}
+
+/**
+ * Dequeue multiple objects from a ring up to a maximum number.
+ *
+ * This function calls the multi-consumers or the single-consumer
+ * version, depending on the default behaviour that was specified at
+ * ring creation time (see flags).
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of void * pointers (objects) that will be filled.
+ * @param esize
+ *   The size of ring element, in bytes. It must be a multiple of 4.
+ *   This must be the same value used while creating the ring. Otherwise
+ *   the results are undefined.
+ * @param n
+ *   The number of objects to dequeue from the ring to the obj_table.
+ * @param available
+ *   If non-NULL, returns the number of remaining ring entries after the
+ *   dequeue has finished.
+ * @return
+ *   - Number of objects dequeued
+ */
+static __rte_always_inline unsigned int
+rte_ring_dequeue_burst_elem(struct rte_ring *r, void *obj_table,
+		unsigned int esize, unsigned int n, unsigned int *available)
+{
+	return __rte_ring_do_dequeue_elem(r, obj_table, esize, n,
+				RTE_RING_QUEUE_VARIABLE,
+				r->cons.single, available);
+}
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* _RTE_RING_ELEM_H_ */
diff --git a/lib/librte_ring/rte_ring_version.map b/lib/librte_ring/rte_ring_version.map
index 89d84bcf4..7a5328dd5 100644
--- a/lib/librte_ring/rte_ring_version.map
+++ b/lib/librte_ring/rte_ring_version.map
@@ -15,6 +15,8 @@ DPDK_20.0 {
 EXPERIMENTAL {
 	global:
 
+	rte_ring_create_elem;
+	rte_ring_get_memsize_elem;
 	rte_ring_reset;
 
 };
-- 
2.17.1



* [dpdk-dev] [PATCH v9 3/6] test/ring: add functional tests for rte_ring_xxx_elem APIs
  2020-01-16  5:25   ` [dpdk-dev] [PATCH v9 0/6] lib/ring: APIs to support custom element size Honnappa Nagarahalli
  2020-01-16  5:25     ` [dpdk-dev] [PATCH v9 1/6] test/ring: use division for cycle count calculation Honnappa Nagarahalli
  2020-01-16  5:25     ` [dpdk-dev] [PATCH v9 2/6] lib/ring: apis to support configurable element size Honnappa Nagarahalli
@ 2020-01-16  5:25     ` Honnappa Nagarahalli
  2020-01-17 17:03       ` Olivier Matz
  2020-01-16  5:25     ` [dpdk-dev] [PATCH v9 4/6] test/ring: modify perf test cases to use " Honnappa Nagarahalli
                       ` (4 subsequent siblings)
  7 siblings, 1 reply; 173+ messages in thread
From: Honnappa Nagarahalli @ 2020-01-16  5:25 UTC (permalink / raw)
  To: olivier.matz, sthemmin, jerinj, bruce.richardson, david.marchand,
	pbhagavatula, konstantin.ananyev, yipeng1.wang,
	honnappa.nagarahalli
  Cc: dev, dharmik.thakkar, ruifeng.wang, gavin.hu, nd

Add basic infrastructure to test rte_ring_xxx_elem APIs.
Adjust the existing test cases to test for various ring
element sizes.

Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Gavin Hu <gavin.hu@arm.com>
---
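The tests in this patch iterate over an esize[] table in which -1 selects
the legacy rte_ring APIs (elements are 'void *') and any other value is
the element size in bytes for the rte_ring_xxx_elem APIs. The standalone
sketch below only illustrates how far a buffer cursor moves per object
under that convention, which is what test_ring_inc_ptr() appears
intended to compute. sketch_obj_bytes() and sketch_advance() are
hypothetical names and are not part of the patch; the sketch uses a byte
cursor rather than the 'void **' cast used in the test code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of the patch: bytes occupied by n
 * objects for a given esize entry. -1 selects the legacy API, whose
 * elements are 'void *'.
 */
static size_t
sketch_obj_bytes(int esize, unsigned int n)
{
	if (esize == -1)
		return (size_t)n * sizeof(void *);

	/* The elem APIs require a multiple of 4 bytes. */
	assert(esize > 0 && esize % 4 == 0);
	return (size_t)n * (size_t)esize;
}

/* Advance a byte cursor past n objects of the given esize. */
static unsigned char *
sketch_advance(unsigned char *cur, int esize, unsigned int n)
{
	return cur + sketch_obj_bytes(esize, n);
}
```

For example, two objects with esize = 20 occupy 40 bytes, while three
legacy objects occupy 3 * sizeof(void *) bytes, so the same test body
can step through source and destination buffers for every table entry.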
 app/test/test_ring.c | 1342 +++++++++++++++++++++---------------------
 app/test/test_ring.h |  187 ++++++
 2 files changed, 850 insertions(+), 679 deletions(-)
 create mode 100644 app/test/test_ring.h

diff --git a/app/test/test_ring.c b/app/test/test_ring.c
index aaf1e70ad..c08500eca 100644
--- a/app/test/test_ring.c
+++ b/app/test/test_ring.c
@@ -23,11 +23,13 @@
 #include <rte_branch_prediction.h>
 #include <rte_malloc.h>
 #include <rte_ring.h>
+#include <rte_ring_elem.h>
 #include <rte_random.h>
 #include <rte_errno.h>
 #include <rte_hexdump.h>
 
 #include "test.h"
+#include "test_ring.h"
 
 /*
  * Ring
@@ -55,8 +57,6 @@
 #define RING_SIZE 4096
 #define MAX_BULK 32
 
-static rte_atomic32_t synchro;
-
 #define	TEST_RING_VERIFY(exp)						\
 	if (!(exp)) {							\
 		printf("error at %s:%d\tcondition " #exp " failed\n",	\
@@ -67,808 +67,792 @@ static rte_atomic32_t synchro;
 
 #define	TEST_RING_FULL_EMTPY_ITER	8
 
-/*
- * helper routine for test_ring_basic
- */
-static int
-test_ring_basic_full_empty(struct rte_ring *r, void * const src[], void *dst[])
+static int esize[] = {-1, 4, 8, 16, 20};
+
+static void**
+test_ring_inc_ptr(void **obj, int esize, unsigned int n)
 {
-	unsigned i, rand;
-	const unsigned rsz = RING_SIZE - 1;
-
-	printf("Basic full/empty test\n");
-
-	for (i = 0; TEST_RING_FULL_EMTPY_ITER != i; i++) {
-
-		/* random shift in the ring */
-		rand = RTE_MAX(rte_rand() % RING_SIZE, 1UL);
-		printf("%s: iteration %u, random shift: %u;\n",
-		    __func__, i, rand);
-		TEST_RING_VERIFY(rte_ring_enqueue_bulk(r, src, rand,
-				NULL) != 0);
-		TEST_RING_VERIFY(rte_ring_dequeue_bulk(r, dst, rand,
-				NULL) == rand);
-
-		/* fill the ring */
-		TEST_RING_VERIFY(rte_ring_enqueue_bulk(r, src, rsz, NULL) != 0);
-		TEST_RING_VERIFY(0 == rte_ring_free_count(r));
-		TEST_RING_VERIFY(rsz == rte_ring_count(r));
-		TEST_RING_VERIFY(rte_ring_full(r));
-		TEST_RING_VERIFY(0 == rte_ring_empty(r));
-
-		/* empty the ring */
-		TEST_RING_VERIFY(rte_ring_dequeue_bulk(r, dst, rsz,
-				NULL) == rsz);
-		TEST_RING_VERIFY(rsz == rte_ring_free_count(r));
-		TEST_RING_VERIFY(0 == rte_ring_count(r));
-		TEST_RING_VERIFY(0 == rte_ring_full(r));
-		TEST_RING_VERIFY(rte_ring_empty(r));
+	/* Legacy queue APIs? */
+	if ((esize) == -1)
+		return ((void **)obj) + n;
+	else
+		return (void **)(((uint32_t *)obj) +
+					(n * esize / sizeof(uint32_t)));
+}
 
-		/* check data */
-		TEST_RING_VERIFY(0 == memcmp(src, dst, rsz));
-		rte_ring_dump(stdout, r);
-	}
-	return 0;
+static void
+test_ring_mem_init(void *obj, unsigned int count, int esize)
+{
+	unsigned int i;
+
+	/* Legacy queue APIs? */
+	if (esize == -1)
+		for (i = 0; i < count; i++)
+			((void **)obj)[i] = (void *)(unsigned long)i;
+	else
+		for (i = 0; i < (count * esize / sizeof(uint32_t)); i++)
+			((uint32_t *)obj)[i] = i;
 }
 
-static int
-test_ring_basic(struct rte_ring *r)
+static void
+test_ring_print_test_string(const char *istr, unsigned int api_type, int esize)
 {
-	void **src = NULL, **cur_src = NULL, **dst = NULL, **cur_dst = NULL;
-	int ret;
-	unsigned i, num_elems;
+	printf("\n%s: ", istr);
+
+	if (esize == -1)
+		printf("legacy APIs: ");
+	else
+		printf("elem APIs: element size %dB ", esize);
+
+	if (api_type == TEST_RING_IGNORE_API_TYPE)
+		return;
+
+	if (api_type & TEST_RING_THREAD_DEF)
+		printf(": default enqueue/dequeue: ");
+	else if (api_type & TEST_RING_THREAD_SPSC)
+		printf(": SP/SC: ");
+	else if (api_type & TEST_RING_THREAD_MPMC)
+		printf(": MP/MC: ");
+
+	if (api_type & TEST_RING_ELEM_SINGLE)
+		printf("single\n");
+	else if (api_type & TEST_RING_ELEM_BULK)
+		printf("bulk\n");
+	else if (api_type & TEST_RING_ELEM_BURST)
+		printf("burst\n");
+}
 
-	/* alloc dummy object pointers */
-	src = malloc(RING_SIZE*2*sizeof(void *));
-	if (src == NULL)
-		goto fail;
+/*
+ * Various negative test cases.
+ */
+static int
+test_ring_negative_tests(void)
+{
+	struct rte_ring *rp = NULL;
+	struct rte_ring *rt = NULL;
+	unsigned int i;
 
-	for (i = 0; i < RING_SIZE*2 ; i++) {
-		src[i] = (void *)(unsigned long)i;
-	}
-	cur_src = src;
-
-	/* alloc some room for copied objects */
-	dst = malloc(RING_SIZE*2*sizeof(void *));
-	if (dst == NULL)
-		goto fail;
-
-	memset(dst, 0, RING_SIZE*2*sizeof(void *));
-	cur_dst = dst;
-
-	printf("enqueue 1 obj\n");
-	ret = rte_ring_sp_enqueue_bulk(r, cur_src, 1, NULL);
-	cur_src += 1;
-	if (ret == 0)
-		goto fail;
-
-	printf("enqueue 2 objs\n");
-	ret = rte_ring_sp_enqueue_bulk(r, cur_src, 2, NULL);
-	cur_src += 2;
-	if (ret == 0)
-		goto fail;
-
-	printf("enqueue MAX_BULK objs\n");
-	ret = rte_ring_sp_enqueue_bulk(r, cur_src, MAX_BULK, NULL);
-	cur_src += MAX_BULK;
-	if (ret == 0)
-		goto fail;
-
-	printf("dequeue 1 obj\n");
-	ret = rte_ring_sc_dequeue_bulk(r, cur_dst, 1, NULL);
-	cur_dst += 1;
-	if (ret == 0)
-		goto fail;
-
-	printf("dequeue 2 objs\n");
-	ret = rte_ring_sc_dequeue_bulk(r, cur_dst, 2, NULL);
-	cur_dst += 2;
-	if (ret == 0)
-		goto fail;
-
-	printf("dequeue MAX_BULK objs\n");
-	ret = rte_ring_sc_dequeue_bulk(r, cur_dst, MAX_BULK, NULL);
-	cur_dst += MAX_BULK;
-	if (ret == 0)
-		goto fail;
-
-	/* check data */
-	if (memcmp(src, dst, cur_dst - dst)) {
-		rte_hexdump(stdout, "src", src, cur_src - src);
-		rte_hexdump(stdout, "dst", dst, cur_dst - dst);
-		printf("data after dequeue is not the same\n");
-		goto fail;
-	}
-	cur_src = src;
-	cur_dst = dst;
-
-	printf("enqueue 1 obj\n");
-	ret = rte_ring_mp_enqueue_bulk(r, cur_src, 1, NULL);
-	cur_src += 1;
-	if (ret == 0)
-		goto fail;
-
-	printf("enqueue 2 objs\n");
-	ret = rte_ring_mp_enqueue_bulk(r, cur_src, 2, NULL);
-	cur_src += 2;
-	if (ret == 0)
-		goto fail;
-
-	printf("enqueue MAX_BULK objs\n");
-	ret = rte_ring_mp_enqueue_bulk(r, cur_src, MAX_BULK, NULL);
-	cur_src += MAX_BULK;
-	if (ret == 0)
-		goto fail;
-
-	printf("dequeue 1 obj\n");
-	ret = rte_ring_mc_dequeue_bulk(r, cur_dst, 1, NULL);
-	cur_dst += 1;
-	if (ret == 0)
-		goto fail;
-
-	printf("dequeue 2 objs\n");
-	ret = rte_ring_mc_dequeue_bulk(r, cur_dst, 2, NULL);
-	cur_dst += 2;
-	if (ret == 0)
-		goto fail;
-
-	printf("dequeue MAX_BULK objs\n");
-	ret = rte_ring_mc_dequeue_bulk(r, cur_dst, MAX_BULK, NULL);
-	cur_dst += MAX_BULK;
-	if (ret == 0)
-		goto fail;
-
-	/* check data */
-	if (memcmp(src, dst, cur_dst - dst)) {
-		rte_hexdump(stdout, "src", src, cur_src - src);
-		rte_hexdump(stdout, "dst", dst, cur_dst - dst);
-		printf("data after dequeue is not the same\n");
-		goto fail;
-	}
-	cur_src = src;
-	cur_dst = dst;
-
-	printf("fill and empty the ring\n");
-	for (i = 0; i<RING_SIZE/MAX_BULK; i++) {
-		ret = rte_ring_mp_enqueue_bulk(r, cur_src, MAX_BULK, NULL);
-		cur_src += MAX_BULK;
-		if (ret == 0)
-			goto fail;
-		ret = rte_ring_mc_dequeue_bulk(r, cur_dst, MAX_BULK, NULL);
-		cur_dst += MAX_BULK;
-		if (ret == 0)
-			goto fail;
+	/* Test with esize not a multiple of 4 */
+	rp = test_ring_create("test_bad_element_size", 23,
+				RING_SIZE, SOCKET_ID_ANY, 0);
+	if (rp != NULL) {
+		printf("Test failed to detect invalid element size\n");
+		goto test_fail;
 	}
 
-	/* check data */
-	if (memcmp(src, dst, cur_dst - dst)) {
-		rte_hexdump(stdout, "src", src, cur_src - src);
-		rte_hexdump(stdout, "dst", dst, cur_dst - dst);
-		printf("data after dequeue is not the same\n");
-		goto fail;
-	}
 
-	if (test_ring_basic_full_empty(r, src, dst) != 0)
-		goto fail;
+	for (i = 0; i < RTE_DIM(esize); i++) {
+		/* Test if ring size is not power of 2 */
+		rp = test_ring_create("test_bad_ring_size", esize[i],
+					RING_SIZE + 1, SOCKET_ID_ANY, 0);
+		if (rp != NULL) {
+			printf("Test failed to detect odd count\n");
+			goto test_fail;
+		}
+
+		/* Test if ring size is exceeding the limit */
+		rp = test_ring_create("test_bad_ring_size", esize[i],
+					RTE_RING_SZ_MASK + 1, SOCKET_ID_ANY, 0);
+		if (rp != NULL) {
+			printf("Test failed to detect limits\n");
+			goto test_fail;
+		}
+
+		/* Tests if lookup returns NULL on non-existing ring */
+		rp = rte_ring_lookup("ring_not_found");
+		if (rp != NULL || rte_errno != ENOENT) {
+			printf("Test failed to detect NULL ring lookup\n");
+			goto test_fail;
+		}
+
+		/* Test that a non-power-of-2 count causes the create
+		 * function to fail correctly
+		 */
+		rp = test_ring_create("test_ring_count", esize[i], 4097,
+					SOCKET_ID_ANY, 0);
+		if (rp != NULL)
+			goto test_fail;
+
+		rp = test_ring_create("test_ring_negative", esize[i], RING_SIZE,
+					SOCKET_ID_ANY,
+					RING_F_SP_ENQ | RING_F_SC_DEQ);
+		if (rp == NULL) {
+			printf("test_ring_negative failed to create ring\n");
+			goto test_fail;
+		}
+
+		if (rte_ring_lookup("test_ring_negative") != rp)
+			goto test_fail;
+
+		if (rte_ring_empty(rp) != 1) {
+			printf("test_ring_negative ring is not empty but it should be\n");
+			goto test_fail;
+		}
+
+		/* Test that creating a ring with a name that is already
+		 * in use always fails.
+		 */
+		rt = test_ring_create("test_ring_negative", esize[i], RING_SIZE,
+					SOCKET_ID_ANY, 0);
+		if (rt != NULL)
+			goto test_fail;
+
+		rte_ring_free(rp);
+	}
 
-	cur_src = src;
-	cur_dst = dst;
+	return 0;
 
-	printf("test default bulk enqueue / dequeue\n");
-	num_elems = 16;
+test_fail:
 
-	cur_src = src;
-	cur_dst = dst;
+	rte_ring_free(rp);
+	return -1;
+}
 
-	ret = rte_ring_enqueue_bulk(r, cur_src, num_elems, NULL);
-	cur_src += num_elems;
-	if (ret == 0) {
-		printf("Cannot enqueue\n");
-		goto fail;
-	}
-	ret = rte_ring_enqueue_bulk(r, cur_src, num_elems, NULL);
-	cur_src += num_elems;
-	if (ret == 0) {
-		printf("Cannot enqueue\n");
-		goto fail;
-	}
-	ret = rte_ring_dequeue_bulk(r, cur_dst, num_elems, NULL);
-	cur_dst += num_elems;
-	if (ret == 0) {
-		printf("Cannot dequeue\n");
-		goto fail;
-	}
-	ret = rte_ring_dequeue_bulk(r, cur_dst, num_elems, NULL);
-	cur_dst += num_elems;
-	if (ret == 0) {
-		printf("Cannot dequeue2\n");
-		goto fail;
-	}
+/*
+ * Burst and bulk operations with sp/sc, mp/mc and default (during creation)
+ * A random number of elements is enqueued and dequeued.
+ */
+static int
+test_ring_burst_bulk_tests1(unsigned int api_type)
+{
+	struct rte_ring *r;
+	void **src = NULL, **cur_src = NULL, **dst = NULL, **cur_dst = NULL;
+	int ret;
+	unsigned int i, j;
+	int rand;
+	const unsigned int rsz = RING_SIZE - 1;
 
-	/* check data */
-	if (memcmp(src, dst, cur_dst - dst)) {
-		rte_hexdump(stdout, "src", src, cur_src - src);
-		rte_hexdump(stdout, "dst", dst, cur_dst - dst);
-		printf("data after dequeue is not the same\n");
-		goto fail;
-	}
+	for (i = 0; i < RTE_DIM(esize); i++) {
+		test_ring_print_test_string("Test standard ring", api_type,
+						esize[i]);
 
-	cur_src = src;
-	cur_dst = dst;
+		/* Create the ring */
+		r = test_ring_create("test_ring_burst_bulk_tests", esize[i],
+					RING_SIZE, SOCKET_ID_ANY, 0);
 
-	ret = rte_ring_mp_enqueue(r, cur_src);
-	if (ret != 0)
-		goto fail;
+		/* alloc dummy object pointers */
+		src = test_ring_calloc(RING_SIZE * 2, esize[i]);
+		if (src == NULL)
+			goto fail;
+		test_ring_mem_init(src, RING_SIZE * 2, esize[i]);
+		cur_src = src;
 
-	ret = rte_ring_mc_dequeue(r, cur_dst);
-	if (ret != 0)
-		goto fail;
+		/* alloc some room for copied objects */
+		dst = test_ring_calloc(RING_SIZE * 2, esize[i]);
+		if (dst == NULL)
+			goto fail;
+		cur_dst = dst;
+
+		printf("Random full/empty test\n");
+
+		for (j = 0; j != TEST_RING_FULL_EMTPY_ITER; j++) {
+			/* random shift in the ring */
+			rand = RTE_MAX(rte_rand() % RING_SIZE, 1UL);
+			printf("%s: iteration %u, random shift: %d;\n",
+			    __func__, j, rand);
+			ret = test_ring_enqueue(r, cur_src, esize[i], rand,
+							api_type);
+			TEST_RING_VERIFY(ret != 0);
+
+			ret = test_ring_dequeue(r, cur_dst, esize[i], rand,
+							api_type);
+			TEST_RING_VERIFY(ret == rand);
+
+			/* fill the ring */
+			ret = test_ring_enqueue(r, cur_src, esize[i], rsz,
+							api_type);
+			TEST_RING_VERIFY(ret != 0);
+
+			TEST_RING_VERIFY(rte_ring_free_count(r) == 0);
+			TEST_RING_VERIFY(rsz == rte_ring_count(r));
+			TEST_RING_VERIFY(rte_ring_full(r));
+			TEST_RING_VERIFY(rte_ring_empty(r) == 0);
+
+			/* empty the ring */
+			ret = test_ring_dequeue(r, cur_dst, esize[i], rsz,
+							api_type);
+			TEST_RING_VERIFY(ret == (int)rsz);
+			TEST_RING_VERIFY(rsz == rte_ring_free_count(r));
+			TEST_RING_VERIFY(rte_ring_count(r) == 0);
+			TEST_RING_VERIFY(rte_ring_full(r) == 0);
+			TEST_RING_VERIFY(rte_ring_empty(r));
+
+			/* check data */
+			TEST_RING_VERIFY(memcmp(src, dst, rsz) == 0);
+		}
+
+		/* Free memory before test completed */
+		rte_ring_free(r);
+		rte_free(src);
+		rte_free(dst);
+	}
 
-	free(src);
-	free(dst);
 	return 0;
-
- fail:
-	free(src);
-	free(dst);
+fail:
+	rte_ring_free(r);
+	rte_free(src);
+	rte_free(dst);
 	return -1;
 }
 
+/*
+ * Burst and bulk operations with sp/sc, mp/mc and default (during creation)
+ * Sequence of simple enqueues/dequeues and validate the enqueued and
+ * dequeued data.
+ */
 static int
-test_ring_burst_basic(struct rte_ring *r)
+test_ring_burst_bulk_tests2(unsigned int api_type)
 {
+	struct rte_ring *r;
 	void **src = NULL, **cur_src = NULL, **dst = NULL, **cur_dst = NULL;
 	int ret;
-	unsigned i;
+	unsigned int i;
 
-	/* alloc dummy object pointers */
-	src = malloc(RING_SIZE*2*sizeof(void *));
-	if (src == NULL)
-		goto fail;
+	for (i = 0; i < RTE_DIM(esize); i++) {
+		test_ring_print_test_string("Test standard ring", api_type,
+						esize[i]);
 
-	for (i = 0; i < RING_SIZE*2 ; i++) {
-		src[i] = (void *)(unsigned long)i;
-	}
-	cur_src = src;
-
-	/* alloc some room for copied objects */
-	dst = malloc(RING_SIZE*2*sizeof(void *));
-	if (dst == NULL)
-		goto fail;
-
-	memset(dst, 0, RING_SIZE*2*sizeof(void *));
-	cur_dst = dst;
-
-	printf("Test SP & SC basic functions \n");
-	printf("enqueue 1 obj\n");
-	ret = rte_ring_sp_enqueue_burst(r, cur_src, 1, NULL);
-	cur_src += 1;
-	if (ret != 1)
-		goto fail;
-
-	printf("enqueue 2 objs\n");
-	ret = rte_ring_sp_enqueue_burst(r, cur_src, 2, NULL);
-	cur_src += 2;
-	if (ret != 2)
-		goto fail;
-
-	printf("enqueue MAX_BULK objs\n");
-	ret = rte_ring_sp_enqueue_burst(r, cur_src, MAX_BULK, NULL);
-	cur_src += MAX_BULK;
-	if (ret != MAX_BULK)
-		goto fail;
-
-	printf("dequeue 1 obj\n");
-	ret = rte_ring_sc_dequeue_burst(r, cur_dst, 1, NULL);
-	cur_dst += 1;
-	if (ret != 1)
-		goto fail;
-
-	printf("dequeue 2 objs\n");
-	ret = rte_ring_sc_dequeue_burst(r, cur_dst, 2, NULL);
-	cur_dst += 2;
-	if (ret != 2)
-		goto fail;
-
-	printf("dequeue MAX_BULK objs\n");
-	ret = rte_ring_sc_dequeue_burst(r, cur_dst, MAX_BULK, NULL);
-	cur_dst += MAX_BULK;
-	if (ret != MAX_BULK)
-		goto fail;
-
-	/* check data */
-	if (memcmp(src, dst, cur_dst - dst)) {
-		rte_hexdump(stdout, "src", src, cur_src - src);
-		rte_hexdump(stdout, "dst", dst, cur_dst - dst);
-		printf("data after dequeue is not the same\n");
-		goto fail;
-	}
+		/* Create the ring */
+		r = test_ring_create("test_ring_burst_bulk_tests", esize[i],
+					RING_SIZE, SOCKET_ID_ANY, 0);
 
-	cur_src = src;
-	cur_dst = dst;
-
-	printf("Test enqueue without enough memory space \n");
-	for (i = 0; i< (RING_SIZE/MAX_BULK - 1); i++) {
-		ret = rte_ring_sp_enqueue_burst(r, cur_src, MAX_BULK, NULL);
-		cur_src += MAX_BULK;
-		if (ret != MAX_BULK)
+		/* alloc dummy object pointers */
+		src = test_ring_calloc(RING_SIZE * 2, esize[i]);
+		if (src == NULL)
 			goto fail;
-	}
+		test_ring_mem_init(src, RING_SIZE * 2, esize[i]);
+		cur_src = src;
 
-	printf("Enqueue 2 objects, free entries = MAX_BULK - 2  \n");
-	ret = rte_ring_sp_enqueue_burst(r, cur_src, 2, NULL);
-	cur_src += 2;
-	if (ret != 2)
-		goto fail;
-
-	printf("Enqueue the remaining entries = MAX_BULK - 2  \n");
-	/* Always one free entry left */
-	ret = rte_ring_sp_enqueue_burst(r, cur_src, MAX_BULK, NULL);
-	cur_src += MAX_BULK - 3;
-	if (ret != MAX_BULK - 3)
-		goto fail;
-
-	printf("Test if ring is full  \n");
-	if (rte_ring_full(r) != 1)
-		goto fail;
-
-	printf("Test enqueue for a full entry  \n");
-	ret = rte_ring_sp_enqueue_burst(r, cur_src, MAX_BULK, NULL);
-	if (ret != 0)
-		goto fail;
-
-	printf("Test dequeue without enough objects \n");
-	for (i = 0; i<RING_SIZE/MAX_BULK - 1; i++) {
-		ret = rte_ring_sc_dequeue_burst(r, cur_dst, MAX_BULK, NULL);
-		cur_dst += MAX_BULK;
-		if (ret != MAX_BULK)
+		/* alloc some room for copied objects */
+		dst = test_ring_calloc(RING_SIZE * 2, esize[i]);
+		if (dst == NULL)
 			goto fail;
-	}
+		cur_dst = dst;
 
-	/* Available memory space for the exact MAX_BULK entries */
-	ret = rte_ring_sc_dequeue_burst(r, cur_dst, 2, NULL);
-	cur_dst += 2;
-	if (ret != 2)
-		goto fail;
-
-	ret = rte_ring_sc_dequeue_burst(r, cur_dst, MAX_BULK, NULL);
-	cur_dst += MAX_BULK - 3;
-	if (ret != MAX_BULK - 3)
-		goto fail;
-
-	printf("Test if ring is empty \n");
-	/* Check if ring is empty */
-	if (1 != rte_ring_empty(r))
-		goto fail;
-
-	/* check data */
-	if (memcmp(src, dst, cur_dst - dst)) {
-		rte_hexdump(stdout, "src", src, cur_src - src);
-		rte_hexdump(stdout, "dst", dst, cur_dst - dst);
-		printf("data after dequeue is not the same\n");
-		goto fail;
-	}
-
-	cur_src = src;
-	cur_dst = dst;
-
-	printf("Test MP & MC basic functions \n");
-
-	printf("enqueue 1 obj\n");
-	ret = rte_ring_mp_enqueue_burst(r, cur_src, 1, NULL);
-	cur_src += 1;
-	if (ret != 1)
-		goto fail;
-
-	printf("enqueue 2 objs\n");
-	ret = rte_ring_mp_enqueue_burst(r, cur_src, 2, NULL);
-	cur_src += 2;
-	if (ret != 2)
-		goto fail;
-
-	printf("enqueue MAX_BULK objs\n");
-	ret = rte_ring_mp_enqueue_burst(r, cur_src, MAX_BULK, NULL);
-	cur_src += MAX_BULK;
-	if (ret != MAX_BULK)
-		goto fail;
-
-	printf("dequeue 1 obj\n");
-	ret = rte_ring_mc_dequeue_burst(r, cur_dst, 1, NULL);
-	cur_dst += 1;
-	if (ret != 1)
-		goto fail;
-
-	printf("dequeue 2 objs\n");
-	ret = rte_ring_mc_dequeue_burst(r, cur_dst, 2, NULL);
-	cur_dst += 2;
-	if (ret != 2)
-		goto fail;
-
-	printf("dequeue MAX_BULK objs\n");
-	ret = rte_ring_mc_dequeue_burst(r, cur_dst, MAX_BULK, NULL);
-	cur_dst += MAX_BULK;
-	if (ret != MAX_BULK)
-		goto fail;
-
-	/* check data */
-	if (memcmp(src, dst, cur_dst - dst)) {
-		rte_hexdump(stdout, "src", src, cur_src - src);
-		rte_hexdump(stdout, "dst", dst, cur_dst - dst);
-		printf("data after dequeue is not the same\n");
-		goto fail;
-	}
-
-	cur_src = src;
-	cur_dst = dst;
+		printf("enqueue 1 obj\n");
+		ret = test_ring_enqueue(r, cur_src, esize[i], 1, api_type);
+		if (ret != 1)
+			goto fail;
+		cur_src = test_ring_inc_ptr(cur_src, esize[i], 1);
 
-	printf("fill and empty the ring\n");
-	for (i = 0; i<RING_SIZE/MAX_BULK; i++) {
-		ret = rte_ring_mp_enqueue_burst(r, cur_src, MAX_BULK, NULL);
-		cur_src += MAX_BULK;
-		if (ret != MAX_BULK)
+		printf("enqueue 2 objs\n");
+		ret = test_ring_enqueue(r, cur_src, esize[i], 2, api_type);
+		if (ret != 2)
 			goto fail;
-		ret = rte_ring_mc_dequeue_burst(r, cur_dst, MAX_BULK, NULL);
-		cur_dst += MAX_BULK;
+		cur_src = test_ring_inc_ptr(cur_src, esize[i], 2);
+
+		printf("enqueue MAX_BULK objs\n");
+		ret = test_ring_enqueue(r, cur_src, esize[i], MAX_BULK,
+						api_type);
 		if (ret != MAX_BULK)
 			goto fail;
-	}
+		cur_src = test_ring_inc_ptr(cur_src, esize[i], MAX_BULK);
 
-	/* check data */
-	if (memcmp(src, dst, cur_dst - dst)) {
-		rte_hexdump(stdout, "src", src, cur_src - src);
-		rte_hexdump(stdout, "dst", dst, cur_dst - dst);
-		printf("data after dequeue is not the same\n");
-		goto fail;
-	}
+		printf("dequeue 1 obj\n");
+		ret = test_ring_dequeue(r, cur_dst, esize[i], 1, api_type);
+		if (ret != 1)
+			goto fail;
+		cur_dst = test_ring_inc_ptr(cur_dst, esize[i], 1);
 
-	cur_src = src;
-	cur_dst = dst;
+		printf("dequeue 2 objs\n");
+		ret = test_ring_dequeue(r, cur_dst, esize[i], 2, api_type);
+		if (ret != 2)
+			goto fail;
+		cur_dst = test_ring_inc_ptr(cur_dst, esize[i], 2);
 
-	printf("Test enqueue without enough memory space \n");
-	for (i = 0; i<RING_SIZE/MAX_BULK - 1; i++) {
-		ret = rte_ring_mp_enqueue_burst(r, cur_src, MAX_BULK, NULL);
-		cur_src += MAX_BULK;
+		printf("dequeue MAX_BULK objs\n");
+		ret = test_ring_dequeue(r, cur_dst, esize[i], MAX_BULK,
+						api_type);
 		if (ret != MAX_BULK)
 			goto fail;
-	}
-
-	/* Available memory space for the exact MAX_BULK objects */
-	ret = rte_ring_mp_enqueue_burst(r, cur_src, 2, NULL);
-	cur_src += 2;
-	if (ret != 2)
-		goto fail;
+		cur_dst = test_ring_inc_ptr(cur_dst, esize[i], MAX_BULK);
 
-	ret = rte_ring_mp_enqueue_burst(r, cur_src, MAX_BULK, NULL);
-	cur_src += MAX_BULK - 3;
-	if (ret != MAX_BULK - 3)
-		goto fail;
-
-
-	printf("Test dequeue without enough objects \n");
-	for (i = 0; i<RING_SIZE/MAX_BULK - 1; i++) {
-		ret = rte_ring_mc_dequeue_burst(r, cur_dst, MAX_BULK, NULL);
-		cur_dst += MAX_BULK;
-		if (ret != MAX_BULK)
+		/* check data */
+		if (memcmp(src, dst, cur_dst - dst)) {
+			rte_hexdump(stdout, "src", src, cur_src - src);
+			rte_hexdump(stdout, "dst", dst, cur_dst - dst);
+			printf("data after dequeue is not the same\n");
 			goto fail;
-	}
+		}
 
-	/* Available objects - the exact MAX_BULK */
-	ret = rte_ring_mc_dequeue_burst(r, cur_dst, 2, NULL);
-	cur_dst += 2;
-	if (ret != 2)
-		goto fail;
-
-	ret = rte_ring_mc_dequeue_burst(r, cur_dst, MAX_BULK, NULL);
-	cur_dst += MAX_BULK - 3;
-	if (ret != MAX_BULK - 3)
-		goto fail;
-
-	/* check data */
-	if (memcmp(src, dst, cur_dst - dst)) {
-		rte_hexdump(stdout, "src", src, cur_src - src);
-		rte_hexdump(stdout, "dst", dst, cur_dst - dst);
-		printf("data after dequeue is not the same\n");
-		goto fail;
+		/* Free memory before test completed */
+		rte_ring_free(r);
+		rte_free(src);
+		rte_free(dst);
 	}
 
-	cur_src = src;
-	cur_dst = dst;
-
-	printf("Covering rte_ring_enqueue_burst functions \n");
-
-	ret = rte_ring_enqueue_burst(r, cur_src, 2, NULL);
-	cur_src += 2;
-	if (ret != 2)
-		goto fail;
-
-	ret = rte_ring_dequeue_burst(r, cur_dst, 2, NULL);
-	cur_dst += 2;
-	if (ret != 2)
-		goto fail;
-
-	/* Free memory before test completed */
-	free(src);
-	free(dst);
 	return 0;
-
- fail:
-	free(src);
-	free(dst);
+fail:
+	rte_ring_free(r);
+	rte_free(src);
+	rte_free(dst);
 	return -1;
 }
 
 /*
- * it will always fail to create ring with a wrong ring size number in this function
+ * Burst and bulk operations with sp/sc, mp/mc and default (during creation)
+ * Enqueue and dequeue to cover the entire ring length.
  */
 static int
-test_ring_creation_with_wrong_size(void)
+test_ring_burst_bulk_tests3(unsigned int api_type)
 {
-	struct rte_ring * rp = NULL;
+	struct rte_ring *r;
+	void **src = NULL, **cur_src = NULL, **dst = NULL, **cur_dst = NULL;
+	int ret;
+	unsigned int i, j;
 
-	/* Test if ring size is not power of 2 */
-	rp = rte_ring_create("test_bad_ring_size", RING_SIZE + 1, SOCKET_ID_ANY, 0);
-	if (NULL != rp) {
-		return -1;
-	}
+	for (i = 0; i < RTE_DIM(esize); i++) {
+		test_ring_print_test_string("Test standard ring", api_type,
+						esize[i]);
+
+		/* Create the ring */
+		r = test_ring_create("test_ring_burst_bulk_tests", esize[i],
+					RING_SIZE, SOCKET_ID_ANY, 0);
+
+		/* alloc dummy object pointers */
+		src = test_ring_calloc(RING_SIZE * 2, esize[i]);
+		if (src == NULL)
+			goto fail;
+		test_ring_mem_init(src, RING_SIZE * 2, esize[i]);
+		cur_src = src;
+
+		/* alloc some room for copied objects */
+		dst = test_ring_calloc(RING_SIZE * 2, esize[i]);
+		if (dst == NULL)
+			goto fail;
+		cur_dst = dst;
+
+		printf("fill and empty the ring\n");
+		for (j = 0; j < RING_SIZE / MAX_BULK; j++) {
+			ret = test_ring_enqueue(r, cur_src, esize[i], MAX_BULK,
+							api_type);
+			if (ret != MAX_BULK)
+				goto fail;
+			cur_src = test_ring_inc_ptr(cur_src, esize[i],
+								MAX_BULK);
+
+			ret = test_ring_dequeue(r, cur_dst, esize[i], MAX_BULK,
+							api_type);
+			if (ret != MAX_BULK)
+				goto fail;
+			cur_dst = test_ring_inc_ptr(cur_dst, esize[i],
+								MAX_BULK);
+		}
+
+		/* check data */
+		if (memcmp(src, dst, cur_dst - dst)) {
+			rte_hexdump(stdout, "src", src, cur_src - src);
+			rte_hexdump(stdout, "dst", dst, cur_dst - dst);
+			printf("data after dequeue is not the same\n");
+			goto fail;
+		}
 
-	/* Test if ring size is exceeding the limit */
-	rp = rte_ring_create("test_bad_ring_size", (RTE_RING_SZ_MASK + 1), SOCKET_ID_ANY, 0);
-	if (NULL != rp) {
-		return -1;
+		/* Free memory before test completed */
+		rte_ring_free(r);
+		rte_free(src);
+		rte_free(dst);
 	}
+
 	return 0;
+fail:
+	rte_ring_free(r);
+	rte_free(src);
+	rte_free(dst);
+	return -1;
 }
 
 /*
- * it tests if it would always fail to create ring with an used ring name
+ * Burst and bulk operations with sp/sc, mp/mc and default (during creation)
+ * Enqueue till the ring is full and dequeue till the ring becomes empty.
  */
 static int
-test_ring_creation_with_an_used_name(void)
+test_ring_burst_bulk_tests4(unsigned int api_type)
 {
-	struct rte_ring * rp;
+	struct rte_ring *r;
+	void **src = NULL, **cur_src = NULL, **dst = NULL, **cur_dst = NULL;
+	int ret;
+	unsigned int i, j;
+	unsigned int num_elems;
 
-	rp = rte_ring_create("test", RING_SIZE, SOCKET_ID_ANY, 0);
-	if (NULL != rp)
-		return -1;
+	for (i = 0; i < RTE_DIM(esize); i++) {
+		test_ring_print_test_string("Test standard ring", api_type,
+						esize[i]);
 
-	return 0;
-}
+		/* Create the ring */
+		r = test_ring_create("test_ring_burst_bulk_tests", esize[i],
+					RING_SIZE, SOCKET_ID_ANY, 0);
 
-/*
- * Test to if a non-power of 2 count causes the create
- * function to fail correctly
- */
-static int
-test_create_count_odd(void)
-{
-	struct rte_ring *r = rte_ring_create("test_ring_count",
-			4097, SOCKET_ID_ANY, 0 );
-	if(r != NULL){
-		return -1;
-	}
-	return 0;
-}
+		/* alloc dummy object pointers */
+		src = test_ring_calloc(RING_SIZE * 2, esize[i]);
+		if (src == NULL)
+			goto fail;
+		test_ring_mem_init(src, RING_SIZE * 2, esize[i]);
+		cur_src = src;
 
-static int
-test_lookup_null(void)
-{
-	struct rte_ring *rlp = rte_ring_lookup("ring_not_found");
-	if (rlp ==NULL)
-	if (rte_errno != ENOENT){
-		printf( "test failed to returnn error on null pointer\n");
-		return -1;
+		/* alloc some room for copied objects */
+		dst = test_ring_calloc(RING_SIZE * 2, esize[i]);
+		if (dst == NULL)
+			goto fail;
+		cur_dst = dst;
+
+		printf("Test enqueue without enough memory space\n");
+		for (j = 0; j < (RING_SIZE/MAX_BULK - 1); j++) {
+			ret = test_ring_enqueue(r, cur_src, esize[i], MAX_BULK,
+							api_type);
+			if (ret != MAX_BULK)
+				goto fail;
+			cur_src = test_ring_inc_ptr(cur_src, esize[i],
+								MAX_BULK);
+		}
+
+		printf("Enqueue 2 objects, free entries = MAX_BULK - 2\n");
+		ret = test_ring_enqueue(r, cur_src, esize[i], 2, api_type);
+		if (ret != 2)
+			goto fail;
+		cur_src = test_ring_inc_ptr(cur_src, esize[i], 2);
+
+		printf("Enqueue the remaining entries = MAX_BULK - 3\n");
+		/* Bulk APIs enqueue exact number of elements */
+		if ((api_type & TEST_RING_ELEM_BULK) == TEST_RING_ELEM_BULK)
+			num_elems = MAX_BULK - 3;
+		else
+			num_elems = MAX_BULK;
+		/* Always one free entry left */
+		ret = test_ring_enqueue(r, cur_src, esize[i], num_elems,
+						api_type);
+		if (ret != MAX_BULK - 3)
+			goto fail;
+		cur_src = test_ring_inc_ptr(cur_src, esize[i], MAX_BULK - 3);
+
+		printf("Test if ring is full\n");
+		if (rte_ring_full(r) != 1)
+			goto fail;
+
+		printf("Test enqueue for a full entry\n");
+		ret = test_ring_enqueue(r, cur_src, esize[i], MAX_BULK,
+						api_type);
+		if (ret != 0)
+			goto fail;
+
+		printf("Test dequeue without enough objects\n");
+		for (j = 0; j < RING_SIZE / MAX_BULK - 1; j++) {
+			ret = test_ring_dequeue(r, cur_dst, esize[i], MAX_BULK,
+							api_type);
+			if (ret != MAX_BULK)
+				goto fail;
+			cur_dst = test_ring_inc_ptr(cur_dst, esize[i],
+								MAX_BULK);
+		}
+
+		/* Available memory space for the exact MAX_BULK entries */
+		ret = test_ring_dequeue(r, cur_dst, esize[i], 2, api_type);
+		if (ret != 2)
+			goto fail;
+		cur_dst = test_ring_inc_ptr(cur_dst, esize[i], 2);
+
+		/* Bulk APIs dequeue exact number of elements */
+		if ((api_type & TEST_RING_ELEM_BULK) == TEST_RING_ELEM_BULK)
+			num_elems = MAX_BULK - 3;
+		else
+			num_elems = MAX_BULK;
+		ret = test_ring_dequeue(r, cur_dst, esize[i], num_elems,
+						api_type);
+		if (ret != MAX_BULK - 3)
+			goto fail;
+		cur_dst = test_ring_inc_ptr(cur_dst, esize[i], MAX_BULK - 3);
+
+		printf("Test if ring is empty\n");
+		/* Check if ring is empty */
+		if (rte_ring_empty(r) != 1)
+			goto fail;
+
+		/* check data */
+		if (memcmp(src, dst, cur_dst - dst)) {
+			rte_hexdump(stdout, "src", src, cur_src - src);
+			rte_hexdump(stdout, "dst", dst, cur_dst - dst);
+			printf("data after dequeue is not the same\n");
+			goto fail;
+		}
+
+		/* Free memory before test completed */
+		rte_ring_free(r);
+		rte_free(src);
+		rte_free(dst);
 	}
+
 	return 0;
+fail:
+	rte_ring_free(r);
+	rte_free(src);
+	rte_free(dst);
+	return -1;
 }
 
 /*
- * it tests some more basic ring operations
+ * Test default, single element, bulk and burst APIs
  */
 static int
 test_ring_basic_ex(void)
 {
 	int ret = -1;
-	unsigned i;
+	unsigned int i, j;
 	struct rte_ring *rp = NULL;
-	void **obj = NULL;
-
-	obj = rte_calloc("test_ring_basic_ex_malloc", RING_SIZE, sizeof(void *), 0);
-	if (obj == NULL) {
-		printf("test_ring_basic_ex fail to rte_malloc\n");
-		goto fail_test;
-	}
-
-	rp = rte_ring_create("test_ring_basic_ex", RING_SIZE, SOCKET_ID_ANY,
-			RING_F_SP_ENQ | RING_F_SC_DEQ);
-	if (rp == NULL) {
-		printf("test_ring_basic_ex fail to create ring\n");
-		goto fail_test;
-	}
-
-	if (rte_ring_lookup("test_ring_basic_ex") != rp) {
-		goto fail_test;
-	}
-
-	if (rte_ring_empty(rp) != 1) {
-		printf("test_ring_basic_ex ring is not empty but it should be\n");
-		goto fail_test;
-	}
-
-	printf("%u ring entries are now free\n", rte_ring_free_count(rp));
+	void *obj = NULL;
+
+	for (i = 0; i < RTE_DIM(esize); i++) {
+		obj = test_ring_calloc(RING_SIZE, esize[i]);
+		if (obj == NULL) {
+			printf("%s: failed to alloc memory\n", __func__);
+			goto fail_test;
+		}
+
+		rp = test_ring_create("test_ring_basic_ex", esize[i], RING_SIZE,
+					SOCKET_ID_ANY,
+					RING_F_SP_ENQ | RING_F_SC_DEQ);
+		if (rp == NULL) {
+			printf("%s: failed to create ring\n", __func__);
+			goto fail_test;
+		}
+
+		if (rte_ring_lookup("test_ring_basic_ex") != rp) {
+			printf("%s: failed to find ring\n", __func__);
+			goto fail_test;
+		}
+
+		if (rte_ring_empty(rp) != 1) {
+			printf("%s: ring is not empty but it should be\n",
+				__func__);
+			goto fail_test;
+		}
 
-	for (i = 0; i < RING_SIZE; i ++) {
-		rte_ring_enqueue(rp, obj[i]);
-	}
+		printf("%u ring entries are now free\n",
+			rte_ring_free_count(rp));
 
-	if (rte_ring_full(rp) != 1) {
-		printf("test_ring_basic_ex ring is not full but it should be\n");
-		goto fail_test;
-	}
+		for (j = 0; j < RING_SIZE; j++) {
+			test_ring_enqueue(rp, obj, esize[i], 1,
+				TEST_RING_THREAD_DEF | TEST_RING_ELEM_SINGLE);
+		}
 
-	for (i = 0; i < RING_SIZE; i ++) {
-		rte_ring_dequeue(rp, &obj[i]);
-	}
+		if (rte_ring_full(rp) != 1) {
+			printf("%s: ring is not full but it should be\n",
+				__func__);
+			goto fail_test;
+		}
 
-	if (rte_ring_empty(rp) != 1) {
-		printf("test_ring_basic_ex ring is not empty but it should be\n");
-		goto fail_test;
-	}
+		for (j = 0; j < RING_SIZE; j++) {
+			test_ring_dequeue(rp, obj, esize[i], 1,
+				TEST_RING_THREAD_DEF | TEST_RING_ELEM_SINGLE);
+		}
 
-	/* Covering the ring burst operation */
-	ret = rte_ring_enqueue_burst(rp, obj, 2, NULL);
-	if (ret != 2) {
-		printf("test_ring_basic_ex: rte_ring_enqueue_burst fails \n");
-		goto fail_test;
+		if (rte_ring_empty(rp) != 1) {
+			printf("%s: ring is not empty but it should be\n",
+				__func__);
+			goto fail_test;
+		}
+
+		/* Following tests use the configured flags to decide
+		 * SP/SC or MP/MC.
+		 */
+		/* Covering the ring burst operation */
+		ret = test_ring_enqueue(rp, obj, esize[i], 2,
+				TEST_RING_THREAD_DEF | TEST_RING_ELEM_BURST);
+		if (ret != 2) {
+			printf("%s: rte_ring_enqueue_burst fails\n", __func__);
+			goto fail_test;
+		}
+
+		ret = test_ring_dequeue(rp, obj, esize[i], 2,
+				TEST_RING_THREAD_DEF | TEST_RING_ELEM_BURST);
+		if (ret != 2) {
+			printf("%s: rte_ring_dequeue_burst fails\n", __func__);
+			goto fail_test;
+		}
+
+		/* Covering the ring bulk operation */
+		ret = test_ring_enqueue(rp, obj, esize[i], 2,
+				TEST_RING_THREAD_DEF | TEST_RING_ELEM_BULK);
+		if (ret != 2) {
+			printf("%s: rte_ring_enqueue_bulk fails\n", __func__);
+			goto fail_test;
+		}
+
+		ret = test_ring_dequeue(rp, obj, esize[i], 2,
+				TEST_RING_THREAD_DEF | TEST_RING_ELEM_BULK);
+		if (ret != 2) {
+			printf("%s: rte_ring_dequeue_bulk fails\n", __func__);
+			goto fail_test;
+		}
+
+		rte_ring_free(rp);
+		rte_free(obj);
+		rp = NULL;
+		obj = NULL;
 	}
 
-	ret = rte_ring_dequeue_burst(rp, obj, 2, NULL);
-	if (ret != 2) {
-		printf("test_ring_basic_ex: rte_ring_dequeue_burst fails \n");
-		goto fail_test;
-	}
+	return 0;
 
-	ret = 0;
 fail_test:
 	rte_ring_free(rp);
 	if (obj != NULL)
 		rte_free(obj);
 
-	return ret;
+	return -1;
 }
 
+/*
+ * Basic test cases with exact size ring.
+ */
 static int
 test_ring_with_exact_size(void)
 {
-	struct rte_ring *std_ring = NULL, *exact_sz_ring = NULL;
-	void *ptr_array[16];
-	static const unsigned int ring_sz = RTE_DIM(ptr_array);
-	unsigned int i;
+	struct rte_ring *std_r = NULL, *exact_sz_r = NULL;
+	void *obj;
+	const unsigned int ring_sz = 16;
+	unsigned int i, j;
 	int ret = -1;
 
-	std_ring = rte_ring_create("std", ring_sz, rte_socket_id(),
-			RING_F_SP_ENQ | RING_F_SC_DEQ);
-	if (std_ring == NULL) {
-		printf("%s: error, can't create std ring\n", __func__);
-		goto end;
-	}
-	exact_sz_ring = rte_ring_create("exact sz", ring_sz, rte_socket_id(),
-			RING_F_SP_ENQ | RING_F_SC_DEQ | RING_F_EXACT_SZ);
-	if (exact_sz_ring == NULL) {
-		printf("%s: error, can't create exact size ring\n", __func__);
-		goto end;
-	}
+	for (i = 0; i < RTE_DIM(esize); i++) {
+		test_ring_print_test_string("Test exact size ring",
+				TEST_RING_IGNORE_API_TYPE,
+				esize[i]);
+
+		/* alloc object pointers */
+		obj = test_ring_calloc(ring_sz, esize[i]);
+		if (obj == NULL)
+			goto test_fail;
+
+		std_r = test_ring_create("std", esize[i], ring_sz,
+					rte_socket_id(),
+					RING_F_SP_ENQ | RING_F_SC_DEQ);
+		if (std_r == NULL) {
+			printf("%s: error, can't create std ring\n", __func__);
+			goto test_fail;
+		}
+		exact_sz_r = test_ring_create("exact sz", esize[i], ring_sz,
+				rte_socket_id(),
+				RING_F_SP_ENQ | RING_F_SC_DEQ |
+				RING_F_EXACT_SZ);
+		if (exact_sz_r == NULL) {
+			printf("%s: error, can't create exact size ring\n",
+					__func__);
+			goto test_fail;
+		}
+
+		/*
+		 * Check that the exact size ring is bigger than the
+		 * standard ring
+		 */
+		if (rte_ring_get_size(std_r) >= rte_ring_get_size(exact_sz_r)) {
+			printf("%s: error, std ring (size: %u) is not smaller than exact size one (size %u)\n",
+					__func__,
+					rte_ring_get_size(std_r),
+					rte_ring_get_size(exact_sz_r));
+			goto test_fail;
+		}
+		/*
+		 * check that the exact_sz_ring can hold one more element
+		 * than the standard ring. (16 vs 15 elements)
+		 */
+		for (j = 0; j < ring_sz - 1; j++) {
+			test_ring_enqueue(std_r, obj, esize[i], 1,
+				TEST_RING_THREAD_DEF | TEST_RING_ELEM_SINGLE);
+			test_ring_enqueue(exact_sz_r, obj, esize[i], 1,
+				TEST_RING_THREAD_DEF | TEST_RING_ELEM_SINGLE);
+		}
+		ret = test_ring_enqueue(std_r, obj, esize[i], 1,
+				TEST_RING_THREAD_DEF | TEST_RING_ELEM_SINGLE);
+		if (ret != -ENOBUFS) {
+			printf("%s: error, unexpected successful enqueue\n",
+				__func__);
+			goto test_fail;
+		}
+		ret = test_ring_enqueue(exact_sz_r, obj, esize[i], 1,
+				TEST_RING_THREAD_DEF | TEST_RING_ELEM_SINGLE);
+		if (ret == -ENOBUFS) {
+			printf("%s: error, enqueue failed\n", __func__);
+			goto test_fail;
+		}
+
+		/* check that dequeue returns the expected number of elements */
+		ret = test_ring_dequeue(exact_sz_r, obj, esize[i], ring_sz,
+				TEST_RING_THREAD_DEF | TEST_RING_ELEM_BURST);
+		if (ret != (int)ring_sz) {
+			printf("%s: error, failed to dequeue expected nb of elements\n",
+				__func__);
+			goto test_fail;
+		}
 
-	/*
-	 * Check that the exact size ring is bigger than the standard ring
-	 */
-	if (rte_ring_get_size(std_ring) >= rte_ring_get_size(exact_sz_ring)) {
-		printf("%s: error, std ring (size: %u) is not smaller than exact size one (size %u)\n",
-				__func__,
-				rte_ring_get_size(std_ring),
-				rte_ring_get_size(exact_sz_ring));
-		goto end;
-	}
-	/*
-	 * check that the exact_sz_ring can hold one more element than the
-	 * standard ring. (16 vs 15 elements)
-	 */
-	for (i = 0; i < ring_sz - 1; i++) {
-		rte_ring_enqueue(std_ring, NULL);
-		rte_ring_enqueue(exact_sz_ring, NULL);
-	}
-	if (rte_ring_enqueue(std_ring, NULL) != -ENOBUFS) {
-		printf("%s: error, unexpected successful enqueue\n", __func__);
-		goto end;
-	}
-	if (rte_ring_enqueue(exact_sz_ring, NULL) == -ENOBUFS) {
-		printf("%s: error, enqueue failed\n", __func__);
-		goto end;
-	}
+		/* check that the capacity function returns expected value */
+		if (rte_ring_get_capacity(exact_sz_r) != ring_sz) {
+			printf("%s: error, incorrect ring capacity reported\n",
+					__func__);
+			goto test_fail;
+		}
 
-	/* check that dequeue returns the expected number of elements */
-	if (rte_ring_dequeue_burst(exact_sz_ring, ptr_array,
-			RTE_DIM(ptr_array), NULL) != ring_sz) {
-		printf("%s: error, failed to dequeue expected nb of elements\n",
-				__func__);
-		goto end;
+		rte_free(obj);
+		rte_ring_free(std_r);
+		rte_ring_free(exact_sz_r);
 	}
 
-	/* check that the capacity function returns expected value */
-	if (rte_ring_get_capacity(exact_sz_ring) != ring_sz) {
-		printf("%s: error, incorrect ring capacity reported\n",
-				__func__);
-		goto end;
-	}
+	return 0;
 
-	ret = 0; /* all ok if we get here */
-end:
-	rte_ring_free(std_ring);
-	rte_ring_free(exact_sz_ring);
-	return ret;
+test_fail:
+	rte_free(obj);
+	rte_ring_free(std_r);
+	rte_ring_free(exact_sz_r);
+	return -1;
 }
 
 static int
 test_ring(void)
 {
-	struct rte_ring *r = NULL;
-
-	/* some more basic operations */
-	if (test_ring_basic_ex() < 0)
-		goto test_fail;
-
-	rte_atomic32_init(&synchro);
-
-	r = rte_ring_create("test", RING_SIZE, SOCKET_ID_ANY, 0);
-	if (r == NULL)
-		goto test_fail;
-
-	/* retrieve the ring from its name */
-	if (rte_ring_lookup("test") != r) {
-		printf("Cannot lookup ring from its name\n");
-		goto test_fail;
-	}
-
-	/* burst operations */
-	if (test_ring_burst_basic(r) < 0)
-		goto test_fail;
+	unsigned int i, j;
 
-	/* basic operations */
-	if (test_ring_basic(r) < 0)
+	/* Negative test cases */
+	if (test_ring_negative_tests() < 0)
 		goto test_fail;
 
-	/* basic operations */
-	if ( test_create_count_odd() < 0){
-		printf("Test failed to detect odd count\n");
-		goto test_fail;
-	} else
-		printf("Test detected odd count\n");
-
-	if ( test_lookup_null() < 0){
-		printf("Test failed to detect NULL ring lookup\n");
-		goto test_fail;
-	} else
-		printf("Test detected NULL ring lookup\n");
-
-	/* test of creating ring with wrong size */
-	if (test_ring_creation_with_wrong_size() < 0)
-		goto test_fail;
-
-	/* test of creation ring with an used name */
-	if (test_ring_creation_with_an_used_name() < 0)
+	/* Some basic operations */
+	if (test_ring_basic_ex() < 0)
 		goto test_fail;
 
 	if (test_ring_with_exact_size() < 0)
 		goto test_fail;
 
+	/* Burst and bulk operations with sp/sc, mp/mc and default.
+	 * The test cases are split into smaller test cases to
+	 * help clang compile faster.
+	 */
+	for (j = TEST_RING_ELEM_BULK; j <= TEST_RING_ELEM_BURST; j <<= 1)
+		for (i = TEST_RING_THREAD_DEF;
+					i <= TEST_RING_THREAD_MPMC; i <<= 1)
+			if (test_ring_burst_bulk_tests1(i | j) < 0)
+				goto test_fail;
+
+	for (j = TEST_RING_ELEM_BULK; j <= TEST_RING_ELEM_BURST; j <<= 1)
+		for (i = TEST_RING_THREAD_DEF;
+					i <= TEST_RING_THREAD_MPMC; i <<= 1)
+			if (test_ring_burst_bulk_tests2(i | j) < 0)
+				goto test_fail;
+
+	for (j = TEST_RING_ELEM_BULK; j <= TEST_RING_ELEM_BURST; j <<= 1)
+		for (i = TEST_RING_THREAD_DEF;
+					i <= TEST_RING_THREAD_MPMC; i <<= 1)
+			if (test_ring_burst_bulk_tests3(i | j) < 0)
+				goto test_fail;
+
+	for (j = TEST_RING_ELEM_BULK; j <= TEST_RING_ELEM_BURST; j <<= 1)
+		for (i = TEST_RING_THREAD_DEF;
+					i <= TEST_RING_THREAD_MPMC; i <<= 1)
+			if (test_ring_burst_bulk_tests4(i | j) < 0)
+				goto test_fail;
+
 	/* dump the ring status */
 	rte_ring_list_dump(stdout);
 
-	rte_ring_free(r);
-
 	return 0;
 
 test_fail:
-	rte_ring_free(r);
 
 	return -1;
 }
diff --git a/app/test/test_ring.h b/app/test/test_ring.h
new file mode 100644
index 000000000..26716e4f8
--- /dev/null
+++ b/app/test/test_ring.h
@@ -0,0 +1,187 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2019 Arm Limited
+ */
+
+#include <rte_malloc.h>
+#include <rte_ring.h>
+#include <rte_ring_elem.h>
+
+/* API type to call
+ * rte_ring_<sp/mp or sc/mc>_enqueue_<bulk/burst>
+ * TEST_RING_THREAD_DEF - Uses configured SPSC/MPMC calls
+ * TEST_RING_THREAD_SPSC - Calls SP or SC API
+ * TEST_RING_THREAD_MPMC - Calls MP or MC API
+ */
+#define TEST_RING_THREAD_DEF 1
+#define TEST_RING_THREAD_SPSC 2
+#define TEST_RING_THREAD_MPMC 4
+
+/* API type to call
+ * TEST_RING_ELEM_SINGLE - Calls single element APIs
+ * TEST_RING_ELEM_BULK - Calls bulk APIs
+ * TEST_RING_ELEM_BURST - Calls burst APIs
+ */
+#define TEST_RING_ELEM_SINGLE 8
+#define TEST_RING_ELEM_BULK 16
+#define TEST_RING_ELEM_BURST 32
+
+#define TEST_RING_IGNORE_API_TYPE ~0U
+
+/* This function is placed here as it is required for both
+ * performance and functional tests.
+ */
+static inline struct rte_ring*
+test_ring_create(const char *name, int esize, unsigned int count,
+		int socket_id, unsigned int flags)
+{
+	/* Legacy queue APIs? */
+	if ((esize) == -1)
+		return rte_ring_create((name), (count), (socket_id), (flags));
+	else
+		return rte_ring_create_elem((name), (esize), (count),
+						(socket_id), (flags));
+}
+
+static __rte_always_inline unsigned int
+test_ring_enqueue(struct rte_ring *r, void **obj, int esize, unsigned int n,
+			unsigned int api_type)
+{
+	/* Legacy queue APIs? */
+	if ((esize) == -1)
+		switch (api_type) {
+		case (TEST_RING_THREAD_DEF | TEST_RING_ELEM_SINGLE):
+			return rte_ring_enqueue(r, obj);
+		case (TEST_RING_THREAD_SPSC | TEST_RING_ELEM_SINGLE):
+			return rte_ring_sp_enqueue(r, obj);
+		case (TEST_RING_THREAD_MPMC | TEST_RING_ELEM_SINGLE):
+			return rte_ring_mp_enqueue(r, obj);
+		case (TEST_RING_THREAD_DEF | TEST_RING_ELEM_BULK):
+			return rte_ring_enqueue_bulk(r, obj, n, NULL);
+		case (TEST_RING_THREAD_SPSC | TEST_RING_ELEM_BULK):
+			return rte_ring_sp_enqueue_bulk(r, obj, n, NULL);
+		case (TEST_RING_THREAD_MPMC | TEST_RING_ELEM_BULK):
+			return rte_ring_mp_enqueue_bulk(r, obj, n, NULL);
+		case (TEST_RING_THREAD_DEF | TEST_RING_ELEM_BURST):
+			return rte_ring_enqueue_burst(r, obj, n, NULL);
+		case (TEST_RING_THREAD_SPSC | TEST_RING_ELEM_BURST):
+			return rte_ring_sp_enqueue_burst(r, obj, n, NULL);
+		case (TEST_RING_THREAD_MPMC | TEST_RING_ELEM_BURST):
+			return rte_ring_mp_enqueue_burst(r, obj, n, NULL);
+		default:
+			printf("Invalid API type\n");
+			return 0;
+		}
+	else
+		switch (api_type) {
+		case (TEST_RING_THREAD_DEF | TEST_RING_ELEM_SINGLE):
+			return rte_ring_enqueue_elem(r, obj, esize);
+		case (TEST_RING_THREAD_SPSC | TEST_RING_ELEM_SINGLE):
+			return rte_ring_sp_enqueue_elem(r, obj, esize);
+		case (TEST_RING_THREAD_MPMC | TEST_RING_ELEM_SINGLE):
+			return rte_ring_mp_enqueue_elem(r, obj, esize);
+		case (TEST_RING_THREAD_DEF | TEST_RING_ELEM_BULK):
+			return rte_ring_enqueue_bulk_elem(r, obj, esize, n,
+								NULL);
+		case (TEST_RING_THREAD_SPSC | TEST_RING_ELEM_BULK):
+			return rte_ring_sp_enqueue_bulk_elem(r, obj, esize, n,
+								NULL);
+		case (TEST_RING_THREAD_MPMC | TEST_RING_ELEM_BULK):
+			return rte_ring_mp_enqueue_bulk_elem(r, obj, esize, n,
+								NULL);
+		case (TEST_RING_THREAD_DEF | TEST_RING_ELEM_BURST):
+			return rte_ring_enqueue_burst_elem(r, obj, esize, n,
+								NULL);
+		case (TEST_RING_THREAD_SPSC | TEST_RING_ELEM_BURST):
+			return rte_ring_sp_enqueue_burst_elem(r, obj, esize, n,
+								NULL);
+		case (TEST_RING_THREAD_MPMC | TEST_RING_ELEM_BURST):
+			return rte_ring_mp_enqueue_burst_elem(r, obj, esize, n,
+								NULL);
+		default:
+			printf("Invalid API type\n");
+			return 0;
+		}
+}
+
+static __rte_always_inline unsigned int
+test_ring_dequeue(struct rte_ring *r, void **obj, int esize, unsigned int n,
+			unsigned int api_type)
+{
+	/* Legacy queue APIs? */
+	if ((esize) == -1)
+		switch (api_type) {
+		case (TEST_RING_THREAD_DEF | TEST_RING_ELEM_SINGLE):
+			return rte_ring_dequeue(r, obj);
+		case (TEST_RING_THREAD_SPSC | TEST_RING_ELEM_SINGLE):
+			return rte_ring_sc_dequeue(r, obj);
+		case (TEST_RING_THREAD_MPMC | TEST_RING_ELEM_SINGLE):
+			return rte_ring_mc_dequeue(r, obj);
+		case (TEST_RING_THREAD_DEF | TEST_RING_ELEM_BULK):
+			return rte_ring_dequeue_bulk(r, obj, n, NULL);
+		case (TEST_RING_THREAD_SPSC | TEST_RING_ELEM_BULK):
+			return rte_ring_sc_dequeue_bulk(r, obj, n, NULL);
+		case (TEST_RING_THREAD_MPMC | TEST_RING_ELEM_BULK):
+			return rte_ring_mc_dequeue_bulk(r, obj, n, NULL);
+		case (TEST_RING_THREAD_DEF | TEST_RING_ELEM_BURST):
+			return rte_ring_dequeue_burst(r, obj, n, NULL);
+		case (TEST_RING_THREAD_SPSC | TEST_RING_ELEM_BURST):
+			return rte_ring_sc_dequeue_burst(r, obj, n, NULL);
+		case (TEST_RING_THREAD_MPMC | TEST_RING_ELEM_BURST):
+			return rte_ring_mc_dequeue_burst(r, obj, n, NULL);
+		default:
+			printf("Invalid API type\n");
+			return 0;
+		}
+	else
+		switch (api_type) {
+		case (TEST_RING_THREAD_DEF | TEST_RING_ELEM_SINGLE):
+			return rte_ring_dequeue_elem(r, obj, esize);
+		case (TEST_RING_THREAD_SPSC | TEST_RING_ELEM_SINGLE):
+			return rte_ring_sc_dequeue_elem(r, obj, esize);
+		case (TEST_RING_THREAD_MPMC | TEST_RING_ELEM_SINGLE):
+			return rte_ring_mc_dequeue_elem(r, obj, esize);
+		case (TEST_RING_THREAD_DEF | TEST_RING_ELEM_BULK):
+			return rte_ring_dequeue_bulk_elem(r, obj, esize,
+								n, NULL);
+		case (TEST_RING_THREAD_SPSC | TEST_RING_ELEM_BULK):
+			return rte_ring_sc_dequeue_bulk_elem(r, obj, esize,
+								n, NULL);
+		case (TEST_RING_THREAD_MPMC | TEST_RING_ELEM_BULK):
+			return rte_ring_mc_dequeue_bulk_elem(r, obj, esize,
+								n, NULL);
+		case (TEST_RING_THREAD_DEF | TEST_RING_ELEM_BURST):
+			return rte_ring_dequeue_burst_elem(r, obj, esize,
+								n, NULL);
+		case (TEST_RING_THREAD_SPSC | TEST_RING_ELEM_BURST):
+			return rte_ring_sc_dequeue_burst_elem(r, obj, esize,
+								n, NULL);
+		case (TEST_RING_THREAD_MPMC | TEST_RING_ELEM_BURST):
+			return rte_ring_mc_dequeue_burst_elem(r, obj, esize,
+								n, NULL);
+		default:
+			printf("Invalid API type\n");
+			return 0;
+		}
+}
+
+/* This function is placed here as it is required for both
+ * performance and functional tests.
+ */
+static __rte_always_inline void *
+test_ring_calloc(unsigned int rsize, int esize)
+{
+	unsigned int sz;
+	void *p;
+
+	/* Legacy queue APIs? */
+	if (esize == -1)
+		sz = sizeof(void *);
+	else
+		sz = esize;
+
+	p = rte_zmalloc(NULL, rsize * sz, RTE_CACHE_LINE_SIZE);
+	if (p == NULL)
+		printf("Failed to allocate memory\n");
+
+	return p;
+}
-- 
2.17.1



* [dpdk-dev] [PATCH v9 4/6] test/ring: modify perf test cases to use rte_ring_xxx_elem APIs
  2020-01-16  5:25   ` [dpdk-dev] [PATCH v9 0/6] lib/ring: APIs to support custom element size Honnappa Nagarahalli
                       ` (2 preceding siblings ...)
  2020-01-16  5:25     ` [dpdk-dev] [PATCH v9 3/6] test/ring: add functional tests for rte_ring_xxx_elem APIs Honnappa Nagarahalli
@ 2020-01-16  5:25     ` Honnappa Nagarahalli
  2020-01-17 17:12       ` Olivier Matz
  2020-01-16  5:25     ` [dpdk-dev] [PATCH v9 5/6] lib/hash: use ring with 32b element size to save memory Honnappa Nagarahalli
                       ` (3 subsequent siblings)
  7 siblings, 1 reply; 173+ messages in thread
From: Honnappa Nagarahalli @ 2020-01-16  5:25 UTC (permalink / raw)
  To: olivier.matz, sthemmin, jerinj, bruce.richardson, david.marchand,
	pbhagavatula, konstantin.ananyev, yipeng1.wang,
	honnappa.nagarahalli
  Cc: dev, dharmik.thakkar, ruifeng.wang, gavin.hu, nd

Adjust the performance test cases to test rte_ring_xxx_elem APIs.
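
For readers following the diff below, here is a minimal standalone sketch of the convention the perf tests rely on: an esize of -1 selects the legacy rte_ring calls, and api_type is one thread-mode bit ORed with one element-mode bit. The flag values are copied from app/test/test_ring.h in this series; the helper names (uses_legacy_api, make_api_type, thread_mode_str, elem_mode_str) are hypothetical and exist only for illustration. They are not part of the patch.

```c
#include <assert.h>
#include <string.h>

/* Flag values as defined in app/test/test_ring.h in this series. */
#define TEST_RING_THREAD_DEF 1
#define TEST_RING_THREAD_SPSC 2
#define TEST_RING_THREAD_MPMC 4
#define TEST_RING_ELEM_SINGLE 8
#define TEST_RING_ELEM_BULK 16
#define TEST_RING_ELEM_BURST 32

/* Hypothetical helper: esize == -1 means "use the legacy void * APIs". */
static int
uses_legacy_api(int esize)
{
	return esize == -1;
}

/* Hypothetical helper: combine one thread-mode bit with one element-mode bit. */
static unsigned int
make_api_type(unsigned int thread, unsigned int elem)
{
	assert(thread == TEST_RING_THREAD_DEF ||
		thread == TEST_RING_THREAD_SPSC ||
		thread == TEST_RING_THREAD_MPMC);
	assert(elem == TEST_RING_ELEM_SINGLE ||
		elem == TEST_RING_ELEM_BULK ||
		elem == TEST_RING_ELEM_BURST);
	return thread | elem;
}

/* Hypothetical helper: decode the thread-mode bit into a label. */
static const char *
thread_mode_str(unsigned int api_type)
{
	if (api_type & TEST_RING_THREAD_DEF)
		return "default";
	if (api_type & TEST_RING_THREAD_SPSC)
		return "SP/SC";
	if (api_type & TEST_RING_THREAD_MPMC)
		return "MP/MC";
	return "unknown";
}

/* Hypothetical helper: decode the element-mode bit into a label. */
static const char *
elem_mode_str(unsigned int api_type)
{
	if (api_type & TEST_RING_ELEM_SINGLE)
		return "single";
	if (api_type & TEST_RING_ELEM_BULK)
		return "bulk";
	if (api_type & TEST_RING_ELEM_BURST)
		return "burst";
	return "unknown";
}
```

Because the two groups of flags occupy disjoint bits, a single unsigned int can carry both choices, which is what lets one test body cover every API variant.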

Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Gavin Hu <gavin.hu@arm.com>
---
 app/test/test_ring_perf.c | 454 +++++++++++++++++++++++---------------
 1 file changed, 273 insertions(+), 181 deletions(-)

diff --git a/app/test/test_ring_perf.c b/app/test/test_ring_perf.c
index 6c2aca483..8d1217951 100644
--- a/app/test/test_ring_perf.c
+++ b/app/test/test_ring_perf.c
@@ -13,6 +13,7 @@
 #include <string.h>
 
 #include "test.h"
+#include "test_ring.h"
 
 /*
  * Ring
@@ -41,6 +42,35 @@ struct lcore_pair {
 
 static volatile unsigned lcore_count = 0;
 
+static void
+test_ring_print_test_string(unsigned int api_type, int esize,
+	unsigned int bsz, double value)
+{
+	if (esize == -1)
+		printf("legacy APIs");
+	else
+		printf("elem APIs: element size %dB", esize);
+
+	if (api_type == TEST_RING_IGNORE_API_TYPE)
+		return;
+
+	if ((api_type & TEST_RING_THREAD_DEF) == TEST_RING_THREAD_DEF)
+		printf(": default enqueue/dequeue: ");
+	else if ((api_type & TEST_RING_THREAD_SPSC) == TEST_RING_THREAD_SPSC)
+		printf(": SP/SC: ");
+	else if ((api_type & TEST_RING_THREAD_MPMC) == TEST_RING_THREAD_MPMC)
+		printf(": MP/MC: ");
+
+	if ((api_type & TEST_RING_ELEM_SINGLE) == TEST_RING_ELEM_SINGLE)
+		printf("single: ");
+	else if ((api_type & TEST_RING_ELEM_BULK) == TEST_RING_ELEM_BULK)
+		printf("bulk (size: %u): ", bsz);
+	else if ((api_type & TEST_RING_ELEM_BURST) == TEST_RING_ELEM_BURST)
+		printf("burst (size: %u): ", bsz);
+
+	printf("%.2F\n", value);
+}
+
 /**** Functions to analyse our core mask to get cores for different tests ***/
 
 static int
@@ -117,27 +147,21 @@ get_two_sockets(struct lcore_pair *lcp)
 
 /* Get cycle counts for dequeuing from an empty ring. Should be 2 or 3 cycles */
 static void
-test_empty_dequeue(struct rte_ring *r)
+test_empty_dequeue(struct rte_ring *r, const int esize,
+			const unsigned int api_type)
 {
-	const unsigned iter_shift = 26;
-	const unsigned iterations = 1<<iter_shift;
-	unsigned i = 0;
+	const unsigned int iter_shift = 26;
+	const unsigned int iterations = 1 << iter_shift;
+	unsigned int i = 0;
 	void *burst[MAX_BURST];
 
-	const uint64_t sc_start = rte_rdtsc();
+	const uint64_t start = rte_rdtsc();
 	for (i = 0; i < iterations; i++)
-		rte_ring_sc_dequeue_bulk(r, burst, bulk_sizes[0], NULL);
-	const uint64_t sc_end = rte_rdtsc();
+		test_ring_dequeue(r, burst, esize, bulk_sizes[0], api_type);
+	const uint64_t end = rte_rdtsc();
 
-	const uint64_t mc_start = rte_rdtsc();
-	for (i = 0; i < iterations; i++)
-		rte_ring_mc_dequeue_bulk(r, burst, bulk_sizes[0], NULL);
-	const uint64_t mc_end = rte_rdtsc();
-
-	printf("SC empty dequeue: %.2F\n",
-			(double)(sc_end-sc_start) / iterations);
-	printf("MC empty dequeue: %.2F\n",
-			(double)(mc_end-mc_start) / iterations);
+	test_ring_print_test_string(api_type, esize, bulk_sizes[0],
+					((double)(end - start)) / iterations);
 }
 
 /*
@@ -151,19 +175,21 @@ struct thread_params {
 };
 
 /*
- * Function that uses rdtsc to measure timing for ring enqueue. Needs pair
- * thread running dequeue_bulk function
+ * Helper function to call bulk SP/MP enqueue functions.
+ * flag == 0 -> enqueue
+ * flag == 1 -> dequeue
  */
-static int
-enqueue_bulk(void *p)
+static __rte_always_inline int
+enqueue_dequeue_bulk_helper(const unsigned int flag, const int esize,
+	struct thread_params *p)
 {
-	const unsigned iter_shift = 23;
-	const unsigned iterations = 1<<iter_shift;
-	struct thread_params *params = p;
-	struct rte_ring *r = params->r;
-	const unsigned size = params->size;
-	unsigned i;
-	void *burst[MAX_BURST] = {0};
+	int ret;
+	const unsigned int iter_shift = 23;
+	const unsigned int iterations = 1 << iter_shift;
+	struct rte_ring *r = p->r;
+	unsigned int bsize = p->size;
+	unsigned int i;
+	void *burst = NULL;
 
 #ifdef RTE_USE_C11_MEM_MODEL
 	if (__atomic_add_fetch(&lcore_count, 1, __ATOMIC_RELAXED) != 2)
@@ -173,23 +199,67 @@ enqueue_bulk(void *p)
 		while(lcore_count != 2)
 			rte_pause();
 
+	burst = test_ring_calloc(MAX_BURST, esize);
+	if (burst == NULL)
+		return -1;
+
 	const uint64_t sp_start = rte_rdtsc();
 	for (i = 0; i < iterations; i++)
-		while (rte_ring_sp_enqueue_bulk(r, burst, size, NULL) == 0)
-			rte_pause();
+		do {
+			if (flag == 0)
+				ret = test_ring_enqueue(r, burst, esize, bsize,
+						TEST_RING_THREAD_SPSC |
+						TEST_RING_ELEM_BULK);
+			else if (flag == 1)
+				ret = test_ring_dequeue(r, burst, esize, bsize,
+						TEST_RING_THREAD_SPSC |
+						TEST_RING_ELEM_BULK);
+			if (ret == 0)
+				rte_pause();
+		} while (!ret);
 	const uint64_t sp_end = rte_rdtsc();
 
 	const uint64_t mp_start = rte_rdtsc();
 	for (i = 0; i < iterations; i++)
-		while (rte_ring_mp_enqueue_bulk(r, burst, size, NULL) == 0)
-			rte_pause();
+		do {
+			if (flag == 0)
+				ret = test_ring_enqueue(r, burst, esize, bsize,
+						TEST_RING_THREAD_MPMC |
+						TEST_RING_ELEM_BULK);
+			else if (flag == 1)
+				ret = test_ring_dequeue(r, burst, esize, bsize,
+						TEST_RING_THREAD_MPMC |
+						TEST_RING_ELEM_BULK);
+			if (ret == 0)
+				rte_pause();
+		} while (!ret);
 	const uint64_t mp_end = rte_rdtsc();
 
-	params->spsc = ((double)(sp_end - sp_start))/(iterations*size);
-	params->mpmc = ((double)(mp_end - mp_start))/(iterations*size);
+	p->spsc = ((double)(sp_end - sp_start))/(iterations * bsize);
+	p->mpmc = ((double)(mp_end - mp_start))/(iterations * bsize);
 	return 0;
 }
 
+/*
+ * Function that uses rdtsc to measure timing for ring enqueue. Needs pair
+ * thread running dequeue_bulk function
+ */
+static int
+enqueue_bulk(void *p)
+{
+	struct thread_params *params = p;
+
+	return enqueue_dequeue_bulk_helper(0, -1, params);
+}
+
+static int
+enqueue_bulk_16B(void *p)
+{
+	struct thread_params *params = p;
+
+	return enqueue_dequeue_bulk_helper(0, 16, params);
+}
+
 /*
  * Function that uses rdtsc to measure timing for ring dequeue. Needs pair
  * thread running enqueue_bulk function
@@ -197,49 +267,38 @@ enqueue_bulk(void *p)
 static int
 dequeue_bulk(void *p)
 {
-	const unsigned iter_shift = 23;
-	const unsigned iterations = 1<<iter_shift;
 	struct thread_params *params = p;
-	struct rte_ring *r = params->r;
-	const unsigned size = params->size;
-	unsigned i;
-	void *burst[MAX_BURST] = {0};
-
-#ifdef RTE_USE_C11_MEM_MODEL
-	if (__atomic_add_fetch(&lcore_count, 1, __ATOMIC_RELAXED) != 2)
-#else
-	if (__sync_add_and_fetch(&lcore_count, 1) != 2)
-#endif
-		while(lcore_count != 2)
-			rte_pause();
 
-	const uint64_t sc_start = rte_rdtsc();
-	for (i = 0; i < iterations; i++)
-		while (rte_ring_sc_dequeue_bulk(r, burst, size, NULL) == 0)
-			rte_pause();
-	const uint64_t sc_end = rte_rdtsc();
+	return enqueue_dequeue_bulk_helper(1, -1, params);
+}
 
-	const uint64_t mc_start = rte_rdtsc();
-	for (i = 0; i < iterations; i++)
-		while (rte_ring_mc_dequeue_bulk(r, burst, size, NULL) == 0)
-			rte_pause();
-	const uint64_t mc_end = rte_rdtsc();
+static int
+dequeue_bulk_16B(void *p)
+{
+	struct thread_params *params = p;
 
-	params->spsc = ((double)(sc_end - sc_start))/(iterations*size);
-	params->mpmc = ((double)(mc_end - mc_start))/(iterations*size);
-	return 0;
+	return enqueue_dequeue_bulk_helper(1, 16, params);
 }
 
 /*
  * Function that calls the enqueue and dequeue bulk functions on pairs of cores.
  * used to measure ring perf between hyperthreads, cores and sockets.
  */
-static void
-run_on_core_pair(struct lcore_pair *cores, struct rte_ring *r,
-		lcore_function_t f1, lcore_function_t f2)
+static int
+run_on_core_pair(struct lcore_pair *cores, struct rte_ring *r, const int esize)
 {
+	lcore_function_t *f1, *f2;
 	struct thread_params param1 = {0}, param2 = {0};
 	unsigned i;
+
+	if (esize == -1) {
+		f1 = enqueue_bulk;
+		f2 = dequeue_bulk;
+	} else {
+		f1 = enqueue_bulk_16B;
+		f2 = dequeue_bulk_16B;
+	}
+
 	for (i = 0; i < sizeof(bulk_sizes)/sizeof(bulk_sizes[0]); i++) {
 		lcore_count = 0;
 		param1.size = param2.size = bulk_sizes[i];
@@ -251,14 +310,20 @@ run_on_core_pair(struct lcore_pair *cores, struct rte_ring *r,
 		} else {
 			rte_eal_remote_launch(f1, &param1, cores->c1);
 			rte_eal_remote_launch(f2, &param2, cores->c2);
-			rte_eal_wait_lcore(cores->c1);
-			rte_eal_wait_lcore(cores->c2);
+			if (rte_eal_wait_lcore(cores->c1) < 0)
+				return -1;
+			if (rte_eal_wait_lcore(cores->c2) < 0)
+				return -1;
 		}
-		printf("SP/SC bulk enq/dequeue (size: %u): %.2F\n", bulk_sizes[i],
-				param1.spsc + param2.spsc);
-		printf("MP/MC bulk enq/dequeue (size: %u): %.2F\n", bulk_sizes[i],
-				param1.mpmc + param2.mpmc);
+		test_ring_print_test_string(
+			TEST_RING_THREAD_SPSC | TEST_RING_ELEM_BULK,
+			esize, bulk_sizes[i], param1.spsc + param2.spsc);
+		test_ring_print_test_string(
+			TEST_RING_THREAD_MPMC | TEST_RING_ELEM_BULK,
+			esize, bulk_sizes[i], param1.mpmc + param2.mpmc);
 	}
+
+	return 0;
 }
 
 static rte_atomic32_t synchro;
@@ -267,7 +332,7 @@ static uint64_t queue_count[RTE_MAX_LCORE];
 #define TIME_MS 100
 
 static int
-load_loop_fn(void *p)
+load_loop_fn_helper(struct thread_params *p, const int esize)
 {
 	uint64_t time_diff = 0;
 	uint64_t begin = 0;
@@ -275,7 +340,11 @@ load_loop_fn(void *p)
 	uint64_t lcount = 0;
 	const unsigned int lcore = rte_lcore_id();
 	struct thread_params *params = p;
-	void *burst[MAX_BURST] = {0};
+	void *burst = NULL;
+
+	burst = test_ring_calloc(MAX_BURST, esize);
+	if (burst == NULL)
+		return -1;
 
 	/* wait synchro for slaves */
 	if (lcore != rte_get_master_lcore())
@@ -284,22 +353,49 @@ load_loop_fn(void *p)
 
 	begin = rte_get_timer_cycles();
 	while (time_diff < hz * TIME_MS / 1000) {
-		rte_ring_mp_enqueue_bulk(params->r, burst, params->size, NULL);
-		rte_ring_mc_dequeue_bulk(params->r, burst, params->size, NULL);
+		test_ring_enqueue(params->r, burst, esize, params->size,
+				TEST_RING_THREAD_MPMC | TEST_RING_ELEM_BULK);
+		test_ring_dequeue(params->r, burst, esize, params->size,
+				TEST_RING_THREAD_MPMC | TEST_RING_ELEM_BULK);
 		lcount++;
 		time_diff = rte_get_timer_cycles() - begin;
 	}
 	queue_count[lcore] = lcount;
+
+	rte_free(burst);
+
 	return 0;
 }
 
 static int
-run_on_all_cores(struct rte_ring *r)
+load_loop_fn(void *p)
+{
+	struct thread_params *params = p;
+
+	return load_loop_fn_helper(params, -1);
+}
+
+static int
+load_loop_fn_16B(void *p)
+{
+	struct thread_params *params = p;
+
+	return load_loop_fn_helper(params, 16);
+}
+
+static int
+run_on_all_cores(struct rte_ring *r, const int esize)
 {
 	uint64_t total = 0;
 	struct thread_params param;
+	lcore_function_t *lcore_f;
 	unsigned int i, c;
 
+	if (esize == -1)
+		lcore_f = load_loop_fn;
+	else
+		lcore_f = load_loop_fn_16B;
+
 	memset(&param, 0, sizeof(struct thread_params));
 	for (i = 0; i < RTE_DIM(bulk_sizes); i++) {
 		printf("\nBulk enq/dequeue count on size %u\n", bulk_sizes[i]);
@@ -308,13 +404,12 @@ run_on_all_cores(struct rte_ring *r)
 
 		/* clear synchro and start slaves */
 		rte_atomic32_set(&synchro, 0);
-		if (rte_eal_mp_remote_launch(load_loop_fn, &param,
-			SKIP_MASTER) < 0)
+		if (rte_eal_mp_remote_launch(lcore_f, &param, SKIP_MASTER) < 0)
 			return -1;
 
 		/* start synchro and launch test on master */
 		rte_atomic32_set(&synchro, 1);
-		load_loop_fn(&param);
+		lcore_f(&param);
 
 		rte_eal_mp_wait_lcore();
 
@@ -335,155 +430,152 @@ run_on_all_cores(struct rte_ring *r)
  * Test function that determines how long an enqueue + dequeue of a single item
  * takes on a single lcore. Result is for comparison with the bulk enq+deq.
  */
-static void
-test_single_enqueue_dequeue(struct rte_ring *r)
+static int
+test_single_enqueue_dequeue(struct rte_ring *r, const int esize,
+	const unsigned int api_type)
 {
-	const unsigned iter_shift = 24;
-	const unsigned iterations = 1<<iter_shift;
-	unsigned i = 0;
+	const unsigned int iter_shift = 24;
+	const unsigned int iterations = 1 << iter_shift;
+	unsigned int i = 0;
 	void *burst = NULL;
 
-	const uint64_t sc_start = rte_rdtsc();
-	for (i = 0; i < iterations; i++) {
-		rte_ring_sp_enqueue(r, burst);
-		rte_ring_sc_dequeue(r, &burst);
-	}
-	const uint64_t sc_end = rte_rdtsc();
+	/* alloc dummy object pointers */
+	burst = test_ring_calloc(1, esize);
+	if (burst == NULL)
+		return -1;
 
-	const uint64_t mc_start = rte_rdtsc();
+	const uint64_t start = rte_rdtsc();
 	for (i = 0; i < iterations; i++) {
-		rte_ring_mp_enqueue(r, burst);
-		rte_ring_mc_dequeue(r, &burst);
+		test_ring_enqueue(r, burst, esize, 1, api_type);
+		test_ring_dequeue(r, burst, esize, 1, api_type);
 	}
-	const uint64_t mc_end = rte_rdtsc();
+	const uint64_t end = rte_rdtsc();
+
+	test_ring_print_test_string(api_type, esize, 1,
+					((double)(end - start)) / iterations);
 
-	printf("SP/SC single enq/dequeue: %.2F\n",
-			((double)(sc_end-sc_start)) / iterations);
-	printf("MP/MC single enq/dequeue: %.2F\n",
-			((double)(mc_end-mc_start)) / iterations);
+	rte_free(burst);
+
+	return 0;
 }
 
 /*
- * Test that does both enqueue and dequeue on a core using the burst() API calls
- * instead of the bulk() calls used in other tests. Results should be the same
- * as for the bulk function called on a single lcore.
+ * Test that does both enqueue and dequeue on a core using the burst/bulk API
+ * calls. Results should be the same as for the bulk function called on a
+ * single lcore.
  */
-static void
-test_burst_enqueue_dequeue(struct rte_ring *r)
+static int
+test_burst_bulk_enqueue_dequeue(struct rte_ring *r, const int esize,
+	const unsigned int api_type)
 {
-	const unsigned iter_shift = 23;
-	const unsigned iterations = 1<<iter_shift;
-	unsigned sz, i = 0;
-	void *burst[MAX_BURST] = {0};
+	const unsigned int iter_shift = 23;
+	const unsigned int iterations = 1 << iter_shift;
+	unsigned int sz, i = 0;
+	void **burst = NULL;
 
-	for (sz = 0; sz < sizeof(bulk_sizes)/sizeof(bulk_sizes[0]); sz++) {
-		const uint64_t sc_start = rte_rdtsc();
-		for (i = 0; i < iterations; i++) {
-			rte_ring_sp_enqueue_burst(r, burst,
-					bulk_sizes[sz], NULL);
-			rte_ring_sc_dequeue_burst(r, burst,
-					bulk_sizes[sz], NULL);
-		}
-		const uint64_t sc_end = rte_rdtsc();
+	burst = test_ring_calloc(MAX_BURST, esize);
+	if (burst == NULL)
+		return -1;
 
-		const uint64_t mc_start = rte_rdtsc();
+	for (sz = 0; sz < RTE_DIM(bulk_sizes); sz++) {
+		const uint64_t start = rte_rdtsc();
 		for (i = 0; i < iterations; i++) {
-			rte_ring_mp_enqueue_burst(r, burst,
-					bulk_sizes[sz], NULL);
-			rte_ring_mc_dequeue_burst(r, burst,
-					bulk_sizes[sz], NULL);
+			test_ring_enqueue(r, burst, esize, bulk_sizes[sz],
+						api_type);
+			test_ring_dequeue(r, burst, esize, bulk_sizes[sz],
+						api_type);
 		}
-		const uint64_t mc_end = rte_rdtsc();
-
-		double mc_avg = ((double)(mc_end-mc_start) / iterations) /
-					bulk_sizes[sz];
-		double sc_avg = ((double)(sc_end-sc_start) / iterations) /
-					bulk_sizes[sz];
+		const uint64_t end = rte_rdtsc();
 
-		printf("SP/SC burst enq/dequeue (size: %u): %.2F\n",
-				bulk_sizes[sz], sc_avg);
-		printf("MP/MC burst enq/dequeue (size: %u): %.2F\n",
-				bulk_sizes[sz], mc_avg);
+		test_ring_print_test_string(api_type, esize, bulk_sizes[sz],
+					((double)(end - start)) / iterations);
 	}
-}
 
-/* Times enqueue and dequeue on a single lcore */
-static void
-test_bulk_enqueue_dequeue(struct rte_ring *r)
-{
-	const unsigned iter_shift = 23;
-	const unsigned iterations = 1<<iter_shift;
-	unsigned sz, i = 0;
-	void *burst[MAX_BURST] = {0};
+	rte_free(burst);
 
-	for (sz = 0; sz < sizeof(bulk_sizes)/sizeof(bulk_sizes[0]); sz++) {
-		const uint64_t sc_start = rte_rdtsc();
-		for (i = 0; i < iterations; i++) {
-			rte_ring_sp_enqueue_bulk(r, burst,
-					bulk_sizes[sz], NULL);
-			rte_ring_sc_dequeue_bulk(r, burst,
-					bulk_sizes[sz], NULL);
-		}
-		const uint64_t sc_end = rte_rdtsc();
-
-		const uint64_t mc_start = rte_rdtsc();
-		for (i = 0; i < iterations; i++) {
-			rte_ring_mp_enqueue_bulk(r, burst,
-					bulk_sizes[sz], NULL);
-			rte_ring_mc_dequeue_bulk(r, burst,
-					bulk_sizes[sz], NULL);
-		}
-		const uint64_t mc_end = rte_rdtsc();
-
-		double sc_avg = ((double)(sc_end-sc_start) /
-				(iterations * bulk_sizes[sz]));
-		double mc_avg = ((double)(mc_end-mc_start) /
-				(iterations * bulk_sizes[sz]));
-
-		printf("SP/SC bulk enq/dequeue (size: %u): %.2F\n", bulk_sizes[sz],
-				sc_avg);
-		printf("MP/MC bulk enq/dequeue (size: %u): %.2F\n", bulk_sizes[sz],
-				mc_avg);
-	}
+	return 0;
 }
 
-static int
-test_ring_perf(void)
+/* Run all tests for a given element size */
+static __rte_always_inline int
+test_ring_perf_esize(const int esize)
 {
 	struct lcore_pair cores;
 	struct rte_ring *r = NULL;
 
-	r = rte_ring_create(RING_NAME, RING_SIZE, rte_socket_id(), 0);
+	/*
+	 * Performance test for legacy/_elem APIs
+	 * SP-SC/MP-MC, single
+	 */
+	r = test_ring_create(RING_NAME, esize, RING_SIZE, rte_socket_id(), 0);
 	if (r == NULL)
 		return -1;
 
-	printf("### Testing single element and burst enq/deq ###\n");
-	test_single_enqueue_dequeue(r);
-	test_burst_enqueue_dequeue(r);
+	printf("\n### Testing single element enq/deq ###\n");
+	if (test_single_enqueue_dequeue(r, esize,
+			TEST_RING_THREAD_SPSC | TEST_RING_ELEM_SINGLE) < 0)
+		return -1;
+	if (test_single_enqueue_dequeue(r, esize,
+			TEST_RING_THREAD_MPMC | TEST_RING_ELEM_SINGLE) < 0)
+		return -1;
+
+	printf("\n### Testing burst enq/deq ###\n");
+	if (test_burst_bulk_enqueue_dequeue(r, esize,
+			TEST_RING_THREAD_SPSC | TEST_RING_ELEM_BURST) < 0)
+		return -1;
+	if (test_burst_bulk_enqueue_dequeue(r, esize,
+			TEST_RING_THREAD_MPMC | TEST_RING_ELEM_BURST) < 0)
+		return -1;
 
-	printf("\n### Testing empty dequeue ###\n");
-	test_empty_dequeue(r);
+	printf("\n### Testing bulk enq/deq ###\n");
+	if (test_burst_bulk_enqueue_dequeue(r, esize,
+			TEST_RING_THREAD_SPSC | TEST_RING_ELEM_BULK) < 0)
+		return -1;
+	if (test_burst_bulk_enqueue_dequeue(r, esize,
+			TEST_RING_THREAD_MPMC | TEST_RING_ELEM_BULK) < 0)
+		return -1;
 
-	printf("\n### Testing using a single lcore ###\n");
-	test_bulk_enqueue_dequeue(r);
+	printf("\n### Testing empty bulk deq ###\n");
+	test_empty_dequeue(r, esize,
+			TEST_RING_THREAD_SPSC | TEST_RING_ELEM_BULK);
+	test_empty_dequeue(r, esize,
+			TEST_RING_THREAD_MPMC | TEST_RING_ELEM_BULK);
 
 	if (get_two_hyperthreads(&cores) == 0) {
 		printf("\n### Testing using two hyperthreads ###\n");
-		run_on_core_pair(&cores, r, enqueue_bulk, dequeue_bulk);
+		if (run_on_core_pair(&cores, r, esize) < 0)
+			return -1;
 	}
 	if (get_two_cores(&cores) == 0) {
 		printf("\n### Testing using two physical cores ###\n");
-		run_on_core_pair(&cores, r, enqueue_bulk, dequeue_bulk);
+		if (run_on_core_pair(&cores, r, esize) < 0)
+			return -1;
 	}
 	if (get_two_sockets(&cores) == 0) {
 		printf("\n### Testing using two NUMA nodes ###\n");
-		run_on_core_pair(&cores, r, enqueue_bulk, dequeue_bulk);
+		if (run_on_core_pair(&cores, r, esize) < 0)
+			return -1;
 	}
 
 	printf("\n### Testing using all slave nodes ###\n");
-	run_on_all_cores(r);
+	if (run_on_all_cores(r, esize) < 0)
+		return -1;
 
 	rte_ring_free(r);
+
+	return 0;
+}
+
+static int
+test_ring_perf(void)
+{
+	/* Run all the tests for different element sizes */
+	if (test_ring_perf_esize(-1) == -1)
+		return -1;
+
+	if (test_ring_perf_esize(16) == -1)
+		return -1;
+
 	return 0;
 }
 
-- 
2.17.1



* [dpdk-dev] [PATCH v9 5/6] lib/hash: use ring with 32b element size to save memory
  2020-01-16  5:25   ` [dpdk-dev] [PATCH v9 0/6] lib/ring: APIs to support custom element size Honnappa Nagarahalli
                       ` (3 preceding siblings ...)
  2020-01-16  5:25     ` [dpdk-dev] [PATCH v9 4/6] test/ring: modify perf test cases to use " Honnappa Nagarahalli
@ 2020-01-16  5:25     ` Honnappa Nagarahalli
  2020-01-17 20:27       ` David Marchand
  2020-01-16  5:25     ` [dpdk-dev] [PATCH v9 6/6] lib/eventdev: use custom element size ring for event rings Honnappa Nagarahalli
                       ` (2 subsequent siblings)
  7 siblings, 1 reply; 173+ messages in thread
From: Honnappa Nagarahalli @ 2020-01-16  5:25 UTC (permalink / raw)
  To: olivier.matz, sthemmin, jerinj, bruce.richardson, david.marchand,
	pbhagavatula, konstantin.ananyev, yipeng1.wang,
	honnappa.nagarahalli
  Cc: dev, dharmik.thakkar, ruifeng.wang, gavin.hu, nd

The freelist and external bucket indices are 32b wide. Storing them in
rings with 32b elements instead of pointer-sized slots saves memory
(half of the slot storage on 64-bit targets).
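
As a rough illustration of the saving, the sketch below compares only the slot storage of a ring holding pointer-sized elements with one holding uint32_t elements. It is a back-of-the-envelope calculation, not the library's size computation: the ring header, cache-line padding and alignment are deliberately ignored, and the helper names slot_bytes and slot_bytes_saved are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Bytes used by the slot array alone for 'count' elements of 'esize' bytes.
 * Assumption: ring header and padding are ignored in this estimate.
 */
static size_t
slot_bytes(size_t count, size_t esize)
{
	return count * esize;
}

/*
 * Bytes saved by storing uint32_t indices instead of void * slots.
 * On a 64-bit target this is count * 4; on a 32-bit target it is 0.
 */
static size_t
slot_bytes_saved(size_t count)
{
	assert(sizeof(void *) >= sizeof(uint32_t));
	return slot_bytes(count, sizeof(void *)) -
		slot_bytes(count, sizeof(uint32_t));
}
```

For example, with 1024 slots the uint32_t layout needs 4096 bytes of slot storage, and slot_bytes_saved(1024) reports how much larger the pointer-sized layout would be on the build target.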

Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Gavin Hu <gavin.hu@arm.com>
Reviewed-by: Ola Liljedahl <ola.liljedahl@arm.com>
---
 lib/librte_hash/rte_cuckoo_hash.c | 94 ++++++++++++++++---------------
 lib/librte_hash/rte_cuckoo_hash.h |  2 +-
 2 files changed, 50 insertions(+), 46 deletions(-)

diff --git a/lib/librte_hash/rte_cuckoo_hash.c b/lib/librte_hash/rte_cuckoo_hash.c
index 87a4c01f2..6c292b6f8 100644
--- a/lib/librte_hash/rte_cuckoo_hash.c
+++ b/lib/librte_hash/rte_cuckoo_hash.c
@@ -24,7 +24,7 @@
 #include <rte_cpuflags.h>
 #include <rte_rwlock.h>
 #include <rte_spinlock.h>
-#include <rte_ring.h>
+#include <rte_ring_elem.h>
 #include <rte_compat.h>
 #include <rte_vect.h>
 #include <rte_tailq.h>
@@ -136,7 +136,6 @@ rte_hash_create(const struct rte_hash_parameters *params)
 	char ring_name[RTE_RING_NAMESIZE];
 	char ext_ring_name[RTE_RING_NAMESIZE];
 	unsigned num_key_slots;
-	unsigned i;
 	unsigned int hw_trans_mem_support = 0, use_local_cache = 0;
 	unsigned int ext_table_support = 0;
 	unsigned int readwrite_concur_support = 0;
@@ -145,6 +144,7 @@ rte_hash_create(const struct rte_hash_parameters *params)
 	uint32_t *ext_bkt_to_free = NULL;
 	uint32_t *tbl_chng_cnt = NULL;
 	unsigned int readwrite_concur_lf_support = 0;
+	uint32_t i;
 
 	rte_hash_function default_hash_func = (rte_hash_function)rte_jhash;
 
@@ -213,8 +213,8 @@ rte_hash_create(const struct rte_hash_parameters *params)
 
 	snprintf(ring_name, sizeof(ring_name), "HT_%s", params->name);
 	/* Create ring (Dummy slot index is not enqueued) */
-	r = rte_ring_create(ring_name, rte_align32pow2(num_key_slots),
-			params->socket_id, 0);
+	r = rte_ring_create_elem(ring_name, sizeof(uint32_t),
+			rte_align32pow2(num_key_slots), params->socket_id, 0);
 	if (r == NULL) {
 		RTE_LOG(ERR, HASH, "memory allocation failed\n");
 		goto err;
@@ -227,7 +227,7 @@ rte_hash_create(const struct rte_hash_parameters *params)
 	if (ext_table_support) {
 		snprintf(ext_ring_name, sizeof(ext_ring_name), "HT_EXT_%s",
 								params->name);
-		r_ext = rte_ring_create(ext_ring_name,
+		r_ext = rte_ring_create_elem(ext_ring_name, sizeof(uint32_t),
 				rte_align32pow2(num_buckets + 1),
 				params->socket_id, 0);
 
@@ -295,7 +295,7 @@ rte_hash_create(const struct rte_hash_parameters *params)
 		 * for next bucket
 		 */
 		for (i = 1; i <= num_buckets; i++)
-			rte_ring_sp_enqueue(r_ext, (void *)((uintptr_t) i));
+			rte_ring_sp_enqueue_elem(r_ext, &i, sizeof(uint32_t));
 
 		if (readwrite_concur_lf_support) {
 			ext_bkt_to_free = rte_zmalloc(NULL, sizeof(uint32_t) *
@@ -434,7 +434,7 @@ rte_hash_create(const struct rte_hash_parameters *params)
 
 	/* Populate free slots ring. Entry zero is reserved for key misses. */
 	for (i = 1; i < num_key_slots; i++)
-		rte_ring_sp_enqueue(r, (void *)((uintptr_t) i));
+		rte_ring_sp_enqueue_elem(r, &i, sizeof(uint32_t));
 
 	te->data = (void *) h;
 	TAILQ_INSERT_TAIL(hash_list, te, next);
@@ -598,13 +598,13 @@ rte_hash_reset(struct rte_hash *h)
 		tot_ring_cnt = h->entries;
 
 	for (i = 1; i < tot_ring_cnt + 1; i++)
-		rte_ring_sp_enqueue(h->free_slots, (void *)((uintptr_t) i));
+		rte_ring_sp_enqueue_elem(h->free_slots, &i, sizeof(uint32_t));
 
 	/* Repopulate the free ext bkt ring. */
 	if (h->ext_table_support) {
 		for (i = 1; i <= h->num_buckets; i++)
-			rte_ring_sp_enqueue(h->free_ext_bkts,
-						(void *)((uintptr_t) i));
+			rte_ring_sp_enqueue_elem(h->free_ext_bkts, &i,
+							sizeof(uint32_t));
 	}
 
 	if (h->use_local_cache) {
@@ -623,13 +623,14 @@ rte_hash_reset(struct rte_hash *h)
 static inline void
 enqueue_slot_back(const struct rte_hash *h,
 		struct lcore_cache *cached_free_slots,
-		void *slot_id)
+		uint32_t slot_id)
 {
 	if (h->use_local_cache) {
 		cached_free_slots->objs[cached_free_slots->len] = slot_id;
 		cached_free_slots->len++;
 	} else
-		rte_ring_sp_enqueue(h->free_slots, slot_id);
+		rte_ring_sp_enqueue_elem(h->free_slots, &slot_id,
+						sizeof(uint32_t));
 }
 
 /* Search a key from bucket and update its data.
@@ -923,9 +924,8 @@ __rte_hash_add_key_with_hash(const struct rte_hash *h, const void *key,
 	uint32_t prim_bucket_idx, sec_bucket_idx;
 	struct rte_hash_bucket *prim_bkt, *sec_bkt, *cur_bkt;
 	struct rte_hash_key *new_k, *keys = h->key_store;
-	void *slot_id = NULL;
-	void *ext_bkt_id = NULL;
-	uint32_t new_idx, bkt_id;
+	uint32_t slot_id;
+	uint32_t ext_bkt_id;
 	int ret;
 	unsigned n_slots;
 	unsigned lcore_id;
@@ -968,8 +968,9 @@ __rte_hash_add_key_with_hash(const struct rte_hash *h, const void *key,
 		/* Try to get a free slot from the local cache */
 		if (cached_free_slots->len == 0) {
 			/* Need to get another burst of free slots from global ring */
-			n_slots = rte_ring_mc_dequeue_burst(h->free_slots,
+			n_slots = rte_ring_mc_dequeue_burst_elem(h->free_slots,
 					cached_free_slots->objs,
+					sizeof(uint32_t),
 					LCORE_CACHE_SIZE, NULL);
 			if (n_slots == 0) {
 				return -ENOSPC;
@@ -982,13 +983,13 @@ __rte_hash_add_key_with_hash(const struct rte_hash *h, const void *key,
 		cached_free_slots->len--;
 		slot_id = cached_free_slots->objs[cached_free_slots->len];
 	} else {
-		if (rte_ring_sc_dequeue(h->free_slots, &slot_id) != 0) {
+		if (rte_ring_sc_dequeue_elem(h->free_slots, &slot_id,
+						sizeof(uint32_t)) != 0) {
 			return -ENOSPC;
 		}
 	}
 
-	new_k = RTE_PTR_ADD(keys, (uintptr_t)slot_id * h->key_entry_size);
-	new_idx = (uint32_t)((uintptr_t) slot_id);
+	new_k = RTE_PTR_ADD(keys, slot_id * h->key_entry_size);
 	/* The store to application data (by the application) at *data should
 	 * not leak after the store of pdata in the key store. i.e. pdata is
 	 * the guard variable. Release the application data to the readers.
@@ -1001,9 +1002,9 @@ __rte_hash_add_key_with_hash(const struct rte_hash *h, const void *key,
 
 	/* Find an empty slot and insert */
 	ret = rte_hash_cuckoo_insert_mw(h, prim_bkt, sec_bkt, key, data,
-					short_sig, new_idx, &ret_val);
+					short_sig, slot_id, &ret_val);
 	if (ret == 0)
-		return new_idx - 1;
+		return slot_id - 1;
 	else if (ret == 1) {
 		enqueue_slot_back(h, cached_free_slots, slot_id);
 		return ret_val;
@@ -1011,9 +1012,9 @@ __rte_hash_add_key_with_hash(const struct rte_hash *h, const void *key,
 
 	/* Primary bucket full, need to make space for new entry */
 	ret = rte_hash_cuckoo_make_space_mw(h, prim_bkt, sec_bkt, key, data,
-				short_sig, prim_bucket_idx, new_idx, &ret_val);
+				short_sig, prim_bucket_idx, slot_id, &ret_val);
 	if (ret == 0)
-		return new_idx - 1;
+		return slot_id - 1;
 	else if (ret == 1) {
 		enqueue_slot_back(h, cached_free_slots, slot_id);
 		return ret_val;
@@ -1021,10 +1022,10 @@ __rte_hash_add_key_with_hash(const struct rte_hash *h, const void *key,
 
 	/* Also search secondary bucket to get better occupancy */
 	ret = rte_hash_cuckoo_make_space_mw(h, sec_bkt, prim_bkt, key, data,
-				short_sig, sec_bucket_idx, new_idx, &ret_val);
+				short_sig, sec_bucket_idx, slot_id, &ret_val);
 
 	if (ret == 0)
-		return new_idx - 1;
+		return slot_id - 1;
 	else if (ret == 1) {
 		enqueue_slot_back(h, cached_free_slots, slot_id);
 		return ret_val;
@@ -1067,10 +1068,10 @@ __rte_hash_add_key_with_hash(const struct rte_hash *h, const void *key,
 				 * and key.
 				 */
 				__atomic_store_n(&cur_bkt->key_idx[i],
-						 new_idx,
+						 slot_id,
 						 __ATOMIC_RELEASE);
 				__hash_rw_writer_unlock(h);
-				return new_idx - 1;
+				return slot_id - 1;
 			}
 		}
 	}
@@ -1078,26 +1079,26 @@ __rte_hash_add_key_with_hash(const struct rte_hash *h, const void *key,
 	/* Failed to get an empty entry from extendable buckets. Link a new
 	 * extendable bucket. We first get a free bucket from ring.
 	 */
-	if (rte_ring_sc_dequeue(h->free_ext_bkts, &ext_bkt_id) != 0) {
+	if (rte_ring_sc_dequeue_elem(h->free_ext_bkts, &ext_bkt_id,
+						sizeof(uint32_t)) != 0) {
 		ret = -ENOSPC;
 		goto failure;
 	}
 
-	bkt_id = (uint32_t)((uintptr_t)ext_bkt_id) - 1;
 	/* Use the first location of the new bucket */
-	(h->buckets_ext[bkt_id]).sig_current[0] = short_sig;
+	(h->buckets_ext[ext_bkt_id - 1]).sig_current[0] = short_sig;
 	/* Store to signature and key should not leak after
 	 * the store to key_idx. i.e. key_idx is the guard variable
 	 * for signature and key.
 	 */
-	__atomic_store_n(&(h->buckets_ext[bkt_id]).key_idx[0],
-			 new_idx,
+	__atomic_store_n(&(h->buckets_ext[ext_bkt_id - 1]).key_idx[0],
+			 slot_id,
 			 __ATOMIC_RELEASE);
 	/* Link the new bucket to sec bucket linked list */
 	last = rte_hash_get_last_bkt(sec_bkt);
-	last->next = &h->buckets_ext[bkt_id];
+	last->next = &h->buckets_ext[ext_bkt_id - 1];
 	__hash_rw_writer_unlock(h);
-	return new_idx - 1;
+	return slot_id - 1;
 
 failure:
 	__hash_rw_writer_unlock(h);
@@ -1373,8 +1374,9 @@ remove_entry(const struct rte_hash *h, struct rte_hash_bucket *bkt, unsigned i)
 		/* Cache full, need to free it. */
 		if (cached_free_slots->len == LCORE_CACHE_SIZE) {
 			/* Need to enqueue the free slots in global ring. */
-			n_slots = rte_ring_mp_enqueue_burst(h->free_slots,
+			n_slots = rte_ring_mp_enqueue_burst_elem(h->free_slots,
 						cached_free_slots->objs,
+						sizeof(uint32_t),
 						LCORE_CACHE_SIZE, NULL);
 			ERR_IF_TRUE((n_slots == 0),
 				"%s: could not enqueue free slots in global ring\n",
@@ -1383,11 +1385,11 @@ remove_entry(const struct rte_hash *h, struct rte_hash_bucket *bkt, unsigned i)
 		}
 		/* Put index of new free slot in cache. */
 		cached_free_slots->objs[cached_free_slots->len] =
-				(void *)((uintptr_t)bkt->key_idx[i]);
+							bkt->key_idx[i];
 		cached_free_slots->len++;
 	} else {
-		rte_ring_sp_enqueue(h->free_slots,
-				(void *)((uintptr_t)bkt->key_idx[i]));
+		rte_ring_sp_enqueue_elem(h->free_slots,
+				&bkt->key_idx[i], sizeof(uint32_t));
 	}
 }
 
@@ -1551,7 +1553,8 @@ __rte_hash_del_key_with_hash(const struct rte_hash *h, const void *key,
 			 */
 			h->ext_bkt_to_free[ret] = index;
 		else
-			rte_ring_sp_enqueue(h->free_ext_bkts, (void *)(uintptr_t)index);
+			rte_ring_sp_enqueue_elem(h->free_ext_bkts, &index,
+							sizeof(uint32_t));
 	}
 	__hash_rw_writer_unlock(h);
 	return ret;
@@ -1614,7 +1617,8 @@ rte_hash_free_key_with_position(const struct rte_hash *h,
 		uint32_t index = h->ext_bkt_to_free[position];
 		if (index) {
 			/* Recycle empty ext bkt to free list. */
-			rte_ring_sp_enqueue(h->free_ext_bkts, (void *)(uintptr_t)index);
+			rte_ring_sp_enqueue_elem(h->free_ext_bkts, &index,
+							sizeof(uint32_t));
 			h->ext_bkt_to_free[position] = 0;
 		}
 	}
@@ -1625,19 +1629,19 @@ rte_hash_free_key_with_position(const struct rte_hash *h,
 		/* Cache full, need to free it. */
 		if (cached_free_slots->len == LCORE_CACHE_SIZE) {
 			/* Need to enqueue the free slots in global ring. */
-			n_slots = rte_ring_mp_enqueue_burst(h->free_slots,
+			n_slots = rte_ring_mp_enqueue_burst_elem(h->free_slots,
 						cached_free_slots->objs,
+						sizeof(uint32_t),
 						LCORE_CACHE_SIZE, NULL);
 			RETURN_IF_TRUE((n_slots == 0), -EFAULT);
 			cached_free_slots->len -= n_slots;
 		}
 		/* Put index of new free slot in cache. */
-		cached_free_slots->objs[cached_free_slots->len] =
-					(void *)((uintptr_t)key_idx);
+		cached_free_slots->objs[cached_free_slots->len] = key_idx;
 		cached_free_slots->len++;
 	} else {
-		rte_ring_sp_enqueue(h->free_slots,
-				(void *)((uintptr_t)key_idx));
+		rte_ring_sp_enqueue_elem(h->free_slots, &key_idx,
+						sizeof(uint32_t));
 	}
 
 	return 0;
diff --git a/lib/librte_hash/rte_cuckoo_hash.h b/lib/librte_hash/rte_cuckoo_hash.h
index fb19bb27d..345de6bf9 100644
--- a/lib/librte_hash/rte_cuckoo_hash.h
+++ b/lib/librte_hash/rte_cuckoo_hash.h
@@ -124,7 +124,7 @@ const rte_hash_cmp_eq_t cmp_jump_table[NUM_KEY_CMP_CASES] = {
 
 struct lcore_cache {
 	unsigned len; /**< Cache len */
-	void *objs[LCORE_CACHE_SIZE]; /**< Cache objects */
+	uint32_t objs[LCORE_CACHE_SIZE]; /**< Cache objects */
 } __rte_cache_aligned;
 
 /* Structure that stores key-value pair */
-- 
2.17.1
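
The memory saving claimed in the commit message above can be estimated with a small standalone sketch. It is an illustration under stated assumptions, not DPDK code: align32pow2() is a local stand-in for rte_align32pow2(), and slot_array_bytes() and bytes_saved() are hypothetical helpers that count only the slot array, ignoring the ring header and any cache-line padding.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local stand-in for rte_align32pow2(): round up to the next power of two. */
static inline uint32_t
align32pow2(uint32_t x)
{
	assert(x != 0 && x <= (UINT32_C(1) << 31));
	x--;
	x |= x >> 1;
	x |= x >> 2;
	x |= x >> 4;
	x |= x >> 8;
	x |= x >> 16;
	return x + 1;
}

/* Bytes used by the slot array alone (hypothetical helper, header ignored). */
static inline size_t
slot_array_bytes(uint32_t num_slots, size_t esize)
{
	return (size_t)align32pow2(num_slots) * esize;
}

/* Bytes saved by storing 32b indices instead of pointer-sized elements. */
static inline size_t
bytes_saved(uint32_t num_slots)
{
	return slot_array_bytes(num_slots, sizeof(void *)) -
		slot_array_bytes(num_slots, sizeof(uint32_t));
}
```

On a platform with 64b pointers this halves the slot array; on a platform with 32b pointers the two sizes are equal and nothing is saved.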



* [dpdk-dev] [PATCH v9 6/6] lib/eventdev: use custom element size ring for event rings
  2020-01-16  5:25   ` [dpdk-dev] [PATCH v9 0/6] lib/ring: APIs to support custom element size Honnappa Nagarahalli
                       ` (4 preceding siblings ...)
  2020-01-16  5:25     ` [dpdk-dev] [PATCH v9 5/6] lib/hash: use ring with 32b element size to save memory Honnappa Nagarahalli
@ 2020-01-16  5:25     ` Honnappa Nagarahalli
  2020-01-17 14:41       ` Jerin Jacob
  2020-01-16 16:36     ` [dpdk-dev] [PATCH v9 0/6] lib/ring: APIs to support custom element size Honnappa Nagarahalli
  2020-01-17 17:15     ` Olivier Matz
  7 siblings, 1 reply; 173+ messages in thread
From: Honnappa Nagarahalli @ 2020-01-16  5:25 UTC (permalink / raw)
  To: olivier.matz, sthemmin, jerinj, bruce.richardson, david.marchand,
	pbhagavatula, konstantin.ananyev, yipeng1.wang,
	honnappa.nagarahalli
  Cc: dev, dharmik.thakkar, ruifeng.wang, gavin.hu, nd

Use custom element size ring APIs to replace event ring
implementation. This avoids code duplication.

Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Gavin Hu <gavin.hu@arm.com>
Reviewed-by: Ola Liljedahl <ola.liljedahl@arm.com>
---
 lib/librte_eventdev/rte_event_ring.c | 147 ++-------------------------
 lib/librte_eventdev/rte_event_ring.h |  45 ++++----
 2 files changed, 24 insertions(+), 168 deletions(-)

diff --git a/lib/librte_eventdev/rte_event_ring.c b/lib/librte_eventdev/rte_event_ring.c
index 50190de01..d27e23901 100644
--- a/lib/librte_eventdev/rte_event_ring.c
+++ b/lib/librte_eventdev/rte_event_ring.c
@@ -1,5 +1,6 @@
 /* SPDX-License-Identifier: BSD-3-Clause
  * Copyright(c) 2017 Intel Corporation
+ * Copyright(c) 2019 Arm Limited
  */
 
 #include <sys/queue.h>
@@ -11,13 +12,6 @@
 #include <rte_eal_memconfig.h>
 #include "rte_event_ring.h"
 
-TAILQ_HEAD(rte_event_ring_list, rte_tailq_entry);
-
-static struct rte_tailq_elem rte_event_ring_tailq = {
-	.name = RTE_TAILQ_EVENT_RING_NAME,
-};
-EAL_REGISTER_TAILQ(rte_event_ring_tailq)
-
 int
 rte_event_ring_init(struct rte_event_ring *r, const char *name,
 	unsigned int count, unsigned int flags)
@@ -35,150 +29,21 @@ struct rte_event_ring *
 rte_event_ring_create(const char *name, unsigned int count, int socket_id,
 		unsigned int flags)
 {
-	char mz_name[RTE_MEMZONE_NAMESIZE];
-	struct rte_event_ring *r;
-	struct rte_tailq_entry *te;
-	const struct rte_memzone *mz;
-	ssize_t ring_size;
-	int mz_flags = 0;
-	struct rte_event_ring_list *ring_list = NULL;
-	const unsigned int requested_count = count;
-	int ret;
-
-	ring_list = RTE_TAILQ_CAST(rte_event_ring_tailq.head,
-		rte_event_ring_list);
-
-	/* for an exact size ring, round up from count to a power of two */
-	if (flags & RING_F_EXACT_SZ)
-		count = rte_align32pow2(count + 1);
-	else if (!rte_is_power_of_2(count)) {
-		rte_errno = EINVAL;
-		return NULL;
-	}
-
-	ring_size = sizeof(*r) + (count * sizeof(struct rte_event));
-
-	ret = snprintf(mz_name, sizeof(mz_name), "%s%s",
-		RTE_RING_MZ_PREFIX, name);
-	if (ret < 0 || ret >= (int)sizeof(mz_name)) {
-		rte_errno = ENAMETOOLONG;
-		return NULL;
-	}
-
-	te = rte_zmalloc("RING_TAILQ_ENTRY", sizeof(*te), 0);
-	if (te == NULL) {
-		RTE_LOG(ERR, RING, "Cannot reserve memory for tailq\n");
-		rte_errno = ENOMEM;
-		return NULL;
-	}
-
-	rte_mcfg_tailq_write_lock();
-
-	/*
-	 * reserve a memory zone for this ring. If we can't get rte_config or
-	 * we are secondary process, the memzone_reserve function will set
-	 * rte_errno for us appropriately - hence no check in this this function
-	 */
-	mz = rte_memzone_reserve(mz_name, ring_size, socket_id, mz_flags);
-	if (mz != NULL) {
-		r = mz->addr;
-		/* Check return value in case rte_ring_init() fails on size */
-		int err = rte_event_ring_init(r, name, requested_count, flags);
-		if (err) {
-			RTE_LOG(ERR, RING, "Ring init failed\n");
-			if (rte_memzone_free(mz) != 0)
-				RTE_LOG(ERR, RING, "Cannot free memzone\n");
-			rte_free(te);
-			rte_mcfg_tailq_write_unlock();
-			return NULL;
-		}
-
-		te->data = (void *) r;
-		r->r.memzone = mz;
-
-		TAILQ_INSERT_TAIL(ring_list, te, next);
-	} else {
-		r = NULL;
-		RTE_LOG(ERR, RING, "Cannot reserve memory\n");
-		rte_free(te);
-	}
-	rte_mcfg_tailq_write_unlock();
-
-	return r;
+	return (struct rte_event_ring *)rte_ring_create_elem(name,
+						sizeof(struct rte_event),
+						count, socket_id, flags);
 }
 
 
 struct rte_event_ring *
 rte_event_ring_lookup(const char *name)
 {
-	struct rte_tailq_entry *te;
-	struct rte_event_ring *r = NULL;
-	struct rte_event_ring_list *ring_list;
-
-	ring_list = RTE_TAILQ_CAST(rte_event_ring_tailq.head,
-			rte_event_ring_list);
-
-	rte_mcfg_tailq_read_lock();
-
-	TAILQ_FOREACH(te, ring_list, next) {
-		r = (struct rte_event_ring *) te->data;
-		if (strncmp(name, r->r.name, RTE_RING_NAMESIZE) == 0)
-			break;
-	}
-
-	rte_mcfg_tailq_read_unlock();
-
-	if (te == NULL) {
-		rte_errno = ENOENT;
-		return NULL;
-	}
-
-	return r;
+	return (struct rte_event_ring *)rte_ring_lookup(name);
 }
 
 /* free the ring */
 void
 rte_event_ring_free(struct rte_event_ring *r)
 {
-	struct rte_event_ring_list *ring_list = NULL;
-	struct rte_tailq_entry *te;
-
-	if (r == NULL)
-		return;
-
-	/*
-	 * Ring was not created with rte_event_ring_create,
-	 * therefore, there is no memzone to free.
-	 */
-	if (r->r.memzone == NULL) {
-		RTE_LOG(ERR, RING,
-			"Cannot free ring (not created with rte_event_ring_create()");
-		return;
-	}
-
-	if (rte_memzone_free(r->r.memzone) != 0) {
-		RTE_LOG(ERR, RING, "Cannot free memory\n");
-		return;
-	}
-
-	ring_list = RTE_TAILQ_CAST(rte_event_ring_tailq.head,
-			rte_event_ring_list);
-	rte_mcfg_tailq_write_lock();
-
-	/* find out tailq entry */
-	TAILQ_FOREACH(te, ring_list, next) {
-		if (te->data == (void *) r)
-			break;
-	}
-
-	if (te == NULL) {
-		rte_mcfg_tailq_write_unlock();
-		return;
-	}
-
-	TAILQ_REMOVE(ring_list, te, next);
-
-	rte_mcfg_tailq_write_unlock();
-
-	rte_free(te);
+	rte_ring_free((struct rte_ring *)r);
 }
diff --git a/lib/librte_eventdev/rte_event_ring.h b/lib/librte_eventdev/rte_event_ring.h
index 827a3209e..c0861b0ec 100644
--- a/lib/librte_eventdev/rte_event_ring.h
+++ b/lib/librte_eventdev/rte_event_ring.h
@@ -1,5 +1,6 @@
 /* SPDX-License-Identifier: BSD-3-Clause
  * Copyright(c) 2016-2017 Intel Corporation
+ * Copyright(c) 2019 Arm Limited
  */
 
 /**
@@ -19,6 +20,7 @@
 #include <rte_memory.h>
 #include <rte_malloc.h>
 #include <rte_ring.h>
+#include <rte_ring_elem.h>
 #include "rte_eventdev.h"
 
 #define RTE_TAILQ_EVENT_RING_NAME "RTE_EVENT_RING"
@@ -88,22 +90,17 @@ rte_event_ring_enqueue_burst(struct rte_event_ring *r,
 		const struct rte_event *events,
 		unsigned int n, uint16_t *free_space)
 {
-	uint32_t prod_head, prod_next;
-	uint32_t free_entries;
+	unsigned int num;
+	uint32_t space;
 
-	n = __rte_ring_move_prod_head(&r->r, r->r.prod.single, n,
-			RTE_RING_QUEUE_VARIABLE,
-			&prod_head, &prod_next, &free_entries);
-	if (n == 0)
-		goto end;
+	num = rte_ring_enqueue_burst_elem(&r->r, events,
+				sizeof(struct rte_event), n,
+				&space);
 
-	ENQUEUE_PTRS(&r->r, &r[1], prod_head, events, n, struct rte_event);
-
-	update_tail(&r->r.prod, prod_head, prod_next, r->r.prod.single, 1);
-end:
 	if (free_space != NULL)
-		*free_space = free_entries - n;
-	return n;
+		*free_space = space;
+
+	return num;
 }
 
 /**
@@ -129,23 +126,17 @@ rte_event_ring_dequeue_burst(struct rte_event_ring *r,
 		struct rte_event *events,
 		unsigned int n, uint16_t *available)
 {
-	uint32_t cons_head, cons_next;
-	uint32_t entries;
-
-	n = __rte_ring_move_cons_head(&r->r, r->r.cons.single, n,
-			RTE_RING_QUEUE_VARIABLE,
-			&cons_head, &cons_next, &entries);
-	if (n == 0)
-		goto end;
+	unsigned int num;
+	uint32_t remaining;
 
-	DEQUEUE_PTRS(&r->r, &r[1], cons_head, events, n, struct rte_event);
+	num = rte_ring_dequeue_burst_elem(&r->r, events,
+				sizeof(struct rte_event), n,
+				&remaining);
 
-	update_tail(&r->r.cons, cons_head, cons_next, r->r.cons.single, 0);
-
-end:
 	if (available != NULL)
-		*available = entries - n;
-	return n;
+		*available = remaining;
+
+	return num;
 }
 
 /*
-- 
2.17.1
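
The shape of the change above (a typed event ring reduced to thin wrappers over a ring that takes the element size at run time) can be sketched without DPDK. Everything below is a hypothetical, single-threaded toy: the toy_ring and toy_event names are invented for illustration, there is no multi-producer/consumer synchronization, and unlike rte_ring the toy uses every slot. It only assumes what the thread states: elements are a multiple of 32b and are copied by address.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Toy single-threaded ring with a run-time element size (not rte_ring). */
struct toy_ring {
	uint32_t head;  /* free-running producer index */
	uint32_t tail;  /* free-running consumer index */
	uint32_t size;  /* number of slots, a power of two */
	uint32_t esize; /* element size in bytes, a multiple of 4 */
	uint8_t *slots; /* size * esize bytes of storage */
};

static struct toy_ring *
toy_ring_create(uint32_t esize, uint32_t size)
{
	struct toy_ring *r;

	/* Element size must be a multiple of 32b; slot count a power of two. */
	if (esize == 0 || esize % 4 != 0 || size == 0 || (size & (size - 1)) != 0)
		return NULL;
	r = calloc(1, sizeof(*r));
	if (r == NULL)
		return NULL;
	r->slots = malloc((size_t)size * esize);
	if (r->slots == NULL) {
		free(r);
		return NULL;
	}
	r->size = size;
	r->esize = esize;
	return r;
}

static void
toy_ring_free(struct toy_ring *r)
{
	if (r != NULL) {
		free(r->slots);
		free(r);
	}
}

/* Copy up to n elements in; report the free slots left afterwards. */
static unsigned int
toy_ring_enqueue_burst(struct toy_ring *r, const void *objs, unsigned int n,
		uint32_t *free_space)
{
	uint32_t free_entries = r->size - (r->head - r->tail);
	unsigned int i;

	if (n > free_entries)
		n = free_entries;
	for (i = 0; i < n; i++)
		memcpy(r->slots + (size_t)((r->head + i) & (r->size - 1)) * r->esize,
			(const uint8_t *)objs + (size_t)i * r->esize, r->esize);
	r->head += n;
	if (free_space != NULL)
		*free_space = free_entries - n;
	return n;
}

/* Copy up to n elements out; report the entries left afterwards. */
static unsigned int
toy_ring_dequeue_burst(struct toy_ring *r, void *objs, unsigned int n,
		uint32_t *available)
{
	uint32_t entries = r->head - r->tail;
	unsigned int i;

	if (n > entries)
		n = entries;
	for (i = 0; i < n; i++)
		memcpy((uint8_t *)objs + (size_t)i * r->esize,
			r->slots + (size_t)((r->tail + i) & (r->size - 1)) * r->esize,
			r->esize);
	r->tail += n;
	if (available != NULL)
		*available = entries - n;
	return n;
}

/* Hypothetical 16B element type with thin typed wrappers, as in the patch. */
struct toy_event {
	uint64_t id;
	uint64_t data;
};

static unsigned int
toy_event_enqueue_burst(struct toy_ring *r, const struct toy_event *ev,
		unsigned int n, uint16_t *free_space)
{
	uint32_t space;
	unsigned int num;

	assert(r->esize == sizeof(struct toy_event));
	num = toy_ring_enqueue_burst(r, ev, n, &space);
	if (free_space != NULL)
		*free_space = (uint16_t)space;
	return num;
}

static unsigned int
toy_event_dequeue_burst(struct toy_ring *r, struct toy_event *ev,
		unsigned int n, uint16_t *available)
{
	uint32_t remaining;
	unsigned int num;

	assert(r->esize == sizeof(struct toy_event));
	num = toy_ring_dequeue_burst(r, ev, n, &remaining);
	if (available != NULL)
		*available = (uint16_t)remaining;
	return num;
}
```

The wrappers carry no ring logic of their own: they fix the element size and narrow the 32b count to the uint16_t the typed API exposes, which is the duplication the patch removes from the event ring.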



* Re: [dpdk-dev] FW: || pw64572 lib/eventdev: use custom element size ring for event rings
  2020-01-15 18:38                         ` Honnappa Nagarahalli
@ 2020-01-16  5:27                           ` Honnappa Nagarahalli
  0 siblings, 0 replies; 173+ messages in thread
From: Honnappa Nagarahalli @ 2020-01-16  5:27 UTC (permalink / raw)
  To: Aaron Conole; +Cc: test-report, ci, dev, nd, Honnappa Nagarahalli, nd

<snip>
> > >> > >>
> > >> > >> Aaron Conole <aconole@redhat.com> writes:
> > >> > >>
> > >> > >> > Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com> writes:
> > >> > >> >
> > >> > >> >> Hi Aaron,
> > >> > >> >> I am not able to understand the error, looks like there is
> > >> > >> >> no particular error. Can you please take a look?
> > >> > >> >
> > >> > >> > Gladly.  A number of the systems that were running the build
> > >> > >> > stopped their output for an unknown reason (looks like this
> > >> > >> > was a 1-time thing).  See the error:
> > >> > >> >
> > >> > >> >   [2164/2165] Compiling C object 'app/te...st@@dpdk-
> > >> > >> test@exe/test_ring_perf.c.o'.
> > >> > >> >
> > >> > >> >   No output has been received in the last 10m0s, this potentially
> > >> > >> >   indicates a stalled build or something wrong with the build itself.
> > >> > >>
> > >> > >> I see this continually happening (I've kicked it off a number of times).
> > >> > >>
> > >> > >> This patch might need more investigation, since it's always
> > >> > >> failing when building 2164/2165 object.
> > >> > > I compiled with clang-7. Compiler seems to hang while compiling
> > >> > > test_ring.c
> > >> >
> > >> > Cool.  Looks like a good catch, then :)
> > >> Update:
> > >> x86 - compilation succeeds, but take a long time - ~1hr.
> > >> On 2 different Arm platforms - compilation succeeds in normal
> > >> amount of time.
> > >> Does anyone have any experience dealing with this kind of issue?
> > >>
> > > I ran this on another x86 server - this patch takes ~8mns. The
> > > master (without this patch) takes ~1.02mns.
> >
> > It doesn't reproduce with clang-8.
> Ok, do you want to update the Travis CI and re-run?
I isolated the compilation issue to one of the test cases. It was a large function, and I have split it into smaller functions, which reduces the compilation time significantly. I have submitted a new version.

> 
> >
> > >> <snip>



* Re: [dpdk-dev] [PATCH v9 0/6] lib/ring: APIs to support custom element size
  2020-01-16  5:25   ` [dpdk-dev] [PATCH v9 0/6] lib/ring: APIs to support custom element size Honnappa Nagarahalli
                       ` (5 preceding siblings ...)
  2020-01-16  5:25     ` [dpdk-dev] [PATCH v9 6/6] lib/eventdev: use custom element size ring for event rings Honnappa Nagarahalli
@ 2020-01-16 16:36     ` Honnappa Nagarahalli
  2020-01-17 12:14       ` David Marchand
  2020-01-17 17:15     ` Olivier Matz
  7 siblings, 1 reply; 173+ messages in thread
From: Honnappa Nagarahalli @ 2020-01-16 16:36 UTC (permalink / raw)
  To: Honnappa Nagarahalli, olivier.matz, sthemmin, jerinj,
	bruce.richardson, david.marchand, pbhagavatula,
	konstantin.ananyev, yipeng1.wang
  Cc: dev, Dharmik Thakkar, Ruifeng Wang, Gavin Hu, nd,
	Honnappa Nagarahalli, nd

I see that none of the CIs (except Travis) have run on this patch. Intel CI reported a compilation error, which I fixed in this version. Does anyone know if/when the CI will run on the patches?

Thanks,
Honnappa

> -----Original Message-----
> From: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
> Sent: Wednesday, January 15, 2020 11:25 PM
> To: olivier.matz@6wind.com; sthemmin@microsoft.com; jerinj@marvell.com;
> bruce.richardson@intel.com; david.marchand@redhat.com;
> pbhagavatula@marvell.com; konstantin.ananyev@intel.com;
> yipeng1.wang@intel.com; Honnappa Nagarahalli
> <Honnappa.Nagarahalli@arm.com>
> Cc: dev@dpdk.org; Dharmik Thakkar <Dharmik.Thakkar@arm.com>; Ruifeng
> Wang <Ruifeng.Wang@arm.com>; Gavin Hu <Gavin.Hu@arm.com>; nd
> <nd@arm.com>
> Subject: [PATCH v9 0/6] lib/ring: APIs to support custom element size
> 
> The current rte_ring hard-codes the type of the ring element to 'void *', hence
> the size of the element is hard-coded to 32b/64b. Since the ring element type
> is not an input to rte_ring APIs, it results in couple of issues:
> 
> 1) If an application requires to store an element which is not 64b, it
>    needs to write its own ring APIs similar to rte_event_ring APIs. This
>    creates additional burden on the programmers, who end up making
>    work-arounds and often waste memory.
> 2) If there are multiple libraries that store elements of the same
>    type, currently they would have to write their own rte_ring APIs. This
>    results in code duplication.
> 
> This patch adds new APIs to support configurable ring element size.
> The APIs support custom element sizes by allowing to define the ring element
> to be a multiple of 32b.
> 
> The aim is to achieve same performance as the existing ring implementation.
> 
> v9
>  - Split 'test_ring_burst_bulk_tests' test case into 4 smaller
>    functions to address clang compilation time issue.
>  - Addressed compilation failure in Intel CI in the hash changes.
> 
> v8
>  - Changed the 128b copy elements inline function to use 'memcpy'
>    to generate unaligned load/store instructions for x86. Generic
>    copy function results in performance drop. (Konstantin)
>  - Changed the API type #defines to be more clear (Konstantin)
>  - Removed the code duplication in performance tests (Konstantin)
>  - Fixed memory leak, changed test macros to inline functions (Konstantin)
>  - Changed functional tests to test for 20B ring element. Fixed
>    a bug in 32b element copy code for enqueue/dequeue(ring size
>    needs to be normalized for 32b).
>  - Squashed the functional and performance tests in their
>    respective single commits.
> 
> v7
>  - Merged the test cases to test both legacy APIs and
>    rte_ring_xxx_elem APIs without code duplication (Konstantin, Olivier)
>  - Performance test cases are merged as well (Konstantin, Olivier)
>  - Macros to copy elements are converted into inline functions (Olivier)
>  - Added back the changes to hash and event libraries
> 
> v6
>  - Labelled as RFC to indicate the better status
>  - Added unit tests to test the rte_ring_xxx_elem APIs
>  - Corrected 'macro based partial memcpy' (5/6) patch
>  - Added Konstantin's method after correction (6/6)
>  - Check Patch shows significant warnings and errors mainly due
>    copying code from existing test cases. None of them are harmful.
>    I will fix them once we have an agreement.
> 
> v5
>  - Use memcpy for chunks of 32B (Konstantin).
>  - Both 'ring_perf_autotest' and 'ring_perf_elem_autotest' are available
>    to compare the results easily.
>  - Copying without memcpy is also available in 1/3, if anyone wants to
>    experiment on their platform.
>  - Added other platform owners to test on their respective platforms.
> 
> v4
>  - Few fixes after more performance testing
> 
> v3
>  - Removed macro-fest and used inline functions
>    (Stephen, Bruce)
> 
> v2
>  - Change Event Ring implementation to use ring templates
>    (Jerin, Pavan)
> 
> Honnappa Nagarahalli (6):
>   test/ring: use division for cycle count calculation
>   lib/ring: apis to support configurable element size
>   test/ring: add functional tests for rte_ring_xxx_elem APIs
>   test/ring: modify perf test cases to use rte_ring_xxx_elem APIs
>   lib/hash: use ring with 32b element size to save memory
>   lib/eventdev: use custom element size ring for event rings
> 
>  app/test/test_ring.c                 | 1342 +++++++++++++-------------
>  app/test/test_ring.h                 |  187 ++++
>  app/test/test_ring_perf.c            |  452 +++++----
>  lib/librte_eventdev/rte_event_ring.c |  147 +--
>  lib/librte_eventdev/rte_event_ring.h |   45 +-
>  lib/librte_hash/rte_cuckoo_hash.c    |   94 +-
>  lib/librte_hash/rte_cuckoo_hash.h    |    2 +-
>  lib/librte_ring/Makefile             |    3 +-
>  lib/librte_ring/meson.build          |    4 +
>  lib/librte_ring/rte_ring.c           |   41 +-
>  lib/librte_ring/rte_ring.h           |    1 +
>  lib/librte_ring/rte_ring_elem.h      | 1003 +++++++++++++++++++
>  lib/librte_ring/rte_ring_version.map |    2 +
>  13 files changed, 2242 insertions(+), 1081 deletions(-)
>  create mode 100644 app/test/test_ring.h
>  create mode 100644 lib/librte_ring/rte_ring_elem.h
> 
> --
> 2.17.1



* Re: [dpdk-dev] [PATCH v9 0/6] lib/ring: APIs to support custom element size
  2020-01-16 16:36     ` [dpdk-dev] [PATCH v9 0/6] lib/ring: APIs to support custom element size Honnappa Nagarahalli
@ 2020-01-17 12:14       ` David Marchand
  2020-01-17 13:34         ` Jerin Jacob
  2020-01-17 14:28         ` Honnappa Nagarahalli
  0 siblings, 2 replies; 173+ messages in thread
From: David Marchand @ 2020-01-17 12:14 UTC (permalink / raw)
  To: Honnappa Nagarahalli
  Cc: olivier.matz, sthemmin, jerinj, bruce.richardson, pbhagavatula,
	konstantin.ananyev, yipeng1.wang, dev, Dharmik Thakkar,
	Ruifeng Wang, Gavin Hu, nd, Aaron Conole

On Thu, Jan 16, 2020 at 5:36 PM Honnappa Nagarahalli
<Honnappa.Nagarahalli@arm.com> wrote:
>
> I see that none of the CIs (except Travis) have run on this patch. Intel CI has reported a compilation error and I fixed it in this version. Does anyone know if/when the CI will run on the patches?

- Pushed the series
https://patchwork.dpdk.org/project/dpdk/list/?series=8147 to a branch
of mine for checks.
Travis reports:
"
[2155/2156] Compiling C object 'app/te...st@@dpdk-test@exe/test_ring_perf.c.o'.

No output has been received in the last 10m0s, this potentially
indicates a stalled build or something wrong with the build itself.

Check the details on how to adjust your build configuration on:
https://docs.travis-ci.com/user/common-build-problems/#build-times-out-because-no-output-was-received

The build has been terminated
"

I see you discussed this already with Aaron, did I miss something?


- Besides, I have no ack/review from the hash and eventdev maintainers.


-- 
David Marchand



* Re: [dpdk-dev] [PATCH v9 0/6] lib/ring: APIs to support custom element size
  2020-01-17 12:14       ` David Marchand
@ 2020-01-17 13:34         ` Jerin Jacob
  2020-01-17 16:37           ` Mattias Rönnblom
  2020-01-17 14:28         ` Honnappa Nagarahalli
  1 sibling, 1 reply; 173+ messages in thread
From: Jerin Jacob @ 2020-01-17 13:34 UTC (permalink / raw)
  To: David Marchand
  Cc: Honnappa Nagarahalli, olivier.matz, sthemmin, jerinj,
	bruce.richardson, pbhagavatula, konstantin.ananyev, yipeng1.wang,
	dev, Dharmik Thakkar, Ruifeng Wang, Gavin Hu, nd, Aaron Conole,
	Van Haaren, Harry, Mattias Rönnblom

On Fri, Jan 17, 2020 at 5:44 PM David Marchand
<david.marchand@redhat.com> wrote:
>
> On Thu, Jan 16, 2020 at 5:36 PM Honnappa Nagarahalli
> <Honnappa.Nagarahalli@arm.com> wrote:
> >
> > I see that none of the CIs (except Travis) have run on this patch. Intel CI has reported a compilation error and I fixed it in this version. Does anyone know if/when the CI will run on the patches?
>
> - Pushed the series
> https://patchwork.dpdk.org/project/dpdk/list/?series=8147 to a branch
> of mine for checks.
> Travis reports:
> "
> [2155/2156] Compiling C object 'app/te...st@@dpdk-test@exe/test_ring_perf.c.o'.
>
> No output has been received in the last 10m0s, this potentially
> indicates a stalled build or something wrong with the build itself.
>
> Check the details on how to adjust your build configuration on:
> https://docs.travis-ci.com/user/common-build-problems/#build-times-out-because-no-output-was-received
>
> The build has been terminated
> "
>
> I see you discussed this already with Aaron, did I miss something?
>
>
> - Besides, I have no ack/review from the hash and eventdev maintainers.

+ Bruce, Harry, Mattias

Even though the event ring was added to the eventdev common code, it has
been used only by the SW eventdev drivers.
So adding the eventdev ring author and SW driver maintainers for review.

>
>
> --
> David Marchand
>

^ permalink raw reply	[flat|nested] 173+ messages in thread

* Re: [dpdk-dev] [PATCH v9 0/6] lib/ring: APIs to support custom element size
  2020-01-17 12:14       ` David Marchand
  2020-01-17 13:34         ` Jerin Jacob
@ 2020-01-17 14:28         ` Honnappa Nagarahalli
  2020-01-17 14:36           ` Honnappa Nagarahalli
  2020-01-17 16:15           ` David Marchand
  1 sibling, 2 replies; 173+ messages in thread
From: Honnappa Nagarahalli @ 2020-01-17 14:28 UTC (permalink / raw)
  To: David Marchand
  Cc: olivier.matz, sthemmin, jerinj, bruce.richardson, pbhagavatula,
	konstantin.ananyev, yipeng1.wang, dev, Dharmik Thakkar,
	Ruifeng Wang, Gavin Hu, nd, Aaron Conole, Honnappa Nagarahalli,
	nd

<snip>

> 
> On Thu, Jan 16, 2020 at 5:36 PM Honnappa Nagarahalli
> <Honnappa.Nagarahalli@arm.com> wrote:
> >
> > I see that the none of the CIs (except Travis) have run on this patch. Intel CI
> has reported a compilation error and I fixed it in this version. Does anyone
> know if/when the CI will run on the patches?
> 
> - Pushed the series
> https://patchwork.dpdk.org/project/dpdk/list/?series=8147 to a branch of
> mine for checks.
> Travis reports:
> "
> [2155/2156] Compiling C object 'app/te...st@@dpdk-
> test@exe/test_ring_perf.c.o'.
> 
> No output has been received in the last 10m0s, this potentially indicates a
> stalled build or something wrong with the build itself.
> 
> Check the details on how to adjust your build configuration on:
> https://docs.travis-ci.com/user/common-build-problems/#build-times-out-
> because-no-output-was-received
> 
> The build has been terminated
> "
> 
> I see you discussed this already with Aaron, did I miss something?
Aaron has tested it with clang-8 and said it does not show the issue. I am not sure if he tested this on Travis CI.
I have tested it on clang-9 and it does not show any issues. I have modified the test cases to take much less time with clang-7.

Aaron, please let me know if you want to upgrade Travis CI?
> 
> 
> - Besides, I have no ack/review from the hash and eventdev maintainers.
Yipeng, Jerin, can you please review your parts?

> 
> 
> --
> David Marchand


^ permalink raw reply	[flat|nested] 173+ messages in thread

* Re: [dpdk-dev] [PATCH v9 0/6] lib/ring: APIs to support custom element size
  2020-01-17 14:28         ` Honnappa Nagarahalli
@ 2020-01-17 14:36           ` Honnappa Nagarahalli
  2020-01-17 16:15           ` David Marchand
  1 sibling, 0 replies; 173+ messages in thread
From: Honnappa Nagarahalli @ 2020-01-17 14:36 UTC (permalink / raw)
  To: David Marchand
  Cc: olivier.matz, sthemmin, jerinj, bruce.richardson, pbhagavatula,
	konstantin.ananyev, yipeng1.wang, dev, Dharmik Thakkar,
	Ruifeng Wang, Gavin Hu, nd, Aaron Conole, Honnappa Nagarahalli,
	nd

<snip>
> 
> >
> > On Thu, Jan 16, 2020 at 5:36 PM Honnappa Nagarahalli
> > <Honnappa.Nagarahalli@arm.com> wrote:
> > >
> > > I see that the none of the CIs (except Travis) have run on this
> > > patch. Intel CI
> > has reported a compilation error and I fixed it in this version. Does
> > anyone know if/when the CI will run on the patches?
> >
> > - Pushed the series
> > https://patchwork.dpdk.org/project/dpdk/list/?series=8147 to a branch
> > of mine for checks.
> > Travis reports:
> > "
> > [2155/2156] Compiling C object 'app/te...st@@dpdk-
> > test@exe/test_ring_perf.c.o'.
> >
> > No output has been received in the last 10m0s, this potentially
> > indicates a stalled build or something wrong with the build itself.
> >
> > Check the details on how to adjust your build configuration on:
> > https://docs.travis-ci.com/user/common-build-problems/#build-times-out
> > -
> > because-no-output-was-received
> >
> > The build has been terminated
> > "
> >
> > I see you discussed this already with Aaron, did I miss something?
> Aaron has tested it with clang-8 and said it does not show the issue. I am not
> sure if he tested this on Travis CI.
> I have tested it on clang-9 and it does not show any issues. I have modified
> the test cases to take much lesser time for Clang-7.
> 
> Aaron, please let me know if you want to upgrade Travis CI?
BTW, the Intel compilation CI is now passing with v9, including clang.

> >
> >
> > - Besides, I have no ack/review from the hash and eventdev maintainers.
> Yipeng, Jerin can you please review your parts?
> 
> >
> >
> > --
> > David Marchand
> 


^ permalink raw reply	[flat|nested] 173+ messages in thread

* Re: [dpdk-dev] [PATCH v9 6/6] lib/eventdev: use custom element size ring for event rings
  2020-01-16  5:25     ` [dpdk-dev] [PATCH v9 6/6] lib/eventdev: use custom element size ring for event rings Honnappa Nagarahalli
@ 2020-01-17 14:41       ` Jerin Jacob
  2020-01-17 16:12         ` David Marchand
  0 siblings, 1 reply; 173+ messages in thread
From: Jerin Jacob @ 2020-01-17 14:41 UTC (permalink / raw)
  To: Honnappa Nagarahalli
  Cc: Olivier Matz, Stephen Hemminger, Jerin Jacob, Richardson, Bruce,
	David Marchand, Pavan Nikhilesh, Ananyev, Konstantin,
	Yipeng Wang, dpdk-dev, Dharmik Thakkar,
	Ruifeng Wang (Arm Technology China),
	Gavin Hu, nd, Van Haaren, Harry

On Thu, Jan 16, 2020 at 10:56 AM Honnappa Nagarahalli
<honnappa.nagarahalli@arm.com> wrote:
>
> Use custom element size ring APIs to replace event ring
> implementation. This avoids code duplication.
>
> Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
> Reviewed-by: Gavin Hu <gavin.hu@arm.com>
> Reviewed-by: Ola Liljedahl <ola.liljedahl@arm.com>

Please change the subject to eventdev: xxxxxx.

With the above change, LGTM

Reviewed-by: Jerin Jacob <jerinj@marvell.com>

> ---
>  lib/librte_eventdev/rte_event_ring.c | 147 ++-------------------------
>  lib/librte_eventdev/rte_event_ring.h |  45 ++++----
>  2 files changed, 24 insertions(+), 168 deletions(-)
>
> diff --git a/lib/librte_eventdev/rte_event_ring.c b/lib/librte_eventdev/rte_event_ring.c
> index 50190de01..d27e23901 100644
> --- a/lib/librte_eventdev/rte_event_ring.c
> +++ b/lib/librte_eventdev/rte_event_ring.c
> @@ -1,5 +1,6 @@
>  /* SPDX-License-Identifier: BSD-3-Clause
>   * Copyright(c) 2017 Intel Corporation
> + * Copyright(c) 2019 Arm Limited
>   */
>
>  #include <sys/queue.h>
> @@ -11,13 +12,6 @@
>  #include <rte_eal_memconfig.h>
>  #include "rte_event_ring.h"
>
> -TAILQ_HEAD(rte_event_ring_list, rte_tailq_entry);
> -
> -static struct rte_tailq_elem rte_event_ring_tailq = {
> -       .name = RTE_TAILQ_EVENT_RING_NAME,
> -};
> -EAL_REGISTER_TAILQ(rte_event_ring_tailq)
> -
>  int
>  rte_event_ring_init(struct rte_event_ring *r, const char *name,
>         unsigned int count, unsigned int flags)
> @@ -35,150 +29,21 @@ struct rte_event_ring *
>  rte_event_ring_create(const char *name, unsigned int count, int socket_id,
>                 unsigned int flags)
>  {
> -       char mz_name[RTE_MEMZONE_NAMESIZE];
> -       struct rte_event_ring *r;
> -       struct rte_tailq_entry *te;
> -       const struct rte_memzone *mz;
> -       ssize_t ring_size;
> -       int mz_flags = 0;
> -       struct rte_event_ring_list *ring_list = NULL;
> -       const unsigned int requested_count = count;
> -       int ret;
> -
> -       ring_list = RTE_TAILQ_CAST(rte_event_ring_tailq.head,
> -               rte_event_ring_list);
> -
> -       /* for an exact size ring, round up from count to a power of two */
> -       if (flags & RING_F_EXACT_SZ)
> -               count = rte_align32pow2(count + 1);
> -       else if (!rte_is_power_of_2(count)) {
> -               rte_errno = EINVAL;
> -               return NULL;
> -       }
> -
> -       ring_size = sizeof(*r) + (count * sizeof(struct rte_event));
> -
> -       ret = snprintf(mz_name, sizeof(mz_name), "%s%s",
> -               RTE_RING_MZ_PREFIX, name);
> -       if (ret < 0 || ret >= (int)sizeof(mz_name)) {
> -               rte_errno = ENAMETOOLONG;
> -               return NULL;
> -       }
> -
> -       te = rte_zmalloc("RING_TAILQ_ENTRY", sizeof(*te), 0);
> -       if (te == NULL) {
> -               RTE_LOG(ERR, RING, "Cannot reserve memory for tailq\n");
> -               rte_errno = ENOMEM;
> -               return NULL;
> -       }
> -
> -       rte_mcfg_tailq_write_lock();
> -
> -       /*
> -        * reserve a memory zone for this ring. If we can't get rte_config or
> -        * we are secondary process, the memzone_reserve function will set
> -        * rte_errno for us appropriately - hence no check in this this function
> -        */
> -       mz = rte_memzone_reserve(mz_name, ring_size, socket_id, mz_flags);
> -       if (mz != NULL) {
> -               r = mz->addr;
> -               /* Check return value in case rte_ring_init() fails on size */
> -               int err = rte_event_ring_init(r, name, requested_count, flags);
> -               if (err) {
> -                       RTE_LOG(ERR, RING, "Ring init failed\n");
> -                       if (rte_memzone_free(mz) != 0)
> -                               RTE_LOG(ERR, RING, "Cannot free memzone\n");
> -                       rte_free(te);
> -                       rte_mcfg_tailq_write_unlock();
> -                       return NULL;
> -               }
> -
> -               te->data = (void *) r;
> -               r->r.memzone = mz;
> -
> -               TAILQ_INSERT_TAIL(ring_list, te, next);
> -       } else {
> -               r = NULL;
> -               RTE_LOG(ERR, RING, "Cannot reserve memory\n");
> -               rte_free(te);
> -       }
> -       rte_mcfg_tailq_write_unlock();
> -
> -       return r;
> +       return (struct rte_event_ring *)rte_ring_create_elem(name,
> +                                               sizeof(struct rte_event),
> +                                               count, socket_id, flags);
>  }
>
>
>  struct rte_event_ring *
>  rte_event_ring_lookup(const char *name)
>  {
> -       struct rte_tailq_entry *te;
> -       struct rte_event_ring *r = NULL;
> -       struct rte_event_ring_list *ring_list;
> -
> -       ring_list = RTE_TAILQ_CAST(rte_event_ring_tailq.head,
> -                       rte_event_ring_list);
> -
> -       rte_mcfg_tailq_read_lock();
> -
> -       TAILQ_FOREACH(te, ring_list, next) {
> -               r = (struct rte_event_ring *) te->data;
> -               if (strncmp(name, r->r.name, RTE_RING_NAMESIZE) == 0)
> -                       break;
> -       }
> -
> -       rte_mcfg_tailq_read_unlock();
> -
> -       if (te == NULL) {
> -               rte_errno = ENOENT;
> -               return NULL;
> -       }
> -
> -       return r;
> +       return (struct rte_event_ring *)rte_ring_lookup(name);
>  }
>
>  /* free the ring */
>  void
>  rte_event_ring_free(struct rte_event_ring *r)
>  {
> -       struct rte_event_ring_list *ring_list = NULL;
> -       struct rte_tailq_entry *te;
> -
> -       if (r == NULL)
> -               return;
> -
> -       /*
> -        * Ring was not created with rte_event_ring_create,
> -        * therefore, there is no memzone to free.
> -        */
> -       if (r->r.memzone == NULL) {
> -               RTE_LOG(ERR, RING,
> -                       "Cannot free ring (not created with rte_event_ring_create()");
> -               return;
> -       }
> -
> -       if (rte_memzone_free(r->r.memzone) != 0) {
> -               RTE_LOG(ERR, RING, "Cannot free memory\n");
> -               return;
> -       }
> -
> -       ring_list = RTE_TAILQ_CAST(rte_event_ring_tailq.head,
> -                       rte_event_ring_list);
> -       rte_mcfg_tailq_write_lock();
> -
> -       /* find out tailq entry */
> -       TAILQ_FOREACH(te, ring_list, next) {
> -               if (te->data == (void *) r)
> -                       break;
> -       }
> -
> -       if (te == NULL) {
> -               rte_mcfg_tailq_write_unlock();
> -               return;
> -       }
> -
> -       TAILQ_REMOVE(ring_list, te, next);
> -
> -       rte_mcfg_tailq_write_unlock();
> -
> -       rte_free(te);
> +       rte_ring_free((struct rte_ring *)r);
>  }
> diff --git a/lib/librte_eventdev/rte_event_ring.h b/lib/librte_eventdev/rte_event_ring.h
> index 827a3209e..c0861b0ec 100644
> --- a/lib/librte_eventdev/rte_event_ring.h
> +++ b/lib/librte_eventdev/rte_event_ring.h
> @@ -1,5 +1,6 @@
>  /* SPDX-License-Identifier: BSD-3-Clause
>   * Copyright(c) 2016-2017 Intel Corporation
> + * Copyright(c) 2019 Arm Limited
>   */
>
>  /**
> @@ -19,6 +20,7 @@
>  #include <rte_memory.h>
>  #include <rte_malloc.h>
>  #include <rte_ring.h>
> +#include <rte_ring_elem.h>
>  #include "rte_eventdev.h"
>
>  #define RTE_TAILQ_EVENT_RING_NAME "RTE_EVENT_RING"
> @@ -88,22 +90,17 @@ rte_event_ring_enqueue_burst(struct rte_event_ring *r,
>                 const struct rte_event *events,
>                 unsigned int n, uint16_t *free_space)
>  {
> -       uint32_t prod_head, prod_next;
> -       uint32_t free_entries;
> +       unsigned int num;
> +       uint32_t space;
>
> -       n = __rte_ring_move_prod_head(&r->r, r->r.prod.single, n,
> -                       RTE_RING_QUEUE_VARIABLE,
> -                       &prod_head, &prod_next, &free_entries);
> -       if (n == 0)
> -               goto end;
> +       num = rte_ring_enqueue_burst_elem(&r->r, events,
> +                               sizeof(struct rte_event), n,
> +                               &space);
>
> -       ENQUEUE_PTRS(&r->r, &r[1], prod_head, events, n, struct rte_event);
> -
> -       update_tail(&r->r.prod, prod_head, prod_next, r->r.prod.single, 1);
> -end:
>         if (free_space != NULL)
> -               *free_space = free_entries - n;
> -       return n;
> +               *free_space = space;
> +
> +       return num;
>  }
>
>  /**
> @@ -129,23 +126,17 @@ rte_event_ring_dequeue_burst(struct rte_event_ring *r,
>                 struct rte_event *events,
>                 unsigned int n, uint16_t *available)
>  {
> -       uint32_t cons_head, cons_next;
> -       uint32_t entries;
> -
> -       n = __rte_ring_move_cons_head(&r->r, r->r.cons.single, n,
> -                       RTE_RING_QUEUE_VARIABLE,
> -                       &cons_head, &cons_next, &entries);
> -       if (n == 0)
> -               goto end;
> +       unsigned int num;
> +       uint32_t remaining;
>
> -       DEQUEUE_PTRS(&r->r, &r[1], cons_head, events, n, struct rte_event);
> +       num = rte_ring_dequeue_burst_elem(&r->r, events,
> +                               sizeof(struct rte_event), n,
> +                               &remaining);
>
> -       update_tail(&r->r.cons, cons_head, cons_next, r->r.cons.single, 0);
> -
> -end:
>         if (available != NULL)
> -               *available = entries - n;
> -       return n;
> +               *available = remaining;
> +
> +       return num;
>  }
>
>  /*
> --
> 2.17.1
>
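
The wrapper pattern used in this patch (a typed burst function that forwards
to a generic element-size call and narrows the returned free-space count) can
be sketched in isolation. This is only an illustration: toy_ring, toy_event
and both functions are hypothetical stand-ins, not DPDK APIs, and the toy ring
is a plain linear buffer with no wrap-around, dequeue, or synchronization.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Toy stand-ins; none of these names are DPDK APIs. */
struct toy_ring {
	uint32_t capacity;   /* number of element slots */
	uint32_t used;       /* slots currently occupied */
	uint8_t  data[256];  /* element storage */
};

struct toy_event {
	uint64_t meta;
	uint64_t payload;
};

/* Generic burst enqueue: stores as many of the n elements as fit. */
static unsigned int
toy_ring_enqueue_burst_elem(struct toy_ring *r, const void *objs,
		unsigned int esize, unsigned int n, uint32_t *free_space)
{
	uint32_t room = r->capacity - r->used;
	unsigned int num = n < room ? n : room;

	assert((size_t)r->capacity * esize <= sizeof(r->data));
	memcpy(r->data + (size_t)r->used * esize, objs, (size_t)num * esize);
	r->used += num;
	if (free_space != NULL)
		*free_space = room - num;
	return num;
}

/* Typed wrapper: fixes the element size and narrows the free-space count. */
static unsigned int
toy_event_ring_enqueue_burst(struct toy_ring *r,
		const struct toy_event *events,
		unsigned int n, uint16_t *free_space)
{
	uint32_t space;
	unsigned int num;

	num = toy_ring_enqueue_burst_elem(r, events,
			sizeof(struct toy_event), n, &space);
	if (free_space != NULL)
		*free_space = (uint16_t)space;
	return num;
}
```

The typed wrapper carries no ring logic of its own, which is the point of the
change: the copy and the accounting live in one place, and the wrapper only
supplies the element size and converts the free-space count to uint16_t.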

^ permalink raw reply	[flat|nested] 173+ messages in thread

* Re: [dpdk-dev] [PATCH v9 6/6] lib/eventdev: use custom element size ring for event rings
  2020-01-17 14:41       ` Jerin Jacob
@ 2020-01-17 16:12         ` David Marchand
  0 siblings, 0 replies; 173+ messages in thread
From: David Marchand @ 2020-01-17 16:12 UTC (permalink / raw)
  To: Jerin Jacob, Honnappa Nagarahalli
  Cc: Olivier Matz, Stephen Hemminger, Jerin Jacob, Richardson, Bruce,
	Pavan Nikhilesh, Ananyev, Konstantin, Yipeng Wang, dpdk-dev,
	Dharmik Thakkar, Ruifeng Wang (Arm Technology China),
	Gavin Hu, nd, Van Haaren, Harry

On Fri, Jan 17, 2020 at 3:42 PM Jerin Jacob <jerinjacobk@gmail.com> wrote:
>
> On Thu, Jan 16, 2020 at 10:56 AM Honnappa Nagarahalli
> <honnappa.nagarahalli@arm.com> wrote:
> >
> > Use custom element size ring APIs to replace event ring
> > implementation. This avoids code duplication.
> >
> > Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
> > Reviewed-by: Gavin Hu <gavin.hu@arm.com>
> > Reviewed-by: Ola Liljedahl <ola.liljedahl@arm.com>
>
> Please change the subject to eventdev: xxxxxx.

I can do that while applying.

>
> With the above change, LGTM
>
> Reviewed-by: Jerin Jacob <jerinj@marvell.com>

Thanks Jerin.


--
David Marchand


^ permalink raw reply	[flat|nested] 173+ messages in thread

* Re: [dpdk-dev] [PATCH v9 0/6] lib/ring: APIs to support custom element size
  2020-01-17 14:28         ` Honnappa Nagarahalli
  2020-01-17 14:36           ` Honnappa Nagarahalli
@ 2020-01-17 16:15           ` David Marchand
  2020-01-17 16:32             ` Honnappa Nagarahalli
  1 sibling, 1 reply; 173+ messages in thread
From: David Marchand @ 2020-01-17 16:15 UTC (permalink / raw)
  To: Honnappa Nagarahalli
  Cc: olivier.matz, sthemmin, jerinj, bruce.richardson, pbhagavatula,
	konstantin.ananyev, yipeng1.wang, dev, Dharmik Thakkar,
	Ruifeng Wang, Gavin Hu, nd, Aaron Conole

On Fri, Jan 17, 2020 at 3:29 PM Honnappa Nagarahalli
<Honnappa.Nagarahalli@arm.com> wrote:
> > - Pushed the series
> > https://patchwork.dpdk.org/project/dpdk/list/?series=8147 to a branch of
> > mine for checks.
> > Travis reports:
> > "
> > [2155/2156] Compiling C object 'app/te...st@@dpdk-
> > test@exe/test_ring_perf.c.o'.
> >
> > No output has been received in the last 10m0s, this potentially indicates a
> > stalled build or something wrong with the build itself.
> >
> > Check the details on how to adjust your build configuration on:
> > https://docs.travis-ci.com/user/common-build-problems/#build-times-out-
> > because-no-output-was-received
> >
> > The build has been terminated
> > "
> >
> > I see you discussed this already with Aaron, did I miss something?
> Aaron has tested it with clang-8 and said it does not show the issue. I am not sure if he tested this on Travis CI.
> I have tested it on clang-9 and it does not show any issues. I have modified the test cases to take much lesser time for Clang-7.

The problem is seen with clang 7 in Ubuntu 16.04.
https://travis-ci.com/david-marchand/dpdk/jobs/276790838

>
> Aaron, please let me know if you want to upgrade Travis CI?

Do you mean upgrading Travis to hide this issue?


-- 
David Marchand


^ permalink raw reply	[flat|nested] 173+ messages in thread

* Re: [dpdk-dev] [PATCH v9 0/6] lib/ring: APIs to support custom element size
  2020-01-17 16:15           ` David Marchand
@ 2020-01-17 16:32             ` Honnappa Nagarahalli
  0 siblings, 0 replies; 173+ messages in thread
From: Honnappa Nagarahalli @ 2020-01-17 16:32 UTC (permalink / raw)
  To: David Marchand
  Cc: olivier.matz, sthemmin, jerinj, bruce.richardson, pbhagavatula,
	konstantin.ananyev, yipeng1.wang, dev, Dharmik Thakkar,
	Ruifeng Wang, Gavin Hu, nd, Aaron Conole, Honnappa Nagarahalli,
	nd

<snip>

> On Fri, Jan 17, 2020 at 3:29 PM Honnappa Nagarahalli
> <Honnappa.Nagarahalli@arm.com> wrote:
> > > - Pushed the series
> > > https://patchwork.dpdk.org/project/dpdk/list/?series=8147 to a
> > > branch of mine for checks.
> > > Travis reports:
> > > "
> > > [2155/2156] Compiling C object 'app/te...st@@dpdk-
> > > test@exe/test_ring_perf.c.o'.
> > >
> > > No output has been received in the last 10m0s, this potentially
> > > indicates a stalled build or something wrong with the build itself.
> > >
> > > Check the details on how to adjust your build configuration on:
> > > https://docs.travis-ci.com/user/common-build-problems/#build-times-o
> > > ut-
> > > because-no-output-was-received
> > >
> > > The build has been terminated
> > > "
> > >
> > > I see you discussed this already with Aaron, did I miss something?
> > Aaron has tested it with clang-8 and said it does not show the issue. I am not
> sure if he tested this on Travis CI.
> > I have tested it on clang-9 and it does not show any issues. I have modified
> the test cases to take much lesser time for Clang-7.
> 
> The problem is seen with clang 7 in Ubuntu 16.04.
> https://travis-ci.com/david-marchand/dpdk/jobs/276790838
Agreed. I have split the test cases into multiple functions to reduce the clang 7 compile time, but that is not enough for Travis CI.

> 
> >
> > Aaron, please let me know if you want to upgrade Travis CI?
> 
> Do you mean upgrading Travis to hide this issue?
Yes, upgrading to use clang-8 or clang-9.

> 
> 
> --
> David Marchand

^ permalink raw reply	[flat|nested] 173+ messages in thread

* Re: [dpdk-dev] [PATCH v9 2/6] lib/ring: apis to support configurable element size
  2020-01-16  5:25     ` [dpdk-dev] [PATCH v9 2/6] lib/ring: apis to support configurable element size Honnappa Nagarahalli
@ 2020-01-17 16:34       ` Olivier Matz
  2020-01-17 16:45         ` Honnappa Nagarahalli
  0 siblings, 1 reply; 173+ messages in thread
From: Olivier Matz @ 2020-01-17 16:34 UTC (permalink / raw)
  To: Honnappa Nagarahalli
  Cc: sthemmin, jerinj, bruce.richardson, david.marchand, pbhagavatula,
	konstantin.ananyev, yipeng1.wang, dev, dharmik.thakkar,
	ruifeng.wang, gavin.hu, nd

Hi Honnappa,

On Wed, Jan 15, 2020 at 11:25:07PM -0600, Honnappa Nagarahalli wrote:
> Current APIs assume ring elements to be pointers. However, in many
> use cases, the size can be different. Add new APIs to support
> configurable ring element sizes.
> 
> Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
> Reviewed-by: Dharmik Thakkar <dharmik.thakkar@arm.com>
> Reviewed-by: Gavin Hu <gavin.hu@arm.com>
> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
> ---
>  lib/librte_ring/Makefile             |    3 +-
>  lib/librte_ring/meson.build          |    4 +
>  lib/librte_ring/rte_ring.c           |   41 +-
>  lib/librte_ring/rte_ring.h           |    1 +
>  lib/librte_ring/rte_ring_elem.h      | 1003 ++++++++++++++++++++++++++
>  lib/librte_ring/rte_ring_version.map |    2 +
>  6 files changed, 1045 insertions(+), 9 deletions(-)
>  create mode 100644 lib/librte_ring/rte_ring_elem.h
> 

[...]

> +static __rte_always_inline void
> +enqueue_elems_32(struct rte_ring *r, const uint32_t size, uint32_t idx,
> +		const void *obj_table, uint32_t n)
> +{
> +	unsigned int i;
> +	uint32_t *ring = (uint32_t *)&r[1];
> +	const uint32_t *obj = (const uint32_t *)obj_table;
> +	if (likely(idx + n < size)) {
> +		for (i = 0; i < (n & ~0x7); i += 8, idx += 8) {
> +			ring[idx] = obj[i];
> +			ring[idx + 1] = obj[i + 1];
> +			ring[idx + 2] = obj[i + 2];
> +			ring[idx + 3] = obj[i + 3];
> +			ring[idx + 4] = obj[i + 4];
> +			ring[idx + 5] = obj[i + 5];
> +			ring[idx + 6] = obj[i + 6];
> +			ring[idx + 7] = obj[i + 7];
> +		}
> +		switch (n & 0x7) {
> +		case 7:
> +			ring[idx++] = obj[i++]; /* fallthrough */
> +		case 6:
> +			ring[idx++] = obj[i++]; /* fallthrough */
> +		case 5:
> +			ring[idx++] = obj[i++]; /* fallthrough */
> +		case 4:
> +			ring[idx++] = obj[i++]; /* fallthrough */
> +		case 3:
> +			ring[idx++] = obj[i++]; /* fallthrough */
> +		case 2:
> +			ring[idx++] = obj[i++]; /* fallthrough */
> +		case 1:
> +			ring[idx++] = obj[i++]; /* fallthrough */
> +		}
> +	} else {
> +		for (i = 0; idx < size; i++, idx++)
> +			ring[idx] = obj[i];
> +		/* Start at the beginning */
> +		for (idx = 0; i < n; i++, idx++)
> +			ring[idx] = obj[i];
> +	}
> +}
> +
> +static __rte_always_inline void
> +enqueue_elems_64(struct rte_ring *r, uint32_t prod_head,
> +		const void *obj_table, uint32_t n)
> +{
> +	unsigned int i;
> +	const uint32_t size = r->size;
> +	uint32_t idx = prod_head & r->mask;
> +	uint64_t *ring = (uint64_t *)&r[1];
> +	const uint64_t *obj = (const uint64_t *)obj_table;
> +	if (likely(idx + n < size)) {
> +		for (i = 0; i < (n & ~0x3); i += 4, idx += 4) {
> +			ring[idx] = obj[i];
> +			ring[idx + 1] = obj[i + 1];
> +			ring[idx + 2] = obj[i + 2];
> +			ring[idx + 3] = obj[i + 3];
> +		}
> +		switch (n & 0x3) {
> +		case 3:
> +			ring[idx++] = obj[i++]; /* fallthrough */
> +		case 2:
> +			ring[idx++] = obj[i++]; /* fallthrough */
> +		case 1:
> +			ring[idx++] = obj[i++];
> +		}
> +	} else {
> +		for (i = 0; idx < size; i++, idx++)
> +			ring[idx] = obj[i];
> +		/* Start at the beginning */
> +		for (idx = 0; i < n; i++, idx++)
> +			ring[idx] = obj[i];
> +	}
> +}
> +
> +static __rte_always_inline void
> +enqueue_elems_128(struct rte_ring *r, uint32_t prod_head,
> +		const void *obj_table, uint32_t n)
> +{
> +	unsigned int i;
> +	const uint32_t size = r->size;
> +	uint32_t idx = prod_head & r->mask;
> +	rte_int128_t *ring = (rte_int128_t *)&r[1];
> +	const rte_int128_t *obj = (const rte_int128_t *)obj_table;
> +	if (likely(idx + n < size)) {
> +		for (i = 0; i < (n & ~0x1); i += 2, idx += 2)
> +			memcpy((void *)(ring + idx),
> +				(const void *)(obj + i), 32);
> +		switch (n & 0x1) {
> +		case 1:
> +			memcpy((void *)(ring + idx),
> +				(const void *)(obj + i), 16);
> +		}
> +	} else {
> +		for (i = 0; idx < size; i++, idx++)
> +			memcpy((void *)(ring + idx),
> +				(const void *)(obj + i), 16);
> +		/* Start at the beginning */
> +		for (idx = 0; i < n; i++, idx++)
> +			memcpy((void *)(ring + idx),
> +				(const void *)(obj + i), 16);
> +	}
> +}
> +
> +/* the actual enqueue of elements on the ring.
> + * Placed here since identical code needed in both
> + * single and multi producer enqueue functions.
> + */
> +static __rte_always_inline void
> +enqueue_elems(struct rte_ring *r, uint32_t prod_head, const void *obj_table,
> +		uint32_t esize, uint32_t num)
> +{
> +	/* 8B and 16B copies implemented individually to retain
> +	 * the current performance.
> +	 */
> +	if (esize == 8)
> +		enqueue_elems_64(r, prod_head, obj_table, num);
> +	else if (esize == 16)
> +		enqueue_elems_128(r, prod_head, obj_table, num);
> +	else {
> +		uint32_t idx, scale, nr_idx, nr_num, nr_size;
> +
> +		/* Normalize to uint32_t */
> +		scale = esize / sizeof(uint32_t);
> +		nr_num = num * scale;
> +		idx = prod_head & r->mask;
> +		nr_idx = idx * scale;
> +		nr_size = r->size * scale;
> +		enqueue_elems_32(r, nr_size, nr_idx, obj_table, nr_num);
> +	}
> +}
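
The "Normalize to uint32_t" step in enqueue_elems() above can be modelled on
its own. The sketch below is a toy, not the patch code: the toy_* names are
hypothetical, it assumes esize is a multiple of 4 and that the ring size is a
power of two (so that mask == size - 1), and it uses a simple two-loop copy
instead of the unrolled fast path.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Toy model of the generic path in enqueue_elems(): an element of
 * esize bytes (a multiple of 4) is treated as esize / 4 consecutive
 * uint32_t words. All names here are illustrative, not DPDK APIs.
 */
static void
toy_enqueue_words(uint32_t *ring, uint32_t nr_size, uint32_t nr_idx,
		const uint32_t *obj, uint32_t nr_num)
{
	uint32_t i;

	/* Copy up to the end of the storage, then wrap to the beginning. */
	for (i = 0; i < nr_num && nr_idx < nr_size; i++, nr_idx++)
		ring[nr_idx] = obj[i];
	for (nr_idx = 0; i < nr_num; i++, nr_idx++)
		ring[nr_idx] = obj[i];
}

static void
toy_enqueue_elems(uint32_t *ring, uint32_t size, uint32_t mask,
		uint32_t prod_head, const void *obj_table,
		uint32_t esize, uint32_t num)
{
	uint32_t scale, idx;

	assert(esize % sizeof(uint32_t) == 0);

	scale = esize / sizeof(uint32_t);   /* words per element */
	idx = prod_head & mask;             /* slot index, in elements */
	toy_enqueue_words(ring, size * scale, idx * scale,
			(const uint32_t *)obj_table, num * scale);
}
```

For a 4-slot ring of 12-byte elements (scale = 3), prod_head = 7 maps to slot
3, i.e. words 9..11; a second element in the same burst wraps to words 0..2.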

Following Konstantin's comment on v7, enqueue_elems_128() was modified to
ensure it won't crash if the object is unaligned. Are we sure that this
same problem cannot also occur with 64b copies on all supported
architectures? (I mean a 64b access that is only aligned on 32b.)

Out of curiosity, would it make a big perf difference to only use
enqueue_elems_32()?
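
For reference, one way to make a 64b copy independent of pointer alignment is
to go through memcpy() with a fixed size, as enqueue_elems_128() does for 16B
elements. The helper below is a hypothetical sketch, not part of the patch;
whether the plain uint64_t stores in enqueue_elems_64() can actually fault
depends on the architecture and compiler, and this sketch does not settle
that or say anything about performance.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper (not part of the patch): copy n 64-bit elements
 * from obj_table into ring storage starting at element index idx,
 * without assuming that either pointer is 8-byte aligned. A fixed-size
 * memcpy() lets the compiler pick loads/stores valid for the target.
 */
static inline void
copy_elems_64_unaligned(void *ring, uint32_t idx, const void *obj_table,
		uint32_t n)
{
	uint8_t *dst = (uint8_t *)ring + (size_t)idx * sizeof(uint64_t);
	const uint8_t *src = (const uint8_t *)obj_table;
	uint32_t i;

	assert(ring != NULL && obj_table != NULL);

	for (i = 0; i < n; i++)
		memcpy(dst + (size_t)i * sizeof(uint64_t),
		       src + (size_t)i * sizeof(uint64_t),
		       sizeof(uint64_t));
}
```

The helper does no wrap-around handling; it only shows the copy primitive. A
caller would still split the burst at the end of the ring as the patch does.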

^ permalink raw reply	[flat|nested] 173+ messages in thread

* Re: [dpdk-dev] [PATCH v9 0/6] lib/ring: APIs to support custom element size
  2020-01-17 13:34         ` Jerin Jacob
@ 2020-01-17 16:37           ` Mattias Rönnblom
  0 siblings, 0 replies; 173+ messages in thread
From: Mattias Rönnblom @ 2020-01-17 16:37 UTC (permalink / raw)
  To: Jerin Jacob, David Marchand
  Cc: Honnappa Nagarahalli, olivier.matz, sthemmin, jerinj,
	bruce.richardson, pbhagavatula, konstantin.ananyev, yipeng1.wang,
	dev, Dharmik Thakkar, Ruifeng Wang, Gavin Hu, nd, Aaron Conole,
	Van Haaren, Harry

On 2020-01-17 14:34, Jerin Jacob wrote:
> On Fri, Jan 17, 2020 at 5:44 PM David Marchand
> <david.marchand@redhat.com> wrote:
>> On Thu, Jan 16, 2020 at 5:36 PM Honnappa Nagarahalli
>> <Honnappa.Nagarahalli@arm.com> wrote:
>>> I see that the none of the CIs (except Travis) have run on this patch. Intel CI has reported a compilation error and I fixed it in this version. Does anyone know if/when the CI will run on the patches?
>> - Pushed the series
>> https://patchwork.dpdk.org/project/dpdk/list/?series=8147 to a branch
>> of mine for checks.
>> Travis reports:
>> "
>> [2155/2156] Compiling C object 'app/te...st@@dpdk-test@exe/test_ring_perf.c.o'.
>>
>> No output has been received in the last 10m0s, this potentially
>> indicates a stalled build or something wrong with the build itself.
>>
>> Check the details on how to adjust your build configuration on:
>> https://docs.travis-ci.com/user/common-build-problems/#build-times-out-because-no-output-was-received
>>
>> The build has been terminated
>> "
>>
>> I see you discussed this already with Aaron, did I miss something?
>>
>>
>> - Besides, I have no ack/review from the hash and eventdev maintainers.
> + Bruce, Harry, Mattias
>
> Even though event ring added in the eventdev common code, it's been
> used only by SW eventdev drivers.
> So adding evendev ring author and SW driver maintainers for review.

I endorse this change, although I haven't had time to review the code.

DSW throughput increases ~5% on x86_64 after applying this patchset, so 
the goal of maintaining legacy performance seems to have been met (and 
exceeded).

I will look into using these new custom-sized rings for the DSW control
rings (which use regular DPDK void-pointer rings today).



^ permalink raw reply	[flat|nested] 173+ messages in thread

* Re: [dpdk-dev] [PATCH v9 2/6] lib/ring: apis to support configurable element size
  2020-01-17 16:34       ` Olivier Matz
@ 2020-01-17 16:45         ` Honnappa Nagarahalli
  2020-01-17 18:10           ` David Christensen
  2020-01-18 12:32           ` Ananyev, Konstantin
  0 siblings, 2 replies; 173+ messages in thread
From: Honnappa Nagarahalli @ 2020-01-17 16:45 UTC (permalink / raw)
  To: Olivier Matz
  Cc: sthemmin, jerinj, bruce.richardson, david.marchand, pbhagavatula,
	konstantin.ananyev, yipeng1.wang, dev, Dharmik Thakkar,
	Ruifeng Wang, Gavin Hu, nd, Honnappa Nagarahalli,
	David Christensen, nd

<snip>

> 
> Hi Honnappa,
Thanks Olivier for your review, appreciate your feedback.

> 
> On Wed, Jan 15, 2020 at 11:25:07PM -0600, Honnappa Nagarahalli wrote:
> > Current APIs assume ring elements to be pointers. However, in many use
> > cases, the size can be different. Add new APIs to support configurable
> > ring element sizes.
> >
> > Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
> > Reviewed-by: Dharmik Thakkar <dharmik.thakkar@arm.com>
> > Reviewed-by: Gavin Hu <gavin.hu@arm.com>
> > Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
> > ---
> >  lib/librte_ring/Makefile             |    3 +-
> >  lib/librte_ring/meson.build          |    4 +
> >  lib/librte_ring/rte_ring.c           |   41 +-
> >  lib/librte_ring/rte_ring.h           |    1 +
> >  lib/librte_ring/rte_ring_elem.h      | 1003 ++++++++++++++++++++++++++
> >  lib/librte_ring/rte_ring_version.map |    2 +
> >  6 files changed, 1045 insertions(+), 9 deletions(-)  create mode
> > 100644 lib/librte_ring/rte_ring_elem.h
> >
> 
> [...]
> 
> > +static __rte_always_inline void
> > +enqueue_elems_32(struct rte_ring *r, const uint32_t size, uint32_t idx,
> > +		const void *obj_table, uint32_t n)
> > +{
> > +	unsigned int i;
> > +	uint32_t *ring = (uint32_t *)&r[1];
> > +	const uint32_t *obj = (const uint32_t *)obj_table;
> > +	if (likely(idx + n < size)) {
> > +		for (i = 0; i < (n & ~0x7); i += 8, idx += 8) {
> > +			ring[idx] = obj[i];
> > +			ring[idx + 1] = obj[i + 1];
> > +			ring[idx + 2] = obj[i + 2];
> > +			ring[idx + 3] = obj[i + 3];
> > +			ring[idx + 4] = obj[i + 4];
> > +			ring[idx + 5] = obj[i + 5];
> > +			ring[idx + 6] = obj[i + 6];
> > +			ring[idx + 7] = obj[i + 7];
> > +		}
> > +		switch (n & 0x7) {
> > +		case 7:
> > +			ring[idx++] = obj[i++]; /* fallthrough */
> > +		case 6:
> > +			ring[idx++] = obj[i++]; /* fallthrough */
> > +		case 5:
> > +			ring[idx++] = obj[i++]; /* fallthrough */
> > +		case 4:
> > +			ring[idx++] = obj[i++]; /* fallthrough */
> > +		case 3:
> > +			ring[idx++] = obj[i++]; /* fallthrough */
> > +		case 2:
> > +			ring[idx++] = obj[i++]; /* fallthrough */
> > +		case 1:
> > +			ring[idx++] = obj[i++]; /* fallthrough */
> > +		}
> > +	} else {
> > +		for (i = 0; idx < size; i++, idx++)
> > +			ring[idx] = obj[i];
> > +		/* Start at the beginning */
> > +		for (idx = 0; i < n; i++, idx++)
> > +			ring[idx] = obj[i];
> > +	}
> > +}
> > +
> > +static __rte_always_inline void
> > +enqueue_elems_64(struct rte_ring *r, uint32_t prod_head,
> > +		const void *obj_table, uint32_t n)
> > +{
> > +	unsigned int i;
> > +	const uint32_t size = r->size;
> > +	uint32_t idx = prod_head & r->mask;
> > +	uint64_t *ring = (uint64_t *)&r[1];
> > +	const uint64_t *obj = (const uint64_t *)obj_table;
> > +	if (likely(idx + n < size)) {
> > +		for (i = 0; i < (n & ~0x3); i += 4, idx += 4) {
> > +			ring[idx] = obj[i];
> > +			ring[idx + 1] = obj[i + 1];
> > +			ring[idx + 2] = obj[i + 2];
> > +			ring[idx + 3] = obj[i + 3];
> > +		}
> > +		switch (n & 0x3) {
> > +		case 3:
> > +			ring[idx++] = obj[i++]; /* fallthrough */
> > +		case 2:
> > +			ring[idx++] = obj[i++]; /* fallthrough */
> > +		case 1:
> > +			ring[idx++] = obj[i++];
> > +		}
> > +	} else {
> > +		for (i = 0; idx < size; i++, idx++)
> > +			ring[idx] = obj[i];
> > +		/* Start at the beginning */
> > +		for (idx = 0; i < n; i++, idx++)
> > +			ring[idx] = obj[i];
> > +	}
> > +}
> > +
> > +static __rte_always_inline void
> > +enqueue_elems_128(struct rte_ring *r, uint32_t prod_head,
> > +		const void *obj_table, uint32_t n)
> > +{
> > +	unsigned int i;
> > +	const uint32_t size = r->size;
> > +	uint32_t idx = prod_head & r->mask;
> > +	rte_int128_t *ring = (rte_int128_t *)&r[1];
> > +	const rte_int128_t *obj = (const rte_int128_t *)obj_table;
> > +	if (likely(idx + n < size)) {
> > +		for (i = 0; i < (n & ~0x1); i += 2, idx += 2)
> > +			memcpy((void *)(ring + idx),
> > +				(const void *)(obj + i), 32);
> > +		switch (n & 0x1) {
> > +		case 1:
> > +			memcpy((void *)(ring + idx),
> > +				(const void *)(obj + i), 16);
> > +		}
> > +	} else {
> > +		for (i = 0; idx < size; i++, idx++)
> > +			memcpy((void *)(ring + idx),
> > +				(const void *)(obj + i), 16);
> > +		/* Start at the beginning */
> > +		for (idx = 0; i < n; i++, idx++)
> > +			memcpy((void *)(ring + idx),
> > +				(const void *)(obj + i), 16);
> > +	}
> > +}
> > +
> > +/* the actual enqueue of elements on the ring.
> > + * Placed here since identical code needed in both
> > + * single and multi producer enqueue functions.
> > + */
> > +static __rte_always_inline void
> > +enqueue_elems(struct rte_ring *r, uint32_t prod_head, const void
> *obj_table,
> > +		uint32_t esize, uint32_t num)
> > +{
> > +	/* 8B and 16B copies implemented individually to retain
> > +	 * the current performance.
> > +	 */
> > +	if (esize == 8)
> > +		enqueue_elems_64(r, prod_head, obj_table, num);
> > +	else if (esize == 16)
> > +		enqueue_elems_128(r, prod_head, obj_table, num);
> > +	else {
> > +		uint32_t idx, scale, nr_idx, nr_num, nr_size;
> > +
> > +		/* Normalize to uint32_t */
> > +		scale = esize / sizeof(uint32_t);
> > +		nr_num = num * scale;
> > +		idx = prod_head & r->mask;
> > +		nr_idx = idx * scale;
> > +		nr_size = r->size * scale;
> > +		enqueue_elems_32(r, nr_size, nr_idx, obj_table, nr_num);
> > +	}
> > +}
> 
> Following Konstantin's comment on v7, enqueue_elems_128() was modified to
> ensure it won't crash if the object is unaligned. Are we sure that this same
> problem cannot also occur with 64b copies on all supported architectures? (I
> mean 64b access that is only aligned on 32b)
Konstantin mentioned that the 64b load/store instructions on x86 can handle unaligned access. On aarch64, the load/store (non-atomic, which will be used in this case) can handle unaligned access.

+ David Christensen to comment for PPC
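
For illustration only, one portable way to stay safe on any architecture is to copy each 64b element with memcpy(), which makes no alignment assumption about the source. This is a minimal sketch, not the code in the patch; the helper name copy_elems_64_unaligned is hypothetical and it ignores ring wrap-around.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper, not part of rte_ring: copy n 64-bit elements
 * from a source table that may only be 4-byte aligned. memcpy() makes
 * no alignment assumption, so the compiler emits a safe access
 * sequence for the target architecture.
 */
static inline void
copy_elems_64_unaligned(uint64_t *ring, const void *obj_table, uint32_t n)
{
	const unsigned char *src = obj_table;
	uint32_t i;

	for (i = 0; i < n; i++)
		memcpy(&ring[i], src + (size_t)i * sizeof(uint64_t),
			sizeof(uint64_t));
}
```

With a fixed 8-byte length, compilers typically turn each memcpy() into a single load/store pair where the hardware allows it, so the cost compared with a direct uint64_t assignment would need to be measured rather than assumed.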

> 
> Out of curiosity, would it make a big perf difference to only use
> enqueue_elems_32()?
Yes, it had a significant impact on 128b elements. I did not try it on 64b elements.
I will run the perf test with 32b copy for 64b element size and get back.
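
As a rough sketch of what a 32b-only copy does for any element size, the generic path scales the element index and count into 32-bit words and wraps at the scaled ring size. The function below is a simplified, hypothetical rewrite for illustration (no loop unrolling, explicit ring pointer), not the code from the patch.

```c
#include <stdint.h>

/* Hypothetical, simplified form of the generic path: treat a ring of
 * `size` elements of `esize` bytes as size * (esize / 4) 32-bit words
 * and copy word by word, wrapping at the end of the ring.
 */
static inline void
enqueue_as_words(uint32_t *ring, uint32_t size, uint32_t idx,
		const uint32_t *obj, uint32_t esize, uint32_t num)
{
	const uint32_t scale = esize / sizeof(uint32_t);
	const uint32_t nr_size = size * scale;	/* ring size in words */
	const uint32_t nr_num = num * scale;	/* words to copy */
	uint32_t nr_idx = idx * scale;		/* start word */
	uint32_t i;

	for (i = 0; i < nr_num; i++) {
		ring[nr_idx] = obj[i];
		if (++nr_idx == nr_size)
			nr_idx = 0;	/* wrap to the start of the ring */
	}
}
```

For example, with 8-byte elements in a 4-slot ring, enqueueing 2 elements at slot 3 writes words 6 and 7, then wraps and writes words 0 and 1.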

* Re: [dpdk-dev] [PATCH v9 3/6] test/ring: add functional tests for rte_ring_xxx_elem APIs
  2020-01-16  5:25     ` [dpdk-dev] [PATCH v9 3/6] test/ring: add functional tests for rte_ring_xxx_elem APIs Honnappa Nagarahalli
@ 2020-01-17 17:03       ` Olivier Matz
  2020-01-18 16:27         ` Honnappa Nagarahalli
  0 siblings, 1 reply; 173+ messages in thread
From: Olivier Matz @ 2020-01-17 17:03 UTC (permalink / raw)
  To: Honnappa Nagarahalli
  Cc: sthemmin, jerinj, bruce.richardson, david.marchand, pbhagavatula,
	konstantin.ananyev, yipeng1.wang, dev, dharmik.thakkar,
	ruifeng.wang, gavin.hu, nd

On Wed, Jan 15, 2020 at 11:25:08PM -0600, Honnappa Nagarahalli wrote:
> Add basic infrastructure to test rte_ring_xxx_elem APIs.
> Adjust the existing test cases to test for various ring
> element sizes.
> 
> Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
> Reviewed-by: Gavin Hu <gavin.hu@arm.com>
> ---
>  app/test/test_ring.c | 1342 +++++++++++++++++++++---------------------
>  app/test/test_ring.h |  187 ++++++
>  2 files changed, 850 insertions(+), 679 deletions(-)
>  create mode 100644 app/test/test_ring.h
> 
> diff --git a/app/test/test_ring.c b/app/test/test_ring.c
> index aaf1e70ad..c08500eca 100644
> --- a/app/test/test_ring.c
> +++ b/app/test/test_ring.c
> @@ -23,11 +23,13 @@
>  #include <rte_branch_prediction.h>
>  #include <rte_malloc.h>
>  #include <rte_ring.h>
> +#include <rte_ring_elem.h>
>  #include <rte_random.h>
>  #include <rte_errno.h>
>  #include <rte_hexdump.h>
>  
>  #include "test.h"
> +#include "test_ring.h"
>  
>  /*
>   * Ring

As you are changing a lot of things, maybe it's an opportunity to update
or remove the comment at the beginning of the file.


> @@ -55,8 +57,6 @@
>  #define RING_SIZE 4096
>  #define MAX_BULK 32
>  
> -static rte_atomic32_t synchro;
> -
>  #define	TEST_RING_VERIFY(exp)						\
>  	if (!(exp)) {							\
>  		printf("error at %s:%d\tcondition " #exp " failed\n",	\
> @@ -67,808 +67,792 @@ static rte_atomic32_t synchro;
>  
>  #define	TEST_RING_FULL_EMTPY_ITER	8
>  
> -/*
> - * helper routine for test_ring_basic
> - */
> -static int
> -test_ring_basic_full_empty(struct rte_ring *r, void * const src[], void *dst[])
> +static int esize[] = {-1, 4, 8, 16, 20};

it could be const

[...]

> +/*
> + * Burst and bulk operations with sp/sc, mp/mc and default (during creation)
> + * Random number of elements are enqueued and dequeued.
> + */
> +static int
> +test_ring_burst_bulk_tests1(unsigned int api_type)
> +{
> +	struct rte_ring *r;
> +	void **src = NULL, **cur_src = NULL, **dst = NULL, **cur_dst = NULL;
> +	int ret;
> +	unsigned int i, j;
> +	int rand;
> +	const unsigned int rsz = RING_SIZE - 1;
>  
> -	/* check data */
> -	if (memcmp(src, dst, cur_dst - dst)) {
> -		rte_hexdump(stdout, "src", src, cur_src - src);
> -		rte_hexdump(stdout, "dst", dst, cur_dst - dst);
> -		printf("data after dequeue is not the same\n");
> -		goto fail;
> -	}
> +	for (i = 0; i < RTE_DIM(esize); i++) {
> +		test_ring_print_test_string("Test standard ring", api_type,
> +						esize[i]);
>  
> -	cur_src = src;
> -	cur_dst = dst;
> +		/* Create the ring */
> +		r = test_ring_create("test_ring_burst_bulk_tests", esize[i],
> +					RING_SIZE, SOCKET_ID_ANY, 0);
>  
> -	ret = rte_ring_mp_enqueue(r, cur_src);
> -	if (ret != 0)
> -		goto fail;
> +		/* alloc dummy object pointers */
> +		src = test_ring_calloc(RING_SIZE * 2, esize[i]);
> +		if (src == NULL)
> +			goto fail;
> +		test_ring_mem_init(src, RING_SIZE * 2, esize[i]);
> +		cur_src = src;
>  
> -	ret = rte_ring_mc_dequeue(r, cur_dst);
> -	if (ret != 0)
> -		goto fail;
> +		/* alloc some room for copied objects */
> +		dst = test_ring_calloc(RING_SIZE * 2, esize[i]);
> +		if (dst == NULL)
> +			goto fail;
> +		cur_dst = dst;
> +
> +		printf("Random full/empty test\n");
> +
> +		for (j = 0; j != TEST_RING_FULL_EMTPY_ITER; j++) {
> +			/* random shift in the ring */
> +			rand = RTE_MAX(rte_rand() % RING_SIZE, 1UL);
> +			printf("%s: iteration %u, random shift: %u;\n",
> +			    __func__, i, rand);
> +			ret = test_ring_enqueue(r, cur_src, esize[i], rand,
> +							api_type);
> +			TEST_RING_VERIFY(ret != 0);
> +
> +			ret = test_ring_dequeue(r, cur_dst, esize[i], rand,
> +							api_type);
> +			TEST_RING_VERIFY(ret == rand);
> +
> +			/* fill the ring */
> +			ret = test_ring_enqueue(r, cur_src, esize[i], rsz,
> +							api_type);
> +			TEST_RING_VERIFY(ret != 0);
> +
> +			TEST_RING_VERIFY(rte_ring_free_count(r) == 0);
> +			TEST_RING_VERIFY(rsz == rte_ring_count(r));
> +			TEST_RING_VERIFY(rte_ring_full(r));
> +			TEST_RING_VERIFY(rte_ring_empty(r) == 0);
> +
> +			/* empty the ring */
> +			ret = test_ring_dequeue(r, cur_dst, esize[i], rsz,
> +							api_type);
> +			TEST_RING_VERIFY(ret == (int)rsz);
> +			TEST_RING_VERIFY(rsz == rte_ring_free_count(r));
> +			TEST_RING_VERIFY(rte_ring_count(r) == 0);
> +			TEST_RING_VERIFY(rte_ring_full(r) == 0);
> +			TEST_RING_VERIFY(rte_ring_empty(r));
> +
> +			/* check data */
> +			TEST_RING_VERIFY(memcmp(src, dst, rsz) == 0);
> +		}
> +
> +		/* Free memory before test completed */
> +		rte_ring_free(r);
> +		rte_free(src);
> +		rte_free(dst);

I think they should be reset to NULL to avoid a double free
if the next iteration fails.

There are several places like this; I think it can be done everywhere,
even where it is not strictly needed.
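
A minimal sketch of the idea, using plain malloc()/free() so it stands alone. The helper name free_and_reset is hypothetical; the test itself would use rte_free(), which, like free(), does nothing when passed NULL.

```c
#include <stdlib.h>

/* Hypothetical helper: free *p and clear it, so that a later cleanup
 * path which frees the same variable again becomes a harmless
 * free(NULL) instead of a double free.
 */
static inline void
free_and_reset(void **p)
{
	free(*p);
	*p = NULL;
}
```

Calling it twice on the same variable is safe, which is the property the fail path needs when a later loop iteration bails out before reallocating.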

[...]

> diff --git a/app/test/test_ring.h b/app/test/test_ring.h
> new file mode 100644
> index 000000000..26716e4f8
> --- /dev/null
> +++ b/app/test/test_ring.h
> @@ -0,0 +1,187 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + * Copyright(c) 2019 Arm Limited
> + */
> +
> +#include <rte_malloc.h>
> +#include <rte_ring.h>
> +#include <rte_ring_elem.h>
> +
> +/* API type to call
> + * rte_ring_<sp/mp or sc/mc>_enqueue_<bulk/burst>
> + * TEST_RING_THREAD_DEF - Uses configured SPSC/MPMC calls
> + * TEST_RING_THREAD_SPSC - Calls SP or SC API
> + * TEST_RING_THREAD_MPMC - Calls MP or MC API
> + */
> +#define TEST_RING_THREAD_DEF 1
> +#define TEST_RING_THREAD_SPSC 2
> +#define TEST_RING_THREAD_MPMC 4
> +
> +/* API type to call
> + * SL - Calls single element APIs
> + * BL - Calls bulk APIs
> + * BR - Calls burst APIs
> + */

The comment was not updated to match the macro names.



* Re: [dpdk-dev] [PATCH v9 4/6] test/ring: modify perf test cases to use rte_ring_xxx_elem APIs
  2020-01-16  5:25     ` [dpdk-dev] [PATCH v9 4/6] test/ring: modify perf test cases to use " Honnappa Nagarahalli
@ 2020-01-17 17:12       ` Olivier Matz
  2020-01-18 16:28         ` Honnappa Nagarahalli
  0 siblings, 1 reply; 173+ messages in thread
From: Olivier Matz @ 2020-01-17 17:12 UTC (permalink / raw)
  To: Honnappa Nagarahalli
  Cc: sthemmin, jerinj, bruce.richardson, david.marchand, pbhagavatula,
	konstantin.ananyev, yipeng1.wang, dev, dharmik.thakkar,
	ruifeng.wang, gavin.hu, nd

On Wed, Jan 15, 2020 at 11:25:09PM -0600, Honnappa Nagarahalli wrote:
> Adjust the performance test cases to test rte_ring_xxx_elem APIs.
> 
> Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
> Reviewed-by: Gavin Hu <gavin.hu@arm.com>
> ---
>  app/test/test_ring_perf.c | 454 +++++++++++++++++++++++---------------
>  1 file changed, 273 insertions(+), 181 deletions(-)
> 
> diff --git a/app/test/test_ring_perf.c b/app/test/test_ring_perf.c
> index 6c2aca483..8d1217951 100644
> --- a/app/test/test_ring_perf.c
> +++ b/app/test/test_ring_perf.c

[...]

> -static int
> -test_ring_perf(void)
> +/* Run all tests for a given element size */
> +static __rte_always_inline int
> +test_ring_perf_esize(const int esize)
>  {
>  	struct lcore_pair cores;
>  	struct rte_ring *r = NULL;
>  
> -	r = rte_ring_create(RING_NAME, RING_SIZE, rte_socket_id(), 0);
> +	/*
> +	 * Performance test for legacy/_elem APIs
> +	 * SP-SC/MP-MC, single
> +	 */
> +	r = test_ring_create(RING_NAME, esize, RING_SIZE, rte_socket_id(), 0);
>  	if (r == NULL)
>  		return -1;
>  
> -	printf("### Testing single element and burst enq/deq ###\n");
> -	test_single_enqueue_dequeue(r);
> -	test_burst_enqueue_dequeue(r);
> +	printf("\n### Testing single element enq/deq ###\n");
> +	if (test_single_enqueue_dequeue(r, esize,
> +			TEST_RING_THREAD_SPSC | TEST_RING_ELEM_SINGLE) < 0)
> +		return -1;

the ring is not freed on error (same below)

> +	if (test_single_enqueue_dequeue(r, esize,
> +			TEST_RING_THREAD_MPMC | TEST_RING_ELEM_SINGLE) < 0)
> +		return -1;
> +
> +	printf("\n### Testing burst enq/deq ###\n");
> +	if (test_burst_bulk_enqueue_dequeue(r, esize,
> +			TEST_RING_THREAD_SPSC | TEST_RING_ELEM_BURST) < 0)
> +		return -1;
> +	if (test_burst_bulk_enqueue_dequeue(r, esize,
> +			TEST_RING_THREAD_MPMC | TEST_RING_ELEM_BURST) < 0)
> +		return -1;
>  
> -	printf("\n### Testing empty dequeue ###\n");
> -	test_empty_dequeue(r);
> +	printf("\n### Testing bulk enq/deq ###\n");
> +	if (test_burst_bulk_enqueue_dequeue(r, esize,
> +			TEST_RING_THREAD_SPSC | TEST_RING_ELEM_BULK) < 0)
> +		return -1;
> +	if (test_burst_bulk_enqueue_dequeue(r, esize,
> +			TEST_RING_THREAD_MPMC | TEST_RING_ELEM_BULK) < 0)
> +		return -1;
>  
> -	printf("\n### Testing using a single lcore ###\n");
> -	test_bulk_enqueue_dequeue(r);
> +	printf("\n### Testing empty bulk deq ###\n");
> +	test_empty_dequeue(r, esize,
> +			TEST_RING_THREAD_SPSC | TEST_RING_ELEM_BULK);
> +	test_empty_dequeue(r, esize,
> +			TEST_RING_THREAD_MPMC | TEST_RING_ELEM_BULK);
>  
>  	if (get_two_hyperthreads(&cores) == 0) {
>  		printf("\n### Testing using two hyperthreads ###\n");
> -		run_on_core_pair(&cores, r, enqueue_bulk, dequeue_bulk);
> +		if (run_on_core_pair(&cores, r, esize) < 0)
> +			return -1;
>  	}
>  	if (get_two_cores(&cores) == 0) {
>  		printf("\n### Testing using two physical cores ###\n");
> -		run_on_core_pair(&cores, r, enqueue_bulk, dequeue_bulk);
> +		if (run_on_core_pair(&cores, r, esize) < 0)
> +			return -1;
>  	}
>  	if (get_two_sockets(&cores) == 0) {
>  		printf("\n### Testing using two NUMA nodes ###\n");
> -		run_on_core_pair(&cores, r, enqueue_bulk, dequeue_bulk);
> +		if (run_on_core_pair(&cores, r, esize) < 0)
> +			return -1;
>  	}
>  
>  	printf("\n### Testing using all slave nodes ###\n");
> -	run_on_all_cores(r);
> +	if (run_on_all_cores(r, esize) < 0)
> +		return -1;
>  
>  	rte_ring_free(r);
> +
> +	return 0;
> +}
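
One common way to handle the early returns flagged above is a single exit label that always frees the ring. The sketch below uses stand-in types and functions (fake_ring, run_step, live_rings are all hypothetical) so that it compiles on its own; in the test the allocation and release would be test_ring_create() and rte_ring_free().

```c
#include <stdlib.h>

/* Stand-ins for the ring and a test step; both are hypothetical. */
struct fake_ring { int unused; };

static int live_rings;	/* counts rings allocated but not yet freed */

static int
run_step(struct fake_ring *r, int should_fail)
{
	(void)r;
	return should_fail ? -1 : 0;
}

/* Single exit path: every failure jumps to the label that frees r. */
static int
run_all_steps(int fail_at)
{
	struct fake_ring *r = malloc(sizeof(*r));
	int ret = -1;
	int step;

	if (r == NULL)
		return -1;
	live_rings++;

	for (step = 0; step < 3; step++) {
		if (run_step(r, step == fail_at) < 0)
			goto out;
	}
	ret = 0;
out:
	free(r);
	live_rings--;
	return ret;
}
```

Whether a step fails or all steps pass, the ring is released exactly once before the function returns.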


* Re: [dpdk-dev] [PATCH v9 0/6] lib/ring: APIs to support custom element size
  2020-01-16  5:25   ` [dpdk-dev] [PATCH v9 0/6] lib/ring: APIs to support custom element size Honnappa Nagarahalli
                       ` (6 preceding siblings ...)
  2020-01-16 16:36     ` [dpdk-dev] [PATCH v9 0/6] lib/ring: APIs to support custom element size Honnappa Nagarahalli
@ 2020-01-17 17:15     ` Olivier Matz
  7 siblings, 0 replies; 173+ messages in thread
From: Olivier Matz @ 2020-01-17 17:15 UTC (permalink / raw)
  To: Honnappa Nagarahalli
  Cc: sthemmin, jerinj, bruce.richardson, david.marchand, pbhagavatula,
	konstantin.ananyev, yipeng1.wang, dev, dharmik.thakkar,
	ruifeng.wang, gavin.hu, nd

On Wed, Jan 15, 2020 at 11:25:05PM -0600, Honnappa Nagarahalli wrote:
> The current rte_ring hard-codes the type of the ring element to 'void *',
> hence the size of the element is hard-coded to 32b/64b. Since the ring
> element type is not an input to rte_ring APIs, it results in couple
> of issues:
> 
> 1) If an application requires to store an element which is not 64b, it
>    needs to write its own ring APIs similar to rte_event_ring APIs. This
>    creates additional burden on the programmers, who end up making
>    work-arounds and often waste memory.
> 2) If there are multiple libraries that store elements of the same
>    type, currently they would have to write their own rte_ring APIs. This
>    results in code duplication.
> 
> This patch adds new APIs to support configurable ring element size.
> The APIs support custom element sizes by allowing to define the ring
> element to be a multiple of 32b.
> 
> The aim is to achieve same performance as the existing ring
> implementation.
> 

This is a nice improvement to the ring library, thanks!

It looks good to me overall; I have a few comments in replies to the
individual patches.


Olivier

* Re: [dpdk-dev] [PATCH v9 2/6] lib/ring: apis to support configurable element size
  2020-01-17 16:45         ` Honnappa Nagarahalli
@ 2020-01-17 18:10           ` David Christensen
  2020-01-18 12:32           ` Ananyev, Konstantin
  1 sibling, 0 replies; 173+ messages in thread
From: David Christensen @ 2020-01-17 18:10 UTC (permalink / raw)
  To: Honnappa Nagarahalli, Olivier Matz
  Cc: sthemmin, jerinj, bruce.richardson, david.marchand, pbhagavatula,
	konstantin.ananyev, yipeng1.wang, dev, Dharmik Thakkar,
	Ruifeng Wang, Gavin Hu, nd

>>> +static __rte_always_inline void
>>> +enqueue_elems_128(struct rte_ring *r, uint32_t prod_head,
>>> +		const void *obj_table, uint32_t n)
>>> +{
>>> +	unsigned int i;
>>> +	const uint32_t size = r->size;
>>> +	uint32_t idx = prod_head & r->mask;
>>> +	rte_int128_t *ring = (rte_int128_t *)&r[1];
>>> +	const rte_int128_t *obj = (const rte_int128_t *)obj_table;
>>> +	if (likely(idx + n < size)) {
>>> +		for (i = 0; i < (n & ~0x1); i += 2, idx += 2)
>>> +			memcpy((void *)(ring + idx),
>>> +				(const void *)(obj + i), 32);
>>> +		switch (n & 0x1) {
>>> +		case 1:
>>> +			memcpy((void *)(ring + idx),
>>> +				(const void *)(obj + i), 16);
>>> +		}
>>> +	} else {
>>> +		for (i = 0; idx < size; i++, idx++)
>>> +			memcpy((void *)(ring + idx),
>>> +				(const void *)(obj + i), 16);
>>> +		/* Start at the beginning */
>>> +		for (idx = 0; i < n; i++, idx++)
>>> +			memcpy((void *)(ring + idx),
>>> +				(const void *)(obj + i), 16);
>>> +	}
>>> +}
>>> +
>>> +/* the actual enqueue of elements on the ring.
>>> + * Placed here since identical code needed in both
>>> + * single and multi producer enqueue functions.
>>> + */
>>> +static __rte_always_inline void
>>> +enqueue_elems(struct rte_ring *r, uint32_t prod_head, const void
>> *obj_table,
>>> +		uint32_t esize, uint32_t num)
>>> +{
>>> +	/* 8B and 16B copies implemented individually to retain
>>> +	 * the current performance.
>>> +	 */
>>> +	if (esize == 8)
>>> +		enqueue_elems_64(r, prod_head, obj_table, num);
>>> +	else if (esize == 16)
>>> +		enqueue_elems_128(r, prod_head, obj_table, num);
>>> +	else {
>>> +		uint32_t idx, scale, nr_idx, nr_num, nr_size;
>>> +
>>> +		/* Normalize to uint32_t */
>>> +		scale = esize / sizeof(uint32_t);
>>> +		nr_num = num * scale;
>>> +		idx = prod_head & r->mask;
>>> +		nr_idx = idx * scale;
>>> +		nr_size = r->size * scale;
>>> +		enqueue_elems_32(r, nr_size, nr_idx, obj_table, nr_num);
>>> +	}
>>> +}
>>
>> Following Konstantin's comment on v7, enqueue_elems_128() was modified to
>> ensure it won't crash if the object is unaligned. Are we sure that this same
>> problem cannot also occur with 64b copies on all supported architectures? (I
>> mean 64b access that is only aligned on 32b)
> Konstantin mentioned that the 64b load/store instructions on x86 can handle unaligned access. On aarch64, the load/store (non-atomic, which will be used in this case) can handle unaligned access.
> 
> + David Christensen to comment for PPC

The vectorized version of memcpy for Power can handle unaligned access 
as well.

Dave

* Re: [dpdk-dev] [PATCH v9 5/6] lib/hash: use ring with 32b element size to save memory
  2020-01-16  5:25     ` [dpdk-dev] [PATCH v9 5/6] lib/hash: use ring with 32b element size to save memory Honnappa Nagarahalli
@ 2020-01-17 20:27       ` David Marchand
  2020-01-17 20:54         ` Honnappa Nagarahalli
  0 siblings, 1 reply; 173+ messages in thread
From: David Marchand @ 2020-01-17 20:27 UTC (permalink / raw)
  To: Honnappa Nagarahalli
  Cc: Olivier Matz, Stephen Hemminger, Jerin Jacob Kollanukkaran,
	Bruce Richardson, Pavan Nikhilesh, Ananyev, Konstantin, Wang,
	Yipeng1, dev, Dharmik Thakkar,
	Ruifeng Wang (Arm Technology China),
	Gavin Hu, nd, Thomas Monjalon

On Thu, Jan 16, 2020 at 6:25 AM Honnappa Nagarahalli
<honnappa.nagarahalli@arm.com> wrote:
>
> The freelist and external bucket indices are 32b. Using rings
> that use 32b element sizes will save memory.
>
> Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
> Reviewed-by: Gavin Hu <gavin.hu@arm.com>
> Reviewed-by: Ola Liljedahl <ola.liljedahl@arm.com>
> ---
>  lib/librte_hash/rte_cuckoo_hash.c | 94 ++++++++++++++++---------------
>  lib/librte_hash/rte_cuckoo_hash.h |  2 +-
>  2 files changed, 50 insertions(+), 46 deletions(-)
>

[snip]

> diff --git a/lib/librte_hash/rte_cuckoo_hash.h b/lib/librte_hash/rte_cuckoo_hash.h
> index fb19bb27d..345de6bf9 100644
> --- a/lib/librte_hash/rte_cuckoo_hash.h
> +++ b/lib/librte_hash/rte_cuckoo_hash.h
> @@ -124,7 +124,7 @@ const rte_hash_cmp_eq_t cmp_jump_table[NUM_KEY_CMP_CASES] = {
>
>  struct lcore_cache {
>         unsigned len; /**< Cache len */
> -       void *objs[LCORE_CACHE_SIZE]; /**< Cache objects */
> +       uint32_t objs[LCORE_CACHE_SIZE]; /**< Cache objects */

This triggers a warning in ABI checks:

1 function with some indirect sub-type change:

  [C]'function int32_t rte_hash_add_key(const rte_hash*, void*)' at
rte_cuckoo_hash.c:1118:1 has some indirect sub-type changes:
    parameter 1 of type 'const rte_hash*' has sub-type changes:
      in pointed to type 'const rte_hash':
        in unqualified underlying type 'struct rte_hash' at
rte_cuckoo_hash.h:160:1:
          type size hasn't changed
          1 data member change:
           type of 'lcore_cache* rte_hash::local_free_slots' changed:
             in pointed to type 'struct lcore_cache' at rte_cuckoo_hash.h:125:1:
               type size changed from 4608 to 2560 (in bits)
               1 data member change:
                type of 'void* lcore_cache::objs[64]' changed:
                  array element type 'void*' changed:
                    entity changed from 'void*' to 'typedef uint32_t'
at stdint-uintn.h:26:1
                    type size changed from 64 to 32 (in bits)
                  type name changed from 'void*[64]' to 'uint32_t[64]'
                  array type size changed from 4096 to 2048
                and offset changed from 64 to 32 (in bits) (by -32 bits)
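
The numbers in the report can be reproduced with a small sketch. It assumes 8-byte pointers, a 64-byte cache line, and a structure aligned to the cache line; the alignment is an assumption made here because it is what makes the reported sizes add up, and it is not visible in the quoted hunk. The struct names cache_before and cache_after are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE 64	/* assumed cache line size in bytes */
#define CACHE_OBJS 64	/* array length taken from the report */

/* Layout before the patch: 4B len, 4B padding, 64 pointers. */
struct cache_before {
	_Alignas(CACHE_LINE) unsigned int len;
	void *objs[CACHE_OBJS];
};

/* Layout after the patch: 4B len, 64 32-bit indices. */
struct cache_after {
	_Alignas(CACHE_LINE) unsigned int len;
	uint32_t objs[CACHE_OBJS];
};

/* Convert bytes to bits, the unit the report uses. */
static inline size_t
bits(size_t bytes)
{
	return bytes * 8;
}
```

Under those assumptions the old layout is 8 + 512 = 520 bytes, rounded up to 576 bytes (4608 bits), and the new one is 4 + 256 = 260 bytes, rounded up to 320 bytes (2560 bits), with objs moving from bit offset 64 to 32.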

As far as I can see, the local_free_slots field in rte_hash is
supposed to be internal and should just be hidden from users.
lib/librte_hash/rte_cuckoo_hash.c:              h->local_free_slots =
rte_zmalloc_socket(NULL,
lib/librte_hash/rte_cuckoo_hash.c:              rte_free(h->local_free_slots);
lib/librte_hash/rte_cuckoo_hash.c:                      cached_cnt +=
h->local_free_slots[i].len;
lib/librte_hash/rte_cuckoo_hash.c:
h->local_free_slots[i].len = 0;
lib/librte_hash/rte_cuckoo_hash.c:              cached_free_slots =
&h->local_free_slots[lcore_id];
lib/librte_hash/rte_cuckoo_hash.c:              cached_free_slots =
&h->local_free_slots[lcore_id];
lib/librte_hash/rte_cuckoo_hash.c:              cached_free_slots =
&h->local_free_slots[lcore_id];
lib/librte_hash/rte_cuckoo_hash.h:      struct lcore_cache *local_free_slots;

Not sure how users could make use of this.
But the abi check flags this as a breakage since this type was exported.

I can see three options:
- we stick to our "no abi breakage" policy, this change is postponed
to the next ABI breakage, and at the same time, we hide this type and
inspect the rest of the rte_hash API to avoid new issues in the
future,
- we duplicate structures and API by using function versioning to keep
the exact rte_hash v20.0 ABI and a v20.0.1 ABI with the resized and
cleaned structures,
- we override the ABI freeze here by ruling that this was an internal
structure that users should not access (ugh..)

Seeing how this is an optimisation, my preference goes to the first option.


-- 
David Marchand


* Re: [dpdk-dev] [PATCH v9 5/6] lib/hash: use ring with 32b element size to save memory
  2020-01-17 20:27       ` David Marchand
@ 2020-01-17 20:54         ` Honnappa Nagarahalli
  2020-01-17 21:07           ` David Marchand
  0 siblings, 1 reply; 173+ messages in thread
From: Honnappa Nagarahalli @ 2020-01-17 20:54 UTC (permalink / raw)
  To: David Marchand
  Cc: Olivier Matz, Stephen Hemminger, jerinj, Bruce Richardson,
	Pavan Nikhilesh, Ananyev, Konstantin, Wang, Yipeng1, dev,
	Dharmik Thakkar, Ruifeng Wang, Gavin Hu, nd, thomas,
	Honnappa Nagarahalli, nd

<snip>

> 
> On Thu, Jan 16, 2020 at 6:25 AM Honnappa Nagarahalli
> <honnappa.nagarahalli@arm.com> wrote:
> >
> > The freelist and external bucket indices are 32b. Using rings that use
> > 32b element sizes will save memory.
> >
> > Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
> > Reviewed-by: Gavin Hu <gavin.hu@arm.com>
> > Reviewed-by: Ola Liljedahl <ola.liljedahl@arm.com>
> > ---
> >  lib/librte_hash/rte_cuckoo_hash.c | 94
> > ++++++++++++++++---------------  lib/librte_hash/rte_cuckoo_hash.h |
> > 2 +-
> >  2 files changed, 50 insertions(+), 46 deletions(-)
> >
> 
> [snip]
> 
> > diff --git a/lib/librte_hash/rte_cuckoo_hash.h
> > b/lib/librte_hash/rte_cuckoo_hash.h
> > index fb19bb27d..345de6bf9 100644
> > --- a/lib/librte_hash/rte_cuckoo_hash.h
> > +++ b/lib/librte_hash/rte_cuckoo_hash.h
> > @@ -124,7 +124,7 @@ const rte_hash_cmp_eq_t
> > cmp_jump_table[NUM_KEY_CMP_CASES] = {
> >
> >  struct lcore_cache {
> >         unsigned len; /**< Cache len */
> > -       void *objs[LCORE_CACHE_SIZE]; /**< Cache objects */
> > +       uint32_t objs[LCORE_CACHE_SIZE]; /**< Cache objects */
> 
> This triggers a warning in ABI checks:
> 
> 1 function with some indirect sub-type change:
> 
>   [C]'function int32_t rte_hash_add_key(const rte_hash*, void*)' at
> rte_cuckoo_hash.c:1118:1 has some indirect sub-type changes:
>     parameter 1 of type 'const rte_hash*' has sub-type changes:
>       in pointed to type 'const rte_hash':
>         in unqualified underlying type 'struct rte_hash' at
> rte_cuckoo_hash.h:160:1:
>           type size hasn't changed
>           1 data member change:
>            type of 'lcore_cache* rte_hash::local_free_slots' changed:
>              in pointed to type 'struct lcore_cache' at rte_cuckoo_hash.h:125:1:
>                type size changed from 4608 to 2560 (in bits)
>                1 data member change:
>                 type of 'void* lcore_cache::objs[64]' changed:
>                   array element type 'void*' changed:
>                     entity changed from 'void*' to 'typedef uint32_t'
> at stdint-uintn.h:26:1
>                     type size changed from 64 to 32 (in bits)
>                   type name changed from 'void*[64]' to 'uint32_t[64]'
>                   array type size changed from 4096 to 2048
>                 and offset changed from 64 to 32 (in bits) (by -32 bits)
> 
> As far as I can see, the local_free_slots field in rte_hash is supposed to be
> internal and should just be hidden from users.
> lib/librte_hash/rte_cuckoo_hash.c:              h->local_free_slots =
> rte_zmalloc_socket(NULL,
> lib/librte_hash/rte_cuckoo_hash.c:              rte_free(h->local_free_slots);
> lib/librte_hash/rte_cuckoo_hash.c:                      cached_cnt +=
> h->local_free_slots[i].len;
> lib/librte_hash/rte_cuckoo_hash.c:
> h->local_free_slots[i].len = 0;
> lib/librte_hash/rte_cuckoo_hash.c:              cached_free_slots =
> &h->local_free_slots[lcore_id];
> lib/librte_hash/rte_cuckoo_hash.c:              cached_free_slots =
> &h->local_free_slots[lcore_id];
> lib/librte_hash/rte_cuckoo_hash.c:              cached_free_slots =
> &h->local_free_slots[lcore_id];
> lib/librte_hash/rte_cuckoo_hash.h:      struct lcore_cache *local_free_slots;
> 
> Not sure how users could make use of this.
> But the abi check flags this as a breakage since this type was exported.
I think this is a false positive.

Users include 'rte_hash.h' file which does not define the structure. It just has the declaration 'struct rte_hash'. The actual structure is defined in 'rte_cuckoo_hash.h'. But this is not included by the user. So, the application does not have visibility into 'struct rte_hash' as defined in 'rte_cuckoo_hash.h'.

The 'rte_hash_create' API returns a pointer to the 'struct rte_hash'. All the APIs are non-inline and just take this pointer as the argument. So, the 'struct rte_hash' as defined in 'rte_cuckoo_hash.h' is not used by the user.

You can take a look at test_hash_readwrite_lf.c and the function 'check_bucket'. This function exists because the test case cannot access the 'struct rte_hash' from 'rte_cuckoo_hash.h'.
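
The pattern described above is the usual opaque-type arrangement. A minimal single-file sketch follows; all names (struct table, table_create, table_entries, table_free) are hypothetical and only mirror how a public header can declare a type while a private header defines it.

```c
#include <stdint.h>
#include <stdlib.h>

/* --- what a public header would contain (hypothetical names) --- */
struct table;	/* declaration only: layout and size unknown to users */
struct table *table_create(uint32_t entries);
uint32_t table_entries(const struct table *t);
void table_free(struct table *t);

/* --- what stays in the library's private header and .c file --- */
struct table {
	uint32_t entries;
	uint32_t *slots;	/* element type can change without touching callers */
};

struct table *
table_create(uint32_t entries)
{
	struct table *t = malloc(sizeof(*t));

	if (t == NULL)
		return NULL;
	t->slots = calloc(entries, sizeof(*t->slots));
	if (t->slots == NULL) {
		free(t);
		return NULL;
	}
	t->entries = entries;
	return t;
}

uint32_t
table_entries(const struct table *t)
{
	return t->entries;
}

void
table_free(struct table *t)
{
	if (t == NULL)
		return;
	free(t->slots);
	free(t);
}
```

Because callers only hold a pointer and go through non-inline functions, changing the members of the private structure does not change anything the application was compiled against.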
 
> 
> I can see three options:
> - we stick to our "no abi breakage" policy, this change is postponed to the next
> ABI breakage, and at the same time, we hide this type and inspect the rest of
> the rte_hash API to avoid new issues in the future,
> - we duplicate structures and API by using function versioning to keep the
> exact rte_hash v20.0 ABI and a v20.0.1 ABI with the resized and cleaned
> structures,
> - we override the ABI freeze here by ruling that this was an internal structure
> that users should not access (ugh..)
> 
> Seeing how this is an optimisation, my preference goes to the first option.
> 
> 
> --
> David Marchand


* Re: [dpdk-dev] [PATCH v9 5/6] lib/hash: use ring with 32b element size to save memory
  2020-01-17 20:54         ` Honnappa Nagarahalli
@ 2020-01-17 21:07           ` David Marchand
  2020-01-17 22:24             ` Wang, Yipeng1
  0 siblings, 1 reply; 173+ messages in thread
From: David Marchand @ 2020-01-17 21:07 UTC (permalink / raw)
  To: Honnappa Nagarahalli
  Cc: Olivier Matz, Stephen Hemminger, jerinj, Bruce Richardson,
	Pavan Nikhilesh, Ananyev, Konstantin, Wang, Yipeng1, dev,
	Dharmik Thakkar, Ruifeng Wang, Gavin Hu, nd, thomas

On Fri, Jan 17, 2020 at 9:54 PM Honnappa Nagarahalli
<Honnappa.Nagarahalli@arm.com> wrote:
>
> <snip>
>
> >
> > On Thu, Jan 16, 2020 at 6:25 AM Honnappa Nagarahalli
> > <honnappa.nagarahalli@arm.com> wrote:
> > >
> > > The freelist and external bucket indices are 32b. Using rings that use
> > > 32b element sizes will save memory.
> > >
> > > Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
> > > Reviewed-by: Gavin Hu <gavin.hu@arm.com>
> > > Reviewed-by: Ola Liljedahl <ola.liljedahl@arm.com>
> > > ---
> > >  lib/librte_hash/rte_cuckoo_hash.c | 94
> > > ++++++++++++++++---------------  lib/librte_hash/rte_cuckoo_hash.h |
> > > 2 +-
> > >  2 files changed, 50 insertions(+), 46 deletions(-)
> > >
> >
> > [snip]
> >
> > > diff --git a/lib/librte_hash/rte_cuckoo_hash.h
> > > b/lib/librte_hash/rte_cuckoo_hash.h
> > > index fb19bb27d..345de6bf9 100644
> > > --- a/lib/librte_hash/rte_cuckoo_hash.h
> > > +++ b/lib/librte_hash/rte_cuckoo_hash.h
> > > @@ -124,7 +124,7 @@ const rte_hash_cmp_eq_t
> > > cmp_jump_table[NUM_KEY_CMP_CASES] = {
> > >
> > >  struct lcore_cache {
> > >         unsigned len; /**< Cache len */
> > > -       void *objs[LCORE_CACHE_SIZE]; /**< Cache objects */
> > > +       uint32_t objs[LCORE_CACHE_SIZE]; /**< Cache objects */
> >
> > This triggers a warning in ABI checks:
> >
> > 1 function with some indirect sub-type change:
> >
> >   [C]'function int32_t rte_hash_add_key(const rte_hash*, void*)' at
> > rte_cuckoo_hash.c:1118:1 has some indirect sub-type changes:
> >     parameter 1 of type 'const rte_hash*' has sub-type changes:
> >       in pointed to type 'const rte_hash':
> >         in unqualified underlying type 'struct rte_hash' at
> > rte_cuckoo_hash.h:160:1:
> >           type size hasn't changed
> >           1 data member change:
> >            type of 'lcore_cache* rte_hash::local_free_slots' changed:
> >              in pointed to type 'struct lcore_cache' at rte_cuckoo_hash.h:125:1:
> >                type size changed from 4608 to 2560 (in bits)
> >                1 data member change:
> >                 type of 'void* lcore_cache::objs[64]' changed:
> >                   array element type 'void*' changed:
> >                     entity changed from 'void*' to 'typedef uint32_t'
> > at stdint-uintn.h:26:1
> >                     type size changed from 64 to 32 (in bits)
> >                   type name changed from 'void*[64]' to 'uint32_t[64]'
> >                   array type size changed from 4096 to 2048
> >                 and offset changed from 64 to 32 (in bits) (by -32 bits)
> >
> > As far as I can see, the local_free_slots field in rte_hash is supposed to be
> > internal and should just be hidden from users.
> > lib/librte_hash/rte_cuckoo_hash.c:              h->local_free_slots =
> > rte_zmalloc_socket(NULL,
> > lib/librte_hash/rte_cuckoo_hash.c:              rte_free(h->local_free_slots);
> > lib/librte_hash/rte_cuckoo_hash.c:                      cached_cnt +=
> > h->local_free_slots[i].len;
> > lib/librte_hash/rte_cuckoo_hash.c:
> > h->local_free_slots[i].len = 0;
> > lib/librte_hash/rte_cuckoo_hash.c:              cached_free_slots =
> > &h->local_free_slots[lcore_id];
> > lib/librte_hash/rte_cuckoo_hash.c:              cached_free_slots =
> > &h->local_free_slots[lcore_id];
> > lib/librte_hash/rte_cuckoo_hash.c:              cached_free_slots =
> > &h->local_free_slots[lcore_id];
> > lib/librte_hash/rte_cuckoo_hash.h:      struct lcore_cache *local_free_slots;
> >
> > Not sure how users could make use of this.
> > But the abi check flags this as a breakage since this type was exported.
> I think this is a false positive.
>
> Users include 'rte_hash.h' file which does not define the structure. It just has the declaration 'struct rte_hash'. The actual structure is defined in 'rte_cuckoo_hash.h'. But this is not included by the user. So, the application does not have visibility into 'struct rte_hash' as defined in 'rte_cuckoo_hash.h'.
>
> The 'rte_create_hash' API returns a pointer to the 'struct rte_hash'. All the APIs are non-inline and just take this pointer as the argument. So, the 'struct rte_hash' as defined in 'rte_cuckoo_hash.h' is not used by the user.

Indeed, it seems properly hidden.
Scratching the rest of the mail.
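
For readers following the argument quoted above, the hiding pattern can be
sketched as below. This is a minimal standalone illustration, not DPDK code:
the 'demo_hash' names are hypothetical, and a single file stands in for the
public header / private header split.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* ---- What a public header would expose (hypothetical names) ---- */
struct demo_hash;	/* declaration only: callers never see the layout */
struct demo_hash *demo_hash_create(uint32_t entries);
uint32_t demo_hash_capacity(const struct demo_hash *h);
void demo_hash_free(struct demo_hash *h);

/* ---- Private definition, visible only to the library ---- */
struct demo_hash {
	uint32_t entries;
	uint32_t *free_slots;	/* could change type without touching callers */
};

struct demo_hash *
demo_hash_create(uint32_t entries)
{
	struct demo_hash *h = calloc(1, sizeof(*h));

	if (h == NULL)
		return NULL;
	h->free_slots = calloc(entries, sizeof(*h->free_slots));
	if (h->free_slots == NULL) {
		free(h);
		return NULL;
	}
	h->entries = entries;
	return h;
}

/* Non-inline accessor: the only way a caller can read a field. */
uint32_t
demo_hash_capacity(const struct demo_hash *h)
{
	return h->entries;
}

void
demo_hash_free(struct demo_hash *h)
{
	if (h == NULL)
		return;
	free(h->free_slots);
	free(h);
}
```

Because callers only hold the pointer and go through non-inline functions,
a change to the private layout does not require them to be rebuilt.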

Looked at abidiff, I can see it takes a public headers directory to
filter the ABI changes.
Need to make this work.
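
As a cross-check, the sizes in the abidiff report quoted above can be
reproduced with a small standalone sketch. It assumes a 64-byte cache-line
alignment on the structure (inferred from the reported sizes, not stated in
this thread), a 4-byte 'unsigned int' and, for the old layout, 8-byte
pointers. The struct names are illustrative only.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define LCORE_CACHE_SIZE 64	/* from 'void*[64]' in the report */
#define CACHE_LINE_SIZE 64	/* assumption, see the note above */

/* Layout before the patch: pointer-sized cache slots. */
struct lcore_cache_ptr {
	alignas(CACHE_LINE_SIZE) unsigned int len;
	void *objs[LCORE_CACHE_SIZE];
};

/* Layout after the patch: 32-bit cache slots. */
struct lcore_cache_u32 {
	alignas(CACHE_LINE_SIZE) unsigned int len;
	uint32_t objs[LCORE_CACHE_SIZE];
};

/* Print size and 'objs' offset of both layouts, in bits like abidiff. */
static void
print_layout(void)
{
	printf("void *  : size %zu bits, objs offset %zu bits\n",
		sizeof(struct lcore_cache_ptr) * 8,
		offsetof(struct lcore_cache_ptr, objs) * 8);
	printf("uint32_t: size %zu bits, objs offset %zu bits\n",
		sizeof(struct lcore_cache_u32) * 8,
		offsetof(struct lcore_cache_u32, objs) * 8);
}
```

Under those assumptions the old layout is 4 + 4 (padding) + 512 = 520 bytes,
rounded up to 576 bytes (4608 bits), and the new one is 4 + 256 = 260 bytes,
rounded up to 320 bytes (2560 bits), which matches the report.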


-- 
David Marchand


^ permalink raw reply	[flat|nested] 173+ messages in thread

* Re: [dpdk-dev] [PATCH v9 5/6] lib/hash: use ring with 32b element size to save memory
  2020-01-17 21:07           ` David Marchand
@ 2020-01-17 22:24             ` Wang, Yipeng1
  0 siblings, 0 replies; 173+ messages in thread
From: Wang, Yipeng1 @ 2020-01-17 22:24 UTC (permalink / raw)
  To: David Marchand, Honnappa Nagarahalli
  Cc: Olivier Matz, Stephen Hemminger, jerinj, Richardson, Bruce,
	Pavan Nikhilesh, Ananyev, Konstantin, dev, Dharmik Thakkar,
	Ruifeng Wang, Gavin Hu, nd, thomas

>-----Original Message-----
>> > Not sure how users could make use of this.
>> > But the abi check flags this as a breakage since this type was exported.
>> I think this is a false positive.
>>
>> Users include 'rte_hash.h' file which does not define the structure. It just has the declaration 'struct rte_hash'. The actual structure is
>defined in 'rte_cuckoo_hash.h'. But this is not included by the user. So, the application does not have visibility into 'struct rte_hash' as
>defined in 'rte_cuckoo_hash.h'.
>>
>> The 'rte_create_hash' API returns a pointer to the 'struct rte_hash'. All the APIs are non-inline and just take this pointer as the
>argument. So, the 'struct rte_hash' as defined in 'rte_cuckoo_hash.h' is not used by the user.
>
>Indeed, it seems properly hidden.
>Scratching the rest of the mail.
>
>Looked at abidiff, I can see it takes a public headers directory to
>filter the ABI changes.
>Need to make this work.
>
>
>--
>David Marchand

[Wang, Yipeng] 
Hi, Honnappa, I read the new API defs and this patch to rte_hash looks good to me.
Passed unit tests as well.
And I agree with you that the internals of rte_hash are hidden from users.
As long as the false warning in abi-check script is no concern:

Acked-by: Yipeng Wang <yipeng1.wang@intel.com>

^ permalink raw reply	[flat|nested] 173+ messages in thread

* Re: [dpdk-dev] [PATCH v9 2/6] lib/ring: apis to support configurable element size
  2020-01-17 16:45         ` Honnappa Nagarahalli
  2020-01-17 18:10           ` David Christensen
@ 2020-01-18 12:32           ` Ananyev, Konstantin
  2020-01-18 15:01             ` Honnappa Nagarahalli
  1 sibling, 1 reply; 173+ messages in thread
From: Ananyev, Konstantin @ 2020-01-18 12:32 UTC (permalink / raw)
  To: Honnappa Nagarahalli, Olivier Matz
  Cc: sthemmin, jerinj, Richardson, Bruce, david.marchand,
	pbhagavatula, Wang, Yipeng1, dev, Dharmik Thakkar, Ruifeng Wang,
	Gavin Hu, nd, David Christensen, nd


> > On Wed, Jan 15, 2020 at 11:25:07PM -0600, Honnappa Nagarahalli wrote:
> > > Current APIs assume ring elements to be pointers. However, in many use
> > > cases, the size can be different. Add new APIs to support configurable
> > > ring element sizes.
> > >
> > > Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
> > > Reviewed-by: Dharmik Thakkar <dharmik.thakkar@arm.com>
> > > Reviewed-by: Gavin Hu <gavin.hu@arm.com>
> > > Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
> > > ---
> > >  lib/librte_ring/Makefile             |    3 +-
> > >  lib/librte_ring/meson.build          |    4 +
> > >  lib/librte_ring/rte_ring.c           |   41 +-
> > >  lib/librte_ring/rte_ring.h           |    1 +
> > >  lib/librte_ring/rte_ring_elem.h      | 1003 ++++++++++++++++++++++++++
> > >  lib/librte_ring/rte_ring_version.map |    2 +
> > >  6 files changed, 1045 insertions(+), 9 deletions(-)
> > >  create mode 100644 lib/librte_ring/rte_ring_elem.h
> > >
> >
> > [...]
> >
> > > +static __rte_always_inline void
> > > +enqueue_elems_32(struct rte_ring *r, const uint32_t size, uint32_t idx,
> > > +		const void *obj_table, uint32_t n)
> > > +{
> > > +	unsigned int i;
> > > +	uint32_t *ring = (uint32_t *)&r[1];
> > > +	const uint32_t *obj = (const uint32_t *)obj_table;
> > > +	if (likely(idx + n < size)) {
> > > +		for (i = 0; i < (n & ~0x7); i += 8, idx += 8) {
> > > +			ring[idx] = obj[i];
> > > +			ring[idx + 1] = obj[i + 1];
> > > +			ring[idx + 2] = obj[i + 2];
> > > +			ring[idx + 3] = obj[i + 3];
> > > +			ring[idx + 4] = obj[i + 4];
> > > +			ring[idx + 5] = obj[i + 5];
> > > +			ring[idx + 6] = obj[i + 6];
> > > +			ring[idx + 7] = obj[i + 7];
> > > +		}
> > > +		switch (n & 0x7) {
> > > +		case 7:
> > > +			ring[idx++] = obj[i++]; /* fallthrough */
> > > +		case 6:
> > > +			ring[idx++] = obj[i++]; /* fallthrough */
> > > +		case 5:
> > > +			ring[idx++] = obj[i++]; /* fallthrough */
> > > +		case 4:
> > > +			ring[idx++] = obj[i++]; /* fallthrough */
> > > +		case 3:
> > > +			ring[idx++] = obj[i++]; /* fallthrough */
> > > +		case 2:
> > > +			ring[idx++] = obj[i++]; /* fallthrough */
> > > +		case 1:
> > > +			ring[idx++] = obj[i++]; /* fallthrough */
> > > +		}
> > > +	} else {
> > > +		for (i = 0; idx < size; i++, idx++)
> > > +			ring[idx] = obj[i];
> > > +		/* Start at the beginning */
> > > +		for (idx = 0; i < n; i++, idx++)
> > > +			ring[idx] = obj[i];
> > > +	}
> > > +}
> > > +
> > > +static __rte_always_inline void
> > > +enqueue_elems_64(struct rte_ring *r, uint32_t prod_head,
> > > +		const void *obj_table, uint32_t n)
> > > +{
> > > +	unsigned int i;
> > > +	const uint32_t size = r->size;
> > > +	uint32_t idx = prod_head & r->mask;
> > > +	uint64_t *ring = (uint64_t *)&r[1];
> > > +	const uint64_t *obj = (const uint64_t *)obj_table;
> > > +	if (likely(idx + n < size)) {
> > > +		for (i = 0; i < (n & ~0x3); i += 4, idx += 4) {
> > > +			ring[idx] = obj[i];
> > > +			ring[idx + 1] = obj[i + 1];
> > > +			ring[idx + 2] = obj[i + 2];
> > > +			ring[idx + 3] = obj[i + 3];
> > > +		}
> > > +		switch (n & 0x3) {
> > > +		case 3:
> > > +			ring[idx++] = obj[i++]; /* fallthrough */
> > > +		case 2:
> > > +			ring[idx++] = obj[i++]; /* fallthrough */
> > > +		case 1:
> > > +			ring[idx++] = obj[i++];
> > > +		}
> > > +	} else {
> > > +		for (i = 0; idx < size; i++, idx++)
> > > +			ring[idx] = obj[i];
> > > +		/* Start at the beginning */
> > > +		for (idx = 0; i < n; i++, idx++)
> > > +			ring[idx] = obj[i];
> > > +	}
> > > +}
> > > +
> > > +static __rte_always_inline void
> > > +enqueue_elems_128(struct rte_ring *r, uint32_t prod_head,
> > > +		const void *obj_table, uint32_t n)
> > > +{
> > > +	unsigned int i;
> > > +	const uint32_t size = r->size;
> > > +	uint32_t idx = prod_head & r->mask;
> > > +	rte_int128_t *ring = (rte_int128_t *)&r[1];
> > > +	const rte_int128_t *obj = (const rte_int128_t *)obj_table;
> > > +	if (likely(idx + n < size)) {
> > > +		for (i = 0; i < (n & ~0x1); i += 2, idx += 2)
> > > +			memcpy((void *)(ring + idx),
> > > +				(const void *)(obj + i), 32);
> > > +		switch (n & 0x1) {
> > > +		case 1:
> > > +			memcpy((void *)(ring + idx),
> > > +				(const void *)(obj + i), 16);
> > > +		}
> > > +	} else {
> > > +		for (i = 0; idx < size; i++, idx++)
> > > +			memcpy((void *)(ring + idx),
> > > +				(const void *)(obj + i), 16);
> > > +		/* Start at the beginning */
> > > +		for (idx = 0; i < n; i++, idx++)
> > > +			memcpy((void *)(ring + idx),
> > > +				(const void *)(obj + i), 16);
> > > +	}
> > > +}
> > > +
> > > +/* the actual enqueue of elements on the ring.
> > > + * Placed here since identical code needed in both
> > > + * single and multi producer enqueue functions.
> > > + */
> > > +static __rte_always_inline void
> > > +enqueue_elems(struct rte_ring *r, uint32_t prod_head, const void *obj_table,
> > > +		uint32_t esize, uint32_t num)
> > > +{
> > > +	/* 8B and 16B copies implemented individually to retain
> > > +	 * the current performance.
> > > +	 */
> > > +	if (esize == 8)
> > > +		enqueue_elems_64(r, prod_head, obj_table, num);
> > > +	else if (esize == 16)
> > > +		enqueue_elems_128(r, prod_head, obj_table, num);
> > > +	else {
> > > +		uint32_t idx, scale, nr_idx, nr_num, nr_size;
> > > +
> > > +		/* Normalize to uint32_t */
> > > +		scale = esize / sizeof(uint32_t);
> > > +		nr_num = num * scale;
> > > +		idx = prod_head & r->mask;
> > > +		nr_idx = idx * scale;
> > > +		nr_size = r->size * scale;
> > > +		enqueue_elems_32(r, nr_size, nr_idx, obj_table, nr_num);
> > > +	}
> > > +}
> >
> > Following Konstantin's comment on v7, enqueue_elems_128() was modified to
> > ensure it won't crash if the object is unaligned. Are we sure that this same
> > problem cannot also occur with 64b copies on all supported architectures? (I
> > mean 64b access that is only aligned on 32b)
> Konstantin mentioned that the 64b load/store instructions on x86 can handle unaligned access.

Yep, I think we are ok here for IA and IA-32.

> On aarch64, the load/store (non-atomic,
> which will be used in this case) can handle unaligned access.
> 
> + David Christensen to comment for PPC

If we are in doubt here, it is probably worth adding new test case(s) to the UT?

> 
> >
> > Out of curiosity, would it make a big perf difference to only use
> > enqueue_elems_32()?
> Yes, this was having a significant impact on 128b elements. I did not try on 64b elements.
> I will run the perf test with 32b copy for 64b element size and get back.

^ permalink raw reply	[flat|nested] 173+ messages in thread

* Re: [dpdk-dev] [PATCH v9 2/6] lib/ring: apis to support configurable element size
  2020-01-18 12:32           ` Ananyev, Konstantin
@ 2020-01-18 15:01             ` Honnappa Nagarahalli
  0 siblings, 0 replies; 173+ messages in thread
From: Honnappa Nagarahalli @ 2020-01-18 15:01 UTC (permalink / raw)
  To: Ananyev, Konstantin, Olivier Matz
  Cc: sthemmin, jerinj, Richardson, Bruce, david.marchand,
	pbhagavatula, Wang, Yipeng1, dev, Dharmik Thakkar, Ruifeng Wang,
	Gavin Hu, nd, David Christensen, Honnappa Nagarahalli, nd

<snip>

> 
> > > On Wed, Jan 15, 2020 at 11:25:07PM -0600, Honnappa Nagarahalli wrote:
> > > > Current APIs assume ring elements to be pointers. However, in many
> > > > use cases, the size can be different. Add new APIs to support
> > > > configurable ring element sizes.
> > > >
> > > > Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
> > > > Reviewed-by: Dharmik Thakkar <dharmik.thakkar@arm.com>
> > > > Reviewed-by: Gavin Hu <gavin.hu@arm.com>
> > > > Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
> > > > ---
> > > >  lib/librte_ring/Makefile             |    3 +-
> > > >  lib/librte_ring/meson.build          |    4 +
> > > >  lib/librte_ring/rte_ring.c           |   41 +-
> > > >  lib/librte_ring/rte_ring.h           |    1 +
> > > >  lib/librte_ring/rte_ring_elem.h      | 1003 ++++++++++++++++++++++++++
> > > >  lib/librte_ring/rte_ring_version.map |    2 +
> > > >  6 files changed, 1045 insertions(+), 9 deletions(-)
> > > >  create mode 100644 lib/librte_ring/rte_ring_elem.h
> > > >
> > >
> > > [...]
> > >
> > > > +static __rte_always_inline void
> > > > +enqueue_elems_32(struct rte_ring *r, const uint32_t size, uint32_t idx,
> > > > +		const void *obj_table, uint32_t n)
> > > > +{
> > > > +	unsigned int i;
> > > > +	uint32_t *ring = (uint32_t *)&r[1];
> > > > +	const uint32_t *obj = (const uint32_t *)obj_table;
> > > > +	if (likely(idx + n < size)) {
> > > > +		for (i = 0; i < (n & ~0x7); i += 8, idx += 8) {
> > > > +			ring[idx] = obj[i];
> > > > +			ring[idx + 1] = obj[i + 1];
> > > > +			ring[idx + 2] = obj[i + 2];
> > > > +			ring[idx + 3] = obj[i + 3];
> > > > +			ring[idx + 4] = obj[i + 4];
> > > > +			ring[idx + 5] = obj[i + 5];
> > > > +			ring[idx + 6] = obj[i + 6];
> > > > +			ring[idx + 7] = obj[i + 7];
> > > > +		}
> > > > +		switch (n & 0x7) {
> > > > +		case 7:
> > > > +			ring[idx++] = obj[i++]; /* fallthrough */
> > > > +		case 6:
> > > > +			ring[idx++] = obj[i++]; /* fallthrough */
> > > > +		case 5:
> > > > +			ring[idx++] = obj[i++]; /* fallthrough */
> > > > +		case 4:
> > > > +			ring[idx++] = obj[i++]; /* fallthrough */
> > > > +		case 3:
> > > > +			ring[idx++] = obj[i++]; /* fallthrough */
> > > > +		case 2:
> > > > +			ring[idx++] = obj[i++]; /* fallthrough */
> > > > +		case 1:
> > > > +			ring[idx++] = obj[i++]; /* fallthrough */
> > > > +		}
> > > > +	} else {
> > > > +		for (i = 0; idx < size; i++, idx++)
> > > > +			ring[idx] = obj[i];
> > > > +		/* Start at the beginning */
> > > > +		for (idx = 0; i < n; i++, idx++)
> > > > +			ring[idx] = obj[i];
> > > > +	}
> > > > +}
> > > > +
> > > > +static __rte_always_inline void
> > > > +enqueue_elems_64(struct rte_ring *r, uint32_t prod_head,
> > > > +		const void *obj_table, uint32_t n)
> > > > +{
> > > > +	unsigned int i;
> > > > +	const uint32_t size = r->size;
> > > > +	uint32_t idx = prod_head & r->mask;
> > > > +	uint64_t *ring = (uint64_t *)&r[1];
> > > > +	const uint64_t *obj = (const uint64_t *)obj_table;
> > > > +	if (likely(idx + n < size)) {
> > > > +		for (i = 0; i < (n & ~0x3); i += 4, idx += 4) {
> > > > +			ring[idx] = obj[i];
> > > > +			ring[idx + 1] = obj[i + 1];
> > > > +			ring[idx + 2] = obj[i + 2];
> > > > +			ring[idx + 3] = obj[i + 3];
> > > > +		}
> > > > +		switch (n & 0x3) {
> > > > +		case 3:
> > > > +			ring[idx++] = obj[i++]; /* fallthrough */
> > > > +		case 2:
> > > > +			ring[idx++] = obj[i++]; /* fallthrough */
> > > > +		case 1:
> > > > +			ring[idx++] = obj[i++];
> > > > +		}
> > > > +	} else {
> > > > +		for (i = 0; idx < size; i++, idx++)
> > > > +			ring[idx] = obj[i];
> > > > +		/* Start at the beginning */
> > > > +		for (idx = 0; i < n; i++, idx++)
> > > > +			ring[idx] = obj[i];
> > > > +	}
> > > > +}
> > > > +
> > > > +static __rte_always_inline void
> > > > +enqueue_elems_128(struct rte_ring *r, uint32_t prod_head,
> > > > +		const void *obj_table, uint32_t n)
> > > > +{
> > > > +	unsigned int i;
> > > > +	const uint32_t size = r->size;
> > > > +	uint32_t idx = prod_head & r->mask;
> > > > +	rte_int128_t *ring = (rte_int128_t *)&r[1];
> > > > +	const rte_int128_t *obj = (const rte_int128_t *)obj_table;
> > > > +	if (likely(idx + n < size)) {
> > > > +		for (i = 0; i < (n & ~0x1); i += 2, idx += 2)
> > > > +			memcpy((void *)(ring + idx),
> > > > +				(const void *)(obj + i), 32);
> > > > +		switch (n & 0x1) {
> > > > +		case 1:
> > > > +			memcpy((void *)(ring + idx),
> > > > +				(const void *)(obj + i), 16);
> > > > +		}
> > > > +	} else {
> > > > +		for (i = 0; idx < size; i++, idx++)
> > > > +			memcpy((void *)(ring + idx),
> > > > +				(const void *)(obj + i), 16);
> > > > +		/* Start at the beginning */
> > > > +		for (idx = 0; i < n; i++, idx++)
> > > > +			memcpy((void *)(ring + idx),
> > > > +				(const void *)(obj + i), 16);
> > > > +	}
> > > > +}
> > > > +
> > > > +/* the actual enqueue of elements on the ring.
> > > > + * Placed here since identical code needed in both
> > > > + * single and multi producer enqueue functions.
> > > > + */
> > > > +static __rte_always_inline void
> > > > +enqueue_elems(struct rte_ring *r, uint32_t prod_head, const void *obj_table,
> > > > +		uint32_t esize, uint32_t num)
> > > > +{
> > > > +	/* 8B and 16B copies implemented individually to retain
> > > > +	 * the current performance.
> > > > +	 */
> > > > +	if (esize == 8)
> > > > +		enqueue_elems_64(r, prod_head, obj_table, num);
> > > > +	else if (esize == 16)
> > > > +		enqueue_elems_128(r, prod_head, obj_table, num);
> > > > +	else {
> > > > +		uint32_t idx, scale, nr_idx, nr_num, nr_size;
> > > > +
> > > > +		/* Normalize to uint32_t */
> > > > +		scale = esize / sizeof(uint32_t);
> > > > +		nr_num = num * scale;
> > > > +		idx = prod_head & r->mask;
> > > > +		nr_idx = idx * scale;
> > > > +		nr_size = r->size * scale;
> > > > +		enqueue_elems_32(r, nr_size, nr_idx, obj_table, nr_num);
> > > > +	}
> > > > +}
> > >
> > > Following Konstantin's comment on v7, enqueue_elems_128() was
> > > modified to ensure it won't crash if the object is unaligned. Are we
> > > sure that this same problem cannot also occur with 64b copies on
> > > all supported architectures? (I mean 64b access that is only aligned
> > > on 32b)
> > Konstantin mentioned that the 64b load/store instructions on x86 can
> > handle unaligned access.
> 
> Yep, I think we are ok here for IA and IA-32.
> 
> > On aarch64, the load/store (non-atomic, which will be used in this
> > case) can handle unaligned access.
> >
> > + David Christensen to comment for PPC
> 
> If we are in doubt here, it is probably worth adding new test case(s) to the UT?
I will modify one of the functional tests to use unaligned address.
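
A standalone sketch of what such a check could look like is given below. It
is not the DPDK test and not the copy routine from the patch (which stores
through 'uint64_t *' directly): it uses memcpy so that the sketch itself
stays well defined in C when the source is only 32b aligned. Function names
are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Copy n 64-bit elements without assuming that src is 64-bit aligned. */
static void
copy_elems_64_unaligned(uint64_t *dst, const void *src, unsigned int n)
{
	const unsigned char *p = src;
	unsigned int i;

	for (i = 0; i < n; i++)
		memcpy(&dst[i], p + i * sizeof(uint64_t), sizeof(uint64_t));
}

/* Returns 0 if 8 elements read from a 32b-aligned address arrive intact. */
static int
check_unaligned_copy(void)
{
	uint64_t backing[9];	/* 8-byte aligned storage, 72 bytes */
	/* Start 4 bytes in: aligned on 32b, but not on 64b. */
	unsigned char *src = (unsigned char *)backing + 4;
	uint64_t ring[8];
	uint64_t val;
	unsigned int i;

	/* Fill the unaligned source table with a known pattern. */
	for (i = 0; i < 8; i++) {
		val = 0x0102030405060708ULL * (i + 1);
		memcpy(src + i * sizeof(val), &val, sizeof(val));
	}

	copy_elems_64_unaligned(ring, src, 8);

	for (i = 0; i < 8; i++)
		if (ring[i] != 0x0102030405060708ULL * (i + 1))
			return -1;
	return 0;
}
```

In the real functional test the unaligned table would be passed to the
rte_ring_xxx_elem enqueue/dequeue calls instead of a local copy helper.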

> 
> >
> > >
> > > Out of curiosity, would it make a big perf difference to only use
> > > enqueue_elems_32()?
> > > Yes, this was having a significant impact on 128b elements. I did not try on
> > > 64b elements. I will run the perf test with 32b copy for 64b element size
> > > and get back.
Following is the data. Mostly, the 64b element copy performs better.

RTE>>ring_perf_autotest (elem APIs: element size 8B)

                                          8B with 64b     8B with 32b
                                          element copy    element copy
### Testing single element enq/deq ###
SP/SC: single                                 10.02           11.21
MP/MC: single                                 51.21           46.22

### Testing burst enq/deq ###
SP/SC: burst (size: 8)                        39.30           62.01
SP/SC: burst (size: 32)                       92.73          189.15
MP/MC: burst (size: 8)                        76.92          109.28
MP/MC: burst (size: 32)                      134.84          229.98

### Testing bulk enq/deq ###
SP/SC: bulk (size: 8)                         32.56           62.01
SP/SC: bulk (size: 32)                        93.53          189.78
MP/MC: bulk (size: 8)                         76.76          109.16
MP/MC: bulk (size: 32)                       134.74          229.87

### Testing empty bulk deq ###
SP/SC: bulk (size: 8)                          5.00            3.00
MP/MC: bulk (size: 8)                          6.00            6.00

### Testing using two physical cores ###
SP/SC: bulk (size: 8)                         26.33           47.03
MP/MC: bulk (size: 8)                         83.22           83.12
SP/SC: bulk (size: 32)                        12.48           20.73
MP/MC: bulk (size: 32)                        24.53           23.18

### Testing using two NUMA nodes ###
SP/SC: bulk (size: 8)                         54.35           92.40
MP/MC: bulk (size: 8)                        197.59          198.98
SP/SC: bulk (size: 32)                        36.54           57.22
MP/MC: bulk (size: 32)                        72.57           65.65

^ permalink raw reply	[flat|nested] 173+ messages in thread

* Re: [dpdk-dev] [PATCH v9 3/6] test/ring: add functional tests for rte_ring_xxx_elem APIs
  2020-01-17 17:03       ` Olivier Matz
@ 2020-01-18 16:27         ` Honnappa Nagarahalli
  0 siblings, 0 replies; 173+ messages in thread
From: Honnappa Nagarahalli @ 2020-01-18 16:27 UTC (permalink / raw)
  To: Olivier Matz
  Cc: sthemmin, jerinj, bruce.richardson, david.marchand, pbhagavatula,
	konstantin.ananyev, yipeng1.wang, dev, Dharmik Thakkar,
	Ruifeng Wang, Gavin Hu, nd, Honnappa Nagarahalli, nd

<snip>

> On Wed, Jan 15, 2020 at 11:25:08PM -0600, Honnappa Nagarahalli wrote:
> > Add basic infrastructure to test rte_ring_xxx_elem APIs.
> > Adjust the existing test cases to test for various ring element sizes.
> >
> > Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
> > Reviewed-by: Gavin Hu <gavin.hu@arm.com>
> > ---
> >  app/test/test_ring.c | 1342 +++++++++++++++++++++---------------------
> >  app/test/test_ring.h |  187 ++++++
> >  2 files changed, 850 insertions(+), 679 deletions(-)
> >  create mode 100644 app/test/test_ring.h
> >
> > diff --git a/app/test/test_ring.c b/app/test/test_ring.c
> > index aaf1e70ad..c08500eca 100644
> > --- a/app/test/test_ring.c
> > +++ b/app/test/test_ring.c
> > @@ -23,11 +23,13 @@
> >  #include <rte_branch_prediction.h>
> >  #include <rte_malloc.h>
> >  #include <rte_ring.h>
> > +#include <rte_ring_elem.h>
> >  #include <rte_random.h>
> >  #include <rte_errno.h>
> >  #include <rte_hexdump.h>
> >
> >  #include "test.h"
> > +#include "test_ring.h"
> >
> >  /*
> >   * Ring
> 
> As you are changing a lot of things, maybe it's an opportunity to update or
> remove the comment at the beginning of the file.
I have removed specific comments. I have converted it into generic comments.

> 
> 
> > @@ -55,8 +57,6 @@
> >  #define RING_SIZE 4096
> >  #define MAX_BULK 32
> >
> > -static rte_atomic32_t synchro;
> > -
> >  #define	TEST_RING_VERIFY(exp)						\
> >  	if (!(exp)) {							\
> >  		printf("error at %s:%d\tcondition " #exp " failed\n",	\
> > @@ -67,808 +67,792 @@ static rte_atomic32_t synchro;
> >
> >  #define	TEST_RING_FULL_EMTPY_ITER	8
> >
> > -/*
> > - * helper routine for test_ring_basic
> > - */
> > -static int
> > -test_ring_basic_full_empty(struct rte_ring *r, void * const src[], void *dst[])
> > +static int esize[] = {-1, 4, 8, 16, 20};
> 
> it could be const
Yes

> 
> [...]
> 
> > +/*
> > + * Burst and bulk operations with sp/sc, mp/mc and default (during creation)
> > + * Random number of elements are enqueued and dequeued.
> > + */
> > +static int
> > +test_ring_burst_bulk_tests1(unsigned int api_type)
> > +{
> > +	struct rte_ring *r;
> > +	void **src = NULL, **cur_src = NULL, **dst = NULL, **cur_dst = NULL;
> > +	int ret;
> > +	unsigned int i, j;
> > +	int rand;
> > +	const unsigned int rsz = RING_SIZE - 1;
> >
> > -	/* check data */
> > -	if (memcmp(src, dst, cur_dst - dst)) {
> > -		rte_hexdump(stdout, "src", src, cur_src - src);
> > -		rte_hexdump(stdout, "dst", dst, cur_dst - dst);
> > -		printf("data after dequeue is not the same\n");
> > -		goto fail;
> > -	}
> > +	for (i = 0; i < RTE_DIM(esize); i++) {
> > +		test_ring_print_test_string("Test standard ring", api_type,
> > +						esize[i]);
> >
> > -	cur_src = src;
> > -	cur_dst = dst;
> > +		/* Create the ring */
> > +		r = test_ring_create("test_ring_burst_bulk_tests", esize[i],
> > +					RING_SIZE, SOCKET_ID_ANY, 0);
> >
> > -	ret = rte_ring_mp_enqueue(r, cur_src);
> > -	if (ret != 0)
> > -		goto fail;
> > +		/* alloc dummy object pointers */
> > +		src = test_ring_calloc(RING_SIZE * 2, esize[i]);
> > +		if (src == NULL)
> > +			goto fail;
> > +		test_ring_mem_init(src, RING_SIZE * 2, esize[i]);
> > +		cur_src = src;
> >
> > -	ret = rte_ring_mc_dequeue(r, cur_dst);
> > -	if (ret != 0)
> > -		goto fail;
> > +		/* alloc some room for copied objects */
> > +		dst = test_ring_calloc(RING_SIZE * 2, esize[i]);
> > +		if (dst == NULL)
> > +			goto fail;
> > +		cur_dst = dst;
> > +
> > +		printf("Random full/empty test\n");
> > +
> > +		for (j = 0; j != TEST_RING_FULL_EMTPY_ITER; j++) {
> > +			/* random shift in the ring */
> > +			rand = RTE_MAX(rte_rand() % RING_SIZE, 1UL);
> > +			printf("%s: iteration %u, random shift: %u;\n",
> > +			    __func__, i, rand);
> > +			ret = test_ring_enqueue(r, cur_src, esize[i], rand,
> > +							api_type);
> > +			TEST_RING_VERIFY(ret != 0);
> > +
> > +			ret = test_ring_dequeue(r, cur_dst, esize[i], rand,
> > +							api_type);
> > +			TEST_RING_VERIFY(ret == rand);
> > +
> > +			/* fill the ring */
> > +			ret = test_ring_enqueue(r, cur_src, esize[i], rsz,
> > +							api_type);
> > +			TEST_RING_VERIFY(ret != 0);
> > +
> > +			TEST_RING_VERIFY(rte_ring_free_count(r) == 0);
> > +			TEST_RING_VERIFY(rsz == rte_ring_count(r));
> > +			TEST_RING_VERIFY(rte_ring_full(r));
> > +			TEST_RING_VERIFY(rte_ring_empty(r) == 0);
> > +
> > +			/* empty the ring */
> > +			ret = test_ring_dequeue(r, cur_dst, esize[i], rsz,
> > +							api_type);
> > +			TEST_RING_VERIFY(ret == (int)rsz);
> > +			TEST_RING_VERIFY(rsz == rte_ring_free_count(r));
> > +			TEST_RING_VERIFY(rte_ring_count(r) == 0);
> > +			TEST_RING_VERIFY(rte_ring_full(r) == 0);
> > +			TEST_RING_VERIFY(rte_ring_empty(r));
> > +
> > +			/* check data */
> > +			TEST_RING_VERIFY(memcmp(src, dst, rsz) == 0);
> > +		}
> > +
> > +		/* Free memory before test completed */
> > +		rte_ring_free(r);
> > +		rte_free(src);
> > +		rte_free(dst);
> 
> I think they should be reset to NULL to avoid a double free if next iteration
> fails.
> 
> There are several places like this, I think it can be done even if not really
> needed.
Will change all of them
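
The double free being discussed can be illustrated with a minimal standalone
sketch. Here calloc/free stand in for test_ring_calloc/rte_free, and
'fail_at' is a hypothetical knob used only to simulate a failing iteration.

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Model of the per-element-size test loop. 'fail_at' selects an iteration
 * that fails after src is allocated (-1 means no failure).
 */
static int
run_iterations(int iterations, int fail_at)
{
	void *src = NULL, *dst = NULL;
	int i;

	for (i = 0; i < iterations; i++) {
		src = calloc(16, sizeof(int));
		if (src == NULL)
			goto fail;
		if (i == fail_at)
			goto fail;	/* dst not allocated in this iteration */
		dst = calloc(16, sizeof(int));
		if (dst == NULL)
			goto fail;

		free(src);
		free(dst);
		/*
		 * Reset the pointers: without this, a failure in the next
		 * iteration would free the stale 'dst' a second time.
		 */
		src = NULL;
		dst = NULL;
	}
	return 0;

fail:
	free(src);	/* free(NULL) is a no-op */
	free(dst);
	return -1;
}
```

With the reset in place, the shared 'fail' path only ever sees pointers that
are either NULL or still owned by the current iteration.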

> 
> [...]
> 
> > diff --git a/app/test/test_ring.h b/app/test/test_ring.h
> > new file mode 100644
> > index 000000000..26716e4f8
> > --- /dev/null
> > +++ b/app/test/test_ring.h
> > @@ -0,0 +1,187 @@
> > +/* SPDX-License-Identifier: BSD-3-Clause
> > + * Copyright(c) 2019 Arm Limited
> > + */
> > +
> > +#include <rte_malloc.h>
> > +#include <rte_ring.h>
> > +#include <rte_ring_elem.h>
> > +
> > +/* API type to call
> > + * rte_ring_<sp/mp or sc/mc>_enqueue_<bulk/burst>
> > + * TEST_RING_THREAD_DEF - Uses configured SPSC/MPMC calls
> > + * TEST_RING_THREAD_SPSC - Calls SP or SC API
> > + * TEST_RING_THREAD_MPMC - Calls MP or MC API
> > + */
> > +#define TEST_RING_THREAD_DEF 1
> > +#define TEST_RING_THREAD_SPSC 2
> > +#define TEST_RING_THREAD_MPMC 4
> > +
> > +/* API type to call
> > + * SL - Calls single element APIs
> > + * BL - Calls bulk APIs
> > + * BR - Calls burst APIs
> > + */
> 
> The comment was not updated according to macro name.
Will fix

> 


^ permalink raw reply	[flat|nested] 173+ messages in thread

* Re: [dpdk-dev] [PATCH v9 4/6] test/ring: modify perf test cases to use rte_ring_xxx_elem APIs
  2020-01-17 17:12       ` Olivier Matz
@ 2020-01-18 16:28         ` Honnappa Nagarahalli
  0 siblings, 0 replies; 173+ messages in thread
From: Honnappa Nagarahalli @ 2020-01-18 16:28 UTC (permalink / raw)
  To: Olivier Matz
  Cc: sthemmin, jerinj, bruce.richardson, david.marchand, pbhagavatula,
	konstantin.ananyev, yipeng1.wang, dev, Dharmik Thakkar,
	Ruifeng Wang, Gavin Hu, nd, Honnappa Nagarahalli, nd

<snip>

> 
> On Wed, Jan 15, 2020 at 11:25:09PM -0600, Honnappa Nagarahalli wrote:
> > Adjust the performance test cases to test rte_ring_xxx_elem APIs.
> >
> > Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
> > Reviewed-by: Gavin Hu <gavin.hu@arm.com>
> > ---
> >  app/test/test_ring_perf.c | 454 +++++++++++++++++++++++---------------
> >  1 file changed, 273 insertions(+), 181 deletions(-)
> >
> > diff --git a/app/test/test_ring_perf.c b/app/test/test_ring_perf.c
> > index 6c2aca483..8d1217951 100644
> > --- a/app/test/test_ring_perf.c
> > +++ b/app/test/test_ring_perf.c
> 
> [...]
> 
> > -static int
> > -test_ring_perf(void)
> > +/* Run all tests for a given element size */
> > +static __rte_always_inline int
> > +test_ring_perf_esize(const int esize)
> >  {
> >  	struct lcore_pair cores;
> >  	struct rte_ring *r = NULL;
> >
> > -	r = rte_ring_create(RING_NAME, RING_SIZE, rte_socket_id(), 0);
> > +	/*
> > +	 * Performance test for legacy/_elem APIs
> > +	 * SP-SC/MP-MC, single
> > +	 */
> > +	r = test_ring_create(RING_NAME, esize, RING_SIZE, rte_socket_id(), 0);
> >  	if (r == NULL)
> >  		return -1;
> >
> > -	printf("### Testing single element and burst enq/deq ###\n");
> > -	test_single_enqueue_dequeue(r);
> > -	test_burst_enqueue_dequeue(r);
> > +	printf("\n### Testing single element enq/deq ###\n");
> > +	if (test_single_enqueue_dequeue(r, esize,
> > +			TEST_RING_THREAD_SPSC | TEST_RING_ELEM_SINGLE) < 0)
> > +		return -1;
> 
> the ring is not freed on error (same below)
Will fix.

<snip>


^ permalink raw reply	[flat|nested] 173+ messages in thread

* [dpdk-dev] [PATCH v10 0/6] lib/ring: APIs to support custom element size
  2019-09-06 19:05 ` [dpdk-dev] [PATCH v2 0/6] " Honnappa Nagarahalli
                     ` (14 preceding siblings ...)
  2020-01-16  5:25   ` [dpdk-dev] [PATCH v9 0/6] lib/ring: APIs to support custom element size Honnappa Nagarahalli
@ 2020-01-18 19:32   ` Honnappa Nagarahalli
  2020-01-18 19:32     ` [dpdk-dev] [PATCH v10 1/6] test/ring: use division for cycle count calculation Honnappa Nagarahalli
                       ` (6 more replies)
  15 siblings, 7 replies; 173+ messages in thread
From: Honnappa Nagarahalli @ 2020-01-18 19:32 UTC (permalink / raw)
  To: olivier.matz, sthemmin, jerinj, bruce.richardson, david.marchand,
	pbhagavatula, konstantin.ananyev, yipeng1.wang, drc,
	honnappa.nagarahalli
  Cc: dev, dharmik.thakkar, ruifeng.wang, gavin.hu, nd

The current rte_ring hard-codes the type of the ring element to 'void *',
hence the size of the element is hard-coded to 32b/64b. Since the ring
element type is not an input to rte_ring APIs, it results in a couple
of issues:

1) If an application needs to store an element which is not 64b, it
   needs to write its own ring APIs similar to rte_event_ring APIs. This
   creates additional burden on the programmers, who end up making
   work-arounds and often waste memory.
2) If there are multiple libraries that store elements of the same
   type, currently they would have to write their own rte_ring APIs. This
   results in code duplication.

This patch adds new APIs to support configurable ring element size.
The APIs support custom element sizes by allowing the ring element
size to be defined as a multiple of 32b.

The aim is to achieve the same performance as the existing ring
implementation.
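
To make the memory argument concrete, here is a hedged sketch of the
size arithmetic, modelled on the checks in rte_ring_get_memsize_elem()
in patch 2/6. SKETCH_HDR_SIZE and SKETCH_CACHE_LINE are assumed
placeholder values, not the real sizeof(struct rte_ring) and
RTE_CACHE_LINE_SIZE. sketch_memsize_elem() is a hypothetical name, the
upper bound on count (RTE_RING_SZ_MASK) is left out, and unlike the
patch this sketch also rejects a count of 0:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_HDR_SIZE   384u /* assumed stand-in for sizeof(struct rte_ring) */
#define SKETCH_CACHE_LINE 64u  /* assumed cache line size */

/* Bytes needed for 'count' elements of 'esize' bytes each, or -EINVAL. */
static int64_t sketch_memsize_elem(unsigned int esize, unsigned int count)
{
	size_t sz;

	/* element size must be a multiple of 4 bytes */
	if (esize % 4 != 0)
		return -EINVAL;

	/* count must be a non-zero power of 2 */
	if (count == 0 || (count & (count - 1)) != 0)
		return -EINVAL;

	sz = SKETCH_HDR_SIZE + (size_t)count * esize;

	/* round up to the next cache line boundary */
	sz = (sz + SKETCH_CACHE_LINE - 1) & ~(size_t)(SKETCH_CACHE_LINE - 1);
	assert(sz % SKETCH_CACHE_LINE == 0);

	return (int64_t)sz;
}
```

With these placeholder numbers, 1024 elements need 8576 bytes at 8B per
element and 4480 bytes at 4B per element. The real figures depend on the
actual header size and cache line size of the target.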

v10
 - Improved comments in test case files (Olivier)
 - Fixed possible memory leaks (Olivier)
 - Changed 'test_ring_with_exact_size' to use unaligned
   addresses (Konstantin)
 - Changed the commit message for eventdev (Jerin)

v9
 - Split 'test_ring_burst_bulk_tests' test case into 4 smaller
   functions to address clang compilation time issue.
 - Addressed compilation failure in Intel CI in the hash changes.

v8
 - Changed the 128b copy elements inline function to use 'memcpy'
   to generate unaligned load/store instructions for x86. Generic
   copy function results in performance drop. (Konstantin)
 - Changed the API type #defines to be more clear (Konstantin)
 - Removed the code duplication in performance tests (Konstantin)
 - Fixed memory leak, changed test macros to inline functions (Konstantin)
 - Changed functional tests to test for 20B ring element. Fixed
   a bug in 32b element copy code for enqueue/dequeue(ring size
   needs to be normalized for 32b).
 - Squashed the functional and performance tests in their
   respective single commits.

v7
 - Merged the test cases to test both legacy APIs and
   rte_ring_xxx_elem APIs without code duplication (Konstantin, Olivier)
 - Performance test cases are merged as well (Konstantin, Olivier)
 - Macros to copy elements are converted into inline functions (Olivier)
 - Added back the changes to hash and event libraries

v6
 - Labelled as RFC to better indicate the status
 - Added unit tests to test the rte_ring_xxx_elem APIs
 - Corrected 'macro based partial memcpy' (5/6) patch
 - Added Konstantin's method after correction (6/6)
 - checkpatch shows significant warnings and errors, mainly due to
   copying code from existing test cases. None of them are harmful.
   I will fix them once we have an agreement.

v5
 - Use memcpy for chunks of 32B (Konstantin).
 - Both 'ring_perf_autotest' and 'ring_perf_elem_autotest' are available
   to compare the results easily.
 - Copying without memcpy is also available in 1/3, if anyone wants to
   experiment on their platform.
 - Added other platform owners to test on their respective platforms.

v4
 - A few fixes after more performance testing

v3
 - Removed macro-fest and used inline functions
   (Stephen, Bruce)

v2
 - Change Event Ring implementation to use ring templates
   (Jerin, Pavan)


Honnappa Nagarahalli (6):
  test/ring: use division for cycle count calculation
  lib/ring: apis to support configurable element size
  test/ring: add functional tests for rte_ring_xxx_elem APIs
  test/ring: modify perf test cases to use rte_ring_xxx_elem APIs
  lib/hash: use ring with 32b element size to save memory
  eventdev: use custom element size ring for event rings

 app/test/test_ring.c                 | 1383 +++++++++++++-------------
 app/test/test_ring.h                 |  187 ++++
 app/test/test_ring_perf.c            |  476 +++++----
 lib/librte_eventdev/rte_event_ring.c |  147 +--
 lib/librte_eventdev/rte_event_ring.h |   45 +-
 lib/librte_hash/rte_cuckoo_hash.c    |   94 +-
 lib/librte_hash/rte_cuckoo_hash.h    |    2 +-
 lib/librte_ring/Makefile             |    3 +-
 lib/librte_ring/meson.build          |    4 +
 lib/librte_ring/rte_ring.c           |   41 +-
 lib/librte_ring/rte_ring.h           |    1 +
 lib/librte_ring/rte_ring_elem.h      | 1003 +++++++++++++++++++
 lib/librte_ring/rte_ring_version.map |    2 +
 13 files changed, 2279 insertions(+), 1109 deletions(-)
 create mode 100644 app/test/test_ring.h
 create mode 100644 lib/librte_ring/rte_ring_elem.h

-- 
2.17.1



* [dpdk-dev] [PATCH v10 1/6] test/ring: use division for cycle count calculation
  2020-01-18 19:32   ` [dpdk-dev] [PATCH v10 " Honnappa Nagarahalli
@ 2020-01-18 19:32     ` Honnappa Nagarahalli
  2020-01-18 19:32     ` [dpdk-dev] [PATCH v10 2/6] lib/ring: apis to support configurable element size Honnappa Nagarahalli
                       ` (5 subsequent siblings)
  6 siblings, 0 replies; 173+ messages in thread
From: Honnappa Nagarahalli @ 2020-01-18 19:32 UTC (permalink / raw)
  To: olivier.matz, sthemmin, jerinj, bruce.richardson, david.marchand,
	pbhagavatula, konstantin.ananyev, yipeng1.wang, drc,
	honnappa.nagarahalli
  Cc: dev, dharmik.thakkar, ruifeng.wang, gavin.hu, nd

Use floating-point division instead of a right shift, which truncates
the result, to calculate a more accurate cycle count.
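
A small standalone illustration of the difference, assuming that
iterations equals 1 << iter_shift as the names in the diff suggest.
The function names are hypothetical and the numbers are made up for the
example:

```c
#include <assert.h>
#include <stdint.h>

/* Old style: integer shift, which drops the fractional part. */
static uint64_t cycles_per_iter_shift(uint64_t cycles, unsigned int iter_shift)
{
	return cycles >> iter_shift;
}

/* New style: floating-point division, which keeps the fraction. */
static double cycles_per_iter_div(uint64_t cycles, uint64_t iterations)
{
	assert(iterations != 0);
	return (double)cycles / (double)iterations;
}
```

For 100 cycles over 16 iterations (iter_shift of 4), the shift reports 6
while the division reports 6.25. The lost fraction matters most when the
per-operation cost is only a few cycles.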

Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
---
 app/test/test_ring_perf.c | 22 ++++++++++++----------
 1 file changed, 12 insertions(+), 10 deletions(-)

diff --git a/app/test/test_ring_perf.c b/app/test/test_ring_perf.c
index 70ee46ffe..6c2aca483 100644
--- a/app/test/test_ring_perf.c
+++ b/app/test/test_ring_perf.c
@@ -357,10 +357,10 @@ test_single_enqueue_dequeue(struct rte_ring *r)
 	}
 	const uint64_t mc_end = rte_rdtsc();
 
-	printf("SP/SC single enq/dequeue: %"PRIu64"\n",
-			(sc_end-sc_start) >> iter_shift);
-	printf("MP/MC single enq/dequeue: %"PRIu64"\n",
-			(mc_end-mc_start) >> iter_shift);
+	printf("SP/SC single enq/dequeue: %.2F\n",
+			((double)(sc_end-sc_start)) / iterations);
+	printf("MP/MC single enq/dequeue: %.2F\n",
+			((double)(mc_end-mc_start)) / iterations);
 }
 
 /*
@@ -395,13 +395,15 @@ test_burst_enqueue_dequeue(struct rte_ring *r)
 		}
 		const uint64_t mc_end = rte_rdtsc();
 
-		uint64_t mc_avg = ((mc_end-mc_start) >> iter_shift) / bulk_sizes[sz];
-		uint64_t sc_avg = ((sc_end-sc_start) >> iter_shift) / bulk_sizes[sz];
+		double mc_avg = ((double)(mc_end-mc_start) / iterations) /
+					bulk_sizes[sz];
+		double sc_avg = ((double)(sc_end-sc_start) / iterations) /
+					bulk_sizes[sz];
 
-		printf("SP/SC burst enq/dequeue (size: %u): %"PRIu64"\n", bulk_sizes[sz],
-				sc_avg);
-		printf("MP/MC burst enq/dequeue (size: %u): %"PRIu64"\n", bulk_sizes[sz],
-				mc_avg);
+		printf("SP/SC burst enq/dequeue (size: %u): %.2F\n",
+				bulk_sizes[sz], sc_avg);
+		printf("MP/MC burst enq/dequeue (size: %u): %.2F\n",
+				bulk_sizes[sz], mc_avg);
 	}
 }
 
-- 
2.17.1



* [dpdk-dev] [PATCH v10 2/6] lib/ring: apis to support configurable element size
  2020-01-18 19:32   ` [dpdk-dev] [PATCH v10 " Honnappa Nagarahalli
  2020-01-18 19:32     ` [dpdk-dev] [PATCH v10 1/6] test/ring: use division for cycle count calculation Honnappa Nagarahalli
@ 2020-01-18 19:32     ` Honnappa Nagarahalli
  2020-01-18 19:32     ` [dpdk-dev] [PATCH v10 3/6] test/ring: add functional tests for rte_ring_xxx_elem APIs Honnappa Nagarahalli
                       ` (4 subsequent siblings)
  6 siblings, 0 replies; 173+ messages in thread
From: Honnappa Nagarahalli @ 2020-01-18 19:32 UTC (permalink / raw)
  To: olivier.matz, sthemmin, jerinj, bruce.richardson, david.marchand,
	pbhagavatula, konstantin.ananyev, yipeng1.wang, drc,
	honnappa.nagarahalli
  Cc: dev, dharmik.thakkar, ruifeng.wang, gavin.hu, nd

Current APIs assume ring elements to be pointers. However, in many
use cases, the size can be different. Add new APIs to support
configurable ring element sizes.
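
For element sizes other than 8B and 16B, the copy in this patch works in
units of 32 bits: the start index, the ring size and the element count
are all scaled by esize / 4. A simplified standalone sketch of that
normalization follows. sketch_enqueue_words() and its parameters are
hypothetical names, the ring storage is passed in directly instead of
living behind struct rte_ring, and the unrolled fast path of the patch
is omitted:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Copy 'num' elements of 'esize' bytes into a ring of 'ring_size' slots,
 * starting at slot 'head & (ring_size - 1)', treating all data as
 * 32-bit words.
 */
static void sketch_enqueue_words(uint32_t *ring, uint32_t ring_size,
		uint32_t head, const void *obj_table,
		uint32_t esize, uint32_t num)
{
	const uint32_t *obj = obj_table;
	uint32_t scale, idx, size, n, i;

	assert(esize % sizeof(uint32_t) == 0);
	assert(ring_size != 0 && (ring_size & (ring_size - 1)) == 0);

	scale = esize / sizeof(uint32_t);       /* words per element */
	idx = (head & (ring_size - 1)) * scale; /* start position, in words */
	size = ring_size * scale;               /* ring length, in words */
	n = num * scale;                        /* words to copy */

	/* copy until the end of the ring storage is reached */
	for (i = 0; i < n && idx < size; i++, idx++)
		ring[idx] = obj[i];

	/* wrap around and continue from the beginning */
	for (idx = 0; i < n; i++, idx++)
		ring[idx] = obj[i];
}
```

With 20B elements (5 words each) in a 4-slot ring, enqueueing two
elements at head 3 fills words 15..19 and then wraps to words 0..4.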

Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Dharmik Thakkar <dharmik.thakkar@arm.com>
Reviewed-by: Gavin Hu <gavin.hu@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
---
 lib/librte_ring/Makefile             |    3 +-
 lib/librte_ring/meson.build          |    4 +
 lib/librte_ring/rte_ring.c           |   41 +-
 lib/librte_ring/rte_ring.h           |    1 +
 lib/librte_ring/rte_ring_elem.h      | 1003 ++++++++++++++++++++++++++
 lib/librte_ring/rte_ring_version.map |    2 +
 6 files changed, 1045 insertions(+), 9 deletions(-)
 create mode 100644 lib/librte_ring/rte_ring_elem.h

diff --git a/lib/librte_ring/Makefile b/lib/librte_ring/Makefile
index 22454b084..917c560ad 100644
--- a/lib/librte_ring/Makefile
+++ b/lib/librte_ring/Makefile
@@ -6,7 +6,7 @@ include $(RTE_SDK)/mk/rte.vars.mk
 # library name
 LIB = librte_ring.a
 
-CFLAGS += $(WERROR_FLAGS) -I$(SRCDIR) -O3
+CFLAGS += $(WERROR_FLAGS) -I$(SRCDIR) -O3 -DALLOW_EXPERIMENTAL_API
 LDLIBS += -lrte_eal
 
 EXPORT_MAP := rte_ring_version.map
@@ -16,6 +16,7 @@ SRCS-$(CONFIG_RTE_LIBRTE_RING) := rte_ring.c
 
 # install includes
 SYMLINK-$(CONFIG_RTE_LIBRTE_RING)-include := rte_ring.h \
+					rte_ring_elem.h \
 					rte_ring_generic.h \
 					rte_ring_c11_mem.h
 
diff --git a/lib/librte_ring/meson.build b/lib/librte_ring/meson.build
index ca8a435e9..f2f3ccc88 100644
--- a/lib/librte_ring/meson.build
+++ b/lib/librte_ring/meson.build
@@ -3,5 +3,9 @@
 
 sources = files('rte_ring.c')
 headers = files('rte_ring.h',
+		'rte_ring_elem.h',
 		'rte_ring_c11_mem.h',
 		'rte_ring_generic.h')
+
+# rte_ring_create_elem and rte_ring_get_memsize_elem are experimental
+allow_experimental_apis = true
diff --git a/lib/librte_ring/rte_ring.c b/lib/librte_ring/rte_ring.c
index d9b308036..3e15dc398 100644
--- a/lib/librte_ring/rte_ring.c
+++ b/lib/librte_ring/rte_ring.c
@@ -33,6 +33,7 @@
 #include <rte_tailq.h>
 
 #include "rte_ring.h"
+#include "rte_ring_elem.h"
 
 TAILQ_HEAD(rte_ring_list, rte_tailq_entry);
 
@@ -46,23 +47,38 @@ EAL_REGISTER_TAILQ(rte_ring_tailq)
 
 /* return the size of memory occupied by a ring */
 ssize_t
-rte_ring_get_memsize(unsigned count)
+rte_ring_get_memsize_elem(unsigned int esize, unsigned int count)
 {
 	ssize_t sz;
 
+	/* Check if element size is a multiple of 4B */
+	if (esize % 4 != 0) {
+		RTE_LOG(ERR, RING, "element size is not a multiple of 4\n");
+
+		return -EINVAL;
+	}
+
 	/* count must be a power of 2 */
 	if ((!POWEROF2(count)) || (count > RTE_RING_SZ_MASK )) {
 		RTE_LOG(ERR, RING,
-			"Requested size is invalid, must be power of 2, and "
-			"do not exceed the size limit %u\n", RTE_RING_SZ_MASK);
+			"Requested number of elements is invalid, must be power of 2, and not exceed %u\n",
+			RTE_RING_SZ_MASK);
+
 		return -EINVAL;
 	}
 
-	sz = sizeof(struct rte_ring) + count * sizeof(void *);
+	sz = sizeof(struct rte_ring) + count * esize;
 	sz = RTE_ALIGN(sz, RTE_CACHE_LINE_SIZE);
 	return sz;
 }
 
+/* return the size of memory occupied by a ring */
+ssize_t
+rte_ring_get_memsize(unsigned count)
+{
+	return rte_ring_get_memsize_elem(sizeof(void *), count);
+}
+
 void
 rte_ring_reset(struct rte_ring *r)
 {
@@ -114,10 +130,10 @@ rte_ring_init(struct rte_ring *r, const char *name, unsigned count,
 	return 0;
 }
 
-/* create the ring */
+/* create the ring for a given element size */
 struct rte_ring *
-rte_ring_create(const char *name, unsigned count, int socket_id,
-		unsigned flags)
+rte_ring_create_elem(const char *name, unsigned int esize, unsigned int count,
+		int socket_id, unsigned int flags)
 {
 	char mz_name[RTE_MEMZONE_NAMESIZE];
 	struct rte_ring *r;
@@ -135,7 +151,7 @@ rte_ring_create(const char *name, unsigned count, int socket_id,
 	if (flags & RING_F_EXACT_SZ)
 		count = rte_align32pow2(count + 1);
 
-	ring_size = rte_ring_get_memsize(count);
+	ring_size = rte_ring_get_memsize_elem(esize, count);
 	if (ring_size < 0) {
 		rte_errno = ring_size;
 		return NULL;
@@ -182,6 +198,15 @@ rte_ring_create(const char *name, unsigned count, int socket_id,
 	return r;
 }
 
+/* create the ring */
+struct rte_ring *
+rte_ring_create(const char *name, unsigned count, int socket_id,
+		unsigned flags)
+{
+	return rte_ring_create_elem(name, sizeof(void *), count, socket_id,
+		flags);
+}
+
 /* free the ring */
 void
 rte_ring_free(struct rte_ring *r)
diff --git a/lib/librte_ring/rte_ring.h b/lib/librte_ring/rte_ring.h
index 2a9f768a1..18fc5d845 100644
--- a/lib/librte_ring/rte_ring.h
+++ b/lib/librte_ring/rte_ring.h
@@ -216,6 +216,7 @@ int rte_ring_init(struct rte_ring *r, const char *name, unsigned count,
  */
 struct rte_ring *rte_ring_create(const char *name, unsigned count,
 				 int socket_id, unsigned flags);
+
 /**
  * De-allocate all memory used by the ring.
  *
diff --git a/lib/librte_ring/rte_ring_elem.h b/lib/librte_ring/rte_ring_elem.h
new file mode 100644
index 000000000..15d79bf2a
--- /dev/null
+++ b/lib/librte_ring/rte_ring_elem.h
@@ -0,0 +1,1003 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ *
+ * Copyright (c) 2019 Arm Limited
+ * Copyright (c) 2010-2017 Intel Corporation
+ * Copyright (c) 2007-2009 Kip Macy kmacy@freebsd.org
+ * All rights reserved.
+ * Derived from FreeBSD's bufring.h
+ * Used as BSD-3 Licensed with permission from Kip Macy.
+ */
+
+#ifndef _RTE_RING_ELEM_H_
+#define _RTE_RING_ELEM_H_
+
+/**
+ * @file
+ * RTE Ring with user defined element size
+ */
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#include <stdio.h>
+#include <stdint.h>
+#include <string.h>
+#include <sys/queue.h>
+#include <errno.h>
+#include <rte_common.h>
+#include <rte_config.h>
+#include <rte_memory.h>
+#include <rte_lcore.h>
+#include <rte_atomic.h>
+#include <rte_branch_prediction.h>
+#include <rte_memzone.h>
+#include <rte_pause.h>
+
+#include "rte_ring.h"
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * Calculate the memory size needed for a ring with given element size
+ *
+ * This function returns the number of bytes needed for a ring, given
+ * the number of elements in it and the size of the element. This value
+ * is the sum of the size of the structure rte_ring and the size of the
+ * memory needed for storing the elements. The value is aligned to a cache
+ * line size.
+ *
+ * @param esize
+ *   The size of ring element, in bytes. It must be a multiple of 4.
+ * @param count
+ *   The number of elements in the ring (must be a power of 2).
+ * @return
+ *   - The memory size needed for the ring on success.
+ *   - -EINVAL - esize is not a multiple of 4 or count provided is not a
+ *		 power of 2.
+ */
+__rte_experimental
+ssize_t rte_ring_get_memsize_elem(unsigned int esize, unsigned int count);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * Create a new ring named *name* that stores elements with given size.
+ *
+ * This function uses ``memzone_reserve()`` to allocate memory. Then it
+ * calls rte_ring_init() to initialize an empty ring.
+ *
+ * The new ring size is set to *count*, which must be a power of
+ * two. Water marking is disabled by default. The real usable ring size
+ * is *count-1* instead of *count* to differentiate a full ring from an
+ * empty ring.
+ *
+ * The ring is added in RTE_TAILQ_RING list.
+ *
+ * @param name
+ *   The name of the ring.
+ * @param esize
+ *   The size of ring element, in bytes. It must be a multiple of 4.
+ * @param count
+ *   The number of elements in the ring (must be a power of 2).
+ * @param socket_id
+ *   The *socket_id* argument is the socket identifier in case of
+ *   NUMA. The value can be *SOCKET_ID_ANY* if there is no NUMA
+ *   constraint for the reserved zone.
+ * @param flags
+ *   An OR of the following:
+ *    - RING_F_SP_ENQ: If this flag is set, the default behavior when
+ *      using ``rte_ring_enqueue()`` or ``rte_ring_enqueue_bulk()``
+ *      is "single-producer". Otherwise, it is "multi-producers".
+ *    - RING_F_SC_DEQ: If this flag is set, the default behavior when
+ *      using ``rte_ring_dequeue()`` or ``rte_ring_dequeue_bulk()``
+ *      is "single-consumer". Otherwise, it is "multi-consumers".
+ * @return
+ *   On success, a pointer to the newly allocated ring. NULL on error with
+ *    rte_errno set appropriately. Possible errno values include:
+ *    - E_RTE_NO_CONFIG - function could not get pointer to rte_config structure
+ *    - E_RTE_SECONDARY - function was called from a secondary process instance
+ *    - EINVAL - esize is not a multiple of 4 or count provided is not a
+ *		 power of 2.
+ *    - ENOSPC - the maximum number of memzones has already been allocated
+ *    - EEXIST - a memzone with the same name already exists
+ *    - ENOMEM - no appropriate memory area found in which to create memzone
+ */
+__rte_experimental
+struct rte_ring *rte_ring_create_elem(const char *name, unsigned int esize,
+			unsigned int count, int socket_id, unsigned int flags);
+
+static __rte_always_inline void
+enqueue_elems_32(struct rte_ring *r, const uint32_t size, uint32_t idx,
+		const void *obj_table, uint32_t n)
+{
+	unsigned int i;
+	uint32_t *ring = (uint32_t *)&r[1];
+	const uint32_t *obj = (const uint32_t *)obj_table;
+	if (likely(idx + n < size)) {
+		for (i = 0; i < (n & ~0x7); i += 8, idx += 8) {
+			ring[idx] = obj[i];
+			ring[idx + 1] = obj[i + 1];
+			ring[idx + 2] = obj[i + 2];
+			ring[idx + 3] = obj[i + 3];
+			ring[idx + 4] = obj[i + 4];
+			ring[idx + 5] = obj[i + 5];
+			ring[idx + 6] = obj[i + 6];
+			ring[idx + 7] = obj[i + 7];
+		}
+		switch (n & 0x7) {
+		case 7:
+			ring[idx++] = obj[i++]; /* fallthrough */
+		case 6:
+			ring[idx++] = obj[i++]; /* fallthrough */
+		case 5:
+			ring[idx++] = obj[i++]; /* fallthrough */
+		case 4:
+			ring[idx++] = obj[i++]; /* fallthrough */
+		case 3:
+			ring[idx++] = obj[i++]; /* fallthrough */
+		case 2:
+			ring[idx++] = obj[i++]; /* fallthrough */
+		case 1:
+			ring[idx++] = obj[i++];
+		}
+	} else {
+		for (i = 0; idx < size; i++, idx++)
+			ring[idx] = obj[i];
+		/* Start at the beginning */
+		for (idx = 0; i < n; i++, idx++)
+			ring[idx] = obj[i];
+	}
+}
+
+static __rte_always_inline void
+enqueue_elems_64(struct rte_ring *r, uint32_t prod_head,
+		const void *obj_table, uint32_t n)
+{
+	unsigned int i;
+	const uint32_t size = r->size;
+	uint32_t idx = prod_head & r->mask;
+	uint64_t *ring = (uint64_t *)&r[1];
+	const uint64_t *obj = (const uint64_t *)obj_table;
+	if (likely(idx + n < size)) {
+		for (i = 0; i < (n & ~0x3); i += 4, idx += 4) {
+			ring[idx] = obj[i];
+			ring[idx + 1] = obj[i + 1];
+			ring[idx + 2] = obj[i + 2];
+			ring[idx + 3] = obj[i + 3];
+		}
+		switch (n & 0x3) {
+		case 3:
+			ring[idx++] = obj[i++]; /* fallthrough */
+		case 2:
+			ring[idx++] = obj[i++]; /* fallthrough */
+		case 1:
+			ring[idx++] = obj[i++];
+		}
+	} else {
+		for (i = 0; idx < size; i++, idx++)
+			ring[idx] = obj[i];
+		/* Start at the beginning */
+		for (idx = 0; i < n; i++, idx++)
+			ring[idx] = obj[i];
+	}
+}
+
+static __rte_always_inline void
+enqueue_elems_128(struct rte_ring *r, uint32_t prod_head,
+		const void *obj_table, uint32_t n)
+{
+	unsigned int i;
+	const uint32_t size = r->size;
+	uint32_t idx = prod_head & r->mask;
+	rte_int128_t *ring = (rte_int128_t *)&r[1];
+	const rte_int128_t *obj = (const rte_int128_t *)obj_table;
+	if (likely(idx + n < size)) {
+		for (i = 0; i < (n & ~0x1); i += 2, idx += 2)
+			memcpy((void *)(ring + idx),
+				(const void *)(obj + i), 32);
+		switch (n & 0x1) {
+		case 1:
+			memcpy((void *)(ring + idx),
+				(const void *)(obj + i), 16);
+		}
+	} else {
+		for (i = 0; idx < size; i++, idx++)
+			memcpy((void *)(ring + idx),
+				(const void *)(obj + i), 16);
+		/* Start at the beginning */
+		for (idx = 0; i < n; i++, idx++)
+			memcpy((void *)(ring + idx),
+				(const void *)(obj + i), 16);
+	}
+}
+
+/* the actual enqueue of elements on the ring.
+ * Placed here since identical code needed in both
+ * single and multi producer enqueue functions.
+ */
+static __rte_always_inline void
+enqueue_elems(struct rte_ring *r, uint32_t prod_head, const void *obj_table,
+		uint32_t esize, uint32_t num)
+{
+	/* 8B and 16B copies implemented individually to retain
+	 * the current performance.
+	 */
+	if (esize == 8)
+		enqueue_elems_64(r, prod_head, obj_table, num);
+	else if (esize == 16)
+		enqueue_elems_128(r, prod_head, obj_table, num);
+	else {
+		uint32_t idx, scale, nr_idx, nr_num, nr_size;
+
+		/* Normalize to uint32_t */
+		scale = esize / sizeof(uint32_t);
+		nr_num = num * scale;
+		idx = prod_head & r->mask;
+		nr_idx = idx * scale;
+		nr_size = r->size * scale;
+		enqueue_elems_32(r, nr_size, nr_idx, obj_table, nr_num);
+	}
+}
+
+static __rte_always_inline void
+dequeue_elems_32(struct rte_ring *r, const uint32_t size, uint32_t idx,
+		void *obj_table, uint32_t n)
+{
+	unsigned int i;
+	uint32_t *ring = (uint32_t *)&r[1];
+	uint32_t *obj = (uint32_t *)obj_table;
+	if (likely(idx + n < size)) {
+		for (i = 0; i < (n & ~0x7); i += 8, idx += 8) {
+			obj[i] = ring[idx];
+			obj[i + 1] = ring[idx + 1];
+			obj[i + 2] = ring[idx + 2];
+			obj[i + 3] = ring[idx + 3];
+			obj[i + 4] = ring[idx + 4];
+			obj[i + 5] = ring[idx + 5];
+			obj[i + 6] = ring[idx + 6];
+			obj[i + 7] = ring[idx + 7];
+		}
+		switch (n & 0x7) {
+		case 7:
+			obj[i++] = ring[idx++]; /* fallthrough */
+		case 6:
+			obj[i++] = ring[idx++]; /* fallthrough */
+		case 5:
+			obj[i++] = ring[idx++]; /* fallthrough */
+		case 4:
+			obj[i++] = ring[idx++]; /* fallthrough */
+		case 3:
+			obj[i++] = ring[idx++]; /* fallthrough */
+		case 2:
+			obj[i++] = ring[idx++]; /* fallthrough */
+		case 1:
+			obj[i++] = ring[idx++];
+		}
+	} else {
+		for (i = 0; idx < size; i++, idx++)
+			obj[i] = ring[idx];
+		/* Start at the beginning */
+		for (idx = 0; i < n; i++, idx++)
+			obj[i] = ring[idx];
+	}
+}
+
+static __rte_always_inline void
+dequeue_elems_64(struct rte_ring *r, uint32_t prod_head,
+		void *obj_table, uint32_t n)
+{
+	unsigned int i;
+	const uint32_t size = r->size;
+	uint32_t idx = prod_head & r->mask;
+	uint64_t *ring = (uint64_t *)&r[1];
+	uint64_t *obj = (uint64_t *)obj_table;
+	if (likely(idx + n < size)) {
+		for (i = 0; i < (n & ~0x3); i += 4, idx += 4) {
+			obj[i] = ring[idx];
+			obj[i + 1] = ring[idx + 1];
+			obj[i + 2] = ring[idx + 2];
+			obj[i + 3] = ring[idx + 3];
+		}
+		switch (n & 0x3) {
+		case 3:
+			obj[i++] = ring[idx++]; /* fallthrough */
+		case 2:
+			obj[i++] = ring[idx++]; /* fallthrough */
+		case 1:
+			obj[i++] = ring[idx++];
+		}
+	} else {
+		for (i = 0; idx < size; i++, idx++)
+			obj[i] = ring[idx];
+		/* Start at the beginning */
+		for (idx = 0; i < n; i++, idx++)
+			obj[i] = ring[idx];
+	}
+}
+
+static __rte_always_inline void
+dequeue_elems_128(struct rte_ring *r, uint32_t prod_head,
+		void *obj_table, uint32_t n)
+{
+	unsigned int i;
+	const uint32_t size = r->size;
+	uint32_t idx = prod_head & r->mask;
+	rte_int128_t *ring = (rte_int128_t *)&r[1];
+	rte_int128_t *obj = (rte_int128_t *)obj_table;
+	if (likely(idx + n < size)) {
+		for (i = 0; i < (n & ~0x1); i += 2, idx += 2)
+			memcpy((void *)(obj + i), (void *)(ring + idx), 32);
+		switch (n & 0x1) {
+		case 1:
+			memcpy((void *)(obj + i), (void *)(ring + idx), 16);
+		}
+	} else {
+		for (i = 0; idx < size; i++, idx++)
+			memcpy((void *)(obj + i), (void *)(ring + idx), 16);
+		/* Start at the beginning */
+		for (idx = 0; i < n; i++, idx++)
+			memcpy((void *)(obj + i), (void *)(ring + idx), 16);
+	}
+}
+
+/* the actual dequeue of elements from the ring.
+ * Placed here since identical code needed in both
+ * single and multi consumer dequeue functions.
+ */
+static __rte_always_inline void
+dequeue_elems(struct rte_ring *r, uint32_t cons_head, void *obj_table,
+		uint32_t esize, uint32_t num)
+{
+	/* 8B and 16B copies implemented individually to retain
+	 * the current performance.
+	 */
+	if (esize == 8)
+		dequeue_elems_64(r, cons_head, obj_table, num);
+	else if (esize == 16)
+		dequeue_elems_128(r, cons_head, obj_table, num);
+	else {
+		uint32_t idx, scale, nr_idx, nr_num, nr_size;
+
+		/* Normalize to uint32_t */
+		scale = esize / sizeof(uint32_t);
+		nr_num = num * scale;
+		idx = cons_head & r->mask;
+		nr_idx = idx * scale;
+		nr_size = r->size * scale;
+		dequeue_elems_32(r, nr_size, nr_idx, obj_table, nr_num);
+	}
+}
+
+/* Between two loads, the CPU may reorder accesses on weakly ordered
+ * architectures (powerpc/arm).
+ * There are 2 choices for the users:
+ * 1. use the rmb() memory barrier
+ * 2. use one-direction load_acquire/store_release barriers, enabled by
+ *    CONFIG_RTE_USE_C11_MEM_MODEL=y
+ * The choice depends on performance test results.
+ * By default, the common functions from rte_ring_generic.h are used.
+ */
+#ifdef RTE_USE_C11_MEM_MODEL
+#include "rte_ring_c11_mem.h"
+#else
+#include "rte_ring_generic.h"
+#endif
+
+/**
+ * @internal Enqueue several objects on the ring
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of objects.
+ * @param esize
+ *   The size of ring element, in bytes. It must be a multiple of 4.
+ *   This must be the same value used while creating the ring. Otherwise
+ *   the results are undefined.
+ * @param n
+ *   The number of objects to add in the ring from the obj_table.
+ * @param behavior
+ *   RTE_RING_QUEUE_FIXED:    Enqueue a fixed number of items to a ring
+ *   RTE_RING_QUEUE_VARIABLE: Enqueue as many items as possible to the ring
+ * @param is_sp
+ *   Indicates whether to use single producer or multi-producer head update
+ * @param free_space
+ *   returns the amount of space after the enqueue operation has finished
+ * @return
+ *   Actual number of objects enqueued.
+ *   If behavior == RTE_RING_QUEUE_FIXED, this will be 0 or n only.
+ */
+static __rte_always_inline unsigned int
+__rte_ring_do_enqueue_elem(struct rte_ring *r, const void *obj_table,
+		unsigned int esize, unsigned int n,
+		enum rte_ring_queue_behavior behavior, unsigned int is_sp,
+		unsigned int *free_space)
+{
+	uint32_t prod_head, prod_next;
+	uint32_t free_entries;
+
+	n = __rte_ring_move_prod_head(r, is_sp, n, behavior,
+			&prod_head, &prod_next, &free_entries);
+	if (n == 0)
+		goto end;
+
+	enqueue_elems(r, prod_head, obj_table, esize, n);
+
+	update_tail(&r->prod, prod_head, prod_next, is_sp, 1);
+end:
+	if (free_space != NULL)
+		*free_space = free_entries - n;
+	return n;
+}
+
+/**
+ * @internal Dequeue several objects from the ring
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of objects.
+ * @param esize
+ *   The size of ring element, in bytes. It must be a multiple of 4.
+ *   This must be the same value used while creating the ring. Otherwise
+ *   the results are undefined.
+ * @param n
+ *   The number of objects to pull from the ring.
+ * @param behavior
+ *   RTE_RING_QUEUE_FIXED:    Dequeue a fixed number of items from a ring
+ *   RTE_RING_QUEUE_VARIABLE: Dequeue as many items as possible from ring
+ * @param is_sc
+ *   Indicates whether to use single consumer or multi-consumer head update
+ * @param available
+ *   returns the number of remaining ring entries after the dequeue has finished
+ * @return
+ *   - Actual number of objects dequeued.
+ *     If behavior == RTE_RING_QUEUE_FIXED, this will be 0 or n only.
+ */
+static __rte_always_inline unsigned int
+__rte_ring_do_dequeue_elem(struct rte_ring *r, void *obj_table,
+		unsigned int esize, unsigned int n,
+		enum rte_ring_queue_behavior behavior, unsigned int is_sc,
+		unsigned int *available)
+{
+	uint32_t cons_head, cons_next;
+	uint32_t entries;
+
+	n = __rte_ring_move_cons_head(r, (int)is_sc, n, behavior,
+			&cons_head, &cons_next, &entries);
+	if (n == 0)
+		goto end;
+
+	dequeue_elems(r, cons_head, obj_table, esize, n);
+
+	update_tail(&r->cons, cons_head, cons_next, is_sc, 0);
+
+end:
+	if (available != NULL)
+		*available = entries - n;
+	return n;
+}
+
+/**
+ * Enqueue several objects on the ring (multi-producers safe).
+ *
+ * This function uses a "compare and set" instruction to move the
+ * producer index atomically.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of objects.
+ * @param esize
+ *   The size of ring element, in bytes. It must be a multiple of 4.
+ *   This must be the same value used while creating the ring. Otherwise
+ *   the results are undefined.
+ * @param n
+ *   The number of objects to add in the ring from the obj_table.
+ * @param free_space
+ *   if non-NULL, returns the amount of space in the ring after the
+ *   enqueue operation has finished.
+ * @return
+ *   The number of objects enqueued, either 0 or n
+ */
+static __rte_always_inline unsigned int
+rte_ring_mp_enqueue_bulk_elem(struct rte_ring *r, const void *obj_table,
+		unsigned int esize, unsigned int n, unsigned int *free_space)
+{
+	return __rte_ring_do_enqueue_elem(r, obj_table, esize, n,
+			RTE_RING_QUEUE_FIXED, __IS_MP, free_space);
+}
+
+/**
+ * Enqueue several objects on a ring
+ *
+ * @warning This API is NOT multi-producers safe
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of objects.
+ * @param esize
+ *   The size of ring element, in bytes. It must be a multiple of 4.
+ *   This must be the same value used while creating the ring. Otherwise
+ *   the results are undefined.
+ * @param n
+ *   The number of objects to add in the ring from the obj_table.
+ * @param free_space
+ *   if non-NULL, returns the amount of space in the ring after the
+ *   enqueue operation has finished.
+ * @return
+ *   The number of objects enqueued, either 0 or n
+ */
+static __rte_always_inline unsigned int
+rte_ring_sp_enqueue_bulk_elem(struct rte_ring *r, const void *obj_table,
+		unsigned int esize, unsigned int n, unsigned int *free_space)
+{
+	return __rte_ring_do_enqueue_elem(r, obj_table, esize, n,
+			RTE_RING_QUEUE_FIXED, __IS_SP, free_space);
+}
+
+/**
+ * Enqueue several objects on a ring.
+ *
+ * This function calls the multi-producer or the single-producer
+ * version depending on the default behavior that was specified at
+ * ring creation time (see flags).
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of objects.
+ * @param esize
+ *   The size of ring element, in bytes. It must be a multiple of 4.
+ *   This must be the same value used while creating the ring. Otherwise
+ *   the results are undefined.
+ * @param n
+ *   The number of objects to add in the ring from the obj_table.
+ * @param free_space
+ *   if non-NULL, returns the amount of space in the ring after the
+ *   enqueue operation has finished.
+ * @return
+ *   The number of objects enqueued, either 0 or n
+ */
+static __rte_always_inline unsigned int
+rte_ring_enqueue_bulk_elem(struct rte_ring *r, const void *obj_table,
+		unsigned int esize, unsigned int n, unsigned int *free_space)
+{
+	return __rte_ring_do_enqueue_elem(r, obj_table, esize, n,
+			RTE_RING_QUEUE_FIXED, r->prod.single, free_space);
+}
+
+/**
+ * Enqueue one object on a ring (multi-producers safe).
+ *
+ * This function uses a "compare and set" instruction to move the
+ * producer index atomically.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj
+ *   A pointer to the object to be added.
+ * @param esize
+ *   The size of ring element, in bytes. It must be a multiple of 4.
+ *   This must be the same value used while creating the ring. Otherwise
+ *   the results are undefined.
+ * @return
+ *   - 0: Success; objects enqueued.
+ *   - -ENOBUFS: Not enough room in the ring to enqueue; no object is enqueued.
+ */
+static __rte_always_inline int
+rte_ring_mp_enqueue_elem(struct rte_ring *r, void *obj, unsigned int esize)
+{
+	return rte_ring_mp_enqueue_bulk_elem(r, obj, esize, 1, NULL) ? 0 :
+								-ENOBUFS;
+}
+
+/**
+ * Enqueue one object on a ring
+ *
+ * @warning This API is NOT multi-producers safe
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj
+ *   A pointer to the object to be added.
+ * @param esize
+ *   The size of ring element, in bytes. It must be a multiple of 4.
+ *   This must be the same value used while creating the ring. Otherwise
+ *   the results are undefined.
+ * @return
+ *   - 0: Success; objects enqueued.
+ *   - -ENOBUFS: Not enough room in the ring to enqueue; no object is enqueued.
+ */
+static __rte_always_inline int
+rte_ring_sp_enqueue_elem(struct rte_ring *r, void *obj, unsigned int esize)
+{
+	return rte_ring_sp_enqueue_bulk_elem(r, obj, esize, 1, NULL) ? 0 :
+								-ENOBUFS;
+}
+
+/**
+ * Enqueue one object on a ring.
+ *
+ * This function calls the multi-producer or the single-producer
+ * version, depending on the default behaviour that was specified at
+ * ring creation time (see flags).
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj
+ *   A pointer to the object to be added.
+ * @param esize
+ *   The size of ring element, in bytes. It must be a multiple of 4.
+ *   This must be the same value used while creating the ring. Otherwise
+ *   the results are undefined.
+ * @return
+ *   - 0: Success; objects enqueued.
+ *   - -ENOBUFS: Not enough room in the ring to enqueue; no object is enqueued.
+ */
+static __rte_always_inline int
+rte_ring_enqueue_elem(struct rte_ring *r, void *obj, unsigned int esize)
+{
+	return rte_ring_enqueue_bulk_elem(r, obj, esize, 1, NULL) ? 0 :
+								-ENOBUFS;
+}
+
+/**
+ * Dequeue several objects from a ring (multi-consumers safe).
+ *
+ * This function uses a "compare and set" instruction to move the
+ * consumer index atomically.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of objects that will be filled.
+ * @param esize
+ *   The size of ring element, in bytes. It must be a multiple of 4.
+ *   This must be the same value used while creating the ring. Otherwise
+ *   the results are undefined.
+ * @param n
+ *   The number of objects to dequeue from the ring to the obj_table.
+ * @param available
+ *   If non-NULL, returns the number of remaining ring entries after the
+ *   dequeue has finished.
+ * @return
+ *   The number of objects dequeued, either 0 or n
+ */
+static __rte_always_inline unsigned int
+rte_ring_mc_dequeue_bulk_elem(struct rte_ring *r, void *obj_table,
+		unsigned int esize, unsigned int n, unsigned int *available)
+{
+	return __rte_ring_do_dequeue_elem(r, obj_table, esize, n,
+				RTE_RING_QUEUE_FIXED, __IS_MC, available);
+}
+
+/**
+ * Dequeue several objects from a ring (NOT multi-consumers safe).
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of objects that will be filled.
+ * @param esize
+ *   The size of ring element, in bytes. It must be a multiple of 4.
+ *   This must be the same value used while creating the ring. Otherwise
+ *   the results are undefined.
+ * @param n
+ *   The number of objects to dequeue from the ring to the obj_table,
+ *   must be strictly positive.
+ * @param available
+ *   If non-NULL, returns the number of remaining ring entries after the
+ *   dequeue has finished.
+ * @return
+ *   The number of objects dequeued, either 0 or n
+ */
+static __rte_always_inline unsigned int
+rte_ring_sc_dequeue_bulk_elem(struct rte_ring *r, void *obj_table,
+		unsigned int esize, unsigned int n, unsigned int *available)
+{
+	return __rte_ring_do_dequeue_elem(r, obj_table, esize, n,
+			RTE_RING_QUEUE_FIXED, __IS_SC, available);
+}
+
+/**
+ * Dequeue several objects from a ring.
+ *
+ * This function calls the multi-consumers or the single-consumer
+ * version, depending on the default behaviour that was specified at
+ * ring creation time (see flags).
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of objects that will be filled.
+ * @param esize
+ *   The size of ring element, in bytes. It must be a multiple of 4.
+ *   This must be the same value used while creating the ring. Otherwise
+ *   the results are undefined.
+ * @param n
+ *   The number of objects to dequeue from the ring to the obj_table.
+ * @param available
+ *   If non-NULL, returns the number of remaining ring entries after the
+ *   dequeue has finished.
+ * @return
+ *   The number of objects dequeued, either 0 or n
+ */
+static __rte_always_inline unsigned int
+rte_ring_dequeue_bulk_elem(struct rte_ring *r, void *obj_table,
+		unsigned int esize, unsigned int n, unsigned int *available)
+{
+	return __rte_ring_do_dequeue_elem(r, obj_table, esize, n,
+			RTE_RING_QUEUE_FIXED, r->cons.single, available);
+}
+
+/**
+ * Dequeue one object from a ring (multi-consumers safe).
+ *
+ * This function uses a "compare and set" instruction to move the
+ * consumer index atomically.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_p
+ *   A pointer to the object that will be filled.
+ * @param esize
+ *   The size of ring element, in bytes. It must be a multiple of 4.
+ *   This must be the same value used while creating the ring. Otherwise
+ *   the results are undefined.
+ * @return
+ *   - 0: Success; objects dequeued.
+ *   - -ENOENT: Not enough entries in the ring to dequeue; no object is
+ *     dequeued.
+ */
+static __rte_always_inline int
+rte_ring_mc_dequeue_elem(struct rte_ring *r, void *obj_p,
+				unsigned int esize)
+{
+	return rte_ring_mc_dequeue_bulk_elem(r, obj_p, esize, 1, NULL) ? 0 :
+								-ENOENT;
+}
+
+/**
+ * Dequeue one object from a ring (NOT multi-consumers safe).
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_p
+ *   A pointer to the object that will be filled.
+ * @param esize
+ *   The size of ring element, in bytes. It must be a multiple of 4.
+ *   This must be the same value used while creating the ring. Otherwise
+ *   the results are undefined.
+ * @return
+ *   - 0: Success; objects dequeued.
+ *   - -ENOENT: Not enough entries in the ring to dequeue, no object is
+ *     dequeued.
+ */
+static __rte_always_inline int
+rte_ring_sc_dequeue_elem(struct rte_ring *r, void *obj_p,
+				unsigned int esize)
+{
+	return rte_ring_sc_dequeue_bulk_elem(r, obj_p, esize, 1, NULL) ? 0 :
+								-ENOENT;
+}
+
+/**
+ * Dequeue one object from a ring.
+ *
+ * This function calls the multi-consumers or the single-consumer
+ * version depending on the default behaviour that was specified at
+ * ring creation time (see flags).
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_p
+ *   A pointer to the object that will be filled.
+ * @param esize
+ *   The size of ring element, in bytes. It must be a multiple of 4.
+ *   This must be the same value used while creating the ring. Otherwise
+ *   the results are undefined.
+ * @return
+ *   - 0: Success, objects dequeued.
+ *   - -ENOENT: Not enough entries in the ring to dequeue, no object is
+ *     dequeued.
+ */
+static __rte_always_inline int
+rte_ring_dequeue_elem(struct rte_ring *r, void *obj_p, unsigned int esize)
+{
+	return rte_ring_dequeue_bulk_elem(r, obj_p, esize, 1, NULL) ? 0 :
+								-ENOENT;
+}
+
+/**
+ * Enqueue several objects on the ring (multi-producers safe).
+ *
+ * This function uses a "compare and set" instruction to move the
+ * producer index atomically.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of objects.
+ * @param esize
+ *   The size of ring element, in bytes. It must be a multiple of 4.
+ *   This must be the same value used while creating the ring. Otherwise
+ *   the results are undefined.
+ * @param n
+ *   The number of objects to add in the ring from the obj_table.
+ * @param free_space
+ *   If non-NULL, returns the amount of space in the ring after the
+ *   enqueue operation has finished.
+ * @return
+ *   - n: Actual number of objects enqueued.
+ */
+static __rte_always_inline unsigned
+rte_ring_mp_enqueue_burst_elem(struct rte_ring *r, const void *obj_table,
+		unsigned int esize, unsigned int n, unsigned int *free_space)
+{
+	return __rte_ring_do_enqueue_elem(r, obj_table, esize, n,
+			RTE_RING_QUEUE_VARIABLE, __IS_MP, free_space);
+}
+
+/**
+ * Enqueue several objects on a ring
+ *
+ * @warning This API is NOT multi-producers safe
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of objects.
+ * @param esize
+ *   The size of ring element, in bytes. It must be a multiple of 4.
+ *   This must be the same value used while creating the ring. Otherwise
+ *   the results are undefined.
+ * @param n
+ *   The number of objects to add in the ring from the obj_table.
+ * @param free_space
+ *   If non-NULL, returns the amount of space in the ring after the
+ *   enqueue operation has finished.
+ * @return
+ *   - n: Actual number of objects enqueued.
+ */
+static __rte_always_inline unsigned
+rte_ring_sp_enqueue_burst_elem(struct rte_ring *r, const void *obj_table,
+		unsigned int esize, unsigned int n, unsigned int *free_space)
+{
+	return __rte_ring_do_enqueue_elem(r, obj_table, esize, n,
+			RTE_RING_QUEUE_VARIABLE, __IS_SP, free_space);
+}
+
+/**
+ * Enqueue several objects on a ring.
+ *
+ * This function calls the multi-producer or the single-producer
+ * version depending on the default behavior that was specified at
+ * ring creation time (see flags).
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of objects.
+ * @param esize
+ *   The size of ring element, in bytes. It must be a multiple of 4.
+ *   This must be the same value used while creating the ring. Otherwise
+ *   the results are undefined.
+ * @param n
+ *   The number of objects to add in the ring from the obj_table.
+ * @param free_space
+ *   If non-NULL, returns the amount of space in the ring after the
+ *   enqueue operation has finished.
+ * @return
+ *   - n: Actual number of objects enqueued.
+ */
+static __rte_always_inline unsigned
+rte_ring_enqueue_burst_elem(struct rte_ring *r, const void *obj_table,
+		unsigned int esize, unsigned int n, unsigned int *free_space)
+{
+	return __rte_ring_do_enqueue_elem(r, obj_table, esize, n,
+			RTE_RING_QUEUE_VARIABLE, r->prod.single, free_space);
+}
+
+/**
+ * Dequeue several objects from a ring (multi-consumers safe). When more
+ * objects are requested than are available, only the available objects are
+ * dequeued.
+ *
+ * This function uses a "compare and set" instruction to move the
+ * consumer index atomically.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of objects that will be filled.
+ * @param esize
+ *   The size of ring element, in bytes. It must be a multiple of 4.
+ *   This must be the same value used while creating the ring. Otherwise
+ *   the results are undefined.
+ * @param n
+ *   The number of objects to dequeue from the ring to the obj_table.
+ * @param available
+ *   If non-NULL, returns the number of remaining ring entries after the
+ *   dequeue has finished.
+ * @return
+ *   - n: Actual number of objects dequeued, 0 if ring is empty
+ */
+static __rte_always_inline unsigned
+rte_ring_mc_dequeue_burst_elem(struct rte_ring *r, void *obj_table,
+		unsigned int esize, unsigned int n, unsigned int *available)
+{
+	return __rte_ring_do_dequeue_elem(r, obj_table, esize, n,
+			RTE_RING_QUEUE_VARIABLE, __IS_MC, available);
+}
+
+/**
+ * Dequeue several objects from a ring (NOT multi-consumers safe). When more
+ * objects are requested than are available, only the available objects are
+ * dequeued.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of objects that will be filled.
+ * @param esize
+ *   The size of ring element, in bytes. It must be a multiple of 4.
+ *   This must be the same value used while creating the ring. Otherwise
+ *   the results are undefined.
+ * @param n
+ *   The number of objects to dequeue from the ring to the obj_table.
+ * @param available
+ *   If non-NULL, returns the number of remaining ring entries after the
+ *   dequeue has finished.
+ * @return
+ *   - n: Actual number of objects dequeued, 0 if ring is empty
+ */
+static __rte_always_inline unsigned
+rte_ring_sc_dequeue_burst_elem(struct rte_ring *r, void *obj_table,
+		unsigned int esize, unsigned int n, unsigned int *available)
+{
+	return __rte_ring_do_dequeue_elem(r, obj_table, esize, n,
+			RTE_RING_QUEUE_VARIABLE, __IS_SC, available);
+}
+
+/**
+ * Dequeue multiple objects from a ring up to a maximum number.
+ *
+ * This function calls the multi-consumers or the single-consumer
+ * version, depending on the default behaviour that was specified at
+ * ring creation time (see flags).
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of objects that will be filled.
+ * @param esize
+ *   The size of ring element, in bytes. It must be a multiple of 4.
+ *   This must be the same value used while creating the ring. Otherwise
+ *   the results are undefined.
+ * @param n
+ *   The number of objects to dequeue from the ring to the obj_table.
+ * @param available
+ *   If non-NULL, returns the number of remaining ring entries after the
+ *   dequeue has finished.
+ * @return
+ *   - Number of objects dequeued
+ */
+static __rte_always_inline unsigned int
+rte_ring_dequeue_burst_elem(struct rte_ring *r, void *obj_table,
+		unsigned int esize, unsigned int n, unsigned int *available)
+{
+	return __rte_ring_do_dequeue_elem(r, obj_table, esize, n,
+				RTE_RING_QUEUE_VARIABLE,
+				r->cons.single, available);
+}
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* _RTE_RING_ELEM_H_ */
diff --git a/lib/librte_ring/rte_ring_version.map b/lib/librte_ring/rte_ring_version.map
index 89d84bcf4..7a5328dd5 100644
--- a/lib/librte_ring/rte_ring_version.map
+++ b/lib/librte_ring/rte_ring_version.map
@@ -15,6 +15,8 @@ DPDK_20.0 {
 EXPERIMENTAL {
 	global:
 
+	rte_ring_create_elem;
+	rte_ring_get_memsize_elem;
 	rte_ring_reset;
 
 };
-- 
2.17.1
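
For readers skimming the header above: the _bulk_elem variants return
either 0 or n, while the _burst_elem variants return however many objects
could actually be moved. A minimal single-threaded sketch of that
difference follows. It is a toy model and not the DPDK implementation;
toy_ring, toy_init, toy_enqueue, toy_dequeue, TOY_FIXED and TOY_VARIABLE
are hypothetical names standing in for the ring and for
RTE_RING_QUEUE_FIXED / RTE_RING_QUEUE_VARIABLE.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Toy model only: no atomics, no multi-producer/consumer support. */
#define TOY_SLOTS     8u   /* power of two; usable capacity is TOY_SLOTS - 1 */
#define TOY_MAX_ESIZE 16u  /* largest element size this toy ring accepts */

enum toy_behavior { TOY_FIXED, TOY_VARIABLE };

struct toy_ring {
	uint32_t prod;   /* free-running producer index */
	uint32_t cons;   /* free-running consumer index */
	uint32_t esize;  /* element size in bytes, a multiple of 4 */
	uint8_t data[TOY_SLOTS * TOY_MAX_ESIZE];
};

static void
toy_init(struct toy_ring *r, uint32_t esize)
{
	assert(esize != 0 && esize % 4 == 0 && esize <= TOY_MAX_ESIZE);
	memset(r, 0, sizeof(*r));
	r->esize = esize;
}

/* Returns the number of elements copied into the ring. */
static unsigned int
toy_enqueue(struct toy_ring *r, const void *objs, unsigned int n,
	    enum toy_behavior behavior, unsigned int *free_space)
{
	unsigned int free_entries = (TOY_SLOTS - 1) - (r->prod - r->cons);
	unsigned int i;

	/* FIXED: all or nothing. VARIABLE: as many as fit. */
	if (n > free_entries)
		n = (behavior == TOY_FIXED) ? 0 : free_entries;

	for (i = 0; i < n; i++) {
		uint32_t slot = (r->prod + i) & (TOY_SLOTS - 1);

		memcpy(&r->data[slot * r->esize],
		       (const uint8_t *)objs + i * r->esize, r->esize);
	}
	r->prod += n;
	if (free_space != NULL)
		*free_space = free_entries - n;
	return n;
}

/* Returns the number of elements copied out of the ring. */
static unsigned int
toy_dequeue(struct toy_ring *r, void *objs, unsigned int n,
	    enum toy_behavior behavior, unsigned int *available)
{
	unsigned int entries = r->prod - r->cons;
	unsigned int i;

	if (n > entries)
		n = (behavior == TOY_FIXED) ? 0 : entries;

	for (i = 0; i < n; i++) {
		uint32_t slot = (r->cons + i) & (TOY_SLOTS - 1);

		memcpy((uint8_t *)objs + i * r->esize,
		       &r->data[slot * r->esize], r->esize);
	}
	r->cons += n;
	if (available != NULL)
		*available = entries - n;
	return n;
}
```

With 7 usable slots, asking this toy ring to enqueue 10 elements in
TOY_FIXED mode moves nothing, while the same request in TOY_VARIABLE mode
moves 7. The dequeue side behaves the same way.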


^ permalink raw reply	[flat|nested] 173+ messages in thread

* [dpdk-dev] [PATCH v10 3/6] test/ring: add functional tests for rte_ring_xxx_elem APIs
  2020-01-18 19:32   ` [dpdk-dev] [PATCH v10 " Honnappa Nagarahalli
  2020-01-18 19:32     ` [dpdk-dev] [PATCH v10 1/6] test/ring: use division for cycle count calculation Honnappa Nagarahalli
  2020-01-18 19:32     ` [dpdk-dev] [PATCH v10 2/6] lib/ring: apis to support configurable element size Honnappa Nagarahalli
@ 2020-01-18 19:32     ` Honnappa Nagarahalli
  2020-01-18 19:32     ` [dpdk-dev] [PATCH v10 4/6] test/ring: modify perf test cases to use " Honnappa Nagarahalli
                       ` (3 subsequent siblings)
  6 siblings, 0 replies; 173+ messages in thread
From: Honnappa Nagarahalli @ 2020-01-18 19:32 UTC (permalink / raw)
  To: olivier.matz, sthemmin, jerinj, bruce.richardson, david.marchand,
	pbhagavatula, konstantin.ananyev, yipeng1.wang, drc,
	honnappa.nagarahalli
  Cc: dev, dharmik.thakkar, ruifeng.wang, gavin.hu, nd

Add basic infrastructure to test rte_ring_xxx_elem APIs.
Adjust the existing test cases to test for various ring
element sizes.
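
As a rough illustration of what testing "various ring element sizes"
involves: the helpers in this patch treat an esize of -1 as the legacy
(void *) API and any other value as a byte count that is a multiple of 4.
The standalone sketch below mirrors that arithmetic. The names
LEGACY_ESIZE, elem_bytes, elem_advance and elem_fill are made up for this
example and do not appear in the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LEGACY_ESIZE (-1)  /* hypothetical name for the -1 sentinel */

/* Number of bytes occupied by n elements of the given element size. */
static size_t
elem_bytes(int esize, unsigned int n)
{
	if (esize == LEGACY_ESIZE)
		return (size_t)n * sizeof(void *);
	assert(esize > 0 && esize % 4 == 0);
	return (size_t)n * (size_t)esize;
}

/* Step a buffer pointer forward by n elements. */
static void *
elem_advance(void *base, int esize, unsigned int n)
{
	return (uint8_t *)base + elem_bytes(esize, n);
}

/*
 * Fill a buffer with a recognisable pattern: pointer-sized values for the
 * legacy case, consecutive 32-bit words otherwise.
 */
static void
elem_fill(void *buf, unsigned int count, int esize)
{
	size_t k;

	if (esize == LEGACY_ESIZE) {
		for (k = 0; k < count; k++)
			((void **)buf)[k] = (void *)(uintptr_t)k;
	} else {
		uint32_t *words = buf;
		size_t nwords = elem_bytes(esize, count) / sizeof(uint32_t);

		for (k = 0; k < nwords; k++)
			words[k] = (uint32_t)k;
	}
}
```

One consequence of this layout is that the byte length for comparing a
source and a destination buffer of n elements is elem_bytes(esize, n),
not n itself.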

Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Gavin Hu <gavin.hu@arm.com>
---
 app/test/test_ring.c | 1383 +++++++++++++++++++++---------------------
 app/test/test_ring.h |  187 ++++++
 2 files changed, 875 insertions(+), 695 deletions(-)
 create mode 100644 app/test/test_ring.h

diff --git a/app/test/test_ring.c b/app/test/test_ring.c
index aaf1e70ad..fbcd109b1 100644
--- a/app/test/test_ring.c
+++ b/app/test/test_ring.c
@@ -23,40 +23,29 @@
 #include <rte_branch_prediction.h>
 #include <rte_malloc.h>
 #include <rte_ring.h>
+#include <rte_ring_elem.h>
 #include <rte_random.h>
 #include <rte_errno.h>
 #include <rte_hexdump.h>
 
 #include "test.h"
+#include "test_ring.h"
 
 /*
  * Ring
  * ====
  *
- * #. Basic tests: done on one core:
+ * #. Functional tests. Tests single/bulk/burst, default/SPSC/MPMC,
+ *    legacy/custom element size (4B, 8B, 16B, 20B) APIs.
+ *    Some tests incorporate unaligned addresses for objects.
+ *    The enqueued/dequeued data is validated for correctness.
  *
- *    - Using single producer/single consumer functions:
- *
- *      - Enqueue one object, two objects, MAX_BULK objects
- *      - Dequeue one object, two objects, MAX_BULK objects
- *      - Check that dequeued pointers are correct
- *
- *    - Using multi producers/multi consumers functions:
- *
- *      - Enqueue one object, two objects, MAX_BULK objects
- *      - Dequeue one object, two objects, MAX_BULK objects
- *      - Check that dequeued pointers are correct
- *
- * #. Performance tests.
- *
- * Tests done in test_ring_perf.c
+ * #. Performance tests are in test_ring_perf.c
  */
 
 #define RING_SIZE 4096
 #define MAX_BULK 32
 
-static rte_atomic32_t synchro;
-
 #define	TEST_RING_VERIFY(exp)						\
 	if (!(exp)) {							\
 		printf("error at %s:%d\tcondition " #exp " failed\n",	\
@@ -67,808 +56,812 @@ static rte_atomic32_t synchro;
 
 #define	TEST_RING_FULL_EMTPY_ITER	8
 
-/*
- * helper routine for test_ring_basic
- */
-static int
-test_ring_basic_full_empty(struct rte_ring *r, void * const src[], void *dst[])
+static const int esize[] = {-1, 4, 8, 16, 20};
+
+static void **
+test_ring_inc_ptr(void **obj, int esize, unsigned int n)
 {
-	unsigned i, rand;
-	const unsigned rsz = RING_SIZE - 1;
-
-	printf("Basic full/empty test\n");
-
-	for (i = 0; TEST_RING_FULL_EMTPY_ITER != i; i++) {
-
-		/* random shift in the ring */
-		rand = RTE_MAX(rte_rand() % RING_SIZE, 1UL);
-		printf("%s: iteration %u, random shift: %u;\n",
-		    __func__, i, rand);
-		TEST_RING_VERIFY(rte_ring_enqueue_bulk(r, src, rand,
-				NULL) != 0);
-		TEST_RING_VERIFY(rte_ring_dequeue_bulk(r, dst, rand,
-				NULL) == rand);
-
-		/* fill the ring */
-		TEST_RING_VERIFY(rte_ring_enqueue_bulk(r, src, rsz, NULL) != 0);
-		TEST_RING_VERIFY(0 == rte_ring_free_count(r));
-		TEST_RING_VERIFY(rsz == rte_ring_count(r));
-		TEST_RING_VERIFY(rte_ring_full(r));
-		TEST_RING_VERIFY(0 == rte_ring_empty(r));
-
-		/* empty the ring */
-		TEST_RING_VERIFY(rte_ring_dequeue_bulk(r, dst, rsz,
-				NULL) == rsz);
-		TEST_RING_VERIFY(rsz == rte_ring_free_count(r));
-		TEST_RING_VERIFY(0 == rte_ring_count(r));
-		TEST_RING_VERIFY(0 == rte_ring_full(r));
-		TEST_RING_VERIFY(rte_ring_empty(r));
+	/* Legacy queue APIs? */
+	if (esize == -1)
+		return ((void **)obj) + n;
+	else
+		return (void **)(((uint32_t *)obj) +
+					(n * esize / sizeof(uint32_t)));
+}
 
-		/* check data */
-		TEST_RING_VERIFY(0 == memcmp(src, dst, rsz));
-		rte_ring_dump(stdout, r);
-	}
-	return 0;
+static void
+test_ring_mem_init(void *obj, unsigned int count, int esize)
+{
+	unsigned int i;
+
+	/* Legacy queue APIs? */
+	if (esize == -1)
+		for (i = 0; i < count; i++)
+			((void **)obj)[i] = (void *)(unsigned long)i;
+	else
+		for (i = 0; i < (count * esize / sizeof(uint32_t)); i++)
+			((uint32_t *)obj)[i] = i;
 }
 
-static int
-test_ring_basic(struct rte_ring *r)
+static void
+test_ring_print_test_string(const char *istr, unsigned int api_type, int esize)
 {
-	void **src = NULL, **cur_src = NULL, **dst = NULL, **cur_dst = NULL;
-	int ret;
-	unsigned i, num_elems;
+	printf("\n%s: ", istr);
+
+	if (esize == -1)
+		printf("legacy APIs: ");
+	else
+		printf("elem APIs: element size %dB ", esize);
+
+	if (api_type == TEST_RING_IGNORE_API_TYPE)
+		return;
+
+	if (api_type & TEST_RING_THREAD_DEF)
+		printf(": default enqueue/dequeue: ");
+	else if (api_type & TEST_RING_THREAD_SPSC)
+		printf(": SP/SC: ");
+	else if (api_type & TEST_RING_THREAD_MPMC)
+		printf(": MP/MC: ");
+
+	if (api_type & TEST_RING_ELEM_SINGLE)
+		printf("single\n");
+	else if (api_type & TEST_RING_ELEM_BULK)
+		printf("bulk\n");
+	else if (api_type & TEST_RING_ELEM_BURST)
+		printf("burst\n");
+}
 
-	/* alloc dummy object pointers */
-	src = malloc(RING_SIZE*2*sizeof(void *));
-	if (src == NULL)
-		goto fail;
+/*
+ * Various negative test cases.
+ */
+static int
+test_ring_negative_tests(void)
+{
+	struct rte_ring *rp = NULL;
+	struct rte_ring *rt = NULL;
+	unsigned int i;
 
-	for (i = 0; i < RING_SIZE*2 ; i++) {
-		src[i] = (void *)(unsigned long)i;
-	}
-	cur_src = src;
-
-	/* alloc some room for copied objects */
-	dst = malloc(RING_SIZE*2*sizeof(void *));
-	if (dst == NULL)
-		goto fail;
-
-	memset(dst, 0, RING_SIZE*2*sizeof(void *));
-	cur_dst = dst;
-
-	printf("enqueue 1 obj\n");
-	ret = rte_ring_sp_enqueue_bulk(r, cur_src, 1, NULL);
-	cur_src += 1;
-	if (ret == 0)
-		goto fail;
-
-	printf("enqueue 2 objs\n");
-	ret = rte_ring_sp_enqueue_bulk(r, cur_src, 2, NULL);
-	cur_src += 2;
-	if (ret == 0)
-		goto fail;
-
-	printf("enqueue MAX_BULK objs\n");
-	ret = rte_ring_sp_enqueue_bulk(r, cur_src, MAX_BULK, NULL);
-	cur_src += MAX_BULK;
-	if (ret == 0)
-		goto fail;
-
-	printf("dequeue 1 obj\n");
-	ret = rte_ring_sc_dequeue_bulk(r, cur_dst, 1, NULL);
-	cur_dst += 1;
-	if (ret == 0)
-		goto fail;
-
-	printf("dequeue 2 objs\n");
-	ret = rte_ring_sc_dequeue_bulk(r, cur_dst, 2, NULL);
-	cur_dst += 2;
-	if (ret == 0)
-		goto fail;
-
-	printf("dequeue MAX_BULK objs\n");
-	ret = rte_ring_sc_dequeue_bulk(r, cur_dst, MAX_BULK, NULL);
-	cur_dst += MAX_BULK;
-	if (ret == 0)
-		goto fail;
-
-	/* check data */
-	if (memcmp(src, dst, cur_dst - dst)) {
-		rte_hexdump(stdout, "src", src, cur_src - src);
-		rte_hexdump(stdout, "dst", dst, cur_dst - dst);
-		printf("data after dequeue is not the same\n");
-		goto fail;
-	}
-	cur_src = src;
-	cur_dst = dst;
-
-	printf("enqueue 1 obj\n");
-	ret = rte_ring_mp_enqueue_bulk(r, cur_src, 1, NULL);
-	cur_src += 1;
-	if (ret == 0)
-		goto fail;
-
-	printf("enqueue 2 objs\n");
-	ret = rte_ring_mp_enqueue_bulk(r, cur_src, 2, NULL);
-	cur_src += 2;
-	if (ret == 0)
-		goto fail;
-
-	printf("enqueue MAX_BULK objs\n");
-	ret = rte_ring_mp_enqueue_bulk(r, cur_src, MAX_BULK, NULL);
-	cur_src += MAX_BULK;
-	if (ret == 0)
-		goto fail;
-
-	printf("dequeue 1 obj\n");
-	ret = rte_ring_mc_dequeue_bulk(r, cur_dst, 1, NULL);
-	cur_dst += 1;
-	if (ret == 0)
-		goto fail;
-
-	printf("dequeue 2 objs\n");
-	ret = rte_ring_mc_dequeue_bulk(r, cur_dst, 2, NULL);
-	cur_dst += 2;
-	if (ret == 0)
-		goto fail;
-
-	printf("dequeue MAX_BULK objs\n");
-	ret = rte_ring_mc_dequeue_bulk(r, cur_dst, MAX_BULK, NULL);
-	cur_dst += MAX_BULK;
-	if (ret == 0)
-		goto fail;
-
-	/* check data */
-	if (memcmp(src, dst, cur_dst - dst)) {
-		rte_hexdump(stdout, "src", src, cur_src - src);
-		rte_hexdump(stdout, "dst", dst, cur_dst - dst);
-		printf("data after dequeue is not the same\n");
-		goto fail;
-	}
-	cur_src = src;
-	cur_dst = dst;
-
-	printf("fill and empty the ring\n");
-	for (i = 0; i<RING_SIZE/MAX_BULK; i++) {
-		ret = rte_ring_mp_enqueue_bulk(r, cur_src, MAX_BULK, NULL);
-		cur_src += MAX_BULK;
-		if (ret == 0)
-			goto fail;
-		ret = rte_ring_mc_dequeue_bulk(r, cur_dst, MAX_BULK, NULL);
-		cur_dst += MAX_BULK;
-		if (ret == 0)
-			goto fail;
+	/* Test with esize not a multiple of 4 */
+	rp = test_ring_create("test_bad_element_size", 23,
+				RING_SIZE + 1, SOCKET_ID_ANY, 0);
+	if (rp != NULL) {
+		printf("Test failed to detect invalid element size\n");
+		goto test_fail;
 	}
 
-	/* check data */
-	if (memcmp(src, dst, cur_dst - dst)) {
-		rte_hexdump(stdout, "src", src, cur_src - src);
-		rte_hexdump(stdout, "dst", dst, cur_dst - dst);
-		printf("data after dequeue is not the same\n");
-		goto fail;
-	}
 
-	if (test_ring_basic_full_empty(r, src, dst) != 0)
-		goto fail;
+	for (i = 0; i < RTE_DIM(esize); i++) {
+		/* Test if ring size is not power of 2 */
+		rp = test_ring_create("test_bad_ring_size", esize[i],
+					RING_SIZE + 1, SOCKET_ID_ANY, 0);
+		if (rp != NULL) {
+			printf("Test failed to detect odd count\n");
+			goto test_fail;
+		}
+
+		/* Test if ring size is exceeding the limit */
+		rp = test_ring_create("test_bad_ring_size", esize[i],
+					RTE_RING_SZ_MASK + 1, SOCKET_ID_ANY, 0);
+		if (rp != NULL) {
+			printf("Test failed to detect limits\n");
+			goto test_fail;
+		}
+
+		/* Tests if lookup returns NULL on non-existing ring */
+		rp = rte_ring_lookup("ring_not_found");
+		if (rp != NULL || rte_errno != ENOENT) {
+			printf("Test failed to detect NULL ring lookup\n");
+			goto test_fail;
+		}
+
+		/* Test if a non-power of 2 count causes the create
+		 * function to fail correctly
+		 */
+		rp = test_ring_create("test_ring_count", esize[i], 4097,
+					SOCKET_ID_ANY, 0);
+		if (rp != NULL)
+			goto test_fail;
+
+		rp = test_ring_create("test_ring_negative", esize[i], RING_SIZE,
+					SOCKET_ID_ANY,
+					RING_F_SP_ENQ | RING_F_SC_DEQ);
+		if (rp == NULL) {
+			printf("test_ring_negative failed to create ring\n");
+			goto test_fail;
+		}
+
+		if (rte_ring_lookup("test_ring_negative") != rp)
+			goto test_fail;
+
+		if (rte_ring_empty(rp) != 1) {
+			printf("test_ring_negative ring is not empty but it should be\n");
+			goto test_fail;
+		}
+
+		/* Test that creating a ring with a name that is already
+		 * in use always fails.
+		 */
+		rt = test_ring_create("test_ring_negative", esize[i], RING_SIZE,
+					SOCKET_ID_ANY, 0);
+		if (rt != NULL)
+			goto test_fail;
+
+		rte_ring_free(rp);
+		rp = NULL;
+	}
 
-	cur_src = src;
-	cur_dst = dst;
+	return 0;
 
-	printf("test default bulk enqueue / dequeue\n");
-	num_elems = 16;
+test_fail:
 
-	cur_src = src;
-	cur_dst = dst;
+	rte_ring_free(rp);
+	return -1;
+}
 
-	ret = rte_ring_enqueue_bulk(r, cur_src, num_elems, NULL);
-	cur_src += num_elems;
-	if (ret == 0) {
-		printf("Cannot enqueue\n");
-		goto fail;
-	}
-	ret = rte_ring_enqueue_bulk(r, cur_src, num_elems, NULL);
-	cur_src += num_elems;
-	if (ret == 0) {
-		printf("Cannot enqueue\n");
-		goto fail;
-	}
-	ret = rte_ring_dequeue_bulk(r, cur_dst, num_elems, NULL);
-	cur_dst += num_elems;
-	if (ret == 0) {
-		printf("Cannot dequeue\n");
-		goto fail;
-	}
-	ret = rte_ring_dequeue_bulk(r, cur_dst, num_elems, NULL);
-	cur_dst += num_elems;
-	if (ret == 0) {
-		printf("Cannot dequeue2\n");
-		goto fail;
-	}
+/*
+ * Burst and bulk operations with sp/sc, mp/mc and default (during creation).
+ * A random number of elements is enqueued and dequeued.
+ */
+static int
+test_ring_burst_bulk_tests1(unsigned int api_type)
+{
+	struct rte_ring *r;
+	void **src = NULL, **cur_src = NULL, **dst = NULL, **cur_dst = NULL;
+	int ret;
+	unsigned int i, j;
+	int rand;
+	const unsigned int rsz = RING_SIZE - 1;
 
-	/* check data */
-	if (memcmp(src, dst, cur_dst - dst)) {
-		rte_hexdump(stdout, "src", src, cur_src - src);
-		rte_hexdump(stdout, "dst", dst, cur_dst - dst);
-		printf("data after dequeue is not the same\n");
-		goto fail;
-	}
+	for (i = 0; i < RTE_DIM(esize); i++) {
+		test_ring_print_test_string("Test standard ring", api_type,
+						esize[i]);
 
-	cur_src = src;
-	cur_dst = dst;
+		/* Create the ring */
+		r = test_ring_create("test_ring_burst_bulk_tests", esize[i],
+					RING_SIZE, SOCKET_ID_ANY, 0);
 
-	ret = rte_ring_mp_enqueue(r, cur_src);
-	if (ret != 0)
-		goto fail;
+		/* alloc dummy object pointers */
+		src = test_ring_calloc(RING_SIZE * 2, esize[i]);
+		if (src == NULL)
+			goto fail;
+		test_ring_mem_init(src, RING_SIZE * 2, esize[i]);
+		cur_src = src;
 
-	ret = rte_ring_mc_dequeue(r, cur_dst);
-	if (ret != 0)
-		goto fail;
+		/* alloc some room for copied objects */
+		dst = test_ring_calloc(RING_SIZE * 2, esize[i]);
+		if (dst == NULL)
+			goto fail;
+		cur_dst = dst;
+
+		printf("Random full/empty test\n");
+
+		for (j = 0; j != TEST_RING_FULL_EMTPY_ITER; j++) {
+			/* random shift in the ring */
+			rand = RTE_MAX(rte_rand() % RING_SIZE, 1UL);
+			printf("%s: iteration %u, random shift: %d;\n",
+			    __func__, j, rand);
+			ret = test_ring_enqueue(r, cur_src, esize[i], rand,
+							api_type);
+			TEST_RING_VERIFY(ret != 0);
+
+			ret = test_ring_dequeue(r, cur_dst, esize[i], rand,
+							api_type);
+			TEST_RING_VERIFY(ret == rand);
+
+			/* fill the ring */
+			ret = test_ring_enqueue(r, cur_src, esize[i], rsz,
+							api_type);
+			TEST_RING_VERIFY(ret != 0);
+
+			TEST_RING_VERIFY(rte_ring_free_count(r) == 0);
+			TEST_RING_VERIFY(rsz == rte_ring_count(r));
+			TEST_RING_VERIFY(rte_ring_full(r));
+			TEST_RING_VERIFY(rte_ring_empty(r) == 0);
+
+			/* empty the ring */
+			ret = test_ring_dequeue(r, cur_dst, esize[i], rsz,
+							api_type);
+			TEST_RING_VERIFY(ret == (int)rsz);
+			TEST_RING_VERIFY(rsz == rte_ring_free_count(r));
+			TEST_RING_VERIFY(rte_ring_count(r) == 0);
+			TEST_RING_VERIFY(rte_ring_full(r) == 0);
+			TEST_RING_VERIFY(rte_ring_empty(r));
+
+			/* check data */
+			TEST_RING_VERIFY(memcmp(src, dst, rsz) == 0);
+		}
+
+		/* Free memory before the test completes */
+		rte_ring_free(r);
+		rte_free(src);
+		rte_free(dst);
+		r = NULL;
+		src = NULL;
+		dst = NULL;
+	}
 
-	free(src);
-	free(dst);
 	return 0;
-
- fail:
-	free(src);
-	free(dst);
+fail:
+	rte_ring_free(r);
+	rte_free(src);
+	rte_free(dst);
 	return -1;
 }
 
+/*
+ * Burst and bulk operations with sp/sc, mp/mc and default (during creation).
+ * A sequence of simple enqueues/dequeues is run and the enqueued and
+ * dequeued data is validated.
+ */
 static int
-test_ring_burst_basic(struct rte_ring *r)
+test_ring_burst_bulk_tests2(unsigned int api_type)
 {
+	struct rte_ring *r;
 	void **src = NULL, **cur_src = NULL, **dst = NULL, **cur_dst = NULL;
 	int ret;
-	unsigned i;
-
-	/* alloc dummy object pointers */
-	src = malloc(RING_SIZE*2*sizeof(void *));
-	if (src == NULL)
-		goto fail;
+	unsigned int i;
 
-	for (i = 0; i < RING_SIZE*2 ; i++) {
-		src[i] = (void *)(unsigned long)i;
-	}
-	cur_src = src;
-
-	/* alloc some room for copied objects */
-	dst = malloc(RING_SIZE*2*sizeof(void *));
-	if (dst == NULL)
-		goto fail;
-
-	memset(dst, 0, RING_SIZE*2*sizeof(void *));
-	cur_dst = dst;
-
-	printf("Test SP & SC basic functions \n");
-	printf("enqueue 1 obj\n");
-	ret = rte_ring_sp_enqueue_burst(r, cur_src, 1, NULL);
-	cur_src += 1;
-	if (ret != 1)
-		goto fail;
-
-	printf("enqueue 2 objs\n");
-	ret = rte_ring_sp_enqueue_burst(r, cur_src, 2, NULL);
-	cur_src += 2;
-	if (ret != 2)
-		goto fail;
-
-	printf("enqueue MAX_BULK objs\n");
-	ret = rte_ring_sp_enqueue_burst(r, cur_src, MAX_BULK, NULL);
-	cur_src += MAX_BULK;
-	if (ret != MAX_BULK)
-		goto fail;
-
-	printf("dequeue 1 obj\n");
-	ret = rte_ring_sc_dequeue_burst(r, cur_dst, 1, NULL);
-	cur_dst += 1;
-	if (ret != 1)
-		goto fail;
-
-	printf("dequeue 2 objs\n");
-	ret = rte_ring_sc_dequeue_burst(r, cur_dst, 2, NULL);
-	cur_dst += 2;
-	if (ret != 2)
-		goto fail;
-
-	printf("dequeue MAX_BULK objs\n");
-	ret = rte_ring_sc_dequeue_burst(r, cur_dst, MAX_BULK, NULL);
-	cur_dst += MAX_BULK;
-	if (ret != MAX_BULK)
-		goto fail;
-
-	/* check data */
-	if (memcmp(src, dst, cur_dst - dst)) {
-		rte_hexdump(stdout, "src", src, cur_src - src);
-		rte_hexdump(stdout, "dst", dst, cur_dst - dst);
-		printf("data after dequeue is not the same\n");
-		goto fail;
-	}
+	for (i = 0; i < RTE_DIM(esize); i++) {
+		test_ring_print_test_string("Test standard ring", api_type,
+						esize[i]);
 
-	cur_src = src;
-	cur_dst = dst;
+		/* Create the ring */
+		r = test_ring_create("test_ring_burst_bulk_tests", esize[i],
+					RING_SIZE, SOCKET_ID_ANY, 0);
 
-	printf("Test enqueue without enough memory space \n");
-	for (i = 0; i< (RING_SIZE/MAX_BULK - 1); i++) {
-		ret = rte_ring_sp_enqueue_burst(r, cur_src, MAX_BULK, NULL);
-		cur_src += MAX_BULK;
-		if (ret != MAX_BULK)
+		/* alloc dummy object pointers */
+		src = test_ring_calloc(RING_SIZE * 2, esize[i]);
+		if (src == NULL)
 			goto fail;
-	}
+		test_ring_mem_init(src, RING_SIZE * 2, esize[i]);
+		cur_src = src;
 
-	printf("Enqueue 2 objects, free entries = MAX_BULK - 2  \n");
-	ret = rte_ring_sp_enqueue_burst(r, cur_src, 2, NULL);
-	cur_src += 2;
-	if (ret != 2)
-		goto fail;
-
-	printf("Enqueue the remaining entries = MAX_BULK - 2  \n");
-	/* Always one free entry left */
-	ret = rte_ring_sp_enqueue_burst(r, cur_src, MAX_BULK, NULL);
-	cur_src += MAX_BULK - 3;
-	if (ret != MAX_BULK - 3)
-		goto fail;
-
-	printf("Test if ring is full  \n");
-	if (rte_ring_full(r) != 1)
-		goto fail;
-
-	printf("Test enqueue for a full entry  \n");
-	ret = rte_ring_sp_enqueue_burst(r, cur_src, MAX_BULK, NULL);
-	if (ret != 0)
-		goto fail;
-
-	printf("Test dequeue without enough objects \n");
-	for (i = 0; i<RING_SIZE/MAX_BULK - 1; i++) {
-		ret = rte_ring_sc_dequeue_burst(r, cur_dst, MAX_BULK, NULL);
-		cur_dst += MAX_BULK;
-		if (ret != MAX_BULK)
+		/* alloc some room for copied objects */
+		dst = test_ring_calloc(RING_SIZE * 2, esize[i]);
+		if (dst == NULL)
 			goto fail;
-	}
-
-	/* Available memory space for the exact MAX_BULK entries */
-	ret = rte_ring_sc_dequeue_burst(r, cur_dst, 2, NULL);
-	cur_dst += 2;
-	if (ret != 2)
-		goto fail;
-
-	ret = rte_ring_sc_dequeue_burst(r, cur_dst, MAX_BULK, NULL);
-	cur_dst += MAX_BULK - 3;
-	if (ret != MAX_BULK - 3)
-		goto fail;
-
-	printf("Test if ring is empty \n");
-	/* Check if ring is empty */
-	if (1 != rte_ring_empty(r))
-		goto fail;
-
-	/* check data */
-	if (memcmp(src, dst, cur_dst - dst)) {
-		rte_hexdump(stdout, "src", src, cur_src - src);
-		rte_hexdump(stdout, "dst", dst, cur_dst - dst);
-		printf("data after dequeue is not the same\n");
-		goto fail;
-	}
-
-	cur_src = src;
-	cur_dst = dst;
-
-	printf("Test MP & MC basic functions \n");
-
-	printf("enqueue 1 obj\n");
-	ret = rte_ring_mp_enqueue_burst(r, cur_src, 1, NULL);
-	cur_src += 1;
-	if (ret != 1)
-		goto fail;
-
-	printf("enqueue 2 objs\n");
-	ret = rte_ring_mp_enqueue_burst(r, cur_src, 2, NULL);
-	cur_src += 2;
-	if (ret != 2)
-		goto fail;
-
-	printf("enqueue MAX_BULK objs\n");
-	ret = rte_ring_mp_enqueue_burst(r, cur_src, MAX_BULK, NULL);
-	cur_src += MAX_BULK;
-	if (ret != MAX_BULK)
-		goto fail;
-
-	printf("dequeue 1 obj\n");
-	ret = rte_ring_mc_dequeue_burst(r, cur_dst, 1, NULL);
-	cur_dst += 1;
-	if (ret != 1)
-		goto fail;
-
-	printf("dequeue 2 objs\n");
-	ret = rte_ring_mc_dequeue_burst(r, cur_dst, 2, NULL);
-	cur_dst += 2;
-	if (ret != 2)
-		goto fail;
-
-	printf("dequeue MAX_BULK objs\n");
-	ret = rte_ring_mc_dequeue_burst(r, cur_dst, MAX_BULK, NULL);
-	cur_dst += MAX_BULK;
-	if (ret != MAX_BULK)
-		goto fail;
-
-	/* check data */
-	if (memcmp(src, dst, cur_dst - dst)) {
-		rte_hexdump(stdout, "src", src, cur_src - src);
-		rte_hexdump(stdout, "dst", dst, cur_dst - dst);
-		printf("data after dequeue is not the same\n");
-		goto fail;
-	}
+		cur_dst = dst;
 
-	cur_src = src;
-	cur_dst = dst;
+		printf("enqueue 1 obj\n");
+		ret = test_ring_enqueue(r, cur_src, esize[i], 1, api_type);
+		if (ret != 1)
+			goto fail;
+		cur_src = test_ring_inc_ptr(cur_src, esize[i], 1);
 
-	printf("fill and empty the ring\n");
-	for (i = 0; i<RING_SIZE/MAX_BULK; i++) {
-		ret = rte_ring_mp_enqueue_burst(r, cur_src, MAX_BULK, NULL);
-		cur_src += MAX_BULK;
-		if (ret != MAX_BULK)
+		printf("enqueue 2 objs\n");
+		ret = test_ring_enqueue(r, cur_src, esize[i], 2, api_type);
+		if (ret != 2)
 			goto fail;
-		ret = rte_ring_mc_dequeue_burst(r, cur_dst, MAX_BULK, NULL);
-		cur_dst += MAX_BULK;
+		cur_src = test_ring_inc_ptr(cur_src, esize[i], 2);
+
+		printf("enqueue MAX_BULK objs\n");
+		ret = test_ring_enqueue(r, cur_src, esize[i], MAX_BULK,
+						api_type);
 		if (ret != MAX_BULK)
 			goto fail;
-	}
+		cur_src = test_ring_inc_ptr(cur_src, esize[i], MAX_BULK);
 
-	/* check data */
-	if (memcmp(src, dst, cur_dst - dst)) {
-		rte_hexdump(stdout, "src", src, cur_src - src);
-		rte_hexdump(stdout, "dst", dst, cur_dst - dst);
-		printf("data after dequeue is not the same\n");
-		goto fail;
-	}
+		printf("dequeue 1 obj\n");
+		ret = test_ring_dequeue(r, cur_dst, esize[i], 1, api_type);
+		if (ret != 1)
+			goto fail;
+		cur_dst = test_ring_inc_ptr(cur_dst, esize[i], 1);
 
-	cur_src = src;
-	cur_dst = dst;
+		printf("dequeue 2 objs\n");
+		ret = test_ring_dequeue(r, cur_dst, esize[i], 2, api_type);
+		if (ret != 2)
+			goto fail;
+		cur_dst = test_ring_inc_ptr(cur_dst, esize[i], 2);
 
-	printf("Test enqueue without enough memory space \n");
-	for (i = 0; i<RING_SIZE/MAX_BULK - 1; i++) {
-		ret = rte_ring_mp_enqueue_burst(r, cur_src, MAX_BULK, NULL);
-		cur_src += MAX_BULK;
+		printf("dequeue MAX_BULK objs\n");
+		ret = test_ring_dequeue(r, cur_dst, esize[i], MAX_BULK,
+						api_type);
 		if (ret != MAX_BULK)
 			goto fail;
-	}
-
-	/* Available memory space for the exact MAX_BULK objects */
-	ret = rte_ring_mp_enqueue_burst(r, cur_src, 2, NULL);
-	cur_src += 2;
-	if (ret != 2)
-		goto fail;
+		cur_dst = test_ring_inc_ptr(cur_dst, esize[i], MAX_BULK);
 
-	ret = rte_ring_mp_enqueue_burst(r, cur_src, MAX_BULK, NULL);
-	cur_src += MAX_BULK - 3;
-	if (ret != MAX_BULK - 3)
-		goto fail;
-
-
-	printf("Test dequeue without enough objects \n");
-	for (i = 0; i<RING_SIZE/MAX_BULK - 1; i++) {
-		ret = rte_ring_mc_dequeue_burst(r, cur_dst, MAX_BULK, NULL);
-		cur_dst += MAX_BULK;
-		if (ret != MAX_BULK)
+		/* check data */
+		if (memcmp(src, dst, RTE_PTR_DIFF(cur_dst, dst))) {
+			rte_hexdump(stdout, "src", src, cur_src - src);
+			rte_hexdump(stdout, "dst", dst, cur_dst - dst);
+			printf("data after dequeue is not the same\n");
 			goto fail;
-	}
+		}
 
-	/* Available objects - the exact MAX_BULK */
-	ret = rte_ring_mc_dequeue_burst(r, cur_dst, 2, NULL);
-	cur_dst += 2;
-	if (ret != 2)
-		goto fail;
-
-	ret = rte_ring_mc_dequeue_burst(r, cur_dst, MAX_BULK, NULL);
-	cur_dst += MAX_BULK - 3;
-	if (ret != MAX_BULK - 3)
-		goto fail;
-
-	/* check data */
-	if (memcmp(src, dst, cur_dst - dst)) {
-		rte_hexdump(stdout, "src", src, cur_src - src);
-		rte_hexdump(stdout, "dst", dst, cur_dst - dst);
-		printf("data after dequeue is not the same\n");
-		goto fail;
+		/* Free memory before test completed */
+		rte_ring_free(r);
+		rte_free(src);
+		rte_free(dst);
+		r = NULL;
+		src = NULL;
+		dst = NULL;
 	}
 
-	cur_src = src;
-	cur_dst = dst;
-
-	printf("Covering rte_ring_enqueue_burst functions \n");
-
-	ret = rte_ring_enqueue_burst(r, cur_src, 2, NULL);
-	cur_src += 2;
-	if (ret != 2)
-		goto fail;
-
-	ret = rte_ring_dequeue_burst(r, cur_dst, 2, NULL);
-	cur_dst += 2;
-	if (ret != 2)
-		goto fail;
-
-	/* Free memory before test completed */
-	free(src);
-	free(dst);
 	return 0;
-
- fail:
-	free(src);
-	free(dst);
+fail:
+	rte_ring_free(r);
+	rte_free(src);
+	rte_free(dst);
 	return -1;
 }
 
 /*
- * it will always fail to create ring with a wrong ring size number in this function
+ * Burst and bulk operations with sp/sc, mp/mc and default (during creation)
+ * Enqueue and dequeue to cover the entire ring length.
  */
 static int
-test_ring_creation_with_wrong_size(void)
+test_ring_burst_bulk_tests3(unsigned int api_type)
 {
-	struct rte_ring * rp = NULL;
+	struct rte_ring *r;
+	void **src = NULL, **cur_src = NULL, **dst = NULL, **cur_dst = NULL;
+	int ret;
+	unsigned int i, j;
 
-	/* Test if ring size is not power of 2 */
-	rp = rte_ring_create("test_bad_ring_size", RING_SIZE + 1, SOCKET_ID_ANY, 0);
-	if (NULL != rp) {
-		return -1;
-	}
+	for (i = 0; i < RTE_DIM(esize); i++) {
+		test_ring_print_test_string("Test standard ring", api_type,
+						esize[i]);
+
+		/* Create the ring */
+		r = test_ring_create("test_ring_burst_bulk_tests", esize[i],
+					RING_SIZE, SOCKET_ID_ANY, 0);
 
-	/* Test if ring size is exceeding the limit */
-	rp = rte_ring_create("test_bad_ring_size", (RTE_RING_SZ_MASK + 1), SOCKET_ID_ANY, 0);
-	if (NULL != rp) {
-		return -1;
+		/* alloc dummy object pointers */
+		src = test_ring_calloc(RING_SIZE * 2, esize[i]);
+		if (src == NULL)
+			goto fail;
+		test_ring_mem_init(src, RING_SIZE * 2, esize[i]);
+		cur_src = src;
+
+		/* alloc some room for copied objects */
+		dst = test_ring_calloc(RING_SIZE * 2, esize[i]);
+		if (dst == NULL)
+			goto fail;
+		cur_dst = dst;
+
+		printf("fill and empty the ring\n");
+		for (j = 0; j < RING_SIZE / MAX_BULK; j++) {
+			ret = test_ring_enqueue(r, cur_src, esize[i], MAX_BULK,
+							api_type);
+			if (ret != MAX_BULK)
+				goto fail;
+			cur_src = test_ring_inc_ptr(cur_src, esize[i],
+								MAX_BULK);
+
+			ret = test_ring_dequeue(r, cur_dst, esize[i], MAX_BULK,
+							api_type);
+			if (ret != MAX_BULK)
+				goto fail;
+			cur_dst = test_ring_inc_ptr(cur_dst, esize[i],
+								MAX_BULK);
+		}
+
+		/* check data */
+		if (memcmp(src, dst, RTE_PTR_DIFF(cur_dst, dst))) {
+			rte_hexdump(stdout, "src", src, cur_src - src);
+			rte_hexdump(stdout, "dst", dst, cur_dst - dst);
+			printf("data after dequeue is not the same\n");
+			goto fail;
+		}
+
+		/* Free memory before test completed */
+		rte_ring_free(r);
+		rte_free(src);
+		rte_free(dst);
+		r = NULL;
+		src = NULL;
+		dst = NULL;
 	}
+
 	return 0;
+fail:
+	rte_ring_free(r);
+	rte_free(src);
+	rte_free(dst);
+	return -1;
 }
 
 /*
- * it tests if it would always fail to create ring with an used ring name
+ * Burst and bulk operations with sp/sc, mp/mc and default (during creation)
+ * Enqueue till the ring is full and dequeue till the ring becomes empty.
  */
 static int
-test_ring_creation_with_an_used_name(void)
+test_ring_burst_bulk_tests4(unsigned int api_type)
 {
-	struct rte_ring * rp;
+	struct rte_ring *r;
+	void **src = NULL, **cur_src = NULL, **dst = NULL, **cur_dst = NULL;
+	int ret;
+	unsigned int i, j;
+	unsigned int num_elems;
 
-	rp = rte_ring_create("test", RING_SIZE, SOCKET_ID_ANY, 0);
-	if (NULL != rp)
-		return -1;
+	for (i = 0; i < RTE_DIM(esize); i++) {
+		test_ring_print_test_string("Test standard ring", api_type,
+						esize[i]);
 
-	return 0;
-}
+		/* Create the ring */
+		r = test_ring_create("test_ring_burst_bulk_tests", esize[i],
+					RING_SIZE, SOCKET_ID_ANY, 0);
 
-/*
- * Test to if a non-power of 2 count causes the create
- * function to fail correctly
- */
-static int
-test_create_count_odd(void)
-{
-	struct rte_ring *r = rte_ring_create("test_ring_count",
-			4097, SOCKET_ID_ANY, 0 );
-	if(r != NULL){
-		return -1;
-	}
-	return 0;
-}
+		/* alloc dummy object pointers */
+		src = test_ring_calloc(RING_SIZE * 2, esize[i]);
+		if (src == NULL)
+			goto fail;
+		test_ring_mem_init(src, RING_SIZE * 2, esize[i]);
+		cur_src = src;
 
-static int
-test_lookup_null(void)
-{
-	struct rte_ring *rlp = rte_ring_lookup("ring_not_found");
-	if (rlp ==NULL)
-	if (rte_errno != ENOENT){
-		printf( "test failed to returnn error on null pointer\n");
-		return -1;
+		/* alloc some room for copied objects */
+		dst = test_ring_calloc(RING_SIZE * 2, esize[i]);
+		if (dst == NULL)
+			goto fail;
+		cur_dst = dst;
+
+		printf("Test enqueue without enough memory space\n");
+		for (j = 0; j < (RING_SIZE/MAX_BULK - 1); j++) {
+			ret = test_ring_enqueue(r, cur_src, esize[i], MAX_BULK,
+							api_type);
+			if (ret != MAX_BULK)
+				goto fail;
+			cur_src = test_ring_inc_ptr(cur_src, esize[i],
+								MAX_BULK);
+		}
+
+		printf("Enqueue 2 objects, free entries = MAX_BULK - 2\n");
+		ret = test_ring_enqueue(r, cur_src, esize[i], 2, api_type);
+		if (ret != 2)
+			goto fail;
+		cur_src = test_ring_inc_ptr(cur_src, esize[i], 2);
+
+		printf("Enqueue the remaining entries = MAX_BULK - 3\n");
+		/* Bulk APIs enqueue exact number of elements */
+		if ((api_type & TEST_RING_ELEM_BULK) == TEST_RING_ELEM_BULK)
+			num_elems = MAX_BULK - 3;
+		else
+			num_elems = MAX_BULK;
+		/* Always one free entry left */
+		ret = test_ring_enqueue(r, cur_src, esize[i], num_elems,
+						api_type);
+		if (ret != MAX_BULK - 3)
+			goto fail;
+		cur_src = test_ring_inc_ptr(cur_src, esize[i], MAX_BULK - 3);
+
+		printf("Test if ring is full\n");
+		if (rte_ring_full(r) != 1)
+			goto fail;
+
+		printf("Test enqueue for a full entry\n");
+		ret = test_ring_enqueue(r, cur_src, esize[i], MAX_BULK,
+						api_type);
+		if (ret != 0)
+			goto fail;
+
+		printf("Test dequeue without enough objects\n");
+		for (j = 0; j < RING_SIZE / MAX_BULK - 1; j++) {
+			ret = test_ring_dequeue(r, cur_dst, esize[i], MAX_BULK,
+							api_type);
+			if (ret != MAX_BULK)
+				goto fail;
+			cur_dst = test_ring_inc_ptr(cur_dst, esize[i],
+								MAX_BULK);
+		}
+
+		/* Available objects - the exact MAX_BULK */
+		ret = test_ring_dequeue(r, cur_dst, esize[i], 2, api_type);
+		if (ret != 2)
+			goto fail;
+		cur_dst = test_ring_inc_ptr(cur_dst, esize[i], 2);
+
+		/* Bulk APIs dequeue exact number of elements */
+		if ((api_type & TEST_RING_ELEM_BULK) == TEST_RING_ELEM_BULK)
+			num_elems = MAX_BULK - 3;
+		else
+			num_elems = MAX_BULK;
+		ret = test_ring_dequeue(r, cur_dst, esize[i], num_elems,
+						api_type);
+		if (ret != MAX_BULK - 3)
+			goto fail;
+		cur_dst = test_ring_inc_ptr(cur_dst, esize[i], MAX_BULK - 3);
+
+		printf("Test if ring is empty\n");
+		/* Check if ring is empty */
+		if (rte_ring_empty(r) != 1)
+			goto fail;
+
+		/* check data */
+		if (memcmp(src, dst, RTE_PTR_DIFF(cur_dst, dst))) {
+			rte_hexdump(stdout, "src", src, cur_src - src);
+			rte_hexdump(stdout, "dst", dst, cur_dst - dst);
+			printf("data after dequeue is not the same\n");
+			goto fail;
+		}
+
+		/* Free memory before test completed */
+		rte_ring_free(r);
+		rte_free(src);
+		rte_free(dst);
+		r = NULL;
+		src = NULL;
+		dst = NULL;
 	}
+
 	return 0;
+fail:
+	rte_ring_free(r);
+	rte_free(src);
+	rte_free(dst);
+	return -1;
 }
 
 /*
- * it tests some more basic ring operations
+ * Test default, single element, bulk and burst APIs
  */
 static int
 test_ring_basic_ex(void)
 {
 	int ret = -1;
-	unsigned i;
+	unsigned int i, j;
 	struct rte_ring *rp = NULL;
-	void **obj = NULL;
-
-	obj = rte_calloc("test_ring_basic_ex_malloc", RING_SIZE, sizeof(void *), 0);
-	if (obj == NULL) {
-		printf("test_ring_basic_ex fail to rte_malloc\n");
-		goto fail_test;
-	}
-
-	rp = rte_ring_create("test_ring_basic_ex", RING_SIZE, SOCKET_ID_ANY,
-			RING_F_SP_ENQ | RING_F_SC_DEQ);
-	if (rp == NULL) {
-		printf("test_ring_basic_ex fail to create ring\n");
-		goto fail_test;
-	}
-
-	if (rte_ring_lookup("test_ring_basic_ex") != rp) {
-		goto fail_test;
-	}
-
-	if (rte_ring_empty(rp) != 1) {
-		printf("test_ring_basic_ex ring is not empty but it should be\n");
-		goto fail_test;
-	}
-
-	printf("%u ring entries are now free\n", rte_ring_free_count(rp));
+	void *obj = NULL;
+
+	for (i = 0; i < RTE_DIM(esize); i++) {
+		obj = test_ring_calloc(RING_SIZE, esize[i]);
+		if (obj == NULL) {
+			printf("%s: failed to alloc memory\n", __func__);
+			goto fail_test;
+		}
+
+		rp = test_ring_create("test_ring_basic_ex", esize[i], RING_SIZE,
+					SOCKET_ID_ANY,
+					RING_F_SP_ENQ | RING_F_SC_DEQ);
+		if (rp == NULL) {
+			printf("%s: failed to create ring\n", __func__);
+			goto fail_test;
+		}
+
+		if (rte_ring_lookup("test_ring_basic_ex") != rp) {
+			printf("%s: failed to find ring\n", __func__);
+			goto fail_test;
+		}
+
+		if (rte_ring_empty(rp) != 1) {
+			printf("%s: ring is not empty but it should be\n",
+				__func__);
+			goto fail_test;
+		}
 
-	for (i = 0; i < RING_SIZE; i ++) {
-		rte_ring_enqueue(rp, obj[i]);
-	}
+		printf("%u ring entries are now free\n",
+			rte_ring_free_count(rp));
 
-	if (rte_ring_full(rp) != 1) {
-		printf("test_ring_basic_ex ring is not full but it should be\n");
-		goto fail_test;
-	}
+		for (j = 0; j < RING_SIZE; j++) {
+			test_ring_enqueue(rp, obj, esize[i], 1,
+				TEST_RING_THREAD_DEF | TEST_RING_ELEM_SINGLE);
+		}
 
-	for (i = 0; i < RING_SIZE; i ++) {
-		rte_ring_dequeue(rp, &obj[i]);
-	}
+		if (rte_ring_full(rp) != 1) {
+			printf("%s: ring is not full but it should be\n",
+				__func__);
+			goto fail_test;
+		}
 
-	if (rte_ring_empty(rp) != 1) {
-		printf("test_ring_basic_ex ring is not empty but it should be\n");
-		goto fail_test;
-	}
+		for (j = 0; j < RING_SIZE; j++) {
+			test_ring_dequeue(rp, obj, esize[i], 1,
+				TEST_RING_THREAD_DEF | TEST_RING_ELEM_SINGLE);
+		}
 
-	/* Covering the ring burst operation */
-	ret = rte_ring_enqueue_burst(rp, obj, 2, NULL);
-	if (ret != 2) {
-		printf("test_ring_basic_ex: rte_ring_enqueue_burst fails \n");
-		goto fail_test;
+		if (rte_ring_empty(rp) != 1) {
+			printf("%s: ring is not empty but it should be\n",
+				__func__);
+			goto fail_test;
+		}
+
+		/* Following tests use the configured flags to decide
+		 * SP/SC or MP/MC.
+		 */
+		/* Covering the ring burst operation */
+		ret = test_ring_enqueue(rp, obj, esize[i], 2,
+				TEST_RING_THREAD_DEF | TEST_RING_ELEM_BURST);
+		if (ret != 2) {
+			printf("%s: rte_ring_enqueue_burst fails\n", __func__);
+			goto fail_test;
+		}
+
+		ret = test_ring_dequeue(rp, obj, esize[i], 2,
+				TEST_RING_THREAD_DEF | TEST_RING_ELEM_BURST);
+		if (ret != 2) {
+			printf("%s: rte_ring_dequeue_burst fails\n", __func__);
+			goto fail_test;
+		}
+
+		/* Covering the ring bulk operation */
+		ret = test_ring_enqueue(rp, obj, esize[i], 2,
+				TEST_RING_THREAD_DEF | TEST_RING_ELEM_BULK);
+		if (ret != 2) {
+			printf("%s: rte_ring_enqueue_bulk fails\n", __func__);
+			goto fail_test;
+		}
+
+		ret = test_ring_dequeue(rp, obj, esize[i], 2,
+				TEST_RING_THREAD_DEF | TEST_RING_ELEM_BULK);
+		if (ret != 2) {
+			printf("%s: rte_ring_dequeue_bulk fails\n", __func__);
+			goto fail_test;
+		}
+
+		rte_ring_free(rp);
+		rte_free(obj);
+		rp = NULL;
+		obj = NULL;
 	}
 
-	ret = rte_ring_dequeue_burst(rp, obj, 2, NULL);
-	if (ret != 2) {
-		printf("test_ring_basic_ex: rte_ring_dequeue_burst fails \n");
-		goto fail_test;
-	}
+	return 0;
 
-	ret = 0;
 fail_test:
 	rte_ring_free(rp);
 	if (obj != NULL)
 		rte_free(obj);
 
-	return ret;
+	return -1;
 }
 
+/*
+ * Basic test cases with exact size ring.
+ */
 static int
 test_ring_with_exact_size(void)
 {
-	struct rte_ring *std_ring = NULL, *exact_sz_ring = NULL;
-	void *ptr_array[16];
-	static const unsigned int ring_sz = RTE_DIM(ptr_array);
-	unsigned int i;
+	struct rte_ring *std_r = NULL, *exact_sz_r = NULL;
+	void *obj_orig;
+	void *obj;
+	const unsigned int ring_sz = 16;
+	unsigned int i, j;
 	int ret = -1;
 
-	std_ring = rte_ring_create("std", ring_sz, rte_socket_id(),
-			RING_F_SP_ENQ | RING_F_SC_DEQ);
-	if (std_ring == NULL) {
-		printf("%s: error, can't create std ring\n", __func__);
-		goto end;
-	}
-	exact_sz_ring = rte_ring_create("exact sz", ring_sz, rte_socket_id(),
-			RING_F_SP_ENQ | RING_F_SC_DEQ | RING_F_EXACT_SZ);
-	if (exact_sz_ring == NULL) {
-		printf("%s: error, can't create exact size ring\n", __func__);
-		goto end;
-	}
+	for (i = 0; i < RTE_DIM(esize); i++) {
+		test_ring_print_test_string("Test exact size ring",
+				TEST_RING_IGNORE_API_TYPE,
+				esize[i]);
+
+		/* alloc object pointers. Allocate one extra object
+		 * and create an unaligned address.
+		 */
+		obj_orig = test_ring_calloc(17, esize[i]);
+		if (obj_orig == NULL)
+			goto test_fail;
+		obj = ((char *)obj_orig) + 1;
+
+		std_r = test_ring_create("std", esize[i], ring_sz,
+					rte_socket_id(),
+					RING_F_SP_ENQ | RING_F_SC_DEQ);
+		if (std_r == NULL) {
+			printf("%s: error, can't create std ring\n", __func__);
+			goto test_fail;
+		}
+		exact_sz_r = test_ring_create("exact sz", esize[i], ring_sz,
+				rte_socket_id(),
+				RING_F_SP_ENQ | RING_F_SC_DEQ |
+				RING_F_EXACT_SZ);
+		if (exact_sz_r == NULL) {
+			printf("%s: error, can't create exact size ring\n",
+					__func__);
+			goto test_fail;
+		}
+
+		/*
+		 * Check that the exact size ring is bigger than the
+		 * standard ring
+		 */
+		if (rte_ring_get_size(std_r) >= rte_ring_get_size(exact_sz_r)) {
+			printf("%s: error, std ring (size: %u) is not smaller than exact size one (size %u)\n",
+					__func__,
+					rte_ring_get_size(std_r),
+					rte_ring_get_size(exact_sz_r));
+			goto test_fail;
+		}
+		/*
+		 * check that the exact_sz_ring can hold one more element
+		 * than the standard ring. (16 vs 15 elements)
+		 */
+		for (j = 0; j < ring_sz - 1; j++) {
+			test_ring_enqueue(std_r, obj, esize[i], 1,
+				TEST_RING_THREAD_DEF | TEST_RING_ELEM_SINGLE);
+			test_ring_enqueue(exact_sz_r, obj, esize[i], 1,
+				TEST_RING_THREAD_DEF | TEST_RING_ELEM_SINGLE);
+		}
+		ret = test_ring_enqueue(std_r, obj, esize[i], 1,
+				TEST_RING_THREAD_DEF | TEST_RING_ELEM_SINGLE);
+		if (ret != -ENOBUFS) {
+			printf("%s: error, unexpected successful enqueue\n",
+				__func__);
+			goto test_fail;
+		}
+		ret = test_ring_enqueue(exact_sz_r, obj, esize[i], 1,
+				TEST_RING_THREAD_DEF | TEST_RING_ELEM_SINGLE);
+		if (ret == -ENOBUFS) {
+			printf("%s: error, enqueue failed\n", __func__);
+			goto test_fail;
+		}
+
+		/* check that dequeue returns the expected number of elements */
+		ret = test_ring_dequeue(exact_sz_r, obj, esize[i], ring_sz,
+				TEST_RING_THREAD_DEF | TEST_RING_ELEM_BURST);
+		if (ret != (int)ring_sz) {
+			printf("%s: error, failed to dequeue expected nb of elements\n",
+				__func__);
+			goto test_fail;
+		}
 
-	/*
-	 * Check that the exact size ring is bigger than the standard ring
-	 */
-	if (rte_ring_get_size(std_ring) >= rte_ring_get_size(exact_sz_ring)) {
-		printf("%s: error, std ring (size: %u) is not smaller than exact size one (size %u)\n",
-				__func__,
-				rte_ring_get_size(std_ring),
-				rte_ring_get_size(exact_sz_ring));
-		goto end;
-	}
-	/*
-	 * check that the exact_sz_ring can hold one more element than the
-	 * standard ring. (16 vs 15 elements)
-	 */
-	for (i = 0; i < ring_sz - 1; i++) {
-		rte_ring_enqueue(std_ring, NULL);
-		rte_ring_enqueue(exact_sz_ring, NULL);
-	}
-	if (rte_ring_enqueue(std_ring, NULL) != -ENOBUFS) {
-		printf("%s: error, unexpected successful enqueue\n", __func__);
-		goto end;
-	}
-	if (rte_ring_enqueue(exact_sz_ring, NULL) == -ENOBUFS) {
-		printf("%s: error, enqueue failed\n", __func__);
-		goto end;
-	}
+		/* check that the capacity function returns expected value */
+		if (rte_ring_get_capacity(exact_sz_r) != ring_sz) {
+			printf("%s: error, incorrect ring capacity reported\n",
+					__func__);
+			goto test_fail;
+		}
 
-	/* check that dequeue returns the expected number of elements */
-	if (rte_ring_dequeue_burst(exact_sz_ring, ptr_array,
-			RTE_DIM(ptr_array), NULL) != ring_sz) {
-		printf("%s: error, failed to dequeue expected nb of elements\n",
-				__func__);
-		goto end;
+		rte_free(obj_orig);
+		rte_ring_free(std_r);
+		rte_ring_free(exact_sz_r);
+		obj_orig = NULL;
+		std_r = NULL;
+		exact_sz_r = NULL;
 	}
 
-	/* check that the capacity function returns expected value */
-	if (rte_ring_get_capacity(exact_sz_ring) != ring_sz) {
-		printf("%s: error, incorrect ring capacity reported\n",
-				__func__);
-		goto end;
-	}
+	return 0;
 
-	ret = 0; /* all ok if we get here */
-end:
-	rte_ring_free(std_ring);
-	rte_ring_free(exact_sz_ring);
-	return ret;
+test_fail:
+	rte_free(obj_orig);
+	rte_ring_free(std_r);
+	rte_ring_free(exact_sz_r);
+	return -1;
 }
 
 static int
 test_ring(void)
 {
-	struct rte_ring *r = NULL;
-
-	/* some more basic operations */
-	if (test_ring_basic_ex() < 0)
-		goto test_fail;
-
-	rte_atomic32_init(&synchro);
-
-	r = rte_ring_create("test", RING_SIZE, SOCKET_ID_ANY, 0);
-	if (r == NULL)
-		goto test_fail;
-
-	/* retrieve the ring from its name */
-	if (rte_ring_lookup("test") != r) {
-		printf("Cannot lookup ring from its name\n");
-		goto test_fail;
-	}
-
-	/* burst operations */
-	if (test_ring_burst_basic(r) < 0)
-		goto test_fail;
+	unsigned int i, j;
 
-	/* basic operations */
-	if (test_ring_basic(r) < 0)
+	/* Negative test cases */
+	if (test_ring_negative_tests() < 0)
 		goto test_fail;
 
-	/* basic operations */
-	if ( test_create_count_odd() < 0){
-		printf("Test failed to detect odd count\n");
-		goto test_fail;
-	} else
-		printf("Test detected odd count\n");
-
-	if ( test_lookup_null() < 0){
-		printf("Test failed to detect NULL ring lookup\n");
-		goto test_fail;
-	} else
-		printf("Test detected NULL ring lookup\n");
-
-	/* test of creating ring with wrong size */
-	if (test_ring_creation_with_wrong_size() < 0)
-		goto test_fail;
-
-	/* test of creation ring with an used name */
-	if (test_ring_creation_with_an_used_name() < 0)
+	/* Some basic operations */
+	if (test_ring_basic_ex() < 0)
 		goto test_fail;
 
 	if (test_ring_with_exact_size() < 0)
 		goto test_fail;
 
+	/* Burst and bulk operations with sp/sc, mp/mc and default.
+	 * The test cases are split into smaller test cases to
+	 * help clang compile faster.
+	 */
+	for (j = TEST_RING_ELEM_BULK; j <= TEST_RING_ELEM_BURST; j <<= 1)
+		for (i = TEST_RING_THREAD_DEF;
+					i <= TEST_RING_THREAD_MPMC; i <<= 1)
+			if (test_ring_burst_bulk_tests1(i | j) < 0)
+				goto test_fail;
+
+	for (j = TEST_RING_ELEM_BULK; j <= TEST_RING_ELEM_BURST; j <<= 1)
+		for (i = TEST_RING_THREAD_DEF;
+					i <= TEST_RING_THREAD_MPMC; i <<= 1)
+			if (test_ring_burst_bulk_tests2(i | j) < 0)
+				goto test_fail;
+
+	for (j = TEST_RING_ELEM_BULK; j <= TEST_RING_ELEM_BURST; j <<= 1)
+		for (i = TEST_RING_THREAD_DEF;
+					i <= TEST_RING_THREAD_MPMC; i <<= 1)
+			if (test_ring_burst_bulk_tests3(i | j) < 0)
+				goto test_fail;
+
+	for (j = TEST_RING_ELEM_BULK; j <= TEST_RING_ELEM_BURST; j <<= 1)
+		for (i = TEST_RING_THREAD_DEF;
+					i <= TEST_RING_THREAD_MPMC; i <<= 1)
+			if (test_ring_burst_bulk_tests4(i | j) < 0)
+				goto test_fail;
+
 	/* dump the ring status */
 	rte_ring_list_dump(stdout);
 
-	rte_ring_free(r);
-
 	return 0;
 
 test_fail:
-	rte_ring_free(r);
 
 	return -1;
 }
diff --git a/app/test/test_ring.h b/app/test/test_ring.h
new file mode 100644
index 000000000..aa6ae67ca
--- /dev/null
+++ b/app/test/test_ring.h
@@ -0,0 +1,187 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2019 Arm Limited
+ */
+
+#include <rte_malloc.h>
+#include <rte_ring.h>
+#include <rte_ring_elem.h>
+
+/* API type to call
+ * rte_ring_<sp/mp or sc/mc>_enqueue_<bulk/burst>
+ * TEST_RING_THREAD_DEF - Uses configured SPSC/MPMC calls
+ * TEST_RING_THREAD_SPSC - Calls SP or SC API
+ * TEST_RING_THREAD_MPMC - Calls MP or MC API
+ */
+#define TEST_RING_THREAD_DEF 1
+#define TEST_RING_THREAD_SPSC 2
+#define TEST_RING_THREAD_MPMC 4
+
+/* API type to call
+ * TEST_RING_ELEM_SINGLE - Calls single element APIs
+ * TEST_RING_ELEM_BULK - Calls bulk APIs
+ * TEST_RING_ELEM_BURST - Calls burst APIs
+ */
+#define TEST_RING_ELEM_SINGLE 8
+#define TEST_RING_ELEM_BULK 16
+#define TEST_RING_ELEM_BURST 32
+
+#define TEST_RING_IGNORE_API_TYPE ~0U
+
+/* This function is placed here as it is required for both
+ * performance and functional tests.
+ */
+static inline struct rte_ring*
+test_ring_create(const char *name, int esize, unsigned int count,
+		int socket_id, unsigned int flags)
+{
+	/* Legacy queue APIs? */
+	if ((esize) == -1)
+		return rte_ring_create((name), (count), (socket_id), (flags));
+	else
+		return rte_ring_create_elem((name), (esize), (count),
+						(socket_id), (flags));
+}
+
+static __rte_always_inline unsigned int
+test_ring_enqueue(struct rte_ring *r, void **obj, int esize, unsigned int n,
+			unsigned int api_type)
+{
+	/* Legacy queue APIs? */
+	if ((esize) == -1)
+		switch (api_type) {
+		case (TEST_RING_THREAD_DEF | TEST_RING_ELEM_SINGLE):
+			return rte_ring_enqueue(r, obj);
+		case (TEST_RING_THREAD_SPSC | TEST_RING_ELEM_SINGLE):
+			return rte_ring_sp_enqueue(r, obj);
+		case (TEST_RING_THREAD_MPMC | TEST_RING_ELEM_SINGLE):
+			return rte_ring_mp_enqueue(r, obj);
+		case (TEST_RING_THREAD_DEF | TEST_RING_ELEM_BULK):
+			return rte_ring_enqueue_bulk(r, obj, n, NULL);
+		case (TEST_RING_THREAD_SPSC | TEST_RING_ELEM_BULK):
+			return rte_ring_sp_enqueue_bulk(r, obj, n, NULL);
+		case (TEST_RING_THREAD_MPMC | TEST_RING_ELEM_BULK):
+			return rte_ring_mp_enqueue_bulk(r, obj, n, NULL);
+		case (TEST_RING_THREAD_DEF | TEST_RING_ELEM_BURST):
+			return rte_ring_enqueue_burst(r, obj, n, NULL);
+		case (TEST_RING_THREAD_SPSC | TEST_RING_ELEM_BURST):
+			return rte_ring_sp_enqueue_burst(r, obj, n, NULL);
+		case (TEST_RING_THREAD_MPMC | TEST_RING_ELEM_BURST):
+			return rte_ring_mp_enqueue_burst(r, obj, n, NULL);
+		default:
+			printf("Invalid API type\n");
+			return 0;
+		}
+	else
+		switch (api_type) {
+		case (TEST_RING_THREAD_DEF | TEST_RING_ELEM_SINGLE):
+			return rte_ring_enqueue_elem(r, obj, esize);
+		case (TEST_RING_THREAD_SPSC | TEST_RING_ELEM_SINGLE):
+			return rte_ring_sp_enqueue_elem(r, obj, esize);
+		case (TEST_RING_THREAD_MPMC | TEST_RING_ELEM_SINGLE):
+			return rte_ring_mp_enqueue_elem(r, obj, esize);
+		case (TEST_RING_THREAD_DEF | TEST_RING_ELEM_BULK):
+			return rte_ring_enqueue_bulk_elem(r, obj, esize, n,
+								NULL);
+		case (TEST_RING_THREAD_SPSC | TEST_RING_ELEM_BULK):
+			return rte_ring_sp_enqueue_bulk_elem(r, obj, esize, n,
+								NULL);
+		case (TEST_RING_THREAD_MPMC | TEST_RING_ELEM_BULK):
+			return rte_ring_mp_enqueue_bulk_elem(r, obj, esize, n,
+								NULL);
+		case (TEST_RING_THREAD_DEF | TEST_RING_ELEM_BURST):
+			return rte_ring_enqueue_burst_elem(r, obj, esize, n,
+								NULL);
+		case (TEST_RING_THREAD_SPSC | TEST_RING_ELEM_BURST):
+			return rte_ring_sp_enqueue_burst_elem(r, obj, esize, n,
+								NULL);
+		case (TEST_RING_THREAD_MPMC | TEST_RING_ELEM_BURST):
+			return rte_ring_mp_enqueue_burst_elem(r, obj, esize, n,
+								NULL);
+		default:
+			printf("Invalid API type\n");
+			return 0;
+		}
+}
+
+static __rte_always_inline unsigned int
+test_ring_dequeue(struct rte_ring *r, void **obj, int esize, unsigned int n,
+			unsigned int api_type)
+{
+	/* Legacy queue APIs? */
+	if ((esize) == -1)
+		switch (api_type) {
+		case (TEST_RING_THREAD_DEF | TEST_RING_ELEM_SINGLE):
+			return rte_ring_dequeue(r, obj);
+		case (TEST_RING_THREAD_SPSC | TEST_RING_ELEM_SINGLE):
+			return rte_ring_sc_dequeue(r, obj);
+		case (TEST_RING_THREAD_MPMC | TEST_RING_ELEM_SINGLE):
+			return rte_ring_mc_dequeue(r, obj);
+		case (TEST_RING_THREAD_DEF | TEST_RING_ELEM_BULK):
+			return rte_ring_dequeue_bulk(r, obj, n, NULL);
+		case (TEST_RING_THREAD_SPSC | TEST_RING_ELEM_BULK):
+			return rte_ring_sc_dequeue_bulk(r, obj, n, NULL);
+		case (TEST_RING_THREAD_MPMC | TEST_RING_ELEM_BULK):
+			return rte_ring_mc_dequeue_bulk(r, obj, n, NULL);
+		case (TEST_RING_THREAD_DEF | TEST_RING_ELEM_BURST):
+			return rte_ring_dequeue_burst(r, obj, n, NULL);
+		case (TEST_RING_THREAD_SPSC | TEST_RING_ELEM_BURST):
+			return rte_ring_sc_dequeue_burst(r, obj, n, NULL);
+		case (TEST_RING_THREAD_MPMC | TEST_RING_ELEM_BURST):
+			return rte_ring_mc_dequeue_burst(r, obj, n, NULL);
+		default:
+			printf("Invalid API type\n");
+			return 0;
+		}
+	else
+		switch (api_type) {
+		case (TEST_RING_THREAD_DEF | TEST_RING_ELEM_SINGLE):
+			return rte_ring_dequeue_elem(r, obj, esize);
+		case (TEST_RING_THREAD_SPSC | TEST_RING_ELEM_SINGLE):
+			return rte_ring_sc_dequeue_elem(r, obj, esize);
+		case (TEST_RING_THREAD_MPMC | TEST_RING_ELEM_SINGLE):
+			return rte_ring_mc_dequeue_elem(r, obj, esize);
+		case (TEST_RING_THREAD_DEF | TEST_RING_ELEM_BULK):
+			return rte_ring_dequeue_bulk_elem(r, obj, esize,
+								n, NULL);
+		case (TEST_RING_THREAD_SPSC | TEST_RING_ELEM_BULK):
+			return rte_ring_sc_dequeue_bulk_elem(r, obj, esize,
+								n, NULL);
+		case (TEST_RING_THREAD_MPMC | TEST_RING_ELEM_BULK):
+			return rte_ring_mc_dequeue_bulk_elem(r, obj, esize,
+								n, NULL);
+		case (TEST_RING_THREAD_DEF | TEST_RING_ELEM_BURST):
+			return rte_ring_dequeue_burst_elem(r, obj, esize,
+								n, NULL);
+		case (TEST_RING_THREAD_SPSC | TEST_RING_ELEM_BURST):
+			return rte_ring_sc_dequeue_burst_elem(r, obj, esize,
+								n, NULL);
+		case (TEST_RING_THREAD_MPMC | TEST_RING_ELEM_BURST):
+			return rte_ring_mc_dequeue_burst_elem(r, obj, esize,
+								n, NULL);
+		default:
+			printf("Invalid API type\n");
+			return 0;
+		}
+}
+
+/* This function is placed here as it is required for both
+ * performance and functional tests.
+ */
+static __rte_always_inline void *
+test_ring_calloc(unsigned int rsize, int esize)
+{
+	unsigned int sz;
+	void *p;
+
+	/* Legacy queue APIs? */
+	if (esize == -1)
+		sz = sizeof(void *);
+	else
+		sz = esize;
+
+	p = rte_zmalloc(NULL, rsize * sz, RTE_CACHE_LINE_SIZE);
+	if (p == NULL)
+		printf("Failed to allocate memory\n");
+
+	return p;
+}
-- 
2.17.1
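
Note for readers of the test changes above: the api_type argument passed to
test_ring_enqueue()/test_ring_dequeue() is the bitwise OR of one
TEST_RING_THREAD_* value and one TEST_RING_ELEM_* value, and test_ring()
visits the bulk/burst combinations by left-shifting each flag. The following
is only a standalone sketch of that walk (it does not need DPDK). The macro
values are copied from test_ring.h; the helper names thread_name(),
elem_name() and walk_burst_bulk_api_types() are hypothetical and are not part
of the patch.

```c
#include <assert.h>
#include <stdio.h>

/* Values copied from app/test/test_ring.h in this patch. */
#define TEST_RING_THREAD_DEF 1
#define TEST_RING_THREAD_SPSC 2
#define TEST_RING_THREAD_MPMC 4
#define TEST_RING_ELEM_SINGLE 8
#define TEST_RING_ELEM_BULK 16
#define TEST_RING_ELEM_BURST 32

/* Hypothetical helper: name the thread-mode bits of api_type. */
static const char *
thread_name(unsigned int api_type)
{
	switch (api_type & (TEST_RING_THREAD_DEF | TEST_RING_THREAD_SPSC |
				TEST_RING_THREAD_MPMC)) {
	case TEST_RING_THREAD_DEF:
		return "default";
	case TEST_RING_THREAD_SPSC:
		return "SP/SC";
	case TEST_RING_THREAD_MPMC:
		return "MP/MC";
	default:
		return "invalid";
	}
}

/* Hypothetical helper: name the element-mode bits of api_type. */
static const char *
elem_name(unsigned int api_type)
{
	switch (api_type & (TEST_RING_ELEM_SINGLE | TEST_RING_ELEM_BULK |
				TEST_RING_ELEM_BURST)) {
	case TEST_RING_ELEM_SINGLE:
		return "single";
	case TEST_RING_ELEM_BULK:
		return "bulk";
	case TEST_RING_ELEM_BURST:
		return "burst";
	default:
		return "invalid";
	}
}

/*
 * Same loop shape as in test_ring(): j walks BULK, BURST and i walks
 * DEF, SPSC, MPMC. Returns the number of combinations visited.
 */
unsigned int
walk_burst_bulk_api_types(void)
{
	unsigned int i, j, n = 0;

	for (j = TEST_RING_ELEM_BULK; j <= TEST_RING_ELEM_BURST; j <<= 1)
		for (i = TEST_RING_THREAD_DEF;
					i <= TEST_RING_THREAD_MPMC; i <<= 1) {
			/* thread and element flags use disjoint bits */
			assert((i & j) == 0);
			printf("api_type 0x%02x: %s %s\n", i | j,
				thread_name(i | j), elem_name(i | j));
			n++;
		}

	return n;
}
```

Because each flag is a distinct power of two, (i | j) identifies exactly one
case label in the switch statements of test_ring_enqueue() and
test_ring_dequeue(); any other value falls through to "Invalid API type".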



* [dpdk-dev] [PATCH v10 4/6] test/ring: modify perf test cases to use rte_ring_xxx_elem APIs
  2020-01-18 19:32   ` [dpdk-dev] [PATCH v10 " Honnappa Nagarahalli
                       ` (2 preceding siblings ...)
  2020-01-18 19:32     ` [dpdk-dev] [PATCH v10 3/6] test/ring: add functional tests for rte_ring_xxx_elem APIs Honnappa Nagarahalli
@ 2020-01-18 19:32     ` Honnappa Nagarahalli
  2020-01-18 19:32     ` [dpdk-dev] [PATCH v10 5/6] lib/hash: use ring with 32b element size to save memory Honnappa Nagarahalli
                       ` (2 subsequent siblings)
  6 siblings, 0 replies; 173+ messages in thread
From: Honnappa Nagarahalli @ 2020-01-18 19:32 UTC (permalink / raw)
  To: olivier.matz, sthemmin, jerinj, bruce.richardson, david.marchand,
	pbhagavatula, konstantin.ananyev, yipeng1.wang, drc,
	honnappa.nagarahalli
  Cc: dev, dharmik.thakkar, ruifeng.wang, gavin.hu, nd

Adjust the performance test cases to test rte_ring_xxx_elem APIs.

Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Gavin Hu <gavin.hu@arm.com>
---
 app/test/test_ring_perf.c | 478 +++++++++++++++++++++++---------------
 1 file changed, 285 insertions(+), 193 deletions(-)

diff --git a/app/test/test_ring_perf.c b/app/test/test_ring_perf.c
index 6c2aca483..ce23ee737 100644
--- a/app/test/test_ring_perf.c
+++ b/app/test/test_ring_perf.c
@@ -13,16 +13,11 @@
 #include <string.h>
 
 #include "test.h"
+#include "test_ring.h"
 
 /*
- * Ring
- * ====
- *
- * Measures performance of various operations using rdtsc
- *  * Empty ring dequeue
- *  * Enqueue/dequeue of bursts in 1 threads
- *  * Enqueue/dequeue of bursts in 2 threads
- *  * Enqueue/dequeue of bursts in all available threads
+ * Ring performance test cases. Measures the performance of various
+ * operations using rdtsc, for legacy and 16B ring elements.
  */
 
 #define RING_NAME "RING_PERF"
@@ -41,6 +36,35 @@ struct lcore_pair {
 
 static volatile unsigned lcore_count = 0;
 
+static void
+test_ring_print_test_string(unsigned int api_type, int esize,
+	unsigned int bsz, double value)
+{
+	if (esize == -1)
+		printf("legacy APIs");
+	else
+		printf("elem APIs: element size %dB", esize);
+
+	if (api_type == TEST_RING_IGNORE_API_TYPE)
+		return;
+
+	if ((api_type & TEST_RING_THREAD_DEF) == TEST_RING_THREAD_DEF)
+		printf(": default enqueue/dequeue: ");
+	else if ((api_type & TEST_RING_THREAD_SPSC) == TEST_RING_THREAD_SPSC)
+		printf(": SP/SC: ");
+	else if ((api_type & TEST_RING_THREAD_MPMC) == TEST_RING_THREAD_MPMC)
+		printf(": MP/MC: ");
+
+	if ((api_type & TEST_RING_ELEM_SINGLE) == TEST_RING_ELEM_SINGLE)
+		printf("single: ");
+	else if ((api_type & TEST_RING_ELEM_BULK) == TEST_RING_ELEM_BULK)
+		printf("bulk (size: %u): ", bsz);
+	else if ((api_type & TEST_RING_ELEM_BURST) == TEST_RING_ELEM_BURST)
+		printf("burst (size: %u): ", bsz);
+
+	printf("%.2F\n", value);
+}
+
 /**** Functions to analyse our core mask to get cores for different tests ***/
 
 static int
@@ -117,27 +141,21 @@ get_two_sockets(struct lcore_pair *lcp)
 
 /* Get cycle counts for dequeuing from an empty ring. Should be 2 or 3 cycles */
 static void
-test_empty_dequeue(struct rte_ring *r)
+test_empty_dequeue(struct rte_ring *r, const int esize,
+			const unsigned int api_type)
 {
-	const unsigned iter_shift = 26;
-	const unsigned iterations = 1<<iter_shift;
-	unsigned i = 0;
+	const unsigned int iter_shift = 26;
+	const unsigned int iterations = 1 << iter_shift;
+	unsigned int i = 0;
 	void *burst[MAX_BURST];
 
-	const uint64_t sc_start = rte_rdtsc();
+	const uint64_t start = rte_rdtsc();
 	for (i = 0; i < iterations; i++)
-		rte_ring_sc_dequeue_bulk(r, burst, bulk_sizes[0], NULL);
-	const uint64_t sc_end = rte_rdtsc();
+		test_ring_dequeue(r, burst, esize, bulk_sizes[0], api_type);
+	const uint64_t end = rte_rdtsc();
 
-	const uint64_t mc_start = rte_rdtsc();
-	for (i = 0; i < iterations; i++)
-		rte_ring_mc_dequeue_bulk(r, burst, bulk_sizes[0], NULL);
-	const uint64_t mc_end = rte_rdtsc();
-
-	printf("SC empty dequeue: %.2F\n",
-			(double)(sc_end-sc_start) / iterations);
-	printf("MC empty dequeue: %.2F\n",
-			(double)(mc_end-mc_start) / iterations);
+	test_ring_print_test_string(api_type, esize, bulk_sizes[0],
+					((double)(end - start)) / iterations);
 }
 
 /*
@@ -151,19 +169,21 @@ struct thread_params {
 };
 
 /*
- * Function that uses rdtsc to measure timing for ring enqueue. Needs pair
- * thread running dequeue_bulk function
+ * Helper function to call bulk SP/SC and MP/MC enqueue or dequeue functions.
+ * flag == 0 -> enqueue
+ * flag == 1 -> dequeue
  */
-static int
-enqueue_bulk(void *p)
+static __rte_always_inline int
+enqueue_dequeue_bulk_helper(const unsigned int flag, const int esize,
+	struct thread_params *p)
 {
-	const unsigned iter_shift = 23;
-	const unsigned iterations = 1<<iter_shift;
-	struct thread_params *params = p;
-	struct rte_ring *r = params->r;
-	const unsigned size = params->size;
-	unsigned i;
-	void *burst[MAX_BURST] = {0};
+	int ret;
+	const unsigned int iter_shift = 23;
+	const unsigned int iterations = 1 << iter_shift;
+	struct rte_ring *r = p->r;
+	unsigned int bsize = p->size;
+	unsigned int i;
+	void *burst = NULL;
 
 #ifdef RTE_USE_C11_MEM_MODEL
 	if (__atomic_add_fetch(&lcore_count, 1, __ATOMIC_RELAXED) != 2)
@@ -173,23 +193,67 @@ enqueue_bulk(void *p)
 		while(lcore_count != 2)
 			rte_pause();
 
+	burst = test_ring_calloc(MAX_BURST, esize);
+	if (burst == NULL)
+		return -1;
+
 	const uint64_t sp_start = rte_rdtsc();
 	for (i = 0; i < iterations; i++)
-		while (rte_ring_sp_enqueue_bulk(r, burst, size, NULL) == 0)
-			rte_pause();
+		do {
+			if (flag == 0)
+				ret = test_ring_enqueue(r, burst, esize, bsize,
+						TEST_RING_THREAD_SPSC |
+						TEST_RING_ELEM_BULK);
+			else if (flag == 1)
+				ret = test_ring_dequeue(r, burst, esize, bsize,
+						TEST_RING_THREAD_SPSC |
+						TEST_RING_ELEM_BULK);
+			if (ret == 0)
+				rte_pause();
+		} while (!ret);
 	const uint64_t sp_end = rte_rdtsc();
 
 	const uint64_t mp_start = rte_rdtsc();
 	for (i = 0; i < iterations; i++)
-		while (rte_ring_mp_enqueue_bulk(r, burst, size, NULL) == 0)
-			rte_pause();
+		do {
+			if (flag == 0)
+				ret = test_ring_enqueue(r, burst, esize, bsize,
+						TEST_RING_THREAD_MPMC |
+						TEST_RING_ELEM_BULK);
+			else if (flag == 1)
+				ret = test_ring_dequeue(r, burst, esize, bsize,
+						TEST_RING_THREAD_MPMC |
+						TEST_RING_ELEM_BULK);
+			if (ret == 0)
+				rte_pause();
+		} while (!ret);
 	const uint64_t mp_end = rte_rdtsc();
 
-	params->spsc = ((double)(sp_end - sp_start))/(iterations*size);
-	params->mpmc = ((double)(mp_end - mp_start))/(iterations*size);
+	p->spsc = ((double)(sp_end - sp_start))/(iterations * bsize);
+	p->mpmc = ((double)(mp_end - mp_start))/(iterations * bsize);
 	return 0;
 }
 
+/*
+ * Function that uses rdtsc to measure timing for ring enqueue. Needs pair
+ * thread running dequeue_bulk function
+ */
+static int
+enqueue_bulk(void *p)
+{
+	struct thread_params *params = p;
+
+	return enqueue_dequeue_bulk_helper(0, -1, params);
+}
+
+static int
+enqueue_bulk_16B(void *p)
+{
+	struct thread_params *params = p;
+
+	return enqueue_dequeue_bulk_helper(0, 16, params);
+}
+
 /*
  * Function that uses rdtsc to measure timing for ring dequeue. Needs pair
  * thread running enqueue_bulk function
@@ -197,49 +261,38 @@ enqueue_bulk(void *p)
 static int
 dequeue_bulk(void *p)
 {
-	const unsigned iter_shift = 23;
-	const unsigned iterations = 1<<iter_shift;
 	struct thread_params *params = p;
-	struct rte_ring *r = params->r;
-	const unsigned size = params->size;
-	unsigned i;
-	void *burst[MAX_BURST] = {0};
-
-#ifdef RTE_USE_C11_MEM_MODEL
-	if (__atomic_add_fetch(&lcore_count, 1, __ATOMIC_RELAXED) != 2)
-#else
-	if (__sync_add_and_fetch(&lcore_count, 1) != 2)
-#endif
-		while(lcore_count != 2)
-			rte_pause();
 
-	const uint64_t sc_start = rte_rdtsc();
-	for (i = 0; i < iterations; i++)
-		while (rte_ring_sc_dequeue_bulk(r, burst, size, NULL) == 0)
-			rte_pause();
-	const uint64_t sc_end = rte_rdtsc();
+	return enqueue_dequeue_bulk_helper(1, -1, params);
+}
 
-	const uint64_t mc_start = rte_rdtsc();
-	for (i = 0; i < iterations; i++)
-		while (rte_ring_mc_dequeue_bulk(r, burst, size, NULL) == 0)
-			rte_pause();
-	const uint64_t mc_end = rte_rdtsc();
+static int
+dequeue_bulk_16B(void *p)
+{
+	struct thread_params *params = p;
 
-	params->spsc = ((double)(sc_end - sc_start))/(iterations*size);
-	params->mpmc = ((double)(mc_end - mc_start))/(iterations*size);
-	return 0;
+	return enqueue_dequeue_bulk_helper(1, 16, params);
 }
 
 /*
  * Function that calls the enqueue and dequeue bulk functions on pairs of cores.
  * used to measure ring perf between hyperthreads, cores and sockets.
  */
-static void
-run_on_core_pair(struct lcore_pair *cores, struct rte_ring *r,
-		lcore_function_t f1, lcore_function_t f2)
+static int
+run_on_core_pair(struct lcore_pair *cores, struct rte_ring *r, const int esize)
 {
+	lcore_function_t *f1, *f2;
 	struct thread_params param1 = {0}, param2 = {0};
 	unsigned i;
+
+	if (esize == -1) {
+		f1 = enqueue_bulk;
+		f2 = dequeue_bulk;
+	} else {
+		f1 = enqueue_bulk_16B;
+		f2 = dequeue_bulk_16B;
+	}
+
 	for (i = 0; i < sizeof(bulk_sizes)/sizeof(bulk_sizes[0]); i++) {
 		lcore_count = 0;
 		param1.size = param2.size = bulk_sizes[i];
@@ -251,14 +304,20 @@ run_on_core_pair(struct lcore_pair *cores, struct rte_ring *r,
 		} else {
 			rte_eal_remote_launch(f1, &param1, cores->c1);
 			rte_eal_remote_launch(f2, &param2, cores->c2);
-			rte_eal_wait_lcore(cores->c1);
-			rte_eal_wait_lcore(cores->c2);
+			if (rte_eal_wait_lcore(cores->c1) < 0)
+				return -1;
+			if (rte_eal_wait_lcore(cores->c2) < 0)
+				return -1;
 		}
-		printf("SP/SC bulk enq/dequeue (size: %u): %.2F\n", bulk_sizes[i],
-				param1.spsc + param2.spsc);
-		printf("MP/MC bulk enq/dequeue (size: %u): %.2F\n", bulk_sizes[i],
-				param1.mpmc + param2.mpmc);
+		test_ring_print_test_string(
+			TEST_RING_THREAD_SPSC | TEST_RING_ELEM_BULK,
+			esize, bulk_sizes[i], param1.spsc + param2.spsc);
+		test_ring_print_test_string(
+			TEST_RING_THREAD_MPMC | TEST_RING_ELEM_BULK,
+			esize, bulk_sizes[i], param1.mpmc + param2.mpmc);
 	}
+
+	return 0;
 }
 
 static rte_atomic32_t synchro;
@@ -267,7 +326,7 @@ static uint64_t queue_count[RTE_MAX_LCORE];
 #define TIME_MS 100
 
 static int
-load_loop_fn(void *p)
+load_loop_fn_helper(struct thread_params *p, const int esize)
 {
 	uint64_t time_diff = 0;
 	uint64_t begin = 0;
@@ -275,7 +334,11 @@ load_loop_fn(void *p)
 	uint64_t lcount = 0;
 	const unsigned int lcore = rte_lcore_id();
 	struct thread_params *params = p;
-	void *burst[MAX_BURST] = {0};
+	void *burst = NULL;
+
+	burst = test_ring_calloc(MAX_BURST, esize);
+	if (burst == NULL)
+		return -1;
 
 	/* wait synchro for slaves */
 	if (lcore != rte_get_master_lcore())
@@ -284,22 +347,49 @@ load_loop_fn(void *p)
 
 	begin = rte_get_timer_cycles();
 	while (time_diff < hz * TIME_MS / 1000) {
-		rte_ring_mp_enqueue_bulk(params->r, burst, params->size, NULL);
-		rte_ring_mc_dequeue_bulk(params->r, burst, params->size, NULL);
+		test_ring_enqueue(params->r, burst, esize, params->size,
+				TEST_RING_THREAD_MPMC | TEST_RING_ELEM_BULK);
+		test_ring_dequeue(params->r, burst, esize, params->size,
+				TEST_RING_THREAD_MPMC | TEST_RING_ELEM_BULK);
 		lcount++;
 		time_diff = rte_get_timer_cycles() - begin;
 	}
 	queue_count[lcore] = lcount;
+
+	rte_free(burst);
+
 	return 0;
 }
 
 static int
-run_on_all_cores(struct rte_ring *r)
+load_loop_fn(void *p)
+{
+	struct thread_params *params = p;
+
+	return load_loop_fn_helper(params, -1);
+}
+
+static int
+load_loop_fn_16B(void *p)
+{
+	struct thread_params *params = p;
+
+	return load_loop_fn_helper(params, 16);
+}
+
+static int
+run_on_all_cores(struct rte_ring *r, const int esize)
 {
 	uint64_t total = 0;
 	struct thread_params param;
+	lcore_function_t *lcore_f;
 	unsigned int i, c;
 
+	if (esize == -1)
+		lcore_f = load_loop_fn;
+	else
+		lcore_f = load_loop_fn_16B;
+
 	memset(&param, 0, sizeof(struct thread_params));
 	for (i = 0; i < RTE_DIM(bulk_sizes); i++) {
 		printf("\nBulk enq/dequeue count on size %u\n", bulk_sizes[i]);
@@ -308,13 +398,12 @@ run_on_all_cores(struct rte_ring *r)
 
 		/* clear synchro and start slaves */
 		rte_atomic32_set(&synchro, 0);
-		if (rte_eal_mp_remote_launch(load_loop_fn, &param,
-			SKIP_MASTER) < 0)
+		if (rte_eal_mp_remote_launch(lcore_f, &param, SKIP_MASTER) < 0)
 			return -1;
 
 		/* start synchro and launch test on master */
 		rte_atomic32_set(&synchro, 1);
-		load_loop_fn(&param);
+		lcore_f(&param);
 
 		rte_eal_mp_wait_lcore();
 
@@ -335,155 +424,158 @@ run_on_all_cores(struct rte_ring *r)
  * Test function that determines how long an enqueue + dequeue of a single item
  * takes on a single lcore. Result is for comparison with the bulk enq+deq.
  */
-static void
-test_single_enqueue_dequeue(struct rte_ring *r)
+static int
+test_single_enqueue_dequeue(struct rte_ring *r, const int esize,
+	const unsigned int api_type)
 {
-	const unsigned iter_shift = 24;
-	const unsigned iterations = 1<<iter_shift;
-	unsigned i = 0;
+	const unsigned int iter_shift = 24;
+	const unsigned int iterations = 1 << iter_shift;
+	unsigned int i = 0;
 	void *burst = NULL;
 
-	const uint64_t sc_start = rte_rdtsc();
-	for (i = 0; i < iterations; i++) {
-		rte_ring_sp_enqueue(r, burst);
-		rte_ring_sc_dequeue(r, &burst);
-	}
-	const uint64_t sc_end = rte_rdtsc();
+	/* alloc dummy object pointers */
+	burst = test_ring_calloc(1, esize);
+	if (burst == NULL)
+		return -1;
 
-	const uint64_t mc_start = rte_rdtsc();
+	const uint64_t start = rte_rdtsc();
 	for (i = 0; i < iterations; i++) {
-		rte_ring_mp_enqueue(r, burst);
-		rte_ring_mc_dequeue(r, &burst);
+		test_ring_enqueue(r, burst, esize, 1, api_type);
+		test_ring_dequeue(r, burst, esize, 1, api_type);
 	}
-	const uint64_t mc_end = rte_rdtsc();
+	const uint64_t end = rte_rdtsc();
+
+	test_ring_print_test_string(api_type, esize, 1,
+					((double)(end - start)) / iterations);
 
-	printf("SP/SC single enq/dequeue: %.2F\n",
-			((double)(sc_end-sc_start)) / iterations);
-	printf("MP/MC single enq/dequeue: %.2F\n",
-			((double)(mc_end-mc_start)) / iterations);
+	rte_free(burst);
+
+	return 0;
 }
 
 /*
- * Test that does both enqueue and dequeue on a core using the burst() API calls
- * instead of the bulk() calls used in other tests. Results should be the same
- * as for the bulk function called on a single lcore.
+ * Test that does both enqueue and dequeue on a core using the burst/bulk API
+ * calls. Results should be the same as for the bulk function called on a
+ * single lcore.
  */
-static void
-test_burst_enqueue_dequeue(struct rte_ring *r)
+static int
+test_burst_bulk_enqueue_dequeue(struct rte_ring *r, const int esize,
+	const unsigned int api_type)
 {
-	const unsigned iter_shift = 23;
-	const unsigned iterations = 1<<iter_shift;
-	unsigned sz, i = 0;
-	void *burst[MAX_BURST] = {0};
+	const unsigned int iter_shift = 23;
+	const unsigned int iterations = 1 << iter_shift;
+	unsigned int sz, i = 0;
+	void **burst = NULL;
 
-	for (sz = 0; sz < sizeof(bulk_sizes)/sizeof(bulk_sizes[0]); sz++) {
-		const uint64_t sc_start = rte_rdtsc();
-		for (i = 0; i < iterations; i++) {
-			rte_ring_sp_enqueue_burst(r, burst,
-					bulk_sizes[sz], NULL);
-			rte_ring_sc_dequeue_burst(r, burst,
-					bulk_sizes[sz], NULL);
-		}
-		const uint64_t sc_end = rte_rdtsc();
+	burst = test_ring_calloc(MAX_BURST, esize);
+	if (burst == NULL)
+		return -1;
 
-		const uint64_t mc_start = rte_rdtsc();
+	for (sz = 0; sz < RTE_DIM(bulk_sizes); sz++) {
+		const uint64_t start = rte_rdtsc();
 		for (i = 0; i < iterations; i++) {
-			rte_ring_mp_enqueue_burst(r, burst,
-					bulk_sizes[sz], NULL);
-			rte_ring_mc_dequeue_burst(r, burst,
-					bulk_sizes[sz], NULL);
+			test_ring_enqueue(r, burst, esize, bulk_sizes[sz],
+						api_type);
+			test_ring_dequeue(r, burst, esize, bulk_sizes[sz],
+						api_type);
 		}
-		const uint64_t mc_end = rte_rdtsc();
-
-		double mc_avg = ((double)(mc_end-mc_start) / iterations) /
-					bulk_sizes[sz];
-		double sc_avg = ((double)(sc_end-sc_start) / iterations) /
-					bulk_sizes[sz];
+		const uint64_t end = rte_rdtsc();
 
-		printf("SP/SC burst enq/dequeue (size: %u): %.2F\n",
-				bulk_sizes[sz], sc_avg);
-		printf("MP/MC burst enq/dequeue (size: %u): %.2F\n",
-				bulk_sizes[sz], mc_avg);
+		test_ring_print_test_string(api_type, esize, bulk_sizes[sz],
+					((double)(end - start)) / iterations);
 	}
-}
 
-/* Times enqueue and dequeue on a single lcore */
-static void
-test_bulk_enqueue_dequeue(struct rte_ring *r)
-{
-	const unsigned iter_shift = 23;
-	const unsigned iterations = 1<<iter_shift;
-	unsigned sz, i = 0;
-	void *burst[MAX_BURST] = {0};
-
-	for (sz = 0; sz < sizeof(bulk_sizes)/sizeof(bulk_sizes[0]); sz++) {
-		const uint64_t sc_start = rte_rdtsc();
-		for (i = 0; i < iterations; i++) {
-			rte_ring_sp_enqueue_bulk(r, burst,
-					bulk_sizes[sz], NULL);
-			rte_ring_sc_dequeue_bulk(r, burst,
-					bulk_sizes[sz], NULL);
-		}
-		const uint64_t sc_end = rte_rdtsc();
-
-		const uint64_t mc_start = rte_rdtsc();
-		for (i = 0; i < iterations; i++) {
-			rte_ring_mp_enqueue_bulk(r, burst,
-					bulk_sizes[sz], NULL);
-			rte_ring_mc_dequeue_bulk(r, burst,
-					bulk_sizes[sz], NULL);
-		}
-		const uint64_t mc_end = rte_rdtsc();
-
-		double sc_avg = ((double)(sc_end-sc_start) /
-				(iterations * bulk_sizes[sz]));
-		double mc_avg = ((double)(mc_end-mc_start) /
-				(iterations * bulk_sizes[sz]));
+	rte_free(burst);
 
-		printf("SP/SC bulk enq/dequeue (size: %u): %.2F\n", bulk_sizes[sz],
-				sc_avg);
-		printf("MP/MC bulk enq/dequeue (size: %u): %.2F\n", bulk_sizes[sz],
-				mc_avg);
-	}
+	return 0;
 }
 
-static int
-test_ring_perf(void)
+/* Run all tests for a given element size */
+static __rte_always_inline int
+test_ring_perf_esize(const int esize)
 {
 	struct lcore_pair cores;
 	struct rte_ring *r = NULL;
 
-	r = rte_ring_create(RING_NAME, RING_SIZE, rte_socket_id(), 0);
+	/*
+	 * Performance test for legacy/_elem APIs
+	 * SP-SC/MP-MC, single
+	 */
+	r = test_ring_create(RING_NAME, esize, RING_SIZE, rte_socket_id(), 0);
 	if (r == NULL)
-		return -1;
-
-	printf("### Testing single element and burst enq/deq ###\n");
-	test_single_enqueue_dequeue(r);
-	test_burst_enqueue_dequeue(r);
-
-	printf("\n### Testing empty dequeue ###\n");
-	test_empty_dequeue(r);
-
-	printf("\n### Testing using a single lcore ###\n");
-	test_bulk_enqueue_dequeue(r);
+		goto test_fail;
+
+	printf("\n### Testing single element enq/deq ###\n");
+	if (test_single_enqueue_dequeue(r, esize,
+			TEST_RING_THREAD_SPSC | TEST_RING_ELEM_SINGLE) < 0)
+		goto test_fail;
+	if (test_single_enqueue_dequeue(r, esize,
+			TEST_RING_THREAD_MPMC | TEST_RING_ELEM_SINGLE) < 0)
+		goto test_fail;
+
+	printf("\n### Testing burst enq/deq ###\n");
+	if (test_burst_bulk_enqueue_dequeue(r, esize,
+			TEST_RING_THREAD_SPSC | TEST_RING_ELEM_BURST) < 0)
+		goto test_fail;
+	if (test_burst_bulk_enqueue_dequeue(r, esize,
+			TEST_RING_THREAD_MPMC | TEST_RING_ELEM_BURST) < 0)
+		goto test_fail;
+
+	printf("\n### Testing bulk enq/deq ###\n");
+	if (test_burst_bulk_enqueue_dequeue(r, esize,
+			TEST_RING_THREAD_SPSC | TEST_RING_ELEM_BULK) < 0)
+		goto test_fail;
+	if (test_burst_bulk_enqueue_dequeue(r, esize,
+			TEST_RING_THREAD_MPMC | TEST_RING_ELEM_BULK) < 0)
+		goto test_fail;
+
+	printf("\n### Testing empty bulk deq ###\n");
+	test_empty_dequeue(r, esize,
+			TEST_RING_THREAD_SPSC | TEST_RING_ELEM_BULK);
+	test_empty_dequeue(r, esize,
+			TEST_RING_THREAD_MPMC | TEST_RING_ELEM_BULK);
 
 	if (get_two_hyperthreads(&cores) == 0) {
 		printf("\n### Testing using two hyperthreads ###\n");
-		run_on_core_pair(&cores, r, enqueue_bulk, dequeue_bulk);
+		if (run_on_core_pair(&cores, r, esize) < 0)
+			goto test_fail;
 	}
+
 	if (get_two_cores(&cores) == 0) {
 		printf("\n### Testing using two physical cores ###\n");
-		run_on_core_pair(&cores, r, enqueue_bulk, dequeue_bulk);
+		if (run_on_core_pair(&cores, r, esize) < 0)
+			goto test_fail;
 	}
 	if (get_two_sockets(&cores) == 0) {
 		printf("\n### Testing using two NUMA nodes ###\n");
-		run_on_core_pair(&cores, r, enqueue_bulk, dequeue_bulk);
+		if (run_on_core_pair(&cores, r, esize) < 0)
+			goto test_fail;
 	}
 
 	printf("\n### Testing using all slave nodes ###\n");
-	run_on_all_cores(r);
+	if (run_on_all_cores(r, esize) < 0)
+		goto test_fail;
+
+	rte_ring_free(r);
+
+	return 0;
 
+test_fail:
 	rte_ring_free(r);
+
+	return -1;
+}
+
+static int
+test_ring_perf(void)
+{
+	/* Run all the tests for different element sizes */
+	if (test_ring_perf_esize(-1) == -1)
+		return -1;
+
+	if (test_ring_perf_esize(16) == -1)
+		return -1;
+
 	return 0;
 }
 
-- 
2.17.1


* [dpdk-dev] [PATCH v10 5/6] lib/hash: use ring with 32b element size to save memory
  2020-01-18 19:32   ` [dpdk-dev] [PATCH v10 " Honnappa Nagarahalli
                       ` (3 preceding siblings ...)
  2020-01-18 19:32     ` [dpdk-dev] [PATCH v10 4/6] test/ring: modify perf test cases to use " Honnappa Nagarahalli
@ 2020-01-18 19:32     ` Honnappa Nagarahalli
  2020-01-18 19:32     ` [dpdk-dev] [PATCH v10 6/6] eventdev: use custom element size ring for event rings Honnappa Nagarahalli
  2020-01-19 19:31     ` [dpdk-dev] [PATCH v10 0/6] lib/ring: APIs to support custom element size David Marchand
  6 siblings, 0 replies; 173+ messages in thread
From: Honnappa Nagarahalli @ 2020-01-18 19:32 UTC (permalink / raw)
  To: olivier.matz, sthemmin, jerinj, bruce.richardson, david.marchand,
	pbhagavatula, konstantin.ananyev, yipeng1.wang, drc,
	honnappa.nagarahalli
  Cc: dev, dharmik.thakkar, ruifeng.wang, gavin.hu, nd

The freelist and external bucket indices are 32b. Storing them in rings
with 32b elements, instead of pointer-sized elements, saves memory.

Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Gavin Hu <gavin.hu@arm.com>
Reviewed-by: Ola Liljedahl <ola.liljedahl@arm.com>
Acked-by: Yipeng Wang <yipeng1.wang@intel.com>
---
 lib/librte_hash/rte_cuckoo_hash.c | 94 ++++++++++++++++---------------
 lib/librte_hash/rte_cuckoo_hash.h |  2 +-
 2 files changed, 50 insertions(+), 46 deletions(-)

diff --git a/lib/librte_hash/rte_cuckoo_hash.c b/lib/librte_hash/rte_cuckoo_hash.c
index 87a4c01f2..6c292b6f8 100644
--- a/lib/librte_hash/rte_cuckoo_hash.c
+++ b/lib/librte_hash/rte_cuckoo_hash.c
@@ -24,7 +24,7 @@
 #include <rte_cpuflags.h>
 #include <rte_rwlock.h>
 #include <rte_spinlock.h>
-#include <rte_ring.h>
+#include <rte_ring_elem.h>
 #include <rte_compat.h>
 #include <rte_vect.h>
 #include <rte_tailq.h>
@@ -136,7 +136,6 @@ rte_hash_create(const struct rte_hash_parameters *params)
 	char ring_name[RTE_RING_NAMESIZE];
 	char ext_ring_name[RTE_RING_NAMESIZE];
 	unsigned num_key_slots;
-	unsigned i;
 	unsigned int hw_trans_mem_support = 0, use_local_cache = 0;
 	unsigned int ext_table_support = 0;
 	unsigned int readwrite_concur_support = 0;
@@ -145,6 +144,7 @@ rte_hash_create(const struct rte_hash_parameters *params)
 	uint32_t *ext_bkt_to_free = NULL;
 	uint32_t *tbl_chng_cnt = NULL;
 	unsigned int readwrite_concur_lf_support = 0;
+	uint32_t i;
 
 	rte_hash_function default_hash_func = (rte_hash_function)rte_jhash;
 
@@ -213,8 +213,8 @@ rte_hash_create(const struct rte_hash_parameters *params)
 
 	snprintf(ring_name, sizeof(ring_name), "HT_%s", params->name);
 	/* Create ring (Dummy slot index is not enqueued) */
-	r = rte_ring_create(ring_name, rte_align32pow2(num_key_slots),
-			params->socket_id, 0);
+	r = rte_ring_create_elem(ring_name, sizeof(uint32_t),
+			rte_align32pow2(num_key_slots), params->socket_id, 0);
 	if (r == NULL) {
 		RTE_LOG(ERR, HASH, "memory allocation failed\n");
 		goto err;
@@ -227,7 +227,7 @@ rte_hash_create(const struct rte_hash_parameters *params)
 	if (ext_table_support) {
 		snprintf(ext_ring_name, sizeof(ext_ring_name), "HT_EXT_%s",
 								params->name);
-		r_ext = rte_ring_create(ext_ring_name,
+		r_ext = rte_ring_create_elem(ext_ring_name, sizeof(uint32_t),
 				rte_align32pow2(num_buckets + 1),
 				params->socket_id, 0);
 
@@ -295,7 +295,7 @@ rte_hash_create(const struct rte_hash_parameters *params)
 		 * for next bucket
 		 */
 		for (i = 1; i <= num_buckets; i++)
-			rte_ring_sp_enqueue(r_ext, (void *)((uintptr_t) i));
+			rte_ring_sp_enqueue_elem(r_ext, &i, sizeof(uint32_t));
 
 		if (readwrite_concur_lf_support) {
 			ext_bkt_to_free = rte_zmalloc(NULL, sizeof(uint32_t) *
@@ -434,7 +434,7 @@ rte_hash_create(const struct rte_hash_parameters *params)
 
 	/* Populate free slots ring. Entry zero is reserved for key misses. */
 	for (i = 1; i < num_key_slots; i++)
-		rte_ring_sp_enqueue(r, (void *)((uintptr_t) i));
+		rte_ring_sp_enqueue_elem(r, &i, sizeof(uint32_t));
 
 	te->data = (void *) h;
 	TAILQ_INSERT_TAIL(hash_list, te, next);
@@ -598,13 +598,13 @@ rte_hash_reset(struct rte_hash *h)
 		tot_ring_cnt = h->entries;
 
 	for (i = 1; i < tot_ring_cnt + 1; i++)
-		rte_ring_sp_enqueue(h->free_slots, (void *)((uintptr_t) i));
+		rte_ring_sp_enqueue_elem(h->free_slots, &i, sizeof(uint32_t));
 
 	/* Repopulate the free ext bkt ring. */
 	if (h->ext_table_support) {
 		for (i = 1; i <= h->num_buckets; i++)
-			rte_ring_sp_enqueue(h->free_ext_bkts,
-						(void *)((uintptr_t) i));
+			rte_ring_sp_enqueue_elem(h->free_ext_bkts, &i,
+							sizeof(uint32_t));
 	}
 
 	if (h->use_local_cache) {
@@ -623,13 +623,14 @@ rte_hash_reset(struct rte_hash *h)
 static inline void
 enqueue_slot_back(const struct rte_hash *h,
 		struct lcore_cache *cached_free_slots,
-		void *slot_id)
+		uint32_t slot_id)
 {
 	if (h->use_local_cache) {
 		cached_free_slots->objs[cached_free_slots->len] = slot_id;
 		cached_free_slots->len++;
 	} else
-		rte_ring_sp_enqueue(h->free_slots, slot_id);
+		rte_ring_sp_enqueue_elem(h->free_slots, &slot_id,
+						sizeof(uint32_t));
 }
 
 /* Search a key from bucket and update its data.
@@ -923,9 +924,8 @@ __rte_hash_add_key_with_hash(const struct rte_hash *h, const void *key,
 	uint32_t prim_bucket_idx, sec_bucket_idx;
 	struct rte_hash_bucket *prim_bkt, *sec_bkt, *cur_bkt;
 	struct rte_hash_key *new_k, *keys = h->key_store;
-	void *slot_id = NULL;
-	void *ext_bkt_id = NULL;
-	uint32_t new_idx, bkt_id;
+	uint32_t slot_id;
+	uint32_t ext_bkt_id;
 	int ret;
 	unsigned n_slots;
 	unsigned lcore_id;
@@ -968,8 +968,9 @@ __rte_hash_add_key_with_hash(const struct rte_hash *h, const void *key,
 		/* Try to get a free slot from the local cache */
 		if (cached_free_slots->len == 0) {
 			/* Need to get another burst of free slots from global ring */
-			n_slots = rte_ring_mc_dequeue_burst(h->free_slots,
+			n_slots = rte_ring_mc_dequeue_burst_elem(h->free_slots,
 					cached_free_slots->objs,
+					sizeof(uint32_t),
 					LCORE_CACHE_SIZE, NULL);
 			if (n_slots == 0) {
 				return -ENOSPC;
@@ -982,13 +983,13 @@ __rte_hash_add_key_with_hash(const struct rte_hash *h, const void *key,
 		cached_free_slots->len--;
 		slot_id = cached_free_slots->objs[cached_free_slots->len];
 	} else {
-		if (rte_ring_sc_dequeue(h->free_slots, &slot_id) != 0) {
+		if (rte_ring_sc_dequeue_elem(h->free_slots, &slot_id,
+						sizeof(uint32_t)) != 0) {
 			return -ENOSPC;
 		}
 	}
 
-	new_k = RTE_PTR_ADD(keys, (uintptr_t)slot_id * h->key_entry_size);
-	new_idx = (uint32_t)((uintptr_t) slot_id);
+	new_k = RTE_PTR_ADD(keys, slot_id * h->key_entry_size);
 	/* The store to application data (by the application) at *data should
 	 * not leak after the store of pdata in the key store. i.e. pdata is
 	 * the guard variable. Release the application data to the readers.
@@ -1001,9 +1002,9 @@ __rte_hash_add_key_with_hash(const struct rte_hash *h, const void *key,
 
 	/* Find an empty slot and insert */
 	ret = rte_hash_cuckoo_insert_mw(h, prim_bkt, sec_bkt, key, data,
-					short_sig, new_idx, &ret_val);
+					short_sig, slot_id, &ret_val);
 	if (ret == 0)
-		return new_idx - 1;
+		return slot_id - 1;
 	else if (ret == 1) {
 		enqueue_slot_back(h, cached_free_slots, slot_id);
 		return ret_val;
@@ -1011,9 +1012,9 @@ __rte_hash_add_key_with_hash(const struct rte_hash *h, const void *key,
 
 	/* Primary bucket full, need to make space for new entry */
 	ret = rte_hash_cuckoo_make_space_mw(h, prim_bkt, sec_bkt, key, data,
-				short_sig, prim_bucket_idx, new_idx, &ret_val);
+				short_sig, prim_bucket_idx, slot_id, &ret_val);
 	if (ret == 0)
-		return new_idx - 1;
+		return slot_id - 1;
 	else if (ret == 1) {
 		enqueue_slot_back(h, cached_free_slots, slot_id);
 		return ret_val;
@@ -1021,10 +1022,10 @@ __rte_hash_add_key_with_hash(const struct rte_hash *h, const void *key,
 
 	/* Also search secondary bucket to get better occupancy */
 	ret = rte_hash_cuckoo_make_space_mw(h, sec_bkt, prim_bkt, key, data,
-				short_sig, sec_bucket_idx, new_idx, &ret_val);
+				short_sig, sec_bucket_idx, slot_id, &ret_val);
 
 	if (ret == 0)
-		return new_idx - 1;
+		return slot_id - 1;
 	else if (ret == 1) {
 		enqueue_slot_back(h, cached_free_slots, slot_id);
 		return ret_val;
@@ -1067,10 +1068,10 @@ __rte_hash_add_key_with_hash(const struct rte_hash *h, const void *key,
 				 * and key.
 				 */
 				__atomic_store_n(&cur_bkt->key_idx[i],
-						 new_idx,
+						 slot_id,
 						 __ATOMIC_RELEASE);
 				__hash_rw_writer_unlock(h);
-				return new_idx - 1;
+				return slot_id - 1;
 			}
 		}
 	}
@@ -1078,26 +1079,26 @@ __rte_hash_add_key_with_hash(const struct rte_hash *h, const void *key,
 	/* Failed to get an empty entry from extendable buckets. Link a new
 	 * extendable bucket. We first get a free bucket from ring.
 	 */
-	if (rte_ring_sc_dequeue(h->free_ext_bkts, &ext_bkt_id) != 0) {
+	if (rte_ring_sc_dequeue_elem(h->free_ext_bkts, &ext_bkt_id,
+						sizeof(uint32_t)) != 0) {
 		ret = -ENOSPC;
 		goto failure;
 	}
 
-	bkt_id = (uint32_t)((uintptr_t)ext_bkt_id) - 1;
 	/* Use the first location of the new bucket */
-	(h->buckets_ext[bkt_id]).sig_current[0] = short_sig;
+	(h->buckets_ext[ext_bkt_id - 1]).sig_current[0] = short_sig;
 	/* Store to signature and key should not leak after
 	 * the store to key_idx. i.e. key_idx is the guard variable
 	 * for signature and key.
 	 */
-	__atomic_store_n(&(h->buckets_ext[bkt_id]).key_idx[0],
-			 new_idx,
+	__atomic_store_n(&(h->buckets_ext[ext_bkt_id - 1]).key_idx[0],
+			 slot_id,
 			 __ATOMIC_RELEASE);
 	/* Link the new bucket to sec bucket linked list */
 	last = rte_hash_get_last_bkt(sec_bkt);
-	last->next = &h->buckets_ext[bkt_id];
+	last->next = &h->buckets_ext[ext_bkt_id - 1];
 	__hash_rw_writer_unlock(h);
-	return new_idx - 1;
+	return slot_id - 1;
 
 failure:
 	__hash_rw_writer_unlock(h);
@@ -1373,8 +1374,9 @@ remove_entry(const struct rte_hash *h, struct rte_hash_bucket *bkt, unsigned i)
 		/* Cache full, need to free it. */
 		if (cached_free_slots->len == LCORE_CACHE_SIZE) {
 			/* Need to enqueue the free slots in global ring. */
-			n_slots = rte_ring_mp_enqueue_burst(h->free_slots,
+			n_slots = rte_ring_mp_enqueue_burst_elem(h->free_slots,
 						cached_free_slots->objs,
+						sizeof(uint32_t),
 						LCORE_CACHE_SIZE, NULL);
 			ERR_IF_TRUE((n_slots == 0),
 				"%s: could not enqueue free slots in global ring\n",
@@ -1383,11 +1385,11 @@ remove_entry(const struct rte_hash *h, struct rte_hash_bucket *bkt, unsigned i)
 		}
 		/* Put index of new free slot in cache. */
 		cached_free_slots->objs[cached_free_slots->len] =
-				(void *)((uintptr_t)bkt->key_idx[i]);
+							bkt->key_idx[i];
 		cached_free_slots->len++;
 	} else {
-		rte_ring_sp_enqueue(h->free_slots,
-				(void *)((uintptr_t)bkt->key_idx[i]));
+		rte_ring_sp_enqueue_elem(h->free_slots,
+				&bkt->key_idx[i], sizeof(uint32_t));
 	}
 }
 
@@ -1551,7 +1553,8 @@ __rte_hash_del_key_with_hash(const struct rte_hash *h, const void *key,
 			 */
 			h->ext_bkt_to_free[ret] = index;
 		else
-			rte_ring_sp_enqueue(h->free_ext_bkts, (void *)(uintptr_t)index);
+			rte_ring_sp_enqueue_elem(h->free_ext_bkts, &index,
+							sizeof(uint32_t));
 	}
 	__hash_rw_writer_unlock(h);
 	return ret;
@@ -1614,7 +1617,8 @@ rte_hash_free_key_with_position(const struct rte_hash *h,
 		uint32_t index = h->ext_bkt_to_free[position];
 		if (index) {
 			/* Recycle empty ext bkt to free list. */
-			rte_ring_sp_enqueue(h->free_ext_bkts, (void *)(uintptr_t)index);
+			rte_ring_sp_enqueue_elem(h->free_ext_bkts, &index,
+							sizeof(uint32_t));
 			h->ext_bkt_to_free[position] = 0;
 		}
 	}
@@ -1625,19 +1629,19 @@ rte_hash_free_key_with_position(const struct rte_hash *h,
 		/* Cache full, need to free it. */
 		if (cached_free_slots->len == LCORE_CACHE_SIZE) {
 			/* Need to enqueue the free slots in global ring. */
-			n_slots = rte_ring_mp_enqueue_burst(h->free_slots,
+			n_slots = rte_ring_mp_enqueue_burst_elem(h->free_slots,
 						cached_free_slots->objs,
+						sizeof(uint32_t),
 						LCORE_CACHE_SIZE, NULL);
 			RETURN_IF_TRUE((n_slots == 0), -EFAULT);
 			cached_free_slots->len -= n_slots;
 		}
 		/* Put index of new free slot in cache. */
-		cached_free_slots->objs[cached_free_slots->len] =
-					(void *)((uintptr_t)key_idx);
+		cached_free_slots->objs[cached_free_slots->len] = key_idx;
 		cached_free_slots->len++;
 	} else {
-		rte_ring_sp_enqueue(h->free_slots,
-				(void *)((uintptr_t)key_idx));
+		rte_ring_sp_enqueue_elem(h->free_slots, &key_idx,
+						sizeof(uint32_t));
 	}
 
 	return 0;
diff --git a/lib/librte_hash/rte_cuckoo_hash.h b/lib/librte_hash/rte_cuckoo_hash.h
index fb19bb27d..345de6bf9 100644
--- a/lib/librte_hash/rte_cuckoo_hash.h
+++ b/lib/librte_hash/rte_cuckoo_hash.h
@@ -124,7 +124,7 @@ const rte_hash_cmp_eq_t cmp_jump_table[NUM_KEY_CMP_CASES] = {
 
 struct lcore_cache {
 	unsigned len; /**< Cache len */
-	void *objs[LCORE_CACHE_SIZE]; /**< Cache objects */
+	uint32_t objs[LCORE_CACHE_SIZE]; /**< Cache objects */
 } __rte_cache_aligned;
 
 /* Structure that stores key-value pair */
-- 
2.17.1



* [dpdk-dev] [PATCH v10 6/6] eventdev: use custom element size ring for event rings
  2020-01-18 19:32   ` [dpdk-dev] [PATCH v10 " Honnappa Nagarahalli
                       ` (4 preceding siblings ...)
  2020-01-18 19:32     ` [dpdk-dev] [PATCH v10 5/6] lib/hash: use ring with 32b element size to save memory Honnappa Nagarahalli
@ 2020-01-18 19:32     ` Honnappa Nagarahalli
  2020-01-19 19:31     ` [dpdk-dev] [PATCH v10 0/6] lib/ring: APIs to support custom element size David Marchand
  6 siblings, 0 replies; 173+ messages in thread
From: Honnappa Nagarahalli @ 2020-01-18 19:32 UTC (permalink / raw)
  To: olivier.matz, sthemmin, jerinj, bruce.richardson, david.marchand,
	pbhagavatula, konstantin.ananyev, yipeng1.wang, drc,
	honnappa.nagarahalli
  Cc: dev, dharmik.thakkar, ruifeng.wang, gavin.hu, nd

Use the custom element size ring APIs to replace the event ring
implementation. This avoids code duplication.

Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Gavin Hu <gavin.hu@arm.com>
Reviewed-by: Ola Liljedahl <ola.liljedahl@arm.com>
Reviewed-by: Jerin Jacob <jerinj@marvell.com>
---
 lib/librte_eventdev/rte_event_ring.c | 147 ++-------------------------
 lib/librte_eventdev/rte_event_ring.h |  45 ++++----
 2 files changed, 24 insertions(+), 168 deletions(-)
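
Note for readers: the wrapper pattern this patch adopts can be sketched in
plain C without DPDK. What follows is a minimal, single-threaded sketch under
stated assumptions: toy_ring, toy_event and every toy_* function are
hypothetical stand-ins invented for illustration, not DPDK APIs, and the toy
ring does no multi-producer/consumer synchronization. It only shows how a
typed burst wrapper can forward to a generic element-size enqueue/dequeue and
pass back the free-space/available count, which is the shape
rte_event_ring_enqueue_burst() takes in the diff that follows.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical toy ring, NOT the DPDK rte_ring: single-threaded only. */
struct toy_ring {
	uint32_t esize; /* element size in bytes, a multiple of 4 */
	uint32_t mask;  /* slot count - 1; slot count is a power of two */
	uint32_t head;  /* free-running producer index */
	uint32_t tail;  /* free-running consumer index */
	uint8_t *slots; /* caller-provided storage: count * esize bytes */
};

static int
toy_ring_init(struct toy_ring *r, void *mem, uint32_t esize, uint32_t count)
{
	if (esize == 0 || esize % 4 != 0)
		return -1; /* element size must be a multiple of 32 bits */
	if (count == 0 || (count & (count - 1)) != 0)
		return -1; /* slot count must be a power of two */
	r->esize = esize;
	r->mask = count - 1;
	r->head = 0;
	r->tail = 0;
	r->slots = mem;
	return 0;
}

/* Copy up to n elements in; report the slots still free afterwards. */
static unsigned int
toy_ring_enqueue_burst_elem(struct toy_ring *r, const void *objs,
		unsigned int n, unsigned int *free_space)
{
	unsigned int room = (r->mask + 1) - (r->head - r->tail);
	unsigned int i;

	if (n > room)
		n = room;
	for (i = 0; i < n; i++)
		memcpy(r->slots + (size_t)((r->head + i) & r->mask) * r->esize,
			(const uint8_t *)objs + (size_t)i * r->esize, r->esize);
	r->head += n;
	if (free_space != NULL)
		*free_space = room - n;
	return n;
}

/* Copy up to n elements out; report the elements still queued afterwards. */
static unsigned int
toy_ring_dequeue_burst_elem(struct toy_ring *r, void *objs,
		unsigned int n, unsigned int *available)
{
	unsigned int used = r->head - r->tail;
	unsigned int i;

	if (n > used)
		n = used;
	for (i = 0; i < n; i++)
		memcpy((uint8_t *)objs + (size_t)i * r->esize,
			r->slots + (size_t)((r->tail + i) & r->mask) * r->esize,
			r->esize);
	r->tail += n;
	if (available != NULL)
		*available = used - n;
	return n;
}

/* Hypothetical 16-byte event type standing in for struct rte_event. */
struct toy_event { uint64_t meta; uint64_t payload; };

/* Typed wrappers: forward to the generic calls, narrow the count to 16 bits. */
static unsigned int
toy_event_ring_enqueue_burst(struct toy_ring *r, const struct toy_event *ev,
		unsigned int n, uint16_t *free_space)
{
	unsigned int space;
	unsigned int num = toy_ring_enqueue_burst_elem(r, ev, n, &space);

	assert(space <= UINT16_MAX);
	if (free_space != NULL)
		*free_space = (uint16_t)space;
	return num;
}

static unsigned int
toy_event_ring_dequeue_burst(struct toy_ring *r, struct toy_event *ev,
		unsigned int n, uint16_t *available)
{
	unsigned int remaining;
	unsigned int num = toy_ring_dequeue_burst_elem(r, ev, n, &remaining);

	assert(remaining <= UINT16_MAX);
	if (available != NULL)
		*available = (uint16_t)remaining;
	return num;
}

/* Round trip through a 4-slot ring; returns 0 when it behaves as expected. */
static int
toy_demo(void)
{
	static uint64_t mem[4 * sizeof(struct toy_event) / sizeof(uint64_t)];
	struct toy_ring r;
	struct toy_event in[3] = { {1, 10}, {2, 20}, {3, 30} };
	struct toy_event out[2];
	uint16_t space = 0, avail = 0;

	if (toy_ring_init(&r, mem, sizeof(struct toy_event), 4) != 0)
		return -1;
	if (toy_event_ring_enqueue_burst(&r, in, 3, &space) != 3 || space != 1)
		return -1;
	if (toy_event_ring_dequeue_burst(&r, out, 2, &avail) != 2 || avail != 1)
		return -1;
	return (out[0].payload == 10 && out[1].payload == 20) ? 0 : -1;
}
```

Calling toy_demo() from a main() exercises the sketch. Unlike this toy, the
real ring also handles single/multi producer and consumer modes, so the sketch
says nothing about the performance or the concurrency behaviour of the patch.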

diff --git a/lib/librte_eventdev/rte_event_ring.c b/lib/librte_eventdev/rte_event_ring.c
index 50190de01..d27e23901 100644
--- a/lib/librte_eventdev/rte_event_ring.c
+++ b/lib/librte_eventdev/rte_event_ring.c
@@ -1,5 +1,6 @@
 /* SPDX-License-Identifier: BSD-3-Clause
  * Copyright(c) 2017 Intel Corporation
+ * Copyright(c) 2019 Arm Limited
  */
 
 #include <sys/queue.h>
@@ -11,13 +12,6 @@
 #include <rte_eal_memconfig.h>
 #include "rte_event_ring.h"
 
-TAILQ_HEAD(rte_event_ring_list, rte_tailq_entry);
-
-static struct rte_tailq_elem rte_event_ring_tailq = {
-	.name = RTE_TAILQ_EVENT_RING_NAME,
-};
-EAL_REGISTER_TAILQ(rte_event_ring_tailq)
-
 int
 rte_event_ring_init(struct rte_event_ring *r, const char *name,
 	unsigned int count, unsigned int flags)
@@ -35,150 +29,21 @@ struct rte_event_ring *
 rte_event_ring_create(const char *name, unsigned int count, int socket_id,
 		unsigned int flags)
 {
-	char mz_name[RTE_MEMZONE_NAMESIZE];
-	struct rte_event_ring *r;
-	struct rte_tailq_entry *te;
-	const struct rte_memzone *mz;
-	ssize_t ring_size;
-	int mz_flags = 0;
-	struct rte_event_ring_list *ring_list = NULL;
-	const unsigned int requested_count = count;
-	int ret;
-
-	ring_list = RTE_TAILQ_CAST(rte_event_ring_tailq.head,
-		rte_event_ring_list);
-
-	/* for an exact size ring, round up from count to a power of two */
-	if (flags & RING_F_EXACT_SZ)
-		count = rte_align32pow2(count + 1);
-	else if (!rte_is_power_of_2(count)) {
-		rte_errno = EINVAL;
-		return NULL;
-	}
-
-	ring_size = sizeof(*r) + (count * sizeof(struct rte_event));
-
-	ret = snprintf(mz_name, sizeof(mz_name), "%s%s",
-		RTE_RING_MZ_PREFIX, name);
-	if (ret < 0 || ret >= (int)sizeof(mz_name)) {
-		rte_errno = ENAMETOOLONG;
-		return NULL;
-	}
-
-	te = rte_zmalloc("RING_TAILQ_ENTRY", sizeof(*te), 0);
-	if (te == NULL) {
-		RTE_LOG(ERR, RING, "Cannot reserve memory for tailq\n");
-		rte_errno = ENOMEM;
-		return NULL;
-	}
-
-	rte_mcfg_tailq_write_lock();
-
-	/*
-	 * reserve a memory zone for this ring. If we can't get rte_config or
-	 * we are secondary process, the memzone_reserve function will set
-	 * rte_errno for us appropriately - hence no check in this this function
-	 */
-	mz = rte_memzone_reserve(mz_name, ring_size, socket_id, mz_flags);
-	if (mz != NULL) {
-		r = mz->addr;
-		/* Check return value in case rte_ring_init() fails on size */
-		int err = rte_event_ring_init(r, name, requested_count, flags);
-		if (err) {
-			RTE_LOG(ERR, RING, "Ring init failed\n");
-			if (rte_memzone_free(mz) != 0)
-				RTE_LOG(ERR, RING, "Cannot free memzone\n");
-			rte_free(te);
-			rte_mcfg_tailq_write_unlock();
-			return NULL;
-		}
-
-		te->data = (void *) r;
-		r->r.memzone = mz;
-
-		TAILQ_INSERT_TAIL(ring_list, te, next);
-	} else {
-		r = NULL;
-		RTE_LOG(ERR, RING, "Cannot reserve memory\n");
-		rte_free(te);
-	}
-	rte_mcfg_tailq_write_unlock();
-
-	return r;
+	return (struct rte_event_ring *)rte_ring_create_elem(name,
+						sizeof(struct rte_event),
+						count, socket_id, flags);
 }
 
 
 struct rte_event_ring *
 rte_event_ring_lookup(const char *name)
 {
-	struct rte_tailq_entry *te;
-	struct rte_event_ring *r = NULL;
-	struct rte_event_ring_list *ring_list;
-
-	ring_list = RTE_TAILQ_CAST(rte_event_ring_tailq.head,
-			rte_event_ring_list);
-
-	rte_mcfg_tailq_read_lock();
-
-	TAILQ_FOREACH(te, ring_list, next) {
-		r = (struct rte_event_ring *) te->data;
-		if (strncmp(name, r->r.name, RTE_RING_NAMESIZE) == 0)
-			break;
-	}
-
-	rte_mcfg_tailq_read_unlock();
-
-	if (te == NULL) {
-		rte_errno = ENOENT;
-		return NULL;
-	}
-
-	return r;
+	return (struct rte_event_ring *)rte_ring_lookup(name);
 }
 
 /* free the ring */
 void
 rte_event_ring_free(struct rte_event_ring *r)
 {
-	struct rte_event_ring_list *ring_list = NULL;
-	struct rte_tailq_entry *te;
-
-	if (r == NULL)
-		return;
-
-	/*
-	 * Ring was not created with rte_event_ring_create,
-	 * therefore, there is no memzone to free.
-	 */
-	if (r->r.memzone == NULL) {
-		RTE_LOG(ERR, RING,
-			"Cannot free ring (not created with rte_event_ring_create()");
-		return;
-	}
-
-	if (rte_memzone_free(r->r.memzone) != 0) {
-		RTE_LOG(ERR, RING, "Cannot free memory\n");
-		return;
-	}
-
-	ring_list = RTE_TAILQ_CAST(rte_event_ring_tailq.head,
-			rte_event_ring_list);
-	rte_mcfg_tailq_write_lock();
-
-	/* find out tailq entry */
-	TAILQ_FOREACH(te, ring_list, next) {
-		if (te->data == (void *) r)
-			break;
-	}
-
-	if (te == NULL) {
-		rte_mcfg_tailq_write_unlock();
-		return;
-	}
-
-	TAILQ_REMOVE(ring_list, te, next);
-
-	rte_mcfg_tailq_write_unlock();
-
-	rte_free(te);
+	rte_ring_free((struct rte_ring *)r);
 }
diff --git a/lib/librte_eventdev/rte_event_ring.h b/lib/librte_eventdev/rte_event_ring.h
index 827a3209e..c0861b0ec 100644
--- a/lib/librte_eventdev/rte_event_ring.h
+++ b/lib/librte_eventdev/rte_event_ring.h
@@ -1,5 +1,6 @@
 /* SPDX-License-Identifier: BSD-3-Clause
  * Copyright(c) 2016-2017 Intel Corporation
+ * Copyright(c) 2019 Arm Limited
  */
 
 /**
@@ -19,6 +20,7 @@
 #include <rte_memory.h>
 #include <rte_malloc.h>
 #include <rte_ring.h>
+#include <rte_ring_elem.h>
 #include "rte_eventdev.h"
 
 #define RTE_TAILQ_EVENT_RING_NAME "RTE_EVENT_RING"
@@ -88,22 +90,17 @@ rte_event_ring_enqueue_burst(struct rte_event_ring *r,
 		const struct rte_event *events,
 		unsigned int n, uint16_t *free_space)
 {
-	uint32_t prod_head, prod_next;
-	uint32_t free_entries;
+	unsigned int num;
+	uint32_t space;
 
-	n = __rte_ring_move_prod_head(&r->r, r->r.prod.single, n,
-			RTE_RING_QUEUE_VARIABLE,
-			&prod_head, &prod_next, &free_entries);
-	if (n == 0)
-		goto end;
+	num = rte_ring_enqueue_burst_elem(&r->r, events,
+				sizeof(struct rte_event), n,
+				&space);
 
-	ENQUEUE_PTRS(&r->r, &r[1], prod_head, events, n, struct rte_event);
-
-	update_tail(&r->r.prod, prod_head, prod_next, r->r.prod.single, 1);
-end:
 	if (free_space != NULL)
-		*free_space = free_entries - n;
-	return n;
+		*free_space = space;
+
+	return num;
 }
 
 /**
@@ -129,23 +126,17 @@ rte_event_ring_dequeue_burst(struct rte_event_ring *r,
 		struct rte_event *events,
 		unsigned int n, uint16_t *available)
 {
-	uint32_t cons_head, cons_next;
-	uint32_t entries;
-
-	n = __rte_ring_move_cons_head(&r->r, r->r.cons.single, n,
-			RTE_RING_QUEUE_VARIABLE,
-			&cons_head, &cons_next, &entries);
-	if (n == 0)
-		goto end;
+	unsigned int num;
+	uint32_t remaining;
 
-	DEQUEUE_PTRS(&r->r, &r[1], cons_head, events, n, struct rte_event);
+	num = rte_ring_dequeue_burst_elem(&r->r, events,
+				sizeof(struct rte_event), n,
+				&remaining);
 
-	update_tail(&r->r.cons, cons_head, cons_next, r->r.cons.single, 0);
-
-end:
 	if (available != NULL)
-		*available = entries - n;
-	return n;
+		*available = remaining;
+
+	return num;
 }
 
 /*
-- 
2.17.1



* Re: [dpdk-dev] [PATCH v10 0/6] lib/ring: APIs to support custom element size
  2020-01-18 19:32   ` [dpdk-dev] [PATCH v10 " Honnappa Nagarahalli
                       ` (5 preceding siblings ...)
  2020-01-18 19:32     ` [dpdk-dev] [PATCH v10 6/6] eventdev: use custom element size ring for event rings Honnappa Nagarahalli
@ 2020-01-19 19:31     ` David Marchand
  6 siblings, 0 replies; 173+ messages in thread
From: David Marchand @ 2020-01-19 19:31 UTC (permalink / raw)
  To: Honnappa Nagarahalli
  Cc: Olivier Matz, Stephen Hemminger, Jerin Jacob Kollanukkaran,
	Bruce Richardson, Pavan Nikhilesh, Ananyev, Konstantin, Wang,
	Yipeng1, David Christensen, dev, Dharmik Thakkar,
	Ruifeng Wang (Arm Technology China),
	Gavin Hu, nd

On Sat, Jan 18, 2020 at 8:33 PM Honnappa Nagarahalli
<honnappa.nagarahalli@arm.com> wrote:
>
> The current rte_ring hard-codes the type of the ring element to 'void *',
> hence the size of the element is hard-coded to 32b/64b. Since the ring
> element type is not an input to rte_ring APIs, it results in a couple
> of issues:
>
> 1) If an application needs to store an element which is not 64b, it
>    needs to write its own ring APIs similar to rte_event_ring APIs. This
>    creates an additional burden on the programmers, who end up making
>    work-arounds and often waste memory.
> 2) If there are multiple libraries that store elements of the same
>    type, currently they would have to write their own rte_ring APIs. This
>    results in code duplication.
>
> This patch adds new APIs to support configurable ring element size.
> The APIs support custom element sizes by allowing the ring element
> size to be defined as a multiple of 32b.
>
> The aim is to achieve the same performance as the existing ring
> implementation.
>
> v10
>  - Improved comments in test case files (Olivier)
>  - Fixed possible memory leaks (Olivier)
>  - Changed 'test_ring_with_exact_size' to use unaligned
>    addresses (Konstantin)
>  - Changed the commit message for eventdev (Jerin)

Thanks for working on this and a big thanks to all reviewers too.

The CI has been switched to Ubuntu 18.04, so that we won't hit the
Travis timeout with clang 7.
There is still some work on the ABI checks, because of the abidiff
report on rte_cuckoo_hash.h I mentioned: passing the public headers to
abidw/abidiff should do the trick.


Series applied.

--
David Marchand



end of thread, other threads:[~2020-01-19 19:32 UTC | newest]

Thread overview: 173+ messages
2019-08-28 14:46 [dpdk-dev] [PATCH 0/5] lib/ring: templates to support custom element size Honnappa Nagarahalli
2019-08-28 14:46 ` [dpdk-dev] [PATCH 1/5] lib/ring: apis to support configurable " Honnappa Nagarahalli
2019-08-28 14:46 ` [dpdk-dev] [PATCH 2/5] lib/ring: add template to support different element sizes Honnappa Nagarahalli
2019-10-01 11:47   ` Ananyev, Konstantin
2019-10-02  4:21     ` Honnappa Nagarahalli
2019-10-02  8:39       ` Ananyev, Konstantin
2019-10-03  3:33         ` Honnappa Nagarahalli
2019-10-03 11:51           ` Ananyev, Konstantin
2019-10-03 12:27             ` Ananyev, Konstantin
2019-10-03 22:49               ` Honnappa Nagarahalli
2019-08-28 14:46 ` [dpdk-dev] [PATCH 3/5] tools/checkpatch: relax constraints on __rte_experimental Honnappa Nagarahalli
2019-08-28 14:46 ` [dpdk-dev] [PATCH 4/5] lib/ring: add ring APIs to support 32b ring elements Honnappa Nagarahalli
2019-08-28 14:46 ` [dpdk-dev] [PATCH 5/5] lib/hash: use ring with 32b element size to save memory Honnappa Nagarahalli
2019-08-28 15:12 ` [dpdk-dev] [PATCH 0/5] lib/ring: templates to support custom element size Jerin Jacob Kollanukkaran
2019-08-28 15:16 ` Pavan Nikhilesh Bhagavatula
2019-08-28 22:59   ` Honnappa Nagarahalli
2019-09-06 19:05 ` [dpdk-dev] [PATCH v2 0/6] " Honnappa Nagarahalli
2019-09-06 19:05   ` [dpdk-dev] [PATCH v2 1/6] lib/ring: apis to support configurable " Honnappa Nagarahalli
2019-09-06 19:05   ` [dpdk-dev] [PATCH v2 2/6] lib/ring: add template to support different element sizes Honnappa Nagarahalli
2019-09-08 19:44     ` Stephen Hemminger
2019-09-09  9:01       ` Bruce Richardson
2019-09-09 22:33         ` Honnappa Nagarahalli
2019-09-06 19:05   ` [dpdk-dev] [PATCH v2 3/6] tools/checkpatch: relax constraints on __rte_experimental Honnappa Nagarahalli
2019-09-06 19:05   ` [dpdk-dev] [PATCH v2 4/6] lib/ring: add ring APIs to support 32b ring elements Honnappa Nagarahalli
2019-09-06 19:05   ` [dpdk-dev] [PATCH v2 5/6] lib/hash: use ring with 32b element size to save memory Honnappa Nagarahalli
2019-09-06 19:05   ` [dpdk-dev] [PATCH v2 6/6] lib/eventdev: use ring templates for event rings Honnappa Nagarahalli
2019-09-09 13:04   ` [dpdk-dev] [PATCH v2 0/6] lib/ring: templates to support custom element size Aaron Conole
2019-10-07 13:49   ` David Marchand
2019-10-08 19:19   ` [dpdk-dev] [PATCH v3 0/2] lib/ring: APIs " Honnappa Nagarahalli
2019-10-08 19:19     ` [dpdk-dev] [PATCH v3 1/2] lib/ring: apis to support configurable " Honnappa Nagarahalli
2019-10-08 19:19     ` [dpdk-dev] [PATCH v3 2/2] test/ring: add test cases for configurable element size ring Honnappa Nagarahalli
2019-10-09  2:47   ` [dpdk-dev] [PATCH v3 0/2] lib/ring: APIs to support custom element size Honnappa Nagarahalli
2019-10-09  2:47     ` [dpdk-dev] [PATCH v4 1/2] lib/ring: apis to support configurable " Honnappa Nagarahalli
2019-10-11 19:21       ` Honnappa Nagarahalli
2019-10-14 19:41         ` Ananyev, Konstantin
2019-10-14 23:56           ` Honnappa Nagarahalli
2019-10-15  9:34             ` Ananyev, Konstantin
2019-10-17  4:46               ` Honnappa Nagarahalli
2019-10-17 11:51                 ` Ananyev, Konstantin
2019-10-17 20:16                   ` Honnappa Nagarahalli
2019-10-17 23:17                     ` David Christensen
2019-10-18  3:18                       ` Honnappa Nagarahalli
2019-10-18  8:04                         ` Jerin Jacob
2019-10-18 16:11                           ` Jerin Jacob
2019-10-21  0:27                             ` Honnappa Nagarahalli
2019-10-18 16:44                           ` Ananyev, Konstantin
2019-10-18 19:03                             ` Honnappa Nagarahalli
2019-10-21  0:36                             ` Honnappa Nagarahalli
2019-10-21  9:04                               ` Ananyev, Konstantin
2019-10-22 15:59                                 ` Ananyev, Konstantin
2019-10-22 17:57                                   ` Ananyev, Konstantin
2019-10-23 18:58                                     ` Honnappa Nagarahalli
2019-10-18 17:23                         ` David Christensen
2019-10-09  2:47     ` [dpdk-dev] [PATCH v4 2/2] test/ring: add test cases for configurable element size ring Honnappa Nagarahalli
2019-10-17 20:08   ` [dpdk-dev] [PATCH v5 0/3] lib/ring: APIs to support custom element size Honnappa Nagarahalli
2019-10-17 20:08     ` [dpdk-dev] [PATCH v5 1/3] lib/ring: apis to support configurable " Honnappa Nagarahalli
2019-10-17 20:39       ` Stephen Hemminger
2019-10-17 20:40       ` Stephen Hemminger
2019-10-17 20:08     ` [dpdk-dev] [PATCH v5 2/3] test/ring: add test cases for configurable element size ring Honnappa Nagarahalli
2019-10-17 20:08     ` [dpdk-dev] [PATCH v5 3/3] lib/ring: copy ring elements using memcpy partially Honnappa Nagarahalli
2019-10-21  0:22   ` [dpdk-dev] [RFC v6 0/6] lib/ring: APIs to support custom element size Honnappa Nagarahalli
2019-10-21  0:22     ` [dpdk-dev] [RFC v6 1/6] test/ring: use division for cycle count calculation Honnappa Nagarahalli
2019-10-23  9:49       ` Olivier Matz
2019-10-21  0:22     ` [dpdk-dev] [RFC v6 2/6] lib/ring: apis to support configurable element size Honnappa Nagarahalli
2019-10-23  9:59       ` Olivier Matz
2019-10-23 19:12         ` Honnappa Nagarahalli
2019-10-21  0:22     ` [dpdk-dev] [RFC v6 3/6] test/ring: add functional tests for configurable element size ring Honnappa Nagarahalli
2019-10-23 10:01       ` Olivier Matz
2019-10-23 11:12         ` Ananyev, Konstantin
2019-10-21  0:22     ` [dpdk-dev] [RFC v6 4/6] test/ring: add perf " Honnappa Nagarahalli
2019-10-23 10:02       ` Olivier Matz
2019-10-21  0:22     ` [dpdk-dev] [RFC v6 5/6] lib/ring: copy ring elements using memcpy partially Honnappa Nagarahalli
2019-10-21  0:23     ` [dpdk-dev] [RFC v6 6/6] lib/ring: improved copy function to copy ring elements Honnappa Nagarahalli
2019-10-23 10:05       ` Olivier Matz
2019-10-23  9:48     ` [dpdk-dev] [RFC v6 0/6] lib/ring: APIs to support custom element size Olivier Matz
2019-12-20  4:45   ` [dpdk-dev] [PATCH v7 00/17] " Honnappa Nagarahalli
2019-12-20  4:45     ` [dpdk-dev] [PATCH v7 01/17] test/ring: use division for cycle count calculation Honnappa Nagarahalli
2019-12-20  4:45     ` [dpdk-dev] [PATCH v7 02/17] lib/ring: apis to support configurable element size Honnappa Nagarahalli
2020-01-02 16:42       ` Ananyev, Konstantin
2020-01-07  5:35         ` Honnappa Nagarahalli
2020-01-07  6:00           ` Honnappa Nagarahalli
2020-01-07 10:21             ` Ananyev, Konstantin
2020-01-07 15:21               ` Honnappa Nagarahalli
2020-01-07 15:41                 ` Ananyev, Konstantin
2020-01-08  6:17                   ` Honnappa Nagarahalli
2020-01-08 10:05                     ` Ananyev, Konstantin
2020-01-08 23:40                       ` Honnappa Nagarahalli
2020-01-09  0:48                         ` Ananyev, Konstantin
2020-01-09 16:06                           ` Honnappa Nagarahalli
2020-01-13 11:53                             ` Ananyev, Konstantin
2019-12-20  4:45     ` [dpdk-dev] [PATCH v7 03/17] test/ring: add functional tests for rte_ring_xxx_elem APIs Honnappa Nagarahalli
2020-01-02 16:31       ` Ananyev, Konstantin
2020-01-07  5:13         ` Honnappa Nagarahalli
2020-01-07 16:03           ` Ananyev, Konstantin
2020-01-09  5:15             ` Honnappa Nagarahalli
2019-12-20  4:45     ` [dpdk-dev] [PATCH v7 04/17] test/ring: test burst APIs with random empty-full test case Honnappa Nagarahalli
2019-12-20  4:45     ` [dpdk-dev] [PATCH v7 05/17] test/ring: add default, single element test cases Honnappa Nagarahalli
2019-12-20  4:45     ` [dpdk-dev] [PATCH v7 06/17] test/ring: rte_ring_xxx_elem test cases for exact size ring Honnappa Nagarahalli
2019-12-20  4:45     ` [dpdk-dev] [PATCH v7 07/17] test/ring: negative test cases for rte_ring_xxx_elem APIs Honnappa Nagarahalli
2019-12-20  4:45     ` [dpdk-dev] [PATCH v7 08/17] test/ring: remove duplicate test cases Honnappa Nagarahalli
2019-12-20  4:45     ` [dpdk-dev] [PATCH v7 09/17] test/ring: removed unused variable synchro Honnappa Nagarahalli
2019-12-20  4:45     ` [dpdk-dev] [PATCH v7 10/17] test/ring: modify single element enq/deq perf test cases Honnappa Nagarahalli
2020-01-02 17:03       ` Ananyev, Konstantin
2020-01-07  5:54         ` Honnappa Nagarahalli
2020-01-07 16:13           ` Ananyev, Konstantin
2020-01-07 22:33             ` Honnappa Nagarahalli
2019-12-20  4:45     ` [dpdk-dev] [PATCH v7 11/17] test/ring: modify burst " Honnappa Nagarahalli
2020-01-02 16:57       ` Ananyev, Konstantin
2020-01-07  5:42         ` Honnappa Nagarahalli
2019-12-20  4:45     ` [dpdk-dev] [PATCH v7 12/17] test/ring: modify bulk " Honnappa Nagarahalli
2019-12-20  4:45     ` [dpdk-dev] [PATCH v7 13/17] test/ring: modify bulk empty deq " Honnappa Nagarahalli
2019-12-20  4:45     ` [dpdk-dev] [PATCH v7 14/17] test/ring: modify multi-lcore " Honnappa Nagarahalli
2019-12-20  4:45     ` [dpdk-dev] [PATCH v7 15/17] test/ring: adjust run-on-all-cores " Honnappa Nagarahalli
2020-01-02 17:00       ` Ananyev, Konstantin
2020-01-07  5:42         ` Honnappa Nagarahalli
2019-12-20  4:45     ` [dpdk-dev] [PATCH v7 16/17] lib/hash: use ring with 32b element size to save memory Honnappa Nagarahalli
2019-12-20  4:45     ` [dpdk-dev] [PATCH v7 17/17] lib/eventdev: use custom element size ring for event rings Honnappa Nagarahalli
2020-01-13 17:25   ` [dpdk-dev] [PATCH v8 0/6] lib/ring: APIs to support custom element size Honnappa Nagarahalli
2020-01-13 17:25     ` [dpdk-dev] [PATCH v8 1/6] test/ring: use division for cycle count calculation Honnappa Nagarahalli
2020-01-13 17:25     ` [dpdk-dev] [PATCH v8 2/6] lib/ring: apis to support configurable element size Honnappa Nagarahalli
2020-01-13 17:25     ` [dpdk-dev] [PATCH v8 3/6] test/ring: add functional tests for rte_ring_xxx_elem APIs Honnappa Nagarahalli
2020-01-13 17:25     ` [dpdk-dev] [PATCH v8 4/6] test/ring: modify perf test cases to use " Honnappa Nagarahalli
2020-01-13 17:25     ` [dpdk-dev] [PATCH v8 5/6] lib/hash: use ring with 32b element size to save memory Honnappa Nagarahalli
2020-01-13 17:25     ` [dpdk-dev] [PATCH v8 6/6] lib/eventdev: use custom element size ring for event rings Honnappa Nagarahalli
     [not found]       ` <1578977880-13011-1-git-send-email-robot@bytheb.org>
     [not found]         ` <VE1PR08MB5149BE79083CD66A41CBD6D198340@VE1PR08MB5149.eurprd08.prod.outlook.com>
2020-01-14 15:12           ` [dpdk-dev] FW: || pw64572 " Aaron Conole
2020-01-14 16:51             ` Aaron Conole
2020-01-14 19:35               ` Honnappa Nagarahalli
2020-01-14 20:44                 ` Aaron Conole
2020-01-15  0:55                   ` Honnappa Nagarahalli
2020-01-15  4:43                   ` Honnappa Nagarahalli
2020-01-15  5:05                     ` Honnappa Nagarahalli
2020-01-15 18:22                       ` Aaron Conole
2020-01-15 18:38                         ` Honnappa Nagarahalli
2020-01-16  5:27                           ` Honnappa Nagarahalli
2020-01-16  5:25   ` [dpdk-dev] [PATCH v9 0/6] lib/ring: APIs to support custom element size Honnappa Nagarahalli
2020-01-16  5:25     ` [dpdk-dev] [PATCH v9 1/6] test/ring: use division for cycle count calculation Honnappa Nagarahalli
2020-01-16  5:25     ` [dpdk-dev] [PATCH v9 2/6] lib/ring: apis to support configurable element size Honnappa Nagarahalli
2020-01-17 16:34       ` Olivier Matz
2020-01-17 16:45         ` Honnappa Nagarahalli
2020-01-17 18:10           ` David Christensen
2020-01-18 12:32           ` Ananyev, Konstantin
2020-01-18 15:01             ` Honnappa Nagarahalli
2020-01-16  5:25     ` [dpdk-dev] [PATCH v9 3/6] test/ring: add functional tests for rte_ring_xxx_elem APIs Honnappa Nagarahalli
2020-01-17 17:03       ` Olivier Matz
2020-01-18 16:27         ` Honnappa Nagarahalli
2020-01-16  5:25     ` [dpdk-dev] [PATCH v9 4/6] test/ring: modify perf test cases to use " Honnappa Nagarahalli
2020-01-17 17:12       ` Olivier Matz
2020-01-18 16:28         ` Honnappa Nagarahalli
2020-01-16  5:25     ` [dpdk-dev] [PATCH v9 5/6] lib/hash: use ring with 32b element size to save memory Honnappa Nagarahalli
2020-01-17 20:27       ` David Marchand
2020-01-17 20:54         ` Honnappa Nagarahalli
2020-01-17 21:07           ` David Marchand
2020-01-17 22:24             ` Wang, Yipeng1
2020-01-16  5:25     ` [dpdk-dev] [PATCH v9 6/6] lib/eventdev: use custom element size ring for event rings Honnappa Nagarahalli
2020-01-17 14:41       ` Jerin Jacob
2020-01-17 16:12         ` David Marchand
2020-01-16 16:36     ` [dpdk-dev] [PATCH v9 0/6] lib/ring: APIs to support custom element size Honnappa Nagarahalli
2020-01-17 12:14       ` David Marchand
2020-01-17 13:34         ` Jerin Jacob
2020-01-17 16:37           ` Mattias Rönnblom
2020-01-17 14:28         ` Honnappa Nagarahalli
2020-01-17 14:36           ` Honnappa Nagarahalli
2020-01-17 16:15           ` David Marchand
2020-01-17 16:32             ` Honnappa Nagarahalli
2020-01-17 17:15     ` Olivier Matz
2020-01-18 19:32   ` [dpdk-dev] [PATCH v10 " Honnappa Nagarahalli
2020-01-18 19:32     ` [dpdk-dev] [PATCH v10 1/6] test/ring: use division for cycle count calculation Honnappa Nagarahalli
2020-01-18 19:32     ` [dpdk-dev] [PATCH v10 2/6] lib/ring: apis to support configurable element size Honnappa Nagarahalli
2020-01-18 19:32     ` [dpdk-dev] [PATCH v10 3/6] test/ring: add functional tests for rte_ring_xxx_elem APIs Honnappa Nagarahalli
2020-01-18 19:32     ` [dpdk-dev] [PATCH v10 4/6] test/ring: modify perf test cases to use " Honnappa Nagarahalli
2020-01-18 19:32     ` [dpdk-dev] [PATCH v10 5/6] lib/hash: use ring with 32b element size to save memory Honnappa Nagarahalli
2020-01-18 19:32     ` [dpdk-dev] [PATCH v10 6/6] eventdev: use custom element size ring for event rings Honnappa Nagarahalli
2020-01-19 19:31     ` [dpdk-dev] [PATCH v10 0/6] lib/ring: APIs to support custom element size David Marchand
