DPDK patches and discussions
 help / color / mirror / Atom feed
* [dpdk-dev] [PATCH 0/3] ring clean up
@ 2020-07-03 10:26 Feifei Wang
  2020-07-03 10:26 ` [dpdk-dev] [PATCH 1/3] ring: remove experimental tag for ring reset API Feifei Wang
                   ` (3 more replies)
  0 siblings, 4 replies; 23+ messages in thread
From: Feifei Wang @ 2020-07-03 10:26 UTC (permalink / raw)
  Cc: dev, nd, Feifei Wang

Do some work for ring refactoring, which includes:
1. remove experimental tags
2. use element APIs to implement legacy APIs

Feifei Wang (3):
  ring: remove experimental tag for ring reset API
  ring: remove experimental tag for ring element APIs
  ring: use element APIs to implement legacy APIs

 lib/librte_ring/rte_ring.h           | 292 +++------------------------
 lib/librte_ring/rte_ring_elem.h      |   8 -
 lib/librte_ring/rte_ring_version.map |  13 +-
 3 files changed, 34 insertions(+), 279 deletions(-)

-- 
2.17.1


^ permalink raw reply	[flat|nested] 23+ messages in thread

* [dpdk-dev] [PATCH 1/3] ring: remove experimental tag for ring reset API
  2020-07-03 10:26 [dpdk-dev] [PATCH 0/3] ring clean up Feifei Wang
@ 2020-07-03 10:26 ` Feifei Wang
  2020-07-03 16:16   ` Kinsella, Ray
  2020-07-03 10:26 ` [dpdk-dev] [PATCH 2/3] ring: remove experimental tag for ring element APIs Feifei Wang
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 23+ messages in thread
From: Feifei Wang @ 2020-07-03 10:26 UTC (permalink / raw)
  To: Honnappa Nagarahalli, Konstantin Ananyev, Ray Kinsella, Neil Horman
  Cc: dev, nd, Feifei Wang

Remove the experimental tag for rte_ring_reset API that have been around
for 4 releases.

Signed-off-by: Feifei Wang <feifei.wang2@arm.com>
Reviewed-by: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
---
 lib/librte_ring/rte_ring.h           | 3 ---
 lib/librte_ring/rte_ring_version.map | 4 +---
 2 files changed, 1 insertion(+), 6 deletions(-)

diff --git a/lib/librte_ring/rte_ring.h b/lib/librte_ring/rte_ring.h
index f67141482..7181c33b4 100644
--- a/lib/librte_ring/rte_ring.h
+++ b/lib/librte_ring/rte_ring.h
@@ -663,15 +663,12 @@ rte_ring_dequeue(struct rte_ring *r, void **obj_p)
  *
  * This function flush all the elements in a ring
  *
- * @b EXPERIMENTAL: this API may change without prior notice
- *
  * @warning
  * Make sure the ring is not in use while calling this function.
  *
  * @param r
  *   A pointer to the ring structure.
  */
-__rte_experimental
 void
 rte_ring_reset(struct rte_ring *r);
 
diff --git a/lib/librte_ring/rte_ring_version.map b/lib/librte_ring/rte_ring_version.map
index e88c143cf..aec6f3820 100644
--- a/lib/librte_ring/rte_ring_version.map
+++ b/lib/librte_ring/rte_ring_version.map
@@ -8,6 +8,7 @@ DPDK_20.0 {
 	rte_ring_init;
 	rte_ring_list_dump;
 	rte_ring_lookup;
+	rte_ring_reset;
 
 	local: *;
 };
@@ -15,9 +16,6 @@ DPDK_20.0 {
 EXPERIMENTAL {
 	global:
 
-	# added in 19.08
-	rte_ring_reset;
-
 	# added in 20.02
 	rte_ring_create_elem;
 	rte_ring_get_memsize_elem;
-- 
2.17.1


^ permalink raw reply	[flat|nested] 23+ messages in thread

* [dpdk-dev] [PATCH 2/3] ring: remove experimental tag for ring element APIs
  2020-07-03 10:26 [dpdk-dev] [PATCH 0/3] ring clean up Feifei Wang
  2020-07-03 10:26 ` [dpdk-dev] [PATCH 1/3] ring: remove experimental tag for ring reset API Feifei Wang
@ 2020-07-03 10:26 ` Feifei Wang
  2020-07-03 16:17   ` Kinsella, Ray
  2020-07-07 14:44   ` Ananyev, Konstantin
  2020-07-03 10:26 ` [dpdk-dev] [PATCH 3/3] ring: use element APIs to implement legacy APIs Feifei Wang
  2020-07-08 14:51 ` [dpdk-dev] [PATCH 0/3] ring clean up David Marchand
  3 siblings, 2 replies; 23+ messages in thread
From: Feifei Wang @ 2020-07-03 10:26 UTC (permalink / raw)
  To: Honnappa Nagarahalli, Konstantin Ananyev, Ray Kinsella, Neil Horman
  Cc: dev, nd, Feifei Wang

Remove the experimental tag for rte_ring_xxx_elem APIs that have been
around for 2 releases.

Signed-off-by: Feifei Wang <feifei.wang2@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
---
 lib/librte_ring/rte_ring.h           | 5 +----
 lib/librte_ring/rte_ring_elem.h      | 8 --------
 lib/librte_ring/rte_ring_version.map | 9 ++-------
 3 files changed, 3 insertions(+), 19 deletions(-)

diff --git a/lib/librte_ring/rte_ring.h b/lib/librte_ring/rte_ring.h
index 7181c33b4..35f3f8c42 100644
--- a/lib/librte_ring/rte_ring.h
+++ b/lib/librte_ring/rte_ring.h
@@ -40,6 +40,7 @@ extern "C" {
 #endif
 
 #include <rte_ring_core.h>
+#include <rte_ring_elem.h>
 
 /**
  * Calculate the memory size needed for a ring
@@ -401,10 +402,6 @@ rte_ring_sp_enqueue_bulk(struct rte_ring *r, void * const *obj_table,
 			RTE_RING_SYNC_ST, free_space);
 }
 
-#ifdef ALLOW_EXPERIMENTAL_API
-#include <rte_ring_elem.h>
-#endif
-
 /**
  * Enqueue several objects on a ring.
  *
diff --git a/lib/librte_ring/rte_ring_elem.h b/lib/librte_ring/rte_ring_elem.h
index 9e5192ae6..69dc51746 100644
--- a/lib/librte_ring/rte_ring_elem.h
+++ b/lib/librte_ring/rte_ring_elem.h
@@ -23,9 +23,6 @@ extern "C" {
 #include <rte_ring_core.h>
 
 /**
- * @warning
- * @b EXPERIMENTAL: this API may change without prior notice
- *
  * Calculate the memory size needed for a ring with given element size
  *
  * This function returns the number of bytes needed for a ring, given
@@ -43,13 +40,9 @@ extern "C" {
  *   - -EINVAL - esize is not a multiple of 4 or count provided is not a
  *		 power of 2.
  */
-__rte_experimental
 ssize_t rte_ring_get_memsize_elem(unsigned int esize, unsigned int count);
 
 /**
- * @warning
- * @b EXPERIMENTAL: this API may change without prior notice
- *
  * Create a new ring named *name* that stores elements with given size.
  *
  * This function uses ``memzone_reserve()`` to allocate memory. Then it
@@ -109,7 +102,6 @@ ssize_t rte_ring_get_memsize_elem(unsigned int esize, unsigned int count);
  *    - EEXIST - a memzone with the same name already exists
  *    - ENOMEM - no appropriate memory area found in which to create memzone
  */
-__rte_experimental
 struct rte_ring *rte_ring_create_elem(const char *name, unsigned int esize,
 			unsigned int count, int socket_id, unsigned int flags);
 
diff --git a/lib/librte_ring/rte_ring_version.map b/lib/librte_ring/rte_ring_version.map
index aec6f3820..3030e8edb 100644
--- a/lib/librte_ring/rte_ring_version.map
+++ b/lib/librte_ring/rte_ring_version.map
@@ -2,9 +2,11 @@ DPDK_20.0 {
 	global:
 
 	rte_ring_create;
+	rte_ring_create_elem;
 	rte_ring_dump;
 	rte_ring_free;
 	rte_ring_get_memsize;
+	rte_ring_get_memsize_elem;
 	rte_ring_init;
 	rte_ring_list_dump;
 	rte_ring_lookup;
@@ -13,10 +15,3 @@ DPDK_20.0 {
 	local: *;
 };
 
-EXPERIMENTAL {
-	global:
-
-	# added in 20.02
-	rte_ring_create_elem;
-	rte_ring_get_memsize_elem;
-};
-- 
2.17.1


^ permalink raw reply	[flat|nested] 23+ messages in thread

* [dpdk-dev] [PATCH 3/3] ring: use element APIs to implement legacy APIs
  2020-07-03 10:26 [dpdk-dev] [PATCH 0/3] ring clean up Feifei Wang
  2020-07-03 10:26 ` [dpdk-dev] [PATCH 1/3] ring: remove experimental tag for ring reset API Feifei Wang
  2020-07-03 10:26 ` [dpdk-dev] [PATCH 2/3] ring: remove experimental tag for ring element APIs Feifei Wang
@ 2020-07-03 10:26 ` Feifei Wang
  2020-07-07  5:19   ` Feifei Wang
  2020-07-07 14:45   ` Ananyev, Konstantin
  2020-07-08 14:51 ` [dpdk-dev] [PATCH 0/3] ring clean up David Marchand
  3 siblings, 2 replies; 23+ messages in thread
From: Feifei Wang @ 2020-07-03 10:26 UTC (permalink / raw)
  To: Honnappa Nagarahalli, Konstantin Ananyev; +Cc: dev, nd, Feifei Wang

Use rte_ring_xxx_elem_xxx APIs to replace legacy API implementation.
This reduces code duplication and improves code maintenance.

aarch64:
HW:N1sdp, 1 socket, 4 cores, 1 thread/core, 2.6GHz
OS:Ubuntu 18.04.1 LTS, Kernel: 5.4.0+
DPDK: 20.05-rc3, Configuration: arm64-n1sdp-linux-gcc
gcc:9.2.1

$sudo ./arm64-n1sdp-linux-gcc/app/test -l 1-2
RTE>>ring_perf_autotest

test results on aarch64 in the case of esize 4:

                                     without this patch   with this patch
Testing burst enq/deq
legacy APIs: SP/SC: burst (size: 8):          1.11              1.10
legacy APIs: SP/SC: burst (size: 32):         1.95              1.97
legacy APIs: MP/MC: burst (size: 8):          1.86              1.94
legacy APIs: MP/MC: burst (size: 32):         2.65              2.69
Testing bulk enq/deq
legacy APIs: SP/SC: bulk (size: 8):           1.08              1.09
legacy APIs: SP/SC: bulk (size: 32):          1.89              1.90
legacy APIs: MP/MC: bulk (size: 8):           1.85              1.98
legacy APIs: MP/MC: bulk (size: 32):          2.65              2.69

x86:
HW: dell, CPU Intel(R) Xeon(R) Gold 6240, 2 sockets, 18 cores/socket,
1 thread/core, 3.3GHz
OS: Ubuntu 20.04 LTS, Kernel: 5.4.0-37-generic
DPDK: 20.05-rc3, Configuration: x86_64-native-linuxapp-gcc
gcc: 9.3.0

$sudo ./x86_64-native-linuxapp-gcc/app/test -l 14,16
RTE>>ring_perf_autotest

test results on x86 in the case of esize 4:

                                     without this patch   with this patch
Testing burst enq/deq
legacy APIs: SP/SC: burst (size: 8):          29.35             27.78
legacy APIs: SP/SC: burst (size: 32):         73.11             73.39
legacy APIs: MP/MC: burst (size: 8):          62.36             62.37
legacy APIs: MP/MC: burst (size: 32):         101.01            101.03
Testing bulk enq/deq
legacy APIs: SP/SC: bulk (size: 8):           25.94             29.55
legacy APIs: SP/SC: bulk (size: 32):          70.00             78.87
legacy APIs: MP/MC: bulk (size: 8):           63.41             62.48
legacy APIs: MP/MC: bulk (size: 32):          105.86            103.84

Summary:
In aarch64 server with this patch, there is almost no performance
difference.
In x86 server with this patch, in some cases, the performance slightly
improve, in other cases, the performance slightly drop.

Signed-off-by: Feifei Wang <feifei.wang2@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
---
 lib/librte_ring/rte_ring.h | 284 ++++---------------------------------
 1 file changed, 30 insertions(+), 254 deletions(-)

diff --git a/lib/librte_ring/rte_ring.h b/lib/librte_ring/rte_ring.h
index 35f3f8c42..2a2190bfc 100644
--- a/lib/librte_ring/rte_ring.h
+++ b/lib/librte_ring/rte_ring.h
@@ -191,168 +191,6 @@ void rte_ring_free(struct rte_ring *r);
  */
 void rte_ring_dump(FILE *f, const struct rte_ring *r);
 
-/* the actual enqueue of pointers on the ring.
- * Placed here since identical code needed in both
- * single and multi producer enqueue functions */
-#define ENQUEUE_PTRS(r, ring_start, prod_head, obj_table, n, obj_type) do { \
-	unsigned int i; \
-	const uint32_t size = (r)->size; \
-	uint32_t idx = prod_head & (r)->mask; \
-	obj_type *ring = (obj_type *)ring_start; \
-	if (likely(idx + n < size)) { \
-		for (i = 0; i < (n & ~0x3); i += 4, idx += 4) { \
-			ring[idx] = obj_table[i]; \
-			ring[idx + 1] = obj_table[i + 1]; \
-			ring[idx + 2] = obj_table[i + 2]; \
-			ring[idx + 3] = obj_table[i + 3]; \
-		} \
-		switch (n & 0x3) { \
-		case 3: \
-			ring[idx++] = obj_table[i++]; /* fallthrough */ \
-		case 2: \
-			ring[idx++] = obj_table[i++]; /* fallthrough */ \
-		case 1: \
-			ring[idx++] = obj_table[i++]; \
-		} \
-	} else { \
-		for (i = 0; idx < size; i++, idx++)\
-			ring[idx] = obj_table[i]; \
-		for (idx = 0; i < n; i++, idx++) \
-			ring[idx] = obj_table[i]; \
-	} \
-} while (0)
-
-/* the actual copy of pointers on the ring to obj_table.
- * Placed here since identical code needed in both
- * single and multi consumer dequeue functions */
-#define DEQUEUE_PTRS(r, ring_start, cons_head, obj_table, n, obj_type) do { \
-	unsigned int i; \
-	uint32_t idx = cons_head & (r)->mask; \
-	const uint32_t size = (r)->size; \
-	obj_type *ring = (obj_type *)ring_start; \
-	if (likely(idx + n < size)) { \
-		for (i = 0; i < (n & ~0x3); i += 4, idx += 4) {\
-			obj_table[i] = ring[idx]; \
-			obj_table[i + 1] = ring[idx + 1]; \
-			obj_table[i + 2] = ring[idx + 2]; \
-			obj_table[i + 3] = ring[idx + 3]; \
-		} \
-		switch (n & 0x3) { \
-		case 3: \
-			obj_table[i++] = ring[idx++]; /* fallthrough */ \
-		case 2: \
-			obj_table[i++] = ring[idx++]; /* fallthrough */ \
-		case 1: \
-			obj_table[i++] = ring[idx++]; \
-		} \
-	} else { \
-		for (i = 0; idx < size; i++, idx++) \
-			obj_table[i] = ring[idx]; \
-		for (idx = 0; i < n; i++, idx++) \
-			obj_table[i] = ring[idx]; \
-	} \
-} while (0)
-
-/* Between load and load. there might be cpu reorder in weak model
- * (powerpc/arm).
- * There are 2 choices for the users
- * 1.use rmb() memory barrier
- * 2.use one-direction load_acquire/store_release barrier,defined by
- * CONFIG_RTE_USE_C11_MEM_MODEL=y
- * It depends on performance test results.
- * By default, move common functions to rte_ring_generic.h
- */
-#ifdef RTE_USE_C11_MEM_MODEL
-#include "rte_ring_c11_mem.h"
-#else
-#include "rte_ring_generic.h"
-#endif
-
-/**
- * @internal Enqueue several objects on the ring
- *
-  * @param r
- *   A pointer to the ring structure.
- * @param obj_table
- *   A pointer to a table of void * pointers (objects).
- * @param n
- *   The number of objects to add in the ring from the obj_table.
- * @param behavior
- *   RTE_RING_QUEUE_FIXED:    Enqueue a fixed number of items from a ring
- *   RTE_RING_QUEUE_VARIABLE: Enqueue as many items as possible from ring
- * @param is_sp
- *   Indicates whether to use single producer or multi-producer head update
- * @param free_space
- *   returns the amount of space after the enqueue operation has finished
- * @return
- *   Actual number of objects enqueued.
- *   If behavior == RTE_RING_QUEUE_FIXED, this will be 0 or n only.
- */
-static __rte_always_inline unsigned int
-__rte_ring_do_enqueue(struct rte_ring *r, void * const *obj_table,
-		 unsigned int n, enum rte_ring_queue_behavior behavior,
-		 unsigned int is_sp, unsigned int *free_space)
-{
-	uint32_t prod_head, prod_next;
-	uint32_t free_entries;
-
-	n = __rte_ring_move_prod_head(r, is_sp, n, behavior,
-			&prod_head, &prod_next, &free_entries);
-	if (n == 0)
-		goto end;
-
-	ENQUEUE_PTRS(r, &r[1], prod_head, obj_table, n, void *);
-
-	update_tail(&r->prod, prod_head, prod_next, is_sp, 1);
-end:
-	if (free_space != NULL)
-		*free_space = free_entries - n;
-	return n;
-}
-
-/**
- * @internal Dequeue several objects from the ring
- *
- * @param r
- *   A pointer to the ring structure.
- * @param obj_table
- *   A pointer to a table of void * pointers (objects).
- * @param n
- *   The number of objects to pull from the ring.
- * @param behavior
- *   RTE_RING_QUEUE_FIXED:    Dequeue a fixed number of items from a ring
- *   RTE_RING_QUEUE_VARIABLE: Dequeue as many items as possible from ring
- * @param is_sc
- *   Indicates whether to use single consumer or multi-consumer head update
- * @param available
- *   returns the number of remaining ring entries after the dequeue has finished
- * @return
- *   - Actual number of objects dequeued.
- *     If behavior == RTE_RING_QUEUE_FIXED, this will be 0 or n only.
- */
-static __rte_always_inline unsigned int
-__rte_ring_do_dequeue(struct rte_ring *r, void **obj_table,
-		 unsigned int n, enum rte_ring_queue_behavior behavior,
-		 unsigned int is_sc, unsigned int *available)
-{
-	uint32_t cons_head, cons_next;
-	uint32_t entries;
-
-	n = __rte_ring_move_cons_head(r, (int)is_sc, n, behavior,
-			&cons_head, &cons_next, &entries);
-	if (n == 0)
-		goto end;
-
-	DEQUEUE_PTRS(r, &r[1], cons_head, obj_table, n, void *);
-
-	update_tail(&r->cons, cons_head, cons_next, is_sc, 0);
-
-end:
-	if (available != NULL)
-		*available = entries - n;
-	return n;
-}
-
 /**
  * Enqueue several objects on the ring (multi-producers safe).
  *
@@ -375,8 +213,8 @@ static __rte_always_inline unsigned int
 rte_ring_mp_enqueue_bulk(struct rte_ring *r, void * const *obj_table,
 			 unsigned int n, unsigned int *free_space)
 {
-	return __rte_ring_do_enqueue(r, obj_table, n, RTE_RING_QUEUE_FIXED,
-			RTE_RING_SYNC_MT, free_space);
+	return rte_ring_mp_enqueue_bulk_elem(r, obj_table, sizeof(void *),
+			n, free_space);
 }
 
 /**
@@ -398,8 +236,8 @@ static __rte_always_inline unsigned int
 rte_ring_sp_enqueue_bulk(struct rte_ring *r, void * const *obj_table,
 			 unsigned int n, unsigned int *free_space)
 {
-	return __rte_ring_do_enqueue(r, obj_table, n, RTE_RING_QUEUE_FIXED,
-			RTE_RING_SYNC_ST, free_space);
+	return rte_ring_sp_enqueue_bulk_elem(r, obj_table, sizeof(void *),
+			n, free_space);
 }
 
 /**
@@ -425,24 +263,8 @@ static __rte_always_inline unsigned int
 rte_ring_enqueue_bulk(struct rte_ring *r, void * const *obj_table,
 		      unsigned int n, unsigned int *free_space)
 {
-	switch (r->prod.sync_type) {
-	case RTE_RING_SYNC_MT:
-		return rte_ring_mp_enqueue_bulk(r, obj_table, n, free_space);
-	case RTE_RING_SYNC_ST:
-		return rte_ring_sp_enqueue_bulk(r, obj_table, n, free_space);
-#ifdef ALLOW_EXPERIMENTAL_API
-	case RTE_RING_SYNC_MT_RTS:
-		return rte_ring_mp_rts_enqueue_bulk(r, obj_table, n,
-			free_space);
-	case RTE_RING_SYNC_MT_HTS:
-		return rte_ring_mp_hts_enqueue_bulk(r, obj_table, n,
-			free_space);
-#endif
-	}
-
-	/* valid ring should never reach this point */
-	RTE_ASSERT(0);
-	return 0;
+	return rte_ring_enqueue_bulk_elem(r, obj_table, sizeof(void *),
+			n, free_space);
 }
 
 /**
@@ -462,7 +284,7 @@ rte_ring_enqueue_bulk(struct rte_ring *r, void * const *obj_table,
 static __rte_always_inline int
 rte_ring_mp_enqueue(struct rte_ring *r, void *obj)
 {
-	return rte_ring_mp_enqueue_bulk(r, &obj, 1, NULL) ? 0 : -ENOBUFS;
+	return rte_ring_mp_enqueue_elem(r, obj, sizeof(void *));
 }
 
 /**
@@ -479,7 +301,7 @@ rte_ring_mp_enqueue(struct rte_ring *r, void *obj)
 static __rte_always_inline int
 rte_ring_sp_enqueue(struct rte_ring *r, void *obj)
 {
-	return rte_ring_sp_enqueue_bulk(r, &obj, 1, NULL) ? 0 : -ENOBUFS;
+	return rte_ring_sp_enqueue_elem(r, obj, sizeof(void *));
 }
 
 /**
@@ -500,7 +322,7 @@ rte_ring_sp_enqueue(struct rte_ring *r, void *obj)
 static __rte_always_inline int
 rte_ring_enqueue(struct rte_ring *r, void *obj)
 {
-	return rte_ring_enqueue_bulk(r, &obj, 1, NULL) ? 0 : -ENOBUFS;
+	return rte_ring_enqueue_elem(r, obj, sizeof(void *));
 }
 
 /**
@@ -525,8 +347,8 @@ static __rte_always_inline unsigned int
 rte_ring_mc_dequeue_bulk(struct rte_ring *r, void **obj_table,
 		unsigned int n, unsigned int *available)
 {
-	return __rte_ring_do_dequeue(r, obj_table, n, RTE_RING_QUEUE_FIXED,
-			RTE_RING_SYNC_MT, available);
+	return rte_ring_mc_dequeue_bulk_elem(r, obj_table, sizeof(void *),
+			n, available);
 }
 
 /**
@@ -549,8 +371,8 @@ static __rte_always_inline unsigned int
 rte_ring_sc_dequeue_bulk(struct rte_ring *r, void **obj_table,
 		unsigned int n, unsigned int *available)
 {
-	return __rte_ring_do_dequeue(r, obj_table, n, RTE_RING_QUEUE_FIXED,
-			RTE_RING_SYNC_ST, available);
+	return rte_ring_sc_dequeue_bulk_elem(r, obj_table, sizeof(void *),
+			n, available);
 }
 
 /**
@@ -576,22 +398,8 @@ static __rte_always_inline unsigned int
 rte_ring_dequeue_bulk(struct rte_ring *r, void **obj_table, unsigned int n,
 		unsigned int *available)
 {
-	switch (r->cons.sync_type) {
-	case RTE_RING_SYNC_MT:
-		return rte_ring_mc_dequeue_bulk(r, obj_table, n, available);
-	case RTE_RING_SYNC_ST:
-		return rte_ring_sc_dequeue_bulk(r, obj_table, n, available);
-#ifdef ALLOW_EXPERIMENTAL_API
-	case RTE_RING_SYNC_MT_RTS:
-		return rte_ring_mc_rts_dequeue_bulk(r, obj_table, n, available);
-	case RTE_RING_SYNC_MT_HTS:
-		return rte_ring_mc_hts_dequeue_bulk(r, obj_table, n, available);
-#endif
-	}
-
-	/* valid ring should never reach this point */
-	RTE_ASSERT(0);
-	return 0;
+	return rte_ring_dequeue_bulk_elem(r, obj_table, sizeof(void *),
+			n, available);
 }
 
 /**
@@ -612,7 +420,7 @@ rte_ring_dequeue_bulk(struct rte_ring *r, void **obj_table, unsigned int n,
 static __rte_always_inline int
 rte_ring_mc_dequeue(struct rte_ring *r, void **obj_p)
 {
-	return rte_ring_mc_dequeue_bulk(r, obj_p, 1, NULL)  ? 0 : -ENOENT;
+	return rte_ring_mc_dequeue_elem(r, obj_p, sizeof(void *));
 }
 
 /**
@@ -630,7 +438,7 @@ rte_ring_mc_dequeue(struct rte_ring *r, void **obj_p)
 static __rte_always_inline int
 rte_ring_sc_dequeue(struct rte_ring *r, void **obj_p)
 {
-	return rte_ring_sc_dequeue_bulk(r, obj_p, 1, NULL) ? 0 : -ENOENT;
+	return rte_ring_sc_dequeue_elem(r, obj_p, sizeof(void *));
 }
 
 /**
@@ -652,7 +460,7 @@ rte_ring_sc_dequeue(struct rte_ring *r, void **obj_p)
 static __rte_always_inline int
 rte_ring_dequeue(struct rte_ring *r, void **obj_p)
 {
-	return rte_ring_dequeue_bulk(r, obj_p, 1, NULL) ? 0 : -ENOENT;
+	return rte_ring_dequeue_elem(r, obj_p, sizeof(void *));
 }
 
 /**
@@ -860,8 +668,8 @@ static __rte_always_inline unsigned int
 rte_ring_mp_enqueue_burst(struct rte_ring *r, void * const *obj_table,
 			 unsigned int n, unsigned int *free_space)
 {
-	return __rte_ring_do_enqueue(r, obj_table, n,
-			RTE_RING_QUEUE_VARIABLE, RTE_RING_SYNC_MT, free_space);
+	return rte_ring_mp_enqueue_burst_elem(r, obj_table, sizeof(void *),
+			n, free_space);
 }
 
 /**
@@ -883,8 +691,8 @@ static __rte_always_inline unsigned int
 rte_ring_sp_enqueue_burst(struct rte_ring *r, void * const *obj_table,
 			 unsigned int n, unsigned int *free_space)
 {
-	return __rte_ring_do_enqueue(r, obj_table, n,
-			RTE_RING_QUEUE_VARIABLE, RTE_RING_SYNC_ST, free_space);
+	return rte_ring_sp_enqueue_burst_elem(r, obj_table, sizeof(void *),
+			n, free_space);
 }
 
 /**
@@ -910,24 +718,8 @@ static __rte_always_inline unsigned int
 rte_ring_enqueue_burst(struct rte_ring *r, void * const *obj_table,
 		      unsigned int n, unsigned int *free_space)
 {
-	switch (r->prod.sync_type) {
-	case RTE_RING_SYNC_MT:
-		return rte_ring_mp_enqueue_burst(r, obj_table, n, free_space);
-	case RTE_RING_SYNC_ST:
-		return rte_ring_sp_enqueue_burst(r, obj_table, n, free_space);
-#ifdef ALLOW_EXPERIMENTAL_API
-	case RTE_RING_SYNC_MT_RTS:
-		return rte_ring_mp_rts_enqueue_burst(r, obj_table, n,
-			free_space);
-	case RTE_RING_SYNC_MT_HTS:
-		return rte_ring_mp_hts_enqueue_burst(r, obj_table, n,
-			free_space);
-#endif
-	}
-
-	/* valid ring should never reach this point */
-	RTE_ASSERT(0);
-	return 0;
+	return rte_ring_enqueue_burst_elem(r, obj_table, sizeof(void *),
+			n, free_space);
 }
 
 /**
@@ -954,8 +746,8 @@ static __rte_always_inline unsigned int
 rte_ring_mc_dequeue_burst(struct rte_ring *r, void **obj_table,
 		unsigned int n, unsigned int *available)
 {
-	return __rte_ring_do_dequeue(r, obj_table, n,
-			RTE_RING_QUEUE_VARIABLE, RTE_RING_SYNC_MT, available);
+	return rte_ring_mc_dequeue_burst_elem(r, obj_table, sizeof(void *),
+			n, available);
 }
 
 /**
@@ -979,8 +771,8 @@ static __rte_always_inline unsigned int
 rte_ring_sc_dequeue_burst(struct rte_ring *r, void **obj_table,
 		unsigned int n, unsigned int *available)
 {
-	return __rte_ring_do_dequeue(r, obj_table, n,
-			RTE_RING_QUEUE_VARIABLE, RTE_RING_SYNC_ST, available);
+	return rte_ring_sc_dequeue_burst_elem(r, obj_table, sizeof(void *),
+			n, available);
 }
 
 /**
@@ -1006,24 +798,8 @@ static __rte_always_inline unsigned int
 rte_ring_dequeue_burst(struct rte_ring *r, void **obj_table,
 		unsigned int n, unsigned int *available)
 {
-	switch (r->cons.sync_type) {
-	case RTE_RING_SYNC_MT:
-		return rte_ring_mc_dequeue_burst(r, obj_table, n, available);
-	case RTE_RING_SYNC_ST:
-		return rte_ring_sc_dequeue_burst(r, obj_table, n, available);
-#ifdef ALLOW_EXPERIMENTAL_API
-	case RTE_RING_SYNC_MT_RTS:
-		return rte_ring_mc_rts_dequeue_burst(r, obj_table, n,
-			available);
-	case RTE_RING_SYNC_MT_HTS:
-		return rte_ring_mc_hts_dequeue_burst(r, obj_table, n,
-			available);
-#endif
-	}
-
-	/* valid ring should never reach this point */
-	RTE_ASSERT(0);
-	return 0;
+	return rte_ring_dequeue_burst_elem(r, obj_table, sizeof(void *),
+			n, available);
 }
 
 #ifdef __cplusplus
-- 
2.17.1


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [dpdk-dev] [PATCH 1/3] ring: remove experimental tag for ring reset API
  2020-07-03 10:26 ` [dpdk-dev] [PATCH 1/3] ring: remove experimental tag for ring reset API Feifei Wang
@ 2020-07-03 16:16   ` Kinsella, Ray
  2020-07-03 18:46     ` Honnappa Nagarahalli
  0 siblings, 1 reply; 23+ messages in thread
From: Kinsella, Ray @ 2020-07-03 16:16 UTC (permalink / raw)
  To: Feifei Wang, Honnappa Nagarahalli, Konstantin Ananyev, Neil Horman
  Cc: dev, nd



On 03/07/2020 11:26, Feifei Wang wrote:
> Remove the experimental tag for rte_ring_reset API that have been around
> for 4 releases.
> 
> Signed-off-by: Feifei Wang <feifei.wang2@arm.com>
> Reviewed-by: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>
> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
> ---
>  lib/librte_ring/rte_ring.h           | 3 ---
>  lib/librte_ring/rte_ring_version.map | 4 +---
>  2 files changed, 1 insertion(+), 6 deletions(-)
> 
> diff --git a/lib/librte_ring/rte_ring.h b/lib/librte_ring/rte_ring.h
> index f67141482..7181c33b4 100644
> --- a/lib/librte_ring/rte_ring.h
> +++ b/lib/librte_ring/rte_ring.h
> @@ -663,15 +663,12 @@ rte_ring_dequeue(struct rte_ring *r, void **obj_p)
>   *
>   * This function flush all the elements in a ring
>   *
> - * @b EXPERIMENTAL: this API may change without prior notice
> - *
>   * @warning
>   * Make sure the ring is not in use while calling this function.
>   *
>   * @param r
>   *   A pointer to the ring structure.
>   */
> -__rte_experimental
>  void
>  rte_ring_reset(struct rte_ring *r);
>  
> diff --git a/lib/librte_ring/rte_ring_version.map b/lib/librte_ring/rte_ring_version.map
> index e88c143cf..aec6f3820 100644
> --- a/lib/librte_ring/rte_ring_version.map
> +++ b/lib/librte_ring/rte_ring_version.map
> @@ -8,6 +8,7 @@ DPDK_20.0 {
>  	rte_ring_init;
>  	rte_ring_list_dump;
>  	rte_ring_lookup;
> +	rte_ring_reset;
>  
>  	local: *;
>  };
> @@ -15,9 +16,6 @@ DPDK_20.0 {
>  EXPERIMENTAL {
>  	global:
>  
> -	# added in 19.08
> -	rte_ring_reset;
> -
>  	# added in 20.02
>  	rte_ring_create_elem;
>  	rte_ring_get_memsize_elem;

So strictly speaking, rte_ring_reset is part of the DPDK_21 ABI, not the v20.0 ABI.

The way to solve is to add it the DPDK_21 ABI in the map file.
And then use the VERSION_SYMBOL_EXPERIMENTAL to alias to experimental if necessary. 

https://doc.dpdk.org/guides/contributing/abi_versioning.html#versioning-macros

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [dpdk-dev] [PATCH 2/3] ring: remove experimental tag for ring element APIs
  2020-07-03 10:26 ` [dpdk-dev] [PATCH 2/3] ring: remove experimental tag for ring element APIs Feifei Wang
@ 2020-07-03 16:17   ` Kinsella, Ray
  2020-07-07 14:44   ` Ananyev, Konstantin
  1 sibling, 0 replies; 23+ messages in thread
From: Kinsella, Ray @ 2020-07-03 16:17 UTC (permalink / raw)
  To: Feifei Wang, Honnappa Nagarahalli, Konstantin Ananyev, Neil Horman
  Cc: dev, nd



On 03/07/2020 11:26, Feifei Wang wrote:
> Remove the experimental tag for rte_ring_xxx_elem APIs that have been
> around for 2 releases.
> 
> Signed-off-by: Feifei Wang <feifei.wang2@arm.com>
> Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
> ---
>  lib/librte_ring/rte_ring.h           | 5 +----
>  lib/librte_ring/rte_ring_elem.h      | 8 --------
>  lib/librte_ring/rte_ring_version.map | 9 ++-------
>  3 files changed, 3 insertions(+), 19 deletions(-)
> 
> diff --git a/lib/librte_ring/rte_ring.h b/lib/librte_ring/rte_ring.h
> index 7181c33b4..35f3f8c42 100644
> --- a/lib/librte_ring/rte_ring.h
> +++ b/lib/librte_ring/rte_ring.h
> @@ -40,6 +40,7 @@ extern "C" {
>  #endif
>  
>  #include <rte_ring_core.h>
> +#include <rte_ring_elem.h>
>  
>  /**
>   * Calculate the memory size needed for a ring
> @@ -401,10 +402,6 @@ rte_ring_sp_enqueue_bulk(struct rte_ring *r, void * const *obj_table,
>  			RTE_RING_SYNC_ST, free_space);
>  }
>  
> -#ifdef ALLOW_EXPERIMENTAL_API
> -#include <rte_ring_elem.h>
> -#endif
> -
>  /**
>   * Enqueue several objects on a ring.
>   *
> diff --git a/lib/librte_ring/rte_ring_elem.h b/lib/librte_ring/rte_ring_elem.h
> index 9e5192ae6..69dc51746 100644
> --- a/lib/librte_ring/rte_ring_elem.h
> +++ b/lib/librte_ring/rte_ring_elem.h
> @@ -23,9 +23,6 @@ extern "C" {
>  #include <rte_ring_core.h>
>  
>  /**
> - * @warning
> - * @b EXPERIMENTAL: this API may change without prior notice
> - *
>   * Calculate the memory size needed for a ring with given element size
>   *
>   * This function returns the number of bytes needed for a ring, given
> @@ -43,13 +40,9 @@ extern "C" {
>   *   - -EINVAL - esize is not a multiple of 4 or count provided is not a
>   *		 power of 2.
>   */
> -__rte_experimental
>  ssize_t rte_ring_get_memsize_elem(unsigned int esize, unsigned int count);
>  
>  /**
> - * @warning
> - * @b EXPERIMENTAL: this API may change without prior notice
> - *
>   * Create a new ring named *name* that stores elements with given size.
>   *
>   * This function uses ``memzone_reserve()`` to allocate memory. Then it
> @@ -109,7 +102,6 @@ ssize_t rte_ring_get_memsize_elem(unsigned int esize, unsigned int count);
>   *    - EEXIST - a memzone with the same name already exists
>   *    - ENOMEM - no appropriate memory area found in which to create memzone
>   */
> -__rte_experimental
>  struct rte_ring *rte_ring_create_elem(const char *name, unsigned int esize,
>  			unsigned int count, int socket_id, unsigned int flags);
>  
> diff --git a/lib/librte_ring/rte_ring_version.map b/lib/librte_ring/rte_ring_version.map
> index aec6f3820..3030e8edb 100644
> --- a/lib/librte_ring/rte_ring_version.map
> +++ b/lib/librte_ring/rte_ring_version.map
> @@ -2,9 +2,11 @@ DPDK_20.0 {
>  	global:
>  
>  	rte_ring_create;
> +	rte_ring_create_elem;
>  	rte_ring_dump;
>  	rte_ring_free;
>  	rte_ring_get_memsize;
> +	rte_ring_get_memsize_elem;
>  	rte_ring_init;
>  	rte_ring_list_dump;
>  	rte_ring_lookup;
> @@ -13,10 +15,3 @@ DPDK_20.0 {
>  	local: *;
>  };
>  
> -EXPERIMENTAL {
> -	global:
> -
> -	# added in 20.02
> -	rte_ring_create_elem;
> -	rte_ring_get_memsize_elem;
> -};
> 

Same as the last comment.
Rte_ring_get_memsize_elem and rte_ring_create_elem are part of the DPDK_21 ABI.

Ray K

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [dpdk-dev] [PATCH 1/3] ring: remove experimental tag for ring reset API
  2020-07-03 16:16   ` Kinsella, Ray
@ 2020-07-03 18:46     ` Honnappa Nagarahalli
  2020-07-06  6:23       ` Kinsella, Ray
  0 siblings, 1 reply; 23+ messages in thread
From: Honnappa Nagarahalli @ 2020-07-03 18:46 UTC (permalink / raw)
  To: Kinsella, Ray, Feifei Wang, Konstantin Ananyev, Neil Horman
  Cc: dev, nd, Honnappa Nagarahalli, nd

<snip>

> 
> On 03/07/2020 11:26, Feifei Wang wrote:
> > Remove the experimental tag for rte_ring_reset API that have been
> > around for 4 releases.
> >
> > Signed-off-by: Feifei Wang <feifei.wang2@arm.com>
> > Reviewed-by: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>
> > Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
> > ---
> >  lib/librte_ring/rte_ring.h           | 3 ---
> >  lib/librte_ring/rte_ring_version.map | 4 +---
> >  2 files changed, 1 insertion(+), 6 deletions(-)
> >
> > diff --git a/lib/librte_ring/rte_ring.h b/lib/librte_ring/rte_ring.h
> > index f67141482..7181c33b4 100644
> > --- a/lib/librte_ring/rte_ring.h
> > +++ b/lib/librte_ring/rte_ring.h
> > @@ -663,15 +663,12 @@ rte_ring_dequeue(struct rte_ring *r, void **obj_p)
> >   *
> >   * This function flush all the elements in a ring
> >   *
> > - * @b EXPERIMENTAL: this API may change without prior notice
> > - *
> >   * @warning
> >   * Make sure the ring is not in use while calling this function.
> >   *
> >   * @param r
> >   *   A pointer to the ring structure.
> >   */
> > -__rte_experimental
> >  void
> >  rte_ring_reset(struct rte_ring *r);
> >
> > diff --git a/lib/librte_ring/rte_ring_version.map
> > b/lib/librte_ring/rte_ring_version.map
> > index e88c143cf..aec6f3820 100644
> > --- a/lib/librte_ring/rte_ring_version.map
> > +++ b/lib/librte_ring/rte_ring_version.map
> > @@ -8,6 +8,7 @@ DPDK_20.0 {
> >  	rte_ring_init;
> >  	rte_ring_list_dump;
> >  	rte_ring_lookup;
> > +	rte_ring_reset;
> >
> >  	local: *;
> >  };
> > @@ -15,9 +16,6 @@ DPDK_20.0 {
> >  EXPERIMENTAL {
> >  	global:
> >
> > -	# added in 19.08
> > -	rte_ring_reset;
> > -
> >  	# added in 20.02
> >  	rte_ring_create_elem;
> >  	rte_ring_get_memsize_elem;
> 
> So strictly speaking, rte_ring_reset is part of the DPDK_21 ABI, not the v20.0
> ABI.
Thanks Ray for clarifying this.

> 
> The way to solve is to add it the DPDK_21 ABI in the map file.
> And then use the VERSION_SYMBOL_EXPERIMENTAL to alias to experimental
> if necessary.
Is using VERSION_SYMBOL_EXPERIMENTAL a must? The documentation also seems to be vague. It says " The macro is used when a symbol matures to become part of the stable ABI, to provide an alias to experimental for some time". What does 'some time' mean?

> 
> https://doc.dpdk.org/guides/contributing/abi_versioning.html#versioning-
> macros

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [dpdk-dev] [PATCH 1/3] ring: remove experimental tag for ring reset API
  2020-07-03 18:46     ` Honnappa Nagarahalli
@ 2020-07-06  6:23       ` Kinsella, Ray
  2020-07-07  3:19         ` Feifei Wang
  0 siblings, 1 reply; 23+ messages in thread
From: Kinsella, Ray @ 2020-07-06  6:23 UTC (permalink / raw)
  To: Honnappa Nagarahalli, Feifei Wang, Konstantin Ananyev, Neil Horman
  Cc: dev, nd



On 03/07/2020 19:46, Honnappa Nagarahalli wrote:
> <snip>
> 
>>
>> On 03/07/2020 11:26, Feifei Wang wrote:
>>> Remove the experimental tag for rte_ring_reset API that have been
>>> around for 4 releases.
>>>
>>> Signed-off-by: Feifei Wang <feifei.wang2@arm.com>
>>> Reviewed-by: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>
>>> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
>>> ---
>>>  lib/librte_ring/rte_ring.h           | 3 ---
>>>  lib/librte_ring/rte_ring_version.map | 4 +---
>>>  2 files changed, 1 insertion(+), 6 deletions(-)
>>>
>>> diff --git a/lib/librte_ring/rte_ring.h b/lib/librte_ring/rte_ring.h
>>> index f67141482..7181c33b4 100644
>>> --- a/lib/librte_ring/rte_ring.h
>>> +++ b/lib/librte_ring/rte_ring.h
>>> @@ -663,15 +663,12 @@ rte_ring_dequeue(struct rte_ring *r, void **obj_p)
>>>   *
>>>   * This function flush all the elements in a ring
>>>   *
>>> - * @b EXPERIMENTAL: this API may change without prior notice
>>> - *
>>>   * @warning
>>>   * Make sure the ring is not in use while calling this function.
>>>   *
>>>   * @param r
>>>   *   A pointer to the ring structure.
>>>   */
>>> -__rte_experimental
>>>  void
>>>  rte_ring_reset(struct rte_ring *r);
>>>
>>> diff --git a/lib/librte_ring/rte_ring_version.map
>>> b/lib/librte_ring/rte_ring_version.map
>>> index e88c143cf..aec6f3820 100644
>>> --- a/lib/librte_ring/rte_ring_version.map
>>> +++ b/lib/librte_ring/rte_ring_version.map
>>> @@ -8,6 +8,7 @@ DPDK_20.0 {
>>>  	rte_ring_init;
>>>  	rte_ring_list_dump;
>>>  	rte_ring_lookup;
>>> +	rte_ring_reset;
>>>
>>>  	local: *;
>>>  };
>>> @@ -15,9 +16,6 @@ DPDK_20.0 {
>>>  EXPERIMENTAL {
>>>  	global:
>>>
>>> -	# added in 19.08
>>> -	rte_ring_reset;
>>> -
>>>  	# added in 20.02
>>>  	rte_ring_create_elem;
>>>  	rte_ring_get_memsize_elem;
>>
>> So strictly speaking, rte_ring_reset is part of the DPDK_21 ABI, not the v20.0
>> ABI.
> Thanks Ray for clarifying this.
> 
>>
>> The way to solve is to add it the DPDK_21 ABI in the map file.
>> And then use the VERSION_SYMBOL_EXPERIMENTAL to alias to experimental
>> if necessary.
> Is using VERSION_SYMBOL_EXPERIMENTAL a must? 

Purely at the discretion of the contributor and maintainer. 
If it has been around for a while, applications are using it and changing the symbol will break them.

You may choose to provide the alias or not. 

> The documentation also seems to be vague. It says " The macro is used when a symbol matures to become part of the stable ABI, to provide an alias to experimental for some time". What does 'some time' mean?

"Some time" is a bit vague alright, should be "until the next major ABI version" - I will fix. 

> 
>>
>> https://doc.dpdk.org/guides/contributing/abi_versioning.html#versioning-
>> macros

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [dpdk-dev] [PATCH 1/3] ring: remove experimental tag for ring reset API
  2020-07-06  6:23       ` Kinsella, Ray
@ 2020-07-07  3:19         ` Feifei Wang
  2020-07-07  7:40           ` Kinsella, Ray
  0 siblings, 1 reply; 23+ messages in thread
From: Feifei Wang @ 2020-07-07  3:19 UTC (permalink / raw)
  To: Kinsella, Ray, Honnappa Nagarahalli, Konstantin Ananyev, Neil Horman
  Cc: dev, nd, nd



> -----Original Message-----
> From: Kinsella, Ray <mdr@ashroe.eu>
> Sent: 2020年7月6日 14:23
> To: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>; Feifei Wang
> <Feifei.Wang2@arm.com>; Konstantin Ananyev
> <konstantin.ananyev@intel.com>; Neil Horman <nhorman@tuxdriver.com>
> Cc: dev@dpdk.org; nd <nd@arm.com>
> Subject: Re: [PATCH 1/3] ring: remove experimental tag for ring reset API
> 
> 
> 
> On 03/07/2020 19:46, Honnappa Nagarahalli wrote:
> > <snip>
> >
> >>
> >> On 03/07/2020 11:26, Feifei Wang wrote:
> >>> Remove the experimental tag for rte_ring_reset API that have been
> >>> around for 4 releases.
> >>>
> >>> Signed-off-by: Feifei Wang <feifei.wang2@arm.com>
> >>> Reviewed-by: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>
> >>> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
> >>> ---
> >>>  lib/librte_ring/rte_ring.h           | 3 ---
> >>>  lib/librte_ring/rte_ring_version.map | 4 +---
> >>>  2 files changed, 1 insertion(+), 6 deletions(-)
> >>>
> >>> diff --git a/lib/librte_ring/rte_ring.h b/lib/librte_ring/rte_ring.h
> >>> index f67141482..7181c33b4 100644
> >>> --- a/lib/librte_ring/rte_ring.h
> >>> +++ b/lib/librte_ring/rte_ring.h
> >>> @@ -663,15 +663,12 @@ rte_ring_dequeue(struct rte_ring *r, void
> **obj_p)
> >>>   *
> >>>   * This function flush all the elements in a ring
> >>>   *
> >>> - * @b EXPERIMENTAL: this API may change without prior notice
> >>> - *
> >>>   * @warning
> >>>   * Make sure the ring is not in use while calling this function.
> >>>   *
> >>>   * @param r
> >>>   *   A pointer to the ring structure.
> >>>   */
> >>> -__rte_experimental
> >>>  void
> >>>  rte_ring_reset(struct rte_ring *r);
> >>>
> >>> diff --git a/lib/librte_ring/rte_ring_version.map
> >>> b/lib/librte_ring/rte_ring_version.map
> >>> index e88c143cf..aec6f3820 100644
> >>> --- a/lib/librte_ring/rte_ring_version.map
> >>> +++ b/lib/librte_ring/rte_ring_version.map
> >>> @@ -8,6 +8,7 @@ DPDK_20.0 {
> >>>  	rte_ring_init;
> >>>  	rte_ring_list_dump;
> >>>  	rte_ring_lookup;
> >>> +	rte_ring_reset;
> >>>
> >>>  	local: *;
> >>>  };
> >>> @@ -15,9 +16,6 @@ DPDK_20.0 {
> >>>  EXPERIMENTAL {
> >>>  	global:
> >>>
> >>> -	# added in 19.08
> >>> -	rte_ring_reset;
> >>> -
> >>>  	# added in 20.02
> >>>  	rte_ring_create_elem;
> >>>  	rte_ring_get_memsize_elem;
> >>
> >> So strictly speaking, rte_ring_reset is part of the DPDK_21 ABI, not
> >> the v20.0 ABI.
> > Thanks Ray for clarifying this.
> >
Thanks very much for pointing this.
> >>
> >> The way to solve is to add it the DPDK_21 ABI in the map file.
> >> And then use the VERSION_SYMBOL_EXPERIMENTAL to alias to
> experimental
> >> if necessary.
> > Is using VERSION_SYMBOL_EXPERIMENTAL a must?
> 
> Purely at the discretion of the contributor and maintainer.
> If it has been around for a while, applications are using it and changing the
> symbol will break them.
> 
> You may choose to provide the alias or not.
Ok, in the new patch version, I will add it into the DPDK_21 ABI but the 
VERSION_SYMBOL_EXPERIMENTAL will not be added, because if it is added in this
version, it is still needed to be removed in the near future.

Thanks very much for your review.
> 
> > The documentation also seems to be vague. It says " The macro is used
> when a symbol matures to become part of the stable ABI, to provide an alias
> to experimental for some time". What does 'some time' mean?
> 
> "Some time" is a bit vague alright, should be "until the next major ABI
> version" - I will fix.
> 
> >
> >>
> >> https://doc.dpdk.org/guides/contributing/abi_versioning.html#versioni
> >> ng-
> >> macros

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [dpdk-dev] [PATCH 3/3] ring: use element APIs to implement legacy APIs
  2020-07-03 10:26 ` [dpdk-dev] [PATCH 3/3] ring: use element APIs to implement legacy APIs Feifei Wang
@ 2020-07-07  5:19   ` Feifei Wang
  2020-07-07 14:04     ` Ananyev, Konstantin
  2020-07-07 14:45   ` Ananyev, Konstantin
  1 sibling, 1 reply; 23+ messages in thread
From: Feifei Wang @ 2020-07-07  5:19 UTC (permalink / raw)
  To: Feifei Wang, Honnappa Nagarahalli, Konstantin Ananyev, drc
  Cc: dev, nd, Ruifeng Wang, nd

Hi, Konstantin, David

I'm Feifei Wang from Arm. Sorry to make the following request:
Would you please do some ring performance tests of this patch in your platforms at the time you are free?
And I want to know whether this patch has a significant impact on other platforms except ARM.

Thanks very much.
Feifei

> -----Original Message-----
> From: Feifei Wang <feifei.wang2@arm.com>
> Sent: 2020年7月3日 18:27
> To: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>; Konstantin
> Ananyev <konstantin.ananyev@intel.com>
> Cc: dev@dpdk.org; nd <nd@arm.com>; Feifei Wang
> <Feifei.Wang2@arm.com>
> Subject: [PATCH 3/3] ring: use element APIs to implement legacy APIs
> 
> Use rte_ring_xxx_elem_xxx APIs to replace legacy API implementation.
> This reduces code duplication and improves code maintenance.
> 
> aarch64:
> HW:N1sdp, 1 socket, 4 cores, 1 thread/core, 2.6GHz OS:Ubuntu 18.04.1 LTS,
> Kernel: 5.4.0+
> DPDK: 20.05-rc3, Configuration: arm64-n1sdp-linux-gcc
> gcc:9.2.1
> 
> $sudo ./arm64-n1sdp-linux-gcc/app/test -l 1-2
> RTE>>ring_perf_autotest
> 
> test results on aarch64 in the case of esize 4:
> 
>                                      without this patch   with this patch
> Testing burst enq/deq
> legacy APIs: SP/SC: burst (size: 8):          1.11              1.10
> legacy APIs: SP/SC: burst (size: 32):         1.95              1.97
> legacy APIs: MP/MC: burst (size: 8):          1.86              1.94
> legacy APIs: MP/MC: burst (size: 32):         2.65              2.69
> Testing bulk enq/deq
> legacy APIs: SP/SC: bulk (size: 8):           1.08              1.09
> legacy APIs: SP/SC: bulk (size: 32):          1.89              1.90
> legacy APIs: MP/MC: bulk (size: 8):           1.85              1.98
> legacy APIs: MP/MC: bulk (size: 32):          2.65              2.69
> 
> x86:
> HW: dell, CPU Intel(R) Xeon(R) Gold 6240, 2 sockets, 18 cores/socket,
> 1 thread/core, 3.3GHz
> OS: Ubuntu 20.04 LTS, Kernel: 5.4.0-37-generic
> DPDK: 20.05-rc3, Configuration: x86_64-native-linuxapp-gcc
> gcc: 9.3.0
> 
> $sudo ./x86_64-native-linuxapp-gcc/app/test -l 14,16
> RTE>>ring_perf_autotest
> 
> test results on x86 in the case of esize 4:
> 
>                                      without this patch   with this patch
> Testing burst enq/deq
> legacy APIs: SP/SC: burst (size: 8):          29.35             27.78
> legacy APIs: SP/SC: burst (size: 32):         73.11             73.39
> legacy APIs: MP/MC: burst (size: 8):          62.36             62.37
> legacy APIs: MP/MC: burst (size: 32):         101.01            101.03
> Testing bulk enq/deq
> legacy APIs: SP/SC: bulk (size: 8):           25.94             29.55
> legacy APIs: SP/SC: bulk (size: 32):          70.00             78.87
> legacy APIs: MP/MC: bulk (size: 8):           63.41             62.48
> legacy APIs: MP/MC: bulk (size: 32):          105.86            103.84
> 
> Summary:
> In aarch64 server with this patch, there is almost no performance difference.
> In x86 server with this patch, in some cases, the performance slightly
> improve, in other cases, the performance slightly drop.
> 
> Signed-off-by: Feifei Wang <feifei.wang2@arm.com>
> Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
> ---
>  lib/librte_ring/rte_ring.h | 284 ++++---------------------------------
>  1 file changed, 30 insertions(+), 254 deletions(-)
> 
> diff --git a/lib/librte_ring/rte_ring.h b/lib/librte_ring/rte_ring.h index
> 35f3f8c42..2a2190bfc 100644
> --- a/lib/librte_ring/rte_ring.h
> +++ b/lib/librte_ring/rte_ring.h
> @@ -191,168 +191,6 @@ void rte_ring_free(struct rte_ring *r);
>   */
>  void rte_ring_dump(FILE *f, const struct rte_ring *r);
> 
> -/* the actual enqueue of pointers on the ring.
> - * Placed here since identical code needed in both
> - * single and multi producer enqueue functions */ -#define
> ENQUEUE_PTRS(r, ring_start, prod_head, obj_table, n, obj_type) do { \
> -	unsigned int i; \
> -	const uint32_t size = (r)->size; \
> -	uint32_t idx = prod_head & (r)->mask; \
> -	obj_type *ring = (obj_type *)ring_start; \
> -	if (likely(idx + n < size)) { \
> -		for (i = 0; i < (n & ~0x3); i += 4, idx += 4) { \
> -			ring[idx] = obj_table[i]; \
> -			ring[idx + 1] = obj_table[i + 1]; \
> -			ring[idx + 2] = obj_table[i + 2]; \
> -			ring[idx + 3] = obj_table[i + 3]; \
> -		} \
> -		switch (n & 0x3) { \
> -		case 3: \
> -			ring[idx++] = obj_table[i++]; /* fallthrough */ \
> -		case 2: \
> -			ring[idx++] = obj_table[i++]; /* fallthrough */ \
> -		case 1: \
> -			ring[idx++] = obj_table[i++]; \
> -		} \
> -	} else { \
> -		for (i = 0; idx < size; i++, idx++)\
> -			ring[idx] = obj_table[i]; \
> -		for (idx = 0; i < n; i++, idx++) \
> -			ring[idx] = obj_table[i]; \
> -	} \
> -} while (0)
> -
> -/* the actual copy of pointers on the ring to obj_table.
> - * Placed here since identical code needed in both
> - * single and multi consumer dequeue functions */ -#define
> DEQUEUE_PTRS(r, ring_start, cons_head, obj_table, n, obj_type) do { \
> -	unsigned int i; \
> -	uint32_t idx = cons_head & (r)->mask; \
> -	const uint32_t size = (r)->size; \
> -	obj_type *ring = (obj_type *)ring_start; \
> -	if (likely(idx + n < size)) { \
> -		for (i = 0; i < (n & ~0x3); i += 4, idx += 4) {\
> -			obj_table[i] = ring[idx]; \
> -			obj_table[i + 1] = ring[idx + 1]; \
> -			obj_table[i + 2] = ring[idx + 2]; \
> -			obj_table[i + 3] = ring[idx + 3]; \
> -		} \
> -		switch (n & 0x3) { \
> -		case 3: \
> -			obj_table[i++] = ring[idx++]; /* fallthrough */ \
> -		case 2: \
> -			obj_table[i++] = ring[idx++]; /* fallthrough */ \
> -		case 1: \
> -			obj_table[i++] = ring[idx++]; \
> -		} \
> -	} else { \
> -		for (i = 0; idx < size; i++, idx++) \
> -			obj_table[i] = ring[idx]; \
> -		for (idx = 0; i < n; i++, idx++) \
> -			obj_table[i] = ring[idx]; \
> -	} \
> -} while (0)
> -
> -/* Between load and load. there might be cpu reorder in weak model
> - * (powerpc/arm).
> - * There are 2 choices for the users
> - * 1.use rmb() memory barrier
> - * 2.use one-direction load_acquire/store_release barrier,defined by
> - * CONFIG_RTE_USE_C11_MEM_MODEL=y
> - * It depends on performance test results.
> - * By default, move common functions to rte_ring_generic.h
> - */
> -#ifdef RTE_USE_C11_MEM_MODEL
> -#include "rte_ring_c11_mem.h"
> -#else
> -#include "rte_ring_generic.h"
> -#endif
> -
> -/**
> - * @internal Enqueue several objects on the ring
> - *
> -  * @param r
> - *   A pointer to the ring structure.
> - * @param obj_table
> - *   A pointer to a table of void * pointers (objects).
> - * @param n
> - *   The number of objects to add in the ring from the obj_table.
> - * @param behavior
> - *   RTE_RING_QUEUE_FIXED:    Enqueue a fixed number of items from a
> ring
> - *   RTE_RING_QUEUE_VARIABLE: Enqueue as many items as possible from
> ring
> - * @param is_sp
> - *   Indicates whether to use single producer or multi-producer head update
> - * @param free_space
> - *   returns the amount of space after the enqueue operation has finished
> - * @return
> - *   Actual number of objects enqueued.
> - *   If behavior == RTE_RING_QUEUE_FIXED, this will be 0 or n only.
> - */
> -static __rte_always_inline unsigned int -__rte_ring_do_enqueue(struct
> rte_ring *r, void * const *obj_table,
> -		 unsigned int n, enum rte_ring_queue_behavior behavior,
> -		 unsigned int is_sp, unsigned int *free_space)
> -{
> -	uint32_t prod_head, prod_next;
> -	uint32_t free_entries;
> -
> -	n = __rte_ring_move_prod_head(r, is_sp, n, behavior,
> -			&prod_head, &prod_next, &free_entries);
> -	if (n == 0)
> -		goto end;
> -
> -	ENQUEUE_PTRS(r, &r[1], prod_head, obj_table, n, void *);
> -
> -	update_tail(&r->prod, prod_head, prod_next, is_sp, 1);
> -end:
> -	if (free_space != NULL)
> -		*free_space = free_entries - n;
> -	return n;
> -}
> -
> -/**
> - * @internal Dequeue several objects from the ring
> - *
> - * @param r
> - *   A pointer to the ring structure.
> - * @param obj_table
> - *   A pointer to a table of void * pointers (objects).
> - * @param n
> - *   The number of objects to pull from the ring.
> - * @param behavior
> - *   RTE_RING_QUEUE_FIXED:    Dequeue a fixed number of items from a
> ring
> - *   RTE_RING_QUEUE_VARIABLE: Dequeue as many items as possible from
> ring
> - * @param is_sc
> - *   Indicates whether to use single consumer or multi-consumer head
> update
> - * @param available
> - *   returns the number of remaining ring entries after the dequeue has
> finished
> - * @return
> - *   - Actual number of objects dequeued.
> - *     If behavior == RTE_RING_QUEUE_FIXED, this will be 0 or n only.
> - */
> -static __rte_always_inline unsigned int -__rte_ring_do_dequeue(struct
> rte_ring *r, void **obj_table,
> -		 unsigned int n, enum rte_ring_queue_behavior behavior,
> -		 unsigned int is_sc, unsigned int *available)
> -{
> -	uint32_t cons_head, cons_next;
> -	uint32_t entries;
> -
> -	n = __rte_ring_move_cons_head(r, (int)is_sc, n, behavior,
> -			&cons_head, &cons_next, &entries);
> -	if (n == 0)
> -		goto end;
> -
> -	DEQUEUE_PTRS(r, &r[1], cons_head, obj_table, n, void *);
> -
> -	update_tail(&r->cons, cons_head, cons_next, is_sc, 0);
> -
> -end:
> -	if (available != NULL)
> -		*available = entries - n;
> -	return n;
> -}
> -
>  /**
>   * Enqueue several objects on the ring (multi-producers safe).
>   *
> @@ -375,8 +213,8 @@ static __rte_always_inline unsigned int
> rte_ring_mp_enqueue_bulk(struct rte_ring *r, void * const *obj_table,
>  			 unsigned int n, unsigned int *free_space)  {
> -	return __rte_ring_do_enqueue(r, obj_table, n,
> RTE_RING_QUEUE_FIXED,
> -			RTE_RING_SYNC_MT, free_space);
> +	return rte_ring_mp_enqueue_bulk_elem(r, obj_table, sizeof(void *),
> +			n, free_space);
>  }
> 
>  /**
> @@ -398,8 +236,8 @@ static __rte_always_inline unsigned int
> rte_ring_sp_enqueue_bulk(struct rte_ring *r, void * const *obj_table,
>  			 unsigned int n, unsigned int *free_space)  {
> -	return __rte_ring_do_enqueue(r, obj_table, n,
> RTE_RING_QUEUE_FIXED,
> -			RTE_RING_SYNC_ST, free_space);
> +	return rte_ring_sp_enqueue_bulk_elem(r, obj_table, sizeof(void *),
> +			n, free_space);
>  }
> 
>  /**
> @@ -425,24 +263,8 @@ static __rte_always_inline unsigned int
> rte_ring_enqueue_bulk(struct rte_ring *r, void * const *obj_table,
>  		      unsigned int n, unsigned int *free_space)  {
> -	switch (r->prod.sync_type) {
> -	case RTE_RING_SYNC_MT:
> -		return rte_ring_mp_enqueue_bulk(r, obj_table, n,
> free_space);
> -	case RTE_RING_SYNC_ST:
> -		return rte_ring_sp_enqueue_bulk(r, obj_table, n,
> free_space);
> -#ifdef ALLOW_EXPERIMENTAL_API
> -	case RTE_RING_SYNC_MT_RTS:
> -		return rte_ring_mp_rts_enqueue_bulk(r, obj_table, n,
> -			free_space);
> -	case RTE_RING_SYNC_MT_HTS:
> -		return rte_ring_mp_hts_enqueue_bulk(r, obj_table, n,
> -			free_space);
> -#endif
> -	}
> -
> -	/* valid ring should never reach this point */
> -	RTE_ASSERT(0);
> -	return 0;
> +	return rte_ring_enqueue_bulk_elem(r, obj_table, sizeof(void *),
> +			n, free_space);
>  }
> 
>  /**
> @@ -462,7 +284,7 @@ rte_ring_enqueue_bulk(struct rte_ring *r, void *
> const *obj_table,  static __rte_always_inline int
> rte_ring_mp_enqueue(struct rte_ring *r, void *obj)  {
> -	return rte_ring_mp_enqueue_bulk(r, &obj, 1, NULL) ? 0 : -ENOBUFS;
> +	return rte_ring_mp_enqueue_elem(r, obj, sizeof(void *));
>  }
> 
>  /**
> @@ -479,7 +301,7 @@ rte_ring_mp_enqueue(struct rte_ring *r, void *obj)
> static __rte_always_inline int  rte_ring_sp_enqueue(struct rte_ring *r, void
> *obj)  {
> -	return rte_ring_sp_enqueue_bulk(r, &obj, 1, NULL) ? 0 : -ENOBUFS;
> +	return rte_ring_sp_enqueue_elem(r, obj, sizeof(void *));
>  }
> 
>  /**
> @@ -500,7 +322,7 @@ rte_ring_sp_enqueue(struct rte_ring *r, void *obj)
> static __rte_always_inline int  rte_ring_enqueue(struct rte_ring *r, void *obj)
> {
> -	return rte_ring_enqueue_bulk(r, &obj, 1, NULL) ? 0 : -ENOBUFS;
> +	return rte_ring_enqueue_elem(r, obj, sizeof(void *));
>  }
> 
>  /**
> @@ -525,8 +347,8 @@ static __rte_always_inline unsigned int
> rte_ring_mc_dequeue_bulk(struct rte_ring *r, void **obj_table,
>  		unsigned int n, unsigned int *available)  {
> -	return __rte_ring_do_dequeue(r, obj_table, n,
> RTE_RING_QUEUE_FIXED,
> -			RTE_RING_SYNC_MT, available);
> +	return rte_ring_mc_dequeue_bulk_elem(r, obj_table, sizeof(void *),
> +			n, available);
>  }
> 
>  /**
> @@ -549,8 +371,8 @@ static __rte_always_inline unsigned int
> rte_ring_sc_dequeue_bulk(struct rte_ring *r, void **obj_table,
>  		unsigned int n, unsigned int *available)  {
> -	return __rte_ring_do_dequeue(r, obj_table, n,
> RTE_RING_QUEUE_FIXED,
> -			RTE_RING_SYNC_ST, available);
> +	return rte_ring_sc_dequeue_bulk_elem(r, obj_table, sizeof(void *),
> +			n, available);
>  }
> 
>  /**
> @@ -576,22 +398,8 @@ static __rte_always_inline unsigned int
> rte_ring_dequeue_bulk(struct rte_ring *r, void **obj_table, unsigned int n,
>  		unsigned int *available)
>  {
> -	switch (r->cons.sync_type) {
> -	case RTE_RING_SYNC_MT:
> -		return rte_ring_mc_dequeue_bulk(r, obj_table, n, available);
> -	case RTE_RING_SYNC_ST:
> -		return rte_ring_sc_dequeue_bulk(r, obj_table, n, available);
> -#ifdef ALLOW_EXPERIMENTAL_API
> -	case RTE_RING_SYNC_MT_RTS:
> -		return rte_ring_mc_rts_dequeue_bulk(r, obj_table, n,
> available);
> -	case RTE_RING_SYNC_MT_HTS:
> -		return rte_ring_mc_hts_dequeue_bulk(r, obj_table, n,
> available);
> -#endif
> -	}
> -
> -	/* valid ring should never reach this point */
> -	RTE_ASSERT(0);
> -	return 0;
> +	return rte_ring_dequeue_bulk_elem(r, obj_table, sizeof(void *),
> +			n, available);
>  }
> 
>  /**
> @@ -612,7 +420,7 @@ rte_ring_dequeue_bulk(struct rte_ring *r, void
> **obj_table, unsigned int n,  static __rte_always_inline int
> rte_ring_mc_dequeue(struct rte_ring *r, void **obj_p)  {
> -	return rte_ring_mc_dequeue_bulk(r, obj_p, 1, NULL)  ? 0 : -ENOENT;
> +	return rte_ring_mc_dequeue_elem(r, obj_p, sizeof(void *));
>  }
> 
>  /**
> @@ -630,7 +438,7 @@ rte_ring_mc_dequeue(struct rte_ring *r, void
> **obj_p)  static __rte_always_inline int  rte_ring_sc_dequeue(struct
> rte_ring *r, void **obj_p)  {
> -	return rte_ring_sc_dequeue_bulk(r, obj_p, 1, NULL) ? 0 : -ENOENT;
> +	return rte_ring_sc_dequeue_elem(r, obj_p, sizeof(void *));
>  }
> 
>  /**
> @@ -652,7 +460,7 @@ rte_ring_sc_dequeue(struct rte_ring *r, void **obj_p)
> static __rte_always_inline int  rte_ring_dequeue(struct rte_ring *r, void
> **obj_p)  {
> -	return rte_ring_dequeue_bulk(r, obj_p, 1, NULL) ? 0 : -ENOENT;
> +	return rte_ring_dequeue_elem(r, obj_p, sizeof(void *));
>  }
> 
>  /**
> @@ -860,8 +668,8 @@ static __rte_always_inline unsigned int
> rte_ring_mp_enqueue_burst(struct rte_ring *r, void * const *obj_table,
>  			 unsigned int n, unsigned int *free_space)  {
> -	return __rte_ring_do_enqueue(r, obj_table, n,
> -			RTE_RING_QUEUE_VARIABLE, RTE_RING_SYNC_MT,
> free_space);
> +	return rte_ring_mp_enqueue_burst_elem(r, obj_table, sizeof(void
> *),
> +			n, free_space);
>  }
> 
>  /**
> @@ -883,8 +691,8 @@ static __rte_always_inline unsigned int
> rte_ring_sp_enqueue_burst(struct rte_ring *r, void * const *obj_table,
>  			 unsigned int n, unsigned int *free_space)  {
> -	return __rte_ring_do_enqueue(r, obj_table, n,
> -			RTE_RING_QUEUE_VARIABLE, RTE_RING_SYNC_ST,
> free_space);
> +	return rte_ring_sp_enqueue_burst_elem(r, obj_table, sizeof(void *),
> +			n, free_space);
>  }
> 
>  /**
> @@ -910,24 +718,8 @@ static __rte_always_inline unsigned int
> rte_ring_enqueue_burst(struct rte_ring *r, void * const *obj_table,
>  		      unsigned int n, unsigned int *free_space)  {
> -	switch (r->prod.sync_type) {
> -	case RTE_RING_SYNC_MT:
> -		return rte_ring_mp_enqueue_burst(r, obj_table, n,
> free_space);
> -	case RTE_RING_SYNC_ST:
> -		return rte_ring_sp_enqueue_burst(r, obj_table, n,
> free_space);
> -#ifdef ALLOW_EXPERIMENTAL_API
> -	case RTE_RING_SYNC_MT_RTS:
> -		return rte_ring_mp_rts_enqueue_burst(r, obj_table, n,
> -			free_space);
> -	case RTE_RING_SYNC_MT_HTS:
> -		return rte_ring_mp_hts_enqueue_burst(r, obj_table, n,
> -			free_space);
> -#endif
> -	}
> -
> -	/* valid ring should never reach this point */
> -	RTE_ASSERT(0);
> -	return 0;
> +	return rte_ring_enqueue_burst_elem(r, obj_table, sizeof(void *),
> +			n, free_space);
>  }
> 
>  /**
> @@ -954,8 +746,8 @@ static __rte_always_inline unsigned int
> rte_ring_mc_dequeue_burst(struct rte_ring *r, void **obj_table,
>  		unsigned int n, unsigned int *available)  {
> -	return __rte_ring_do_dequeue(r, obj_table, n,
> -			RTE_RING_QUEUE_VARIABLE, RTE_RING_SYNC_MT,
> available);
> +	return rte_ring_mc_dequeue_burst_elem(r, obj_table, sizeof(void
> *),
> +			n, available);
>  }
> 
>  /**
> @@ -979,8 +771,8 @@ static __rte_always_inline unsigned int
> rte_ring_sc_dequeue_burst(struct rte_ring *r, void **obj_table,
>  		unsigned int n, unsigned int *available)  {
> -	return __rte_ring_do_dequeue(r, obj_table, n,
> -			RTE_RING_QUEUE_VARIABLE, RTE_RING_SYNC_ST,
> available);
> +	return rte_ring_sc_dequeue_burst_elem(r, obj_table, sizeof(void *),
> +			n, available);
>  }
> 
>  /**
> @@ -1006,24 +798,8 @@ static __rte_always_inline unsigned int
> rte_ring_dequeue_burst(struct rte_ring *r, void **obj_table,
>  		unsigned int n, unsigned int *available)  {
> -	switch (r->cons.sync_type) {
> -	case RTE_RING_SYNC_MT:
> -		return rte_ring_mc_dequeue_burst(r, obj_table, n,
> available);
> -	case RTE_RING_SYNC_ST:
> -		return rte_ring_sc_dequeue_burst(r, obj_table, n, available);
> -#ifdef ALLOW_EXPERIMENTAL_API
> -	case RTE_RING_SYNC_MT_RTS:
> -		return rte_ring_mc_rts_dequeue_burst(r, obj_table, n,
> -			available);
> -	case RTE_RING_SYNC_MT_HTS:
> -		return rte_ring_mc_hts_dequeue_burst(r, obj_table, n,
> -			available);
> -#endif
> -	}
> -
> -	/* valid ring should never reach this point */
> -	RTE_ASSERT(0);
> -	return 0;
> +	return rte_ring_dequeue_burst_elem(r, obj_table, sizeof(void *),
> +			n, available);
>  }
> 
>  #ifdef __cplusplus
> --
> 2.17.1


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [dpdk-dev] [PATCH 1/3] ring: remove experimental tag for ring reset API
  2020-07-07  3:19         ` Feifei Wang
@ 2020-07-07  7:40           ` Kinsella, Ray
  0 siblings, 0 replies; 23+ messages in thread
From: Kinsella, Ray @ 2020-07-07  7:40 UTC (permalink / raw)
  To: Feifei Wang, Honnappa Nagarahalli, Konstantin Ananyev, Neil Horman
  Cc: dev, nd



On 07/07/2020 04:19, Feifei Wang wrote:
> 
> 
>> -----Original Message-----
>> From: Kinsella, Ray <mdr@ashroe.eu>
>> Sent: 2020年7月6日 14:23
>> To: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>; Feifei Wang
>> <Feifei.Wang2@arm.com>; Konstantin Ananyev
>> <konstantin.ananyev@intel.com>; Neil Horman <nhorman@tuxdriver.com>
>> Cc: dev@dpdk.org; nd <nd@arm.com>
>> Subject: Re: [PATCH 1/3] ring: remove experimental tag for ring reset API
>>
>>
>>
>> On 03/07/2020 19:46, Honnappa Nagarahalli wrote:
>>> <snip>
>>>
>>>>
>>>> On 03/07/2020 11:26, Feifei Wang wrote:
>>>>> Remove the experimental tag for rte_ring_reset API that have been
>>>>> around for 4 releases.
>>>>>
>>>>> Signed-off-by: Feifei Wang <feifei.wang2@arm.com>
>>>>> Reviewed-by: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>
>>>>> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
>>>>> ---
>>>>>  lib/librte_ring/rte_ring.h           | 3 ---
>>>>>  lib/librte_ring/rte_ring_version.map | 4 +---
>>>>>  2 files changed, 1 insertion(+), 6 deletions(-)
>>>>>
>>>>> diff --git a/lib/librte_ring/rte_ring.h b/lib/librte_ring/rte_ring.h
>>>>> index f67141482..7181c33b4 100644
>>>>> --- a/lib/librte_ring/rte_ring.h
>>>>> +++ b/lib/librte_ring/rte_ring.h
>>>>> @@ -663,15 +663,12 @@ rte_ring_dequeue(struct rte_ring *r, void
>> **obj_p)
>>>>>   *
>>>>>   * This function flush all the elements in a ring
>>>>>   *
>>>>> - * @b EXPERIMENTAL: this API may change without prior notice
>>>>> - *
>>>>>   * @warning
>>>>>   * Make sure the ring is not in use while calling this function.
>>>>>   *
>>>>>   * @param r
>>>>>   *   A pointer to the ring structure.
>>>>>   */
>>>>> -__rte_experimental
>>>>>  void
>>>>>  rte_ring_reset(struct rte_ring *r);
>>>>>
>>>>> diff --git a/lib/librte_ring/rte_ring_version.map
>>>>> b/lib/librte_ring/rte_ring_version.map
>>>>> index e88c143cf..aec6f3820 100644
>>>>> --- a/lib/librte_ring/rte_ring_version.map
>>>>> +++ b/lib/librte_ring/rte_ring_version.map
>>>>> @@ -8,6 +8,7 @@ DPDK_20.0 {
>>>>>  	rte_ring_init;
>>>>>  	rte_ring_list_dump;
>>>>>  	rte_ring_lookup;
>>>>> +	rte_ring_reset;
>>>>>
>>>>>  	local: *;
>>>>>  };
>>>>> @@ -15,9 +16,6 @@ DPDK_20.0 {
>>>>>  EXPERIMENTAL {
>>>>>  	global:
>>>>>
>>>>> -	# added in 19.08
>>>>> -	rte_ring_reset;
>>>>> -
>>>>>  	# added in 20.02
>>>>>  	rte_ring_create_elem;
>>>>>  	rte_ring_get_memsize_elem;
>>>>
>>>> So strictly speaking, rte_ring_reset is part of the DPDK_21 ABI, not
>>>> the v20.0 ABI.
>>> Thanks Ray for clarifying this.
>>>
> Thanks very much for pointing this.
>>>>
>>>> The way to solve is to add it the DPDK_21 ABI in the map file.
>>>> And then use the VERSION_SYMBOL_EXPERIMENTAL to alias to
>> experimental
>>>> if necessary.
>>> Is using VERSION_SYMBOL_EXPERIMENTAL a must?
>>
>> Purely at the discretion of the contributor and maintainer.
>> If it has been around for a while, applications are using it and changing the
>> symbol will break them.
>>
>> You may choose to provide the alias or not.
> Ok, in the new patch version, I will add it into the DPDK_21 ABI but the 
> VERSION_SYMBOL_EXPERIMENTAL will not be added, because if it is added in this
> version, it is still needed to be removed in the near future.
> 
> Thanks very much for your review.

That is 100%

>>
>>> The documentation also seems to be vague. It says " The macro is used
>> when a symbol matures to become part of the stable ABI, to provide an alias
>> to experimental for some time". What does 'some time' mean?
>>
>> "Some time" is a bit vague alright, should be "until the next major ABI
>> version" - I will fix.
>>
>>>
>>>>
>>>> https://doc.dpdk.org/guides/contributing/abi_versioning.html#versioni
>>>> ng-
>>>> macros

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [dpdk-dev] [PATCH 3/3] ring: use element APIs to implement legacy APIs
  2020-07-07  5:19   ` Feifei Wang
@ 2020-07-07 14:04     ` Ananyev, Konstantin
  2020-07-07 16:21       ` Honnappa Nagarahalli
                         ` (2 more replies)
  0 siblings, 3 replies; 23+ messages in thread
From: Ananyev, Konstantin @ 2020-07-07 14:04 UTC (permalink / raw)
  To: Feifei Wang, Honnappa Nagarahalli, drc; +Cc: dev, nd, Ruifeng Wang, nd


Hi Feifei,

 > Hi, Konstantin, David
> 
> I'm Feifei Wang from Arm. Sorry to make the following request:
> Would you please do some ring performance tests of this patch in your platforms at the time you are free?
> And I want to know whether this patch has a significant impact on other platforms except ARM.

I run few tests on SKX box and so far didn’t notice any real perf difference.
Konstantin
 
> Thanks very much.
> Feifei
> 
> > -----Original Message-----
> > From: Feifei Wang <feifei.wang2@arm.com>
> > Sent: 2020年7月3日 18:27
> > To: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>; Konstantin
> > Ananyev <konstantin.ananyev@intel.com>
> > Cc: dev@dpdk.org; nd <nd@arm.com>; Feifei Wang
> > <Feifei.Wang2@arm.com>
> > Subject: [PATCH 3/3] ring: use element APIs to implement legacy APIs
> >
> > Use rte_ring_xxx_elem_xxx APIs to replace legacy API implementation.
> > This reduces code duplication and improves code maintenance.
> >
> > aarch64:
> > HW:N1sdp, 1 socket, 4 cores, 1 thread/core, 2.6GHz OS:Ubuntu 18.04.1 LTS,
> > Kernel: 5.4.0+
> > DPDK: 20.05-rc3, Configuration: arm64-n1sdp-linux-gcc
> > gcc:9.2.1
> >
> > $sudo ./arm64-n1sdp-linux-gcc/app/test -l 1-2
> > RTE>>ring_perf_autotest
> >
> > test results on aarch64 in the case of esize 4:
> >
> >                                      without this patch   with this patch
> > Testing burst enq/deq
> > legacy APIs: SP/SC: burst (size: 8):          1.11              1.10
> > legacy APIs: SP/SC: burst (size: 32):         1.95              1.97
> > legacy APIs: MP/MC: burst (size: 8):          1.86              1.94
> > legacy APIs: MP/MC: burst (size: 32):         2.65              2.69
> > Testing bulk enq/deq
> > legacy APIs: SP/SC: bulk (size: 8):           1.08              1.09
> > legacy APIs: SP/SC: bulk (size: 32):          1.89              1.90
> > legacy APIs: MP/MC: bulk (size: 8):           1.85              1.98
> > legacy APIs: MP/MC: bulk (size: 32):          2.65              2.69
> >
> > x86:
> > HW: dell, CPU Intel(R) Xeon(R) Gold 6240, 2 sockets, 18 cores/socket,
> > 1 thread/core, 3.3GHz
> > OS: Ubuntu 20.04 LTS, Kernel: 5.4.0-37-generic
> > DPDK: 20.05-rc3, Configuration: x86_64-native-linuxapp-gcc
> > gcc: 9.3.0
> >
> > $sudo ./x86_64-native-linuxapp-gcc/app/test -l 14,16
> > RTE>>ring_perf_autotest
> >
> > test results on x86 in the case of esize 4:
> >
> >                                      without this patch   with this patch
> > Testing burst enq/deq
> > legacy APIs: SP/SC: burst (size: 8):          29.35             27.78
> > legacy APIs: SP/SC: burst (size: 32):         73.11             73.39
> > legacy APIs: MP/MC: burst (size: 8):          62.36             62.37
> > legacy APIs: MP/MC: burst (size: 32):         101.01            101.03
> > Testing bulk enq/deq
> > legacy APIs: SP/SC: bulk (size: 8):           25.94             29.55
> > legacy APIs: SP/SC: bulk (size: 32):          70.00             78.87
> > legacy APIs: MP/MC: bulk (size: 8):           63.41             62.48
> > legacy APIs: MP/MC: bulk (size: 32):          105.86            103.84
> >
> > Summary:
> > In aarch64 server with this patch, there is almost no performance difference.
> > In x86 server with this patch, in some cases, the performance slightly
> > improve, in other cases, the performance slightly drop.
> >
> > Signed-off-by: Feifei Wang <feifei.wang2@arm.com>
> > Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
> > Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
> > ---
> >  lib/librte_ring/rte_ring.h | 284 ++++---------------------------------
> >  1 file changed, 30 insertions(+), 254 deletions(-)
> >
> > diff --git a/lib/librte_ring/rte_ring.h b/lib/librte_ring/rte_ring.h index
> > 35f3f8c42..2a2190bfc 100644
> > --- a/lib/librte_ring/rte_ring.h
> > +++ b/lib/librte_ring/rte_ring.h
> > @@ -191,168 +191,6 @@ void rte_ring_free(struct rte_ring *r);
> >   */
> >  void rte_ring_dump(FILE *f, const struct rte_ring *r);
> >
> > -/* the actual enqueue of pointers on the ring.
> > - * Placed here since identical code needed in both
> > - * single and multi producer enqueue functions */ -#define
> > ENQUEUE_PTRS(r, ring_start, prod_head, obj_table, n, obj_type) do { \
> > -	unsigned int i; \
> > -	const uint32_t size = (r)->size; \
> > -	uint32_t idx = prod_head & (r)->mask; \
> > -	obj_type *ring = (obj_type *)ring_start; \
> > -	if (likely(idx + n < size)) { \
> > -		for (i = 0; i < (n & ~0x3); i += 4, idx += 4) { \
> > -			ring[idx] = obj_table[i]; \
> > -			ring[idx + 1] = obj_table[i + 1]; \
> > -			ring[idx + 2] = obj_table[i + 2]; \
> > -			ring[idx + 3] = obj_table[i + 3]; \
> > -		} \
> > -		switch (n & 0x3) { \
> > -		case 3: \
> > -			ring[idx++] = obj_table[i++]; /* fallthrough */ \
> > -		case 2: \
> > -			ring[idx++] = obj_table[i++]; /* fallthrough */ \
> > -		case 1: \
> > -			ring[idx++] = obj_table[i++]; \
> > -		} \
> > -	} else { \
> > -		for (i = 0; idx < size; i++, idx++)\
> > -			ring[idx] = obj_table[i]; \
> > -		for (idx = 0; i < n; i++, idx++) \
> > -			ring[idx] = obj_table[i]; \
> > -	} \
> > -} while (0)
> > -
> > -/* the actual copy of pointers on the ring to obj_table.
> > - * Placed here since identical code needed in both
> > - * single and multi consumer dequeue functions */ -#define
> > DEQUEUE_PTRS(r, ring_start, cons_head, obj_table, n, obj_type) do { \
> > -	unsigned int i; \
> > -	uint32_t idx = cons_head & (r)->mask; \
> > -	const uint32_t size = (r)->size; \
> > -	obj_type *ring = (obj_type *)ring_start; \
> > -	if (likely(idx + n < size)) { \
> > -		for (i = 0; i < (n & ~0x3); i += 4, idx += 4) {\
> > -			obj_table[i] = ring[idx]; \
> > -			obj_table[i + 1] = ring[idx + 1]; \
> > -			obj_table[i + 2] = ring[idx + 2]; \
> > -			obj_table[i + 3] = ring[idx + 3]; \
> > -		} \
> > -		switch (n & 0x3) { \
> > -		case 3: \
> > -			obj_table[i++] = ring[idx++]; /* fallthrough */ \
> > -		case 2: \
> > -			obj_table[i++] = ring[idx++]; /* fallthrough */ \
> > -		case 1: \
> > -			obj_table[i++] = ring[idx++]; \
> > -		} \
> > -	} else { \
> > -		for (i = 0; idx < size; i++, idx++) \
> > -			obj_table[i] = ring[idx]; \
> > -		for (idx = 0; i < n; i++, idx++) \
> > -			obj_table[i] = ring[idx]; \
> > -	} \
> > -} while (0)
> > -
> > -/* Between load and load. there might be cpu reorder in weak model
> > - * (powerpc/arm).
> > - * There are 2 choices for the users
> > - * 1.use rmb() memory barrier
> > - * 2.use one-direction load_acquire/store_release barrier,defined by
> > - * CONFIG_RTE_USE_C11_MEM_MODEL=y
> > - * It depends on performance test results.
> > - * By default, move common functions to rte_ring_generic.h
> > - */
> > -#ifdef RTE_USE_C11_MEM_MODEL
> > -#include "rte_ring_c11_mem.h"
> > -#else
> > -#include "rte_ring_generic.h"
> > -#endif
> > -
> > -/**
> > - * @internal Enqueue several objects on the ring
> > - *
> > -  * @param r
> > - *   A pointer to the ring structure.
> > - * @param obj_table
> > - *   A pointer to a table of void * pointers (objects).
> > - * @param n
> > - *   The number of objects to add in the ring from the obj_table.
> > - * @param behavior
> > - *   RTE_RING_QUEUE_FIXED:    Enqueue a fixed number of items from a
> > ring
> > - *   RTE_RING_QUEUE_VARIABLE: Enqueue as many items as possible from
> > ring
> > - * @param is_sp
> > - *   Indicates whether to use single producer or multi-producer head update
> > - * @param free_space
> > - *   returns the amount of space after the enqueue operation has finished
> > - * @return
> > - *   Actual number of objects enqueued.
> > - *   If behavior == RTE_RING_QUEUE_FIXED, this will be 0 or n only.
> > - */
> > -static __rte_always_inline unsigned int -__rte_ring_do_enqueue(struct
> > rte_ring *r, void * const *obj_table,
> > -		 unsigned int n, enum rte_ring_queue_behavior behavior,
> > -		 unsigned int is_sp, unsigned int *free_space)
> > -{
> > -	uint32_t prod_head, prod_next;
> > -	uint32_t free_entries;
> > -
> > -	n = __rte_ring_move_prod_head(r, is_sp, n, behavior,
> > -			&prod_head, &prod_next, &free_entries);
> > -	if (n == 0)
> > -		goto end;
> > -
> > -	ENQUEUE_PTRS(r, &r[1], prod_head, obj_table, n, void *);
> > -
> > -	update_tail(&r->prod, prod_head, prod_next, is_sp, 1);
> > -end:
> > -	if (free_space != NULL)
> > -		*free_space = free_entries - n;
> > -	return n;
> > -}
> > -
> > -/**
> > - * @internal Dequeue several objects from the ring
> > - *
> > - * @param r
> > - *   A pointer to the ring structure.
> > - * @param obj_table
> > - *   A pointer to a table of void * pointers (objects).
> > - * @param n
> > - *   The number of objects to pull from the ring.
> > - * @param behavior
> > - *   RTE_RING_QUEUE_FIXED:    Dequeue a fixed number of items from a
> > ring
> > - *   RTE_RING_QUEUE_VARIABLE: Dequeue as many items as possible from
> > ring
> > - * @param is_sc
> > - *   Indicates whether to use single consumer or multi-consumer head
> > update
> > - * @param available
> > - *   returns the number of remaining ring entries after the dequeue has
> > finished
> > - * @return
> > - *   - Actual number of objects dequeued.
> > - *     If behavior == RTE_RING_QUEUE_FIXED, this will be 0 or n only.
> > - */
> > -static __rte_always_inline unsigned int -__rte_ring_do_dequeue(struct
> > rte_ring *r, void **obj_table,
> > -		 unsigned int n, enum rte_ring_queue_behavior behavior,
> > -		 unsigned int is_sc, unsigned int *available)
> > -{
> > -	uint32_t cons_head, cons_next;
> > -	uint32_t entries;
> > -
> > -	n = __rte_ring_move_cons_head(r, (int)is_sc, n, behavior,
> > -			&cons_head, &cons_next, &entries);
> > -	if (n == 0)
> > -		goto end;
> > -
> > -	DEQUEUE_PTRS(r, &r[1], cons_head, obj_table, n, void *);
> > -
> > -	update_tail(&r->cons, cons_head, cons_next, is_sc, 0);
> > -
> > -end:
> > -	if (available != NULL)
> > -		*available = entries - n;
> > -	return n;
> > -}
> > -
> >  /**
> >   * Enqueue several objects on the ring (multi-producers safe).
> >   *
> > @@ -375,8 +213,8 @@ static __rte_always_inline unsigned int
> > rte_ring_mp_enqueue_bulk(struct rte_ring *r, void * const *obj_table,
> >  			 unsigned int n, unsigned int *free_space)  {
> > -	return __rte_ring_do_enqueue(r, obj_table, n,
> > RTE_RING_QUEUE_FIXED,
> > -			RTE_RING_SYNC_MT, free_space);
> > +	return rte_ring_mp_enqueue_bulk_elem(r, obj_table, sizeof(void *),
> > +			n, free_space);
> >  }
> >
> >  /**
> > @@ -398,8 +236,8 @@ static __rte_always_inline unsigned int
> > rte_ring_sp_enqueue_bulk(struct rte_ring *r, void * const *obj_table,
> >  			 unsigned int n, unsigned int *free_space)  {
> > -	return __rte_ring_do_enqueue(r, obj_table, n,
> > RTE_RING_QUEUE_FIXED,
> > -			RTE_RING_SYNC_ST, free_space);
> > +	return rte_ring_sp_enqueue_bulk_elem(r, obj_table, sizeof(void *),
> > +			n, free_space);
> >  }
> >
> >  /**
> > @@ -425,24 +263,8 @@ static __rte_always_inline unsigned int
> > rte_ring_enqueue_bulk(struct rte_ring *r, void * const *obj_table,
> >  		      unsigned int n, unsigned int *free_space)  {
> > -	switch (r->prod.sync_type) {
> > -	case RTE_RING_SYNC_MT:
> > -		return rte_ring_mp_enqueue_bulk(r, obj_table, n,
> > free_space);
> > -	case RTE_RING_SYNC_ST:
> > -		return rte_ring_sp_enqueue_bulk(r, obj_table, n,
> > free_space);
> > -#ifdef ALLOW_EXPERIMENTAL_API
> > -	case RTE_RING_SYNC_MT_RTS:
> > -		return rte_ring_mp_rts_enqueue_bulk(r, obj_table, n,
> > -			free_space);
> > -	case RTE_RING_SYNC_MT_HTS:
> > -		return rte_ring_mp_hts_enqueue_bulk(r, obj_table, n,
> > -			free_space);
> > -#endif
> > -	}
> > -
> > -	/* valid ring should never reach this point */
> > -	RTE_ASSERT(0);
> > -	return 0;
> > +	return rte_ring_enqueue_bulk_elem(r, obj_table, sizeof(void *),
> > +			n, free_space);
> >  }
> >
> >  /**
> > @@ -462,7 +284,7 @@ rte_ring_enqueue_bulk(struct rte_ring *r, void *
> > const *obj_table,  static __rte_always_inline int
> > rte_ring_mp_enqueue(struct rte_ring *r, void *obj)  {
> > -	return rte_ring_mp_enqueue_bulk(r, &obj, 1, NULL) ? 0 : -ENOBUFS;
> > +	return rte_ring_mp_enqueue_elem(r, obj, sizeof(void *));
> >  }
> >
> >  /**
> > @@ -479,7 +301,7 @@ rte_ring_mp_enqueue(struct rte_ring *r, void *obj)
> > static __rte_always_inline int  rte_ring_sp_enqueue(struct rte_ring *r, void
> > *obj)  {
> > -	return rte_ring_sp_enqueue_bulk(r, &obj, 1, NULL) ? 0 : -ENOBUFS;
> > +	return rte_ring_sp_enqueue_elem(r, obj, sizeof(void *));
> >  }
> >
> >  /**
> > @@ -500,7 +322,7 @@ rte_ring_sp_enqueue(struct rte_ring *r, void *obj)
> > static __rte_always_inline int  rte_ring_enqueue(struct rte_ring *r, void *obj)
> > {
> > -	return rte_ring_enqueue_bulk(r, &obj, 1, NULL) ? 0 : -ENOBUFS;
> > +	return rte_ring_enqueue_elem(r, obj, sizeof(void *));
> >  }
> >
> >  /**
> > @@ -525,8 +347,8 @@ static __rte_always_inline unsigned int
> > rte_ring_mc_dequeue_bulk(struct rte_ring *r, void **obj_table,
> >  		unsigned int n, unsigned int *available)  {
> > -	return __rte_ring_do_dequeue(r, obj_table, n,
> > RTE_RING_QUEUE_FIXED,
> > -			RTE_RING_SYNC_MT, available);
> > +	return rte_ring_mc_dequeue_bulk_elem(r, obj_table, sizeof(void *),
> > +			n, available);
> >  }
> >
> >  /**
> > @@ -549,8 +371,8 @@ static __rte_always_inline unsigned int
> > rte_ring_sc_dequeue_bulk(struct rte_ring *r, void **obj_table,
> >  		unsigned int n, unsigned int *available)  {
> > -	return __rte_ring_do_dequeue(r, obj_table, n,
> > RTE_RING_QUEUE_FIXED,
> > -			RTE_RING_SYNC_ST, available);
> > +	return rte_ring_sc_dequeue_bulk_elem(r, obj_table, sizeof(void *),
> > +			n, available);
> >  }
> >
> >  /**
> > @@ -576,22 +398,8 @@ static __rte_always_inline unsigned int
> > rte_ring_dequeue_bulk(struct rte_ring *r, void **obj_table, unsigned int n,
> >  		unsigned int *available)
> >  {
> > -	switch (r->cons.sync_type) {
> > -	case RTE_RING_SYNC_MT:
> > -		return rte_ring_mc_dequeue_bulk(r, obj_table, n, available);
> > -	case RTE_RING_SYNC_ST:
> > -		return rte_ring_sc_dequeue_bulk(r, obj_table, n, available);
> > -#ifdef ALLOW_EXPERIMENTAL_API
> > -	case RTE_RING_SYNC_MT_RTS:
> > -		return rte_ring_mc_rts_dequeue_bulk(r, obj_table, n,
> > available);
> > -	case RTE_RING_SYNC_MT_HTS:
> > -		return rte_ring_mc_hts_dequeue_bulk(r, obj_table, n,
> > available);
> > -#endif
> > -	}
> > -
> > -	/* valid ring should never reach this point */
> > -	RTE_ASSERT(0);
> > -	return 0;
> > +	return rte_ring_dequeue_bulk_elem(r, obj_table, sizeof(void *),
> > +			n, available);
> >  }
> >
> >  /**
> > @@ -612,7 +420,7 @@ rte_ring_dequeue_bulk(struct rte_ring *r, void
> > **obj_table, unsigned int n,  static __rte_always_inline int
> > rte_ring_mc_dequeue(struct rte_ring *r, void **obj_p)  {
> > -	return rte_ring_mc_dequeue_bulk(r, obj_p, 1, NULL)  ? 0 : -ENOENT;
> > +	return rte_ring_mc_dequeue_elem(r, obj_p, sizeof(void *));
> >  }
> >
> >  /**
> > @@ -630,7 +438,7 @@ rte_ring_mc_dequeue(struct rte_ring *r, void
> > **obj_p)  static __rte_always_inline int  rte_ring_sc_dequeue(struct
> > rte_ring *r, void **obj_p)  {
> > -	return rte_ring_sc_dequeue_bulk(r, obj_p, 1, NULL) ? 0 : -ENOENT;
> > +	return rte_ring_sc_dequeue_elem(r, obj_p, sizeof(void *));
> >  }
> >
> >  /**
> > @@ -652,7 +460,7 @@ rte_ring_sc_dequeue(struct rte_ring *r, void **obj_p)
> > static __rte_always_inline int  rte_ring_dequeue(struct rte_ring *r, void
> > **obj_p)  {
> > -	return rte_ring_dequeue_bulk(r, obj_p, 1, NULL) ? 0 : -ENOENT;
> > +	return rte_ring_dequeue_elem(r, obj_p, sizeof(void *));
> >  }
> >
> >  /**
> > @@ -860,8 +668,8 @@ static __rte_always_inline unsigned int
> > rte_ring_mp_enqueue_burst(struct rte_ring *r, void * const *obj_table,
> >  			 unsigned int n, unsigned int *free_space)  {
> > -	return __rte_ring_do_enqueue(r, obj_table, n,
> > -			RTE_RING_QUEUE_VARIABLE, RTE_RING_SYNC_MT,
> > free_space);
> > +	return rte_ring_mp_enqueue_burst_elem(r, obj_table, sizeof(void
> > *),
> > +			n, free_space);
> >  }
> >
> >  /**
> > @@ -883,8 +691,8 @@ static __rte_always_inline unsigned int
> > rte_ring_sp_enqueue_burst(struct rte_ring *r, void * const *obj_table,
> >  			 unsigned int n, unsigned int *free_space)  {
> > -	return __rte_ring_do_enqueue(r, obj_table, n,
> > -			RTE_RING_QUEUE_VARIABLE, RTE_RING_SYNC_ST,
> > free_space);
> > +	return rte_ring_sp_enqueue_burst_elem(r, obj_table, sizeof(void *),
> > +			n, free_space);
> >  }
> >
> >  /**
> > @@ -910,24 +718,8 @@ static __rte_always_inline unsigned int
> > rte_ring_enqueue_burst(struct rte_ring *r, void * const *obj_table,
> >  		      unsigned int n, unsigned int *free_space)  {
> > -	switch (r->prod.sync_type) {
> > -	case RTE_RING_SYNC_MT:
> > -		return rte_ring_mp_enqueue_burst(r, obj_table, n,
> > free_space);
> > -	case RTE_RING_SYNC_ST:
> > -		return rte_ring_sp_enqueue_burst(r, obj_table, n,
> > free_space);
> > -#ifdef ALLOW_EXPERIMENTAL_API
> > -	case RTE_RING_SYNC_MT_RTS:
> > -		return rte_ring_mp_rts_enqueue_burst(r, obj_table, n,
> > -			free_space);
> > -	case RTE_RING_SYNC_MT_HTS:
> > -		return rte_ring_mp_hts_enqueue_burst(r, obj_table, n,
> > -			free_space);
> > -#endif
> > -	}
> > -
> > -	/* valid ring should never reach this point */
> > -	RTE_ASSERT(0);
> > -	return 0;
> > +	return rte_ring_enqueue_burst_elem(r, obj_table, sizeof(void *),
> > +			n, free_space);
> >  }
> >
> >  /**
> > @@ -954,8 +746,8 @@ static __rte_always_inline unsigned int
> > rte_ring_mc_dequeue_burst(struct rte_ring *r, void **obj_table,
> >  		unsigned int n, unsigned int *available)  {
> > -	return __rte_ring_do_dequeue(r, obj_table, n,
> > -			RTE_RING_QUEUE_VARIABLE, RTE_RING_SYNC_MT,
> > available);
> > +	return rte_ring_mc_dequeue_burst_elem(r, obj_table, sizeof(void
> > *),
> > +			n, available);
> >  }
> >
> >  /**
> > @@ -979,8 +771,8 @@ static __rte_always_inline unsigned int
> > rte_ring_sc_dequeue_burst(struct rte_ring *r, void **obj_table,
> >  		unsigned int n, unsigned int *available)  {
> > -	return __rte_ring_do_dequeue(r, obj_table, n,
> > -			RTE_RING_QUEUE_VARIABLE, RTE_RING_SYNC_ST,
> > available);
> > +	return rte_ring_sc_dequeue_burst_elem(r, obj_table, sizeof(void *),
> > +			n, available);
> >  }
> >
> >  /**
> > @@ -1006,24 +798,8 @@ static __rte_always_inline unsigned int
> > rte_ring_dequeue_burst(struct rte_ring *r, void **obj_table,
> >  		unsigned int n, unsigned int *available)  {
> > -	switch (r->cons.sync_type) {
> > -	case RTE_RING_SYNC_MT:
> > -		return rte_ring_mc_dequeue_burst(r, obj_table, n,
> > available);
> > -	case RTE_RING_SYNC_ST:
> > -		return rte_ring_sc_dequeue_burst(r, obj_table, n, available);
> > -#ifdef ALLOW_EXPERIMENTAL_API
> > -	case RTE_RING_SYNC_MT_RTS:
> > -		return rte_ring_mc_rts_dequeue_burst(r, obj_table, n,
> > -			available);
> > -	case RTE_RING_SYNC_MT_HTS:
> > -		return rte_ring_mc_hts_dequeue_burst(r, obj_table, n,
> > -			available);
> > -#endif
> > -	}
> > -
> > -	/* valid ring should never reach this point */
> > -	RTE_ASSERT(0);
> > -	return 0;
> > +	return rte_ring_dequeue_burst_elem(r, obj_table, sizeof(void *),
> > +			n, available);
> >  }
> >
> >  #ifdef __cplusplus
> > --
> > 2.17.1


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [dpdk-dev] [PATCH 2/3] ring: remove experimental tag for ring element APIs
  2020-07-03 10:26 ` [dpdk-dev] [PATCH 2/3] ring: remove experimental tag for ring element APIs Feifei Wang
  2020-07-03 16:17   ` Kinsella, Ray
@ 2020-07-07 14:44   ` Ananyev, Konstantin
  1 sibling, 0 replies; 23+ messages in thread
From: Ananyev, Konstantin @ 2020-07-07 14:44 UTC (permalink / raw)
  To: Feifei Wang, Honnappa Nagarahalli, Ray Kinsella, Neil Horman; +Cc: dev, nd

> 
> Remove the experimental tag for rte_ring_xxx_elem APIs that have been
> around for 2 releases.
> 
> Signed-off-by: Feifei Wang <feifei.wang2@arm.com>
> Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
> ---

Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>

> 2.17.1


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [dpdk-dev] [PATCH 3/3] ring: use element APIs to implement legacy APIs
  2020-07-03 10:26 ` [dpdk-dev] [PATCH 3/3] ring: use element APIs to implement legacy APIs Feifei Wang
  2020-07-07  5:19   ` Feifei Wang
@ 2020-07-07 14:45   ` Ananyev, Konstantin
  1 sibling, 0 replies; 23+ messages in thread
From: Ananyev, Konstantin @ 2020-07-07 14:45 UTC (permalink / raw)
  To: Feifei Wang, Honnappa Nagarahalli; +Cc: dev, nd

> 
> Use rte_ring_xxx_elem_xxx APIs to replace legacy API implementation.
> This reduces code duplication and improves code maintenance.
> 
> aarch64:
> HW:N1sdp, 1 socket, 4 cores, 1 thread/core, 2.6GHz
> OS:Ubuntu 18.04.1 LTS, Kernel: 5.4.0+
> DPDK: 20.05-rc3, Configuration: arm64-n1sdp-linux-gcc
> gcc:9.2.1
> 
> $sudo ./arm64-n1sdp-linux-gcc/app/test -l 1-2
> RTE>>ring_perf_autotest
> 
> test results on aarch64 in the case of esize 4:
> 
>                                      without this patch   with this patch
> Testing burst enq/deq
> legacy APIs: SP/SC: burst (size: 8):          1.11              1.10
> legacy APIs: SP/SC: burst (size: 32):         1.95              1.97
> legacy APIs: MP/MC: burst (size: 8):          1.86              1.94
> legacy APIs: MP/MC: burst (size: 32):         2.65              2.69
> Testing bulk enq/deq
> legacy APIs: SP/SC: bulk (size: 8):           1.08              1.09
> legacy APIs: SP/SC: bulk (size: 32):          1.89              1.90
> legacy APIs: MP/MC: bulk (size: 8):           1.85              1.98
> legacy APIs: MP/MC: bulk (size: 32):          2.65              2.69
> 
> x86:
> HW: dell, CPU Intel(R) Xeon(R) Gold 6240, 2 sockets, 18 cores/socket,
> 1 thread/core, 3.3GHz
> OS: Ubuntu 20.04 LTS, Kernel: 5.4.0-37-generic
> DPDK: 20.05-rc3, Configuration: x86_64-native-linuxapp-gcc
> gcc: 9.3.0
> 
> $sudo ./x86_64-native-linuxapp-gcc/app/test -l 14,16
> RTE>>ring_perf_autotest
> 
> test results on x86 in the case of esize 4:
> 
>                                      without this patch   with this patch
> Testing burst enq/deq
> legacy APIs: SP/SC: burst (size: 8):          29.35             27.78
> legacy APIs: SP/SC: burst (size: 32):         73.11             73.39
> legacy APIs: MP/MC: burst (size: 8):          62.36             62.37
> legacy APIs: MP/MC: burst (size: 32):         101.01            101.03
> Testing bulk enq/deq
> legacy APIs: SP/SC: bulk (size: 8):           25.94             29.55
> legacy APIs: SP/SC: bulk (size: 32):          70.00             78.87
> legacy APIs: MP/MC: bulk (size: 8):           63.41             62.48
> legacy APIs: MP/MC: bulk (size: 32):          105.86            103.84
> 
> Summary:
> In aarch64 server with this patch, there is almost no performance
> difference.
> In x86 server with this patch, in some cases, the performance slightly
> improve, in other cases, the performance slightly drop.
> 
> Signed-off-by: Feifei Wang <feifei.wang2@arm.com>
> Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
> ---

Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>

> 2.17.1


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [dpdk-dev] [PATCH 3/3] ring: use element APIs to implement legacy APIs
  2020-07-07 14:04     ` Ananyev, Konstantin
@ 2020-07-07 16:21       ` Honnappa Nagarahalli
  2020-07-08  3:19         ` Feifei Wang
  2020-07-07 20:07       ` David Christensen
  2020-07-08  1:21       ` Feifei Wang
  2 siblings, 1 reply; 23+ messages in thread
From: Honnappa Nagarahalli @ 2020-07-07 16:21 UTC (permalink / raw)
  To: Ananyev, Konstantin, Feifei Wang, drc
  Cc: dev, nd, Ruifeng Wang, nd, Honnappa Nagarahalli, nd

<snip>

> 
> Hi Feifei,
> 
>  > Hi, Konstantin, David
> >
> > I'm Feifei Wang from Arm. Sorry to make the following request:
> > Would you please do some ring performance tests of this patch in your
> platforms at the time you are free?
> > And I want to know whether this patch has a significant impact on other
> platforms except ARM.
> 
> I run few tests on SKX box and so far didn’t notice any real perf difference.
Thanks Konstantin for testing this.

> Konstantin
> 
> > Thanks very much.
> > Feifei
> >
> > > -----Original Message-----
> > > From: Feifei Wang <feifei.wang2@arm.com>
> > > Sent: 2020年7月3日 18:27
> > > To: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>; Konstantin
> > > Ananyev <konstantin.ananyev@intel.com>
> > > Cc: dev@dpdk.org; nd <nd@arm.com>; Feifei Wang
> > > <Feifei.Wang2@arm.com>
> > > Subject: [PATCH 3/3] ring: use element APIs to implement legacy APIs
> > >
> > > Use rte_ring_xxx_elem_xxx APIs to replace legacy API implementation.
> > > This reduces code duplication and improves code maintenance.
> > >
> > > aarch64:
> > > HW:N1sdp, 1 socket, 4 cores, 1 thread/core, 2.6GHz OS:Ubuntu 18.04.1
> > > LTS,
> > > Kernel: 5.4.0+
> > > DPDK: 20.05-rc3, Configuration: arm64-n1sdp-linux-gcc
> > > gcc:9.2.1
> > >
> > > $sudo ./arm64-n1sdp-linux-gcc/app/test -l 1-2
> > > RTE>>ring_perf_autotest
> > >
> > > test results on aarch64 in the case of esize 4:
> > >
> > >                                      without this patch   with this patch
> > > Testing burst enq/deq
> > > legacy APIs: SP/SC: burst (size: 8):          1.11              1.10
> > > legacy APIs: SP/SC: burst (size: 32):         1.95              1.97
> > > legacy APIs: MP/MC: burst (size: 8):          1.86              1.94
> > > legacy APIs: MP/MC: burst (size: 32):         2.65              2.69
> > > Testing bulk enq/deq
> > > legacy APIs: SP/SC: bulk (size: 8):           1.08              1.09
> > > legacy APIs: SP/SC: bulk (size: 32):          1.89              1.90
> > > legacy APIs: MP/MC: bulk (size: 8):           1.85              1.98
> > > legacy APIs: MP/MC: bulk (size: 32):          2.65              2.69
> > >
> > > x86:
> > > HW: dell, CPU Intel(R) Xeon(R) Gold 6240, 2 sockets, 18
> > > cores/socket,
> > > 1 thread/core, 3.3GHz
> > > OS: Ubuntu 20.04 LTS, Kernel: 5.4.0-37-generic
> > > DPDK: 20.05-rc3, Configuration: x86_64-native-linuxapp-gcc
> > > gcc: 9.3.0
> > >
> > > $sudo ./x86_64-native-linuxapp-gcc/app/test -l 14,16
> > > RTE>>ring_perf_autotest
> > >
> > > test results on x86 in the case of esize 4:
> > >
> > >                                      without this patch   with this patch
> > > Testing burst enq/deq
> > > legacy APIs: SP/SC: burst (size: 8):          29.35             27.78
> > > legacy APIs: SP/SC: burst (size: 32):         73.11             73.39
> > > legacy APIs: MP/MC: burst (size: 8):          62.36             62.37
> > > legacy APIs: MP/MC: burst (size: 32):         101.01            101.03
> > > Testing bulk enq/deq
> > > legacy APIs: SP/SC: bulk (size: 8):           25.94             29.55
> > > legacy APIs: SP/SC: bulk (size: 32):          70.00             78.87
> > > legacy APIs: MP/MC: bulk (size: 8):           63.41             62.48
> > > legacy APIs: MP/MC: bulk (size: 32):          105.86            103.84
> > >
> > > Summary:
> > > In aarch64 server with this patch, there is almost no performance
> difference.
> > > In x86 server with this patch, in some cases, the performance
> > > slightly improve, in other cases, the performance slightly drop.
I suggest removing the perf data from the commit message, as Konstantin confirmed there is no perf drop on x86 either.

> > >
> > > Signed-off-by: Feifei Wang <feifei.wang2@arm.com>
> > > Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
> > > Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
> > > ---
> > >  lib/librte_ring/rte_ring.h | 284
> > > ++++---------------------------------
> > >  1 file changed, 30 insertions(+), 254 deletions(-)
> > >
> > > diff --git a/lib/librte_ring/rte_ring.h b/lib/librte_ring/rte_ring.h
> > > index 35f3f8c42..2a2190bfc 100644
> > > --- a/lib/librte_ring/rte_ring.h
> > > +++ b/lib/librte_ring/rte_ring.h
> > > @@ -191,168 +191,6 @@ void rte_ring_free(struct rte_ring *r);
> > >   */
> > >  void rte_ring_dump(FILE *f, const struct rte_ring *r);
> > >
> > > -/* the actual enqueue of pointers on the ring.
> > > - * Placed here since identical code needed in both
> > > - * single and multi producer enqueue functions */ -#define
> > > ENQUEUE_PTRS(r, ring_start, prod_head, obj_table, n, obj_type) do { \
> > > -	unsigned int i; \
> > > -	const uint32_t size = (r)->size; \
> > > -	uint32_t idx = prod_head & (r)->mask; \
> > > -	obj_type *ring = (obj_type *)ring_start; \
> > > -	if (likely(idx + n < size)) { \
> > > -		for (i = 0; i < (n & ~0x3); i += 4, idx += 4) { \
> > > -			ring[idx] = obj_table[i]; \
> > > -			ring[idx + 1] = obj_table[i + 1]; \
> > > -			ring[idx + 2] = obj_table[i + 2]; \
> > > -			ring[idx + 3] = obj_table[i + 3]; \
> > > -		} \
> > > -		switch (n & 0x3) { \
> > > -		case 3: \
> > > -			ring[idx++] = obj_table[i++]; /* fallthrough */ \
> > > -		case 2: \
> > > -			ring[idx++] = obj_table[i++]; /* fallthrough */ \
> > > -		case 1: \
> > > -			ring[idx++] = obj_table[i++]; \
> > > -		} \
> > > -	} else { \
> > > -		for (i = 0; idx < size; i++, idx++)\
> > > -			ring[idx] = obj_table[i]; \
> > > -		for (idx = 0; i < n; i++, idx++) \
> > > -			ring[idx] = obj_table[i]; \
> > > -	} \
> > > -} while (0)
> > > -
> > > -/* the actual copy of pointers on the ring to obj_table.
> > > - * Placed here since identical code needed in both
> > > - * single and multi consumer dequeue functions */ -#define
> > > DEQUEUE_PTRS(r, ring_start, cons_head, obj_table, n, obj_type) do { \
> > > -	unsigned int i; \
> > > -	uint32_t idx = cons_head & (r)->mask; \
> > > -	const uint32_t size = (r)->size; \
> > > -	obj_type *ring = (obj_type *)ring_start; \
> > > -	if (likely(idx + n < size)) { \
> > > -		for (i = 0; i < (n & ~0x3); i += 4, idx += 4) {\
> > > -			obj_table[i] = ring[idx]; \
> > > -			obj_table[i + 1] = ring[idx + 1]; \
> > > -			obj_table[i + 2] = ring[idx + 2]; \
> > > -			obj_table[i + 3] = ring[idx + 3]; \
> > > -		} \
> > > -		switch (n & 0x3) { \
> > > -		case 3: \
> > > -			obj_table[i++] = ring[idx++]; /* fallthrough */ \
> > > -		case 2: \
> > > -			obj_table[i++] = ring[idx++]; /* fallthrough */ \
> > > -		case 1: \
> > > -			obj_table[i++] = ring[idx++]; \
> > > -		} \
> > > -	} else { \
> > > -		for (i = 0; idx < size; i++, idx++) \
> > > -			obj_table[i] = ring[idx]; \
> > > -		for (idx = 0; i < n; i++, idx++) \
> > > -			obj_table[i] = ring[idx]; \
> > > -	} \
> > > -} while (0)
> > > -
> > > -/* Between load and load. there might be cpu reorder in weak model
> > > - * (powerpc/arm).
> > > - * There are 2 choices for the users
> > > - * 1.use rmb() memory barrier
> > > - * 2.use one-direction load_acquire/store_release barrier,defined
> > > by
> > > - * CONFIG_RTE_USE_C11_MEM_MODEL=y
> > > - * It depends on performance test results.
> > > - * By default, move common functions to rte_ring_generic.h
> > > - */
> > > -#ifdef RTE_USE_C11_MEM_MODEL
> > > -#include "rte_ring_c11_mem.h"
> > > -#else
> > > -#include "rte_ring_generic.h"
> > > -#endif
> > > -
> > > -/**
> > > - * @internal Enqueue several objects on the ring
> > > - *
> > > -  * @param r
> > > - *   A pointer to the ring structure.
> > > - * @param obj_table
> > > - *   A pointer to a table of void * pointers (objects).
> > > - * @param n
> > > - *   The number of objects to add in the ring from the obj_table.
> > > - * @param behavior
> > > - *   RTE_RING_QUEUE_FIXED:    Enqueue a fixed number of items from a
> > > ring
> > > - *   RTE_RING_QUEUE_VARIABLE: Enqueue as many items as possible
> from
> > > ring
> > > - * @param is_sp
> > > - *   Indicates whether to use single producer or multi-producer head
> update
> > > - * @param free_space
> > > - *   returns the amount of space after the enqueue operation has finished
> > > - * @return
> > > - *   Actual number of objects enqueued.
> > > - *   If behavior == RTE_RING_QUEUE_FIXED, this will be 0 or n only.
> > > - */
> > > -static __rte_always_inline unsigned int
> > > -__rte_ring_do_enqueue(struct rte_ring *r, void * const *obj_table,
> > > -		 unsigned int n, enum rte_ring_queue_behavior behavior,
> > > -		 unsigned int is_sp, unsigned int *free_space)
> > > -{
> > > -	uint32_t prod_head, prod_next;
> > > -	uint32_t free_entries;
> > > -
> > > -	n = __rte_ring_move_prod_head(r, is_sp, n, behavior,
> > > -			&prod_head, &prod_next, &free_entries);
> > > -	if (n == 0)
> > > -		goto end;
> > > -
> > > -	ENQUEUE_PTRS(r, &r[1], prod_head, obj_table, n, void *);
> > > -
> > > -	update_tail(&r->prod, prod_head, prod_next, is_sp, 1);
> > > -end:
> > > -	if (free_space != NULL)
> > > -		*free_space = free_entries - n;
> > > -	return n;
> > > -}
> > > -
> > > -/**
> > > - * @internal Dequeue several objects from the ring
> > > - *
> > > - * @param r
> > > - *   A pointer to the ring structure.
> > > - * @param obj_table
> > > - *   A pointer to a table of void * pointers (objects).
> > > - * @param n
> > > - *   The number of objects to pull from the ring.
> > > - * @param behavior
> > > - *   RTE_RING_QUEUE_FIXED:    Dequeue a fixed number of items from a
> > > ring
> > > - *   RTE_RING_QUEUE_VARIABLE: Dequeue as many items as possible
> from
> > > ring
> > > - * @param is_sc
> > > - *   Indicates whether to use single consumer or multi-consumer head
> > > update
> > > - * @param available
> > > - *   returns the number of remaining ring entries after the dequeue has
> > > finished
> > > - * @return
> > > - *   - Actual number of objects dequeued.
> > > - *     If behavior == RTE_RING_QUEUE_FIXED, this will be 0 or n only.
> > > - */
> > > -static __rte_always_inline unsigned int
> > > -__rte_ring_do_dequeue(struct rte_ring *r, void **obj_table,
> > > -		 unsigned int n, enum rte_ring_queue_behavior behavior,
> > > -		 unsigned int is_sc, unsigned int *available)
> > > -{
> > > -	uint32_t cons_head, cons_next;
> > > -	uint32_t entries;
> > > -
> > > -	n = __rte_ring_move_cons_head(r, (int)is_sc, n, behavior,
> > > -			&cons_head, &cons_next, &entries);
> > > -	if (n == 0)
> > > -		goto end;
> > > -
> > > -	DEQUEUE_PTRS(r, &r[1], cons_head, obj_table, n, void *);
> > > -
> > > -	update_tail(&r->cons, cons_head, cons_next, is_sc, 0);
> > > -
> > > -end:
> > > -	if (available != NULL)
> > > -		*available = entries - n;
> > > -	return n;
> > > -}
> > > -
> > >  /**
> > >   * Enqueue several objects on the ring (multi-producers safe).
> > >   *
> > > @@ -375,8 +213,8 @@ static __rte_always_inline unsigned int
> > > rte_ring_mp_enqueue_bulk(struct rte_ring *r, void * const *obj_table,
> > >  			 unsigned int n, unsigned int *free_space)  {
> > > -	return __rte_ring_do_enqueue(r, obj_table, n,
> > > RTE_RING_QUEUE_FIXED,
> > > -			RTE_RING_SYNC_MT, free_space);
> > > +	return rte_ring_mp_enqueue_bulk_elem(r, obj_table, sizeof(void *),
> > > +			n, free_space);
> > >  }
> > >
> > >  /**
> > > @@ -398,8 +236,8 @@ static __rte_always_inline unsigned int
> > > rte_ring_sp_enqueue_bulk(struct rte_ring *r, void * const *obj_table,
> > >  			 unsigned int n, unsigned int *free_space)  {
> > > -	return __rte_ring_do_enqueue(r, obj_table, n,
> > > RTE_RING_QUEUE_FIXED,
> > > -			RTE_RING_SYNC_ST, free_space);
> > > +	return rte_ring_sp_enqueue_bulk_elem(r, obj_table, sizeof(void *),
> > > +			n, free_space);
> > >  }
> > >
> > >  /**
> > > @@ -425,24 +263,8 @@ static __rte_always_inline unsigned int
> > > rte_ring_enqueue_bulk(struct rte_ring *r, void * const *obj_table,
> > >  		      unsigned int n, unsigned int *free_space)  {
> > > -	switch (r->prod.sync_type) {
> > > -	case RTE_RING_SYNC_MT:
> > > -		return rte_ring_mp_enqueue_bulk(r, obj_table, n,
> > > free_space);
> > > -	case RTE_RING_SYNC_ST:
> > > -		return rte_ring_sp_enqueue_bulk(r, obj_table, n,
> > > free_space);
> > > -#ifdef ALLOW_EXPERIMENTAL_API
> > > -	case RTE_RING_SYNC_MT_RTS:
> > > -		return rte_ring_mp_rts_enqueue_bulk(r, obj_table, n,
> > > -			free_space);
> > > -	case RTE_RING_SYNC_MT_HTS:
> > > -		return rte_ring_mp_hts_enqueue_bulk(r, obj_table, n,
> > > -			free_space);
> > > -#endif
> > > -	}
> > > -
> > > -	/* valid ring should never reach this point */
> > > -	RTE_ASSERT(0);
> > > -	return 0;
> > > +	return rte_ring_enqueue_bulk_elem(r, obj_table, sizeof(void *),
> > > +			n, free_space);
> > >  }
> > >
> > >  /**
> > > @@ -462,7 +284,7 @@ rte_ring_enqueue_bulk(struct rte_ring *r, void *
> > > const *obj_table,  static __rte_always_inline int
> > > rte_ring_mp_enqueue(struct rte_ring *r, void *obj)  {
> > > -	return rte_ring_mp_enqueue_bulk(r, &obj, 1, NULL) ? 0 : -ENOBUFS;
> > > +	return rte_ring_mp_enqueue_elem(r, obj, sizeof(void *));
> > >  }
> > >
> > >  /**
> > > @@ -479,7 +301,7 @@ rte_ring_mp_enqueue(struct rte_ring *r, void
> > > *obj) static __rte_always_inline int  rte_ring_sp_enqueue(struct
> > > rte_ring *r, void
> > > *obj)  {
> > > -	return rte_ring_sp_enqueue_bulk(r, &obj, 1, NULL) ? 0 : -ENOBUFS;
> > > +	return rte_ring_sp_enqueue_elem(r, obj, sizeof(void *));
> > >  }
> > >
> > >  /**
> > > @@ -500,7 +322,7 @@ rte_ring_sp_enqueue(struct rte_ring *r, void
> > > *obj) static __rte_always_inline int  rte_ring_enqueue(struct
> > > rte_ring *r, void *obj) {
> > > -	return rte_ring_enqueue_bulk(r, &obj, 1, NULL) ? 0 : -ENOBUFS;
> > > +	return rte_ring_enqueue_elem(r, obj, sizeof(void *));
> > >  }
> > >
> > >  /**
> > > @@ -525,8 +347,8 @@ static __rte_always_inline unsigned int
> > > rte_ring_mc_dequeue_bulk(struct rte_ring *r, void **obj_table,
> > >  		unsigned int n, unsigned int *available)  {
> > > -	return __rte_ring_do_dequeue(r, obj_table, n,
> > > RTE_RING_QUEUE_FIXED,
> > > -			RTE_RING_SYNC_MT, available);
> > > +	return rte_ring_mc_dequeue_bulk_elem(r, obj_table, sizeof(void *),
> > > +			n, available);
> > >  }
> > >
> > >  /**
> > > @@ -549,8 +371,8 @@ static __rte_always_inline unsigned int
> > > rte_ring_sc_dequeue_bulk(struct rte_ring *r, void **obj_table,
> > >  		unsigned int n, unsigned int *available)  {
> > > -	return __rte_ring_do_dequeue(r, obj_table, n,
> > > RTE_RING_QUEUE_FIXED,
> > > -			RTE_RING_SYNC_ST, available);
> > > +	return rte_ring_sc_dequeue_bulk_elem(r, obj_table, sizeof(void *),
> > > +			n, available);
> > >  }
> > >
> > >  /**
> > > @@ -576,22 +398,8 @@ static __rte_always_inline unsigned int
> > > rte_ring_dequeue_bulk(struct rte_ring *r, void **obj_table, unsigned int n,
> > >  		unsigned int *available)
> > >  {
> > > -	switch (r->cons.sync_type) {
> > > -	case RTE_RING_SYNC_MT:
> > > -		return rte_ring_mc_dequeue_bulk(r, obj_table, n, available);
> > > -	case RTE_RING_SYNC_ST:
> > > -		return rte_ring_sc_dequeue_bulk(r, obj_table, n, available);
> > > -#ifdef ALLOW_EXPERIMENTAL_API
> > > -	case RTE_RING_SYNC_MT_RTS:
> > > -		return rte_ring_mc_rts_dequeue_bulk(r, obj_table, n,
> > > available);
> > > -	case RTE_RING_SYNC_MT_HTS:
> > > -		return rte_ring_mc_hts_dequeue_bulk(r, obj_table, n,
> > > available);
> > > -#endif
> > > -	}
> > > -
> > > -	/* valid ring should never reach this point */
> > > -	RTE_ASSERT(0);
> > > -	return 0;
> > > +	return rte_ring_dequeue_bulk_elem(r, obj_table, sizeof(void *),
> > > +			n, available);
> > >  }
> > >
> > >  /**
> > > @@ -612,7 +420,7 @@ rte_ring_dequeue_bulk(struct rte_ring *r, void
> > > **obj_table, unsigned int n,  static __rte_always_inline int
> > > rte_ring_mc_dequeue(struct rte_ring *r, void **obj_p)  {
> > > -	return rte_ring_mc_dequeue_bulk(r, obj_p, 1, NULL)  ? 0 : -ENOENT;
> > > +	return rte_ring_mc_dequeue_elem(r, obj_p, sizeof(void *));
> > >  }
> > >
> > >  /**
> > > @@ -630,7 +438,7 @@ rte_ring_mc_dequeue(struct rte_ring *r, void
> > > **obj_p)  static __rte_always_inline int  rte_ring_sc_dequeue(struct
> > > rte_ring *r, void **obj_p)  {
> > > -	return rte_ring_sc_dequeue_bulk(r, obj_p, 1, NULL) ? 0 : -ENOENT;
> > > +	return rte_ring_sc_dequeue_elem(r, obj_p, sizeof(void *));
> > >  }
> > >
> > >  /**
> > > @@ -652,7 +460,7 @@ rte_ring_sc_dequeue(struct rte_ring *r, void
> > > **obj_p) static __rte_always_inline int  rte_ring_dequeue(struct
> > > rte_ring *r, void
> > > **obj_p)  {
> > > -	return rte_ring_dequeue_bulk(r, obj_p, 1, NULL) ? 0 : -ENOENT;
> > > +	return rte_ring_dequeue_elem(r, obj_p, sizeof(void *));
> > >  }
> > >
> > >  /**
> > > @@ -860,8 +668,8 @@ static __rte_always_inline unsigned int
> > > rte_ring_mp_enqueue_burst(struct rte_ring *r, void * const *obj_table,
> > >  			 unsigned int n, unsigned int *free_space)  {
> > > -	return __rte_ring_do_enqueue(r, obj_table, n,
> > > -			RTE_RING_QUEUE_VARIABLE, RTE_RING_SYNC_MT,
> > > free_space);
> > > +	return rte_ring_mp_enqueue_burst_elem(r, obj_table, sizeof(void
> > > *),
> > > +			n, free_space);
> > >  }
> > >
> > >  /**
> > > @@ -883,8 +691,8 @@ static __rte_always_inline unsigned int
> > > rte_ring_sp_enqueue_burst(struct rte_ring *r, void * const *obj_table,
> > >  			 unsigned int n, unsigned int *free_space)  {
> > > -	return __rte_ring_do_enqueue(r, obj_table, n,
> > > -			RTE_RING_QUEUE_VARIABLE, RTE_RING_SYNC_ST,
> > > free_space);
> > > +	return rte_ring_sp_enqueue_burst_elem(r, obj_table, sizeof(void *),
> > > +			n, free_space);
> > >  }
> > >
> > >  /**
> > > @@ -910,24 +718,8 @@ static __rte_always_inline unsigned int
> > > rte_ring_enqueue_burst(struct rte_ring *r, void * const *obj_table,
> > >  		      unsigned int n, unsigned int *free_space)  {
> > > -	switch (r->prod.sync_type) {
> > > -	case RTE_RING_SYNC_MT:
> > > -		return rte_ring_mp_enqueue_burst(r, obj_table, n,
> > > free_space);
> > > -	case RTE_RING_SYNC_ST:
> > > -		return rte_ring_sp_enqueue_burst(r, obj_table, n,
> > > free_space);
> > > -#ifdef ALLOW_EXPERIMENTAL_API
> > > -	case RTE_RING_SYNC_MT_RTS:
> > > -		return rte_ring_mp_rts_enqueue_burst(r, obj_table, n,
> > > -			free_space);
> > > -	case RTE_RING_SYNC_MT_HTS:
> > > -		return rte_ring_mp_hts_enqueue_burst(r, obj_table, n,
> > > -			free_space);
> > > -#endif
> > > -	}
> > > -
> > > -	/* valid ring should never reach this point */
> > > -	RTE_ASSERT(0);
> > > -	return 0;
> > > +	return rte_ring_enqueue_burst_elem(r, obj_table, sizeof(void *),
> > > +			n, free_space);
> > >  }
> > >
> > >  /**
> > > @@ -954,8 +746,8 @@ static __rte_always_inline unsigned int
> > > rte_ring_mc_dequeue_burst(struct rte_ring *r, void **obj_table,
> > >  		unsigned int n, unsigned int *available)  {
> > > -	return __rte_ring_do_dequeue(r, obj_table, n,
> > > -			RTE_RING_QUEUE_VARIABLE, RTE_RING_SYNC_MT,
> > > available);
> > > +	return rte_ring_mc_dequeue_burst_elem(r, obj_table, sizeof(void
> > > *),
> > > +			n, available);
> > >  }
> > >
> > >  /**
> > > @@ -979,8 +771,8 @@ static __rte_always_inline unsigned int
> > > rte_ring_sc_dequeue_burst(struct rte_ring *r, void **obj_table,
> > >  		unsigned int n, unsigned int *available)  {
> > > -	return __rte_ring_do_dequeue(r, obj_table, n,
> > > -			RTE_RING_QUEUE_VARIABLE, RTE_RING_SYNC_ST,
> > > available);
> > > +	return rte_ring_sc_dequeue_burst_elem(r, obj_table, sizeof(void *),
> > > +			n, available);
> > >  }
> > >
> > >  /**
> > > @@ -1006,24 +798,8 @@ static __rte_always_inline unsigned int
> > > rte_ring_dequeue_burst(struct rte_ring *r, void **obj_table,
> > >  		unsigned int n, unsigned int *available)  {
> > > -	switch (r->cons.sync_type) {
> > > -	case RTE_RING_SYNC_MT:
> > > -		return rte_ring_mc_dequeue_burst(r, obj_table, n,
> > > available);
> > > -	case RTE_RING_SYNC_ST:
> > > -		return rte_ring_sc_dequeue_burst(r, obj_table, n, available);
> > > -#ifdef ALLOW_EXPERIMENTAL_API
> > > -	case RTE_RING_SYNC_MT_RTS:
> > > -		return rte_ring_mc_rts_dequeue_burst(r, obj_table, n,
> > > -			available);
> > > -	case RTE_RING_SYNC_MT_HTS:
> > > -		return rte_ring_mc_hts_dequeue_burst(r, obj_table, n,
> > > -			available);
> > > -#endif
> > > -	}
> > > -
> > > -	/* valid ring should never reach this point */
> > > -	RTE_ASSERT(0);
> > > -	return 0;
> > > +	return rte_ring_dequeue_burst_elem(r, obj_table, sizeof(void *),
> > > +			n, available);
> > >  }
> > >
> > >  #ifdef __cplusplus
> > > --
> > > 2.17.1


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [dpdk-dev] [PATCH 3/3] ring: use element APIs to implement legacy APIs
  2020-07-07 14:04     ` Ananyev, Konstantin
  2020-07-07 16:21       ` Honnappa Nagarahalli
@ 2020-07-07 20:07       ` David Christensen
  2020-07-08  1:30         ` Feifei Wang
  2020-07-08  1:21       ` Feifei Wang
  2 siblings, 1 reply; 23+ messages in thread
From: David Christensen @ 2020-07-07 20:07 UTC (permalink / raw)
  To: Ananyev, Konstantin, Feifei Wang, Honnappa Nagarahalli
  Cc: dev, nd, Ruifeng Wang



On 7/7/20 7:04 AM, Ananyev, Konstantin wrote:
> 
> Hi Feifei,
> 
>   > Hi, Konstantin, David
>>
>> I'm Feifei Wang from Arm. Sorry to make the following request:
>> Would you please do some ring performance tests of this patch in your platforms at the time you are free?
>> And I want to know whether this patch has a significant impact on other platforms except ARM.
> 
> I run few tests on SKX box and so far didn’t notice any real perf difference.
> Konstantin
>   

Full performance results for IBM POWER9 system below.  I ran the tests  
twice for each version and the results were consistent.

                                      without this patch   with this patch
Testing burst enq/deq
legacy APIs: SP/SC: burst (size: 8):          43.63              43.63
legacy APIs: SP/SC: burst (size: 32):         50.07              50.04
legacy APIs: MP/MC: burst (size: 8):          58.43              58.42
legacy APIs: MP/MC: burst (size: 32):         65.52              65.51
Testing bulk enq/deq
legacy APIs: SP/SC: bulk (size: 8):           43.61              43.61
legacy APIs: SP/SC: bulk (size: 32):          50.05              50.02
legacy APIs: MP/MC: bulk (size: 8):           58.43              58.43
legacy APIs: MP/MC: bulk (size: 32):          65.50              65.49

HW:
Architecture:        ppc64le
Byte Order:          Little Endian
CPU(s):              128
On-line CPU(s) list: 0-127
Thread(s) per core:  4
Core(s) per socket:  16
Socket(s):           2
NUMA node(s):        6
Model:               2.3 (pvr 004e 1203)
Model name:          POWER9, altivec supported
CPU max MHz:         3800.0000
CPU min MHz:         2300.0000
L1d cache:           32K
L1i cache:           32K
L2 cache:            512K
L3 cache:            10240K
NUMA node0 CPU(s):   0-63
NUMA node8 CPU(s):   64-127

OS: RHEL 8.2

GCC: gcc version 8.3.1 20191121 (Red Hat 8.3.1-5) (GCC)

DPDK: 20.08.0-rc0 (a8550b773)



Unpatched
===========
sudo app/test/dpdk-test -l 68,69
EAL: Detected 128 lcore(s)
EAL: Detected 2 NUMA nodes
EAL: Multi-process socket /var/run/dpdk/rte/mp_socket
EAL: Selected IOVA mode 'VA'
EAL: No available hugepages reported in hugepages-2048kB
EAL: Probing VFIO support...
EAL: VFIO support initialized
EAL: Probe PCI driver: net_mlx5 (15b3:1019) device: 0000:01:00.0 (socket 0)
EAL: Probe PCI driver: net_mlx5 (15b3:1019) device: 0000:01:00.1 (socket 0)
EAL: Probe PCI driver: net_mlx5 (15b3:1019) device: 0030:01:00.0 (socket 8)
EAL: Probe PCI driver: net_mlx5 (15b3:1019) device: 0030:01:00.1 (socket 8)
EAL:   using IOMMU type 7 (sPAPR)
EAL: Probe PCI driver: net_i40e (8086:1583) device: 0034:01:00.0 (socket 8)
EAL: Probe PCI driver: net_i40e (8086:1583) device: 0034:01:00.1 (socket 8)
APP: HPET is not enabled, using TSC as default timer
RTE>>ring_perf_autotest

### Testing single element enq/deq ###
legacy APIs: SP/SC: single: 42.01
legacy APIs: MP/MC: single: 56.27

### Testing burst enq/deq ###
legacy APIs: SP/SC: burst (size: 8): 43.63
legacy APIs: SP/SC: burst (size: 32): 50.07
legacy APIs: MP/MC: burst (size: 8): 58.43
legacy APIs: MP/MC: burst (size: 32): 65.52

### Testing bulk enq/deq ###
legacy APIs: SP/SC: bulk (size: 8): 43.61
legacy APIs: SP/SC: bulk (size: 32): 50.05
legacy APIs: MP/MC: bulk (size: 8): 58.43
legacy APIs: MP/MC: bulk (size: 32): 65.50

### Testing empty bulk deq ###
legacy APIs: SP/SC: bulk (size: 8): 7.16
legacy APIs: MP/MC: bulk (size: 8): 7.16

### Testing using two hyperthreads ###
legacy APIs: SP/SC: bulk (size: 8): 12.44
legacy APIs: MP/MC: bulk (size: 8): 16.19
legacy APIs: SP/SC: bulk (size: 32): 3.10
legacy APIs: MP/MC: bulk (size: 32): 3.64

### Testing using all slave nodes ###

Bulk enq/dequeue count on size 8
Core [68] count = 362382
Core [69] count = 362516
Total count (size: 8): 724898

Bulk enq/dequeue count on size 32
Core [68] count = 361565
Core [69] count = 361852
Total count (size: 32): 723417

### Testing single element enq/deq ###
elem APIs: element size 16B: SP/SC: single: 42.81
elem APIs: element size 16B: MP/MC: single: 56.78

### Testing burst enq/deq ###
elem APIs: element size 16B: SP/SC: burst (size: 8): 45.04
elem APIs: element size 16B: SP/SC: burst (size: 32): 59.27
elem APIs: element size 16B: MP/MC: burst (size: 8): 60.68
elem APIs: element size 16B: MP/MC: burst (size: 32): 75.00

### Testing bulk enq/deq ###
elem APIs: element size 16B: SP/SC: bulk (size: 8): 45.05
elem APIs: element size 16B: SP/SC: bulk (size: 32): 59.23
elem APIs: element size 16B: MP/MC: bulk (size: 8): 60.64
elem APIs: element size 16B: MP/MC: bulk (size: 32): 75.11

### Testing empty bulk deq ###
elem APIs: element size 16B: SP/SC: bulk (size: 8): 7.16
elem APIs: element size 16B: MP/MC: bulk (size: 8): 7.16

### Testing using two hyperthreads ###
elem APIs: element size 16B: SP/SC: bulk (size: 8): 12.15
elem APIs: element size 16B: MP/MC: bulk (size: 8): 15.55
elem APIs: element size 16B: SP/SC: bulk (size: 32): 3.22
elem APIs: element size 16B: MP/MC: bulk (size: 32): 3.86

### Testing using all slave nodes ###

Bulk enq/dequeue count on size 8
Core [68] count = 374327
Core [69] count = 374433
Total count (size: 8): 748760

Bulk enq/dequeue count on size 32
Core [68] count = 324111
Core [69] count = 320038
Total count (size: 32): 644149
Test OK

Patched
=======
$ sudo app/test/dpdk-test -l 68,69
EAL: Detected 128 lcore(s)
EAL: Detected 2 NUMA nodes
EAL: Multi-process socket /var/run/dpdk/rte/mp_socket
EAL: Selected IOVA mode 'VA'
EAL: No available hugepages reported in hugepages-2048kB
EAL: Probing VFIO support...
EAL: VFIO support initialized
EAL: Probe PCI driver: net_mlx5 (15b3:1019) device: 0000:01:00.0 (socket 0)
EAL: Probe PCI driver: net_mlx5 (15b3:1019) device: 0000:01:00.1 (socket 0)
EAL: Probe PCI driver: net_mlx5 (15b3:1019) device: 0030:01:00.0 (socket 8)
EAL: Probe PCI driver: net_mlx5 (15b3:1019) device: 0030:01:00.1 (socket 8)
EAL:   using IOMMU type 7 (sPAPR)
EAL: Probe PCI driver: net_i40e (8086:1583) device: 0034:01:00.0 (socket 8)
EAL: Probe PCI driver: net_i40e (8086:1583) device: 0034:01:00.1 (socket 8)
APP: HPET is not enabled, using TSC as default timer
RTE>>ring_perf_autotest

### Testing single element enq/deq ###
legacy APIs: SP/SC: single: 42.00
legacy APIs: MP/MC: single: 56.27

### Testing burst enq/deq ###
legacy APIs: SP/SC: burst (size: 8): 43.63
legacy APIs: SP/SC: burst (size: 32): 50.04
legacy APIs: MP/MC: burst (size: 8): 58.42
legacy APIs: MP/MC: burst (size: 32): 65.51

### Testing bulk enq/deq ###
legacy APIs: SP/SC: bulk (size: 8): 43.61
legacy APIs: SP/SC: bulk (size: 32): 50.02
legacy APIs: MP/MC: bulk (size: 8): 58.43
legacy APIs: MP/MC: bulk (size: 32): 65.49

### Testing empty bulk deq ###
legacy APIs: SP/SC: bulk (size: 8): 7.16
legacy APIs: MP/MC: bulk (size: 8): 7.16

### Testing using two hyperthreads ###
legacy APIs: SP/SC: bulk (size: 8): 12.43
legacy APIs: MP/MC: bulk (size: 8): 16.17
legacy APIs: SP/SC: bulk (size: 32): 3.10
legacy APIs: MP/MC: bulk (size: 32): 3.65

### Testing using all slave nodes ###

Bulk enq/dequeue count on size 8
Core [68] count = 363208
Core [69] count = 363334
Total count (size: 8): 726542

Bulk enq/dequeue count on size 32
Core [68] count = 361592
Core [69] count = 361690
Total count (size: 32): 723282

### Testing single element enq/deq ###
elem APIs: element size 16B: SP/SC: single: 42.78
elem APIs: element size 16B: MP/MC: single: 56.75

### Testing burst enq/deq ###
elem APIs: element size 16B: SP/SC: burst (size: 8): 45.04
elem APIs: element size 16B: SP/SC: burst (size: 32): 59.27
elem APIs: element size 16B: MP/MC: burst (size: 8): 60.66
elem APIs: element size 16B: MP/MC: burst (size: 32): 75.03

### Testing bulk enq/deq ###
elem APIs: element size 16B: SP/SC: bulk (size: 8): 45.04
elem APIs: element size 16B: SP/SC: bulk (size: 32): 59.33
elem APIs: element size 16B: MP/MC: bulk (size: 8): 60.65
elem APIs: element size 16B: MP/MC: bulk (size: 32): 75.04

### Testing empty bulk deq ###
elem APIs: element size 16B: SP/SC: bulk (size: 8): 7.16
elem APIs: element size 16B: MP/MC: bulk (size: 8): 7.16

### Testing using two hyperthreads ###
elem APIs: element size 16B: SP/SC: bulk (size: 8): 12.14
elem APIs: element size 16B: MP/MC: bulk (size: 8): 15.56
elem APIs: element size 16B: SP/SC: bulk (size: 32): 3.22
elem APIs: element size 16B: MP/MC: bulk (size: 32): 3.86

### Testing using all slave nodes ###

Bulk enq/dequeue count on size 8
Core [68] count = 372618
Core [69] count = 372415
Total count (size: 8): 745033

Bulk enq/dequeue count on size 32
Core [68] count = 318784
Core [69] count = 316066
Total count (size: 32): 634850
Test OK

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [dpdk-dev] [PATCH 3/3] ring: use element APIs to implement legacy APIs
  2020-07-07 14:04     ` Ananyev, Konstantin
  2020-07-07 16:21       ` Honnappa Nagarahalli
  2020-07-07 20:07       ` David Christensen
@ 2020-07-08  1:21       ` Feifei Wang
  2 siblings, 0 replies; 23+ messages in thread
From: Feifei Wang @ 2020-07-08  1:21 UTC (permalink / raw)
  To: Ananyev, Konstantin, Honnappa Nagarahalli, drc
  Cc: dev, nd, Ruifeng Wang, nd, nd

Hi, Konstantin

Thanks very much for your effort.

Feifei

> -----Original Message-----
> From: Ananyev, Konstantin <konstantin.ananyev@intel.com>
> Sent: 2020年7月7日 22:04
> To: Feifei Wang <Feifei.Wang2@arm.com>; Honnappa Nagarahalli
> <Honnappa.Nagarahalli@arm.com>; drc@linux.vnet.ibm.com
> Cc: dev@dpdk.org; nd <nd@arm.com>; Ruifeng Wang
> <Ruifeng.Wang@arm.com>; nd <nd@arm.com>
> Subject: RE: [PATCH 3/3] ring: use element APIs to implement legacy APIs
> 
> 
> Hi Feifei,
> 
>  > Hi, Konstantin, David
> >
> > I'm Feifei Wang from Arm. Sorry to make the following request:
> > Would you please do some ring performance tests of this patch in your
> platforms at the time you are free?
> > And I want to know whether this patch has a significant impact on other
> platforms except ARM.
> 
> I run few tests on SKX box and so far didn’t notice any real perf difference.
> Konstantin
> 
> > Thanks very much.
> > Feifei
> >
> > > -----Original Message-----
> > > From: Feifei Wang <feifei.wang2@arm.com>
> > > Sent: 2020年7月3日 18:27
> > > To: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>; Konstantin
> > > Ananyev <konstantin.ananyev@intel.com>
> > > Cc: dev@dpdk.org; nd <nd@arm.com>; Feifei Wang
> > > <Feifei.Wang2@arm.com>
> > > Subject: [PATCH 3/3] ring: use element APIs to implement legacy APIs
> > >
> > > Use rte_ring_xxx_elem_xxx APIs to replace legacy API implementation.
> > > This reduces code duplication and improves code maintenance.
> > >
> > > aarch64:
> > > HW:N1sdp, 1 socket, 4 cores, 1 thread/core, 2.6GHz OS:Ubuntu 18.04.1
> > > LTS,
> > > Kernel: 5.4.0+
> > > DPDK: 20.05-rc3, Configuration: arm64-n1sdp-linux-gcc
> > > gcc:9.2.1
> > >
> > > $sudo ./arm64-n1sdp-linux-gcc/app/test -l 1-2
> > > RTE>>ring_perf_autotest
> > >
> > > test results on aarch64 in the case of esize 4:
> > >
> > >                                      without this patch   with this patch
> > > Testing burst enq/deq
> > > legacy APIs: SP/SC: burst (size: 8):          1.11              1.10
> > > legacy APIs: SP/SC: burst (size: 32):         1.95              1.97
> > > legacy APIs: MP/MC: burst (size: 8):          1.86              1.94
> > > legacy APIs: MP/MC: burst (size: 32):         2.65              2.69
> > > Testing bulk enq/deq
> > > legacy APIs: SP/SC: bulk (size: 8):           1.08              1.09
> > > legacy APIs: SP/SC: bulk (size: 32):          1.89              1.90
> > > legacy APIs: MP/MC: bulk (size: 8):           1.85              1.98
> > > legacy APIs: MP/MC: bulk (size: 32):          2.65              2.69
> > >
> > > x86:
> > > HW: dell, CPU Intel(R) Xeon(R) Gold 6240, 2 sockets, 18
> > > cores/socket,
> > > 1 thread/core, 3.3GHz
> > > OS: Ubuntu 20.04 LTS, Kernel: 5.4.0-37-generic
> > > DPDK: 20.05-rc3, Configuration: x86_64-native-linuxapp-gcc
> > > gcc: 9.3.0
> > >
> > > $sudo ./x86_64-native-linuxapp-gcc/app/test -l 14,16
> > > RTE>>ring_perf_autotest
> > >
> > > test results on x86 in the case of esize 4:
> > >
> > >                                      without this patch   with this patch
> > > Testing burst enq/deq
> > > legacy APIs: SP/SC: burst (size: 8):          29.35             27.78
> > > legacy APIs: SP/SC: burst (size: 32):         73.11             73.39
> > > legacy APIs: MP/MC: burst (size: 8):          62.36             62.37
> > > legacy APIs: MP/MC: burst (size: 32):         101.01            101.03
> > > Testing bulk enq/deq
> > > legacy APIs: SP/SC: bulk (size: 8):           25.94             29.55
> > > legacy APIs: SP/SC: bulk (size: 32):          70.00             78.87
> > > legacy APIs: MP/MC: bulk (size: 8):           63.41             62.48
> > > legacy APIs: MP/MC: bulk (size: 32):          105.86            103.84
> > >
> > > Summary:
> > > In aarch64 server with this patch, there is almost no performance
> difference.
> > > In x86 server with this patch, in some cases, the performance
> > > slightly improve, in other cases, the performance slightly drop.
> > >
> > > Signed-off-by: Feifei Wang <feifei.wang2@arm.com>
> > > Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
> > > Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
> > > ---
> > >  lib/librte_ring/rte_ring.h | 284
> > > ++++---------------------------------
> > >  1 file changed, 30 insertions(+), 254 deletions(-)
> > >
> > > diff --git a/lib/librte_ring/rte_ring.h b/lib/librte_ring/rte_ring.h
> > > index 35f3f8c42..2a2190bfc 100644
> > > --- a/lib/librte_ring/rte_ring.h
> > > +++ b/lib/librte_ring/rte_ring.h
> > > @@ -191,168 +191,6 @@ void rte_ring_free(struct rte_ring *r);
> > >   */
> > >  void rte_ring_dump(FILE *f, const struct rte_ring *r);
> > >
> > > -/* the actual enqueue of pointers on the ring.
> > > - * Placed here since identical code needed in both
> > > - * single and multi producer enqueue functions */ -#define
> > > ENQUEUE_PTRS(r, ring_start, prod_head, obj_table, n, obj_type) do { \
> > > -	unsigned int i; \
> > > -	const uint32_t size = (r)->size; \
> > > -	uint32_t idx = prod_head & (r)->mask; \
> > > -	obj_type *ring = (obj_type *)ring_start; \
> > > -	if (likely(idx + n < size)) { \
> > > -		for (i = 0; i < (n & ~0x3); i += 4, idx += 4) { \
> > > -			ring[idx] = obj_table[i]; \
> > > -			ring[idx + 1] = obj_table[i + 1]; \
> > > -			ring[idx + 2] = obj_table[i + 2]; \
> > > -			ring[idx + 3] = obj_table[i + 3]; \
> > > -		} \
> > > -		switch (n & 0x3) { \
> > > -		case 3: \
> > > -			ring[idx++] = obj_table[i++]; /* fallthrough */ \
> > > -		case 2: \
> > > -			ring[idx++] = obj_table[i++]; /* fallthrough */ \
> > > -		case 1: \
> > > -			ring[idx++] = obj_table[i++]; \
> > > -		} \
> > > -	} else { \
> > > -		for (i = 0; idx < size; i++, idx++)\
> > > -			ring[idx] = obj_table[i]; \
> > > -		for (idx = 0; i < n; i++, idx++) \
> > > -			ring[idx] = obj_table[i]; \
> > > -	} \
> > > -} while (0)
> > > -
> > > -/* the actual copy of pointers on the ring to obj_table.
> > > - * Placed here since identical code needed in both
> > > - * single and multi consumer dequeue functions */ -#define
> > > DEQUEUE_PTRS(r, ring_start, cons_head, obj_table, n, obj_type) do { \
> > > -	unsigned int i; \
> > > -	uint32_t idx = cons_head & (r)->mask; \
> > > -	const uint32_t size = (r)->size; \
> > > -	obj_type *ring = (obj_type *)ring_start; \
> > > -	if (likely(idx + n < size)) { \
> > > -		for (i = 0; i < (n & ~0x3); i += 4, idx += 4) {\
> > > -			obj_table[i] = ring[idx]; \
> > > -			obj_table[i + 1] = ring[idx + 1]; \
> > > -			obj_table[i + 2] = ring[idx + 2]; \
> > > -			obj_table[i + 3] = ring[idx + 3]; \
> > > -		} \
> > > -		switch (n & 0x3) { \
> > > -		case 3: \
> > > -			obj_table[i++] = ring[idx++]; /* fallthrough */ \
> > > -		case 2: \
> > > -			obj_table[i++] = ring[idx++]; /* fallthrough */ \
> > > -		case 1: \
> > > -			obj_table[i++] = ring[idx++]; \
> > > -		} \
> > > -	} else { \
> > > -		for (i = 0; idx < size; i++, idx++) \
> > > -			obj_table[i] = ring[idx]; \
> > > -		for (idx = 0; i < n; i++, idx++) \
> > > -			obj_table[i] = ring[idx]; \
> > > -	} \
> > > -} while (0)
> > > -
> > > -/* Between load and load. there might be cpu reorder in weak model
> > > - * (powerpc/arm).
> > > - * There are 2 choices for the users
> > > - * 1.use rmb() memory barrier
> > > - * 2.use one-direction load_acquire/store_release barrier,defined
> > > by
> > > - * CONFIG_RTE_USE_C11_MEM_MODEL=y
> > > - * It depends on performance test results.
> > > - * By default, move common functions to rte_ring_generic.h
> > > - */
> > > -#ifdef RTE_USE_C11_MEM_MODEL
> > > -#include "rte_ring_c11_mem.h"
> > > -#else
> > > -#include "rte_ring_generic.h"
> > > -#endif
> > > -
> > > -/**
> > > - * @internal Enqueue several objects on the ring
> > > - *
> > > -  * @param r
> > > - *   A pointer to the ring structure.
> > > - * @param obj_table
> > > - *   A pointer to a table of void * pointers (objects).
> > > - * @param n
> > > - *   The number of objects to add in the ring from the obj_table.
> > > - * @param behavior
> > > - *   RTE_RING_QUEUE_FIXED:    Enqueue a fixed number of items from a
> > > ring
> > > - *   RTE_RING_QUEUE_VARIABLE: Enqueue as many items as possible
> from
> > > ring
> > > - * @param is_sp
> > > - *   Indicates whether to use single producer or multi-producer head
> update
> > > - * @param free_space
> > > - *   returns the amount of space after the enqueue operation has
> finished
> > > - * @return
> > > - *   Actual number of objects enqueued.
> > > - *   If behavior == RTE_RING_QUEUE_FIXED, this will be 0 or n only.
> > > - */
> > > -static __rte_always_inline unsigned int
> > > -__rte_ring_do_enqueue(struct rte_ring *r, void * const *obj_table,
> > > -		 unsigned int n, enum rte_ring_queue_behavior behavior,
> > > -		 unsigned int is_sp, unsigned int *free_space)
> > > -{
> > > -	uint32_t prod_head, prod_next;
> > > -	uint32_t free_entries;
> > > -
> > > -	n = __rte_ring_move_prod_head(r, is_sp, n, behavior,
> > > -			&prod_head, &prod_next, &free_entries);
> > > -	if (n == 0)
> > > -		goto end;
> > > -
> > > -	ENQUEUE_PTRS(r, &r[1], prod_head, obj_table, n, void *);
> > > -
> > > -	update_tail(&r->prod, prod_head, prod_next, is_sp, 1);
> > > -end:
> > > -	if (free_space != NULL)
> > > -		*free_space = free_entries - n;
> > > -	return n;
> > > -}
> > > -
> > > -/**
> > > - * @internal Dequeue several objects from the ring
> > > - *
> > > - * @param r
> > > - *   A pointer to the ring structure.
> > > - * @param obj_table
> > > - *   A pointer to a table of void * pointers (objects).
> > > - * @param n
> > > - *   The number of objects to pull from the ring.
> > > - * @param behavior
> > > - *   RTE_RING_QUEUE_FIXED:    Dequeue a fixed number of items from a
> > > ring
> > > - *   RTE_RING_QUEUE_VARIABLE: Dequeue as many items as possible
> from
> > > ring
> > > - * @param is_sc
> > > - *   Indicates whether to use single consumer or multi-consumer head
> > > update
> > > - * @param available
> > > - *   returns the number of remaining ring entries after the dequeue has
> > > finished
> > > - * @return
> > > - *   - Actual number of objects dequeued.
> > > - *     If behavior == RTE_RING_QUEUE_FIXED, this will be 0 or n only.
> > > - */
> > > -static __rte_always_inline unsigned int
> > > -__rte_ring_do_dequeue(struct rte_ring *r, void **obj_table,
> > > -		 unsigned int n, enum rte_ring_queue_behavior behavior,
> > > -		 unsigned int is_sc, unsigned int *available)
> > > -{
> > > -	uint32_t cons_head, cons_next;
> > > -	uint32_t entries;
> > > -
> > > -	n = __rte_ring_move_cons_head(r, (int)is_sc, n, behavior,
> > > -			&cons_head, &cons_next, &entries);
> > > -	if (n == 0)
> > > -		goto end;
> > > -
> > > -	DEQUEUE_PTRS(r, &r[1], cons_head, obj_table, n, void *);
> > > -
> > > -	update_tail(&r->cons, cons_head, cons_next, is_sc, 0);
> > > -
> > > -end:
> > > -	if (available != NULL)
> > > -		*available = entries - n;
> > > -	return n;
> > > -}
> > > -
> > >  /**
> > >   * Enqueue several objects on the ring (multi-producers safe).
> > >   *
> > > @@ -375,8 +213,8 @@ static __rte_always_inline unsigned int
> > > rte_ring_mp_enqueue_bulk(struct rte_ring *r, void * const *obj_table,
> > >  			 unsigned int n, unsigned int *free_space)  {
> > > -	return __rte_ring_do_enqueue(r, obj_table, n,
> > > RTE_RING_QUEUE_FIXED,
> > > -			RTE_RING_SYNC_MT, free_space);
> > > +	return rte_ring_mp_enqueue_bulk_elem(r, obj_table, sizeof(void *),
> > > +			n, free_space);
> > >  }
> > >
> > >  /**
> > > @@ -398,8 +236,8 @@ static __rte_always_inline unsigned int
> > > rte_ring_sp_enqueue_bulk(struct rte_ring *r, void * const *obj_table,
> > >  			 unsigned int n, unsigned int *free_space)  {
> > > -	return __rte_ring_do_enqueue(r, obj_table, n,
> > > RTE_RING_QUEUE_FIXED,
> > > -			RTE_RING_SYNC_ST, free_space);
> > > +	return rte_ring_sp_enqueue_bulk_elem(r, obj_table, sizeof(void *),
> > > +			n, free_space);
> > >  }
> > >
> > >  /**
> > > @@ -425,24 +263,8 @@ static __rte_always_inline unsigned int
> > > rte_ring_enqueue_bulk(struct rte_ring *r, void * const *obj_table,
> > >  		      unsigned int n, unsigned int *free_space)  {
> > > -	switch (r->prod.sync_type) {
> > > -	case RTE_RING_SYNC_MT:
> > > -		return rte_ring_mp_enqueue_bulk(r, obj_table, n,
> > > free_space);
> > > -	case RTE_RING_SYNC_ST:
> > > -		return rte_ring_sp_enqueue_bulk(r, obj_table, n,
> > > free_space);
> > > -#ifdef ALLOW_EXPERIMENTAL_API
> > > -	case RTE_RING_SYNC_MT_RTS:
> > > -		return rte_ring_mp_rts_enqueue_bulk(r, obj_table, n,
> > > -			free_space);
> > > -	case RTE_RING_SYNC_MT_HTS:
> > > -		return rte_ring_mp_hts_enqueue_bulk(r, obj_table, n,
> > > -			free_space);
> > > -#endif
> > > -	}
> > > -
> > > -	/* valid ring should never reach this point */
> > > -	RTE_ASSERT(0);
> > > -	return 0;
> > > +	return rte_ring_enqueue_bulk_elem(r, obj_table, sizeof(void *),
> > > +			n, free_space);
> > >  }
> > >
> > >  /**
> > > @@ -462,7 +284,7 @@ rte_ring_enqueue_bulk(struct rte_ring *r, void *
> > > const *obj_table,  static __rte_always_inline int
> > > rte_ring_mp_enqueue(struct rte_ring *r, void *obj)  {
> > > -	return rte_ring_mp_enqueue_bulk(r, &obj, 1, NULL) ? 0 : -ENOBUFS;
> > > +	return rte_ring_mp_enqueue_elem(r, obj, sizeof(void *));
> > >  }
> > >
> > >  /**
> > > @@ -479,7 +301,7 @@ rte_ring_mp_enqueue(struct rte_ring *r, void
> > > *obj) static __rte_always_inline int  rte_ring_sp_enqueue(struct
> > > rte_ring *r, void
> > > *obj)  {
> > > -	return rte_ring_sp_enqueue_bulk(r, &obj, 1, NULL) ? 0 : -ENOBUFS;
> > > +	return rte_ring_sp_enqueue_elem(r, obj, sizeof(void *));
> > >  }
> > >
> > >  /**
> > > @@ -500,7 +322,7 @@ rte_ring_sp_enqueue(struct rte_ring *r, void
> > > *obj) static __rte_always_inline int  rte_ring_enqueue(struct
> > > rte_ring *r, void *obj) {
> > > -	return rte_ring_enqueue_bulk(r, &obj, 1, NULL) ? 0 : -ENOBUFS;
> > > +	return rte_ring_enqueue_elem(r, obj, sizeof(void *));
> > >  }
> > >
> > >  /**
> > > @@ -525,8 +347,8 @@ static __rte_always_inline unsigned int
> > > rte_ring_mc_dequeue_bulk(struct rte_ring *r, void **obj_table,
> > >  		unsigned int n, unsigned int *available)  {
> > > -	return __rte_ring_do_dequeue(r, obj_table, n,
> > > RTE_RING_QUEUE_FIXED,
> > > -			RTE_RING_SYNC_MT, available);
> > > +	return rte_ring_mc_dequeue_bulk_elem(r, obj_table, sizeof(void *),
> > > +			n, available);
> > >  }
> > >
> > >  /**
> > > @@ -549,8 +371,8 @@ static __rte_always_inline unsigned int
> > > rte_ring_sc_dequeue_bulk(struct rte_ring *r, void **obj_table,
> > >  		unsigned int n, unsigned int *available)  {
> > > -	return __rte_ring_do_dequeue(r, obj_table, n,
> > > RTE_RING_QUEUE_FIXED,
> > > -			RTE_RING_SYNC_ST, available);
> > > +	return rte_ring_sc_dequeue_bulk_elem(r, obj_table, sizeof(void *),
> > > +			n, available);
> > >  }
> > >
> > >  /**
> > > @@ -576,22 +398,8 @@ static __rte_always_inline unsigned int
> > > rte_ring_dequeue_bulk(struct rte_ring *r, void **obj_table, unsigned int
> n,
> > >  		unsigned int *available)
> > >  {
> > > -	switch (r->cons.sync_type) {
> > > -	case RTE_RING_SYNC_MT:
> > > -		return rte_ring_mc_dequeue_bulk(r, obj_table, n, available);
> > > -	case RTE_RING_SYNC_ST:
> > > -		return rte_ring_sc_dequeue_bulk(r, obj_table, n, available);
> > > -#ifdef ALLOW_EXPERIMENTAL_API
> > > -	case RTE_RING_SYNC_MT_RTS:
> > > -		return rte_ring_mc_rts_dequeue_bulk(r, obj_table, n,
> > > available);
> > > -	case RTE_RING_SYNC_MT_HTS:
> > > -		return rte_ring_mc_hts_dequeue_bulk(r, obj_table, n,
> > > available);
> > > -#endif
> > > -	}
> > > -
> > > -	/* valid ring should never reach this point */
> > > -	RTE_ASSERT(0);
> > > -	return 0;
> > > +	return rte_ring_dequeue_bulk_elem(r, obj_table, sizeof(void *),
> > > +			n, available);
> > >  }
> > >
> > >  /**
> > > @@ -612,7 +420,7 @@ rte_ring_dequeue_bulk(struct rte_ring *r, void
> > > **obj_table, unsigned int n,  static __rte_always_inline int
> > > rte_ring_mc_dequeue(struct rte_ring *r, void **obj_p)  {
> > > -	return rte_ring_mc_dequeue_bulk(r, obj_p, 1, NULL)  ? 0 : -ENOENT;
> > > +	return rte_ring_mc_dequeue_elem(r, obj_p, sizeof(void *));
> > >  }
> > >
> > >  /**
> > > @@ -630,7 +438,7 @@ rte_ring_mc_dequeue(struct rte_ring *r, void
> > > **obj_p)  static __rte_always_inline int  rte_ring_sc_dequeue(struct
> > > rte_ring *r, void **obj_p)  {
> > > -	return rte_ring_sc_dequeue_bulk(r, obj_p, 1, NULL) ? 0 : -ENOENT;
> > > +	return rte_ring_sc_dequeue_elem(r, obj_p, sizeof(void *));
> > >  }
> > >
> > >  /**
> > > @@ -652,7 +460,7 @@ rte_ring_sc_dequeue(struct rte_ring *r, void
> > > **obj_p) static __rte_always_inline int  rte_ring_dequeue(struct
> > > rte_ring *r, void
> > > **obj_p)  {
> > > -	return rte_ring_dequeue_bulk(r, obj_p, 1, NULL) ? 0 : -ENOENT;
> > > +	return rte_ring_dequeue_elem(r, obj_p, sizeof(void *));
> > >  }
> > >
> > >  /**
> > > @@ -860,8 +668,8 @@ static __rte_always_inline unsigned int
> > > rte_ring_mp_enqueue_burst(struct rte_ring *r, void * const *obj_table,
> > >  			 unsigned int n, unsigned int *free_space)  {
> > > -	return __rte_ring_do_enqueue(r, obj_table, n,
> > > -			RTE_RING_QUEUE_VARIABLE, RTE_RING_SYNC_MT,
> > > free_space);
> > > +	return rte_ring_mp_enqueue_burst_elem(r, obj_table, sizeof(void
> > > *),
> > > +			n, free_space);
> > >  }
> > >
> > >  /**
> > > @@ -883,8 +691,8 @@ static __rte_always_inline unsigned int
> > > rte_ring_sp_enqueue_burst(struct rte_ring *r, void * const *obj_table,
> > >  			 unsigned int n, unsigned int *free_space)  {
> > > -	return __rte_ring_do_enqueue(r, obj_table, n,
> > > -			RTE_RING_QUEUE_VARIABLE, RTE_RING_SYNC_ST,
> > > free_space);
> > > +	return rte_ring_sp_enqueue_burst_elem(r, obj_table, sizeof(void *),
> > > +			n, free_space);
> > >  }
> > >
> > >  /**
> > > @@ -910,24 +718,8 @@ static __rte_always_inline unsigned int
> > > rte_ring_enqueue_burst(struct rte_ring *r, void * const *obj_table,
> > >  		      unsigned int n, unsigned int *free_space)  {
> > > -	switch (r->prod.sync_type) {
> > > -	case RTE_RING_SYNC_MT:
> > > -		return rte_ring_mp_enqueue_burst(r, obj_table, n,
> > > free_space);
> > > -	case RTE_RING_SYNC_ST:
> > > -		return rte_ring_sp_enqueue_burst(r, obj_table, n,
> > > free_space);
> > > -#ifdef ALLOW_EXPERIMENTAL_API
> > > -	case RTE_RING_SYNC_MT_RTS:
> > > -		return rte_ring_mp_rts_enqueue_burst(r, obj_table, n,
> > > -			free_space);
> > > -	case RTE_RING_SYNC_MT_HTS:
> > > -		return rte_ring_mp_hts_enqueue_burst(r, obj_table, n,
> > > -			free_space);
> > > -#endif
> > > -	}
> > > -
> > > -	/* valid ring should never reach this point */
> > > -	RTE_ASSERT(0);
> > > -	return 0;
> > > +	return rte_ring_enqueue_burst_elem(r, obj_table, sizeof(void *),
> > > +			n, free_space);
> > >  }
> > >
> > >  /**
> > > @@ -954,8 +746,8 @@ static __rte_always_inline unsigned int
> > > rte_ring_mc_dequeue_burst(struct rte_ring *r, void **obj_table,
> > >  		unsigned int n, unsigned int *available)  {
> > > -	return __rte_ring_do_dequeue(r, obj_table, n,
> > > -			RTE_RING_QUEUE_VARIABLE, RTE_RING_SYNC_MT,
> > > available);
> > > +	return rte_ring_mc_dequeue_burst_elem(r, obj_table, sizeof(void
> > > *),
> > > +			n, available);
> > >  }
> > >
> > >  /**
> > > @@ -979,8 +771,8 @@ static __rte_always_inline unsigned int
> > > rte_ring_sc_dequeue_burst(struct rte_ring *r, void **obj_table,
> > >  		unsigned int n, unsigned int *available)  {
> > > -	return __rte_ring_do_dequeue(r, obj_table, n,
> > > -			RTE_RING_QUEUE_VARIABLE, RTE_RING_SYNC_ST,
> > > available);
> > > +	return rte_ring_sc_dequeue_burst_elem(r, obj_table, sizeof(void *),
> > > +			n, available);
> > >  }
> > >
> > >  /**
> > > @@ -1006,24 +798,8 @@ static __rte_always_inline unsigned int
> > > rte_ring_dequeue_burst(struct rte_ring *r, void **obj_table,
> > >  		unsigned int n, unsigned int *available)  {
> > > -	switch (r->cons.sync_type) {
> > > -	case RTE_RING_SYNC_MT:
> > > -		return rte_ring_mc_dequeue_burst(r, obj_table, n,
> > > available);
> > > -	case RTE_RING_SYNC_ST:
> > > -		return rte_ring_sc_dequeue_burst(r, obj_table, n, available);
> > > -#ifdef ALLOW_EXPERIMENTAL_API
> > > -	case RTE_RING_SYNC_MT_RTS:
> > > -		return rte_ring_mc_rts_dequeue_burst(r, obj_table, n,
> > > -			available);
> > > -	case RTE_RING_SYNC_MT_HTS:
> > > -		return rte_ring_mc_hts_dequeue_burst(r, obj_table, n,
> > > -			available);
> > > -#endif
> > > -	}
> > > -
> > > -	/* valid ring should never reach this point */
> > > -	RTE_ASSERT(0);
> > > -	return 0;
> > > +	return rte_ring_dequeue_burst_elem(r, obj_table, sizeof(void *),
> > > +			n, available);
> > >  }
> > >
> > >  #ifdef __cplusplus
> > > --
> > > 2.17.1


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [dpdk-dev] [PATCH 3/3] ring: use element APIs to implement legacy APIs
  2020-07-07 20:07       ` David Christensen
@ 2020-07-08  1:30         ` Feifei Wang
  0 siblings, 0 replies; 23+ messages in thread
From: Feifei Wang @ 2020-07-08  1:30 UTC (permalink / raw)
  To: David Christensen, Ananyev, Konstantin, Honnappa Nagarahalli
  Cc: dev, nd, Ruifeng Wang, nd

Hi, David

> -----Original Message-----
> From: David Christensen <drc@linux.vnet.ibm.com>
> Sent: 2020年7月8日 4:07
> To: Ananyev, Konstantin <konstantin.ananyev@intel.com>; Feifei Wang
> <Feifei.Wang2@arm.com>; Honnappa Nagarahalli
> <Honnappa.Nagarahalli@arm.com>
> Cc: dev@dpdk.org; nd <nd@arm.com>; Ruifeng Wang
> <Ruifeng.Wang@arm.com>
> Subject: Re: [PATCH 3/3] ring: use element APIs to implement legacy APIs
> 
> 
> 
> On 7/7/20 7:04 AM, Ananyev, Konstantin wrote:
> >
> > Hi Feifei,
> >
> >   > Hi, Konstantin, David
> >>
> >> I'm Feifei Wang from Arm. Sorry to make the following request:
> >> Would you please do some ring performance tests of this patch in your
> platforms at the time you are free?
> >> And I want to know whether this patch has a significant impact on other
> platforms except ARM.
> >
> > I run few tests on SKX box and so far didn’t notice any real perf difference.
> > Konstantin
> >
> 
Thanks very much for presenting these test results.  
Feifei
> Full performance results for IBM POWER9 system below.  I ran the tests
> twice for each version and the results were consistent.
> 
>                                       without this patch   with this patch
> Testing burst enq/deq
> legacy APIs: SP/SC: burst (size: 8):          43.63              43.63
> legacy APIs: SP/SC: burst (size: 32):         50.07              50.04
> legacy APIs: MP/MC: burst (size: 8):          58.43              58.42
> legacy APIs: MP/MC: burst (size: 32):         65.52              65.51
> Testing bulk enq/deq
> legacy APIs: SP/SC: bulk (size: 8):           43.61              43.61
> legacy APIs: SP/SC: bulk (size: 32):          50.05              50.02
> legacy APIs: MP/MC: bulk (size: 8):           58.43              58.43
> legacy APIs: MP/MC: bulk (size: 32):          65.50              65.49
> 
> HW:
> Architecture:        ppc64le
> Byte Order:          Little Endian
> CPU(s):              128
> On-line CPU(s) list: 0-127
> Thread(s) per core:  4
> Core(s) per socket:  16
> Socket(s):           2
> NUMA node(s):        6
> Model:               2.3 (pvr 004e 1203)
> Model name:          POWER9, altivec supported
> CPU max MHz:         3800.0000
> CPU min MHz:         2300.0000
> L1d cache:           32K
> L1i cache:           32K
> L2 cache:            512K
> L3 cache:            10240K
> NUMA node0 CPU(s):   0-63
> NUMA node8 CPU(s):   64-127
> 
> OS: RHEL 8.2
> 
> GCC: gcc version 8.3.1 20191121 (Red Hat 8.3.1-5) (GCC)
> 
> DPDK: 20.08.0-rc0 (a8550b773)
> 
> 
> 
> Unpatched
> ===========
> sudo app/test/dpdk-test -l 68,69
> EAL: Detected 128 lcore(s)
> EAL: Detected 2 NUMA nodes
> EAL: Multi-process socket /var/run/dpdk/rte/mp_socket
> EAL: Selected IOVA mode 'VA'
> EAL: No available hugepages reported in hugepages-2048kB
> EAL: Probing VFIO support...
> EAL: VFIO support initialized
> EAL: Probe PCI driver: net_mlx5 (15b3:1019) device: 0000:01:00.0 (socket 0)
> EAL: Probe PCI driver: net_mlx5 (15b3:1019) device: 0000:01:00.1 (socket 0)
> EAL: Probe PCI driver: net_mlx5 (15b3:1019) device: 0030:01:00.0 (socket 8)
> EAL: Probe PCI driver: net_mlx5 (15b3:1019) device: 0030:01:00.1 (socket 8)
> EAL:   using IOMMU type 7 (sPAPR)
> EAL: Probe PCI driver: net_i40e (8086:1583) device: 0034:01:00.0 (socket 8)
> EAL: Probe PCI driver: net_i40e (8086:1583) device: 0034:01:00.1 (socket 8)
> APP: HPET is not enabled, using TSC as default timer
> RTE>>ring_perf_autotest
> 
> ### Testing single element enq/deq ###
> legacy APIs: SP/SC: single: 42.01
> legacy APIs: MP/MC: single: 56.27
> 
> ### Testing burst enq/deq ###
> legacy APIs: SP/SC: burst (size: 8): 43.63 legacy APIs: SP/SC: burst (size: 32):
> 50.07 legacy APIs: MP/MC: burst (size: 8): 58.43 legacy APIs: MP/MC: burst
> (size: 32): 65.52
> 
> ### Testing bulk enq/deq ###
> legacy APIs: SP/SC: bulk (size: 8): 43.61 legacy APIs: SP/SC: bulk (size: 32):
> 50.05 legacy APIs: MP/MC: bulk (size: 8): 58.43 legacy APIs: MP/MC: bulk (size:
> 32): 65.50
> 
> ### Testing empty bulk deq ###
> legacy APIs: SP/SC: bulk (size: 8): 7.16 legacy APIs: MP/MC: bulk (size: 8): 7.16
> 
> ### Testing using two hyperthreads ###
> legacy APIs: SP/SC: bulk (size: 8): 12.44 legacy APIs: MP/MC: bulk (size: 8):
> 16.19 legacy APIs: SP/SC: bulk (size: 32): 3.10 legacy APIs: MP/MC: bulk (size:
> 32): 3.64
> 
> ### Testing using all slave nodes ###
> 
> Bulk enq/dequeue count on size 8
> Core [68] count = 362382
> Core [69] count = 362516
> Total count (size: 8): 724898
> 
> Bulk enq/dequeue count on size 32
> Core [68] count = 361565
> Core [69] count = 361852
> Total count (size: 32): 723417
> 
> ### Testing single element enq/deq ###
> elem APIs: element size 16B: SP/SC: single: 42.81 elem APIs: element size 16B:
> MP/MC: single: 56.78
> 
> ### Testing burst enq/deq ###
> elem APIs: element size 16B: SP/SC: burst (size: 8): 45.04 elem APIs: element
> size 16B: SP/SC: burst (size: 32): 59.27 elem APIs: element size 16B: MP/MC:
> burst (size: 8): 60.68 elem APIs: element size 16B: MP/MC: burst (size: 32):
> 75.00
> 
> ### Testing bulk enq/deq ###
> elem APIs: element size 16B: SP/SC: bulk (size: 8): 45.05 elem APIs: element
> size 16B: SP/SC: bulk (size: 32): 59.23 elem APIs: element size 16B: MP/MC:
> bulk (size: 8): 60.64 elem APIs: element size 16B: MP/MC: bulk (size: 32):
> 75.11
> 
> ### Testing empty bulk deq ###
> elem APIs: element size 16B: SP/SC: bulk (size: 8): 7.16 elem APIs: element
> size 16B: MP/MC: bulk (size: 8): 7.16
> 
> ### Testing using two hyperthreads ###
> elem APIs: element size 16B: SP/SC: bulk (size: 8): 12.15 elem APIs: element
> size 16B: MP/MC: bulk (size: 8): 15.55 elem APIs: element size 16B: SP/SC:
> bulk (size: 32): 3.22 elem APIs: element size 16B: MP/MC: bulk (size: 32): 3.86
> 
> ### Testing using all slave nodes ###
> 
> Bulk enq/dequeue count on size 8
> Core [68] count = 374327
> Core [69] count = 374433
> Total count (size: 8): 748760
> 
> Bulk enq/dequeue count on size 32
> Core [68] count = 324111
> Core [69] count = 320038
> Total count (size: 32): 644149
> Test OK
> 
> Patched
> =======
> $ sudo app/test/dpdk-test -l 68,69
> EAL: Detected 128 lcore(s)
> EAL: Detected 2 NUMA nodes
> EAL: Multi-process socket /var/run/dpdk/rte/mp_socket
> EAL: Selected IOVA mode 'VA'
> EAL: No available hugepages reported in hugepages-2048kB
> EAL: Probing VFIO support...
> EAL: VFIO support initialized
> EAL: Probe PCI driver: net_mlx5 (15b3:1019) device: 0000:01:00.0 (socket 0)
> EAL: Probe PCI driver: net_mlx5 (15b3:1019) device: 0000:01:00.1 (socket 0)
> EAL: Probe PCI driver: net_mlx5 (15b3:1019) device: 0030:01:00.0 (socket 8)
> EAL: Probe PCI driver: net_mlx5 (15b3:1019) device: 0030:01:00.1 (socket 8)
> EAL:   using IOMMU type 7 (sPAPR)
> EAL: Probe PCI driver: net_i40e (8086:1583) device: 0034:01:00.0 (socket 8)
> EAL: Probe PCI driver: net_i40e (8086:1583) device: 0034:01:00.1 (socket 8)
> APP: HPET is not enabled, using TSC as default timer
> RTE>>ring_perf_autotest
> 
> ### Testing single element enq/deq ###
> legacy APIs: SP/SC: single: 42.00
> legacy APIs: MP/MC: single: 56.27
> 
> ### Testing burst enq/deq ###
> legacy APIs: SP/SC: burst (size: 8): 43.63 legacy APIs: SP/SC: burst (size: 32):
> 50.04 legacy APIs: MP/MC: burst (size: 8): 58.42 legacy APIs: MP/MC: burst
> (size: 32): 65.51
> 
> ### Testing bulk enq/deq ###
> legacy APIs: SP/SC: bulk (size: 8): 43.61 legacy APIs: SP/SC: bulk (size: 32):
> 50.02 legacy APIs: MP/MC: bulk (size: 8): 58.43 legacy APIs: MP/MC: bulk (size:
> 32): 65.49
> 
> ### Testing empty bulk deq ###
> legacy APIs: SP/SC: bulk (size: 8): 7.16 legacy APIs: MP/MC: bulk (size: 8): 7.16
> 
> ### Testing using two hyperthreads ###
> legacy APIs: SP/SC: bulk (size: 8): 12.43 legacy APIs: MP/MC: bulk (size: 8):
> 16.17 legacy APIs: SP/SC: bulk (size: 32): 3.10 legacy APIs: MP/MC: bulk (size:
> 32): 3.65
> 
> ### Testing using all slave nodes ###
> 
> Bulk enq/dequeue count on size 8
> Core [68] count = 363208
> Core [69] count = 363334
> Total count (size: 8): 726542
> 
> Bulk enq/dequeue count on size 32
> Core [68] count = 361592
> Core [69] count = 361690
> Total count (size: 32): 723282
> 
> ### Testing single element enq/deq ###
> elem APIs: element size 16B: SP/SC: single: 42.78 elem APIs: element size 16B:
> MP/MC: single: 56.75
> 
> ### Testing burst enq/deq ###
> elem APIs: element size 16B: SP/SC: burst (size: 8): 45.04 elem APIs: element
> size 16B: SP/SC: burst (size: 32): 59.27 elem APIs: element size 16B: MP/MC:
> burst (size: 8): 60.66 elem APIs: element size 16B: MP/MC: burst (size: 32):
> 75.03
> 
> ### Testing bulk enq/deq ###
> elem APIs: element size 16B: SP/SC: bulk (size: 8): 45.04 elem APIs: element
> size 16B: SP/SC: bulk (size: 32): 59.33 elem APIs: element size 16B: MP/MC:
> bulk (size: 8): 60.65 elem APIs: element size 16B: MP/MC: bulk (size: 32):
> 75.04
> 
> ### Testing empty bulk deq ###
> elem APIs: element size 16B: SP/SC: bulk (size: 8): 7.16 elem APIs: element
> size 16B: MP/MC: bulk (size: 8): 7.16
> 
> ### Testing using two hyperthreads ###
> elem APIs: element size 16B: SP/SC: bulk (size: 8): 12.14 elem APIs: element
> size 16B: MP/MC: bulk (size: 8): 15.56 elem APIs: element size 16B: SP/SC:
> bulk (size: 32): 3.22 elem APIs: element size 16B: MP/MC: bulk (size: 32): 3.86
> 
> ### Testing using all slave nodes ###
> 
> Bulk enq/dequeue count on size 8
> Core [68] count = 372618
> Core [69] count = 372415
> Total count (size: 8): 745033
> 
> Bulk enq/dequeue count on size 32
> Core [68] count = 318784
> Core [69] count = 316066
> Total count (size: 32): 634850
> Test OK

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [dpdk-dev] [PATCH 3/3] ring: use element APIs to implement legacy APIs
  2020-07-07 16:21       ` Honnappa Nagarahalli
@ 2020-07-08  3:19         ` Feifei Wang
  0 siblings, 0 replies; 23+ messages in thread
From: Feifei Wang @ 2020-07-08  3:19 UTC (permalink / raw)
  To: Honnappa Nagarahalli, Ananyev, Konstantin, drc
  Cc: dev, nd, Ruifeng Wang, nd, nd



> -----Original Message-----
> From: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>
> Sent: 2020年7月8日 0:22
> To: Ananyev, Konstantin <konstantin.ananyev@intel.com>; Feifei Wang
> <Feifei.Wang2@arm.com>; drc@linux.vnet.ibm.com
> Cc: dev@dpdk.org; nd <nd@arm.com>; Ruifeng Wang
> <Ruifeng.Wang@arm.com>; nd <nd@arm.com>; Honnappa Nagarahalli
> <Honnappa.Nagarahalli@arm.com>; nd <nd@arm.com>
> Subject: RE: [PATCH 3/3] ring: use element APIs to implement legacy APIs
> 
> <snip>
> 
> >
> > Hi Feifei,
> >
> >  > Hi, Konstantin, David
> > >
> > > I'm Feifei Wang from Arm. Sorry to make the following request:
> > > Would you please do some ring performance tests of this patch in
> > > your
> > platforms at the time you are free?
> > > And I want to know whether this patch has a significant impact on
> > > other
> > platforms except ARM.
> >
> > I run few tests on SKX box and so far didn’t notice any real perf difference.
> Thanks Konstantin for testing this.
> 
> > Konstantin
> >
> > > Thanks very much.
> > > Feifei
> > >
> > > > -----Original Message-----
> > > > From: Feifei Wang <feifei.wang2@arm.com>
> > > > Sent: 2020年7月3日 18:27
> > > > To: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>;
> > > > Konstantin Ananyev <konstantin.ananyev@intel.com>
> > > > Cc: dev@dpdk.org; nd <nd@arm.com>; Feifei Wang
> > > > <Feifei.Wang2@arm.com>
> > > > Subject: [PATCH 3/3] ring: use element APIs to implement legacy
> > > > APIs
> > > >
> > > > Use rte_ring_xxx_elem_xxx APIs to replace legacy API implementation.
> > > > This reduces code duplication and improves code maintenance.
> > > >
> > > > aarch64:
> > > > HW:N1sdp, 1 socket, 4 cores, 1 thread/core, 2.6GHz OS:Ubuntu
> > > > 18.04.1 LTS,
> > > > Kernel: 5.4.0+
> > > > DPDK: 20.05-rc3, Configuration: arm64-n1sdp-linux-gcc
> > > > gcc:9.2.1
> > > >
> > > > $sudo ./arm64-n1sdp-linux-gcc/app/test -l 1-2
> > > > RTE>>ring_perf_autotest
> > > >
> > > > test results on aarch64 in the case of esize 4:
> > > >
> > > >                                      without this patch   with this patch
> > > > Testing burst enq/deq
> > > > legacy APIs: SP/SC: burst (size: 8):          1.11              1.10
> > > > legacy APIs: SP/SC: burst (size: 32):         1.95              1.97
> > > > legacy APIs: MP/MC: burst (size: 8):          1.86              1.94
> > > > legacy APIs: MP/MC: burst (size: 32):         2.65              2.69
> > > > Testing bulk enq/deq
> > > > legacy APIs: SP/SC: bulk (size: 8):           1.08              1.09
> > > > legacy APIs: SP/SC: bulk (size: 32):          1.89              1.90
> > > > legacy APIs: MP/MC: bulk (size: 8):           1.85              1.98
> > > > legacy APIs: MP/MC: bulk (size: 32):          2.65              2.69
> > > >
> > > > x86:
> > > > HW: dell, CPU Intel(R) Xeon(R) Gold 6240, 2 sockets, 18
> > > > cores/socket,
> > > > 1 thread/core, 3.3GHz
> > > > OS: Ubuntu 20.04 LTS, Kernel: 5.4.0-37-generic
> > > > DPDK: 20.05-rc3, Configuration: x86_64-native-linuxapp-gcc
> > > > gcc: 9.3.0
> > > >
> > > > $sudo ./x86_64-native-linuxapp-gcc/app/test -l 14,16
> > > > RTE>>ring_perf_autotest
> > > >
> > > > test results on x86 in the case of esize 4:
> > > >
> > > >                                      without this patch   with this patch
> > > > Testing burst enq/deq
> > > > legacy APIs: SP/SC: burst (size: 8):          29.35             27.78
> > > > legacy APIs: SP/SC: burst (size: 32):         73.11             73.39
> > > > legacy APIs: MP/MC: burst (size: 8):          62.36             62.37
> > > > legacy APIs: MP/MC: burst (size: 32):         101.01            101.03
> > > > Testing bulk enq/deq
> > > > legacy APIs: SP/SC: bulk (size: 8):           25.94             29.55
> > > > legacy APIs: SP/SC: bulk (size: 32):          70.00             78.87
> > > > legacy APIs: MP/MC: bulk (size: 8):           63.41             62.48
> > > > legacy APIs: MP/MC: bulk (size: 32):          105.86            103.84
> > > >
> > > > Summary:
> > > > In aarch64 server with this patch, there is almost no performance
> > difference.
> > > > In x86 server with this patch, in some cases, the performance
> > > > slightly improve, in other cases, the performance slightly drop.
> I suggest removing the perf data from the commit message, as Konstantin
> confirmed there is no perf drop on x86 either.
Ok, that's all right.
> 
> > > >
> > > > Signed-off-by: Feifei Wang <feifei.wang2@arm.com>
> > > > Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
> > > > Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
> > > > ---
> > > >  lib/librte_ring/rte_ring.h | 284
> > > > ++++---------------------------------
> > > >  1 file changed, 30 insertions(+), 254 deletions(-)
> > > >
> > > > diff --git a/lib/librte_ring/rte_ring.h
> > > > b/lib/librte_ring/rte_ring.h index 35f3f8c42..2a2190bfc 100644
> > > > --- a/lib/librte_ring/rte_ring.h
> > > > +++ b/lib/librte_ring/rte_ring.h
> > > > @@ -191,168 +191,6 @@ void rte_ring_free(struct rte_ring *r);
> > > >   */
> > > >  void rte_ring_dump(FILE *f, const struct rte_ring *r);
> > > >
> > > > -/* the actual enqueue of pointers on the ring.
> > > > - * Placed here since identical code needed in both
> > > > - * single and multi producer enqueue functions */ -#define
> > > > ENQUEUE_PTRS(r, ring_start, prod_head, obj_table, n, obj_type) do
> > > > { \ -unsigned int i; \ -const uint32_t size = (r)->size; \
> > > > -uint32_t idx = prod_head & (r)->mask; \ -obj_type *ring =
> > > > (obj_type *)ring_start; \ -if (likely(idx + n < size)) { \ -for (i
> > > > = 0; i < (n & ~0x3); i += 4, idx += 4) { \ -ring[idx] =
> > > > obj_table[i]; \ -ring[idx + 1] = obj_table[i + 1]; \ -ring[idx +
> > > > 2] = obj_table[i + 2]; \ -ring[idx + 3] = obj_table[i + 3]; \ -} \
> > > > -switch (n & 0x3) { \ -case 3: \ -ring[idx++] = obj_table[i++]; /*
> > > > fallthrough */ \ -case 2: \ -ring[idx++] = obj_table[i++]; /*
> > > > fallthrough */ \ -case 1: \ -ring[idx++] = obj_table[i++]; \ -} \
> > > > -} else { \ -for (i = 0; idx < size; i++, idx++)\ -ring[idx] =
> > > > obj_table[i]; \ -for (idx = 0; i < n; i++, idx++) \ -ring[idx] =
> > > > obj_table[i]; \ -} \ -} while (0)
> > > > -
> > > > -/* the actual copy of pointers on the ring to obj_table.
> > > > - * Placed here since identical code needed in both
> > > > - * single and multi consumer dequeue functions */ -#define
> > > > DEQUEUE_PTRS(r, ring_start, cons_head, obj_table, n, obj_type) do
> > > > { \ -unsigned int i; \ -uint32_t idx = cons_head & (r)->mask; \
> > > > -const uint32_t size = (r)->size; \ -obj_type *ring = (obj_type
> > > > *)ring_start; \ -if (likely(idx + n < size)) { \ -for (i = 0; i <
> > > > (n & ~0x3); i += 4, idx += 4) {\ -obj_table[i] = ring[idx]; \
> > > > -obj_table[i + 1] = ring[idx + 1]; \ -obj_table[i + 2] = ring[idx
> > > > + 2]; \ -obj_table[i + 3] = ring[idx + 3]; \ -} \ -switch (n &
> > > > 0x3) { \ -case 3: \ -obj_table[i++] = ring[idx++]; /* fallthrough
> > > > */ \ -case 2: \ -obj_table[i++] = ring[idx++]; /* fallthrough */ \
> > > > -case 1: \ -obj_table[i++] = ring[idx++]; \ -} \ -} else { \ -for
> > > > (i = 0; idx < size; i++, idx++) \ -obj_table[i] = ring[idx]; \
> > > > -for (idx = 0; i < n; i++, idx++) \ -obj_table[i] = ring[idx]; \
> > > > -} \ -} while (0)
> > > > -
> > > > -/* Between load and load. there might be cpu reorder in weak
> > > > model
> > > > - * (powerpc/arm).
> > > > - * There are 2 choices for the users
> > > > - * 1.use rmb() memory barrier
> > > > - * 2.use one-direction load_acquire/store_release barrier,defined
> > > > by
> > > > - * CONFIG_RTE_USE_C11_MEM_MODEL=y
> > > > - * It depends on performance test results.
> > > > - * By default, move common functions to rte_ring_generic.h
> > > > - */
> > > > -#ifdef RTE_USE_C11_MEM_MODEL
> > > > -#include "rte_ring_c11_mem.h"
> > > > -#else
> > > > -#include "rte_ring_generic.h"
> > > > -#endif
> > > > -
> > > > -/**
> > > > - * @internal Enqueue several objects on the ring
> > > > - *
> > > > -  * @param r
> > > > - *   A pointer to the ring structure.
> > > > - * @param obj_table
> > > > - *   A pointer to a table of void * pointers (objects).
> > > > - * @param n
> > > > - *   The number of objects to add in the ring from the obj_table.
> > > > - * @param behavior
> > > > - *   RTE_RING_QUEUE_FIXED:    Enqueue a fixed number of items from
> a
> > > > ring
> > > > - *   RTE_RING_QUEUE_VARIABLE: Enqueue as many items as possible
> > from
> > > > ring
> > > > - * @param is_sp
> > > > - *   Indicates whether to use single producer or multi-producer head
> > update
> > > > - * @param free_space
> > > > - *   returns the amount of space after the enqueue operation has
> finished
> > > > - * @return
> > > > - *   Actual number of objects enqueued.
> > > > - *   If behavior == RTE_RING_QUEUE_FIXED, this will be 0 or n only.
> > > > - */
> > > > -static __rte_always_inline unsigned int
> > > > -__rte_ring_do_enqueue(struct rte_ring *r, void * const
> > > > *obj_table,
> > > > - unsigned int n, enum rte_ring_queue_behavior behavior,
> > > > - unsigned int is_sp, unsigned int *free_space) -{ -uint32_t
> > > > prod_head, prod_next; -uint32_t free_entries;
> > > > -
> > > > -n = __rte_ring_move_prod_head(r, is_sp, n, behavior, -&prod_head,
> > > > &prod_next, &free_entries); -if (n == 0) -goto end;
> > > > -
> > > > -ENQUEUE_PTRS(r, &r[1], prod_head, obj_table, n, void *);
> > > > -
> > > > -update_tail(&r->prod, prod_head, prod_next, is_sp, 1);
> > > > -end:
> > > > -if (free_space != NULL)
> > > > -*free_space = free_entries - n;
> > > > -return n;
> > > > -}
> > > > -
> > > > -/**
> > > > - * @internal Dequeue several objects from the ring
> > > > - *
> > > > - * @param r
> > > > - *   A pointer to the ring structure.
> > > > - * @param obj_table
> > > > - *   A pointer to a table of void * pointers (objects).
> > > > - * @param n
> > > > - *   The number of objects to pull from the ring.
> > > > - * @param behavior
> > > > - *   RTE_RING_QUEUE_FIXED:    Dequeue a fixed number of items from
> a
> > > > ring
> > > > - *   RTE_RING_QUEUE_VARIABLE: Dequeue as many items as possible
> > from
> > > > ring
> > > > - * @param is_sc
> > > > - *   Indicates whether to use single consumer or multi-consumer head
> > > > update
> > > > - * @param available
> > > > - *   returns the number of remaining ring entries after the dequeue
> has
> > > > finished
> > > > - * @return
> > > > - *   - Actual number of objects dequeued.
> > > > - *     If behavior == RTE_RING_QUEUE_FIXED, this will be 0 or n only.
> > > > - */
> > > > -static __rte_always_inline unsigned int
> > > > -__rte_ring_do_dequeue(struct rte_ring *r, void **obj_table,
> > > > - unsigned int n, enum rte_ring_queue_behavior behavior,
> > > > - unsigned int is_sc, unsigned int *available) -{ -uint32_t
> > > > cons_head, cons_next; -uint32_t entries;
> > > > -
> > > > -n = __rte_ring_move_cons_head(r, (int)is_sc, n, behavior,
> > > > -&cons_head, &cons_next, &entries); -if (n == 0) -goto end;
> > > > -
> > > > -DEQUEUE_PTRS(r, &r[1], cons_head, obj_table, n, void *);
> > > > -
> > > > -update_tail(&r->cons, cons_head, cons_next, is_sc, 0);
> > > > -
> > > > -end:
> > > > -if (available != NULL)
> > > > -*available = entries - n;
> > > > -return n;
> > > > -}
> > > > -
> > > >  /**
> > > >   * Enqueue several objects on the ring (multi-producers safe).
> > > >   *
> > > > @@ -375,8 +213,8 @@ static __rte_always_inline unsigned int
> > > > rte_ring_mp_enqueue_bulk(struct rte_ring *r, void * const *obj_table,
> > > >   unsigned int n, unsigned int *free_space)  { -return
> > > > __rte_ring_do_enqueue(r, obj_table, n, RTE_RING_QUEUE_FIXED,
> > > > -RTE_RING_SYNC_MT, free_space);
> > > > +return rte_ring_mp_enqueue_bulk_elem(r, obj_table, sizeof(void
> > > > +*), n, free_space);
> > > >  }
> > > >
> > > >  /**
> > > > @@ -398,8 +236,8 @@ static __rte_always_inline unsigned int
> > > > rte_ring_sp_enqueue_bulk(struct rte_ring *r, void * const *obj_table,
> > > >   unsigned int n, unsigned int *free_space)  { -return
> > > > __rte_ring_do_enqueue(r, obj_table, n, RTE_RING_QUEUE_FIXED,
> > > > -RTE_RING_SYNC_ST, free_space);
> > > > +return rte_ring_sp_enqueue_bulk_elem(r, obj_table, sizeof(void
> > > > +*), n, free_space);
> > > >  }
> > > >
> > > >  /**
> > > > @@ -425,24 +263,8 @@ static __rte_always_inline unsigned int
> > > > rte_ring_enqueue_bulk(struct rte_ring *r, void * const *obj_table,
> > > >        unsigned int n, unsigned int *free_space)  { -switch
> > > > (r->prod.sync_type) { -case RTE_RING_SYNC_MT:
> > > > -return rte_ring_mp_enqueue_bulk(r, obj_table, n, free_space);
> > > > -case RTE_RING_SYNC_ST:
> > > > -return rte_ring_sp_enqueue_bulk(r, obj_table, n, free_space);
> > > > -#ifdef ALLOW_EXPERIMENTAL_API -case RTE_RING_SYNC_MT_RTS:
> > > > -return rte_ring_mp_rts_enqueue_bulk(r, obj_table, n,
> > > > -free_space); -case RTE_RING_SYNC_MT_HTS:
> > > > -return rte_ring_mp_hts_enqueue_bulk(r, obj_table, n,
> > > > -free_space); -#endif -}
> > > > -
> > > > -/* valid ring should never reach this point */ -RTE_ASSERT(0);
> > > > -return 0;
> > > > +return rte_ring_enqueue_bulk_elem(r, obj_table, sizeof(void *),
> > > > +n, free_space);
> > > >  }
> > > >
> > > >  /**
> > > > @@ -462,7 +284,7 @@ rte_ring_enqueue_bulk(struct rte_ring *r, void
> > > > * const *obj_table,  static __rte_always_inline int
> > > > rte_ring_mp_enqueue(struct rte_ring *r, void *obj)  { -return
> > > > rte_ring_mp_enqueue_bulk(r, &obj, 1, NULL) ? 0 : -ENOBUFS;
> > > > +return rte_ring_mp_enqueue_elem(r, obj, sizeof(void *));
> > > >  }
> > > >
> > > >  /**
> > > > @@ -479,7 +301,7 @@ rte_ring_mp_enqueue(struct rte_ring *r, void
> > > > *obj) static __rte_always_inline int  rte_ring_sp_enqueue(struct
> > > > rte_ring *r, void
> > > > *obj)  {
> > > > -return rte_ring_sp_enqueue_bulk(r, &obj, 1, NULL) ? 0 : -ENOBUFS;
> > > > +return rte_ring_sp_enqueue_elem(r, obj, sizeof(void *));
> > > >  }
> > > >
> > > >  /**
> > > > @@ -500,7 +322,7 @@ rte_ring_sp_enqueue(struct rte_ring *r, void
> > > > *obj) static __rte_always_inline int  rte_ring_enqueue(struct
> > > > rte_ring *r, void *obj) { -return rte_ring_enqueue_bulk(r, &obj,
> > > > 1, NULL) ? 0 : -ENOBUFS;
> > > > +return rte_ring_enqueue_elem(r, obj, sizeof(void *));
> > > >  }
> > > >
> > > >  /**
> > > > @@ -525,8 +347,8 @@ static __rte_always_inline unsigned int
> > > > rte_ring_mc_dequeue_bulk(struct rte_ring *r, void **obj_table,
> > > > unsigned int n, unsigned int *available)  { -return
> > > > __rte_ring_do_dequeue(r, obj_table, n, RTE_RING_QUEUE_FIXED,
> > > > -RTE_RING_SYNC_MT, available);
> > > > +return rte_ring_mc_dequeue_bulk_elem(r, obj_table, sizeof(void
> > > > +*), n, available);
> > > >  }
> > > >
> > > >  /**
> > > > @@ -549,8 +371,8 @@ static __rte_always_inline unsigned int
> > > > rte_ring_sc_dequeue_bulk(struct rte_ring *r, void **obj_table,
> > > > unsigned int n, unsigned int *available)  { -return
> > > > __rte_ring_do_dequeue(r, obj_table, n, RTE_RING_QUEUE_FIXED,
> > > > -RTE_RING_SYNC_ST, available);
> > > > +return rte_ring_sc_dequeue_bulk_elem(r, obj_table, sizeof(void
> > > > +*), n, available);
> > > >  }
> > > >
> > > >  /**
> > > > @@ -576,22 +398,8 @@ static __rte_always_inline unsigned int
> > > > rte_ring_dequeue_bulk(struct rte_ring *r, void **obj_table,
> > > > unsigned int n,  unsigned int *available)  { -switch
> > > > (r->cons.sync_type) { -case RTE_RING_SYNC_MT:
> > > > -return rte_ring_mc_dequeue_bulk(r, obj_table, n, available);
> > > > -case RTE_RING_SYNC_ST:
> > > > -return rte_ring_sc_dequeue_bulk(r, obj_table, n, available);
> > > > -#ifdef ALLOW_EXPERIMENTAL_API -case RTE_RING_SYNC_MT_RTS:
> > > > -return rte_ring_mc_rts_dequeue_bulk(r, obj_table, n, available);
> > > > -case RTE_RING_SYNC_MT_HTS:
> > > > -return rte_ring_mc_hts_dequeue_bulk(r, obj_table, n, available);
> > > > -#endif -}
> > > > -
> > > > -/* valid ring should never reach this point */ -RTE_ASSERT(0);
> > > > -return 0;
> > > > +return rte_ring_dequeue_bulk_elem(r, obj_table, sizeof(void *),
> > > > +n, available);
> > > >  }
> > > >
> > > >  /**
> > > > @@ -612,7 +420,7 @@ rte_ring_dequeue_bulk(struct rte_ring *r, void
> > > > **obj_table, unsigned int n,  static __rte_always_inline int
> > > > rte_ring_mc_dequeue(struct rte_ring *r, void **obj_p)  { -return
> > > > rte_ring_mc_dequeue_bulk(r, obj_p, 1, NULL)  ? 0 : -ENOENT;
> > > > +return rte_ring_mc_dequeue_elem(r, obj_p, sizeof(void *));
> > > >  }
> > > >
> > > >  /**
> > > > @@ -630,7 +438,7 @@ rte_ring_mc_dequeue(struct rte_ring *r, void
> > > > **obj_p)  static __rte_always_inline int
> > > > rte_ring_sc_dequeue(struct rte_ring *r, void **obj_p)  { -return
> > > > rte_ring_sc_dequeue_bulk(r, obj_p, 1, NULL) ? 0 : -ENOENT;
> > > > +return rte_ring_sc_dequeue_elem(r, obj_p, sizeof(void *));
> > > >  }
> > > >
> > > >  /**
> > > > @@ -652,7 +460,7 @@ rte_ring_sc_dequeue(struct rte_ring *r, void
> > > > **obj_p) static __rte_always_inline int  rte_ring_dequeue(struct
> > > > rte_ring *r, void
> > > > **obj_p)  {
> > > > -return rte_ring_dequeue_bulk(r, obj_p, 1, NULL) ? 0 : -ENOENT;
> > > > +return rte_ring_dequeue_elem(r, obj_p, sizeof(void *));
> > > >  }
> > > >
> > > >  /**
> > > > @@ -860,8 +668,8 @@ static __rte_always_inline unsigned int
> > > > rte_ring_mp_enqueue_burst(struct rte_ring *r, void * const
> *obj_table,
> > > >   unsigned int n, unsigned int *free_space)  { -return
> > > > __rte_ring_do_enqueue(r, obj_table, n, -
> RTE_RING_QUEUE_VARIABLE,
> > > > RTE_RING_SYNC_MT, free_space);
> > > > +return rte_ring_mp_enqueue_burst_elem(r, obj_table, sizeof(void
> > > > *),
> > > > +n, free_space);
> > > >  }
> > > >
> > > >  /**
> > > > @@ -883,8 +691,8 @@ static __rte_always_inline unsigned int
> > > > rte_ring_sp_enqueue_burst(struct rte_ring *r, void * const *obj_table,
> > > >   unsigned int n, unsigned int *free_space)  { -return
> > > > __rte_ring_do_enqueue(r, obj_table, n, -
> RTE_RING_QUEUE_VARIABLE,
> > > > RTE_RING_SYNC_ST, free_space);
> > > > +return rte_ring_sp_enqueue_burst_elem(r, obj_table, sizeof(void
> > > > +*), n, free_space);
> > > >  }
> > > >
> > > >  /**
> > > > @@ -910,24 +718,8 @@ static __rte_always_inline unsigned int
> > > > rte_ring_enqueue_burst(struct rte_ring *r, void * const *obj_table,
> > > >        unsigned int n, unsigned int *free_space)  { -switch
> > > > (r->prod.sync_type) { -case RTE_RING_SYNC_MT:
> > > > -return rte_ring_mp_enqueue_burst(r, obj_table, n, free_space);
> > > > -case RTE_RING_SYNC_ST:
> > > > -return rte_ring_sp_enqueue_burst(r, obj_table, n, free_space);
> > > > -#ifdef ALLOW_EXPERIMENTAL_API -case RTE_RING_SYNC_MT_RTS:
> > > > -return rte_ring_mp_rts_enqueue_burst(r, obj_table, n,
> > > > -free_space); -case RTE_RING_SYNC_MT_HTS:
> > > > -return rte_ring_mp_hts_enqueue_burst(r, obj_table, n,
> > > > -free_space); -#endif -}
> > > > -
> > > > -/* valid ring should never reach this point */ -RTE_ASSERT(0);
> > > > -return 0;
> > > > +return rte_ring_enqueue_burst_elem(r, obj_table, sizeof(void *),
> > > > +n, free_space);
> > > >  }
> > > >
> > > >  /**
> > > > @@ -954,8 +746,8 @@ static __rte_always_inline unsigned int
> > > > rte_ring_mc_dequeue_burst(struct rte_ring *r, void **obj_table,
> > > > unsigned int n, unsigned int *available)  { -return
> > > > __rte_ring_do_dequeue(r, obj_table, n, -
> RTE_RING_QUEUE_VARIABLE,
> > > > RTE_RING_SYNC_MT, available);
> > > > +return rte_ring_mc_dequeue_burst_elem(r, obj_table, sizeof(void
> > > > *),
> > > > +n, available);
> > > >  }
> > > >
> > > >  /**
> > > > @@ -979,8 +771,8 @@ static __rte_always_inline unsigned int
> > > > rte_ring_sc_dequeue_burst(struct rte_ring *r, void **obj_table,
> > > > unsigned int n, unsigned int *available)  { -return
> > > > __rte_ring_do_dequeue(r, obj_table, n, -
> RTE_RING_QUEUE_VARIABLE,
> > > > RTE_RING_SYNC_ST, available);
> > > > +return rte_ring_sc_dequeue_burst_elem(r, obj_table, sizeof(void
> > > > +*), n, available);
> > > >  }
> > > >
> > > >  /**
> > > > @@ -1006,24 +798,8 @@ static __rte_always_inline unsigned int
> > > > rte_ring_dequeue_burst(struct rte_ring *r, void **obj_table,
> > > > unsigned int n, unsigned int *available)  { -switch
> > > > (r->cons.sync_type) { -case RTE_RING_SYNC_MT:
> > > > -return rte_ring_mc_dequeue_burst(r, obj_table, n, available);
> > > > -case RTE_RING_SYNC_ST:
> > > > -return rte_ring_sc_dequeue_burst(r, obj_table, n, available);
> > > > -#ifdef ALLOW_EXPERIMENTAL_API -case RTE_RING_SYNC_MT_RTS:
> > > > -return rte_ring_mc_rts_dequeue_burst(r, obj_table, n,
> > > > -available); -case RTE_RING_SYNC_MT_HTS:
> > > > -return rte_ring_mc_hts_dequeue_burst(r, obj_table, n,
> > > > -available); -#endif -}
> > > > -
> > > > -/* valid ring should never reach this point */ -RTE_ASSERT(0);
> > > > -return 0;
> > > > +return rte_ring_dequeue_burst_elem(r, obj_table, sizeof(void *),
> > > > +n, available);
> > > >  }
> > > >
> > > >  #ifdef __cplusplus
> > > > --
> > > > 2.17.1
> 


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [dpdk-dev] [PATCH 0/3] ring clean up
  2020-07-03 10:26 [dpdk-dev] [PATCH 0/3] ring clean up Feifei Wang
                   ` (2 preceding siblings ...)
  2020-07-03 10:26 ` [dpdk-dev] [PATCH 3/3] ring: use element APIs to implement legacy APIs Feifei Wang
@ 2020-07-08 14:51 ` David Marchand
  2020-07-08 17:05   ` Ananyev, Konstantin
  3 siblings, 1 reply; 23+ messages in thread
From: David Marchand @ 2020-07-08 14:51 UTC (permalink / raw)
  To: Feifei Wang; +Cc: dev, nd

On Fri, Jul 3, 2020 at 12:27 PM Feifei Wang <feifei.wang2@arm.com> wrote:
>
> Do some work for ring refactoring, which includes:
> 1. remove experimental tags
> 2. use element APIs to implement legacy APIs

This series triggers unit test failures.
https://travis-ci.com/github/ovsrobot/dpdk/builds/174196115

https://travis-ci.com/github/ovsrobot/dpdk/jobs/357314387#L1092

In my env:

test_refcnt_slave started at lcore 22
test_refcnt_slave started at lcore 23
test_refcnt_slave started at lcore 24
test_refcnt_slave started at lcore 25
test_refcnt_slave started at lcore 26
test_refcnt_slave started at lcore 27
test_refcnt_master started at lcore 1

Thread 6 "lcore-slave-4" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7ffff4723400 (LWP 374679)]
0x00000000007e29d0 in rte_pktmbuf_free (m=0xffff000100010000) at
../../dpdk/lib/librte_mbuf/rte_mbuf.h:1407
1407            m_next = m->next;
Missing separate debuginfos, use: dnf debuginfo-install
dbus-libs-1.12.18-1.fc31.x86_64 elfutils-libelf-0.179-2.fc31.x86_64
jansson-2.12-4.fc31.x86_64 libbpf-0.0.7-1.fc31.x86_64
libbsd-0.9.1-4.fc31.x86_64 libfdt-1.6.0-1.fc31.x86_64
libgcc-9.3.1-2.fc31.x86_64 libgcrypt-1.8.5-1.fc31.x86_64
libnl3-3.5.0-1.fc31.x86_64 libpcap-1.9.1-2.fc31.x86_64
lz4-libs-1.9.1-1.fc31.x86_64 numactl-libs-2.0.12-3.fc31.x86_64
openssl-libs-1.1.1g-1.fc31.x86_64 systemd-libs-243.8-1.fc31.x86_64
xz-libs-5.2.5-1.fc31.x86_64 zlib-1.2.11-20.fc31.x86_64
(gdb) bt
#0  0x00000000007e29d0 in rte_pktmbuf_free (m=0xffff000100010000) at
../../dpdk/lib/librte_mbuf/rte_mbuf.h:1407
#1  test_refcnt_slave (arg=0x17f7e3940) at ../../dpdk/app/test/test_mbuf.c:1016
#2  0x0000000000446a89 in eal_thread_loop (arg=<optimized out>) at
../../dpdk/lib/librte_eal/linux/eal_thread.c:127
#3  0x00007ffff709c4e2 in start_thread () from /usr/lib64/libpthread.so.0
#4  0x00007ffff6fc96a3 in clone () from /usr/lib64/libc.so.6



Thanks.

-- 
David Marchand


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [dpdk-dev] [PATCH 0/3] ring clean up
  2020-07-08 14:51 ` [dpdk-dev] [PATCH 0/3] ring clean up David Marchand
@ 2020-07-08 17:05   ` Ananyev, Konstantin
  2020-07-08 20:33     ` Honnappa Nagarahalli
  2020-07-09  2:27     ` Feifei Wang
  0 siblings, 2 replies; 23+ messages in thread
From: Ananyev, Konstantin @ 2020-07-08 17:05 UTC (permalink / raw)
  To: David Marchand, Feifei Wang; +Cc: dev, nd

> On Fri, Jul 3, 2020 at 12:27 PM Feifei Wang <feifei.wang2@arm.com> wrote:
> >
> > Do some work for ring refactoring, which includes:
> > 1. remove experimental tags
> > 2. use element APIs to implement legacy APIs
> 
> This series triggers unit test failures.
> https://travis-ci.com/github/ovsrobot/dpdk/builds/174196115
> 
> https://travis-ci.com/github/ovsrobot/dpdk/jobs/357314387#L1092
> 
> In my env:
> 
> test_refcnt_slave started at lcore 22
> test_refcnt_slave started at lcore 23
> test_refcnt_slave started at lcore 24
> test_refcnt_slave started at lcore 25
> test_refcnt_slave started at lcore 26
> test_refcnt_slave started at lcore 27
> test_refcnt_master started at lcore 1
> 
> Thread 6 "lcore-slave-4" received signal SIGSEGV, Segmentation fault.
> [Switching to Thread 0x7ffff4723400 (LWP 374679)]
> 0x00000000007e29d0 in rte_pktmbuf_free (m=0xffff000100010000) at
> ../../dpdk/lib/librte_mbuf/rte_mbuf.h:1407
> 1407            m_next = m->next;
> Missing separate debuginfos, use: dnf debuginfo-install
> dbus-libs-1.12.18-1.fc31.x86_64 elfutils-libelf-0.179-2.fc31.x86_64
> jansson-2.12-4.fc31.x86_64 libbpf-0.0.7-1.fc31.x86_64
> libbsd-0.9.1-4.fc31.x86_64 libfdt-1.6.0-1.fc31.x86_64
> libgcc-9.3.1-2.fc31.x86_64 libgcrypt-1.8.5-1.fc31.x86_64
> libnl3-3.5.0-1.fc31.x86_64 libpcap-1.9.1-2.fc31.x86_64
> lz4-libs-1.9.1-1.fc31.x86_64 numactl-libs-2.0.12-3.fc31.x86_64
> openssl-libs-1.1.1g-1.fc31.x86_64 systemd-libs-243.8-1.fc31.x86_64
> xz-libs-5.2.5-1.fc31.x86_64 zlib-1.2.11-20.fc31.x86_64
> (gdb) bt
> #0  0x00000000007e29d0 in rte_pktmbuf_free (m=0xffff000100010000) at
> ../../dpdk/lib/librte_mbuf/rte_mbuf.h:1407
> #1  test_refcnt_slave (arg=0x17f7e3940) at ../../dpdk/app/test/test_mbuf.c:1016
> #2  0x0000000000446a89 in eal_thread_loop (arg=<optimized out>) at
> ../../dpdk/lib/librte_eal/linux/eal_thread.c:127
> #3  0x00007ffff709c4e2 in start_thread () from /usr/lib64/libpthread.so.0
> #4  0x00007ffff6fc96a3 in clone () from /usr/lib64/libc.so.6
> 


Ah, yes indeed there is problem within single enqueue.
It should be:

diff --git a/lib/librte_ring/rte_ring.h b/lib/librte_ring/rte_ring.h
index 2a2190bfc..da17ed6d7 100644
--- a/lib/librte_ring/rte_ring.h
+++ b/lib/librte_ring/rte_ring.h
@@ -284,7 +284,7 @@ rte_ring_enqueue_bulk(struct rte_ring *r, void * const *obj_table,
 static __rte_always_inline int
 rte_ring_mp_enqueue(struct rte_ring *r, void *obj)
 {
-       return rte_ring_mp_enqueue_elem(r, obj, sizeof(void *));
+       return rte_ring_mp_enqueue_elem(r, &obj, sizeof(void *));
 }

 /**
@@ -301,7 +301,7 @@ rte_ring_mp_enqueue(struct rte_ring *r, void *obj)
 static __rte_always_inline int
 rte_ring_sp_enqueue(struct rte_ring *r, void *obj)
 {
-       return rte_ring_sp_enqueue_elem(r, obj, sizeof(void *));
+       return rte_ring_sp_enqueue_elem(r, &obj, sizeof(void *));
 }

 /**
@@ -322,7 +322,7 @@ rte_ring_sp_enqueue(struct rte_ring *r, void *obj)
 static __rte_always_inline int
 rte_ring_enqueue(struct rte_ring *r, void *obj)
 {
-       return rte_ring_enqueue_elem(r, obj, sizeof(void *));
+       return rte_ring_enqueue_elem(r, &obj, sizeof(void *));
 }

 /**


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [dpdk-dev] [PATCH 0/3] ring clean up
  2020-07-08 17:05   ` Ananyev, Konstantin
@ 2020-07-08 20:33     ` Honnappa Nagarahalli
  2020-07-09  2:27     ` Feifei Wang
  1 sibling, 0 replies; 23+ messages in thread
From: Honnappa Nagarahalli @ 2020-07-08 20:33 UTC (permalink / raw)
  To: Ananyev, Konstantin, David Marchand, Feifei Wang
  Cc: dev, nd, Honnappa Nagarahalli, nd

<snip>
> 
> > On Fri, Jul 3, 2020 at 12:27 PM Feifei Wang <feifei.wang2@arm.com> wrote:
> > >
> > > Do some work for ring refactoring, which includes:
> > > 1. remove experimental tags
> > > 2. use element APIs to implement legacy APIs
> >
> > This series triggers unit test failures.
> > https://travis-ci.com/github/ovsrobot/dpdk/builds/174196115
> >
> > https://travis-ci.com/github/ovsrobot/dpdk/jobs/357314387#L1092
> >
> > In my env:
> >
> > test_refcnt_slave started at lcore 22
> > test_refcnt_slave started at lcore 23
> > test_refcnt_slave started at lcore 24
> > test_refcnt_slave started at lcore 25
> > test_refcnt_slave started at lcore 26
> > test_refcnt_slave started at lcore 27
> > test_refcnt_master started at lcore 1
> >
> > Thread 6 "lcore-slave-4" received signal SIGSEGV, Segmentation fault.
> > [Switching to Thread 0x7ffff4723400 (LWP 374679)]
> > 0x00000000007e29d0 in rte_pktmbuf_free (m=0xffff000100010000) at
> > ../../dpdk/lib/librte_mbuf/rte_mbuf.h:1407
> > 1407            m_next = m->next;
> > Missing separate debuginfos, use: dnf debuginfo-install
> > dbus-libs-1.12.18-1.fc31.x86_64 elfutils-libelf-0.179-2.fc31.x86_64
> > jansson-2.12-4.fc31.x86_64 libbpf-0.0.7-1.fc31.x86_64
> > libbsd-0.9.1-4.fc31.x86_64 libfdt-1.6.0-1.fc31.x86_64
> > libgcc-9.3.1-2.fc31.x86_64 libgcrypt-1.8.5-1.fc31.x86_64
> > libnl3-3.5.0-1.fc31.x86_64 libpcap-1.9.1-2.fc31.x86_64
> > lz4-libs-1.9.1-1.fc31.x86_64 numactl-libs-2.0.12-3.fc31.x86_64
> > openssl-libs-1.1.1g-1.fc31.x86_64 systemd-libs-243.8-1.fc31.x86_64
> > xz-libs-5.2.5-1.fc31.x86_64 zlib-1.2.11-20.fc31.x86_64
> > (gdb) bt
> > #0  0x00000000007e29d0 in rte_pktmbuf_free (m=0xffff000100010000) at
> > ../../dpdk/lib/librte_mbuf/rte_mbuf.h:1407
> > #1  test_refcnt_slave (arg=0x17f7e3940) at
> > ../../dpdk/app/test/test_mbuf.c:1016
> > #2  0x0000000000446a89 in eal_thread_loop (arg=<optimized out>) at
> > ../../dpdk/lib/librte_eal/linux/eal_thread.c:127
> > #3  0x00007ffff709c4e2 in start_thread () from
> > /usr/lib64/libpthread.so.0
> > #4  0x00007ffff6fc96a3 in clone () from /usr/lib64/libc.so.6
> >
> 
> 
> Ah, yes indeed there is problem within single enqueue.
> It should be:
> 
> diff --git a/lib/librte_ring/rte_ring.h b/lib/librte_ring/rte_ring.h index
> 2a2190bfc..da17ed6d7 100644
> --- a/lib/librte_ring/rte_ring.h
> +++ b/lib/librte_ring/rte_ring.h
> @@ -284,7 +284,7 @@ rte_ring_enqueue_bulk(struct rte_ring *r, void * const
> *obj_table,  static __rte_always_inline int  rte_ring_mp_enqueue(struct
> rte_ring *r, void *obj)  {
> -       return rte_ring_mp_enqueue_elem(r, obj, sizeof(void *));
> +       return rte_ring_mp_enqueue_elem(r, &obj, sizeof(void *));
>  }
> 
>  /**
> @@ -301,7 +301,7 @@ rte_ring_mp_enqueue(struct rte_ring *r, void *obj)
> static __rte_always_inline int  rte_ring_sp_enqueue(struct rte_ring *r, void
> *obj)  {
> -       return rte_ring_sp_enqueue_elem(r, obj, sizeof(void *));
> +       return rte_ring_sp_enqueue_elem(r, &obj, sizeof(void *));
>  }
> 
>  /**
> @@ -322,7 +322,7 @@ rte_ring_sp_enqueue(struct rte_ring *r, void *obj)
> static __rte_always_inline int  rte_ring_enqueue(struct rte_ring *r, void *obj)
> {
> -       return rte_ring_enqueue_elem(r, obj, sizeof(void *));
> +       return rte_ring_enqueue_elem(r, &obj, sizeof(void *));
>  }
Thanks, same conclusion. table_autotest was failing, that passes as well.

> 
>  /**


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [dpdk-dev] [PATCH 0/3] ring clean up
  2020-07-08 17:05   ` Ananyev, Konstantin
  2020-07-08 20:33     ` Honnappa Nagarahalli
@ 2020-07-09  2:27     ` Feifei Wang
  1 sibling, 0 replies; 23+ messages in thread
From: Feifei Wang @ 2020-07-09  2:27 UTC (permalink / raw)
  To: Ananyev, Konstantin, David Marchand; +Cc: dev, nd, nd



> -----Original Message-----
> From: Ananyev, Konstantin <konstantin.ananyev@intel.com>
> Sent: 2020年7月9日 1:06
> To: David Marchand <david.marchand@redhat.com>; Feifei Wang
> <Feifei.Wang2@arm.com>
> Cc: dev <dev@dpdk.org>; nd <nd@arm.com>
> Subject: RE: [dpdk-dev] [PATCH 0/3] ring clean up
> 
> > On Fri, Jul 3, 2020 at 12:27 PM Feifei Wang <feifei.wang2@arm.com> wrote:
> > >
> > > Do some work for ring refactoring, which includes:
> > > 1. remove experimental tags
> > > 2. use element APIs to implement legacy APIs
> >
> > This series triggers unit test failures.
> > https://travis-ci.com/github/ovsrobot/dpdk/builds/174196115
> >
> > https://travis-ci.com/github/ovsrobot/dpdk/jobs/357314387#L1092
> >
> > In my env:
> >
> > test_refcnt_slave started at lcore 22
> > test_refcnt_slave started at lcore 23
> > test_refcnt_slave started at lcore 24
> > test_refcnt_slave started at lcore 25
> > test_refcnt_slave started at lcore 26
> > test_refcnt_slave started at lcore 27
> > test_refcnt_master started at lcore 1
> >
> > Thread 6 "lcore-slave-4" received signal SIGSEGV, Segmentation fault.
> > [Switching to Thread 0x7ffff4723400 (LWP 374679)]
> > 0x00000000007e29d0 in rte_pktmbuf_free (m=0xffff000100010000) at
> > ../../dpdk/lib/librte_mbuf/rte_mbuf.h:1407
> > 1407            m_next = m->next;
> > Missing separate debuginfos, use: dnf debuginfo-install
> > dbus-libs-1.12.18-1.fc31.x86_64 elfutils-libelf-0.179-2.fc31.x86_64
> > jansson-2.12-4.fc31.x86_64 libbpf-0.0.7-1.fc31.x86_64
> > libbsd-0.9.1-4.fc31.x86_64 libfdt-1.6.0-1.fc31.x86_64
> > libgcc-9.3.1-2.fc31.x86_64 libgcrypt-1.8.5-1.fc31.x86_64
> > libnl3-3.5.0-1.fc31.x86_64 libpcap-1.9.1-2.fc31.x86_64
> > lz4-libs-1.9.1-1.fc31.x86_64 numactl-libs-2.0.12-3.fc31.x86_64
> > openssl-libs-1.1.1g-1.fc31.x86_64 systemd-libs-243.8-1.fc31.x86_64
> > xz-libs-5.2.5-1.fc31.x86_64 zlib-1.2.11-20.fc31.x86_64
> > (gdb) bt
> > #0  0x00000000007e29d0 in rte_pktmbuf_free (m=0xffff000100010000) at
> > ../../dpdk/lib/librte_mbuf/rte_mbuf.h:1407
> > #1  test_refcnt_slave (arg=0x17f7e3940) at
> > ../../dpdk/app/test/test_mbuf.c:1016
> > #2  0x0000000000446a89 in eal_thread_loop (arg=<optimized out>) at
> > ../../dpdk/lib/librte_eal/linux/eal_thread.c:127
> > #3  0x00007ffff709c4e2 in start_thread () from
> > /usr/lib64/libpthread.so.0
> > #4  0x00007ffff6fc96a3 in clone () from /usr/lib64/libc.so.6
> >
> 
> 
Hi, Konstantin

Thanks very much for your effort.
We are in the process of solving this problem and modify the same part. Until now, the tested result is OK.
Currently, we are conducting internal testing. After confirming it is fully secure, the new version will be sent to the community as soon as possible.

Thanks again
Feifei
> Ah, yes indeed there is problem within single enqueue.
> It should be:
> 
> diff --git a/lib/librte_ring/rte_ring.h b/lib/librte_ring/rte_ring.h index
> 2a2190bfc..da17ed6d7 100644
> --- a/lib/librte_ring/rte_ring.h
> +++ b/lib/librte_ring/rte_ring.h
> @@ -284,7 +284,7 @@ rte_ring_enqueue_bulk(struct rte_ring *r, void *
> const *obj_table,  static __rte_always_inline int
> rte_ring_mp_enqueue(struct rte_ring *r, void *obj)  {
> -       return rte_ring_mp_enqueue_elem(r, obj, sizeof(void *));
> +       return rte_ring_mp_enqueue_elem(r, &obj, sizeof(void *));
>  }
> 
>  /**
> @@ -301,7 +301,7 @@ rte_ring_mp_enqueue(struct rte_ring *r, void *obj)
> static __rte_always_inline int  rte_ring_sp_enqueue(struct rte_ring *r, void
> *obj)  {
> -       return rte_ring_sp_enqueue_elem(r, obj, sizeof(void *));
> +       return rte_ring_sp_enqueue_elem(r, &obj, sizeof(void *));
>  }
> 
>  /**
> @@ -322,7 +322,7 @@ rte_ring_sp_enqueue(struct rte_ring *r, void *obj)
> static __rte_always_inline int  rte_ring_enqueue(struct rte_ring *r, void *obj)
> {
> -       return rte_ring_enqueue_elem(r, obj, sizeof(void *));
> +       return rte_ring_enqueue_elem(r, &obj, sizeof(void *));
>  }
> 
>  /**


^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2020-07-09  2:28 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-07-03 10:26 [dpdk-dev] [PATCH 0/3] ring clean up Feifei Wang
2020-07-03 10:26 ` [dpdk-dev] [PATCH 1/3] ring: remove experimental tag for ring reset API Feifei Wang
2020-07-03 16:16   ` Kinsella, Ray
2020-07-03 18:46     ` Honnappa Nagarahalli
2020-07-06  6:23       ` Kinsella, Ray
2020-07-07  3:19         ` Feifei Wang
2020-07-07  7:40           ` Kinsella, Ray
2020-07-03 10:26 ` [dpdk-dev] [PATCH 2/3] ring: remove experimental tag for ring element APIs Feifei Wang
2020-07-03 16:17   ` Kinsella, Ray
2020-07-07 14:44   ` Ananyev, Konstantin
2020-07-03 10:26 ` [dpdk-dev] [PATCH 3/3] ring: use element APIs to implement legacy APIs Feifei Wang
2020-07-07  5:19   ` Feifei Wang
2020-07-07 14:04     ` Ananyev, Konstantin
2020-07-07 16:21       ` Honnappa Nagarahalli
2020-07-08  3:19         ` Feifei Wang
2020-07-07 20:07       ` David Christensen
2020-07-08  1:30         ` Feifei Wang
2020-07-08  1:21       ` Feifei Wang
2020-07-07 14:45   ` Ananyev, Konstantin
2020-07-08 14:51 ` [dpdk-dev] [PATCH 0/3] ring clean up David Marchand
2020-07-08 17:05   ` Ananyev, Konstantin
2020-07-08 20:33     ` Honnappa Nagarahalli
2020-07-09  2:27     ` Feifei Wang

DPDK patches and discussions

This inbox may be cloned and mirrored by anyone:

	git clone --mirror https://inbox.dpdk.org/dev/0 dev/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 dev dev/ https://inbox.dpdk.org/dev \
		dev@dpdk.org
	public-inbox-index dev

Example config snippet for mirrors.
Newsgroup available over NNTP:
	nntp://inbox.dpdk.org/inbox.dpdk.dev


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git