DPDK patches and discussions
 help / color / mirror / Atom feed
* [dpdk-dev] [PATCH v1 00/12] use compiler atomic builtins for app
@ 2021-08-02 10:18 Joyce Kong
  2021-08-02 10:18 ` [dpdk-dev] [PATCH v1 01/12] test/pmd_perf: use compiler atomic builtins for polling sync Joyce Kong
                   ` (12 more replies)
  0 siblings, 13 replies; 28+ messages in thread
From: Joyce Kong @ 2021-08-02 10:18 UTC (permalink / raw)
  To: thomas, david.marchand, honnappa.nagarahalli, ruifeng.wang,
	konstantin.ananyev, rsanford, erik.g.carrillo, olivier.matz,
	yipeng1.wang, sameh.gobriel, bruce.richardson,
	vladimir.medvedkin, anatoly.burakov, andrew.rybchenko, jerinj,
	declan.doherty, ciara.power, xiaoyun.li, nicolas.chautru,
	maryam.tahhan, reshma.pattan, cristian.dumitrescu
  Cc: dev, nd

Since atomic operations have been adopted in DPDK now[1],
change rte_atomicNN_xxx APIs to compiler's atomic built-ins
in app module[2].

[1] https://www.dpdk.org/blog/2021/03/26/dpdk-adopts-the-c11-memory-model/
[2] https://doc.dpdk.org/guides/rel_notes/deprecation.html

Joyce Kong (12):
  test/pmd_perf: use compiler atomic builtins for polling sync
  test/ring_perf: use compiler atomic builtins for lcores sync
  test/timer: use compiler atomic builtins for sync
  test/stack_perf: use compiler atomics for lcore sync
  test/bpf: use compiler atomics for calculation
  test/func_reentrancy: use compiler atomic for data sync
  app/eventdev: use compiler atomic builtins for packets sync
  app/crypto: use compiler atomic builtins for display sync
  app/compress: use compiler atomic builtins for display sync
  app/testpmd: use compiler atomic builtins for port sync
  app/bbdev: use compiler atomics for thread sync
  app: remove unnecessary include of atomic

 app/proc-info/main.c                          |   1 -
 app/test-bbdev/test_bbdev_perf.c              | 135 ++++++++----------
 .../comp_perf_test_common.h                   |   2 +-
 .../comp_perf_test_cyclecount.c               |  10 +-
 .../comp_perf_test_throughput.c               |  11 +-
 .../comp_perf_test_verify.c                   |   6 +-
 app/test-crypto-perf/cperf_test_latency.c     |   6 +-
 .../cperf_test_pmd_cyclecount.c               |   9 +-
 app/test-crypto-perf/cperf_test_throughput.c  |   9 +-
 app/test-crypto-perf/cperf_test_verify.c      |   9 +-
 app/test-eventdev/evt_main.c                  |   1 -
 app/test-eventdev/test_order_atq.c            |   4 +-
 app/test-eventdev/test_order_common.c         |   4 +-
 app/test-eventdev/test_order_common.h         |   8 +-
 app/test-eventdev/test_order_queue.c          |   4 +-
 app/test-pipeline/config.c                    |   1 -
 app/test-pipeline/init.c                      |   1 -
 app/test-pipeline/main.c                      |   1 -
 app/test-pipeline/runtime.c                   |   1 -
 app/test-pmd/cmdline.c                        |   1 -
 app/test-pmd/config.c                         |   1 -
 app/test-pmd/csumonly.c                       |   1 -
 app/test-pmd/flowgen.c                        |   1 -
 app/test-pmd/icmpecho.c                       |   1 -
 app/test-pmd/iofwd.c                          |   1 -
 app/test-pmd/macfwd.c                         |   1 -
 app/test-pmd/macswap.c                        |   1 -
 app/test-pmd/parameters.c                     |   1 -
 app/test-pmd/rxonly.c                         |   1 -
 app/test-pmd/testpmd.c                        |  75 ++++++----
 app/test-pmd/txonly.c                         |   1 -
 app/test/test_barrier.c                       |   1 -
 app/test/test_bpf.c                           |  28 ++--
 app/test/test_func_reentrancy.c               |  27 ++--
 app/test/test_mbuf.c                          |   1 -
 app/test/test_mp_secondary.c                  |   1 -
 app/test/test_pmd_perf.c                      |  12 +-
 app/test/test_ring.c                          |   1 -
 app/test/test_ring_perf.c                     |   9 +-
 app/test/test_stack_perf.c                    |  14 +-
 app/test/test_timer.c                         |  28 ++--
 app/test/test_timer_secondary.c               |   1 -
 42 files changed, 213 insertions(+), 219 deletions(-)

-- 
2.17.1


^ permalink raw reply	[flat|nested] 28+ messages in thread

* [dpdk-dev] [PATCH v1 01/12] test/pmd_perf: use compiler atomic builtins for polling sync
  2021-08-02 10:18 [dpdk-dev] [PATCH v1 00/12] use compiler atomic builtins for app Joyce Kong
@ 2021-08-02 10:18 ` Joyce Kong
  2021-11-08 22:50   ` Honnappa Nagarahalli
  2021-08-02 10:18 ` [dpdk-dev] [PATCH v1 02/12] test/ring_perf: use compiler atomic builtins for lcores sync Joyce Kong
                   ` (11 subsequent siblings)
  12 siblings, 1 reply; 28+ messages in thread
From: Joyce Kong @ 2021-08-02 10:18 UTC (permalink / raw)
  To: thomas, david.marchand, honnappa.nagarahalli, ruifeng.wang,
	konstantin.ananyev, rsanford, erik.g.carrillo, olivier.matz,
	yipeng1.wang, sameh.gobriel, bruce.richardson,
	vladimir.medvedkin, anatoly.burakov, andrew.rybchenko, jerinj,
	declan.doherty, ciara.power, xiaoyun.li, nicolas.chautru,
	maryam.tahhan, reshma.pattan, cristian.dumitrescu
  Cc: dev, nd

Convert rte_atomic usages to compiler atomic built-ins
for polling sync in pmd_perf test cases.

Signed-off-by: Joyce Kong <joyce.kong@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
---
 app/test/test_pmd_perf.c | 12 +++++-------
 1 file changed, 5 insertions(+), 7 deletions(-)

diff --git a/app/test/test_pmd_perf.c b/app/test/test_pmd_perf.c
index 3a248d512c..41ef001b90 100644
--- a/app/test/test_pmd_perf.c
+++ b/app/test/test_pmd_perf.c
@@ -10,7 +10,6 @@
 #include <rte_cycles.h>
 #include <rte_ethdev.h>
 #include <rte_byteorder.h>
-#include <rte_atomic.h>
 #include <rte_malloc.h>
 #include "packet_burst_generator.h"
 #include "test.h"
@@ -526,7 +525,7 @@ main_loop(__rte_unused void *args)
 	return 0;
 }
 
-static rte_atomic64_t start;
+static uint64_t start;
 
 static inline int
 poll_burst(void *args)
@@ -564,8 +563,7 @@ poll_burst(void *args)
 		num[portid] = pkt_per_port;
 	}
 
-	while (!rte_atomic64_read(&start))
-		;
+	rte_wait_until_equal_64(&start, 1, __ATOMIC_RELAXED);
 
 	cur_tsc = rte_rdtsc();
 	while (total) {
@@ -617,7 +615,7 @@ exec_burst(uint32_t flags, int lcore)
 	pkt_per_port = MAX_TRAFFIC_BURST;
 	num = pkt_per_port * conf->nb_ports;
 
-	rte_atomic64_init(&start);
+	__atomic_store_n(&start, 0, __ATOMIC_RELAXED);
 
 	/* start polling thread, but not actually poll yet */
 	rte_eal_remote_launch(poll_burst,
@@ -625,7 +623,7 @@ exec_burst(uint32_t flags, int lcore)
 
 	/* Only when polling first */
 	if (flags == SC_BURST_POLL_FIRST)
-		rte_atomic64_set(&start, 1);
+		__atomic_store_n(&start, 1, __ATOMIC_RELAXED);
 
 	/* start xmit */
 	i = 0;
@@ -642,7 +640,7 @@ exec_burst(uint32_t flags, int lcore)
 
 	/* only when polling second  */
 	if (flags == SC_BURST_XMIT_FIRST)
-		rte_atomic64_set(&start, 1);
+		__atomic_store_n(&start, 1, __ATOMIC_RELAXED);
 
 	/* wait for polling finished */
 	diff_tsc = rte_eal_wait_lcore(lcore);
-- 
2.17.1


^ permalink raw reply	[flat|nested] 28+ messages in thread

* [dpdk-dev] [PATCH v1 02/12] test/ring_perf: use compiler atomic builtins for lcores sync
  2021-08-02 10:18 [dpdk-dev] [PATCH v1 00/12] use compiler atomic builtins for app Joyce Kong
  2021-08-02 10:18 ` [dpdk-dev] [PATCH v1 01/12] test/pmd_perf: use compiler atomic builtins for polling sync Joyce Kong
@ 2021-08-02 10:18 ` Joyce Kong
  2021-11-09  5:43   ` Honnappa Nagarahalli
  2021-08-02 10:18 ` [dpdk-dev] [PATCH v1 03/12] test/timer: use compiler atomic builtins for sync Joyce Kong
                   ` (10 subsequent siblings)
  12 siblings, 1 reply; 28+ messages in thread
From: Joyce Kong @ 2021-08-02 10:18 UTC (permalink / raw)
  To: thomas, david.marchand, honnappa.nagarahalli, ruifeng.wang,
	konstantin.ananyev, rsanford, erik.g.carrillo, olivier.matz,
	yipeng1.wang, sameh.gobriel, bruce.richardson,
	vladimir.medvedkin, anatoly.burakov, andrew.rybchenko, jerinj,
	declan.doherty, ciara.power, xiaoyun.li, nicolas.chautru,
	maryam.tahhan, reshma.pattan, cristian.dumitrescu
  Cc: dev, nd

Convert rte_atomic usages to compiler atomic built-ins
for lcores sync in ring_perf test cases.

Signed-off-by: Joyce Kong <joyce.kong@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
---
 app/test/test_ring_perf.c | 9 ++++-----
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/app/test/test_ring_perf.c b/app/test/test_ring_perf.c
index fd82e20412..2d8bb675a3 100644
--- a/app/test/test_ring_perf.c
+++ b/app/test/test_ring_perf.c
@@ -320,7 +320,7 @@ run_on_core_pair(struct lcore_pair *cores, struct rte_ring *r, const int esize)
 	return 0;
 }
 
-static rte_atomic32_t synchro;
+static uint32_t synchro;
 static uint64_t queue_count[RTE_MAX_LCORE];
 
 #define TIME_MS 100
@@ -342,8 +342,7 @@ load_loop_fn_helper(struct thread_params *p, const int esize)
 
 	/* wait synchro for workers */
 	if (lcore != rte_get_main_lcore())
-		while (rte_atomic32_read(&synchro) == 0)
-			rte_pause();
+		rte_wait_until_equal_32(&synchro, 1, __ATOMIC_RELAXED);
 
 	begin = rte_get_timer_cycles();
 	while (time_diff < hz * TIME_MS / 1000) {
@@ -398,12 +397,12 @@ run_on_all_cores(struct rte_ring *r, const int esize)
 		param.r = r;
 
 		/* clear synchro and start workers */
-		rte_atomic32_set(&synchro, 0);
+		__atomic_store_n(&synchro, 0, __ATOMIC_RELAXED);
 		if (rte_eal_mp_remote_launch(lcore_f, &param, SKIP_MAIN) < 0)
 			return -1;
 
 		/* start synchro and launch test on main */
-		rte_atomic32_set(&synchro, 1);
+		__atomic_store_n(&synchro, 1, __ATOMIC_RELAXED);
 		lcore_f(&param);
 
 		rte_eal_mp_wait_lcore();
-- 
2.17.1


^ permalink raw reply	[flat|nested] 28+ messages in thread

* [dpdk-dev] [PATCH v1 03/12] test/timer: use compiler atomic builtins for sync
  2021-08-02 10:18 [dpdk-dev] [PATCH v1 00/12] use compiler atomic builtins for app Joyce Kong
  2021-08-02 10:18 ` [dpdk-dev] [PATCH v1 01/12] test/pmd_perf: use compiler atomic builtins for polling sync Joyce Kong
  2021-08-02 10:18 ` [dpdk-dev] [PATCH v1 02/12] test/ring_perf: use compiler atomic builtins for lcores sync Joyce Kong
@ 2021-08-02 10:18 ` Joyce Kong
  2021-11-09 20:59   ` Honnappa Nagarahalli
  2021-08-02 10:18 ` [dpdk-dev] [PATCH v1 04/12] test/stack_perf: use compiler atomics for lcore sync Joyce Kong
                   ` (9 subsequent siblings)
  12 siblings, 1 reply; 28+ messages in thread
From: Joyce Kong @ 2021-08-02 10:18 UTC (permalink / raw)
  To: thomas, david.marchand, honnappa.nagarahalli, ruifeng.wang,
	konstantin.ananyev, rsanford, erik.g.carrillo, olivier.matz,
	yipeng1.wang, sameh.gobriel, bruce.richardson,
	vladimir.medvedkin, anatoly.burakov, andrew.rybchenko, jerinj,
	declan.doherty, ciara.power, xiaoyun.li, nicolas.chautru,
	maryam.tahhan, reshma.pattan, cristian.dumitrescu
  Cc: dev, nd

Convert rte_atomic usages to compiler atomic built-ins
for lcore_state and collisions sync.

Signed-off-by: Joyce Kong <joyce.kong@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
---
 app/test/test_timer.c           | 28 ++++++++++++----------------
 app/test/test_timer_secondary.c |  1 -
 2 files changed, 12 insertions(+), 17 deletions(-)

diff --git a/app/test/test_timer.c b/app/test/test_timer.c
index a10b2fe9da..9a3e1b53e4 100644
--- a/app/test/test_timer.c
+++ b/app/test/test_timer.c
@@ -102,7 +102,6 @@
 #include <rte_eal.h>
 #include <rte_per_lcore.h>
 #include <rte_lcore.h>
-#include <rte_atomic.h>
 #include <rte_timer.h>
 #include <rte_random.h>
 #include <rte_malloc.h>
@@ -203,7 +202,7 @@ timer_stress_main_loop(__rte_unused void *arg)
 
 /* Need to synchronize worker lcores through multiple steps. */
 enum { WORKER_WAITING = 1, WORKER_RUN_SIGNAL, WORKER_RUNNING, WORKER_FINISHED };
-static rte_atomic16_t lcore_state[RTE_MAX_LCORE];
+static uint16_t lcore_state[RTE_MAX_LCORE];
 
 static void
 main_init_workers(void)
@@ -211,7 +210,7 @@ main_init_workers(void)
 	unsigned i;
 
 	RTE_LCORE_FOREACH_WORKER(i) {
-		rte_atomic16_set(&lcore_state[i], WORKER_WAITING);
+		__atomic_store_n(&lcore_state[i], WORKER_WAITING, __ATOMIC_RELAXED);
 	}
 }
 
@@ -221,11 +220,10 @@ main_start_workers(void)
 	unsigned i;
 
 	RTE_LCORE_FOREACH_WORKER(i) {
-		rte_atomic16_set(&lcore_state[i], WORKER_RUN_SIGNAL);
+		__atomic_store_n(&lcore_state[i], WORKER_RUN_SIGNAL, __ATOMIC_RELAXED);
 	}
 	RTE_LCORE_FOREACH_WORKER(i) {
-		while (rte_atomic16_read(&lcore_state[i]) != WORKER_RUNNING)
-			rte_pause();
+		rte_wait_until_equal_16(&lcore_state[i], WORKER_RUNNING, __ATOMIC_RELAXED);
 	}
 }
 
@@ -235,8 +233,7 @@ main_wait_for_workers(void)
 	unsigned i;
 
 	RTE_LCORE_FOREACH_WORKER(i) {
-		while (rte_atomic16_read(&lcore_state[i]) != WORKER_FINISHED)
-			rte_pause();
+		rte_wait_until_equal_16(&lcore_state[i], WORKER_FINISHED, __ATOMIC_RELAXED);
 	}
 }
 
@@ -245,9 +242,8 @@ worker_wait_to_start(void)
 {
 	unsigned lcore_id = rte_lcore_id();
 
-	while (rte_atomic16_read(&lcore_state[lcore_id]) != WORKER_RUN_SIGNAL)
-		rte_pause();
-	rte_atomic16_set(&lcore_state[lcore_id], WORKER_RUNNING);
+	rte_wait_until_equal_16(&lcore_state[lcore_id], WORKER_RUN_SIGNAL, __ATOMIC_RELAXED);
+	__atomic_store_n(&lcore_state[lcore_id], WORKER_RUNNING, __ATOMIC_RELAXED);
 }
 
 static void
@@ -255,7 +251,7 @@ worker_finish(void)
 {
 	unsigned lcore_id = rte_lcore_id();
 
-	rte_atomic16_set(&lcore_state[lcore_id], WORKER_FINISHED);
+	__atomic_store_n(&lcore_state[lcore_id], WORKER_FINISHED, __ATOMIC_RELAXED);
 }
 
 
@@ -281,12 +277,12 @@ timer_stress2_main_loop(__rte_unused void *arg)
 	unsigned int lcore_id = rte_lcore_id();
 	unsigned int main_lcore = rte_get_main_lcore();
 	int32_t my_collisions = 0;
-	static rte_atomic32_t collisions;
+	static uint32_t collisions;
 
 	if (lcore_id == main_lcore) {
 		cb_count = 0;
 		test_failed = 0;
-		rte_atomic32_set(&collisions, 0);
+		__atomic_store_n(&collisions, 0, __ATOMIC_RELAXED);
 		main_init_workers();
 		timers = rte_malloc(NULL, sizeof(*timers) * NB_STRESS2_TIMERS, 0);
 		if (timers == NULL) {
@@ -315,7 +311,7 @@ timer_stress2_main_loop(__rte_unused void *arg)
 			my_collisions++;
 	}
 	if (my_collisions != 0)
-		rte_atomic32_add(&collisions, my_collisions);
+		__atomic_fetch_add(&collisions, my_collisions, __ATOMIC_RELAXED);
 
 	/* wait long enough for timers to expire */
 	rte_delay_ms(100);
@@ -329,7 +325,7 @@ timer_stress2_main_loop(__rte_unused void *arg)
 
 	/* now check that we get the right number of callbacks */
 	if (lcore_id == main_lcore) {
-		my_collisions = rte_atomic32_read(&collisions);
+		my_collisions = __atomic_load_n(&collisions, __ATOMIC_RELAXED);
 		if (my_collisions != 0)
 			printf("- %d timer reset collisions (OK)\n", my_collisions);
 		rte_timer_manage();
diff --git a/app/test/test_timer_secondary.c b/app/test/test_timer_secondary.c
index 16a9f1878b..5795c97f07 100644
--- a/app/test/test_timer_secondary.c
+++ b/app/test/test_timer_secondary.c
@@ -9,7 +9,6 @@
 #include <rte_lcore.h>
 #include <rte_debug.h>
 #include <rte_memzone.h>
-#include <rte_atomic.h>
 #include <rte_timer.h>
 #include <rte_cycles.h>
 #include <rte_mempool.h>
-- 
2.17.1


^ permalink raw reply	[flat|nested] 28+ messages in thread

* [dpdk-dev] [PATCH v1 04/12] test/stack_perf: use compiler atomics for lcore sync
  2021-08-02 10:18 [dpdk-dev] [PATCH v1 00/12] use compiler atomic builtins for app Joyce Kong
                   ` (2 preceding siblings ...)
  2021-08-02 10:18 ` [dpdk-dev] [PATCH v1 03/12] test/timer: use compiler atomic builtins for sync Joyce Kong
@ 2021-08-02 10:18 ` Joyce Kong
  2021-11-09 21:12   ` Honnappa Nagarahalli
  2021-08-02 10:18 ` [dpdk-dev] [PATCH v1 05/12] test/bpf: use compiler atomics for calculation Joyce Kong
                   ` (8 subsequent siblings)
  12 siblings, 1 reply; 28+ messages in thread
From: Joyce Kong @ 2021-08-02 10:18 UTC (permalink / raw)
  To: thomas, david.marchand, honnappa.nagarahalli, ruifeng.wang,
	konstantin.ananyev, rsanford, erik.g.carrillo, olivier.matz,
	yipeng1.wang, sameh.gobriel, bruce.richardson,
	vladimir.medvedkin, anatoly.burakov, andrew.rybchenko, jerinj,
	declan.doherty, ciara.power, xiaoyun.li, nicolas.chautru,
	maryam.tahhan, reshma.pattan, cristian.dumitrescu
  Cc: dev, nd

Convert rte_atomic usages to compiler atomic built-ins
for lcore sync in stack_perf test cases.

Signed-off-by: Joyce Kong <joyce.kong@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
---
 app/test/test_stack_perf.c | 14 ++++++--------
 1 file changed, 6 insertions(+), 8 deletions(-)

diff --git a/app/test/test_stack_perf.c b/app/test/test_stack_perf.c
index 4ee40d5d19..1eae00a334 100644
--- a/app/test/test_stack_perf.c
+++ b/app/test/test_stack_perf.c
@@ -6,7 +6,6 @@
 #include <stdio.h>
 #include <inttypes.h>
 
-#include <rte_atomic.h>
 #include <rte_cycles.h>
 #include <rte_launch.h>
 #include <rte_pause.h>
@@ -24,7 +23,7 @@
  */
 static volatile unsigned int bulk_sizes[] = {8, MAX_BURST};
 
-static rte_atomic32_t lcore_barrier;
+static uint32_t lcore_barrier;
 
 struct lcore_pair {
 	unsigned int c1;
@@ -144,9 +143,8 @@ bulk_push_pop(void *p)
 	s = args->s;
 	size = args->sz;
 
-	rte_atomic32_sub(&lcore_barrier, 1);
-	while (rte_atomic32_read(&lcore_barrier) != 0)
-		rte_pause();
+	__atomic_fetch_sub(&lcore_barrier, 1, __ATOMIC_RELAXED);
+	rte_wait_until_equal_32(&lcore_barrier, 0, __ATOMIC_RELAXED);
 
 	uint64_t start = rte_rdtsc();
 
@@ -175,7 +173,7 @@ run_on_core_pair(struct lcore_pair *cores, struct rte_stack *s,
 	unsigned int i;
 
 	for (i = 0; i < RTE_DIM(bulk_sizes); i++) {
-		rte_atomic32_set(&lcore_barrier, 2);
+		__atomic_store_n(&lcore_barrier, 2, __ATOMIC_RELAXED);
 
 		args[0].sz = args[1].sz = bulk_sizes[i];
 		args[0].s = args[1].s = s;
@@ -208,7 +206,7 @@ run_on_n_cores(struct rte_stack *s, lcore_function_t fn, int n)
 		int cnt = 0;
 		double avg;
 
-		rte_atomic32_set(&lcore_barrier, n);
+		__atomic_store_n(&lcore_barrier, n, __ATOMIC_RELAXED);
 
 		RTE_LCORE_FOREACH_WORKER(lcore_id) {
 			if (++cnt >= n)
@@ -302,7 +300,7 @@ __test_stack_perf(uint32_t flags)
 	struct lcore_pair cores;
 	struct rte_stack *s;
 
-	rte_atomic32_init(&lcore_barrier);
+	__atomic_store_n(&lcore_barrier, 0, __ATOMIC_RELAXED);
 
 	s = rte_stack_create(STACK_NAME, STACK_SIZE, rte_socket_id(), flags);
 	if (s == NULL) {
-- 
2.17.1


^ permalink raw reply	[flat|nested] 28+ messages in thread

* [dpdk-dev] [PATCH v1 05/12] test/bpf: use compiler atomics for calculation
  2021-08-02 10:18 [dpdk-dev] [PATCH v1 00/12] use compiler atomic builtins for app Joyce Kong
                   ` (3 preceding siblings ...)
  2021-08-02 10:18 ` [dpdk-dev] [PATCH v1 04/12] test/stack_perf: use compiler atomics for lcore sync Joyce Kong
@ 2021-08-02 10:18 ` Joyce Kong
  2021-08-02 10:18 ` [dpdk-dev] [PATCH v1 06/12] test/func_reentrancy: use compiler atomic for data sync Joyce Kong
                   ` (7 subsequent siblings)
  12 siblings, 0 replies; 28+ messages in thread
From: Joyce Kong @ 2021-08-02 10:18 UTC (permalink / raw)
  To: thomas, david.marchand, honnappa.nagarahalli, ruifeng.wang,
	konstantin.ananyev, rsanford, erik.g.carrillo, olivier.matz,
	yipeng1.wang, sameh.gobriel, bruce.richardson,
	vladimir.medvedkin, anatoly.burakov, andrew.rybchenko, jerinj,
	declan.doherty, ciara.power, xiaoyun.li, nicolas.chautru,
	maryam.tahhan, reshma.pattan, cristian.dumitrescu
  Cc: dev, nd

Convert rte_atomic usages to compiler atomic built-ins
for calculation in bpf test cases.

Signed-off-by: Joyce Kong <joyce.kong@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
---
 app/test/test_bpf.c | 28 ++++++++++++++--------------
 1 file changed, 14 insertions(+), 14 deletions(-)

diff --git a/app/test/test_bpf.c b/app/test/test_bpf.c
index 527c06b807..8d78e5761b 100644
--- a/app/test/test_bpf.c
+++ b/app/test/test_bpf.c
@@ -1553,32 +1553,32 @@ test_xadd1_check(uint64_t rc, const void *arg)
 	memset(&dfe, 0, sizeof(dfe));
 
 	rv = 1;
-	rte_atomic32_add((rte_atomic32_t *)&dfe.u32, rv);
-	rte_atomic64_add((rte_atomic64_t *)&dfe.u64, rv);
+	__atomic_fetch_add(&dfe.u32, rv, __ATOMIC_RELAXED);
+	__atomic_fetch_add(&dfe.u64, rv, __ATOMIC_RELAXED);
 
 	rv = -1;
-	rte_atomic32_add((rte_atomic32_t *)&dfe.u32, rv);
-	rte_atomic64_add((rte_atomic64_t *)&dfe.u64, rv);
+	__atomic_fetch_add(&dfe.u32, rv, __ATOMIC_RELAXED);
+	__atomic_fetch_add(&dfe.u64, rv, __ATOMIC_RELAXED);
 
 	rv = (int32_t)TEST_FILL_1;
-	rte_atomic32_add((rte_atomic32_t *)&dfe.u32, rv);
-	rte_atomic64_add((rte_atomic64_t *)&dfe.u64, rv);
+	__atomic_fetch_add(&dfe.u32, rv, __ATOMIC_RELAXED);
+	__atomic_fetch_add(&dfe.u64, rv, __ATOMIC_RELAXED);
 
 	rv = TEST_MUL_1;
-	rte_atomic32_add((rte_atomic32_t *)&dfe.u32, rv);
-	rte_atomic64_add((rte_atomic64_t *)&dfe.u64, rv);
+	__atomic_fetch_add(&dfe.u32, rv, __ATOMIC_RELAXED);
+	__atomic_fetch_add(&dfe.u64, rv, __ATOMIC_RELAXED);
 
 	rv = TEST_MUL_2;
-	rte_atomic32_add((rte_atomic32_t *)&dfe.u32, rv);
-	rte_atomic64_add((rte_atomic64_t *)&dfe.u64, rv);
+	__atomic_fetch_add(&dfe.u32, rv, __ATOMIC_RELAXED);
+	__atomic_fetch_add(&dfe.u64, rv, __ATOMIC_RELAXED);
 
 	rv = TEST_JCC_2;
-	rte_atomic32_add((rte_atomic32_t *)&dfe.u32, rv);
-	rte_atomic64_add((rte_atomic64_t *)&dfe.u64, rv);
+	__atomic_fetch_add(&dfe.u32, rv, __ATOMIC_RELAXED);
+	__atomic_fetch_add(&dfe.u64, rv, __ATOMIC_RELAXED);
 
 	rv = TEST_JCC_3;
-	rte_atomic32_add((rte_atomic32_t *)&dfe.u32, rv);
-	rte_atomic64_add((rte_atomic64_t *)&dfe.u64, rv);
+	__atomic_fetch_add(&dfe.u32, rv, __ATOMIC_RELAXED);
+	__atomic_fetch_add(&dfe.u64, rv, __ATOMIC_RELAXED);
 
 	return cmp_res(__func__, 1, rc, &dfe, dft, sizeof(dfe));
 }
-- 
2.17.1


^ permalink raw reply	[flat|nested] 28+ messages in thread

* [dpdk-dev] [PATCH v1 06/12] test/func_reentrancy: use compiler atomic for data sync
  2021-08-02 10:18 [dpdk-dev] [PATCH v1 00/12] use compiler atomic builtins for app Joyce Kong
                   ` (4 preceding siblings ...)
  2021-08-02 10:18 ` [dpdk-dev] [PATCH v1 05/12] test/bpf: use compiler atomics for calculation Joyce Kong
@ 2021-08-02 10:18 ` Joyce Kong
  2021-11-09 21:54   ` Honnappa Nagarahalli
  2021-08-02 10:18 ` [dpdk-dev] [PATCH v1 07/12] app/eventdev: use compiler atomic builtins for packets sync Joyce Kong
                   ` (6 subsequent siblings)
  12 siblings, 1 reply; 28+ messages in thread
From: Joyce Kong @ 2021-08-02 10:18 UTC (permalink / raw)
  To: thomas, david.marchand, honnappa.nagarahalli, ruifeng.wang,
	konstantin.ananyev, rsanford, erik.g.carrillo, olivier.matz,
	yipeng1.wang, sameh.gobriel, bruce.richardson,
	vladimir.medvedkin, anatoly.burakov, andrew.rybchenko, jerinj,
	declan.doherty, ciara.power, xiaoyun.li, nicolas.chautru,
	maryam.tahhan, reshma.pattan, cristian.dumitrescu
  Cc: dev, nd

Convert rte_atomic usages to compiler atomic built-ins
for data sync in func_reentrancy test cases.

Signed-off-by: Joyce Kong <joyce.kong@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
---
 app/test/test_func_reentrancy.c | 27 +++++++++++++--------------
 1 file changed, 13 insertions(+), 14 deletions(-)

diff --git a/app/test/test_func_reentrancy.c b/app/test/test_func_reentrancy.c
index 231c99a9eb..c00ecb8110 100644
--- a/app/test/test_func_reentrancy.c
+++ b/app/test/test_func_reentrancy.c
@@ -20,7 +20,6 @@
 #include <rte_eal.h>
 #include <rte_per_lcore.h>
 #include <rte_lcore.h>
-#include <rte_atomic.h>
 #include <rte_branch_prediction.h>
 #include <rte_ring.h>
 #include <rte_mempool.h>
@@ -54,12 +53,12 @@ typedef void (*case_clean_t)(unsigned lcore_id);
 
 #define MAX_LCORES	(RTE_MAX_MEMZONE / (MAX_ITER_MULTI * 4U))
 
-static rte_atomic32_t obj_count = RTE_ATOMIC32_INIT(0);
-static rte_atomic32_t synchro = RTE_ATOMIC32_INIT(0);
+static uint32_t obj_count;
+static uint32_t synchro;
 
 #define WAIT_SYNCHRO_FOR_WORKERS()   do { \
 	if (lcore_self != rte_get_main_lcore())                  \
-		while (rte_atomic32_read(&synchro) == 0);        \
+		rte_wait_until_equal_32(&synchro, 1, __ATOMIC_RELAXED); \
 } while(0)
 
 /*
@@ -72,7 +71,7 @@ test_eal_init_once(__rte_unused void *arg)
 
 	WAIT_SYNCHRO_FOR_WORKERS();
 
-	rte_atomic32_set(&obj_count, 1); /* silent the check in the caller */
+	__atomic_store_n(&obj_count, 1, __ATOMIC_RELAXED); /* silent the check in the caller */
 	if (rte_eal_init(0, NULL) != -1)
 		return -1;
 
@@ -112,7 +111,7 @@ ring_create_lookup(__rte_unused void *arg)
 	for (i = 0; i < MAX_ITER_ONCE; i++) {
 		rp = rte_ring_create("fr_test_once", 4096, SOCKET_ID_ANY, 0);
 		if (rp != NULL)
-			rte_atomic32_inc(&obj_count);
+			__atomic_fetch_add(&obj_count, 1, __ATOMIC_RELAXED);
 	}
 
 	/* create/lookup new ring several times */
@@ -176,7 +175,7 @@ mempool_create_lookup(__rte_unused void *arg)
 					my_obj_init, NULL,
 					SOCKET_ID_ANY, 0);
 		if (mp != NULL)
-			rte_atomic32_inc(&obj_count);
+			__atomic_fetch_add(&obj_count, 1, __ATOMIC_RELAXED);
 	}
 
 	/* create/lookup new ring several times */
@@ -239,7 +238,7 @@ hash_create_free(__rte_unused void *arg)
 	for (i = 0; i < MAX_ITER_ONCE; i++) {
 		handle = rte_hash_create(&hash_params);
 		if (handle != NULL)
-			rte_atomic32_inc(&obj_count);
+			__atomic_fetch_add(&obj_count, 1, __ATOMIC_RELAXED);
 	}
 
 	/* create mutiple times simultaneously */
@@ -303,7 +302,7 @@ fbk_create_free(__rte_unused void *arg)
 	for (i = 0; i < MAX_ITER_ONCE; i++) {
 		handle = rte_fbk_hash_create(&fbk_params);
 		if (handle != NULL)
-			rte_atomic32_inc(&obj_count);
+			__atomic_fetch_add(&obj_count, 1, __ATOMIC_RELAXED);
 	}
 
 	/* create mutiple fbk tables simultaneously */
@@ -365,7 +364,7 @@ lpm_create_free(__rte_unused void *arg)
 	for (i = 0; i < MAX_ITER_ONCE; i++) {
 		lpm = rte_lpm_create("fr_test_once",  SOCKET_ID_ANY, &config);
 		if (lpm != NULL)
-			rte_atomic32_inc(&obj_count);
+			__atomic_fetch_add(&obj_count, 1, __ATOMIC_RELAXED);
 	}
 
 	/* create mutiple fbk tables simultaneously */
@@ -427,8 +426,8 @@ launch_test(struct test_case *pt_case)
 	if (pt_case->func == NULL)
 		return -1;
 
-	rte_atomic32_set(&obj_count, 0);
-	rte_atomic32_set(&synchro, 0);
+	__atomic_store_n(&obj_count, 0, __ATOMIC_RELAXED);
+	__atomic_store_n(&synchro, 0, __ATOMIC_RELAXED);
 
 	RTE_LCORE_FOREACH_WORKER(lcore_id) {
 		if (cores == 1)
@@ -437,7 +436,7 @@ launch_test(struct test_case *pt_case)
 		rte_eal_remote_launch(pt_case->func, pt_case->arg, lcore_id);
 	}
 
-	rte_atomic32_set(&synchro, 1);
+	__atomic_store_n(&synchro, 1, __ATOMIC_RELAXED);
 
 	if (pt_case->func(pt_case->arg) < 0)
 		ret = -1;
@@ -454,7 +453,7 @@ launch_test(struct test_case *pt_case)
 			pt_case->clean(lcore_id);
 	}
 
-	count = rte_atomic32_read(&obj_count);
+	count = __atomic_load_n(&obj_count, __ATOMIC_RELAXED);
 	if (count != 1) {
 		printf("%s: common object allocated %d times (should be 1)\n",
 			pt_case->name, count);
-- 
2.17.1


^ permalink raw reply	[flat|nested] 28+ messages in thread

* [dpdk-dev] [PATCH v1 07/12] app/eventdev: use compiler atomic builtins for packets sync
  2021-08-02 10:18 [dpdk-dev] [PATCH v1 00/12] use compiler atomic builtins for app Joyce Kong
                   ` (5 preceding siblings ...)
  2021-08-02 10:18 ` [dpdk-dev] [PATCH v1 06/12] test/func_reentrancy: use compiler atomic for data sync Joyce Kong
@ 2021-08-02 10:18 ` Joyce Kong
  2021-11-10 23:19   ` Honnappa Nagarahalli
  2021-08-02 10:18 ` [dpdk-dev] [PATCH v1 08/12] app/crypto: use compiler atomic builtins for display sync Joyce Kong
                   ` (5 subsequent siblings)
  12 siblings, 1 reply; 28+ messages in thread
From: Joyce Kong @ 2021-08-02 10:18 UTC (permalink / raw)
  To: thomas, david.marchand, honnappa.nagarahalli, ruifeng.wang,
	konstantin.ananyev, rsanford, erik.g.carrillo, olivier.matz,
	yipeng1.wang, sameh.gobriel, bruce.richardson,
	vladimir.medvedkin, anatoly.burakov, andrew.rybchenko, jerinj,
	declan.doherty, ciara.power, xiaoyun.li, nicolas.chautru,
	maryam.tahhan, reshma.pattan, cristian.dumitrescu
  Cc: dev, nd

Convert rte_atomic usages to compiler atomic built-ins
for outstanding_pkts sync in eventdev cases.

Signed-off-by: Joyce Kong <joyce.kong@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
---
 app/test-eventdev/evt_main.c          | 1 -
 app/test-eventdev/test_order_atq.c    | 4 ++--
 app/test-eventdev/test_order_common.c | 4 ++--
 app/test-eventdev/test_order_common.h | 8 ++++----
 app/test-eventdev/test_order_queue.c  | 4 ++--
 5 files changed, 10 insertions(+), 11 deletions(-)

diff --git a/app/test-eventdev/evt_main.c b/app/test-eventdev/evt_main.c
index a8d304bab3..01434691e1 100644
--- a/app/test-eventdev/evt_main.c
+++ b/app/test-eventdev/evt_main.c
@@ -6,7 +6,6 @@
 #include <unistd.h>
 #include <signal.h>
 
-#include <rte_atomic.h>
 #include <rte_debug.h>
 #include <rte_eal.h>
 #include <rte_eventdev.h>
diff --git a/app/test-eventdev/test_order_atq.c b/app/test-eventdev/test_order_atq.c
index 71215a07b6..2fee4b4daa 100644
--- a/app/test-eventdev/test_order_atq.c
+++ b/app/test-eventdev/test_order_atq.c
@@ -28,7 +28,7 @@ order_atq_worker(void *arg, const bool flow_id_cap)
 		uint16_t event = rte_event_dequeue_burst(dev_id, port,
 					&ev, 1, 0);
 		if (!event) {
-			if (rte_atomic64_read(outstand_pkts) <= 0)
+			if (__atomic_load_n(outstand_pkts, __ATOMIC_RELAXED) <= 0)
 				break;
 			rte_pause();
 			continue;
@@ -64,7 +64,7 @@ order_atq_worker_burst(void *arg, const bool flow_id_cap)
 				BURST_SIZE, 0);
 
 		if (nb_rx == 0) {
-			if (rte_atomic64_read(outstand_pkts) <= 0)
+			if (__atomic_load_n(outstand_pkts, __ATOMIC_RELAXED) <= 0)
 				break;
 			rte_pause();
 			continue;
diff --git a/app/test-eventdev/test_order_common.c b/app/test-eventdev/test_order_common.c
index d7760061ba..ff7813f9c2 100644
--- a/app/test-eventdev/test_order_common.c
+++ b/app/test-eventdev/test_order_common.c
@@ -187,7 +187,7 @@ order_test_setup(struct evt_test *test, struct evt_options *opt)
 		evt_err("failed to allocate t->expected_flow_seq memory");
 		goto exp_nomem;
 	}
-	rte_atomic64_set(&t->outstand_pkts, opt->nb_pkts);
+	__atomic_store_n(&t->outstand_pkts, opt->nb_pkts, __ATOMIC_RELAXED);
 	t->err = false;
 	t->nb_pkts = opt->nb_pkts;
 	t->nb_flows = opt->nb_flows;
@@ -294,7 +294,7 @@ order_launch_lcores(struct evt_test *test, struct evt_options *opt,
 
 	while (t->err == false) {
 		uint64_t new_cycles = rte_get_timer_cycles();
-		int64_t remaining = rte_atomic64_read(&t->outstand_pkts);
+		int64_t remaining = __atomic_load_n(&t->outstand_pkts, __ATOMIC_RELAXED);
 
 		if (remaining <= 0) {
 			t->result = EVT_TEST_SUCCESS;
diff --git a/app/test-eventdev/test_order_common.h b/app/test-eventdev/test_order_common.h
index cd9d6009ec..1507265928 100644
--- a/app/test-eventdev/test_order_common.h
+++ b/app/test-eventdev/test_order_common.h
@@ -48,7 +48,7 @@ struct test_order {
 	 * The atomic_* is an expensive operation,Since it is a functional test,
 	 * We are using the atomic_ operation to reduce the code complexity.
 	 */
-	rte_atomic64_t outstand_pkts;
+	uint64_t outstand_pkts;
 	enum evt_test_result result;
 	uint32_t nb_flows;
 	uint64_t nb_pkts;
@@ -95,7 +95,7 @@ static __rte_always_inline void
 order_process_stage_1(struct test_order *const t,
 		struct rte_event *const ev, const uint32_t nb_flows,
 		uint32_t *const expected_flow_seq,
-		rte_atomic64_t *const outstand_pkts)
+		uint64_t *const outstand_pkts)
 {
 	const uint32_t flow = (uintptr_t)ev->mbuf % nb_flows;
 	/* compare the seqn against expected value */
@@ -113,7 +113,7 @@ order_process_stage_1(struct test_order *const t,
 	 */
 	expected_flow_seq[flow]++;
 	rte_pktmbuf_free(ev->mbuf);
-	rte_atomic64_sub(outstand_pkts, 1);
+	__atomic_fetch_sub(outstand_pkts, 1, __ATOMIC_RELAXED);
 }
 
 static __rte_always_inline void
@@ -132,7 +132,7 @@ order_process_stage_invalid(struct test_order *const t,
 	const uint8_t port = w->port_id;\
 	const uint32_t nb_flows = t->nb_flows;\
 	uint32_t *expected_flow_seq = t->expected_flow_seq;\
-	rte_atomic64_t *outstand_pkts = &t->outstand_pkts;\
+	uint64_t *outstand_pkts = &t->outstand_pkts;\
 	if (opt->verbose_level > 1)\
 		printf("%s(): lcore %d dev_id %d port=%d\n",\
 			__func__, rte_lcore_id(), dev_id, port)
diff --git a/app/test-eventdev/test_order_queue.c b/app/test-eventdev/test_order_queue.c
index 621367805a..80eaea5cf5 100644
--- a/app/test-eventdev/test_order_queue.c
+++ b/app/test-eventdev/test_order_queue.c
@@ -28,7 +28,7 @@ order_queue_worker(void *arg, const bool flow_id_cap)
 		uint16_t event = rte_event_dequeue_burst(dev_id, port,
 					&ev, 1, 0);
 		if (!event) {
-			if (rte_atomic64_read(outstand_pkts) <= 0)
+			if (__atomic_load_n(outstand_pkts, __ATOMIC_RELAXED) <= 0)
 				break;
 			rte_pause();
 			continue;
@@ -64,7 +64,7 @@ order_queue_worker_burst(void *arg, const bool flow_id_cap)
 				BURST_SIZE, 0);
 
 		if (nb_rx == 0) {
-			if (rte_atomic64_read(outstand_pkts) <= 0)
+			if (__atomic_load_n(outstand_pkts, __ATOMIC_RELAXED) <= 0)
 				break;
 			rte_pause();
 			continue;
-- 
2.17.1


^ permalink raw reply	[flat|nested] 28+ messages in thread

* [dpdk-dev] [PATCH v1 08/12] app/crypto: use compiler atomic builtins for display sync
  2021-08-02 10:18 [dpdk-dev] [PATCH v1 00/12] use compiler atomic builtins for app Joyce Kong
                   ` (6 preceding siblings ...)
  2021-08-02 10:18 ` [dpdk-dev] [PATCH v1 07/12] app/eventdev: use compiler atomic builtins for packets sync Joyce Kong
@ 2021-08-02 10:18 ` Joyce Kong
  2021-11-09 22:11   ` Honnappa Nagarahalli
  2021-08-02 10:18 ` [dpdk-dev] [PATCH v1 09/12] app/compress: " Joyce Kong
                   ` (4 subsequent siblings)
  12 siblings, 1 reply; 28+ messages in thread
From: Joyce Kong @ 2021-08-02 10:18 UTC (permalink / raw)
  To: thomas, david.marchand, honnappa.nagarahalli, ruifeng.wang,
	konstantin.ananyev, rsanford, erik.g.carrillo, olivier.matz,
	yipeng1.wang, sameh.gobriel, bruce.richardson,
	vladimir.medvedkin, anatoly.burakov, andrew.rybchenko, jerinj,
	declan.doherty, ciara.power, xiaoyun.li, nicolas.chautru,
	maryam.tahhan, reshma.pattan, cristian.dumitrescu
  Cc: dev, nd

Covert rte_atomic_test_and_set usage to compiler atomic
CAS operation for display sync in crypto cases.

Signed-off-by: Joyce Kong <joyce.kong@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
---
 app/test-crypto-perf/cperf_test_latency.c        | 6 ++++--
 app/test-crypto-perf/cperf_test_pmd_cyclecount.c | 9 ++++++---
 app/test-crypto-perf/cperf_test_throughput.c     | 9 ++++++---
 app/test-crypto-perf/cperf_test_verify.c         | 9 ++++++---
 4 files changed, 22 insertions(+), 11 deletions(-)

diff --git a/app/test-crypto-perf/cperf_test_latency.c b/app/test-crypto-perf/cperf_test_latency.c
index 159fe8492b..5e73c75fba 100644
--- a/app/test-crypto-perf/cperf_test_latency.c
+++ b/app/test-crypto-perf/cperf_test_latency.c
@@ -126,7 +126,7 @@ cperf_latency_test_runner(void *arg)
 	uint8_t burst_size_idx = 0;
 	uint32_t imix_idx = 0;
 
-	static rte_atomic16_t display_once = RTE_ATOMIC16_INIT(0);
+	static uint16_t display_once;
 
 	if (ctx == NULL)
 		return 0;
@@ -308,7 +308,9 @@ cperf_latency_test_runner(void *arg)
 		time_min = tunit*(double)(tsc_min) / tsc_hz;
 
 		if (ctx->options->csv) {
-			if (rte_atomic16_test_and_set(&display_once))
+			uint16_t exp = 0;
+			if (__atomic_compare_exchange_n(&display_once, &exp, 1, 0,
+					__ATOMIC_RELAXED, __ATOMIC_RELAXED))
 				printf("\n# lcore, Buffer Size, Burst Size, Pakt Seq #, "
 						"cycles, time (us)");
 
diff --git a/app/test-crypto-perf/cperf_test_pmd_cyclecount.c b/app/test-crypto-perf/cperf_test_pmd_cyclecount.c
index 844659aeca..a1de334efb 100644
--- a/app/test-crypto-perf/cperf_test_pmd_cyclecount.c
+++ b/app/test-crypto-perf/cperf_test_pmd_cyclecount.c
@@ -404,7 +404,7 @@ cperf_pmd_cyclecount_test_runner(void *test_ctx)
 	state.lcore = rte_lcore_id();
 	state.linearize = 0;
 
-	static rte_atomic16_t display_once = RTE_ATOMIC16_INIT(0);
+	static uint16_t display_once;
 	static bool warmup = true;
 
 	/*
@@ -449,8 +449,10 @@ cperf_pmd_cyclecount_test_runner(void *test_ctx)
 			continue;
 		}
 
+		uint16_t exp = 0;
 		if (!opts->csv) {
-			if (rte_atomic16_test_and_set(&display_once))
+			if (__atomic_compare_exchange_n(&display_once, &exp, 1, 0,
+					__ATOMIC_RELAXED, __ATOMIC_RELAXED))
 				printf(PRETTY_HDR_FMT, "lcore id", "Buf Size",
 						"Burst Size", "Enqueued",
 						"Dequeued", "Enq Retries",
@@ -466,7 +468,8 @@ cperf_pmd_cyclecount_test_runner(void *test_ctx)
 					state.cycles_per_enq,
 					state.cycles_per_deq);
 		} else {
-			if (rte_atomic16_test_and_set(&display_once))
+			if (__atomic_compare_exchange_n(&display_once, &exp, 1, 0,
+					__ATOMIC_RELAXED, __ATOMIC_RELAXED))
 				printf(CSV_HDR_FMT, "# lcore id", "Buf Size",
 						"Burst Size", "Enqueued",
 						"Dequeued", "Enq Retries",
diff --git a/app/test-crypto-perf/cperf_test_throughput.c b/app/test-crypto-perf/cperf_test_throughput.c
index f6eb8cf259..1407007c6e 100644
--- a/app/test-crypto-perf/cperf_test_throughput.c
+++ b/app/test-crypto-perf/cperf_test_throughput.c
@@ -106,7 +106,7 @@ cperf_throughput_test_runner(void *test_ctx)
 	uint8_t burst_size_idx = 0;
 	uint32_t imix_idx = 0;
 
-	static rte_atomic16_t display_once = RTE_ATOMIC16_INIT(0);
+	static uint16_t display_once;
 
 	struct rte_crypto_op *ops[ctx->options->max_burst_size];
 	struct rte_crypto_op *ops_processed[ctx->options->max_burst_size];
@@ -272,8 +272,10 @@ cperf_throughput_test_runner(void *test_ctx)
 		double cycles_per_packet = ((double)tsc_duration /
 				ctx->options->total_ops);
 
+		uint16_t exp = 0;
 		if (!ctx->options->csv) {
-			if (rte_atomic16_test_and_set(&display_once))
+			if (__atomic_compare_exchange_n(&display_once, &exp, 1, 0,
+					__ATOMIC_RELAXED, __ATOMIC_RELAXED))
 				printf("%12s%12s%12s%12s%12s%12s%12s%12s%12s%12s\n\n",
 					"lcore id", "Buf Size", "Burst Size",
 					"Enqueued", "Dequeued", "Failed Enq",
@@ -293,7 +295,8 @@ cperf_throughput_test_runner(void *test_ctx)
 					throughput_gbps,
 					cycles_per_packet);
 		} else {
-			if (rte_atomic16_test_and_set(&display_once))
+			if (__atomic_compare_exchange_n(&display_once, &exp, 1, 0,
+					__ATOMIC_RELAXED, __ATOMIC_RELAXED))
 				printf("#lcore id,Buffer Size(B),"
 					"Burst Size,Enqueued,Dequeued,Failed Enq,"
 					"Failed Deq,Ops(Millions),Throughput(Gbps),"
diff --git a/app/test-crypto-perf/cperf_test_verify.c b/app/test-crypto-perf/cperf_test_verify.c
index 2939aeaa93..0c053ad3c0 100644
--- a/app/test-crypto-perf/cperf_test_verify.c
+++ b/app/test-crypto-perf/cperf_test_verify.c
@@ -241,7 +241,7 @@ cperf_verify_test_runner(void *test_ctx)
 	uint64_t ops_deqd = 0, ops_deqd_total = 0, ops_deqd_failed = 0;
 	uint64_t ops_failed = 0;
 
-	static rte_atomic16_t display_once = RTE_ATOMIC16_INIT(0);
+	static uint16_t display_once;
 
 	uint64_t i;
 	uint16_t ops_unused = 0;
@@ -383,8 +383,10 @@ cperf_verify_test_runner(void *test_ctx)
 		ops_deqd_total += ops_deqd;
 	}
 
+	uint16_t exp = 0;
 	if (!ctx->options->csv) {
-		if (rte_atomic16_test_and_set(&display_once))
+		if (__atomic_compare_exchange_n(&display_once, &exp, 1, 0,
+				__ATOMIC_RELAXED, __ATOMIC_RELAXED))
 			printf("%12s%12s%12s%12s%12s%12s%12s%12s\n\n",
 				"lcore id", "Buf Size", "Burst size",
 				"Enqueued", "Dequeued", "Failed Enq",
@@ -401,7 +403,8 @@ cperf_verify_test_runner(void *test_ctx)
 				ops_deqd_failed,
 				ops_failed);
 	} else {
-		if (rte_atomic16_test_and_set(&display_once))
+		if (__atomic_compare_exchange_n(&display_once, &exp, 1, 0,
+				__ATOMIC_RELAXED, __ATOMIC_RELAXED))
 			printf("\n# lcore id, Buffer Size(B), "
 				"Burst Size,Enqueued,Dequeued,Failed Enq,"
 				"Failed Deq,Failed Ops\n");
-- 
2.17.1


^ permalink raw reply	[flat|nested] 28+ messages in thread

* [dpdk-dev] [PATCH v1 09/12] app/compress: use compiler atomic builtins for display sync
  2021-08-02 10:18 [dpdk-dev] [PATCH v1 00/12] use compiler atomic builtins for app Joyce Kong
                   ` (7 preceding siblings ...)
  2021-08-02 10:18 ` [dpdk-dev] [PATCH v1 08/12] app/crypto: use compiler atomic builtins for display sync Joyce Kong
@ 2021-08-02 10:18 ` Joyce Kong
  2021-11-09 22:59   ` Honnappa Nagarahalli
  2021-08-02 10:18 ` [dpdk-dev] [PATCH v1 10/12] app/testpmd: use compiler atomic builtins for port sync Joyce Kong
                   ` (3 subsequent siblings)
  12 siblings, 1 reply; 28+ messages in thread
From: Joyce Kong @ 2021-08-02 10:18 UTC (permalink / raw)
  To: thomas, david.marchand, honnappa.nagarahalli, ruifeng.wang,
	konstantin.ananyev, rsanford, erik.g.carrillo, olivier.matz,
	yipeng1.wang, sameh.gobriel, bruce.richardson,
	vladimir.medvedkin, anatoly.burakov, andrew.rybchenko, jerinj,
	declan.doherty, ciara.power, xiaoyun.li, nicolas.chautru,
	maryam.tahhan, reshma.pattan, cristian.dumitrescu
  Cc: dev, nd

Convert rte_atomic_test_and_set usage to compiler atomic
CAS operation for display sync.

Signed-off-by: Joyce Kong <joyce.kong@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
---
 app/test-compress-perf/comp_perf_test_common.h     |  2 +-
 app/test-compress-perf/comp_perf_test_cyclecount.c | 10 +++++++---
 app/test-compress-perf/comp_perf_test_throughput.c | 11 ++++++++---
 app/test-compress-perf/comp_perf_test_verify.c     |  6 ++++--
 4 files changed, 20 insertions(+), 9 deletions(-)

diff --git a/app/test-compress-perf/comp_perf_test_common.h b/app/test-compress-perf/comp_perf_test_common.h
index 72705c6a2b..d039e5a29a 100644
--- a/app/test-compress-perf/comp_perf_test_common.h
+++ b/app/test-compress-perf/comp_perf_test_common.h
@@ -14,7 +14,7 @@ struct cperf_mem_resources {
 	uint16_t qp_id;
 	uint8_t lcore_id;
 
-	rte_atomic16_t print_info_once;
+	uint16_t print_info_once;
 
 	uint32_t total_bufs;
 	uint8_t *compressed_data;
diff --git a/app/test-compress-perf/comp_perf_test_cyclecount.c b/app/test-compress-perf/comp_perf_test_cyclecount.c
index 55559a7d5a..e002e53bdf 100644
--- a/app/test-compress-perf/comp_perf_test_cyclecount.c
+++ b/app/test-compress-perf/comp_perf_test_cyclecount.c
@@ -468,7 +468,7 @@ cperf_cyclecount_test_runner(void *test_ctx)
 	struct cperf_cyclecount_ctx *ctx = test_ctx;
 	struct comp_test_data *test_data = ctx->ver.options;
 	uint32_t lcore = rte_lcore_id();
-	static rte_atomic16_t display_once = RTE_ATOMIC16_INIT(0);
+	static uint16_t display_once;
 	static rte_spinlock_t print_spinlock;
 	int i;
 
@@ -488,10 +488,12 @@ cperf_cyclecount_test_runner(void *test_ctx)
 
 	ctx->ver.mem.lcore_id = lcore;
 
+	uint16_t exp = 0;
 	/*
 	 * printing information about current compression thread
 	 */
-	if (rte_atomic16_test_and_set(&ctx->ver.mem.print_info_once))
+	if (__atomic_compare_exchange_n(&ctx->ver.mem.print_info_once, &exp,
+				1, 0, __ATOMIC_RELAXED,  __ATOMIC_RELAXED))
 		printf("    lcore: %u,"
 				" driver name: %s,"
 				" device name: %s,"
@@ -547,8 +549,10 @@ cperf_cyclecount_test_runner(void *test_ctx)
 	duration_setup_per_op = ctx->duration_op /
 			(ctx->ver.mem.total_bufs * test_data->num_iter);
 
+	exp = 0;
 	/* R E P O R T processing */
-	if (rte_atomic16_test_and_set(&display_once)) {
+	if (__atomic_compare_exchange_n(&display_once, &exp, 1, 0,
+			__ATOMIC_RELAXED, __ATOMIC_RELAXED)) {
 
 		rte_spinlock_lock(&print_spinlock);
 
diff --git a/app/test-compress-perf/comp_perf_test_throughput.c b/app/test-compress-perf/comp_perf_test_throughput.c
index 13922b658c..f587ad2ec3 100644
--- a/app/test-compress-perf/comp_perf_test_throughput.c
+++ b/app/test-compress-perf/comp_perf_test_throughput.c
@@ -329,15 +329,18 @@ cperf_throughput_test_runner(void *test_ctx)
 	struct cperf_benchmark_ctx *ctx = test_ctx;
 	struct comp_test_data *test_data = ctx->ver.options;
 	uint32_t lcore = rte_lcore_id();
-	static rte_atomic16_t display_once = RTE_ATOMIC16_INIT(0);
+	static uint16_t display_once;
 	int i, ret = EXIT_SUCCESS;
 
 	ctx->ver.mem.lcore_id = lcore;
 
+	uint16_t exp = 0;
 	/*
 	 * printing information about current compression thread
 	 */
-	if (rte_atomic16_test_and_set(&ctx->ver.mem.print_info_once))
+	if (__atomic_compare_exchange_n(&ctx->ver.mem.print_info_once, &exp,
+				1, 0, __ATOMIC_RELAXED, __ATOMIC_RELAXED))
+
 		printf("    lcore: %u,"
 				" driver name: %s,"
 				" device name: %s,"
@@ -391,7 +394,9 @@ cperf_throughput_test_runner(void *test_ctx)
 	ctx->decomp_gbps = rte_get_tsc_hz() / ctx->decomp_tsc_byte * 8 /
 			1000000000;
 
-	if (rte_atomic16_test_and_set(&display_once)) {
+	exp = 0;
+	if (__atomic_compare_exchange_n(&display_once, &exp, 1, 0,
+				__ATOMIC_RELAXED, __ATOMIC_RELAXED)) {
 		printf("\n%12s%6s%12s%17s%15s%16s\n",
 			"lcore id", "Level", "Comp size", "Comp ratio [%]",
 			"Comp [Gbps]", "Decomp [Gbps]");
diff --git a/app/test-compress-perf/comp_perf_test_verify.c b/app/test-compress-perf/comp_perf_test_verify.c
index 5e13257b79..6a2497985b 100644
--- a/app/test-compress-perf/comp_perf_test_verify.c
+++ b/app/test-compress-perf/comp_perf_test_verify.c
@@ -388,7 +388,7 @@ cperf_verify_test_runner(void *test_ctx)
 	struct cperf_verify_ctx *ctx = test_ctx;
 	struct comp_test_data *test_data = ctx->options;
 	int ret = EXIT_SUCCESS;
-	static rte_atomic16_t display_once = RTE_ATOMIC16_INIT(0);
+	static uint16_t display_once;
 	uint32_t lcore = rte_lcore_id();
 
 	ctx->mem.lcore_id = lcore;
@@ -428,7 +428,9 @@ cperf_verify_test_runner(void *test_ctx)
 			test_data->input_data_sz * 100;
 
 	if (!ctx->silent) {
-		if (rte_atomic16_test_and_set(&display_once)) {
+		uint16_t exp = 0;
+		if (__atomic_compare_exchange_n(&display_once, &exp, 1, 0,
+				__ATOMIC_RELAXED, __ATOMIC_RELAXED)) {
 			printf("%12s%6s%12s%17s\n",
 			    "lcore id", "Level", "Comp size", "Comp ratio [%]");
 		}
-- 
2.17.1


^ permalink raw reply	[flat|nested] 28+ messages in thread

* [dpdk-dev] [PATCH v1 10/12] app/testpmd: use compiler atomic builtins for port sync
  2021-08-02 10:18 [dpdk-dev] [PATCH v1 00/12] use compiler atomic builtins for app Joyce Kong
                   ` (8 preceding siblings ...)
  2021-08-02 10:18 ` [dpdk-dev] [PATCH v1 09/12] app/compress: " Joyce Kong
@ 2021-08-02 10:18 ` Joyce Kong
  2021-11-09 23:14   ` Honnappa Nagarahalli
  2021-08-02 10:18 ` [dpdk-dev] [PATCH v1 11/12] app/bbdev: use compiler atomics for thread sync Joyce Kong
                   ` (2 subsequent siblings)
  12 siblings, 1 reply; 28+ messages in thread
From: Joyce Kong @ 2021-08-02 10:18 UTC (permalink / raw)
  To: thomas, david.marchand, honnappa.nagarahalli, ruifeng.wang,
	konstantin.ananyev, rsanford, erik.g.carrillo, olivier.matz,
	yipeng1.wang, sameh.gobriel, bruce.richardson,
	vladimir.medvedkin, anatoly.burakov, andrew.rybchenko, jerinj,
	declan.doherty, ciara.power, xiaoyun.li, nicolas.chautru,
	maryam.tahhan, reshma.pattan, cristian.dumitrescu
  Cc: dev, nd

Replace rte_atomic16_cmpset operation with compiler atomic
CAS operation for port status sync in testpmd case.

Signed-off-by: Joyce Kong <joyce.kong@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
---
 app/test-pmd/testpmd.c | 75 +++++++++++++++++++++++++++---------------
 1 file changed, 48 insertions(+), 27 deletions(-)

diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
index 6cbe9ba3c8..22579dd438 100644
--- a/app/test-pmd/testpmd.c
+++ b/app/test-pmd/testpmd.c
@@ -36,7 +36,6 @@
 #include <rte_alarm.h>
 #include <rte_per_lcore.h>
 #include <rte_lcore.h>
-#include <rte_atomic.h>
 #include <rte_branch_prediction.h>
 #include <rte_mempool.h>
 #include <rte_malloc.h>
@@ -2302,6 +2301,7 @@ setup_hairpin_queues(portid_t pi, portid_t p_pi, uint16_t cnt_pi)
 	uint16_t peer_tx_port = pi;
 	uint32_t manual = 1;
 	uint32_t tx_exp = hairpin_mode & 0x10;
+	uint16_t port_exp;
 
 	if (!(hairpin_mode & 0xf)) {
 		peer_rx_port = pi;
@@ -2347,10 +2347,11 @@ setup_hairpin_queues(portid_t pi, portid_t p_pi, uint16_t cnt_pi)
 		if (diag == 0)
 			continue;
 
+		port_exp = RTE_PORT_HANDLING;
 		/* Fail to setup rx queue, return */
-		if (rte_atomic16_cmpset(&(port->port_status),
-					RTE_PORT_HANDLING,
-					RTE_PORT_STOPPED) == 0)
+		if (__atomic_compare_exchange_n(&(port->port_status),
+				&port_exp, RTE_PORT_STOPPED, 0,
+				__ATOMIC_RELAXED, __ATOMIC_RELAXED) == 0)
 			fprintf(stderr,
 				"Port %d can not be set back to stopped\n", pi);
 		fprintf(stderr, "Fail to configure port %d hairpin queues\n",
@@ -2370,10 +2371,11 @@ setup_hairpin_queues(portid_t pi, portid_t p_pi, uint16_t cnt_pi)
 		if (diag == 0)
 			continue;
 
+		port_exp = RTE_PORT_HANDLING;
 		/* Fail to setup rx queue, return */
-		if (rte_atomic16_cmpset(&(port->port_status),
-					RTE_PORT_HANDLING,
-					RTE_PORT_STOPPED) == 0)
+		if (__atomic_compare_exchange_n(&(port->port_status),
+				&port_exp, RTE_PORT_STOPPED, 0,
+				__ATOMIC_RELAXED, __ATOMIC_RELAXED) == 0)
 			fprintf(stderr,
 				"Port %d can not be set back to stopped\n", pi);
 		fprintf(stderr, "Fail to configure port %d hairpin queues\n",
@@ -2444,6 +2446,7 @@ start_port(portid_t pid)
 	queueid_t qi;
 	struct rte_port *port;
 	struct rte_eth_hairpin_cap cap;
+	uint16_t port_exp;
 
 	if (port_id_is_invalid(pid, ENABLED_WARN))
 		return 0;
@@ -2454,8 +2457,10 @@ start_port(portid_t pid)
 
 		need_check_link_status = 0;
 		port = &ports[pi];
-		if (rte_atomic16_cmpset(&(port->port_status), RTE_PORT_STOPPED,
-						 RTE_PORT_HANDLING) == 0) {
+		port_exp = RTE_PORT_STOPPED;
+		if (__atomic_compare_exchange_n(&(port->port_status),
+				&port_exp, RTE_PORT_HANDLING, 0,
+				__ATOMIC_RELAXED, __ATOMIC_RELAXED) == 0) {
 			fprintf(stderr, "Port %d is now not stopped\n", pi);
 			continue;
 		}
@@ -2487,8 +2492,10 @@ start_port(portid_t pid)
 						     nb_txq + nb_hairpinq,
 						     &(port->dev_conf));
 			if (diag != 0) {
-				if (rte_atomic16_cmpset(&(port->port_status),
-				RTE_PORT_HANDLING, RTE_PORT_STOPPED) == 0)
+				port_exp = RTE_PORT_HANDLING;
+				if (__atomic_compare_exchange_n(&(port->port_status),
+						&port_exp, RTE_PORT_STOPPED, 0,
+						__ATOMIC_RELAXED, __ATOMIC_RELAXED) == 0)
 					fprintf(stderr,
 						"Port %d can not be set back to stopped\n",
 						pi);
@@ -2518,10 +2525,11 @@ start_port(portid_t pid)
 				if (diag == 0)
 					continue;
 
+				port_exp = RTE_PORT_HANDLING;
 				/* Fail to setup tx queue, return */
-				if (rte_atomic16_cmpset(&(port->port_status),
-							RTE_PORT_HANDLING,
-							RTE_PORT_STOPPED) == 0)
+				if (__atomic_compare_exchange_n(&(port->port_status),
+						&port_exp, RTE_PORT_STOPPED, 0,
+						__ATOMIC_RELAXED, __ATOMIC_RELAXED) == 0)
 					fprintf(stderr,
 						"Port %d can not be set back to stopped\n",
 						pi);
@@ -2570,10 +2578,11 @@ start_port(portid_t pid)
 				if (diag == 0)
 					continue;
 
+				port_exp = RTE_PORT_HANDLING;
 				/* Fail to setup rx queue, return */
-				if (rte_atomic16_cmpset(&(port->port_status),
-							RTE_PORT_HANDLING,
-							RTE_PORT_STOPPED) == 0)
+				if (__atomic_compare_exchange_n(&(port->port_status),
+						&port_exp, RTE_PORT_STOPPED, 0,
+						__ATOMIC_RELAXED, __ATOMIC_RELAXED) == 0)
 					fprintf(stderr,
 						"Port %d can not be set back to stopped\n",
 						pi);
@@ -2607,17 +2616,21 @@ start_port(portid_t pid)
 			fprintf(stderr, "Fail to start port %d: %s\n",
 				pi, rte_strerror(-diag));
 
+			port_exp = RTE_PORT_HANDLING;
 			/* Fail to setup rx queue, return */
-			if (rte_atomic16_cmpset(&(port->port_status),
-				RTE_PORT_HANDLING, RTE_PORT_STOPPED) == 0)
+			if (__atomic_compare_exchange_n(&(port->port_status),
+					&port_exp, RTE_PORT_STOPPED, 0,
+					__ATOMIC_RELAXED, __ATOMIC_RELAXED) == 0)
 				fprintf(stderr,
 					"Port %d can not be set back to stopped\n",
 					pi);
 			continue;
 		}
 
-		if (rte_atomic16_cmpset(&(port->port_status),
-			RTE_PORT_HANDLING, RTE_PORT_STARTED) == 0)
+		port_exp = RTE_PORT_HANDLING;
+		if (__atomic_compare_exchange_n(&(port->port_status),
+				&port_exp, RTE_PORT_STARTED, 0,
+				__ATOMIC_RELAXED, __ATOMIC_RELAXED) == 0)
 			fprintf(stderr, "Port %d can not be set into started\n",
 				pi);
 
@@ -2697,6 +2710,7 @@ stop_port(portid_t pid)
 	int need_check_link_status = 0;
 	portid_t peer_pl[RTE_MAX_ETHPORTS];
 	int peer_pi;
+	uint16_t port_exp;
 
 	if (port_id_is_invalid(pid, ENABLED_WARN))
 		return;
@@ -2722,8 +2736,10 @@ stop_port(portid_t pid)
 		}
 
 		port = &ports[pi];
-		if (rte_atomic16_cmpset(&(port->port_status), RTE_PORT_STARTED,
-						RTE_PORT_HANDLING) == 0)
+		port_exp = RTE_PORT_STARTED;
+		if (__atomic_compare_exchange_n(&(port->port_status),
+				&port_exp, RTE_PORT_HANDLING, 0,
+				__ATOMIC_RELAXED, __ATOMIC_RELAXED) == 0)
 			continue;
 
 		if (hairpin_mode & 0xf) {
@@ -2749,8 +2765,10 @@ stop_port(portid_t pid)
 			RTE_LOG(ERR, EAL, "rte_eth_dev_stop failed for port %u\n",
 				pi);
 
-		if (rte_atomic16_cmpset(&(port->port_status),
-			RTE_PORT_HANDLING, RTE_PORT_STOPPED) == 0)
+		port_exp = RTE_PORT_HANDLING;
+		if (__atomic_compare_exchange_n(&(port->port_status),
+				&port_exp, RTE_PORT_STOPPED, 0,
+				__ATOMIC_RELAXED, __ATOMIC_RELAXED) == 0)
 			fprintf(stderr, "Port %d can not be set into stopped\n",
 				pi);
 		need_check_link_status = 1;
@@ -2788,6 +2806,7 @@ close_port(portid_t pid)
 {
 	portid_t pi;
 	struct rte_port *port;
+	uint16_t port_exp;
 
 	if (port_id_is_invalid(pid, ENABLED_WARN))
 		return;
@@ -2813,8 +2832,10 @@ close_port(portid_t pid)
 		}
 
 		port = &ports[pi];
-		if (rte_atomic16_cmpset(&(port->port_status),
-			RTE_PORT_CLOSED, RTE_PORT_CLOSED) == 1) {
+		port_exp = RTE_PORT_CLOSED;
+		if (__atomic_compare_exchange_n(&(port->port_status),
+				&port_exp, RTE_PORT_CLOSED, 0,
+				__ATOMIC_RELAXED, __ATOMIC_RELAXED)) {
 			fprintf(stderr, "Port %d is already closed\n", pi);
 			continue;
 		}
-- 
2.17.1


^ permalink raw reply	[flat|nested] 28+ messages in thread

* [dpdk-dev] [PATCH v1 11/12] app/bbdev: use compiler atomics for thread sync
  2021-08-02 10:18 [dpdk-dev] [PATCH v1 00/12] use compiler atomic builtins for app Joyce Kong
                   ` (9 preceding siblings ...)
  2021-08-02 10:18 ` [dpdk-dev] [PATCH v1 10/12] app/testpmd: use compiler atomic builtins for port sync Joyce Kong
@ 2021-08-02 10:18 ` Joyce Kong
  2021-11-10 21:25   ` Honnappa Nagarahalli
  2021-08-02 10:18 ` [dpdk-dev] [PATCH v1 12/12] app: remove unnecessary include of atomic Joyce Kong
  2021-10-21  6:35 ` [dpdk-dev] [PATCH v1 00/12] use compiler atomic builtins for app Joyce Kong
  12 siblings, 1 reply; 28+ messages in thread
From: Joyce Kong @ 2021-08-02 10:18 UTC (permalink / raw)
  To: thomas, david.marchand, honnappa.nagarahalli, ruifeng.wang,
	konstantin.ananyev, rsanford, erik.g.carrillo, olivier.matz,
	yipeng1.wang, sameh.gobriel, bruce.richardson,
	vladimir.medvedkin, anatoly.burakov, andrew.rybchenko, jerinj,
	declan.doherty, ciara.power, xiaoyun.li, nicolas.chautru,
	maryam.tahhan, reshma.pattan, cristian.dumitrescu
  Cc: dev, nd

Convert rte_atomic usages to compiler atomic built-ins
for thread params sync in bbdev cases.

Signed-off-by: Joyce Kong <joyce.kong@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
---
 app/test-bbdev/test_bbdev_perf.c | 135 ++++++++++++++-----------------
 1 file changed, 59 insertions(+), 76 deletions(-)

diff --git a/app/test-bbdev/test_bbdev_perf.c b/app/test-bbdev/test_bbdev_perf.c
index 469597b8b3..dc62e16216 100644
--- a/app/test-bbdev/test_bbdev_perf.c
+++ b/app/test-bbdev/test_bbdev_perf.c
@@ -133,7 +133,7 @@ struct test_op_params {
 	uint16_t num_to_process;
 	uint16_t num_lcores;
 	int vector_mask;
-	rte_atomic16_t sync;
+	uint16_t sync;
 	struct test_buffers q_bufs[RTE_MAX_NUMA_NODES][MAX_QUEUES];
 };
 
@@ -148,9 +148,9 @@ struct thread_params {
 	uint8_t iter_count;
 	double iter_average;
 	double bler;
-	rte_atomic16_t nb_dequeued;
-	rte_atomic16_t processing_status;
-	rte_atomic16_t burst_sz;
+	uint16_t nb_dequeued;
+	int16_t processing_status;
+	uint16_t burst_sz;
 	struct test_op_params *op_params;
 	struct rte_bbdev_dec_op *dec_ops[MAX_BURST];
 	struct rte_bbdev_enc_op *enc_ops[MAX_BURST];
@@ -2594,46 +2594,46 @@ dequeue_event_callback(uint16_t dev_id,
 	}
 
 	if (unlikely(event != RTE_BBDEV_EVENT_DEQUEUE)) {
-		rte_atomic16_set(&tp->processing_status, TEST_FAILED);
+		__atomic_store_n(&tp->processing_status, TEST_FAILED, __ATOMIC_RELAXED);
 		printf(
 			"Dequeue interrupt handler called for incorrect event!\n");
 		return;
 	}
 
-	burst_sz = rte_atomic16_read(&tp->burst_sz);
+	burst_sz = __atomic_load_n(&tp->burst_sz, __ATOMIC_RELAXED);
 	num_ops = tp->op_params->num_to_process;
 
 	if (test_vector.op_type == RTE_BBDEV_OP_TURBO_DEC)
 		deq = rte_bbdev_dequeue_dec_ops(dev_id, queue_id,
 				&tp->dec_ops[
-					rte_atomic16_read(&tp->nb_dequeued)],
+					__atomic_load_n(&tp->nb_dequeued, __ATOMIC_RELAXED)],
 				burst_sz);
 	else if (test_vector.op_type == RTE_BBDEV_OP_LDPC_DEC)
 		deq = rte_bbdev_dequeue_ldpc_dec_ops(dev_id, queue_id,
 				&tp->dec_ops[
-					rte_atomic16_read(&tp->nb_dequeued)],
+					__atomic_load_n(&tp->nb_dequeued, __ATOMIC_RELAXED)],
 				burst_sz);
 	else if (test_vector.op_type == RTE_BBDEV_OP_LDPC_ENC)
 		deq = rte_bbdev_dequeue_ldpc_enc_ops(dev_id, queue_id,
 				&tp->enc_ops[
-					rte_atomic16_read(&tp->nb_dequeued)],
+					__atomic_load_n(&tp->nb_dequeued, __ATOMIC_RELAXED)],
 				burst_sz);
 	else /*RTE_BBDEV_OP_TURBO_ENC*/
 		deq = rte_bbdev_dequeue_enc_ops(dev_id, queue_id,
 				&tp->enc_ops[
-					rte_atomic16_read(&tp->nb_dequeued)],
+					__atomic_load_n(&tp->nb_dequeued, __ATOMIC_RELAXED)],
 				burst_sz);
 
 	if (deq < burst_sz) {
 		printf(
 			"After receiving the interrupt all operations should be dequeued. Expected: %u, got: %u\n",
 			burst_sz, deq);
-		rte_atomic16_set(&tp->processing_status, TEST_FAILED);
+		__atomic_store_n(&tp->processing_status, TEST_FAILED, __ATOMIC_RELAXED);
 		return;
 	}
 
-	if (rte_atomic16_read(&tp->nb_dequeued) + deq < num_ops) {
-		rte_atomic16_add(&tp->nb_dequeued, deq);
+	if (__atomic_load_n(&tp->nb_dequeued, __ATOMIC_RELAXED) + deq < num_ops) {
+		__atomic_fetch_add(&tp->nb_dequeued, deq, __ATOMIC_RELAXED);
 		return;
 	}
 
@@ -2670,7 +2670,7 @@ dequeue_event_callback(uint16_t dev_id,
 
 	if (ret) {
 		printf("Buffers validation failed\n");
-		rte_atomic16_set(&tp->processing_status, TEST_FAILED);
+		__atomic_store_n(&tp->processing_status, TEST_FAILED, __ATOMIC_RELAXED);
 	}
 
 	switch (test_vector.op_type) {
@@ -2691,7 +2691,7 @@ dequeue_event_callback(uint16_t dev_id,
 		break;
 	default:
 		printf("Unknown op type: %d\n", test_vector.op_type);
-		rte_atomic16_set(&tp->processing_status, TEST_FAILED);
+		__atomic_store_n(&tp->processing_status, TEST_FAILED, __ATOMIC_RELAXED);
 		return;
 	}
 
@@ -2700,7 +2700,7 @@ dequeue_event_callback(uint16_t dev_id,
 	tp->mbps += (((double)(num_ops * tb_len_bits)) / 1000000.0) /
 			((double)total_time / (double)rte_get_tsc_hz());
 
-	rte_atomic16_add(&tp->nb_dequeued, deq);
+	__atomic_fetch_add(&tp->nb_dequeued, deq, __ATOMIC_RELAXED);
 }
 
 static int
@@ -2738,11 +2738,10 @@ throughput_intr_lcore_ldpc_dec(void *arg)
 
 	bufs = &tp->op_params->q_bufs[GET_SOCKET(info.socket_id)][queue_id];
 
-	rte_atomic16_clear(&tp->processing_status);
-	rte_atomic16_clear(&tp->nb_dequeued);
+	__atomic_store_n(&tp->processing_status, 0, __ATOMIC_RELAXED);
+	__atomic_store_n(&tp->nb_dequeued, 0, __ATOMIC_RELAXED);
 
-	while (rte_atomic16_read(&tp->op_params->sync) == SYNC_WAIT)
-		rte_pause();
+	rte_wait_until_equal_16(&tp->op_params->sync, SYNC_START, __ATOMIC_RELAXED);
 
 	ret = rte_bbdev_dec_op_alloc_bulk(tp->op_params->mp, ops,
 				num_to_process);
@@ -2790,17 +2789,15 @@ throughput_intr_lcore_ldpc_dec(void *arg)
 			 * the number of operations is not a multiple of
 			 * burst size.
 			 */
-			rte_atomic16_set(&tp->burst_sz, num_to_enq);
+			__atomic_store_n(&tp->burst_sz, num_to_enq, __ATOMIC_RELAXED);
 
 			/* Wait until processing of previous batch is
 			 * completed
 			 */
-			while (rte_atomic16_read(&tp->nb_dequeued) !=
-					(int16_t) enqueued)
-				rte_pause();
+			rte_wait_until_equal_16(&tp->nb_dequeued, enqueued, __ATOMIC_RELAXED);
 		}
 		if (j != TEST_REPETITIONS - 1)
-			rte_atomic16_clear(&tp->nb_dequeued);
+			__atomic_store_n(&tp->nb_dequeued, 0, __ATOMIC_RELAXED);
 	}
 
 	return TEST_SUCCESS;
@@ -2835,11 +2832,10 @@ throughput_intr_lcore_dec(void *arg)
 
 	bufs = &tp->op_params->q_bufs[GET_SOCKET(info.socket_id)][queue_id];
 
-	rte_atomic16_clear(&tp->processing_status);
-	rte_atomic16_clear(&tp->nb_dequeued);
+	__atomic_store_n(&tp->processing_status, 0, __ATOMIC_RELAXED);
+	__atomic_store_n(&tp->nb_dequeued, 0, __ATOMIC_RELAXED);
 
-	while (rte_atomic16_read(&tp->op_params->sync) == SYNC_WAIT)
-		rte_pause();
+	rte_wait_until_equal_16(&tp->op_params->sync, SYNC_START, __ATOMIC_RELAXED);
 
 	ret = rte_bbdev_dec_op_alloc_bulk(tp->op_params->mp, ops,
 				num_to_process);
@@ -2880,17 +2876,15 @@ throughput_intr_lcore_dec(void *arg)
 			 * the number of operations is not a multiple of
 			 * burst size.
 			 */
-			rte_atomic16_set(&tp->burst_sz, num_to_enq);
+			__atomic_store_n(&tp->burst_sz, num_to_enq, __ATOMIC_RELAXED);
 
 			/* Wait until processing of previous batch is
 			 * completed
 			 */
-			while (rte_atomic16_read(&tp->nb_dequeued) !=
-					(int16_t) enqueued)
-				rte_pause();
+			rte_wait_until_equal_16(&tp->nb_dequeued, enqueued, __ATOMIC_RELAXED);
 		}
 		if (j != TEST_REPETITIONS - 1)
-			rte_atomic16_clear(&tp->nb_dequeued);
+			__atomic_store_n(&tp->nb_dequeued, 0, __ATOMIC_RELAXED);
 	}
 
 	return TEST_SUCCESS;
@@ -2925,11 +2919,10 @@ throughput_intr_lcore_enc(void *arg)
 
 	bufs = &tp->op_params->q_bufs[GET_SOCKET(info.socket_id)][queue_id];
 
-	rte_atomic16_clear(&tp->processing_status);
-	rte_atomic16_clear(&tp->nb_dequeued);
+	__atomic_store_n(&tp->processing_status, 0, __ATOMIC_RELAXED);
+	__atomic_store_n(&tp->nb_dequeued, 0, __ATOMIC_RELAXED);
 
-	while (rte_atomic16_read(&tp->op_params->sync) == SYNC_WAIT)
-		rte_pause();
+	rte_wait_until_equal_16(&tp->op_params->sync, SYNC_START, __ATOMIC_RELAXED);
 
 	ret = rte_bbdev_enc_op_alloc_bulk(tp->op_params->mp, ops,
 			num_to_process);
@@ -2969,17 +2962,15 @@ throughput_intr_lcore_enc(void *arg)
 			 * the number of operations is not a multiple of
 			 * burst size.
 			 */
-			rte_atomic16_set(&tp->burst_sz, num_to_enq);
+			__atomic_store_n(&tp->burst_sz, num_to_enq, __ATOMIC_RELAXED);
 
 			/* Wait until processing of previous batch is
 			 * completed
 			 */
-			while (rte_atomic16_read(&tp->nb_dequeued) !=
-					(int16_t) enqueued)
-				rte_pause();
+			rte_wait_until_equal_16(&tp->nb_dequeued, enqueued, __ATOMIC_RELAXED);
 		}
 		if (j != TEST_REPETITIONS - 1)
-			rte_atomic16_clear(&tp->nb_dequeued);
+			__atomic_store_n(&tp->nb_dequeued, 0, __ATOMIC_RELAXED);
 	}
 
 	return TEST_SUCCESS;
@@ -3015,11 +3006,10 @@ throughput_intr_lcore_ldpc_enc(void *arg)
 
 	bufs = &tp->op_params->q_bufs[GET_SOCKET(info.socket_id)][queue_id];
 
-	rte_atomic16_clear(&tp->processing_status);
-	rte_atomic16_clear(&tp->nb_dequeued);
+	__atomic_store_n(&tp->processing_status, 0, __ATOMIC_RELAXED);
+	__atomic_store_n(&tp->nb_dequeued, 0, __ATOMIC_RELAXED);
 
-	while (rte_atomic16_read(&tp->op_params->sync) == SYNC_WAIT)
-		rte_pause();
+	rte_wait_until_equal_16(&tp->op_params->sync, SYNC_START, __ATOMIC_RELAXED);
 
 	ret = rte_bbdev_enc_op_alloc_bulk(tp->op_params->mp, ops,
 			num_to_process);
@@ -3061,17 +3051,15 @@ throughput_intr_lcore_ldpc_enc(void *arg)
 			 * the number of operations is not a multiple of
 			 * burst size.
 			 */
-			rte_atomic16_set(&tp->burst_sz, num_to_enq);
+			__atomic_store_n(&tp->burst_sz, num_to_enq, __ATOMIC_RELAXED);
 
 			/* Wait until processing of previous batch is
 			 * completed
 			 */
-			while (rte_atomic16_read(&tp->nb_dequeued) !=
-					(int16_t) enqueued)
-				rte_pause();
+			rte_wait_until_equal_16(&tp->nb_dequeued, enqueued, __ATOMIC_RELAXED);
 		}
 		if (j != TEST_REPETITIONS - 1)
-			rte_atomic16_clear(&tp->nb_dequeued);
+			__atomic_store_n(&tp->nb_dequeued, 0, __ATOMIC_RELAXED);
 	}
 
 	return TEST_SUCCESS;
@@ -3105,8 +3093,7 @@ throughput_pmd_lcore_dec(void *arg)
 
 	bufs = &tp->op_params->q_bufs[GET_SOCKET(info.socket_id)][queue_id];
 
-	while (rte_atomic16_read(&tp->op_params->sync) == SYNC_WAIT)
-		rte_pause();
+	rte_wait_until_equal_16(&tp->op_params->sync, SYNC_START, __ATOMIC_RELAXED);
 
 	ret = rte_bbdev_dec_op_alloc_bulk(tp->op_params->mp, ops_enq, num_ops);
 	TEST_ASSERT_SUCCESS(ret, "Allocation failed for %d ops", num_ops);
@@ -3209,8 +3196,7 @@ bler_pmd_lcore_ldpc_dec(void *arg)
 
 	bufs = &tp->op_params->q_bufs[GET_SOCKET(info.socket_id)][queue_id];
 
-	while (rte_atomic16_read(&tp->op_params->sync) == SYNC_WAIT)
-		rte_pause();
+	rte_wait_until_equal_16(&tp->op_params->sync, SYNC_START, __ATOMIC_RELAXED);
 
 	ret = rte_bbdev_dec_op_alloc_bulk(tp->op_params->mp, ops_enq, num_ops);
 	TEST_ASSERT_SUCCESS(ret, "Allocation failed for %d ops", num_ops);
@@ -3339,8 +3325,7 @@ throughput_pmd_lcore_ldpc_dec(void *arg)
 
 	bufs = &tp->op_params->q_bufs[GET_SOCKET(info.socket_id)][queue_id];
 
-	while (rte_atomic16_read(&tp->op_params->sync) == SYNC_WAIT)
-		rte_pause();
+	rte_wait_until_equal_16(&tp->op_params->sync, SYNC_START, __ATOMIC_RELAXED);
 
 	ret = rte_bbdev_dec_op_alloc_bulk(tp->op_params->mp, ops_enq, num_ops);
 	TEST_ASSERT_SUCCESS(ret, "Allocation failed for %d ops", num_ops);
@@ -3456,8 +3441,7 @@ throughput_pmd_lcore_enc(void *arg)
 
 	bufs = &tp->op_params->q_bufs[GET_SOCKET(info.socket_id)][queue_id];
 
-	while (rte_atomic16_read(&tp->op_params->sync) == SYNC_WAIT)
-		rte_pause();
+	rte_wait_until_equal_16(&tp->op_params->sync, SYNC_START, __ATOMIC_RELAXED);
 
 	ret = rte_bbdev_enc_op_alloc_bulk(tp->op_params->mp, ops_enq,
 			num_ops);
@@ -3547,8 +3531,7 @@ throughput_pmd_lcore_ldpc_enc(void *arg)
 
 	bufs = &tp->op_params->q_bufs[GET_SOCKET(info.socket_id)][queue_id];
 
-	while (rte_atomic16_read(&tp->op_params->sync) == SYNC_WAIT)
-		rte_pause();
+	rte_wait_until_equal_16(&tp->op_params->sync, SYNC_START, __ATOMIC_RELAXED);
 
 	ret = rte_bbdev_enc_op_alloc_bulk(tp->op_params->mp, ops_enq,
 			num_ops);
@@ -3731,7 +3714,7 @@ bler_test(struct active_device *ad,
 	else
 		return TEST_SKIPPED;
 
-	rte_atomic16_set(&op_params->sync, SYNC_WAIT);
+	__atomic_store_n(&op_params->sync, SYNC_WAIT, __ATOMIC_RELAXED);
 
 	/* Main core is set at first entry */
 	t_params[0].dev_id = ad->dev_id;
@@ -3754,7 +3737,7 @@ bler_test(struct active_device *ad,
 				&t_params[used_cores++], lcore_id);
 	}
 
-	rte_atomic16_set(&op_params->sync, SYNC_START);
+	__atomic_store_n(&op_params->sync, SYNC_START, __ATOMIC_RELAXED);
 	ret = bler_function(&t_params[0]);
 
 	/* Main core is always used */
@@ -3849,7 +3832,7 @@ throughput_test(struct active_device *ad,
 			throughput_function = throughput_pmd_lcore_enc;
 	}
 
-	rte_atomic16_set(&op_params->sync, SYNC_WAIT);
+	__atomic_store_n(&op_params->sync, SYNC_WAIT, __ATOMIC_RELAXED);
 
 	/* Main core is set at first entry */
 	t_params[0].dev_id = ad->dev_id;
@@ -3872,7 +3855,7 @@ throughput_test(struct active_device *ad,
 				&t_params[used_cores++], lcore_id);
 	}
 
-	rte_atomic16_set(&op_params->sync, SYNC_START);
+	__atomic_store_n(&op_params->sync, SYNC_START, __ATOMIC_RELAXED);
 	ret = throughput_function(&t_params[0]);
 
 	/* Main core is always used */
@@ -3902,29 +3885,29 @@ throughput_test(struct active_device *ad,
 	 * Wait for main lcore operations.
 	 */
 	tp = &t_params[0];
-	while ((rte_atomic16_read(&tp->nb_dequeued) <
-			op_params->num_to_process) &&
-			(rte_atomic16_read(&tp->processing_status) !=
-			TEST_FAILED))
+	while ((__atomic_load_n(&tp->nb_dequeued, __ATOMIC_RELAXED) <
+		op_params->num_to_process) &&
+		(__atomic_load_n(&tp->processing_status, __ATOMIC_RELAXED) !=
+		TEST_FAILED))
 		rte_pause();
 
 	tp->ops_per_sec /= TEST_REPETITIONS;
 	tp->mbps /= TEST_REPETITIONS;
-	ret |= (int)rte_atomic16_read(&tp->processing_status);
+	ret |= (int)__atomic_load_n(&tp->processing_status, __ATOMIC_RELAXED);
 
 	/* Wait for worker lcores operations */
 	for (used_cores = 1; used_cores < num_lcores; used_cores++) {
 		tp = &t_params[used_cores];
 
-		while ((rte_atomic16_read(&tp->nb_dequeued) <
-				op_params->num_to_process) &&
-				(rte_atomic16_read(&tp->processing_status) !=
-				TEST_FAILED))
+		while ((__atomic_load_n(&tp->nb_dequeued, __ATOMIC_RELAXED) <
+			op_params->num_to_process) &&
+			(__atomic_load_n(&tp->processing_status, __ATOMIC_RELAXED) !=
+			TEST_FAILED))
 			rte_pause();
 
 		tp->ops_per_sec /= TEST_REPETITIONS;
 		tp->mbps /= TEST_REPETITIONS;
-		ret |= (int)rte_atomic16_read(&tp->processing_status);
+		ret |= (int)__atomic_load_n(&tp->processing_status, __ATOMIC_RELAXED);
 	}
 
 	/* Print throughput if test passed */
-- 
2.17.1


^ permalink raw reply	[flat|nested] 28+ messages in thread

* [dpdk-dev] [PATCH v1 12/12] app: remove unnecessary include of atomic
  2021-08-02 10:18 [dpdk-dev] [PATCH v1 00/12] use compiler atomic builtins for app Joyce Kong
                   ` (10 preceding siblings ...)
  2021-08-02 10:18 ` [dpdk-dev] [PATCH v1 11/12] app/bbdev: use compiler atomics for thread sync Joyce Kong
@ 2021-08-02 10:18 ` Joyce Kong
  2021-10-21  6:35 ` [dpdk-dev] [PATCH v1 00/12] use compiler atomic builtins for app Joyce Kong
  12 siblings, 0 replies; 28+ messages in thread
From: Joyce Kong @ 2021-08-02 10:18 UTC (permalink / raw)
  To: thomas, david.marchand, honnappa.nagarahalli, ruifeng.wang,
	konstantin.ananyev, rsanford, erik.g.carrillo, olivier.matz,
	yipeng1.wang, sameh.gobriel, bruce.richardson,
	vladimir.medvedkin, anatoly.burakov, andrew.rybchenko, jerinj,
	declan.doherty, ciara.power, xiaoyun.li, nicolas.chautru,
	maryam.tahhan, reshma.pattan, cristian.dumitrescu
  Cc: dev, nd

Remove the unnecessary rte_atomic.h included in app module.

Signed-off-by: Joyce Kong <joyce.kong@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
---
 app/proc-info/main.c         | 1 -
 app/test-pipeline/config.c   | 1 -
 app/test-pipeline/init.c     | 1 -
 app/test-pipeline/main.c     | 1 -
 app/test-pipeline/runtime.c  | 1 -
 app/test-pmd/cmdline.c       | 1 -
 app/test-pmd/config.c        | 1 -
 app/test-pmd/csumonly.c      | 1 -
 app/test-pmd/flowgen.c       | 1 -
 app/test-pmd/icmpecho.c      | 1 -
 app/test-pmd/iofwd.c         | 1 -
 app/test-pmd/macfwd.c        | 1 -
 app/test-pmd/macswap.c       | 1 -
 app/test-pmd/parameters.c    | 1 -
 app/test-pmd/rxonly.c        | 1 -
 app/test-pmd/txonly.c        | 1 -
 app/test/test_barrier.c      | 1 -
 app/test/test_mbuf.c         | 1 -
 app/test/test_mp_secondary.c | 1 -
 app/test/test_ring.c         | 1 -
 20 files changed, 20 deletions(-)

diff --git a/app/proc-info/main.c b/app/proc-info/main.c
index a8e928fa9f..b5c86f20ac 100644
--- a/app/proc-info/main.c
+++ b/app/proc-info/main.c
@@ -27,7 +27,6 @@
 #include <rte_per_lcore.h>
 #include <rte_lcore.h>
 #include <rte_log.h>
-#include <rte_atomic.h>
 #include <rte_branch_prediction.h>
 #include <rte_string_fns.h>
 #include <rte_metrics.h>
diff --git a/app/test-pipeline/config.c b/app/test-pipeline/config.c
index 33f3f1c827..daf838948b 100644
--- a/app/test-pipeline/config.c
+++ b/app/test-pipeline/config.c
@@ -21,7 +21,6 @@
 #include <rte_eal.h>
 #include <rte_per_lcore.h>
 #include <rte_launch.h>
-#include <rte_atomic.h>
 #include <rte_cycles.h>
 #include <rte_prefetch.h>
 #include <rte_lcore.h>
diff --git a/app/test-pipeline/init.c b/app/test-pipeline/init.c
index fe37d63730..b97d5d7b9d 100644
--- a/app/test-pipeline/init.c
+++ b/app/test-pipeline/init.c
@@ -21,7 +21,6 @@
 #include <rte_eal.h>
 #include <rte_per_lcore.h>
 #include <rte_launch.h>
-#include <rte_atomic.h>
 #include <rte_cycles.h>
 #include <rte_prefetch.h>
 #include <rte_lcore.h>
diff --git a/app/test-pipeline/main.c b/app/test-pipeline/main.c
index 72e4797ff2..1e16794183 100644
--- a/app/test-pipeline/main.c
+++ b/app/test-pipeline/main.c
@@ -22,7 +22,6 @@
 #include <rte_eal.h>
 #include <rte_per_lcore.h>
 #include <rte_launch.h>
-#include <rte_atomic.h>
 #include <rte_cycles.h>
 #include <rte_prefetch.h>
 #include <rte_lcore.h>
diff --git a/app/test-pipeline/runtime.c b/app/test-pipeline/runtime.c
index 159192bcd8..d939a85d7e 100644
--- a/app/test-pipeline/runtime.c
+++ b/app/test-pipeline/runtime.c
@@ -21,7 +21,6 @@
 #include <rte_eal.h>
 #include <rte_per_lcore.h>
 #include <rte_launch.h>
-#include <rte_atomic.h>
 #include <rte_cycles.h>
 #include <rte_prefetch.h>
 #include <rte_branch_prediction.h>
diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c
index 82253bc751..a7c555f71e 100644
--- a/app/test-pmd/cmdline.c
+++ b/app/test-pmd/cmdline.c
@@ -24,7 +24,6 @@
 #include <rte_eal.h>
 #include <rte_per_lcore.h>
 #include <rte_lcore.h>
-#include <rte_atomic.h>
 #include <rte_branch_prediction.h>
 #include <rte_ring.h>
 #include <rte_mempool.h>
diff --git a/app/test-pmd/config.c b/app/test-pmd/config.c
index 31d8ba1b91..ec46b87b4e 100644
--- a/app/test-pmd/config.c
+++ b/app/test-pmd/config.c
@@ -27,7 +27,6 @@
 #include <rte_eal.h>
 #include <rte_per_lcore.h>
 #include <rte_lcore.h>
-#include <rte_atomic.h>
 #include <rte_branch_prediction.h>
 #include <rte_mempool.h>
 #include <rte_mbuf.h>
diff --git a/app/test-pmd/csumonly.c b/app/test-pmd/csumonly.c
index bd5ad64a57..41efef5c8c 100644
--- a/app/test-pmd/csumonly.c
+++ b/app/test-pmd/csumonly.c
@@ -24,7 +24,6 @@
 #include <rte_eal.h>
 #include <rte_per_lcore.h>
 #include <rte_lcore.h>
-#include <rte_atomic.h>
 #include <rte_branch_prediction.h>
 #include <rte_mempool.h>
 #include <rte_mbuf.h>
diff --git a/app/test-pmd/flowgen.c b/app/test-pmd/flowgen.c
index 3bf6e1ce97..fb49c8f630 100644
--- a/app/test-pmd/flowgen.c
+++ b/app/test-pmd/flowgen.c
@@ -24,7 +24,6 @@
 #include <rte_eal.h>
 #include <rte_per_lcore.h>
 #include <rte_lcore.h>
-#include <rte_atomic.h>
 #include <rte_branch_prediction.h>
 #include <rte_mempool.h>
 #include <rte_mbuf.h>
diff --git a/app/test-pmd/icmpecho.c b/app/test-pmd/icmpecho.c
index 8948f28eb5..631b96e1f8 100644
--- a/app/test-pmd/icmpecho.c
+++ b/app/test-pmd/icmpecho.c
@@ -20,7 +20,6 @@
 #include <rte_cycles.h>
 #include <rte_per_lcore.h>
 #include <rte_lcore.h>
-#include <rte_atomic.h>
 #include <rte_branch_prediction.h>
 #include <rte_memory.h>
 #include <rte_mempool.h>
diff --git a/app/test-pmd/iofwd.c b/app/test-pmd/iofwd.c
index 83d098adcb..19cd920f70 100644
--- a/app/test-pmd/iofwd.c
+++ b/app/test-pmd/iofwd.c
@@ -23,7 +23,6 @@
 #include <rte_eal.h>
 #include <rte_per_lcore.h>
 #include <rte_lcore.h>
-#include <rte_atomic.h>
 #include <rte_branch_prediction.h>
 #include <rte_memcpy.h>
 #include <rte_mempool.h>
diff --git a/app/test-pmd/macfwd.c b/app/test-pmd/macfwd.c
index 0568ea794d..6d0c7859e4 100644
--- a/app/test-pmd/macfwd.c
+++ b/app/test-pmd/macfwd.c
@@ -24,7 +24,6 @@
 #include <rte_eal.h>
 #include <rte_per_lcore.h>
 #include <rte_lcore.h>
-#include <rte_atomic.h>
 #include <rte_branch_prediction.h>
 #include <rte_mempool.h>
 #include <rte_mbuf.h>
diff --git a/app/test-pmd/macswap.c b/app/test-pmd/macswap.c
index 310bca06af..4627ff83e9 100644
--- a/app/test-pmd/macswap.c
+++ b/app/test-pmd/macswap.c
@@ -24,7 +24,6 @@
 #include <rte_eal.h>
 #include <rte_per_lcore.h>
 #include <rte_lcore.h>
-#include <rte_atomic.h>
 #include <rte_branch_prediction.h>
 #include <rte_mempool.h>
 #include <rte_mbuf.h>
diff --git a/app/test-pmd/parameters.c b/app/test-pmd/parameters.c
index 7c13210f04..78392b7e8b 100644
--- a/app/test-pmd/parameters.c
+++ b/app/test-pmd/parameters.c
@@ -30,7 +30,6 @@
 #include <rte_eal.h>
 #include <rte_per_lcore.h>
 #include <rte_lcore.h>
-#include <rte_atomic.h>
 #include <rte_branch_prediction.h>
 #include <rte_mempool.h>
 #include <rte_interrupts.h>
diff --git a/app/test-pmd/rxonly.c b/app/test-pmd/rxonly.c
index c78fc4609a..d1a579d8d8 100644
--- a/app/test-pmd/rxonly.c
+++ b/app/test-pmd/rxonly.c
@@ -24,7 +24,6 @@
 #include <rte_eal.h>
 #include <rte_per_lcore.h>
 #include <rte_lcore.h>
-#include <rte_atomic.h>
 #include <rte_branch_prediction.h>
 #include <rte_mempool.h>
 #include <rte_mbuf.h>
diff --git a/app/test-pmd/txonly.c b/app/test-pmd/txonly.c
index aed820f5d3..b1be8fe9f3 100644
--- a/app/test-pmd/txonly.c
+++ b/app/test-pmd/txonly.c
@@ -24,7 +24,6 @@
 #include <rte_eal.h>
 #include <rte_per_lcore.h>
 #include <rte_lcore.h>
-#include <rte_atomic.h>
 #include <rte_branch_prediction.h>
 #include <rte_mempool.h>
 #include <rte_mbuf.h>
diff --git a/app/test/test_barrier.c b/app/test/test_barrier.c
index c27f8a0742..898c2516ed 100644
--- a/app/test/test_barrier.c
+++ b/app/test/test_barrier.c
@@ -24,7 +24,6 @@
 #include <rte_memory.h>
 #include <rte_per_lcore.h>
 #include <rte_launch.h>
-#include <rte_atomic.h>
 #include <rte_eal.h>
 #include <rte_lcore.h>
 #include <rte_pause.h>
diff --git a/app/test/test_mbuf.c b/app/test/test_mbuf.c
index 9a248dfaea..5d643ba34f 100644
--- a/app/test/test_mbuf.c
+++ b/app/test/test_mbuf.c
@@ -21,7 +21,6 @@
 #include <rte_eal.h>
 #include <rte_per_lcore.h>
 #include <rte_lcore.h>
-#include <rte_atomic.h>
 #include <rte_branch_prediction.h>
 #include <rte_ring.h>
 #include <rte_mempool.h>
diff --git a/app/test/test_mp_secondary.c b/app/test/test_mp_secondary.c
index 5b6f05dbb1..021ca0547f 100644
--- a/app/test/test_mp_secondary.c
+++ b/app/test/test_mp_secondary.c
@@ -28,7 +28,6 @@
 #include <rte_lcore.h>
 #include <rte_errno.h>
 #include <rte_branch_prediction.h>
-#include <rte_atomic.h>
 #include <rte_ring.h>
 #include <rte_debug.h>
 #include <rte_log.h>
diff --git a/app/test/test_ring.c b/app/test/test_ring.c
index fb8532a409..bde33ab4a1 100644
--- a/app/test/test_ring.c
+++ b/app/test/test_ring.c
@@ -20,7 +20,6 @@
 #include <rte_eal.h>
 #include <rte_per_lcore.h>
 #include <rte_lcore.h>
-#include <rte_atomic.h>
 #include <rte_branch_prediction.h>
 #include <rte_malloc.h>
 #include <rte_ring.h>
-- 
2.17.1


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [dpdk-dev] [PATCH v1 00/12] use compiler atomic builtins for app
  2021-08-02 10:18 [dpdk-dev] [PATCH v1 00/12] use compiler atomic builtins for app Joyce Kong
                   ` (11 preceding siblings ...)
  2021-08-02 10:18 ` [dpdk-dev] [PATCH v1 12/12] app: remove unnecessary include of atomic Joyce Kong
@ 2021-10-21  6:35 ` Joyce Kong
  12 siblings, 0 replies; 28+ messages in thread
From: Joyce Kong @ 2021-10-21  6:35 UTC (permalink / raw)
  To: thomas, david.marchand, Honnappa Nagarahalli, Ruifeng Wang,
	konstantin.ananyev, rsanford, erik.g.carrillo, olivier.matz,
	yipeng1.wang, sameh.gobriel, bruce.richardson,
	vladimir.medvedkin, anatoly.burakov, andrew.rybchenko, jerinj,
	declan.doherty, ciara.power, xiaoyun.li, nicolas.chautru,
	maryam.tahhan, reshma.pattan, cristian.dumitrescu
  Cc: dev, nd

Hi all,

Would you please help review the patch series?
Thanks!

Best Regards,
Joyce

> -----Original Message-----
> From: dev <dev-bounces@dpdk.org> On Behalf Of Joyce Kong
> Sent: Monday, August 2, 2021 6:19 PM
> To: thomas@monjalon.net; david.marchand@redhat.com; Honnappa
> Nagarahalli <Honnappa.Nagarahalli@arm.com>; Ruifeng Wang
> <Ruifeng.Wang@arm.com>; konstantin.ananyev@intel.com;
> rsanford@akamai.com; erik.g.carrillo@intel.com; olivier.matz@6wind.com;
> yipeng1.wang@intel.com; sameh.gobriel@intel.com;
> bruce.richardson@intel.com; vladimir.medvedkin@intel.com;
> anatoly.burakov@intel.com; andrew.rybchenko@oktetlabs.ru;
> jerinj@marvell.com; declan.doherty@intel.com; ciara.power@intel.com;
> xiaoyun.li@intel.com; nicolas.chautru@intel.com;
> maryam.tahhan@intel.com; reshma.pattan@intel.com;
> cristian.dumitrescu@intel.com
> Cc: dev@dpdk.org; nd <nd@arm.com>
> Subject: [dpdk-dev] [PATCH v1 00/12] use compiler atomic builtins for app
> 
> Since atomic operations have been adopted in DPDK now[1], change
> rte_atomicNN_xxx APIs to compiler's atomic built-ins in app module[2].
> 
> [1] https://www.dpdk.org/blog/2021/03/26/dpdk-adopts-the-c11-memory-
> model/
> [2] https://doc.dpdk.org/guides/rel_notes/deprecation.html
> 
> Joyce Kong (12):
>   test/pmd_perf: use compiler atomic builtins for polling sync
>   test/ring_perf: use compiler atomic builtins for lcores sync
>   test/timer: use compiler atomic builtins for sync
>   test/stack_perf: use compiler atomics for lcore sync
>   test/bpf: use compiler atomics for calculation
>   test/func_reentrancy: use compiler atomic for data sync
>   app/eventdev: use compiler atomic builtins for packets sync
>   app/crypto: use compiler atomic builtins for display sync
>   app/compress: use compiler atomic builtins for display sync
>   app/testpmd: use compiler atomic builtins for port sync
>   app/bbdev: use compiler atomics for thread sync
>   app: remove unnecessary include of atomic
> 
>  app/proc-info/main.c                          |   1 -
>  app/test-bbdev/test_bbdev_perf.c              | 135 ++++++++----------
>  .../comp_perf_test_common.h                   |   2 +-
>  .../comp_perf_test_cyclecount.c               |  10 +-
>  .../comp_perf_test_throughput.c               |  11 +-
>  .../comp_perf_test_verify.c                   |   6 +-
>  app/test-crypto-perf/cperf_test_latency.c     |   6 +-
>  .../cperf_test_pmd_cyclecount.c               |   9 +-
>  app/test-crypto-perf/cperf_test_throughput.c  |   9 +-
>  app/test-crypto-perf/cperf_test_verify.c      |   9 +-
>  app/test-eventdev/evt_main.c                  |   1 -
>  app/test-eventdev/test_order_atq.c            |   4 +-
>  app/test-eventdev/test_order_common.c         |   4 +-
>  app/test-eventdev/test_order_common.h         |   8 +-
>  app/test-eventdev/test_order_queue.c          |   4 +-
>  app/test-pipeline/config.c                    |   1 -
>  app/test-pipeline/init.c                      |   1 -
>  app/test-pipeline/main.c                      |   1 -
>  app/test-pipeline/runtime.c                   |   1 -
>  app/test-pmd/cmdline.c                        |   1 -
>  app/test-pmd/config.c                         |   1 -
>  app/test-pmd/csumonly.c                       |   1 -
>  app/test-pmd/flowgen.c                        |   1 -
>  app/test-pmd/icmpecho.c                       |   1 -
>  app/test-pmd/iofwd.c                          |   1 -
>  app/test-pmd/macfwd.c                         |   1 -
>  app/test-pmd/macswap.c                        |   1 -
>  app/test-pmd/parameters.c                     |   1 -
>  app/test-pmd/rxonly.c                         |   1 -
>  app/test-pmd/testpmd.c                        |  75 ++++++----
>  app/test-pmd/txonly.c                         |   1 -
>  app/test/test_barrier.c                       |   1 -
>  app/test/test_bpf.c                           |  28 ++--
>  app/test/test_func_reentrancy.c               |  27 ++--
>  app/test/test_mbuf.c                          |   1 -
>  app/test/test_mp_secondary.c                  |   1 -
>  app/test/test_pmd_perf.c                      |  12 +-
>  app/test/test_ring.c                          |   1 -
>  app/test/test_ring_perf.c                     |   9 +-
>  app/test/test_stack_perf.c                    |  14 +-
>  app/test/test_timer.c                         |  28 ++--
>  app/test/test_timer_secondary.c               |   1 -
>  42 files changed, 213 insertions(+), 219 deletions(-)
> 
> --
> 2.17.1


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [dpdk-dev] [PATCH v1 01/12] test/pmd_perf: use compiler atomic builtins for polling sync
  2021-08-02 10:18 ` [dpdk-dev] [PATCH v1 01/12] test/pmd_perf: use compiler atomic builtins for polling sync Joyce Kong
@ 2021-11-08 22:50   ` Honnappa Nagarahalli
  2021-11-10  6:10     ` Joyce Kong
  0 siblings, 1 reply; 28+ messages in thread
From: Honnappa Nagarahalli @ 2021-11-08 22:50 UTC (permalink / raw)
  To: Joyce Kong, thomas, david.marchand, Ruifeng Wang,
	konstantin.ananyev, rsanford, erik.g.carrillo, olivier.matz,
	yipeng1.wang, sameh.gobriel, bruce.richardson,
	vladimir.medvedkin, anatoly.burakov, andrew.rybchenko, jerinj,
	declan.doherty, ciara.power, xiaoyun.li, nicolas.chautru,
	maryam.tahhan, reshma.pattan, cristian.dumitrescu
  Cc: dev, nd, Honnappa Nagarahalli, nd

<snip>

> 
> Convert rte_atomic usages to compiler atomic built-ins for polling sync in
> pmd_perf test cases.
> 
> Signed-off-by: Joyce Kong <joyce.kong@arm.com>
> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
> ---
>  app/test/test_pmd_perf.c | 12 +++++-------
>  1 file changed, 5 insertions(+), 7 deletions(-)
> 
> diff --git a/app/test/test_pmd_perf.c b/app/test/test_pmd_perf.c index
> 3a248d512c..41ef001b90 100644
> --- a/app/test/test_pmd_perf.c
> +++ b/app/test/test_pmd_perf.c
> @@ -10,7 +10,6 @@
>  #include <rte_cycles.h>
>  #include <rte_ethdev.h>
>  #include <rte_byteorder.h>
> -#include <rte_atomic.h>
>  #include <rte_malloc.h>
>  #include "packet_burst_generator.h"
>  #include "test.h"
> @@ -526,7 +525,7 @@ main_loop(__rte_unused void *args)
>  	return 0;
>  }
> 
> -static rte_atomic64_t start;
> +static uint64_t start;
> 
>  static inline int
>  poll_burst(void *args)
> @@ -564,8 +563,7 @@ poll_burst(void *args)
>  		num[portid] = pkt_per_port;
>  	}
> 
> -	while (!rte_atomic64_read(&start))
> -		;
> +	rte_wait_until_equal_64(&start, 1, __ATOMIC_RELAXED);
This needs to be ACQUIRE. Please see comments below for function 'exec_burst'.

> 
>  	cur_tsc = rte_rdtsc();
>  	while (total) {
> @@ -617,7 +615,7 @@ exec_burst(uint32_t flags, int lcore)
>  	pkt_per_port = MAX_TRAFFIC_BURST;
>  	num = pkt_per_port * conf->nb_ports;
> 
> -	rte_atomic64_init(&start);
> +	__atomic_store_n(&start, 0, __ATOMIC_RELAXED);
> 
>  	/* start polling thread, but not actually poll yet */
>  	rte_eal_remote_launch(poll_burst,
> @@ -625,7 +623,7 @@ exec_burst(uint32_t flags, int lcore)
> 
>  	/* Only when polling first */
>  	if (flags == SC_BURST_POLL_FIRST)
> -		rte_atomic64_set(&start, 1);
> +		__atomic_store_n(&start, 1, __ATOMIC_RELAXED);
This line can be combined with initialization of 'start'. i.e.
if (flags == SC_BURST_POLL_FIRST) {
	__atomic_store_n(&start, 1, __ATOMIC_RELAXED);
} else {
	__atomic_store_n(&start, 0, __ATOMIC_RELAXED);
}
It is ok to use RELAXED here as it is part of initialization and 'rte_eal_remote_launch' will ensure that the store to 'start' is visible to the worker thread.

> 
>  	/* start xmit */
>  	i = 0;
> @@ -642,7 +640,7 @@ exec_burst(uint32_t flags, int lcore)
> 
>  	/* only when polling second  */
>  	if (flags == SC_BURST_XMIT_FIRST)
> -		rte_atomic64_set(&start, 1);
> +		__atomic_store_n(&start, 1, __ATOMIC_RELAXED);
This store should happen after transmitting the packets. It is better to use RELEASE here.

> 
>  	/* wait for polling finished */
>  	diff_tsc = rte_eal_wait_lcore(lcore);
> --
> 2.17.1


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [dpdk-dev] [PATCH v1 02/12] test/ring_perf: use compiler atomic builtins for lcores sync
  2021-08-02 10:18 ` [dpdk-dev] [PATCH v1 02/12] test/ring_perf: use compiler atomic builtins for lcores sync Joyce Kong
@ 2021-11-09  5:43   ` Honnappa Nagarahalli
  0 siblings, 0 replies; 28+ messages in thread
From: Honnappa Nagarahalli @ 2021-11-09  5:43 UTC (permalink / raw)
  To: Joyce Kong, thomas, david.marchand, Ruifeng Wang,
	konstantin.ananyev, rsanford, erik.g.carrillo, olivier.matz,
	yipeng1.wang, sameh.gobriel, bruce.richardson,
	vladimir.medvedkin, anatoly.burakov, andrew.rybchenko, jerinj,
	declan.doherty, ciara.power, xiaoyun.li, nicolas.chautru,
	maryam.tahhan, reshma.pattan, cristian.dumitrescu
  Cc: dev, nd, nd

<snip>

> 
> Convert rte_atomic usages to compiler atomic built-ins for lcores sync in
> ring_perf test cases.
> 
> Signed-off-by: Joyce Kong <joyce.kong@arm.com>
> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Looks good.
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>

> ---
>  app/test/test_ring_perf.c | 9 ++++-----
>  1 file changed, 4 insertions(+), 5 deletions(-)
> 
> diff --git a/app/test/test_ring_perf.c b/app/test/test_ring_perf.c index
> fd82e20412..2d8bb675a3 100644
> --- a/app/test/test_ring_perf.c
> +++ b/app/test/test_ring_perf.c
> @@ -320,7 +320,7 @@ run_on_core_pair(struct lcore_pair *cores, struct
> rte_ring *r, const int esize)
>  	return 0;
>  }
> 
> -static rte_atomic32_t synchro;
> +static uint32_t synchro;
>  static uint64_t queue_count[RTE_MAX_LCORE];
> 
>  #define TIME_MS 100
> @@ -342,8 +342,7 @@ load_loop_fn_helper(struct thread_params *p, const
> int esize)
> 
>  	/* wait synchro for workers */
>  	if (lcore != rte_get_main_lcore())
> -		while (rte_atomic32_read(&synchro) == 0)
> -			rte_pause();
> +		rte_wait_until_equal_32(&synchro, 1, __ATOMIC_RELAXED);
> 
>  	begin = rte_get_timer_cycles();
>  	while (time_diff < hz * TIME_MS / 1000) { @@ -398,12 +397,12 @@
> run_on_all_cores(struct rte_ring *r, const int esize)
>  		param.r = r;
> 
>  		/* clear synchro and start workers */
> -		rte_atomic32_set(&synchro, 0);
> +		__atomic_store_n(&synchro, 0, __ATOMIC_RELAXED);
>  		if (rte_eal_mp_remote_launch(lcore_f, &param, SKIP_MAIN)
> < 0)
>  			return -1;
> 
>  		/* start synchro and launch test on main */
> -		rte_atomic32_set(&synchro, 1);
> +		__atomic_store_n(&synchro, 1, __ATOMIC_RELAXED);
>  		lcore_f(&param);
> 
>  		rte_eal_mp_wait_lcore();
> --
> 2.17.1


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [dpdk-dev] [PATCH v1 03/12] test/timer: use compiler atomic builtins for sync
  2021-08-02 10:18 ` [dpdk-dev] [PATCH v1 03/12] test/timer: use compiler atomic builtins for sync Joyce Kong
@ 2021-11-09 20:59   ` Honnappa Nagarahalli
  0 siblings, 0 replies; 28+ messages in thread
From: Honnappa Nagarahalli @ 2021-11-09 20:59 UTC (permalink / raw)
  To: Joyce Kong, thomas, david.marchand, Ruifeng Wang,
	konstantin.ananyev, rsanford, erik.g.carrillo, olivier.matz,
	yipeng1.wang, sameh.gobriel, bruce.richardson,
	vladimir.medvedkin, anatoly.burakov, andrew.rybchenko, jerinj,
	declan.doherty, ciara.power, xiaoyun.li, nicolas.chautru,
	maryam.tahhan, reshma.pattan, cristian.dumitrescu
  Cc: dev, nd, Honnappa Nagarahalli, nd

<snip>

> Subject: [PATCH v1 03/12] test/timer: use compiler atomic builtins for sync
> 
> Convert rte_atomic usages to compiler atomic built-ins for lcore_state and
> collisions sync.
> 
> Signed-off-by: Joyce Kong <joyce.kong@arm.com>
> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
> ---
>  app/test/test_timer.c           | 28 ++++++++++++----------------
>  app/test/test_timer_secondary.c |  1 -
>  2 files changed, 12 insertions(+), 17 deletions(-)
> 
> diff --git a/app/test/test_timer.c b/app/test/test_timer.c index
> a10b2fe9da..9a3e1b53e4 100644
> --- a/app/test/test_timer.c
> +++ b/app/test/test_timer.c
> @@ -102,7 +102,6 @@
>  #include <rte_eal.h>
>  #include <rte_per_lcore.h>
>  #include <rte_lcore.h>
> -#include <rte_atomic.h>
>  #include <rte_timer.h>
>  #include <rte_random.h>
>  #include <rte_malloc.h>
> @@ -203,7 +202,7 @@ timer_stress_main_loop(__rte_unused void *arg)
> 
>  /* Need to synchronize worker lcores through multiple steps. */  enum {
> WORKER_WAITING = 1, WORKER_RUN_SIGNAL, WORKER_RUNNING,
> WORKER_FINISHED }; -static rte_atomic16_t lcore_state[RTE_MAX_LCORE];
> +static uint16_t lcore_state[RTE_MAX_LCORE];
> 
>  static void
>  main_init_workers(void)
> @@ -211,7 +210,7 @@ main_init_workers(void)
>  	unsigned i;
> 
>  	RTE_LCORE_FOREACH_WORKER(i) {
> -		rte_atomic16_set(&lcore_state[i], WORKER_WAITING);
> +		__atomic_store_n(&lcore_state[i], WORKER_WAITING,
> __ATOMIC_RELAXED);
>  	}
>  }
> 
> @@ -221,11 +220,10 @@ main_start_workers(void)
>  	unsigned i;
> 
>  	RTE_LCORE_FOREACH_WORKER(i) {
> -		rte_atomic16_set(&lcore_state[i], WORKER_RUN_SIGNAL);
> +		__atomic_store_n(&lcore_state[i], WORKER_RUN_SIGNAL,
> +__ATOMIC_RELAXED);
We need to use RELEASE order here. When this function is called in timer_stress2_main_loop, the main_lcore is releasing memory operations to worker cores.

>  	}
>  	RTE_LCORE_FOREACH_WORKER(i) {
> -		while (rte_atomic16_read(&lcore_state[i]) !=
> WORKER_RUNNING)
> -			rte_pause();
> +		rte_wait_until_equal_16(&lcore_state[i], WORKER_RUNNING,
> +__ATOMIC_RELAXED);
This should be ACQUIRE order.

>  	}
>  }
> 
> @@ -235,8 +233,7 @@ main_wait_for_workers(void)
>  	unsigned i;
> 
>  	RTE_LCORE_FOREACH_WORKER(i) {
> -		while (rte_atomic16_read(&lcore_state[i]) !=
> WORKER_FINISHED)
> -			rte_pause();
> +		rte_wait_until_equal_16(&lcore_state[i], WORKER_FINISHED,
> +__ATOMIC_RELAXED);
We should use ACQUIRE here as the workers are releasing memory updates to the main lcore.

>  	}
>  }
> 
> @@ -245,9 +242,8 @@ worker_wait_to_start(void)  {
>  	unsigned lcore_id = rte_lcore_id();
> 
> -	while (rte_atomic16_read(&lcore_state[lcore_id]) !=
> WORKER_RUN_SIGNAL)
> -		rte_pause();
> -	rte_atomic16_set(&lcore_state[lcore_id], WORKER_RUNNING);
> +	rte_wait_until_equal_16(&lcore_state[lcore_id],
> WORKER_RUN_SIGNAL, __ATOMIC_RELAXED);
Since the worker threads will use the memory operations that happened in main_lcore, we need ACQUIRE memory ordering here.

> +	__atomic_store_n(&lcore_state[lcore_id], WORKER_RUNNING,
> +__ATOMIC_RELAXED);
This should be RELEASE memory order.

>  }
> 
>  static void
> @@ -255,7 +251,7 @@ worker_finish(void)
>  {
>  	unsigned lcore_id = rte_lcore_id();
> 
> -	rte_atomic16_set(&lcore_state[lcore_id], WORKER_FINISHED);
> +	__atomic_store_n(&lcore_state[lcore_id], WORKER_FINISHED,
> +__ATOMIC_RELAXED);
Should be RELEASE as workers are releasing updates to the main lcore.

>  }
> 
> 
> @@ -281,12 +277,12 @@ timer_stress2_main_loop(__rte_unused void *arg)
>  	unsigned int lcore_id = rte_lcore_id();
>  	unsigned int main_lcore = rte_get_main_lcore();
>  	int32_t my_collisions = 0;
> -	static rte_atomic32_t collisions;
> +	static uint32_t collisions;
> 
>  	if (lcore_id == main_lcore) {
>  		cb_count = 0;
>  		test_failed = 0;
> -		rte_atomic32_set(&collisions, 0);
> +		__atomic_store_n(&collisions, 0, __ATOMIC_RELAXED);
>  		main_init_workers();
I think the existing code is probably incorrect in calling 'main_init_workers' in this function. This should be called outside of ' timer_stress2_main_loop' so that lcore_state is guaranteed to be initialized correctly before the threads are launched.
Please call 'main_init_workers()' in ' test_timer' function.

>  		timers = rte_malloc(NULL, sizeof(*timers) *
> NB_STRESS2_TIMERS, 0);
>  		if (timers == NULL) {
> @@ -315,7 +311,7 @@ timer_stress2_main_loop(__rte_unused void *arg)
>  			my_collisions++;
>  	}
>  	if (my_collisions != 0)
> -		rte_atomic32_add(&collisions, my_collisions);
> +		__atomic_fetch_add(&collisions, my_collisions,
> __ATOMIC_RELAXED);
> 
>  	/* wait long enough for timers to expire */
>  	rte_delay_ms(100);
> @@ -329,7 +325,7 @@ timer_stress2_main_loop(__rte_unused void *arg)
> 
>  	/* now check that we get the right number of callbacks */
>  	if (lcore_id == main_lcore) {
> -		my_collisions = rte_atomic32_read(&collisions);
> +		my_collisions = __atomic_load_n(&collisions,
> __ATOMIC_RELAXED);
>  		if (my_collisions != 0)
>  			printf("- %d timer reset collisions (OK)\n",
> my_collisions);
>  		rte_timer_manage();
> diff --git a/app/test/test_timer_secondary.c
> b/app/test/test_timer_secondary.c index 16a9f1878b..5795c97f07 100644
> --- a/app/test/test_timer_secondary.c
> +++ b/app/test/test_timer_secondary.c
> @@ -9,7 +9,6 @@
>  #include <rte_lcore.h>
>  #include <rte_debug.h>
>  #include <rte_memzone.h>
> -#include <rte_atomic.h>
>  #include <rte_timer.h>
>  #include <rte_cycles.h>
>  #include <rte_mempool.h>
> --
> 2.17.1


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [dpdk-dev] [PATCH v1 04/12] test/stack_perf: use compiler atomics for lcore sync
  2021-08-02 10:18 ` [dpdk-dev] [PATCH v1 04/12] test/stack_perf: use compiler atomics for lcore sync Joyce Kong
@ 2021-11-09 21:12   ` Honnappa Nagarahalli
  0 siblings, 0 replies; 28+ messages in thread
From: Honnappa Nagarahalli @ 2021-11-09 21:12 UTC (permalink / raw)
  To: Joyce Kong, thomas, david.marchand, Ruifeng Wang,
	konstantin.ananyev, rsanford, erik.g.carrillo, olivier.matz,
	yipeng1.wang, sameh.gobriel, bruce.richardson,
	vladimir.medvedkin, anatoly.burakov, andrew.rybchenko, jerinj,
	declan.doherty, ciara.power, xiaoyun.li, nicolas.chautru,
	maryam.tahhan, reshma.pattan, cristian.dumitrescu
  Cc: dev, nd, Honnappa Nagarahalli, nd

<snip>

> 
> Convert rte_atomic usages to compiler atomic built-ins for lcore sync in
> stack_perf test cases.
> 
> Signed-off-by: Joyce Kong <joyce.kong@arm.com>
> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Looks good
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>

> ---
>  app/test/test_stack_perf.c | 14 ++++++--------
>  1 file changed, 6 insertions(+), 8 deletions(-)
> 
> diff --git a/app/test/test_stack_perf.c b/app/test/test_stack_perf.c index
> 4ee40d5d19..1eae00a334 100644
> --- a/app/test/test_stack_perf.c
> +++ b/app/test/test_stack_perf.c
> @@ -6,7 +6,6 @@
>  #include <stdio.h>
>  #include <inttypes.h>
> 
> -#include <rte_atomic.h>
>  #include <rte_cycles.h>
>  #include <rte_launch.h>
>  #include <rte_pause.h>
> @@ -24,7 +23,7 @@
>   */
>  static volatile unsigned int bulk_sizes[] = {8, MAX_BURST};
> 
> -static rte_atomic32_t lcore_barrier;
> +static uint32_t lcore_barrier;
> 
>  struct lcore_pair {
>  	unsigned int c1;
> @@ -144,9 +143,8 @@ bulk_push_pop(void *p)
>  	s = args->s;
>  	size = args->sz;
> 
> -	rte_atomic32_sub(&lcore_barrier, 1);
> -	while (rte_atomic32_read(&lcore_barrier) != 0)
> -		rte_pause();
> +	__atomic_fetch_sub(&lcore_barrier, 1, __ATOMIC_RELAXED);
> +	rte_wait_until_equal_32(&lcore_barrier, 0, __ATOMIC_RELAXED);
> 
>  	uint64_t start = rte_rdtsc();
> 
> @@ -175,7 +173,7 @@ run_on_core_pair(struct lcore_pair *cores, struct
> rte_stack *s,
>  	unsigned int i;
> 
>  	for (i = 0; i < RTE_DIM(bulk_sizes); i++) {
> -		rte_atomic32_set(&lcore_barrier, 2);
> +		__atomic_store_n(&lcore_barrier, 2, __ATOMIC_RELAXED);
> 
>  		args[0].sz = args[1].sz = bulk_sizes[i];
>  		args[0].s = args[1].s = s;
> @@ -208,7 +206,7 @@ run_on_n_cores(struct rte_stack *s, lcore_function_t
> fn, int n)
>  		int cnt = 0;
>  		double avg;
> 
> -		rte_atomic32_set(&lcore_barrier, n);
> +		__atomic_store_n(&lcore_barrier, n, __ATOMIC_RELAXED);
> 
>  		RTE_LCORE_FOREACH_WORKER(lcore_id) {
>  			if (++cnt >= n)
> @@ -302,7 +300,7 @@ __test_stack_perf(uint32_t flags)
>  	struct lcore_pair cores;
>  	struct rte_stack *s;
> 
> -	rte_atomic32_init(&lcore_barrier);
> +	__atomic_store_n(&lcore_barrier, 0, __ATOMIC_RELAXED);
> 
>  	s = rte_stack_create(STACK_NAME, STACK_SIZE, rte_socket_id(),
> flags);
>  	if (s == NULL) {
> --
> 2.17.1


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [dpdk-dev] [PATCH v1 06/12] test/func_reentrancy: use compiler atomic for data sync
  2021-08-02 10:18 ` [dpdk-dev] [PATCH v1 06/12] test/func_reentrancy: use compiler atomic for data sync Joyce Kong
@ 2021-11-09 21:54   ` Honnappa Nagarahalli
  0 siblings, 0 replies; 28+ messages in thread
From: Honnappa Nagarahalli @ 2021-11-09 21:54 UTC (permalink / raw)
  To: Joyce Kong, thomas, david.marchand, Ruifeng Wang,
	konstantin.ananyev, rsanford, erik.g.carrillo, olivier.matz,
	yipeng1.wang, sameh.gobriel, bruce.richardson,
	vladimir.medvedkin, anatoly.burakov, andrew.rybchenko, jerinj,
	declan.doherty, ciara.power, xiaoyun.li, nicolas.chautru,
	maryam.tahhan, reshma.pattan, cristian.dumitrescu
  Cc: dev, nd, nd

<snip>

> 
> Convert rte_atomic usages to compiler atomic built-ins for data sync in
> func_reentrancy test cases.
> 
> Signed-off-by: Joyce Kong <joyce.kong@arm.com>
> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Looks good.
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>

> ---
>  app/test/test_func_reentrancy.c | 27 +++++++++++++--------------
>  1 file changed, 13 insertions(+), 14 deletions(-)
> 
> diff --git a/app/test/test_func_reentrancy.c b/app/test/test_func_reentrancy.c
> index 231c99a9eb..c00ecb8110 100644
> --- a/app/test/test_func_reentrancy.c
> +++ b/app/test/test_func_reentrancy.c
> @@ -20,7 +20,6 @@
>  #include <rte_eal.h>
>  #include <rte_per_lcore.h>
>  #include <rte_lcore.h>
> -#include <rte_atomic.h>
>  #include <rte_branch_prediction.h>
>  #include <rte_ring.h>
>  #include <rte_mempool.h>
> @@ -54,12 +53,12 @@ typedef void (*case_clean_t)(unsigned lcore_id);
> 
>  #define MAX_LCORES	(RTE_MAX_MEMZONE / (MAX_ITER_MULTI * 4U))
> 
> -static rte_atomic32_t obj_count = RTE_ATOMIC32_INIT(0); -static
> rte_atomic32_t synchro = RTE_ATOMIC32_INIT(0);
> +static uint32_t obj_count;
> +static uint32_t synchro;
> 
>  #define WAIT_SYNCHRO_FOR_WORKERS()   do { \
>  	if (lcore_self != rte_get_main_lcore())                  \
> -		while (rte_atomic32_read(&synchro) == 0);        \
> +		rte_wait_until_equal_32(&synchro, 1, __ATOMIC_RELAXED); \
>  } while(0)
> 
>  /*
> @@ -72,7 +71,7 @@ test_eal_init_once(__rte_unused void *arg)
> 
>  	WAIT_SYNCHRO_FOR_WORKERS();
> 
> -	rte_atomic32_set(&obj_count, 1); /* silent the check in the caller */
> +	__atomic_store_n(&obj_count, 1, __ATOMIC_RELAXED); /* silent the
> check
> +in the caller */
>  	if (rte_eal_init(0, NULL) != -1)
>  		return -1;
> 
> @@ -112,7 +111,7 @@ ring_create_lookup(__rte_unused void *arg)
>  	for (i = 0; i < MAX_ITER_ONCE; i++) {
>  		rp = rte_ring_create("fr_test_once", 4096, SOCKET_ID_ANY,
> 0);
>  		if (rp != NULL)
> -			rte_atomic32_inc(&obj_count);
> +			__atomic_fetch_add(&obj_count, 1,
> __ATOMIC_RELAXED);
>  	}
> 
>  	/* create/lookup new ring several times */ @@ -176,7 +175,7 @@
> mempool_create_lookup(__rte_unused void *arg)
>  					my_obj_init, NULL,
>  					SOCKET_ID_ANY, 0);
>  		if (mp != NULL)
> -			rte_atomic32_inc(&obj_count);
> +			__atomic_fetch_add(&obj_count, 1,
> __ATOMIC_RELAXED);
>  	}
> 
>  	/* create/lookup new ring several times */ @@ -239,7 +238,7 @@
> hash_create_free(__rte_unused void *arg)
>  	for (i = 0; i < MAX_ITER_ONCE; i++) {
>  		handle = rte_hash_create(&hash_params);
>  		if (handle != NULL)
> -			rte_atomic32_inc(&obj_count);
> +			__atomic_fetch_add(&obj_count, 1,
> __ATOMIC_RELAXED);
>  	}
> 
>  	/* create mutiple times simultaneously */ @@ -303,7 +302,7 @@
> fbk_create_free(__rte_unused void *arg)
>  	for (i = 0; i < MAX_ITER_ONCE; i++) {
>  		handle = rte_fbk_hash_create(&fbk_params);
>  		if (handle != NULL)
> -			rte_atomic32_inc(&obj_count);
> +			__atomic_fetch_add(&obj_count, 1,
> __ATOMIC_RELAXED);
>  	}
> 
>  	/* create mutiple fbk tables simultaneously */ @@ -365,7 +364,7 @@
> lpm_create_free(__rte_unused void *arg)
>  	for (i = 0; i < MAX_ITER_ONCE; i++) {
>  		lpm = rte_lpm_create("fr_test_once",  SOCKET_ID_ANY,
> &config);
>  		if (lpm != NULL)
> -			rte_atomic32_inc(&obj_count);
> +			__atomic_fetch_add(&obj_count, 1,
> __ATOMIC_RELAXED);
>  	}
> 
>  	/* create mutiple fbk tables simultaneously */ @@ -427,8 +426,8 @@
> launch_test(struct test_case *pt_case)
>  	if (pt_case->func == NULL)
>  		return -1;
> 
> -	rte_atomic32_set(&obj_count, 0);
> -	rte_atomic32_set(&synchro, 0);
> +	__atomic_store_n(&obj_count, 0, __ATOMIC_RELAXED);
> +	__atomic_store_n(&synchro, 0, __ATOMIC_RELAXED);
> 
>  	RTE_LCORE_FOREACH_WORKER(lcore_id) {
>  		if (cores == 1)
> @@ -437,7 +436,7 @@ launch_test(struct test_case *pt_case)
>  		rte_eal_remote_launch(pt_case->func, pt_case->arg,
> lcore_id);
>  	}
> 
> -	rte_atomic32_set(&synchro, 1);
> +	__atomic_store_n(&synchro, 1, __ATOMIC_RELAXED);
> 
>  	if (pt_case->func(pt_case->arg) < 0)
>  		ret = -1;
> @@ -454,7 +453,7 @@ launch_test(struct test_case *pt_case)
>  			pt_case->clean(lcore_id);
>  	}
> 
> -	count = rte_atomic32_read(&obj_count);
> +	count = __atomic_load_n(&obj_count, __ATOMIC_RELAXED);
>  	if (count != 1) {
>  		printf("%s: common object allocated %d times (should be
> 1)\n",
>  			pt_case->name, count);
> --
> 2.17.1


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [dpdk-dev] [PATCH v1 08/12] app/crypto: use compiler atomic builtins for display sync
  2021-08-02 10:18 ` [dpdk-dev] [PATCH v1 08/12] app/crypto: use compiler atomic builtins for display sync Joyce Kong
@ 2021-11-09 22:11   ` Honnappa Nagarahalli
  0 siblings, 0 replies; 28+ messages in thread
From: Honnappa Nagarahalli @ 2021-11-09 22:11 UTC (permalink / raw)
  To: Joyce Kong, thomas, david.marchand, Ruifeng Wang,
	konstantin.ananyev, rsanford, erik.g.carrillo, olivier.matz,
	yipeng1.wang, sameh.gobriel, bruce.richardson,
	vladimir.medvedkin, anatoly.burakov, andrew.rybchenko, jerinj,
	declan.doherty, ciara.power, xiaoyun.li, nicolas.chautru,
	maryam.tahhan, reshma.pattan, cristian.dumitrescu
  Cc: dev, nd, nd

<snip>

> 
> Covert rte_atomic_test_and_set usage to compiler atomic CAS operation for
   ^^^^^^ Convert

> display sync in crypto cases.
> 
> Signed-off-by: Joyce Kong <joyce.kong@arm.com>
> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
With the above typo fixed,
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>

> ---
>  app/test-crypto-perf/cperf_test_latency.c        | 6 ++++--
>  app/test-crypto-perf/cperf_test_pmd_cyclecount.c | 9 ++++++---
>  app/test-crypto-perf/cperf_test_throughput.c     | 9 ++++++---
>  app/test-crypto-perf/cperf_test_verify.c         | 9 ++++++---
>  4 files changed, 22 insertions(+), 11 deletions(-)
> 
> diff --git a/app/test-crypto-perf/cperf_test_latency.c b/app/test-crypto-
> perf/cperf_test_latency.c
> index 159fe8492b..5e73c75fba 100644
> --- a/app/test-crypto-perf/cperf_test_latency.c
> +++ b/app/test-crypto-perf/cperf_test_latency.c
> @@ -126,7 +126,7 @@ cperf_latency_test_runner(void *arg)
>  	uint8_t burst_size_idx = 0;
>  	uint32_t imix_idx = 0;
> 
> -	static rte_atomic16_t display_once = RTE_ATOMIC16_INIT(0);
> +	static uint16_t display_once;
> 
>  	if (ctx == NULL)
>  		return 0;
> @@ -308,7 +308,9 @@ cperf_latency_test_runner(void *arg)
>  		time_min = tunit*(double)(tsc_min) / tsc_hz;
> 
>  		if (ctx->options->csv) {
> -			if (rte_atomic16_test_and_set(&display_once))
> +			uint16_t exp = 0;
> +			if (__atomic_compare_exchange_n(&display_once,
> &exp, 1, 0,
> +					__ATOMIC_RELAXED,
> __ATOMIC_RELAXED))
>  				printf("\n# lcore, Buffer Size, Burst Size, Pakt
> Seq #, "
>  						"cycles, time (us)");
> 
> diff --git a/app/test-crypto-perf/cperf_test_pmd_cyclecount.c b/app/test-
> crypto-perf/cperf_test_pmd_cyclecount.c
> index 844659aeca..a1de334efb 100644
> --- a/app/test-crypto-perf/cperf_test_pmd_cyclecount.c
> +++ b/app/test-crypto-perf/cperf_test_pmd_cyclecount.c
> @@ -404,7 +404,7 @@ cperf_pmd_cyclecount_test_runner(void *test_ctx)
>  	state.lcore = rte_lcore_id();
>  	state.linearize = 0;
> 
> -	static rte_atomic16_t display_once = RTE_ATOMIC16_INIT(0);
> +	static uint16_t display_once;
>  	static bool warmup = true;
> 
>  	/*
> @@ -449,8 +449,10 @@ cperf_pmd_cyclecount_test_runner(void *test_ctx)
>  			continue;
>  		}
> 
> +		uint16_t exp = 0;
>  		if (!opts->csv) {
> -			if (rte_atomic16_test_and_set(&display_once))
> +			if (__atomic_compare_exchange_n(&display_once,
> &exp, 1, 0,
> +					__ATOMIC_RELAXED,
> __ATOMIC_RELAXED))
>  				printf(PRETTY_HDR_FMT, "lcore id", "Buf
> Size",
>  						"Burst Size", "Enqueued",
>  						"Dequeued", "Enq Retries",
> @@ -466,7 +468,8 @@ cperf_pmd_cyclecount_test_runner(void *test_ctx)
>  					state.cycles_per_enq,
>  					state.cycles_per_deq);
>  		} else {
> -			if (rte_atomic16_test_and_set(&display_once))
> +			if (__atomic_compare_exchange_n(&display_once,
> &exp, 1, 0,
> +					__ATOMIC_RELAXED,
> __ATOMIC_RELAXED))
>  				printf(CSV_HDR_FMT, "# lcore id", "Buf Size",
>  						"Burst Size", "Enqueued",
>  						"Dequeued", "Enq Retries",
> diff --git a/app/test-crypto-perf/cperf_test_throughput.c b/app/test-crypto-
> perf/cperf_test_throughput.c
> index f6eb8cf259..1407007c6e 100644
> --- a/app/test-crypto-perf/cperf_test_throughput.c
> +++ b/app/test-crypto-perf/cperf_test_throughput.c
> @@ -106,7 +106,7 @@ cperf_throughput_test_runner(void *test_ctx)
>  	uint8_t burst_size_idx = 0;
>  	uint32_t imix_idx = 0;
> 
> -	static rte_atomic16_t display_once = RTE_ATOMIC16_INIT(0);
> +	static uint16_t display_once;
> 
>  	struct rte_crypto_op *ops[ctx->options->max_burst_size];
>  	struct rte_crypto_op *ops_processed[ctx->options->max_burst_size];
> @@ -272,8 +272,10 @@ cperf_throughput_test_runner(void *test_ctx)
>  		double cycles_per_packet = ((double)tsc_duration /
>  				ctx->options->total_ops);
> 
> +		uint16_t exp = 0;
>  		if (!ctx->options->csv) {
> -			if (rte_atomic16_test_and_set(&display_once))
> +			if (__atomic_compare_exchange_n(&display_once,
> &exp, 1, 0,
> +					__ATOMIC_RELAXED,
> __ATOMIC_RELAXED))
> 
> 	printf("%12s%12s%12s%12s%12s%12s%12s%12s%12s%12s\n\n",
>  					"lcore id", "Buf Size", "Burst Size",
>  					"Enqueued", "Dequeued", "Failed
> Enq", @@ -293,7 +295,8 @@ cperf_throughput_test_runner(void *test_ctx)
>  					throughput_gbps,
>  					cycles_per_packet);
>  		} else {
> -			if (rte_atomic16_test_and_set(&display_once))
> +			if (__atomic_compare_exchange_n(&display_once,
> &exp, 1, 0,
> +					__ATOMIC_RELAXED,
> __ATOMIC_RELAXED))
>  				printf("#lcore id,Buffer Size(B),"
>  					"Burst
> Size,Enqueued,Dequeued,Failed Enq,"
>  					"Failed
> Deq,Ops(Millions),Throughput(Gbps),"
> diff --git a/app/test-crypto-perf/cperf_test_verify.c b/app/test-crypto-
> perf/cperf_test_verify.c
> index 2939aeaa93..0c053ad3c0 100644
> --- a/app/test-crypto-perf/cperf_test_verify.c
> +++ b/app/test-crypto-perf/cperf_test_verify.c
> @@ -241,7 +241,7 @@ cperf_verify_test_runner(void *test_ctx)
>  	uint64_t ops_deqd = 0, ops_deqd_total = 0, ops_deqd_failed = 0;
>  	uint64_t ops_failed = 0;
> 
> -	static rte_atomic16_t display_once = RTE_ATOMIC16_INIT(0);
> +	static uint16_t display_once;
> 
>  	uint64_t i;
>  	uint16_t ops_unused = 0;
> @@ -383,8 +383,10 @@ cperf_verify_test_runner(void *test_ctx)
>  		ops_deqd_total += ops_deqd;
>  	}
> 
> +	uint16_t exp = 0;
>  	if (!ctx->options->csv) {
> -		if (rte_atomic16_test_and_set(&display_once))
> +		if (__atomic_compare_exchange_n(&display_once, &exp, 1,
> 0,
> +				__ATOMIC_RELAXED, __ATOMIC_RELAXED))
> 
> 	printf("%12s%12s%12s%12s%12s%12s%12s%12s\n\n",
>  				"lcore id", "Buf Size", "Burst size",
>  				"Enqueued", "Dequeued", "Failed Enq", @@ -
> 401,7 +403,8 @@ cperf_verify_test_runner(void *test_ctx)
>  				ops_deqd_failed,
>  				ops_failed);
>  	} else {
> -		if (rte_atomic16_test_and_set(&display_once))
> +		if (__atomic_compare_exchange_n(&display_once, &exp, 1,
> 0,
> +				__ATOMIC_RELAXED, __ATOMIC_RELAXED))
>  			printf("\n# lcore id, Buffer Size(B), "
>  				"Burst Size,Enqueued,Dequeued,Failed Enq,"
>  				"Failed Deq,Failed Ops\n");
> --
> 2.17.1


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [dpdk-dev] [PATCH v1 09/12] app/compress: use compiler atomic builtins for display sync
  2021-08-02 10:18 ` [dpdk-dev] [PATCH v1 09/12] app/compress: " Joyce Kong
@ 2021-11-09 22:59   ` Honnappa Nagarahalli
  2021-11-11  8:13     ` Joyce Kong
  0 siblings, 1 reply; 28+ messages in thread
From: Honnappa Nagarahalli @ 2021-11-09 22:59 UTC (permalink / raw)
  To: Joyce Kong, thomas, david.marchand, Ruifeng Wang,
	konstantin.ananyev, rsanford, erik.g.carrillo, olivier.matz,
	yipeng1.wang, sameh.gobriel, bruce.richardson,
	vladimir.medvedkin, anatoly.burakov, andrew.rybchenko, jerinj,
	declan.doherty, ciara.power, xiaoyun.li, nicolas.chautru,
	maryam.tahhan, reshma.pattan, cristian.dumitrescu
  Cc: dev, nd, nd

<snip>

> 
> Convert rte_atomic_test_and_set usage to compiler atomic CAS operation for
> display sync.
> 
> Signed-off-by: Joyce Kong <joyce.kong@arm.com>
> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
> ---
>  app/test-compress-perf/comp_perf_test_common.h     |  2 +-
>  app/test-compress-perf/comp_perf_test_cyclecount.c | 10 +++++++---
> app/test-compress-perf/comp_perf_test_throughput.c | 11 ++++++++---
>  app/test-compress-perf/comp_perf_test_verify.c     |  6 ++++--
>  4 files changed, 20 insertions(+), 9 deletions(-)
> 
> diff --git a/app/test-compress-perf/comp_perf_test_common.h b/app/test-
> compress-perf/comp_perf_test_common.h
> index 72705c6a2b..d039e5a29a 100644
> --- a/app/test-compress-perf/comp_perf_test_common.h
> +++ b/app/test-compress-perf/comp_perf_test_common.h
> @@ -14,7 +14,7 @@ struct cperf_mem_resources {
>  	uint16_t qp_id;
>  	uint8_t lcore_id;
> 
> -	rte_atomic16_t print_info_once;
> +	uint16_t print_info_once;
> 
>  	uint32_t total_bufs;
>  	uint8_t *compressed_data;
> diff --git a/app/test-compress-perf/comp_perf_test_cyclecount.c b/app/test-
> compress-perf/comp_perf_test_cyclecount.c
> index 55559a7d5a..e002e53bdf 100644
> --- a/app/test-compress-perf/comp_perf_test_cyclecount.c
> +++ b/app/test-compress-perf/comp_perf_test_cyclecount.c
> @@ -468,7 +468,7 @@ cperf_cyclecount_test_runner(void *test_ctx)
>  	struct cperf_cyclecount_ctx *ctx = test_ctx;
>  	struct comp_test_data *test_data = ctx->ver.options;
>  	uint32_t lcore = rte_lcore_id();
> -	static rte_atomic16_t display_once = RTE_ATOMIC16_INIT(0);
> +	static uint16_t display_once;
>  	static rte_spinlock_t print_spinlock;
>  	int i;
> 
> @@ -488,10 +488,12 @@ cperf_cyclecount_test_runner(void *test_ctx)
> 
>  	ctx->ver.mem.lcore_id = lcore;
> 
> +	uint16_t exp = 0;
>  	/*
>  	 * printing information about current compression thread
>  	 */
> -	if (rte_atomic16_test_and_set(&ctx->ver.mem.print_info_once))
> +	if (__atomic_compare_exchange_n(&ctx->ver.mem.print_info_once,
> &exp,
> +				1, 0, __ATOMIC_RELAXED,
> __ATOMIC_RELAXED))
>  		printf("    lcore: %u,"
>  				" driver name: %s,"
>  				" device name: %s,"
> @@ -547,8 +549,10 @@ cperf_cyclecount_test_runner(void *test_ctx)
>  	duration_setup_per_op = ctx->duration_op /
>  			(ctx->ver.mem.total_bufs * test_data->num_iter);
> 
> +	exp = 0;
>  	/* R E P O R T processing */
> -	if (rte_atomic16_test_and_set(&display_once)) {
> +	if (__atomic_compare_exchange_n(&display_once, &exp, 1, 0,
> +			__ATOMIC_RELAXED, __ATOMIC_RELAXED)) {
I think we can re-organize the current code which will avoid using atomic operations on 'display_once'. How about the following code?

rte_spinlock_lock(&print_spinlock);
if (display_once != 1) {
	display_once = 1;
	printf("\nLegend for the table\n".....
	<all other printfs>
}
printf....
printf....
printf....
rte_spinlock_unlock(&print_spinlock);

Rest of the changes look fine.

> 
>  		rte_spinlock_lock(&print_spinlock);
> 
> diff --git a/app/test-compress-perf/comp_perf_test_throughput.c b/app/test-
> compress-perf/comp_perf_test_throughput.c
> index 13922b658c..f587ad2ec3 100644
> --- a/app/test-compress-perf/comp_perf_test_throughput.c
> +++ b/app/test-compress-perf/comp_perf_test_throughput.c
> @@ -329,15 +329,18 @@ cperf_throughput_test_runner(void *test_ctx)
>  	struct cperf_benchmark_ctx *ctx = test_ctx;
>  	struct comp_test_data *test_data = ctx->ver.options;
>  	uint32_t lcore = rte_lcore_id();
> -	static rte_atomic16_t display_once = RTE_ATOMIC16_INIT(0);
> +	static uint16_t display_once;
>  	int i, ret = EXIT_SUCCESS;
> 
>  	ctx->ver.mem.lcore_id = lcore;
> 
> +	uint16_t exp = 0;
>  	/*
>  	 * printing information about current compression thread
>  	 */
> -	if (rte_atomic16_test_and_set(&ctx->ver.mem.print_info_once))
> +	if (__atomic_compare_exchange_n(&ctx->ver.mem.print_info_once,
> &exp,
> +				1, 0, __ATOMIC_RELAXED,
> __ATOMIC_RELAXED))
> +
>  		printf("    lcore: %u,"
>  				" driver name: %s,"
>  				" device name: %s,"
> @@ -391,7 +394,9 @@ cperf_throughput_test_runner(void *test_ctx)
>  	ctx->decomp_gbps = rte_get_tsc_hz() / ctx->decomp_tsc_byte * 8 /
>  			1000000000;
> 
> -	if (rte_atomic16_test_and_set(&display_once)) {
> +	exp = 0;
> +	if (__atomic_compare_exchange_n(&display_once, &exp, 1, 0,
> +				__ATOMIC_RELAXED, __ATOMIC_RELAXED)) {
>  		printf("\n%12s%6s%12s%17s%15s%16s\n",
>  			"lcore id", "Level", "Comp size", "Comp ratio [%]",
>  			"Comp [Gbps]", "Decomp [Gbps]");
> diff --git a/app/test-compress-perf/comp_perf_test_verify.c b/app/test-
> compress-perf/comp_perf_test_verify.c
> index 5e13257b79..6a2497985b 100644
> --- a/app/test-compress-perf/comp_perf_test_verify.c
> +++ b/app/test-compress-perf/comp_perf_test_verify.c
> @@ -388,7 +388,7 @@ cperf_verify_test_runner(void *test_ctx)
>  	struct cperf_verify_ctx *ctx = test_ctx;
>  	struct comp_test_data *test_data = ctx->options;
>  	int ret = EXIT_SUCCESS;
> -	static rte_atomic16_t display_once = RTE_ATOMIC16_INIT(0);
> +	static uint16_t display_once;
>  	uint32_t lcore = rte_lcore_id();
> 
>  	ctx->mem.lcore_id = lcore;
> @@ -428,7 +428,9 @@ cperf_verify_test_runner(void *test_ctx)
>  			test_data->input_data_sz * 100;
> 
>  	if (!ctx->silent) {
> -		if (rte_atomic16_test_and_set(&display_once)) {
> +		uint16_t exp = 0;
> +		if (__atomic_compare_exchange_n(&display_once, &exp, 1,
> 0,
> +				__ATOMIC_RELAXED, __ATOMIC_RELAXED)) {
>  			printf("%12s%6s%12s%17s\n",
>  			    "lcore id", "Level", "Comp size", "Comp ratio [%]");
>  		}
> --
> 2.17.1


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [dpdk-dev] [PATCH v1 10/12] app/testpmd: use compiler atomic builtins for port sync
  2021-08-02 10:18 ` [dpdk-dev] [PATCH v1 10/12] app/testpmd: use compiler atomic builtins for port sync Joyce Kong
@ 2021-11-09 23:14   ` Honnappa Nagarahalli
  2021-11-11  8:51     ` Joyce Kong
  0 siblings, 1 reply; 28+ messages in thread
From: Honnappa Nagarahalli @ 2021-11-09 23:14 UTC (permalink / raw)
  To: Joyce Kong, thomas, david.marchand, Ruifeng Wang,
	konstantin.ananyev, rsanford, erik.g.carrillo, olivier.matz,
	yipeng1.wang, sameh.gobriel, bruce.richardson,
	vladimir.medvedkin, anatoly.burakov, andrew.rybchenko, jerinj,
	declan.doherty, ciara.power, xiaoyun.li, nicolas.chautru,
	maryam.tahhan, reshma.pattan, cristian.dumitrescu, Yigit, Ferruh
  Cc: dev, nd, Honnappa Nagarahalli, nd

+ Ferruh

<snip>

Hi Joyce/Ferruh,
     I do not think the port_status changes need to be handled atomically. The changes to port_status do not seem to be happening from multiple threads. It seems to be getting modified during initialization or through the test pmd prompt.

Do we really need atomic operations on port_status?

> 
> Replace rte_atomic16_cmpset operation with compiler atomic CAS operation
> for port status sync in testpmd case.
> 
> Signed-off-by: Joyce Kong <joyce.kong@arm.com>
> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
> ---
>  app/test-pmd/testpmd.c | 75 +++++++++++++++++++++++++++---------------
>  1 file changed, 48 insertions(+), 27 deletions(-)
> 
> diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c index
> 6cbe9ba3c8..22579dd438 100644
> --- a/app/test-pmd/testpmd.c
> +++ b/app/test-pmd/testpmd.c
> @@ -36,7 +36,6 @@
>  #include <rte_alarm.h>
>  #include <rte_per_lcore.h>
>  #include <rte_lcore.h>
> -#include <rte_atomic.h>
>  #include <rte_branch_prediction.h>
>  #include <rte_mempool.h>
>  #include <rte_malloc.h>
> @@ -2302,6 +2301,7 @@ setup_hairpin_queues(portid_t pi, portid_t p_pi,
> uint16_t cnt_pi)
>  	uint16_t peer_tx_port = pi;
>  	uint32_t manual = 1;
>  	uint32_t tx_exp = hairpin_mode & 0x10;
> +	uint16_t port_exp;
> 
>  	if (!(hairpin_mode & 0xf)) {
>  		peer_rx_port = pi;
> @@ -2347,10 +2347,11 @@ setup_hairpin_queues(portid_t pi, portid_t p_pi,
> uint16_t cnt_pi)
>  		if (diag == 0)
>  			continue;
> 
> +		port_exp = RTE_PORT_HANDLING;
>  		/* Fail to setup rx queue, return */
> -		if (rte_atomic16_cmpset(&(port->port_status),
> -					RTE_PORT_HANDLING,
> -					RTE_PORT_STOPPED) == 0)
> +		if (__atomic_compare_exchange_n(&(port->port_status),
> +				&port_exp, RTE_PORT_STOPPED, 0,
> +				__ATOMIC_RELAXED, __ATOMIC_RELAXED)
> == 0)
>  			fprintf(stderr,
>  				"Port %d can not be set back to stopped\n",
> pi);
>  		fprintf(stderr, "Fail to configure port %d hairpin queues\n",
> @@ -2370,10 +2371,11 @@ setup_hairpin_queues(portid_t pi, portid_t p_pi,
> uint16_t cnt_pi)
>  		if (diag == 0)
>  			continue;
> 
> +		port_exp = RTE_PORT_HANDLING;
>  		/* Fail to setup rx queue, return */
> -		if (rte_atomic16_cmpset(&(port->port_status),
> -					RTE_PORT_HANDLING,
> -					RTE_PORT_STOPPED) == 0)
> +		if (__atomic_compare_exchange_n(&(port->port_status),
> +				&port_exp, RTE_PORT_STOPPED, 0,
> +				__ATOMIC_RELAXED, __ATOMIC_RELAXED)
> == 0)
>  			fprintf(stderr,
>  				"Port %d can not be set back to stopped\n",
> pi);
>  		fprintf(stderr, "Fail to configure port %d hairpin queues\n",
> @@ -2444,6 +2446,7 @@ start_port(portid_t pid)
>  	queueid_t qi;
>  	struct rte_port *port;
>  	struct rte_eth_hairpin_cap cap;
> +	uint16_t port_exp;
> 
>  	if (port_id_is_invalid(pid, ENABLED_WARN))
>  		return 0;
> @@ -2454,8 +2457,10 @@ start_port(portid_t pid)
> 
>  		need_check_link_status = 0;
>  		port = &ports[pi];
> -		if (rte_atomic16_cmpset(&(port->port_status),
> RTE_PORT_STOPPED,
> -						 RTE_PORT_HANDLING) == 0)
> {
> +		port_exp = RTE_PORT_STOPPED;
> +		if (__atomic_compare_exchange_n(&(port->port_status),
> +				&port_exp, RTE_PORT_HANDLING, 0,
> +				__ATOMIC_RELAXED, __ATOMIC_RELAXED)
> == 0) {
>  			fprintf(stderr, "Port %d is now not stopped\n", pi);
>  			continue;
>  		}
> @@ -2487,8 +2492,10 @@ start_port(portid_t pid)
>  						     nb_txq + nb_hairpinq,
>  						     &(port->dev_conf));
>  			if (diag != 0) {
> -				if (rte_atomic16_cmpset(&(port-
> >port_status),
> -				RTE_PORT_HANDLING, RTE_PORT_STOPPED)
> == 0)
> +				port_exp = RTE_PORT_HANDLING;
> +				if (__atomic_compare_exchange_n(&(port-
> >port_status),
> +						&port_exp,
> RTE_PORT_STOPPED, 0,
> +						__ATOMIC_RELAXED,
> __ATOMIC_RELAXED) == 0)
>  					fprintf(stderr,
>  						"Port %d can not be set back
> to stopped\n",
>  						pi);
> @@ -2518,10 +2525,11 @@ start_port(portid_t pid)
>  				if (diag == 0)
>  					continue;
> 
> +				port_exp = RTE_PORT_HANDLING;
>  				/* Fail to setup tx queue, return */
> -				if (rte_atomic16_cmpset(&(port-
> >port_status),
> -
> 	RTE_PORT_HANDLING,
> -							RTE_PORT_STOPPED)
> == 0)
> +				if (__atomic_compare_exchange_n(&(port-
> >port_status),
> +						&port_exp,
> RTE_PORT_STOPPED, 0,
> +						__ATOMIC_RELAXED,
> __ATOMIC_RELAXED) == 0)
>  					fprintf(stderr,
>  						"Port %d can not be set back
> to stopped\n",
>  						pi);
> @@ -2570,10 +2578,11 @@ start_port(portid_t pid)
>  				if (diag == 0)
>  					continue;
> 
> +				port_exp = RTE_PORT_HANDLING;
>  				/* Fail to setup rx queue, return */
> -				if (rte_atomic16_cmpset(&(port-
> >port_status),
> -
> 	RTE_PORT_HANDLING,
> -							RTE_PORT_STOPPED)
> == 0)
> +				if (__atomic_compare_exchange_n(&(port-
> >port_status),
> +						&port_exp,
> RTE_PORT_STOPPED, 0,
> +						__ATOMIC_RELAXED,
> __ATOMIC_RELAXED) == 0)
>  					fprintf(stderr,
>  						"Port %d can not be set back
> to stopped\n",
>  						pi);
> @@ -2607,17 +2616,21 @@ start_port(portid_t pid)
>  			fprintf(stderr, "Fail to start port %d: %s\n",
>  				pi, rte_strerror(-diag));
> 
> +			port_exp = RTE_PORT_HANDLING;
>  			/* Fail to setup rx queue, return */
> -			if (rte_atomic16_cmpset(&(port->port_status),
> -				RTE_PORT_HANDLING, RTE_PORT_STOPPED)
> == 0)
> +			if (__atomic_compare_exchange_n(&(port-
> >port_status),
> +					&port_exp, RTE_PORT_STOPPED, 0,
> +					__ATOMIC_RELAXED,
> __ATOMIC_RELAXED) == 0)
>  				fprintf(stderr,
>  					"Port %d can not be set back to
> stopped\n",
>  					pi);
>  			continue;
>  		}
> 
> -		if (rte_atomic16_cmpset(&(port->port_status),
> -			RTE_PORT_HANDLING, RTE_PORT_STARTED) == 0)
> +		port_exp = RTE_PORT_HANDLING;
> +		if (__atomic_compare_exchange_n(&(port->port_status),
> +				&port_exp, RTE_PORT_STARTED, 0,
> +				__ATOMIC_RELAXED, __ATOMIC_RELAXED)
> == 0)
>  			fprintf(stderr, "Port %d can not be set into started\n",
>  				pi);
> 
> @@ -2697,6 +2710,7 @@ stop_port(portid_t pid)
>  	int need_check_link_status = 0;
>  	portid_t peer_pl[RTE_MAX_ETHPORTS];
>  	int peer_pi;
> +	uint16_t port_exp;
> 
>  	if (port_id_is_invalid(pid, ENABLED_WARN))
>  		return;
> @@ -2722,8 +2736,10 @@ stop_port(portid_t pid)
>  		}
> 
>  		port = &ports[pi];
> -		if (rte_atomic16_cmpset(&(port->port_status),
> RTE_PORT_STARTED,
> -						RTE_PORT_HANDLING) == 0)
> +		port_exp = RTE_PORT_STARTED;
> +		if (__atomic_compare_exchange_n(&(port->port_status),
> +				&port_exp, RTE_PORT_HANDLING, 0,
> +				__ATOMIC_RELAXED, __ATOMIC_RELAXED)
> == 0)
>  			continue;
> 
>  		if (hairpin_mode & 0xf) {
> @@ -2749,8 +2765,10 @@ stop_port(portid_t pid)
>  			RTE_LOG(ERR, EAL, "rte_eth_dev_stop failed for port
> %u\n",
>  				pi);
> 
> -		if (rte_atomic16_cmpset(&(port->port_status),
> -			RTE_PORT_HANDLING, RTE_PORT_STOPPED) == 0)
> +		port_exp = RTE_PORT_HANDLING;
> +		if (__atomic_compare_exchange_n(&(port->port_status),
> +				&port_exp, RTE_PORT_STOPPED, 0,
> +				__ATOMIC_RELAXED, __ATOMIC_RELAXED)
> == 0)
>  			fprintf(stderr, "Port %d can not be set into
> stopped\n",
>  				pi);
>  		need_check_link_status = 1;
> @@ -2788,6 +2806,7 @@ close_port(portid_t pid)  {
>  	portid_t pi;
>  	struct rte_port *port;
> +	uint16_t port_exp;
> 
>  	if (port_id_is_invalid(pid, ENABLED_WARN))
>  		return;
> @@ -2813,8 +2832,10 @@ close_port(portid_t pid)
>  		}
> 
>  		port = &ports[pi];
> -		if (rte_atomic16_cmpset(&(port->port_status),
> -			RTE_PORT_CLOSED, RTE_PORT_CLOSED) == 1) {
> +		port_exp = RTE_PORT_CLOSED;
> +		if (__atomic_compare_exchange_n(&(port->port_status),
> +				&port_exp, RTE_PORT_CLOSED, 0,
> +				__ATOMIC_RELAXED, __ATOMIC_RELAXED)) {
>  			fprintf(stderr, "Port %d is already closed\n", pi);
>  			continue;
>  		}
> --
> 2.17.1


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [dpdk-dev] [PATCH v1 01/12] test/pmd_perf: use compiler atomic builtins for polling sync
  2021-11-08 22:50   ` Honnappa Nagarahalli
@ 2021-11-10  6:10     ` Joyce Kong
  0 siblings, 0 replies; 28+ messages in thread
From: Joyce Kong @ 2021-11-10  6:10 UTC (permalink / raw)
  To: Honnappa Nagarahalli, thomas, david.marchand, Ruifeng Wang,
	konstantin.ananyev, rsanford, erik.g.carrillo, olivier.matz,
	yipeng1.wang, sameh.gobriel, bruce.richardson,
	vladimir.medvedkin, anatoly.burakov, andrew.rybchenko, jerinj,
	declan.doherty, ciara.power, xiaoyun.li, nicolas.chautru,
	maryam.tahhan, reshma.pattan, cristian.dumitrescu
  Cc: dev, nd

> <snip>
> 
> >
> > Convert rte_atomic usages to compiler atomic built-ins for polling
> > sync in pmd_perf test cases.
> >
> > Signed-off-by: Joyce Kong <joyce.kong@arm.com>
> > Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
> > ---
> >  app/test/test_pmd_perf.c | 12 +++++-------
> >  1 file changed, 5 insertions(+), 7 deletions(-)
> >
> > diff --git a/app/test/test_pmd_perf.c b/app/test/test_pmd_perf.c index
> > 3a248d512c..41ef001b90 100644
> > --- a/app/test/test_pmd_perf.c
> > +++ b/app/test/test_pmd_perf.c
> > @@ -10,7 +10,6 @@
> >  #include <rte_cycles.h>
> >  #include <rte_ethdev.h>
> >  #include <rte_byteorder.h>
> > -#include <rte_atomic.h>
> >  #include <rte_malloc.h>
> >  #include "packet_burst_generator.h"
> >  #include "test.h"
> > @@ -526,7 +525,7 @@ main_loop(__rte_unused void *args)
> >  	return 0;
> >  }
> >
> > -static rte_atomic64_t start;
> > +static uint64_t start;
> >
> >  static inline int
> >  poll_burst(void *args)
> > @@ -564,8 +563,7 @@ poll_burst(void *args)
> >  		num[portid] = pkt_per_port;
> >  	}
> >
> > -	while (!rte_atomic64_read(&start))
> > -		;
> > +	rte_wait_until_equal_64(&start, 1, __ATOMIC_RELAXED);
> This needs to be ACQUIRE. Please see comments below for function
> 'exec_burst'.
> 
> >
> >  	cur_tsc = rte_rdtsc();
> >  	while (total) {
> > @@ -617,7 +615,7 @@ exec_burst(uint32_t flags, int lcore)
> >  	pkt_per_port = MAX_TRAFFIC_BURST;
> >  	num = pkt_per_port * conf->nb_ports;
> >
> > -	rte_atomic64_init(&start);
> > +	__atomic_store_n(&start, 0, __ATOMIC_RELAXED);
> >
> >  	/* start polling thread, but not actually poll yet */
> >  	rte_eal_remote_launch(poll_burst,
> > @@ -625,7 +623,7 @@ exec_burst(uint32_t flags, int lcore)
> >
> >  	/* Only when polling first */
> >  	if (flags == SC_BURST_POLL_FIRST)
> > -		rte_atomic64_set(&start, 1);
> > +		__atomic_store_n(&start, 1, __ATOMIC_RELAXED);
> This line can be combined with initialization of 'start'. i.e.
> if (flags == SC_BURST_POLL_FIRST) {
> 	__atomic_store_n(&start, 1, __ATOMIC_RELAXED); } else {
> 	__atomic_store_n(&start, 0, __ATOMIC_RELAXED); } It is ok to use
> RELAXED here as it is part of initialization and 'rte_eal_remote_launch' will
> ensure that the store to 'start' is visible to the worker thread.
> 
> >
> >  	/* start xmit */
> >  	i = 0;
> > @@ -642,7 +640,7 @@ exec_burst(uint32_t flags, int lcore)
> >
> >  	/* only when polling second  */
> >  	if (flags == SC_BURST_XMIT_FIRST)
> > -		rte_atomic64_set(&start, 1);
> > +		__atomic_store_n(&start, 1, __ATOMIC_RELAXED);
> This store should happen after transmitting the packets. It is better to use
> RELEASE here.

Thanks, Honnappa, will make the changes in next version.
Joyce

> 
> >
> >  	/* wait for polling finished */
> >  	diff_tsc = rte_eal_wait_lcore(lcore);
> > --
> > 2.17.1


^ permalink raw reply	[flat|nested] 28+ messages in thread

* RE: [PATCH v1 11/12] app/bbdev: use compiler atomics for thread sync
  2021-08-02 10:18 ` [dpdk-dev] [PATCH v1 11/12] app/bbdev: use compiler atomics for thread sync Joyce Kong
@ 2021-11-10 21:25   ` Honnappa Nagarahalli
  0 siblings, 0 replies; 28+ messages in thread
From: Honnappa Nagarahalli @ 2021-11-10 21:25 UTC (permalink / raw)
  To: Joyce Kong, thomas, david.marchand, Ruifeng Wang,
	konstantin.ananyev, rsanford, erik.g.carrillo, olivier.matz,
	yipeng1.wang, sameh.gobriel, bruce.richardson,
	vladimir.medvedkin, anatoly.burakov, andrew.rybchenko, jerinj,
	declan.doherty, ciara.power, xiaoyun.li, nicolas.chautru,
	maryam.tahhan, reshma.pattan, cristian.dumitrescu
  Cc: dev, nd, nd

<snip>

> Subject: [PATCH v1 11/12] app/bbdev: use compiler atomics for thread sync
There are multiple fields which are getting synchronized. How about the following:
app/bbdev: use compiler atomics for synchronizing shared data

> 
> Convert rte_atomic usages to compiler atomic built-ins for thread params
> sync in bbdev cases.
> 
> Signed-off-by: Joyce Kong <joyce.kong@arm.com>
> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Otherwise, looks good
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>

> ---
>  app/test-bbdev/test_bbdev_perf.c | 135 ++++++++++++++-----------------
>  1 file changed, 59 insertions(+), 76 deletions(-)
> 
> diff --git a/app/test-bbdev/test_bbdev_perf.c b/app/test-
> bbdev/test_bbdev_perf.c
> index 469597b8b3..dc62e16216 100644
> --- a/app/test-bbdev/test_bbdev_perf.c
> +++ b/app/test-bbdev/test_bbdev_perf.c
> @@ -133,7 +133,7 @@ struct test_op_params {
>  	uint16_t num_to_process;
>  	uint16_t num_lcores;
>  	int vector_mask;
> -	rte_atomic16_t sync;
> +	uint16_t sync;
>  	struct test_buffers q_bufs[RTE_MAX_NUMA_NODES][MAX_QUEUES];
>  };
> 
> @@ -148,9 +148,9 @@ struct thread_params {
>  	uint8_t iter_count;
>  	double iter_average;
>  	double bler;
> -	rte_atomic16_t nb_dequeued;
> -	rte_atomic16_t processing_status;
> -	rte_atomic16_t burst_sz;
> +	uint16_t nb_dequeued;
> +	int16_t processing_status;
> +	uint16_t burst_sz;
>  	struct test_op_params *op_params;
>  	struct rte_bbdev_dec_op *dec_ops[MAX_BURST];
>  	struct rte_bbdev_enc_op *enc_ops[MAX_BURST]; @@ -2594,46
> +2594,46 @@ dequeue_event_callback(uint16_t dev_id,
>  	}
> 
>  	if (unlikely(event != RTE_BBDEV_EVENT_DEQUEUE)) {
> -		rte_atomic16_set(&tp->processing_status, TEST_FAILED);
> +		__atomic_store_n(&tp->processing_status, TEST_FAILED,
> +__ATOMIC_RELAXED);
>  		printf(
>  			"Dequeue interrupt handler called for incorrect
> event!\n");
>  		return;
>  	}
> 
> -	burst_sz = rte_atomic16_read(&tp->burst_sz);
> +	burst_sz = __atomic_load_n(&tp->burst_sz, __ATOMIC_RELAXED);
>  	num_ops = tp->op_params->num_to_process;
> 
>  	if (test_vector.op_type == RTE_BBDEV_OP_TURBO_DEC)
>  		deq = rte_bbdev_dequeue_dec_ops(dev_id, queue_id,
>  				&tp->dec_ops[
> -					rte_atomic16_read(&tp-
> >nb_dequeued)],
> +					__atomic_load_n(&tp->nb_dequeued,
> __ATOMIC_RELAXED)],
>  				burst_sz);
>  	else if (test_vector.op_type == RTE_BBDEV_OP_LDPC_DEC)
>  		deq = rte_bbdev_dequeue_ldpc_dec_ops(dev_id, queue_id,
>  				&tp->dec_ops[
> -					rte_atomic16_read(&tp-
> >nb_dequeued)],
> +					__atomic_load_n(&tp->nb_dequeued,
> __ATOMIC_RELAXED)],
>  				burst_sz);
>  	else if (test_vector.op_type == RTE_BBDEV_OP_LDPC_ENC)
>  		deq = rte_bbdev_dequeue_ldpc_enc_ops(dev_id, queue_id,
>  				&tp->enc_ops[
> -					rte_atomic16_read(&tp-
> >nb_dequeued)],
> +					__atomic_load_n(&tp->nb_dequeued,
> __ATOMIC_RELAXED)],
>  				burst_sz);
>  	else /*RTE_BBDEV_OP_TURBO_ENC*/
>  		deq = rte_bbdev_dequeue_enc_ops(dev_id, queue_id,
>  				&tp->enc_ops[
> -					rte_atomic16_read(&tp-
> >nb_dequeued)],
> +					__atomic_load_n(&tp->nb_dequeued,
> __ATOMIC_RELAXED)],
>  				burst_sz);
> 
>  	if (deq < burst_sz) {
>  		printf(
>  			"After receiving the interrupt all operations should be
> dequeued. Expected: %u, got: %u\n",
>  			burst_sz, deq);
> -		rte_atomic16_set(&tp->processing_status, TEST_FAILED);
> +		__atomic_store_n(&tp->processing_status, TEST_FAILED,
> +__ATOMIC_RELAXED);
>  		return;
>  	}
> 
> -	if (rte_atomic16_read(&tp->nb_dequeued) + deq < num_ops) {
> -		rte_atomic16_add(&tp->nb_dequeued, deq);
> +	if (__atomic_load_n(&tp->nb_dequeued, __ATOMIC_RELAXED) + deq
> < num_ops) {
> +		__atomic_fetch_add(&tp->nb_dequeued, deq,
> __ATOMIC_RELAXED);
>  		return;
>  	}
> 
> @@ -2670,7 +2670,7 @@ dequeue_event_callback(uint16_t dev_id,
> 
>  	if (ret) {
>  		printf("Buffers validation failed\n");
> -		rte_atomic16_set(&tp->processing_status, TEST_FAILED);
> +		__atomic_store_n(&tp->processing_status, TEST_FAILED,
> +__ATOMIC_RELAXED);
>  	}
> 
>  	switch (test_vector.op_type) {
> @@ -2691,7 +2691,7 @@ dequeue_event_callback(uint16_t dev_id,
>  		break;
>  	default:
>  		printf("Unknown op type: %d\n", test_vector.op_type);
> -		rte_atomic16_set(&tp->processing_status, TEST_FAILED);
> +		__atomic_store_n(&tp->processing_status, TEST_FAILED,
> +__ATOMIC_RELAXED);
>  		return;
>  	}
> 
> @@ -2700,7 +2700,7 @@ dequeue_event_callback(uint16_t dev_id,
>  	tp->mbps += (((double)(num_ops * tb_len_bits)) / 1000000.0) /
>  			((double)total_time / (double)rte_get_tsc_hz());
> 
> -	rte_atomic16_add(&tp->nb_dequeued, deq);
> +	__atomic_fetch_add(&tp->nb_dequeued, deq, __ATOMIC_RELAXED);
>  }
> 
>  static int
> @@ -2738,11 +2738,10 @@ throughput_intr_lcore_ldpc_dec(void *arg)
> 
>  	bufs = &tp->op_params-
> >q_bufs[GET_SOCKET(info.socket_id)][queue_id];
> 
> -	rte_atomic16_clear(&tp->processing_status);
> -	rte_atomic16_clear(&tp->nb_dequeued);
> +	__atomic_store_n(&tp->processing_status, 0, __ATOMIC_RELAXED);
> +	__atomic_store_n(&tp->nb_dequeued, 0, __ATOMIC_RELAXED);
> 
> -	while (rte_atomic16_read(&tp->op_params->sync) == SYNC_WAIT)
> -		rte_pause();
> +	rte_wait_until_equal_16(&tp->op_params->sync, SYNC_START,
> +__ATOMIC_RELAXED);
> 
>  	ret = rte_bbdev_dec_op_alloc_bulk(tp->op_params->mp, ops,
>  				num_to_process);
> @@ -2790,17 +2789,15 @@ throughput_intr_lcore_ldpc_dec(void *arg)
>  			 * the number of operations is not a multiple of
>  			 * burst size.
>  			 */
> -			rte_atomic16_set(&tp->burst_sz, num_to_enq);
> +			__atomic_store_n(&tp->burst_sz, num_to_enq,
> __ATOMIC_RELAXED);
> 
>  			/* Wait until processing of previous batch is
>  			 * completed
>  			 */
> -			while (rte_atomic16_read(&tp->nb_dequeued) !=
> -					(int16_t) enqueued)
> -				rte_pause();
> +			rte_wait_until_equal_16(&tp->nb_dequeued,
> enqueued,
> +__ATOMIC_RELAXED);
>  		}
>  		if (j != TEST_REPETITIONS - 1)
> -			rte_atomic16_clear(&tp->nb_dequeued);
> +			__atomic_store_n(&tp->nb_dequeued, 0,
> __ATOMIC_RELAXED);
>  	}
> 
>  	return TEST_SUCCESS;
> @@ -2835,11 +2832,10 @@ throughput_intr_lcore_dec(void *arg)
> 
>  	bufs = &tp->op_params-
> >q_bufs[GET_SOCKET(info.socket_id)][queue_id];
> 
> -	rte_atomic16_clear(&tp->processing_status);
> -	rte_atomic16_clear(&tp->nb_dequeued);
> +	__atomic_store_n(&tp->processing_status, 0, __ATOMIC_RELAXED);
> +	__atomic_store_n(&tp->nb_dequeued, 0, __ATOMIC_RELAXED);
> 
> -	while (rte_atomic16_read(&tp->op_params->sync) == SYNC_WAIT)
> -		rte_pause();
> +	rte_wait_until_equal_16(&tp->op_params->sync, SYNC_START,
> +__ATOMIC_RELAXED);
> 
>  	ret = rte_bbdev_dec_op_alloc_bulk(tp->op_params->mp, ops,
>  				num_to_process);
> @@ -2880,17 +2876,15 @@ throughput_intr_lcore_dec(void *arg)
>  			 * the number of operations is not a multiple of
>  			 * burst size.
>  			 */
> -			rte_atomic16_set(&tp->burst_sz, num_to_enq);
> +			__atomic_store_n(&tp->burst_sz, num_to_enq,
> __ATOMIC_RELAXED);
> 
>  			/* Wait until processing of previous batch is
>  			 * completed
>  			 */
> -			while (rte_atomic16_read(&tp->nb_dequeued) !=
> -					(int16_t) enqueued)
> -				rte_pause();
> +			rte_wait_until_equal_16(&tp->nb_dequeued,
> enqueued,
> +__ATOMIC_RELAXED);
>  		}
>  		if (j != TEST_REPETITIONS - 1)
> -			rte_atomic16_clear(&tp->nb_dequeued);
> +			__atomic_store_n(&tp->nb_dequeued, 0,
> __ATOMIC_RELAXED);
>  	}
> 
>  	return TEST_SUCCESS;
> @@ -2925,11 +2919,10 @@ throughput_intr_lcore_enc(void *arg)
> 
>  	bufs = &tp->op_params-
> >q_bufs[GET_SOCKET(info.socket_id)][queue_id];
> 
> -	rte_atomic16_clear(&tp->processing_status);
> -	rte_atomic16_clear(&tp->nb_dequeued);
> +	__atomic_store_n(&tp->processing_status, 0, __ATOMIC_RELAXED);
> +	__atomic_store_n(&tp->nb_dequeued, 0, __ATOMIC_RELAXED);
> 
> -	while (rte_atomic16_read(&tp->op_params->sync) == SYNC_WAIT)
> -		rte_pause();
> +	rte_wait_until_equal_16(&tp->op_params->sync, SYNC_START,
> +__ATOMIC_RELAXED);
> 
>  	ret = rte_bbdev_enc_op_alloc_bulk(tp->op_params->mp, ops,
>  			num_to_process);
> @@ -2969,17 +2962,15 @@ throughput_intr_lcore_enc(void *arg)
>  			 * the number of operations is not a multiple of
>  			 * burst size.
>  			 */
> -			rte_atomic16_set(&tp->burst_sz, num_to_enq);
> +			__atomic_store_n(&tp->burst_sz, num_to_enq,
> __ATOMIC_RELAXED);
> 
>  			/* Wait until processing of previous batch is
>  			 * completed
>  			 */
> -			while (rte_atomic16_read(&tp->nb_dequeued) !=
> -					(int16_t) enqueued)
> -				rte_pause();
> +			rte_wait_until_equal_16(&tp->nb_dequeued,
> enqueued,
> +__ATOMIC_RELAXED);
>  		}
>  		if (j != TEST_REPETITIONS - 1)
> -			rte_atomic16_clear(&tp->nb_dequeued);
> +			__atomic_store_n(&tp->nb_dequeued, 0,
> __ATOMIC_RELAXED);
>  	}
> 
>  	return TEST_SUCCESS;
> @@ -3015,11 +3006,10 @@ throughput_intr_lcore_ldpc_enc(void *arg)
> 
>  	bufs = &tp->op_params-
> >q_bufs[GET_SOCKET(info.socket_id)][queue_id];
> 
> -	rte_atomic16_clear(&tp->processing_status);
> -	rte_atomic16_clear(&tp->nb_dequeued);
> +	__atomic_store_n(&tp->processing_status, 0, __ATOMIC_RELAXED);
> +	__atomic_store_n(&tp->nb_dequeued, 0, __ATOMIC_RELAXED);
> 
> -	while (rte_atomic16_read(&tp->op_params->sync) == SYNC_WAIT)
> -		rte_pause();
> +	rte_wait_until_equal_16(&tp->op_params->sync, SYNC_START,
> +__ATOMIC_RELAXED);
> 
>  	ret = rte_bbdev_enc_op_alloc_bulk(tp->op_params->mp, ops,
>  			num_to_process);
> @@ -3061,17 +3051,15 @@ throughput_intr_lcore_ldpc_enc(void *arg)
>  			 * the number of operations is not a multiple of
>  			 * burst size.
>  			 */
> -			rte_atomic16_set(&tp->burst_sz, num_to_enq);
> +			__atomic_store_n(&tp->burst_sz, num_to_enq,
> __ATOMIC_RELAXED);
> 
>  			/* Wait until processing of previous batch is
>  			 * completed
>  			 */
> -			while (rte_atomic16_read(&tp->nb_dequeued) !=
> -					(int16_t) enqueued)
> -				rte_pause();
> +			rte_wait_until_equal_16(&tp->nb_dequeued,
> enqueued,
> +__ATOMIC_RELAXED);
>  		}
>  		if (j != TEST_REPETITIONS - 1)
> -			rte_atomic16_clear(&tp->nb_dequeued);
> +			__atomic_store_n(&tp->nb_dequeued, 0,
> __ATOMIC_RELAXED);
>  	}
> 
>  	return TEST_SUCCESS;
> @@ -3105,8 +3093,7 @@ throughput_pmd_lcore_dec(void *arg)
> 
>  	bufs = &tp->op_params-
> >q_bufs[GET_SOCKET(info.socket_id)][queue_id];
> 
> -	while (rte_atomic16_read(&tp->op_params->sync) == SYNC_WAIT)
> -		rte_pause();
> +	rte_wait_until_equal_16(&tp->op_params->sync, SYNC_START,
> +__ATOMIC_RELAXED);
> 
>  	ret = rte_bbdev_dec_op_alloc_bulk(tp->op_params->mp, ops_enq,
> num_ops);
>  	TEST_ASSERT_SUCCESS(ret, "Allocation failed for %d ops", num_ops);
> @@ -3209,8 +3196,7 @@ bler_pmd_lcore_ldpc_dec(void *arg)
> 
>  	bufs = &tp->op_params-
> >q_bufs[GET_SOCKET(info.socket_id)][queue_id];
> 
> -	while (rte_atomic16_read(&tp->op_params->sync) == SYNC_WAIT)
> -		rte_pause();
> +	rte_wait_until_equal_16(&tp->op_params->sync, SYNC_START,
> +__ATOMIC_RELAXED);
> 
>  	ret = rte_bbdev_dec_op_alloc_bulk(tp->op_params->mp, ops_enq,
> num_ops);
>  	TEST_ASSERT_SUCCESS(ret, "Allocation failed for %d ops", num_ops);
> @@ -3339,8 +3325,7 @@ throughput_pmd_lcore_ldpc_dec(void *arg)
> 
>  	bufs = &tp->op_params-
> >q_bufs[GET_SOCKET(info.socket_id)][queue_id];
> 
> -	while (rte_atomic16_read(&tp->op_params->sync) == SYNC_WAIT)
> -		rte_pause();
> +	rte_wait_until_equal_16(&tp->op_params->sync, SYNC_START,
> +__ATOMIC_RELAXED);
> 
>  	ret = rte_bbdev_dec_op_alloc_bulk(tp->op_params->mp, ops_enq,
> num_ops);
>  	TEST_ASSERT_SUCCESS(ret, "Allocation failed for %d ops", num_ops);
> @@ -3456,8 +3441,7 @@ throughput_pmd_lcore_enc(void *arg)
> 
>  	bufs = &tp->op_params-
> >q_bufs[GET_SOCKET(info.socket_id)][queue_id];
> 
> -	while (rte_atomic16_read(&tp->op_params->sync) == SYNC_WAIT)
> -		rte_pause();
> +	rte_wait_until_equal_16(&tp->op_params->sync, SYNC_START,
> +__ATOMIC_RELAXED);
> 
>  	ret = rte_bbdev_enc_op_alloc_bulk(tp->op_params->mp, ops_enq,
>  			num_ops);
> @@ -3547,8 +3531,7 @@ throughput_pmd_lcore_ldpc_enc(void *arg)
> 
>  	bufs = &tp->op_params-
> >q_bufs[GET_SOCKET(info.socket_id)][queue_id];
> 
> -	while (rte_atomic16_read(&tp->op_params->sync) == SYNC_WAIT)
> -		rte_pause();
> +	rte_wait_until_equal_16(&tp->op_params->sync, SYNC_START,
> +__ATOMIC_RELAXED);
> 
>  	ret = rte_bbdev_enc_op_alloc_bulk(tp->op_params->mp, ops_enq,
>  			num_ops);
> @@ -3731,7 +3714,7 @@ bler_test(struct active_device *ad,
>  	else
>  		return TEST_SKIPPED;
> 
> -	rte_atomic16_set(&op_params->sync, SYNC_WAIT);
> +	__atomic_store_n(&op_params->sync, SYNC_WAIT,
> __ATOMIC_RELAXED);
> 
>  	/* Main core is set at first entry */
>  	t_params[0].dev_id = ad->dev_id;
> @@ -3754,7 +3737,7 @@ bler_test(struct active_device *ad,
>  				&t_params[used_cores++], lcore_id);
>  	}
> 
> -	rte_atomic16_set(&op_params->sync, SYNC_START);
> +	__atomic_store_n(&op_params->sync, SYNC_START,
> __ATOMIC_RELAXED);
>  	ret = bler_function(&t_params[0]);
> 
>  	/* Main core is always used */
> @@ -3849,7 +3832,7 @@ throughput_test(struct active_device *ad,
>  			throughput_function = throughput_pmd_lcore_enc;
>  	}
> 
> -	rte_atomic16_set(&op_params->sync, SYNC_WAIT);
> +	__atomic_store_n(&op_params->sync, SYNC_WAIT,
> __ATOMIC_RELAXED);
> 
>  	/* Main core is set at first entry */
>  	t_params[0].dev_id = ad->dev_id;
> @@ -3872,7 +3855,7 @@ throughput_test(struct active_device *ad,
>  				&t_params[used_cores++], lcore_id);
>  	}
> 
> -	rte_atomic16_set(&op_params->sync, SYNC_START);
> +	__atomic_store_n(&op_params->sync, SYNC_START,
> __ATOMIC_RELAXED);
>  	ret = throughput_function(&t_params[0]);
> 
>  	/* Main core is always used */
> @@ -3902,29 +3885,29 @@ throughput_test(struct active_device *ad,
>  	 * Wait for main lcore operations.
>  	 */
>  	tp = &t_params[0];
> -	while ((rte_atomic16_read(&tp->nb_dequeued) <
> -			op_params->num_to_process) &&
> -			(rte_atomic16_read(&tp->processing_status) !=
> -			TEST_FAILED))
> +	while ((__atomic_load_n(&tp->nb_dequeued, __ATOMIC_RELAXED) <
> +		op_params->num_to_process) &&
> +		(__atomic_load_n(&tp->processing_status,
> __ATOMIC_RELAXED) !=
> +		TEST_FAILED))
>  		rte_pause();
> 
>  	tp->ops_per_sec /= TEST_REPETITIONS;
>  	tp->mbps /= TEST_REPETITIONS;
> -	ret |= (int)rte_atomic16_read(&tp->processing_status);
> +	ret |= (int)__atomic_load_n(&tp->processing_status,
> __ATOMIC_RELAXED);
> 
>  	/* Wait for worker lcores operations */
>  	for (used_cores = 1; used_cores < num_lcores; used_cores++) {
>  		tp = &t_params[used_cores];
> 
> -		while ((rte_atomic16_read(&tp->nb_dequeued) <
> -				op_params->num_to_process) &&
> -				(rte_atomic16_read(&tp->processing_status)
> !=
> -				TEST_FAILED))
> +		while ((__atomic_load_n(&tp->nb_dequeued,
> __ATOMIC_RELAXED) <
> +			op_params->num_to_process) &&
> +			(__atomic_load_n(&tp->processing_status,
> __ATOMIC_RELAXED) !=
> +			TEST_FAILED))
>  			rte_pause();
> 
>  		tp->ops_per_sec /= TEST_REPETITIONS;
>  		tp->mbps /= TEST_REPETITIONS;
> -		ret |= (int)rte_atomic16_read(&tp->processing_status);
> +		ret |= (int)__atomic_load_n(&tp->processing_status,
> +__ATOMIC_RELAXED);
>  	}
> 
>  	/* Print throughput if test passed */
> --
> 2.17.1


^ permalink raw reply	[flat|nested] 28+ messages in thread

* RE: [PATCH v1 07/12] app/eventdev: use compiler atomic builtins for packets sync
  2021-08-02 10:18 ` [dpdk-dev] [PATCH v1 07/12] app/eventdev: use compiler atomic builtins for packets sync Joyce Kong
@ 2021-11-10 23:19   ` Honnappa Nagarahalli
  2021-11-11  7:27     ` Joyce Kong
  0 siblings, 1 reply; 28+ messages in thread
From: Honnappa Nagarahalli @ 2021-11-10 23:19 UTC (permalink / raw)
  To: Joyce Kong, thomas, david.marchand, Ruifeng Wang,
	konstantin.ananyev, rsanford, erik.g.carrillo, olivier.matz,
	yipeng1.wang, sameh.gobriel, bruce.richardson,
	vladimir.medvedkin, anatoly.burakov, andrew.rybchenko, jerinj,
	declan.doherty, ciara.power, xiaoyun.li, nicolas.chautru,
	maryam.tahhan, reshma.pattan, cristian.dumitrescu
  Cc: dev, nd, nd

<snip>

> Subject: [PATCH v1 07/12] app/eventdev: use compiler atomic builtins for
> packets sync
How about the following:
app/eventdev: use compiler atomic builtins for shared data synchronization

> 
> Convert rte_atomic usages to compiler atomic built-ins for outstanding_pkts
> sync in eventdev cases.
> 
> Signed-off-by: Joyce Kong <joyce.kong@arm.com>
> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
> ---
>  app/test-eventdev/evt_main.c          | 1 -
>  app/test-eventdev/test_order_atq.c    | 4 ++--
>  app/test-eventdev/test_order_common.c | 4 ++--  app/test-
> eventdev/test_order_common.h | 8 ++++----  app/test-
> eventdev/test_order_queue.c  | 4 ++--
>  5 files changed, 10 insertions(+), 11 deletions(-)
> 

<snip>

> diff --git a/app/test-eventdev/test_order_common.h b/app/test-
> eventdev/test_order_common.h
> index cd9d6009ec..1507265928 100644
> --- a/app/test-eventdev/test_order_common.h
> +++ b/app/test-eventdev/test_order_common.h
> @@ -48,7 +48,7 @@ struct test_order {
>  	 * The atomic_* is an expensive operation,Since it is a functional test,
>  	 * We are using the atomic_ operation to reduce the code complexity.
>  	 */
> -	rte_atomic64_t outstand_pkts;
> +	uint64_t outstand_pkts;
>  	enum evt_test_result result;
>  	uint32_t nb_flows;
>  	uint64_t nb_pkts;
> @@ -95,7 +95,7 @@ static __rte_always_inline void
> order_process_stage_1(struct test_order *const t,
>  		struct rte_event *const ev, const uint32_t nb_flows,
>  		uint32_t *const expected_flow_seq,
> -		rte_atomic64_t *const outstand_pkts)
> +		uint64_t *const outstand_pkts)
>  {
>  	const uint32_t flow = (uintptr_t)ev->mbuf % nb_flows;
>  	/* compare the seqn against expected value */ @@ -113,7 +113,7
> @@ order_process_stage_1(struct test_order *const t,
>  	 */
>  	expected_flow_seq[flow]++;
>  	rte_pktmbuf_free(ev->mbuf);
> -	rte_atomic64_sub(outstand_pkts, 1);
> +	__atomic_fetch_sub(outstand_pkts, 1, __ATOMIC_RELAXED);
>  }
> 
>  static __rte_always_inline void
> @@ -132,7 +132,7 @@ order_process_stage_invalid(struct test_order *const
> t,
>  	const uint8_t port = w->port_id;\
>  	const uint32_t nb_flows = t->nb_flows;\
>  	uint32_t *expected_flow_seq = t->expected_flow_seq;\
> -	rte_atomic64_t *outstand_pkts = &t->outstand_pkts;\
> +	uint64_t *outstand_pkts = &t->outstand_pkts;\
We could use the atomic built-in to set this? We have been doing that in other places.

>  	if (opt->verbose_level > 1)\
>  		printf("%s(): lcore %d dev_id %d port=%d\n",\
>  			__func__, rte_lcore_id(), dev_id, port) diff --git

<snip>

^ permalink raw reply	[flat|nested] 28+ messages in thread

* RE: [PATCH v1 07/12] app/eventdev: use compiler atomic builtins for packets sync
  2021-11-10 23:19   ` Honnappa Nagarahalli
@ 2021-11-11  7:27     ` Joyce Kong
  0 siblings, 0 replies; 28+ messages in thread
From: Joyce Kong @ 2021-11-11  7:27 UTC (permalink / raw)
  To: Honnappa Nagarahalli, thomas, david.marchand, Ruifeng Wang,
	konstantin.ananyev, rsanford, erik.g.carrillo, olivier.matz,
	yipeng1.wang, sameh.gobriel, bruce.richardson,
	vladimir.medvedkin, anatoly.burakov, andrew.rybchenko, jerinj,
	declan.doherty, ciara.power, xiaoyun.li, nicolas.chautru,
	maryam.tahhan, reshma.pattan, cristian.dumitrescu
  Cc: dev, nd, nd

> <snip>
> 
> > Subject: [PATCH v1 07/12] app/eventdev: use compiler atomic builtins
> > for packets sync
> How about the following:
> app/eventdev: use compiler atomic builtins for shared data synchronization
> 

Yes, please see next version.

> >
> > Convert rte_atomic usages to compiler atomic built-ins for
> > outstanding_pkts sync in eventdev cases.
> >
> 
> <snip>
> 
> > diff --git a/app/test-eventdev/test_order_common.h b/app/test-
> > eventdev/test_order_common.h index cd9d6009ec..1507265928 100644
> > --- a/app/test-eventdev/test_order_common.h
> > +++ b/app/test-eventdev/test_order_common.h
> > @@ -48,7 +48,7 @@ struct test_order {
> >  	 * The atomic_* is an expensive operation,Since it is a functional test,
> >  	 * We are using the atomic_ operation to reduce the code complexity.
> >  	 */
> > -	rte_atomic64_t outstand_pkts;
> > +	uint64_t outstand_pkts;
> >  	enum evt_test_result result;
> >  	uint32_t nb_flows;
> >  	uint64_t nb_pkts;
> > @@ -95,7 +95,7 @@ static __rte_always_inline void
> > order_process_stage_1(struct test_order *const t,
> >  		struct rte_event *const ev, const uint32_t nb_flows,
> >  		uint32_t *const expected_flow_seq,
> > -		rte_atomic64_t *const outstand_pkts)
> > +		uint64_t *const outstand_pkts)
> >  {
> >  	const uint32_t flow = (uintptr_t)ev->mbuf % nb_flows;
> >  	/* compare the seqn against expected value */ @@ -113,7 +113,7
> @@
> > order_process_stage_1(struct test_order *const t,
> >  	 */
> >  	expected_flow_seq[flow]++;
> >  	rte_pktmbuf_free(ev->mbuf);
> > -	rte_atomic64_sub(outstand_pkts, 1);
> > +	__atomic_fetch_sub(outstand_pkts, 1, __ATOMIC_RELAXED);
> >  }
> >
> >  static __rte_always_inline void
> > @@ -132,7 +132,7 @@ order_process_stage_invalid(struct test_order
> > *const t,
> >  	const uint8_t port = w->port_id;\
> >  	const uint32_t nb_flows = t->nb_flows;\
> >  	uint32_t *expected_flow_seq = t->expected_flow_seq;\
> > -	rte_atomic64_t *outstand_pkts = &t->outstand_pkts;\
> > +	uint64_t *outstand_pkts = &t->outstand_pkts;\
> We could use the atomic built-in to set this? We have been doing that in
> other places.

Here is an address operation, so atomic built-in may not be used. 

> 
> >  	if (opt->verbose_level > 1)\
> >  		printf("%s(): lcore %d dev_id %d port=%d\n",\
> >  			__func__, rte_lcore_id(), dev_id, port) diff --git
> 
> <snip>

^ permalink raw reply	[flat|nested] 28+ messages in thread

* RE: [PATCH v1 09/12] app/compress: use compiler atomic builtins for display sync
  2021-11-09 22:59   ` Honnappa Nagarahalli
@ 2021-11-11  8:13     ` Joyce Kong
  0 siblings, 0 replies; 28+ messages in thread
From: Joyce Kong @ 2021-11-11  8:13 UTC (permalink / raw)
  To: Honnappa Nagarahalli, thomas, david.marchand, Ruifeng Wang,
	konstantin.ananyev, rsanford, erik.g.carrillo, olivier.matz,
	yipeng1.wang, sameh.gobriel, bruce.richardson,
	vladimir.medvedkin, anatoly.burakov, andrew.rybchenko, jerinj,
	declan.doherty, ciara.power, xiaoyun.li, nicolas.chautru,
	maryam.tahhan, reshma.pattan, cristian.dumitrescu
  Cc: dev, nd

> <snip>
> 
> >
> > +	exp = 0;
> >  	/* R E P O R T processing */
> > -	if (rte_atomic16_test_and_set(&display_once)) {
> > +	if (__atomic_compare_exchange_n(&display_once, &exp, 1, 0,
> > +			__ATOMIC_RELAXED, __ATOMIC_RELAXED)) {
> I think we can re-organize the current code which will avoid using atomic
> operations on 'display_once'. How about the following code?
> 
> rte_spinlock_lock(&print_spinlock);
> if (display_once != 1) {
> 	display_once = 1;
> 	printf("\nLegend for the table\n".....
> 	<all other printfs>
> }
> printf....
> printf....
> printf....
> rte_spinlock_unlock(&print_spinlock);
> 
> Rest of the changes look fine.
> 

Will send out the next version with the changes.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* RE: [PATCH v1 10/12] app/testpmd: use compiler atomic builtins for port sync
  2021-11-09 23:14   ` Honnappa Nagarahalli
@ 2021-11-11  8:51     ` Joyce Kong
  0 siblings, 0 replies; 28+ messages in thread
From: Joyce Kong @ 2021-11-11  8:51 UTC (permalink / raw)
  To: Honnappa Nagarahalli, thomas, david.marchand, Ruifeng Wang,
	konstantin.ananyev, rsanford, erik.g.carrillo, olivier.matz,
	yipeng1.wang, sameh.gobriel, bruce.richardson,
	vladimir.medvedkin, anatoly.burakov, andrew.rybchenko, jerinj,
	declan.doherty, ciara.power, xiaoyun.li, nicolas.chautru,
	maryam.tahhan, reshma.pattan, cristian.dumitrescu, Yigit, Ferruh
  Cc: dev, nd, nd

> + Ferruh
> 
> <snip>
> 
> Hi Joyce/Ferruh,
>      I do not think the port_status changes need to be handled atomically. The
> changes to port_status do not seem to be happening from multiple threads.
> It seems to be getting modified during initialization or through the test pmd
> prompt.
> 
> Do we really need atomic operations on port_status?
> 

Hi Honnappa,
Seem you are right, I shall remove the atomic operations for port_status.
Joyce

> >
> > Replace rte_atomic16_cmpset operation with compiler atomic CAS
> > operation for port status sync in testpmd case.
> >
> > Signed-off-by: Joyce Kong <joyce.kong@arm.com>
> > Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
> > ---
> >  app/test-pmd/testpmd.c | 75
> > +++++++++++++++++++++++++++---------------
> >  1 file changed, 48 insertions(+), 27 deletions(-)
> >


^ permalink raw reply	[flat|nested] 28+ messages in thread

end of thread, other threads:[~2021-11-11  8:51 UTC | newest]

Thread overview: 28+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-08-02 10:18 [dpdk-dev] [PATCH v1 00/12] use compiler atomic builtins for app Joyce Kong
2021-08-02 10:18 ` [dpdk-dev] [PATCH v1 01/12] test/pmd_perf: use compiler atomic builtins for polling sync Joyce Kong
2021-11-08 22:50   ` Honnappa Nagarahalli
2021-11-10  6:10     ` Joyce Kong
2021-08-02 10:18 ` [dpdk-dev] [PATCH v1 02/12] test/ring_perf: use compiler atomic builtins for lcores sync Joyce Kong
2021-11-09  5:43   ` Honnappa Nagarahalli
2021-08-02 10:18 ` [dpdk-dev] [PATCH v1 03/12] test/timer: use compiler atomic builtins for sync Joyce Kong
2021-11-09 20:59   ` Honnappa Nagarahalli
2021-08-02 10:18 ` [dpdk-dev] [PATCH v1 04/12] test/stack_perf: use compiler atomics for lcore sync Joyce Kong
2021-11-09 21:12   ` Honnappa Nagarahalli
2021-08-02 10:18 ` [dpdk-dev] [PATCH v1 05/12] test/bpf: use compiler atomics for calculation Joyce Kong
2021-08-02 10:18 ` [dpdk-dev] [PATCH v1 06/12] test/func_reentrancy: use compiler atomic for data sync Joyce Kong
2021-11-09 21:54   ` Honnappa Nagarahalli
2021-08-02 10:18 ` [dpdk-dev] [PATCH v1 07/12] app/eventdev: use compiler atomic builtins for packets sync Joyce Kong
2021-11-10 23:19   ` Honnappa Nagarahalli
2021-11-11  7:27     ` Joyce Kong
2021-08-02 10:18 ` [dpdk-dev] [PATCH v1 08/12] app/crypto: use compiler atomic builtins for display sync Joyce Kong
2021-11-09 22:11   ` Honnappa Nagarahalli
2021-08-02 10:18 ` [dpdk-dev] [PATCH v1 09/12] app/compress: " Joyce Kong
2021-11-09 22:59   ` Honnappa Nagarahalli
2021-11-11  8:13     ` Joyce Kong
2021-08-02 10:18 ` [dpdk-dev] [PATCH v1 10/12] app/testpmd: use compiler atomic builtins for port sync Joyce Kong
2021-11-09 23:14   ` Honnappa Nagarahalli
2021-11-11  8:51     ` Joyce Kong
2021-08-02 10:18 ` [dpdk-dev] [PATCH v1 11/12] app/bbdev: use compiler atomics for thread sync Joyce Kong
2021-11-10 21:25   ` Honnappa Nagarahalli
2021-08-02 10:18 ` [dpdk-dev] [PATCH v1 12/12] app: remove unnecessary include of atomic Joyce Kong
2021-10-21  6:35 ` [dpdk-dev] [PATCH v1 00/12] use compiler atomic builtins for app Joyce Kong

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).