DPDK patches and discussions
* [PATCH v2 00/12] use compiler atomic builtins for app modules
@ 2021-11-16  9:41 Joyce Kong
  2021-11-16  9:41 ` [PATCH v2 01/12] test/pmd_perf: use compiler atomic builtins for polling sync Joyce Kong
                   ` (12 more replies)
  0 siblings, 13 replies; 36+ messages in thread
From: Joyce Kong @ 2021-11-16  9:41 UTC
  Cc: dev, honnappa.nagarahalli, nd, Joyce Kong

Since DPDK has adopted the C11 memory model [1], replace the
rte_atomicNN_xxx APIs with the compiler's atomic built-ins
in the app modules [2].

[1] https://www.dpdk.org/blog/2021/03/26/dpdk-adopts-the-c11-memory-model/
[2] https://doc.dpdk.org/guides/rel_notes/deprecation.html
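
For readers new to the conversion, the replacements in this series mostly
follow one pattern: the rte_atomicNN_t wrapper becomes a plain integer and
each rte_atomicNN_xxx call becomes a GCC/Clang __atomic built-in with an
explicit memory order. A minimal sketch of that pattern follows; the
counter variable and the counter_demo() helper are hypothetical names used
only for illustration, and the "was:" comments show the old API for
comparison.

```c
#include <assert.h>
#include <stdint.h>

/* A plain integer replaces the rte_atomic32_t wrapper type. */
static uint32_t counter;

/* Hypothetical helper: applies each replacement once and returns the result. */
static uint32_t
counter_demo(void)
{
	/* The built-ins are lock-free for a 4-byte object on common targets. */
	assert(__atomic_always_lock_free(sizeof(counter), 0));

	/* was: rte_atomic32_set(&counter, 0) */
	__atomic_store_n(&counter, 0, __ATOMIC_RELAXED);
	/* was: rte_atomic32_inc(&counter) */
	__atomic_fetch_add(&counter, 1, __ATOMIC_RELAXED);
	/* was: rte_atomic32_add(&counter, 4) */
	__atomic_fetch_add(&counter, 4, __ATOMIC_RELAXED);
	/* was: rte_atomic32_sub(&counter, 2) */
	__atomic_fetch_sub(&counter, 2, __ATOMIC_RELAXED);
	/* was: rte_atomic32_read(&counter) */
	return __atomic_load_n(&counter, __ATOMIC_RELAXED);
}
```

RELAXED is enough here because only the counter itself is shared; where
other data is published through the variable, a stronger order is needed.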

v2:
  By Honnappa Nagarahalli:
  1. Replace the RELAXED memory orderings with suitable ones for
     shared data sync in the pmd_perf and timer test cases.
  2. Avoid unnecessary atomic operations in the compress and testpmd
     modules.
  3. Fix some typos.

Joyce Kong (12):
  test/pmd_perf: use compiler atomic builtins for polling sync
  test/ring_perf: use compiler atomic builtins for lcores sync
  test/timer: use compiler atomic builtins for sync
  test/stack_perf: use compiler atomics for lcore sync
  test/bpf: use compiler atomics for calculation
  test/func_reentrancy: use compiler atomics for data sync
  app/eventdev: use compiler atomics for shared data sync
  app/crypto: use compiler atomic builtins for display sync
  app/compress: use compiler atomic builtins for display sync
  app/testpmd: remove atomic operations for port status
  app/bbdev: use compiler atomics for shared data sync
  app: remove unnecessary include of atomic header file

 app/proc-info/main.c                          |   1 -
 app/test-bbdev/test_bbdev_perf.c              | 135 ++++++++----------
 .../comp_perf_test_common.h                   |   2 +-
 .../comp_perf_test_cyclecount.c               |  15 +-
 .../comp_perf_test_throughput.c               |  10 +-
 .../comp_perf_test_verify.c                   |   6 +-
 app/test-crypto-perf/cperf_test_latency.c     |   6 +-
 .../cperf_test_pmd_cyclecount.c               |   9 +-
 app/test-crypto-perf/cperf_test_throughput.c  |   9 +-
 app/test-crypto-perf/cperf_test_verify.c      |   9 +-
 app/test-eventdev/evt_main.c                  |   1 -
 app/test-eventdev/test_order_atq.c            |   4 +-
 app/test-eventdev/test_order_common.c         |   4 +-
 app/test-eventdev/test_order_common.h         |   8 +-
 app/test-eventdev/test_order_queue.c          |   4 +-
 app/test-pipeline/config.c                    |   1 -
 app/test-pipeline/init.c                      |   1 -
 app/test-pipeline/main.c                      |   1 -
 app/test-pipeline/runtime.c                   |   1 -
 app/test-pmd/cmdline.c                        |   1 -
 app/test-pmd/config.c                         |   1 -
 app/test-pmd/csumonly.c                       |   1 -
 app/test-pmd/flowgen.c                        |   1 -
 app/test-pmd/icmpecho.c                       |   1 -
 app/test-pmd/iofwd.c                          |   1 -
 app/test-pmd/macfwd.c                         |   1 -
 app/test-pmd/macswap.c                        |   1 -
 app/test-pmd/parameters.c                     |   1 -
 app/test-pmd/rxonly.c                         |   1 -
 app/test-pmd/testpmd.c                        |  58 ++++----
 app/test-pmd/txonly.c                         |   1 -
 app/test/test_barrier.c                       |   1 -
 app/test/test_bpf.c                           |  28 ++--
 app/test/test_func_reentrancy.c               |  27 ++--
 app/test/test_mbuf.c                          |   1 -
 app/test/test_mp_secondary.c                  |   1 -
 app/test/test_pmd_perf.c                      |  14 +-
 app/test/test_ring.c                          |   1 -
 app/test/test_ring_perf.c                     |   9 +-
 app/test/test_stack_perf.c                    |  14 +-
 app/test/test_timer.c                         |  30 ++--
 app/test/test_timer_secondary.c               |   1 -
 42 files changed, 197 insertions(+), 226 deletions(-)

-- 
2.25.1



* [PATCH v2 01/12] test/pmd_perf: use compiler atomic builtins for polling sync
  2021-11-16  9:41 [PATCH v2 00/12] use compiler atomic builtins for app modules Joyce Kong
@ 2021-11-16  9:41 ` Joyce Kong
  2021-11-16 21:30   ` Honnappa Nagarahalli
  2021-11-16  9:41 ` [PATCH v2 02/12] test/ring_perf: use compiler atomic builtins for lcores sync Joyce Kong
                   ` (11 subsequent siblings)
  12 siblings, 1 reply; 36+ messages in thread
From: Joyce Kong @ 2021-11-16  9:41 UTC
  Cc: dev, honnappa.nagarahalli, nd, Joyce Kong, Ruifeng Wang

Convert rte_atomic usages to compiler atomic built-ins
for polling sync in pmd_perf test cases.
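
The polling sync relies on a release store pairing with an acquire wait, so
that everything written before the flag is set is visible once the poller
sees it. A minimal sketch of that pairing, assuming POSIX threads in place
of lcores: wait_until_equal_64() is a hypothetical stand-in for
rte_wait_until_equal_64(), and payload, poller() and run_handshake() are
illustrative names that do not appear in the patch.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

static uint64_t start;   /* flag polled by the worker */
static uint64_t payload; /* plain data published before the flag is set */

/* Hypothetical stand-in for rte_wait_until_equal_64(): spin until equal. */
static void
wait_until_equal_64(uint64_t *addr, uint64_t expected, int memorder)
{
	while (__atomic_load_n(addr, memorder) != expected)
		;
}

static void *
poller(void *arg)
{
	uint64_t *seen = arg;

	/* The acquire load pairs with the release store in run_handshake(). */
	wait_until_equal_64(&start, 1, __ATOMIC_ACQUIRE);
	/* The pairing guarantees the earlier plain write is visible here. */
	assert(payload == 42);
	*seen = payload;
	return NULL;
}

/* Returns the value the poller observed, or 0 if the thread could not start. */
static uint64_t
run_handshake(void)
{
	pthread_t tid;
	uint64_t seen = 0;

	/* Reset the flag before the poller exists, so RELAXED is sufficient. */
	__atomic_store_n(&start, 0, __ATOMIC_RELAXED);
	if (pthread_create(&tid, NULL, poller, &seen) != 0)
		return 0;

	payload = 42;
	/* Release: publishes payload together with the flag. */
	__atomic_store_n(&start, 1, __ATOMIC_RELEASE);

	pthread_join(tid, NULL);
	return seen;
}
```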

Signed-off-by: Joyce Kong <joyce.kong@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
---
 app/test/test_pmd_perf.c | 14 ++++++--------
 1 file changed, 6 insertions(+), 8 deletions(-)

diff --git a/app/test/test_pmd_perf.c b/app/test/test_pmd_perf.c
index 1df86ce080..546384a50d 100644
--- a/app/test/test_pmd_perf.c
+++ b/app/test/test_pmd_perf.c
@@ -10,7 +10,6 @@
 #include <rte_cycles.h>
 #include <rte_ethdev.h>
 #include <rte_byteorder.h>
-#include <rte_atomic.h>
 #include <rte_malloc.h>
 #include "packet_burst_generator.h"
 #include "test.h"
@@ -525,7 +524,7 @@ main_loop(__rte_unused void *args)
 	return 0;
 }
 
-static rte_atomic64_t start;
+static uint64_t start;
 
 static inline int
 poll_burst(void *args)
@@ -563,8 +562,7 @@ poll_burst(void *args)
 		num[portid] = pkt_per_port;
 	}
 
-	while (!rte_atomic64_read(&start))
-		;
+	rte_wait_until_equal_64(&start, 1, __ATOMIC_ACQUIRE);
 
 	cur_tsc = rte_rdtsc();
 	while (total) {
@@ -616,15 +614,15 @@ exec_burst(uint32_t flags, int lcore)
 	pkt_per_port = MAX_TRAFFIC_BURST;
 	num = pkt_per_port * conf->nb_ports;
 
-	rte_atomic64_init(&start);
-
 	/* start polling thread, but not actually poll yet */
 	rte_eal_remote_launch(poll_burst,
 			      (void *)&pkt_per_port, lcore);
 
 	/* Only when polling first */
 	if (flags == SC_BURST_POLL_FIRST)
-		rte_atomic64_set(&start, 1);
+		__atomic_store_n(&start, 1, __ATOMIC_RELAXED);
+	else
+		__atomic_store_n(&start, 0, __ATOMIC_RELAXED);
 
 	/* start xmit */
 	i = 0;
@@ -641,7 +639,7 @@ exec_burst(uint32_t flags, int lcore)
 
 	/* only when polling second  */
 	if (flags == SC_BURST_XMIT_FIRST)
-		rte_atomic64_set(&start, 1);
+		__atomic_store_n(&start, 1, __ATOMIC_RELEASE);
 
 	/* wait for polling finished */
 	diff_tsc = rte_eal_wait_lcore(lcore);
-- 
2.25.1



* [PATCH v2 02/12] test/ring_perf: use compiler atomic builtins for lcores sync
  2021-11-16  9:41 [PATCH v2 00/12] use compiler atomic builtins for app modules Joyce Kong
  2021-11-16  9:41 ` [PATCH v2 01/12] test/pmd_perf: use compiler atomic builtins for polling sync Joyce Kong
@ 2021-11-16  9:41 ` Joyce Kong
  2021-11-16  9:41 ` [PATCH v2 03/12] test/timer: use compiler atomic builtins for sync Joyce Kong
                   ` (10 subsequent siblings)
  12 siblings, 0 replies; 36+ messages in thread
From: Joyce Kong @ 2021-11-16  9:41 UTC
  To: Honnappa Nagarahalli, Konstantin Ananyev
  Cc: dev, nd, Joyce Kong, Ruifeng Wang

Convert rte_atomic usages to compiler atomic built-ins
for lcores sync in ring_perf test cases.

Signed-off-by: Joyce Kong <joyce.kong@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
---
 app/test/test_ring_perf.c | 9 ++++-----
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/app/test/test_ring_perf.c b/app/test/test_ring_perf.c
index fd82e20412..2d8bb675a3 100644
--- a/app/test/test_ring_perf.c
+++ b/app/test/test_ring_perf.c
@@ -320,7 +320,7 @@ run_on_core_pair(struct lcore_pair *cores, struct rte_ring *r, const int esize)
 	return 0;
 }
 
-static rte_atomic32_t synchro;
+static uint32_t synchro;
 static uint64_t queue_count[RTE_MAX_LCORE];
 
 #define TIME_MS 100
@@ -342,8 +342,7 @@ load_loop_fn_helper(struct thread_params *p, const int esize)
 
 	/* wait synchro for workers */
 	if (lcore != rte_get_main_lcore())
-		while (rte_atomic32_read(&synchro) == 0)
-			rte_pause();
+		rte_wait_until_equal_32(&synchro, 1, __ATOMIC_RELAXED);
 
 	begin = rte_get_timer_cycles();
 	while (time_diff < hz * TIME_MS / 1000) {
@@ -398,12 +397,12 @@ run_on_all_cores(struct rte_ring *r, const int esize)
 		param.r = r;
 
 		/* clear synchro and start workers */
-		rte_atomic32_set(&synchro, 0);
+		__atomic_store_n(&synchro, 0, __ATOMIC_RELAXED);
 		if (rte_eal_mp_remote_launch(lcore_f, &param, SKIP_MAIN) < 0)
 			return -1;
 
 		/* start synchro and launch test on main */
-		rte_atomic32_set(&synchro, 1);
+		__atomic_store_n(&synchro, 1, __ATOMIC_RELAXED);
 		lcore_f(&param);
 
 		rte_eal_mp_wait_lcore();
-- 
2.25.1



* [PATCH v2 03/12] test/timer: use compiler atomic builtins for sync
  2021-11-16  9:41 [PATCH v2 00/12] use compiler atomic builtins for app modules Joyce Kong
  2021-11-16  9:41 ` [PATCH v2 01/12] test/pmd_perf: use compiler atomic builtins for polling sync Joyce Kong
  2021-11-16  9:41 ` [PATCH v2 02/12] test/ring_perf: use compiler atomic builtins for lcores sync Joyce Kong
@ 2021-11-16  9:41 ` Joyce Kong
  2021-11-16 19:52   ` Honnappa Nagarahalli
  2021-11-16 20:20   ` David Marchand
  2021-11-16  9:41 ` [PATCH v2 04/12] test/stack_perf: use compiler atomics for lcore sync Joyce Kong
                   ` (9 subsequent siblings)
  12 siblings, 2 replies; 36+ messages in thread
From: Joyce Kong @ 2021-11-16  9:41 UTC
  To: Robert Sanford, Erik Gabriel Carrillo
  Cc: dev, honnappa.nagarahalli, nd, Joyce Kong, Ruifeng Wang

Convert rte_atomic usages to compiler atomic
built-ins for lcore_state and collisions sync.

Also, move 'main_init_workers' outside of
'timer_stress2_main_loop' to guarantee that lcore_state is
initialized correctly before the threads are launched.
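
The reason for initializing the state before launch can be sketched with a
single worker: if the slot still held a value from an earlier run, the
worker could see a stale RUN_SIGNAL or FINISHED. The sketch below assumes
POSIX threads in place of lcores and is simplified to wait only for
WORKER_FINISHED; state, result, wait_state() and run_state_machine() are
hypothetical names, not code from the patch.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

enum { WORKER_WAITING = 1, WORKER_RUN_SIGNAL, WORKER_RUNNING, WORKER_FINISHED };

static uint16_t state; /* stands in for one slot of lcore_state[] */
static int result;     /* plain data written by the worker */

/* Spin with acquire semantics until the slot holds the expected state. */
static void
wait_state(uint16_t expected)
{
	while (__atomic_load_n(&state, __ATOMIC_ACQUIRE) != expected)
		;
}

static void *
worker(void *arg)
{
	(void)arg;
	wait_state(WORKER_RUN_SIGNAL);
	__atomic_store_n(&state, WORKER_RUNNING, __ATOMIC_RELEASE);
	result = 7;
	/* Release: publishes result together with the final state. */
	__atomic_store_n(&state, WORKER_FINISHED, __ATOMIC_RELEASE);
	return NULL;
}

/* Returns the worker's result, or -1 if the thread could not start. */
static int
run_state_machine(void)
{
	pthread_t tid;

	result = 0;
	/* Initialize before launch so the worker never sees a stale state. */
	__atomic_store_n(&state, WORKER_WAITING, __ATOMIC_RELAXED);
	if (pthread_create(&tid, NULL, worker, NULL) != 0)
		return -1;

	__atomic_store_n(&state, WORKER_RUN_SIGNAL, __ATOMIC_RELEASE);
	wait_state(WORKER_FINISHED);
	/* The acquire load above makes the worker's write visible. */
	assert(result == 7);

	pthread_join(tid, NULL);
	return result;
}
```

Calling run_state_machine() twice exercises the stale-state case: the
second call starts with the slot at WORKER_FINISHED and resets it before
the worker is created.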

Signed-off-by: Joyce Kong <joyce.kong@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
---
 app/test/test_timer.c           | 30 +++++++++++++-----------------
 app/test/test_timer_secondary.c |  1 -
 2 files changed, 13 insertions(+), 18 deletions(-)

diff --git a/app/test/test_timer.c b/app/test/test_timer.c
index a10b2fe9da..c97e5c891c 100644
--- a/app/test/test_timer.c
+++ b/app/test/test_timer.c
@@ -102,7 +102,6 @@
 #include <rte_eal.h>
 #include <rte_per_lcore.h>
 #include <rte_lcore.h>
-#include <rte_atomic.h>
 #include <rte_timer.h>
 #include <rte_random.h>
 #include <rte_malloc.h>
@@ -203,7 +202,7 @@ timer_stress_main_loop(__rte_unused void *arg)
 
 /* Need to synchronize worker lcores through multiple steps. */
 enum { WORKER_WAITING = 1, WORKER_RUN_SIGNAL, WORKER_RUNNING, WORKER_FINISHED };
-static rte_atomic16_t lcore_state[RTE_MAX_LCORE];
+static uint16_t lcore_state[RTE_MAX_LCORE];
 
 static void
 main_init_workers(void)
@@ -211,7 +210,7 @@ main_init_workers(void)
 	unsigned i;
 
 	RTE_LCORE_FOREACH_WORKER(i) {
-		rte_atomic16_set(&lcore_state[i], WORKER_WAITING);
+		__atomic_store_n(&lcore_state[i], WORKER_WAITING, __ATOMIC_RELAXED);
 	}
 }
 
@@ -221,11 +220,10 @@ main_start_workers(void)
 	unsigned i;
 
 	RTE_LCORE_FOREACH_WORKER(i) {
-		rte_atomic16_set(&lcore_state[i], WORKER_RUN_SIGNAL);
+		__atomic_store_n(&lcore_state[i], WORKER_RUN_SIGNAL, __ATOMIC_RELEASE);
 	}
 	RTE_LCORE_FOREACH_WORKER(i) {
-		while (rte_atomic16_read(&lcore_state[i]) != WORKER_RUNNING)
-			rte_pause();
+		rte_wait_until_equal_16(&lcore_state[i], WORKER_RUNNING, __ATOMIC_ACQUIRE);
 	}
 }
 
@@ -235,8 +233,7 @@ main_wait_for_workers(void)
 	unsigned i;
 
 	RTE_LCORE_FOREACH_WORKER(i) {
-		while (rte_atomic16_read(&lcore_state[i]) != WORKER_FINISHED)
-			rte_pause();
+		rte_wait_until_equal_16(&lcore_state[i], WORKER_FINISHED, __ATOMIC_ACQUIRE);
 	}
 }
 
@@ -245,9 +242,8 @@ worker_wait_to_start(void)
 {
 	unsigned lcore_id = rte_lcore_id();
 
-	while (rte_atomic16_read(&lcore_state[lcore_id]) != WORKER_RUN_SIGNAL)
-		rte_pause();
-	rte_atomic16_set(&lcore_state[lcore_id], WORKER_RUNNING);
+	rte_wait_until_equal_16(&lcore_state[lcore_id], WORKER_RUN_SIGNAL, __ATOMIC_ACQUIRE);
+	__atomic_store_n(&lcore_state[lcore_id], WORKER_RUNNING, __ATOMIC_RELEASE);
 }
 
 static void
@@ -255,7 +251,7 @@ worker_finish(void)
 {
 	unsigned lcore_id = rte_lcore_id();
 
-	rte_atomic16_set(&lcore_state[lcore_id], WORKER_FINISHED);
+	__atomic_store_n(&lcore_state[lcore_id], WORKER_FINISHED, __ATOMIC_RELEASE);
 }
 
 
@@ -281,13 +277,12 @@ timer_stress2_main_loop(__rte_unused void *arg)
 	unsigned int lcore_id = rte_lcore_id();
 	unsigned int main_lcore = rte_get_main_lcore();
 	int32_t my_collisions = 0;
-	static rte_atomic32_t collisions;
+	static uint32_t collisions;
 
 	if (lcore_id == main_lcore) {
 		cb_count = 0;
 		test_failed = 0;
-		rte_atomic32_set(&collisions, 0);
-		main_init_workers();
+		__atomic_store_n(&collisions, 0, __ATOMIC_RELAXED);
 		timers = rte_malloc(NULL, sizeof(*timers) * NB_STRESS2_TIMERS, 0);
 		if (timers == NULL) {
 			printf("Test Failed\n");
@@ -315,7 +310,7 @@ timer_stress2_main_loop(__rte_unused void *arg)
 			my_collisions++;
 	}
 	if (my_collisions != 0)
-		rte_atomic32_add(&collisions, my_collisions);
+		__atomic_fetch_add(&collisions, my_collisions, __ATOMIC_RELAXED);
 
 	/* wait long enough for timers to expire */
 	rte_delay_ms(100);
@@ -329,7 +324,7 @@ timer_stress2_main_loop(__rte_unused void *arg)
 
 	/* now check that we get the right number of callbacks */
 	if (lcore_id == main_lcore) {
-		my_collisions = rte_atomic32_read(&collisions);
+		my_collisions = __atomic_load_n(&collisions, __ATOMIC_RELAXED);
 		if (my_collisions != 0)
 			printf("- %d timer reset collisions (OK)\n", my_collisions);
 		rte_timer_manage();
@@ -573,6 +568,7 @@ test_timer(void)
 	/* run a second, slightly different set of stress tests */
 	printf("\nStart timer stress tests 2\n");
 	test_failed = 0;
+	main_init_workers();
 	rte_eal_mp_remote_launch(timer_stress2_main_loop, NULL, CALL_MAIN);
 	rte_eal_mp_wait_lcore();
 	if (test_failed)
diff --git a/app/test/test_timer_secondary.c b/app/test/test_timer_secondary.c
index 16a9f1878b..5795c97f07 100644
--- a/app/test/test_timer_secondary.c
+++ b/app/test/test_timer_secondary.c
@@ -9,7 +9,6 @@
 #include <rte_lcore.h>
 #include <rte_debug.h>
 #include <rte_memzone.h>
-#include <rte_atomic.h>
 #include <rte_timer.h>
 #include <rte_cycles.h>
 #include <rte_mempool.h>
-- 
2.25.1



* [PATCH v2 04/12] test/stack_perf: use compiler atomics for lcore sync
  2021-11-16  9:41 [PATCH v2 00/12] use compiler atomic builtins for app modules Joyce Kong
                   ` (2 preceding siblings ...)
  2021-11-16  9:41 ` [PATCH v2 03/12] test/timer: use compiler atomic builtins for sync Joyce Kong
@ 2021-11-16  9:41 ` Joyce Kong
  2021-11-16  9:41 ` [PATCH v2 05/12] test/bpf: use compiler atomics for calculation Joyce Kong
                   ` (8 subsequent siblings)
  12 siblings, 0 replies; 36+ messages in thread
From: Joyce Kong @ 2021-11-16  9:41 UTC
  To: Olivier Matz
  Cc: dev, honnappa.nagarahalli, nd, Joyce Kong, Ruifeng Wang

Convert rte_atomic usages to compiler atomic built-ins
for lcore sync in stack_perf test cases.
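
The lcore sync here is a countdown barrier: the counter is set to the
number of participants, each participant decrements it on arrival and then
spins until it reaches zero. A minimal sketch, assuming POSIX threads in
place of lcores; NB_THREADS, passed, barrier_worker() and run_barrier() are
hypothetical names, and cleanup on thread-creation failure is left out.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

#define NB_THREADS 2

static uint32_t lcore_barrier; /* participants still to arrive */
static uint32_t passed;        /* participants that got past the barrier */

static void *
barrier_worker(void *arg)
{
	(void)arg;
	/* Announce arrival, then spin until every participant has arrived. */
	__atomic_fetch_sub(&lcore_barrier, 1, __ATOMIC_RELAXED);
	while (__atomic_load_n(&lcore_barrier, __ATOMIC_RELAXED) != 0)
		;
	__atomic_fetch_add(&passed, 1, __ATOMIC_RELAXED);
	return NULL;
}

/* Returns how many threads passed the barrier (0 if a thread failed to start). */
static uint32_t
run_barrier(void)
{
	pthread_t tid[NB_THREADS];
	int i;

	__atomic_store_n(&passed, 0, __ATOMIC_RELAXED);
	/* Arm the barrier before any participant is launched. */
	__atomic_store_n(&lcore_barrier, NB_THREADS, __ATOMIC_RELAXED);

	for (i = 0; i < NB_THREADS; i++)
		if (pthread_create(&tid[i], NULL, barrier_worker, NULL) != 0)
			return 0; /* sketch only: no cleanup on failure */

	for (i = 0; i < NB_THREADS; i++)
		pthread_join(tid[i], NULL);

	/* Every participant decremented exactly once. */
	assert(__atomic_load_n(&lcore_barrier, __ATOMIC_RELAXED) == 0);
	return __atomic_load_n(&passed, __ATOMIC_RELAXED);
}
```

RELAXED suffices in this sketch because the barrier only lines up start
times; no other data is handed over through the counter.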

Signed-off-by: Joyce Kong <joyce.kong@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
---
 app/test/test_stack_perf.c | 14 ++++++--------
 1 file changed, 6 insertions(+), 8 deletions(-)

diff --git a/app/test/test_stack_perf.c b/app/test/test_stack_perf.c
index 4ee40d5d19..1eae00a334 100644
--- a/app/test/test_stack_perf.c
+++ b/app/test/test_stack_perf.c
@@ -6,7 +6,6 @@
 #include <stdio.h>
 #include <inttypes.h>
 
-#include <rte_atomic.h>
 #include <rte_cycles.h>
 #include <rte_launch.h>
 #include <rte_pause.h>
@@ -24,7 +23,7 @@
  */
 static volatile unsigned int bulk_sizes[] = {8, MAX_BURST};
 
-static rte_atomic32_t lcore_barrier;
+static uint32_t lcore_barrier;
 
 struct lcore_pair {
 	unsigned int c1;
@@ -144,9 +143,8 @@ bulk_push_pop(void *p)
 	s = args->s;
 	size = args->sz;
 
-	rte_atomic32_sub(&lcore_barrier, 1);
-	while (rte_atomic32_read(&lcore_barrier) != 0)
-		rte_pause();
+	__atomic_fetch_sub(&lcore_barrier, 1, __ATOMIC_RELAXED);
+	rte_wait_until_equal_32(&lcore_barrier, 0, __ATOMIC_RELAXED);
 
 	uint64_t start = rte_rdtsc();
 
@@ -175,7 +173,7 @@ run_on_core_pair(struct lcore_pair *cores, struct rte_stack *s,
 	unsigned int i;
 
 	for (i = 0; i < RTE_DIM(bulk_sizes); i++) {
-		rte_atomic32_set(&lcore_barrier, 2);
+		__atomic_store_n(&lcore_barrier, 2, __ATOMIC_RELAXED);
 
 		args[0].sz = args[1].sz = bulk_sizes[i];
 		args[0].s = args[1].s = s;
@@ -208,7 +206,7 @@ run_on_n_cores(struct rte_stack *s, lcore_function_t fn, int n)
 		int cnt = 0;
 		double avg;
 
-		rte_atomic32_set(&lcore_barrier, n);
+		__atomic_store_n(&lcore_barrier, n, __ATOMIC_RELAXED);
 
 		RTE_LCORE_FOREACH_WORKER(lcore_id) {
 			if (++cnt >= n)
@@ -302,7 +300,7 @@ __test_stack_perf(uint32_t flags)
 	struct lcore_pair cores;
 	struct rte_stack *s;
 
-	rte_atomic32_init(&lcore_barrier);
+	__atomic_store_n(&lcore_barrier, 0, __ATOMIC_RELAXED);
 
 	s = rte_stack_create(STACK_NAME, STACK_SIZE, rte_socket_id(), flags);
 	if (s == NULL) {
-- 
2.25.1



* [PATCH v2 05/12] test/bpf: use compiler atomics for calculation
  2021-11-16  9:41 [PATCH v2 00/12] use compiler atomic builtins for app modules Joyce Kong
                   ` (3 preceding siblings ...)
  2021-11-16  9:41 ` [PATCH v2 04/12] test/stack_perf: use compiler atomics for lcore sync Joyce Kong
@ 2021-11-16  9:41 ` Joyce Kong
  2021-11-16  9:41 ` [PATCH v2 06/12] test/func_reentrancy: use compiler atomics for data sync Joyce Kong
                   ` (7 subsequent siblings)
  12 siblings, 0 replies; 36+ messages in thread
From: Joyce Kong @ 2021-11-16  9:41 UTC
  To: Konstantin Ananyev
  Cc: dev, honnappa.nagarahalli, nd, Joyce Kong, Ruifeng Wang

Convert rte_atomic usages to compiler atomic built-ins
for calculation in bpf test cases.

Signed-off-by: Joyce Kong <joyce.kong@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
---
 app/test/test_bpf.c | 28 ++++++++++++++--------------
 1 file changed, 14 insertions(+), 14 deletions(-)

diff --git a/app/test/test_bpf.c b/app/test/test_bpf.c
index e3e9a1b0b5..b8be1e3d30 100644
--- a/app/test/test_bpf.c
+++ b/app/test/test_bpf.c
@@ -1569,32 +1569,32 @@ test_xadd1_check(uint64_t rc, const void *arg)
 	memset(&dfe, 0, sizeof(dfe));
 
 	rv = 1;
-	rte_atomic32_add((rte_atomic32_t *)&dfe.u32, rv);
-	rte_atomic64_add((rte_atomic64_t *)&dfe.u64, rv);
+	__atomic_fetch_add(&dfe.u32, rv, __ATOMIC_RELAXED);
+	__atomic_fetch_add(&dfe.u64, rv, __ATOMIC_RELAXED);
 
 	rv = -1;
-	rte_atomic32_add((rte_atomic32_t *)&dfe.u32, rv);
-	rte_atomic64_add((rte_atomic64_t *)&dfe.u64, rv);
+	__atomic_fetch_add(&dfe.u32, rv, __ATOMIC_RELAXED);
+	__atomic_fetch_add(&dfe.u64, rv, __ATOMIC_RELAXED);
 
 	rv = (int32_t)TEST_FILL_1;
-	rte_atomic32_add((rte_atomic32_t *)&dfe.u32, rv);
-	rte_atomic64_add((rte_atomic64_t *)&dfe.u64, rv);
+	__atomic_fetch_add(&dfe.u32, rv, __ATOMIC_RELAXED);
+	__atomic_fetch_add(&dfe.u64, rv, __ATOMIC_RELAXED);
 
 	rv = TEST_MUL_1;
-	rte_atomic32_add((rte_atomic32_t *)&dfe.u32, rv);
-	rte_atomic64_add((rte_atomic64_t *)&dfe.u64, rv);
+	__atomic_fetch_add(&dfe.u32, rv, __ATOMIC_RELAXED);
+	__atomic_fetch_add(&dfe.u64, rv, __ATOMIC_RELAXED);
 
 	rv = TEST_MUL_2;
-	rte_atomic32_add((rte_atomic32_t *)&dfe.u32, rv);
-	rte_atomic64_add((rte_atomic64_t *)&dfe.u64, rv);
+	__atomic_fetch_add(&dfe.u32, rv, __ATOMIC_RELAXED);
+	__atomic_fetch_add(&dfe.u64, rv, __ATOMIC_RELAXED);
 
 	rv = TEST_JCC_2;
-	rte_atomic32_add((rte_atomic32_t *)&dfe.u32, rv);
-	rte_atomic64_add((rte_atomic64_t *)&dfe.u64, rv);
+	__atomic_fetch_add(&dfe.u32, rv, __ATOMIC_RELAXED);
+	__atomic_fetch_add(&dfe.u64, rv, __ATOMIC_RELAXED);
 
 	rv = TEST_JCC_3;
-	rte_atomic32_add((rte_atomic32_t *)&dfe.u32, rv);
-	rte_atomic64_add((rte_atomic64_t *)&dfe.u64, rv);
+	__atomic_fetch_add(&dfe.u32, rv, __ATOMIC_RELAXED);
+	__atomic_fetch_add(&dfe.u64, rv, __ATOMIC_RELAXED);
 
 	return cmp_res(__func__, 1, rc, &dfe, dft, sizeof(dfe));
 }
-- 
2.25.1



* [PATCH v2 06/12] test/func_reentrancy: use compiler atomics for data sync
  2021-11-16  9:41 [PATCH v2 00/12] use compiler atomic builtins for app modules Joyce Kong
                   ` (4 preceding siblings ...)
  2021-11-16  9:41 ` [PATCH v2 05/12] test/bpf: use compiler atomics for calculation Joyce Kong
@ 2021-11-16  9:41 ` Joyce Kong
  2021-11-16  9:42 ` [PATCH v2 07/12] app/eventdev: use compiler atomics for shared " Joyce Kong
                   ` (6 subsequent siblings)
  12 siblings, 0 replies; 36+ messages in thread
From: Joyce Kong @ 2021-11-16  9:41 UTC
  To: Olivier Matz, Andrew Rybchenko, Bruce Richardson,
	Vladimir Medvedkin, Yipeng Wang, Sameh Gobriel, Anatoly Burakov,
	Honnappa Nagarahalli, Konstantin Ananyev
  Cc: dev, nd, Joyce Kong, Ruifeng Wang

Convert rte_atomic usages to compiler atomic built-ins
for shared data sync in func_reentrancy test cases.

Signed-off-by: Joyce Kong <joyce.kong@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
---
 app/test/test_func_reentrancy.c | 27 +++++++++++++--------------
 1 file changed, 13 insertions(+), 14 deletions(-)

diff --git a/app/test/test_func_reentrancy.c b/app/test/test_func_reentrancy.c
index 838ab6f0f9..7825c6cb86 100644
--- a/app/test/test_func_reentrancy.c
+++ b/app/test/test_func_reentrancy.c
@@ -20,7 +20,6 @@
 #include <rte_eal.h>
 #include <rte_per_lcore.h>
 #include <rte_lcore.h>
-#include <rte_atomic.h>
 #include <rte_branch_prediction.h>
 #include <rte_ring.h>
 #include <rte_mempool.h>
@@ -54,12 +53,12 @@ typedef void (*case_clean_t)(unsigned lcore_id);
 
 #define MAX_LCORES	(RTE_MAX_MEMZONE / (MAX_ITER_MULTI * 4U))
 
-static rte_atomic32_t obj_count = RTE_ATOMIC32_INIT(0);
-static rte_atomic32_t synchro = RTE_ATOMIC32_INIT(0);
+static uint32_t obj_count;
+static uint32_t synchro;
 
 #define WAIT_SYNCHRO_FOR_WORKERS()   do { \
 	if (lcore_self != rte_get_main_lcore())                  \
-		while (rte_atomic32_read(&synchro) == 0);        \
+		rte_wait_until_equal_32(&synchro, 1, __ATOMIC_RELAXED); \
 } while(0)
 
 /*
@@ -72,7 +71,7 @@ test_eal_init_once(__rte_unused void *arg)
 
 	WAIT_SYNCHRO_FOR_WORKERS();
 
-	rte_atomic32_set(&obj_count, 1); /* silent the check in the caller */
+	__atomic_store_n(&obj_count, 1, __ATOMIC_RELAXED); /* silent the check in the caller */
 	if (rte_eal_init(0, NULL) != -1)
 		return -1;
 
@@ -116,7 +115,7 @@ ring_create_lookup(__rte_unused void *arg)
 	for (i = 0; i < MAX_ITER_ONCE; i++) {
 		rp = rte_ring_create("fr_test_once", 4096, SOCKET_ID_ANY, 0);
 		if (rp != NULL)
-			rte_atomic32_inc(&obj_count);
+			__atomic_fetch_add(&obj_count, 1, __ATOMIC_RELAXED);
 	}
 
 	/* create/lookup new ring several times */
@@ -183,7 +182,7 @@ mempool_create_lookup(__rte_unused void *arg)
 					my_obj_init, NULL,
 					SOCKET_ID_ANY, 0);
 		if (mp != NULL)
-			rte_atomic32_inc(&obj_count);
+			__atomic_fetch_add(&obj_count, 1, __ATOMIC_RELAXED);
 	}
 
 	/* create/lookup new ring several times */
@@ -250,7 +249,7 @@ hash_create_free(__rte_unused void *arg)
 	for (i = 0; i < MAX_ITER_ONCE; i++) {
 		handle = rte_hash_create(&hash_params);
 		if (handle != NULL)
-			rte_atomic32_inc(&obj_count);
+			__atomic_fetch_add(&obj_count, 1, __ATOMIC_RELAXED);
 	}
 
 	/* create mutiple times simultaneously */
@@ -318,7 +317,7 @@ fbk_create_free(__rte_unused void *arg)
 	for (i = 0; i < MAX_ITER_ONCE; i++) {
 		handle = rte_fbk_hash_create(&fbk_params);
 		if (handle != NULL)
-			rte_atomic32_inc(&obj_count);
+			__atomic_fetch_add(&obj_count, 1, __ATOMIC_RELAXED);
 	}
 
 	/* create mutiple fbk tables simultaneously */
@@ -384,7 +383,7 @@ lpm_create_free(__rte_unused void *arg)
 	for (i = 0; i < MAX_ITER_ONCE; i++) {
 		lpm = rte_lpm_create("fr_test_once",  SOCKET_ID_ANY, &config);
 		if (lpm != NULL)
-			rte_atomic32_inc(&obj_count);
+			__atomic_fetch_add(&obj_count, 1, __ATOMIC_RELAXED);
 	}
 
 	/* create mutiple fbk tables simultaneously */
@@ -445,8 +444,8 @@ launch_test(struct test_case *pt_case)
 	if (pt_case->func == NULL)
 		return -1;
 
-	rte_atomic32_set(&obj_count, 0);
-	rte_atomic32_set(&synchro, 0);
+	__atomic_store_n(&obj_count, 0, __ATOMIC_RELAXED);
+	__atomic_store_n(&synchro, 0, __ATOMIC_RELAXED);
 
 	cores = RTE_MIN(rte_lcore_count(), MAX_LCORES);
 	RTE_LCORE_FOREACH_WORKER(lcore_id) {
@@ -456,7 +455,7 @@ launch_test(struct test_case *pt_case)
 		rte_eal_remote_launch(pt_case->func, pt_case->arg, lcore_id);
 	}
 
-	rte_atomic32_set(&synchro, 1);
+	__atomic_store_n(&synchro, 1, __ATOMIC_RELAXED);
 
 	if (pt_case->func(pt_case->arg) < 0)
 		ret = -1;
@@ -471,7 +470,7 @@ launch_test(struct test_case *pt_case)
 			pt_case->clean(lcore_id);
 	}
 
-	count = rte_atomic32_read(&obj_count);
+	count = __atomic_load_n(&obj_count, __ATOMIC_RELAXED);
 	if (count != 1) {
 		printf("%s: common object allocated %d times (should be 1)\n",
 			pt_case->name, count);
-- 
2.25.1



* [PATCH v2 07/12] app/eventdev: use compiler atomics for shared data sync
  2021-11-16  9:41 [PATCH v2 00/12] use compiler atomic builtins for app modules Joyce Kong
                   ` (5 preceding siblings ...)
  2021-11-16  9:41 ` [PATCH v2 06/12] test/func_reentrancy: use compiler atomics for data sync Joyce Kong
@ 2021-11-16  9:42 ` Joyce Kong
  2021-11-16  9:42 ` [PATCH v2 08/12] app/crypto: use compiler atomic builtins for display sync Joyce Kong
                   ` (5 subsequent siblings)
  12 siblings, 0 replies; 36+ messages in thread
From: Joyce Kong @ 2021-11-16  9:42 UTC
  To: Jerin Jacob
  Cc: dev, honnappa.nagarahalli, nd, Joyce Kong, Ruifeng Wang

Convert rte_atomic usages to compiler atomic built-ins
for shared data sync in eventdev cases.
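
The shared datum in the order test is a count of outstanding packets:
workers decrement it once per processed packet and pollers stop when it
reaches zero. A single-threaded sketch of that countdown follows; drain()
is a hypothetical helper, not code from the patch. Note that for an
unsigned counter the "<= 0" test is equivalent to "== 0".

```c
#include <assert.h>
#include <stdint.h>

static uint64_t outstand_pkts;

/* Hypothetical helper: consume nb_pkts packets, return the iteration count. */
static uint64_t
drain(uint64_t nb_pkts)
{
	uint64_t iterations = 0;

	__atomic_store_n(&outstand_pkts, nb_pkts, __ATOMIC_RELAXED);

	/* For an unsigned counter, "<= 0" is the same test as "== 0". */
	while (!(__atomic_load_n(&outstand_pkts, __ATOMIC_RELAXED) <= 0)) {
		/* sub_fetch returns the value after the decrement. */
		uint64_t left = __atomic_sub_fetch(&outstand_pkts, 1,
				__ATOMIC_RELAXED);

		assert(left == nb_pkts - iterations - 1);
		iterations++;
	}
	return iterations;
}
```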

Signed-off-by: Joyce Kong <joyce.kong@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
---
 app/test-eventdev/evt_main.c          | 1 -
 app/test-eventdev/test_order_atq.c    | 4 ++--
 app/test-eventdev/test_order_common.c | 4 ++--
 app/test-eventdev/test_order_common.h | 8 ++++----
 app/test-eventdev/test_order_queue.c  | 4 ++--
 5 files changed, 10 insertions(+), 11 deletions(-)

diff --git a/app/test-eventdev/evt_main.c b/app/test-eventdev/evt_main.c
index 3534aabca7..194c980c7a 100644
--- a/app/test-eventdev/evt_main.c
+++ b/app/test-eventdev/evt_main.c
@@ -6,7 +6,6 @@
 #include <unistd.h>
 #include <signal.h>
 
-#include <rte_atomic.h>
 #include <rte_debug.h>
 #include <rte_eal.h>
 #include <rte_eventdev.h>
diff --git a/app/test-eventdev/test_order_atq.c b/app/test-eventdev/test_order_atq.c
index 71215a07b6..2fee4b4daa 100644
--- a/app/test-eventdev/test_order_atq.c
+++ b/app/test-eventdev/test_order_atq.c
@@ -28,7 +28,7 @@ order_atq_worker(void *arg, const bool flow_id_cap)
 		uint16_t event = rte_event_dequeue_burst(dev_id, port,
 					&ev, 1, 0);
 		if (!event) {
-			if (rte_atomic64_read(outstand_pkts) <= 0)
+			if (__atomic_load_n(outstand_pkts, __ATOMIC_RELAXED) <= 0)
 				break;
 			rte_pause();
 			continue;
@@ -64,7 +64,7 @@ order_atq_worker_burst(void *arg, const bool flow_id_cap)
 				BURST_SIZE, 0);
 
 		if (nb_rx == 0) {
-			if (rte_atomic64_read(outstand_pkts) <= 0)
+			if (__atomic_load_n(outstand_pkts, __ATOMIC_RELAXED) <= 0)
 				break;
 			rte_pause();
 			continue;
diff --git a/app/test-eventdev/test_order_common.c b/app/test-eventdev/test_order_common.c
index d7760061ba..ff7813f9c2 100644
--- a/app/test-eventdev/test_order_common.c
+++ b/app/test-eventdev/test_order_common.c
@@ -187,7 +187,7 @@ order_test_setup(struct evt_test *test, struct evt_options *opt)
 		evt_err("failed to allocate t->expected_flow_seq memory");
 		goto exp_nomem;
 	}
-	rte_atomic64_set(&t->outstand_pkts, opt->nb_pkts);
+	__atomic_store_n(&t->outstand_pkts, opt->nb_pkts, __ATOMIC_RELAXED);
 	t->err = false;
 	t->nb_pkts = opt->nb_pkts;
 	t->nb_flows = opt->nb_flows;
@@ -294,7 +294,7 @@ order_launch_lcores(struct evt_test *test, struct evt_options *opt,
 
 	while (t->err == false) {
 		uint64_t new_cycles = rte_get_timer_cycles();
-		int64_t remaining = rte_atomic64_read(&t->outstand_pkts);
+		int64_t remaining = __atomic_load_n(&t->outstand_pkts, __ATOMIC_RELAXED);
 
 		if (remaining <= 0) {
 			t->result = EVT_TEST_SUCCESS;
diff --git a/app/test-eventdev/test_order_common.h b/app/test-eventdev/test_order_common.h
index cd9d6009ec..92781d9587 100644
--- a/app/test-eventdev/test_order_common.h
+++ b/app/test-eventdev/test_order_common.h
@@ -48,7 +48,7 @@ struct test_order {
 	 * The atomic_* is an expensive operation,Since it is a functional test,
 	 * We are using the atomic_ operation to reduce the code complexity.
 	 */
-	rte_atomic64_t outstand_pkts;
+	uint64_t outstand_pkts;
 	enum evt_test_result result;
 	uint32_t nb_flows;
 	uint64_t nb_pkts;
@@ -95,7 +95,7 @@ static __rte_always_inline void
 order_process_stage_1(struct test_order *const t,
 		struct rte_event *const ev, const uint32_t nb_flows,
 		uint32_t *const expected_flow_seq,
-		rte_atomic64_t *const outstand_pkts)
+		uint64_t *const outstand_pkts)
 {
 	const uint32_t flow = (uintptr_t)ev->mbuf % nb_flows;
 	/* compare the seqn against expected value */
@@ -113,7 +113,7 @@ order_process_stage_1(struct test_order *const t,
 	 */
 	expected_flow_seq[flow]++;
 	rte_pktmbuf_free(ev->mbuf);
-	rte_atomic64_sub(outstand_pkts, 1);
+	__atomic_sub_fetch(outstand_pkts, 1, __ATOMIC_RELAXED);
 }
 
 static __rte_always_inline void
@@ -132,7 +132,7 @@ order_process_stage_invalid(struct test_order *const t,
 	const uint8_t port = w->port_id;\
 	const uint32_t nb_flows = t->nb_flows;\
 	uint32_t *expected_flow_seq = t->expected_flow_seq;\
-	rte_atomic64_t *outstand_pkts = &t->outstand_pkts;\
+	uint64_t *outstand_pkts = &t->outstand_pkts;\
 	if (opt->verbose_level > 1)\
 		printf("%s(): lcore %d dev_id %d port=%d\n",\
 			__func__, rte_lcore_id(), dev_id, port)
diff --git a/app/test-eventdev/test_order_queue.c b/app/test-eventdev/test_order_queue.c
index 621367805a..80eaea5cf5 100644
--- a/app/test-eventdev/test_order_queue.c
+++ b/app/test-eventdev/test_order_queue.c
@@ -28,7 +28,7 @@ order_queue_worker(void *arg, const bool flow_id_cap)
 		uint16_t event = rte_event_dequeue_burst(dev_id, port,
 					&ev, 1, 0);
 		if (!event) {
-			if (rte_atomic64_read(outstand_pkts) <= 0)
+			if (__atomic_load_n(outstand_pkts, __ATOMIC_RELAXED) <= 0)
 				break;
 			rte_pause();
 			continue;
@@ -64,7 +64,7 @@ order_queue_worker_burst(void *arg, const bool flow_id_cap)
 				BURST_SIZE, 0);
 
 		if (nb_rx == 0) {
-			if (rte_atomic64_read(outstand_pkts) <= 0)
+			if (__atomic_load_n(outstand_pkts, __ATOMIC_RELAXED) <= 0)
 				break;
 			rte_pause();
 			continue;
-- 
2.25.1



* [PATCH v2 08/12] app/crypto: use compiler atomic builtins for display sync
  2021-11-16  9:41 [PATCH v2 00/12] use compiler atomic builtins for app modules Joyce Kong
                   ` (6 preceding siblings ...)
  2021-11-16  9:42 ` [PATCH v2 07/12] app/eventdev: use compiler atomics for shared " Joyce Kong
@ 2021-11-16  9:42 ` Joyce Kong
  2021-11-16  9:42 ` [PATCH v2 09/12] app/compress: " Joyce Kong
                   ` (4 subsequent siblings)
  12 siblings, 0 replies; 36+ messages in thread
From: Joyce Kong @ 2021-11-16  9:42 UTC
  To: Declan Doherty, Ciara Power
  Cc: dev, honnappa.nagarahalli, nd, Joyce Kong, Ruifeng Wang

Convert rte_atomic16_test_and_set usage to the compiler's atomic
CAS built-in for display sync in the crypto perf tests.
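
For illustration only, a minimal sketch of the print-once pattern this
conversion relies on. The helper names claim_display_once() and
print_header_once() are hypothetical and not part of the patch; the CAS
arguments mirror the ones used in the diff below.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Shared flag: 0 = header not printed yet, 1 = already printed. */
static uint16_t display_once;

/*
 * Hypothetical helper: returns true for exactly one caller, the one
 * whose CAS moves display_once from 0 to 1.
 */
static bool
claim_display_once(void)
{
	/* A failed CAS overwrites exp with the current value, so it is
	 * re-initialised to 0 on every attempt.
	 */
	uint16_t exp = 0;

	return __atomic_compare_exchange_n(&display_once, &exp, 1, 0,
			__ATOMIC_RELAXED, __ATOMIC_RELAXED);
}

/* Hypothetical helper: print the header once and report whether this
 * caller was the one that printed it.
 */
static bool
print_header_once(void)
{
	if (claim_display_once()) {
		printf("# lcore id, Buffer Size, Burst Size\n");
		return true;
	}
	return false;
}
```

Relaxed ordering is assumed to be sufficient here because the flag only
decides which lcore prints the header and does not publish other data.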

Signed-off-by: Joyce Kong <joyce.kong@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
---
 app/test-crypto-perf/cperf_test_latency.c        | 6 ++++--
 app/test-crypto-perf/cperf_test_pmd_cyclecount.c | 9 ++++++---
 app/test-crypto-perf/cperf_test_throughput.c     | 9 ++++++---
 app/test-crypto-perf/cperf_test_verify.c         | 9 ++++++---
 4 files changed, 22 insertions(+), 11 deletions(-)

diff --git a/app/test-crypto-perf/cperf_test_latency.c b/app/test-crypto-perf/cperf_test_latency.c
index 69f55de50a..ce49feaba9 100644
--- a/app/test-crypto-perf/cperf_test_latency.c
+++ b/app/test-crypto-perf/cperf_test_latency.c
@@ -126,7 +126,7 @@ cperf_latency_test_runner(void *arg)
 	uint8_t burst_size_idx = 0;
 	uint32_t imix_idx = 0;
 
-	static rte_atomic16_t display_once = RTE_ATOMIC16_INIT(0);
+	static uint16_t display_once;
 
 	if (ctx == NULL)
 		return 0;
@@ -307,8 +307,10 @@ cperf_latency_test_runner(void *arg)
 		time_max = tunit*(double)(tsc_max) / tsc_hz;
 		time_min = tunit*(double)(tsc_min) / tsc_hz;
 
+		uint16_t exp = 0;
 		if (ctx->options->csv) {
-			if (rte_atomic16_test_and_set(&display_once))
+			if (__atomic_compare_exchange_n(&display_once, &exp, 1, 0,
+					__ATOMIC_RELAXED, __ATOMIC_RELAXED))
 				printf("\n# lcore, Buffer Size, Burst Size, Pakt Seq #, "
 						"cycles, time (us)");
 
diff --git a/app/test-crypto-perf/cperf_test_pmd_cyclecount.c b/app/test-crypto-perf/cperf_test_pmd_cyclecount.c
index fda97e8ab9..ba1f104f72 100644
--- a/app/test-crypto-perf/cperf_test_pmd_cyclecount.c
+++ b/app/test-crypto-perf/cperf_test_pmd_cyclecount.c
@@ -404,7 +404,7 @@ cperf_pmd_cyclecount_test_runner(void *test_ctx)
 	state.lcore = rte_lcore_id();
 	state.linearize = 0;
 
-	static rte_atomic16_t display_once = RTE_ATOMIC16_INIT(0);
+	static uint16_t display_once;
 	static bool warmup = true;
 
 	/*
@@ -449,8 +449,10 @@ cperf_pmd_cyclecount_test_runner(void *test_ctx)
 			continue;
 		}
 
+		uint16_t exp = 0;
 		if (!opts->csv) {
-			if (rte_atomic16_test_and_set(&display_once))
+			if (__atomic_compare_exchange_n(&display_once, &exp, 1, 0,
+					__ATOMIC_RELAXED, __ATOMIC_RELAXED))
 				printf(PRETTY_HDR_FMT, "lcore id", "Buf Size",
 						"Burst Size", "Enqueued",
 						"Dequeued", "Enq Retries",
@@ -466,7 +468,8 @@ cperf_pmd_cyclecount_test_runner(void *test_ctx)
 					state.cycles_per_enq,
 					state.cycles_per_deq);
 		} else {
-			if (rte_atomic16_test_and_set(&display_once))
+			if (__atomic_compare_exchange_n(&display_once, &exp, 1, 0,
+					__ATOMIC_RELAXED, __ATOMIC_RELAXED))
 				printf(CSV_HDR_FMT, "# lcore id", "Buf Size",
 						"Burst Size", "Enqueued",
 						"Dequeued", "Enq Retries",
diff --git a/app/test-crypto-perf/cperf_test_throughput.c b/app/test-crypto-perf/cperf_test_throughput.c
index 739ed9e573..51512af2ad 100644
--- a/app/test-crypto-perf/cperf_test_throughput.c
+++ b/app/test-crypto-perf/cperf_test_throughput.c
@@ -113,7 +113,7 @@ cperf_throughput_test_runner(void *test_ctx)
 	uint8_t burst_size_idx = 0;
 	uint32_t imix_idx = 0;
 
-	static rte_atomic16_t display_once = RTE_ATOMIC16_INIT(0);
+	static uint16_t display_once;
 
 	struct rte_crypto_op *ops[ctx->options->max_burst_size];
 	struct rte_crypto_op *ops_processed[ctx->options->max_burst_size];
@@ -281,8 +281,10 @@ cperf_throughput_test_runner(void *test_ctx)
 		double cycles_per_packet = ((double)tsc_duration /
 				ctx->options->total_ops);
 
+		uint16_t exp = 0;
 		if (!ctx->options->csv) {
-			if (rte_atomic16_test_and_set(&display_once))
+			if (__atomic_compare_exchange_n(&display_once, &exp, 1, 0,
+					__ATOMIC_RELAXED, __ATOMIC_RELAXED))
 				printf("%12s%12s%12s%12s%12s%12s%12s%12s%12s%12s\n\n",
 					"lcore id", "Buf Size", "Burst Size",
 					"Enqueued", "Dequeued", "Failed Enq",
@@ -302,7 +304,8 @@ cperf_throughput_test_runner(void *test_ctx)
 					throughput_gbps,
 					cycles_per_packet);
 		} else {
-			if (rte_atomic16_test_and_set(&display_once))
+			if (__atomic_compare_exchange_n(&display_once, &exp, 1, 0,
+					__ATOMIC_RELAXED, __ATOMIC_RELAXED))
 				printf("#lcore id,Buffer Size(B),"
 					"Burst Size,Enqueued,Dequeued,Failed Enq,"
 					"Failed Deq,Ops(Millions),Throughput(Gbps),"
diff --git a/app/test-crypto-perf/cperf_test_verify.c b/app/test-crypto-perf/cperf_test_verify.c
index 1962438034..496eb0de00 100644
--- a/app/test-crypto-perf/cperf_test_verify.c
+++ b/app/test-crypto-perf/cperf_test_verify.c
@@ -241,7 +241,7 @@ cperf_verify_test_runner(void *test_ctx)
 	uint64_t ops_deqd = 0, ops_deqd_total = 0, ops_deqd_failed = 0;
 	uint64_t ops_failed = 0;
 
-	static rte_atomic16_t display_once = RTE_ATOMIC16_INIT(0);
+	static uint16_t display_once;
 
 	uint64_t i;
 	uint16_t ops_unused = 0;
@@ -383,8 +383,10 @@ cperf_verify_test_runner(void *test_ctx)
 		ops_deqd_total += ops_deqd;
 	}
 
+	uint16_t exp = 0;
 	if (!ctx->options->csv) {
-		if (rte_atomic16_test_and_set(&display_once))
+		if (__atomic_compare_exchange_n(&display_once, &exp, 1, 0,
+				__ATOMIC_RELAXED, __ATOMIC_RELAXED))
 			printf("%12s%12s%12s%12s%12s%12s%12s%12s\n\n",
 				"lcore id", "Buf Size", "Burst size",
 				"Enqueued", "Dequeued", "Failed Enq",
@@ -401,7 +403,8 @@ cperf_verify_test_runner(void *test_ctx)
 				ops_deqd_failed,
 				ops_failed);
 	} else {
-		if (rte_atomic16_test_and_set(&display_once))
+		if (__atomic_compare_exchange_n(&display_once, &exp, 1, 0,
+				__ATOMIC_RELAXED, __ATOMIC_RELAXED))
 			printf("\n# lcore id, Buffer Size(B), "
 				"Burst Size,Enqueued,Dequeued,Failed Enq,"
 				"Failed Deq,Failed Ops\n");
-- 
2.25.1



* [PATCH v2 09/12] app/compress: use compiler atomic builtins for display sync
  2021-11-16  9:41 [PATCH v2 00/12] use compiler atomic builtins for app modules Joyce Kong
                   ` (7 preceding siblings ...)
  2021-11-16  9:42 ` [PATCH v2 08/12] app/crypto: use compiler atomic builtins for display sync Joyce Kong
@ 2021-11-16  9:42 ` Joyce Kong
  2021-11-16 20:15   ` Honnappa Nagarahalli
  2021-11-16  9:42 ` [PATCH v2 10/12] app/testpmd: remove atomic operations for port status Joyce Kong
                   ` (3 subsequent siblings)
  12 siblings, 1 reply; 36+ messages in thread
From: Joyce Kong @ 2021-11-16  9:42 UTC (permalink / raw)
  Cc: dev, honnappa.nagarahalli, nd, Joyce Kong, Ruifeng Wang

Convert rte_atomic16_test_and_set usage to the compiler's atomic
CAS built-in for display sync in the compress perf tests.
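
For illustration only, a sketch of the lock-protected variant used for
the cyclecount report, where the flag is a plain variable checked while
the print lock is held. A pthread mutex stands in for rte_spinlock_t so
the sketch builds without DPDK; report() and its columns are
hypothetical, and it is assumed the lock is released after the row is
printed.

```c
#include <pthread.h>
#include <stdint.h>
#include <stdio.h>

/* Stand-in for the rte_spinlock_t that serialises report output. */
static pthread_mutex_t print_lock = PTHREAD_MUTEX_INITIALIZER;

/* Plain flag: only read or written while print_lock is held. */
static uint16_t display_once;

/*
 * Hypothetical helper: print the legend once, then this caller's row.
 * Returns 1 if this call printed the legend, 0 otherwise.
 */
static int
report(unsigned int lcore, unsigned int value)
{
	int printed_legend = 0;

	pthread_mutex_lock(&print_lock);

	if (display_once == 0) {
		display_once = 1;
		printf("%12s%12s\n", "lcore id", "value");
		printed_legend = 1;
	}

	printf("%12u%12u\n", lcore, value);

	pthread_mutex_unlock(&print_lock);

	return printed_legend;
}
```

Because the lock already serialises every access to the flag, no atomic
operation is needed on it in this variant.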

Signed-off-by: Joyce Kong <joyce.kong@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
---
 app/test-compress-perf/comp_perf_test_common.h    |  2 +-
 .../comp_perf_test_cyclecount.c                   | 15 +++++++--------
 .../comp_perf_test_throughput.c                   | 10 +++++++---
 app/test-compress-perf/comp_perf_test_verify.c    |  6 ++++--
 4 files changed, 19 insertions(+), 14 deletions(-)

diff --git a/app/test-compress-perf/comp_perf_test_common.h b/app/test-compress-perf/comp_perf_test_common.h
index 72705c6a2b..d039e5a29a 100644
--- a/app/test-compress-perf/comp_perf_test_common.h
+++ b/app/test-compress-perf/comp_perf_test_common.h
@@ -14,7 +14,7 @@ struct cperf_mem_resources {
 	uint16_t qp_id;
 	uint8_t lcore_id;
 
-	rte_atomic16_t print_info_once;
+	uint16_t print_info_once;
 
 	uint32_t total_bufs;
 	uint8_t *compressed_data;
diff --git a/app/test-compress-perf/comp_perf_test_cyclecount.c b/app/test-compress-perf/comp_perf_test_cyclecount.c
index c875ddbdac..da55b02b74 100644
--- a/app/test-compress-perf/comp_perf_test_cyclecount.c
+++ b/app/test-compress-perf/comp_perf_test_cyclecount.c
@@ -466,7 +466,7 @@ cperf_cyclecount_test_runner(void *test_ctx)
 	struct cperf_cyclecount_ctx *ctx = test_ctx;
 	struct comp_test_data *test_data = ctx->ver.options;
 	uint32_t lcore = rte_lcore_id();
-	static rte_atomic16_t display_once = RTE_ATOMIC16_INIT(0);
+	static uint16_t display_once;
 	static rte_spinlock_t print_spinlock;
 	int i;
 
@@ -486,10 +486,12 @@ cperf_cyclecount_test_runner(void *test_ctx)
 
 	ctx->ver.mem.lcore_id = lcore;
 
+	uint16_t exp = 0;
 	/*
 	 * printing information about current compression thread
 	 */
-	if (rte_atomic16_test_and_set(&ctx->ver.mem.print_info_once))
+	if (__atomic_compare_exchange_n(&ctx->ver.mem.print_info_once, &exp,
+				1, 0, __ATOMIC_RELAXED, __ATOMIC_RELAXED))
 		printf("    lcore: %u,"
 				" driver name: %s,"
 				" device name: %s,"
@@ -546,9 +548,10 @@ cperf_cyclecount_test_runner(void *test_ctx)
 			(ctx->ver.mem.total_bufs * test_data->num_iter);
 
 	/* R E P O R T processing */
-	if (rte_atomic16_test_and_set(&display_once)) {
+	rte_spinlock_lock(&print_spinlock);
 
-		rte_spinlock_lock(&print_spinlock);
+	if (display_once == 0) {
+		display_once = 1;
 
 		printf("\nLegend for the table\n"
 		"  - Retries section: number of retries for the following operations:\n"
@@ -576,12 +579,8 @@ cperf_cyclecount_test_runner(void *test_ctx)
 			"setup/op",
 			"[C-e]", "[C-d]",
 			"[D-e]", "[D-d]");
-
-		rte_spinlock_unlock(&print_spinlock);
 	}
 
-	rte_spinlock_lock(&print_spinlock);
-
 	printf("%12u"
 	       "%6u"
 	       "%12zu"
diff --git a/app/test-compress-perf/comp_perf_test_throughput.c b/app/test-compress-perf/comp_perf_test_throughput.c
index 13922b658c..d3dff070b0 100644
--- a/app/test-compress-perf/comp_perf_test_throughput.c
+++ b/app/test-compress-perf/comp_perf_test_throughput.c
@@ -329,15 +329,17 @@ cperf_throughput_test_runner(void *test_ctx)
 	struct cperf_benchmark_ctx *ctx = test_ctx;
 	struct comp_test_data *test_data = ctx->ver.options;
 	uint32_t lcore = rte_lcore_id();
-	static rte_atomic16_t display_once = RTE_ATOMIC16_INIT(0);
+	static uint16_t display_once;
 	int i, ret = EXIT_SUCCESS;
 
 	ctx->ver.mem.lcore_id = lcore;
 
+	uint16_t exp = 0;
 	/*
 	 * printing information about current compression thread
 	 */
-	if (rte_atomic16_test_and_set(&ctx->ver.mem.print_info_once))
+	if (__atomic_compare_exchange_n(&ctx->ver.mem.print_info_once, &exp,
+				1, 0, __ATOMIC_RELAXED, __ATOMIC_RELAXED))
 		printf("    lcore: %u,"
 				" driver name: %s,"
 				" device name: %s,"
@@ -391,7 +393,9 @@ cperf_throughput_test_runner(void *test_ctx)
 	ctx->decomp_gbps = rte_get_tsc_hz() / ctx->decomp_tsc_byte * 8 /
 			1000000000;
 
-	if (rte_atomic16_test_and_set(&display_once)) {
+	exp = 0;
+	if (__atomic_compare_exchange_n(&display_once, &exp, 1, 0,
+			__ATOMIC_RELAXED, __ATOMIC_RELAXED)) {
 		printf("\n%12s%6s%12s%17s%15s%16s\n",
 			"lcore id", "Level", "Comp size", "Comp ratio [%]",
 			"Comp [Gbps]", "Decomp [Gbps]");
diff --git a/app/test-compress-perf/comp_perf_test_verify.c b/app/test-compress-perf/comp_perf_test_verify.c
index 5e13257b79..f6e21368e8 100644
--- a/app/test-compress-perf/comp_perf_test_verify.c
+++ b/app/test-compress-perf/comp_perf_test_verify.c
@@ -388,7 +388,7 @@ cperf_verify_test_runner(void *test_ctx)
 	struct cperf_verify_ctx *ctx = test_ctx;
 	struct comp_test_data *test_data = ctx->options;
 	int ret = EXIT_SUCCESS;
-	static rte_atomic16_t display_once = RTE_ATOMIC16_INIT(0);
+	static uint16_t display_once;
 	uint32_t lcore = rte_lcore_id();
 
 	ctx->mem.lcore_id = lcore;
@@ -427,8 +427,10 @@ cperf_verify_test_runner(void *test_ctx)
 	ctx->ratio = (double) ctx->comp_data_sz /
 			test_data->input_data_sz * 100;
 
+	uint16_t exp = 0;
 	if (!ctx->silent) {
-		if (rte_atomic16_test_and_set(&display_once)) {
+		if (__atomic_compare_exchange_n(&display_once, &exp, 1, 0,
+				__ATOMIC_RELAXED, __ATOMIC_RELAXED)) {
 			printf("%12s%6s%12s%17s\n",
 			    "lcore id", "Level", "Comp size", "Comp ratio [%]");
 		}
-- 
2.25.1



* [PATCH v2 10/12] app/testpmd: remove atomic operations for port status
  2021-11-16  9:41 [PATCH v2 00/12] use compiler atomic builtins for app modules Joyce Kong
                   ` (8 preceding siblings ...)
  2021-11-16  9:42 ` [PATCH v2 09/12] app/compress: " Joyce Kong
@ 2021-11-16  9:42 ` Joyce Kong
  2021-11-16 21:34   ` Honnappa Nagarahalli
  2021-11-16  9:42 ` [PATCH v2 11/12] app/bbdev: use compiler atomics for shared data sync Joyce Kong
                   ` (2 subsequent siblings)
  12 siblings, 1 reply; 36+ messages in thread
From: Joyce Kong @ 2021-11-16  9:42 UTC (permalink / raw)
  To: Xiaoyun Li; +Cc: dev, honnappa.nagarahalli, nd, Joyce Kong, Ruifeng Wang

Changes to port_status do not need to be handled
atomically, as the field is modified only during
initialization or from the testpmd prompt, not by
multiple threads.
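
For illustration only, a sketch of the check-then-set transition that
replaces rte_atomic16_cmpset. The port_transition() helper, the struct
layout and the PORT_* values are hypothetical; the patch open-codes the
same test at each call site using testpmd's own RTE_PORT_* constants.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical state values for illustration only. */
enum { PORT_STOPPED, PORT_STARTED, PORT_CLOSED, PORT_HANDLING };

/* Reduced stand-in for the testpmd port structure. */
struct port {
	uint16_t port_status;
};

/*
 * Move the port from state 'from' to state 'to'.
 * Returns false, leaving the state unchanged, if the port is not
 * currently in 'from'. This is not atomic and assumes a single
 * thread updates port_status.
 */
static bool
port_transition(struct port *port, uint16_t from, uint16_t to)
{
	if (port->port_status != from)
		return false;

	port->port_status = to;
	return true;
}
```

A failed transition corresponds to the "can not be set back to stopped"
error paths in the diff below.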

Signed-off-by: Joyce Kong <joyce.kong@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
---
 app/test-pmd/testpmd.c | 58 ++++++++++++++++++++++--------------------
 1 file changed, 31 insertions(+), 27 deletions(-)

diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
index a66dfb297c..ed472cacd2 100644
--- a/app/test-pmd/testpmd.c
+++ b/app/test-pmd/testpmd.c
@@ -36,7 +36,6 @@
 #include <rte_alarm.h>
 #include <rte_per_lcore.h>
 #include <rte_lcore.h>
-#include <rte_atomic.h>
 #include <rte_branch_prediction.h>
 #include <rte_mempool.h>
 #include <rte_malloc.h>
@@ -2521,9 +2520,9 @@ setup_hairpin_queues(portid_t pi, portid_t p_pi, uint16_t cnt_pi)
 			continue;
 
 		/* Fail to setup rx queue, return */
-		if (rte_atomic16_cmpset(&(port->port_status),
-					RTE_PORT_HANDLING,
-					RTE_PORT_STOPPED) == 0)
+		if (port->port_status == RTE_PORT_HANDLING)
+			port->port_status = RTE_PORT_STOPPED;
+		else
 			fprintf(stderr,
 				"Port %d can not be set back to stopped\n", pi);
 		fprintf(stderr, "Fail to configure port %d hairpin queues\n",
@@ -2544,9 +2543,9 @@ setup_hairpin_queues(portid_t pi, portid_t p_pi, uint16_t cnt_pi)
 			continue;
 
 		/* Fail to setup rx queue, return */
-		if (rte_atomic16_cmpset(&(port->port_status),
-					RTE_PORT_HANDLING,
-					RTE_PORT_STOPPED) == 0)
+		if (port->port_status == RTE_PORT_HANDLING)
+			port->port_status = RTE_PORT_STOPPED;
+		else
 			fprintf(stderr,
 				"Port %d can not be set back to stopped\n", pi);
 		fprintf(stderr, "Fail to configure port %d hairpin queues\n",
@@ -2729,8 +2728,9 @@ start_port(portid_t pid)
 
 		need_check_link_status = 0;
 		port = &ports[pi];
-		if (rte_atomic16_cmpset(&(port->port_status), RTE_PORT_STOPPED,
-						 RTE_PORT_HANDLING) == 0) {
+		if (port->port_status == RTE_PORT_STOPPED)
+			port->port_status = RTE_PORT_HANDLING;
+		else {
 			fprintf(stderr, "Port %d is now not stopped\n", pi);
 			continue;
 		}
@@ -2766,8 +2766,9 @@ start_port(portid_t pid)
 						     nb_txq + nb_hairpinq,
 						     &(port->dev_conf));
 			if (diag != 0) {
-				if (rte_atomic16_cmpset(&(port->port_status),
-				RTE_PORT_HANDLING, RTE_PORT_STOPPED) == 0)
+				if (port->port_status == RTE_PORT_HANDLING)
+					port->port_status = RTE_PORT_STOPPED;
+				else
 					fprintf(stderr,
 						"Port %d can not be set back to stopped\n",
 						pi);
@@ -2828,9 +2829,9 @@ start_port(portid_t pid)
 					continue;
 
 				/* Fail to setup tx queue, return */
-				if (rte_atomic16_cmpset(&(port->port_status),
-							RTE_PORT_HANDLING,
-							RTE_PORT_STOPPED) == 0)
+				if (port->port_status == RTE_PORT_HANDLING)
+					port->port_status = RTE_PORT_STOPPED;
+				else
 					fprintf(stderr,
 						"Port %d can not be set back to stopped\n",
 						pi);
@@ -2880,9 +2881,9 @@ start_port(portid_t pid)
 					continue;
 
 				/* Fail to setup rx queue, return */
-				if (rte_atomic16_cmpset(&(port->port_status),
-							RTE_PORT_HANDLING,
-							RTE_PORT_STOPPED) == 0)
+				if (port->port_status == RTE_PORT_HANDLING)
+					port->port_status = RTE_PORT_STOPPED;
+				else
 					fprintf(stderr,
 						"Port %d can not be set back to stopped\n",
 						pi);
@@ -2917,16 +2918,18 @@ start_port(portid_t pid)
 				pi, rte_strerror(-diag));
 
 			/* Fail to setup rx queue, return */
-			if (rte_atomic16_cmpset(&(port->port_status),
-				RTE_PORT_HANDLING, RTE_PORT_STOPPED) == 0)
+			if (port->port_status == RTE_PORT_HANDLING)
+				port->port_status = RTE_PORT_STOPPED;
+			else
 				fprintf(stderr,
 					"Port %d can not be set back to stopped\n",
 					pi);
 			continue;
 		}
 
-		if (rte_atomic16_cmpset(&(port->port_status),
-			RTE_PORT_HANDLING, RTE_PORT_STARTED) == 0)
+		if (port->port_status == RTE_PORT_HANDLING)
+			port->port_status = RTE_PORT_STARTED;
+		else
 			fprintf(stderr, "Port %d can not be set into started\n",
 				pi);
 
@@ -3028,8 +3031,9 @@ stop_port(portid_t pid)
 		}
 
 		port = &ports[pi];
-		if (rte_atomic16_cmpset(&(port->port_status), RTE_PORT_STARTED,
-						RTE_PORT_HANDLING) == 0)
+		if (port->port_status == RTE_PORT_STARTED)
+			port->port_status = RTE_PORT_HANDLING;
+		else
 			continue;
 
 		if (hairpin_mode & 0xf) {
@@ -3055,8 +3059,9 @@ stop_port(portid_t pid)
 			RTE_LOG(ERR, EAL, "rte_eth_dev_stop failed for port %u\n",
 				pi);
 
-		if (rte_atomic16_cmpset(&(port->port_status),
-			RTE_PORT_HANDLING, RTE_PORT_STOPPED) == 0)
+		if (port->port_status == RTE_PORT_HANDLING)
+			port->port_status = RTE_PORT_STOPPED;
+		else
 			fprintf(stderr, "Port %d can not be set into stopped\n",
 				pi);
 		need_check_link_status = 1;
@@ -3119,8 +3124,7 @@ close_port(portid_t pid)
 		}
 
 		port = &ports[pi];
-		if (rte_atomic16_cmpset(&(port->port_status),
-			RTE_PORT_CLOSED, RTE_PORT_CLOSED) == 1) {
+		if (port->port_status == RTE_PORT_CLOSED) {
 			fprintf(stderr, "Port %d is already closed\n", pi);
 			continue;
 		}
-- 
2.25.1



* [PATCH v2 11/12] app/bbdev: use compiler atomics for shared data sync
  2021-11-16  9:41 [PATCH v2 00/12] use compiler atomic builtins for app modules Joyce Kong
                   ` (9 preceding siblings ...)
  2021-11-16  9:42 ` [PATCH v2 10/12] app/testpmd: remove atomic operations for port status Joyce Kong
@ 2021-11-16  9:42 ` Joyce Kong
  2021-11-16  9:42 ` [PATCH v2 12/12] app: remove unnecessary include of atomic header file Joyce Kong
  2021-11-17  8:21 ` [PATCH v3 00/12] use compiler atomic builtins for app modules Joyce Kong
  12 siblings, 0 replies; 36+ messages in thread
From: Joyce Kong @ 2021-11-16  9:42 UTC (permalink / raw)
  To: Nicolas Chautru; +Cc: dev, honnappa.nagarahalli, nd, Joyce Kong, Ruifeng Wang

Convert rte_atomic usages to compiler atomic built-ins
for shared data sync in bbdev cases.
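
For illustration only, a sketch of the three access patterns converted
in this patch: a relaxed store to release the workers, a polling wait
on a 16-bit value, and a relaxed fetch-add on the dequeue counter.
wait_until_equal_16() is a simplified stand-in for
rte_wait_until_equal_16(), and the other helper names and the SYNC_*
values are hypothetical.

```c
#include <stdint.h>

/* Hypothetical values for illustration only. */
enum { SYNC_WAIT, SYNC_START };

static uint16_t sync_flag = SYNC_WAIT;
static uint16_t nb_dequeued;

/* Simplified stand-in for rte_wait_until_equal_16(): spin until the
 * value at addr equals expected, using relaxed loads.
 */
static void
wait_until_equal_16(uint16_t *addr, uint16_t expected)
{
	while (__atomic_load_n(addr, __ATOMIC_RELAXED) != expected)
		; /* a real implementation would pause or yield here */
}

/* Main lcore: let the workers start. */
static void
release_workers(void)
{
	__atomic_store_n(&sync_flag, SYNC_START, __ATOMIC_RELAXED);
}

/* Dequeue callback: account for deq more operations and return the
 * new total.
 */
static uint16_t
account_dequeued(uint16_t deq)
{
	return __atomic_fetch_add(&nb_dequeued, deq, __ATOMIC_RELAXED) + deq;
}
```

A worker would call wait_until_equal_16(&sync_flag, SYNC_START) before
starting its measurement loop.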

Signed-off-by: Joyce Kong <joyce.kong@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
---
 app/test-bbdev/test_bbdev_perf.c | 135 ++++++++++++++-----------------
 1 file changed, 59 insertions(+), 76 deletions(-)

diff --git a/app/test-bbdev/test_bbdev_perf.c b/app/test-bbdev/test_bbdev_perf.c
index 7b4529789b..0fa119a502 100644
--- a/app/test-bbdev/test_bbdev_perf.c
+++ b/app/test-bbdev/test_bbdev_perf.c
@@ -133,7 +133,7 @@ struct test_op_params {
 	uint16_t num_to_process;
 	uint16_t num_lcores;
 	int vector_mask;
-	rte_atomic16_t sync;
+	uint16_t sync;
 	struct test_buffers q_bufs[RTE_MAX_NUMA_NODES][MAX_QUEUES];
 };
 
@@ -148,9 +148,9 @@ struct thread_params {
 	uint8_t iter_count;
 	double iter_average;
 	double bler;
-	rte_atomic16_t nb_dequeued;
-	rte_atomic16_t processing_status;
-	rte_atomic16_t burst_sz;
+	uint16_t nb_dequeued;
+	int16_t processing_status;
+	uint16_t burst_sz;
 	struct test_op_params *op_params;
 	struct rte_bbdev_dec_op *dec_ops[MAX_BURST];
 	struct rte_bbdev_enc_op *enc_ops[MAX_BURST];
@@ -2637,46 +2637,46 @@ dequeue_event_callback(uint16_t dev_id,
 	}
 
 	if (unlikely(event != RTE_BBDEV_EVENT_DEQUEUE)) {
-		rte_atomic16_set(&tp->processing_status, TEST_FAILED);
+		__atomic_store_n(&tp->processing_status, TEST_FAILED, __ATOMIC_RELAXED);
 		printf(
 			"Dequeue interrupt handler called for incorrect event!\n");
 		return;
 	}
 
-	burst_sz = rte_atomic16_read(&tp->burst_sz);
+	burst_sz = __atomic_load_n(&tp->burst_sz, __ATOMIC_RELAXED);
 	num_ops = tp->op_params->num_to_process;
 
 	if (test_vector.op_type == RTE_BBDEV_OP_TURBO_DEC)
 		deq = rte_bbdev_dequeue_dec_ops(dev_id, queue_id,
 				&tp->dec_ops[
-					rte_atomic16_read(&tp->nb_dequeued)],
+					__atomic_load_n(&tp->nb_dequeued, __ATOMIC_RELAXED)],
 				burst_sz);
 	else if (test_vector.op_type == RTE_BBDEV_OP_LDPC_DEC)
 		deq = rte_bbdev_dequeue_ldpc_dec_ops(dev_id, queue_id,
 				&tp->dec_ops[
-					rte_atomic16_read(&tp->nb_dequeued)],
+					__atomic_load_n(&tp->nb_dequeued, __ATOMIC_RELAXED)],
 				burst_sz);
 	else if (test_vector.op_type == RTE_BBDEV_OP_LDPC_ENC)
 		deq = rte_bbdev_dequeue_ldpc_enc_ops(dev_id, queue_id,
 				&tp->enc_ops[
-					rte_atomic16_read(&tp->nb_dequeued)],
+					__atomic_load_n(&tp->nb_dequeued, __ATOMIC_RELAXED)],
 				burst_sz);
 	else /*RTE_BBDEV_OP_TURBO_ENC*/
 		deq = rte_bbdev_dequeue_enc_ops(dev_id, queue_id,
 				&tp->enc_ops[
-					rte_atomic16_read(&tp->nb_dequeued)],
+					__atomic_load_n(&tp->nb_dequeued, __ATOMIC_RELAXED)],
 				burst_sz);
 
 	if (deq < burst_sz) {
 		printf(
 			"After receiving the interrupt all operations should be dequeued. Expected: %u, got: %u\n",
 			burst_sz, deq);
-		rte_atomic16_set(&tp->processing_status, TEST_FAILED);
+		__atomic_store_n(&tp->processing_status, TEST_FAILED, __ATOMIC_RELAXED);
 		return;
 	}
 
-	if (rte_atomic16_read(&tp->nb_dequeued) + deq < num_ops) {
-		rte_atomic16_add(&tp->nb_dequeued, deq);
+	if (__atomic_load_n(&tp->nb_dequeued, __ATOMIC_RELAXED) + deq < num_ops) {
+		__atomic_fetch_add(&tp->nb_dequeued, deq, __ATOMIC_RELAXED);
 		return;
 	}
 
@@ -2713,7 +2713,7 @@ dequeue_event_callback(uint16_t dev_id,
 
 	if (ret) {
 		printf("Buffers validation failed\n");
-		rte_atomic16_set(&tp->processing_status, TEST_FAILED);
+		__atomic_store_n(&tp->processing_status, TEST_FAILED, __ATOMIC_RELAXED);
 	}
 
 	switch (test_vector.op_type) {
@@ -2734,7 +2734,7 @@ dequeue_event_callback(uint16_t dev_id,
 		break;
 	default:
 		printf("Unknown op type: %d\n", test_vector.op_type);
-		rte_atomic16_set(&tp->processing_status, TEST_FAILED);
+		__atomic_store_n(&tp->processing_status, TEST_FAILED, __ATOMIC_RELAXED);
 		return;
 	}
 
@@ -2743,7 +2743,7 @@ dequeue_event_callback(uint16_t dev_id,
 	tp->mbps += (((double)(num_ops * tb_len_bits)) / 1000000.0) /
 			((double)total_time / (double)rte_get_tsc_hz());
 
-	rte_atomic16_add(&tp->nb_dequeued, deq);
+	__atomic_fetch_add(&tp->nb_dequeued, deq, __ATOMIC_RELAXED);
 }
 
 static int
@@ -2781,11 +2781,10 @@ throughput_intr_lcore_ldpc_dec(void *arg)
 
 	bufs = &tp->op_params->q_bufs[GET_SOCKET(info.socket_id)][queue_id];
 
-	rte_atomic16_clear(&tp->processing_status);
-	rte_atomic16_clear(&tp->nb_dequeued);
+	__atomic_store_n(&tp->processing_status, 0, __ATOMIC_RELAXED);
+	__atomic_store_n(&tp->nb_dequeued, 0, __ATOMIC_RELAXED);
 
-	while (rte_atomic16_read(&tp->op_params->sync) == SYNC_WAIT)
-		rte_pause();
+	rte_wait_until_equal_16(&tp->op_params->sync, SYNC_START, __ATOMIC_RELAXED);
 
 	ret = rte_bbdev_dec_op_alloc_bulk(tp->op_params->mp, ops,
 				num_to_process);
@@ -2833,17 +2832,15 @@ throughput_intr_lcore_ldpc_dec(void *arg)
 			 * the number of operations is not a multiple of
 			 * burst size.
 			 */
-			rte_atomic16_set(&tp->burst_sz, num_to_enq);
+			__atomic_store_n(&tp->burst_sz, num_to_enq, __ATOMIC_RELAXED);
 
 			/* Wait until processing of previous batch is
 			 * completed
 			 */
-			while (rte_atomic16_read(&tp->nb_dequeued) !=
-					(int16_t) enqueued)
-				rte_pause();
+			rte_wait_until_equal_16(&tp->nb_dequeued, enqueued, __ATOMIC_RELAXED);
 		}
 		if (j != TEST_REPETITIONS - 1)
-			rte_atomic16_clear(&tp->nb_dequeued);
+			__atomic_store_n(&tp->nb_dequeued, 0, __ATOMIC_RELAXED);
 	}
 
 	return TEST_SUCCESS;
@@ -2878,11 +2875,10 @@ throughput_intr_lcore_dec(void *arg)
 
 	bufs = &tp->op_params->q_bufs[GET_SOCKET(info.socket_id)][queue_id];
 
-	rte_atomic16_clear(&tp->processing_status);
-	rte_atomic16_clear(&tp->nb_dequeued);
+	__atomic_store_n(&tp->processing_status, 0, __ATOMIC_RELAXED);
+	__atomic_store_n(&tp->nb_dequeued, 0, __ATOMIC_RELAXED);
 
-	while (rte_atomic16_read(&tp->op_params->sync) == SYNC_WAIT)
-		rte_pause();
+	rte_wait_until_equal_16(&tp->op_params->sync, SYNC_START, __ATOMIC_RELAXED);
 
 	ret = rte_bbdev_dec_op_alloc_bulk(tp->op_params->mp, ops,
 				num_to_process);
@@ -2923,17 +2919,15 @@ throughput_intr_lcore_dec(void *arg)
 			 * the number of operations is not a multiple of
 			 * burst size.
 			 */
-			rte_atomic16_set(&tp->burst_sz, num_to_enq);
+			__atomic_store_n(&tp->burst_sz, num_to_enq, __ATOMIC_RELAXED);
 
 			/* Wait until processing of previous batch is
 			 * completed
 			 */
-			while (rte_atomic16_read(&tp->nb_dequeued) !=
-					(int16_t) enqueued)
-				rte_pause();
+			rte_wait_until_equal_16(&tp->nb_dequeued, enqueued, __ATOMIC_RELAXED);
 		}
 		if (j != TEST_REPETITIONS - 1)
-			rte_atomic16_clear(&tp->nb_dequeued);
+			__atomic_store_n(&tp->nb_dequeued, 0, __ATOMIC_RELAXED);
 	}
 
 	return TEST_SUCCESS;
@@ -2968,11 +2962,10 @@ throughput_intr_lcore_enc(void *arg)
 
 	bufs = &tp->op_params->q_bufs[GET_SOCKET(info.socket_id)][queue_id];
 
-	rte_atomic16_clear(&tp->processing_status);
-	rte_atomic16_clear(&tp->nb_dequeued);
+	__atomic_store_n(&tp->processing_status, 0, __ATOMIC_RELAXED);
+	__atomic_store_n(&tp->nb_dequeued, 0, __ATOMIC_RELAXED);
 
-	while (rte_atomic16_read(&tp->op_params->sync) == SYNC_WAIT)
-		rte_pause();
+	rte_wait_until_equal_16(&tp->op_params->sync, SYNC_START, __ATOMIC_RELAXED);
 
 	ret = rte_bbdev_enc_op_alloc_bulk(tp->op_params->mp, ops,
 			num_to_process);
@@ -3012,17 +3005,15 @@ throughput_intr_lcore_enc(void *arg)
 			 * the number of operations is not a multiple of
 			 * burst size.
 			 */
-			rte_atomic16_set(&tp->burst_sz, num_to_enq);
+			__atomic_store_n(&tp->burst_sz, num_to_enq, __ATOMIC_RELAXED);
 
 			/* Wait until processing of previous batch is
 			 * completed
 			 */
-			while (rte_atomic16_read(&tp->nb_dequeued) !=
-					(int16_t) enqueued)
-				rte_pause();
+			rte_wait_until_equal_16(&tp->nb_dequeued, enqueued, __ATOMIC_RELAXED);
 		}
 		if (j != TEST_REPETITIONS - 1)
-			rte_atomic16_clear(&tp->nb_dequeued);
+			__atomic_store_n(&tp->nb_dequeued, 0, __ATOMIC_RELAXED);
 	}
 
 	return TEST_SUCCESS;
@@ -3058,11 +3049,10 @@ throughput_intr_lcore_ldpc_enc(void *arg)
 
 	bufs = &tp->op_params->q_bufs[GET_SOCKET(info.socket_id)][queue_id];
 
-	rte_atomic16_clear(&tp->processing_status);
-	rte_atomic16_clear(&tp->nb_dequeued);
+	__atomic_store_n(&tp->processing_status, 0, __ATOMIC_RELAXED);
+	__atomic_store_n(&tp->nb_dequeued, 0, __ATOMIC_RELAXED);
 
-	while (rte_atomic16_read(&tp->op_params->sync) == SYNC_WAIT)
-		rte_pause();
+	rte_wait_until_equal_16(&tp->op_params->sync, SYNC_START, __ATOMIC_RELAXED);
 
 	ret = rte_bbdev_enc_op_alloc_bulk(tp->op_params->mp, ops,
 			num_to_process);
@@ -3104,17 +3094,15 @@ throughput_intr_lcore_ldpc_enc(void *arg)
 			 * the number of operations is not a multiple of
 			 * burst size.
 			 */
-			rte_atomic16_set(&tp->burst_sz, num_to_enq);
+			__atomic_store_n(&tp->burst_sz, num_to_enq, __ATOMIC_RELAXED);
 
 			/* Wait until processing of previous batch is
 			 * completed
 			 */
-			while (rte_atomic16_read(&tp->nb_dequeued) !=
-					(int16_t) enqueued)
-				rte_pause();
+			rte_wait_until_equal_16(&tp->nb_dequeued, enqueued, __ATOMIC_RELAXED);
 		}
 		if (j != TEST_REPETITIONS - 1)
-			rte_atomic16_clear(&tp->nb_dequeued);
+			__atomic_store_n(&tp->nb_dequeued, 0, __ATOMIC_RELAXED);
 	}
 
 	return TEST_SUCCESS;
@@ -3148,8 +3136,7 @@ throughput_pmd_lcore_dec(void *arg)
 
 	bufs = &tp->op_params->q_bufs[GET_SOCKET(info.socket_id)][queue_id];
 
-	while (rte_atomic16_read(&tp->op_params->sync) == SYNC_WAIT)
-		rte_pause();
+	rte_wait_until_equal_16(&tp->op_params->sync, SYNC_START, __ATOMIC_RELAXED);
 
 	ret = rte_bbdev_dec_op_alloc_bulk(tp->op_params->mp, ops_enq, num_ops);
 	TEST_ASSERT_SUCCESS(ret, "Allocation failed for %d ops", num_ops);
@@ -3252,8 +3239,7 @@ bler_pmd_lcore_ldpc_dec(void *arg)
 
 	bufs = &tp->op_params->q_bufs[GET_SOCKET(info.socket_id)][queue_id];
 
-	while (rte_atomic16_read(&tp->op_params->sync) == SYNC_WAIT)
-		rte_pause();
+	rte_wait_until_equal_16(&tp->op_params->sync, SYNC_START, __ATOMIC_RELAXED);
 
 	ret = rte_bbdev_dec_op_alloc_bulk(tp->op_params->mp, ops_enq, num_ops);
 	TEST_ASSERT_SUCCESS(ret, "Allocation failed for %d ops", num_ops);
@@ -3382,8 +3368,7 @@ throughput_pmd_lcore_ldpc_dec(void *arg)
 
 	bufs = &tp->op_params->q_bufs[GET_SOCKET(info.socket_id)][queue_id];
 
-	while (rte_atomic16_read(&tp->op_params->sync) == SYNC_WAIT)
-		rte_pause();
+	rte_wait_until_equal_16(&tp->op_params->sync, SYNC_START, __ATOMIC_RELAXED);
 
 	ret = rte_bbdev_dec_op_alloc_bulk(tp->op_params->mp, ops_enq, num_ops);
 	TEST_ASSERT_SUCCESS(ret, "Allocation failed for %d ops", num_ops);
@@ -3499,8 +3484,7 @@ throughput_pmd_lcore_enc(void *arg)
 
 	bufs = &tp->op_params->q_bufs[GET_SOCKET(info.socket_id)][queue_id];
 
-	while (rte_atomic16_read(&tp->op_params->sync) == SYNC_WAIT)
-		rte_pause();
+	rte_wait_until_equal_16(&tp->op_params->sync, SYNC_START, __ATOMIC_RELAXED);
 
 	ret = rte_bbdev_enc_op_alloc_bulk(tp->op_params->mp, ops_enq,
 			num_ops);
@@ -3590,8 +3574,7 @@ throughput_pmd_lcore_ldpc_enc(void *arg)
 
 	bufs = &tp->op_params->q_bufs[GET_SOCKET(info.socket_id)][queue_id];
 
-	while (rte_atomic16_read(&tp->op_params->sync) == SYNC_WAIT)
-		rte_pause();
+	rte_wait_until_equal_16(&tp->op_params->sync, SYNC_START, __ATOMIC_RELAXED);
 
 	ret = rte_bbdev_enc_op_alloc_bulk(tp->op_params->mp, ops_enq,
 			num_ops);
@@ -3774,7 +3757,7 @@ bler_test(struct active_device *ad,
 	else
 		return TEST_SKIPPED;
 
-	rte_atomic16_set(&op_params->sync, SYNC_WAIT);
+	__atomic_store_n(&op_params->sync, SYNC_WAIT, __ATOMIC_RELAXED);
 
 	/* Main core is set at first entry */
 	t_params[0].dev_id = ad->dev_id;
@@ -3797,7 +3780,7 @@ bler_test(struct active_device *ad,
 				&t_params[used_cores++], lcore_id);
 	}
 
-	rte_atomic16_set(&op_params->sync, SYNC_START);
+	__atomic_store_n(&op_params->sync, SYNC_START, __ATOMIC_RELAXED);
 	ret = bler_function(&t_params[0]);
 
 	/* Main core is always used */
@@ -3892,7 +3875,7 @@ throughput_test(struct active_device *ad,
 			throughput_function = throughput_pmd_lcore_enc;
 	}
 
-	rte_atomic16_set(&op_params->sync, SYNC_WAIT);
+	__atomic_store_n(&op_params->sync, SYNC_WAIT, __ATOMIC_RELAXED);
 
 	/* Main core is set at first entry */
 	t_params[0].dev_id = ad->dev_id;
@@ -3915,7 +3898,7 @@ throughput_test(struct active_device *ad,
 				&t_params[used_cores++], lcore_id);
 	}
 
-	rte_atomic16_set(&op_params->sync, SYNC_START);
+	__atomic_store_n(&op_params->sync, SYNC_START, __ATOMIC_RELAXED);
 	ret = throughput_function(&t_params[0]);
 
 	/* Main core is always used */
@@ -3945,29 +3928,29 @@ throughput_test(struct active_device *ad,
 	 * Wait for main lcore operations.
 	 */
 	tp = &t_params[0];
-	while ((rte_atomic16_read(&tp->nb_dequeued) <
-			op_params->num_to_process) &&
-			(rte_atomic16_read(&tp->processing_status) !=
-			TEST_FAILED))
+	while ((__atomic_load_n(&tp->nb_dequeued, __ATOMIC_RELAXED) <
+		op_params->num_to_process) &&
+		(__atomic_load_n(&tp->processing_status, __ATOMIC_RELAXED) !=
+		TEST_FAILED))
 		rte_pause();
 
 	tp->ops_per_sec /= TEST_REPETITIONS;
 	tp->mbps /= TEST_REPETITIONS;
-	ret |= (int)rte_atomic16_read(&tp->processing_status);
+	ret |= (int)__atomic_load_n(&tp->processing_status, __ATOMIC_RELAXED);
 
 	/* Wait for worker lcores operations */
 	for (used_cores = 1; used_cores < num_lcores; used_cores++) {
 		tp = &t_params[used_cores];
 
-		while ((rte_atomic16_read(&tp->nb_dequeued) <
-				op_params->num_to_process) &&
-				(rte_atomic16_read(&tp->processing_status) !=
-				TEST_FAILED))
+		while ((__atomic_load_n(&tp->nb_dequeued, __ATOMIC_RELAXED) <
+			op_params->num_to_process) &&
+			(__atomic_load_n(&tp->processing_status, __ATOMIC_RELAXED) !=
+			TEST_FAILED))
 			rte_pause();
 
 		tp->ops_per_sec /= TEST_REPETITIONS;
 		tp->mbps /= TEST_REPETITIONS;
-		ret |= (int)rte_atomic16_read(&tp->processing_status);
+		ret |= (int)__atomic_load_n(&tp->processing_status, __ATOMIC_RELAXED);
 	}
 
 	/* Print throughput if test passed */
-- 
2.25.1


^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH v2 12/12] app: remove unnecessary include of atomic header file
  2021-11-16  9:41 [PATCH v2 00/12] use compiler atomic builtins for app modules Joyce Kong
                   ` (10 preceding siblings ...)
  2021-11-16  9:42 ` [PATCH v2 11/12] app/bbdev: use compiler atomics for shared data sync Joyce Kong
@ 2021-11-16  9:42 ` Joyce Kong
  2021-11-16 20:23   ` David Marchand
  2021-11-17  8:21 ` [PATCH v3 00/12] use compiler atomic builtins for app modules Joyce Kong
  12 siblings, 1 reply; 36+ messages in thread
From: Joyce Kong @ 2021-11-16  9:42 UTC (permalink / raw)
  To: Maryam Tahhan, Reshma Pattan, Cristian Dumitrescu, Xiaoyun Li,
	Olivier Matz, Anatoly Burakov, Honnappa Nagarahalli,
	Konstantin Ananyev
  Cc: dev, nd, Joyce Kong, Ruifeng Wang

Remove the unnecessary rte_atomic.h included in app modules.

Signed-off-by: Joyce Kong <joyce.kong@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
---
 app/proc-info/main.c         | 1 -
 app/test-pipeline/config.c   | 1 -
 app/test-pipeline/init.c     | 1 -
 app/test-pipeline/main.c     | 1 -
 app/test-pipeline/runtime.c  | 1 -
 app/test-pmd/cmdline.c       | 1 -
 app/test-pmd/config.c        | 1 -
 app/test-pmd/csumonly.c      | 1 -
 app/test-pmd/flowgen.c       | 1 -
 app/test-pmd/icmpecho.c      | 1 -
 app/test-pmd/iofwd.c         | 1 -
 app/test-pmd/macfwd.c        | 1 -
 app/test-pmd/macswap.c       | 1 -
 app/test-pmd/parameters.c    | 1 -
 app/test-pmd/rxonly.c        | 1 -
 app/test-pmd/txonly.c        | 1 -
 app/test/test_barrier.c      | 1 -
 app/test/test_mbuf.c         | 1 -
 app/test/test_mp_secondary.c | 1 -
 app/test/test_ring.c         | 1 -
 20 files changed, 20 deletions(-)

diff --git a/app/proc-info/main.c b/app/proc-info/main.c
index a4271047e6..ebe2d77264 100644
--- a/app/proc-info/main.c
+++ b/app/proc-info/main.c
@@ -27,7 +27,6 @@
 #include <rte_per_lcore.h>
 #include <rte_lcore.h>
 #include <rte_log.h>
-#include <rte_atomic.h>
 #include <rte_branch_prediction.h>
 #include <rte_string_fns.h>
 #include <rte_metrics.h>
diff --git a/app/test-pipeline/config.c b/app/test-pipeline/config.c
index 33f3f1c827..daf838948b 100644
--- a/app/test-pipeline/config.c
+++ b/app/test-pipeline/config.c
@@ -21,7 +21,6 @@
 #include <rte_eal.h>
 #include <rte_per_lcore.h>
 #include <rte_launch.h>
-#include <rte_atomic.h>
 #include <rte_cycles.h>
 #include <rte_prefetch.h>
 #include <rte_lcore.h>
diff --git a/app/test-pipeline/init.c b/app/test-pipeline/init.c
index c738019041..eee0719b67 100644
--- a/app/test-pipeline/init.c
+++ b/app/test-pipeline/init.c
@@ -21,7 +21,6 @@
 #include <rte_eal.h>
 #include <rte_per_lcore.h>
 #include <rte_launch.h>
-#include <rte_atomic.h>
 #include <rte_cycles.h>
 #include <rte_prefetch.h>
 #include <rte_lcore.h>
diff --git a/app/test-pipeline/main.c b/app/test-pipeline/main.c
index 72e4797ff2..1e16794183 100644
--- a/app/test-pipeline/main.c
+++ b/app/test-pipeline/main.c
@@ -22,7 +22,6 @@
 #include <rte_eal.h>
 #include <rte_per_lcore.h>
 #include <rte_launch.h>
-#include <rte_atomic.h>
 #include <rte_cycles.h>
 #include <rte_prefetch.h>
 #include <rte_lcore.h>
diff --git a/app/test-pipeline/runtime.c b/app/test-pipeline/runtime.c
index 159192bcd8..d939a85d7e 100644
--- a/app/test-pipeline/runtime.c
+++ b/app/test-pipeline/runtime.c
@@ -21,7 +21,6 @@
 #include <rte_eal.h>
 #include <rte_per_lcore.h>
 #include <rte_launch.h>
-#include <rte_atomic.h>
 #include <rte_cycles.h>
 #include <rte_prefetch.h>
 #include <rte_branch_prediction.h>
diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c
index 4f51b259fe..4e93f535ff 100644
--- a/app/test-pmd/cmdline.c
+++ b/app/test-pmd/cmdline.c
@@ -24,7 +24,6 @@
 #include <rte_eal.h>
 #include <rte_per_lcore.h>
 #include <rte_lcore.h>
-#include <rte_atomic.h>
 #include <rte_branch_prediction.h>
 #include <rte_ring.h>
 #include <rte_mempool.h>
diff --git a/app/test-pmd/config.c b/app/test-pmd/config.c
index 26cadf39f7..d8b5032b58 100644
--- a/app/test-pmd/config.c
+++ b/app/test-pmd/config.c
@@ -27,7 +27,6 @@
 #include <rte_eal.h>
 #include <rte_per_lcore.h>
 #include <rte_lcore.h>
-#include <rte_atomic.h>
 #include <rte_branch_prediction.h>
 #include <rte_mempool.h>
 #include <rte_mbuf.h>
diff --git a/app/test-pmd/csumonly.c b/app/test-pmd/csumonly.c
index 8526d9158a..e0b00abe8c 100644
--- a/app/test-pmd/csumonly.c
+++ b/app/test-pmd/csumonly.c
@@ -24,7 +24,6 @@
 #include <rte_eal.h>
 #include <rte_per_lcore.h>
 #include <rte_lcore.h>
-#include <rte_atomic.h>
 #include <rte_branch_prediction.h>
 #include <rte_mempool.h>
 #include <rte_mbuf.h>
diff --git a/app/test-pmd/flowgen.c b/app/test-pmd/flowgen.c
index 5737eaa105..9ceef3b54a 100644
--- a/app/test-pmd/flowgen.c
+++ b/app/test-pmd/flowgen.c
@@ -24,7 +24,6 @@
 #include <rte_eal.h>
 #include <rte_per_lcore.h>
 #include <rte_lcore.h>
-#include <rte_atomic.h>
 #include <rte_branch_prediction.h>
 #include <rte_mempool.h>
 #include <rte_mbuf.h>
diff --git a/app/test-pmd/icmpecho.c b/app/test-pmd/icmpecho.c
index 8f1d68a83a..3a85ec3dd1 100644
--- a/app/test-pmd/icmpecho.c
+++ b/app/test-pmd/icmpecho.c
@@ -20,7 +20,6 @@
 #include <rte_cycles.h>
 #include <rte_per_lcore.h>
 #include <rte_lcore.h>
-#include <rte_atomic.h>
 #include <rte_branch_prediction.h>
 #include <rte_memory.h>
 #include <rte_mempool.h>
diff --git a/app/test-pmd/iofwd.c b/app/test-pmd/iofwd.c
index 83d098adcb..19cd920f70 100644
--- a/app/test-pmd/iofwd.c
+++ b/app/test-pmd/iofwd.c
@@ -23,7 +23,6 @@
 #include <rte_eal.h>
 #include <rte_per_lcore.h>
 #include <rte_lcore.h>
-#include <rte_atomic.h>
 #include <rte_branch_prediction.h>
 #include <rte_memcpy.h>
 #include <rte_mempool.h>
diff --git a/app/test-pmd/macfwd.c b/app/test-pmd/macfwd.c
index ac50d0b9f8..812a0c721f 100644
--- a/app/test-pmd/macfwd.c
+++ b/app/test-pmd/macfwd.c
@@ -24,7 +24,6 @@
 #include <rte_eal.h>
 #include <rte_per_lcore.h>
 #include <rte_lcore.h>
-#include <rte_atomic.h>
 #include <rte_branch_prediction.h>
 #include <rte_mempool.h>
 #include <rte_mbuf.h>
diff --git a/app/test-pmd/macswap.c b/app/test-pmd/macswap.c
index 310bca06af..4627ff83e9 100644
--- a/app/test-pmd/macswap.c
+++ b/app/test-pmd/macswap.c
@@ -24,7 +24,6 @@
 #include <rte_eal.h>
 #include <rte_per_lcore.h>
 #include <rte_lcore.h>
-#include <rte_atomic.h>
 #include <rte_branch_prediction.h>
 #include <rte_mempool.h>
 #include <rte_mbuf.h>
diff --git a/app/test-pmd/parameters.c b/app/test-pmd/parameters.c
index 0974b0a38f..2f4f944efa 100644
--- a/app/test-pmd/parameters.c
+++ b/app/test-pmd/parameters.c
@@ -30,7 +30,6 @@
 #include <rte_eal.h>
 #include <rte_per_lcore.h>
 #include <rte_lcore.h>
-#include <rte_atomic.h>
 #include <rte_branch_prediction.h>
 #include <rte_mempool.h>
 #include <rte_interrupts.h>
diff --git a/app/test-pmd/rxonly.c b/app/test-pmd/rxonly.c
index c78fc4609a..d1a579d8d8 100644
--- a/app/test-pmd/rxonly.c
+++ b/app/test-pmd/rxonly.c
@@ -24,7 +24,6 @@
 #include <rte_eal.h>
 #include <rte_per_lcore.h>
 #include <rte_lcore.h>
-#include <rte_atomic.h>
 #include <rte_branch_prediction.h>
 #include <rte_mempool.h>
 #include <rte_mbuf.h>
diff --git a/app/test-pmd/txonly.c b/app/test-pmd/txonly.c
index 34bb538379..b8497e733d 100644
--- a/app/test-pmd/txonly.c
+++ b/app/test-pmd/txonly.c
@@ -24,7 +24,6 @@
 #include <rte_eal.h>
 #include <rte_per_lcore.h>
 #include <rte_lcore.h>
-#include <rte_atomic.h>
 #include <rte_branch_prediction.h>
 #include <rte_mempool.h>
 #include <rte_mbuf.h>
diff --git a/app/test/test_barrier.c b/app/test/test_barrier.c
index c27f8a0742..898c2516ed 100644
--- a/app/test/test_barrier.c
+++ b/app/test/test_barrier.c
@@ -24,7 +24,6 @@
 #include <rte_memory.h>
 #include <rte_per_lcore.h>
 #include <rte_launch.h>
-#include <rte_atomic.h>
 #include <rte_eal.h>
 #include <rte_lcore.h>
 #include <rte_pause.h>
diff --git a/app/test/test_mbuf.c b/app/test/test_mbuf.c
index f93bcef8a9..d53126710f 100644
--- a/app/test/test_mbuf.c
+++ b/app/test/test_mbuf.c
@@ -21,7 +21,6 @@
 #include <rte_eal.h>
 #include <rte_per_lcore.h>
 #include <rte_lcore.h>
-#include <rte_atomic.h>
 #include <rte_branch_prediction.h>
 #include <rte_ring.h>
 #include <rte_mempool.h>
diff --git a/app/test/test_mp_secondary.c b/app/test/test_mp_secondary.c
index 5b6f05dbb1..021ca0547f 100644
--- a/app/test/test_mp_secondary.c
+++ b/app/test/test_mp_secondary.c
@@ -28,7 +28,6 @@
 #include <rte_lcore.h>
 #include <rte_errno.h>
 #include <rte_branch_prediction.h>
-#include <rte_atomic.h>
 #include <rte_ring.h>
 #include <rte_debug.h>
 #include <rte_log.h>
diff --git a/app/test/test_ring.c b/app/test/test_ring.c
index fb8532a409..bde33ab4a1 100644
--- a/app/test/test_ring.c
+++ b/app/test/test_ring.c
@@ -20,7 +20,6 @@
 #include <rte_eal.h>
 #include <rte_per_lcore.h>
 #include <rte_lcore.h>
-#include <rte_atomic.h>
 #include <rte_branch_prediction.h>
 #include <rte_malloc.h>
 #include <rte_ring.h>
-- 
2.25.1


^ permalink raw reply	[flat|nested] 36+ messages in thread

* RE: [PATCH v2 03/12] test/timer: use compiler atomic builtins for sync
  2021-11-16  9:41 ` [PATCH v2 03/12] test/timer: use compiler atomic builtins for sync Joyce Kong
@ 2021-11-16 19:52   ` Honnappa Nagarahalli
  2021-11-16 20:20   ` David Marchand
  1 sibling, 0 replies; 36+ messages in thread
From: Honnappa Nagarahalli @ 2021-11-16 19:52 UTC (permalink / raw)
  To: Joyce Kong, Robert Sanford, Erik Gabriel Carrillo
  Cc: dev, nd, Joyce Kong, Ruifeng Wang, nd

<snip>

> 
> Convert rte_atomic usages to compiler atomic built-ins for lcore_state and
> collisions sync.
> 
> Also, move 'main_init_workers' outside of 'timer_stress2_main_loop' to
> guarantee lcore_state initialized correctly before the threads launched.
> 
> Signed-off-by: Joyce Kong <joyce.kong@arm.com>
> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
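
For readers following the memory ordering here, a minimal standalone
sketch of the release/acquire handshake the patch moves to. It is not
DPDK code: pthreads stand in for lcores, wait_until_equal_16() is a
hypothetical stand-in for rte_wait_until_equal_16() with the acquire
ordering hard-coded, and the state machine is cut down to three states.
The names input, output and run_handshake are assumptions made up for
the illustration.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

enum { WORKER_WAITING = 1, WORKER_RUN_SIGNAL, WORKER_FINISHED };

static uint16_t lcore_state; /* plain integer, accessed only via builtins */
static int input;            /* ordinary data written by main before the signal */
static int output;           /* ordinary data written by the worker before it finishes */

/* Hypothetical stand-in for rte_wait_until_equal_16(): spin with acquire loads. */
static void
wait_until_equal_16(uint16_t *addr, uint16_t expected)
{
	while (__atomic_load_n(addr, __ATOMIC_ACQUIRE) != expected)
		;
}

static void *
worker(void *arg)
{
	(void)arg;
	/* The acquire load pairs with main's release store, so 'input' is visible. */
	wait_until_equal_16(&lcore_state, WORKER_RUN_SIGNAL);
	output = input * 2;
	/* The release store publishes 'output' to whoever acquires FINISHED. */
	__atomic_store_n(&lcore_state, WORKER_FINISHED, __ATOMIC_RELEASE);
	return NULL;
}

static int
run_handshake(int value)
{
	pthread_t tid;
	int rc;

	/* Initialise the state before the thread exists, as main_init_workers() does. */
	__atomic_store_n(&lcore_state, WORKER_WAITING, __ATOMIC_RELAXED);
	rc = pthread_create(&tid, NULL, worker, NULL);
	assert(rc == 0);
	(void)rc;

	input = value;
	__atomic_store_n(&lcore_state, WORKER_RUN_SIGNAL, __ATOMIC_RELEASE);
	wait_until_equal_16(&lcore_state, WORKER_FINISHED);
	pthread_join(tid, NULL);
	return output;
}
```

Calling run_handshake(21) from a main() hands 21 to the worker and gets
the doubled value back. The RELAXED store is enough for the initial
state only because it happens before the thread is created.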

> ---
>  app/test/test_timer.c           | 30 +++++++++++++-----------------
>  app/test/test_timer_secondary.c |  1 -
>  2 files changed, 13 insertions(+), 18 deletions(-)
> 
> diff --git a/app/test/test_timer.c b/app/test/test_timer.c
> index a10b2fe9da..c97e5c891c 100644
> --- a/app/test/test_timer.c
> +++ b/app/test/test_timer.c
> @@ -102,7 +102,6 @@
>  #include <rte_eal.h>
>  #include <rte_per_lcore.h>
>  #include <rte_lcore.h>
> -#include <rte_atomic.h>
>  #include <rte_timer.h>
>  #include <rte_random.h>
>  #include <rte_malloc.h>
> @@ -203,7 +202,7 @@ timer_stress_main_loop(__rte_unused void *arg)
> 
>  /* Need to synchronize worker lcores through multiple steps. */
>  enum { WORKER_WAITING = 1, WORKER_RUN_SIGNAL, WORKER_RUNNING, WORKER_FINISHED };
> -static rte_atomic16_t lcore_state[RTE_MAX_LCORE];
> +static uint16_t lcore_state[RTE_MAX_LCORE];
> 
>  static void
>  main_init_workers(void)
> @@ -211,7 +210,7 @@ main_init_workers(void)
>  	unsigned i;
> 
>  	RTE_LCORE_FOREACH_WORKER(i) {
> -		rte_atomic16_set(&lcore_state[i], WORKER_WAITING);
> +		__atomic_store_n(&lcore_state[i], WORKER_WAITING,
> __ATOMIC_RELAXED);
>  	}
>  }
> 
> @@ -221,11 +220,10 @@ main_start_workers(void)
>  	unsigned i;
> 
>  	RTE_LCORE_FOREACH_WORKER(i) {
> -		rte_atomic16_set(&lcore_state[i], WORKER_RUN_SIGNAL);
> +		__atomic_store_n(&lcore_state[i], WORKER_RUN_SIGNAL,
> +__ATOMIC_RELEASE);
>  	}
>  	RTE_LCORE_FOREACH_WORKER(i) {
> -		while (rte_atomic16_read(&lcore_state[i]) !=
> WORKER_RUNNING)
> -			rte_pause();
> +		rte_wait_until_equal_16(&lcore_state[i], WORKER_RUNNING,
> +__ATOMIC_ACQUIRE);
>  	}
>  }
> 
> @@ -235,8 +233,7 @@ main_wait_for_workers(void)
>  	unsigned i;
> 
>  	RTE_LCORE_FOREACH_WORKER(i) {
> -		while (rte_atomic16_read(&lcore_state[i]) !=
> WORKER_FINISHED)
> -			rte_pause();
> +		rte_wait_until_equal_16(&lcore_state[i], WORKER_FINISHED,
> +__ATOMIC_ACQUIRE);
>  	}
>  }
> 
> @@ -245,9 +242,8 @@ worker_wait_to_start(void)
>  {
>  	unsigned lcore_id = rte_lcore_id();
> 
> -	while (rte_atomic16_read(&lcore_state[lcore_id]) !=
> WORKER_RUN_SIGNAL)
> -		rte_pause();
> -	rte_atomic16_set(&lcore_state[lcore_id], WORKER_RUNNING);
> +	rte_wait_until_equal_16(&lcore_state[lcore_id],
> WORKER_RUN_SIGNAL, __ATOMIC_ACQUIRE);
> +	__atomic_store_n(&lcore_state[lcore_id], WORKER_RUNNING,
> +__ATOMIC_RELEASE);
>  }
> 
>  static void
> @@ -255,7 +251,7 @@ worker_finish(void)
>  {
>  	unsigned lcore_id = rte_lcore_id();
> 
> -	rte_atomic16_set(&lcore_state[lcore_id], WORKER_FINISHED);
> +	__atomic_store_n(&lcore_state[lcore_id], WORKER_FINISHED,
> +__ATOMIC_RELEASE);
>  }
> 
> 
> @@ -281,13 +277,12 @@ timer_stress2_main_loop(__rte_unused void *arg)
>  	unsigned int lcore_id = rte_lcore_id();
>  	unsigned int main_lcore = rte_get_main_lcore();
>  	int32_t my_collisions = 0;
> -	static rte_atomic32_t collisions;
> +	static uint32_t collisions;
> 
>  	if (lcore_id == main_lcore) {
>  		cb_count = 0;
>  		test_failed = 0;
> -		rte_atomic32_set(&collisions, 0);
> -		main_init_workers();
> +		__atomic_store_n(&collisions, 0, __ATOMIC_RELAXED);
>  		timers = rte_malloc(NULL, sizeof(*timers) *
> NB_STRESS2_TIMERS, 0);
>  		if (timers == NULL) {
>  			printf("Test Failed\n");
> @@ -315,7 +310,7 @@ timer_stress2_main_loop(__rte_unused void *arg)
>  			my_collisions++;
>  	}
>  	if (my_collisions != 0)
> -		rte_atomic32_add(&collisions, my_collisions);
> +		__atomic_fetch_add(&collisions, my_collisions,
> __ATOMIC_RELAXED);
> 
>  	/* wait long enough for timers to expire */
>  	rte_delay_ms(100);
> @@ -329,7 +324,7 @@ timer_stress2_main_loop(__rte_unused void *arg)
> 
>  	/* now check that we get the right number of callbacks */
>  	if (lcore_id == main_lcore) {
> -		my_collisions = rte_atomic32_read(&collisions);
> +		my_collisions = __atomic_load_n(&collisions,
> __ATOMIC_RELAXED);
>  		if (my_collisions != 0)
>  			printf("- %d timer reset collisions (OK)\n",
> my_collisions);
>  		rte_timer_manage();
> @@ -573,6 +568,7 @@ test_timer(void)
>  	/* run a second, slightly different set of stress tests */
>  	printf("\nStart timer stress tests 2\n");
>  	test_failed = 0;
> +	main_init_workers();
>  	rte_eal_mp_remote_launch(timer_stress2_main_loop, NULL,
> CALL_MAIN);
>  	rte_eal_mp_wait_lcore();
>  	if (test_failed)
> diff --git a/app/test/test_timer_secondary.c b/app/test/test_timer_secondary.c
> index 16a9f1878b..5795c97f07 100644
> --- a/app/test/test_timer_secondary.c
> +++ b/app/test/test_timer_secondary.c
> @@ -9,7 +9,6 @@
>  #include <rte_lcore.h>
>  #include <rte_debug.h>
>  #include <rte_memzone.h>
> -#include <rte_atomic.h>
>  #include <rte_timer.h>
>  #include <rte_cycles.h>
>  #include <rte_mempool.h>
> --
> 2.25.1


^ permalink raw reply	[flat|nested] 36+ messages in thread

* RE: [PATCH v2 09/12] app/compress: use compiler atomic builtins for display sync
  2021-11-16  9:42 ` [PATCH v2 09/12] app/compress: " Joyce Kong
@ 2021-11-16 20:15   ` Honnappa Nagarahalli
  0 siblings, 0 replies; 36+ messages in thread
From: Honnappa Nagarahalli @ 2021-11-16 20:15 UTC (permalink / raw)
  To: Joyce Kong; +Cc: dev, nd, Joyce Kong, Ruifeng Wang, nd

<snip>

> 
> Convert rte_atomic_test_and_set usage to compiler atomic CAS operation for
> display sync.
> 
> Signed-off-by: Joyce Kong <joyce.kong@arm.com>
> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
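
As an illustration of the test-and-set to CAS conversion, a small
sketch outside of DPDK. The helper name claim_once is an assumption
made up for this example; only the __atomic_compare_exchange_n builtin
and its argument order are taken from the patch.

```c
#include <stdint.h>
#include <stdio.h>

static uint16_t display_once; /* 0 = header not printed yet */

/*
 * Returns non-zero for exactly one caller (the first one) and zero for
 * everyone after it, like rte_atomic16_test_and_set() did.
 * 'exp' has to be set to 0 before every attempt, because a failed CAS
 * overwrites it with the value currently stored in *flag.
 */
static int
claim_once(uint16_t *flag)
{
	uint16_t exp = 0;

	return __atomic_compare_exchange_n(flag, &exp, 1, 0,
			__ATOMIC_RELAXED, __ATOMIC_RELAXED);
}

/* Print the table header only on the first call, whichever thread makes it. */
static void
print_header_once(void)
{
	if (claim_once(&display_once))
		printf("%12s%6s%12s\n", "lcore id", "Level", "Comp size");
}
```

RELAXED is sufficient in this sketch because the flag only elects a
single printer and does not publish any other data. That is also why
the throughput runner resets exp to 0 before its second CAS.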

> ---
>  app/test-compress-perf/comp_perf_test_common.h    |  2 +-
>  .../comp_perf_test_cyclecount.c                   | 15 +++++++--------
>  .../comp_perf_test_throughput.c                   | 10 +++++++---
>  app/test-compress-perf/comp_perf_test_verify.c    |  6 ++++--
>  4 files changed, 19 insertions(+), 14 deletions(-)
> 
> diff --git a/app/test-compress-perf/comp_perf_test_common.h b/app/test-compress-perf/comp_perf_test_common.h
> index 72705c6a2b..d039e5a29a 100644
> --- a/app/test-compress-perf/comp_perf_test_common.h
> +++ b/app/test-compress-perf/comp_perf_test_common.h
> @@ -14,7 +14,7 @@ struct cperf_mem_resources {
>  	uint16_t qp_id;
>  	uint8_t lcore_id;
> 
> -	rte_atomic16_t print_info_once;
> +	uint16_t print_info_once;
> 
>  	uint32_t total_bufs;
>  	uint8_t *compressed_data;
> diff --git a/app/test-compress-perf/comp_perf_test_cyclecount.c b/app/test-compress-perf/comp_perf_test_cyclecount.c
> index c875ddbdac..da55b02b74 100644
> --- a/app/test-compress-perf/comp_perf_test_cyclecount.c
> +++ b/app/test-compress-perf/comp_perf_test_cyclecount.c
> @@ -466,7 +466,7 @@ cperf_cyclecount_test_runner(void *test_ctx)
>  	struct cperf_cyclecount_ctx *ctx = test_ctx;
>  	struct comp_test_data *test_data = ctx->ver.options;
>  	uint32_t lcore = rte_lcore_id();
> -	static rte_atomic16_t display_once = RTE_ATOMIC16_INIT(0);
> +	static uint16_t display_once;
>  	static rte_spinlock_t print_spinlock;
>  	int i;
> 
> @@ -486,10 +486,12 @@ cperf_cyclecount_test_runner(void *test_ctx)
> 
>  	ctx->ver.mem.lcore_id = lcore;
> 
> +	uint16_t exp = 0;
>  	/*
>  	 * printing information about current compression thread
>  	 */
> -	if (rte_atomic16_test_and_set(&ctx->ver.mem.print_info_once))
> +	if (__atomic_compare_exchange_n(&ctx->ver.mem.print_info_once,
> &exp,
> +				1, 0, __ATOMIC_RELAXED,
> __ATOMIC_RELAXED))
>  		printf("    lcore: %u,"
>  				" driver name: %s,"
>  				" device name: %s,"
> @@ -546,9 +548,10 @@ cperf_cyclecount_test_runner(void *test_ctx)
>  			(ctx->ver.mem.total_bufs * test_data->num_iter);
> 
>  	/* R E P O R T processing */
> -	if (rte_atomic16_test_and_set(&display_once)) {
> +	rte_spinlock_lock(&print_spinlock);
> 
> -		rte_spinlock_lock(&print_spinlock);
> +	if (display_once == 0) {
> +		display_once = 1;
> 
>  		printf("\nLegend for the table\n"
>  		"  - Retries section: number of retries for the following
> operations:\n"
> @@ -576,12 +579,8 @@ cperf_cyclecount_test_runner(void *test_ctx)
>  			"setup/op",
>  			"[C-e]", "[C-d]",
>  			"[D-e]", "[D-d]");
> -
> -		rte_spinlock_unlock(&print_spinlock);
>  	}
> 
> -	rte_spinlock_lock(&print_spinlock);
> -
>  	printf("%12u"
>  	       "%6u"
>  	       "%12zu"
> diff --git a/app/test-compress-perf/comp_perf_test_throughput.c b/app/test-compress-perf/comp_perf_test_throughput.c
> index 13922b658c..d3dff070b0 100644
> --- a/app/test-compress-perf/comp_perf_test_throughput.c
> +++ b/app/test-compress-perf/comp_perf_test_throughput.c
> @@ -329,15 +329,17 @@ cperf_throughput_test_runner(void *test_ctx)
>  	struct cperf_benchmark_ctx *ctx = test_ctx;
>  	struct comp_test_data *test_data = ctx->ver.options;
>  	uint32_t lcore = rte_lcore_id();
> -	static rte_atomic16_t display_once = RTE_ATOMIC16_INIT(0);
> +	static uint16_t display_once;
>  	int i, ret = EXIT_SUCCESS;
> 
>  	ctx->ver.mem.lcore_id = lcore;
> 
> +	uint16_t exp = 0;
>  	/*
>  	 * printing information about current compression thread
>  	 */
> -	if (rte_atomic16_test_and_set(&ctx->ver.mem.print_info_once))
> +	if (__atomic_compare_exchange_n(&ctx->ver.mem.print_info_once,
> &exp,
> +				1, 0, __ATOMIC_RELAXED,
> __ATOMIC_RELAXED))
>  		printf("    lcore: %u,"
>  				" driver name: %s,"
>  				" device name: %s,"
> @@ -391,7 +393,9 @@ cperf_throughput_test_runner(void *test_ctx)
>  	ctx->decomp_gbps = rte_get_tsc_hz() / ctx->decomp_tsc_byte * 8 /
>  			1000000000;
> 
> -	if (rte_atomic16_test_and_set(&display_once)) {
> +	exp = 0;
> +	if (__atomic_compare_exchange_n(&display_once, &exp, 1, 0,
> +			__ATOMIC_RELAXED, __ATOMIC_RELAXED)) {
>  		printf("\n%12s%6s%12s%17s%15s%16s\n",
>  			"lcore id", "Level", "Comp size", "Comp ratio [%]",
>  			"Comp [Gbps]", "Decomp [Gbps]");
> diff --git a/app/test-compress-perf/comp_perf_test_verify.c b/app/test-compress-perf/comp_perf_test_verify.c
> index 5e13257b79..f6e21368e8 100644
> --- a/app/test-compress-perf/comp_perf_test_verify.c
> +++ b/app/test-compress-perf/comp_perf_test_verify.c
> @@ -388,7 +388,7 @@ cperf_verify_test_runner(void *test_ctx)
>  	struct cperf_verify_ctx *ctx = test_ctx;
>  	struct comp_test_data *test_data = ctx->options;
>  	int ret = EXIT_SUCCESS;
> -	static rte_atomic16_t display_once = RTE_ATOMIC16_INIT(0);
> +	static uint16_t display_once;
>  	uint32_t lcore = rte_lcore_id();
> 
>  	ctx->mem.lcore_id = lcore;
> @@ -427,8 +427,10 @@ cperf_verify_test_runner(void *test_ctx)
>  	ctx->ratio = (double) ctx->comp_data_sz /
>  			test_data->input_data_sz * 100;
> 
> +	uint16_t exp = 0;
>  	if (!ctx->silent) {
> -		if (rte_atomic16_test_and_set(&display_once)) {
> +		if (__atomic_compare_exchange_n(&display_once, &exp, 1, 0,
> +				__ATOMIC_RELAXED, __ATOMIC_RELAXED)) {
>  			printf("%12s%6s%12s%17s\n",
>  			    "lcore id", "Level", "Comp size", "Comp ratio [%]");
>  		}
> --
> 2.25.1


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v2 03/12] test/timer: use compiler atomic builtins for sync
  2021-11-16  9:41 ` [PATCH v2 03/12] test/timer: use compiler atomic builtins for sync Joyce Kong
  2021-11-16 19:52   ` Honnappa Nagarahalli
@ 2021-11-16 20:20   ` David Marchand
  2021-11-16 21:21     ` Honnappa Nagarahalli
  1 sibling, 1 reply; 36+ messages in thread
From: David Marchand @ 2021-11-16 20:20 UTC (permalink / raw)
  To: Joyce Kong, Honnappa Nagarahalli
  Cc: Robert Sanford, Erik Gabriel Carrillo, dev, nd, Ruifeng Wang

Joyce, Honnappa,

On Tue, Nov 16, 2021 at 10:43 AM Joyce Kong <joyce.kong@arm.com> wrote:
>
> Convert rte_atomic usages to compiler atomic
> built-ins for lcore_state and collisions sync.
>
> Also, move 'main_init_workers' outside of
> 'timer_stress2_main_loop' to guarantee lcore_state
> initialized correctly before the threads launched.

Is this "also" part actually related to the change?
Or is it a separate fix?


>
> Signed-off-by: Joyce Kong <joyce.kong@arm.com>
> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>



-- 
David Marchand


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v2 12/12] app: remove unnecessary include of atomic header file
  2021-11-16  9:42 ` [PATCH v2 12/12] app: remove unnecessary include of atomic header file Joyce Kong
@ 2021-11-16 20:23   ` David Marchand
  2021-11-17  7:05     ` Joyce Kong
  0 siblings, 1 reply; 36+ messages in thread
From: David Marchand @ 2021-11-16 20:23 UTC (permalink / raw)
  To: Joyce Kong
  Cc: Maryam Tahhan, Reshma Pattan, Cristian Dumitrescu, Xiaoyun Li,
	Olivier Matz, Anatoly Burakov, Honnappa Nagarahalli,
	Konstantin Ananyev, dev, nd, Ruifeng Wang

On Tue, Nov 16, 2021 at 10:44 AM Joyce Kong <joyce.kong@arm.com> wrote:
>
> Remove the unnecessary rte_atomic.h included in app modules.
>
> Signed-off-by: Joyce Kong <joyce.kong@arm.com>
> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>

After patch, I still see:

$ git grep rte_atomic.h app/
app/test/commands.c:#include <rte_atomic.h>
app/test/test_atomic.c:#include <rte_atomic.h>
app/test/test_event_timer_adapter.c:#include <rte_atomic.h>

I can understand why test_atomic would depend on rte_atomic.h :-)
but not the rest.
Is there a reason, or is it just a miss?


-- 
David Marchand


^ permalink raw reply	[flat|nested] 36+ messages in thread

* RE: [PATCH v2 03/12] test/timer: use compiler atomic builtins for sync
  2021-11-16 20:20   ` David Marchand
@ 2021-11-16 21:21     ` Honnappa Nagarahalli
  2021-11-17  9:29       ` David Marchand
  0 siblings, 1 reply; 36+ messages in thread
From: Honnappa Nagarahalli @ 2021-11-16 21:21 UTC (permalink / raw)
  To: David Marchand, Joyce Kong
  Cc: Robert Sanford, Erik Gabriel Carrillo, dev, nd, Ruifeng Wang, nd

<snip>

> 
> Joyce, Honnappa,
> 
> On Tue, Nov 16, 2021 at 10:43 AM Joyce Kong <joyce.kong@arm.com> wrote:
> >
> > Convert rte_atomic usages to compiler atomic built-ins for lcore_state
> > and collisions sync.
> >
> > Also, move 'main_init_workers' outside of 'timer_stress2_main_loop' to
> > guarantee lcore_state initialized correctly before the threads
> > launched.
> 
> Is this "also" part actually related to the change?
> Or is it a separate fix?
The 'also' part does not fix a separate problem (i.e. the earlier code had no issues). It just helps keep the code simple.

> 
> 
> >
> > Signed-off-by: Joyce Kong <joyce.kong@arm.com>
> > Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
> 
> 
> 
> --
> David Marchand


^ permalink raw reply	[flat|nested] 36+ messages in thread

* RE: [PATCH v2 01/12] test/pmd_perf: use compiler atomic builtins for polling sync
  2021-11-16  9:41 ` [PATCH v2 01/12] test/pmd_perf: use compiler atomic builtins for polling sync Joyce Kong
@ 2021-11-16 21:30   ` Honnappa Nagarahalli
  0 siblings, 0 replies; 36+ messages in thread
From: Honnappa Nagarahalli @ 2021-11-16 21:30 UTC (permalink / raw)
  To: Joyce Kong; +Cc: dev, nd, Joyce Kong, Ruifeng Wang, nd

<snip>

> 
> Convert rte_atomic usages to compiler atomic built-ins for polling sync in
> pmd_perf test cases.
> 
> Signed-off-by: Joyce Kong <joyce.kong@arm.com>
> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
> ---
>  app/test/test_pmd_perf.c | 14 ++++++--------
>  1 file changed, 6 insertions(+), 8 deletions(-)
> 
> diff --git a/app/test/test_pmd_perf.c b/app/test/test_pmd_perf.c
> index 1df86ce080..546384a50d 100644
> --- a/app/test/test_pmd_perf.c
> +++ b/app/test/test_pmd_perf.c
> @@ -10,7 +10,6 @@
>  #include <rte_cycles.h>
>  #include <rte_ethdev.h>
>  #include <rte_byteorder.h>
> -#include <rte_atomic.h>
>  #include <rte_malloc.h>
>  #include "packet_burst_generator.h"
>  #include "test.h"
> @@ -525,7 +524,7 @@ main_loop(__rte_unused void *args)
>  	return 0;
>  }
> 
> -static rte_atomic64_t start;
> +static uint64_t start;
> 
>  static inline int
>  poll_burst(void *args)
> @@ -563,8 +562,7 @@ poll_burst(void *args)
>  		num[portid] = pkt_per_port;
>  	}
> 
> -	while (!rte_atomic64_read(&start))
> -		;
> +	rte_wait_until_equal_64(&start, 1, __ATOMIC_ACQUIRE);
> 
>  	cur_tsc = rte_rdtsc();
>  	while (total) {
> @@ -616,15 +614,15 @@ exec_burst(uint32_t flags, int lcore)
>  	pkt_per_port = MAX_TRAFFIC_BURST;
>  	num = pkt_per_port * conf->nb_ports;
> 
> -	rte_atomic64_init(&start);
> -
>  	/* start polling thread, but not actually poll yet */
>  	rte_eal_remote_launch(poll_burst,
>  			      (void *)&pkt_per_port, lcore);
> 
>  	/* Only when polling first */
>  	if (flags == SC_BURST_POLL_FIRST)
> -		rte_atomic64_set(&start, 1);
> +		__atomic_store_n(&start, 1, __ATOMIC_RELAXED);
> +	else
> +		__atomic_store_n(&start, 0, __ATOMIC_RELAXED);
These lines need to be moved up, before the call to rte_eal_remote_launch, so that the update to 'start' is visible to the worker thread.
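
A rough standalone sketch of the ordering suggested above, using
pthreads instead of rte_eal_remote_launch. The names poller, tx_done
and run_burst are assumptions for illustration and do not exist in
test_pmd_perf.c.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

static uint64_t start; /* 0 = hold the poller, 1 = go */
static int tx_done;    /* ordinary data written before the release store */

static void *
poller(void *arg)
{
	int *seen = arg;

	/* Acquire pairs with the release store issued after "transmit". */
	while (__atomic_load_n(&start, __ATOMIC_ACQUIRE) != 1)
		;
	*seen = tx_done;
	return NULL;
}

/* poll_first != 0: the poller may run at once; otherwise it waits for transmit. */
static int
run_burst(int poll_first)
{
	pthread_t tid;
	int seen = -1;
	int rc;

	tx_done = 0;
	/* Set the flag before the thread is launched, so it never reads a stale value. */
	__atomic_store_n(&start, poll_first ? 1 : 0, __ATOMIC_RELAXED);

	rc = pthread_create(&tid, NULL, poller, &seen);
	assert(rc == 0);
	(void)rc;

	if (!poll_first) {
		tx_done = 1; /* stand-in for the transmit loop */
		__atomic_store_n(&start, 1, __ATOMIC_RELEASE);
	}
	pthread_join(tid, NULL);
	return seen;
}
```

In this sketch run_burst(1) lets the poller go immediately, so it sees
tx_done as 0, while run_burst(0) holds it until the release store and
it sees 1. The initial store can stay RELAXED because it is issued
before the thread is created.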

> 
>  	/* start xmit */
>  	i = 0;
> @@ -641,7 +639,7 @@ exec_burst(uint32_t flags, int lcore)
> 
>  	/* only when polling second  */
>  	if (flags == SC_BURST_XMIT_FIRST)
> -		rte_atomic64_set(&start, 1);
> +		__atomic_store_n(&start, 1, __ATOMIC_RELEASE);
> 
>  	/* wait for polling finished */
>  	diff_tsc = rte_eal_wait_lcore(lcore);
> --
> 2.25.1


^ permalink raw reply	[flat|nested] 36+ messages in thread

* RE: [PATCH v2 10/12] app/testpmd: remove atomic operations for port status
  2021-11-16  9:42 ` [PATCH v2 10/12] app/testpmd: remove atomic operations for port status Joyce Kong
@ 2021-11-16 21:34   ` Honnappa Nagarahalli
  0 siblings, 0 replies; 36+ messages in thread
From: Honnappa Nagarahalli @ 2021-11-16 21:34 UTC (permalink / raw)
  To: Joyce Kong, Xiaoyun Li; +Cc: dev, nd, Joyce Kong, Ruifeng Wang, nd

<snip>
> 
> The port_status changes do not need to be handled atomically, as they are
> modified during initialization or through the testpmd prompt instead of
> multiple threads.
> 
> Signed-off-by: Joyce Kong <joyce.kong@arm.com>
> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>

> ---
>  app/test-pmd/testpmd.c | 58 ++++++++++++++++++++++--------------------
>  1 file changed, 31 insertions(+), 27 deletions(-)
> 
> diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
> index a66dfb297c..ed472cacd2 100644
> --- a/app/test-pmd/testpmd.c
> +++ b/app/test-pmd/testpmd.c
> @@ -36,7 +36,6 @@
>  #include <rte_alarm.h>
>  #include <rte_per_lcore.h>
>  #include <rte_lcore.h>
> -#include <rte_atomic.h>
>  #include <rte_branch_prediction.h>
>  #include <rte_mempool.h>
>  #include <rte_malloc.h>
> @@ -2521,9 +2520,9 @@ setup_hairpin_queues(portid_t pi, portid_t p_pi,
> uint16_t cnt_pi)
>  			continue;
> 
>  		/* Fail to setup rx queue, return */
> -		if (rte_atomic16_cmpset(&(port->port_status),
> -					RTE_PORT_HANDLING,
> -					RTE_PORT_STOPPED) == 0)
> +		if (port->port_status == RTE_PORT_HANDLING)
> +			port->port_status = RTE_PORT_STOPPED;
> +		else
>  			fprintf(stderr,
>  				"Port %d can not be set back to stopped\n",
> pi);
>  		fprintf(stderr, "Fail to configure port %d hairpin queues\n",
> @@ -2544,9 +2543,9 @@ setup_hairpin_queues(portid_t pi, portid_t p_pi,
> uint16_t cnt_pi)
>  			continue;
> 
>  		/* Fail to setup rx queue, return */
> -		if (rte_atomic16_cmpset(&(port->port_status),
> -					RTE_PORT_HANDLING,
> -					RTE_PORT_STOPPED) == 0)
> +		if (port->port_status == RTE_PORT_HANDLING)
> +			port->port_status = RTE_PORT_STOPPED;
> +		else
>  			fprintf(stderr,
>  				"Port %d can not be set back to stopped\n",
> pi);
>  		fprintf(stderr, "Fail to configure port %d hairpin queues\n",
> @@ -2729,8 +2728,9 @@ start_port(portid_t pid)
> 
>  		need_check_link_status = 0;
>  		port = &ports[pi];
> -		if (rte_atomic16_cmpset(&(port->port_status),
> RTE_PORT_STOPPED,
> -						 RTE_PORT_HANDLING) == 0)
> {
> +		if (port->port_status == RTE_PORT_STOPPED)
> +			port->port_status = RTE_PORT_HANDLING;
> +		else {
>  			fprintf(stderr, "Port %d is now not stopped\n", pi);
>  			continue;
>  		}
> @@ -2766,8 +2766,9 @@ start_port(portid_t pid)
>  						     nb_txq + nb_hairpinq,
>  						     &(port->dev_conf));
>  			if (diag != 0) {
> -				if (rte_atomic16_cmpset(&(port-
> >port_status),
> -				RTE_PORT_HANDLING, RTE_PORT_STOPPED)
> == 0)
> +				if (port->port_status ==
> RTE_PORT_HANDLING)
> +					port->port_status =
> RTE_PORT_STOPPED;
> +				else
>  					fprintf(stderr,
>  						"Port %d can not be set back
> to stopped\n",
>  						pi);
> @@ -2828,9 +2829,9 @@ start_port(portid_t pid)
>  					continue;
> 
>  				/* Fail to setup tx queue, return */
> -				if (rte_atomic16_cmpset(&(port-
> >port_status),
> -
> 	RTE_PORT_HANDLING,
> -							RTE_PORT_STOPPED)
> == 0)
> +				if (port->port_status ==
> RTE_PORT_HANDLING)
> +					port->port_status =
> RTE_PORT_STOPPED;
> +				else
>  					fprintf(stderr,
>  						"Port %d can not be set back
> to stopped\n",
>  						pi);
> @@ -2880,9 +2881,9 @@ start_port(portid_t pid)
>  					continue;
> 
>  				/* Fail to setup rx queue, return */
> -				if (rte_atomic16_cmpset(&(port-
> >port_status),
> -
> 	RTE_PORT_HANDLING,
> -							RTE_PORT_STOPPED)
> == 0)
> +				if (port->port_status ==
> RTE_PORT_HANDLING)
> +					port->port_status =
> RTE_PORT_STOPPED;
> +				else
>  					fprintf(stderr,
>  						"Port %d can not be set back
> to stopped\n",
>  						pi);
> @@ -2917,16 +2918,18 @@ start_port(portid_t pid)
>  				pi, rte_strerror(-diag));
> 
>  			/* Fail to setup rx queue, return */
> -			if (rte_atomic16_cmpset(&(port->port_status),
> -				RTE_PORT_HANDLING, RTE_PORT_STOPPED)
> == 0)
> +			if (port->port_status == RTE_PORT_HANDLING)
> +				port->port_status = RTE_PORT_STOPPED;
> +			else
>  				fprintf(stderr,
>  					"Port %d can not be set back to
> stopped\n",
>  					pi);
>  			continue;
>  		}
> 
> -		if (rte_atomic16_cmpset(&(port->port_status),
> -			RTE_PORT_HANDLING, RTE_PORT_STARTED) == 0)
> +		if (port->port_status == RTE_PORT_HANDLING)
> +			port->port_status = RTE_PORT_STARTED;
> +		else
>  			fprintf(stderr, "Port %d can not be set into started\n",
>  				pi);
> 
> @@ -3028,8 +3031,9 @@ stop_port(portid_t pid)
>  		}
> 
>  		port = &ports[pi];
> -		if (rte_atomic16_cmpset(&(port->port_status),
> RTE_PORT_STARTED,
> -						RTE_PORT_HANDLING) == 0)
> +		if (port->port_status == RTE_PORT_STARTED)
> +			port->port_status = RTE_PORT_HANDLING;
> +		else
>  			continue;
> 
>  		if (hairpin_mode & 0xf) {
> @@ -3055,8 +3059,9 @@ stop_port(portid_t pid)
>  			RTE_LOG(ERR, EAL, "rte_eth_dev_stop failed for port
> %u\n",
>  				pi);
> 
> -		if (rte_atomic16_cmpset(&(port->port_status),
> -			RTE_PORT_HANDLING, RTE_PORT_STOPPED) == 0)
> +		if (port->port_status == RTE_PORT_HANDLING)
> +			port->port_status = RTE_PORT_STOPPED;
> +		else
>  			fprintf(stderr, "Port %d can not be set into
> stopped\n",
>  				pi);
>  		need_check_link_status = 1;
> @@ -3119,8 +3124,7 @@ close_port(portid_t pid)
>  		}
> 
>  		port = &ports[pi];
> -		if (rte_atomic16_cmpset(&(port->port_status),
> -			RTE_PORT_CLOSED, RTE_PORT_CLOSED) == 1) {
> +		if (port->port_status == RTE_PORT_CLOSED) {
>  			fprintf(stderr, "Port %d is already closed\n", pi);
>  			continue;
>  		}
> --
> 2.25.1


^ permalink raw reply	[flat|nested] 36+ messages in thread

* RE: [PATCH v2 12/12] app: remove unnecessary include of atomic header file
  2021-11-16 20:23   ` David Marchand
@ 2021-11-17  7:05     ` Joyce Kong
  0 siblings, 0 replies; 36+ messages in thread
From: Joyce Kong @ 2021-11-17  7:05 UTC (permalink / raw)
  To: David Marchand
  Cc: Maryam Tahhan, Reshma Pattan, Cristian Dumitrescu, Xiaoyun Li,
	Olivier Matz, Anatoly Burakov, Honnappa Nagarahalli,
	Konstantin Ananyev, dev, nd, Ruifeng Wang

<snip>

> Subject: Re: [PATCH v2 12/12] app: remove unnecessary include of atomic
> header file
> 
> On Tue, Nov 16, 2021 at 10:44 AM Joyce Kong <joyce.kong@arm.com> wrote:
> >
> > Remove the unnecessary rte_atomic.h included in app modules.
> >
> > Signed-off-by: Joyce Kong <joyce.kong@arm.com>
> > Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
> 
> After patch, I still see:
> 
> $ git grep rte_atomic.h app/
> app/test/commands.c:#include <rte_atomic.h>
> app/test/test_atomic.c:#include <rte_atomic.h>
> app/test/test_event_timer_adapter.c:#include <rte_atomic.h>
> 
> I can understand why test_atomic would depend on rte_atomic.h :-) but
> not the rest.
> Is there a reason, or is it just an oversight?
> 
> --
> David Marchand

Hi David, I checked the rest and it was an oversight. Thanks for the reminder, I will update it in v3.

Joyce

^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH v3 00/12] use compiler atomic builtins for app modules
  2021-11-16  9:41 [PATCH v2 00/12] use compiler atomic builtins for app modules Joyce Kong
                   ` (11 preceding siblings ...)
  2021-11-16  9:42 ` [PATCH v2 12/12] app: remove unnecessary include of atomic header file Joyce Kong
@ 2021-11-17  8:21 ` Joyce Kong
  2021-11-17  8:21   ` [PATCH v3 01/12] test/pmd_perf: use compiler atomic builtins for polling sync Joyce Kong
                     ` (12 more replies)
  12 siblings, 13 replies; 36+ messages in thread
From: Joyce Kong @ 2021-11-17  8:21 UTC (permalink / raw)
  Cc: dev, honnappa.nagarahalli, nd, Joyce Kong

Since DPDK has adopted the C11 memory model[1], convert the
rte_atomicNN_xxx APIs to compiler atomic built-ins in the
app modules[2].

[1] https://www.dpdk.org/blog/2021/03/26/dpdk-adopts-the-c11-memory-model/
[2] https://doc.dpdk.org/guides/rel_notes/deprecation.html

v3:
  1. In the pmd_perf test case, move the initialization of the
     polling start flag before the call to rte_eal_remote_launch,
     so the update is visible to the worker threads. (Honnappa Nagarahalli)
  2. Remove the remaining rte_atomic.h includes that were missed
     in v2. (David Marchand)

v2:
  By Honnappa Nagarahalli:
  1. Replace the RELAXED memory orderings with suitable ones for
     shared data sync in the pmd_perf and timer test cases.
  2. Avoid unnecessary atomic operations in the compress and
     testpmd modules.
  3. Fix some typos.
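
For illustration, the conversion applied across the series follows roughly
the pattern sketched below. This is a standalone sketch on a plain uint32_t,
not code from the patches: the counter_* helper names and mapping_demo() are
made up for the example, and every operation uses __ATOMIC_RELAXED, whereas
the individual patches pick stronger orderings where data is published
between threads.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical shared variable; formerly this would be an rte_atomic32_t. */
static uint32_t counter;

/* rte_atomic32_set(&v, x)  ->  __atomic_store_n(&v, x, order) */
static void counter_set(uint32_t val)
{
	__atomic_store_n(&counter, val, __ATOMIC_RELAXED);
}

/* rte_atomic32_read(&v)  ->  __atomic_load_n(&v, order) */
static uint32_t counter_get(void)
{
	return __atomic_load_n(&counter, __ATOMIC_RELAXED);
}

/* rte_atomic32_add(&v, n)  ->  __atomic_fetch_add(&v, n, order) */
static void counter_add(uint32_t n)
{
	__atomic_fetch_add(&counter, n, __ATOMIC_RELAXED);
}

/* rte_atomic32_sub(&v, n)  ->  __atomic_fetch_sub(&v, n, order) */
static void counter_sub(uint32_t n)
{
	__atomic_fetch_sub(&counter, n, __ATOMIC_RELAXED);
}

/* Exercise each helper once: 5 + 3 - 1. */
static uint32_t mapping_demo(void)
{
	counter_set(5);
	counter_add(3);
	counter_sub(1);
	return counter_get();
}
```

Calling mapping_demo() from a small main() should return 7.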

Joyce Kong (12):
  test/pmd_perf: use compiler atomic builtins for polling sync
  test/ring_perf: use compiler atomic builtins for lcores sync
  test/timer: use compiler atomic builtins for sync
  test/stack_perf: use compiler atomics for lcore sync
  test/bpf: use compiler atomics for calculation
  test/func_reentrancy: use compiler atomics for data sync
  app/eventdev: use compiler atomics for shared data sync
  app/crypto: use compiler atomic builtins for display sync
  app/compress: use compiler atomic builtins for display sync
  app/testpmd: remove atomic operations for port status
  app/bbdev: use compiler atomics for shared data sync
  app: remove unnecessary include of atomic header file

 app/proc-info/main.c                          |   1 -
 app/test-bbdev/test_bbdev_perf.c              | 135 ++++++++----------
 .../comp_perf_test_common.h                   |   2 +-
 .../comp_perf_test_cyclecount.c               |  15 +-
 .../comp_perf_test_throughput.c               |  10 +-
 .../comp_perf_test_verify.c                   |   6 +-
 app/test-crypto-perf/cperf_test_latency.c     |   6 +-
 .../cperf_test_pmd_cyclecount.c               |   9 +-
 app/test-crypto-perf/cperf_test_throughput.c  |   9 +-
 app/test-crypto-perf/cperf_test_verify.c      |   9 +-
 app/test-eventdev/evt_main.c                  |   1 -
 app/test-eventdev/test_order_atq.c            |   4 +-
 app/test-eventdev/test_order_common.c         |   4 +-
 app/test-eventdev/test_order_common.h         |   8 +-
 app/test-eventdev/test_order_queue.c          |   4 +-
 app/test-pipeline/config.c                    |   1 -
 app/test-pipeline/init.c                      |   1 -
 app/test-pipeline/main.c                      |   1 -
 app/test-pipeline/runtime.c                   |   1 -
 app/test-pmd/cmdline.c                        |   1 -
 app/test-pmd/config.c                         |   1 -
 app/test-pmd/csumonly.c                       |   1 -
 app/test-pmd/flowgen.c                        |   1 -
 app/test-pmd/icmpecho.c                       |   1 -
 app/test-pmd/iofwd.c                          |   1 -
 app/test-pmd/macfwd.c                         |   1 -
 app/test-pmd/macswap.c                        |   1 -
 app/test-pmd/parameters.c                     |   1 -
 app/test-pmd/rxonly.c                         |   1 -
 app/test-pmd/testpmd.c                        |  58 ++++----
 app/test-pmd/txonly.c                         |   1 -
 app/test/commands.c                           |   1 -
 app/test/test_barrier.c                       |   1 -
 app/test/test_bpf.c                           |  28 ++--
 app/test/test_event_timer_adapter.c           |   1 -
 app/test/test_func_reentrancy.c               |  27 ++--
 app/test/test_mbuf.c                          |   1 -
 app/test/test_mp_secondary.c                  |   1 -
 app/test/test_pmd_perf.c                      |  23 +--
 app/test/test_ring.c                          |   1 -
 app/test/test_ring_perf.c                     |   9 +-
 app/test/test_stack_perf.c                    |  14 +-
 app/test/test_timer.c                         |  30 ++--
 app/test/test_timer_secondary.c               |   1 -
 44 files changed, 203 insertions(+), 231 deletions(-)

-- 
2.25.1


^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH v3 01/12] test/pmd_perf: use compiler atomic builtins for polling sync
  2021-11-17  8:21 ` [PATCH v3 00/12] use compiler atomic builtins for app modules Joyce Kong
@ 2021-11-17  8:21   ` Joyce Kong
  2021-11-17  8:21   ` [PATCH v3 02/12] test/ring_perf: use compiler atomic builtins for lcores sync Joyce Kong
                     ` (11 subsequent siblings)
  12 siblings, 0 replies; 36+ messages in thread
From: Joyce Kong @ 2021-11-17  8:21 UTC (permalink / raw)
  Cc: dev, honnappa.nagarahalli, nd, Joyce Kong, Ruifeng Wang

Convert rte_atomic usages to compiler atomic built-ins
for polling sync in pmd_perf test cases.

Signed-off-by: Joyce Kong <joyce.kong@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
---
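For illustration only, a standalone sketch of the release/acquire pairing
used for the 'start' flag. It is not the test code: a pthread stands in for
the polling lcore, a plain spin loop stands in for rte_wait_until_equal_64(),
and 'payload', poller() and handoff_demo() are hypothetical names. The flag
is initialised before the thread is created, which mirrors the v3 change of
storing 'start' before rte_eal_remote_launch().

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

static uint64_t start;   /* flag, named as in test_pmd_perf.c */
static uint64_t payload; /* hypothetical data published before the flag */

static void *poller(void *arg)
{
	uint64_t *seen = arg;

	/* Spin until the flag is set; ACQUIRE pairs with the RELEASE store. */
	while (__atomic_load_n(&start, __ATOMIC_ACQUIRE) != 1)
		;
	/* Writes made before the RELEASE store are visible here. */
	*seen = payload;
	return NULL;
}

static uint64_t handoff_demo(void)
{
	pthread_t tid;
	uint64_t seen = 0;

	/* Initialise the flag before the thread exists. */
	__atomic_store_n(&start, 0, __ATOMIC_RELAXED);
	if (pthread_create(&tid, NULL, poller, &seen) != 0)
		return 0;

	payload = 42;
	/* Publish: everything written above is ordered before this store. */
	__atomic_store_n(&start, 1, __ATOMIC_RELEASE);

	pthread_join(tid, NULL);
	return seen;
}
```

Built with -pthread, handoff_demo() should return 42.
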
 app/test/test_pmd_perf.c | 23 ++++++++++++-----------
 1 file changed, 12 insertions(+), 11 deletions(-)

diff --git a/app/test/test_pmd_perf.c b/app/test/test_pmd_perf.c
index 1df86ce080..a6bac9d45e 100644
--- a/app/test/test_pmd_perf.c
+++ b/app/test/test_pmd_perf.c
@@ -10,7 +10,6 @@
 #include <rte_cycles.h>
 #include <rte_ethdev.h>
 #include <rte_byteorder.h>
-#include <rte_atomic.h>
 #include <rte_malloc.h>
 #include "packet_burst_generator.h"
 #include "test.h"
@@ -525,7 +524,7 @@ main_loop(__rte_unused void *args)
 	return 0;
 }
 
-static rte_atomic64_t start;
+static uint64_t start;
 
 static inline int
 poll_burst(void *args)
@@ -563,8 +562,7 @@ poll_burst(void *args)
 		num[portid] = pkt_per_port;
 	}
 
-	while (!rte_atomic64_read(&start))
-		;
+	rte_wait_until_equal_64(&start, 1, __ATOMIC_ACQUIRE);
 
 	cur_tsc = rte_rdtsc();
 	while (total) {
@@ -616,16 +614,19 @@ exec_burst(uint32_t flags, int lcore)
 	pkt_per_port = MAX_TRAFFIC_BURST;
 	num = pkt_per_port * conf->nb_ports;
 
-	rte_atomic64_init(&start);
+	/* only when polling first */
+	if (flags == SC_BURST_POLL_FIRST)
+		__atomic_store_n(&start, 1, __ATOMIC_RELAXED);
+	else
+		__atomic_store_n(&start, 0, __ATOMIC_RELAXED);
 
-	/* start polling thread, but not actually poll yet */
+	/* start polling thread
+	 * if in POLL_FIRST mode, poll once launched;
+	 * otherwise, not actually poll yet
+	 */
 	rte_eal_remote_launch(poll_burst,
 			      (void *)&pkt_per_port, lcore);
 
-	/* Only when polling first */
-	if (flags == SC_BURST_POLL_FIRST)
-		rte_atomic64_set(&start, 1);
-
 	/* start xmit */
 	i = 0;
 	while (num) {
@@ -641,7 +642,7 @@ exec_burst(uint32_t flags, int lcore)
 
 	/* only when polling second  */
 	if (flags == SC_BURST_XMIT_FIRST)
-		rte_atomic64_set(&start, 1);
+		__atomic_store_n(&start, 1, __ATOMIC_RELEASE);
 
 	/* wait for polling finished */
 	diff_tsc = rte_eal_wait_lcore(lcore);
-- 
2.25.1


^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH v3 02/12] test/ring_perf: use compiler atomic builtins for lcores sync
  2021-11-17  8:21 ` [PATCH v3 00/12] use compiler atomic builtins for app modules Joyce Kong
  2021-11-17  8:21   ` [PATCH v3 01/12] test/pmd_perf: use compiler atomic builtins for polling sync Joyce Kong
@ 2021-11-17  8:21   ` Joyce Kong
  2021-11-17  8:21   ` [PATCH v3 03/12] test/timer: use compiler atomic builtins for sync Joyce Kong
                     ` (10 subsequent siblings)
  12 siblings, 0 replies; 36+ messages in thread
From: Joyce Kong @ 2021-11-17  8:21 UTC (permalink / raw)
  To: Honnappa Nagarahalli, Konstantin Ananyev
  Cc: dev, nd, Joyce Kong, Ruifeng Wang

Convert rte_atomic usages to compiler atomic built-ins
for lcores sync in ring_perf test cases.

Signed-off-by: Joyce Kong <joyce.kong@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
---
 app/test/test_ring_perf.c | 9 ++++-----
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/app/test/test_ring_perf.c b/app/test/test_ring_perf.c
index fd82e20412..2d8bb675a3 100644
--- a/app/test/test_ring_perf.c
+++ b/app/test/test_ring_perf.c
@@ -320,7 +320,7 @@ run_on_core_pair(struct lcore_pair *cores, struct rte_ring *r, const int esize)
 	return 0;
 }
 
-static rte_atomic32_t synchro;
+static uint32_t synchro;
 static uint64_t queue_count[RTE_MAX_LCORE];
 
 #define TIME_MS 100
@@ -342,8 +342,7 @@ load_loop_fn_helper(struct thread_params *p, const int esize)
 
 	/* wait synchro for workers */
 	if (lcore != rte_get_main_lcore())
-		while (rte_atomic32_read(&synchro) == 0)
-			rte_pause();
+		rte_wait_until_equal_32(&synchro, 1, __ATOMIC_RELAXED);
 
 	begin = rte_get_timer_cycles();
 	while (time_diff < hz * TIME_MS / 1000) {
@@ -398,12 +397,12 @@ run_on_all_cores(struct rte_ring *r, const int esize)
 		param.r = r;
 
 		/* clear synchro and start workers */
-		rte_atomic32_set(&synchro, 0);
+		__atomic_store_n(&synchro, 0, __ATOMIC_RELAXED);
 		if (rte_eal_mp_remote_launch(lcore_f, &param, SKIP_MAIN) < 0)
 			return -1;
 
 		/* start synchro and launch test on main */
-		rte_atomic32_set(&synchro, 1);
+		__atomic_store_n(&synchro, 1, __ATOMIC_RELAXED);
 		lcore_f(&param);
 
 		rte_eal_mp_wait_lcore();
-- 
2.25.1


^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH v3 03/12] test/timer: use compiler atomic builtins for sync
  2021-11-17  8:21 ` [PATCH v3 00/12] use compiler atomic builtins for app modules Joyce Kong
  2021-11-17  8:21   ` [PATCH v3 01/12] test/pmd_perf: use compiler atomic builtins for polling sync Joyce Kong
  2021-11-17  8:21   ` [PATCH v3 02/12] test/ring_perf: use compiler atomic builtins for lcores sync Joyce Kong
@ 2021-11-17  8:21   ` Joyce Kong
  2021-11-17  8:21   ` [PATCH v3 04/12] test/stack_perf: use compiler atomics for lcore sync Joyce Kong
                     ` (9 subsequent siblings)
  12 siblings, 0 replies; 36+ messages in thread
From: Joyce Kong @ 2021-11-17  8:21 UTC (permalink / raw)
  To: Robert Sanford, Erik Gabriel Carrillo
  Cc: dev, honnappa.nagarahalli, nd, Joyce Kong, Ruifeng Wang

Convert rte_atomic usages to compiler atomic
built-ins for lcore_state and collisions sync.

Also, move 'main_init_workers' outside of
'timer_stress2_main_loop' to guarantee that lcore_state is
initialized correctly before the threads are launched.

Signed-off-by: Joyce Kong <joyce.kong@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
---
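For illustration only, a reduced sketch of the lcore_state handshake with a
single worker. It is an assumption-laden stand-in, not the test itself: a
pthread replaces the worker lcore, wait_until() is a hypothetical
replacement for rte_wait_until_equal_16(), and the main side waits only for
WORKER_FINISHED to keep the example short. The initial state is stored
before the worker is created, which is the point of moving
main_init_workers() ahead of the launch.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

enum { WORKER_WAITING = 1, WORKER_RUN_SIGNAL, WORKER_RUNNING, WORKER_FINISHED };

static uint16_t state;     /* one slot of lcore_state[] */
static uint32_t work_done; /* hypothetical stand-in for the test body */

/* Hypothetical helper: spin with ACQUIRE until *addr equals expected. */
static void wait_until(uint16_t *addr, uint16_t expected)
{
	while (__atomic_load_n(addr, __ATOMIC_ACQUIRE) != expected)
		;
}

static void *worker(void *arg)
{
	(void)arg;
	wait_until(&state, WORKER_RUN_SIGNAL);
	__atomic_store_n(&state, WORKER_RUNNING, __ATOMIC_RELEASE);
	work_done = 1;
	/* RELEASE makes work_done visible to whoever observes FINISHED. */
	__atomic_store_n(&state, WORKER_FINISHED, __ATOMIC_RELEASE);
	return NULL;
}

static int handshake_demo(void)
{
	pthread_t tid;
	int ok;

	/* Set the initial state before launching the worker. */
	__atomic_store_n(&state, WORKER_WAITING, __ATOMIC_RELAXED);
	work_done = 0;
	if (pthread_create(&tid, NULL, worker, NULL) != 0)
		return -1;

	__atomic_store_n(&state, WORKER_RUN_SIGNAL, __ATOMIC_RELEASE);
	wait_until(&state, WORKER_FINISHED);
	ok = (work_done == 1);

	pthread_join(tid, NULL);
	return ok;
}
```

Built with -pthread, handshake_demo() should return 1.
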
 app/test/test_timer.c           | 30 +++++++++++++-----------------
 app/test/test_timer_secondary.c |  1 -
 2 files changed, 13 insertions(+), 18 deletions(-)

diff --git a/app/test/test_timer.c b/app/test/test_timer.c
index a10b2fe9da..c97e5c891c 100644
--- a/app/test/test_timer.c
+++ b/app/test/test_timer.c
@@ -102,7 +102,6 @@
 #include <rte_eal.h>
 #include <rte_per_lcore.h>
 #include <rte_lcore.h>
-#include <rte_atomic.h>
 #include <rte_timer.h>
 #include <rte_random.h>
 #include <rte_malloc.h>
@@ -203,7 +202,7 @@ timer_stress_main_loop(__rte_unused void *arg)
 
 /* Need to synchronize worker lcores through multiple steps. */
 enum { WORKER_WAITING = 1, WORKER_RUN_SIGNAL, WORKER_RUNNING, WORKER_FINISHED };
-static rte_atomic16_t lcore_state[RTE_MAX_LCORE];
+static uint16_t lcore_state[RTE_MAX_LCORE];
 
 static void
 main_init_workers(void)
@@ -211,7 +210,7 @@ main_init_workers(void)
 	unsigned i;
 
 	RTE_LCORE_FOREACH_WORKER(i) {
-		rte_atomic16_set(&lcore_state[i], WORKER_WAITING);
+		__atomic_store_n(&lcore_state[i], WORKER_WAITING, __ATOMIC_RELAXED);
 	}
 }
 
@@ -221,11 +220,10 @@ main_start_workers(void)
 	unsigned i;
 
 	RTE_LCORE_FOREACH_WORKER(i) {
-		rte_atomic16_set(&lcore_state[i], WORKER_RUN_SIGNAL);
+		__atomic_store_n(&lcore_state[i], WORKER_RUN_SIGNAL, __ATOMIC_RELEASE);
 	}
 	RTE_LCORE_FOREACH_WORKER(i) {
-		while (rte_atomic16_read(&lcore_state[i]) != WORKER_RUNNING)
-			rte_pause();
+		rte_wait_until_equal_16(&lcore_state[i], WORKER_RUNNING, __ATOMIC_ACQUIRE);
 	}
 }
 
@@ -235,8 +233,7 @@ main_wait_for_workers(void)
 	unsigned i;
 
 	RTE_LCORE_FOREACH_WORKER(i) {
-		while (rte_atomic16_read(&lcore_state[i]) != WORKER_FINISHED)
-			rte_pause();
+		rte_wait_until_equal_16(&lcore_state[i], WORKER_FINISHED, __ATOMIC_ACQUIRE);
 	}
 }
 
@@ -245,9 +242,8 @@ worker_wait_to_start(void)
 {
 	unsigned lcore_id = rte_lcore_id();
 
-	while (rte_atomic16_read(&lcore_state[lcore_id]) != WORKER_RUN_SIGNAL)
-		rte_pause();
-	rte_atomic16_set(&lcore_state[lcore_id], WORKER_RUNNING);
+	rte_wait_until_equal_16(&lcore_state[lcore_id], WORKER_RUN_SIGNAL, __ATOMIC_ACQUIRE);
+	__atomic_store_n(&lcore_state[lcore_id], WORKER_RUNNING, __ATOMIC_RELEASE);
 }
 
 static void
@@ -255,7 +251,7 @@ worker_finish(void)
 {
 	unsigned lcore_id = rte_lcore_id();
 
-	rte_atomic16_set(&lcore_state[lcore_id], WORKER_FINISHED);
+	__atomic_store_n(&lcore_state[lcore_id], WORKER_FINISHED, __ATOMIC_RELEASE);
 }
 
 
@@ -281,13 +277,12 @@ timer_stress2_main_loop(__rte_unused void *arg)
 	unsigned int lcore_id = rte_lcore_id();
 	unsigned int main_lcore = rte_get_main_lcore();
 	int32_t my_collisions = 0;
-	static rte_atomic32_t collisions;
+	static uint32_t collisions;
 
 	if (lcore_id == main_lcore) {
 		cb_count = 0;
 		test_failed = 0;
-		rte_atomic32_set(&collisions, 0);
-		main_init_workers();
+		__atomic_store_n(&collisions, 0, __ATOMIC_RELAXED);
 		timers = rte_malloc(NULL, sizeof(*timers) * NB_STRESS2_TIMERS, 0);
 		if (timers == NULL) {
 			printf("Test Failed\n");
@@ -315,7 +310,7 @@ timer_stress2_main_loop(__rte_unused void *arg)
 			my_collisions++;
 	}
 	if (my_collisions != 0)
-		rte_atomic32_add(&collisions, my_collisions);
+		__atomic_fetch_add(&collisions, my_collisions, __ATOMIC_RELAXED);
 
 	/* wait long enough for timers to expire */
 	rte_delay_ms(100);
@@ -329,7 +324,7 @@ timer_stress2_main_loop(__rte_unused void *arg)
 
 	/* now check that we get the right number of callbacks */
 	if (lcore_id == main_lcore) {
-		my_collisions = rte_atomic32_read(&collisions);
+		my_collisions = __atomic_load_n(&collisions, __ATOMIC_RELAXED);
 		if (my_collisions != 0)
 			printf("- %d timer reset collisions (OK)\n", my_collisions);
 		rte_timer_manage();
@@ -573,6 +568,7 @@ test_timer(void)
 	/* run a second, slightly different set of stress tests */
 	printf("\nStart timer stress tests 2\n");
 	test_failed = 0;
+	main_init_workers();
 	rte_eal_mp_remote_launch(timer_stress2_main_loop, NULL, CALL_MAIN);
 	rte_eal_mp_wait_lcore();
 	if (test_failed)
diff --git a/app/test/test_timer_secondary.c b/app/test/test_timer_secondary.c
index 16a9f1878b..5795c97f07 100644
--- a/app/test/test_timer_secondary.c
+++ b/app/test/test_timer_secondary.c
@@ -9,7 +9,6 @@
 #include <rte_lcore.h>
 #include <rte_debug.h>
 #include <rte_memzone.h>
-#include <rte_atomic.h>
 #include <rte_timer.h>
 #include <rte_cycles.h>
 #include <rte_mempool.h>
-- 
2.25.1


^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH v3 04/12] test/stack_perf: use compiler atomics for lcore sync
  2021-11-17  8:21 ` [PATCH v3 00/12] use compiler atomic builtins for app modules Joyce Kong
                     ` (2 preceding siblings ...)
  2021-11-17  8:21   ` [PATCH v3 03/12] test/timer: use compiler atomic builtins for sync Joyce Kong
@ 2021-11-17  8:21   ` Joyce Kong
  2021-11-17  8:21   ` [PATCH v3 05/12] test/bpf: use compiler atomics for calculation Joyce Kong
                     ` (8 subsequent siblings)
  12 siblings, 0 replies; 36+ messages in thread
From: Joyce Kong @ 2021-11-17  8:21 UTC (permalink / raw)
  To: Olivier Matz; +Cc: dev, honnappa.nagarahalli, nd, Joyce Kong, Ruifeng Wang

Convert rte_atomic usages to compiler atomic built-ins
for lcore sync in stack_perf test cases.

Signed-off-by: Joyce Kong <joyce.kong@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
---
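For illustration only, a standalone sketch of the countdown barrier built
from __atomic_fetch_sub() plus a spin on zero. It is not the test code: one
pthread and the calling thread play the two lcores, a plain spin loop
replaces rte_wait_until_equal_32(), and 'passed', barrier_thread() and
barrier_demo() are hypothetical names.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

static uint32_t lcore_barrier; /* named as in test_stack_perf.c */
static uint32_t passed;        /* hypothetical: counts threads past the barrier */

static void *barrier_thread(void *arg)
{
	(void)arg;
	/* Announce arrival, then wait until every participant has arrived. */
	__atomic_fetch_sub(&lcore_barrier, 1, __ATOMIC_RELAXED);
	while (__atomic_load_n(&lcore_barrier, __ATOMIC_RELAXED) != 0)
		;
	__atomic_fetch_add(&passed, 1, __ATOMIC_RELAXED);
	return NULL;
}

static uint32_t barrier_demo(void)
{
	pthread_t tid;

	/* Arm the barrier for two participants before starting either. */
	__atomic_store_n(&lcore_barrier, 2, __ATOMIC_RELAXED);
	__atomic_store_n(&passed, 0, __ATOMIC_RELAXED);

	if (pthread_create(&tid, NULL, barrier_thread, NULL) != 0)
		return 0;
	barrier_thread(NULL); /* the calling thread is the second participant */

	pthread_join(tid, NULL);
	return __atomic_load_n(&passed, __ATOMIC_RELAXED);
}
```

Built with -pthread, barrier_demo() should return 2. RELAXED is enough in
this sketch because the barrier only lines up the start time and carries no
data between the threads.
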
 app/test/test_stack_perf.c | 14 ++++++--------
 1 file changed, 6 insertions(+), 8 deletions(-)

diff --git a/app/test/test_stack_perf.c b/app/test/test_stack_perf.c
index 4ee40d5d19..1eae00a334 100644
--- a/app/test/test_stack_perf.c
+++ b/app/test/test_stack_perf.c
@@ -6,7 +6,6 @@
 #include <stdio.h>
 #include <inttypes.h>
 
-#include <rte_atomic.h>
 #include <rte_cycles.h>
 #include <rte_launch.h>
 #include <rte_pause.h>
@@ -24,7 +23,7 @@
  */
 static volatile unsigned int bulk_sizes[] = {8, MAX_BURST};
 
-static rte_atomic32_t lcore_barrier;
+static uint32_t lcore_barrier;
 
 struct lcore_pair {
 	unsigned int c1;
@@ -144,9 +143,8 @@ bulk_push_pop(void *p)
 	s = args->s;
 	size = args->sz;
 
-	rte_atomic32_sub(&lcore_barrier, 1);
-	while (rte_atomic32_read(&lcore_barrier) != 0)
-		rte_pause();
+	__atomic_fetch_sub(&lcore_barrier, 1, __ATOMIC_RELAXED);
+	rte_wait_until_equal_32(&lcore_barrier, 0, __ATOMIC_RELAXED);
 
 	uint64_t start = rte_rdtsc();
 
@@ -175,7 +173,7 @@ run_on_core_pair(struct lcore_pair *cores, struct rte_stack *s,
 	unsigned int i;
 
 	for (i = 0; i < RTE_DIM(bulk_sizes); i++) {
-		rte_atomic32_set(&lcore_barrier, 2);
+		__atomic_store_n(&lcore_barrier, 2, __ATOMIC_RELAXED);
 
 		args[0].sz = args[1].sz = bulk_sizes[i];
 		args[0].s = args[1].s = s;
@@ -208,7 +206,7 @@ run_on_n_cores(struct rte_stack *s, lcore_function_t fn, int n)
 		int cnt = 0;
 		double avg;
 
-		rte_atomic32_set(&lcore_barrier, n);
+		__atomic_store_n(&lcore_barrier, n, __ATOMIC_RELAXED);
 
 		RTE_LCORE_FOREACH_WORKER(lcore_id) {
 			if (++cnt >= n)
@@ -302,7 +300,7 @@ __test_stack_perf(uint32_t flags)
 	struct lcore_pair cores;
 	struct rte_stack *s;
 
-	rte_atomic32_init(&lcore_barrier);
+	__atomic_store_n(&lcore_barrier, 0, __ATOMIC_RELAXED);
 
 	s = rte_stack_create(STACK_NAME, STACK_SIZE, rte_socket_id(), flags);
 	if (s == NULL) {
-- 
2.25.1


^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH v3 05/12] test/bpf: use compiler atomics for calculation
  2021-11-17  8:21 ` [PATCH v3 00/12] use compiler atomic builtins for app modules Joyce Kong
                     ` (3 preceding siblings ...)
  2021-11-17  8:21   ` [PATCH v3 04/12] test/stack_perf: use compiler atomics for lcore sync Joyce Kong
@ 2021-11-17  8:21   ` Joyce Kong
  2021-11-17  8:21   ` [PATCH v3 06/12] test/func_reentrancy: use compiler atomics for data sync Joyce Kong
                     ` (7 subsequent siblings)
  12 siblings, 0 replies; 36+ messages in thread
From: Joyce Kong @ 2021-11-17  8:21 UTC (permalink / raw)
  To: Konstantin Ananyev
  Cc: dev, honnappa.nagarahalli, nd, Joyce Kong, Ruifeng Wang

Convert rte_atomic usages to compiler atomic built-ins
for calculation in bpf test cases.

Signed-off-by: Joyce Kong <joyce.kong@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
---
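For illustration only, a standalone sketch of __atomic_fetch_add() applied
directly to plain unsigned fields, so no cast to rte_atomicNN_t is needed.
'struct sample' and xadd_demo() are hypothetical; the struct used by the
test is not reproduced here. The value operand is converted to the type of
the target, so adding (uint64_t)-1 wraps both fields back to zero.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the structure checked by the test. */
struct sample {
	uint32_t u32;
	uint64_t u64;
};

static int xadd_demo(void)
{
	struct sample s = { 0, 0 };
	uint64_t rv;

	rv = 1;
	__atomic_fetch_add(&s.u32, rv, __ATOMIC_RELAXED);
	__atomic_fetch_add(&s.u64, rv, __ATOMIC_RELAXED);

	/* All bits set: truncated to 0xFFFFFFFF for the 32-bit field. */
	rv = (uint64_t)-1;
	__atomic_fetch_add(&s.u32, rv, __ATOMIC_RELAXED);
	__atomic_fetch_add(&s.u64, rv, __ATOMIC_RELAXED);

	/* Unsigned wrap-around brings both fields back to zero. */
	return s.u32 == 0 && s.u64 == 0;
}
```

xadd_demo() should return 1.
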
 app/test/test_bpf.c | 28 ++++++++++++++--------------
 1 file changed, 14 insertions(+), 14 deletions(-)

diff --git a/app/test/test_bpf.c b/app/test/test_bpf.c
index e3e9a1b0b5..b8be1e3d30 100644
--- a/app/test/test_bpf.c
+++ b/app/test/test_bpf.c
@@ -1569,32 +1569,32 @@ test_xadd1_check(uint64_t rc, const void *arg)
 	memset(&dfe, 0, sizeof(dfe));
 
 	rv = 1;
-	rte_atomic32_add((rte_atomic32_t *)&dfe.u32, rv);
-	rte_atomic64_add((rte_atomic64_t *)&dfe.u64, rv);
+	__atomic_fetch_add(&dfe.u32, rv, __ATOMIC_RELAXED);
+	__atomic_fetch_add(&dfe.u64, rv, __ATOMIC_RELAXED);
 
 	rv = -1;
-	rte_atomic32_add((rte_atomic32_t *)&dfe.u32, rv);
-	rte_atomic64_add((rte_atomic64_t *)&dfe.u64, rv);
+	__atomic_fetch_add(&dfe.u32, rv, __ATOMIC_RELAXED);
+	__atomic_fetch_add(&dfe.u64, rv, __ATOMIC_RELAXED);
 
 	rv = (int32_t)TEST_FILL_1;
-	rte_atomic32_add((rte_atomic32_t *)&dfe.u32, rv);
-	rte_atomic64_add((rte_atomic64_t *)&dfe.u64, rv);
+	__atomic_fetch_add(&dfe.u32, rv, __ATOMIC_RELAXED);
+	__atomic_fetch_add(&dfe.u64, rv, __ATOMIC_RELAXED);
 
 	rv = TEST_MUL_1;
-	rte_atomic32_add((rte_atomic32_t *)&dfe.u32, rv);
-	rte_atomic64_add((rte_atomic64_t *)&dfe.u64, rv);
+	__atomic_fetch_add(&dfe.u32, rv, __ATOMIC_RELAXED);
+	__atomic_fetch_add(&dfe.u64, rv, __ATOMIC_RELAXED);
 
 	rv = TEST_MUL_2;
-	rte_atomic32_add((rte_atomic32_t *)&dfe.u32, rv);
-	rte_atomic64_add((rte_atomic64_t *)&dfe.u64, rv);
+	__atomic_fetch_add(&dfe.u32, rv, __ATOMIC_RELAXED);
+	__atomic_fetch_add(&dfe.u64, rv, __ATOMIC_RELAXED);
 
 	rv = TEST_JCC_2;
-	rte_atomic32_add((rte_atomic32_t *)&dfe.u32, rv);
-	rte_atomic64_add((rte_atomic64_t *)&dfe.u64, rv);
+	__atomic_fetch_add(&dfe.u32, rv, __ATOMIC_RELAXED);
+	__atomic_fetch_add(&dfe.u64, rv, __ATOMIC_RELAXED);
 
 	rv = TEST_JCC_3;
-	rte_atomic32_add((rte_atomic32_t *)&dfe.u32, rv);
-	rte_atomic64_add((rte_atomic64_t *)&dfe.u64, rv);
+	__atomic_fetch_add(&dfe.u32, rv, __ATOMIC_RELAXED);
+	__atomic_fetch_add(&dfe.u64, rv, __ATOMIC_RELAXED);
 
 	return cmp_res(__func__, 1, rc, &dfe, dft, sizeof(dfe));
 }
-- 
2.25.1


^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH v3 06/12] test/func_reentrancy: use compiler atomics for data sync
  2021-11-17  8:21 ` [PATCH v3 00/12] use compiler atomic builtins for app modules Joyce Kong
                     ` (4 preceding siblings ...)
  2021-11-17  8:21   ` [PATCH v3 05/12] test/bpf: use compiler atomics for calculation Joyce Kong
@ 2021-11-17  8:21   ` Joyce Kong
  2021-11-17  8:21   ` [PATCH v3 07/12] app/eventdev: use compiler atomics for shared " Joyce Kong
                     ` (6 subsequent siblings)
  12 siblings, 0 replies; 36+ messages in thread
From: Joyce Kong @ 2021-11-17  8:21 UTC (permalink / raw)
  To: Olivier Matz, Andrew Rybchenko, Bruce Richardson,
	Vladimir Medvedkin, Honnappa Nagarahalli, Konstantin Ananyev,
	Anatoly Burakov, Yipeng Wang, Sameh Gobriel
  Cc: dev, nd, Joyce Kong, Ruifeng Wang

Convert rte_atomic usages to compiler atomic built-ins
for shared data sync in func_reentrancy test cases.

Signed-off-by: Joyce Kong <joyce.kong@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
---
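For illustration only, a standalone sketch of a shared object counter
incremented with RELAXED ordering from several threads and read once all of
them have finished. It is not the test code: pthreads replace the lcores,
and NB_THREADS, NB_ITER, creator() and count_demo() are hypothetical names.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

#define NB_THREADS 4
#define NB_ITER 1000

static uint32_t obj_count; /* named as in test_func_reentrancy.c */

static void *creator(void *arg)
{
	int i;

	(void)arg;
	/* Each increment is atomic; no ordering with other data is needed. */
	for (i = 0; i < NB_ITER; i++)
		__atomic_fetch_add(&obj_count, 1, __ATOMIC_RELAXED);
	return NULL;
}

static uint32_t count_demo(void)
{
	pthread_t t[NB_THREADS];
	int i, n = 0;

	__atomic_store_n(&obj_count, 0, __ATOMIC_RELAXED);

	/* Start the threads; only successfully created ones are joined. */
	for (i = 0; i < NB_THREADS; i++)
		if (pthread_create(&t[n], NULL, creator, NULL) == 0)
			n++;
	for (i = 0; i < n; i++)
		pthread_join(t[i], NULL);

	/* After the joins the total is n * NB_ITER. */
	return __atomic_load_n(&obj_count, __ATOMIC_RELAXED);
}
```

Built with -pthread, and assuming all four threads start, count_demo()
should return 4000.
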
 app/test/test_func_reentrancy.c | 27 +++++++++++++--------------
 1 file changed, 13 insertions(+), 14 deletions(-)

diff --git a/app/test/test_func_reentrancy.c b/app/test/test_func_reentrancy.c
index 838ab6f0f9..7825c6cb86 100644
--- a/app/test/test_func_reentrancy.c
+++ b/app/test/test_func_reentrancy.c
@@ -20,7 +20,6 @@
 #include <rte_eal.h>
 #include <rte_per_lcore.h>
 #include <rte_lcore.h>
-#include <rte_atomic.h>
 #include <rte_branch_prediction.h>
 #include <rte_ring.h>
 #include <rte_mempool.h>
@@ -54,12 +53,12 @@ typedef void (*case_clean_t)(unsigned lcore_id);
 
 #define MAX_LCORES	(RTE_MAX_MEMZONE / (MAX_ITER_MULTI * 4U))
 
-static rte_atomic32_t obj_count = RTE_ATOMIC32_INIT(0);
-static rte_atomic32_t synchro = RTE_ATOMIC32_INIT(0);
+static uint32_t obj_count;
+static uint32_t synchro;
 
 #define WAIT_SYNCHRO_FOR_WORKERS()   do { \
 	if (lcore_self != rte_get_main_lcore())                  \
-		while (rte_atomic32_read(&synchro) == 0);        \
+		rte_wait_until_equal_32(&synchro, 1, __ATOMIC_RELAXED); \
 } while(0)
 
 /*
@@ -72,7 +71,7 @@ test_eal_init_once(__rte_unused void *arg)
 
 	WAIT_SYNCHRO_FOR_WORKERS();
 
-	rte_atomic32_set(&obj_count, 1); /* silent the check in the caller */
+	__atomic_store_n(&obj_count, 1, __ATOMIC_RELAXED); /* silent the check in the caller */
 	if (rte_eal_init(0, NULL) != -1)
 		return -1;
 
@@ -116,7 +115,7 @@ ring_create_lookup(__rte_unused void *arg)
 	for (i = 0; i < MAX_ITER_ONCE; i++) {
 		rp = rte_ring_create("fr_test_once", 4096, SOCKET_ID_ANY, 0);
 		if (rp != NULL)
-			rte_atomic32_inc(&obj_count);
+			__atomic_fetch_add(&obj_count, 1, __ATOMIC_RELAXED);
 	}
 
 	/* create/lookup new ring several times */
@@ -183,7 +182,7 @@ mempool_create_lookup(__rte_unused void *arg)
 					my_obj_init, NULL,
 					SOCKET_ID_ANY, 0);
 		if (mp != NULL)
-			rte_atomic32_inc(&obj_count);
+			__atomic_fetch_add(&obj_count, 1, __ATOMIC_RELAXED);
 	}
 
 	/* create/lookup new ring several times */
@@ -250,7 +249,7 @@ hash_create_free(__rte_unused void *arg)
 	for (i = 0; i < MAX_ITER_ONCE; i++) {
 		handle = rte_hash_create(&hash_params);
 		if (handle != NULL)
-			rte_atomic32_inc(&obj_count);
+			__atomic_fetch_add(&obj_count, 1, __ATOMIC_RELAXED);
 	}
 
 	/* create mutiple times simultaneously */
@@ -318,7 +317,7 @@ fbk_create_free(__rte_unused void *arg)
 	for (i = 0; i < MAX_ITER_ONCE; i++) {
 		handle = rte_fbk_hash_create(&fbk_params);
 		if (handle != NULL)
-			rte_atomic32_inc(&obj_count);
+			__atomic_fetch_add(&obj_count, 1, __ATOMIC_RELAXED);
 	}
 
 	/* create mutiple fbk tables simultaneously */
@@ -384,7 +383,7 @@ lpm_create_free(__rte_unused void *arg)
 	for (i = 0; i < MAX_ITER_ONCE; i++) {
 		lpm = rte_lpm_create("fr_test_once",  SOCKET_ID_ANY, &config);
 		if (lpm != NULL)
-			rte_atomic32_inc(&obj_count);
+			__atomic_fetch_add(&obj_count, 1, __ATOMIC_RELAXED);
 	}
 
 	/* create mutiple fbk tables simultaneously */
@@ -445,8 +444,8 @@ launch_test(struct test_case *pt_case)
 	if (pt_case->func == NULL)
 		return -1;
 
-	rte_atomic32_set(&obj_count, 0);
-	rte_atomic32_set(&synchro, 0);
+	__atomic_store_n(&obj_count, 0, __ATOMIC_RELAXED);
+	__atomic_store_n(&synchro, 0, __ATOMIC_RELAXED);
 
 	cores = RTE_MIN(rte_lcore_count(), MAX_LCORES);
 	RTE_LCORE_FOREACH_WORKER(lcore_id) {
@@ -456,7 +455,7 @@ launch_test(struct test_case *pt_case)
 		rte_eal_remote_launch(pt_case->func, pt_case->arg, lcore_id);
 	}
 
-	rte_atomic32_set(&synchro, 1);
+	__atomic_store_n(&synchro, 1, __ATOMIC_RELAXED);
 
 	if (pt_case->func(pt_case->arg) < 0)
 		ret = -1;
@@ -471,7 +470,7 @@ launch_test(struct test_case *pt_case)
 			pt_case->clean(lcore_id);
 	}
 
-	count = rte_atomic32_read(&obj_count);
+	count = __atomic_load_n(&obj_count, __ATOMIC_RELAXED);
 	if (count != 1) {
 		printf("%s: common object allocated %d times (should be 1)\n",
 			pt_case->name, count);
-- 
2.25.1


^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH v3 07/12] app/eventdev: use compiler atomics for shared data sync
  2021-11-17  8:21 ` [PATCH v3 00/12] use compiler atomic builtins for app modules Joyce Kong
                     ` (5 preceding siblings ...)
  2021-11-17  8:21   ` [PATCH v3 06/12] test/func_reentrancy: use compiler atomics for data sync Joyce Kong
@ 2021-11-17  8:21   ` Joyce Kong
  2021-11-17  8:21   ` [PATCH v3 08/12] app/crypto: use compiler atomic builtins for display sync Joyce Kong
                     ` (5 subsequent siblings)
  12 siblings, 0 replies; 36+ messages in thread
From: Joyce Kong @ 2021-11-17  8:21 UTC (permalink / raw)
  To: Jerin Jacob; +Cc: dev, honnappa.nagarahalli, nd, Joyce Kong, Ruifeng Wang

Convert rte_atomic usages to compiler atomic built-ins
for shared data sync in eventdev cases.

Signed-off-by: Joyce Kong <joyce.kong@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
---
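For illustration only, a standalone sketch of how the outstanding-packet
counter behaves once it is a plain uint64_t. drain_demo() is a hypothetical
helper. On an unsigned value a direct "<= 0" test is equivalent to "== 0",
so an extra decrement would wrap instead of going negative; reading the
value into an int64_t, as order_launch_lcores() does, keeps the signed
comparison meaningful on the usual two's-complement targets.

```c
#include <assert.h>
#include <stdint.h>

static uint64_t outstand_pkts; /* named as in test_order_common.h */

/* Start with nb_pkts outstanding, process nb_processed of them,
 * and return what remains, viewed as a signed value.
 */
static int64_t drain_demo(uint64_t nb_pkts, uint64_t nb_processed)
{
	uint64_t i;

	__atomic_store_n(&outstand_pkts, nb_pkts, __ATOMIC_RELAXED);
	for (i = 0; i < nb_processed; i++)
		__atomic_sub_fetch(&outstand_pkts, 1, __ATOMIC_RELAXED);

	/* A wrapped counter (UINT64_MAX) reads back as -1 when signed. */
	return (int64_t)__atomic_load_n(&outstand_pkts, __ATOMIC_RELAXED);
}
```

With GCC or Clang, drain_demo(3, 1) should return 2, drain_demo(3, 3)
should return 0, and drain_demo(3, 4) should return -1.
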
 app/test-eventdev/evt_main.c          | 1 -
 app/test-eventdev/test_order_atq.c    | 4 ++--
 app/test-eventdev/test_order_common.c | 4 ++--
 app/test-eventdev/test_order_common.h | 8 ++++----
 app/test-eventdev/test_order_queue.c  | 4 ++--
 5 files changed, 10 insertions(+), 11 deletions(-)

diff --git a/app/test-eventdev/evt_main.c b/app/test-eventdev/evt_main.c
index 3534aabca7..194c980c7a 100644
--- a/app/test-eventdev/evt_main.c
+++ b/app/test-eventdev/evt_main.c
@@ -6,7 +6,6 @@
 #include <unistd.h>
 #include <signal.h>
 
-#include <rte_atomic.h>
 #include <rte_debug.h>
 #include <rte_eal.h>
 #include <rte_eventdev.h>
diff --git a/app/test-eventdev/test_order_atq.c b/app/test-eventdev/test_order_atq.c
index 71215a07b6..2fee4b4daa 100644
--- a/app/test-eventdev/test_order_atq.c
+++ b/app/test-eventdev/test_order_atq.c
@@ -28,7 +28,7 @@ order_atq_worker(void *arg, const bool flow_id_cap)
 		uint16_t event = rte_event_dequeue_burst(dev_id, port,
 					&ev, 1, 0);
 		if (!event) {
-			if (rte_atomic64_read(outstand_pkts) <= 0)
+			if (__atomic_load_n(outstand_pkts, __ATOMIC_RELAXED) <= 0)
 				break;
 			rte_pause();
 			continue;
@@ -64,7 +64,7 @@ order_atq_worker_burst(void *arg, const bool flow_id_cap)
 				BURST_SIZE, 0);
 
 		if (nb_rx == 0) {
-			if (rte_atomic64_read(outstand_pkts) <= 0)
+			if (__atomic_load_n(outstand_pkts, __ATOMIC_RELAXED) <= 0)
 				break;
 			rte_pause();
 			continue;
diff --git a/app/test-eventdev/test_order_common.c b/app/test-eventdev/test_order_common.c
index d7760061ba..ff7813f9c2 100644
--- a/app/test-eventdev/test_order_common.c
+++ b/app/test-eventdev/test_order_common.c
@@ -187,7 +187,7 @@ order_test_setup(struct evt_test *test, struct evt_options *opt)
 		evt_err("failed to allocate t->expected_flow_seq memory");
 		goto exp_nomem;
 	}
-	rte_atomic64_set(&t->outstand_pkts, opt->nb_pkts);
+	__atomic_store_n(&t->outstand_pkts, opt->nb_pkts, __ATOMIC_RELAXED);
 	t->err = false;
 	t->nb_pkts = opt->nb_pkts;
 	t->nb_flows = opt->nb_flows;
@@ -294,7 +294,7 @@ order_launch_lcores(struct evt_test *test, struct evt_options *opt,
 
 	while (t->err == false) {
 		uint64_t new_cycles = rte_get_timer_cycles();
-		int64_t remaining = rte_atomic64_read(&t->outstand_pkts);
+		int64_t remaining = __atomic_load_n(&t->outstand_pkts, __ATOMIC_RELAXED);
 
 		if (remaining <= 0) {
 			t->result = EVT_TEST_SUCCESS;
diff --git a/app/test-eventdev/test_order_common.h b/app/test-eventdev/test_order_common.h
index cd9d6009ec..92781d9587 100644
--- a/app/test-eventdev/test_order_common.h
+++ b/app/test-eventdev/test_order_common.h
@@ -48,7 +48,7 @@ struct test_order {
 	 * The atomic_* is an expensive operation,Since it is a functional test,
 	 * We are using the atomic_ operation to reduce the code complexity.
 	 */
-	rte_atomic64_t outstand_pkts;
+	uint64_t outstand_pkts;
 	enum evt_test_result result;
 	uint32_t nb_flows;
 	uint64_t nb_pkts;
@@ -95,7 +95,7 @@ static __rte_always_inline void
 order_process_stage_1(struct test_order *const t,
 		struct rte_event *const ev, const uint32_t nb_flows,
 		uint32_t *const expected_flow_seq,
-		rte_atomic64_t *const outstand_pkts)
+		uint64_t *const outstand_pkts)
 {
 	const uint32_t flow = (uintptr_t)ev->mbuf % nb_flows;
 	/* compare the seqn against expected value */
@@ -113,7 +113,7 @@ order_process_stage_1(struct test_order *const t,
 	 */
 	expected_flow_seq[flow]++;
 	rte_pktmbuf_free(ev->mbuf);
-	rte_atomic64_sub(outstand_pkts, 1);
+	__atomic_sub_fetch(outstand_pkts, 1, __ATOMIC_RELAXED);
 }
 
 static __rte_always_inline void
@@ -132,7 +132,7 @@ order_process_stage_invalid(struct test_order *const t,
 	const uint8_t port = w->port_id;\
 	const uint32_t nb_flows = t->nb_flows;\
 	uint32_t *expected_flow_seq = t->expected_flow_seq;\
-	rte_atomic64_t *outstand_pkts = &t->outstand_pkts;\
+	uint64_t *outstand_pkts = &t->outstand_pkts;\
 	if (opt->verbose_level > 1)\
 		printf("%s(): lcore %d dev_id %d port=%d\n",\
 			__func__, rte_lcore_id(), dev_id, port)
diff --git a/app/test-eventdev/test_order_queue.c b/app/test-eventdev/test_order_queue.c
index 621367805a..80eaea5cf5 100644
--- a/app/test-eventdev/test_order_queue.c
+++ b/app/test-eventdev/test_order_queue.c
@@ -28,7 +28,7 @@ order_queue_worker(void *arg, const bool flow_id_cap)
 		uint16_t event = rte_event_dequeue_burst(dev_id, port,
 					&ev, 1, 0);
 		if (!event) {
-			if (rte_atomic64_read(outstand_pkts) <= 0)
+			if (__atomic_load_n(outstand_pkts, __ATOMIC_RELAXED) <= 0)
 				break;
 			rte_pause();
 			continue;
@@ -64,7 +64,7 @@ order_queue_worker_burst(void *arg, const bool flow_id_cap)
 				BURST_SIZE, 0);
 
 		if (nb_rx == 0) {
-			if (rte_atomic64_read(outstand_pkts) <= 0)
+			if (__atomic_load_n(outstand_pkts, __ATOMIC_RELAXED) <= 0)
 				break;
 			rte_pause();
 			continue;
-- 
2.25.1
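
Note on the counter pattern used in this patch: the sketch below is
not taken from the DPDK tree. It is a minimal illustration, under the
assumption of a single shared countdown counter, of how workers can
decrement with __atomic_sub_fetch and pollers can test for completion
with __atomic_load_n, both with relaxed ordering. The helper names
(consume_one, all_done) are hypothetical. The sketch uses int64_t so
that "<= 0" also catches an over-decrement; the patch declares the
field as uint64_t, for which "<= 0" is equivalent to "== 0".

```c
#include <stdint.h>

/* Hypothetical helper: a worker accounts for one processed packet. */
static void consume_one(int64_t *outstanding)
{
	/* Relaxed is enough when only the counter value itself matters. */
	__atomic_sub_fetch(outstanding, 1, __ATOMIC_RELAXED);
}

/* Hypothetical helper: a poller checks whether all packets are done. */
static int all_done(int64_t *outstanding)
{
	return __atomic_load_n(outstanding, __ATOMIC_RELAXED) <= 0;
}
```

Relaxed ordering only guarantees atomicity of the counter itself; if
other data had to be published along with the count, acquire/release
ordering would probably be needed instead.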



* [PATCH v3 08/12] app/crypto: use compiler atomic builtins for display sync
  2021-11-17  8:21 ` [PATCH v3 00/12] use compiler atomic builtins for app modules Joyce Kong
                     ` (6 preceding siblings ...)
  2021-11-17  8:21   ` [PATCH v3 07/12] app/eventdev: use compiler atomics for shared " Joyce Kong
@ 2021-11-17  8:21   ` Joyce Kong
  2021-11-17  8:21   ` [PATCH v3 09/12] app/compress: " Joyce Kong
                     ` (4 subsequent siblings)
  12 siblings, 0 replies; 36+ messages in thread
From: Joyce Kong @ 2021-11-17  8:21 UTC (permalink / raw)
  To: Declan Doherty, Ciara Power
  Cc: dev, honnappa.nagarahalli, nd, Joyce Kong, Ruifeng Wang

Convert rte_atomic16_test_and_set usage to the compiler atomic
CAS built-in for display sync in the crypto perf tests.

Signed-off-by: Joyce Kong <joyce.kong@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
---
 app/test-crypto-perf/cperf_test_latency.c        | 6 ++++--
 app/test-crypto-perf/cperf_test_pmd_cyclecount.c | 9 ++++++---
 app/test-crypto-perf/cperf_test_throughput.c     | 9 ++++++---
 app/test-crypto-perf/cperf_test_verify.c         | 9 ++++++---
 4 files changed, 22 insertions(+), 11 deletions(-)
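
Note on the "print once" pattern: the following is a minimal sketch,
not code from this patch, of how a compare-and-swap on a plain
uint16_t flag can replace rte_atomic16_test_and_set. The helper name
claim_once is hypothetical. Only the first caller sees the flag go
from 0 to 1 and gets a true result.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: returns true for exactly one caller,
 * the one that changes *flag from 0 to 1.
 */
static bool claim_once(uint16_t *flag)
{
	/*
	 * 'expected' is local on purpose: on failure the builtin
	 * overwrites it with the current value of *flag, so it must
	 * be reset to 0 before every new attempt.
	 */
	uint16_t expected = 0;

	return __atomic_compare_exchange_n(flag, &expected, 1,
			0 /* strong CAS */,
			__ATOMIC_RELAXED, __ATOMIC_RELAXED);
}
```

In the patch, "exp" is shared by the CAS calls in the if and else
branches; that works there because only one branch runs per pass.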

diff --git a/app/test-crypto-perf/cperf_test_latency.c b/app/test-crypto-perf/cperf_test_latency.c
index 69f55de50a..ce49feaba9 100644
--- a/app/test-crypto-perf/cperf_test_latency.c
+++ b/app/test-crypto-perf/cperf_test_latency.c
@@ -126,7 +126,7 @@ cperf_latency_test_runner(void *arg)
 	uint8_t burst_size_idx = 0;
 	uint32_t imix_idx = 0;
 
-	static rte_atomic16_t display_once = RTE_ATOMIC16_INIT(0);
+	static uint16_t display_once;
 
 	if (ctx == NULL)
 		return 0;
@@ -307,8 +307,10 @@ cperf_latency_test_runner(void *arg)
 		time_max = tunit*(double)(tsc_max) / tsc_hz;
 		time_min = tunit*(double)(tsc_min) / tsc_hz;
 
+		uint16_t exp = 0;
 		if (ctx->options->csv) {
-			if (rte_atomic16_test_and_set(&display_once))
+			if (__atomic_compare_exchange_n(&display_once, &exp, 1, 0,
+					__ATOMIC_RELAXED, __ATOMIC_RELAXED))
 				printf("\n# lcore, Buffer Size, Burst Size, Pakt Seq #, "
 						"cycles, time (us)");
 
diff --git a/app/test-crypto-perf/cperf_test_pmd_cyclecount.c b/app/test-crypto-perf/cperf_test_pmd_cyclecount.c
index fda97e8ab9..ba1f104f72 100644
--- a/app/test-crypto-perf/cperf_test_pmd_cyclecount.c
+++ b/app/test-crypto-perf/cperf_test_pmd_cyclecount.c
@@ -404,7 +404,7 @@ cperf_pmd_cyclecount_test_runner(void *test_ctx)
 	state.lcore = rte_lcore_id();
 	state.linearize = 0;
 
-	static rte_atomic16_t display_once = RTE_ATOMIC16_INIT(0);
+	static uint16_t display_once;
 	static bool warmup = true;
 
 	/*
@@ -449,8 +449,10 @@ cperf_pmd_cyclecount_test_runner(void *test_ctx)
 			continue;
 		}
 
+		uint16_t exp = 0;
 		if (!opts->csv) {
-			if (rte_atomic16_test_and_set(&display_once))
+			if (__atomic_compare_exchange_n(&display_once, &exp, 1, 0,
+					__ATOMIC_RELAXED, __ATOMIC_RELAXED))
 				printf(PRETTY_HDR_FMT, "lcore id", "Buf Size",
 						"Burst Size", "Enqueued",
 						"Dequeued", "Enq Retries",
@@ -466,7 +468,8 @@ cperf_pmd_cyclecount_test_runner(void *test_ctx)
 					state.cycles_per_enq,
 					state.cycles_per_deq);
 		} else {
-			if (rte_atomic16_test_and_set(&display_once))
+			if (__atomic_compare_exchange_n(&display_once, &exp, 1, 0,
+					__ATOMIC_RELAXED, __ATOMIC_RELAXED))
 				printf(CSV_HDR_FMT, "# lcore id", "Buf Size",
 						"Burst Size", "Enqueued",
 						"Dequeued", "Enq Retries",
diff --git a/app/test-crypto-perf/cperf_test_throughput.c b/app/test-crypto-perf/cperf_test_throughput.c
index 739ed9e573..51512af2ad 100644
--- a/app/test-crypto-perf/cperf_test_throughput.c
+++ b/app/test-crypto-perf/cperf_test_throughput.c
@@ -113,7 +113,7 @@ cperf_throughput_test_runner(void *test_ctx)
 	uint8_t burst_size_idx = 0;
 	uint32_t imix_idx = 0;
 
-	static rte_atomic16_t display_once = RTE_ATOMIC16_INIT(0);
+	static uint16_t display_once;
 
 	struct rte_crypto_op *ops[ctx->options->max_burst_size];
 	struct rte_crypto_op *ops_processed[ctx->options->max_burst_size];
@@ -281,8 +281,10 @@ cperf_throughput_test_runner(void *test_ctx)
 		double cycles_per_packet = ((double)tsc_duration /
 				ctx->options->total_ops);
 
+		uint16_t exp = 0;
 		if (!ctx->options->csv) {
-			if (rte_atomic16_test_and_set(&display_once))
+			if (__atomic_compare_exchange_n(&display_once, &exp, 1, 0,
+					__ATOMIC_RELAXED, __ATOMIC_RELAXED))
 				printf("%12s%12s%12s%12s%12s%12s%12s%12s%12s%12s\n\n",
 					"lcore id", "Buf Size", "Burst Size",
 					"Enqueued", "Dequeued", "Failed Enq",
@@ -302,7 +304,8 @@ cperf_throughput_test_runner(void *test_ctx)
 					throughput_gbps,
 					cycles_per_packet);
 		} else {
-			if (rte_atomic16_test_and_set(&display_once))
+			if (__atomic_compare_exchange_n(&display_once, &exp, 1, 0,
+					__ATOMIC_RELAXED, __ATOMIC_RELAXED))
 				printf("#lcore id,Buffer Size(B),"
 					"Burst Size,Enqueued,Dequeued,Failed Enq,"
 					"Failed Deq,Ops(Millions),Throughput(Gbps),"
diff --git a/app/test-crypto-perf/cperf_test_verify.c b/app/test-crypto-perf/cperf_test_verify.c
index 1962438034..496eb0de00 100644
--- a/app/test-crypto-perf/cperf_test_verify.c
+++ b/app/test-crypto-perf/cperf_test_verify.c
@@ -241,7 +241,7 @@ cperf_verify_test_runner(void *test_ctx)
 	uint64_t ops_deqd = 0, ops_deqd_total = 0, ops_deqd_failed = 0;
 	uint64_t ops_failed = 0;
 
-	static rte_atomic16_t display_once = RTE_ATOMIC16_INIT(0);
+	static uint16_t display_once;
 
 	uint64_t i;
 	uint16_t ops_unused = 0;
@@ -383,8 +383,10 @@ cperf_verify_test_runner(void *test_ctx)
 		ops_deqd_total += ops_deqd;
 	}
 
+	uint16_t exp = 0;
 	if (!ctx->options->csv) {
-		if (rte_atomic16_test_and_set(&display_once))
+		if (__atomic_compare_exchange_n(&display_once, &exp, 1, 0,
+				__ATOMIC_RELAXED, __ATOMIC_RELAXED))
 			printf("%12s%12s%12s%12s%12s%12s%12s%12s\n\n",
 				"lcore id", "Buf Size", "Burst size",
 				"Enqueued", "Dequeued", "Failed Enq",
@@ -401,7 +403,8 @@ cperf_verify_test_runner(void *test_ctx)
 				ops_deqd_failed,
 				ops_failed);
 	} else {
-		if (rte_atomic16_test_and_set(&display_once))
+		if (__atomic_compare_exchange_n(&display_once, &exp, 1, 0,
+				__ATOMIC_RELAXED, __ATOMIC_RELAXED))
 			printf("\n# lcore id, Buffer Size(B), "
 				"Burst Size,Enqueued,Dequeued,Failed Enq,"
 				"Failed Deq,Failed Ops\n");
-- 
2.25.1



* [PATCH v3 09/12] app/compress: use compiler atomic builtins for display sync
  2021-11-17  8:21 ` [PATCH v3 00/12] use compiler atomic builtins for app modules Joyce Kong
                     ` (7 preceding siblings ...)
  2021-11-17  8:21   ` [PATCH v3 08/12] app/crypto: use compiler atomic builtins for display sync Joyce Kong
@ 2021-11-17  8:21   ` Joyce Kong
  2021-11-17  8:21   ` [PATCH v3 10/12] app/testpmd: remove atomic operations for port status Joyce Kong
                     ` (3 subsequent siblings)
  12 siblings, 0 replies; 36+ messages in thread
From: Joyce Kong @ 2021-11-17  8:21 UTC (permalink / raw)
  Cc: dev, honnappa.nagarahalli, nd, Joyce Kong, Ruifeng Wang

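Convert rte_atomic16_test_and_set usage to the compiler atomic
CAS built-in for display sync. In the cyclecount test, display_once
is instead checked as a plain variable under the existing print
spinlock.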

Signed-off-by: Joyce Kong <joyce.kong@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
---
 app/test-compress-perf/comp_perf_test_common.h    |  2 +-
 .../comp_perf_test_cyclecount.c                   | 15 +++++++--------
 .../comp_perf_test_throughput.c                   | 10 +++++++---
 app/test-compress-perf/comp_perf_test_verify.c    |  6 ++++--
 4 files changed, 19 insertions(+), 14 deletions(-)
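
Note on the cyclecount change: once the header and the data row are
printed under the same lock, the flag no longer needs to be atomic.
The sketch below illustrates that idea only; it is not the DPDK code.
The tiny lock built from compiler builtins stands in for
rte_spinlock_t, and the names (print_lock, header_done, print_row)
are hypothetical.

```c
#include <stdint.h>
#include <stdio.h>

static uint8_t print_lock;   /* 0 = free, 1 = held (stand-in lock) */
static uint16_t header_done; /* only accessed while the lock is held */

static void lock(uint8_t *l)
{
	/* Spin until the previous value was 0, i.e. we took the lock. */
	while (__atomic_exchange_n(l, 1, __ATOMIC_ACQUIRE))
		;
}

static void unlock(uint8_t *l)
{
	__atomic_store_n(l, 0, __ATOMIC_RELEASE);
}

/* Returns 1 if this call also printed the table header. */
static int print_row(unsigned int lcore, unsigned int value)
{
	int printed_header = 0;

	lock(&print_lock);
	if (header_done == 0) {
		header_done = 1;
		printf("%12s%12s\n", "lcore id", "value");
		printed_header = 1;
	}
	printf("%12u%12u\n", lcore, value);
	unlock(&print_lock);

	return printed_header;
}
```

Holding one lock across both prints also keeps the header from being
interleaved with a row printed by another lcore.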

diff --git a/app/test-compress-perf/comp_perf_test_common.h b/app/test-compress-perf/comp_perf_test_common.h
index 72705c6a2b..d039e5a29a 100644
--- a/app/test-compress-perf/comp_perf_test_common.h
+++ b/app/test-compress-perf/comp_perf_test_common.h
@@ -14,7 +14,7 @@ struct cperf_mem_resources {
 	uint16_t qp_id;
 	uint8_t lcore_id;
 
-	rte_atomic16_t print_info_once;
+	uint16_t print_info_once;
 
 	uint32_t total_bufs;
 	uint8_t *compressed_data;
diff --git a/app/test-compress-perf/comp_perf_test_cyclecount.c b/app/test-compress-perf/comp_perf_test_cyclecount.c
index c875ddbdac..da55b02b74 100644
--- a/app/test-compress-perf/comp_perf_test_cyclecount.c
+++ b/app/test-compress-perf/comp_perf_test_cyclecount.c
@@ -466,7 +466,7 @@ cperf_cyclecount_test_runner(void *test_ctx)
 	struct cperf_cyclecount_ctx *ctx = test_ctx;
 	struct comp_test_data *test_data = ctx->ver.options;
 	uint32_t lcore = rte_lcore_id();
-	static rte_atomic16_t display_once = RTE_ATOMIC16_INIT(0);
+	static uint16_t display_once;
 	static rte_spinlock_t print_spinlock;
 	int i;
 
@@ -486,10 +486,12 @@ cperf_cyclecount_test_runner(void *test_ctx)
 
 	ctx->ver.mem.lcore_id = lcore;
 
+	uint16_t exp = 0;
 	/*
 	 * printing information about current compression thread
 	 */
-	if (rte_atomic16_test_and_set(&ctx->ver.mem.print_info_once))
+	if (__atomic_compare_exchange_n(&ctx->ver.mem.print_info_once, &exp,
+				1, 0, __ATOMIC_RELAXED, __ATOMIC_RELAXED))
 		printf("    lcore: %u,"
 				" driver name: %s,"
 				" device name: %s,"
@@ -546,9 +548,10 @@ cperf_cyclecount_test_runner(void *test_ctx)
 			(ctx->ver.mem.total_bufs * test_data->num_iter);
 
 	/* R E P O R T processing */
-	if (rte_atomic16_test_and_set(&display_once)) {
+	rte_spinlock_lock(&print_spinlock);
 
-		rte_spinlock_lock(&print_spinlock);
+	if (display_once == 0) {
+		display_once = 1;
 
 		printf("\nLegend for the table\n"
 		"  - Retries section: number of retries for the following operations:\n"
@@ -576,12 +579,8 @@ cperf_cyclecount_test_runner(void *test_ctx)
 			"setup/op",
 			"[C-e]", "[C-d]",
 			"[D-e]", "[D-d]");
-
-		rte_spinlock_unlock(&print_spinlock);
 	}
 
-	rte_spinlock_lock(&print_spinlock);
-
 	printf("%12u"
 	       "%6u"
 	       "%12zu"
diff --git a/app/test-compress-perf/comp_perf_test_throughput.c b/app/test-compress-perf/comp_perf_test_throughput.c
index 13922b658c..d3dff070b0 100644
--- a/app/test-compress-perf/comp_perf_test_throughput.c
+++ b/app/test-compress-perf/comp_perf_test_throughput.c
@@ -329,15 +329,17 @@ cperf_throughput_test_runner(void *test_ctx)
 	struct cperf_benchmark_ctx *ctx = test_ctx;
 	struct comp_test_data *test_data = ctx->ver.options;
 	uint32_t lcore = rte_lcore_id();
-	static rte_atomic16_t display_once = RTE_ATOMIC16_INIT(0);
+	static uint16_t display_once;
 	int i, ret = EXIT_SUCCESS;
 
 	ctx->ver.mem.lcore_id = lcore;
 
+	uint16_t exp = 0;
 	/*
 	 * printing information about current compression thread
 	 */
-	if (rte_atomic16_test_and_set(&ctx->ver.mem.print_info_once))
+	if (__atomic_compare_exchange_n(&ctx->ver.mem.print_info_once, &exp,
+				1, 0, __ATOMIC_RELAXED, __ATOMIC_RELAXED))
 		printf("    lcore: %u,"
 				" driver name: %s,"
 				" device name: %s,"
@@ -391,7 +393,9 @@ cperf_throughput_test_runner(void *test_ctx)
 	ctx->decomp_gbps = rte_get_tsc_hz() / ctx->decomp_tsc_byte * 8 /
 			1000000000;
 
-	if (rte_atomic16_test_and_set(&display_once)) {
+	exp = 0;
+	if (__atomic_compare_exchange_n(&display_once, &exp, 1, 0,
+			__ATOMIC_RELAXED, __ATOMIC_RELAXED)) {
 		printf("\n%12s%6s%12s%17s%15s%16s\n",
 			"lcore id", "Level", "Comp size", "Comp ratio [%]",
 			"Comp [Gbps]", "Decomp [Gbps]");
diff --git a/app/test-compress-perf/comp_perf_test_verify.c b/app/test-compress-perf/comp_perf_test_verify.c
index 5e13257b79..f6e21368e8 100644
--- a/app/test-compress-perf/comp_perf_test_verify.c
+++ b/app/test-compress-perf/comp_perf_test_verify.c
@@ -388,7 +388,7 @@ cperf_verify_test_runner(void *test_ctx)
 	struct cperf_verify_ctx *ctx = test_ctx;
 	struct comp_test_data *test_data = ctx->options;
 	int ret = EXIT_SUCCESS;
-	static rte_atomic16_t display_once = RTE_ATOMIC16_INIT(0);
+	static uint16_t display_once;
 	uint32_t lcore = rte_lcore_id();
 
 	ctx->mem.lcore_id = lcore;
@@ -427,8 +427,10 @@ cperf_verify_test_runner(void *test_ctx)
 	ctx->ratio = (double) ctx->comp_data_sz /
 			test_data->input_data_sz * 100;
 
+	uint16_t exp = 0;
 	if (!ctx->silent) {
-		if (rte_atomic16_test_and_set(&display_once)) {
+		if (__atomic_compare_exchange_n(&display_once, &exp, 1, 0,
+				__ATOMIC_RELAXED, __ATOMIC_RELAXED)) {
 			printf("%12s%6s%12s%17s\n",
 			    "lcore id", "Level", "Comp size", "Comp ratio [%]");
 		}
-- 
2.25.1



* [PATCH v3 10/12] app/testpmd: remove atomic operations for port status
  2021-11-17  8:21 ` [PATCH v3 00/12] use compiler atomic builtins for app modules Joyce Kong
                     ` (8 preceding siblings ...)
  2021-11-17  8:21   ` [PATCH v3 09/12] app/compress: " Joyce Kong
@ 2021-11-17  8:21   ` Joyce Kong
  2021-11-17  8:21   ` [PATCH v3 11/12] app/bbdev: use compiler atomics for shared data sync Joyce Kong
                     ` (2 subsequent siblings)
  12 siblings, 0 replies; 36+ messages in thread
From: Joyce Kong @ 2021-11-17  8:21 UTC (permalink / raw)
  To: Xiaoyun Li; +Cc: dev, honnappa.nagarahalli, nd, Joyce Kong, Ruifeng Wang

Port status changes do not need to be handled atomically,
because port_status is only modified during initialization
or from the testpmd prompt, not by multiple threads
concurrently. Replace the rte_atomic16_cmpset calls with
plain checks and assignments.

Signed-off-by: Joyce Kong <joyce.kong@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
---
 app/test-pmd/testpmd.c | 58 ++++++++++++++++++++++--------------------
 1 file changed, 31 insertions(+), 27 deletions(-)
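
Note on the replacement pattern: each rte_atomic16_cmpset(ptr, from,
to) becomes a plain "check, then assign". A minimal sketch of that
transition, assuming a single thread modifies the status, is shown
below. The helper port_transition and the enum values are
hypothetical and are not part of testpmd.

```c
#include <stdint.h>

/* Hypothetical state values, for illustration only. */
enum {
	PORT_STOPPED,
	PORT_STARTED,
	PORT_HANDLING,
	PORT_CLOSED
};

/*
 * Hypothetical helper: move *status from 'from' to 'to'.
 * Returns 1 on success, 0 if the port was not in state 'from'.
 * Not thread safe: valid only while one thread changes the status.
 */
static int port_transition(uint16_t *status, uint16_t from, uint16_t to)
{
	if (*status != from)
		return 0;
	*status = to;
	return 1;
}
```

The return value mirrors rte_atomic16_cmpset, so a failed transition
can still be reported the same way as before.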

diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
index a66dfb297c..ed472cacd2 100644
--- a/app/test-pmd/testpmd.c
+++ b/app/test-pmd/testpmd.c
@@ -36,7 +36,6 @@
 #include <rte_alarm.h>
 #include <rte_per_lcore.h>
 #include <rte_lcore.h>
-#include <rte_atomic.h>
 #include <rte_branch_prediction.h>
 #include <rte_mempool.h>
 #include <rte_malloc.h>
@@ -2521,9 +2520,9 @@ setup_hairpin_queues(portid_t pi, portid_t p_pi, uint16_t cnt_pi)
 			continue;
 
 		/* Fail to setup rx queue, return */
-		if (rte_atomic16_cmpset(&(port->port_status),
-					RTE_PORT_HANDLING,
-					RTE_PORT_STOPPED) == 0)
+		if (port->port_status == RTE_PORT_HANDLING)
+			port->port_status = RTE_PORT_STOPPED;
+		else
 			fprintf(stderr,
 				"Port %d can not be set back to stopped\n", pi);
 		fprintf(stderr, "Fail to configure port %d hairpin queues\n",
@@ -2544,9 +2543,9 @@ setup_hairpin_queues(portid_t pi, portid_t p_pi, uint16_t cnt_pi)
 			continue;
 
 		/* Fail to setup rx queue, return */
-		if (rte_atomic16_cmpset(&(port->port_status),
-					RTE_PORT_HANDLING,
-					RTE_PORT_STOPPED) == 0)
+		if (port->port_status == RTE_PORT_HANDLING)
+			port->port_status = RTE_PORT_STOPPED;
+		else
 			fprintf(stderr,
 				"Port %d can not be set back to stopped\n", pi);
 		fprintf(stderr, "Fail to configure port %d hairpin queues\n",
@@ -2729,8 +2728,9 @@ start_port(portid_t pid)
 
 		need_check_link_status = 0;
 		port = &ports[pi];
-		if (rte_atomic16_cmpset(&(port->port_status), RTE_PORT_STOPPED,
-						 RTE_PORT_HANDLING) == 0) {
+		if (port->port_status == RTE_PORT_STOPPED)
+			port->port_status = RTE_PORT_HANDLING;
+		else {
 			fprintf(stderr, "Port %d is now not stopped\n", pi);
 			continue;
 		}
@@ -2766,8 +2766,9 @@ start_port(portid_t pid)
 						     nb_txq + nb_hairpinq,
 						     &(port->dev_conf));
 			if (diag != 0) {
-				if (rte_atomic16_cmpset(&(port->port_status),
-				RTE_PORT_HANDLING, RTE_PORT_STOPPED) == 0)
+				if (port->port_status == RTE_PORT_HANDLING)
+					port->port_status = RTE_PORT_STOPPED;
+				else
 					fprintf(stderr,
 						"Port %d can not be set back to stopped\n",
 						pi);
@@ -2828,9 +2829,9 @@ start_port(portid_t pid)
 					continue;
 
 				/* Fail to setup tx queue, return */
-				if (rte_atomic16_cmpset(&(port->port_status),
-							RTE_PORT_HANDLING,
-							RTE_PORT_STOPPED) == 0)
+				if (port->port_status == RTE_PORT_HANDLING)
+					port->port_status = RTE_PORT_STOPPED;
+				else
 					fprintf(stderr,
 						"Port %d can not be set back to stopped\n",
 						pi);
@@ -2880,9 +2881,9 @@ start_port(portid_t pid)
 					continue;
 
 				/* Fail to setup rx queue, return */
-				if (rte_atomic16_cmpset(&(port->port_status),
-							RTE_PORT_HANDLING,
-							RTE_PORT_STOPPED) == 0)
+				if (port->port_status == RTE_PORT_HANDLING)
+					port->port_status = RTE_PORT_STOPPED;
+				else
 					fprintf(stderr,
 						"Port %d can not be set back to stopped\n",
 						pi);
@@ -2917,16 +2918,18 @@ start_port(portid_t pid)
 				pi, rte_strerror(-diag));
 
 			/* Fail to setup rx queue, return */
-			if (rte_atomic16_cmpset(&(port->port_status),
-				RTE_PORT_HANDLING, RTE_PORT_STOPPED) == 0)
+			if (port->port_status == RTE_PORT_HANDLING)
+				port->port_status = RTE_PORT_STOPPED;
+			else
 				fprintf(stderr,
 					"Port %d can not be set back to stopped\n",
 					pi);
 			continue;
 		}
 
-		if (rte_atomic16_cmpset(&(port->port_status),
-			RTE_PORT_HANDLING, RTE_PORT_STARTED) == 0)
+		if (port->port_status == RTE_PORT_HANDLING)
+			port->port_status = RTE_PORT_STARTED;
+		else
 			fprintf(stderr, "Port %d can not be set into started\n",
 				pi);
 
@@ -3028,8 +3031,9 @@ stop_port(portid_t pid)
 		}
 
 		port = &ports[pi];
-		if (rte_atomic16_cmpset(&(port->port_status), RTE_PORT_STARTED,
-						RTE_PORT_HANDLING) == 0)
+		if (port->port_status == RTE_PORT_STARTED)
+			port->port_status = RTE_PORT_HANDLING;
+		else
 			continue;
 
 		if (hairpin_mode & 0xf) {
@@ -3055,8 +3059,9 @@ stop_port(portid_t pid)
 			RTE_LOG(ERR, EAL, "rte_eth_dev_stop failed for port %u\n",
 				pi);
 
-		if (rte_atomic16_cmpset(&(port->port_status),
-			RTE_PORT_HANDLING, RTE_PORT_STOPPED) == 0)
+		if (port->port_status == RTE_PORT_HANDLING)
+			port->port_status = RTE_PORT_STOPPED;
+		else
 			fprintf(stderr, "Port %d can not be set into stopped\n",
 				pi);
 		need_check_link_status = 1;
@@ -3119,8 +3124,7 @@ close_port(portid_t pid)
 		}
 
 		port = &ports[pi];
-		if (rte_atomic16_cmpset(&(port->port_status),
-			RTE_PORT_CLOSED, RTE_PORT_CLOSED) == 1) {
+		if (port->port_status == RTE_PORT_CLOSED) {
 			fprintf(stderr, "Port %d is already closed\n", pi);
 			continue;
 		}
-- 
2.25.1



* [PATCH v3 11/12] app/bbdev: use compiler atomics for shared data sync
  2021-11-17  8:21 ` [PATCH v3 00/12] use compiler atomic builtins for app modules Joyce Kong
                     ` (9 preceding siblings ...)
  2021-11-17  8:21   ` [PATCH v3 10/12] app/testpmd: remove atomic operations for port status Joyce Kong
@ 2021-11-17  8:21   ` Joyce Kong
  2021-11-17  8:22   ` [PATCH v3 12/12] app: remove unnecessary include of atomic header file Joyce Kong
  2021-11-17 10:02   ` [PATCH v3 00/12] use compiler atomic builtins for app modules David Marchand
  12 siblings, 0 replies; 36+ messages in thread
From: Joyce Kong @ 2021-11-17  8:21 UTC (permalink / raw)
  To: Nicolas Chautru; +Cc: dev, honnappa.nagarahalli, nd, Joyce Kong, Ruifeng Wang

Convert rte_atomic usages to compiler atomic built-ins
for shared data sync in bbdev cases.

Signed-off-by: Joyce Kong <joyce.kong@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
---
 app/test-bbdev/test_bbdev_perf.c | 135 ++++++++++++++-----------------
 1 file changed, 59 insertions(+), 76 deletions(-)
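
Note on the polling loops: the patch replaces open-coded
"while (read(x) == SYNC_WAIT) rte_pause();" loops with
rte_wait_until_equal_16(). The sketch below shows roughly what such a
wait does in its generic form; it is an illustration, not the DPDK
implementation, and the SYNC_WAIT/SYNC_START values are assumed. The
old loop waited while the value was SYNC_WAIT and the new call waits
until it is SYNC_START; the two are equivalent as long as sync only
ever holds those two values.

```c
#include <stdint.h>

/* Assumed values, for illustration only. */
enum {
	SYNC_WAIT = 0,
	SYNC_START = 1
};

/* Sketch of a generic "spin until *addr == expected" helper. */
static void wait_until_equal_16(volatile uint16_t *addr,
		uint16_t expected, int memorder)
{
	while (__atomic_load_n(addr, memorder) != expected)
		; /* DPDK would call rte_pause() here */
}
```

With __ATOMIC_RELAXED the wait only observes the flag itself; it does
not order other memory accesses around it.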

diff --git a/app/test-bbdev/test_bbdev_perf.c b/app/test-bbdev/test_bbdev_perf.c
index 7b4529789b..0fa119a502 100644
--- a/app/test-bbdev/test_bbdev_perf.c
+++ b/app/test-bbdev/test_bbdev_perf.c
@@ -133,7 +133,7 @@ struct test_op_params {
 	uint16_t num_to_process;
 	uint16_t num_lcores;
 	int vector_mask;
-	rte_atomic16_t sync;
+	uint16_t sync;
 	struct test_buffers q_bufs[RTE_MAX_NUMA_NODES][MAX_QUEUES];
 };
 
@@ -148,9 +148,9 @@ struct thread_params {
 	uint8_t iter_count;
 	double iter_average;
 	double bler;
-	rte_atomic16_t nb_dequeued;
-	rte_atomic16_t processing_status;
-	rte_atomic16_t burst_sz;
+	uint16_t nb_dequeued;
+	int16_t processing_status;
+	uint16_t burst_sz;
 	struct test_op_params *op_params;
 	struct rte_bbdev_dec_op *dec_ops[MAX_BURST];
 	struct rte_bbdev_enc_op *enc_ops[MAX_BURST];
@@ -2637,46 +2637,46 @@ dequeue_event_callback(uint16_t dev_id,
 	}
 
 	if (unlikely(event != RTE_BBDEV_EVENT_DEQUEUE)) {
-		rte_atomic16_set(&tp->processing_status, TEST_FAILED);
+		__atomic_store_n(&tp->processing_status, TEST_FAILED, __ATOMIC_RELAXED);
 		printf(
 			"Dequeue interrupt handler called for incorrect event!\n");
 		return;
 	}
 
-	burst_sz = rte_atomic16_read(&tp->burst_sz);
+	burst_sz = __atomic_load_n(&tp->burst_sz, __ATOMIC_RELAXED);
 	num_ops = tp->op_params->num_to_process;
 
 	if (test_vector.op_type == RTE_BBDEV_OP_TURBO_DEC)
 		deq = rte_bbdev_dequeue_dec_ops(dev_id, queue_id,
 				&tp->dec_ops[
-					rte_atomic16_read(&tp->nb_dequeued)],
+					__atomic_load_n(&tp->nb_dequeued, __ATOMIC_RELAXED)],
 				burst_sz);
 	else if (test_vector.op_type == RTE_BBDEV_OP_LDPC_DEC)
 		deq = rte_bbdev_dequeue_ldpc_dec_ops(dev_id, queue_id,
 				&tp->dec_ops[
-					rte_atomic16_read(&tp->nb_dequeued)],
+					__atomic_load_n(&tp->nb_dequeued, __ATOMIC_RELAXED)],
 				burst_sz);
 	else if (test_vector.op_type == RTE_BBDEV_OP_LDPC_ENC)
 		deq = rte_bbdev_dequeue_ldpc_enc_ops(dev_id, queue_id,
 				&tp->enc_ops[
-					rte_atomic16_read(&tp->nb_dequeued)],
+					__atomic_load_n(&tp->nb_dequeued, __ATOMIC_RELAXED)],
 				burst_sz);
 	else /*RTE_BBDEV_OP_TURBO_ENC*/
 		deq = rte_bbdev_dequeue_enc_ops(dev_id, queue_id,
 				&tp->enc_ops[
-					rte_atomic16_read(&tp->nb_dequeued)],
+					__atomic_load_n(&tp->nb_dequeued, __ATOMIC_RELAXED)],
 				burst_sz);
 
 	if (deq < burst_sz) {
 		printf(
 			"After receiving the interrupt all operations should be dequeued. Expected: %u, got: %u\n",
 			burst_sz, deq);
-		rte_atomic16_set(&tp->processing_status, TEST_FAILED);
+		__atomic_store_n(&tp->processing_status, TEST_FAILED, __ATOMIC_RELAXED);
 		return;
 	}
 
-	if (rte_atomic16_read(&tp->nb_dequeued) + deq < num_ops) {
-		rte_atomic16_add(&tp->nb_dequeued, deq);
+	if (__atomic_load_n(&tp->nb_dequeued, __ATOMIC_RELAXED) + deq < num_ops) {
+		__atomic_fetch_add(&tp->nb_dequeued, deq, __ATOMIC_RELAXED);
 		return;
 	}
 
@@ -2713,7 +2713,7 @@ dequeue_event_callback(uint16_t dev_id,
 
 	if (ret) {
 		printf("Buffers validation failed\n");
-		rte_atomic16_set(&tp->processing_status, TEST_FAILED);
+		__atomic_store_n(&tp->processing_status, TEST_FAILED, __ATOMIC_RELAXED);
 	}
 
 	switch (test_vector.op_type) {
@@ -2734,7 +2734,7 @@ dequeue_event_callback(uint16_t dev_id,
 		break;
 	default:
 		printf("Unknown op type: %d\n", test_vector.op_type);
-		rte_atomic16_set(&tp->processing_status, TEST_FAILED);
+		__atomic_store_n(&tp->processing_status, TEST_FAILED, __ATOMIC_RELAXED);
 		return;
 	}
 
@@ -2743,7 +2743,7 @@ dequeue_event_callback(uint16_t dev_id,
 	tp->mbps += (((double)(num_ops * tb_len_bits)) / 1000000.0) /
 			((double)total_time / (double)rte_get_tsc_hz());
 
-	rte_atomic16_add(&tp->nb_dequeued, deq);
+	__atomic_fetch_add(&tp->nb_dequeued, deq, __ATOMIC_RELAXED);
 }
 
 static int
@@ -2781,11 +2781,10 @@ throughput_intr_lcore_ldpc_dec(void *arg)
 
 	bufs = &tp->op_params->q_bufs[GET_SOCKET(info.socket_id)][queue_id];
 
-	rte_atomic16_clear(&tp->processing_status);
-	rte_atomic16_clear(&tp->nb_dequeued);
+	__atomic_store_n(&tp->processing_status, 0, __ATOMIC_RELAXED);
+	__atomic_store_n(&tp->nb_dequeued, 0, __ATOMIC_RELAXED);
 
-	while (rte_atomic16_read(&tp->op_params->sync) == SYNC_WAIT)
-		rte_pause();
+	rte_wait_until_equal_16(&tp->op_params->sync, SYNC_START, __ATOMIC_RELAXED);
 
 	ret = rte_bbdev_dec_op_alloc_bulk(tp->op_params->mp, ops,
 				num_to_process);
@@ -2833,17 +2832,15 @@ throughput_intr_lcore_ldpc_dec(void *arg)
 			 * the number of operations is not a multiple of
 			 * burst size.
 			 */
-			rte_atomic16_set(&tp->burst_sz, num_to_enq);
+			__atomic_store_n(&tp->burst_sz, num_to_enq, __ATOMIC_RELAXED);
 
 			/* Wait until processing of previous batch is
 			 * completed
 			 */
-			while (rte_atomic16_read(&tp->nb_dequeued) !=
-					(int16_t) enqueued)
-				rte_pause();
+			rte_wait_until_equal_16(&tp->nb_dequeued, enqueued, __ATOMIC_RELAXED);
 		}
 		if (j != TEST_REPETITIONS - 1)
-			rte_atomic16_clear(&tp->nb_dequeued);
+			__atomic_store_n(&tp->nb_dequeued, 0, __ATOMIC_RELAXED);
 	}
 
 	return TEST_SUCCESS;
@@ -2878,11 +2875,10 @@ throughput_intr_lcore_dec(void *arg)
 
 	bufs = &tp->op_params->q_bufs[GET_SOCKET(info.socket_id)][queue_id];
 
-	rte_atomic16_clear(&tp->processing_status);
-	rte_atomic16_clear(&tp->nb_dequeued);
+	__atomic_store_n(&tp->processing_status, 0, __ATOMIC_RELAXED);
+	__atomic_store_n(&tp->nb_dequeued, 0, __ATOMIC_RELAXED);
 
-	while (rte_atomic16_read(&tp->op_params->sync) == SYNC_WAIT)
-		rte_pause();
+	rte_wait_until_equal_16(&tp->op_params->sync, SYNC_START, __ATOMIC_RELAXED);
 
 	ret = rte_bbdev_dec_op_alloc_bulk(tp->op_params->mp, ops,
 				num_to_process);
@@ -2923,17 +2919,15 @@ throughput_intr_lcore_dec(void *arg)
 			 * the number of operations is not a multiple of
 			 * burst size.
 			 */
-			rte_atomic16_set(&tp->burst_sz, num_to_enq);
+			__atomic_store_n(&tp->burst_sz, num_to_enq, __ATOMIC_RELAXED);
 
 			/* Wait until processing of previous batch is
 			 * completed
 			 */
-			while (rte_atomic16_read(&tp->nb_dequeued) !=
-					(int16_t) enqueued)
-				rte_pause();
+			rte_wait_until_equal_16(&tp->nb_dequeued, enqueued, __ATOMIC_RELAXED);
 		}
 		if (j != TEST_REPETITIONS - 1)
-			rte_atomic16_clear(&tp->nb_dequeued);
+			__atomic_store_n(&tp->nb_dequeued, 0, __ATOMIC_RELAXED);
 	}
 
 	return TEST_SUCCESS;
@@ -2968,11 +2962,10 @@ throughput_intr_lcore_enc(void *arg)
 
 	bufs = &tp->op_params->q_bufs[GET_SOCKET(info.socket_id)][queue_id];
 
-	rte_atomic16_clear(&tp->processing_status);
-	rte_atomic16_clear(&tp->nb_dequeued);
+	__atomic_store_n(&tp->processing_status, 0, __ATOMIC_RELAXED);
+	__atomic_store_n(&tp->nb_dequeued, 0, __ATOMIC_RELAXED);
 
-	while (rte_atomic16_read(&tp->op_params->sync) == SYNC_WAIT)
-		rte_pause();
+	rte_wait_until_equal_16(&tp->op_params->sync, SYNC_START, __ATOMIC_RELAXED);
 
 	ret = rte_bbdev_enc_op_alloc_bulk(tp->op_params->mp, ops,
 			num_to_process);
@@ -3012,17 +3005,15 @@ throughput_intr_lcore_enc(void *arg)
 			 * the number of operations is not a multiple of
 			 * burst size.
 			 */
-			rte_atomic16_set(&tp->burst_sz, num_to_enq);
+			__atomic_store_n(&tp->burst_sz, num_to_enq, __ATOMIC_RELAXED);
 
 			/* Wait until processing of previous batch is
 			 * completed
 			 */
-			while (rte_atomic16_read(&tp->nb_dequeued) !=
-					(int16_t) enqueued)
-				rte_pause();
+			rte_wait_until_equal_16(&tp->nb_dequeued, enqueued, __ATOMIC_RELAXED);
 		}
 		if (j != TEST_REPETITIONS - 1)
-			rte_atomic16_clear(&tp->nb_dequeued);
+			__atomic_store_n(&tp->nb_dequeued, 0, __ATOMIC_RELAXED);
 	}
 
 	return TEST_SUCCESS;
@@ -3058,11 +3049,10 @@ throughput_intr_lcore_ldpc_enc(void *arg)
 
 	bufs = &tp->op_params->q_bufs[GET_SOCKET(info.socket_id)][queue_id];
 
-	rte_atomic16_clear(&tp->processing_status);
-	rte_atomic16_clear(&tp->nb_dequeued);
+	__atomic_store_n(&tp->processing_status, 0, __ATOMIC_RELAXED);
+	__atomic_store_n(&tp->nb_dequeued, 0, __ATOMIC_RELAXED);
 
-	while (rte_atomic16_read(&tp->op_params->sync) == SYNC_WAIT)
-		rte_pause();
+	rte_wait_until_equal_16(&tp->op_params->sync, SYNC_START, __ATOMIC_RELAXED);
 
 	ret = rte_bbdev_enc_op_alloc_bulk(tp->op_params->mp, ops,
 			num_to_process);
@@ -3104,17 +3094,15 @@ throughput_intr_lcore_ldpc_enc(void *arg)
 			 * the number of operations is not a multiple of
 			 * burst size.
 			 */
-			rte_atomic16_set(&tp->burst_sz, num_to_enq);
+			__atomic_store_n(&tp->burst_sz, num_to_enq, __ATOMIC_RELAXED);
 
 			/* Wait until processing of previous batch is
 			 * completed
 			 */
-			while (rte_atomic16_read(&tp->nb_dequeued) !=
-					(int16_t) enqueued)
-				rte_pause();
+			rte_wait_until_equal_16(&tp->nb_dequeued, enqueued, __ATOMIC_RELAXED);
 		}
 		if (j != TEST_REPETITIONS - 1)
-			rte_atomic16_clear(&tp->nb_dequeued);
+			__atomic_store_n(&tp->nb_dequeued, 0, __ATOMIC_RELAXED);
 	}
 
 	return TEST_SUCCESS;
@@ -3148,8 +3136,7 @@ throughput_pmd_lcore_dec(void *arg)
 
 	bufs = &tp->op_params->q_bufs[GET_SOCKET(info.socket_id)][queue_id];
 
-	while (rte_atomic16_read(&tp->op_params->sync) == SYNC_WAIT)
-		rte_pause();
+	rte_wait_until_equal_16(&tp->op_params->sync, SYNC_START, __ATOMIC_RELAXED);
 
 	ret = rte_bbdev_dec_op_alloc_bulk(tp->op_params->mp, ops_enq, num_ops);
 	TEST_ASSERT_SUCCESS(ret, "Allocation failed for %d ops", num_ops);
@@ -3252,8 +3239,7 @@ bler_pmd_lcore_ldpc_dec(void *arg)
 
 	bufs = &tp->op_params->q_bufs[GET_SOCKET(info.socket_id)][queue_id];
 
-	while (rte_atomic16_read(&tp->op_params->sync) == SYNC_WAIT)
-		rte_pause();
+	rte_wait_until_equal_16(&tp->op_params->sync, SYNC_START, __ATOMIC_RELAXED);
 
 	ret = rte_bbdev_dec_op_alloc_bulk(tp->op_params->mp, ops_enq, num_ops);
 	TEST_ASSERT_SUCCESS(ret, "Allocation failed for %d ops", num_ops);
@@ -3382,8 +3368,7 @@ throughput_pmd_lcore_ldpc_dec(void *arg)
 
 	bufs = &tp->op_params->q_bufs[GET_SOCKET(info.socket_id)][queue_id];
 
-	while (rte_atomic16_read(&tp->op_params->sync) == SYNC_WAIT)
-		rte_pause();
+	rte_wait_until_equal_16(&tp->op_params->sync, SYNC_START, __ATOMIC_RELAXED);
 
 	ret = rte_bbdev_dec_op_alloc_bulk(tp->op_params->mp, ops_enq, num_ops);
 	TEST_ASSERT_SUCCESS(ret, "Allocation failed for %d ops", num_ops);
@@ -3499,8 +3484,7 @@ throughput_pmd_lcore_enc(void *arg)
 
 	bufs = &tp->op_params->q_bufs[GET_SOCKET(info.socket_id)][queue_id];
 
-	while (rte_atomic16_read(&tp->op_params->sync) == SYNC_WAIT)
-		rte_pause();
+	rte_wait_until_equal_16(&tp->op_params->sync, SYNC_START, __ATOMIC_RELAXED);
 
 	ret = rte_bbdev_enc_op_alloc_bulk(tp->op_params->mp, ops_enq,
 			num_ops);
@@ -3590,8 +3574,7 @@ throughput_pmd_lcore_ldpc_enc(void *arg)
 
 	bufs = &tp->op_params->q_bufs[GET_SOCKET(info.socket_id)][queue_id];
 
-	while (rte_atomic16_read(&tp->op_params->sync) == SYNC_WAIT)
-		rte_pause();
+	rte_wait_until_equal_16(&tp->op_params->sync, SYNC_START, __ATOMIC_RELAXED);
 
 	ret = rte_bbdev_enc_op_alloc_bulk(tp->op_params->mp, ops_enq,
 			num_ops);
@@ -3774,7 +3757,7 @@ bler_test(struct active_device *ad,
 	else
 		return TEST_SKIPPED;
 
-	rte_atomic16_set(&op_params->sync, SYNC_WAIT);
+	__atomic_store_n(&op_params->sync, SYNC_WAIT, __ATOMIC_RELAXED);
 
 	/* Main core is set at first entry */
 	t_params[0].dev_id = ad->dev_id;
@@ -3797,7 +3780,7 @@ bler_test(struct active_device *ad,
 				&t_params[used_cores++], lcore_id);
 	}
 
-	rte_atomic16_set(&op_params->sync, SYNC_START);
+	__atomic_store_n(&op_params->sync, SYNC_START, __ATOMIC_RELAXED);
 	ret = bler_function(&t_params[0]);
 
 	/* Main core is always used */
@@ -3892,7 +3875,7 @@ throughput_test(struct active_device *ad,
 			throughput_function = throughput_pmd_lcore_enc;
 	}
 
-	rte_atomic16_set(&op_params->sync, SYNC_WAIT);
+	__atomic_store_n(&op_params->sync, SYNC_WAIT, __ATOMIC_RELAXED);
 
 	/* Main core is set at first entry */
 	t_params[0].dev_id = ad->dev_id;
@@ -3915,7 +3898,7 @@ throughput_test(struct active_device *ad,
 				&t_params[used_cores++], lcore_id);
 	}
 
-	rte_atomic16_set(&op_params->sync, SYNC_START);
+	__atomic_store_n(&op_params->sync, SYNC_START, __ATOMIC_RELAXED);
 	ret = throughput_function(&t_params[0]);
 
 	/* Main core is always used */
@@ -3945,29 +3928,29 @@ throughput_test(struct active_device *ad,
 	 * Wait for main lcore operations.
 	 */
 	tp = &t_params[0];
-	while ((rte_atomic16_read(&tp->nb_dequeued) <
-			op_params->num_to_process) &&
-			(rte_atomic16_read(&tp->processing_status) !=
-			TEST_FAILED))
+	while ((__atomic_load_n(&tp->nb_dequeued, __ATOMIC_RELAXED) <
+		op_params->num_to_process) &&
+		(__atomic_load_n(&tp->processing_status, __ATOMIC_RELAXED) !=
+		TEST_FAILED))
 		rte_pause();
 
 	tp->ops_per_sec /= TEST_REPETITIONS;
 	tp->mbps /= TEST_REPETITIONS;
-	ret |= (int)rte_atomic16_read(&tp->processing_status);
+	ret |= (int)__atomic_load_n(&tp->processing_status, __ATOMIC_RELAXED);
 
 	/* Wait for worker lcores operations */
 	for (used_cores = 1; used_cores < num_lcores; used_cores++) {
 		tp = &t_params[used_cores];
 
-		while ((rte_atomic16_read(&tp->nb_dequeued) <
-				op_params->num_to_process) &&
-				(rte_atomic16_read(&tp->processing_status) !=
-				TEST_FAILED))
+		while ((__atomic_load_n(&tp->nb_dequeued, __ATOMIC_RELAXED) <
+			op_params->num_to_process) &&
+			(__atomic_load_n(&tp->processing_status, __ATOMIC_RELAXED) !=
+			TEST_FAILED))
 			rte_pause();
 
 		tp->ops_per_sec /= TEST_REPETITIONS;
 		tp->mbps /= TEST_REPETITIONS;
-		ret |= (int)rte_atomic16_read(&tp->processing_status);
+		ret |= (int)__atomic_load_n(&tp->processing_status, __ATOMIC_RELAXED);
 	}
 
 	/* Print throughput if test passed */
-- 
2.25.1
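
For readers following the conversion, here is a minimal standalone sketch of
the mapping used in the hunks above: rte_atomic16_set() becomes
__atomic_store_n(), rte_atomic16_read() becomes __atomic_load_n(), and
rte_atomic16_clear() becomes a store of 0, all with __ATOMIC_RELAXED. The
struct and helper names below are hypothetical and exist only for
illustration. The sketch also assumes the counters are plain uint16_t fields,
and it builds with GCC or Clang without DPDK.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, trimmed-down stand-in for the test's per-thread params. */
struct thread_params {
	uint16_t nb_dequeued;
	uint16_t burst_sz;
};

/* Was: rte_atomic16_set(&tp->burst_sz, v) */
static inline void
set_burst_sz(struct thread_params *tp, uint16_t v)
{
	__atomic_store_n(&tp->burst_sz, v, __ATOMIC_RELAXED);
}

/* Was: rte_atomic16_read(&tp->nb_dequeued) */
static inline uint16_t
read_nb_dequeued(struct thread_params *tp)
{
	return __atomic_load_n(&tp->nb_dequeued, __ATOMIC_RELAXED);
}

/* Was: rte_atomic16_clear(&tp->nb_dequeued) */
static inline void
clear_nb_dequeued(struct thread_params *tp)
{
	__atomic_store_n(&tp->nb_dequeued, 0, __ATOMIC_RELAXED);
}

/* Single-threaded walk through the three converted operations. */
uint16_t
demo_counters(void)
{
	struct thread_params tp = { 0, 0 };
	uint16_t seen;

	set_burst_sz(&tp, 32);
	/* Pretend a dequeue callback reported one full burst. */
	__atomic_store_n(&tp.nb_dequeued, tp.burst_sz, __ATOMIC_RELAXED);
	seen = read_nb_dequeued(&tp);
	clear_nb_dequeued(&tp);
	assert(read_nb_dequeued(&tp) == 0);
	return seen;
}
```

Calling demo_counters() from a main() exercises all three helpers. Relaxed
ordering makes each access atomic but orders nothing around it, so it only
suits counters and flags that do not publish other data.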



* [PATCH v3 12/12] app: remove unnecessary include of atomic header file
  2021-11-17  8:21 ` [PATCH v3 00/12] use compiler atomic builtins for app modules Joyce Kong
                     ` (10 preceding siblings ...)
  2021-11-17  8:21   ` [PATCH v3 11/12] app/bbdev: use compiler atomics for shared data sync Joyce Kong
@ 2021-11-17  8:22   ` Joyce Kong
  2021-11-17 10:02   ` [PATCH v3 00/12] use compiler atomic builtins for app modules David Marchand
  12 siblings, 0 replies; 36+ messages in thread
From: Joyce Kong @ 2021-11-17  8:22 UTC (permalink / raw)
  To: Maryam Tahhan, Reshma Pattan, Cristian Dumitrescu, Xiaoyun Li,
	Erik Gabriel Carrillo, Olivier Matz, Anatoly Burakov,
	Honnappa Nagarahalli, Konstantin Ananyev
  Cc: dev, nd, Joyce Kong, Ruifeng Wang

Remove unnecessary inclusions of rte_atomic.h from app modules.

Signed-off-by: Joyce Kong <joyce.kong@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
---
 app/proc-info/main.c                | 1 -
 app/test-pipeline/config.c          | 1 -
 app/test-pipeline/init.c            | 1 -
 app/test-pipeline/main.c            | 1 -
 app/test-pipeline/runtime.c         | 1 -
 app/test-pmd/cmdline.c              | 1 -
 app/test-pmd/config.c               | 1 -
 app/test-pmd/csumonly.c             | 1 -
 app/test-pmd/flowgen.c              | 1 -
 app/test-pmd/icmpecho.c             | 1 -
 app/test-pmd/iofwd.c                | 1 -
 app/test-pmd/macfwd.c               | 1 -
 app/test-pmd/macswap.c              | 1 -
 app/test-pmd/parameters.c           | 1 -
 app/test-pmd/rxonly.c               | 1 -
 app/test-pmd/txonly.c               | 1 -
 app/test/commands.c                 | 1 -
 app/test/test_barrier.c             | 1 -
 app/test/test_event_timer_adapter.c | 1 -
 app/test/test_mbuf.c                | 1 -
 app/test/test_mp_secondary.c        | 1 -
 app/test/test_ring.c                | 1 -
 22 files changed, 22 deletions(-)

diff --git a/app/proc-info/main.c b/app/proc-info/main.c
index a4271047e6..ebe2d77264 100644
--- a/app/proc-info/main.c
+++ b/app/proc-info/main.c
@@ -27,7 +27,6 @@
 #include <rte_per_lcore.h>
 #include <rte_lcore.h>
 #include <rte_log.h>
-#include <rte_atomic.h>
 #include <rte_branch_prediction.h>
 #include <rte_string_fns.h>
 #include <rte_metrics.h>
diff --git a/app/test-pipeline/config.c b/app/test-pipeline/config.c
index 33f3f1c827..daf838948b 100644
--- a/app/test-pipeline/config.c
+++ b/app/test-pipeline/config.c
@@ -21,7 +21,6 @@
 #include <rte_eal.h>
 #include <rte_per_lcore.h>
 #include <rte_launch.h>
-#include <rte_atomic.h>
 #include <rte_cycles.h>
 #include <rte_prefetch.h>
 #include <rte_lcore.h>
diff --git a/app/test-pipeline/init.c b/app/test-pipeline/init.c
index c738019041..eee0719b67 100644
--- a/app/test-pipeline/init.c
+++ b/app/test-pipeline/init.c
@@ -21,7 +21,6 @@
 #include <rte_eal.h>
 #include <rte_per_lcore.h>
 #include <rte_launch.h>
-#include <rte_atomic.h>
 #include <rte_cycles.h>
 #include <rte_prefetch.h>
 #include <rte_lcore.h>
diff --git a/app/test-pipeline/main.c b/app/test-pipeline/main.c
index 72e4797ff2..1e16794183 100644
--- a/app/test-pipeline/main.c
+++ b/app/test-pipeline/main.c
@@ -22,7 +22,6 @@
 #include <rte_eal.h>
 #include <rte_per_lcore.h>
 #include <rte_launch.h>
-#include <rte_atomic.h>
 #include <rte_cycles.h>
 #include <rte_prefetch.h>
 #include <rte_lcore.h>
diff --git a/app/test-pipeline/runtime.c b/app/test-pipeline/runtime.c
index 159192bcd8..d939a85d7e 100644
--- a/app/test-pipeline/runtime.c
+++ b/app/test-pipeline/runtime.c
@@ -21,7 +21,6 @@
 #include <rte_eal.h>
 #include <rte_per_lcore.h>
 #include <rte_launch.h>
-#include <rte_atomic.h>
 #include <rte_cycles.h>
 #include <rte_prefetch.h>
 #include <rte_branch_prediction.h>
diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c
index 4f51b259fe..4e93f535ff 100644
--- a/app/test-pmd/cmdline.c
+++ b/app/test-pmd/cmdline.c
@@ -24,7 +24,6 @@
 #include <rte_eal.h>
 #include <rte_per_lcore.h>
 #include <rte_lcore.h>
-#include <rte_atomic.h>
 #include <rte_branch_prediction.h>
 #include <rte_ring.h>
 #include <rte_mempool.h>
diff --git a/app/test-pmd/config.c b/app/test-pmd/config.c
index 26cadf39f7..d8b5032b58 100644
--- a/app/test-pmd/config.c
+++ b/app/test-pmd/config.c
@@ -27,7 +27,6 @@
 #include <rte_eal.h>
 #include <rte_per_lcore.h>
 #include <rte_lcore.h>
-#include <rte_atomic.h>
 #include <rte_branch_prediction.h>
 #include <rte_mempool.h>
 #include <rte_mbuf.h>
diff --git a/app/test-pmd/csumonly.c b/app/test-pmd/csumonly.c
index 8526d9158a..e0b00abe8c 100644
--- a/app/test-pmd/csumonly.c
+++ b/app/test-pmd/csumonly.c
@@ -24,7 +24,6 @@
 #include <rte_eal.h>
 #include <rte_per_lcore.h>
 #include <rte_lcore.h>
-#include <rte_atomic.h>
 #include <rte_branch_prediction.h>
 #include <rte_mempool.h>
 #include <rte_mbuf.h>
diff --git a/app/test-pmd/flowgen.c b/app/test-pmd/flowgen.c
index 5737eaa105..9ceef3b54a 100644
--- a/app/test-pmd/flowgen.c
+++ b/app/test-pmd/flowgen.c
@@ -24,7 +24,6 @@
 #include <rte_eal.h>
 #include <rte_per_lcore.h>
 #include <rte_lcore.h>
-#include <rte_atomic.h>
 #include <rte_branch_prediction.h>
 #include <rte_mempool.h>
 #include <rte_mbuf.h>
diff --git a/app/test-pmd/icmpecho.c b/app/test-pmd/icmpecho.c
index 8f1d68a83a..3a85ec3dd1 100644
--- a/app/test-pmd/icmpecho.c
+++ b/app/test-pmd/icmpecho.c
@@ -20,7 +20,6 @@
 #include <rte_cycles.h>
 #include <rte_per_lcore.h>
 #include <rte_lcore.h>
-#include <rte_atomic.h>
 #include <rte_branch_prediction.h>
 #include <rte_memory.h>
 #include <rte_mempool.h>
diff --git a/app/test-pmd/iofwd.c b/app/test-pmd/iofwd.c
index 83d098adcb..19cd920f70 100644
--- a/app/test-pmd/iofwd.c
+++ b/app/test-pmd/iofwd.c
@@ -23,7 +23,6 @@
 #include <rte_eal.h>
 #include <rte_per_lcore.h>
 #include <rte_lcore.h>
-#include <rte_atomic.h>
 #include <rte_branch_prediction.h>
 #include <rte_memcpy.h>
 #include <rte_mempool.h>
diff --git a/app/test-pmd/macfwd.c b/app/test-pmd/macfwd.c
index ac50d0b9f8..812a0c721f 100644
--- a/app/test-pmd/macfwd.c
+++ b/app/test-pmd/macfwd.c
@@ -24,7 +24,6 @@
 #include <rte_eal.h>
 #include <rte_per_lcore.h>
 #include <rte_lcore.h>
-#include <rte_atomic.h>
 #include <rte_branch_prediction.h>
 #include <rte_mempool.h>
 #include <rte_mbuf.h>
diff --git a/app/test-pmd/macswap.c b/app/test-pmd/macswap.c
index 310bca06af..4627ff83e9 100644
--- a/app/test-pmd/macswap.c
+++ b/app/test-pmd/macswap.c
@@ -24,7 +24,6 @@
 #include <rte_eal.h>
 #include <rte_per_lcore.h>
 #include <rte_lcore.h>
-#include <rte_atomic.h>
 #include <rte_branch_prediction.h>
 #include <rte_mempool.h>
 #include <rte_mbuf.h>
diff --git a/app/test-pmd/parameters.c b/app/test-pmd/parameters.c
index 0974b0a38f..2f4f944efa 100644
--- a/app/test-pmd/parameters.c
+++ b/app/test-pmd/parameters.c
@@ -30,7 +30,6 @@
 #include <rte_eal.h>
 #include <rte_per_lcore.h>
 #include <rte_lcore.h>
-#include <rte_atomic.h>
 #include <rte_branch_prediction.h>
 #include <rte_mempool.h>
 #include <rte_interrupts.h>
diff --git a/app/test-pmd/rxonly.c b/app/test-pmd/rxonly.c
index c78fc4609a..d1a579d8d8 100644
--- a/app/test-pmd/rxonly.c
+++ b/app/test-pmd/rxonly.c
@@ -24,7 +24,6 @@
 #include <rte_eal.h>
 #include <rte_per_lcore.h>
 #include <rte_lcore.h>
-#include <rte_atomic.h>
 #include <rte_branch_prediction.h>
 #include <rte_mempool.h>
 #include <rte_mbuf.h>
diff --git a/app/test-pmd/txonly.c b/app/test-pmd/txonly.c
index 34bb538379..b8497e733d 100644
--- a/app/test-pmd/txonly.c
+++ b/app/test-pmd/txonly.c
@@ -24,7 +24,6 @@
 #include <rte_eal.h>
 #include <rte_per_lcore.h>
 #include <rte_lcore.h>
-#include <rte_atomic.h>
 #include <rte_branch_prediction.h>
 #include <rte_mempool.h>
 #include <rte_mbuf.h>
diff --git a/app/test/commands.c b/app/test/commands.c
index 76f6ee5d23..2dced3bc44 100644
--- a/app/test/commands.c
+++ b/app/test/commands.c
@@ -25,7 +25,6 @@
 #include <rte_eal.h>
 #include <rte_per_lcore.h>
 #include <rte_lcore.h>
-#include <rte_atomic.h>
 #include <rte_branch_prediction.h>
 #include <rte_ring.h>
 #include <rte_malloc.h>
diff --git a/app/test/test_barrier.c b/app/test/test_barrier.c
index c27f8a0742..898c2516ed 100644
--- a/app/test/test_barrier.c
+++ b/app/test/test_barrier.c
@@ -24,7 +24,6 @@
 #include <rte_memory.h>
 #include <rte_per_lcore.h>
 #include <rte_launch.h>
-#include <rte_atomic.h>
 #include <rte_eal.h>
 #include <rte_lcore.h>
 #include <rte_pause.h>
diff --git a/app/test/test_event_timer_adapter.c b/app/test/test_event_timer_adapter.c
index 12c00e678e..25bac2d155 100644
--- a/app/test/test_event_timer_adapter.c
+++ b/app/test/test_event_timer_adapter.c
@@ -5,7 +5,6 @@
 
 #include <math.h>
 
-#include <rte_atomic.h>
 #include <rte_common.h>
 #include <rte_cycles.h>
 #include <rte_debug.h>
diff --git a/app/test/test_mbuf.c b/app/test/test_mbuf.c
index f93bcef8a9..d53126710f 100644
--- a/app/test/test_mbuf.c
+++ b/app/test/test_mbuf.c
@@ -21,7 +21,6 @@
 #include <rte_eal.h>
 #include <rte_per_lcore.h>
 #include <rte_lcore.h>
-#include <rte_atomic.h>
 #include <rte_branch_prediction.h>
 #include <rte_ring.h>
 #include <rte_mempool.h>
diff --git a/app/test/test_mp_secondary.c b/app/test/test_mp_secondary.c
index 5b6f05dbb1..021ca0547f 100644
--- a/app/test/test_mp_secondary.c
+++ b/app/test/test_mp_secondary.c
@@ -28,7 +28,6 @@
 #include <rte_lcore.h>
 #include <rte_errno.h>
 #include <rte_branch_prediction.h>
-#include <rte_atomic.h>
 #include <rte_ring.h>
 #include <rte_debug.h>
 #include <rte_log.h>
diff --git a/app/test/test_ring.c b/app/test/test_ring.c
index fb8532a409..bde33ab4a1 100644
--- a/app/test/test_ring.c
+++ b/app/test/test_ring.c
@@ -20,7 +20,6 @@
 #include <rte_eal.h>
 #include <rte_per_lcore.h>
 #include <rte_lcore.h>
-#include <rte_atomic.h>
 #include <rte_branch_prediction.h>
 #include <rte_malloc.h>
 #include <rte_ring.h>
-- 
2.25.1



* Re: [PATCH v2 03/12] test/timer: use compiler atomic builtins for sync
  2021-11-16 21:21     ` Honnappa Nagarahalli
@ 2021-11-17  9:29       ` David Marchand
  0 siblings, 0 replies; 36+ messages in thread
From: David Marchand @ 2021-11-17  9:29 UTC (permalink / raw)
  To: Honnappa Nagarahalli
  Cc: Joyce Kong, Robert Sanford, Erik Gabriel Carrillo, dev, nd, Ruifeng Wang

On Tue, Nov 16, 2021 at 10:21 PM Honnappa Nagarahalli
<Honnappa.Nagarahalli@arm.com> wrote:
> > Joyce, Honnappa,
> >
> > On Tue, Nov 16, 2021 at 10:43 AM Joyce Kong <joyce.kong@arm.com> wrote:
> > >
> > > Convert rte_atomic usages to compiler atomic built-ins for lcore_state
> > > and collisions sync.
> > >
> > > Also, move 'main_init_workers' outside of 'timer_stress2_main_loop' to
> > > guarantee lcore_state initialized correctly before the threads
> > > launched.
> >
> > Is this "also" part actually related to the change?
> > Or is it a separate fix?
> The 'Also' part does not fix a separate problem (the earlier code had no issues). It just helps to keep the code simple.

It is indeed better this way.
Thanks.

-- 
David Marchand
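
The ordering discussed above (initialize the shared state, then launch the
threads) can be sketched outside DPDK as follows. This is only an
illustration under stated assumptions: pthread_create() stands in for
rte_eal_remote_launch(), and NUM_WORKERS, the state values and the function
names are hypothetical. Thread creation orders everything written before it
ahead of the new thread's first instruction, so the workers always observe
the initialized state.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

#define NUM_WORKERS 4 /* hypothetical worker count */

enum { WORKER_WAITING = 1, WORKER_FINISHED = 2 }; /* hypothetical states */

static uint16_t lcore_state[NUM_WORKERS];

static void *
worker(void *arg)
{
	uint16_t *state = arg;

	/* Written before pthread_create(), so it is visible here. */
	assert(__atomic_load_n(state, __ATOMIC_RELAXED) == WORKER_WAITING);
	/* Release pairs with the acquire load in the launcher. */
	__atomic_store_n(state, WORKER_FINISHED, __ATOMIC_RELEASE);
	return NULL;
}

/* Returns the number of workers that ran to completion. */
int
launch_after_init(void)
{
	pthread_t tid[NUM_WORKERS];
	int i, created = 0, finished = 0;

	/* Step 1: initialize every worker's state before any launch. */
	for (i = 0; i < NUM_WORKERS; i++)
		__atomic_store_n(&lcore_state[i], WORKER_WAITING,
				__ATOMIC_RELAXED);

	/* Step 2: launch the workers. */
	for (i = 0; i < NUM_WORKERS; i++) {
		if (pthread_create(&tid[i], NULL, worker,
				&lcore_state[i]) != 0)
			break;
		created++;
	}

	/* Step 3: join and count the workers that reported completion. */
	for (i = 0; i < created; i++) {
		pthread_join(tid[i], NULL);
		if (__atomic_load_n(&lcore_state[i], __ATOMIC_ACQUIRE) ==
				WORKER_FINISHED)
			finished++;
	}
	return finished;
}
```

If the initialization ran after the launch instead, a worker could read its
state before the store happened, and stronger ordering or an explicit
handshake would be needed.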



* Re: [PATCH v3 00/12] use compiler atomic builtins for app modules
  2021-11-17  8:21 ` [PATCH v3 00/12] use compiler atomic builtins for app modules Joyce Kong
                     ` (11 preceding siblings ...)
  2021-11-17  8:22   ` [PATCH v3 12/12] app: remove unnecessary include of atomic header file Joyce Kong
@ 2021-11-17 10:02   ` David Marchand
  12 siblings, 0 replies; 36+ messages in thread
From: David Marchand @ 2021-11-17 10:02 UTC (permalink / raw)
  To: Joyce Kong; +Cc: dev, Honnappa Nagarahalli, nd

On Wed, Nov 17, 2021 at 9:22 AM Joyce Kong <joyce.kong@arm.com> wrote:
>
> Since DPDK has adopted the C11 memory model[1], replace the
> rte_atomicNN_xxx APIs with compiler atomic built-ins in app
> modules[2].
>
> [1] https://www.dpdk.org/blog/2021/03/26/dpdk-adopts-the-c11-memory-model/
> [2] https://doc.dpdk.org/guides/rel_notes/deprecation.html
>
> v3:
>   1. In the pmd_perf test case, move the initialization of polling
>      start before the call to rte_eal_remote_launch, so that the
>      update is visible to the worker threads. (Honnappa Nagarahalli)
>   2. Remove the remaining rte_atomic.h includes that were missed in
>      v2. (David Marchand)
>
> v2:
>   By Honnappa Nagarahalli:
>   1. Replace the RELAXED memory orderings with suitable ones for
>      shared data sync in the pmd_perf and timer test cases.
>   2. Avoid unnecessary atomic operations in the compress and testpmd
>      modules.
>   3. Fix some typos.
>
> Joyce Kong (12):
>   test/pmd_perf: use compiler atomic builtins for polling sync
>   test/ring_perf: use compiler atomic builtins for lcores sync
>   test/timer: use compiler atomic builtins for sync
>   test/stack_perf: use compiler atomics for lcore sync
>   test/bpf: use compiler atomics for calculation
>   test/func_reentrancy: use compiler atomics for data sync
>   app/eventdev: use compiler atomics for shared data sync
>   app/crypto: use compiler atomic builtins for display sync
>   app/compress: use compiler atomic builtins for display sync
>   app/testpmd: remove atomic operations for port status
>   app/bbdev: use compiler atomics for shared data sync
>   app: remove unnecessary include of atomic header file

There were cleanups of unneeded rte_atomic.h inclusions throughout the series.
I moved all of them to the last patch so that each patch focuses on what
its commit log describes.

Series applied, thanks.


-- 
David Marchand



Thread overview: 36+ messages
2021-11-16  9:41 [PATCH v2 00/12] use compiler atomic builtins for app modules Joyce Kong
2021-11-16  9:41 ` [PATCH v2 01/12] test/pmd_perf: use compiler atomic builtins for polling sync Joyce Kong
2021-11-16 21:30   ` Honnappa Nagarahalli
2021-11-16  9:41 ` [PATCH v2 02/12] test/ring_perf: use compiler atomic builtins for lcores sync Joyce Kong
2021-11-16  9:41 ` [PATCH v2 03/12] test/timer: use compiler atomic builtins for sync Joyce Kong
2021-11-16 19:52   ` Honnappa Nagarahalli
2021-11-16 20:20   ` David Marchand
2021-11-16 21:21     ` Honnappa Nagarahalli
2021-11-17  9:29       ` David Marchand
2021-11-16  9:41 ` [PATCH v2 04/12] test/stack_perf: use compiler atomics for lcore sync Joyce Kong
2021-11-16  9:41 ` [PATCH v2 05/12] test/bpf: use compiler atomics for calculation Joyce Kong
2021-11-16  9:41 ` [PATCH v2 06/12] test/func_reentrancy: use compiler atomics for data sync Joyce Kong
2021-11-16  9:42 ` [PATCH v2 07/12] app/eventdev: use compiler atomics for shared " Joyce Kong
2021-11-16  9:42 ` [PATCH v2 08/12] app/crypto: use compiler atomic builtins for display sync Joyce Kong
2021-11-16  9:42 ` [PATCH v2 09/12] app/compress: " Joyce Kong
2021-11-16 20:15   ` Honnappa Nagarahalli
2021-11-16  9:42 ` [PATCH v2 10/12] app/testpmd: remove atomic operations for port status Joyce Kong
2021-11-16 21:34   ` Honnappa Nagarahalli
2021-11-16  9:42 ` [PATCH v2 11/12] app/bbdev: use compiler atomics for shared data sync Joyce Kong
2021-11-16  9:42 ` [PATCH v2 12/12] app: remove unnecessary include of atomic header file Joyce Kong
2021-11-16 20:23   ` David Marchand
2021-11-17  7:05     ` Joyce Kong
2021-11-17  8:21 ` [PATCH v3 00/12] use compiler atomic builtins for app modules Joyce Kong
2021-11-17  8:21   ` [PATCH v3 01/12] test/pmd_perf: use compiler atomic builtins for polling sync Joyce Kong
2021-11-17  8:21   ` [PATCH v3 02/12] test/ring_perf: use compiler atomic builtins for lcores sync Joyce Kong
2021-11-17  8:21   ` [PATCH v3 03/12] test/timer: use compiler atomic builtins for sync Joyce Kong
2021-11-17  8:21   ` [PATCH v3 04/12] test/stack_perf: use compiler atomics for lcore sync Joyce Kong
2021-11-17  8:21   ` [PATCH v3 05/12] test/bpf: use compiler atomics for calculation Joyce Kong
2021-11-17  8:21   ` [PATCH v3 06/12] test/func_reentrancy: use compiler atomics for data sync Joyce Kong
2021-11-17  8:21   ` [PATCH v3 07/12] app/eventdev: use compiler atomics for shared " Joyce Kong
2021-11-17  8:21   ` [PATCH v3 08/12] app/crypto: use compiler atomic builtins for display sync Joyce Kong
2021-11-17  8:21   ` [PATCH v3 09/12] app/compress: " Joyce Kong
2021-11-17  8:21   ` [PATCH v3 10/12] app/testpmd: remove atomic operations for port status Joyce Kong
2021-11-17  8:21   ` [PATCH v3 11/12] app/bbdev: use compiler atomics for shared data sync Joyce Kong
2021-11-17  8:22   ` [PATCH v3 12/12] app: remove unnecessary include of atomic header file Joyce Kong
2021-11-17 10:02   ` [PATCH v3 00/12] use compiler atomic builtins for app modules David Marchand
